Metadata-Version: 2.1
Name: cardscraper
Version: 0.0.1
Summary: A tool for generating Anki cards by web scraping
Project-URL: Homepage, https://github.com/sakhezech/cardscraper
License: MIT License
        
        Copyright (c) 2023 Sakhezech
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Requires-Dist: genanki
Requires-Dist: playwright
Requires-Dist: pyyaml
Description-Content-Type: text/markdown

# cardscraper

A tool for generating Anki cards by web scraping

## Installation

### with pip

```
python3 -m pip install cardscraper
```

### with [pipx](https://pypa.github.io/pipx/)

```
pipx install --include-deps cardscraper
```

## [Playwright](https://github.com/microsoft/playwright)

cardscraper uses Playwright for scraping by default, so you will need to
install Chromium:

```
playwright install chromium
```

## Usage

`cardscraper ...` or `python3 -m cardscraper ...`

cardscraper has 3 main subcommands:

- `cardscraper gen` - takes in [yaml instruction files](#yaml-instruction-file)
  and generates Anki packages
- `cardscraper init` - generates [yaml instruction file](#yaml-instruction-file)
  templates
- `cardscraper list` - lists all available module implementations

You can always run `cardscraper <subcommand> -h` to see the options for a
subcommand.

I recommend doing something like:

1. `cardscraper init hello.yaml`
2. edit the file to suit your needs
3. `cardscraper gen hello.yaml`

## YAML instruction file

```yaml
# Meta defines which functions will take care of each step
# Get list of available implementations from 'cardscraper list'
meta:
  package: default
  deck: default
  model: default
  scraping: default

package:
  # Output package name
  name: sample_package
  # Output folder
  output_path: ./out/
  # Path to the folder containing all media files to include
  # Defaults to null
  media: null

deck:
  # Deck name
  name: Countries
  # Deck ID
  id: 84269713

model:
  # Model name
  name: Countries Model
  # Model ID
  id: 97138426
  # CSS styling
  css: |
    * {
        color: #333;
        background-color: #fffffa;
    }
    .q, .a {
        text-align: center;
    }
    .q {
        font-size: 5rem;
        font-weight: 700;
    }
    .a {
        font-size: 3rem;
    }
  # Templates
  templates:
    CountryToInfo: # Template name
      # Front template
      # Note that you can insert a query's output with {{QueryName}}
      qfmt: |
        <div class='q'>
            {{Country}}
        </div>
      # Back template
      afmt: |
        {{FrontSide}}
        <hr id=answer>
        <div class='a'>
            {{Info}}
            <a href="https://en.wikipedia.org/w/index.php?search={{Country}}">
                more info
            </a>
        </div>
    CapitalToCountry:
      qfmt: |
        <div class='q'>
            {{Capital}}
        </div>
      afmt: |
        {{FrontSide}}
        <hr id=answer>
        <div class='a'>
            {{Country}}<br>
            <a href="https://en.wikipedia.org/w/index.php?search={{Capital}}">
                more info
            </a>
        </div>

scraping:
  # List of URLs to scrape
  urls:
    - https://www.scrapethissite.com/pages/simple/
  # Queries to run
  # Each child query runs inside the parent
  queries:
    CountryEntry: # Query name
      # What to query for
      query: .country
      # Should we select all elements?
      # (querySelector or querySelectorAll)
      # Defaults to false
      all: true
      # JS function to evaluate the selected element(s)
      # Defaults to (e) => e.innerText
      eval: (e) => e.innerText
      # Python regex
      # If set, captures the first group with re.DOTALL enabled
      # Defaults to null
      regex: null
      # Result formatting
      # 'Hello my name is {}'.format(...)
      # Defaults to '{}'
      format: "{}"
      # Queries to run inside the selected element(s)
      # Defaults to null
      children:
        Country:
          query: .country-name
          # all: false
          # eval: (e) => e.innerText
          # regex: null
          # format: '{}'
          # children: null
        Info:
          query: .country-info
          # all: false
          eval: (e) => e.innerHTML
          # regex: null
          # format: '{}'
          # children: null
        Capital:
          query: .country-capital
          # all: false
          # eval: (e) => e.innerText
          # regex: null
          # format: '{}'
          # children: null
```
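
The `regex` and `format` options can be easier to reason about with a small
example. The sketch below is not cardscraper's actual code: `post_process` is a
hypothetical helper, and using `re.search` and returning an empty string on a
failed match are assumptions. It only illustrates the documented behaviour of
capturing the first group with `re.DOTALL` and then applying `str.format`.

```python
import re


def post_process(text, regex=None, fmt="{}"):
    """Hypothetical helper: apply an optional regex, then a format string."""
    if regex is not None:
        # re.DOTALL lets '.' match newlines, as the YAML comments describe.
        match = re.search(regex, text, re.DOTALL)
        # Assumption: fall back to an empty string when nothing matches.
        text = match.group(1) if match else ""
    # Equivalent to 'Hello my name is {}'.format(...) from the YAML comments.
    return fmt.format(text)


raw = "Capital: Andorra la Vella\nPopulation: 84000"

# A lazy group stops at the first newline.
print(post_process(raw, regex=r"Capital: (.+?)\n", fmt="The capital is {}"))

# With re.DOTALL, a greedy group runs across line breaks.
print(repr(post_process(raw, regex=r"Capital: (.*)")))
```

Because `re.DOTALL` is enabled, a greedy pattern such as `(.*)` will capture
text across line breaks, so a lazy or more specific group is usually the safer
choice.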

## Plugin system

You can add custom implementations by exposing a `cardscraper.x` entry point
in your package, where `x` is the step to implement (`package`, `deck`,
`model`, or `scraping`):

```toml
[project.entry-points.'cardscraper.model']
my_impl = 'mypackage:gen_model'
[project.entry-points.'cardscraper.scraping']
my_impl = 'mypackage:gen_notes'
[project.entry-points.'cardscraper.deck']
my_impl = 'mypackage:gen_deck'
[project.entry-points.'cardscraper.package']
my_impl = 'mypackage:gen_package'
```
