Metadata-Version: 2.1
Name: ankipan
Version: 0.2
Summary: A language learning utility with Anki integration
Author-email: Daniel Otto de Mentock <daniel.mentock@gmail.com>
Project-URL: repository, https://gitlab.com/ankipan/ankipan
Classifier: Intended Audience :: Education
Classifier: Topic :: Education
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: beautifulsoup4
Requires-Dist: ipython
Requires-Dist: ipywidgets
Requires-Dist: langdetect
Requires-Dist: lxml
Requires-Dist: numpy
Requires-Dist: pysubs2
Requires-Dist: requests
Requires-Dist: python-dotenv

# Ankipan

Ankipan is a language learning utility which systematically tracks the words you already know or are currently learning, and allows you to parse any source you are personally interested in (text, subtitles, websites, lyrics etc.) for the most relevant new words.

New words are stored internally and converted to Anki flashcards, which contain customizable content such as scraped dictionary definitions and example sentences from different sources.

Currently supported languages are Japanese (`jp`), German (`de`), French (`fr`) and English (`en`).

## Getting started

### 1. Prerequisites

- Download and install Anki from https://apps.ankiweb.net/
- Create an AnkiWeb account on their website
- Install the AnkiConnect add-on from https://ankiweb.net/shared/info/2055492159 (in Anki, open Tools -> Add-ons -> Get Add-ons and paste the code 2055492159)
- Open Anki and log in. Keep Anki open whenever you sync databases.
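
To confirm that Anki is running and the add-on is listening, a small check like the one below may help. It is a minimal sketch that is not part of Ankipan: the helper names `build_request` and `anki_connect_version` are hypothetical, and the address assumes AnkiConnect's default of `127.0.0.1:8765`.

```python
import json
import urllib.error
import urllib.request

# Assumption: AnkiConnect's default address; change it if you reconfigured the add-on.
ANKI_CONNECT_URL = "http://127.0.0.1:8765"


def build_request(action, **params):
    """Build the JSON payload AnkiConnect expects for an action (API version 6)."""
    return {"action": action, "version": 6, "params": params}


def anki_connect_version(url=ANKI_CONNECT_URL, timeout=2.0):
    """Return the AnkiConnect API version, or None if Anki is not reachable."""
    payload = json.dumps(build_request("version")).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            reply = json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a reply that is not valid JSON.
        return None
    if not isinstance(reply, dict):
        return None
    return reply.get("result")
```

Calling `anki_connect_version()` while Anki is open should return a number; `None` suggests that Anki is closed or the add-on is not installed.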

### 2. Installation

```bash
# Clone the repository
git clone git@gitlab.com:ankipan/ankipan.git
cd ankipan

# Install dependencies
python -m pip install -r requirements.txt
```

### 3. (Optional) Install lemmatizers to parse your own texts

- Install PyTorch by following https://pytorch.org/get-started/locally/ (required by stanza for lemma parsing)
- Install the dependencies:

```bash
pip install stanza hanta
```
- Select a language. Currently supported are 'jp', 'de', 'fr' and 'en'; see https://stanfordnlp.github.io/stanza/performance.html

```bash
# stanza's own code for Japanese is 'ja' (Ankipan uses 'jp')
python -c "import stanza; stanza.download('ja')"
```
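
To check which of the optional packages are importable in the current environment, something like the following can be used. This is a sketch under assumptions, not part of Ankipan: `missing_optional_packages` is a hypothetical helper, and the import names (`torch`, `stanza`, and `HanTa` for the `hanta` package) are assumed from the upstream projects.

```python
from importlib.util import find_spec

# Assumption: import names of the optional packages. The pip package "hanta"
# is assumed to be imported as "HanTa".
OPTIONAL_PACKAGES = ("torch", "stanza", "HanTa")


def missing_optional_packages(packages=OPTIONAL_PACKAGES):
    """Return the names of top-level packages that cannot be found for import."""
    return [name for name in packages if find_spec(name) is None]
```

An empty list from `missing_optional_packages()` means all three packages were found. It does not verify that the stanza language models have been downloaded.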

## Usage

See the interactive notebook in `/examples`.

```python
# Create a new collection with your name, learning language and native language
from ankipan import Collection
collection = Collection('testcollection', source_lang='jp', native_lang='en')

# Specify the content to download for flashcards (see collection.get_available_sources() for example sentence sources, and the scraper.py module)
collection.set_flashcard_fields(definitions=['jisho', 'wadoku', 'wikitionary_en', 'wikitionary_jp'],
                                sentence_sources=['lyrics', 'youtube'])

# Specify the source whose words you would like to add to your deck: a string, a path to a file or folder, or a source name (see collection.get_available_sources())
words = collection.collect(string='本音言う無邪気なペース')
# words = collection.collect('example_text_jp.txt')
# words = collection.collect(source_name='Tatsuya-kitani')

# Select the words you already know and the words you would like to learn from the table overview
words.select_new_words()

# Add words to collection
collection.add_deck(words, 'example_source')

# Optional: persist the collection state to the hard drive (see the '.data' folder)
collection.save()

# Download content for new cards (also autosaves collection to drive)
collection.fetch('example_source')

# Sync the collection with Anki to upload the new cards to the currently open Anki instance
collection.sync_with_anki('example_source')

```

## Notes

- Lemmatization is currently done with the `stanza` library in the `reader.py` module. This works well in most cases, but the library uses a statistical model to estimate the likely lemmas (word roots) of the tokens in a sentence, so it sometimes makes mistakes. Users have to filter these out manually in the `select_new_words` overview, or suspend the affected cards later in Anki.

- The translation engine running on the server has a limited quota. Once it has been exceeded for the day, users will have to specify their own API key, which is then used locally for translations. (TODO)
