Metadata-Version: 2.1
Name: calamanCy
Version: 0.1.0
Summary: NLP Pipelines for Tagalog
Author-email: "Lj V. Miranda" <ljvmiranda@gmail.com>
License: MIT License
Project-URL: Repository, https://github.com/ljvmiranda921/calamanCy
Project-URL: Bug Tracker, https://github.com/ljvmiranda921/calamanCy/issues
Project-URL: Release Notes, https://github.com/ljvmiranda921/calamanCy/releases
Keywords: nlp,natural language processing,language technology,tagalog
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: spacy (>=3.5.0)
Requires-Dist: wasabi (>=0.9.1)
Requires-Dist: typer (>=0.4.2)
Provides-Extra: dev
Requires-Dist: black (>=23.1.0) ; extra == 'dev'
Requires-Dist: isort (>=5.0.0) ; extra == 'dev'
Requires-Dist: ruff (>=0.0.272) ; extra == 'dev'
Requires-Dist: mypy (>=1.0.0) ; extra == 'dev'
Requires-Dist: pytest ; extra == 'dev'

<img src="https://raw.githubusercontent.com/ljvmiranda921/calamanCy/master/logo.png" width="125" height="125" align="right" />

# calamanCy: NLP pipelines for Tagalog

**calamanCy** is a Tagalog natural language preprocessing framework made with
[spaCy](https://spacy.io). Its goal is to provide pipelines and datasets for
downstream NLP tasks. This repository contains material for using calamanCy,
reproduction of results, and guides on usage. 

> calamanCy takes inspiration from other language-specific [spaCy Universe frameworks](https://spacy.io/universe) such as 
> [DaCy](https://github.com/centre-for-humanities-computing/DaCy), [huSpaCy](https://github.com/huspacy/huspacy),
> and [graCy](https://github.com/jmyerston/graCy). The name is based from [*calamansi*](https://en.wikipedia.org/wiki/Calamansi),
> a citrus fruit native to the Philippines and used in traditional Filipino cuisine.

## 🔧 Installation

To get started with calamanCy, simply install it using `pip` by running the
following line in your terminal:

```sh
pip install calamanCy
``` 

### Development

If you are developing calamanCy, first clone the repository:

```sh
git clone git@github.com:ljvmiranda921/calamanCy.git
```

Then, create a virtual environment and install the dependencies:

```sh
python -m venv venv
venv/bin/pip install -e .  # requires pip>=23.0
venv/bin/pip install .[dev]

# Activate the virtual environment
source venv/bin/activate
```

or alternatively, use `make dev`.


## 👩‍💻 Usage

To use calamanCy you first have to download either the medium, large, or
transformer model. To see a list of all available models, run:

```python
import calamancy
from model in calamancy.models():
    print(model)

# ..
# tl_calamancy_md-0.1.0
# tl_calamancy_lg-0.1.0
# tl_calamancy_trf-0.1.0
```

To download and load a model, run:

```python
nlp = calamancy.load("tl_calamancy_md-0.1.0")
doc = nlp("Ako si Juan de la Cruz")
```

The `nlp` object is an instance of spaCy's [`Language`
class](https://spacy.io/api/language) and you can use it as any other spaCy
pipeline. 

## 📦 Models and Datasets

calamanCy provides Tagalog models and datasets that you can use in your spaCy
pipelines. You can download them directly or use the `calamancy` Python library
to access them. The training procedure for each pipeline can be found in the
`models/` directory. They are further subdivided into versions. Each folder is
an instance of a [spaCy project](https://spacy.io/usage/projects).
