Metadata-Version: 2.1
Name: LexiLang
Version: 1.0.4
Summary: Simple, fast dictionary-based language detector
Home-page: https://github.com/LibreTranslate/LexiLang
Author: Piero Toffanin
Author-email: pt@masseranolabs.com
Maintainer: Piero Toffanin
Maintainer-email: pt@masseranolabs.com
License: AGPLv3
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Intended Audience :: Developers
Classifier: Topic :: Text Processing :: Linguistic
Description-Content-Type: text/markdown
License-File: LICENSE

# LexiLang

Simple, fast dictionary-based language detector for short texts.

## Installation

```bash
pip install lexilang
```

## Usage

```python
from lexilang.detector import detect

print(detect("bonjour")) # ('fr', 0.45)
print(detect("学中文")) # ('zh', 0.45)
print(detect("ciao mondo")) # ('it', 0.9)
print(detect("El gato doméstico")) # ('es', 0.45)

# Optionally, specify a subset of languages to consider
print(detect("ciao", languages=["de", "ro"])) # ('de', 0.45)
```

`detect(text, languages=[])` returns a tuple (`iso_639_1`, `confidence`): the ISO 639-1 code of the detected language and a confidence score for that guess.
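
Because `detect` returns a confidence score next to the language code, callers may want to ignore low-confidence guesses. The following is a minimal sketch of such a wrapper, not part of LexiLang itself: the name `detect_or_none`, the `detector` parameter, and the `0.5` default threshold are all assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple


def detect_or_none(
    text: str,
    detector: Callable[..., Tuple[str, float]],
    languages: Sequence[str] = (),
    min_confidence: float = 0.5,  # hypothetical threshold, tune for your data
) -> Optional[str]:
    """Return the detected language code, or None if confidence is too low.

    `detector` is any callable with the same shape as `detect`:
    it takes `text` and a `languages` keyword and returns
    a `(iso_639_1, confidence)` tuple.
    """
    lang, confidence = detector(text, languages=list(languages))
    return lang if confidence >= min_confidence else None
```

With LexiLang installed, this could be called as `detect_or_none("ciao mondo", detect)` after `from lexilang.detector import detect`.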

## Supported Languages

 * Afrikaans
 * Albanian
 * Arabic
 * Basque
 * Bengali
 * Bulgarian
 * Catalan
 * Chinese
 * Czech
 * Danish
 * Dutch
 * English
 * Esperanto
 * Estonian
 * Finnish
 * French
 * German
 * Greek
 * Hebrew
 * Hindi
 * Hungarian
 * Indonesian
 * Italian
 * Japanese
 * Kabyle
 * Kazakh
 * Korean
 * Latvian
 * Lithuanian
 * Macedonian
 * Norwegian
 * Occitan
 * Polish
 * Portuguese
 * Romanian
 * Russian
 * Serbian
 * Slovak
 * Slovenian
 * Spanish
 * Swedish
 * Thai
 * Turkish
 * Ukrainian
 * Vietnamese
 * Farsi

## Limitations

This detector is designed for short texts (fewer than 20 characters) and will probably not work reliably on longer ones. Because it relies on dictionaries, detection fails when a word is missing from the dictionary or misspelled.
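
To illustrate why a missing or misspelled word defeats a dictionary lookup, here is a toy sketch. It is not LexiLang's actual implementation: the `TOY_DICTIONARIES` data and the `toy_detect` function are made up for this example, and the real dictionaries and scoring may work differently.

```python
from typing import Optional

# Hypothetical, tiny word lists used only for this illustration.
TOY_DICTIONARIES = {
    "fr": {"bonjour", "monde"},
    "it": {"ciao", "mondo"},
}


def toy_detect(text: str) -> Optional[str]:
    """Pick the language whose word list matches the most words in `text`."""
    words = text.lower().split()
    hits = {
        lang: sum(word in vocab for word in words)
        for lang, vocab in TOY_DICTIONARIES.items()
    }
    best = max(hits, key=hits.get)
    # With zero matches there is nothing to base a guess on.
    return best if hits[best] > 0 else None


print(toy_detect("ciao mondo"))  # it
print(toy_detect("bonjuor"))     # None (misspelled, so no dictionary entry matches)
```

An exact-match lookup has no notion of "close enough", so a single transposed letter is treated the same as an unknown word.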

## Contributing

If you want to add a new language, or improve an existing one, add more words to the respective dictionary in the `dictionaries` folder.

## License

AGPLv3
