Metadata-Version: 2.1
Name: UzbekLemma
Version: 1.2
Summary: Finds the lemma of Uzbek words
Home-page: https://github.com/ddasturbek/UzbekLemma
Author-email: sobirovogabek0409@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown

# Authors

**Author1: [Maksud Sharipov](https://github.com/MaksudSharipov)**

**Author2: [Dasturbek](https://github.com/ddasturbek)**

# Lemma & Lemmatization
This package finds the lemmas of Uzbek words by looking them up in a dictionary.

The process of finding a word's lemma (its dictionary form) is called lemmatization.

There are four main approaches to lemmatization: rule-based, dictionary-based, model-based, and hybrid.

UzbekLemma implements the dictionary-based approach.
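
To illustrate the general idea of dictionary-based lemmatization, here is a minimal sketch. It is not the package's actual algorithm (see the flowchart for that). The toy dictionary `TOY_DICTIONARY`, the function `lookup_lemma`, and the longest-prefix matching strategy are all assumptions made for illustration only.

```python
from typing import Dict, Optional

# Hypothetical toy dictionary mapping known stems to their lemmas.
# The real package ships its own, much larger dictionary.
TOY_DICTIONARY: Dict[str, str] = {
    "kel": "kelmoq",
    "kitob": "kitob",
}


def lookup_lemma(word: str, dictionary: Dict[str, str]) -> Optional[str]:
    """Return the lemma of the longest prefix of `word` found in `dictionary`.

    Returns None when no prefix of the word is a known stem.
    """
    word = word.lower()
    # Try the whole word first, then drop one trailing character at a time,
    # so suffixes such as "-ganlar" or "-lar" are stripped implicitly.
    for end in range(len(word), 0, -1):
        candidate = word[:end]
        if candidate in dictionary:
            return dictionary[candidate]
    return None


print(lookup_lemma("kelganlar", TOY_DICTIONARY))
print(lookup_lemma("kitoblar", TOY_DICTIONARY))
```

A plain prefix lookup like this can overmatch short stems, which is one reason practical systems combine the dictionary with additional checks.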

# Install & Clone

```bash
pip install UzbekLemma
```

```bash
git clone https://github.com/ddasturbek/UzbekLemma.git
```

# Usage

```Python
import UzbekLemma as UL

print(UL.lemmatize("kelganlar"))  # kelmoq
```
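
The example above lemmatizes a single word. For whole sentences, a small wrapper along the following lines may be useful. This is a sketch only: `lemmatize_text` is a hypothetical helper, not part of the package, and the regular-expression tokenizer is a deliberately simple assumption. The lemmatizer is passed in as a parameter, so the snippet runs without the package installed.

```python
import re
from typing import Callable, List

# Simple tokenizer (an assumption, not the package's own): runs of letters,
# optionally joined by an apostrophe, as in "o'zbek".
WORD_PATTERN = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def lemmatize_text(text: str, lemmatize: Callable[[str], str]) -> List[str]:
    """Split `text` into word tokens and apply `lemmatize` to each one."""
    tokens = WORD_PATTERN.findall(text)
    return [lemmatize(token) for token in tokens]


if __name__ == "__main__":
    # Stand-in lemmatizer used only to keep this demo self-contained.
    demo_lemmas = {"kelganlar": "kelmoq"}
    print(lemmatize_text("kelganlar, kelganlar!", lambda w: demo_lemmas.get(w, w)))
```

With the package installed, the call would presumably be `lemmatize_text(text, UL.lemmatize)`, assuming `UL.lemmatize` accepts one word and returns a string as in the example above.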


# The algorithm flowchart

<img alt="Flowchart algorithm" src="https://github.com/user-attachments/assets/6504ee82-e98f-46ac-9b09-6dd811809be0"/>

# The dictionary structure

<img alt="soz_turkumlari" src="https://github.com/ddasturbek/UzbekLemma/assets/76460501/f9d9b0bd-6549-48cc-91d5-b10b208681b7"/>

# Scientific field

<img alt="Certificate" src="https://github.com/user-attachments/assets/16da0619-5d75-4d46-99e5-a4b3b828e7d7"/>

# Patent

<img alt="image" src="https://github.com/user-attachments/assets/2293c61b-b200-4a46-8433-59f7bd8928b5"/>

# Some results of the program

<img alt="image" src="https://github.com/ddasturbek/UzbekLemma/assets/76460501/2f9455a0-ebff-4677-b947-3cbfbd46bdf4"/>

# Corpus & Results
We collected an equal number of texts from 23 different fields and stored them as a [corpus](https://github.com/ddasturbek/UzbekLemma/tree/main/Corpus).

We ran the program on every file in the corpus; the output is available in the [results](https://github.com/ddasturbek/UzbekLemma/tree/main/Results).
