Metadata-Version: 2.1
Name: TakeSentenceTokenizer
Version: 0.0.1
Summary: TakeSentenceTokenizer is a tool for tokenizing and preprocessing messages
Home-page: UNKNOWN
Author: Karina Tiemi Kato
Author-email: karinat@take.net
License: UNKNOWN
Keywords: Tokenization
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
Requires-Dist: emoji (==0.5.1)

# TakeSentenceTokenizer

TakeSentenceTokenizer is a tool for preprocessing and tokenizing sentences.
The package is used to:

- convert the first letter of the sentence to lower case
- replace words with placeholders: laugh, date, time, ddd, measures (10kg, 20m, 5gb, etc.), code, phone number, cnpj, cpf, email, money, url, number (ordinal and cardinal)
- remove emoji
- tokenize the sentence
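
The steps listed above can be illustrated with a minimal, standard-library-only sketch. This is not the package's implementation or API: the function name `preprocess`, the placeholder names (`EMAIL`, `URL`, `NUMBER`), the regular expressions, and the emoji ranges are all assumptions chosen for illustration, and only three of the placeholder types are covered.

```python
import re

# Hypothetical placeholder patterns, applied in order. The package's real
# patterns and placeholder names may differ.
PLACEHOLDERS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("URL", re.compile(r"\b(?:https?://|www\.)\S+")),
    ("NUMBER", re.compile(r"\b\d+(?:[.,]\d+)*\b")),
]

# Rough emoji code point ranges (an assumption, not exhaustive).
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def preprocess(sentence):
    """Illustrative pipeline: lowercase first letter, placeholders, emoji, tokens."""
    # Lowercase only the first letter of the sentence.
    text = sentence[:1].lower() + sentence[1:]
    # Replace each matched span with its placeholder name.
    for name, pattern in PLACEHOLDERS:
        text = pattern.sub(name, text)
    # Remove emoji, leaving a space so neighboring words stay separate.
    text = EMOJI.sub(" ", text)
    # Tokenize into runs of word characters and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


print(preprocess("Send 2 emails to ana@example.com 😀"))
```

Applying the e-mail and URL patterns before the number pattern keeps digits inside addresses from being replaced separately.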

## Installation

Use the package manager [pip](https://pip.pypa.io/en/stable/) to install TakeSentenceTokenizer:

```bash
pip install TakeSentenceTokenizer
```

## Usage

```python
import SentenceTokenizer as st

```

## Author
Karina Tiemi Kato

## License
[MIT](https://choosealicense.com/licenses/mit/)

