Metadata-Version: 2.4
Name: sentify
Version: 1.0.1
Summary: Converter from URLs, PDFs and Wikipedia pages to clean text documents with one sentence per line.
Home-page: https://github.com/ptarau/sentify.git
Author: Paul Tarau
Author-email: ptarau@gmail.com
License: MIT
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pdfminer.six
Requires-Dist: smart_open
Requires-Dist: Wikipedia-API
Requires-Dist: twine
Requires-Dist: pysbd
Requires-Dist: python-docx
Requires-Dist: openpyxl
Requires-Dist: pandas
Dynamic: author
Dynamic: author-email
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: requires-dist
Dynamic: summary

### sentify is a simple and fast open source Python toolkit that combines, in one step, the tedious tasks of fetching documents, converting them to text and segmenting them into clean text files with one sentence per line


I put it together because this step is an often unavoidable "stepping stone" on the way to the really interesting NLP and AI tasks we care about these days.

The collected clean sentences are ready for NLP and ML tasks, including passing them to generative AI for summarization, relation extraction and QA.

It handles local and remote txt and pdf files, URLs, and Wikipedia pages given by their title.
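To make the "one sentence per line" target format concrete, here is a minimal sketch. It is not sentify's API (see `main.py` for that): the function name `to_sentence_lines` is hypothetical, and the naive regular-expression splitter is only a stand-in for a real sentence segmenter such as `pysbd`, which is among the listed dependencies.

```python
import re


def to_sentence_lines(text: str) -> str:
    """Return text with whitespace normalized and one sentence per line.

    Hypothetical helper for illustration only; not part of sentify.
    """
    # Collapse line breaks and runs of spaces left over from text extraction.
    flat = " ".join(text.split())
    # Naive rule (an assumption): a sentence ends at '.', '!' or '?'
    # when followed by whitespace and an uppercase ASCII letter.
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", flat)
    return "\n".join(p for p in parts if p)


raw = "Sentences often span\nseveral lines.  They need cleanup! Is that all?"
print(to_sentence_lines(raw))
```

A rule this simple splits wrongly after abbreviations such as "Dr." followed by a name, which is the kind of case a dedicated segmenter is meant to handle.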

See the code at [sentify/main.py](https://github.com/ptarau/sentify/blob/main/sentify/main.py) for the simple, all-in-one API.

Get it from [GitHub](https://github.com/ptarau/sentify) or install it from [PyPI](https://pypi.org/project/sentify/) with

```
pip3 install sentify
```

See `tests/tests.py` for examples that exercise the API on several use cases.

Enjoy,

Paul Tarau

January, 2024
