Metadata-Version: 2.1
Name: NLPurify
Version: 2.0.0a0
Summary: Text cleaning and feature extraction using NLP and traditional approaches.
Home-page: https://github.com/sharkutilities/NLPurify
Author: shark-utilities developers
Author-email: neuralNOD@outlook.com
Project-URL: Issue Tracker, https://github.com/sharkutilities/NLPurify/issues
Project-URL: Org. Homepage, https://github.com/sharkutilities
Keywords: nlp,text-cleaning,nlp-cleaning,llm,utility,utilities,util,utils,functions,wrappers,data science,data analysis,data scientist,data analyst
Classifier: Development Status :: 1 - Planning
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: Unix
Classifier: Operating System :: POSIX
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE

<h1 align = "center">
  <img alt = "favicon" src = "https://cdn-icons-png.flaticon.com/512/10306/10306116.png" height = 125px><br>
  NLPurify
</h1>

<div align = "center">

[![Documentation Status](https://readthedocs.org/projects/nlpurify/badge/?version=latest&style=plastic)](https://nlpurify.readthedocs.io/en/latest/?badge=latest)
[![GitHub Issues](https://img.shields.io/github/issues/sharkutilities/NLPurify?style=plastic)](https://github.com/sharkutilities/NLPurify/issues)
[![GitHub Forks](https://img.shields.io/github/forks/sharkutilities/NLPurify?style=plastic)](https://github.com/sharkutilities/NLPurify/network)
[![GitHub Stars](https://img.shields.io/github/stars/sharkutilities/NLPurify?style=plastic)](https://github.com/sharkutilities/NLPurify/stargazers)
[![LICENSE File](https://img.shields.io/github/license/sharkutilities/NLPurify?style=plastic)](https://github.com/sharkutilities/NLPurify/blob/master/LICENSE)
[![PyPI - Downloads](https://img.shields.io/pypi/dm/NLPurify?style=plastic)](https://pypistats.org/packages/nlpurify)
[![PyPI Latest Release](https://img.shields.io/pypi/v/NLPurify.svg?style=plastic)](https://pypi.org/project/NLPurify/)

[![GuardRails badge](https://api.guardrails.io/v2/badges/252951?token=2e1d82f6a737cdd3151ea0c869ee61c86196c3a05d17b0d91bf5a032e7766dc0)](https://dashboard.guardrails.io/gh/sharkutilities/repos/252951)

</div>

<div align = "justify">

NLPurify is a text cleaning and feature extraction engine. It combines traditional techniques, such as Unicode translation
and cleaning with regular expressions, with modern tools, such as natural language processing and large language models, to
detect and clean long texts and create word vectors.
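
The traditional techniques mentioned above can be illustrated with a minimal sketch that uses only the Python standard
library. It is not NLPurify's API: the function name `clean_text` and the translation table are hypothetical and chosen
only for illustration.

```python
import re
import unicodedata

# Hypothetical translation table: typographic characters mapped to ASCII.
# This is an illustrative assumption, not a table shipped with NLPurify.
_TRANSLATION = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
})

# Any run of whitespace, including newlines and tabs.
_WHITESPACE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Normalize Unicode, translate typographic characters, collapse whitespace."""
    # NFKC folds compatibility characters (e.g. a no-break space) into plain forms.
    text = unicodedata.normalize("NFKC", text)
    # Unicode translation: replace characters according to the table above.
    text = text.translate(_TRANSLATION)
    # Regular-expression cleaning: squeeze whitespace runs into a single space.
    return _WHITESPACE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_text("\u201cHello\u201d\u00a0 world \u2013  it\u2019s   fine\n"))
```

The package itself may organize these steps differently; refer to the documentation for the actual functions.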

## Getting Started

The source code is hosted on GitHub: [**sharkutilities/NLPurify**](https://github.com/sharkutilities/NLPurify).
The binary installers for the latest release are available at the [Python Package Index (PyPI)](https://pypi.org/project/NLPurify/).

```bash
pip install -U NLPurify
```

The module is currently under development, and new ideas are welcome. Open a pull request or an issue to propose one.
The changes in each release are listed in the [changelog](./CHANGELOG.md).

</div>

---

> [!CAUTION]
> **This package deprecates the earlier GitHub Gist.**
> Check [`#1`](https://github.com/sharkutilities/NLPurify/issues/1) for more details.

> [!NOTE]
> **_Legacy_ code is available as a submodule.**
> Check [`#5`](https://github.com/sharkutilities/NLPurify/issues/5) for more details.
