Metadata-Version: 2.4
Name: datapreprocessorbv
Version: 0.3.0
Summary: A simple and powerful data preprocessing library for cleaning datasets
Home-page: https://github.com/yourusername/datacleaner
Author: Bharathan
Author-email: bharathantrk2001@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: pandas>=1.5
Requires-Dist: numpy>=1.21
Requires-Dist: scikit-learn>=1.2
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# datapreprocessorbv: Advanced Data Cleaning Library

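A config-driven data preprocessing library covering validation, null handling, outlier detection (z-score or isolation forest), scaling, encoding, and dtype conversion. The distribution is named `datapreprocessorbv`; the import package is `datacleaner`.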

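## Install

From a local clone of the repository, install the dependencies and then the package in editable mode:
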
```bash
pip install -r requirements.txt
pip install -e .
```

## Quick Start

```python
import pandas as pd
from datacleaner import validate, replace_nulls, zscore, standard_scaler, encode, dtypeconversion

df = pd.read_csv("data.csv")

config = {
    "fixes":   {"Price": {"method": "clip", "min": 0, "max": 10000}},
    "missing": {"Price": "median", "Category": "mode"},
    "zscore":  {"Price": {"threshold": 2.5, "action": "cap"}},
    "scaling": {"Price": "standard"},
    "encoding":{"Category": "onehot"},
}

df = validate(df, config)
df = replace_nulls(df, config)
df = zscore(df, config)
df = standard_scaler(df, config)
df = encode(df, config)
df = dtypeconversion(df)
```
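To make the Quick Start config concrete, the sketch below re-implements in plain Python what each `Price` and `Category` entry is expected to do, applied to hypothetical lists of values. It is not the `datacleaner` implementation: the behaviour of `"clip"`, `"cap"`, and `"onehot"` is assumed from their names, and the helpers (`clip`, `fill_median`, `fill_mode`, `zscore_cap`, `standardize`, `one_hot`) are illustrative names only.

```python
import statistics

# Hypothetical helpers mirroring the Quick Start config entries.
# They are NOT part of datacleaner's API.

def clip(values, lo, hi):
    # "fixes": {"method": "clip", ...} assumed to bound values to [lo, hi];
    # missing values (None) pass through to the imputation step.
    return [None if v is None else min(max(v, lo), hi) for v in values]

def fill_median(values):
    # "missing": "median" replaces None with the median of observed values.
    med = statistics.median([v for v in values if v is not None])
    return [med if v is None else v for v in values]

def fill_mode(values):
    # "missing": "mode" replaces None with the most frequent observed value.
    mode = statistics.mode([v for v in values if v is not None])
    return [mode if v is None else v for v in values]

def zscore_cap(values, threshold):
    # "zscore" with "action": "cap" assumed to clip values to
    # mean ± threshold * std (population std).
    mu, sd = statistics.mean(values), statistics.pstdev(values)
    lo, hi = mu - threshold * sd, mu + threshold * sd
    return [min(max(v, lo), hi) for v in values]

def standardize(values):
    # "scaling": "standard" rescales to mean 0 and std 1, dividing by the
    # population std as scikit-learn's StandardScaler does.
    mu, sd = statistics.mean(values), statistics.pstdev(values)
    return [(v - mu) / sd for v in values]

def one_hot(values, prefix):
    # "encoding": "onehot" creates one 0/1 indicator column per category.
    return {
        f"{prefix}_{c}": [int(v == c) for v in values]
        for c in sorted(set(values))
    }

# Hypothetical data; None marks a missing value.
price = [120.0, -5.0, None, 135.0, 128.0, 50000.0, 131.0, 126.0, None, 124.0]
category = ["a", "b", None, "a", "a", "c", "b", "a", None, "a"]

price = clip(price, 0, 10000)             # -5.0 -> 0, 50000.0 -> 10000
price = fill_median(price)                # None -> 127.0
price = zscore_cap(price, threshold=2.5)  # 10000 is pulled in to mean + 2.5 * std
price = standardize(price)

category = fill_mode(category)            # None -> "a"
encoded = one_hot(category, "Category")   # keys: Category_a, Category_b, Category_c
```

Imputation runs before the z-score step here because the mean and standard deviation cannot be computed while missing values remain.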

## Print usage hints

```python
from datacleaner import structure, functionName
structure()      # prints a full config template
functionName()   # prints the recommended import block
```

## Run tests

```bash
pytest tests/ -v
```

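## Pipeline order

Apply the steps in this order. The slash marks alternatives for outlier handling: use either `zscore` or `isolation_forest`.

```text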
validate → replace_nulls → zscore / isolation_forest → standard_scaler → encode → dtypeconversion
```
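Since every step in the Quick Start except `dtypeconversion` takes `(df, config)`, the order above can be kept in a single list and applied with a fold. `run_pipeline` below is a hypothetical helper, not part of `datacleaner`; the toy steps stand in for the real functions so the sketch runs on its own.

```python
from functools import reduce
from typing import Any, Callable, Dict, Sequence

# A step takes (data, config) and returns the transformed data.
Step = Callable[[Any, Dict[str, Any]], Any]

def run_pipeline(data: Any, config: Dict[str, Any], steps: Sequence[Step]) -> Any:
    """Apply each step in order, feeding one step's output into the next."""
    return reduce(lambda acc, step: step(acc, config), steps, data)

# With datacleaner installed, the Quick Start chain could be written as below.
# dtypeconversion takes no config, so it is wrapped to match the Step shape:
#   steps = [validate, replace_nulls, zscore, standard_scaler, encode,
#            lambda df, _config: dtypeconversion(df)]
#   df = run_pipeline(df, config, steps)

# Self-contained demo: toy steps that record the order in which they ran.
def make_step(name: str) -> Step:
    return lambda log, _config: log + [name]

order = ["validate", "replace_nulls", "zscore", "standard_scaler",
         "encode", "dtypeconversion"]
log = run_pipeline([], {}, [make_step(n) for n in order])
print(" -> ".join(log))
```

Keeping the order in one list makes it harder to accidentally, say, scale before imputing.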

## License

MIT
