Metadata-Version: 2.4
Name: lethe-cli
Version: 0.2.0
Summary: Data anonymization CLI tool
Author: Marco Kotrotsos
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Requires-Dist: faker<35,>=25.0
Requires-Dist: pandas<3,>=2.0
Requires-Dist: presidio-analyzer<3,>=2.2
Requires-Dist: presidio-anonymizer<3,>=2.2
Requires-Dist: rich<14,>=13.0
Requires-Dist: spacy<4,>=3.7
Requires-Dist: typer[all]<1,>=0.12
Provides-Extra: dev
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Provides-Extra: trf
Requires-Dist: spacy[transformers]; extra == 'trf'
Description-Content-Type: text/markdown

# Lethe

Pseudo-anonymization CLI for tabular files, plain text, and SQL dumps. Lethe detects PII in CSV, TSV, plain-text, and SQL dump files using Presidio and spaCy NER, then replaces it with Faker-generated values that stay consistent across your dataset.

Lethe performs **pseudo-anonymization**: PII is replaced with realistic fake values, preserving data structure and relationships. This differs from true anonymization, which irreversibly removes personal data. See the [architecture docs](docs/architecture.md#anonymization-vs-pseudo-anonymization) for the full distinction and its GDPR implications.

## Install

```bash
pip install lethe-cli
python -m spacy download en_core_web_trf
```

To use a faster, lighter model instead of the transformer, download the small model and select it with `--model sm`:

```bash
python -m spacy download en_core_web_sm
```

## Usage

### Anonymize

Replace detected PII with consistent fake values:

```bash
lethe anonymize data.csv -o anonymized.csv
lethe anonymize data.csv --model sm --threshold 0.7
lethe anonymize notes.txt -o clean.txt --locale nl_NL
```
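
To illustrate what "consistent" means here, the following is a minimal sketch, not Lethe's actual implementation. It assumes a simple cache keyed by entity type and original value, so the same input always receives the same replacement. The `ConsistentReplacer` class and the placeholder generators are hypothetical; a real run would plug in Faker providers instead. The entity labels `PERSON` and `EMAIL_ADDRESS` follow Presidio's naming.

```python
from typing import Callable, Dict, Tuple


class ConsistentReplacer:
    """Map each (entity type, original value) pair to exactly one fake value."""

    def __init__(self, generators: Dict[str, Callable[[int], str]]) -> None:
        self._generators = generators
        self._cache: Dict[Tuple[str, str], str] = {}

    def replace(self, entity_type: str, original: str) -> str:
        key = (entity_type, original)
        if key not in self._cache:
            # Use the number of cached entries as an index so that
            # newly generated values stay distinct from earlier ones.
            index = len(self._cache)
            self._cache[key] = self._generators[entity_type](index)
        return self._cache[key]


# Placeholder generators (assumption): stand-ins for Faker-backed providers.
replacer = ConsistentReplacer({
    "PERSON": lambda i: f"Person {i}",
    "EMAIL_ADDRESS": lambda i: f"user{i}@example.com",
})

rows = [
    {"name": "Ada Lovelace", "email": "ada@example.org"},
    {"name": "Alan Turing", "email": "alan@example.org"},
    {"name": "Ada Lovelace", "email": "ada@example.org"},  # repeat of row 0
]

anonymized = [
    {
        "name": replacer.replace("PERSON", row["name"]),
        "email": replacer.replace("EMAIL_ADDRESS", row["email"]),
    }
    for row in rows
]

for row in anonymized:
    print(row)
```

The first and third rows come out identical because their original values match, which is what keeps joins and group-bys meaningful after replacement.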

### Multiply

Generate synthetic rows from an existing dataset:

```bash
lethe multiply data.csv --factor 5 -o expanded.csv
lethe multiply data.csv --factor 10 --sanitize --seed 42
```

## Options

Run `lethe anonymize --help` or `lethe multiply --help` for the full list of options.

## License

MIT
