Metadata-Version: 2.4
Name: auto-ehrmonize-flowsheet
Version: 0.1.0
Summary: Clinical flowsheet label harmonization using sentence embeddings and vector search.
License: MIT
Keywords: ehr,clinical,harmonization,flowsheet,embeddings,healthcare
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Healthcare Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: chromadb
Requires-Dist: h5py
Requires-Dist: torch
Requires-Dist: transformers
Requires-Dist: sentence-transformers
Requires-Dist: huggingface-hub
Requires-Dist: python-dotenv
Requires-Dist: scikit-learn
Dynamic: license-file

# auto-ehrmonize-flowsheet

Clinical flowsheet label harmonization using sentence embeddings and vector search.
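Conceptually, matching a label means embedding it and finding the reference labels whose vectors are most similar. Here is a minimal, stdlib-only sketch of that idea using cosine similarity. It is not this package's implementation, which stores its reference embeddings in ChromaDB. The vectors are hypothetical toy values standing in for sentence embeddings, and treating `top_k` and `threshold` as equivalents of the CLI's `--top-k` and `--threshold` options is an assumption.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_labels(
    query: list[float],
    reference: dict[str, list[float]],
    top_k: int = 5,          # example value from the CLI snippet, not a known default
    threshold: float = 0.6,  # assumed to be a cosine-similarity cutoff
) -> list[tuple[str, float]]:
    """Return up to top_k (label, score) pairs with score >= threshold, best first."""
    scored = [(label, cosine(query, vec)) for label, vec in reference.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Hypothetical 3-d vectors standing in for real sentence embeddings.
reference = {
    "Heart Rate": [0.9, 0.1, 0.0],
    "Pulse": [0.8, 0.2, 0.1],
    "Temperature": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of a raw label such as "HR"
print(nearest_labels(query, reference, top_k=2, threshold=0.6))
```

A real vector database replaces the linear scan with an index, but the ranking and cutoff logic is the same idea.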

## Install

```bash
pip install auto-ehrmonize-flowsheet
```

## One-time setup

Download the vector database (a ChromaDB store plus HDF5 value embeddings) from the
Hugging Face Hub into `~/.auto_ehrmonize_flowsheet/`:

```bash
ehrmonize setup --token <HF_TOKEN> --repo-id <user/repo>
```

Alternatively, set the `HUGGING_FACE_KEY` and `HF_REPO_ID` environment variables
(or define them in a `.env` file in your working directory) and run `ehrmonize setup`
without arguments.

## Quick start (Python)

```python
from auto_ehrmonize_flowsheet import AutoEHRmonizeFlowsheet

# Point the harmonizer at your flowsheet data CSV
harmonizer = AutoEHRmonizeFlowsheet("data/test_data.csv")
# Harmonize a single raw label
results = harmonizer.harmonize("Heart Rate")
print(results)
```

## CLI

```bash
# Look up a single label
ehrmonize lookup "Heart Rate" --data data/test_data.csv

# With options
ehrmonize lookup "Heart Rate" --data data/test_data.csv --source mimic --threshold 0.6 --top-k 5

# List unique labels in your dataset
ehrmonize labels --data data/test_data.csv
```
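For a rough idea of what `ehrmonize labels` reports, here is a hypothetical sketch that lists the distinct labels in a CSV. It is not the CLI's implementation. The expected CSV schema is not documented here, so the `label` column name is a placeholder assumption; adjust it to your data.

```python
import csv
import io
from typing import TextIO


def unique_labels(handle: TextIO, column: str = "label") -> list[str]:
    """Return distinct non-empty values of `column` in first-seen order.

    The default column name "label" is a placeholder assumption.
    """
    seen: dict[str, None] = {}  # dict preserves insertion order
    for row in csv.DictReader(handle):
        value = (row.get(column) or "").strip()
        if value:
            seen.setdefault(value, None)
    return list(seen)


# In-memory sample data (hypothetical) standing in for a real CSV file.
sample = io.StringIO("label,value\nHeart Rate,72\nPulse,70\nHeart Rate,75\n")
print(unique_labels(sample))  # ['Heart Rate', 'Pulse']
```

With a real file, open it with `newline=""` as the `csv` module recommends, for example `with open("data/test_data.csv", newline="") as f: unique_labels(f)`.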

## License

MIT
