Metadata-Version: 2.1
Name: bert-deid
Version: 0.2.3
Summary: Remove identifiers from data using BERT
Home-page: https://github.com/alistairewj/bert-deid
Author: Alistair Johnson
Author-email: aewj@mit.edu
License: Apache 2.0
Platform: UNKNOWN
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: nltk (>=3.4.5)
Requires-Dist: mpmath (>=1.1.0)
Requires-Dist: numpy (>=1.19.2)
Requires-Dist: pandas (>=1.1.3)
Requires-Dist: pytest (>=4.2.0)
Requires-Dist: pytorch (>=1.6.0)
Requires-Dist: scikit-learn (>=0.23.2)
Requires-Dist: spacy (>=2.3.2)
Requires-Dist: sympy (>=1.6.2)
Requires-Dist: tqdm (>=4.32.1)
Requires-Dist: regex (>=2020.10.23)
Requires-Dist: transformers (>=3.4.0)
Requires-Dist: tokenizers (>=0.9.2)
Requires-Dist: stanfordnlp (>=0.2.0)
Requires-Dist: google-cloud-storage (>=1.32.0)

# bert-deid

Code to fine-tune BERT on a medical note de-identification task.

## Install

* **(Recommended)** Create a conda environment called `deid`
    * `conda env create -f environment.yml`
<!-- * conda: `conda install bert-deid` -->
* Or install with pip
    * `pip install bert-deid`

## Download

bert-deid provides a helper command to download the model:

```sh
# note: the download command reads the MODEL_DIR environment variable
# by default, we download to bert_deid_model in the current directory
export MODEL_DIR="bert_deid_model"
bert_deid download
```
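
Because the download location is controlled by `MODEL_DIR`, a script that loads the model later may want to resolve the same directory. The following is a minimal sketch under that assumption; `resolve_model_dir` is a hypothetical helper for illustration and is not part of bert-deid.

```python
import os


def resolve_model_dir(default='bert_deid_model'):
    """Return the model directory, preferring the MODEL_DIR environment variable.

    Hypothetical helper: falls back to the default directory name used in
    the download example when MODEL_DIR is not set.
    """
    return os.environ.get('MODEL_DIR', default)


model_path = resolve_model_dir()
print(f'Model directory: {model_path}')
```

The resulting `model_path` can then be passed to whatever code loads the model.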

## Usage

The model can be imported and used directly within Python.

```python
from bert_deid.model import Transformer

# load in a trained model
model_path = 'bert_deid_model'
deid_model = Transformer(model_path)

with open('tests/example_note.txt', 'r') as fp:
    text = fp.read()

print(deid_model.apply(text, repl='___'))

# we can also get the original predictions
preds = deid_model.predict(text)

# print out the identified entities
for pred in preds:
    # each prediction holds a probability, a label, and the span offsets
    prob, label, start, stop = pred

    # print the prediction labels out
    print(f'{text[start:stop]:15s} {label} ({prob:0.3f})')
```
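
The predictions can also be post-processed by hand, for example to mask only entities above a confidence threshold. The sketch below assumes each prediction is a `(prob, label, start, stop)` tuple with character offsets into the text, as the loop above suggests, and that spans do not overlap. `redact_spans` and its `min_prob` parameter are hypothetical names used for illustration, not part of bert-deid, and the predictions are made-up values rather than model output.

```python
def redact_spans(text, preds, repl='___', min_prob=0.0):
    """Replace each predicted span in `text` with `repl`.

    Assumes each prediction is (prob, label, start, stop) and that spans
    do not overlap. Predictions with prob below `min_prob` are skipped.
    """
    kept = [p for p in preds if p[0] >= min_prob]
    # replace from right to left so earlier offsets remain valid
    for prob, label, start, stop in sorted(kept, key=lambda p: p[2], reverse=True):
        text = text[:start] + repl + text[stop:]
    return text


# made-up example text and predictions, for illustration only
text = 'Seen by Dr. Smith on 2020-01-05.'
preds = [(0.98, 'NAME', 12, 17), (0.40, 'DATE', 21, 31)]

print(redact_spans(text, preds))                # masks both spans
print(redact_spans(text, preds, min_prob=0.5))  # masks only the NAME span
```

Raising `min_prob` trades recall for precision, which may not be appropriate for de-identification, where a missed identifier is usually the costlier error.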

