Metadata-Version: 2.1
Name: ROCFunctions
Version: 0.1.1
Summary: Receiver Operating Characteristic (ROC) functions package.
Home-page: https://github.com/antononcube/Python-packages/tree/main/ROCFunctions
Author: Anton Antonov
Author-email: antononcube@posteo.net
Keywords: roc,receiver operating characteristic,classifier,classification,ml,machine learning
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: BSD License
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE

# ROCFunctions basic usage

This repository has the code of a Python package for
[Receiver Operating Characteristic (ROC)](https://en.wikipedia.org/wiki/Receiver_operating_characteristic)
functions.

The ROC framework is used for analysis and tuning of binary classifiers, [Wk1].
(The classifiers are assumed to classify into a positive/true label or a negative/false label. )

For computational introduction to ROC utilization (in Mathematica) see the article
["Basic example of using ROC with Linear regression"](https://mathematicaforprediction.wordpress.com/2016/10/12/basic-example-of-using-roc-with-linear-regression/)
,
[AA1].

The examples below use the package
["RandomDataGenerators"](https://pypi.org/project/RandomDataGenerators/),
[AA2].

-------

## Installation

From PyPI.org:

```shell
python3 -m pip install ROCFunctions
```

------

## Usage examples

### Properties

Here are some retrieval functions:


```python
import pandas
from ROCFunctions import *
print(roc_functions("properties"))
```

    ['FunctionInterpretations', 'FunctionNames', 'Functions', 'Methods', 'Properties']



```python
print(roc_functions("FunctionInterpretations"))
```

    {'TPR': 'true positive rate', 'TNR': 'true negative rate', 'SPC': 'specificity', 'PPV': 'positive predictive value', 'NPV': 'negative predictive value', 'FPR': 'false positive rate', 'FDR': 'false discovery rate', 'FNR': 'false negative rate', 'ACC': 'accuracy', 'AUROC': 'area under the ROC curve', 'FOR': 'false omission rate', 'F1': 'F1 score', 'MCC': 'Matthews correlation coefficient', 'Recall': 'same as TPR', 'Precision': 'same as PPV', 'Accuracy': 'same as ACC', 'Sensitivity': 'same as TPR'}



```python
print(roc_functions("FPR"))
```

    <function FPR at 0x7f7612f48050>


### Single ROC record

**Definition:** A
ROC record (ROC-dictionary, or ROC-hash, or ROC-hash-map) is an associative object that has the keys:
"FalseNegative", "FalsePositive", "TrueNegative", "TruePositive".Here is an example:


```python
{"FalseNegative": 50, "FalsePositive": 51, "TrueNegative": 60, "TruePositive": 39}
```




    {'FalseNegative': 50,
     'FalsePositive': 51,
     'TrueNegative': 60,
     'TruePositive': 39}



Here we generate a random "dataset" with columns "Actual" and "Predicted" that have the values
"true" and "false"and show the summary:


```python
from RandomDataGenerators import *

dfRandomLabels = random_data_frame(200, ["Actual", "Predicted"],
                                   generators={"Actual": ["true", "false"],
                                               "Predicted": ["true", "false"]})
dfRandomLabels.shape
```




    (200, 2)



Here is a sample of the dataset:



```python
print(dfRandomLabels[:4])
```

      Actual Predicted
    0  false     false
    1  false     false
    2  false     false
    3   true     false


Here we make the corresponding ROC dictionary:


```python
to_roc_dict('true', 'false',
            list(dfRandomLabels.Actual.values),
            list(dfRandomLabels.Predicted.values))
```




    {'TruePositive': 52,
     'FalsePositive': 48,
     'TrueNegative': 50,
     'FalseNegative': 50}



### Multiple ROC records

Here we make random dataset with entries that associated with a certain threshold parameter with three unique values:


```python
dfRandomLabels2 = random_data_frame(200, ["Threshold", "Actual", "Predicted"],
                                    generators={"Threshold": [0.2, 0.4, 0.6],
                                                "Actual": ["true", "false"],
                                                "Predicted": ["true", "false"]})
```

**Remark:** Threshold parameters are typically used while tuning Machine Learning (ML) classifiers. Here we find and print the ROC records(dictionaries) for each unique threshold value:



```python
thresholds = list(dfRandomLabels2.Threshold.drop_duplicates())

rocGroups = {}
for x in thresholds:
    dfLocal = dfRandomLabels2[dfRandomLabels2["Threshold"] == x]
    rocGroups[x] = to_roc_dict('true', 'false',
                        list(dfLocal.Actual.values),
                        list(dfLocal.Predicted.values))

rocGroups
```




    {0.4: {'TruePositive': 13,
      'FalsePositive': 23,
      'TrueNegative': 24,
      'FalseNegative': 12},
     0.2: {'TruePositive': 18,
      'FalsePositive': 11,
      'TrueNegative': 19,
      'FalseNegative': 18},
     0.6: {'TruePositive': 23,
      'FalsePositive': 9,
      'TrueNegative': 16,
      'FalseNegative': 14}}



### Application of ROC functions

Here we define a list of ROC functions:


```python
funcs = ["PPV", "NPV", "TPR", "ACC", "SPC", "MCC"]
```

Here we apply each ROC function to each of the ROC records obtained above:


```python
import pandas
rocRes = { k : {f: roc_functions(f)(v) for f in funcs} for (k, v) in rocGroups.items()}

print(pandas.DataFrame(rocRes))
```

              0.4       0.2       0.6
    PPV  0.361111  0.620690  0.718750
    NPV  0.666667  0.513514  0.533333
    TPR  0.520000  0.500000  0.621622
    ACC  0.513889  0.560606  0.629032
    SPC  0.510638  0.633333  0.640000
    MCC  0.030640  0.134535  0.261666



-------

## References

### Articles

[Wk1] Wikipedia entry,
["Receiver operating characteristic"](https://en.wikipedia.org/wiki/Receiver_operating_characteristic).

[AA1] Anton Antonov,
["Basic example of using ROC with Linear regression"](https://mathematicaforprediction.wordpress.com/2016/10/12/basic-example-of-using-roc-with-linear-regression/)
,
(2016),
[MathematicaForPrediction at WordPress](https://mathematicaforprediction.wordpress.com).

[AA2] Anton Antonov,
["Introduction to data wrangling with Raku"](https://rakuforprediction.wordpress.com/2021/12/31/introduction-to-data-wrangling-with-raku/)
,
(2021),
[RakuForPrediction at WordPress](https://rakuforprediction.wordpress.com).

### Packages

[AAp1] Anton Antonov,
[ROCFunctions Mathematica package](https://github.com/antononcube/MathematicaForPrediction/blob/master/ROCFunctions.m),
(2016-2022),
[MathematicaForPrediction at GitHub/antononcube](https://github.com/antononcube/MathematicaForPrediction/).

[AAp2] Anton Antonov,
[ROCFunctions R package](https://github.com/antononcube/R-packages/tree/master/ROCFunctions),
(2021),
[R-packages at GitHub/antononcube](https://github.com/antononcube/R-packages).

[AAp3] Anton Antonov,
[ML::ROCFunctions Raku package](https://github.com/antononcube/Raku-ML-ROCFunctions),
(2022),
[GitHub/antononcube](https://github.com/antononcube).
