Metadata-Version: 2.1
Name: EthicML
Version: 0.1.0a2
Summary: A toolkit for understanding and researching algorithmic bias
Home-page: https://github.com/predictive-analytics-lab/EthicML
Author: Predictive Analytics Lab - University of Sussex
Author-email: olliethomas86@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: imageio (>=2.4.1)
Requires-Dist: matplotlib (>=3.0.2)
Requires-Dist: numpy (>=1.14.2)
Requires-Dist: pandas (>=0.24.0)
Requires-Dist: scikit-learn (>=0.20.1)
Requires-Dist: seaborn (>=0.9.0)
Requires-Dist: pyarrow (>=0.11)
Requires-Dist: numba
Requires-Dist: fairlearn (>=0.2.0)
Requires-Dist: GitPython (>=2.1.11)
Requires-Dist: tqdm (>=4.31.1)
Requires-Dist: pipenv (>=2018.11.26)
Requires-Dist: dataclasses ; python_version < "3.7"
Provides-Extra: dev
Requires-Dist: pylint (>=2.0) ; extra == 'dev'
Requires-Dist: pytest (>=3.3.2) ; extra == 'dev'
Requires-Dist: pytest-cov (>=2.6.0) ; extra == 'dev'
Requires-Dist: mypy (>=0.710) ; extra == 'dev'
Requires-Dist: black ; extra == 'dev'

# **README**

EthicML exists to combat the problems we've found with off-the-shelf fairness comparison packages.

These other packages are useful, but because we primarily do research, a lot of our work doesn't fit into a neat box. For example, we might want to apply a 'fair' pre-processing method to the data before training a classifier on it. We may still be experimenting and only want part of the framework to execute, or we may want to do hyper-parameter optimization. Whilst other frameworks can be modified to do these tasks, you end up with hacked-together approaches that don't lend themselves to being built on in the future. Because of this, we're drawing a line in the sand with some of the other frameworks we've used and building our own.

**Why not use XXX?**

There is an increasing number of other options, such as IBM's AI Fairness 360, Aequitas, EthicalML/XAI and Fairness-Comparison, among others. They're all great at what they do; they're just not right for us. We will, however, be influenced by them.

# **Design Principles**

## The Triplet

Given that we're considering fairness, the base of the toolbox is the triplet {X, S, Y}:

- X - Features
- S - Sensitive Label
- Y - Class Label

All methods must assume S and Y are multi-class.
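To illustrate what the multi-class rule means in practice, here is a minimal sketch that tabulates P(Y = c | S = g) without hard-coding binary groups or labels: the groups and classes are read from the data. The helper `conditional_rates` is hypothetical (not part of EthicML), and it assumes S and Y are each single-column DataFrames.

```python
import pandas as pd


def conditional_rates(s: pd.DataFrame, y: pd.DataFrame) -> pd.DataFrame:
    """Return P(Y = c | S = g) for every sensitive group g and class c.

    Nothing here assumes S or Y is binary: pd.crosstab discovers
    whatever groups and classes are present.
    """
    s_col = s.iloc[:, 0]
    y_col = y.iloc[:, 0]
    # normalize="index" makes each row (one sensitive group) sum to 1
    return pd.crosstab(s_col, y_col, normalize="index")


# Three sensitive groups and three classes
s = pd.DataFrame({"s": [0, 0, 1, 1, 2, 2]})
y = pd.DataFrame({"y": [0, 1, 1, 1, 2, 0]})
rates = conditional_rates(s, y)
print(rates)
```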

We use a named tuple to contain the triplet:

    triplet = DataTuple(x=dataframe, s=dataframe, y=dataframe)

Using DataFrames may be a little inefficient, but given the amount of slicing on conditions that we do, it feels worth it.
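The kind of conditional slicing that motivates DataFrames can be sketched as follows. `DataTuple` here is a minimal `NamedTuple` stand-in written for illustration (EthicML's own definition may carry more fields or methods), and `subset_by_s` is a hypothetical helper. Because pandas aligns boolean masks on the index, a single mask built from `s` selects the matching rows of `x`, `s` and `y` together.

```python
from typing import Any, NamedTuple

import pandas as pd


class DataTuple(NamedTuple):
    """Hypothetical stand-in for the triplet: features, sensitive attribute, label."""

    x: pd.DataFrame
    s: pd.DataFrame
    y: pd.DataFrame


def subset_by_s(data: DataTuple, group: Any) -> DataTuple:
    """Keep only the rows whose sensitive attribute equals ``group``.

    The same boolean mask is applied to x, s and y so the three frames stay aligned.
    """
    mask = data.s.iloc[:, 0] == group
    return DataTuple(x=data.x[mask], s=data.s[mask], y=data.y[mask])


triplet = DataTuple(
    x=pd.DataFrame({"age": [25, 40, 33, 51], "income": [30, 55, 42, 70]}),
    s=pd.DataFrame({"sex": [0, 1, 1, 0]}),
    y=pd.DataFrame({"hired": [0, 1, 0, 1]}),
)
group_1 = subset_by_s(triplet, 1)
print(group_1.x)
```

`subset_by_s` returns a new `DataTuple` rather than editing `triplet` in place, in keeping with the preference for immutable data in the rules of thumb.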

## Separation of Methods

We purposefully keep pre-algorithm, in-algorithm and post-algorithm methods separate because they have different return types.

    PreAlgorithm.run(train: DataTuple, test: DataTuple) -> Tuple[pandas.DataFrame, pandas.DataFrame]
    InAlgorithm.run(train: DataTuple, test: DataTuple) -> pandas.DataFrame
    PostAlgorithm.run(preds: pandas.DataFrame, test: DataTuple) -> pandas.DataFrame

where `preds` is a one-column DataFrame whose column is named `'preds'`.
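A minimal sketch of how these three signatures could be expressed as abstract base classes. The names `PreAlgorithm`, `InAlgorithm` and `PostAlgorithm` are chosen for illustration, and `Standardise`, `Majority`, `Identity` and `pipeline` are hypothetical toy examples, not EthicML algorithms. A pre-algorithm hands back new feature frames, so they have to be swapped into the `DataTuple` (here with `NamedTuple._replace`) before an in-algorithm sees them, whereas in- and post-algorithms both deal in a one-column `'preds'` DataFrame.

```python
from abc import ABC, abstractmethod
from typing import NamedTuple, Tuple

import pandas as pd


class DataTuple(NamedTuple):
    x: pd.DataFrame
    s: pd.DataFrame
    y: pd.DataFrame


class PreAlgorithm(ABC):
    @abstractmethod
    def run(self, train: DataTuple, test: DataTuple) -> Tuple[pd.DataFrame, pd.DataFrame]:
        """Return transformed features for train and test."""


class InAlgorithm(ABC):
    @abstractmethod
    def run(self, train: DataTuple, test: DataTuple) -> pd.DataFrame:
        """Return a one-column DataFrame named 'preds' for the test set."""


class PostAlgorithm(ABC):
    @abstractmethod
    def run(self, preds: pd.DataFrame, test: DataTuple) -> pd.DataFrame:
        """Return adjusted predictions, again in a 'preds' column."""


class Standardise(PreAlgorithm):
    """Toy pre-algorithm: scale features with the training mean and std."""

    def run(self, train: DataTuple, test: DataTuple) -> Tuple[pd.DataFrame, pd.DataFrame]:
        mean, std = train.x.mean(), train.x.std(ddof=0)
        return (train.x - mean) / std, (test.x - mean) / std


class Majority(InAlgorithm):
    """Toy in-algorithm: predict the most common training label for every test row."""

    def run(self, train: DataTuple, test: DataTuple) -> pd.DataFrame:
        label = train.y.iloc[:, 0].mode().iloc[0]
        return pd.DataFrame({"preds": [label] * len(test.x)}, index=test.x.index)


class Identity(PostAlgorithm):
    """Toy post-algorithm: return the predictions unchanged (as a copy)."""

    def run(self, preds: pd.DataFrame, test: DataTuple) -> pd.DataFrame:
        return preds.copy()


def pipeline(pre: PreAlgorithm, model: InAlgorithm, post: PostAlgorithm,
             train: DataTuple, test: DataTuple) -> pd.DataFrame:
    """Chain pre -> in -> post, rebuilding the DataTuples with the new features."""
    new_train_x, new_test_x = pre.run(train, test)
    train = train._replace(x=new_train_x)
    test = test._replace(x=new_test_x)
    return post.run(model.run(train, test), test)


train = DataTuple(x=pd.DataFrame({"a": [1.0, 2.0, 3.0]}),
                  s=pd.DataFrame({"s": [0, 1, 1]}),
                  y=pd.DataFrame({"y": [1, 1, 0]}))
test = DataTuple(x=pd.DataFrame({"a": [2.0, 4.0]}),
                 s=pd.DataFrame({"s": [1, 0]}),
                 y=pd.DataFrame({"y": [0, 1]}))
preds = pipeline(Standardise(), Majority(), Identity(), train, test)
print(preds)
```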

## General Rules of Thumb

- Mutable data structures are bad.
- At the very least, functions should be typed.
- Readability > Efficiency
- Don't get around warnings by just turning them off...
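As one way to follow the first two rules, here is a small sketch using a frozen dataclass (available from Python 3.7, or via the `dataclasses` backport that the package metadata lists for Python 3.6). `TrainingConfig` and `with_more_epochs` are hypothetical names used only for illustration. Updates produce a new object via `dataclasses.replace` instead of mutating the original, and every function carries type hints.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical settings object; frozen so it cannot be mutated after creation."""

    learning_rate: float
    epochs: int


def with_more_epochs(config: TrainingConfig, extra: int) -> TrainingConfig:
    """Return an updated copy instead of modifying ``config`` in place."""
    return replace(config, epochs=config.epochs + extra)


base = TrainingConfig(learning_rate=0.01, epochs=10)
longer = with_more_epochs(base, 5)

try:
    base.epochs = 20  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    print("base is immutable")
```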

# Future Plans

Hopefully EthicML will become a super-easy way to look at the biases in different datasets and compare different models.


