Metadata-Version: 1.1
Name: BlackBoxAuditing
Version: 0.0.4
Summary: Sample Implementation of Gradient Feature Auditing (GFA)
Home-page: https://github.com/algofairness/BlackBoxAuditing
Author: Philip Adler, Casey Falk, Sorelle A. Friedler, Gabriel Rybeck, Carlos Scheidegger, Brandon Smith, Suresh Venkatasubramanian, Michael Feldman, John Moeller, Derek Roth, Charlie Marx
Author-email: fairness@haverford.edu
License: Apache 2.0
Description: # Black Box Auditing and Certifying and Removing Disparate Impact
        
        This repository contains a sample implementation of Gradient Feature Auditing (GFA) meant to be generalizable to most datasets.  For more information on the repair process, see our paper on [Certifying and Removing Disparate Impact](http://arxiv.org/abs/1412.3756).  For information on the full auditing process, see our paper on [Auditing Black-box Models for Indirect Influence](http://arxiv.org/abs/1602.07043).
        
        # License
        
        This code is licensed under an [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0.html) license.
        
        # Setup and Installation
        
        1. Install the Python dependencies listed in the requirements.txt file.
        2. Install python-matplotlib if you do not already have it (https://matplotlib.org/users/installing.html).
        3. Install BlackBoxAuditing (`pip install BlackBoxAuditing`)
        
        Many of the ModelVisitors rely on [Weka](http://www.cs.waikato.ac.nz/ml/weka/). Similarly, we use [TensorFlow](https://www.tensorflow.org/) for network-based machine learning. Any Python libraries that need to be installed are included in the `requirements.txt` file. Weka and TensorFlow should be downloaded during installation, but here are the download links just in case:
        
        - Weka 3.6.13 [download](http://www.cs.waikato.ac.nz/ml/weka/downloading.html)
        - TensorFlow [download](https://www.tensorflow.org/versions/master/get_started/os_setup.html) (original experiments run with version 0.6.0)
        
        
        # Certifying and Removing Disparate Impact
        
        After installing BlackBoxAuditing, you can run the data repair described in [Certifying and Removing Disparate Impact](http://arxiv.org/abs/1412.3756) with the `BlackBoxAuditing-repair` command; running it in a terminal will print the arguments the script takes.
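        As a rough illustration of the repair idea from the paper (not the package's implementation; the function name and structure here are hypothetical), the following sketch performs a rank-preserving "full repair" of one numeric feature: each group's values are replaced by the median, across groups, of the value at the same within-group quantile, so within-group rankings are preserved while the groups' distributions are equalized.

        ```python
        # Illustrative sketch of rank-preserving full repair for a single
        # numeric feature. Assumes each group has at least two values.
        from statistics import median

        def full_repair(values, groups):
            """Return repaired feature values, one per input row."""
            # collect and sort each group's values
            by_group = {}
            for v, g in zip(values, groups):
                by_group.setdefault(g, []).append(v)
            order = {g: sorted(vs) for g, vs in by_group.items()}

            # number of quantile bins = size of the smallest group
            n = min(len(vs) for vs in order.values())

            # target distribution: at each quantile, the median across groups
            target = [
                median(vs[i * (len(vs) - 1) // (n - 1)] for vs in order.values())
                for i in range(n)
            ]

            # replace each value with the target value at its within-group quantile
            repaired = []
            for v, g in zip(values, groups):
                vs = order[g]
                i = vs.index(v) * (n - 1) // (len(vs) - 1)
                repaired.append(target[i])
            return repaired
        ```

        After a full repair, every group has the same feature distribution, so the feature can no longer distinguish the groups; the package's partial repairs interpolate between the original and fully repaired values.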
        
        # Black Box Auditing
        
        To run GFA on a dataset (as in [Auditing Black-box Models for Indirect Influence](http://arxiv.org/abs/1602.07043)), use the Python interface described below.
        
        
        ## Running as a Python Script
        
        After installing BlackBoxAuditing, GFA can be run on a dataset (as in [Auditing Black-box Models for Indirect Influence](http://arxiv.org/abs/1602.07043)) using a simple Python script. For reference, the following sample code covers both preloaded and user-supplied datasets:
        
        ```python
        # import BlackBoxAuditing
        import BlackBoxAuditing as BBA
        # import machine learning technique
        from BlackBoxAuditing.model_factories import Weka_SVM, Weka_DecisionTree
        
        """
        Using a preloaded dataset
        """
        # load in preloaded dataset
        data = BBA.load_data("german")
        
        # initialize the auditor and set parameters
        auditor = BBA.Auditor()
        auditor.model = Weka_SVM
        
        # call the auditor with the data
        auditor(data)
        
        
        """
        Using your own dataset
        """
        # load your own data
        datafile = 'path/to/datafile'
        data = BBA.load_from_file(datafile)
        
        # initialize the auditor and set parameters
        auditor = BBA.Auditor()
        auditor.model = Weka_DecisionTree
        
        # call the auditor
        auditor(data)
        
        ```
        
        ### More Advanced Script Options
        
        #### Using a preloaded dataset
        
        The BlackBoxAuditing package has a few datasets preloaded and ready to use for auditing. In a script, they are available via the function `load_data` which takes as input the name of the dataset and returns formatted data ready for auditing. The following is the list of preloaded datasets available for auditing:
        
        * adult
        * diabetes
        * ricci
        * german
        * glass
        * sample
        * DRP
        
        Refer to the Sources section below for more information about these datasets.
        
        #### Using your own dataset
        
        To use your own data for auditing, pass the path to your dataset to the function `load_from_file`, which returns formatted data ready for auditing. `load_from_file` also accepts other parameters that should be set to ensure your data is processed correctly. The full function signature and its defaults:
        
        ```python
        load_from_file(datafile, testdata=None, correct_types=None, train_percentage=2.0/3.0,
                       response_header=None, features_to_ignore=None, missing_data_symbol="")
        ```
        
        * *datafile*: path to your dataset
        * *testdata*: path to the dataset used for testing a model. Assumes that *datafile* is the training data
        * *correct_types*: list of the types (str, int, or float) of the features in the data. If not given, the types are inferred by inspecting the values of each feature
        * *train_percentage*: fraction of the data used for training, given as a float
        * *response_header*: name of the response column in the data. If not given, the last column is assumed to be the response
        * *features_to_ignore*: list of the names of any features that you wish the model to ignore
        * *missing_data_symbol*: symbol that marks a missing or unknown value in the data
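        As a hypothetical sketch of what the automatic type inference (the behavior when *correct_types* is not given) might look like, the following helper tries `int`, then `float`, and falls back to `str` for a column's values; the function name and logic are illustrative, not the package's internals:

        ```python
        # Illustrative type inference for one column of string values:
        # return the narrowest of int, float, or str that parses them all.
        def infer_type(values):
            for cast in (int, float):
                try:
                    for v in values:
                        cast(v)
                    return cast
                except ValueError:
                    continue
            return str
        ```

        For example, a column of `["1.5", "2"]` would be inferred as `float`, since `"1.5"` fails to parse as an `int`.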
        
        #### Auditor options
        
        After initializing the auditor (`auditor = BBA.Auditor()`), there are a few options that can be set to tune the audit:
        
        `auditor.measurers`: (*default = [accuracy, BCR]*) list of measurers to use for GFA
        
        `auditor.model_options`: (*default = {}*) options for machine learning model
        
        `auditor.verbose`: (*default = True*) Set to `True` for more detailed status updates
        
        `auditor.REPAIR_STEPS`: (*default = 10*) Number of repair steps taken
        
        `auditor.RETRAIN_MODEL_PER_REPAIR`: (*default = False*) Whether to retrain the model after each repair step
        
        `auditor.WRITE_ORIGINAL_PREDICTIONS`: (*default = True*) Whether to write the original model's predictions to file
        
        `auditor.ModelFactory`: (*default = Weka_SVM*) Available machine learning options: Weka_SVM, Weka_DecisionTree, TensorFlow
        
        `auditor.kdd`: (*default = False*) 
        
        
        ## Testing Code Changes
        
        After BlackBoxAuditing has been installed, you can run the test suite from a terminal using the command `BlackBoxAuditing-test`.
        
        Every Python file should include test functions at the bottom that will be run when the file is run. This can be done by including the line `if __name__=="__main__": test()`, as long as a function named `test` is defined.
        
        These tests should use print statements with `True` or `False` readouts indicating success or failure (where `True` should always mean success). Multiple such checks per file are fine, even encouraged.
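        A minimal sketch of this convention (the `add` function is just a placeholder for whatever the file defines):

        ```python
        # Placeholder function under test.
        def add(a, b):
            return a + b

        # Test function: prints True/False readouts, True meaning success.
        def test():
            print(add(2, 2) == 4)   # expect True
            print(add(-1, 1) == 0)  # expect True

        # Run the tests when this file is executed directly.
        if __name__ == "__main__":
            test()
        ```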
        
        Note: if a test requires reading data from the `test_data` directory, it should import the appropriate `load_data` file from the `experiments` directory.
        
        ## Implementing a New Machine-Learning Method
        
        The best way to create a model is to use a ModelFactory and a ModelVisitor. A ModelVisitor should be thought of as a wrapper that knows how to load a machine-learning model of a given type and communicate with that model file in order to output predicted values for some test dataset. A ModelFactory simply knows how to "build" a ModelVisitor based on some provided training data. Check out the "Abstract" files in the `sample_experiment` directory for outlines of what these two classes should do; similarly, check out the "SVM_ModelFactory" files in the `sample_experiment` subdirectory for examples that use Weka to create model files and produce predictions.
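        The pattern can be sketched as follows. This is an illustration of the factory/visitor split described above, not the package's actual abstract classes; the class and method names here are modeled on that description and may differ from the real interfaces.

        ```python
        # Abstract roles: the factory trains and builds; the visitor predicts.
        class ModelVisitor:
            def test(self, test_set):
                """Return predicted responses, one per row of test_set."""
                raise NotImplementedError

        class ModelFactory:
            def build(self, train_set):
                """Train a model and return a ModelVisitor wrapping it."""
                raise NotImplementedError

        # Trivial concrete pair: always predict the majority training label.
        class MajorityVisitor(ModelVisitor):
            def __init__(self, label):
                self.label = label

            def test(self, test_set):
                return [self.label for _ in test_set]

        class MajorityFactory(ModelFactory):
            def build(self, train_set):
                # train_set rows end with the response value; predict the
                # most common response seen during training
                labels = [row[-1] for row in train_set]
                majority = max(set(labels), key=labels.count)
                return MajorityVisitor(majority)
        ```

        The auditor only ever talks to the visitor's prediction interface, which is what lets GFA treat any wrapped model, Weka, TensorFlow, or otherwise, as a black box.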
        
        # Sources
        
        Dataset Sources:
         - adult.csv [link](https://archive.ics.uci.edu/ml/datasets/Adult)
         - german_categorical.csv (Modified from [link](https://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)))
         - RicciDataMod.csv (Modified from [link](http://www.amstat.org/publications/jse/v18n3/RicciData.csv))
         - DRP Datasets (Source and data-files coming soon.)
         - Arrests/Recidivism Datasets [link](http://www.icpsr.umich.edu/icpsrweb/RCMD/studies/3355)
         - Linear Datasets ("sample_2" Experiment) [link](https://github.com/jasonbaldridge/try-tf)
        
        More information on DRP can be found at the [Dark Reactions Project](http://darkreactions.haverford.edu/) official site.
        
        # Bug Reports and Feature-Requests
        
        All bug reports and feature-requests should be submitted through the [Issue Tracker](https://github.com/cfalk/BlackBoxAuditing/issues).
        
Keywords: algorithmic fairness
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.0
