Metadata-Version: 1.1
Name: BlackBoxAuditing
Version: 0.1.1
Summary: Sample Implementation of Gradient Feature Auditing (GFA)
Home-page: https://github.com/algofairness/BlackBoxAuditing
Author: Philip Adler, Casey Falk, Sorelle A. Friedler, Gabriel Rybeck, Carlos Scheidegger, Brandon Smith, Suresh Venkatasubramanian, Michael Feldman, John Moeller, Derek Roth, Charlie Marx
Author-email: fairness@haverford.edu
License: Apache 2.0
Description: # Black Box Auditing and Certifying and Removing Disparate Impact
        
        This repository contains a sample implementation of Gradient Feature Auditing (GFA) meant to be generalizable to most datasets.  For more information on the repair process, see our paper on [Certifying and Removing Disparate Impact](http://arxiv.org/abs/1412.3756).  For information on the full auditing process, see our paper on [Auditing Black-box Models for Indirect Influence](http://arxiv.org/abs/1602.07043).
        
        # License
        
        This code is licensed under an [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0.html) license.
        
        # Setup and Installation
        
        1. Install the Python dependencies listed in the requirements.txt file.
        2. Install python-matplotlib if you do not already have it (https://matplotlib.org/users/installing.html).
        3. Install BlackBoxAuditing (`pip3 install BlackBoxAuditing`)
        
        Many of the ModelVisitors rely on [Weka](http://www.cs.waikato.ac.nz/ml/weka/). Similarly, we use [TensorFlow](https://www.tensorflow.org/) for network-based machine learning. Any Python libraries that need to be installed are included in the `requirements.txt` file. Weka and TensorFlow should be downloaded during installation, but here are the download links just in case.
        
        - Weka 3.6.13 [download](http://www.cs.waikato.ac.nz/ml/weka/downloading.html)
        - TensorFlow [download](https://www.tensorflow.org/versions/master/get_started/os_setup.html) (original experiments run with version 0.6.0)
        
        
        # Certifying and Removing Disparate Impact
        
        After installing BlackBoxAuditing, you can run the data repair described in [Certifying and Removing Disparate Impact](http://arxiv.org/abs/1412.3756) with the `BlackBoxAuditing-repair` command in a terminal. The command will tell you which arguments the script takes.
        
        # Black Box Auditing
        
        To run GFA on a dataset (as in [Auditing Black-box Models for Indirect Influence](http://arxiv.org/abs/1602.07043)), follow the instructions in the section below.
        
        
        ## Running as a Python Script
        
        After installing BlackBoxAuditing, GFA can be run on a dataset (as in [Auditing Black-box Models for Indirect Influence](http://arxiv.org/abs/1602.07043)) using a simple Python script. The following sample code is provided for reference:
        
        ```python3
        # import BlackBoxAuditing
        import BlackBoxAuditing as BBA
        # import machine learning technique
        from BlackBoxAuditing.model_factories import Weka_SVM, Weka_DecisionTree
        
        """
        Using a preloaded dataset
        """
        # load in preloaded dataset
        data = BBA.load_data("german")
        
        # initialize the auditor and set parameters
        auditor = BBA.Auditor()
        auditor.ModelFactory = Weka_SVM
        
        # call the auditor with the data
        auditor(data)
        
        
        """
        Using your own dataset
        """
        # load your own data
        datafile = 'path/to/datafile'
        data = BBA.load_from_file(datafile)
        
        # initialize the auditor and set parameters
        auditor = BBA.Auditor()
        auditor.ModelFactory = Weka_DecisionTree
        
        # call the auditor
        auditor(data)
        
        ```
        
        ### More Advanced Script Options
        
        #### Using a preloaded dataset
        
        The BlackBoxAuditing package has a few datasets preloaded and ready to use for auditing. In a script, they are available via the function `load_data`, which takes the name of the dataset as input and returns formatted data ready for auditing. The following preloaded datasets are available for auditing:
        
        * adult
        * diabetes
        * ricci
        * german
        * glass
        * sample
        * DRP
        
        Refer to the Sources section below for more information about the datasets.
        
        #### Using your own dataset
        
        To use your own data for auditing, call the function `load_from_file`. In its simplest form, it takes the path to your dataset as input and returns formatted data ready for auditing. `load_from_file` also includes other parameters, which should be set to ensure that your data is processed correctly. Refer to the full function signature and its defaults:
        
        ```
        load_from_file(datafile, testdata=None, correct_types=None, train_percentage=2.0/3.0,
                           response_header=None, features_to_ignore=None, missing_data_symbol="")
        ```
        
        * *datafile*: path to your dataset
        * *testdata*: path to the dataset used for testing a model. If given, *datafile* is assumed to be the training data
        * *correct_types*: list of the types (str, int, or float) of the features in the data. If not given, the types will be automatically generated by inspecting the values of each feature
        * *train_percentage*: fraction of the data used for training, given as a float; the remainder is used for testing
        * *response_header*: name of the response column in the data. If not given, the last column in the data is assumed to be the response
        * *features_to_ignore*: list of the names of any features that you wish to be ignored by the model
        * *missing_data_symbol*: symbol that marks missing or unknown values in the data
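        
        To make the roles of *correct_types*, *train_percentage*, and *missing_data_symbol* more concrete, here is a minimal, standard-library-only sketch of how column types might be inferred and how rows might be split. The helper names `infer_type` and `split_rows` are hypothetical, and this is not the package's actual implementation, which may differ (for example, it may shuffle rows before splitting).
        
        ```python
        # Illustrative sketch only; these helpers are NOT part of BlackBoxAuditing.
        
        def infer_type(values, missing_data_symbol=""):
            """Guess int, float, or str for a column, skipping missing values."""
            present = [v for v in values if v != missing_data_symbol]
            for candidate in (int, float):
                try:
                    for v in present:
                        candidate(v)
                    return candidate
                except ValueError:
                    continue
            return str
        
        
        def split_rows(rows, train_percentage=2.0 / 3.0):
            """Split rows into (train, test) in their given order, without shuffling."""
            cutoff = int(len(rows) * train_percentage)
            return rows[:cutoff], rows[cutoff:]
        
        
        columns = {
            "age": ["23", "41", ""],
            "score": ["0.5", "1", "2.25"],
            "city": ["Rome", "", "Oslo"],
        }
        inferred_types = {name: infer_type(vals) for name, vals in columns.items()}
        train, test = split_rows(list(range(9)))
        print(inferred_types)
        print(len(train), len(test))
        ```
        
        If automatic inference would guess wrong for your data (for instance, a numeric code that should be treated as a category), passing *correct_types* explicitly avoids the guess.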
        
        #### Auditor setup options
        
        After initializing the auditor (`auditor = BlackBoxAuditing.Auditor()`), there are a few options that can be set to tune the auditor:
        
        `auditor.measurers`: (*default = [accuracy, BCR]*) list of measurers to use for GFA
        
        `auditor.model_options`: (*default = {}*) options for machine learning model
        
        `auditor.verbose`: (*default = True*) Set to `True` for more detailed status updates
        
        `auditor.REPAIR_STEPS`: (*default = 10*) Number of repair steps to take
        
        `auditor.RETRAIN_MODEL_PER_REPAIR`: (*default = False*) 
        
        `auditor.WRITE_ORIGINAL_PREDICTIONS`: (*default = True*)
        
        `auditor.ModelFactory`: (*default = Weka_SVM*) Available machine learning options: Weka_SVM, Weka_DecisionTree, TensorFlow
        
        `auditor.kdd`: (*default = False*) 
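        
        As an illustration, the options above can be collected in a dictionary and applied with `setattr`. The helper `apply_options` is hypothetical and not part of the package. The sketch runs against a stand-in object so that it needs neither Weka nor TensorFlow; the same helper should work on an object returned by `BlackBoxAuditing.Auditor()`, assuming it exposes the attribute names listed above.
        
        ```python
        from types import SimpleNamespace
        
        # Option names come from the list above; the values are arbitrary examples.
        options = {
            "verbose": False,
            "REPAIR_STEPS": 5,
            "RETRAIN_MODEL_PER_REPAIR": True,
        }
        
        
        def apply_options(auditor, options):
            """Set each option on the auditor, rejecting unknown attribute names."""
            for name, value in options.items():
                if not hasattr(auditor, name):
                    raise AttributeError("unknown auditor option: " + name)
                setattr(auditor, name, value)
            return auditor
        
        
        # Stand-in carrying some documented defaults; swap in a real Auditor instance.
        stand_in = SimpleNamespace(verbose=True, REPAIR_STEPS=10, RETRAIN_MODEL_PER_REPAIR=False)
        apply_options(stand_in, options)
        print(stand_in)
        ```
        
        Checking `hasattr` first guards against misspelled option names, because assigning to a nonexistent attribute in Python succeeds silently and would leave the default in effect.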
        
        #### Auditor call options
        
        Once the auditor (`auditor = BlackBoxAuditing.Auditor()`) is initialized and tuned, there are a few options that can be set to configure how the audit is run. Refer to the full audit call and its defaults:
        
        ```
        auditor(data, output_dir=None, dump_all=False, features_to_audit=None)
        ```
        
        * *data*: data object returned from calling either `load_data` or `load_from_file`
        * *output_dir*: name of the directory that audit files will be dumped to. If no output directory is specified, a default directory will be generated
        * *dump_all*: boolean value. If True, all files generated by the audit will be dumped including all original and repaired files, predictions files, audit files, and graphs. If False, only audit files and full repaired files will be dumped.
        * *features_to_audit*: list of specific features that should be audited. If none specified, all features will be audited
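        
        A call that uses these keyword arguments might look like the sketch below. The wrapper `run_audit`, the directory name, and the feature names are hypothetical; only the keyword names are taken from the signature above. A recording stand-in replaces the real auditor so that the snippet runs without the package installed.
        
        ```python
        def run_audit(auditor, data, features, output_dir="audit_output"):
            """Audit only the given features and keep the dumped output small."""
            return auditor(
                data,
                output_dir=output_dir,
                dump_all=False,              # only audit files and full repaired files
                features_to_audit=features,  # restrict the audit to these features
            )
        
        
        class RecordingAuditor:
            """Stand-in that records how it was called instead of running an audit."""
        
            def __call__(self, data, output_dir=None, dump_all=False, features_to_audit=None):
                self.call = {
                    "output_dir": output_dir,
                    "dump_all": dump_all,
                    "features_to_audit": features_to_audit,
                }
        
        
        recorder = RecordingAuditor()
        run_audit(recorder, data=None, features=["age", "sex"])
        print(recorder.call)
        ```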
        
        ## Testing Code Changes
        
        After BlackBoxAuditing has been installed, you can run the test suite with the `BlackBoxAuditing-test` command in a terminal.
        
        Every Python file should include test functions at the bottom that are run when the file itself is run. This can be done by including the line `if __name__=="__main__": test()`, as long as a function named `test` is defined.
        
        These tests should use print statements with `True` or `False` readouts indicating success or failure (where `True` always means success). Having several of these readouts per file is fine and even encouraged.
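        
        A file that follows this convention might look like the sketch below. The function `normalize` is a made-up example and does not exist in the package; only the `test` function and the `__main__` guard reflect the convention described above.
        
        ```python
        def normalize(values):
            """Scale a list of numbers linearly into the range [0, 1]."""
            low, high = min(values), max(values)
            if high == low:
                return [0.0 for _ in values]
            return [(v - low) / (high - low) for v in values]
        
        
        def test():
            # Each readout prints True on success and False on failure.
            print("scales to unit range:", normalize([2, 4, 6]) == [0.0, 0.5, 1.0])
            print("constant input handled:", normalize([3, 3]) == [0.0, 0.0])
        
        
        if __name__ == "__main__":
            test()
        ```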
        
        Note: if a test requires reading data from the `test_data` directory, it should import the appropriate `load_data` file from the `experiments` directory.
        
        ## Implementing a New Machine-Learning Method
        
        The best way to add a new model is to implement a ModelFactory and a ModelVisitor. A ModelVisitor should be thought of as a wrapper that knows how to load a machine-learning model of a given type and communicate with that model file in order to output predicted values for some test dataset. A ModelFactory simply knows how to "build" a ModelVisitor based on some provided training data. Check out the "Abstract" files in the `sample_experiment` directory for outlines of what these two classes should do; similarly, check out the "SVM_ModelFactory" files in the `sample_experiment` subdirectory for examples that use Weka to create model files and produce predictions.
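        
        The division of labor between the two classes can be sketched as follows. This is a simplified, standard-library-only illustration: the class names, the `build` and `test` method names, and the majority-class "model" are assumptions made for this example, so consult the "Abstract" files for the actual interfaces.
        
        ```python
        from collections import Counter
        
        
        class MajorityModelVisitor:
            """Wraps a trained 'model' (here just one label) and produces predictions."""
        
            def __init__(self, label, response_index):
                self.label = label
                self.response_index = response_index
        
            def test(self, test_set):
                # Return (actual, predicted) pairs for every row of the test set.
                return [(row[self.response_index], self.label) for row in test_set]
        
        
        class MajorityModelFactory:
            """Knows how to build a visitor from training data."""
        
            def __init__(self, response_index=-1):
                self.response_index = response_index
        
            def build(self, train_set):
                # "Training" here just means finding the most common response value.
                responses = [row[self.response_index] for row in train_set]
                label = Counter(responses).most_common(1)[0][0]
                return MajorityModelVisitor(label, self.response_index)
        
        
        train_set = [[1, "yes"], [2, "yes"], [3, "no"]]
        test_set = [[4, "no"], [5, "yes"]]
        visitor = MajorityModelFactory().build(train_set)
        predictions = visitor.test(test_set)
        print(predictions)
        ```
        
        Keeping training in the factory and prediction in the visitor means an audit can rebuild or reuse a model without knowing which learning method sits behind it.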
        
        # Sources
        
        Dataset Sources:
         - adult.csv [link](https://archive.ics.uci.edu/ml/datasets/Adult)
         - german_categorical.csv (Modified from [link](https://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)))
         - RicciDataMod.csv (Modified from [link](http://www.amstat.org/publications/jse/v18n3/RicciData.csv))
         - DRP Datasets (Source and data-files coming soon.)
         - Arrests/Recidivism Datasets [link](http://www.icpsr.umich.edu/icpsrweb/RCMD/studies/3355)
         - Linear Datasets ("sample_2" Experiment) [link](https://github.com/jasonbaldridge/try-tf)
        
        More information on DRP can be found at the [Dark Reactions Project](http://darkreactions.haverford.edu/) official site.
        
        # Bug Reports and Feature-Requests
        
        All bug reports and feature-requests should be submitted through the [Issue Tracker](https://github.com/cfalk/BlackBoxAuditing/issues).
        
Keywords: algorithmic fairness
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.0
