Metadata-Version: 2.4
Name: cafga
Version: 0.0.3
Summary: CafGa is a library that facilitates creating and evaluating grouped-attribution explanations.
Author-email: Alan Boyle <aboyle@student.ethz.ch>
License: 
        Copyright (c) 2024, Alan Boyle, IVIA Lab ETH Zurich
        All rights reserved.
        
        Redistribution and use in source and binary forms, with or without modification,
        are permitted provided that the following conditions are met:
        
        * Redistributions of source code must retain the above copyright notice, this
          list of conditions and the following disclaimer.
        
        * Redistributions in binary form must reproduce the above copyright notice, this
          list of conditions and the following disclaimer in the documentation and/or
          other materials provided with the distribution.
        
        * Neither the name of cafga nor the names of its
          contributors may be used to endorse or promote products derived from this
          software without specific prior written permission.
        
        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
        ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
        WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
        IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
        INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
        BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
        DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
        OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
        OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
        OF THE POSSIBILITY OF SUCH DAMAGE.
        
        
Keywords: LLM,XAI,NLP,salience,attribution
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: BSD License
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pip
Requires-Dist: matplotlib>=3.10.0
Requires-Dist: numpy>=2
Requires-Dist: pandas>=2
Requires-Dist: black>=25
Requires-Dist: python-dotenv>=1.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: pydantic>=2.10
Requires-Dist: sacremoses>=0.1.1
Requires-Dist: openai>=1.63.0
Requires-Dist: shap==0.46.0
Requires-Dist: nltk>=3.8
Dynamic: license-file

# CafGa (**C**ustom **a**ssignments **f**or **G**roup **a**ttribution)

## Project Links

Project repository: https://github.com/aboyleTD/CafGa

## Installation

CafGa can be installed from PyPI using:

```
pip install cafga
```

If you instead obtained CafGa from the repository, install its dependencies by running:

```
pip install -r requirements.txt
```

Note that some of the extra functionality requires further installations:
<!-- 1. To get the syntax-parse requires downloading spaCy and the en_core_web_trf module. Which can be done with the following commands:
```
pip install spacy
python -m spacy download en_core_web_trf
```
Important Notes: 
1. spaCy may fail to build on python >= 3.13. So in case you into a build failure try downgrading python to 3.12. 

2. en_core_web_trf requires a version of torch that cannot be run on numpy 2. Thus, you may need to run the following command to downgrade numpy:
```
pip install numpy==1.26.4 
```
-->
1. CafGa provides two Jupyter widgets. The edit widget lets you edit assignments visually, and the display widget shows the attributions generated by an explanation. To use them, follow the instructions in the 'Demo Instructions.md' file. 

2. CafGa offers a predefined ChatGPT model. To use it, place a `.env` file containing your API key in your working directory. 

## Using CafGa

This section explains the main functions of CafGa. For a worked example of how to use CafGa, see the demo. 

To begin using CafGa, start by importing it and creating a `CAFGA` object:

`from cafga.cafga import CAFGA`

`cafga = CAFGA(model = 'your_model')`

The `model` parameter is where you pass the model you want to explain. To allow your model to parallelize its predictions (e.g. by batching), CafGa sends it lists of inputs instead of single inputs. The function that implements your model should therefore take a list of strings and return either a list of strings or a list of floats, with exactly one output for every input. 
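As a minimal sketch of the batched interface described above, the following defines a toy model function. The name `toy_sentiment_model` and its word-counting logic are hypothetical and only illustrate the expected shape: a list of strings in, one float per input out.

```python
def toy_sentiment_model(inputs: list[str]) -> list[float]:
    """Hypothetical stand-in model: scores each text by counting cue words."""
    positive = {"good", "great", "excellent"}
    negative = {"bad", "poor", "terrible"}
    scores = []
    for text in inputs:
        # Normalise each word so that "great." and "Great" both match "great".
        words = [w.strip(".,!?").lower() for w in text.split()]
        score = sum(w in positive for w in words) - sum(w in negative for w in words)
        scores.append(float(score))
    # One output per input, in the same order as the inputs.
    return scores


batch = ["The food was great.", "Terrible service, bad food."]
predictions = toy_sentiment_model(batch)

# A function like this is what would be passed as the model, e.g.
# CAFGA(model=toy_sentiment_model), following the constructor call shown above.
```

A real model would typically replace the loop with a single batched call to the underlying predictor, which is the reason the interface works on lists.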

Once the `CAFGA` object is instantiated, typical usage proceeds in three steps: Explanation, Evaluation, and Visualisation.

### 1. Explanation

To generate an explanation, run the `explain` function on the instantiated `CAFGA` object:

`explanation = cafga.explain(params)`

There are two ways of using the `explain` function. 

Firstly, you can pass the string you want to get an explanation for without segmenting it into the individual parts that you want to get attributions for. In this case you need to provide the name of the predefined attribution method ('word', 'sentence', 'syntax-parse') that you want to use. 

Secondly, you can provide your own segmentation of the input through the `segmented_input` parameter. In this case you also need to provide the assignment of input segments to groups through the `input_assignments` parameter. Specifically, `input_assignments[i]` should be the index of the group that `segmented_input[i]` belongs to. 
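To make the relationship between the two parameters concrete, here is a small sketch of a custom segmentation and its group assignments. The example sentence, the choice to keep trailing whitespace inside each segment, and the helper variables `groups` and `group_texts` are assumptions made for illustration only.

```python
# Segments of the input text (here: words, with trailing spaces kept so that
# joining the segments reproduces the original sentence).
segmented_input = ["The ", "quick ", "fox ", "jumped ", "over ", "the ", "fence."]

# input_assignments[i] is the index of the group that segmented_input[i] belongs to.
input_assignments = [0, 0, 0, 1, 2, 2, 2]

# Every segment needs exactly one group index.
assert len(segmented_input) == len(input_assignments)

# Collect the segments of each group to inspect what the groups look like.
groups: dict[int, list[str]] = {}
for segment, group_index in zip(segmented_input, input_assignments):
    groups.setdefault(group_index, []).append(segment)

group_texts = {g: "".join(parts) for g, parts in groups.items()}
```

Here the subject, the verb, and the prepositional phrase each form one group, so an explanation would yield one attribution per phrase rather than one per word.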

### 2. Evaluation

Once an explanation object has been generated you can pass it on to the evaluation function:

`evaluation = cafga.evaluate(explanation, params)`

The two forms of evaluation currently supported are deletion (going from all features present to no features present) and insertion (going from no features present to all features present), selected with the `direction` parameter. The resulting evaluation accordingly contains the array of difference values computed as part of the perturbation curve. 
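The idea behind a deletion curve can be sketched independently of CafGa. The code below is not CafGa's implementation: the function `deletion_curve`, the `[MASK]` placeholder, the removal order (highest attribution first), and the sign of the difference (original minus perturbed) are all assumptions chosen to illustrate the general technique.

```python
def toy_model(inputs: list[str]) -> list[float]:
    """Hypothetical model: rewards the words 'great' and 'cheap'."""
    return [1.0 * t.count("great") + 0.5 * t.count("cheap") for t in inputs]


def deletion_curve(segments, assignments, group_scores, model, mask="[MASK]"):
    """Remove groups one at a time and record how far the output moves."""
    original = model(["".join(segments)])[0]
    # Assumed order: delete the group with the highest attribution first.
    order = sorted(range(len(group_scores)), key=lambda g: group_scores[g], reverse=True)

    removed = set()
    perturbed_inputs = []
    for group in order:
        removed.add(group)
        # Replace every segment of a removed group with the mask placeholder.
        perturbed = "".join(
            mask + " " if assignments[i] in removed else segment
            for i, segment in enumerate(segments)
        )
        perturbed_inputs.append(perturbed)

    # One batched call, matching the list-in, list-out model interface.
    outputs = model(perturbed_inputs)
    return [original - out for out in outputs]


segments = ["The ", "food ", "was ", "great ", "and ", "cheap."]
assignments = [0, 0, 1, 1, 2, 2]
group_scores = [0.1, 0.9, 0.3]  # hypothetical attributions, one per group

curve = deletion_curve(segments, assignments, group_scores, toy_model)
```

An insertion curve would follow the same pattern in reverse: start from a fully masked input and restore groups one at a time.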

### 3. Visualisation

Finally, the perturbation curve generated by the evaluation can be visualised using the visualisation function:

`cafga.visualize_evaluation(evaluated_explanations, params)`

Since you may want to plot the aggregate over many evaluations, the visualisation function takes a list of evaluations as input. The two forms of aggregation currently supported are equal-width binning and linear interpolation. 
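Aggregation is needed because curves from different inputs usually have different lengths. As a rough illustration of the linear-interpolation approach (not CafGa's own code), the sketch below resamples each curve onto a shared grid with NumPy and averages the results. The function name `aggregate_by_interpolation` and the choice of a grid over the interval [0, 1] are assumptions.

```python
import numpy as np


def aggregate_by_interpolation(curves, num_points=5):
    """Resample curves of different lengths onto one grid and average them."""
    # Shared x-axis: fraction of features perturbed, from 0 to 1.
    grid = np.linspace(0.0, 1.0, num_points)
    resampled = []
    for curve in curves:
        values = np.asarray(curve, dtype=float)
        # Spread this curve's own points evenly over the same [0, 1] range.
        x = np.linspace(0.0, 1.0, len(values))
        resampled.append(np.interp(grid, x, values))
    return grid, np.mean(resampled, axis=0)


# Three hypothetical perturbation curves with 2, 3, and 5 points.
curves = [[0.0, 1.0], [0.0, 0.5, 1.0], [0.0, 0.25, 0.5, 0.75, 1.0]]
grid, mean_curve = aggregate_by_interpolation(curves, num_points=5)
```

Equal-width binning would instead group each curve's points into fixed-width intervals of the x-axis and average within each interval, which avoids interpolating between points at the cost of a coarser curve.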
