Metadata-Version: 2.4
Name: GAICo
Version: 0.4.0
Summary: GenAI Results Comparator, GAICo, is a Python library to help compare, analyze and visualize outputs from Large Language Models (LLMs), often against a reference text. In doing so, one can use a range of extensible metrics from the literature.
Project-URL: Bug Tracker, https://github.com/ai4society/GenAIResultsComparator/issues
Project-URL: Documentation, https://ai4society.github.io/projects/GenAIResultsComparator/index.html
Project-URL: Homepage, https://github.com/ai4society/GenAIResultsComparator
Project-URL: Repository, https://github.com/ai4society/GenAIResultsComparator
Author-email: AI4Society Team <ai4societyteam@gmail.com>, Nitin Gupta <nitin1209@gmail.com>, Pallav Koppisetti <pallav.koppisetti5@gmail.com>, Kausik Lakkaraju <klakkaraju98@gmail.com>, Biplav Srivastava <prof.biplav@gmail.com>
Maintainer-email: AI4Society Team <ai4societyteam@gmail.com>, Nitin Gupta <nitin1209@gmail.com>
License: MIT License
        
        Copyright (c) 2024 AI for Society Research Group
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: evaluation,generative-ai,llm,metrics,nlp,text-comparison
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: <3.13,>=3.10
Requires-Dist: dtaidistance>=2.3.13
Requires-Dist: levenshtein>=0.23.0
Requires-Dist: matplotlib>=3.7.5
Requires-Dist: numpy>=1.26
Requires-Dist: pandas>=2.2.1
Requires-Dist: rouge-score>=0.1.2
Requires-Dist: seaborn>=0.13.0
Provides-Extra: audio
Requires-Dist: scipy>=1.15.3; extra == 'audio'
Requires-Dist: soundfile>=0.12.0; extra == 'audio'
Provides-Extra: bertscore
Requires-Dist: bert-score>=0.3.13; extra == 'bertscore'
Provides-Extra: cosine
Requires-Dist: scikit-learn==1.5.0; extra == 'cosine'
Provides-Extra: jsd
Requires-Dist: nltk>=3.9.1; extra == 'jsd'
Requires-Dist: scipy>=1.15.3; extra == 'jsd'
Description-Content-Type: text/markdown

<!-- This file is generated by root's scripts/generate_pypi_description.py -->

# GAICo: GenAI Results Comparator

**Repository:** [github.com/ai4society/GenAIResultsComparator](https://github.com/ai4society/GenAIResultsComparator)

**Documentation:** [ai4society.github.io/projects/GenAIResultsComparator](https://ai4society.github.io/projects/GenAIResultsComparator/index.html)

## Overview

_GenAI Results Comparator (GAICo)_ helps you measure the quality of your Generative AI (LLM) outputs. It enables you to compare, analyze, and visualize results across text, images, audio, and structured data, helping you answer the question: "Which model performed better?"

At its core, the library provides a set of metrics for evaluating various types of outputs, from plain text strings to structured data like planning sequences and time-series, and multimedia content such as images and audio. While the `Experiment` class streamlines evaluation for text-based and structured string outputs, individual metric classes offer direct control for all data types, including binary or array-based multimedia. These metrics produce normalized scores (typically 0 to 1), where 1 indicates a perfect match, enabling robust analysis and visualization of LLM performance.
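To build intuition for what a normalized score means, the snippet below computes a word-level Jaccard similarity in plain Python. This is a simplified illustration of the metric's idea, not GAICo's actual implementation (which may tokenize and normalize text differently):

```python
def jaccard_similarity(candidate: str, reference: str) -> float:
    """Word-level Jaccard similarity: |intersection| / |union|, in [0, 1]."""
    cand_tokens = set(candidate.lower().split())
    ref_tokens = set(reference.lower().split())
    if not cand_tokens and not ref_tokens:
        return 1.0  # two empty strings count as a perfect match
    return len(cand_tokens & ref_tokens) / len(cand_tokens | ref_tokens)

score = jaccard_similarity(
    "Sorry, I am designed not to answer such a question.",
    "Sorry, I am unable to answer such a question as it is not appropriate.",
)
print(round(score, 3))  # → 0.5
```

A score of 1 means the two token sets are identical; 0 means they share no words at all.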

## Quickstart

GAICo's `Experiment` class offers a streamlined workflow for comparing multiple model outputs, applying thresholds, generating plots, and creating CSV reports.

Here's a quick example:

```python
from gaico import Experiment

# Sample LLM responses comparing different models
llm_responses = {
    "Google": "Title: Jimmy Kimmel Reacts to Donald Trump Winning...",
    "Mixtral 8x7b": "I'm an AI and I don't have the ability to predict...",
    "SafeChat": "Sorry, I am designed not to answer such a question.",
}
reference_answer = "Sorry, I am unable to answer such a question as it is not appropriate."

# Initialize and run comparison
exp = Experiment(llm_responses=llm_responses, reference_answer=reference_answer)
results = exp.compare(
    metrics=['Jaccard', 'ROUGE'],
    plot=True,
    output_csv_path="experiment_report.csv"
)

print(results)
```

For more detailed examples, please refer to our Jupyter Notebooks in the [`examples/`](https://github.com/ai4society/GenAIResultsComparator/tree/main/examples) folder in the repository.

## Features

- **Comprehensive Metric Library**
  - Textual similarity: Jaccard, Cosine, Levenshtein, Sequence Matcher
  - N-gram based: BLEU, ROUGE, JS Divergence
  - Semantic similarity: BERTScore
  - Structured data: Planning sequences and time-series metrics
  - Multimedia: Image similarity (SSIM, hash-based) and audio quality metrics

- **Streamlined Evaluation Workflow**
  - High-level `Experiment` class for comparing models, applying thresholds, and generating reports
  - `summarize()` method for aggregated performance overviews

- **Dynamic & Extensible**
  - Register custom metrics at runtime
  - Add your own evaluation criteria easily

- **Powerful Visualization**
  - Generate comparative plots automatically
  - Support for bar charts and radar plots

- **Robust & Tested**
  - Comprehensive test suite with Pytest
  - Production-ready reliability
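Conceptually, a custom metric is just a scorer that maps a (generated, reference) pair to a float in [0, 1]. The sketch below illustrates that contract with an invented registry and an invented `length_ratio` metric; these names are illustrative only and are not GAICo's registration API, which is documented in the repository:

```python
from typing import Callable

# A metric, in this sense, maps (candidate, reference) to a score in [0, 1],
# where 1 indicates a perfect match.
Metric = Callable[[str, str], float]

def length_ratio(candidate: str, reference: str) -> float:
    """Illustrative custom metric: how close the two texts are in length."""
    if not candidate and not reference:
        return 1.0
    return min(len(candidate), len(reference)) / max(len(candidate), len(reference))

# A name-keyed registry, mirroring the idea of runtime metric registration.
registry: dict[str, Metric] = {"LengthRatio": length_ratio}

score = registry["LengthRatio"]("short answer", "a much longer reference answer")
print(score)  # → 0.4
```

Any callable with this shape can serve as an evaluation criterion, which is what makes the framework easy to extend.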

## Installation

GAICo can be installed using pip.

**Create and activate a virtual environment:**

```shell
python3 -m venv gaico-env
source gaico-env/bin/activate  # On macOS/Linux
# gaico-env\Scripts\activate   # On Windows
```

**Install GAICo:**

```shell
pip install gaico
```

This installs the core GAICo library with essential metrics.

**Optional dependencies** for specialized metrics:

```shell
pip install 'gaico[audio]'                       # Audio metrics
pip install 'gaico[bertscore]'                   # BERTScore metric
pip install 'gaico[cosine]'                      # Cosine similarity
pip install 'gaico[jsd]'                         # JS Divergence
pip install 'gaico[audio,bertscore,cosine,jsd]'  # All features
```

## Citation

If you find GAICo useful in your research or work, please consider citing it:

```bibtex
@article{Gupta_Koppisetti_Lakkaraju_Srivastava_2026,
  title={GAICo: A Deployed and Extensible Framework for Evaluating Diverse and Multimodal Generative AI Outputs},
  journal={Proceedings of the AAAI Conference on Artificial Intelligence},
  author={Gupta, Nitin and Koppisetti, Pallav and Lakkaraju, Kausik and Srivastava, Biplav},
  year={2026},
}
```
