Metadata-Version: 2.4
Name: llmSHAP
Version: 1.5.1
Summary: Multi-threaded explainability for LLMs: words, sentences, documents, images, and tools.
Author: Filip Naudot
License-Expression: MIT
Project-URL: Homepage, https://github.com/filipnaudot/llmSHAP
Project-URL: Repository, https://github.com/filipnaudot/llmSHAP
Project-URL: Documentation, https://filipnaudot.github.io/llmSHAP/
Project-URL: Issues, https://github.com/filipnaudot/llmSHAP/issues
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: tqdm
Provides-Extra: openai
Requires-Dist: openai; extra == "openai"
Requires-Dist: python-dotenv; extra == "openai"
Provides-Extra: embeddings
Requires-Dist: sentence-transformers; extra == "embeddings"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: matplotlib; extra == "dev"
Requires-Dist: ipywidgets; extra == "dev"
Requires-Dist: sphinx; extra == "dev"
Requires-Dist: myst-parser; extra == "dev"
Requires-Dist: sphinx-book-theme; extra == "dev"
Requires-Dist: sphinx-design; extra == "dev"
Provides-Extra: all
Requires-Dist: openai; extra == "all"
Requires-Dist: python-dotenv; extra == "all"
Dynamic: license-file

<div align='center'>
    <picture>
        <source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/filipnaudot/llmSHAP/main/docs/_static/llmSHAP-logo-lightmode.png">
        <img alt="llmSHAP logo" src="https://raw.githubusercontent.com/filipnaudot/llmSHAP/main/docs/_static/llmSHAP-logo-darkmode.png" width="50%" height="50%">
    </picture>
</div>
<br/>

![Unit Tests](https://github.com/filipnaudot/llmSHAP/actions/workflows/test.yml/badge.svg)
[![Documentation](https://img.shields.io/badge/docs-online-blue.svg)](https://filipnaudot.github.io/llmSHAP/)
[![PyPI Downloads](https://static.pepy.tech/personalized-badge/llmshap?period=total&units=INTERNATIONAL_SYSTEM&left_color=GRAY&right_color=GREEN&left_text=PyPI+downloads)](https://pepy.tech/projects/llmshap)

A multi-threaded explainability framework using Shapley values for LLM-based outputs.

---

## Getting Started
Install the `llmshap` package with the `all` extra (`openai` and `python-dotenv`):
```bash
pip install "llmshap[all]"
```

Install in editable mode with the `all` extra (after cloning the repository):
```bash
pip install -e ".[all]"
```

Documentation and a hands-on tutorial are available online:

- [Full documentation](https://filipnaudot.github.io/llmSHAP/)  
- [Tutorial](https://filipnaudot.github.io/llmSHAP/tutorial.html)

---

# Example Usage

```python
from llmSHAP import DataHandler, BasicPromptCodec, ShapleyAttribution
from llmSHAP.llm import OpenAIInterface

data = "In what city is the Eiffel Tower?"
handler = DataHandler(data, permanent_keys={0,3,4})
result = ShapleyAttribution(model=OpenAIInterface("gpt-4o-mini"),
                            data_handler=handler,
                            prompt_codec=BasicPromptCodec(system="Answer the question briefly."),
                            use_cache=True,
                            num_threads=16,
                            ).attribution()

print("\n\n### OUTPUT ###")
print(result.output)

print("\n\n### ATTRIBUTION ###")
print(result.attribution)

print("\n\n### HEATMAP ###")
print(result.render())
```
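
The `use_cache` and `num_threads` arguments above point at two ideas: evaluating coalitions concurrently and reusing outputs for coalitions that have already been seen. The sketch below illustrates those ideas in plain Python. It is not llmSHAP's implementation, and `WORDS`, `fake_model`, and `evaluate_coalitions` are hypothetical names introduced only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical features; a real run would use the DataHandler's features.
WORDS = ["Eiffel", "Tower", "city"]


def fake_model(coalition):
    """Hypothetical stand-in for an LLM call: echoes the words that were kept."""
    return " ".join(WORDS[i] for i in sorted(coalition))


def evaluate_coalitions(coalitions, model, num_threads=4):
    """Evaluate each distinct coalition once, in parallel, and return a cache dict."""
    # De-duplicate while keeping first-seen order, so repeated coalitions cost nothing.
    unique = list(dict.fromkeys(frozenset(c) for c in coalitions))
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        outputs = list(pool.map(model, unique))
    return dict(zip(unique, outputs))


# All subsets of three features, submitted twice to show the effect of caching.
coalitions = [set(c) for r in range(len(WORDS) + 1)
              for c in combinations(range(len(WORDS)), r)]
cache = evaluate_coalitions(coalitions + coalitions, fake_model)

print(len(cache))                 # 8 distinct coalitions were evaluated
print(cache[frozenset({0, 2})])   # Eiffel city
```

With a real LLM backend each `model` call is typically a network request, which is why threads can help even though the work is done in Python.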

## Multimodal Example with `Image`
The following example attributes an answer to a mix of text snippets and an image, which is passed as an `Image` object.
```python
from llmSHAP import DataHandler, BasicPromptCodec, ShapleyAttribution, Image
from llmSHAP.llm import OpenAIInterface

data = {
    "question": "Has our stock price increased or decreased since the beginning?",
    "Num employees"        : "The company has about 450 employees.",
    "[IMAGE] Stock chart"  : Image(image_path="./docs/_static/demo-stock-price.png"),
    "Report release date"  : "Quarterly reports are released on the 15th.",
    "Headquarter Location" : "The headquarters is located in a mid-sized city.",
    "Num countries"        : "It has offices in three countries."
}

result = ShapleyAttribution(model=OpenAIInterface("gpt-5-mini", reasoning="low"),
                            data_handler=DataHandler(data, permanent_keys={"question"}),
                            prompt_codec=BasicPromptCodec(system="Answer the question briefly."),
                            use_cache=True,
                            num_threads=35,
                            ).attribution()

print("\n\n### OUTPUT ###")
print(result.output)

print("\n\n### HEATMAP ###")
print(result.render(abs_values=True, render_labels=True))
```

<div align='center'>
    <picture>
        <source media="(prefers-color-scheme: light)" srcset="https://raw.githubusercontent.com/filipnaudot/llmSHAP/main/docs/_static/example-result-lightmode.png">
        <img alt="llmSHAP example result" src="https://raw.githubusercontent.com/filipnaudot/llmSHAP/main/docs/_static/example-result-darkmode.png" width="100%" height="100%">
    </picture>
</div>



## Embedding-Based Output Scoring

`EmbeddingCosineSimilarity` measures semantic similarity between outputs using embeddings. 
It supports two backends:
- **API** — any OpenAI-compatible embeddings endpoint via `api_url_endpoint`.
- **Local** — a `sentence-transformers` model downloaded on first use.

For the local backend, install the `embeddings` extra:
```bash
pip install "llmshap[embeddings]"
```

The example below uses the API backend, which needs no extra beyond `[all]`.

```python
from llmSHAP import DataHandler, BasicPromptCodec, ShapleyAttribution, EmbeddingCosineSimilarity
from llmSHAP.llm import OpenAIInterface

data = "In what city is the Eiffel Tower?"
handler = DataHandler(data)
result = ShapleyAttribution(model=OpenAIInterface("gpt-4o-mini"),
                            data_handler=handler,
                            prompt_codec=BasicPromptCodec(system="Answer the question briefly."),
                            use_cache=True,
                            num_threads=16,
                            value_function=EmbeddingCosineSimilarity(
                                model_name = "text-embedding-3-small",
                                api_url_endpoint = "https://api.openai.com/v1")
                            ).attribution()

print("\n\n### OUTPUT ###")
print(result.output)

print("\n\n### HEATMAP ###")
print(result.render(abs_values=True, render_labels=True))
```
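
For readers unfamiliar with the metric, cosine similarity compares the direction of two embedding vectors and ignores their length. The sketch below computes it in plain Python on made-up three-dimensional vectors. The vectors and the `cosine_similarity` helper are illustrative assumptions, and how `EmbeddingCosineSimilarity` obtains and compares embeddings internally is not shown here.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Made-up "embeddings"; real ones have hundreds or thousands of dimensions.
reference = [1.0, 2.0, 2.0]
same_direction = [2.0, 4.0, 4.0]   # a scaled copy of the reference
orthogonal = [2.0, -1.0, 0.0]      # dot product with the reference is zero

print(cosine_similarity(reference, same_direction))  # 1.0
print(cosine_similarity(reference, orthogonal))      # 0.0
```

A score near 1 means the two outputs point in almost the same direction in embedding space, and a score near 0 means they are unrelated under this metric.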


---

## Example data

You can pass either a string or a dictionary:

```python
from llmSHAP import DataHandler

# String input
data = "The quick brown fox jumps over the lazy dog"
handler = DataHandler(data)

# Dictionary input
data = {"a": "The", "b": "quick", "c": "brown", "d": "fox"}
handler = DataHandler(data)
```

To pin certain keys so that they are always included and receive no attribution of their own, use `permanent_keys`:
```python
from llmSHAP import DataHandler

data = {"a": "The", "b": "quick", "c": "brown", "d": "fox"}
handler = DataHandler(data, permanent_keys={"a", "d"})

# Get data with index 1 WITHOUT the permanent features.
print(handler.get_data({1}, exclude_permanent_keys=True, mask=False))
# Output: {'b': 'quick'}

# Get data with index 1 AND the permanent features.
print(handler.get_data({1}, exclude_permanent_keys=False, mask=False))
# Output: {'a': 'The', 'b': 'quick', 'd': 'fox'}
```
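
Pinning also shrinks the search space: only the non-permanent keys are varied, so with n keys of which k are pinned there are 2^(n-k) coalitions instead of 2^n. The sketch below enumerates them for the dictionary above. `coalitions_with_pinned` is a hypothetical helper written for illustration, not part of llmSHAP, and it does not reflect how `DataHandler` works internally.

```python
from itertools import combinations


def coalitions_with_pinned(keys, permanent_keys):
    """Yield every coalition of the free keys, with permanent keys always included."""
    free = [k for k in keys if k not in permanent_keys]
    for size in range(len(free) + 1):
        for subset in combinations(free, size):
            chosen = set(subset)
            # Rebuild the coalition in the original key order.
            yield [k for k in keys if k in chosen or k in permanent_keys]


data = {"a": "The", "b": "quick", "c": "brown", "d": "fox"}

# Two free keys ("b" and "c") give 2**2 = 4 coalitions instead of 2**4 = 16.
for coalition in coalitions_with_pinned(list(data), {"a", "d"}):
    print(coalition)
# ['a', 'd']
# ['a', 'b', 'd']
# ['a', 'c', 'd']
# ['a', 'b', 'c', 'd']
```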
---


## Comparison with TokenSHAP
| Capability                                                                | **llmSHAP**                                                 | **TokenSHAP**                  |
| ------------------------------------------------------------------------- | ----------------------------------------------------------- | ------------------------------ |
| Threaded                                                                  | ✅ (optional `num_threads`)                                  | ❌                              |
| Modular architecture                                                      | ✅                                                           | ❌                              |
| Exact Shapley option                                                      | ✅ (Full enumeration)                                        | ❌ (Monte Carlo sampling)       |
| Generation caching across coalitions                                      | ✅                                                           | ❌                              |
| Heuristics                                                                | SlidingWindow • Monte Carlo • Counterfactual                 | Monte Carlo                    |
| Sentence-/chunk-level attribution                                         | ✅                                                           | ✅                             |
| Permanent context pinning (always-included features)                      | ✅                                                           | ❌                              |
| Pluggable similarity metric                                               | ✅ TF-IDF, embeddings                                        | ✅ TF-IDF, embeddings          |
| Docs & tutorial                                                           | ✅ Sphinx docs + tutorial                                    | ✅ README only                 |
| Unit tests & CI                                                           | ✅ Pytest + GitHub Actions                                   | ❌                              |
| Vision object attribution                                                 | ❌                                                           | ✅ PixelSHAP                   |
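
To make the "Exact Shapley" and "Monte Carlo" rows concrete, the sketch below computes Shapley values for a toy value function in two ways: by full enumeration of coalitions, and by averaging marginal contributions over random permutations. It is a generic illustration under stated assumptions, not the code of either library. `toy_value`, `exact_shapley`, and `monte_carlo_shapley` are hypothetical names, and in practice the value function would score a model output rather than return hand-picked numbers.

```python
import random
from itertools import combinations
from math import factorial


def exact_shapley(players, value):
    """Exact Shapley values: weight every marginal contribution over all coalitions."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def monte_carlo_shapley(players, value, num_permutations=2000, seed=0):
    """Approximate Shapley values by sampling random player orderings."""
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition = frozenset()
        previous = value(coalition)
        for p in order:
            coalition = coalition | {p}
            current = value(coalition)
            phi[p] += current - previous  # marginal contribution of p in this ordering
            previous = current
    return {p: total / num_permutations for p, total in phi.items()}


def toy_value(coalition):
    """Hand-picked scores: "Eiffel" and "Tower" help alone and more together."""
    score = 0.0
    if "Eiffel" in coalition:
        score += 0.5
    if "Tower" in coalition:
        score += 0.25
    if "Eiffel" in coalition and "Tower" in coalition:
        score += 0.25
    return score


players = ["Eiffel", "Tower", "city"]
exact = exact_shapley(players, toy_value)
approx = monte_carlo_shapley(players, toy_value)

print({p: round(v, 3) for p, v in exact.items()})
# {'Eiffel': 0.625, 'Tower': 0.375, 'city': 0.0}
print({p: round(v, 2) for p, v in approx.items()})  # close to the exact values
```

Full enumeration needs a value for each of the 2^n coalitions, which is why sampling-based approximations become attractive as the number of features grows.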

---

<br/>
<br/>
<br/>

# Stars ⭐️

[![Star History Chart](https://api.star-history.com/svg?repos=filipnaudot/llmSHAP&type=Date&legend=top-left)](https://www.star-history.com/#filipnaudot/llmSHAP&type=date&legend=top-left)
