Metadata-Version: 2.1
Name: bocoel
Version: 0.0.4
Summary: Bayesian Optimization as a Coverage Tool for Evaluating Large Language Models
Author-email: RenChu Wang <patrick1031wang@gmail.com>
License: Apache-2.0
License-File: LICENSE.md
Requires-Python: <3.11,>=3.10
Requires-Dist: alive-progress>=3.1.5
Requires-Dist: ax-platform>=0.3.6
Requires-Dist: botorch>=0.9.5
Requires-Dist: fire>=0.5.0
Requires-Dist: gpytorch>=1.11
Requires-Dist: networkx>=3.2.1
Requires-Dist: numpy>=1.26.3
Requires-Dist: pandas>=2.1.4
Requires-Dist: pyyaml>=6.0.1
Requires-Dist: scikit-learn>=1.3.2
Requires-Dist: scipy>=1.11.4
Requires-Dist: structlog>=24.1.0
Requires-Dist: torch>=2.1.2
Requires-Dist: typeguard>=2.13.3
Requires-Dist: typing-extensions>=4.9.0
Requires-Dist: ujson>=5.9.0
Provides-Extra: cma
Requires-Dist: cma>=3.3.0; extra == 'cma'
Provides-Extra: datasets
Requires-Dist: datasets>=2.16.1; extra == 'datasets'
Provides-Extra: index
Requires-Dist: faiss-cpu>=1.7.4; extra == 'index'
Requires-Dist: hnswlib>=0.8.0; extra == 'index'
Provides-Extra: metrics
Requires-Dist: nltk>=3.8.1; extra == 'metrics'
Requires-Dist: rouge-score>=0.1.2; extra == 'metrics'
Requires-Dist: rouge>=1.0.1; extra == 'metrics'
Requires-Dist: sacrebleu>=2.4.0; extra == 'metrics'
Provides-Extra: sklearn-extra
Requires-Dist: scikit-learn-extra>=0.3.0; extra == 'sklearn-extra'
Provides-Extra: transformers
Requires-Dist: sentence-transformers>=2.2.2; extra == 'transformers'
Requires-Dist: transformers>=4.36.2; extra == 'transformers'
Provides-Extra: visual
Requires-Dist: dash>=2.14.2; extra == 'visual'
Requires-Dist: flask>=3.0.0; extra == 'visual'
Requires-Dist: hiplot>=0.1.33; extra == 'visual'
Requires-Dist: plotly>=5.18.0; extra == 'visual'
Description-Content-Type: text/markdown

# ☂️ BoCoEL

## Bayesian Optimization as a Coverage Tool for Evaluating Large Language Models

![Logo](assets/logo-full.svg)

[![Publish](https://github.com/rentruewang/bocoel/actions/workflows/release.yaml/badge.svg)](https://github.com/rentruewang/bocoel/actions/workflows/release.yaml)
[![Build Pages](https://github.com/rentruewang/bocoel/actions/workflows/build.yaml/badge.svg)](https://github.com/rentruewang/bocoel/actions/workflows/build.yaml)
[![Formatting](https://github.com/rentruewang/bocoel/actions/workflows/format.yaml/badge.svg)](https://github.com/rentruewang/bocoel/actions/workflows/format.yaml)
[![Type Checking](https://github.com/rentruewang/bocoel/actions/workflows/typecheck.yaml/badge.svg)](https://github.com/rentruewang/bocoel/actions/workflows/typecheck.yaml)
[![Unit Testing](https://github.com/rentruewang/bocoel/actions/workflows/unittest.yaml/badge.svg)](https://github.com/rentruewang/bocoel/actions/workflows/unittest.yaml)


![GitHub License](https://img.shields.io/github/license/rentruewang/bocoel)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/bocoel)
[![Built with Material for MkDocs](https://img.shields.io/badge/Material_for_MkDocs-526CFE?style=for-the-badge&logo=MaterialForMkDocs&logoColor=white)](https://squidfunk.github.io/mkdocs-material/)

## 🤔 Why BoCoEL?

Large language models are expensive and slow behemoths, and evaluating them on gigantic modern datasets only makes it worse. 

If only there were a way to select a meaningful (_and small_) subset of the corpus and still obtain a highly accurate evaluation...

Wait, sounds like [Bayesian Optimization](#bo)!

Bocoel works in the following steps:

1. Encode individual entries into embeddings (far cheaper and faster than running the LLM, and reusable).
2. Use Bayesian optimization to select queries to evaluate.
3. Use the queries to retrieve from our corpus (with the encoded embeddings).
4. Profit.
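The steps above can be sketched in plain NumPy. This is a toy illustration under stated assumptions, not BoCoEL's implementation (the package itself depends on Ax and BoTorch): the embeddings are random vectors standing in for a real encoder such as a sentence-transformers model, `evaluate` is a hypothetical callback standing in for an LLM call plus a metric, and the helper names `rbf_kernel`, `gp_posterior` and `bo_select` are illustrative only. The GP uses a fixed RBF kernel, and the acquisition function (UCB) is maximized over random candidate points rather than with a proper optimizer.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, lengthscale: float = 1.0) -> np.ndarray:
    # Squared-exponential kernel between every row of `a` and every row of `b`.
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)


def gp_posterior(x_train, y_train, x_query, noise: float = 1e-3):
    # Exact GP regression with zero prior mean: posterior mean and std at x_query.
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)                 # (n_query, n_train)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    v = np.linalg.solve(k_tt, k_qt.T)                   # (n_train, n_query)
    var = 1.0 - np.sum(k_qt * v.T, axis=1)              # prior variance k(x, x) = 1
    return mean, np.sqrt(np.clip(var, 0.0, None))


def bo_select(embeddings, evaluate, n_steps=10, beta=4.0, n_candidates=512, seed=0):
    """Hypothetical helper: choose corpus entries with GP-UCB and score them."""
    if n_steps > len(embeddings):
        raise ValueError("cannot select more entries than the corpus contains")
    rng = np.random.default_rng(seed)
    lo, hi = embeddings.min(axis=0), embeddings.max(axis=0)
    chosen = [int(rng.integers(len(embeddings)))]       # random starting entry
    scores = [evaluate(chosen[0])]
    for _ in range(n_steps - 1):
        # Step 2: propose a query point in embedding space by maximizing UCB
        # over random candidates (a crude stand-in for an acquisition optimizer).
        candidates = rng.uniform(lo, hi, size=(n_candidates, embeddings.shape[1]))
        mean, std = gp_posterior(embeddings[chosen], np.array(scores), candidates)
        query = candidates[np.argmax(mean + np.sqrt(beta) * std)]
        # Step 3: retrieve the nearest corpus entry that has not been evaluated yet.
        dists = np.linalg.norm(embeddings - query, axis=1)
        dists[chosen] = np.inf
        idx = int(np.argmin(dists))
        chosen.append(idx)
        scores.append(evaluate(idx))                    # the expensive LLM call goes here
    return chosen, scores


# Usage on a toy corpus: random vectors stand in for encoder output (step 1),
# and `true_scores` is a made-up per-entry score standing in for the LLM metric.
rng = np.random.default_rng(42)
corpus_embeddings = rng.normal(size=(200, 8))
true_scores = np.tanh(corpus_embeddings[:, 0])
chosen, scores = bo_select(corpus_embeddings, lambda i: float(true_scores[i]), n_steps=15)
print(len(chosen), round(float(np.mean(scores)), 3))
```

Mapping each proposed query back to its nearest unseen corpus entry keeps every evaluation on a real sample; a larger `beta` pushes the search further toward unexplored regions of the embedding space.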

The generated evaluations can be easily managed with the provided manager utility.

## 🚀 Features

- 🎯 Accurately evaluate large language models with just tens of samples from your selected corpus.
- 💂‍♂️ Uses the power of Bayesian optimization to select an optimal set of samples for the language model to evaluate.
- 💯 Evaluates the corpus on the model in addition to evaluating the model on the corpus.
- 🤗 Integration with Hugging Face [transformers](https://huggingface.co/docs/transformers/en/index) and [datasets](https://huggingface.co/docs/datasets/en/index).
- 🧩 Modular design.

## 🗺️ Roadmap: work in progress

- 📊 Visualization module of the evaluation.
- 🎲 Integration of alternative methods (random, k-medoids, ...) with Gaussian processes.
- 🥨 Integration with more backends such as [vLLM](https://github.com/vllm-project/vllm) and [OpenAI's API](https://github.com/openai/openai-python).

## ⭐ Give us a star!

Like what you see? Please consider giving this a star (★)!

## <a id="bo"></a> ♾️ Bayesian Optimization

<img src="https://upload.wikimedia.org/wikipedia/commons/0/02/GpParBayesAnimationSmall.gif" width="40%" align="right"/>

Simply put, Bayesian optimization balances an exploration objective (the purple area in the image) against an exploitation objective (the height of the black dots). It uses Gaussian processes as a backbone for inference, and uses an **acquisition function** to decide where to sample next. See [here](https://distill.pub/2019/visual-exploration-gaussian-processes/) for a more in-depth introduction.
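For reference, the standard textbook form of these pieces is written out below for a Gaussian process with kernel $k$, evaluated inputs $X = \{x_1, \dots, x_n\}$ with observed scores $\mathbf{y}$, Gram matrix $K_{ij} = k(x_i, x_j)$, observation noise variance $\sigma_n^2$, and $\mathbf{k}_*(x) = [k(x, x_1), \dots, k(x, x_n)]^\top$. The upper confidence bound (UCB) is only one common acquisition function, chosen here for illustration; it is not necessarily the one BoCoEL uses.

```math
\begin{aligned}
\mu(x) &= \mathbf{k}_*(x)^\top \left(K + \sigma_n^2 I\right)^{-1} \mathbf{y} \\
\sigma^2(x) &= k(x, x) - \mathbf{k}_*(x)^\top \left(K + \sigma_n^2 I\right)^{-1} \mathbf{k}_*(x) \\
\alpha_{\mathrm{UCB}}(x) &= \mu(x) + \sqrt{\beta}\,\sigma(x), \qquad x_{n+1} = \arg\max_x \alpha_{\mathrm{UCB}}(x)
\end{aligned}
```

Here $\sigma(x)$ measures how uncertain the model still is at $x$ (exploration), $\mu(x)$ is its current prediction (exploitation), and $\beta \ge 0$ sets the trade-off between the two.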

Since _Bayesian optimization works well with expensive-to-evaluate black-box models (read: LLMs)_, it is perfect for this particular use case. Bocoel uses Bayesian optimization as a backbone for exploring the embedding space given by our corpus, which allows it to select a good subset that acts as a mini snapshot of the corpus.


## ⬇️ Installation

I don't want optional dependencies:

```
pip install bocoel
```

Give me the full experience (all optional dependencies):

```
pip install "bocoel[cma,datasets,index,metrics,sklearn-extra,transformers,visual]"
```


## 🥰 Contributing

Openness and inclusiveness are taken very seriously. Please follow the guide to [contributing](./CONTRIBUTING.md) and the [code of conduct](./CODE_OF_CONDUCT.md).

## 🏷️ License and Citation

The code is available under the [Apache License 2.0](./LICENSE.md).

TODO: Citation
