Metadata-Version: 2.1
Name: babymmlu
Version: 0.0.4
Summary: Sample implementation of babymmlu benchmark
Home-page: https://github.com/averkij/babymmlu
Author: Ivan Bychkov
Author-email: Sergei Averkiev <averoo@gmail.com>
Project-URL: Bug Tracker, https://github.com/averkij/babymmlu/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: transformers
Requires-Dist: datasets
Requires-Dist: torch
Requires-Dist: numpy

# babymmlu

Utilities for evaluating language models on the babymmlu benchmark (see https://huggingface.co/datasets/ai-forever/baby_mmlu).

## Methods

### eval_parallel

Evaluates a model on the babymmlu dataset and returns its cross-entropy scores.

| Parameter    | Optional | Type                      | Default value        | Description                                 |
|--------------|----------|---------------------------|----------------------|---------------------------------------------|
| model        |    No    | AutoModelForCausalLM      |                      | Model to evaluate.                          |
| tokenizer    |    No    | AutoTokenizer             |                      | Tokenizer used with model.                  |
| dataset      |    Yes   | Dataset (Optional) or str | ai-forever/baby_mmlu | Dataset to evaluate model on.               |
| q_batch_size |    Yes   | int                       | 10                   | Number of questions to process in parallel. |

#### Return value
The function returns a tuple of three babymmlu scores, in this order: cross-entropy per character, cross-entropy per token, and total cross-entropy.
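
The exact formulas used by `eval_parallel` are not spelled out here. As a rough illustration only, the sketch below assumes the conventional definitions: the total is the negative sum of token log-probabilities, and the other two values divide that total by the token count and the character count. The function name `summarize_cross_entropy` and its inputs are hypothetical and not part of the babymmlu API.

```python
import math


def summarize_cross_entropy(token_log_probs, text):
    """Return (per_char, per_token, total) cross-entropy for one scored text.

    Assumption: token_log_probs holds the natural-log probability the model
    assigned to each token of `text`. This is an illustrative helper, not
    the library's implementation.
    """
    if not token_log_probs or not text:
        raise ValueError("need at least one token and one character")
    total = -sum(token_log_probs)             # total cross-entropy (nats)
    per_token = total / len(token_log_probs)  # normalized by token count
    per_char = total / len(text)              # normalized by character count
    return per_char, per_token, total


# Two tokens with probabilities 0.5 and 0.25 covering a 4-character text.
per_char, per_token, total = summarize_cross_entropy(
    [math.log(0.5), math.log(0.25)], "abcd"
)
print(per_char, per_token, total)
```

Per-character normalization makes scores comparable across tokenizers, since models with different vocabularies split the same text into different numbers of tokens.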

### load_model_and_tokenizer

Loads the model and tokenizer from the same location.

| Parameter    | Optional | Type  | Description                                 |
|--------------|----------|-------|---------------------------------------------|
| path         |    No    | str   | Path to load model and tokenizer from.      |
| use_cuda     |   Yes    | bool  | Whether to load the model on CUDA or CPU.   |


#### Return value
The function returns a tuple of two elements:
* model
* tokenizer
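
For readers who want to see what such a loader might involve, here is a minimal sketch built on the `transformers` classes named in the parameter table above. It is not the library's actual code: the names `resolve_device` and `load_model_and_tokenizer_sketch`, the `use_cuda=True` default, and the fallback to CPU when CUDA is unavailable are all assumptions made for illustration.

```python
def resolve_device(use_cuda: bool, cuda_available: bool) -> str:
    """Pick "cuda" only when it is both requested and available (assumed policy)."""
    return "cuda" if use_cuda and cuda_available else "cpu"


def load_model_and_tokenizer_sketch(path: str, use_cuda: bool = True):
    """Hypothetical loader; the real babymmlu function may differ."""
    # Imported lazily so resolve_device can be used without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = resolve_device(use_cuda, torch.cuda.is_available())
    model = AutoModelForCausalLM.from_pretrained(path).to(device)
    model.eval()  # inference mode: disables dropout
    tokenizer = AutoTokenizer.from_pretrained(path)
    return model, tokenizer
```
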


### load_model

Loads the model from the specified location.

| Parameter    | Optional | Type  | Description                                 |
|--------------|----------|-------|---------------------------------------------|
| model_path   |    No    | str   | Path to load model from.                    |
| use_cuda     |   Yes    | bool  | Whether to load the model on CUDA or CPU.   |

#### Return value
The function returns the loaded model.

### load_tokenizer

Loads the tokenizer from the specified location.

| Parameter      | Optional | Type  | Description                                 |
|----------------|----------|-------|---------------------------------------------|
| tokenizer_path |    No    | str   | Path to load tokenizer from.                |

#### Return value
The function returns the loaded tokenizer.


## Example
```python
import babymmlu
model, tokenizer = babymmlu.load_model_and_tokenizer('ai-forever/rugpt3small_based_on_gpt2')
result = babymmlu.eval_parallel(model, tokenizer)
print('babymmlu crossentropy-per-char, crossentropy-per-token and crossentropy-total', result)
```
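
Because the result is a plain 3-tuple, it can help to give the positions names before logging or comparing runs. The sketch below wraps the tuple in a `NamedTuple`; the class name `BabyMMLUResult` and its field names are hypothetical, and the numbers are placeholders standing in for a real `eval_parallel` result.

```python
from typing import NamedTuple


class BabyMMLUResult(NamedTuple):
    """Hypothetical wrapper; field order follows the documented return value."""
    ce_per_char: float
    ce_per_token: float
    ce_total: float


# Placeholder values only; in practice pass the tuple from eval_parallel.
raw_result = (1.25, 3.5, 420.0)
result = BabyMMLUResult(*raw_result)

print(f"per char:  {result.ce_per_char}")
print(f"per token: {result.ce_per_token}")
print(f"total:     {result.ce_total}")
```

A `NamedTuple` still unpacks like the original tuple, so code written against the plain return value keeps working.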
