Metadata-Version: 2.1
Name: autotraino
Version: 0.1.0
Summary: AutoML for Tabular datasets.
Project-URL: Homepage, https://github.com/msetzu/autotraino
Project-URL: Bug Tracker, https://github.com/msetzu/autotraino/issues
Author-email: Mattia Setzu <mattia.setzu@unipi.it>
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.10
Requires-Python: <3.11,>=3.10
Provides-Extra: autogluon
Requires-Dist: autogluon-common==0.8.0; extra == 'autogluon'
Requires-Dist: autogluon-core==0.8.0; extra == 'autogluon'
Requires-Dist: autogluon-features==0.8.0; extra == 'autogluon'
Requires-Dist: autogluon-tabular==0.8.0; extra == 'autogluon'
Requires-Dist: boto3==1.26.154; extra == 'autogluon'
Requires-Dist: botocore==1.29.154; extra == 'autogluon'
Requires-Dist: certifi==2023.5.7; extra == 'autogluon'
Requires-Dist: charset-normalizer==3.1.0; extra == 'autogluon'
Requires-Dist: contourpy==1.1.0; extra == 'autogluon'
Requires-Dist: cycler==0.11.0; extra == 'autogluon'
Requires-Dist: fonttools==4.40.0; extra == 'autogluon'
Requires-Dist: idna==3.4; extra == 'autogluon'
Requires-Dist: jmespath==1.0.1; extra == 'autogluon'
Requires-Dist: joblib==1.2.0; extra == 'autogluon'
Requires-Dist: kiwisolver==1.4.4; extra == 'autogluon'
Requires-Dist: matplotlib==3.7.1; extra == 'autogluon'
Requires-Dist: networkx==3.1; extra == 'autogluon'
Requires-Dist: numpy==1.24.3; extra == 'autogluon'
Requires-Dist: packaging==23.1; extra == 'autogluon'
Requires-Dist: pandas==1.5.3; extra == 'autogluon'
Requires-Dist: pillow==9.5.0; extra == 'autogluon'
Requires-Dist: psutil==5.9.5; extra == 'autogluon'
Requires-Dist: pyparsing==3.0.9; extra == 'autogluon'
Requires-Dist: python-dateutil==2.8.2; extra == 'autogluon'
Requires-Dist: pytz==2023.3; extra == 'autogluon'
Requires-Dist: requests==2.31.0; extra == 'autogluon'
Requires-Dist: s3transfer==0.6.1; extra == 'autogluon'
Requires-Dist: scikit-learn==1.2.2; extra == 'autogluon'
Requires-Dist: scipy==1.10.1; extra == 'autogluon'
Requires-Dist: six==1.16.0; extra == 'autogluon'
Requires-Dist: threadpoolctl==3.1.0; extra == 'autogluon'
Requires-Dist: tqdm==4.65.0; extra == 'autogluon'
Requires-Dist: urllib3==1.26.16; extra == 'autogluon'
Provides-Extra: autogluon-all
Requires-Dist: autogluon-common==0.8.0; extra == 'autogluon-all'
Requires-Dist: autogluon-core==0.8.0; extra == 'autogluon-all'
Requires-Dist: autogluon-features==0.8.0; extra == 'autogluon-all'
Requires-Dist: autogluon-tabular==0.8.0; extra == 'autogluon-all'
Requires-Dist: boto3==1.26.154; extra == 'autogluon-all'
Requires-Dist: botocore==1.29.154; extra == 'autogluon-all'
Requires-Dist: catboost==1.2; extra == 'autogluon-all'
Requires-Dist: certifi==2023.5.7; extra == 'autogluon-all'
Requires-Dist: charset-normalizer==3.1.0; extra == 'autogluon-all'
Requires-Dist: cmake==3.26.4; extra == 'autogluon-all'
Requires-Dist: contourpy==1.1.0; extra == 'autogluon-all'
Requires-Dist: cycler==0.11.0; extra == 'autogluon-all'
Requires-Dist: filelock==3.12.2; extra == 'autogluon-all'
Requires-Dist: fonttools==4.40.0; extra == 'autogluon-all'
Requires-Dist: graphviz==0.20.1; extra == 'autogluon-all'
Requires-Dist: idna==3.4; extra == 'autogluon-all'
Requires-Dist: jinja2==3.1.2; extra == 'autogluon-all'
Requires-Dist: jmespath==1.0.1; extra == 'autogluon-all'
Requires-Dist: joblib==1.2.0; extra == 'autogluon-all'
Requires-Dist: kiwisolver==1.4.4; extra == 'autogluon-all'
Requires-Dist: lightgbm==3.3.5; extra == 'autogluon-all'
Requires-Dist: lit==16.0.6; extra == 'autogluon-all'
Requires-Dist: markupsafe==2.1.3; extra == 'autogluon-all'
Requires-Dist: matplotlib==3.7.1; extra == 'autogluon-all'
Requires-Dist: mpmath==1.3.0; extra == 'autogluon-all'
Requires-Dist: networkx==3.1; extra == 'autogluon-all'
Requires-Dist: numpy==1.24.3; extra == 'autogluon-all'
Requires-Dist: nvidia-cublas-cu11==11.10.3.66; extra == 'autogluon-all'
Requires-Dist: nvidia-cuda-cupti-cu11==11.7.101; extra == 'autogluon-all'
Requires-Dist: nvidia-cuda-nvrtc-cu11==11.7.99; extra == 'autogluon-all'
Requires-Dist: nvidia-cuda-runtime-cu11==11.7.99; extra == 'autogluon-all'
Requires-Dist: nvidia-cudnn-cu11==8.5.0.96; extra == 'autogluon-all'
Requires-Dist: nvidia-cufft-cu11==10.9.0.58; extra == 'autogluon-all'
Requires-Dist: nvidia-curand-cu11==10.2.10.91; extra == 'autogluon-all'
Requires-Dist: nvidia-cusolver-cu11==11.4.0.1; extra == 'autogluon-all'
Requires-Dist: nvidia-cusparse-cu11==11.7.4.91; extra == 'autogluon-all'
Requires-Dist: nvidia-nccl-cu11==2.14.3; extra == 'autogluon-all'
Requires-Dist: nvidia-nvtx-cu11==11.7.91; extra == 'autogluon-all'
Requires-Dist: packaging==23.1; extra == 'autogluon-all'
Requires-Dist: pandas==1.5.3; extra == 'autogluon-all'
Requires-Dist: pillow==9.5.0; extra == 'autogluon-all'
Requires-Dist: plotly==5.15.0; extra == 'autogluon-all'
Requires-Dist: psutil==5.9.5; extra == 'autogluon-all'
Requires-Dist: pyparsing==3.0.9; extra == 'autogluon-all'
Requires-Dist: python-dateutil==2.8.2; extra == 'autogluon-all'
Requires-Dist: pytz==2023.3; extra == 'autogluon-all'
Requires-Dist: requests==2.31.0; extra == 'autogluon-all'
Requires-Dist: s3transfer==0.6.1; extra == 'autogluon-all'
Requires-Dist: scikit-learn==1.2.2; extra == 'autogluon-all'
Requires-Dist: scipy==1.10.1; extra == 'autogluon-all'
Requires-Dist: six==1.16.0; extra == 'autogluon-all'
Requires-Dist: sympy==1.12; extra == 'autogluon-all'
Requires-Dist: tenacity==8.2.2; extra == 'autogluon-all'
Requires-Dist: threadpoolctl==3.1.0; extra == 'autogluon-all'
Requires-Dist: torch==2.0.1; extra == 'autogluon-all'
Requires-Dist: tqdm==4.65.0; extra == 'autogluon-all'
Requires-Dist: triton==2.0.0; extra == 'autogluon-all'
Requires-Dist: typing-extensions==4.6.3; extra == 'autogluon-all'
Requires-Dist: urllib3==1.26.16; extra == 'autogluon-all'
Provides-Extra: testing
Requires-Dist: datasets>=2.11.0; extra == 'testing'
Description-Content-Type: text/markdown

# Autotraino :truck:
> :warning: Warning: alpha version, everything breaks. :warning:


`autotraino` is a small wrapper library for AutoML on tabular datasets.

```python
from autotraino.gluon import AutogluonTrainer 
from datasets import load_dataset

train = load_dataset("mstz/adult", "income")["train"].to_pandas() 

# train the model
trainer = AutogluonTrainer()
trainer = trainer.fit(train, target_feature="over_threshold", time_limit=100)
```
When fitting, we can control basic parameters such as where to store the resulting models
(parameter `save_path` of the trainer constructor) or the time budget assigned to the trainer (parameter
`time_limit` of `fit`, expressed in seconds).

Once trained, we can access the individual models:
```python
# trained models
print(trainer.names)

print(trainer["LightGBM"])
```
and predict directly from the `Trainer` itself: 
```python
# drop the target column to keep only the input features
train_x = train.drop(columns="over_threshold")
predictions = trainer.predict(train_x, with_models=["LightGBM", "RandomForest"])
```
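The example above predicts on the very rows used for training, which says little about how the models generalize. Below is a minimal sketch for holding out part of the data before calling `fit`, using only pandas. `holdout_split` and its parameters are hypothetical and not part of `autotraino`.

```python
import pandas as pd


def holdout_split(df: pd.DataFrame, target: str, test_fraction: float = 0.2, seed: int = 0):
    """Hypothetical helper (not part of autotraino): split `df` into a training
    frame that still contains `target`, plus held-out features and labels."""
    if target not in df.columns:
        raise KeyError(f"target column {target!r} not found")
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")

    # a fresh 0..n-1 index guarantees that dropping by index removes exactly the sampled rows
    df = df.reset_index(drop=True)

    # sample the held-out rows reproducibly and keep the remaining ones for training
    test = df.sample(frac=test_fraction, random_state=seed)
    train = df.drop(index=test.index)

    # separate features and labels of the held-out rows, as done for `train_x` above
    test_x = test.drop(columns=target)
    test_y = test[target]
    return train, test_x, test_y
```

You would then pass the returned training frame to `trainer.fit(...)` and `test_x` to `trainer.predict(...)`, comparing the predictions against `test_y`. How to compare them depends on the format that `predict` returns.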


# Quickstart
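You can install `autotraino` from TestPyPI:
```shell
# optional: a dedicated environment (mkvirtualenv comes with virtualenvwrapper); autotraino requires Python 3.10
mkvirtualenv -p python3.10 autotraino

pip install --extra-index-url https://test.pypi.org/simple/ autotraino
```
The base package declares no dependencies of its own. The Autogluon backend comes with the `autogluon` extra (or `autogluon-all`, which additionally pins CatBoost, LightGBM, and PyTorch), and the `datasets` library used in the examples above comes with the `testing` extra, e.g. `pip install --extra-index-url https://test.pypi.org/simple/ "autotraino[autogluon,testing]"`.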

# Datasets
`autotraino` works on `pandas.DataFrame`s.
You can find a large collection of tabular datasets in a Hugging Face repository I curate at [huggingface.co/mstz](https://huggingface.co/mstz).
Datasets are sourced from UCI, Kaggle, and OpenML.
Many of them, and especially their dataset cards, still need updating.

## Which model families are trained?
`autotraino` is currently built on [Autogluon](https://auto.gluon.ai/stable/index.html) and trains the following models:
- Boosting models
  - LightGBM
  - CatBoost
- Bagging models
  - Random Forest
  - Extra Trees
- Neural networks
  - FastAI
  - NNTorch
- Classical machine learning models
  - k-NN
  - Logistic Regression

## Preprocessing
`autotraino` automatically detects feature types and performs the necessary preprocessing for each model.
To help the detection, set the appropriate `dtypes` in the input `pandas.DataFrame`.
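For example, data read from a CSV file often stores every column as text. Below is a minimal sketch of declaring the intended types with pandas before fitting. The toy frame and its column names are made up for illustration, loosely modeled on the Adult dataset used above. Whether `autotraino` treats `category` columns differently from plain string columns is an assumption.

```python
import pandas as pd

# hypothetical toy frame: every column was read in as strings (e.g. from a CSV)
raw = pd.DataFrame(
    {
        "age": ["39", "50", "38"],
        "workclass": ["State-gov", "Self-emp-not-inc", "Private"],
        "hours_per_week": ["40", "13", "40"],
        "over_threshold": ["0", "0", "1"],
    }
)

# declare the intended type of every column explicitly:
# numeric features as integers, the nominal feature as a pandas categorical
typed = raw.astype(
    {
        "age": "int64",
        "workclass": "category",
        "hours_per_week": "int64",
        "over_threshold": "int64",
    }
)

print(typed.dtypes)
```

`category` is the usual pandas dtype for nominal features, while integer or float dtypes mark numeric ones. Columns left as generic `object` force any type detection to guess.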

# In the works
Future developments include:
 - Fitting arbitrary functions (via Ray Tune)
 - Fitting multi-output models