Metadata-Version: 2.1
Name: adult-dataset
Version: 1.0.0
Summary: PyTorch dataset wrapper for the Adult (Census Income) dataset.
Keywords: PyTorch,dataset,Adult,Census Income
Author-email: David Boetius <david.boetius@uni-konstanz.de>
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Dist: torch >=1.8, <3.0
Requires-Dist: numpy >=1.18, <2.0
Requires-Dist: pandas >=1.0, <3.0
Requires-Dist: requests >=2.23, <3.0
Requires-Dist: flit==3.9.0 ; extra == "develop"
Requires-Dist: black==23.7.0 ; extra == "develop"
Requires-Dist: pytest >=7.4, <8.0 ; extra == "test"
Requires-Dist: nox==2023.4.22 ; extra == "test"
Project-URL: Bug Tracker, https://github.com/cherrywoods/adult-dataset/issues
Project-URL: Homepage, https://github.com/cherrywoods/adult-dataset
Project-URL: Repository, https://github.com/cherrywoods/adult-dataset.git
Provides-Extra: develop
Provides-Extra: test

# adult-dataset
A PyTorch dataset wrapper for the 
[Adult (Census Income)](https://archive.ics.uci.edu/dataset/2/adult) dataset.
Adult is a popular dataset in machine learning fairness research. 

This package contains a single class, `adult.Adult`,
a `torch.utils.data.Dataset` that loads and, optionally, downloads the
Adult dataset.
This class can be used like the `MNIST` dataset in
[torchvision](https://pytorch.org/vision/stable/generated/torchvision.datasets.MNIST.html?highlight=mnist#torchvision.datasets.MNIST).

## Installation
```shell
pip install adult-dataset
```

## Basic Usage
```python
from adult import Adult

# load (if necessary, download) the Adult training dataset
train_set = Adult(root="datasets", download=True)
# load the test set
test_set = Adult(root="datasets", train=False, download=True)

inputs, target = train_set[0]  # retrieve the first sample of the training set

# iterate over the training set
for inputs, target in train_set:
    ...  # Do something with a single sample

# use a PyTorch data loader
from torch.utils.data import DataLoader

loader = DataLoader(test_set, batch_size=32, shuffle=True)
for epoch in range(100):
    for inputs, targets in loader:
        ...  # Do something with a batch of samples
```
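
The indexing and iteration shown above rely on the map-style dataset protocol (`__getitem__` and `__len__`). The sketch below illustrates that protocol with a hypothetical `ToyDataset` stand-in and made-up values. It is not `adult.Adult` and does not reflect the real feature layout; it only runs without PyTorch or a download.

```python
class ToyDataset:
    """Hypothetical stand-in for a map-style dataset; not part of adult-dataset."""

    def __init__(self, samples):
        # Each sample is an (inputs, target) pair, mirroring the usage above.
        self._samples = list(samples)

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, index):
        # The list raises IndexError past the end, which ends a plain for-loop.
        return self._samples[index]


# Made-up values, chosen only for illustration.
toy_set = ToyDataset([([0.1, 0.2], 0), ([0.3, 0.4], 1), ([0.5, 0.6], 1)])

inputs, target = toy_set[0]  # index a single sample
print(len(toy_set), inputs, target)

# Iteration falls back to __getitem__ with indices 0, 1, 2, ...
targets = [t for _, t in toy_set]
print(targets)
```

A `DataLoader` builds on the same two methods: it draws indices, calls `__getitem__` for each, and collates the results into a batch.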

## Advanced Usage

Turn off status messages while downloading the dataset:
```python
Adult(root=..., output_fn=None)
```

Log status messages with the `logging` module instead of printing them to
`sys.stdout`:
```python
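import logging

# show INFO-level messages (the root logger defaults to WARNING)
logging.basicConfig(level=logging.INFO)

Adult(root=..., output_fn=logging.info)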
```
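
To keep download messages separate from the rest of an application's logs, `output_fn` could also be a small wrapper around a dedicated logger. The following is a minimal sketch, assuming `output_fn` is called with a single message string (as the `logging.info` example suggests). The logger name `adult.download` and the helper `log_status` are hypothetical and not part of this package.

```python
import logging

# Hypothetical logger name; pick whatever fits your project.
logger = logging.getLogger("adult.download")
logger.setLevel(logging.INFO)  # INFO records are dropped at the default WARNING level

# Attach a handler so the messages are formatted and written to stderr.
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(asctime)s %(name)s: %(message)s"))
logger.addHandler(handler)


def log_status(message):
    """Forward one status message to the dedicated logger (assumes a single string)."""
    logger.info(message)


# Stand-in for a message emitted during download.
log_status("example status message")

# Assumed usage, mirroring the snippets above:
# Adult(root=..., output_fn=log_status)
```

Any callable that accepts the message would work the same way under this assumption, for example `messages.append` to collect the messages in a list.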

