Metadata-Version: 2.1
Name: aursad
Version: 0.2.4
Summary: Accompanying library to the AURSAD dataset
Home-page: https://github.com/CptPirx/robo-package
Author: Błażej Leporowski
Author-email: bleporowski@outlook.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3.6
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: Microsoft :: Windows :: Windows 10
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: tqdm
Requires-Dist: sklearn
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: tensorflow

# AURSAD 
<div align="left">

[comment]: <> (  <a href='https://ride.readthedocs.io/en/latest/?badge=latest'>)

[comment]: <> (      <img src='https://readthedocs.org/projects/ride/badge/?version=latest' alt='Documentation Status' height="20"/>)

[comment]: <> (  </a>)

[comment]: <> (  <a href="https://codecov.io/gh/LukasHedegaard/ride">)

[comment]: <> (    <img src="https://codecov.io/gh/LukasHedegaard/ride/branch/main/graph/badge.svg?token=SJ59JOWNAC" height="20"/>)

[comment]: <> (  </a>)
  <a href="https://opensource.org/licenses/MIT">
    <img src="https://img.shields.io/badge/License-MIT-yellow.svg" height="20">
  </a>
</div>

A python library for the AURSAD dataset. 
Detailed [description](https://arxiv.org/abs/2102.01409) of the dataset and [download](https://zenodo.org/record/4487073).

The library contains several useful functionalities for preprocessing the dataset for ML applications:
* Creating numpy training and test datasets for sampled data
* Creating [Keras TimeSeries generators](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/sequence/TimeseriesGenerator) 
  for sliding window data
  * Prediction and classification mode
    * In the default prediction mode, the target label of  window is the label of the next sample. This can be
      used to train a sliding window model that predicts the class of the next sample based on the window.
    * In the classification mode, the target label of a window is the most common label in that window.  
* Filtering the dataset
* Removing undesired columns as outlined in the paper
* 3 different types of labeling
    * Full sample labeling where loosening and tightening motions are labeled together
    * Separate sample labeling where loosening motion is given its own label
    * 'Tighten' sample labeling, when only the tightening parts of the whole process are labeled as normal/anomalies, 
      loosening and movement parts of the motion get its own separate labels
* Binary labels -> every anomaly is given the same label
* Subsampling the data
* Dimensionality reduction using PCA or ANOVA F-values
* One-hot label encoding
* Zero padding the samples to equalise their length
* Z-score standardisation
* Taking data only from screwdriver sensors

### Dataset
The dataset contains 2045 samples in total. The robot was sampled with frequency of 100 Hz.

| Type                     | Label | Samples | %  |
|--------------------------|-------|---------|----|
| Normal operation         | 0     | 1420    | 69 |
| Damaged screw            | 1     | 221     | 11 |
| Extra assembly component | 2     | 183     | 9  |
| Missing screw            | 3     | 218     | 11 |
| Damaged thread samples   | 4     | 3       | 0  |

Additionally, there are 2049 supplementary samples describing the loosening/screw picking motion, labeled 5.

## Installation
AURSAD has been tested on Windows 10 and Python 3.8.

### PIP installation
To install from pip with required dependencies use:
```bash
pip install aursad
```
### Source installation
To install latest version from github, clone the source from the project repository and install with setup.py:
```bash
git clone https://github.com/CptPirx/robo-package
cd robo-package
python setup.py install --user
```
## Instructions

The package presents to user two methods: get_dataset_numpy() and get_dataset_generator().

#### Sampling
```bash
def get_dataset_numpy(path, onehot_labels=True, reduce_dimensionality=False, reduce_method='PCA', n_dimensions=60,
                      subsample_data=True, subsample_freq=2, train_size=0.7, random_state=42, normal_samples=1,
                      damaged_samples=1, assembly_samples=1, missing_samples=1, damaged_thread_samples=0,
                      loosening_samples=1, move_samples=1, drop_extra_columns=True, pad_data=True,
                      label_type='partial', binary_labels=False, standardize=False, screwdriver_only=False):
"""
    Create numpy dataset from input file

    :param assembly_samples: float,
        percentage of extra assembly samples to take
    :param binary_labels: bool,
        if True all anomalies are labeled the same
    :param damaged_samples: float,
        percentage of damaged samples to take
    :param damaged_thread_samples: float,
        percentage of damaged thread samples to take
    :param drop_extra_columns: bool,
        drop the extra columns as outlined in the paper
    :param label_type: string,
        'full', 'partial' or 'tighten'
    :param loosening_samples: float,
        percentage of loosening samples to take
    :param missing_samples: float,
        percentage of missing samples to take
    :param move_samples: float,
        percentage of movement samples to take
    :param n_dimensions: int,
        the target number of dimensions
    :param normal_samples: float,
        percentage of normal samples to take
    :param onehot_labels: bool,
        output onehot encoded labels
    :param pad_data: bool,
        if True pad data to equal length samples, if False return data in continuous form
    :param path: path to the data
    :param random_state: int,
        random state for train_test split
    :param reduce_dimensionality: bool,
        reduce dimensionality of the dataset
    :param reduce_method: string,
        dimensionality reduction method to be used
    :param screwdriver_only: bool,
        take only the 4 dimensions from the screwdriver sensors
    :param standardize: bool,
        if True apply z-score standardisation
    :param subsample_data: bool,
        reduce number of events by taking every subsample_freq event
    :param subsample_freq: int,
        the frequency of subsampling
    :param train_size: float,
        percentage of data as training data

    :return: 4 np arrays,
        train and test data & labels
    """
```

Sample usage:
```bash
import aursad

data_path = 'C:/Users/my_path/robot_data.h5'

train_x, train_y, test_x, test_y = aursad.get_dataset_numpy(data_path)
```

#### Sliding window


```bash
def get_dataset_generator(path, window_size=100, reduce_dimensionality=False, reduce_method='PCA', n_dimensions=60,
                          subsample_data=True, subsample_freq=2, train_size=0.7, random_state=42, normal_samples=1,
                          damaged_samples=1, assembly_samples=1, missing_samples=1, damaged_thread_samples=0,
                          loosening_samples=1, move_samples=1,drop_extra_columns=True, label_type='partial',
                          batch_size=256, binary_labels=False, standardize=False, screwdriver_only=False,
                          onehot_labels=True):
    """
    Create Keras sliding window generator from input file

    :param assembly_samples: float,
        percentage of extra assembly samples to take
    :param batch_size: int,
        batch size for the sliding window generator
    :param binary_labels: bool,
        if True all anomalies are labeled the same
    :param damaged_samples: float,
        percentage of damaged samples to take
    :param damaged_thread_samples: float,
        percentage of damaged thread samples to take
    :param drop_extra_columns: bool,
        drop the extra columns as outlined in the paper
    :param label_type: string,
        'full', 'partial' or 'tighten'
    :param loosening_samples: float,
        percentage of loosening samples to take
    :param missing_samples: float,
        percentage of missing samples to take
    :param move_samples: float,
        percentage of movement samples to take
    :param n_dimensions: int,
        the target number of dimensions
    :param normal_samples: float,
        percentage of normal samples to take
    :param onehot_labels: bool,
        output onehot encoded labels
    :param path: path to the data
    :param prediction_mode: bool,
        if True the target of a window [x_0, x_100] is label of x_101, if False, the target is the most common label in [x_0, x_100]
    :param random_state: int,
        random state for train_test split
    :param reduce_dimensionality: bool,
        reduce dimensionality of the dataset
    :param reduce_method: string,
        dimensionality reduction method to be used
    :param screwdriver_only: bool,
        take only the 4 dimensions from the screwdriver sensors
    :param standardize: bool,
        if True apply z-score standardisation
    :param subsample_data: bool,
        reduce number of events by taking every subsample_freq event
    :param subsample_freq: int,
        the frequency of subsampling
    :param train_size: float,
        percentage of data as training data
    :param window_size: int,
        size of the sliding window

    :return: 4 np arrays,
        train and test data & labels
    :return: keras TimeSeries generators,
        train and test generators
    """
  ```

Sample usage:
```bash
import aursad

data_path = 'C:/Users/my_path/robot_data.h5'

train_x, train_y, test_x, test_y, train_generator, test_generator = aursad.get_dataset_generator(data_path)
```

