Metadata-Version: 2.1
Name: boxsers
Version: 1.3.1
Summary: Python package that provides a full range of functionality to process and analyze vibrational spectra (Raman, SERS, FTIR, etc.).
Home-page: https://github.com/ALebrun-108/BoxSERS
Author: Alexis Lebrun
Author-email: alexis.lebrun.1@ulaval.ca
License: MIT
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE.txt


----
**BoxSERS** is a complete, ready-to-use Python library for applying data augmentation, dimensionality reduction, spectral correction, machine learning and other methods specially designed and adapted for vibrational spectra (Raman, FTIR, SERS, etc.).

## Table of contents
* [BoxSERS Installation](#boxsers-installation)
* [Requirements](#requirements)
* [Included Features](#included-features)
  * [Module misc_tools](#module-boxsersmisc_tools)
  * [Module visual_tools](#module-boxsersvisual_tools)
  * [Module preprocessing](#module-boxserspreprocessing)
  * [Module data_augmentation](#module-boxsersdata_augmentation)
  * [Dimensional Reduction](#dimensional-reduction)
  * [Unsupervised Machine Learning](#unsupervised-machine-learning)
  * [Supervised Machine Learning](#supervised-machine-learning)
  * [Module validation_metrics](#module-validation_metrics)
* [License](#license)


## BoxSERS Installation

From PyPI
```bash
pip install boxsers
```

From GitHub
```bash
pip install git+https://github.com/ALebrun-108/BoxSERS.git
```

### Requirements
The main packages required to run the code are listed below:

* scikit-learn
* SciPy
* NumPy
* pandas
* Matplotlib
* TensorFlow (GPU or CPU)

Labels associated with spectra can take one of the following three forms:

| Label Type    | Examples                              |
| ------------- | ------------------------------------- |
| Text          | Cholic, Deoxycholic, Lithocholic, ... |
| Integer       | 0, 3, 1, ...                          |
| Binary        | [1 0 0 0], [0 0 0 1], [0 1 0 0], ...  |

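The three label forms carry the same information. As a rough illustration (plain NumPy, not a BoxSERS function; all variable names and values here are hypothetical), text labels can be mapped to integers and then to binary (one-hot) vectors, and back:

```python
import numpy as np

# Text labels, one per spectrum (hypothetical example values)
lab_text = np.array(["Cholic", "Lithocholic", "Cholic", "Deoxycholic"])

# Text -> integer: np.unique sorts the class names and returns, for each
# label, the index of its class in that sorted list
classnames, lab_int = np.unique(lab_text, return_inverse=True)

# Integer -> binary (one-hot): row i of the identity matrix encodes class i
lab_bin = np.eye(len(classnames), dtype=int)[lab_int]

# Binary -> integer -> text (inverse conversions)
lab_int_back = np.argmax(lab_bin, axis=1)
lab_text_back = classnames[lab_int_back]
```

Note that the integer codes depend on the alphabetical order of the class names, so the same `classnames` array should be reused whenever labels are converted back to text.
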
## Included Features
___


### Module ``boxsers.misc_tools``
This module provides functions for a variety of utilities.

* **data_split :** Randomly splits an initial set of spectra into two new subsets, referred to in this
    function as subset A and subset B.


* **load_rruff :** Exports a subset of Raman spectra from the RRUFF database in the form of three related lists
    containing Raman shifts, intensities and mineral names.
  
### Module ``boxsers.visual_tools``
This module provides different tools to visualize vibrational spectra quickly.

* **spectro_plot :** Returns a plot of the selected spectrum or spectra.


* **random_plot :** Plots a number of randomly selected spectra from a set of spectra.


* **distribution_plot :** Returns a bar plot that represents the distribution of spectra across the classes in
    a given set of spectra.
  
```python
# Code example:
from boxsers.misc_tools import data_split
from boxsers.visual_tools import spectro_plot, random_plot, distribution_plot

# Assumes the following arrays are already loaded:
#   spec = spectra array, lab = associated labels, wn = Raman shift column

# randomly splits the spectra (spec) and the labels (lab) into training and test subsets
(spec_train, spec_test, lab_train, lab_test) = data_split(spec, lab, b_size=0.4)
# resulting train|test set proportions = 0.6|0.4

# plots the class distribution within the training set
distribution_plot(lab_train, title='Train set distribution')

random_plot(wn, spec, random_spectra=4)  # plots 4 randomly selected spectra
spectro_plot(wn, spec[0], spec[2])  # plots the first and third spectra
```
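
To make the effect of `b_size` concrete, here is a minimal NumPy sketch of a random two-way split. It only illustrates the idea: `split_sketch` is a hypothetical helper written for this example and is not the `data_split` implementation.

```python
import numpy as np

def split_sketch(spec, lab, b_size=0.4, seed=None):
    """Randomly splits spectra and labels into subsets A and B (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = spec.shape[0]
    order = rng.permutation(n)        # random ordering of the spectrum indices
    n_b = int(round(b_size * n))      # number of spectra sent to subset B
    idx_b, idx_a = order[:n_b], order[n_b:]
    return spec[idx_a], spec[idx_b], lab[idx_a], lab[idx_b]

# 10 synthetic "spectra" of 5 points each, with integer labels
spec_demo = np.arange(50, dtype=float).reshape(10, 5)
lab_demo = np.arange(10)
spec_a, spec_b, lab_a, lab_b = split_sketch(spec_demo, lab_demo, b_size=0.4, seed=0)
print(spec_a.shape, spec_b.shape)  # (6, 5) (4, 5)
```

Spectra and labels are indexed with the same permutation, so each spectrum stays paired with its label in both subsets.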

### Module ``boxsers.preprocessing``
This module provides multiple functions to preprocess vibrational spectra. These features
improve spectrum quality and can improve performance for machine learning applications.

* **baseline_subtraction** : Subtracts the baseline signal from the spectrum or spectra using
  Asymmetric Least Squares (ALS) estimation.


* **intensity_normalization** : Normalizes the spectrum or spectra using one of the norms available in this function.


* **savgol_smoothing** : Smooths the spectrum or spectra using a Savitzky-Golay polynomial filter.


* **spectral_cut** : Removes or sets to zero a delimited spectral region of the spectrum or spectra.


* **spline_interpolation** : Performs a one-dimensional spline interpolation of the spectra to resample
  them on a new x-axis.

```python
# Code example:
import numpy as np
from boxsers.preprocessing import baseline_subtraction, spectral_cut, intensity_normalization, spline_interpolation

# Assumes spec (spectra array) and wn (Raman shift column) are already loaded.
# Each call below is shown independently on the original spectra (spec).

# interpolates the spectra with splines and converts them to a new Raman shift range (new_wn)
new_wn = np.linspace(500, 3000, 1000)
spec_cor = spline_interpolation(spec, wn, new_wn)
# removes the baseline signal estimated with the ALS method
(spec_cor, baseline) = baseline_subtraction(spec, lam=1e4, p=0.001, niter=10)
# normalizes each spectrum individually so that the maximum value equals one and the minimum value zero
spec_cor = intensity_normalization(spec)
# removes the part of the spectra delimited by the Raman shift values wn_start and wn_end
wn_start, wn_end = 1000, 1200  # hypothetical limits, chosen for illustration only
spec_cor, wn_cor = spectral_cut(spec, wn, wn_start, wn_end)
```
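
For readers unfamiliar with the method, the sketch below applies asymmetric least squares (ALS) baseline estimation followed by min-max normalization to a synthetic spectrum. It is a dense NumPy illustration of the general ALS idea and not the BoxSERS implementation: `als_baseline_sketch` and the synthetic data are hypothetical, and the parameter names `lam`, `p` and `niter` simply mirror the example above.

```python
import numpy as np

def als_baseline_sketch(y, lam=1e4, p=0.001, niter=10):
    """Estimates the baseline of a single spectrum y with asymmetric least squares."""
    n = y.size
    # second-order difference operator, shape (n - 2, n); penalizes curvature
    d2 = np.diff(np.eye(n), n=2, axis=0)
    penalty = lam * d2.T @ d2
    w = np.ones(n)
    for _ in range(niter):
        # weighted, smoothness-penalized least squares fit
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # points above the fit (likely peaks) get the small weight p,
        # points below it get the large weight 1 - p
        w = np.where(y > z, p, 1 - p)
    return z

# synthetic spectrum: one Gaussian peak on top of a sloped baseline
x = np.arange(200.0)
true_baseline = 0.5 + 0.002 * x
y = true_baseline + np.exp(-0.5 * ((x - 100) / 5) ** 2)

baseline = als_baseline_sketch(y)
corrected = y - baseline
# min-max normalization: minimum mapped to 0, maximum mapped to 1
normalized = (corrected - corrected.min()) / (corrected.max() - corrected.min())
```

A larger `lam` makes the estimated baseline stiffer, while a smaller `p` pushes it further below the peaks. The dense matrices keep the sketch short but scale poorly, so sparse matrices are the usual choice for long spectra.
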
### Module ``boxsers.data_augmentation``
This module provides several data augmentation methods that generate new spectra by adding
different variations to existing spectra.

* **aug_mixup** : Randomly generates new spectra by mixing together several spectra with a Dirichlet
    probability distribution. 


* **aug_noise** : Randomly generates new spectra with Gaussian noise added.


* **aug_multiplier** : Randomly generates new spectra with multiplicative factors applied.


* **aug_ioffset** : Randomly generates new spectra shifted in intensity.


* **aug_xshift** : Randomly generates new spectra shifted along the x-axis (wavenumber or Raman shift).


* **aug_linslope** : Randomly generates new spectra with additional linear slopes.

```python
# Code example:
import numpy as np
from sklearn.utils import shuffle
from boxsers.data_augmentation import aug_noise, aug_multiplier

# Assumes spec (spectra array) and lab (binary labels, 2D array) are already loaded.

# generates new spectra with added Gaussian noise
spec_nse, lab_nse = aug_noise(spec, lab, snr=10)
# generates new spectra with multiplicative factors applied
# (0.15 = limit of the multiplicative factor variation)
spec_mul, lab_mul = aug_multiplier(spec, lab, 0.15)

# stacks all generated spectra and originals in a single array
spec_aug = np.vstack((spec, spec_nse, spec_mul))
lab_aug = np.vstack((lab, lab_nse, lab_mul))

# spectra and labels are randomly mixed
spec_aug, lab_aug = shuffle(spec_aug, lab_aug)
```
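
The following NumPy sketch illustrates two of the augmentation ideas listed above: mixing spectra with Dirichlet-distributed weights and adding Gaussian noise. Both helpers (`mixup_sketch`, `noise_sketch`) and their parameters (`alpha`, `noise_std`) are hypothetical and written for this example only; they do not reproduce the exact behavior or signatures of `aug_mixup` and `aug_noise`.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup_sketch(spec, lab, quantity, alpha=0.5):
    """Creates `quantity` new spectra as Dirichlet-weighted mixtures of all input spectra."""
    # each row of `weights` is non-negative and sums to 1
    weights = rng.dirichlet(alpha * np.ones(spec.shape[0]), size=quantity)
    return weights @ spec, weights @ lab  # labels are mixed with the same weights

def noise_sketch(spec, noise_std=0.05):
    """Adds zero-mean Gaussian noise, scaled by each spectrum's standard deviation."""
    scale = noise_std * spec.std(axis=1, keepdims=True)
    return spec + rng.normal(size=spec.shape) * scale

# three synthetic spectra (rows) with binary (one-hot) labels
spec_demo = np.array([[0.0, 1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 0.0],
                      [0.0, 0.0, 0.0, 1.0]])
lab_demo = np.eye(3)

spec_mix, lab_mix = mixup_sketch(spec_demo, lab_demo, quantity=5)
spec_noisy = noise_sketch(spec_demo)
```

Because the labels are mixed with the same weights as the spectra, each mixed label is a vector of class proportions rather than a strict one-hot vector.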

### Module ``boxsers.dimension_reduction``
This module provides different techniques to perform dimensionality reduction of
vibrational spectra.

* **SpectroPCA** : Principal Component Analysis (PCA) model object.


* **SpectroFA** : Factor Analysis (FA) model object.


* **SpectroICA** : Independent Component Analysis (ICA) model object.
  
### Dimensional Reduction
```python
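# Code example:

from boxsers.dimension_reduction import SpectroPCA, SpectroFA, SpectroICA

# Assumes spec_train, spec_test, lab_test, wn and classnames (list of class names) are already defined.

pca_model = SpectroPCA(n_comp=50)  # PCA model with 50 components
pca_model.fit_model(spec_train)  # fits the model on the training spectra
# scatter plot of the test spectra projected on components 1 and 2
pca_model.scatter_plot(spec_test, lab_test, targets=classnames, component_x=1, component_y=2)
pca_model.component_plot(wn, component=2)  # plots component 2 as a function of wn
spec_pca = pca_model.transform_spectra(spec_test)  # test spectra projected into the component space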
```
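
As a rough picture of what a PCA model computes, the sketch below performs PCA with a plain NumPy singular value decomposition on synthetic spectra. `pca_sketch` and the synthetic data are hypothetical and only illustrate the technique; they are not taken from the `SpectroPCA` implementation.

```python
import numpy as np

def pca_sketch(spec, n_comp):
    """Returns the mean spectrum, the first n_comp components and their explained variance ratios."""
    mean = spec.mean(axis=0)
    # SVD of the mean-centered data; rows of vt are the principal components
    _, s, vt = np.linalg.svd(spec - mean, full_matrices=False)
    var_ratio = s**2 / np.sum(s**2)
    return mean, vt[:n_comp], var_ratio[:n_comp]

rng = np.random.default_rng(0)
wn_demo = np.linspace(500, 3000, 200)
peak_a = np.exp(-0.5 * ((wn_demo - 1000) / 30) ** 2)
peak_b = np.exp(-0.5 * ((wn_demo - 2000) / 30) ** 2)
# 40 synthetic spectra: random mixtures of two peaks plus a little noise
conc = rng.uniform(0, 1, size=(40, 2))
spec_demo = conc @ np.vstack((peak_a, peak_b)) + 0.01 * rng.normal(size=(40, 200))

mean, components, var_ratio = pca_sketch(spec_demo, n_comp=2)
scores = (spec_demo - mean) @ components.T  # coordinates of each spectrum in component space
```

Since the synthetic spectra are built from only two independent peaks, the first two components capture nearly all of the variance.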



### Unsupervised Machine Learning 
```python
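# Code example:

from boxsers.machine_learning import SpectroGmixture, SpectroKmeans

# Assumes spec_train and spec_test are already defined.

kmeans_model = SpectroKmeans(n_cluster=5)  # K-means model with 5 clusters
kmeans_model.fit_model(spec_train)  # fits the clusters on the training spectra
kmeans_model.scatter_plot(spec_test)  # scatter plot of the test spectra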
```
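
For intuition about the clustering step, here is a minimal NumPy sketch of k-means (Lloyd's algorithm) on synthetic spectra. `kmeans_sketch` is a hypothetical helper for this example, and its deterministic farthest-point initialization is a simplification chosen for reproducibility; it is not how `SpectroKmeans` is implemented.

```python
import numpy as np

def kmeans_sketch(spec, n_cluster, n_iter=20):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    # start from the first spectrum, then repeatedly add the spectrum that is
    # farthest from all centroids chosen so far
    centroids = [spec[0]]
    for _ in range(n_cluster - 1):
        dist = np.min([np.linalg.norm(spec - c, axis=1) for c in centroids], axis=0)
        centroids.append(spec[np.argmax(dist)])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # assignment step: index of the nearest centroid for each spectrum
        dist = np.linalg.norm(spec[:, None, :] - centroids[None, :, :], axis=2)
        clusters = np.argmin(dist, axis=1)
        # update step: each centroid becomes the mean of its assigned spectra
        for k in range(n_cluster):
            if np.any(clusters == k):
                centroids[k] = spec[clusters == k].mean(axis=0)
    return clusters, centroids

# two synthetic groups of 10 spectra, each with a peak at a different position
rng = np.random.default_rng(0)
x = np.arange(100.0)
group_a = np.exp(-0.5 * ((x - 30) / 4) ** 2) + 0.02 * rng.normal(size=(10, 100))
group_b = np.exp(-0.5 * ((x - 70) / 4) ** 2) + 0.02 * rng.normal(size=(10, 100))
spec_demo = np.vstack((group_a, group_b))

clusters, centroids = kmeans_sketch(spec_demo, n_cluster=2)
```

No labels are used at any point: the two groups are recovered from the spectral shapes alone.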

### Supervised Machine Learning 
* Convolutional Neural Network (3 x 1D convolutional layers, 2 x dense layers)
```python
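# Code example (dimensionality reduction step, here with ICA).
# Module path and method names follow the boxsers.dimension_reduction example above.
from boxsers.dimension_reduction import SpectroPCA, SpectroFA, SpectroICA

# Assumes x_train, x_test, y_test, wn and classnames are already defined.
ica_model = SpectroICA(n_comp=50)
ica_model.fit_model(x_train)
ica_model.scatter_plot(x_test, y_test, targets=classnames, component_x=1, component_y=2)
ica_model.component_plot(wn, component=2)
x_ica = ica_model.transform_spectra(x_train)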
```

### Module ``validation_metrics``
This module provides different tools to evaluate the quality of a model's predictions.

* **cf_matrix** : Returns a confusion matrix (built with scikit-learn) computed on a given set of spectra.


* **clf_report** : Returns a classification report computed on a given set of spectra.

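To show what a confusion matrix contains, the sketch below builds one by hand with NumPy from integer labels. `confusion_sketch` and the example labels are hypothetical; the row/column layout (rows are true classes, columns are predicted classes) follows the scikit-learn convention.

```python
import numpy as np

def confusion_sketch(lab_true, lab_pred, n_class):
    """Counts spectra per (true class, predicted class) pair; rows are true classes."""
    matrix = np.zeros((n_class, n_class), dtype=int)
    for t, p in zip(lab_true, lab_pred):
        matrix[t, p] += 1
    return matrix

lab_true = np.array([0, 0, 1, 1, 2, 2])
lab_pred = np.array([0, 1, 1, 1, 2, 0])

cm = confusion_sketch(lab_true, lab_pred, n_class=3)
accuracy = np.trace(cm) / cm.sum()
# per-class recall: correct predictions divided by the number of true members
recall = np.diag(cm) / cm.sum(axis=1)
print(cm)
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]
```

The diagonal holds the correctly classified spectra, so overall accuracy is the trace divided by the total count (4 of 6 here).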

