Metadata-Version: 2.1
Name: PICor
Version: 0.6.1
Summary: Isotope correction for MS data
Home-page: https://github.com/MolecularBioinformatics/PICor
Author: Jørn Dietze
Author-email: jorn.dietze@uit.no
License: gpl3
Platform: any
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python
Requires-Python: >=3.6
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Requires-Dist: pandas
Requires-Dist: scipy
Requires-Dist: docopt
Requires-Dist: openpyxl
Provides-Extra: testing
Requires-Dist: pytest ; extra == 'testing'
Requires-Dist: pytest-cov ; extra == 'testing'

# PICor: Statistical Isotope Correction

PICor is a python package for correcting mass spectrometry data for the effect of natural isotope abundance.


## Description

PICor takes pandas DataFrames of the measured integrated MS intensities as input, corrects them for natural isotope abundance and returns a DataFrame again.

PICor can also correct for overlapping isotopologues due to too low resoltion. For example, the 13-C4 and 2-H4 isotopologues of the metabolite NAD can't be resolved at a resolution of 60,000 at 200 m/z.

## Installation

To install:

```bash
$ pip install picor
```

PICor depends on `docopt`, `pandas`, `openpyxl` and `scipy` and installs those with pip if not available.

## Usage

You can use PICor in two ways:

### Command Line

After the installation you can use PICor anywhere from the command line with `picor`.

```bash
picor tests/test_dataset.xlsx NAD -x "dummy column int" -x "dummy column str"
```

Files with `.csv` or `.xlsx` suffix can be used as input files.
You can choose the output file (in csv format) with the `-o` option.
If no output file is given, output will be printed to stdout.

`picor -h` shows all options.

### Python Module

After importing PICor and loading your data (for example a csv file) with pandas you the correction works with:

```python
import pandas as pd
import picor

raw_data = pd.DataFrame(
    {
        "No label": {0: 100, 1: 200, 2: 300, 3: 400, 4: 500, 5: 600},
        "1C13": {0: 100, 1: 100, 2: 100, 3: 100, 4: 100, 5: 100},
        "4C13 6H02 3N15": {0: 30, 1: 40, 2: 50, 3: 60, 4: 70, 5: 80},
        "dummy column str": {0: "C", 1: "ER", 2: "C", 3: "ER", 4: "C", 5: "ER"},
    }
)
corr_data = picor.calc_isotopologue_correction(
    raw_data,
    molecule_name="NAD",
    exclude_col=["dummy column str"],
)
print(corr_data)
```

In case the DataFrame contains columns (except the index colum) with other data than raw measurements, you can use either the `subset` with a list of all columns to be used or `exclude_col` with a list of the column to be skipped.

You can activate a resolution depent correction by setting  `resolution_correction` to `True`. Specify the resolution and the reference m/z ratio with `resolution` and `mz_calibration`.


## Molecule Specification
The molecule to be corrected can either be specified by name (`molecule_name`) or by formula and charge (`molecule_formula` and `molecule_charge`).
If a name is used it has to be specified in the file specified by `molecules_file` (file path). 

The molecule file has to be tab-separated with the columns `name`, `formula` and `charge`. The column labels have to match exactly. Look at the example file ìn `src/picor/metabolites.csv`.

## Input Data
Using the command line interface `picor` both excel (`.xlsx`) and comma-separated data (`.csv`) can be corrected.

Both data formats should be arranged with the different samples as rows and different labels/isotopologues as columns.
Additional columns with for example more information about the samples have to be added to the excluded columns. Either with `-x` (command line) or `exclude_col` parameter as a list (python interface).

### Example Data

| sample number | No label | 1C13 | 4C13 6H02 3N15 | sample condition |
|:------------- | -------- | ---- | -------------- | ---------------- |
| 0             | 100      | 100  | 30             | C                |
| 1             | 200      | 100  | 40             | ER               |
| 2             | 300      | 100  | 50             | C                |
| 3             | 400      | 100  | 60             | ER               |
| 4             | 500      | 100  | 70             | C                |
| 5             | 600      | 100  | 80             | ER               |


### Label Specification

The isotopologues or labels are specified as string in the table header (first line).
Labels can include either one or multiple isotopes or 'No label'.
The number of labeled atoms of each isotope has to be specified before the element, e.g. '3H02 2C13' for three deuterium and two 13-C atoms.
Spaces and underscores are allowed but not necessary in the label definition.

In case you want to add additional information you can a prefix separated by a colon (':'), e.g. "NAD:2C13". The prefix will be ignored. 


Jørn Dietze, UiT -The Arctic University of Norway, 2021


