Metadata-Version: 2.1
Name: SCYN
Version: 1.0.6
Summary: SCYN: Single cell CNV profiling method using dynamic programming
Home-page: https://github.com/xikanfeng2/SCYN
Author: Xikang Feng
Author-email: xikanfeng2@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown

# SCYN: Single cell CNV profiling method using dynamic programming

SCYN: Single cell CNV profiling method using dynamic programming


## Pre-requirements
* python3
* numpy>=1.16.1
* pandas>=0.23.4,<0.24
* tasklogger>=0.4.0
* scipy>=1.3.0
* pysam>=0.15.3
* [SCOPE](https://github.com/rujinwang/SCOPE)


### install requirements
```Bash
pip install -r requirements.txt
```
To install R package SCOPE, please refer to the README of [SCOPE](https://github.com/rujinwang/SCOPE). SCYN integrates the SCOPE to get the cell-by-bin reads depth matrix and perform the normalization. SCYN mainly focuses on finding the optimal CNV segmentation profiling using dynamic programming.

## Installation

### Installation with pip
To install with pip, run the following from a terminal:
```Bash
pip install scyn
```

### Installation from Github
To clone the repository and install manually, run the following from a terminal:
```Bash
git clone https://github.com/xikanfeng2/SCYN.git
cd SCYN
python setup.py install
```

## Usage

### Quick start
The following code runs SCYN.

In command line:
```shell
usage: python run-scyn.py [-h] [options] -i input_bams_dir

SCYN: Single cell CNV profiling method using dynamic programming efficiently
and effectively

required arguments:
  -i, --indir   <str> the input bams directory (default: None)

optional arguments:
  -o, --outdir  <str> the output directory (default: ./)
  --seq           <str> the reads type: single-end or paired-end. (default:
                    single-end)
  --bin_len       <int> the bin length, default is 500K. (default: 500)
  --ref           <str> the reference genome version: hg19 or hg38.
                    (default: hg19)
  --reg           <str> the regular expression to match all BAM files in
                    your input directory. For example, ".bam" will match all
                    BAM files ended with '.bam'. (default: *.bam)
  --mapq          <int> the mapping quality cutoff when calculating the
                    reads coverage. (default: 40)
  --verbose       <int> If > 0, print log messages. (default: 1)
  -h, --help
```

In Python:
```Python
import scyn

# create SCYN object
scyn_operator = scyn.SCYN()

# call cnv
# bam_dir is the input bam directory and output_dir is the output directory
scyn_operator.call(bam_dir, output_dir)

# store cnv matrix to a csv file
scyn_operator.cnv.to_csv('your file name')
```

For 10X merged BAM(One bam file), SCYN provides the function to split merged bam to cell bams based on the barcodes.

```Python
import scyn
scyn.demultiplex_10X_bam(info_file, bam_file, out_dir)
```
This function demultiplexs 10X merged bam file according to barcode
Parameters:
 - info_file : the sample summary info file. Please refer to the 10X websites [breast_tissue_A_2k_per_cell_summary_metrics.csv](http://cf.10xgenomics.com/samples/cell-dna/1.1.0/breast_tissue_A_2k/breast_tissue_A_2k_per_cell_summary_metrics.csv)
 - bam_file : the merged bam file path.
 - out_dir : output directory. The splited bams will be saved in this directory, named as `cell-barcode`.bam. cell-barcode is the barcode of each cell.


### SCYN attributes
```Python
scyn_operator = scyn.SCYN()
```
 - `scyn_operator.cnv` is the copy number variants matrix.
 - `scyn_operator.segments` is the segments for each chromosome.
 - `scyn_operator.meta_info` is the meta information of cells, include gini and ploidy.



### SCYN Output Format
The output of `SCYN` consits of two cnv files and one meta file. 

 - `cnv.csv`: with cell as row and bin as column. This file can be used as the input of Oviz-SingleCell CNV analysis.
 - `cnv_T.csv`: with bin as column and cell as row, it is the transpose matrix of `cnv.csv`. This file can be parse by popular R packages like [`ExpressionSet`](https://www.bioconductor.org/packages/release/bioc/vignettes/Biobase/inst/doc/ExpressionSetIntroduction.pdf) for downstream analysis.
 - `segments.csv` is the cnv segments information for each chromosome.
 - `meta.csv`: with cell as row, and meta information as column. The default meta information is:
   + `c_gini`: stores the gini coeficient of each cell.
   + `c_ploidy`: stores the mean ploidy of each cell, it is calculated from `cnv.csv` (not the one SCOPE provide).

   User can manually add extra cell meta information like 'cell_type', 'cluster', or 'group' for downstream analysis. Prefix `c` here denotes numeric continuous value. The absence of prefix `c` denotes category meta information like 'group' or 'cluster'.

### Parameters
```Python
SCYN(seq='single-end', bin_len=500, ref='hg19', reg='*.bam', mapq=40, verbose=1)
```
Parameters

* seq : string, optional, default: single-end
    The reads type: single-end or paired-end

* bin_len : int, optional, default: 500
    The bin length, default is 500K

* ref : string, optional, default: hg19
    The reference genome version: hg19 or hg38

* reg : string, optional, default: *.bam
    The regular expression to match all BAM files in your input directory.
    For example, "*.bam" will match all BAM files ended with '.bam'

* mapq : int, optional, default: 40
    The mapping quality cutoff when calculating the reads coverage


* verbose : `int` or `boolean`, optional, default: 1

    If `True` or `> 0`, print status messages

## Cite us

## Help
If you have any questions or require assistance using SCYN, please contact us with xikanfeng2@gmail.com.

