Metadata-Version: 2.1
Name: SureTypeSC
Version: 0.3.0
Summary: SureTypeSC - software for improved genotyping in the single cell environment
Home-page: https://github.com/Meiomap/SureTypeSC_2
Author: Lishan Cai
Author-email: shanshancai@hotmail.com
License: GNU
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >= 2.7.5
Description-Content-Type: text/markdown
Requires-Dist: pandas (>=0.22.0)
Requires-Dist: tabulate (>=0.8.3)
Requires-Dist: scikit-learn (>=0.20.4)
Requires-Dist: numpy (>=1.16.4)
Requires-Dist: IlluminaBeadArrayFiles
Requires-Dist: json-tricks (==3.13.1)

# SureTypeSC
SureTypeSC is implementation of algorithm for regenotyping of single cell data.

## Getting Started




```
pip install SureTypeSC
```

### Prerequisites
* python 2 (tested on Python 2.7.5)
* scikit = v0.19.1 (http://scikit-learn.org/stable/)
* numpy >= v1.14.1 (http://www.numpy.org/)
* pandas >= v0.22.0 (https://pandas.pydata.org/)
* IlluminaBeadArrayFiles >= 1.2.0
* json_tricks >= v3 (https://pypi.org/project/json_tricks/)
* tabulate >= v0.8.3(https://pypi.org/project/tabulate/)

### Usage

* create genome studio file (include name,chromosome,position, genotype, gencall score, x raw intensities, x normalized intensities, y raw instensities and y normalized intensities) [format, pandas dataframe]

```
import SureTypeSC as sc

df = sc.basic("HumanKaryomap-12v1_A.bpm","HumanKaryomap-12v1_A.egt","Samplesheetr.csv")


Parameters:The path of manifest file, the path of cluster file, the path of samplesheet file; 
The template of sample sheet is in Samplesheetr.csv
```

* return call rate of all samples over all chromosomes

```
import SureTypeSC.genome_library as gl

call_rate = gl.callrate(df,th=0.01)

call_rate is the data frame that includes the call rate over all the chromosomes

Parameters: df is the pandas data frame from basic function, th is the threshold based on the GenCall score.
```


* return call rate of all samples of one specific chromosome

```
call_rate_chrom = gl.callrate_chr(df,chr_name,th=0.01)

call_rate_chrom is the data frame that includes the call rate of one chromosome

Paramters: df is the pandas data frame from basic function; chr_name is the name of selected chromosome ('X'); th is the threshold based on the GenCall score
```

* return the M and A features of one locus

```
nc, ab, aa, bb = gl.locus_ma(df,locus_name)

nc, ab, aa and bb are pandas data frame with m and a features

Parameters: df is the pandas data frame from basic function; locus_name is the name of one specific locus ('rs3128117')
```

* return the M and A features of one chromsome of one sample

```
AM = gl.sample_ma(df,sample_name,chr_name)

A and M features of one chromsome of one sample

Parameters: df is the pandas data frame from basic function; sample_name is the name of the sample; chr_name is the name one chromosome
```

* return pca components of all samples over all chromosomes

```
pcs = gl.pca_samples(df,th=0.01,n=2)

pca is the pandas data frame that includes the first component and the second component.

Parameters: df is the pandas data frame from basic function; th is the threshold based on the GenCall score; n is the number of components

```

* return pca components of all samples of one specific chromosome

```
pcs_chr = gl.pca_chr(df,chr_name,th=0.01,n=2)

pcs_chr is the data frame that includes the first component and the second component of one chromosome

Parameters: df is the pandas data frame from basic function; chr_name is the name of the selective chromosome; th is the threshold based on the GenCall score; n is the number of components

```



* Index rearrangement (set index levels (including name chromosome and position))
```
dfs = sc.Data.create_from_frame(df)

dfs is Data type
```

* The attribute of Data type

```
dfs.restrict_chromosomes(['1','2']) (The parameters should be a list include the chromosome name)

dfs.apply_NC_threshold_3(0.01,inplace = True) (where 0.01 is the GenCall threshold)
```

* M,A calculation
```
dfs.calculate_transformations_2()
```
* Load classifier
```
from SureTypeSC import loader

clf = loader('clf_30trees_7228_ratio1_lightweight.clf')

clf_2 = loader('clf_GDA_7228_ratio1_58cells.clf') (input should be the path of classifier)
```
* predict

```
result_rf = clf.predict_decorate(dfs,clftype='rf',inn=['m','a'])  (test is the dataset,clftype is the short for classifier like 'rf' or 'gda'. inn is the input feature)

result_gda = clf.predict_decorate(result_rf,clftype='gda',inn=['m','a'])
```

* Train and predict

```
train = sc.Trainer(result_rf,clfname='gda',inner=['m','a'],outer='rf_ratio:1.0_pred')

train.train()

result_end = train.predict_decorate(result_gda,clftype='rf-gda',inn=['m','a'])
```


* save the result
```
result_end.save_complete_table('fulltable.txt',header=False)
```
* save the different modes

```
recall mode: result_end.save_mode('recall','recall.txt',header=False,ratio=1.0)

standard mode: result_end.save_mode('standard','st.txt',header=False,ratio=1.0)

precision mode: result_end.save_mode('precision','precision.txt',header=False,ratio=1.0)

customized saving: result_end.scsave('name.txt', header=True, clftype='rf',threshold=0.15)
```

```
The program enriches every sample in the input data by :

| Subcolumn name  | Meaning |
| ------------- | ------------- |
| rf_ratio:1_pred  | Random Forest prediction (binary)  |
| rf_ratio:1_prob  | Random Forest Score for the positive class |
| gda_ratio:1_prob | Gaussian Discriminant Analysis score for the positive class  | 
| gda_ratio:1_pred | Gaussian Disciminant Analysis prediction (binary) | 
| rf-gda_ratio:1_prob | combined 2-layer RF and GDA - probability score for the positive class | 
| rf-gda_ratio:1_pred | binary prediction of RF-GDA | 
```


<!---## Running the program - validation--->
<!--- Validation procedures are implemented in SureTypeSC.py. To run a validation procedure equivalent to basic configuration, run:--->
<!---```--->
<!---python genotyping/SureTypeSC.py config/GM12878_basic_test.conf--->
<!---```--->


### Contact
In case of any questions please contact Ivan Vogel (ivogel@sund.ku.dk)


