Metadata-Version: 2.1
Name: DigitalCellSorter
Version: 1.2.3
Summary: Toolkit for analysis and identification of hematological cell types from heterogeneous single cell RNA-seq data
Home-page: https://github.com/sdomanskyi/DigitalCellSorter
Author: S. Domanskyi , A. Szedlak, N. T Hawkins, J. Wang, G. Paternostro, C. Piermarocchi
Author-email: s.domanskyi@gmail.com
License: MIT License
Download-URL: https://github.com/sdomanskyi/DigitalCellSorter/archive/1.2.3.tar.gz
Keywords: single cell RNA sequencing,cell type identification,biomarkers
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: Unix
Classifier: Topic :: Education
Classifier: Topic :: Utilities
Description-Content-Type: text/markdown
Requires-Dist: numpy (>=1.16.4)
Requires-Dist: pandas (>=0.24.2)
Requires-Dist: tables (>=3.5.2)
Requires-Dist: scipy (>=1.3.0)
Requires-Dist: matplotlib (>=3.1.0)
Requires-Dist: scikit-learn (>=0.21.2)
Requires-Dist: plotly (>=4.1.1)
Requires-Dist: mygene (>=3.1.0)
Requires-Dist: pynndescent (>=0.3.3)
Requires-Dist: networkx (>=2.3)
Requires-Dist: python-louvain (>=0.13)
Requires-Dist: fitsne (>=1.0.1) ; platform_system == "Linux" or platform_system == "Darwin"

# Digital Cell Sorter

[![DOI](https://badge.fury.io/gh/sdomanskyi%2FDigitalCellSorter.svg)](https://badge.fury.io/gh/sdomanskyi%2FDigitalCellSorter)
[![DOI](https://badge.fury.io/py/DigitalCellSorter.svg)](https://pypi.org/project/DigitalCellSorter)
[![DOI](https://readthedocs.org/projects/digital-cell-sorter/badge/?version=latest)](https://digital-cell-sorter.readthedocs.io/en/latest/?badge=latest)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3538306.svg)](https://doi.org/10.5281/zenodo.3538306) 

Identification of hematological cell types from heterogeneous single cell RNA-seq data.

[Polled Digital Cell Sorter (p-DCS): Automatic identification of hematological cell types from single cell RNA-sequencing clusters](
https://doi.org/10.1186/s12859-019-2951-x 
"Polled Digital Cell Sorter (p-DCS): Automatic identification of hematological cell types from single cell RNA-sequencing clusters")
Sergii Domanskyi, Anthony Szedlak, Nathaniel T Hawkins, Jiayin Wang, Giovanni Paternostro & Carlo Piermarocchi, 
*BMC Bioinformatics* volume 20, Article number: 369 (**2019**)

The documentation is available at https://digital-cell-sorter.readthedocs.io/.

- [Getting Started](#getting-started)
  * [Prerequisites](#prerequisites)
  * [Loading the package](#loading-the-package)
  * [Gene Expression Data Format](#gene-expression-data-format)
  * [Other Data](#other-data)
- [Functionality](#functionality)
  * [Overall](#overall)
  * [Visualization](#visualization)
- [Demo](#demo)
  * [Usage](#usage)
    + [Main cell types](#main-cell-types)
    + [Cell sub-types](#cell-sub-types)
  * [Output](#output)

## Getting Started

These instructions will get you a copy of the project up and running on your machine for data analysis, development or testing purposes.

### Prerequisites

The code runs in a Python >= 3.7 environment.

It is highly recommended to install Anaconda.
Installers are available at https://www.anaconda.com/distribution/

It uses the packages ```numpy```, ```pandas```, ```matplotlib```, ```scikit-learn```, ```scipy```, 
```mygene```, ```fftw```, ```pynndescent```, ```networkx```, ```python-louvain```, ```fitsne```
and a few other standard Python packages. Most of these packages are installed automatically when you install the 
latest release of ```DigitalCellSorter```:

	pip install DigitalCellSorter

Alternatively, you can install this module directly from GitHub using:

	pip install git+https://github.com/sdomanskyi/DigitalCellSorter

You can also create a local copy of this project for development purposes by running:

	git clone https://github.com/sdomanskyi/DigitalCellSorter

The ```fftw``` package is available from the ```conda-forge``` channel.
Add ```conda-forge``` to your channels, then install ```fftw``` as follows:


	conda config --add channels conda-forge
	conda install fftw
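
To confirm that the dependencies are in place after installation, a small check along the following lines can help. This is only a sketch: the distribution names are copied from the package requirements, `installed_versions` is a hypothetical helper that is not part of ```DigitalCellSorter```, and `importlib.metadata` is assumed to be available (Python 3.8 or later).

```python
from importlib import metadata

# Distribution names copied from the requirements of this package.
REQUIRED = ["numpy", "pandas", "tables", "scipy", "matplotlib", "scikit-learn",
            "plotly", "mygene", "pynndescent", "networkx", "python-louvain"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Report every requirement, flagging the ones that are missing.
for name, version in installed_versions(REQUIRED).items():
    print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

The check only reports what is installed; it does not compare versions against the minimum versions required by the package.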

### Loading the package

In your script import the package:

	import DigitalCellSorter

Create an instance of class ```DigitalCellSorter```. Here, for simplicity, we use Default parameter values:

	DCS = DigitalCellSorter.DigitalCellSorter()

<details><summary>During the initialization the following parameters can be specified (click me)</summary><p>

```dataName```: name used in output files, Default ''

```geneListFileName```: marker cell type list name, Default None

```mitochondrialGenes```: list of mitochondrial genes for quality control routine, Default None

```sigmaOverMeanSigma```: threshold to consider a gene constant, Default 0.3

```nClusters```: number of clusters, Default 5

```nComponentsPCA```: number of PCA components, Default 100

```nSamplesDistribution```: number of random samples to generate, Default 10000

```saveDir```: directory for output files, Default is current directory

```makeMarkerSubplots```:  whether to make subplots on markers, Default True

```makePlots```: whether to make all major plots, Default True

```votingScheme```: voting scheme to use instead of the built-in, Default None

```availableCPUsCount```: number of CPUs available, Default os.cpu_count()

```zScoreCutoff```: zscore cutoff when calculating Z_mc, Default 0.3

```clusterName```: parameter used in subclustering, Default None

```doQualityControl```: whether to remove low quality cells, Default True

```doBatchCorrection```: whether to correct data for batches, Default False

</p></details>

These and other parameters can be modified after initialization using, e.g.:

	DCS.toggleMakeStackedBarplot = False



### Gene Expression Data Format

The input gene expression data is expected in one of the following formats:

1. Spreadsheet of comma-separated values ```csv``` containing a condensed matrix of the form ```('cell', 'gene', 'expr')```. 
If there are batches in the data, the matrix has to be of the form ```('batch', 'cell', 'gene', 'expr')```. Column order can be arbitrary.

<details open><summary>Examples:</summary><p>

| cell | gene | expr |
|------|------|------|
| C1   | G1   | 3    |
| C1   | G2   | 2    |
| C1   | G3   | 1    |
| C2   | G1   | 1    |
| C2   | G4   | 5    |
| ...  | ...  | ...  |

or:

| batch  | cell | gene | expr |
|--------|------|------|------|
| batch0 | C1   | G1   | 3    |
| batch0 | C1   | G2   | 2    |
| batch0 | C1   | G3   | 1    |
| batch1 | C2   | G1   | 1    |
| batch1 | C2   | G4   | 5    |
| ...    | ...  | ...  | ...  |

</p></details>


2. Spreadsheet of comma-separated values ```csv``` where rows are genes and columns are cells, with gene expression counts as values.
If there are batches in the data, the first row of the spreadsheet should be ```'batch'``` and the second ```'cell'```.

<details open><summary>Examples:</summary><p>

| cell  | C1     | C2     | C3     | C4     |
|-------|--------|--------|--------|--------|
| G1    |        | 3      | 1      | 7      |
| G2    | 2      | 2      |        | 2      |
| G3    | 3      | 1      |        | 5      |
| G4    | 10     |        | 5      | 4      |
| ...   | ...    | ...    | ...    | ...    |

or:

| batch | batch0 | batch0 | batch1 | batch1 |
|-------|--------|--------|--------|--------|
| cell  | C1     | C2     | C3     | C4     |
| G1    |        | 3      | 1      | 7      |
| G2    | 2      | 2      |        | 2      |
| G3    | 3      | 1      |        | 5      |
| G4    | 10     |        | 5      | 4      |
| ...   | ...    | ...    | ...    | ...    |

</p></details>

3. ```Pandas DataFrame``` where ```axis 0``` is genes and ```axis 1``` is cells.
If there are batches in the data, the index of ```axis 1``` should have two levels, e.g. ```('batch', 'cell')```, 
with the first level indicating the patient, batch or experiment in which that cell was sequenced, and the
second level containing cell barcodes for identification.

<details open><summary>Examples:</summary><p>

    df = pd.DataFrame(data=[[2,np.nan],[3,8],[3,5],[np.nan,1]], 
                      index=['G1','G2','G3','G4'], 
                      columns=pd.MultiIndex.from_arrays([['batch0','batch1'],['C1','C2']], names=['batch', 'cell']))    


</p></details>

4. ```Pandas Series``` whose index should have two levels, e.g. ```('cell', 'gene')```. If there are batches in the data,
the index should have three levels: the first indicating the patient, batch or experiment in which that cell was sequenced, the second containing cell barcodes for 
identification, and the third containing gene names.

<details open><summary>Examples:</summary><p>

    se = pd.Series(data=[1,8,3,5,5], 
                   index=pd.MultiIndex.from_arrays([['batch0','batch0','batch1','batch1','batch1'],
                                                    ['C1','C1','C1','C2','C2'],
                                                    ['G1','G2','G3','G1','G4']], names=['batch', 'cell', 'gene']))


</p></details>
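
Input types 3 and 4 carry the same information in different layouts. The sketch below converts between them with plain pandas reshaping, reusing the toy values from the examples above. It only illustrates how the layouts relate and is not taken from the package. Naming the gene axis ```'gene'``` is an assumption made here so that the resulting index levels are labeled.

```python
import numpy as np
import pandas as pd

# Input type 3: genes on axis 0, (batch, cell) on axis 1.
df = pd.DataFrame(
    data=[[2, np.nan], [3, 8], [3, 5], [np.nan, 1]],
    index=pd.Index(["G1", "G2", "G3", "G4"], name="gene"),
    columns=pd.MultiIndex.from_arrays(
        [["batch0", "batch1"], ["C1", "C2"]], names=["batch", "cell"]
    ),
)

# Type 3 -> type 4: move the column levels into the index and drop
# the missing values, giving a Series indexed by (batch, cell, gene).
se = df.unstack().dropna()

# Type 4 -> type 3: pivot the 'batch' and 'cell' levels back into columns.
df_back = se.unstack(["batch", "cell"])

print(se)
print(df_back)
```

Entries that are missing in the DataFrame simply do not appear in the Series, which is why the condensed layouts are compact for sparse expression data.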

Any of the data types outlined above needs to be prepared/validated with the function ```prepare()```. 
Let us demonstrate this on an input of type 1:

	df_expr = DCS.prepare('data/testData/dataFileCondensedWithBatches.tsv')
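
To make the condensed layout of input type 1 concrete, the sketch below parses a small table of the form ```('batch', 'cell', 'gene', 'expr')``` with the standard library and rebuilds the gene-by-cell view of input type 2. It is not what ```prepare()``` does internally; `condensed_to_matrix` is a hypothetical helper, and the toy table is the one from the examples above.

```python
import csv
import io

# Toy data in the condensed ('batch', 'cell', 'gene', 'expr') form.
CONDENSED = """batch,cell,gene,expr
batch0,C1,G1,3
batch0,C1,G2,2
batch0,C1,G3,1
batch1,C2,G1,1
batch1,C2,G4,5
"""


def condensed_to_matrix(text):
    """Return (genes, cells, matrix) with matrix[gene][(batch, cell)] = count.

    Columns are looked up by name, so their order in the file is irrelevant.
    Pairs absent from the table are left out, i.e. treated as missing values.
    """
    matrix = {}
    cells = []
    for row in csv.DictReader(io.StringIO(text)):
        cell = (row["batch"], row["cell"])
        matrix.setdefault(row["gene"], {})[cell] = float(row["expr"])
        if cell not in cells:
            cells.append(cell)
    return sorted(matrix), cells, matrix


genes, cells, matrix = condensed_to_matrix(CONDENSED)
for gene in genes:
    # Show an empty field where a (gene, cell) pair has no measurement.
    print(gene, [matrix[gene].get(cell, "") for cell in cells])
```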

### Other Data

```markersDCS.xlsx```: An Excel workbook with marker data. Rows are markers and columns are cell types. 
'1' means that the gene is a marker for that cell type, and '0' otherwise.
This marker file is included in the package and is used by default. 
If you use your own file, it has to be prepared in the same format (including the two-line header). Note that only the first worksheet will be read,
and its name can be arbitrary. The first column should contain gene names. The second row should contain cell types, and the first row should indicate how 
those cell types are grouped. To skip a cell type, put "NA" in the first-row cell of that cell type's column.
See example below:

|A       |B            |C             |D           |E          |F                |G                         |H                           |I                        |J                         |K                  |L               |M                 |...      |
|--------|-------------|--------------|------------|-----------|-----------------|--------------------------|----------------------------|-------------------------|--------------------------|-------------------|----------------|------------------|---------|
|        |B cells      |B cells       |B cells     |T cells    |T cells          |T cells                   |T cells                     |T cells                  |T cells                   |T cells            |NK cells        |NK cells          |...      |
|Marker  |B cells naive|B cells memory|Plasma cells|T cells CD8|T cells CD4 naive|T cells CD4 memory resting|T cells CD4 memory activated|T cells follicular helper|T cells regulatory (Tregs)|T cells gamma delta|NK cells resting|NK cells activated|...      |
|ABCB4   |1            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
|ABCB9   |0            |0             |1           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
|ACAP1   |0            |0             |0           |0          |1                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
|ACHE    |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
|ACP5    |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
|ADAM28  |1            |1             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
|ADAMDEC1|0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
|ADAMTS3 |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
|ADRB2   |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
|AIF1    |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
|AIM2    |0            |1             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
|ALOX15  |0            |0             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
|ALOX5   |0            |1             |0           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
|AMPD1   |0            |0             |1           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
|ANGPT4  |0            |0             |1           |0          |0                |0                         |0                           |0                        |0                         |0                  |0               |0                 |...      |
|...     |...          |...           |...         |...        |...              |...                       |...                         |...                      |...                       |...                |...             |...               |...      |
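
The two-line header can be read as follows: the first row assigns each column to a group, the second row names the cell type, and "NA" in the first row excludes the column. The sketch below applies that rule to a few rows of the table above, represented as plain Python lists. It is an illustration only, not the package's parser; `markers_by_cell_type` is a hypothetical helper, and the last column is marked "NA" here purely to show the skipping rule.

```python
# Header rows and a few marker rows, as they might look after the first
# worksheet has been read. Row 0 groups the cell types, row 1 names them.
GROUPS = ["", "B cells", "B cells", "B cells", "T cells", "NA"]
TYPES = ["Marker", "B cells naive", "B cells memory", "Plasma cells",
         "T cells CD8", "T cells CD4 naive"]
ROWS = [
    ["ABCB4", 1, 0, 0, 0, 0],
    ["ABCB9", 0, 0, 1, 0, 0],
    ["ACAP1", 0, 0, 0, 0, 1],
    ["ADAM28", 1, 1, 0, 0, 0],
    ["AIM2", 0, 1, 0, 0, 0],
]


def markers_by_cell_type(groups, types, rows):
    """Collect marker genes per (group, cell type), skipping 'NA' groups."""
    result = {}
    for col in range(1, len(types)):
        if groups[col] == "NA":
            continue  # cell type excluded by the first header row
        result[(groups[col], types[col])] = [
            row[0] for row in rows if row[col] == 1
        ]
    return result


for key, marker_genes in markers_by_cell_type(GROUPS, TYPES, ROWS).items():
    print(key, marker_genes)
```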


```Human.MitoCarta2.0.csv```: A ```csv``` spreadsheet with human mitochondrial genes, created as part of the work 
[MitoCarta2.0: an updated inventory of mammalian mitochondrial proteins](https://doi.org/10.1093/nar/gkv1003 "MitoCarta2.0")
Sarah E. Calvo, Karl R. Clauser, Vamsi K. Mootha, *Nucleic Acids Research*, Volume 44, Issue D1, 4 January 2016, Pages D1251–D1257.


## Functionality

### Overall

The main class for cell sorting functions and producing output images is ```DigitalCellSorter```.

<details open><summary>The class includes tools for:</summary><p>

  1. **Pre-processing** of single cell mRNA sequencing data (gene expression data)
     1. Cleaning: filling in missing values, removing all-zero genes and cells, converting gene index to a desired convention, etc.
     2. Normalizing: rescaling all cells expression, log-transforming, etc.

  2. **Quality control**
  3. **Batch effects correction**
  4. **Cells anomaly score evaluation**
  5. **Dimensionality reduction**
  6. **Clustering** (Hierarchical, K-Means, knn-graph-based, etc.)
  7. **Annotating cell types**
  8. **Visualization**
       1. t-SNE layout plot
       2. Quality Control histogram plot
       3. Marker expression t-SNE subplot
       4. Marker-centroids expression plot
       5. Voting results matrix plot
       6. Cell types stacked barplot
       7. Anomaly scores plot
       8. Histogram null distribution plot
       9. New markers plot
       10. Sankey diagram (a.k.a. river plot)

  9. **Post-processing** functions, e.g. extract cells of interest, find significantly expressed genes, 
plot marker expression of the cells of interest, etc.

</p></details>
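
The cleaning and normalizing steps listed under pre-processing can be sketched with pandas as follows. This is a minimal illustration, not the package's implementation: the function name `clean_and_normalize`, the per-cell target sum of 10000 and the `log2(x + 1)` transform are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd


def clean_and_normalize(df_expr, target_sum=1e4):
    """Clean and normalize a genes-by-cells expression table (sketch)."""
    # Cleaning: treat missing values as zero counts.
    cleaned = df_expr.fillna(0.0)
    # Cleaning: drop genes (rows) and cells (columns) that are all zero.
    cleaned = cleaned.loc[(cleaned != 0).any(axis=1), (cleaned != 0).any(axis=0)]
    # Normalizing: rescale every cell to the same total expression.
    scaled = cleaned.div(cleaned.sum(axis=0), axis=1) * target_sum
    # Normalizing: log-transform, keeping zeros at zero.
    return np.log2(scaled + 1.0)


toy = pd.DataFrame(
    [[2, np.nan], [3, 8], [0, 0], [np.nan, 1]],
    index=["G1", "G2", "G3", "G4"],
    columns=["C1", "C2"],
)
normalized = clean_and_normalize(toy)
print(normalized)
```

In the toy table, gene `G3` is removed because it is zero in every cell, and each remaining cell sums to the same total before the log transform.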

### Visualization

Function ```visualize()``` or ```process()``` will produce all necessary files for post-analysis of the data. 

<details open><summary>The visualization tools include:</summary><p>

- ```makeMarkerExpressionPlot()```: a heatmap that shows all markers and their expression levels in the clusters. 
The figure also shows relative (%) and absolute (cell counts) cluster sizes

<p align="middle">
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_voting.png?raw=true" width="1000"/>
</p>

- ```getIndividualGeneExpressionPlot()```:  t-SNE layout colored by individual gene's expression

<p align="middle">
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/marker_subplots/BM1_CD19_(B4_CVID3_CD19).png?raw=true" width="400"/>
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/marker_subplots/BM1_CD4_(CD4_CD4mut).png?raw=true" width="400"/>
</p>

- ```makeVotingResultsMatrixPlot()```: z-scores of the voting results for each input cell type and each cluster. 
The figure also shows relative (%) and absolute (cell counts) cluster sizes

<p align="middle">
 <img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_matrix_voting.png?raw=true" height="700"/>
</p>

- ```makeHistogramNullDistributionPlot()```: null distribution for each cluster and each cell type illustrating 
the "machinery" of the Digital Cell Sorter

<p align="middle">
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_null_distributions.png?raw=true" width="800"/>
</p>

- ```makeQualityControlHistogramPlot()```: Quality control histogram plots

<p align="middle">
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/QC_plots/BM1_number_of_genes_histogram.png?raw=true" width="250"/>
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/QC_plots/BM1_count_depth_histogram.png?raw=true" width="250"/>
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/QC_plots/BM1_fraction_of_mitochondrialGenes_histogram.png?raw=true" width="250"/>
</p>

- ```makeTSNEplot()```: t-SNE layouts colored by number of unique genes expressed, 
number of counts measured, and the fraction of mitochondrial genes.

<p align="middle">
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_number_of_genes.png?raw=true" width="250"/>
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_count_depth.png?raw=true" width="250"/>
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_fraction_of_mitochondrialGenes.png?raw=true" width="250"/>
</p>

<p align="middle">
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_is_quality_cell.png?raw=true" width="500"/>
</p>

<p align="middle">
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_clusters.png?raw=true" width="375"/>
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_patients.png?raw=true" width="375"/>
</p>

Effect of batch correction, demonstrated by combining BM1, BM2 and BM3 and processing the data jointly without (left) and with (right) the batch correction option:

<p align="middle">
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/BM123_no_corr_clusters__by_patients.png?raw=true" width="375"/>
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/BM123_with_corr_clusters__by_patients.png?raw=true" width="375"/>
</p>

- ```makeStackedBarplot()```: plot with fractions of various cell types

<p align="middle">
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_clusters_annotated.png?raw=true" width="500"/>
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_subclustering_stacked_barplot_.png?raw=true" height="500"/>
</p>


- ```makeSankeyDiagram()```: river plot to compare various results 

[(see interactive HTML version, download it and open in a browser)](http://htmlpreview.github.io/?https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/Sankey_example.html "Sankey interactive diagram")

<p align="middle">
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/Sankey_example.png?raw=true" width="800"/>
</p>

- ```getAnomalyScoresPlot()```: plot with anomaly scores per cell

<p align="middle">
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score All.png?raw=true" width="750"/>
</p>

Calculate and plot anomaly scores for an arbitrary cell type or cluster:

<p align="middle">
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score B cell.png?raw=true" width="250"/>
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score T cell.png?raw=true" width="250"/>
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_clusters_by_anomaly_score Cluster2.png?raw=true" width="250"/>
</p>


- ```getIndividualGeneTtestPlot()```: heatmap of t-test p-values calculated gene-pair-wise from the annotated clusters

<p align="middle">
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_ttest_CD4_(CD4_CD4mut).png?raw=true" width="500"/>
</p>


- ```makePlotOfNewMarkers()```: genes significantly expressed in the annotated cell types

<p align="middle">
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/BM1_new_markers.png?raw=true" width="1000"/>
</p>

</p></details>


## Demo

### Usage

We have made an example execution file ```demo.py``` that shows how to use ```DigitalCellSorter```.

In the demo, the folder ```data``` is intentionally left empty. The reader can download the file ```ica_bone_marrow_h5.h5``` 
from https://preview.data.humancellatlas.org/ (Raw Counts Matrix - Bone Marrow) and place it in the folder ```data```. 
The file is ~485 MB and contains all 378000 cells from 8 bone marrow donors (BM1-BM8). 
In our example, the data of BM1 is prepared by the
function ```PrepareDataOnePatient()``` in the module ```ReadPrepareDataHCApreviewDataset```.
Load this function and call it to create a ```BM1.h5``` file (HDF file of input type 3) in the ```data``` folder:

	import os
	from DigitalCellSorter.ReadPrepareDataHCApreviewDataset import PrepareDataOnePatient
	PrepareDataOnePatient(os.path.join('data', 'ica_bone_marrow_h5.h5'), 'BM1', os.path.join('data', ''))


#### Main cell types

In these instructions we have already created an instance of the ```DigitalCellSorter``` class (see section **Loading the package**).
Let's modify some of the ```DCS``` attributes:

	DCS.dataName = 'BM1'
	DCS.saveDir = os.path.join(os.path.dirname(__file__), 'output', 'BM1', '')
	DCS.nClusters = 20

Now we are ready to ```load``` the data, ```prepare``` (validate) it and ```process``` it. The function ```process()``` 
takes as an input parameter a pandas DataFrame validated by the function ```prepare()```:

	import pandas as pd
	df_expr = pd.read_hdf(os.path.join('data', 'BM1.h5'), key='BM1', mode='r')
	df_expr = DCS.prepare(df_expr)
	DCS.process(df_expr)

This will launch the processing workflow detailed in our paper and generate the plots. If you don't need any plots and
only want to use the post-processing tools, call the function ```process()``` with an additional parameter:

	DCS.process(df_expr, visualize=False)

Then, if necessary, you can generate all the default plots by:

	DCS.visualize()


#### Cell sub-types

Further analysis can be done on cell types of interest, e.g. here 'T cell' and 'B cell'.
Let's create a new instance of DigitalCellSorter to run "sub-analysis" with it:

    DCSsub = DigitalCellSorter.DigitalCellSorter(dataName=DCS.dataName, 
                                                nClusters=10, 
                                                doQualityControl=False)

It is important to disable quality control, because the low quality cells have already been identified and filtered with ```DCS```.
Also, the ```dataName``` parameter points to the location processed with ```DCS```. 
Next modify a few other attributes and process cell type 'T cell':

    DCSsub.subclusteringName = 'T cell'
    DCSsub.saveDir = os.path.join(os.path.dirname(__file__), 'output', DCS.dataName, 'subclustering T cell', '')
    DCSsub.geneListFileName = os.path.join(os.path.dirname(__file__), 'docs', 'examples', 'CIBERSORT_T_SUB.xlsx')

    DCSsub.process(df_expr[DCS.getCells(celltype='T cell')])

This way the t-SNE layout with annotated clusters (left) of T cell sub-types and the corresponding voting matrix (right) 
are generated by the function ```process()```:

<p align="middle">
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/subclustering T cell/BM1_clusters_by_clusters_annotated.png?raw=true" width="400"/>
	<img src="https://github.com/sdomanskyi/DigitalCellSorter/blob/master/docs/examples/output/BM1/subclustering T cell/BM1_matrix_voting.png?raw=true" height="400"/>
</p>

We can reuse the ```DCSsub``` to analyze cell type 'B cell'. Just modify the following attributes:

    DCSsub.subclusteringName = 'B cell'
    DCSsub.saveDir = os.path.join(os.path.dirname(__file__), 'output', DCS.dataName, 'subclustering B cell', '')
    DCSsub.geneListFileName = os.path.join(os.path.dirname(__file__), 'docs', 'examples', 'CIBERSORT_B_SUB.xlsx')

    DCSsub.process(df_expr[DCS.getCells(celltype='B cell')])


To execute the complete script ```demo.py``` run:

	python demo.py

*Note that the HCA BM1 data contains 48000 sequenced cells, requiring approximately 60 GB of RAM (we recommend using a high-performance computer).
If you want to run our example on a regular PC or a laptop, you can use a randomly chosen subset of cells when using ```HCAtools```:

	PrepareDataOnePatient(os.path.join('data', 'ica_bone_marrow_h5.h5'), 'BM1', os.path.join('data', ''),
	                               useAllData=False, cellsLimitToUse=5000)

### Output

All the output files are saved in the ```output``` directory inside the directory where the ```demo.py``` script is located. 
If you specify any other directory, the results will be generated in it.
If you do not provide any directory, the results will appear in the directory where the script was executed.
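
The demo builds its output paths with ```os.path.join(..., '')```, which appends a trailing path separator. The sketch below shows that pattern together with creating the directory in advance. Whether ```DigitalCellSorter``` creates a missing ```saveDir``` on its own is not stated here, so creating it beforehand is a precaution assumed for this example, and `make_save_dir` is a hypothetical helper.

```python
import os
import tempfile


def make_save_dir(base_dir, *parts):
    """Build '<base_dir>/<parts...>/' with a trailing separator and create it."""
    # Joining with a final '' appends the trailing separator, as in the demo.
    save_dir = os.path.join(base_dir, *parts, "")
    os.makedirs(save_dir, exist_ok=True)
    return save_dir


# A temporary directory stands in for the folder that contains the script.
with tempfile.TemporaryDirectory() as base:
    save_dir = make_save_dir(base, "output", "BM1")
    print(save_dir.endswith(os.sep), os.path.isdir(save_dir))  # True True
```

The returned path could then be assigned to ```DCS.saveDir``` before processing.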


