Metadata-Version: 2.1
Name: CellDrift
Version: 0.1.3
Summary: CellDrift: A Python Package to Infer Temporal Patterns of Peturbation Effects in Single Cell Data
Home-page: https://github.com/KANG-BIOINFO/CellDrift
Author: Kang Jin
Author-email: jinkg@mail.uc.edu
Maintainer: Kang Jin
License: UNKNOWN
Project-URL: Ducumentation, https://celldrift.readthedocs.io/en/latest/index.html#
Platform: UNKNOWN
Description-Content-Type: text/markdown
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: scanpy (>=1.6.0)
Requires-Dist: matplotlib
Requires-Dist: seaborn
Requires-Dist: scipy
Requires-Dist: statsmodels
Requires-Dist: dtw

[![Documentation Status](https://readthedocs.org/projects/celldrift/badge/?version=latest)](https://celldrift.readthedocs.io/en/latest/index.html) [![Pypi](https://img.shields.io/pypi/v/CellDrift?logo=PyPI)](https://pypi.org/project/CellDrift/)  [![Stars](https://img.shields.io/github/stars/KANG-BIOINFO/CellDrift)](https://github.com/KANG-BIOINFO/CellDrift/stargazers)

# CellDrift
CellDrift: temporal perturbation effects for single cell data

Perturbation effects on gene programs are commonly investigated in single-cell experiments. Existing models measure perturbation responses independently across time series, disregarding the temporal consistency of specific gene programs. We introduce CellDrift, a generalized linear model based functional data analysis approach to investigate temporal gene patterns in response to perturbations. 
![overview](Examples/overview.png)

### Reference
```
CellDrift: Inferring Perturbation Responses in Temporally-Sampled Single Cell Data. BioRxiv. Apr 2022 (https://www.biorxiv.org/content/10.1101/2022.04.13.488194v1)
```

### Prerequisite
```python
# It's recommended to create a new environment using conda (python 3.7 is recommended)
conda create -n celldrift_py python=3.7
# Install prerequisite package scikit-fda (development version)
conda activate celldrift_py # activate celldrift environment
pip install git+https://github.com/GAA-UAM/scikit-fda.git
```

### Installation
```python
git clone https://github.com/KANG-BIOINFO/CellDrift.git
cd CellDrift
pip install .
```

### Tutorial
- [Example on HIV Infection Study](https://github.com/KANG-BIOINFO/CellDrift/blob/main/Tutorial/hiv_infection_tutorial.md)
- [Example on Pseudo-time Data of Brain Organoid Development](https://github.com/KANG-BIOINFO/CellDrift/blob/main/Tutorial/brain_pseudotime_tutorial.md)
- [Check the complete Document](https://celldrift.readthedocs.io/en/latest/index.html)

### Quick Start
```python
import numpy as np
import pandas as pd
import scanpy as sc
import CellDrift as ct
```

#### 1. Load data and preparation
```python
adata = sc.read("example.h5ad")
adata.obs['size_factor'] = np.sum(adata.X, axis = 1)
```

#### 2. Set up CellDrift object
```python
adata = ct.setup_celldrift(
    adata, 
    cell_type_key = 'cell_type',
    perturb_key = 'perturb', 
    time_key = 'time', # the name of time covariate. Must be numeric
    control_name = 'Control', 
    perturb_name = None, 
    size_factor_key = 'size_factor', 
    batch_key = 'batch', 
    n_reps = 3,
    n_cells_perBlock = 100,
    use_pseudotime = False,
    min_cells_perGene = 0
)
```

#### 3. Run GLM model 
```python
adata = ct.model_timescale(
    adata, 
    n_processes = 16, # number of processes for multiprocessing
    chunksize = 100, # number of genes in each chunk
    pairwise_contrast_only = True, 
    adjust_batch = False
)
```

#### 4. set up FDA object
```python
df_zscore = pd.read_csv('Temporal_CellDrift/Contrast_Coefficients_combined_zscores_.txt', sep = '\t', header = 0, index_col = 0) # CellDrift z scores
df_meta = pd.read_csv('Temporal_CellDrift/Contrast_Coefficients_combined_metadata_.txt', sep = '\t', header = 0, index_col = 0) # metadata of contrast comparisons

fda = ct.FDA(df_zscore, df_meta)
```

#### 5. temporal clustering
```python
fd, genes = fda.create_fd_genes(genes = df_zscore.index.values, cell_type = 'Type_0', perturbation = 'Perturb_0')
df_cluster = ct.fda_cluster(fd, genes, n_clusters = 3)
```

#### 6. visualization for each temporal cluster
```python
ct.draw_smoothing_clusters(
    fd, 
    df_cluster, 
    n_neighbors = 2, 
    bandwidth = 1,
    cluster_key = 'clusters_fuzzy', 
    output_folder = 'Temporal_CellDrift/cluster_fuzzy/'
)
```

