Metadata-Version: 2.1
Name: GeneCluster
Version: 0.1.0
Summary: A gene clustering model using deep learning
Home-page: https://github.com/Byting820/GeneCluster
Author: Byting
Author-email: yutingya820@163.com
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: torch
Requires-Dist: torchvision
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: umap-learn
Requires-Dist: matplotlib
Requires-Dist: scikit-learn
Requires-Dist: argparse

# GeneCluster

GeneCluster is a deep-learning gene clustering method for single-cell RNA-seq (scRNA-seq) data.

## Requirements

- python --- 3.8.10
- scanpy --- 1.8.2
- umap-learn --- 0.5.3
- torch --- 1.8.1
- torchvision --- 0.9.1
- faiss-gpu --- 1.7.1

## Usage
Raw single-cell data in h5 format is first processed by DataProcess.py to produce a gene-cell matrix in CSV format. That CSV file is then fed to the model for training.
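The preprocessing step can be pictured with the minimal sketch below. It is not the contents of DataProcess.py: the function names are hypothetical, it assumes the model expects genes as rows and cells as columns, and it leaves out any filtering or normalization the real script may perform.

```python
import pandas as pd


def to_gene_by_cell_csv(cells_by_genes, out_path):
    """Write a cells-by-genes table as a gene-by-cell CSV (hypothetical helper)."""
    # Assumption: the training CSV has one row per gene and one column per cell.
    gene_by_cell = cells_by_genes.T.rename_axis(index="gene")
    gene_by_cell.to_csv(out_path)
    return gene_by_cell


def read_h5ad_as_frame(h5ad_path):
    """Load an .h5ad file as a cells-by-genes DataFrame (hypothetical helper)."""
    import scanpy as sc  # imported lazily so the helper above works without scanpy

    adata = sc.read_h5ad(h5ad_path)
    return adata.to_df()  # rows are cells, columns are genes


# Example usage (file names are placeholders):
# frame = read_h5ad_as_frame("pbmc3k.h5ad")
# to_gene_by_cell_csv(frame, "normal_data.csv")
```

The resulting CSV would then be the file passed as `--data_path` when training.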

```bash
# Run main.py directly with specified parameters
python main.py --nmb_cluster 20 --batch 128 --epochs 100 \
               --lr 0.01 --data_path normal_data.csv \
               --ckpt_path train_res &

# Alternatively, run the GeneCluster package with parameters
GeneCluster --data_path path/to/dataset.csv --epochs 100 \
            --nmb_cluster 20 --batch 128 --ckpt_path train_res
```

or

```bash
sh train.sh
```

Training writes the learned gene embeddings to features.npy. These embeddings can be used to predict gene co-expression relationships and to identify gene modules.
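As one way to explore the embeddings, the sketch below ranks each gene's nearest neighbors by cosine similarity, a rough proxy for co-expression. It assumes features.npy holds a 2-D array with one row per gene. The function name and the file location in the usage comment are illustrative and are not part of GeneCluster.

```python
import numpy as np


def top_neighbors(features, k=5):
    """Return, for each row, the indices of its k most cosine-similar rows."""
    # Assumption: features has shape (n_genes, embedding_dim).
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sim = unit @ unit.T                            # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)                 # exclude each gene itself
    k = min(k, features.shape[0] - 1)
    return np.argsort(-sim, axis=1)[:, :k]


# Example usage (the path is an assumption about where training saved the file):
# features = np.load("train_res/features.npy")
# neighbors = top_neighbors(features, k=10)
```

Row `i` of the result lists the row indices of the genes closest to gene `i`. Mapping those indices back to gene names requires the gene order of the input CSV.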

Here is an example on pbmc3k.h5ad scRNA-seq data.

## Example

```python
from genecluster import main

args = {
    "data_path": "path/to/dataset.csv",
    "epochs": 100,
    "nmb_cluster": 10,
    # other arguments
}

main(args)
```

## Contact

Yuting Bai (yutingya820@163.com)

