Metadata-Version: 2.1
Name: annopro
Version: 0.1rc2
Summary: A simple python package for annotating protein sequences
Author-email: "Zheng.L.Y" <zhenglingyan@zju.edu.cn>, "Zhang.H.N" <zhang.h.n@foxmail.com>
License: MIT
Project-URL: Homepage, https://github.com/idrblab/AnnoPRO
Project-URL: Bug Tracker, https://github.com/idrblab/AnnoPRO/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Environment :: Console
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: llvmlite (>=0.38.1)
Requires-Dist: numpy (<=1.19.5)
Requires-Dist: pandas (<=1.4.8,>=1.2.4)
Requires-Dist: scikit-learn (>=1.0.2)
Requires-Dist: scipy (<=1.9.5,>=1.4.1)
Requires-Dist: tensorflow (<=2.6.5,>=2.5.0)
Requires-Dist: threadpoolctl (>=3.1.0)
Requires-Dist: fasta (>=2.3.2)
Requires-Dist: wget
Requires-Dist: diamond4py (>=0.0.2rc2)

# AnnoPRO
![AUR](https://img.shields.io/badge/license-MIT-blue.svg)
![python](https://img.shields.io/badge/python->=3.8-success.svg)
![keras](https://img.shields.io/badge/keras-2.5.0-success.svg)
![PMID](https://img.shields.io/badge/PMID-Not%20available-red.svg)
## AnnoPRO generation
* Step 1: Input protein sequences
* Step 2: Feature extraction by Profeat
* Step 3: Feature pairwise distance calculation --> cosine, correlation, jaccard
* Step 4: Feature 2D embedding --> umap, tsne, mds
* Step 5: Feature grid arrangement --> grid, scatter
* Step 6: Transform --> minmax, standard
![image](https://user-images.githubusercontent.com/76670356/204513203-2f0a430b-4b2c-4b1e-9587-3ee5a953150b.png)
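
The distance and transform steps listed above can be illustrated with a small, standard-library-only sketch. It is not AnnoPRO's own implementation: the function names and the toy feature matrix are hypothetical, only the `cosine` and `minmax` options are shown, and the embedding and grid arrangement steps are left out.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # convention chosen here for all-zero vectors
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_feature_distances(matrix):
    """Cosine distance between every pair of feature columns.

    `matrix` is a list of rows: one row per protein, one column per feature.
    """
    columns = list(zip(*matrix))
    return [[cosine_distance(ci, cj) for cj in columns] for ci in columns]


def minmax_scale(matrix):
    """Rescale each feature column to the range [0, 1]."""
    scaled_columns = []
    for col in zip(*matrix):
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column has no spread, so it is mapped to 0.0.
        scaled_columns.append([(x - lo) / span if span else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical toy input: 3 proteins x 3 features.
features = [
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 0.0],
]
distances = pairwise_feature_distances(features)
scaled = minmax_scale(features)
```

In this toy input the first two feature columns are proportional, so their cosine distance is close to zero, and such features would end up near each other in the later 2D embedding.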
## AnnoPRO architecture
* Encoding layers: Protein features are learned by CNNs, and protein similarity is learned by FCs.
* Decoding layers: LSTMs
![image](https://user-images.githubusercontent.com/76670356/204524869-31f558f0-0298-48c5-b4d2-3d5d087a2def.png)
## Installation
1. Install compilers

The dependency `lapjv` requires `g++` or another C++ compiler, and annopro contains a Fortran extension module that requires `gfortran` or another Fortran compiler. Here is an example of installing them on Ubuntu.

```bash
sudo apt install gcc g++ gfortran
# Alternatively, install the compilers with conda in your virtual environment.
# The installed commands are then named like:
# gcc: x86_64-conda_cos6-linux-gnu-cc
# g++: x86_64-conda_cos6-linux-gnu-c++
# gfortran: x86_64-conda_cos6-linux-gnu-gfortran
conda install gcc_linux-64 gxx_linux-64 gfortran_linux-64
```

2. Install annopro

You can install it directly with `pip install annopro`, or install it from source with the following steps.
When installing from source, install numpy first, because `numpy.f2py` is needed to build the Fortran extension submodule.
```bash
git clone https://github.com/idrblab/AnnoPRO.git
cd AnnoPRO
conda create -n annopro python=3.8
conda activate annopro
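# numpy must be present before the build: numpy.f2py compiles the Fortran extension
pip install "numpy<=1.19.5"
pip install .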
```

## Usage
- Use it as a terminal command. To list all parameters, run `annopro -h`.
```bash
annopro -i test_proteins.fasta -o output
```
- Run it as an executable Python package

```bash
python -m annopro -i test_proteins.fasta -o output
```

- Use it as a library integrated into your project.
```python
from annopro import main
main("test_proteins.fasta", "output")
```

The results are written to `./output/bp_result.csv`, and likewise to `cc_result.csv` and `mf_result.csv` in the same directory.
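
To inspect the results from Python, a minimal sketch along these lines may help. It assumes the three file names implied by the `bp(cc,mf)_result.csv` pattern and that each file is a plain CSV with a header row. `load_results` is a hypothetical helper, not part of annopro, and no particular column names are assumed.

```python
import csv
from pathlib import Path

# Assumed file names, read from the "bp(cc,mf)_result.csv" pattern.
RESULT_FILES = ("bp_result.csv", "cc_result.csv", "mf_result.csv")


def load_results(output_dir):
    """Return {file name: list of row dicts} for each result file that exists."""
    results = {}
    for name in RESULT_FILES:
        path = Path(output_dir) / name
        if not path.is_file():
            continue  # skip result files that were not produced
        with path.open(newline="") as handle:
            results[name] = list(csv.DictReader(handle))
    return results


if __name__ == "__main__":
    # "output" matches the -o argument used in the commands above.
    for name, rows in load_results("output").items():
        print(name, len(rows), "rows")
```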

**Notice**: The first time you use annopro, it automatically
downloads the required resources when they are first needed
(lazy download mechanism).

## Possible problems
1. pip reports "pip is looking at multiple versions of XXX to determine which version is compatible with other requirements. This could take a while."

This happens with recent versions of pip. Either downgrade pip to an older version such as 20.2, or add the `--use-deprecated=legacy-resolver` option to the install command.

2. Argument mismatch errors when building from source.

This happens when your gfortran is too recent to be compatible.
Edit `setup.py` and uncomment `-fallow-argument-mismatch`, or
use an earlier version of gfortran such as 4.8.5 or 8.4.

## Contact
If you have any questions, please create an [issue](https://github.com/idrblab/AnnoPRO/issues/new/choose) on this repo. We will deal with it as soon as possible.
