Metadata-Version: 2.1
Name: SCNIC
Version: 0.6.4
Summary: A tool for finding and summarizing modules of highly correlated observations in compositional data
Home-page: https://github.com/lozuponelab/SCNIC/
Download-URL: https://github.com/lozuponelab/SCNIC/tarball/0.6.4
Author: Lozupone Lab
Author-email: lozuponelab.dev@olucdenver.onmicrosoft.com
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: scipy<=1.10.1,>=1.9.0
Requires-Dist: networkx>=2
Requires-Dist: biom-format
Requires-Dist: pandas>=1
Requires-Dist: scikit-bio
Requires-Dist: statsmodels
Requires-Dist: tqdm
Requires-Dist: seaborn
Requires-Dist: h5py

![PyPI](https://img.shields.io/pypi/v/SCNIC.svg?style=flat) ![Build Status](https://github.com/lozuponelab/SCNIC/actions/workflows/main.yaml/badge.svg)

# SCNIC
Sparse Cooccurrence Network Investigation for Compositional data
Pronounced 'scenic'.

*NOTE: The most up-to-date version of the SCNIC repo is [here](https://github.com/lozuponelab/SCNIC)!*

SCNIC is a package for generating and analyzing cooccurrence (positive correlation) networks from compositional data. Data generated by
many next-generation sequencing experiments are compositional (a subsampling of the total community), which violates
the assumptions of typical cooccurrence network analysis techniques. 16S sequencing data are often highly compositional,
so methods such as SparCC (https://bitbucket.org/yonatanf/sparcc) have been developed for studying correlations
in microbiome data. SCNIC is designed with compositional data in mind and so provides multiple correlation measures,
including SparCC.
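
To illustrate why compositionality matters, here is a minimal sketch that is not part of SCNIC (the helper name `pearson` and the simulated abundances are assumptions made for illustration). It draws three independent abundances per sample, then converts each sample to proportions. The raw columns are roughly uncorrelated, while the proportions pick up a negative correlation from the normalization alone.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: three taxa with independent abundances in 5000 samples.
raw = [[rng.gammavariate(5.0, 20.0) for _ in range(3)] for _ in range(5000)]

# Closure: divide each sample by its total, as relative abundance data does.
closed = [[value / sum(sample) for value in sample] for sample in raw]

raw_r = pearson([s[0] for s in raw], [s[1] for s in raw])
closed_r = pearson([s[0] for s in closed], [s[1] for s in closed])

print(f"raw r = {raw_r:.2f}, closed r = {closed_r:.2f}")
```

With only three exchangeable components the effect is large; with many taxa it is weaker but still present, which is the motivation for compositionally aware measures such as SparCC.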

SCNIC can be run in two ways. It is packaged with scripts for running it on the command
line, and it is also available as a Qiime2 plugin (https://www.github.com/lozuponelab/q2-SCNIC). Either method is valid, but
the Qiime2 plugin provides easier access when working within the Qiime2 ecosystem.

## Overview
### Within
The 'within' method takes as input [BIOM](http://biom-format.org/) formatted files and forms cooccurrence networks using a
user-specified correlation metric.
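
As a rough illustration of what an all-pairs correlation step looks like, the sketch below computes Spearman correlations between every pair of observations in one table. It is not SCNIC's implementation: the table is assumed to be a plain dict mapping observation IDs to per-sample counts rather than a BIOM file, and the function names are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_correlations(table):
    """Map each unordered pair of observation IDs to its Spearman r."""
    return {
        (a, b): spearman(table[a], table[b])
        for a, b in combinations(sorted(table), 2)
    }


# Hypothetical table: observation ID -> counts across five samples.
table = {
    "otu1": [10, 20, 30, 40, 50],
    "otu2": [1, 3, 2, 5, 8],
    "otu3": [50, 40, 30, 20, 10],
}
correls = pairwise_correlations(table)
for pair, r in correls.items():
    print(pair, round(r, 3))
```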

### Modules
From the correlation network generated in the within step, SCNIC finds modules of cooccurring observations
by finding groups of observations which all have a minimum pairwise correlation value. Modules are summarized, and a new
biom table in which the observations contained in each module are collapsed into a single observation is returned. This biom table is output along
with a list of modules and their contents. A gml file of the network, which can be opened using network
visualization tools such as [cytoscape](http://www.cytoscape.org/), is also created; it contains all observation metadata
provided in the input biom file as well as module information. Please be aware that the networks output by this analysis
will only include positive correlations, as only positive correlations are used in module finding and summarization.
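
The idea of grouping observations that all meet a minimum pairwise correlation, then collapsing each group, can be sketched as follows. This is a simple greedy illustration under stated assumptions, not SCNIC's actual module-finding algorithm: the function names, the dict-based inputs, and the choice to collapse a module by summing its members' counts are all assumptions made here.

```python
def find_modules(correls, min_r):
    """Greedily group observations whose pairwise correlations are all >= min_r."""

    def r(a, b):
        # Pairs may be stored in either order; missing pairs count as 0.
        return correls.get((a, b), correls.get((b, a), 0.0))

    observations = sorted({obs for pair in correls for obs in pair})
    assigned = set()
    modules = []
    for seed in observations:
        if seed in assigned:
            continue
        members = [seed]
        for candidate in observations:
            if candidate in assigned or candidate in members:
                continue
            # A candidate joins only if it correlates with every current member.
            if all(r(candidate, member) >= min_r for member in members):
                members.append(candidate)
        if len(members) > 1:
            modules.append(members)
            assigned.update(members)
    return modules


def collapse(table, modules):
    """Sum member counts into one row per module; keep other rows unchanged."""
    in_module = {obs for members in modules for obs in members}
    collapsed = {
        obs: list(counts) for obs, counts in table.items() if obs not in in_module
    }
    for index, members in enumerate(modules):
        rows = [table[obs] for obs in members]
        collapsed[f"module_{index}"] = [sum(column) for column in zip(*rows)]
    return collapsed


# Hypothetical correlations and counts for five observations in two samples.
correls = {
    ("a", "b"): 0.8, ("a", "c"): 0.6, ("b", "c"): 0.2,
    ("c", "d"): 0.7, ("a", "d"): 0.1, ("b", "d"): 0.0,
}
table = {"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8], "e": [9, 9]}

modules = find_modules(correls, min_r=0.5)
collapsed = collapse(table, modules)
print(modules)
print(collapsed)
```

Note that `c` correlates with `a` above the threshold but not with `b`, so it cannot join the first module; requiring every pair to pass is what distinguishes this from simply taking connected components of the thresholded network.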

### Between
The 'between' method takes two biom tables as input and calculates all pairwise correlations between the tables using a
selection of correlation metrics. A gml correlation network is output as well as a file containing statistics and
p-values of all correlations.
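
A cross-table comparison of this kind can be sketched as below. This is an illustration only: it uses Pearson correlation with a permutation p-value as a stand-in for whichever metrics and tests SCNIC applies, it assumes both tables are plain dicts whose count lists follow the same sample order, and all names are hypothetical.

```python
import random
from itertools import product


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def permutation_p(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point noise in exact ties.
        if abs(pearson(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def between(table1, table2, n_perm=999):
    """Return (id1, id2, r, p) for every pair of rows across the two tables."""
    results = []
    for (id1, x), (id2, y) in product(table1.items(), table2.items()):
        results.append((id1, id2, pearson(x, y), permutation_p(x, y, n_perm)))
    return results


# Hypothetical tables measured on the same eight samples, in the same order.
table1 = {"otuA": [1, 2, 3, 4, 5, 6, 7, 8]}
table2 = {
    "metX": [2, 4, 6, 8, 10, 12, 14, 16],
    "metY": [5, 1, 4, 2, 8, 3, 7, 6],
}

results = between(table1, table2)
for id1, id2, r, p in results:
    print(f"{id1}\t{id2}\tr={r:.3f}\tp={p:.3f}")
```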

## Installation

### Installing using environment.yml
We recommend using mamba (or conda) and our `environment.yml` file to install the full environment in one step. See the [mamba documentation](https://mamba.readthedocs.io/en/latest/mamba-installation.html#mamba-install) for mamba installation steps.

```
wget https://raw.githubusercontent.com/lozuponelab/SCNIC/master/environment.yml
mamba env create -n SCNIC2 --file environment.yml

# Optional cleanup
rm environment.yml
```

If not using mamba, you can install using `conda env create` in place of `mamba env create`. However, conda will be slower and may struggle to solve the dependencies.

#### ARM architecture
**Users with Apple M1/M2 chips** or other ARM architecture should set `CONDA_SUBDIR=osx-64` at the beginning of the env create command:
```
CONDA_SUBDIR=osx-64 mamba env create -n SCNIC3 --file environment.yml
```

### Multi-step installation directly from Conda + pip
#### Step 1

We recommend installing all of SCNIC's dependencies via conda in a new conda environment. To do this, you only need to create a new environment with SCNIC installed. However, because the latest version of SCNIC has not always been available through conda, please then install SCNIC itself into your conda environment via pip. On some computers there are SciPy conflicts when installing via conda, so we also recommend installing SciPy via pip.
```
conda create -n SCNIC python=3 SCNIC
conda activate SCNIC
```

#### Step 2 (pip)
To install the latest release from PyPI, use these commands:
```
pip install "scipy>=1.9.0,<=1.10.1"
pip install SCNIC
```



### Dependencies
SCNIC depends on a variety of software, all of which can be installed via conda and most of which can be installed via pip. The recommended installation method is pip, but you must also install [`fastspar`](https://github.com/scwatts/fastspar) and `parallel` and have them in your path. If you installed using environment.yml, this should be unnecessary.

To do so, create a conda environment as described above, then install both fastspar and parallel into it.

For example:
```
conda install -c bioconda -c conda-forge fastspar
```
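
To check that both external tools are discoverable on your path, a short standard-library sketch such as the following can help (the function name `find_tools` is hypothetical and not part of SCNIC):

```python
import shutil

REQUIRED_TOOLS = ("fastspar", "parallel")


def find_tools(names=REQUIRED_TOOLS):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools().items():
    print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

Run it inside the activated conda environment, since that is the path SCNIC will see.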



### Install the latest version from github
To download the latest changes to the repository, use the following commands:
```
git clone https://github.com/lozuponelab/SCNIC.git
cd SCNIC/
python setup.py install
```
NOTE: The latest code may not be functional and should only be used if you want to experiment with the code that SCNIC is
based on.

## Example usage:

### 'within' mode:
```
SCNIC_analysis.py within -i example_table.biom -o within_output/ -m sparcc
```

### 'modules' mode:
```
SCNIC_analysis.py modules -i within_output/correls.txt -o modules_output/ --min_r .35 --table example_table.biom
```
NOTE: We use a minimum R value of .35 when running SparCC with 16S data because determining p-values would otherwise require a computationally demanding bootstrapping
procedure. We have run SparCC with 1 million bootstraps on a variety of datasets and
found that an R value of between .3 and .35 will always return FDR adjusted p-values less than .05 and .1 respectively.
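
For readers unfamiliar with FDR adjustment, the sketch below shows the Benjamini-Hochberg procedure, one common way to compute FDR adjusted p-values. The text above does not say which FDR method was used, so treat the choice of Benjamini-Hochberg as an assumption; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four correlations.
p_values = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(p_values)
for raw, adj in zip(p_values, adjusted):
    print(f"p = {raw:.3f} -> adjusted = {adj:.4f}")
```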

### 'between' mode:
```
SCNIC_analysis.py between -1 example_table1.biom -2 example_table2.biom -o output_folder/ -m spearman --min_p .05
```
