Metadata-Version: 1.2
Name: SVision
Version: 1.3.3
Summary: SV/CSV callers
Home-page: https://github.com/xjtu-omics/SVision
Author: Jiadong Lin, Songbo Wang
Author-email: jiadong66@stu.xjtu.edu.cn, songbowang125@163.com
License: GPLv3
Description: <img src="https://github.com/xjtu-omics/SVision/tree/master/supports/svision-logo.png" alt="svision_logo" width="30%" height="30%" align=center/>
        
        
        SVision is a deep learning-based structural variants caller that takes aligned reads or contigs as input. Especially, SVision implements a targeted multi-objects recognition framework, detecting and characterizing both simple and complex structural variants from three-channel similarity images.
        
        <img src="https://github.com/xjtu-omics/SVision/tree/master/supports/workflow.png" alt="SVision workflow" width="60%" height="60%" align=center/>
        
        
        ## License
        
        SVision is free for non-commercial use by academic, government, and non-profit/not-for-profit institutions. A commercial version of the software is available and licensed through Xi’an Jiaotong University. 
        For more information, please contact with Jiadong Lin (jiadong324@stu.xjtu.edu.cn) or Kai Ye (kaiye@xjtu.edu.cn).
        
        ## Install
        
        Step1: Create a python environment with conda
        
        ```
        conda create -n svision-env python=3.6
        ```
        Step2: Install required packages of specific versions
        
        ```
        conda install -c anaconda pysam==0.16.0
        conda install -c conda-forge opencv==4.5.1
        conda install -c conda-forge tensorflow==1.14.0
        ```
        Step3: Install SVision from PyPI
        
        ```
        pip install SVision
        ```
        
        (Optional) Install from source code
        
        ```
        git clone https://github.com/xjtu-omics/SVision.git
        cd SVision
        python setup.py install
        ```
        
        ## Usage
        
        ```
        SVision [parameters] -o <output path> -b <input bam path> -g <reference> -m <model path>
        ```
        
        Please check the [wiki](https://github.com/xjtu-omics/SVision/wiki) page for more usage details. 
        
        #### Input/output parameters
        
        ```
        -o OUT_PATH           Absolute path to output
        -b BAM_PATH           Absolute path to bam file
        -m MODEL_PATH         Absolute path to CNN predict model
        -g GENOME             Absolute path to your reference genome (.fai required in the directory)
        -n SAMPLE             Name of the BAM sample name
        ```
        
        ```-g``` path to the reference genome, the index file should under the same directory.
        
        ```-m``` path to the pre-trained deep learning model ([download link](https://drive.google.com/drive/folders/1j74IN6kPKEx9hy3aENx3zHYPUnyYWGvj?usp=sharing)).
        
        #### General parameters
        ```
        -t THREAD_NUM         Thread numbers [1]
        -s MIN_SUPPORT        Min support read number for an SV [1]
        -c CHROM              Specific region to detect, format: chr1:xxx-xxx or 1:xxx-xxx
        --hash_table          Activate hash table to align unmapped sequences
        --cluster_callset     Cluster original callset to merge uncovered event
        --report_mechanism    Report mechanisms for DEL event
        --report_graph        Report graph for events
        --contig              Activate contig mode
        ```
        
        ```--hash_table``` enables the image subtraction process, which is activated by default. 
        
        ```--report_graph``` enables the program to create the CSV graph in GFA format, which is not activated by default. 
        
        ```--report_mechanism``` is used to infer the formation mechansim according to the breakpoint sequence features. 
        This is still underdevelopment, which is not recommended to use for current version.
        
        ```--contig``` is used for calling from assemblies, which currently uses minimap2 aligned BAM file as input.
        
        #### Other parameters
        
        ```--partition_max_distsance``` maximum distance allowed of a group of feature sequences.
        
        ```--cluster_max_distance``` maximum distance for feature sequence clustering. This is implemented via Scipy hierarchical clustering.
        
        ```--k_size``` size of kmer used in hash-table realignment, only used when ```--hash_table``` is activated.
        
        ```--min_accept``` minimum matched segment length, default is 50bp.
        
        
        ## SVision output
        
        ### VCF
        
        The SV ```ID``` column is given in the format of ```a_b```, where ```b``` indicates site ```a``` contains other type of SVs. 
        
        Filters used in the output.
        
        ```Covered```: The entire SV is spanned by long-reads, producing the most confident calls.
        
        ```Uncovered```: SV is partially spanned by long-reads, i.e. reads spanning one of the breakpoints.
        
        ```Clustered```: SV is partially spanned by long-reads, but can be spanned through reads clusters.
        
        We add extra attributes in the ```INFO``` column of VCF format for SVision detected structural variants.
        
        ```BRPKS```: The CNN recognized breakpoint junctions through tMOR.
        
        ```GraphID```: The graph index used to indicate the graph structure, which requires ```--report_graph``` and is obtained by calculating isomorphic graphs. 
        The ID for simple SVs is -1. 
        
        ```VAF```: The estimated variant allele fraction, which is calculated by DV/DR. Note that SVision does not calculate the genotypes in the current version.
        
        ### CSV graph 
        
        #### Graph format
        
        The graph output requires ```--report_graph``` activated. 
        The below example is an CSV in rGFA format, which is detected by SVision at chr11:99,819,283-99,820,576 in HG00733. 
        The graph output is saved in separated files for each CSV events.
        
        ```
        S	S1	SN:Z:chr11	SO:i:99819338	SR:i:0	LN:i:2990
        S	I0	SN:Z:m54329U_190827_173812/140708091/ccs	SO:i:15813	SR:i:0	LN:i:1113
        S	I1	SN:Z:m54329U_190827_173812/140708091/ccs	SO:i:16927	SR:i:0	LN:i:466
        S	I2	SN:Z:m54329U_190827_173812/140708091/ccs	SO:i:17400	SR:i:0	LN:i:377	DP:S:S1:99820198
        S	I3	SN:Z:m54329U_190827_173812/140708091/ccs	SO:i:17778	SR:i:0	LN:i:838
        S	I4	SN:Z:m54329U_190827_173812/140708091/ccs	SO:i:18617	SR:i:0	LN:i:61	DP:S:S0:99819276
        L	S0	+	I0	+	0M	SR:i:0
        L	I0	+	I1	+	0M	SR:i:0
        L	I1	+	I2	-	0M	SR:i:0
        L	I2	-	I3	+	0M	SR:i:0
        L	I3	+	I4	+	0M	SR:i:0
        L	I4	+	S1	+	0M	SR:i:0
        ```
        
        Besides the information included in standard [rGFA](https://github.com/lh3/gfatools/blob/master/doc/rGFA.md) format, 
        we add another ```DP:S``` column to indicate sequence with detected origins via local realignment, 
        such as node ```I2``` is duplicated from node ```S1```. 
        
        #### Graph alignment (Experimental)
        
        **Note**: This function is not included in the current program, it is a post-processing step that tries to validate the detected CSVs. 
        
        To validate the detected CSV, we align raw HiFi reads to the mini graph (CSV graph) reported by SVision with GraphAligner.
        
        **Step1: Extract HiFi raw reads**
        
        ```
        samtools view -b HG00733.ngmlr.sorted.bam chr11:99810000-99830000 > tmp.bam
        samtools fasta tmp.bam > tmp.fasta
        ```
        
        **Step2: Align with GraphAligner**
        
        Please check [GraphAligner](https://github.com/maickrau/GraphAligner) for the detailed usage.
        
        ```
        GraphAligner -g chr11-99819283-99820576.gfa -f tmp.fasta -a aln.gaf -x vg
        ```
        
        Example of CSV path supporting reads
        
        ```
        m54329U_190827_173812/140708091/ccs     21668   0       21668   +       >S0>I0>I1<I2>I3>I4>S1
        m54329U_190617_231905/88145984/ccs      13612   0       13612   +       >S0>I0>I1<I2>I3>I4>S1
        m54329U_190617_231905/88145984/ccs      13612   0       13612   +       >S0>I0>I1<I2>I3>I4>S1
        ```
        
        ## Contact
        If you have any questions, please feel free to contact: jiadong66@stu.xjtu.edu.cn, songbowang125@163.com
        
Keywords: SVision,Deep learning,Complex structural variants,Structural variants,Single moleculo sequencing
Platform: UNKNOWN
Classifier: Operating System :: POSIX :: Linux
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3.6
Requires-Python: >=3.6
