Metadata-Version: 1.1
Name: NanoSV
Version: 1.1.1
Summary: Structural variation detection tool for Oxford Nanopore data.
Home-page: https://github.com/mroosmalen/nanosv
Author: Mark van Roosmalen
Author-email: m.vanroosmalen-2@umcutrecht.nl
License: MIT
Description: 
        NanoSV User Guide
        =================
        
        Table of Contents
        -----------------
        
        
        * `NanoSV <#nanosv>`_
        
          * `Summary <#summary>`_
          * `Installation <#installation>`_
          * `Citation <#citation>`_
        
        * `Pre-processing <#pre-processing>`_
        
          * `Basecalling <#basecalling>`_
          * `Mapping <#mapping>`_
          * `LAST mapping <#last-mapping>`_
        
            * `LAST installation <#last-installation>`_
            * `Running LAST <#running-last>`_
        
        * `SV calling using NanoSV <#sv-calling-using-nanosv>`_
        
          * `NanoSV usage <#nanosv-usage>`_
          * `NanoSV arguments and parameters <#nanosv-arguments-and-parameters>`_
        
            * `Required arguments <#required-arguments>`_
            * `Optional arguments <#optional-arguments>`_
            * `Optional configuration parameters <#optional-configuration-parameters>`_
            * `Ancillary files that can be used for running NanoSV <#ancillary-files-that-can-be-used-for-running-nanosv>`_
        
        NanoSV
        ------
        
        Summary
        ^^^^^^^
        
        NanoSV is a software package that can be used to identify structural genomic variations in long-read sequencing data, such as data produced by Oxford Nanopore Technologies’ MinION, GridION or PromethION instruments, or Pacific Biosciences sequencers.
        NanoSV has been extensively tested using Oxford Nanopore MinION sequencing data, as described here: https://www.nature.com/articles/s41467-017-01343-4
        The core algorithm of NanoSV identifies split mapped reads and clusters the split-mapped orientations and genomic positions to identify breakpoint-junctions of structural variations.
        
        Installation
        ^^^^^^^^^^^^
        
        NanoSV needs a working installation of python 3. You can install NanoSV using pip:
        
        .. code-block::
        
           > pip install nanosv
        
        Citation
        ^^^^^^^^
        
        Cretu Stancu, M. *et al.* Mapping and phasing of structural variation in patient genomes using nanopore sequencing. Nat. Commun. 8, 1326 **(2017)**. (https://www.nature.com/articles/s41467-017-01343-4)
        
        Pre-processing
        --------------
        
        Basecalling
        ^^^^^^^^^^^
        
        Raw sequencing data can be basecalled using any available basecaller that is suited for your data. We use albacore for MinION/GridION/PromethION sequencing data, available through the nanopore community: http://community.nanoporetech.com/. Albacore can directly produce fastq files suitable for subsequent mapping to a reference genome.
        
        Mapping
        ^^^^^^^
        
        NanoSV was built based on LAST mapping data. The current version of NanoSV is now also compatible with BWA MEM, MINIMAP2 and NGMLR.
        
        LAST mapping
        ^^^^^^^^^^^^
        
        LAST is able to map the reads in *non-overlapping* split segments.
        
        LAST installation
        ~~~~~~~~~~~~~~~~~
        
        Download the zip file from http://last.cbrc.jp/
        
        .. code-block::
        
           > gunzip last.zip
           > cd /path/to/lastdir
           > make
        
        Running LAST
        ~~~~~~~~~~~~
        
        First you need to index your reference genome by creating a lastal database:
        
        .. code-block::
        
           > lastdb [referencedb] [reference.fa]
        
        Train LAST to get the best scoring parameters for your particular alignment. We typically use a subset of > 10,000 reads for this step:
        
        .. code-block::
        
           > last-train -Q1 [referencedb] [reads_sample.fastq] > [my-params]
        
        Map your fastq data to reference:
        
        .. code-block::
        
           > lastal -Q1 -p [my-params] [referencedb] [reads.fastq] | last-split > [reads.maf]
        
        Convert the MAF file to SAM format:
        
        .. code-block::
        
           > maf-convert -f [reference.dict] sam -r ‘ID:[id] PL:[nanopore] SM:[sample]’ [reads.maf] > [reads.sam]
        
        The ``[reference.dict]`` file can be created by picard:
        
        .. code-block::
        
           > java -jar picard.jar CreateSequenceDictionary REFERENCE=[reference.fa] OUTPUT=[reference.dict]
        
        Convert SAM file to BAM file using sambamba (https://github.com/biod/sambamba) (samtools may function similarly):
        
        .. code-block::
        
           > sambamba view -h -S --format=bam [reads.sam] > [reads.bam]
        
        Sort the BAM file using sambamba: 
        
        .. code-block::
        
           > sambamba sort [reads.bam]
        
        All of the above commands can also be run at once using pipes:
        
        .. code-block::
        
           > lastal -Q1 -p [my-params] [referencedb] [reads.fastq] | \
           > last-split | \
           > maf-convert -f [reference.dict] sam -r ‘ID:[id] PL:[nanopore] SM:[sample]’ /dev/stdin | \
           > sambamba view -h -S --format=bam /dev/stdin | \
           > sambamba sort /dev/stdin -o [reads.sorted.bam]
        
        SV calling using NanoSV
        -----------------------
        
        NanoSV usage
        ^^^^^^^^^^^^
        
        .. code-block::
        
           > NanoSV [-h] [-s SAMBAMBA] [-c CONFIG] [-b BED] [-o OUTPUT] [reads.sorted.bam]
        
        NanoSV arguments and parameters:
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        
        required arguments:
        ~~~~~~~~~~~~~~~~~~~
        
        .. code-block::
        
           bam              :   /path/to/reads.sorted.bam
        
        optional arguments:
        ~~~~~~~~~~~~~~~~~~~
        
        .. code-block::
        
           -h, --help       :   Show the help message and exit
        
           -s, --sambamba   :   Give the full path to the sambamba or samtools executable [default: sambamba ]
        
           -c, --config     :   Give the full path to your own ini file [ default: config.ini ]
        
           -b, --bed        :   Give the full path to your own bed file, used for coverage depth calculations [default: human_hg19.bed ]
        
           -o, --output     :   Give the full path to the output vcf file [default: <stdout> ]
        
        optional configuration parameters:
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        NanoSV uses a config.ini file which contains default settings for all running parameters. Users can change the parameters by creating their own config.ini file and provide this as a command line argument [-c]
        
        .. code-block::
        
           #Reads and segments options
           [Filter options]
           # Maximum number of segments per read resulting from the mapping of the read the a reference sequence
           max_split = 10
           # Minimum percentage of identical bases of the mapped segment relative to the reference sequence      
           min_pid = 0.7
           # Minimum mapping quality of the segment
           min_mapq = 20
        
           #Parameters for tuning detection and clustering of breakpoints:
           [Detection options]
           # Maximum distance between two adjacent break-end positions
           cluster_distance = 10
           # Minimum number of breakpoint-junctions (i.e. split-read junctions) for clustering
           cluster_count = 2
           # Minimum flanking sequence length, to consider a read a reference read, i.e. the sequence mapped left and right of the breakpoint should be larger than the set value
           refreads_distance = 100
           # Minimum length of unmapped sequence for including reads as hanging reads that overlap (support) a break-end
           hanging_length = 20
           # Maximum distance to search for the MATEID, i.e. a reciprocal breakpoint-junction, for example an inversion consist of two breakpoint-junctions (3’-to-3’ and 5’-to-5’)
           mate_distance = 300
           # If TRUE, NanoSV will check the depth of coverage for possible breakpoint-junctions with orientations that indicate a possible deletion or duplication (3’-to-5’ and 5’-to-3’). Needs an auxiliar bed file, provided with -b to the main NanoSV command.
           depth_support = True
        
           #Parameters for setting the FILTER flag in the vcf output:
           [Output filter options]
           # Filter flag: LowQual, set if the QUAL score of the called structural variation is lower
           qual_flag = 20
           # Filter flag: SVcluster, set if there are more SVs within a window size, they will be marked as SVcluster
           window_size = 1000
           # Filter flag: SVcluster, set if the number of SVs within a certain window size (set by window_size above) exceeds this treshold
           svcluster = 2
           # Filter flag: MapQual, set if the median mapq is lower than specified by this parameter
           mapq_flag = 80
           # Filter flag: PID, set if the median percentage identity is lower than specified by this parameter
           pid_flag = 0.80
           # Filter flag: Gap, set if the median GAP is higher than specified by this parameter
           gap_flag = 100
           # Filter flag: CIPOS|CIEND, set if the CIPOS|CIEND is larger than specified by this parameter
           ci_flag = 30
        
        Ancillary files that can be used for running NanoSV:
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        To estimate a coverage increase or decrease near predicted breakpoint-junctions, the average coverage across a putative deletion or duplication interval is compared to the distribution of coverage across random positions in the reference sequence. This calculation is only performed if ``depth_support = True`` in config.ini. A default bed file is provided that contains 1,000,000 random positions on the hg19/GRCh37 human genome reference, excluding simple repeat regions (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/simpleRepeat.txt.gz) and gap regions (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/simpleRepeat.txt.gz). The file format is standard BED format (chr\<TAB&gt;startpos\<TAB&gt;endpos).
        
Keywords: nanosv sv-caller nanopore
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
