Metadata-Version: 1.1
Name: rmseq
Version: 0.0.53b0
Summary: RM-seq is a bioinformatics tool for for assessing resistance mutations from PE short-reads.
Home-page: UNKNOWN
Author: Romain Guerillot, Torsten Seemann, Mark B Schultz
Author-email: romain.guerillot@hotmail.fr
License: AGPL-3.0
Download-URL: https://github.com/rguerillot/RM-seq/zipball/master
Description: RM-seq: Resistance Mutation SEQuencing
        ======================================
        
        Analysis bioinformatic pipeline for high-throughput assessment of
        resistance mutations. RM-seq is an amplicon-based, deep-sequencing
        technique using single molecule barcoding. We have adapted this method
        to identify and characterise antibiotic resistance mutation.
        
        A complete descrition of the RM-seq method is available
        `here <https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-018-0572-z>`__:
        
        If you use this tool please cite: Guérillot R, Li L, Baines S, Howden
        BO, Schultz M, Seemann T et al. Comprehensive antibiotic-linked mutation
        assessment by Resistance Mutation Sequencing (RM-seq). 2018.
        doi:10.1101/257915.
        
        Is this the right tool for me?
        ------------------------------
        
        1. To be able to us this pipeline you need to have sequenced amplicon
           library with molecular barcodes.
        2. It only supports paired-end FASTQ reads (including .gz compressed
           fastq files).
        3. It needs paired reads that are overlapping.
        4. It needs a reference fasta sequences of the sequenced gene (DNA
           sequence).
        5. It's written in Python3 and Perl.
        
        Installation
        ------------
        
        Install RM-seq pipeline
        ~~~~~~~~~~~~~~~~~~~~~~~
        
        ::
        
            pip3 install rmseq
        
        Dependencies
        ~~~~~~~~~~~~
        
        RM-seq has the following package dependencies: \* EMBOSS >= 6.6 for
        ``clustalo``, ``cons``, ``getorf``, ``diffseq`` \* clustal-omega >=
        1.2.1 \* bwa >= 0.7.15 \* samtools >= 1.3 \* bedtools >= 2.26.0 \* pear
        >= 0.9.10 \* cd-hit >= 4.7 \* trimmomatic >= 0.36 \* seqtk >= 1.3-r106
        (only if you subsample reads) \* python modules: ``plumbum``,
        ``Biopython``
        
        If you are using the `OSX Brew <http://brew.sh/>`__ or
        `LinuxBrew <http://linuxbrew.sh/>`__ packaging system:
        
        ::
        
            brew tap homebrew/science
            brew tap tseemann/bioinformatics-linux
            brew install parallel; parallel --citation # please write will cite
            brew install bedtools
            brew install EMBOSS
            brew install clustal-omega
            brew install bwa
            brew install samtools
            brew install pear
            brew install cd-hit
            brew install trimmomatic
            brew install seqtk
            pip3 install plumbum
            pip3 install biopython
        
        Quick start
        -----------
        
        Do
        
        ::
        
            rmseq
        
        Help
        ~~~~
        
        ::
        
            usage: rmseq [-h]  ...
        
            Run RM-seq pipeline.
        
            optional arguments:
              -h, --help  show this help message and exit
        
            Commands:
        
                run       Run the pipeline.
                version   Print version.
                check     Check pipeline dependencies
                test      Run the test data set.
        
        To check dependencies are installed
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        ::
        
            rmseq check
        
        To run the test dataset
        ~~~~~~~~~~~~~~~~~~~~~~~
        
        ::
        
            rmseq test
        
        To run analysis pipeline, follow the steps in
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        ::
        
            rmseq run -h
            usage: rmseq run [options]
        
            Run the pipeline
        
            positional arguments:
              R1                    Path to read pair 1
              R2                    Path to read pair 2
              refnuc                Reference gene that will be used for premapping
                                    filtering (fasta).
              outdir                Output directory.
        
            optional arguments:
              -h, --help            show this help message and exit
              -d, --debug_on        Switch on debug mode.
              -f, --force           Force overwite of existing.
              -b BARLEN, --barlen BARLEN
                                    Length of barcode (default 16)
              -m MINFREQ, --minfreq MINFREQ
                                    Minimum barcode frequency to keep (default 5)
              -c CPUS, --cpus CPUS  Number of CPUs to use (default 72)
              -t TRANSLATION, --translation TRANSLATION
                                    Manually set the reading frame for translation (use 1,
                                    2 or 3 - use getorf by default)
              -r MINSIZE, --minsize MINSIZE
                                    Minimum ORF size in bp used when annotating variants
                                    (default 200)
              -w WSIZE, --wsize WSIZE
                                    Word-size option to pass to diffseq for comparison
                                    with reference sequence (default 5)
              -s SUBSAMPLE, --subsample SUBSAMPLE
                                    Only examine this many reads.
              -k, --keepfiles       Keep the intermediate files. Default is to remove
                                    intermediate files
        
        To check the version
        ~~~~~~~~~~~~~~~~~~~~
        
        ::
        
            rmseq version
        
        Outputs
        -------
        
        RM-seq produces a tap-separated output file called amplicons.effect
        where each raw correspond to a consensus amplicon (a genetic variant in
        the sequenced population):
        
        +----------------------+-------------------------+-----------------------------------------------------------------------------------------+
        | Column               | Example                 | Description                                                                             |
        +======================+=========================+=========================================================================================+
        | barcode              | GACACAACTGAGATTA        | sequence of the barcode                                                                 |
        +----------------------+-------------------------+-----------------------------------------------------------------------------------------+
        | sample               | Rifampicin1             | output folder name                                                                      |
        +----------------------+-------------------------+-----------------------------------------------------------------------------------------+
        | prot\_mutation       | H481N                   | annotation of the amino acid change (Histidine residue 481 substituted by Asparagine)   |
        +----------------------+-------------------------+-----------------------------------------------------------------------------------------+
        | prot\_start          | 481                     | start coordinate of the mutation                                                        |
        +----------------------+-------------------------+-----------------------------------------------------------------------------------------+
        | prot\_end            | 481                     | end coordinate of the mutation                                                          |
        +----------------------+-------------------------+-----------------------------------------------------------------------------------------+
        | nuc\_mutation        | C1443G                  | annotation of the nucleotide change                                                     |
        +----------------------+-------------------------+-----------------------------------------------------------------------------------------+
        | nuc\_start           | 1443                    | start coordinate of the nucleotide change                                               |
        +----------------------+-------------------------+-----------------------------------------------------------------------------------------+
        | nuc\_end             | 1443                    | end coordinate of the nucleotide change                                                 |
        +----------------------+-------------------------+-----------------------------------------------------------------------------------------+
        | prot                 | VRPPDKNNRFVGLYCTLV...   | protein sequence of the consensus sequence                                              |
        +----------------------+-------------------------+-----------------------------------------------------------------------------------------+
        | dna                  | GGTTAGACCACCCGATAA...   | dna sequence of the consensus sequence                                                  |
        +----------------------+-------------------------+-----------------------------------------------------------------------------------------+
        | reference\_barcode   | CTGACACGTCCTGAAG        | barcode of the identical consesnsus amplicon used for annotation                        |
        +----------------------+-------------------------+-----------------------------------------------------------------------------------------+
        
        The other files produced by RM-seq are:
        
        +-----------------------+---------------------------------------------------------------------------------------------------------------+
        | File name             | Description                                                                                                   |
        +=======================+===============================================================================================================+
        | amplicons.barcodes    | Table with the count of each barcode sequence                                                                 |
        +-----------------------+---------------------------------------------------------------------------------------------------------------+
        | amplicons.fna         | Multifasta file containing all the consensus nucleotide sequence (header of sequence is the barcode)          |
        +-----------------------+---------------------------------------------------------------------------------------------------------------+
        | amplicons.faa         | Multifasta file containing all the consensus protein sequence (header of sequence is the barcode)             |
        +-----------------------+---------------------------------------------------------------------------------------------------------------+
        | amplicons.fna.cdhit   | Multifasta file containing all the unique consensus nucleotide sequence (header of sequence is the barcode)   |
        +-----------------------+---------------------------------------------------------------------------------------------------------------+
        | amplicons.faa.cdhit   | Multifasta file containing all the unique consensus amino acid sequence (header of sequence is the barcode)   |
        +-----------------------+---------------------------------------------------------------------------------------------------------------+
        
        Issues
        ------
        
        Please report problems to the `Issues
        Page <https://github.com/rguerillot/RM-seq/issues>`__.
        
        Authors
        -------
        
        Romain Guerillot \| Torsten Seemann \| Mark B Schultz (github: schultzm)
        
Keywords: resistance,mutation
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
Classifier: Programming Language :: Python :: 3.5
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: Intended Audience :: Science/Research
