Metadata-Version: 2.1
Name: bgsignature
Version: 0.2
Summary: Compute genome signatures
Home-page: https://bitbucket.org/bgframework/bgsignature
Author: Barcelona Biomedical Genomics Lab
Author-email: bbglab@irbbarcelona.org
License: Apache Software License 2.0
Description: 
        .. |bs| replace:: **bgsignature**
        
        BGSignature
        ===========
        
        |bs| is a package used to compute signatures.
        
        The most basic computation is counting
        the different k-mers (e.g. with k equal to 3 or 5).
        Counts can be computed for a set of mutations,
        for a set of regions or for a set of mutations
        that fall within certain regions.
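        
        As a rough illustration of what counting k-mers means,
        the sketch below counts overlapping k-mers in a plain string.
        It is not the |bs| implementation: the function names
        ``count_kmers`` and ``context_at`` are hypothetical, and the toy
        sequence stands in for a reference genome.
        
        .. code:: python
        
           from collections import Counter
        
           def count_kmers(sequence, k=3):
               """Count every overlapping k-mer in ``sequence``."""
               sequence = sequence.upper()
               return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
        
           def context_at(sequence, index, k=3):
               """Return the k-mer centred on the 0-based ``index`` (k is assumed odd).
        
               Returns None when the k-mer would run past either end of the sequence.
               """
               half = k // 2
               if index - half < 0 or index + half >= len(sequence):
                   return None
               return sequence[index - half:index + half + 1]
        
           # Toy sequence, made up for the example
           counts = count_kmers("ACGTACGT", k=3)
           context = context_at("ACGTACGT", 3, k=3)
        
        Here ``counts`` maps each triplet to the number of times it occurs,
        and ``context`` is the triplet around one position, which is the
        kind of k-mer that would be counted for a mutation at that position.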
        
        
        |bs| consists of 3 tools:
        
        - **count**: count different k-mers
        - **frequency**: divide the counts by the total counts
        - **normalize**: divide the counts by counts obtained
          separately and normalize the results.
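        
        The arithmetic behind **frequency** and **normalize** can be
        sketched with plain dictionaries. This is an illustration under
        assumptions, not the code that |bs| runs: the function names
        ``to_frequency`` and ``normalize_by`` are hypothetical, and
        rescaling the ratios so that they sum to 1 is only one reading
        of "normalize the results".
        
        .. code:: python
        
           def to_frequency(counts):
               """Divide each count by the total of all counts."""
               total = sum(counts.values())
               if total == 0:
                   # Nothing to scale; return an unchanged copy
                   return dict(counts)
               return {key: value / total for key, value in counts.items()}
        
           def normalize_by(counts, norm_counts):
               """Divide each count by its normalization count, then rescale to sum to 1.
        
               Keys with a missing or zero normalization count are skipped (an assumption).
               """
               ratios = {
                   key: value / norm_counts[key]
                   for key, value in counts.items()
                   if norm_counts.get(key)
               }
               return to_frequency(ratios)
        
           # Made-up numbers, only for illustration
           counts = {"ACG": 2, "CGT": 2, "GTA": 1, "TAC": 1}
           norm_counts = {"ACG": 4, "CGT": 2, "GTA": 1, "TAC": 1}
        
           frequencies = to_frequency(counts)
           normalized = normalize_by(counts, norm_counts)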
        
        **Advanced features** include:
        
        - ability to group the counts (e.g. group mutations by sample)
        - normalize the counts by the context taken from a regions file
        - collapse (add together) reverse complementary sequences
        
        
        
        Installation
        ------------
        
        This project is a Python package
        and can be installed with ``pip``.
        Download the source code, change into the
        project directory and execute:
        
        .. code:: bash
        
           pip install .
        
        
        Usage
        -----
        
        Command line interface
        **********************
        
        The 3 tools can be called using
        
        - *bgsignature count*
        - *bgsignature frequency*
        - *bgsignature normalize*
        
        Some examples:
        
        - getting help:
        
            .. code:: bash
        
               bgsignature -h
               bgsignature frequency -h
        
        - count triplets in mutations that fall in certain regions using hg38:
        
            .. code:: bash
        
               bgsignature count -m my/muts/file -r my/regions/file \
                   -g hg38 -o my/output.json --cores 4
        
        
        Python
        ******
        
        Alternatively, the command line tools have equivalents in Python:
        
        .. code:: python
        
           from bgsignature import count, relative_frequency, normalize
        
        These functions accept parameters similar to the command line
        options, except for the output.
        The returned object can be used as a dictionary.
        
        If you already have your files loaded in Python,
        you can directly use the ``count`` function
        in the corresponding module.
        E.g.:
        
        .. code:: python
        
           from bgsignature.count import mutation
           mutation.count(mutations, 'hg38', 3)
        
        In addition, you can also
        use the "low-level" functions that
        do the count (``count_all``
        and ``count_group``),
        which are much simpler and do not
        perform any kind of parallelization.
        E.g.:
        
        .. code:: python
        
           from bgsignature.count import mutation
           mutation.count_all(mutations, 'hg38', 3)
           # or to group mutations by sample
           mutation.count_group(mutations, 'hg38', 3, 'SAMPLE')
        
        
        The returned object can be normalized to sum to 1
        using the ``sum1()`` method,
        or divided by some normalization counts
        using the ``normalize()`` method.
        
        
        
        Important
        ---------
        
        There are some behavioural characteristics that
        must be taken into account:
        
        - |bs| filters out mutations whose reference nucleotide
          (as provided in the file) does not match the
          corresponding one in the reference genome.
        
        - when using the ``collapse`` option (enabled by default),
          |bs| does not remove one of the collapsed sequences but keeps both.
          This means that you need to manually remove the ones you
          are not interested in.
        
        - when using the ``bgsignature.count.mutation.count``
          or ``bgsignature.count.region.count`` function
          with a number of ``cores`` for parallelization,
          the ``chunk`` parameter must be chosen
          carefully, as it can have a huge impact on performance.
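        
        The ``collapse`` behaviour described above, and the manual
        clean-up it requires, can be sketched as follows. This is a
        simplified illustration and not the |bs| implementation: it
        assumes the keys are plain k-mer strings, the function names are
        hypothetical, and keeping the alphabetically smaller sequence of
        each pair is an arbitrary choice (many signature analyses instead
        keep the sequence with a pyrimidine at the central position).
        
        .. code:: python
        
           _COMPLEMENT = str.maketrans("ACGT", "TGCA")
        
           def reverse_complement(kmer):
               """Return the reverse complement of a DNA k-mer."""
               return kmer.translate(_COMPLEMENT)[::-1]
        
           def collapse(counts):
               """Add together the counts of reverse complementary k-mers, keeping both keys."""
               collapsed = {}
               for kmer, value in counts.items():
                   partner = reverse_complement(kmer)
                   # A k-mer equal to its own reverse complement is not counted twice
                   total = value if partner == kmer else value + counts.get(partner, 0)
                   collapsed[kmer] = total
                   collapsed[partner] = total
               return collapsed
        
           def keep_one_per_pair(collapsed):
               """Keep only the alphabetically smaller k-mer of each pair."""
               return {
                   kmer: value
                   for kmer, value in collapsed.items()
                   if kmer <= reverse_complement(kmer)
               }
        
           # Made-up counts, only for illustration
           collapsed = collapse({"ACG": 3, "CGT": 2, "TTT": 1})
           unique = keep_one_per_pair(collapsed)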
        
        File formats
        ------------
        
        Mutations file
        **************
        
        Tab separated file
        (can be compressed into ``gz``, ``bgz`` or ``xz`` formats)
        with a header and at least these columns:
        ``CHROMOSOME``, ``POSITION``, ``REF``, ``ALT``.
        In addition, ``SAMPLE``, ``CANCER_TYPE`` and ``SIGNATURE``
        are optional columns that can be used for
        grouping the signature.
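        
        To make the layout concrete, the following sketch parses a small
        mutations table with the standard ``csv`` module, checks the
        required columns and groups the rows by ``SAMPLE``.
        The rows are made up, and ``read_mutations`` is a hypothetical
        helper that is not part of |bs|.
        
        .. code:: python
        
           import csv
           import io
           from collections import defaultdict
        
           REQUIRED_COLUMNS = ("CHROMOSOME", "POSITION", "REF", "ALT")
        
           def read_mutations(handle):
               """Parse a tab separated mutations file and check its header."""
               reader = csv.DictReader(handle, delimiter="\t")
               header = reader.fieldnames or []
               missing = [name for name in REQUIRED_COLUMNS if name not in header]
               if missing:
                   raise ValueError("missing columns: " + ", ".join(missing))
               return list(reader)
        
           # Made-up rows, only to show the layout
           example = (
               "CHROMOSOME\tPOSITION\tREF\tALT\tSAMPLE\n"
               "1\t1000\tC\tT\tsample_a\n"
               "1\t2000\tG\tA\tsample_b\n"
               "2\t1500\tC\tA\tsample_a\n"
           )
           mutations = read_mutations(io.StringIO(example))
        
           # Group the rows by the optional SAMPLE column
           by_sample = defaultdict(list)
           for row in mutations:
               by_sample[row["SAMPLE"]].append(row)
        
        All values are read as strings, so ``POSITION`` needs an explicit
        ``int()`` conversion before any arithmetic.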
        
        
        Regions file
        ************
        
        Tab separated file
        (can be compressed into ``gz``, ``bgz`` or ``xz`` formats)
        with a header and at least these columns:
        ``CHROMOSOME``, ``START``, ``END``, ``ELEMENT``.
        In addition, ``SYMBOL``, and ``SEGMENT``
        are optional columns that can be used for
        grouping the signature.
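        
        As a sketch of the compressed variant, the snippet below writes a
        small regions table to a ``gz`` file and reads it back using only
        the standard library. The coordinates and element names are made
        up, and the file name ``regions.tsv.gz`` is just an example.
        
        .. code:: python
        
           import csv
           import gzip
           import os
           import tempfile
        
           COLUMNS = ["CHROMOSOME", "START", "END", "ELEMENT"]
        
           # Made-up regions, only to show the layout
           regions = [
               ["1", "100", "200", "gene_a"],
               ["1", "300", "450", "gene_a"],
               ["2", "50", "80", "gene_b"],
           ]
        
           with tempfile.TemporaryDirectory() as tmp:
               path = os.path.join(tmp, "regions.tsv.gz")
        
               # Write a gzip-compressed, tab separated file with a header
               with gzip.open(path, "wt", newline="") as handle:
                   writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
                   writer.writerow(COLUMNS)
                   writer.writerows(regions)
        
               # Read it back as one dictionary per region
               with gzip.open(path, "rt", newline="") as handle:
                   rows = list(csv.DictReader(handle, delimiter="\t"))
        
        For ``xz`` files, the standard ``lzma`` module offers a similar
        ``lzma.open`` function.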
        
        
        
        Support
        -------
        
        If you are having issues, please let us know.
        You can contact us at: bbglab@irbbarcelona.org
        
Platform: UNKNOWN
Description-Content-Type: text/x-rst
