Metadata-Version: 2.1
Name: biolearns
Version: 0.0.13
Summary: BioLearns: Computational Biology and Bioinformatics Toolbox in Python
Home-page: http://biolearns.com
Author: Zhi Huang
Author-email: huang898@purdue.edu
License: UNKNOWN
Description: # biolearns
        BioLearns: Computational Biology and Bioinformatics Toolbox in Python http://biolearns.com
        
        <div style="text-align:center"><img src="http://biolearns.com/img/logo.png" width=300/></div>
        
        
        ## Installation
        
        * From PyPI
        
        ```bash
        pip install biolearns
        ```
        
        ## Documentation and Tutorials
        
        * Three examples are shown below. For the full list of tutorials, see the GitHub wiki:
        
            [Wiki](https://github.com/huangzhii/biolearns/wiki)
        
        
        
        
        ### 1. Read TCGA Data
        
        #### Example: Read TCGA Breast invasive carcinoma (BRCA) data
        
        Data is downloaded directly from https://gdac.broadinstitute.org/.
        The results here are in whole or part based upon data generated by 
        the TCGA Research Network: https://www.cancer.gov/tcga.
        
        ```python
        from biolearns.dataset.TCGA import TCGACancer
        ```
        
        ```python
        brca = TCGACancer('BRCA')
        mRNAseq = brca.mRNAseq
        clinical = brca.clinical
        ```
        
        #### TCGA cancer table shortcut:
        
        |              | Barcode            | Cancer full name         | Version            |
        |---|---|---|---|
        | 1      |  ACC          |  Adrenocortical carcinoma     | 2016_01_28 |
        | 2      |  BLCA         |  Bladder urothelial carcinoma         | 2016_01_28 |
        | 3      |  BRCA         |  Breast invasive carcinoma    | 2016_01_28 |
        | 4      |  CESC         |  Cervical and endocervical cancers    | 2016_01_28 |
        | 5      |  CHOL         |  Cholangiocarcinoma   | 2016_01_28 |
        | 6      |  COAD         |  Colon adenocarcinoma         | 2016_01_28 |
        | 7      |  COADREAD     |  Colorectal adenocarcinoma    | 2016_01_28 |
        | 8      |  DLBC         |  Lymphoid Neoplasm Diffuse Large B-cell Lymphoma      | 2016_01_28 |
        | 9      |  ESCA         |  Esophageal carcinoma         | 2016_01_28 |
        | ...     |  ...         |  ...          | ... |
        
        
        ### 2. Gene Co-expression Analysis
        
        First, download and load the mRNAseq data.
        ```python
        from biolearns.dataset.TCGA import TCGACancer
        
        brca = TCGACancer('BRCA')
        mRNAseq = brca.mRNAseq
        ```
        
        mRNAseq data is noisy. We first remove the 50% of genes with the lowest mean expression, and then remove the 50% of the remaining genes with the lowest variance.
        
        ```python
        from biolearns.preprocessing.filter import expression_filter
        
        mRNAseq = expression_filter(mRNAseq, meanq=0.5, varq=0.5)
        ```
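        
        The two-stage filter described above can be sketched in plain Python. This is an illustration of the idea only, not the implementation of `expression_filter`: the helper `filter_by_quantile` and the toy data are hypothetical, and the library may handle ties and quantile boundaries differently.
        
        ```python
        from statistics import mean, pvariance
        
        def filter_by_quantile(expr, meanq=0.5, varq=0.5):
            """Drop the lowest meanq fraction of genes by mean, then the
            lowest varq fraction of the remaining genes by variance.
        
            expr maps gene name -> list of expression values (toy format).
            """
            def drop_lowest(scores, q):
                # Rank genes by score and drop the lowest fraction q.
                ranked = sorted(scores, key=scores.get)
                n_drop = int(len(ranked) * q)
                return set(ranked[n_drop:])
        
            kept = drop_lowest({g: mean(v) for g, v in expr.items()}, meanq)
            kept = drop_lowest({g: pvariance(expr[g]) for g in kept}, varq)
            # Preserve the original gene order.
            return {g: v for g, v in expr.items() if g in kept}
        
        # Hypothetical toy matrix: four genes measured in four samples.
        toy = {
            "geneA": [0.1, 0.2, 0.1, 0.2],   # low mean
            "geneB": [0.0, 0.1, 0.0, 0.1],   # low mean
            "geneC": [5.0, 5.1, 5.0, 5.1],   # high mean, low variance
            "geneD": [2.0, 9.0, 1.0, 8.0],   # high mean, high variance
        }
        filtered = filter_by_quantile(toy)
        print(list(filtered))
        ```
        
        In this toy example only the gene with both a high mean and a high variance survives the two filters.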
        
        We then create an lmQCM object, ```lobj```, from the filtered data.
        
        The gene co-expression analysis is performed by calling its ```fit()``` method.
        
        ```python
        from biolearns.coexpression.lmQCM import lmQCM
        
        lobj = lmQCM(mRNAseq)
        clusters, genes, eigengene_mat = lobj.fit()
        ```
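        
        Co-expression methods such as lmQCM typically start from pairwise correlations between gene expression profiles. The sketch below computes Pearson correlations for a hypothetical three-gene toy data set; it only illustrates that starting point and is not the lmQCM algorithm or part of the biolearns API.
        
        ```python
        from math import sqrt
        
        def pearson(x, y):
            """Pearson correlation coefficient of two equal-length sequences."""
            n = len(x)
            mx, my = sum(x) / n, sum(y) / n
            cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
            sx = sqrt(sum((a - mx) ** 2 for a in x))
            sy = sqrt(sum((b - my) ** 2 for b in y))
            return cov / (sx * sy)
        
        # Hypothetical toy profiles: three genes measured in five samples.
        profiles = {
            "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
            "g2": [2.0, 4.0, 6.0, 8.0, 10.0],   # rises with g1
            "g3": [5.0, 4.0, 3.0, 2.0, 1.0],    # falls as g1 rises
        }
        names = list(profiles)
        # Correlation for every ordered pair of genes.
        corr = {(a, b): pearson(profiles[a], profiles[b]) for a in names for b in names}
        ```
        
        Genes whose profiles rise and fall together (here `g1` and `g2`) have a correlation close to 1 and are candidates for the same co-expression cluster.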
        
        ### 3. Univariate Survival Analysis
        
        First, download and load the mRNAseq data, using breast cancer as an example.
        ```python
        from biolearns.dataset.TCGA import TCGACancer
        
        brca = TCGACancer('BRCA')
        mRNAseq = brca.mRNAseq
        ```
        
        We import `logranktest` from the survival subpackage and choose the gene "ABLIM3" as the univariate input.
        ```python
        from biolearns.survival import logranktest
        
        # Expression values of ABLIM3 across all samples.
        r = mRNAseq.loc['ABLIM3',].values
        ```
        
        We then restrict the expression, survival time, and event data to the patients present in both the expression and clinical tables:
        ```python
        import numpy as np
        
        clinical = brca.clinical
        # The first 12 characters of a TCGA barcode identify the patient.
        bcd_m = [b[:12] for b in mRNAseq.columns]
        bcd_p = [b[:12] for b in clinical.index]
        # Shared patients, with their positions in each list in matching order.
        bcd, idx_m, idx_p = np.intersect1d(bcd_m, bcd_p, return_indices=True)
        
        r = r[idx_m]
        t = brca.overall_survival_time[idx_p]
        e = brca.overall_survival_event[idx_p]
        ```
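        
        The alignment step relies on the first 12 characters of a TCGA barcode identifying the patient. A minimal plain-Python sketch of the same idea, using made-up barcodes and values (all names below are hypothetical):
        
        ```python
        # Hypothetical sample-level barcodes (expression) and patient-level barcodes (clinical).
        sample_barcodes = ["TCGA-A1-A0SB-01A", "TCGA-A1-A0SD-01A", "TCGA-A2-A04N-01A"]
        patient_barcodes = ["TCGA-A2-A04N", "TCGA-A1-A0SB", "TCGA-ZZ-0000"]
        expression = [7.1, 6.4, 8.0]   # one value per sample barcode
        survival = [900, 1200, 300]    # one value per patient barcode
        
        # Map each patient to the position of their first sample.
        first_col = {}
        for i, b in enumerate(sample_barcodes):
            first_col.setdefault(b[:12], i)
        # Map each patient to their row in the clinical data.
        row = {b[:12]: i for i, b in enumerate(patient_barcodes)}
        
        # Keep only patients present in both, in one shared order.
        shared = sorted(set(first_col) & set(row))
        r_toy = [expression[first_col[p]] for p in shared]
        t_toy = [survival[row[p]] for p in shared]
        ```
        
        Building both lists from the same `shared` order keeps each expression value paired with the survival time of the same patient, even when the two tables list patients in different orders.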
        
        We then perform the log-rank test:
        
        ```python
        # Drop patients with a missing survival time.
        valid = ~np.isnan(t)
        logrank_results, fig = logranktest(r[valid], t[valid], e[valid])
        test_statistic, p_value = logrank_results.test_statistic, logrank_results.p_value
        ```
        
        The output figure looks like this:
        
        <div style="text-align:center"><img src="https://github.com/huangzhii/biolearns/blob/master/figures/survival_plot_BRCA_ABLIM3.png" width=600/></div>
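        
        For readers unfamiliar with the log-rank test, the statistic can be sketched in plain Python. How `logranktest` turns a continuous input into groups is not stated here, so this sketch assumes a median split; the `logrank` helper and the toy data are hypothetical and do not reproduce the library's implementation.
        
        ```python
        from math import erfc, sqrt
        from statistics import median
        
        def logrank(times, events, groups):
            """Two-group log-rank test; returns (chi-square statistic, p-value)."""
            observed = expected = variance = 0.0
            # Visit each distinct time at which at least one event occurred.
            for tj in sorted({t for t, e in zip(times, events) if e}):
                # Subjects still at risk at tj, overall and in group 1.
                n = sum(t >= tj for t in times)
                n1 = sum(t >= tj and g for t, g in zip(times, groups))
                # Events occurring exactly at tj, overall and in group 1.
                d = sum(t == tj and e for t, e in zip(times, events))
                d1 = sum(t == tj and e and g for t, e, g in zip(times, events, groups))
                observed += d1
                expected += d * n1 / n
                if n > 1:
                    variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
            chi2 = (observed - expected) ** 2 / variance
            # Survival function of a chi-square distribution with 1 degree of freedom.
            return chi2, erfc(sqrt(chi2 / 2))
        
        # Hypothetical toy data: expression, follow-up time, and event flag (1 = event).
        expr = [9.0, 8.5, 9.2, 8.8, 2.0, 2.5, 1.5, 3.0]
        times = [5, 8, 12, 20, 3, 4, 6, 7]
        events = [1, 1, 0, 1, 1, 1, 1, 1]
        
        # Assumed grouping: high vs. low expression, split at the median.
        groups = [v > median(expr) for v in expr]
        chi2, p_value = logrank(times, events, groups)
        ```
        
        A small p-value suggests that survival differs between the high- and low-expression groups.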
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
