Metadata-Version: 2.1
Name: BRACoD
Version: 0.3.0
Summary: BRACoD is a method to identify associations between bacteria and physiological variables in Microbiome data
Home-page: https://github.com/ajverster/BRACoD/tree/main
Author: ['Adrian Verster']
Author-email: adrian.verster@canada.ca
License: UNKNOWN
Platform: UNKNOWN
Requires-Python: >3.6
Description-Content-Type: text/markdown
Requires-Dist: pymc3 (==3.9.0)
Requires-Dist: pandas (>=0.24.0)
Requires-Dist: numpy (>=1.15)
Requires-Dist: scikit-learn (>=0.20)
Requires-Dist: arviz (<=0.10)
Requires-Dist: Theano (>=1.0.5)

# BRACoD: Bayesian Regression Analysis of Compositional Data

### Installation

Install the Python package with pip:

    pip install BRACoD

There is also an R interface, which requires the Python package to be installed. A helper function can install it for you, but installing it with pip as shown above may be easier.

    devtools::install_github("ajverster/BRACoD/BRACoD.R")
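
Because the R interface relies on the Python package, it can help to confirm that the package is importable before moving on. The following is a minimal sketch that uses only the standard library; `is_installed` is a hypothetical helper written for this illustration and is not part of BRACoD. Run it in the same Python environment that R will use.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate `module_name`.

    Hypothetical helper for illustration; not part of BRACoD.
    """
    return importlib.util.find_spec(module_name) is not None


# The walkthrough below imports the package as `BRACoD`.
print("BRACoD importable:", is_installed("BRACoD"))
```

If this prints `False`, the package is not visible in the current environment; rerunning `pip install BRACoD` in that environment is the first thing to try.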

### Python Walkthrough

1. Simulate some data and normalize it

    ```python
    import BRACoD
    sim_counts, sim_y, contributions = BRACoD.simulate_microbiome_counts(BRACoD.df_counts_obesity)
    sim_relab = BRACoD.scale_counts(sim_counts)
    ```

2. Run BRACoD

    ```python
    trace = BRACoD.run_bracod(sim_relab, sim_y, n_sample = 1000, n_burn=1000, njobs=4)
    ```

3. Examine the diagnostics

    ```python
    BRACoD.convergence_tests(trace, sim_relab)
    ```

4. Examine the results

    ```python
    df_results = BRACoD.summarize_trace(trace, sim_counts.columns, 0.3)
    ```

5. Compare the results to the simulated truth

    ```python
    import numpy as np

    # Bacteria flagged by BRACoD vs. those with a nonzero simulated contribution
    bugs_identified = df_results["bugs"].values
    bugs_actual = np.where(contributions != 0)[0]

    precision, recall, f1 = BRACoD.score(bugs_identified, bugs_actual)
    print("Precision: {}, Recall: {}, F1: {}".format(precision, recall, f1))
    ```

6. Try it with real data. BRACoD includes helper functions to threshold and process your counts; this example uses the bundled obesity dataset.

    ```python
    df_counts = BRACoD.threshold_count_data(BRACoD.df_counts_obesity)
    df_rel = BRACoD.scale_counts(df_counts)
    df_rel, Y = BRACoD.remove_null(df_rel, BRACoD.df_scfa_obesity["butyric"].values)
    trace = BRACoD.run_bracod(df_rel, Y, n_sample = 1000, n_burn=1000, njobs=4)
    df_results = BRACoD.summarize_trace(trace, df_counts.columns, 0.3)
    ```
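
To make step 5 concrete, here is a small standalone sketch of how precision, recall, and F1 can be computed from two sets of bacteria indices. It is an illustration of the standard definitions, not BRACoD's own `score` implementation, and it assumes both inputs are integer indices. The function name `precision_recall_f1` and the toy arrays are hypothetical.

```python
import numpy as np


def precision_recall_f1(identified, actual):
    """Score a set of identified indices against the true set.

    Illustrative only; BRACoD.score may differ in its details.
    """
    identified = {int(i) for i in identified}
    actual = {int(i) for i in actual}
    true_positives = len(identified & actual)
    precision = true_positives / len(identified) if identified else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy ground truth: bacteria 1, 3 and 5 have nonzero contributions.
toy_contributions = np.array([0.0, 1.5, 0.0, -2.0, 0.0, 0.7])
toy_actual = np.where(toy_contributions != 0)[0]

# Toy result: two correct hits (1, 3) and one false positive (4).
toy_identified = np.array([1, 3, 4])

toy_precision, toy_recall, toy_f1 = precision_recall_f1(toy_identified, toy_actual)
print("Precision: {:.2f}, Recall: {:.2f}, F1: {:.2f}".format(
    toy_precision, toy_recall, toy_f1))
```

Precision is the fraction of identified bacteria that are truly associated, recall is the fraction of truly associated bacteria that were identified, and F1 is their harmonic mean.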

### R Walkthrough

1. Simulate some data and normalize it

    ```R
    library('BRACoD.R')
    data(obesity)
    r <- simulate_microbiome_counts(df_counts_obesity)

    sim_counts <- r[[1]]
    sim_y <- r[[2]]
    contributions <- r[[3]]
    sim_relab <- scale_counts(sim_counts)
    ```

2. Run BRACoD

    ```R
    trace <- run_bracod(sim_relab, sim_y, n_sample = 1000, n_burn=1000, njobs=4)
    ```

3. Examine the diagnostics

    ```R
    convergence_tests(trace, sim_relab)
    ```

4. Examine the results

    ```R
    df_results <- summarize_trace(trace, colnames(sim_counts))
    ```

5. Compare the results to the simulated truth

    ```R
    bugs_identified <- df_results$bugs
    bugs_actual <- which(contributions != 0)

    r <- score(bugs_identified, bugs_actual)

    precision <- r[[1]]
    recall <- r[[2]]
    f1 <- r[[3]]

    print(sprintf("Precision: %.2f, Recall: %.2f, F1: %.2f",precision, recall, f1))
    ```

6. Try it with real data. BRACoD includes helper functions to threshold and process your counts; this example uses the bundled obesity dataset.

    ```R
    df_counts_obesity_sub <- threshold_count_data(df_counts_obesity)
    df_rel <- scale_counts(df_counts_obesity_sub)
    # Y is your outcome variable: a numeric vector with one value per sample
    r <- remove_null(df_rel, Y)
    df_rel <- r[[1]]
    Y <- r[[2]]

    trace <- run_bracod(df_rel, Y, n_sample = 1000, n_burn=1000, njobs=4)
    df_results <- summarize_trace(trace, colnames(df_counts_obesity_sub), 0.3)
    ```
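
Both walkthroughs end with the same preprocessing idea: convert counts to relative abundances and drop samples whose outcome value is missing. The sketch below shows that idea in plain NumPy, in Python for consistency with the first walkthrough. It is an assumption-laden illustration and not the code behind `scale_counts` or `remove_null`, which may handle zeros, pseudocounts, or other details differently. The function names and toy data are hypothetical.

```python
import numpy as np


def to_relative_abundance(counts):
    """Divide each sample (row) by its total so that rows sum to 1.

    Illustrative only; assumes every sample has at least one read.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    return counts / totals


def drop_missing_outcomes(abundances, outcome):
    """Keep only the samples whose outcome value is not NaN."""
    abundances = np.asarray(abundances, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    keep = ~np.isnan(outcome)
    return abundances[keep], outcome[keep]


# Toy data: 3 samples (rows) by 3 bacteria (columns); sample 2 has no outcome.
toy_counts = np.array([[10, 30, 60],
                       [5, 5, 10],
                       [1, 1, 2]])
toy_outcome = np.array([0.4, np.nan, 1.2])

toy_rel = to_relative_abundance(toy_counts)
toy_rel_clean, toy_outcome_clean = drop_missing_outcomes(toy_rel, toy_outcome)
print(toy_rel_clean.shape, toy_outcome_clean)
```

Keeping the abundance table and the outcome vector aligned after filtering matters because the model expects one outcome value per remaining sample.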



