Metadata-Version: 2.2
Name: biofusion
Version: 0.0.2
Summary: Multilayer networks for biological multimodal data fusion and analysis.
Home-page: https://github.com/CalmScout/BioFusion
Author: Anton Popov
Author-email: popovanton567@gmail.com
License: MIT License
Keywords: nbdev jupyter notebook python
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: provides-extra
Dynamic: requires-python
Dynamic: summary

# BioFusion


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

A tool for multimodal biological data integration and analysis with the
help of multilayer networks.

This repository contains code developed in a collaboration between
Fujitsu Research of Europe and Barcelona Supercomputing Center.

## Organisation

The directory structure is as follows:

    .
    |-- data
    |   |-- GeneCelltypes
    |   |   |-- gene_celltypes_all_common.txt
    |   |   |-- gene_celltypes_all_common_cnv.txt
    |   |   |-- gene_celltypes_all_common_rna.txt
    |   |   |-- gene_celltypes_all_unique.txt
    |   |   |-- gene_celltypes_all_unique_cnv.txt
    |   |   `-- gene_celltypes_all_unique_rna.txt
    |   |-- MultilayerCommunities
    |   |   |-- <BSC-community-trajectories.tsv>
    |   |   `-- <BSC-distance-matrix.tsv>
    |   |-- MultilayerGraphs
    |   |   |-- <BSC-MLN-layer-1.json>
    |   |   |-- :
    |   |   `-- <BSC-MLN-layer-5.json>
    |   |-- TCGA_BRCA_Dic_Hover_files
    |   |   `-- TCGA-E2-A1B6-01A-03-TSC.f0917d61-c963-42cf-86c7-48b1e70c662d.pt
    |   |-- TopGenesWSI
    |   |   |-- common_genes
    |   |   |   |-- box_level
    |   |   |   |   `-- TCGA-E2-A1B6-01A-03-TSC.f0917d61-c963-42cf-86c7-48b1e70c662d
    |   |   |   |       `-- stats.csv
    |   |   |   `-- wsi_level
    |   |   `-- unique_genes
    |   |       |-- box_level
    |   |       `-- wsi_level
    |   |-- cnv.csv
    |   `-- rna.csv
    |-- outputs
    |   |-- TCGA_BRCA_spatial
    |   |-- TCGA_Gene_Graphs
    |   `-- TopGenesMLN
    |-- scripts
    |   |-- create_gene_graph.py
    |   |-- create_gene_list.py
    |   |-- get_WSI_celltype_weights.py
    |   `-- get_WSI_gene_info.py
    |-- README.md
    `-- requirements.txt

## Usage

The Python scripts can be run from the `/scripts` directory after
installing all necessary Python modules as listed in `requirements.txt`.

The following scripts are provided:

`create_gene_list.py`

- Description: This script finds the set of genes that are common to
  the MLN and the genomic data (CNV or RNA). Files in the
  `/data/GeneCelltypes` folder with the suffix `_cnv` or `_rna` are
  generated by this script.
- Input: `/data/GeneCelltypes`, `/data/cnv.csv`
- Output: `/data/GeneCelltypes`
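
The gene-matching step described for `create_gene_list.py` can be pictured as a set intersection. The sketch below is only an illustration, not the script's actual code: the function names, the one-gene-per-line list format, and the `gene` column name in the CSV are all assumptions made for the example.

```python
import csv
import io


def read_gene_list(handle):
    """Read one gene symbol per line, skipping blank lines (assumed format)."""
    return [line.strip() for line in handle if line.strip()]


def genomic_genes(handle, gene_column="gene"):
    """Collect gene symbols from one column of a CSV table (assumed layout)."""
    return {row[gene_column] for row in csv.DictReader(handle)}


def common_genes(mln_genes, data_genes):
    """Keep the MLN genes that also occur in the genomic data, in MLN order."""
    return [gene for gene in mln_genes if gene in data_genes]


# Toy in-memory stand-ins for an MLN gene list and a CNV table.
mln_file = io.StringIO("TP53\nBRCA1\nMYC\n")
cnv_file = io.StringIO("gene,sample_1\nBRCA1,0.4\nTP53,-1.0\nEGFR,2.1\n")

shared = common_genes(read_gene_list(mln_file), genomic_genes(cnv_file))
print(shared)
```

With real files, the `io.StringIO` objects would be replaced by opened file handles.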

`get_WSI_gene_info.py`

- Description: This script/module reads the top genes from WSI patches
  and retrieves gene associations and significant neighbourhood
  communities from the multilayer network.
- Input: `/data/TopGenesWSI`
- Output: `/outputs/TopGenesMLN`

`get_WSI_celltype_weights.py`

- Description: This script takes WSI graphs (where patches correspond
  to groups of nodes), gene-celltype associations, and bulk RNA data,
  and produces heatmaps of approximated spatial gene expression.
- Input: `/data/TCGA_BRCA_Dic_Hover_files`, `/data/GeneCelltypes`,
  `/data/rna.csv`
- Output: `/outputs/TCGA_BRCA_spatial`
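
One possible way to turn bulk RNA values into a per-patch heatmap is to scale the bulk value by the share of cells in each patch whose type is associated with the gene. The sketch below assumes exactly that weighting; it is a hypothetical illustration, and the function names, the cell type labels, and the data layout are not taken from `get_WSI_celltype_weights.py`.

```python
def patch_weights(patch_counts, gene_celltypes):
    """Fraction of cells in each patch whose type is associated with the gene."""
    weights = {}
    for patch_id, counts in patch_counts.items():
        total = sum(counts.values())
        matching = sum(n for celltype, n in counts.items() if celltype in gene_celltypes)
        # Empty patches get weight 0 instead of raising a division error.
        weights[patch_id] = matching / total if total else 0.0
    return weights


def approximate_expression(bulk_value, weights):
    """Scale a bulk expression value by the per-patch weights (assumed model)."""
    return {patch_id: bulk_value * w for patch_id, w in weights.items()}


# Toy cell type counts per WSI patch (hypothetical labels and numbers).
patch_counts = {
    "patch_0": {"tumor": 8, "lymphocyte": 2},
    "patch_1": {"tumor": 1, "stroma": 3},
    "patch_2": {},
}

weights = patch_weights(patch_counts, gene_celltypes={"tumor"})
heat = approximate_expression(bulk_value=10.0, weights=weights)
print(heat)
```

The resulting dictionary maps each patch to a value that could be drawn as one heatmap cell.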

`create_gene_graph.py`

- Description: This script takes the genomic data (CNV or RNA) and the
  MLN graphs (along with the computed Louvain-community-based Hamming
  distance matrix) and generates a hierarchical-clustering-based
  similarity matrix for the genes and a gene graph whose edge
  attributes reflect the gene-gene similarities.
- Input: `/data/cnv.csv`, `/data/MultilayerGraphs`,
  `/data/MultilayerCommunities`
- Output: `/outputs/TCGA_Gene_Graphs`
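
To make the community-based Hamming distance concrete, the sketch below assumes that each gene has a "trajectory": one Louvain community label per MLN layer. The distance between two genes is then the fraction of layers in which their labels differ. This is an assumed reading of the distance matrix, not the script's code. The sketch also replaces the hierarchical clustering step with a plain similarity threshold to stay short, and all names and labels are hypothetical.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Fraction of layers in which two community trajectories differ."""
    if len(a) != len(b):
        raise ValueError("trajectories must cover the same layers")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(trajectories):
    """Symmetric gene-by-gene Hamming distance matrix as nested dicts."""
    genes = sorted(trajectories)
    dist = {g: {h: 0.0 for h in genes} for g in genes}
    for g, h in combinations(genes, 2):
        dist[g][h] = dist[h][g] = hamming_distance(trajectories[g], trajectories[h])
    return dist


def similarity_edges(dist, threshold):
    """Weighted edges (gene, gene, similarity) with similarity = 1 - distance."""
    edges = []
    for g, h in combinations(sorted(dist), 2):
        similarity = 1.0 - dist[g][h]
        if similarity >= threshold:
            edges.append((g, h, similarity))
    return edges


# Toy community labels across five layers (hypothetical values).
trajectories = {
    "BRCA1": (0, 1, 1, 2, 0),
    "TP53": (0, 1, 1, 2, 1),
    "MYC": (3, 0, 2, 1, 1),
}

dist = distance_matrix(trajectories)
edges = similarity_edges(dist, threshold=0.5)
print(edges)
```

Each returned tuple can be added to a graph library as a weighted edge, with the similarity stored as the edge attribute.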

To run the notebooks, install the package in editable mode from the
package root directory:

    pip install -e .
