Metadata-Version: 2.1
Name: bioflow-insight
Version: 0.0.10
Summary: A software to extract and analyze the structure and associated metadata from a Nextflow workflow.
Author-email: George Marchment <author@example.com>
Project-URL: Homepage, https://github.com/George-Marchment/Newtflow-Structure
Project-URL: Issues, https://github.com/George-Marchment/Newtflow-Structure/issues
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: graphviz ==0.20.1
Requires-Dist: click
Requires-Dist: networkx ~=3.1 ; python_version == "3.8"
Requires-Dist: numpy ~=1.24.4 ; python_version == "3.8"
Requires-Dist: networkx ~=3.2.1 ; python_version >= "3.9"
Requires-Dist: numpy ~=1.26.1 ; python_version >= "3.9"
Provides-Extra: dev
Requires-Dist: build ; extra == 'dev'
Requires-Dist: twine ; extra == 'dev'
Requires-Dist: coverage ; extra == 'dev'
Requires-Dist: black ~=23.12.0 ; extra == 'dev'

# BioFlow-Insight


[![MIT licensed](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE) [![Version 0.1](https://img.shields.io/badge/version-v0.1-yellow)]()

## Description

This repository contains **BioFlow-Insight**, a Python software tool. **BioFlow-Insight** automatically analyses Nextflow workflow code, extracting useful information, notably in the form of visual graphs illustrating the workflow's structure and its various steps.

**BioFlow-Insight** is easily installable as a Python package (see [Installation](#installation)). It is also accessible as a free web service. For more information and to start using BioFlow-Insight, visit [its website](https://bioflow-insight.pasteur.cloud/) (https://bioflow-insight.pasteur.cloud/).

<!--The outputs of **BioFlow-Insight** are saved in the results folder.-->

## Table of Contents

- [BioFlow-Insight](#bioflow-insight)
  - [Description](#description)
  - [Table of Contents](#table-of-contents)
  - [Installation](#installation)
    - [Using from source](#using-from-source)
    - [Using the Python package](#using-the-python-package)
  - [Usage](#usage)
    - [Input](#input)
    - [Output](#output)
  - [License](#license)
  - [Funding](#funding)

## Installation

### Using from source

BioFlow-Insight's dependencies are listed in the `requirements.txt` file; they can be installed with `pip install -r requirements.txt`.

> Note: To install Graphviz on Linux (Debian-based distributions), you might need to run `sudo apt install graphviz`.


### Using the Python package

**BioFlow-Insight** is easily installable as a [Python package]()<!--TODO : Add LINK-->.

To install it using *pip*, run the following command:

```
pip install bioflow-insight
```

TODO

## Usage

**BioFlow-Insight** automatically analyses the code of Nextflow workflows and extracts useful information, particularly in the form of visual graphs depicting the workflow's structure and representing its different steps. 

For an explanation of the different elements composing a Nextflow workflow, see [its documentation](https://www.nextflow.io/docs/latest/index.html).

**BioFlow-Insight** generates 3 different graphs:

1. The *specification graph* which represents all elements of the workflow, including processes and operations, and their interactions through channels. Within the specification graph, we define two types of operations: those without inputs and those with inputs (called branch operations).
2. The second graph represents operations without any inputs, along with processes and their dependencies. This graph, called the *dependency graph without branch operations*, is obtained by removing the branch operations and linking the remaining elements if a path exists between them in the original specification graph.
3. The final graph, called the *process dependency graph*, represents only processes and their dependencies. Like the previous graph, it is derived from the specification graph: here all operations are removed, leaving only processes, which are linked based on their dependencies in the original specification graph.
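
The derivation of the two dependency graphs can be illustrated with a minimal sketch. This is not BioFlow-Insight's implementation: the function `collapse` and the toy edge list are hypothetical (loosely modelled on rnaseq-nf), and the sketch assumes that two kept elements are linked when a path joins them that passes only through removed elements.

```python
def collapse(edges, keep):
    """Return the sorted edges (u, v) between kept nodes such that the
    original graph has a path from u to v through removed nodes only."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
    keep = set(keep)
    collapsed = set()
    for source in keep:
        # Depth-first walk that only continues through removed nodes.
        stack = list(successors.get(source, []))
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in keep:
                if node != source:
                    collapsed.add((source, node))
            else:
                stack.extend(successors.get(node, []))
    return sorted(collapsed)


# Hypothetical, simplified specification graph: four processes (upper case)
# connected through two operations (lower case).
edges = [
    ("INDEX", "QUANT"),
    ("QUANT", "mix"),
    ("FASTQC", "mix"),
    ("mix", "collect"),
    ("collect", "MULTIQC"),
]
processes = ["INDEX", "FASTQC", "QUANT", "MULTIQC"]

# Keeping only the processes gives a process dependency graph for the toy example.
process_edges = collapse(edges, processes)
print(process_edges)
# [('FASTQC', 'MULTIQC'), ('INDEX', 'QUANT'), ('QUANT', 'MULTIQC')]
```

Under the same assumption, passing the processes together with the operations that have no inputs as `keep` would give the toy equivalent of the dependency graph without branch operations.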

> For a more in-depth explanation of BioFlow-Insight functionalities, visit its webpage [here](https://bioflow-insight.pasteur.cloud/) (https://bioflow-insight.pasteur.cloud/).

> To exemplify the use of **BioFlow-Insight**, let's use the rnaseq-nf workflow proposed by Nextflow (its source code can be found [here](https://github.com/nextflow-io/rnaseq-nf/tree/8253a586cc5a9679d37544ac54f72167cced324b)). Examples of the output are given below.

### Input 

In this example, we are going to use the **BioFlow-Insight** source code. After cloning both repositories (this one and the rnaseq-nf workflow), we can run the following code to perform the analysis (the different steps are described below):

```python
import os
current_path = os.getcwd()  # remember the starting directory
os.chdir("bioflow-insight/")  # move into the cloned repository so that `src` can be imported
from src.workflow import Workflow
os.chdir(current_path)  # go back to the starting directory

w = Workflow("./rnaseq-nf/main.nf", duplicate=False, display_info=True)
w.initialise()
w.generate_all_graphs(render_graphs=True, processes_2_remove=[])
```

1. lines 1 to 5: import the `Workflow` class used for the analysis
2. line 7: create the `Workflow` object `w`
   1. line 7: the first parameter is the path to the main Nextflow file (mandatory parameter).
   2. line 7: parameter `duplicate` (`False` by default): if some processes and subworkflows are duplicated in the workflow through the `include as` option, setting this parameter to `True` also duplicates these elements in the graphs.
   3. line 7: parameter `display_info` (`True` by default): shows the files which are being analysed.
3. line 8: `initialise` runs the entire analysis of the Nextflow workflow
4. line 9: `generate_all_graphs` generates all the graphs in the mermaid and dot formats, along with the associated metadata for the graphs
   1. line 9: parameter `render_graphs` (`True` by default): if `True`, the png images of the dot graphs are generated with Graphviz. For large workflows this can sometimes fail (depending on the hardware).
   2. line 9: parameter `processes_2_remove` (`[]` by default): a list of processes to remove from the graphs. This is useful in the case of `MULTIQC` processes, which do not really serve a functional role but can clutter the structure since they are connected to the majority of processes.

### Output

After the workflow has been analysed and the graphs generated, the outputs are saved in the `results` folder.

This folder is organised as follows:

```
.
├── debug
│   ├── calls.nf
│   ├── operations_in_call.nf
│   └── operations.nf
├── graphs
│   ├── dependency_graph_wo_branch_operations.dot
│   ├── dependency_graph_wo_branch_operations.json
│   ├── dependency_graph_wo_branch_operations.mmd
│   ├── dependency_graph_wo_branch_operations.png
│   ├── dependency_graph_wo_branch_operations_wo_lables.dot
│   ├── dependency_graph_wo_branch_operations_wo_lables.mmd
│   ├── dependency_graph_wo_branch_operations_wo_lables.png
│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations.dot
│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations.mmd
│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations.png
│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations_wo_lables.dot
│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations_wo_lables.mmd
│   ├── dependency_graph_wo_branch_operations_wo_orphan_operations_wo_lables.png
│   ├── metadata_dependency_graph_wo_branch_operations.json
│   ├── metadata_process_dependency_graph.json
│   ├── metadata_specification_graph.json
│   ├── process_dependency_graph.dot
│   ├── process_dependency_graph.json
│   ├── process_dependency_graph.mmd
│   ├── process_dependency_graph.png
│   ├── specification_graph.dot
│   ├── specification_graph.json
│   ├── specification_graph.mmd
│   ├── specification_graph.png
│   ├── specification_graph_wo_labels.dot
│   ├── specification_graph_wo_labels.mmd
│   ├── specification_graph_wo_labels.png
│   ├── specification_wo_orphan_operations.dot
│   ├── specification_wo_orphan_operations.mmd
│   ├── specification_wo_orphan_operations.png
│   ├── specification_wo_orphan_operations_wo_labels.dot
│   ├── specification_wo_orphan_operations_wo_labels.mmd
│   └── specification_wo_orphan_operations_wo_labels.png
└── ro-crate-metadata-rnaseq-nf.json
```

* the `ro-crate-metadata-rnaseq-nf.json` file describes the workflow following an extended Workflow [RO-Crate](https://www.researchobject.org/ro-crate/) profile. The description of this extended profile can be found [here]() (TODO)
* the `debug` folder contains intermediary files which are useful for debugging
* the `graphs` folder contains the generated graphs. For each of the 3 graphs described above, **BioFlow-Insight** generates:
  * a `json` file which describes the graph in a **BioFlow-Insight**-specific format
  * a `json` file which describes the metadata extracted from the graph
  * where possible, variants of the graph without labels on the operations and channels. Additionally, there is a variant where the orphan operations (operations which don't have any inputs or outputs) are not represented.

> **BioFlow-Insight** generates each graph in the `mermaid` and `dot` formats. If the `render_graphs` option is set to `True`, the `png` image is also generated.
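
To check which formats were produced for each graph, the contents of the `graphs` folder can be grouped by file name. The following is a small sketch using only the Python standard library; the helper `group_graph_files` is hypothetical and not part of BioFlow-Insight, and it assumes the `results/graphs` layout shown in the tree above.

```python
from collections import defaultdict
from pathlib import Path


def group_graph_files(graphs_dir):
    """Map each file stem in `graphs_dir` to the sorted list of its extensions."""
    formats = defaultdict(list)
    # Sorting the paths keeps the extensions of a given stem in alphabetical order.
    for path in sorted(Path(graphs_dir).iterdir()):
        if path.is_file():
            formats[path.stem].append(path.suffix.lstrip("."))
    return dict(formats)


if __name__ == "__main__":
    # Assumed location of the outputs, relative to the current directory.
    graphs_dir = Path("results/graphs")
    if graphs_dir.is_dir():
        for stem, extensions in group_graph_files(graphs_dir).items():
            print(f"{stem}: {', '.join(extensions)}")
```

A graph listed without a `png` entry was not rendered, either because `render_graphs` was set to `False` or because the Graphviz rendering failed.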

Here are some of the graphs generated by **BioFlow-Insight**, rendered as png images using Graphviz.

| <img align="center" src="img/specification_graph.png" >  | <img align="center" src="img/dependency_graph_wo_branch_operations.png">   | <img align="center" src="img/process_dependency_graph.png" >   |
|:-:|:-:|:-:|
| Specification Graph  |  Dependency Graph without branch operations | Process Dependency Graph  |


## License

This project is licensed under the [GNU Affero General Public License](https://www.gnu.org/licenses/agpl-3.0.en.html).

TODO -> add license to git repo

## Funding

This work received support from the French National Research Agency (ANR) under the France 2030 program, reference ANR-22-PESN-0007.

___

<img align="left" src="img/logo.png" width="16%">
<img align="left" src="img/paris_saclay.png" width="16%">
<img align="left" src="img/lisn.png" width="16%">
<img align="left" src="img/pasteur.png" width="16%">
<img align="left" src="img/sharefair.png" width="16%">
<img align="left" src="img/france2030.png" width="16%">

<br/><br/>
<br/><br/>

