Metadata-Version: 2.1
Name: MedicalMultitaskModeling
Version: 1.0.4
Summary: Multitask learning framework for medical data
License: All rights reserved.
Author: Raphael Schäfer
Author-email: jan.raphael.schaefer@mevis.fraunhofer.de
Requires-Python: >=3.10,<3.12
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Provides-Extra: all
Provides-Extra: api
Provides-Extra: detection
Provides-Extra: export
Provides-Extra: interactive
Provides-Extra: testing
Requires-Dist: aiohttp (>=3.9.0,<4.0.0) ; extra == "all" or extra == "api"
Requires-Dist: albumentations (>=1.4.11,<2.0.0)
Requires-Dist: autodoc-pydantic (>=2.2.0,<3.0.0) ; extra == "all" or extra == "testing"
Requires-Dist: black (>=23.1.0,<24.0.0) ; extra == "all" or extra == "all" or extra == "all" or extra == "interactive" or extra == "testing"
Requires-Dist: deepdiff (>=6.6.1,<7.0.0)
Requires-Dist: fastapi[all] (>=0.110.2,<0.111.0) ; extra == "all" or extra == "api"
Requires-Dist: fsspec[s3] (>=2024.3.1,<2025.0.0)
Requires-Dist: imageio (>=2.31.5,<3.0.0)
Requires-Dist: ipykernel (>=6.25.2,<7.0.0) ; extra == "all" or extra == "all" or extra == "interactive"
Requires-Dist: ipywidgets (>=8.1.1,<9.0.0) ; extra == "all" or extra == "all" or extra == "interactive"
Requires-Dist: isort (>=5.10.1,<6.0.0) ; extra == "all" or extra == "testing"
Requires-Dist: json5 (>=0.9.14,<0.10.0)
Requires-Dist: label-studio-converter (>=0.0.58,<0.0.59) ; extra == "all" or extra == "api"
Requires-Dist: label-studio-sdk (>=0.0.34,<0.0.35) ; extra == "all" or extra == "api"
Requires-Dist: lightgbm (>=4.1.0,<5.0.0) ; extra == "all" or extra == "api"
Requires-Dist: matplotlib (>=3.8.0,<4.0.0) ; extra == "all" or extra == "all" or extra == "interactive"
Requires-Dist: minio (>=7.1.17,<8.0.0)
Requires-Dist: mmdet (>=3.2.0,<4.0.0) ; extra == "all" or extra == "detection"
Requires-Dist: monai (>=1.3.0,<2.0.0)
Requires-Dist: nbconvert (>=7.16.1,<8.0.0) ; extra == "all" or extra == "all" or extra == "interactive"
Requires-Dist: nbformat (>=5.9.2,<6.0.0) ; extra == "all" or extra == "all" or extra == "interactive"
Requires-Dist: nibabel (>=5.2.1,<6.0.0)
Requires-Dist: numpy (>=1.26.1,<2.0.0)
Requires-Dist: onnx (>=1.13.1,<2.0.0) ; extra == "all" or extra == "export"
Requires-Dist: onnxruntime (>=1.14.1,<2.0.0) ; extra == "all" or extra == "export"
Requires-Dist: opencv-python (>=4.8.1.78,<5.0.0.0)
Requires-Dist: pandas (>=2.1.1,<3.0.0)
Requires-Dist: patchify (>=0.2.3,<0.3.0)
Requires-Dist: plotly (>=5.17.0,<6.0.0) ; extra == "all" or extra == "all" or extra == "interactive"
Requires-Dist: pydantic (>=2.4.2,<3.0.0)
Requires-Dist: pydantic-settings (>=2.0.3,<3.0.0)
Requires-Dist: pytest (>=8.0.0,<9.0.0) ; extra == "all" or extra == "testing"
Requires-Dist: pytest-cov (>=4.0.0,<5.0.0) ; extra == "all" or extra == "testing"
Requires-Dist: rasterio (>=1.3.9,<2.0.0)
Requires-Dist: scikit-learn (>=1.3.1,<2.0.0)
Requires-Dist: seaborn (>=0.13.0,<0.14.0) ; extra == "all" or extra == "all" or extra == "interactive"
Requires-Dist: segmentation-models-pytorch (>=0.3.3,<0.4.0)
Requires-Dist: shapely (>=2.0.2,<3.0.0)
Requires-Dist: sphinx (>=7.3.7,<8.0.0) ; extra == "all" or extra == "testing"
Requires-Dist: streamlit (>=1.33.0,<2.0.0) ; extra == "all" or extra == "all" or extra == "interactive"
Requires-Dist: tiffslide (>=2.2.0,<3.0.0)
Requires-Dist: toml (>=0.10.2,<0.11.0)
Requires-Dist: torch (>=2.1.2,<3.0.0)
Requires-Dist: torchinfo (>=1.8.0,<2.0.0)
Requires-Dist: torchvision (>=0.16.2,<0.17.0)
Requires-Dist: tqdm (>=4.66.1,<5.0.0)
Requires-Dist: typer (>=0.12.3,<0.13.0)
Requires-Dist: universal-pathlib (>=0.2.2,<0.3.0)
Requires-Dist: wandb (>=0.16.3,<0.17.0)
Requires-Dist: wsi-service (>=0.12.14,<0.13.0) ; extra == "all" or extra == "api"
Description-Content-Type: text/markdown

# MedicalMultitaskModeling

The project enables training foundational medical imaging models using multi-task learning. 

The software is provided on an "AS IS" basis, i.e. it comes without any warranty, express or implied, including (without limitation) any warranty of merchantability and any warranty of fitness for a particular purpose.

Please note that this software is licensed under the LICENSE FOR SCIENTIFIC NON-COMMERCIAL RESEARCH PURPOSES, see license.md.

## Installation

To install the project and its dependencies, run one of the following commands:

```bash
pip install medicalmultitaskmodeling
# Including the extra dependency groups "interactive" and "testing" is recommended for development
# (quoted so the shell does not interpret the brackets; no space after the comma):
pip install "medicalmultitaskmodeling[interactive,testing]"
# The latest main branch from https://github.com/FraunhoferMEVIS/MedicalMultitaskModeling
pip install git+https://github.com/FraunhoferMEVIS/MedicalMultitaskModeling.git
# A specific commit
pip install git+https://github.com/FraunhoferMEVIS/MedicalMultitaskModeling.git@<commit-hash>

# Verify system dependencies
python -c "import cv2; import torch; assert torch.cuda.is_available()"
# Verify MMM
python -c "from mmm.interactive import *"
```

The available extras are `all`, `api`, `detection`, `export`, `interactive`, and `testing`. Check the `pyproject.toml` file to see which dependencies each one adds.
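The extras can also be read from the package metadata instead of `pyproject.toml`. The following is a minimal sketch using only the Python standard library. The helper names `list_extras` and `installed_extras` and the `SAMPLE` text are illustrative and not part of MMM.

```python
from email import message_from_string
from importlib import metadata


def list_extras(metadata_text: str) -> list[str]:
    """Return the extras declared via Provides-Extra headers in core metadata text."""
    msg = message_from_string(metadata_text)
    return sorted(msg.get_all("Provides-Extra") or [])


def installed_extras(dist_name: str) -> list[str]:
    """Return the extras of an installed distribution, or [] if it is not installed."""
    try:
        return sorted(metadata.metadata(dist_name).get_all("Provides-Extra") or [])
    except metadata.PackageNotFoundError:
        return []


# Shortened sample of core metadata, used here only for illustration.
SAMPLE = """\
Metadata-Version: 2.1
Name: MedicalMultitaskModeling
Provides-Extra: all
Provides-Extra: api
Provides-Extra: interactive
Provides-Extra: testing
"""

print(list_extras(SAMPLE))
print(installed_extras("medicalmultitaskmodeling"))
```

The second call only returns a non-empty list in an environment where the package is installed.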

## Usage

```python
# See our tutorial notebooks in the Quick Start Guide for more details.
from mmm.labelstudio_ext.NativeBlocks import NativeBlocks, MMM_MODELS, DEFAULT_MODEL
model = NativeBlocks(MMM_MODELS[DEFAULT_MODEL], device_identifier="cuda:0")

import torch; import torch.nn as nn
with torch.inference_mode():
    feature_pyramid: list[torch.Tensor] = model["encoder"](torch.rand(1, 3, 224, 224).to(model.device))
    hidden_vector = nn.Flatten(1)(model["squeezer"](feature_pyramid)[1])
```

## Quickstart Guide

To begin training multi-task models, start with our `quickstart.ipynb` notebook.
We recommend using the directory layout created by our template. Set it up as follows:

1. Install the 'copier' package using pipx:

```bash
# We use pipx to install copier in an isolated environment and copier to scaffold the code for an experiment.
# At the time of writing, we used copier version 9.2.0.
pipx install copier
```

2. Use the template from a local 'medicalmultitaskmodeling' checkout to create a scaffold for your experiment.

```bash
# To create a new experiment next to your checkout of medicalmultitaskmodeling
copier copy ../medicalmultitaskmodeling/copier_template/ .
```

### Using VSCode development container

3. Open the development container using VSCode via the command `@command:remote-containers.rebuildAndReopenInContainer`. This requires the extension `ms-vscode-remote.remote-containers`.

4. Inside the development container, run the VSCode task (`@command:workbench.action.tasks.runTask`) `Prepare environment`, which will reload the window once it is done.

5. Run the `quickstart.ipynb` notebook to start your training and learn about this project.

### Using virtualenv

If you prefer to use a virtual environment instead of a container, follow these steps:

1. Create a new virtual environment in your template directory: `virtualenv venv`
1. Activate the virtual environment using `source venv/bin/activate`. On Windows, use `./venv/Scripts/activate`.
1. Install the 'medicalmultitaskmodeling' package and its dependencies in the virtual environment:

```bash
pip install "medicalmultitaskmodeling[interactive]"
# Or with a local checkout, and using Jupyterlab:
pip install "/your/local/path/medicalmultitaskmodeling[interactive]" jupyterlab
```

4. Run the `quickstart.ipynb` notebook. We recommend opening the folder in VSCode. Alternatively, you can use `LOCAL_DEV_ENV=True jupyter lab` and visit the link starting with http://localhost:8888/.

#### System dependencies

We *strongly* recommend using MMM with our public Docker images.
If that is not possible, set up GPU support, verify it with `nvidia-smi`, and run:

```bash
sudo apt install python3-opencv -y
```
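To check these prerequisites from Python before starting a notebook, a small standard-library sketch like the following may help. The function name `check_environment` is hypothetical. It only reports whether `nvidia-smi` is on the `PATH` and whether the `cv2` and `torch` modules can be found. It does not confirm that CUDA actually works.

```python
import importlib.util
import shutil


def check_environment(modules: tuple[str, ...] = ("cv2", "torch")) -> dict[str, bool]:
    """Report which system-level prerequisites appear to be available."""
    # nvidia-smi on the PATH suggests that an NVIDIA driver is installed.
    report = {"nvidia-smi": shutil.which("nvidia-smi") is not None}
    # find_spec locates a top-level module without importing it.
    for name in modules:
        report[name] = importlib.util.find_spec(name) is not None
    return report


for item, available in check_environment().items():
    print(f"{item}: {'found' if available else 'MISSING'}")
```

If everything is reported as found, `torch.cuda.is_available()` remains the decisive check for GPU support.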

## Development

1. Create a new Poetry project: `poetry init`
1. Add the package as a git submodule: `git submodule add <repository-url>`
1. Add the package in editable mode, including the `interactive` and `testing` extras needed for development and for running the tests: `poetry add ./medicalmultitaskmodeling/ --editable -E interactive -E testing`
1. For practical examples on how to get started with development, refer to one of our projects, such as [UMedPT](https://github.com/FraunhoferMEVIS/UMedPT).

## Docker images

```bash
# Verify your GPU Docker setup using the hello-world image:
docker run --rm --gpus=all hello-world
# Only system requirements:
MMMVERSION=$(poetry version -s) && docker pull hub.cc-asp.fraunhofer.de/medicalmultitaskmodeling/mmm-base:$MMMVERSION
# Verify with
MMMVERSION=$(poetry version -s) && docker run --rm -it --gpus=all hub.cc-asp.fraunhofer.de/medicalmultitaskmodeling/mmm-base:$MMMVERSION nvidia-smi
# With dependencies pre-installed:
MMMVERSION=$(poetry version -s) && docker pull hub.cc-asp.fraunhofer.de/medicalmultitaskmodeling/mmm-stack:$MMMVERSION
```
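The commands above derive the image tag from `poetry version -s`, which requires a Poetry project in the current directory. Outside of a checkout, the tag could be derived from the installed package instead. The following sketch assumes that image tags match the package version, as the commands above suggest. The helper names and the fallback value (the version stated in this package's metadata) are illustrative.

```python
from importlib import metadata

REGISTRY = "hub.cc-asp.fraunhofer.de/medicalmultitaskmodeling"
IMAGES = ("mmm-base", "mmm-stack")


def image_reference(image: str, version: str) -> str:
    """Build the full image reference for one of the two published images."""
    if image not in IMAGES:
        raise ValueError(f"unknown image {image!r}, expected one of {IMAGES}")
    return f"{REGISTRY}/{image}:{version}"


def installed_version(dist_name: str = "medicalmultitaskmodeling", fallback: str = "1.0.4") -> str:
    """Return the installed package version, or the fallback if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return fallback


print(image_reference("mmm-stack", installed_version()))
```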

## Start local infrastructure and inference API with Docker Compose

```bash
# Profiles:
# - inference runs MMM inference container
# - storage runs network drive based on S3 and JuiceFS
# - annotation runs Labelstudio annotation GUI
# - empaia runs infrastructure for gigapixel imaging
MMMVERSION=$(poetry version -s) docker compose --profile inference --profile storage --profile annotation --profile empaia up --build --remove-orphans -d
```
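You rarely need all four profiles at once. As a sketch, the command for a subset of profiles can be assembled like this. The function `compose_up_command` is a hypothetical helper, not part of MMM, and the profile names are taken from the comments above.

```python
import shlex

KNOWN_PROFILES = ("inference", "storage", "annotation", "empaia")


def compose_up_command(profiles: list[str], detach: bool = True) -> list[str]:
    """Build the `docker compose ... up` argument list for the selected profiles."""
    unknown = [p for p in profiles if p not in KNOWN_PROFILES]
    if unknown:
        raise ValueError(f"unknown profiles: {unknown}")
    cmd = ["docker", "compose"]
    for profile in profiles:
        cmd += ["--profile", profile]
    cmd += ["up", "--build", "--remove-orphans"]
    if detach:
        cmd.append("-d")
    return cmd


# Print the shell command for running only inference and storage.
print(shlex.join(compose_up_command(["inference", "storage"])))
```

The resulting list can be passed to `subprocess.run`. As in the command above, `MMMVERSION` still has to be set in the environment.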

## Citation

If you use this project, please cite [our work](https://doi.org/10.48550/arXiv.2311.09847):

```
@misc{schäfer2023overcoming,
      title={Overcoming Data Scarcity in Biomedical Imaging with a Foundational Multi-Task Model}, 
      author={Raphael Schäfer and Till Nicke and Henning Höfener and Annkristin Lange and Dorit Merhof and Friedrich Feuerhake and Volkmar Schulz and Johannes Lotz and Fabian Kiessling},
      year={2023},
      eprint={2311.09847},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

# Repository Structure

For more detailed information, please refer to the docstrings within each directory.

- **torch_ext**: Contains Torch utilities that, while not specific to multi-task learning, can simplify its implementation. This includes our caching utilities.
- **task_sampling**: Provides utilities for enumerating tasks in a way that integrates with PyTorch.
- **inference_api**: The starting point for our inference and few-shot training API, built with FastAPI.

### data_loading 

This directory contains tools for loading medical data and annotations, supporting formats such as NIfTI, DICOM, and GeoJSON.
It also contains the annotation-type-specific dataset wrappers, such as `SemSegDataset`, which are responsible for data verification and visualization.

### interactive

This directory has been restructured to allow for easy importing in interactive environments like Jupyter. For instance, you can import several modules with a single line:

```python
from mmm.interactive import blocks, configs as cfs, data, tasks, training, pipes
```

### logging 

Here you'll find utilities that integrate with our logging and visualization tools.

### mtl_modules 

This directory houses multi-task learning types, such as `PyramidEncoder`, and specific tasks.

### neural 

This directory contains PyTorch modules that are not based on our multi-task learning types.

### optimization

This is the home of `MTLOptimizer`. It integrates several PyTorch optimizers with our training strategy and employs the `ZeroRedundancyOptimizer` strategy for distributed training.

### resources

This directory contains static files, like HTML templates for logging.

### trainer

The `Loop` class, used by the `MtlTrainer` class to execute multi-task learning, is located here.

