Metadata-Version: 2.3
Name: biophony
Version: 1.5.0
Summary: Random generation of genetic files
License: CeCILL
Keywords: CNRGH,genomic,gene,genetic,mutation,VCF,BED,FASTA,coverage
Author: Pierrick ROGER
Author-email: pierrick.roger@cea.fr
Requires-Python: >=3.11
Classifier: Programming Language :: Python :: 3
Classifier: License :: CeCILL-C Free Software License Agreement (CECILL-C)
Classifier: Operating System :: OS Independent
Requires-Dist: colorlog (>=6.9.0)
Requires-Dist: mutation-simulator (>=3.0.2)
Requires-Dist: rich-argparse (>=1.6.0,<2.0.0)
Project-URL: Bug Tracker, https://gitlab.com/cnrgh/databases/biophony/issues
Project-URL: Changelog, https://gitlab.com/cnrgh/databases/biophony/-/blob/main/CHANGELOG.md
Project-URL: Documentation, https://cnrgh.gitlab.io/databases/biophony/
Project-URL: Homepage, https://gitlab.com/cnrgh/databases/biophony
Project-URL: Repository, https://gitlab.com/cnrgh/databases/biophony
Description-Content-Type: text/markdown

# Genetic data files generator for testing purposes

`biophony` is a package for generating random genetic data files intended specifically for testing and validation.
Real genetic data is often too large, lacks flexibility, or raises privacy concerns, making it
unsuitable for thorough testing.
`biophony` makes it simpler to test software in different scenarios without needing real data,
enabling focused and efficient development and validation.

## Installation

`biophony` requires Python 3.11 or later.

To install with `pip`, run:

```bash
pip install biophony
```

## Usage

### Command Line Interfaces

`biophony` provides the following CLIs to generate data:

- `gen-cov`: generates a BED file with custom depth,
- `gen-fasta`: generates a FASTA file with a custom size sequence,
- `gen-fastavar`: generates a FASTA file with custom size sequences,
  each with `n` variants with control over insertion, deletion and mutation rate,
- `gen-fastq`: generates a FASTQ file with custom read count and size,
- `gen-vcf`: generates a VCF file from a FASTA file, with control over insertion, deletion and mutation rate.

CLIs that read or write data use `stdin` and `stdout` by default,
so you can chain them with the pipe operator `|`.

For example, run the following command to generate a VCF with 2% SNP, 1% INS and 1% DEL:

```bash
gen-fasta | gen-vcf --snp-rate 0.02 --ins-rate 0.01 --del-rate 0.01
```

To save the generated content, you can either redirect `stdout` to a file with the `>` operator or
use the dedicated `-o` option:

```bash
gen-fasta | gen-vcf --snp-rate 0.02 --ins-rate 0.01 --del-rate 0.01 > test.vcf  # redirect
gen-fasta | gen-vcf --snp-rate 0.02 --ins-rate 0.01 --del-rate 0.01 -o test.vcf  # dedicated option
```
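The same pipeline can also be driven from a Python test script through the standard `subprocess` module. The following is a minimal sketch, assuming the `gen-fasta` and `gen-vcf` commands are available on your `PATH`; the helper names `build_pipeline` and `run_pipeline` are hypothetical and are not part of `biophony`. Only the options shown in the commands above are used.

```python
import shutil
import subprocess


def build_pipeline(snp_rate=0.02, ins_rate=0.01, del_rate=0.01):
    """Return the two argument lists for `gen-fasta | gen-vcf ...`.

    Hypothetical helper: it only assembles the command lines shown above.
    """
    producer = ["gen-fasta"]
    consumer = [
        "gen-vcf",
        "--snp-rate", str(snp_rate),
        "--ins-rate", str(ins_rate),
        "--del-rate", str(del_rate),
    ]
    return producer, consumer


def run_pipeline(output_path, **rates):
    """Pipe gen-fasta into gen-vcf and write the resulting VCF to output_path."""
    producer, consumer = build_pipeline(**rates)

    # Fail early with a readable message if the CLIs are not installed.
    for argv in (producer, consumer):
        if shutil.which(argv[0]) is None:
            raise FileNotFoundError(f"{argv[0]} not found on PATH")

    with open(output_path, "wb") as out:
        fasta = subprocess.Popen(producer, stdout=subprocess.PIPE)
        vcf = subprocess.Popen(consumer, stdin=fasta.stdout, stdout=out)
        # Close our copy of the pipe so gen-fasta sees a broken pipe
        # if gen-vcf exits before reading everything.
        fasta.stdout.close()
        vcf_status = vcf.wait()
        fasta_status = fasta.wait()

    if fasta_status != 0 or vcf_status != 0:
        raise RuntimeError(
            f"pipeline failed (gen-fasta={fasta_status}, gen-vcf={vcf_status})"
        )


if __name__ == "__main__":
    run_pipeline("test.vcf")
```

Writing through the `stdout` file handle mirrors the `>` redirection; passing `-o` in the `gen-vcf` argument list instead would mirror the dedicated option.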

### Python API

You can also use the Python API to generate random genetic data files in your scripts.

The Python API documentation is available at <https://cnrgh.gitlab.io/databases/biophony/>.


