Metadata-Version: 2.1
Name: bio-wrangler
Version: 0.2
Summary: A bioinformatics data wrangling package for FASTA, FASTQ, VCF, and GFF files.
Home-page: https://github.com/se7en69/bio-wrangler
Author: Abdul-Rehman ikram
Author-email: hanzo7n@gmail.com
License: MIT
Project-URL: Bug Tracker, https://github.com/se7en69/bio-wrangler/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: pandas
Requires-Dist: biopython
Requires-Dist: gffutils
Requires-Dist: pyvcf3


# Bio-Wrangler

**Bio-Wrangler** is a bioinformatics data wrangling package for FASTA, FASTQ, VCF, and GFF files. It loads these formats into pandas DataFrames and provides helpers to filter, merge, summarize, and export the resulting tables.

## Features

- Load **FASTA**, **FASTQ**, **VCF**, and **GFF** files into pandas DataFrames.
- Filter data by quality, chromosome, position, and specific attributes.
- Merge and summarize datasets.
- Save data to **CSV** or **Excel** formats.

## Installation

You can install **Bio-Wrangler** directly from PyPI:


```bash
pip install bio-wrangler
```


## Usage

Here’s how to use **Bio-Wrangler** to load, filter, and manipulate your bioinformatics datasets.

### Loading Data

You can load data from FASTA, FASTQ, VCF, and GFF formats into pandas DataFrames for easy manipulation.

#### Example: Loading FASTA, FASTQ, VCF, and GFF Files


```python
from bio_wrangler.bio_wrangler import BioWrangler

# Initialize the BioWrangler class
wrangler = BioWrangler()

# Load data from different formats
fasta_data = wrangler.load_fasta('path/to/sample.fasta')
fastq_data = wrangler.load_fastq('path/to/sample.fastq')
vcf_data = wrangler.load_vcf('path/to/sample.vcf')
gff_data = wrangler.load_gff('path/to/sample.gff')

# Display the first few rows of the datasets
print(fasta_data.head())
print(fastq_data.head())
print(vcf_data.head())
print(gff_data.head())
```
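
To make the tabular layout concrete, here is a minimal standard-library sketch of turning FASTA text into one record per sequence. The `parse_fasta` helper and the `id`, `description`, and `sequence` field names are assumptions made for illustration. They are not taken from Bio-Wrangler, whose loaders may use different column names.

```python
import io


def _record(header, chunks):
    # The first whitespace-delimited token of the header line is the sequence ID.
    parts = header.split(None, 1)
    seq_id = parts[0] if parts else ""
    return {"id": seq_id, "description": header, "sequence": "".join(chunks)}


def parse_fasta(handle):
    """Yield one dict per FASTA record (hypothetical helper, not Bio-Wrangler's API)."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield _record(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may wrap across several lines
    if header is not None:
        yield _record(header, chunks)


sample = io.StringIO(">seq1 example\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(parse_fasta(sample))
```

A list of dicts like `records` can be passed to `pandas.DataFrame(records)` to get one row per sequence.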


### Filtering Data

You can filter the data by quality, chromosome, position, or specific attributes.

#### Example: Filtering FASTQ by Quality


```python
filtered_fastq = wrangler.filter_fastq_by_quality(fastq_data, 30.0)
print(filtered_fastq.head())  # Display FASTQ sequences with avg quality >= 30
```
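
For readers unfamiliar with how an average quality is derived, the sketch below shows one common approach. It assumes Phred+33 encoding, where each quality character maps to a score of `ord(char) - 33`. The `mean_phred` helper and the sample reads are hypothetical, and Bio-Wrangler may compute its average differently.

```python
def mean_phred(quality_string, offset=33):
    """Average Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality_string:
        return 0.0
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


# Made-up reads for illustration only.
reads = [
    {"id": "read1", "quality": "IIII"},  # 'I' encodes Phred 40
    {"id": "read2", "quality": "5555"},  # '5' encodes Phred 20
]

threshold = 30.0
kept = [read for read in reads if mean_phred(read["quality"]) >= threshold]
```

With a threshold of 30.0, only `read1` is kept.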


#### Example: Filtering VCF by Chromosome and Position Range


```python
filtered_vcf_by_chr = wrangler.filter_by_chromosome(vcf_data, 'chr1')
filtered_vcf_by_pos = wrangler.filter_by_position_range(vcf_data, 100000, 500000)

print(filtered_vcf_by_chr.head())
print(filtered_vcf_by_pos.head())
```
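
Note that both calls above filter the full `vcf_data`, so the position filter is not restricted to `chr1`. The plain-Python sketch below illustrates the difference between independent and chained filters. The helper names, the sample variants, and the inclusive range bounds are assumptions for illustration. Only the `CHROM` and `POS` field names come from the VCF format itself.

```python
# Made-up variants for illustration only.
variants = [
    {"CHROM": "chr1", "POS": 150000, "REF": "A", "ALT": "G"},
    {"CHROM": "chr1", "POS": 900000, "REF": "C", "ALT": "T"},
    {"CHROM": "chr2", "POS": 200000, "REF": "G", "ALT": "A"},
]


def keep_chromosome(rows, chrom):
    """Keep rows on the given chromosome (hypothetical helper)."""
    return [row for row in rows if row["CHROM"] == chrom]


def keep_position_range(rows, start, end):
    """Keep rows with start <= POS <= end (inclusive bounds are an assumption)."""
    return [row for row in rows if start <= row["POS"] <= end]


chr1_only = keep_chromosome(variants, "chr1")
in_window = keep_position_range(variants, 100000, 500000)        # any chromosome
chr1_in_window = keep_position_range(chr1_only, 100000, 500000)  # chained filters
```

Chaining the filters, as in the last line, is one way to select a window on a single chromosome.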


#### Example: Filtering GFF by Attribute


```python
filtered_gff = wrangler.filter_by_attribute(gff_data, 'ID', 'gene1')
print(filtered_gff.head())  # Filter by gene ID
```
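
In GFF3, attributes such as `ID` live in the ninth column as semicolon-separated `tag=value` pairs, with reserved characters percent-encoded. The sketch below shows how such a column could be parsed and matched. The `parse_gff3_attributes` helper and the sample rows are hypothetical, and the sketch assumes GFF3 syntax rather than GTF or GFF2. How Bio-Wrangler stores attributes internally is not confirmed here.

```python
from urllib.parse import unquote


def parse_gff3_attributes(field):
    """Parse a GFF3 attributes column into a dict (hypothetical helper)."""
    attributes = {}
    for pair in field.strip().split(";"):
        if "=" not in pair:
            continue  # skip empty or malformed entries
        key, value = pair.split("=", 1)
        # GFF3 percent-encodes reserved characters inside values.
        attributes[key.strip()] = unquote(value.strip())
    return attributes


# Made-up feature rows for illustration only.
rows = [
    {"type": "gene", "attributes": "ID=gene1;Name=abcA"},
    {"type": "gene", "attributes": "ID=gene2;Name=abcB"},
]

matches = [
    row for row in rows
    if parse_gff3_attributes(row["attributes"]).get("ID") == "gene1"
]
```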


### Summarizing Data

Generate a summary of the dataset, including total rows, average quality, and positional statistics.

#### Example: Summarizing FASTQ and VCF Data


```python
fastq_summary = wrangler.summarize_fastq(fastq_data)
vcf_summary = wrangler.summarize_data(vcf_data)

print(fastq_summary)
print(vcf_summary)
```
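
As a rough picture of what positional statistics can look like, the sketch below computes a row count and the minimum, maximum, and mean position for a list of variants. The `summarize_positions` helper and its output keys are assumptions for illustration. The actual fields returned by `summarize_data` may differ.

```python
import statistics


def summarize_positions(rows):
    """Row count plus basic statistics over POS (hypothetical helper)."""
    positions = [row["POS"] for row in rows]
    if not positions:
        return {"total_rows": 0}
    return {
        "total_rows": len(positions),
        "min_pos": min(positions),
        "max_pos": max(positions),
        "mean_pos": statistics.mean(positions),
    }


# Made-up variants for illustration only.
variants = [
    {"CHROM": "chr1", "POS": 100000},
    {"CHROM": "chr1", "POS": 200000},
    {"CHROM": "chr2", "POS": 600000},
]

summary = summarize_positions(variants)
```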


### Merging Datasets

Merge multiple datasets (e.g., two VCF datasets) into one for combined analysis.

#### Example: Merging VCF Datasets


```python
merged_vcf = wrangler.merge_datasets(vcf_data, filtered_vcf_by_chr)
print(merged_vcf.head())  # Combined dataset
```
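
When a dataset is combined with a subset of itself, as in the example above, a plain concatenation would repeat the shared rows. Whether `merge_datasets` removes such duplicates is not stated here, so the sketch below shows one possible way to merge while keeping a single copy of each variant. The `merge_variants` helper and the choice of `(CHROM, POS, REF, ALT)` as the identity key are assumptions.

```python
def merge_variants(*datasets):
    """Concatenate datasets, keeping the first copy of each variant (hypothetical helper)."""
    seen = set()
    merged = []
    for rows in datasets:
        for row in rows:
            key = (row["CHROM"], row["POS"], row["REF"], row["ALT"])
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


# Made-up variants for illustration only.
all_variants = [
    {"CHROM": "chr1", "POS": 150000, "REF": "A", "ALT": "G"},
    {"CHROM": "chr2", "POS": 200000, "REF": "G", "ALT": "A"},
]
chr1_subset = [row for row in all_variants if row["CHROM"] == "chr1"]

merged = merge_variants(all_variants, chr1_subset)
```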


### Saving Data

You can save your processed data to a file in either CSV or Excel format.

#### Example: Saving Filtered VCF Data to a CSV File


```python
wrangler.save_data(filtered_vcf_by_chr, 'filtered_vcf_output.csv', 'csv')
```
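
To show the kind of file a CSV export produces, the sketch below writes a few variant rows with Python's standard `csv` module. It writes to an in-memory buffer rather than to disk, and the column names and sample rows are assumptions for illustration. The exact columns written by `save_data` depend on the DataFrame being saved.

```python
import csv
import io

# Made-up variants for illustration only.
rows = [
    {"CHROM": "chr1", "POS": 150000, "REF": "A", "ALT": "G"},
    {"CHROM": "chr1", "POS": 900000, "REF": "C", "ALT": "T"},
]

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["CHROM", "POS", "REF", "ALT"],
    lineterminator="\n",
)
writer.writeheader()    # first line: column names
writer.writerows(rows)  # one line per variant

csv_text = buffer.getvalue()
```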


## License

This project is licensed under the MIT License.

