Metadata-Version: 2.1
Name: PredDNAContam
Version: 0.0.2
Summary: A Machine Learning Model to Estimate Within-Species DNA Contamination.
Author: Raziyeh Mohseni
Author-email: raziyeh.mohseni.y@gmail.com
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: pandas==1.4.4
Requires-Dist: numpy>=1.19.5
Requires-Dist: scikit-learn==1.1.2
Requires-Dist: matplotlib==3.6.2
Requires-Dist: seaborn==0.12.2
Requires-Dist: joblib==1.4.2

# PredDNAContam

PredDNAContam is a machine learning tool that estimates within-species DNA contamination in biosample data, using per-variant features extracted from a VCF file.

## Input File Format (CSV)

When using **PredDNAContam**, your input data should be in CSV format with the following columns:

| Column  | Description |
|---------|------------|
| GQ      | Genotype quality |
| DP      | Total read depth |
| AF      | Allele frequency |
| VAF     | Variant allele frequency |
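
A quick sanity check of an input file before running the tool can catch a missing column or a non-numeric value early. The sketch below is only an illustration built on the standard library; the helper name `validate_rows` is hypothetical and is not part of PredDNAContam, and the assumption that every column must be numeric is inferred from the example data rather than stated by the package.

```python
import csv
import io

# Column names taken from the table above.
REQUIRED_COLUMNS = ["GQ", "DP", "AF", "VAF"]


def validate_rows(handle):
    """Read a CSV file object and return its rows as dicts of floats.

    Raises ValueError if a required column is missing or a value
    cannot be converted to a number (assumed requirement).
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")

    rows = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, record in enumerate(reader, start=2):
        try:
            rows.append({name: float(record[name]) for name in REQUIRED_COLUMNS})
        except (TypeError, ValueError) as exc:
            raise ValueError(f"line {line_no}: non-numeric value") from exc
    return rows


if __name__ == "__main__":
    sample = io.StringIO("GQ,DP,AF,VAF\n20,47,0.5,0.23\n")
    print(validate_rows(sample))
```

For a file on disk, pass an open handle, for example `validate_rows(open("sample.csv", newline=""))`.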

### Example CSV File
The CSV file is generated by extracting the features listed above from a VCF (Variant Call Format) file, with one row per variant.

After extraction, the CSV should look like this:


```csv
GQ,DP,AF,VAF
20,47,0.5,0.23
60,25,0.5,0.24
23,55,0.5,0.78
```
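
The extraction step itself is not shown here, so the following is a minimal sketch of one way to produce such a CSV, not the tool's own preprocessing. It assumes a single-sample VCF in which `AF` is an INFO field, `GQ`, `DP`, and `AD` are FORMAT fields, and VAF is taken as alternate reads divided by reference plus alternate reads from `AD`. Your variant caller may store these values differently. The function names `vcf_line_to_row` and `vcf_to_csv` are hypothetical.

```python
import csv
import io

COLUMNS = ["GQ", "DP", "AF", "VAF"]


def vcf_line_to_row(line):
    """Convert one single-sample VCF record to a feature dict, or None.

    Assumptions (not confirmed by the package): AF is in INFO;
    GQ, DP and AD are in FORMAT; VAF = alt / (ref + alt) from AD.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return None  # no FORMAT/sample columns present
    info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    try:
        ref_reads, alt_reads = (int(x) for x in sample["AD"].split(",")[:2])
        total = ref_reads + alt_reads
        if total == 0:
            return None  # VAF is undefined without supporting reads
        return {
            "GQ": int(sample["GQ"]),
            "DP": int(sample["DP"]),
            "AF": float(info["AF"].split(",")[0]),  # first ALT allele only
            "VAF": alt_reads / total,
        }
    except (KeyError, ValueError):
        return None  # missing field or placeholder such as "."


def vcf_to_csv(vcf_lines, out_handle):
    """Write the feature rows for all usable records to out_handle."""
    writer = csv.DictWriter(out_handle, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        row = vcf_line_to_row(line)
        if row is not None:
            writer.writerow(row)


if __name__ == "__main__":
    record = "chr1\t100\t.\tA\tG\t50\tPASS\tAC=1;AF=0.5;AN=2\tGT:AD:DP:GQ\t0/1:36,11:47:20\n"
    out = io.StringIO()
    vcf_to_csv(["##fileformat=VCFv4.2\n", record], out)
    print(out.getvalue())
```

Records that lack any of the assumed fields are skipped rather than written with empty values, which keeps the output strictly numeric.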


### Example config.txt File

Before running PredDNAContam, set the paths in `config.txt`. The file holds the input and output directories and the model and scaler filenames, one `key=value` entry per line:

- `input_dir=/path/to/csv_files`
- `output_dir=/path/to/output_directory/output_PredDNAcontam`
- `model_filename=/path/to/PredDNAContam_model/Random_Forest_Contamination_Model.joblib`
- `scaler_filename=/path/to/PredDNAContam_model_scaler/scaler.joblib`
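
Mistyped paths are an easy source of errors, so it can help to check the configuration before a run. The sketch below assumes the file is plain `key=value` text with the four keys shown above; how PredDNAContam itself reads the file is not documented here, and the helper names `parse_config` and `check_paths` are hypothetical.

```python
from pathlib import Path

# Keys taken from the example config.txt above.
REQUIRED_KEYS = ("input_dir", "output_dir", "model_filename", "scaler_filename")


def parse_config(text):
    """Parse key=value lines into a dict, skipping blank lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {line!r}")
        settings[key.strip()] = value.strip()
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise ValueError(f"missing config keys: {', '.join(missing)}")
    return settings


def check_paths(settings):
    """Return a list of problems found; an empty list means none."""
    problems = []
    if not Path(settings["input_dir"]).is_dir():
        problems.append(f"input_dir is not a directory: {settings['input_dir']}")
    for key in ("model_filename", "scaler_filename"):
        if not Path(settings[key]).is_file():
            problems.append(f"{key} is not a file: {settings[key]}")
    return problems


if __name__ == "__main__":
    config = parse_config(Path("config.txt").read_text())
    for problem in check_paths(config):
        print(problem)
```

The output directory is deliberately not checked, since whether the tool creates it on demand is not stated.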


## Download and Installation

To install PredDNAContam, follow these steps:

1. Download the package
Download the source archive `preddnacontam-0.0.2.tar.gz` from the PredDNAContam page on PyPI:
https://pypi.org/project/PredDNAContam/#files

2. Extract the package

After downloading, extract the archive and change into the extracted directory:

```
tar -xvzf preddnacontam-0.0.2.tar.gz
cd preddnacontam-0.0.2
```

3. Install the package
Inside the extracted directory, run:

```
pip install .
```

## Running PredDNAContam
1. Modify the configuration file
Navigate to the scripts directory and update config.txt to set the correct paths for your input files, model, and output directory.

2. Run the tool 
After configuring the paths, execute PredDNAContam:

```
PredDNAContam
```

