Metadata-Version: 2.1
Name: aidetector
Version: 0.0.1
Summary: AiDetector provides a simple interface to train and run models to classify if text was generated by AI or not.
Home-page: https://github.com/baileytec-labs
Author: Sean Bailey
Author-email: seanbailey518@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: spacy
Requires-Dist: torch
Requires-Dist: torchtext
Requires-Dist: halo
Requires-Dist: pandas
Requires-Dist: scikit-learn

# AI Detector: Detecting AI Generated Text

## Overview

AI Detector is a Python module, based on PyTorch, that simplifies the process of training and deploying a classification model to detect whether a given text has been generated by AI. It is designed to be platform-agnostic, making AI detection capabilities accessible to users across different work environments.

___



## Installation

There are two methods available for installing the AI Detector module:

- **Using pip:** You can install AI Detector directly from PyPI using pip by running the following command:

    `pip3 install aidetector`

- **From this repository:** Alternatively, you can clone this repository and install it locally:

    ```
    git clone https://github.com/baileytec-labs/aidetector.git
    cd aidetector
    pip3 install .
    ```

___

## Usage
AI Detector can be operated in two modes: training and inference.

### Training
To train a new model, you need a CSV dataset with a classification column (labels: 0 for human-written and 1 for AI-generated text) and a text column (the text data). The script takes the following command-line arguments:

```
aidetector train --datafile [path_to_data] --modeloutputfile [path_to_model] --vocaboutputfile [path_to_vocab] --tokenmodel [SpaCy_model] --percentsplit [percentage_for_test_split] --classificationlabel [classification_label_in_data] --textlabel [text_label_in_data] --download --lowerbound [lower_bound_for_early_stopping] --upperbound [upper_bound_for_early_stopping] --epochs [number_of_epochs]
```
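As a rough illustration of the expected dataset layout, the sketch below writes a tiny two-column CSV using only the Python standard library. The column names `classification` and `text`, the sample rows, and the file name `dataset.csv` are assumptions made for this example; whichever names you choose are what you would pass to `--classificationlabel` and `--textlabel`.

```python
import csv
import os
import tempfile

# Hypothetical column names, chosen for this example only.
LABEL_COLUMN = "classification"
TEXT_COLUMN = "text"

# Made-up sample rows: 0 = human-written, 1 = AI-generated.
rows = [
    {LABEL_COLUMN: 0, TEXT_COLUMN: "Ran into my old neighbor at the market, we talked for an hour."},
    {LABEL_COLUMN: 1, TEXT_COLUMN: "As an AI language model, I can provide a summary of the topic."},
]


def write_dataset(path, rows):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=[LABEL_COLUMN, TEXT_COLUMN])
        writer.writeheader()
        writer.writerows(rows)


def read_dataset(path):
    """Read the CSV back as a list of dictionaries (all values are strings)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Write to a temporary directory so the example leaves no files behind in the project.
dataset_path = os.path.join(tempfile.mkdtemp(), "dataset.csv")
write_dataset(dataset_path, rows)
```

A real training set would need far more rows than this; the point here is only the shape of the file, with one label column and one text column.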

### Inference
To make predictions with a trained model, you need to provide the text you want to classify. The script takes the following command-line arguments:

```
aidetector infer --modelfile [path_to_trained_model] --vocabfile [path_to_vocab] --text [text_to_classify] --tokenmodel [SpaCy_model] --threshold [probability_threshold_for_classification] --download [flag_to_download_SpaCy_model]
```


The prediction will be printed to the console: "This was written by AI" or "This was written by a human."
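
To make the role of `--threshold` concrete, here is a minimal sketch of how a probability threshold could turn a model score into one of the two messages above. It is not taken from the AI Detector source: the helper names `label_from_probability` and `describe`, the default threshold of 0.5, and the use of `>=` for the comparison are all assumptions made for illustration.

```python
def label_from_probability(probability, threshold=0.5):
    """Return 1 (AI) if the probability meets the threshold, else 0 (human).

    Hypothetical helper: the 0.5 default and the >= comparison are assumptions.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return 1 if probability >= threshold else 0


def describe(label):
    """Map a 0/1 label to the console messages quoted in this README."""
    return "This was written by AI" if label == 1 else "This was written by a human."


# The same score is classified differently depending on the threshold.
print(describe(label_from_probability(0.83, threshold=0.5)))
print(describe(label_from_probability(0.83, threshold=0.9)))
```

Raising the threshold makes the classifier more conservative about labelling text as AI-generated, at the cost of missing more AI-generated text.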

___

## Python API

You can use all the functionality of AiDetector in your Python programs. Start by importing the modules you need:

```
from aidetector.aidetectorclass import *
from aidetector.inference import *
from aidetector.training import *
from aidetector.tokenization import *
# or
import aidetector as ad
```

From there, you have access to all of the training, inference, and tokenization capabilities. 

For example:

```
# Run inference with a trained model from Python
import torch

from aidetector.tokenization import *
from aidetector.inference import *
from aidetector.aidetectorclass import *

# Load the tokenizer and vocabulary, then build a model sized to that vocabulary
tokenizer = get_tokenizer()
vocab = load_vocab("./myvocabfile.vocab")
model = AiDetector(len(vocab))

# Restore the trained weights
model.load_state_dict(torch.load("./mymodelfile.model"))

testtext = "Is this written by AI?"

# check_input returns 0 if the text is human-written, 1 if it is AI-generated
isai = check_input(
    model,
    vocab,
    testtext,
    tokenizer=tokenizer,
)
```

___

## Dependencies
The main dependencies for this project include:

- PyTorch
- SpaCy
- Torchtext
- scikit-learn
- pandas
- argparse
- Halo

**Note:** For tokenization, the project uses SpaCy models. By default, it uses the multi-language model `xx_ent_wiki_sm`, but other models can be specified using the `--tokenmodel` argument. If the model is not already downloaded, you can use the `--download` flag to download it.

## Contributing
Contributions to the AI Detector project are welcome. 
Please review CONTRIBUTION.md for further instructions.

