Metadata-Version: 2.2
Name: audio2chat
Version: 0.1.0
Summary: Generate chat data from multi-speaker audio files
Home-page: https://github.com/neuralwork/audio2chat
Author: Alara Dirik
Keywords: audio,transcription,whisper,diarization,speech-to-text,chat
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.19.0
Requires-Dist: torch>=2.1.1
Requires-Dist: transformers>=4.37.0
Requires-Dist: accelerate>=0.27.0
Requires-Dist: datasets[audio]>=2.16.0
Requires-Dist: librosa>=0.9.0
Requires-Dist: soundfile>=0.10.0
Requires-Dist: assemblyai
Requires-Dist: ffmpeg-python>=0.2.0
Requires-Dist: yt-dlp>=2023.12.30
Provides-Extra: speed
Requires-Dist: flash-attn>=2.5.0; extra == "speed"
Requires-Dist: safetensors>=0.4.1; extra == "speed"
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: black>=22.0; extra == "dev"
Requires-Dist: isort>=5.0; extra == "dev"
Requires-Dist: flake8>=4.0; extra == "dev"
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Audio2Chat

Audio2Chat converts multi-speaker audio files into chat format using [AssemblyAI](https://www.assemblyai.com/app) for speaker diarization and optionally Whisper for enhanced transcription.

## Features
- Speaker diarization and transcription using AssemblyAI
- Optional enhanced transcription using Whisper large-v3-turbo
- YouTube video download support
- Word-level timestamps (useful for speech-to-text and text-to-speech tasks)
- Structured chat format output
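
Each message and word in the output carries `start`/`end` times, so you can cut matching (audio, text) pairs for speech-to-text or text-to-speech work. The sketch below assumes millisecond timestamps (as the Output Format example suggests) and an already-decoded mono signal, for example from `soundfile.read` or `librosa.load(path, sr=None)`. `slice_segments` and `ms_to_sample` are hypothetical helpers, not part of audio2chat.

```python
from typing import Dict, List, Sequence, Tuple


def ms_to_sample(ms: float, sample_rate: int) -> int:
    # Assumption: timestamps are in milliseconds.
    return int(round(ms * sample_rate / 1000))


def slice_segments(
    audio: Sequence[float],
    sample_rate: int,
    segments: List[Dict],
) -> List[Tuple[Sequence[float], str]]:
    """Return (audio_clip, text) pairs for dicts with "start", "end" and "text" keys."""
    pairs = []
    for seg in segments:
        lo = ms_to_sample(seg["start"], sample_rate)
        # Clamp to the signal length in case the last timestamp overshoots.
        hi = min(ms_to_sample(seg["end"], sample_rate), len(audio))
        if hi > lo:
            pairs.append((audio[lo:hi], seg["text"]))
    return pairs
```

Slicing works the same on a Python list or a NumPy array, so the helper does not force a particular audio library.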

## Installation

```bash
# Install from PyPI
pip install audio2chat

# Or install from source
git clone https://github.com/neuralwork/audio2chat.git
cd audio2chat
pip install -e .
```

### Requirements
- Python >=3.8
- FFmpeg (for YouTube downloads)
- CUDA-capable GPU (recommended for Whisper)

Install FFmpeg:
```bash
# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Windows (using Chocolatey)
choco install ffmpeg
```

You need an AssemblyAI account and API key to use audio2chat. Once you have set up an account, you can find your API key on your [dashboard](https://www.assemblyai.com/app).
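
To keep the key out of scripts and shell history, you can store it in an environment variable. The sketch below reads `ASSEMBLYAI_API_KEY` (the variable name used in the Development section); whether the CLI or pipeline reads this variable on its own is not documented here, so the hypothetical helper `get_assemblyai_key` returns the key for you to pass explicitly.

```python
import os


def get_assemblyai_key(env_var: str = "ASSEMBLYAI_API_KEY") -> str:
    """Return the AssemblyAI API key from the environment.

    Fails early with a readable message instead of letting a missing key
    surface later as an authentication error.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your AssemblyAI API key."
        )
    return key
```

Pass the result to `--api-key` or to `AudioChatPipeline(api_key=...)`.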

## Usage

### Command Line

Basic usage:
```bash
# Process local audio file
audio2chat input.wav --api-key YOUR_ASSEMBLYAI_KEY --output output_dir

# Process YouTube video
audio2chat "https://youtube.com/watch?v=xxxxx" --api-key YOUR_ASSEMBLYAI_KEY --output output_dir
```
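
If you drive the CLI from a Python script (for example, to process several files), passing the arguments as a list avoids shell-quoting problems with YouTube URLs that contain `?` or `&`. A minimal sketch that uses only the flags documented in this README; `build_audio2chat_command` and `run_audio2chat` are hypothetical names.

```python
import subprocess
from typing import List, Optional


def build_audio2chat_command(
    source: str,
    api_key: str,
    output_dir: str = "output",
    use_whisper: bool = False,
    num_speakers: Optional[int] = None,
) -> List[str]:
    """Assemble an audio2chat argv list from the documented options."""
    cmd = ["audio2chat", source, "--api-key", api_key, "--output", output_dir]
    if num_speakers is not None:
        cmd += ["--num-speakers", str(num_speakers)]
    if use_whisper:
        cmd.append("--use-whisper")
    return cmd


def run_audio2chat(cmd: List[str]) -> None:
    # No shell is involved, so the URL is passed through verbatim.
    # check=True raises CalledProcessError if audio2chat exits non-zero.
    subprocess.run(cmd, check=True)
```

For example, `run_audio2chat(build_audio2chat_command("input.wav", api_key, use_whisper=True))`.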

All options:
```bash
audio2chat --help

required arguments:
  input                 Input audio file path or YouTube URL
  --api-key API_KEY     AssemblyAI API key

output settings:
  --output OUTPUT       Output directory for audio and chat data (default: output)
  --download-format {mp3,wav}
                        Audio format for YouTube downloads (default: wav)

transcription settings:
  --language LANGUAGE   Language code for transcription (default: en)
  --num-speakers NUM    Expected number of speakers (default: auto-detect)
  --use-whisper         Use Whisper for enhanced transcription (default: False)

chat generation settings:
  --min-segment-confidence CONF
                        Minimum confidence score to include segment (default: 0.5)
  --merge-threshold THRESH
                        Time threshold to merge adjacent utterances (default: 1.0)
  --min-duration DUR    Minimum duration for a chat segment (default: 0.5)
  --include-metadata    Include additional metadata in output (default: True)
  --include-word-timestamps
                        Include word-level timing information (default: False)

vocabulary settings:
  --word-boost [WORDS ...]
                        List of words to boost recognition for

other:
  --verbose, -v         Enable verbose logging
```
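
The chat generation settings describe three steps: filter segments by confidence, merge adjacent utterances, and drop short segments. The sketch below shows one plausible way to combine them; it is not audio2chat's implementation. It assumes merging applies only to consecutive utterances from the same speaker, that timestamps are in milliseconds (as in the Output Format example), and that the thresholds are in seconds. `Utterance` and `build_chat` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    speaker: str
    text: str
    start: int  # milliseconds (assumed)
    end: int    # milliseconds (assumed)
    confidence: float


def build_chat(
    utterances: List[Utterance],
    min_segment_confidence: float = 0.5,
    merge_threshold: float = 1.0,  # seconds (assumed)
    min_duration: float = 0.5,     # seconds (assumed)
) -> List[Utterance]:
    # 1. Drop low-confidence segments, then order the rest by start time.
    kept = [u for u in utterances if u.confidence >= min_segment_confidence]
    kept.sort(key=lambda u: u.start)

    # 2. Merge consecutive same-speaker utterances separated by a short gap.
    merged: List[Utterance] = []
    for u in kept:
        prev = merged[-1] if merged else None
        gap_s = (u.start - prev.end) / 1000 if prev is not None else None
        if prev is not None and prev.speaker == u.speaker and gap_s <= merge_threshold:
            merged[-1] = Utterance(
                speaker=prev.speaker,
                text=f"{prev.text} {u.text}",
                start=prev.start,
                end=max(prev.end, u.end),
                # Keep the lower score so a merged turn is never overstated.
                confidence=min(prev.confidence, u.confidence),
            )
        else:
            merged.append(u)

    # 3. Drop segments shorter than min_duration.
    return [u for u in merged if (u.end - u.start) / 1000 >= min_duration]
```

Merging before the duration filter lets short fragments from the same speaker combine into a turn long enough to keep.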

### Python API

```python
from audio2chat.pipeline import AudioChatPipeline
from audio2chat.youtube_downloader import download_audio

# For YouTube videos (for a local file, set audio_path = "input.wav" instead)
audio_path = download_audio(
    "https://youtube.com/watch?v=xxxxx",
    output_dir="downloads",
    audio_format="wav"
)

# Initialize pipeline
pipeline = AudioChatPipeline(
    api_key="YOUR_ASSEMBLYAI_KEY",
    language="en",
    num_speakers=2,  # or None for auto-detect
    use_whisper=True,  # enable Whisper for better transcription
    include_word_timestamps=True
)

# Process file
chat_data = pipeline.process_file(audio_path, "output/chat.json")
```
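
To process many local files, you can reuse one pipeline instance. A minimal sketch: `process_directory` is a hypothetical helper that relies only on the `process_file(audio_path, output_path)` call shown above, and it creates the output directory first rather than assuming `process_file` does so.

```python
from pathlib import Path
from typing import Dict, Iterable, Protocol


class ChatPipeline(Protocol):
    # Structural type matching the call used above:
    # pipeline.process_file(audio_path, output_path)
    def process_file(self, audio_path: str, output_path: str): ...


def process_directory(
    pipeline: ChatPipeline,
    input_dir: str,
    output_dir: str = "output",
    extensions: Iterable[str] = (".wav", ".mp3"),
) -> Dict[str, str]:
    """Run the pipeline on every matching file; return {audio_path: json_path}."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    exts = {e.lower() for e in extensions}
    results: Dict[str, str] = {}
    for audio in sorted(Path(input_dir).iterdir()):
        if audio.suffix.lower() not in exts:
            continue  # skip non-audio files such as notes or subtitles
        json_path = out / f"{audio.stem}.json"
        pipeline.process_file(str(audio), str(json_path))
        results[str(audio)] = str(json_path)
    return results
```

For example, `process_directory(pipeline, "recordings", "output")` writes one JSON file per recording.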

### Output Format

```json
{
    "messages": [
        {
            "speaker": "A",
            "text": "Hello there!",
            "start": 0,
            "end": 1500,
            "words": [
                {
                    "text": "Hello",
                    "start": 0,
                    "end": 750,
                    "confidence": 0.98
                },
                {
                    "text": "there",
                    "start": 750,
                    "end": 1500,
                    "confidence": 0.95
                }
            ]
        }
    ],
    "metadata": {
        "num_speakers": 2,
        "speakers": ["A", "B"],
        "transcription": "whisper+assemblyai"
    }
}
```
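
Speaker labels (`A`, `B`, ...) are not chat roles. To feed this output into a format that expects `role`/`content` pairs, you could map speakers to roles and join consecutive turns from the same role. A sketch assuming the JSON shape above; the default `A` to `user`, `B` to `assistant` mapping is an arbitrary choice, not something audio2chat produces, and `load_chat`/`to_role_messages` are hypothetical helpers.

```python
import json
from typing import Dict, List, Optional


def load_chat(path: str) -> dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def to_role_messages(
    chat_data: dict,
    roles: Optional[Dict[str, str]] = None,
) -> List[Dict[str, str]]:
    """Map speaker labels to roles, merging consecutive turns of the same role."""
    if roles is None:
        roles = {"A": "user", "B": "assistant"}
    out: List[Dict[str, str]] = []
    for msg in chat_data["messages"]:
        role = roles.get(msg["speaker"])
        if role is None:
            continue  # speakers without a mapping (e.g. a third speaker) are skipped
        if out and out[-1]["role"] == role:
            out[-1]["content"] += " " + msg["text"]
        else:
            out.append({"role": role, "content": msg["text"]})
    return out
```

For example, `to_role_messages(load_chat("output/chat.json"))`.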

## Development

Run tests:
```bash
# Set up environment
export ASSEMBLYAI_API_KEY=your_key_here

# Add test audio file
cp your_test_audio.wav tests/test_data/input.wav

# Run tests
pytest tests/test_pipeline.py tests/test_chat_builder.py  # without Whisper
pytest tests/  # all tests including Whisper
```
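
The tests need both the API key and the sample audio file. A small pre-flight check can report what is missing before you run pytest. A sketch; `missing_test_prerequisites` is a hypothetical helper, and the variable name and path are the ones listed above.

```python
import os
from pathlib import Path
from typing import List


def missing_test_prerequisites(root: str = ".") -> List[str]:
    """List the parts of the test setup that are not in place yet."""
    missing = []
    if not os.environ.get("ASSEMBLYAI_API_KEY"):
        missing.append("ASSEMBLYAI_API_KEY environment variable")
    if not (Path(root) / "tests" / "test_data" / "input.wav").is_file():
        missing.append("tests/test_data/input.wav")
    return missing
```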

## License
This project is licensed under the [MIT license](https://github.com/neuralwork/audio2chat/blob/main/LICENSE).

From [neuralwork](https://neuralwork.ai/) with :heart:
