Metadata-Version: 2.1
Name: africanwhisper
Version: 0.9.11
Summary: A framework for fast fine-tuning and API endpoint deployment of Whisper models, developed to accelerate Automatic Speech Recognition (ASR) for African languages.
Home-page: https://kevkibe.github.io/African-Whisper
Author: Kevin Kibe
Author-email: keviinkibe@gmail.com
License: MIT
Project-URL: Source, https://github.com/KevKibe/African-Whisper
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: all
Requires-Dist: transformers ==4.42.3 ; extra == 'all'
Requires-Dist: datasets ==2.19.2 ; extra == 'all'
Requires-Dist: librosa ==0.10.2.post1 ; extra == 'all'
Requires-Dist: evaluate ==0.4.1 ; extra == 'all'
Requires-Dist: jiwer ==3.0.4 ; extra == 'all'
Requires-Dist: bitsandbytes ==0.42.0 ; extra == 'all'
Requires-Dist: accelerate ==0.31.0 ; extra == 'all'
Requires-Dist: peft ==0.11.1 ; extra == 'all'
Requires-Dist: numpy ==1.26.4 ; extra == 'all'
Requires-Dist: wandb ==0.17.4 ; extra == 'all'
Requires-Dist: holoviews ==1.18.3 ; extra == 'all'
Requires-Dist: panel ==1.3.8 ; extra == 'all'
Requires-Dist: tf-keras ==2.16.0 ; extra == 'all'
Requires-Dist: tensorflow ==2.16.1 ; extra == 'all'
Requires-Dist: keras ==3.1.1 ; extra == 'all'
Requires-Dist: scipy ==1.12.0 ; extra == 'all'
Requires-Dist: tensorflow-probability ==0.24.0 ; extra == 'all'
Requires-Dist: faster-whisper ==1.0.3 ; extra == 'all'
Requires-Dist: python-dotenv ==1.0.1 ; extra == 'all'
Requires-Dist: pyannote-audio ==3.2.0 ; extra == 'all'
Requires-Dist: nltk ==3.8.1 ; extra == 'all'
Requires-Dist: torchvision ==0.17.2 ; extra == 'all'
Requires-Dist: ctranslate2 ==4.3.1 ; extra == 'all'
Requires-Dist: pandas ==2.0.3 ; extra == 'all'
Requires-Dist: torch ==2.3.1 ; extra == 'all'
Requires-Dist: pydantic ==2.7.3 ; extra == 'all'
Requires-Dist: prometheus-client ==0.20.0 ; extra == 'all'
Requires-Dist: fastapi ==0.111.0 ; extra == 'all'
Requires-Dist: uvicorn ==0.30.1 ; extra == 'all'
Requires-Dist: pandas ==2.2.1 ; extra == 'all'
Provides-Extra: deployment
Requires-Dist: torch ==2.3.1 ; extra == 'deployment'
Requires-Dist: transformers ==4.42.3 ; extra == 'deployment'
Requires-Dist: pydantic ==2.7.3 ; extra == 'deployment'
Requires-Dist: prometheus-client ==0.20.0 ; extra == 'deployment'
Requires-Dist: fastapi ==0.111.0 ; extra == 'deployment'
Requires-Dist: uvicorn ==0.30.1 ; extra == 'deployment'
Requires-Dist: python-dotenv ==1.0.1 ; extra == 'deployment'
Requires-Dist: faster-whisper ==1.0.3 ; extra == 'deployment'
Requires-Dist: pyannote-audio ==3.2.0 ; extra == 'deployment'
Requires-Dist: nltk ==3.8.1 ; extra == 'deployment'
Requires-Dist: torchvision ==0.17.2 ; extra == 'deployment'
Requires-Dist: ctranslate2 ==4.3.1 ; extra == 'deployment'
Requires-Dist: pandas ==2.2.1 ; extra == 'deployment'
Provides-Extra: training
Requires-Dist: transformers ==4.42.3 ; extra == 'training'
Requires-Dist: datasets ==2.19.2 ; extra == 'training'
Requires-Dist: librosa ==0.10.2.post1 ; extra == 'training'
Requires-Dist: evaluate ==0.4.1 ; extra == 'training'
Requires-Dist: jiwer ==3.0.4 ; extra == 'training'
Requires-Dist: bitsandbytes ==0.42.0 ; extra == 'training'
Requires-Dist: accelerate ==0.31.0 ; extra == 'training'
Requires-Dist: peft ==0.11.1 ; extra == 'training'
Requires-Dist: numpy ==1.26.4 ; extra == 'training'
Requires-Dist: wandb ==0.17.4 ; extra == 'training'
Requires-Dist: holoviews ==1.18.3 ; extra == 'training'
Requires-Dist: panel ==1.3.8 ; extra == 'training'
Requires-Dist: tf-keras ==2.16.0 ; extra == 'training'
Requires-Dist: tensorflow ==2.16.1 ; extra == 'training'
Requires-Dist: keras ==3.1.1 ; extra == 'training'
Requires-Dist: scipy ==1.12.0 ; extra == 'training'
Requires-Dist: tensorflow-probability ==0.24.0 ; extra == 'training'
Requires-Dist: faster-whisper ==1.0.3 ; extra == 'training'
Requires-Dist: python-dotenv ==1.0.1 ; extra == 'training'
Requires-Dist: pyannote-audio ==3.2.0 ; extra == 'training'
Requires-Dist: nltk ==3.8.1 ; extra == 'training'
Requires-Dist: torchvision ==0.17.2 ; extra == 'training'
Requires-Dist: ctranslate2 ==4.3.1 ; extra == 'training'
Requires-Dist: pandas ==2.0.3 ; extra == 'training'

<h1 align="center">African Whisper: ASR for African Languages</h1>

<p align="center">
  <a href="https://twitter.com/AfriWhisper">
    <img src="https://img.shields.io/twitter/follow/AfriWhisper?style=social" alt="Twitter">
  </a>
  <a href="https://github.com/KevKibe/African-Whisper/commits/">
    <img src="https://img.shields.io/github/last-commit/KevKibe/African-Whisper?" alt="Last commit">
  </a>
  <a href="https://github.com/KevKibe/African-Whisper/blob/main/LICENSE">
    <img src="https://img.shields.io/github/license/KevKibe/African-Whisper?" alt="License">
  </a>

</p>

<p align="center">
    <img src="logo_image.png" width="100">
</p>


*A framework for fine-tuning and deploying the Whisper model, developed to advance Automatic Speech Recognition (ASR), covering both transcription and translation, for African languages.*


## Features
  
- 🔧 **Fine-Tuning**: Fine-tune the [Whisper](https://huggingface.co/collections/openai/whisper-release-6501bba2cf999715fd953013) model on any audio dataset from Hugging Face, such as [Mozilla's](https://huggingface.co/mozilla-foundation) Common Voice datasets.

- 📊 **Metrics Monitoring**: View training run metrics on [Wandb](https://wandb.ai/).

- 🐳 **Production Deployment**: Seamlessly containerize and deploy the model inference endpoint for real-world applications.

- 🚀 **Model Optimization**: Use CTranslate2 to optimize the model for faster inference.

- 📝 **Word-Level Transcriptions**: Produce detailed word-level transcriptions and translations, complete with timestamps.

- 🎙️ **Multi-Speaker Diarization**: Perform speaker identification and separation in multi-speaker audio using diarization techniques.

- 🔍 **Alignment Precision**: Improve transcription and translation accuracy by aligning outputs with Wav2vec models.

- 🛡️ **Reduced Hallucination**: Leverage Voice Activity Detection (VAD) to minimize hallucination and improve transcription clarity.
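
To make the word-level transcription and diarization features concrete, here is a minimal sketch of how word timestamps can be combined with speaker turns: each word is given the speaker whose turn overlaps it the most. The `Word`, `SpeakerTurn`, and `assign_speakers` names are hypothetical and are not part of this framework's API; the sample words and timings are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """A transcribed word with timestamps in seconds (hypothetical type)."""
    text: str
    start: float
    end: float
    speaker: Optional[str] = None


@dataclass
class SpeakerTurn:
    """A diarization turn: one speaker talking from start to end (hypothetical type)."""
    speaker: str
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Return the length in seconds of the intersection of two intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Word], turns: List[SpeakerTurn]) -> List[Word]:
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn keep speaker=None.
    """
    for word in words:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            shared = overlap(word.start, word.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        word.speaker = best_speaker
    return words


# Illustrative data only: three words and two speaker turns.
words = [
    Word("habari", 0.10, 0.55),
    Word("yako", 0.60, 0.95),
    Word("nzuri", 1.40, 1.90),
]
turns = [
    SpeakerTurn("SPEAKER_00", 0.00, 1.00),
    SpeakerTurn("SPEAKER_01", 1.20, 2.00),
]

for w in assign_speakers(words, turns):
    print(f"[{w.start:.2f}-{w.end:.2f}] {w.speaker}: {w.text}")
```

Choosing the speaker by maximum overlap keeps the sketch simple; a word that straddles two turns goes to whichever turn covers more of it.
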

The framework implements the following papers:

1. [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356): Whisper, a speech processing system trained to predict transcripts of audio from the internet, scaled to 680,000 hours of multilingual and multitask supervision.

2. [WhisperX: Time-Accurate Speech Transcription of Long-Form Audio](https://arxiv.org/abs/2303.00747): Time-accurate speech recognition with word-level timestamps.

3. [pyannote.audio: Neural Building Blocks for Speaker Diarization](https://arxiv.org/abs/1911.01255): Speaker diarization for multi-speaker audio.

4. [Efficient and High-Quality Neural Machine Translation with OpenNMT](https://arxiv.org/abs/1701.02810): Efficient neural machine translation and model acceleration.  

For more details, refer to the [Whisper ASR model paper](https://cdn.openai.com/papers/whisper.pdf).
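
As an illustration of how Voice Activity Detection can feed a model that works on fixed 30-second windows, the sketch below greedily merges neighbouring speech segments into chunks of at most 30 seconds, in the spirit of the VAD-based pre-segmentation discussed in the WhisperX paper. It is a simplified assumption, not this framework's implementation: the `merge_speech_segments` function is hypothetical, the segment times are made up, and segments longer than the limit are left uncut.

```python
from typing import List, Tuple

# A speech segment as (start_seconds, end_seconds), e.g. produced by a VAD model.
Segment = Tuple[float, float]


def merge_speech_segments(segments: List[Segment], max_chunk: float = 30.0) -> List[Segment]:
    """Greedily merge consecutive speech segments into chunks.

    A segment joins the current chunk only if the merged span, measured from
    the chunk's start to the segment's end, stays within max_chunk seconds.
    A single segment longer than max_chunk is kept as its own chunk (this
    sketch does not split it).
    """
    chunks: List[Segment] = []
    for start, end in sorted(segments):
        if chunks and end - chunks[-1][0] <= max_chunk:
            # Extend the current chunk to cover this segment.
            chunks[-1] = (chunks[-1][0], max(chunks[-1][1], end))
        else:
            # Start a new chunk.
            chunks.append((start, end))
    return chunks


# Illustrative VAD output: four speech regions separated by silence.
speech = [(0.0, 8.0), (9.5, 21.0), (24.0, 33.0), (40.0, 45.0)]
print(merge_speech_segments(speech))
```

Transcribing only these chunks skips long silences, which is where hallucinated text tends to appear.
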

## Documentation
Refer to the [Documentation](https://kevkibe.github.io/African-Whisper/gettingstarted/) to get started.


## Contributing 
Contributions are welcome and encouraged.

Before contributing, please take a moment to review our [Contribution Guidelines](https://github.com/KevKibe/African-Whisper/blob/master/DOCS/CONTRIBUTING.md) for important information on how to contribute to this project.

If you're unsure about anything or need assistance, don't hesitate to reach out to us or open an issue to discuss your ideas.

We look forward to your contributions!


## License
This project is licensed under the MIT License - see the [LICENSE](https://github.com/KevKibe/African-Whisper/blob/main/LICENSE) file for details.

## Contact
For any enquiries, please reach out to me at keviinkibe@gmail.com.
