Metadata-Version: 2.1
Name: at16k
Version: 0.1.2
Summary: at16k is a Python library to perform automatic speech recognition or speech to text conversion.
Home-page: https://github.com/at16k/at16k.git
License: MIT
Keywords: asr, automatic speech recognition, speech-to-text, speech recognition, speech analysis
Author: Mohit Shah
Author-email: mohit@at16k.com
Requires-Python: >=3.6,<4.0
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Requires-Dist: flask (>=1.1.1,<2.0.0)
Requires-Dist: flask-cors (>=3.0.8,<4.0.0)
Requires-Dist: gevent (>=1.4.0,<2.0.0)
Requires-Dist: progressbar (>=2.5,<3.0)
Requires-Dist: scipy (>=1.3.3,<2.0.0)
Requires-Dist: tensorflow (==1.14)
Project-URL: Repository, https://github.com/at16k/at16k.git
Description-Content-Type: text/markdown

# at16k
Pronounced as ***at sixteen k***

# What is at16k?
at16k is a Python library for automatic speech recognition (speech-to-text conversion). The goal of this project is to provide the community with a production-quality speech-to-text library.

# Installation
It is recommended that you install at16k in a virtual environment.

## Prerequisites
- Python >= 3.6
- TensorFlow == 1.14
- SciPy (for reading wav files)

## Install via pip
```
$ pip install at16k
```

## Install from source
Requires: [poetry](https://github.com/sdispater/poetry)
```
$ git clone https://github.com/at16k/at16k.git
$ cd at16k
$ poetry env use python3.6
$ poetry install
```

# Download models
Currently, two models are available for speech-to-text conversion:
- en_8k (trained on English audio recorded at 8 kHz)
- en_16k (trained on English audio recorded at 16 kHz)

To download all the models:
```
$ python -m at16k.download all
```
Alternatively, you can download only the model you need. For example:
```
$ python -m at16k.download en_8k
$ python -m at16k.download en_16k
```

# Preprocessing audio files
at16k accepts wav files with the following specs:
- Channels: 1
- Bits per sample: 16
- Sample rate: 8000 (en_8k) or 16000 (en_16k)

Use ffmpeg to convert your audio/video files to an acceptable format. For example:
```
# For 8 kHz (pcm_s16le = 16-bit signed PCM; use a .wav output file)
$ ffmpeg -i <input_file> -ar 8000 -ac 1 -acodec pcm_s16le <output_file>

# For 16 kHz
$ ffmpeg -i <input_file> -ar 16000 -ac 1 -acodec pcm_s16le <output_file>
```
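
Before transcribing, it can help to confirm that a file actually matches the specs above. The following is a minimal sketch, not part of at16k: `pick_model` and `MODEL_FOR_RATE` are hypothetical names, and the check relies only on Python's standard `wave` module. It returns the model name that corresponds to the file's sample rate, or raises `ValueError` if the file does not meet the specs.

```python
import wave

# Sample rate -> model name, as listed in the specs above.
# This mapping and the function name are illustrative, not part of at16k.
MODEL_FOR_RATE = {8000: "en_8k", 16000: "en_16k"}


def pick_model(wav_path):
    """Return the model name matching a wav file, or raise ValueError."""
    with wave.open(wav_path, "rb") as wav:
        channels = wav.getnchannels()
        bits = wav.getsampwidth() * 8  # sample width is reported in bytes
        rate = wav.getframerate()

    if channels != 1:
        raise ValueError(f"expected 1 channel, found {channels}")
    if bits != 16:
        raise ValueError(f"expected 16 bits per sample, found {bits}")
    if rate not in MODEL_FOR_RATE:
        raise ValueError(f"expected 8000 or 16000 Hz, found {rate}")
    return MODEL_FOR_RATE[rate]
```

The returned name can then be passed wherever a model name is expected, for example as the `-m` argument on the command line.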

# Usage
There are three ways to invoke the at16k speech-to-text converter.

## Command line
```
at16k-convert -i <input_wav_file> -m <model_name>
```
Alternatively,
```
python -m at16k.bin.speech_to_text -i <input_wav_file> -m <model_name>
```
## Library API
```
from at16k.api import SpeechToText

# One-time initialization
STT = SpeechToText('en_16k') # or en_8k

# Run STT on an audio file, returns a dict
print(STT('./samples/test_16k.wav'))
```
Check [example.py](https://github.com/at16k/at16k/blob/master/example.py) for details on how to use the API.

## REST API server
```
at16k-serve -p <port> -m <model_name>
```
Alternatively,
```
python -m at16k.bin.serve -p <port> -m <model_name>
```
Check [API Docs](https://documenter.getpostman.com/view/1430496/SWE58Kwx?version=latest) for details on how to use the REST API.

# Limitations

Audio files must be shorter than **30 seconds** when using **en_8k** and shorter than **15 seconds** when using **en_16k**. No error is raised if the duration exceeds these limits, but the transcript may contain errors and missing text.
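
One way to work within these limits is to split a longer recording into shorter wav files before transcription. The sketch below is an illustration under stated assumptions, not an at16k feature: `split_wav` and `MAX_SECONDS` are hypothetical names, and the code uses only Python's standard `wave` module. It cuts at fixed offsets, so a word that straddles a cut may be transcribed poorly.

```python
import wave

# Duration limits in seconds, taken from the paragraph above.
# The names here are illustrative and not part of at16k.
MAX_SECONDS = {"en_8k": 30, "en_16k": 15}


def split_wav(wav_path, model_name, out_prefix):
    """Write consecutive chunks of wav_path, each shorter than the model limit.

    Returns the list of chunk paths, named <out_prefix>_000.wav and so on.
    """
    paths = []
    with wave.open(wav_path, "rb") as src:
        params = src.getparams()
        # Stay one frame under the limit so every chunk is strictly shorter.
        frames_per_chunk = MAX_SECONDS[model_name] * src.getframerate() - 1
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            path = f"{out_prefix}_{len(paths):03d}.wav"
            with wave.open(path, "wb") as dst:
                dst.setparams(params)  # frame count is corrected on close
                dst.writeframes(frames)
            paths.append(path)
    return paths
```

Each returned path can then be transcribed separately, for example with the `SpeechToText` object from the Library API section, and the resulting transcripts joined in order.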

# License

This software is distributed under the MIT license.

# Acknowledgements

We would like to thank the [Google TensorFlow Research Cloud (TFRC)](https://www.tensorflow.org/tfrc) program for providing access to cloud TPUs.
