Metadata-Version: 2.4
Name: allyin
Version: 0.1.2
Summary: Allyin: Modular AI tools for enterprise data processing
Home-page: https://github.com/AllyInAi/libs
Author: Niraj Dalavi
Author-email: niraj@allyin.ai
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: readability-lxml==0.8.4.1
Requires-Dist: beautifulsoup4==4.13.4
Requires-Dist: pillow==11.2.1
Requires-Dist: pytesseract==0.3.13
Requires-Dist: pymupdf==1.26.0
Requires-Dist: python-pptx==1.0.2
Requires-Dist: pytest==8.4.0
Requires-Dist: python-docx==1.1.2
Requires-Dist: pandas==2.3.0
Requires-Dist: openpyxl==3.1.5
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary



# Allyin: Modular Multimodal Extractor

Allyin is a Python library that extracts text from PDF, DOCX, XLSX, and PPTX documents, images (via OCR), HTML pages, and audio files (via transcription).

## Installation

First, install `openai-whisper`, which Allyin uses for audio transcription:

```bash
pip install git+https://github.com/openai/whisper.git
```

Then install Allyin:

```bash
pip install allyin
```

> Note: Some features rely on system-level tools that `pip` does not install: Whisper needs `ffmpeg` for audio decoding, and OCR needs the `tesseract` binary.
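
To confirm that these system tools are reachable before running an extraction, a quick check along the following lines may help. This is a minimal sketch that is not part of Allyin: `check_system_tools` and `REQUIRED_TOOLS` are hypothetical names, and the executable names are assumed to be the conventional `ffmpeg` and `tesseract`.

```python
import shutil

# Executables named in the note above. The binary names are an assumption
# (the conventional ones); adjust them if your installation differs.
REQUIRED_TOOLS = ("ffmpeg", "tesseract")


def check_system_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


# Report which tools were found on the current PATH.
for tool, location in check_system_tools().items():
    print(f"{tool}: {location if location else 'NOT FOUND on PATH'}")
```

A `None` entry only means the executable was not found on `PATH`; it may still be installed in a location your shell does not search.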

## Usage

### Import

```python
from allyin.multimodal2text import extract_text
```

### Supported File Types

| Format        | Description                          |
|---------------|--------------------------------------|
| `.pdf`        | Extracts text from PDFs              |
| `.docx`       | Extracts text from Word documents    |
| `.xlsx`       | Extracts text from Excel workbooks   |
| `.pptx`       | Extracts text from PowerPoint slides |
| `.png`/`.jpg` | Extracts text from images via OCR    |
| `.html`       | Extracts visible page content        |
| `.mp3`/`.wav` | Transcribes audio to text            |
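
If you want to skip unsupported files before calling the library, a small suffix check can mirror the table above. This is a sketch under stated assumptions: `SUPPORTED_SUFFIXES` and `is_supported` are hypothetical helpers rather than Allyin APIs, the set lists only the extensions named in the table (the library may accept others, such as `.jpeg`), and the case-insensitive match is an illustrative choice.

```python
from pathlib import Path

# Extensions copied from the table above; this list is an assumption and
# may not match exactly what the library accepts.
SUPPORTED_SUFFIXES = {
    ".pdf", ".docx", ".xlsx", ".pptx",
    ".png", ".jpg", ".html", ".mp3", ".wav",
}


def is_supported(path):
    """Return True if the file's final extension appears in the table."""
    # Path.suffix yields only the last extension, e.g. ".gz" for "a.tar.gz".
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES
```

For example, `is_supported("report.PDF")` is true under this sketch, while `is_supported("notes.txt")` is false.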

### Example

```python
from allyin.multimodal2text import extract_text

result = extract_text("/path/to/your/file.pdf")
print(result["text"])
```
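
When processing several files, it can be useful to collect failures instead of stopping at the first one. The sketch below assumes only what the example above shows, namely that `extract_text` takes a path and returns a mapping with a `"text"` key. `extract_many` is a hypothetical helper, and because this README does not say which exceptions `extract_text` raises, it catches `Exception` broadly.

```python
from typing import Callable, Dict, Iterable, Tuple


def extract_many(
    paths: Iterable,
    extractor: Callable[[str], dict],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run `extractor` on each path and return (texts, errors) keyed by path."""
    texts: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for path in paths:
        key = str(path)
        try:
            # Assumes the extractor returns a mapping with a "text" key.
            texts[key] = extractor(key)["text"]
        except Exception as exc:  # broad on purpose: exception types are unknown
            errors[key] = f"{type(exc).__name__}: {exc}"
    return texts, errors


# Usage with Allyin installed (the "docs" directory is a placeholder):
#   from pathlib import Path
#   from allyin.multimodal2text import extract_text
#   texts, errors = extract_many(Path("docs").glob("*.pdf"), extract_text)
```

Passing the extractor as an argument keeps the helper independent of the import, which also makes it easy to exercise with a stub function.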

## License

MIT
