Metadata-Version: 2.1
Name: bisindo-trans
Version: 1.2.0
Summary: BISINDO to Indonesian translator: bigram N-gram & GPT-2 | BLEU 27.90 | v1.2.0: bug fixes + ablation study + full benchmark
Home-page: https://github.com/AldiAlfatih/bisindo-trans
Author: Muhammad Aldi Alfatih
Author-email: Muhammad Aldi Alfatih <aldialfatih016@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/AldiAlfatih/bisindo-trans
Project-URL: Repository, https://github.com/AldiAlfatih/bisindo-trans
Project-URL: Changelog, https://github.com/AldiAlfatih/bisindo-trans/blob/main/CHANGELOG.md
Project-URL: BugTracker, https://github.com/AldiAlfatih/bisindo-trans/issues
Keywords: sign-language,translation,nlp,bisindo,indonesian,n-gram,gpt2,beam-search,nucleus-sampling
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: pandas>=1.5.0
Requires-Dist: nltk>=3.8.0
Requires-Dist: openpyxl>=3.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Provides-Extra: neural
Requires-Dist: torch>=2.0.0; extra == "neural"
Requires-Dist: transformers>=4.30.0; extra == "neural"

# BisindoTrans

**Hybrid Sign Language Translation System for Indonesian**

![Python](https://img.shields.io/badge/Python-3.9%2B-blue?logo=python&logoColor=white)
![License](https://img.shields.io/badge/License-MIT-green)
![Status](https://img.shields.io/badge/Status-Active-brightgreen)
![NLP](https://img.shields.io/badge/NLP-NLTK%20%7C%20HuggingFace-orange)

---

## About the Project

BisindoTrans is a system that translates Indonesian Sign Language (BISINDO) into natural Indonesian text. It was developed as part of an undergraduate (S1) thesis in **Computer Science**.

The project explores a **hybrid** approach that combines:
- **Statistical Language Model** (bigram N-gram) for efficiency and precision
- **Neural Language Model** (GPT-2 Indonesian) for flexible generation
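
To make the statistical side concrete, here is a minimal sketch of a bigram language model with add-one (Laplace) smoothing. It is an illustration only: the function names `train_bigram` and `bigram_logprob`, the toy corpus, and the choice of smoothing are assumptions, not the package's actual implementation.

```python
import math
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count bigrams over whitespace-tokenized, lower-cased sentences."""
    vocab, bigrams = set(), defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[prev][word] += 1
    return vocab, bigrams

def bigram_logprob(prev, word, vocab, bigrams):
    """Add-one smoothed log P(word | prev); smoothing choice is an assumption."""
    followers = bigrams.get(prev, Counter())
    return math.log((followers[word] + 1) / (sum(followers.values()) + len(vocab)))

# Toy corpus, made up for illustration.
vocab, bigrams = train_bigram(["saya makan nasi", "saya minum air", "kamu makan roti"])
seen = bigram_logprob("saya", "makan", vocab, bigrams)
unseen = bigram_logprob("saya", "air", vocab, bigrams)
print(seen > unseen)  # True: an observed bigram scores higher than an unseen one
```

A model of this kind only needs count lookups at inference time, which is consistent with the low latency reported for the N-gram configuration.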

The system also compares two decoding strategies:
- **Beam Search**: deterministic, consistent
- **Nucleus Sampling**: stochastic, varied
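
The difference between the two strategies can be sketched on a toy next-token table. The probabilities in `NEXT`, the function names, and the default parameters below are made up for illustration and do not reflect the package's `decoding/strategies.py`.

```python
import math
import random

# Toy next-token probabilities (made-up numbers for illustration only).
NEXT = {
    "<s>": {"selamat": 0.9, "pagi": 0.1},
    "selamat": {"pagi": 0.6, "malam": 0.3, "</s>": 0.1},
    "pagi": {"</s>": 1.0},
    "malam": {"</s>": 1.0},
}
SPECIAL = ("<s>", "</s>")

def beam_search(beam_width=3, max_len=5):
    """Keep the beam_width highest-scoring partial sequences at each step."""
    beams = [(0.0, ["<s>"])]
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == "</s>":           # finished hypotheses are carried over
                candidates.append((score, seq))
                continue
            for tok, p in NEXT[seq[-1]].items():
                candidates.append((score + math.log(p), seq + [tok]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return [t for t in beams[0][1] if t not in SPECIAL]

def nucleus_sample(top_p=0.5, max_len=5, rng=None):
    """Sample each token from the smallest top-ranked set with mass >= top_p."""
    rng = rng or random.Random()
    seq = ["<s>"]
    while seq[-1] != "</s>" and len(seq) <= max_len:
        ranked = sorted(NEXT[seq[-1]].items(), key=lambda kv: kv[1], reverse=True)
        nucleus, mass = [], 0.0
        for tok, p in ranked:
            nucleus.append((tok, p))
            mass += p
            if mass >= top_p:
                break
        toks, weights = zip(*nucleus)
        seq.append(rng.choices(toks, weights=weights)[0])
    return [t for t in seq if t not in SPECIAL]

print(beam_search())               # ['selamat', 'pagi']
print(nucleus_sample(top_p=0.95))  # varies from run to run
```

Beam search returns the same output on every call, while nucleus sampling can return different outputs whenever the nucleus contains more than one token.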

Evaluation uses standard NLP metrics:
- **BLEU Score**: n-gram precision
- **chrF Score**: character-level F-score
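
As a rough illustration of what these metrics measure, the sketch below computes clipped n-gram precision (the building block of BLEU) and a simplified chrF. It is not the project's evaluation code: full BLEU also combines several n-gram orders with a geometric mean and a brevity penalty, and production chrF implementations differ in details such as whitespace handling. The function names are assumptions.

```python
from collections import Counter

def ngrams(items, n):
    """Multiset of n-grams over a list of tokens or a string of characters."""
    return Counter(tuple(items[i:i + n]) for i in range(len(items) - n + 1))

def clipped_precision(hypothesis, reference, n):
    """Share of hypothesis word n-grams found in the reference, with clipping."""
    hyp, ref = ngrams(hypothesis.split(), n), ngrams(reference.split(), n)
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matched / total

def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF: mean character n-gram precision/recall, then F-beta."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
        if not hyp or not ref:
            continue
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p, r = sum(precisions) / len(precisions), sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

print(clipped_precision("saya makan nasi", "saya makan nasi goreng", 2))  # 1.0
print(simple_chrf("saya makan nasi", "saya makan nasi"))                  # 1.0
```

Because chrF works on characters, it gives partial credit for near-matches such as inflected or misspelled words, which word-level BLEU scores as complete misses.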

---

## Installation

```bash
# Clone the repository
git clone https://github.com/AldiAlfatih/bisindo-trans.git
cd bisindo-trans

# Install the package (development mode)
pip install -e .

# With Neural LM (GPT-2) support
pip install -e ".[neural]"
```

**Requirements:**
- Python 3.9+
- pandas, nltk, openpyxl
- torch, transformers (optional, for neural mode)

---

## Quick Start

```python
from bisindotrans import Translator

# Initialize with the N-gram model
translator = Translator(model_type="ngram")

# Translate a gloss sequence into natural language
hasil = translator.translate("SAYA MAKAN NASI", method="beam")
print(hasil)  # Output: "Saya makan nasi."

# With a fingerspelled name
hasil = translator.translate("NAMA SAYA M U H A M M A D A L D I", method="beam")
print(hasil)  # Output: "Nama saya Muhammad Aldi."

# Compare decoding methods
beam_result = translator.translate("SELAMAT PAGI", method="beam")
nucleus_result = translator.translate("SELAMAT PAGI", method="nucleus")
```

---

## Key Features

| Feature | Description |
|---------|-------------|
| **Hybrid Model** | Choose between the statistical N-gram model and the neural GPT-2 model as needed |
| **Dual Decoding** | Beam Search (high precision) and Nucleus Sampling (varied output) |
| **Smart NER** | Automatic detection of personal names from fingerspelling |
| **Auto-Capitalization** | Automatic capitalization of names and sentence starts |
| **Model Persistence** | Save and load models for fast startup (~10 seconds to load vs ~5 minutes to train) |
| **Anti-Hallucination** | Built-in filter that prevents irrelevant output from the Neural LM |
| **Preprocessing Pipeline** | Gloss normalization, fingerspelling merging, phrase correction |
| **Batch Translation** | Translate many sentences at once |
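
The fingerspelling and capitalization features can be pictured with the sketch below. It merges runs of single-letter tokens, splits the run against a small name lexicon, and capitalizes names and the sentence start. The lexicon `KNOWN_NAMES`, the greedy longest-match rule, and the function names are hypothetical; the package's own name detection may work differently.

```python
# Hypothetical name lexicon; the real package may detect names differently.
KNOWN_NAMES = {"MUHAMMAD", "ALDI"}

def segment_names(run):
    """Greedy longest-match split of a merged letter run into known names."""
    names, i = [], 0
    while i < len(run):
        for j in range(len(run), i, -1):
            if run[i:j] in KNOWN_NAMES:
                names.append(run[i:j].capitalize())
                i = j
                break
        else:
            # No known name starts here: keep the remainder as a single name.
            names.append(run[i:].capitalize())
            break
    return names

def naturalize(gloss):
    """Merge fingerspelled letters, lower-case other words, format a sentence."""
    words, letters = [], []
    for token in gloss.split() + [""]:      # empty sentinel flushes a trailing run
        if len(token) == 1 and token.isalpha():
            letters.append(token)
            continue
        if letters:
            words.extend(segment_names("".join(letters)))
            letters = []
        if token:
            words.append(token.lower())
    if not words:
        return ""
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."

print(naturalize("NAMA SAYA M U H A M M A D A L D I"))  # Nama saya Muhammad Aldi.
print(naturalize("SAYA MAKAN NASI"))                    # Saya makan nasi.
```

Note that a rule this simple treats every single-letter token as fingerspelling, so it would need extra handling for glosses that are legitimately one letter long.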

---

## Benchmark

### Model Comparison

| Model | Decoding | BLEU ↑ | chrF ↑ | Latency ↓ |
|-------|----------|--------|--------|----------|
| **N-gram** | **Beam Search** | **17.12** | **58.80** | **4.89 ms** |
| N-gram | Nucleus Sampling | 5.09 | 42.52 | 1.86 ms |
| Neural (GPT-2) | Beam Search | 12.07 | 57.89 | 270.06 ms |
| Neural (GPT-2) | Nucleus Sampling | 11.73 | 57.90 | 162.07 ms |

> 🏆 **Best overall:** N-gram + Beam Search (BLEU 17.12, chrF 58.80, latency 4.89 ms)  
> ⚡ **Best speed:** N-gram, ~55× faster than the Neural model with Beam Search (4.89 ms vs 270.06 ms)

> **Test Coverage:** 3,460 exhaustive scenarios × 4 configurations = **13,840 total inferences**
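
The speedup and coverage figures follow directly from the table. The short check below reproduces them from the reported latencies; the dictionary layout and helper name are just for illustration.

```python
# Latencies in milliseconds, copied from the comparison table above.
latency_ms = {
    ("ngram", "beam"): 4.89,
    ("ngram", "nucleus"): 1.86,
    ("neural", "beam"): 270.06,
    ("neural", "nucleus"): 162.07,
}

def speedup(method):
    """How many times faster the N-gram model is for a given decoding method."""
    return latency_ms[("neural", method)] / latency_ms[("ngram", method)]

for method in ("beam", "nucleus"):
    print(f"{method}: N-gram is {speedup(method):.1f}x faster")
# beam: N-gram is 55.2x faster
# nucleus: N-gram is 87.1x faster

total_inferences = 3460 * len(latency_ms)
print(total_inferences)  # 13840
```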

### Test Configuration

| Parameter | N-gram | Neural |
|-----------|--------|--------|
| Corpus Size | ~19.5 million words | Pre-trained |
| Vocab Size | ~1.47 million | 50,257 tokens |
| Beam Width | 3 | 5 |
| Top-p (Nucleus) | 0.5 | 0.7 |
| Temperature | n/a | 0.6 |
| Gloss Labels | 73 classes | 73 classes |
| Test Scenarios | 3,460 (exhaustive) | 3,460 (exhaustive) |

---

## Package Structure

```
bisindotrans/
├── __init__.py              # Public API
├── translator.py            # Main Translator class
├── preprocessing/
│   ├── normalisasi.py       # Cleaning & deduplication
│   └── naturalisasi.py      # NER & phrase mapping
├── models/
│   ├── ngram.py             # Statistical bigram model
│   └── neural.py            # GPT-2 wrapper
├── decoding/
│   └── strategies.py        # Beam Search & Nucleus Sampling
└── utils/
    └── postprocessing.py    # Output formatting
```

---

## Advanced Usage

### Switching Models

```python
from bisindotrans import Translator

# N-gram mode (fast, offline)
t = Translator(model_type="ngram")

# Neural mode (needs a GPU or a powerful CPU)
t = Translator(model_type="neural")
```

### Custom Model Path

```python
from bisindotrans import Translator

# Load a model from a custom location
t = Translator(model_type="ngram", model_path="path/to/custom_model.pkl")
```
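
The `.pkl` extension suggests the model is stored with Python's `pickle` module, although the package's actual on-disk format is not documented here. Under that assumption, saving and reloading a trained model could look like the sketch below; `save_model` and `load_model` are hypothetical helper names, not part of the package API.

```python
import pickle
from pathlib import Path

def save_model(model, path):
    """Serialize a trained model object to disk (assumes it is picklable)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(model, fh, protocol=pickle.HIGHEST_PROTOCOL)

def load_model(path):
    """Load a previously saved model. Only unpickle files you trust."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
```

Loading a pickled object skips retraining, which is the idea behind the Model Persistence feature. Unpickling can execute arbitrary code, so model files from untrusted sources should not be loaded this way.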

### Batch Processing

```python
from bisindotrans import Translator
translator = Translator(model_type="ngram")
glosses = ["SELAMAT PAGI", "TERIMA KASIH", "SAMPAI JUMPA"]
results = translator.translate_batch(glosses, method="beam")
```
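
To relate batch runs to the latency column in the benchmark, one option is to time a batch and report the mean time per sentence. The sketch below assumes nothing about the package internals: `translate_batch_timed` is a hypothetical helper, and `fake_translate` is a stand-in so the example runs without the package installed.

```python
import time

def translate_batch_timed(translate, glosses, **kwargs):
    """Translate each gloss and return (results, mean latency in ms)."""
    results = []
    start = time.perf_counter()
    for gloss in glosses:
        results.append(translate(gloss, **kwargs))
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    mean_ms = elapsed_ms / len(glosses) if glosses else 0.0
    return results, mean_ms

# Stand-in translator, used only so this sketch is runnable on its own.
def fake_translate(gloss, method="beam"):
    return gloss.capitalize() + "."

results, mean_ms = translate_batch_timed(
    fake_translate, ["SELAMAT PAGI", "TERIMA KASIH"], method="beam"
)
print(results)  # ['Selamat pagi.', 'Terima kasih.']
print(f"{mean_ms:.3f} ms per sentence")
```

With the real package, `translator.translate` could be passed in place of `fake_translate`.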

---

## References

- Holtzman, A., et al. (2019). *The Curious Case of Neural Text Degeneration*
- Radford, A., et al. (2019). *Language Models are Unsupervised Multitask Learners*
- Papineni, K., et al. (2002). *BLEU: a Method for Automatic Evaluation of Machine Translation*

---

## Author

**Muhammad Aldi Alfatih**  
📧 aldialfatih016@gmail.com  
*Undergraduate (S1) thesis in Computer Science, 2026*

---

## License

MIT License. Feel free to use this project for academic and development purposes.

---

## Acknowledgments

This project was developed with the support of the following open-source libraries, available through the [Python Package Index (PyPI)](https://pypi.org/):

| Library | Version | Purpose |
|---------|---------|---------|
| [`pandas`](https://pypi.org/project/pandas/) | ≥1.5.0 | Data manipulation & evaluation |
| [`nltk`](https://pypi.org/project/nltk/) | ≥3.8.0 | BLEU score & tokenization |
| [`openpyxl`](https://pypi.org/project/openpyxl/) | ≥3.0.0 | Exporting results to Excel |
| [`torch`](https://pypi.org/project/torch/) | ≥2.0.0 | Backend for the Neural LM |
| [`transformers`](https://pypi.org/project/transformers/) | ≥4.30.0 | GPT-2 Indonesian model |

The Neural model uses the pre-trained [`cahya/gpt2-small-indonesian-522M`](https://huggingface.co/cahya/gpt2-small-indonesian-522M) from the HuggingFace Hub.

> **Published on PyPI:** [pypi.org/project/bisindo-trans](https://pypi.org/project/bisindo-trans/)  
> Thanks to the Indonesian Python community and all open-source contributors who made this project possible.
