Metadata-Version: 2.1
Name: RoManTools
Version: 0.1.0b2
Summary: Tools for processing and converting Romanized Mandarin text
Author-email: Jeff Heller <jsheller@princeton.edu>
License: GPL-3.0 license
Project-URL: Homepage, https://github.com/JHGFD82/RoManTools
Project-URL: Documentation, https://github.com/JHGFD82/RoManTools/docs
Project-URL: Source, https://github.com/JHGFD82/RoManTools
Project-URL: Tracker, https://github.com/JHGFD82/RoManTools/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy

# RoManTools - Romanized Mandarin Tools

![Coverage](https://img.shields.io/badge/coverage-100%25-brightgreen)
![Python](https://img.shields.io/badge/python-3.9%2B-blue)
![Flake8](https://img.shields.io/badge/code%20style-flake8-brightgreen)
![Pylint](https://img.shields.io/badge/pylint-10.0%2F10-brightgreen)

This package comprises a set of tools designed to facilitate the handling of romanized Mandarin text. It is currently under active development by Jeff Heller, Digital Project Specialist for the Department of East Asian Studies at Princeton University. This is a beta release, open for testing and forking.

## Features Planned for Version 1.0

Version 1.0 of this project will include the following features:

- **Conversion between Romanization Standards**: Support for converting between Pinyin and Wade-Giles (with Yale and additional standards to be added in future versions).
- **Cherry Pick**: Converts only identified romanized Chinese terms, excluding any English words or those in a stopword list.
- **Text Segmentation**: Segments text into meaningful chunks. Other features rely on this internally, and it is also available for direct use.
- **Syllable Count**: Counts the number of syllables per word and reports the list to the user.
- **Method Detect**: Identifies the romanization standard used in the input text and returns either a single standard or, when more than one matches, a list of standards.
- **Validator**: Basic validation of supplied text.
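
To give a rough sense of how segmentation, syllable counting, conversion, and Cherry Pick fit together, here is a minimal, self-contained sketch. It is not the RoManTools API: the names `PINYIN_TO_WG`, `STOPWORDS`, `segment`, and `cherry_pick` are hypothetical, the syllable table covers only a handful of entries, and the greedy longest-match segmentation is a simplifying assumption that would not resolve every ambiguity in real Pinyin.

```python
# Illustrative only: a tiny, hand-picked Pinyin -> Wade-Giles table.
# A real converter needs the complete syllable inventory.
PINYIN_TO_WG = {
    "bei": "pei",
    "jing": "ching",
    "zhong": "chung",
    "guo": "kuo",
    "tang": "t'ang",
}

# Hypothetical stopword list: tokens that are never converted.
STOPWORDS = {"a", "the", "in", "is", "of", "to"}


def segment(word):
    """Split a word into known syllables by greedy longest match.

    Returns a list of lowercase syllables, or None if the word
    cannot be fully covered by syllables in the table.
    """
    lowered = word.lower()
    syllables = []
    pos = 0
    while pos < len(lowered):
        # Try the longest candidate first, then shrink.
        for end in range(len(lowered), pos, -1):
            if lowered[pos:end] in PINYIN_TO_WG:
                syllables.append(lowered[pos:end])
                pos = end
                break
        else:
            return None  # no known syllable starts at this position
    return syllables


def cherry_pick(text):
    """Convert only tokens that segment cleanly; leave everything else as-is."""
    out = []
    for token in text.split():
        syllables = None if token.lower() in STOPWORDS else segment(token)
        if syllables is None:
            out.append(token)  # English word, stopword, or unknown token
            continue
        converted = "-".join(PINYIN_TO_WG[s] for s in syllables)
        out.append(converted.capitalize() if token[0].isupper() else converted)
    return " ".join(out)


print(cherry_pick("Beijing is the capital of Zhongguo"))
print(len(segment("Zhongguo")))  # syllable count for one word
```

In this sketch, tokens with attached punctuation (for example `Beijing,`) fail segmentation and pass through unchanged; a fuller implementation would separate punctuation before matching.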

## Prerequisites

Python packages:

* numpy

These dependencies are installed automatically along with RoManTools. The package has been tested with Python 3.9+.

## Installation

The easiest way to install RoManTools is with pip:

``pip install RoManTools``

## Execution

RoManTools can be executed in three ways.

* From the command line using the installed command:

```bash
RoManTools [action] -i [input]
```

* From the command line by running `python` on `main.py` (after navigating to the directory where RoManTools is installed):

```bash
python main.py
```

* From within the Python console or scripts via import:

```python
from RoManTools import *
```

Command-line usage is documented in [CLI.md](docs/CLI.md), and usage from within Python in [Python.md](docs/Python.md). Other documentation covers the requirements for input text and the package's methodology for text analysis.

## Possible Future Goals (suggestions welcome!)

* **Feedback**: Provide meaningful and specific error messages for incorrect syntax (e.g., `missing or invalid final: chy`, `extraneous characters: "(2)"`, `Xui is not a valid Wade-Giles syllable.`).
* **IPA Pronunciation**: Convert between romanized text and the International Phonetic Alphabet.
* **Tone Marking Conversion**: Convert between tone marking systems (numerical and IPA).
* **Audio Pronunciation**: Produce audio recordings of input text.
* **Flashcards/Quizzes**: Gamification of text input and pronunciation.
* To submit suggestions for future updates, contact the main developer, Jeff Heller, via [GitHub issues](https://github.com/JHGFD82/RoManTools/issues) or via [e-mail](mailto:jh43@princeton.edu).

## Origin

This project originated as the `syllable_count` function developed for use with the Tang History Database, led by Professor Anna Shields of the Department of East Asian Studies at Princeton University. The objective was to validate user input of romanized Mandarin, facilitating the incorporation of data from Harvard University's Chinese Biographical Database (CBDB). By analyzing the syllable structure of romanized Mandarin strings and comparing them to corresponding Chinese characters, the function initially focused on validating the entry of Tang dynasty figures' names. As the project evolved, it expanded to include robust error handling, detection of both Pinyin and Wade-Giles romanization systems, and cross-system translation, even within mixed English text. The motivation to release this tool as a publicly available package stems from the need for a fast, efficient solution to validate romanized Mandarin text, promoting consistency in future datasets and ensuring flawless adherence to romanization standards.
