Metadata-Version: 2.1
Name: SimboloSiamese
Version: 0.2.7
Author: Phyo Thu Htet, Min Thiha Htun, Ye Bhone Lin
Description-Content-Type: text/markdown
License-File: LICENSE.txt

### Here's an extended version of your text with additional information about Burmese-to-Romanization:

Provides a tool for syllable-based tokenization of Burmese text. It breaks down Burmese text into individual syllables, facilitating language processing tasks such as text analysis, machine learning, and natural language processing (NLP) for Burmese.

### Features

Syllable Tokenization: Tokenizes Burmese text into syllables based on Unicode rules. It helps in language segmentation and provides a clear framework for analyzing Burmese sentences in a structured manner.

Efficient Processing: Designed to handle large text efficiently with minimal memory overhead, making it scalable for tasks involving big data or large-scale text analysis.

Burmese Unicode Support: Fully supports Burmese script and syllable rules as defined by the Burmese Unicode standard, ensuring that the tokenization aligns with native Burmese text structure.

Burmese-to-Romanization: Converts Burmese script into its Romanized equivalent, facilitating pronunciation guidance and helping non-native speakers understand Burmese text. The Romanization process follows the standard linguistic rules for Burmese phonetic transcription, offering a bridge for users unfamiliar with the Burmese script to read, pronounce, and comprehend the language. This feature can be particularly useful for language learners, cross-lingual applications, and linguistic studies that require Romanized Burmese text.

Romanization-to-Burmese: Converts Romanized Burmese back into its native script, enabling smooth transitions between phonetic transcriptions and written Burmese. This feature ensures accuracy by adhering to linguistic rules and Unicode standards for Burmese, making it valuable for transliteration tasks, text restoration, and multilingual systems.

### About Swan Arr Electronics' Mapping

The Swan Arr Electronics' Mapping serves as the foundational framework for the conversion processes between Burmese script and Romanized text. This mapping provides a systematic approach to transliteration by adhering to established phonetic and linguistic rules specific to the Burmese language.

#### Key Features of Swan Arr Electronics' Mapping:

Phonetic Accuracy: The mapping ensures that Romanized outputs represent the actual pronunciation of Burmese words, making it easier for non-native speakers to read and understand.

Linguistic Consistency: It follows standardized linguistic guidelines for Burmese phonetics, ensuring uniformity in conversions.

Bidirectional Support: The mapping is capable of converting Burmese script to Romanized text and vice versa without losing contextual meaning or linguistic accuracy.

#### Integration in This Toolkit

The Burmese-to-Romanization and Romanization-to-Burmese features in this library are directly powered by the Swan Arr Electronics' Mapping. This ensures:

Accurate Romanized outputs for Burmese script.

Seamless re-conversion of Romanized text back into its original Burmese form.

### How to use (Getting Started)

```
# Install the SimboloSiamese package using pip
# pip install SimboloSiamese

# Import the BurmeseConverter from the Siamese module
from Siamese import BurmeseConverter

converter = BurmeseConverter()

# Example: Zawgyi to Unicode
zawgyi_text = "ဖြွှော်"
try:
    # Convert Zawgyi text to Unicode
    unicode_output = converter.zawgyi_to_unicode(zawgyi_text)
    # Print the Unicode output
    print("Unicode Output:", unicode_output)
except Exception as e:
    # Handle any errors that occur during conversion
    print(f"Error in Zawgyi to Unicode conversion: {e}")

# Example: Tokenization of a Burmese word
tokenization_text = "တက္ကသိုလ်"
try:
    # Tokenize the Burmese word. 1 means With the virama mark. If you dont want to tokenize the virama mark, you can type any numbers except 1
    tokenized_output = converter.syllable_tokenization(1, tokenization_text) # try with process_text in case it cannot work with syllable_tokenization
    print("Tokenized Output:", tokenized_output)
except Exception as e:
    # Handle any errors that occur during tokenization
    print(f"Cannot Tokenize the word: {e}")

# Example: Convert Burmese text to Romanized script
burmese_text = "ကော်"
try:
    # Convert Burmese text to Romanized script
    romanized_output = converter.burmese_to_romanization(burmese_text)
    # Print the Romanized output
    print("Romanized Output:", romanized_output)
except Exception as e:
    # Handle any errors that occur during Romanization
    print(f"Error in Burmese Romanization: {e}")

# Example: Romanization Burmese
burmese_text = "le kReAc: liuc:, K rI: sq a mHt, ၂ ၂ ၈ ၃, jQ, SeAF piu liu mRiu., lU ne rp kWk peAF jiu., pYk kY KL. pRI:, liuk pA lA jU, ၆ ၂, OO: s luN:, je SuN: KL. jQ // "
try:
    burmese_output = converter.romanization_to_burmese(burmese_text)
    print("Burmese Output:", burmese_output)
except Exception as e:
    print(f"Error in Romanization Burmese: {e}")

```

Syallable: Author: Phyo Thu Htet
Zawgyi to Unicode and Unicode to Zawgyi: Author Min Thiha Htun, Supervisor: Phyo Thu Htet, Other Contributor: Ye Bhone Lin
Romanized: Author: Ye Bhone Lin, Supervisor: Phyo Thu Htet
