Metadata-Version: 2.1
Name: akkadian
Version: 1.0.6
Summary: Translating Akkadian signs to transliteration using NLP algorithms
Home-page: https://github.com/gaigutherz/Translating-Akkadian-using-NLP
Author: Ariel Elazary, Gai Gutherz
Author-email: am.elazary@gmail.com, gaigutherz@gmail.com
License: UNKNOWN
Description: # Translating-Akkadian-using-NLP
        Translating Akkadian signs to transliteration using NLP algorithms such as HMM, MEMM and BiLSTM neural networks.
        
        ## Getting Started
        There are 3 main ways to deploy the project:
        * Website
        * Python package
        * Github clone
        
        ## Website
        Use this link to access the website: https://babylonian.herokuapp.com/#/
        
        Go to "Akkademia" tab and enter signs to see them transliterated.
        
        ## Python Package
        These instructions will enable you to use the project on your local machine for transliterating using "akkadian" python package that is based on our project.
        
        ### Prerequisites
        Install Python 3.6 or 3.7 - Link for example (version 3.7.1): https://www.python.org/downloads/release/python-371/.
        
        ### Installing
        Install akkadian package (may takes a while).
        One way to do so is using pip:
        ```
        pip install akkadian
        ```
        
        ### Running
        Following are a few examples for running sessions.
        
        Transliterating akkadian signs:
        ```
        import akkadian.transliterate as akk
        print(akk.transliterate("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"))
        ```
        
        Transliterating akkadian signs using BiLSTM:
        ```
        import akkadian.transliterate as akk
        print(akk.transliterate_bilstm("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"))
        ```
        
        Top three options of transliterating akkadian signs using BiLSTM:
        ```
        import akkadian.transliterate as akk
        print(akk.transliterate_bilstm_top3("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"))
        ```
        
        Transliterating akkadian signs using MEMM:
        ```
        import akkadian.transliterate as akk
        print(akk.transliterate_memm("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"))
        ```
        
        Transliterating akkadian signs using HMM:
        ```
        import akkadian.transliterate as akk
        print(akk.transliterate_hmm("𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"))
        ```
        
        ## Github
        These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
        
        ### Prerequisites
        Install Python 3.6 or 3.7 - Link for example (version 3.7.1): https://www.python.org/downloads/release/python-371/.
        
        If you don't have git installed, install git - https://git-scm.com/downloads (Choose the appropriate operating system).
        
        If you don't have a Github user, create one - https://github.com/join?source=header-home.
        
        ### Installing the python dependencies
        
        Install torch:
        Windows - 
        ```
        pip install torch===1.3.1 torchvision===0.4.2 -f https://download.pytorch.org/whl/torch_stable.html
        ```
        
        Linux and MAC - 
        ```
        pip install torch torchvision
        ```
        
        Install allennlp:
        ```
        pip install allennlp==0.8.5
        ```
        
        ### Cloning the project
        
        Clone the project:
        ```
        git clone https://github.com/gaigutherz/Translating-Akkadian-using-NLP.git
        ```
        
        ### Running
        Now you can develop for the Translating-Akkadian-using-NLP repository and and your improvements!
        
        #### Training
        Use the file train.py in order to train the models using the datasets. There is a function for each model that trains, stores the pickle and tests its performance on a specific corpora.
        
        The functions are as follows:
        ```
        hmm_train_and_test(corpora)
        memm_train_and_test(corpora)
        biLSTM_train_and_test(corpora)
        ```
        
        #### Transliterating
        Use the file transliterate.py in order to transliterate using the models. There is a function for each model that gets a sentence of Akkadian signs as parameter and returns its transliteration.
        
        Example of usage:
        ```
        akkadian_signs = "𒁹𒀭𒌍𒋀𒈨𒌍𒌷𒁀"
        print(transliterate(akkadian_signs))
        print(transliterate_bilstm(akkadian_signs))
        print(transliterate_bilstm_top3(akkadian_signs))
        print(transliterate_hmm(akkadian_signs))
        print(transliterate_memm(akkadian_signs))
        ```
        
        ## Datasets
        The main datasets used for training and tests are:
        
        | Dataset                                                        | King   | Time    | Line Number   | Percentage of Corpora   |
        |-----------------------------------------------------------------|--------------------|-----------|------------|---------------|
        | RINAP 1                 | Tiglath-pileser III and Shalmaneser V      | 744-722 BC         | 1125  | 4.78% |
        | RINAP 3               | Sennacherib       | 704-681 BC         | 7131  | 30.31% |
        | RINAP 4               | Esarhaddon   | 680-669 BC | 6018  | 25.58%  |
        | RINAP 5               | Ashurbanipal and Successors | 668-612 BC  | 9252  | 39.33%  |
        
        More datasets used:
        		
        * **RIAO** - This project intends to present annotated editions of the entire corpus of Assyrian royal inscriptions, texts that were published in RIMA 1-3.
        		
        * **RIBO** - This project intends to present annotated editions of the entire corpus of Babylonian royal inscriptions from the Second Dynasty of Isin to the Neo-Babylonian Dynasty (1157-539 BC).
        		
        * **SAAO** - The online counterpart to the State Archives of Assyria series.
        		
        * **SUHU** - This project presents annotated editions of the officially commissioned texts of the extant, first-millennium-BC inscriptions of the rulers of Suhu, texts published in Frame, RIMB 2 pp. 275-331.
        		
        * **TEI** - Databases used for full translation.
        
        ### Datasets deployment
        
        The datasets are taken from ORACC project and can be downloaded from the following link: http://oracc.museum.upenn.edu/rinap/rinapdownloads/index.html.
        
        In our repository the datasets are located in the "raw_data" directory. They can be also downloaded from the Github repository using git clone or zip download.
        
        ## Project structure
        
        **BiLSTM_input**: 
        
        	Contains  dictionaries used for transliteration by BiLSTM.
        	
        **NMT_input**:
        
        	Contains dictionaries used for natural machine translation.
        	
        **akkadian.egg-info**:
        
        	Inforamtion  and settings for akkadian python package.
        	
        **akkadian**:
        
        	Sources and train's output.
        	
        	output:	Train's output for HMM, MEMM and BiLSTM - mostly pickles.
        		
        	__init__.py: Init script for akkadian python package. Initializes global variables.
        	
        	bilstm.py:  Class for BiLSTM train and prediction using AllenNLP implementation.
        	
        	build_data.py: Code for organizing the data in dictionaries.
        	
        	check_translation.py: Code for translation accuracy checking.
        	
        	combine_algorithms.py: Code for prediction using both HMM, MEMM and BiLSTM.
        	
        	data.py: Utils for accuracy checks and dictionaries interpretations.
        	
        	full_translation_build_data.py: Code for organizing the data for full translation task.
        	
        	get_texts_details.py: Util for getting more information about the text.
        	
        	hmm.py: Implementation of HMM for train and prediction.
        	
        	memm.py: Implementation of MEMM for train and prediction.
        	
        	parse_json: Json parsing used for data organizing.
        	
        	parse_xml.py: XML parsing used for data organizing.
        	
        	train.py: API for training all 3 algorithms and store the output.
        	
        	translation_tokenize.py: Code for tokenization for translation task.
        	
        	transliterate.py: API for transliterating using all 3 algorithms.
        	
        **build/lib/akkadian**:
        
        	Inforamtion  and settings for akkadian python package.
        	
        **dist**:
        
        	Akkadian python package - wheel and tar.
        	
        **raw_data**:
        
        	Databases used for  training the models.
        	
        	random: 4 Texts used for cross era testing.
        		
        	riao: This project intends to present annotated editions of the entire corpus of Assyrian royal inscriptions, texts that were published in RIMA 1-3.
        		
        	ribo: This project intends to present annotated editions of the entire corpus of Babylonian royal inscriptions from the Second Dynasty of Isin to the Neo-Babylonian Dynasty (1157-539 BC).
        		
        	rinap: Presents fully searchable, annotated editions of the royal inscriptions of Neo-Assyrian kings Tiglath-pileser III (744-727 BC), Shalmaneser V (726-722 BC), Sennacherib (704-681 BC), Esarhaddon (680-669 BC), Ashurbanipal (668-631 BC), Aššur-etel-ilāni (630-627 BC), and Sîn-šarra-iškun (626-612 BC).
        		
        	saao: The online counterpart to the State Archives of Assyria series.
        		
        	suhu: This project presents annotated editions of the officially commissioned texts of the extant, first-millennium-BC inscriptions of the rulers of Suhu, texts published in Frame, RIMB 2 pp. 275-331.
        		
        	tei: Databases used for full translation.
        		
        
        ### Authors
        * Gai Gutherz
        
        * Ariel Elazary
        
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
