Metadata-Version: 2.1
Name: automata-tools
Version: 1.0.9
Summary: Tools to build automata from your custom rule
Home-page: https://github.com/linonetwo/automata-tools
Author: LinOnetwo
Author-email: linonetwo012@gmail.com
License: UNKNOWN
Project-URL: Bug Reports, https://github.com/linonetwo/automata-tools/issues
Project-URL: Source, https://github.com/linonetwo/automata-tools/
Keywords: regex automata nlp regular expression state machine NFA DFA
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: Filters
Classifier: Topic :: Text Processing :: Linguistic
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Typing :: Typed
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Provides-Extra: dev
Provides-Extra: test

# Automata Tools

Tools to build automata from your custom rule.

This package provides a set of handy tools to programmatically build automata, so you can build NFA、DFA、MinimizedDFA、WFA from any custom rules.

## Usage

### Install

```shell
conda install -c conda-forge automata-tools # not available yet
# or
pip install automata-tools
```

### Import

See example in [examples/NFAfromCustomRule.py](examples/NFAfromCustomRule.py)

```python
from typing import List
from automata_tools import BuildAutomata, Automata

automata: List[Automata] = []
```

### BuildAutomata

#### characterStruct

Build simple `(0)-[a]->(1)` automata

```python
automata.append(BuildAutomata.characterStruct(char))
```

#### unionStruct

Build automata that is an "or" of two sub-automata `(1)<-[a]-(0)-[b]->(1)`

```python
# to match "a|b"
a = automata.pop()
b = automata.pop()
if operator == "|":
    automata.append(BuildAutomata.unionStruct(b, a))
```

#### concatenationStruct

Build automata that is an "and" of two sub-automata `(0)-[a]->(1)-[b]->(2)`

```python
# to match "ab"
a = automata.pop()
b = automata.pop()
automata.append(BuildAutomata.concatenationStruct(b, a))
```

#### starStruct

Build automata that looks like the "Kleene closure"

```python
# to match "a*"
if operator == "*":
    a = automata.pop()
    automata.append(BuildAutomata.starStruct(a))
```

#### skipStruct

Build automata that looks like the "Kleene closure" but without the loop back `(1)<-[ε]-(2)`, so it only match the token once at most.

```python
# to match "a*"
if operator == "?":
    a = automata.pop()
    automata.append(BuildAutomata.skipStruct(a))
```

#### repeatRangeStruct

Build automata that will match the same token for several times `(0)-[a]->(1)-[a]->(2)-[a]->(3)`

```python
# to match "a{3}"
repeatedAutomata = BuildAutomata.repeatStruct(automata, 3)
```

#### repeatStruct

Build automata that will match the same token for n to m times

`(0)-[a]->(1)-[a]->(4), (0)-[a]->(2)-[a]->(3)-[a]->(4)`

```python
# to match "a{2,3}"
repeatedAutomata = BuildAutomata.repeatRangeStruct(automata, 2, 3)
```

### Automata

See example in [features/steps/customRule.py](features/steps/customRule.py)

```python
from automata_tools import DFAFromNFA, Automata

from your_implementation import NFAFromRegex, executor

nfa: Automata = NFAFromRegex().buildNFA(rule)
minDFA: Automata = DFAFromNFA(nfa).getMinimizedDFA()
minDFA.setExecuter(executor)

print(minDFA.execute(someText))
```

where `executor` is a function like the one in [examples/NFAfromCustomRule.py](examples/NFAfromCustomRule.py):

```python
def executor(tokens, startState, finalStates, transitions):
    return True
```

#### setExecuter

Set an executor to the automata that can freely use state and transition of the automata, and return a boolean value.

```python
from automata_tools import IAutomataExecutor

defaultExecuter: IAutomataExecutor = lambda tokens, startState, finalStates, transitions: True
minDFA.setExecuter(defaultExecuter)
```

#### setTokenizer

Set an tokenizer to the automata that can transform string to list of string token, which will be used by the executer.

```python
minDFA.setExecuter(lambda input: input.split(' ')[::-1])
```

### DFAFromNFA

Allow you minify Automata state

```python
nfa = NFAFromRegex().buildNFA(rule)
minDFA = DFAFromNFA(nfa).getMinimizedDFA()
```

### Waited Finite Automata

WFA, it can execute automata use matrix multiplication, so it can be very fast compare to brute force execution, especially when state space is large.

```python
from automata_tools import WFA, get_word_to_index

_, wordToIndex = get_word_to_index([ruleParser(context.rule), tokenizer(text)])
wfa = WFA(minDFA, wordToIndex, dfa_to_tensor)
wfa.execute(text)
```

#### get_word_to_index

Given `[['token', 'another'], ['token_in_rule']]`, return something like

```python
{'token': 0, 'another': 1, ...}
```

So we can translate automata state to a matrix.

#### WFA

Given an automata, a word index like `{'token': 0, 'another': 1, ...}`, and a function that transform automata to tensor (see example at [customRuleDFAToTensor](examples/customRuleDFAToTensor.py)), return a WFA instance.

## Development

### Environment

Create environment from the text file:

```shell
conda env create --file automataTools-env.txt
conda activate automataTools
```

Save env file: `conda list --explicit > automataTools-env.txt`

### Python Path

Create a `.env` file with content `PYTHONPATH=automataTools`

### Publish

To pypi

```shell
rm -rf ./dist && python3 setup.py sdist bdist_wheel && twine upload dist/*
```

To Conda

```shell
# I'm learning how to do...
```

## Resources

[Automata Theory Course Slides](http://www.cs.may.ie/staff/jpower/Courses/Previous/parsing/node5.html)

Probably the original reference [source](https://github.com/sdht0/automata-from-regex)


