Metadata-Version: 2.0
Name: FinnSyll
Version: 1.0.0
Summary: Finnish syllabifier and compound segmenter
Home-page: https://github.com/tsnaomi/finnsyll
Author: Naomi Tachikawa Shapiro
Author-email: coder@tsnaomi.net
License: BSD
Keywords: Finnish syllabifier compound segmenter
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python
Requires-Dist: morfessor

## FinnSyll

FinnSyll is a Python library that syllabifies words according to Finnish syllabification principles.
It is also equipped with a Finnish compound splitter. 
More details/docs to come.

### Installation

```$ pip install FinnSyll```

### Basic usage

First, instantiate a ```FinnSyll``` object.

```
>>> from finnsyll import FinnSyll
>>> f = FinnSyll()
```

To syllabify:
```
>>> f.syllabify('runoja')
['ru.no.ja']  # internal syllable boundaries are indicated with '.'
```

To segment compounds:
```
>>> f.split('sosiaalidemokraattien')
'sosiaali=demokraattien'  # internal word boundaries are indicated with '='
```

### Optional arguments

The syllabifier can be customized along two different parameters: variation and compound splitting.  

####variation

Instantiating a ```FinnSyll``` object with ```variation=True``` (default) will allow the syllabifier to return multiple syllabifications if variation is predicted. When ```variation=True```, the syllabifier will return a list. Setting ```variation``` to ```False``` will cause the syllabifier to return a string containing the first predicted syllabification. 

*Variation*:
```
>>> f = FinnSyll(variation=True) 
>>> f.syllabify('runoja')
['ru.no.ja']
>>> f.syllabify('vapaus')
['va.pa.us', 'va.paus']
```

*No variation*:
```
>>> f = FinnSyll(variation=False)
>>> f.syllabify('runoja')
'ru.no.ja'
>>> f.syllabify('vapaus')
'va.pa.us'
```

#### split_compounds

When instantiating a ```FinnSyll``` object with ```split_compounds=True``` (default), the syllabifier will first attempt to split the input into constituent words before syllabifying it. This forces the syllabifier to insert a syllable boundary in between identified constituent words. The syllabifier will skip this step if ```split_compounds``` is set to ```False```.

*Compound splitting*:
```
>>> f = FinnSyll(split_compounds=True) 
>>> f.syllabify('rahoituserien')  # rahoitus=erien
['ra.hoi.tus.e.ri.en']
```

*No compound splitting*:
```
>>> f = FinnSyll(split_compounds=False) 
>>> f.syllabify('rahoituserien')
['ra.hoi.tu.se.ri.en']  # incorrect
```



