Metadata-Version: 2.0
Name: FinnSyll
Version: 1.0.1
Summary: Finnish syllabifier and compound segmenter
Home-page: https://github.com/tsnaomi/finnsyll
Author: Naomi Tachikawa Shapiro
Author-email: coder@tsnaomi.net
License: BSD
Keywords: Finnish syllabifier compound segmenter
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python
Requires-Dist: morfessor

FinnSyll
********

FinnSyll is a Python library that syllabifies words according to Finnish syllabification principles.
It also includes a Finnish compound segmenter.
More details/docs to come.

Installation
============

``$ pip install FinnSyll``

Basic usage
===========

First, instantiate a ``FinnSyll`` object::

  >>> from finnsyll import FinnSyll
  >>> f = FinnSyll()

To syllabify::

  >>> f.syllabify('runoja')
  ['ru.no.ja']  # internal syllable boundaries are indicated with '.'

To segment compounds::

  >>> f.split('sosiaalidemokraattien')
  'sosiaali=demokraattien'  # internal word boundaries are indicated with '='
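
Because both results are plain strings with delimiters, they can be taken apart with ordinary string methods. The sketch below assumes only the output formats shown above (``.`` between syllables, ``=`` between constituent words). The helper names ``syllables`` and ``constituents`` are hypothetical and are not part of FinnSyll.

```python
def syllables(syllabified):
    """Split a syllabified string such as 'ru.no.ja' into its syllables."""
    return syllabified.split('.')


def constituents(segmented):
    """Split a segmented compound such as 'sosiaali=demokraattien' into words."""
    return segmented.split('=')


print(syllables('ru.no.ja'))                   # ['ru', 'no', 'ja']
print(constituents('sosiaali=demokraattien'))  # ['sosiaali', 'demokraattien']
```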

Optional arguments
==================

The syllabifier can be customized with two optional arguments: ``variation`` and ``split_compounds``.

variation
---------

With ``variation=True`` (the default), the syllabifier returns a list of syllabifications. The list contains more than one entry when variation is predicted. With ``variation=False``, the syllabifier returns a single string containing the first predicted syllabification.

*Variation*::

  >>> f = FinnSyll(variation=True)
  >>> f.syllabify('runoja')
  ['ru.no.ja']
  >>> f.syllabify('vapaus')
  ['va.pa.us', 'va.paus']

*No variation*::

  >>> f = FinnSyll(variation=False)
  >>> f.syllabify('runoja')
  'ru.no.ja'
  >>> f.syllabify('vapaus')
  'va.pa.us'
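
Since the return type depends on ``variation`` (a list when ``True``, a string when ``False``), calling code that has to handle both settings may want to normalize the result first. The following is a minimal sketch assuming Python 3 and the return types described above. ``as_list`` is a hypothetical helper and is not part of FinnSyll.

```python
def as_list(result):
    """Return syllabifier output as a list, whether it arrived as a str or a list."""
    if isinstance(result, str):
        # variation=False: a single syllabification string
        return [result]
    # variation=True: already a list of syllabification strings
    return list(result)


print(as_list('va.pa.us'))               # ['va.pa.us']
print(as_list(['va.pa.us', 'va.paus']))  # ['va.pa.us', 'va.paus']
```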

split_compounds
---------------

When a ``FinnSyll`` object is instantiated with ``split_compounds=True`` (the default), the syllabifier first attempts to split the input into its constituent words before syllabifying it. This forces a syllable boundary between the identified constituent words. The syllabifier skips this step if ``split_compounds`` is set to ``False``.

*Compound splitting*::

  >>> f = FinnSyll(split_compounds=True)
  >>> f.syllabify('rahoituserien')  # rahoitus=erien
  ['ra.hoi.tus.e.ri.en']

*No compound splitting*::

  >>> f = FinnSyll(split_compounds=False)
  >>> f.syllabify('rahoituserien')
  ['ra.hoi.tu.se.ri.en']  # incorrect
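
The effect of splitting first can be illustrated with a short sketch: each ``=``-delimited constituent is syllabified separately and the pieces are joined with ``.``, so a syllable boundary always falls at the word boundary. This illustrates the idea only and is not FinnSyll's actual implementation. ``syllabify_compound`` and ``toy_syllabify`` are hypothetical names, and the toy lookup table covers only the two constituents of the example above.

```python
def syllabify_compound(segmented, syllabify_word):
    """Syllabify each '='-delimited constituent and join the results with '.'."""
    return '.'.join(syllabify_word(word) for word in segmented.split('='))


# Toy stand-in for a real single-word syllabifier (assumption for illustration).
TOY_TABLE = {'rahoitus': 'ra.hoi.tus', 'erien': 'e.ri.en'}


def toy_syllabify(word):
    """Look up a word in the toy table; raises KeyError for unknown words."""
    return TOY_TABLE[word]


print(syllabify_compound('rahoitus=erien', toy_syllabify))  # ra.hoi.tus.e.ri.en
```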


