Metadata-Version: 2.0
Name: FinnSyll
Version: 2.0.0
Summary: Finnish syllabifier and compound segmenter
Home-page: https://github.com/tsnaomi/finnsyll
Author: Naomi Tachikawa Shapiro
Author-email: coder@tsnaomi.net
License: BSD
Keywords: Finnish syllabifier compound segmenter
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python
Requires-Dist: morfessor

.. image:: https://travis-ci.org/tsnaomi/finnsyll.svg?branch=master
    :target: https://travis-ci.org/tsnaomi/finnsyll

FinnSyll
********

FinnSyll is a Python library that syllabifies words according to Finnish syllabification principles.
It also includes a Finnish compound segmenter.
More detailed documentation is forthcoming.

Installation
============
::

        $ pip install FinnSyll

Basic usage
===========

First, instantiate a ``FinnSyll`` object. ::

        >>> from finnsyll import FinnSyll
        >>> f = FinnSyll()

To syllabify: ::

        >>> f.syllabify('runoja')
        ['ru.no.ja']  # internal syllable boundaries are indicated with '.'

To segment compounds: ::

        >>> f.split('sosiaalidemokraattien')
        'sosiaali=demokraattien'  # internal word boundaries are indicated with '='
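
The marked-up strings shown above can be broken into their parts with ordinary string methods. The following is a minimal sketch that operates on output copied from the examples above, so it does not call FinnSyll itself; the helper names ``syllables`` and ``constituents`` are hypothetical and are not part of the FinnSyll API.

```python
def syllables(dotted):
    """Split a dotted syllabification such as 'ru.no.ja' into its syllables."""
    return dotted.split('.')


def constituents(segmented):
    """Split a segmented compound such as 'a=b' into its constituent words."""
    return segmented.split('=')


# Strings copied from the examples above.
print(syllables('ru.no.ja'))                    # ['ru', 'no', 'ja']
print(constituents('sosiaali=demokraattien'))   # ['sosiaali', 'demokraattien']
```

With the default settings, ``f.syllabify`` returns a list, so ``syllables`` would be applied to each element of that list rather than to the list itself.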

Optional arguments
==================

The syllabifier can be customized with two optional parameters: ``variation`` and ``split_compounds``.

variation
---------

Instantiating a ``FinnSyll`` object with ``variation=True`` (the default) allows the syllabifier to return multiple syllabifications when variation is predicted; in this mode, the syllabifier returns a list. Setting ``variation`` to ``False`` causes the syllabifier to return a single string containing the first predicted syllabification.

*Variation*: ::

        >>> f = FinnSyll(variation=True) 
        >>> f.syllabify('runoja')
        ['ru.no.ja']
        >>> f.syllabify('vapaus')
        ['va.pa.us', 'va.paus']

*No variation*: ::

        >>> f = FinnSyll(variation=False)
        >>> f.syllabify('runoja')
        'ru.no.ja'
        >>> f.syllabify('vapaus')
        'va.pa.us'
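
Because the return type depends on ``variation`` (a list when ``True``, a string when ``False``), calling code that must handle both settings may want to normalize the result. The sketch below shows one possible way to do so; ``as_list`` is a hypothetical helper, not part of FinnSyll, and it assumes Python 3, where the string result is a ``str``.

```python
def as_list(result):
    """Return syllabifier output as a list, whichever `variation` setting produced it."""
    if isinstance(result, str):
        # variation=False: a single string holding the first prediction.
        return [result]
    # variation=True: already a list of one or more predictions.
    return list(result)


# Values copied from the examples above.
print(as_list('va.pa.us'))                # ['va.pa.us']
print(as_list(['va.pa.us', 'va.paus']))   # ['va.pa.us', 'va.paus']
```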

split_compounds
---------------

When a ``FinnSyll`` object is instantiated with ``split_compounds=True`` (the default), the syllabifier first attempts to split the input into its constituent words before syllabifying it. This forces the syllabifier to insert a syllable boundary between the identified constituent words. The syllabifier skips this step if ``split_compounds`` is set to ``False``.

*Compound splitting*: ::

        >>> f = FinnSyll(split_compounds=True) 
        >>> f.syllabify('rahoituserien')  # rahoitus=erien
        ['ra.hoi.tus.e.ri.en']

*No compound splitting*: ::

        >>> f = FinnSyll(split_compounds=False) 
        >>> f.syllabify('rahoituserien')
        ['ra.hoi.tu.se.ri.en']  # incorrect: no boundary between rahoitus and erien
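
To make the difference between the two outputs concrete, the sketch below checks whether a syllabification places a syllable boundary at every compound boundary. It works only on the strings shown above and does not call FinnSyll; the helpers ``boundary_offsets`` and ``respects_compound`` are hypothetical names introduced for illustration.

```python
def boundary_offsets(marked, sep):
    """Return the offsets (counted in the unmarked word) at which `sep` occurs."""
    offsets = set()
    position = 0
    for char in marked:
        if char == sep:
            offsets.add(position)
        else:
            position += 1
    return offsets


def respects_compound(syllabified, segmented):
    """True if every '=' boundary in `segmented` is also a '.' boundary in `syllabified`."""
    return boundary_offsets(segmented, '=') <= boundary_offsets(syllabified, '.')


# Strings copied from the examples above.
print(respects_compound('ra.hoi.tus.e.ri.en', 'rahoitus=erien'))  # True
print(respects_compound('ra.hoi.tu.se.ri.en', 'rahoitus=erien'))  # False
```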


