Metadata-Version: 2.1
Name: SoMaJo
Version: 2.2.4
Summary: A tokenizer and sentence splitter for German and English web and social media texts.
Home-page: https://github.com/tsproisl/SoMaJo
Download-URL: https://github.com/tsproisl/SoMaJo/archive/v2.2.4.tar.gz
Author: Thomas Proisl, Peter Uhrig
Author-email: thomas.proisl@fau.de
License: GNU General Public License v3 or later (GPLv3+)
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Natural Language :: German
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Text Processing :: Linguistic
License-File: LICENSE.txt

SoMaJo
======

SoMaJo is a state-of-the-art tokenizer and sentence splitter for German and English web and
social media texts. It won the `EmpiriST 2015 shared task
<https://sites.google.com/site/empirist2015/>`_ on automatic
linguistic annotation of computer-mediated communication / social
media. As such, it is particularly well suited to tokenizing all kinds
of written discourse, for example chats, forums, wiki talk pages,
tweets, blog comments, posts on social networks, SMS and WhatsApp
dialogues.

More detailed documentation is available in the `project repository
on GitHub <https://github.com/tsproisl/SoMaJo>`_.
