# CHANGELOG #

## Version 1.8.0, 2021-08-03 ##

- Add option --use-nfkc to the command line interface and option
  use_nfkc to the constructor of ASPTagger (issue #11). If this option
  is used, the internal representation of the input data uses Unicode
  normalization form NFKC. This can be useful for social media input
  that misuses mathematical symbols for their typographic effects
  (e.g. “𝕴𝖒𝖕𝖋𝖆𝖚𝖘𝖜𝖊𝖎𝖘” instead of “Impfausweis”).
- Add option --sentence-tag to specify an XML tag in the input data
  that marks sentence boundaries (issue #12). This is particularly
  useful in combination with the --sentence-tag option of SoMaJo.

## Version 1.7.3, 2021-03-18 ##

- Use less memory when loading a model if the ijson library is present
  and the Python version is at least 3.7 (at least 3.6 for CPython)
  (issue #9).
- Restructured code for parallel tagging (issue #8).

## Version 1.7.2, 2021-03-05 ##

- Bugfix: Do not choke on chunks of XML that do not contain actual
  word tokens (usually at the end of a file).
- Updated regular expressions for emojis, emoticons, numbers and URLs.

## Version 1.7.1, 2019-11-07 ##

- Fixed an XML-related bug in STTS_IBK_postprocessor
- Fixed a minor bug in emoticon regex

## Version 1.7.0, 2019-11-07 ##

- Added Reddit links and Reddit-specific emoticons
- Moved command-line interface to cli.py
- Helper script for tagging multiple files (somewe-tagger-multifile)
- Postprocessing script for some deterministic tagging decisions in
  STTS_IBK, e.g. URLs, Emoticons, etc. (STTS_IBK_postprocessor)

## Version 1.6.2, 2019-10-17 ##

- Sanity-check input: Warn if there are extremely long sentences (≥
  500 words) in the input as this might indicate missing sentence
  boundaries.
- Use np.frombuffer() instead of np.fromstring() to fix a
  DeprecationWarning.

## Version 1.6.1, 2019-10-02 ##

- New option -v/--version to output version information.
- Explicitly specify input encoding as UTF-8.
- Fixed a bug in progress display.

## Version 1.6.0, 2019-07-02 ##

- New method tag_xml_sentence for simplified processing of SoMaJo's
  output for XML files.
- Updated regular expressions for emojis (taken from SoMaJo).
- Fixed a bug where SoMeWeTa could not be installed when numpy was not
  already there.

## Version 1.5.1, 2019-06-19 ##

- Got rid of FutureWarning about possible nested sets in regular
  expression.

## Version 1.5.0, 2019-04-12 ##

- Added support for parallel tagging of XML input.
- New option --progress for showing tagging progress and remaining
  time.
- Fixed calculation of confidence interval when reporting
  crossvalidation results.

## Version 1.4.0, 2018-11-14 ##

- Replaced XML parsing with a shallower approach. When tagging an XML
  file, we do no longer have to keep the whole file in memory.
- Minor improvements regarding URLs and emojis.

## Version 1.3.1, 2018-03-27 ##

Bugfix: Sentence boundaries are correctly recognized when reading an
XML file.

## Version 1.3.0, 2018-03-23 ##

- SoMeWeTa has now XML support. To tag an XML file, use the option
  -x/--xml. It is assumed that each XML tag is on a separate line.
- The implementation of the beam search algorithm has been slightly
  improved.

## Version 1.2.0, 2018-02-23 ##

It is now possible to use the option --ignore-tag to specify a tag
that will not be learned during training and that will be ignored
during evaluation. Use case: Partially annotated data that use a
pseudo-tag for tokens without annotation.

## Version 1.1.2, 2017-10-27 ##

Bugfix: Using the --parallel option does no longer change the order of
the sentences.

## Version 1.1.1, 2017-10-25 ##

This version fixes a bug that made it impossible to use the --parallel
option when reading from STDIN.

## Version 1.1.0, 2017-10-24 ##

- Bugfix: Removed trailing space from last tag in sentence.
- The new option --parallel makes it possible to use a pool of worker
  processes to speed up tagging.
- We also print a log message that indicates tagging speed.

## Version 1.0.0, 2017-09-15 ##

First release.
