Metadata-Version: 2.1
Name: xtokenizer
Version: 0.0.2
Summary: A simple Tokenizer
Home-page: https://gitee.com/summry/xtokenizer
Author: summy
Author-email: fkfkfk2024@2925.com
License: UNKNOWN
Keywords: tokenizer,NLP
Platform: UNKNOWN
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE

Usage Sample
''''''''''''

.. code:: python

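        from xtokenizer import Tokenizer

        # `texts` is the training corpus used to build the vocabulary. It is not
        # defined in this sample (assumed here to be a list of sentences).
        # Tokens seen fewer than `min_freq` times are presumably left out.
        tokenizer = Tokenizer.from_texts(texts, min_freq=5)

        sent = 'I love you'
        # Encode to token IDs, padded to max_length.
        tokens = tokenizer.encode(sent, max_length=6)
        # e.g. [101, 66, 88, 99, 102, 0]  (actual IDs depend on the vocabulary)
        decoded = tokenizer.decode(tokens)
        # ['<BOS>', 'I', 'love', 'you', '<EOS>', '<PAD>']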
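
To make the encode/decode round trip above concrete, here is a minimal, self-contained sketch of how a frequency-filtered vocabulary tokenizer could behave. It is not xtokenizer's implementation: the class name `SketchTokenizer`, the whitespace splitting, the `<UNK>` token, and the special-token IDs (0 to 3) are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical special tokens; xtokenizer's real tokens and IDs may differ.
PAD, UNK, BOS, EOS = '<PAD>', '<UNK>', '<BOS>', '<EOS>'


class SketchTokenizer:
    """Illustrative whitespace tokenizer; NOT xtokenizer's implementation."""

    def __init__(self, vocab):
        self.id_to_token = list(vocab)
        self.token_to_id = {tok: i for i, tok in enumerate(self.id_to_token)}

    @classmethod
    def from_texts(cls, texts, min_freq=1):
        # Count whitespace-separated tokens across the whole corpus.
        counts = Counter(tok for text in texts for tok in text.split())
        # Keep tokens seen at least min_freq times; sort for stable IDs.
        kept = sorted(tok for tok, n in counts.items() if n >= min_freq)
        return cls([PAD, UNK, BOS, EOS] + kept)

    def encode(self, sent, max_length):
        unk = self.token_to_id[UNK]
        ids = [self.token_to_id[BOS]]
        ids += [self.token_to_id.get(tok, unk) for tok in sent.split()]
        ids.append(self.token_to_id[EOS])
        # Naive truncation, then right-padding with the <PAD> ID.
        ids = ids[:max_length]
        return ids + [self.token_to_id[PAD]] * (max_length - len(ids))

    def decode(self, ids):
        # Map each ID back to its token string, special tokens included.
        return [self.id_to_token[i] for i in ids]


texts = ['I love you', 'I love NLP', 'you love NLP']
tokenizer = SketchTokenizer.from_texts(texts, min_freq=2)
tokens = tokenizer.encode('I love you', max_length=6)
decoded = tokenizer.decode(tokens)
print(tokens)   # [2, 4, 6, 7, 3, 0]
print(decoded)  # ['<BOS>', 'I', 'love', 'you', '<EOS>', '<PAD>']
```

In this sketch, a word that falls below `min_freq` or never appears in the corpus maps to `<UNK>`. Whether xtokenizer handles unknown words the same way is not stated in the sample above.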


