Metadata-Version: 2.1
Name: Tokenize2
Version: 2.0.3
Summary: A byte-level BPE tokenizer for efficient text processing
Home-page: https://github.com/TnsaAi/Tokenize2
Author: TNSA AI
Author-email: thishyakethabimalla@gmail.com
License: MIT
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown

# Tokenize2

Tokenize2 is a byte-level BPE (byte pair encoding) tokenizer, inspired by the tokenizers used with models like GPT-3, that splits text into subword units. Because it operates on UTF-8 bytes rather than characters, it can tokenize any input, including non-ASCII text, and it supports special tokens.

## Features

- Byte-level tokenization for handling a wide range of characters
- Special tokens (like `<PAD>`, `<UNK>`) for flexible token management
- Efficient BPE merges for subword tokenization
- Suitable for natural language processing and text generation tasks
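
To make the byte-level and BPE-merge features concrete, here is a minimal sketch of the general technique. It is not Tokenize2's actual API: the function names (`text_to_ids`, `merge_pair`, `train_merges`) and the special-token id assignment are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical id layout: special tokens first, then the 256 byte values.
SPECIAL_TOKENS = {"<PAD>": 0, "<UNK>": 1}
BYTE_OFFSET = len(SPECIAL_TOKENS)


def text_to_ids(text):
    """Encode text as UTF-8 and shift each byte past the special-token ids."""
    return [b + BYTE_OFFSET for b in text.encode("utf-8")]


def most_frequent_pair(ids):
    """Return the most common adjacent id pair, or None if there are no pairs."""
    pairs = Counter(zip(ids, ids[1:]))
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    merged = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            merged.append(new_id)
            i += 2
        else:
            merged.append(ids[i])
            i += 1
    return merged


def train_merges(text, num_merges):
    """Learn up to `num_merges` BPE merges from `text`, greedily by frequency."""
    ids = text_to_ids(text)
    merges = {}
    next_id = BYTE_OFFSET + 256  # first id after specials and raw bytes
    for _ in range(num_merges):
        pair = most_frequent_pair(ids)
        if pair is None:
            break
        merges[pair] = next_id
        ids = merge_pair(ids, pair, next_id)
        next_id += 1
    return ids, merges
```

Because every string has a UTF-8 byte representation, a non-ASCII character such as `é` simply becomes two byte ids in this sketch instead of falling back to `<UNK>`.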

## Installation

You can install Tokenize2 via pip:

