Telugu Tokenizer Sentence Inspector & Evaluation.
Uses the Two SUTRA tokenizer as the baseline.
Papers:
- Evaluating Tokenizer Performance of Large Language Models Across Official Indian Languages
- Performance Evaluation of Tokenizers in Large Language Models for the Assamese Language
Tokenize a sentence with various tokenizers and inspect how it's broken down.
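The sketch below shows what inspecting a breakdown looks like for a Telugu sentence. It does not use the app's actual tokenizers. The three tokenizers (`whitespace_tokenize`, `codepoint_tokenize`, `byte_tokenize`) and the helper `inspect_tokenization` are hypothetical stand-ins that split text at word, Unicode code point, and UTF-8 byte granularity. Real subword tokenizers usually land somewhere between these extremes.

```python
from typing import Callable, Dict, List

Tokenizer = Callable[[str], List[str]]

# Hypothetical stand-in tokenizers. The real app presumably loads actual
# model tokenizers; these only illustrate three granularities.
def whitespace_tokenize(text: str) -> List[str]:
    # Roughly one token per word.
    return text.split()

def codepoint_tokenize(text: str) -> List[str]:
    # One token per Unicode code point, spaces dropped. Telugu vowel signs
    # and viramas end up as separate tokens from their consonants.
    return [ch for ch in text if not ch.isspace()]

def byte_tokenize(text: str) -> List[str]:
    # One token per UTF-8 byte in <0xNN> notation. Every code point in the
    # Telugu block (U+0C00 to U+0C7F) encodes to 3 bytes.
    return [f"<0x{b:02X}>" for b in text.encode("utf-8")]

def inspect_tokenization(sentence: str, tokenizers: Dict[str, Tokenizer]) -> Dict[str, List[str]]:
    """Tokenize `sentence` with each tokenizer and print a short summary."""
    results: Dict[str, List[str]] = {}
    for name, tokenize in tokenizers.items():
        tokens = tokenize(sentence)
        results[name] = tokens
        preview = tokens[:6]
        suffix = " ..." if len(tokens) > 6 else ""
        print(f"{name:<10} {len(tokens):>3} tokens: {preview}{suffix}")
    return results

if __name__ == "__main__":
    sentence = "నేను తెలుగు నేర్చుకుంటున్నాను"  # "I am learning Telugu"
    inspect_tokenization(sentence, {
        "whitespace": whitespace_tokenize,
        "codepoint": codepoint_tokenize,
        "byte": byte_tokenize,
    })
```

The byte-level count is a useful upper bound: a tokenizer with little Telugu coverage falls back toward it, which is why token counts for Indic scripts can be several times higher than for English.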
Examples
Baseline Tokenizer
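As we understand the first cited paper, tokenizers are compared against a baseline using normalized sequence length (NSL): for each text, the number of tokens a candidate tokenizer produces is divided by the number the baseline produces, and the ratios are averaged over the dataset. An NSL below 1.0 means the candidate needs fewer tokens than the baseline on average. The minimal sketch below assumes tokenizers are plain callables returning token lists; `normalized_sequence_length` is a hypothetical name, and the stand-in tokenizers in the usage block are not Two SUTRA or any real model.

```python
from typing import Callable, List, Sequence

Tokenizer = Callable[[str], List[str]]

def normalized_sequence_length(texts: Sequence[str], tokenize: Tokenizer, baseline: Tokenizer) -> float:
    """Mean of len(tokenize(t)) / len(baseline(t)) over all texts.

    Below 1.0: the candidate needs fewer tokens than the baseline on average.
    Above 1.0: it needs more.
    """
    if not texts:
        raise ValueError("need at least one text")
    ratios = []
    for text in texts:
        base_len = len(baseline(text))
        if base_len == 0:
            raise ValueError(f"baseline produced no tokens for {text!r}")
        ratios.append(len(tokenize(text)) / base_len)
    return sum(ratios) / len(ratios)

if __name__ == "__main__":
    # Hypothetical stand-ins: in the app the baseline would be the Two SUTRA
    # tokenizer and the candidate another model's tokenizer.
    def word_baseline(text: str) -> List[str]:
        return text.split()

    def codepoint_candidate(text: str) -> List[str]:
        return [ch for ch in text if not ch.isspace()]

    samples = ["నేను తెలుగు నేర్చుకుంటున్నాను", "ఇది ఒక పరీక్ష"]
    nsl = normalized_sequence_length(samples, codepoint_candidate, word_baseline)
    print(f"NSL = {nsl:.2f}")
```

Averaging per-text ratios, rather than dividing total token counts, keeps one long text from dominating the score; whether the app weights texts this way is an assumption.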
Detailed Token Table
Token Table
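A token table lists each token on its own row. The sketch below builds a detailed variant with the token index, the token text, its code point count, its UTF-8 byte length, and the Unicode names of its characters, which makes Telugu vowel signs and viramas visible. The column choice and the helpers `token_table` and `format_table` are assumptions for illustration, not the app's actual layout; `unicodedata.name` is from the Python standard library.

```python
import unicodedata
from typing import List, Tuple

# (index, token, code points, UTF-8 bytes, character names)
Row = Tuple[int, str, int, int, str]

def token_table(tokens: List[str]) -> List[Row]:
    """Build one row per token with length and character details."""
    rows: List[Row] = []
    for i, tok in enumerate(tokens):
        # Fall back to the U+XXXX form for code points without a name.
        names = " + ".join(unicodedata.name(ch, f"U+{ord(ch):04X}") for ch in tok)
        rows.append((i, tok, len(tok), len(tok.encode("utf-8")), names))
    return rows

def format_table(rows: List[Row]) -> str:
    """Render rows as tab-separated text. Tabs avoid alignment problems,
    since terminals often misjudge the display width of Telugu glyphs."""
    header = ("#", "token", "code points", "bytes", "characters")
    lines = ["\t".join(header)]
    lines += ["\t".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)

if __name__ == "__main__":
    # Hypothetical tokens: a whitespace split stands in for a real tokenizer.
    tokens = "నేను తెలుగు".split()
    print(format_table(token_table(tokens)))
```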