Token

Definition

The basic unit of text that language models process — typically a word, subword, or character.

Tokens are the fundamental units that language models operate on. Rather than processing text character by character or word by word, modern models use subword tokenization (like BPE or SentencePiece) that breaks text into variable-length pieces. Common words might be a single token, while rare words are split into multiple tokens.

The word 'tokenization', for example, might be split into 'token' + 'ization'. English text averages roughly 1.3 tokens per word. Token count determines how much text fits in a model's context window and directly affects processing cost and speed, so estimating token counts helps predict costs and avoid hitting context-window limits.
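To illustrate the idea, here is a toy greedy longest-match subword tokenizer. The vocabulary is hypothetical and hand-picked for the example; real tokenizers such as BPE or SentencePiece learn their vocabularies from large corpora, but the splitting behavior sketched here is similar: common words map to a single piece, while rarer words break into several.

```python
# Toy greedy longest-match subword tokenizer (illustrative sketch only;
# real BPE/SentencePiece tokenizers learn their vocabulary from data).
# The vocabulary below is hypothetical, chosen so that common words stay
# whole while a rarer word splits into pieces.
VOCAB = {"the", "word", "token", "ization", "ion", "a", "t"}

def tokenize(text: str) -> list[str]:
    """Split each whitespace-separated word into the longest
    vocabulary pieces, scanning left to right."""
    pieces = []
    for word in text.lower().split():
        i = 0
        while i < len(word):
            # Try the longest possible piece first.
            for j in range(len(word), i, -1):
                if word[i:j] in VOCAB:
                    pieces.append(word[i:j])
                    i = j
                    break
            else:
                # No vocabulary piece matches: emit the character alone.
                pieces.append(word[i])
                i += 1
    return pieces

print(tokenize("the word tokenization"))
# → ['the', 'word', 'token', 'ization']
```

Note that 'the' and 'word' each become one token, while 'tokenization' splits into 'token' + 'ization', matching the example above.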

Related Terms

Related Content