Tokens
Tokens are to an LLM what individual LEGO bricks are to a LEGO model: the small, standardized pieces the system actually works with. You write words, but the model sees tokens, which are numeric identifiers representing words, parts of words, or even punctuation marks. A single word might be one token or several, and understanding this distinction is key to working effectively with LLMs.
How It Works
Before an LLM can process any text, a tokenizer splits it into tokens. As the diagram shows, the sentence "The cat sat on the mat" becomes six individual tokens, each mapped to a numeric identifier. The model works entirely with these identifiers, not with raw text.
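The mapping from text to tokens to IDs can be sketched with a toy example. This is deliberately simplified: the vocabulary and its IDs (101 to 106) are hypothetical, and splitting on spaces only works here because every word happens to be in the vocabulary. Real tokenizers use a learned subword vocabulary, as described next. The leading space is attached to the following word and written as "·", mirroring the dot notation in the diagram.

```python
# Hypothetical vocabulary: token string -> numeric ID (not real model IDs).
# "·" stands for a leading space stored as part of the token.
VOCAB = {"The": 101, "·cat": 102, "·sat": 103, "·on": 104, "·the": 105, "·mat": 106}


def tokenize(text: str) -> list[str]:
    """Toy splitter: break on spaces and attach each space to the next word."""
    words = text.split(" ")
    return [words[0]] + ["·" + w for w in words[1:]]


def encode(text: str) -> list[int]:
    """Map each token string to its ID; these IDs are all the model receives."""
    return [VOCAB[tok] for tok in tokenize(text)]


sentence = "The cat sat on the mat"
print(tokenize(sentence))  # ['The', '·cat', '·sat', '·on', '·the', '·mat']
print(encode(sentence))    # [101, 102, 103, 104, 105, 106]
```

Note that "The" and "·the" are different tokens with different IDs: capitalization and the leading space both change which vocabulary entry matches.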
Tokenization isn't as simple as splitting on spaces. Common words like "the" or "is" are typically a single token, but less common words get broken into subword pieces. For example, "tokenization" might become ["token", "ization"], which is two tokens. Punctuation, programming keywords, and runs of whitespace (such as code indentation) also map to token identifiers, while a single space is usually folded into the token that follows it. The dot before words in the diagram represents that space character, encoded as part of the token itself.
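Subword splitting can be illustrated with a greedy longest-match-first strategy, the approach used by WordPiece-style tokenizers. This is a sketch under assumptions: the vocabulary below is hypothetical, and GPT-style tokenizers such as cl100k_base instead apply learned byte-pair-encoding merges, so their actual splits may differ.

```python
# Hypothetical subword vocabulary: a few whole words, two fragments,
# and single letters as a fallback for anything unfamiliar.
VOCAB = {"the", "is", "token", "ization", "t", "o", "k", "e", "n", "i", "z", "a"}


def split_word(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match-first: repeatedly take the longest vocabulary
    entry that is a prefix of the remaining text."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest candidate first, shrinking until one matches.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            raise ValueError(f"cannot tokenize {word[start:]!r}")
    return pieces


print(split_word("the", VOCAB))           # ['the']
print(split_word("tokenization", VOCAB))  # ['token', 'ization']
print(split_word("taken", VOCAB))         # ['t', 'a', 'k', 'e', 'n']
```

The last call shows why rare words are expensive: with no matching fragment in the vocabulary, "taken" falls back to one token per letter.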
Different models use different tokenizers. GPT-4 uses a tokenizer called cl100k_base with roughly 100,000 possible tokens. Claude uses its own tokenizer. This means the same text can produce different token counts depending on the model — something to keep in mind when estimating costs or context usage.
The tokenizer is trained separately from the model itself. It learns to split text efficiently by finding common patterns in a large corpus. Frequent words get their own token (cheaper), while rare words are split into subword pieces (more tokens, higher cost).
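One common way to learn such a vocabulary is byte-pair encoding (BPE), the technique behind GPT-style tokenizers like cl100k_base. The sketch below is a minimal character-level version under simplifying assumptions: real tokenizers operate on bytes, pre-split text into chunks, and learn tens of thousands of merges from far larger corpora. The toy corpus and the merge count are hypothetical.

```python
from collections import Counter


def learn_bpe_merges(corpus: list[str], num_merges: int) -> tuple[list[tuple[str, str]], Counter]:
    """Learn BPE merges by repeatedly fusing the most frequent adjacent
    pair of symbols. Returns the merges and the final word segmentations."""
    # Start with each word as a tuple of single characters, weighted by frequency.
    words = Counter(tuple(w) for w in corpus)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair counted first
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_words = Counter()
        for symbols, freq in words.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_words[tuple(out)] += freq
        words = new_words
    return merges, words


corpus = ["the"] * 5 + ["then"] * 2 + ["zebra"]
merges, segmented = learn_bpe_merges(corpus, 3)
print(merges)     # [('t', 'h'), ('th', 'e'), ('the', 'n')]
print(segmented)
```

After three merges, the frequent words "the" and "then" are each a single symbol, while the rare word "zebra" is still five separate characters: the same cheap-versus-expensive split described above.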
Why It Matters
Token count directly controls three things you care about: cost, speed, and limits. API pricing is per token, both for what you send (input tokens) and what the model generates (output tokens). More tokens also mean slower responses: the model generates its output one token at a time, and a longer input takes longer to process before the first output token appears.
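Because input and output tokens are billed separately, a cost estimate is simple arithmetic. The prices below ($3 and $15 per million tokens) are hypothetical placeholders; check your provider's current price list, where output tokens are often priced higher than input tokens.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Dollar cost = tokens / 1,000,000 * price per million tokens,
    computed separately for input and output and then summed."""
    return (input_tokens / 1_000_000 * input_price_per_million
            + output_tokens / 1_000_000 * output_price_per_million)


# Hypothetical prices: $3 per million input tokens, $15 per million output tokens.
cost = estimate_cost(2_000, 500, 3.0, 15.0)
print(f"${cost:.4f}")  # $0.0135
```

Here the 500 output tokens cost more than the 2,000 input tokens, which is why trimming verbose responses can matter as much as trimming prompts.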
Most importantly, every model has a context window: a maximum number of tokens it can handle in a single request, counting both the input and the generated output. GPT-4 Turbo supports 128K tokens; Claude supports up to 200K. If your prompt exceeds the limit, an API will typically reject the request, while a chat interface may silently drop older content; either way, the model loses that context. Knowing how tokenization works helps you write more efficient prompts, estimate API costs accurately, and understand why some inputs get cut off unexpectedly.
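A pre-flight check can catch oversized requests before they are sent. This sketch relies on two stated assumptions: the roughly 4-characters-per-token ratio is only a rule of thumb for English text (use the model's own tokenizer when accuracy matters), and the 4,000-token output reservation is a hypothetical value. The 128K window matches the GPT-4 Turbo figure above.

```python
def rough_token_estimate(text: str) -> int:
    """Rule of thumb: about 4 characters per token for English text,
    rounded up. Only an approximation; real counts vary by tokenizer."""
    return (len(text) + 3) // 4


def fits_in_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """The prompt and the tokens reserved for the reply share one window."""
    return prompt_tokens + max_output_tokens <= context_window


prompt = "Summarize the following report. " * 1000  # 32,000 characters
n = rough_token_estimate(prompt)
print(n)                                      # 8000
print(fits_in_context(n, 4_000, 128_000))     # True
print(fits_in_context(126_000, 4_000, 128_000))  # False
```

Reserving room for the output matters: a 126K-token prompt fits in a 128K window on its own, but not once 4K tokens are set aside for the reply.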