Top-p Sampling

Definition

A decoding strategy that selects from the smallest set of tokens whose cumulative probability exceeds a threshold p.

Top-p sampling (also called nucleus sampling) dynamically adjusts the candidate set size based on the probability distribution. Instead of a fixed number of candidates (top-k), top-p includes the highest-probability tokens whose cumulative probability mass reaches the threshold p. When the model is confident, fewer tokens qualify; when it is uncertain, more tokens are considered.

This adaptiveness makes top-p generally more robust than top-k across different contexts. A common default is p=0.95, meaning the model samples from the smallest set of tokens covering at least 95% of the probability mass. Top-p is widely used in production language model APIs.
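The procedure above can be sketched in a few lines: sort tokens by probability, keep the smallest prefix whose cumulative mass reaches p, renormalize, and sample. This is a minimal NumPy illustration, not any particular library's implementation; the function name and signature are chosen for clarity.

```python
import numpy as np

def top_p_sample(logits, p=0.95, rng=None):
    """Sample one token index via top-p (nucleus) sampling.

    Keeps the smallest set of highest-probability tokens whose
    cumulative probability mass reaches p, then samples from that
    set after renormalizing.
    """
    rng = rng or np.random.default_rng()

    # Softmax (shifted by the max for numerical stability).
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()

    # Rank tokens by descending probability.
    order = np.argsort(probs)[::-1]
    sorted_probs = probs[order]
    cumulative = np.cumsum(sorted_probs)

    # Smallest prefix whose cumulative mass reaches p.
    cutoff = int(np.searchsorted(cumulative, p)) + 1
    nucleus = order[:cutoff]

    # Renormalize within the nucleus and sample.
    nucleus_probs = sorted_probs[:cutoff] / sorted_probs[:cutoff].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))
```

Note how the nucleus shrinks when the distribution is peaked: with one dominant token, the cutoff falls after the first entry and sampling becomes nearly deterministic, while a flat distribution keeps many candidates in play.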
