Top-p Sampling
Definition
A decoding strategy that selects from the smallest set of tokens whose cumulative probability exceeds a threshold p.
Top-p sampling (also called nucleus sampling) dynamically adjusts the candidate set size based on the probability distribution. Instead of keeping a fixed number of candidates (as in top-k), top-p sorts tokens by probability and includes them, most probable first, until their cumulative probability mass exceeds the threshold p. When the model is confident, fewer tokens qualify; when it is uncertain, more tokens are considered.
This adaptiveness makes top-p generally more robust than top-k across different contexts. A common default is p=0.95, meaning the model samples from the smallest set of tokens covering 95% of the probability mass. Top-p is widely used in production language model APIs.
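The selection step can be sketched in a few lines of NumPy. The function name `top_p_sample` and its interface are illustrative, not from any particular library; it assumes `probs` is a normalized probability vector over the vocabulary.

```python
import numpy as np

def top_p_sample(probs, p=0.95, rng=None):
    # Illustrative sketch of nucleus sampling, not a production implementation.
    rng = rng or np.random.default_rng()
    # Sort token indices by descending probability.
    order = np.argsort(probs)[::-1]
    sorted_probs = probs[order]
    # Size of the smallest prefix whose cumulative mass reaches p.
    cutoff = np.searchsorted(np.cumsum(sorted_probs), p) + 1
    # Renormalize the nucleus and sample from it.
    nucleus = sorted_probs[:cutoff]
    nucleus = nucleus / nucleus.sum()
    return int(order[rng.choice(cutoff, p=nucleus)])
```

With a sharply peaked distribution such as [0.9, 0.05, 0.03, 0.02] and p=0.9, the nucleus collapses to the single most probable token, illustrating the adaptive behavior described above.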