Top-k Sampling
Definition
A decoding strategy that restricts token selection to the k most probable candidates at each step.
Top-k sampling limits the model's choices at each generation step to the k tokens with the highest probabilities. The probabilities of all other tokens are set to zero, and the remaining k probabilities are renormalized to sum to 1 before sampling. This prevents the model from selecting very unlikely tokens while still allowing diversity in the output.
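The filtering step can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not any particular library's implementation: the function name top_k_sample and its parameters are hypothetical, and it assumes the model's output is available as a 1-D array of raw logits.

```python
import numpy as np

def top_k_sample(logits, k, rng=None):
    """Sample one token id from the k highest-scoring entries of `logits`."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    k = max(1, min(k, logits.size))  # clamp k to a valid range

    # Indices of the k largest logits (their internal order does not matter).
    top_idx = np.argpartition(logits, -k)[-k:]

    # Softmax over the kept logits only. This is equivalent to zeroing the
    # probabilities of all other tokens and renormalizing the rest.
    shifted = logits[top_idx] - logits[top_idx].max()
    probs = np.exp(shifted)
    probs /= probs.sum()

    # Draw one token id according to the renormalized probabilities.
    return int(rng.choice(top_idx, p=probs))
```

With logits [2.0, 1.0, 0.5, -1.0, -3.0] and k=2, only token ids 0 and 1 can ever be returned, however many samples are drawn.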
Common values range from k=10 to k=100. Lower k values produce more focused, predictable output; higher values allow more variety. At the extreme, k=1 always picks the single most probable token, which is greedy decoding. Top-k sampling is often combined with temperature scaling and top-p (nucleus) sampling for fine-grained control over the balance between coherence and diversity.
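One way the three controls might be chained is sketched below. The order used here (temperature, then top-k, then top-p) is an assumption made for illustration; implementations differ, and the function name filtered_probs and its default values are hypothetical.

```python
import numpy as np

def filtered_probs(logits, temperature=1.0, top_k=50, top_p=1.0):
    """Return a full-vocabulary probability vector after temperature,
    top-k, and top-p filtering (applied in that assumed order)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    logits = np.asarray(logits, dtype=np.float64) / temperature

    # Top-k: keep the k highest logits, sorted from most to least likely.
    k = max(1, min(top_k, logits.size))
    keep = np.argsort(logits)[::-1][:k]
    shifted = logits[keep] - logits[keep].max()
    probs = np.exp(shifted)
    probs /= probs.sum()

    # Top-p: keep the shortest prefix whose cumulative probability
    # reaches top_p (always at least one token).
    cum = np.cumsum(probs)
    cutoff = min(int(np.searchsorted(cum, top_p)) + 1, k)
    keep, probs = keep[:cutoff], probs[:cutoff]
    probs = probs / probs.sum()

    # Scatter the surviving probabilities back into a full-size vector.
    out = np.zeros(logits.size)
    out[keep] = probs
    return out

# Sampling from the filtered distribution:
# token_id = np.random.default_rng().choice(len(p), p=p), where p = filtered_probs(...)
```

In this sketch, lowering the temperature sharpens the distribution before any truncation happens, top_k bounds how many candidates survive, and top_p can shrink that set further when a few tokens already carry most of the probability mass.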