Single vs Stacked Presets for Dictation Refinement

Compare using a single refinement preset versus stacking multiple presets for dictation. Learn when simplicity wins and when composability matters.

| Criteria | Single Preset Refinement | Stacked Preset Refinement |
| --- | --- | --- |
| Latency | 1-3 seconds (single LLM call) | 2-9 seconds (multiple sequential LLM calls) |
| Cost per dictation | One LLM inference, typically $0.001-$0.01 | N inferences, $0.002-$0.03 for 2-3 stacked presets |
| Flexibility | Low: changing behavior requires editing the monolithic prompt | High: swap, add, or remove individual presets without side effects |
| Maintainability | Decreases as the single prompt grows in complexity | Remains high: each preset stays small and focused |
| Output consistency | Higher: one model call means no inter-pass conflicts | Lower: later passes may conflict with earlier refinements |
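The stacked-row figures above follow from the single-call figures multiplied by stack depth. A quick sanity check, assuming the per-call ranges in the table:

```python
# Back-of-envelope cost and latency for stacked refinement.
# Per-call figures are taken from the comparison table; N sequential
# calls multiply both ranges by N.

PER_CALL_COST = (0.001, 0.01)   # dollars per inference (low, high)
PER_CALL_LATENCY = (1.0, 3.0)   # seconds per inference (low, high)

def stacked_estimate(n_presets: int):
    """Return ((cost_low, cost_high), (latency_low, latency_high))
    for n_presets sequential LLM calls."""
    cost = (PER_CALL_COST[0] * n_presets, PER_CALL_COST[1] * n_presets)
    latency = (PER_CALL_LATENCY[0] * n_presets,
               PER_CALL_LATENCY[1] * n_presets)
    return cost, latency

cost, latency = stacked_estimate(3)  # three stacked presets
```

For three presets this gives roughly $0.003-$0.03 and 3-9 seconds, consistent with the table's 2-3-preset ranges.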

Single Preset Refinement

One refinement preset handles all post-processing in a single LLM pass. The preset contains all instructions for tone, formatting, cleanup, and domain-specific corrections in one prompt.
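As a minimal sketch, a single-preset pass folds every instruction into one prompt and makes exactly one call. `call_llm` below is a hypothetical stand-in for your LLM provider's API, not a real function:

```python
# Sketch of single-preset refinement: all instructions in one prompt,
# one inference per dictation. `call_llm` is a placeholder for a real
# provider API call.

SINGLE_PRESET = (
    "Remove filler words, add punctuation, fix capitalization, "
    "and preserve technical terms exactly as spoken."
)

def call_llm(system_prompt: str, text: str) -> str:
    # Placeholder: a real implementation would send `system_prompt`
    # and `text` to an LLM and return its completion.
    return text.strip()

def refine_single(transcript: str) -> str:
    """Apply every refinement instruction in a single LLM pass."""
    return call_llm(SINGLE_PRESET, transcript)
```

Because there is only one call, latency and cost stay at the per-inference floor, but `SINGLE_PRESET` must carry every concern at once.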

Pros

  • Simpler to create, understand, and maintain — one prompt does everything
  • Lower latency — only one LLM inference call per dictation
  • Lower cost — one API call instead of multiple sequential calls
  • No risk of conflicting instructions between presets

Cons

  • Complex prompts trying to handle everything can become unwieldy and brittle
  • Cannot easily mix and match behaviors — changing one aspect requires editing the whole preset
  • Difficult to reuse individual refinement behaviors across different workflows
  • Single prompts have diminishing returns as instruction count grows

Stacked Preset Refinement

Multiple refinement presets are applied sequentially, each handling a specific aspect of post-processing. For example: cleanup pass, then formatting pass, then tone adjustment pass.
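A minimal sketch of such a pipeline, again with `call_llm` as a hypothetical stand-in for a real provider API and each preset as a plain prompt string:

```python
# Sketch of stacked refinement: presets run sequentially, one LLM call
# per preset, each pass consuming the previous pass's output.
# `call_llm` is a placeholder for a real provider API call.

def call_llm(system_prompt: str, text: str) -> str:
    # Placeholder: a real implementation would send the prompt and
    # text to an LLM and return its completion.
    return text.strip()

CLEANUP = "Remove filler words and false starts."
FORMATTING = "Add punctuation and markdown formatting."
TONE = "Adjust the text to a professional tone."

def refine_stacked(transcript: str, presets: list[str]) -> str:
    """Run presets in order; N presets means N inference calls."""
    text = transcript
    for preset in presets:
        text = call_llm(preset, text)
    return text

# Cleanup pass, then formatting pass, then tone pass: three calls.
result = refine_stacked("  um, hello world ", [CLEANUP, FORMATTING, TONE])
```

Swapping `TONE` for a different preset, or dropping it entirely, changes one list entry without touching the other passes.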

Pros

  • Modular and composable — mix and match presets for different workflows
  • Each preset can be simple, focused, and individually tested
  • Easy to add or remove a specific behavior without affecting others
  • Enables preset libraries where community presets can be combined freely
  • Better separation of concerns — each preset has one job

Cons

  • Higher latency — each preset adds another LLM inference round-trip
  • Higher cost — N presets means N API calls per dictation
  • Later presets may undo or conflict with changes from earlier presets
  • More complex pipeline to configure, debug, and reason about

Verdict

Start with single presets. They are simpler, faster, and cheaper. Move to stacked presets only when your single preset becomes too complex to maintain or when you need to compose behaviors dynamically across different workflows. Ummless supports both approaches, letting you start simple and graduate to composition as your needs grow.

Frequently Asked Questions

When should I switch from single to stacked presets?

Switch when your single preset exceeds 300-400 words and tries to handle unrelated concerns. If you find yourself with a prompt that handles cleanup, formatting, tone, and domain terms all at once, breaking it into focused presets improves reliability and makes each piece testable.

How do I prevent stacked presets from conflicting?

Order matters. Apply structural changes first (cleanup, filler removal), then formatting (markdown, code blocks), then tone (formal, casual) last. Each preset should be idempotent — running it on already-correct text should not introduce changes. Test each preset independently before stacking.
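One way to check the idempotence property is to run a pass twice and compare outputs. The sketch below uses a deterministic local cleanup function in place of an LLM call so the check is exact; with a real model you would run the same comparison over a test corpus instead:

```python
# Idempotence check for a refinement pass: running the pass on its own
# output should change nothing. `cleanup_pass` is a deterministic
# stand-in for an LLM-backed cleanup preset.

def cleanup_pass(text: str) -> str:
    """Collapse whitespace and strip common filler words."""
    fillers = {"um", "uh", "like,"}
    words = [w for w in text.split() if w.lower() not in fillers]
    return " ".join(words)

def is_idempotent(refine, sample: str) -> bool:
    """A preset is safe to stack if refining twice equals refining once."""
    once = refine(sample)
    return refine(once) == once

print(is_idempotent(cleanup_pass, "um so uh this is, like, a test"))
```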

What is a good starter preset for developer dictation?

A single preset that removes filler words, adds punctuation, fixes capitalization, and preserves technical terms is a great starting point. It handles 90% of dictation cleanup needs without overcomplicating the prompt or requiring multiple passes.
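As an illustration (example wording only, not an official Ummless preset), such a starter preset might read:

```python
# Illustrative starter preset for developer dictation. The wording is
# an example, not a shipped preset; note it stays well under the
# 300-400 word threshold where splitting becomes worthwhile.

STARTER_PRESET = """\
You are cleaning up a software developer's dictated text.
- Remove filler words (um, uh, you know) and false starts.
- Add sentence punctuation and fix capitalization.
- Preserve technical terms exactly as spoken (e.g. useEffect, kubectl).
- Do not rephrase or summarize; only clean up.
"""
```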
