Grounding
Anchoring model outputs in verifiable evidence—documents, APIs, structured records—rather than relying solely on weights learned during training.
Parametric memory vs external memory
A model’s weights encode broad statistical regularities; they are not a queryable database with provenance. Grounding introduces channels where answers can be checked against sources: retrieved passages, SQL results, tool outputs. Those channels come with their own failure modes (wrong chunk, stale index, ambiguous join), so grounding is an integration problem, not a checkbox. Start with “RAG: memory and honesty” for the product-shaped view.
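One cheap check a grounded pipeline can run is whether each answer sentence has any lexical support in the retrieved passages. This is a minimal sketch, not a production verifier; the function name, the overlap threshold, and the word-length cutoff are all assumptions:

```python
# Hypothetical sketch: flag answer sentences that share almost no
# content words with any retrieved passage. Threshold values are
# illustrative assumptions, not tuned parameters.
import re

def unsupported_sentences(answer: str, passages: list[str],
                          min_overlap: int = 3) -> list[str]:
    """Return answer sentences sharing fewer than `min_overlap`
    content words with every retrieved passage."""
    def words(text: str) -> set[str]:
        # Crude content-word extraction: lowercase alphabetic runs > 3 chars.
        return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}

    passage_vocab = [words(p) for p in passages]
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer):
        sw = words(sentence)
        if sw and all(len(sw & pv) < min_overlap for pv in passage_vocab):
            flagged.append(sentence)
    return flagged
```

Lexical overlap is deliberately dumb; it misses paraphrase and entailment, but it catches the common case of a claim that appears in no retrieved source at all.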
Agents, tools, and layered trust
Tool-using agents extend grounding beyond text lookup: they can call calculators, run code, or trigger workflows. Each hop adds attack surface (prompt injection, confused deputies) and reliability questions. Our essay “tool use is not autonomy” stresses evaluation under perturbations: small environment changes that break brittle chains.
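Perturbation testing of a tool chain can be as simple as replaying one step against slightly altered environments and measuring the pass rate. Everything here (the brittle example tool, the perturbations, the harness name) is an illustrative assumption:

```python
# Illustrative sketch: probe a tool-calling step against small
# environment perturbations to expose brittle chains. All names
# here are assumptions for the sake of the example.
def lookup_price(inventory: dict, item: str):
    """A deliberately brittle tool: exact-key lookup, no fallback."""
    return inventory[item]

def perturbation_pass_rate(tool, base_env, perturbations, query, expected):
    passed = 0
    for perturb in perturbations:
        env = perturb(dict(base_env))  # each run gets a fresh copy
        try:
            if tool(env, query) == expected:
                passed += 1
        except Exception:
            pass  # a crash under a trivial perturbation counts as failure
    return passed / len(perturbations)

base = {"Widget": 9.99}
perturbs = [
    lambda e: e,                                    # unchanged environment
    lambda e: {k.lower(): v for k, v in e.items()}, # casing change
    lambda e: {k + " ": v for k, v in e.items()},   # trailing whitespace
]
rate = perturbation_pass_rate(lookup_price, base, perturbs, "Widget", 9.99)
```

The exact-match tool survives only the unchanged environment, so the pass rate is 1/3; a robust chain would normalize keys before lookup.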
When grounding fails visibly
Even when retrieval returns little or nothing useful, the model may still sound authoritative if the base policy favors fluency: classic hallucination dynamics. Mitigations include quoting sparingly, showing citations inline, refusing when evidence is thin, and separating the answer from its supporting passages in the UI, themes taken up in the essay on automation bias.
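Two of these mitigations, refusing on thin evidence and keeping the answer apart from its supporting passages, fit naturally in one response type. A minimal sketch, assuming a score threshold and a structure of our own invention:

```python
# Hedged sketch: a structured reply that separates "answer" from
# "supporting passages" and refuses when retrieval scores are thin.
# The dataclass and the 0.5 threshold are assumptions, not a standard.
from dataclasses import dataclass, field

@dataclass
class GroundedReply:
    answer: str
    citations: list = field(default_factory=list)
    refused: bool = False

def reply_with_evidence(scored_passages, min_score=0.5, draft=""):
    """scored_passages: list of (score, passage_id, text), best first."""
    strong = [(pid, text) for score, pid, text in scored_passages
              if score >= min_score]
    if not strong:
        # Thin evidence: refuse explicitly rather than answer fluently.
        return GroundedReply(answer="Not enough evidence to answer.",
                             refused=True)
    return GroundedReply(answer=draft,
                         citations=[pid for pid, _ in strong])
```

Keeping citations as a separate field, rather than interleaving them into prose, lets the UI render evidence distinctly and lets downstream checks audit it.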
Windows, tokens, and what fits on the canvas
Grounded systems must decide what enters the context window after retrieval. More text is not always better; relevance ranking and deduplication matter as much as embedding quality. Token economics still apply—see Token.
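The selection step after retrieval can be sketched as greedy assembly under a token budget with naive deduplication. The whitespace token estimate and the normalization key are stand-ins for a real tokenizer and a real near-duplicate detector:

```python
# Hedged sketch: greedy context assembly under a token budget.
# The whitespace "tokenizer" and lowercase dedup key are crude
# assumptions standing in for real components.
def assemble_context(ranked_passages, token_budget=1000):
    """ranked_passages: list of strings, most relevant first."""
    seen, chosen, used = set(), [], 0
    for passage in ranked_passages:
        key = " ".join(passage.lower().split())  # cheap near-duplicate key
        if key in seen:
            continue  # skip verbatim repeats of an already-chosen passage
        cost = len(passage.split())  # crude token estimate: word count
        if used + cost > token_budget:
            break  # preserve rank order: stop rather than backfill
        seen.add(key)
        chosen.append(passage)
        used += cost
    return chosen
```

Stopping at the first passage that overflows (instead of backfilling with smaller, lower-ranked ones) is a design choice that keeps the context strictly rank-ordered; backfilling trades that guarantee for fuller windows.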
Temporal validity and stale corpora
Grounding is only as fresh as the index. Retrieval over outdated manuals produces confident wrong answers unless pipelines expose document dates and decay policies. For live tools, agents must handle API versioning and clock skew, another layer of “grounding” beyond text.
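A decay policy can be as simple as discounting relevance by document age with an exponential half-life. The 180-day half-life below is an arbitrary assumption; real systems tune it per corpus:

```python
# Illustrative sketch: combine retrieval relevance with exponential
# recency decay so stale documents sink in the ranking. The half-life
# value is an assumption, not a recommendation.
from datetime import date

def decayed_score(relevance: float, doc_date: date, today: date,
                  half_life_days: float = 180.0) -> float:
    """Halve a document's effective score every `half_life_days`."""
    age = (today - doc_date).days
    return relevance * 0.5 ** (age / half_life_days)
```

Surfacing the document date alongside the decayed score, rather than silently down-ranking, also lets the UI warn users that a passage may be stale.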
Conflict resolution between sources
When two retrieved passages disagree, models often blend them fluently. Explicit conflict detection (surface both, ask the user to choose, or defer) prevents silent merging. This is partly UX and partly policy training; see “RLHF limits”, since preference data rarely covers contradiction cases.
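Even a toy heuristic can catch the easiest case: two passages stating different numbers for the same query. The numeric-claim extraction here is an assumption chosen for simplicity; real conflict detection needs claim alignment, not just number matching:

```python
# Toy sketch (the numeric-claim heuristic is an assumption): surface
# passages that state different numbers instead of letting the model
# blend them into one fluent, wrong answer.
import re

def conflicting_numbers(passages: list[str]) -> set[str]:
    """Return the distinct numeric claims across passages; more than
    one value suggests a conflict to surface, not to merge."""
    values = set()
    for text in passages:
        values.update(re.findall(r"\d+(?:\.\d+)?", text))
    return values

docs = ["The limit is 100 requests per minute.",
        "The limit is 60 requests per minute."]
claims = conflicting_numbers(docs)
needs_adjudication = len(claims) > 1  # surface both, or defer to the user
```

When `needs_adjudication` is true, the policy options from the paragraph above apply: show both passages, ask the user, or refuse to pick.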