Progressive Disclosure for AI Agents
Use <?catalog?> with a per-file summary front matter field to emit a directory index with one line per file, so AI coding agents read a few thousand tokens of metadata up front and only Read the files a task actually touches.
AI coding agents (Claude Code, Codex, Copilot, Cursor) pay context tokens for every file they pull into the prompt. Loading every doc in a repository eagerly is the easy default and the wrong one: 60 docs at 2 000 tokens each is 120 000 tokens spent before the agent has done anything.
Progressive disclosure inverts that. The agent file
(CLAUDE.md, AGENTS.md, .github/copilot-instructions.md)
carries a short index — one line per doc, naming the
specific data each doc contains — and the agent
Reads the full file only when a task makes one
relevant.
mdsmith builds that index for you. This guide is the recipe.
# Motivation
A three-tier loading model is the broader pattern:
- Metadata. Filenames + one-line summary strings. Cheap; always in the prompt.
- Instructions. A skill or guide body the agent pulls in when the metadata says it applies.
- Reference. Full source files the agent reads on demand once it knows what it is looking for.
Tier 1 is the bottleneck. A vague index (“docs page
about catalog”) forces the agent to load tier 2 to
decide whether tier 2 was needed at all — the
disclosure stops being progressive. A precise index
(“How <?catalog?> projects a summary field into a
generated bullet list, with the where and sort
parameters”) lets the agent skip the file without
opening it.
mdsmith’s <?catalog?> directive automates tier 1.
Each tracked file declares a summary in YAML front
matter; the directive generates the index from those
summaries; mdsmith check flags drift; mdsmith fix
re-renders.
# How mdsmith does it
CLAUDE.md is the live example.
A single <?catalog?> under its ## Docs heading
matches every tracked doc under docs/**, uses four
! patterns to exclude research notes, security reports,
brand copy, and proto.md templates, sorts by path,
and emits one bullet per match:
```markdown
<?catalog
glob:
  - "docs/**/*.md"
  - "!docs/research/**"
  - "!docs/security/**"
  - "!docs/brand/**"
  - "!**/proto.md"
sort: path
header: ""
row: "- [{summary}]({filename})"
?>
- [How generated sections work — markers, directives, and fix behavior.](docs/background/concepts/generated-section.md)
- [Build commands, project layout, code style, test fixtures, coverage gate, and merge conflicts.](docs/development/index.md)
...
<?/catalog?>
```

The ~110-doc tree compresses to ~3 400 tokens of index.
The agent reads CLAUDE.md once at session start and
already knows which file to open for any task it gets.
AGENTS.md is a thin shim that <?include?>s
CLAUDE.md, so Codex and Copilot consume the same
index from the same source.
# Step-by-step
# 1. Add a summary to every file in the indexed tree
Front matter on each doc:
```yaml
---
title: Coverage Gate
summary: >-
  Codecov coverage gate and CI status checks.
---
```

The summary is what shows up in the agent’s index.
Write it like a tier-1 cue: state the specific data
the file contains, in one sentence the agent can match
against a task description. Vague verbs (“describes”,
“handles”, “covers”) force a tier-2 load to confirm.
# 2. Place a <?catalog?> in the agent file
In CLAUDE.md (or AGENTS.md, or any other doc that
orients the agent):
```markdown
## Docs
<?catalog
glob: "docs/**/*.md"
sort: path
row: "- [{summary}]({filename})"
?>
<?/catalog?>
```

row is the line shape: {summary} is pulled from the
matched file’s front matter, and {filename} is the
matched file’s path. Without header or footer, the
body is just the rows.
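The placeholder substitution can be sketched as a small template renderer. This is a sketch under assumptions: any {name} made of letters, digits, and underscores counts as a placeholder, filename comes from the matched path, and a missing or non-scalar front matter value renders as an empty string, which matches the behavior described under Common issues. The function name renderRow and the paths are hypothetical, and mdsmith's real renderer may differ in escaping and number formatting.

```go
package main

import (
	"fmt"
	"regexp"
)

var placeholder = regexp.MustCompile(`\{([A-Za-z_][A-Za-z0-9_]*)\}`)

// renderRow substitutes each {field} in row. "filename" is the matched
// path; every other field is looked up in the parsed front matter.
// Missing fields and lists or maps render as "".
func renderRow(row, filename string, fm map[string]any) string {
	return placeholder.ReplaceAllStringFunc(row, func(m string) string {
		key := m[1 : len(m)-1] // strip the braces
		if key == "filename" {
			return filename
		}
		switch v := fm[key].(type) {
		case string:
			return v
		case int, int64, float64, bool:
			return fmt.Sprint(v)
		default: // nil (missing), []any, map[string]any, ...
			return ""
		}
	})
}

func main() {
	row := "- [{summary}]({filename})"
	fm := map[string]any{"summary": "Codecov coverage gate and CI status checks."}
	fmt.Println(renderRow(row, "docs/development/coverage-gate.md", fm))
	// A list-valued summary renders as "", leaving empty link text.
	fmt.Println(renderRow(row, "docs/x.md", map[string]any{"summary": []any{"a", "b"}}))
}
```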
# 3. Run mdsmith fix to populate the body

```bash
mdsmith fix CLAUDE.md
```

The empty body between the markers fills in. From
this point, the index is read-only: edit the
underlying file’s summary (not the body line) and
re-run fix.
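What fix does to the body can be pictured as a splice: keep the directive, drop the old rows, and write freshly rendered rows in their place. The Go below sketches only that splice and is not mdsmith's code; it assumes the multi-line directive form used in this guide ("<?catalog" and "?>" on their own lines) and handles only the first catalog in a file. The name spliceBody is hypothetical. It also shows why hand edits to body lines do not survive: the next splice overwrites them.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// spliceBody replaces whatever sits between the first catalog's closing
// "?>" line and its "<?/catalog?>" line with body.
func spliceBody(doc string, body []string) (string, error) {
	lines := strings.Split(doc, "\n")
	start, end := -1, -1
	inDirective := false
	for i, l := range lines {
		t := strings.TrimSpace(l)
		switch {
		case start < 0 && strings.HasPrefix(t, "<?catalog"):
			inDirective = true // inside the directive's parameters
		case inDirective && t == "?>":
			inDirective, start = false, i // body starts after this line
		case start >= 0 && t == "<?/catalog?>":
			end = i // body ends before this line
		}
		if end >= 0 {
			break
		}
	}
	if end < 0 {
		return "", errors.New("no complete <?catalog?> block found")
	}
	out := append([]string{}, lines[:start+1]...)
	out = append(out, body...)
	out = append(out, lines[end:]...)
	return strings.Join(out, "\n"), nil
}

func main() {
	doc := "## Docs\n<?catalog\nsort: path\n?>\n- [Stale summary](docs/a.md)\n<?/catalog?>\n"
	fixed, err := spliceBody(doc, []string{"- [Fresh summary](docs/a.md)"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(fixed)
}
```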
# 4. Lock in freshness with mdsmith check

```bash
mdsmith check .
```

If anyone edits a file’s summary and forgets to
re-render, check fails with an MDS019 diagnostic
naming the catalog whose body drifted. Run it in
CI so the index cannot silently drift from the summaries.
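Conceptually, that freshness check compares the body on disk with the rows a fresh render would produce. A minimal Go sketch of the comparison follows. It assumes the freshly rendered rows are passed in and that the directive uses the multi-line form shown above; catalogBody and drifted are hypothetical names, and this illustrates what MDS019 detects rather than how mdsmith implements it.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// catalogBody returns the lines between the first catalog's "?>" line and
// its "<?/catalog?>" line.
func catalogBody(doc string) ([]string, error) {
	lines := strings.Split(doc, "\n")
	start := -1
	inDirective := false
	for i, l := range lines {
		t := strings.TrimSpace(l)
		switch {
		case start < 0 && strings.HasPrefix(t, "<?catalog"):
			inDirective = true
		case inDirective && t == "?>":
			inDirective, start = false, i
		case start >= 0 && t == "<?/catalog?>":
			return lines[start+1 : i], nil
		}
	}
	return nil, errors.New("no complete <?catalog?> block found")
}

// drifted reports whether the committed body differs from the rows a
// fresh render would produce.
func drifted(doc string, rendered []string) (bool, error) {
	body, err := catalogBody(doc)
	if err != nil {
		return false, err
	}
	if len(body) != len(rendered) {
		return true, nil
	}
	for i := range body {
		if body[i] != rendered[i] {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	doc := "<?catalog\nsort: path\n?>\n- [Old summary](docs/a.md)\n<?/catalog?>"
	// Someone edited docs/a.md's summary but did not re-run fix.
	stale, err := drifted(doc, []string{"- [New summary](docs/a.md)"})
	fmt.Println(stale, err)
}
```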
# 5. (Optional) Filter and partition
A single 60-line index is fine for a small repo. For larger trees, partition by category so each section is short enough for the agent to skim:
```markdown
### Background
<?catalog
glob: "docs/background/**/*.md"
sort: path
row: "- [{summary}]({filename})"
?>
<?/catalog?>
### Reference
<?catalog
glob: "docs/reference/**/*.md"
sort: path
row: "- [{summary}]({filename})"
?>
<?/catalog?>
```

Or filter by front matter with where:
```markdown
<?catalog
glob: "plan/*.md"
where: 'status: "🔳"'
sort: numeric:id
row: "- [{id}: {summary}]({filename})"
?>
<?/catalog?>
```

where is a CUE expression evaluated against the
matched file’s parsed front matter. The grammar
that mdsmith list query uses works here unchanged.
See Generating Content with Directives
for the full parameter reference.
# Common issues
# A file is missing summary
{summary} resolves to an empty string and the row
becomes - [](path/to/file.md). Fix with a schema
that requires the field. Declare a kind for the
indexed tree, point a required-structure schema at
it, and list summary as required front matter:
```markdown
---
title: nonEmpty
summary: nonEmpty
---
# ?
```

See Schemas for the kind + proto.md wiring. Once
the schema is in place, mdsmith check blocks any
file in the tree that ships without a usable summary.
# Summaries are too vague to disclose anything
If a summary reads “Docs page about catalogs”, an
agent has no way to skip it without loading the file.
The repo writing guideline applies: name the inputs
(front matter fields, glob patterns, heading levels)
and the condition checked against them, not the
mechanism. “How <?catalog?> projects a summary
field into a bullet list, with where and sort
parameters” is a tier-1 cue; “Catalog directive
guide” is not.
This is a content rule, not a lint rule. MDS024 paragraph-structure and the surrounding readability rules catch the worst offenders in body prose; summaries are short enough that review is the gate.
# The index is too large to load eagerly
A 200-line catalog still beats loading 200 files, but it crowds out room for the agent’s actual work. Two tactics:
- Partition. One catalog per top-level directory, each under its own heading. The agent reads the index, jumps to the section that matches the task.
- Filter. Pin the catalog to active work with where: 'status: "🔳"' (or kind, or a tag field). Archived material drops out of the index without leaving the repo.
The
token-budget
rule on CLAUDE.md is the canary: if the index
pushes the file over budget, the rule fires before
the agent runs out of context.
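To see roughly how much headroom an index leaves before the budget rule fires, a back-of-the-envelope estimate is enough. The Go sketch below assumes the common heuristic of about four characters per token for English text; that heuristic, the 200-row index, and the 8 000-token budget are illustrative assumptions, not how mdsmith's token-budget rule counts or what it is configured to.

```go
package main

import (
	"fmt"
	"strings"
)

// estimateTokens approximates a token count at roughly four characters per
// token. Real tokenizers, and mdsmith's token-budget rule, may differ.
func estimateTokens(s string) int {
	return (len([]rune(s)) + 3) / 4 // round up
}

// fitsBudget returns the estimate for the whole agent file, index
// included, and whether it stays within budget.
func fitsBudget(agentFile string, budget int) (int, bool) {
	n := estimateTokens(agentFile)
	return n, n <= budget
}

func main() {
	row := "- [Build commands, project layout, code style, test fixtures, coverage gate, and merge conflicts.](docs/development/index.md)"
	// Pretend the index has 200 rows of similar length.
	index := strings.Repeat(row+"\n", 200)
	n, ok := fitsBudget("## Docs\n"+index, 8000) // hypothetical budget
	fmt.Println(n, ok)
}
```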
# A summary disagrees with the file’s content
mdsmith check does not validate summary against
the body — it only validates the catalog body
against the front matter. Drift here is a content
problem, not a lint problem. Two mitigations:
- Keep summary short. A two-sentence summary makes few claims that can go stale; a paragraph makes many.
- Treat summary as part of the same change as the content. The review process catches mismatches the same way it catches stale docstrings.
# The field you want to project is a list
row placeholders project scalar values only. A
front matter field that is a list or a map renders
as an empty string — there is no list-comprehension
or join syntax in the placeholder grammar.
For now, project a scalar derived from the list (a joined comma-separated field maintained alongside the structured one), or generate the catalog body from a Go helper that walks the structured field itself.
# Generated index conflicts on merge
Two branches both ran mdsmith fix and both committed
the regenerated catalog body — git reports a conflict
inside the <?catalog?> markers. Install the
merge driver once per clone:

```bash
mdsmith merge-driver install
```

The driver re-runs fix and re-stages the result, so
generated-section conflicts resolve without manual
intervention.
# See also
- Generating Content with Directives: full reference for <?catalog?> and <?include?>.
- Generated sections concept: markers, directives, and fix behavior.
- Schemas: require a summary field per directory.
- mdsmith list query: the same CUE grammar that where: uses.