Choosing Readability, Conciseness, and Token Budget Metrics

Trade-offs and threshold guidance for readability, structure, length, and token budgets.

# Scope and disclaimer

This guide compares the existing mdsmith rules that touch readability and length with token budget awareness. The metric scores and trade-offs below are illustrative, and the discussion focuses on rules that are currently implemented.

# What the current rules measure

| Rule | Measures | Default | What it misses |
| --- | --- | --- | --- |
| MDS023 `paragraph-readability` | Complexity using the Automated Readability Index, ARI (characters per word, words per sentence) | `max-index: 14.0`, `min-words: 20` | Wordiness and filler; short but dense paragraphs can be skipped |
| MDS024 `paragraph-structure` | Shape and length of paragraphs (sentences per paragraph, words per sentence) | `max-sentences: 6`, `max-words-per-sentence: 40` | Verbosity that fits within limits; dense but short prose |
| MDS022 `max-file-length` | Lines per file | `max: 300` | Token load and dense paragraphs |
| MDS028 `token-budget` | Estimated token count per file (`heuristic` or `tokenizer` mode) | `max: 8000`, `mode: heuristic` | Exact model token parity; tokenizer mode is still approximate |
| MDS001 `line-length` | Characters per line | `max: 80` | Verbosity and paragraph complexity |

# Planned metrics (not implemented)

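No additional metrics are planned at this time. The conciseness score that appears in later sections is an illustrative heuristic, not an implemented or scheduled rule.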

# What token budget awareness is trying to measure

Token budget awareness (MDS028) measures file-level size in tokens rather than lines or characters. It protects LLM context windows by warning when a file exceeds a configurable budget. The `heuristic` mode multiplies the word count by a tokens-per-word factor, which is fast but approximate. The `tokenizer` mode uses tokenizer-aware splitting with a selected encoding for a closer estimate.
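
The heuristic mode described above can be sketched as follows. This is a minimal illustration, not mdsmith's implementation: the function names are hypothetical, words are assumed to be whitespace-separated, and the factor of 1.33 is the illustrative value used elsewhere in this guide rather than a confirmed default. Only the `max: 8000` default comes from the rule table.

```python
def estimate_tokens(text: str, tokens_per_word: float = 1.33) -> int:
    """Heuristic estimate: whitespace-separated word count times a fixed factor.

    The 1.33 factor is an assumed, illustrative value.
    """
    return round(len(text.split()) * tokens_per_word)


def exceeds_budget(text: str, max_tokens: int = 8000,
                   tokens_per_word: float = 1.33) -> bool:
    """Return True when the estimated token count is above the budget."""
    return estimate_tokens(text, tokens_per_word) > max_tokens


if __name__ == "__main__":
    sample = "word " * 2800  # stand-in for a 2,800-word file
    print(estimate_tokens(sample))                  # 3724
    print(exceeds_budget(sample, max_tokens=2000))  # True
```

Because the estimate depends only on word count, it costs almost nothing to compute, but it cannot tell apart plain prose and symbol-heavy text that tokenizes into many more pieces.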

Tokenization happens before inference, so every LLM reads its input as tokens. A token budget is therefore exact only when it uses the same tokenizer as the target model. The trade-off is performance: exact tokenization is slower and needs vocabulary assets, while heuristic estimates are fast and model-agnostic.

# Example paragraphs (paragraph-level metrics)

# Example A

In order to make sure that we are all on the same page, it is important to note that the system is, in most cases, able to handle requests pretty well, and this is something we should keep in mind.

# Example B

The synchronization algorithm enforces linearizability via per-shard lease epochs and monotonic commit indices.

# Example C

We should update the onboarding guide so that new contributors can quickly find the build steps, understand the release checklist, and avoid common pitfalls without needing to ask in chat, which will reduce interruptions for everyone.

# Example D

The plan is straightforward. We will add a new rule. It will report issues. It will include guidance. It will ship this week. It will help teams. It will reduce noise. It will keep docs short.

# Example E

Basically, we just want to make sure the plan is pretty clear to everyone. It is really just a simple update, and we might adjust it later.

# How the rules score these examples

Notes: ARI values use mdsmith's current formula. MDS023 skips paragraphs with fewer words than `min-words`. MDS024 flags a paragraph when its sentence count or its words per sentence exceed the limits. The conciseness scores below are illustrative heuristics, not an implemented rule. Token budget awareness is file-level; see the token budget examples in the next section.

| Example | Words | Sentences | ARI | MDS023 result | MDS024 result | Conciseness score (illustrative) |
| --- | --- | --- | --- | --- | --- | --- |
| A | 40 | 1 | 16.6 | Fail (16.6 > 14.0) | Pass | 36.2 |
| B | 13 | 1 | 20.2 | Skipped (< 20 words) | Pass | 84.6 |
| C | 36 | 1 | 22.1 | Fail (22.1 > 14.0) | Pass | 63.9 |
| D | 36 | 8 | 0.3 | Pass | Fail (8 > 6 sentences) | 50.0 |
| E | 27 | 2 | 4.3 | Pass | Pass | 50.0 |
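
The ARI column and the MDS023 and MDS024 verdicts can be approximated with a short sketch. It assumes the standard ARI formula (4.71 × characters per word + 0.5 × words per sentence − 21.43), counts only letters and digits as characters, and splits sentences on terminal punctuation followed by whitespace. mdsmith's own tokenization may differ, and the function names here are hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def ari(text: str) -> float:
    """Standard Automated Readability Index over letters and digits."""
    words = text.split()
    if not words:
        return 0.0
    chars = sum(ch.isalnum() for word in words for ch in word)
    sentences = max(1, len(split_sentences(text)))
    return 4.71 * chars / len(words) + 0.5 * len(words) / sentences - 21.43


def check_readability(text: str, max_index: float = 14.0,
                      min_words: int = 20) -> str:
    """Mimic the MDS023 outcome: skip short paragraphs, else compare ARI."""
    if len(text.split()) < min_words:
        return "skipped"
    return "fail" if ari(text) > max_index else "pass"


def check_structure(text: str, max_sentences: int = 6,
                    max_words_per_sentence: int = 40) -> str:
    """Mimic the MDS024 outcome: sentence count and words per sentence."""
    sentences = split_sentences(text)
    if len(sentences) > max_sentences:
        return "fail"
    if any(len(s.split()) > max_words_per_sentence for s in sentences):
        return "fail"
    return "pass"


if __name__ == "__main__":
    example_d = ("The plan is straightforward. We will add a new rule. "
                 "It will report issues. It will include guidance. "
                 "It will ship this week. It will help teams. "
                 "It will reduce noise. It will keep docs short.")
    print(round(ari(example_d), 1))      # 0.3
    print(check_readability(example_d))  # pass
    print(check_structure(example_d))    # fail
```

Example D shows why the two rules complement each other: very short sentences drive ARI close to zero, so only the structure check catches the choppy paragraph.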

# Token budget examples (file-level)

These examples assume a tokens-per-word factor of 1.33 and a budget of 2,000 tokens.

- File F: 2,800 words -> ~3,724 estimated tokens. The token budget flags it even if its line count is below the `max-file-length` limit.
- File G: 1,200 words with heavy code blocks -> ~1,596 estimated tokens, which is under budget, but the actual token count could be higher. Tuning tokens-per-word or weighting code blocks may be needed.
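
One way to explore the code weighting mentioned for File G is to count words inside and outside fenced code blocks and apply a different factor to each. The sketch below is hypothetical: the code factor of 2.0 is an arbitrary placeholder to be calibrated against a real tokenizer, and nothing here reflects an existing mdsmith option.

```python
FENCE = "`" * 3  # Markdown code fence marker


def count_words_by_kind(markdown: str) -> tuple[int, int]:
    """Return (prose_words, code_words), treating fenced blocks as code."""
    prose = code = 0
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # fence lines themselves are not counted
            continue
        words = len(line.split())
        if in_fence:
            code += words
        else:
            prose += words
    return prose, code


def estimate_tokens_weighted(markdown: str, prose_factor: float = 1.33,
                             code_factor: float = 2.0) -> int:
    """Weighted estimate; code_factor=2.0 is an assumed placeholder value."""
    prose, code = count_words_by_kind(markdown)
    return round(prose * prose_factor + code * code_factor)


if __name__ == "__main__":
    doc = "\n".join(["Intro words here", FENCE, "x = 1", "y = 2", FENCE, "End"])
    print(count_words_by_kind(doc))       # (4, 6)
    print(estimate_tokens_weighted(doc))  # 17
```

Under these assumed factors, a 1,200-word file with 400 words in code blocks would be estimated at about 1,864 tokens instead of 1,596, which narrows the margin under a 2,000-token budget.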

# Trade-offs by metric

| Metric | Strengths | Risks |
| --- | --- | --- |
| Readability (MDS023) | Encourages simple, broadly accessible prose | Penalizes technical terms; misses wordiness; can skip short dense paragraphs |
| Structure (MDS024) | Enforces consistent paragraph shape with low false positives | Does not address filler or redundancy |
| Length (MDS022, MDS001) | Prevents runaway size and formatting drift | Poor proxy for token load or verbosity |
| Token budget (MDS028) | Directly targets context window size | Estimation is noisy; code blocks and symbols can skew counts |
| Conciseness (proposed) | Targets verbosity and token waste | Heuristic; can penalize necessary qualifiers or legal language |

# How to choose limits

- Start with defaults for MDS023 and MDS024 to establish baseline structure and readability.
- Sample a representative set of documents and collect results before tightening thresholds.
- For token budgets, pick a target based on your context window and allocate a safe share per document (for example, reserve 20 to 30 percent of a prompt budget for a single doc). Choose an initial tokens-per-word value and adjust for code-heavy files.
- For conciseness scoring, set an initial threshold that flags only the worst 10 to 20 percent of paragraphs, then adjust.
- Use path-based overrides to reflect different document types, such as onboarding guides vs architecture specs.
- Re-evaluate thresholds after major content changes or when onboarding new teams.
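
Two of the steps above are simple arithmetic and can be sketched in a few lines: deriving a per-document budget from a context window, and picking a cutoff that flags only the worst share of sampled paragraphs. The helper names, the 128,000-token window, and the sample scores are hypothetical values chosen for illustration.

```python
import math


def per_document_budget(context_window: int, share: float = 0.25) -> int:
    """Token budget for one document as a share of the context window."""
    return int(context_window * share)


def threshold_for_worst_fraction(scores: list[float], fraction: float = 0.15,
                                 higher_is_worse: bool = True) -> float:
    """Return a cutoff so roughly `fraction` of sampled scores get flagged.

    With higher_is_worse=True, flag scores >= cutoff (for example ARI).
    With higher_is_worse=False, flag scores <= cutoff.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores, reverse=higher_is_worse)
    k = max(1, math.floor(len(ordered) * fraction))
    return ordered[k - 1]


if __name__ == "__main__":
    # Hypothetical 128,000-token window with 25 percent reserved per document.
    print(per_document_budget(128_000, 0.25))  # 32000

    # Hypothetical ARI scores sampled from ten paragraphs.
    sample = [4.6, 0.3, 16.6, 22.1, 20.2, 9.0, 11.5, 13.0, 7.2, 12.4]
    print(threshold_for_worst_fraction(sample, fraction=0.2))  # 20.2
```

Deriving the cutoff from sampled scores keeps the initial flag rate predictable, and rerunning the calculation after major content changes shows whether the threshold has drifted.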

# When to use one measure instead of many

If you need a single metric to minimize complexity, choose the one that best matches your risk:

- Choose MDS024 `paragraph-structure` when you want predictable, low-noise enforcement.
- Choose MDS023 `paragraph-readability` when broad comprehension is the highest priority.
- Choose MDS028 `token-budget` when context window limits are the dominant constraint and you want a file-level guardrail.
- Choose conciseness scoring when token budget and drift are the main risks and you accept heuristic trade-offs.

# Recommendation for mdsmith users

Start with MDS023 and MDS024 enabled. Use MDS022 and MDS001 as baseline file and line controls. Add MDS028 when context limits matter, then add conciseness scoring only after calibrating its thresholds and confirming it improves signal without harming necessary precision.