mdsmith
Esc
    v0.52.0 GitHub
    MDS024 prose ready

    MDS024: paragraph-structure

    Paragraphs must not exceed sentence and word limits.

    # Settings

    SettingTypeDefaultDescription
    max-sentencesint6Maximum sentences per paragraph
    max-words-per-sentenceint40Maximum words per sentence
    placeholderslist[]Placeholder tokens to treat as opaque; see placeholder grammar

    Useful tokens: var-token, heading-question, placeholder-section.

    Markdown tables and code blocks are skipped.

    # How it works

    When the rule fires, every number in the message is exact.

    The sentence count comes from the trained Punkt segmenter (github.com/neurosnap/sentences ). It classifies every ./!/? against abbreviation, decimal, ellipsis, and initial heuristics — not a regex-based count. The per-sentence word count is the exact word count of the Punkt-segmented sentence (not the paragraph total). The over-long-sentence preview is a slice of the actual Punkt-segmented sentence (not a guess).

    A paragraph like “Dr. Smith met Mr. Jones at 3.14 p.m. on Jan. 5.” is one sentence, not seven. Naive splitters disagree. Punkt is right. So paragraph has too many sentences (8 > 6) means eight, and sentence too long (45 > 40 words): "..." quotes the real over-long sentence.

    A constant-time pre-check skips the segmenter when a paragraph cannot violate either limit. The pre-check never changes which diagnostics the rule reports.

    Configured placeholder tokens ({body}, {var-token}, etc.) collapse to neutral words before counting.

    # Config

    Enable with default thresholds:

    rules:
      paragraph-structure: true

    Enable with custom thresholds:

    rules:
      paragraph-structure:
        max-sentences: 6
        max-words-per-sentence: 40

    Explicitly disable (matches the default):

    rules:
      paragraph-structure: false

    # Examples

    # Good

    # Well Structured Document
    
    The sun rose over the hills. Birds began to sing.
    A gentle breeze swept through the valley.
    # Well Structured Chinese Document
    
    太阳升起。鸟儿开始歌唱。一阵微风吹过山谷。
    # Well Structured Japanese Document
    
    太陽が昇る。鳥が歌い始める。風が谷を吹き抜ける。

    # Bad

    # Overly Long Paragraph
    
    Dogs bark. Cats meow. Birds sing. Fish swim. Frogs croak. Snakes hiss. Bees buzz. Ants march.
    # Overly Long Chinese Paragraph
    
    今天天气很好。鸟儿在歌唱。微风轻拂。阳光明媚。空气清新。花朵盛开。山谷宁静。

    The segmenter treats the full-width Chinese / Japanese period as a sentence boundary the same way it does ASCII ., so the rule fires on CJK paragraphs that end every sentence with . Mixed CJK / ASCII paragraphs work too.

    Full-width and are word boundaries in the trained English pipeline (the vendored Punkt fork inherits that), but they do not flag a sentence break on their own — match upstream behaviour. Author CJK paragraphs with between sentences for the rule to count them correctly.

    # Diagnostics

    ConditionMessage
    too many sentencesparagraph has too many sentences (8 > 6)
    sentence too longsentence too long (45 > 40 words)

    # See also

    # Meta-Information

    • ID: MDS024
    • Name: paragraph-structure
    • Status: ready
    • Default: disabled (opt-in). When enabled: max-sentences: 6, max-words-per-sentence: 40
    • Fixable: no
    • Implementation: source
    • Category: prose