mdsmith
    v0.52.0

    MDS028: token-budget

    A file's token count must not exceed its configured budget.

    # Settings

    | Setting | Type | Default | Description |
    |---|---|---|---|
    | max | int | 8000 | Default token budget when no per-glob budget matches |
    | mode | string | heuristic | Counting mode: heuristic or tokenizer |
    | tokens-per-word | number | 1.33 | Tokens per word multiplier used in heuristic mode |
    | tokenizer | string | builtin | Tokenizer family used in tokenizer mode |
    | encoding | string | cl100k_base | Encoding profile for tokenizer mode: cl100k_base, p50k_base, r50k_base, gpt2 |
    | budgets | list | none | Ordered per-glob budgets (glob, max); last matching entry wins |

    # Config

    Enable with defaults:

    rules:
      token-budget: true

    Heuristic mode:

    rules:
      token-budget:
        mode: heuristic
        tokens-per-word: 1.33
        max: 2400
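
To make the heuristic setting concrete, here is a rough sketch of how a word-count estimate could be computed from `tokens-per-word`. It is not mdsmith's implementation: the function names are hypothetical, and both the whitespace word split and the round-up step are assumptions made for illustration.

```python
import math


def estimate_tokens(text: str, tokens_per_word: float = 1.33) -> int:
    """Estimate a token count as word count times a multiplier.

    Assumption: words are whitespace-separated and the result is
    rounded up. The real rule may split and round differently.
    """
    words = len(text.split())
    return math.ceil(words * tokens_per_word)


def within_budget(text: str, max_tokens: int = 8000,
                  tokens_per_word: float = 1.33) -> bool:
    """Return True when the estimate does not exceed the budget."""
    return estimate_tokens(text, tokens_per_word) <= max_tokens
```

Under these assumptions, a six-word line is estimated at slightly under eight tokens before rounding, so it only fails when `max` is set very low.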

    Tokenizer mode with per-glob budgets:

    rules:
      token-budget:
        mode: tokenizer
        tokenizer: builtin
        encoding: cl100k_base
        max: 3000
        budgets:
          - glob: "README.md"
            max: 4000
          - glob: "guides/*.md"
            max: 5000
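
The "last matching entry wins" rule for `budgets` can be sketched as a single pass over the ordered list. This is an illustration only: `resolve_budget` is a hypothetical helper, and Python's `fnmatch` stands in for whatever glob matcher mdsmith uses (note that `fnmatch`'s `*` also matches `/`, which many glob implementations do not allow). Matching against the relative file path is likewise an assumption.

```python
from fnmatch import fnmatchcase


def resolve_budget(path: str, default_max: int, budgets: list) -> int:
    """Pick the budget for a path; later matching entries override earlier ones.

    Assumption: `budgets` is a list of {"glob": ..., "max": ...} dicts in
    config order, and `path` is the file path relative to the project root.
    """
    chosen = default_max  # used when no glob matches
    for entry in budgets:
        if fnmatchcase(path, entry["glob"]):
            chosen = entry["max"]  # keep scanning so the last match wins
    return chosen


# Mirrors the tokenizer-mode config above.
budgets = [
    {"glob": "README.md", "max": 4000},
    {"glob": "guides/*.md", "max": 5000},
]
```

With this ordering, a broad glob placed first can be overridden by a more specific glob placed after it.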

    Disable:

    rules:
      token-budget: false

    # Examples

    # Good

    # Token Budget
    
    This file stays within budget.

    # Bad

    # Token Budget
    
    one two three four five six

    # Meta-Information

    • ID: MDS028
    • Name: token-budget
    • Status: ready
    • Default: enabled, max: 8000, mode: heuristic, tokens-per-word: 1.33, tokenizer: builtin, encoding: cl100k_base
    • Fixable: no
    • Implementation: source
    • Category: prose