LLM Agents Do Not Respect Rules. This Pattern Makes Them.

Arnold Wender April 25, 2026

#LLM #Agents #Python #Claude Code #Anthropic #Policy #Open Source

The Problem

When you run an LLM agent over a real codebase, you need it to respect rules. Not suggestions — rules. “This is the legal business address; never substitute a target-market address.” “No force pushes.” “Never write these prohibited brand terms in copy.” “LocalBusiness schema must always use the home address, not the service area.”

The obvious approach: put the rules in the system prompt. “You MUST NOT modify the address field.” The problem is that LLMs ignore system prompt instructions under task pressure, under adversarial input, or simply by accident. This is not a theoretical concern. It is documented and reproducible.

Issue #19874 in the anthropics/claude-code repository was opened in January 2026 and closed as “not planned” at v2.1.14 without a fix to the underlying architecture, although the underlying bug has been documented since v1.0.95 (August 2025) in multiple earlier issues. It states the problem plainly:

Plan mode provides no actual enforcement of read-only restrictions. The “safety guarantee” is implemented purely as a system prompt instruction that the LLM can (and regularly does) ignore.

The issue includes a flow diagram making the architectural gap explicit: there is no enforcement layer between the LLM decision and the tool execution. When the LLM ignores the safety instruction, the file write happens anyway.

Two related issues illustrate the pattern: #21292 describes how Claude proceeds with non-readonly actions when ExitPlanMode is rejected by the user; #19021 documents a concrete case in which the agent, after editing files during plan mode, attempted to revert its changes with git checkout and destroyed uncommitted user work. The root cause is the same: the rejection message is treated as post-implementation guidance rather than a continuation of the plan-mode constraint.

The right fix is not a better prompt. It is enforcement at the executor layer — a check in code that the LLM cannot override.
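To make "enforcement at the executor layer" concrete, here is a minimal sketch of a gate that sits between the model's decision and the tool call. It is not how Claude Code works internally. The names ToolCall, execute, READ_ONLY_TOOLS, and BlockedToolCall are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class BlockedToolCall(Exception):
    """Raised by the executor when a requested tool call breaks a hard rule."""


@dataclass
class ToolCall:
    name: str
    args: dict[str, Any] = field(default_factory=dict)


# Hypothetical tool names; a real agent would register its own.
READ_ONLY_TOOLS = frozenset({"read_file", "grep"})


def execute(call: ToolCall, tools: dict[str, Callable[..., Any]], *, plan_mode: bool) -> Any:
    """Run a tool call the model asked for, but only if the rules allow it."""
    # The check runs in code, after the model has decided and before any side effect.
    if plan_mode and call.name not in READ_ONLY_TOOLS:
        raise BlockedToolCall(f"{call.name!r} is not allowed in plan mode")
    return tools[call.name](**call.args)


written: list[str] = []
tools: dict[str, Callable[..., Any]] = {
    "read_file": lambda path: f"contents of {path}",
    "write_file": lambda path, text: written.append(path),
}

print(execute(ToolCall("read_file", {"path": "a.txt"}), tools, plan_mode=True))
try:
    execute(ToolCall("write_file", {"path": "a.txt", "text": "x"}), tools, plan_mode=True)
except BlockedToolCall as exc:
    print(f"halted: {exc}")
```

Whatever the model outputs, the write never runs while plan_mode is true, because the decision is made by the dispatcher rather than by the prompt.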

What the Existing Tools Do (and Do Not Do)

Three approaches address parts of this problem. None covers it completely.

Claude Code Plan Mode — proposal-time review, not executor enforcement

Plan Mode solves the proposal-review loop: the agent shows you what it intends to do before doing it. That is useful, but it is not executor-side enforcement. If the LLM proceeds despite a rejection — which it does, per the issues above — there is no backstop. Approval happens at proposal time; execution remains unguarded.

AWS Bedrock Guardrails — inference-time filtering, not action enforcement

Bedrock Guardrails enforces at inference time. Since April 2026 it supports cross-account and organization-level enforcement, which is a meaningful improvement. But it operates on model output: content filtering, topic deny lists, PII detection. That is valuable, but it works on tokens, not on actions. You can express “if the model output contains PII, filter it.” You cannot express “if this agent action would write the wrong NAP (name, address, phone) address into a LocalBusiness schema, halt,” because Bedrock knows nothing about the LocalBusiness schema or the NAP field your agent is about to write.

LangGraph `interrupt()` — human-in-the-loop without a policy object

LangGraph interrupt() pauses a running graph and hands control to a human. The Command(resume=...) pattern lets you approve, reject, or modify the next action before execution continues. This is the right shape for human-in-the-loop workflows. But LangGraph has no opinionated policy object: there is no mechanism to load declarative rules from a file, merge per-run overrides on top, and enforce the result deterministically. You wire the checks manually. Each node that needs to enforce a rule writes its own Python condition, and there is no single place to declare “these are the project invariants.”

GitHub Spec Kit — context injection, not enforcement

GitHub Spec Kit (github/spec-kit) introduces .specify/memory/constitution.md — a non-negotiable principles file loaded as LLM context before lifecycle commands. The naming is right and the idea is right. But it is context injection, not enforcement. The LLM reads the constitution and is expected to respect it. There is no code-level halt when it does not.

The gap is the combination: declarative layered rules merged deterministically, with a halt enforced in the executor, not in the prompt.

The Convergence

The layering problem has been solved in the infrastructure world since 2018: Kustomize overlays.

Kustomize addresses “I have a base config and I need per-environment corrections” with a deterministic rightmost-wins merge. Base defines defaults; overlays override specific fields; the result is reproducible and debuggable. Its design explicitly avoids any logic-driven merge — the merge is a pure structural operation.

Applied to agent policy, the shape is:

.agent/
  constitution.yaml    # human-written, committed, base invariants
  corrections.yaml     # per-run overrides, gitignored

The merge algorithm is simple: final_rules = deep_merge(constitution, corrections). Dicts merge recursively; scalars and lists are replaced by the rightmost value. No logic, no LLM involvement, no surprises.
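The merge rule above fits in a few lines. The following is a minimal sketch of such a deep_merge, written from the description in this article rather than copied from the library's source. The max_lines_per_file key and the "gratis" term are made up for the example.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], overlay: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with overlay merged onto base; the rightmost value wins."""
    merged = deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: recurse so untouched sibling keys survive.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are replaced wholesale, never combined.
            merged[key] = deepcopy(value)
    return merged


constitution = {
    "limits": {"max_files_per_commit": 50, "max_lines_per_file": 800},
    "brand": {"prohibited_terms": ["billig", "guenstigster"]},
}
corrections = {
    "limits": {"max_files_per_commit": 200},
    "brand": {"prohibited_terms": ["gratis"]},
}

final_rules = deep_merge(constitution, corrections)
print(final_rules["limits"])
print(final_rules["brand"]["prohibited_terms"])
```

The limit that the overlay does not mention keeps its base value, while the list is replaced as a whole. Neither input is mutated, so the same two layers always produce the same result.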

In the current implementation, corrections.yaml is written by hand by the developer for a specific run — for example, “this migration may touch 200 files, not 50”. An automated approval workflow, where Plan Mode proposes corrections and the user approves before they are applied, is a natural integration but lives outside the v0.1 library.

Then — the part missing from every current tool — the merged rules are enforced in Python, not in the prompt:

@halt_on_reject(constitution)
def write_local_business_schema(address: dict) -> None:
    legal = constitution.get("business.home_address")
    # Read both sides with .get() so a missing key ends in PolicyReject, not KeyError.
    found = address.get("addressLocality")
    required = legal.get("addressLocality")
    if found != required:
        raise PolicyReject(
            f"address mismatch: schema has {found!r}, "
            f"constitution requires {required!r}"
        )
    # ... actual write logic

PolicyReject propagates unconditionally. The LLM cannot instruct the Python runtime to skip a raise. This is the backstop that Plan Mode does not have.

Reference Implementation

I built this pattern as a small Python library: constitution-overlay — MIT licensed, ~300 lines, mypy --strict clean.

The public API has four symbols:

from constitution_overlay import (
    Constitution,        # loads and merges YAML/dict layers
    halt_on_reject,      # decorator factory — enforcement
    PolicyReject,        # exception type — halts the executor
    ConstitutionContext, # read-only view passed to fixer functions
)

Loading and merging

# From dicts
base = {
    "business": {
        "legal_name": "Arnold Wender",
        "home_address": {"addressLocality": "Halle (Saale)"},
    },
    "limits": {"max_files_per_commit": 50},
}
corrections = {"limits": {"max_files_per_commit": 200}}  # this run needs more

constitution = Constitution.from_layers(base, corrections)
constitution.get("limits.max_files_per_commit")  # 200
constitution.get("business.home_address.addressLocality")  # "Halle (Saale)"

# From files
constitution = Constitution.from_layers(
    "constitution.yaml",
    "corrections.yaml",
)

from_layers() accepts dicts, string paths, and Path objects interchangeably. Missing files raise FileNotFoundError immediately — no silent fallback.
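The get() calls above address nested values with dotted keys. The library's internals are not shown in this article, so the following is only a sketch of one plausible way to resolve such keys over a merged dict. The function name get_path is hypothetical.

```python
from typing import Any, Mapping


def get_path(rules: Mapping[str, Any], dotted: str, default: Any = None) -> Any:
    """Walk nested mappings along a dotted key; return default if a segment is absent."""
    node: Any = rules
    for segment in dotted.split("."):
        if not isinstance(node, Mapping) or segment not in node:
            return default
        node = node[segment]
    return node


rules = {
    "business": {"home_address": {"addressLocality": "Halle (Saale)"}},
    "limits": {"max_files_per_commit": 200},
}

print(get_path(rules, "limits.max_files_per_commit"))
print(get_path(rules, "business.home_address.addressLocality"))
print(get_path(rules, "brand.prohibited_terms", []))
```

Returning the default for a missing path is what lets a check like the prohibited-terms lookup run against a constitution that has no brand section at all.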

Enforcing

@halt_on_reject(constitution)
def commit_files(files: list[str]) -> None:
    limit = constitution.get("limits.max_files_per_commit")
    if len(files) > limit:
        raise PolicyReject(f"too many files: {len(files)} > {limit}")
    # proceed with commit

@halt_on_reject(constitution)
def write_copy(text: str) -> None:
    prohibited = constitution.get("brand.prohibited_terms", [])
    lowered = text.lower()
    # Lowercase both sides so a term listed as "Billig" still matches.
    hits = [t for t in prohibited if t.lower() in lowered]
    if hits:
        raise PolicyReject(f"prohibited brand terms: {hits}")
    # proceed with write

The decorator is a contract marker. PolicyReject propagates unconditionally — nothing in the decorator swallows it. Future versions will support pre-execution rule checks from the constitution directly, but v0.1 keeps it simple: checks live in the wrapped function, enforcement lives in the decorator.
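To illustrate what "contract marker" and "nothing swallows it" can mean in practice, here is a self-contained sketch with stand-ins for PolicyReject and halt_on_reject. It is an assumption about the shape of such a decorator, not the library's actual code. The policy_guarded attribute and the run_steps loop are hypothetical.

```python
import functools
from typing import Any, Callable


class PolicyReject(Exception):
    """Stand-in for the library's exception type."""


def halt_on_reject(constitution: Any) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Stand-in decorator factory: marks the function, never swallows PolicyReject."""

    def decorate(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Deliberately no try/except: a PolicyReject raised by func
            # reaches the caller unchanged.
            return func(*args, **kwargs)

        setattr(wrapper, "policy_guarded", True)  # hypothetical marker attribute
        return wrapper

    return decorate


def run_steps(steps: list[Callable[[], None]]) -> list[str]:
    """Hypothetical executor loop: the first PolicyReject ends the run."""
    log: list[str] = []
    for step in steps:
        try:
            step()
        except PolicyReject as exc:
            log.append(f"halted: {exc}")
            break  # no retry, no handing control back to the model
        log.append(f"ok: {step.__name__}")
    return log


rules = {"limits": {"max_files_per_commit": 2}}


@halt_on_reject(rules)
def small_commit() -> None:
    pass  # within the limit, nothing to reject


@halt_on_reject(rules)
def big_commit() -> None:
    raise PolicyReject("too many files: 3 > 2")


print(run_steps([small_commit, big_commit, small_commit]))
```

The third step never runs. The important design choice is on the caller's side: the loop must not catch PolicyReject and feed it back to the model as one more message to reason about.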

A realistic constitution.yaml

business:
  model: remote-services
  legal_name: "Arnold Wender"
  home_address:
    streetAddress: "Franckestrasse 3a"
    addressLocality: "Halle (Saale)"
    postalCode: "06110"
    addressCountry: "DE"
  telephone: "+49 345 6867 6857"
service_area:
  - "Bayern"
brand:
  prohibited_terms: ["billig", "guenstigster"]
constraints:
  - "LocalBusiness schema MUST use home_address as address; service_area goes in areaServed"
  - "City pages reference target locality in content only — never in NAP"
require_review:
  - "schema.LocalBusiness"
  - "url-impact:destructive"

The service_area list is what the website targets. The home_address is where the business actually operates. An LLM agent working on a Bayern-targeted site must not confuse them — the constitution makes that constraint explicit and enforceable.

Limitations

v0.1 is deliberately narrow. What it does not do:

No async support. Sync only. Async wrappers are planned for v0.2.
No constitution schema validation. The library merges whatever YAML you give it. JSON Schema validation of the constitution itself is on the roadmap.
No built-in review gate. The approval workflow (show the conflict, ask approve/reject/edit, write corrections.yaml) is outside the library. The library only handles merge and enforcement.
List merge is replace-only. Rightmost list wins. There is no append directive yet, which means prohibited_terms in an overlay replaces the base list rather than extending it. An append directive is on the roadmap.
Not a replacement for LangGraph, Claude Code, or Bedrock. It is an enforcement layer that sits next to them.
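Until an append directive exists, one workaround for the replace-only list merge is to build the overlay in code so that it carries the full list. The helper below is a hypothetical sketch and not part of the library. The name overlay_extending_list and the "gratis" term are illustrative.

```python
from typing import Any


def overlay_extending_list(base: dict[str, Any], dotted: str, extra: list[Any]) -> dict[str, Any]:
    """Build an overlay whose list at `dotted` is the base list plus `extra`."""
    segments = dotted.split(".")
    node: Any = base
    for segment in segments:
        node = node[segment]  # KeyError if the base has no such path
    if not isinstance(node, list):
        raise TypeError(f"{dotted!r} is not a list in the base layer")
    # Keep the base order and skip items the base list already contains.
    combined = node + [item for item in extra if item not in node]
    overlay: Any = combined
    for segment in reversed(segments):
        overlay = {segment: overlay}
    return overlay


base = {"brand": {"prohibited_terms": ["billig", "guenstigster"]}}
corrections = overlay_extending_list(base, "brand.prohibited_terms", ["gratis", "billig"])
print(corrections)
```

Because the overlay already contains the base terms, a replace-only merge of base and corrections ends with all three terms and loses none of the original ones.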

What I Would Like to See in Claude Code

The executor-side halt pattern belongs natively in Claude Code as a first-class primitive. Specifically:

A constitution.yaml in the project root (or .claude/) that Claude Code reads before any tool execution — not as context injection but as a hard constraint list. Before any Edit, Write, or Bash call, Claude Code checks the action against the constitution. Violation raises, not warns.

Corrections as a first-class concept: when Plan Mode identifies a conflict between a proposed action and the constitution, it writes a corrections.yaml with the proposed resolution. The executor reads that file, merges it on top of the constitution, and proceeds with the corrected action — or halts if the correction says decision: rejected.
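As a rough sketch of that flow, the executor could treat a correction as data and either patch the action or halt. Everything here is hypothetical: the article only names the decision: rejected value, so the "approved" value, the replace_args field, and the action layout are assumptions made for the example.

```python
from typing import Any


class PolicyReject(Exception):
    """Stand-in for an executor-level halt."""


def apply_correction(action: dict[str, Any], correction: dict[str, Any]) -> dict[str, Any]:
    """Return the action to execute, or halt if the correction does not approve it."""
    decision = correction.get("decision")
    if decision == "rejected":
        raise PolicyReject(f"correction rejected action {action.get('tool')!r}")
    if decision != "approved":
        # Fail closed: anything other than an explicit approval also halts.
        raise PolicyReject(f"unknown decision: {decision!r}")
    # "replace_args" is a hypothetical field carrying the corrected arguments.
    patched_args = {**action.get("args", {}), **correction.get("replace_args", {})}
    return {**action, "args": patched_args}


action = {"tool": "Write", "args": {"path": "schema.json", "addressLocality": "Muenchen"}}
approved = {"decision": "approved", "replace_args": {"addressLocality": "Halle (Saale)"}}

print(apply_correction(action, approved))
try:
    apply_correction(action, {"decision": "rejected"})
except PolicyReject as exc:
    print(f"halted: {exc}")
```

The model never sees a branch it can argue with: an approved correction changes the arguments deterministically, and every other outcome stops the run.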

The merge semantics are the same ones Kustomize has used since 2018: deterministic, rightmost wins, no LLM-driven merge logic. The result is reproducible and auditable.

The reference implementation is at github.com/arnoldwender/constitution-overlay. It is MIT licensed, has 69 tests, 98% coverage, and passes mypy --strict. If this is useful as a starting point for a TypeScript port or as a design reference, I am happy to help.

For a practical example of Python-based LLM tooling from an open-source contributor perspective, see the MemPalace case study.

Cite this work

Wender, A. (2026). constitution-overlay: Kustomize-style YAML rule layering with executor-side halt-on-reject for LLM agents (Version v0.1.0) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.19773588

ORCID iD: https://orcid.org/0009-0005-1750-818X
