<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>JSON-Schema on Mikhail Shogin</title><link>https://mshogin.com/tags/json-schema/</link><description>Recent content in JSON-Schema on Mikhail Shogin</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>Mikhail Shogin</copyright><lastBuildDate>Fri, 22 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://mshogin.com/tags/json-schema/index.xml" rel="self" type="application/rss+xml"/><item><title>The Discipline of Agent Pipelines: Teaching Agents to Stop</title><link>https://mshogin.com/blog/agent-pipeline-discipline/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://mshogin.com/blog/agent-pipeline-discipline/</guid><description>&lt;img src="https://mshogin.com/blog/agent-pipeline-discipline/cover.en.svg" alt="Featured image of post The Discipline of Agent Pipelines: Teaching Agents to Stop" /&gt;&lt;h2 id="in-this-article"&gt;In this article
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class="link" href="#where-autonomy-breaks" &gt;Where autonomy breaks&lt;/a&gt;: silent repair and hidden drift&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="#a-concrete-failure" &gt;A concrete failure&lt;/a&gt;: how the pipeline closed a task on diverging reports&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="#solution-1-json-contract-between-phases" &gt;Solution 1&lt;/a&gt;: JSON contract between phases&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="#solution-2-independent-verify" &gt;Solution 2&lt;/a&gt;: independent VERIFY outside the agent perimeter&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="#solution-3-model-specialization-via-router" &gt;Solution 3&lt;/a&gt;: router and five models for five phases&lt;/li&gt;
&lt;li&gt;&lt;a class="link" href="#principles" &gt;Principles&lt;/a&gt; and &lt;a class="link" href="#closing" &gt;Closing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2 id="where-autonomy-breaks"&gt;Where autonomy breaks
&lt;/h2&gt;&lt;p&gt;Technical debt is the kind of work nobody has time for. It looks like the perfect target for automation: split the roles (PM, system analyst, developer, QA) and run an autonomous loop over the ticket queue.&lt;/p&gt;
&lt;p&gt;On paper, it works. In practice, autonomy turns into silent repair.&lt;/p&gt;
&lt;p&gt;The agent hits a problem in its environment — and does not report a block. Instead it starts editing neighboring code, poking config files, &amp;ldquo;optimizing&amp;rdquo; things it was not asked to touch. The output is &amp;ldquo;Done&amp;rdquo;, but half the environment has drifted away from the spec.&lt;/p&gt;
&lt;p&gt;I call this hidden drift. Each individual edit looks like a local improvement. Stacked together, they become a silent repair nobody requested — and one that is hard to roll back, because the Git diff stopped representing a single task long ago.&lt;/p&gt;
&lt;h2 id="a-concrete-failure"&gt;A concrete failure
&lt;/h2&gt;&lt;p&gt;Today the pipeline closed a task as DONE. The developer agent did the work, ran &lt;code&gt;check_command&lt;/code&gt;, got &lt;code&gt;exit=0&lt;/code&gt;, reported &lt;code&gt;passed=true&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;An independent parallel check returned &lt;code&gt;exit=1&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Nobody lied. &lt;code&gt;check_command&lt;/code&gt; contained a &lt;code&gt;test&lt;/code&gt; pattern that behaves differently on linux and macOS. The build artifact turned out to be incompatible with the verification environment. The agent saw &amp;ldquo;success&amp;rdquo; from its own vantage point. From another vantage point — failure.&lt;/p&gt;
&lt;p&gt;This one case shapes the entire architecture downstream. If we trust the agent&amp;rsquo;s self-report, we trust a slice of the environment it happens to be in. And the environment is a variable.&lt;/p&gt;
&lt;h2 id="solution-1-json-contract-between-phases"&gt;Solution 1: JSON contract between phases
&lt;/h2&gt;&lt;p&gt;In early iterations, phases talked to each other in free-form text. The analyst described a plan, the developer read it and interpreted. Sounds like a normal human process. In practice, it produces three problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the model invents fields nobody asked for&lt;/li&gt;
&lt;li&gt;the next phase has to parse free-form text&lt;/li&gt;
&lt;li&gt;&amp;ldquo;I thought about it and decided to do a bit more&amp;rdquo; is a real phrase in our logs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We moved to a JSON schema between every pair of phases. Constrained generation via tool_use forces the model to fill exactly the fields the next phase expects, and nothing extra.&lt;/p&gt;
&lt;div class="mermaid"&gt;
stateDiagram-v2
[*] --&gt; PLAN
PLAN --&gt; SPEC: JSON plan
SPEC --&gt; DEV: JSON spec
DEV --&gt; QA: JSON diff + report
QA --&gt; VERIFY: JSON review
VERIFY --&gt; DONE: independent pass
VERIFY --&gt; BLOCKED: mismatch
PLAN --&gt; BLOCKED: cannot plan
SPEC --&gt; BLOCKED: ambiguity
DEV --&gt; BLOCKED: env issue
QA --&gt; BLOCKED: review fail
BLOCKED --&gt; [*]: handover to human
DONE --&gt; [*]
&lt;/div&gt;
&lt;p&gt;Each transition is a structural artifact bound to the schema. No &amp;ldquo;free creativity&amp;rdquo;. Want to add a thought — there is no field for it in the schema, so the thought does not propagate. Free-form text stays in one place only — human-facing comments, which never feed the next step.&lt;/p&gt;
&lt;div class="mermaid"&gt;
graph LR
A[Issue] --&gt;|JSON: goal, constraints| B[PLAN]
B --&gt;|JSON: steps, owners| C[SPEC]
C --&gt;|JSON: contract, acceptance| D[DEV]
D --&gt;|JSON: diff, check_command, passed| E[QA]
E --&gt;|JSON: review verdict| F[VERIFY]
F --&gt;|JSON: status| G[DONE or BLOCKED]
&lt;/div&gt;
&lt;p&gt;The point: the schema is not the &amp;ldquo;nicest format&amp;rdquo; for humans, it is the format the next phase needs. If reviewers want to see something else, render a human-readable view from the same JSON. Do not blend the two channels.&lt;/p&gt;
&lt;h2 id="solution-2-independent-verify"&gt;Solution 2: independent VERIFY
&lt;/h2&gt;&lt;p&gt;From the &lt;code&gt;check_command&lt;/code&gt; story I extracted a hard rule: verification must live outside the agent.&lt;/p&gt;
&lt;div class="mermaid"&gt;
graph TB
subgraph agentenv ["Agent environment"]
A1[Developer agent]
A2[Local config]
A3[Cache, env vars]
A4[Build artifact]
A1 --- A2
A1 --- A3
A1 --- A4
end
subgraph verifyenv ["Verify environment"]
V1[Fresh clone]
V2[Clean container]
V3[Independent check_command]
V1 --- V2
V2 --- V3
end
A1 --&gt;|reports passed=true| C{Compare}
V3 --&gt;|independent exit code| C
C --&gt;|match| OK[DONE]
C --&gt;|mismatch| BL[BLOCKED]
&lt;/div&gt;
&lt;p&gt;On the left — the agent&amp;rsquo;s environment. It runs there: it knows its dependencies, local configs, caches, env vars. It can do whatever it wants. The point is, it reports a result.&lt;/p&gt;
&lt;p&gt;On the right — the verify environment. Fresh clone, clean container, no artifacts from the agent, no pre-built binaries. It runs the same &lt;code&gt;check_command&lt;/code&gt; from an independent point. If the two results disagree — the task moves to BLOCKED.&lt;/p&gt;
&lt;p&gt;This is not more expensive than trusting. One extra verification run costs far less than a silent close with hidden drift. The audit of those closes will happen anyway — just a month later, with the context already lost.&lt;/p&gt;
&lt;h2 id="solution-3-model-specialization-via-router"&gt;Solution 3: model specialization via router
&lt;/h2&gt;&lt;p&gt;One universal agent for everything is an anti-pattern. Every phase has its own load profile, and a universal model either overpays or underdelivers.&lt;/p&gt;
&lt;div class="mermaid"&gt;
graph LR
IN[Phase request] --&gt; R{Local router}
R --&gt;|structural JSON| M1[Planner]
R --&gt;|deep reasoning| M2[Reasoning specialist]
R --&gt;|long document| M3[Long-context]
R --&gt;|short patch| M4[Executor]
R --&gt;|simple reply| M5[Fast chat]
M1 --&gt; OUT[Structural artifact]
M2 --&gt; OUT
M3 --&gt; OUT
M4 --&gt; OUT
M5 --&gt; OUT
&lt;/div&gt;
&lt;p&gt;Five roles for five load types:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;planner&lt;/strong&gt; — structural decisions, sequencing and precision over JSON&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;reasoning specialist&lt;/strong&gt; — heavy justification, architectural reasoning&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;long-context&lt;/strong&gt; — summarizing documents and long tickets&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;executor&lt;/strong&gt; — short patches, routine operations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;fast chat&lt;/strong&gt; — conversational replies, simple answers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The router looks at the phase type and artifact complexity, and routes the call to the matching model. Locally, with no lock-in to a single provider. This is about cost, and also about the fact that different models are good at different things — there is no point producing a structural plan with a model tuned for long-form dialogue.&lt;/p&gt;
&lt;h2 id="principles"&gt;Principles
&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Contract discipline beats generation speed.&lt;/strong&gt; Free-form text between phases is an invitation to drift.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Do not trust self-report.&lt;/strong&gt; VERIFY lives outside the agent perimeter.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Stop explicitly.&lt;/strong&gt; BLOCKED is a status, not a fallback. If something does not add up — the task waits for a human. It does not close as &amp;ldquo;Done&amp;rdquo;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Specialization beats universality.&lt;/strong&gt; A pipeline of specialized models is cheaper and more stable than one large model doing everything.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id="closing"&gt;Closing
&lt;/h2&gt;&lt;p&gt;Agent autonomy is not freedom. It is a contract. The tighter the contract and the more independent the verification, the less drift and the fewer silent closes.&lt;/p&gt;
&lt;p&gt;Knowing when to stop is a skill. A pipeline that can say &amp;ldquo;I don&amp;rsquo;t know&amp;rdquo; is safer than a pipeline that always says &amp;ldquo;done&amp;rdquo;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Architectural question:&lt;/strong&gt; where in your process is the agent&amp;rsquo;s self-report currently treated as truth, and what is the smallest step that would put an independent check next to it?&lt;/p&gt;
&lt;/blockquote&gt;</description></item></channel></rss>