1. Overview: The Missing Piece
An AI agent has only one foothold: the Claude API request. The API offers only three roles: system (passed as the top-level `system` parameter rather than inside `messages`), user, and assistant. The harness layer that carries the agent must cram all of its state, events, context, reminders, and constraints into these three roles; there is no fourth.
Claude Code is usually described as: "Claude API + tool use + MCP + skill + context compression." These items are indeed all present. And from an engineering density perspective, Claude Code has many other layers of design: hook mechanisms, plan mode, auto mode, team collaboration, context compression strategies, MCP integration... each worth discussing separately. But if you only look at those surface-level items, you'll miss its most essential layer—the one that most determines whether the model is "easy to handle": the <system-reminder> tag.
The so-called system-reminder is essentially a convention Claude Code establishes for itself:
- Wrap a piece of text in a pair of `<system-reminder>...</system-reminder>` tags;
- Insert it into the `messages` array as a `user` role message with meta markings;
- Tell the model in the system prompt: "When you see this tag, know it's automatically added system-side information bearing no causal relation to the preceding user message or tool result."
With just this convention, Claude Code gains a bypass channel: it can continuously infuse guidance and constraints into the model's mind without altering API role semantics or inventing new roles.
It determines whether the model still remembers at turn 50 that it is in plan mode, knows what day it is, keeps the todo list up to date, and avoids MCP tools that have already been invalidated. Users never see most system-reminders, yet the model reads them every turn. This article focuses on this layer alone: its definition, all of its producers (types, meanings, injection timing and position), pipeline post-processing, and consumption and filtering. The two closing appendices offer a glimpse of how a large vendor A/B tests its prompt engineering.
2. Definition and Syntax
2.1 The Tag Itself
Form:
The wrapper function always adds a `\n` before and after the content, so the newlines are part of the tag, not the content. When batch-wrapping, string content is wrapped as a whole; array content wraps only the text blocks where `type === 'text'` and leaves image, tool_result, and other blocks unchanged.
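The wrapping rule can be sketched as follows. This is a minimal illustration, not Claude Code's actual source: the function names and block types are hypothetical, and placing the newlines just inside the tags is one reading of the description above.

```typescript
// Hypothetical block types; only the fields used in this sketch are modeled.
type TextBlock = { type: "text"; text: string };
type OtherBlock = { type: "image" | "tool_result"; data?: unknown };
type ContentBlock = TextBlock | OtherBlock;
type Content = string | ContentBlock[];

// Wrap one string; the leading and trailing "\n" belong to the tag itself.
function wrapInSystemReminder(text: string): string {
  return `<system-reminder>\n${text}\n</system-reminder>`;
}

// Strings are wrapped whole; in arrays only text blocks are wrapped,
// while image / tool_result blocks pass through untouched.
function wrapContent(content: Content): Content {
  if (typeof content === "string") return wrapInSystemReminder(content);
  return content.map((block) =>
    block.type === "text"
      ? { ...block, text: wrapInSystemReminder(block.text) }
      : block,
  );
}
```

Calling `wrapContent` on an array with one text block and one image block returns the same array with only the text block's `text` wrapped.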
2.2 "Identification Prefix"
The entire Claude Code system shares one test: whether the text starts with `<system-reminder>`. The later steps (merging back into tool_result, UI stripping, transcript search, and telemetry sampling) all rely on it to decide whether a segment is a reminder.
To this end, Claude Code runs an idempotent wrapper over the tail segment of every attachment-type message: if some path forgot to add the reminder tags, they are patched in here. This guarantees that nothing slips through the net; otherwise the correctness of the later merge logic would suffer.
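A minimal sketch of such an idempotent wrapper, assuming the prefix test is a plain `startsWith` check. The name `ensureSystemReminder` is hypothetical and does not come from the Claude Code source.

```typescript
const REMINDER_OPEN = "<system-reminder>";
const REMINDER_CLOSE = "</system-reminder>";

// Wrap only when the text is not already reminder-prefixed, so applying
// the function twice gives the same result as applying it once.
function ensureSystemReminder(text: string): string {
  if (text.startsWith(REMINDER_OPEN)) return text;
  return `${REMINDER_OPEN}\n${text}\n${REMINDER_CLOSE}`;
}
```

Because the check and the wrapper share the same prefix, running this at the end of the pipeline is safe even for branches that already wrapped their output.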
2.3 Tag Morphology in Messages
Always appears as a user-role text block. There are two forms:
Form A: The entire user message is a single reminder (most common)
Or array notation:
Such messages carry an isMeta: true marker inside Claude Code, used for local UI filtering, without affecting the API request body sent out.
Form B: Merged into a tool_result's content (detailed in §5 below)
2.4 Recognition and Stripping
Recognition uses a simple regex, used in telemetry and similar scenarios:
UI-side stripping has two granularities:
- "Strip only the beginning": Used when rendering chat bubbles and copying to clipboard. Because reminders are often prepended to user messages, stripping the beginning reveals the text the user actually typed.
- "Full-text stripping": Used for transcript search. When restoring old sessions with `claude -c` (the continue subcommand), memory reminders may be inserted mid-message, so the beginning-only version is insufficient.
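The two stripping granularities might look like the sketch below. The function names and regular expressions are assumptions made for illustration; the article does not reproduce the real implementation.

```typescript
// One reminder block at the very start, plus any whitespace after it.
const LEADING_REMINDER = /^\s*<system-reminder>[\s\S]*?<\/system-reminder>\s*/;
// Any reminder block anywhere in the text.
const ANY_REMINDER = /<system-reminder>[\s\S]*?<\/system-reminder>\s*/g;

// "Strip only the beginning": peel off consecutive leading reminders.
function stripLeadingReminders(text: string): string {
  let out = text;
  while (LEADING_REMINDER.test(out)) {
    out = out.replace(LEADING_REMINDER, "");
  }
  return out;
}

// "Full-text stripping": remove every reminder, including mid-message ones.
function stripAllReminders(text: string): string {
  return text.replace(ANY_REMINDER, "").trim();
}
```

The leading version leaves a mid-message reminder in place, which is acceptable for chat bubbles but not for search indexing, where the full-text version applies.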
3. The "Constitution" on the Model Side — Declarations in the System Prompt
Claude Code's system prompt contains two declarations explicitly telling the model "how to understand <system-reminder> when you see it." Both are components of the system prompt—one for general agents, one for autonomous work agents.
First segment:
Second segment (appears in another system prompt items list):
Three key points in these two declarations:
- May appear in user messages or tool results — corresponding to Forms A and B in §2.3.
- "bear no direct relation to..." — explicitly tells the model "this text is not a reply to the preceding user message or tool result; do not read it as causal."
- "automatically added by the system" — the model should not echo this to the user's face, should not imitate it, should not repeat this to the user.
Additionally, specific templates contain many reinforcing phrases: "DO NOT mention this to the user explicitly because they are already aware", "Make sure that you NEVER mention this reminder to the user", "Don't tell the user this, since they are already aware". They repeatedly remind the model: "do not speak this aloud."
4. Producer Taxonomy: Who is Stuffing <system-reminder> into Messages
Below, categorized by semantics. Each category first presents a table listing Type / Meaning / Injection three items. The "Injection" column explains both Timing (when it triggers) and Position (where it enters the messages array), as both are equally important for understanding Harness behavior.
The "Type" column uses the corresponding attachment.type string from the code; for those without a corresponding type field but directly assembled by functions/constants, the function name or constant name is used. ${...} placeholders in templates are runtime variables from the code (e.g., ${filename}, ${planFilePath}), replaced with actual values at render time.
4.1 User Context Preloading
| Type | Meaning | Injection |
|---|---|---|
| `prependUserContext` | A set of key: value dictionaries packaged into one reminder. Main fields: `claudeMd` (full content of the project root CLAUDE.md) and `currentDate` (the "Today's date is ..." sentence). | Timing: runs before every API call; the message is reconstructed and inserted each time. Position: the very beginning of the entire messages array (fixed as message 0). The content barely changes during a session (except when the date turns over), which maximizes Anthropic-side prompt cache hits. |
Template:
Notes:
- Fields in the `context` dictionary aren't hardcoded; callers (main conversation / forked agent / companion, etc.) can decide which keys to include. The main conversation actually includes `claudeMd` and `currentDate`.
- The `claudeMd` field handles injection of the project root CLAUDE.md, but before filling this field it runs `filterInjectedMemoryFiles` to remove memory files already injected as attachments in this session (like `nested_memory` in §4.4 below), avoiding double injection. In other words, the project root CLAUDE.md goes through `prependUserContext.claudeMd` and subdirectory CLAUDE.md files go through the `nested_memory` attachment: two paths with mutually exclusive responsibilities.
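The two notes above can be sketched roughly as follows. The names `prependUserContext` and `filterInjectedMemoryFiles` come from the article, but the signatures, the message shape, and the `key: value` rendering are assumptions; the real template text is not reproduced here.

```typescript
// Hypothetical shapes; the real message and memory types are not shown in the article.
type MemoryFile = { path: string; content: string };
type UserMessage = { role: "user"; content: string; isMeta?: boolean };

// Drop memory files that were already injected as attachments this session.
function filterInjectedMemoryFiles(
  files: MemoryFile[],
  injectedPaths: ReadonlySet<string>,
): MemoryFile[] {
  return files.filter((file) => !injectedPaths.has(file.path));
}

// Rebuild the context reminder and place it at index 0 of the messages array.
function prependUserContext(
  messages: UserMessage[],
  context: Record<string, string>,
): UserMessage[] {
  const entries = Object.entries(context);
  if (entries.length === 0) return messages;
  // Placeholder rendering only: one "key: value" line per context field.
  const body = entries.map(([key, value]) => `${key}: ${value}`).join("\n");
  const reminder: UserMessage = {
    role: "user",
    content: `<system-reminder>\n${body}\n</system-reminder>`,
    isMeta: true,
  };
  return [reminder, ...messages];
}
```

Rebuilding the message on every call while keeping its content stable is what lets the prefix stay cacheable: message 0 is byte-identical across turns unless a field such as the date changes.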
4.2 @-mentioned Files and Directories
| Type | Meaning | Injection |
|---|---|---|
| `directory` | The user @-mentioned a directory in the input. Claude Code constructs a text-form "tool call + tool result" pair, packaging the `ls` description and the actual directory listing into reminders. | The turn where the @directory appears in user input; appended as two user messages after that turn's user message. |
| `file` | The user @-mentioned a file. Similarly constructs a pair of "file read + read result" text messages. Subtypes: text / image / notebook / pdf. A truncated text file gets an additional "truncation notice" reminder. | The turn where the @file appears; appended as two user messages after that turn's user message. |
| `edited_text_file` | A previously @-mentioned file was later changed manually by the user or automatically by a linter. The reminder tells the model "this change was intentional, don't revert". | Detected when the file's mtime changes and the file was referenced in the session; appended as one user message to the current turn's user message sequence. |
| `compact_file_reference` | After history compaction, the original file content is too large to keep, so only a placeholder reminder saying "was read before" remains. | During context compression, when the original file content is replaced; left as a user message in the compressed messages. |
| `pdf_reference` | PDFs over 10 pages can't be read at once; the reminder forces the model to use pagination params. | The turn where the large PDF is @-mentioned. |
Templates:
edited_text_file:
file subtype text when truncated, additional second message:
compact_file_reference:
pdf_reference:
Example: @directory / @file message structure
The special thing about these two branches: they don't just stuff text, but construct two user-role text messages, corresponding to "tool call description" and "tool result description" respectively, then wrap the whole thing in reminder. Note these "tool call/tool result" are not API schema-defined tool_use / tool_result structured blocks—they're ordinary user-role text literally stating:
Taking @src/ as example, what roughly enters messages are two adjacent messages like this (before Section 5's merge pipeline):
Effect: The model "thinks" the user wanted it to look at what's in src/ and sees the ls text result. But the whole process never actually went through the Bash tool; the narrative of a call was fabricated entirely at the text level.
4.3 IDE Integration
| Type | Meaning | Injection |
|---|---|---|
| `selected_lines_in_ide` | The user selected code lines in the IDE. Content over 2000 chars is truncated, with `\n... (truncated)` appended. | The turn after the IDE syncs the selection info; appended as one user message to the current turn's sequence. |
| `opened_file_in_ide` | The user opened a file in the IDE (no specific lines selected). | The turn after the IDE syncs the open event. |
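The truncation rule for `selected_lines_in_ide` is simple enough to sketch directly. The 2000-character limit and the `\n... (truncated)` suffix come from the table above; the function and constant names are hypothetical.

```typescript
const MAX_SELECTION_CHARS = 2000;
const TRUNCATION_SUFFIX = "\n... (truncated)";

// Keep at most MAX_SELECTION_CHARS characters of the selection and mark the cut.
function truncateSelection(content: string): string {
  if (content.length <= MAX_SELECTION_CHARS) return content;
  return content.slice(0, MAX_SELECTION_CHARS) + TRUNCATION_SUFFIX;
}
```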
Templates:
selected_lines_in_ide:
opened_file_in_ide:
4.4 Memory (CLAUDE.md / Project Memory / Personal Memory)
| Type | Meaning | Injection |
|---|---|---|
| `nested_memory` | A nested CLAUDE.md found in a project subdirectory. This only handles subdirectory CLAUDE.md files; the project root CLAUDE.md goes through `prependUserContext.claudeMd` in §4.1 above. The two are mutually exclusive and deduplicated by the harness. | Timing: injected when the session references a subdirectory containing a CLAUDE.md; once per newly discovered CLAUDE.md, tracked by a dedup Set so the same CLAUDE.md is not injected multiple times in one session. Position: inserted as an independent user message into the current turn's user message sequence. |
| `relevant_memories` | A set of personal/project memory files sorted by relevance, each memory as an independent user message. Each is prefixed with a header containing an "N days ago" age note. | Timing: pre-fetched by a background relevance retrieval task (which does not block the main turn) and injected in the corresponding turn on a hit. Position: several user messages, one per relevant memory, inserted into the current turn's sequence. |
| `memoryFreshnessNote` | An embedded reminder specific to file read tool results, appending a "may be stale" warning for memory files over 1 day old. | Timing: appended when the file read tool returns a memory file whose mtime is over 1 day old. Position: not a standalone message; embedded directly in that FileReadTool's tool_result content string. |
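Two mechanics from this table, the dedup Set for `nested_memory` and the one-day staleness check behind `memoryFreshnessNote`, can be sketched as below. Function names and signatures are assumptions; the note wording is passed in as a parameter because the real text from `memoryFreshnessText` is not reproduced here.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Return only the CLAUDE.md paths not injected yet, recording them as seen.
function takeNewMemoryPaths(paths: string[], seen: Set<string>): string[] {
  const fresh: string[] = [];
  for (const path of paths) {
    if (!seen.has(path)) {
      seen.add(path);
      fresh.push(path);
    }
  }
  return fresh;
}

// Append a staleness reminder to a file-read result when the memory file
// was last modified more than one day ago; otherwise return it unchanged.
function appendFreshnessNote(
  result: string,
  mtimeMs: number,
  nowMs: number,
  noteText: string,
): string {
  if (nowMs - mtimeMs <= DAY_MS) return result;
  return `${result}\n<system-reminder>\n${noteText}\n</system-reminder>`;
}
```

Note that the note is appended to the tool result string itself, so no separate user message is created for it.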
Templates:
nested_memory:
Each relevant_memories (header calculated and cached at attachment creation time, not recalculated per-turn with current time—for prompt cache stability):
memoryFreshnessNote (from memoryFreshnessText):
4.5 Skills
| Type | Meaning | Injection |
|---|---|---|
| `skill_listing` | Read-only list of all skills available this session (name + description, one per line). | Timing: a single-trigger mechanism. An in-process `sentSkillNames` Map tracks which skills have already been broadcast; the entire list is injected only once, on first appearance in the session, or when the skill set actually changes (plugin reload, skill file changes on disk). When an old session is restored with `claude -c` and the listing is already present in the transcript, the next injection is actively suppressed. It is not re-injected after compaction, because the ~4K tokens would all be spent on prompt cache creation with minimal benefit. Position: inserted as a user message into that injection turn's sequence (usually turn 0). |
| `invoked_skills` | Skills already invoked via the Skill tool this session, packaged with their full markdown content. It is not re-injected every turn, which would cost massive tokens per turn. Its real purpose is survival across compression events. | Timing: created as an attachment only when compaction occurs, joining the post-compaction message sequence. The purpose is to carry the full content of skills already invoked this session through the summarizer's merge without being swallowed. It is not re-injected in subsequent turns. When a session is restored with `claude -c`, the restore logic reads it to recover in-process skill state for future compactions. Position: in the user message generated by the compaction event, after the compaction output. |
| `skill_discovery` | Relevant skills matched via skill search (only name + description, not full content), prompting the model to "consider using it". | Timing: when the EXPERIMENTAL_SKILL_SEARCH feature is enabled and background retrieval hits a relevant skill that turn. Position: inserted as a user message into the current turn's sequence. |
| `dynamic_skill` | UI-only; produces no user messages at the API layer (the code explicitly returns an empty array). | (none) |
Templates:
skill_listing:
invoked_skills:
Where `skillsContent` is each skill concatenated in the following format, joined by `\n\n---\n\n`:
skill_discovery:
Notes:
- Initial skill loading doesn't go through a reminder; it goes through a Skill tool call, after which the skill's full content is returned to the model via `tool_result`. Afterwards the skill content lives in the `tool_result`, so no repeated injection is needed.
- The `invoked_skills` reminder's real purpose is survival across compression events. Normally, without compression, skill content lives in the `tool_result`; but once context compression triggers, the summarizer may summarize away `tool_result` details, causing subsequent turns to lose the skill guidelines. So while generating the compaction output, the compaction flow additionally stuffs in `invoked_skills`, carrying these skills' full markdown across the compression boundary; when an old session is restored with `claude -c`, this attachment also restores the in-process skill registry.
- `skill_listing` is restrained in a similar way: the entire list is ~4K tokens, so the code repeatedly emphasizes "inject only once, don't repeat it", preferring to trust that the model remembers the Skill tool schema and the `tool_result` of already-used skills.
- The `skill_discovery` branch only gives name + description; full skill content must be fetched via the Skill tool.
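The single-trigger behavior of `skill_listing` can be illustrated with a small sketch. It is a simplification under stated assumptions: the article describes `sentSkillNames` as a Map, while this sketch uses a plain Set; the `- name: description` line format is a placeholder; and only additions to the skill set are detected, not removals.

```typescript
type Skill = { name: string; description: string };

// Returns the listing text to inject, or null when every current skill
// has already been announced in this process.
function nextSkillListing(
  skills: Skill[],
  sentSkillNames: Set<string>,
): string | null {
  const hasUnsent = skills.some((skill) => !sentSkillNames.has(skill.name));
  if (!hasUnsent) return null;
  // Record everything we are about to announce so later turns stay silent.
  skills.forEach((skill) => sentSkillNames.add(skill.name));
  return skills
    .map((skill) => `- ${skill.name}: ${skill.description}`)
    .join("\n");
}
```

Restoring a session would amount to pre-populating the set from the transcript, which is one way to read the "actively suppresses next injection" behavior described above.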
4.6 Todo / Task Soft Reminders
| Type | Meaning | Injection |
|---|---|---|
| `todo_reminder` | A gentle nudge to use the Todo tool, with the current todo list attached. | Detected when the Todo tool has gone unused for a period and the current task appears to need tracking (the specific threshold is determined by internal logic); appended as a user message to the current turn's sequence. |
| `task_reminder` | Similar to the above, but for the new Task tool series (TaskCreate / TaskUpdate). | Same as above, but only when `isTodoV2Enabled()` returns true. |
Templates:
todo_reminder:
task_reminder:
Note: Both templates end with "NEVER mention this reminder to the user", the fixed closing for reminder-type prompts. It keeps the model from treating "the system told me to use TodoWrite" as something to say to the user, which would produce a meaningless turn with no tool call.
4.7 Mode Switching (Plan / Auto / Output Style / Brief)
These reminders are semantically the "heaviest": they are runtime mode switches, and injecting one is like equipping the model with a whole new set of behavioral guidelines.
| Type | Meaning | Injection |
|---|---|---|
| `plan_mode` (`reminderType=full`) | Enter plan mode: the model can only read files and write the plan file, and cannot edit code. Carries the full 5-phase workflow. | The turn the user enters plan mode; or the first time a re-declaration is needed after a session is restored. |
| `plan_mode` (`reminderType=sparse`) | Plan mode is already active but the session is deep: a minimal reminder that keeps the model from forgetting it is in plan mode. | Decided by the "sparse reminder" strategy when the session is deep. |
| `plan_mode_reentry` | Special guidance for re-entering plan mode after having exited it earlier. | When the user re-enters plan mode and a plan file already exists. |
| `plan_mode_exit` | Informs the model that it has exited plan mode and edits are allowed. | The turn the user exits plan mode. |
| `auto_mode` (`reminderType=full`) | Enter auto mode (continuous autonomous execution, fewer interruptions). | The turn the user enters auto mode. |
| `auto_mode` (`reminderType=sparse`) | Simplified version for deep sessions. | Same as the plan sparse reminder. |
| `auto_mode_exit` | Informs the model that it has exited auto mode and should return to the normal interaction rhythm. | The turn the user exits auto mode. |
| `output_style` | Declaration of the current output style (default / Explanatory / Learning, etc.). | Before every API request (except for the default style); as a user message. |
| `brief` (`/brief` command) | Brief mode on/off notice. | The turn the user toggles brief mode; not through an attachment, the string is concatenated directly into the message. |
Templates:
plan_mode (full, version without interview phase):
Supplementary note: The entire plan workflow has 5 phases. Phase 4 (the step that writes the final plan) has a separate A/B experiment in the code with 4 variants (CONTROL / TRIM / CUT / CAP), bucketed by a feature flag from the server. The expanded version above is the original text of the CONTROL group; for full experiment details, motivation, and monitoring metrics, see Appendix A.
plan_mode (sparse):
Where `workflowDescription` is `Follow iterative workflow: explore codebase, interview user, write to plan incrementally.` when the interview phase is enabled, otherwise `Follow 5-phase workflow.`
plan_mode_reentry:
plan_mode_exit (`planReference` is `The plan file is located at ${planFilePath} if you need to reference it.` when the plan file exists, otherwise empty):
auto_mode (full):
auto_mode (sparse):
auto_mode_exit:
output_style:
Brief mode on:
Brief mode off:
4.8 Sub-agents / Background Tasks
| Type | Meaning | Injection |
|---|---|---|
| `agent_mention` | Prompt triggered when the user writes @AgentName. | The turn whose user input contains the agent mention. |
| `task_status` (killed) | The user manually stopped a background agent. | The turn after the stop operation. |
| `task_status` (running) | The background task has not finished yet. The key point is to keep the model from spawning a duplicate agent, especially after compaction, when the original spawn message is no longer in messages. | Before every API request, when the corresponding background task is detected as running and its spawn message has already been compacted away. |
| `task_status` (completed / failed) | Background task results; tells the model how to read them. | The turn after the task completion event arrives. |
| `team_context` | Team collaboration mode, declaring the current agent's identity, team config path, task list path, etc. | Injected once at agent swarm initialization (whether later injections occur has not been verified; treated here as first-turn injection only). |
| `SHUTDOWN_TEAM_PROMPT` | Non-interactive mode, forcing the model to "shutdown sub-team before returning final response". | Timing: when a team still exists in non-interactive mode. Position: not through the attachment pipeline; the CLI print path assembles it directly into the user message sent to the model. |
Templates:
agent_mention:
task_status (killed):
task_status (running, with outputFilePath):
task_status (running, without outputFilePath):
task_status (completed / failed, with outputFilePath):
(Without `outputFilePath`, the last sentence becomes `You can check its output using the TaskOutput tool.` When `deltaSummary` is empty, no `Delta:` segment is attached.)
team_context (template embeds JSON code block, first line backticks escaped to avoid breaking outer rendering):
SHUTDOWN_TEAM_PROMPT (reminder segment + command sentence outside reminder in same user message, both sent together):
4.9 Hook Events (User-defined Shell Hooks)
| Type | Meaning | Injection |
|---|---|---|
| `hook_blocking_error` | A hook blocked this tool call with a non-zero exit code. | The turn after the hook returns a blocking exit code; appended as a user message to the current turn's sequence. |
| `hook_success` | A hook succeeded with extra output. Only for SessionStart / UserPromptSubmit events with non-empty content; otherwise a "hook success: Success" line every turn would pollute messages. | When both conditions above are met. |
| `hook_additional_context` | Context text actively appended by a hook through the `additionalContext` field. | The turn after the hook returns `additionalContext`. |
| `hook_stopped_continuation` | A hook actively blocked the current turn from continuing. | After the hook signals the interruption. |
| `async_hook_response` | Async hook completion callback; may carry `systemMessage` and `additionalContext`. | The turn after the async hook completion event arrives. |
| Async Stop Hook blocking | Async Stop hook blocking via dedicated path directly wrapped and enqueued. | When async Stop hook exits with blocking code; injected into next turn's messages via enqueuePendingNotification. |
Templates:
hook_blocking_error:
hook_success:
hook_additional_context:
hook_stopped_continuation:
Async Stop Hook blocking:
4.10 MCP / Dynamic Tools and Agent Registration
These reminder types serve one purpose: letting the model know immediately about runtime dynamic tool/agent ecosystem changes.
| Type | Meaning | Injection |
|---|---|---|
| `deferred_tools_delta` | Newly connected or disconnected deferred tools under the ToolSearch ecosystem. | When an MCP server's connection status changes, or the ToolSearch index updates; appended as a user message to the current turn's sequence. |
| `agent_listing_delta` | Incremental broadcast of sub-agent type registration and deregistration. The first injection includes a "concurrency hint". | When the agent list is first injected; afterwards, when additions or removals are detected. |
| `mcp_instructions_delta` | Usage instructions provided by MCP servers. | When an MCP server connection is established or dropped. |
| `mcp_resource` | The user @-mentioned an MCP resource: the resource content is converted to text/image blocks, with prompts added before and after. | The turn where the user @-mentioned the MCP resource. |
Templates:
deferred_tools_delta (adds/removes can appear simultaneously, segments joined by \n\n):
agent_listing_delta (first vs incremental headers differ):
mcp_instructions_delta:
mcp_resource (when has text content, item.text wrapped with two prompt texts):
mcp_resource (empty/binary/no displayable content, choose one of three):
Note: deferred_tools_delta's subtlety lies in its pairing with ToolSearchTool—it only broadcasts tool names, letting model know "these tools exist"; real tool schema queried on-demand via ToolSearch tool. This is classic on-demand tool schema loading, avoiding blowing up system prompt when new MCP tools added.
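A names-only delta of this kind could be computed as in the sketch below. The function names and the `ADDED:` / `REMOVED:` headings are placeholders invented for illustration; only the "names, not schemas" idea and the `\n\n` joiner come from the article.

```typescript
type ToolDelta = { added: string[]; removed: string[] };

// Compare the previously announced tool names with the current ones.
function computeToolDelta(
  previous: ReadonlySet<string>,
  current: ReadonlySet<string>,
): ToolDelta {
  const added = Array.from(current).filter((name) => !previous.has(name)).sort();
  const removed = Array.from(previous).filter((name) => !current.has(name)).sort();
  return { added, removed };
}

// Render the delta as reminder text; additions and removals are separate
// segments joined by a blank line. Returns null when nothing changed.
function renderToolDelta(delta: ToolDelta): string | null {
  const segments: string[] = [];
  // Placeholder headings; the real template wording is not shown here.
  if (delta.added.length > 0) segments.push(`ADDED:\n${delta.added.join("\n")}`);
  if (delta.removed.length > 0) segments.push(`REMOVED:\n${delta.removed.join("\n")}`);
  return segments.length > 0 ? segments.join("\n\n") : null;
}
```

Because only names travel in the reminder, its size grows with the number of changed tools rather than with the size of their schemas.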
4.11 Session Lifecycle and State Transitions
| Type | Meaning | Injection |
|---|---|---|
| `compaction_reminder` | Reminds the model that it has "unlimited context", reassuring it that it needn't rush to finish. | Injected before every request when the COMPACTION_REMINDERS feature is enabled. |
| `context_efficiency` | Prompt used when the HISTORY_SNIP feature is enabled, letting the model use the snip tool to trim oversized history content (e.g., folding a large Read result into a summary to save tokens). The reminder roughly guides the model to tighten context actively with snip when it inflates rapidly. The specific template text was not found in the code (it is located in a compiled js file). | Timing: when the HISTORY_SNIP feature is enabled, in turns where context is growing fast. Position: inserted as a user message into the current turn's sequence. |
| `date_change` | Notice when the date turns over; the model is told not to mention it to the user. | The turn in which today's date is detected to differ from the session start date. |
| `critical_system_reminder` | A "critical reminder" field declared in some agent definitions (an experimental field; code comments call it "Short message re-injected at every user turn"). Content is customized by the agent author. The real built-in use case is verificationAgent, which uses this reminder to make the verification sub-agent remember "verify only, cannot edit code, must end with PASS/FAIL/PARTIAL". | Timing: re-injected every user turn (as long as the current agent definition configures this field). Position: inserted as a user message into the current turn's sequence. |
| `ultrathink_effort` | The user requested that reasoning intensity be raised to a specified level. | The turn in which the user requests the corresponding reasoning level. |
| `companion_intro` | "Companion creature" identity intro in the Buddy feature. | At an appropriate time when Buddy is enabled. |
| `queued_command` | Wrapper for messages queued by the user or a subsystem while a turn is in progress (a prefix function branches by source into 4 types, then the outer reminder wrapper is applied). | When the queue is drained while a turn is in progress. |
| `verify_plan_reminder` | After plan execution completes, reminds the model to call VerifyPlanExecution for acceptance verification. | When plan item execution is detected as complete. |
| `plan_file_reference` | Brings the current plan file content into the conversation as a prompt. | Appropriate turns in plan mode. |
Templates:
compaction_reminder:
date_change:
ultrathink_effort:
companion_intro (from companionIntroText(name, species)):
queued_command 4 prefix types (by wrapCommandText(raw, origin) branching on origin.kind, outer unified reminder wrapper):
Source human / undefined (ordinary user typing):
Source task-notification:
Source coordinator:
Source channel:
verify_plan_reminder (toolName is VerifyPlanExecution when CLAUDE_CODE_VERIFY_PLAN === 'true', otherwise empty string):
plan_file_reference:
critical_system_reminder content defined by agent definition itself, built-in verificationAgent original text:
4.12 Token / Budget Statistics
| Type | Meaning | Injection |
|---|---|---|
| `token_usage` | Current accumulated token usage. | When the token statistics injection feature is enabled, every turn or by threshold. |
| `budget_usd` | USD budget consumption status. | When the budget injection feature is enabled. |
| `output_token_usage` | Per-turn and per-session output token statistics. | When the corresponding feature is enabled. |
Templates:
token_usage:
budget_usd:
output_token_usage (when budget !== null):
output_token_usage (when budget === null):
4.13 Diagnostics (IDE / LSP Diagnostics)
| Type | Meaning | Injection |
|---|---|---|
| `diagnostics` | The set of lint warnings or errors newly discovered by the IDE / LSP. | The turn in which a change in diagnostic count is detected. |
Template (reminder internally nests another <new-diagnostics> for model discrimination):
4.14 Side Question (Not via attachment, independent path)
| Type | Meaning | Injection |
|---|---|---|
| Side Question | User asks a "small question that doesn't interrupt main thread" during main task progress. Claude Code spawns lightweight forked agent, wrapping entire user question in reminder, strongly constraining it to answer directly, no tools, no promises of follow-up actions. | In that independent forked agent call when user triggers "side question", reminder + user original question as forked agent's only user message. |
Template (reminder closed followed by newline + user real question text ${question}):
Note: This branch is one of the strongest reminder intensities in the entire system—even "don't say Let me try..." phrasing-level restrictions written into constraints. Reason: side question forked agent has no tools available; if it says "I'll go check" it completely deadlocks.
4.15 Reminders Embedded in tool_result
These are not independent user messages but reminders manually concatenated into a tool result string. What the model sees is an "annotation appended to the end of the tool result".
| Type | Meaning | Injection |
|---|---|---|
| `FileReadTool` empty file | Read of a file with 0 lines. | When reading an empty file; embedded in that FileReadTool's tool_result string. |
| `FileReadTool` offset out of bounds | The specified offset exceeds the actual line count. | When reading out of bounds; embedded in the tool_result. |
| `CYBER_RISK_MITIGATION_REMINDER` | For suspected malicious code content, appends a "can analyze but cannot modify or improve it" constraint. | After a file read, appended for specific model sets (all models except certain ones); embedded at the end of the tool_result. |
| `memoryFreshnessNote` (see §4.4) | Staleness annotation for memory files in FileRead output. | Embedded in the tool_result when reading a memory file over 1 day old. |
Templates:
FileReadTool empty file:
FileReadTool offset out of bounds:
CYBER_RISK_MITIGATION_REMINDER (with \n\n before and \n after, full original text):
4.16 Branches Returning Empty (Types declared but produce no reminder at API layer)
For completeness: these `attachment.type` values explicitly return empty arrays in the code and are only meaningful at the UI layer:
- `already_read_file`, `command_permissions`, `edited_image_file`
- `hook_cancelled`, `hook_error_during_execution`, `hook_non_blocking_error`, `hook_system_message`, `hook_permission_decision`
- `structured_output`
- `dynamic_skill` (UI-only)
- Removed legacy types: `autocheckpointing`, `background_task_status`, `todo`, `task_progress`, `ultramemory`, etc.
5. Message Pipeline Post-processing: Smoosh — Folding Reminders Back into tool_result
Through §4, all producers output independent user messages. But before actually sending to API, Claude Code performs a key merge: folding <system-reminder>-prefixed text blocks into the last tool_result immediately preceding them.
5.1 Why Fold?
The code explains this as follows (translated from the original comment):
If a `tool_result` block is directly followed by a text block (even just a reminder), the underlying prompt serialization renders it as a `</function_results>\n\nHuman:` pattern. Having this pattern appear repeatedly mid-conversation teaches the model a bad habit: after tool calls, it emits an empty `Human:` prefix before ending the turn, wasting 3 tokens on an invalid turn. Internal A/B experiments show that without merging this behavior occurs ~92% of the time, and it drops to 0% after merging.
Simply put: without merging, reminders pollute the model's output habits.
5.2 Merge Rules
- Adjacent user messages are first merged into the same message;
- If the message contains both `tool_result` blocks and `<system-reminder>`-prefixed text blocks, fold these reminder texts into the last tool_result's `content` field;
- If `tool_result.content` is originally a string and the blocks to merge are all text, concatenate them as a string, joined by `\n\n`;
- If `tool_result.content` contains an experimental `tool_reference` block, don't merge; skip it;
- Error-state (`is_error: true`) tool_results are constrained by the API to text only, so filter out non-text blocks first, then concatenate;
- Other cases normalize to array form, and adjacent text blocks are then merged.
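The merge rules can be condensed into the following sketch. It is a simplified illustration rather than the real implementation: it operates on the blocks of one already-merged user message, the type and function names are hypothetical, and the final "merge adjacent text blocks" normalization is reduced to appending a single text block.

```typescript
type Text = { type: "text"; text: string };
type Image = { type: "image" };
type ToolReference = { type: "tool_reference" };
type Inner = Text | Image | ToolReference;
type ToolResult = {
  type: "tool_result";
  tool_use_id: string;
  content: string | Inner[];
  is_error?: boolean;
};
type Block = Text | ToolResult;

const isReminder = (block: Block): block is Text =>
  block.type === "text" && block.text.startsWith("<system-reminder>");

function smooshReminders(blocks: Block[]): Block[] {
  const results = blocks.filter((b): b is ToolResult => b.type === "tool_result");
  const target = results[results.length - 1]; // last tool_result, if any
  const extra = blocks.filter(isReminder).map((r) => r.text);
  if (target === undefined || extra.length === 0) return blocks;

  const original = target.content;
  let content: string | Inner[];
  if (typeof original === "string") {
    content = [original, ...extra].join("\n\n"); // string + text: concatenate
  } else if (original.some((b) => b.type === "tool_reference")) {
    return blocks; // experimental tool_reference: do not merge
  } else if (target.is_error) {
    // Error results are text-only: drop non-text blocks, then concatenate.
    const texts = original.filter((b): b is Text => b.type === "text").map((b) => b.text);
    content = [...texts, ...extra].join("\n\n");
  } else {
    const folded: Text = { type: "text", text: extra.join("\n\n") };
    content = [...original, folded]; // array form
  }
  return blocks
    .filter((b) => !isReminder(b))
    .map((b) => (b === target ? { ...target, content } : b));
}
```

After this pass, a reminder that used to sit next to a tool_result as a sibling text block lives inside the tool_result's `content`, so the serialized prompt no longer shows a tool result immediately followed by a bare text turn.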
5.3 Before and After Comparison
Before merge (two user messages already merged adjacent, but reminder still independent text block):
After merge (reminder folded into tool_result.content string):
5.4 Two Related Guardrails
- tool_result fronting: `tool_result` blocks in a user message must come first, otherwise the API errors with "tool result must follow tool use". The merge process moves them to the front.
- Error result content sanitization: old sessions with images or other non-text blocks stuffed into `is_error: true` tool_results would crash with an API 400 on restore. The read side applies unconditional sanitization, filtering down to pure text.
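Both guardrails are small transformations, sketched here under the same caveats as before: names and types are hypothetical and only the behavior described in the two bullets is modeled.

```typescript
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image" };
type ResultPart = TextPart | ImagePart;
type ToolResultBlock = {
  type: "tool_result";
  tool_use_id: string;
  is_error?: boolean;
  content: string | ResultPart[];
};
type MessageBlock = TextPart | ToolResultBlock;

// Fronting: move tool_result blocks ahead of everything else while
// keeping the relative order inside each group.
function hoistToolResults(blocks: MessageBlock[]): MessageBlock[] {
  const results = blocks.filter((b) => b.type === "tool_result");
  const others = blocks.filter((b) => b.type !== "tool_result");
  return [...results, ...others];
}

// Sanitization: an error result may only carry text, so drop other parts.
function sanitizeErrorResult(result: ToolResultBlock): ToolResultBlock {
  if (!result.is_error || typeof result.content === "string") return result;
  return { ...result, content: result.content.filter((p) => p.type === "text") };
}
```

Running the sanitizer unconditionally on the read side means a transcript written by an older version cannot reintroduce the invalid shape when it is restored.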
5.5 Why the "Identification Prefix" Idempotent Wrapper is Needed
The merge logic relies on "whether the text starts with `<system-reminder>`" to decide whether to fold it into a tool_result. So every attachment branch must wrap its text content in reminder tags; otherwise the text slips through and becomes exactly the kind of sibling text block that teaches the bad habit. The idempotent wrapper running at the pipeline tail is the final catch-all step that fills any gaps.
6. Consumers: Who Reads These Reminders
6.1 The Model (The Only "Serious" Reader)
- The system prompt has already told it "this is narration": don't echo it, and don't treat it as the user's own words.
- Most specific templates additionally write "DO NOT mention this to the user explicitly", "NEVER mention this reminder to the user", "Don't tell the user this" for repeated reinforcement.
- Whether model reads well is the ceiling of Agent engineering.
6.2 UI Rendering Chat Bubbles
In Claude Code's interface, every user and assistant message bubble is rendered by frontend components from the messages data source. The problem: from the model's perspective, a user message's complete text often looks like:
Rendering such text directly into chat bubbles would show users piles of incomprehensible English narration. So UI components feed the message text to `stripSystemReminders` before rendering, stripping the consecutive `<system-reminder>...</system-reminder>` blocks at the beginning one by one and keeping only the user's real input that follows. Stripping only the beginning suffices because the producer-side convention is that reminders are always prepended to the front of a user message.
6.3 Copy to Clipboard
Claude Code supports copying messages to the clipboard (e.g., for sharing elsewhere). Copying uses the same stripping function, with rules consistent with UI rendering: strip only the reminder blocks at the beginning of the message and copy out the text the user actually typed. Outside of debugging scenarios, users don't need to see reminders.
6.4 Transcript Search
Claude Code preserves session history as a transcript and lets users later search for "what did I say before" or "what did Claude say before". Search should hit human-readable content, so reminders must also be cleared from the text before indexing or matching. But transcript search uses a harsher version than UI rendering: not "strip only the beginning" but "strip in a loop over the full text".
Reason: when users restore old sessions with claude -c, certain reminders (such as memory updates) are inserted mid-message rather than at the beginning, so stripping only the beginning leaves residue. The full-text version used by transcript search therefore iterates over the entire text, cutting each <system-reminder>...</system-reminder> it finds until none remain.
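The "find one, cut one, until none remain" loop might look like the sketch below. The function name stripAllReminders is hypothetical, and leaving an unterminated tag in place is an assumption about the desired behavior.

```typescript
// Remove every <system-reminder>...</system-reminder> block anywhere in the text.
function stripAllReminders(text: string): string {
  const open = "<system-reminder>";
  const close = "</system-reminder>";
  let out = text;
  for (;;) {
    const start = out.indexOf(open);
    if (start === -1) break;
    const end = out.indexOf(close, start + open.length);
    if (end === -1) break; // unterminated tag: leave the remainder as-is
    out = out.slice(0, start) + out.slice(end + close.length);
  }
  return out;
}
```

Each pass removes one complete block and shortens the string, so the loop always terminates.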
6.5 Telemetry
Telemetry refers to the usage behavior and performance metrics collected by Claude Code, e.g., how many bytes per turn are reminders, which reminder types are injected most frequently, the prompt cache hit rate, and the impact of various reminders on output tokens. These metrics are reported to the R&D team via anonymized session tracking and support A/B experiments and long-term statistics (the "A/B experiment codenames" that appear throughout this article are pulled from such telemetry).
In this scenario, telemetry needs to extract reminder content separately for classification and counting; without extraction it can only count entire messages and cannot distinguish how many bytes in a turn are the user's own words versus system injection. So a telemetry-specific helper function uses a simple regex:
The entire message text is matched against it: if the whole segment is exactly one reminder, the inner text is extracted and classified as "system injection"; otherwise it is counted as ordinary user/model content in another bucket.
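To make the classification concrete, here is a sketch built around an illustrative pattern. The regex below is an assumption written for this example, not the pattern used in Claude Code, and the bucket names and the function classifyForTelemetry are likewise hypothetical.

```typescript
// Hypothetical pattern: the whole string must be exactly one reminder block.
// The inner group refuses to cross a closing tag, so two blocks back to back
// do not count as a single reminder.
const WHOLE_REMINDER =
  /^<system-reminder>((?:(?!<\/system-reminder>)[\s\S])*)<\/system-reminder>$/;

type Classified =
  | { bucket: "system_injection"; text: string }
  | { bucket: "content"; text: string };

function classifyForTelemetry(text: string): Classified {
  const match = WHOLE_REMINDER.exec(text.trim());
  if (match) {
    return { bucket: "system_injection", text: (match[1] ?? "").trim() };
  }
  return { bucket: "content", text };
}
```

With this sketch, a text block consisting solely of one reminder lands in the system_injection bucket with its inner text extracted, while anything that mixes a reminder with other text is counted as ordinary content.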
7. Design Philosophy: Guidance and Constraints
From start to finish, <system-reminder> embodies one thing: guidance and constraints for large models. Reading through the whole chain, four specific practices can be summarized:
- Establish a bypass channel the model can recognize. The API has only three roles and cannot open a new one for "system injection", so Claude Code uses a pair of XML-style tags plus two declarations in the system prompt to teach the model that "seeing this prefix means narration, not causality".
- Repeatedly inject at key nodes to maintain state. Plan mode, Auto mode, Output Style, skill invocation guidelines: told once, the model forgets them; re-injecting every turn is what makes "currently in plan mode" truly persist for 50 turns.
- Make reminder bytes "cacheable". Template fields prone to per-turn jitter are actively frozen (e.g., the memory header's "N days ago" is calculated and cached at attachment creation time), which keeps Date.now() from punching through the prompt cache and makes per-turn reminders allies of the prompt cache rather than enemies.
- Correct side effects. A misplaced reminder (as a sibling of tool_result) teaches the model bad habits, making it emit an extra empty Human: turn. The smoosh merge mechanism eliminates this drift at the root, so that continuously injecting reminders does not itself accumulate toxicity.
Together, these four are what <system-reminder> really does: it is not just a tag, but a complete engineering practice for continuously guiding and constraining large models.
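The third practice, freezing jitter-prone fields, can be illustrated with a small sketch. Everything here is hypothetical: the MemoryAttachment shape, the function names, and the header wording are assumptions chosen to show the idea of computing "N days ago" once and storing the resulting string.

```typescript
type MemoryAttachment = { path: string; header: string };

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Compute the age label once, at attachment creation time, and store the string.
function createMemoryAttachment(
  path: string,
  modifiedAtMs: number,
  nowMs: number,
): MemoryAttachment {
  const days = Math.max(0, Math.floor((nowMs - modifiedAtMs) / MS_PER_DAY));
  return { path, header: `Memory file ${path} (updated ${days} days ago)` };
}

// Rendering only reads the frozen header; it never consults the clock again,
// so the bytes sent on later turns stay identical.
function renderMemoryReminder(attachment: MemoryAttachment): string {
  return `<system-reminder>\n${attachment.header}\n</system-reminder>`;
}
```

Had the renderer called Date.now() itself, the header could change between turns, and any prompt cache entry keyed on the earlier bytes would stop matching from that message onward.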
Conclusion
If your previous impression of Claude Code was "Claude API + tool use + MCP + skill + context compression", I hope this article showed you one more layer: before every API request actually goes out, the messages array has long been quietly filled with various <system-reminder>s. They are the nearly invisible skeleton keeping model behavior stable long-term.
Appendices
A. Phase 4 A/B Experiment
Main text §4.7 mentioned that Plan mode's Phase 4 (the "write final plan" step) has 4 A/B variants in the code. This appendix lays out the experiment details as a small window into how big tech runs prompt engineering experiments.
Experiment codename: tengu_pewter_ledger, bucketed by an Anthropic server-side feature flag. Return values: null (CONTROL, the control group, which is also the default/fallback), 'trim', 'cut', 'cap'. Which Phase 4 variant a user sees therefore depends on server-side assignment rather than local config; because null is the fallback, statistically most users see CONTROL.
4 variants from loose to strict:
- CONTROL: The original control-group text (expanded in the main text); requires a Context background section, a detailed file list, and an end-to-end verification description.
- TRIM: Context compressed to one line; the verification section becomes "one verification command".
- CUT: Prohibits writing a Context/Background section outright and adds "good plans usually under 40 lines, fluff is padding".
- CAP: The strictest version: prohibits prose sections, allows only one line per file description, and imposes a hard 40-line limit.
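The mapping from flag value to variant can be sketched as below. The function name resolvePhase4Variant is hypothetical, and treating unrecognized values the same as null is an assumption; the source only states that null is the default/fallback.

```typescript
type Phase4Variant = "CONTROL" | "TRIM" | "CUT" | "CAP";

// flagValue is whatever the server-side flag returned: null, 'trim', 'cut', or 'cap'.
function resolvePhase4Variant(
  flagValue: string | null | undefined,
): Phase4Variant {
  switch (flagValue) {
    case "trim":
      return "TRIM";
    case "cut":
      return "CUT";
    case "cap":
      return "CAP";
    default:
      // null, undefined, or an unrecognized value falls back to the control group.
      return "CONTROL";
  }
}
```

Defaulting to CONTROL means a client that cannot reach the flag service still gets the original Phase 4 text.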
Experiment motivation (from the CONTROL group baseline): the code comments include baseline statistics from ~26.3 million sessions over 14 days (as of 2026-03-02):
- Plan file length: median 4,906 chars, p90 11,617 chars, mean 6,207 chars.
- 82% of sessions use Opus 4.6.
- Rejection rate increases monotonically with plan size: ~20% of plans under 2K chars are rejected by users, rising to 50% for plans over 20K chars.
This monotonic relationship is the starting point for the TRIM/CUT/CAP experiments: the shorter the plan, the more readily users accept it. The experiment therefore asks: "can we compress the Phase 4 instructions so that the model writes shorter plans on its own, without losing quality?"
Monitoring metrics (from code comments):
- Primary metric: Session-level average cost (Anthropic internal metric codename fact__201omjcij85f). Cost rather than plan length is the primary metric because the Opus output price is 5x the input price, so money spent tracks bytes output and reflects overall overhead better than plan length does.
- Mechanism variable: Plan file character count recorded at plan exit (planLengthChars). The comment specifically warns that the CAP variant may make the plan file shorter while total output actually increases because of repeated write→count→edit cycles, so looking only at plan length is misleading.
- Guardrail metrics: User thumbs-down (feedback-bad) ratio, requests per session (a plan that is too thin means the implementation phase needs more turns), and tool error rate. The three guardrails prevent compression for its own sake from destroying quality.
At the time of writing, the experiment was still running: the code for all 4 variant branches was still online, the server was still bucketing by config, and there was no evident convergence on a single winner.
B. Interview Phase: The Alternative Plan Path for Anthropic Employees
The Plan mode expansion in main text §4.7 shows the 5-phase workflow, which is the main flow for external users. The code also contains another complete Plan workflow, codenamed interview phase. It is not a step within the 5-phase workflow but an alternative to it: once Plan mode is activated, Claude Code takes either the 5-phase path or the interview phase path, and the two are mutually exclusive.
How the path is chosen: the function isPlanModeInterviewPhaseEnabled() decides by priority:
- Anthropic employees (the code checks the env var USER_TYPE === 'ant'): interview phase is always enabled.
- Otherwise, check the env var CLAUDE_CODE_PLAN_MODE_INTERVIEW_PHASE (an explicit user toggle).
- Otherwise, check the feature flag tengu_plan_mode_interview_phase.
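The priority order above can be sketched as a small function. This is an illustration under stated assumptions, not the body of isPlanModeInterviewPhaseEnabled(): the environment is passed in as a plain object, the flag lookup is a caller-supplied callback, and the accepted toggle values ("1"/"true" and "0"/"false") as well as the rule that an explicit false overrides the feature flag are guesses, since the source does not specify how the env var is parsed.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical parsing of the explicit toggle; unset or unrecognized means "no opinion".
function parseToggle(value: string | undefined): boolean | undefined {
  if (value === undefined) return undefined;
  const v = value.trim().toLowerCase();
  if (v === "1" || v === "true") return true;
  if (v === "0" || v === "false") return false;
  return undefined;
}

function isInterviewPhaseEnabledSketch(
  env: Env,
  getFlag: (name: string) => boolean,
): boolean {
  // 1. Anthropic employees always get the interview phase.
  if (env["USER_TYPE"] === "ant") return true;
  // 2. An explicit user toggle comes next.
  const explicit = parseToggle(env["CLAUDE_CODE_PLAN_MODE_INTERVIEW_PHASE"]);
  if (explicit !== undefined) return explicit;
  // 3. Otherwise defer to the server-side feature flag.
  return getFlag("tengu_plan_mode_interview_phase");
}
```

Passing env and getFlag as parameters keeps the sketch testable without touching the real process environment or a flag service.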
Because employees are forced onto the interview phase, while the Phase 4 A/B experiment (see Appendix A) runs only on the 5-phase workflow, the interview phase is naturally isolated from the experiment and serves as a reference group in an observational design.
Full template (from getPlanModeInterviewInstructions; placeholders like ${planFilePath} are replaced at runtime; the outermost layer is wrapped in <system-reminder>):
Here ${planFileInfo} uses the same conditional copy as the placeholder of the same name in the 5-phase workflow: when the plan file exists it expands to "A plan file already exists at ${planFilePath}. You can read it and make incremental edits using the Edit tool.", and when it does not exist it expands to "No plan file exists yet. You should create your plan at ${planFilePath} using the Write tool.".
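The conditional expansion can be written as a one-branch helper. The two strings are the ones quoted above; the function name buildPlanFileInfo and the boolean parameter are hypothetical, and how the real code detects whether the plan file exists is not covered here.

```typescript
// Build the text substituted for ${planFileInfo}, given whether the plan file exists.
function buildPlanFileInfo(planFilePath: string, planExists: boolean): string {
  return planExists
    ? `A plan file already exists at ${planFilePath}. You can read it and make incremental edits using the Edit tool.`
    : `No plan file exists yet. You should create your plan at ${planFilePath} using the Write tool.`;
}
```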