If you've built anything on the Claude API / OpenAI API, you've probably hit the same wall: once the context grows past a certain point, either you get a hard prompt-too-long rejection, or cache misses torch your bill, or the model's "memory" starts degrading.
Most people hand-roll a simple compact layer: drop a chunk of history at some threshold, or splice in a summary. Claude Code does it differently. It isn't one mechanism but a five-level pipeline. Each level handles a different scenario, at a different cost, with a different fallback moment, and every level ultimately lands on the same place: the messages array sent to the API.
This post takes that pipeline apart level by level.
From the Anthropic API's point of view, all "context management" boils down to one client-side decision: which set of messages to send this turn.
The API doesn't know about "your session history" or "how many times you've compacted"; it only knows the messages array in this request's payload. Each message is user or assistant, and content may be a string or a mix of text / tool_use / tool_result / thinking blocks.
Claude Code's context management, plainly put, is a stack of client-side layers that each modify this array before the result is sent to the API. In rare cases it also attaches server-side Context Management directives (such as cache_edits / clear_tool_uses_20250919), but those are icing, not the main dish.
Keep this in mind: every level described below is a rewrite of the messages array.
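The whole pipeline can be pictured as function composition over that array. A minimal Python sketch (the names here are mine, not Claude Code's): each level maps a messages list to a, usually smaller, messages list, and the levels run in order right before the request goes out.

```python
from typing import Callable, Dict, List

Message = Dict
Level = Callable[[List[Message]], List[Message]]

def apply_context_pipeline(messages: List[Message], levels: List[Level]) -> List[Message]:
    """Run every level in order; each one rewrites the messages array."""
    for level in levels:
        messages = level(messages)
    return messages

# A toy level: drop messages flagged as snipped.
drop_snipped: Level = lambda msgs: [m for m in msgs if not m.get("snipped")]

history = [
    {"type": "user", "content": "hi"},
    {"type": "assistant", "content": "hello", "snipped": True},
]
payload = apply_context_pipeline(history, [drop_snipped])
# payload is what actually goes into the request body
```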
Before every API request, Claude Code runs through the levels in this order:
The design intuition: cheap, precise processing first; expensive, coarse processing last.
Let's take them apart one by one.
This is admission control at the entrance, not compression of history.
When a tool (say Bash, Read, or Grep) finishes and its tool_result is about to be pushed into the messages array, it first passes a budget check: if it's too big, the raw text doesn't get in.
Two checkpoints:
- when a single tool_result block is about to be inserted;
- a total-budget check over all tool_results in the turn (mainly to guard against parallel tools returning in the same round and stacking up).
There are two threshold tiers (default values). Past the limit, the full text is written to tool-results/<tool_use_id>.txt and messages keeps only a "reference message". The replacement content is a fixed template:
Say you let Claude run cat huge.log and stdout is 500 KB. If that 500 KB of raw text went straight into the context:
What Tool Result Budget does instead: only a 2 KB preview enters the context. The model sees that "the file is there", and when it genuinely needs details, it reads the exact slice back via Read(offset, limit). A lightweight disk layer replaces a heavyweight token layer.
An active deletion mechanism that empowers the model: every user input gets a short ID attached, the model can cite that ID to say "I don't need this whole round anymore", and everything from that user input up to the next user input (the user message itself, the assistant thinking that followed, all tool_use / tool_result) is physically removed from the messages array in one span.
This is the only level in the whole pipeline driven by the model; every other level is the client deciding on its own.
The single most important thing to understand about Snip: it works at the granularity of a user turn, not a single message.
A user turn looks like this:
When the model calls SnipTool on an ID, every message from that user input up to (but not including) the next user input disappears from subsequent API requests.
Three moments in time:
- request build: an [id:<short-id>] tail tag is appended to each real user input (tool_result-style user messages don't count as "real user input" and get no ID);
- deletion: removedUuids are written to the transcript boundary and replayed on resume, which makes the deletion persistent;
- repair: parentUuid back-tracing fixes the chain so no dangling links survive.
Key detail: the [id:xxxxxx] tag is added only to the copy sent to the API and is never written back to the original store. The user's own words in the transcript stay clean forever; only the model-visible copy carries the tag.
Suppose the session has accumulated the history below (some fields omitted for readability). Note that before sending to the API, Claude Code appends an [id:...] tag to the end of every real user input; tool_result-type user messages get no tag.
At this point the model sees two real user-input IDs in this round: abc123 (the TODO survey) and def456 (the login bug fix). Since the user has explicitly said "forget the TODOs for now", everything in Turn 1 (the list of 23 TODOs, the full login.ts, and the accompanying thinking) is pure token dead weight for the bug-fix work ahead.
Turn 1 in its entirety (user input + two assistant tool_use + two tool_result, 5 messages in all) is physically deleted:
- toolu_01's tool_use vanishes together with its matching tool_result, and likewise toolu_02, so the API side never sees "a tool_result with no tool_use" or the reverse.
- [id:def456] is untouched. Snip acts precisely on the user turn containing abc123 and never bleeds into later turns.
- On resume, replaying removedUuids reproduces the same deletions, keeping the model-visible view consistent, while the user's original words always remain on disk, auditable at any time.
The weakness of traditional compact is its one-size-fits-all summary: coarse-grained, and prone to discarding useful verbatim text along the way. Snip is the exact opposite: the model itself judges which round is obsolete and excises that round precisely, while the remaining recent messages stay untouched. The two mechanisms complement each other.
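The turn-granular cut can be modeled in a few lines. This is a simplified sketch: it assumes a "real user input" is a user message whose content is a string (tool_result carriers hold a block list), and the function name is made up.

```python
def snip_turn(messages, snip_id):
    """Remove the whole user turn tagged [id:<snip_id>], up to the next real user input."""
    def is_real_user_input(m):
        return m["type"] == "user" and isinstance(m["content"], str)

    start = next(i for i, m in enumerate(messages)
                 if is_real_user_input(m) and f"[id:{snip_id}]" in m["content"])
    end = start + 1
    while end < len(messages) and not is_real_user_input(messages[end]):
        end += 1  # swallow assistant messages and tool_result carriers alike
    return messages[:start] + messages[end:]
```

Because the span runs from one real user input to the next, every tool_use inside the turn leaves together with its tool_result, which is exactly the pairing invariant described above.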
A lightweight compression aimed only at old tool results. It doesn't summarize the conversation, doesn't call a model, and doesn't touch user messages; it does exactly one thing: swap old, bulky tool_result.content for a placeholder or a cache-edit instruction.
It only touches the results of these tools: Read, Bash, Grep, Glob, WebSearch, WebFetch, Edit, Write. User text, model thinking, plans, and attachments are all left alone.
Two independent paths:
Path A: Time-based Microcompact
Path B: Cached Microcompact
The time-based path rewrites the local messages directly:
- the original content is replaced with the literal string [Old tool result content cleared]
The cached path leaves the local messages untouched and instead attaches cache_edits at the API layer:
- a cache_edits instruction tells the server "those segments numbered so-and-so in your cache can go".
On top of that there is a layer of API-native Context Management, done not by the client but as a policy natively supported by the Anthropic API:
These two blocks go into the API parameters, and the server automatically clears tool_use-type content once input exceeds 180K tokens.
To keep the example readable, the walkthrough below uses a shrunken scenario: assume keepRecent = 2 (the default is 5). The setup: you ask Claude to survey a project, it runs 3 tools in a row, you leave for lunch, and you come back 70 minutes later with more questions.
At that moment, Time-based Microcompact's trigger condition holds: main thread + a previous assistant message exists + gap > 60 minutes.
From oldest to newest, find every tool_result belonging to a "compressible tool":
Keep the most recent keepRecent = 2 (toolu_02 / toolu_03) and replace the content of the rest with the placeholder.
- The tool_use_id is not deleted: toolu_01's tool_use (including its parameter pattern: "**/*.ts") is kept intact; only the matching tool_result content becomes the literal string. The API-side tool_use ↔ tool_result pairing still holds.
- The model can still tell toolu_01 was a Glob("**/*.ts") call; only the concrete return value is gone. If it genuinely needs it later, it can simply run Glob again.
The key to the cached path is that the local messages do not change at all; the replacement happens inside the server-side cache. Here's the comparison (same scenario as above, but via the cached path):
The payoff: the prompt-cache prefix is never broken. The time-based approach of rewriting local messages changes the cache key, so on the next request every cache hit drops to zero. The cached path lets the server delete "internally", releasing token cost while preserving the cache hit rate.
Step back and the two paths' division of labor is really a cache state machine:
- cache cold (long silence) → rewrite the local messages outright;
- cache warm → fine-grained editing via cache_edits.
And in the source the two paths short-circuit: time-based is checked first, and the moment it fires the function returns without reaching the cached path. The ordering is the natural corollary of cold/hot: once you've concluded the cache is already cold, cache editing is pointless.
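The cold/hot split plus the short-circuit can be written as one tiny decision function. A sketch: the 60-minute constant comes from the text; the function name and return labels are mine.

```python
def choose_microcompact_path(idle_minutes: float, cache_supported: bool) -> str:
    """Time-based is checked first and short-circuits everything behind it."""
    if idle_minutes > 60:
        return "time-based"  # cache is cold anyway: rewrite the local messages
    if cache_supported:
        return "cached"      # cache is warm: keep the prefix, send cache_edits
    return "none"
```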
- clear_tool_uses_20250919 as the backstop, with the server auto-clearing at the 180K threshold.
This level sits in the query pipeline after Microcompact and before Autocompact. Enabling it suppresses proactive Autocompact: in Claude Code's design, Collapse and Autocompact compete for the same headroom, so with Collapse on, shouldAutoCompact() simply returns false and lets Collapse take over.
The transcript shows Collapse leaving two kinds of records:
- marble-origami-commit: an append-only splice instruction recording "how a span of history folds into a summary placeholder"; it contains collapseId / summaryUuid / summaryContent / firstArchivedUuid / lastArchivedUuid.
- marble-origami-snapshot: a last-wins snapshot of staged state, containing staged spans / an armed flag / lastSpawnTokens.
Together these records imply Collapse does "segmented archiving + summary placeholders": roughly, score a span of early history, select it, pack it into an archive unit with a summary, and let a single placeholder stand in for that span in subsequent requests. The finer details (the staged-span selection algorithm, the exact placeholder format, the trigger-threshold chain) are out of scope here.
The last line of defense, for when the four levels above fail to bring the context down. It does no compression itself; it picks one of two sub-paths:
- Session Memory Compact (the first-try path, backed by a background-maintained summary.md)
- traditional LLM Compact (the fallback)
It fires when shouldAutoCompact() decides the context is close to the token limit. Note that if Context Collapse is enabled, this step is skipped outright (Collapse handles it).
Once triggered, the flow is:
The core idea: don't wait until the context blows up to start summarizing. Continuously maintain a structured summary file in the background, and when the blowup comes, read that file and use it as the summary.
The benefits are clear-cut.
The file is named summary.md; full path:
Note it's one file per session, not shared across the project. The reason is straightforward: different sessions do different things, and one shared file would cross-contaminate.
summary.md is not a free-form diary; a background agent fills in a fixed template. An empty template is written at initialization, and each subsequent extraction only updates the body. The full template has 10 sections:
| # | Section | What goes here (guidance) |
|---|---|---|
| 1 | Session Title | A dense 5-10 word session title, no filler words |
| 2 | Current State | What is being worked on right now? Unfinished tasks, what comes next |
| 3 | Task specification | What is the user building? Any design decisions or explanatory context |
| 4 | Files and Functions | Which files matter? What each contains and why it's relevant |
| 5 | Workflow | Which bash commands are used, in what order, and how to read their output |
| 6 | Errors & Corrections | Errors hit, how they were fixed, what the user corrected, which approaches dead-ended |
| 7 | Codebase and System Documentation | Important system components and how they cooperate |
| 8 | Learnings | What worked, what didn't, what to avoid (without repeating other sections) |
| 9 | Key results | If the user explicitly asked for a result (answer, table, document), preserve it verbatim here |
| 10 | Worklog | Step-by-step record of what was tried, each step minimally summarized |
Don't confuse this with the traditional LLM Compact's 9-section summary later in this post. That one is the summary format the Autocompact fallback asks the model to generate on the fly, and its sections differ (e.g. "All user messages", "Current Work", "Optional Next Step", items that lean toward "conversation context"). Session Memory's 10-section template leans toward "project memory".
Look at it in two layers: when the background updates summary.md vs. when Autocompact reads it.
Background update triggers (default thresholds):
- (token growth ≥ 5000 && tool calls ≥ 3) || (token growth ≥ 5000 && no tool calls in the latest round)
- runs only on querySource === 'repl_main_thread'; subagents / teammates don't run it
When Autocompact reads it (the first-try entry of sub-path A):
- when shouldAutoCompact() decides compaction is needed, read summary.md; if the file doesn't exist or is still the empty template, return null and yield to the fallback.
The lifecycle of lastSummarizedMessageId
This is one of Session Memory Compact's core pieces of state: it decides "after which message the retained region begins". Without it, the retention algorithm below won't make sense.
Semantics: the uuid of the last message absorbed into summary.md.
In other words, every message with uuid ≤ lastSummarizedMessageId has already been digested by Session Memory; the newer messages (uuid > lastSummarizedMessageId) are the increment the next extraction will process.
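That semantics is just a split of the messages array at one uuid. A sketch with a hypothetical helper, not Claude Code's actual code:

```python
def split_at_last_summarized(messages, last_summarized_uuid):
    """Left part: already digested by summary.md. Right part: the increment
    that the next extraction (and the retained region) starts from."""
    idx = next(i for i, m in enumerate(messages) if m["uuid"] == last_summarized_uuid)
    return messages[: idx + 1], messages[idx + 1 :]
```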
Update timing and update value
After a background extraction finishes, the update is not unconditional; there's a safety gate:
Why the gate?
Because the compact phase uses lastSummarizedMessageId to compute the start of the retained region. If it were updated at the moment "the assistant has just issued a tool_use and the tool_result hasn't returned yet", a later compact could file the tool_use under "already summarized" and the tool_result under "retained", and the API request would 400 with "tool_result has no matching tool_use". The gate guarantees the update lands on a natural break in the conversation.
What calculateMessagesToKeepIndex() in the source does, written out as pseudocode:
A few key points:
- each i-- step pulls one earlier message into the retained region.
Most people assume compact's output is the three-piece set "boundary + summary + recent messages". In reality a string of attachments hangs off the end, and they are what lets the model get back to productive work quickly after a compact.
buildPostCompactMessages() assembles in this order:
Each of the 8 attachment types has its own trigger condition:
| # | Attachment type | Injected content | When it appears |
|---|---|---|---|
| 1 | file_reference | Recently read files, pulled back verbatim | Recently Read files exist and aren't in the retained region |
| 2 | plan_file_reference | The current session's plan file | An active plan exists |
| 3 | invoked_skills | Skills activated in this session | Any skill was activated |
| 4 | plan_mode | Plan-mode status note | Currently in plan mode |
| 5 | task_status | Status of background agents / tasks | A background async agent is running |
| 6 | deferred_tools_delta | Tool-list additions/removals vs. pre-compact | The tool list changed |
| 7 | agent_listing_delta | Agent-list additions/removals | The agent list changed |
| 8 | mcp_instructions_delta | Changes to MCP instructions | MCP instructions changed |
Budget (default constants):
In other words, Claude Code enforces a strict budget on "what fits in the inventory": the summary tells you "what we're doing", the attachments hand you "the raw material to keep doing it". The division of labor is crisp.
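A sketch of how such a budget might be enforced for file_reference attachments. The constants (at most 5 files, 5K tokens per file, 50K total) are the figures quoted in the worked example later in this post; the function itself is illustrative, not Claude Code's code.

```python
MAX_FILES = 5
PER_FILE_TOKENS = 5_000
TOTAL_FILE_TOKENS = 50_000

def pick_file_attachments(recent_files, count_tokens):
    """recent_files: (path, text) pairs, most recently read first."""
    picked, used = [], 0
    for path, text in recent_files[:MAX_FILES]:
        cost = min(count_tokens(text), PER_FILE_TOKENS)  # oversized files get capped
        if used + cost > TOTAL_FILE_TOKENS:
            break
        picked.append(path)
        used += cost
    return picked
```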
Scenario: you've spent 2 hours with Claude on the project's auth refactor, with 40+ tool calls and a dozen modified files along the way. The context is nearing the threshold, and the last few rounds are implementing AuthSession.refresh(). The background summary.md has been updated the whole time.
The current summary.md on disk, filled per the 10-section template, looks roughly like this (the first few sections shown as sample fill; a real file has all 10):
Note this file is a fixed template filled by the background agent, not a free-form diary. Each section in the template carries an italic guidance line (e.g. "What is actively being worked on right now?") that the background agent writes against.
The algorithm does three things:
- lastSummarizedMessageId = u128, so u129 and everything after belong to the "retained region";
- the body of summary.md is wrapped into one user message;
- lastSummarizedMessageId is advanced, and headUuid / anchorUuid / tailUuid record where the retained span runs from and to; on resume these three UUIDs re-link the post-compact view to the original transcript.
Traditional LLM Compact is the old path taken when Session Memory Compact fails (the most common cause: the session blew up before summary.md reached its initialization threshold). The approach is blunt: make a one-off model call asking for a 9-section structured summary of the current session, then replace the original history with that summary.
Triggered when trySessionMemoryCompaction() returns null.
At its core this is a "conversation about the conversation": the client builds a fresh API request from the current messages:
The api_messages array actually sent to the summarizing model looks roughly like this; the big middle chunk is the current session's full history verbatim, with square brackets marking the omitted parts:
Three other key settings ride along in the call parameters:
- system is a fixed one-liner: "You are a helpful AI assistant tasked with summarizing conversations."
- thinkingConfig is explicitly disabled; a summarization task doesn't need extended thinking
- querySource = "compact" marks this as a summarization call so it won't re-trigger compact / snip and the rest of the context-management pipeline (no recursion)
The model is expected to return two plain-text blocks, <analysis>...</analysis><summary>...</summary>. The client extracts the <summary> part as the 9-section summary body, then runs it through buildPostCompactMessages() to assemble the new main-thread messages (the same assembly function Session Memory Compact uses).
The summary must have 9 fixed sections (full prompt in the appendix at the end of this section):
An easily overlooked but crucial self-rescue mechanism. That "conversation about the conversation" is itself an API call, and it too can come back with a PTL error, especially when compact only triggers after the session is already huge: sending "current history + a long prompt" together easily exceeds the token limit.
Compact won't wedge the session permanently; it allows at most 3 PTL retries, and the flow is:
The corresponding pseudocode:
Design choices worth noting:
- the line [earlier conversation truncated for compaction retry] isn't for the user, and the model doesn't "really" read it; it exists purely to satisfy the API constraint that the first message must be a user message.
Once the summary model returns, the client rebuilds the main-thread messages with buildPostCompactMessages():
The single most fundamental difference from Session Memory Compact: there is no messagesToKeep segment. Traditional LLM Compact replaces all history with the summary; recent messages are not kept verbatim. Everything else (the order and budgets of boundary / summary / attachments / hooks) is identical, because it's the same assembly function.
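That shared-function relationship can be pictured as one concatenation with two callers. A sketch (the real buildPostCompactMessages() also applies the budgets and trigger checks described above):

```python
def build_post_compact_messages(boundary, summary, messages_to_keep, attachments, hooks):
    """Session Memory Compact passes the retained region in messages_to_keep;
    traditional LLM Compact passes an empty list. Everything else is identical."""
    return [boundary, summary, *messages_to_keep, *attachments, *hooks]
```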
The post-compact recovery budget is hardcoded constants (defaults):
A fresh session: you and Claude dig into a very dense problem, a dozen large tool calls within minutes, and the context shoots straight to the limit. At this point summary.md hasn't reached its initialization threshold (10K tokens is the "stable enough" bar, but this session was "dense in a short window").
Session Memory Compact returns null, and we fall back to traditional LLM Compact:
For easy cross-checking, below is the full user message that getCompactPrompt() ultimately assembles and sends to the summarizing model. It is the concatenation of 4 parts:
Full text (when no customInstructions are configured):
A few prompt-engineering details in this prompt worth a closer look:
- Triple reinforcement of the no-tools rule: CRITICAL up front, "Tool calls will be REJECTED" mid-prompt, and a closing REMINDER. For models with strong tool-calling instincts this high-frequency hard constraint is necessary; say it only once and the model will still be tempted to Read something "just to verify".
- <analysis> → <summary>: the former lets the model "think it through first"; only the latter is written into the transcript. Separating them keeps the thinking process from polluting the summary body.
- Extensibility: CLAUDE.md or dedicated compact instructions can append a suffix to this prompt, e.g. "focus on typescript code changes" / "include test output verbatim".
Putting the five levels plus the two sub-paths into one table:
| Level | Trigger | Operates on | Cost | Model call? |
|---|---|---|---|---|
| Tool Result Budget | tool return & pre-request | a single tool_result | minimal | no |
| Snip | per request / model-initiated | whole messages | low | no (model-driven) |
| Microcompact (time) | after 60 min of silence | old tool_result.content | low | no |
| Microcompact (cached) | per request (cache-capable) | server-side cache view | minimal | no |
| Context Collapse | per request | segmented archive + summary placeholder | medium | yes (summary generation) |
| Session Memory Compact | Autocompact first choice | early history → summary.md | medium (disk read) | background agent maintains the file |
| Traditional LLM Compact | Autocompact fallback | full history → 9-section summary | high (main-thread LLM call) | yes |
A few design choices stand out:
"Do the cheap things first." Tool Result Budget is a character-count comparison plus a file write, nearly free; LLM Compact is a main-thread-grade API call, heavy lifting. The pipeline puts cheap, fine-grained processing up front and expensive, coarse processing at the back: a textbook cost-aware pipeline.
"Don't call the model if you can help it." Only the final level, Autocompact's fallback path, actually spends an API call on summarization. Everything before it is mechanical replacement, UUID deletion, or a server-side cache instruction.
"Protect the recent verbatim text" is a clear value ordering. All of Session Memory Compact's complexity (the continuously maintained summary.md, the lastSummarizedMessageId bookkeeping, the API-invariant repairs) protects one goal: recent messages stay verbatim. A developer's "what I'm doing right now" usually needs verbatim detail, while "the context that led here" only needs a knowledge-level summary.
Every level is a modification of the messages array. As far as the Anthropic API is concerned there is no magic compression parameter; every mechanism lands in that array in the payload. The only exceptions are Cached Microcompact's cache_edits and API-native Context Management (clear_thinking_* / clear_tool_uses_*), which are server-side conventions.
If you're building context management for an AI application, this pipeline offers at least three ideas you can borrow directly.
Tool Result Budget → Snip → Microcompact → Context Collapse → Autocompact
│
┌─────────────────────┤
▼ ▼
Session Memory Compact    Traditional LLM Compact
  (first-choice path)        (fallback path)
<persisted-output>
Output too large (317842 chars). Full output saved to: /project/.claude/tool-results/toolu_abc.txt
Preview (first 2 KB):
...first 2 KB of the original output...
</persisted-output>
// Before: the original tool_result
{
"type": "user",
"message": {
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_abc",
        "content": "<300 KB of bash stdout>"
}
]
}
}
// After: preview + disk path
{
"type": "user",
"message": {
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_abc",
"content": "<persisted-output>Output too large (317842 chars). Full output saved to: /project/.claude/tool-results/toolu_abc.txt\n\nPreview (first 2 KB):\n...\n</persisted-output>"
}
]
}
}
[ user input ][ assistant/tool_use ][ tool_result ][ assistant/tool_use ][ tool_result ]...
  ↑ an [id:xxxxxx] tag hangs here                     ↑ until the next real user input
[
  // —— Turn 1: user asks for a TODO survey ——
{
"type": "user",
"message": {
      "content": "Find all the TODOs in this project for me, I want to deal with them in one batch\n[id:abc123]"
}
},
{
"type": "assistant",
"message": {
"content": [
        { "type": "text", "text": "OK, I'll search with Grep." },
{
"type": "tool_use",
"id": "toolu_01",
"name": "Grep",
"input": { "pattern": "TODO", "path": "src/" }
}
]
}
},
{
"type": "user",
"message": {
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_01",
          "content": "src/auth/login.ts:42: // TODO: handle refresh token\nsrc/api/user.ts:89: // TODO: validate input\n... 23 matches total ..."
}
]
}
},
{
"type": "assistant",
"message": {
"content": [
        { "type": "text", "text": "Let me open login.ts and look at the concrete context of that refresh-token TODO." },
{
"type": "tool_use",
"id": "toolu_02",
"name": "Read",
"input": { "file_path": "src/auth/login.ts" }
}
]
}
},
{
"type": "user",
"message": {
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_02",
          "content": "<full text of login.ts, ~4 KB>"
}
]
}
},
  // —— Turn 2: user changes mind, pivots to the bug fix ——
{
"type": "user",
"message": {
      "content": "Forget the TODOs for now, the production login 500 bug is more urgent, fix that first\n[id:def456]"
}
},
{
"type": "assistant",
"message": {
"content": [
        { "type": "text", "text": "Understood, TODOs on hold. For the login 500, let me check the recent error logs." },
{
"type": "tool_use",
"id": "toolu_03",
"name": "Bash",
"input": { "command": "tail -200 /var/log/app/error.log" }
}
]
}
}
]
{
"type": "tool_use",
"id": "toolu_04",
"name": "Snip",
"input": {
"ids": ["abc123"],
    "reason": "User has pivoted away from the TODO survey; that turn's Grep results and the full login.ts are irrelevant to the current bug fix"
}
}
[
  // Turn 1 is gone entirely; the array starts straight from Turn 2's user input
{
"type": "user",
"message": {
      "content": "Forget the TODOs for now, the production login 500 bug is more urgent, fix that first\n[id:def456]"
}
},
{
"type": "assistant",
"message": {
"content": [
        { "type": "text", "text": "Understood, TODOs on hold. For the login 500, let me check the recent error logs." },
{
"type": "tool_use",
"id": "toolu_03",
"name": "Bash",
"input": { "command": "tail -200 /var/log/app/error.log" }
}
]
}
},
{
"type": "user",
"message": {
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_03",
          "content": "<log tail output>"
}
]
}
}
]
{ "type": "clear_thinking_20251015", "keep": "all" }
{
"type": "clear_tool_uses_20250919",
"trigger": { "type": "input_tokens", "value": 180000 },
"clear_at_least": { "type": "input_tokens", "value": 140000 }
}
[
{
"type": "user",
    "message": { "content": "Help me get a feel for this project's structure\n[id:abc123]" }
},
{
"type": "assistant",
"message": {
"content": [
        { "type": "text", "text": "Let me look at the directory first." },
{ "type": "tool_use", "id": "toolu_01", "name": "Glob", "input": { "pattern": "**/*.ts" } }
]
}
},
{
"type": "user",
"message": {
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_01",
          "content": "src/index.ts\nsrc/auth/login.ts\nsrc/api/user.ts\n... 147 files total ..."
}
]
}
},
{
"type": "assistant",
"message": {
"content": [
        { "type": "text", "text": "Reading the entry file." },
{ "type": "tool_use", "id": "toolu_02", "name": "Read", "input": { "file_path": "src/index.ts" } }
]
}
},
{
"type": "user",
"message": {
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_02",
          "content": "import { createApp } from './app'\nimport { loadConfig } from './config'\n... <full 4 KB of source> ..."
}
]
}
},
{
"type": "assistant",
"message": {
"content": [
        { "type": "text", "text": "Now a quick scan of the config-related definitions." },
{ "type": "tool_use", "id": "toolu_03", "name": "Grep", "input": { "pattern": "loadConfig", "path": "src/" } }
]
}
},
{
"type": "user",
"message": {
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_03",
          "content": "src/config.ts:12: export function loadConfig() {\nsrc/config.ts:34: return loadConfig()\n... 8 matches total ..."
}
]
}
},
{
"type": "assistant",
"message": {
"content": [
        { "type": "text", "text": "The structure is roughly one entry point plus three modules: config/auth/api. Which one do you want to dig into first?" }
]
}
    // ← timestamp of the last assistant message, call it 12:00
}
]
{
"type": "user",
  "message": { "content": "Let's start with the auth module then\n[id:def456]" }
  // ← current time 13:10, i.e. 70 minutes since the last assistant message (12:00)
}
toolu_01 (Glob) toolu_02 (Read) toolu_03 (Grep)
   ↑ oldest           ↑ second           ↑ newest
[
  // user_1 + its matching assistant.tool_use stay untouched
{
"type": "user",
    "message": { "content": "Help me get a feel for this project's structure\n[id:abc123]" }
},
{
"type": "assistant",
"message": {
"content": [
        { "type": "text", "text": "Let me look at the directory first." },
{ "type": "tool_use", "id": "toolu_01", "name": "Glob", "input": { "pattern": "**/*.ts" } }
]
}
},
  // ★ toolu_01's tool_result.content gets replaced
{
"type": "user",
"message": {
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_01",
"content": "[Old tool result content cleared]"
}
]
}
},
  // toolu_02 / toolu_03 kept verbatim (they fall inside the keepRecent window)
{
"type": "assistant",
"message": {
"content": [
        { "type": "text", "text": "Reading the entry file." },
{ "type": "tool_use", "id": "toolu_02", "name": "Read", "input": { "file_path": "src/index.ts" } }
]
}
},
{
"type": "user",
"message": {
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_02",
          "content": "import { createApp } from './app'\nimport { loadConfig } from './config'\n... <full 4 KB of source> ..."
}
]
}
},
{
"type": "assistant",
"message": {
"content": [
        { "type": "text", "text": "Now a quick scan of the config-related definitions." },
{ "type": "tool_use", "id": "toolu_03", "name": "Grep", "input": { "pattern": "loadConfig", "path": "src/" } }
]
}
},
{
"type": "user",
"message": {
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_03",
          "content": "src/config.ts:12: export function loadConfig() {\nsrc/config.ts:34: return loadConfig()\n... 8 matches total ..."
}
]
}
},
  // closing assistant message + the new user input appended verbatim
{
"type": "assistant",
"message": {
"content": [
        { "type": "text", "text": "The structure is roughly one entry point plus three modules: config/auth/api. Which one do you want to dig into first?" }
]
}
},
{
"type": "user",
    "message": { "content": "Let's start with the auth module then\n[id:def456]" }
}
]
// Local messages array: everything kept verbatim, no [Old ... cleared] anywhere
// API request body sent to Anthropic (conceptual):
{
"model": "claude-xxx",
  "messages": [ /* the full array above, unchanged */ ],
"cache_edits": [
    // pending edits maintained by the client, telling the server
    // "the segment for toolu_01 in your prompt cache can be dropped"
{ "action": "clear", "tool_use_id": "toolu_01" }
]
}
autoCompactIfNeeded()
├── try Session Memory Compact first
│     ├── success → return the result
│     └── failure (null) → fall back
└── traditional LLM Compact
{projectDir}/{sessionId}/session-memory/summary.md
# pseudocode
def update_last_summarized_message_id_if_safe(messages):
    # is the last assistant message still waiting on a tool_result?
    if has_tool_calls_in_last_assistant_turn(messages):
        return  # don't update; avoid cutting an unclosed tool pair down the middle
    last = messages[-1]
    if last.uuid:
        set_last_summarized_message_id(last.uuid)
# default configuration
MIN_TOKENS = 10_000
MIN_TEXT_BLOCK_MESSAGES = 5
MAX_TOKENS = 40_000
def calculate_messages_to_keep_index(messages):
    # 1) find the split point
    last_index = find_index(messages, by_uuid=last_summarized_message_id)
    start = last_index + 1  # default: keep everything after the message the summary last digested

    # 2) count tokens / text-block messages in the current retained region
    total_tokens = sum_tokens(messages[start:])
    text_block_count = count_text_block_messages(messages[start:])

    # 3) if the lower bounds are already met -> done
    if total_tokens >= MIN_TOKENS and text_block_count >= MIN_TEXT_BLOCK_MESSAGES:
        return adjust_index_to_preserve_api_invariants(messages, start)

    # 4) otherwise, extend the start point backwards from start-1 (reach earlier),
    #    until (total_tokens >= MIN_TOKENS AND text_block_count >= MIN) both hold,
    #    or the total_tokens >= MAX_TOKENS hard cap is hit first,
    #    or we hit the floor (the previous compact boundary, which must not be crossed)
    for i in range(start - 1, floor - 1, -1):
        total_tokens += token_count(messages[i])
        if has_text_blocks(messages[i]):
            text_block_count += 1
        start = i
        if total_tokens >= MAX_TOKENS:
            break  # hard cap wins
        if total_tokens >= MIN_TOKENS and text_block_count >= MIN_TEXT_BLOCK_MESSAGES:
            break  # both lower bounds met

    # 5) finally, align with API constraints
    return adjust_index_to_preserve_api_invariants(messages, start)
[boundary marker]
→ [summary messages]
→ [messages to keep (the retained region, verbatim)]
→ [attachments ← a batch of meta messages, injected per the 8 types below]
→ [hook results(session start hooks)]
# Session Title
Refactor auth middleware for compliance rewrite
# Current State
Migrating session token storage from cookies to encrypted Redis.
Pending: integration tests for multi-device login path.
Immediate next: finish `AuthSession.refresh()` branch.
# Task specification
- Remove raw session tokens from client cookies (legal/compliance request)
- Introduce AuthSession wrapper that holds an opaque id, with real data in Redis
- Preserve existing /login and /logout API shape; only storage layer changes
- Ensure refresh flow works across multi-device, no forced logout on other devices
# Files and Functions
- src/auth/AuthSession.ts (new): wraps opaque id + Redis-backed metadata
- src/auth/login.ts (updated): issues AuthSession on credential success
- src/auth/middleware.ts (updated): validates AuthSession on every request
- src/redis/sessionStore.ts (new): typed Redis gateway for session records
# Workflow
- `pnpm test auth/` to run auth unit tests
- `pnpm run dev:redis` to boot a local Redis for integration runs
- Error "ECONNREFUSED 6379" means Redis isn't up; start it before tests
# Errors & Corrections
- Early attempt used HMAC-signed cookies — rejected by legal (still stores session data client-side)
- First Redis schema used JSON strings — switched to hashes for partial-field updates
- User corrected: refresh must NOT invalidate other devices' session ids
# Codebase and System Documentation
- Auth path: login.ts → middleware.ts → AuthSession.ts → Redis
- Session ids are opaque; all real data lives in Redis under `session:{id}`
- TTL is sliding: every successful request extends expiry by 30 days
# Learnings
- Opaque id format needs to be URL-safe (base58 chosen over base64)
- Fail-closed fallback is acceptable because re-login UX is already smooth
# Key results
- (none yet — refresh implementation in progress)
# Worklog
- Drafted AuthSession type and Redis gateway
- Updated login/logout to issue/revoke AuthSession
- Updated middleware to validate AuthSession on each request
- Started refresh() branch; paused to handle multi-device concern
[
  // ========== early history, 100+ messages (~80K tokens total) ==========
  // UUID: u001, the session's first user input
{
"type": "user",
    "message": { "content": "We need to refactor the auth middleware, legal handed down new requirements\n[id:a1b2c3]" }
},
  { "type": "assistant", "message": { "content": [/* ... long discussion ... */] } },
  { "type": "user", "message": { "content": [/* tool_result: reading middleware.ts */] } },
  // ... 100+ interleaved user/assistant/tool_result messages omitted ...
  // UUID: u128, this is lastSummarizedMessageId (summary.md has digested up to here)
  // ========== recent messages (~12K tokens total, inside the 10K-40K retention window) ==========
// UUID: u129
{
"type": "assistant",
"message": {
"content": [
        { "type": "text", "text": "I've mapped out AuthSession.refresh's main branches; writing the core path first." },
{ "type": "tool_use", "id": "toolu_80", "name": "Read", "input": { "file_path": "src/auth/AuthSession.ts" } }
]
}
},
{
"type": "user",
"message": {
"content": [
        { "type": "tool_result", "tool_use_id": "toolu_80", "content": "<current contents of AuthSession.ts>" }
]
}
},
{
"type": "assistant",
"message": {
"content": [
        { "type": "text", "text": "refresh needs to update the Redis TTL and write back a new opaque id. Drafting it." },
{ "type": "tool_use", "id": "toolu_81", "name": "Edit", "input": { "file_path": "src/auth/AuthSession.ts", "old_string": "...", "new_string": "..." } }
]
}
},
{
"type": "user",
"message": {
"content": [
{ "type": "tool_result", "tool_use_id": "toolu_81", "content": "File edited." }
]
}
},
{
"type": "user",
    "message": { "content": "Hold on, refresh needs to account for the multi-device case\n[id:m9n8o7]" }
},
{
"type": "assistant",
"message": {
"content": [
        { "type": "text", "text": "Right, then refresh must not invalidate other devices' session ids. Let me adjust the design a bit..." }
]
}
}
]
[
  // ---- ① compact boundary (carries preservedSegment metadata for relinking on resume) ----
{
"type": "user",
"isMeta": true,
"message": { "content": "<compact boundary>" },
"compactMetadata": {
"preservedSegment": {
"headUuid": "u001",
"anchorUuid": "u128",
"tailUuid": "u134"
}
}
},
  // ---- ② summary message (from summary.md, injected with the user role) ----
{
"type": "user",
"isMeta": true,
"message": {
"content": "Below is a summary of the session so far:\n\n# Session Title\nRefactor auth middleware for compliance rewrite\n\n# Current State\nMigrating session token storage from cookies to encrypted Redis.\n...\n# Pending tasks\n- AuthSession.refresh() implementation\n- Integration tests for multi-device case\n- Rollout plan (feature flag name: `auth_opaque_sessions_v2`)"
}
},
  // ---- ③ recent messages: kept verbatim (u129 through u134, not a character touched) ----
{
"type": "assistant",
"message": {
"content": [
        { "type": "text", "text": "I've mapped out AuthSession.refresh's main branches; writing the core path first." },
{ "type": "tool_use", "id": "toolu_80", "name": "Read", "input": { "file_path": "src/auth/AuthSession.ts" } }
]
}
},
  { "type": "user", "message": { "content": [/* toolu_80 tool_result verbatim */] } },
{
"type": "assistant",
"message": {
"content": [
        { "type": "text", "text": "refresh needs to update the Redis TTL and write back a new opaque id. Drafting it." },
{ "type": "tool_use", "id": "toolu_81", "name": "Edit", "input": {/* ... */} }
]
}
},
  { "type": "user", "message": { "content": [/* toolu_81 tool_result verbatim */] } },
  { "type": "user", "message": { "content": "Hold on, refresh needs to account for the multi-device case\n[id:m9n8o7]" } },
{
"type": "assistant",
"message": {
"content": [
        { "type": "text", "text": "Right, then refresh must not invalidate other devices' session ids. Let me adjust the design a bit..." }
]
}
},
  // ---- ④ post-compact attachments (injected in order, per trigger condition) ----
  // ④-1 file_reference: the ≤5 most recently read files (50K token budget, 5K per file)
{
"type": "user",
"isMeta": true,
"message": {
"content": [
        { "type": "text", "text": "<attachment: src/auth/AuthSession.ts>\n<full contents, ≤5K tokens>" },
        { "type": "text", "text": "<attachment: src/auth/login.ts>\n<full contents>" },
        { "type": "text", "text": "<attachment: src/redis/sessionStore.ts>\n<full contents>" }
]
}
},
  // ④-2 plan_file_reference: the active plan file
{
"type": "user",
"isMeta": true,
"message": { "content": "<plan: implement AuthSession.refresh with multi-device support>" }
},
  // ④-3 invoked_skills: skills activated in this session (≤5K per skill, ≤25K total)
{
"type": "user",
"isMeta": true,
"message": { "content": "<invoked skills: test-driven-development, systematic-debugging>" }
},
  // ④-4 plan_mode: a status note if currently in plan mode
  //      (not in plan mode in this example; skipped)
  // ④-5 task_status: present if a background async agent is running
  //      (no background task in this example; skipped)
  // ④-6 deferred_tools_delta / ④-7 agent_listing_delta / ④-8 mcp_instructions_delta
  //      these three deltas are injected only if the tool list / agent list / MCP instructions changed vs. pre-compact
  //      (all assumed unchanged in this example; skipped)
// ---- ⑤ session start hooks ----
{
"type": "user",
"isMeta": true,
"message": { "content": "<session start hook output>" }
}
]
# pseudocode
summary_prompt = build_compact_prompt()  # full text in the appendix at the end of this section
summary_request = { "role": "user", "content": summary_prompt }

api_messages = normalize(strip_images(strip_attachments([
    *get_messages_after_compact_boundary(messages),  # current history (minus previously compacted parts)
    summary_request,                                 # the "please summarize" user message appended last
])))

summary_text = call_api(
    messages = api_messages,
    system = "You are a helpful AI assistant tasked with summarizing conversations.",
    thinking = DISABLED,
    source = "compact",
)
[
  // ========== session history (after stripping images and reinjected attachments) ==========
{
"type": "user",
    "message": { "content": "Help me refactor the auth middleware, legal handed down new requirements\n[id:a1b2c3]" }
},
{
"type": "assistant",
"message": {
"content": [
        { "type": "text", "text": "OK, reading middleware.ts first." },
{ "type": "tool_use", "id": "toolu_01", "name": "Read", "input": { "file_path": "src/auth/middleware.ts" } }
]
}
},
{
"type": "user",
    "message": { "content": [{ "type": "tool_result", "tool_use_id": "toolu_01", "content": "<middleware.ts verbatim>" }] }
},
  // [...N real history messages omitted: further user input / assistant tool_use / tool_result / thinking, etc.
  //  in a real session this span is typically 50-200 messages, 80K-150K tokens total; its very size is what triggers compact...]
{
"type": "assistant",
"message": {
"content": [
        { "type": "text", "text": "Halfway through refresh's multi-device branch; currently stuck on whether session changes need a global broadcast." }
]
}
},
  // ========== the summary request appended at the end (compact's "instruction") ==========
{
"type": "user",
"message": {
      "content": "CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.\n\n- Do NOT use Read, Bash, Grep, Glob, Edit, Write, or ANY other tool.\n- You already have all the context you need in the conversation above.\n- Tool calls will be REJECTED and will waste your only turn — you will fail the task.\n- Your entire response must be plain text: an <analysis> block followed by a <summary> block.\n\nYour task is to create a detailed summary of the conversation so far ...\n\n[full text in the appendix at the end of this section: \"Appendix: full compact prompt\"]"
}
}
]
1. Primary Request and Intent
2. Key Technical Concepts
3. Files and Code Sections
4. Errors and fixes
5. Problem Solving
6. All user messages
7. Pending Tasks
8. Current Work
9. Optional Next Step
# pseudocode
MAX_PTL_RETRIES = 3

for attempt in range(MAX_PTL_RETRIES):
    try:
        return call_compact(messages)
    except PromptTooLong as e:
        if e.token_gap:
            # trim the head precisely by the tokenGap the API returned
            messages = drop_oldest_groups_until_gap_covered(messages, e.token_gap)
        else:
            # fallback: drop the oldest 20% of rounds
            messages = drop_oldest_20_percent(messages)
        # if the first message is now an assistant one, the API rejects it
        # (the first message must be user), so prepend a synthetic meta marker
        if first_is_assistant(messages):
            messages.insert(0, {
                "type": "user",
                "isMeta": True,
                "message": { "content": "[earlier conversation truncated for compaction retry]" }
            })
[
"<compact boundary>",
  "<compact summary user message (the 9-section body)>",
  "<post-compact attachments: the 5 most recent files, plan, invoked skills, etc. (8 types)>",
"<session start hooks>"
]
NO_TOOLS_PREAMBLE
+ BASE_COMPACT_PROMPT            ← embeds DETAILED_ANALYSIS_INSTRUCTION_BASE
+ [optional] customInstructions  ← when extra instructions are configured via CLAUDE.md / compact instructions
+ NO_TOOLS_TRAILER
CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.
- Do NOT use Read, Bash, Grep, Glob, Edit, Write, or ANY other tool.
- You already have all the context you need in the conversation above.
- Tool calls will be REJECTED and will waste your only turn — you will fail the task.
- Your entire response must be plain text: an <analysis> block followed by a <summary> block.
Your task is to create a detailed summary of the conversation so far, paying close attention to the user's explicit requests and your previous actions.
This summary should be thorough in capturing technical details, code patterns, and architectural decisions that would be essential for continuing development work without losing context.
Before providing your final summary, wrap your analysis in <analysis> tags to organize your thoughts and ensure you've covered all necessary points. In your analysis process:
1. Chronologically analyze each message and section of the conversation. For each section thoroughly identify:
- The user's explicit requests and intents
- Your approach to addressing the user's requests
- Key decisions, technical concepts and code patterns
- Specific details like:
- file names
- full code snippets
- function signatures
- file edits
- Errors that you ran into and how you fixed them
- Pay special attention to specific user feedback that you received, especially if the user told you to do something differently.
2. Double-check for technical accuracy and completeness, addressing each required element thoroughly.
Your summary should include the following sections:
1. Primary Request and Intent: Capture all of the user's explicit requests and intents in detail
2. Key Technical Concepts: List all important technical concepts, technologies, and frameworks discussed.
3. Files and Code Sections: Enumerate specific files and code sections examined, modified, or created. Pay special attention to the most recent messages and include full code snippets where applicable and include a summary of why this file read or edit is important.
4. Errors and fixes: List all errors that you ran into, and how you fixed them. Pay special attention to specific user feedback that you received, especially if the user told you to do something differently.
5. Problem Solving: Document problems solved and any ongoing troubleshooting efforts.
6. All user messages: List ALL user messages that are not tool results. These are critical for understanding the users' feedback and changing intent.
7. Pending Tasks: Outline any pending tasks that you have explicitly been asked to work on.
8. Current Work: Describe in detail precisely what was being worked on immediately before this summary request, paying special attention to the most recent messages from both user and assistant. Include file names and code snippets where applicable.
9. Optional Next Step: List the next step that you will take that is related to the most recent work you were doing. IMPORTANT: ensure that this step is DIRECTLY in line with the user's most recent explicit requests, and the task you were working on immediately before this summary request. If your last task was concluded, then only list next steps if they are explicitly in line with the users request. Do not start on tangential requests or really old requests that were already completed without confirming with the user first.
If there is a next step, include direct quotes from the most recent conversation showing exactly what task you were working on and where you left off. This should be verbatim to ensure there's no drift in task interpretation.
Here's an example of how your output should be structured:
<example>
<analysis>
[Your thought process, ensuring all points are covered thoroughly and accurately]
</analysis>
<summary>
1. Primary Request and Intent:
[Detailed description]
2. Key Technical Concepts:
- [Concept 1]
- [Concept 2]
- [...]
3. Files and Code Sections:
- [File Name 1]
- [Summary of why this file is important]
- [Summary of the changes made to this file, if any]
- [Important Code Snippet]
- [File Name 2]
- [Important Code Snippet]
- [...]
4. Errors and fixes:
- [Detailed description of error 1]:
- [How you fixed the error]
- [User feedback on the error if any]
- [...]
5. Problem Solving:
[Description of solved problems and ongoing troubleshooting]
6. All user messages:
- [Detailed non tool use user message]
- [...]
7. Pending Tasks:
- [Task 1]
- [Task 2]
- [...]
8. Current Work:
[Precise description of current work]
9. Optional Next Step:
[Optional Next step to take]
</summary>
</example>
Please provide your summary based on the conversation so far, following this structure and ensuring precision and thoroughness in your response.
There may be additional summarization instructions provided in the included context. If so, remember to follow these instructions when creating the above summary. Examples of instructions include:
<example>
## Compact Instructions
When summarizing the conversation focus on typescript code changes and also remember the mistakes you made and how you fixed them.
</example>
<example>
# Summary instructions
When you are using compact - please focus on test output and code changes. Include file reads verbatim.
</example>
REMINDER: Do NOT call any tools. Respond with plain text only — an <analysis> block followed by a <summary> block. Tool calls will be rejected and you will fail the task.