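Building with extended thinking
This feature is eligible for Zero Data Retention (ZDR). When your organization has a ZDR arrangement, data sent through this feature is not stored after the API response is returned.
Extended thinking gives Claude enhanced reasoning capabilities for complex tasks, while providing varying levels of transparency into its step-by-step thought process before it delivers its final answer.
Supported models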
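Manual extended thinking (thinking: {type: "enabled", budget_tokens: N}) is supported on all current Claude models except Claude Opus 4.7, which no longer accepts it and returns a 400 error. Some models have specific behavior:
- Claude Opus 4.7 (claude-opus-4-7): Manual extended thinking is no longer supported. Use adaptive thinking (thinking: {type: "adaptive"}) with the effort parameter instead.
- Claude Mythos Preview: Adaptive thinking is the default; thinking: {type: "enabled", budget_tokens: N} is also accepted. thinking: {type: "disabled"} is not supported, and display defaults to "omitted" rather than returning thinking content. Pass display: "summarized" to receive summaries.
- Claude Opus 4.6 (claude-opus-4-6): Adaptive thinking is recommended; manual mode (type: "enabled") is deprecated but still works.
- Claude Sonnet 4.6 (claude-sonnet-4-6): Adaptive thinking is recommended; manual mode (type: "enabled") with interleaved mode is deprecated but still works.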
Thinking behavior differs across Claude model versions. See Differences in thinking across model versions for details.
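The per-model notes above can be condensed into a small lookup. The following is a minimal sketch, not part of any SDK: the helper name thinking_config_for and the prefix tuple are hypothetical, and only the model IDs named in this section are covered. Claude Mythos Preview is left out because its model ID is not given here.

```python
# Hypothetical helper; the name and the prefix table are illustrative only.
# Model IDs listed here are the ones named in the "Supported models" notes.
ADAPTIVE_MODELS = ("claude-opus-4-7", "claude-opus-4-6", "claude-sonnet-4-6")


def thinking_config_for(model: str, budget_tokens: int = 10000) -> dict:
    """Return a thinking config dict suited to the given model ID."""
    # Opus 4.7 rejects manual mode; Opus 4.6 and Sonnet 4.6 recommend adaptive.
    if model.startswith(ADAPTIVE_MODELS):
        return {"type": "adaptive"}
    # Any other model falls back to manual extended thinking with a budget.
    return {"type": "enabled", "budget_tokens": budget_tokens}


print(thinking_config_for("claude-opus-4-7"))
print(thinking_config_for("claude-sonnet-4-6"))
```

The returned dict can be passed as the thinking parameter of a Messages API request.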
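How extended thinking works
When extended thinking is turned on, Claude creates thinking content blocks in which it outputs its internal reasoning. Claude incorporates insights from this reasoning before crafting its final response.
The API response includes thinking content blocks, followed by text content blocks.
Here is an example of the default response format: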
{
"content": [
{
"type": "thinking",
"thinking": "Let me analyze this step by step...",
"signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
},
{
"type": "text",
"text": "Based on my analysis..."
}
]
}
For more information about the extended thinking response format, see the Messages API reference.
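When working with the raw JSON rather than an SDK object, the two block types can be separated with a few lines of code. This is a sketch under the assumption that the response body has already been parsed into a Python dict; split_blocks is a hypothetical helper name, and the sample mirrors the response format shown above with the signature omitted.

```python
def split_blocks(content: list) -> tuple:
    """Separate thinking text from answer text in a parsed content array."""
    thinking, text = [], []
    for block in content:
        if block.get("type") == "thinking":
            thinking.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            text.append(block.get("text", ""))
    return thinking, text


# Sample shaped like the default response format (signature left out).
sample = {
    "content": [
        {"type": "thinking", "thinking": "Let me analyze this step by step..."},
        {"type": "text", "text": "Based on my analysis..."},
    ]
}

thinking, text = split_blocks(sample["content"])
print("thinking:", thinking)
print("text:", text)
```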
How to use extended thinking
Here is an example of using extended thinking in the Messages API:
curl https://api.anthropic.com/v1/messages \
--header "x-api-key: $ANTHROPIC_API_KEY" \
--header "anthropic-version: 2023-06-01" \
--header "content-type: application/json" \
--data \
'{
"model": "claude-sonnet-4-6",
"max_tokens": 16000,
"thinking": {
"type": "enabled",
"budget_tokens": 10000
},
"messages": [
{
"role": "user",
"content": "Are there an infinite number of prime numbers such that n mod 4 == 3?"
}
]
}'
ant messages create \
--transform content --format yaml \
--model claude-sonnet-4-6 \
--max-tokens 16000 \
--thinking '{type: enabled, budget_tokens: 10000}' \
--message '{role: user, content: Are there an infinite number of prime numbers such that n mod 4 == 3?}'
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=16000,
thinking={"type": "enabled", "budget_tokens": 10000},
messages=[
{
"role": "user",
"content": "Are there an infinite number of prime numbers such that n mod 4 == 3?",
}
],
)
# The response contains summarized thinking blocks and text blocks
for block in response.content:
if block.type == "thinking":
print(f"\nThinking summary: {block.thinking}")
elif block.type == "text":
print(f"\nResponse: {block.text}")
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const response = await client.messages.create({
model: "claude-sonnet-4-6",
max_tokens: 16000,
thinking: {
type: "enabled",
budget_tokens: 10000
},
messages: [
{
role: "user",
content: "Are there an infinite number of prime numbers such that n mod 4 == 3?"
}
]
});
// The response contains summarized thinking blocks and text blocks
for (const block of response.content) {
if (block.type === "thinking") {
console.log(`\nThinking summary: ${block.thinking}`);
} else if (block.type === "text") {
console.log(`\nResponse: ${block.text}`);
}
}
using Anthropic;
using Anthropic.Models.Messages;
AnthropicClient client = new();
var parameters = new MessageCreateParams
{
Model = Model.ClaudeSonnet4_6,
MaxTokens = 16000,
Thinking = new ThinkingConfigEnabled(budgetTokens: 10000),
Messages = [
new() {
Role = Role.User,
Content = "Are there an infinite number of prime numbers such that n mod 4 == 3?"
}
]
};
var message = await client.Messages.Create(parameters);
foreach (var block in message.Content)
{
if (block.TryPickThinking(out ThinkingBlock? thinking))
{
Console.WriteLine($"\nThinking summary: {thinking.Thinking}");
}
else if (block.TryPickText(out TextBlock? text))
{
Console.WriteLine($"\nResponse: {text.Text}");
}
}
package main
import (
"context"
"fmt"
"log"
"github.com/anthropics/anthropic-sdk-go"
)
func main() {
client := anthropic.NewClient()
response, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
Model: anthropic.ModelClaudeSonnet4_6,
MaxTokens: 16000,
Thinking: anthropic.ThinkingConfigParamOfEnabled(10000),
Messages: []anthropic.MessageParam{
anthropic.NewUserMessage(anthropic.NewTextBlock("Are there an infinite number of prime numbers such that n mod 4 == 3?")),
},
})
if err != nil {
log.Fatal(err)
}
for _, block := range response.Content {
switch v := block.AsAny().(type) {
case anthropic.ThinkingBlock:
fmt.Printf("\nThinking summary: %s", v.Thinking)
case anthropic.TextBlock:
fmt.Printf("\nResponse: %s", v.Text)
}
}
}
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.messages.MessageCreateParams;
import com.anthropic.models.messages.Message;
import com.anthropic.models.messages.Model;
void main() {
AnthropicClient client = AnthropicOkHttpClient.fromEnv();
MessageCreateParams params = MessageCreateParams.builder()
.model(Model.CLAUDE_SONNET_4_6)
.maxTokens(16000L)
.enabledThinking(10000L)
.addUserMessage("Are there an infinite number of prime numbers such that n mod 4 == 3?")
.build();
Message response = client.messages().create(params);
response.content().forEach(block -> {
block.thinking().ifPresent(thinkingBlock ->
IO.println("\nThinking summary: " + thinkingBlock.thinking())
);
block.text().ifPresent(textBlock ->
IO.println("\nResponse: " + textBlock.text())
);
});
}
<?php
use Anthropic\Client;
$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));
$message = $client->messages->create(
maxTokens: 16000,
messages: [
[
'role' => 'user',
'content' => 'Are there an infinite number of prime numbers such that n mod 4 == 3?'
]
],
model: 'claude-sonnet-4-6',
thinking: ['type' => 'enabled', 'budget_tokens' => 10000],
);
foreach ($message->content as $block) {
if ($block->type === 'thinking') {
echo "\nThinking summary: " . $block->thinking;
} elseif ($block->type === 'text') {
echo "\nResponse: " . $block->text;
}
}
require "anthropic"
client = Anthropic::Client.new
message = client.messages.create(
model: "claude-sonnet-4-6",
max_tokens: 16000,
thinking: {
type: "enabled",
budget_tokens: 10000
},
messages: [
{
role: "user",
content: "Are there an infinite number of prime numbers such that n mod 4 == 3?"
}
]
)
message.content.each do |block|
case block.type
when :thinking
puts "\nThinking summary: #{block.thinking}"
when :text
puts "\nResponse: #{block.text}"
end
end
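To turn on extended thinking, add a thinking object with the type parameter set to enabled and budget_tokens set to a specified token budget for extended thinking. For Claude Opus 4.6 and Claude Sonnet 4.6, use type: "adaptive" instead. See Adaptive thinking for details. Although type: "enabled" with budget_tokens still works on these models, it is deprecated and will be removed in a future release.
The budget_tokens parameter determines the maximum number of tokens Claude is allowed to use for its internal reasoning process. This limit applies to the full thinking tokens, not to the summarized output. Larger budgets can improve response quality by enabling more thorough analysis of complex problems, although Claude may not use the entire allocated budget, especially at ranges above 32k.
Claude Mythos Preview, Claude Opus 4.7, and Claude Opus 4.6 support up to 128k output tokens. Claude Sonnet 4.6 and Claude Haiku 4.5 support up to 64k. See the models overview for limits on legacy models. On the Message Batches API, the output-300k-2026-03-24 beta header raises the output limit to 300k for Opus 4.7, Opus 4.6, and Sonnet 4.6.
budget_tokens must be set to a value less than max_tokens. However, when using interleaved thinking with tools, you can exceed this limit, because the token limit becomes your entire context window. Because budget_tokens must be less than max_tokens, extended thinking cannot be combined with max_tokens: 0 (cache pre-warming).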
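The relationship between budget_tokens and max_tokens can be checked on the client before a request is sent. The sketch below is only an illustration: check_thinking_budget and its interleaved flag are hypothetical names, and the API remains the authority on which values it accepts.

```python
def check_thinking_budget(max_tokens: int, budget_tokens: int,
                          interleaved: bool = False) -> None:
    """Raise ValueError when the budget rules described above are violated."""
    # max_tokens: 0 (cache pre-warming) leaves no room for any thinking budget.
    if max_tokens == 0:
        raise ValueError("extended thinking cannot be combined with max_tokens: 0")
    # With interleaved thinking and tools the budget may exceed max_tokens,
    # so the comparison is skipped (assumption: the caller knows this applies).
    if not interleaved and budget_tokens >= max_tokens:
        raise ValueError(
            f"budget_tokens ({budget_tokens}) must be less than max_tokens ({max_tokens})"
        )


check_thinking_budget(max_tokens=16000, budget_tokens=10000)  # passes silently
```

For example, check_thinking_budget(8000, 8000) raises ValueError, while the same values with interleaved=True pass.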
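Summarized thinking
With extended thinking enabled, the Messages API for Claude 4 models returns a summary of Claude's full thinking process. Summarized thinking provides the full intelligence benefits of extended thinking while preventing misuse. This is the default behavior on Claude 4 models when the display field of the thinking configuration is unset or set to "summarized". On Claude Opus 4.7 and Claude Mythos Preview, display defaults to "omitted", so you must explicitly set display: "summarized" to receive summarized thinking.
Here are some important considerations for summarized thinking:
- You are charged for the full thinking tokens generated by the original request, not for the summary tokens.
- The billed output token count will not match the count of tokens you see in the response.
- On Claude 4 models, the first few lines of thinking output are more verbose, providing detailed reasoning that is particularly helpful for prompt engineering. Claude Mythos Preview summarizes from the first token, so its thinking blocks do not show this verbose preamble.
- As Anthropic continues to improve the extended thinking feature, summarization behavior is subject to change.
- Summarization preserves the key ideas of Claude's thinking process with minimal added latency, enabling a streamable user experience.
- Summarization is processed by a different model from the one you specify in your request. The thinking model does not see the summarized output.
In the rare cases where you need access to the full thinking output of Claude 4 models, contact the Anthropic sales team.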
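Controlling thinking display
The display field in the thinking configuration controls how thinking content is returned in API responses. It accepts two values:
- "summarized": Thinking blocks contain summarized thinking text. See Summarized thinking for details. This is the default on Claude Opus 4.6, Claude Sonnet 4.6, and earlier Claude 4 models.
- "omitted": Thinking blocks are returned with an empty thinking field. The signature field still carries the encrypted full thinking for multi-turn continuity (see Thinking encryption). This is the default on Claude Opus 4.7 and Claude Mythos Preview.
Setting display: "omitted" is useful when your application does not show thinking content to users. The main benefit is a faster time to the first text token when streaming: the server skips streaming thinking tokens entirely and delivers only the signature, so the final text response starts streaming sooner.
Here are some important considerations for omitted thinking:
- You are still billed for the full thinking tokens. Omitting reduces latency, not cost.
- If you pass thinking blocks back in a multi-turn conversation, pass them back unchanged. The server decrypts the signature to reconstruct the original thinking for prompt construction (see Preserving thinking blocks). Any text you place in the thinking field of an omitted block that you pass back is ignored.
- display is incompatible with thinking.type: "disabled" (there is nothing to display).
- When using thinking.type: "adaptive" and the model skips thinking for a simple request, no thinking block is produced regardless of the display setting.
The signature field is the same whether display is "summarized" or "omitted". Switching display values between turns of a conversation is supported.
On Claude Mythos Preview, display defaults to "omitted". The examples in this section pass display explicitly so that they work on all models, but on Mythos Preview you can leave it unset and get the same behavior. To receive summarized thinking on Mythos Preview, explicitly set display: "summarized".
Automated pipelines that never show thinking content to end users can skip the overhead of receiving thinking tokens over the network. Latency-sensitive applications get the same reasoning quality without waiting for thinking text to stream before the final response begins.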
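The display rules above lend themselves to a small client-side check when assembling the thinking configuration. This is a sketch only: build_thinking and VALID_DISPLAY are hypothetical names, and the checks reflect just the constraints stated in this section.

```python
from typing import Optional

VALID_DISPLAY = ("summarized", "omitted")


def build_thinking(type_: str, budget_tokens: Optional[int] = None,
                   display: Optional[str] = None) -> dict:
    """Assemble a thinking config dict, rejecting combinations ruled out above."""
    config: dict = {"type": type_}
    if type_ == "enabled":
        # Manual mode carries an explicit token budget.
        if budget_tokens is None:
            raise ValueError('type "enabled" needs budget_tokens')
        config["budget_tokens"] = budget_tokens
    if display is not None:
        # With thinking disabled there is nothing to display.
        if type_ == "disabled":
            raise ValueError('display is incompatible with type "disabled"')
        if display not in VALID_DISPLAY:
            raise ValueError(f"display must be one of {VALID_DISPLAY}")
        config["display"] = display
    return config


print(build_thinking("enabled", budget_tokens=10000, display="omitted"))
```

Leaving display unset keeps the model's default, which differs between models as described above.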
curl https://api.anthropic.com/v1/messages \
--header "x-api-key: $ANTHROPIC_API_KEY" \
--header "anthropic-version: 2023-06-01" \
--header "content-type: application/json" \
--data \
'{
"model": "claude-sonnet-4-6",
"max_tokens": 16000,
"thinking": {
"type": "enabled",
"budget_tokens": 10000,
"display": "omitted"
},
"messages": [
{
"role": "user",
"content": "What is 27 * 453?"
}
]
}'
ant messages create \
--model claude-sonnet-4-6 \
--max-tokens 16000 \
--transform content --format yaml \
--thinking '{type: enabled, budget_tokens: 10000, display: omitted}' \
--message '{role: user, content: "What is 27 * 453?"}'
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=16000,
thinking={
"type": "enabled",
"budget_tokens": 10000,
"display": "omitted",
},
messages=[
{"role": "user", "content": "What is 27 * 453?"},
],
)
for block in response.content:
if block.type == "thinking":
if block.thinking:
print(f"Thinking: {block.thinking}")
else:
print("Thinking: [omitted]")
elif block.type == "text":
print(f"Response: {block.text}")
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const response = await client.messages.create({
model: "claude-sonnet-4-6",
max_tokens: 16000,
thinking: {
type: "enabled",
budget_tokens: 10000,
display: "omitted"
},
messages: [
{
role: "user",
content: "What is 27 * 453?"
}
]
});
for (const block of response.content) {
if (block.type === "thinking") {
if (block.thinking.length > 0) {
console.log(`Thinking: ${block.thinking}`);
} else {
console.log("Thinking: [omitted]");
}
} else if (block.type === "text") {
console.log(`Response: ${block.text}`);
}
}
using Anthropic;
using Anthropic.Models.Messages;
AnthropicClient client = new();
var message = await client.Messages.Create(new MessageCreateParams
{
Model = Model.ClaudeSonnet4_6,
MaxTokens = 16000,
Thinking = new ThinkingConfigEnabled
{
BudgetTokens = 10000,
Display = ThinkingConfigEnabledDisplay.Omitted
},
Messages =
[
new() { Role = Role.User, Content = "What is 27 * 453?" }
]
});
foreach (var block in message.Content)
{
if (block.TryPickThinking(out ThinkingBlock? thinking))
{
Console.WriteLine(string.IsNullOrEmpty(thinking.Thinking)
? "Thinking: [omitted]"
: $"Thinking: {thinking.Thinking}");
}
else if (block.TryPickText(out TextBlock? text))
{
Console.WriteLine($"Response: {text.Text}");
}
}
package main
import (
"cmp"
"context"
"fmt"
"log"
"github.com/anthropics/anthropic-sdk-go"
)
func main() {
client := anthropic.NewClient()
response, err := client.Messages.New(context.Background(), anthropic.MessageNewParams{
Model: anthropic.ModelClaudeSonnet4_6,
MaxTokens: 16000,
Thinking: anthropic.ThinkingConfigParamUnion{
OfEnabled: &anthropic.ThinkingConfigEnabledParam{
BudgetTokens: 10000,
Display: anthropic.ThinkingConfigEnabledDisplayOmitted,
},
},
Messages: []anthropic.MessageParam{
anthropic.NewUserMessage(anthropic.NewTextBlock("What is 27 * 453?")),
},
})
if err != nil {
log.Fatal(err)
}
for _, block := range response.Content {
switch v := block.AsAny().(type) {
case anthropic.ThinkingBlock:
fmt.Println("Thinking:", cmp.Or(v.Thinking, "[omitted]"))
case anthropic.TextBlock:
fmt.Println("Response:", v.Text)
}
}
}
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.messages.MessageCreateParams;
import com.anthropic.models.messages.Message;
import com.anthropic.models.messages.Model;
import com.anthropic.models.messages.ThinkingConfigEnabled;
void main() {
AnthropicClient client = AnthropicOkHttpClient.fromEnv();
MessageCreateParams params = MessageCreateParams.builder()
.model(Model.CLAUDE_SONNET_4_6)
.maxTokens(16000L)
.thinking(ThinkingConfigEnabled.builder()
.budgetTokens(10000L)
.display(ThinkingConfigEnabled.Display.OMITTED)
.build())
.addUserMessage("What is 27 * 453?")
.build();
Message message = client.messages().create(params);
message.content().forEach(block -> {
block.thinking().ifPresent(thinkingBlock -> {
if (thinkingBlock.thinking().isEmpty()) {
IO.println("Thinking: [omitted]");
} else {
IO.println("Thinking: " + thinkingBlock.thinking());
}
});
block.text().ifPresent(textBlock ->
IO.println("Response: " + textBlock.text())
);
});
}
<?php
use Anthropic\Client;
use Anthropic\Messages\TextBlock;
use Anthropic\Messages\ThinkingBlock;
use Anthropic\Messages\ThinkingConfigEnabled;
use Anthropic\Messages\ThinkingConfigEnabled\Display;
$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));
$response = $client->messages->create(
model: 'claude-sonnet-4-6',
maxTokens: 16000,
thinking: ThinkingConfigEnabled::with(
budgetTokens: 10000,
display: Display::OMITTED,
),
messages: [
['role' => 'user', 'content' => 'What is 27 * 453?'],
],
);
foreach ($response->content as $block) {
echo match (true) {
$block instanceof ThinkingBlock && $block->thinking === '' => "Thinking: [omitted]\n",
$block instanceof ThinkingBlock => "Thinking: {$block->thinking}\n",
$block instanceof TextBlock => "Response: {$block->text}\n",
default => '',
};
}
require "anthropic"
client = Anthropic::Client.new
response = client.messages.create(
model: "claude-sonnet-4-6",
max_tokens: 16000,
thinking: {
type: :enabled,
budget_tokens: 10000,
# The Ruby SDK uses `display_` (trailing underscore) to avoid
# shadowing Kernel#display; the wire field is still `display`.
display_: :omitted
},
messages: [{role: "user", content: "What is 27 * 453?"}]
)
response.content.each do |block|
case block.type
when :thinking
puts block.thinking.empty? ? "Thinking: [omitted]" : "Thinking: #{block.thinking}"
when :text
puts "Response: #{block.text}"
end
end
When display: "omitted" is set, the response contains thinking blocks whose thinking field is empty:
{
"content": [
{
"type": "thinking",
"thinking": "",
"signature": "EosnCkYICxIMMb3LzNrMu..."
},
{
"type": "text",
"text": "The answer is 12,231."
}
]
}
When streaming with display: "omitted", no thinking_delta events are emitted; see Streaming thinking below for the event sequence.
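Because omitted thinking blocks must be passed back unchanged in multi-turn conversations, it can help to route the assistant content through a helper that never edits it. The sketch below works on parsed JSON dicts; with_assistant_turn and is_omitted are hypothetical helper names, and the sample content mirrors the omitted response shown above.

```python
import copy


def is_omitted(block: dict) -> bool:
    """True for a thinking block whose text was omitted from the response."""
    return block.get("type") == "thinking" and block.get("thinking") == ""


def with_assistant_turn(history: list, assistant_content: list, user_content) -> list:
    """Return a new message list that echoes the assistant blocks unchanged."""
    return history + [
        # Deep copy so later edits to the response object cannot alter the echo.
        {"role": "assistant", "content": copy.deepcopy(assistant_content)},
        {"role": "user", "content": user_content},
    ]


history = [{"role": "user", "content": "What is 27 * 453?"}]
assistant_content = [
    {"type": "thinking", "thinking": "", "signature": "EosnCkYICxIMMb3LzNrMu..."},
    {"type": "text", "text": "The answer is 12,231."},
]

messages = with_assistant_turn(history, assistant_content, "Now divide that by 3.")
print(is_omitted(messages[1]["content"][0]))
```

The follow-up question is an invented example; the point is that the signature travels back with the block exactly as received.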
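Streaming thinking
You can stream extended thinking responses using server-sent events (SSE).
When streaming is enabled for extended thinking, you receive thinking content via thinking_delta events.
When display: "omitted" is set, no thinking_delta events are emitted. See Controlling thinking display.
For more documentation on streaming via the Messages API, see Streaming Messages.
Here is how to handle streaming with thinking: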
curl https://api.anthropic.com/v1/messages \
--header "x-api-key: $ANTHROPIC_API_KEY" \
--header "anthropic-version: 2023-06-01" \
--header "content-type: application/json" \
--data \
'{
"model": "claude-sonnet-4-6",
"max_tokens": 16000,
"stream": true,
"thinking": {
"type": "enabled",
"budget_tokens": 10000
},
"messages": [
{
"role": "user",
"content": "What is the greatest common divisor of 1071 and 462?"
}
]
}'
ant messages create --stream --format jsonl \
--model claude-sonnet-4-6 \
--max-tokens 16000 \
--thinking '{type: enabled, budget_tokens: 10000}' \
--message '{role: user, content: What is the greatest common divisor of 1071 and 462?}'
import anthropic
client = anthropic.Anthropic()
with client.messages.stream(
model="claude-sonnet-4-6",
max_tokens=16000,
thinking={"type": "enabled", "budget_tokens": 10000},
messages=[
{
"role": "user",
"content": "What is the greatest common divisor of 1071 and 462?",
}
],
) as stream:
thinking_started = False
response_started = False
for event in stream:
if event.type == "content_block_start":
print(f"\nStarting {event.content_block.type} block...")
# Reset flags for each new block
thinking_started = False
response_started = False
elif event.type == "content_block_delta":
if event.delta.type == "thinking_delta":
if not thinking_started:
print("Thinking: ", end="", flush=True)
thinking_started = True
print(event.delta.thinking, end="", flush=True)
elif event.delta.type == "text_delta":
if not response_started:
print("Response: ", end="", flush=True)
response_started = True
print(event.delta.text, end="", flush=True)
elif event.type == "content_block_stop":
print("\nBlock complete.")
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const stream = await client.messages.stream({
model: "claude-sonnet-4-6",
max_tokens: 16000,
thinking: {
type: "enabled",
budget_tokens: 10000
},
messages: [
{
role: "user",
content: "What is the greatest common divisor of 1071 and 462?"
}
]
});
let thinkingStarted = false;
let responseStarted = false;
for await (const event of stream) {
if (event.type === "content_block_start") {
console.log(`\nStarting ${event.content_block.type} block...`);
// Reset flags for each new block
thinkingStarted = false;
responseStarted = false;
} else if (event.type === "content_block_delta") {
if (event.delta.type === "thinking_delta") {
if (!thinkingStarted) {
process.stdout.write("Thinking: ");
thinkingStarted = true;
}
process.stdout.write(event.delta.thinking);
} else if (event.delta.type === "text_delta") {
if (!responseStarted) {
process.stdout.write("Response: ");
responseStarted = true;
}
process.stdout.write(event.delta.text);
}
} else if (event.type === "content_block_stop") {
console.log("\nBlock complete.");
}
}
using Anthropic;
using Anthropic.Models.Messages;
AnthropicClient client = new();
var parameters = new MessageCreateParams
{
Model = Model.ClaudeSonnet4_6,
MaxTokens = 16000,
Thinking = new ThinkingConfigEnabled(budgetTokens: 10000),
Messages = [new() { Role = Role.User, Content = "What is the greatest common divisor of 1071 and 462?" }]
};
bool thinkingStarted = false;
bool responseStarted = false;
await foreach (var streamEvent in client.Messages.CreateStreaming(parameters))
{
if (streamEvent.TryPickContentBlockStart(out var blockStart))
{
Console.WriteLine($"\nStarting {blockStart.ContentBlock.Type} block...");
thinkingStarted = false;
responseStarted = false;
}
else if (streamEvent.TryPickContentBlockDelta(out var blockDelta))
{
if (blockDelta.Delta.TryPickThinking(out var thinkingDelta))
{
if (!thinkingStarted)
{
Console.Write("Thinking: ");
thinkingStarted = true;
}
Console.Write(thinkingDelta.Thinking);
}
else if (blockDelta.Delta.TryPickText(out var textDelta))
{
if (!responseStarted)
{
Console.Write("Response: ");
responseStarted = true;
}
Console.Write(textDelta.Text);
}
}
else if (streamEvent.TryPickContentBlockStop(out _))
{
Console.WriteLine("\nBlock complete.");
}
}
package main
import (
"context"
"fmt"
"log"
"github.com/anthropics/anthropic-sdk-go"
)
func main() {
client := anthropic.NewClient()
stream := client.Messages.NewStreaming(context.TODO(), anthropic.MessageNewParams{
Model: anthropic.ModelClaudeSonnet4_6,
MaxTokens: 16000,
Thinking: anthropic.ThinkingConfigParamOfEnabled(10000),
Messages: []anthropic.MessageParam{
anthropic.NewUserMessage(anthropic.NewTextBlock("What is the greatest common divisor of 1071 and 462?")),
},
})
thinkingStarted := false
responseStarted := false
for stream.Next() {
event := stream.Current()
switch eventVariant := event.AsAny().(type) {
case anthropic.ContentBlockStartEvent:
fmt.Printf("\nStarting %s block...\n", eventVariant.ContentBlock.Type)
thinkingStarted = false
responseStarted = false
case anthropic.ContentBlockDeltaEvent:
switch deltaVariant := eventVariant.Delta.AsAny().(type) {
case anthropic.ThinkingDelta:
if !thinkingStarted {
fmt.Print("Thinking: ")
thinkingStarted = true
}
fmt.Print(deltaVariant.Thinking)
case anthropic.TextDelta:
if !responseStarted {
fmt.Print("Response: ")
responseStarted = true
}
fmt.Print(deltaVariant.Text)
}
case anthropic.ContentBlockStopEvent:
fmt.Println("\nBlock complete.")
}
}
if err := stream.Err(); err != nil {
log.Fatal(err)
}
}
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.messages.MessageCreateParams;
import com.anthropic.models.messages.Model;
void main() {
AnthropicClient client = AnthropicOkHttpClient.fromEnv();
MessageCreateParams params = MessageCreateParams.builder()
.model(Model.CLAUDE_SONNET_4_6)
.maxTokens(16000L)
.enabledThinking(10000L)
.addUserMessage("What is the greatest common divisor of 1071 and 462?")
.build();
try (var streamResponse = client.messages().createStreaming(params)) {
streamResponse.stream().forEach(event -> {
event.contentBlockStart().ifPresent(startEvent ->
IO.println("\nStarting block...")
);
event.contentBlockDelta().ifPresent(deltaEvent -> {
deltaEvent.delta().thinking().ifPresent(td ->
IO.print(td.thinking())
);
deltaEvent.delta().text().ifPresent(td ->
IO.print(td.text())
);
});
event.contentBlockStop().ifPresent(stopEvent ->
IO.println("\nBlock complete.")
);
});
}
}
<?php
use Anthropic\Client;
$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));
$thinkingStarted = false;
$responseStarted = false;
$stream = $client->messages->createStream(
maxTokens: 16000,
messages: [
['role' => 'user', 'content' => 'What is the greatest common divisor of 1071 and 462?']
],
model: 'claude-sonnet-4-6',
thinking: ['type' => 'enabled', 'budget_tokens' => 10000],
);
foreach ($stream as $event) {
if ($event->type === 'content_block_start') {
echo "\nStarting {$event->contentBlock->type} block...\n";
$thinkingStarted = false;
$responseStarted = false;
} elseif ($event->type === 'content_block_delta') {
if ($event->delta->type === 'thinking_delta') {
if (!$thinkingStarted) {
echo "Thinking: ";
$thinkingStarted = true;
}
echo $event->delta->thinking;
} elseif ($event->delta->type === 'text_delta') {
if (!$responseStarted) {
echo "Response: ";
$responseStarted = true;
}
echo $event->delta->text;
}
} elseif ($event->type === 'content_block_stop') {
echo "\nBlock complete.\n";
}
}
require "anthropic"
client = Anthropic::Client.new
thinking_started = false
response_started = false
stream = client.messages.stream(
model: "claude-sonnet-4-6",
max_tokens: 16000,
thinking: {
type: "enabled",
budget_tokens: 10000
},
messages: [
{ role: "user", content: "What is the greatest common divisor of 1071 and 462?" }
]
)
stream.each do |event|
case event.type
when :content_block_start
puts "\nStarting #{event.content_block.type} block..."
thinking_started = false
response_started = false
when :content_block_delta
if event.delta.type == :thinking_delta
unless thinking_started
print "Thinking: "
thinking_started = true
end
print event.delta.thinking
elsif event.delta.type == :text_delta
unless response_started
print "Response: "
response_started = true
end
print event.delta.text
end
when :content_block_stop
puts "\nBlock complete."
end
end
Example streaming output:
event: message_start
data: {"type": "message_start", "message": {"id": "msg_01...", "type": "message", "role": "assistant", "content": [], "model": "claude-sonnet-4-6", "stop_reason": null, "stop_sequence": null}}
event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking", "thinking": "", "signature": ""}}
event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "I need to find the GCD of 1071 and 462 using the Euclidean algorithm.\n\n1071 = 2 × 462 + 147"}}
event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "thinking_delta", "thinking": "\n462 = 3 × 147 + 21\n147 = 7 × 21 + 0\n\nSo GCD(1071, 462) = 21"}}
// Additional thinking deltas...
event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "signature_delta", "signature": "EqQBCgIYAhIM1gbcDa9GJwZA2b3hGgxBdjrkzLoky3dl1pkiMOYds..."}}
event: content_block_stop
data: {"type": "content_block_stop", "index": 0}
event: content_block_start
data: {"type": "content_block_start", "index": 1, "content_block": {"type": "text", "text": ""}}
event: content_block_delta
data: {"type": "content_block_delta", "index": 1, "delta": {"type": "text_delta", "text": "The greatest common divisor of 1071 and 462 is **21**."}}
// Additional text deltas...
event: content_block_stop
data: {"type": "content_block_stop", "index": 1}
event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn", "stop_sequence": null}}
event: message_stop
data: {"type": "message_stop"}
When display: "omitted" is set, the thinking block opens, a single signature_delta arrives, and the block closes without any thinking_delta events. Text streaming begins immediately afterward:
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"thinking","thinking":"","signature":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"signature_delta","signature":"EosnCkYICxIMMb3LzNrMu..."}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"text","text":""}}
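When streaming with thinking enabled, you may notice that text sometimes arrives in larger chunks, alternating with smaller, token-by-token delivery. This is expected behavior, especially for thinking content.
The streaming system needs to process content in batches for optimal performance, which can produce this "chunky" delivery pattern, with possible delays between streaming events.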
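Since deltas can arrive in uneven chunks, a client that needs the complete blocks can accumulate them by index. The following sketch operates on already-parsed event dicts shaped like the streaming output above; accumulate is a hypothetical helper, and the sample events are shortened stand-ins rather than real API output.

```python
def accumulate(events: list) -> list:
    """Rebuild complete content blocks from parsed streaming events."""
    blocks: dict = {}
    for event in events:
        kind = event["type"]
        if kind == "content_block_start":
            # Copy the initial block so the input events are not mutated.
            blocks[event["index"]] = dict(event["content_block"])
        elif kind == "content_block_delta":
            block = blocks[event["index"]]
            delta = event["delta"]
            if delta["type"] == "thinking_delta":
                block["thinking"] += delta["thinking"]
            elif delta["type"] == "signature_delta":
                block["signature"] += delta["signature"]
            elif delta["type"] == "text_delta":
                block["text"] += delta["text"]
        # Other event types (message_start, content_block_stop, ...) need no action here.
    return [blocks[index] for index in sorted(blocks)]


events = [
    {"type": "content_block_start", "index": 0,
     "content_block": {"type": "thinking", "thinking": "", "signature": ""}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "thinking_delta", "thinking": "1071 = 2 x 462 + 147"}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "thinking_delta", "thinking": "; so the GCD is 21"}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "signature_delta", "signature": "EqQB..."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_start", "index": 1,
     "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "text_delta", "text": "The GCD is 21."}},
    {"type": "content_block_stop", "index": 1},
    {"type": "message_stop"},
]

blocks = accumulate(events)
print(blocks)
```

With display: "omitted" the same function yields a thinking block with an empty thinking field and a populated signature, because only the signature_delta arrives.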
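Extended thinking with tool use
Extended thinking can be used alongside tool use, allowing Claude to reason through tool selection and the processing of results.
When using extended thinking with tool use, be aware of the following limitations:
- Tool choice limitation: Tool use with thinking only supports tool_choice: {"type": "auto"} (the default) or tool_choice: {"type": "none"}. Using tool_choice: {"type": "any"} or tool_choice: {"type": "tool", "name": "..."} results in an error, because these options force tool use, which is incompatible with extended thinking.
- Preserving thinking blocks: During tool use, you must pass the thinking blocks of the last assistant message back to the API. Pass the complete, unmodified blocks back to maintain reasoning continuity.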
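The tool choice limitation can be caught before the request leaves the client. This is a minimal sketch with hypothetical names (check_tool_choice, FORCING_TYPES); it assumes that any thinking type other than "disabled" counts as thinking being on, which this section does not state explicitly.

```python
from typing import Optional

# tool_choice types that force a tool call, per the limitation above.
FORCING_TYPES = {"any", "tool"}


def check_tool_choice(thinking: Optional[dict], tool_choice: Optional[dict]) -> None:
    """Raise ValueError if tool_choice forces tool use while thinking is on."""
    # Assumption: every thinking type except "disabled" counts as "on".
    thinking_on = thinking is not None and thinking.get("type") != "disabled"
    # An unset tool_choice behaves like {"type": "auto"}, the default.
    choice_type = (tool_choice or {"type": "auto"}).get("type")
    if thinking_on and choice_type in FORCING_TYPES:
        raise ValueError(
            f'tool_choice type "{choice_type}" forces tool use and cannot be combined with thinking'
        )


check_tool_choice({"type": "enabled", "budget_tokens": 10000}, {"type": "auto"})
check_tool_choice({"type": "enabled", "budget_tokens": 10000}, None)
```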
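Toggling thinking modes in conversations
You cannot toggle thinking in the middle of an assistant turn, including during tool use loops. The entire assistant turn should operate in a single thinking mode:
- If thinking is enabled, the final assistant turn should start with a thinking block.
- If thinking is disabled, the final assistant turn should not contain any thinking blocks.
From the model's perspective, tool use loops are part of the assistant turn. An assistant turn does not complete until Claude finishes its full response, which may include multiple tool calls and results.
For example, the following sequence is all part of a single assistant turn: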
User: "What's the weather in Paris?"
Assistant: [thinking] + [tool_use: get_weather]
User: [tool_result: "20°C, sunny"]
Assistant: [text: "The weather in Paris is 20°C and sunny"]
Even though there are multiple API messages, the tool use loop is conceptually part of one continuous assistant response.
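To see where the final assistant turn begins in a message list, one can walk backwards past tool results, as in the sketch below. The helper names are hypothetical, and treating a user message made up only of tool_result blocks as part of the tool use loop is an assumption drawn from the sequence above.

```python
from typing import Optional


def _is_tool_result_only(message: dict) -> bool:
    """True for a user message that carries nothing but tool_result blocks."""
    content = message.get("content")
    return (
        message.get("role") == "user"
        and isinstance(content, list)
        and len(content) > 0
        and all(block.get("type") == "tool_result" for block in content)
    )


def final_turn_first_block_type(messages: list) -> Optional[str]:
    """Return the type of the block that opens the final assistant turn.

    Returns None when the list ends with an ordinary user message,
    which means a new turn is about to begin.
    """
    start = None
    for index in range(len(messages) - 1, -1, -1):
        message = messages[index]
        if message.get("role") == "assistant":
            start = index  # keep moving toward the start of the turn
        elif _is_tool_result_only(message):
            continue  # tool results stay inside the same assistant turn
        else:
            break  # an ordinary user message marks the turn boundary
    if start is None:
        return None
    content = messages[start]["content"]
    if isinstance(content, list) and content:
        return content[0].get("type")
    return "text"  # plain string content counts as one text block


conversation = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "content": [
        {"type": "thinking", "thinking": "...", "signature": "..."},
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
         "input": {"location": "Paris"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01", "content": "20°C, sunny"},
    ]},
]

print(final_turn_first_block_type(conversation))
```

A result of "thinking" indicates that the turn in progress opened with a thinking block, so thinking should stay enabled for the next request in that loop.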
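Graceful thinking degradation
When a mid-turn thinking conflict occurs (such as toggling thinking on or off during a tool use loop), the API automatically disables thinking for that request. To preserve model quality and remain in distribution, the API may:
- Strip thinking blocks from the conversation when they would create an invalid turn structure
- Disable thinking for the current request when the conversation history is incompatible with thinking being enabled
This means that attempting to toggle thinking mid-turn does not cause an error, but thinking is silently disabled. To confirm whether thinking was active, check for the presence of thinking blocks in the response.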
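The check described above amounts to scanning the response content for a thinking block. A minimal sketch, assuming the response body has been parsed into a dict; thinking_was_active is a hypothetical helper name.

```python
def thinking_was_active(content: list) -> bool:
    """True when the parsed response content holds at least one thinking block."""
    return any(block.get("type") == "thinking" for block in content)


with_thinking = [
    {"type": "thinking", "thinking": "", "signature": "EosnCkYICxIMMb3LzNrMu..."},
    {"type": "text", "text": "It's sunny"},
]
without_thinking = [{"type": "text", "text": "It's sunny"}]

print(thinking_was_active(with_thinking), thinking_was_active(without_thinking))
```

An omitted thinking block still counts, since the block is present even when its thinking field is empty. With thinking.type: "adaptive", a missing block may simply mean the model skipped thinking for a simple request.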
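Practical guidance
Best practice: Plan your thinking strategy at the start of each turn rather than trying to toggle mid-turn.
Example: Toggling thinking after completing a turn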
User: "What's the weather?"
Assistant: [tool_use] (thinking disabled)
User: [tool_result]
Assistant: [text: "It's sunny"]
User: "What about tomorrow?"
Assistant: [thinking] + [text: "..."] (thinking enabled - new turn)
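By completing the assistant turn before toggling thinking, you ensure that thinking is actually enabled for the new request.
Toggling thinking modes also invalidates prompt caching for the message history. For more details, see the Extended thinking with prompt caching section.
Example: Passing thinking blocks with tool results
Here is a practical example showing how to preserve thinking blocks when providing tool results: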
ant messages create --transform content <<'YAML'
model: claude-sonnet-4-6
max_tokens: 16000
thinking:
type: enabled
budget_tokens: 10000
tools:
- name: get_weather
description: Get current weather for a location
input_schema:
type: object
properties:
location:
type: string
required:
- location
messages:
- role: user
content: "What's the weather in Paris?"
YAML
import anthropic
client = anthropic.Anthropic()
weather_tool = {
"name": "get_weather",
"description": "Get current weather for a location",
"input_schema": {
"type": "object",
"properties": {"location": {"type": "string"}},
"required": ["location"],
},
}
# First request - Claude responds with thinking and tool request
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=16000,
thinking={"type": "enabled", "budget_tokens": 10000},
tools=[weather_tool],
messages=[{"role": "user", "content": "What's the weather in Paris?"}],
)
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const weatherTool: Anthropic.Tool = {
name: "get_weather",
description: "Get current weather for a location",
input_schema: {
type: "object",
properties: {
location: { type: "string" }
},
required: ["location"]
}
};
// First request - Claude responds with thinking and tool request
const response = await client.messages.create({
model: "claude-sonnet-4-6",
max_tokens: 16000,
thinking: {
type: "enabled",
budget_tokens: 10000
},
tools: [weatherTool],
messages: [{ role: "user", content: "What's the weather in Paris?" }]
});
using System.Text.Json;
using Anthropic;
using Anthropic.Models.Messages;
AnthropicClient client = new();
var weatherTool = new ToolUnion(new Tool()
{
Name = "get_weather",
Description = "Get current weather for a location",
InputSchema = new InputSchema()
{
Properties = new Dictionary<string, JsonElement>
{
["location"] = JsonSerializer.SerializeToElement(new { type = "string" }),
},
Required = ["location"],
},
});
var parameters = new MessageCreateParams
{
Model = Model.ClaudeSonnet4_6,
MaxTokens = 16000,
Thinking = new ThinkingConfigEnabled(budgetTokens: 10000),
Tools = [weatherTool],
Messages = [new() { Role = Role.User, Content = "What's the weather in Paris?" }]
};
var message = await client.Messages.Create(parameters);
Console.WriteLine(message);
package main
import (
"context"
"fmt"
"log"
"github.com/anthropics/anthropic-sdk-go"
)
func main() {
client := anthropic.NewClient()
weatherTool := anthropic.ToolUnionParam{
OfTool: &anthropic.ToolParam{
Name: "get_weather",
Description: anthropic.String("Get current weather for a location"),
InputSchema: anthropic.ToolInputSchemaParam{
Properties: map[string]any{
"location": map[string]any{
"type": "string",
},
},
Required: []string{"location"},
},
},
}
response, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
Model: anthropic.ModelClaudeSonnet4_6,
MaxTokens: 16000,
Thinking: anthropic.ThinkingConfigParamOfEnabled(10000),
Tools: []anthropic.ToolUnionParam{weatherTool},
Messages: []anthropic.MessageParam{
anthropic.NewUserMessage(anthropic.NewTextBlock("What's the weather in Paris?")),
},
})
if err != nil {
log.Fatal(err)
}
fmt.Println(response)
}
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.messages.MessageCreateParams;
import com.anthropic.models.messages.Message;
import com.anthropic.models.messages.Model;
import com.anthropic.models.messages.Tool;
import com.anthropic.core.JsonValue;
import java.util.List;
import java.util.Map;
void main() {
AnthropicClient client = AnthropicOkHttpClient.fromEnv();
MessageCreateParams params = MessageCreateParams.builder()
.model(Model.CLAUDE_SONNET_4_6)
.maxTokens(16000L)
.enabledThinking(10000L)
.addTool(Tool.builder()
.name("get_weather")
.description("Get current weather for a location")
.inputSchema(Tool.InputSchema.builder()
.properties(JsonValue.from(Map.of(
"location", Map.of("type", "string")
)))
.required(List.of("location"))
.build())
.build())
.addUserMessage("What's the weather in Paris?")
.build();
Message response = client.messages().create(params);
IO.println(response);
}
<?php
use Anthropic\Client;
$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));
$weatherTool = [
'name' => 'get_weather',
'description' => 'Get current weather for a location',
'input_schema' => [
'type' => 'object',
'properties' => [
'location' => ['type' => 'string']
],
'required' => ['location']
]
];
$message = $client->messages->create(
maxTokens: 16000,
messages: [
['role' => 'user', 'content' => "What's the weather in Paris?"]
],
model: 'claude-sonnet-4-6',
thinking: ['type' => 'enabled', 'budget_tokens' => 10000],
tools: [$weatherTool],
);
echo $message;
require "anthropic"
client = Anthropic::Client.new
weather_tool = {
name: "get_weather",
description: "Get current weather for a location",
input_schema: {
type: "object",
properties: {
location: { type: "string" }
},
required: ["location"]
}
}
message = client.messages.create(
model: "claude-sonnet-4-6",
max_tokens: 16000,
thinking: {
type: "enabled",
budget_tokens: 10000
},
tools: [weather_tool],
messages: [
{ role: "user", content: "What's the weather in Paris?" }
]
)
puts message
The API response includes thinking, text, and tool_use blocks:
{
"content": [
{
"type": "thinking",
"thinking": "The user wants to know the current weather in Paris. I have access to a function `get_weather`...",
"signature": "BDaL4VrbR2Oj0hO4XpJxT28J5TILnCrrUXoKiiNBZW9P+nr8XSj1zuZzAl4egiCCpQNvfyUuFFJP5CncdYZEQPPmLxYsNrcs...."
},
{
"type": "text",
"text": "I can help you get the current weather information for Paris. Let me check that for you"
},
{
"type": "tool_use",
"id": "toolu_01CswdEQBMshySk6Y9DFKrfq",
"name": "get_weather",
"input": {
"location": "Paris"
}
}
]
}
Now continue the conversation and use the tool:
# First turn: capture the assistant content array (thinking + tool_use,
# with signatures intact) as compact JSON.
ASSISTANT_CONTENT=$(ant messages create \
  --transform content <<'YAML'
model: claude-sonnet-4-6
max_tokens: 16000
thinking:
  type: enabled
  budget_tokens: 10000
tools:
  - name: get_weather
    description: Get the current weather in a given location
    input_schema:
      type: object
      properties:
        location:
          type: string
          description: The city and state
      required: [location]
messages:
  - role: user
    content: What's the weather in Paris?
YAML
)
TOOL_USE_ID=$(printf '%s' "$ASSISTANT_CONTENT" \
  | grep -o 'toolu_[A-Za-z0-9]*')
# Second turn: pass the captured blocks back as the assistant message.
# The thinking block MUST accompany the tool_use block.
ant messages create <<YAML
model: claude-sonnet-4-6
max_tokens: 16000
thinking:
  type: enabled
  budget_tokens: 10000
tools:
  - name: get_weather
    description: Get the current weather in a given location
    input_schema:
      type: object
      properties:
        location:
          type: string
          description: The city and state
      required: [location]
messages:
  - role: user
    content: What's the weather in Paris?
  - role: assistant
    content: $ASSISTANT_CONTENT
  - role: user
    content:
      - type: tool_result
        tool_use_id: $TOOL_USE_ID
        content: "Current temperature: 88°F"
YAML
import anthropic
client = anthropic.Anthropic()
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather in a given location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "The city and state"}
        },
        "required": ["location"],
    },
}
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    tools=[weather_tool],
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
)
# Extract thinking block and tool use block
thinking_block = next(
    (block for block in response.content if block.type == "thinking"), None
)
tool_use_block = next(
    (block for block in response.content if block.type == "tool_use"), None
)
# Call your actual weather API, here is where your actual API call would go
# Let's pretend this is what we get back
weather_data = {"temperature": 88}
# Second request - Include thinking block and tool result
# No new thinking blocks are generated in the response
continuation = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    tools=[weather_tool],
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"},
        # notice that the thinking_block is passed in as well as the tool_use_block
        # if this is not passed in, an error is raised
        {"role": "assistant", "content": [thinking_block, tool_use_block]},
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": tool_use_block.id,
                    "content": f"Current temperature: {weather_data['temperature']}°F",
                }
            ],
        },
    ],
)
print(continuation)
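import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const weatherTool: Anthropic.Tool = {
  name: "get_weather",
  description: "Get the current weather in a given location",
  input_schema: {
    type: "object",
    properties: {
      location: { type: "string", description: "The city and state" }
    },
    required: ["location"]
  }
};
// First request - Claude responds with thinking and a tool request
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 16000,
  thinking: { type: "enabled", budget_tokens: 10000 },
  tools: [weatherTool],
  messages: [{ role: "user", content: "What's the weather in Paris?" }]
});
// Extract thinking block and tool use block
const thinkingBlock = response.content.find(
  (block): block is Anthropic.ThinkingBlock => block.type === "thinking"
);
const toolUseBlock = response.content.find(
  (block): block is Anthropic.ToolUseBlock => block.type === "tool_use"
);
// Call your actual weather API, here is where your actual API call would go
// Let's pretend this is what we get back
const weatherData = { temperature: 88 };
if (thinkingBlock && toolUseBlock) {
  // Second request - Include thinking block and tool result
  // No new thinking blocks are generated in the response
  const continuation = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 16000,
    thinking: {
      type: "enabled",
      budget_tokens: 10000
    },
    tools: [weatherTool],
    messages: [
      { role: "user", content: "What's the weather in Paris?" },
      // notice that the thinkingBlock is passed in as well as the toolUseBlock
      // if this is not passed in, an error is raised
      { role: "assistant", content: [thinkingBlock, toolUseBlock] },
      {
        role: "user",
        content: [
          {
            type: "tool_result" as const,
            tool_use_id: toolUseBlock.id,
            content: `Current temperature: ${weatherData.temperature}°F`
          }
        ]
      }
    ]
  });
  console.log(continuation);
}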
using System.Text.Json;
using Anthropic;
using Anthropic.Models.Messages;
AnthropicClient client = new();
var weatherTool = new ToolUnion(new Tool()
{
Name = "get_weather",
Description = "Get current weather for a location",
InputSchema = new InputSchema()
{
Properties = new Dictionary<string, JsonElement>
{
["location"] = JsonSerializer.SerializeToElement(new { type = "string", description = "City name" }),
},
Required = ["location"],
},
});
var parameters = new MessageCreateParams
{
Model = Model.ClaudeSonnet4_6,
MaxTokens = 16000,
Thinking = new ThinkingConfigEnabled(budgetTokens: 10000),
Tools = [weatherTool],
Messages = [
new() { Role = Role.User, Content = "What is the weather in Paris?" }
]
};
var response = await client.Messages.Create(parameters);
// Extract the tool_use block to get its ID for the tool result
ToolUseBlock? toolUseBlock = null;
foreach (var block in response.Content)
{
if (block.TryPickToolUse(out var toolUse))
{
toolUseBlock = toolUse;
}
}
var weatherData = new { temperature = 88 };
// Build continuation with tool result
var continuationParams = new MessageCreateParams
{
Model = Model.ClaudeSonnet4_6,
MaxTokens = 16000,
Thinking = new ThinkingConfigEnabled(budgetTokens: 10000),
Tools = [weatherTool],
Messages = [
new() { Role = Role.User, Content = "What is the weather in Paris?" },
// response.Content includes the thinking blocks; passing them back is required
new() { Role = Role.Assistant, Content = response.Content.Select(block => new ContentBlockParam(block.Json)).ToList() },
new() { Role = Role.User, Content = new MessageParamContent(new List<ContentBlockParam>
{
new ContentBlockParam(new ToolResultBlockParam()
{
ToolUseID = toolUseBlock?.ID ?? "",
Content = $"Current temperature: {weatherData.temperature}°F"
})
})}
]
};
var continuation = await client.Messages.Create(continuationParams);
Console.WriteLine(continuation);
package main
import (
"context"
"fmt"
"log"
"github.com/anthropics/anthropic-sdk-go"
)
func main() {
client := anthropic.NewClient()
weatherTool := anthropic.ToolUnionParam{
OfTool: &anthropic.ToolParam{
Name: "get_weather",
Description: anthropic.String("Get current weather for a location"),
InputSchema: anthropic.ToolInputSchemaParam{
Properties: map[string]any{
"location": map[string]any{
"type": "string",
"description": "City name",
},
},
Required: []string{"location"},
},
},
}
response, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
Model: anthropic.ModelClaudeSonnet4_6,
MaxTokens: 16000,
Thinking: anthropic.ThinkingConfigParamOfEnabled(10000),
Tools: []anthropic.ToolUnionParam{weatherTool},
Messages: []anthropic.MessageParam{
anthropic.NewUserMessage(anthropic.NewTextBlock("What is the weather in Paris?")),
},
})
if err != nil {
log.Fatal(err)
}
var toolUseBlock anthropic.ToolUseBlock
for _, block := range response.Content {
switch v := block.AsAny().(type) {
case anthropic.ToolUseBlock:
toolUseBlock = v
}
}
weatherData := map[string]int{"temperature": 88}
continuation, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
Model: anthropic.ModelClaudeSonnet4_6,
MaxTokens: 16000,
Thinking: anthropic.ThinkingConfigParamOfEnabled(10000),
Tools: []anthropic.ToolUnionParam{weatherTool},
Messages: []anthropic.MessageParam{
anthropic.NewUserMessage(anthropic.NewTextBlock("What is the weather in Paris?")),
response.ToParam(),
anthropic.NewUserMessage(
anthropic.NewToolResultBlock(toolUseBlock.ID, fmt.Sprintf("Current temperature: %d°F", weatherData["temperature"]), false),
),
},
})
if err != nil {
log.Fatal(err)
}
fmt.Println(continuation)
}
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.messages.ContentBlockParam;
import com.anthropic.models.messages.MessageCreateParams;
import com.anthropic.models.messages.Message;
import com.anthropic.models.messages.Model;
import com.anthropic.models.messages.Tool;
import com.anthropic.models.messages.ToolResultBlockParam;
import com.anthropic.models.messages.ToolUseBlock;
import com.anthropic.models.messages.ToolUseBlockParam;
import com.anthropic.models.messages.ThinkingBlock;
import com.anthropic.models.messages.ThinkingBlockParam;
import com.anthropic.core.JsonValue;
import java.util.List;
import java.util.Map;
void main() {
AnthropicClient client = AnthropicOkHttpClient.fromEnv();
Tool weatherTool = Tool.builder()
.name("get_weather")
.description("Get current weather for a location")
.inputSchema(Tool.InputSchema.builder()
.properties(JsonValue.from(Map.of(
"location", Map.of("type", "string", "description", "City name")
)))
.required(List.of("location"))
.build())
.build();
MessageCreateParams initialParams = MessageCreateParams.builder()
.model(Model.CLAUDE_SONNET_4_6)
.maxTokens(16000L)
.enabledThinking(10000L)
.addTool(weatherTool)
.addUserMessage("What is the weather in Paris?")
.build();
Message response = client.messages().create(initialParams);
ThinkingBlock thinkingBlock = null;
ToolUseBlock toolUseBlock = null;
for (var block : response.content()) {
if (block.thinking().isPresent()) {
thinkingBlock = block.thinking().get();
}
if (block.toolUse().isPresent()) {
toolUseBlock = block.toolUse().get();
}
}
int temperature = 88;
// Second request: pass back thinking block and tool result
MessageCreateParams continuationParams = MessageCreateParams.builder()
.model(Model.CLAUDE_SONNET_4_6)
.maxTokens(16000L)
.enabledThinking(10000L)
.addTool(weatherTool)
.addUserMessage("What is the weather in Paris?")
.addAssistantMessageOfBlockParams(List.of(
ContentBlockParam.ofThinking(ThinkingBlockParam.builder()
.thinking(thinkingBlock.thinking())
.signature(thinkingBlock.signature())
.build()),
ContentBlockParam.ofToolUse(ToolUseBlockParam.builder()
.id(toolUseBlock.id())
.name(toolUseBlock.name())
.input(toolUseBlock._input())
.build())
))
.addUserMessageOfBlockParams(List.of(
ContentBlockParam.ofToolResult(
ToolResultBlockParam.builder()
.toolUseId(toolUseBlock.id())
.content("Current temperature: " + temperature + "°F")
.build()
)
))
.build();
Message continuation = client.messages().create(continuationParams);
IO.println(continuation);
}
<?php
use Anthropic\Client;
$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));
$weatherTool = [
'name' => 'get_weather',
'description' => 'Get current weather for a location',
'input_schema' => [
'type' => 'object',
'properties' => [
'location' => [
'type' => 'string',
'description' => 'City name'
]
],
'required' => ['location']
]
];
$response = $client->messages->create(
maxTokens: 16000,
messages: [
['role' => 'user', 'content' => 'What is the weather in Paris?']
],
model: 'claude-sonnet-4-6',
thinking: ['type' => 'enabled', 'budget_tokens' => 10000],
tools: [$weatherTool],
);
$thinkingBlock = null;
$toolUseBlock = null;
foreach ($response->content as $block) {
if ($block->type === 'thinking') {
$thinkingBlock = $block;
}
if ($block->type === 'tool_use') {
$toolUseBlock = $block;
}
}
$weatherData = ['temperature' => 88];
$continuation = $client->messages->create(
maxTokens: 16000,
messages: [
['role' => 'user', 'content' => 'What is the weather in Paris?'],
['role' => 'assistant', 'content' => [$thinkingBlock, $toolUseBlock]],
['role' => 'user', 'content' => [
[
'type' => 'tool_result',
'tool_use_id' => $toolUseBlock->id,
'content' => "Current temperature: {$weatherData['temperature']}°F"
]
]]
],
model: 'claude-sonnet-4-6',
thinking: ['type' => 'enabled', 'budget_tokens' => 10000],
tools: [$weatherTool],
);
echo $continuation;
require "anthropic"
client = Anthropic::Client.new
weather_tool = {
name: "get_weather",
description: "Get current weather for a location",
input_schema: {
type: "object",
properties: {
location: { type: "string", description: "City name" }
},
required: ["location"]
}
}
response = client.messages.create(
model: "claude-sonnet-4-6",
max_tokens: 16000,
thinking: {
type: "enabled",
budget_tokens: 10000
},
tools: [weather_tool],
messages: [
{ role: "user", content: "What is the weather in Paris?" }
]
)
thinking_block = response.content.find { |block| block.type == :thinking }
tool_use_block = response.content.find { |block| block.type == :tool_use }
raise "No tool_use block found" unless tool_use_block
weather_data = { temperature: 88 }
continuation = client.messages.create(
model: "claude-sonnet-4-6",
max_tokens: 16000,
thinking: {
type: "enabled",
budget_tokens: 10000
},
tools: [weather_tool],
messages: [
{ role: "user", content: "What is the weather in Paris?" },
{ role: "assistant", content: [thinking_block, tool_use_block] },
{ role: "user", content: [
{
type: "tool_result",
tool_use_id: tool_use_block.id,
content: "Current temperature: #{weather_data[:temperature]}°F"
}
] }
]
)
puts continuation
The API response now contains only text:
{
"content": [
{
"type": "text",
"text": "Currently in Paris, the temperature is 88°F (31°C)"
}
]
}
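Preserving thinking blocks
During tool use, you must pass thinking blocks back to the API, and you must pass back the complete, unmodified block. This is critical for maintaining the model's reasoning flow and conversation integrity.
While you can omit thinking blocks from prior assistant role turns, always pass all thinking blocks back to the API for any multi-turn conversation. The API will:
- Automatically filter the provided thinking blocks
- Use the relevant thinking blocks needed to preserve the model's reasoning
- Bill input tokens only for the blocks shown to Claude
Which blocks are kept depends on the model. See thinking block preservation by model for the defaults of each model class. To override the default, use the clear_thinking_20251015 context editing strategy.
When switching thinking modes during a conversation, remember that the entire assistant turn (including tool use loops) must run in a single thinking mode. For more details, see switching thinking modes in conversations.
When Claude calls a tool, it pauses construction of its response to wait for external information. When the tool result is returned, Claude continues building that existing response. This makes preserving thinking blocks during tool use necessary, for two reasons:
- Reasoning continuity: Thinking blocks capture the step-by-step reasoning that led Claude to the tool request. When you post tool results, including the original thinking ensures Claude can continue its reasoning from where it left off.
- Context maintenance: Although tool results appear as user messages in the API structure, they are part of a continuous reasoning flow. Preserving thinking blocks maintains this conceptual flow across multiple API calls. For more information on context management, see the context windows guide.
Important: When providing thinking blocks, the entire sequence of consecutive thinking blocks must match the output the model generated during the original request; you cannot rearrange or modify the sequence of these blocks.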
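The pass-back rule can be sketched as a small helper. This is a minimal illustration, assuming content blocks are plain dicts shaped like the JSON responses in this guide; `build_tool_followup` and its parameters are hypothetical names and not part of any SDK.

```python
import copy


def build_tool_followup(history, assistant_content, tool_outputs):
    """Return a new message list that continues a tool-use turn.

    history: prior messages (list of dicts).
    assistant_content: the content list from the previous response,
        passed back as-is so thinking blocks and signatures stay intact.
    tool_outputs: maps each tool_use id to its result string.
    """
    tool_results = []
    for block in assistant_content:
        if block["type"] != "tool_use":
            continue
        if block["id"] not in tool_outputs:
            raise KeyError(f"missing result for tool_use id {block['id']}")
        tool_results.append(
            {
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": tool_outputs[block["id"]],
            }
        )
    if not tool_results:
        raise ValueError("assistant content has no tool_use block")

    # Copy instead of editing: the blocks go back unmodified and in order.
    return history + [
        {"role": "assistant", "content": copy.deepcopy(assistant_content)},
        {"role": "user", "content": tool_results},
    ]


# Placeholder blocks; a real signature comes from the API response.
assistant_content = [
    {"type": "thinking", "thinking": "I should call get_weather.", "signature": "sig-placeholder"},
    {"type": "tool_use", "id": "toolu_example", "name": "get_weather", "input": {"location": "Paris"}},
]
messages = build_tool_followup(
    [{"role": "user", "content": "What's the weather in Paris?"}],
    assistant_content,
    {"toolu_example": "Current temperature: 88°F"},
)
print([m["role"] for m in messages])  # ['user', 'assistant', 'user']
```

The helper never filters or reorders the assistant content, which is the property the API checks for; only the tool results are newly constructed.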
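Interleaved thinking
Extended thinking with tool use in Claude 4 models supports interleaved thinking, which lets Claude think between tool calls and reason in more depth after receiving tool results.
With interleaved thinking, Claude can:
- Reason about the results of a tool call before deciding what to do next
- Chain multiple tool calls with reasoning steps in between
- Make more nuanced decisions based on intermediate results
Model support:
- Claude Mythos Preview: Interleaved thinking happens automatically. Every inter-tool reasoning step goes into a thinking block rather than plain text, and thinking blocks are preserved across turns by default. No beta header is needed or supported.
- Claude Opus 4.7: Interleaved thinking is enabled automatically when using adaptive thinking (the only thinking mode supported on Opus 4.7). No beta header is needed.
- Claude Opus 4.6: Interleaved thinking is enabled automatically when using adaptive thinking. No beta header is needed. The interleaved-thinking-2025-05-14 beta header is deprecated on Opus 4.6 and is safely ignored if included.
- Claude Sonnet 4.6: Interleaved thinking is enabled automatically when using adaptive thinking (recommended). The interleaved-thinking-2025-05-14 beta header with manual extended thinking (thinking: {type: "enabled"}) still works but is deprecated.
- Other Claude 4 models (Opus 4.5, Opus 4.1, Opus 4 (deprecated), Sonnet 4.5, Sonnet 4 (deprecated)): Add the beta header interleaved-thinking-2025-05-14 to your API request to enable interleaved thinking.
Here are some important considerations for interleaved thinking:
- With interleaved thinking, budget_tokens can exceed the max_tokens parameter, because it represents the total budget across all thinking blocks within one assistant turn.
- Interleaved thinking is only supported for tools used through the Messages API.
- The Claude API and Claude Platform on AWS accept interleaved-thinking-2025-05-14 in requests to any model without returning an error. On models that do not support interleaved thinking, the header is ignored. On Claude Opus 4.7 and Claude Opus 4.6, it is deprecated and safely ignored. On Claude Mythos Preview, it is not needed and is safely ignored.
- On partner-operated platforms (for example, Amazon Bedrock and Vertex AI), if you pass interleaved-thinking-2025-05-14 to any model other than Claude Opus 4.7, Claude Opus 4.6, Claude Sonnet 4.6, Claude Opus 4.5, Claude Opus 4.1, Opus 4 (deprecated), Sonnet 4.5, or Sonnet 4 (deprecated), your request fails.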
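The model support rules can be sketched as a helper that picks the thinking configuration and any beta header. This is a rough sketch under stated assumptions: `interleaved_thinking_options` is a hypothetical name, only the three model IDs named in this guide are recognized, Claude Mythos Preview is not handled, and every other model is assumed to be an older Claude 4 model that needs the beta header.

```python
BETA_HEADER = "interleaved-thinking-2025-05-14"

# Model IDs this guide lists as getting interleaved thinking automatically
# when adaptive thinking is used.
ADAPTIVE_MODELS = ("claude-opus-4-7", "claude-opus-4-6", "claude-sonnet-4-6")


def interleaved_thinking_options(model, budget_tokens=10000):
    """Return (thinking, extra_headers) for a tool-use request.

    Assumption: any model not in ADAPTIVE_MODELS is an older Claude 4 model
    that supports interleaved thinking through the beta header.
    """
    if model.startswith(ADAPTIVE_MODELS):
        return {"type": "adaptive"}, {}
    return (
        {"type": "enabled", "budget_tokens": budget_tokens},
        {"anthropic-beta": BETA_HEADER},
    )


thinking, headers = interleaved_thinking_options("claude-sonnet-4-6")
print(thinking, headers)
```

With the Python SDK, the two values could be passed as `thinking=thinking` and `extra_headers=headers` on `client.messages.create`. On partner-operated platforms the fallback branch is risky, because sending the header to an unsupported model fails the request.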
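Tool use without interleaved thinking
Without interleaved thinking, Claude thinks once at the start of the assistant turn. Subsequent responses after tool results continue without new thinking blocks.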
User: "What's the total revenue if we sold 150 units at $50 each,
       and how does this compare to our average monthly revenue?"
Turn 1: [thinking] "I need to calculate 150 * $50, then check the database..."
        [tool_use: calculator] { "expression": "150 * 50" }
  ↓ tool result: "7500"
Turn 2: [tool_use: database_query] { "query": "SELECT AVG(revenue)..." }
        ↑ no thinking block
  ↓ tool result: "5200"
Turn 3: [text] "The total revenue is $7,500, which is 44% above your
        average monthly revenue of $5,200."
        ↑ no thinking block
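Tool use with interleaved thinking
With interleaved thinking enabled, Claude can think after receiving each tool result, which lets it reason about intermediate results before continuing.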
User: "What's the total revenue if we sold 150 units at $50 each,
       and how does this compare to our average monthly revenue?"
Turn 1: [thinking] "I need to calculate 150 * $50 first..."
        [tool_use: calculator] { "expression": "150 * 50" }
  ↓ tool result: "7500"
Turn 2: [thinking] "Got $7,500. Now I should query the database to compare..."
        [tool_use: database_query] { "query": "SELECT AVG(revenue)..." }
        ↑ thinking after receiving calculator result
  ↓ tool result: "5200"
Turn 3: [thinking] "$7,500 vs $5,200 average - that's a 44% increase..."
        [text] "The total revenue is $7,500, which is 44% above your
        average monthly revenue of $5,200."
        ↑ thinking before final answer
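The interleaved flow can be driven by a loop that echoes each assistant response back unmodified before posting tool results. The sketch below is illustrative only: `run_tool_loop` and its `create` callable are hypothetical, responses are modeled as plain dicts (the SDKs return objects), and a scripted stand-in replaces the API so the example runs offline.

```python
def run_tool_loop(create, tools_impl, user_prompt, max_steps=10):
    """Drive one assistant turn to completion across tool calls.

    create: callable taking a message list and returning a dict with
        "content" (list of block dicts) and "stop_reason".
    tools_impl: maps tool names to callables that take the tool input.
    """
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        response = create(messages)
        # Echo the assistant content back unmodified, thinking blocks included.
        messages.append({"role": "assistant", "content": response["content"]})
        if response["stop_reason"] != "tool_use":
            return messages
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": str(tools_impl[block["name"]](block["input"])),
            }
            for block in response["content"]
            if block["type"] == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_steps")


# Scripted stand-in for the API, replaying the interleaved example.
scripted = iter([
    {"stop_reason": "tool_use", "content": [
        {"type": "thinking", "thinking": "Calculate 150 * 50 first.", "signature": "sig-1"},
        {"type": "tool_use", "id": "toolu_1", "name": "calculator",
         "input": {"expression": "150 * 50"}},
    ]},
    {"stop_reason": "tool_use", "content": [
        {"type": "thinking", "thinking": "Got 7500. Now query the average.", "signature": "sig-2"},
        {"type": "tool_use", "id": "toolu_2", "name": "database_query",
         "input": {"query": "SELECT AVG(revenue)..."}},
    ]},
    {"stop_reason": "end_turn", "content": [
        {"type": "thinking", "thinking": "7500 vs 5200 is about 44% higher.", "signature": "sig-3"},
        {"type": "text", "text": "The total revenue is $7,500."},
    ]},
])
tools_impl = {
    "calculator": lambda tool_input: 7500,      # canned result
    "database_query": lambda tool_input: 5200,  # canned result
}
messages = run_tool_loop(lambda msgs: next(scripted), tools_impl, "What's the total revenue?")
thinking_count = sum(
    1
    for m in messages
    if m["role"] == "assistant"
    for b in m["content"]
    if b["type"] == "thinking"
)
print(len(messages), thinking_count)  # 6 3
```

Without interleaved thinking the same loop applies; the later responses would simply contain no thinking blocks.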
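Extended thinking with prompt caching
Prompt caching with thinking has several important considerations:
Extended thinking tasks often take longer than 5 minutes to complete. Consider using the 1-hour cache duration to keep cache hits across longer thinking sessions and multi-step workflows.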
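As a hedged sketch, a cached system block with the longer duration might be built as follows. The `ttl` field and its `"1h"` value are an assumption written from memory of the prompt caching documentation and do not appear in this guide; `cached_system_block` is a hypothetical helper.

```python
def cached_system_block(text, one_hour=True):
    """Build a system text block that ends with a cache breakpoint.

    Assumption: cache_control accepts a "ttl" of "1h" for the 1-hour
    cache duration; verify against the prompt caching reference.
    """
    cache_control = {"type": "ephemeral"}
    if one_hour:
        cache_control["ttl"] = "1h"
    return {"type": "text", "text": text, "cache_control": cache_control}


block = cached_system_block("Long reference text goes here.")
print(block["cache_control"])
```

The resulting dict would take the place of the `cache_control: {type: ephemeral}` system block used in the examples in this section.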
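Thinking block context removal
- On earlier Opus/Sonnet models and all Haiku models, thinking blocks from previous turns are removed from context, which can affect cache breakpoints. On Opus 4.5+ and Sonnet 4.6+, they are kept by default.
- When continuing a conversation with tool use, thinking blocks are cached and count as input tokens when read from the cache
- This creates a tradeoff: although thinking blocks do not visibly consume context window space, they still count toward your input token usage when cached
- If thinking is disabled and you pass thinking content in the current tool use turn, the thinking content is stripped and thinking stays disabled for that request
Cache invalidation patterns
Understanding thinking block caching behavior
When using extended thinking with tool use, thinking blocks exhibit specific caching behavior that affects token counts:
How it works:
- Caching only occurs when you make a subsequent request that includes tool results
- When the subsequent request is made, the previous conversation history (including thinking blocks) can be cached
- These cached thinking blocks count as input tokens in your usage metrics when read from the cache
- When a non-tool-result user block is included: on Opus 4.5+ and Sonnet 4.6+, previous thinking blocks are kept; on earlier Opus/Sonnet models and all Haiku models, all previous thinking blocks are ignored and stripped from context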
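To see how cached thinking blocks show up in usage metrics, the usage counters from each response can be summarized. This is a minimal sketch, assuming the usage object has been converted to a dict with the keys `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens`; `summarize_cache_usage` is a hypothetical helper and the numbers are made up.

```python
def summarize_cache_usage(usage):
    """Summarize prompt-cache activity from a usage mapping.

    Missing or None counters are treated as zero. The total is the sum of
    uncached input, cache writes, and cache reads.
    """
    uncached = usage.get("input_tokens") or 0
    written = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    total = uncached + written + read
    return {
        "total_input_tokens": total,
        "cache_read_tokens": read,
        "cache_write_tokens": written,
        "cache_hit_ratio": read / total if total else 0.0,
    }


# Illustrative numbers only, not real API output.
summary = summarize_cache_usage(
    {"input_tokens": 50, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 1950}
)
print(summary["cache_hit_ratio"])  # 0.975
```

Comparing this summary across consecutive requests makes it visible when a change such as a different thinking budget has invalidated the message cache.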
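Detailed example flow:
Request 1:
User: "What's the weather in Paris?"
Response 1:
[thinking_block_1] + [tool_use block 1]
Request 2:
User: ["What's the weather in Paris?"],
Assistant: [thinking_block_1] + [tool_use block 1],
User: [tool_result_1, cache=True]
Response 2:
[thinking_block_2] + [text block 2]
Request 2 writes a cache of the request content (not the response). The cache includes the original user message, the first thinking block, the tool use block, and the tool result.
Request 3:
User: ["What's the weather in Paris?"],
Assistant: [thinking_block_1] + [tool_use block 1],
User: [tool_result_1, cache=True],
Assistant: [thinking_block_2] + [text block 2],
User: [Text response, cache=True]
For Opus 4.5+ and Sonnet 4.6+, all previous thinking blocks are kept by default. For earlier Opus/Sonnet models and all Haiku models, because a non-tool-result user block was included, all previous thinking blocks are ignored and stripped from context. On those models, this request is processed as:
User: ["What's the weather in Paris?"],
Assistant: [tool_use block 1],
User: [tool_result_1, cache=True],
Assistant: [text block 2],
User: [Text response, cache=True]
Key points:
- This caching behavior happens automatically, even without explicit cache_control markers
- This behavior is consistent whether you use regular thinking or interleaved thinking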
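The stripping described for earlier Opus/Sonnet models and all Haiku models can be modeled locally to reason about what ends up in context. This sketch only illustrates the rule as stated in this section and is not the API's implementation; `strip_prior_thinking` is a hypothetical name, and messages are plain dicts.

```python
def strip_prior_thinking(messages):
    """Model the rule: a non-tool-result user block drops earlier thinking.

    If the final user message contains only tool_result blocks, the request
    is still inside a tool-use loop and nothing is stripped.
    """
    last = messages[-1]
    content = last["content"]
    only_tool_results = isinstance(content, list) and all(
        block.get("type") == "tool_result" for block in content
    )
    if last["role"] != "user" or only_tool_results:
        return messages

    stripped = []
    for message in messages:
        if message["role"] == "assistant" and isinstance(message["content"], list):
            kept = [b for b in message["content"] if b.get("type") != "thinking"]
            message = {**message, "content": kept}
        stripped.append(message)
    return stripped


# Request 3 from the example flow, with placeholder block contents.
request_3 = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "content": [
        {"type": "thinking", "thinking": "...", "signature": "sig-1"},
        {"type": "tool_use", "id": "toolu_1", "name": "get_weather",
         "input": {"location": "Paris"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_1", "content": "88°F"},
    ]},
    {"role": "assistant", "content": [
        {"type": "thinking", "thinking": "...", "signature": "sig-2"},
        {"type": "text", "text": "It is 88°F in Paris."},
    ]},
    {"role": "user", "content": "Thanks. What about tomorrow?"},
]
for message in strip_prior_thinking(request_3):
    if message["role"] == "assistant":
        print([block["type"] for block in message["content"]])
```

Client code does not need to perform this stripping; always send the full history and let the API filter it.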
System prompt caching (preserved when thinking changes)
# Fetch ~10 kB of Pride and Prejudice for the cached system block
curl -s https://www.gutenberg.org/cache/epub/1342/pg1342.txt \
  | head -c 10000 > pride.txt
# Emit a request body for the given thinking budget. Once CONTENT1
# is populated (after the first turn), the assistant reply and a
# follow-up user message are appended so the conversation grows.
build_body() {
  cat <<YAML
model: claude-sonnet-4-6
max_tokens: 20000
thinking:
  type: enabled
  budget_tokens: $1
system:
  - type: text
    text: >-
      You are an AI assistant that is tasked with literary analysis.
      Analyze the following text carefully.
  - type: text
    text: "@./pride.txt"
    cache_control:
      type: ephemeral
messages:
  - role: user
    content: Analyze the tone of this passage.
YAML
  if [[ -n "${CONTENT1:-}" ]]; then
    printf '  - role: assistant\n    content: %s\n' "$CONTENT1"
    printf '  - role: user\n'
    printf '    content: Analyze the characters in this passage.\n'
  fi
}
# First request (budget 4000): establishes the cache. Capture usage
# and content as two jsonl lines so the reply can be fed forward.
printf 'First request - establishing cache\n'
{
  read -r USAGE1
  read -r CONTENT1
} < <(build_body 4000 \
  | ant messages create --transform '[usage,content]' --format jsonl)
printf 'First response usage: %s\n' "$USAGE1"
# Second request: same budget, system-prompt cache hit expected.
printf '\nSecond request - same thinking parameters (cache hit expected)\n'
USAGE2=$(build_body 4000 \
  | ant messages create --transform usage --format jsonl)
printf 'Second response usage: %s\n' "$USAGE2"
# Third request: budget changed to 8000. The cached system prompt
# still hits; only message-block caching is invalidated.
printf '\nThird request - different thinking parameters (cache miss for messages)\n'
USAGE3=$(build_body 8000 \
  | ant messages create --transform usage --format jsonl)
printf 'Third response usage: %s\n' "$USAGE3"
from anthropic import Anthropic
import requests
from bs4 import BeautifulSoup
client = Anthropic()
def fetch_article_content(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    # Remove script and style elements
    for script in soup(["script", "style"]):
        script.decompose()
    # Get text
    text = soup.get_text()
    # Break into lines and remove leading and trailing space on each
    lines = (line.strip() for line in text.splitlines())
    # Break multi-headlines (separated by a double space) into a line each
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # Drop blank lines
    text = "\n".join(chunk for chunk in chunks if chunk)
    return text
# Fetch the content of the article
book_url = "https://www.gutenberg.org/cache/epub/1342/pg1342.txt"
book_content = fetch_article_content(book_url)
# Use just enough text for caching (first few chapters)
LARGE_TEXT = book_content[:10000]
SYSTEM_PROMPT = [
    {
        "type": "text",
        "text": "You are an AI assistant that is tasked with literary analysis. Analyze the following text carefully.",
    },
    {"type": "text", "text": LARGE_TEXT, "cache_control": {"type": "ephemeral"}},
]
MESSAGES = [{"role": "user", "content": "Analyze the tone of this passage."}]
# First request - establish cache
print("First request - establishing cache")
response1 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=20000,
    thinking={"type": "enabled", "budget_tokens": 4000},
    system=SYSTEM_PROMPT,
    messages=MESSAGES,
)
print(f"First response usage: {response1.usage}")
MESSAGES.append({"role": "assistant", "content": response1.content})
MESSAGES.append({"role": "user", "content": "Analyze the characters in this passage."})
# Second request - same thinking parameters (cache hit expected)
print("\nSecond request - same thinking parameters (cache hit expected)")
response2 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=20000,
    thinking={"type": "enabled", "budget_tokens": 4000},
    system=SYSTEM_PROMPT,
    messages=MESSAGES,
)
print(f"Second response usage: {response2.usage}")
# Third request - different thinking parameters (cache miss for messages)
print("\nThird request - different thinking parameters (cache miss for messages)")
response3 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=20000,
    thinking={
        "type": "enabled",
        "budget_tokens": 8000,  # Changed thinking budget
    },
    system=SYSTEM_PROMPT,  # System prompt remains cached
    messages=MESSAGES,  # Messages cache is invalidated
)
print(f"Third response usage: {response3.usage}")
import Anthropic from "@anthropic-ai/sdk";
import axios from "axios";
import * as cheerio from "cheerio";
const client = new Anthropic();
async function fetchArticleContent(url: string): Promise<string> {
const response = await axios.get(url);
const $ = cheerio.load(response.data);
$("script, style").remove();
let text = $.text();
const lines = text.split("\n").map((line) => line.trim());
text = lines.filter((line) => line.length > 0).join("\n");
return text;
}
const bookUrl = "https://www.gutenberg.org/cache/epub/1342/pg1342.txt";
const bookContent = await fetchArticleContent(bookUrl);
const LARGE_TEXT = bookContent.slice(0, 10000);
const SYSTEM_PROMPT: Anthropic.TextBlockParam[] = [
{
type: "text",
text: "You are an AI assistant that is tasked with literary analysis. Analyze the following text carefully."
},
{
type: "text",
text: LARGE_TEXT,
cache_control: { type: "ephemeral" }
}
];
const messages: Anthropic.MessageParam[] = [
{ role: "user", content: "Analyze the tone of this passage." }
];
// First request - establish cache
console.log("First request - establishing cache");
const response1 = await client.messages.create({
model: "claude-sonnet-4-6",
max_tokens: 20000,
thinking: { type: "enabled", budget_tokens: 4000 },
system: SYSTEM_PROMPT,
messages
});
console.log(`First response usage: ${JSON.stringify(response1.usage)}`);
messages.push({
role: "assistant",
content: response1.content
});
messages.push({
role: "user",
content: "Analyze the characters in this passage."
});
// Second request - same thinking parameters (cache hit expected)
console.log("\nSecond request - same thinking parameters (cache hit expected)");
const response2 = await client.messages.create({
model: "claude-sonnet-4-6",
max_tokens: 20000,
thinking: { type: "enabled", budget_tokens: 4000 },
system: SYSTEM_PROMPT,
messages
});
console.log(`Second response usage: ${JSON.stringify(response2.usage)}`);
// Third request - different thinking parameters (cache miss for messages)
console.log("\nThird request - different thinking parameters (cache miss for messages)");
const response3 = await client.messages.create({
model: "claude-sonnet-4-6",
max_tokens: 20000,
thinking: { type: "enabled", budget_tokens: 8000 },
system: SYSTEM_PROMPT,
messages
});
console.log(`Third response usage: ${JSON.stringify(response3.usage)}`);
using System.Net.Http;
using Anthropic;
using Anthropic.Models.Messages;
AnthropicClient client = new();
// Fetch book content
using var httpClient = new HttpClient();
var bookContent = await httpClient.GetStringAsync("https://www.gutenberg.org/cache/epub/1342/pg1342.txt");
var largeText = bookContent.Substring(0, Math.Min(10000, bookContent.Length));
var systemPrompt = new MessageCreateParamsSystem(new List<TextBlockParam>
{
new TextBlockParam()
{
Text = "You are an AI assistant that is tasked with literary analysis. Analyze the following text carefully."
},
new TextBlockParam()
{
Text = largeText,
CacheControl = new CacheControlEphemeral(),
},
});
var messages = new List<MessageParam>
{
new() { Role = Role.User, Content = "Analyze the tone of this passage." }
};
// First request - establish cache
Console.WriteLine("First request - establishing cache");
var parameters1 = new MessageCreateParams
{
Model = Model.ClaudeSonnet4_6,
MaxTokens = 20000,
Thinking = new ThinkingConfigEnabled(budgetTokens: 4000),
System = systemPrompt,
Messages = messages
};
var response1 = await client.Messages.Create(parameters1);
Console.WriteLine($"First response usage: {response1.Usage}");
messages.Add(new() { Role = Role.Assistant, Content = response1.Content.Select(block => new ContentBlockParam(block.Json)).ToList() });
messages.Add(new() { Role = Role.User, Content = "Analyze the characters in this passage." });
// Second request - same thinking parameters (cache hit expected)
Console.WriteLine("\nSecond request - same thinking parameters (cache hit expected)");
var parameters2 = new MessageCreateParams
{
Model = Model.ClaudeSonnet4_6,
MaxTokens = 20000,
Thinking = new ThinkingConfigEnabled(budgetTokens: 4000),
System = systemPrompt,
Messages = messages
};
var response2 = await client.Messages.Create(parameters2);
Console.WriteLine($"Second response usage: {response2.Usage}");
// Third request - different thinking parameters (cache miss for messages)
Console.WriteLine("\nThird request - different thinking parameters (cache miss for messages)");
var parameters3 = new MessageCreateParams
{
Model = Model.ClaudeSonnet4_6,
MaxTokens = 20000,
Thinking = new ThinkingConfigEnabled(budgetTokens: 8000),
System = systemPrompt,
Messages = messages
};
var response3 = await client.Messages.Create(parameters3);
Console.WriteLine($"Third response usage: {response3.Usage}");
package main
import (
"context"
"fmt"
"io"
"log"
"net/http"
"github.com/anthropics/anthropic-sdk-go"
)
func main() {
client := anthropic.NewClient()
// Fetch book content
resp, err := http.Get("https://www.gutenberg.org/cache/epub/1342/pg1342.txt")
if err != nil {
log.Fatal(err)
}
defer resp.Body.Close()
body, err := io.ReadAll(resp.Body)
if err != nil {
log.Fatal(err)
}
largeText := string(body)
if len(largeText) > 10000 {
largeText = largeText[:10000]
}
systemPrompt := []anthropic.TextBlockParam{
{Text: "You are an AI assistant that is tasked with literary analysis. Analyze the following text carefully."},
{
Text: largeText,
CacheControl: anthropic.NewCacheControlEphemeralParam(),
},
}
messages := []anthropic.MessageParam{
anthropic.NewUserMessage(anthropic.NewTextBlock("Analyze the tone of this passage.")),
}
// First request - establish cache
fmt.Println("First request - establishing cache")
response1, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
Model: anthropic.ModelClaudeSonnet4_6,
MaxTokens: 20000,
Thinking: anthropic.ThinkingConfigParamOfEnabled(4000),
System: systemPrompt,
Messages: messages,
})
if err != nil {
log.Fatal(err)
}
fmt.Printf("First response usage: %s\n", response1.Usage.RawJSON())
messages = append(messages, response1.ToParam())
messages = append(messages, anthropic.NewUserMessage(anthropic.NewTextBlock("Analyze the characters in this passage.")))
// Second request - same thinking parameters (cache hit expected)
fmt.Println("\nSecond request - same thinking parameters (cache hit expected)")
response2, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
Model: anthropic.ModelClaudeSonnet4_6,
MaxTokens: 20000,
Thinking: anthropic.ThinkingConfigParamOfEnabled(4000),
System: systemPrompt,
Messages: messages,
})
if err != nil {
log.Fatal(err)
}
fmt.Printf("Second response usage: %s\n", response2.Usage.RawJSON())
// Third request - different thinking parameters (cache miss for messages)
fmt.Println("\nThird request - different thinking parameters (cache miss for messages)")
response3, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
Model: anthropic.ModelClaudeSonnet4_6,
MaxTokens: 20000,
Thinking: anthropic.ThinkingConfigParamOfEnabled(8000),
System: systemPrompt,
Messages: messages,
})
if err != nil {
log.Fatal(err)
}
fmt.Printf("Third response usage: %s\n", response3.Usage.RawJSON())
}
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.messages.CacheControlEphemeral;
import com.anthropic.models.messages.MessageCreateParams;
import com.anthropic.models.messages.Message;
import com.anthropic.models.messages.Model;
import com.anthropic.models.messages.TextBlockParam;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
void main() throws Exception {
AnthropicClient client = AnthropicOkHttpClient.fromEnv();
// Fetch book content
HttpClient httpClient = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder()
.uri(URI.create("https://www.gutenberg.org/cache/epub/1342/pg1342.txt"))
.build();
HttpResponse<String> response = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
String bookContent = response.body();
String largeText = bookContent.substring(0, Math.min(10000, bookContent.length()));
List<TextBlockParam> systemPrompt = List.of(
TextBlockParam.builder()
.text("You are an AI assistant that is tasked with literary analysis. Analyze the following text carefully.")
.build(),
TextBlockParam.builder()
.text(largeText)
.cacheControl(CacheControlEphemeral.builder().build())
.build()
);
// First request - establish cache
IO.println("First request - establishing cache");
MessageCreateParams params1 = MessageCreateParams.builder()
.model(Model.CLAUDE_SONNET_4_6)
.maxTokens(20000L)
.enabledThinking(4000L)
.systemOfTextBlockParams(systemPrompt)
.addUserMessage("Analyze the tone of this passage.")
.build();
Message response1 = client.messages().create(params1);
IO.println("First response usage: " + response1.usage());
// Second request - same thinking parameters (cache hit expected)
IO.println("\nSecond request - same thinking parameters (cache hit expected)");
MessageCreateParams params2 = MessageCreateParams.builder()
.model(Model.CLAUDE_SONNET_4_6)
.maxTokens(20000L)
.enabledThinking(4000L)
.systemOfTextBlockParams(systemPrompt)
.addUserMessage("Analyze the tone of this passage.")
.addAssistantMessageOfBlockParams(response1.content().stream()
.map(block -> block.toParam())
.collect(java.util.stream.Collectors.toList()))
.addUserMessage("Analyze the characters in this passage.")
.build();
Message response2 = client.messages().create(params2);
IO.println("Second response usage: " + response2.usage());
// Third request - different thinking parameters (cache miss for messages)
IO.println("\nThird request - different thinking parameters (cache miss for messages)");
MessageCreateParams params3 = MessageCreateParams.builder()
.model(Model.CLAUDE_SONNET_4_6)
.maxTokens(20000L)
.enabledThinking(8000L)
.systemOfTextBlockParams(systemPrompt)
.addUserMessage("Analyze the tone of this passage.")
.addAssistantMessageOfBlockParams(response1.content().stream()
.map(block -> block.toParam())
.collect(java.util.stream.Collectors.toList()))
.addUserMessage("Analyze the characters in this passage.")
.build();
Message response3 = client.messages().create(params3);
IO.println("Third response usage: " + response3.usage());
}
<?php
use Anthropic\Client;
$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));
// Fetch book content
$bookContent = file_get_contents("https://www.gutenberg.org/cache/epub/1342/pg1342.txt");
$largeText = substr($bookContent, 0, 10000);
$systemPrompt = [
[
'type' => 'text',
'text' => 'You are an AI assistant that is tasked with literary analysis. Analyze the following text carefully.'
],
[
'type' => 'text',
'text' => $largeText,
'cache_control' => ['type' => 'ephemeral']
]
];
$messages = [
['role' => 'user', 'content' => 'Analyze the tone of this passage.']
];
// First request - establish cache
echo "First request - establishing cache\n";
$response1 = $client->messages->create(
maxTokens: 20000,
messages: $messages,
model: 'claude-sonnet-4-6',
system: $systemPrompt,
thinking: ['type' => 'enabled', 'budget_tokens' => 4000],
);
echo "First response usage: " . json_encode($response1->usage) . "\n";
$messages[] = ['role' => 'assistant', 'content' => $response1->content];
$messages[] = ['role' => 'user', 'content' => 'Analyze the characters in this passage.'];
// Second request - same thinking parameters (cache hit expected)
echo "\nSecond request - same thinking parameters (cache hit expected)\n";
$response2 = $client->messages->create(
maxTokens: 20000,
messages: $messages,
model: 'claude-sonnet-4-6',
system: $systemPrompt,
thinking: ['type' => 'enabled', 'budget_tokens' => 4000],
);
echo "Second response usage: " . json_encode($response2->usage) . "\n";
// Third request - different thinking parameters (cache miss for messages)
echo "\nThird request - different thinking parameters (cache miss for messages)\n";
$response3 = $client->messages->create(
maxTokens: 20000,
messages: $messages,
model: 'claude-sonnet-4-6',
system: $systemPrompt,
thinking: ['type' => 'enabled', 'budget_tokens' => 8000],
);
echo "Third response usage: " . json_encode($response3->usage) . "\n";
require "anthropic"
require "net/http"
require "uri"
client = Anthropic::Client.new
# Fetch book content
uri = URI("https://www.gutenberg.org/cache/epub/1342/pg1342.txt")
response = Net::HTTP.get_response(uri)
book_content = response.body
large_text = book_content[0...10000]
system_prompt = [
{
type: "text",
text: "You are an AI assistant that is tasked with literary analysis. Analyze the following text carefully."
},
{
type: "text",
text: large_text,
cache_control: { type: "ephemeral" }
}
]
messages = [
{ role: "user", content: "Analyze the tone of this passage." }
]
# First request - establish cache
puts "First request - establishing cache"
response1 = client.messages.create(
model: "claude-sonnet-4-6",
max_tokens: 20000,
thinking: {
type: "enabled",
budget_tokens: 4000
},
system: system_prompt,
messages: messages
)
puts "First response usage: #{response1.usage}"
messages << { role: "assistant", content: response1.content }
messages << { role: "user", content: "Analyze the characters in this passage." }
# Second request - same thinking parameters (cache hit expected)
puts "\nSecond request - same thinking parameters (cache hit expected)"
response2 = client.messages.create(
model: "claude-sonnet-4-6",
max_tokens: 20000,
thinking: {
type: "enabled",
budget_tokens: 4000
},
system: system_prompt,
messages: messages
)
puts "Second response usage: #{response2.usage}"
# Third request - different thinking parameters (cache miss for messages)
puts "\nThird request - different thinking parameters (cache miss for messages)"
response3 = client.messages.create(
model: "claude-sonnet-4-6",
max_tokens: 20000,
thinking: {
type: "enabled",
budget_tokens: 8000
},
system: system_prompt,
messages: messages
)
puts "Third response usage: #{response3.usage}"
Messages caching (invalidated when thinking changes)
# Fetch the first ~10 kB of Pride and Prejudice for the cached prefix
curl -sL 'https://www.gutenberg.org/cache/epub/1342/pg1342.txt' \
| head -c 10000 > book.txt
# Call 1: thinking budget 4000, writes the cache
USAGE=$(ant messages create \
--model claude-sonnet-4-6 --max-tokens 20000 \
--transform usage <<'YAML'
thinking:
  type: enabled
  budget_tokens: 4000
messages:
  - role: user
    content:
      - type: text
        text: "@./book.txt"
        cache_control:
          type: ephemeral
      - type: text
        text: "Give a one-sentence summary of this passage."
YAML
)
printf 'Call 1 (budget 4000):\n%s\n\n' "$USAGE"
# Call 2: same budget, conversation extended; expect cache HIT
USAGE=$(ant messages create \
--model claude-sonnet-4-6 --max-tokens 20000 \
--transform usage <<'YAML'
thinking:
  type: enabled
  budget_tokens: 4000
messages:
  - role: user
    content:
      - type: text
        text: "@./book.txt"
        cache_control:
          type: ephemeral
      - type: text
        text: "Give a one-sentence summary of this passage."
  - role: assistant
    content: "It opens Pride and Prejudice with the Bennet family."
  - role: user
    content: "Who is the protagonist?"
YAML
)
printf 'Call 2 (budget 4000):\n%s\n\n' "$USAGE"
# Call 3: budget changed to 8000; cache MISS even though prefix is identical
USAGE=$(ant messages create \
--model claude-sonnet-4-6 --max-tokens 20000 \
--transform usage <<'YAML'
thinking:
  type: enabled
  budget_tokens: 8000
messages:
  - role: user
    content:
      - type: text
        text: "@./book.txt"
        cache_control:
          type: ephemeral
      - type: text
        text: "Give a one-sentence summary of this passage."
  - role: assistant
    content: "It opens Pride and Prejudice with the Bennet family."
  - role: user
    content: "Who is the protagonist?"
  - role: assistant
    content: "Elizabeth Bennet is the protagonist."
  - role: user
    content: "What era is the story set in?"
YAML
)
printf 'Call 3 (budget 8000):\n%s\n' "$USAGE"
from anthropic import Anthropic
import requests
from bs4 import BeautifulSoup
client = Anthropic()
def fetch_article_content(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    # Remove script and style elements
    for script in soup(["script", "style"]):
        script.decompose()
    # Get text
    text = soup.get_text()
    # Break into lines and remove leading and trailing space on each
    lines = (line.strip() for line in text.splitlines())
    # Break multi-headlines into a line each (split on double spaces, not on every word)
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # Drop blank lines
    text = "\n".join(chunk for chunk in chunks if chunk)
    return text
# Fetch the content of the article
book_url = "https://www.gutenberg.org/cache/epub/1342/pg1342.txt"
book_content = fetch_article_content(book_url)
# Use just enough text for caching (first few chapters)
LARGE_TEXT = book_content[:10000]
# No system prompt - caching in messages instead
MESSAGES = [
{
"role": "user",
"content": [
{
"type": "text",
"text": LARGE_TEXT,
"cache_control": {"type": "ephemeral"},
},
{"type": "text", "text": "Analyze the tone of this passage."},
],
}
]
# First request - establish cache
print("First request - establishing cache")
response1 = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=20000,
thinking={"type": "enabled", "budget_tokens": 4000},
messages=MESSAGES,
)
print(f"First response usage: {response1.usage}")
MESSAGES.append({"role": "assistant", "content": response1.content})
MESSAGES.append({"role": "user", "content": "Analyze the characters in this passage."})
# Second request - same thinking parameters (cache hit expected)
print("\nSecond request - same thinking parameters (cache hit expected)")
response2 = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=20000,
thinking={
"type": "enabled",
"budget_tokens": 4000, # Same thinking budget
},
messages=MESSAGES,
)
print(f"Second response usage: {response2.usage}")
MESSAGES.append({"role": "assistant", "content": response2.content})
MESSAGES.append({"role": "user", "content": "Analyze the setting in this passage."})
# Third request - different thinking budget (cache miss expected)
print("\nThird request - different thinking budget (cache miss expected)")
response3 = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=20000,
thinking={
"type": "enabled",
"budget_tokens": 8000, # Different thinking budget breaks cache
},
messages=MESSAGES,
)
print(f"Third response usage: {response3.usage}")
import Anthropic from "@anthropic-ai/sdk";
import axios from "axios";
import * as cheerio from "cheerio";
const client = new Anthropic();
async function fetchArticleContent(url: string): Promise<string> {
  const response = await axios.get(url);
  const $ = cheerio.load(response.data);
  // Remove script and style elements
  $("script, style").remove();
  // Get text
  let text = $.text();
  // Clean up text (break into lines, remove whitespace)
  const lines = text.split("\n").map((line) => line.trim());
  // Split on double spaces so multi-headline lines break apart without splitting every word
  const chunks = lines.flatMap((line) => line.split("  ").map((phrase) => phrase.trim()));
  text = chunks.filter((chunk) => chunk).join("\n");
  return text;
}
const bookUrl = "https://www.gutenberg.org/cache/epub/1342/pg1342.txt";
const bookContent = await fetchArticleContent(bookUrl);
const LARGE_TEXT = bookContent.substring(0, 10000);
// No system prompt - caching in messages instead
const messages: Anthropic.MessageParam[] = [
{
role: "user",
content: [
{
type: "text",
text: LARGE_TEXT,
cache_control: { type: "ephemeral" }
},
{
type: "text",
text: "Analyze the tone of this passage."
}
]
}
];
// First request - establish cache
console.log("First request - establishing cache");
const response1 = await client.messages.create({
model: "claude-sonnet-4-6",
max_tokens: 20000,
thinking: { type: "enabled", budget_tokens: 4000 },
messages
});
console.log("First response usage: ", response1.usage);
messages.push(
{ role: "assistant", content: response1.content },
{ role: "user", content: "Analyze the characters in this passage." }
);
// Second request - same thinking parameters (cache hit expected)
console.log("\nSecond request - same thinking parameters (cache hit expected)");
const response2 = await client.messages.create({
model: "claude-sonnet-4-6",
max_tokens: 20000,
thinking: { type: "enabled", budget_tokens: 4000 },
messages
});
console.log("Second response usage: ", response2.usage);
messages.push(
{ role: "assistant", content: response2.content },
{ role: "user", content: "Analyze the setting in this passage." }
);
// Third request - different thinking budget (cache miss expected)
console.log("\nThird request - different thinking budget (cache miss expected)");
const response3 = await client.messages.create({
model: "claude-sonnet-4-6",
max_tokens: 20000,
thinking: { type: "enabled", budget_tokens: 8000 },
messages
});
console.log("Third response usage: ", response3.usage);
using System.Net.Http;
using Anthropic;
using Anthropic.Models.Messages;
AnthropicClient client = new();
string bookUrl = "https://www.gutenberg.org/cache/epub/1342/pg1342.txt";
string bookContent = await FetchArticleContent(bookUrl);
string largeText = bookContent.Substring(0, Math.Min(10000, bookContent.Length));
Console.WriteLine("First request - establishing cache");
var parameters1 = new MessageCreateParams
{
Model = Model.ClaudeSonnet4_6,
MaxTokens = 20000,
Thinking = new ThinkingConfigEnabled(budgetTokens: 4000),
Messages =
[
new()
{
Role = Role.User,
Content = new MessageParamContent(new List<ContentBlockParam>
{
new ContentBlockParam(new TextBlockParam()
{
Text = largeText,
CacheControl = new CacheControlEphemeral(),
}),
new ContentBlockParam(new TextBlockParam()
{
Text = "Analyze the tone of this passage."
}),
})
}
]
};
var response1 = await client.Messages.Create(parameters1);
Console.WriteLine($"First response usage: {response1.Usage}");
Console.WriteLine("\nSecond request - same thinking parameters (cache hit expected)");
var parameters2 = new MessageCreateParams
{
Model = Model.ClaudeSonnet4_6,
MaxTokens = 20000,
Thinking = new ThinkingConfigEnabled(budgetTokens: 4000),
Messages =
[
new()
{
Role = Role.User,
Content = new MessageParamContent(new List<ContentBlockParam>
{
new ContentBlockParam(new TextBlockParam()
{
Text = largeText,
CacheControl = new CacheControlEphemeral(),
}),
new ContentBlockParam(new TextBlockParam()
{
Text = "Analyze the tone of this passage."
}),
})
},
new()
{
Role = Role.Assistant,
Content = response1.Content.Select(block => new ContentBlockParam(block.Json)).ToList()
},
new()
{
Role = Role.User,
Content = "Analyze the characters in this passage."
}
]
};
var response2 = await client.Messages.Create(parameters2);
Console.WriteLine($"Second response usage: {response2.Usage}");
Console.WriteLine("\nThird request - different thinking budget (cache miss expected)");
var parameters3 = new MessageCreateParams
{
Model = Model.ClaudeSonnet4_6,
MaxTokens = 20000,
Thinking = new ThinkingConfigEnabled(budgetTokens: 8000),
Messages =
[
new()
{
Role = Role.User,
Content = new MessageParamContent(new List<ContentBlockParam>
{
new ContentBlockParam(new TextBlockParam()
{
Text = largeText,
CacheControl = new CacheControlEphemeral(),
}),
new ContentBlockParam(new TextBlockParam()
{
Text = "Analyze the tone of this passage."
}),
})
},
new()
{
Role = Role.Assistant,
Content = response1.Content.Select(block => new ContentBlockParam(block.Json)).ToList()
},
new()
{
Role = Role.User,
Content = "Analyze the characters in this passage."
},
new()
{
Role = Role.Assistant,
Content = response2.Content.Select(block => new ContentBlockParam(block.Json)).ToList()
},
new()
{
Role = Role.User,
Content = "Analyze the setting in this passage."
}
]
};
var response3 = await client.Messages.Create(parameters3);
Console.WriteLine($"Third response usage: {response3.Usage}");
static async Task<string> FetchArticleContent(string url)
{
using HttpClient httpClient = new();
string content = await httpClient.GetStringAsync(url);
return content;
}
package main
import (
"context"
"fmt"
"io"
"log"
"net/http"
"strings"
"github.com/anthropics/anthropic-sdk-go"
)
func fetchArticleContent(url string) (string, error) {
resp, err := http.Get(url)
if err != nil {
return "", err
}
defer resp.Body.Close()
body, err := io.ReadAll(resp.Body)
if err != nil {
return "", err
}
text := string(body)
lines := strings.Split(text, "\n")
var cleanedLines []string
for _, line := range lines {
trimmed := strings.TrimSpace(line)
if trimmed != "" {
cleanedLines = append(cleanedLines, trimmed)
}
}
return strings.Join(cleanedLines, "\n"), nil
}
func main() {
client := anthropic.NewClient()
bookURL := "https://www.gutenberg.org/cache/epub/1342/pg1342.txt"
bookContent, err := fetchArticleContent(bookURL)
if err != nil {
log.Fatal(err)
}
largeText := bookContent
if len(largeText) > 10000 {
largeText = largeText[:10000]
}
// No system prompt - caching in messages instead
messages := []anthropic.MessageParam{
anthropic.NewUserMessage(
anthropic.ContentBlockParamUnion{OfText: &anthropic.TextBlockParam{
Text: largeText,
CacheControl: anthropic.NewCacheControlEphemeralParam(),
}},
anthropic.NewTextBlock("Analyze the tone of this passage."),
),
}
// First request - establish cache
fmt.Println("First request - establishing cache")
response1, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
Model: anthropic.ModelClaudeSonnet4_6,
MaxTokens: 20000,
Thinking: anthropic.ThinkingConfigParamOfEnabled(4000),
Messages: messages,
})
if err != nil {
log.Fatal(err)
}
fmt.Printf("First response usage: %s\n", response1.Usage.RawJSON())
messages = append(messages, response1.ToParam())
messages = append(messages, anthropic.NewUserMessage(anthropic.NewTextBlock("Analyze the characters in this passage.")))
// Second request - same thinking parameters (cache hit expected)
fmt.Println("\nSecond request - same thinking parameters (cache hit expected)")
response2, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
Model: anthropic.ModelClaudeSonnet4_6,
MaxTokens: 20000,
Thinking: anthropic.ThinkingConfigParamOfEnabled(4000),
Messages: messages,
})
if err != nil {
log.Fatal(err)
}
fmt.Printf("Second response usage: %s\n", response2.Usage.RawJSON())
messages = append(messages, response2.ToParam())
messages = append(messages, anthropic.NewUserMessage(anthropic.NewTextBlock("Analyze the setting in this passage.")))
// Third request - different thinking budget (cache miss expected)
fmt.Println("\nThird request - different thinking budget (cache miss expected)")
response3, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
Model: anthropic.ModelClaudeSonnet4_6,
MaxTokens: 20000,
Thinking: anthropic.ThinkingConfigParamOfEnabled(8000),
Messages: messages,
})
if err != nil {
log.Fatal(err)
}
fmt.Printf("Third response usage: %s\n", response3.Usage.RawJSON())
}
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.messages.CacheControlEphemeral;
import com.anthropic.models.messages.ContentBlockParam;
import com.anthropic.models.messages.MessageCreateParams;
import com.anthropic.models.messages.Message;
import com.anthropic.models.messages.Model;
import com.anthropic.models.messages.TextBlockParam;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
void main() throws Exception {
AnthropicClient client = AnthropicOkHttpClient.fromEnv();
String bookUrl = "https://www.gutenberg.org/cache/epub/1342/pg1342.txt";
String bookContent = fetchArticleContent(bookUrl);
String largeText = bookContent.substring(0, Math.min(10000, bookContent.length()));
// First request - establishing cache
IO.println("First request - establishing cache");
MessageCreateParams params1 = MessageCreateParams.builder()
.model(Model.CLAUDE_SONNET_4_6)
.maxTokens(20000L)
.enabledThinking(4000L)
.addUserMessageOfBlockParams(List.of(
ContentBlockParam.ofText(TextBlockParam.builder()
.text(largeText)
.cacheControl(CacheControlEphemeral.builder().build())
.build()),
ContentBlockParam.ofText(TextBlockParam.builder()
.text("Analyze the tone of this passage.")
.build())
))
.build();
Message response1 = client.messages().create(params1);
IO.println("First response usage: " + response1.usage());
// Second request - same thinking parameters (cache hit expected)
IO.println("\nSecond request - same thinking parameters (cache hit expected)");
MessageCreateParams params2 = MessageCreateParams.builder()
.model(Model.CLAUDE_SONNET_4_6)
.maxTokens(20000L)
.enabledThinking(4000L)
.addUserMessageOfBlockParams(List.of(
ContentBlockParam.ofText(TextBlockParam.builder()
.text(largeText)
.cacheControl(CacheControlEphemeral.builder().build())
.build()),
ContentBlockParam.ofText(TextBlockParam.builder()
.text("Analyze the tone of this passage.")
.build())
))
.addAssistantMessageOfBlockParams(response1.content().stream()
.map(block -> block.toParam())
.collect(java.util.stream.Collectors.toList()))
.addUserMessage("Analyze the characters in this passage.")
.build();
Message response2 = client.messages().create(params2);
IO.println("Second response usage: " + response2.usage());
// Third request - different thinking budget (cache miss expected)
IO.println("\nThird request - different thinking budget (cache miss expected)");
MessageCreateParams params3 = MessageCreateParams.builder()
.model(Model.CLAUDE_SONNET_4_6)
.maxTokens(20000L)
.enabledThinking(8000L)
.addUserMessageOfBlockParams(List.of(
ContentBlockParam.ofText(TextBlockParam.builder()
.text(largeText)
.cacheControl(CacheControlEphemeral.builder().build())
.build()),
ContentBlockParam.ofText(TextBlockParam.builder()
.text("Analyze the tone of this passage.")
.build())
))
.addAssistantMessageOfBlockParams(response1.content().stream()
.map(block -> block.toParam())
.collect(java.util.stream.Collectors.toList()))
.addUserMessage("Analyze the characters in this passage.")
.addAssistantMessageOfBlockParams(response2.content().stream()
.map(block -> block.toParam())
.collect(java.util.stream.Collectors.toList()))
.addUserMessage("Analyze the setting in this passage.")
.build();
Message response3 = client.messages().create(params3);
IO.println("Third response usage: " + response3.usage());
}
String fetchArticleContent(String url) throws Exception {
HttpClient client = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder()
.uri(URI.create(url))
.build();
HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
return response.body();
}
<?php
use Anthropic\Client;
function fetchArticleContent($url) {
$content = file_get_contents($url);
$lines = explode("\n", $content);
$cleanedLines = array_filter(array_map('trim', $lines));
return implode("\n", $cleanedLines);
}
$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));
$bookUrl = "https://www.gutenberg.org/cache/epub/1342/pg1342.txt";
$bookContent = fetchArticleContent($bookUrl);
$largeText = substr($bookContent, 0, 10000);
echo "First request - establishing cache\n";
$response1 = $client->messages->create(
maxTokens: 20000,
messages: [[
'role' => 'user',
'content' => [
[
'type' => 'text',
'text' => $largeText,
'cache_control' => ['type' => 'ephemeral']
],
[
'type' => 'text',
'text' => 'Analyze the tone of this passage.'
]
]
]],
model: 'claude-sonnet-4-6',
thinking: ['type' => 'enabled', 'budget_tokens' => 4000],
);
echo "First response usage: " . json_encode($response1->usage) . "\n";
echo "\nSecond request - same thinking parameters (cache hit expected)\n";
$response2 = $client->messages->create(
maxTokens: 20000,
messages: [
[
'role' => 'user',
'content' => [
[
'type' => 'text',
'text' => $largeText,
'cache_control' => ['type' => 'ephemeral']
],
[
'type' => 'text',
'text' => 'Analyze the tone of this passage.'
]
]
],
[
'role' => 'assistant',
'content' => $response1->content
],
[
'role' => 'user',
'content' => 'Analyze the characters in this passage.'
]
],
model: 'claude-sonnet-4-6',
thinking: ['type' => 'enabled', 'budget_tokens' => 4000],
);
echo "Second response usage: " . json_encode($response2->usage) . "\n";
echo "\nThird request - different thinking budget (cache miss expected)\n";
$response3 = $client->messages->create(
maxTokens: 20000,
messages: [
[
'role' => 'user',
'content' => [
[
'type' => 'text',
'text' => $largeText,
'cache_control' => ['type' => 'ephemeral']
],
[
'type' => 'text',
'text' => 'Analyze the tone of this passage.'
]
]
],
[
'role' => 'assistant',
'content' => $response1->content
],
[
'role' => 'user',
'content' => 'Analyze the characters in this passage.'
],
[
'role' => 'assistant',
'content' => $response2->content
],
[
'role' => 'user',
'content' => 'Analyze the setting in this passage.'
]
],
model: 'claude-sonnet-4-6',
thinking: ['type' => 'enabled', 'budget_tokens' => 8000],
);
echo "Third response usage: " . json_encode($response3->usage) . "\n";
require "anthropic"
require "net/http"
require "uri"
def fetch_article_content(url)
uri = URI.parse(url)
response = Net::HTTP.get_response(uri)
text = response.body
lines = text.split("\n").map(&:strip)
lines.reject(&:empty?).join("\n")
end
client = Anthropic::Client.new
book_url = "https://www.gutenberg.org/cache/epub/1342/pg1342.txt"
book_content = fetch_article_content(book_url)
large_text = book_content[0...10000]
puts "First request - establishing cache"
response1 = client.messages.create(
model: "claude-sonnet-4-6",
max_tokens: 20000,
thinking: {
type: "enabled",
budget_tokens: 4000
},
messages: [{
role: "user",
content: [
{
type: "text",
text: large_text,
cache_control: { type: "ephemeral" }
},
{
type: "text",
text: "Analyze the tone of this passage."
}
]
}]
)
puts "First response usage: #{response1.usage}"
puts "\nSecond request - same thinking parameters (cache hit expected)"
response2 = client.messages.create(
model: "claude-sonnet-4-6",
max_tokens: 20000,
thinking: {
type: "enabled",
budget_tokens: 4000
},
messages: [
{
role: "user",
content: [
{
type: "text",
text: large_text,
cache_control: { type: "ephemeral" }
},
{
type: "text",
text: "Analyze the tone of this passage."
}
]
},
{
role: "assistant",
content: response1.content
},
{
role: "user",
content: "Analyze the characters in this passage."
}
]
)
puts "Second response usage: #{response2.usage}"
puts "\nThird request - different thinking budget (cache miss expected)"
response3 = client.messages.create(
model: "claude-sonnet-4-6",
max_tokens: 20000,
thinking: {
type: "enabled",
budget_tokens: 8000
},
messages: [
{
role: "user",
content: [
{
type: "text",
text: large_text,
cache_control: { type: "ephemeral" }
},
{
type: "text",
text: "Analyze the tone of this passage."
}
]
},
{
role: "assistant",
content: response1.content
},
{
role: "user",
content: "Analyze the characters in this passage."
},
{
role: "assistant",
content: response2.content
},
{
role: "user",
content: "Analyze the setting in this passage."
}
]
)
puts "Third response usage: #{response3.usage}"
Here is the output of the script (you may see slightly different numbers):
First request - establishing cache
First response usage: { cache_creation_input_tokens: 1370, cache_read_input_tokens: 0, input_tokens: 17, output_tokens: 700 }
Second request - same thinking parameters (cache hit expected)
Second response usage: { cache_creation_input_tokens: 0, cache_read_input_tokens: 1370, input_tokens: 303, output_tokens: 874 }
Third request - different thinking budget (cache miss expected)
Third response usage: { cache_creation_input_tokens: 1370, cache_read_input_tokens: 0, input_tokens: 747, output_tokens: 619 }
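This example demonstrates that when caching is set up in the messages array, changing the thinking parameters (budget_tokens increased from 4000 to 8000) invalidates the cache. The third request shows no cache hit, with cache_creation_input_tokens=1370 and cache_read_input_tokens=0, which shows that message-based caching is invalidated when thinking parameters change.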
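To spot this pattern without reading raw usage output, a small helper can label each response. This is a minimal sketch that assumes the usage record is available as a plain dict with the two cache fields shown in the output above; `classify_cache_usage` is a hypothetical name, not part of any SDK.

```python
def classify_cache_usage(usage: dict) -> str:
    """Label a usage record as a cache 'hit', a cache 'write', or 'none'."""
    read = usage.get("cache_read_input_tokens") or 0
    created = usage.get("cache_creation_input_tokens") or 0
    if read > 0:
        # At least part of the prompt prefix was served from the cache.
        return "hit"
    if created > 0:
        # Nothing was read, but a new cache entry was written.
        return "write"
    return "none"


# Usage records copied from the sample output above.
runs = [
    {"cache_creation_input_tokens": 1370, "cache_read_input_tokens": 0},
    {"cache_creation_input_tokens": 0, "cache_read_input_tokens": 1370},
    {"cache_creation_input_tokens": 1370, "cache_read_input_tokens": 0},
]
labels = [classify_cache_usage(u) for u in runs]
print(labels)  # ['write', 'hit', 'write']
```

The third label is "write" rather than "hit": the budget change forced the same prefix to be cached again.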
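Max tokens and context window size with extended thinking
max_tokens (which includes your thinking budget when thinking is enabled) is enforced as a strict limit. On Claude 4.5 and newer models, the API accepts the request even if input tokens plus max_tokens exceeds the context window size. If generation then reaches the context window limit, it stops with stop_reason: "model_context_window_exceeded". On earlier models, the API returns a validation error. See Handling stop reasons.
For a more thorough treatment, see the context windows guide.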
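The rule above can be expressed as a small pre-flight check. This is a sketch only: `precheck_request` is a hypothetical helper, the caller supplies whether the target model accepts overflowing requests, and the 200,000-token window used in the example is an assumed value rather than a documented one.

```python
def precheck_request(input_tokens: int, max_tokens: int,
                     context_window: int, accepts_overflow: bool) -> str:
    """Predict how a request is treated under the rules described above."""
    if input_tokens + max_tokens <= context_window:
        return "fits"
    if accepts_overflow:
        # Claude 4.5 and newer: the request is accepted, but generation may
        # stop with stop_reason "model_context_window_exceeded".
        return "accepted_may_stop_at_context_limit"
    # Earlier models: the API returns a validation error.
    return "validation_error"


CONTEXT_WINDOW = 200_000  # assumed size for illustration; check your model's limit

print(precheck_request(190_000, 20_000, CONTEXT_WINDOW, accepts_overflow=True))
print(precheck_request(190_000, 20_000, CONTEXT_WINDOW, accepts_overflow=False))
print(precheck_request(150_000, 20_000, CONTEXT_WINDOW, accepts_overflow=False))
```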
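The context window with extended thinking
When calculating context window usage with thinking enabled, keep the following in mind:
- On Opus 4.5+ and Sonnet 4.6+, thinking blocks from previous turns are retained and count toward the context window; on earlier Opus/Sonnet models and all Haiku models, they are stripped and do not count
- Current-turn thinking counts toward the max_tokens limit for that turn
The diagram below illustrates the specialized token management when extended thinking is enabled:
The effective context window is calculated as: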
context window =
(current input tokens - previous thinking tokens) +
(thinking tokens + encrypted thinking tokens + text output tokens)
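Use the token counting API to get accurate token counts for your specific use case, especially when working with multi-turn conversations that include thinking.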
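As a rough worked example of the formula above, the sketch below plugs in illustrative numbers. The function name and the token counts are hypothetical, and the subtraction of previous thinking tokens reflects the case where earlier thinking blocks are stripped from context; real counts should come from the token counting API.

```python
def effective_context_tokens(current_input_tokens: int,
                             previous_thinking_tokens: int,
                             thinking_tokens: int,
                             encrypted_thinking_tokens: int,
                             text_output_tokens: int) -> int:
    """Apply the context window formula for extended thinking without tools."""
    input_side = current_input_tokens - previous_thinking_tokens
    output_side = thinking_tokens + encrypted_thinking_tokens + text_output_tokens
    return input_side + output_side


# Illustrative numbers only: 12,000 input tokens, of which 3,000 are thinking
# from earlier turns, plus 4,000 new thinking tokens and 1,000 text tokens.
total = effective_context_tokens(
    current_input_tokens=12_000,
    previous_thinking_tokens=3_000,
    thinking_tokens=4_000,
    encrypted_thinking_tokens=0,
    text_output_tokens=1_000,
)
print(total)  # 14000
```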
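The context window with extended thinking and tool use
When using extended thinking with tool use, thinking blocks must be explicitly preserved and returned with the tool results.
The effective context window calculation for extended thinking with tool use becomes: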
context window =
(current input tokens + previous thinking tokens + tool use tokens) +
(thinking tokens + encrypted thinking tokens + text output tokens)
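The diagram below illustrates token management for extended thinking with tool use:
Managing tokens with extended thinking
Given the context window and max_tokens behavior with extended thinking, you may need to:
- Monitor and manage your token usage more actively
- Adjust max_tokens values as your prompt length changes
- Potentially use the token counting endpoint more frequently
- Be aware of whether previous thinking blocks accumulate in your context window: they do on Opus 4.5+ and Sonnet 4.6+, and they do not on earlier Opus/Sonnet models and all Haiku models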
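Thinking encryption
Full thinking content is encrypted and returned in the signature field. This field is used to verify that thinking blocks were generated by Claude when they are passed back to the API.
Here are some important considerations on thinking encryption:
- When streaming responses, the signature is added via a signature_delta inside a content_block_delta event just before the content_block_stop event.
- signature values are significantly longer in Claude 4 models than in previous models.
- The signature field is an opaque field and should not be interpreted or parsed.
- signature values are compatible across platforms (Claude API, Amazon Bedrock, and Vertex AI). Values generated on one platform will be compatible with another.
Redacted thinking blocks
In addition to regular thinking blocks, the API may return redacted_thinking blocks. A redacted_thinking block contains encrypted thinking content in a data field, with no readable summary: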
{
"type": "redacted_thinking",
"data": "..."
}
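The data field is opaque and encrypted. As with the signature field on regular thinking blocks, you should pass redacted_thinking blocks back to the API unmodified when continuing a multi-turn conversation with tool use.
If your code filters content blocks by type when round-tripping responses with tool use (for example, block.type == "thinking"), include redacted_thinking blocks as well. Filtering only on block.type == "thinking" silently drops redacted_thinking blocks and breaks the multi-turn protocol described above.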
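A type filter that respects this rule might look like the following. This is a sketch that models content blocks as plain dicts for illustration (SDK response objects expose the same information through a `type` attribute); `thinking_blocks_to_return` is a hypothetical helper name.

```python
# Both block types must survive the round trip unchanged.
THINKING_BLOCK_TYPES = {"thinking", "redacted_thinking"}


def thinking_blocks_to_return(content: list[dict]) -> list[dict]:
    """Select every thinking-related block, without modifying any of them."""
    return [block for block in content if block.get("type") in THINKING_BLOCK_TYPES]


assistant_content = [
    {"type": "thinking", "thinking": "Let me check...", "signature": "abc"},
    {"type": "redacted_thinking", "data": "opaque-encrypted-payload"},
    {"type": "text", "text": "Here is the answer."},
]

kept = thinking_blocks_to_return(assistant_content)
print([block["type"] for block in kept])  # ['thinking', 'redacted_thinking']
```

The helper returns the original block objects rather than copies, so the signature and data fields reach the API exactly as received.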
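redacted_thinking blocks are a separate content block type that the API returns when part of the thinking has been redacted for safety. This differs from the display: "omitted" option, which returns regular thinking blocks with an empty thinking field.
Differences in thinking across model versions
The Messages API handles thinking differently across Claude model versions. The following table gives a condensed comparison:
| Feature | Claude 4 models (pre-Opus 4.5) | Claude Opus 4.5 | Claude Sonnet 4.6 | Claude Opus 4.6 (adaptive thinking) | Claude Opus 4.7 (adaptive thinking) | Claude Mythos Preview (adaptive thinking) |
|---|---|---|---|---|---|---|
| Thinking output | Returns summarized thinking | Returns summarized thinking | Returns summarized thinking | Returns summarized thinking | Omitted by default; set display: "summarized" to receive summarized thinking | Omitted by default; set display: "summarized" to receive summarized thinking. Raw thinking tokens are never returned. |
| Interleaved thinking | Supported with the interleaved-thinking-2025-05-14 beta header | Supported with the interleaved-thinking-2025-05-14 beta header | Supported with the interleaved-thinking-2025-05-14 beta header, or enabled automatically with adaptive thinking | Enabled automatically with adaptive thinking (the beta header is deprecated and safely ignored) | Enabled automatically with adaptive thinking (the beta header is deprecated and safely ignored) | Enabled automatically with adaptive thinking (the beta header is not required and safely ignored). Inter-tool reasoning moves into thinking blocks on this model. |
| Thinking block preservation | Not preserved across turns | Preserved by default | Preserved by default | Preserved by default | Preserved by default | Preserved by default. Blocks are stripped when the conversation continues on a model that does not support the Mythos thinking format. |
Thinking block preservation by model
Whether thinking blocks from previous assistant turns are kept in context by default depends on the model class:
- Opus: Claude Opus 4.5 and later Opus models keep all prior thinking blocks; Claude Opus 4.1 and earlier Opus models keep only the last assistant turn's thinking.
- Sonnet: Claude Sonnet 4.6 and later Sonnet models keep all; Claude Sonnet 4.5 and earlier Sonnet models keep only the last turn.
- Haiku: All Haiku models through Claude Haiku 4.5 keep only the last turn.
- Claude Mythos Preview also keeps all prior thinking blocks.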
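The retention rules above can be captured as a small lookup. This is a sketch under stated assumptions: the function name is hypothetical, the caller passes the model family and version explicitly instead of parsing a model ID, and Haiku versions newer than 4.5 are rejected because the text does not cover them.

```python
def retains_all_thinking(family: str, version: tuple[int, int]) -> bool:
    """Return True if the model keeps all prior thinking blocks by default."""
    family = family.lower()
    if family == "opus":
        return version >= (4, 5)
    if family == "sonnet":
        return version >= (4, 6)
    if family == "haiku":
        if version <= (4, 5):
            return False  # keeps only the last assistant turn's thinking
        raise ValueError("Haiku models after 4.5 are not covered by these rules")
    if family == "mythos":
        return True
    raise ValueError(f"unknown model family: {family}")


print(retains_all_thinking("opus", (4, 6)))    # True
print(retains_all_thinking("sonnet", (4, 5)))  # False
print(retains_all_thinking("haiku", (4, 5)))   # False
```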
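Benefits of thinking block preservation:
- Cache optimization: When using tool use, preserved thinking blocks enable cache hits because they are passed back with tool results and cached incrementally across the assistant turn, which saves tokens in multi-step workflows
- No intelligence impact: Preserving thinking blocks has no negative effect on model performance
Important considerations:
- Context usage: Long conversations consume more context space because thinking blocks are retained in context
- Automatic behavior: This is the default behavior for each model listed above. No code changes or beta headers are required
- Backward compatibility: To take advantage of this feature, continue passing complete, unmodified thinking blocks back to the API, as you would for tool use
For earlier models (Claude Sonnet 4.5, Opus 4.1, and so on), thinking blocks from previous turns continue to be removed from context. The existing behavior described in the Extended thinking with prompt caching section applies to those models.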
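Pricing
For complete pricing information, including base rates, cache writes, cache hits, and output tokens, see the pricing page.
The thinking process incurs charges for:
- Tokens used during thinking (output tokens)
- Thinking blocks from previous assistant turns that are kept in context: earlier Opus/Sonnet models and all Haiku models keep only the last turn; Opus 4.5+ and Sonnet 4.6+ keep all turns by default (input tokens)
- Standard text output tokens
When extended thinking is enabled, a specialized system prompt is automatically included to support this feature.
When using summarized thinking:
- Input tokens: Tokens in your original request (excludes thinking tokens from previous turns)
- Output tokens (billed): The original thinking tokens that Claude generated internally
- Output tokens (visible): The summarized thinking tokens you see in the response
- No charge: Tokens used to generate the summary
When using display: "omitted":
- Input tokens: Tokens in your original request (same as summarized)
- Output tokens (billed): The original thinking tokens that Claude generated internally (same as summarized)
- Output tokens (visible): Zero thinking tokens (the thinking field is empty)
The billed output token count will not match the token count visible in the response. You are billed for the full thinking process, not for the thinking content visible in the response.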
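The gap between billed and visible output can be made concrete with a small model of one turn. This is an illustrative sketch: the `ThinkingTurn` type and all token counts are hypothetical, and no prices are included because rates belong on the pricing page.

```python
from dataclasses import dataclass


@dataclass
class ThinkingTurn:
    raw_thinking_tokens: int      # generated internally; these are billed
    visible_thinking_tokens: int  # summary tokens in the response (0 when omitted)
    text_tokens: int              # standard text output


def billed_output_tokens(turn: ThinkingTurn) -> int:
    """Billing follows the full thinking process plus the text output."""
    return turn.raw_thinking_tokens + turn.text_tokens


def visible_output_tokens(turn: ThinkingTurn) -> int:
    """What can be counted by inspecting the response content."""
    return turn.visible_thinking_tokens + turn.text_tokens


summarized = ThinkingTurn(raw_thinking_tokens=3_000, visible_thinking_tokens=400, text_tokens=600)
omitted = ThinkingTurn(raw_thinking_tokens=3_000, visible_thinking_tokens=0, text_tokens=600)

for name, turn in (("summarized", summarized), ("omitted", omitted)):
    print(name, billed_output_tokens(turn), visible_output_tokens(turn))
```

Both turns bill the same 3,600 output tokens even though the visible totals differ.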
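Best practices and considerations for extended thinking
Working with thinking budgets
- Budget optimization: The minimum budget is 1,024 tokens. Start at the minimum and increase the thinking budget incrementally to find the optimal range for your use case. Higher token counts enable more comprehensive reasoning, with diminishing returns depending on the task. Increasing the budget can improve response quality at the cost of higher latency. For critical tasks, test different settings to find the best balance. Note that the thinking budget is a target rather than a strict limit. Actual token usage may vary by task.
- Starting point: For complex tasks, start with a larger thinking budget (16k+ tokens) and adjust as needed.
- Large budgets: For thinking budgets above 32k, use batch processing to avoid networking issues. Requests that push the model to think above 32k tokens cause long-running requests that may run into system timeouts and open-connection limits.
- Token usage tracking: Monitor thinking token usage to optimize cost and performance.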
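The budget guidance above could be turned into a quick review step before sending a request. This is a sketch with assumptions labeled: `review_thinking_budget` is a hypothetical helper, "32k" is read as 32,000 tokens, and the check that the budget stays below max_tokens is inferred from max_tokens including the thinking budget.

```python
MIN_BUDGET_TOKENS = 1_024
LARGE_BUDGET_TOKENS = 32_000  # assumption: "32k" read as 32,000 tokens


def review_thinking_budget(budget_tokens: int, max_tokens: int) -> list[str]:
    """Return notes about a thinking budget; an empty list means no concerns."""
    notes = []
    if budget_tokens < MIN_BUDGET_TOKENS:
        notes.append("error: budget is below the 1,024-token minimum")
    if budget_tokens >= max_tokens:
        # Assumption: max_tokens must leave room for the answer after thinking.
        notes.append("error: budget leaves no room for the answer within max_tokens")
    if budget_tokens > LARGE_BUDGET_TOKENS:
        notes.append("advice: consider batch processing for budgets above 32k")
    return notes


print(review_thinking_budget(4_000, 20_000))   # []
print(review_thinking_budget(512, 20_000))
print(review_thinking_budget(40_000, 64_000))
```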
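Performance considerations
- Response times: Be prepared for longer response times due to the additional processing. Generating thinking blocks increases overall response time.
- Streaming requirement: The SDKs require streaming when max_tokens is greater than 21,333 to avoid HTTP timeouts on long-running requests. This is a client-side validation, not an API restriction. If you do not need to process events incrementally, use .stream() with .get_final_message() (Python) or .finalMessage() (TypeScript) to get the complete Message object without handling individual events. See Streaming Messages for details. When streaming, be prepared to handle both thinking and text content blocks as they arrive.
- Omit thinking for latency: If your application does not display thinking content, set display: "omitted" in the thinking configuration to reduce time-to-first-text-token. See Controlling thinking display.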
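One way to apply the streaming rule is a thin wrapper that switches to streaming above the threshold. This is a sketch, not an SDK feature: `create_message` is a hypothetical helper, and it assumes `client` is an Anthropic Python SDK client whose `messages.stream(...)` context manager provides `get_final_message()`, as described above.

```python
STREAMING_THRESHOLD = 21_333  # max_tokens above this value requires streaming in the SDKs


def create_message(client, **params):
    """Return a complete message, streaming only when max_tokens requires it."""
    if params.get("max_tokens", 0) > STREAMING_THRESHOLD:
        # Stream to avoid HTTP timeouts, but collect the final message instead
        # of handling individual events.
        with client.messages.stream(**params) as stream:
            return stream.get_final_message()
    return client.messages.create(**params)
```

With a real client this might be called as `create_message(client, model="claude-sonnet-4-6", max_tokens=32000, thinking={"type": "enabled", "budget_tokens": 16000}, messages=[...])`.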
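Feature compatibility
- Thinking is not compatible with temperature or top_k modifications, or with forced tool use.
- When thinking is enabled, you can set top_p to values between 0.95 and 1.
- You cannot pre-fill responses when thinking is enabled.
- Changes to the thinking budget invalidate cached prompt prefixes that include messages. However, cached system prompts and tool definitions continue to work when thinking parameters change.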
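The compatibility rules above lend themselves to a client-side lint before a request is sent. This is a sketch under assumptions: `thinking_conflicts` is a hypothetical helper, it only inspects requests with thinking type "enabled", it flags any explicitly set temperature or top_k (which may be stricter than the API), it treats tool_choice types "any" and "tool" as forced tool use, and it treats a trailing assistant message as a pre-fill.

```python
def thinking_conflicts(params: dict) -> list[str]:
    """List request settings that conflict with enabled thinking."""
    thinking = params.get("thinking") or {}
    if thinking.get("type") != "enabled":
        return []
    problems = []
    for name in ("temperature", "top_k"):
        if name in params:
            problems.append(f"{name} cannot be modified while thinking is enabled")
    top_p = params.get("top_p")
    if top_p is not None and not 0.95 <= top_p <= 1:
        problems.append("top_p must be between 0.95 and 1")
    tool_choice = params.get("tool_choice") or {}
    if tool_choice.get("type") in ("any", "tool"):
        problems.append("forced tool use is not compatible with thinking")
    messages = params.get("messages") or []
    if messages and messages[-1].get("role") == "assistant":
        problems.append("responses cannot be pre-filled while thinking is enabled")
    return problems


request = {
    "thinking": {"type": "enabled", "budget_tokens": 4000},
    "temperature": 0.2,
    "top_p": 0.5,
    "tool_choice": {"type": "any"},
    "messages": [
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Sure,"},
    ],
}
for problem in thinking_conflicts(request):
    print(problem)
```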
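Usage guidelines
- Task selection: Use extended thinking for particularly complex tasks that benefit from step-by-step reasoning, such as math, coding, and analysis.
- Context handling: You do not need to remove previous thinking blocks yourself. On Opus 4.5+ and Sonnet 4.6+, the Claude API keeps thinking blocks from previous turns by default; on earlier Opus/Sonnet models and all Haiku models, it automatically ignores them and does not count them when calculating context usage.
- Prompt engineering: If you want to get the most out of Claude's thinking capabilities, review the extended thinking prompting tips.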