Prompt caching

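Prompt caching optimizes your API usage by allowing requests to resume from specific prefixes in your prompts. This can significantly reduce processing time and cost for repetitive tasks or prompts with consistent elements.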

Note

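This feature is eligible for Zero Data Retention (ZDR). When your organization has a ZDR arrangement, data sent through this feature is not stored after the API response is returned.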

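There are two ways to enable prompt caching:

Automatic caching: Add a single cache_control field at the top level of your request. The system automatically applies the cache breakpoint to the last cacheable block and moves it forward as the conversation grows. Best for multi-turn conversations where the growing message history should be cached automatically.
Explicit cache breakpoints: Place cache_control directly on individual content blocks for fine-grained control over exactly what gets cached.

The simplest way to start is with automatic caching: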

curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-opus-4-7",
    "max_tokens": 1024,
    "cache_control": {"type": "ephemeral"},
    "system": "You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on themes, characters, and writing style.",
    "messages": [
      {
        "role": "user",
        "content": "Analyze the major themes in Pride and Prejudice."
      }
    ]
  }'

ant messages create --transform usage <<'YAML'
model: claude-opus-4-7
max_tokens: 1024
cache_control:
  type: ephemeral
system: >-
  You are an AI assistant tasked with analyzing literary works. Your goal is
  to provide insightful commentary on themes, characters, and writing style.
messages:
  - role: user
    content: Analyze the major themes in Pride and Prejudice.
YAML

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    cache_control={"type": "ephemeral"},
    system="You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on themes, characters, and writing style.",
    messages=[
        {
            "role": "user",
            "content": "Analyze the major themes in 'Pride and Prejudice'.",
        }
    ],
)
print(response.usage.model_dump_json())

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 1024,
  cache_control: { type: "ephemeral" },
  system:
    "You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on themes, characters, and writing style.",
  messages: [
    {
      role: "user",
      content: "Analyze the major themes in 'Pride and Prejudice'."
    }
  ]
});
console.log(response.usage);

using Anthropic;
using Anthropic.Models.Messages;

AnthropicClient client = new();

var parameters = new MessageCreateParams
{
    Model = Model.ClaudeOpus4_7,
    MaxTokens = 1024,
    CacheControl = new CacheControlEphemeral(),
    System = "You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on themes, characters, and writing style.",
    Messages =
    [
        new()
        {
            Role = Role.User,
            Content = "Analyze the major themes in 'Pride and Prejudice'."
        }
    ]
};

var message = await client.Messages.Create(parameters);
Console.WriteLine(message.Usage);

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/anthropics/anthropic-sdk-go"
)

func main() {
	client := anthropic.NewClient()

	response, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
		Model:        anthropic.ModelClaudeOpus4_7,
		MaxTokens:    1024,
		CacheControl: anthropic.NewCacheControlEphemeralParam(),
		System: []anthropic.TextBlockParam{
			{Text: "You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on themes, characters, and writing style."},
		},
		Messages: []anthropic.MessageParam{
			anthropic.NewUserMessage(anthropic.NewTextBlock("Analyze the major themes in 'Pride and Prejudice'.")),
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(response.Usage)
}

import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.messages.CacheControlEphemeral;
import com.anthropic.models.messages.Message;
import com.anthropic.models.messages.MessageCreateParams;
import com.anthropic.models.messages.Model;

public class PromptCachingExample {

  public static void main(String[] args) {
    AnthropicClient client = AnthropicOkHttpClient.fromEnv();

    MessageCreateParams params = MessageCreateParams.builder()
        .model(Model.CLAUDE_OPUS_4_7)
        .maxTokens(1024)
        .cacheControl(CacheControlEphemeral.builder().build())
        .system("You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on themes, characters, and writing style.")
        .addUserMessage("Analyze the major themes in 'Pride and Prejudice'.")
        .build();

    Message message = client.messages().create(params);
    System.out.println(message.usage());
  }
}

<?php

use Anthropic\Client;
use Anthropic\Messages\CacheControlEphemeral;

$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));

$response = $client->messages->create(
    maxTokens: 1024,
    messages: [
        ['role' => 'user', 'content' => "Analyze the major themes in 'Pride and Prejudice'."]
    ],
    model: 'claude-opus-4-7',
    cacheControl: CacheControlEphemeral::with(),
    system: "You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on themes, characters, and writing style.",
);
echo json_encode($response->usage);

require "anthropic"

client = Anthropic::Client.new

response = client.messages.create(
  model: "claude-opus-4-7",
  max_tokens: 1024,
  cache_control: {type: "ephemeral"},
  system: "You are an AI assistant tasked with analyzing literary works. Your goal is to provide insightful commentary on themes, characters, and writing style.",
  messages: [
    {
      role: "user",
      content: "Analyze the major themes in 'Pride and Prejudice'."
    }
  ]
)
puts response.usage

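With automatic caching, the system caches all content up to and including the last cacheable block. On subsequent requests with the same prefix, the cached content is reused automatically.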

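How prompt caching works

When you send a request with prompt caching enabled:

The system checks whether the prompt prefix, up to the specified cache breakpoint, is already cached from a recent query.
If found, it uses the cached version, reducing processing time and cost.
Otherwise, it processes the full prompt and caches the prefix once the response begins.

This is especially useful for:

Prompts with many examples
Large amounts of context or background information
Repetitive tasks with consistent instructions
Long multi-turn conversations

By default, the cache has a 5-minute lifetime. Each time the cached content is used, the cache is refreshed at no additional cost.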

Note

If you find that 5 minutes is too short, Anthropic also offers a 1-hour cache duration at additional cost.

For more information, see 1-hour cache duration.

Tip

Prompt caching caches the full prefix

Prompt caching references the entire prompt: tools, system, and messages (in that order), up to and including the block designated with cache_control.

Pricing

Prompt caching introduces a new pricing structure. The table below shows the price per million tokens (MTok) for each supported model:

Model	Base input tokens	5-minute cache writes	1-hour cache writes	Cache hits and refreshes	Output tokens
Claude Opus 4.7	$5 / MTok	$6.25 / MTok	$10 / MTok	$0.50 / MTok	$25 / MTok
Claude Opus 4.6	$5 / MTok	$6.25 / MTok	$10 / MTok	$0.50 / MTok	$25 / MTok
Claude Opus 4.5	$5 / MTok	$6.25 / MTok	$10 / MTok	$0.50 / MTok	$25 / MTok
Claude Opus 4.1	$15 / MTok	$18.75 / MTok	$30 / MTok	$1.50 / MTok	$75 / MTok
Claude Opus 4 (deprecated)	$15 / MTok	$18.75 / MTok	$30 / MTok	$1.50 / MTok	$75 / MTok
Claude Sonnet 4.6	$3 / MTok	$3.75 / MTok	$6 / MTok	$0.30 / MTok	$15 / MTok
Claude Sonnet 4.5	$3 / MTok	$3.75 / MTok	$6 / MTok	$0.30 / MTok	$15 / MTok
Claude Sonnet 4 (deprecated)	$3 / MTok	$3.75 / MTok	$6 / MTok	$0.30 / MTok	$15 / MTok
Claude Haiku 4.5	$1 / MTok	$1.25 / MTok	$2 / MTok	$0.10 / MTok	$5 / MTok
Claude Haiku 3.5 (retired, except on Bedrock and Vertex AI)	$0.80 / MTok	$1 / MTok	$1.60 / MTok	$0.08 / MTok	$4 / MTok

Note

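The table above reflects the following pricing multipliers for prompt caching:

5-minute cache write tokens cost 1.25 times the base input token price
1-hour cache write tokens cost 2 times the base input token price
Cache read tokens cost 0.1 times the base input token price

These multipliers stack with other pricing modifiers, such as the Batch API discount and data residency. See Pricing for details.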
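As a rough illustration of how these multipliers combine, the following sketch estimates the input-side cost of a single request. The `Usage` dataclass, its field names, and `input_cost_usd` are hypothetical helpers written for this example, not part of any SDK; the base price is whatever the pricing table lists for your model.

```python
from dataclasses import dataclass

# Multipliers relative to the base input token price, as listed in the note above.
WRITE_5M_MULTIPLIER = 1.25
WRITE_1H_MULTIPLIER = 2.0
READ_MULTIPLIER = 0.1


@dataclass
class Usage:
    """Hypothetical container for the token counts of one request."""
    input_tokens: int
    cache_read_tokens: int = 0
    cache_write_5m_tokens: int = 0
    cache_write_1h_tokens: int = 0


def input_cost_usd(usage: Usage, base_price_per_mtok: float) -> float:
    """Estimate the input-side cost of one request in USD (output tokens excluded)."""
    weighted_tokens = (
        usage.input_tokens
        + usage.cache_read_tokens * READ_MULTIPLIER
        + usage.cache_write_5m_tokens * WRITE_5M_MULTIPLIER
        + usage.cache_write_1h_tokens * WRITE_1H_MULTIPLIER
    )
    return weighted_tokens * base_price_per_mtok / 1_000_000


# A 100,000-token prefix at a $5 / MTok base price: first request writes, later requests read.
first_request = Usage(input_tokens=50, cache_write_5m_tokens=100_000)
later_request = Usage(input_tokens=50, cache_read_tokens=100_000)
print(f"first request: ${input_cost_usd(first_request, 5.0):.5f}")
print(f"later request: ${input_cost_usd(later_request, 5.0):.5f}")
```

Under these assumptions the first request costs roughly $0.63 on the input side and each later cache hit roughly $0.05, so the write premium is recovered after the first reuse.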

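Supported models

Prompt caching (both automatic and explicit) is supported on all active Claude models.

Automatic caching

Automatic caching is the simplest way to enable prompt caching. Instead of placing cache_control on individual content blocks, add a single cache_control field at the top level of your request body. The system automatically applies the cache breakpoint to the last cacheable block.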

curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-opus-4-7",
    "max_tokens": 1024,
    "cache_control": {"type": "ephemeral"},
    "system": "You are a helpful assistant that remembers our conversation.",
    "messages": [
      {"role": "user", "content": "My name is Alex. I work on machine learning."},
      {"role": "assistant", "content": "Nice to meet you, Alex! How can I help with your ML work today?"},
      {"role": "user", "content": "What did I say I work on?"}
    ]
  }'

ant messages create --transform usage <<'YAML'
model: claude-opus-4-7
max_tokens: 1024
cache_control:
  type: ephemeral
system: You are a helpful assistant that remembers our conversation.
messages:
  - role: user
    content: My name is Alex. I work on machine learning.
  - role: assistant
    content: Nice to meet you, Alex! How can I help with your ML work today?
  - role: user
    content: What did I say I work on?
YAML

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    cache_control={"type": "ephemeral"},
    system="You are a helpful assistant that remembers our conversation.",
    messages=[
        {"role": "user", "content": "My name is Alex. I work on machine learning."},
        {
            "role": "assistant",
            "content": "Nice to meet you, Alex! How can I help with your ML work today?",
        },
        {"role": "user", "content": "What did I say I work on?"},
    ],
)
print(response.usage.model_dump_json())

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 1024,
  cache_control: { type: "ephemeral" },
  system: "You are a helpful assistant that remembers our conversation.",
  messages: [
    { role: "user", content: "My name is Alex. I work on machine learning." },
    {
      role: "assistant",
      content: "Nice to meet you, Alex! How can I help with your ML work today?"
    },
    { role: "user", content: "What did I say I work on?" }
  ]
});
console.log(response.usage);

using Anthropic;
using Anthropic.Models.Messages;

AnthropicClient client = new();

var parameters = new MessageCreateParams
{
    Model = Model.ClaudeOpus4_7,
    MaxTokens = 1024,
    CacheControl = new CacheControlEphemeral(),
    System = "You are a helpful assistant that remembers our conversation.",
    Messages =
    [
        new()
        {
            Role = Role.User,
            Content = "My name is Alex. I work on machine learning."
        },
        new()
        {
            Role = Role.Assistant,
            Content = "Nice to meet you, Alex! How can I help with your ML work today?"
        },
        new()
        {
            Role = Role.User,
            Content = "What did I say I work on?"
        }
    ]
};

var message = await client.Messages.Create(parameters);
Console.WriteLine(message.Usage);

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/anthropics/anthropic-sdk-go"
)

func main() {
	client := anthropic.NewClient()

	response, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
		Model:        anthropic.ModelClaudeOpus4_7,
		MaxTokens:    1024,
		CacheControl: anthropic.NewCacheControlEphemeralParam(),
		System: []anthropic.TextBlockParam{
			{Text: "You are a helpful assistant that remembers our conversation."},
		},
		Messages: []anthropic.MessageParam{
			anthropic.NewUserMessage(anthropic.NewTextBlock("My name is Alex. I work on machine learning.")),
			anthropic.NewAssistantMessage(anthropic.NewTextBlock("Nice to meet you, Alex! How can I help with your ML work today?")),
			anthropic.NewUserMessage(anthropic.NewTextBlock("What did I say I work on?")),
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(response.Usage)
}

import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.messages.CacheControlEphemeral;
import com.anthropic.models.messages.Message;
import com.anthropic.models.messages.MessageCreateParams;
import com.anthropic.models.messages.Model;

public class AutomaticCachingExample {

    public static void main(String[] args) {
        AnthropicClient client = AnthropicOkHttpClient.fromEnv();

        MessageCreateParams params = MessageCreateParams.builder()
                .model(Model.CLAUDE_OPUS_4_7)
                .maxTokens(1024)
                .cacheControl(CacheControlEphemeral.builder().build())
                .system("You are a helpful assistant that remembers our conversation.")
                .addUserMessage("My name is Alex. I work on machine learning.")
                .addAssistantMessage("Nice to meet you, Alex! How can I help with your ML work today?")
                .addUserMessage("What did I say I work on?")
                .build();

        Message message = client.messages().create(params);
        System.out.println(message.usage());
    }
}

<?php

use Anthropic\Client;
use Anthropic\Messages\CacheControlEphemeral;

$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));

$response = $client->messages->create(
    maxTokens: 1024,
    messages: [
        ['role' => 'user', 'content' => 'My name is Alex. I work on machine learning.'],
        ['role' => 'assistant', 'content' => 'Nice to meet you, Alex! How can I help with your ML work today?'],
        ['role' => 'user', 'content' => 'What did I say I work on?'],
    ],
    model: 'claude-opus-4-7',
    cacheControl: CacheControlEphemeral::with(),
    system: 'You are a helpful assistant that remembers our conversation.',
);
echo json_encode($response->usage);

require "anthropic"

client = Anthropic::Client.new

response = client.messages.create(
  model: "claude-opus-4-7",
  max_tokens: 1024,
  cache_control: {type: "ephemeral"},
  system: "You are a helpful assistant that remembers our conversation.",
  messages: [
    {role: "user", content: "My name is Alex. I work on machine learning."},
    {role: "assistant", content: "Nice to meet you, Alex! How can I help with your ML work today?"},
    {role: "user", content: "What did I say I work on?"}
  ]
)
puts response.usage

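How automatic caching works in multi-turn conversations

With automatic caching, the cache point moves forward automatically as the conversation grows. Each new request caches everything up to the last cacheable block, and the earlier content is read from the cache.

Request	Content	Cache behavior
Request 1	System + User(1) + Asst(1) + User(2) ◀ cache	Everything is written to the cache
Request 2	System + User(1) + Asst(1) + User(2) + Asst(2) + User(3) ◀ cache	System through User(2) is read from the cache; Asst(2) + User(3) is written to the cache
Request 3	System + User(1) + Asst(1) + User(2) + Asst(2) + User(3) + Asst(3) + User(4) ◀ cache	System through User(3) is read from the cache; Asst(3) + User(4) is written to the cache

The cache breakpoint automatically moves to the last cacheable block in each request, so you don't need to update any cache_control markers as the conversation grows.

TTL support

By default, automatic caching uses a 5-minute TTL. You can specify a 1-hour TTL, priced at 2 times the base input token price: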

{ "cache_control": { "type": "ephemeral", "ttl": "1h" } }

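Combining with block-level caching

Automatic caching is compatible with explicit cache breakpoints. When they are used together, the automatic cache breakpoint uses one of the 4 available breakpoint slots.

This lets you combine both approaches. For example, use an explicit breakpoint to cache your system prompt while automatic caching handles the conversation: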

{
  "model": "claude-opus-4-7",
  "max_tokens": 1024,
  "cache_control": { "type": "ephemeral" },
  "system": [
    {
      "type": "text",
      "text": "You are a helpful assistant.",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "messages": [{ "role": "user", "content": "What are the key terms?" }]
}

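What stays the same

Automatic caching uses the same underlying caching infrastructure. Pricing, minimum token thresholds, context ordering requirements, and the 20-block lookback window are all the same as with explicit breakpoints.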

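Edge cases

If the last block already has an explicit cache_control with the same TTL, automatic caching is a no-op.
If the last block has an explicit cache_control with a different TTL, the API returns a 400 error.
If 4 explicit block-level breakpoints already exist, the API returns a 400 error (no slot remains for automatic caching).
If the last block is not eligible as an automatic cache breakpoint target, the system silently walks backward to find the nearest eligible block. If none is found, caching is skipped.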
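The edge cases above can be read as a small decision procedure. The sketch below is a client-side model for reasoning about them, not the API's implementation: the function name, the `eligible` flag on each block, and in particular the order in which the rules are checked are assumptions made for illustration.

```python
MAX_BREAKPOINTS = 4
DEFAULT_TTL = "5m"


def automatic_caching_outcome(blocks: list[dict], top_level_ttl: str = DEFAULT_TTL) -> str:
    """Model what the top-level cache_control does for an ordered list of blocks.

    Each block is a dict that may carry an explicit "cache_control" entry and a
    hypothetical "eligible" flag (default True) saying whether it can hold the
    automatic breakpoint. The precedence of the rules below is an assumption.
    """
    if not blocks:
        return "caching skipped"

    last = blocks[-1]
    if "cache_control" in last:
        last_ttl = last["cache_control"].get("ttl", DEFAULT_TTL)
        # Same TTL: nothing to add. Different TTL: the request is rejected.
        return "no-op" if last_ttl == top_level_ttl else "400 error"

    explicit_count = sum(1 for block in blocks if "cache_control" in block)
    if explicit_count >= MAX_BREAKPOINTS:
        return "400 error"  # no slot left for the automatic breakpoint

    # Walk backward to the nearest block that can hold the breakpoint.
    for index in range(len(blocks) - 1, -1, -1):
        if blocks[index].get("eligible", True):
            return f"breakpoint on block {index}"
    return "caching skipped"


explicit = {"cache_control": {"type": "ephemeral"}}
print(automatic_caching_outcome([{}, dict(explicit)]))          # no-op
print(automatic_caching_outcome([{}, {"eligible": False}]))     # breakpoint on block 0
```

Treat the result strings as labels for the four documented outcomes; the real API reports errors through its normal 400 response.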

Note

Automatic caching is available on the Claude API, Claude Platform on AWS, and Microsoft Foundry (beta). Bedrock and Vertex AI do not support automatic caching.

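Explicit cache breakpoints

For more control over caching, you can place cache_control directly on individual content blocks. This is useful when you need to cache different sections that change at different frequencies, or when you need fine-grained control over exactly what gets cached.

Structuring your prompt

Place static content (tool definitions, system instructions, context, examples) at the beginning of your prompt. Use the cache_control parameter to mark the end of the reusable content for caching.

Cache prefixes are created in the following order: tools, system, then messages. This order forms a hierarchy in which each level builds on the previous one.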

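How automatic prefix checking works

You can use a single cache breakpoint at the end of your static content, and the system automatically finds the longest prefix that a prior request has already written to the cache. Understanding how this works helps you optimize your caching strategy.

Three core principles:

Cache writes happen only at the breakpoint. Marking a block with cache_control writes exactly one cache entry: a hash of the prefix ending at that block. The system does not write entries for any earlier position. Because the hash is cumulative, covering everything up to and including the breakpoint, changing any block at or before the breakpoint produces a different hash on the next request.
Cache reads look backward for entries written by prior requests. On each request, the system computes the prefix hash at the breakpoint and checks for a matching cache entry. If none exists, it walks backward one block at a time, checking whether the prefix hash at each earlier position matches something already in the cache. It looks for prior writes, not for stable content.
The lookback window is 20 blocks. The system checks at most 20 positions per breakpoint, counting the breakpoint itself as the first. If it finds no matching entry within that window, it stops checking (or continues from the next explicit breakpoint, if there is one).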

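Example: Lookback in a growing conversation

You append new blocks each turn and set cache_control on the last block of each request:

Turn 1: 10 blocks, breakpoint on block 10. No prior cache entries exist. The system writes an entry at block 10.
Turn 2: 15 blocks, breakpoint on block 15. Block 15 has no entry, so the system walks back to block 10 and finds the turn-1 entry. Cache hit at block 10; the system processes only the new content in blocks 11 through 15 and writes a new entry at block 15.
Turn 3: 35 blocks, breakpoint on block 35. The system checks 20 positions (blocks 35 through 16) and finds nothing. The turn-2 entry at block 15 is one position outside the window, so there is no cache hit. Adding a second breakpoint at block 15 starts a second lookback window from there, which finds the turn-2 entry.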

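Common mistake: Breakpoint on content that changes every request

Your prompt has a large static system context (blocks 1 through 5) followed by a per-request block containing a timestamp and the user message (block 6). You set cache_control on block 6:

Request 1: Cache write at block 6. The hash includes the timestamp.
Request 2: The timestamp differs, so the prefix hash at block 6 differs. The lookback walks through blocks 5, 4, 3, 2, and 1, but the system never wrote an entry at any of those positions. No cache hit. You pay for a fresh cache write on every request and never get a read.

The lookback does not find stable content behind the breakpoint and cache it. It finds entries that prior requests have already written, and writes happen only at breakpoints. Move cache_control to block 5 (the last block that stays the same across requests), and every subsequent request reads the cached prefix. Automatic caching falls into the same trap: it places the breakpoint on the last cacheable block, which in this structure is the block that changes every request, so use an explicit breakpoint on block 5 instead.

Key takeaway: Place cache_control on the last block whose prefix is identical across the requests that should share a cache. In a growing conversation, the last block works as long as each turn adds fewer than 20 blocks: earlier content never changes, so the next request's lookback finds the prior write. For prompts with a varying suffix (timestamps, per-request context, the incoming message), place the breakpoint at the end of the static prefix, not on the varying block.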
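To make the write-at-breakpoint and 20-block lookback rules concrete, here is a small local simulation. It is a sketch only: the real cache key is internal to the API, so SHA-256 over block text stands in for it, and `prefix_hashes`, `send_request`, and the rule that the latest breakpoint is tried first are assumptions for illustration.

```python
import hashlib

LOOKBACK_WINDOW = 20


def prefix_hashes(blocks: list[str]) -> list[str]:
    """Return a cumulative hash for every prefix blocks[:1], blocks[:2], ..."""
    hasher = hashlib.sha256()
    hashes = []
    for block in blocks:
        hasher.update(block.encode("utf-8") + b"\x00")
        hashes.append(hasher.copy().hexdigest())
    return hashes


def send_request(cache: set[str], blocks: list[str], breakpoints: list[int]) -> int:
    """Simulate one request; breakpoints are 1-based block positions.

    Returns the position of the cache hit (0 if none), then records one
    cache entry per breakpoint, mirroring "writes happen only at breakpoints".
    """
    hashes = prefix_hashes(blocks)
    hit = 0
    for breakpoint_pos in sorted(breakpoints, reverse=True):
        lowest = max(breakpoint_pos - LOOKBACK_WINDOW + 1, 1)
        for pos in range(breakpoint_pos, lowest - 1, -1):
            if hashes[pos - 1] in cache:
                hit = pos
                break
        if hit:
            break
    for breakpoint_pos in breakpoints:
        cache.add(hashes[breakpoint_pos - 1])
    return hit


conversation = [f"block-{i}" for i in range(1, 36)]
cache: set[str] = set()
print(send_request(cache, conversation[:10], [10]))       # 0: nothing cached yet
print(send_request(cache, conversation[:15], [15]))       # 10: finds the turn-1 entry
print(send_request(set(cache), conversation, [35]))       # 0: block 15 is outside the window
print(send_request(set(cache), conversation, [15, 35]))   # 15: second breakpoint rescues the hit

# The timestamp mistake: a breakpoint on a block that changes every request never hits.
static = [f"context-{i}" for i in range(1, 6)]
varying_cache: set[str] = set()
print(send_request(varying_cache, static + ["12:00 hello"], [6]))  # 0
print(send_request(varying_cache, static + ["12:01 hello"], [6]))  # 0
```

Moving the breakpoint in the last two calls from position 6 to position 5 makes the second call return 5, which is the fix described above.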

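When to use multiple breakpoints

You can define up to 4 cache breakpoints if you want to:

Cache different sections that change at different frequencies (for example, tools rarely change, but context is updated daily)
Have more control over exactly what gets cached
Ensure a cache hit when a growing conversation pushes your breakpoint 20 or more blocks past the last cache write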

Note

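Important limitation: The lookback can only find entries that prior requests have already written. If a growing conversation pushes your breakpoint 20 or more blocks past the last write, the lookback window misses it. Add a second breakpoint closer to that position from the start, so that writes accumulate there before you need them.

Understanding cache breakpoint costs

Cache breakpoints themselves do not add any cost. You are charged only for:

Cache writes: when new content is written to the cache (25% more than base input tokens for the 5-minute TTL)
Cache reads: when cached content is used (10% of the base input token price)
Regular input tokens: for any uncached content

Adding more cache_control breakpoints does not increase your costs: you still pay the same amount based on what is actually cached and read. Breakpoints simply give you control over which sections can be cached independently.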

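Caching strategies and considerations

Cache limitations

On the Claude API, Claude Platform on AWS, Vertex AI, and Microsoft Foundry (beta), the minimum cacheable prompt length is:

4,096 tokens for Claude Mythos Preview, Claude Opus 4.7, Claude Opus 4.6, and Claude Opus 4.5
1,024 tokens for Claude Sonnet 4.6, Claude Sonnet 4.5, Claude Opus 4.1, Claude Opus 4 (deprecated), and Claude Sonnet 4 (deprecated)
4,096 tokens for Claude Haiku 4.5
2,048 tokens for Claude Haiku 3.5 (retired, except on Vertex AI)

Model availability varies by platform.

Shorter prompts cannot be cached, even if they are marked with cache_control. Any request that would cache fewer tokens than this minimum is processed without caching, and no error is returned. To verify whether a prompt was cached, check the usage fields in the response: if both cache_creation_input_tokens and cache_read_input_tokens are 0, the prompt was not cached (likely because it did not meet the minimum length).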
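Because a too-short prompt fails silently, it can help to check the usage fields in code. The helper below is a minimal sketch that takes the `usage` object as a plain dict (for example, parsed from the JSON response); `cache_status` is a hypothetical name, while the two field names come from the text above.

```python
def cache_status(usage: dict) -> str:
    """Classify a response's usage block by what the cache did for this request."""
    written = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    if written == 0 and read == 0:
        # Nothing written and nothing read: the prompt was not cached,
        # possibly because it is below the model's minimum cacheable length.
        return "not cached"
    if written and read:
        return "read and write"
    return "read" if read else "write"


print(cache_status({"input_tokens": 900, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 0}))
print(cache_status({"input_tokens": 8, "cache_creation_input_tokens": 5120, "cache_read_input_tokens": 0}))
```

The first call prints `not cached` and the second prints `write`.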

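If your prompt falls just below the minimum for your model and platform, it is often worth extending the cached content to reach the threshold. Cache reads cost far less than uncached input tokens, so reaching the minimum can reduce the cost of frequently reused prompts.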

Note

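Bedrock is a platform operated by AWS. On Bedrock, see the Bedrock prompt caching documentation for per-model minimums, failure behavior, and usage field names.

For concurrent requests, note that a cache entry becomes available only after the first response begins. If you need cache hits for parallel requests, wait for the first response before sending the subsequent requests.

Currently, "ephemeral" is the only supported cache type, with a default lifetime of 5 minutes.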

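What can be cached

Most blocks in a request can be cached. This includes:

Tools: tool definitions in the tools array
System messages: content blocks in the system array
Text messages: content blocks in the messages.content array, for both user and assistant turns
Images and documents: content blocks in the messages.content array, in user turns
Tool use and tool results: content blocks in the messages.content array, in both user and assistant turns

Each of these elements can be cached, either automatically or by marking it with cache_control.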

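What cannot be cached

While most request blocks can be cached, there are some exceptions:

Thinking blocks cannot be cached directly with cache_control. However, thinking blocks can be cached alongside other content when they appear in previous assistant turns. When cached this way, they do count as input tokens when read from the cache.
Sub-content blocks (such as citations) cannot themselves be cached directly. Instead, cache the top-level block.

In the case of citations, the top-level document content blocks that serve as the source material for citations can be cached. This lets you use prompt caching effectively with citations by caching the documents that the citations will reference.
Empty text blocks cannot be cached.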

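What invalidates the cache

Modifications to cached content can invalidate some or all of the cache.

As described in Structuring your prompt, the cache follows the hierarchy tools → system → messages. A change at any level invalidates that level and all subsequent levels.

The following table shows which parts of the cache are invalidated by different types of changes. ✘ indicates that the cache is invalidated, and ✓ indicates that the cache remains valid.

What changes	Tools cache	System cache	Messages cache	Impact
Tool definitions	✘	✘	✘	Modifying tool definitions (names, descriptions, parameters) invalidates the entire cache
Web search toggle	✓	✘	✘	Enabling/disabling web search modifies the system prompt
Citations toggle	✓	✘	✘	Enabling/disabling citations modifies the system prompt
Speed setting	✓	✘	✘	Switching between `speed: "fast"` and standard speed invalidates the system and messages caches
Tool choice	✓	✓	✘	Changes to the `tool_choice` parameter only affect message blocks
Images	✓	✓	✘	Adding/removing images anywhere in the prompt affects message blocks
Thinking parameters	✓	✓	✘	Changes to extended thinking settings (enable/disable, budget) affect message blocks
Non-tool results passed to extended thinking requests	✓	✓	Model-specific	On Opus 4.5+ and Sonnet 4.6+, thinking blocks are preserved by default, so the cache remains valid (✓). On earlier Opus/Sonnet models and all Haiku models, all previously cached thinking blocks are stripped from context, and any messages that follow those thinking blocks are removed from the cache (✘). For more details, see Caching with thinking blocks.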
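The hierarchy rule behind this table (a change at one level invalidates that level and everything after it) can be captured in a few lines. This is a sketch for reasoning about your own request changes; the change labels in `FIRST_AFFECTED_LEVEL` are hypothetical keys chosen for this example, and the model-specific last row of the table is deliberately left out.

```python
CACHE_LEVELS = ["tools", "system", "messages"]

# First cache level touched by each kind of change, taken from the table above.
FIRST_AFFECTED_LEVEL = {
    "tool_definitions": "tools",
    "web_search_toggle": "system",
    "citations_toggle": "system",
    "speed_setting": "system",
    "tool_choice": "messages",
    "images": "messages",
    "thinking_parameters": "messages",
}


def invalidated_levels(change: str) -> list[str]:
    """Return the cache levels invalidated by a change, in hierarchy order."""
    start = CACHE_LEVELS.index(FIRST_AFFECTED_LEVEL[change])
    return CACHE_LEVELS[start:]


print(invalidated_levels("tool_definitions"))  # ['tools', 'system', 'messages']
print(invalidated_levels("citations_toggle"))  # ['system', 'messages']
print(invalidated_levels("tool_choice"))       # ['messages']
```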

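Tracking cache performance

Monitor cache performance using these API response fields, found within usage in the response (or in the message_start event if streaming):

cache_creation_input_tokens: the number of tokens written to the cache when creating a new entry.
cache_read_input_tokens: the number of tokens retrieved from the cache for this request.
input_tokens: the number of input tokens that were neither read from the cache nor used to create a cache entry (that is, the tokens after the last cache breakpoint).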

Note

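Understanding the token breakdown

The input_tokens field represents only the tokens that come after the last cache breakpoint in your request, not all of the input tokens you sent.

To calculate total input tokens:

total_input_tokens = cache_read_input_tokens + cache_creation_input_tokens + input_tokens

Spatial explanation:

cache_read_input_tokens = tokens before the breakpoint that are already cached (reads)
cache_creation_input_tokens = tokens before the breakpoint that are being cached now (writes)
input_tokens = tokens after the last breakpoint (not eligible for caching)

Example: If a request has 100,000 tokens of cached content (read from the cache), 0 tokens of new content being cached, and 50 tokens in the user message (after the cache breakpoint):

cache_read_input_tokens: 100,000
cache_creation_input_tokens: 0
input_tokens: 50
Total input tokens processed: 100,050 tokens

This matters for understanding both costs and rate limits, because input_tokens is typically much smaller than your total input when caching is in use.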
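The formula above translates directly into code. The following sketch works on the `usage` block as a plain dict; `total_input_tokens` and `cache_read_share` are hypothetical helper names, and the read share is just one convenient way to summarize how much of the input came from the cache.

```python
def total_input_tokens(usage: dict) -> int:
    """Sum cached reads, cache writes, and uncached input tokens for one request."""
    return (
        usage.get("cache_read_input_tokens", 0)
        + usage.get("cache_creation_input_tokens", 0)
        + usage.get("input_tokens", 0)
    )


def cache_read_share(usage: dict) -> float:
    """Fraction of all input tokens that were served from the cache."""
    total = total_input_tokens(usage)
    return usage.get("cache_read_input_tokens", 0) / total if total else 0.0


usage = {
    "cache_read_input_tokens": 100_000,
    "cache_creation_input_tokens": 0,
    "input_tokens": 50,
}
print(total_input_tokens(usage))  # 100050
print(f"{cache_read_share(usage):.2%}")
```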

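Caching with thinking blocks

When using extended thinking with prompt caching, thinking blocks have special behavior:

Automatic caching alongside other content: Although thinking blocks cannot be marked directly with cache_control, they are cached as part of the request content when you make subsequent API calls with tool results. This commonly happens during tool use, when you pass thinking blocks back to continue the conversation.

Input token counting: When thinking blocks are read from the cache, they count as input tokens in your usage metrics. This is important for cost calculation and token budgeting.

Cache invalidation patterns:

The cache remains valid when only tool results are provided as user messages
On Opus 4.5+ and Sonnet 4.6+, thinking blocks are preserved by default even when non-tool-result user content is added, so the cache remains valid
On earlier Opus/Sonnet models and all Haiku models, adding non-tool-result user content invalidates the cache and causes all previous thinking blocks to be stripped from context
This caching behavior occurs even without explicit cache_control markers

For more details on cache invalidation, see What invalidates the cache.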

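Example with tool use:

Request 1: User: "What's the weather in Paris?"
Response: [thinking_block_1] + [tool_use block 1]

Request 2:
User: ["What's the weather in Paris?"],
Assistant: [thinking_block_1] + [tool_use block 1],
User: [tool_result_1, cache=True]
Response: [thinking_block_2] + [text block 2]
# Request 2 caches its request content (not the response)
# The cache includes: the user message, thinking_block_1, tool_use block 1, and tool_result_1

Request 3:
User: ["What's the weather in Paris?"],
Assistant: [thinking_block_1] + [tool_use block 1],
User: [tool_result_1, cache=True],
Assistant: [thinking_block_2] + [text block 2],
User: [Text response, cache=True]
# On earlier Opus/Sonnet and all Haiku models, the non-tool-result user block causes previous thinking blocks to be stripped; on Opus 4.5+/Sonnet 4.6+ they are preserved

On earlier Opus/Sonnet models and all Haiku models, all previous thinking blocks are removed from context at this point. On Opus 4.5+ and Sonnet 4.6+, previous thinking blocks are preserved by default and remain part of the cached prefix.

For more details, see the extended thinking documentation.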

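Cache storage and sharing

Warning

Starting February 5, 2026, prompt caching uses workspace-level isolation instead of organization-level isolation. Caches are isolated per workspace, ensuring data separation between workspaces within the same organization. This applies to the Claude API, Claude Platform on AWS, and Microsoft Foundry (beta); Bedrock and Vertex AI keep organization-level cache isolation. If you use multiple workspaces, review your caching strategy to account for this difference.

Organization and workspace isolation: Caches are isolated between organizations. Different organizations never share caches, even if they use identical prompts. Starting February 5, 2026, on the Claude API, Claude Platform on AWS, and Microsoft Foundry (beta), caches are also isolated per workspace within an organization; Bedrock and Vertex AI continue to use organization-level isolation only.
Exact matching: Cache hits require 100% identical prompt segments, including all text and images up to and including the block marked with cache control.
Output token generation: Prompt caching has no effect on output token generation. The response you receive is identical to the one you would get without prompt caching.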

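Best practices for effective caching

To optimize prompt caching performance:

Start with automatic caching for multi-turn conversations. It handles breakpoint management automatically.
Use explicit block-level breakpoints when you need to cache different sections that change at different frequencies.
Cache stable, reusable content such as system instructions, background information, large contexts, or frequently used tool definitions.
Place cached content at the beginning of the prompt for best performance.
Use cache breakpoints strategically to separate different cacheable prefix sections.
Place the breakpoint on the last block that stays identical across requests. For a prompt with a static prefix and a varying suffix (timestamps, per-request context, the incoming message), that is the end of the prefix, not the varying block.
Regularly analyze cache hit rates and adjust your strategy as needed.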

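Optimizing for different use cases

Tailor your prompt caching strategy to your scenario:

Conversational agents: Reduce cost and latency for extended conversations, especially those with long instructions or uploaded documents.
Coding assistants: Improve autocomplete and codebase Q&A by keeping relevant sections or a summarized version of the codebase in the prompt.
Large document processing: Include complete long-form material (including images) in your prompt without increasing response latency.
Detailed instruction sets: Share extensive lists of instructions, procedures, and examples to fine-tune Claude's responses. Developers often include one or two examples in the prompt, but with prompt caching you can get better performance by including 20 or more diverse examples of high-quality answers.
Agentic tool use: Improve performance in scenarios involving multiple tool calls and iterative code changes, where each step typically requires a new API call.
Talk to books, papers, documentation, podcast transcripts, and other long-form content: Bring any knowledge base alive by embedding the entire document in the prompt and letting users ask it questions.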

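Troubleshooting common issues

If you experience unexpected behavior:

Tip

Cache diagnostics (beta) lets the API compare consecutive requests and report exactly where the prompt prefix diverged, which automates many of the steps in this list.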

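Ensure that the cached sections are identical between calls. For explicit breakpoints, verify that the cache_control markers are in the same locations
Check that calls are made within the cache lifetime (5 minutes by default)
Verify that tool_choice and image usage remain consistent between calls
Validate that you are caching at least the minimum number of tokens for your model and platform (see Cache limitations)
Confirm that your breakpoint is on a block that stays identical across requests. Cache writes happen only at the breakpoint, and if that block changes (timestamps, per-request context, the incoming message), the prefix hash never matches. The lookback does not find stable content behind the breakpoint; it only finds entries that prior requests wrote at their own breakpoints
Verify that the keys in your tool_use content blocks have a stable order, because some languages (for example, Swift and Go) randomize key order during JSON conversion, which breaks caches
Use cache diagnostics to have the API compare consecutive requests and report which part of the prompt diverged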
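For the key-ordering item in the list above, one defensive approach is to normalize tool_use inputs before replaying them in a later request. The sketch below sorts dictionary keys recursively so that two semantically equal inputs serialize to the same JSON text; `sort_keys_recursively` is a hypothetical helper, and whether your client library preserves this order when it serializes the request is an assumption you should verify.

```python
import json
from typing import Any


def sort_keys_recursively(value: Any) -> Any:
    """Return a copy of a JSON-like value with every dict's keys in sorted order."""
    if isinstance(value, dict):
        return {key: sort_keys_recursively(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        return [sort_keys_recursively(item) for item in value]
    return value


# Two tool inputs with the same content but different key insertion order.
first = {"location": "Paris", "unit": "celsius", "options": {"b": 1, "a": 2}}
second = {"options": {"a": 2, "b": 1}, "unit": "celsius", "location": "Paris"}

print(json.dumps(first) == json.dumps(second))  # False: serialized text differs
print(json.dumps(sort_keys_recursively(first)) == json.dumps(sort_keys_recursively(second)))  # True
```

Python dictionaries keep insertion order, so the problem only appears when the same input is rebuilt in a different order; in languages with randomized map iteration it can appear on every request.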

Note

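Changing tool_choice, or the presence or absence of images anywhere in the prompt, invalidates the cache and requires new cache entries to be created. For more details on cache invalidation, see What invalidates the cache.

1-hour cache duration

If you find that 5 minutes is too short, Anthropic also offers a 1-hour cache duration at additional cost.

Note

The 1-hour cache duration is available on the Claude API, Claude Platform on AWS, Amazon Bedrock, Amazon Bedrock (legacy), Vertex AI, and Microsoft Foundry (beta).

To use the extended cache, include ttl in the cache_control definition, like this: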

{
  "cache_control": {
    "type": "ephemeral",
    "ttl": "1h"
  }
}

The response includes detailed cache information, like the following:

{
  "usage": {
    "input_tokens": 2048,
    "cache_read_input_tokens": 1800,
    "cache_creation_input_tokens": 248,
    "output_tokens": 503,

    "cache_creation": {
      "ephemeral_5m_input_tokens": 148,
      "ephemeral_1h_input_tokens": 100
    }
  }
}

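Note that the current cache_creation_input_tokens field equals the sum of the values in the cache_creation object.

When to use the 1-hour cache

If you have prompts that are used at a regular cadence (that is, system prompts that are used more often than every 5 minutes), continue to use the 5-minute cache, because it keeps being refreshed at no additional charge.

The 1-hour cache is best suited to the following scenarios:

When your prompts are likely to be used less often than every 5 minutes but more often than every hour. For example, when an agentic side-agent takes longer than 5 minutes, or when you store a long chat conversation with a user and generally expect that the user may not respond within the next 5 minutes.
When latency is important and your follow-up prompts may be sent more than 5 minutes later.
When you want to improve your rate limit utilization, because cache hits are not deducted from your rate limit.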

Note

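The 5-minute and 1-hour caches behave the same with respect to latency. For long documents, you will generally see improved time-to-first-token.

Mixing different TTLs

You can use both 1-hour and 5-minute cache controls in the same request, with one important constraint: cache entries with a longer TTL must appear before those with a shorter TTL (that is, a 1-hour cache entry must appear before any 5-minute cache entry).

When mixing TTLs, the API determines three billing positions in your prompt:

Position A: the token count at the highest cache hit (or 0 if there are no hits).
Position B: the token count at the highest 1-hour cache_control block after A (or equal to A if none exists).
Position C: the token count at the last cache_control block.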

Note

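If B and/or C are larger than A, they are necessarily cache misses, because A is the highest cache hit.

You are charged for:

Cache read tokens for A.
1-hour cache write tokens for (B - A).
5-minute cache write tokens for (C - B).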

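Here are 3 examples. They depict the input tokens of 3 requests, each with different cache hits and cache misses and, as a result, different calculated pricing.

[Figure: Mixing TTLs diagram]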
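In place of the diagram, the billing rule for positions A, B, and C can be worked through numerically. The sketch below assumes you already know the three token counts for a request; `mixed_ttl_charges` is a hypothetical helper, and the sample numbers are invented for illustration.

```python
def mixed_ttl_charges(a: int, b: int, c: int) -> dict[str, int]:
    """Split a request's cached prefix into billed token counts.

    a: token count at the highest cache hit (0 if no hit)
    b: token count at the highest 1-hour cache_control block after A (A if none)
    c: token count at the last cache_control block
    """
    if not 0 <= a <= b <= c:
        raise ValueError("expected 0 <= A <= B <= C")
    return {
        "cache_read_tokens": a,
        "cache_write_1h_tokens": b - a,
        "cache_write_5m_tokens": c - b,
    }


# Full miss: nothing cached yet, 1-hour block ends at 30,000 tokens, last block at 35,000.
print(mixed_ttl_charges(0, 30_000, 35_000))
# Partial hit: the 1-hour prefix is read from cache, only the 5-minute tail is rewritten.
print(mixed_ttl_charges(30_000, 30_000, 35_000))
# Full hit: everything up to the last breakpoint is read from cache.
print(mixed_ttl_charges(35_000, 35_000, 35_000))
```

Multiplying each count by the matching price (read, 1-hour write, 5-minute write) gives the cached portion of the input cost for the request.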

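Pre-warming the cache

Cache pre-warming lets you load your system prompt or tool definitions into the prompt cache before a user triggers a real request. This removes the cache-miss latency penalty from the first user interaction, reducing time-to-first-token (TTFT) for latency-sensitive applications.

How it works

Set max_tokens: 0 in your request. The API reads your prompt into the model and writes the cache at any cache_control breakpoints, then returns immediately without generating any output. The response has an empty content array, stop_reason: "max_tokens", and a fully populated usage block.

Place the cache_control breakpoint on the last block that is shared with your follow-up requests (typically the system prompt or tool definitions), not on the placeholder user message. Otherwise the cache entry is keyed on the placeholder, and follow-up requests will not hit it. This means using an explicit cache breakpoint rather than automatic caching, because automatic caching places the breakpoint on the last block, which here is the placeholder. The placeholder user message can be any string with non-whitespace content (the examples here use "warmup"); its content is read into the model but never answered.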

Note

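A pre-warm request incurs a cache write charge if the prefix is not already cached, the same as any other request. Check usage.cache_creation_input_tokens in the response to confirm that a write occurred. No output tokens are billed.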

curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-opus-4-7",
    "max_tokens": 0,
    "system": [
      {
        "type": "text",
        "text": "You are an expert software engineer with deep knowledge of distributed systems...",
        "cache_control": {"type": "ephemeral"}
      }
    ],
    "messages": [{"role": "user", "content": "warmup"}]
  }'

ant messages create \
  --transform '{stop_reason,content,usage}' --format yaml <<'YAML'
model: claude-opus-4-7
max_tokens: 0
system:
  - type: text
    text: >-
      You are an expert software engineer with deep knowledge of
      distributed systems...
    cache_control:
      type: ephemeral
messages:
  - role: user
    content: warmup
YAML

import anthropic

client = anthropic.Anthropic()

# Fire this before users arrive to warm the shared system-prompt cache.
prewarm = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=0,
    system=[
        {
            "type": "text",
            "text": "You are an expert software engineer with deep knowledge of distributed systems...",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "warmup"}],
)
print(prewarm.stop_reason)  # "max_tokens"
print(prewarm.content)  # []
print(prewarm.usage)

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Fire this before users arrive to warm the shared system-prompt cache.
const prewarm = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 0,
  system: [
    {
      type: "text",
      text: "You are an expert software engineer with deep knowledge of distributed systems...",
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [{ role: "user", content: "warmup" }]
});
console.log(prewarm.stop_reason); // "max_tokens"
console.log(prewarm.content); // []
console.log(prewarm.usage);

using Anthropic;
using Anthropic.Models.Messages;

AnthropicClient client = new();

var prewarm = await client.Messages.Create(
    new()
    {
        Model = Model.ClaudeOpus4_7,
        MaxTokens = 0,
        System = new(
            [
                new TextBlockParam
                {
                    Text = "You are an expert software engineer with deep knowledge of distributed systems...",
                    CacheControl = new(),
                },
            ]
        ),
        Messages = [new() { Role = Role.User, Content = "warmup" }],
    }
);

Console.WriteLine(prewarm.StopReason?.Raw()); // "max_tokens"
Console.WriteLine(prewarm.Content.Count); // 0
Console.WriteLine(prewarm.Usage);

package main

import (
	"context"
	"fmt"

	"github.com/anthropics/anthropic-sdk-go"
)

func main() {
	client := anthropic.NewClient()

	prewarm, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
		Model:     anthropic.ModelClaudeOpus4_7,
		MaxTokens: 0,
		System: []anthropic.TextBlockParam{
			{
				Text:         "You are an expert software engineer with deep knowledge of distributed systems...",
				CacheControl: anthropic.NewCacheControlEphemeralParam(),
			},
		},
		Messages: []anthropic.MessageParam{
			anthropic.NewUserMessage(anthropic.NewTextBlock("warmup")),
		},
	})
	if err != nil {
		panic(err)
	}

	fmt.Println(prewarm.StopReason) // "max_tokens"
	fmt.Println(prewarm.Content)    // []
	fmt.Println(prewarm.Usage.RawJSON())
}

import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.messages.CacheControlEphemeral;
import com.anthropic.models.messages.Message;
import com.anthropic.models.messages.MessageCreateParams;
import com.anthropic.models.messages.Model;
import com.anthropic.models.messages.TextBlockParam;

void main() {
    AnthropicClient client = AnthropicOkHttpClient.fromEnv();

    Message prewarm = client.messages().create(MessageCreateParams.builder()
            .model(Model.CLAUDE_OPUS_4_7)
            .maxTokens(0)
            .systemOfTextBlockParams(List.of(TextBlockParam.builder()
                    .text("You are an expert software engineer with deep knowledge of distributed systems...")
                    .cacheControl(CacheControlEphemeral.builder().build())
                    .build()))
            .addUserMessage("warmup")
            .build());

    IO.println(prewarm.stopReason()); // Optional[max_tokens]
    IO.println(prewarm.content());    // []
    IO.println(prewarm.usage());
}

<?php

use Anthropic\Client;
use Anthropic\Messages\Model;

$client = new Client();

$prewarm = $client->messages->create(
    model: Model::CLAUDE_OPUS_4_7,
    maxTokens: 0,
    system: [
        [
            'type' => 'text',
            'text' => 'You are an expert software engineer with deep knowledge of distributed systems...',
            'cache_control' => ['type' => 'ephemeral'],
        ],
    ],
    messages: [['role' => 'user', 'content' => 'warmup']],
);

echo $prewarm->stopReason->value, PHP_EOL; // "max_tokens"
echo json_encode($prewarm->content), PHP_EOL; // []
echo json_encode($prewarm->usage), PHP_EOL;

require "anthropic"

client = Anthropic::Client.new

prewarm = client.messages.create(
  model: Anthropic::Model::CLAUDE_OPUS_4_7,
  max_tokens: 0,
  system_: [
    {
      type: "text",
      text: "You are an expert software engineer with deep knowledge of distributed systems...",
      cache_control: {type: "ephemeral"}
    }
  ],
  messages: [{role: "user", content: "warmup"}]
)

puts prewarm.stop_reason # :max_tokens
puts prewarm.content # []
puts prewarm.usage

The API returns an empty content array:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [],
  "model": "claude-opus-4-7-20251101",
  "stop_reason": "max_tokens",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 8,
    "cache_creation_input_tokens": 5120,
    "cache_read_input_tokens": 0,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 5120,
      "ephemeral_1h_input_tokens": 0
    },
    "iterations": [
      {
        "input_tokens": 8,
        "output_tokens": 0,
        "cache_read_input_tokens": 0,
        "cache_creation_input_tokens": 5120,
        "cache_creation": {
          "ephemeral_5m_input_tokens": 5120,
          "ephemeral_1h_input_tokens": 0
        },
        "type": "message"
      }
    ],
    "output_tokens": 0,
    "service_tier": "standard",
    "inference_geo": "global"
  }
}

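Typical usage pattern

Fire the pre-warm request at application startup (or on a scheduled interval), then send real user requests after the pre-warm completes: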

import anthropic

client = anthropic.Anthropic()

SYSTEM_PROMPT = [
    {
        "type": "text",
        "text": "You are an expert software engineer with deep knowledge of distributed systems...",
        "cache_control": {"type": "ephemeral"},
    }
]


def prewarm_cache() -> None:
    """Call this at application startup or on a scheduled interval."""
    client.messages.create(
        model="claude-opus-4-7",
        max_tokens=0,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": "warmup"}],
    )


def respond(user_message: str) -> anthropic.types.Message:
    """The real user request; benefits from a warm cache."""
    return client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        system=SYSTEM_PROMPT,
        messages=[{"role": "user", "content": user_message}],
    )


# Warm the cache before any user traffic arrives.
prewarm_cache()

# Later, when the user submits a message, the system-prompt prefix is already cached.
response = respond("How do I implement a binary search tree?")
print(response.content[0].text)

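Keep in mind that the cache TTL still applies. With the default 5-minute cache, send a new pre-warm request at least every 5 minutes to keep the cache warm. For longer gaps between user requests, use the 1-hour cache duration instead.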
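
The scheduled-interval variant can be sketched as a small background loop. This is a minimal sketch, not part of the SDK: keep_cache_warm is a hypothetical helper, and it takes the pre-warm function as an argument so that any callable (such as prewarm_cache from the example above) can be passed in.

```python
import threading
from typing import Callable


def keep_cache_warm(
    prewarm: Callable[[], None],
    interval_seconds: float,
    stop: threading.Event,
) -> threading.Thread:
    """Call `prewarm` right away, then again every `interval_seconds` until `stop` is set.

    Hypothetical helper for illustration; it is not provided by the SDK.
    """

    def loop() -> None:
        while not stop.is_set():
            try:
                prewarm()
            except Exception as exc:
                # Keep the loop alive if a single warmup fails.
                print(f"prewarm failed: {exc}")
            # wait() returns True as soon as `stop` is set, which ends the loop early.
            if stop.wait(interval_seconds):
                break

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

With the default 5-minute cache, an interval somewhat below the TTL leaves headroom for request latency, for example keep_cache_warm(prewarm_cache, 240, threading.Event()). The 240-second figure is an assumption, not a documented recommendation.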

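Limitations

A max_tokens: 0 request is rejected with invalid_request_error if any of the following is set, because each one implies output that a zero-token budget cannot produce:

stream: true
Extended thinking (thinking.type: "enabled")
Structured outputs (output_config.format)
tool_choice of {"type": "tool", ...} or {"type": "any"}

max_tokens: 0 is also rejected in Message Batches requests. Pre-warming targets time-to-first-token, which does not apply to batch processing, and a cache entry written during a batch may expire before the follow-up request runs.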
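
A client-side check can catch these combinations before the request is sent. The following is a minimal sketch under the assumption that the request parameters are held in a plain dict; prewarm_conflicts is a hypothetical helper, and it only mirrors the four settings listed above (it cannot detect that a request is being submitted through Message Batches).

```python
from typing import Any


def prewarm_conflicts(params: dict[str, Any]) -> list[str]:
    """Return the settings in `params` that conflict with max_tokens: 0.

    Hypothetical helper; the API itself remains the source of truth.
    """
    if params.get("max_tokens") != 0:
        return []  # not a pre-warm request, nothing to check

    conflicts = []
    if params.get("stream") is True:
        conflicts.append("stream")
    if (params.get("thinking") or {}).get("type") == "enabled":
        conflicts.append("thinking")
    if (params.get("output_config") or {}).get("format") is not None:
        conflicts.append("output_config.format")
    if (params.get("tool_choice") or {}).get("type") in ("tool", "any"):
        conflicts.append("tool_choice")
    return conflicts
```

An empty list means none of the listed settings is present; a non-empty list names the fields to remove before sending the pre-warm request.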

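Replacing the max_tokens=1 workaround

Before max_tokens: 0 was available, some applications used a max_tokens: 1 pre-warm call to achieve the same effect. The max_tokens: 0 approach is preferred: it produces no output, so there is no single-token reply to discard, no output tokens are billed, and the intent of the request is explicit.

Prompt caching examples

To help you get started with prompt caching, the prompt caching cookbook provides detailed examples and best practices.

The following code snippets showcase various prompt caching patterns. They demonstrate how to implement caching in different scenarios, helping you understand the practical applications of this feature:

Large context caching example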

curl https://api.anthropic.com/v1/messages \
     --header "x-api-key: $ANTHROPIC_API_KEY" \
     --header "anthropic-version: 2023-06-01" \
     --header "content-type: application/json" \
     --data \
'{
    "model": "claude-opus-4-7",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are an AI assistant tasked with analyzing legal documents."
        },
        {
            "type": "text",
            "text": "Here is the full text of a complex legal agreement: [Insert full text of a 50-page legal agreement here]",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": "What are the key terms and conditions in this agreement?"
        }
    ]
}'

ant messages create <<'YAML'
model: claude-opus-4-7
max_tokens: 1024
system:
  - type: text
    text: You are an AI assistant tasked with analyzing legal documents.
  - type: text
    text: >-
      Here is the full text of a complex legal agreement:
      [Insert full text of a 50-page legal agreement here]
    cache_control:
      type: ephemeral
messages:
  - role: user
    content: What are the key terms and conditions in this agreement?
YAML

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an AI assistant tasked with analyzing legal documents.",
        },
        {
            "type": "text",
            "text": "Here is the full text of a complex legal agreement: [Insert full text of a 50-page legal agreement here]",
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[
        {
            "role": "user",
            "content": "What are the key terms and conditions in this agreement?",
        }
    ],
)
print(response.usage.model_dump_json())

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are an AI assistant tasked with analyzing legal documents."
    },
    {
      type: "text",
      text: "Here is the full text of a complex legal agreement: [Insert full text of a 50-page legal agreement here]",
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [
    {
      role: "user",
      content: "What are the key terms and conditions in this agreement?"
    }
  ]
});
console.log(response);

using Anthropic;
using Anthropic.Models.Messages;

AnthropicClient client = new()
{
    ApiKey = Environment.GetEnvironmentVariable("ANTHROPIC_API_KEY")
};

var parameters = new MessageCreateParams
{
    Model = Model.ClaudeOpus4_7,
    MaxTokens = 1024,
    System = new MessageCreateParamsSystem(new List<TextBlockParam>
    {
        new TextBlockParam()
        {
            Text = "You are an AI assistant tasked with analyzing legal documents.",
        },
        new TextBlockParam()
        {
            Text = "Here is the full text of a complex legal agreement: [Insert full text of a 50-page legal agreement here]",
            CacheControl = new CacheControlEphemeral(),
        },
    }),
    Messages =
    [
        new()
        {
            Role = Role.User,
            Content = "What are the key terms and conditions in this agreement?"
        }
    ]
};

var message = await client.Messages.Create(parameters);
Console.WriteLine(message);

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/anthropics/anthropic-sdk-go"
)

func main() {
	client := anthropic.NewClient()

	response, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
		Model:     anthropic.ModelClaudeOpus4_7,
		MaxTokens: 1024,
		System: []anthropic.TextBlockParam{
			{
				Text: "You are an AI assistant tasked with analyzing legal documents.",
			},
			{
				Text:         "Here is the full text of a complex legal agreement: [Insert full text of a 50-page legal agreement here]",
				CacheControl: anthropic.NewCacheControlEphemeralParam(),
			},
		},
		Messages: []anthropic.MessageParam{
			anthropic.NewUserMessage(anthropic.NewTextBlock("What are the key terms and conditions in this agreement?")),
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(response.Usage)
}

import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.messages.CacheControlEphemeral;
import com.anthropic.models.messages.Message;
import com.anthropic.models.messages.MessageCreateParams;
import com.anthropic.models.messages.Model;
import com.anthropic.models.messages.TextBlockParam;
import java.util.List;

public class LegalDocumentAnalysisExample {

  public static void main(String[] args) {
    AnthropicClient client = AnthropicOkHttpClient.fromEnv();

    MessageCreateParams params = MessageCreateParams.builder()
      .model(Model.CLAUDE_OPUS_4_7)
      .maxTokens(1024)
      .systemOfTextBlockParams(
        List.of(
          TextBlockParam.builder()
            .text("You are an AI assistant tasked with analyzing legal documents.")
            .build(),
          TextBlockParam.builder()
            .text(
              "Here is the full text of a complex legal agreement: [Insert full text of a 50-page legal agreement here]"
            )
            .cacheControl(CacheControlEphemeral.builder().build())
            .build()
        )
      )
      .addUserMessage("What are the key terms and conditions in this agreement?")
      .build();

    Message message = client.messages().create(params);
    System.out.println(message);
  }
}

<?php

use Anthropic\Client;

$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));

$message = $client->messages->create(
    maxTokens: 1024,
    messages: [
        [
            'role' => 'user',
            'content' => 'What are the key terms and conditions in this agreement?'
        ]
    ],
    model: 'claude-opus-4-7',
    system: [
        [
            'type' => 'text',
            'text' => 'You are an AI assistant tasked with analyzing legal documents.'
        ],
        [
            'type' => 'text',
            'text' => 'Here is the full text of a complex legal agreement: [Insert full text of a 50-page legal agreement here]',
            'cache_control' => ['type' => 'ephemeral']
        ]
    ],
);

echo $message->content[0]->text;

require "anthropic"

client = Anthropic::Client.new

message = client.messages.create(
  model: "claude-opus-4-7",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are an AI assistant tasked with analyzing legal documents."
    },
    {
      type: "text",
      text: "Here is the full text of a complex legal agreement: [Insert full text of a 50-page legal agreement here]",
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [
    {
      role: "user",
      content: "What are the key terms and conditions in this agreement?"
    }
  ]
)
puts message

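This example demonstrates basic prompt caching usage: the full text of the legal agreement is cached as a prefix while the user instruction stays uncached.

For the first request:

input_tokens: the number of tokens in the user message only
cache_creation_input_tokens: the number of tokens in the entire system message, including the legal document
cache_read_input_tokens: 0 (no cache hit on the first request)

For subsequent requests within the cache lifetime:

input_tokens: the number of tokens in the user message only
cache_creation_input_tokens: 0 (no new cache creation)
cache_read_input_tokens: the number of tokens in the entire cached system message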
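
The three usage fields can be combined to check whether a request hit the cache. The sketch below assumes the usage object is available as a plain dict (with the Python SDK, response.usage.model_dump() would be one way to get it) and that the total prompt size is the sum of the three fields; cache_summary is a hypothetical helper and the numbers are illustrative only.

```python
def cache_summary(usage: dict) -> dict:
    """Summarize how much of the prompt was read from the cache.

    Hypothetical helper; assumes total input = read + written + uncached tokens.
    """
    read = usage.get("cache_read_input_tokens") or 0
    written = usage.get("cache_creation_input_tokens") or 0
    uncached = usage.get("input_tokens") or 0
    total = read + written + uncached
    return {
        "total_input_tokens": total,
        "cache_hit": read > 0,
        "read_fraction": read / total if total else 0.0,
    }


# Illustrative numbers only: the first request writes the cache, a later one reads it.
first = cache_summary(
    {"input_tokens": 8, "cache_creation_input_tokens": 5120, "cache_read_input_tokens": 0}
)
later = cache_summary(
    {"input_tokens": 8, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 5120}
)
print(first["cache_hit"], later["cache_hit"])
```

A read_fraction close to 1.0 on follow-up requests indicates that nearly the whole prompt was served from the cache.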

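Caching tool definitions

Cache tool definitions by placing cache_control on the last tool in the tools array. Every tool up to and including that one is cached as a single prefix.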

{
  "model": "claude-opus-4-7",
  "max_tokens": 1024,
  "tools": [
    {
      "name": "get_weather",
      "description": "Get the current weather in a given location",
      "input_schema": {
        "type": "object",
        "properties": { "location": { "type": "string" } },
        "required": ["location"]
      }
    },
    {
      "name": "get_time",
      "description": "Get the current time in a given time zone",
      "input_schema": {
        "type": "object",
        "properties": { "timezone": { "type": "string" } },
        "required": ["timezone"]
      },
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "messages": [{ "role": "user", "content": "What is the weather and time in New York?" }]
}

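On the first request, cache_creation_input_tokens reflects the token count of all the tool definitions. On subsequent requests within the cache lifetime, those tokens appear under cache_read_input_tokens instead.

For details on how tool definitions, defer_loading, and cache invalidation interact, see Tool use with prompt caching.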
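
When the tools array is assembled dynamically, the marker can be applied programmatically. This is a minimal sketch for the single-breakpoint case shown above; with_cached_tools is a hypothetical helper, and it assumes tools are plain dicts in the request format.

```python
import copy


def with_cached_tools(tools: list[dict]) -> list[dict]:
    """Return a copy of `tools` with cache_control set on the last tool only.

    Hypothetical helper; the input list is left unmodified.
    """
    marked = copy.deepcopy(tools)
    for tool in marked:
        tool.pop("cache_control", None)  # drop markers left on earlier tools
    if marked:
        marked[-1]["cache_control"] = {"type": "ephemeral"}
    return marked
```

Because the cached prefix covers the tools in order, keeping the order of the array stable between requests is what allows later requests to hit the same cache entry.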

Continuing a multi-turn conversation

curl https://api.anthropic.com/v1/messages \
     --header "x-api-key: $ANTHROPIC_API_KEY" \
     --header "anthropic-version: 2023-06-01" \
     --header "content-type: application/json" \
     --data \
'{
    "model": "claude-opus-4-7",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "...long system prompt",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Hello, can you tell me more about the solar system?"
                }
            ]
        },
        {
            "role": "assistant",
            "content": "Certainly! The solar system is the collection of celestial bodies that orbit our Sun. It consists of eight planets, numerous moons, asteroids, comets, and other objects. The planets, in order from closest to farthest from the Sun, are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. Each planet has its own unique characteristics and features. Is there a specific aspect of the solar system you would like to know more about?"
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Good to know."
                },
                {
                    "type": "text",
                    "text": "Tell me more about Mars.",
                    "cache_control": {"type": "ephemeral"}
                }
            ]
        }
    ]
}'

ant messages create <<'YAML'
model: claude-opus-4-7
max_tokens: 1024
system:
  - type: text
    text: "...long system prompt"
    cache_control:
      type: ephemeral
messages:
  - role: user
    content:
      - type: text
        text: Hello, can you tell me more about the solar system?
  - role: assistant
    content: >-
      Certainly! The solar system is the collection of celestial bodies that
      orbit our Sun. It consists of eight planets, numerous moons, asteroids,
      comets, and other objects. The planets, in order from closest to farthest
      from the Sun, are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus,
      and Neptune. Each planet has its own unique characteristics and features.
      Is there a specific aspect of the solar system you would like to know
      more about?
  - role: user
    content:
      - type: text
        text: Good to know.
      - type: text
        text: Tell me more about Mars.
        cache_control:
          type: ephemeral
YAML

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "...long system prompt",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[
        # ...long conversation so far
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Hello, can you tell me more about the solar system?",
                }
            ],
        },
        {
            "role": "assistant",
            "content": "Certainly! The solar system is the collection of celestial bodies that orbit our Sun. It consists of eight planets, numerous moons, asteroids, comets, and other objects. The planets, in order from closest to farthest from the Sun, are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. Each planet has its own unique characteristics and features. Is there a specific aspect of the solar system you'd like to know more about?",
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Good to know."},
                {
                    "type": "text",
                    "text": "Tell me more about Mars.",
                    "cache_control": {"type": "ephemeral"},
                },
            ],
        },
    ],
)
print(response.model_dump_json())

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "...long system prompt",
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [
    // ...long conversation so far
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Hello, can you tell me more about the solar system?"
        }
      ]
    },
    {
      role: "assistant",
      content:
        "Certainly! The solar system is the collection of celestial bodies that orbit our Sun. It consists of eight planets, numerous moons, asteroids, comets, and other objects. The planets, in order from closest to farthest from the Sun, are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. Each planet has its own unique characteristics and features. Is there a specific aspect of the solar system you'd like to know more about?"
    },
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Good to know."
        },
        {
          type: "text",
          text: "Tell me more about Mars.",
          cache_control: { type: "ephemeral" }
        }
      ]
    }
  ]
});
console.log(response);

using Anthropic;
using Anthropic.Models.Messages;
using System.Collections.Generic;

AnthropicClient client = new();

var parameters = new MessageCreateParams
{
    Model = Model.ClaudeOpus4_7,
    MaxTokens = 1024,
    System = new MessageCreateParamsSystem(new List<TextBlockParam>
    {
        new TextBlockParam()
        {
            Text = "...long system prompt",
            CacheControl = new CacheControlEphemeral(),
        },
    }),
    Messages =
    [
        new()
        {
            Role = Role.User,
            Content = new MessageParamContent(new List<ContentBlockParam>
            {
                new ContentBlockParam(new TextBlockParam("Hello, can you tell me more about the solar system?")),
            }),
        },
        new()
        {
            Role = Role.Assistant,
            Content = "Certainly! The solar system is the collection of celestial bodies that orbit our Sun. It consists of eight planets, numerous moons, asteroids, comets, and other objects. The planets, in order from closest to farthest from the Sun, are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. Each planet has its own unique characteristics and features. Is there a specific aspect of the solar system you would like to know more about?"
        },
        new()
        {
            Role = Role.User,
            Content = new MessageParamContent(new List<ContentBlockParam>
            {
                new ContentBlockParam(new TextBlockParam("Good to know.")),
                new ContentBlockParam(new TextBlockParam()
                {
                    Text = "Tell me more about Mars.",
                    CacheControl = new CacheControlEphemeral(),
                }),
            })
        }
    ]
};

var message = await client.Messages.Create(parameters);
Console.WriteLine(message);

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/anthropics/anthropic-sdk-go"
)

func main() {
	client := anthropic.NewClient()

	response, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
		Model:     anthropic.ModelClaudeOpus4_7,
		MaxTokens: 1024,
		System: []anthropic.TextBlockParam{
			{
				Text:         "...long system prompt",
				CacheControl: anthropic.NewCacheControlEphemeralParam(),
			},
		},
		Messages: []anthropic.MessageParam{
			anthropic.NewUserMessage(anthropic.NewTextBlock("Hello, can you tell me more about the solar system?")),
			anthropic.NewAssistantMessage(anthropic.NewTextBlock("Certainly! The solar system is the collection of celestial bodies that orbit our Sun. It consists of eight planets, numerous moons, asteroids, comets, and other objects. The planets, in order from closest to farthest from the Sun, are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. Each planet has its own unique characteristics and features. Is there a specific aspect of the solar system you would like to know more about?")),
			{
				Role: anthropic.MessageParamRoleUser,
				Content: []anthropic.ContentBlockParamUnion{
					anthropic.NewTextBlock("Good to know."),
					{OfText: &anthropic.TextBlockParam{
						Text:         "Tell me more about Mars.",
						CacheControl: anthropic.NewCacheControlEphemeralParam(),
					}},
				},
			},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(response)
}

import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.messages.CacheControlEphemeral;
import com.anthropic.models.messages.ContentBlockParam;
import com.anthropic.models.messages.Message;
import com.anthropic.models.messages.MessageCreateParams;
import com.anthropic.models.messages.Model;
import com.anthropic.models.messages.TextBlockParam;
import java.util.List;

public class ConversationWithCacheControlExample {

  public static void main(String[] args) {
    AnthropicClient client = AnthropicOkHttpClient.fromEnv();

    // Create ephemeral system prompt
    TextBlockParam systemPrompt = TextBlockParam.builder()
      .text("...long system prompt")
      .cacheControl(CacheControlEphemeral.builder().build())
      .build();

    // Create message params
    MessageCreateParams params = MessageCreateParams.builder()
      .model(Model.CLAUDE_OPUS_4_7)
      .maxTokens(1024)
      .systemOfTextBlockParams(List.of(systemPrompt))
      // First user message (without cache control)
      .addUserMessage("Hello, can you tell me more about the solar system?")
      // Assistant response
      .addAssistantMessage(
        "Certainly! The solar system is the collection of celestial bodies that orbit our Sun. It consists of eight planets, numerous moons, asteroids, comets, and other objects. The planets, in order from closest to farthest from the Sun, are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. Each planet has its own unique characteristics and features. Is there a specific aspect of the solar system you would like to know more about?"
      )
      // Second user message (with cache control)
      .addUserMessageOfBlockParams(
        List.of(
          ContentBlockParam.ofText(TextBlockParam.builder().text("Good to know.").build()),
          ContentBlockParam.ofText(
            TextBlockParam.builder()
              .text("Tell me more about Mars.")
              .cacheControl(CacheControlEphemeral.builder().build())
              .build()
          )
        )
      )
      .build();

    Message message = client.messages().create(params);
    System.out.println(message);
  }
}

<?php

use Anthropic\Client;

$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));

$message = $client->messages->create(
    maxTokens: 1024,
    messages: [
        [
            'role' => 'user',
            'content' => [
                [
                    'type' => 'text',
                    'text' => 'Hello, can you tell me more about the solar system?'
                ]
            ]
        ],
        [
            'role' => 'assistant',
            'content' => "Certainly! The solar system is the collection of celestial bodies that orbit our Sun. It consists of eight planets, numerous moons, asteroids, comets, and other objects. The planets, in order from closest to farthest from the Sun, are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. Each planet has its own unique characteristics and features. Is there a specific aspect of the solar system you would like to know more about?"
        ],
        [
            'role' => 'user',
            'content' => [
                ['type' => 'text', 'text' => 'Good to know.'],
                [
                    'type' => 'text',
                    'text' => 'Tell me more about Mars.',
                    'cache_control' => ['type' => 'ephemeral']
                ]
            ]
        ]
    ],
    model: 'claude-opus-4-7',
    system: [
        [
            'type' => 'text',
            'text' => '...long system prompt',
            'cache_control' => ['type' => 'ephemeral']
        ]
    ],
);

echo $message->content[0]->text;

require "anthropic"

client = Anthropic::Client.new

message = client.messages.create(
  model: "claude-opus-4-7",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "...long system prompt",
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Hello, can you tell me more about the solar system?"
        }
      ]
    },
    {
      role: "assistant",
      content: "Certainly! The solar system is the collection of celestial bodies that orbit our Sun. It consists of eight planets, numerous moons, asteroids, comets, and other objects. The planets, in order from closest to farthest from the Sun, are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. Each planet has its own unique characteristics and features. Is there a specific aspect of the solar system you would like to know more about?"
    },
    {
      role: "user",
      content: [
        { type: "text", text: "Good to know." },
        {
          type: "text",
          text: "Tell me more about Mars.",
          cache_control: { type: "ephemeral" }
        }
      ]
    }
  ]
)
puts message

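This example demonstrates how to use prompt caching in a multi-turn conversation.

In each turn, the last block of the final message is marked with cache_control so the conversation can be cached incrementally. The system automatically looks up and uses the longest previously cached sequence of blocks for follow-up messages. That is, blocks that carried a cache_control marker in an earlier request no longer carry one in later requests, but they still count as a cache hit (and a cache refresh) if they are hit within 5 minutes.

Also note that the cache_control parameter is placed on the system message. This ensures that if the system message is evicted from the cache (after going unused for more than 5 minutes), it is added back to the cache on the next request.

This approach helps maintain context in ongoing conversations without repeatedly processing the same information.

When this is set up correctly, you should see the following in the usage field of each response:

input_tokens: the number of tokens in the new user message (will be minimal)
cache_creation_input_tokens: the number of tokens in the new assistant and user turns
cache_read_input_tokens: the number of tokens in the conversation up to the previous turn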
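
Moving the marker to the newest block on every turn can be done with a small helper. The sketch below is one possible approach, not an SDK feature: mark_last_block is a hypothetical function, it assumes messages are plain dicts in the request format, and it assumes the final block is of a type that accepts cache_control (such as a text block).

```python
import copy


def mark_last_block(messages: list[dict]) -> list[dict]:
    """Return a copy of `messages` whose only cache_control marker is on the final block.

    Hypothetical helper; the input list is left unmodified.
    """
    marked = copy.deepcopy(messages)
    for message in marked:
        if isinstance(message["content"], str):
            continue  # plain-string content carries no cache_control
        for block in message["content"]:
            block.pop("cache_control", None)  # clear markers from earlier turns
    if not marked:
        return marked
    last = marked[-1]
    if isinstance(last["content"], str):
        # Expand the string into a text block so it can carry the marker.
        last["content"] = [{"type": "text", "text": last["content"]}]
    if last["content"]:
        last["content"][-1]["cache_control"] = {"type": "ephemeral"}
    return marked
```

Calling mark_last_block(history) just before each request keeps a single message-level breakpoint at the end of the conversation, while the separate marker on the system prompt stays in place.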

Putting it all together: multiple cache breakpoints

curl https://api.anthropic.com/v1/messages \
     --header "x-api-key: $ANTHROPIC_API_KEY" \
     --header "anthropic-version: 2023-06-01" \
     --header "content-type: application/json" \
     --data \
'{
    "model": "claude-opus-4-7",
    "max_tokens": 1024,
    "tools": [
        {
            "name": "search_documents",
            "description": "Search through the knowledge base",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query"
                    }
                },
                "required": ["query"]
            }
        },
        {
            "name": "get_document",
            "description": "Retrieve a specific document by ID",
            "input_schema": {
                "type": "object",
                "properties": {
                    "doc_id": {
                        "type": "string",
                        "description": "Document ID"
                    }
                },
                "required": ["doc_id"]
            },
            "cache_control": {"type": "ephemeral"}
        }
    ],
    "system": [
        {
            "type": "text",
            "text": "You are a helpful research assistant with access to a document knowledge base.\n\n# Instructions\n- Always search for relevant documents before answering\n- Provide citations for your sources\n- Be objective and accurate in your responses\n- If multiple documents contain relevant information, synthesize them\n- Acknowledge when information is not available in the knowledge base",
            "cache_control": {"type": "ephemeral"}
        },
        {
            "type": "text",
            "text": "# Knowledge Base Context\n\nHere are the relevant documents for this conversation:\n\n## Document 1: Solar System Overview\nThe solar system consists of the Sun and all objects that orbit it...\n\n## Document 2: Planetary Characteristics\nEach planet has unique features. Mercury is the smallest planet...\n\n## Document 3: Mars Exploration\nMars has been a target of exploration for decades...\n\n[Additional documents...]",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": "Can you search for information about Mars rovers?"
        },
        {
            "role": "assistant",
            "content": [
                {
                    "type": "tool_use",
                    "id": "tool_1",
                    "name": "search_documents",
                    "input": {"query": "Mars rovers"}
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": "tool_1",
                    "content": "Found 3 relevant documents: Document 3 (Mars Exploration), Document 7 (Rover Technology), Document 9 (Mission History)"
                }
            ]
        },
        {
            "role": "assistant",
            "content": [
                {
                    "type": "text",
                    "text": "I found 3 relevant documents about Mars rovers. Let me get more details from the Mars Exploration document."
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Yes, please tell me about the Perseverance rover specifically.",
                    "cache_control": {"type": "ephemeral"}
                }
            ]
        }
    ]
}'

ant messages create <<'YAML'
model: claude-opus-4-7
max_tokens: 1024
tools:
  - name: search_documents
    description: Search through the knowledge base
    input_schema:
      type: object
      properties:
        query:
          type: string
          description: Search query
      required: [query]
  - name: get_document
    description: Retrieve a specific document by ID
    input_schema:
      type: object
      properties:
        doc_id:
          type: string
          description: Document ID
      required: [doc_id]
    cache_control:
      type: ephemeral
system:
  - type: text
    text: |-
      You are a helpful research assistant with access to a document knowledge base.

      # Instructions
      - Always search for relevant documents before answering
      - Provide citations for your sources
      - Be objective and accurate in your responses
      - If multiple documents contain relevant information, synthesize them
      - Acknowledge when information is not available in the knowledge base
    cache_control:
      type: ephemeral
  - type: text
    text: |-
      # Knowledge Base Context

      Here are the relevant documents for this conversation:

      ## Document 1: Solar System Overview
      The solar system consists of the Sun and all objects that orbit it...

      ## Document 2: Planetary Characteristics
      Each planet has unique features. Mercury is the smallest planet...

      ## Document 3: Mars Exploration
      Mars has been a target of exploration for decades...

      [Additional documents...]
    cache_control:
      type: ephemeral
messages:
  - role: user
    content: Can you search for information about Mars rovers?
  - role: assistant
    content:
      - type: tool_use
        id: tool_1
        name: search_documents
        input:
          query: Mars rovers
  - role: user
    content:
      - type: tool_result
        tool_use_id: tool_1
        content: >-
          Found 3 relevant documents: Document 3 (Mars Exploration),
          Document 7 (Rover Technology), Document 9 (Mission History)
  - role: assistant
    content:
      - type: text
        text: >-
          I found 3 relevant documents about Mars rovers. Let me get more
          details from the Mars Exploration document.
  - role: user
    content:
      - type: text
        text: Yes, please tell me about the Perseverance rover specifically.
        cache_control:
          type: ephemeral
YAML

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    tools=[
        {
            "name": "search_documents",
            "description": "Search through the knowledge base",
            "input_schema": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"],
            },
        },
        {
            "name": "get_document",
            "description": "Retrieve a specific document by ID",
            "input_schema": {
                "type": "object",
                "properties": {
                    "doc_id": {"type": "string", "description": "Document ID"}
                },
                "required": ["doc_id"],
            },
            "cache_control": {"type": "ephemeral"},
        },
    ],
    system=[
        {
            "type": "text",
            "text": "You are a helpful research assistant with access to a document knowledge base.\n\n# Instructions\n- Always search for relevant documents before answering\n- Provide citations for your sources\n- Be objective and accurate in your responses\n- If multiple documents contain relevant information, synthesize them\n- Acknowledge when information is not available in the knowledge base",
            "cache_control": {"type": "ephemeral"},
        },
        {
            "type": "text",
            "text": "# Knowledge Base Context\n\nHere are the relevant documents for this conversation:\n\n## Document 1: Solar System Overview\nThe solar system consists of the Sun and all objects that orbit it...\n\n## Document 2: Planetary Characteristics\nEach planet has unique features. Mercury is the smallest planet...\n\n## Document 3: Mars Exploration\nMars has been a target of exploration for decades...\n\n[Additional documents...]",
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[
        {
            "role": "user",
            "content": "Can you search for information about Mars rovers?",
        },
        {
            "role": "assistant",
            "content": [
                {
                    "type": "tool_use",
                    "id": "tool_1",
                    "name": "search_documents",
                    "input": {"query": "Mars rovers"},
                }
            ],
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": "tool_1",
                    "content": "Found 3 relevant documents: Document 3 (Mars Exploration), Document 7 (Rover Technology), Document 9 (Mission History)",
                }
            ],
        },
        {
            "role": "assistant",
            "content": [
                {
                    "type": "text",
                    "text": "I found 3 relevant documents about Mars rovers. Let me get more details from the Mars Exploration document.",
                }
            ],
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Yes, please tell me about the Perseverance rover specifically.",
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
    ],
)
print(response.model_dump_json())

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 1024,
  tools: [
    {
      name: "search_documents",
      description: "Search through the knowledge base",
      input_schema: {
        type: "object",
        properties: {
          query: {
            type: "string",
            description: "Search query"
          }
        },
        required: ["query"]
      }
    },
    {
      name: "get_document",
      description: "Retrieve a specific document by ID",
      input_schema: {
        type: "object",
        properties: {
          doc_id: {
            type: "string",
            description: "Document ID"
          }
        },
        required: ["doc_id"]
      },
      cache_control: { type: "ephemeral" }
    }
  ],
  system: [
    {
      type: "text",
      text: "You are a helpful research assistant with access to a document knowledge base.\n\n# Instructions\n- Always search for relevant documents before answering\n- Provide citations for your sources\n- Be objective and accurate in your responses\n- If multiple documents contain relevant information, synthesize them\n- Acknowledge when information is not available in the knowledge base",
      cache_control: { type: "ephemeral" }
    },
    {
      type: "text",
      text: "# Knowledge Base Context\n\nHere are the relevant documents for this conversation:\n\n## Document 1: Solar System Overview\nThe solar system consists of the Sun and all objects that orbit it...\n\n## Document 2: Planetary Characteristics\nEach planet has unique features. Mercury is the smallest planet...\n\n## Document 3: Mars Exploration\nMars has been a target of exploration for decades...\n\n[Additional documents...]",
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [
    {
      role: "user",
      content: "Can you search for information about Mars rovers?"
    },
    {
      role: "assistant",
      content: [
        {
          type: "tool_use",
          id: "tool_1",
          name: "search_documents",
          input: { query: "Mars rovers" }
        }
      ]
    },
    {
      role: "user",
      content: [
        {
          type: "tool_result",
          tool_use_id: "tool_1",
          content:
            "Found 3 relevant documents: Document 3 (Mars Exploration), Document 7 (Rover Technology), Document 9 (Mission History)"
        }
      ]
    },
    {
      role: "assistant",
      content: [
        {
          type: "text",
          text: "I found 3 relevant documents about Mars rovers. Let me get more details from the Mars Exploration document."
        }
      ]
    },
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Yes, please tell me about the Perseverance rover specifically.",
          cache_control: { type: "ephemeral" }
        }
      ]
    }
  ]
});
console.log(response);

using System.Text.Json;
using Anthropic;
using Anthropic.Models.Messages;

AnthropicClient client = new()
{
    ApiKey = Environment.GetEnvironmentVariable("ANTHROPIC_API_KEY")
};

var parameters = new MessageCreateParams
{
    Model = Model.ClaudeOpus4_7,
    MaxTokens = 1024,
    Tools =
    [
        new ToolUnion(new Tool()
        {
            Name = "search_documents",
            Description = "Search through the knowledge base",
            InputSchema = new InputSchema()
            {
                Properties = new Dictionary<string, JsonElement>
                {
                    ["query"] = JsonSerializer.SerializeToElement(new { type = "string", description = "Search query" }),
                },
                Required = ["query"],
            },
        }),
        new ToolUnion(new Tool()
        {
            Name = "get_document",
            Description = "Retrieve a specific document by ID",
            InputSchema = new InputSchema()
            {
                Properties = new Dictionary<string, JsonElement>
                {
                    ["doc_id"] = JsonSerializer.SerializeToElement(new { type = "string", description = "Document ID" }),
                },
                Required = ["doc_id"],
            },
            CacheControl = new CacheControlEphemeral(),
        }),
    ],
    System = new MessageCreateParamsSystem(new List<TextBlockParam>
    {
        new TextBlockParam()
        {
            Text = "You are a helpful research assistant with access to a document knowledge base.\n\n# Instructions\n- Always search for relevant documents before answering\n- Provide citations for your sources\n- Be objective and accurate in your responses\n- If multiple documents contain relevant information, synthesize them\n- Acknowledge when information is not available in the knowledge base",
            CacheControl = new CacheControlEphemeral(),
        },
        new TextBlockParam()
        {
            Text = "# Knowledge Base Context\n\nHere are the relevant documents for this conversation:\n\n## Document 1: Solar System Overview\nThe solar system consists of the Sun and all objects that orbit it...\n\n## Document 2: Planetary Characteristics\nEach planet has unique features. Mercury is the smallest planet...\n\n## Document 3: Mars Exploration\nMars has been a target of exploration for decades...\n\n[Additional documents...]",
            CacheControl = new CacheControlEphemeral(),
        },
    }),
    Messages =
    [
        new() { Role = Role.User, Content = "Can you search for information about Mars rovers?" },
        new()
        {
            Role = Role.Assistant,
            Content = new MessageParamContent(new List<ContentBlockParam>
            {
                new ContentBlockParam(new ToolUseBlockParam()
                {
                    ID = "tool_1",
                    Name = "search_documents",
                    Input = new Dictionary<string, JsonElement>
                    {
                        ["query"] = JsonSerializer.SerializeToElement("Mars rovers"),
                    },
                }),
            }),
        },
        new()
        {
            Role = Role.User,
            Content = new MessageParamContent(new List<ContentBlockParam>
            {
                new ContentBlockParam(new ToolResultBlockParam()
                {
                    ToolUseID = "tool_1",
                    Content = "Found 3 relevant documents: Document 3 (Mars Exploration), Document 7 (Rover Technology), Document 9 (Mission History)",
                }),
            }),
        },
        new()
        {
            Role = Role.Assistant,
            Content = "I found 3 relevant documents about Mars rovers. Let me get more details from the Mars Exploration document.",
        },
        new()
        {
            Role = Role.User,
            Content = new MessageParamContent(new List<ContentBlockParam>
            {
                new ContentBlockParam(new TextBlockParam()
                {
                    Text = "Yes, please tell me about the Perseverance rover specifically.",
                    CacheControl = new CacheControlEphemeral(),
                }),
            }),
        },
    ]
};

var message = await client.Messages.Create(parameters);
Console.WriteLine(message);

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/anthropics/anthropic-sdk-go"
)

func main() {
	client := anthropic.NewClient()

	response, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
		Model:     anthropic.ModelClaudeOpus4_7,
		MaxTokens: 1024,
		Tools: []anthropic.ToolUnionParam{
			{OfTool: &anthropic.ToolParam{
				Name:        "search_documents",
				Description: anthropic.String("Search through the knowledge base"),
				InputSchema: anthropic.ToolInputSchemaParam{
					Properties: map[string]any{
						"query": map[string]any{
							"type":        "string",
							"description": "Search query",
						},
					},
					Required: []string{"query"},
				},
			}},
			{OfTool: &anthropic.ToolParam{
				Name:        "get_document",
				Description: anthropic.String("Retrieve a specific document by ID"),
				InputSchema: anthropic.ToolInputSchemaParam{
					Properties: map[string]any{
						"doc_id": map[string]any{
							"type":        "string",
							"description": "Document ID",
						},
					},
					Required: []string{"doc_id"},
				},
				CacheControl: anthropic.NewCacheControlEphemeralParam(),
			}},
		},
		System: []anthropic.TextBlockParam{
			{
				Text:         "You are a helpful research assistant with access to a document knowledge base.\n\n# Instructions\n- Always search for relevant documents before answering\n- Provide citations for your sources\n- Be objective and accurate in your responses\n- If multiple documents contain relevant information, synthesize them\n- Acknowledge when information is not available in the knowledge base",
				CacheControl: anthropic.NewCacheControlEphemeralParam(),
			},
			{
				Text:         "# Knowledge Base Context\n\nHere are the relevant documents for this conversation:\n\n## Document 1: Solar System Overview\nThe solar system consists of the Sun and all objects that orbit it...\n\n## Document 2: Planetary Characteristics\nEach planet has unique features. Mercury is the smallest planet...\n\n## Document 3: Mars Exploration\nMars has been a target of exploration for decades...\n\n[Additional documents...]",
				CacheControl: anthropic.NewCacheControlEphemeralParam(),
			},
		},
		Messages: []anthropic.MessageParam{
			anthropic.NewUserMessage(anthropic.NewTextBlock("Can you search for information about Mars rovers?")),
			anthropic.NewAssistantMessage(anthropic.NewToolUseBlock(
				"tool_1",
				map[string]any{"query": "Mars rovers"},
				"search_documents",
			)),
			anthropic.NewUserMessage(anthropic.NewToolResultBlock(
				"tool_1",
				"Found 3 relevant documents: Document 3 (Mars Exploration), Document 7 (Rover Technology), Document 9 (Mission History)",
				false,
			)),
			anthropic.NewAssistantMessage(anthropic.NewTextBlock("I found 3 relevant documents about Mars rovers. Let me get more details from the Mars Exploration document.")),
			{
				Role: anthropic.MessageParamRoleUser,
				Content: []anthropic.ContentBlockParamUnion{
					{OfText: &anthropic.TextBlockParam{
						Text:         "Yes, please tell me about the Perseverance rover specifically.",
						CacheControl: anthropic.NewCacheControlEphemeralParam(),
					}},
				},
			},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(response)
}

import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.core.JsonValue;
import com.anthropic.models.messages.CacheControlEphemeral;
import com.anthropic.models.messages.ContentBlockParam;
import com.anthropic.models.messages.Message;
import com.anthropic.models.messages.MessageCreateParams;
import com.anthropic.models.messages.Model;
import com.anthropic.models.messages.TextBlockParam;
import com.anthropic.models.messages.Tool;
import com.anthropic.models.messages.Tool.InputSchema;
import com.anthropic.models.messages.ToolResultBlockParam;
import com.anthropic.models.messages.ToolUseBlockParam;
import java.util.List;
import java.util.Map;

public class MultipleCacheBreakpointsExample {

  public static void main(String[] args) {
    AnthropicClient client = AnthropicOkHttpClient.fromEnv();

    // Search tool schema
    InputSchema searchSchema = InputSchema.builder()
      .properties(
        JsonValue.from(
          Map.of("query", Map.of("type", "string", "description", "Search query"))
        )
      )
      .putAdditionalProperty("required", JsonValue.from(List.of("query")))
      .build();

    // Get document tool schema
    InputSchema getDocSchema = InputSchema.builder()
      .properties(
        JsonValue.from(
          Map.of("doc_id", Map.of("type", "string", "description", "Document ID"))
        )
      )
      .putAdditionalProperty("required", JsonValue.from(List.of("doc_id")))
      .build();

    MessageCreateParams params = MessageCreateParams.builder()
      .model(Model.CLAUDE_OPUS_4_7)
      .maxTokens(1024)
      // Tools with cache control on the last one
      .addTool(
        Tool.builder()
          .name("search_documents")
          .description("Search through the knowledge base")
          .inputSchema(searchSchema)
          .build()
      )
      .addTool(
        Tool.builder()
          .name("get_document")
          .description("Retrieve a specific document by ID")
          .inputSchema(getDocSchema)
          .cacheControl(CacheControlEphemeral.builder().build())
          .build()
      )
      // System prompts with cache control on instructions and context separately
      .systemOfTextBlockParams(
        List.of(
          TextBlockParam.builder()
            .text(
              "You are a helpful research assistant with access to a document knowledge base.\n\n# Instructions\n- Always search for relevant documents before answering\n- Provide citations for your sources\n- Be objective and accurate in your responses\n- If multiple documents contain relevant information, synthesize them\n- Acknowledge when information is not available in the knowledge base"
            )
            .cacheControl(CacheControlEphemeral.builder().build())
            .build(),
          TextBlockParam.builder()
            .text(
              "# Knowledge Base Context\n\nHere are the relevant documents for this conversation:\n\n## Document 1: Solar System Overview\nThe solar system consists of the Sun and all objects that orbit it...\n\n## Document 2: Planetary Characteristics\nEach planet has unique features. Mercury is the smallest planet...\n\n## Document 3: Mars Exploration\nMars has been a target of exploration for decades...\n\n[Additional documents...]"
            )
            .cacheControl(CacheControlEphemeral.builder().build())
            .build()
        )
      )
      // Conversation history
      .addUserMessage("Can you search for information about Mars rovers?")
      .addAssistantMessageOfBlockParams(
        List.of(
          ContentBlockParam.ofToolUse(
            ToolUseBlockParam.builder()
              .id("tool_1")
              .name("search_documents")
              .input(JsonValue.from(Map.of("query", "Mars rovers")))
              .build()
          )
        )
      )
      .addUserMessageOfBlockParams(
        List.of(
          ContentBlockParam.ofToolResult(
            ToolResultBlockParam.builder()
              .toolUseId("tool_1")
              .content(
                "Found 3 relevant documents: Document 3 (Mars Exploration), Document 7 (Rover Technology), Document 9 (Mission History)"
              )
              .build()
          )
        )
      )
      .addAssistantMessageOfBlockParams(
        List.of(
          ContentBlockParam.ofText(
            TextBlockParam.builder()
              .text(
                "I found 3 relevant documents about Mars rovers. Let me get more details from the Mars Exploration document."
              )
              .build()
          )
        )
      )
      .addUserMessageOfBlockParams(
        List.of(
          ContentBlockParam.ofText(
            TextBlockParam.builder()
              .text("Yes, please tell me about the Perseverance rover specifically.")
              .cacheControl(CacheControlEphemeral.builder().build())
              .build()
          )
        )
      )
      .build();

    Message message = client.messages().create(params);
    System.out.println(message);
  }
}

<?php

use Anthropic\Client;

$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));

$message = $client->messages->create(
    maxTokens: 1024,
    messages: [
        [
            'role' => 'user',
            'content' => 'Can you search for information about Mars rovers?'
        ],
        [
            'role' => 'assistant',
            'content' => [
                [
                    'type' => 'tool_use',
                    'id' => 'tool_1',
                    'name' => 'search_documents',
                    'input' => ['query' => 'Mars rovers']
                ]
            ]
        ],
        [
            'role' => 'user',
            'content' => [
                [
                    'type' => 'tool_result',
                    'tool_use_id' => 'tool_1',
                    'content' => 'Found 3 relevant documents: Document 3 (Mars Exploration), Document 7 (Rover Technology), Document 9 (Mission History)'
                ]
            ]
        ],
        [
            'role' => 'assistant',
            'content' => [
                [
                    'type' => 'text',
                    'text' => 'I found 3 relevant documents about Mars rovers. Let me get more details from the Mars Exploration document.'
                ]
            ]
        ],
        [
            'role' => 'user',
            'content' => [
                [
                    'type' => 'text',
                    'text' => 'Yes, please tell me about the Perseverance rover specifically.',
                    'cache_control' => ['type' => 'ephemeral']
                ]
            ]
        ]
    ],
    model: 'claude-opus-4-7',
    system: [
        [
            'type' => 'text',
            'text' => "You are a helpful research assistant with access to a document knowledge base.\n\n# Instructions\n- Always search for relevant documents before answering\n- Provide citations for your sources\n- Be objective and accurate in your responses\n- If multiple documents contain relevant information, synthesize them\n- Acknowledge when information is not available in the knowledge base",
            'cache_control' => ['type' => 'ephemeral']
        ],
        [
            'type' => 'text',
            'text' => "# Knowledge Base Context\n\nHere are the relevant documents for this conversation:\n\n## Document 1: Solar System Overview\nThe solar system consists of the Sun and all objects that orbit it...\n\n## Document 2: Planetary Characteristics\nEach planet has unique features. Mercury is the smallest planet...\n\n## Document 3: Mars Exploration\nMars has been a target of exploration for decades...\n\n[Additional documents...]",
            'cache_control' => ['type' => 'ephemeral']
        ]
    ],
    tools: [
        [
            'name' => 'search_documents',
            'description' => 'Search through the knowledge base',
            'input_schema' => [
                'type' => 'object',
                'properties' => [
                    'query' => [
                        'type' => 'string',
                        'description' => 'Search query'
                    ]
                ],
                'required' => ['query']
            ]
        ],
        [
            'name' => 'get_document',
            'description' => 'Retrieve a specific document by ID',
            'input_schema' => [
                'type' => 'object',
                'properties' => [
                    'doc_id' => [
                        'type' => 'string',
                        'description' => 'Document ID'
                    ]
                ],
                'required' => ['doc_id']
            ],
            'cache_control' => ['type' => 'ephemeral']
        ]
    ],
);

echo json_encode($message->usage), PHP_EOL;

require "anthropic"

client = Anthropic::Client.new

message = client.messages.create(
  model: "claude-opus-4-7",
  max_tokens: 1024,
  tools: [
    {
      name: "search_documents",
      description: "Search through the knowledge base",
      input_schema: {
        type: "object",
        properties: {
          query: {
            type: "string",
            description: "Search query"
          }
        },
        required: ["query"]
      }
    },
    {
      name: "get_document",
      description: "Retrieve a specific document by ID",
      input_schema: {
        type: "object",
        properties: {
          doc_id: {
            type: "string",
            description: "Document ID"
          }
        },
        required: ["doc_id"]
      },
      cache_control: { type: "ephemeral" }
    }
  ],
  system: [
    {
      type: "text",
      text: "You are a helpful research assistant with access to a document knowledge base.\n\n# Instructions\n- Always search for relevant documents before answering\n- Provide citations for your sources\n- Be objective and accurate in your responses\n- If multiple documents contain relevant information, synthesize them\n- Acknowledge when information is not available in the knowledge base",
      cache_control: { type: "ephemeral" }
    },
    {
      type: "text",
      text: "# Knowledge Base Context\n\nHere are the relevant documents for this conversation:\n\n## Document 1: Solar System Overview\nThe solar system consists of the Sun and all objects that orbit it...\n\n## Document 2: Planetary Characteristics\nEach planet has unique features. Mercury is the smallest planet...\n\n## Document 3: Mars Exploration\nMars has been a target of exploration for decades...\n\n[Additional documents...]",
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [
    {
      role: "user",
      content: "Can you search for information about Mars rovers?"
    },
    {
      role: "assistant",
      content: [
        {
          type: "tool_use",
          id: "tool_1",
          name: "search_documents",
          input: { query: "Mars rovers" }
        }
      ]
    },
    {
      role: "user",
      content: [
        {
          type: "tool_result",
          tool_use_id: "tool_1",
          content: "Found 3 relevant documents: Document 3 (Mars Exploration), Document 7 (Rover Technology), Document 9 (Mission History)"
        }
      ]
    },
    {
      role: "assistant",
      content: [
        {
          type: "text",
          text: "I found 3 relevant documents about Mars rovers. Let me get more details from the Mars Exploration document."
        }
      ]
    },
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Yes, please tell me about the Perseverance rover specifically.",
          cache_control: { type: "ephemeral" }
        }
      ]
    }
  ]
)
puts message

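This comprehensive example demonstrates how to use all 4 available cache breakpoints to optimize different parts of your prompt:

Tools cache (cache breakpoint 1): The cache_control parameter on the last tool definition caches all tool definitions.
Reusable instructions cache (cache breakpoint 2): The static instructions in the system prompt are cached separately. These instructions rarely change between requests.
RAG context cache (cache breakpoint 3): The knowledge base documents are cached independently, so you can update the RAG documents without invalidating the tools or instructions cache.
Conversation history cache (cache breakpoint 4): The last user message is marked with cache_control to enable incremental caching of the conversation.

This approach provides maximum flexibility:

If you append new turns to the conversation without changing earlier content, all four cache segments are reused
If you update the RAG documents but keep the same tools and instructions, the first two cache segments are reused
If you change the conversation but keep the same tools, instructions, and documents, the first three segments are reused
A change at any breakpoint invalidates that segment and everything after it, while earlier cached segments remain valid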
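
The invalidation rule in the last bullet can be modeled as a longest-common-prefix comparison between two requests. The sketch below is only an illustration of that rule, not a description of how the service is implemented. The `reused_segments` helper and the segment labels are hypothetical.

```python
from typing import List


def reused_segments(previous: List[str], current: List[str]) -> int:
    """Count leading segments whose content is unchanged between two requests.

    Models the rule that a change at one breakpoint invalidates that segment
    and everything after it, while earlier segments stay valid.
    """
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        count += 1
    return count


# Hypothetical segment contents in prompt order:
# tools, instructions, RAG documents, conversation.
first = ["tools-v1", "instructions-v1", "docs-v1", "conversation-v1"]
docs_updated = ["tools-v1", "instructions-v1", "docs-v2", "conversation-v1"]
conversation_changed = ["tools-v1", "instructions-v1", "docs-v1", "conversation-v2"]

print(reused_segments(first, docs_updated))          # 2
print(reused_segments(first, conversation_changed))  # 3
```

Note that the conversation segment after the updated documents is not reused even though its text is identical, because it sits behind a changed segment.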

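For the first request:

input_tokens: Very few (the tokens after the last cache breakpoint, close to 0 in this example)
cache_creation_input_tokens: The tokens in all cached segments (tools + instructions + RAG documents + conversation history)
cache_read_input_tokens: 0 (no cache hits)

For subsequent requests with only a new user message (and the fourth breakpoint moved to that new last message, as in the example):

input_tokens: Very few (the tokens after the last cache breakpoint, close to 0 in this example)
cache_creation_input_tokens: The tokens in the new user message and the preceding assistant turn (the new conversation segment being cached)
cache_read_input_tokens: All previously cached tokens (tools + instructions + RAG documents + previous conversation)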

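This pattern is especially useful for:

RAG applications with large document contexts
Agent systems that use multiple tools
Long-running conversations that need to maintain context
Applications that need to optimize different parts of the prompt independently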

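Data retention

Prompt caching (both automatic and explicit) is ZDR eligible. Anthropic does not store the raw text of prompts or Claude's responses.

KV (key-value) cache representations and cryptographic hashes of cached content are held in memory only and are not stored at rest. Cache entries have a minimum lifetime of 5 minutes (standard) or 1 hour (extended), after which they are deleted promptly, though not immediately. Cache entries are isolated between organizations and, on the Claude API, Claude Platform on AWS, and Microsoft Foundry (beta), also between workspaces within an organization.

For ZDR eligibility across all features, see API and data retention.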

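FAQ

Do I need multiple cache breakpoints, or is one at the end sufficient?

In most cases, a single cache breakpoint at the end of your static content is sufficient. Cache writes happen only at the block you mark. Place the breakpoint on the last block that stays the same across requests, and every subsequent request reads that same entry. If a later block changes on every request (a timestamp, the incoming message), place the breakpoint on the last stable block before it.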
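
As a minimal sketch of that placement advice, the helper below builds a system prompt where the stable instructions carry the breakpoint and a per-request timestamp block follows it without one. The `build_system_blocks` name and the timestamp wording are hypothetical. The block shape follows the request examples earlier in this guide.

```python
from datetime import datetime, timezone
from typing import Dict, List


def build_system_blocks(instructions: str, now: datetime) -> List[Dict]:
    """Return system blocks with the breakpoint on the last stable block."""
    return [
        {
            "type": "text",
            "text": instructions,
            # Stable across requests, so this is where the cache entry is written.
            "cache_control": {"type": "ephemeral"},
        },
        {
            "type": "text",
            # Changes on every request, so it stays after the breakpoint.
            "text": f"Current time: {now.isoformat()}",
        },
    ]


instructions = "You are a helpful research assistant."
morning = build_system_blocks(instructions, datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc))
evening = build_system_blocks(instructions, datetime(2025, 1, 1, 21, 0, tzinfo=timezone.utc))

# The cached prefix is identical in both requests; only the trailing block differs.
print(morning[0] == evening[0])
print(morning[1] == evening[1])
```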

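You only need multiple breakpoints if:

A growing conversation pushes your breakpoint 20 or more blocks past the previous cache write, which puts the earlier entry outside the lookback window
You want to independently cache sections that update at different frequencies
You need explicit control over what gets cached for cost optimization

Example: If you have system instructions (which rarely change) and RAG context (which changes daily), you might use two breakpoints to cache them separately.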

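Do cache breakpoints add extra cost?

No, cache breakpoints themselves are free. You only pay for:

Writing content to the cache (25% more than base input tokens for the 5-minute TTL)
Reading from the cache (10% of the base input token price)
Regular input tokens for uncached content

The number of breakpoints does not affect pricing. Only the amount of content cached and read matters.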

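How do I calculate total input tokens from the usage fields?

The usage object in the response includes three separate input token fields that together represent your total input:

total_input_tokens = cache_read_input_tokens + cache_creation_input_tokens + input_tokens

cache_read_input_tokens: Tokens retrieved from the cache (everything before the cache breakpoints that was already cached)
cache_creation_input_tokens: New tokens being written to the cache (at the cache breakpoints)
input_tokens: Uncached tokens after the last cache breakpoint

Important: input_tokens does not represent all input tokens, only the portion after the last cache breakpoint. If you have cached content, input_tokens is usually much smaller than your total input.

Example: With a cached 200k-token document and a 50-token user question:

cache_read_input_tokens: 200,000
cache_creation_input_tokens: 0
input_tokens: 50
Total: 200,050 tokens

This breakdown is critical for understanding both your costs and your rate limit usage. See Tracking cache performance for details.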
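
The formula above can be sketched as a small helper. It assumes the usage object has been converted to a plain dict (for example from the JSON response body). The `total_input_tokens` function name is hypothetical, and missing or null fields are treated as 0.

```python
def total_input_tokens(usage: dict) -> int:
    """Sum the three input token fields reported in a response's usage object."""
    return (
        (usage.get("cache_read_input_tokens") or 0)
        + (usage.get("cache_creation_input_tokens") or 0)
        + (usage.get("input_tokens") or 0)
    )


# The worked example from the text: a cached 200k-token document plus a 50-token question.
usage = {
    "cache_read_input_tokens": 200_000,
    "cache_creation_input_tokens": 0,
    "input_tokens": 50,
}
print(total_input_tokens(usage))  # 200050
```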

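What is the cache lifetime?

The cache's default minimum lifetime (TTL) is 5 minutes. This lifetime is refreshed each time the cached content is used.

If you find that 5 minutes is too short, Anthropic also offers a 1-hour cache TTL.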

How many cache breakpoints can I use?

You can define up to 4 cache breakpoints (using cache_control parameters) in your prompt.
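
To check a request against this limit before sending it, you could count the explicit breakpoints client-side. The sketch below is a simplified, hypothetical helper. It only looks at tool definitions, system blocks, and top-level message content blocks in a request dict, and it ignores a top-level cache_control field and any nested content.

```python
MAX_CACHE_BREAKPOINTS = 4


def count_cache_breakpoints(request: dict) -> int:
    """Count blocks that carry an explicit cache_control marker."""
    count = sum(1 for tool in request.get("tools", []) if "cache_control" in tool)

    system = request.get("system", [])
    if isinstance(system, list):  # a plain string system prompt has no blocks
        count += sum(1 for block in system if "cache_control" in block)

    for message in request.get("messages", []):
        content = message.get("content")
        if isinstance(content, list):
            count += sum(
                1 for block in content
                if isinstance(block, dict) and "cache_control" in block
            )
    return count


marker = {"type": "ephemeral"}
request = {
    "tools": [{"name": "search_documents"}, {"name": "get_document", "cache_control": marker}],
    "system": [
        {"type": "text", "text": "Instructions", "cache_control": marker},
        {"type": "text", "text": "Knowledge base", "cache_control": marker},
    ],
    "messages": [
        {"role": "user", "content": "First question"},
        {"role": "user", "content": [{"type": "text", "text": "Follow-up", "cache_control": marker}]},
    ],
}

used = count_cache_breakpoints(request)
print(used, used <= MAX_CACHE_BREAKPOINTS)  # 4 True
```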

Is prompt caching available for all models?

Prompt caching is supported on all active Claude models.

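How does prompt caching work with extended thinking?

Cached system prompts and tools are reused when thinking parameters change. However, thinking changes (enabling or disabling thinking, or changing the budget) invalidate previously cached prompt prefixes that include message content.

For more details on cache invalidation, see What invalidates the cache.

For more on extended thinking, including its interaction with tool use and prompt caching, see the extended thinking documentation.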

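How do I enable prompt caching?

The easiest way is to add "cache_control": {"type": "ephemeral"} at the top level of your request body (automatic caching). Alternatively, include at least one cache_control breakpoint on an individual content block (explicit cache breakpoints).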

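Can I use prompt caching with other API features?

Yes, prompt caching can be used alongside other API features such as tool use and vision capabilities. However, changing whether a prompt contains images or modifying tool use settings breaks the cache.

For more details on cache invalidation, see What invalidates the cache.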

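How does prompt caching affect pricing?

Prompt caching introduces a new pricing structure: 5-minute cache writes cost 25% more than base input tokens, 1-hour cache writes cost 2x base input tokens, and cache hits cost only 10% of the base input token price.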
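
As a rough sketch of how these multipliers combine, the helper below estimates the input-side cost of one request from its usage fields. The `estimate_input_cost` function and the base price are hypothetical placeholders, not real rates. As a simplification, all cache writes in a request are assumed to use a single TTL.

```python
CACHE_READ_MULTIPLIER = 0.10
CACHE_WRITE_MULTIPLIERS = {"5m": 1.25, "1h": 2.0}


def estimate_input_cost(usage: dict, base_price_per_mtok: float, ttl: str = "5m") -> float:
    """Estimate input cost for one request, in the currency of the base price."""
    weighted_tokens = (
        (usage.get("input_tokens") or 0)
        + (usage.get("cache_creation_input_tokens") or 0) * CACHE_WRITE_MULTIPLIERS[ttl]
        + (usage.get("cache_read_input_tokens") or 0) * CACHE_READ_MULTIPLIER
    )
    return weighted_tokens * base_price_per_mtok / 1_000_000


# Placeholder price per million base input tokens; substitute your model's rate.
HYPOTHETICAL_BASE_PRICE = 5.0

first_request = {"input_tokens": 50, "cache_creation_input_tokens": 200_000, "cache_read_input_tokens": 0}
later_request = {"input_tokens": 50, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 200_000}

print(f"first request: {estimate_input_cost(first_request, HYPOTHETICAL_BASE_PRICE):.5f}")
print(f"later request: {estimate_input_cost(later_request, HYPOTHETICAL_BASE_PRICE):.5f}")
```

The first request pays the write premium once. Each later request that hits the cache pays roughly a tenth of the uncached price for the same prefix.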

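Can I manually clear the cache?

Currently, there is no way to manually clear the cache. Cached prefixes expire automatically after a minimum of 5 minutes of inactivity.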

How can I track the effectiveness of my caching strategy?

You can monitor cache performance using the cache_creation_input_tokens and cache_read_input_tokens fields in the API response.
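
One way to turn those fields into a single number is the share of input tokens served from the cache. This metric and the `cache_read_ratio` helper are illustrative choices, not something the API reports. The token counts below are made up for the example.

```python
def cache_read_ratio(usage: dict) -> float:
    """Return the fraction of a request's input tokens that were read from cache."""
    read = usage.get("cache_read_input_tokens") or 0
    written = usage.get("cache_creation_input_tokens") or 0
    uncached = usage.get("input_tokens") or 0
    total = read + written + uncached
    return read / total if total else 0.0


# Illustrative usage values for a first request and a follow-up turn.
first_turn = {"cache_read_input_tokens": 0, "cache_creation_input_tokens": 1800, "input_tokens": 4}
next_turn = {"cache_read_input_tokens": 1800, "cache_creation_input_tokens": 60, "input_tokens": 4}

print(f"{cache_read_ratio(first_turn):.2%}")
print(f"{cache_read_ratio(next_turn):.2%}")
```

A ratio that stays near zero across turns suggests the breakpoint sits on content that changes between requests.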

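What can break the cache?

For more details on cache invalidation, including a list of changes that require creating a new cache entry, see What invalidates the cache.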

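How does prompt caching handle privacy and data separation?

Prompt caching is designed with strong privacy and data separation measures:

Cache keys are generated using a cryptographic hash of the prompt up to the cache control point. This means only requests with identical prompts can access a specific cache.
On the Claude API, Claude Platform on AWS, and Microsoft Foundry (beta), caches are isolated per workspace within an organization. On Bedrock and Vertex AI, caches are isolated per organization. In all cases, caches are never shared across organizations, even for identical prompts. See Cache storage and sharing for details.
The caching mechanism is designed to maintain the integrity and privacy of each unique conversation or context.
It is safe to use cache_control anywhere in your prompts. For caching to produce reads, place the breakpoint at the end of a stable prefix: placing it on a block that changes on every request (such as a timestamp or a user's arbitrary input) writes a new entry each time and never hits.

These measures ensure that prompt caching maintains data privacy and security while providing performance benefits.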

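Can I use prompt caching with the Batches API?

Yes, you can use prompt caching with Batches API requests. However, because asynchronous batch requests can be processed concurrently and in any order, cache hits are provided on a best-effort basis.

The 1-hour cache can help improve your cache hit rate. The most cost-effective way to use it is as follows:

Gather a set of message requests that share a prefix.
Send a batch request containing a single request with this shared prefix and a 1-hour cache block. This writes the prefix to the 1-hour cache.
As soon as it completes, submit the remaining requests. You need to monitor the job to know when it completes.

This is usually better than using the 5-minute cache, because batch requests commonly take between 5 minutes and 1 hour to complete.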
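
The steps above can be sketched as a planning helper that splits the work into a one-request warm-up batch and a follow-up batch. This is a minimal sketch under several assumptions. The `plan_batches` helper is hypothetical. The `custom_id` and `params` entry shape is assumed to match the Message Batches API. The `"ttl": "1h"` field is assumed to be how the 1-hour cache is selected, which this excerpt does not show. Submitting the batches and polling for completion are left out.

```python
from typing import Dict, List, Tuple


def plan_batches(
    shared_prefix: str, questions: List[str], model: str
) -> Tuple[List[Dict], List[Dict]]:
    """Build batch entries and split them into (warm-up, remaining)."""

    def make_entry(index: int, question: str) -> Dict:
        return {
            "custom_id": f"request-{index}",
            "params": {
                "model": model,
                "max_tokens": 1024,
                "system": [
                    {
                        "type": "text",
                        "text": shared_prefix,
                        # Assumed syntax for selecting the 1-hour cache.
                        "cache_control": {"type": "ephemeral", "ttl": "1h"},
                    }
                ],
                "messages": [{"role": "user", "content": question}],
            },
        }

    entries = [make_entry(i, q) for i, q in enumerate(questions)]
    # The first entry writes the shared prefix to the cache; the rest are
    # submitted only after that warm-up batch has finished.
    return entries[:1], entries[1:]


warm_up, remaining = plan_batches(
    "You are an expert on this large document...",
    ["Summarize the key points", "List the open questions", "Draft a short abstract"],
    "claude-opus-4-7",
)
print(len(warm_up), len(remaining))  # 1 2
```

Every entry carries the same system block, so the remaining requests present the same prefix that the warm-up request cached.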

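Why am I seeing the error AttributeError: 'Beta' object has no attribute 'prompt_caching' in Python?

This error typically appears when you have upgraded your SDK or are using outdated code examples. Prompt caching no longer requires the beta prefix. Instead of: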

client.beta.prompt_caching.messages.create(**params)

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.beta.promptCaching.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are an expert on this large document...",
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [{ role: "user", content: "Summarize the key points" }]
});

console.log(response);

<?php

use Anthropic\Client;

$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));

$message = $client->beta->promptCaching->messages->create(
    maxTokens: 1024,
    messages: [
        ['role' => 'user', 'content' => 'Summarize the key points']
    ],
    model: 'claude-opus-4-7',
    system: [
        [
            'type' => 'text',
            'text' => 'You are an expert on this large document...',
            'cache_control' => ['type' => 'ephemeral']
        ]
    ],
);

echo $message->content[0]->text;

require "anthropic"

client = Anthropic::Client.new

message = client.beta.prompt_caching.messages.create(
  model: "claude-opus-4-7",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are an expert on this large document...",
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [
    { role: "user", content: "Summarize the key points" }
  ]
)
puts message.content.first.text

Use:

client.messages.create(**params)

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are an expert on this large document...",
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [{ role: "user", content: "Summarize the key points" }]
});

console.log(response);

<?php

use Anthropic\Client;

$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));

$message = $client->messages->create(
    maxTokens: 1024,
    messages: [
        ['role' => 'user', 'content' => 'Summarize the key points']
    ],
    model: 'claude-opus-4-7',
    system: [
        [
            'type' => 'text',
            'text' => 'You are an expert on this large document...',
            'cache_control' => ['type' => 'ephemeral']
        ]
    ],
);

echo $message->content[0]->text;

require "anthropic"

client = Anthropic::Client.new

message = client.messages.create(
  model: "claude-opus-4-7",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are an expert on this large document...",
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [
    { role: "user", content: "Summarize the key points" }
  ]
)
puts message.content.first.text

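Why am I seeing 'TypeError: Cannot read properties of undefined (reading 'messages')'?

This error typically appears when you have upgraded your SDK or are using outdated code examples. Prompt caching no longer requires the beta prefix. Instead of:

client.beta.promptCaching.messages.create(/* ... */);

Simply use: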

client.messages.create(/* ... */);