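Fast mode (beta: research preview)
Higher output speed for Claude Opus 4.6 and Claude Opus 4.7, with significantly faster token generation for latency-sensitive and agentic workflows.
Fast mode provides significantly faster output token generation for Claude Opus 4.6 and Claude Opus 4.7. By setting speed: "fast" in your API request, you get up to 2.5x higher output tokens per second from the same model, at a higher price.
Fast mode is in beta (research preview). Join the waitlist to request access. Availability is limited while Anthropic gathers feedback.
This feature is eligible for Zero Data Retention (ZDR). When your organization has a ZDR arrangement, data sent through this feature is not stored after the API response is returned.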
Supported models
Fast mode is supported on the following models:
- Claude Opus 4.7 (claude-opus-4-7)
- Claude Opus 4.6 (claude-opus-4-6)
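How fast mode works
Fast mode runs the same model with a faster inference configuration. There is no change to intelligence or capabilities.
- Up to 2.5x higher output tokens per second compared to standard speed
- Speed benefits are focused on output tokens per second (OTPS), not time to first token (TTFT)
- Same model weights and behavior (not a different model)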
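Because the speedup applies to output throughput rather than TTFT, the end-to-end gain depends on how much of a request is spent generating tokens. The sketch below is a back-of-the-envelope estimate only. The function name and the sample TTFT and OTPS figures are hypothetical, and the 2.5x factor is the documented upper bound, not a guaranteed value.

```python
def estimate_latency_seconds(ttft_s: float, output_tokens: int, otps: float) -> float:
    """Approximate end-to-end latency as TTFT plus generation time."""
    if otps <= 0:
        raise ValueError("otps must be positive")
    return ttft_s + output_tokens / otps


# Hypothetical numbers for illustration only; measure your own workload.
TTFT_S = 2.0
STANDARD_OTPS = 40.0
FAST_OTPS = STANDARD_OTPS * 2.5  # documented upper bound, not a guarantee

for output_tokens in (100, 4000):
    standard = estimate_latency_seconds(TTFT_S, output_tokens, STANDARD_OTPS)
    fast = estimate_latency_seconds(TTFT_S, output_tokens, FAST_OTPS)
    print(f"{output_tokens} tokens: {standard:.1f}s -> {fast:.1f}s ({standard / fast:.2f}x)")
```

Under these assumed numbers, a 100-token response improves by about 1.5x end to end, while a 4,000-token response gets close to 2.4x, which is why long generations benefit most.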
Basic usage
curl https://api.anthropic.com/v1/messages \
--header "x-api-key: $ANTHROPIC_API_KEY" \
--header "anthropic-version: 2023-06-01" \
--header "anthropic-beta: fast-mode-2026-02-01" \
--header "content-type: application/json" \
--data '{
"model": "claude-opus-4-7",
"max_tokens": 4096,
"speed": "fast",
"messages": [{
"role": "user",
"content": "Refactor this module to use dependency injection"
}]
}'
ant beta:messages create \
--beta fast-mode-2026-02-01 \
--transform 'content.0.text' --raw-output <<'YAML'
model: claude-opus-4-7
max_tokens: 4096
speed: fast
messages:
  - role: user
    content: Refactor this module to use dependency injection
YAML
import anthropic
client = anthropic.Anthropic()
response = client.beta.messages.create(
model="claude-opus-4-7",
max_tokens=4096,
speed="fast",
betas=["fast-mode-2026-02-01"],
messages=[
{"role": "user", "content": "Refactor this module to use dependency injection"}
],
)
print(response.content[0].text)
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const response = await client.beta.messages.create({
model: "claude-opus-4-7",
max_tokens: 4096,
speed: "fast",
betas: ["fast-mode-2026-02-01"],
messages: [
{
role: "user",
content: "Refactor this module to use dependency injection"
}
]
});
const textBlock = response.content.find(
(block): block is Anthropic.Beta.Messages.BetaTextBlock => block.type === "text"
);
console.log(textBlock?.text);
using Anthropic;
using Anthropic.Models.Beta.Messages;
AnthropicClient client = new();
var response = await client.Beta.Messages.Create(new MessageCreateParams
{
Model = "claude-opus-4-7",
MaxTokens = 4096,
Speed = Speed.Fast,
Betas = ["fast-mode-2026-02-01"],
Messages = [
new() { Role = Role.User, Content = "Refactor this module to use dependency injection" }
],
});
Console.WriteLine(response);
package main
import (
"context"
"fmt"
"log"
anthropic "github.com/anthropics/anthropic-sdk-go"
)
func main() {
client := anthropic.NewClient()
response, err := client.Beta.Messages.New(context.TODO(), anthropic.BetaMessageNewParams{
Model: anthropic.ModelClaudeOpus4_7,
MaxTokens: 4096,
Speed: anthropic.BetaMessageNewParamsSpeedFast,
Betas: []anthropic.AnthropicBeta{anthropic.AnthropicBetaFastMode2026_02_01},
Messages: []anthropic.BetaMessageParam{
anthropic.NewBetaUserMessage(anthropic.NewBetaTextBlock("Refactor this module to use dependency injection")),
},
})
if err != nil {
log.Fatal(err)
}
fmt.Println(response.Content[0].AsText().Text)
}
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.beta.AnthropicBeta;
import com.anthropic.models.beta.messages.BetaMessage;
import com.anthropic.models.beta.messages.MessageCreateParams;
import com.anthropic.models.messages.Model;
void main() {
AnthropicClient client = AnthropicOkHttpClient.fromEnv();
BetaMessage response = client.beta().messages().create(
MessageCreateParams.builder()
.model(Model.CLAUDE_OPUS_4_7)
.maxTokens(4096L)
.speed(MessageCreateParams.Speed.FAST)
.addBeta(AnthropicBeta.FAST_MODE_2026_02_01)
.addUserMessage("Refactor this module to use dependency injection")
.build());
IO.println(response.content().get(0).text().get().text());
}
<?php
use Anthropic\Client;
$client = new Client();
$response = $client->beta->messages->create(
model: 'claude-opus-4-7',
maxTokens: 4096,
speed: 'fast',
betas: ['fast-mode-2026-02-01'],
messages: [
['role' => 'user', 'content' => 'Refactor this module to use dependency injection'],
],
);
echo $response->content[0]->text;
require "anthropic"
client = Anthropic::Client.new
response = client.beta.messages.create(
model: "claude-opus-4-7",
max_tokens: 4096,
speed: "fast",
betas: ["fast-mode-2026-02-01"],
messages: [{role: "user", content: "Refactor this module to use dependency injection"}]
)
puts response.content[0].text
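Pricing
Fast mode is priced at 6x standard Opus rates across the full context window, including requests over 200K input tokens. The following table shows fast mode pricing for Claude Opus 4.6 and Claude Opus 4.7:
| Input | Output |
|---|---|
| $30 / MTok | $150 / MTok |
Fast mode pricing stacks with other pricing modifiers. For complete pricing details, see the pricing page.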
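As a worked example of the rates in the table, the following sketch estimates the base fast mode cost of a single request. The function name is hypothetical, and the calculation deliberately ignores every other pricing modifier, so treat the result as a lower-level estimate rather than a bill.

```python
from decimal import Decimal

# Fast mode rates from the table above, in USD per million tokens (MTok).
FAST_INPUT_PER_MTOK = Decimal("30")
FAST_OUTPUT_PER_MTOK = Decimal("150")
MTOK = Decimal(1_000_000)


def fast_mode_base_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Base fast mode cost in USD, before any other pricing modifiers."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        Decimal(input_tokens) * FAST_INPUT_PER_MTOK
        + Decimal(output_tokens) * FAST_OUTPUT_PER_MTOK
    ) / MTOK


# 50,000 input tokens and 4,096 output tokens.
print(fast_mode_base_cost(50_000, 4_096))
```

Decimal is used instead of float so that small per-request amounts add up without rounding drift when you sum many requests.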
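Rate limits
Fast mode has a dedicated rate limit that is separate from standard Opus rate limits. When your fast mode rate limit is exceeded, the API returns a 429 error with a retry-after header indicating when capacity will be available.
Responses include headers that indicate your fast mode rate limit status:
| Header | Description |
|---|---|
| anthropic-fast-input-tokens-limit | Maximum fast mode input tokens per minute |
| anthropic-fast-input-tokens-remaining | Remaining fast mode input tokens |
| anthropic-fast-input-tokens-reset | Time when the fast mode input token limit resets |
| anthropic-fast-output-tokens-limit | Maximum fast mode output tokens per minute |
| anthropic-fast-output-tokens-remaining | Remaining fast mode output tokens |
| anthropic-fast-output-tokens-reset | Time when the fast mode output token limit resets |
For tier-specific rate limits, see the rate limits page.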
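The headers in the table can be gathered into a small structure for logging or throttling decisions. This is a minimal sketch that works on any mapping of response headers. The function name is hypothetical, and it assumes the limit and remaining values are integer strings while leaving the reset value as an unparsed string, since its exact format is not specified here.

```python
from typing import Mapping

HEADER_PREFIX = "anthropic-fast-"


def fast_rate_limit_status(headers: Mapping[str, str]) -> dict:
    """Collect the fast mode rate limit headers into a nested dict."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    status: dict = {}
    for kind in ("input", "output"):
        entry: dict = {}
        for field in ("limit", "remaining", "reset"):
            value = lowered.get(f"{HEADER_PREFIX}{kind}-tokens-{field}")
            if value is None:
                continue  # header absent on this response
            # Assumption: limit/remaining are integers; reset is kept verbatim.
            entry[field] = value if field == "reset" else int(value)
        status[kind] = entry
    return status
```

A caller could, for example, slow down or switch to standard speed when the remaining output tokens drop below a threshold of its own choosing.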
Checking which speed was used
The response usage object includes a speed field that indicates which speed was used, either "fast" or "standard":
curl https://api.anthropic.com/v1/messages \
--header "x-api-key: $ANTHROPIC_API_KEY" \
--header "anthropic-version: 2023-06-01" \
--header "anthropic-beta: fast-mode-2026-02-01" \
--header "content-type: application/json" \
--data '{
"model": "claude-opus-4-7",
"max_tokens": 1024,
"speed": "fast",
"messages": [{"role": "user", "content": "Hello"}]
}'
ant beta:messages create --beta fast-mode-2026-02-01 \
--transform usage.speed --raw-output <<'YAML'
model: claude-opus-4-7
max_tokens: 1024
speed: fast
messages:
  - role: user
    content: Hello
YAML
response = client.beta.messages.create(
model="claude-opus-4-7",
max_tokens=1024,
speed="fast",
betas=["fast-mode-2026-02-01"],
messages=[{"role": "user", "content": "Hello"}],
)
print(response.usage.speed)  # "fast" or "standard"
const response = await client.beta.messages.create({
model: "claude-opus-4-7",
max_tokens: 1024,
speed: "fast",
betas: ["fast-mode-2026-02-01"],
messages: [{ role: "user", content: "Hello" }]
});
console.log(response.usage.speed); // "fast" or "standard"
using Anthropic;
using Anthropic.Models.Beta.Messages;
AnthropicClient client = new();
var response = await client.Beta.Messages.Create(new MessageCreateParams
{
Model = "claude-opus-4-7",
MaxTokens = 1024,
Speed = Speed.Fast,
Betas = ["fast-mode-2026-02-01"],
Messages = [new() { Role = Role.User, Content = "Hello" }],
});
Console.WriteLine(response.Usage.Speed); // "fast" or "standard"
package main
import (
"context"
"fmt"
"log"
anthropic "github.com/anthropics/anthropic-sdk-go"
)
func main() {
client := anthropic.NewClient()
response, err := client.Beta.Messages.New(context.TODO(), anthropic.BetaMessageNewParams{
Model: anthropic.ModelClaudeOpus4_7,
MaxTokens: 1024,
Speed: anthropic.BetaMessageNewParamsSpeedFast,
Betas: []anthropic.AnthropicBeta{anthropic.AnthropicBetaFastMode2026_02_01},
Messages: []anthropic.BetaMessageParam{
anthropic.NewBetaUserMessage(anthropic.NewBetaTextBlock("Hello")),
},
})
if err != nil {
log.Fatal(err)
}
fmt.Println(response.Usage.Speed) // "fast" or "standard"
}
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.beta.AnthropicBeta;
import com.anthropic.models.beta.messages.BetaMessage;
import com.anthropic.models.beta.messages.MessageCreateParams;
import com.anthropic.models.messages.Model;
void main() {
AnthropicClient client = AnthropicOkHttpClient.fromEnv();
MessageCreateParams params = MessageCreateParams.builder()
.model(Model.CLAUDE_OPUS_4_7)
.maxTokens(1024L)
.speed(MessageCreateParams.Speed.FAST)
.addBeta(AnthropicBeta.FAST_MODE_2026_02_01)
.addUserMessage("Hello")
.build();
BetaMessage response = client.beta().messages().create(params);
IO.println(response.usage().speed()); // "fast" or "standard"
}
<?php
use Anthropic\Client;
$client = new Client();
$response = $client->beta->messages->create(
model: 'claude-opus-4-7',
maxTokens: 1024,
speed: 'fast',
betas: ['fast-mode-2026-02-01'],
messages: [['role' => 'user', 'content' => 'Hello']],
);
echo $response->usage->speed; // "fast" or "standard"
response = client.beta.messages.create(
model: "claude-opus-4-7",
max_tokens: 1024,
speed: "fast",
betas: ["fast-mode-2026-02-01"],
messages: [{ role: "user", content: "Hello" }]
)
puts(response.usage.speed) # "fast" or "standard"
{
"id": "msg_01XFDUDYJgAACzvnptvVoYEL",
"type": "message",
"role": "assistant",
"content": [{ "type": "text", "text": "Hello!" }],
"model": "claude-opus-4-7",
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 8,
"output_tokens": 12,
"speed": "fast"
}
}
To track fast mode usage and costs across your organization, see the Usage and Cost API.
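For a quick client-side view, the usage.speed field can also be tallied across responses you have already received. The sketch below works on parsed response bodies shaped like the JSON above. The function name and the "unknown" bucket for responses without a speed field are choices made for this illustration, not part of the API.

```python
from collections import Counter
from typing import Iterable, Mapping


def output_tokens_by_speed(responses: Iterable[Mapping]) -> Counter:
    """Sum output tokens per usage.speed value across parsed responses."""
    totals: Counter = Counter()
    for response in responses:
        usage = response.get("usage", {})
        # "unknown" is a placeholder used here when the field is missing.
        speed = usage.get("speed", "unknown")
        totals[speed] += usage.get("output_tokens", 0)
    return totals


# Hypothetical sample data shaped like the response body above.
responses = [
    {"usage": {"input_tokens": 8, "output_tokens": 12, "speed": "fast"}},
    {"usage": {"input_tokens": 8, "output_tokens": 30, "speed": "standard"}},
    {"usage": {"input_tokens": 8, "output_tokens": 5, "speed": "fast"}},
]
print(dict(output_tokens_by_speed(responses)))
```

A tally like this is useful for spotting how often requests ended up at standard speed after a fallback.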
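Retries and fallback
Automatic retries
When fast mode rate limits are exceeded, the API returns a 429 error with a retry-after header. The Anthropic SDKs automatically retry these requests up to 2 times by default (configurable through max_retries), waiting for the server-specified delay before each retry. Because fast mode uses continuous token replenishment, the retry-after delay is typically short, and requests succeed once capacity is available.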
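If you call the API without an SDK, similar behavior can be approximated by hand. The following sketch is not the SDKs' implementation. RateLimited, send, and call_with_retries are hypothetical names, and it assumes the retry-after value has already been parsed into a delay in seconds.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for a 429 response carrying a retry-after delay."""

    def __init__(self, retry_after_s: float) -> None:
        super().__init__(f"rate limited; retry after {retry_after_s}s")
        self.retry_after_s = retry_after_s


def call_with_retries(
    send: Callable[[], T],
    max_retries: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(), waiting for the server-specified delay between retries."""
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimited as error:
            if attempt == max_retries:
                raise  # retries exhausted; surface the 429 to the caller
            sleep(error.retry_after_s)
    raise AssertionError("unreachable")
```

The sleep parameter is injectable so that the waiting logic can be exercised in tests without real delays. With max_retries set to 0, the first rate limit error propagates immediately.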
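Falling back to standard speed
If you prefer to fall back to standard speed rather than wait for fast mode capacity, catch the rate limit error and retry without speed: "fast". Set max_retries to 0 on the initial fast request to skip automatic retries and fail immediately on rate limit errors.
Falling back from fast mode to standard speed results in a prompt cache miss. Requests at different speeds do not share cached prefixes.
Because setting max_retries to 0 also disables retries for other transient errors (overloaded, internal server error), the examples below reissue the original request with default retries in those cases.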
# `ant` retries 429/5xx automatically and has no per-request max_retries
# override, so the fallback on a fast mode 429 runs after the built-in retries are exhausted.
# --transform-error surfaces error.type for branching.
create_message_with_fast_fallback() {
  local speed="$1" max_attempts="${2:-3}" body out
  body=${3:-$(cat)}
  out=$(
    ant beta:messages create --beta fast-mode-2026-02-01 \
      ${speed:+--speed "$speed"} \
      --transform-error error.type --format-error yaml <<<"$body" 2>/dev/null
  ) && { printf '%s\n' "$out"; return; }
  case "$out" in
    rate_limit_error)
      if [[ -n "$speed" ]]; then
        create_message_with_fast_fallback "" "$max_attempts" "$body"
        return
      fi ;;
    overloaded_error | api_error | "")
      if (( max_attempts > 1 )); then
        create_message_with_fast_fallback "$speed" $((max_attempts - 1)) "$body"
        return
      fi ;;
  esac
  printf '%s\n' "${out:-connection_error}" >&2
  return 1
}
MESSAGE=$(
  create_message_with_fast_fallback fast <<'YAML'
model: claude-opus-4-7
max_tokens: 1024
messages:
  - role: user
    content: Hello
YAML
)
import anthropic

client = anthropic.Anthropic()


def create_message_with_fast_fallback(max_retries=None, max_attempts=3, **params):
    # with_options applies a per-request retry override; None keeps the SDK default.
    request_client = (
        client if max_retries is None else client.with_options(max_retries=max_retries)
    )
    try:
        return request_client.beta.messages.create(**params)
    except anthropic.RateLimitError:
        if params.get("speed") == "fast":
            # Fall back to standard speed, this time with default retries.
            del params["speed"]
            return create_message_with_fast_fallback(**params)
        raise
    except (
        anthropic.APIStatusError,
        anthropic.APIConnectionError,
    ) as error:
        # Retry only transient failures: 5xx responses and connection errors.
        if isinstance(error, anthropic.APIStatusError) and error.status_code < 500:
            raise
        if max_attempts > 1:
            return create_message_with_fast_fallback(
                max_attempts=max_attempts - 1, **params
            )
        raise


message = create_message_with_fast_fallback(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    betas=["fast-mode-2026-02-01"],
    speed="fast",
    max_retries=0,
)
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
(async () => {
async function createMessageWithFastFallback(
params: Anthropic.Beta.MessageCreateParams,
requestOptions?: Anthropic.RequestOptions,
maxAttempts: number = 3
): Promise<Anthropic.Beta.Messages.BetaMessage> {
try {
return (await client.beta.messages.create(
params,
requestOptions
)) as Anthropic.Beta.Messages.BetaMessage;
} catch (e) {
if (e instanceof Anthropic.RateLimitError && params.speed === "fast") {
const { speed, ...rest } = params;
return createMessageWithFastFallback(rest);
}
if (
e instanceof Anthropic.InternalServerError ||
e instanceof Anthropic.APIConnectionError
) {
if (maxAttempts > 1) {
return createMessageWithFastFallback(params, undefined, maxAttempts - 1);
}
}
throw e;
}
}
const message = await createMessageWithFastFallback(
{
model: "claude-opus-4-7",
max_tokens: 1024,
messages: [{ role: "user", content: "Hello" }],
betas: ["fast-mode-2026-02-01"],
speed: "fast"
},
{ maxRetries: 0 }
);
})();
using Anthropic;
using Anthropic.Exceptions;
using Anthropic.Models.Beta.Messages;
AnthropicClient client = new();
async Task<BetaMessage> CreateMessageWithFastFallback(
MessageCreateParams parameters,
int? maxRetries = null,
int maxAttempts = 3)
{
try
{
var requestClient = maxRetries is int retries
? client.WithOptions(options => options with { MaxRetries = retries })
: client;
return await requestClient.Beta.Messages.Create(parameters);
}
catch (AnthropicRateLimitException)
{
if (parameters.Speed is not null)
{
return await CreateMessageWithFastFallback(
parameters with { Speed = null });
}
throw;
}
catch (Anthropic5xxException)
{
if (maxAttempts > 1)
{
return await CreateMessageWithFastFallback(
parameters, maxAttempts: maxAttempts - 1);
}
throw;
}
}
var message = await CreateMessageWithFastFallback(
new MessageCreateParams
{
Model = "claude-opus-4-7",
MaxTokens = 1024,
Messages = [new() { Role = Role.User, Content = "Hello" }],
Betas = ["fast-mode-2026-02-01"],
Speed = Speed.Fast,
},
maxRetries: 0);
package main
import (
"context"
"errors"
"fmt"
anthropic "github.com/anthropics/anthropic-sdk-go"
"github.com/anthropics/anthropic-sdk-go/option"
)
func createMessageWithFastFallback(
ctx context.Context,
client *anthropic.Client,
params anthropic.BetaMessageNewParams,
maxAttempts int,
opts ...option.RequestOption,
) (*anthropic.BetaMessage, error) {
message, err := client.Beta.Messages.New(ctx, params, opts...)
if err != nil {
var apierr *anthropic.Error
if errors.As(err, &apierr) && apierr.StatusCode == 429 && params.Speed != "" {
params.Speed = ""
return createMessageWithFastFallback(ctx, client, params, maxAttempts)
}
if (errors.As(err, &apierr) && apierr.StatusCode >= 500) || !errors.As(err, &apierr) {
if maxAttempts > 1 {
return createMessageWithFastFallback(ctx, client, params, maxAttempts-1)
}
}
return nil, err
}
return message, nil
}
func main() {
client := anthropic.NewClient()
message, err := createMessageWithFastFallback(
context.TODO(),
&client,
anthropic.BetaMessageNewParams{
Model: anthropic.ModelClaudeOpus4_7,
MaxTokens: 1024,
Messages: []anthropic.BetaMessageParam{
anthropic.NewBetaUserMessage(anthropic.NewBetaTextBlock("Hello")),
},
Speed: "fast",
Betas: []anthropic.AnthropicBeta{anthropic.AnthropicBetaFastMode2026_02_01},
},
3,
option.WithMaxRetries(0),
)
if err != nil {
panic(err)
}
fmt.Println(message)
}
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.errors.InternalServerException;
import com.anthropic.errors.RateLimitException;
import com.anthropic.models.beta.AnthropicBeta;
import com.anthropic.models.beta.messages.BetaMessage;
import com.anthropic.models.beta.messages.MessageCreateParams;
import com.anthropic.models.messages.Model;
import java.util.Optional;
// Disable SDK automatic retries so the fallback logic below handles them
AnthropicClient client =
AnthropicOkHttpClient.builder().fromEnv().maxRetries(0).build();
BetaMessage createMessageWithFastFallback(
MessageCreateParams params, int maxAttempts) {
try {
return client.beta().messages().create(params);
} catch (RateLimitException e) {
if (params.speed().isPresent()) {
MessageCreateParams retryParams = params.toBuilder()
.speed(Optional.empty())
.build();
return createMessageWithFastFallback(retryParams, maxAttempts);
}
throw e;
} catch (InternalServerException e) {
if (maxAttempts > 1) {
return createMessageWithFastFallback(params, maxAttempts - 1);
}
throw e;
}
}
void main() {
BetaMessage message = createMessageWithFastFallback(
MessageCreateParams.builder()
.model(Model.CLAUDE_OPUS_4_7)
.maxTokens(1024L)
.addUserMessage("Hello")
.addBeta(AnthropicBeta.FAST_MODE_2026_02_01)
.speed(MessageCreateParams.Speed.FAST)
.build(),
3);
IO.println(message.content().get(0).text().get().text());
}
<?php
use Anthropic\Client;
use Anthropic\Core\Exceptions\APIConnectionException;
use Anthropic\Core\Exceptions\InternalServerException;
use Anthropic\Core\Exceptions\RateLimitException;
use Anthropic\RequestOptions;
$client = new Client();
function createMessageWithFastFallback(
Client $client,
array $params,
?RequestOptions $requestOptions = null,
int $maxAttempts = 3,
) {
try {
return $client->beta->messages->create(
...$params,
requestOptions: $requestOptions,
);
} catch (RateLimitException $e) {
if (isset($params['speed'])) {
unset($params['speed']);
return createMessageWithFastFallback($client, $params);
}
throw $e;
} catch (InternalServerException | APIConnectionException $e) {
if ($maxAttempts > 1) {
return createMessageWithFastFallback(
$client, $params, maxAttempts: $maxAttempts - 1
);
}
throw $e;
}
}
$message = createMessageWithFastFallback(
$client,
[
'model' => 'claude-opus-4-7',
'maxTokens' => 1024,
'messages' => [['role' => 'user', 'content' => 'Hello']],
'betas' => ['fast-mode-2026-02-01'],
'speed' => 'fast',
],
RequestOptions::with(maxRetries: 0),
);
require "anthropic"
anthropic = Anthropic::Client.new
def create_message_with_fast_fallback(client, request_options: {}, max_attempts: 3, **params)
client.beta.messages.create(**params, request_options: request_options)
rescue Anthropic::Errors::RateLimitError
raise unless params[:speed] == "fast"
params.delete(:speed)
create_message_with_fast_fallback(client, **params)
rescue Anthropic::Errors::InternalServerError, Anthropic::Errors::APIConnectionError
raise unless max_attempts > 1
create_message_with_fast_fallback(client, max_attempts: max_attempts - 1, **params)
end
message = create_message_with_fast_fallback(
anthropic,
model: "claude-opus-4-7",
max_tokens: 1024,
messages: [{ role: "user", content: "Hello" }],
betas: ["fast-mode-2026-02-01"],
speed: "fast",
request_options: { max_retries: 0 }
)
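Considerations
- Prompt caching: Switching between fast and standard speed invalidates the prompt cache. Requests at different speeds do not share cached prefixes.
- Supported models: Fast mode is supported on Claude Opus 4.6 and Claude Opus 4.7. Sending speed: "fast" to an unsupported model returns an error.
- TTFT: Fast mode's benefits are focused on output tokens per second (OTPS), not time to first token (TTFT).
- Batch API: Fast mode is not available with the Batch API.
- Priority Tier: Fast mode is not available with Priority Tier.
- Claude Platform on AWS: Fast mode is not currently available on Claude Platform on AWS.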
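Since sending speed: "fast" to an unsupported model returns an error, applications that route across several models may want a guard before building the request. The helper below is a minimal sketch. The function name is hypothetical, the model IDs and beta flag are the ones listed on this page, and the set needs manual updating if the supported models change.

```python
# Model IDs and beta flag taken from this page; update if support changes.
FAST_MODE_MODELS = frozenset({"claude-opus-4-6", "claude-opus-4-7"})
FAST_MODE_BETA = "fast-mode-2026-02-01"


def with_fast_mode(params: dict) -> dict:
    """Return a copy of params with fast mode enabled only for supported models."""
    updated = dict(params)
    if updated.get("model") in FAST_MODE_MODELS:
        updated["speed"] = "fast"
        betas = list(updated.get("betas", []))
        if FAST_MODE_BETA not in betas:
            betas.append(FAST_MODE_BETA)
        updated["betas"] = betas
    else:
        # Drop speed so an unsupported model is requested at standard speed.
        updated.pop("speed", None)
    return updated
```

The function copies its input rather than mutating it, so the same base parameters can be reused for a standard-speed fallback request.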