Handling stop reasons


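When you make a request to the Messages API, Claude's response includes a stop_reason field that indicates why the model stopped generating. Understanding these values is essential for building robust applications that handle each type of response appropriately.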

For details about stop_reason in the API response, see the Messages API reference.

The stop_reason field

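The stop_reason field is part of every successful Messages API response. Unlike errors, which indicate that the request could not be processed, stop_reason tells you why Claude successfully finished generating its response.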

{
  "id": "msg_01234",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Here's the answer to your question..."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 100,
    "output_tokens": 50
  }
}
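
As a minimal sketch, the field can be read from the raw JSON body using only the standard library. The `raw_body` string is a trimmed copy of the example response above, and `summarize_stop` is a hypothetical helper name chosen for illustration, not part of any SDK.

```python
import json
from typing import Optional, Tuple

# Trimmed copy of the example response above, as it might arrive over HTTP.
raw_body = """
{
  "id": "msg_01234",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Here's the answer to your question..."}],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {"input_tokens": 100, "output_tokens": 50}
}
"""


def summarize_stop(body: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (stop_reason, stop_sequence) from a raw response body (hypothetical helper)."""
    message = json.loads(body)
    # .get() keeps the helper tolerant of bodies that omit either field.
    return message.get("stop_reason"), message.get("stop_sequence")


stop_reason, stop_sequence = summarize_stop(raw_body)
print(stop_reason, stop_sequence)  # prints: end_turn None
```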

Stop reason values

end_turn

The most common stop reason. It indicates that Claude finished its response naturally.

from anthropic import Anthropic

client = Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
if response.stop_reason == "end_turn":
    # Process the complete response
    print(response.content[0].text)

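Empty responses with end_turn

Sometimes Claude returns an empty response (exactly 2-3 tokens with no content) with stop_reason: "end_turn". This typically happens when Claude judges that the assistant turn is already complete, particularly after tool results.

Common causes:

  • Adding text blocks immediately after tool results (Claude learns to expect the user to always insert text after a tool result, so it ends its turn to follow that pattern)
  • Sending Claude's completed response back without adding anything (Claude has already decided it is done, so it stays done)

How to prevent empty responses: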

# Incorrect: adding text immediately after tool_result
messages = [
    {"role": "user", "content": "Calculate the sum of 1234 and 5678"},
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_use",
                "id": "toolu_123",
                "name": "calculator",
                "input": {"operation": "add", "a": 1234, "b": 5678},
            }
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": "toolu_123", "content": "6912"},
            {
                "type": "text",
                "text": "Here's the result",  # Don't add text after tool_result
            },
        ],
    },
]

# Correct: send the tool result directly, with no additional text
messages = [
    {"role": "user", "content": "Calculate the sum of 1234 and 5678"},
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_use",
                "id": "toolu_123",
                "name": "calculator",
                "input": {"operation": "add", "a": 1234, "b": 5678},
            }
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": "toolu_123", "content": "6912"}
        ],
    },  # Only the tool_result, no additional text
]


# If you still get empty responses after fixing the issues above:
def handle_empty_response(client, messages):
    response = client.messages.create(
        model="claude-opus-4-7", max_tokens=1024, messages=messages
    )

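    # Check whether the response is empty
    if response.stop_reason == "end_turn" and not response.content:
        # Incorrect: don't simply retry with the empty response
        # This won't work because Claude has already decided it is done

        # Correct: add a continuation prompt in a new user message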
        messages.append({"role": "user", "content": "Please continue"})

        response = client.messages.create(
            model="claude-opus-4-7", max_tokens=1024, messages=messages
        )

    return response

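Best practices:

  1. Never add text blocks immediately after tool results - this teaches Claude to expect user input after every tool use
  2. Don't retry empty responses without modification - simply sending the empty response back won't help
  3. Use continuation prompts as a last resort - only when the fixes above don't resolve the issue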
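
The first practice can be checked mechanically before a request is sent. The sketch below assumes messages are plain dictionaries shaped like the examples above; `find_text_after_tool_result` is a hypothetical helper name, not an SDK function.

```python
def find_text_after_tool_result(messages):
    """Return the indexes of user messages in which a text block follows a tool_result."""
    flagged = []
    for index, message in enumerate(messages):
        content = message.get("content")
        # String content has no blocks, so there is nothing to check.
        if message.get("role") != "user" or not isinstance(content, list):
            continue
        seen_tool_result = False
        for block in content:
            if block.get("type") == "tool_result":
                seen_tool_result = True
            elif block.get("type") == "text" and seen_tool_result:
                flagged.append(index)
                break
    return flagged


risky = [
    {"role": "user", "content": "Calculate the sum of 1234 and 5678"},
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_use",
                "id": "toolu_123",
                "name": "calculator",
                "input": {"operation": "add", "a": 1234, "b": 5678},
            }
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": "toolu_123", "content": "6912"},
            {"type": "text", "text": "Here's the result"},
        ],
    },
]
print(find_text_after_tool_result(risky))  # prints [2]
```

Running a check like this in tests or just before each request is one way to catch the pattern early; whether to raise, log, or strip the extra text is left to the application.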

max_tokens

Claude stopped because it reached the max_tokens limit specified in your request.

# Request with a limited token budget
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=10,
    messages=[{"role": "user", "content": "Explain quantum physics"}],
)

if response.stop_reason == "max_tokens":
    # The response was truncated
    print("Response was cut off at token limit")
    # Consider making another request to continue

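Incomplete tool use blocks

If Claude's response is cut off by the max_tokens limit and the truncated response ends with an incomplete tool use block, retry the request with a higher max_tokens value to get the complete tool use.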

RESPONSE=$(ant messages create --max-tokens 1024 \
  --format jsonl < request.yaml)

# Check whether the response was truncated during tool use
STOP_REASON=$(jq -r '.stop_reason' <<<"$RESPONSE")
LAST_TYPE=$(jq -r '.content[-1].type' <<<"$RESPONSE")
if [ "$STOP_REASON" = "max_tokens" ] && [ "$LAST_TYPE" = "tool_use" ]; then
  # Retry with a higher max_tokens
  ant messages create --max-tokens 4096 < request.yaml
fi

import anthropic

client = anthropic.Anthropic()
tools: list[dict] = []
messages: list[dict] = []
response = client.messages.create(
    model="claude-opus-4-7", max_tokens=1024, tools=tools, messages=messages
)
# Check whether the response was truncated during tool use
if response.stop_reason == "max_tokens":
    # Check whether the last content block is an incomplete tool_use
    last_block = response.content[-1]
    if last_block.type == "tool_use":
        # Send the request again with a higher max_tokens
        response = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=4096,  # Increased limit
            messages=messages,
            tools=tools,
        )

// Check whether the response was truncated during tool use
if (response.stop_reason === "max_tokens") {
  // Check whether the last content block is an incomplete tool_use
  const lastBlock = response.content[response.content.length - 1];
  if (lastBlock.type === "tool_use") {
    // Send the request again with a higher max_tokens
    response = await client.messages.create({
      model: "claude-opus-4-7",
      max_tokens: 4096, // Increased limit
      messages: messages,
      tools: tools
    });
  }
}

using System.Linq;
using Anthropic;
using Anthropic.Models.Messages;

AnthropicClient client = new();

var parameters = new MessageCreateParams
{
    Model = Model.ClaudeOpus4_7,
    MaxTokens = 1024,
    Messages = messages,
    Tools = tools
};

var response = await client.Messages.Create(parameters);

if (response.StopReason == "max_tokens")
{
    var lastBlock = response.Content.Last();
    if (lastBlock.Type == "tool_use")
    {
        parameters.MaxTokens = 4096;
        response = await client.Messages.Create(parameters);
    }
}

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/anthropics/anthropic-sdk-go"
)

func main() {
	client := anthropic.NewClient()

	tools := []anthropic.ToolUnionParam{}
	messages := []anthropic.MessageParam{anthropic.NewUserMessage(anthropic.NewTextBlock("test"))}
	response, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
		Model:     anthropic.ModelClaudeOpus4_7,
		MaxTokens: 1024,
		Messages:  messages,
		Tools:     tools,
	})
	if err != nil {
		log.Fatal(err)
	}

	if response.StopReason == "max_tokens" {
		lastBlock := response.Content[len(response.Content)-1]
		switch lastBlock.AsAny().(type) {
		case anthropic.ToolUseBlock:
			response, err = client.Messages.New(context.TODO(), anthropic.MessageNewParams{
				Model:     anthropic.ModelClaudeOpus4_7,
				MaxTokens: 4096,
				Messages:  messages,
				Tools:     tools,
			})
			if err != nil {
				log.Fatal(err)
			}
		}
	}

	fmt.Println(response)
}

import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.messages.MessageCreateParams;
import com.anthropic.models.messages.Message;
import com.anthropic.models.messages.Model;
import com.anthropic.models.messages.Tool;
import com.anthropic.models.messages.ContentBlock;
import java.util.List;
import com.anthropic.models.messages.StopReason;
AnthropicClient client = AnthropicOkHttpClient.fromEnv();
List<MessageCreateParams.Message> messages = List.of();
List<Tool> tools = List.of();
Message response = client.messages().create(MessageCreateParams.builder().model(Model.CLAUDE_OPUS_4_7).maxTokens(1024L).addUserMessage("test").build());
// Check whether the response was truncated during tool use
if (response.stopReason().isPresent() && response.stopReason().get().equals(StopReason.MAX_TOKENS)) {
    ContentBlock lastBlock = response.content().get(response.content().size() - 1);
    if (lastBlock.toolUse().isPresent()) {
        // Send the request again with a higher max_tokens
        response = client.messages().create(
            MessageCreateParams.builder()
                .model(Model.CLAUDE_OPUS_4_7)
                .maxTokens(4096L) // Increased limit
                .messages(messages)
                .tools(tools)
                .build()
        );
    }
}

<?php

use Anthropic\Client;

$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));

$response = $client->messages->create(
    maxTokens: 1024,
    messages: $messages,
    model: 'claude-opus-4-7',
    tools: $tools,
);

if ($response->stopReason === 'max_tokens') {
    $lastBlock = end($response->content);
    if ($lastBlock->type === 'tool_use') {
        $response = $client->messages->create(
            maxTokens: 4096,
            messages: $messages,
            model: 'claude-opus-4-7',
            tools: $tools,
        );
    }
}

require "anthropic"

client = Anthropic::Client.new

tools = [
  {
    name: "get_weather",
    description: "Get the current weather in a given location",
    input_schema: { type: "object", properties: { location: { type: "string" } }, required: ["location"] }
  }
]
messages = [
  { role: "user", content: "What's the weather in San Francisco?" }
]

response = client.messages.create(
  model: "claude-opus-4-7",
  max_tokens: 1024,
  messages: messages,
  tools: tools
)

if response.stop_reason == :max_tokens
  last_block = response.content.last
  if last_block.type == :tool_use
    response = client.messages.create(
      model: "claude-opus-4-7",
      max_tokens: 4096,
      messages: messages,
      tools: tools
    )
  end
end

stop_sequence

Claude encountered one of your custom stop sequences.

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    stop_sequences=["END", "STOP"],
    messages=[{"role": "user", "content": "Generate text until you say END"}],
)

if response.stop_reason == "stop_sequence":
    print(f"Stopped at sequence: {response.stop_sequence}")

tool_use

Claude is calling a tool and expects you to execute it.

Note

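For most tool use implementations, we recommend using the tool runner, which automatically handles tool execution, result formatting, and conversation management.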

from anthropic import Anthropic

client = Anthropic()
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather in a given location",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City and state"},
        },
        "required": ["location"],
    },
}


def execute_tool(name, tool_input):
    """Execute the tool and return its result."""
    return f"Weather in {tool_input.get('location', 'unknown')}: 72°F"


response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[weather_tool],
    messages=[{"role": "user", "content": "What's the weather?"}],
)

if response.stop_reason == "tool_use":
    # Extract and execute the tools
    for content in response.content:
        if content.type == "tool_use":
            result = execute_tool(content.name, content.input)
            # Return the result to Claude to get the final response

pause_turn

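Returned when the server-side sampling loop reaches its iteration limit while executing server tools such as web search or web fetch. The default limit is 10 iterations per request.

When this happens, the response may contain a server_tool_use block without a corresponding server_tool_result. To let Claude finish processing, continue the conversation by sending the response back as-is.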

original_query = "Search for latest AI news"

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    tools=[{"type": "web_search_20250305", "name": "web_search"}],
    messages=[{"role": "user", "content": original_query}],
)

if response.stop_reason == "pause_turn":
    # Continue the conversation by sending the response back as-is
    messages = [
        {"role": "user", "content": original_query},
        {"role": "assistant", "content": response.content},
    ]
    continuation = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=messages,
        tools=[{"type": "web_search_20250305", "name": "web_search"}],
    )

Note

Your application should handle pause_turn in any agent loop that uses server tools. Add the assistant's response to your messages array and make another API request so Claude can continue.

refusal

Claude refused to generate a response due to safety concerns.

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "[Unsafe request]"}],
)

if response.stop_reason == "refusal":
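    # Claude declined to respond
    print("Claude was unable to process this request")
    # Consider rephrasing or modifying the request

Tip

If you frequently encounter the refusal stop reason while using Claude Sonnet 4.5 or Opus 4.1, you can try updating your API calls to use Haiku 4.5 (claude-haiku-4-5-20251001), which has different usage restrictions. Learn more in Understanding Sonnet 4.5's API safety filters.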

model_context_window_exceeded

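Claude stopped because it reached the model's context window limit. This lets you request the maximum possible number of tokens without knowing the exact input size.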

# Request the maximum number of tokens to get as much output as possible
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=20000,  # The Python SDK requires streaming for max_tokens above roughly 21k (Opus 4.7 supports 128k with streaming)
    messages=[
        {"role": "user", "content": "Large input that uses most of context window..."}
    ],
)

if response.stop_reason == "model_context_window_exceeded":
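    # The response hit the context window limit before reaching max_tokens
    print("Response reached model's context window limit")
    # The response is still valid, but it was limited by the context window

Note

This stop reason is available by default on Sonnet 4.5 and newer models. For earlier models, use the beta header model-context-window-exceeded-2025-08-26 to enable this behavior.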

Best practices for handling stop reasons

1. Always check stop_reason

Make it a habit to check stop_reason in your response handling logic:

def handle_response(response):
    if response.stop_reason == "tool_use":
        return handle_tool_use(response)
    elif response.stop_reason == "max_tokens":
        return handle_truncation(response)
    elif response.stop_reason == "model_context_window_exceeded":
        return handle_context_limit(response)
    elif response.stop_reason == "pause_turn":
        return handle_pause(response)
    elif response.stop_reason == "refusal":
        return handle_refusal(response)
    else:
        # Handle end_turn and other cases
        return response.content[0].text
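
The fallback branch indexes `content[0]`, which raises an `IndexError` for the empty responses described earlier and assumes the first block is a text block. A more defensive sketch is shown here; `collect_text` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that only mimic the attribute layout of an SDK response.

```python
from types import SimpleNamespace


def collect_text(response):
    """Join the text of every text block; return "" when there are none."""
    return "".join(
        block.text
        for block in response.content
        if getattr(block, "type", None) == "text"
    )


# Stand-in objects (assumption: real responses expose .content blocks with .type).
empty = SimpleNamespace(stop_reason="end_turn", content=[])
mixed = SimpleNamespace(
    stop_reason="tool_use",
    content=[
        SimpleNamespace(type="text", text="Let me check. "),
        SimpleNamespace(type="tool_use", id="toolu_123", name="get_weather", input={}),
    ],
)
print(repr(collect_text(empty)))  # prints ''
print(repr(collect_text(mixed)))  # prints 'Let me check. '
```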

2. Handle truncated responses gracefully

When a response is truncated because of a token limit or the context window:

def handle_truncated_response(client, original_prompt, response, continue_generation=False):
    if response.stop_reason not in ["max_tokens", "model_context_window_exceeded"]:
        # Not truncated: return the text unchanged
        return response.content[0].text

    if not continue_generation:
        # Option 1: warn the user about the specific limit
        if response.stop_reason == "max_tokens":
            message = "[Response truncated due to max_tokens limit]"
        else:
            message = "[Response truncated due to context window limit]"
        return f"{response.content[0].text}\n\n{message}"

    # Option 2: continue generation
    messages = [
        {"role": "user", "content": original_prompt},
        {"role": "assistant", "content": response.content[0].text},
    ]
    continuation = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=1024,
        messages=messages + [{"role": "user", "content": "Please continue"}],
    )
    return response.content[0].text + continuation.content[0].text

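3. Implement retry logic for pause_turn

When using server tools, the API may return pause_turn if the server-side sampling loop reaches its iteration limit (default 10). Handle this by continuing the conversation: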

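def handle_server_tool_conversation(client, user_query, tools, max_continuations=5):
    """
    Handle a server tool conversation that may need multiple continuations.

    The server runs a sampling loop while executing server tools. If the loop
    reaches its iteration limit, the API returns pause_turn. Continue the
    conversation by sending the response back so Claude can finish.
    """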
    messages = [{"role": "user", "content": user_query}]

    for _ in range(max_continuations):
        response = client.messages.create(
            model="claude-opus-4-7", max_tokens=1024, messages=messages, tools=tools
        )

        if response.stop_reason != "pause_turn":
            # Claude finished processing - return the final response
            return response

        # pause_turn: replace the full message list to maintain alternating roles
        messages = [
            {"role": "user", "content": user_query},
            {"role": "assistant", "content": response.content},
        ]

    # Reached the maximum number of continuations - return the last response
    return response

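Stop reasons vs. errors

It's important to distinguish stop_reason values from actual errors:

Stop reasons (successful responses)

  • Part of the response body
  • Indicate why generation stopped normally
  • The response contains valid content

Errors (failed requests)

  • HTTP status codes 4xx or 5xx
  • Indicate that request processing failed
  • The response contains error details

import anthropic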
from anthropic import Anthropic

client = Anthropic()

try:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}],
    )

    # Handle a successful response by checking stop_reason
    if response.stop_reason == "max_tokens":
        print("Response was truncated")

except anthropic.APIStatusError as e:
    # Handle actual errors
    if e.status_code == 429:
        print("Rate limit exceeded")
    elif e.status_code == 500:
        print("Server error")

Streaming considerations

When using streaming, stop_reason is:

  • null in the initial message_start event
  • Provided in the message_delta event
  • Not provided in any other events

from anthropic import Anthropic

client = Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    for event in stream:
        if event.type == "message_delta":
            stop_reason = event.delta.stop_reason
            if stop_reason:
                print(f"Stream ended with: {stop_reason}")
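
The same rule can be exercised without a network call. This sketch walks a hand-written list of event dictionaries that only approximate the streamed payloads (an assumption made for illustration); `final_stop_reason` is a hypothetical helper name.

```python
def final_stop_reason(events):
    """Return the last non-null stop_reason seen in a sequence of stream events."""
    stop_reason = None
    for event in events:
        # Only message_delta carries a populated stop_reason.
        if event.get("type") == "message_delta":
            stop_reason = event.get("delta", {}).get("stop_reason") or stop_reason
    return stop_reason


# Hand-written approximation of a short stream.
events = [
    {"type": "message_start", "message": {"stop_reason": None}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hi"}},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn", "stop_sequence": None}},
    {"type": "message_stop"},
]
print(final_stop_reason(events))  # prints end_turn
```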

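Common patterns

Handling tool use workflows

Tip

Simpler with the tool runner: the example below shows manual tool handling. For most use cases, the tool runner handles tool execution automatically with much less code.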

def complete_tool_workflow(client, user_query, tools):
    messages = [{"role": "user", "content": user_query}]

    while True:
        response = client.messages.create(
            model="claude-opus-4-7", max_tokens=1024, messages=messages, tools=tools
        )

        if response.stop_reason == "tool_use":
            # Execute the tools and continue
            tool_results = execute_tools(response.content)
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        else:
            # Final response
            return response
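
The workflow calls `execute_tools` without defining it. One possible shape is sketched here under stated assumptions: `TOOL_REGISTRY` and its single `get_weather` entry are hypothetical, and the `SimpleNamespace` blocks stand in for SDK content blocks. Failures are reported back through the `is_error` field of the tool_result block rather than raised.

```python
from types import SimpleNamespace

# Hypothetical registry mapping tool names to local Python callables.
TOOL_REGISTRY = {
    "get_weather": lambda tool_input: f"Weather in {tool_input['location']}: 72°F",
}


def execute_tools(content):
    """Run every tool_use block and return the matching tool_result blocks."""
    results = []
    for block in content:
        if block.type != "tool_use":
            continue  # skip text and other non-tool blocks
        result = {"type": "tool_result", "tool_use_id": block.id}
        try:
            result["content"] = str(TOOL_REGISTRY[block.name](block.input))
        except Exception as exc:
            # Report the failure to Claude instead of crashing the loop.
            result["content"] = f"{type(exc).__name__}: {exc}"
            result["is_error"] = True
        results.append(result)
    return results


# Stand-in content blocks, including a call to a tool that is not registered.
blocks = [
    SimpleNamespace(type="text", text="Checking the weather."),
    SimpleNamespace(type="tool_use", id="toolu_1", name="get_weather", input={"location": "Paris"}),
    SimpleNamespace(type="tool_use", id="toolu_2", name="missing_tool", input={}),
]
for item in execute_tools(blocks):
    print(item)
```

The returned list contains only tool_result blocks, which keeps the user turn free of trailing text as recommended in the section on empty responses.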

Ensuring complete responses

def get_complete_response(client, prompt, max_attempts=3):
    messages = [{"role": "user", "content": prompt}]
    full_response = ""

    for _ in range(max_attempts):
        response = client.messages.create(
            model="claude-opus-4-7", messages=messages, max_tokens=4096
        )

        full_response += response.content[0].text

        if response.stop_reason != "max_tokens":
            break

        # Continue from where the previous response left off
        messages = [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": full_response},
            {"role": "user", "content": "Please continue from where you left off."},
        ]

    return full_response

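Getting maximum tokens without knowing the input size

With the model_context_window_exceeded stop reason, you can request the maximum possible number of tokens without calculating the input size: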

def get_max_possible_tokens(client, prompt):
    """
    Get as many tokens as possible within the model's context window,
    without having to count the input tokens
    """
    response = client.messages.create(
        model="claude-opus-4-7",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=20000,  # The Python SDK requires streaming for max_tokens above roughly 21k
    )

    if response.stop_reason == "model_context_window_exceeded":
        # Got the maximum possible tokens for the given input size
        print(
            f"Generated {response.usage.output_tokens} tokens (context limit reached)"
        )
    elif response.stop_reason == "max_tokens":
        # Got exactly the requested number of tokens
        print(f"Generated {response.usage.output_tokens} tokens (max_tokens reached)")
    else:
        # Natural completion
        print(f"Generated {response.usage.output_tokens} tokens (natural completion)")

    return response.content[0].text

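By handling stop_reason values correctly, you can build more robust applications that handle different response scenarios gracefully and provide a better user experience.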