Migration guide
Guide for migrating to Claude Opus 4.7 and Claude 4.6 models from previous Claude versions
This guide covers migrating Messages API code. If you use Claude Managed Agents, no changes beyond updating the model name are required.
Migrating to Claude Opus 4.7
Claude Opus 4.7 is Anthropic's most capable generally available model to date. It is highly autonomous and performs exceptionally well on long-horizon agentic work, knowledge work, vision tasks, and memory tasks.
Claude Opus 4.7 should have strong out-of-the-box performance on existing Claude Opus 4.6 prompts and evals at the same $5 / $25 per MTok pricing, but there are a handful of behavioral and API changes worth knowing about as you migrate. It supports the same set of features as Claude Opus 4.6, including:
- 1M token context window at standard API pricing with no long-context premium
- 128k max output tokens
- Adaptive thinking
- Prompt caching
- Batch processing
- Files API
- PDF support
- Vision
- The full set of server-side and client-side tools (bash, code execution, computer use, text editor, web search, web fetch, MCP connector, memory)
Automate this migration with the Claude API skill. In Claude Code, run /claude-api migrate to invoke the bundled Claude API skill:
/claude-api migrate this project to claude-opus-4-7
The skill applies the model ID swap, breaking parameter changes, prefill replacement, and effort calibration described below across your codebase, then produces a checklist of items to verify manually. It asks you to confirm the migration scope (entire working directory, a subdirectory, or a specific file list) before editing any files. The skill also detects Amazon Bedrock, Vertex AI, Claude Platform on AWS, and Microsoft Foundry clients and adjusts model ID formats and feature changes for each platform.
Update your model name
# Opus migration
model = "claude-opus-4-6" # Before
model = "claude-opus-4-7" # After
Breaking changes
-
Extended thinking removed: thinking: {type: "enabled", budget_tokens: N} is no longer supported on Claude Opus 4.7 or later models and returns a 400 error. Switch to adaptive thinking (thinking: {type: "adaptive"}) and use the effort parameter to control thinking depth. Adaptive thinking is off by default on Claude Opus 4.7: requests with no thinking field run without thinking, matching Opus 4.6 behavior. Set thinking: {type: "adaptive"} explicitly to enable it.
Before (Claude Opus 4.6):
client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "..."}],
)
After (Claude Opus 4.7):
client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},  # or "max", "xhigh", "medium", "low"
    messages=[{"role": "user", "content": "..."}],
)
Adaptive thinking is steerable through prompting. For guidance on tuning when the model over- or under-thinks, see Calibrating effort and thinking depth.
-
Sampling parameters removed: Setting temperature, top_p, or top_k to any non-default value on Claude Opus 4.7 returns a 400 error. The safest migration path is to omit these parameters entirely from request payloads. Prompting is the recommended way to guide model behavior on Claude Opus 4.7. If you were using temperature = 0 for determinism, note that it never guaranteed identical outputs on prior models.
-
Thinking content omitted by default: Thinking blocks still appear in the response stream on Claude Opus 4.7, but their thinking field is empty unless you explicitly opt in. This is a silent change from Claude Opus 4.6, where the default was to return summarized thinking text. To restore summarized thinking content on Claude Opus 4.7, set thinking.display to "summarized":
thinking = {
    "type": "adaptive",
    "display": "summarized",
}
The default is "omitted" on Claude Opus 4.7. If your product streams reasoning to users, the new default appears as a long pause before output begins; set display: "summarized" to restore visible progress during thinking. See Extended thinking for details.
-
Updated token counting: Claude Opus 4.7 uses a new tokenizer, contributing to its improved performance on a wide range of tasks. The new tokenizer may use roughly 1x to 1.35x as many tokens when processing text compared to previous models (up to ~35% more, varying by content). /v1/messages/count_tokens will return a different number of tokens for Claude Opus 4.7 than it did for Claude Opus 4.6. Token efficiency can vary by workload shape.
Prompting interventions, task_budget, and effort can help control costs and ensure appropriate token usage. These controls may trade off model intelligence. Update your max_tokens parameters to give additional headroom, including compaction triggers. Claude Opus 4.7 provides a 1M context window at standard API pricing with no long-context premium.
-
Prefill removal (carried over from Opus 4.6): Prefilling assistant messages returns a 400 error on Claude Opus 4.7. Use structured outputs, system prompt instructions, or output_config.format instead.
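The request-level changes above can be sketched as one small transformation over the keyword arguments you pass to client.messages.create. This is a minimal sketch, not an official migration tool: the function name migrate_request is hypothetical, the default effort of "high" is an assumption (the guide gives no mapping from budget_tokens to an effort level), and dropping a trailing assistant message only removes the prefill. You still need to move its intent into the system prompt or structured outputs.

```python
from copy import deepcopy

# Sampling parameters that should be omitted on Claude Opus 4.7.
SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def migrate_request(params: dict, effort: str = "high") -> dict:
    """Return a copy of Messages API kwargs adjusted for Claude Opus 4.7.

    Hypothetical helper: `effort` defaults to "high" as an assumption.
    """
    out = deepcopy(params)
    out["model"] = "claude-opus-4-7"

    # Omit sampling parameters entirely rather than resetting them.
    for key in SAMPLING_KEYS:
        out.pop(key, None)

    # Manual extended thinking becomes adaptive thinking plus an effort level.
    thinking = out.get("thinking")
    if thinking and thinking.get("type") == "enabled":
        out["thinking"] = {"type": "adaptive"}
        out.setdefault("output_config", {})["effort"] = effort

    # A trailing assistant message is a prefill; drop it from the payload.
    messages = out.get("messages", [])
    if messages and messages[-1]["role"] == "assistant":
        out["messages"] = messages[:-1]

    return out
```

Because the function works on a deep copy, the original request dictionary is left untouched, which makes it easy to compare old and new payloads side by side during a staged rollout.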
Choosing an effort level
The effort parameter allows you to tune Claude's intelligence vs. token spend, trading off capability for faster speed and lower costs. Start with the new xhigh effort level for coding and agentic use cases, and use a minimum of high effort for most intelligence-sensitive use cases. Experiment with other effort levels to further tune token usage and intelligence:
- max: Max effort can deliver performance gains in some use cases, but may show diminishing returns from increased token usage. This setting can also sometimes be prone to overthinking. Test max effort for intelligence-demanding tasks.
- xhigh (new): Extra high effort is the best setting for most coding and agentic use cases.
- high: This setting balances token usage and intelligence. For most intelligence-sensitive use cases, use a minimum of high effort.
- medium: Good for cost-sensitive use cases that need to reduce token usage while trading off intelligence.
- low: Reserve for short, scoped tasks and latency-sensitive workloads that are not intelligence-sensitive.
Effort is more important for this model than for any prior Opus. Experiment with it actively when you upgrade.
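One way to encode the starting points above is a small lookup table that your request builder consults. The workload labels, the function names, and the fallback of "high" for unlabeled workloads are all assumptions made for this sketch; only the effort level names and their recommended uses come from the guidance above.

```python
# Effort levels named in this guide, from least to most effort.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

# Hypothetical workload labels mapped to the starting points described above.
STARTING_EFFORT = {
    "coding": "xhigh",
    "agentic": "xhigh",
    "intelligence_sensitive": "high",
    "cost_sensitive": "medium",
    "latency_sensitive": "low",
}


def starting_effort(workload: str) -> str:
    """Return a starting effort level; unknown workloads fall back to "high"."""
    return STARTING_EFFORT.get(workload, "high")


def output_config_for(workload: str) -> dict:
    """Build the output_config fragment for a request."""
    effort = starting_effort(workload)
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort}")
    return {"effort": effort}
```

Treat the table as a starting point for experiments rather than a fixed policy, and adjust it as your evals show where more or less effort pays off.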
Behavior changes
Claude Opus 4.7 has several behavioral differences from Claude Opus 4.6 that are not API breaking changes but may require prompt updates or scaffolding removal.
-
Response length varies by use case: Claude Opus 4.7 calibrates response length to how complex it judges the task to be, rather than defaulting to a fixed verbosity. This usually means shorter answers on simple lookups and much longer ones on open-ended analysis.
If your product depends on a certain style or verbosity of output, you may need to tune your prompts. For example, to decrease verbosity, add: "Provide concise, focused responses. Skip non-essential context, and keep examples minimal." If you see specific kinds of over-explaining, add targeted instructions in your prompt to prevent them.
Positive examples showing how Claude can communicate with the appropriate level of concision tend to be more effective than negative examples or instructions that tell the model what not to do.
-
More literal instruction following: Claude Opus 4.7 interprets prompts more literally and explicitly than Claude Opus 4.6, particularly at lower effort levels. It will not silently generalize an instruction from one item to another, and it will not infer requests you didn't make. The upside of this literalism is precision and less thrash. It generally performs better for API use cases with carefully tuned prompts, structured extraction, and pipelines where you want predictable behavior. A prompt and harness review may be especially helpful for migration to Claude Opus 4.7.
-
More direct tone: As with any new model, prose style on long-form writing may shift. Claude Opus 4.7 is more direct and opinionated, with less validation-forward phrasing and fewer emoji than Claude Opus 4.6's warmer style. If your product relies on a specific voice, re-evaluate style prompts against the new baseline.
-
Built-in progress updates in agentic traces: Claude Opus 4.7 provides more regular, higher-quality updates to the user throughout long agentic traces. If you've added scaffolding to force interim status messages ("After every 3 tool calls, summarize progress"), try removing it. If you find that the length or contents of Claude Opus 4.7's user-facing updates are not well-calibrated to your use case, explicitly describe what these updates should look like in the prompt and provide examples.
-
Fewer subagents spawned by default: Claude Opus 4.7 tends to spawn fewer subagents by default. However, this behavior is steerable through prompting; give Claude Opus 4.7 explicit guidance around when subagents are desirable.
-
Stricter effort calibration: In a meaningful change from Claude Opus 4.6, Claude Opus 4.7 respects effort levels strictly, especially at the low end. At low and medium, the model scopes its work to what was asked rather than going above and beyond.
This is good for latency and cost, but on moderately complex tasks running at low effort there is some risk of under-thinking. If you observe shallow reasoning on complex problems, raise effort to high or xhigh rather than prompting around it.
If you need to keep effort at low for latency, add targeted guidance: "This task involves multi-step reasoning. Think carefully through the problem before responding." See Recommended effort levels for Claude Opus 4.7.
-
Fewer tool calls by default: Claude Opus 4.7 tends to use tools less often than Claude Opus 4.6 and to rely more on reasoning. This produces better results in most cases.
To increase tool usage, raise the effort setting. high or xhigh effort settings show substantially more tool usage in agentic search and coding. You can also adjust your prompt to explicitly instruct the model about when and how to properly use its tools.
-
Real-time cybersecurity safeguards: Newly added in Claude Opus 4.7, requests that involve prohibited or high-risk topics may lead to refusals. For legitimate security work such as penetration testing, vulnerability research, or red-teaming, apply to the Cyber Verification Program to request reduced restrictions. See Safeguards, warnings, and appeals for background.
-
High-resolution image support: Claude Opus 4.7 is the first Claude model with high-resolution image support. Maximum image resolution is 2576 pixels on the long edge, up from 1568 pixels on prior models. This unlocks gains on vision-heavy workloads and is particularly valuable for computer use, screenshot understanding, and document analysis.
High-resolution support is automatic and requires no beta header or client-side opt-in. Two things to plan for:
- Full-resolution images can use up to approximately 3x more image tokens than on prior models (up to 4,784 tokens per image, compared to the previous cap of roughly 1,600 tokens per image). Re-budget max_tokens and cost expectations for image-heavy workloads, or downsample before sending if you do not need the additional fidelity.
- Pointing and bounding-box coordinates returned by the model are 1:1 with actual image pixels on Claude Opus 4.7, so no scale-factor conversion is required.
See High-resolution image support on Claude Opus 4.7 for details.
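If you decide to downsample before sending, the target dimensions can be computed with a few lines of arithmetic. The sketch below is illustrative: the function name fit_long_edge is hypothetical, and the default target of 1568 pixels is an assumption chosen to match the long-edge limit of prior models mentioned above. The resize itself is left to whichever image library you already use.

```python
def fit_long_edge(width: int, height: int, max_long_edge: int = 1568) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_long_edge.

    The aspect ratio is preserved up to rounding, and images that already
    fit are returned unchanged.
    """
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height
    scale = max_long_edge / long_edge
    # Round to whole pixels and never return a zero-sized dimension.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Keep in mind that if you downsample client-side, coordinates returned by the model refer to the image you sent, not to your original file.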
Recommended changes
These are not required but will improve your experience:
-
Re-evaluate max_tokens: Because the same text produces a higher token count on Claude Opus 4.7, update your max_tokens parameters to give additional headroom, including compaction triggers. Prompting interventions, task_budget, and effort can help control costs and ensure appropriate token usage.
-
Audit token-count expectations: Any code path that estimates tokens client-side or assumes a fixed token-to-character ratio should be re-tested against Claude Opus 4.7. Use the Token counting endpoint to verify.
-
Adopt task budgets (beta): Claude Opus 4.7 introduces task budgets. These budgets let you inform Claude how many tokens it has for a full agentic loop, including thinking, tool calls, tool results, and final output. The model sees a running countdown and uses it to prioritize work and finish the task gracefully as the budget is consumed. To use, set the beta header task-budgets-2026-03-13 and add the following to your output config:
output_config = {
    "effort": "high",
    "task_budget": {"type": "tokens", "total": 128000},
}
You may need to experiment with different task budgets for your use case. If the model is given a task budget that is too restrictive, it may complete the task less thoroughly, referencing its budget as the constraint.
For open-ended agentic tasks where quality matters more than speed, do not set a task budget. Reserve task budgets for workloads where you need the model to scope its work to a token allowance. The minimum value for a task budget is 20k tokens.
A task budget is not a hard cap; it's a suggestion that the model is aware of. It differs from max_tokens:
- task_budget: an advisory cap across the full agentic loop. The model sees it and uses it to pace itself.
- max_tokens: a hard per-request ceiling on generated tokens. It is not passed to the model, so the model is not aware of it.
Use task_budget when you want the model to self-moderate, and max_tokens as a hard ceiling to cap usage.
-
Set a large max_tokens at max or xhigh effort: If you are running Claude Opus 4.7 at max or xhigh effort, set a large max output token budget so the model has room to think and act across its subagents and tool calls. Start at 64k tokens and tune from there.
-
Downsample images if high resolution is unnecessary: Claude Opus 4.7 supports images up to 2576px / 3.75MP. High-res images use more tokens. If the additional image fidelity is unnecessary, downsample images before sending to Claude to avoid token-usage increases. See Images and vision.
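The relationship between task_budget and max_tokens described above can be made explicit in a small request builder. This is a sketch under stated assumptions: the function name build_agentic_request is hypothetical, the 64k default for max_tokens follows the starting point suggested above, and returning the beta header in a betas list assumes you attach it the same way as the other beta examples in this guide.

```python
from typing import Optional

MIN_TASK_BUDGET = 20_000  # minimum task budget stated in this guide
TASK_BUDGET_BETA = "task-budgets-2026-03-13"


def build_agentic_request(
    effort: str,
    task_budget_tokens: Optional[int] = None,
    max_tokens: int = 64_000,
) -> dict:
    """Build the budget-related fields of a request (hypothetical helper).

    max_tokens is the hard per-request ceiling; task_budget is the advisory
    allowance the model sees across the whole agentic loop.
    """
    output_config: dict = {"effort": effort}
    betas: list = []

    if task_budget_tokens is not None:
        if task_budget_tokens < MIN_TASK_BUDGET:
            raise ValueError(f"task budget must be at least {MIN_TASK_BUDGET} tokens")
        output_config["task_budget"] = {"type": "tokens", "total": task_budget_tokens}
        betas.append(TASK_BUDGET_BETA)

    return {"max_tokens": max_tokens, "output_config": output_config, "betas": betas}
```

Leaving task_budget_tokens unset matches the advice for open-ended tasks where quality matters more than speed: the request then carries only the effort level and the hard max_tokens ceiling.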
Migration checklist
- Update model name from claude-opus-4-6 to claude-opus-4-7 (or update aliases).
- Remove temperature, top_p, and top_k from request payloads.
- Replace thinking: {type: "enabled", budget_tokens: N} with thinking: {type: "adaptive"} plus the effort parameter.
- Remove any assistant-message prefills.
- If your UI displays thinking content, explicitly opt in to thinking summarization.
- Re-benchmark end-to-end cost and latency under the updated tokenization.
- Re-tune max_tokens to account for the updated tokenization.
- Re-test any client-side token-count estimations.
- If your application sends images, re-budget for high-resolution image support (up to approximately 3x more image tokens per full-resolution image). Downsample before sending if you do not need the additional fidelity.
- If you consume pointing or bounding-box coordinates from the model, remove any scale-factor conversion; coordinates are 1:1 with actual image pixels on Claude Opus 4.7.
- Review prompts for the behavior changes above (response length, literalism, tone, progress updates, subagents, effort calibration, tool triggering, cyber safeguards, high-resolution image handling).
- Re-baseline response length with existing length-control prompts removed, then tune explicitly.
- If using xhigh or max effort, raise max_tokens to at least 64k as a starting point.
- Consider adopting task budgets (beta) for agentic workflows.
- If your product does legitimate security work, apply to the Cyber Verification Program for access to lower restrictions on cyber content.
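For the max_tokens re-tuning item, a conservative first pass is to scale existing limits by the upper bound of the tokenizer change and cap the result at the model's maximum output. The helper below is a sketch: the name with_tokenizer_headroom is hypothetical, the 35% default is the upper bound quoted in this guide rather than a measured ratio for your content, and the 128k ceiling is the max output size listed above. Measure real workloads with the token counting endpoint before settling on final values.

```python
MAX_OUTPUT_TOKENS = 128_000  # max output tokens listed for Claude Opus 4.7


def with_tokenizer_headroom(
    old_limit: int,
    headroom_percent: int = 35,
    ceiling: int = MAX_OUTPUT_TOKENS,
) -> int:
    """Scale a token limit by headroom_percent, rounding up, capped at ceiling."""
    # Integer ceiling division avoids floating-point rounding surprises.
    scaled = -(-old_limit * (100 + headroom_percent) // 100)
    return min(ceiling, scaled)
```

The same function can be applied to compaction triggers and other token thresholds, passing a different ceiling where the 128k output limit is not the relevant bound.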
Migrating to Claude Opus 4.7 from Opus 4.5 or earlier
If you are migrating from Claude Opus 4.5, Opus 4.1, or an earlier model directly to Claude Opus 4.7, apply all of the Opus 4.7 changes above plus the cumulative changes in this section that took effect between Opus 4.5 and Opus 4.7. If you are migrating from Opus 4.6, you only need the Opus 4.7 section above.
Update your model name
# Opus migration
model = "claude-opus-4-5" # Before
model = "claude-opus-4-7" # After
Breaking changes
-
Prefill removal is covered in the Opus 4.7 breaking changes above.
-
Tool parameter quoting: Claude Opus 4.6 and later models may produce slightly different JSON string escaping in tool call arguments (e.g., different handling of Unicode escapes or forward slash escaping). If you parse tool call input as a raw string rather than using a JSON parser, verify your parsing logic. Standard JSON parsers (like json.loads() or JSON.parse()) handle these differences automatically.
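To see why a standard parser is enough, consider two raw strings that encode the same tool input with different escaping. The example below is illustrative only: the path value and the helper name parse_tool_input are made up, and it assumes you hold the arguments as raw JSON text (for example, because you assemble streamed fragments yourself).

```python
import json

# Two encodings of the same (made-up) tool input: one escapes the forward
# slash and the non-ASCII character, the other writes them literally.
escaped = r'{"path": "src\/caf\u00e9.txt"}'
plain = '{"path": "src/café.txt"}'


def parse_tool_input(raw: str) -> dict:
    """Parse raw tool-call arguments with the standard JSON parser."""
    return json.loads(raw)


# The raw strings differ, but the parsed values are identical.
same_value = parse_tool_input(escaped) == parse_tool_input(plain)
```

String comparisons or regular expressions applied to the raw text would treat these two inputs as different, which is the failure mode to look for when reviewing custom parsing code.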
Recommended changes
These changes improve your experience on Opus 4.7. Items marked (required on Opus 4.7) were optional recommendations when Opus 4.6 launched but are now mandatory; the rest remain recommended.
-
Migrate to adaptive thinking (required on Opus 4.7): thinking: {type: "enabled", budget_tokens: N} returns a 400 error on Claude Opus 4.7. Switch to thinking: {type: "adaptive"} and use the effort parameter to control thinking depth. See Adaptive thinking.
curl -sS https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-opus-4-7",
    "max_tokens": 16000,
    "thinking": {"type": "adaptive"},
    "output_config": {"effort": "high"},
    "messages": [{"role": "user", "content": "Your prompt here"}]
  }'
import anthropic
client = anthropic.Anthropic()
# Before (Claude Opus 4.5)
response = client.beta.messages.create(
    model="claude-opus-4-5",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 32000},
    betas=["interleaved-thinking-2025-05-14"],
    messages=[{"role": "user", "content": "Your prompt here"}],
)
# After (Claude Opus 4.7)
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},
    messages=[{"role": "user", "content": "Your prompt here"}],
)
ant messages create <<'YAML'
model: claude-opus-4-7
max_tokens: 16000
thinking:
  type: adaptive
output_config:
  effort: high
messages:
  - role: user
    content: Your prompt here
YAML
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const response = await client.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 16000,
  thinking: { type: "adaptive" },
  output_config: { effort: "high" },
  messages: [{ role: "user", content: "Your prompt here" }]
});
using Anthropic;
using Anthropic.Models.Messages;
AnthropicClient client = new();
var parameters = new MessageCreateParams
{
    Model = Model.ClaudeOpus4_7,
    MaxTokens = 16000,
    Thinking = new ThinkingConfigAdaptive(),
    OutputConfig = new OutputConfig { Effort = Effort.High },
    Messages = [new() { Role = Role.User, Content = "Your prompt here" }]
};
var response = await client.Messages.Create(parameters);
Console.WriteLine(response);
package main
import (
    "context"
    "fmt"
    "log"
    "github.com/anthropics/anthropic-sdk-go"
)
func main() {
    client := anthropic.NewClient()
    response, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
        Model:     anthropic.ModelClaudeOpus4_7,
        MaxTokens: 16000,
        Thinking: anthropic.ThinkingConfigParamUnion{
            OfAdaptive: &anthropic.ThinkingConfigAdaptiveParam{},
        },
        OutputConfig: anthropic.OutputConfigParam{
            Effort: anthropic.OutputConfigEffortHigh,
        },
        Messages: []anthropic.MessageParam{
            anthropic.NewUserMessage(anthropic.NewTextBlock("Your prompt here")),
        },
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(response)
}
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.messages.MessageCreateParams;
import com.anthropic.models.messages.Message;
import com.anthropic.models.messages.Model;
import com.anthropic.models.messages.OutputConfig;
import com.anthropic.models.messages.ThinkingConfigAdaptive;
public class AdaptiveThinkingExample {
    public static void main(String[] args) {
        AnthropicClient client = AnthropicOkHttpClient.fromEnv();
        MessageCreateParams params = MessageCreateParams.builder()
            .model(Model.CLAUDE_OPUS_4_7)
            .maxTokens(16000L)
            .thinking(ThinkingConfigAdaptive.builder().build())
            .outputConfig(OutputConfig.builder()
                .effort(OutputConfig.Effort.HIGH)
                .build())
            .addUserMessage("Your prompt here")
            .build();
        Message response = client.messages().create(params);
        System.out.println(response);
    }
}
<?php
use Anthropic\Client;
$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));
$response = $client->messages->create(
    maxTokens: 16000,
    messages: [['role' => 'user', 'content' => 'Your prompt here']],
    model: 'claude-opus-4-7',
    thinking: ['type' => 'adaptive'],
    outputConfig: ['effort' => 'high'],
);
require "anthropic"
client = Anthropic::Client.new
response = client.messages.create(
  model: "claude-opus-4-7",
  max_tokens: 16000,
  thinking: { type: "adaptive" },
  output_config: { effort: "high" },
  messages: [{ role: "user", content: "Your prompt here" }]
)
Note that the migration also moves from client.beta.messages.create to client.messages.create. Adaptive thinking and effort are GA features and do not require the beta SDK namespace or any beta headers.
-
Remove effort beta header: The effort parameter is now GA. Remove betas=["effort-2025-11-24"] from your requests.
-
Remove fine-grained tool streaming beta header: Fine-grained tool streaming is now GA. Remove betas=["fine-grained-tool-streaming-2025-05-14"] from your requests.
-
Remove interleaved thinking beta header: Adaptive thinking automatically enables interleaved thinking on Claude Opus 4.7, Opus 4.6, and Sonnet 4.6. Remove betas=["interleaved-thinking-2025-05-14"] from your requests. The header is still functional on Sonnet 4.6 with manual extended thinking, but manual mode is deprecated.
-
Migrate to output_config.format: If using structured outputs, update output_format={...} to output_config={"format": {...}}. The old parameter remains functional but is deprecated and will be removed in a future model release.
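The header cleanups and the output_format move above are mechanical, so they lend themselves to a small rewrite step over your request keyword arguments. The following is a sketch only: modernize_kwargs is a hypothetical helper name, and it assumes beta headers are passed as a betas list, as in the examples in this guide.

```python
# Beta headers this guide says to remove.
GA_BETAS = {
    "effort-2025-11-24",
    "fine-grained-tool-streaming-2025-05-14",
    "interleaved-thinking-2025-05-14",
}


def modernize_kwargs(kwargs: dict) -> dict:
    """Return a copy of request kwargs without retired betas or output_format."""
    out = dict(kwargs)

    # Keep only beta headers that are not in the removal list.
    remaining = [b for b in out.pop("betas", []) if b not in GA_BETAS]
    if remaining:
        out["betas"] = remaining

    # Move the deprecated top-level output_format under output_config.format.
    if "output_format" in out:
        output_config = dict(out.get("output_config", {}))
        output_config["format"] = out.pop("output_format")
        out["output_config"] = output_config

    return out
```

If no beta headers remain after filtering, the betas key is dropped entirely, which is the point at which a call can move from the beta namespace to client.messages.create.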
Migrating from Claude 4.1 or earlier
If you're migrating from Opus 4.1, Sonnet 4 (deprecated), or earlier models directly to Claude Opus 4.7, apply the Claude Opus 4.7 changes at the top of this guide and the cumulative changes above plus the additional changes in this section.
# From Opus 4.1
model = "claude-opus-4-1-20250805" # Before
model = "claude-opus-4-7" # After
# From Sonnet 4
model = "claude-sonnet-4-20250514" # Before
model = "claude-opus-4-7" # After
# From Sonnet 3.7
model = "claude-3-7-sonnet-20250219" # Before
model = "claude-opus-4-7" # After
Additional breaking changes
-
Remove sampling parameters
Warning: This is a breaking change when migrating from Claude 3.x models.
Starting with Claude Opus 4.7, setting temperature, top_p, or top_k to any non-default value will return a 400 error. The safest migration path is to omit these parameters entirely from requests, and to use prompting to guide the model's behavior. If you were using temperature = 0 for determinism, note that it never guaranteed identical outputs.
# Before - This will error in Claude 4+ models
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    temperature=0.7,
    top_p=0.9,  # Non-default sampling params return 400 on Opus 4.7
    # ...
)
# After
response = client.messages.create(
    model="claude-opus-4-7",
    # ...
)
-
Update tool versions
Warning: This is a breaking change when migrating from Claude 3.x models.
Update to the latest tool versions. Remove any code using the undo_edit command.
# Before
tools = [{"type": "text_editor_20250124", "name": "str_replace_editor"}]
# After
tools = [{"type": "text_editor_20250728", "name": "str_replace_based_edit_tool"}]
- Text editor: Use text_editor_20250728 and str_replace_based_edit_tool. See Text editor tool documentation for details.
- Code execution: Upgrade to code_execution_20250825. See Code execution tool documentation for migration instructions.
-
Handle the refusal stop reason
Update your application to handle refusal stop reasons:
response = client.messages.create(...)
if response.stop_reason == "refusal":
    # Handle refusal appropriately
    pass
-
Handle the model_context_window_exceeded stop reason
Claude 4.5+ models return a model_context_window_exceeded stop reason when generation stops due to hitting the context window limit, rather than the requested max_tokens limit. Update your application to handle this new stop reason:
response = client.messages.create(...)
if response.stop_reason == "model_context_window_exceeded":
    # Handle context window limit appropriately
    pass
-
Verify tool parameter handling (trailing newlines)
Claude 4.5+ models preserve trailing newlines in tool call string parameters that were previously stripped. If your tools rely on exact string matching against tool call parameters, verify your logic handles trailing newlines correctly.
-
Update your prompts for behavioral changes
Claude 4+ models have a more concise, direct communication style and require explicit direction. Review prompting best practices for optimization guidance.
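For the trailing-newline item in the list above, one option is to normalize before comparing when the newline carries no meaning, such as when matching a command name or lookup key. This is a minimal sketch with a hypothetical helper name; do not apply it to parameters where a trailing newline is significant, such as file contents.

```python
def matches_expected(param: str, expected: str) -> bool:
    """Compare a tool-call string parameter to an expected value,
    ignoring trailing newlines on either side."""
    return param.rstrip("\n") == expected.rstrip("\n")
```

Only trailing newline characters are ignored here; other whitespace differences still count as a mismatch, which keeps the comparison strict where it should be.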
Additional recommended changes
- Remove legacy beta headers: Remove token-efficient-tools-2025-02-19 and output-128k-2025-02-19. All Claude 4+ models have built-in token-efficient tool use and these headers have no effect.
Migration checklist (from Opus 4.5 or earlier)
- Update model ID to claude-opus-4-7
- Apply all Opus 4.7 breaking changes (extended thinking removed, sampling parameters removed, thinking display omitted by default, updated tokenization)
- BREAKING: Remove assistant message prefills (returns 400 error); use structured outputs or output_config.format instead
- BREAKING on Opus 4.7: Replace thinking: {type: "enabled", budget_tokens: N} with thinking: {type: "adaptive"} plus the effort parameter (returns 400 on Opus 4.7)
- Verify tool call JSON parsing uses a standard JSON parser
- Remove effort-2025-11-24 beta header (effort is now GA)
- Remove fine-grained-tool-streaming-2025-05-14 beta header
- Remove interleaved-thinking-2025-05-14 beta header (adaptive thinking enables interleaved thinking automatically)
- Migrate output_format to output_config.format (if applicable)
- If migrating from Claude 4.1 or earlier: remove temperature, top_p, and top_k (non-default values return 400 on Opus 4.7)
- If migrating from Claude 4.1 or earlier: update tool versions (text_editor_20250728, code_execution_20250825)
- If migrating from Claude 4.1 or earlier: handle refusal stop reason
- If migrating from Claude 4.1 or earlier: handle model_context_window_exceeded stop reason
- If migrating from Claude 4.1 or earlier: verify tool string parameter handling for trailing newlines
- If migrating from Claude 4.1 or earlier: remove legacy beta headers (token-efficient-tools-2025-02-19, output-128k-2025-02-19)
- Review and update prompts following prompting best practices
- Test in development environment before production deployment
Migrating to Claude Sonnet 4.6
Claude Sonnet 4.6 combines strong intelligence with fast performance, featuring improved agentic search capabilities and free code execution when used with web search or web fetch. It is ideal for everyday coding, analysis, and content tasks.
For a complete overview of capabilities, see the models overview.
Sonnet 4.6 pricing is $3 per million input tokens, $15 per million output tokens. See Claude pricing for details.
Update your model name:
# From Sonnet 4.5
model = "claude-sonnet-4-5" # Before
model = "claude-sonnet-4-6" # After
# From Sonnet 4
model = "claude-sonnet-4-20250514" # Before
model = "claude-sonnet-4-6" # After
Breaking changes
When migrating from Sonnet 4.5
-
Prefilling assistant messages is no longer supported
Warning: This is a breaking change when migrating from Sonnet 4.5 or earlier.
Prefilling assistant messages returns a 400 error on Sonnet 4.6. Use structured outputs, system prompt instructions, or output_config.format instead.
Common prefill use cases and migrations:
- Controlling output formatting (forcing JSON/YAML output): Use structured outputs or tools with enum fields for classification tasks.
- Eliminating preambles (removing "Here is..." phrases): Add direct instructions in the system prompt: "Respond directly without preamble. Do not start with phrases like 'Here is...', 'Based on...', etc."
- Avoiding bad refusals: Claude is much better at appropriate refusals now. Clear prompting in the user message without prefill should be sufficient.
- Continuations (resuming interrupted responses): Move the continuation to the user message: "Your previous response was interrupted and ended with [previous_response]. Continue from where you left off."
- Context hydration / role consistency (refreshing context in long conversations): Inject what were previously prefilled-assistant reminders into the user turn instead.
-
Tool parameter JSON escaping may differ
Warning: This is a breaking change when migrating from Sonnet 4.5 or earlier.
JSON string escaping in tool parameters may differ from previous models. Standard JSON parsers handle this automatically, but custom string-based parsing may need updates.
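The continuation migration for prefills described above can be sketched as a rewrite of the message list: the trailing assistant message is removed and its text is folded into the user turn using the suggested wording. The function name move_prefill_to_user_turn is hypothetical, and the sketch assumes message content is a plain string rather than a list of content blocks.

```python
def move_prefill_to_user_turn(messages: list) -> list:
    """Replace a trailing assistant prefill with a user-turn continuation request.

    Assumes each message is a dict with "role" and string "content".
    """
    if not messages or messages[-1]["role"] != "assistant":
        return list(messages)

    partial = messages[-1]["content"]
    instruction = (
        "Your previous response was interrupted and ended with "
        f"[{partial}]. Continue from where you left off."
    )

    # Copy the remaining messages so the caller's list is not mutated.
    rest = [dict(m) for m in messages[:-1]]
    if rest and rest[-1]["role"] == "user" and isinstance(rest[-1]["content"], str):
        # Fold the instruction into the existing user turn.
        rest[-1]["content"] += "\n\n" + instruction
    else:
        rest.append({"role": "user", "content": instruction})
    return rest
```

Folding the instruction into the existing user turn keeps the conversation ending on a single user message, which is where Sonnet 4.6 expects the request to end once prefill is no longer available.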
When migrating from Claude 3.x
-
Update sampling parameters
Warning: This is a breaking change when migrating from Claude 3.x models.
Use only temperature OR top_p, not both.
-
Update tool versions
Warning: This is a breaking change when migrating from Claude 3.x models.
Update to the latest tool versions (text_editor_20250728, code_execution_20250825). Remove any code using the undo_edit command.
-
Handle the refusal stop reason
Update your application to handle refusal stop reasons.
-
Update your prompts for behavioral changes
Claude 4 models have a more concise, direct communication style. Review prompting best practices for optimization guidance.
Recommended changes
- Remove fine-grained-tool-streaming-2025-05-14 beta header: Fine-grained tool streaming is now GA on Sonnet 4.6 and no longer requires a beta header.
- Migrate output_format to output_config.format: The output_format parameter is deprecated. Use output_config.format instead.
Migrating from Sonnet 4.5
Consider migrating from Sonnet 4.5 to Sonnet 4.6, which delivers more intelligence at the same price point.
Sonnet 4.6 defaults to an effort level of high, in contrast to Sonnet 4.5 which had no effort parameter. Consider adjusting the effort parameter as you migrate from Sonnet 4.5 to Sonnet 4.6. If not explicitly set, you may experience higher latency with the default effort level.
If you're not using extended thinking
If you're not using extended thinking on Sonnet 4.5, you can continue without it on Sonnet 4.6. You should explicitly set effort to the level appropriate for your use case. At low effort with thinking disabled, you can expect similar or better performance relative to Sonnet 4.5 with no extended thinking.
curl https://api.anthropic.com/v1/messages \
  --header "x-api-key: $ANTHROPIC_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "content-type: application/json" \
  --data \
'{
  "model": "claude-sonnet-4-6",
  "max_tokens": 8192,
  "output_config": {
    "effort": "low"
  },
  "messages": [
    {
      "role": "user",
      "content": "Your prompt here"
    }
  ]
}'
ant messages create <<'YAML'
model: claude-sonnet-4-6
max_tokens: 8192
output_config:
  effort: low
messages:
  - role: user
    content: Your prompt here
YAML
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    output_config={"effort": "low"},
    messages=[{"role": "user", "content": "Your prompt here"}],
)
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 8192,
  output_config: { effort: "low" },
  messages: [{ role: "user", content: "Your prompt here" }]
});
using Anthropic;
using Anthropic.Models.Messages;
AnthropicClient client = new();
var parameters = new MessageCreateParams
{
    Model = Model.ClaudeSonnet4_6,
    MaxTokens = 8192,
    OutputConfig = new OutputConfig
    {
        Effort = Effort.Low
    },
    Messages = [new() { Role = Role.User, Content = "Your prompt here" }]
};
var message = await client.Messages.Create(parameters);
Console.WriteLine(message);
package main
import (
    "context"
    "fmt"
    "log"
    "github.com/anthropics/anthropic-sdk-go"
)
func main() {
    client := anthropic.NewClient()
    response, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
        Model:     anthropic.Model("claude-sonnet-4-6"),
        MaxTokens: 8192,
        OutputConfig: anthropic.OutputConfigParam{
            Effort: anthropic.OutputConfigEffortLow,
        },
        Messages: []anthropic.MessageParam{
            anthropic.NewUserMessage(anthropic.NewTextBlock("Your prompt here")),
        },
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(response.Content[0].Text)
}
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.messages.MessageCreateParams;
import com.anthropic.models.messages.Message;
import com.anthropic.models.messages.Model;
import com.anthropic.models.messages.OutputConfig;
public class Main {
    public static void main(String[] args) {
        AnthropicClient client = AnthropicOkHttpClient.fromEnv();
        MessageCreateParams params = MessageCreateParams.builder()
            .model(Model.CLAUDE_SONNET_4_6)
            .maxTokens(8192L)
            .outputConfig(OutputConfig.builder()
                .effort(OutputConfig.Effort.LOW)
                .build())
            .addUserMessage("Your prompt here")
            .build();
        Message response = client.messages().create(params);
        response.content().stream()
            .flatMap(block -> block.text().stream())
            .forEach(textBlock -> System.out.println(textBlock.text()));
    }
}
<?php
use Anthropic\Client;
$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));
$message = $client->messages->create(
    maxTokens: 8192,
    messages: [['role' => 'user', 'content' => 'Your prompt here']],
    model: 'claude-sonnet-4-6',
    outputConfig: ['effort' => 'low'],
);
echo $message->content[0]->text;
require "anthropic"
client = Anthropic::Client.new
message = client.messages.create(
  model: "claude-sonnet-4-6",
  max_tokens: 8192,
  output_config: {
    effort: "low"
  },
  messages: [
    { role: "user", content: "Your prompt here" }
  ]
)
puts message.content.first.text
If you're using extended thinking
If you're using extended thinking with budget_tokens on Sonnet 4.5, it is still functional on Sonnet 4.6 but is deprecated. Migrate to adaptive thinking with the effort parameter.
Migrating to adaptive thinking
Adaptive thinking is the recommended replacement for budget_tokens on Sonnet 4.6. It is particularly well suited to the following workload patterns:
- Autonomous multi-step agents: coding agents that turn requirements into working software, data analysis pipelines, and bug finding where the model runs independently across many steps. Adaptive thinking lets the model calibrate its reasoning per step, staying on path over longer trajectories. For these workloads, start at high effort. If latency or token usage is a concern, scale down to medium.
- Computer use agents: Sonnet 4.6 achieved best-in-class accuracy on computer use evaluations using adaptive mode.
- Bimodal workloads: a mix of easy and hard tasks where adaptive skips thinking on simple queries and reasons deeply on complex ones.
When using adaptive thinking, evaluate medium and high effort on your tasks. The right level depends on your workload's tradeoff between quality, latency, and token usage.
curl https://api.anthropic.com/v1/messages \
--header "x-api-key: $ANTHROPIC_API_KEY" \
--header "anthropic-version: 2023-06-01" \
--header "content-type: application/json" \
--data \
'{
"model": "claude-sonnet-4-6",
"max_tokens": 64000,
"thinking": {
"type": "adaptive"
},
"output_config": {
"effort": "medium"
},
"messages": [
{
"role": "user",
"content": "Your prompt here"
}
]
}'
ant messages create <<'YAML'
model: claude-sonnet-4-6
max_tokens: 64000
thinking:
  type: adaptive
output_config:
  effort: medium
messages:
  - role: user
    content: Your prompt here
YAML
import anthropic
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=64000,
    thinking={"type": "adaptive"},
    output_config={"effort": "medium"},
    messages=[{"role": "user", "content": "Your prompt here"}],
)
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 64000,
  thinking: { type: "adaptive" },
  output_config: { effort: "medium" },
  messages: [{ role: "user", content: "Your prompt here" }]
});
using Anthropic;
using Anthropic.Models.Messages;
AnthropicClient client = new();
var parameters = new MessageCreateParams
{
Model = Model.ClaudeSonnet4_6,
MaxTokens = 64000,
Thinking = new ThinkingConfigAdaptive(),
OutputConfig = new OutputConfig { Effort = Effort.Medium },
Messages = [new() { Role = Role.User, Content = "Your prompt here" }]
};
var message = await client.Messages.Create(parameters);
Console.WriteLine(message);
package main
import (
"context"
"fmt"
"log"
"github.com/anthropics/anthropic-sdk-go"
)
func main() {
client := anthropic.NewClient()
response, err := client.Messages.New(context.TODO(), anthropic.MessageNewParams{
Model: "claude-sonnet-4-6",
MaxTokens: 64000,
Thinking: anthropic.ThinkingConfigParamUnion{
OfAdaptive: &anthropic.ThinkingConfigAdaptiveParam{},
},
OutputConfig: anthropic.OutputConfigParam{
Effort: anthropic.OutputConfigEffortMedium,
},
Messages: []anthropic.MessageParam{
anthropic.NewUserMessage(anthropic.NewTextBlock("Your prompt here")),
},
})
if err != nil {
log.Fatal(err)
}
fmt.Println(response)
}
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.messages.MessageCreateParams;
import com.anthropic.models.messages.Message;
import com.anthropic.models.messages.Model;
import com.anthropic.models.messages.OutputConfig;
import com.anthropic.models.messages.ThinkingConfigAdaptive;
public class Main {
public static void main(String[] args) {
AnthropicClient client = AnthropicOkHttpClient.fromEnv();
MessageCreateParams params = MessageCreateParams.builder()
.model(Model.CLAUDE_SONNET_4_6)
.maxTokens(64000L)
.thinking(ThinkingConfigAdaptive.builder().build())
.outputConfig(OutputConfig.builder()
.effort(OutputConfig.Effort.MEDIUM)
.build())
.addUserMessage("Your prompt here")
.build();
Message response = client.messages().create(params);
System.out.println(response);
}
}
<?php
use Anthropic\Client;
$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));
$message = $client->messages->create(
maxTokens: 64000,
messages: [['role' => 'user', 'content' => 'Your prompt here']],
model: 'claude-sonnet-4-6',
thinking: ['type' => 'adaptive'],
outputConfig: ['effort' => 'medium'],
);
echo array_find($message->content, fn($block) => $block->type === 'text')->text;
require "anthropic"
client = Anthropic::Client.new
message = client.messages.create(
model: "claude-sonnet-4-6",
max_tokens: 64000,
thinking: {
type: "adaptive"
},
output_config: {
effort: "medium"
},
messages: [
{ role: "user", content: "Your prompt here" }
]
)
puts message.content.find { |block| block.type == :text }.text
If you see inconsistent behavior or quality regressions with adaptive thinking, try lowering the effort setting or using max_tokens as a hard limit first. Extended thinking with budget_tokens is still functional on Sonnet 4.6 but is deprecated and no longer recommended.
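If your requests are built as dictionaries before being passed to the SDK, the switch from budget_tokens to adaptive thinking can be sketched as a small rewrite step. This is a minimal sketch, not an SDK feature: the helper name to_adaptive_thinking and the default of medium effort are assumptions made for illustration, so pick the effort level your own evaluation supports.

```python
from copy import deepcopy


def to_adaptive_thinking(params: dict, effort: str = "medium") -> dict:
    """Return a copy of `params` that uses adaptive thinking instead of budget_tokens.

    Hypothetical helper for illustration; it is not part of the Anthropic SDK.
    """
    migrated = deepcopy(params)
    thinking = migrated.get("thinking")
    if isinstance(thinking, dict) and thinking.get("type") == "enabled":
        # Drop the deprecated budget and switch to adaptive thinking.
        migrated["thinking"] = {"type": "adaptive"}
        # Keep an effort level the caller already chose; otherwise apply the default.
        migrated.setdefault("output_config", {}).setdefault("effort", effort)
    return migrated


old_params = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 64000,
    "thinking": {"type": "enabled", "budget_tokens": 16384},
    "messages": [{"role": "user", "content": "Your prompt here"}],
}
new_params = to_adaptive_thinking(old_params)
print(new_params["thinking"], new_params["output_config"])
```

Requests that never enabled thinking pass through unchanged, and the original dictionary is not mutated, so you can compare both configurations side by side while you evaluate.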
Keeping budget_tokens during migration
If you need to keep budget_tokens temporarily while migrating, a budget around 16k tokens provides headroom for harder problems without risk of runaway token usage. This configuration is deprecated and will be removed in a future model release.
Coding and agentic use cases
For agentic coding, frontend design, tool-heavy workflows, and complex enterprise workflows, start with medium effort. If you find latency is too high, consider reducing effort to low. If you need higher intelligence, consider increasing effort to high or migrating to Opus 4.7.
curl https://api.anthropic.com/v1/messages \
--header "x-api-key: $ANTHROPIC_API_KEY" \
--header "anthropic-version: 2023-06-01" \
--header "anthropic-beta: interleaved-thinking-2025-05-14" \
--header "content-type: application/json" \
--data \
'{
"model": "claude-sonnet-4-6",
"max_tokens": 16384,
"thinking": {
"type": "enabled",
"budget_tokens": 16384
},
"output_config": {
"effort": "medium"
},
"messages": [
{
"role": "user",
"content": "Your prompt here"
}
]
}'
ant beta:messages create --beta interleaved-thinking-2025-05-14 <<'YAML'
model: claude-sonnet-4-6
max_tokens: 16384
thinking:
  type: enabled
  budget_tokens: 16384
output_config:
  effort: medium
messages:
  - role: user
    content: Your prompt here
YAML
import anthropic
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16384,
    thinking={"type": "enabled", "budget_tokens": 16384},
    output_config={"effort": "medium"},
    betas=["interleaved-thinking-2025-05-14"],
    messages=[{"role": "user", "content": "Your prompt here"}],
)
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const response = await client.beta.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 16384,
  thinking: { type: "enabled", budget_tokens: 16384 },
  output_config: { effort: "medium" },
  betas: ["interleaved-thinking-2025-05-14"],
  messages: [{ role: "user", content: "Your prompt here" }]
});
using Anthropic;
using Anthropic.Models.Beta;
using Anthropic.Models.Beta.Messages;
AnthropicClient client = new();
var parameters = new MessageCreateParams
{
Model = "claude-sonnet-4-6",
MaxTokens = 16384,
Thinking = new BetaThinkingConfigEnabled { BudgetTokens = 16384 },
OutputConfig = new BetaOutputConfig
{
Effort = Effort.Medium
},
Betas = [AnthropicBeta.InterleavedThinking2025_05_14],
Messages = [new() { Role = Role.User, Content = "Your prompt here" }]
};
var message = await client.Beta.Messages.Create(parameters);
Console.WriteLine(message);
package main
import (
"context"
"fmt"
"log"
"github.com/anthropics/anthropic-sdk-go"
)
func main() {
client := anthropic.NewClient()
response, err := client.Beta.Messages.New(context.TODO(), anthropic.BetaMessageNewParams{
Model: "claude-sonnet-4-6",
MaxTokens: 16384,
Thinking: anthropic.BetaThinkingConfigParamOfEnabled(16384),
OutputConfig: anthropic.BetaOutputConfigParam{
Effort: anthropic.BetaOutputConfigEffortMedium,
},
Messages: []anthropic.BetaMessageParam{
anthropic.NewBetaUserMessage(anthropic.NewBetaTextBlock("Your prompt here")),
},
Betas: []anthropic.AnthropicBeta{anthropic.AnthropicBetaInterleavedThinking2025_05_14},
})
if err != nil {
log.Fatal(err)
}
fmt.Println(response)
}
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.beta.messages.MessageCreateParams;
import com.anthropic.models.beta.messages.BetaMessage;
import com.anthropic.models.messages.Model;
import com.anthropic.models.beta.AnthropicBeta;
import com.anthropic.models.beta.messages.BetaThinkingConfigEnabled;
import com.anthropic.models.beta.messages.BetaOutputConfig;
public class Main {
public static void main(String[] args) {
AnthropicClient client = AnthropicOkHttpClient.fromEnv();
MessageCreateParams params = MessageCreateParams.builder()
.model(Model.CLAUDE_SONNET_4_6)
.maxTokens(16384L)
.thinking(BetaThinkingConfigEnabled.builder()
.budgetTokens(16384L)
.build())
.outputConfig(BetaOutputConfig.builder()
.effort(BetaOutputConfig.Effort.MEDIUM)
.build())
.addBeta(AnthropicBeta.INTERLEAVED_THINKING_2025_05_14)
.addUserMessage("Your prompt here")
.build();
BetaMessage response = client.beta().messages().create(params);
System.out.println(response);
}
}
<?php
use Anthropic\Client;
$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));
$message = $client->beta->messages->create(
maxTokens: 16384,
messages: [['role' => 'user', 'content' => 'Your prompt here']],
model: 'claude-sonnet-4-6',
thinking: ['type' => 'enabled', 'budget_tokens' => 16384],
outputConfig: ['effort' => 'medium'],
betas: ['interleaved-thinking-2025-05-14'],
);
echo array_find($message->content, fn($block) => $block->type === 'text')->text;
require "anthropic"
client = Anthropic::Client.new
message = client.beta.messages.create(
model: "claude-sonnet-4-6",
max_tokens: 16384,
thinking: {
type: "enabled",
budget_tokens: 16384
},
output_config: {
effort: "medium"
},
betas: ["interleaved-thinking-2025-05-14"],
messages: [
{ role: "user", content: "Your prompt here" }
]
)
puts message.content.find { |block| block.type == :text }.text
Chat and non-coding use cases
For chat, content generation, search, classification, and other non-coding tasks, start with low effort with extended thinking. If you need more depth, increase effort to medium.
curl https://api.anthropic.com/v1/messages \
--header "x-api-key: $ANTHROPIC_API_KEY" \
--header "anthropic-version: 2023-06-01" \
--header "anthropic-beta: interleaved-thinking-2025-05-14" \
--header "content-type: application/json" \
--data \
'{
"model": "claude-sonnet-4-6",
"max_tokens": 8192,
"thinking": {
"type": "enabled",
"budget_tokens": 16384
},
"output_config": {
"effort": "low"
},
"messages": [
{
"role": "user",
"content": "Your prompt here"
}
]
}'
ant beta:messages create --beta interleaved-thinking-2025-05-14 <<'YAML'
model: claude-sonnet-4-6
max_tokens: 8192
thinking:
  type: enabled
  budget_tokens: 16384
output_config:
  effort: low
messages:
  - role: user
    content: Your prompt here
YAML
import anthropic
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8192,
    thinking={"type": "enabled", "budget_tokens": 16384},
    output_config={"effort": "low"},
    betas=["interleaved-thinking-2025-05-14"],
    messages=[{"role": "user", "content": "Your prompt here"}],
)
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const response = await client.beta.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 8192,
  thinking: { type: "enabled", budget_tokens: 16384 },
  output_config: { effort: "low" },
  betas: ["interleaved-thinking-2025-05-14"],
  messages: [{ role: "user", content: "Your prompt here" }]
});
using Anthropic;
using Anthropic.Models.Beta;
using Anthropic.Models.Beta.Messages;
AnthropicClient client = new();
var parameters = new MessageCreateParams
{
Model = "claude-sonnet-4-6",
MaxTokens = 8192,
Thinking = new BetaThinkingConfigEnabled { BudgetTokens = 16384 },
OutputConfig = new BetaOutputConfig
{
Effort = Effort.Low
},
Betas = [AnthropicBeta.InterleavedThinking2025_05_14],
Messages = [new() { Role = Role.User, Content = "Your prompt here" }]
};
var message = await client.Beta.Messages.Create(parameters);
Console.WriteLine(message);
package main
import (
"context"
"fmt"
"log"
"github.com/anthropics/anthropic-sdk-go"
)
func main() {
client := anthropic.NewClient()
response, err := client.Beta.Messages.New(context.TODO(), anthropic.BetaMessageNewParams{
Model: "claude-sonnet-4-6",
MaxTokens: 8192,
Thinking: anthropic.BetaThinkingConfigParamOfEnabled(16384),
OutputConfig: anthropic.BetaOutputConfigParam{
Effort: anthropic.BetaOutputConfigEffortLow,
},
Messages: []anthropic.BetaMessageParam{
anthropic.NewBetaUserMessage(anthropic.NewBetaTextBlock("Your prompt here")),
},
Betas: []anthropic.AnthropicBeta{anthropic.AnthropicBetaInterleavedThinking2025_05_14},
})
if err != nil {
log.Fatal(err)
}
fmt.Println(response)
}
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.beta.messages.MessageCreateParams;
import com.anthropic.models.beta.messages.BetaMessage;
import com.anthropic.models.messages.Model;
import com.anthropic.models.beta.AnthropicBeta;
import com.anthropic.models.beta.messages.BetaThinkingConfigEnabled;
import com.anthropic.models.beta.messages.BetaOutputConfig;
public class Main {
public static void main(String[] args) {
AnthropicClient client = AnthropicOkHttpClient.fromEnv();
MessageCreateParams params = MessageCreateParams.builder()
.model(Model.CLAUDE_SONNET_4_6)
.maxTokens(8192L)
.thinking(BetaThinkingConfigEnabled.builder()
.budgetTokens(16384L)
.build())
.outputConfig(BetaOutputConfig.builder()
.effort(BetaOutputConfig.Effort.LOW)
.build())
.addBeta(AnthropicBeta.INTERLEAVED_THINKING_2025_05_14)
.addUserMessage("Your prompt here")
.build();
BetaMessage response = client.beta().messages().create(params);
System.out.println(response);
}
}
<?php
use Anthropic\Client;
$client = new Client(apiKey: getenv("ANTHROPIC_API_KEY"));
$message = $client->beta->messages->create(
maxTokens: 8192,
messages: [['role' => 'user', 'content' => 'Your prompt here']],
model: 'claude-sonnet-4-6',
thinking: ['type' => 'enabled', 'budget_tokens' => 16384],
outputConfig: ['effort' => 'low'],
betas: ['interleaved-thinking-2025-05-14'],
);
echo array_find($message->content, fn($block) => $block->type === 'text')->text;
require "anthropic"
client = Anthropic::Client.new
message = client.beta.messages.create(
model: "claude-sonnet-4-6",
max_tokens: 8192,
thinking: {
type: "enabled",
budget_tokens: 16384
},
output_config: {
effort: "low"
},
betas: ["interleaved-thinking-2025-05-14"],
messages: [
{ role: "user", content: "Your prompt here" }
]
)
puts message.content.find { |block| block.type == :text }.text
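The starting points recommended in the Sonnet 4.6 sections above can be collected into a lookup, with a one-step adjustment after you evaluate. This is only a sketch of that tuning loop: the workload labels, STARTING_EFFORT, and adjust_effort are hypothetical names, and the ladder covers only the three effort levels this section mentions.

```python
# Starting effort levels taken from the guidance above; the keys are hypothetical labels.
STARTING_EFFORT = {
    "no_thinking": "low",        # parity with Sonnet 4.5 without extended thinking
    "autonomous_agent": "high",  # adaptive thinking for multi-step agents
    "coding_agentic": "medium",  # coding, frontend design, tool-heavy workflows
    "chat": "low",               # chat, content generation, search, classification
}

EFFORT_LADDER = ["low", "medium", "high"]


def adjust_effort(current: str, *, too_slow: bool = False, needs_more_depth: bool = False) -> str:
    """Move one step along the ladder based on what your evaluation showed."""
    index = EFFORT_LADDER.index(current)
    if too_slow and index > 0:
        return EFFORT_LADDER[index - 1]
    if needs_more_depth and index < len(EFFORT_LADDER) - 1:
        return EFFORT_LADDER[index + 1]
    return current


effort = STARTING_EFFORT["coding_agentic"]
print(adjust_effort(effort, too_slow=True))
```

Moving one step at a time keeps each change attributable when you compare quality, latency, and token usage between runs.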
Sonnet 4.6 migration checklist
- Update model ID to claude-sonnet-4-6
- BREAKING: Remove assistant message prefilling; use structured outputs or output_config.format instead
- BREAKING: Verify tool parameter JSON parsing handles escaping differences
- BREAKING: Update tool versions to latest (text_editor_20250728, code_execution_20250825); legacy versions are not supported (if migrating from 3.x)
- BREAKING: Remove any code using the undo_edit command (if applicable)
- BREAKING: Update sampling parameters to use only temperature OR top_p, not both (if migrating from 3.x)
- Handle new refusal stop reason in your application
- Remove fine-grained-tool-streaming-2025-05-14 beta header (now GA)
- Migrate output_format to output_config.format
- Review and update prompts following prompting best practices
- Recommended: Migrate from thinking: {type: "enabled", budget_tokens: N} to thinking: {type: "adaptive"} with the effort parameter (budget_tokens is deprecated and will be removed in a future release)
- Test in development environment before production deployment
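Several of these checklist items can be detected mechanically before a request is sent. The sketch below scans a request dictionary and returns reminders. It is illustrative only: migration_warnings is a hypothetical helper, and it assumes the old output_format field sits at the top level of the request and that beta headers are available as a list of strings.

```python
from typing import Sequence


def migration_warnings(params: dict, beta_headers: Sequence[str] = ()) -> list:
    """Collect Sonnet 4.6 checklist reminders for one request (hypothetical helper)."""
    warnings = []
    messages = params.get("messages", [])
    # A trailing assistant message is a prefill, which the checklist says to remove.
    if messages and messages[-1].get("role") == "assistant":
        warnings.append("remove assistant prefill; use output_config.format instead")
    # Assumption: the old field is a top-level key named output_format.
    if "output_format" in params:
        warnings.append("move output_format to output_config.format")
    thinking = params.get("thinking") or {}
    if thinking.get("type") == "enabled":
        warnings.append("budget_tokens is deprecated; consider adaptive thinking")
    if "fine-grained-tool-streaming-2025-05-14" in beta_headers:
        warnings.append("remove fine-grained-tool-streaming-2025-05-14 beta header")
    return warnings


request = {
    "model": "claude-sonnet-4-6",
    "thinking": {"type": "enabled", "budget_tokens": 16384},
    "messages": [
        {"role": "user", "content": "Return JSON"},
        {"role": "assistant", "content": "{"},
    ],
}
for warning in migration_warnings(request):
    print(warning)
```

A check like this does not replace testing in a development environment; it only surfaces the items that are visible in the request itself.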
Migrating to Claude Sonnet 4.5
Claude Sonnet 4.5 combines strong intelligence with fast performance, making it ideal for everyday coding, analysis, and content tasks.
For a complete overview of capabilities, see the models overview.
Sonnet 4.5 pricing is $3 per million input tokens, $15 per million output tokens. See Claude pricing for details.
Update your model name:
# From Sonnet 4
model = "claude-sonnet-4-20250514" # Before
model = "claude-sonnet-4-5-20250929" # After
# From Sonnet 3.7
model = "claude-3-7-sonnet-20250219" # Before
model = "claude-sonnet-4-5-20250929" # After
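If model IDs are scattered across configuration, one option is to route them through a single mapping so the swap happens in one place. This is a minimal sketch; MODEL_UPGRADES and upgrade_model are hypothetical names, and the mapping contains only the IDs listed above.

```python
# Old-to-new model IDs from the snippet above.
MODEL_UPGRADES = {
    "claude-sonnet-4-20250514": "claude-sonnet-4-5-20250929",
    "claude-3-7-sonnet-20250219": "claude-sonnet-4-5-20250929",
}


def upgrade_model(model: str) -> str:
    """Return the Sonnet 4.5 ID for a known older ID; leave other IDs untouched."""
    return MODEL_UPGRADES.get(model, model)


print(upgrade_model("claude-3-7-sonnet-20250219"))
```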
Breaking changes
These breaking changes apply when migrating from Claude 3.x Sonnet models.
- Update sampling parameters
  Warning: This is a breaking change when migrating from Claude 3.x models.
  Use only temperature OR top_p, not both.
- Update tool versions
  Warning: This is a breaking change when migrating from Claude 3.x models.
  Update to the latest tool versions (text_editor_20250728, code_execution_20250825). Remove any code using the undo_edit command.
- Handle the refusal stop reason
  Update your application to handle refusal stop reasons.
- Update your prompts for behavioral changes
  Claude 4 models have a more concise, direct communication style. Review prompting best practices for optimization guidance.
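Handling the refusal stop reason usually means checking stop_reason before reading the content blocks. The sketch below shows one way to do that. The helper name text_or_none is hypothetical, and the SimpleNamespace objects are stand-ins shaped like an SDK response so the example runs without an API call.

```python
from types import SimpleNamespace
from typing import Optional


def text_or_none(message) -> Optional[str]:
    """Return the reply text, or None when the model refused (hypothetical helper)."""
    if message.stop_reason == "refusal":
        # Treat a refusal as its own outcome instead of parsing partial content.
        return None
    return "".join(block.text for block in message.content if block.type == "text")


# Stand-in objects with the same attribute names as a Messages API response.
ok = SimpleNamespace(
    stop_reason="end_turn",
    content=[SimpleNamespace(type="text", text="Hello")],
)
refused = SimpleNamespace(stop_reason="refusal", content=[])

print(text_or_none(ok))
print(text_or_none(refused))
```

What to do with a None result (show a fallback message, log it, or retry with a different prompt) depends on your application.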
Sonnet 4.5 migration checklist
- Update model ID to claude-sonnet-4-5-20250929
- BREAKING: Update tool versions to latest (text_editor_20250728, code_execution_20250825); legacy versions are not supported (if migrating from 3.x)
- BREAKING: Remove any code using the undo_edit command (if applicable)
- BREAKING: Update sampling parameters to use only temperature OR top_p, not both (if migrating from 3.x)
- Handle new refusal stop reason in your application
- Review and update prompts following prompting best practices
- Consider enabling extended thinking for complex reasoning tasks
- Test in development environment before production deployment
Migrating to Claude Haiku 4.5
Claude Haiku 4.5 is the fastest and most intelligent Haiku model with near-frontier performance, delivering premium model quality for interactive applications and high-volume processing.
For a complete overview of capabilities, see the models overview.
Haiku 4.5 pricing is $1 per million input tokens, $5 per million output tokens. See Claude pricing for details.
Update your model name:
# From Haiku 3.5
model = "claude-3-5-haiku-20241022" # Before
model = "claude-haiku-4-5-20251001" # After
Review new rate limits: Haiku 4.5 has separate rate limits from Haiku 3.5. See Rate limits documentation for details.
For significant performance improvements on coding and reasoning tasks, consider enabling extended thinking with thinking: {type: "enabled", budget_tokens: N}.
Extended thinking impacts prompt caching efficiency.
Extended thinking is deprecated in Claude 4.6 models and removed in Claude Opus 4.7. If using newer models, use adaptive thinking instead.
Explore new capabilities: See the models overview for details on context awareness, increased output capacity (64k tokens), higher intelligence, and improved speed.
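If you enable extended thinking on Haiku 4.5, the request needs a budget and enough output room for the answer itself. The sketch below builds such a request. It assumes the API's documented constraints that budget_tokens must be at least 1,024 and lower than max_tokens; the helper name and the default sizes are hypothetical.

```python
MIN_BUDGET_TOKENS = 1024  # assumed minimum thinking budget accepted by the API


def haiku_thinking_params(prompt: str, budget_tokens: int = 4096, answer_tokens: int = 4096) -> dict:
    """Build Messages API parameters for Haiku 4.5 with extended thinking (hypothetical helper)."""
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(f"budget_tokens must be at least {MIN_BUDGET_TOKENS}")
    return {
        "model": "claude-haiku-4-5-20251001",
        # Leave room for the answer on top of the thinking budget.
        "max_tokens": budget_tokens + answer_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }


params = haiku_thinking_params("Your prompt here")
print(params["max_tokens"], params["thinking"])
```

With the Python SDK, the result can be passed as client.messages.create(**params).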
Breaking changes
These breaking changes apply when migrating from Claude 3.x Haiku models.
- Update sampling parameters
  Warning: This is a breaking change when migrating from Claude 3.x models.
  Use only temperature OR top_p, not both.
- Update tool versions
  Warning: This is a breaking change when migrating from Claude 3.x models.
  Update to the latest tool versions (text_editor_20250728, code_execution_20250825). Remove any code using the undo_edit command.
- Handle the refusal stop reason
  Update your application to handle refusal stop reasons.
- Update your prompts for behavioral changes
  Claude 4 models have a more concise, direct communication style. Review prompting best practices for optimization guidance.
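The first two breaking changes can be caught by inspecting a request before it is sent. The following sketch flags requests that set both sampling parameters or reference an older version of the two tools named above. The helper name breaking_change_issues is hypothetical, and text_editor_20250124 appears only as an example of a legacy version string.

```python
# Latest versions as named in this guide, keyed by tool family prefix.
LATEST_TOOL_VERSIONS = {
    "text_editor": "text_editor_20250728",
    "code_execution": "code_execution_20250825",
}


def breaking_change_issues(params: dict) -> list:
    """List 3.x-era settings in a request that need attention (hypothetical helper)."""
    issues = []
    if "temperature" in params and "top_p" in params:
        issues.append("set only one of temperature or top_p")
    for tool in params.get("tools", []):
        tool_type = tool.get("type", "")  # custom tools may not set a versioned type
        for family, latest in LATEST_TOOL_VERSIONS.items():
            if tool_type.startswith(family + "_") and tool_type != latest:
                issues.append(f"update {tool_type} to {latest}")
    return issues


legacy_request = {
    "temperature": 0.7,
    "top_p": 0.9,
    "tools": [{"type": "text_editor_20250124", "name": "str_replace_editor"}],
}
for issue in breaking_change_issues(legacy_request):
    print(issue)
```

The check only reports the version string. The tool name and supported commands can also differ between versions, so confirm them against the tool documentation when you update.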
Haiku 4.5 migration checklist
- Update model ID to claude-haiku-4-5-20251001
- BREAKING: Update tool versions to latest (text_editor_20250728, code_execution_20250825); legacy versions are not supported
- BREAKING: Remove any code using the undo_edit command (if applicable)
- BREAKING: Update sampling parameters to use only temperature OR top_p, not both
- Handle new refusal stop reason in your application
- Review and adjust for new rate limits (separate from Haiku 3.5)
- Review and update prompts following prompting best practices
- Consider enabling extended thinking for complex reasoning tasks
- Test in development environment before production deployment
Get help
- Check the API documentation for detailed specifications
- Review model capabilities for performance comparisons
- Review API release notes for API updates
- Contact support if you encounter any issues during migration