Fast mode (beta: research preview)

Higher output speed for Claude Opus 4.6 and Claude Opus 4.7, delivering significantly faster token generation for latency-sensitive and agentic workflows.


Fast mode provides significantly faster output token generation for Claude Opus 4.6 and Claude Opus 4.7. By setting speed: "fast" in your API request, you get up to 2.5x higher output tokens per second from the same model at premium pricing.

Note

Fast mode is in beta (research preview). Join the waitlist to request access. Availability is limited while Anthropic gathers feedback.

Note

This feature is eligible for Zero Data Retention (ZDR). When your organization has a ZDR arrangement, data sent through this feature is not stored after the API response is returned.

Supported models

Fast mode is supported on the following models:

  • Claude Opus 4.7 (claude-opus-4-7)
  • Claude Opus 4.6 (claude-opus-4-6)

How fast mode works

Fast mode runs the same model with a faster inference configuration. There is no change to intelligence or capabilities.

  • Up to 2.5x higher output tokens per second compared to standard speed
  • Speed benefits are focused on output tokens per second (OTPS), not time to first token (TTFT)
  • Same model weights and behavior (not a different model)

Basic usage

curl https://api.anthropic.com/v1/messages \
    --header "x-api-key: $ANTHROPIC_API_KEY" \
    --header "anthropic-version: 2023-06-01" \
    --header "anthropic-beta: fast-mode-2026-02-01" \
    --header "content-type: application/json" \
    --data '{
        "model": "claude-opus-4-7",
        "max_tokens": 4096,
        "speed": "fast",
        "messages": [{
            "role": "user",
            "content": "Refactor this module to use dependency injection"
        }]
    }'
ant beta:messages create \
  --beta fast-mode-2026-02-01 \
  --transform 'content.0.text' --raw-output <<'YAML'
model: claude-opus-4-7
max_tokens: 4096
speed: fast
messages:
  - role: user
    content: Refactor this module to use dependency injection
YAML
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    speed="fast",
    betas=["fast-mode-2026-02-01"],
    messages=[
        {"role": "user", "content": "Refactor this module to use dependency injection"}
    ],
)

print(response.content[0].text)
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.beta.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 4096,
  speed: "fast",
  betas: ["fast-mode-2026-02-01"],
  messages: [
    {
      role: "user",
      content: "Refactor this module to use dependency injection"
    }
  ]
});

const textBlock = response.content.find(
  (block): block is Anthropic.Beta.Messages.BetaTextBlock => block.type === "text"
);
console.log(textBlock?.text);
using Anthropic;
using Anthropic.Models.Beta.Messages;

AnthropicClient client = new();

var response = await client.Beta.Messages.Create(new MessageCreateParams
{
    Model = "claude-opus-4-7",
    MaxTokens = 4096,
    Speed = Speed.Fast,
    Betas = ["fast-mode-2026-02-01"],
    Messages = [
        new() { Role = Role.User, Content = "Refactor this module to use dependency injection" }
    ],
});

Console.WriteLine(response);
package main

import (
	"context"
	"fmt"
	"log"

	anthropic "github.com/anthropics/anthropic-sdk-go"
)

func main() {
	client := anthropic.NewClient()

	response, err := client.Beta.Messages.New(context.TODO(), anthropic.BetaMessageNewParams{
		Model:     anthropic.ModelClaudeOpus4_7,
		MaxTokens: 4096,
		Speed:     anthropic.BetaMessageNewParamsSpeedFast,
		Betas:     []anthropic.AnthropicBeta{anthropic.AnthropicBetaFastMode2026_02_01},
		Messages: []anthropic.BetaMessageParam{
			anthropic.NewBetaUserMessage(anthropic.NewBetaTextBlock("Refactor this module to use dependency injection")),
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(response.Content[0].AsText().Text)
}
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.beta.AnthropicBeta;
import com.anthropic.models.beta.messages.BetaMessage;
import com.anthropic.models.beta.messages.MessageCreateParams;
import com.anthropic.models.messages.Model;

void main() {
    AnthropicClient client = AnthropicOkHttpClient.fromEnv();

    BetaMessage response = client.beta().messages().create(
            MessageCreateParams.builder()
                    .model(Model.CLAUDE_OPUS_4_7)
                    .maxTokens(4096L)
                    .speed(MessageCreateParams.Speed.FAST)
                    .addBeta(AnthropicBeta.FAST_MODE_2026_02_01)
                    .addUserMessage("Refactor this module to use dependency injection")
                    .build());

    IO.println(response.content().get(0).text().get().text());
}
<?php

use Anthropic\Client;

$client = new Client();

$response = $client->beta->messages->create(
    model: 'claude-opus-4-7',
    maxTokens: 4096,
    speed: 'fast',
    betas: ['fast-mode-2026-02-01'],
    messages: [
        ['role' => 'user', 'content' => 'Refactor this module to use dependency injection'],
    ],
);

echo $response->content[0]->text;
require "anthropic"

client = Anthropic::Client.new

response = client.beta.messages.create(
  model: "claude-opus-4-7",
  max_tokens: 4096,
  speed: "fast",
  betas: ["fast-mode-2026-02-01"],
  messages: [{role: "user", content: "Refactor this module to use dependency injection"}]
)

puts response.content[0].text

Pricing

Fast mode is priced at 6x standard Opus rates across the full context window, including requests over 200k input tokens. The following table shows pricing for Claude Opus 4.6 and Claude Opus 4.7 with fast mode:

InputOutput
$30 / MTok$150 / MTok

Fast mode pricing stacks with other pricing modifiers:

For complete pricing details, see the pricing page.

Rate limits

Fast mode has a dedicated rate limit that is separate from standard Opus rate limits. When your fast mode rate limit is exceeded, the API returns a 429 error with a retry-after header indicating when capacity will be available.

The response includes headers that indicate your fast mode rate limit status:

HeaderDescription
anthropic-fast-input-tokens-limitMaximum fast mode input tokens per minute
anthropic-fast-input-tokens-remainingRemaining fast mode input tokens
anthropic-fast-input-tokens-resetTime when the fast mode input token limit resets
anthropic-fast-output-tokens-limitMaximum fast mode output tokens per minute
anthropic-fast-output-tokens-remainingRemaining fast mode output tokens
anthropic-fast-output-tokens-resetTime when the fast mode output token limit resets

For tier-specific rate limits, see the rate limits page.

Checking which speed was used

The response usage object includes a speed field that indicates which speed was used, either "fast" or "standard":

curl https://api.anthropic.com/v1/messages \
    --header "x-api-key: $ANTHROPIC_API_KEY" \
    --header "anthropic-version: 2023-06-01" \
    --header "anthropic-beta: fast-mode-2026-02-01" \
    --header "content-type: application/json" \
    --data '{
        "model": "claude-opus-4-7",
        "max_tokens": 1024,
        "speed": "fast",
        "messages": [{"role": "user", "content": "Hello"}]
    }'
ant beta:messages create --beta fast-mode-2026-02-01 \
  --transform usage.speed --raw-output <<'YAML'
model: claude-opus-4-7
max_tokens: 1024
speed: fast
messages:
  - role: user
    content: Hello
YAML
response = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    speed="fast",
    betas=["fast-mode-2026-02-01"],
    messages=[{"role": "user", "content": "Hello"}],
)

print(response.usage.speed)  # "fast" or "standard"
const response = await client.beta.messages.create({
  model: "claude-opus-4-7",
  max_tokens: 1024,
  speed: "fast",
  betas: ["fast-mode-2026-02-01"],
  messages: [{ role: "user", content: "Hello" }]
});

console.log(response.usage.speed); // "fast" or "standard"
using Anthropic;
using Anthropic.Models.Beta.Messages;

AnthropicClient client = new();

var response = await client.Beta.Messages.Create(new MessageCreateParams
{
    Model = "claude-opus-4-7",
    MaxTokens = 1024,
    Speed = Speed.Fast,
    Betas = ["fast-mode-2026-02-01"],
    Messages = [new() { Role = Role.User, Content = "Hello" }],
});

Console.WriteLine(response.Usage.Speed);  // "fast" or "standard"
package main

import (
	"context"
	"fmt"
	"log"

	anthropic "github.com/anthropics/anthropic-sdk-go"
)

func main() {
	client := anthropic.NewClient()

	response, err := client.Beta.Messages.New(context.TODO(), anthropic.BetaMessageNewParams{
		Model:     anthropic.ModelClaudeOpus4_7,
		MaxTokens: 1024,
		Speed:     anthropic.BetaMessageNewParamsSpeedFast,
		Betas:     []anthropic.AnthropicBeta{anthropic.AnthropicBetaFastMode2026_02_01},
		Messages: []anthropic.BetaMessageParam{
			anthropic.NewBetaUserMessage(anthropic.NewBetaTextBlock("Hello")),
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(response.Usage.Speed) // "fast" or "standard"
}
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.models.beta.AnthropicBeta;
import com.anthropic.models.beta.messages.BetaMessage;
import com.anthropic.models.beta.messages.MessageCreateParams;
import com.anthropic.models.messages.Model;

void main() {
    AnthropicClient client = AnthropicOkHttpClient.fromEnv();

    MessageCreateParams params = MessageCreateParams.builder()
            .model(Model.CLAUDE_OPUS_4_7)
            .maxTokens(1024L)
            .speed(MessageCreateParams.Speed.FAST)
            .addBeta(AnthropicBeta.FAST_MODE_2026_02_01)
            .addUserMessage("Hello")
            .build();

    BetaMessage response = client.beta().messages().create(params);
    IO.println(response.usage().speed());  // "fast" or "standard"
}
<?php

use Anthropic\Client;

$client = new Client();

$response = $client->beta->messages->create(
    model: 'claude-opus-4-7',
    maxTokens: 1024,
    speed: 'fast',
    betas: ['fast-mode-2026-02-01'],
    messages: [['role' => 'user', 'content' => 'Hello']],
);

echo $response->usage->speed;  // "fast" or "standard"
response = client.beta.messages.create(
  model: "claude-opus-4-7",
  max_tokens: 1024,
  speed: "fast",
  betas: ["fast-mode-2026-02-01"],
  messages: [{ role: "user", content: "Hello" }]
)

puts(response.usage.speed)  # "fast" or "standard"
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [{ "type": "text", "text": "Hello!" }],
  "model": "claude-opus-4-7",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 8,
    "output_tokens": 12,
    "speed": "fast"
  }
}

To track fast mode usage and costs across your organization, see the Usage and Cost API.

Retries and fallback

Automatic retries

When fast mode rate limits are exceeded, the API returns a 429 error with a retry-after header. The Anthropic SDKs automatically retry these requests up to 2 times by default (configurable via max_retries), waiting for the server-specified delay before each retry. Since fast mode uses continuous token replenishment, the retry-after delay is typically short and requests succeed once capacity is available.

Falling back to standard speed

If you'd prefer to fall back to standard speed rather than wait for fast mode capacity, catch the rate limit error and retry without speed: "fast". Set max_retries to 0 on the initial fast request to skip automatic retries and fail immediately on rate limit errors.

Note

Falling back from fast to standard speed will result in a prompt cache miss. Requests at different speeds do not share cached prefixes.

Since setting max_retries to 0 also disables retries for other transient errors (overloaded, internal server errors), the examples below re-issue the original request with default retries for those cases.

# `ant` retries 429/5xx automatically and has no per-request max_retries
# override, so on a fast-mode 429 the fallback runs after the built-in
# retries exhaust. --transform-error surfaces error.type for branching.
create_message_with_fast_fallback() {
  local speed="$1" max_attempts="${2:-3}" body out
  body=${3:-$(cat)}
  out=$(
    ant beta:messages create --beta fast-mode-2026-02-01 \
      ${speed:+--speed "$speed"} \
      --transform-error error.type --format-error yaml <<<"$body" 2>/dev/null
  ) && { printf '%s\n' "$out"; return; }
  case "$out" in
    rate_limit_error)
      if [[ -n "$speed" ]]; then
        create_message_with_fast_fallback "" "$max_attempts" "$body"
        return
      fi ;;
    overloaded_error | api_error | "")
      if (( max_attempts > 1 )); then
        create_message_with_fast_fallback "$speed" $((max_attempts - 1)) "$body"
        return
      fi ;;
  esac
  printf '%s\n' "${out:-connection_error}" >&2
  return 1
}

MESSAGE=$(
  create_message_with_fast_fallback fast <<'YAML'
model: claude-opus-4-7
max_tokens: 1024
messages:
  - role: user
    content: Hello
YAML
)
import anthropic

client = anthropic.Anthropic()


def create_message_with_fast_fallback(max_retries=None, max_attempts=3, **params):
    try:
        return client.beta.messages.create(**params, max_retries=max_retries)
    except anthropic.RateLimitError:
        if params.get("speed") == "fast":
            del params["speed"]
            return create_message_with_fast_fallback(**params)
        raise
    except (
        anthropic.APIStatusError,
        anthropic.APIConnectionError,
    ) as error:
        if isinstance(error, anthropic.APIStatusError) and error.status_code < 500:
            raise
        if max_attempts > 1:
            return create_message_with_fast_fallback(
                max_attempts=max_attempts - 1, **params
            )
        raise


message = create_message_with_fast_fallback(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    betas=["fast-mode-2026-02-01"],
    speed="fast",
    max_retries=0,
)
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
(async () => {
  async function createMessageWithFastFallback(
    params: Anthropic.Beta.MessageCreateParams,
    requestOptions?: Anthropic.RequestOptions,
    maxAttempts: number = 3
  ): Promise<Anthropic.Beta.Messages.BetaMessage> {
    try {
      return (await client.beta.messages.create(
        params,
        requestOptions
      )) as Anthropic.Beta.Messages.BetaMessage;
    } catch (e) {
      if (e instanceof Anthropic.RateLimitError && params.speed === "fast") {
        const { speed, ...rest } = params;
        return createMessageWithFastFallback(rest);
      }
      if (
        e instanceof Anthropic.InternalServerError ||
        e instanceof Anthropic.APIConnectionError
      ) {
        if (maxAttempts > 1) {
          return createMessageWithFastFallback(params, undefined, maxAttempts - 1);
        }
      }
      throw e;
    }
  }

  const message = await createMessageWithFastFallback(
    {
      model: "claude-opus-4-7",
      max_tokens: 1024,
      messages: [{ role: "user", content: "Hello" }],
      betas: ["fast-mode-2026-02-01"],
      speed: "fast"
    },
    { maxRetries: 0 }
  );
})();
using Anthropic;
using Anthropic.Exceptions;
using Anthropic.Models.Beta.Messages;

AnthropicClient client = new();

async Task<BetaMessage> CreateMessageWithFastFallback(
    MessageCreateParams parameters,
    int? maxRetries = null,
    int maxAttempts = 3)
{
    try
    {
        var requestClient = maxRetries is int retries
            ? client.WithOptions(options => options with { MaxRetries = retries })
            : client;
        return await requestClient.Beta.Messages.Create(parameters);
    }
    catch (AnthropicRateLimitException)
    {
        if (parameters.Speed is not null)
        {
            return await CreateMessageWithFastFallback(
                parameters with { Speed = null });
        }
        throw;
    }
    catch (Anthropic5xxException)
    {
        if (maxAttempts > 1)
        {
            return await CreateMessageWithFastFallback(
                parameters, maxAttempts: maxAttempts - 1);
        }
        throw;
    }
}

var message = await CreateMessageWithFastFallback(
    new MessageCreateParams
    {
        Model = "claude-opus-4-7",
        MaxTokens = 1024,
        Messages = [new() { Role = Role.User, Content = "Hello" }],
        Betas = ["fast-mode-2026-02-01"],
        Speed = Speed.Fast,
    },
    maxRetries: 0);
package main

import (
	"context"
	"errors"
	"fmt"

	anthropic "github.com/anthropics/anthropic-sdk-go"
	"github.com/anthropics/anthropic-sdk-go/option"
)

func createMessageWithFastFallback(
	ctx context.Context,
	client *anthropic.Client,
	params anthropic.BetaMessageNewParams,
	maxAttempts int,
	opts ...option.RequestOption,
) (*anthropic.BetaMessage, error) {
	message, err := client.Beta.Messages.New(ctx, params, opts...)
	if err != nil {
		var apierr *anthropic.Error
		if errors.As(err, &apierr) && apierr.StatusCode == 429 && params.Speed != "" {
			params.Speed = ""
			return createMessageWithFastFallback(ctx, client, params, maxAttempts)
		}
		if (errors.As(err, &apierr) && apierr.StatusCode >= 500) || !errors.As(err, &apierr) {
			if maxAttempts > 1 {
				return createMessageWithFastFallback(ctx, client, params, maxAttempts-1)
			}
		}
		return nil, err
	}
	return message, nil
}

func main() {
	client := anthropic.NewClient()
	message, err := createMessageWithFastFallback(
		context.TODO(),
		&client,
		anthropic.BetaMessageNewParams{
			Model:     anthropic.ModelClaudeOpus4_7,
			MaxTokens: 1024,
			Messages: []anthropic.BetaMessageParam{
				anthropic.NewBetaUserMessage(anthropic.NewBetaTextBlock("Hello")),
			},
			Speed: "fast",
			Betas: []anthropic.AnthropicBeta{anthropic.AnthropicBetaFastMode2026_02_01},
		},
		3,
		option.WithMaxRetries(0),
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(message)
}
import com.anthropic.client.AnthropicClient;
import com.anthropic.client.okhttp.AnthropicOkHttpClient;
import com.anthropic.errors.InternalServerException;
import com.anthropic.errors.RateLimitException;
import com.anthropic.models.beta.AnthropicBeta;
import com.anthropic.models.beta.messages.BetaMessage;
import com.anthropic.models.beta.messages.MessageCreateParams;
import com.anthropic.models.messages.Model;
import java.util.Optional;

// Disable SDK auto-retry so the fallback logic below handles it
AnthropicClient client =
        AnthropicOkHttpClient.builder().fromEnv().maxRetries(0).build();

BetaMessage createMessageWithFastFallback(
        MessageCreateParams params, int maxAttempts) {
    try {
        return client.beta().messages().create(params);
    } catch (RateLimitException e) {
        if (params.speed().isPresent()) {
            MessageCreateParams retryParams = params.toBuilder()
                    .speed(Optional.empty())
                    .build();
            return createMessageWithFastFallback(retryParams, maxAttempts);
        }
        throw e;
    } catch (InternalServerException e) {
        if (maxAttempts > 1) {
            return createMessageWithFastFallback(params, maxAttempts - 1);
        }
        throw e;
    }
}

void main() {
    BetaMessage message = createMessageWithFastFallback(
            MessageCreateParams.builder()
                    .model(Model.CLAUDE_OPUS_4_7)
                    .maxTokens(1024L)
                    .addUserMessage("Hello")
                    .addBeta(AnthropicBeta.FAST_MODE_2026_02_01)
                    .speed(MessageCreateParams.Speed.FAST)
                    .build(),
            3);
    IO.println(message.content().get(0).text().get().text());
}
<?php

use Anthropic\Client;
use Anthropic\Core\Exceptions\APIConnectionException;
use Anthropic\Core\Exceptions\InternalServerException;
use Anthropic\Core\Exceptions\RateLimitException;
use Anthropic\RequestOptions;

$client = new Client();

function createMessageWithFastFallback(
    Client $client,
    array $params,
    ?RequestOptions $requestOptions = null,
    int $maxAttempts = 3,
) {
    try {
        return $client->beta->messages->create(
            ...$params,
            requestOptions: $requestOptions,
        );
    } catch (RateLimitException $e) {
        if (isset($params['speed'])) {
            unset($params['speed']);
            return createMessageWithFastFallback($client, $params);
        }
        throw $e;
    } catch (InternalServerException | APIConnectionException $e) {
        if ($maxAttempts > 1) {
            return createMessageWithFastFallback(
                $client, $params, maxAttempts: $maxAttempts - 1
            );
        }
        throw $e;
    }
}

$message = createMessageWithFastFallback(
    $client,
    [
        'model' => 'claude-opus-4-7',
        'maxTokens' => 1024,
        'messages' => [['role' => 'user', 'content' => 'Hello']],
        'betas' => ['fast-mode-2026-02-01'],
        'speed' => 'fast',
    ],
    RequestOptions::with(maxRetries: 0),
);
require "anthropic"

anthropic = Anthropic::Client.new

def create_message_with_fast_fallback(client, request_options: {}, max_attempts: 3, **params)
  client.beta.messages.create(**params, request_options: request_options)
rescue Anthropic::Errors::RateLimitError
  raise unless params[:speed] == "fast"
  params.delete(:speed)
  create_message_with_fast_fallback(client, **params)
rescue Anthropic::Errors::InternalServerError, Anthropic::Errors::APIConnectionError
  raise unless max_attempts > 1
  create_message_with_fast_fallback(client, max_attempts: max_attempts - 1, **params)
end

message = create_message_with_fast_fallback(
  anthropic,
  model: "claude-opus-4-7",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello" }],
  betas: ["fast-mode-2026-02-01"],
  speed: "fast",
  request_options: { max_retries: 0 }
)

Considerations

  • Prompt caching: Switching between fast and standard speed invalidates the prompt cache. Requests at different speeds do not share cached prefixes.
  • Supported models: Fast mode is supported on Claude Opus 4.6 and Claude Opus 4.7. Sending speed: "fast" with an unsupported model returns an error.
  • TTFT: Fast mode's benefits are focused on output tokens per second (OTPS), not time to first token (TTFT).
  • Batch API: Fast mode is not available with the Batch API.
  • Priority Tier: Fast mode is not available with Priority Tier.
  • Claude Platform on AWS: Fast mode is not currently available on Claude Platform on AWS.

Next steps