Using realtime models | OpenAI API

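gpt-realtime-2 is our most advanced reasoning voice model, built for low-latency, end-to-end speech applications. Compared with earlier realtime models, it can think before it speaks, follows instructions more reliably, supports a larger context window, and calls tools with greater precision.

To get the most from these gains, be more intentional in how you design prompts. Define the assistant's job, decision points, tool-calling behavior, and guardrails explicitly: what it should do, when it should do it, and what it should avoid.

Start simple. Do not over-prompt up front. Begin with the leanest prompt that could work, run evals, and then add instructions only for behaviors that fall short in testing.

Choose a model

Model	Best for	Prompting focus
`gpt-realtime-2`	You need the strongest realtime reasoning, tool calling, and instruction following.	Tune reasoning effort, preambles, tool policy, exact entity capture, and long-session state.
`gpt-realtime-1.5`	You need a fast, reliable, non-reasoning end-to-end voice model.	Follow the core realtime prompt structure and test latency-sensitive behavior.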

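Realtime 2.0 prompting guide

Use gpt-realtime-2 when a voice assistant needs stronger reasoning, tool selection, exact entity handling, or long-session state. Start with reasoning.effort: "low", test the default preamble behavior, and set explicit confirmation boundaries before any write action.

What changed in Realtime 2

Prompt Realtime 2 as a reasoning voice assistant, not as a basic voice bot.

Change	Impact on prompting
Reasoning	The model can reason internally about complex tasks before it speaks or calls a tool. Use preambles to avoid awkward silence or unnecessary filler.
Prompt precision matters more	Replace broad guidance such as "be helpful" with explicit triggers, actions, and exception rules: when to act, what to do, and when not to act.
Instruction conflicts cost more	Remove overlapping `always`, `never`, `only`, and `must` rules unless they are truly required. Define priority when rules conflict.
Tool behavior is more controllable	Specify when the assistant should act immediately, ask for missing information, confirm high-precision details, retry after a failure, or escalate.
Preambles are a first-class behavior	The model may speak a short update before longer reasoning or a tool-calling flow. Control when preambles appear, how short they are, and when to skip them.
Expanded context window	`gpt-realtime-2` expands the realtime context window from 32k to 128k tokens, which makes it better suited to long sessions and larger system prompts.

Preambles are not hidden chain of thought. They are short spoken updates such as "I'll look up that order now." Do not ask the model to reveal its internal reasoning.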

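Set reasoning effort

gpt-realtime-2 can trade latency for deeper reasoning. Use the lowest reasoning level that delivers the intelligence your workflow needs.

Starting at low works for most production voice agents. Adjust up or down based on task complexity, latency tolerance, and the cost of failure.

Effort	Best for	Examples
`minimal`	Lowest latency matters most and the task is simple.	Smart-home commands, timers, simple schedule lookups.
`low`	Fast responses with basic reasoning.	Customer support, order lookups, simple policy questions.
`medium`	The assistant must reason through multi-step tasks.	Technical support, troubleshooting, complex routing.
`high`	Deeper reasoning meaningfully improves success rates.	High-precision workflows, escalation decisions, tasks with constraints.
`xhigh`	Maximum reasoning is worth the extra latency and cost.	Complex planning, critical triage, high-risk tool orchestration.

Beyond the API setting, also guide the model on when to reason and how much.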

## Reasoning

- For direct answers, simple lookups, and short confirmations, respond quickly and do not reason.
- For multi-step tasks, tool decisions, troubleshooting, or escalation, reason before acting.
- Do not perform extended reasoning when the user's audio is unclear; ask for clarification instead.
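
The effort level from the table and the prompt guidance above are typically set together when a session is configured. The following is a minimal sketch, assuming the session is configured through a `session.update` event that accepts a `reasoning.effort` field. The payload shape and the `build_session_update` helper are assumptions for illustration and are not confirmed by this guide; check the API reference before relying on them.

```python
import json

# Effort levels listed in the reasoning effort table.
EFFORT_LEVELS = ("minimal", "low", "medium", "high", "xhigh")


def build_session_update(instructions: str, effort: str = "low") -> dict:
    """Build a hypothetical session.update payload with a reasoning effort."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown reasoning effort: {effort!r}")
    # Assumed payload layout; the real field names may differ.
    return {
        "type": "session.update",
        "session": {
            "model": "gpt-realtime-2",
            "instructions": instructions,
            "reasoning": {"effort": effort},
        },
    }


payload = build_session_update(
    "## Reasoning\n- For direct answers, respond quickly and do not reason."
)
print(json.dumps(payload, indent=2))
```

Defaulting to `low` mirrors the recommendation to start low and raise the level only when evals show that a workflow needs it.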

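Use preambles intentionally

Preambles are short spoken updates that keep a voice agent feeling responsive while it reasons, looks up information, or calls tools. Used well, they reassure the user that the assistant is working. Used poorly, they become filler and increase perceived latency.

gpt-realtime-2 generates preambles by default. Test the default behavior first. If it does not fit your product experience, adjust it explicitly.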

[Figure: Preamble generation and playback timeline]

## Preambles

Use short preambles only when they help the user understand that work is happening.

### When to use a preamble

Use a preamble when:

- you are about to call a tool that may take noticeable time;
- you need to reason through a multi-step request;
- you are checking records, availability, account state, or policy details;
- you are preparing an escalation or handoff;
- silence would make the assistant feel unresponsive.

When a preamble is needed, output it immediately before substantive reasoning or tool use.

### When to not use a preamble

Do not use a preamble when:

- the answer is direct and can be given immediately;
- the user is only confirming, correcting, or declining something;
- the audio is unclear and you need clarification;
- the latest audio is silence, background noise, hold music, TV audio, or side conversation;
- the tool call is lightweight and the user would not benefit from an update.

### Preamble style

When using a preamble:

- keep it natural, calm, and concise;
- vary the wording across turns;
- describe the action, not the internal reasoning;
- avoid filler.

Avoid phrases like:

- "Let me think..."
- "Hmm..."
- "One moment while I process that..."
- "I am now going to access the tool..."

### Preamble length

Use one short sentence.

Do not exceed two short sentences unless the user needs an explanation before a high-impact action.

### Prefer

- "I'll check that order now."
- "I'll look up your appointment details."
- "I'll verify that before we make any changes."
- "I'll check the policy and then give you the next step."
- "I'll pull that up so we can make sure it's the right account."

### Avoid

- "Let me think about that for a second."
- "Please wait while I process your request."
- "I'm going to use my tools now."
- "Interesting question. I will reason through this carefully."

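Control response length

gpt-realtime-2 follows length guidance best when the prompt states how much detail each task type needs. Rather than telling the model to "be concise," define what concise means in context: direct answers, tool results, troubleshooting, comparisons, and escalations may each need a different response length.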

## Verbosity

- Direct answers: Use 1-2 short sentences.
- Clarifying questions: Ask one question at a time.
- Tool results: Summarize the result first, then give only the next useful action.
- Product or option comparisons: Include key differences, tradeoffs, and who each option fits.
- Troubleshooting: Give one step at a time unless the user asks for the full procedure.
- Escalations: Briefly explain why escalation is needed and what will happen next.

Example:

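User: Which plan should I choose?

A: If you want the lowest cost, choose Basic. If you need team permissions and shared billing, choose Pro. If compliance review or admin controls matter to you, choose Enterprise.

Design tool behavior

gpt-realtime-2 is better at tool calling, but tool behavior still depends on how the prompt and tool specifications are designed. If the prompt does not define when to act, ask, confirm, or recover, the assistant may call tools too early, ask unnecessary questions, or repeat failed calls.

Set tool-calling eagerness

High eagerness works well for read-only, low-risk actions. Low eagerness is a better fit when a tool modifies data, triggers external effects, or depends on exact identifiers.

Tool type	Default behavior
Read-only, low-risk lookups	Call when intent and required fields are clear.
Read-only actions that use exact identifiers	Confirm the identifier before the lookup.
User-visible communication	Draft or summarize before sending.
Account changes	Confirm before calling.
Purchases, cancellations, payments	Confirm the amount, target, and consequence before calling.
Irreversible or high-impact actions	Confirm explicitly and offer escalation when appropriate.

Use this balanced default when you mix read and write actions. Adjust it for your use case.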

## Tools

Use only the tools explicitly provided in the current tool list. Do not invent, assume, simulate, or rename tools.

For read-only tools:

- Call the tool when the user's intent is clear and all required fields are available.
- Do not ask for confirmation unless the lookup depends on a high-precision identifier or there is meaningful risk of using the wrong record.
- Ask a clarification question only if a required field is missing, ambiguous, or conflicting.

For write tools or external actions:

- Summarize the intended action before calling the tool.
- Include the key consequence, such as what will be changed, sent, canceled, ordered, or charged.
- Ask for confirmation.
- Do not call the tool until the user clearly confirms.

For exact identifiers:

- Treat order IDs, tracking numbers, account numbers, confirmation codes, phone numbers, and email addresses as high precision.
- Normalize only when the field type is clear.
- Confirm the final value before account-specific lookups, validation, or write actions.

After tool calls:

- Only say an action was completed after the tool call succeeds.
- If the tool fails, explain the failure briefly, avoid raw errors, and give the user a clear next step.

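High-risk example:

User: Pay the remaining balance with my card.

Bad:

A: I've charged your card.

Good:

A: Just to confirm, you want me to charge the remaining balance of $248.16 to the credit card on file. Should I go ahead?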
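
The eagerness table can also be enforced in application code as a backstop to the prompt. The following is a minimal sketch, assuming each tool is tagged with one of the categories from the table. The `ToolKind` enum and the `may_call_tool` helper are hypothetical names for illustration, not part of any SDK.

```python
from enum import Enum


class ToolKind(Enum):
    """Tool categories taken from the eagerness table."""
    READ_ONLY = "read_only"
    READ_EXACT_ID = "read_exact_id"
    USER_VISIBLE_MESSAGE = "user_visible_message"
    ACCOUNT_CHANGE = "account_change"
    PAYMENT = "payment"
    IRREVERSIBLE = "irreversible"


# Only low-risk lookups may run as soon as intent and fields are clear.
CALL_IMMEDIATELY = {ToolKind.READ_ONLY}


def may_call_tool(kind: ToolKind, required_fields_present: bool,
                  user_confirmed: bool) -> bool:
    """Return True when the application should let the tool call proceed."""
    if not required_fields_present:
        return False
    if kind in CALL_IMMEDIATELY:
        return True
    # Every other category waits for an explicit confirmation.
    return user_confirmed


print(may_call_tool(ToolKind.PAYMENT, True, False))
print(may_call_tool(ToolKind.READ_ONLY, True, False))
```

In the payment example above, the "Bad" reply corresponds to acting while `user_confirmed` is still false.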

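Recover from tool failures

Tool failures are part of the conversation. Good recovery explains what happened and gives the user a clear next step.

Do not treat all failures the same way. Recovery should depend on the tool type, the failure mode, and the impact on the user. Some failures should be retried silently. Others require asking the user to clarify, correct an identifier, confirm a new action, or choose an alternate path.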

## Tool Failures

If a tool call fails:

1. Briefly explain what failed in user-friendly language.
2. Do not blame the user or expose raw tool errors.
3. If the failure may be due to an exact identifier, read back the value used and ask the user to correct it.
4. If the failure may be temporary, offer to retry once.
5. If the same failure happens repeatedly, offer an alternate path or escalation.

Do not repeatedly call the same tool with the same arguments after failure.

Do not ask for a different identifier until you have first checked whether the captured value was correct.

Bad:

Assistant: Something went wrong.

Good:

A: I couldn't find a record matching O-R-D-3-1-2-5-B-2-3. Did I mishear any part of that?
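
The retry rules in the prompt above can be mirrored on the application side so that a failed call is never repeated indefinitely with the same arguments. The following is a minimal sketch. The `ToolCallGuard` class and the step names it returns are hypothetical, and `lookup_order` is only an illustrative tool name.

```python
import json


class ToolCallGuard:
    """Track failed calls per (tool, arguments) pair."""

    def __init__(self, max_retries: int = 1):
        self.max_retries = max_retries
        self._failures = {}

    @staticmethod
    def _key(name: str, args: dict) -> str:
        # Sorting keys makes equal argument sets map to the same key.
        return name + ":" + json.dumps(args, sort_keys=True)

    def record_failure(self, name: str, args: dict) -> None:
        key = self._key(name, args)
        self._failures[key] = self._failures.get(key, 0) + 1

    def next_step(self, name: str, args: dict) -> str:
        failures = self._failures.get(self._key(name, args), 0)
        if failures == 0:
            return "call"
        if failures <= self.max_retries:
            return "offer_retry"
        return "offer_alternative_or_escalate"


guard = ToolCallGuard()
args = {"order_id": "ORD-3125B23"}
guard.record_failure("lookup_order", args)
print(guard.next_step("lookup_order", args))
```

A corrected identifier produces a different key, so the corrected value is treated as a fresh call rather than as a repeat of the failed one.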

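Keep tool availability in sync

Realtime models are eager to help. If the prompt mentions a tool that is not actually available, or the tool list does not match the prompt, the model may invent tool names or pretend it completed an action.

For example, if the prompt references lookup_order but the provided tool is named search_orders, the model may call the wrong name or simulate the action.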

## Tool Availability

Use only the tools that are explicitly provided in the current tool list.

Do not invent, assume, or simulate tools. If a tool is mentioned in the instructions but is not present in the tool list, treat it as unavailable.

If the user requests an action that requires an unavailable tool:

1. Do not pretend to complete the action.
2. Briefly explain that the tool is not available.
3. Offer the closest supported next step.

Only say an action was completed after the relevant tool call succeeds.

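Use the prompt-audit meta-prompt in the appendix to check production prompts for contradictions, missing tools, and brittle instructions.

Handle silence and background audio

Voice assistants respond by default. In production, they often hear audio that should not get a spoken reply, such as silence, background noise, hold music, TV audio, or side conversations.

When the assistant should stay quiet and keep listening, use a no-op wait tool. The tool gives the model a valid non-speech action instead of having it say something like "I'm still here" or "I didn't catch that."

Tool design: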

{
  "name": "wait_for_user",
  "description": "Call this when the latest audio does not need a spoken response, such as silence, background noise, hold music, TV audio, side conversation, or speech not addressed to the assistant. This tool helps end the turn without a spoken reply.",
  "parameters": {
    "type": "object",
    "properties": {},
    "required": []
  }
}

Pair it with a prompt instruction:

## Handling Silence and Background Noise

If the latest audio is silence, background noise, hold music, TV audio, side conversation, or speech not addressed to you, call `wait_for_user`.

Do not respond conversationally after calling this tool.

Do not say "I'm here," "I didn't catch that," "Take your time," or "Let me know when you're ready."

Resume normal responses only when the user clearly addresses you or asks for help.

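Use this for audio that is not addressed to the assistant, not for unclear user requests. If the user is clearly speaking to the assistant but the content is hard to understand, ask for clarification.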
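
On the application side, a no-op tool needs a handler that ends the turn without asking the model to speak. The following is a minimal sketch, assuming the client dispatches function calls by name. The `handle_function_call` helper and its return shape are hypothetical, and the non-wait branch only stands in for real tool execution.

```python
# Tools that should end the turn silently.
NO_OP_TOOLS = {"wait_for_user"}


def handle_function_call(name: str, arguments: dict) -> dict:
    """Decide what a hypothetical client does after the model calls a tool."""
    if name in NO_OP_TOOLS:
        # Silence or background audio: keep listening, request no speech.
        return {"speak": False, "tool_output": None}
    # Any other tool: run it (not shown here) and let the model respond.
    return {"speak": True, "tool_output": f"placeholder result for {name}"}


print(handle_function_call("wait_for_user", {}))
```

Keeping the no-op names in one set makes it easy to audit which tools are allowed to produce a silent turn.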

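Use message channels intentionally

gpt-realtime-2 can produce user-visible intermediate messages in the commentary channel and the final user-facing response in the final channel. Use channel-specific instructions when the desired behavior depends on where the content appears.

Channel	User-visible?	Purpose
`commentary`	Yes	Preambles and tool calls.
`final`	Yes	Final user-facing message.

For example, tool calls happen in the commentary channel. If you want the assistant to say something before, during, or after tool use, specify that behavior relative to the commentary channel.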

Before calling tools in the commentary channel, briefly tell the user what you are doing.

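gpt-realtime-2 can emit multiple response phases in a single turn. In API output, this distinction is represented on the response.done event, where each output item carries a phase value that indicates whether the content is commentary or the final answer.

You can use this field to handle each phase differently in your application. For example, commentary can be played or displayed as a short intermediate update, while final_answer is reserved for the assistant's full response.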

response.output[0].phase: "commentary"
response.output[1].phase: "final_answer"

Response phase example

User prompt:

"I can't figure out this AP Bio question [QUESTION]."

Simplified API response:

{
  "type": "response.done",
  "response": {
    "output": [
      {
        "phase": "commentary",
        "content": [
          {
            "type": "output_audio",
            "transcript": "Let's zero in on the enzyme's shape and binding, since that's the key idea here."
          }
        ]
      },
      {
        "phase": "final_answer",
        "content": [
          {
            "type": "output_audio",
            "transcript": "What changes at the active site at high temperature?"
          }
        ]
      }
    ]
  }
}
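
To treat each phase differently, an application can group transcripts by their `phase` value. The following is a minimal sketch that reads an event shaped like the simplified response above. The `transcripts_by_phase` helper is a hypothetical name, and real events contain more fields than this example.

```python
from collections import defaultdict


def transcripts_by_phase(event: dict) -> dict:
    """Group output_audio transcripts from a response.done event by phase."""
    if event.get("type") != "response.done":
        return {}
    grouped = defaultdict(list)
    for item in event.get("response", {}).get("output", []):
        phase = item.get("phase", "unknown")
        for part in item.get("content", []):
            if part.get("type") == "output_audio" and "transcript" in part:
                grouped[phase].append(part["transcript"])
    return dict(grouped)


event = {
    "type": "response.done",
    "response": {
        "output": [
            {"phase": "commentary", "content": [
                {"type": "output_audio", "transcript": "Let's zero in on the enzyme's shape."}]},
            {"phase": "final_answer", "content": [
                {"type": "output_audio", "transcript": "What changes at the active site?"}]},
        ]
    },
}

phases = transcripts_by_phase(event)
print(phases["commentary"])
print(phases["final_answer"])
```

An interface could render the `commentary` entries as transient status text and keep only the `final_answer` entries in the visible conversation history.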

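Handle unclear audio

The model should act only on audio it can understand with confidence. If the audio is unclear, the model should ask a short clarifying question instead of guessing.

Do not let the model infer missing words, call tools, capture entities, generate preambles, or spend hidden reasoning time trying to reconstruct what the user might have said.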

## Unclear Audio

- Only respond to clear audio or text.
- If the user's audio is not clear, ask for clarification using a short English phrase such as "Sorry, could you repeat that clearly?"
- Don't repeat the same unclear-audio clarification twice.
- Treat audio as unclear if it is ambiguous, noisy, silent, unintelligible, partially cut off, or if you are unsure of the exact words the user said.
- Do not guess what the user meant from unclear audio.
- Do not reason when the audio is unclear.
- Do not provide a preamble or call tools in the commentary channel when the audio is unclear.

Example:

User audio: "Look up order three one-" [cut off]

Bad:

A: I'll look up order 31 now.

Good:

A: I only caught part of the order number. Could you repeat it digit by digit?

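Capture exact entities

Many realtime workflows depend on exact values: order IDs, tracking numbers, email addresses, confirmation codes, account numbers, claim numbers, ticket IDs, support references, and phone numbers.

Voice makes this hard. Users speak quickly, group digits in different ways, spell out part of a value, use filler words, correct themselves mid-utterance, or say characters that sound alike. A single wrong digit can cause a failed lookup or retrieve the wrong account.

Capture entities conservatively. Collect one value at a time, normalize only the parts that are clear, confirm high-precision values before calling tools, and make every correction recoverable.

Collect one entity at a time

When a workflow needs several values, collect them one at a time. This prevents fields from getting mixed up, especially in voice conversations.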

## Entity Collection Order

Collect required values one at a time.

- Ask for only the next missing value.
- Do not ask for multiple values in the same turn.
- Before asking, check whether the value was already provided earlier in the conversation or the session.
- If a possible value already exists, confirm it with the user before using it.

Example:

"I see tracking number ABC-54321 from earlier. Should I use that one, or do you have a different tracking number?"

Do not call tools until the current value has been collected, validated, and confirmed.

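Handle spelled-out characters

Use this when users spell an ID, code, name, or email address letter by letter. The input is the spoken form, not the final value.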

## Spelled-Out Characters

When a user dictates an ID, code, or email character by character, treat the spoken sequence as one compact value. Preserve explicitly spoken separators like dash, dot, underscore, slash, or plus; otherwise do not add spaces or separators.

Examples:

- "A B C one two three" -> "ABC123"
- "B C dash nine eight seven" -> "BC-987"
- "J O H N at example dot com" -> "john@example.com"

Do not insert spaces between spelled-out characters unless the user explicitly says the value contains spaces.
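
The compaction rule in the prompt above can be checked against its own examples with a small function. The following is a minimal sketch, assuming the transcript arrives as space-separated words. The `compact_spelled_value` helper is hypothetical, and mapping "at" to "@" and lowercasing email addresses are assumptions added for the email example.

```python
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
# Explicitly spoken separators; "at" is an added assumption for emails.
SEPARATORS = {"dash": "-", "dot": ".", "underscore": "_",
              "slash": "/", "plus": "+", "at": "@"}


def compact_spelled_value(spoken: str) -> str:
    """Join a spelled-out sequence into one compact value."""
    parts = []
    for token in spoken.split():
        lower = token.lower()
        if lower in DIGITS:
            parts.append(DIGITS[lower])
        elif lower in SEPARATORS:
            parts.append(SEPARATORS[lower])
        else:
            # Letters and whole words are kept as spoken.
            parts.append(token)
    value = "".join(parts)
    # Assumption: email addresses are written in lowercase.
    return value.lower() if "@" in value else value


print(compact_spelled_value("A B C one two three"))
print(compact_spelled_value("B C dash nine eight seven"))
print(compact_spelled_value("J O H N at example dot com"))
```

The result is still only a candidate value: it should be read back and confirmed before it is used in a tool call.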

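Normalize spoken numbers carefully

For numeric identifiers, users may say digits one by one, group them, or use natural number phrases. If the field expects one continuous numeric value, convert clearly spoken numbers into digits.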
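
The conversion described above can be sketched as a small function that handles single digits and two-digit groups, and declines anything else so the assistant can ask for clarification. The `spoken_to_digits` helper is hypothetical. It deliberately returns `None` for magnitude words such as "hundred" or "thousand" instead of guessing, and it reads "twenty three" as the group 23, which is one plausible reading.

```python
UNITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
TEENS = {
    "ten": "10", "eleven": "11", "twelve": "12", "thirteen": "13",
    "fourteen": "14", "fifteen": "15", "sixteen": "16",
    "seventeen": "17", "eighteen": "18", "nineteen": "19",
}
TENS = {
    "twenty": "2", "thirty": "3", "forty": "4", "fifty": "5",
    "sixty": "6", "seventy": "7", "eighty": "8", "ninety": "9",
}


def spoken_to_digits(spoken: str):
    """Return a digit string, or None when clarification is needed."""
    tokens = spoken.lower().replace("-", " ").split()
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in UNITS:
            out.append(UNITS[tok])
        elif tok in TEENS:
            out.append(TEENS[tok])
        elif tok in TENS:
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt in UNITS and nxt != "zero":
                out.append(TENS[tok] + UNITS[nxt])  # "twenty three" -> 23
                i += 1
            else:
                out.append(TENS[tok] + "0")  # "twenty" -> 20
        else:
            return None  # unsupported word: ask the user instead
        i += 1
    return "".join(out) if out else None


print(spoken_to_digits("one two three four"))
print(spoken_to_digits("ninety nine eleven"))
print(spoken_to_digits("nine thousand nine hundred eleven"))
```

Returning `None` keeps the sketch conservative: the caller asks the user to repeat the number digit by digit rather than committing to one interpretation.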

## Spoken Number Handling

Convert spoken numbers into digits when collecting numeric identifiers.

Examples:

- "one two three four" -> "1234"
- "one twenty three" -> "123"
- "one nineteen" -> "119"
- "ninety nine eleven" -> "9911"
- "nine thousand nine hundred eleven" -> "9911"

If multiple interpretations are plausible, ask the user to clarify before using the value.

Example:

"I heard either 119 or 1-19. Could you repeat the number digit by digit?"

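Confirm exact identifiers before tool calls

Order IDs, tracking numbers, account numbers, claim numbers, confirmation codes, and similar identifiers are high-precision fields. Confirm them before using them in a tool call.

For numeric identifiers, read the value back digit by digit. Reading the value as a whole number can hide errors.

Example:

A: Just to confirm, I heard 8… 3… 5… 2… 1. Is that right?

If the user corrects a character or digit, repeat the full corrected value before calling the tool.

Example:

A: Got it. I have 8… 3… 5… 7… 1. Is that correct?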

## Exact Identifier Confirmation

Before calling tools with high-precision identifiers:

- Confirm the final normalized value with the user.
- Read numeric identifiers back digit by digit.
- Do not use guessed, partial, or ambiguous values.
- If the user corrects the value, repeat the full corrected value before calling the tool.

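Confirm email addresses character by character

Email addresses are high-stakes values. Dots, hyphens, underscores, repeated letters, and similar-sounding names can cause a failed account lookup or send a message to the wrong address.

Ask the user to spell the email address:

A: Could you spell the email address character by character so I can make sure I have it exactly right?

When reading it back, confirm the exact final address:

A: Just to confirm, that is c-h-e-n at example dot com, right?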

## Email Confirmation

Email addresses must be captured exactly.

If the user says the email naturally without spelling it out, ask them to repeat it character by character.

Example:

"Could you spell the email address character by character so I can make sure I have it exactly right?"

When reading an email back, confirm the exact final email address.

Example:

"Just to confirm, that is c-h-e-n at example dot com, right?"

Entity collection workflow

Use this full workflow when a task requires exact values before any tool call is made.

## Entity Collection Workflow

When a workflow requires an exact value, collect and confirm it before using it in any tool call.

Exact values include order IDs, tracking numbers, confirmation codes, account numbers, claim numbers, ticket IDs, support references, email addresses, phone numbers, and similar identifiers.

Follow this workflow:

1. Collect the next required value.

- Ask for only one missing value at a time.
- Do not ask for multiple exact values in the same turn.
- Before asking, check whether the value was already provided earlier in the conversation or session.

2. Normalize only what is clear.

- Convert clearly spoken digits or spelled-out characters into the expected format.
- Preserve explicit separators such as dashes, dots, underscores, slashes, and plus signs.
- Do not guess, infer, repair, or fill in unclear characters.
- If the value could be interpreted in more than one way, ask the user to repeat or clarify it.

3. Confirm the final value.

- Read back the normalized value before using it.
- For numeric identifiers, confirm digit by digit.
- For email addresses, confirm character by character when precision matters.
- Wait for a clear confirmation from the user.

4. Call the tool only after confirmation.

- Do not call lookup, account, messaging, payment, booking, or update tools with guessed, partial, ambiguous, or unconfirmed values.

5. Recover safely from corrections.

- If the user corrects any part of the value, update the value, repeat the full corrected value, and ask for confirmation again.
- Do not use the corrected value in a tool call until the user confirms the full final value.

Examples:

User: My order ID is ORD-3125B23.

Assistant: Just to confirm, I heard O-R-D dash 3-1-2-5-B-2-3. Is that right?

User: It is 83521 - actually, the fourth digit is 7.

Assistant: Got it. I have 8... 3... 5... 7... 1. Is that correct?

User: My email is chen@example.com.

Assistant: Could you spell that email address character by character so I can make sure I have it exactly right?

Never call tools with guessed, partial, ambiguous, or unconfirmed exact values.

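Avoid literal-instruction traps

gpt-realtime-2 follows instructions more strictly than earlier realtime models. Prompts that worked well on older models may need adjustment.

Use precise language. The model may prioritize the literal wording of an instruction over the broader behavior you intended. Broad or rigid rules can dominate the assistant's behavior in surprising ways, especially when several rules overlap.

Use constraint words such as must, only, never, and always sparingly. Use them only when the behavior is truly required, not as general emphasis. Overusing hard constraints makes the assistant rigid, overly cautious, or unable to handle reasonable exceptions.

Prefer a precise scope: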

For write actions that modify user data, ask for confirmation before calling the tool.

Avoid a broad scope:

Always ask for confirmation before doing anything.

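The broad version can cause unnecessary confirmations before harmless read-only lookups, such as checking order status, retrieving availability, or reading account information.

Literal-interpretation trap example

This prompt is scoped too narrowly: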

When a confirmation code is provided, repeat it verbatim and wait for a clear yes.

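User message:

My order ID is ORD-3125B23.

Possible failure:

The model may not apply the rule because the user provided an order ID, not a confirmation code. The developer's intended behavior is clear to the developer, but the instruction is scoped too narrowly.

Safer rewrite: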

When the user provides an exact identifier, including confirmation codes, order IDs, ticket IDs, reset PINs, claim numbers, tracking numbers, or account numbers, repeat the captured value and wait for confirmation before using it in a tool call.

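General prompting advice:

Prefer explicit instructions over implied intent.
Avoid unnecessary constraint words unless the behavior truly needs a strict limit.
Minimize conflicting guidance.
Handle layered or competing priority instructions with care.
Test prompts incrementally. Small wording changes can produce large behavioral differences.
When migrating from earlier realtime models, expect some prompts to need restructuring for best results.

Control language and accent separately

Language and accent should be controlled separately.

A user's accent is not the same as their intended language. A user may speak English with an Indian, Spanish, French, or Mandarin accent and still expect an English reply.

Avoid overly broad language instructions such as: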

Mirror the user.
Respond naturally in the user's language.
Switch languages when appropriate.
Sound local.
Adapt to the user's accent.

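These instructions are too broad. The model may misread an accent, filler sounds, backchannels, or isolated foreign words as a signal to switch languages.

English language policy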

## Language

English is the default response language.

- Do not infer language from accent alone.
- Ignore short filler sounds, backchannels, and isolated foreign words for language detection.
- Only switch languages if the user explicitly asks or provides a substantive utterance in another language.
- If language confidence is low, ask a short clarification instead of guessing.
- Keep preambles, spoken bridges, tool-related messages, and final answers in the same language.
- Accent adaptation must not change the response language.

Multilingual policy

## Language

Default to English unless the user clearly uses another language.

Switch languages only when:

- the user explicitly asks to use another language;
- the user provides a substantive utterance in another language. A substantive utterance means the user gives a complete request, question, or correction in another language, not just a greeting, name, address, filler word, or borrowed phrase.

Do not switch languages based on:

- accent;
- pronunciation;
- filler words;
- short backchannels;
- names;
- addresses;
- isolated foreign words.

If uncertain, ask:

"Would you like me to continue in English or [LANGUAGE]?"

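Accent control

gpt-realtime-2 follows accent instructions more strongly, but vague accent prompts can cause drift or unintended language switching.

Accent-control prompts work best when they explicitly specify:

the target accent;
which features should stay stable;
the expected pace, stress, and prosody;
whether accent adaptation should affect language choice.

Instead of: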

Sound Australian.

Use:

## Accent

Speak English with a light Australian accent.

- Keep the accent stable from the first word to the last.
- Use natural Australian vowel shaping, but keep speech easy to understand.
- Do not exaggerate the accent.
- Do not change response language based on the user's accent.

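Custom voices

Use a custom voice when the standard voices cannot reliably meet brand, accent, or character requirements.

Prompting can steer accent, pace, and delivery, but it cannot fully replace voice design. For use cases that need a consistent brand voice identity or accent fidelity, consider a custom voice.

Custom voices are available only to approved customers. Contact your account team for access.

Maintain state in long sessions

gpt-realtime-2 expands the realtime context window from 32k to 128k tokens, which makes it better suited to long sessions. For dense back-and-forth conversation, 128k tokens can be treated as roughly 1-2 hours of dense raw audio context. Actual capacity varies with tool calls, internal reasoning, injected records, and other session details.

For long-context use cases, gpt-realtime-2 performs best when it can tell which information is current, which is background, and which should be ignored when sources conflict. Do not rely on the model to infer source priority from raw transcripts or large context dumps. Use structure instead.

Use a structured pattern when you open a session with a large amount of context, such as retrieved records, prior conversation history, policies, summaries, customer notes, or background documents.

Example long-session context template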

## Context

### Current State

- **Current task:** [current task]
- **Latest known state:** [current value]
- **Next safe step:** [what the assistant should do next]

### Authoritative Sources

- **Fact or record:** [fact or record]
- **Source:** [tool result / active policy / verified record]
- **Status:** current
- **Retrieved:** [date/time or this turn]

### Historical or Background Sources

- **Older fact or record:** [older fact or record]
- **Source:** [prior conversation / older record / summary]
- **Status:** stale or background
- **Note:** Do not use for current decisions if it conflicts with a current source.

### Relevant Policy or Rules

- [decision rule or constraint]

### Other Context

- [potentially useful but non-authoritative background]
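
A template like this is usually filled in by code at session start. The following is a minimal sketch that renders the first three sections of the template from plain dictionaries. The `render_context` helper and the sample values passed to it are hypothetical and only illustrate the shape.

```python
def render_context(current: dict, authoritative: list, historical: list) -> str:
    """Render current state, authoritative, and historical sections."""
    lines = ["## Context", "", "### Current State", ""]
    lines += [f"- **{label}:** {value}" for label, value in current.items()]

    lines += ["", "### Authoritative Sources", ""]
    for record in authoritative:
        lines.append(f"- **Fact or record:** {record['fact']}")
        lines.append(f"- **Source:** {record['source']}")
        lines.append("- **Status:** current")

    lines += ["", "### Historical or Background Sources", ""]
    for record in historical:
        lines.append(f"- **Older fact or record:** {record['fact']}")
        lines.append(f"- **Source:** {record['source']}")
        lines.append("- **Status:** stale or background")
    return "\n".join(lines)


# Illustrative values only.
context = render_context(
    current={"Current task": "reschedule delivery",
             "Next safe step": "confirm the new date"},
    authoritative=[{"fact": "delivery window is Friday", "source": "tool result"}],
    historical=[{"fact": "delivery window was Tuesday", "source": "prior conversation"}],
)
print(context)
```

Labeling every record with a status in the rendered text means the model does not have to infer which of two conflicting facts is current.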

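Migrate from earlier realtime models

When migrating from earlier realtime models, treat the prompt as a behavior surface, not just text to port.

Use Codex or a strong reasoning model to restructure the prompt according to the latest Realtime prompting guide. Include a link to this guide so the migration builds on best practices.
Set reasoning effort to low rather than relying on the default. Raise it only for workflows that need deeper planning.
Review tool names, parameters, enums, JSON schemas, and other settings to make sure they match the intended implementation.
Remove outdated examples. Add short examples for the happy path, ambiguous cases, interruptions, tool calls, and fallback behavior.
Compare representative conversations before and after the migration. Check for regressions against existing evals, and document intentional behavior changes.
Run a final consistency check. Confirm that the prompt clearly separates hard requirements, defaults, tool rules, safety rules, and fallback behavior.
Run evals, inspect representative failures, and iterate on the prompt until the target behavior is stable and reliable.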

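Realtime 1.5 prompting guide

gpt-realtime-1.5 is a speech-to-speech model in the Realtime API. The same gpt-realtime prompting guide applies to this model.

Speech-to-speech systems are essential for making voice a core AI interface. gpt-realtime-1.5 supports building robust, usable realtime voice agents that can handle mission-critical workflows at scale.

Compared with earlier realtime preview models, gpt-realtime-1.5 offers stronger instruction following, more reliable tool calling, better voice quality, and a smoother experience overall. These gains make it practical to move from chained approaches to true realtime experiences, which lowers latency and produces responses that sound more natural and expressive.

Realtime models benefit from some prompting techniques that do not apply directly to text-based models. This guide starts with a suggested prompt skeleton and then walks through each section with practical tips, small reusable patterns, and examples you can adapt to your use case.

General tips

Iterate relentlessly: small wording changes can make or break behavior.
- Example: in the unclear-audio instructions, we replaced "inaudible" with "unintelligible," which improved handling of noisy input.
Use bullets over paragraphs: clear, short bullets outperform long paragraphs.
Guide with examples: the model closely follows sample phrases.
Be precise: ambiguous or conflicting instructions degrade performance, as with GPT-5.
Control language: pin the output to a target language if you see unwanted language switching.
Reduce repetition: add a variety rule to reduce robotic phrasing.
Use capitalized text for emphasis: capitalizing key rules makes them stand out and easier for the model to follow.
Convert non-text rules to text: instead of writing "IF x > 3 THEN ESCALATE", write "IF MORE THAN THREE FAILURES THEN ESCALATE".

Prompt structure

Organizing the prompt well helps the model understand context and stay consistent across turns. It also makes it easier for you to iterate on and modify the sections that cause problems.

What it does: use clear, labeled sections in the system prompt so the model can find and follow them. Keep each section focused on one thing.
How to adapt: add domain-specific sections (for example, compliance or brand policy). Remove sections you do not need (for example, reference pronunciations if there are no pronunciation problems).

Example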

# Role & Objective        — who you are and what “success” means
# Personality & Tone      — the voice and style to maintain
# Context                 — retrieved context, relevant info
# Reference Pronunciations — phonetic guides for tricky words
# Tools                   — names, usage rules, and preambles
# Instructions / Rules    — do’s, don’ts, and approach
# Conversation Flow       — states, goals, and transitions
# Safety & Escalation     — fallback and handoff logic

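Role and objective

This section defines the agent's identity and what "done" means. The examples show two different identities to illustrate how closely the model follows a role and objective when they are explicit.

When to use: the model is not taking on the persona, role, or task scope you need.
What it does: anchors the voice assistant's identity so that its responses are conditioned on that role description.
How to adapt: modify the role for your use case.

Example (model takes on a specific accent)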

# Role & Objective
You are a Quebecois French-speaking customer service bot. Your task is to answer the user's question.

[Audio samples: earlier realtime preview model, gpt-realtime-1.5]

Example (model takes on a character)

# Role & Objective
You are a high-energy game-show host guiding the caller to guess a secret number from 1 to 100 to win 1,000,000$.

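[Audio samples: earlier realtime preview model, gpt-realtime-1.5]

gpt-realtime-1.5 carries out an assigned role more reliably than earlier realtime preview models.

Personality and tone

gpt-realtime-1.5 follows instructions well when asked to imitate a specific personality or tone. You can tailor the voice experience and delivery to the needs of your use case.

When to use: responses feel flat, too verbose, or inconsistent across turns.
What it does: sets tone, brevity, and pacing so responses sound natural and consistent.
How to adapt: tune warmth, formality, and default length. For regulated domains, favor neutral precision. Add other subsections relevant to your use case.

Example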

# Personality & Tone
## Personality
- Friendly, calm and approachable expert customer service assistant.

## Tone
- Warm, concise, confident, never fawning.

## Length
2–3 sentences per turn.

Example (multiple emotions)

# Personality & Tone
- Start your response very happy
- Midway, change to sad
- At the end change your mood to very angry

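[Audio sample: gpt-realtime-1.5]

The model can follow complex instructions and switch among three emotions within a single audio response.

Pacing instructions

In the Realtime API, the speed parameter changes the playback rate, not how the model composes speech. To make the speech sound genuinely faster, add instructions that guide pacing.

When to use: users want faster speech, and playback speed alone (using the speed parameter) does not fix the speaking style.
What it does: adjusts speaking style (brevity, cadence) independently of client playback speed.
How to adapt: modify the pacing instruction to fit your use case.

Example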

# Personality & Tone
## Personality
- Friendly, calm and approachable expert customer service assistant.

## Tone
- Warm, concise, confident, never fawning.

## Length
- 2–3 sentences per turn.

## Pacing
- Deliver your audio response fast, but do not sound rushed.
- Do not modify the content of your response, only increase speaking speed for the same response.

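[Audio samples: earlier realtime preview model, gpt-realtime-1.5]

With explicit pacing instructions, gpt-realtime-1.5 produces noticeably faster speech without sounding rushed.

Language constraint

A language constraint ensures the model replies consistently in the target language, even under challenging conditions such as background noise or multilingual input.

When to use: to prevent unintended language switching in multilingual or noisy environments.
What it does: locks output to the chosen language to prevent accidental language changes.
How to adapt: swap "English" for your target language, or add more complex instructions for your use case.

Example (pinned to one language)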

# Personality & Tone
## Personality
- Friendly, calm and approachable expert customer service assistant.

## Tone
- Warm, concise, confident, never fawning.

## Length
- 2–3 sentences per turn.

## Language
- The conversation will be only in English.
- Do not respond in any other language even if the user asks.
- If the user speaks another language, politely explain that support is limited to English.

These are responses from gpt-realtime-1.5 after applying the instruction.

[Figure: language constraint, English]

Example (model teaches a language)

# Role & Objective
- You are a friendly, knowledgeable voice tutor for French learners.
- Your goal is to help the user improve their French speaking and listening skills through engaging conversation and clear explanations.
- Balance immersive French practice with supportive English guidance to ensure understanding and progress.

# Personality & Tone
## Personality
- Friendly, calm and approachable expert customer service assistant.

## Tone
- Warm, concise, confident, never fawning.

## Length
- 2–3 sentences per turn.

## Language
### Explanations
Use English when explaining grammar, vocabulary, or cultural context.

### Conversation
Speak in French when conducting practice, giving examples, or engaging in dialogue.

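These are responses from gpt-realtime-1.5 after applying the instruction.

[Figure: multiple languages]

The model can code-switch between languages according to the custom instructions.

Reduce repetition

Realtime models can follow sample phrases closely to stay on brand, but this can lead to overuse and make responses sound robotic or repetitive. Adding a repetition rule increases variety while preserving clarity and brand voice.

When to use: output reuses the same openers, fillers, or sentence patterns across turns or sessions.
What it does: adds a variety constraint that discourages repeated phrases and steers the model toward synonyms and alternate sentence structures while keeping required terms.
How to adapt: tune strictness (for example, "do not reuse the same opener more than once every N turns"), allowlist phrases that must be kept (legal, compliance, brand), and allow tighter phrasing where consistency matters.

Example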

# Personality & Tone
## Personality
- Friendly, calm and approachable expert customer service assistant.

## Tone
- Warm, concise, confident, never fawning.

## Length
- 2–3 sentences per turn.

## Language
- The conversation will be only in English.
- Do not respond in any other language even if the user asks.
- If the user speaks another language, politely explain that support is limited to English.

## Variety
- Do not repeat the same sentence twice.
- Vary your responses so they don't sound robotic.

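These are responses from gpt-realtime-1.5 before applying the instruction. The model repeats the same confirmation: Got it.

[Figure: repetition before]

These are responses from gpt-realtime-1.5 after applying the instruction.

[Figure: repetition after]

The model now varies its replies and confirmations and no longer sounds robotic.

Reference pronunciations

This section covers how to make sure the model correctly pronounces important words, numbers, names, and terms during voice interactions.

When to use: brand names, technical terms, or place names are often mispronounced.
What it does: improves trust and clarity through pronunciation hints.
How to adapt: keep the list short, and update it when you hear errors.

Example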

# Reference Pronunciations
When voicing these words, use the respective pronunciations:
- Pronounce “SQL” as “sequel.”
- Pronounce “PostgreSQL” as “post-gress.”
- Pronounce “Kyiv” as “KEE-iv.”
- Pronounce "Huawei" as “HWAH-way”

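[Audio samples: earlier realtime preview model, gpt-realtime-1.5]

With the reference pronunciation instruction, gpt-realtime-1.5 correctly pronounces SQL as "sequel."

Alphanumeric pronunciations

Realtime S2S can blur or merge digits and letters when reading back key information (phone numbers, credit card numbers, order IDs). Explicit character-by-character confirmation prevents mishearing and produces clearer synthesized speech.

When to use: the model struggles to capture or read back phone numbers, card numbers, 2FA codes, order IDs, serial numbers, addresses, unit numbers, or mixed alphanumeric strings.
What it does: forces the model to read character by character with separators, then confirm with the user, and reconfirm after corrections. Optionally, use spelling disambiguation (for example, "A as in Alpha").

Example (general instructions section)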

# Instructions/Rules
- When reading numbers or codes, speak each character separately, separated by hyphens (e.g., 4-1-5).
- Repeat EXACTLY the provided number; do not omit any digits.

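Tip: if you follow a conversation-flow prompting strategy, you can specify which conversation state should apply the alphanumeric pronunciation instruction.

Example (instruction in a conversation state)

(Taken from the conversation-flow prompt in our openai-realtime-agents project.)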

{
    "id": "3_get_and_verify_phone",
    "description": "Request phone number and verify by repeating it back.",
    "instructions": [
      "Politely request the user’s phone number.",
      "Once provided, confirm it by repeating each digit and ask if it’s correct.",
      "If the user corrects you, confirm AGAIN to make sure you understand."
    ],
    "examples": [
      "I'll need some more information to access your account if that's okay. May I have your phone number, please?",
      "You said 0-2-1-5-5-5-1-2-3-4, correct?",
      "You said 4-5-6-7-8-9-0-1-2-3, correct?"
    ],
    "transitions": [{
      "next_step": "4_authentication_DOB",
      "condition": "Once phone number is confirmed"
    }]
}

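This is the response from gpt-realtime-1.5 before applying the instruction.

Sure! The number is 55119765423. Let me know if you need anything else!

This is the response from gpt-realtime-1.5 after applying the instruction.

Sure! The number is: 5-5-1-1-1-9-7-6-5-4-2-3. Let me know if you need anything else!

Instructions

This section covers prompt guidance for steering the model to solve your task, apply best practices, and fix issues that come up.

Perhaps unsurprisingly, we recommend prompting patterns similar to those for GPT-4.1 for best results.

Instruction following

As with GPT-4.1 and GPT-5, gpt-realtime-1.5 performs worse when instructions are conflicting, ambiguous, or unclear.

When to use: output drifts from the rules, skips stages, or misuses tools.
What it does: uses an LLM to point out ambiguity, conflicts, and missing definitions before you ship.

Instruction quality prompt (can be used in ChatGPT or the API)

Use the following prompt with GPT-5 to identify problem areas in your prompt that you can fix.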

## Role & Objective
You are a **Prompt-Critique Expert**.
Examine a user-supplied LLM prompt and surface any weaknesses following the instructions below.


## Instructions
Review the prompt that is meant for an LLM to follow and identify the following issues:
- Ambiguity: Could any wording be interpreted in more than one way?
- Lacking Definitions: Are there any class labels, terms, or concepts that are not defined that might be misinterpreted by an LLM?
- Conflicting, missing, or vague instructions: Are directions incomplete or contradictory?
- Unstated assumptions: Does the prompt assume the model has to be able to do something that is not explicitly stated?


## Do **NOT** list issues of the following types:
- Invent new instructions, tool calls, or external information. You do not know what tools need to be added that are missing.
- Issues that you are unsure about.


## Output Format
"""
# Issues
- Numbered list; include brief quote snippets.

# Improvements
- Numbered list; provide the revised lines you would change and how you would change them.

# Revised Prompt
- Revised prompt where you have applied all your improvements surgically with minimal edits to the original prompt
"""

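Prompt optimization meta-prompt (can be used in ChatGPT or the API)

This meta-prompt helps you improve a base system prompt by targeting a specific failure mode. Provide the current prompt and describe the issue you are seeing. The model (GPT-5) will suggest refined variants that tighten the constraints and reduce the issue.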

Here's my current prompt to an LLM:
[BEGIN OF CURRENT PROMPT]
{CURRENT_PROMPT}
[END OF CURRENT PROMPT]

But I see this issue happening from the LLM:
[BEGIN OF ISSUE]
{ISSUE}
[END OF ISSUE]
Can you provide some variants of the prompt so that the model can better understand the constraints to alleviate the issue?

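No audio or unclear audio

Sometimes the model thinks it heard something and tries to respond. You can add a custom instruction that tells the model how to behave when it hears unclear audio or user input. Adapt the desired behavior to your use case. For example, you may want the model to repeat the same question rather than ask for clarification.

When to use: background noise, partial words, or silence trigger unwanted replies.
What it does: stops spurious responses and enables graceful clarification.
How to adapt: choose whether to ask for clarification or repeat the last question, depending on the use case.

Example (coughing and unclear audio)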

# Instructions/Rules
...


## Unclear audio
- Always respond in the same language the user is speaking in, if unintelligible.
- Only respond to clear audio or text.
- If the user's audio is not clear (e.g. ambiguous input/background noise/silent/unintelligible) or if you did not fully hear or understand the user, ask for clarification using {preferred_language} phrases.

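This is the response from gpt-realtime-1.5 after applying the instruction.

In this example, the model asks for clarification after a (very) loud cough and unclear audio.

Background music or sounds

Occasionally, the model may produce unexpected background music, humming, rhythmic noise, or similar sound artifacts during speech generation. These artifacts reduce clarity, distract users, or make the assistant seem less professional. The following instruction helps prevent or significantly reduce them.

When to use: you observe unexpected musical elements or sound effects in Realtime audio responses.
What it does: steers the model away from generating these unwanted audio artifacts.
How to adapt: adjust the instruction to explicitly suppress the specific sound patterns you encounter.

Example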

# Instructions/Rules
...
- Do not include any sound effects or onomatopoeic expressions in your responses.

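Tools

Use this section to tell the model how to use your functions and tools. Spell out when to call a tool and when not to, which arguments to collect, what to say while a call is running, and how to handle errors or partial results.

Tool selection

gpt-realtime-1.5 follows instructions closely. However, if your instructions conflict with what the model can access (for example, the prompt mentions a tool that was not passed in the tool list), the result can be poor responses.

When to use: the prompt mentions tools that are not actually available.
What it does: reviews the available tools and the system prompt to make sure they stay consistent.

Example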

# Tools
## lookup_account(email_or_phone)
...


## check_outage(address)
...

We need to make sure the same tools are available and that the descriptions do not contradict each other:

[
  {
    "name": "lookup_account",
    "description": "Retrieve a customer account using either an email or phone number to enable verification and account-specific actions.",
    "parameters": {
      ...
    }
  },
  {
    "name": "check_outage",
    "description": "Check for network outages affecting a given service address and return status and ETA if applicable.",
    "parameters": {
      ...
    }
  }
]
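
The consistency check between the prompt and the tool list can be automated. The following is a minimal sketch, assuming tools are documented in the prompt under headings of the form `## name(args)` as in the example above. The `prompt_tool_names` and `check_tool_sync` helpers are hypothetical, and `search_orders` is only an illustrative mismatch.

```python
import re


def prompt_tool_names(prompt: str) -> set:
    """Collect tool names from headings written as '## name(args)'."""
    return set(re.findall(r"^##\s+([A-Za-z_]\w*)\(", prompt, flags=re.MULTILINE))


def check_tool_sync(prompt: str, tools: list) -> dict:
    """Report tools that appear on only one side."""
    mentioned = prompt_tool_names(prompt)
    provided = {tool["name"] for tool in tools}
    return {
        "missing_from_tool_list": sorted(mentioned - provided),
        "not_described_in_prompt": sorted(provided - mentioned),
    }


prompt = """# Tools
## lookup_account(email_or_phone)
...

## check_outage(address)
...
"""
tools = [{"name": "lookup_account"}, {"name": "search_orders"}]
report = check_tool_sync(prompt, tools)
print(report)
```

Running a check like this in CI catches a renamed or removed tool before the mismatch reaches production, where the model might otherwise invent or simulate the missing tool.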

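Tool call preambles

Some use cases benefit from the Realtime model providing an audio response while it calls a tool. This improves the user experience by masking latency. You can adapt the sample phrases to your use case.

When to use: users need immediate acknowledgment while a tool call is being made. It helps mask latency.
What it does: adds a short, consistent preamble before a tool call.

Example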

# Tools
- Before any tool call, say one short line like “I’m checking that now.” Then call the tool immediately.

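These are the responses after applying the instruction using gpt-realtime-1.5.

With the instruction, the model outputs the audio response "I'm checking that now" at the same time as it calls the tool.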

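Tool call preambles + sample phrases

If you want tighter control over the kinds of phrases the model outputs while it calls a tool, add sample phrases to the tool spec description.

Example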

tools = [
  {
    "name": "lookup_account",
    # Triple-quoted so the description can span several lines.
    "description": """Retrieve a customer account using either an email or phone number to enable verification and account-specific actions.

Preamble sample phrases:
- For security, I’ll pull up your account using the email on file.
- Let me look up your account by {email} now.
- I’m fetching the account linked to {phone} to verify access.
- One moment, I’m opening your account details.""",
    "parameters": {
      "..."
    }
  },
  {
    "name": "check_outage",
    "description": """Check for network outages affecting a given service address and return status and ETA if applicable.

Preamble sample phrases:
- I’ll check for any outages at {service_address} right now.
- Let me look up network status for your area.
- I’m checking whether there’s an active outage impacting your address.
- One sec, verifying service status and any posted ETA.""",
    "parameters": {
      "..."
    }
  }
]
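When many tools carry sample phrases, keeping the phrases in a list and appending them in code can be easier to maintain than hand-editing long description strings. A minimal sketch under that assumption; `with_preamble_phrases` is a hypothetical helper and the two phrases are taken from the example above:

```python
def with_preamble_phrases(description: str, phrases: list[str]) -> str:
    """Append a 'Preamble sample phrases' bullet list to a tool description.

    Returns the description unchanged when no phrases are given.
    """
    if not phrases:
        return description
    bullets = "\n".join(f"- {phrase}" for phrase in phrases)
    return f"{description}\n\nPreamble sample phrases:\n{bullets}"


check_outage = {
    "name": "check_outage",
    "description": with_preamble_phrases(
        "Check for network outages affecting a given service address "
        "and return status and ETA if applicable.",
        [
            "Let me look up network status for your area.",
            "I'm checking whether there's an active outage impacting your address.",
        ],
    ),
}

print(check_outage["description"])
```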

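Tool calls without confirmation

Sometimes the model asks for confirmation before a tool call. For some use cases this leads to a poor end-user experience because the model is not being proactive.

When to use: The agent still asks for permission before making an obvious tool call.
What it does: Removes unnecessary confirmation loops.

Example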

# Tools
- When calling a tool, do not ask for any user confirmation. Be proactive.

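These are the responses after applying the instruction using gpt-realtime-1.5.

In the example, notice that the Realtime model did not produce any response audio; it called the relevant tool directly.

Tip: If you find the model jumping to tool calls too quickly, try softening the wording. For example, replacing strong words like "proactive" with something gentler can steer the model toward a calmer, less eager approach.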

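Tool call performance

As use cases grow more complex and the number of available tools increases, it becomes critical to tell the model explicitly when to use each tool and, just as importantly, when not to. Clear usage rules improve tool call accuracy and help the model pick the right tool at the right time.

When to use: The model struggles with tool call performance and needs explicit instructions to reduce misuse.
What it does: Adds instructions on when to use and when to avoid each tool. You can also add instructions about tool call sequencing (for example, after calling tool A, the model may call tool B or C).

Example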

# Tools
- When you call any tools, you must output at the same time a response letting the user know that you are calling the tool.

## lookup_account(email_or_phone)
Use when: verifying identity or viewing plan/outage flags.
Do NOT use when: the user is clearly anonymous and only asks general questions.


## check_outage(address)
Use when: user reports connectivity issues or slow speeds.
Do NOT use when: question is billing-only.


## refund_credit(account_id, minutes)
Use when: confirmed outage > 240 minutes in the past 7 days.
Do NOT use when: outage is unconfirmed; route to Diagnose → check_outage first.


## schedule_technician(account_id, window)
Use when: repeated failures after reboot and outage status = false.
Do NOT use when: outage status = true (send status + ETA instead).


## escalate_to_human(account_id, reason)
Use when: user seems very frustrated, abuse/harassment, repeated failures, billing disputes >$50, or user requests escalation.

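Tip: If a tool call can fail unpredictably, add explicit failure-handling instructions so the model can respond gracefully.

Tool-level behavior

You can fine-tune the model's behavior for specific tools instead of applying one global rule. For example, you may want READ tools to be called proactively, while WRITE tools require explicit confirmation.

When to use: Global instructions about proactiveness, confirmation, or preambles do not suit every tool.
What it does: Adds per-tool behavior rules that define whether the model should call the tool immediately, confirm first, or speak a preamble before the call.

Example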

# TOOLS
- For the tools marked PROACTIVE: do not ask for confirmation from the user and do not output a preamble.
- For the tools marked as CONFIRMATION FIRST: always ask the user for confirmation.
- For the tools marked as PREAMBLES: Before any tool call, say one short line like “I’m checking that now.” Then call the tool immediately.


## lookup_account(email_or_phone) — PROACTIVE
Use when: verifying identity or accessing billing.
Do NOT use when: caller refuses to identify after second request.


## check_outage(address) — PREAMBLES
Use when: caller reports failed connection or speed lower than 10 Mbps.
Do NOT use when: purely billing OR when internet speed is above 10 Mbps.
If either condition applies, inform the customer you cannot assist and hang up.


## refund_credit(account_id, minutes) — CONFIRMATION FIRST
Use when: confirmed outage > 240 minutes in the past 7 days (credit 60 minutes).
Do NOT use when: outage unconfirmed.
Confirmation phrase: “I can issue a credit for this outage—would you like me to go ahead?”


## schedule_technician(account_id, window) — CONFIRMATION FIRST
Use when: reboot + line checks fail AND outage=false.
Windows: “10am–12pm ET” or “2pm–4pm ET”.
Confirmation phrase: “I can schedule a technician to visit—should I book that for you?”


## escalate_to_human(account_id, reason) — PREAMBLES
Use when: harassment, threats, self-harm, repeated failure, billing disputes > $50, caller is frustrated, or caller requests escalation.
Preamble: “Let me connect you to a senior agent who can assist further.”
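If the per-tool rules live in application code, the prompt section can be generated from them so that the behavior tags and the tool list stay in sync. The sketch below is one possible shape, not a prescribed format: `ToolPolicy`, `BEHAVIOR_RULES`, and `render_tools_section` are hypothetical names, the rule wording is abbreviated from the example above, and the heading uses a plain hyphen as the separator before the tag.

```python
from dataclasses import dataclass

# Global rule text per behavior tag (abbreviated from the example prompt).
BEHAVIOR_RULES = {
    "PROACTIVE": "do not ask for confirmation from the user and do not output a preamble.",
    "CONFIRMATION FIRST": "always ask the user for confirmation.",
    "PREAMBLES": "before any tool call, say one short line, then call the tool immediately.",
}


@dataclass
class ToolPolicy:
    signature: str        # e.g. "lookup_account(email_or_phone)"
    behavior: str         # one of the BEHAVIOR_RULES keys
    use_when: str
    avoid_when: str = ""  # optional "Do NOT use when" text


def render_tools_section(policies: list[ToolPolicy]) -> str:
    """Render a '# TOOLS' prompt section: first the global rule for each
    behavior tag in use, then one block per tool."""
    used: list[str] = []
    for policy in policies:
        if policy.behavior not in BEHAVIOR_RULES:
            raise ValueError(f"unknown behavior tag: {policy.behavior!r}")
        if policy.behavior not in used:
            used.append(policy.behavior)

    lines = ["# TOOLS"]
    for behavior in used:
        lines.append(f"- For the tools marked {behavior}: {BEHAVIOR_RULES[behavior]}")
    for policy in policies:
        lines += ["", f"## {policy.signature} - {policy.behavior}",
                  f"Use when: {policy.use_when}"]
        if policy.avoid_when:
            lines.append(f"Do NOT use when: {policy.avoid_when}")
    return "\n".join(lines)


print(render_tools_section([
    ToolPolicy("lookup_account(email_or_phone)", "PROACTIVE",
               "verifying identity or accessing billing.",
               "caller refuses to identify after second request."),
    ToolPolicy("refund_credit(account_id, minutes)", "CONFIRMATION FIRST",
               "confirmed outage > 240 minutes in the past 7 days.",
               "outage unconfirmed."),
]))
```

Only the rules for tags that are actually used are emitted, which keeps unused global rules out of the prompt.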

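Tool output formatting

Some tool outputs, especially long strings that must be repeated verbatim, can fall outside the model's usual training distribution. During training, tool outputs typically look like JSON objects with named fields. If your tool returns a raw string and you separately ask the model to "repeat this exactly," the model may be more prone to paraphrasing, truncating, or mixing in its own opener.

A practical fix is to make the tool output look like a normal result and to state the verbatim requirement in a machine-readable way.

When to use: A tool returns long or complex structured content (multi-sentence instructions, handoff packets, IDs/links, policy summaries, multi-step procedures, and so on), and you observe truncation, paraphrasing, dropped fields, reordering, or the model mixing in its own opener or commentary.
What it does: Wraps the tool output in a small, explicit JSON envelope (for example, response_text plus flags such as require_repeat_verbatim, format, or content_type) so that the response looks more in-distribution and the expected rendering behavior is machine-explicit.
How to adapt: Keep the schema minimal and stable. Document the expected tool output structure clearly both in your tool instructions and next to the tool definition (for example, "If require_repeat_verbatim is true, output response_text exactly and nothing else," or "Render response_text as-is; do not add, omit, or reorder fields from the tool output.").

Example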

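Example: raw string (more error-prone)

Tool returns:

I just sent you an email with the verification link. Please open it and click “Confirm”.

The model sometimes:

Says "I've sent you an email with a verification link..." (paraphrasing)
Drops the last sentence (truncation)
Adds extra commentary ("Is there anything else I can help with?")

Example: JSON envelope (more in-distribution, more reliable)

Tool returns: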

{
  "response_text": "I just sent you an email with the verification link. Please open it and click “Confirm”.",
  "require_repeat_verbatim": true
}

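Because this looks like a typical tool result (a JSON object), the model usually finds it easier to:

Identify what the "authoritative" content is (response_text)
Understand the rendering constraint (require_repeat_verbatim)
Reproduce the tool output cleanly, without truncation or added commentary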
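On the application side, the envelope can be produced by a small wrapper around whatever string the tool returns. The sketch below is an illustration under assumptions: `wrap_tool_output` and `function_call_output_event` are hypothetical helpers, `call_123` is a made-up call ID, and the event shape (a `conversation.item.create` event carrying a `function_call_output` item) reflects our understanding of how tool results are returned to a Realtime session, so check it against the API reference before relying on it.

```python
import json


def wrap_tool_output(text: str, repeat_verbatim: bool = True) -> str:
    """Wrap a raw tool string in the small JSON envelope described above."""
    envelope = {
        "response_text": text,
        "require_repeat_verbatim": repeat_verbatim,
    }
    # ensure_ascii=False keeps curly quotes and non-ASCII text readable.
    return json.dumps(envelope, ensure_ascii=False)


def function_call_output_event(call_id: str, text: str) -> dict:
    """Build the event that hands the wrapped result back to the session
    (event shape assumed, see the note above)."""
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": call_id,
            "output": wrap_tool_output(text),
        },
    }


event = function_call_output_event(
    "call_123",  # hypothetical ID taken from the model's function call
    "I just sent you an email with the verification link. "
    "Please open it and click “Confirm”.",
)
print(json.dumps(event, indent=2, ensure_ascii=False))
```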

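Rephrasing the Supervisor tool (responder-thinker architecture)

In many voice setups, the Realtime model acts as the responder (speaking to the user) while a stronger text model acts as the thinker (planning, policy lookups, SOP completion, and so on). Text replies are often not directly suitable for speech, so the responder must rephrase the thinker's text into a speech-friendly response before generating audio.

When to use: The responder's spoken output sounds stiff, too long, or awkward after it receives the thinker's response.
What it does: Adds clear instructions that guide the responder to reshape the thinker's text into a short, natural, speech-first reply.
How to adapt: Adjust the phrasing style, openers, and brevity limits to match your use case.

Example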

# Tools
## Supervisor Tool
Name: getNextResponseFromSupervisor(relevantContextFromLastUserMessage: string)


When to call:
- Any request outside the allow list.
- Any factual, policy, account, or process question.
- Any action that might require internal lookups or system changes.


When not to call:
- Simple greetings and basic chitchat.
- Requests to repeat or clarify.
- Collecting parameters for later Supervisor use:
  - phone_number for account help (getUserAccountInfo)
  - zip_code for store lookup (findNearestStore)
  - topic or keyword for policy lookup (lookupPolicyDocument)


Usage rules and preamble:
1) Say a neutral filler phrase to the user, then immediately call the tool. Approved fillers: “One moment.”, “Let me check.”, “Just a second.”, “Give me a moment.”, “Let me see.”, “Let me look into that.” Fillers must not imply success or failure.
2) Do not mention the “Supervisor” when responding with filler phrase.
3) relevantContextFromLastUserMessage is a one-line summary of the latest user message; use an empty string if nothing salient.
4) After the tool returns, apply Rephrase Supervisor and send your reply.


### Rephrase Supervisor
- Start with a brief conversational opener using active language, then flow into the answer (for example: “Thanks for waiting—”, “Just finished checking that.”, “I’ve got that pulled up now.”).
- Keep it short: no more than 2 sentences.
- Use this template: opener + one-sentence gist + up to 3 key details + a quick confirmation or choice (for example: “Does that match what you expected?”, “Want me to review options?”).
- Read numbers for speech: money naturally (“$45.20” → “forty-five dollars and twenty cents”), phone numbers 3-3-4, addresses with individual digits, dates/times plainly (“August twelfth”, “three-thirty p.m.”).

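Here is an example without the rephrasing instruction:

A: Your current credit card balance is positive at 32,323,232 AUD.

Here is the same example with the rephrasing instruction:

A: Just finished checking. Your credit card balance is thirty-two million three hundred twenty-three thousand two hundred thirty-two dollars in your favor. Your last payment was processed on August first. Does that match what you expected?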
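The "Read numbers for speech" rule asks for phone numbers in a 3-3-4 grouping. As an optional complement to the prompt, the target rendering can be written down in code, for example to build expected outputs for evals. This is a sketch under assumptions: `speak_phone_number` is a hypothetical helper, it handles 10-digit numbers only, and nothing in the guide requires doing this outside the model.

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def speak_phone_number(raw: str) -> str:
    """Render a 10-digit phone number as spoken digits grouped 3-3-4.

    Punctuation and spaces in the input are ignored.
    """
    digits = [ch for ch in raw if ch in "0123456789"]
    if len(digits) != 10:
        raise ValueError("expected a 10-digit phone number")
    words = [DIGIT_WORDS[int(d)] for d in digits]
    groups = [words[0:3], words[3:6], words[6:10]]
    return ", ".join(" ".join(group) for group in groups)


print(speak_phone_number("(415) 555-0123"))
# four one five, five five five, zero one two three
```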

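Common tools

gpt-realtime-1.5 has been trained to use the following common tools effectively. If your use case needs similar behavior, keep the names, signatures, and descriptions as close to these as you can to maximize reliability and stay in-distribution.

Below are some of the important common tools the model has been trained on:

Example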

# answer(question: string)
Description: Call this when the customer asks a question that you don't have an answer to or asks to perform an action.


# escalate_to_human()
Description: Call this when a customer asks for escalation, or to talk to someone else, or expresses dissatisfaction with the call.


# finish_session()
Description: Call this when a customer says they're done with the session or doesn't want to continue. If it's ambiguous, confirm with the customer before calling.

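Conversation flow

This section covers how to structure the conversation into clear, goal-driven phases so the model knows exactly what to do at each step. It defines the purpose of each phase, the instructions for moving through it, and the concrete "exit criteria" for transitioning to the next one. This keeps the model from stalling, skipping steps, or jumping ahead, and keeps the conversation organized from greeting to resolution.

Organizing the prompt into conversation states also makes it easier to identify error modes and iterate more effectively.

When to use: The conversation feels disorganized, stalls before reaching the goal, or the model struggles to complete the objective effectively.
What it does: Breaks the interaction into phases with clear goals, instructions, and exit criteria.
How to adapt: Rename the phases to match your workflow; modify each phase's instructions to match the behavior you want; keep "Exit when" specific and minimal.

Example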

# Conversation Flow
## 1) Greeting
Goal: Set tone and invite the reason for calling.
How to respond:
- Identify as NorthLoop Internet Support.
- Keep the opener brief and invite the caller’s goal.
- Confirm that the caller is a NorthLoop customer.
Exit when: Caller states they are a NorthLoop customer and mentions an initial goal or symptom.


## 2) Discover
Goal: Classify the issue and capture minimal details.
How to respond:
- Determine billing vs connectivity with one targeted question.
- For connectivity: collect the service address.
- For billing/account: collect email or phone used on the account.
Exit when: Intent and address (for connectivity) or email/phone (for billing) are known.


## 3) Verify
Goal: Confirm identity and retrieve the account.
How to respond:
- Once you have email or phone, call lookup_account(email_or_phone).
- If lookup fails, try the alternate identifier once; otherwise proceed with general guidance or offer escalation if account actions are required.
Exit when: Account ID is returned.


## 4) Diagnose
Goal: Decide outage vs local issue.
How to respond:
- For connectivity, call check_outage(address).
- If outage=true, skip local steps; move to Resolve with outage context.
- If outage=false, guide a short reboot/cabling check; confirm each step’s result before continuing.
Exit when: Root cause known.


## 5) Resolve
Goal: Apply fix, credit, or appointment.
How to respond:
- If confirmed outage > 240 minutes in the last 7 days, call refund_credit(account_id, 60).
- If outage=false and issue persists after basic checks, offer “10am–12pm ET” or “2pm–4pm ET” and call schedule_technician(account_id, chosen window).
- If the local fix worked, state the result and next steps briefly.
Exit when: A fix/credit/appointment has been applied and acknowledged by the caller.


## 6) Confirm/Close
Goal: Confirm outcome and end cleanly.
How to respond:
- Restate the result and any next step (e.g., stabilization window or tech ETA).
- Invite final questions; close politely if none.
Exit when: Caller declines more help.

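Sample phrases

Sample phrases act as "anchor examples" for the model. They show the style, brevity, and tone you want the model to follow without locking it into one rigid response.

When to use: Responses lack your brand style or are inconsistent.
What it does: Provides sample phrases that the model can vary to stay natural and brief.
How to adapt: Swap in on-brand examples; keep the "do not always use" warning.

Example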

# Sample Phrases
- Below are sample examples that you should use for inspiration. DO NOT ALWAYS USE THESE EXAMPLES, VARY YOUR RESPONSES.

Acknowledgements: “On it.” “One moment.” “Good question.”
Clarifiers: “Do you want A or B?” “What’s the deadline?”
Bridges: “Here’s the quick plan.” “Let’s keep it simple.”
Empathy (brief): “That’s frustrating—let’s fix it.”
Closers: “Anything else before we wrap?” “Happy to help next time.”

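Note: If your voice system ends up repeating only the sample phrases, which makes the experience sound more robotic, try adding a variety constraint. We have seen this approach fix the problem.

Conversation flow + sample phrases

A useful pattern is to add sample phrases to the individual conversation flow states to teach the model what a good response looks like:

Example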

# Conversation Flow
## 1) Greeting
Goal: Set tone and invite the reason for calling.
How to respond:
- Identify as NorthLoop Internet Support.
- Keep the opener brief and invite the caller’s goal.
Sample phrases (do not always repeat the same phrases, vary your responses):
- “Thanks for calling NorthLoop Internet—how can I help today?”
- “You’ve reached NorthLoop Support. What’s going on with your service?”
- “Hi there—tell me what you’d like help with.”
Exit when: Caller states an initial goal or symptom.


## 2) Discover
Goal: Classify the issue and capture minimal details.
How to respond:
- Determine billing vs connectivity with one targeted question.
- For connectivity: collect the service address.
- For billing/account: collect email or phone used on the account.
Sample phrases (do not always repeat the same phrases, vary your responses):
- “Is this about your bill or your internet speed?”
- “What address are you using for the connection?”
- “What’s the email or phone number on the account?”
Exit when: Intent and address (for connectivity) or email/phone (for billing) are known.


## 3) Verify
Goal: Confirm identity and retrieve the account.
How to respond:
- Once you have email or phone, call lookup_account(email_or_phone).
- If lookup fails, try the alternate identifier once; otherwise proceed with general guidance or offer escalation if account actions are required.
Sample phrases:
- “Thanks—looking up your account now.”
- “If that doesn’t pull up, what’s the other contact—email or phone?”
- “Found your account. I’ll take care of this.”
Exit when: Account ID is returned.


## 4) Diagnose
Goal: Decide outage vs local issue.
How to respond:
- For connectivity, call check_outage(address).
- If outage=true, skip local steps; move to Resolve with outage context.
- If outage=false, guide a short reboot/cabling check; confirm each step’s result before continuing.
Sample phrases (do not always repeat the same phrases, vary your responses):
- “I’m running a quick outage check for your area.”
- “No outage reported—let’s try a fast modem reboot.”
- “Please confirm the modem lights: is the internet light solid or blinking?”
Exit when: Root cause known.


## 5) Resolve
Goal: Apply fix, credit, or appointment.
How to respond:
- If confirmed outage > 240 minutes in the last 7 days, call refund_credit(account_id, 60).
- If outage=false and issue persists after basic checks, offer “10am–12pm ET” or “2pm–4pm ET” and call schedule_technician(account_id, chosen window).
- If the local fix worked, state the result and next steps briefly.
Sample phrases (do not always repeat the same phrases, vary your responses):
- “There’s been an extended outage—adding a 60-minute bill credit now.”
- “No outage—let’s book a technician. I can do 10am–12pm ET or 2pm–4pm ET.”
- “Credit applied—you’ll see it on your next bill.”
Exit when: A fix/credit/appointment has been applied and acknowledged by the caller.


## 6) Confirm/Close
Goal: Confirm outcome and end cleanly.
How to respond:
- Restate the result and any next step (e.g., stabilization window or tech ETA).
- Invite final questions; close politely if none.
Sample phrases (do not always repeat the same phrases, vary your responses):
- “We’re all set: [credit applied / appointment booked / service restored].”
- “You should see stable speeds within a few minutes.”
- “Your technician window is 10am–12pm ET.”
Exit when: Caller declines more help.

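Advanced conversation flow

As use cases grow in complexity, you need a structure that scales while keeping the model effective. The key is balancing maintainability with simplicity: too many rigid states can overload the model, hurt performance, and make the conversation feel robotic.

A better approach is to design flows that reduce the complexity the model perceives. By handling state in a structured yet flexible way, you make it easier for the model to stay focused and responsive, which improves the user experience.

Two common patterns for managing complex scenarios are:

Conversation flow as a state machine
Dynamic conversation flow via session.update

Conversation flow as a state machine

Define the conversation as a JSON structure that encodes both states and transitions. This makes it easy to reason about coverage, identify edge cases, and track changes over time. Because it is stored as code, you can version it, diff it, and extend it as the flow evolves. A state machine also gives you fine-grained control over how and when the conversation moves from one state to another.

Example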

# Conversation States
[
  {
    "id": "1_greeting",
    "description": "Begin each conversation with a warm, friendly greeting, identifying the service and offering help.",
    "instructions": [
        "Use the company name 'Snowy Peak Boards' and provide a warm welcome.",
        "Let them know upfront that for any account-specific assistance, you’ll need some verification details."
    ],
    "examples": [
      "Hello, this is Snowy Peak Boards. Thanks for reaching out! How can I help you today?"
    ],
    "transitions": [{
      "next_step": "2_get_first_name",
      "condition": "Once greeting is complete."
    }, {
      "next_step": "3_get_and_verify_phone",
      "condition": "If the user provides their first name."
    }]
  },
  {
    "id": "2_get_first_name",
    "description": "Ask for the user’s name (first name only).",
    "instructions": [
      "Politely ask, 'Who do I have the pleasure of speaking with?'",
      "Do NOT verify or spell back the name; just accept it."
    ],
    "examples": [
      "Who do I have the pleasure of speaking with?"
    ],
    "transitions": [{
      "next_step": "3_get_and_verify_phone",
      "condition": "Once name is obtained, OR name is already provided."
    }]
  },
  {
    "id": "3_get_and_verify_phone",
    "description": "Request phone number and verify by repeating it back.",
    "instructions": [
      "Politely request the user’s phone number.",
      "Once provided, confirm it by repeating each digit and ask if it’s correct.",
      "If the user corrects you, confirm AGAIN to make sure you understand."
    ],
    "examples": [
      "I'll need some more information to access your account if that's okay. May I have your phone number, please?",
      "You said 0-2-1-5-5-5-1-2-3-4, correct?",
      "You said 4-5-6-7-8-9-0-1-2-3, correct?"
    ],
    "transitions": [{
      "next_step": "4_authentication_DOB",
      "condition": "Once phone number is confirmed"
    }]
  },
...
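Because the flow is plain data, it can be checked before it is placed in a prompt. The sketch below looks for transitions that point at a state ID that is not defined, which is an easy mistake when states are renamed. `find_dangling_transitions` is a hypothetical helper, and the two states are trimmed-down copies of the ones above:

```python
def find_dangling_transitions(states: list[dict]) -> list[tuple[str, str]]:
    """Return (state_id, next_step) pairs whose target state is undefined."""
    known_ids = {state["id"] for state in states}
    return [
        (state["id"], transition["next_step"])
        for state in states
        for transition in state.get("transitions", [])
        if transition["next_step"] not in known_ids
    ]


# Trimmed-down states; "3_get_and_verify_phone" is deliberately left out
# to show what the check reports.
states = [
    {
        "id": "1_greeting",
        "transitions": [
            {"next_step": "2_get_first_name", "condition": "Once greeting is complete."},
        ],
    },
    {
        "id": "2_get_first_name",
        "transitions": [
            {"next_step": "3_get_and_verify_phone", "condition": "Once name is obtained."},
        ],
    },
]

print(find_dangling_transitions(states))
# [('2_get_first_name', '3_get_and_verify_phone')]
```

Running a check like this in CI is one way to take advantage of the flow being versioned as code.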

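Dynamic conversation flow

In this pattern, the conversation adapts in real time by updating the system prompt and tool list based on the current state. Rather than exposing the model to every possible rule and tool at once, you provide only what is relevant to the active phase of the conversation.

When a state's exit criteria are met, use session.update to transition, replacing the prompt and tools with what the next phase needs.

This approach reduces the model's cognitive load, making it easier to handle complex tasks without being distracted by irrelevant context.

Example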

from typing import Dict, List, Literal

State = Literal["verify", "resolve"]

# Allowed transitions
TRANSITIONS: Dict[State, List[State]] = {
    "verify": ["resolve"],
    "resolve": []  # terminal
}

def build_state_change_tool(current: State) -> dict:
    allowed = TRANSITIONS[current]
    readable = ", ".join(allowed) if allowed else "no further states (terminal)"
    return {
        "type": "function",
        "name": "set_conversation_state",
        "description": (
            f"Switch the conversation phase. Current: '{current}'. "
            f"You may switch only to: {readable}. "
            "Call this AFTER exit criteria are satisfied."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "next_state": {"type": "string", "enum": allowed}
            },
            "required": ["next_state"]
        }
    }

# Minimal business tools per state
TOOLS_BY_STATE: Dict[State, List[dict]] = {
    "verify": [{
        "type": "function",
        "name": "lookup_account",
        "description": "Fetch account by email or phone.",
        "parameters": {
            "type": "object",
            "properties": {"email_or_phone": {"type": "string"}},
            "required": ["email_or_phone"]
        }
    }],
    "resolve": [{
        "type": "function",
        "name": "schedule_technician",
        "description": "Book a technician visit.",
        "parameters": {
            "type": "object",
            "properties": {
                "account_id": {"type": "string"},
                "window": {"type": "string", "enum": ["10-12 ET", "14-16 ET"]}
            },
            "required": ["account_id", "window"]
        }
    }]
}

# Short, phase-specific instructions
INSTRUCTIONS_BY_STATE: Dict[State, str] = {
    "verify": (
        "# Role & Objective\n"
        "Verify identity to access the account.\n\n"
        "# Conversation (Verify)\n"
        "- Ask for the email or phone on the account.\n"
        "- Read back digits one-by-one (e.g., '4-1-5… Is that correct?').\n"
        "Exit when: Account ID is returned.\n"
        "When exit is satisfied: call set_conversation_state(next_state=\"resolve\")."
    ),
    "resolve": (
        "# Role & Objective\n"
        "Apply a fix by booking a technician.\n\n"
        "# Conversation (Resolve)\n"
        "- Offer two windows: '10–12 ET' or '2–4 ET'.\n"
        "- Book the chosen window.\n"
        "Exit when: Appointment is confirmed.\n"
        "When exit is satisfied: end the call politely."
    )
}

def build_session_update(state: State) -> dict:
    """Return the JSON payload for a Realtime `session.update` event."""
    return {
        "type": "session.update",
        "session": {
            "instructions": INSTRUCTIONS_BY_STATE[state],
            "tools": TOOLS_BY_STATE[state] + [build_state_change_tool(state)]
        }
    }
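The example above builds the payloads but does not show what happens when the model calls `set_conversation_state`. A minimal sketch of that step, assuming the tool arguments arrive as a JSON string; `next_state_from_tool_call` is a hypothetical helper, and the transition table is repeated here only to keep the snippet self-contained:

```python
import json

# Same transition table as in the example above.
TRANSITIONS = {
    "verify": ["resolve"],
    "resolve": [],  # terminal
}


def next_state_from_tool_call(current: str, arguments_json: str) -> str:
    """Validate a set_conversation_state call and return the new state.

    The enum in the tool schema already restricts the model's choices;
    this server-side check is a second line of defense.
    """
    requested = json.loads(arguments_json).get("next_state")
    if requested not in TRANSITIONS[current]:
        raise ValueError(f"transition {current!r} -> {requested!r} is not allowed")
    return requested


state = next_state_from_tool_call("verify", '{"next_state": "resolve"}')
print(state)  # resolve
```

After a successful transition, the application would send `build_session_update(state)` from the example above over its Realtime connection, so the model sees only the next phase's instructions and tools.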

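Safety and escalation

For a Realtime voice agent, it is usually important to have a reliable way to hand off to a human. In this section, modify the instructions on when to hand off to suit your use case.

When to use: The model struggles to determine when to escalate to a human or a fallback system.
What it does: Defines fast, reliable escalation and what to say.
How to adapt: Insert your own thresholds and the phrases the model must say.

Example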

# Safety & Escalation
When to escalate (no extra troubleshooting):
- Safety risk (self-harm, threats, harassment)
- User explicitly asks for a human
- Severe dissatisfaction (e.g., “extremely frustrated,” repeated complaints, profanity)
- **2** failed tool attempts on the same task **or** **3** consecutive no-match/no-input events
- Out-of-scope or restricted (e.g., real-time news, financial/legal/medical advice)

What to say at the same time as calling the escalate_to_human tool (MANDATORY):
- “Thanks for your patience—I’m connecting you with a specialist now.”
- Then call the tool: `escalate_to_human`

Examples that would require escalation:
- “This is the third time the reset didn’t work. Just get me a person.”
- “I am extremely frustrated!”
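The numeric thresholds in this prompt (2 failed tool attempts on the same task, 3 consecutive no-match/no-input events) can also be tracked by the application as a backstop in case the model miscounts. This goes beyond what the guide describes, so treat it as an assumption: `EscalationTracker` and its method names are hypothetical, and how failures and no-input events are detected is left to the surrounding application.

```python
from collections import defaultdict
from dataclasses import dataclass, field

MAX_TOOL_FAILURES = 2         # failed attempts on the same task
MAX_CONSECUTIVE_NO_INPUT = 3  # consecutive no-match/no-input events


@dataclass
class EscalationTracker:
    """Counts the events behind the numeric escalation thresholds."""
    tool_failures: dict = field(default_factory=lambda: defaultdict(int))
    consecutive_no_input: int = 0

    def record_tool_failure(self, task: str) -> None:
        self.tool_failures[task] += 1

    def record_no_input(self) -> None:
        self.consecutive_no_input += 1

    def record_valid_input(self) -> None:
        # Any understood user turn breaks the "consecutive" streak.
        self.consecutive_no_input = 0

    def should_escalate(self) -> bool:
        too_many_failures = any(
            count >= MAX_TOOL_FAILURES for count in self.tool_failures.values()
        )
        return too_many_failures or self.consecutive_no_input >= MAX_CONSECUTIVE_NO_INPUT


tracker = EscalationTracker()
tracker.record_tool_failure("reset_modem")
tracker.record_tool_failure("reset_modem")
print(tracker.should_escalate())  # True
```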

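The first example shows conversation responses from gpt-4o-realtime-preview-2025-06-03 using the instruction.

The second example shows conversation responses from gpt-realtime-1.5 using the instruction.

gpt-realtime-1.5 follows the instruction and hands off to a human more reliably.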

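Next steps

Review the earlier Realtime prompting guide for more gpt-realtime-1.5 examples.
Review the Realtime evals guide to test typical voice agent behavior.
Learn how to connect with WebRTC, WebSocket, or SIP.
Learn about the Realtime conversation lifecycle.
Review Realtime costs.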
