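Citation Formatting | OpenAI API

Reliable citations build trust and help readers verify the accuracy of a response. This guide offers practical advice on preparing citable material and on instructing the model to format citations effectively, using patterns that OpenAI models are already familiar with.

Overview

A citation system has several parts: you decide what content can be cited, represent that material clearly, instruct the model how to cite, and validate the result before showing it to the user.

This guide covers the five core elements that directly involve the model:

Citable units: Define what the model can cite.
Material representation: Present source material in a clear, structured format.
Citation format: Specify the exact format the model should use when citing.
Prompt instructions: Tell the model when to cite and how to cite correctly.
Citation parsing: Extract citations from the model's response for downstream use.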

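Choosing citable units

Before you write the prompt, define clearly what the model can cite. Common options include:

Citable unit	Best for	Drawback	Example
Document	You only need to show which document the answer came from.	Not very precise.	Cite the entire employee handbook when you only need to show which document supports the claim.
Chunk / block	You want a good balance between simplicity and precision.	Still cannot pinpoint specific lines.	Cite the specific contract paragraph or retrieved chunk that contains the clause.
Line range	You need to show the exact supporting text.	Harder for the model.	Cite specific lines `L42-L47` when the user needs to verify the exact passage.

A good citable unit has the following properties:

Consistent: The same source keeps the same ID across runs.
Easy to inspect: A person can read it and understand the surrounding context.
Appropriately sized: Large enough to carry a complete meaning, small enough to stay precise.

For most systems, block-level citations are the best default. They are usually easier for the model to handle than line-level citations, and more useful to the user than document-level citations.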
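
As a rough sketch of block-level units, the snippet below splits a document into paragraph blocks and gives each one a deterministic ID. The function name `split_into_blocks`, the `block` prefix, and the blank-line splitting rule are illustrative assumptions, not something this guide prescribes.

```python
import re


def split_into_blocks(text: str, prefix: str = "block") -> dict[str, str]:
    """Split text on blank lines and map stable IDs to paragraph blocks.

    IDs depend only on paragraph order, so the same input always yields
    the same IDs (a hypothetical scheme, chosen for illustration).
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return {f"{prefix}{i}": p for i, p in enumerate(paragraphs, start=1)}


handbook = (
    "Employees may work remotely up to three days per week.\n\n"
    "Additional remote days require manager approval."
)
blocks = split_into_blocks(handbook)
```

Numbering by position keeps IDs stable only while the document itself is unchanged; a system that re-indexes edited documents may prefer a different scheme.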

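Representing citable material

The model cannot cite material that is not presented clearly. Whether the material comes from a tool or is injected directly, make sure it has:

Stable source ID: A consistent identifier, such as file1 or block1.
Readable text: Cleanly formatted source material.
Metadata (optional): URLs, timestamps, titles, and similar context.

Example of citable material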

Citation Marker: {CITATION_START}cite{CITATION_DELIMITER}file0{CITATION_STOP}
Title: Employee Handbook
URL: https://company.example/handbook
Updated: 2026-03-01

[L1] Employees may work remotely up to three days per week.
[L2] Additional remote days require manager approval.
[L3] Exceptions may apply for approved accommodations.
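
A layout like the one above could be produced with a small rendering helper. This is a minimal sketch: the function name `render_source` and its parameters are assumptions made for illustration, and the marker placeholders are left as literal text, just as they appear in the example.

```python
def render_source(source_id: str, title: str, url: str, updated: str, text: str) -> str:
    """Render one source with a citation-marker header and [L#] line prefixes."""
    header = [
        # Placeholders are kept literal here; substitute real markers later.
        "Citation Marker: {CITATION_START}cite{CITATION_DELIMITER}"
        + source_id
        + "{CITATION_STOP}",
        f"Title: {title}",
        f"URL: {url}",
        f"Updated: {updated}",
        "",
    ]
    body = [f"[L{n}] {line}" for n, line in enumerate(text.splitlines(), start=1)]
    return "\n".join(header + body)


rendered = render_source(
    "file0",
    "Employee Handbook",
    "https://company.example/handbook",
    "2026-03-01",
    "Employees may work remotely up to three days per week.\n"
    "Additional remote days require manager approval.\n"
    "Exceptions may apply for approved accommodations.",
)
```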

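Source IDs vs. locators: A source ID is a stable identifier that the model emits, such as block1. A locator is the precise span that the UI renders as a highlight, such as lines L8-L13 or Paragraph 21. In general, the model should output the source ID, and your system should resolve or render the locator. Mixing the two too early often leads to malformed citations.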

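Defining the citation format

You need to define the citation format the model will produce. Use a format that is explicit, consistent, and easy for the model to reproduce reliably.

Below are our recommended citation format and markers. We strongly recommend these markers because they closely match the markers our models were trained on. If you choose different marker values, try to keep the overall citation format similar.

Component	Purpose	Recommended value
`CITATION_START`	Opens the citation marker.	`\ue200`
Citation family	Identifies the citation type. Use `cite` for all supported sources.	`cite`
`CITATION_DELIMITER`	Separates fields within the marker.	`\ue202`
Source ID	Identifies the cited unit. `turn#` is the turn number. `item#` is the specific file, block, or URL.	`turn0file1`, `turn0block1`, `turn0url1`
Locator (optional)	Narrows the citation to a precise span.	`L8-L13`
`CITATION_STOP`	Closes the citation marker.	`\ue201`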

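For tool calls, turnN increments once per tool call, not once per individual result. Within a single call, sources are distinguished by suffix, such as file0, file1, and so on. In a single-response system, all citations will be turn0… only if the model makes exactly one tool call before answering. If it makes multiple tool calls, you may see citations such as turn0fileX, turn1fileX, and so on.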
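
The numbering rule can be sketched as a small allocator that bumps the turn counter once per tool call and numbers results within the call. The class name `SourceIdAllocator` and its `assign` method are hypothetical, introduced only to illustrate the scheme described above.

```python
class SourceIdAllocator:
    """Assign turn<N><kind><M> IDs: N counts tool calls, M counts results in a call."""

    def __init__(self) -> None:
        self._turn = 0

    def assign(self, results: list[str], kind: str = "file") -> dict[str, str]:
        """Return a mapping of source ID to result text for one tool call."""
        ids = {f"turn{self._turn}{kind}{i}": r for i, r in enumerate(results)}
        self._turn += 1  # one increment per tool call, not per result
        return ids


allocator = SourceIdAllocator()
first_call = allocator.assign(["handbook text", "policy text"])
second_call = allocator.assign(["contract text"])
```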

Template

{CITATION_START}<citation_family>{CITATION_DELIMITER}<source_id>{CITATION_DELIMITER}<locator>{CITATION_STOP}

Example

{CITATION_START}cite{CITATION_DELIMITER}turn0file1{CITATION_DELIMITER}L8-L13{CITATION_STOP}

If your system does not use locators, omit that field:

{CITATION_START}cite{CITATION_DELIMITER}turn0file1{CITATION_STOP}
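
For testing parsers or building few-shot examples, the template can be assembled programmatically. The sketch below uses the recommended marker values from the table; the helper name `format_citation` is an assumption for illustration.

```python
from typing import Optional

CITATION_START = "\ue200"
CITATION_DELIMITER = "\ue202"
CITATION_STOP = "\ue201"


def format_citation(
    source_id: str, locator: Optional[str] = None, family: str = "cite"
) -> str:
    """Build one citation marker; the locator field is omitted when not given."""
    fields = [family, source_id]
    if locator:
        fields.append(locator)
    return CITATION_START + CITATION_DELIMITER.join(fields) + CITATION_STOP


with_locator = format_citation("turn0file1", "L8-L13")
without_locator = format_citation("turn0file1")
```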

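Writing effective citation instructions

For the highest accuracy, use familiar citation patterns. Custom or unfamiliar formats increase the model's cognitive load and lead to citation errors, especially in the following cases:

At low reasoning effort, the model does not have enough budget to recover from formatting mistakes.
In highly complex tasks, most of the reasoning budget goes to solving the task itself rather than cleaning up citation syntax.

Below, we recommend a citation format that stays close to patterns the model already knows. You can use it as is or adapt it to your own system.

If you want to define your own prompt, specify:

The exact marker syntax.
Where citations go.
When to cite and when not to cite.
How to cite multiple supporting sources.
Which formats are prohibited.
What to do when supporting content is missing.

Recommended prompt instructions

Use the following format to give the model explicit instructions: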

## Citations

Results are returned by "tool_1". Each message from `tool_1` is called a "source" and identified by its reference ID, which is the first occurrence of 【turn\d+\w+\d+】 (e.g. 【turn2file1】). In this example, the string "turn2file1" would be the source reference ID.

Citations are references to `tool_1` sources. Citations may be used to refer to either a single source or multiple sources.

Citations to a single source must be written as {CITATION_START}cite{CITATION_DELIMITER}turn\d+\w+\d+{CITATION_STOP} (e.g. {CITATION_START}cite{CITATION_DELIMITER}turn2file5{CITATION_STOP}).

Citations to multiple sources must be written as {CITATION_START}cite{CITATION_DELIMITER}turn\d+\w+\d+{CITATION_DELIMITER}turn\d+\w+\d+{CITATION_DELIMITER}...{CITATION_STOP} (e.g. {CITATION_START}cite{CITATION_DELIMITER}turn2file5{CITATION_DELIMITER}turn2file1{CITATION_DELIMITER}...{CITATION_STOP}).

Citations must not be placed inside markdown bold, italics, or code fences, as they will not display correctly. Instead, place the citations outside the markdown block. Citations outside code fences may not be placed on the same line as the end of the code fence.

You must NOT write reference ID turn\d+\w+\d+ verbatim in the response text without putting them between {CITATION_START}...{CITATION_STOP}.

- Place citations at the end of the paragraph, or inline if the paragraph is long, unless the user requests specific citation placement.
- Citations must be placed after punctuation.
- Citations must not be all grouped together at the end of the response.
- Citations must not be put in a line or paragraph with nothing else but the citations themselves.
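
The instructions above are written with `{CITATION_START}`-style placeholders. Assuming those placeholders are meant to be replaced with the actual marker characters before the prompt is sent, a minimal substitution sketch could look like this (the `MARKERS` table and `fill_markers` name are hypothetical). Plain string replacement is used so that other braces or regex fragments in the prompt are left untouched.

```python
MARKERS = {
    "{CITATION_START}": "\ue200",
    "{CITATION_DELIMITER}": "\ue202",
    "{CITATION_STOP}": "\ue201",
}


def fill_markers(prompt_template: str) -> str:
    """Replace each placeholder name with its marker character."""
    for placeholder, value in MARKERS.items():
        prompt_template = prompt_template.replace(placeholder, value)
    return prompt_template


template = (
    "Citations to a single source must be written as "
    r"{CITATION_START}cite{CITATION_DELIMITER}turn\d+\w+\d+{CITATION_STOP}."
)
prompt = fill_markers(template)
```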

If you also want the model to output locators (for example, line ranges such as L1-L22), specify it in the prompt like this:

You *must* cite any results you use from this tool using the:
`\ue200cite\ue202turn0file0\ue202L8-L13\ue201` format ONLY if the item has a corresponding citation marker.

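Do not attempt to cite items without a corresponding citation marker, as they are not meant to be cited.
You must include line ranges in your citations.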

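Optional instructions for higher-quality citations

When you need higher-quality citation behavior, the following rules are usually worth including. Adapt this section to the needs of your use case.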

<extra_considerations_for_citations>
- **Relevance:** Include only search results and citations that support the cited response text. Irrelevant sources permanently degrade user trust.
- **Diversity:** You must base your answer on sources from diverse domains, and cite accordingly.
- **Trustworthiness:** To produce a credible response, you must rely on high quality domains, and ignore information from less reputable domains unless they are the only source.
- **Accurate Representation:** Each citation must accurately reflect the source content. Selective interpretation of the source content is not allowed.

Remember, the quality of a domain/source depends on the context.
- When multiple viewpoints exist, cite sources covering the spectrum of opinions to ensure balance and comprehensiveness.
- When reliable sources disagree, cite at least one high-quality source for each major viewpoint.
- Ensure more than half of citations come from widely recognized authoritative outlets on the topic.
- For debated topics, cite at least one reliable source representing each major viewpoint.
- Do not ignore the content of a relevant source because it is low quality.
</extra_considerations_for_citations>

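Parsing citations

Once the model has emitted citations, you need to extract them from the response text so you can resolve source IDs, render links, or remove the raw markers before showing the answer to the user.

The helper functions below are meant to be copied directly into your application. They parse single-source citations, multi-source citations, and optional line-range locators, while preserving character offsets into the original text.

This example supports only line-number locators; if your system uses a different locator format, adjust it accordingly.

Post-processing example

Citation parsing helpers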

python

import re
from typing import Iterable, TypedDict

CITATION_START = "\ue200"
CITATION_DELIMITER = "\ue202"
CITATION_STOP = "\ue201"

SOURCE_ID_RE = re.compile(r"^[A-Za-z0-9_-]+$")
LINE_LOCATOR_RE = re.compile(r"^L\d+(?:-L\d+)?$")


class Citation(TypedDict):
    raw: str
    family: str
    source_ids: list[str]
    locator: str | None
    start: int
    end: int


def extract_citations(
    text: str,
    *,
    families: tuple[str, ...] = ("cite",),
) -> list[Citation]:
    """
    Extract citations such as:

      {CITATION_START}cite{CITATION_DELIMITER}turn0file0{CITATION_STOP}
      {CITATION_START}cite{CITATION_DELIMITER}turn0file0{CITATION_DELIMITER}L8-L13{CITATION_STOP}
      {CITATION_START}cite{CITATION_DELIMITER}turn0search0{CITATION_DELIMITER}turn1news2{CITATION_STOP}
    """
    if not families:
        return []

    family_pattern = "|".join(re.escape(family) for family in families)
    token_re = re.compile(
        rf"{re.escape(CITATION_START)}"
        rf"(?P<family>{family_pattern})"
        rf"{re.escape(CITATION_DELIMITER)}"
        rf"(?P<body>.*?)"
        rf"{re.escape(CITATION_STOP)}",
        re.DOTALL,
    )

    citations: list[Citation] = []

    for match in token_re.finditer(text):
        parts = [part.strip() for part in match.group("body").split(CITATION_DELIMITER)]
        parts = [part for part in parts if part]

        if not parts:
            continue

        locator = None
        if LINE_LOCATOR_RE.fullmatch(parts[-1]):
            locator = parts.pop()

        if not parts or any(not SOURCE_ID_RE.fullmatch(part) for part in parts):
            continue

        citations.append(
            {
                "raw": match.group(0),
                "family": match.group("family"),
                "source_ids": parts,
                "locator": locator,
                "start": match.start(),
                "end": match.end(),
            }
        )

    return citations


def strip_citations(text: str, citations: Iterable[Citation]) -> str:
    """
    Remove raw citation markers from text using offsets returned by
    extract_citations().
    """
    clean_text = text

    for citation in sorted(citations, key=lambda item: item["start"], reverse=True):
        clean_text = clean_text[: citation["start"]] + clean_text[citation["end"] :]

    return clean_text

javascript

const CITATION_START = "\uE200";
const CITATION_DELIMITER = "\uE202";
const CITATION_STOP = "\uE201";

const SOURCE_ID_RE = /^[A-Za-z0-9_-]+$/;
const LINE_LOCATOR_RE = /^L\d+(?:-L\d+)?$/;

/**
 * @typedef {Object} Citation
 * @property {string} raw
 * @property {string} family
 * @property {string[]} source_ids
 * @property {string | null} locator
 * @property {number} start
 * @property {number} end
 */

/**
 * Extract citations such as:
 *
 *   {CITATION_START}cite{CITATION_DELIMITER}turn0file0{CITATION_STOP}
 *   {CITATION_START}cite{CITATION_DELIMITER}turn0file0{CITATION_DELIMITER}L8-L13{CITATION_STOP}
 *   {CITATION_START}cite{CITATION_DELIMITER}turn0search0{CITATION_DELIMITER}turn1news2{CITATION_STOP}
 *
 * @param {string} text
 * @param {{ families?: string[] }} [options]
 * @returns {Citation[]}
 */
function extractCitations(text, { families = ["cite"] } = {}) {
  if (families.length === 0) {
    return [];
  }

  const familyPattern = families
    .map((family) => family.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join("|");

  const tokenRe = new RegExp(
    `${CITATION_START}(?<family>${familyPattern})${CITATION_DELIMITER}(?<body>[\\s\\S]*?)${CITATION_STOP}`,
    "g"
  );

  /** @type {Citation[]} */
  const citations = [];

  for (const match of text.matchAll(tokenRe)) {
    const body = match.groups?.body ?? "";
    const parts = body
      .split(CITATION_DELIMITER)
      .map((part) => part.trim())
      .filter(Boolean);

    if (parts.length === 0) {
      continue;
    }

    let locator = null;
    const lastPart = parts[parts.length - 1];
    if (LINE_LOCATOR_RE.test(lastPart)) {
      locator = parts.pop() ?? null;
    }

    if (parts.length === 0 || parts.some((part) => !SOURCE_ID_RE.test(part))) {
      continue;
    }

    citations.push({
      raw: match[0],
      family: match.groups?.family ?? "",
      source_ids: parts,
      locator,
      start: match.index ?? 0,
      end: (match.index ?? 0) + match[0].length,
    });
  }

  return citations;
}

/**
 * @param {string} text
 * @param {Iterable<Citation>} citations
 * @returns {string}
 */
function stripCitations(text, citations) {
  let cleanText = text;
  const sortedCitations = Array.from(citations).sort(
    (left, right) => right.start - left.start
  );

  for (const citation of sortedCitations) {
    cleanText = cleanText.slice(0, citation.start) + cleanText.slice(citation.end);
  }

  return cleanText;
}

If your source IDs use a different format, update SOURCE_ID_RE to match your system.
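
After parsing, a natural validation step is to check that every cited ID was actually supplied to the model. The sketch below is self-contained and deliberately simpler than the helpers above: it assumes the recommended marker characters and line-range locators, and the function name `find_unknown_source_ids` is hypothetical.

```python
import re

# Matches one marker of the "cite" family with the recommended characters.
TOKEN_RE = re.compile("\ue200cite\ue202(.*?)\ue201", re.DOTALL)
LOCATOR_RE = re.compile(r"L\d+(?:-L\d+)?")


def find_unknown_source_ids(text: str, known_ids: set[str]) -> list[str]:
    """Return cited source IDs that are not in known_ids, in order of appearance."""
    unknown: list[str] = []
    for match in TOKEN_RE.finditer(text):
        fields = [f for f in match.group(1).split("\ue202") if f]
        if fields and LOCATOR_RE.fullmatch(fields[-1]):
            fields.pop()  # a trailing line-range locator is not a source ID
        unknown.extend(f for f in fields if f not in known_ids)
    return unknown


answer = (
    "Remote work is capped at three days. \ue200cite\ue202turn0file0\ue202L1-L2\ue201 "
    "Extra days need approval. \ue200cite\ue202turn0file9\ue201"
)
missing = find_unknown_source_ids(answer, {"turn0file0"})
```

What to do with unknown IDs (drop the marker, retry the request, or flag the answer) is an application decision that this sketch leaves open.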

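Examples

The examples below show two common citation patterns:

Retrieved tool context, where your tool returns citable content and IDs.
Injected context, where you provide citable blocks directly in the prompt.

Formatting citations for retrieved tool context

Use this pattern when the model retrieves context through a tool and cites that retrieved context in its answer.

Defining citable units

Choose the citable unit based on the precision your use case requires. The examples below show several recommended tool output formats. The underlying tool may vary by application, but what matters most is that the output is presented in a clear, stable structure, as in these examples.

Line-level example

Here is an example of tool call output: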

Citation Marker: {CITATION_START}cite{CITATION_DELIMITER}turn0file0{CITATION_STOP}
[L1] The service agreement states that termination for convenience requires thirty (30) days’ written notice, unless superseded by a customer-specific addendum.
[L2] In practice, renewal terms auto-extend for successive one-year periods when no written non-renewal notice is received before the deadline.
[L3] Appendix B further clarifies that pricing exceptions must be approved in writing by both Finance and the account owner.

Citation Marker: {CITATION_START}cite{CITATION_DELIMITER}turn0file1{CITATION_STOP}
...

Here, turn0file0 is the stable source ID. The line numbers are locators.
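
On the application side, a locator can be resolved back to the lines it covers so the UI can highlight them. This is a minimal sketch assuming the `[L#]` line-prefix layout shown above; the function name `resolve_locator` is an assumption.

```python
import re


def resolve_locator(source_text: str, locator: str) -> list[str]:
    """Return the [L#]-prefixed lines covered by a locator such as L2 or L1-L2."""
    match = re.fullmatch(r"L(\d+)(?:-L(\d+))?", locator)
    if match is None:
        raise ValueError(f"unsupported locator: {locator!r}")
    first = int(match.group(1))
    last = int(match.group(2) or first)
    wanted = {f"[L{n}]" for n in range(first, last + 1)}
    # Keep only lines whose leading token is one of the wanted [L#] labels.
    return [
        line for line in source_text.splitlines()
        if line.split(" ", 1)[0] in wanted
    ]


source = "[L1] alpha\n[L2] beta\n[L3] gamma"
highlight = resolve_locator(source, "L2-L3")
```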

Block-level example

Here is an example of tool call output:

Citation Marker: {CITATION_START}cite{CITATION_DELIMITER}turn0file0{CITATION_STOP}
[Block1]
The service agreement states that termination for convenience requires thirty (30) days’ written notice, unless superseded by a customer-specific addendum.
In practice, renewal terms auto-extend for successive one-year periods when no written non-renewal notice is received before the deadline.
Appendix B further clarifies that pricing exceptions must be approved in writing by both Finance and the account owner.

Citation Marker: {CITATION_START}cite{CITATION_DELIMITER}turn0file1{CITATION_STOP}
[Block2]
...

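If you want block-level citations instead of line-level citations, the recommended approach is to give each retrieved block its own stable source ID and still cite it with the same two-field citation form, for example {CITATION_START}cite{CITATION_DELIMITER}turn0file0{CITATION_STOP}, rather than inventing an entirely different citation scheme.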

Writing prompt instructions

## Citations

Results are returned by "tool_1". Each message from `tool_1` is called a "source" and identified by its reference ID, which is the first occurrence of `turn\d+file\d+` (for example, `turn0file0` or `turn2file1`). In this example, the string `turn0file0` would be the source reference ID.

Citations are references to `tool_1` sources. Citations may be used to refer to either a single source or multiple sources.

A citation to a single source must be written as:
{CITATION_START}cite{CITATION_DELIMITER}turn\d+file\d+{CITATION_STOP}

If line-level citations are supported, a citation to a specific line range must be written as:
{CITATION_START}cite{CITATION_DELIMITER}turn\d+file\d+{CITATION_DELIMITER}L\d+-L\d+{CITATION_STOP}

Citations to multiple sources must be written by emitting multiple citation markers, one for each supporting source.

You must NOT write reference IDs like `turn0file0` verbatim in the response text without putting them between {CITATION_START}...{CITATION_STOP}.

- Place citations at the end of the supported sentence, or inline if the sentence is long and contains multiple supported clauses.
- Citations must be placed after punctuation.
- Cite only retrieved sources that directly support the cited text.
- Never invent source IDs, line ranges, or block locators that were not returned by the tool.
- If multiple retrieved sources materially support a proposition, cite all of them.
- If the retrieved sources disagree, cite the conflicting sources and describe the disagreement accurately.

Example output:

The on-call handoff process is documented in the weekly support sync notes. \ue200cite\ue202turn0file0\ue202L8-L13\ue201

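Formatting citations for injected context

Use this pattern when you retrieve or prepare context ahead of time and inject it directly into the prompt.

Defining citable units

For injected context, a common pattern is to wrap each source passage in an explicit tag that carries a stable reference ID.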

<BLOCK id="block1">
The service agreement states that termination for convenience requires thirty (30) days’ written notice, unless superseded by a customer-specific addendum.
In practice, renewal terms auto-extend for successive one-year periods when no written non-renewal notice is received before the deadline.
Appendix B further clarifies that pricing exceptions must be approved in writing by both Finance and the account owner.
</BLOCK>

<BLOCK id="block2">
Syllabus
</BLOCK>
...

This makes the citable units explicit and easy for the model to cite.
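
Wrapping passages this way is easy to automate. The following sketch assumes passages arrive as a list of strings and are numbered in order; the function name `wrap_blocks` and the `block` prefix are illustrative choices.

```python
def wrap_blocks(passages: list[str], prefix: str = "block") -> str:
    """Wrap each passage in a <BLOCK id="..."> tag with a sequential ID."""
    parts = [
        f'<BLOCK id="{prefix}{i}">\n{passage}\n</BLOCK>'
        for i, passage in enumerate(passages, start=1)
    ]
    return "\n\n".join(parts)


context = wrap_blocks(["First passage.", "Second passage."])
```

If passages can themselves contain text that looks like a `</BLOCK>` tag, some escaping step would be needed; that is outside the scope of this sketch.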

Writing prompt instructions

## Citations

Supporting context is provided directly in the prompt as citable units. Each citable unit is identified by the value of its `id` attribute in the first occurrence of a tag such as `<BLOCK id="block5"> ... </BLOCK>`. In this example, `block5` would be the source reference ID.

Because this pattern does not invoke tools, there is no tool turn counter to increment. That means you do not need to use a `turn#` prefix for the citation marker. You can keep IDs in a `turn0block5` style if that matches the rest of your system, or use plain IDs like `block5` as shown here. The key requirement is that the citation marker matches the injected context ID exactly and consistently.

Citations are references to these provided citable units. Citations may be used to refer to either a single source or multiple sources.

A citation to a single source must be written as:
{CITATION_START}cite{CITATION_DELIMITER}<block_id>{CITATION_STOP}

For example:
{CITATION_START}cite{CITATION_DELIMITER}block5{CITATION_STOP}

Citations to multiple sources must be written by emitting multiple citation markers, one for each supporting block.

You must NOT write block IDs verbatim in the response text without putting them between {CITATION_START}...{CITATION_STOP}.

- Place citations at the end of the supported sentence, or inline if the sentence is long and contains multiple supported clauses.
- Citations must be placed after punctuation.
- Cite only blocks that appear in the provided context.
- Never invent new block IDs.
- Never cite outside knowledge or outside authorities.
- If multiple blocks materially support a proposition, cite all of them.
- If the provided blocks conflict, cite the conflicting blocks and describe the conflict accurately.

Example output:

The Court held that the District Court lacked personal jurisdiction over the petitioner. \ue200cite\ue202block5\ue201
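
Before display, raw markers like the one above are typically swapped for something readable. One possible rendering, sketched here under the assumption that markers carry only source IDs (no locators), replaces each marker with numbered footnote labels. The function name `render_footnotes` and the `[n]` label style are hypothetical.

```python
import re

TOKEN_RE = re.compile("\ue200cite\ue202([^\ue201]*)\ue201")


def render_footnotes(text: str) -> tuple[str, list[str]]:
    """Replace each marker with [n] labels; return the text and ordered source IDs."""
    order: list[str] = []

    def replace(match: re.Match) -> str:
        labels = []
        for source_id in match.group(1).split("\ue202"):
            if source_id not in order:
                order.append(source_id)  # first appearance fixes the number
            labels.append(f"[{order.index(source_id) + 1}]")
        return "".join(labels)

    return TOKEN_RE.sub(replace, text), order


answer = (
    "The court lacked jurisdiction. \ue200cite\ue202block5\ue201 "
    "Both filings agree. \ue200cite\ue202block2\ue202block5\ue201"
)
display_text, sources = render_footnotes(answer)
```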

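Note: OpenAI-hosted tools, such as web search, provide automatic inline citations. If you would rather use hosted tools, see the tools overview, the web search guide, and the file search guide.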
