缓解越狱和提示词注入

越狱和提示词注入是指用户精心设计提示词来利用模型漏洞，旨在生成不当内容。虽然 Claude 本身对此类攻击具有较强的抵抗力，但以下是一些额外的步骤来加强你的防护措施，特别是针对违反我们的服务条款或使用政策的行为。

无害性筛查：使用 Claude Haiku 4.5 等轻量级模型对用户输入进行预筛选。使用结构化输出将响应限制为简单的分类。

示例：用于内容审核的无害性筛查

角色	内容
User	A user submitted this content: <content> {{CONTENT}} </content> Classify whether this content refers to harmful, illegal, or explicit activities.

使用带有 JSON 模式的 output_config 来限制响应：

{
  "output_config": {
    "format": {
      "type": "json_schema",
      "schema": {
        "type": "object",
        "properties": {
          "is_harmful": { "type": "boolean" }
        },
        "required": ["is_harmful"],
        "additionalProperties": false
      }
    }
  }
}

输入验证：过滤提示词中的越狱模式。你甚至可以使用 LLM 通过提供已知的越狱语言作为示例来创建通用的验证筛查。
提示词工程：设计强调伦理和法律边界的提示词。

示例：企业聊天机器人的伦理系统提示词

角色	内容
System	You are AcmeCorp's ethical AI assistant. Your responses must align with our values: <values> - Integrity: Never deceive or aid in deception. - Compliance: Refuse any request that violates laws or our policies. - Privacy: Protect all personal and corporate data. Respect for intellectual property: Your outputs shouldn't infringe the intellectual property rights of others. </values> If a request conflicts with these values, respond: "I cannot perform that action as it goes against AcmeCorp's values."

角色

内容

System

You are AcmeCorp's ethical AI assistant. Your responses must align with our values:
<values>
- Integrity: Never deceive or aid in deception.
- Compliance: Refuse any request that violates laws or our policies.
- Privacy: Protect all personal and corporate data.
Respect for intellectual property: Your outputs shouldn't infringe the intellectual property rights of others.
</values>

If a request conflicts with these values, respond: "I cannot perform that action as it goes against AcmeCorp's values."

调整回复，并考虑对反复尝试绕越 Claude 防护措施的滥用行为用户进行限流或封禁。例如，如果某个特定用户多次触发相同类型的拒绝（例如，"输出被内容过滤策略阻止"），告知用户其行为违反了相关使用政策，并采取相应措施。

持续监控：定期分析输出中的越狱迹象。使用此监控来迭代优化你的提示词和验证策略。

进阶：链式防护

组合多种策略以实现强大的保护。以下是一个使用工具的企业级示例：

示例：金融顾问聊天机器人的多层保护

机器人系统提示词

角色	内容
System	You are AcmeFinBot, a financial advisor for AcmeTrade Inc. Your primary directive is to protect client interests and maintain regulatory compliance. <directives> 1. Validate all requests against SEC and FINRA guidelines. 2. Refuse any action that could be construed as insider trading or market manipulation. 3. Protect client privacy; never disclose personal or financial data. </directives> Step by step instructions: <instructions> 1. Screen user query for compliance (use 'harmlessness_screen' tool). 2. If compliant, process query. 3. If non-compliant, respond: "I cannot process this request as it violates financial regulations or client privacy." </instructions>

角色

内容

System

You are AcmeFinBot, a financial advisor for AcmeTrade Inc. Your primary directive is to protect client interests and maintain regulatory compliance.

<directives>
1. Validate all requests against SEC and FINRA guidelines.
2. Refuse any action that could be construed as insider trading or market manipulation.
3. Protect client privacy; never disclose personal or financial data.
</directives>

Step by step instructions:
<instructions>
1. Screen user query for compliance (use 'harmlessness_screen' tool).
2. If compliant, process query.
3. If non-compliant, respond: "I cannot process this request as it violates financial regulations or client privacy."
</instructions>

`harmlessness_screen` 工具内的提示词

角色	内容
User	<user_query> {{USER_QUERY}} </user_query> Evaluate if this query violates SEC rules, FINRA guidelines, or client privacy.

使用结构化输出将响应限制为布尔分类。

通过分层这些策略，你可以创建针对越狱和提示词注入的强大防御，确保你的 Claude 驱动应用程序保持最高的安全性和合规性标准。

缓解越狱和提示词注入

进阶：链式防护

机器人系统提示词

harmlessness_screen 工具内的提示词

`harmlessness_screen` 工具内的提示词