新智元 report
Editor: Editorial team
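Good news for programmers! OpenAI's newly released model supports Structured Outputs in the API, matches the supplied JSON Schema 100% of the time in OpenAI's evals, and cuts input costs in half.
Still racking your brain over piles of prompts, only to get a headache from wildly inconsistent output?
OpenAI has finally heard the crowd and delivered the feature developers have wanted most.
OpenAI announced today that the new feature is live: the OpenAI API now supports structured output in JSON.
JSON (JavaScript Object Notation) is the industry-standard format for files and data exchange because it is easy for humans to read and easy for machines to parse.
LLMs, however, often fight against JSON. They hallucinate, producing either responses that only partly follow instructions or gibberish that cannot be parsed at all.
Developers have had to work around this with open-source tooling, prompt tweaking, and repeated requests to get the output they want, which costs time and effort.
With Structured Outputs released today, these problems go away: model output is guaranteed to match the schema the developer specifies in JSON Schema.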
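Structured Outputs has long been the number one feature requested by developers, and Sam Altman said in a post that this release came by popular demand.
The new feature clearly struck a chord with many developers, who called it "a big deal".
Others left admiring comments along the lines of "Excellent!".
Not everyone is cheering, though: this update has renewed worries that OpenAI will swallow up startups.
For most ordinary users, the more pressing question is when GPT-5 will be released. As for JSON Schema: "What's that?"
After all, with no news of GPT-5, OpenAI's DevDay this fall may turn out much quieter than last year's.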
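Schema consistency made easy
With Structured Outputs, you only need to define a JSON Schema and the model stops being "willful" and emits data as instructed.
The new feature does not just make the model more obedient; it also makes its output considerably more reliable.
In OpenAI's evals of complex JSON schema following, the new model gpt-4o-2024-08-06 with Structured Outputs scores a perfect 100%. By comparison, gpt-4-0613 scores less than 40%.
OpenAI introduced JSON mode at last year's DevDay. JSON mode makes it more likely that the model emits valid JSON, but it does not guarantee that the output conforms to a particular schema.
Now OpenAI is extending that capability in the API so that model output exactly matches the JSON Schema supplied by the developer.
Generating structured data from unstructured input is one of the core use cases for AI in today's applications.
Developers use the OpenAI API to build powerful assistants that fetch data and answer questions via function calling, to extract structured data for data entry, and to build multi-step agentic workflows that let LLMs take actions.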
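How it works
OpenAI took a two-part approach to improving how well model output matches a JSON Schema.
First, the latest model, gpt-4o-2024-08-06, was trained to better understand complicated schemas and to produce output that matches them.
Although model performance improved markedly, reaching 93% on the benchmark, model behavior remains inherently non-deterministic.
To give developers a stable foundation for their applications, OpenAI added a deterministic method for constraining the model's output, which achieves 100% reliability.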
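Constrained decoding
OpenAI uses a technique known as constrained sampling or constrained decoding. By default, a model is entirely unconstrained when it samples output and may pick any token from the vocabulary as the next one.
That flexibility is what allows mistakes, such as emitting a character that makes the JSON invalid partway through an object.
To prevent such mistakes, OpenAI uses dynamic constrained decoding so that every generated token is valid according to the supplied schema.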
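The masking idea can be illustrated with a toy sketch. This is not OpenAI's implementation: the six-token vocabulary, the hand-written `ALLOWED_NEXT` table, and the `biased_model` scores are all invented for illustration, whereas a real decoder would derive the allowed set from a grammar compiled from the schema.

```python
import math

# Toy vocabulary and a hand-written table of which tokens may follow which.
# Both are hypothetical; a real decoder derives this from the schema's grammar.
VOCAB = ['{', '}', '"name"', ':', '"Ada"', 'hello']
ALLOWED_NEXT = {
    None: {'{'},
    '{': {'"name"'},
    '"name"': {':'},
    ':': {'"Ada"'},
    '"Ada"': {'}'},
}


def mask_logits(logits, prev_token):
    """Set the score of every token the rules forbid to -inf."""
    allowed = ALLOWED_NEXT.get(prev_token, set())
    return [s if tok in allowed else -math.inf for tok, s in zip(VOCAB, logits)]


def greedy_decode(logit_fn, max_steps=10):
    """Pick the best allowed token at each step until '}' is emitted."""
    out, prev = [], None
    for _ in range(max_steps):
        masked = mask_logits(logit_fn(prev), prev)
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        prev = VOCAB[best]
        out.append(prev)
        if prev == '}':
            break
    return ''.join(out)


def biased_model(prev):
    """Stand-in for a model that always prefers the invalid token 'hello'."""
    return [0.1, 0.2, 0.3, 0.4, 0.5, 9.0]


result = greedy_decode(biased_model)
print(result)
```

Even though the stand-in model scores `hello` highest at every step, the mask removes it before selection, so the decoded string is still valid JSON.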
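To do this, OpenAI converts the supplied JSON Schema into a context-free grammar (CFG).
For each JSON Schema, OpenAI computes a grammar that represents the schema and preprocesses its components so they can be accessed efficiently during sampling.
This makes the output more accurate while avoiding unnecessary latency. The first request with a new schema may incur extra processing time, but later requests with the same schema are fast because the preprocessed artifacts are cached.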
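Alternative approaches
Besides the CFG approach, other approaches to constrained decoding typically rely on finite state machines (FSMs) or regular expressions.
These are limited in how they can dynamically update the set of valid tokens. FSMs in particular generally struggle with deeply nested or recursive data structures.
OpenAI's CFG approach can express complex schemas well. For example, a JSON schema with recursive patterns is supported in the OpenAI API but cannot be expressed with an FSM.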
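To make the recursion point concrete, here is a small sketch of a recursive schema in which `"$ref": "#"` points back to the schema root. The `NODE_SCHEMA`, `matches_node`, and `depth` names are hypothetical, and the checker is hand-written for this one schema rather than a general JSON Schema validator. Because a tree may nest to any depth, a machine with a fixed number of states cannot track how many levels are still open, while a recursive rule can.

```python
# A recursive schema: each node's "children" are nodes of the same shape.
NODE_SCHEMA = {
    "type": "object",
    "properties": {
        "label": {"type": "string"},
        "children": {"type": "array", "items": {"$ref": "#"}},
    },
    "required": ["label", "children"],
    "additionalProperties": False,
}


def matches_node(value):
    """Hand-written check for NODE_SCHEMA only (not a general validator)."""
    if not isinstance(value, dict) or set(value) != {"label", "children"}:
        return False
    if not isinstance(value["label"], str):
        return False
    if not isinstance(value["children"], list):
        return False
    # The recursive call mirrors the recursive "$ref" in the schema.
    return all(matches_node(child) for child in value["children"])


def depth(node):
    """Nesting depth of a tree that already matches the schema."""
    return 1 + max((depth(child) for child in node["children"]), default=0)


tree = {
    "label": "root",
    "children": [
        {"label": "a", "children": [{"label": "b", "children": []}]},
    ],
}
print(matches_node(tree), depth(tree))
```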
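Input costs cut in half
Structured Outputs via function calling is available on every model that supports function calling, including the latest GPT-4o and GPT-4o-mini models as well as fine-tuned models.
The feature works in the Chat Completions API, the Assistants API, and the Batch API, and is compatible with vision inputs.
Compared with gpt-4o-2024-05-13, gpt-4o-2024-08-06 is also cheaper: developers save 50% on input ($2.50 per 1M tokens) and 33% on output ($10.00 per 1M tokens).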
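As a quick worked example of what those percentages mean, the sketch below compares the two price lists on a made-up workload. The new prices are the ones quoted above; the old prices of $5.00 and $15.00 per 1M tokens are inferred from the stated 50% and 33% savings, and the token counts are purely hypothetical.

```python
# Prices in USD per 1M tokens.
NEW = {"input": 2.50, "output": 10.00}   # quoted in the article
OLD = {"input": 5.00, "output": 15.00}   # inferred from the stated savings


def cost(prices, input_tokens, output_tokens):
    """Total cost in USD for the given token counts."""
    total = prices["input"] * input_tokens + prices["output"] * output_tokens
    return total / 1_000_000


def saving_pct(kind):
    """Percentage saved on 'input' or 'output', rounded to a whole number."""
    return round(100 * (1 - NEW[kind] / OLD[kind]))


# Hypothetical workload: 10M input tokens and 2M output tokens.
old_cost = cost(OLD, 10_000_000, 2_000_000)
new_cost = cost(NEW, 10_000_000, 2_000_000)
print(saving_pct("input"), saving_pct("output"), old_cost, new_cost)
```

On this input-heavy workload the bill drops from $80 to $45, which is why the headline saving depends on the ratio of input to output tokens.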
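How to use Structured Outputs
Structured Outputs can be used in the API in two forms:
Function calling
Structured Outputs via tools is enabled by setting strict: true in the function definition.
This is available on all models that support tools, including gpt-4-0613, gpt-3.5-turbo-0613, and all later models.
With Structured Outputs enabled, model output will match the supplied tool definition.
Example request: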
POST /v1/chat/completions
{
  "model": "gpt-4o-2024-08-06",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant. The current date is August 6, 2024. You help users query for the data they are looking for by calling the query function."
    },
    {
      "role": "user",
      "content": "look up all my orders in may of last year that were fulfilled but not delivered on time"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "query",
        "description": "Execute a query.",
        "strict": true,
        "parameters": {
          "type": "object",
          "properties": {
            "table_name": { "type": "string", "enum": ["orders"] },
            "columns": {
              "type": "array",
              "items": {
                "type": "string",
                "enum": ["id", "status", "expected_delivery_date", "delivered_at", "shipped_at", "ordered_at", "canceled_at"]
              }
            },
            "conditions": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "column": { "type": "string" },
                  "operator": { "type": "string", "enum": ["=", ">", "<", ">=", "<=", "!="] },
                  "value": {
                    "anyOf": [
                      { "type": "string" },
                      { "type": "number" },
                      {
                        "type": "object",
                        "properties": { "column_name": { "type": "string" } },
                        "required": ["column_name"],
                        "additionalProperties": false
                      }
                    ]
                  }
                },
                "required": ["column", "operator", "value"],
                "additionalProperties": false
              }
            },
            "order_by": { "type": "string", "enum": ["asc", "desc"] }
          },
          "required": ["table_name", "columns", "conditions", "order_by"],
          "additionalProperties": false
        }
      }
    }
  ]
}
Example output:
{
  "table_name": "orders",
  "columns": ["id", "status", "expected_delivery_date", "delivered_at"],
  "conditions": [
    { "column": "status", "operator": "=", "value": "fulfilled" },
    { "column": "ordered_at", "operator": ">=", "value": "2023-05-01" },
    { "column": "ordered_at", "operator": "<", "value": "2023-06-01" },
    { "column": "delivered_at", "operator": ">", "value": { "column_name": "expected_delivery_date" } }
  ],
  "order_by": "asc"
}
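One way an application might consume such arguments is to turn them into a parameterized SQL query. The sketch below is an assumption about downstream handling and not part of OpenAI's example: `to_sql`, `ALLOWED_COLUMNS`, and `ALLOWED_OPERATORS` are hypothetical names. Note that the schema leaves the condition's `column` as a free-form string, so the code still checks it against an allow-list, and it ignores `order_by` because the schema gives a direction without naming a column.

```python
import json

ALLOWED_COLUMNS = {
    "id", "status", "expected_delivery_date", "delivered_at",
    "shipped_at", "ordered_at", "canceled_at",
}
ALLOWED_OPERATORS = {"=", ">", "<", ">=", "<=", "!="}


def to_sql(arguments_json):
    """Turn the tool-call arguments into (sql, params) with ? placeholders."""
    args = json.loads(arguments_json)
    if args["table_name"] != "orders" or not set(args["columns"]) <= ALLOWED_COLUMNS:
        raise ValueError("unexpected table or column")
    clauses, params = [], []
    for cond in args["conditions"]:
        column, operator, value = cond["column"], cond["operator"], cond["value"]
        if column not in ALLOWED_COLUMNS or operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unexpected condition: {cond}")
        if isinstance(value, dict):
            # Comparison against another column rather than a literal.
            other = value["column_name"]
            if other not in ALLOWED_COLUMNS:
                raise ValueError(f"unexpected column: {other}")
            clauses.append(f"{column} {operator} {other}")
        else:
            clauses.append(f"{column} {operator} ?")
            params.append(value)
    columns = ", ".join(args["columns"])
    sql = f"SELECT {columns} FROM {args['table_name']}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


example = json.dumps({
    "table_name": "orders",
    "columns": ["id", "status"],
    "conditions": [
        {"column": "status", "operator": "=", "value": "fulfilled"},
        {"column": "delivered_at", "operator": ">",
         "value": {"column_name": "expected_delivery_date"}},
    ],
    "order_by": "asc",
})
sql, params = to_sql(example)
print(sql, params)
```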
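A new option for the response_format parameter
Developers can now request schema-conforming output by supplying a JSON Schema through json_schema, a new option for the response_format parameter.
This is useful when the model is not calling a tool but is responding to the user in a structured way.
The feature is available on the newest GPT-4o models: gpt-4o-2024-08-06, released today, and gpt-4o-mini-2024-07-18.
When a response_format is supplied with strict: true, model output will match the supplied schema.
Example request: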
POST /v1/chat/completions
{
  "model": "gpt-4o-2024-08-06",
  "messages": [
    { "role": "system", "content": "You are a helpful math tutor." },
    { "role": "user", "content": "solve 8x + 31 = 2" }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "math_response",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "steps": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "explanation": { "type": "string" },
                "output": { "type": "string" }
              },
              "required": ["explanation", "output"],
              "additionalProperties": false
            }
          },
          "final_answer": { "type": "string" }
        },
        "required": ["steps", "final_answer"],
        "additionalProperties": false
      }
    }
  }
}
Example output:
{
  "steps": [
    {
      "explanation": "Subtract 31 from both sides to isolate the term with x.",
      "output": "8x + 31 - 31 = 2 - 31"
    },
    {
      "explanation": "This simplifies to 8x = -29.",
      "output": "8x = -29"
    },
    {
      "explanation": "Divide both sides by 8 to solve for x.",
      "output": "x = -29 / 8"
    }
  ],
  "final_answer": "x = -29 / 8"
}
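The schema guarantees that `steps` and `final_answer` are present, but it says nothing about what is written inside the strings. A minimal sketch of consuming this output and checking the arithmetic independently is shown below; `parse_answer` is a hypothetical helper that assumes the answer has the form `x = a / b`, which the schema itself does not enforce.

```python
import json
from fractions import Fraction

# A shortened copy of the example output, as the API would return it in
# the message content string.
content = (
    '{"steps": [{"explanation": "Divide both sides by 8 to solve for x.", '
    '"output": "x = -29 / 8"}], "final_answer": "x = -29 / 8"}'
)
response = json.loads(content)


def parse_answer(final_answer):
    """Parse a string of the assumed form 'x = a / b' into a Fraction."""
    rhs = final_answer.split("=", 1)[1]
    numerator, denominator = (int(part) for part in rhs.split("/"))
    return Fraction(numerator, denominator)


x = parse_answer(response["final_answer"])
# Substitute back into the original equation 8x + 31 = 2.
checks_out = 8 * x + 31 == 2
print(x, checks_out)
```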
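Developers can use Structured Outputs to generate an answer step by step, guiding the model toward the intended output.
According to OpenAI, developers no longer need to validate or retry incorrectly formatted responses, and the feature allows simpler prompts.
Native SDK support
OpenAI says its Python and Node SDKs have been updated with native support for Structured Outputs.
Supplying a schema for a tool or a response format is as simple as supplying a Pydantic or Zod object. The SDKs convert the data type into a supported JSON schema, automatically deserialize the JSON response into the typed data structure, and parse refusals when they occur.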
from enum import Enum
from typing import Union

import openai
from openai import OpenAI
from pydantic import BaseModel


# Enums restrict each field to a fixed set of allowed values.
class Table(str, Enum):
    orders = "orders"
    customers = "customers"
    products = "products"


class Column(str, Enum):
    id = "id"
    status = "status"
    expected_delivery_date = "expected_delivery_date"
    delivered_at = "delivered_at"
    shipped_at = "shipped_at"
    ordered_at = "ordered_at"
    canceled_at = "canceled_at"


class Operator(str, Enum):
    eq = "="
    gt = ">"
    lt = "<"
    le = "<="
    ge = ">="
    ne = "!="


class OrderBy(str, Enum):
    asc = "asc"
    desc = "desc"


# A condition value may refer to another column instead of a literal.
class DynamicValue(BaseModel):
    column_name: str


class Condition(BaseModel):
    column: str
    operator: Operator
    value: Union[str, int, DynamicValue]


class Query(BaseModel):
    table_name: Table
    columns: list[Column]
    conditions: list[Condition]
    order_by: OrderBy


client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant. The current date is August 6, 2024. You help users query for the data they are looking for by calling the query function.",
        },
        {
            "role": "user",
            "content": "look up all my orders in may of last year that were fulfilled but not delivered on time",
        },
    ],
    # The SDK converts the Pydantic model into a strict tool definition.
    tools=[
        openai.pydantic_function_tool(Query),
    ],
)

# parsed_arguments is a Query instance rather than a raw JSON string.
print(completion.choices[0].message.tool_calls[0].function.parsed_arguments)
Native Structured Outputs support is also available for response_format.
from openai import OpenAI
from pydantic import BaseModel


class Step(BaseModel):
    explanation: str
    output: str


class MathResponse(BaseModel):
    steps: list[Step]
    final_answer: str


client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor."},
        {"role": "user", "content": "solve 8x + 31 = 2"},
    ],
    # Passing the Pydantic class makes the SDK build the JSON schema.
    response_format=MathResponse,
)

message = completion.choices[0].message
if message.parsed:
    # message.parsed is a MathResponse instance.
    print(message.parsed.steps)
    print(message.parsed.final_answer)
else:
    # The model declined to answer; refusal holds its explanation.
    print(message.refusal)
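Other use cases
Developers frequently use OpenAI's models to generate structured data for a variety of use cases.
Some further examples include:
- Dynamically generating user interfaces based on user intent
Developers can use Structured Outputs to build code-generation or UI-generation applications.
With the same response_format, different UIs can be generated from different user input.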
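One way an application might consume such output is to walk the returned tree and emit markup. The sketch below assumes each node carries `type`, `label`, `children`, and `attributes` fields, as in the landing-page example in this section. The `render` function and the shortened `page` tree are hypothetical, and a real application would also restrict which tag and attribute names it accepts.

```python
from html import escape


def render(node, indent=0):
    """Recursively render one UI node as indented HTML."""
    pad = "  " * indent
    attrs = ""
    for attr in node["attributes"]:
        # React-style className maps to the HTML class attribute.
        name = "class" if attr["name"] == "className" else attr["name"]
        attrs += f' {name}="{escape(attr["value"])}"'
    lines = [f'{pad}<{node["type"]}{attrs}>']
    if node["label"]:
        lines.append(f'{pad}  {escape(node["label"])}')
    for child in node["children"]:
        lines.append(render(child, indent + 1))
    lines.append(f'{pad}</{node["type"]}>')
    return "\n".join(lines)


# A shortened, hand-written tree in the same shape as the model output.
page = {
    "type": "div",
    "label": "",
    "children": [
        {
            "type": "div",
            "label": "Green Thumb Gardening",
            "children": [],
            "attributes": [{"name": "className", "value": "site-title"}],
        },
    ],
    "attributes": [{"name": "className", "value": "landing-page"}],
}
markup = render(page)
print(markup)
```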
For example, a landing page for a gardener can be generated from the following structured output:
{
  "type": "div",
  "label": "",
  "children": [
    {
      "type": "header",
      "label": "",
      "children": [
        { "type": "div", "label": "Green Thumb Gardening", "children": [], "attributes": [{ "name": "className", "value": "site-title" }] },
        { "type": "div", "label": "Bringing Life to Your Garden", "children": [], "attributes": [{ "name": "className", "value": "site-tagline" }] }
      ],
      "attributes": [{ "name": "className", "value": "header" }]
    },
    {
      "type": "section",
      "label": "",
      "children": [
        {
          "type": "div",
          "label": "",
          "children": [
            {
              "type": "div",
              "label": "About Us",
              "children": [
                {
                  "type": "div",
                  "label": "At Green Thumb Gardening, we specialize in transforming your outdoor spaces into beautiful, thriving gardens. Our team has decades of experience in horticulture and landscape design.",
                  "children": [],
                  "attributes": [{ "name": "className", "value": "about-description" }]
                }
              ],
              "attributes": [{ "name": "className", "value": "about-section" }]
            }
          ],
          "attributes": [{ "name": "className", "value": "content" }]
        }
      ],
      "attributes": [{ "name": "className", "value": "about-container" }]
    },
    {
      "type": "section",
      "label": "",
      "children": [
        {
          "type": "div",
          "label": "",
          "children": [
            {
              "type": "div",
              "label": "Our Services",
              "children": [
                { "type": "div", "label": "Garden Design", "children": [], "attributes": [{ "name": "className", "value": "service-item" }] },
                { "type": "div", "label": "Plant Care & Maintenance", "children": [], "attributes": [{ "name": "className", "value": "service-item" }] },
                { "type": "div", "label": "Seasonal Cleanup", "children": [], "attributes": [{ "name": "className", "value": "service-item" }] },
                { "type": "div", "label": "Custom Landscaping", "children": [], "attributes": [{ "name": "className", "value": "service-item" }] }
              ],
              "attributes": [{ "name": "className", "value": "services-list" }]
            }
          ],
          "attributes": [{ "name": "className", "value": "content" }]
        }
      ],
      "attributes": [{ "name": "className", "value": "services-container" }]
    }
  ],
  "attributes": [{ "name": "className", "value": "landing-page" }]
}
- Separating the final answer from supporting reasoning or additional commentary
Giving the model a separate chain-of-thought field can improve the final quality of the response.
Request:
{
  "model": "gpt-4o-2024-08-06",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant" },
    { "role": "user", "content": "9.11 and 9.9 -- which is bigger?" }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "reasoning_schema",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "reasoning_steps": {
            "type": "array",
            "items": { "type": "string" },
            "description": "The reasoning steps leading to the final conclusion."
          },
          "answer": {
            "type": "string",
            "description": "The final answer, taking into account the reasoning steps."
          }
        },
        "required": ["reasoning_steps", "answer"],
        "additionalProperties": false
      }
    }
  }
}
Structured output:
{
  "reasoning_steps": [
    "First step is to compare the numbers 9.11 and 9.9.",
    "Both numbers have the same whole number part, which is 9.",
    "To compare the decimal parts, convert them to the same number of decimal places.",
    "9.11 has two decimal places: it is 9.11.",
    "9.9 has one decimal place: it can be rewritten as 9.90.",
    "Now, compare 9.11 and 9.90 by looking at the decimal parts.",
    "Compare 11 with 90.",
    "90 is greater than 11, so 9.90 is greater than 9.11."
  ],
  "answer": "9.9 is bigger than 9.11."
}
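- Extracting structured data from unstructured data
For example, the model can be instructed to extract action items, due dates, and owners from meeting notes.
Request: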
POST /v1/chat/completions
{
  "model": "gpt-4o-2024-08-06",
  "messages": [
    { "role": "system", "content": "Extract action items, due dates, and owners from meeting notes." },
    { "role": "user", "content": "...meeting notes go here..." }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "action_items",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "action_items": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "description": {
                  "type": "string",
                  "description": "Description of the action item."
                },
                "due_date": {
                  "type": ["string", "null"],
                  "description": "Due date for the action item, can be null if not specified."
                },
                "owner": {
                  "type": ["string", "null"],
                  "description": "Owner responsible for the action item, can be null if not specified."
                }
              },
              "required": ["description", "due_date", "owner"],
              "additionalProperties": false
            },
            "description": "List of action items from the meeting."
          }
        },
        "required": ["action_items"],
        "additionalProperties": false
      }
    }
  }
}
Structured output:
{
  "action_items": [
    {
      "description": "Collaborate on optimizing the path planning algorithm",
      "due_date": "2024-06-30",
      "owner": "Jason Li"
    },
    {
      "description": "Reach out to industry partners for additional datasets",
      "due_date": "2024-06-25",
      "owner": "Aisha Patel"
    },
    {
      "description": "Explore alternative LIDAR sensor configurations and report findings",
      "due_date": "2024-06-27",
      "owner": "Kevin Nguyen"
    },
    {
      "description": "Schedule extended stress tests for the integrated navigation system",
      "due_date": "2024-06-28",
      "owner": "Emily Chen"
    },
    {
      "description": "Retest the system after bug fixes and update the team",
      "due_date": "2024-07-01",
      "owner": "David Park"
    }
  ]
}
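Because `due_date` and `owner` are declared as nullable, downstream code has to handle `null` explicitly. The following sketch sorts extracted items by deadline and places undated items last. The `due_key` helper and the third, undated item are made up for illustration, and the code assumes dates arrive in ISO `YYYY-MM-DD` form, which the schema (a plain string) does not enforce.

```python
from datetime import date

# Two items from the example output plus one hypothetical undated item.
action_items = [
    {"description": "Collaborate on optimizing the path planning algorithm",
     "due_date": "2024-06-30", "owner": "Jason Li"},
    {"description": "Reach out to industry partners for additional datasets",
     "due_date": "2024-06-25", "owner": "Aisha Patel"},
    {"description": "Hypothetical item with no deadline or owner",
     "due_date": None, "owner": None},
]


def due_key(item):
    """Sort key: dated items first (earliest first), undated items last."""
    if item["due_date"] is None:
        return (1, date.max)
    return (0, date.fromisoformat(item["due_date"]))


ordered = sorted(action_items, key=due_key)
for item in ordered:
    print(item["due_date"], item["owner"], item["description"])
```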
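Safe Structured Outputs
Safety is a top priority for OpenAI. The new Structured Outputs functionality abides by OpenAI's existing safety policies and still allows the model to refuse an unsafe request.
To make development simpler, API responses carry a new refusal string value that lets developers detect programmatically whether the model has produced a refusal instead of output matching the schema.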
When the response does not include a refusal and the model's response has not been cut off prematurely (as indicated by finish_reason), the response will reliably be valid JSON that matches the supplied schema.
{
  "id": "chatcmpl-9nYAG9LPNonX8DAyrkwYfemr3C8HC",
  "object": "chat.completion",
  "created": 1721596428,
  "model": "gpt-4o-2024-08-06",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "refusal": "I'm sorry, I cannot assist with that request."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 81,
    "completion_tokens": 11,
    "total_tokens": 92
  },
  "system_fingerprint": "fp_3407719c7f"
}
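The two conditions above (no refusal, not cut off) translate into a small check on the raw response before parsing. This is a minimal sketch operating on plain dictionaries shaped like the response above; `extract_structured` and the two non-refusal sample responses are hypothetical, and only the first choice is inspected.

```python
import json


def extract_structured(response):
    """Return ('ok', data), ('refusal', text) or ('incomplete', reason)."""
    choice = response["choices"][0]
    message = choice["message"]
    if message.get("refusal"):
        return "refusal", message["refusal"]
    if choice["finish_reason"] != "stop":
        # For example "length" when the token limit cut the output short.
        return "incomplete", choice["finish_reason"]
    return "ok", json.loads(message["content"])


refused = {"choices": [{
    "message": {"role": "assistant",
                "refusal": "I'm sorry, I cannot assist with that request."},
    "finish_reason": "stop",
}]}
truncated = {"choices": [{
    "message": {"role": "assistant", "content": '{"final_answer": "x = -2'},
    "finish_reason": "length",
}]}
complete = {"choices": [{
    "message": {"role": "assistant", "content": '{"final_answer": "x = -29 / 8"}'},
    "finish_reason": "stop",
}]}

for sample in (refused, truncated, complete):
    print(extract_structured(sample))
```

Checking `finish_reason` before calling `json.loads` matters because a truncated response is typically not parseable JSON at all.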
References:
https://openai.com/index/introducing-structured-outputs-in-the-api/