AI Semantic Interoperability: RDF / OWL / SHACL
In brief:
A guide to AI semantic interoperability with RDF, OWL, and SHACL.
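Audience: AI Architect / Knowledge Architect / Data Product Manager / Senior BA / Financial Services Product Architect.
Core problem: Enterprise AI often fails on semantic inconsistency: the same word means different things in different systems, the same field represents different business facts in different processes, and the same metric is calculated differently by different teams. RAG, GraphRAG, agents, LLM-to-SQL, and tool calling all depend on semantic interoperability.
Learning objectives: Understand how semantic technologies such as RDF, OWL, SHACL, JSON Schema, and FIBO serve AI architecture, and turn them into deployable semantic contracts, ontology slices, constraint validation, and evals.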
Source Anchors
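| Source | Link | Purpose |
|---|---|---|
| W3C RDF | https://www.w3.org/RDF/ | Reference for graph-based fact representation and resource relationship modeling |
| W3C OWL | https://www.w3.org/OWL/ | Reference for ontology semantics, classes, properties, reasoning, and concept constraints |
| W3C SHACL | https://www.w3.org/TR/shacl/ | Reference for RDF graph constraint validation, used for semantic quality gates |
| JSON Schema | https://json-schema.org/ | Reference for JSON payload and structured output constraints |
| FIBO | https://spec.edmcouncil.org/fibo/ | Reference for financial industry ontology and semantic modeling practice |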
In one sentence:
Semantic interoperability lets AI systems attach the same meaning to the same business concept across documents, data, systems, and teams.
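1. Why AI Needs Semantic Interoperability
AI systems are not short of text; they are short of stable semantics.
Common problems:
| Problem | AI impact |
|---|---|
| Same word, different meanings | The model reads account as a customer account, a customer relationship, or a ledger account |
| Same meaning, different words | borrower, obligor, applicant, and customer are used interchangeably |
| Field without context | status=closed could mean case closed, account closed, or alert closed |
| Metric definition drift | LLM-to-SQL generates queries that look correct but use the wrong metric definition |
| Mixed RAG sources | Policies, operating manuals, training materials, and FAQs have unclear authority levels |
| Weak tool payload semantics | JSON field types are correct, but the business meaning is wrong |
| Untraceable evidence | Answers and decisions cannot be traced back to concepts, sources, and constraints |
The goal of semantic interoperability is to establish this chain: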
business concept
-> formal definition
-> relation to other concepts
-> constraints
-> source authority
-> data/tool/message mapping
-> eval and evidence
2. RDF, OWL, SHACL in One Architecture
| Technology | Solves | AI usage |
|---|---|---|
| RDF | Facts as graph triples | entity/relation graph, source lineage, evidence graph |
| OWL | Ontology semantics | class hierarchy, property meaning, reasoning, domain vocabulary |
| SHACL | Graph constraints | validate required relationships, cardinality, data shape |
| JSON Schema | Payload structure | tool parameters, structured output, event payload |
| FIBO | Domain semantics | financial concept definitions and relationships |
Put simply:
RDF says: what facts are connected.
OWL says: what concepts mean.
SHACL says: what valid graph data must look like.
JSON Schema says: what JSON payload shape is valid.
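An AI architecture does not need the full Semantic Web stack from day one. A more practical approach is to build a thin semantic slice:
- Pick only the 20-50 concepts most critical to the current use case.
- Define relationships, synonyms, terms that must not be conflated, and source authority.
- Add semantic tags to RAG metadata, tool schemas, eval cases, and metric definitions.
- Enforce constraints on high-risk concepts with SHACL, rules, or tests.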
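A thin semantic slice can start as a small in-code registry before any ontology tooling is involved. The sketch below is illustrative only: the `Concept` fields, the `SLICE` registry, and the helper names are assumptions, and treating "obligor" as an approved synonym of "borrower" is an example choice, not a rule stated in this document.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """One entry in a thin semantic slice (field names are illustrative)."""
    name: str
    definition: str
    synonyms: frozenset = field(default_factory=frozenset)
    not_same_as: frozenset = field(default_factory=frozenset)
    source_authority: str = ""


# Hypothetical three-concept slice for a lending use case.
SLICE = {
    c.name: c
    for c in [
        Concept("borrower", "Party legally obligated to repay a credit agreement",
                synonyms=frozenset({"obligor"}),  # assumed synonym
                not_same_as=frozenset({"applicant", "guarantor"}),
                source_authority="credit policy ontology"),
        Concept("applicant", "Party that applies for credit",
                not_same_as=frozenset({"borrower"}),
                source_authority="loan origination system"),
        Concept("guarantor", "Party that provides a guarantee",
                not_same_as=frozenset({"borrower"}),
                source_authority="loan contract system"),
    ]
}


def resolve(term):
    """Map a raw term to its canonical concept name, or None if unknown."""
    t = term.strip().lower()
    for concept in SLICE.values():
        if t == concept.name or t in concept.synonyms:
            return concept.name
    return None


def may_substitute(a, b):
    """True only if both terms resolve to the same canonical concept."""
    ca, cb = resolve(a), resolve(b)
    return ca is not None and ca == cb
```

A retrieval or prompt-building step could call `may_substitute` before rewriting a user term, so that "obligor" can stand in for "borrower" while "applicant" cannot.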
3. Semantic Contract
A semantic contract sits one level above an ordinary data contract, because it fixes what a field means in addition to its structure:
| Field | Example |
|---|---|
| Concept | Borrower |
| Definition | Party legally obligated to repay a credit agreement |
| Not same as | Applicant, account holder, authorized user |
| Source authority | credit policy ontology, loan contract system |
| Relationships | borrower -> party; borrower -> obligation; loan -> contract |
| Data mappings | CRM.customer_id, LOS.applicant_id, servicing.borrower_id |
| Tool mappings | loan.getBorrower, case.createCreditNote |
| RAG tags | credit-policy, loan-servicing, borrower-obligation |
| Eval cases | distinguish borrower/applicant/guarantor |
| Constraints | high-risk adverse action text must identify correct obligated party |
A semantic contract should feed into:
- glossary.
- ontology slice.
- RAG metadata.
- tool/API contract.
- LLM-to-SQL semantic layer.
- eval dataset.
- evidence graph.
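One way to make a semantic contract usable by downstream systems is to keep it as structured data and derive lookups from it. The following is a minimal sketch that encodes the Borrower example from the table above. The key names, `REQUIRED_KEYS`, and both helper functions are hypothetical and not a standard format.

```python
BORROWER_CONTRACT = {
    "concept": "Borrower",
    "definition": "Party legally obligated to repay a credit agreement",
    "not_same_as": ["Applicant", "Account holder", "Authorized user"],
    "source_authority": ["credit policy ontology", "loan contract system"],
    "data_mappings": ["CRM.customer_id", "LOS.applicant_id", "servicing.borrower_id"],
    "tool_mappings": ["loan.getBorrower", "case.createCreditNote"],
    "rag_tags": ["credit-policy", "loan-servicing", "borrower-obligation"],
    "eval_cases": ["distinguish borrower/applicant/guarantor"],
}

# Keys every contract must fill in before it is accepted (illustrative choice).
REQUIRED_KEYS = ("concept", "definition", "not_same_as", "source_authority",
                 "data_mappings", "tool_mappings", "rag_tags", "eval_cases")


def missing_fields(contract):
    """Return the required keys that are absent or empty in a contract."""
    return [key for key in REQUIRED_KEYS if not contract.get(key)]


def build_field_index(contracts):
    """Map each data field to the list of concepts that claim it."""
    index = {}
    for contract in contracts:
        for data_field in contract.get("data_mappings", []):
            index.setdefault(data_field, []).append(contract["concept"])
    return index
```

An LLM-to-SQL layer could consult the index from `build_field_index` to tell the model which business concept a column stands for, and `missing_fields` can serve as a simple review gate for new contracts.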
4. RDF for Evidence and Knowledge Graphs
RDF-style thinking is useful even if implementation starts as tables.
Example triples:
case:123 hasCustomer party:456
party:456 hasRole borrower
loan:789 governedBy contract:abc
answer:001 cites source:policy-v12
source:policy-v12 hasEffectiveDate 2026-04-01
toolcall:555 usedConcept borrower
release:rel-17 evidencedBy eval:run-42
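The same triples can be held as plain tuples and queried with simple pattern matching, which is often enough before adopting a triple store. This is a minimal sketch: `match` and `effective_dates_for` are hypothetical helpers, not part of any RDF library.

```python
# The example triples above, as (subject, predicate, object) tuples.
TRIPLES = [
    ("case:123", "hasCustomer", "party:456"),
    ("party:456", "hasRole", "borrower"),
    ("loan:789", "governedBy", "contract:abc"),
    ("answer:001", "cites", "source:policy-v12"),
    ("source:policy-v12", "hasEffectiveDate", "2026-04-01"),
    ("toolcall:555", "usedConcept", "borrower"),
    ("release:rel-17", "evidencedBy", "eval:run-42"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def effective_dates_for(answer, triples):
    """Follow answer -> cites -> source -> hasEffectiveDate."""
    dates = []
    for _, _, source in match(triples, s=answer, p="cites"):
        for _, _, effective in match(triples, s=source, p="hasEffectiveDate"):
            dates.append(effective)
    return dates
```

Calling `effective_dates_for("answer:001", TRIPLES)` walks two hops of source lineage, which is the kind of controlled path a graph can hand to a model instead of leaving it to infer the link.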
AI use:
| Graph | Purpose |
|---|---|
| Knowledge graph | domain facts and relationships |
| Evidence graph | release decision, control, eval, trace |
| Source lineage graph | document/version/citation |
| Customer/entity graph | AML, fraud, KYC relationships |
| Metric graph | metric definition, dimensions, owner, lineage |
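A graph does not mean letting the model reason freely over everything. The graph should supply the model with controlled facts, paths, and constraints.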
5. OWL for Ontology and Meaning
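OWL-style thinking is suited to:
- class hierarchies.
- property constraints.
- domain/range.
- equivalent classes.
- disjoint concepts.
- inference rules.
In AI, the most valuable use is not complex reasoning but preventing concept confusion.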
| Ontology pattern | AI benefit |
|---|---|
| Class hierarchy | Knows that credit card account and loan account are both accounts but are handled differently |
| Disjoint classes | Prevents customer complaint and transaction dispute from being conflated |
| Object property | party hasRole borrower / guarantor |
| Data property | contract hasEffectiveDate |
| Equivalence / synonym | map internal terms to standard terms |
| Restrictions | high-risk decision requires human reviewer |
For PMs, BAs, and architects, the key task is defining the ontology slice; writing the OWL file yourself is not always necessary.
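The class-hierarchy and disjointness patterns can be prototyped without an OWL reasoner. The sketch below only imitates the intent of a subclass and a disjoint-classes declaration. It is not OWL semantics, and the class names and helpers are illustrative assumptions.

```python
# Child class -> parent class (single inheritance keeps the sketch short).
SUBCLASS_OF = {
    "CreditCardAccount": "Account",
    "LoanAccount": "Account",
}

# Sets of classes that no single item may belong to at the same time.
DISJOINT = [frozenset({"CustomerComplaint", "TransactionDispute"})]


def ancestors(cls):
    """Return cls followed by all of its parent classes."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def is_a(cls, parent):
    """True if cls equals parent or inherits from it."""
    return parent in ancestors(cls)


def disjoint_violations(asserted_types):
    """Return each disjoint set fully contained in the asserted types."""
    expanded = set()
    for cls in asserted_types:
        expanded.update(ancestors(cls))
    return [sorted(group) for group in DISJOINT if group <= expanded]
```

A classifier or extraction step that tags one case as both `CustomerComplaint` and `TransactionDispute` would be caught by `disjoint_violations`, while `is_a("LoanAccount", "Account")` lets shared account logic apply to both account types.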
6. SHACL for Semantic Quality Gates
SHACL can validate whether an RDF graph satisfies constraints. Analogues in AI architecture:
| Constraint | AI quality gate |
|---|---|
| Customer-facing answer must cite source | answer node requires source citation |
| Policy source must have effective date | source node requires effectiveDate |
| High-risk recommendation must have reviewer | recommendation node requires reviewerApproval |
| Payment message explanation must include party roles | message node requires debtor/creditor/agent relation |
| Credit adverse action must reference correct obligation | decision node links to obligation and reason |
Even without a SHACL engine, these constraints can be turned into:
- data quality tests.
- RAG eval.
- structured output validators.
- release gate checks.
- evidence binder queries.
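As a rough illustration of running such gates without a SHACL engine, the sketch below checks a minimum count of required properties per node type, similar in spirit to `sh:minCount` on a property shape. The `SHAPES` format, the `type` predicate, and the node names are assumptions made for this example.

```python
# Node type -> {required property: minimum number of values}.
SHAPES = {
    "Answer": {"cites": 1},
    "PolicySource": {"hasEffectiveDate": 1},
    "HighRiskRecommendation": {"reviewerApproval": 1},
}

# Hypothetical evidence graph; the recommendation lacks a reviewer approval.
GRAPH = [
    ("answer:001", "type", "Answer"),
    ("answer:001", "cites", "source:policy-v12"),
    ("source:policy-v12", "type", "PolicySource"),
    ("source:policy-v12", "hasEffectiveDate", "2026-04-01"),
    ("rec:9", "type", "HighRiskRecommendation"),
]


def validate(triples, shapes):
    """Return (node, node_type, property) for every unmet minimum count."""
    violations = []
    for node, predicate, node_type in triples:
        if predicate != "type" or node_type not in shapes:
            continue
        for prop, min_count in shapes[node_type].items():
            count = sum(1 for s, p, _ in triples if s == node and p == prop)
            if count < min_count:
                violations.append((node, node_type, prop))
    return violations
```

Here `validate(GRAPH, SHAPES)` reports only `rec:9`, so a release gate could block on a non-empty result.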
7. JSON Schema vs SHACL
| Need | JSON Schema | SHACL |
|---|---|---|
| Validate tool JSON payload | Strong fit | Not primary |
| Validate structured LLM output | Strong fit | Possible after graph conversion |
| Validate graph relationship completeness | Limited | Strong fit |
| Validate business concept constraints | Limited | Strong fit |
| Validate API request type/enum | Strong fit | Not primary |
| Validate evidence graph coverage | Limited | Strong fit |
Rule of thumb:
JSON Schema for message/payload shape.
SHACL for graph/concept relationship validity.
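The split can be seen in a payload that passes a shape check but fails a semantic check. The sketch below uses a hand-rolled subset of shape validation (required keys, type, enum) as a stand-in. It is not the JSON Schema specification or any validator library, and the schema layout, field names, and role facts are hypothetical.

```python
# Simplified payload rules; Python types stand in for JSON Schema type names.
SCHEMA = {
    "required": ["party_id", "role"],
    "properties": {
        "party_id": {"type": str},
        "role": {"type": str, "enum": ["applicant", "borrower", "guarantor"]},
    },
}

# Graph facts the payload must agree with (illustrative).
ROLE_FACTS = [("party:456", "hasRole", "guarantor")]


def shape_errors(payload, schema):
    """Check required keys, value types, and enums only."""
    errors = []
    for key in schema["required"]:
        if key not in payload:
            errors.append(f"missing: {key}")
    for key, rule in schema["properties"].items():
        if key not in payload:
            continue
        value = payload[key]
        if not isinstance(value, rule["type"]):
            errors.append(f"wrong type: {key}")
        elif "enum" in rule and value not in rule["enum"]:
            errors.append(f"not allowed: {key}={value}")
    return errors


def semantic_errors(payload, facts):
    """Check that the claimed role is actually asserted in the graph."""
    claimed = (payload["party_id"], "hasRole", payload["role"])
    return [] if claimed in facts else [f"graph does not assert {claimed}"]
```

A payload of `{"party_id": "party:456", "role": "borrower"}` has a valid shape, yet the graph records that party as a guarantor, so only the relationship check catches the error.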
8. Financial Retail Case: Lending Policy AI
Goal: have an AI assistant help the credit team explain product policy, customer status, repayment arrangements, and exception processes.
8.1 Semantic Slice
| Concept | Definition | Confusions to avoid |
|---|---|---|
| Applicant | Applies for credit | not always borrower |
| Borrower | Obligated party | not guarantor |
| Guarantor | Provides guarantee | not primary borrower |
| Facility | Approved credit arrangement | not individual loan account |
| Loan account | Servicing account | not original contract |
| Delinquency | Past due state | not default |
| Forbearance | Temporary relief agreement | not forgiveness |
| Adverse action | Regulated denial/negative decision | not generic rejection |
8.2 Semantic Controls
| Risk | Semantic control |
|---|---|
| AI gives wrong party responsibility | borrower/applicant/guarantor eval cases |
| AI cites stale policy | source effective date required |
| AI confuses delinquency and default | ontology disjoint concepts + eval |
| AI generates adverse action language wrongly | structured output + compliance review |
| AI queries wrong metric | semantic metric contract |
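The stale-policy control becomes checkable once every source carries an effective date. The following is a minimal sketch. The `policy-v11` entry and its date are invented for illustration, and the function names are assumptions.

```python
from datetime import date

# Policy version -> effective date. policy-v11 is a made-up earlier version.
POLICY_VERSIONS = {
    "policy-v11": date(2025, 10, 1),
    "policy-v12": date(2026, 4, 1),
}


def version_in_effect(as_of, versions):
    """Return the latest version effective on or before as_of, else None."""
    candidates = [(effective, name) for name, effective in versions.items()
                  if effective <= as_of]
    return max(candidates)[1] if candidates else None


def is_stale(cited, as_of, versions):
    """True if the cited version is not the one in effect on as_of."""
    return cited != version_in_effect(as_of, versions)
```

An eval harness could run `is_stale` over every citation in an answer and fail the case when an answer dated after 2026-04-01 still cites `policy-v11`.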
9. Semantic Interoperability in AI Stack
| Stack layer | Semantic requirement |
|---|---|
| Intake | use case mapped to domain concepts |
| RAG ingestion | source tagged by concept, authority, effective date |
| Retrieval | query expansion uses approved synonyms |
| Prompt/context | concept definitions injected only when needed |
| Tool contract | fields mapped to business concepts |
| Structured output | schema uses controlled vocabulary |
| Eval | cases cover concept confusion and semantic drift |
| Observability | trace logs concept tags and source versions |
| Evidence | release bundle links requirements to semantic controls |
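For the retrieval row, query expansion can be restricted to an approved synonym list so that related but distinct concepts are never added. This is a sketch under assumptions: the synonym table is illustrative, and `expand_query` is a hypothetical helper rather than part of any retrieval framework.

```python
import re

# Canonical term -> approved alternatives (illustrative entries only).
APPROVED_SYNONYMS = {
    "borrower": ["obligor"],
    "delinquency": ["past due"],
}


def expand_query(query, synonyms):
    """Return the query plus approved synonyms of any canonical term in it."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    extra = []
    for term, alternatives in synonyms.items():
        if term in words:
            extra.extend(a for a in alternatives if a not in extra)
    return [query] + extra
```

Because "applicant" and "guarantor" are absent from the approved list for "borrower", a query about borrower obligations is not widened to documents about those other roles.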
10. Common Failure Modes
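| Failure mode | Symptom | Fix |
|---|---|---|
| Glossary-only | A glossary exists but never reaches the systems | link glossary to RAG metadata/tool/eval |
| Ontology big bang | An enterprise-wide ontology is attempted and delivery keeps slipping | thin semantic slice per use case |
| Schema-only | JSON fields are right, business concepts are wrong | add semantic contract |
| Vector-only RAG | Results are semantically similar but concepts are confused | metadata + ontology + eval |
| No drift monitoring | Evals are not updated after terms and policies change | semantic drift log |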
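11. Interview Talking Points
30-second version:
I treat semantic problems in financial AI as architecture problems. RDF lets me express entity and evidence relationships, OWL lets me define financial concepts and their relationships, SHACL lets me validate that graph data and evidence satisfy constraints, and JSON Schema lets me constrain tools and structured output. In practice there is no need to build an enterprise-wide ontology up front; I start with a thin semantic slice for each high-value use case.
2-minute version:
Taking a lending policy assistant as the example, I first define concepts such as applicant, borrower, guarantor, facility, loan account, delinquency, default, and forbearance, and annotate synonyms, terms that must not be conflated, authoritative sources, and system field mappings. RAG ingestion uses these concepts as metadata, tool contract fields map to business concepts, and evals specifically cover borrower/applicant/guarantor confusion and stale policy citations. High-risk adverse action output must satisfy both a structured schema and semantic constraints, and release evidence records source version, concept tags, and eval coverage.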
12. Practice Assignment
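Design a semantic interoperability pack for a financial AI use case:
- 20 core concepts.
- 10 relationships.
- 10 easily confused terms.
- 5 semantic constraints.
- RAG metadata schema.
- Tool field semantic mapping.
- 15 semantic eval cases.
- Semantic drift log.
Completion criteria:
- Every concept has an owner and a source authority.
- Every high-risk confusion point has an eval.
- Every tool field traces back to a business concept.
- At least 3 constraints can be validated automatically.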
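The completion criteria can themselves be automated. The sketch below assumes a hypothetical pack layout (`concepts`, `tool_fields`, `high_risk_confusions`, `eval_cases`, `constraints`). None of these key names are prescribed by the assignment, and the sample pack is deliberately incomplete.

```python
def check_pack(pack):
    """Return a list of unmet completion criteria (empty means the pack passes)."""
    problems = []
    names = set()
    for concept in pack["concepts"]:
        names.add(concept["name"])
        if not concept.get("owner") or not concept.get("source_authority"):
            problems.append(f"no owner or source authority: {concept['name']}")
    for tool_field, concept_name in pack["tool_fields"].items():
        if concept_name not in names:
            problems.append(f"untraceable tool field: {tool_field}")
    covered = {frozenset(case["distinguishes"]) for case in pack["eval_cases"]}
    for pair in pack["high_risk_confusions"]:
        if frozenset(pair) not in covered:
            problems.append(f"no eval for confusion: {'/'.join(sorted(pair))}")
    automated = sum(1 for c in pack["constraints"] if c.get("automated"))
    if automated < 3:
        problems.append(f"only {automated} automated constraints")
    return problems


# Deliberately incomplete example pack (all values are illustrative).
PACK = {
    "concepts": [
        {"name": "borrower", "owner": "credit policy",
         "source_authority": "credit policy ontology"},
        {"name": "applicant", "owner": "",
         "source_authority": "loan origination system"},
    ],
    "tool_fields": {"loan.getBorrower.borrower_id": "borrower",
                    "case.note.party": "party"},
    "high_risk_confusions": [("borrower", "applicant"), ("delinquency", "default")],
    "eval_cases": [{"distinguishes": ["applicant", "borrower"]}],
    "constraints": [{"name": "source has effective date", "automated": True}],
}
```

Running `check_pack(PACK)` returns one message per unmet criterion, which makes the checklist usable as a review gate for the assignment.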