AI Requirements Mining: Requirements and Process Knowledge Extraction Architecture

AI Requirements Mining / Process Knowledge Extraction Architecture: A Reading Guide

Audience: CBAP+ Senior BA / AI PM / Product Architect / Enterprise Architect / Solution Architect / Process Owner / Risk and Control Partner.

Core question: How do we extract requirements, process knowledge, evidence chains and change impact from PRDs, BRDs, SOPs, policies, tickets, Jira / Azure DevOps, call transcripts, meeting notes, process maps, code/API specs, test cases, production logs and control evidence, without mistaking AI requirements mining for "automatic BA replacement"?

Learning goal: Build a senior-level architectural mental model of artifact ingestion, source authority, permission filtering, domain vocabulary, ambiguity detection, requirement quality rubric, process variant discovery, traceability graph, eval contract, SME validation, hallucination control, records/privacy controls and the portfolio learning loop.

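Important note: This article is learning, portfolio and internal architecture training material. It does not constitute legal advice, a compliance conclusion, a regulatory interpretation, an audit opinion, a records retention conclusion, a model validation report or a procurement recommendation. Formal projects must be confirmed by Legal, Compliance, Privacy, Records Management, Information Security, Model Risk, Operational Risk, Internal Audit, Business Owner, Data Owner and Architecture in light of the institution type, jurisdiction, product, customer impact and internal policy. Access date recorded as 2026-06-30.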


Source Anchors

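| Source | Official link | How this article uses it |
| --- | --- | --- |
| ISO/IEC/IEEE 29148 Requirements Engineering | https://www.iso.org/standard/72089.html ; https://standards.ieee.org/ieee/29148/6937/ | Uses requirements lifecycle, stakeholder need, system/software requirement, quality characteristics and traceability as governance anchors for requirement assets. |
| FFIEC Development, Acquisition, and Maintenance IT Handbook | https://ithandbook.ffiec.gov/it-booklets/development-acquisition-and-maintenance/ | Uses the financial institution view of system development, acquisition, maintenance, change, testing, implementation and controls to calibrate the SDLC boundary of AI-assisted requirements discovery. |
| FFIEC Management IT Handbook | https://ithandbook.ffiec.gov/it-booklets/management/ | Uses the language of governance, risk management, strategy, resources, architecture, third parties and oversight to organize management responsibility for AI requirements mining. |
| NIST SP 800-160 Vol. 1 Systems Security Engineering | https://csrc.nist.gov/pubs/sp/800/160/v1/upd2/final | Uses systems security engineering thinking to bring security, resilience, stakeholder protection needs and assurance into the requirement graph. |
| NIST SP 800-218 Secure Software Development Framework | https://csrc.nist.gov/pubs/sp/800/218/final | Uses the SSDF ideas of prepare, protect, produce well-secured software and respond to vulnerabilities to constrain the mining of code/API/test assets and the feedback of security requirements. |
| NIST AI Risk Management Framework | https://www.nist.gov/itl/ai-risk-management-framework | Uses Govern / Map / Measure / Manage to organize AI risk, measurement, gates, monitoring and continuous improvement. |
| ISO/IEC 42001 AI Management System | https://www.iso.org/standard/81230.html | Uses the language of AI management system, roles and responsibilities, operation, performance evaluation, improvement and internal audit to organize the operating model. |
| IIBA / BABOK professional body page | https://www.iiba.org/professional-development/knowledge-centre/business-analysis-body-of-knowledge/ | Used only as a public anchor for the business analysis body of knowledge; this article does not restate or replace copyrighted BABOK content. |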

In one sentence:

AI Requirements Mining = evidence-assisted requirements and process intelligence, not automatic BA replacement.


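1. Thesis: Requirements mining is not "throwing documents at an LLM"

Low-maturity approach:

Upload a batch of PRDs / SOPs / tickets
  -> ask the LLM to summarize requirements
  -> generate user stories
  -> have the team review

This creates three risks:

  1. Materials with different source authority get mixed together, letting meeting notes override formal policy.
  2. "Sentences that appeared in a document" are mistaken for valid requirements, ignoring conflicts, versions, effective dates and control boundaries.
  3. An AI summary is mistaken for BA judgment, losing stakeholder intent, business tradeoffs, risk acceptance and architectural constraints.

Mature approach: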

artifact inventory
  -> source authority and permission filter
  -> canonical evidence unit
  -> domain vocabulary and process ontology
  -> candidate extraction with confidence and ambiguity
  -> traceability graph
  -> requirement quality scoring
  -> acceptance criteria and eval contract generation
  -> SME validation and decision log
  -> change impact and portfolio learning loop

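AI's role here:

| AI can augment | AI cannot replace |
| --- | --- |
| Finding duplicates, conflicts, omissions, implicit rules and exception paths across materials | Deciding business goals, scope, risk appetite and release accountability |
| Extracting stakeholder, event, decision, control and evidence from unstructured material | Interpreting regulatory obligations, compliance conclusions or legal applicability |
| Generating requirement candidates, acceptance criteria, test ideas and impact hypotheses | Signing off the requirement baseline, model risk acceptance or production gate |
| Building the traceability graph and change impact candidates | Substituting for SME validation, process owner tradeoffs and architecture decisions |
| Learning requirement quality defects from production feedback | Letting unapproved model output become a record of fact or a customer commitment |

Senior-level framing:

I do not treat AI requirements mining as BA automation. I design it as an evidence control plane: it helps the BA see more evidence, find conflicts faster and track impact more systematically, but requirement authority still comes from the business owner, policy owner, architecture owner and risk governance.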


2. Source Universe: Different materials yield different knowledge

| Source class | Extractable knowledge | Typical noise | Controls needed |
| --- | --- | --- | --- |
| PRD / Product briefs | product intent, persona, journey, feature scope, success metric, release assumption | solution bias, stale roadmap, unapproved scope | approval status, version, owner, target release |
| BRD / Business case | business need, stakeholder need, policy constraint, benefit hypothesis, risk assumption | high-level slogans, missing baseline, inflated benefits | benefit evidence, baseline method, decision owner |
| SOP / work instructions | activity sequence, role, handoff, evidence required, exception handling | undocumented local workarounds, actual process has drifted | effective date, process owner, control linkage |
| Policies / standards / control library | obligation, allowed/prohibited action, threshold, approval level, record requirement | policy inconsistent with actual system capability | policy owner, interpretation record, control mapping |
| Tickets / service requests | user pain, defect, enhancement, workaround, operational exception | duplicate tickets, emotional descriptions, unclear root cause | dedupe, severity, linked incident, resolved outcome |
| Jira / Azure DevOps | backlog item, acceptance criteria, dependency, defect history, release evidence | inconsistent story granularity, unreliable status, acceptance without evidence | workflow state audit, link to commits/tests/releases |
| Call transcripts / chats | customer intent, agent action, policy friction, complaint language, vulnerable customer signals | transcription errors, PII, emotion confused with fact | consent/notice, redaction, sampling, QA review |
| Meeting notes / decision logs | open issue, decision rationale, tradeoff, assumption, stakeholder concern | informal, lacking approval, lost context | decision authority, attendee role, follow-up trace |
| Process maps / BPMN | intended flow, control point, role responsibility, SLA, handoff | depicts the ideal flow, not the production flow | conformance with event log, versioning |
| Code / API specs | actual behavior, contract, validation rule, integration dependency, error code | implementation detail mistaken for business rule | code owner, production version, security review |
| Test cases / UAT scripts | expected behavior, edge case, acceptance logic, regression scope | covers only the happy path, unrealistic test data | requirement link, test result, coverage |
| Production logs / telemetry | real event, variant, latency, failure, usage, exception path | missing business semantics, log sampling, PII/secret risk | event schema, minimization, retention, access |
| Controls / audit evidence | control objective, test result, issue, remediation, attestation | control language disconnected from product language | evidence owner, control frequency, issue linkage |

Core principle:

Never mine requirements from text alone.
Mine from artifacts + authority + time + owner + permissions + production evidence.

3. Source Authority Model

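Requirements mining most often fails for lack of source authority. AI will merge plausible-sounding material into one fluent answer, but a senior BA/Architect must preserve the differences in authority.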

3.1 Authority ladder

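| Level | Source | Default interpretation |
| --- | --- | --- |
| A1 Governing authority | laws/regulatory statements, formal policy, control standard, approved architecture principle | Constraint boundary; not to be rewritten unilaterally by AI or the project team |
| A2 Business authority | signed-off BRD, approved product requirement, process owner approved SOP | Can serve as requirement baseline input |
| A3 System authority | production code/API contract/config, system of record, workflow engine rules | Represents current system fact; does not automatically represent what the business ought to do |
| A4 Operational authority | tickets, call transcripts, case notes, incident logs, QA findings | Represents real pain points, exceptions and risk signals |
| A5 Collaboration evidence | meeting notes, workshop boards, chat decisions | Represents candidate intent; needs owner ratification |
| A6 Generated artifact | AI summary, draft user story, cluster label, candidate diagram | Working draft only; must link back to sources and be confirmed by a human |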

3.2 Conflict resolution matrix

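| Conflict | Example | Handling principle |
| --- | --- | --- |
| Policy vs PRD | PRD says "waive fees automatically"; policy requires supervisor approval | policy/control wins; the requirement is changed to AI drafting plus approval routing |
| SOP vs event log | SOP says three steps; logs show seven common variants | use process mining to explain the variants, distinguishing acceptable exceptions from control deviations |
| Ticket vs code | user believes the feature is missing; the API already has the field but the UI does not expose it | record an experience gap and integration impact; do not directly judge the requirement a duplicate |
| Meeting note vs approved baseline | meeting proposes new scope; baseline is unchanged | becomes a change request candidate and enters impact assessment |
| Test case vs requirement | UAT script verifies behavior that was never written into requirements | becomes an implicit requirement candidate; the owner must decide whether to include it in the baseline |
| Log behavior vs policy | production has a path that bypasses maker-checker | becomes a control gap and enters the risk issue and remediation path |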

In one sentence:

Requirement mining output must carry authority metadata, or it is just text summarization.
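
The conflict handling above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the `Claim` class, the `resolve_conflict` helper and the sample source references are hypothetical, and treating the A1..A6 ladder as a strict numeric ordering is a simplifying assumption. The point it shows is that the lower-authority claim stays visible as a candidate instead of being merged away.

```python
from dataclasses import dataclass

# Assumed ranking: lower number = higher authority, mirroring A1..A6 above.
AUTHORITY_RANK = {"A1": 1, "A2": 2, "A3": 3, "A4": 4, "A5": 5, "A6": 6}


@dataclass(frozen=True)
class Claim:
    statement: str
    source_ref: str       # hypothetical reference format
    authority_level: str  # one of A1..A6


def resolve_conflict(a: Claim, b: Claim) -> dict:
    """Build a conflict record that keeps both claims visible.

    The higher-authority claim is marked as constraining; the other is kept
    as a candidate for owner review rather than silently dropped.
    """
    first, second = sorted((a, b), key=lambda c: AUTHORITY_RANK[c.authority_level])
    same_level = first.authority_level == second.authority_level
    return {
        "constraining": None if same_level else first,
        "candidates": [first, second] if same_level else [second],
        "needs_owner_decision": True,  # a human always closes the conflict
    }


policy = Claim("Fee waivers require supervisor approval", "POL-7 s3.1", "A1")
prd = Claim("Fees are waived automatically", "PRD-12 p4", "A2")
record = resolve_conflict(prd, policy)
print(record["constraining"].source_ref)
```

When both claims sit at the same level, the sketch names no winner and returns both as candidates, which matches the idea that the model should surface the conflict rather than pick an answer.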


4. Reference Architecture

Artifact connectors
  PRD | BRD | SOP | policy | Jira/Azure DevOps | transcripts | notes
  BPMN | code/API specs | tests | logs | controls
        |
        v
Ingestion and evidence normalization
  document parser | OCR/layout | transcript diarization | issue schema | log event schema
  artifact hash | version | owner | effective date | record class
        |
        v
Permission and purpose filter
  entitlement | data classification | PII redaction | legal hold flag | purpose-bound access
        |
        v
Domain vocabulary and process ontology
  terms | synonyms | products | roles | activities | events | policies | controls
        |
        v
Extraction services
  requirement candidate | event | actor | stakeholder | decision | evidence | control
  ambiguity | conflict | gap | assumption | dependency | acceptance criteria
        |
        v
Traceability graph
  need -> requirement -> process step -> data/API -> test -> control -> eval -> release
        |
        v
Quality scoring and eval contract
  29148-style quality rubric | groundedness | conflict check | completeness | testability
        |
        v
SME validation workbench
  approve | reject | merge | split | escalate | risk accept | defer with rationale
        |
        v
Change impact and portfolio learning
  impacted journeys | systems | APIs | controls | tests | policies | releases | benefits

4.1 Architectural principles

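| Principle | Product / architecture implication |
| --- | --- |
| Source-first | Every candidate requirement must trace back to the original artifact, location, version and owner. |
| Permission-before-retrieval | Do not retrieve first and filter later; source connector, index, chunk, graph query and output layer must all be permission-constrained. |
| Draft is not baseline | What AI generates is a candidate; only a human-authorized decision can become baseline. |
| Requirement as graph | A requirement is not a Word paragraph but a graph connecting need, process, data, system, test, control, eval and release. |
| Ambiguity is a feature | The system must explicitly flag uncertainty, conflict, missing evidence and multiple interpretations, rather than force one smooth answer. |
| Controls are first-class | Access, records, privacy, model evaluation, change, human review and audit evidence enter the architecture from day one. |
| Production learns back | Tickets, complaints, logs, QA, incident, override and eval failure flow back into the requirement graph and portfolio. |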

5. Canonical Knowledge Objects

| Object | Minimum fields | Why it matters |
| --- | --- | --- |
| Artifact | artifact_id, type, source_system, owner, version, status, effective_date, hash, record_class, permission_tag | Prevents unsourced text from entering the requirements system |
| Evidence unit | evidence_id, artifact_id, span/page/section/event_id, extracted_text, normalized_fact, confidence, permission_scope | Makes evidence granular enough to cite and re-check |
| Domain term | term_id, preferred_term, synonyms, forbidden_terms, product_scope, owner, examples | Unifies business vocabulary, reducing misunderstanding and hallucination |
| Process activity | activity_id, name, actor_role, input, output, precondition, postcondition, SLA, control_points | Connects requirements, process and system behavior |
| Process event | event_id, case_id, activity, timestamp, resource, channel, status, outcome, source_log | Supports variant discovery and fact checking |
| Stakeholder concern | concern_id, stakeholder_role, need, pain, risk, desired_outcome, evidence_refs | Preserves the real motivation behind a requirement |
| Requirement candidate | req_candidate_id, statement, type, source_refs, authority_level, ambiguity_flags, conflicts, quality_score | Controlled intermediate asset of AI output |
| Requirement baseline | req_id, approved_statement, owner, version, priority, acceptance_criteria, linked_eval, decision_record | Authorized requirement asset |
| Acceptance criterion | ac_id, condition, observable_result, data_needed, negative_case, edge_case, evidence_required | Turns natural-language requirements into verifiable behavior |
| Eval contract | eval_id, requirement_refs, dataset, rubric, thresholds, critical_failures, slices, release_decision_rule | Makes AI-related requirements repeatably evaluable |
| Control link | control_id, control_objective, requirement_refs, test_method, frequency, owner, evidence | Makes product requirements and the control system mutually traceable |
| Change impact | impact_id, changed_object, impacted_requirements, processes, systems, controls, tests, releases, owner | Supports change review and regression analysis |
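
As a minimal sketch of how two of these objects might look in code, the dataclasses below carry a subset of the fields listed above. The class layout, the defaults and the sample identifiers are assumptions for illustration only; the one behavior shown is the source-first rule, where a candidate without any `source_refs` is rejected at construction time.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EvidenceUnit:
    evidence_id: str
    artifact_id: str
    span: str              # page / section / event id inside the artifact
    extracted_text: str
    confidence: float
    permission_scope: str


@dataclass
class RequirementCandidate:
    req_candidate_id: str
    statement: str
    type: str
    source_refs: list[str]
    authority_level: str
    ambiguity_flags: list[str] = field(default_factory=list)
    conflicts: list[str] = field(default_factory=list)
    quality_score: int = 0  # assumed default: unscored candidates start at 0

    def __post_init__(self) -> None:
        # Source-first: refuse candidates that cannot be traced to evidence.
        if not self.source_refs:
            raise ValueError("a requirement candidate needs at least one source_ref")


# Hypothetical sample data.
evidence = EvidenceUnit("EV-1", "SOP-DISPUTE-v7", "section 4.2",
                        "Request missing documents before review.", 0.82, "ops")
candidate = RequirementCandidate(
    req_candidate_id="RC-1",
    statement="When dispute evidence is missing, a document request must be initiated.",
    type="functional",
    source_refs=[evidence.evidence_id],
    authority_level="A2",
)
```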

6. Domain Vocabulary and Ambiguity Detection

6.1 Vocabulary is architecture

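In retail financial services requirements mining, "synonyms" are often not synonymous:

| Term cluster | What must be distinguished |
| --- | --- |
| customer / member / applicant / account holder / beneficial owner / authorized user | Identity, entitlements, permissions and notification responsibilities differ |
| approve / pre-approve / recommend / route / auto-decision | Decision authority, customer impact and control requirements differ |
| fee waiver / refund / adjustment / goodwill credit | Accounting, authorization, policy and customer communication differ |
| complaint / dispute / inquiry / request / incident | Regulatory handling, SLA, records and escalation paths differ |
| verified / validated / reviewed / attested / approved | Evidence strength and accountable party differ |
| requirement / policy / control / design decision / test expectation | Lifecycle and change governance differ |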

6.2 Ambiguity detector

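| Ambiguity type | Signal | AI mining action | Human decision needed |
| --- | --- | --- | --- |
| Actor ambiguity | "Users can submit", but it is unclear whether user means customer, employee or system | Flag actor_unknown, generate role clarification questions | Product owner / process owner defines the actor |
| Decision ambiguity | "The system determines whether it passes" without separating recommendation from final decision | Flag decision_boundary_unknown | Risk / business owner defines the AI role |
| Data ambiguity | "Uses customer information" without fields, source or purpose | Flag data_scope_unknown | Data owner / privacy owner confirms the minimum dataset |
| Evidence ambiguity | "Proof must be provided" without proof type or acceptable evidence | Flag evidence_standard_unknown | Policy / operations owner defines the evidence rule |
| Timing ambiguity | "Handle in a timely manner" with no SLA, cutoff or time zone | Flag timing_unknown | Process owner defines a measurable SLA |
| Exception ambiguity | "Special cases are handled manually" with special cases undefined | Flag exception_taxonomy_missing | SME defines exception classes |
| Control ambiguity | "Approval required" without approver role or threshold | Flag control_owner_missing | Control owner defines the approval matrix |
| Testability ambiguity | The requirement cannot be observed or verified | Flag not_testable | BA/QA/Architect rewrites it into verifiable conditions |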

The output format must always include:

candidate_statement
source_refs
authority_level
known_facts
unknowns
conflicts
ambiguity_flags
recommended_questions
validation_owner
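
To make the detector and its output contract concrete, here is a deliberately naive sketch. The keyword rules in `RULES` are hypothetical stand-ins (a real detector would lean on the domain vocabulary and a model, not four regexes), and `to_candidate` is an illustrative helper name. It only shows that flags become explicit `unknowns` and clarification questions rather than being smoothed over.

```python
import re

# Hypothetical keyword rules mapping vague phrasing to the flags in the table.
RULES = {
    "timing_unknown": re.compile(r"\b(timely|promptly|as soon as possible)\b", re.I),
    "actor_unknown": re.compile(r"\busers?\b", re.I),
    "exception_taxonomy_missing": re.compile(r"\bspecial (cases?|circumstances)\b", re.I),
    "control_owner_missing": re.compile(
        r"\bapproval (is )?required\b|\b(requires|needs) approval\b", re.I
    ),
}


def flag_ambiguity(statement: str) -> list[str]:
    """Return the sorted ambiguity flags whose pattern matches the statement."""
    return sorted(flag for flag, pattern in RULES.items() if pattern.search(statement))


def to_candidate(statement, source_refs, authority_level, validation_owner):
    """Wrap a statement in the required output fields; nothing is left implicit."""
    flags = flag_ambiguity(statement)
    return {
        "candidate_statement": statement,
        "source_refs": list(source_refs),
        "authority_level": authority_level,
        "known_facts": [],          # filled by extraction, empty in this sketch
        "unknowns": flags,
        "conflicts": [],
        "ambiguity_flags": flags,
        "recommended_questions": [f"Please clarify: {flag}" for flag in flags],
        "validation_owner": validation_owner,
    }


output = to_candidate(
    "Users can submit requests and special cases need timely handling.",
    ["SOP-REQ s2.1"],  # hypothetical reference
    "A2",
    "process owner",
)
print(output["ambiguity_flags"])
# ['actor_unknown', 'exception_taxonomy_missing', 'timing_unknown']
```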

7. Extraction Patterns

7.1 Requirement candidate extraction

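| Pattern | Where it comes from | Extraction logic | Example output |
| --- | --- | --- | --- |
| Explicit shall / must | policy, SOP, approved BRD | Extract the mandatory verb, object, condition and exceptions | "When dispute evidence is missing, a request for additional documents must be initiated." |
| Implicit expectation | test cases, tickets, UAT notes | Infer requirement candidates from expected results and defect descriptions | "After an address change, notification preferences should sync to the servicing profile." |
| Constraint extraction | policy, architecture standards, API specs | Extract prohibited actions, limits, dependencies and formats | "AI must not directly update fee waiver status." |
| Process step extraction | SOP, BPMN, event logs | Extract activity, actor, input, output, control point | "The L2 reviewer performs a second review before a high-risk case is closed." |
| Data requirement extraction | API specs, forms, database schema, logs | Extract fields, source, quality, permissions and retention requirements | "application_id must be linkable across LOS, document store and CRM case." |
| Acceptance extraction | UAT, test cases, BDD, support runbooks | Extract observable results, negative cases and edge cases | "When the policy version has expired, the system refuses to generate a final answer." |
| Risk/control extraction | control library, audit issue, incident | Extract control objective, failure mode, test evidence | "High-impact outputs must record reviewer approval." |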

7.2 Process knowledge extraction

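| Knowledge type | Evidence signals | Use |
| --- | --- | --- |
| Case lifecycle | created, assigned, reviewed, escalated, closed events | Reconstruct the actual process and cycle time |
| Variant | activity sequence patterns across cases | Identify the main flow, the long tail and abnormal paths |
| Handoff | resource/team transition | Locate handoff cost and responsibility gaps |
| Rework | repeated activity or status regression | Find missing information, rule conflicts and system defects |
| Waiting | gap between events | Identify queues, SLA and external dependencies |
| Control execution | approval event, review flag, dual control record | Verify whether controls were executed |
| Exception | override, manual adjustment, special handling | Define the exception taxonomy and AI escalation |
| Outcome | approved, declined, closed, refunded, filed, withdrawn | Build a result-to-path learning loop |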
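
Variant and rework discovery can be illustrated with a small sketch over an event log. It assumes events arrive as `(case_id, timestamp, activity)` tuples, which is a simplification of the Process event object, and the helper names `discover_variants` and `rework_cases` are hypothetical. Dedicated process mining tools do far more; this only shows the core grouping idea.

```python
from collections import Counter, defaultdict


def build_traces(events):
    """Group events into one ordered activity list per case."""
    traces = defaultdict(list)
    for case_id, _timestamp, activity in sorted(events, key=lambda e: (e[0], e[1])):
        traces[case_id].append(activity)
    return traces


def discover_variants(events):
    """Count how many cases follow each distinct activity sequence."""
    return Counter(tuple(trace) for trace in build_traces(events).values())


def rework_cases(events):
    """Return cases in which some activity occurs more than once."""
    return {
        case_id
        for case_id, trace in build_traces(events).items()
        if len(trace) != len(set(trace))
    }


# Hypothetical log: integer timestamps stand in for real event times.
events = [
    ("c1", 1, "created"), ("c1", 2, "reviewed"), ("c1", 3, "closed"),
    ("c2", 1, "created"), ("c2", 2, "reviewed"), ("c2", 3, "closed"),
    ("c3", 1, "created"), ("c3", 2, "reviewed"), ("c3", 3, "escalated"),
    ("c3", 4, "reviewed"), ("c3", 5, "closed"),
]

variants = discover_variants(events)
print(variants.most_common(1))  # the main flow and its case count
print(rework_cases(events))     # {'c3'}
```

The long-tail variants are exactly the ones to hand to a process owner, who decides whether each is an acceptable exception or a control deviation.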

7.3 Stakeholder and evidence extraction

| Extracted object | From | Key fields |
| --- | --- | --- |
| Stakeholder | BRD, meeting notes, RACI, Jira assignee, process maps | role, decision right, concern, approval needed |
| Pain point | tickets, transcripts, retro notes, complaints | persona, journey step, frequency, severity, evidence |
| Decision | meeting notes, ADR, change board, release notes | decision, alternatives, rationale, owner, date |
| Evidence standard | policy, SOP, control test, audit finding | evidence type, source, minimum quality, retention |
| Assumption | business case, PRD, architecture brief | assumption, impact if false, validation plan |
| Dependency | API specs, Jira links, code repos, release plan | upstream/downstream, owner, version, risk |

8. Requirement Quality Rubric

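Taking the quality perspective of requirements engineering, AI mining must not just output sentences that "look like requirements". Score every candidate on at least the dimensions below.

| Dimension | High-quality signal | Red flag | Suggested gate |
| --- | --- | --- | --- |
| Grounded | Has source, location, version, owner, authority level | AI summary only, no source evidence | source_refs required |
| Necessary | Links to a business outcome, risk control or user pain | Technical showmanship or a single stakeholder's preference | outcome link required |
| Unambiguous | Actor, object, condition and decision boundary are clear | "Fast, smart, friendly, timely" with no definition | ambiguity flags must close |
| Feasible | Supported by API, data, permissions, process and operating model | Depends on nonexistent data or an unauthorized action | architecture review |
| Verifiable | Has acceptance criteria, test method, eval metric | Cannot observe whether it is satisfied | AC/eval required |
| Bounded | Scope, exception, negative case and out-of-scope are clear | Unbounded generalization, claims to cover every situation | scope boundary required |
| Consistent | No conflict with policy, controls, SOP, API or other requirements | Conflicts not recorded | conflict resolution required |
| Traceable | Links to source, process, system, test, control, release | Isolated document paragraph | graph link required |
| Risk-tiered | Distinguishes low/medium/high impact and human oversight | High-impact scenario described only as a function | risk tier required |
| Maintainable | Owner, version and change trigger are clear | Nobody owns the requirement | lifecycle owner required |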

Candidate scoring:

0 = not usable
1 = candidate only
2 = grounded but ambiguous
3 = clear and traceable
4 = testable and risk-tiered
5 = baseline-ready with eval/control links

Release baseline rule:

No AI-mined requirement enters baseline below score 4.
High-impact AI behavior requires score 5 and SME approval evidence.
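
The release baseline rule is simple enough to express directly. The sketch below is one possible reading, with a hypothetical function name and return labels; note that it returns "eligible", not "baselined", because only a human-authorized decision turns a candidate into baseline.

```python
def baseline_decision(score: int, high_impact: bool, sme_approved: bool) -> str:
    """Apply the 0-5 scoring scale and the release baseline rule.

    Ordinary candidates need score >= 4; high-impact AI behavior needs
    score 5 plus SME approval evidence.
    """
    if not 0 <= score <= 5:
        raise ValueError("score must be between 0 and 5")
    required = 5 if high_impact else 4
    if score < required:
        return "stay_candidate"
    if high_impact and not sme_approved:
        return "needs_sme_approval"
    return "eligible_for_baseline"


print(baseline_decision(4, high_impact=False, sme_approved=False))  # eligible_for_baseline
print(baseline_decision(4, high_impact=True, sme_approved=True))    # stay_candidate
print(baseline_decision(5, high_impact=True, sme_approved=False))   # needs_sme_approval
```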

9. Traceability Graph

A traditional traceability matrix breaks down quickly in AI projects, because a requirement simultaneously connects to prompt, RAG corpus, model, tools, data contracts, controls, eval cases and production feedback. A graph is more sustainable.

9.1 Minimum graph schema

Business outcome
  -> stakeholder need
  -> requirement
  -> process activity
  -> data object / API / policy / control
  -> acceptance criterion
  -> eval case / test case
  -> release gate decision
  -> production signal
  -> change request / improvement

9.2 Edge types

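| Edge | Meaning | Example |
| --- | --- | --- |
| derives_from | A requirement comes from a piece of evidence | requirement derives_from SOP section 4.2 |
| constrains | A policy/control constrains a requirement | policy constrains fee waiver AI action |
| implements | A system/API implements a requirement | API endpoint implements address validation |
| verifies | A test/eval verifies a requirement | eval case verifies grounded policy answer |
| conflicts_with | Two objects conflict | transcript practice conflicts_with policy |
| supersedes | A new version replaces an old one | SOP v7 supersedes SOP v6 |
| impacts | A change affects an object | API schema change impacts acceptance criteria |
| monitors | A production metric monitors a requirement | unsupported claim rate monitors groundedness |
| remediates | An improvement item fixes a defect | change request remediates ambiguity defect |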

9.3 AI traceability-specific nodes

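| Node | Why it matters |
| --- | --- |
| Prompt/version | A prompt change may alter requirement interpretation or output structure |
| Retrieval corpus/version | A policy/SOP update must trigger eval regression |
| Chunk/evidence id | Prevents the model from citing the wrong source or expired evidence |
| Tool permission | Which systems AI can read/write/call directly determines risk |
| Eval dataset/slice | An average score is not enough to cover high-risk scenarios |
| Human override | Why a reviewer overrode AI output is a requirement learning signal |
| Incident/complaint | Production problems must trace back to requirement and eval gaps |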
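
A minimal sketch of how typed edges support impact queries follows. The node identifiers are hypothetical, and the split of edge types into "source depends on target" versus "source affects target" is an assumed reading of the edge table, not something the table states. Given a changed node, a breadth-first walk collects everything a change could reach.

```python
from collections import defaultdict, deque

# Assumed reading: for these edges the source depends on the target, so a
# change to the target propagates back to the source.
SOURCE_DEPENDS_ON_TARGET = {"derives_from", "implements", "verifies", "monitors"}
# For these edges a change to the source propagates forward to the target.
SOURCE_AFFECTS_TARGET = {"constrains", "impacts"}


def build_propagation(edges):
    """Map each node to the nodes a change in it would directly touch."""
    downstream = defaultdict(set)
    for source, edge_type, target in edges:
        if edge_type in SOURCE_DEPENDS_ON_TARGET:
            downstream[target].add(source)
        elif edge_type in SOURCE_AFFECTS_TARGET:
            downstream[source].add(target)
    return downstream


def impacted_by(changed, edges):
    """Breadth-first walk returning every node reachable from the change."""
    downstream = build_propagation(edges)
    seen, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in downstream.get(node, ()):
            if dependent != changed and dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical graph fragment.
edges = [
    ("REQ-1", "derives_from", "SOP-4.2"),
    ("POLICY-FEE", "constrains", "REQ-1"),
    ("API-ADDR", "implements", "REQ-1"),
    ("EVAL-9", "verifies", "REQ-1"),
    ("REQ-2", "derives_from", "PRD-3"),
]
print(sorted(impacted_by("SOP-4.2", edges)))  # ['API-ADDR', 'EVAL-9', 'REQ-1']
```

The result is a list of impact candidates only; whether each one really needs rework is still a reviewer's call.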

10. Acceptance Criteria and Eval Contract

10.1 Acceptance criteria grammar

Given [source evidence / process state / user role / policy version],
when [AI or system action occurs],
then [observable output / workflow action / control evidence],
and [source citation / permission / escalation / negative case rule],
with [metric threshold / reviewer rule / log evidence].

Example:

Given a customer fee dispute transcript and the active fee waiver policy,
when the AI drafts an agent response,
then every policy-based statement must cite the active policy section,
and any waiver recommendation above the role limit must be routed to supervisor approval,
with critical unauthorized commitment defects equal to zero in release eval.

10.2 Eval contract template

| Field | Content |
| --- | --- |
| eval_id | REQ-MINING-FEE-DISPUTE-GROUNDEDNESS-001 |
| requirement_refs | linked requirement ids and acceptance criteria |
| dataset | approved sample set, source period, permission scope, redaction method |
| slices | product, channel, customer segment, policy version, exception type, risk tier |
| tasks | extract requirement, cite evidence, detect ambiguity, generate AC, classify control |
| rubric | groundedness, completeness, conflict detection, testability, risk boundary |
| thresholds | average score, minimum slice score, critical failure blocker |
| critical failures | unsupported policy claim, permission leak, wrong authority source, missed high-impact ambiguity |
| evaluator | SME, QA, model evaluation team, independent challenger |
| decision rule | go / limited go / no-go / rollback triggers |
| evidence | run manifest, model version, prompt version, corpus version, review results |
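
The thresholds and decision rule fields can be sketched as a small function. The numeric thresholds and the mapping of outcomes to "go / limited go / no-go" are illustrative assumptions, not values from any standard; the sketch only shows why a minimum slice score and a critical failure blocker sit alongside the average.

```python
def release_decision(slice_scores, critical_failures,
                     avg_threshold=0.85, slice_threshold=0.75):
    """Return (decision, weak_slices) for one eval run.

    slice_scores: mapping of slice name to a score in [0, 1].
    critical_failures: list of critical failure labels found in the run.
    Both thresholds are hypothetical defaults.
    """
    if not slice_scores:
        raise ValueError("at least one slice score is required")
    if critical_failures:
        return "no-go", []  # any critical failure blocks the release outright
    average = sum(slice_scores.values()) / len(slice_scores)
    weak = sorted(name for name, score in slice_scores.items() if score < slice_threshold)
    if average < avg_threshold:
        return "no-go", weak
    # A passing average with a weak slice only earns a limited release.
    return ("limited go" if weak else "go"), weak


print(release_decision({"low_risk": 0.98, "high_risk": 0.74}, []))
# ('limited go', ['high_risk'])
print(release_decision({"low_risk": 0.98, "high_risk": 0.90}, ["permission leak"]))
# ('no-go', [])
```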

10.3 Eval dimensions for requirements mining

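| Dimension | Measures | Failure example |
| --- | --- | --- |
| Extraction precision | Whether a candidate really is requirement/process knowledge | Meeting small talk turned into a requirement |
| Extraction recall | Whether key requirements, controls and exceptions are missed | Misses the supervisor approval requirement |
| Groundedness | Whether each conclusion has a correct source | Cites an expired SOP or an unrelated ticket |
| Authority awareness | Whether policy, PRD, ticket and AI draft are distinguished | Lets a ticket override approved policy |
| Ambiguity detection | Whether unclear actor/data/decision/control is flagged | Treats "timely handling" as a complete SLA |
| Conflict detection | Whether conflicts across materials are found | Misses the inconsistency between SOP and production log |
| Quality scoring | Whether the rubric agrees with human experts | A high-risk ambiguous requirement is scored baseline-ready |
| Impact analysis | Whether related systems, processes, tests and controls are found | An API change does not flag UAT and control impact |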

11. Hallucination, Privacy and Records Controls

11.1 Hallucination control

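| Control | Design requirement |
| --- | --- |
| Evidence-bound generation | Output must be based only on retrieved evidence units; when there is no evidence, show unknown / needs validation. |
| Source authority ranking | On conflict, show the authority ladder and the conflict; do not automatically fuse them into one answer. |
| Structured output schema | candidate, source_refs, ambiguity_flags, quality_score and validation_owner are output as separate fields. |
| Unsupported claim detector | Run citation coverage and semantic entailment checks on every requirement statement. |
| Version-aware retrieval | Retrieval must consider effective_date, retired status, product scope and jurisdiction scope. |
| Critical failure block | Block output from entering the workflow when a permission leak, fabricated citation, wrong policy citation or unauthorized high-impact decision is detected. |
| Human validation queue | Candidates with an insufficient score, unresolved conflict, unclear permission boundary or high impact go to SME review. |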
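
Version-aware retrieval can be illustrated with a small filter. The artifact dictionaries, the `family` key that groups versions of one document, and the status values are hypothetical; product and jurisdiction scope are left out to keep the sketch short. It keeps, per family, the latest non-retired version already in effect on the query date.

```python
from datetime import date


def active_versions(artifacts, as_of):
    """Pick the latest non-retired, already-effective version per family."""
    latest = {}
    for artifact in artifacts:
        if artifact["status"] == "retired" or artifact["effective_date"] > as_of:
            continue  # retired or not yet in effect: never retrievable
        current = latest.get(artifact["family"])
        if current is None or artifact["effective_date"] > current["effective_date"]:
            latest[artifact["family"]] = artifact
    return latest


# Hypothetical versions of one SOP.
artifacts = [
    {"family": "SOP-DISPUTE", "version": "v6",
     "effective_date": date(2025, 1, 1), "status": "retired"},
    {"family": "SOP-DISPUTE", "version": "v7",
     "effective_date": date(2025, 6, 1), "status": "active"},
    {"family": "SOP-DISPUTE", "version": "v8",
     "effective_date": date(2026, 9, 1), "status": "approved"},
]

selected = active_versions(artifacts, as_of=date(2026, 6, 30))
print(selected["SOP-DISPUTE"]["version"])  # v7
```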

11.2 Permission filtering

identity and role
  -> source connector entitlement
  -> document-level ACL
  -> chunk-level permission tag
  -> graph edge visibility
  -> retrieval-time filter
  -> output redaction
  -> audit log

Permission design questions:

  1. Can an Analyst see all PII in a call transcript, or only redacted evidence?
  2. Can a vendor LLM process records classified material?
  3. Can a cross-product team retrieve another product's complaint notes?
  4. Is legal hold material allowed into training or synthetic eval generation?
  5. Could the output leak, through a summary, sources the user should not be able to see?
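
The retrieval-time filter and audit log steps of the chain can be sketched as below. The `Chunk` class, the tag format and the purpose names are hypothetical, and the sketch assumes one particular answer to question 4 (legal hold material is kept out of training and synthetic eval generation); a real deployment would take that answer from Legal and Records Management.

```python
from dataclasses import dataclass

# Assumption for this sketch only: these purposes may not use held material.
HOLD_RESTRICTED_PURPOSES = {"training", "synthetic_eval"}


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    permission_tags: frozenset
    legal_hold: bool = False


def permitted_chunks(chunks, user_entitlements, purpose):
    """Filter chunks before they reach the model and log every decision.

    A chunk passes only if the user holds every tag on it and the purpose
    is allowed for held material.
    """
    allowed, audit_log = [], []
    for chunk in chunks:
        ok = chunk.permission_tags <= user_entitlements
        if chunk.legal_hold and purpose in HOLD_RESTRICTED_PURPOSES:
            ok = False
        audit_log.append((chunk.chunk_id, purpose, ok))
        if ok:
            allowed.append(chunk)
    return allowed, audit_log


chunks = [
    Chunk("c1", "card fee complaint", frozenset({"product:cards"})),
    Chunk("c2", "loan complaint", frozenset({"product:loans"})),
    Chunk("c3", "held card dispute", frozenset({"product:cards"}), legal_hold=True),
]
entitlements = frozenset({"product:cards"})

mining, log = permitted_chunks(chunks, entitlements, "requirement_mining")
print([c.chunk_id for c in mining])  # ['c1', 'c3']
```

Because filtering happens before retrieval results are passed on, a summary cannot leak a chunk the user was never entitled to read.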

11.3 Privacy and records controls

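| Area | Guardrail |
| --- | --- |
| Data minimization | Ingest only the fields and fragments the mining task needs; redact and bucket logs and transcripts by default. |
| Purpose limitation | requirement mining, eval, training and analytics are authorized per purpose and not mixed. |
| Retention | raw artifact, derived chunk, embedding, summary, review note and eval result each have their own retention and disposal rule. |
| Legal hold | The hold flag propagates to the original text, index, vectors, graph, AI output, review notes and export packages. |
| Records provenance | artifact hash, ingestion run, parser version, model version and review decision can be replayed. |
| Access audit | Record who retrieved what, output what, exported what, and for which project. |
| Vendor boundary | Third-party processing, sub-processors, region, logs, deletion proof and exit plan go into architecture review. |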

12. SME Validation Workbench

A requirements mining system must give SMEs an actionable review surface, not a one-page AI summary.

| Review action | Meaning | Captured evidence |
| --- | --- | --- |
| Approve | The candidate can enter the baseline or backlog | approver, role, timestamp, source refs, quality score |
| Reject | The candidate does not hold | reason code: not requirement / duplicate / wrong source / obsolete / out of scope |
| Merge | Several candidates are merged | merged ids, canonical wording, rationale |
| Split | One candidate contains several requirements | child requirements, scope boundaries |
| Escalate | Needs policy/risk/legal/architecture judgment | escalation owner, question, deadline, decision |
| Reclassify | The type or authority level is wrong | old/new class, rationale |
| Add evidence | The SME supplies a missing source | evidence id, artifact owner, access scope |
| Mark conflict | The SME finds a conflict | conflicting nodes, resolution path |

Reviewer metrics:

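| Metric | Interpretation |
| --- | --- |
| approval rate by source class | Which materials are more likely to yield high-quality requirements |
| reject reason distribution | The main defects in the model or in source hygiene |
| ambiguity closure time | Throughput of requirement clarification |
| SME disagreement rate | Whether domain vocabulary or policy interpretation is unstable |
| override-to-eval feedback latency | Whether production learning reaches eval regression |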

13. Change Impact Architecture

When a policy, API, process, model or control changes, the requirements mining system should be able to answer:

What changed?
Which requirements depend on it?
Which journeys and stakeholders are impacted?
Which tests and eval cases must run?
Which controls and records are affected?
Which release or portfolio commitments may move?
Who must approve?

13.1 Impact dimensions

| Changed object | Impact candidates |
| --- | --- |
| Policy section | requirements, acceptance criteria, RAG corpus, agent scripts, training, control tests |
| SOP step | process map, workflow engine, roles, SLA, task mining interpretation, QA sampling |
| API schema | UI fields, validation, integration tests, data contract, downstream reports |
| Code rule | implicit requirement, defect trend, regression test, control logic |
| Test case | coverage, acceptance criteria, requirement quality score |
| Production log pattern | process variant, exception taxonomy, AI intervention point |
| Control finding | requirement priority, risk gate, release scope, remediation backlog |
| Model/prompt change | eval contract, output schema, hallucination risk, reviewer instructions |

13.2 Change impact scoring

| Factor | Low | Medium | High |
| --- | --- | --- | --- |
| Customer impact | internal productivity | staff decision support | customer rights, money, access, eligibility |
| Policy/control impact | no control link | one local control | enterprise policy or high-risk control |
| System dependency | single UI | API/data contract | system of record or cross-platform |
| Evidence confidence | direct source | inferred from multiple sources | conflict or missing authority |
| Reversibility | easy rollback | manual correction | hard to unwind or records impact |
| Eval coverage | existing regression | partial cases | no validated eval slice |

High-score changes require architecture review, SME approval, regression eval and release gate evidence.
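
One way to turn the factor table into a score is sketched below. The source does not say how the six factors combine, so taking the worst single rating as the overall level is an assumption chosen to be conservative; the factor keys and the function name are likewise hypothetical.

```python
LEVELS = {"low": 1, "medium": 2, "high": 3}
FACTORS = (
    "customer_impact", "policy_control_impact", "system_dependency",
    "evidence_confidence", "reversibility", "eval_coverage",
)
HIGH_GATES = ["architecture review", "SME approval",
              "regression eval", "release gate evidence"]


def score_change(ratings: dict) -> dict:
    """Rate a change by its worst factor (assumed aggregation rule)."""
    missing = [factor for factor in FACTORS if factor not in ratings]
    if missing:
        raise ValueError(f"missing factor ratings: {missing}")
    worst = max(LEVELS[ratings[factor]] for factor in FACTORS)
    overall = {1: "low", 2: "medium", 3: "high"}[worst]
    drivers = [factor for factor in FACTORS if LEVELS[ratings[factor]] == worst]
    return {
        "overall": overall,
        "drivers": drivers,  # which factors pushed the rating up
        "required_gates": list(HIGH_GATES) if overall == "high" else [],
    }


# Hypothetical change: everything low except customer impact.
ratings = {factor: "low" for factor in FACTORS}
ratings["customer_impact"] = "high"
result = score_change(ratings)
print(result["overall"], result["drivers"])  # high ['customer_impact']
```

A worst-factor rule means one high rating cannot be averaged away by five low ones, which mirrors the earlier point that averages hide high-risk slices.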


14. PM / BA / Architect Implications

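| Role | Senior responsibility |
| --- | --- |
| AI PM | Turn mining output into portfolio decisions: which requirements enter the product roadmap, which are only process governance, and which need risk acceptance. |
| Senior BA | Design the elicitation-by-evidence process; manage ambiguity, conflict, traceability, SME validation and the requirement baseline. |
| Product Architect | Define the requirement graph, API/data/control linkage, AI role boundary and change impact architecture. |
| Enterprise Architect | Bring the requirements mining capability into enterprise knowledge architecture, records, AI platform, governance and reuse strategy. |
| Process Owner | Judge real process variants, acceptable exceptions, SOP revisions and operating model change. |
| Risk / Control Partner | Connect policy, control objective, test evidence and issue remediation to the requirement lifecycle. |
| Data / Privacy Owner | Define ingestible data, permissions, redaction, retention, legal hold and cross-border boundaries. |
| QA / EvalOps | Turn mined requirements into test cases, eval datasets, rubrics, thresholds and regression gates. |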

Senior PM judgment:

If the output cannot be traced, validated, tested, controlled and maintained,
it is not a requirement asset; it is an AI note.

15. Anti-Patterns

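| Anti-pattern | Symptom | Risk | Fix |
| --- | --- | --- | --- |
| Upload-and-summarize | Bulk-upload documents, then ask for user stories | Hallucination, stale sources, permission leaks | artifact inventory + source authority + structured extraction |
| Ticket-driven product truth | More tickets means higher priority | Duplicates, complaint bias, missing business value | dedupe + severity + journey + outcome scoring |
| Policy-blind AI mining | Looks only at PRDs and meeting notes | Requirements overstep policy and controls | policy/control corpus as first-class source |
| No permission boundary | One index serves all projects | Unauthorized data access and records risk | retrieval-time ACL + output redaction + audit |
| Average quality score | Looks only at overall accuracy | High-risk slice failures are masked | slice thresholds + critical failure blockers |
| No SME workflow | AI output goes straight into the backlog | Wrong requirements at scale | approve/reject/merge/split decision log |
| No graph | Traceability maintained in Excel | Change impact is uncontrollable | requirement-process-system-test-control graph |
| Treat code as policy | Infers "this is how it should be" from code | Entrenches historical defects | distinguish current behavior from desired requirement |
| Ignore production logs | Mines documents only | Real process variants stay invisible | event log and ticket feedback loop |
| No portfolio learning | Every project mines from scratch | The organization does not learn | vocabulary, rubric, eval, pattern reuse |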

16. Evidence Pack

An auditable requirements mining initiative should retain at least:

| Evidence | Content |
| --- | --- |
| Source inventory | artifact, owner, version, status, authority, permission, retention, hash |
| Ingestion manifest | parser, OCR/transcript model, chunking rule, embedding model, run time |
| Vocabulary register | domain terms, synonyms, scope, owner, approved definitions |
| Extraction run report | prompt/model/version, candidate count, confidence, critical failures |
| Quality rubric results | score distribution, low-score reason, review queue |
| Conflict and ambiguity log | conflict type, source refs, owner, resolution decision |
| SME validation log | approve/reject/merge/split/escalate actions and rationale |
| Traceability graph snapshot | requirements, process, systems, tests, controls, evals, release |
| Eval contract and report | dataset, rubric, threshold, slices, decision, review evidence |
| Permission audit | access decisions, redaction, exports, exceptions |
| Change impact report | changed objects, impacted assets, required approvals |
| Portfolio learning report | reusable patterns, new vocabulary, eval failures, roadmap implications |

17. Interview Answers

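Q1: Will AI requirements mining replace the BA?

30-second answer:

No. What it replaces is low-value document searching, initial classification and repetitive comparison, not the BA's business judgment, stakeholder negotiation, scope tradeoff, risk acceptance and baseline ownership. My design fixes AI output as candidate; only content with sources, quality scoring, SME validation and traceability can enter the requirement baseline.

2-minute expansion:

The value of AI requirements mining is widening the evidence surface: PRDs, SOPs, tickets, transcripts, logs, tests and controls can all be read systematically. It can find duplicates, conflicts, ambiguity and implicit requirements, but it cannot decide which stakeholder's goal takes priority, cannot interpret compliance obligations, and cannot take accountability for a production release. So I would establish source authority, permission filtering, a requirement quality rubric, a traceability graph, an eval contract and an SME validation workflow. AI does discovery and drafting; BA/PM/Architect do decision and accountability.

Q2: How do you prevent AI from generating wrong requirements from stale or low-authority material?

I would give every artifact an owner, version, approval status, effective date, authority level and permission tag. Retrieval filters by permission and time first; on conflict, the differences are shown according to the authority ladder, and the model is not allowed to merge them automatically. Output must include source_refs, authority_level, ambiguity_flags and validation_owner. Low-authority sources can only produce a candidate or a pain signal and cannot enter the baseline directly.

Q3: How do you turn a mined requirement into a release-ready AI requirement?

I go through four steps. First, quality scoring: confirm the requirement is grounded, unambiguous, feasible, verifiable, traceable and risk-tiered. Second, write acceptance criteria, including the negative case and the evidence requirement. Third, generate the eval contract, defining dataset, rubric, threshold, critical failure and slice. Fourth, pass SME validation and the release gate, binding requirement, eval, control and release decision into one evidence chain.

Q4: What value do production logs and process mining bring to requirements mining?

Documents tell us how the process should run; logs tell us how it actually runs. Production logs reveal variant, handoff, rework, waiting, control execution and exception path. If AI requirements mining reads only documents, it easily productizes the ideal process. Only with event logs added can it identify real bottlenecks, exception paths and control deviations, and judge whether AI should intervene, the process should be redesigned, or system integration is the right solution.

Q5: How do you handle requirements reverse-engineered from code/API/test cases?

I mark them as actual behavior or implicit requirement candidate and do not treat them directly as business requirements. Code/API/test can reveal system facts, field constraints, integration boundaries and historical acceptance expectations, but may also entrench old defects. The business owner, architecture owner and QA need to judge together whether to keep it as baseline, correct it into a new requirement, or put it in the backlog as technical debt.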


18. Portfolio Learning Loop

A mature organization does not let every project mine requirements from scratch. Every mining run should leave behind:

validated vocabulary
  -> reusable requirement patterns
  -> acceptance criteria library
  -> eval cases and red-team cases
  -> process variant taxonomy
  -> control mapping templates
  -> source authority rules
  -> anti-pattern catalogue
  -> portfolio opportunity scoring

Portfolio-level metrics:

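| Metric | Management meaning |
| --- | --- |
| candidate-to-baseline conversion rate | Source quality and model extraction usefulness |
| ambiguity density by domain | Which products/processes have low requirements maturity |
| conflict density by source pair | Which policies, SOPs, systems or operational practices are inconsistent |
| eval failure recurrence | Which requirement quality problems repeatedly cause AI risk |
| source freshness gap | Time lag between documents and production facts |
| impact analysis lead time | Change review speed and traceability health |
| reuse rate of requirement patterns | Whether the organization is building reusable BA/architecture assets |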

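Final mental model:

The end state of AI requirements mining is not generating more user stories but building a requirements and process knowledge system that learns: it knows where evidence comes from, who has the authority to interpret it, where conflicts exist, what can be verified, which controls must be executed, and how to turn real feedback after each release into better requirement assets.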