AI Requirements Mining: Requirements and Process Knowledge Extraction Architecture
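AI Requirements Mining / Process Knowledge Extraction Architecture, Explained
Audience: CBAP+ Senior BA / AI PM / Product Architect / Enterprise Architect / Solution Architect / Process Owner / Risk and Control Partner.
Core question: how to extract requirements, process knowledge, evidence chains, and change impact from PRDs, BRDs, SOPs, policies, tickets, Jira / Azure DevOps, call transcripts, meeting minutes, process diagrams, code/API specifications, test cases, production logs, and control evidence, while avoiding the misreading of AI requirements mining as "automatically replacing the BA".
Learning objective: build a senior-level architectural mental model of artifact ingestion, source authority, permission filtering, domain vocabulary, ambiguity detection, requirement quality rubric, process variant discovery, traceability graph, eval contract, SME validation, hallucination control, records/privacy controls, and the portfolio learning loop.
Important note: This document is learning, portfolio, and internal architecture training material. It does not constitute legal advice, a compliance conclusion, a regulatory interpretation, an audit opinion, a records-retention conclusion, a model validation report, or a procurement recommendation. Formal projects must be confirmed by Legal, Compliance, Privacy, Records Management, Information Security, Model Risk, Operational Risk, Internal Audit, the Business Owner, the Data Owner, and Architecture, taking into account institution type, jurisdiction, product, customer impact, and internal policy. Access dates are recorded as 2026-06-30.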
Source Anchors
| Source | Official link | How this document uses it |
|---|---|---|
| ISO/IEC/IEEE 29148 Requirements Engineering | https://www.iso.org/standard/72089.html and https://standards.ieee.org/ieee/29148/6937/ | Uses the requirements lifecycle, stakeholder needs, system/software requirements, quality characteristics, and traceability as anchors for governing requirements assets. |
| FFIEC Development, Acquisition, and Maintenance IT Handbook | https://ithandbook.ffiec.gov/it-booklets/development-acquisition-and-maintenance/ | Calibrates the SDLC boundaries of AI-assisted requirements discovery against the financial-institution view of system development, acquisition, maintenance, change, testing, implementation, and controls. |
| FFIEC Management IT Handbook | https://ithandbook.ffiec.gov/it-booklets/management/ | Organizes management accountability for AI requirements mining in the language of governance, risk management, strategy, resources, architecture, third parties, and oversight. |
| NIST SP 800-160 Vol. 1 Systems Security Engineering | https://csrc.nist.gov/pubs/sp/800/160/v1/upd2/final | Applies systems security engineering thinking to bring security, resilience, stakeholder protection needs, and assurance into the requirements graph. |
| NIST SP 800-218 Secure Software Development Framework | https://csrc.nist.gov/pubs/sp/800/218/final | Uses the SSDF practice groups (prepare, protect, produce well-secured software, respond to vulnerabilities) to constrain mining of code/API/test assets and the feedback of security requirements. |
| NIST AI Risk Management Framework | https://www.nist.gov/itl/ai-risk-management-framework | Uses Govern / Map / Measure / Manage to organize AI risk, measurement, gates, monitoring, and continuous improvement. |
| ISO/IEC 42001 AI Management System | https://www.iso.org/standard/81230.html | Uses the AI management system language of roles and responsibilities, operation, performance evaluation, improvement, and internal audit to organize the operating model. |
| IIBA / BABOK professional body page | https://www.iiba.org/professional-development/knowledge-centre/business-analysis-body-of-knowledge/ | Used only as a public anchor for the business analysis professional body; this document does not restate or replace BABOK's copyrighted content. |
In one sentence:
AI Requirements Mining = evidence-assisted requirements and process intelligence, not automatic BA replacement.
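1. Thesis: Requirements mining is not "throwing documents at an LLM"
Low-maturity approach:
Upload a batch of PRDs / SOPs / tickets
-> Ask the LLM to summarize the requirements
-> Generate user stories
-> Have the team review them
This creates three risks:
- Sources of different authority get mixed together, letting meeting minutes override formal policy.
- A sentence that merely appears in a document is mistaken for a valid requirement, ignoring conflicts, versions, effective dates, and control boundaries.
- An AI summary is mistaken for BA judgment, losing stakeholder intent, business tradeoffs, risk acceptance, and architecture constraints.
Mature approach: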
artifact inventory
-> source authority and permission filter
-> canonical evidence unit
-> domain vocabulary and process ontology
-> candidate extraction with confidence and ambiguity
-> traceability graph
-> requirement quality scoring
-> acceptance criteria and eval contract generation
-> SME validation and decision log
-> change impact and portfolio learning loop
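The role of AI here:
| What AI can augment | What AI cannot replace |
|---|---|
| Surfacing duplicates, conflicts, omissions, implicit rules, and exception paths across materials | Deciding business goals, scope, risk appetite, and go-live accountability |
| Extracting stakeholders, events, decisions, controls, and evidence from unstructured material | Interpreting regulatory obligations, compliance conclusions, or legal applicability |
| Generating requirement candidates, acceptance criteria, test ideas, and impact hypotheses | Signing off the requirement baseline, model risk acceptance, or the production gate |
| Building the traceability graph and change-impact candidates | Substituting for SME validation, process owner tradeoffs, and architecture decisions |
| Learning requirement quality defects from production feedback | Letting unapproved model output become a fact of record or a customer commitment |
Senior-level framing:
I would not treat AI requirements mining as BA automation; I would design it as an evidence control plane. It helps the BA see more evidence, find conflicts faster, and trace impact more systematically, but requirement authority still comes from the business owner, the policy owner, the architecture owner, and risk governance.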
2. Source Universe: Different materials yield different knowledge
| Source class | Extractable knowledge | Typical noise | Controls needed |
|---|---|---|---|
| PRD / Product briefs | product intent, persona, journey, feature scope, success metric, release assumption | solution bias, outdated roadmap, unapproved scope | approval status, version, owner, target release |
| BRD / Business case | business need, stakeholder need, policy constraint, benefit hypothesis, risk assumption | high-level slogans, missing baseline, inflated benefits | benefit evidence, baseline method, decision owner |
| SOP / work instructions | activity sequence, role, handoff, evidence required, exception handling | undocumented local workarounds, real process has drifted | effective date, process owner, control linkage |
| Policies / standards / control library | obligation, allowed/prohibited action, threshold, approval level, record requirement | policy inconsistent with actual system capability | policy owner, interpretation record, control mapping |
| Tickets / service requests | user pain, defect, enhancement, workaround, operational exception | duplicate tickets, emotional descriptions, unclear root cause | dedupe, severity, linked incident, resolved outcome |
| Jira / Azure DevOps | backlog item, acceptance criteria, dependency, defect history, release evidence | inconsistent story granularity, unreliable status, acceptance without evidence | workflow state audit, link to commits/tests/releases |
| Call transcripts / chats | customer intent, agent action, policy friction, complaint language, vulnerable customer signals | transcription errors, PII, emotion confused with fact | consent/notice, redaction, sampling, QA review |
| Meeting notes / decision logs | open issue, decision rationale, tradeoff, assumption, stakeholder concern | informal, unapproved, lost context | decision authority, attendee role, follow-up trace |
| Process maps / BPMN | intended flow, control point, role responsibility, SLA, handoff | depicts the ideal process, not the production process | conformance with event log, versioning |
| Code / API specs | actual behavior, contract, validation rule, integration dependency, error code | implementation details mistaken for business rules | code owner, production version, security review |
| Test cases / UAT scripts | expected behavior, edge case, acceptance logic, regression scope | cover only the happy path, unrealistic test data | requirement link, test result, coverage |
| Production logs / telemetry | real event, variant, latency, failure, usage, exception path | missing business semantics, log sampling, PII/secret risk | event schema, minimization, retention, access |
| Controls / audit evidence | control objective, test result, issue, remediation, attestation | control language disconnected from product language | evidence owner, control frequency, issue linkage |
Core principle:
Never mine requirements from text alone.
Mine from artifacts + authority + time + owner + permissions + production evidence.
3. Source Authority Model
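Requirements mining most often fails for lack of a source authority model. AI will merge materials that merely "sound true" into one fluent answer, but a senior BA/Architect must preserve the differences in authority.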
3.1 Authority ladder
| Level | Source | Default interpretation |
|---|---|---|
| A1 Governing authority | laws/regulatory guidance, formal policy, control standard, approved architecture principle | Sets constraint boundaries; not rewritten unilaterally by AI or the project team |
| A2 Business authority | signed-off BRD, approved product requirement, process-owner-approved SOP | Can serve as input to the requirements baseline |
| A3 System authority | production code/API contract/config, system of record, workflow engine rules | Represents current system fact, not automatically what the business intends |
| A4 Operational authority | tickets, call transcripts, case notes, incident logs, QA findings | Represents real pain points, exceptions, and risk signals |
| A5 Collaboration evidence | meeting notes, workshop boards, chat decisions | Represents candidate intent; requires owner ratification |
| A6 Generated artifact | AI summary, draft user story, cluster label, candidate diagram | Working draft only; must link back to its sources and be confirmed by a human |
3.2 Conflict resolution matrix
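| Conflict | Example | Handling principle |
|---|---|---|
| Policy vs PRD | The PRD says "automatically waive fees"; policy requires supervisor approval | Policy/control wins; the requirement becomes AI drafting plus approval routing |
| SOP vs event log | The SOP describes three steps; logs show seven common variants | Use process mining to explain the variants, separating acceptable exceptions from control deviations |
| Ticket vs code | Users believe a feature is missing; the API already has the field but the UI does not expose it | Record an experience gap and integration impact; do not simply judge the request a duplicate |
| Meeting note vs approved baseline | A meeting proposes new scope; the baseline has not been changed | Create a change request candidate and route it to impact assessment |
| Test case vs requirement | A UAT script validates behavior that is not written into the requirements | Create an implicit requirement candidate; require the owner to decide whether it enters the baseline |
| Log behavior vs policy | Production contains a path that bypasses maker-checker | Record a control gap; route it to a risk issue and remediation path |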
In one sentence:
Requirement mining output must carry authority metadata, or it is just text summarization.
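A minimal Python sketch of the authority ladder and conflict matrix in practice. It assumes each extracted claim has already been normalized to a `topic` and a `statement`. The `Claim` type, the `AUTHORITY_RANK` mapping, and `resolve` are hypothetical names for illustration. The sketch keeps the strongest-authority claim as the working position and returns every disagreeing claim as an explicit conflict. It does not merge them.

```python
from dataclasses import dataclass

# Hypothetical encoding of the A1..A6 ladder: a lower rank means stronger authority.
AUTHORITY_RANK = {"A1": 1, "A2": 2, "A3": 3, "A4": 4, "A5": 5, "A6": 6}


@dataclass(frozen=True)
class Claim:
    topic: str       # normalized subject, e.g. "fee_waiver_approval"
    statement: str   # normalized assertion extracted from the source
    source_id: str   # artifact or evidence reference
    authority: str   # one of "A1".."A6"


def resolve(claims: list) -> dict:
    """Group claims by topic, keep the strongest-authority claim as the working
    position, and return every disagreeing claim as an explicit conflict."""
    by_topic = {}
    for claim in claims:
        by_topic.setdefault(claim.topic, []).append(claim)
    result = {}
    for topic, group in by_topic.items():
        ranked = sorted(group, key=lambda c: AUTHORITY_RANK[c.authority])
        leader = ranked[0]
        conflicts = [c for c in ranked[1:] if c.statement != leader.statement]
        result[topic] = {
            "working_position": leader,
            "conflicts": conflicts,
            # Route to an owner if the best available source is only a note or
            # an AI draft (A5/A6), or if any lower-authority source disagrees.
            "needs_owner_confirmation": leader.authority in {"A5", "A6"} or bool(conflicts),
        }
    return result


claims = [
    Claim("fee_waiver_approval", "supervisor approval required", "POLICY-FEE#3", "A1"),
    Claim("fee_waiver_approval", "automatic waiver", "PRD-42", "A2"),
]
outcome = resolve(claims)["fee_waiver_approval"]
print(outcome["working_position"].source_id)  # POLICY-FEE#3
```

The lower-authority PRD claim is not discarded. It stays visible in `conflicts` so an owner can resolve it, as the conflict matrix requires.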
4. Reference Architecture
Artifact connectors
PRD | BRD | SOP | policy | Jira/Azure DevOps | transcripts | notes
BPMN | code/API specs | tests | logs | controls
|
v
Ingestion and evidence normalization
document parser | OCR/layout | transcript diarization | issue schema | log event schema
artifact hash | version | owner | effective date | record class
|
v
Permission and purpose filter
entitlement | data classification | PII redaction | legal hold flag | purpose-bound access
|
v
Domain vocabulary and process ontology
terms | synonyms | products | roles | activities | events | policies | controls
|
v
Extraction services
requirement candidate | event | actor | stakeholder | decision | evidence | control
ambiguity | conflict | gap | assumption | dependency | acceptance criteria
|
v
Traceability graph
need -> requirement -> process step -> data/API -> test -> control -> eval -> release
|
v
Quality scoring and eval contract
29148-style quality rubric | groundedness | conflict check | completeness | testability
|
v
SME validation workbench
approve | reject | merge | split | escalate | risk accept | defer with rationale
|
v
Change impact and portfolio learning
impacted journeys | systems | APIs | controls | tests | policies | releases | benefits
4.1 Architectural principles
| Principle | Product / architecture implication |
|---|---|
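| Source-first | Every candidate requirement must trace back to its original artifact, location, version, and owner. |
| Permission-before-retrieval | Do not retrieve first and filter afterwards; source connectors, indexes, chunks, graph queries, and the output layer all need permission constraints. |
| Draft is not baseline | AI produces candidates; only a human-authorized decision can turn one into baseline. |
| Requirement as graph | A requirement is not a Word paragraph but a graph connecting need, process, data, system, test, control, eval, and release. |
| Ambiguity is a feature | The system must explicitly flag uncertainty, conflicts, missing evidence, and competing interpretations instead of forcing a smooth answer. |
| Controls are first-class | Access, records, privacy, model evaluation, change, human review, and audit evidence enter the architecture from day one. |
| Production learns back | Tickets, complaints, logs, QA, incidents, overrides, and eval failures feed back into the requirements graph and the portfolio. |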
5. Canonical Knowledge Objects
| Object | Minimum fields | Why it matters |
|---|---|---|
| Artifact | artifact_id, type, source_system, owner, version, status, effective_date, hash, record_class, permission_tag | Prevents unsourced text from entering the requirements system |
| Evidence unit | evidence_id, artifact_id, span/page/section/event_id, extracted_text, normalized_fact, confidence, permission_scope | Makes evidence granular enough to cite and review |
| Domain term | term_id, preferred_term, synonyms, forbidden_terms, product_scope, owner, examples | Unifies business vocabulary, reducing misunderstanding and hallucination |
| Process activity | activity_id, name, actor_role, input, output, precondition, postcondition, SLA, control_points | Connects requirements, process, and system behavior |
| Process event | event_id, case_id, activity, timestamp, resource, channel, status, outcome, source_log | Supports variant discovery and fact checking |
| Stakeholder concern | concern_id, stakeholder_role, need, pain, risk, desired_outcome, evidence_refs | Preserves the real motivation behind a requirement |
| Requirement candidate | req_candidate_id, statement, type, source_refs, authority_level, ambiguity_flags, conflicts, quality_score | The controlled intermediate asset that AI produces |
| Requirement baseline | req_id, approved_statement, owner, version, priority, acceptance_criteria, linked_eval, decision_record | The authorized requirement asset |
| Acceptance criterion | ac_id, condition, observable_result, data_needed, negative_case, edge_case, evidence_required | Turns a natural-language requirement into verifiable behavior |
| Eval contract | eval_id, requirement_refs, dataset, rubric, thresholds, critical_failures, slices, release_decision_rule | Makes AI-related requirements repeatably evaluable |
| Control link | control_id, control_objective, requirement_refs, test_method, frequency, owner, evidence | Makes product requirements and the control framework mutually traceable |
| Change impact | impact_id, changed_object, impacted_requirements, processes, systems, controls, tests, releases, owner | Supports change review and regression analysis |
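The minimum fields above map naturally onto typed records. Below is a minimal Python sketch of three of the objects using standard-library dataclasses. Field names follow the table. The `locator` field stands in for span/page/section/event_id, and it and the construction-time check are illustrative choices, not a prescribed schema. The check enforces the Source-first principle: you cannot create a requirement candidate without `source_refs`.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Artifact:
    artifact_id: str
    type: str
    source_system: str
    owner: str
    version: str
    status: str            # e.g. "approved", "draft", "retired"
    effective_date: date
    hash: str              # content hash recorded at ingestion
    record_class: str
    permission_tag: str


@dataclass(frozen=True)
class EvidenceUnit:
    evidence_id: str
    artifact_id: str
    locator: str           # span / page / section / event_id within the artifact
    extracted_text: str
    normalized_fact: str
    confidence: float
    permission_scope: str


@dataclass
class RequirementCandidate:
    req_candidate_id: str
    statement: str
    type: str
    source_refs: list      # evidence_ids this candidate is grounded in
    authority_level: str   # inherited from the cited artifacts, e.g. "A2"
    ambiguity_flags: list = field(default_factory=list)
    conflicts: list = field(default_factory=list)
    quality_score: int = 0

    def __post_init__(self) -> None:
        # Source-first: refuse to create a candidate that cites no evidence.
        if not self.source_refs:
            raise ValueError(f"{self.req_candidate_id}: source_refs must not be empty")
```

Freezing `Artifact` and `EvidenceUnit` reflects that evidence should not be edited after ingestion. The candidate stays mutable because scoring and flagging happen afterwards.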
6. Domain Vocabulary and Ambiguity Detection
6.1 Vocabulary is architecture
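In retail financial services requirements mining, apparent "synonyms" are often not synonymous:
| Term cluster | What must be distinguished |
|---|---|
| customer / member / applicant / account holder / beneficial owner / authorized user | identity, entitlements, permissions, and notification duties differ |
| approve / pre-approve / recommend / route / auto-decision | decision rights, customer impact, and control requirements differ |
| fee waiver / refund / adjustment / goodwill credit | accounting, authorization, policy, and customer communication differ |
| complaint / dispute / inquiry / request / incident | regulatory handling, SLA, records, and escalation paths differ |
| verified / validated / reviewed / attested / approved | evidence strength and accountable party differ |
| requirement / policy / control / design decision / test expectation | lifecycle and change governance differ |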
6.2 Ambiguity detector
| Ambiguity type | Signal | AI mining action | Human decision needed |
|---|---|---|---|
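| Actor ambiguity | "Users can submit", but it is unclear whether "user" means customer, employee, or system | Flag actor_unknown; generate role clarification questions | Product owner / process owner defines the actor |
| Decision ambiguity | "The system determines whether to approve" without separating recommendation from final decision | Flag decision_boundary_unknown | Risk / business owner defines the AI role |
| Data ambiguity | "Use customer information" without naming fields, source, and purpose | Flag data_scope_unknown | Data owner / privacy owner confirms the minimum dataset |
| Evidence ambiguity | "Proof must be provided" without the proof type and acceptable evidence | Flag evidence_standard_unknown | Policy / operations owner defines the evidence rule |
| Timing ambiguity | "Process promptly" with no SLA, cutoff, or time zone | Flag timing_unknown | Process owner defines a measurable SLA |
| Exception ambiguity | "Special cases are handled manually" without defining the special cases | Flag exception_taxonomy_missing | SME defines exception classes |
| Control ambiguity | "Requires approval" without the approver role and threshold | Flag control_owner_missing | Control owner defines the approval matrix |
| Testability ambiguity | The requirement cannot be observed or verified | Flag not_testable | BA/QA/Architect rewrites it as verifiable acceptance conditions |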
The output format must always include these fields:
candidate_statement
source_refs
authority_level
known_facts
unknowns
conflicts
ambiguity_flags
recommended_questions
validation_owner
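A deliberately simple sketch of the ambiguity detector and the required output format. It assumes English-language source text and a hand-maintained signal lexicon. `AMBIGUITY_SIGNALS`, `detect_ambiguity`, and `to_mining_record` are hypothetical. Keyword rules catch only surface signals, so at most they would complement model-based detection and SME review.

```python
import re

# Hypothetical signal lexicon; in practice it would live in the vocabulary register.
AMBIGUITY_SIGNALS = {
    "actor_unknown": [r"\bthe user\b", r"\busers?\b"],
    "timing_unknown": [r"\btimely\b", r"\bpromptly\b", r"\bas soon as possible\b"],
    "evidence_standard_unknown": [r"\bproof\b", r"\bsupporting documents?\b"],
    "exception_taxonomy_missing": [r"\bspecial cases?\b", r"\bexceptional circumstances\b"],
    "control_owner_missing": [r"\brequires? approval\b", r"\bmust be approved\b"],
}


def detect_ambiguity(statement: str) -> list:
    """Return the sorted ambiguity flags whose surface signals appear in the statement."""
    text = statement.lower()
    return sorted(flag for flag, patterns in AMBIGUITY_SIGNALS.items()
                  if any(re.search(p, text) for p in patterns))


def to_mining_record(statement: str, source_refs: list, authority_level: str,
                     validation_owner: str) -> dict:
    """Build the mandatory output record; facts, unknowns, and conflicts are filled later."""
    flags = detect_ambiguity(statement)
    return {
        "candidate_statement": statement,
        "source_refs": source_refs,
        "authority_level": authority_level,
        "known_facts": [],
        "unknowns": [],
        "conflicts": [],
        "ambiguity_flags": flags,
        "recommended_questions": [
            f"Clarify {flag.replace('_', ' ')}: {statement}" for flag in flags
        ],
        "validation_owner": validation_owner,
    }


record = to_mining_record("Special cases require approval", ["SOP-7#2"], "A2", "process owner")
print(record["ambiguity_flags"])  # ['control_owner_missing', 'exception_taxonomy_missing']
```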
7. Extraction Patterns
7.1 Requirement candidate extraction
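| Pattern | Source | Extraction logic | Example output |
|---|---|---|---|
| Explicit shall / must | policy, SOP, approved BRD | Extract mandatory verbs, objects, conditions, and exceptions | "When dispute evidence is missing, a request for supplementary documents must be initiated." |
| Implicit expectation | test cases, tickets, UAT notes | Infer requirement candidates from expected results and defect descriptions | "After an address change, notification preferences should sync to the servicing profile." |
| Constraint extraction | policy, architecture standards, API specs | Extract prohibited actions, limits, dependencies, and formats | "AI must not directly update the fee waiver status." |
| Process step extraction | SOP, BPMN, event logs | Extract activity, actor, input, output, and control point | "The L2 reviewer performs a second review before a high-risk case is closed." |
| Data requirement extraction | API specs, forms, database schema, logs | Extract fields, source, quality, permission, and retention requirements | "application_id must be linkable across LOS, the document store, and the CRM case." |
| Acceptance extraction | UAT, test cases, BDD, support runbooks | Extract observable results, negative cases, and edge cases | "When the policy version has expired, the system refuses to generate a final answer." |
| Risk/control extraction | control library, audit issue, incident | Extract control objective, failure mode, and test evidence | "High-impact output must record reviewer approval." |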
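You can approximate the "Explicit shall / must" pattern with rules before any model is involved. Below is a minimal Python sketch. It assumes English sentences that end in `.` or `;`. The modal list, the condition keywords, and `extract_obligations` are illustrative assumptions. `authority_level` is deliberately left unset because it must come from the artifact's metadata, not from how forceful the wording sounds.

```python
import re

# Assumed obligation markers; real policies may phrase obligations differently.
MODAL = re.compile(r"\b(shall not|must not|may not|shall|must|is required to)\b", re.IGNORECASE)
CONDITION = re.compile(r"\b(if|when|where|unless)\b", re.IGNORECASE)


def extract_obligations(text: str, source_ref: str) -> list:
    """Split text into sentences and keep those containing an obligation marker."""
    candidates = []
    for sentence in re.split(r"(?<=[.;])\s+", text.strip()):
        match = MODAL.search(sentence)
        if not match:
            continue
        modal = match.group(1).lower()
        candidates.append({
            "statement": sentence,
            "modality": "prohibition" if modal.endswith("not") else "obligation",
            "has_condition": bool(CONDITION.search(sentence)),
            "source_ref": source_ref,
            # Authority comes from the artifact's metadata, never from the wording.
            "authority_level": None,
        })
    return candidates


policy_text = ("If dispute evidence is missing, the agent must request documents. "
               "AI shall not update fee waiver status. Customers appreciate quick replies.")
for c in extract_obligations(policy_text, "SOP-7#4.2"):
    print(c["modality"], c["has_condition"], c["statement"])
```

The third sentence is dropped because it has no obligation marker. That is the intended precision-first behavior for this pattern. The implicit-expectation pattern would need a different extractor.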
7.2 Process knowledge extraction
| Knowledge type | Evidence signals | Use |
|---|---|---|
| Case lifecycle | created, assigned, reviewed, escalated, closed events | Reconstruct the real process and cycle time |
| Variant | activity sequence patterns across cases | Identify the main flow, the long tail, and anomalous paths |
| Handoff | resource/team transition | Locate handoff costs and accountability breaks |
| Rework | repeated activity or status regression | Find missing information, rule conflicts, and system defects |
| Waiting | gap between events | Identify queues, SLAs, and external dependencies |
| Control execution | approval event, review flag, dual control record | Verify whether controls were actually executed |
| Exception | override, manual adjustment, special handling | Define the exception taxonomy and AI escalation |
| Outcome | approved, declined, closed, refunded, filed, withdrawn | Build a result-to-path learning loop |
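Several of the knowledge types above can be derived from a plain event log. Below is a minimal Python sketch that assumes each event is a `(case_id, activity, ISO-8601 timestamp)` tuple. `analyze_event_log` is a hypothetical helper. It computes three things: variant frequencies, rework (an activity repeated within one case), and the longest waiting gap per case in hours. Dedicated process mining tooling typically goes further, for example with conformance checking and handoff networks. This sketch only illustrates the underlying signals.

```python
from collections import Counter, defaultdict
from datetime import datetime


def analyze_event_log(events):
    """events: iterable of (case_id, activity, iso_timestamp).
    Returns (variant counts, rework per case, longest wait per case in hours)."""
    cases = defaultdict(list)
    for case_id, activity, ts in events:
        cases[case_id].append((datetime.fromisoformat(ts), activity))
    variants, rework, max_wait = Counter(), {}, {}
    for case_id, steps in cases.items():
        steps.sort()  # order each case chronologically
        sequence = tuple(activity for _, activity in steps)
        variants[sequence] += 1
        repeated = [a for a, n in Counter(sequence).items() if n > 1]
        if repeated:
            rework[case_id] = repeated
        gaps = [(b[0] - a[0]).total_seconds() / 3600 for a, b in zip(steps, steps[1:])]
        max_wait[case_id] = max(gaps, default=0.0)
    return variants, rework, max_wait


events = [
    ("C1", "created", "2026-01-01T09:00"), ("C1", "reviewed", "2026-01-01T10:00"),
    ("C1", "closed", "2026-01-01T12:00"),
    ("C2", "created", "2026-01-02T09:00"), ("C2", "reviewed", "2026-01-02T09:30"),
    ("C2", "reviewed", "2026-01-03T09:30"), ("C2", "closed", "2026-01-03T10:00"),
]
variants, rework, max_wait = analyze_event_log(events)
print(rework)    # {'C2': ['reviewed']}
print(max_wait)  # {'C1': 2.0, 'C2': 24.0}
```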
7.3 Stakeholder and evidence extraction
| Extracted object | From | Key fields |
|---|---|---|
| Stakeholder | BRD, meeting notes, RACI, Jira assignee, process maps | role, decision right, concern, approval needed |
| Pain point | tickets, transcripts, retro notes, complaints | persona, journey step, frequency, severity, evidence |
| Decision | meeting notes, ADR, change board, release notes | decision, alternatives, rationale, owner, date |
| Evidence standard | policy, SOP, control test, audit finding | evidence type, source, minimum quality, retention |
| Assumption | business case, PRD, architecture brief | assumption, impact if false, validation plan |
| Dependency | API specs, Jira links, code repos, release plan | upstream/downstream, owner, version, risk |
8. Requirement Quality Rubric
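Following the quality perspective of requirements engineering, AI mining must do more than output sentences that "look like requirements". Every candidate is scored on at least the dimensions below.
| Dimension | High-quality signal | Red flag | Suggested gate |
|---|---|---|---|
| Grounded | Has source, location, version, owner, and authority level | Only an AI summary, no original evidence | source_refs required |
| Necessary | Linked to a business outcome, risk control, or user pain | Technical showmanship or a single stakeholder's preference | outcome link required |
| Unambiguous | Actor, object, condition, and decision boundary are clear | "Fast, smart, friendly, timely" without definitions | ambiguity flags must close |
| Feasible | APIs, data, permissions, process, and operating model can support it | Depends on nonexistent data or unauthorized actions | architecture review |
| Verifiable | Has acceptance criteria, a test method, and an eval metric | No way to observe whether it is met | AC/eval required |
| Bounded | Scope, exceptions, negative cases, and out-of-scope are clear | Unlimited generalization, claims to cover every case | scope boundary required |
| Consistent | Does not conflict with policy, controls, SOPs, APIs, or other requirements | Conflicts left unrecorded | conflict resolution required |
| Traceable | Linkable to source, process, system, test, control, and release | An isolated document paragraph | graph link required |
| Risk-tiered | Distinguishes low/medium/high impact and human oversight | High-impact scenarios describe only functionality | risk tier required |
| Maintainable | Owner, version, and change trigger are clear | Nobody owns the requirement | lifecycle owner required |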
Candidate scoring:
0 = not usable
1 = candidate only
2 = grounded but ambiguous
3 = clear and traceable
4 = testable and risk-tiered
5 = baseline-ready with eval/control links
Release baseline rule:
No AI-mined requirement enters baseline below score 4.
High-impact AI behavior requires score 5 and SME approval evidence.
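You can encode the release baseline rule as a gate function. Below is a minimal sketch in which `baseline_gate` and its parameters are hypothetical. Passing the gate only makes a candidate eligible for a human baseline decision. It does not approve anything, which keeps the function consistent with "Draft is not baseline".

```python
from typing import Optional

SCORE_LABELS = {
    0: "not usable",
    1: "candidate only",
    2: "grounded but ambiguous",
    3: "clear and traceable",
    4: "testable and risk-tiered",
    5: "baseline-ready with eval/control links",
}


def baseline_gate(score: int, high_impact: bool, sme_approval_ref: Optional[str]) -> tuple:
    """Apply the release baseline rule; passing only makes the candidate eligible for a human decision."""
    if score not in SCORE_LABELS:
        raise ValueError(f"score must be 0-5, got {score}")
    if score < 4:
        return False, f"blocked: score {score} ({SCORE_LABELS[score]}) is below 4"
    if high_impact and score < 5:
        return False, "blocked: high-impact AI behavior requires score 5"
    if high_impact and not sme_approval_ref:
        return False, "blocked: high-impact AI behavior requires SME approval evidence"
    return True, f"eligible for baseline decision: {SCORE_LABELS[score]}"


print(baseline_gate(4, high_impact=True, sme_approval_ref=None))
# (False, 'blocked: high-impact AI behavior requires score 5')
```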
9. Traceability Graph
A traditional traceability matrix quickly breaks down in AI projects, because a requirement simultaneously connects to prompts, the RAG corpus, the model, tools, data contracts, controls, eval cases, and production feedback. A graph is more sustainable.
9.1 Minimum graph schema
Business outcome
-> stakeholder need
-> requirement
-> process activity
-> data object / API / policy / control
-> acceptance criterion
-> eval case / test case
-> release gate decision
-> production signal
-> change request / improvement
9.2 Edge types
| Edge | Meaning | Example |
|---|---|---|
| derives_from | A requirement originates from a piece of evidence | requirement derives_from SOP section 4.2 |
| constrains | A policy/control constrains a requirement | policy constrains fee waiver AI action |
| implements | A system/API implements a requirement | API endpoint implements address validation |
| verifies | A test/eval verifies a requirement | eval case verifies grounded policy answer |
| conflicts_with | Two objects conflict | transcript practice conflicts_with policy |
| supersedes | A new version replaces an old one | SOP v7 supersedes SOP v6 |
| impacts | A change affects an object | API schema change impacts acceptance criteria |
| monitors | A production metric monitors a requirement | unsupported claim rate monitors groundedness |
| remediates | An improvement item fixes a defect | change request remediates ambiguity defect |
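Below is a minimal Python sketch of impact traversal over typed edges, assuming a small in-memory edge list. The edge instances and the `IMPACT_DIRECTION` mapping are illustrative assumptions. If a requirement `derives_from` an SOP section, a change to that section is treated as impacting the requirement (reverse direction). A policy that `constrains` a requirement impacts it in the forward direction. A production system might use a graph database, but plain breadth-first search shows the idea.

```python
from collections import defaultdict, deque

# Illustrative edges in (source, edge_type, target) form, using the edge types above.
EDGES = [
    ("REQ-12", "derives_from", "SOP-v7#4.2"),
    ("POLICY-FEE#3", "constrains", "REQ-12"),
    ("API-addr-validate", "implements", "REQ-12"),
    ("EVAL-31", "verifies", "REQ-12"),
    ("TEST-88", "verifies", "REQ-12"),
    ("SOP-v7", "supersedes", "SOP-v6"),
]

# Assumed direction in which a change propagates along each edge type.
# "reverse": a change to the target impacts the source (REQ derives_from SOP).
# "forward": a change to the source impacts the target (POLICY constrains REQ).
IMPACT_DIRECTION = {
    "derives_from": "reverse",
    "constrains": "forward",
    "implements": "reverse",
    "verifies": "reverse",
}


def build_impact_index(edges) -> dict:
    index = defaultdict(set)
    for src, kind, dst in edges:
        direction = IMPACT_DIRECTION.get(kind)  # other edge types are not traversed
        if direction == "forward":
            index[src].add(dst)
        elif direction == "reverse":
            index[dst].add(src)
    return index


def impacted_by(changed: str, edges) -> set:
    """Breadth-first search from the changed node over impact edges."""
    index = build_impact_index(edges)
    seen, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for nxt in index.get(node, ()):
            if nxt != changed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(impacted_by("SOP-v7#4.2", EDGES)))
# ['API-addr-validate', 'EVAL-31', 'REQ-12', 'TEST-88']
```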
9.3 AI traceability-specific nodes
| Node | Why it matters |
|---|---|
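| Prompt/version | A prompt change can alter requirement interpretation or output structure |
| Retrieval corpus/version | A policy/SOP update must trigger eval regression |
| Chunk/evidence id | Prevents the model from citing the wrong source or outdated evidence |
| Tool permission | What the AI can read, write, or call directly determines risk |
| Eval dataset/slice | An average score is not enough to cover high-risk scenarios |
| Human override | Why a reviewer overrode AI output is a requirements learning signal |
| Incident/complaint | Production issues must trace back to requirement and eval gaps |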
10. Acceptance Criteria and Eval Contract
10.1 Acceptance criteria grammar
Given [source evidence / process state / user role / policy version],
when [AI or system action occurs],
then [observable output / workflow action / control evidence],
and [source citation / permission / escalation / negative case rule],
with [metric threshold / reviewer rule / log evidence].
Example:
Given a customer fee dispute transcript and the active fee waiver policy,
when the AI drafts an agent response,
then every policy-based statement must cite the active policy section,
and any waiver recommendation above the role limit must be routed to supervisor approval,
with critical unauthorized commitment defects equal to zero in release eval.
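If you keep each clause of the grammar as a separate field, incomplete criteria can be rejected before they reach review. Below is a minimal Python sketch that encodes the example above. `AcceptanceCriterion` and its field names are hypothetical. `missing_clauses` checks only that every clause is present, not that it is meaningful.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptanceCriterion:
    given: str          # source evidence / process state / user role / policy version
    when: str           # AI or system action
    then: str           # observable output / workflow action / control evidence
    and_rule: str       # citation / permission / escalation / negative-case rule
    with_evidence: str  # metric threshold / reviewer rule / log evidence

    def missing_clauses(self) -> list:
        # Any empty clause means the criterion is not yet verifiable in this grammar.
        return [name for name, value in vars(self).items() if not value.strip()]

    def render(self) -> str:
        return (f"Given {self.given},\nwhen {self.when},\nthen {self.then},\n"
                f"and {self.and_rule},\nwith {self.with_evidence}.")


ac = AcceptanceCriterion(
    given="a customer fee dispute transcript and the active fee waiver policy",
    when="the AI drafts an agent response",
    then="every policy-based statement must cite the active policy section",
    and_rule="any waiver recommendation above the role limit must be routed to supervisor approval",
    with_evidence="critical unauthorized commitment defects equal to zero in release eval",
)
print(ac.missing_clauses())  # []
print(ac.render())
```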
10.2 Eval contract template
| Field | Content |
|---|---|
| eval_id | REQ-MINING-FEE-DISPUTE-GROUNDEDNESS-001 |
| requirement_refs | linked requirement ids and acceptance criteria |
| dataset | approved sample set, source period, permission scope, redaction method |
| slices | product, channel, customer segment, policy version, exception type, risk tier |
| tasks | extract requirement, cite evidence, detect ambiguity, generate AC, classify control |
| rubric | groundedness, completeness, conflict detection, testability, risk boundary |
| thresholds | average score, minimum slice score, critical failure blocker |
| critical failures | unsupported policy claim, permission leak, wrong authority source, missed high-impact ambiguity |
| evaluator | SME, QA, model evaluation team, independent challenger |
| decision rule | go / limited go / no-go / rollback triggers |
| evidence | run manifest, model version, prompt version, corpus version, review results |
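You can make the `thresholds`, `critical failures`, and `decision rule` fields executable. Below is a minimal Python sketch. It assumes per-slice scores in [0, 1] and takes the "average score" as an unweighted mean across slices. The default thresholds (0.90 average, 0.80 per slice, 0.05 limited-go margin) are placeholders, not recommended values. `release_decision` is hypothetical, and rollback triggers are not modeled. Any critical failure blocks release regardless of averages, matching the template's treatment of critical failures as blockers.

```python
from statistics import mean


def release_decision(slice_scores: dict, critical_failures: int,
                     avg_threshold: float = 0.90, min_slice_threshold: float = 0.80,
                     limited_go_margin: float = 0.05) -> tuple:
    """Return (decision, weak_slices) where decision is "go", "limited go", or "no-go"."""
    if critical_failures > 0:
        return "no-go", []  # critical failures are blockers, whatever the averages say
    weak = sorted(s for s, v in slice_scores.items() if v < min_slice_threshold)
    if mean(slice_scores.values()) < avg_threshold:
        return "no-go", weak
    if not weak:
        return "go", []
    # Slightly weak slices may allow a limited go that excludes or restricts those slices.
    if all(slice_scores[s] >= min_slice_threshold - limited_go_margin for s in weak):
        return "limited go", weak
    return "no-go", weak


print(release_decision({"card": 0.97, "deposit": 0.97, "mortgage": 0.78}, critical_failures=0))
# ('limited go', ['mortgage'])
```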
10.3 Eval dimensions for requirements mining
| Dimension | Measures | Failure example |
|---|---|---|
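| Extraction precision | Whether candidates really are requirements or process knowledge | Turning meeting small talk into a requirement |
| Extraction recall | Whether key requirements, controls, and exceptions were missed | Missing the supervisor approval requirement |
| Groundedness | Whether every conclusion cites the correct source | Citing an outdated SOP or an irrelevant ticket |
| Authority awareness | Whether policy, PRD, ticket, and AI draft are distinguished | Letting a ticket override approved policy |
| Ambiguity detection | Whether unclear actor/data/decision/control is flagged | Treating "process promptly" as a complete SLA |
| Conflict detection | Whether conflicts between materials are found | Missing an inconsistency between the SOP and the production log |
| Quality scoring | Whether rubric scores agree with human experts | A high-risk ambiguous requirement rated baseline-ready |
| Impact analysis | Whether related systems, processes, tests, and controls are found | An API change that does not flag UAT and control impact |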
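You can compute extraction precision and recall against an SME-labelled gold set. Below is a minimal sketch that assumes candidates have already been matched to gold items by ID. Matching free-text candidates is itself a judgment step. `extraction_scores` is hypothetical. Missed critical items are reported separately, since an average can hide a high-risk miss.

```python
def extraction_scores(predicted: set, gold: set, critical: frozenset = frozenset()) -> dict:
    """predicted/gold: sets of matched requirement IDs; critical: gold IDs that must never be missed."""
    true_positives = len(predicted & gold)
    return {
        "precision": true_positives / len(predicted) if predicted else 0.0,
        "recall": true_positives / len(gold) if gold else 0.0,
        "missed": sorted(gold - predicted),
        "spurious": sorted(predicted - gold),
        "missed_critical": sorted(critical - predicted),
    }


scores = extraction_scores({"R1", "R2", "R3"}, {"R1", "R2", "R4", "R5"}, frozenset({"R4"}))
print(scores["recall"], scores["missed_critical"])  # 0.5 ['R4']
```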
11. Hallucination, Privacy and Records Controls
11.1 Hallucination control
| Control | Design requirement |
|---|---|
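| Evidence-bound generation | Output must be based only on retrieved evidence units; when there is no evidence, show unknown / needs validation. |
| Source authority ranking | On conflict, show the authority ladder and the conflict instead of automatically fusing them into one answer. |
| Structured output schema | Emit candidate, source_refs, ambiguity_flags, quality_score, and validation_owner as separate fields. |
| Unsupported claim detector | Run citation coverage and semantic entailment checks on every requirement statement. |
| Version-aware retrieval | Retrieval must account for effective_date, retired status, product scope, and jurisdiction scope. |
| Critical failure block | When a permission leak, fabricated citation, wrong policy citation, or overreach on a high-impact decision is detected, block the output from entering the workflow. |
| Human validation queue | Insufficient scores, unresolved conflicts, unclear permission boundaries, and high-impact requirements go to SME review. |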
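Below is a minimal sketch of the unsupported claim detector. It assumes each statement arrives with the evidence IDs it cites and flags three problems: missing citations, citations to unknown evidence (a possible fabricated reference), and citations to retired evidence. Semantic entailment normally needs an NLI or judge model. Here, lexical token overlap is swapped in as a crude and clearly weaker stand-in. `check_claims` and `min_overlap` are hypothetical.

```python
import re


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def check_claims(statements: list, evidence: dict, min_overlap: float = 0.5):
    """statements: [{"text": str, "cites": [evidence_id, ...]}]
    evidence: {evidence_id: {"text": str, "retired": bool}} from the retrieval step.
    Token overlap is a crude stand-in for semantic entailment (assumption)."""
    report = []
    for s in statements:
        problems = []
        if not s["cites"]:
            problems.append("no_citation")
        for eid in s["cites"]:
            ev = evidence.get(eid)
            if ev is None:
                problems.append(f"unknown_evidence:{eid}")  # possibly fabricated
            elif ev["retired"]:
                problems.append(f"retired_evidence:{eid}")
        claim = tokens(s["text"])
        support = set().union(*(tokens(evidence[e]["text"]) for e in s["cites"] if e in evidence))
        overlap = len(claim & support) / len(claim) if claim else 0.0
        if s["cites"] and overlap < min_overlap:
            problems.append(f"low_support:{overlap:.2f}")
        report.append({"text": s["text"], "problems": problems})
    pass_rate = sum(not r["problems"] for r in report) / len(report) if report else 1.0
    return pass_rate, report


evidence = {
    "E1": {"text": "Fee waivers above the role limit require supervisor approval", "retired": False},
    "E2": {"text": "Old rule", "retired": True},
}
statements = [
    {"text": "Fee waivers above the role limit require supervisor approval", "cites": ["E1"]},
    {"text": "Agents may waive any fee", "cites": []},
    {"text": "Old rule applies", "cites": ["E2"]},
    {"text": "Refunds are automatic", "cites": ["E9"]},
]
pass_rate, report = check_claims(statements, evidence)
print(pass_rate)  # 0.25
```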
11.2 Permission filtering
identity and role
-> source connector entitlement
-> document-level ACL
-> chunk-level permission tag
-> graph edge visibility
-> retrieval-time filter
-> output redaction
-> audit log
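Below is a minimal Python sketch of part of the chain above: the chunk-level permission tag, the retrieval-time filter, and the audit log. Output redaction is omitted. The sketch assumes each chunk carries a set of permission tags and a legal hold flag, and that a principal's entitlements and declared purpose have already been resolved from identity and role. The key design choice is ordering. Eligibility is computed before ranking, so ineligible chunks never reach the ranker or the model. `Chunk`, `Principal`, the toy keyword ranker, and the `ALLOWED_PURPOSES_UNDER_HOLD` policy are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    permission_tags: frozenset   # every tag is required to read the chunk
    legal_hold: bool = False


@dataclass(frozen=True)
class Principal:
    user_id: str
    entitlements: frozenset      # resolved from identity and role
    purpose: str                 # declared purpose of this session


# Hypothetical policy: held material may be read for mining, not for training or eval generation.
ALLOWED_PURPOSES_UNDER_HOLD = {"requirement_mining"}


def permitted_chunks(principal: Principal, chunks: list) -> list:
    """Filter BEFORE ranking so ineligible chunks never reach the ranker or the model."""
    eligible = []
    for chunk in chunks:
        if not chunk.permission_tags <= principal.entitlements:
            continue
        if chunk.legal_hold and principal.purpose not in ALLOWED_PURPOSES_UNDER_HOLD:
            continue
        eligible.append(chunk)
    return eligible


def retrieve(principal: Principal, chunks: list, query: str, audit_log: list, k: int = 3) -> list:
    eligible = permitted_chunks(principal, chunks)
    terms = set(query.lower().split())
    # Toy keyword ranker standing in for vector or hybrid search.
    ranked = sorted(eligible, key=lambda c: len(terms & set(c.text.lower().split())),
                    reverse=True)[:k]
    audit_log.append({"user": principal.user_id, "purpose": principal.purpose,
                      "query": query, "returned": [c.chunk_id for c in ranked]})
    return ranked


chunks = [
    Chunk("c1", "fee waiver approval rule", frozenset({"servicing"})),
    Chunk("c2", "customer phone number", frozenset({"servicing", "pii"})),
    Chunk("c3", "fee dispute notes", frozenset({"servicing"}), legal_hold=True),
]
log = []
miner = Principal("u2", frozenset({"servicing"}), "requirement_mining")
print([c.chunk_id for c in retrieve(miner, chunks, "fee waiver", log)])  # ['c1', 'c3']
```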
Permission design questions:
- Can analysts see all the PII in call transcripts, or only redacted evidence?
- Can a vendor LLM process material classified as records?
- Can a cross-product team retrieve another product's complaint notes?
- Is legal hold material allowed into training or synthetic eval generation?
- Could the output leak, through a summary, sources the user is not entitled to see?
11.3 Privacy and records controls
| Area | Guardrail |
|---|---|
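| Data minimization | Ingest only the fields and fragments the mining task needs; redact and bucket logs and transcripts by default. |
| Purpose limitation | Authorize requirement mining, eval, training, and analytics separately by purpose; never mix them. |
| Retention | Define retention and disposition separately for raw artifacts, derived chunks, embeddings, summaries, review notes, and eval results. |
| Legal hold | The hold flag propagates to originals, indexes, vectors, the graph, AI outputs, review notes, and export packages. |
| Records provenance | Artifact hash, ingestion run, parser version, model version, and review decision are replayable. |
| Access audit | Record who retrieved what, what was output, what was exported, and for which project. |
| Vendor boundary | Third-party processing, subprocessors, region, logging, proof of deletion, and exit plan go into architecture review. |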
12. SME Validation Workbench
A requirements mining system must give SMEs an actionable review surface, not a page of AI summary.
| Review action | Meaning | Captured evidence |
|---|---|---|
| Approve | The candidate may enter the baseline or backlog | approver, role, timestamp, source refs, quality score |
| Reject | The candidate does not hold up | reason code: not requirement / duplicate / wrong source / obsolete / out of scope |
| Merge | Several candidates are merged into one | merged ids, canonical wording, rationale |
| Split | One candidate contains several requirements | child requirements, scope boundaries |
| Escalate | Needs a policy/risk/legal/architecture judgment | escalation owner, question, deadline, decision |
| Reclassify | The type or authority level is wrong | old/new class, rationale |
| Add evidence | The SME supplies a missing source | evidence id, artifact owner, access scope |
| Mark conflict | The SME identifies a conflict | conflicting nodes, resolution path |
Reviewer metrics:
| Metric | Interpretation |
|---|---|
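| approval rate by source class | Which materials yield higher-quality requirements |
| reject reason distribution | The main defects in the model or in source hygiene |
| ambiguity closure time | Throughput of requirement clarification |
| SME disagreement rate | Whether domain vocabulary or policy interpretation is unstable |
| override-to-eval feedback latency | Whether production learning reaches eval regression |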
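The decision log gives you enough to compute two of the reviewer metrics directly. Below is a minimal sketch. It assumes each logged decision records `source_class`, `action`, and (for rejects) `reason_code`. It also assumes a sample of candidates is independently double-reviewed to measure disagreement. Both function names are hypothetical.

```python
from collections import Counter


def reviewer_metrics(decisions: list):
    """decisions: [{"source_class": str, "action": str, "reason_code": str or None}]
    Returns (approval rate per source class, reject reasons by frequency)."""
    totals, approvals, reject_reasons = Counter(), Counter(), Counter()
    for d in decisions:
        totals[d["source_class"]] += 1
        if d["action"] == "approve":
            approvals[d["source_class"]] += 1
        elif d["action"] == "reject":
            reject_reasons[d["reason_code"]] += 1
    approval_rate = {src: approvals[src] / n for src, n in totals.items()}
    return approval_rate, reject_reasons.most_common()


def disagreement_rate(double_reviews: dict) -> float:
    """double_reviews: {candidate_id: (action_by_reviewer_1, action_by_reviewer_2)}"""
    if not double_reviews:
        return 0.0
    return sum(a != b for a, b in double_reviews.values()) / len(double_reviews)


decisions = [
    {"source_class": "policy", "action": "approve", "reason_code": None},
    {"source_class": "policy", "action": "approve", "reason_code": None},
    {"source_class": "ticket", "action": "approve", "reason_code": None},
    {"source_class": "ticket", "action": "reject", "reason_code": "duplicate"},
    {"source_class": "ticket", "action": "reject", "reason_code": "duplicate"},
    {"source_class": "meeting_notes", "action": "reject", "reason_code": "not_requirement"},
]
rates, reasons = reviewer_metrics(decisions)
print(reasons)  # [('duplicate', 2), ('not_requirement', 1)]
```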
13. Change Impact Architecture
When a policy, API, process, model, or control changes, the requirements mining system should be able to answer:
What changed?
Which requirements depend on it?
Which journeys and stakeholders are impacted?
Which tests and eval cases must run?
Which controls and records are affected?
Which release or portfolio commitments may move?
Who must approve?
13.1 Impact dimensions
| Changed object | Impact candidates |
|---|---|
| Policy section | requirements, acceptance criteria, RAG corpus, agent scripts, training, control tests |
| SOP step | process map, workflow engine, roles, SLA, task mining interpretation, QA sampling |
| API schema | UI fields, validation, integration tests, data contract, downstream reports |
| Code rule | implicit requirement, defect trend, regression test, control logic |
| Test case | coverage, acceptance criteria, requirement quality score |
| Production log pattern | process variant, exception taxonomy, AI intervention point |
| Control finding | requirement priority, risk gate, release scope, remediation backlog |
| Model/prompt change | eval contract, output schema, hallucination risk, reviewer instructions |
13.2 Change impact scoring
| Factor | Low | Medium | High |
|---|---|---|---|
| Customer impact | internal productivity | staff decision support | customer rights, money, access, eligibility |
| Policy/control impact | no control link | one local control | enterprise policy or high-risk control |
| System dependency | single UI | API/data contract | system of record or cross-platform |
| Evidence confidence | direct source | inferred from multiple sources | conflict or missing authority |
| Reversibility | easy rollback | manual correction | hard to unwind or records impact |
| Eval coverage | existing regression | partial cases | no validated eval slice |
High-scoring changes require architecture review, SME approval, regression eval, and release gate evidence.
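The factor table can be turned into a repeatable score. Below is a minimal sketch that maps Low/Medium/High to 1/2/3 and sums the six factors, for a range of 6 to 18. The escalation threshold of 13 is an illustrative assumption, not a value from this document. So is the rule that a High customer impact or policy/control impact escalates on its own. `score_change` is hypothetical.

```python
LEVEL = {"low": 1, "medium": 2, "high": 3}
FACTORS = ("customer_impact", "policy_control_impact", "system_dependency",
           "evidence_confidence", "reversibility", "eval_coverage")
REQUIRED_FOR_HIGH = ["architecture review", "SME approval", "regression eval",
                     "release gate evidence"]


def score_change(ratings: dict, high_threshold: int = 13) -> dict:
    """ratings: {factor: "low" | "medium" | "high"} for every factor in FACTORS."""
    missing = [f for f in FACTORS if f not in ratings]
    if missing:
        raise ValueError(f"missing factor ratings: {missing}")
    total = sum(LEVEL[ratings[f]] for f in FACTORS)
    # Assumption: a High on customer or policy/control impact escalates regardless of total.
    escalate = total >= high_threshold or "high" in (
        ratings["customer_impact"], ratings["policy_control_impact"])
    return {"total": total, "tier": "high" if escalate else "standard",
            "required_approvals": list(REQUIRED_FOR_HIGH) if escalate else []}


print(score_change(dict.fromkeys(FACTORS, "medium")))
# {'total': 12, 'tier': 'standard', 'required_approvals': []}
```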
14. PM / BA / Architect Implications
| Role | Senior-level responsibilities |
|---|---|
| AI PM | Turn mining output into portfolio decisions: which requirements enter the product roadmap, which are only process governance, and which need risk acceptance. |
| Senior BA | Design an elicitation-by-evidence process; manage ambiguity, conflict, traceability, SME validation, and the requirement baseline. |
| Product Architect | Define the requirement graph, API/data/control linkage, the AI role boundary, and the change impact architecture. |
| Enterprise Architect | Bring the requirements mining capability into the enterprise knowledge architecture, records, AI platform, governance, and reuse strategy. |
| Process Owner | Judge real process variants, acceptable exceptions, SOP revisions, and operating model changes. |
| Risk / Control Partner | Connect policy, control objectives, test evidence, and issue remediation to the requirement lifecycle. |
| Data / Privacy Owner | Define ingestible data, permissions, redaction, retention, legal hold, and cross-border boundaries. |
| QA / EvalOps | Turn mined requirements into test cases, eval datasets, rubrics, thresholds, and regression gates. |
Senior PM judgment:
If the output cannot be traced, validated, tested, controlled and maintained,
it is not a requirement asset; it is an AI note.
15. Anti-Patterns
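| Anti-pattern | Symptom | Risk | Remedy |
|---|---|---|---|
| Upload-and-summarize | Bulk-upload documents and ask for user stories | hallucination, outdated sources, permission leaks | artifact inventory + source authority + structured extraction |
| Ticket-driven product truth | More tickets means higher priority | duplicates, complaint bias, missing business value | dedupe + severity + journey + outcome scoring |
| Policy-blind AI mining | Reads only PRDs and meeting minutes | requirements bypass policy and controls | policy/control corpus as first-class source |
| No permission boundary | One index serves every project | unauthorized data access and records risk | retrieval-time ACL + output redaction + audit |
| Average quality score | Looks only at overall accuracy | failures in high-risk slices are masked | slice thresholds + critical failure blockers |
| No SME workflow | AI output goes straight into the backlog | wrong requirements at scale | approve/reject/merge/split decision log |
| No graph | Traceability maintained in Excel | change impact becomes uncontrollable | requirement-process-system-test-control graph |
| Treat code as policy | Infers from code that "this is how it should be" | entrenches historical defects | distinguish current behavior from desired requirement |
| Ignore production logs | Mines only documents | real process variants stay invisible | event log and ticket feedback loop |
| No portfolio learning | Every project mines from scratch | the organization does not learn | vocabulary, rubric, eval, pattern reuse |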
16. Evidence Pack
An auditable requirements mining initiative should retain at least the following:
| Evidence | Content |
|---|---|
| Source inventory | artifact, owner, version, status, authority, permission, retention, hash |
| Ingestion manifest | parser, OCR/transcript model, chunking rule, embedding model, run time |
| Vocabulary register | domain terms, synonyms, scope, owner, approved definitions |
| Extraction run report | prompt/model/version, candidate count, confidence, critical failures |
| Quality rubric results | score distribution, low-score reason, review queue |
| Conflict and ambiguity log | conflict type, source refs, owner, resolution decision |
| SME validation log | approve/reject/merge/split/escalate actions and rationale |
| Traceability graph snapshot | requirements, process, systems, tests, controls, evals, release |
| Eval contract and report | dataset, rubric, threshold, slices, decision, review evidence |
| Permission audit | access decisions, redaction, exports, exceptions |
| Change impact report | changed objects, impacted assets, required approvals |
| Portfolio learning report | reusable patterns, new vocabulary, eval failures, roadmap implications |
17. Interview Answers
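Q1: Will AI requirements mining replace the BA?
30-second answer:
No. It replaces low-value document searching, first-pass classification, and repetitive comparison. It does not replace the BA's business judgment, stakeholder negotiation, scope tradeoffs, risk acceptance, or baseline ownership. My design pins AI output as a candidate; only content with sources, a quality score, SME validation, and traceability can enter the requirements baseline.
2-minute expansion:
The value of AI requirements mining is a wider evidence surface: PRDs, SOPs, tickets, transcripts, logs, tests, and controls can all be read systematically. It can find duplicates, conflicts, ambiguity, and implicit requirements, but it cannot decide which stakeholder's goals take priority, cannot interpret compliance obligations, and cannot be accountable for a production launch. So I would establish source authority, permission filtering, a requirement quality rubric, a traceability graph, an eval contract, and an SME validation workflow. AI does discovery and drafting; the BA/PM/Architect owns decisions and accountability.
Q2: How do you prevent AI from generating wrong requirements from outdated or low-authority material?
I would give every artifact an owner, version, approval status, effective date, authority level, and permission tag. Retrieval filters by permission and time first. On conflict, the differences are shown according to the authority ladder instead of letting the model fuse them. Output must include source_refs, authority_level, ambiguity_flags, and validation_owner. Low-authority sources can only generate candidates or pain signals; they never enter the baseline directly.
Q3: How do you turn a mined requirement into an AI requirement that can go live?
I would take it through four steps. First, quality scoring, confirming that it is grounded, unambiguous, feasible, verifiable, traceable, and risk-tiered. Second, writing acceptance criteria, including negative cases and evidence requirements. Third, generating an eval contract that defines the dataset, rubric, thresholds, critical failures, and slices. Fourth, passing SME validation and the release gate, binding the requirement, eval, control, and release decision into one evidence chain.
Q4: What is the value of production logs and process mining in requirements mining?
Documents tell us how the process should run; logs tell us how it actually runs. Production logs reveal variants, handoffs, rework, waiting, control execution, and exception paths. If AI requirements mining reads only documents, it easily productizes the idealized process. Adding event logs makes it possible to identify real bottlenecks, exception paths, and control deviations, and to judge whether AI should intervene, the process should be redesigned, or system integration is the right fix.
Q5: How do you handle requirements inferred from code, APIs, or test cases?
I would label them as actual behavior or implicit requirement candidates and not treat them directly as business requirements. Code, APIs, and tests can reveal system facts, field constraints, integration boundaries, and historical acceptance expectations, but they can also entrench old defects. The business owner, architecture owner, and QA decide together whether to keep the behavior in the baseline, correct it as a new requirement, or log it in the backlog as technical debt.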
18. Portfolio Learning Loop
Mature organizations do not make every project mine requirements from scratch. Each mining effort should consolidate:
validated vocabulary
-> reusable requirement patterns
-> acceptance criteria library
-> eval cases and red-team cases
-> process variant taxonomy
-> control mapping templates
-> source authority rules
-> anti-pattern catalogue
-> portfolio opportunity scoring
Portfolio-level metrics:
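| Metric | Management meaning |
|---|---|
| candidate-to-baseline conversion rate | source quality and model extraction usefulness |
| ambiguity density by domain | which products/processes have immature requirements |
| conflict density by source pair | which policies, SOPs, systems, or operational practices are inconsistent |
| eval failure recurrence | which requirement quality problems repeatedly create AI risk |
| source freshness gap | how far documentation lags behind production fact |
| impact analysis lead time | change review speed and traceability health |
| reuse rate of requirement patterns | whether the organization is building reusable BA/architecture assets |
Final mental model:
The end state of AI requirements mining is not more user stories. It is a learning requirements and process knowledge system that knows where evidence comes from, who has authority to interpret it, where the conflicts are, what can be verified, and which controls must be executed. It also knows how to turn real feedback after each release into better requirements assets.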