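AI Open Banking / Open Finance: Consented Data Sharing Architecture
This article is learning, architecture-training and portfolio material. It does not constitute legal advice, a regulatory opinion, a CFPB 1033 applicability determination, a compliance-deadline determination, an entity-obligation determination, a determination of whether third-party authorization is sufficient, a privacy impact assessment conclusion, a model validation report, an information security certification, consumer notice advice, a vendor recommendation or approval for business launch.
AI Open Banking / Open Finance / Consented Data Sharing Architecture: An Explainer
Audience: Advanced AI PM / Senior BA / Product Architect / Enterprise Architect / Financial Retail Architect / Data Product Owner / API Platform Lead / Fraud Risk Architect / Privacy / Compliance / Model Risk / Third-Party Risk / Customer Trust Lead.

Core question: How can an AI system use customer-authorized financial data in the open banking / open finance ecosystem so that it delivers real customer value while still controlling consent, data minimization, revocation, API contract, third-party risk, RAG grounding, model feature boundary, fraud/scam, audit evidence and customer trust?

Learning objectives: Build a working model of consented data sharing architecture, the data provider / data recipient / aggregator role model, developer onboarding, API ecosystem governance, AI/RAG over account data, feature boundary, data quality lineage, revocation handling, fraud controls, third-party oversight, evidence replay and a senior PM/architect decision framework.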
0. Disclaimer
A formal project must be assessed jointly by Legal, Compliance, Privacy, Information Security, Fraud Risk, Financial Crime, Model Risk, Data Governance, API Platform, Third-Party Risk Management, Vendor Management, Customer Experience, Operations, Accessibility, Product Owner, Architecture, Internal Audit and, where needed, external advisers. Whether CFPB 1033 or other open banking / open finance requirements apply depends on product, entity role, data type, customer relationship, jurisdiction, rule status, litigation / regulatory developments, contract structure, partner role, data flow, use case and Legal / Compliance interpretation.
This article discusses only architecture and governance methods. It does not infer whether any institution is a data provider, authorized third party, data aggregator, service provider or covered entity; it does not infer whether any data field is covered data; and it does not infer any compliance date, grace period, exemption, litigation impact or regulatory interpretation. All rule-status and implementation questions must defer to Legal / Compliance's latest reading of official sources.
Source Anchors
| Source | Link | Use |
|---|---|---|
| CFPB Personal Financial Data Rights final rule | https://www.consumerfinance.gov/rules-policy/final-rules/required-rulemaking-on-personal-financial-data-rights/ | Official anchor for U.S. personal financial data rights, secure and reliable consumer / authorized third-party data access, privacy protections and the direction of open banking |
| CFPB Regulation 1033 / Personal Financial Data Rights | https://www.consumerfinance.gov/rules-policy/regulations/1033/ | Organizes the architecture questions around the § 1033 structure: covered data, developer interface, authorized third party, authorization disclosure, third-party obligations, revocation and recordkeeping |
| CFPB Personal financial data rights compliance resource | https://www.consumerfinance.gov/compliance/compliance-resources/other-applicable-requirements/personal-financial-data-rights/ | Entry point for rule status, implementation resources, official interpretations and regulatory-development watch; this article does not infer applicability or deadlines |
| CFPB guidance / circulars landing page | https://www.consumerfinance.gov/compliance/circulars/ | Official entry point for CFPB guidance / circulars / supervisory materials; supports horizon scanning and the policy update workflow |
| FFIEC Authentication and Access to Financial Institution Services and Systems | https://www.ffiec.gov/press/pr081121.htm | Organizes access controls around authentication, access risk assessment, layered security, MFA / equivalent controls and third-party / customer-permissioned entity access risk |
| NIST Privacy Framework | https://www.nist.gov/privacy-framework | Organizes privacy-by-architecture around privacy risk management, data processing purpose, governance, control design, customer trust and minimization |
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Organizes AI risk identification, eval, monitoring, incident response and evidence around the Govern / Map / Measure / Manage approach |
| ISO/IEC 42001 overview | https://www.iso.org/standard/42001 | Builds the operating model from AI management system, policy, roles, operation, performance evaluation, internal audit and continual improvement |
In one sentence:
Consented financial data sharing is not "AI can read bank data". It is a purpose-bound, revocable, API-mediated trust architecture that lets AI consume only authorized data, reason within governed boundaries, and act only with explicit customer and policy authorization.
1. Thesis
The core architectural change in AI open banking / open finance is not swapping screen scraping for APIs, nor handing bank statements to an LLM to summarize. The real change is:
from: broad data harvesting + credential sharing + opaque enrichment
to: customer-authorized access + standardized API contract
+ purpose-bound use + revocation + evidence replay
+ AI data-use boundary + fraud and third-party controls
A mature architecture must answer twelve questions at once:
- Which data recipient has the customer authorized, for which requested product or service, to access which data categories, for how long and how often?
- What role does each of the data provider, data recipient, data aggregator, developer app, AI platform and downstream processor play?
- How does the API contract express scope, field semantics, quality, latency, pagination, error, consent state, revocation and incident behavior?
- Does AI use only the data authorized for the current purpose, and does it block cross-sell, targeted advertising, unrelated profiling and unrestricted training?
- Can the customer revoke as easily as they authorized, and after revocation how do collection, use, retention, cache, vector index, features and agent sessions stop or degrade?
- Is data minimization a shared constraint on product strategy, API scope, feature engineering, prompt grounding and retention policy, or is it only privacy copy?
- Does RAG over account data achieve tenant / customer isolation, source citation, freshness, entitlement check, prompt-injection defense and evidence capture?
- Do model features have allowed uses / prohibited uses, and do they turn transaction patterns into unauthorized sensitive inferences?
- Do fraud / scam controls cover account takeover, authorized push payment scam, malicious third-party app, data exfiltration, synthetic behavior and agent overreach?
- Does developer onboarding include security, privacy, data-use, operational, financial crime, model-risk and exit controls?
- Is the customer value clear enough that customers understand why they should share data and what they get in return?
- Can the authorization disclosure, consent receipt, API calls, data transformations, AI run, human decision, customer communication and revocation handling be replayed after the fact?
Key principles:
Consented does not mean unlimited.
Available through API does not mean usable for every model.
Data portability does not mean downstream data resale.
Account history does not mean behavioral surveillance license.
AI personalization does not override purpose limitation.
Revocation must affect data, features, embeddings, agents and evidence.
2. Why It Matters
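Retail financial AI is pushing open banking from a data-access program toward an AI decision and action ecosystem:
| Pressure | How it shows up | Architectural implication |
|---|---|---|
| Consumer data portability | Customers want to take account, transaction, balance, bill and account verification information to new services | Requires API-based, standardized, machine-readable, secure access that does not depend on customers sharing credentials |
| Personal financial management | AI budgeting assistants, cash-flow forecasting, subscription optimization, debt guidance and wealth guidance need account data | Requires purpose-bound access, explainability, data freshness and a recommendation suitability boundary |
| Open finance expansion | Scope extends beyond banking data to payroll, tax, brokerage, insurance, pension, merchant, wallet, loyalty | Requires a cross-domain consent registry, semantic mapping, data quality and jurisdiction policy |
| Embedded finance | Third-party apps call financial data inside lending, payments, savings, BNPL, insurance and merchant tools | Requires developer onboarding, TPRM, API controls, consumer disclosure and monitoring |
| Agentic AI | AI agents query accounts, prepare applications, compare products and raise service requests on the customer's behalf | Requires tiered read / reason / recommend / act authorization, action-bound approval and revoke propagation |
| Fraud industrialization | Malicious apps, scam scripts, ATO, social engineering, data exfiltration, credential stuffing | Requires FFIEC-style layered access controls, behavioral monitoring and third-party anomaly controls |
| Customer trust | Customers worry that "bank data is taken to train models, sell ads and cross-sell, and cannot be withdrawn" | Requires a transparent value proposition, minimization, revocation receipt, complaint route and audit evidence |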
The question for a senior PM / Architect is not:
How do we plug Plaid-like data into the AI feature?
but rather:
For this customer-requested outcome,
what data is necessary,
which party is authorized to collect it,
how is access technically constrained,
how will AI use be bounded and evidenced,
what happens on revocation,
and how do we prove the ecosystem behaved as promised?
3. Consented Data Sharing vs Unrestricted Data Harvesting
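The boundary between consented data sharing and unrestricted data harvesting must be written into the product, API, data and model architecture, not only into the privacy notice.
| Dimension | Consented data sharing | Unrestricted data harvesting |
|---|---|---|
| Customer intent | Customer authorizes specific data access for a requested product / service | Customer is forced to accept broad terms, and data is used for unrelated purposes |
| Access method | API / developer interface / scoped token / consent state / audit log | screen scraping, credential sharing, opaque SDK, brokered datasets |
| Scope | data categories, duration, frequency, provider, recipient and purpose are bounded | Take as much as possible, retain long-term, reuse across products |
| Data minimization | Request only the fields needed to deliver the current value proposition | Collect first, find a use later |
| AI use | grounded, purpose-bound, with an explicit feature boundary | Used for training, profiling, advertising, cross-sell or inferring sensitive attributes |
| Revocation | Revocation affects collection, API token, cache, feature, embedding and agent workflow | Only new collection stops; historical data keeps drifting into use |
| Evidence | consent receipt, API calls, data lineage, AI runs and decision logs can be replayed | No way to prove afterwards what the customer agreed to |
| Customer control | Customer can view, revoke, complain, correct and transfer | Customer does not know who holds the data or how to stop it |
Litmus test: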
If the customer cannot explain who gets what data for what purpose,
and the institution cannot technically enforce that boundary,
the architecture is data harvesting with a consent screen.
4. Ecosystem Role Model
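AI open banking requires role clarity first and data flow second. A single institution may act as data provider, data recipient, aggregator client, model operator or service provider across different journeys.
| Role | Responsibility | AI architecture question |
|---|---|---|
| Customer / consumer | Authorizes, revokes, views sharing history, requests services, corrects errors | Does the customer understand how AI will use the data, and can they revoke and obtain an explanation? |
| Data provider | Financial service party holding account, transaction, balance, terms, bill or account verification information | Is the developer interface secure, reliable, standardized and monitorable, without requiring third parties to use customer credentials? |
| Authorized data recipient | Party that accesses data for the product/service the customer requested | Does it collect, use and retain only reasonably necessary data, and can it prove purpose limitation? |
| Data aggregator | Intermediary or service provider connecting provider and recipient | Does the aggregator have contract pass-through, security, revocation propagation, accuracy and incident controls? |
| Developer app | Customer-facing application, possibly embedding an AI assistant | Has the app passed onboarding, security review, data-use attestation and monitoring? |
| AI system / model operator | Uses data for summarization, prediction, recommendation, automation or agent action | Can the model access only the authorized context, and are data class and purpose treated as runtime controls? |
| API platform | Manages client registration, token, scope, rate limit, schema, observability | Is scope granular enough, and is error / revocation / outage behavior part of the contract? |
| Data governance | Manages lineage, quality, retention, catalog, feature store, embedding store | Is open banking data tagged as a consent-bound asset? |
| Fraud / security | Prevents unauthorized access, scam, ATO, malicious recipient, exfiltration | Are third-party access patterns and customer-permissioned entity risk monitored? |
| Legal / Compliance / Privacy | Interprets obligations; reviews notice, approval, retention, customer rights | Are rule status, jurisdiction, entity role and data type kept continuously up to date? |
Architecturally, consider modeling each use case as a data sharing passport: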
use_case_id
customer_requested_product_or_service
data_provider
data_recipient
aggregator_if_any
data_categories
purpose
duration
frequency
consent_version
revocation_route
ai_allowed_uses
ai_prohibited_uses
retention_rule
evidence_bundle_id
5. Data Scope and Open Finance Taxonomy
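Open banking typically starts with account- and payment-related data; open finance extends to a broader graph of financial life. The AI architecture must not read this wider scope as "any useful data is fair game".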
| Data family | Examples | AI value | Boundary |
|---|---|---|---|
| Account identity / verification | name, address, email, phone, account identifier, account ownership signal | account opening, payout setup, fraud review | not equivalent to full KYC; should not expand into unrelated identity profiling |
| Transaction history | amount, date, merchant, category, pending status, fees | cash-flow intelligence, affordability, budget, scam detection | high sensitive-inference risk; requires purpose and feature controls |
| Balance and liquidity | current / available balance, overdraft, credit limit | cash-flow warning, transfer timing, credit line review | freshness and timing critical, stale data can harm customers |
| Terms and conditions | fees, APR/APY, credit limit, rewards, overdraft opt-in | product comparison, fee optimization, advice | AI must cite source and avoid unsupported legal interpretation |
| Upcoming bills / scheduled payments | due amount, due date, biller, recurring obligations | proactive reminders, liquidity planning, hardship support | notification and action authorization must be separate |
| Payment initiation information | routing/account/tokenized information where applicable | account funding, verification, payment setup | high-risk; action permission and fraud controls separate from data access |
| Payroll / income | employer, pay frequency, net/gross pay, tax data | income verification, affordability, benefits | open finance extension; stronger consent, fairness, accuracy controls |
| Brokerage / pension / insurance | holdings, contributions, claims, premiums | holistic advice, risk planning | regulated advice and suitability boundaries likely apply |
| Merchant / small-business finance | POS sales, invoices, receivables, inventory, bank feeds | SME underwriting, cash-flow forecast, working capital | business consent, multi-user authority, data quality and role authority |
Design rule:
Each data family must carry:
source, consent, purpose, freshness, quality, allowed uses,
AI use boundary, downstream sharing rule, retention, revocation action.
6. Consent and Authorization Architecture
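Consent is the customer's expression of permission to access and use data. Authorization is what systems and policy allow a given actor to do. Authentication is the control that confirms who controls the current session / user / client. The three must not be conflated.
| Layer | Question | Artifact |
|---|---|---|
| Customer disclosure | Did the customer see the third party, data provider, product/service, data categories, duration and revocation method? | authorization disclosure version, UI capture |
| Express consent | Did the customer explicitly authorize, and can the signature/confirmation record be replayed? | consent receipt, timestamp, channel, language |
| API authorization | Did the client / aggregator receive a scoped token that can access only the authorized categories? | OAuth grant, scope, token hash, mTLS / client credential |
| Runtime entitlement | Does every API / RAG / feature call check consent state and purpose? | entitlement check log |
| Revocation | After the customer revokes, does new collection stop, are provider/aggregator/downstream notified, and are cache and retention handled? | revocation receipt, propagation log |
| Reauthorization | Is there a reauthorization mechanism for long-lived or continuous access? | reauthorization event |
| Action authorization | Is separate approval obtained when an AI agent or app executes a payment, submits an application or changes an account? | action approval, transaction confirmation |
Suggested fields for the consent object: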
consent_id
customer_ref
data_provider_id
data_recipient_id
aggregator_id
requested_product_or_service
data_categories
field_scope
purpose
collection_frequency
duration_limit
granted_at
expires_at
revoked_at
revocation_method
language
disclosure_version
ai_allowed_uses
ai_prohibited_uses
training_allowed_flag
downstream_sharing_allowed_flag
evidence_bundle_id
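
The consent object above could be sketched as a small Python data structure. This is a minimal illustration under stated assumptions, not a reference implementation: it carries only a subset of the fields listed above, and the `ConsentStatus` enum, the `status()` method and all example values are hypothetical names introduced here.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import FrozenSet, Optional


class ConsentStatus(Enum):
    ACTIVE = "active"
    EXPIRED = "expired"
    REVOKED = "revoked"


@dataclass(frozen=True)
class Consent:
    # Subset of the suggested fields; a real registry would carry all of them.
    consent_id: str
    customer_ref: str
    data_provider_id: str
    data_recipient_id: str
    requested_product_or_service: str
    data_categories: FrozenSet[str]
    purpose: str
    granted_at: datetime
    expires_at: datetime
    revoked_at: Optional[datetime] = None
    ai_allowed_uses: FrozenSet[str] = frozenset()
    ai_prohibited_uses: FrozenSet[str] = frozenset()
    training_allowed_flag: bool = False            # assumption: deny by default
    downstream_sharing_allowed_flag: bool = False  # assumption: deny by default

    def status(self, now: datetime) -> ConsentStatus:
        """Derive the consent state at `now`; revocation takes precedence over expiry."""
        if self.revoked_at is not None and now >= self.revoked_at:
            return ConsentStatus.REVOKED
        if now >= self.expires_at:
            return ConsentStatus.EXPIRED
        return ConsentStatus.ACTIVE


# Hypothetical example values.
granted = datetime(2025, 1, 1, tzinfo=timezone.utc)
consent = Consent(
    consent_id="c-001",
    customer_ref="cust-42",
    data_provider_id="bank-a",
    data_recipient_id="cash-flow-assistant",
    requested_product_or_service="cash-flow assistant",
    data_categories=frozenset({"transactions", "balances"}),
    purpose="budgeting",
    granted_at=granted,
    expires_at=granted + timedelta(days=90),
    ai_prohibited_uses=frozenset({"cross_sell", "targeted_ads", "model_training"}),
)
revoked = replace(consent, revoked_at=granted + timedelta(days=5))
```

Making the record immutable means a revocation produces a new version rather than mutating history, which fits the evidence-replay goal; whether that trade-off suits a given registry is a design decision.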
Weak design:
Customer clicked "Connect bank"; all transaction data enters generic data lake.
Strong design:
Customer authorized cash-flow assistant to access transaction history and balance
from Bank A for 90 days.
API tokens are scoped.
RAG and feature store check consent before use.
Cross-sell, targeted ads and model training are prohibited unless separately authorized and approved.
Revocation stops collection, disables derived features where required, and closes agent sessions.
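
The revocation behavior in the strong design could be sketched as a fan-out to every store that holds consent-bound data. This is an illustrative sketch only: `RevocationService`, `drop_from` and the target names are hypothetical, the stores are in-memory sets, and a production service would add retries, retention rules and notification of provider and aggregator.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# A handler takes a consent_id and returns a short outcome note.
RevocationHandler = Callable[[str], str]


@dataclass
class RevocationService:
    handlers: Dict[str, RevocationHandler] = field(default_factory=dict)
    propagation_log: List[dict] = field(default_factory=list)

    def register(self, target: str, handler: RevocationHandler) -> None:
        self.handlers[target] = handler

    def revoke(self, consent_id: str) -> bool:
        """Call every registered handler and log the outcome per target.

        A failing target does not stop the others; the return value is
        True only if every target confirmed.
        """
        all_ok = True
        for target, handler in self.handlers.items():
            try:
                outcome, ok = handler(consent_id), True
            except Exception as exc:  # record the failure, keep propagating
                outcome, ok = f"failed: {exc}", False
                all_ok = False
            self.propagation_log.append(
                {"consent_id": consent_id, "target": target, "ok": ok, "outcome": outcome}
            )
        return all_ok


def drop_from(store: Set[str]) -> RevocationHandler:
    """Build a handler that removes the consent_id from an in-memory store."""
    def handler(consent_id: str) -> str:
        store.discard(consent_id)
        return "removed"
    return handler
```

The propagation log is what lets a later revocation test or complaint review show which targets actually confirmed, instead of assuming that deleting the token was enough.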
7. Reference Architecture
Reference architecture:
customer
-> consent and authorization disclosure UX
-> data recipient / developer app
-> aggregator or direct API connection
-> data provider developer interface
-> API gateway / auth / scope / rate limit
-> raw consent-bound data zone
-> normalization / quality / lineage service
-> purpose-bound feature store
-> customer-isolated vector / retrieval index
-> AI orchestration and tool policy
-> recommendation / decision / agent action workflow
-> human review / fraud / operations
-> customer communication
-> evidence ledger / monitoring / revocation service
Key components:
| Component | Responsibility | Senior design question |
|---|---|---|
| Consent registry | Stores authorization disclosure, scope, duration, revocation, AI allowed uses | Is it the runtime enforcement source, or only a record? |
| Developer onboarding portal | App registration, certification, security review, sandbox, contract, data-use attestations | Can onboarding keep low-trust apps out of the production API? |
| API gateway | client authentication, token validation, scope, rate limit, mTLS/FAPI-style controls, logging | Can the API restrict access by data category and purpose? |
| Data provider interface | Standardized, machine-readable, reliable, monitorable data interface | Is there an uptime, error semantics, support, dispute and incident process? |
| Data normalization layer | merchant/category mapping, schema mapping, dedup, timezone, currency, pending/posted status | Does enrichment preserve source lineage and uncertainty? |
| Quality service | completeness, freshness, accuracy feedback, provider variance, drift | Does the AI know when data is incomplete, stale or awaiting confirmation? |
| Feature boundary service | Maps consent-purpose to allowed features and prohibited features | Can model features be traced back to consent and purpose? |
| Retrieval layer | customer-specific source retrieval, embedding lifecycle, source citation, entitlement check | Does RAG avoid cross-customer, cross-purpose and cross-product leakage? |
| AI policy engine | Controls prompt, tools, outputs, action approval, human review | Can policy actually deny the AI, or is a prompt reminder the only safeguard? |
| Fraud and scam controls | third-party anomaly, ATO, APP scam, malicious recipient, velocity, device | Does open banking data become leverage for scammers? |
| Evidence ledger | Links consent, API, data quality, AI run, decision, communication, revocation | Can complaints and audits replay the whole reliance chain? |
Architectural boundary:
consent registry
-> entitlement checks
-> API scopes
-> data lake tags
-> feature store policies
-> vector index lifecycle
-> prompt/tool policies
-> evidence retention
If consent is only displayed in the front end and never enters the control chain above, it is not production-grade consented data sharing.
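
One way to read the control chain is that every downstream component calls the same entitlement check before touching consent-bound data. The sketch below is a minimal, assumption-laden illustration: `ConsentRecord`, `check_entitlement` and the reason codes are hypothetical names, and a real check would also cover recipient, provider, field scope and frequency.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ConsentRecord:
    consent_id: str
    purpose: str
    data_categories: FrozenSet[str]
    expires_at: datetime
    revoked_at: Optional[datetime] = None


def check_entitlement(
    consent: Optional[ConsentRecord],
    purpose: str,
    categories: Iterable[str],
    now: datetime,
) -> Tuple[bool, str]:
    """Return (allowed, reason). Anything not explicitly covered is denied."""
    if consent is None:
        return False, "no_consent"
    if consent.revoked_at is not None and now >= consent.revoked_at:
        return False, "revoked"
    if now >= consent.expires_at:
        return False, "expired"
    if purpose != consent.purpose:
        return False, "purpose_mismatch"
    missing = set(categories) - consent.data_categories
    if missing:
        return False, "scope_missing:" + ",".join(sorted(missing))
    return True, "allowed"


# Hypothetical example: a 90-day budgeting consent over two categories.
start = datetime(2025, 1, 1, tzinfo=timezone.utc)
record = ConsentRecord(
    consent_id="c-001",
    purpose="budgeting",
    data_categories=frozenset({"transactions", "balances"}),
    expires_at=start + timedelta(days=90),
)
```

Returning a reason code alongside the decision means each denial can be written to the entitlement check log as evidence, for example when a cross-sell job requests data under a budgeting consent.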
8. API Contract and Developer Onboarding
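The open banking product experience is determined by the API contract, not by any single AI prompt. The API contract should cover data semantics, reliability, authorization state, errors, revocation and operational commitments.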
8.1 API Contract
| Domain | Contract requirement | AI impact |
|---|---|---|
| Data categories | Clear categories such as transaction, balance, terms, bill, verification | Model does not misuse fields for the wrong purpose |
| Schema semantics | posted vs pending, merchant vs payee, balance type, timezone, currency | cash-flow model and RAG answer do not reason incorrectly |
| Consent state | active, expired, revoked, suspended, reauthorization needed | runtime entitlement and agent stop conditions |
| Field-level scope | read categories and optional fields tied to consent | minimization and feature gating |
| Freshness | last_updated_at, as_of_time, provider latency | AI output must state the as-of time of the data |
| Error model | unavailable, unauthorized, provider_denied, scope_missing, customer_action_needed | AI must not treat an API error as a fact about the customer's finances |
| Pagination and history | historical window, cursor, duplicates, revision handling | transaction analysis reproducible |
| Quality feedback | dispute, correction, categorization feedback, provider issue | model drift and data quality improvement |
| Rate and performance | response SLO, scheduled downtime, throttling | agent workflow degrade gracefully |
| Security | client auth, token binding, encryption, key rotation, logging | third-party access monitorable |
| Revocation | notification, token termination, downstream propagation, retention action | model features and embeddings lifecycle controlled |
| Audit | correlation id, consent id, request id, response hash | evidence replay possible |
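
The consent-state and error-model rows above could translate into an explicit mapping from contract values to agent behavior, so that an API error is never rendered as a financial fact. The sketch below is one possible mapping, not a prescribed one: the `AgentBehavior` values, the `reauthorization_needed` key spelling and every state-to-behavior choice are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class AgentBehavior(Enum):
    ANSWER = "answer_with_as_of_time"
    ASK_CUSTOMER = "ask_customer_to_act"
    DEGRADE = "degrade_with_caveat"
    STOP = "stop_and_close_session"


# Assumed mapping from consent state to behavior.
CONSENT_STATE_BEHAVIOR = {
    "active": AgentBehavior.ANSWER,
    "expired": AgentBehavior.ASK_CUSTOMER,
    "reauthorization_needed": AgentBehavior.ASK_CUSTOMER,
    "suspended": AgentBehavior.STOP,
    "revoked": AgentBehavior.STOP,
}

# Assumed mapping from API error code to behavior.
ERROR_BEHAVIOR = {
    "unavailable": AgentBehavior.DEGRADE,
    "unauthorized": AgentBehavior.STOP,
    "provider_denied": AgentBehavior.STOP,
    "scope_missing": AgentBehavior.ASK_CUSTOMER,
    "customer_action_needed": AgentBehavior.ASK_CUSTOMER,
}


def decide(consent_state: str, error_code: Optional[str] = None) -> AgentBehavior:
    """Pick the agent behavior; unknown states or errors fail closed."""
    behavior = CONSENT_STATE_BEHAVIOR.get(consent_state, AgentBehavior.STOP)
    if behavior is not AgentBehavior.ANSWER:
        return behavior
    if error_code is None:
        return AgentBehavior.ANSWER
    return ERROR_BEHAVIOR.get(error_code, AgentBehavior.STOP)
```

Failing closed on unknown values is a deliberate choice in this sketch: a new error code added by a provider stops the agent until someone decides how it should be handled.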
8.2 Developer / Third-Party Onboarding
| Gate | Evidence | Reject / restrict when |
|---|---|---|
| Business purpose review | product/service description, data categories, customer value | purpose too broad, data request not necessary |
| Security review | secure SDLC, auth, encryption, secrets, vulnerability mgmt, incident response | credential sharing, weak token handling, no monitoring |
| Privacy review | minimization, retention, downstream sharing, AI use, customer rights | targeted ads / resale / unrelated training built into model |
| Operational review | support, uptime, reconciliation, error handling, revocation process | no customer support or revocation workflow |
| Model-risk review | AI use case, features, prompts, evals, human oversight | identity/eligibility/credit decisions unsupported |
| Fraud review | scam controls, anomalous access, device/account risk | high-risk journeys lack step-up or monitoring |
| Contracting | data-use restrictions, audit rights, subprocessors, breach notice, exit | cannot enforce obligations downstream |
| Sandbox certification | test cases, negative scenarios, consent/revocation tests | app fails scope, revocation, data minimization |
Developer onboarding must turn "able to call the API" into "able to be governed". For an AI app, also collect:
model providers
prompt / tool architecture
whether customer data enters model training
feature store and vector store lifecycle
customer-facing claims
human review triggers
advice / recommendation boundaries
eval evidence
complaint and correction flow
9. Data Quality, Enrichment and Lineage
Availability of open banking data does not mean high quality. Transaction data is especially prone to problems with merchant normalization, pending/posted duplication, category inconsistency, refund matching, subscription detection, timezone, currency, joint account and business/personal mix.
| Quality issue | AI failure | Control |
|---|---|---|
| Pending and posted duplicates | AI overstates spending or raises false anomaly alerts | transaction lifecycle state and dedup logic |
| Merchant alias ambiguity | Misidentified merchant, subscription or scam | merchant enrichment confidence + source trace |
| Category inconsistency | Wrong budgeting advice, cash-flow forecast drift | model-aware taxonomy mapping and feedback |
| Missing historical data | affordability or trend model biased | coverage metric and answer caveat |
| Stale balance | AI suggests a transfer/payment that causes an overdraft | freshness gate and action block |
| Joint account context | AI attributes joint-account behavior to a single customer | account ownership and household boundary |
| Business/personal commingling | SME cash-flow model misreads the state of the business | account purpose classification with review |
| Data provider variance | Field semantics differ across banks | provider-specific schema adapters |
| Customer correction ignored | Errors keep influencing recommendations | correction workflow and feature recompute |
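
The first row of the table, pending and posted duplicates, could be handled with a simple matching heuristic like the one below. It is a sketch under assumptions: the `Txn` shape, the five-day window and the exact-amount match are illustrative choices, and real provider data may need looser matching (for example when a posted amount differs from the pending hold).

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import List, Set


@dataclass(frozen=True)
class Txn:
    txn_id: str
    account_id: str
    amount: Decimal
    merchant: str
    booked_on: date
    status: str  # "pending" or "posted"


def drop_superseded_pending(txns: List[Txn], max_days: int = 5) -> List[Txn]:
    """Remove pending rows that a later posted row appears to replace.

    A pending row is treated as superseded when an unmatched posted row has
    the same account, amount and merchant (case-insensitive) and is booked
    0..max_days days after it. Each posted row absorbs at most one pending row.
    """
    posted = [t for t in txns if t.status == "posted"]
    matched_posted: Set[str] = set()
    superseded: Set[str] = set()
    for pending in (t for t in txns if t.status == "pending"):
        for candidate in posted:
            if candidate.txn_id in matched_posted:
                continue
            gap = (candidate.booked_on - pending.booked_on).days
            if (
                candidate.account_id == pending.account_id
                and candidate.amount == pending.amount
                and candidate.merchant.casefold() == pending.merchant.casefold()
                and 0 <= gap <= max_days
            ):
                matched_posted.add(candidate.txn_id)
                superseded.add(pending.txn_id)
                break
    return [t for t in txns if t.txn_id not in superseded]
```

Pending rows with no posted match are kept rather than dropped, so downstream answers can still caveat them as unconfirmed.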
Enrichment should be layered:
raw_provider_data
normalized_data
enriched_data
derived_features
model_inferences
customer_visible_answer
Every layer should retain:
source
transformation_version
confidence
consent_id
purpose
quality_score
freshness
allowed_uses
retention_rule
revocation_action
Guiding principle:
Never let enrichment erase consent, uncertainty or lineage.
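
A small sketch of how that principle might be enforced in code: each derived record inherits consent and source from its parent, confidence can only shrink, and allowed uses can only narrow. `LayeredRecord` and `derive` are hypothetical names, the record holds only some of the fields listed above, and multiplying confidences is an assumed simplification rather than a calibrated method.

```python
from dataclasses import dataclass, replace
from typing import Any, FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class LayeredRecord:
    layer: str
    value: Any
    source: str
    transformation_version: str
    confidence: float
    consent_id: str
    purpose: str
    allowed_uses: FrozenSet[str]


def derive(
    parent: LayeredRecord,
    layer: str,
    value: Any,
    transformation_version: str,
    step_confidence: float,
    allowed_uses: Optional[Iterable[str]] = None,
) -> LayeredRecord:
    """Create the next-layer record without erasing consent, lineage or uncertainty."""
    if not 0.0 <= step_confidence <= 1.0:
        raise ValueError("step_confidence must be between 0 and 1")
    uses = parent.allowed_uses if allowed_uses is None else frozenset(allowed_uses)
    if not uses <= parent.allowed_uses:
        raise ValueError("a derived record cannot widen allowed_uses")
    # source, consent_id and purpose are copied unchanged from the parent.
    return replace(
        parent,
        layer=layer,
        value=value,
        transformation_version=transformation_version,
        confidence=parent.confidence * step_confidence,
        allowed_uses=uses,
    )
```

For example, a merchant-name enrichment applied to a raw memo with step confidence 0.8 yields an enriched record that still points at the same `consent_id` and provider source, so a later revocation can find it.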
10. Model Feature Boundaries
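The most common risk when AI uses open banking data is turning transaction history into unbounded behavioral profiling. The feature boundary must be enforceable in the feature store, model serving, RAG retrieval and the policy engine.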
| Feature class | Example | Allowed use | Prohibited or high-risk use |
|---|---|---|---|
| Customer-requested utility | recurring bill detection, cash-flow forecast, fee alert | budgeting, alerts, customer-selected recommendation | unrelated cross-sell without authorization |
| Risk / fraud | unusual payee, device + transaction anomaly, scam pattern | protect account, step-up, human review | opaque denial without evidence and review |
| Affordability / credit | income volatility, expense obligations, cash buffer | governed underwriting support where approved | proxy discrimination, unsupported adverse action |
| Vulnerability / hardship | missed payments, overdraft stress, income drop | supportive outreach and relief options | exploitative pricing or pressure sales |
| Marketing segmentation | spending category affinity, life event | only where separately permitted and governed | targeted ads or sale if outside purpose |
| Sensitive inference | health, religion, union, political, gambling, addiction, immigration | generally avoid, mask, or route to privacy review | model training or eligibility use without explicit policy |
Suggested feature record:
feature_name
source_data_categories
consent_purpose
customer_requested_service
allowed_products
allowed_decisions
prohibited_decisions
model_ids
human_review_required
fairness_review_required
retention
revocation_action
evidence_refs
Do not only ask "can the model predict this". Ask:
Should this feature exist under this customer authorization?
Can the customer understand the value?
Can the institution explain the use?
Can revocation unwind future use?
Can monitoring detect misuse?
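
The feature record and the questions above could be made executable as a gate in front of the feature store or model serving layer. This is a hedged sketch: `FeaturePolicy`, `may_use_feature`, the reason codes and the example decision names are hypothetical, and only a few of the suggested feature-record fields are modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class FeaturePolicy:
    feature_name: str
    consent_purpose: str
    allowed_decisions: FrozenSet[str]
    prohibited_decisions: FrozenSet[str]
    human_review_required: bool = False


def may_use_feature(
    policy: FeaturePolicy,
    consent_purpose: str,
    decision: str,
    consent_active: bool,
) -> Tuple[bool, str]:
    """Return (allowed, reason). Decisions must be allowlisted; prohibitions win."""
    if not consent_active:
        return False, "consent_not_active"
    if consent_purpose != policy.consent_purpose:
        return False, "purpose_mismatch"
    if decision in policy.prohibited_decisions:
        return False, "prohibited_decision"
    if decision not in policy.allowed_decisions:
        return False, "decision_not_allowlisted"
    if policy.human_review_required:
        return True, "allowed_with_human_review"
    return True, "allowed"


# Hypothetical policy for a recurring-bill feature under a budgeting consent.
recurring_bill = FeaturePolicy(
    feature_name="recurring_bill_detected",
    consent_purpose="budgeting",
    allowed_decisions=frozenset({"bill_reminder", "budget_summary"}),
    prohibited_decisions=frozenset({"cross_sell_offer", "credit_eligibility"}),
)
```

Because the gate takes the live consent state as input, revocation stops future use of the feature without the model code needing to know why.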
11. RAG over Account Data
RAG over account data is both a high-value and a high-risk scenario. Account data is not a general knowledge base; it is consent-bound, customer-specific, time-sensitive financial evidence.
11.1 Safe RAG Pattern
user question
-> authenticate user/session
-> check consent and purpose
-> determine allowed accounts/data categories/time window
-> retrieve source transactions/balances/terms
-> apply data quality/freshness filters
-> ground answer with citations and caveats
-> block prohibited advice/actions
-> capture AI run and sources
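
The pattern above could be sketched end to end with in-memory data. Everything here is illustrative: `SourceRow`, `retrieve` and `grounded_answer` are hypothetical names, authentication is assumed to have happened upstream, and the language-model step is replaced by a deterministic sum so that the retrieval, citation and audit steps stay visible.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Set


@dataclass(frozen=True)
class SourceRow:
    row_id: str
    customer_ref: str
    consent_id: str
    category: str
    booked_on: date
    amount_cents: int


def retrieve(rows: List[SourceRow], customer_ref: str, consent_id: str,
             active_consents: Set[str], allowed_categories: Set[str],
             start: date, end: date) -> List[SourceRow]:
    """Consent-aware retrieval: nothing is returned once consent is inactive."""
    if consent_id not in active_consents:
        return []
    return [
        r for r in rows
        if r.customer_ref == customer_ref
        and r.consent_id == consent_id
        and r.category in allowed_categories
        and start <= r.booked_on <= end
    ]


def grounded_answer(rows: List[SourceRow], audit_log: List[dict], customer_ref: str,
                    consent_id: str, active_consents: Set[str],
                    allowed_categories: Set[str], start: date, end: date) -> Dict:
    """Answer only from retrieved rows, cite them, and record the run."""
    hits = retrieve(rows, customer_ref, consent_id, active_consents,
                    allowed_categories, start, end)
    citations = [r.row_id for r in hits]
    audit_log.append({"customer_ref": customer_ref, "consent_id": consent_id,
                      "source_row_ids": citations})
    if not hits:
        return {"text": "No authorized data is available for this question.",
                "citations": []}
    total = sum(r.amount_cents for r in hits) / 100
    text = (f"Across {len(hits)} authorized transactions between {start} and {end}, "
            f"the total was {total:.2f}. This is a budgeting estimate; "
            "some transactions may be pending or miscoded.")
    return {"text": text, "citations": citations}
```

The audit entry is written even when retrieval returns nothing, so a denied or empty run is still replayable.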
11.2 RAG Controls
| Control | What it prevents |
|---|---|
| Per-customer index or strict row-level retrieval | cross-customer leakage |
| Consent-aware retriever | use after revocation or outside purpose |
| Time-window filters | overcollection and stale answers |
| Source-grounded response | hallucinated financial facts |
| Quality and freshness tags | false precision in cash-flow advice |
| Prompt-injection filtering | merchant memo or transaction text manipulating the AI |
| Output policy | prohibited legal, tax, investment or credit conclusions |
| Embedding lifecycle | revoked data remaining searchable |
| Retrieval audit | inability to prove which data informed answer |
| Human escalation | hardship, scam, complaint or high-impact decision |
Weak answer:
"You can afford this loan because your bank data looks healthy."
Strong answer:
"Based on transactions you authorized for this budgeting feature,
your average monthly inflow over the selected three-month period was X
and recurring obligations we detected were Y.
This is a budgeting estimate, not a credit approval or financial advice.
Some transactions may be pending or miscoded."
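RAG must not:
- Keep answering from historical embeddings after consent has been revoked.
- Treat a transaction memo as absolute fact.
- Infer sensitive attributes from transactions and use them for eligibility.
- Let an agent initiate a payment or application without action approval.
- Present data from a single account as the customer's complete financial picture.
- Be unable, when a complaint arises, to list the source rows that support an answer.
12. Agentic AI and Action Boundaries
An AI agent may read, explain and recommend, but "acting" must be governed separately. Open banking data access does not equal payment authorization, product application authorization, account change authorization or advice acceptance.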
| Agent capability | Example | Required control |
|---|---|---|
| Read | Query balances, transactions, bills | consent + scope + customer/session auth |
| Reason | Summarize cash flow, find subscriptions, identify fees | grounded RAG + quality caveats + prohibited-use guardrail |
| Recommend | Suggest budget adjustments, bill reminders, contacting the bank | recommendation boundary + customer explanation |
| Prepare | Pre-fill an application, generate a dispute / hardship package | source citation + customer review |
| Submit | Submit an application, send proof, open a case | action-bound approval + evidence |
| Initiate value movement | payment, transfer, withdrawal | separate payment authorization, step-up, scam controls |
| Change account | Update address, cancel subscription, close product | business authorization + customer confirmation |
Agent authorization record:
agent_id
workflow_run_id
customer_ref
data_access_consent_id
allowed_tools
allowed_data_categories
purpose
time_bound
action_bound_approval_required
approval_id
revocation_reference
human_review_triggers
evidence_bundle_id
Key principles:
Data access consent lets the agent know.
Action authorization lets the agent do.
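
The split between knowing and doing could be enforced by a gate that every agent step passes through. This is a sketch under assumptions: the `Capability` enum mirrors the capability table above, while `authorize_step`, the reason strings and the choice to treat "prepare" as covered by data access consent (because the customer reviews the output before anything is sent) are illustrative decisions.

```python
from enum import Enum
from typing import Optional, Tuple


class Capability(Enum):
    READ = "read"
    REASON = "reason"
    RECOMMEND = "recommend"
    PREPARE = "prepare"
    SUBMIT = "submit"
    MOVE_VALUE = "initiate_value_movement"
    CHANGE_ACCOUNT = "change_account"


# Capabilities that change something outside the conversation.
ACTION_CAPABILITIES = {
    Capability.SUBMIT,
    Capability.MOVE_VALUE,
    Capability.CHANGE_ACCOUNT,
}


def authorize_step(
    capability: Capability,
    consent_active: bool,
    approval_id: Optional[str] = None,
    step_up_passed: bool = False,
) -> Tuple[bool, str]:
    """Data access consent covers knowing; acting needs its own approval."""
    if not consent_active:
        return False, "consent_not_active"
    if capability not in ACTION_CAPABILITIES:
        return True, "covered_by_data_access_consent"
    if approval_id is None:
        return False, "action_approval_required"
    if capability is Capability.MOVE_VALUE and not step_up_passed:
        return False, "step_up_required"
    return True, f"approved:{approval_id}"
```

Returning the `approval_id` in the reason keeps the link between the action and the specific customer approval that the evidence bundle would need.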
13. Fraud, Scam and Access Controls
Open banking / open finance changes the fraud attack surface. Data sharing itself can become a fraud vector, especially when a malicious app obtains customer authorization, a scammer induces authorization, or account data is used for social engineering.
| Threat | Pattern | Control |
|---|---|---|
| Malicious data recipient | App collects large amounts of data under the guise of a budgeting service, then leaks or misuses it | developer onboarding, data-use attestation, monitoring, audit rights, suspension |
| Consent phishing | Customer is steered into authorizing a spoofed app or scammer-controlled service | verified developer identity, customer warning, domain/app reputation |
| Account takeover | Attacker logs in to the customer's app and authorizes data sharing | MFA/step-up, device risk, session risk, revocation notification |
| Credential sharing fallback | Third party bypasses the API and asks the customer to hand over bank credentials | contractual prohibition, detection, education, ecosystem enforcement |
| API exfiltration | A valid token is abused in bulk | rate limit, anomaly detection, token binding, IP/device intelligence |
| Authorized push payment scam | AI / app recommends or prepares a payment to a scam payee | payee risk, cooling-off, step-up, scam intervention, human support |
| Data poisoning | merchant text / memo injects a prompt or causes misclassification | prompt-injection defense, data sanitation, source weighting |
| Synthetic affordability | Fraudster manipulates account inflows and outflows to fabricate income | graph/velocity controls, payroll/source diversity, fraud review |
| Revocation bypass | downstream party keeps using cached or derived data | revocation propagation, retention rules, control testing |
| Third-party outage | App cannot fetch data and AI gives wrong advice | degrade mode, freshness warnings, retry and customer message |
FFIEC-style design lessons:
- authentication risk assessment must include customers, employees, third parties, applications, service accounts and devices.
- MFA or equivalent strength controls should be risk-proportionate, especially for high-risk access and actions.
- layered controls matter because any single control can fail.
- customer-permissioned entity access needs specific risk assessment, monitoring, logging and reporting.
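
As one example of the monitoring layer, the API exfiltration row in the threat table could start from a per-token sliding-window velocity check. This is a deliberately simple sketch: `TokenVelocityMonitor` and its thresholds are hypothetical, timestamps are plain seconds supplied by the caller, and a real control would combine this signal with device, IP and behavioral context rather than rely on it alone.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class TokenVelocityMonitor:
    """Flag a token that exceeds max_calls within a sliding time window."""

    def __init__(self, max_calls: int, window_seconds: float) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, token_hash: str, ts: float) -> bool:
        """Record one API call at time `ts` (seconds); return True if flagged."""
        calls = self._calls[token_hash]
        calls.append(ts)
        # Drop calls that fell out of the window ending at ts.
        while calls and calls[0] <= ts - self.window_seconds:
            calls.popleft()
        return len(calls) > self.max_calls
```

The monitor is keyed by a token hash rather than the raw token, matching the `token hash` artifact in the consent and authorization table, so the monitoring store does not itself become a credential store.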
14. Governance Model
Combine the NIST Privacy Framework, NIST AI RMF and ISO/IEC 42001 into an operating model:
| Governance lens | Open banking question | AI control |
|---|---|---|
| Privacy risk | Have processing purpose, data minimization, retention, customer rights and data sharing risk been identified? | consent registry, purpose binding, privacy review, DPIA-like evidence |
| AI risk | Is the AI mapped to customer impact, data inputs, model behavior, evals, monitoring and incident response? | AI RMF-style govern/map/measure/manage workflow |
| Management system | Are there policy, roles, process, performance evaluation, internal audit and continual improvement? | ISO 42001-style AIMS operating rhythm |
| Security and access | Is risk-based authentication applied to customer / third party / app / API / service account? | FFIEC-style risk assessment and layered access controls |
| Regulatory horizon | Are rule status, guidance, litigation, standard-setting and state/international changes kept up to date? | regulatory change workflow and product impact assessment |
14.1 Operating Evidence
| Evidence | Why it matters |
|---|---|
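| Use case risk assessment | Shows why the data and the AI are needed |
| Consent / disclosure artifact | Shows the customer saw and authorized the scope |
| Data minimization matrix | Shows there was no over-collection |
| API contract review | Shows the technical boundary is enforceable |
| Third-party due diligence | Shows the recipient / aggregator is governed |
| Model/data policy | Shows AI use is bounded |
| Eval and red-team result | Shows the model does not exceed its authority, hallucinate or leak |
| Fraud threat model | Shows abuse scenarios are covered |
| Revocation test | Shows revocation affects collection/use/cache/features/embeddings |
| Evidence replay test | Shows complaints/audits can be replayed |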
15. Product / Architecture Decisions
| Decision | Weak answer | Strong architecture answer |
|---|---|---|
| What data should we request? | “All transactions, just in case” | Request minimum categories, accounts and time window for the customer-requested service |
| How long should access last? | “Until user disconnects” | Purpose-specific duration, reauthorization, runtime consent check and expiration behavior |
| Can AI train on the data? | “It improves the product” | Separate explicit policy/consent, de-identification review, model-risk approval and opt-out handling |
| Can we use data for cross-sell? | “Customer connected bank data” | Only if separately permitted and consistent with purpose, policy and customer expectation |
| Direct API or aggregator? | “Fastest SDK” | Evaluate coverage, consent UX, contract, security, revocation propagation, data quality and exit |
| RAG architecture? | “Index all account data” | Customer/purpose-isolated retrieval, source grounding, freshness, revocation-aware embedding lifecycle |
| Fraud controls? | “Bank authenticated the customer” | Risk-based authentication, app reputation, access anomaly, scam intervention and human escalation |
| Revocation? | “Delete token” | Stop collection, notify ecosystem, enforce retention, disable derived features and purge/restrict embeddings |
| Advice boundary? | “AI gives financial advice” | Define information, education, recommendation, regulated advice and decision boundaries |
| Evidence? | “Logs exist” | Case-level bundle linking consent, API calls, data transforms, model run, decision and customer communication |
16. Control Matrix
| Control objective | Control activity | Evidence |
|---|---|---|
| Establish lawful/policy basis through governance | Legal/Compliance/Privacy review of role, jurisdiction, data type and product scope | approval record, interpretation memo |
| Minimize data collection | Data category/time-window/account selection tied to customer-requested service | minimization matrix, API scopes |
| Preserve informed authorization | Clear disclosure of recipient, provider, service, categories, duration and revocation | disclosure version, signed consent receipt |
| Enforce consent at runtime | Entitlement check for API, feature store, RAG and agent tools | consent check log, denied access events |
| Support revocation | Easy revocation, ecosystem notification, collection stop, retention and derived-data handling | revocation receipt, propagation log |
| Govern API access | Client registration, authentication, token binding, scopes, rate limits, anomaly monitoring | developer record, gateway logs |
| Manage third-party risk | Due diligence, contractual restrictions, audit rights, incident obligations, exit plan | TPRM file, contract clauses |
| Assure data quality | Freshness, completeness, dedup, correction, source lineage and quality scores | quality dashboard, lineage records |
| Bound AI features | Allowed/prohibited uses for features, inferences, training and decisions | feature policy, model card, eval result |
| Secure RAG | Customer/purpose isolation, source citations, prompt injection controls, retrieval logs | RAG eval, source trace |
| Control agent actions | Separate action-bound approval for submission/payment/account changes | approval id, action hash |
| Detect fraud/scam | Monitor malicious recipient, ATO, API exfiltration, APP scam and synthetic behavior | fraud dashboard, case evidence |
| Preserve audit replay | Link consent, API, data, AI, human and final customer message | evidence bundle |
| Improve continuously | Incident, complaint, model drift and data-quality CAPA | RCA, CAPA owner, closure evidence |
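The "Control agent actions" row lists an approval id and an action hash as evidence. A minimal sketch of how an approval could be bound to one exact action payload follows. The helper names `action_hash` and `is_action_approved`, the payload fields and the approval record shape are all hypothetical; a real design would also need signing, expiry and replay protection, which are out of scope here.

```python
import hashlib
import json


def action_hash(action: dict) -> str:
    """SHA-256 over a canonical JSON form (sorted keys, compact separators)."""
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_action_approved(action: dict, approval: dict) -> bool:
    """True only if the approval was issued for exactly this action payload."""
    return approval.get("action_hash") == action_hash(action)


# Hypothetical payment the customer approved on a confirmation screen.
payment = {"type": "payment", "amount": "120.00", "currency": "USD", "payee": "utility-co"}
approval = {"approval_id": "apr-001", "action_hash": action_hash(payment)}

# An agent that changes any field after approval no longer matches the hash.
tampered = {**payment, "amount": "1200.00"}
```

Because the hash covers the whole payload, data-access consent alone can never satisfy the check: the agent needs an approval record created for that specific submission, payment or account change.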
17. Metrics and KRIs
| Metric family | Examples |
|---|---|
| Customer value | successful connections, time-to-value, budgeting accuracy, fee savings, subscription cancellation success, cash-flow alert usefulness |
| Consent quality | disclosure comprehension, consent drop-off, over-scope defects, reauthorization completion, revocation success |
| Data minimization | average data categories requested, account/time-window scope, full-history request rate, unused data ratio |
| API ecosystem | developer approval time, API uptime, response time, error rates, provider coverage, schema variance |
| Data quality | freshness, completeness, duplicate rate, categorization confidence, correction rate, provider issue backlog |
| AI/RAG safety | grounded-answer rate, unsupported conclusion rate, source citation completeness, prompt-injection block rate |
| Feature governance | features with consent lineage, prohibited-use violations, sensitive inference incidents, revocation feature disablement |
| Fraud/scam | malicious recipient attempts, abnormal API calls, ATO-linked authorizations, scam escalation, manual review outcomes |
| Third-party risk | overdue reviews, subprocessor changes, audit findings, incident notifications, exit-test completion |
| Customer trust | complaints about data sharing, unclear use, revocation failure, wrong AI advice, perceived surveillance |
| Evidence | replay completeness, missing consent id rate, missing AI trace rate, retention-rule defects |
Balanced executive dashboard:
Value: customers get useful financial outcomes.
Control: every data use is purpose-bound and revocable.
Safety: AI does not exceed consent, evidence or advice boundaries.
Trust: customers can understand, revoke and contest.
Ecosystem: APIs and third parties perform reliably and securely.
Evidence: each answer, recommendation and action can be replayed.
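The Evidence metric family (replay completeness, missing consent id rate, missing AI trace rate) can be computed directly from case-level evidence records. The sketch below assumes a hypothetical `EvidenceRecord` shape and treats a case as replayable only when all four links are present; the actual definition of "complete" is a governance decision, not something this example settles.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EvidenceRecord:
    case_id: str
    consent_id: Optional[str] = None
    api_request_ids: List[str] = field(default_factory=list)
    ai_run_id: Optional[str] = None
    customer_message_id: Optional[str] = None


def evidence_kris(records: List[EvidenceRecord]) -> dict:
    """Return the three evidence KRIs as ratios in [0, 1], or None if no records."""
    total = len(records)
    if total == 0:
        return {
            "replay_completeness": None,
            "missing_consent_id_rate": None,
            "missing_ai_trace_rate": None,
        }
    missing_consent = sum(1 for r in records if not r.consent_id)
    missing_trace = sum(1 for r in records if not r.ai_run_id)
    complete = sum(
        1
        for r in records
        if r.consent_id and r.api_request_ids and r.ai_run_id and r.customer_message_id
    )
    return {
        "replay_completeness": complete / total,
        "missing_consent_id_rate": missing_consent / total,
        "missing_ai_trace_rate": missing_trace / total,
    }
```

Returning `None` for an empty population avoids reporting a misleading 0% or 100% on a dashboard when no cases were sampled.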
18. Failure Modes
| Failure mode | Why dangerous | Better control |
|---|---|---|
| Consent screen as blanket waiver | customer authorization is interpreted as permission for unlimited use | purpose-bound consent object and runtime enforcement |
| Generic data lake ingestion | open banking data loses consent, purpose and revocation metadata | consent-bound data zone and lineage tags |
| RAG indexes all account data | cross-purpose and post-revocation leakage | isolated, entitlement-aware retrieval |
| AI infers sensitive traits from transactions | unfairness, privacy harm, unsupported decisions | sensitive inference policy and prohibited uses |
| Revocation only deletes API token | cache/features/embeddings continue to use data | revocation propagation and lifecycle testing |
| API error treated as financial fact | model says customer has no funds when API failed | error-aware prompts and freshness gates |
| Aggregator selected only for coverage | hidden security, quality, contract and exit risks | TPRM and architecture review |
| Customer credentials still collected | defeats safer open banking model | prohibit credential sharing and monitor fallback |
| Developer app over-requests data | privacy harm and customer distrust | scope review, sandbox certification, monitoring |
| Model training on consented data by default | violates purpose expectation and data-use boundary | explicit AI training policy and approval |
| Agent initiates action under data consent | unauthorized payment/application/account changes | separate action authorization |
| Complaint lacks evidence chain | cannot explain or remediate | consent-to-AI-to-decision evidence bundle |
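The "API error treated as financial fact" failure mode can be illustrated with a small freshness gate that runs before any balance reaches a prompt. This is a sketch under assumptions: `BalanceFetch`, `balance_context` and the 24-hour default are hypothetical, and the right freshness threshold depends on the use case.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class BalanceFetch:
    ok: bool                        # False when the provider API call failed
    balance: Optional[float]        # None when no value was returned
    fetched_at: Optional[datetime]  # timezone-aware fetch time


def balance_context(
    fetch: BalanceFetch,
    now: datetime,
    max_age: timedelta = timedelta(hours=24),
) -> dict:
    """Classify a fetch as fresh, stale or unavailable before the model sees it."""
    if not fetch.ok or fetch.balance is None or fetch.fetched_at is None:
        # A failed call is "unknown", never "zero funds".
        return {"status": "unavailable", "balance": None}
    if now - fetch.fetched_at > max_age:
        return {"status": "stale", "balance": None}
    return {"status": "fresh", "balance": fetch.balance}
```

The key design choice is that a genuine zero balance and a failed call produce different statuses, so the prompt template can tell the model to say "balance is currently unavailable" instead of asserting that the customer has no funds.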
19. Interview-Ready Takeaways
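Q1: What is the essential difference between consented data sharing and data harvesting?
Consented data sharing is purpose-bound, scoped, revocable and evidenced data access in support of a customer-requested service. Data harvesting uses broad consent to collect as much data as possible, then reuses it for unrelated purposes, training, advertising, cross-sell or resale. The architectural difference is whether consent actually reaches the API scope, data lineage, feature store, RAG retrieval, agent tools and revocation enforcement.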
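Q2: What is the most important control point in an AI open banking architecture?
It is not any single API or model, but the consent registry acting as a runtime control plane. It has to drive API scopes, data lake tags, feature policies, RAG retrieval, agent permissions, retention and revocation. Without this control plane, AI can easily turn authorized access into unbounded use.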
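A minimal sketch of a consent registry used as a runtime check follows. It is an in-memory illustration only: the `Consent` fields, the `ConsentRegistry` class and its `check` / `revoke` methods are hypothetical names, and a production registry would need persistence, versioning and audit logging.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Consent:
    consent_id: str
    customer_id: str
    purpose: str
    data_categories: FrozenSet[str]
    expires_at: datetime
    revoked: bool = False


class ConsentRegistry:
    def __init__(self) -> None:
        self._by_customer: Dict[str, List[Consent]] = {}

    def register(self, consent: Consent) -> None:
        self._by_customer.setdefault(consent.customer_id, []).append(consent)

    def revoke(self, consent_id: str) -> None:
        for consents in self._by_customer.values():
            for c in consents:
                if c.consent_id == consent_id:
                    c.revoked = True

    def check(
        self, customer_id: str, purpose: str, category: str, now: datetime
    ) -> Optional[str]:
        """Return the consent id that permits this access, or None to deny."""
        for c in self._by_customer.get(customer_id, []):
            if (
                not c.revoked
                and c.purpose == purpose
                and category in c.data_categories
                and now < c.expires_at
            ):
                return c.consent_id
        return None
```

Returning the matching consent id rather than a bare boolean lets the caller (API gateway, feature store, retriever or agent tool) write that id into its own log, which is what makes later evidence replay possible.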
Q3: How should RAG over account data be designed?
Isolate retrieval by customer, purpose, consent and time window; check entitlement on every retrieval; require outputs to cite source rows and freshness; keep prompt injection, sensitive inference, prohibited advice and agent actions under policy-engine control; after revocation, stop future retrieval and handle the embeddings lifecycle.
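The isolation rule for retrieval can be sketched as "filter by entitlement first, rank second". In the example below, keyword overlap stands in for vector similarity purely to keep the code self-contained; the `Chunk` fields and the `retrieve` function are hypothetical, and `active_consent_ids` is assumed to come from a consent registry lookup at query time.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    customer_id: str
    purpose: str
    consent_id: str
    text: str


def retrieve(
    chunks: List[Chunk],
    query: str,
    customer_id: str,
    purpose: str,
    active_consent_ids: Set[str],
    top_k: int = 3,
) -> List[Chunk]:
    """Rank only chunks the caller is entitled to see for this purpose."""
    # Entitlement filter runs BEFORE scoring, so out-of-scope chunks
    # can never be ranked, returned or cited.
    allowed = [
        c
        for c in chunks
        if c.customer_id == customer_id
        and c.purpose == purpose
        and c.consent_id in active_consent_ids
    ]
    terms = set(query.lower().split())
    scored = [(len(terms & set(c.text.lower().split())), c) for c in allowed]
    scored = [(score, c) for score, c in scored if score > 0]
    # Higher overlap first; chunk_id breaks ties deterministically.
    scored.sort(key=lambda sc: (-sc[0], sc[1].chunk_id))
    return [c for _, c in scored[:top_k]]
```

Passing the active consent ids in on every call means a revoked consent stops matching immediately, even if the underlying embeddings have not yet been purged or restricted.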
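Q4: Can open banking data be used for model training?
Not by default. Training use has to be assessed together with the customer-requested service, the authorization, the privacy policy, contracts, applicable law and regulatory interpretation, de-identification, retention, model-risk approval and customer expectation. In particular, data authorized for budgeting or account verification must not be automatically reused for unrelated model training.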
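Q5: How should a Senior PM evaluate an open banking AI feature?
Look at value, consent, minimization, API reliability, data quality, third-party risk, model boundary, fraud/scam controls, revocation, evidence and customer trust together. What qualifies a feature for launch is not "the model is smart" but "customer authorization is clear, system boundaries are enforceable, revocation works, results are explainable and risks can be monitored".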
20. Practical Templates
20.1 Use Case Data Sharing Card
| Field | Example |
|---|---|
| use_case | AI cash-flow assistant |
| customer_requested_service | monthly cash-flow insight and bill reminder |
| data_provider | customer-selected bank or wallet provider |
| data_recipient | financial planning app |
| aggregator | approved aggregator if used |
| data_categories | transaction history, balance, upcoming bills |
| time_window | last 6 months plus current balance |
| purpose | budgeting and bill reminder |
| excluded_uses | targeted ads, unrelated cross-sell, sale of covered data, default model training |
| ai_allowed_uses | summarize, forecast, explain, alert |
| ai_prohibited_uses | credit eligibility, identity proofing, regulated advice without approval |
| revocation_action | stop collection, disable refresh, restrict features, purge/restrict embeddings per policy |
| evidence | consent receipt, API request ids, source rows, AI run id, final answer |
20.2 API Scope Contract
Scope name:
Data category:
Fields:
Purpose:
Required consent version:
Maximum lookback:
Refresh frequency:
Freshness requirement:
Error semantics:
Revocation behavior:
Downstream sharing:
AI allowed uses:
AI prohibited uses:
Retention:
Evidence fields:
Owner:
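Two fields of the scope contract, "Fields" and "Maximum lookback", lend themselves to mechanical enforcement at the gateway. The sketch below assumes a hypothetical `ScopeContract` record and a `clamp_request` helper; it narrows an over-broad request to what the contract allows and reports what was denied so the over-scope attempt can be logged.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class ScopeContract:
    scope_name: str
    fields: FrozenSet[str]
    max_lookback_days: int


def clamp_request(
    contract: ScopeContract,
    requested_fields: Iterable[str],
    requested_start: date,
    today: date,
) -> dict:
    """Narrow a data request to the contract's fields and lookback window."""
    earliest = today - timedelta(days=contract.max_lookback_days)
    requested = set(requested_fields)
    return {
        # Never reach further back than the contract allows.
        "start": max(requested_start, earliest),
        "fields": sorted(requested & contract.fields),
        "denied_fields": sorted(requested - contract.fields),
    }


# Hypothetical contract for a cash-flow scope with a 180-day lookback.
cash_flow_scope = ScopeContract(
    scope_name="transactions.cashflow",
    fields=frozenset({"amount", "date", "merchant_category"}),
    max_lookback_days=180,
)
```

Whether an over-broad request should be clamped silently, clamped with a warning or rejected outright is a policy choice; the `denied_fields` list is returned so any of those behaviors can be built on top.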
20.3 AI Feature Boundary Record
Feature:
Source data categories:
Customer-requested service:
Consent ids:
Transformation lineage:
Allowed model ids:
Allowed decisions:
Prohibited decisions:
Sensitive inference risk:
Fairness review:
Human review triggers:
Revocation action:
Monitoring metric:
Evidence reference:
20.4 RAG Answer Evidence Record
Question:
Customer/session:
Consent check:
Purpose:
Retriever policy:
Source records:
Data freshness:
Quality caveats:
Prompt version:
Model version:
Output policy checks:
Final answer:
Customer message id:
Retention rule:
20.5 Revocation Runbook
Revocation event:
Customer:
Data recipient:
Data provider:
Aggregator:
Consent ids:
API token action:
Data collection stop time:
Cache action:
Feature action:
Embedding action:
Agent/session action:
Downstream notification:
Customer receipt:
Exceptions:
Evidence bundle:
Owner:
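The runbook fields map naturally onto an ordered list of propagation steps whose outcomes become the evidence bundle. The sketch below is an assumption-laden illustration: `run_revocation` and the step names are hypothetical, and each step is just a callable that would wrap a real token service, cache, feature store or vector index.

```python
from datetime import datetime, timezone
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[str], None]]


def run_revocation(consent_id: str, steps: List[Step], now: datetime) -> dict:
    """Run every step, record each outcome, and never stop at the first failure."""
    results = {}
    for name, action in steps:
        try:
            action(consent_id)
            results[name] = "done"
        except Exception as exc:  # keep propagating; failures become exceptions to track
            results[name] = f"failed: {exc}"
    return {
        "consent_id": consent_id,
        "revoked_at": now.isoformat(),
        "steps": results,
        "complete": all(v == "done" for v in results.values()),
    }


# In-memory stand-ins for real stores (hypothetical).
tokens = {"c-1": "tok-abc"}
embeddings = {("c-1", "vec-1"), ("c-1", "vec-2"), ("c-2", "vec-3")}


def delete_token(consent_id: str) -> None:
    tokens.pop(consent_id, None)


def purge_embeddings(consent_id: str) -> None:
    for item in [e for e in embeddings if e[0] == consent_id]:
        embeddings.discard(item)


def notify_downstream(consent_id: str) -> None:
    raise RuntimeError("aggregator endpoint unreachable")
```

Continuing past a failed step matters here: a notification outage should not leave tokens or embeddings live, and the `complete` flag plus per-step results give the owner a concrete list of exceptions to close.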
21. Final Operating Principle
A mature AI open banking / open finance architecture can be tested with one question:
Can the institution prove that every AI answer, feature, recommendation and action
used only the financial data the customer authorized for that purpose,
through governed APIs and third parties,
with quality and freshness understood,
with revocation propagated,
with fraud and scam controls active,
with inferences separated from facts,
and with evidence sufficient for audit, complaint and customer trust?
If the answer is unclear, the enterprise is not missing an open banking connector. It is missing a consent control plane, API ecosystem governance, AI feature boundaries, RAG evidence, third-party risk controls, fraud/scam monitoring, a revocation lifecycle and a customer trust operating model.