AI Synthetic Data Governance / Privacy-Utility-Fidelity Playbook

Version: v1.0

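Date: 2026-06-30
Audience: AI product managers, CBAP / Senior BAs, data product architects, AI architects, privacy risk partners, model risk management, retail financial services business owners, data governance, information security, internal audit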

Purpose and when to use

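This playbook upgrades synthetic data from an "ad hoc file generated for a project" into a data product that can be governed, released and audited. It fits the following scenarios: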

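Scenario	When to use this playbook
AML typology training	You need to construct mule network, layering, structuring and trade-based patterns, but must not copy real cases
KYC document testing	You need to test OCR, document ingestion, exception routing and a vendor sandbox, but real identity document images are prohibited
Credit model scenario augmentation	You need to extend boundary samples, stress-test policy thresholds and verify adverse action reason stability
Payment fraud simulation	You need to simulate APP scams, account takeover, merchant fraud, device anomalies and rule triggers
Contact center transcript generation	You need to train a copilot, intent classifier or QA rubric, but real call transcripts should not be exposed
Complaint analytics	You need to extend the complaint taxonomy, root-cause classifier and regulatory response drills, but must not fabricate real complaint metrics
Vendor PoC / sandbox	You need to give a vendor data that approximates the business, while restricting use, retention, retraining and redistribution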

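Usage boundaries:

This playbook is not legal advice, a privacy impact assessment conclusion, a model validation report or a regulatory interpretation.
Synthetic data is not automatically anonymized, de-identified or safe for sharing.
If the source data does not permit a given use, "synthesizing" it cannot be used to bypass purpose limitation, contractual boundaries or customer expectations.
High-risk synthetic data may be released only after privacy, utility, fidelity, bias, license and release evidence all meet their thresholds.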

Operating model

1. Lifecycle

Intake
-> source permission review
-> generation design
-> controlled build
-> privacy attack testing
-> utility/fidelity scoring
-> bias/coverage review
-> data card and allowed-use license
-> release gate
-> catalog, access, monitoring
-> renewal, restriction or retirement
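
The lifecycle above is strictly ordered: a dataset should not reach a later stage until the earlier stage has produced its evidence. The following is a minimal Python sketch of that ordering. The `Stage` enum and the `advance` helper are illustrative names that do not come from this playbook, and a single `evidence_complete` flag stands in for whatever evidence each stage actually requires.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Lifecycle stages in the order listed in this playbook."""
    INTAKE = 0
    SOURCE_PERMISSION_REVIEW = 1
    GENERATION_DESIGN = 2
    CONTROLLED_BUILD = 3
    PRIVACY_ATTACK_TESTING = 4
    UTILITY_FIDELITY_SCORING = 5
    BIAS_COVERAGE_REVIEW = 6
    DATA_CARD_AND_LICENSE = 7
    RELEASE_GATE = 8
    CATALOG_ACCESS_MONITORING = 9
    RENEWAL_RESTRICTION_OR_RETIREMENT = 10


def advance(current: Stage, evidence_complete: bool) -> Stage:
    """Return the next stage, refusing to move if evidence is incomplete.

    Hypothetical helper: the final stage is treated as terminal here.
    """
    if not evidence_complete:
        raise ValueError(f"Cannot leave {current.name}: evidence incomplete")
    if current is Stage.RENEWAL_RESTRICTION_OR_RETIREMENT:
        return current
    return Stage(current + 1)
```

Because stages cannot be skipped, a dataset that fails privacy attack testing never reaches the release gate without first being rebuilt and retested.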

2. RACI

Activity	PM	BA	Data Owner	Architect	Data Science	Privacy	Security	Model Risk	Legal/Compliance	Business Owner
Define use case and allowed use	A/R	R	C	C	C	C	C	C	C	A
Classify source data and permissions	C	C	A/R	C	C	A/R	C	C	A/R	C
Design generation approach	C	C	C	A/R	A/R	C	C	C	C	I
Control generation environment	I	I	C	A/R	R	C	A/R	I	C	I
Run privacy attack tests	I	I	C	C	R	A/R	C	C	C	I
Score utility/fidelity	A/R	R	C	C	R	C	I	C	C	C
Review bias/coverage	C	C	C	C	R	C	C	A/R	C	A
Approve release gate	A	C	A	A/R	C	A/R	A/R	A/R	A/R	A
Monitor usage and expiry	R	C	A/R	C	C	C	C	C	C	A

RACI discipline:

Business Owner owns why the synthetic data is needed and accepts business residual risk.
Data Owner owns source permission and release scope.
Privacy and Security own leakage, re-identification, environment and access controls.
Model Risk owns model-impacting uses, especially credit, fraud, AML and decision-support augmentation.
PM / BA / Architect ensure the license is embedded in PRD, architecture review, release checklist and vendor handoff.

3. Gate levels

Gate	Use type	Required reviewers	Release posture
G0 Draft mock	Design-only mock data, no sensitive source	PM / Tech Lead	Local use, no catalog release
G1 Internal low risk	Demo data, generic process testing	PM, Data Owner	Internal release with data card and expiry
G2 Controlled operational test	KYC, contact center, complaint, payment test data	PM, Data Owner, Architect, Privacy, Security	Cataloged release with license and attack evidence
G3 Model-impacting	Training, fine-tune, validation, scenario augmentation	Data Owner, Model Risk, Privacy, Risk, Architect	Limited release with residual risk owner and monitoring
G4 External / vendor	Vendor PoC, offshore test, partner review	Legal, Procurement, Privacy, Security, Business Owner	Contract-bound release with deletion evidence
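
The gate table can be read as a "highest applicable tier wins" rule. The sketch below encodes it that way in Python. The reviewer lists are copied from the table; the boolean flags and the precedence order (external over model-impacting over operational test) are assumptions made for illustration, not rules stated in this playbook.

```python
# Required reviewers per gate, copied from the gate table above.
REQUIRED_REVIEWERS = {
    "G0": ["PM / Tech Lead"],
    "G1": ["PM", "Data Owner"],
    "G2": ["PM", "Data Owner", "Architect", "Privacy", "Security"],
    "G3": ["Data Owner", "Model Risk", "Privacy", "Risk", "Architect"],
    "G4": ["Legal", "Procurement", "Privacy", "Security", "Business Owner"],
}


def recommend_gate(*, external_release: bool, model_impacting: bool,
                   operational_test: bool, uses_sensitive_source: bool,
                   internal_release: bool) -> str:
    """Recommend a gate level; the most demanding applicable tier wins.

    All flag names are hypothetical intake attributes.
    """
    if external_release:
        return "G4"
    if model_impacting:
        return "G3"
    if operational_test:
        return "G2"
    # G0 is reserved for design-only mocks with no sensitive source
    # and no catalog release; anything else internal is treated as G1.
    if uses_sensitive_source or internal_release:
        return "G1"
    return "G0"
```

For example, a vendor PoC that also feeds model training would be recommended as G4, and the intake would still need to record the rationale.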

Template: synthetic data intake

Field	Required content	Example
Request ID	Stable ID tied to PRD / ADR / ticket	`SYN-KYC-DOC-2026-001`
Business objective	The concrete decision, workflow or test supported	Test KYC OCR and exception routing for document quality edge cases
Data product type	Tabular, text, transcript, image, document, graph, sequence, mixed	Synthetic document images + field labels
Target users	Named teams, roles, vendors and locations	KYC QA, onboarding platform team, approved OCR vendor
Downstream system	RAG, Agent, Copilot, model training, analytics, QA, sandbox	OCR regression pipeline and vendor sandbox
Approved use	Precise allowed actions	Run extraction tests and compare field-level accuracy
Prohibited use	Explicitly blocked actions	No production decisioning, no model pretraining, no customer profiling
Source data summary	Systems and field categories, not raw examples	KYC policy rules, document templates, historical exception taxonomy
Sensitive data class	PII, financial crime, credit, complaint, call recording, employee data	Identity document-like data, no real document images
Risk tier	G0-G4 recommendation with rationale	G4 because vendor sandbox release is requested
Utility target	What “useful enough” means	OCR extracts required fields with 95% expected-label agreement in test
Fidelity target	What real-world structure must be preserved	Layouts, expiry formats, country-specific field constraints, image noise
Privacy target	What leakage must be prevented	No real names, IDs, faces, document numbers, metadata or source image similarity
Expiry	Date or event requiring renewal/deletion	90 days after vendor test completion
Business owner	Accountable owner	Head of Digital Onboarding

Intake decision rule:

No approved use, no source permission, no release.
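
The intake decision rule is simple enough to automate at ticket submission. The sketch below is one possible check, assuming the intake is held as a dictionary; the keys `approved_use` and `source_permission_ids` are hypothetical field names chosen for this example.

```python
def intake_blockers(intake: dict) -> list:
    """Return the reasons an intake cannot proceed to release.

    Hypothetical keys: 'approved_use' (text) and
    'source_permission_ids' (list of approved permission records).
    """
    blockers = []
    if not str(intake.get("approved_use") or "").strip():
        blockers.append("no approved use")
    if not intake.get("source_permission_ids"):
        blockers.append("no source permission")
    return blockers


def may_release(intake: dict) -> bool:
    """No approved use or no source permission means no release."""
    return not intake_blockers(intake)
```

Returning the list of blockers rather than a bare boolean lets the requester see which part of the intake is missing.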

Template: source permission matrix

Source	Data class	Owner	Proposed use in synthesis	Permission basis	Exclusions	Retention	Evidence
AML closed case taxonomy	Financial crime sensitive	Financial Crime Ops	Abstract typology labels and event sequences	Training simulation approved by FC governance	Real names, account numbers, counterparties, exact dates	1 year synthetic training set, source extracts deleted after build	FC owner approval, privacy review
KYC document policy rules	Internal policy	Onboarding Policy	Generate valid/invalid document combinations	Internal testing allowed	Real customer documents excluded	Until next policy version	Policy owner approval
Contact center intent labels	Customer interaction metadata	Contact Center Data Owner	Condition transcript generation by intent	QA and copilot testing approved	Raw transcripts, authentication phrases, vulnerability notes unless reviewed	180 days synthetic set	Data classification, transcript exclusion log
Payment fraud typologies	Fraud sensitive	Fraud Risk	Build transaction sequence simulation	Fraud rule stress testing approved	Unique merchant/customer/device combinations	90 days pilot	Fraud risk sign-off
Complaint taxonomy	Complaint / conduct sensitive	Complaints Ops	Generate complaint narratives by theme	QA taxonomy testing approved	Legal privilege, regulator-specific case text, identifiable narratives	180 days	Compliance review

Permission checks:

Does the original collection purpose support synthetic generation for this use?
Does any contract or policy restrict derivatives, vendor sharing, retention or model training?
Are there fields that must be excluded before generation rather than redacted after generation?
Does source data include rare events whose structure could identify a customer even after field removal?
Will intermediate artifacts be deleted or retained as evidence?

Template: privacy attack checklist

Test	Required for	Method	Pass evidence	Escalation trigger
PII / SPI scan	All G1+ text, document, transcript, tabular data	Automated scan + sampled human review	No unauthorized names, IDs, addresses, accounts, phone, email, auth phrases	Any direct identifier found
Nearest-neighbor similarity	G2+ datasets derived from real cases	Compare synthetic samples with approved source embeddings/features	Similarity below threshold or high-similarity samples removed	Rare case or near-copy detected
Membership inference	G3/G4 model-impacting or externally released data	Attack model or holdout comparison where feasible	Attack performance not materially above baseline	Attacker can infer source membership
Model inversion	G3/G4, especially text/credit/fraud	Attempt reconstruction of sensitive attributes from output or generator	Sensitive reconstruction below threshold, no usable identity clues	Sensitive field reconstructed or inferred
Linkage attack	G2+ transaction, graph, complaint, AML, credit	Join-like test using quasi-identifiers or external/internal reference fields	Small cells suppressed, combinations generalized	Unique customer-like path exposed
Canary extraction	LLM or neural generator trained/fitted on sensitive source	Insert controlled marker during test build and probe outputs	Canary not reproduced	Canary reproduced or paraphrased closely
Metadata / provenance leak	Document/image/transcript files	Inspect EXIF, file metadata, hidden comments, prompt logs	Metadata scrubbed, synthetic marker retained	Real source path, author, customer or vendor metadata present
Prompt/source leakage	LLM generation	Review prompts, logs and outputs	No raw sensitive source in uncontrolled prompts/logs	Raw transcript/case/doc copied into vendor prompt
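
Two of the tests in the checklist lend themselves to a short illustration: nearest-neighbor similarity and the small-cell check used in linkage testing. The Python sketch below uses cosine similarity over plain feature vectors and a count of quasi-identifier combinations. The function names, the 0.98 similarity threshold and the `k=5` cell size are assumptions for illustration; the playbook leaves actual thresholds to the gate reviewers.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_near_copies(synthetic, source, threshold=0.98):
    """Return (index, similarity) for synthetic rows too close to any source row."""
    flagged = []
    for i, row in enumerate(synthetic):
        best = max((cosine(row, ref) for ref in source), default=0.0)
        if best >= threshold:
            flagged.append((i, best))
    return flagged


def small_cells(rows, quasi_identifiers, k=5):
    """Return quasi-identifier combinations that occur fewer than k times."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}
```

Flagged near-copies would be removed or regenerated, and small cells suppressed or generalized, before the result is recorded as pass evidence.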

Privacy result classification:

Result	Meaning	Action
Pass	Meets gate threshold and no material exceptions	Continue to utility/fidelity review
Conditional pass	Minor exceptions removed, limited use, stronger controls	Release only with restricted license and expiry
Fail	Material leakage or attack success	Regenerate, reduce fidelity, narrow source, change method or reject
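
The three-way classification above can be expressed as a small decision function. This is a sketch under one added assumption: any exception that was found but not removed is treated as a failure. The parameter names are hypothetical.

```python
def classify_privacy_result(material_leakage: bool,
                            exceptions_found: int,
                            exceptions_removed: int) -> str:
    """Map privacy test outcomes to Pass / Conditional pass / Fail.

    Assumption: unresolved exceptions are treated as a failure.
    """
    if material_leakage or exceptions_removed < exceptions_found:
        return "Fail"
    if exceptions_found > 0:
        # Minor exceptions were found and all of them were removed.
        return "Conditional pass"
    return "Pass"
```

A "Conditional pass" result should still carry the restricted license and expiry described in the table.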

Template: utility/fidelity scorecard

Dimension	Metric	Target	Evidence	Example
Task utility	Downstream task score / workflow pass rate	Meets approved use threshold	Test report	KYC OCR extracts required fields at target accuracy
Defect discovery	Number and severity of meaningful failures found	Finds known and plausible edge failures	QA report	Payment fraud simulation triggers rule gaps
Business rule validity	Impossible combination rate	Below threshold set by SME	Rule engine output	Credit attributes obey policy constraints
Distribution fidelity	PSI / KS / correlation delta / domain comparison	Within approved range for relevant fields	Statistical report	Transaction amount bands and timing resemble target segment
Temporal fidelity	Sequence order, seasonality, burst behavior	Matches domain pattern needed for test	Sequence analysis	Fraud bursts and device changes occur in plausible order
Text realism	SME realism rating, taxonomy alignment	Meets rubric threshold	SME panel review	Complaint narrative includes product, harm, resolution request
Document fidelity	Layout validity, OCR readability, image artifact realism	Meets test target	Visual/OCR inspection	Synthetic ID statement has valid layout and controlled noise
Graph fidelity	Motif, degree, path length, community pattern	Preserves typology without copying real graph	Graph report	AML mule ring has plausible layering structure
Stability	Regeneration variance	Within expected band	Re-run comparison	Same generator settings produce comparable metrics
Limitation clarity	Known gaps documented	All material gaps captured	Data card	Dataset does not represent rural branch onboarding

Decision rule:

Utility must be measured against the approved use, not against generic realism.
Fidelity must preserve business structure without copying customer-identifying facts.
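
For the distribution fidelity row, the Population Stability Index (PSI) is one of the listed metrics. The sketch below computes it from pre-binned counts using the standard form, the sum over bins of (q - p) * ln(q / p), where p is the reference proportion and q the synthetic proportion. The function name and the small `eps` floor for empty bins are assumptions; the acceptable range is set by the approvers, not by this code.

```python
import math


def psi(reference_counts, synthetic_counts, eps=1e-6):
    """Population Stability Index between two binned distributions.

    Both inputs are counts per bin, using the same bin edges.
    Proportions are floored at eps so empty bins do not cause log(0).
    """
    if len(reference_counts) != len(synthetic_counts):
        raise ValueError("Both distributions must use the same bins")
    ref_total = sum(reference_counts)
    syn_total = sum(synthetic_counts)
    if ref_total == 0 or syn_total == 0:
        raise ValueError("Distributions must not be empty")
    total = 0.0
    for ref, syn in zip(reference_counts, synthetic_counts):
        p = max(ref / ref_total, eps)
        q = max(syn / syn_total, eps)
        total += (q - p) * math.log(q / p)
    return total
```

A PSI of zero means the binned distributions match exactly. A very low PSI on a rare segment can itself be a warning sign that the generator is copying rather than abstracting, which is why this metric sits alongside the privacy tests rather than replacing them.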

Template: bias/coverage review

Review area	Question	Evidence	Example control
Segment coverage	Which customer/product/channel segments are included or absent?	Coverage matrix	Separate online, branch, call center and mobile-wallet paths
Edge-case coverage	Which low-frequency high-impact cases are represented?	Scenario list	KYC expired document + address mismatch + manual review
Vulnerable customer risk	Does generated language stereotype or mishandle hardship, disability, language barriers or scams?	SME/risk review	Approved vulnerability phrase library and escalation rules
Protected/proxy attributes	Are sensitive attributes used directly or inferred through proxies?	Feature review	Remove or restrict proxy-like fields unless approved for fairness testing
Label amplification	Does generator reproduce historical biased labels or investigator decisions?	Label distribution and error review	Separate historical label from adjudicated target label
Complaint harm framing	Does narrative understate customer harm or overfit company-friendly resolution?	Complaints QA review	Include harm, impact, expectation and remediation fields
Credit fairness	Does scenario augmentation distort adverse action reasons or subgroup performance?	Model risk review	Run subgroup utility and reason-code stability checks
Fraud false positives	Does simulation over-associate specific behaviors with fraud risk?	Fraud risk review	Balance legitimate lookalike behaviors and require reason codes
Language and channel	Does generated data overrepresent polished English/digital confidence?	Linguistic/channel review	Include approved multilingual or low-digital-confidence cases
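
The segment coverage row asks which segments are included or absent. A coverage matrix can start as a simple count per required segment, as in the sketch below. The function name, the `min_count` parameter and the segment labels in the usage note are illustrative assumptions.

```python
from collections import Counter


def coverage_gaps(rows, segment_field, required_segments, min_count=1):
    """Return required segments whose row count falls below min_count."""
    counts = Counter(row[segment_field] for row in rows)
    return {
        segment: counts.get(segment, 0)
        for segment in required_segments
        if counts.get(segment, 0) < min_count
    }
```

Called with rows covering only online and branch channels and a required list of online, branch, call center and mobile wallet, the function returns the two missing channels with a count of zero. Those gaps would then be either rebalanced or documented as limitations in the data card.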

Bias decision categories:

Category	Action
Acceptable for limited purpose	Release with documented limitations
Needs rebalance	Regenerate or add missing segments before release
Needs stronger warning	Release only for narrow test, not model training
Reject	Bias amplification creates unacceptable customer, conduct or model risk

Template: allowed-use license

Dataset ID: SYN-PAY-FRAUD-SEQUENCE-2026-002
Version: v1.0
Status: Limited release
Approved purpose: Payment fraud rule and warning UX stress testing.
Allowed users: Fraud Risk Analytics, Payment Platform QA, named AI copilot test engineers.
Allowed environments: Enterprise non-production fraud sandbox only.
Allowed operations: Query, aggregate, run fraud rule simulations, run agent workflow tests, create test reports.
Prohibited operations: Production decisioning, customer scoring, external sharing, model pretraining, customer re-identification, linkage to production customer tables, employee performance analytics.
Derivative rule: All derivative samples, embeddings, reports and test fixtures inherit this license and must retain synthetic marker.
Watermark/provenance: Dataset metadata must retain synthetic=true, source_permission_id, generation_run_id and license_id.
Retention: Delete working copies after 90 days. Evidence packet retained under governance retention schedule.
Review trigger: New fraud typology source, vendor release request, model training request, privacy attack finding, incident, expiry.
Approvers: Business Owner, Data Owner, Privacy, Security, Model Risk, Architect.
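
A license like the one above only helps if pipelines can check it. The Python sketch below shows one way to do that with an allow-list: anything not explicitly allowed is refused, which also covers the prohibited operations. The environment and operation identifiers, the function name, and the choice to count the 90-day retention from a `released_on` date are assumptions made for this example; the provenance keys are the ones named in the license.

```python
from datetime import date, timedelta

# Machine-readable subset of the example license (identifiers are hypothetical).
LICENSE = {
    "dataset_id": "SYN-PAY-FRAUD-SEQUENCE-2026-002",
    "allowed_environments": {"nonprod-fraud-sandbox"},
    "allowed_operations": {"query", "aggregate", "fraud_rule_simulation",
                           "agent_workflow_test", "test_report"},
    "retention_days": 90,
}

# Provenance keys the license requires in dataset metadata.
REQUIRED_PROVENANCE = ("source_permission_id", "generation_run_id", "license_id")


def license_violations(license_, *, environment, operation, metadata,
                       released_on, today):
    """Return a list of license violations for a requested use."""
    violations = []
    if environment not in license_["allowed_environments"]:
        violations.append(f"environment not allowed: {environment}")
    if operation not in license_["allowed_operations"]:
        violations.append(f"operation not allowed: {operation}")
    if metadata.get("synthetic") is not True:
        violations.append("synthetic marker missing")
    for key in REQUIRED_PROVENANCE:
        if not metadata.get(key):
            violations.append(f"provenance field missing: {key}")
    # Assumption: retention is counted from the release date.
    if today > released_on + timedelta(days=license_["retention_days"]):
        violations.append("working copy past retention period")
    return violations
```

An empty list means the request fits the license. Because derivatives inherit the license, the same check would apply to embeddings, reports and test fixtures built from the dataset.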

License table version:

License field	Required wording standard
Approved purpose	One or two concrete uses, not broad “AI development”
Allowed users	Roles and teams; vendors named separately
Allowed environments	Named non-production / sandbox / approved external environment
Allowed operations	Read, query, transform, test, train, validate, export, embed, index, summarize
Prohibited operations	Re-identification, production decisioning, unrelated training, marketing, resale, external sharing
Derivative rule	Defines whether embeddings, prompts, labels, reports and transformed files inherit restrictions
Watermark/provenance	Required markers and metadata
Retention	Expiry, deletion proof, evidence retention
Review trigger	Events that force reapproval

Template: release evidence packet

Evidence item	G1	G2	G3	G4
Use-case intake	Required	Required	Required	Required
Risk tier rationale	Required	Required	Required	Required
Source permission matrix	Summary	Required	Required	Required
Data classification	Required	Required	Required	Required
Generation plan and method	Summary	Required	Required	Required
Generator version / prompts / rules	Optional	Required	Required	Required
Privacy attack checklist	Basic scan	Required	Required with report	Required with report
Utility/fidelity scorecard	Basic SME review	Required	Required	Required
Bias/coverage review	Optional	Required for customer-like data	Required	Required
Data card	Required	Required	Required	Required
Allowed-use license	Required	Required	Required	Required
Watermark/provenance evidence	Recommended	Required	Required	Required
Security environment evidence	Optional	Required	Required	Required
Vendor/contract evidence	Not needed	If vendor involved	If vendor involved	Required
Residual risk owner	Optional	Required for exceptions	Required	Required
Retention/deletion plan	Required	Required	Required	Required
Monitoring plan	Optional	Required	Required	Required
Approver record	Required	Required	Required	Required
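
The evidence matrix above can double as a completeness check before the release gate meeting. The sketch below transcribes the matrix and reports mandatory items that are still missing for a given gate. It assumes that reduced-depth entries ("Summary", "Basic scan", "Basic SME review") are still mandatory, and that conditional entries ("If vendor involved", "Required for customer-like data", "Required for exceptions") are left to reviewer judgment. Names such as `missing_evidence` are illustrative.

```python
GATES = ("G1", "G2", "G3", "G4")
R, S, O = "Required", "Summary", "Optional"
V, RR = "If vendor involved", "Required with report"

# One tuple per evidence item: requirement level at G1, G2, G3, G4.
EVIDENCE_MATRIX = {
    "Use-case intake": (R, R, R, R),
    "Risk tier rationale": (R, R, R, R),
    "Source permission matrix": (S, R, R, R),
    "Data classification": (R, R, R, R),
    "Generation plan and method": (S, R, R, R),
    "Generator version / prompts / rules": (O, R, R, R),
    "Privacy attack checklist": ("Basic scan", R, RR, RR),
    "Utility/fidelity scorecard": ("Basic SME review", R, R, R),
    "Bias/coverage review": (O, "Required for customer-like data", R, R),
    "Data card": (R, R, R, R),
    "Allowed-use license": (R, R, R, R),
    "Watermark/provenance evidence": ("Recommended", R, R, R),
    "Security environment evidence": (O, R, R, R),
    "Vendor/contract evidence": ("Not needed", V, V, R),
    "Residual risk owner": (O, "Required for exceptions", R, R),
    "Retention/deletion plan": (R, R, R, R),
    "Monitoring plan": (O, R, R, R),
    "Approver record": (R, R, R, R),
}

# Levels that this sketch does not enforce automatically (assumption).
NOT_MANDATORY = {O, "Recommended", "Not needed", V,
                 "Required for customer-like data", "Required for exceptions"}


def missing_evidence(gate, provided):
    """List mandatory evidence items for the gate that are not yet provided."""
    column = GATES.index(gate)  # raises ValueError for unknown gates
    return [item for item, levels in EVIDENCE_MATRIX.items()
            if levels[column] not in NOT_MANDATORY and item not in provided]
```

G0 is absent from the matrix because it has no catalog release, so the function rejects it rather than guessing.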

Release decision log:

Decision	When to use
Approved	All hard gates pass, limitations documented
Approved with restrictions	Minor gaps controlled by narrower use, shorter retention, smaller audience or stronger monitoring
Rework required	Utility/fidelity/bias/privacy evidence incomplete or fixable
Rejected	Privacy leakage, source permission failure, unacceptable bias, misleading utility or prohibited downstream use

PM/BA/architecture questions

Product questions

Question	What a strong answer shows
What business decision does this synthetic dataset support?	Clear purpose instead of generic AI experimentation
Who will consume it and what action will they take?	User, workflow and environment clarity
What would be harmed if the synthetic data is wrong?	Customer, operational, model, conduct and compliance impact
What is explicitly out of scope?	Prevention of purpose creep
What would make this dataset no longer fit for use?	Expiry, drift, policy change, source change

BA questions

Question	What a strong answer shows
Which process steps, exceptions and controls must be represented?	Real workflow grounding
Which business rules must never be violated?	Test oracle and rule validity
Which labels or expected outputs are authoritative?	Prevents synthetic label drift
What limitations must users see before consuming the dataset?	Data card discipline
Which stakeholders must approve changes to scope?	Governance routing

Architecture questions

Question	What a strong answer shows
Where is synthetic generation executed and logged?	Environment and evidence design
How are generator prompts, rules, seeds and versions managed?	Reproducibility and lineage
How does watermark/provenance survive transformations and embeddings?	Downstream control
Can synthetic data enter RAG indexes, agent tools or model training pipelines?	Typed access and license enforcement
How will unauthorized use be detected?	Access logs, catalog policy, pipeline checks
What is the deletion and renewal mechanism?	Retention operationalization
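
On the question of how prompts, rules, seeds and versions are managed, one common approach is to derive a stable run identifier from a canonical manifest, so that any change to the inputs yields a new `generation_run_id`. The sketch below assumes a SHA-256 hash over a JSON manifest; the function name, manifest keys and the `run-` prefix are hypothetical.

```python
import hashlib
import json


def generation_run_id(generator_version: str, seed: int,
                      prompts: list, rules: list) -> str:
    """Derive a deterministic run ID from the generation inputs."""
    manifest = {
        "generator_version": generator_version,
        "seed": seed,
        "prompts": prompts,
        "rules": rules,
    }
    # Sorted keys and fixed separators make the serialization canonical.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "run-" + digest[:16]
```

Storing this ID in the dataset metadata and in the evidence packet links a released dataset back to the exact generator configuration that produced it. Prompts that contain sensitive text should be stored in a controlled location and only their hash recorded.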

Privacy/risk questions

Question	What a strong answer shows
Can an attacker infer source membership, identity or sensitive attributes?	Privacy attack thinking
What quasi-identifiers or rare patterns remain?	Linkage risk awareness
Does the dataset amplify historical bias or false-positive patterns?	Fairness and conduct risk
Who accepts residual risk and when does it expire?	Accountability
What evidence would satisfy internal audit?	Replayable evidence packet

Release checklist

Use this checklist before any G2+ release.

#	Check	Pass condition
1	Use case approved	Business purpose, target users, allowed/prohibited uses are documented
2	Risk tier assigned	G-level is justified by data class, use, audience and environment
3	Source permission complete	Data Owner, Privacy/Legal/Compliance as needed approve source use
4	Generation plan approved	Method, generator, prompts/rules, source windows and exclusions are recorded
5	Environment controlled	Generation and storage environment meets security requirements
6	Privacy attacks executed	Required tests pass or exceptions are remediated/restricted
7	Utility measured	Dataset supports the approved task with evidence
8	Fidelity measured	Relevant domain structure is preserved without unsafe copying
9	Bias/coverage reviewed	Coverage gaps and amplification risks are accepted or corrected
10	Data card complete	Purpose, source summary, methods, tests, limitations and owner are visible
11	License attached	Allowed uses, prohibited uses, derivative rules, retention and expiry are explicit
12	Watermark/provenance applied	Metadata markers and generation lineage are retained
13	Access controls configured	Only approved users/environments can access
14	Retention/deletion scheduled	Working copies, derivatives and evidence retention are defined
15	Release evidence packet stored	All approval and evidence artifacts are linked
16	Monitoring enabled	Access, use, exceptions and expiry are tracked
17	Reopen triggers defined	New use, vendor release, training use, attack finding, incident or expiry triggers review

Hard-stop conditions:

Source permission is unclear or denied.
Direct identifiers or near-copies remain.
Membership inference, linkage or inversion risk exceeds threshold.
Utility evidence does not support requested use.
Bias review identifies unacceptable customer or conduct risk.
Downstream use includes production decisioning without explicit high-risk approval.
Vendor release lacks contract, retention and deletion evidence.
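
The hard-stop conditions can be evaluated mechanically once each one has a recorded finding. The sketch below fails closed: a condition with no recorded finding is treated as triggered. The dictionary keys and the fail-closed default are assumptions for illustration; the condition texts are the ones listed above.

```python
# Hypothetical finding keys mapped to the hard-stop conditions listed above.
HARD_STOPS = {
    "source_permission_unclear": "Source permission is unclear or denied.",
    "identifiers_or_near_copies": "Direct identifiers or near-copies remain.",
    "attack_risk_over_threshold":
        "Membership inference, linkage or inversion risk exceeds threshold.",
    "utility_unsupported": "Utility evidence does not support requested use.",
    "unacceptable_bias":
        "Bias review identifies unacceptable customer or conduct risk.",
    "unapproved_production_decisioning":
        "Downstream use includes production decisioning without explicit "
        "high-risk approval.",
    "vendor_evidence_missing":
        "Vendor release lacks contract, retention and deletion evidence.",
}


def triggered_hard_stops(findings: dict) -> list:
    """Return the hard-stop conditions that block release.

    A missing finding counts as triggered (fail closed).
    """
    return [text for key, text in HARD_STOPS.items()
            if findings.get(key, True)]


def release_blocked(findings: dict) -> bool:
    """Release is blocked if any hard stop is triggered."""
    return bool(triggered_hard_stops(findings))
```

Failing closed means a reviewer has to record an explicit "not triggered" for every condition before the release gate can pass.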

Executive narrative

1-minute version

Synthetic data can accelerate AI delivery, but only if it is governed as a data product. The main risk is not that the data is fake; the risk is that teams treat it as automatically safe, then use it for training, testing, vendor sharing or analytics beyond its permission boundary. Our architecture introduces risk-tiered release gates: source permission, generation controls, privacy attack testing, utility/fidelity scoring, bias review, data card, allowed-use license, provenance and retention. This lets us use synthetic AML, KYC, credit, fraud, contact center and complaint datasets safely enough for approved purposes while preserving audit evidence.

CTO / CDO version

The platform decision is to build a synthetic data control plane, not a loose generation toolkit. The control plane standardizes intake, generator registry, source permission, privacy attack workbench, scorecards, catalog, license enforcement, watermark/provenance and expiry. Engineering gets reusable datasets and faster test coverage. Data governance gets a record of what was generated, from which source, for which use, and under which restrictions. Risk teams get evidence that privacy, utility, fidelity and bias were measured before release. This reduces duplicated scripts, vendor sandbox sprawl and hidden training-data contamination.

CRO / Privacy version

Synthetic data does not remove privacy or conduct risk by definition. The release gate must prove that the dataset cannot reasonably expose real customers, rare cases, documents, calls or transaction paths, and that it will not be used outside the approved purpose. The evidence package includes source permission, membership inference and linkage testing, model inversion review where relevant, bias/coverage review, residual risk owner, retention and deletion plan. For high-risk uses, approval is conditional and time-bound.

Interview drills

Drill 1: “Is synthetic data automatically safe to use with vendors?”

30-second answer:

Synthetic data is not automatically safe. I would first check source permission, whether the vendor contract allows derivative data, and whether the dataset passed privacy attack testing. For vendor release I would require a G4 gate: data card, allowed-use license, watermark/provenance, access restrictions, retention period, deletion proof and explicit prohibition on model training or onward sharing unless approved.

2-minute expansion:

The key mistake is assuming that removing direct identifiers or generating records from a model eliminates privacy risk. Synthetic data can preserve rare transaction patterns, complaint narratives, KYC document artifacts or fraud graph structures that support linkage or membership inference. For vendor use, I would narrow the purpose, reduce fidelity where needed, test for nearest-neighbor similarity and PII leakage, confirm contractual restrictions, and require evidence that the vendor cannot reuse the data for unrelated model training. If the vendor needs higher fidelity, that is a risk trade-off requiring stronger controls, not an informal data share.

Drill 2: “How do you balance privacy, utility and fidelity?”

30-second answer:

I treat privacy, utility and fidelity as separate gate dimensions, not one averaged score. Privacy asks whether real people or cases can be inferred. Utility asks whether the dataset supports the approved task. Fidelity asks whether it preserves the relevant business structure without copying protected facts. Any hard failure can block or narrow release.

2-minute expansion:

For example, payment fraud simulation needs realistic transaction timing, device changes and merchant patterns, but copying exact rare fraud sequences can expose source cases. I would abstract typologies, generate plausible variants, suppress unique combinations, and run linkage and nearest-neighbor tests. Then I would measure utility through rule trigger coverage and investigator realism scores, and fidelity through sequence validity. If privacy is strong but utility is weak, the dataset may be suitable for demos but not rule validation. If utility is strong but privacy fails, it cannot be released until regenerated or restricted.

Drill 3: “What makes a synthetic dataset fit for credit model augmentation?”

30-second answer:

It needs source permission, model-risk approval, privacy attack evidence, policy-valid features, subgroup coverage, reason-code stability and clear limits. I would not let synthetic credit data become production training input unless Model Risk, Privacy, Data Owner and Business Owner approve the use and evidence shows it does not amplify bias or distort adverse action explanations.

2-minute expansion:

Credit is high impact because synthetic data can alter model boundaries and fairness behavior. I would define whether the dataset is for sensitivity analysis, challenger testing, rejected-inference exploration or actual training. The scorecard would include distribution fidelity, business rule validity, subgroup performance, proxy-feature review, label provenance, adverse action reason stability and membership/linkage testing. The license would prohibit production scoring and unrelated use unless explicitly approved. This protects the organization from treating synthetic scenarios as ground truth.

Drill 4: “How would you govern contact center transcript generation?”

30-second answer:

I would avoid sending raw transcripts to an uncontrolled LLM. I would use approved intent labels, redacted summaries and policy constraints to generate synthetic conversations, then run PII scans, phrase similarity checks, vulnerability-language review and utility testing against copilot QA or intent classification tasks. The data card must say it is synthetic and cannot be used as real customer evidence.

2-minute expansion:

Contact center transcripts contain authentication phrases, account details, hardship signals, health or vulnerability disclosures and employee behavior. The generator should operate in a controlled environment using minimized source features. Utility should be measured by intent classification accuracy, escalation coverage and SME realism. Bias review should check that the synthetic language does not stereotype customers with low digital confidence, accents, vulnerability markers or complaint intensity. The license should prohibit employee performance analytics, sentiment profiling beyond approved QA, and any attempt to reconstruct real calls.

Drill 5: “What evidence would you show internal audit?”

30-second answer:

I would show the intake, source permission matrix, data classification, generation plan, privacy attack report, utility/fidelity scorecard, bias review, data card, allowed-use license, watermark/provenance evidence, approval record, residual risk owner, retention plan and access logs. The goal is to replay why the dataset was created, why it was safe enough for that use, who approved it and how misuse is controlled.

2-minute expansion:

Audit does not need a story that “synthetic means safe”; it needs evidence. I would connect the dataset ID to PRD, ADR, release gate and downstream systems. The evidence packet should show source owners approved the purpose, privacy tests were run, utility was measured against the actual task, limitations were disclosed, and access was limited to approved environments. If the dataset was shared externally, I would add contract terms, deletion proof and vendor access logs. If it was used for model training or validation, I would include model-risk sign-off and monitoring thresholds.

Source anchors

Source	Link	How to use this anchor
UK ICO synthetic data guidance	https://ico.org.uk/about-the-ico/research-reports-impact-and-evaluation/research-and-reports/technology-and-innovation/synthetic-data/	Ground synthetic data privacy, utility and governance discussion
NIST Privacy Framework	https://www.nist.gov/privacy-framework	Structure privacy risk management, controls and evidence
NIST AI RMF	https://www.nist.gov/itl/ai-risk-management-framework	Map synthetic data risks to AI governance, measurement and management
ISO/IEC 42001	https://www.iso.org/standard/81230.html	Align with AI management system operations, roles and continual improvement
ISO/IEC 23894	https://www.iso.org/standard/77304.html	Align with AI risk management lifecycle
W3C PROV	https://www.w3.org/TR/prov-overview/	Represent synthetic data provenance through entities, activities and agents