
AI Synthetic Data Governance / Privacy-Utility-Fidelity Playbook

Version: v1.0

434AI_SYNTHETIC_DATA_GOVERNANCE_PRIVACY_UTILITY_FIDELITY_PLAYBOOK.md

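Date: 2026-06-30
Audience: AI product managers, CBAP / Senior BAs, data product architects, AI architects, privacy risk partners, model risk management, retail financial services business owners, data governance, information security, internal audit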


Purpose and when to use

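This playbook turns synthetic data from "ad hoc files generated per project" into a data product that can be governed, released and audited. It applies to the following scenarios: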

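| Scenario | When to use this playbook |
| --- | --- |
| AML typology training | You need to construct mule network, layering, structuring and trade-based patterns, but must not copy real cases |
| KYC document testing | You need to test OCR, document ingestion, exception routing and vendor sandboxes, but real identity document images are prohibited |
| Credit model scenario augmentation | You need to extend boundary samples, stress-test policy thresholds and validate adverse action reason stability |
| Payment fraud simulation | You need to simulate APP scams, account takeover, merchant fraud, device anomalies and rule triggers |
| Contact center transcript generation | You need to train a copilot, intent classifier or QA rubric, but should not expose real call transcripts |
| Complaint analytics | You need to extend the complaint taxonomy, root-cause classifier or regulatory response drills, but must not fabricate real complaint metrics |
| Vendor PoC / sandbox | You need to give vendors data that approximates business data, but must restrict purpose, retention, retraining and redistribution |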

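Usage boundaries:

  • This playbook is not legal advice, a privacy impact assessment conclusion, a model validation report or a regulatory interpretation.
  • Synthetic data is not automatically anonymized / de-identified / safe for sharing.
  • If the source data does not permit a given use, "synthesis" cannot be used to bypass purpose limitation, contractual boundaries or customer expectations.
  • High-risk synthetic data may be released only after privacy, utility, fidelity, bias, license and release evidence all meet their thresholds.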

Operating model

1. Lifecycle

Intake
-> source permission review
-> generation design
-> controlled build
-> privacy attack testing
-> utility/fidelity scoring
-> bias/coverage review
-> data card and allowed-use license
-> release gate
-> catalog, access, monitoring
-> renewal, restriction or retirement
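
The lifecycle above is an ordered pipeline, so one way to make it enforceable is to model the stages as an ordered enumeration and reject any attempt to skip a stage. The sketch below is illustrative only: the names `Stage` and `advance` are hypothetical, and it assumes strictly forward movement, leaving out rework loops (for example, regenerating after a failed privacy test).

```python
from enum import IntEnum


class Stage(IntEnum):
    """Lifecycle stages in the order listed in this playbook."""
    INTAKE = 0
    SOURCE_PERMISSION_REVIEW = 1
    GENERATION_DESIGN = 2
    CONTROLLED_BUILD = 3
    PRIVACY_ATTACK_TESTING = 4
    UTILITY_FIDELITY_SCORING = 5
    BIAS_COVERAGE_REVIEW = 6
    DATA_CARD_AND_LICENSE = 7
    RELEASE_GATE = 8
    CATALOG_ACCESS_MONITORING = 9
    RENEWAL_RESTRICTION_OR_RETIREMENT = 10


def advance(current: Stage, target: Stage) -> Stage:
    """Allow only a move to the immediately following stage (assumed policy)."""
    if target != current + 1:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target
```

With this shape, a dataset that has only completed the controlled build cannot be moved straight to the release gate; the call raises `ValueError` instead.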

2. RACI

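| Activity | PM | BA | Data Owner | Architect | Data Science | Privacy | Security | Model Risk | Legal/Compliance | Business Owner |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Define use case and allowed use | A/R | R | C | C | C | C | C | C | C | A |
| Classify source data and permissions | C | C | A/R | C | C | A/R | C | C | A/R | C |
| Design generation approach | C | C | C | A/R | A/R | C | C | C | C | I |
| Control generation environment | I | I | C | A/R | R | C | A/R | I | C | I |
| Run privacy attack tests | I | I | C | C | R | A/R | C | C | C | I |
| Score utility/fidelity | A/R | R | C | C | R | C | I | C | C | C |
| Review bias/coverage | C | C | C | C | R | C | C | A/R | C | A |
| Approve release gate | A | C | A | A/R | C | A/R | A/R | A/R | A/R | A |
| Monitor usage and expiry | R | C | A/R | C | C | C | C | C | C | A |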

RACI discipline:

  • Business Owner owns why the synthetic data is needed and accepts business residual risk.
  • Data Owner owns source permission and release scope.
  • Privacy and Security own leakage, re-identification, environment and access controls.
  • Model Risk owns model-impacting uses, especially credit, fraud, AML and decision-support augmentation.
  • PM / BA / Architect ensure the license is embedded in PRD, architecture review, release checklist and vendor handoff.

3. Gate levels

| Gate | Use type | Required reviewers | Release posture |
| --- | --- | --- | --- |
| G0 Draft mock | Design-only mock data, no sensitive source | PM / Tech Lead | Local use, no catalog release |
| G1 Internal low risk | Demo data, generic process testing | PM, Data Owner | Internal release with data card and expiry |
| G2 Controlled operational test | KYC, contact center, complaint, payment test data | PM, Data Owner, Architect, Privacy, Security | Cataloged release with license and attack evidence |
| G3 Model-impacting | Training, fine-tune, validation, scenario augmentation | Data Owner, Model Risk, Privacy, Risk, Architect | Limited release with residual risk owner and monitoring |
| G4 External / vendor | Vendor PoC, offshore test, partner review | Legal, Procurement, Privacy, Security, Business Owner | Contract-bound release with deletion evidence |
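
A release pipeline can check the gate table mechanically before a human sign-off meeting. The following is a minimal sketch, assuming approvals are recorded as reviewer-role strings that match the table exactly; `REQUIRED_REVIEWERS` and `missing_reviewers` are hypothetical names, and "PM / Tech Lead" is kept as a single role label because the table does not say whether either or both must sign.

```python
# Required reviewers per gate, copied from the gate-level table.
REQUIRED_REVIEWERS = {
    "G0": {"PM / Tech Lead"},
    "G1": {"PM", "Data Owner"},
    "G2": {"PM", "Data Owner", "Architect", "Privacy", "Security"},
    "G3": {"Data Owner", "Model Risk", "Privacy", "Risk", "Architect"},
    "G4": {"Legal", "Procurement", "Privacy", "Security", "Business Owner"},
}


def missing_reviewers(gate, approvals):
    """Return the reviewer roles still needed for the gate, sorted by name."""
    return sorted(REQUIRED_REVIEWERS[gate] - set(approvals))


# Example: a G2 request that has only three sign-offs so far.
print(missing_reviewers("G2", ["PM", "Privacy", "Data Owner"]))
```

An empty list means every reviewer named for that gate has approved; it says nothing about whether the evidence itself is adequate.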

Template: synthetic data intake

| Field | Required content | Example |
| --- | --- | --- |
| Request ID | Stable ID tied to PRD / ADR / ticket | SYN-KYC-DOC-2026-001 |
| Business objective | The concrete decision, workflow or test supported | Test KYC OCR and exception routing for document quality edge cases |
| Data product type | Tabular, text, transcript, image, document, graph, sequence, mixed | Synthetic document images + field labels |
| Target users | Named teams, roles, vendors and locations | KYC QA, onboarding platform team, approved OCR vendor |
| Downstream system | RAG, Agent, Copilot, model training, analytics, QA, sandbox | OCR regression pipeline and vendor sandbox |
| Approved use | Precise allowed actions | Run extraction tests and compare field-level accuracy |
| Prohibited use | Explicitly blocked actions | No production decisioning, no model pretraining, no customer profiling |
| Source data summary | Systems and field categories, not raw examples | KYC policy rules, document templates, historical exception taxonomy |
| Sensitive data class | PII, financial crime, credit, complaint, call recording, employee data | Identity document-like data, no real document images |
| Risk tier | G0-G4 recommendation with rationale | G4 because vendor sandbox release is requested |
| Utility target | What “useful enough” means | OCR extracts required fields with 95% expected-label agreement in test |
| Fidelity target | What real-world structure must be preserved | Layouts, expiry formats, country-specific field constraints, image noise |
| Privacy target | What leakage must be prevented | No real names, IDs, faces, document numbers, metadata or source image similarity |
| Expiry | Date or event requiring renewal/deletion | 90 days after vendor test completion |
| Business owner | Accountable owner | Head of Digital Onboarding |

Intake decision rule:

No approved use, no source permission, no release.

Template: source permission matrix

| Source | Data class | Owner | Proposed use in synthesis | Permission basis | Exclusions | Retention | Evidence |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AML closed case taxonomy | Financial crime sensitive | Financial Crime Ops | Abstract typology labels and event sequences | Training simulation approved by FC governance | Real names, account numbers, counterparties, exact dates | 1 year synthetic training set, source extracts deleted after build | FC owner approval, privacy review |
| KYC document policy rules | Internal policy | Onboarding Policy | Generate valid/invalid document combinations | Internal testing allowed | Real customer documents excluded | Until next policy version | Policy owner approval |
| Contact center intent labels | Customer interaction metadata | Contact Center Data Owner | Condition transcript generation by intent | QA and copilot testing approved | Raw transcripts, authentication phrases, vulnerability notes unless reviewed | 180 days synthetic set | Data classification, transcript exclusion log |
| Payment fraud typologies | Fraud sensitive | Fraud Risk | Build transaction sequence simulation | Fraud rule stress testing approved | Unique merchant/customer/device combinations | 90 days pilot | Fraud risk sign-off |
| Complaint taxonomy | Complaint / conduct sensitive | Complaints Ops | Generate complaint narratives by theme | QA taxonomy testing approved | Legal privilege, regulator-specific case text, identifiable narratives | 180 days | Compliance review |

Permission checks:

  • Does the original collection purpose support synthetic generation for this use?
  • Does any contract or policy restrict derivatives, vendor sharing, retention or model training?
  • Are there fields that must be excluded before generation rather than redacted after generation?
  • Does source data include rare events whose structure could identify a customer even after field removal?
  • Will intermediate artifacts be deleted or retained as evidence?

Template: privacy attack checklist

| Test | Required for | Method | Pass evidence | Escalation trigger |
| --- | --- | --- | --- | --- |
| PII / SPI scan | All G1+ text, document, transcript, tabular data | Automated scan + sampled human review | No unauthorized names, IDs, addresses, accounts, phone, email, auth phrases | Any direct identifier found |
| Nearest-neighbor similarity | G2+ datasets derived from real cases | Compare synthetic samples with approved source embeddings/features | Similarity below threshold or high-similarity samples removed | Rare case or near-copy detected |
| Membership inference | G3/G4 model-impacting or externally released data | Attack model or holdout comparison where feasible | Attack performance not materially above baseline | Attacker can infer source membership |
| Model inversion | G3/G4, especially text/credit/fraud | Attempt reconstruction of sensitive attributes from output or generator | Sensitive reconstruction below threshold, no usable identity clues | Sensitive field reconstructed or inferred |
| Linkage attack | G2+ transaction, graph, complaint, AML, credit | Join-like test using quasi-identifiers or external/internal reference fields | Small cells suppressed, combinations generalized | Unique customer-like path exposed |
| Canary extraction | LLM or neural generator trained/fitted on sensitive source | Insert controlled marker during test build and probe outputs | Canary not reproduced | Canary reproduced or paraphrased closely |
| Metadata / provenance leak | Document/image/transcript files | Inspect EXIF, file metadata, hidden comments, prompt logs | Metadata scrubbed, synthetic marker retained | Real source path, author, customer or vendor metadata present |
| Prompt/source leakage | LLM generation | Review prompts, logs and outputs | No raw sensitive source in uncontrolled prompts/logs | Raw transcript/case/doc copied into vendor prompt |
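
As an illustration of the nearest-neighbor similarity test in the checklist, the sketch below flags synthetic records whose closest source record exceeds a cosine-similarity threshold. It assumes records have already been turned into numeric feature or embedding vectors inside the approved environment; the function names and the `0.98` threshold are hypothetical placeholders, and the real threshold should come from the Privacy owner.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_near_copies(synthetic, source, threshold=0.98):
    """Return indices of synthetic vectors whose nearest source vector
    has cosine similarity at or above the threshold (assumed value)."""
    flagged = []
    for index, candidate in enumerate(synthetic):
        nearest = max((cosine(candidate, ref) for ref in source), default=0.0)
        if nearest >= threshold:
            flagged.append(index)
    return flagged


source_vectors = [[1.0, 0.0], [0.0, 1.0]]
synthetic_vectors = [[1.0, 0.01], [1.0, 1.0]]
print(flag_near_copies(synthetic_vectors, source_vectors))
```

Flagged records would be removed or regenerated before release. This brute-force comparison is quadratic in dataset size, so larger datasets would likely need an approximate nearest-neighbor index instead.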

Privacy result classification:

| Result | Meaning | Action |
| --- | --- | --- |
| Pass | Meets gate threshold and no material exceptions | Continue to utility/fidelity review |
| Conditional pass | Minor exceptions removed, limited use, stronger controls | Release only with restricted license and expiry |
| Fail | Material leakage or attack success | Regenerate, reduce fidelity, narrow source, change method or reject |

Template: utility/fidelity scorecard

| Dimension | Metric | Target | Evidence | Example |
| --- | --- | --- | --- | --- |
| Task utility | Downstream task score / workflow pass rate | Meets approved use threshold | Test report | KYC OCR extracts required fields at target accuracy |
| Defect discovery | Number and severity of meaningful failures found | Finds known and plausible edge failures | QA report | Payment fraud simulation triggers rule gaps |
| Business rule validity | Impossible combination rate | Below threshold set by SME | Rule engine output | Credit attributes obey policy constraints |
| Distribution fidelity | PSI / KS / correlation delta / domain comparison | Within approved range for relevant fields | Statistical report | Transaction amount bands and timing resemble target segment |
| Temporal fidelity | Sequence order, seasonality, burst behavior | Matches domain pattern needed for test | Sequence analysis | Fraud bursts and device changes occur in plausible order |
| Text realism | SME realism rating, taxonomy alignment | Meets rubric threshold | SME panel review | Complaint narrative includes product, harm, resolution request |
| Document fidelity | Layout validity, OCR readability, image artifact realism | Meets test target | Visual/OCR inspection | Synthetic ID statement has valid layout and controlled noise |
| Graph fidelity | Motif, degree, path length, community pattern | Preserves typology without copying real graph | Graph report | AML mule ring has plausible layering structure |
| Stability | Regeneration variance | Within expected band | Re-run comparison | Same generator settings produce comparable metrics |
| Limitation clarity | Known gaps documented | All material gaps captured | Data card | Dataset does not represent rural branch onboarding |

Decision rule:

Utility must be measured against the approved use, not against generic realism.
Fidelity must preserve business structure without copying customer-identifying facts.
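
The scorecard lists PSI as one distribution fidelity metric. As a rough sketch, the Population Stability Index over pre-binned counts can be computed as the sum of (actual share − expected share) × ln(actual share / expected share). The function name `psi` and the `eps` floor for empty bins are assumptions of this example; bin edges and the acceptable range are for the scorecard owner to set.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two binned distributions.

    expected_counts: per-bin counts for the reference (target) segment.
    actual_counts:   per-bin counts for the synthetic dataset, same bins.
    eps:             floor applied to shares so empty bins do not cause log(0).
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both inputs must use the same bins")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    if expected_total == 0 or actual_total == 0:
        raise ValueError("counts must not be all zero")
    total = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        e_share = max(expected / expected_total, eps)
        a_share = max(actual / actual_total, eps)
        total += (a_share - e_share) * math.log(a_share / e_share)
    return total


# Identical distributions give 0; a shift between two bins gives a small positive value.
print(psi([10, 20, 30], [10, 20, 30]))
print(round(psi([50, 50], [60, 40]), 4))
```

PSI only compares one field at a time, so it does not replace the correlation, temporal or graph checks in the same scorecard.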

Template: bias/coverage review

| Review area | Question | Evidence | Example control |
| --- | --- | --- | --- |
| Segment coverage | Which customer/product/channel segments are included or absent? | Coverage matrix | Separate online, branch, call center and mobile-wallet paths |
| Edge-case coverage | Which low-frequency high-impact cases are represented? | Scenario list | KYC expired document + address mismatch + manual review |
| Vulnerable customer risk | Does generated language stereotype or mishandle hardship, disability, language barriers or scams? | SME/risk review | Approved vulnerability phrase library and escalation rules |
| Protected/proxy attributes | Are sensitive attributes used directly or inferred through proxies? | Feature review | Remove or restrict proxy-like fields unless approved for fairness testing |
| Label amplification | Does generator reproduce historical biased labels or investigator decisions? | Label distribution and error review | Separate historical label from adjudicated target label |
| Complaint harm framing | Does narrative understate customer harm or overfit company-friendly resolution? | Complaints QA review | Include harm, impact, expectation and remediation fields |
| Credit fairness | Does scenario augmentation distort adverse action reasons or subgroup performance? | Model risk review | Run subgroup utility and reason-code stability checks |
| Fraud false positives | Does simulation over-associate specific behaviors with fraud risk? | Fraud risk review | Balance legitimate lookalike behaviors and require reason codes |
| Language and channel | Does generated data overrepresent polished English/digital confidence? | Linguistic/channel review | Include approved multilingual or low-digital-confidence cases |

Bias decision categories:

| Category | Action |
| --- | --- |
| Acceptable for limited purpose | Release with documented limitations |
| Needs rebalance | Regenerate or add missing segments before release |
| Needs stronger warning | Release only for narrow test, not model training |
| Reject | Bias amplification creates unacceptable customer, conduct or model risk |

Template: allowed-use license

Dataset ID: SYN-PAY-FRAUD-SEQUENCE-2026-002
Version: v1.0
Status: Limited release
Approved purpose: Payment fraud rule and warning UX stress testing.
Allowed users: Fraud Risk Analytics, Payment Platform QA, named AI copilot test engineers.
Allowed environments: Enterprise non-production fraud sandbox only.
Allowed operations: Query, aggregate, run fraud rule simulations, run agent workflow tests, create test reports.
Prohibited operations: Production decisioning, customer scoring, external sharing, model pretraining, customer re-identification, linkage to production customer tables, employee performance analytics.
Derivative rule: All derivative samples, embeddings, reports and test fixtures inherit this license and must retain synthetic marker.
Watermark/provenance: Dataset metadata must retain synthetic=true, source_permission_id, generation_run_id and license_id.
Retention: Delete working copies after 90 days. Evidence packet retained under governance retention schedule.
Review trigger: New fraud typology source, vendor release request, model training request, privacy attack finding, incident, expiry.
Approvers: Business Owner, Data Owner, Privacy, Security, Model Risk, Architect.
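
To show how a license like this could be enforced in a pipeline rather than only read by people, here is a minimal deny-by-default sketch. Everything in it is hypothetical: the `License` dataclass, the environment and operation labels, and the `expires_on` date (the template states retention as "90 days", which this sketch assumes has been resolved to a concrete date). Only the four metadata keys come from the Watermark/provenance line above.

```python
from dataclasses import dataclass
from datetime import date

# Metadata keys the license template says must be retained.
REQUIRED_METADATA = ("synthetic", "source_permission_id", "generation_run_id", "license_id")


@dataclass(frozen=True)
class License:
    """Hypothetical machine-readable subset of an allowed-use license."""
    dataset_id: str
    allowed_environments: frozenset
    allowed_operations: frozenset
    prohibited_operations: frozenset
    expires_on: date  # assumption: retention resolved to a concrete date


def denial_reasons(lic, environment, operation, metadata, today):
    """Return the reasons a request should be denied; an empty list means allow."""
    reasons = []
    if today > lic.expires_on:
        reasons.append("license expired")
    if environment not in lic.allowed_environments:
        reasons.append("environment not allowed")
    # Deny by default: the operation must be listed as allowed and not prohibited.
    if operation in lic.prohibited_operations or operation not in lic.allowed_operations:
        reasons.append("operation not allowed")
    missing = [key for key in REQUIRED_METADATA if key not in metadata]
    if missing:
        reasons.append("missing metadata: " + ", ".join(missing))
    elif metadata["synthetic"] is not True:
        reasons.append("synthetic marker not set")
    return reasons


fraud_license = License(
    dataset_id="SYN-PAY-FRAUD-SEQUENCE-2026-002",
    allowed_environments=frozenset({"fraud-sandbox"}),
    allowed_operations=frozenset({"query", "aggregate", "rule_simulation"}),
    prohibited_operations=frozenset({"production_decisioning", "external_sharing"}),
    expires_on=date(2026, 9, 28),  # illustrative date only
)
```

Returning a list of reasons instead of a boolean keeps the denial auditable, which fits the evidence-first posture of the rest of this playbook.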

License table version:

| License field | Required wording standard |
| --- | --- |
| Approved purpose | One or two concrete uses, not broad “AI development” |
| Allowed users | Roles and teams; vendors named separately |
| Allowed environments | Named non-production / sandbox / approved external environment |
| Allowed operations | Read, query, transform, test, train, validate, export, embed, index, summarize |
| Prohibited operations | Re-identification, production decisioning, unrelated training, marketing, resale, external sharing |
| Derivative rule | Defines whether embeddings, prompts, labels, reports and transformed files inherit restrictions |
| Watermark/provenance | Required markers and metadata |
| Retention | Expiry, deletion proof, evidence retention |
| Review trigger | Events that force reapproval |

Template: release evidence packet

| Evidence item | G1 | G2 | G3 | G4 |
| --- | --- | --- | --- | --- |
| Use-case intake | Required | Required | Required | Required |
| Risk tier rationale | Required | Required | Required | Required |
| Source permission matrix | Summary | Required | Required | Required |
| Data classification | Required | Required | Required | Required |
| Generation plan and method | Summary | Required | Required | Required |
| Generator version / prompts / rules | Optional | Required | Required | Required |
| Privacy attack checklist | Basic scan | Required | Required with report | Required with report |
| Utility/fidelity scorecard | Basic SME review | Required | Required | Required |
| Bias/coverage review | Optional | Required for customer-like data | Required | Required |
| Data card | Required | Required | Required | Required |
| Allowed-use license | Required | Required | Required | Required |
| Watermark/provenance evidence | Recommended | Required | Required | Required |
| Security environment evidence | Optional | Required | Required | Required |
| Vendor/contract evidence | Not needed | If vendor involved | If vendor involved | Required |
| Residual risk owner | Optional | Required for exceptions | Required | Required |
| Retention/deletion plan | Required | Required | Required | Required |
| Monitoring plan | Optional | Required | Required | Required |
| Approver record | Required | Required | Required | Required |
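
The evidence matrix lends itself to an automated completeness check before the release decision. The sketch below encodes the table as data and lists the items still missing for a gate. The interpretation is an assumption of this example: "Summary", "Basic scan" and "Basic SME review" are treated as mandatory in some form, "Optional", "Recommended" and "Not needed" as non-blocking, and the three conditional entries are switched on by hypothetical flags.

```python
GATES = ("G1", "G2", "G3", "G4")
R = "Required"

# One row per evidence item; columns follow GATES order.
EVIDENCE_MATRIX = {
    "Use-case intake": (R, R, R, R),
    "Risk tier rationale": (R, R, R, R),
    "Source permission matrix": ("Summary", R, R, R),
    "Data classification": (R, R, R, R),
    "Generation plan and method": ("Summary", R, R, R),
    "Generator version / prompts / rules": ("Optional", R, R, R),
    "Privacy attack checklist": ("Basic scan", R, "Required with report", "Required with report"),
    "Utility/fidelity scorecard": ("Basic SME review", R, R, R),
    "Bias/coverage review": ("Optional", "Required for customer-like data", R, R),
    "Data card": (R, R, R, R),
    "Allowed-use license": (R, R, R, R),
    "Watermark/provenance evidence": ("Recommended", R, R, R),
    "Security environment evidence": ("Optional", R, R, R),
    "Vendor/contract evidence": ("Not needed", "If vendor involved", "If vendor involved", R),
    "Residual risk owner": ("Optional", "Required for exceptions", R, R),
    "Retention/deletion plan": (R, R, R, R),
    "Monitoring plan": ("Optional", R, R, R),
    "Approver record": (R, R, R, R),
}

NON_BLOCKING = {"Optional", "Recommended", "Not needed"}
# Conditional levels map to hypothetical boolean flags supplied by the caller.
CONDITIONAL = {
    "If vendor involved": "vendor_involved",
    "Required for exceptions": "has_exceptions",
    "Required for customer-like data": "customer_like",
}


def missing_evidence(gate, provided, **flags):
    """Return evidence items that block release at the given gate."""
    column = GATES.index(gate)
    missing = []
    for item, levels in EVIDENCE_MATRIX.items():
        level = levels[column]
        if level in NON_BLOCKING:
            continue
        if level in CONDITIONAL and not flags.get(CONDITIONAL[level], False):
            continue
        if item not in provided:
            missing.append(item)
    return missing
```

For example, `missing_evidence("G2", provided, vendor_involved=True)` adds the vendor/contract evidence to the blocking list when it has not been supplied. The check only confirms that artifacts exist, not that their content is adequate.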

Release decision log:

| Decision | When to use |
| --- | --- |
| Approved | All hard gates pass, limitations documented |
| Approved with restrictions | Minor gaps controlled by narrower use, shorter retention, smaller audience or stronger monitoring |
| Rework required | Utility/fidelity/bias/privacy evidence incomplete or fixable |
| Rejected | Privacy leakage, source permission failure, unacceptable bias, misleading utility or prohibited downstream use |

PM/BA/architecture questions

Product questions

| Question | What a strong answer shows |
| --- | --- |
| What business decision does this synthetic dataset support? | Clear purpose instead of generic AI experimentation |
| Who will consume it and what action will they take? | User, workflow and environment clarity |
| What would be harmed if the synthetic data is wrong? | Customer, operational, model, conduct and compliance impact |
| What is explicitly out of scope? | Prevention of purpose creep |
| What would make this dataset no longer fit for use? | Expiry, drift, policy change, source change |

BA questions

| Question | What a strong answer shows |
| --- | --- |
| Which process steps, exceptions and controls must be represented? | Real workflow grounding |
| Which business rules must never be violated? | Test oracle and rule validity |
| Which labels or expected outputs are authoritative? | Prevents synthetic label drift |
| What limitations must users see before consuming the dataset? | Data card discipline |
| Which stakeholders must approve changes to scope? | Governance routing |

Architecture questions

| Question | What a strong answer shows |
| --- | --- |
| Where is synthetic generation executed and logged? | Environment and evidence design |
| How are generator prompts, rules, seeds and versions managed? | Reproducibility and lineage |
| How does watermark/provenance survive transformations and embeddings? | Downstream control |
| Can synthetic data enter RAG indexes, agent tools or model training pipelines? | Typed access and license enforcement |
| How will unauthorized use be detected? | Access logs, catalog policy, pipeline checks |
| What is the deletion and renewal mechanism? | Retention operationalization |

Privacy/risk questions

| Question | What a strong answer shows |
| --- | --- |
| Can an attacker infer source membership, identity or sensitive attributes? | Privacy attack thinking |
| What quasi-identifiers or rare patterns remain? | Linkage risk awareness |
| Does the dataset amplify historical bias or false-positive patterns? | Fairness and conduct risk |
| Who accepts residual risk and when does it expire? | Accountability |
| What evidence would satisfy internal audit? | Replayable evidence packet |

Release checklist

Use this checklist before any G2+ release.

| # | Check | Pass condition |
| --- | --- | --- |
| 1 | Use case approved | Business purpose, target users, allowed/prohibited uses are documented |
| 2 | Risk tier assigned | G-level is justified by data class, use, audience and environment |
| 3 | Source permission complete | Data Owner, Privacy/Legal/Compliance as needed approve source use |
| 4 | Generation plan approved | Method, generator, prompts/rules, source windows and exclusions are recorded |
| 5 | Environment controlled | Generation and storage environment meets security requirements |
| 6 | Privacy attacks executed | Required tests pass or exceptions are remediated/restricted |
| 7 | Utility measured | Dataset supports the approved task with evidence |
| 8 | Fidelity measured | Relevant domain structure is preserved without unsafe copying |
| 9 | Bias/coverage reviewed | Coverage gaps and amplification risks are accepted or corrected |
| 10 | Data card complete | Purpose, source summary, methods, tests, limitations and owner are visible |
| 11 | License attached | Allowed uses, prohibited uses, derivative rules, retention and expiry are explicit |
| 12 | Watermark/provenance applied | Metadata markers and generation lineage are retained |
| 13 | Access controls configured | Only approved users/environments can access |
| 14 | Retention/deletion scheduled | Working copies, derivatives and evidence retention are defined |
| 15 | Release evidence packet stored | All approval and evidence artifacts are linked |
| 16 | Monitoring enabled | Access, use, exceptions and expiry are tracked |
| 17 | Reopen triggers defined | New use, vendor release, training use, attack finding, incident or expiry triggers review |

Hard-stop conditions:

  • Source permission is unclear or denied.
  • Direct identifiers or near-copies remain.
  • Membership inference, linkage or inversion risk exceeds threshold.
  • Utility evidence does not support requested use.
  • Bias review identifies unacceptable customer or conduct risk.
  • Downstream use includes production decisioning without explicit high-risk approval.
  • Vendor release lacks contract, retention and deletion evidence.

Executive narrative

1-minute version

Synthetic data can accelerate AI delivery, but only if it is governed as a data product. The main risk is not that the data is fake; the risk is that teams treat it as automatically safe, then use it for training, testing, vendor sharing or analytics beyond its permission boundary. Our architecture introduces risk-tiered release gates: source permission, generation controls, privacy attack testing, utility/fidelity scoring, bias review, data card, allowed-use license, provenance and retention. This lets us use synthetic AML, KYC, credit, fraud, contact center and complaint datasets safely enough for approved purposes while preserving audit evidence.

CTO / CDO version

The platform decision is to build a synthetic data control plane, not a loose generation toolkit. The control plane standardizes intake, generator registry, source permission, privacy attack workbench, scorecards, catalog, license enforcement, watermark/provenance and expiry. Engineering gets reusable datasets and faster test coverage. Data governance gets a record of what was generated, from which source, for which use, and under which restrictions. Risk teams get evidence that privacy, utility, fidelity and bias were measured before release. This reduces duplicated scripts, vendor sandbox sprawl and hidden training-data contamination.

CRO / Privacy version

Synthetic data does not remove privacy or conduct risk by definition. The release gate must prove that the dataset cannot reasonably expose real customers, rare cases, documents, calls or transaction paths, and that it will not be used outside the approved purpose. The evidence package includes source permission, membership inference and linkage testing, model inversion review where relevant, bias/coverage review, residual risk owner, retention and deletion plan. For high-risk uses, approval is conditional and time-bound.


Interview drills

Drill 1: “Is synthetic data automatically safe to use with vendors?”

30-second answer:

Synthetic data is not automatically safe. I would first check source permission, whether the vendor contract allows derivative data, and whether the dataset passed privacy attack testing. For vendor release I would require a G4 gate: data card, allowed-use license, watermark/provenance, access restrictions, retention period, deletion proof and explicit prohibition on model training or onward sharing unless approved.

2-minute expansion:

The key mistake is assuming that removing direct identifiers or generating records from a model eliminates privacy risk. Synthetic data can preserve rare transaction patterns, complaint narratives, KYC document artifacts or fraud graph structures that support linkage or membership inference. For vendor use, I would narrow the purpose, reduce fidelity where needed, test for nearest-neighbor similarity and PII leakage, confirm contractual restrictions, and require evidence that the vendor cannot reuse the data for unrelated model training. If the vendor needs higher fidelity, that is a risk trade-off requiring stronger controls, not an informal data share.

Drill 2: “How do you balance privacy, utility and fidelity?”

30-second answer:

I treat privacy, utility and fidelity as separate gate dimensions, not one averaged score. Privacy asks whether real people or cases can be inferred. Utility asks whether the dataset supports the approved task. Fidelity asks whether it preserves the relevant business structure without copying protected facts. Any hard failure can block or narrow release.

2-minute expansion:

For example, payment fraud simulation needs realistic transaction timing, device changes and merchant patterns, but copying exact rare fraud sequences can expose source cases. I would abstract typologies, generate plausible variants, suppress unique combinations, and run linkage and nearest-neighbor tests. Then I would measure utility through rule trigger coverage and investigator realism scores, and fidelity through sequence validity. If privacy is strong but utility is weak, the dataset may be suitable for demos but not rule validation. If utility is strong but privacy fails, it cannot be released until regenerated or restricted.

Drill 3: “What makes a synthetic dataset fit for credit model augmentation?”

30-second answer:

It needs source permission, model-risk approval, privacy attack evidence, policy-valid features, subgroup coverage, reason-code stability and clear limits. I would not let synthetic credit data become production training input unless Model Risk, Privacy, Data Owner and Business Owner approve the use and evidence shows it does not amplify bias or distort adverse action explanations.

2-minute expansion:

Credit is high impact because synthetic data can alter model boundaries and fairness behavior. I would define whether the dataset is for sensitivity analysis, challenger testing, rejected-inference exploration or actual training. The scorecard would include distribution fidelity, business rule validity, subgroup performance, proxy-feature review, label provenance, adverse action reason stability and membership/linkage testing. The license would prohibit production scoring and unrelated use unless explicitly approved. This protects the organization from treating synthetic scenarios as ground truth.

Drill 4: “How would you govern contact center transcript generation?”

30-second answer:

I would avoid sending raw transcripts to an uncontrolled LLM. I would use approved intent labels, redacted summaries and policy constraints to generate synthetic conversations, then run PII scans, phrase similarity checks, vulnerability-language review and utility testing against copilot QA or intent classification tasks. The data card must say it is synthetic and cannot be used as real customer evidence.

2-minute expansion:

Contact center transcripts contain authentication phrases, account details, hardship signals, health or vulnerability disclosures and employee behavior. The generator should operate in a controlled environment using minimized source features. Utility should be measured by intent classification accuracy, escalation coverage and SME realism. Bias review should check that the synthetic language does not stereotype customers with low digital confidence, accents, vulnerability markers or complaint intensity. The license should prohibit employee performance analytics, sentiment profiling beyond approved QA, and any attempt to reconstruct real calls.

Drill 5: “What evidence would you show internal audit?”

30-second answer:

I would show the intake, source permission matrix, data classification, generation plan, privacy attack report, utility/fidelity scorecard, bias review, data card, allowed-use license, watermark/provenance evidence, approval record, residual risk owner, retention plan and access logs. The goal is to replay why the dataset was created, why it was safe enough for that use, who approved it and how misuse is controlled.

2-minute expansion:

Audit does not need a story that “synthetic means safe”; it needs evidence. I would connect the dataset ID to PRD, ADR, release gate and downstream systems. The evidence packet should show source owners approved the purpose, privacy tests were run, utility was measured against the actual task, limitations were disclosed, and access was limited to approved environments. If the dataset was shared externally, I would add contract terms, deletion proof and vendor access logs. If it was used for model training or validation, I would include model-risk sign-off and monitoring thresholds.


Source anchors

| Source | Link | How to use this anchor |
| --- | --- | --- |
| UK ICO synthetic data guidance | https://ico.org.uk/about-the-ico/research-reports-impact-and-evaluation/research-and-reports/technology-and-innovation/synthetic-data/ | Ground synthetic data privacy, utility and governance discussion |
| NIST Privacy Framework | https://www.nist.gov/privacy-framework | Structure privacy risk management, controls and evidence |
| NIST AI RMF | https://www.nist.gov/itl/ai-risk-management-framework | Map synthetic data risks to AI governance, measurement and management |
| ISO/IEC 42001 | https://www.iso.org/standard/81230.html | Align with AI management system operations, roles and continual improvement |
| ISO/IEC 23894 | https://www.iso.org/standard/77304.html | Align with AI risk management lifecycle |
| W3C PROV | https://www.w3.org/TR/prov-overview/ | Represent synthetic data provenance through entities, activities and agents |