# Synthetic Data Generation for Testing
One-sentence definition: Creating artificial datasets that mimic real data without exposing sensitive information.
## Key Facts
- Replaces production data in lower (dev/test/QA) environments.
- Techniques: rule-based, statistical, ML-based generators.
- Preserve distributions while removing direct identifiers.
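- Validate privacy risk (e.g., re-identification, or synthetic rows that reproduce real records); document generation recipes so datasets can be reproduced.
- Balance utility vs. privacy; monitor drift between synthetic and production data distributions.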
- **Verify:** check official (ISC)² CBK and current exam outline.
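A minimal sketch of how rule-based and statistical generation can be combined, under stated assumptions: the `customer_id`, `email`, and `age` fields, the 18-90 age range, and both function names are hypothetical and chosen only for illustration. Identifiers are produced by rules and never copied from real records, while the age column is sampled from a normal distribution fitted to the real column. Real data is often not normally distributed, so treat this as a starting point rather than a recommended recipe.

```python
import random
import statistics


def fit_stats(real_ages):
    """Estimate the mean and sample standard deviation of a real column."""
    return statistics.mean(real_ages), statistics.stdev(real_ages)


def generate_customers(n, age_mean, age_stdev, seed=0):
    """Generate n synthetic customer rows (hypothetical schema)."""
    rng = random.Random(seed)  # fixed seed keeps the recipe reproducible
    rows = []
    for i in range(n):
        # Statistical step: sample age from the fitted normal distribution.
        age = round(rng.gauss(age_mean, age_stdev))
        # Rule-based step: clamp to an assumed valid range.
        age = max(18, min(90, age))
        rows.append({
            # Rule-based identifiers: sequential and obviously artificial.
            "customer_id": f"SYN-{i:06d}",
            # The .test top-level domain is reserved, so no mail can be delivered.
            "email": f"user{i}@example.test",
            "age": age,
        })
    return rows


if __name__ == "__main__":
    mean, stdev = fit_stats([25, 35, 45])  # toy stand-in for a real column
    for row in generate_customers(3, mean, stdev):
        print(row)
```

Recording the seed and the fitted parameters alongside the script is one way to "document the generation recipe".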
## Exam Relevance
- Select synthetic data to meet dev/test needs safely.
**Mnemonic:** “Looks real, isn’t.”
## Mini Scenario
Q: QA needs realistic-looking PII for test cases. What is the solution?
A: Synthetic datasets aligned to schema and distributions.
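One way to check a synthetic dataset before handing it to QA, sketched under assumptions: the helper names, the `zip` and `age` columns, and the sample rows are all made up for illustration. The first check counts synthetic rows that exactly reproduce a real row on chosen quasi-identifiers (a simple privacy signal), and the second compares column means as a crude utility and drift signal. Neither check is a full privacy assessment.

```python
import statistics


def exact_match_count(real_rows, synthetic_rows, keys):
    """Count synthetic rows whose values on `keys` equal those of some real row."""
    real_keys = {tuple(r[k] for k in keys) for r in real_rows}
    return sum(
        1 for s in synthetic_rows
        if tuple(s[k] for k in keys) in real_keys
    )


def mean_drift(real_values, synthetic_values):
    """Absolute difference between the means of a real and a synthetic column."""
    return abs(statistics.mean(real_values) - statistics.mean(synthetic_values))


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    real = [{"zip": "10001", "age": 34}, {"zip": "94105", "age": 51}]
    synthetic = [{"zip": "10001", "age": 34}, {"zip": "60601", "age": 29}]

    leaked = exact_match_count(real, synthetic, ["zip", "age"])
    drift = mean_drift([r["age"] for r in real], [s["age"] for s in synthetic])
    print(f"exact matches: {leaked}, age mean drift: {drift}")
```

A nonzero match count suggests the generator may be reproducing real records, and a large mean drift suggests the synthetic column no longer reflects the real one; acceptable thresholds for both are a policy decision.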
## Revision Checklist
- Name two generation methods.
- State one validation step.
- Tie to privacy/risk benefits.
## Related
[[Data Masking and Redaction]] · [[Pseudonymization vs Anonymization]] · [[Tokenization]] · [[Data Quality and Integrity Controls]] · [[Data Warehouse and Data Lake Security]] · [[Domain 2 - Index]]