# Synthetic Data Generation for Testing

One-sentence definition: Creating artificial datasets that mimic real data without exposing sensitive information.

## Key Facts
- Replaces the use of production data in lower (dev/test) environments.
- Techniques: rule-based, statistical, and ML-based generators.
- Preserve distributions while removing direct identifiers.
- Validate privacy risk (e.g., re-identification); document generation recipes.
- Balance utility vs privacy; monitor drift.
- **Verify:** check official (ISC)² CBK and current exam outline.

## Exam Relevance
- Select synthetic data to meet dev/test needs safely.

**Mnemonic:** “Looks real, isn’t.”

## Mini Scenario
Q: QA needs realistic PII for testing. What is the solution?

A: Synthetic datasets aligned to the production schema and distributions.

## Revision Checklist
- Name two generation methods.
- State one validation step.
- Tie to privacy/risk benefits.

## Related
[[Data Masking and Redaction]] · [[Pseudonymization vs Anonymization]] · [[Tokenization]] · [[Data Quality and Integrity Controls]] · [[Data Warehouse and Data Lake Security]] · [[Domain 2 - Index]]
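
## Example Sketch
The Key Facts above name rule-based and statistical generators and call for a validation step. A minimal Python sketch of that idea follows, assuming a tiny in-memory table. `REAL_ROWS`, `DIRECT_IDENTIFIERS`, `fit`, `generate`, and `leaked_identifiers` are hypothetical names chosen for illustration; they do not come from any library or from the CBK, and all sample values are invented.

```python
import random
import statistics

# Hypothetical stand-in for production rows; every value here is invented.
REAL_ROWS = [
    {"name": "Alice Ames", "email": "alice@corp.example", "age": 34, "plan": "gold"},
    {"name": "Bob Byrne", "email": "bob@corp.example", "age": 41, "plan": "basic"},
    {"name": "Cara Cole", "email": "cara@corp.example", "age": 29, "plan": "basic"},
    {"name": "Dev Dutt", "email": "dev@corp.example", "age": 52, "plan": "gold"},
    {"name": "Eve Eng", "email": "eve@corp.example", "age": 38, "plan": "basic"},
    {"name": "Finn Fox", "email": "finn@corp.example", "age": 45, "plan": "basic"},
]

# Columns treated as direct identifiers (an assumption for this sketch).
DIRECT_IDENTIFIERS = ("name", "email")


def fit(rows):
    """Statistical step: learn simple per-column summaries from the real data."""
    ages = [r["age"] for r in rows]
    plans = [r["plan"] for r in rows]
    plan_values = sorted(set(plans))
    return {
        "age_mean": statistics.mean(ages),
        "age_stdev": statistics.stdev(ages),
        "plan_values": plan_values,
        "plan_weights": [plans.count(p) for p in plan_values],
    }


def generate(model, n, seed=0):
    """Build n synthetic rows with the same schema as the real rows."""
    rng = random.Random(seed)  # seeded so the recipe is reproducible
    rows = []
    for i in range(n):
        # Statistical: sample age from a normal fit, clamped to a plausible range.
        age = round(rng.gauss(model["age_mean"], model["age_stdev"]))
        age = max(18, min(90, age))
        # Statistical: sample plan in proportion to its observed frequency.
        plan = rng.choices(model["plan_values"], weights=model["plan_weights"])[0]
        rows.append({
            # Rule-based: identifiers are fabricated, never derived from real values.
            "name": f"Test User {i:04d}",
            "email": f"user{i:04d}@example.test",
            "age": age,
            "plan": plan,
        })
    return rows


def leaked_identifiers(real, synthetic):
    """Validation step: (column, value) pairs of direct identifiers found in both sets."""
    leaks = set()
    for col in DIRECT_IDENTIFIERS:
        real_values = {r[col] for r in real}
        leaks |= {(col, s[col]) for s in synthetic if s[col] in real_values}
    return leaks


if __name__ == "__main__":
    model = fit(REAL_ROWS)
    synthetic = generate(model, n=1000)
    print("leaked identifiers:", leaked_identifiers(REAL_ROWS, synthetic))
    print("real mean age:", statistics.mean(r["age"] for r in REAL_ROWS))
    print("synthetic mean age:", statistics.mean(r["age"] for r in synthetic))
```

This sketch samples each column independently, so it does not preserve correlations between columns, and an empty leak set is only one basic check, not a full privacy-risk assessment.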