Thales and Faculty evaluate synthetic data assurance for Defence AI
Synthetic data is often used to expand training and test datasets, and to represent conditions that are hard to capture at scale. The risk is that synthetic and real data can differ in ways that materially affect model behaviour. If those differences are not identified and managed, models can perform well in controlled testing and then behave differently when deployed.
The white paper argues that AI assurance depends on data assurance. If synthetic data is part of a training or evaluation pipeline, teams need evidence that the synthetic data is representative and fit for purpose.
The paper proposes a repeatable approach to assessing synthetic data quality, combining statistical analysis with model-based diagnostics. This helps teams identify where synthetic and real data diverge, and understand what those differences mean for model performance.
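The white paper does not publish its method in code, but the statistical side of such a check can be illustrated with a standard two-sample test. The sketch below, which is an assumption rather than the paper's actual procedure, compares one feature's distribution between real and synthetic samples using a Kolmogorov-Smirnov test; the data, feature names, and significance threshold are all invented for illustration.

```python
# Illustrative sketch (not the white paper's method): flag divergence
# between real and synthetic samples of a single feature with a
# two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Stand-ins for one feature drawn from real data and from a synthetic
# generator whose distribution is slightly shifted and wider.
real = rng.normal(loc=0.0, scale=1.0, size=2000)
synthetic = rng.normal(loc=0.3, scale=1.2, size=2000)

stat, p_value = ks_2samp(real, synthetic)
if p_value < 0.01:  # threshold is an arbitrary choice for the example
    print(f"Divergence detected: KS statistic={stat:.3f}, p={p_value:.2e}")
else:
    print("No significant divergence detected for this feature")
```

In practice a check like this would run per feature across the dataset, with the results reviewed alongside model-based diagnostics rather than used as a pass/fail gate on their own.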
It also highlights a common failure mode: models can learn patterns that separate real data from synthetic data rather than learning the task itself. The paper describes how this can be detected using embedding analysis and related diagnostics, so teams can address issues earlier in development.
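One way this failure mode is commonly probed, sketched below as an assumption rather than the paper's specific diagnostic, is a classifier two-sample test: train a simple classifier to distinguish real from synthetic examples in an embedding space. Accuracy near chance suggests the two sources are hard to tell apart; accuracy well above chance means a separating signal exists that a task model could latch onto. The embeddings here are synthetic stand-ins.

```python
# Illustrative sketch (an assumed diagnostic, not the paper's): if a
# simple classifier can separate real from synthetic embeddings, a task
# model could learn that same "real vs synthetic" shortcut.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Stand-in 16-dimensional embeddings; the synthetic set carries a
# deliberate mean offset to mimic a generator artefact.
real_emb = rng.normal(0.0, 1.0, size=(500, 16))
synth_emb = rng.normal(0.5, 1.0, size=(500, 16))

X = np.vstack([real_emb, synth_emb])
y = np.array([0] * 500 + [1] * 500)  # 0 = real, 1 = synthetic

# Cross-validated accuracy near 0.5 means the sources are hard to
# separate; well above 0.5 signals a detectable divergence.
acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
print(f"real-vs-synthetic classification accuracy: {acc:.2f}")
```

A high score here does not say which features drive the separation; inspecting the classifier's weights or per-feature statistics would be the natural follow-up before retraining the generator.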
Ajay Chakravarthy, Chief AI Officer at Thales in the UK, said: “Defence and security organisations are moving quickly to adopt AI, but confidence still depends on evidence. This work is focused on what you can measure and test when synthetic data is part of the training and evaluation story. It is a practical contribution to proving AI systems are fit for purpose.”
