Synthetic Data: Examples – Realistic – using AI (SYNDERAI), pronounced /ˈsɪn.də.raɪ/
© HL7 Europe | Main Contributor: Dr. Kai U. Heitmann | Privacy Policy • LGPL-3.0 license
The xSHARE Project workpage contains deliverables such as the toolbox (D3.3) in Work Package 3 that arranges tools that support proper implementation of the European EHRxF, or the X-bundles deliverable (D2.8) in Work Package 2.
For this set of deliverables, Synthetic Example Data is created on the base of the EEHRxF specifications. The purpose of Synthetic Example Data is seen in testing and validation (e.g. industry proofs, connect-a-thons etc.) as well as in education and further implementation support for vendors.
Granular facts for Synthetic Example Data are subject to real medical knowledge. Create data just randomly may result in very unlikely data from a clinical perspective such as a child with a myocardial infarction or a grandmother with Type 1 Diabetes. Therefor designing examples need to be as close as possible to real medical workflows, as if they would be able to support care.
HL7 Europe started this effort with a focus on Synthetic Example Data with a “real” medical background – as if from or for “real” care – but “invented” matching patient demographics. This is done using several sources of generated data, amalgamating it with additional localized data. For that purpose, the (invented) geo location of the synthetic patients and the a "close-by" primary care provider or hospital, see figure 1) as well as statistical methods that are also used for clinical trials (stratification [1] of subjects) were used.
![]() |
|---|
| Figure 1: Geo-Localization of “patients” and “providers” of Synthetic Example Realistic Data. The example data is a randomized amalgamation of synthetic sources, bringing stratification and other statistical methods into play. |
In addition, Artificial Intelligence (AI) technologies is used to supplement data for the created strata and synthetic clinical story. Supportive data and narrative / human readable text is generated by Artificial Intelligence (AI) once given precise instructions and prompting (e.g. granular data as the source). This is applied to several areas such as tabular data (lab, meds) and (semi-structured) text for the human target (e.g. Hospital Discharge Report), or just for clinically relevant reference ranges for lab results, medication dosage information, etc. However, AI was only applied in "low dose" to selected areas of the example creation, not used to create complete FHIR instances. SYNDERAI uses the OpenAI API.
The Synthetic Example Realistic Data and AI (SYNERDAI) methodology emitted the first 200 HL7 Europe Laboratory Report (EU-Lab) in October 2024 already, based on the HL7 Europe Laboratory Report FHIR Implementation Guide [2] LAB which is the implementation specification of the eHN Laboratory Result Guidelines [3]. The other areas of specification will be submitted as a follow-up covering the Hospital Discharge Report (HDR), the European Patient Summary (EPS) and others. The instances are available publicly through the SYNDERAI website. The sets are also registered on zenodo [4].
The generation of Synthetic Example Data was also combined with a reference implementation of Visualization of the Synthetic Example Instances. For more information on this activity, refer to the vi7eti website [5] and the GitHub repository [6].
The overall goal is to publish at least 1,000 Synthetic Patients with Lab Report instances (LAB) with multiple reports per patient over time, 1,000+ Synthetic European Patient Summaries (EPS) and several Hospital Discharge Reports (HDR) in greater variability (lot of human text vs. more tabular data). Work is underway for the provision of Synthetic Imaging Reports (IMG). Finally, there will be a big run for creating 25,000 European Patient Summaries (EPS) in fulfilment of the 25tipster (25 thousand IPS test records) initiative as of 2020 by Kai Heitmann, that built the cornerstones of the SYNDERAI methodology.
[1] Stratification of clinical trials is the partitioning of subjects and results by a factor other than the treatment given. – see Wikipedia https://en.wikipedia.org/wiki/Stratification_(clinical_trials)
[2] See https://build.fhir.org/ig/hl7-eu/laboratory/
[3] See https://health.ec.europa.eu/publications/ehn-laboratory-result-guidelines_en
[4] See SYNDERAI registration at Zenodo https://zenodo.org/records/16792934
[5] See https://vi7eti.net