Synthetic dataset

A synthetic dataset that mimics a part of the Netherlands Cancer Registry (NCR) is available for research purposes. This dataset does not contain data on real patients. It enables researchers to use record-level cancer data safely, while knowing that there is no risk of breaching patient confidentiality.

The synthetic data mimics the structure and some of the statistical patterns of the NCR. It gives a good impression of the data available in the NCR for researchers who wish to apply for NCR data. The synthetic dataset can help researchers decide what data they need to answer their research question and, in turn, how to complete the data application form.

Synthetic data can also be used for the development of software and analysis scripts while still producing realistic results. We emphasize that these results are based on synthetic data: they cannot be used for clinical decision-making, and they should not be used as cancer statistics or in scientific publications (other than on methodological merits).

This dataset will enable more people to derive insights from cancer data, while at the same time protecting the privacy of patients.

Release information

This first version of the synthetic data contains a subset of the items we register for breast cancer patients. We plan to include other tumour types in future releases. Releasing in stages allows us to use your feedback and ensure that future releases fit your requirements.

For information on all items registered in the NCR, please check the NCR Catalogue and the item sets.

Different versions of the synthetic dataset

The synthetic dataset is available in several versions.
The regular dataset is supplied with the variables in the standard NCR format. Alternatively, you can receive a version that follows the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM).
There are two variants of the OMOP-CDM version. You can indicate on the application form which version(s) you would like to receive:
•    Regular dataset
•    OMOP-CDM dataset where you convert the regular dataset to OMOP-CDM format online yourself
•    OMOP-CDM dataset where you receive the data directly in OMOP-CDM format

Request the dataset

To request the NCR synthetic dataset (version October 2021), please complete the request form and send it as an email attachment to gegevensaanvraag@iknl.nl

Feedback

If you have any questions or feedback about the synthetic data, feel free to contact us by email: gegevensaanvraag@iknl.nl

Terms and conditions for your request for the synthetic data set

Article 1

These terms and conditions apply to the provision of the NCR synthetic data set as requested by you.

Article 2

After receiving the request form, IKNL will check whether the form has been completed in full. You must complete all the required fields. IKNL will not process any incomplete requests.

Article 3

All possible intellectual property rights to the synthetic dataset, including database rights, belong to IKNL.

IKNL is granting you a non-exclusive and non-transferable right of use to the synthetic dataset. This right of use will apply for an unlimited period of time.

If you act contrary to these terms and conditions or commit an act of infringement in any other way, IKNL will be entitled to terminate the right of use granted to you. In that case you must end your use of the NCR synthetic data set immediately after notification by IKNL.

The right of use granted by IKNL will only apply to you, and possibly to the people in your organisation who are involved in the work performed within the scope of the purpose described in the request form.

Sub-licensing of the right of use granted is not permitted. In other words: you cannot share the synthetic data set with any other parties and you cannot exploit it commercially under any circumstances.

Article 4

You will handle the synthetic data provided by IKNL with care, and you will adhere to the applicable legislation and codes of conduct.

You will only use the synthetic data for the purposes mentioned in the request form. If you wish to use the synthetic data for a new application, we would like to be informed about this additional purpose as well.

Article 5

The lead times communicated by IKNL are indicative in nature and should be construed as an obligation of effort.

Article 6

You are aware that the data set consists of simulated data. If you are using the synthetic data for the development of software or methods of analysis with realistic results, you cannot use these results for clinical decision-making or for any scientific publications about cancer.

Making changes to the synthetic data set is permitted. Publishing of the synthetic data – whether or not they have been edited – is not permitted.

Article 7

IKNL is making an effort to ensure that the synthetic data set is as accurate as possible, but cannot guarantee complete accuracy.

IKNL will not be liable for the provision of any incorrect or incomplete synthetic data.

IKNL will not be responsible and/or liable for any damage incurred by you and/or any third parties which directly or indirectly arises from or occurs in connection with the provision of the synthetic data set, unless this damage arises from a failure by IKNL in the performance of its obligations arising from the agreements that exist between you and IKNL, including these terms and conditions.

The total liability of IKNL due to an attributable failure in the performance of the agreements or for any other reason, including any guarantee and/or indemnity obligations, will be limited to compensation of direct damage up to an amount that is equal to the amount paid by the insurance.

FAQ - Synthetic dataset NCR

Does the synthetic data contain information of real patients?

No, none of the records in the synthetic data are linked to real patients. Instead, records in the synthetic data are generated based on statistical distributions. Should you find any similarities with individual patients, these are purely coincidental.

What can the synthetic data be used for?

The synthetic data gives insight into the items that are registered in the NCR, which can help with the submission of a data request. You can also perform experiments or develop software on the synthetic data while waiting for the data request to be approved. Alternatively, code developed using the synthetic data can safely be tested on the real data by IKNL employees, without the developer needing to see real patient information.

Can I send the code that I developed using the synthetic data to run it on the real data?

Yes, researchers can share their code with IKNL. If they would like to test the code on the real data and receive the results, they need to follow the data usage request process for real data. If the request is approved, the results can be shared, provided they do not contain any sensitive information.

How do I cite the synthetic dataset?

Be aware that the synthetic data cannot be used for clinical studies, as the results would be based on synthetic data. However, if the synthetic data is used for exploration or software development, you can cite it as follows:

Synthetic dataset Netherlands Cancer Registry (NCR), Netherlands Comprehensive Cancer Organisation (IKNL). https://iknl.nl/en/ncr
