Methods for data de-identification - an overview of EU projects

Data de-identification in medical imaging

Data de-identification in medical imaging remains an important issue because the data are highly privacy-sensitive and are used in large volumes. But when is data considered anonymous or pseudonymous, and which methods are used in research?

Researchers examined five European projects - PRIMAGE, CHAIMELEON, ProCAncer-I, INCISIVE and EuCanImage - and detailed their approaches to de-identification and the challenges they faced. The study found that de-identification methods varied significantly across the projects, with each using a combination of anonymisation and pseudonymisation techniques. The most conservative approach modified over 1300 data tags, ensuring patient privacy while maintaining the utility of the data for AI development.
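To make the distinction between the two techniques concrete, here is a minimal Python sketch of tag-level de-identification. It is an illustration only and is not taken from any of the five projects: the tag names, the `REMOVE` and `PSEUDONYMISE` policy sets, and the `deidentify` and `pseudonym` helpers are all hypothetical, and a plain dictionary stands in for an image header.

```python
import hashlib
import hmac

# Hypothetical tag policy for illustration; not the policy of any project above.
REMOVE = {"PatientName", "PatientBirthDate", "InstitutionName"}  # anonymise: drop the value
PSEUDONYMISE = {"PatientID"}                                     # replace with a keyed hash


def pseudonym(value: str, secret_key: bytes) -> str:
    """Return a stable keyed hash, so the same input always maps to the same pseudonym."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record with the tag policy applied."""
    cleaned = {}
    for tag, value in record.items():
        if tag in REMOVE:
            continue  # the identifying value is discarded entirely
        if tag in PSEUDONYMISE:
            cleaned[tag] = pseudonym(str(value), secret_key)
        else:
            cleaned[tag] = value  # non-identifying tags are kept for data utility
    return cleaned


if __name__ == "__main__":
    header = {"PatientName": "Doe^Jane", "PatientID": "12345", "Modality": "MR"}
    print(deidentify(header, secret_key=b"example-key"))
```

In this sketch, removed tags cannot be recovered from the output, whereas a pseudonymised identifier stays consistent across studies and can be re-linked by whoever holds the key. That is why pseudonymised data is generally still treated as personal data under the GDPR.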

The different approaches presented in this research provide valuable guidance for future AI projects, and highlight the importance of robust de-identification processes when creating GDPR-compliant medical imaging datasets.


Documenting the de-identification process of clinical and imaging data for AI for health imaging projects

Insights into Imaging, 2024

Abstract

Artificial intelligence (AI) is revolutionizing the field of medical imaging, holding the potential to shift medicine from a reactive “sick-care” approach to a proactive focus on healthcare and prevention. The successful development of AI in this domain relies on access to large, comprehensive, and standardized real-world datasets that accurately represent diverse populations and diseases. However, images and data are sensitive, and as such, before using them in any way the data needs to be modified to protect the privacy of the patients. This paper explores the approaches in the domain of five EU projects working on the creation of ethically compliant and GDPR-regulated European medical imaging platforms, focused on cancer-related data. It presents the individual approaches to the de-identification of imaging data, and describes the problems and the solutions adopted in each case. Further, lessons learned are provided, enabling future projects to optimally handle the problem of data de-identification.