DICOM Anonymization

DICOM anonymization is the process of removing or replacing protected health information (PHI) from medical image files before they are transferred outside a clinical care environment — to a research institution, central imaging platform, contract research organisation, or regulatory agency. It protects patient privacy and is required by applicable privacy regulations — including HIPAA for covered entities making regulated disclosures, and GDPR for data involving EU subjects — before clinical trial imaging data can be transferred outside the care setting. Depending on the modality and workflow, a compliant process must address the DICOM metadata header and, where burned-in pixel PHI is present, the image pixel data as well.

What is DICOM anonymization?

DICOM anonymization — also called DICOM de-identification or PHI removal from DICOM — is the process of removing or replacing protected health information from DICOM medical image files before those files are transferred outside a clinical care setting.

DICOM (Digital Imaging and Communications in Medicine) is the universal standard for storing, transmitting, and displaying medical imaging data. Every DICOM file contains two categories of data: a structured metadata header with hundreds of tag fields, many of which are populated with patient-identifying information at the time of image acquisition, and the image pixel data itself.

Anonymization in clinical trials serves two purposes simultaneously: it protects patient privacy by removing identifying information from data that will leave the clinical site, and it satisfies the regulatory requirements that govern the transfer of clinical trial data under HIPAA, GDPR, and GCP. Both purposes must be met before any trial imaging data is transmitted to a sponsor, CRO, central imaging platform, or regulatory agency.

Anonymization vs. de-identification — what is the difference?

In everyday clinical and research usage, anonymization and de-identification refer to the same process. Under privacy law, however, there is a technical distinction that matters for multinational trials.

Note: This section describes regulatory frameworks for educational purposes. Sponsors should confirm their specific anonymization requirements with their regulatory and legal advisors before implementing a trial workflow.

De-identification (HIPAA)

HIPAA's Safe Harbor method requires removal of 18 specific categories of identifier, including patient name, date of birth, geographic data, dates of service, and device identifiers, enumerated in 45 CFR §164.514(b).¹ Once these identifiers are removed, and provided the covered entity has no actual knowledge that the remaining information could identify the individual, the data is no longer considered protected health information under HIPAA and falls outside the Privacy Rule's restrictions.

This is essentially a defined checklist: remove the specified identifiers, confirm there is no actual knowledge of residual identifiability, and the legal standard is met. HIPAA also provides an Expert Determination method, in which a qualified expert applies statistical and scientific methods to determine that re-identification risk is very small, but the Safe Harbor method is more commonly used in imaging trials.

Anonymization (GDPR)

GDPR does not provide a fixed list of identifiers to remove. Instead, Recital 26, read with the definition of personal data in Article 4(1), establishes a functional test: data is anonymous only if the individual cannot be identified by any means reasonably likely to be used, whether by the data controller or by any other person.² This is a higher and less precisely defined standard than HIPAA's Safe Harbor method.

The practical implication is that de-identified data under HIPAA may not satisfy GDPR anonymization. Combinations of indirect identifiers — scan date, approximate age, institution, and rare disease status — can in some cases allow re-identification even after the 18 HIPAA identifiers are removed. For trials collecting imaging data from both US and EU sites, applying the stricter GDPR functional test to the full dataset is the standard approach.
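The indirect identifiers named above (scan date, approximate age) are commonly reduced with date shifting and age banding. The Python sketch below is illustrative only: the secret key, the 180-day offset range, the 5-year band width, and the function names are all assumptions made for this example, not values taken from any regulation or product.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical project secret; a real deployment would store this securely.
SECRET_KEY = b"example-trial-secret"
MAX_SHIFT_DAYS = 180  # assumed range: each subject is shifted back 1..180 days


def subject_offset(subject_id: str) -> int:
    """Derive a stable per-subject day offset from a keyed hash."""
    digest = hmac.new(SECRET_KEY, subject_id.encode(), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % MAX_SHIFT_DAYS + 1


def shift_date(dicom_date: str, subject_id: str) -> str:
    """Shift a DICOM date string (YYYYMMDD) back by the subject's offset."""
    original = date(int(dicom_date[:4]), int(dicom_date[4:6]), int(dicom_date[6:8]))
    shifted = original - timedelta(days=subject_offset(subject_id))
    return shifted.strftime("%Y%m%d")


def band_age(age_years: int, width: int = 5) -> str:
    """Replace an exact age with a band such as '55-59'."""
    low = (age_years // width) * width
    return f"{low}-{low + width - 1}"
```

Because the offset is derived per subject rather than per file, the interval between two scans of the same subject is preserved, which longitudinal analyses usually need. Whether a given offset range or band width is sufficient is a question for the sponsor's legal and regulatory advisors.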

What PHI is in a DICOM file?

DICOM files contain more PHI than most clinical teams expect. Understanding the two categories is essential because they require different removal approaches.

Header tag PHI

The DICOM standard defines thousands of metadata attributes, far more than most clinical teams are aware of. While most are technical parameters such as field strength, slice thickness, and acquisition sequence, many are populated with patient-identifying information by the scanner or PACS at acquisition.

Common header PHI includes:

PatientName
PatientBirthDate
PatientID
AccessionNumber
ReferringPhysicianName
InstitutionName
StudyDate
OperatorsName

The DICOM standard's Attribute Confidentiality Profiles, introduced by Supplement 142 and now maintained in PS3.15 Annex E, specify which tags must be removed, replaced, or modified to achieve compliant de-identification for different use cases.³ Header tag PHI can be identified and removed systematically by any software that can read and write DICOM metadata.
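As a rough illustration of profile-driven header cleaning, the Python sketch below applies a small action table to a header. It is a simplification under stated assumptions: a plain dictionary stands in for a parsed DICOM header, the `ACTIONS` table and the `deidentify_header` function are hypothetical, and the actions chosen per attribute are examples rather than the actions the DICOM profile prescribes.

```python
from typing import Dict

# Illustrative action table only; a real profile covers far more attributes.
# "remove" deletes the attribute, "empty" keeps it with no value,
# "replace" substitutes a trial-assigned pseudonym.
ACTIONS: Dict[str, str] = {
    "PatientName": "replace",
    "PatientID": "replace",
    "PatientBirthDate": "empty",
    "AccessionNumber": "empty",
    "ReferringPhysicianName": "empty",
    "InstitutionName": "remove",
    "StudyDate": "empty",
    "OperatorsName": "remove",
}


def deidentify_header(header: Dict[str, str], pseudonym: str) -> Dict[str, str]:
    """Return a copy of the header with the action table applied."""
    cleaned = dict(header)  # leave the caller's header untouched
    for keyword, action in ACTIONS.items():
        if keyword not in cleaned:
            continue
        if action == "remove":
            del cleaned[keyword]
        elif action == "empty":
            cleaned[keyword] = ""
        elif action == "replace":
            cleaned[keyword] = pseudonym
    return cleaned
```

Attributes that are not in the table, such as slice thickness, pass through unchanged. A production tool would read and write real DICOM files with a DICOM library and would take its action table from the applicable confidentiality profile.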

Burned-in pixel PHI

Some imaging modalities — particularly older ultrasound, fluoroscopy, cine, and some endoscopy systems — embed patient-identifying text directly into the image pixel data as a visible overlay. This is called burned-in pixel PHI, and it cannot be removed by stripping header tags because it is part of the image itself.

Why burned-in pixel PHI is harder to detect than header PHI: header PHI sits in structured, machine-readable fields that a parser can locate and read by tag name. Burned-in pixel PHI is unstructured — it is rendered text within a raster image, indistinguishable from clinical image content at the data level. Detecting it requires pattern recognition rather than tag lookup: the system must visually parse the image to find text characters and determine whether they represent identifying information.

Detection approaches include optical character recognition (OCR), deep learning-based text detection models, and region-of-interest template methods that apply redaction to image areas where overlays are typically placed for a given modality and vendor. No single approach detects all cases; production implementations typically combine methods.
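Of the approaches above, template-based redaction is the simplest to sketch. The Python example below blanks fixed rectangles in a single frame. It assumes a frame is a list of rows of grayscale values; the `OVERLAY_TEMPLATES` table, its coordinates, and the vendor name are invented for illustration and do not describe any real scanner.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]              # rows of grayscale pixel values
Region = Tuple[int, int, int, int]   # (top, left, height, width)

# Hypothetical template table keyed by (modality, vendor).
OVERLAY_TEMPLATES: Dict[Tuple[str, str], List[Region]] = {
    ("US", "ExampleVendor"): [(0, 0, 2, 4)],
}


def redact_frame(frame: Frame, regions: List[Region], fill: int = 0) -> Frame:
    """Return a copy of the frame with each region overwritten by `fill`."""
    redacted = [row[:] for row in frame]
    for top, left, height, width in regions:
        # Clamp to the frame bounds so an oversized template cannot fail.
        for r in range(top, min(top + height, len(redacted))):
            row = redacted[r]
            for c in range(left, min(left + width, len(row))):
                row[c] = fill
    return redacted
```

A template only covers overlay positions that are already known, which is one reason it is usually paired with OCR or a text detection model that can catch text in unexpected places.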

A practical example: A brain MRI acquired on a Siemens 3T scanner contains two categories of PHI.

In the header: PatientName = SMITH^JOHN, PatientBirthDate = 19680312, InstitutionName = Massachusetts General Hospital, StudyDate = 20240315, AccessionNumber = MGH-2024-03890.

In the pixel data: a scanner overlay printed in the upper-left corner of every frame reads "MGH RADIOLOGY — J. SMITH — DOB: 1968 — 03/15/2024." This text is part of the image bitmap and survives header tag removal intact.

A compliant anonymization process removes both: strips the header tags according to the applicable confidentiality profile, and detects and redacts the pixel overlay before the file is transmitted.

Manual vs. automated DICOM anonymization

Manual anonymization

A human reviewer opens each DICOM file, checks the header tags against a removal checklist, and redacts any pixel overlays found. This can be compliant for small, low-volume datasets, but it does not scale. In a multi-site clinical trial receiving hundreds or thousands of images per week, manual anonymization creates a processing bottleneck, introduces inconsistency between reviewers, and is, in QMENTA's operational experience across imaging studies, a common source of compliance gaps.

Manual review also has limited effectiveness against burned-in pixel PHI: a reviewer who is not specifically looking for pixel overlays — or who is unfamiliar with a particular scanner model's overlay format — can miss them.

Automated anonymization

Software processes DICOM files systematically, applying a pre-defined anonymization profile to every file in a batch. A production automated anonymization system:

Removes or replaces all header tags specified by the applicable confidentiality profile (DICOM Supplement 142 or equivalent)
Detects burned-in pixel PHI using one or more detection methods (OCR, template-based redaction, AI-based text detection)
Flags files where PHI detection confidence is below a defined threshold for human review
Quarantines files that fail anonymization checks — returning them to a non-compliant holding state rather than allowing them to enter the trial dataset or reach the imaging platform. Files in quarantine are not accessible to readers or analysis pipelines until the issue is resolved and re-processed
Logs every action in an audit trail documenting which files were processed, what was removed or replaced, and when
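The flag, quarantine, and log steps listed above can be sketched as a small triage function. This Python example is a minimal illustration under assumptions: the `CheckResult` fields, the 0.90 threshold, and the three decision labels are hypothetical and are not taken from any specific product.

```python
from datetime import datetime, timezone
from typing import Dict, List, NamedTuple

REVIEW_THRESHOLD = 0.90  # assumed confidence cut-off, not a recommended value


class CheckResult(NamedTuple):
    file_id: str
    header_clean: bool        # True if all profile tags were removed or replaced
    pixel_confidence: float   # detector confidence that no pixel PHI remains


def triage(result: CheckResult, audit_log: List[Dict[str, str]]) -> str:
    """Return 'quarantine', 'review', or 'accept' and append an audit entry."""
    if not result.header_clean:
        decision, reason = "quarantine", "header PHI remains"
    elif result.pixel_confidence < REVIEW_THRESHOLD:
        decision, reason = "review", "pixel PHI confidence below threshold"
    else:
        decision, reason = "accept", "all checks passed"
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "file": result.file_id,
        "decision": decision,
        "reason": reason,
    })
    return decision
```

Every call appends an entry, including accepted files, so the log documents what was processed and when rather than only the failures.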

Automated anonymization at the point of upload — where PHI is stripped before images leave the site network — is the approach recommended by imaging CROs and validated by QMENTA's experience across multi-site trials. It ensures that no PHI reaches the central imaging platform or sponsor systems regardless of downstream access controls.

QMENTA's Smart Uploader performs automated DICOM anonymization at the point of upload, including burned-in pixel PHI detection, with complete audit logging before transmission to the Imaging Hub.


DICOM anonymization in multi-site trials

Multi-site trials introduce anonymization complexity that single-site studies do not face. Sites typically run different PACS and scanner configurations, often from different vendors, each populated differently by its own acquisition team. A tag that is consistently empty at one site may be populated at another; a field correctly configured at trial start may contain identifying information after a scanner software update mid-trial.

Effective anonymization in multi-site trials requires a system that adapts to site-level variability rather than applying a single fixed tag lookup table. Receipt-side quality checks should verify anonymization completeness for every incoming submission, with non-compliant files quarantined and flagged before they reach the analysis pipeline. Submissions should not be rejected silently: the site needs a clear notification that a specific submission failed and why, so the issue can be remediated and the file resubmitted while the protocol window is still open.
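A receipt-side completeness check with a site notification might look like the Python sketch below. It is illustrative only: the `MUST_BE_BLANK` list, the function names, and the message wording are assumptions, and a plain dictionary again stands in for a parsed DICOM header.

```python
from typing import Dict, List

# Assumed list of attributes that must be absent or empty on receipt.
MUST_BE_BLANK = [
    "PatientBirthDate",
    "ReferringPhysicianName",
    "InstitutionName",
    "OperatorsName",
]


def residual_phi(header: Dict[str, str]) -> List[str]:
    """Return the keywords from MUST_BE_BLANK that still carry a value."""
    return [k for k in MUST_BE_BLANK if header.get(k, "").strip()]


def site_notification(site: str, submission_id: str, header: Dict[str, str]) -> str:
    """Build the message for the site; an empty string means compliant."""
    failures = residual_phi(header)
    if not failures:
        return ""
    return (
        f"Site {site}: submission {submission_id} was quarantined. "
        f"Populated identifying attributes: {', '.join(failures)}. "
        "Please re-run anonymization and resubmit."
    )
```

The message names the offending attributes but never echoes their values, so the notification itself does not retransmit PHI.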

Downstream, images that have passed anonymization checks feed into the harmonization and analysis pipeline. The completeness and consistency of anonymization directly affects what harmonization methods are applicable and how much manual correction is required before endpoint analysis.

Key takeaways

DICOM anonymization must address two distinct PHI categories: structured header tags and unstructured burned-in pixel data — each requires a different removal method
HIPAA de-identification uses a defined 18-identifier checklist; GDPR anonymization applies a functional re-identification test — a stricter and less precisely defined standard
For trials with both US and EU sites, applying GDPR's stricter functional test to the full dataset is the standard approach
Burned-in pixel PHI is harder to detect than header PHI because it is unstructured pixel data requiring pattern recognition, not a machine-readable tag
Files that fail anonymization checks should be quarantined — not silently rejected or silently accepted — and the submitting site should receive a specific notification so the issue can be remediated
Automated anonymization at the point of upload, before images leave the site network, is the approach that best protects both patient privacy and trial compliance
Every anonymization action should be captured in an audit trail — QMENTA's Smart Uploader logs all de-identification events before transmission

By Paulo Rodrigues, PhD, Chief Technology Officer and Co-Founder at QMENTA
Paulo Rodrigues leads technology strategy at QMENTA and writes about imaging clinical trials, protocol standardization, real-time QC, and compliance-ready neuroimaging workflows for multi-site studies.

¹ HHS. Methods for De-identification of PHI under HIPAA. hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification

² GDPR Recital 26; Article 4(1). gdpr-info.eu

³ DICOM Supplement 142: Attribute Confidentiality Profiles. dicom.nema.org/medical/dicom/current/output/html/part15.html

Frequently asked questions

What is the difference between DICOM anonymization and de-identification?

In clinical practice the terms are used interchangeably to describe the process of removing patient-identifying information from DICOM files. Technically, de-identification is the HIPAA legal standard, whose Safe Harbor method requires removal of 18 specified identifier categories under 45 CFR §164.514(b). Anonymization under GDPR applies a stricter functional test: data is anonymous only if the individual cannot be identified by any means reasonably likely to be used. For trials collecting data from both US and EU sites, applying the GDPR standard across the full dataset is the approach generally used to satisfy both frameworks. Sponsors should confirm their specific requirements with their regulatory and legal advisors.

What is burned-in pixel PHI and why is it harder to detect than header PHI?

Burned-in pixel PHI is patient-identifying information, such as name, date of birth, or medical record number, embedded directly into the image pixel data as visible text rather than stored in DICOM header tags. It is produced by some imaging modalities that overlay patient information on the acquired image. Header PHI sits in structured, machine-readable tag fields that a parser can locate by name. Burned-in pixel PHI is unstructured: it is rendered text within a raster image that requires pattern recognition to detect, not a tag lookup. Detection approaches include OCR, template-based redaction for known overlay positions, and AI-based text detection models. Undetected burned-in PHI is a common source of compliance failures in clinical trial imaging transfers.

Can DICOM anonymization accidentally remove information needed for image analysis?

Yes. This is one of the most common and underappreciated risks in multi-site clinical trials and research studies. Anonymization tools that apply aggressive "remove all private tags" policies by default can strip vendor-specific fields that carry essential acquisition parameters. When these fields are lost, the image pixel data is still present but is effectively unusable for quantitative analysis because the information needed to interpret it mathematically is gone. The image file opens and displays, which makes the problem easy to miss until downstream processing fails or produces meaningless results.

What modalities and acquisitions are at risk?

Any acquisition where vendor-specific metadata is required for quantitative interpretation. Diffusion MRI (DWI, DTI, HARDI, multi-shell) is the highest-risk modality. B-values and b-vectors are often stored in vendor-specific private tags, each with different conventions for sign, coordinate frame, and ordering. Loss of these fields makes tensor fitting, FA/MD maps, tractography, NODDI, and any multi-compartment model impossible to compute. Other at-risk modalities are arterial spin labeling (labeling duration, post-labeling delay, labeling type), dynamic contrast-enhanced MRI (timing and injection parameters), multi-echo and quantitative MRI sequences (echo times, flip angles, inversion times when stored privately), fMRI (slice timing and phase encoding direction in some vendor implementations), PET (reconstruction parameters, decay correction references), and advanced CT (iterative reconstruction kernel identifiers). Any pipeline that depends on knowing exactly how the image was acquired is vulnerable.

Does DICOM anonymization affect image quality or the ability to analyse images?

Correctly performed anonymization does not affect the diagnostic or research quality of the image pixel data. Header tag anonymization removes or replaces metadata fields only, leaving the image content intact. Pixel-level redaction for burned-in PHI replaces a small region, typically a corner overlay, with a blank or neutral area. This may obscure a date or name stamp in the image margin but does not affect the anatomical or pathological content of the image that is being measured or assessed.

Who is responsible for DICOM anonymization in a clinical trial?

Investigator sites are typically required to de-identify images before transferring them to the sponsor, CRO, or imaging platform under both HIPAA and GDPR. In practice, this responsibility is operationalised through the imaging platform's upload tool: the site uploads images through a tool that automatically performs anonymization before transmission, removing the technical burden from site staff and ensuring the process is applied consistently across all sites. The trial contract and data transfer agreement should specify which party is responsible for anonymization and how compliance will be verified.

Is re-identification possible after DICOM anonymization?

Yes, in some cases. HIPAA's Safe Harbor method, correctly applied, satisfies HIPAA's de-identification standard, but it does not address residual statistical re-identification risk, which is precisely what GDPR's functional test targets. Combinations of indirect identifiers (scan date, approximate age, institution, and rare disease status) can in some cases allow re-identification even after the 18 HIPAA-specified identifier categories are removed. GDPR's anonymization standard requires re-identification risk to be reduced to the point where identification is not possible by any means reasonably likely to be used. For trials with EU data, additional safeguards such as date shifting, age banding, and institution identifier generalisation are often applied to satisfy GDPR's functional test. Sponsors should work with their legal advisors to determine what combination of measures satisfies their regulatory obligations.