Tuesday, July 24, 2012

How Dirty Is Your DICOM Data?

This looks like a nice image, but the metadata could be totally incorrect or corrupted

If you took a snapshot of any DICOM archive and checked the image headers for correctness, I would argue that you would find quite a few hidden problems that you did not know about.
Errors in a DICOM header can cause images to be incorrectly displayed, incorrectly added to the database, or flatly rejected by the PACS. By DICOM errors, I don’t mean an incorrect Accession Number or patient name, or a duplicate ID, but rather a violation of the rules that the DICOM standard defines for a particular field entry.
These errors typically fall into three categories: length errors (a value exceeding the maximum length allowed for a field), invalid characters, and values that do not match the list of defined terms for a field. A typical length error is a station name longer than the maximum allowed 16 characters. An invalid character would be a backslash “\” in a patient field (note that DICOM reserves the “\” as a control character, namely the separator between multiple values). An invalid term would be the value “U” for “Unknown” in the patient sex field.
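As a rough illustration of the three categories above, here is a minimal Python sketch that checks a header represented as a plain dictionary. The function name `find_header_problems` and the dictionary keys are made up for this example, and only the three rules mentioned above are covered. A real validator checks every attribute against the rules in the standard.

```python
# Illustrative rules only; the authoritative limits come from the DICOM
# standard's definitions for each attribute.
MAX_STATION_NAME_LEN = 16                # Station Name limit cited above
ALLOWED_PATIENT_SEX = {"M", "F", "O"}    # values DICOM defines for Patient's Sex


def find_header_problems(header):
    """Return a list of human-readable problems found in a header dict.

    `header` is a hypothetical dict of keyword -> string value, not a real
    DICOM dataset object.
    """
    problems = []

    # Length error: value longer than the field allows.
    station = header.get("StationName", "")
    if len(station) > MAX_STATION_NAME_LEN:
        problems.append(
            f"StationName: length {len(station)} exceeds {MAX_STATION_NAME_LEN}"
        )

    # Invalid character: the backslash is reserved as the value separator.
    patient_id = header.get("PatientID", "")
    if "\\" in patient_id:
        problems.append("PatientID: contains a backslash")

    # Invalid term: value not in the defined list (an empty value is tolerated).
    sex = header.get("PatientSex", "")
    if sex and sex not in ALLOWED_PATIENT_SEX:
        problems.append(f"PatientSex: '{sex}' is not one of M, F, O")

    return problems


if __name__ == "__main__":
    suspect = {
        "StationName": "CT_SCANNER_ROOM_12_EAST",
        "PatientID": "123\\45",
        "PatientSex": "U",
    }
    for problem in find_header_problems(suspect):
        print(problem)
```

Returning a list of messages instead of stopping at the first failure makes it easier to see everything that is wrong with a header in one pass.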
How and where do these problems occur? There are several potential sources. One is user input errors. I remember that my developers once spent about a week figuring out why certain images were intermittently rejected. Eventually the source was traced to a user who occasionally entered disallowed control characters by accident. A robust user interface will filter these out and/or warn the user that the entry is incorrect. However, when the data originates from the order, which a data entry person types in and which then generates an HL7 message, the error might go unnoticed, because HL7 has a different set of allowable control characters than DICOM. A robust mapping from HL7 to DICOM should convert and/or filter out these incorrect characters.
This gets us to the second source of potential problems: incorrect mapping by an interface engine, modality worklist (MWL) provider, or broker. HL7 has different lists of defined terms, as in the case of “U” for sex, and different length specifications for its fields. A robust MWL provider should take care of most mapping errors.
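The kind of mapping and filtering described above could look something like the following Python sketch. The names `HL7_TO_DICOM_SEX`, `map_sex`, and `clean_text` are hypothetical, and the choices made here (mapping an unknown sex to an empty value, silently truncating long values) are assumptions for illustration only. Your site policy or interface engine may well handle these cases differently.

```python
# Hypothetical translation table from HL7 sex codes to the values DICOM
# defines for Patient's Sex. Codes with no entry, such as "U", are not copied.
HL7_TO_DICOM_SEX = {"M": "M", "F": "F", "O": "O"}


def map_sex(hl7_sex):
    """Return a DICOM Patient's Sex value, or "" when no mapping exists."""
    return HL7_TO_DICOM_SEX.get(hl7_sex.strip().upper(), "")


def clean_text(value, max_len):
    """Drop backslashes and non-printable characters, then truncate.

    Truncation is an assumption made for this sketch; a production mapping
    might prefer to flag over-long values for review instead.
    """
    kept = "".join(ch for ch in value if ch != "\\" and ch.isprintable())
    return kept.strip()[:max_len]


if __name__ == "__main__":
    print(repr(map_sex("U")))                     # unknown sex becomes empty
    print(repr(clean_text("CT\\ROOM\t1", 16)))    # backslash and tab removed
```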
Some modalities might create invalid headers as well. One common problem is a leading zero (0) in one of the segments of a Unique Identifier (UID), which is not allowed unless the segment is the single digit 0. Some UID generators do not check for this, and these headers are typically rejected by the PACS.
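A leading-zero check is easy to express with a regular expression. The sketch below uses a hypothetical helper `is_valid_uid` and covers the basic UID syntax: dot-separated numeric segments, no leading zero unless the segment is exactly "0", and at most 64 characters in total. It assumes any padding has already been stripped from the value.

```python
import re

MAX_UID_LEN = 64

# Each segment is either "0" or a number that does not start with 0;
# segments are separated by single dots.
_UID_PATTERN = re.compile(r"(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*")


def is_valid_uid(uid):
    """Return True if `uid` follows the basic DICOM UID syntax rules."""
    return len(uid) <= MAX_UID_LEN and _UID_PATTERN.fullmatch(uid) is not None


if __name__ == "__main__":
    for candidate in ("1.2.840.10008.1.2", "1.2.840.010008.1.2"):
        print(candidate, is_valid_uid(candidate))
```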
Last but not least, one might have invalid images on a CD that need to be imported and that were created by some unknown modality. I had that issue when I tried to import images of my dog into my DICOM viewer: they were rejected because they were missing a Patient ID.
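A pre-import check for this kind of problem can be very small. In the Python sketch below, the list `REQUIRED_ON_IMPORT` is a made-up import policy, not something taken from the standard, and the header is again a plain dictionary. Which attributes a given viewer or PACS insists on varies by product.

```python
# Hypothetical import policy: attributes that must be present and non-empty
# before an image is accepted. Adjust to match your own viewer or PACS.
REQUIRED_ON_IMPORT = ("PatientID", "StudyInstanceUID", "SOPInstanceUID")


def missing_attributes(header, required=REQUIRED_ON_IMPORT):
    """Return the required keywords that are absent or blank in `header`."""
    missing = []
    for name in required:
        value = header.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


if __name__ == "__main__":
    from_cd = {"PatientID": "", "StudyInstanceUID": "1.2.3", "SOPInstanceUID": "1.2.3.4"}
    print(missing_attributes(from_cd))
```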
To troubleshoot these problems, a visual inspection will do in many cases, as the errors are relatively common and easy to spot. One can import the image into a DICOM viewer and use its DICOM header dump feature. When the problem is not that easy to see, one would use a DICOM validator, which tests the header against the DICOM specifications. A demonstration of the visual inspection and how to perform a validation can be seen here.
After diagnosing the problem, one can fix the header and resubmit the image to the PACS, assuming it is a one-time issue. If the problem is recurring, one should go back to the source, for example the modality worklist provider or the modality manufacturer, to get it fixed.
I strongly recommend running images from every modality through a validator, especially prior to purchasing a new device. Remember that potential issues might not bite until later: your current PACS might be forgiving and accept certain invalid attributes in the header, but when the time comes to migrate the data, these problems might resurface and prevent proper viewing or even storage.
In conclusion, image header problems caused by incorrect DICOM encoding are easy to spot by visual inspection or to detect with open source validation tools. It is highly recommended to check your data.