Tuesday, July 24, 2012

How Dirty Is Your DICOM Data?

This looks like a nice image, but the metadata
could be totally incorrect or corrupted

If you took a snapshot of any DICOM archive and checked the image headers for correctness, I would argue that you would find quite a few hidden problems that you might not know about.
Errors in a DICOM header can cause images to be incorrectly displayed, incorrectly added to the database, or flatly rejected by the PACS. By DICOM errors, I don’t mean an incorrect Accession Number or patient name, or a duplicate ID, but rather a violation of the rules defined in the DICOM standard for a particular field entry.
These errors typically fall into three categories: length errors (exceeding the maximum allowed length for a particular field), invalid characters, and values that do not match the defined list of terms for a particular field. A typical length error would be a Station Name value exceeding the maximum allowed 16 characters. An invalid character would be a backslash “\” as part of a patient field (note that the “\” is reserved in DICOM as the delimiter between multiple values). An invalid term would be the value “U” for “Unknown” in the Patient's Sex field, where DICOM only defines “M”, “F”, and “O”.
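The three categories above can be sketched as a small check in Python. This is only an illustration, not a real validator: the header is assumed to be a plain dictionary of strings, and the RULES table and the check_header function are hypothetical names that cover just the three attributes mentioned here.

```python
# Hypothetical rule table for a few single-valued attributes.
# Station Name is limited to 16 characters and Patient ID to 64;
# Patient's Sex accepts M, F, O, or an empty value.
RULES = {
    "StationName": {"max_len": 16},
    "PatientID": {"max_len": 64},
    "PatientSex": {"allowed": {"M", "F", "O", ""}},
}


def check_header(header):
    """Return a list of (attribute, problem) tuples for a dict of header values."""
    problems = []
    for name, value in header.items():
        rule = RULES.get(name)
        if rule is None:
            continue  # no rule defined for this attribute in this sketch
        # Length error: the value exceeds the maximum allowed length.
        if "max_len" in rule and len(value) > rule["max_len"]:
            problems.append((name, "exceeds %d characters" % rule["max_len"]))
        # Invalid character: backslash or an ASCII control character.
        if any(c == "\\" or ord(c) < 32 for c in value):
            problems.append((name, "contains backslash or control character"))
        # Invalid term: the value is not in the defined list.
        if "allowed" in rule and value not in rule["allowed"]:
            problems.append((name, "%r is not a defined term" % value))
    return problems


if __name__ == "__main__":
    sample = {
        "StationName": "CT_SCANNER_ROOM_12_EAST",
        "PatientID": "12345\\678",
        "PatientSex": "U",
    }
    for attribute, problem in check_header(sample):
        print(attribute, "->", problem)
```

A real validator checks every attribute against its value representation and the requirements of the relevant object definition, so treat this as a way to reason about the categories rather than a replacement for one.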
How and where do these problems occur? There are several potential sources. One is user input errors. I remember that my developers once spent about a week figuring out why certain images were intermittently rejected. Eventually the source was traced to a user who sometimes accidentally entered disallowed control characters. A robust user interface will filter these out and/or warn the user that the entry is incorrect. However, when the data is generated as part of the order, which is entered by a data entry person and creates an HL7 message, it might not be noticed, as HL7 has a different set of allowable control characters than DICOM. A robust mapping from HL7 to DICOM should convert and/or filter out these incorrect characters.
This gets us to the second source of potential problems: incorrect mapping by an interface engine, modality worklist (MWL) provider, or broker. HL7 has different lists of defined terms, as is the case with “U” for sex, and different length specifications for the fields. A robust MWL provider should take care of most mapping errors.
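A minimal sketch of such a mapping step in Python follows. The function names map_sex and clean_text are hypothetical, and the choice to map HL7 sex codes without a DICOM equivalent (such as “U”) to an empty value is an assumption; a site might prefer “O” or a different policy.

```python
# Assumed mapping from HL7 sex codes to DICOM Patient's Sex values.
# Anything not listed (for example "U") becomes an empty value here.
HL7_TO_DICOM_SEX = {"M": "M", "F": "F", "O": "O"}


def map_sex(hl7_value):
    """Map an HL7 sex code to a DICOM value, or '' when there is no match."""
    return HL7_TO_DICOM_SEX.get(hl7_value.strip().upper(), "")


def clean_text(value, max_len):
    """Drop backslashes and control characters, then truncate to max_len."""
    kept = "".join(c for c in value if c != "\\" and ord(c) >= 32)
    return kept[:max_len]


if __name__ == "__main__":
    print(repr(map_sex("U")))
    print(repr(clean_text("CT\\ROOM\t1", 16)))
```

Silent truncation loses information, so a production mapping would more likely log or flag a value that had to be shortened instead of quietly cutting it.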
Some modalities might create invalid headers as well. One such common problem is having a leading zero (0) in one of the segments of a Unique Identifier. Some UID generators do not always check for that, and these headers are typically rejected by the PACS.
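The leading-zero rule is easy to test for. The sketch below assumes the usual UID constraints: at most 64 characters, and dot-separated numeric components that may not start with a zero unless the component is the single digit 0. The function name is_valid_uid is hypothetical.

```python
import re

# A component is either "0" or a run of digits that does not start with 0.
# Components are separated by single dots.
UID_PATTERN = re.compile(r"(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*")


def is_valid_uid(uid):
    """Return True if uid is at most 64 characters and has no leading zeros."""
    return len(uid) <= 64 and UID_PATTERN.fullmatch(uid) is not None


if __name__ == "__main__":
    for candidate in ("1.2.840.10008.1.2", "1.2.840.010008.1.2"):
        print(candidate, is_valid_uid(candidate))
```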
Last but not least, one might have invalid images on a CD that need to be imported, which were created by some unknown modality. I had that issue when I tried to import and view images of my dog in my DICOM viewer, which were rejected as these images were missing a Patient ID.
To troubleshoot these problems, in many cases a visual inspection will do as the errors are relatively common and easy to spot. One can import the image into a DICOM viewer and use the DICOM header dump feature. When the problem is not that easy to see, one would use a DICOM validator, which tests the header against the DICOM specifications. A demonstration of the visual inspection and how to perform a validation can be seen here.
After diagnosing the problem, one can either fix the header and resubmit it to the PACS, assuming it is a one-time issue, or if it is recurring, one should go back to the source, for example the modality worklist provider, or modality manufacturer to get this fixed.
I strongly recommend running images from every modality through the validator, especially prior to purchasing a new device. Remember that potential issues might not bite until later, as your current PACS might be more forgiving and not reject certain attributes in the header, but when the time comes to migrate the data, these problems might resurface and prevent proper viewing or even storage.
In conclusion, image header problems due to incorrect DICOM encoding are easy to see and/or validate using open-source tools. It is highly recommended to check your data.


  1. Hi Herman,

    This was great information.
    I have also faced the same issue with my customer. Some orders were rejected by my PACS system, and when we dug further we found that this behavior was due to some special characters (delimiters for the PACS) entered in fields. Your tutorial will help me analyse this issue a bit more. Thanks a lot.


  2. This was a good read. I too have encountered silly issues, like a customer entering 00000 for the Patient ID and the PACS server interpreting this as a NUL value. I am by no means a DICOM or PACS expert, but blogs like yours have definitely helped me troubleshoot problems thousands of miles away with customers who insist our product is faulty.
