Sunday, July 29, 2012

How Would You Store 1 Billion Images?

The Mayo Clinic in Rochester, MN had 1.3 billion images in its enterprise archive as reported by Ken Persons during the SIIM conference in June. Each day approximately another million images are added. They have gone through 13 data migrations so far, and are doing another three right now covering 27 departments.
Even though 99 percent of hospitals today are not at this level of digitization and image production, it makes sense to look at institutions like the Mayo Clinic to find out what they learned from handling an archiving system on this scale. The time will come, even if only 5 or 10 years from now, when many institutions will face similar challenges. Just wait until pathology begins to convert to digital archiving, as a typical department handling 30,000 procedures could easily create 100 Tbytes/year.
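To put those numbers in perspective, here is a back-of-the-envelope calculation using only the figures quoted above. It is a rough sketch, not a capacity plan: it assumes the intake rate stays constant at one million images per day, a 365-day year, and decimal terabytes (1 TB = 10^12 bytes). The function names are made up for illustration.

```python
# Figures quoted in the post.
ARCHIVE_IMAGES = 1_300_000_000           # images already in the enterprise archive
IMAGES_PER_DAY = 1_000_000               # approximate daily intake
PATHOLOGY_PROCEDURES_PER_YEAR = 30_000   # "typical" pathology department
PATHOLOGY_BYTES_PER_YEAR = 100 * 10**12  # 100 TB/year, assuming decimal terabytes


def images_added_per_year(images_per_day, days=365):
    """Yearly intake, assuming a constant daily rate (an assumption)."""
    return images_per_day * days


def years_to_double(current_images, images_per_day):
    """Years until the archive doubles in image count at a constant intake rate."""
    return current_images / images_added_per_year(images_per_day)


def gb_per_procedure(bytes_per_year, procedures_per_year):
    """Average data volume per procedure, in decimal gigabytes."""
    return bytes_per_year / procedures_per_year / 10**9


if __name__ == "__main__":
    print(f"Images added per year: {images_added_per_year(IMAGES_PER_DAY):,}")
    print(f"Years to double: {years_to_double(ARCHIVE_IMAGES, IMAGES_PER_DAY):.1f}")
    print(
        "GB per pathology procedure: "
        f"{gb_per_procedure(PATHOLOGY_BYTES_PER_YEAR, PATHOLOGY_PROCEDURES_PER_YEAR):.1f}"
    )
```

Under these assumptions the archive would double in image count in roughly three and a half years, and digital pathology would average a little over 3 GB per procedure.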
Managing this amount of data and number of migrations is only feasible using a Vendor Neutral Archive or VNA. The folks at Mayo hate the term VNA as much as anyone else, which is why they talk about an “enterprise archive” instead. There is no commonly agreed upon functionality for a VNA, even though I tried to define such functionality in this white paper (see link).
One of the major challenges Ken reported at that meeting is ensuring that a new PACS is made “aware” of the historical data in the enterprise archive so that priors are pre-fetched as needed. There are several options on how this can be accomplished, the first one being a “brute force” method, which requires all of the data to be pushed to the PACS to be re-archived, or the images from a specified number of months to be archived and re-indexed. This is clearly unacceptable and defeats the main purpose of having a VNA.
Another option is a one-time PACS database update with all of the available exam content. This is basically a migration of the database only, leaving the archived images in their enterprise location. A third option is for the PACS to query the enterprise archive to discover any studies that are relevant. The fourth, or “order driven” option is to pre-fetch as needed based on order information. Whichever option is used, migrating the study description is critical so that the relevant priors can be identified and retrieved. If the performance of the retrieval is acceptable and if it is done in the background, I would guess that the “query method” is probably preferable, followed by the “order driven” method.
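To make the “query” and “order driven” ideas a bit more concrete, here is a minimal sketch of selecting relevant priors for a new order. Everything in it is hypothetical: the `Study` record, the in-memory `ARCHIVE` list standing in for the enterprise archive, and especially the relevance rule (same modality, or a shared word in the study description). A real PACS would query the archive over DICOM and apply far more refined rules; this only illustrates why the study description has to be available.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Study:
    """Hypothetical, simplified study record held by the enterprise archive."""
    patient_id: str
    study_uid: str
    study_date: date
    modality: str
    description: str


def is_relevant(prior, order_modality, order_description):
    """Toy rule (an assumption): same modality, or a shared description word."""
    prior_words = set(prior.description.lower().split())
    order_words = set(order_description.lower().split())
    return prior.modality == order_modality or bool(prior_words & order_words)


def find_priors(archive, patient_id, order_modality, order_description, limit=3):
    """Return the most recent relevant priors for one patient, newest first."""
    candidates = [
        study
        for study in archive
        if study.patient_id == patient_id
        and is_relevant(study, order_modality, order_description)
    ]
    candidates.sort(key=lambda study: study.study_date, reverse=True)
    return candidates[:limit]


# Made-up sample data standing in for the enterprise archive.
ARCHIVE = [
    Study("P1", "1.1", date(2010, 3, 1), "CT", "CHEST WITHOUT CONTRAST"),
    Study("P1", "1.2", date(2011, 6, 15), "MR", "KNEE LEFT"),
    Study("P1", "1.3", date(2012, 1, 20), "XR", "CHEST 2 VIEWS"),
    Study("P2", "2.1", date(2012, 2, 2), "CT", "CHEST WITHOUT CONTRAST"),
]

if __name__ == "__main__":
    # A new order arrives for patient P1: CT of the chest with contrast.
    for prior in find_priors(ARCHIVE, "P1", "CT", "CHEST WITH CONTRAST"):
        print(prior.study_date, prior.modality, prior.description)
```

In the order-driven variant, `find_priors` would be triggered by the incoming order and the selected studies fetched in the background; in the query variant, the PACS would run the same kind of lookup when the patient's record is opened.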
One of the major discoveries the folks at the Mayo Clinic made is that there are a lot of pictures, i.e. conventional photographs, as well as videos made for all kinds of clinical purposes, ranging from documenting the gait of people who have trouble walking to documenting skin lesions. The challenge is to archive all these clips and photos, which are typically stored on CDs and DVDs or kept on various computers and laptops, and which should be part of the electronic health record as well. I would assume that if you walk around different departments in your institution, you too will find a lot of those types of images.
One of the observations I made when talking with the Mayo folks is that they don’t use a commercial viewer to access the images in their enterprise archive. They have their own viewer, and even though they benchmark it every couple of years against available commercial viewers, it appears that they can’t buy what is needed to satisfy their physicians with regard to functionality. It is true that their viewer is not just a radiology imaging viewer; rather, it is capable of displaying all of the various image types in their enterprise archive. I would argue that it does not take a lot of effort to create such a viewer. I would encourage vendors, however, to find out what is needed to satisfy the Mayo Clinic folks. Not only would it result in a customer licensing tens of thousands of your viewers, but it would also provide the capabilities that will very likely be needed by every other customer whose imaging archives begin to grow to the scale of the Mayo Clinic's.
In conclusion, it makes sense to find out how large institutions such as the Mayo Clinic are dealing with the exponential increases in image production and how they facilitate all the different specialties and departments in their enterprise archive in order to be prepared as your institution begins going through the same growth process.


  1. Current solutions are subpar. With the right kind of technical planning, a team can build a better solution at a much lower cost.

    The programming/development problems faced are not difficult for a solid technical team. Perhaps the biggest issues faced are infrastructure changes required to implement the correct solution.

  2. Cloud storage. It's more cost effective, and takes up less space in your own infrastructure.

  3. Institutions such as the Mayo Clinic consider their images a strategic asset and would never outsource that. They manage their own VNA, which in their situation is preferred. In my opinion, outsourcing your data only makes sense if there is a lack of skills and/or local support at the institution. That is why FedEx, Walmart, and others will never outsource it.

  4. If my understanding is correct, with a VNA, the viewer will now have to download all relevant images to its local storage space before it can display or process them. This, by itself, creates an implementation issue, especially when the viewer is located in another facility or even a doctor's home connected via WAN.

  5. I don't have the same understanding, e.g., a "zero footprint" viewer does not load anything.

  6. It would be good to dig deeper into the challenges that Mayo and soon others will face when moving from one imaging solution to another. The whole point of adopting a VNA or "enterprise archive" is to create an inclusive collection of image data and avoid future migration or wholesale transport of the data. The four methods identified to make a new PACS aware of the historical data are not terribly encouraging.

    Brute Force - all images pushed back to PACS to be re-archived
    PACS Database update/Database migration
    PACS Query of VNA (real time?)
    Order driven - prefetch

    Let's hope as the PACS industry continues to mature, there will be a mechanism (perhaps some sort of agent) that can explore the archive as a background task and inform the new database of the historical data. One potential issue that comes to mind first is what will be the performance impact in loading the database with billions of records from prior exams, many of which will never be accessed?

    Another question is, how do we address access to images across multiple repositories as we begin connecting Exchanges across the country? Or how do we manage access to images within a local HIE?