Thursday, March 19, 2020

Healthcare AI Regulatory Considerations.

Based on the information provided during the recent FDA sponsored workshop, “The Evolving
Role of Artificial Intelligence in Radiological Imaging,” here are the key US FDA regulatory considerations you should be aware of.

1. AI software applications are fundamentally different in that an AI algorithm is created and improved by feeding it data so it can learn, and eventually, if it implements Deep Learning, it can learn and improve autonomously based on new data. AI is a big business opportunity.

According to an analysis by Accenture, the market for AI applications for preliminary diagnosis and automated diagnosis is $8 billion. The same analysis points out that there is a 20 percent unmet demand for clinicians in the US by 2026, which can be addressed by AI. 


It became clear during the conference that the prediction made in November of 2016 by Geoffrey Hinton that deep learning would put radiologists out of a job within 5 years was a gross miscalculation. No jobs have been lost as of today, by contrast, the number of studies to be reviewed is increasing to almost 100 billion images per year, to be read by approximately 34,000 radiologists, requiring more and more images to be read faster and more efficiently. The use of AI to eliminate “normal” cases, especially for screening exams such as for breast cancer or TB in chest images, will only be a big relief for radiologists.


2.       AI will not make radiologists obsolete but rather will change their focus as the image by itself might become less important than the overall patient context. We spend a lot of time improving image quality by reducing image artifacts and increasing resolution so a physician can make a better diagnosis. However, as one of the speakers brought up, using autonomous AI could potentially eliminate the need of creating an image, by basing the diagnosis directly on the information in the raw data. Why would we need an image? Remember, the image was created to optimally present information to a human, ideally matching our eye-brain detection and interpretation. If we apply the AI algorithm on the acquired data without worrying about the image, we could use it on CT raw data streaming straight from the detector, or the signals directly from the MR high frequency coils, the ultrasound sound waves, or the EKG electrical signals, or whatever information comes from any kind of detector. Images have served the physicians very well for many years. In some cases, “medical imaging” will be implemented without the need to produce an image and we might need to rename it to become “medical diagnosing” instead. I believe that a radiologist is first and foremost an MD and thinking that they will be out of a job when there is less of an emphasis on the images seems misguided.

3.       AI algorithms are often focused on a single characteristic, which is a problem when using them in an autonomous mode causing incidental findings to go unnoticed. There were two good examples given during the workshop, the first one was an ultrasound of the heart of a fetus which looked perfectly normal. So, if one would run an AI algorithm to look for defects, it would pass as being OK. However, in this particular case as shown in the image, the heart was outside the chest, aka Ectopia Cordis, a rare condition, but if present should be diagnosed early to treat accordingly. The other example was for autonomous AI detection of fractures. Fractures are very common for children as I can attest personally having many grandkids who are very active. One of the speakers mentioned that in some cases when looking at the fracture there are incidental findings of bone cancer, something that a “fracture algorithm” would not detect. So, maybe my previous hypothesis that an image might become eventually obsolete is not quite correct, unless we have an all-encompassing AI detection algorithm that can identify every potential finding.
The problem with creating an all-encompassing AI is that there are some very rare findings and diseases for which there is relatively little data available. It is easy to get access to tens of thousands of chest images or breast images with lung or breast cancer from the public domain for example from NCI, however for rare cases there might be not enough data available to be statistically significant to train and validate an AI algorithm.

4.       There are still many legal questions and concerns about AI applications. As an analogy, the electric car company Tesla is being sued right now by the surviving family of the person who died after his car crashed in a highway median because the autopilot misread the lane lines. Many people die because they crash into the medians because of human error, however, there is much less tolerance for errors made by machines than by humans. The question is who is accountable if an algorithm fails with subsequent patient harm or even death, the hospital, the responsible physician, or vendor of the AI algorithm?

5.       A discussion about any new technology would not be complete without a discussion about standards. How is an algorithm integrated into an existing PACS viewer or medical device software and how is the output of the AI encoded? The IHE has just released a set of profiles that address both the AI results and workflow integration in two profiles. Implementors are encouraged to support these standards and potential users are encouraged to request them in their RFP’s.

6.       There are three different US FDA regulatory approval and oversight classifications for medical devices and software:
      1.       Class 1: Low risk, such as an image router. This classification requires General Controls to be applied (Good Manufacturing practices, complaint handling, etc.)
      2.       Class 2: Moderate risk such as a PACS system or medical monitor, as well as Computer Aided Detection software. This classification requires both general as well as special controls to be applied. These devices and software require a 510(k) premarket clearance.
For a moderate risk device that does NOT have a predicate device, a new procedure has been developed aka a “de novo” filing. For example, the first Computer Aided Acquisition device which was approved in January 2020 followed the de novo process.
      3.       Class 3: High risk such as Computer Aided Diagnosis which requires general controls AND Premarket Approval (PMA).

7.       AI can be distinguished into the following categories:
a.       CADe or Computer Aided Detection – These aid in localizing and marking of regions that may reveal specific abnormalities. The first application was for breast CAD, initially approved in 1997, followed by several other organ CAD applications. CADe has recently (as of January 2020) be reclassified to NOT need a PMA but rather being class 2 and needing only a 510(k).
b.       CADx or Computer Aided Diagnosis – Aids in characterizing and assessing disease type, severity, stage and progression
c.       CADe/x or Computer Aided Detection and Diagnosis – This is a combination of the first two classifications as it will do both localizing as well as characterizing the condition.
d.       CADt or Computer Aided Triage – This aids in prioritizing/triaging time sensitive patient detection and diagnosis. Based on a CADe and/or CADx finding, it could immediately alert a physician or put it on the top of a worklist to be evaluated.
e.       CADa/o or Computer Aided Acquisition/Optimization – Aids in the acquisition/optimization of images and diagnostic signals. The first CADa/o was approved in January 2020 for ultrasound to provide help to non-medical users to acquire images. Being first-in-class, it followed the de novo clearance process.

8.       Other dimensions or differentiation between the different AI algorithms are:
·         Is the algorithm “locked” or if it is continuously adaptive? An example of a locked algorithm was the first CADe application for digital mammography, its algorithm was locked and it is still basically the same as when the FDA cleared its initial filing in 1996. An adaptive algorithm will continue to learn and supposedly improve.
·         What is the reader paradigm? AI can serve as the first reader, which then possibly determines its triage, as a concurrent reader, e.g. it will do image segmentation or annotation while a physician is looking at an image, as a secondary reader, such as used to replace a double read for mammography, or it can include no human reader being autonomous. The first clearance for a fully autonomous AI application, based on having a better specificity and sensitivity than a human reader, was for diabetic retinopathy which was cleared in January of 2019.
·         What is the oversight? Is there no oversight, is it sporadic, or continuous? Note that this is different from the reader paradigm, a fully autonomous AI algorithm application might still require regular oversight as part of the QA checking and post-market surveillance, especially if the algorithm is not locked but adaptive.

9.       The FDA has several product codes for AI applications. The labeling and relationship between these codes, the various CAD(n) definitions and corresponding Class 1,2,3 and “de novo” classifications is
inconsistent and unclear. The majority of the products, i.e. more than 60 percent are cleared under the PACS product code (LLZ) as that is the most logical place for any image processing and analysis related filings, the remainder is cleared under 6 different CAD categories (QAS, QFM, QDQ,POK, QBS, and the most recent QJU) and a handful others. If a vendor wants to file a new algorithm, the easiest path is to convince the FDA that it fits under LLZ as there are many predicates and a lot of examples, assuming that the FDA approves that approach. I would assume that they want to steer new submissions towards the new classifications, however as you can see from the chart, there are very few predicates, sometimes only a single one.

10.   Choosing the correct size and type of dataset that is used for the learning is challenging:
·         There are no guidelines on the number of cases that are to be included in the dataset that is used for the algorithm to learn and to validate its implementation. The unofficial FDA position is that the data should be “statistically significant,” which means that it requires intensive interaction with the FDA to make sure it meets its criteria.
·         Techniques and image quality vary a lot between images, to the extent that certain images might not even be useful as part of the dataset.
·         One needs to make sure that the dataset is representative for the body part, disease, and population characteristics. It has been acknowledged that a dataset from e.g. Chinese citizens might not be applicable for a population in US, Europe or Africa. In addition, it became clear that it might need to be retrained based on the type of institution (compare a patient population at a VA medical center with the patients at a clinic in a suburb) and even geographic location (compare Cleveland with Portland, the youth in Cleveland being the most obese in all of the US).
·         There is a big difference between different manufacturers on how to represent their data.  This requires the normalizing and/or preparation of the data to make sure the algorithm can work on it. Even for CR/DR there are different detector/plate characteristics, different noise patterns, image processing applied by the vendor, different LUT’s applied, etc.
The figure shows the intensity values for different MRI’s.

11.   There should be a clear distinction between the three different datasets that are used for different purposes:
·         The training dataset that is used to train the AI algorithm.
·         After the initial training is done, one would use a tuning dataset to optimize the algorithm.
·         As soon as the algorithm development is complete, it will become part of the overall architecture and is verified with an integration test, which tests against the detailed design specs. This is followed by a system test that verifies against the system requirements, and lastly by a final Validation and Verification, which test against the user requirements using a separate Test dataset.

12.   AI clearance changed the traditional process in that now pre-clearance testing and validation and post-market surveillance are required. The pre-clearance is covered by the pre-submission, aka as the Q-Submission program, which has a separate set of guidelines and is extensively used by AI vendors. It is basically a set of meetings with the FDA with the focus on determining that the clinical testing is statistically significant and that the filing strategy is acceptable. Last year, there were 2200 pre-submissions out of 4000 submissions, which shows that it has become common practice. The FDA strongly encourages this approach.

The post-market surveillance is very important for non-locked algorithms, i.e. the ones that are self-learning and supposedly continuously improving. The challenge is to make sure that the algorithms are getting better and not worse, which requires post-market surveillance. There was a lot of discussion about the post-market surveillance and a consensus that it is needed but there were no guidelines available (yet) on how this would work.

13.   There are a couple of applicable documents that are useful when looking to get FDA clearance for an AI application: the Q-submission process, the De Novo classification request, and regulatory framework discussion paper.

The FDA initiative to have an open discussion in the form of a workshop was an excellent idea and brought forth a lot of discussion and valuable information. You can find a link to the many presentations at their website. It was obvious that the regulatory framework for AI applications is still very much under discussion. Key take-aways are the use of pre-submissions to have an early dialogue with the FDA about the acceptable clinical data used for training and validation, and regulatory product classification and approach, as well as the need for a post market assessment, which is not defined (yet) especially for adaptive AI algorithms.

The de novo approach will also be very useful for the “to-be-defined” product definitions and it might be expected that the list of product classifications will grow as more products are introduced. AI is here to stay and the sooner the FDA has a well defined process and approach, the faster these products can make an impact to the healthcare industry and patient care.