As the use of artificial intelligence continues to expand in healthcare, so do legitimate concerns about the new normal being built around this powerful and rapidly changing technology: concerns about the job market, widespread concerns about fairness, ethics and equity and, for some, the fear of a dystopian future in which intelligent machines become too powerful.
But the promise of AI and machine learning is also enormous: predictive analytics can mean better health outcomes for individuals and potentially game-changing advances in public health, while driving down costs.
Finding a regulatory balance that benefits from the good and protects against the bad is a major challenge.
Government and healthcare leaders are more determined than ever to address racial bias, protect patient safety and “get it right.” Done incorrectly, AI can harm patients, undermine trust and create legal liability for healthcare organizations.
We spoke with Dr. Sonya Makhni, medical director of Mayo Clinic Platform and senior associate consultant in the Division of Hospital Internal Medicine, about recent developments in healthcare AI and some of the key challenges in performance tracking, generalizability and clinical validity.
Makhni explained how AI models in healthcare should be assessed before use, citing acquisition AI as an example of why understanding a specific model's performance matters.
Q. What does it generally mean to deliver an AI solution?
A. An AI solution is more than just an algorithm; the solution also includes everything you need to make it work in a real workflow. There are a number of key stages to consider when developing and delivering an AI solution.
The first is the design and development phase of the algorithm. During this phase, solution developers must work closely with clinical stakeholders to understand the problem to be solved and the available data.
Then the solution developers can begin algorithm development, which itself includes data acquisition and preprocessing, model training and model testing, among other important steps.
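To make those steps concrete, a development-phase pipeline might look like the following minimal Python sketch using scikit-learn. The file name, column names and choice of model are illustrative assumptions, not details from the interview, and predictors are assumed to be numeric for simplicity.

```python
# A minimal sketch of the development phase: data acquisition, preprocessing,
# model training and model testing. All names here are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Data acquisition: load a (hypothetical) extract of clinical records.
df = pd.read_csv("clinical_records.csv")
X = df.drop(columns=["outcome"])  # predictor variables
y = df["outcome"]                 # outcome variable chosen with clinicians

# Hold out a test set so model testing uses data unseen during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Preprocessing and model training combined in a single pipeline.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

# Model testing: internal performance on the held-out split.
internal_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Internal test AUROC: {internal_auc:.3f}")
```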
After algorithm development, AI solutions should be validated against third-party data, ideally by an independent party. An algorithm that performs well on its initial dataset may perform differently on a dataset representing different demographic populations. External validation is an important step in understanding an algorithm's generalizability and bias, and it should be completed for all clinical AI solutions.
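Continuing the sketch above, external validation amounts to scoring the already-trained model on an independent dataset drawn from a different population and comparing against internal performance. The file name and the tolerated performance drop below are assumptions.

```python
# External validation: evaluate the trained model (from the previous sketch)
# on an independent, third-party dataset from a different population.
import pandas as pd
from sklearn.metrics import roc_auc_score

external = pd.read_csv("external_site_records.csv")  # hypothetical file name
X_ext = external.drop(columns=["outcome"])
y_ext = external["outcome"]

external_auc = roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1])
print(f"External AUROC: {external_auc:.3f} (internal was {internal_auc:.3f})")

# A large drop relative to internal performance suggests limited
# generalizability and warrants investigation before clinical use.
if internal_auc - external_auc > 0.05:  # threshold is an illustrative choice
    print("Warning: performance degrades on the external population.")
```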
Solutions also need to be tested in clinical workflows, and this can be achieved through pilot studies, prospective studies and trials and through ongoing, real-world evidence studies.
Once an AI solution has been assessed for performance, generalizability, bias, and clinical validity, we can start thinking about how to integrate the algorithm into real clinical workflows. This is a crucial and challenging step that requires a lot of attention.
Clinical workflows are heterogeneous across healthcare systems, clinical contexts, specialties and even end users. It is important that prediction results reach end users at the right time, for the right patient and in the right way. For example, if every AI solution requires the end user to navigate away to a separate digital workflow, these solutions may not be widely adopted. Suboptimal workflow integration can even perpetuate bias or worsen clinical outcomes.
It is important to work closely with clinical stakeholders, implementation scientists, and human factors specialists when possible.
Finally, a solution must be monitored and refined as long as the algorithm is in use. The performance of algorithms can change over time, and it is critical that AI solutions are assessed periodically (or in real time) for both mathematical performance and clinical outcomes.
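One way such ongoing assessment is sometimes implemented is a rolling-window performance monitor, sketched below. The baseline, tolerance and window size are illustrative assumptions, and a real deployment would also track clinical outcomes, as noted above.

```python
# A rolling-window monitor for mathematical performance over time.
from collections import deque

from sklearn.metrics import roc_auc_score

BASELINE_AUC = 0.85         # performance established at validation time
TOLERANCE = 0.05            # acceptable drop before a review is triggered
window = deque(maxlen=500)  # most recent (predicted prob, actual outcome) pairs

def record_case(predicted_prob: float, actual: int) -> None:
    """Log one scored case once its true outcome becomes known."""
    window.append((predicted_prob, actual))

def check_performance() -> None:
    """Periodic check: compare rolling AUROC against the validated baseline."""
    if len(window) < 100:  # need enough cases for a stable estimate
        return
    probs, actuals = zip(*window)
    if len(set(actuals)) < 2:  # AUROC is undefined with one outcome class
        return
    rolling_auc = roc_auc_score(actuals, probs)
    if rolling_auc < BASELINE_AUC - TOLERANCE:
        print(f"Drift alert: rolling AUROC {rolling_auc:.3f} is below "
              f"baseline {BASELINE_AUC:.3f}")
```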
Q. What are the points in AI development that allow biases to creep in?
A. If used effectively, AI can improve or even transform the way we diagnose and treat diseases.
However, assumptions and decisions are made at every step of the AI development lifecycle, and if incorrect, these assumptions can lead to systematic errors. Such errors can bias the final result of an algorithm against a subgroup of patients and ultimately pose risks to the equity of healthcare. This phenomenon has been demonstrated in existing algorithms and is called algorithmic bias.
For example, if we design an algorithm and choose an outcome variable that is inherently biased, we may be perpetuating bias through the use of this algorithm. Or decisions made during the data preprocessing step may unintentionally have a negative impact on certain subgroups. Bias can be introduced and/or propagated at every stage, including deployment. Involving key stakeholders can help mitigate the risks and unintended consequences of algorithmic bias.
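A fully synthetic toy example can make the outcome-variable point concrete: if historical healthcare spending is used as a proxy label for medical need, a subgroup with reduced access to care looks "healthier" than it really is, and a model trained on that label inherits the bias. Every number below is invented purely to show the mechanism.

```python
# Synthetic illustration of a biased outcome variable. All data is fabricated;
# the point is only to show how a proxy label (spending) can systematically
# under-identify high-need patients in a group with reduced access to care.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)    # 0 = full access to care, 1 = reduced access
true_need = rng.normal(5, 2, n)  # actual medical need (unobservable in practice)

# Spending tracks need, but the reduced-access group spends less at equal need.
spending = true_need * np.where(group == 1, 0.6, 1.0) + rng.normal(0, 0.5, n)

# Label "high need" using spending (the biased proxy) vs. using true need.
proxy_label = spending > np.quantile(spending, 0.8)
true_label = true_need > np.quantile(true_need, 0.8)

for g in (0, 1):
    mask = group == g
    missed = np.mean(true_label[mask] & ~proxy_label[mask])
    print(f"Group {g}: truly high-need patients missed by proxy = {missed:.1%}")
```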
It is likely that almost all AI algorithms exhibit bias.
This does not mean the algorithm cannot be used; it underscores the importance of transparency about where the algorithm is biased. An algorithm may perform well in one population and poorly in another; it can and should still be used in the first case, where it can improve outcomes, but it should not be used in the population where it performs poorly.
Biased algorithms can still be useful, but only if we understand where it is and is not appropriate to use them.
At Mayo Clinic Platform, we’ve developed a tool to validate algorithms and perform quantitative bias assessments so we can help end users better understand how to use AI solutions safely and appropriately in clinical care.
Q. What should AI users consider when using tools like acquisition AI?
A. Users of AI algorithms should use the AI development lifecycle as a framework to understand where biases might be introduced.
Ideally, users should know an algorithm's predictors and outcome variable, though this can be more challenging for more complex algorithms. Understanding the variables used as an algorithm's inputs and outputs can help end users identify incorrect or problematic assumptions. For example, an outcome variable may be chosen that is itself biased.
End users also need to understand the training population used during model development. The AI solution may have been trained on a population that is not representative of the population to which the model will be applied. If so, that is a signal to be cautious about the model's generalizability. To this end, users need to understand how well the algorithm performed during development and whether it was externally validated.
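As a rough sketch of that check, one could compare the demographic mix of the training population with the population the model will serve. The file and column names below are hypothetical.

```python
# Compare demographic distributions of the training population against the
# local population the model will serve; large gaps flag potential
# generalizability concerns. All names are illustrative assumptions.
import pandas as pd

train = pd.read_csv("training_population.csv")
local = pd.read_csv("local_population.csv")

for col in ["race", "sex", "age_band"]:
    comparison = pd.concat(
        [
            train[col].value_counts(normalize=True).rename("train_share"),
            local[col].value_counts(normalize=True).rename("local_share"),
        ],
        axis=1,
    ).fillna(0.0)
    comparison["abs_gap"] = (
        comparison["train_share"] - comparison["local_share"]
    ).abs()
    print(f"\n{col}:")
    print(comparison.sort_values("abs_gap", ascending=False).round(3))
```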
Ideally, all algorithms should undergo a bias assessment, both quantitative and qualitative. This can help users understand mathematical performance across subgroups that vary by race, age, gender and other characteristics. Qualitative bias assessments performed by solution developers can alert users to situations that may arise from potential algorithmic bias; knowledge of these scenarios can help users better monitor for and reduce unintended performance disparities.
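A minimal quantitative version of such an assessment stratifies validation results by subgroup and compares a metric such as AUROC, as in this sketch; the column names are assumptions.

```python
# Quantitative bias assessment: stratify validation results by subgroup and
# compare discrimination. Column names are illustrative assumptions.
import pandas as pd
from sklearn.metrics import roc_auc_score

# Expected columns: race, age_band, sex, y_true (label), y_prob (model score).
results = pd.read_csv("validation_results.csv")

for col in ["race", "age_band", "sex"]:
    print(f"\nAUROC by {col}:")
    for value, grp in results.groupby(col):
        if grp["y_true"].nunique() < 2:
            continue  # AUROC is undefined when only one class is present
        subgroup_auc = roc_auc_score(grp["y_true"], grp["y_prob"])
        print(f"  {value}: {subgroup_auc:.3f} (n={len(grp)})")
```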
AI solutions for acquisition should be assessed on similar factors.
In particular, users need to understand if there are certain subgroups where performance varies. These subgroups may consist of patients with different demographics, or even patients with different diagnoses. This will help clinicians evaluate if and when the model’s predicted output is most appropriate and reliable.
Q. What are your thoughts on AI risk and risk management?
A. We normally think of risk as operational and regulatory risk. These dimensions address how a digital health solution complies with privacy, security and regulatory requirements, and they are critical to any assessment.
We also need to start thinking about risk clinically.
In other words, we need to consider how an AI solution could impact clinical outcomes and what the potential risks are if an algorithm is incorrect or biased, or if the actions taken on an algorithm are incorrect or biased.
It is the responsibility of both the solution developers and the end users to frame an AI solution in terms of risks as best as possible.
There are likely many ways to do this. Mayo Clinic Platform has developed its own risk rating system to help, and AI solutions undergo a qualification process before being used externally.
Q. How can physicians and healthcare systems get involved in the process of creating and delivering AI solutions?
A. Clinicians and solution developers must work together throughout the AI development lifecycle and during solution implementation.
Active involvement from both parties is necessary in predicting potential areas of bias and/or suboptimal performance. This knowledge will help clarify contexts that are better suited to a particular AI algorithm and contexts that may require more monitoring and oversight. All relevant stakeholders should also be involved during the implementation phase, and AI solutions should be carefully monitored and refined as necessary.
Andrea Fox is senior editor of Healthcare IT News.
Email: afox@himss.org
Healthcare IT News is a HIMSS Media publication.