The FDA has asked for public input on how to measure the real-world performance of AI-enabled medical devices. Stakeholders have raised concerns over performance drift, data quality, and regulatory feasibility.
Key takeaways:
- The FDA is exploring how to monitor AI-enabled medical devices post-deployment, focusing on performance drift and real-world reliability.
- The FDA’s draft lifecycle guidance and PCCPs explore adaptive algorithm pathways.
- Counsel should proactively incorporate monitoring into product lifecycles ahead of potential changes in FDA regulation.
Why the FDA is seeking public input
The US Food and Drug Administration (FDA) has authorised 1,247 AI-enabled medical devices since 1995, and the current regulatory framework is working well for static algorithms. “The current FDA regulatory framework for AI-enabled medical devices is pretty effective in regulating these devices, and that’s mainly because the framework is tailored based on device risk,” said Siemens Healthineers senior advisor for digitalisation and AI Peter Shen.
Here’s the catch – all FDA-authorised AI-enabled medical devices to date use “locked” algorithms, models that do not change with use and cannot be altered without fresh FDA review.
The FDA is now exploring how to accommodate continuous AI learning in medical devices. In January 2025, the agency issued draft guidance on lifecycle management – which remains unfinalised – providing recommendations on marketing AI-enabled medical devices and on post-market performance monitoring. The agency has also piloted Predetermined Change Control Plans (PCCPs) for medical devices, which allow manufacturers to pre-specify planned algorithmic modifications so that changes falling within an authorised plan do not require new submissions. Shen described the pilot as a good first step in recognising that AI algorithms need to be improved with new training data or adapted over time.
The FDA explored these challenges in November 2024, when its Digital Health Advisory Committee met to discuss real-world evaluation strategies to ensure that AI-enabled medical devices remain safe and effective after deployment. Public commentary ahead of that meeting highlighted how difficult monitoring becomes once algorithms meet real-world conditions.
The FDA broadened this exploration in September 2025 by requesting public comments on measuring real-world performance of AI devices, building on insights from the earlier advisory committee discussion. Manufacturers and other stakeholders have until 1 December 2025 to respond. Notably, the request is meant to gather input, not set new rules.
“What the FDA is trying to determine is whether the post-market surveillance process will still work with AI algorithms, especially with algorithms that may be continuously learning or changing over the course of time. The ask is really just exploration by the FDA about how to refine a process that’s already in place,” Shen added.
Regulatory advisors also caution that compliance strategies must extend beyond the FDA. “Given how rapidly regulations are developing, I would advise clients to maintain a proactive and flexible compliance strategy. This includes closely tracking updates not only from FDA and local regulators, but also from key global regulators. The key is spotting where rules are aligning – or diverging – early so they can adapt quickly as things change,” said Baker Sterchi Cowden & Rice partner Megan Sterchi Lammert.
For example, AI-enabled medical device manufacturers in the EU must comply with both the Medical Device Regulation and the EU Artificial Intelligence Act, creating a dual framework that combines device safety with AI obligations. In contrast, the UK’s Medicines and Healthcare products Regulatory Agency regulates AI without standalone AI legislation.
The FDA’s questions in the request for public comments – spanning performance metrics, monitoring infrastructure, human-AI interaction, and data governance – reveal considerable uncertainty about implementation details.
Real-world AI risks
AI-enabled medical devices operate in environments where performance can degrade silently. Unlike traditional devices, AI models can drift as the data they encounter in practice diverges from the data they were trained on.
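A minimal sketch of how such silent degradation might be surfaced in practice: track accuracy over a sliding window of adjudicated cases and flag when it slips below the validated baseline. The class, window size, and thresholds below are illustrative assumptions, not drawn from FDA guidance, and the approach presumes that ground-truth labels eventually become available (in clinical settings they often lag).

```python
from collections import deque

class RollingPerformanceMonitor:
    """Track accuracy over a sliding window of recent, adjudicated
    predictions and alert when it falls a set margin below the
    baseline established during validation."""

    def __init__(self, baseline: float, window: int = 500, margin: float = 0.05):
        self.baseline = baseline
        self.margin = margin
        self.outcomes = deque(maxlen=window)  # True = prediction confirmed correct

    def record(self, prediction, ground_truth) -> bool:
        """Log one adjudicated case; return True if drift is suspected."""
        self.outcomes.append(prediction == ground_truth)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        current = sum(self.outcomes) / len(self.outcomes)
        return current < self.baseline - self.margin

# In production this would be fed as confirmed labels arrive:
monitor = RollingPerformanceMonitor(baseline=0.92)
# drift_suspected = monitor.record(model_output, confirmed_label)
```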
The drift problem demands both technical solutions and regulatory adaptation. As the FDA explores lifecycle oversight, manufacturers should prepare for further post-market responsibilities.
The need for adaptive oversight has grown alongside the rapid evolution of AI itself. In late 2022, ChatGPT demonstrated that models could perform tasks they weren’t explicitly designed for, like multi-step reasoning. These so-called “emergent abilities” have been documented in research on large language models – raising questions about predictability and control.
Medical software engineer and software consulting firm owner Logan Frederick explained in a comment letter that emerging technologies have “extraordinarily broad capabilities and may be impossible to fully constrain to narrow medical-use cases.” His observation reflects broader concerns about the complexity introduced by AI in medical devices – complexity that makes continuous monitoring essential.
The FDA’s request for comment signals the agency’s concern regarding performance drift monitoring and data management. Dr Akshaya Bhagavathula, associate professor of epidemiology at North Dakota State University, argues that traditional metrics like accuracy scores miss critical dimensions of real-world performance. He proposes new approaches, such as temporal stability tracking, population-weighted scoring to avoid masking errors in underrepresented groups, and clinical impact assessments that weigh errors by their consequences.
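To make the population-weighted idea concrete, the sketch below averages accuracy per subgroup instead of across all patients, so that errors concentrated in a small group are not diluted by a large one. This is a minimal illustration assuming subgroup labels are available; the function is hypothetical rather than Dr Bhagavathula’s actual method.

```python
import numpy as np

def population_weighted_accuracy(y_true, y_pred, groups):
    """Average per-subgroup accuracy so underrepresented groups
    carry equal weight in the headline score."""
    per_group = [
        np.mean(y_true[groups == g] == y_pred[groups == g])
        for g in np.unique(groups)
    ]
    return float(np.mean(per_group))

# Illustrative data: the model looks strong overall but fails group "B".
y_true = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
groups = np.array(["A"] * 8 + ["B"] * 2)

print(np.mean(y_true == y_pred))                             # 0.8 overall
print(population_weighted_accuracy(y_true, y_pred, groups))  # 0.5
```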
But knowing what to monitor requires a systematic approach. Professor Christian Johner, founder of the Johner Institute – a consultancy specialising in medical device regulation and quality systems – identifies four essential monitoring categories: the model’s input data to ensure patient populations still match intended use, model outputs to track performance metrics and bias, user interaction to detect over-trust or under-trust of devices, and the technical environment.
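The first of those categories lends itself to a simple statistical check. The sketch below compares a live input feature against a reference sample captured at validation time using a two-sample Kolmogorov–Smirnov test – one common drift test, offered as an illustration rather than the Johner Institute’s prescribed method; the feature, sample sizes, and significance threshold are assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

def input_drift_alert(reference, live, alpha=0.01):
    """Flag when a live input feature no longer matches the
    distribution captured when the device was validated."""
    stat, p_value = ks_2samp(reference, live)
    return {"statistic": float(stat),
            "p_value": float(p_value),
            "drift_detected": p_value < alpha}

rng = np.random.default_rng(0)
reference = rng.normal(loc=60, scale=15, size=5000)  # e.g. patient age at validation
live = rng.normal(loc=72, scale=10, size=500)        # older population post-deployment

print(input_drift_alert(reference, live))  # drift_detected: True
```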
Mohammad Sufiyan Khan, an oncology-focused biomedical AI researcher, points to a deeper problem with how models are trained. He explained that most current models are “trained on cleaned and curated datasets, which limits their capacity to generalise to real-world biological noise.” Drawing on his experience designing an AI workflow for cancer drug discovery, he proposes a dual-stream learning model – one stream trained on noisy data to build “noise recognition literacy,” and another on curated data to preserve interpretability.
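One of many ways such a dual-stream design could be realised is sketched below: two encoders – one fed noisy inputs, one fed curated inputs – feeding a shared prediction head. This is an illustrative PyTorch reading of the concept under our own assumptions, not Khan’s actual architecture.

```python
import torch
import torch.nn as nn

class DualStreamModel(nn.Module):
    """Illustrative dual-stream network: one encoder learns from noisy
    inputs, another from curated inputs; a shared head fuses both."""

    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.noisy_encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.curated_encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x_noisy, x_curated):
        z = torch.cat([self.noisy_encoder(x_noisy),
                       self.curated_encoder(x_curated)], dim=-1)
        return self.head(z)

model = DualStreamModel(n_features=20)
x = torch.randn(8, 20)                   # curated batch
x_noisy = x + 0.3 * torch.randn_like(x)  # simulated measurement noise
print(model(x_noisy, x).shape)           # torch.Size([8, 1])
```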
These technical challenges converge on a fundamental question: how will the FDA continue to assess whether AI-enabled medical devices remain safe and effective as they evolve? The Institute for AI Governance in Healthcare recommends addressing data quality through routine validation processes and governance committee reviews. But as Dan Noyes, a healthcare AI strategist who builds AI tools for clinicians and patients, warns, “without transparent monitoring, silent failures such as model drift or bias could erode trust and jeopardize outcomes.”
Baker Sterchi’s Lammert emphasised that the future of AI oversight in healthcare will depend on finding the right balance. “Transparency, human judgment, and ongoing collaboration between developers and regulators will be key to promoting responsible AI innovation in healthcare,” she said.