There are more AI health tools than ever—but how well do they work?
Earlier this month, Microsoft launched Copilot Health, a new space within its Copilot app where users will be able to connect their medical records and ask specific questions about their health. A couple of days earlier, Amazon had announced that Health AI, an LLM-based tool previously restricted to members of its One Medical service, would…

In recent weeks, the tech industry has released a wave of AI health tools: Microsoft launched Copilot Health, and Amazon expanded access to its Health AI. They join similar offerings such as ChatGPT Health and Anthropic’s Claude, part of a growing trend of using chatbots for health advice. As demand for accessible health information grows, these tools aim to fill gaps left by traditional medical systems. However, while there is optimism that large language models (LLMs) can provide safe and useful recommendations, concerns persist about how effective they are and how rigorously they have been evaluated.
Microsoft’s Copilot Health, a new feature within its Copilot app, allows users to connect their medical records and ask specific health-related questions. Similarly, Amazon’s Health AI, previously limited to One Medical members, is now available to a wider audience. These tools leverage the capabilities of LLMs to offer personalized health insights, tapping into the growing need for accessible healthcare solutions. The ease of use and immediate availability of these AI tools make them an attractive alternative for individuals seeking medical advice.
The potential of LLMs in healthcare is not without precedent. OpenAI’s ChatGPT Health, released in January, and Anthropic’s Claude, which can access health records with user permission, have already demonstrated the viability of AI in this domain. Research suggests that these models can generate accurate and actionable health recommendations, particularly in areas where traditional medical resources are scarce or inaccessible.
Despite these promising developments, experts emphasize the need for more rigorous evaluation of these AI health tools. In a high-stakes field like healthcare, relying solely on companies to assess the quality and safety of their products may not be sufficient. Independent expert reviews, ideally conducted before widespread release, are crucial to ensure these tools meet the necessary standards. While some companies, such as OpenAI, are investing in thorough research, the broader research community can help identify potential blind spots and gaps in evaluation.
Andrew Bean, a doctoral candidate at the Oxford Internet Institute, acknowledges the potential of AI health tools but stresses the importance of a robust evidence base. “To the extent that you always are going to need more healthcare, I think we should definitely be chasing every route that works,” he says. “It’s entirely plausible to me that these models have reached a point where they’re actually worth rolling out.” However, Bean cautions that the evidence supporting their effectiveness must be comprehensive and transparent.
Developers argue that the recent release of these AI health tools is a testament to the advances made in LLMs. Dominic King, vice president of health at Microsoft AI, asserts that these models have reached a point where they can effectively provide medical advice. As the technology continues to evolve, it is essential to balance harnessing the potential of AI with rigorous, transparent evaluation.
In conclusion, the proliferation of AI health tools marks a significant shift in the healthcare landscape, offering new avenues for accessible medical advice. While the potential of LLMs in this domain is considerable, independent evaluation and robust evidence remain paramount. Collaboration between companies, researchers, and healthcare professionals can help harness the benefits of AI in healthcare while mitigating the risks. As the field progresses, the safety and efficacy of these tools must come first, so that they meet the high standards healthcare requires.







