Limited effect and serious risk when using AI to draft patient replies

LLM · GenAI · Epic · reading time

Following considerable optimism about GPT-4's capabilities, recent studies highlight its limited effect on reply time and the potential risks of using AI to draft replies to patient messages.

A study conducted at the University of California San Diego School of Medicine found that Epic's GenAI drafts significantly increased response length and reading time while having no effect on reply time. The study also raised safety concerns, noting instances where AI-generated drafts suggested clinical actions, such as recommending an X-ray or physical therapy, exceeding the tool's intended function. Although some physicians found the tool beneficial, there is room for improvement, particularly in personalizing drafts to better match a physician's style and in making more judicious recommendations about whether to advise a patient visit.

At Mass General Brigham, a proof-of-concept end-user study assessed the effectiveness and safety of Epic's AI-assisted patient messaging. The results indicated that the language-model drafts posed a risk of severe harm in approximately 7.1% of responses and, in one instance (0.6%), a risk of death. Notably, the supplementary materials show that the prompt explicitly asked for treatment recommendations, so some of these outcomes could potentially have been prevented with alternative prompting, as sketched below.
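To illustrate the prompting point, here is a minimal sketch of how a draft-reply prompt could steer the model away from making clinical recommendations. The system prompt wording, model choice, and use of the OpenAI client are assumptions for illustration only; neither study publishes Epic's actual production prompt, and this is not how Epic's integration is implemented.

```python
# Minimal sketch of "alternative prompting": instructing the model NOT to
# suggest clinical actions. The system prompt below is hypothetical, not
# Epic's actual production prompt.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You draft replies to patient portal messages for a clinician to review. "
    "Do NOT recommend tests, imaging, medications, or other treatments. "
    "If the message requires clinical judgment, draft a reply that asks the "
    "patient to schedule a visit, and note that the clinician should review "
    "the message personally."
)

def draft_reply(patient_message: str) -> str:
    """Return a draft reply intended for clinician review, not direct sending."""
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical model choice for this sketch
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": patient_message},
        ],
        temperature=0.2,  # lower temperature for more conservative drafts
    )
    return response.choices[0].message.content

print(draft_reply("My back pain is worse. Should I get an X-ray?"))
```

Whether such prompt-level guardrails reliably prevent unsafe suggestions is exactly what end-user studies like these would need to verify.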

The researchers behind both studies acknowledge the potential of these tools but advise caution: "LLMs might affect clinical decision making in ways that need to be monitored and mitigated when used in a human and machine collaborative framework."

To date, this application of AI has largely been viewed as an administrative aid, escaping the scrutiny typically reserved for medical devices. However, if AI is influencing clinical decisions, this may be worth reconsidering.