MIT Building E25-140
45 Carleton Street, Cambridge, MA 02142
Natural Language Foundation Models in Medical Artificial Intelligence
Over the past decade, the transformative rise of deep learning, particularly large language models, has spurred innovators across diverse fields, including medicine, to explore the potential of artificial intelligence (AI) to revolutionize existing practices. In this time, general foundation models have begun to displace narrow, highly specialized task-specific systems as the dominant paradigm. In healthcare, we are witnessing the development of AI systems with medical knowledge and diagnostic capabilities that appear (in limited evaluations) to rival or even surpass those of human clinicians. Because they can process natural language, a crucial medium for knowledge and communication in medicine, many of these modern foundation models hold the promise of unlocking a new generation of highly versatile and impactful clinical AI systems. This thesis focuses on two key classes of natural language-driven foundation models, Contrastive Language-Image Pre-training (CLIP) models and large language models (LLMs), and investigates how such models can be leveraged to encode and deliver useful clinical knowledge.
First, we introduce Text-Image Entropy Regularization (TIER), a novel method inspired by the observation that clinical findings in medical images are often localized. Using large public chest X-ray (CXR) datasets, we apply TIER during pre-training to improve the local alignment of CLIP representations, and we use the resulting model to achieve state-of-the-art zero-shot classification performance on CXR findings. Next, we examine the reliability of CLIP-style models. In one study, we evaluate their robustness to shortcut learning, a phenomenon in which deep learning models come to rely on non-generalizable decision rules, to understand the potential protective effects of text self-supervision. In another study, we explore how conformal prediction, a statistical framework for quantifying uncertainty, can be used to control zero-shot classification performance and to anticipate which inputs are compatible with, and safe to use for, these CLIP-style models.
Third, we develop Articulate Medical Intelligence Explorer (AMIE), a conversational diagnostic AI system based on an LLM fine-tuned with simulated patient-doctor dialogues. We evaluate the diagnostic capabilities of AMIE against primary care physicians (PCPs) in two randomized studies: first on challenging clinicopathological conference (CPC) cases, and then in virtual text-based objective structured clinical examinations (OSCEs). In this text-only setting, specialist clinicians rated AMIE superior in both diagnostic and conversation quality, and access to AMIE substantially improved the accuracy and comprehensiveness of the PCPs' differential diagnoses. Finally, we explore AMIE's management reasoning capabilities in two subspecialty domains, genetic cardiovascular disease and breast oncology, comparing our system's performance on real and synthetic cases with that of general cardiologists and oncologists under subspecialist evaluation. Our results indicate that AMIE is complementary to clinicians and may help up-level generalists: AMIE was often more thorough and sensitive, while the clinicians were often more concise and specific. As a whole, this thesis explores the potential of natural language-driven foundation models in medicine, while emphasizing the need for further research to address real-world challenges and ensure their safety and efficacy.
Thesis Supervisor:
Andrew Beam, PhD
Assistant Professor of Epidemiology, Harvard T.H. Chan School of Public Health
Assistant Professor of Biomedical Informatics, Harvard Medical School
Thesis Committee Chair:
Marzyeh Ghassemi, PhD
Associate Professor of Electrical Engineering and Computer Science (EECS) and of the Institute for Medical Engineering and Science (IMES), MIT
Thesis Readers:
Marinka Zitnik, PhD
Assistant Professor of Biomedical Informatics, Harvard Medical School
Tianxi Cai, Sc.D.
John Rock Professor of Population and Translational Data Sciences, Biostatistics, Harvard T.H. Chan School of Public Health
________________________________________________________________________________________
Zoom Invitation
Anil Palepu is inviting you to a scheduled Zoom meeting
Topic: Anil Palepu MEMP PhD Thesis Defense
Time: Friday, January 17, 2025, 3:00 PM Eastern Time (US and Canada)
Your participation is important to us: please notify hst [at] mit.edu at least 3 business days in advance if you require accommodations to access this event.
Join Zoom Meeting
https://mit.zoom.us/j/2381189569
Password: 1111
One tap mobile
+496971049922,,2381189569# Germany
Meeting ID: 238 118 9569
US: +1 646 558 8656 or +1 669 900 6833
International Numbers: https://mit.zoom.us/u/aeshFd0faT
Join by SIP
2381189569 [at] zoomcrc.com
Join by Skype for Business
https://mit.zoom.us/skype/2381189569