MIT Building 4-370
182 Memorial Drive, Cambridge, MA 02139
Biology-Guided Representation Learning for Computational Pathology
Cancer diagnosis has been grounded in tissue morphology for over a century, yet the molecular programs that drive disease progression, treatment response, and patient outcome are not directly visible in standard histology slides. Current foundation models for computational pathology learn representations primarily from morphology, without access to the underlying biology of mutations, gene expression, or protein states. This thesis investigates whether encoding biological structure directly into computational pathology models can produce stronger representations than those learned from morphology alone. The thesis introduces biology as a learning signal at multiple levels of the computational pathology pipeline. At the task level, structuring gene expression into biological pathways and fusing them with histology produces more interpretable and accurate survival predictions than unstructured approaches. At the representation level, a demographic bias audit reveals that encoder quality is the most consequential lever among the design choices evaluated. This finding motivates a shift from task-specific integration of biological information toward using biology directly as a representation-learning signal for computational pathology encoders. At the pretraining level, three self-supervised methods learn slide-level representations by aligning histology with biological structure: immunohistochemistry stains for organ-specific pretraining, paired genomic and transcriptomic profiles for pan-cancer pretraining, and spatial protein markers for a new imaging modality beyond H&E histology. These results provide evidence that biological structure, whether encoded as pathways, stains, genomes, or protein markers, is an effective and transferable learning signal for computational pathology.
Whether biology-guided representations can improve patient outcomes when deployed in clinical decision-making, and how such systems should be integrated into diagnostic workflows, validated across diverse populations, and regulated in practice, remain open questions that span science, engineering, and health policy.
Thesis Supervisor:
Faisal Mahmood, PhD
Associate Professor of Pathology, Brigham and Women’s Hospital and Harvard Medical School
Thesis Committee Chair:
Brett Bouma, PhD
Professor of Dermatology, Massachusetts General Hospital and Harvard Medical School
Thesis Reader:
Eliezer M. Van Allen, MD
Professor of Medicine, Dana-Farber Cancer Institute and Harvard Medical School
________________________________________________________________________________________
Zoom Invitation
Anurag Vaidya is inviting you to a scheduled Zoom meeting
Topic: Anurag Vaidya MEMP PhD Thesis Defense
Time: Wednesday, May 6, 2026, 11:00 AM Eastern Time (US and Canada)
Your participation is important to us: please notify hst [at] mit.edu at least 3 business days in advance if you require accommodations to access this event.
Join Zoom Meeting
https://mit.zoom.us/j/95035650374
One tap mobile
+16465588656,,95035650374# US (New York)
+16699006833,,95035650374# US (San Jose)
Meeting ID: 950 3565 0374
US: +1 646 558 8656 or +1 669 900 6833
International Numbers: https://mit.zoom.us/u/acaQkJifBl
Join by SIP
95035650374 [at] zoomcrc.com
Join by Skype for Business
https://mit.zoom.us/skype/95035650374