APOLLO Foundation Model Trained on 25 Billion Medical Events from 7.2 Million Patients

Researchers developed APOLLO, a multimodal temporal foundation model trained on the MGB-7M dataset, which includes 7.2 million patients and 25.2 billion medical events from 33 years of records across 17 Mass General Brigham institutions.

The model integrates 28 medical modalities, including lab tests, progress notes, and medical images, and was trained using Masked Token Modeling to learn virtual patient representations.
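As a rough illustration of the masking idea, the sketch below hides a fraction of the events in a toy patient timeline and records the originals as prediction targets. The mask symbol, the 30% mask rate, and the event vocabulary are all assumptions made for this example; the source does not describe APOLLO's tokenizer or masking schedule.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the source


def mask_events(tokens, mask_rate=0.15, seed=0):
    """Hide a random subset of tokens.

    Returns (masked_tokens, targets), where targets maps each masked
    position to the original token a model would be asked to recover.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one token, and never more than the sequence holds.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets


# Toy timeline mixing modalities; the event names are made up.
timeline = [
    "lab:HbA1c=high",
    "note:fatigue",
    "dx:E11.9",
    "rx:metformin",
    "lab:eGFR=normal",
    "img:chest_xray",
]
masked, targets = mask_events(timeline, mask_rate=0.3, seed=42)
```

During training, a model would see `masked` and be scored on how well it predicts the entries in `targets`, which pushes it to use the surrounding events from every modality as context.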

Performance Highlights

APOLLO was evaluated on 322 clinical tasks (261 prognostic, 61 retrieval). Reported results include:

  • Schizophrenia onset: AUROC 0.92
  • In-hospital dialysis dependence: balanced accuracy 0.97
  • 3-year heart failure risk: AUROC 0.88 (vs. baseline 0.77)
  • 3-year type 2 diabetes risk: AUROC 0.85 (vs. baseline 0.61, p<0.0001)
  • 3-year post-stroke survival: AUROC 0.84 (vs. baseline 0.72, p<0.0001)
  • Survival prediction for trastuzumab therapy in HER2+ breast cancer: AUROC 0.93 (vs. baseline 0.66, p<0.0001)
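For readers unfamiliar with the two metrics quoted above, here is a minimal sketch of how each can be computed. AUROC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (0.5 is chance, 1.0 is perfect ranking); balanced accuracy is the mean of sensitivity and specificity. The labels, scores, and 0.5 decision threshold are toy values chosen for illustration, not data from the study.

```python
def auroc(labels, scores):
    """Pairwise AUROC: share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of cases,
    which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels, preds):
    """Mean of sensitivity (recall on positives) and specificity."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    return 0.5 * (tp / n_pos + tn / n_neg)


# Toy cohort: 3 patients with the outcome, 5 without.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

toy_auroc = auroc(labels, scores)
toy_bal_acc = balanced_accuracy(labels, preds)
```

Balanced accuracy is the more informative of the two when outcomes are rare, because plain accuracy can look high for a model that simply predicts the majority class.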

In ablation studies on cancer progression, multimodal integration reached a mean AUROC of 0.735, ahead of both a structured-data-only variant (0.71) and task-specific fine-tuning (0.626).

APOLLO also retrieved similar patients from a test database of 1.4 million using pathology slides as queries.
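One common way to implement this kind of retrieval is nearest-neighbour search over embedding vectors. The sketch below ranks stored patient vectors by cosine similarity to a query vector. It assumes that both the query (for example, a pathology slide) and each patient have already been encoded into vectors of the same length; the source does not say which similarity measure or index APOLLO uses, and the three-dimensional vectors and patient IDs here are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_similar(query, database, k=2):
    """Return the IDs of the k stored vectors most similar to the query."""
    ranked = sorted(
        database.items(),
        key=lambda item: cosine(query, item[1]),
        reverse=True,
    )
    return [patient_id for patient_id, _ in ranked[:k]]


# Hypothetical patient embeddings (real ones would be far larger).
database = {
    "patient_a": [0.9, 0.1, 0.0],
    "patient_b": [0.1, 0.9, 0.2],
    "patient_c": [0.8, 0.2, 0.1],
    "patient_d": [0.0, 0.1, 0.9],
}
query = [1.0, 0.1, 0.0]  # stand-in for an encoded pathology slide
top_matches = retrieve_similar(query, database, k=2)
```

A linear scan like this would be too slow for 1.4 million patients in practice; systems at that scale typically rely on an approximate nearest-neighbour index instead.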

Background & Context

Only 3% of an estimated 50 PB of annual healthcare data is used for research. Existing AI models often analyze single modalities and fail to capture longitudinal patient histories.

APOLLO aims to create unified digital patient representations to enable precision trial matching and personalized risk stratification.

Important Limitations

  • The model is trained on observational EHR data, so predictions are associational, not causal.
  • It cannot estimate differential efficacy of competing treatments.
  • Treatment-response analyses stratify risk within cohorts receiving a given therapy, not across different treatments.
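The last point can be made concrete with a small sketch. Risk scores are split into high and low groups separately within each therapy cohort, using that cohort's own median as the cutoff. The patient records, therapy names, and median-split rule are assumptions for illustration only; the source does not specify how strata are defined.

```python
from collections import defaultdict
from statistics import median


def stratify_within_cohorts(patients):
    """Split each therapy cohort into high/low risk by its own median.

    patients: iterable of (patient_id, therapy, risk_score) tuples.
    Returns {therapy: {"high": [ids], "low": [ids]}}.
    """
    by_therapy = defaultdict(list)
    for patient_id, therapy, risk in patients:
        by_therapy[therapy].append((patient_id, risk))

    strata = {}
    for therapy, members in by_therapy.items():
        cutoff = median(risk for _, risk in members)
        strata[therapy] = {
            "high": [pid for pid, risk in members if risk > cutoff],
            "low": [pid for pid, risk in members if risk <= cutoff],
        }
    return strata


# Hypothetical records: (patient_id, therapy, predicted risk).
patients = [
    ("p1", "therapy_x", 0.8),
    ("p2", "therapy_x", 0.2),
    ("p3", "therapy_x", 0.5),
    ("p4", "therapy_y", 0.9),
    ("p5", "therapy_y", 0.7),
]
strata = stratify_within_cohorts(patients)
```

Here `p5` lands in the low-risk group for `therapy_y` despite scoring higher than every low-risk patient on `therapy_x`. The strata describe relative risk among patients who received the same therapy, so they say nothing about which therapy would have served a given patient better.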