Introduction

MEREDITH (Medical Evidence Retrieval and Data Integration for Tailored Healthcare) is a novel LLM system to support treatment recommendations in precision oncology. MEREDITH uses Google Cloud AI Platform to predict recommended precision oncology treatments for patients based on their diagnosis (e.g., NSCLC) and molecular profile. MEREDITH was developed as part of a research project by a team of scientists at [MRI Klinikum rechts der Isar der Technischen Universität München](https://www.mri.tum.de/). Once the full text is published, it will be linked here for full context.

Technical Design

MEREDITH combines Vertex AI Agents, used as retrievers in a Retrieval Augmented Generation (RAG) architecture, with calls to Vertex AI Text Generation Models for summarization, followed by calls to Vertex AI Generative Models for synthesis.
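The retrieve, summarize, synthesize flow described above can be sketched in plain Python. This is only an illustration of the control flow, not the project's implementation: `recommend`, `Retriever`, `TextModel`, the prompt wording, and the source names are all hypothetical, and the three callables stand in for the Vertex AI Agents and model calls.

```python
from typing import Callable, Dict, List

# Hypothetical type aliases: a retriever maps a query to text chunks,
# a text model maps a prompt to generated text.
Retriever = Callable[[str], List[str]]
TextModel = Callable[[str], str]


def recommend(
    diagnosis: str,
    biomarker: str,
    retrievers: Dict[str, Retriever],
    summarize: TextModel,
    synthesize: TextModel,
) -> str:
    """Retrieve from every source, summarize per source, then synthesize."""
    query = f"{diagnosis} {biomarker}"
    summaries = []
    for name, retrieve in retrievers.items():
        chunks = retrieve(query)
        if not chunks:
            continue  # nothing was retrieved from this source
        prompt = f"Summarize the evidence on {query} from {name}:\n" + "\n".join(chunks)
        summaries.append(f"[{name}] {summarize(prompt)}")
    # One final call combines the per-source summaries into a recommendation.
    return synthesize(
        f"Recommend a treatment for {query} based on:\n" + "\n".join(summaries)
    )
```

Passing the retrievers and models in as callables keeps the sketch runnable with simple stand-ins (for example, lambdas returning fixed strings) before any cloud resources are wired up.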

How-to

Prerequisites

To run MEREDITH, you will need:

  • A Google Cloud project with the Vertex AI API enabled
  • A credentials.json for a service account in your Google Cloud project with Vertex AI user and Discovery Engine viewer permissions
  • Three Vertex AI Agents: (1) an agent with literature on precision oncology research (we used the PubMed query "targeted treatment for [biomarker]"); (2) an agent with currently valid standards of care (we used Deutsche Krebsgesellschaft); (3) an agent with currently recruiting clinical trials (we used QuickQueck). All input documents are provided as .pdf, and all agents must be configured as Layout Parsers and set to "Chunking" mode. Unfortunately, we cannot make our agents publicly available due to copyright concerns
  • And that's it, you're good to go!
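Before running anything, it can help to sanity-check the service account key. The sketch below is an assumption about how one might do this, not part of MEREDITH: the helper name `use_service_account` is hypothetical. It relies on two things that are standard for Google Cloud: service account key files carry the fields `type`, `project_id`, `private_key`, and `client_email`, and Google client libraries read the key path from the `GOOGLE_APPLICATION_CREDENTIALS` environment variable.

```python
import json
import os
from pathlib import Path

# Fields present in every Google Cloud service account key file.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}


def use_service_account(path: str = "credentials.json") -> str:
    """Validate the key file, export its path, and return the project ID."""
    key_path = Path(path).resolve()
    with key_path.open(encoding="utf-8") as fh:
        key = json.load(fh)
    missing = REQUIRED_KEYS - key.keys()
    if missing:
        raise ValueError(f"{key_path.name} is missing fields: {sorted(missing)}")
    if key["type"] != "service_account":
        raise ValueError("expected a service account key file")
    # Google client libraries pick up credentials from this variable.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(key_path)
    return key["project_id"]
```

The check only confirms that the file looks like a service account key. It does not verify that the account actually holds the Vertex AI and Discovery Engine permissions listed above.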

Patient input data

  • Patient input data is provided in patient.json. The patients are synthetic and were used in previous studies, e.g., Benary et al., 2023
  • MRI TUM MTB annotations for patients are stored as mtb_recommendations. These are used to calculate cosine similarity scores
  • Patient input data can be modified. We pre-screened the patient input data to identify pathogenic mutations and separate them into tumor_pathogenic, which reduces load on the system. Only pathogenic mutations are processed
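As a rough illustration of working with the patient file, the sketch below reads patient.json and pulls out the pathogenic mutations. The field name `tumor_pathogenic` comes from the description above. Everything else is an assumption: the exact layout of patient.json is not documented here, so the code accepts either a single patient object or a list of them, and the helper names `load_patients` and `pathogenic_mutations` are hypothetical.

```python
import json
from typing import Any, Dict, List


def load_patients(path: str = "patient.json") -> List[Dict[str, Any]]:
    """Read the patient file and return a list of patient records."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    # Assumption: the file holds either one patient object or a list of them.
    return data if isinstance(data, list) else [data]


def pathogenic_mutations(patient: Dict[str, Any]) -> List[Any]:
    """Return the pre-screened pathogenic mutations, or [] if none are listed."""
    return list(patient.get("tumor_pathogenic", []))
```

Returning an empty list for a patient without `tumor_pathogenic` mirrors the rule that only pathogenic mutations are processed: such a patient simply contributes nothing to the pipeline.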

Cosine similarity score

  • A cosine similarity score is calculated after LLM annotations are completed
  • Each score is the cosine similarity between the embedding vectors of two texts and serves as a measure of their semantic similarity. Cosine similarity can mathematically range from -1 to 1; in practice, scores commonly fall between 0.5 and 0.9, with higher values indicating greater similarity
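For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The following is a minimal standard-library sketch of that formula. It is not taken from the MEREDITH code base, and it assumes the two texts have already been turned into embedding vectors of equal length by whatever embedding model is in use.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score 1, orthogonal vectors score 0, and opposite vectors score -1, regardless of their lengths.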