MEREDITH (Medical Evidence Retrieval and Data Integration for Tailored Healthcare) is a novel LLM system to support treatment recommendations in precision oncology. MEREDITH uses Google Cloud AI Platform to generate recommendations for precision oncology treatment based on a patient's diagnosis (e.g., NSCLC) and molecular profile. MEREDITH was developed as part of a research project by a team of scientists at [MRI Klinikum rechts der Isar der Technischen Universität München](https://www.mri.tum.de/). The full text will be linked here once it is published.
MEREDITH combines Vertex AI Agents, which act as retrievers in a Retrieval Augmented Generation (RAG) architecture, with calls to Vertex AI Text Generation Models for summarization, followed by calls to Vertex AI Generative Models for synthesis.
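The retrieve, summarize, synthesize flow described above can be sketched roughly as follows. This is a minimal illustration, not MEREDITH's actual code: the function `recommend`, the type aliases, and the prompt wording are all hypothetical, and the retrievers and models are passed in as plain callables so that the sketch makes no assumptions about the Vertex AI client API.

```python
from typing import Callable, Dict, List

# Hypothetical type aliases, used only for this illustration.
Retriever = Callable[[str], List[str]]  # query -> retrieved text chunks
TextModel = Callable[[str], str]        # prompt -> generated text


def recommend(
    diagnosis: str,
    mutation: str,
    retrievers: Dict[str, Retriever],
    summarize: TextModel,
    synthesize: TextModel,
) -> str:
    """Query each retriever, summarize its results, then synthesize one answer."""
    query = f"{diagnosis} {mutation}"
    summaries = []
    for source, retrieve in retrievers.items():
        chunks = retrieve(query)
        if not chunks:
            # Skip sources that returned no evidence for this query.
            continue
        # Prompt wording is an assumption, not the prompt used by MEREDITH.
        prompt = f"Summarize the {source} evidence for {query}:\n" + "\n".join(chunks)
        summaries.append(f"[{source}] {summarize(prompt)}")
    # A single final call combines the per-source summaries.
    return synthesize("Recommend a treatment based on:\n" + "\n".join(summaries))
```

In a real setup, each entry in `retrievers` would wrap one of the three agents (literature, standards of care, clinical trials), and `summarize` and `synthesize` would wrap the respective model calls.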
To run MEREDITH, you will need:
- A Google Cloud project with the Vertex AI API enabled
- A `credentials.json` for a service account in your Google Cloud project with `Vertex AI user` and `Discovery Engine viewer` permissions
- Three Vertex AI Agents:
  - an agent with literature on precision oncology research (we used the query `targeted treatment for [biomarker]` on PubMed)
  - an agent with currently valid standards of care (we used Deutsche Krebsgesellschaft)
  - an agent with currently recruiting clinical trials (we used QuickQueck)

  All input documents are provided as .pdf. All agents must be configured as Layout Parsers and set to "Chunking" mode. Unfortunately, we cannot make our agents publicly available due to copyright concerns.

And that's it, you're good to go!

- Patient input data is provided in `patient.json`. The patients are synthetic and were used in previous studies, e.g., Benary et al., 2023
- MRI TUM MTB annotations for patients are stored as `mtb_recommendations`. These are used to calculate cosine similarity scores
- Patient input data can be modified. We pre-screened the patient input data to identify pathogenic mutations and store them separately in `tumor_pathogenic`, which reduces load on the system. Only pathogenic mutations will be processed
- A cosine similarity score is calculated after LLM annotations are completed
- Cosine similarity scores measure the semantic similarity between two texts as the cosine of the angle between their embedding vectors. Cosine similarity is mathematically bounded between -1 and 1; in practice, the scores here commonly fall between 0.5 and 0.9, and higher values indicate greater similarity
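
To make the scoring step concrete, here is a minimal sketch of cosine similarity between two embedding vectors in plain Python. The function name `cosine_similarity` and the toy vectors are illustrative assumptions; in practice the vectors would come from an embedding model applied to the LLM annotation and the corresponding `mtb_recommendations` text, and the project's own implementation may differ.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (made up for illustration).
llm_embedding = [0.2, 0.7, 0.1]
mtb_embedding = [0.25, 0.6, 0.2]
score = cosine_similarity(llm_embedding, mtb_embedding)
```

Vectors pointing in the same direction score 1, orthogonal vectors score 0, and opposite vectors score -1, regardless of their lengths.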