Chat with Open Targets genetics database leveraging text-to-graphQL capabilities of OpenAI Codex.

cx0/chatGPT-for-genetics

chatGPT for genetics

This is the code repo for my submission to the Scale AI Generative AI & LLM hackathon last week. Please feel free to fork it or submit a PR for feature requests.

Demo

Q: What disease / phenotype is associated with my gene of interest?

Asking Open Targets questions about gene-phenotype associations.

Motivation

Open Targets is the largest public-private partnership to curate information about genetic diseases, clinical trials and molecular entities (e.g., drugs) to accelerate drug discovery research. While the web interface is user-friendly for simple queries, it does not provide a convenient search engine to support more sophisticated queries.

Fortunately, the Open Targets team provides a graphQL API to access and query relevant data from a number of useful endpoints. However, most biologists are not comfortable writing their own graphQL queries or accessing these data via an API. Can we leverage GPT-based search engines to translate natural language into valid graphQL queries to navigate the richest drug discovery dataset?

Inspiration

I really liked BirdSQL, Perplexity AI's GPT-based search engine that uses OpenAI Codex to translate natural language to SQL queries to navigate Twitter. Make sure to check it out yourself here. It is a great choice to showcase the capabilities of their search engine and a very impressive implementation overall!

Implementation

As you can see with BirdSQL, OpenAI's Codex tool does a wonderful job translating natural language to SQL queries. OpenAI suggests specific prompt templates to improve the generated text response. For the text-to-SQL code completion task, they suggest prompting the model with the SQL tables and their properties. I was curious to find out whether Codex models are capable of translating text to graphQL queries given a similar schema.

Input  : prompt template with schema + user query
Model  : code-davinci-002 Codex model
Output : user query in graphQL syntax

Once we have a valid graphQL query, we can submit it to the relevant API endpoint provided by Open Targets Platform GraphQL and relay the response to the user. In practice, prompting with the schema alone did not work well: the Codex model populated unnecessary extra fields, which resulted in invalid graphQL queries.
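The submission step described above can be sketched roughly as follows. This is a minimal illustration and not the code from this repo: the endpoint URL is an assumption to verify against the Open Targets documentation, and `build_request` and `run_query` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed endpoint for the Open Targets Platform graphQL API;
# check the official documentation before relying on it.
API_URL = "https://api.platform.opentargets.org/api/v4/graphql"


def build_request(query, variables=None, url=API_URL):
    """Wrap a graphQL query string in a JSON POST request (nothing is sent yet)."""
    payload = {"query": query, "variables": variables or {}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, variables=None, timeout=30.0):
    """Send the query and return the "data" part of the JSON response.

    A server may also reject an invalid query with an HTTP error status,
    in which case urlopen raises urllib.error.HTTPError instead.
    """
    request = build_request(query, variables)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    if body.get("errors"):
        # Invalid queries (e.g. extra fields invented by the model) land here.
        raise ValueError(f"graphQL errors: {body['errors']}")
    return body.get("data", {})
```

Keeping request construction separate from the network call makes it easy to inspect or log a generated query before it is sent.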

Instead, providing an illustrative example graphQL query in the prompt was sufficient for Codex to produce a decent text-to-graphQL translation that plays nicely with the Open Targets graphQL API. The demo code in this repo uses this approach. I'm looking forward to exploring other prompt and fine-tuning strategies to improve text-to-graphQL translation for a wide range of Q&A tasks where a well-structured, domain-specific graphQL API is available.
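A minimal sketch of what such an example-based prompt could look like is shown below. The example question, the example query and its field names, and the `build_prompt` helper are all illustrative assumptions, not the actual prompt used in this repo.

```python
# Hypothetical worked example shown to the model; the query fields are
# illustrative and should be checked against the Open Targets schema.
EXAMPLE_QUESTION = "Which diseases are associated with the gene ENSG00000169083?"
EXAMPLE_QUERY = """\
query {
  target(ensemblId: "ENSG00000169083") {
    approvedSymbol
    associatedDiseases {
      rows {
        disease { name }
        score
      }
    }
  }
}"""


def build_prompt(user_question):
    """Return a completion prompt: one worked example, then the new question."""
    return (
        "### Translate questions about Open Targets into graphQL queries\n"
        f"### Question: {EXAMPLE_QUESTION}\n"
        f"{EXAMPLE_QUERY}\n"
        f"### Question: {user_question.strip()}\n"
        "query {"  # open the query so the model continues in graphQL syntax
    )


if __name__ == "__main__":
    print(build_prompt("What phenotypes are linked to ENSG00000157764?"))
```

Because the prompt ends with an opening `query {`, the completion returned by the model would need that prefix prepended before the query is submitted.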

TODO

  • SQL-to-graphQL direct translation may be a better option?
  • Expand prompt capabilities to answer other frequently asked questions
  • Hook up a Slack chatbot for more user-friendly interaction
  • Slick web interface similar to BirdSQL?
  • Implement using langchain?
