Commit

Typo fix
Kitty Murphy committed Jun 10, 2024
1 parent f7cdf5e commit 4ffc285
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion manuscript.qmd
@@ -127,7 +127,7 @@ annotation_order <- gsub("_"," ",names(sort(unlist(weights_dict), decreasing = T

## Abstract

-There are thousands of human phenotypes which are linked to genetic variation. These range from the benign (white eyelashes) to the deadly (respiratory failure). The Human Phenotype Ontology has categorised all human phenotypic variation into an unified framework that defines the relationships between them (e.g. missing arms and missing legs are both abnormalities of the limb). This has made it possible to perform phenome-wide analyses, e.g. to prioritise which make the best candidates for gene therapies. However, there is currently limited metadata describing the clinical characteristics / severity of these phenotypes. With \>`r format(round(n_ids_hpo/100)*100, scientific = FALSE)` phenotypic abnormalities across \>`r format(round(length(unique(p2g$disease_id))/100)*100, scientific = FALSE)` rare diseases, manual curation of such phenotypic annotations by experts would be exceedingly labour-intensive and time-consuming. Leveraging advances in artificial intelligence, we employed the OpenAI GPT-4 large language model (LLM) to systematically annotate the severity of all phenotypic abnormalities in the HPO. Phenotypic severity was defined using a set of clinical characteristics and their frequency of occurrence. First, we benchmarked the generative LLM clinical characteristic annotations against ground-truth labels within the HPO (e.g. phenotypes in the 'Cancer' HPO branch were annotating as causing cancer by GPT-4). True positive recall rates across different clinical characteristics ranged from `r round(min(checks$true_pos_rate)*100)`-`r round(max(checks$true_pos_rate)*100)`% (mean=`r round(mean(checks$true_pos_rate)*100)`%), clearly demonstrating the ability of GPT-4 to automate the curation process with a high degree of fidelity. Using a novel approach, we developed a severity scoring system that incorporates both the nature of the clinical characteristic and the frequency of its occurrence. 
-These clinical characteristic severity metrics will enable efforts to systematically prioritise which human phenotypes are most detrimental to human health, and best targets for therapeutic intervention.
+There are thousands of human phenotypes which are linked to genetic variation. These range from the benign (white eyelashes) to the deadly (respiratory failure). The Human Phenotype Ontology has categorised all human phenotypic variation into a unified framework that defines the relationships between them (e.g. missing arms and missing legs are both abnormalities of the limb). This has made it possible to perform phenome-wide analyses, e.g. to prioritise which make the best candidates for gene therapies. However, there is currently limited metadata describing the clinical characteristics / severity of these phenotypes. With \>`r format(round(n_ids_hpo/100)*100, scientific = FALSE)` phenotypic abnormalities across \>`r format(round(length(unique(p2g$disease_id))/100)*100, scientific = FALSE)` rare diseases, manual curation of such phenotypic annotations by experts would be exceedingly labour-intensive and time-consuming. Leveraging advances in artificial intelligence, we employed the OpenAI GPT-4 large language model (LLM) to systematically annotate the severity of all phenotypic abnormalities in the HPO. Phenotypic severity was defined using a set of clinical characteristics and their frequency of occurrence. First, we benchmarked the generative LLM clinical characteristic annotations against ground-truth labels within the HPO (e.g. phenotypes in the 'Cancer' HPO branch were annotated as causing cancer by GPT-4). True positive recall rates across different clinical characteristics ranged from `r round(min(checks$true_pos_rate)*100)`-`r round(max(checks$true_pos_rate)*100)`% (mean=`r round(mean(checks$true_pos_rate)*100)`%), clearly demonstrating the ability of GPT-4 to automate the curation process with a high degree of fidelity. Using a novel approach, we developed a severity scoring system that incorporates both the nature of the clinical characteristic and the frequency of its occurrence. 
+These clinical characteristic severity metrics will enable efforts to systematically prioritise which human phenotypes are most detrimental to human health, and best targets for therapeutic intervention.

## Introduction
