Identifying Semantic Relationships Between Research Topics Using LLMs in a Zero-Shot Learning Setting

In this GitHub repository, you can access the Gold Standard and the code for identifying relationships between research topic pairs in the Gold Standard.

Abstract.

Knowledge Organization Systems (KOS), such as ontologies, taxonomies, and thesauri, play a crucial role in organising scientific knowledge. They help scientists navigate the vast landscape of research literature and are essential for building intelligent systems such as smart search engines, recommendation systems, conversational agents, and advanced analytics tools. However, the manual creation of these KOSs is costly, time-consuming, and often leads to outdated and overly broad representations. As a result, researchers have been exploring automated or semi-automated methods for generating ontologies of research topics. This paper analyses the use of large language models (LLMs) to identify semantic relationships between research topics. We specifically focus on six open and lightweight LLMs (up to 10.7 billion parameters) and use two zero-shot reasoning strategies to identify four types of relationships: broader, narrower, same-as, and other. Our preliminary analysis indicates that Dolphin2.1-OpenOrca-7B performs strongly in this task, achieving a 0.853 F1-score against a gold standard of 1,000 relationships derived from the IEEE Thesaurus. These promising results bring us one step closer to the next generation of tools for automatically curating KOSs, ultimately making the scientific literature easier to explore.

Figure 1: Architecture of our two strategies. The first strategy (red dashed box) determines the relationship between 𝑡_𝑎 and 𝑡_b in one way, whereas the second strategy (green dashed box) determines the relationship between pairs of topics in both ways.

📂 Folders

Below is the distribution and organisation of the folders in this repository.

📄 dataset

This folder contains the Gold Standard. Accessible here

To create our gold standard dataset from the IEEE Thesaurus, we followed these steps:

Data Extraction and Transformation: We transformed the hierarchical structures and relationships from the original IEEE Thesaurus PDF into RDF format using a script that we developed, which is available here.
Sampling for Gold Standard: The relationships were represented using the SKOS-Notation. Here are the relationships utilised:

SKOS Notation Relationship Type

skos:broader broader

skos:narrower narrower

skos:altLabel same-as

skos:prefLabel same-as

We randomly selected 250 relationships for each category: broader, narrower, same-as, and other. For the other category, the relationships were established by pairing topics at random, ensuring that they did not overlap with existing semantic relationships within the thesaurus.

This method ensured that our gold standard dataset of 1K semantic relationships was diverse and representative of the various types of relationships defined in the IEEE Thesaurus.

🖥 code

This folder contains the script that we used to identify the semantic relationships between pairs of research topics.

The script (accessible here) functions as follows:

Task Definition and Experiments

The task involves classifying the semantic relationship between pairs of research topics ($t_A$, $t_B$) into four categories essential for ontology generation:

broader: $t_A$ is a parent topic of $t_B$. Example: ontological languages is broader than owl.
narrower: $t_A$ is a child topic of $t_B$. Example: nosql is a specific area within databases.
same-as: $t_A$ and $t_B$ can be used interchangeably to refer to the same concept. Example: haptic interface and haptic device.
other: $t_A$ and $t_B$ do not fit into the above categories. Example: blockchain and user interfaces.

Experiment Strategies

The experiments are conducted using two strategies:

One-way Strategy: Each pair of topics is processed once using a prompt template designed for classification by a language model.
Two-way Strategy: Each pair is processed twice:
1. First, the relationship between $t_A$ and $t_B$ is identified.
2. Then, the relationship between $t_B$ and $t_A$ is identified in a separate context.

Prompt Template

A standardised prompt template is used across both strategies and all models.

Empirical Rules for Two-way Strategy Agreement

Empirical rules (cyan box in Figure 1) are employed to reconcile agreements and disagreements between the two branches of the two-way strategy:

broader :- f(broader) $\land$ s(narrower)
narrower :- f(narrower) $\land$ s(broader)
broader :- ((f(narrower) $\land$ s(narrower)) $\lor$ (f(broader) $\land$ s(broader))) $\land$ len($t_A$) $\leq$ len($t_B$)
narrower :- ((f(narrower) $\land$ s(narrower)) $\lor$ (f(broader) $\land$ s(broader))) $\land$ len($t_A$) $>$ len($t_B$)
same-as :- f(same-as) $\land$ s(same-as)
broader :- (f(broader) $\land$ s(other)) $\lor$ (f(other) $\land$ s(narrower))
narrower :- (f(narrower) $\land$ s(other)) $\lor$ (f(other) $\land$ s(broader))
:- f(X)

Rule Number	Rule Description
1	Assign broader if first branch `(f)` returns broader and second branch `(s)` returns narrower.
2	Assign narrower if `(f)` returns narrower and `(s)` returns broader.
3	Assign broader if both branches return narrower or broader and $t_A$ length $\leq$ $t_B$ length.
4	Assign narrower if both branches return narrower or broader and $t_A$ length > $t_B$ length.
5	Assign same-as if both branches return same-as.
6	Assign broader if `(f)` returns broader and `(s)` returns other, or f returns other and s returns narrower.
7	Assign narrower if `(f)` returns narrower and `(s)` returns other, or f returns other and s returns broader.
8	Relationship returned by LLM.

These rules are applied sequentially to determine the final classification based on the outputs from the two branches of the two-way strategy (f and s), ensuring consistent and reasoned assignment of semantic relationships.

This approach allows for robust classification of semantic relationships between research topics, contributing to the development of ontologies in the field of interest.

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
code		code
dataset		dataset
figures		figures
results		results
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Identifying Semantic Relationships Between Research Topics Using LLMs in a Zero-Shot Learning Setting

Abstract.

📂 Folders

📄 dataset

🖥 code

Task Definition and Experiments

Experiment Strategies

Prompt Template

Empirical Rules for Two-way Strategy Agreement

About

Releases

Packages

Languages

SKOS Notation	Relationship Type
`skos:broader`	broader
`skos:narrower`	narrower
`skos:altLabel`	same-as
`skos:prefLabel`	same-as

License

ImTanay/LLM-Semantic-Relationship-Analysis

Folders and files

Latest commit

History

Repository files navigation

Identifying Semantic Relationships Between Research Topics Using LLMs in a Zero-Shot Learning Setting

Abstract.

📂 Folders

📄 dataset

🖥 code

Task Definition and Experiments

Experiment Strategies

Prompt Template

Empirical Rules for Two-way Strategy Agreement

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages