
Constructing a knowledge graph from chemistry research paper abstracts using NLP and graphs


AnasAito/um6p-chemistry-kg


Knowledge graph from UM6P's chemistry papers using NLP and graphs


Problem:

Reviewing and documenting the literature is hard: finding good, relevant scientific resources for your research can be challenging and time-consuming.

Solution:

Build a knowledge graph connecting the scientific entities that occur in chemistry papers. This lets users start from a simple query (a chemical substance name or a laboratory procedure) and find papers related to it.
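As a rough illustration of the query idea, here is a minimal sketch of a keyword-to-paper lookup. The function names (`build_index`, `related_papers`) and the sample paper IDs and keywords are hypothetical and are not taken from the notebooks.

```python
from collections import defaultdict

def build_index(paper_keywords):
    """Map each lowercased keyword to the set of paper ids that mention it."""
    index = defaultdict(set)
    for paper_id, keywords in paper_keywords.items():
        for keyword in keywords:
            index[keyword.lower()].add(paper_id)
    return index

def related_papers(query, index):
    """Return the sorted ids of papers that mention the queried keyword."""
    return sorted(index.get(query.lower(), set()))

# Hypothetical input: paper id -> keywords found in its abstract.
paper_keywords = {
    "paper_1": ["Phosphates", "Adsorption"],
    "paper_2": ["Adsorption", "Fertilizers"],
}
index = build_index(paper_keywords)
hits = related_papers("adsorption", index)
```

A lookup like this only finds exact keyword matches; the graph adds the ability to move from a keyword to its co-occurring neighbours.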

How to build it?

The pipeline starts by extracting scientific entities from the abstracts of chemistry papers, using an NLP model fine-tuned on a biomedical corpus and maintained by researchers at the Allen Institute for AI. Those entities are then cross-matched with the UMLS database in order to label them. Finally, we build the knowledge graph by connecting keywords that co-occur in the same paper abstract.

Try it yourself!

I uploaded commented notebooks detailing the pipeline:

Chimestry-extract

In this notebook, I use the scispaCy pipeline to extract and label keywords from paper abstracts, then save them to a JSON file along with the paper ID and other metadata.

  • Data format (you can find it in chimestry_papers.json):
{
  "paper_id": {
    "year": 2020,
    "title": "D",
    "paper_type": "Article",
    "keywords": [
      {
        "label": "",
        "canonical form": "",
        "type": ""
      },...
    ]
  },...
}
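To show how this format can be consumed, here is a minimal sketch that reads a record and summarizes its keywords. The sample values (title, labels, types) are made up for illustration, and the helper names `keywords_by_paper` and `count_types` are hypothetical.

```python
import json
from collections import Counter

# Hypothetical sample mirroring the format above; the values are invented.
sample = """
{
  "paper_1": {
    "year": 2020,
    "title": "Example title",
    "paper_type": "Article",
    "keywords": [
      {"label": "phosphate", "canonical form": "Phosphates", "type": "Inorganic Chemical"},
      {"label": "adsorption", "canonical form": "Adsorption", "type": "Natural Phenomenon or Process"}
    ]
  }
}
"""

def keywords_by_paper(papers):
    """Map each paper id to the canonical forms of its keywords."""
    return {
        paper_id: [kw["canonical form"] for kw in record["keywords"]]
        for paper_id, record in papers.items()
    }

def count_types(papers):
    """Count how often each keyword type occurs across all papers."""
    return Counter(
        kw["type"] for record in papers.values() for kw in record["keywords"]
    )

# For the real data, parse the contents of chimestry_papers.json instead.
papers = json.loads(sample)
by_paper = keywords_by_paper(papers)
type_counts = count_types(papers)
```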

Chimestry-process

The model is fine-tuned on biomedical data, so some keywords may be mislabeled. We use manual utilities to correct them.
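One way such manual corrections could look is an override table applied to the extracted keywords. This is only a sketch under that assumption: `apply_manual_fixes`, the override entries, and the drop list are hypothetical and may differ from the utilities in the notebook.

```python
def apply_manual_fixes(keywords, type_overrides, drop_forms=frozenset()):
    """Return corrected copies of the keywords.

    type_overrides: lowercased canonical form -> corrected type.
    drop_forms: lowercased canonical forms to discard entirely.
    """
    fixed = []
    for kw in keywords:
        key = kw["canonical form"].lower()
        if key in drop_forms:
            continue  # discard keywords judged to be noise
        corrected = dict(kw)  # copy so the input list is left untouched
        if key in type_overrides:
            corrected["type"] = type_overrides[key]
        fixed.append(corrected)
    return fixed

# Hypothetical keywords and corrections, for illustration only.
keywords = [
    {"label": "cell", "canonical form": "Cells", "type": "Cell"},
    {"label": "study", "canonical form": "Study", "type": "Research Activity"},
]
fixed = apply_manual_fixes(
    keywords,
    type_overrides={"cells": "Electrochemical Device"},
    drop_forms={"study"},
)
```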

Chimestry-graph

This notebook contains the code for creating the knowledge graph.
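The co-occurrence idea can be sketched with the standard library alone: every pair of keywords appearing in the same abstract becomes an edge, weighted by the number of papers that share it. The function name `cooccurrence_edges` and the sample records are hypothetical, and the weighting is an assumption rather than the notebook's exact method.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(papers):
    """Count, for each keyword pair, how many papers mention both keywords."""
    edges = Counter()
    for record in papers.values():
        # Deduplicate and sort so each pair has a single, stable ordering.
        names = sorted({kw["canonical form"] for kw in record["keywords"]})
        for a, b in combinations(names, 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical records following the JSON data format of the extraction step.
papers = {
    "paper_1": {"keywords": [
        {"canonical form": "Phosphates"},
        {"canonical form": "Adsorption"},
        {"canonical form": "Fertilizers"},
    ]},
    "paper_2": {"keywords": [
        {"canonical form": "Adsorption"},
        {"canonical form": "Phosphates"},
    ]},
}
edges = cooccurrence_edges(papers)
```

Sorting the names before pairing means `("Adsorption", "Phosphates")` and `("Phosphates", "Adsorption")` are counted as the same undirected edge.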

Data

The data is extracted from the Web of Science database. The papers used to create the graph are chemistry papers.

Results

  • We use Gephi for graph visualization.

  • The graph can eventually be used to feed a client app (to be done in the future).

  • You can find the edge list in the data folder.
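For reference, an edge list that Gephi's spreadsheet importer can read is a CSV with `Source`, `Target`, and optionally `Weight` and `Type` columns. The sketch below writes one from a dictionary of weighted edges; `write_edge_csv` and the sample edge are hypothetical, and the actual file in the data folder may use a different layout.

```python
import csv
import io

def write_edge_csv(edges, handle):
    """Write weighted undirected edges as a CSV edge list."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight", "Type"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight, "Undirected"])

# Hypothetical edge: two keywords that co-occur in two abstracts.
edges = {("Adsorption", "Phosphates"): 2}

# An in-memory buffer keeps the sketch self-contained; pass a handle from
# open(path, "w", newline="") to write a real file instead.
buffer = io.StringIO()
write_edge_csv(edges, buffer)
csv_text = buffer.getvalue()
```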
