Knowledge graph from UM6P chemistry papers using NLP and graphs

Problem:

Reviewing and documenting the literature is hard: finding good, relevant scientific resources for your research can be challenging and time-consuming.

Solution:

Build a knowledge graph connecting the scientific entities that occur in chemistry papers. Users can then start from a simple query (a chemical substance name or a laboratory procedure) and find papers relevant to their research.

How to build it?

The pipeline starts by extracting scientific entities from the abstracts of chemistry papers, using an NLP model fine-tuned on a biomedical corpus and maintained by Alan Turing Institute researchers. Those entities are then cross-matched against the UMLS database in order to label them. Finally, the knowledge graph is built by connecting keywords that co-occur in the same paper abstract.

Try it yourself!

I uploaded commented notebooks detailing the pipeline:

Chimestry-extract

In this notebook, I use the scispaCy pipeline to extract and label keywords from paper abstracts, then save them to a JSON file along with the paper ID and other metadata.

Data format (you can find it in chimestry_papers.json):

{
  "paper_id": {
    "year": 2020,
    "title": "D",
    "paper_type": "Article",
    "keywords": [
      {
        "label": "",
        "canonical form": "",
        "type": ""
      },...
    ]
  },...
}
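
As a minimal sketch of how this file might be consumed, the snippet below builds an inverted index from each keyword's canonical form to the papers that mention it, which is the kind of lookup a simple query needs. The function name `index_by_keyword`, the sample records, and the `type` values are hypothetical; only the field names come from the format above.

```python
import json
from collections import defaultdict


def index_by_keyword(papers):
    """Map each lower-cased canonical form to the set of paper ids using it."""
    index = defaultdict(set)
    for paper_id, paper in papers.items():
        for keyword in paper.get("keywords", []):
            canonical = keyword.get("canonical form", "").strip().lower()
            if canonical:  # skip keywords with no canonical form
                index[canonical].add(paper_id)
    return index


# Hypothetical sample in the documented format; the real data would be
# loaded from chimestry_papers.json with json.load instead.
sample = json.loads("""
{
  "p1": {"year": 2020, "title": "A", "paper_type": "Article",
         "keywords": [{"label": "phosphate", "canonical form": "Phosphates",
                       "type": "CHEMICAL"}]},
  "p2": {"year": 2019, "title": "B", "paper_type": "Article",
         "keywords": [{"label": "phosphates", "canonical form": "Phosphates",
                       "type": "CHEMICAL"},
                      {"label": "XRD", "canonical form": "", "type": ""}]}
}
""")

index = index_by_keyword(sample)
print(sorted(index["phosphates"]))
```

Lower-casing the canonical form is a design choice made here so that queries are case-insensitive; it is not something the notebooks are known to do.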

Chimestry-process

The model is fine-tuned on biomedical data, so some keywords may be mislabeled. This notebook uses manual utilities to fix those labels.
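
One way such a manual fix could look is a small override table applied to the extracted keywords. This is only a sketch: the `OVERRIDES` mapping, the `relabel` helper, and the example type names are hypothetical and are not taken from the notebook.

```python
# Hypothetical corrections: lower-cased canonical form -> corrected type.
OVERRIDES = {
    "lead": "Chemical",  # e.g. the metal, mislabeled by a biomedical model
}


def relabel(keywords, overrides=OVERRIDES):
    """Return a copy of the keywords with known-wrong types replaced."""
    fixed = []
    for keyword in keywords:
        keyword = dict(keyword)  # copy so the input is left untouched
        key = keyword.get("canonical form", "").lower()
        if key in overrides:
            keyword["type"] = overrides[key]
        fixed.append(keyword)
    return fixed


keywords = [
    {"label": "Pb", "canonical form": "Lead", "type": "Medical Device"},
    {"label": "ethanol", "canonical form": "Ethanol", "type": "Chemical"},
]
print(relabel(keywords))
```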

Chimestry-graph

This notebook gives the code that creates the knowledge graph.
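
The co-occurrence rule from the pipeline description (two keywords are connected when they appear in the same abstract) can be sketched with the standard library alone. The helper name `cooccurrence_edges` and the choice to weight an edge by the number of shared papers are assumptions made for illustration; the notebook may build the graph differently.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(papers):
    """Count, for each unordered keyword pair, the papers containing both."""
    edges = Counter()
    for paper in papers.values():
        # Deduplicate within a paper so a repeated keyword counts once,
        # and sort so each pair always appears in the same order.
        names = sorted({
            kw["canonical form"]
            for kw in paper.get("keywords", [])
            if kw.get("canonical form")
        })
        for source, target in combinations(names, 2):
            edges[(source, target)] += 1
    return edges


# Hypothetical papers, reduced to the one field this step needs.
papers = {
    "p1": {"keywords": [{"canonical form": "Ethanol"},
                        {"canonical form": "Catalysis"}]},
    "p2": {"keywords": [{"canonical form": "Catalysis"},
                        {"canonical form": "Ethanol"},
                        {"canonical form": "Zeolite"}]},
}
edges = cooccurrence_edges(papers)
print(edges[("Catalysis", "Ethanol")])
```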

Data

The data is extracted from the Web of Science database. The papers used to create the graph are chemistry papers.

Results

We use Gephi for graph visualization.
The graph could eventually feed a client app (to be done in the future).
You can find the edge list in the data folder.
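
For reference, an edge list that Gephi can import is typically a CSV with `Source`, `Target`, and optionally `Weight` columns. The sketch below writes one from a dictionary of weighted edges; the `write_edge_list` helper and the sample edges are hypothetical, and the file in the data folder may use a different layout.

```python
import csv
import io


def write_edge_list(edges, handle):
    """Write (source, target) -> weight pairs as a Gephi-style CSV."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])


# Hypothetical edges; an in-memory buffer stands in for a real file
# opened with open("edges.csv", "w", newline="").
edges = {("Catalysis", "Ethanol"): 2, ("Ethanol", "Zeolite"): 1}
buffer = io.StringIO()
write_edge_list(edges, buffer)
print(buffer.getvalue())
```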

AnasAito/um6p-chemistry-kg

Folders and files

Latest commit

History

Repository files navigation

Knowledge graph from Um6p' chemistry papers using NLP and Graphs

Problem :

Solution :

How to build it?

Try it yourself!

Data

Results

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages