GitHub - AleSteB/CatalysisIE_Knowledge_Graph_Generator

CatalysisIE based Knowledge Graph Generator

Repository for the publication "Generating knowledge graphs through text mining of catalysis research related literature". The two Excel files listing the output of the queries described in the publication are in the output folder.

The tool consists of the following modules: preprocess_onto.py, txt_extract.py, text_mining.py, and onto_extension.py. There is also a Jupyter notebook with example SPARQL queries and functions for querying the ontology, depending on the information of interest.

Preparations

Before running the code, complete the following preparations:

Folder structure must be the following:

main_folder
├── import
├── ontologies
├── ontology_snipet
├── CatalysisIE
├── PDFDataExtractor
├── robot
├── output
└── classlist
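
Before running anything, it can help to check that the layout above is in place. The following is a minimal sketch, not part of the repository: the function name `missing_folders` is hypothetical, and the folder names are copied verbatim from the tree above, so adjust them if your checkout spells them differently.

```python
from pathlib import Path

# Folder names copied verbatim from the tree above (assumption: your
# checkout uses the same spelling; edit this list if it does not).
REQUIRED_FOLDERS = [
    "import",
    "ontologies",
    "ontology_snipet",
    "CatalysisIE",
    "PDFDataExtractor",
    "robot",
    "output",
    "classlist",
]


def missing_folders(main_folder, required=REQUIRED_FOLDERS):
    """Return the required sub-folders that do not exist under main_folder."""
    root = Path(main_folder)
    # is_dir() is False for missing paths and for plain files alike.
    return [name for name in required if not (root / name).is_dir()]
```

Calling `missing_folders("main_folder")` returns an empty list once every folder is present. The sketch only reports gaps and does not create folders, because CatalysisIE, PDFDataExtractor, and robot must be populated by cloning or installing the respective tools.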

The ontology to be extended must be stored in the “ontologies” folder.
The following modules need to be installed or placed here:
- PyTorch version 1.8.0 and CUDA Toolkit version 11.1
- CatalysisIE: clone the repository (https://github.com/nsndimt/CatalysisIE) and download its checkpoints if needed
- ROBOT command line tool (http://robot.obolibrary.org/)
- PDFDataExtractor (https://pdfdataextractor.readthedocs.io/en/latest/getting_started/installation.html)
- Further details on the required modules are listed in cat_environment.yml and cat_environment.txt
The global variables listed in config.json must be adjusted before running the process.
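
A quick way to spot settings that were left empty in config.json is sketched below. This is only an illustration: the helper names `load_config` and `unset_settings` are hypothetical, and the sketch assumes config.json is a flat JSON object, which this README does not state. It makes no assumption about the actual key names.

```python
import json
from pathlib import Path


def load_config(path="config.json"):
    """Read the JSON configuration file and return it as a dict."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)


def unset_settings(config):
    """Return the top-level keys whose value is None or a blank string."""
    return [
        key
        for key, value in config.items()
        if value is None or (isinstance(value, str) and not value.strip())
    ]
```

For example, `unset_settings(load_config("config.json"))` lists the entries that still need a value before the pipeline is started.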

CatalysisIE Checkpoint

The checkpoint of the extended CatalysisIE model is found here:

Usage

Execute create_ChEBIdict.py to create a dictionary of all ChEBI classes for later entity recognition (might take some time)
Place PDFs in folder import
Make sure a model for
Insert your Scopus API key in config.json and adjust other settings where necessary
Execute run_pdfs.py (this uses the modules txt_extract.py, text_mining.py, preprocess_onto.py, and onto_extension.py and stores the resulting knowledge graph in ontologies)
Execute the Jupyter notebook user_queries.ipynb to run predefined queries on the resulting knowledge graph
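
The two script steps above can be chained in a small driver. The sketch below is not part of the repository: `pipeline_commands` and `run_pipeline` are hypothetical names, and it assumes that both scripts are started from the main folder without command-line arguments and that the PDFs use a lowercase `.pdf` extension.

```python
import subprocess
import sys
from pathlib import Path


def pipeline_commands(main_folder, python=sys.executable):
    """Build the commands for the script steps, in the order listed above.

    Raises FileNotFoundError if the import folder contains no PDFs.
    """
    root = Path(main_folder)
    pdfs = sorted((root / "import").glob("*.pdf"))
    if not pdfs:
        raise FileNotFoundError(f"no PDFs found in {root / 'import'}")
    return [
        [python, "create_ChEBIdict.py"],  # build the ChEBI dictionary first
        [python, "run_pdfs.py"],          # then extract, mine, and extend
    ]


def run_pipeline(main_folder):
    """Run each step from main_folder and stop at the first failure."""
    for command in pipeline_commands(main_folder):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, cwd=main_folder, check=True)
```

Calling `run_pipeline("main_folder")` then runs both scripts in sequence. The notebook user_queries.ipynb is left out on purpose, since it is meant to be explored interactively.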

Remarks

The labeling directory contains JSON files exported from Label Studio for the labeling of abstracts of both the methanation and hydroformylation datasets. It also contains the labels produced by the models and the models' performance results.

