MaTableGPT: GPT-based Table Data Extractor from Materials Science Literature

Introduction

1) Overall workflow of MaTableGPT

Generate customized TSV and JSON representations of each HTML table, then split the table.
Test three models (fine-tuning, few-shot, zero-shot) and run the follow-up question process.
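To make the representation step more concrete, here is a minimal sketch of turning an HTML table into TSV and JSON text using only the Python standard library. It is not the code in table_representer.py or table2json.py: the names TableRows, html_to_rows, rows_to_tsv, and rows_to_json are hypothetical, the sample table is made up, and the sketch assumes a simple table with one header row and no merged cells.

```python
from html.parser import HTMLParser
import json


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> in a simple HTML table."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being filled, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a <td> or <th>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_to_rows(html):
    """Parse HTML and return the table as a list of rows."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_tsv(rows):
    """Join cells with tabs and rows with newlines."""
    return "\n".join("\t".join(row) for row in rows)


def rows_to_json(rows):
    """Treat the first row as the header and emit one object per body row."""
    header, *body = rows
    return json.dumps([dict(zip(header, row)) for row in body], indent=2)


# Made-up example table, not taken from the MaTableGPT dataset.
html = """
<table>
  <tr><th>Catalyst</th><th>Overpotential (mV)</th></tr>
  <tr><td>NiFe-LDH</td><td>250</td></tr>
  <tr><td>IrO2</td><td>320</td></tr>
</table>
"""
rows = html_to_rows(html)
print(rows_to_tsv(rows))
print(rows_to_json(rows))
```

Real tables from the literature often contain merged cells, multi-row headers, and footnote markers, which this sketch does not handle.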

2) GPT modeling process

User Manual

1) Installation

Using conda

conda env create -f requirements_conda.txt

Using pip

pip install -r requirements_pip.txt

2) Download data files

git clone https://github.com/KIST-CSRC/MaTableGPT.git
cd MaTableGPT
git lfs pull

3) Script architecture

MaTableGPT
├── data
│   ├── non_split
│   ├── split
│   ├── pickle_folder
│   └── result
├── GPT_models
│   ├── models.py
│   └── follow_up_q.py
├── model_evaluation
│   ├── utils
│   └── evaluation.py
├── table_representation
│   ├── table_representer.py
│   └── table2json.py
├── table_splitting
│   └── split_table.py
└── run.py

4) Code usage (run.py)

Example: Input generation (split, TSV)

input_generation("split", "TSV")
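As a rough illustration of what a "split" TSV input could look like, the sketch below cuts a table into one small table per data row and repeats the header in each piece. This is an assumption about the splitting scheme, not the logic of split_table.py; the helper names split_rows and to_tsv and the sample values are hypothetical.

```python
def split_rows(rows, header_rows=1):
    """Return one small table per body row, each repeating the header rows."""
    header = rows[:header_rows]
    return [header + [row] for row in rows[header_rows:]]


def to_tsv(rows):
    """Join cells with tabs and rows with newlines."""
    return "\n".join("\t".join(cells) for cells in rows)


# Made-up example table, not taken from the MaTableGPT dataset.
table = [
    ["Catalyst", "Overpotential (mV)"],
    ["NiFe-LDH", "250"],
    ["IrO2", "320"],
]

pieces = split_rows(table)
for piece in pieces:
    print(to_tsv(piece))
    print("---")
```

Keeping the header in every piece means each piece can be read on its own, at the price of repeating the header text in every input.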

Example: Data extraction (few-shot, with follow-up questions)

model_test("few_shot", True)

Benefit

Using MaTableGPT, we achieved a table data extraction accuracy of 96.8% and identified the optimal solution for each situation through Pareto-front analysis.
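For readers unfamiliar with the term, a Pareto front is the set of options that no other option beats on every objective at once. The sketch below assumes two objectives, cost (lower is better) and accuracy (higher is better), chosen here only for illustration. The function name pareto_front, the option names, and all numbers are hypothetical and are not results from MaTableGPT.

```python
def pareto_front(points):
    """Return the points that no other point dominates.

    Each point is (name, cost, accuracy). A point is dominated when another
    point is at least as good on both objectives and strictly better on one.
    """
    front = []
    for name, cost, acc in points:
        dominated = any(
            (c <= cost and a >= acc) and (c < cost or a > acc)
            for _, c, a in points
        )
        if not dominated:
            front.append((name, cost, acc))
    return sorted(front, key=lambda p: p[1])  # order by increasing cost


# Hypothetical values for illustration only.
candidates = [
    ("option_a", 1.0, 0.80),
    ("option_b", 3.0, 0.90),
    ("option_c", 4.0, 0.85),   # costs more than option_b and is less accurate
    ("option_d", 10.0, 0.95),
]

front = pareto_front(candidates)
for name, cost, acc in front:
    print(name, cost, acc)
```

Every option on the front is a reasonable choice for some budget, which is why a front supports a different recommendation for each situation rather than a single winner.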
