- Generate customized TSV and JSON representations of HTML tables, and split the tables.
- Test three models (fine-tuning, few-shot, zero-shot) and run the follow-up question process.
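To make the idea of a table representation concrete, here is a minimal sketch that turns an HTML table into TSV and JSON using only the Python standard library. It is not the code in `table_representation`; the names `TableParser`, `html_to_rows`, `rows_to_tsv` and `rows_to_json` are hypothetical, the sample table is made up, and merged cells (`rowspan`/`colspan`) are ignored.

```python
import json
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row currently being read
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_to_rows(html):
    """Parse an HTML string and return its table cells as a list of rows."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_tsv(rows):
    """Join cells with tabs and rows with newlines."""
    return "\n".join("\t".join(row) for row in rows)


def rows_to_json(rows):
    """Treat the first row as the header and emit one JSON object per body row."""
    header, *body = rows
    return json.dumps([dict(zip(header, row)) for row in body])


# Made-up sample table, for illustration only.
html = (
    "<table>"
    "<tr><th>Catalyst</th><th>Overpotential (mV)</th></tr>"
    "<tr><td>Sample A</td><td>250</td></tr>"
    "<tr><td>Sample B</td><td>310</td></tr>"
    "</table>"
)
rows = html_to_rows(html)
print(rows_to_tsv(rows))
print(rows_to_json(rows))
```

TSV keeps the grid layout in a compact form, while JSON attaches each value to its column name; which one works better as model input is a choice the repository exposes rather than something this sketch decides.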
Using conda
conda env create -f requirements_conda.txt
Using pip
pip install -r requirements_paper.txt
git clone https://github.com/KIST-CSRC/MaTableGPT.git
cd MaTableGPT
git lfs pull
MaTableGPT
├── data
│   ├── non_split
│   ├── split
│   ├── pickle_folder
│   └── result
├── GPT_models
│   ├── models.py
│   └── follow_up_q.py.py
├── model_evaluation
│   ├── utils
│   └── evaluation.py
├── table_representation
│   ├── table_representer.py
│   └── table2json.py
├── table_splitting
│   └── split_table.py
│
└── run.py
Example: input generation (split, TSV)
input_generation("split", "TSV")
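As an illustration of what the "split" option in the call above could mean, the sketch below cuts a table into one sub-table per body row and repeats the header in each piece. This is an assumption about the splitting strategy, not the logic of `split_table.py`; `split_rows` and its `header_rows` parameter are hypothetical, and the sample rows are made up.

```python
def split_rows(rows, header_rows=1):
    """Return one sub-table per body row, each prefixed with the header rows.

    Hypothetical helper: assumes the first `header_rows` rows form the header.
    """
    header = rows[:header_rows]
    body = rows[header_rows:]
    return [header + [row] for row in body]


# Made-up sample table, for illustration only.
table = [
    ["Catalyst", "Overpotential (mV)"],
    ["Sample A", "250"],
    ["Sample B", "310"],
]

for piece in split_rows(table):
    print(piece)
```

Smaller pieces keep each model input short and focused on a single entry, at the price of more model calls per table.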
Example: data extraction (few-shot, follow-up questions)
model_test("few_shot", True)
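The follow-up question step can be pictured as a second pass that asks the model to confirm each extracted value and drops anything it does not confirm. The sketch below is one possible reading under that assumption, not the code in `GPT_models`; `follow_up_filter`, `make_stub_ask` and the prompt wording are hypothetical, and the model is replaced by a stub so the example runs offline.

```python
def follow_up_filter(extracted, table_text, ask):
    """Keep only the fields that are confirmed when the model is asked again.

    `ask` is a hypothetical callable that takes a prompt string and returns
    the model's answer as a string.
    """
    confirmed = {}
    for key, value in extracted.items():
        prompt = (
            f"Table:\n{table_text}\n\n"
            f"Is the value '{value}' for '{key}' stated in this table? "
            "Answer yes or no."
        )
        if ask(prompt).strip().lower().startswith("yes"):
            confirmed[key] = value
    return confirmed


def make_stub_ask(table_text):
    """Stand-in for a real model call: answers yes only if the value occurs in the table."""

    def ask(prompt):
        value = prompt.split("value '")[1].split("'")[0]
        return "yes" if value in table_text else "no"

    return ask


# Made-up table and extraction result; "tafel_slope" imitates a hallucinated field.
table_text = "Catalyst\tOverpotential (mV)\nSample A\t250"
extracted = {"catalyst": "Sample A", "overpotential": "250", "tafel_slope": "60"}

cleaned = follow_up_filter(extracted, table_text, make_stub_ask(table_text))
print(cleaned)
```

With a real model, `ask` would wrap the API call, and the second argument of `model_test` presumably switches a verification pass of this kind on or off.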
Using MaTableGPT, we achieved a table data extraction accuracy of 96.8% and proposed the optimal solution for each situation based on the Pareto front.
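For readers unfamiliar with the term, a Pareto front is the set of options that no other option beats on every criterion at once. The sketch below computes it for two criteria, assuming lower cost and higher accuracy are better; the choice of criteria, the function `pareto_front` and all numbers are hypothetical placeholders, not results from this project.

```python
def pareto_front(points):
    """Return the names of non-dominated options.

    `points` is a list of (name, cost, accuracy) tuples. An option is dominated
    if another one is at least as good on both criteria and strictly better on one.
    """
    front = []
    for name, cost, acc in points:
        dominated = any(
            (c <= cost and a >= acc) and (c < cost or a > acc)
            for _, c, a in points
        )
        if not dominated:
            front.append(name)
    return front


# Placeholder values for illustration only (not measured results).
candidates = [
    ("option_1", 1.0, 0.80),
    ("option_2", 2.0, 0.90),
    ("option_3", 3.0, 0.85),  # costs more than option_2 and is less accurate
    ("option_4", 5.0, 0.95),
]
print(pareto_front(candidates))  # ['option_1', 'option_2', 'option_4']
```

Every option on the front is a defensible choice for some budget, which is why a single "best" configuration is replaced by one recommendation per situation.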