End-to-End table OCR model using a hierarchical Transformer that outputs HTML tags and cell contents.
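The model emits two things: a sequence of HTML structure tags and the contents of each cell. The sketch below shows one way to merge them into a single HTML table. It does not describe the model's own vocabulary, which is not documented here. It assumes PubTabNet-style structure tokens, where "<td>", or a bare ">" that closes a spanning "<td", opens a slot, and the cells fill those slots in order. assemble_html is a hypothetical helper, not part of this repository.

```python
from html import escape


def assemble_html(structure: list[str], cells: list[str]) -> str:
    """Merge structure tokens and plain-text cell contents into an HTML table.

    Assumes PubTabNet-style tokens: a cell opens with "<td>", or with "<td",
    attribute tokens such as ' colspan="2"', and a bare ">".
    """
    slots = [i for i, tok in enumerate(structure) if tok in ("<td>", ">")]
    if len(slots) != len(cells):
        raise ValueError(f"{len(slots)} cell slots but {len(cells)} cell texts")
    tokens = list(structure)
    # Insert back to front so that earlier slot indices remain valid.
    for i, text in zip(reversed(slots), reversed(cells)):
        tokens.insert(i + 1, escape(text))
    return "<table>" + "".join(tokens) + "</table>"


structure = ["<tr>", "<td>", "</td>", "<td", ' colspan="2"', ">", "</td>", "</tr>"]
print(assemble_html(structure, ["A", "B & C"]))
# <table><tr><td>A</td><td colspan="2">B &amp; C</td></tr></table>
```

Inserting from the end is the simplest way to keep the precomputed slot positions correct while the token list grows.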
Install PyTorch 2.0 and run the following command:
pip install -e .
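After installation, you can confirm the PyTorch requirement using only the standard library. This sketch reads "PyTorch 2.0" as a minimum version, which is an assumption. The function names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

REQUIRED = (2, 0)  # assumption: "PyTorch 2.0" is treated as a minimum version


def major_minor(ver: str) -> tuple[int, int]:
    """Extract (major, minor) from strings like '2.1.0+cu118' or '2.0.0a0'."""
    m = re.match(r"(\d+)\.(\d+)", ver)
    if m is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    return int(m.group(1)), int(m.group(2))


def torch_ok(required: tuple[int, int] = REQUIRED) -> bool:
    """Return True if an installed torch distribution meets the requirement."""
    try:
        installed = version("torch")
    except PackageNotFoundError:
        return False
    return major_minor(installed) >= required


print("PyTorch requirement met:", torch_ok())
```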
See releases.
Download the following datasets:
Run preprocess.py as follows:
$ python preprocess.py datasets/FinTabNet.yaml
$ python preprocess.py datasets/FinTabSub.yaml
$ python preprocess.py datasets/PubTab250.yaml
$ python preprocess.py datasets/PubTabNet.yaml
$ python preprocess.py datasets/PubTabSub.yaml
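To preprocess every dataset in one pass, a small driver can issue the same five commands in sequence. This is a convenience sketch, not part of the repository. It assumes it runs from the repository root and uses the current Python interpreter. The helper names are hypothetical.

```python
import subprocess
import sys

# Dataset configs from the preprocessing commands above.
CONFIGS = ["FinTabNet", "FinTabSub", "PubTab250", "PubTabNet", "PubTabSub"]


def preprocess_commands(configs=CONFIGS, config_dir="datasets"):
    """One 'python preprocess.py <config>.yaml' command per dataset."""
    return [[sys.executable, "preprocess.py", f"{config_dir}/{name}.yaml"]
            for name in configs]


def run_all(dry_run=True):
    """Print each command; with dry_run=False, also execute it in order."""
    for cmd in preprocess_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing dataset.
            subprocess.run(cmd, check=True)
```

Calling run_all(dry_run=False) from the repository root would execute all five commands and stop at the first failure.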
The datasets must be placed in the ~/data directory as follows:
$ ls ~/data
fintabnet/
    img_tables/
        train/
            100000_61623.png
            100001_61624.png
            100002_61625.png
            100003_61626.png
            100004_61627.png
        val/
ground_truth_fintabnet.json
ground_truth_pubtabnet.json
ground_truth_syntabnet.json
icdar-task-b/
    final_eval/
        000221630ba33f9118f2671a715d6962e08d6b76a5a0c77a9fe26c291df763b0.png
        0005e8fe1b3ba14982336837219f285921af7c152cfc81ac88bcf52809299279.png
        002b1bf2bbb7dd7ec6201174e68df6346f448cd3951e861c3f940711c769f25f.png
        002bfeebe20be2e97fab46b99ce68321afb8972f6d8f131f0c1f5392819d3a23.png
        002c7215e95cd4bfebffb13dc0db32ab229a6674f4f1add84518ae52b75ac0da.png
    final_eval.json
mmocr_fintabnet/
    train/
        100000_61623.txt
        100001_61624.txt
        100002_61625.txt
        100003_61626.txt
        100004_61627.txt
    val/
mmocr_pubtabnet/
    train/
        PMC1064074_007_00.txt
        PMC1064076_003_00.txt
        PMC1064076_004_00.txt
        PMC1064080_002_00.txt
        PMC1064094_007_00.txt
    val/
mmocr_syntabnet/
    test/
    train/
        image_000000_1634629328.513163.txt
        image_000000_1634629370.543605.txt
        image_000000_1634629424.098128.txt
        image_000001_1634629104.265115.txt
        image_000001_1634629370.544624.txt
    val/
pubtabnet/
    PubTabNet_2.0.0.jsonl
    train/
        PMC1064074_007_00.png
        PMC1064076_003_00.png
        PMC1064076_004_00.png
        PMC1064080_002_00.png
        PMC1064094_007_00.png
    val/
synthtabnet/
    fintabnet/
    marketing/
    pubtabnet/
    sparse/
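To check that a data directory matches the listing above, a short pathlib script can report any entries that are missing. This is a sketch, not a tool shipped with the repository. It assumes the three ground_truth_*.json files sit directly under the data root, which is how the listing reads. missing_paths is a hypothetical name.

```python
from pathlib import Path

# Entries from the listing above, relative to the data root.
EXPECTED = [
    "fintabnet/img_tables/train",
    "fintabnet/img_tables/val",
    "ground_truth_fintabnet.json",
    "ground_truth_pubtabnet.json",
    "ground_truth_syntabnet.json",
    "icdar-task-b/final_eval",
    "icdar-task-b/final_eval.json",
    "mmocr_fintabnet/train",
    "mmocr_fintabnet/val",
    "mmocr_pubtabnet/train",
    "mmocr_pubtabnet/val",
    "mmocr_syntabnet/test",
    "mmocr_syntabnet/train",
    "mmocr_syntabnet/val",
    "pubtabnet/PubTabNet_2.0.0.jsonl",
    "pubtabnet/train",
    "pubtabnet/val",
    "synthtabnet/fintabnet",
    "synthtabnet/marketing",
    "synthtabnet/pubtabnet",
    "synthtabnet/sparse",
]


def missing_paths(root="~/data"):
    """Return the expected entries that do not exist under root."""
    base = Path(root).expanduser()
    return [rel for rel in EXPECTED if not (base / rel).exists()]


print(missing_paths() or "data layout looks complete")
```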
The annotation format is fully compatible with that of MTL-TabNet.
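The raw pubtabnet/PubTabNet_2.0.0.jsonl file in the listing stores one table per line as a tag sequence plus per-cell contents. You can inspect it with the standard library. The field names below follow the public PubTabNet 2.0 schema. This sketch makes no claim about the mmocr_* text format produced by preprocess.py, and parse_record is a hypothetical helper.

```python
import json


def parse_record(line: str):
    """Return (filename, split, structure tokens, cell texts) for one JSONL line.

    PubTabNet 2.0 schema: html.structure.tokens holds the tag sequence, and
    html.cells[i].tokens holds the tokens (characters or inline tags) of cell i.
    """
    rec = json.loads(line)
    structure = rec["html"]["structure"]["tokens"]
    cells = ["".join(cell["tokens"]) for cell in rec["html"]["cells"]]
    return rec["filename"], rec["split"], structure, cells
```

For example, iterating over open(Path("~/data/pubtabnet/PubTabNet_2.0.0.jsonl").expanduser()) and calling parse_record on each line yields every table in turn.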
Launch training on four GPUs with train.sh:
name=pubtab250
save=~/work/$name
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./train.sh ./configs/$name.py $save 4
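The last argument of train.sh appears to be the number of GPUs. This is an assumption based on the four devices listed in CUDA_VISIBLE_DEVICES. The hypothetical wrapper below builds the same invocation from Python. It derives the GPU count from the device list so that the two values cannot disagree.

```python
import os
from pathlib import Path


def build_launch(name, gpus="0,1,2,3", port=29500, work_root="~/work"):
    """Return (env, argv) mirroring the train.sh invocation above.

    Assumes the last train.sh argument is the GPU count, so it is computed
    from the CUDA_VISIBLE_DEVICES list to keep the two consistent.
    """
    ngpus = len([g for g in gpus.split(",") if g.strip()])
    save = str(Path(work_root).expanduser() / name)
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpus, PORT=str(port))
    argv = ["./train.sh", f"./configs/{name}.py", save, str(ngpus)]
    return env, argv
```

Running subprocess.run(argv, env=env, check=True) from the repository root would then start the same four-GPU job.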
Run test.py to evaluate the model and calculate the TEDS score:
path=~/data/icdar-task-b/final_eval
json=~/data/icdar-task-b/final_eval.json
python test.py --conf ./configs/$name.py --ckpt $save/latest.pth --path $path --json $json
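TEDS (Tree-Edit-Distance-based Similarity, introduced with PubTabNet) scores a prediction as 1 - EditDist(T_pred, T_gt) / max(|T_pred|, |T_gt|), where |T| is the number of nodes in the HTML tree.

The sketch below computes a simplified, structure-only variant with unit costs, using the Zhang-Shasha ordered tree edit distance. It differs from the official evaluation in two ways. First, the official code uses APTED. Second, it also compares cell text. Scores from this sketch are therefore not comparable to those reported by test.py. Node, tree_edit_distance and teds are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    label: str  # e.g. "table", "tr", "td"
    children: list[Node] = field(default_factory=list)


def _postorder(root: Node) -> tuple[list[str], list[int]]:
    """Labels in postorder plus the leftmost-leaf index of each node."""
    labels: list[str] = []
    leftmost: list[int] = []

    def visit(node: Node) -> int:
        first_leaf = None
        for child in node.children:
            idx = visit(child)
            if first_leaf is None:
                first_leaf = leftmost[idx]
        labels.append(node.label)
        me = len(labels) - 1
        leftmost.append(me if first_leaf is None else first_leaf)
        return me

    visit(root)
    return labels, leftmost


def _keyroots(leftmost: list[int]) -> list[int]:
    """Highest-index node for each distinct leftmost leaf (Zhang-Shasha keyroots)."""
    last = {}
    for i, leaf in enumerate(leftmost):
        last[leaf] = i
    return sorted(last.values())


def tree_edit_distance(a: Node, b: Node) -> int:
    """Unit-cost ordered tree edit distance (Zhang-Shasha)."""
    la, lma = _postorder(a)
    lb, lmb = _postorder(b)
    td = [[0] * len(lb) for _ in la]  # td[i][j]: distance between subtrees i and j
    for i in _keyroots(lma):
        for j in _keyroots(lmb):
            li, lj = lma[i], lmb[j]
            # fd[x][y]: distance between forests a[li..li+x-1] and b[lj..lj+y-1]
            fd = [[0] * (j - lj + 2) for _ in range(i - li + 2)]
            for x in range(1, i - li + 2):
                fd[x][0] = x  # delete every node
            for y in range(1, j - lj + 2):
                fd[0][y] = y  # insert every node
            for x in range(1, i - li + 2):
                for y in range(1, j - lj + 2):
                    ia, jb = li + x - 1, lj + y - 1
                    if lma[ia] == li and lmb[jb] == lj:
                        rename = 0 if la[ia] == lb[jb] else 1
                        fd[x][y] = min(fd[x - 1][y] + 1, fd[x][y - 1] + 1,
                                       fd[x - 1][y - 1] + rename)
                        td[ia][jb] = fd[x][y]
                    else:
                        p, q = lma[ia] - li, lmb[jb] - lj
                        fd[x][y] = min(fd[x - 1][y] + 1, fd[x][y - 1] + 1,
                                       fd[p][q] + td[ia][jb])
    return td[-1][-1]


def teds(pred: Node, gt: Node) -> float:
    """1 - EditDist / max(|pred|, |gt|), with |T| the number of nodes."""
    n = max(len(_postorder(pred)[0]), len(_postorder(gt)[0]))
    return 1.0 - tree_edit_distance(pred, gt) / n
```

For example, a one-row table with two cells compared against the same row with one cell differs by a single deletion over four nodes, so teds returns 0.75.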
For FinTabNet, we use the validation set (10,656 tables) as the test set, following previous work.
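You can sanity-check the size of the FinTabNet test set by counting annotation files. This sketch assumes that preprocessing writes one .txt file per table into mmocr_fintabnet/val, as the listing suggests. That assumption is not confirmed by the repository. count_annotations is a hypothetical helper.

```python
from pathlib import Path


def count_annotations(split_dir: str = "~/data/mmocr_fintabnet/val") -> int:
    """Count per-table annotation files (*.txt) in one split directory."""
    path = Path(split_dir).expanduser()
    if not path.is_dir():
        # Fail loudly instead of silently reporting zero tables.
        raise FileNotFoundError(f"no such directory: {path}")
    return sum(1 for _ in path.glob("*.txt"))
```

If the assumption holds, count_annotations() on a complete setup should report 10,656.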
We recommend that you use at least four NVIDIA V100 32GB GPUs.
This project is licensed under the MIT License. See LICENSE for more details.
@inproceedings{ICDAR24KAT,
author={Takaya Kawakatsu},
title={Multi-Cell Decoder and Mutual Learning for Table Structure and Character Recognition},
booktitle={Document Analysis and Recognition - ICDAR 2024},
publisher={Springer Nature Switzerland},
year={2024},
pages={389--405},
}