This project aims to build a text scanner that converts images of paper documents into machine-readable formats (e.g., Markdown, JSON). It is a descendant of Nougat, and thus of Donut.
The key idea is to combine a bounding-box modality with text, achieving a pixel-scan behavior: the model predicts not only the next token but also its next position on the page.
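A minimal sketch of what such a dual-head decoding step could look like. Everything here (shapes, the two linear heads, the sigmoid normalization of box coordinates) is an illustrative assumption, not the repo's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, vocab_size = 16, 100

# One decoder hidden state feeds two heads: next token and next position.
hidden = rng.standard_normal(hidden_size)
token_head = rng.standard_normal((hidden_size, vocab_size))  # -> next-token logits
bbox_head = rng.standard_normal((hidden_size, 4))            # -> next box (x1, y1, x2, y2)

token_logits = hidden @ token_head
next_token = int(np.argmax(token_logits))
# Sigmoid squashes the box regression into normalized page coordinates [0, 1].
next_bbox = 1.0 / (1.0 + np.exp(-(hidden @ bbox_head)))

print(next_token, next_bbox.shape)
```

At inference time, the predicted box can be fed back as an input modality for the next step, which is what gives the "scan" behavior described above.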
The name "Lougat" combines LLama and Nougat. This repo also includes other combinations:
- Florence2 + LLama → Flougat
- Sam2 + LLama → Slougat
- Nougat + Relative Position Embedding LLama → Rlougat
The key idea is a natural continuation of the paper [LOCR: Location-Guided Transformer for Optical Character Recognition](https://arxiv.org/abs/2403.02127).
The dataset is the UltexB dataset from Uparxive.
Download weights from Hugging Face:
- Flougat

```shell
huggingface-cli download --resume-download --local-dir-use-symlinks False LLM4SCIENCE/flougat_iter1000 --local-dir ckpts/flougat_iter1000
```
- Locr

```shell
huggingface-cli download --resume-download --local-dir-use-symlinks False LLM4SCIENCE/locr_alpha --local-dir ckpts/locr_alpha
```
The model is Hugging Face compatible; see test_prediction.py for an example:
```shell
MODELWEIGHT="ckpts/locr_alpha"
python test_prediction.py $MODELWEIGHT
```
```shell
DATASETTYPE=lougat
MODELTYPE=slougat_small
trial_name=new_train
python train_via_accelerate.py --task train --model $MODELTYPE --Dataset $DATASETTYPE --root_name "" \
--preload_weight pretrain_weights/slougat.matched_start.pt \
--batch_size 10 --gradient_accumulation_steps 5 --lr 0.0001 --num_workers 8 \
--coordinate_retreive_method mapping_coordinate_hard --trial_name $trial_name \
--start_count 2 --start_weight 2 --token_weight 5 --bbox_weight 1 --find_unused_parameters \
--epochs 200 --clip_value 1 --use_wandb
```
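The `--token_weight 5 --bbox_weight 1` flags suggest the training objective is a weighted sum of a token (cross-entropy) term and a bounding-box regression term. A sketch of that combination, inferred from the flag names rather than taken from the repo's code:

```python
# Hypothetical combined loss mirroring --token_weight 5 --bbox_weight 1.
def combined_loss(token_loss: float, bbox_loss: float,
                  token_weight: float = 5.0, bbox_weight: float = 1.0) -> float:
    """Weighted sum of the two objectives; the weights trade off
    text accuracy against position accuracy."""
    return token_weight * token_loss + bbox_weight * bbox_loss

print(combined_loss(0.5, 1.0))  # 5*0.5 + 1*1.0 = 3.5
```

Raising `--bbox_weight` relative to `--token_weight` would push the model toward tighter position predictions at the possible expense of text accuracy.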