This repository contains the code implementation of our paper

Marco Peer, Florian Kleber and Robert Sablatnig: SAGHOG: Self-Supervised Autoencoder for Generating HOG Features for Writer Retrieval,

a two-stage approach using a masked autoencoder that predicts the HOG features of the masked regions. We finetune our network for writer retrieval by appending NetRVLAD. SAGHOG works exceptionally well for large, complex datasets with small amounts of handwriting, such as HisFrag20, where we outperform SOTA by ~12% mAP. The paper is accepted for presentation at ICDAR 2024. Check it out on arXiv here.
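For context, HOG targets of the kind SAGHOG reconstructs can be computed, e.g., with scikit-image. This is a minimal sketch only; the patch size and HOG parameters below are assumptions, not the exact settings used in the paper:

```python
import numpy as np
from skimage.feature import hog

# Hypothetical 32x32 grayscale patch; this only illustrates
# the kind of target feature a masked autoencoder can predict.
patch = np.random.rand(32, 32)

features = hog(
    patch,
    orientations=9,          # number of gradient orientation bins
    pixels_per_cell=(8, 8),  # cell size over which histograms are pooled
    cells_per_block=(1, 1),  # block normalization neighborhood
    feature_vector=True,
)
print(features.shape)  # (144,): 4x4 cells x 9 orientations for these settings
```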
The code hasn't been fully cleaned up yet, but it should be sufficient to reuse parts of it (model, training code, etc.) or to take a closer look for your own work or for reviewing.
Install the packages via

```bash
pip install -r requirements.txt
```
The repository uses wandb for logging.
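If you have not used wandb before, authenticate once via the standard wandb CLI (assuming a default setup):

```bash
wandb login
```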
We provide four scripts (two scripts, each with an additional color version for RGB images) to extract the patches from the documents:

- `utils/extract_patches_only.py`: only extracts patches, without clustering (mainly used for test sets)
- `utils/extract_patches.py`: extracts patches and clusters their descriptors (mainly used for training sets)

The respective configs for the scripts to reproduce our results are located in the `config` directory.
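For example (the config filename here is hypothetical, and the exact CLI flags may differ - check the scripts themselves):

```bash
python utils/extract_patches.py --config config/extract_patches_hisfrag.yml
```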
In `datasets_example/create_dataset_csvs.py`, create a class, e.g.

```python
class HisFrag20_Train_Cluster(CSVWriter):
    name = 'hisfrag20_train_patches_clustered'
    labels = {
        'cluster':  r'(\d+)',
        'writer':   r'\d+_(\d+)',
        'page':     r'\d+_\d+_(\d+)',
        'fragment': r'\d+_\d+_\d+_(\d+)',
    }
    root = '/data/mpeer/resources/hisfrag20_train_clusters'
    mode = 'color'
    out = 'datasets/' + name + '.csv'
```
where `root` is the directory containing the images (e.g. the patches) and `labels` describes all relevant regular expressions to extract the writer/page/fragment labels etc. from the filenames. This will create a CSV for the `config.yml`; see the examples in `datasets_example`. (For testing - the aggregation to calculate the descriptors - `writer` and `page` are mandatory; the training label, e.g. Supervised or CL-S, can be set in the config.)
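As a quick illustration of how the regexes map filenames to labels (the filename below is a hypothetical example following the `cluster_writer_page_fragment` pattern implied by the regexes above):

```python
import re

labels = {
    'cluster':  r'(\d+)',
    'writer':   r'\d+_(\d+)',
    'page':     r'\d+_\d+_(\d+)',
    'fragment': r'\d+_\d+_\d+_(\d+)',
}

fname = '17_0042_3_5'  # hypothetical patch filename
for label, pattern in labels.items():
    print(label, '->', re.match(pattern, fname).group(1))
# cluster -> 17
# writer -> 0042
# page -> 3
# fragment -> 5
```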
Pretraining (stage 1) is started via

```bash
python main_maskfeat.py --config=CONFIG_FILE.yml --gpuid=GPUID
```

and finetuning with NetRVLAD (stage 2) via

```bash
python main_finetune.py --config=CONFIG_FILE.yml --gpuid=GPUID
```
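For example, with a config from the `config` directory on the first GPU (the config filename is hypothetical):

```bash
python main_finetune.py --config=config/finetune_hisfrag.yml --gpuid=0
```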
I currently do not intend to publish the final dataset (the 24k documents preprocessed with SAM) used for pretraining, since you can generate it with the scripts in `preprocess`; the original datasets are available on the webpages of the corresponding competitions.
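For regenerating it, the scripts in `preprocess` build on the Segment Anything Model. A minimal sketch of SAM-style mask extraction, assuming the `segment-anything` package and a hypothetical checkpoint path (see `preprocess` for the actual pipeline):

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load a SAM checkpoint (hypothetical path) and build the automatic mask generator.
sam = sam_model_registry['vit_h'](checkpoint='checkpoints/sam_vit_h.pth')
mask_generator = SamAutomaticMaskGenerator(sam)

# SAM expects RGB input.
image = cv2.cvtColor(cv2.imread('document.jpg'), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # list of dicts, each with a boolean 'segmentation' map
```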
I will upload pretrained model checkpoints soon. However, if you need them sooner or you want the dataset, just drop me a mail - [email protected] - and I will be happy to send them to you.
If you have questions, feel free to contact me.