This repository contains the official implementation of the paper titled "Information Extraction from Visually Rich Documents Using Directed Weighted Graph Neural Network", which was presented at the 18th International Conference on Document Analysis and Recognition (ICDAR 2024).

HamzaGbada/direct-neighbor-vrd

Direct-Neighbor-VRD

This repository contains the official implementation of the paper titled Information Extraction from Visually Rich Documents Using Directed Weighted Graph Neural Network.

Abstract

This paper introduces a novel approach for information extraction (IE) from visually rich documents (VRD) that employs a directed weighted graph representation. The approach improves performance by capturing relationships among VRD components with directed weighted graphs, as opposed to traditional methods based on Euclidean distance. The IE task is treated as a node classification problem, with graph convolutional networks (GCNs) processing the VRD graphs. Evaluations conducted on five real-world datasets demonstrate the efficacy of the approach and its alignment with established norms.
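To make the node-classification framing concrete, here is a minimal pure-Python sketch of one message-passing step over a directed weighted graph. It is not the repository's model code: the function names (aggregate, linear_relu), the weighted-mean aggregation rule, and the toy node names are all assumptions chosen for illustration, and the paper's actual edge weighting and GCN layers may differ.

```python
# Illustrative sketch only; names and aggregation rule are assumptions,
# not the paper's or the repository's implementation.
# A directed weighted graph is stored as {target: [(source, weight), ...]}.

def aggregate(features, in_edges):
    """Weighted mean of incoming neighbor features for every node."""
    out = {}
    for node, feat in features.items():
        total = [0.0] * len(feat)
        weight_sum = 0.0
        for src, w in in_edges.get(node, []):
            for i, value in enumerate(features[src]):
                total[i] += w * value
            weight_sum += w
        # Nodes without incoming edges keep their own features.
        out[node] = [t / weight_sum for t in total] if weight_sum else list(feat)
    return out


def linear_relu(features, weight):
    """Apply a shared linear map (in_dim x out_dim) followed by ReLU."""
    out = {}
    for node, feat in features.items():
        out[node] = [
            max(0.0, sum(f * w for f, w in zip(feat, column)))
            for column in zip(*weight)
        ]
    return out


# Two hypothetical text boxes: a key ("total_label") pointing at its value.
features = {"total_label": [1.0, 0.0], "total_value": [0.0, 1.0]}
in_edges = {"total_value": [("total_label", 2.0)]}  # edge label -> value
identity = [[1.0, 0.0], [0.0, 1.0]]

hidden = linear_relu(aggregate(features, in_edges), identity)
```

Because the edge is directed, information flows from total_label to total_value but not back. A classifier head applied to each node's hidden vector would then predict that node's field label.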

Dependencies

The libraries needed to run the code are listed in requirements.txt. You can install them using pip:

pip install -r requirements.txt

Usage

Building the Graph-based Dataset

To build a graph-based dataset, use the following command:

python builder.py build -d <dataset>

This command converts the chosen dataset into a graph-based dataset for node classification.

Optional Arguments:

  • -d DATASET, --dataset DATASET: Choose the dataset to use. Options are XFUND, FUNSD, SROIE, Wildreceipt, or CORD.
  • -n MAX_NODE, --max_node MAX_NODE: Maximum number of neighbors (edges) per node. Default is 6.

Example:

python builder.py build -d CORD
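As a rough illustration of what a per-node edge limit such as -n / --max_node implies, the sketch below caps the number of outgoing edges per node. The function name cap_neighbors, the edge tuple layout, and the rule "keep the smallest weights" are assumptions for illustration only; builder.py may select neighbors differently.

```python
# Illustrative sketch only; not taken from builder.py.

def cap_neighbors(edges, max_node=6):
    """Keep at most `max_node` outgoing edges per source node.

    `edges` is a list of (source, target, weight) tuples. Assumption: a
    smaller weight means a closer neighbor, so the smallest weights are kept.
    """
    by_source = {}
    for src, dst, w in edges:
        by_source.setdefault(src, []).append((w, dst))

    kept = []
    for src, candidates in by_source.items():
        # Sort by weight (ties broken by target name) and truncate.
        for w, dst in sorted(candidates)[:max_node]:
            kept.append((src, dst, w))
    return kept


edges = [("a", "b", 1.0), ("a", "c", 3.0), ("a", "d", 2.0), ("b", "a", 1.0)]
capped = cap_neighbors(edges, max_node=2)
```

With max_node=2, node "a" keeps only its two lowest-weight edges, while "b" is unaffected because it has a single outgoing edge.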

Training the Model

To train the model, run train.py with the arguments listed below. To print the available options, use the following command:

python train.py -h

Arguments:

  • -d DATANAME, --dataname DATANAME: Select the dataset for model training. Options are FUNSD, SROIE, Wildreceipt, or CORD.
  • -p PATH, --path PATH: Path to the dataset for model training.
  • -hs HIDDEN_SIZE, --hidden_size HIDDEN_SIZE: GCN hidden size. Default is 32.
  • -hl HIDDEN_LAYERS, --hidden_layers HIDDEN_LAYERS: Number of GCN hidden layers. Default is 20.
  • -lr LEARNING_RATE, --learning_rate LEARNING_RATE: Learning rate. Default is 0.01.
  • -e EPOCHS, --epochs EPOCHS: Number of epochs. Default is 200.

Example:

python train.py -d CORD -hs 64 -hl 128
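For readers who want to see how such a command line could be wired up, here is a minimal argparse sketch that mirrors the documented flags and defaults. The flag names and default values come from the argument list above; the argument types, the choices restriction, and the function name build_parser are assumptions, and the real train.py may be organized differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented train.py flags."""
    parser = argparse.ArgumentParser(description="Train a GCN node classifier (sketch).")
    parser.add_argument("-d", "--dataname",
                        choices=["FUNSD", "SROIE", "Wildreceipt", "CORD"],
                        help="Dataset used for model training.")
    parser.add_argument("-p", "--path", help="Path to the dataset.")
    # Types below are assumed: the README documents only the defaults.
    parser.add_argument("-hs", "--hidden_size", type=int, default=32,
                        help="GCN hidden size.")
    parser.add_argument("-hl", "--hidden_layers", type=int, default=20,
                        help="Number of GCN hidden layers.")
    parser.add_argument("-lr", "--learning_rate", type=float, default=0.01,
                        help="Learning rate.")
    parser.add_argument("-e", "--epochs", type=int, default=200,
                        help="Number of epochs.")
    return parser


# Same arguments as the example command above.
args = build_parser().parse_args(["-d", "CORD", "-hs", "64", "-hl", "128"])
```

Options that are not passed, such as the learning rate and the number of epochs here, fall back to the documented defaults.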

Acknowledgments

We acknowledge the contributions of the authors of the paper and the developers of the libraries used in this project.
