A library for embedding documents and clustering them by layout -- augmented with image features! This is a class project for Stanford CS 231n: Computer Vision with Deep Learning (Spring 2022).

poojasethi/visual-doc-clustering

Document Clustering

This is a library for clustering documents in an unsupervised fashion.

Check out the 4-minute video explanation here and paper here.

CS 231n Poster Session

This extends the work done in doc-clustering to use visual features of documents!

Getting started

Install dependencies in a new conda environment.

conda env create --name doc-clustering --file=doc-clustering.yml

Once you've created the environment, you can activate it using: conda activate doc-clustering

If you're using an M1 (Apple Silicon) Mac, you'll need to use Miniforge to run TensorFlow: https://developer.apple.com/metal/tensorflow-plugin/

Alternatively, you can create your own fresh environment: conda create --name doc-clustering python=3.8

And then manually find and install the missing dependencies by running: python clustering.py -h
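The install-and-retry loop can be shortened by checking a list of candidate modules in one pass. This is a minimal sketch, not part of the repository: the helper name `missing_modules` and the `CANDIDATES` list are assumptions for illustration, and `tensorflow` is the only dependency named in this README. Extend the list with whatever `python clustering.py -h` reports as missing.

```python
from importlib.util import find_spec

# Hypothetical candidate list: only TensorFlow is mentioned in this README.
# Add other top-level module names as import errors reveal them.
CANDIDATES = ["tensorflow"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_modules(CANDIDATES):
        print(f"not installed: {name}")
```

Run it inside the activated `doc-clustering` environment so that it inspects the right interpreter.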

Download datasets

Datasets are available for download here.

Store them with the following directory structure and names:

datasets/rvl-cdip/
datasets/sroie2019/

Download finetuned models

Models are available for download here.

Store them with the following directory structure and names:

finetuned_models/finetuned_related_lmv1/
finetuned_models/finetuned_unrelated_lmv2/

Download embeddings and results from paper

Prepared document embeddings and experiment results are available here.

Store them under the following directory name:

results/
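Before running any experiments, it can help to confirm that the downloads landed in the expected places. The following is a minimal sketch, not part of the repository: the helper `missing_dirs` is a hypothetical name, and the directory list is copied from the layout described above.

```python
from pathlib import Path

# Directory names copied from this README.
EXPECTED_DIRS = [
    "datasets/rvl-cdip",
    "datasets/sroie2019",
    "finetuned_models/finetuned_related_lmv1",
    "finetuned_models/finetuned_unrelated_lmv2",
    "results",
]


def missing_dirs(root=".", expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [d for d in expected if not (root / d).is_dir()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing when everything is in place.
    for d in missing_dirs():
        print(f"missing: {d}")
```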

Training and running models

Run one of the commands from EXPERIMENTS.md, or run python clustering.py --help for example usage.

Add the --debug flag to get interactive visualizations as well. Example commands:

ResNet

mkdir -p results/sroie2019/resnet/
python clustering.py -p datasets/sroie2019/ \
	-r resnet \
	-o results/sroie2019/resnet/ \
	--debug

AlexNet

mkdir -p results/sroie2019/alexnet/
python clustering.py -p datasets/sroie2019/ \
	-r alexnet \
	-o results/sroie2019/alexnet/ \
	--debug

LayoutLM Base ([CLS] Token)

mkdir -p results/sroie2019/layoutlm_base/cls_token/
python clustering.py -p datasets/sroie2019/ \
	-r layoutlm_base \
	-s cls_token \
	-o results/sroie2019/layoutlm_base/cls_token/ \
	--debug
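The three example commands above share one pattern, so they can be generated rather than typed out. The sketch below only builds and prints the commands and does not execute anything. The helpers `build_command` and `output_dir_for` are hypothetical names, and the script uses only the flags shown in the examples (`-p`, `-r`, `-s`, `-o`, `--debug`). See `python clustering.py --help` for the authoritative list of options.

```python
import shlex

# (representation, strategy) pairs taken from the example commands above.
RUNS = [("resnet", None), ("alexnet", None), ("layoutlm_base", "cls_token")]


def output_dir_for(dataset_name, representation, strategy=None):
    """Mirror the results/<dataset>/<representation>/[<strategy>/] layout used above."""
    parts = ["results", dataset_name, representation]
    if strategy is not None:
        parts.append(strategy)
    return "/".join(parts) + "/"


def build_command(dataset, representation, output_dir, strategy=None, debug=False):
    """Build the argument list for one clustering.py invocation."""
    cmd = ["python", "clustering.py", "-p", dataset, "-r", representation]
    if strategy is not None:
        cmd += ["-s", strategy]
    cmd += ["-o", output_dir]
    if debug:
        cmd.append("--debug")
    return cmd


if __name__ == "__main__":
    for representation, strategy in RUNS:
        out = output_dir_for("sroie2019", representation, strategy)
        print(shlex.join(["mkdir", "-p", out]))
        print(shlex.join(build_command("datasets/sroie2019/", representation, out,
                                       strategy=strategy, debug=True)))
```

To actually launch a run, the list returned by `build_command` can be passed to `subprocess.run(cmd, check=True)` from the repository root.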
