Multi-scale DSNs for degraded document image binarization
We propose a novel supervised binarization method based on learning multi-scale deep supervised networks (DSNs). Our model is trained directly on image regions, using pixel values as inputs and the binary ground truth as labels. By extracting high-level features, the networks can differentiate text pixels from background noise and can therefore handle the severe degradations that occur in document images. Compared with traditional algorithms, the binary images generated by our method have a cleaner background and better-preserved strokes. The proposed approach achieves state-of-the-art results on the widely used DIBCO datasets.
- Install the modified Caffe for pixel prediction with DSN: https://github.com/s9xie/hed
- Copy our created dataset to the root directory of the DSN library: https://drive.google.com/file/d/0B5q_fE0K48IVN0VWcy01U0pGUVU/view?usp=sharing&resourcekey=0-7xWcr8Vw873WmS9oljAXiA
- Copy the training and testing source code to the examples directory of the DSN library: https://drive.google.com/file/d/0B5q_fE0K48IVOWlDSG9kRWhrVEU/view?usp=sharing&resourcekey=0-PK3J2pWHMZKbyt8dwoEXQw
GPU: GTX 980Ti, GTX 1070.
To reproduce our results:
- Download our dataset, which was created from the DIBCO 2009 and 2013, H-DIBCO 2010, 2012, and 2014, Bickley diary, PHIDB, and S-MS datasets.
- The training parameters are defined in solver.prototxt files.
- The three DSN structures are defined in train_val_conv3.prototxt, train_val_conv4.prototxt, and train_val_conv5.prototxt.
- Run the Python scripts to train the three DSN structures: solve_conv3.py, solve_conv4.py, and solve_conv5.py.
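The training step above can be scripted. The following is a minimal sketch, assuming the three scripts take no arguments and are started from the directory that contains them. The helper names `training_commands` and `train_all`, the `python` interpreter name, and the `workdir` parameter are illustrative and not part of the released code.

```python
import subprocess

# Script names come from the list above. The interpreter name and the
# working directory are assumptions.
TRAINING_SCRIPTS = ["solve_conv3.py", "solve_conv4.py", "solve_conv5.py"]


def training_commands(python="python"):
    """Return one command line per DSN training script."""
    return [[python, script] for script in TRAINING_SCRIPTS]


def train_all(workdir=".", python="python"):
    """Run the scripts one after another, stopping at the first failure."""
    for cmd in training_commands(python):
        print("Running: " + " ".join(cmd))
        # check_call raises CalledProcessError on a non-zero exit code.
        subprocess.check_call(cmd, cwd=workdir)
```

Calling `train_all()` from the directory that holds the scripts would train the conv3, conv4, and conv5 networks in turn. Pass a different `python` value if Caffe is installed for another interpreter.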
The pre-trained DSN models are provided in the conv_3, conv_4, and conv_5 folders. Run the Python script multiscale_DSN.py to binarize document images. These pre-trained models were tested on the H-DIBCO 2016 and DIBCO 2011 datasets.
$ python multiscale_DSN.py arg1 arg2
arg1: Input grayscale or color image
arg2: Resulting binary image
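To binarize a whole folder, the command above can be generated once per image. The sketch below only builds the command lines and assumes that multiscale_DSN.py accepts exactly the two arguments described above. The helper name `batch_commands`, the list of image extensions, and the `_bin.png` output naming are illustrative choices, not part of the released code.

```python
import os

# The extensions treated as images are an assumption; the README does not
# list the supported formats.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff")


def batch_commands(filenames, input_dir, output_dir, python="python"):
    """Build one multiscale_DSN.py command per image file name."""
    commands = []
    for name in sorted(filenames):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in IMAGE_EXTENSIONS:
            continue  # skip files that do not look like images
        src = os.path.join(input_dir, name)            # arg1: input image
        dst = os.path.join(output_dir, stem + "_bin.png")  # arg2: binary result
        commands.append([python, "multiscale_DSN.py", src, dst])
    return commands
```

Each returned command can then be executed with `subprocess.check_call(cmd)`, for example over `batch_commands(os.listdir("input"), "input", "output")`, where the folder names are placeholders.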
This code is based on Caffe and the implementation of DSN.
If you are using the code/model/data provided here in a publication, please cite our paper:
@article{vqnhat,
Author = {Vo, Quang Nhat and Kim, Soo Hyung and Yang, Hyung Jeong and Lee, Gueesang},
Title = {Binarization of degraded document images based on hierarchical deep supervised network},
Year = {2017},
Journal = {Pattern Recognition}
}