
Denoising Dirty Documents

Team: Bits_Please (cnw282, mfl340, ks5173)

Problem

Numerous scientific papers, historical documents/artifacts, recipes, and books are stored on paper, whether handwritten or typewritten. Over time, these pages accumulate noise and dirt through fingerprints, weakening of paper fibers, coffee/tea stains, abrasions, wrinkling, etc. Several surface cleaning methods exist for both preservation and cleaning, but they have limits, the major one being that the original document may be altered during the process. The purpose of this project is to conduct a comparative study of traditional computer vision techniques versus deep learning networks for denoising dirty documents.
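As a rough illustration of what the traditional computer vision side of this comparison might look like, the sketch below applies a simple OpenCV baseline (median blur followed by adaptive thresholding) to a scanned page. The file names and parameter values are assumptions for illustration only, not the project's actual pipeline.

```python
# Minimal sketch of a traditional denoising baseline, assuming OpenCV is installed.
# File names and parameter values are illustrative, not the project's actual settings.
import cv2

# Load a scanned page in grayscale (hypothetical file name).
noisy = cv2.imread("noisy_page.png", cv2.IMREAD_GRAYSCALE)

# Suppress small speckle noise with a median filter (kernel size 3).
smoothed = cv2.medianBlur(noisy, 3)

# Adaptive thresholding separates dark text from an unevenly stained background:
# block size 25, constant offset 10 (both assumed values).
cleaned = cv2.adaptiveThreshold(
    smoothed, 255,
    cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
    cv2.THRESH_BINARY,
    25, 10,
)

cv2.imwrite("cleaned_page.png", cleaned)
```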


Data

The dataset consists of images containing text that has seen "better days" (i.e., coffee stains, faded sun spots, dog-eared pages, extreme wrinkles, etc.). It includes images with both synthetic and real noise, as well as a set of corresponding clean images for training our neural network.

Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science (https://archive.ics.uci.edu/ml/datasets/NoisyOffice)
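To show how the noisy/clean image pairs might be assembled for supervised training, here is a minimal sketch assuming the NoisyOffice images are organized into two parallel folders with matching file names. The folder names and target image size are assumptions for illustration, not the dataset's actual layout.

```python
# Minimal sketch of loading paired noisy/clean images for supervised training.
# Folder names and the target size are assumptions, not the dataset's actual layout.
import os
import cv2
import numpy as np

def load_pairs(noisy_dir="noisy", clean_dir="clean", size=(256, 256)):
    """Return (noisy, clean) arrays scaled to [0, 1] for matching file names."""
    noisy_imgs, clean_imgs = [], []
    for name in sorted(os.listdir(noisy_dir)):
        clean_path = os.path.join(clean_dir, name)
        if not os.path.exists(clean_path):
            continue  # skip files without a clean counterpart
        noisy = cv2.imread(os.path.join(noisy_dir, name), cv2.IMREAD_GRAYSCALE)
        clean = cv2.imread(clean_path, cv2.IMREAD_GRAYSCALE)
        noisy_imgs.append(cv2.resize(noisy, size) / 255.0)
        clean_imgs.append(cv2.resize(clean, size) / 255.0)
    return np.array(noisy_imgs), np.array(clean_imgs)

X, y = load_pairs()
```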
