v1.0.0

@jlerouge released this 25 Jan 10:50

Initial release of DocXPand tool and DocXPand-25k dataset.

DocXPand tool

This includes the vector templates of 9 fictitious identity document designs, and Python code to generate synthetic documents and paste them onto real document images.

DocXPand-25k dataset

The synthetic ID document images dataset ("DocXPand-25k"), released alongside this tool, is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

You can download the dataset from this release. It is split into 12 parts (DocXPand-25k.tar.gz.xx, with xx from 00 to 11). Once you have downloaded all 12 binary files, you can extract the content using the following command: cat DocXPand-25k.tar.gz.* | tar xzvf -
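For environments without cat and tar, the same extraction can be sketched in Python. This is a minimal sketch, not part of the DocXPand tool: the function name extract_split_archive, the ChainedReader helper, and the directory arguments are hypothetical, and it assumes all parts sit in one directory and sort correctly by their two-digit suffix.

```python
import tarfile
from pathlib import Path


class ChainedReader:
    """Hypothetical helper: exposes several files as one sequential byte stream."""

    def __init__(self, paths):
        self._paths = iter(paths)
        self._current = None

    def read(self, size=-1):
        # Return data from the current part, moving to the next part at EOF.
        while True:
            if self._current is None:
                path = next(self._paths, None)
                if path is None:
                    return b""  # all parts consumed
                self._current = open(path, "rb")
            chunk = self._current.read(size)
            if chunk:
                return chunk
            self.close()

    def close(self):
        if self._current is not None:
            self._current.close()
            self._current = None


def extract_split_archive(parts_dir, out_dir, prefix="DocXPand-25k.tar.gz."):
    """Stream the split parts in order into tarfile, like `cat parts | tar xzf -`."""
    parts = sorted(Path(parts_dir).glob(prefix + "*"))
    if not parts:
        raise FileNotFoundError(f"no files matching {prefix}* in {parts_dir}")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    reader = ChainedReader(parts)
    try:
        # "r|gz" reads the gzip-compressed tar as a forward-only stream,
        # so the parts never need to be joined on disk.
        with tarfile.open(fileobj=reader, mode="r|gz") as archive:
            archive.extractall(out_dir)
    finally:
        reader.close()
    return [p.name for p in parts]
```

Streaming avoids writing a second, joined copy of the archive before extraction. A call such as extract_split_archive("downloads", "DocXPand-25k") returns the part names in the order they were read, which makes it easy to check that none of the 12 parts is missing.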
The labels are stored in JSON format and can be read using the DocFakerDataset class. The document images are stored in the images/ folder, which contains one sub-folder per class. The original image fields (identity photos, ghost images, barcodes, datamatrices) integrated in the documents are stored in the fields/ sub-folder.
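As a quick sanity check after extraction, the per-class layout of the images/ folder can be summarized with a few lines of Python. This is a minimal sketch that relies only on the folder layout described above: the function name count_images_per_class and the dataset_root argument are hypothetical, and since the image file extension is not stated here, every regular file is counted.

```python
from pathlib import Path


def count_images_per_class(dataset_root):
    """Count files in each class sub-folder of <dataset_root>/images."""
    images_dir = Path(dataset_root) / "images"
    counts = {}
    for class_dir in sorted(images_dir.iterdir()):
        if class_dir.is_dir():
            # Count every regular file below the class folder (extension unknown).
            counts[class_dir.name] = sum(
                1 for path in class_dir.rglob("*") if path.is_file()
            )
    return counts
```

The returned dictionary maps each class folder name to its file count, which gives a simple way to spot an incomplete extraction. It does not read the JSON labels; those are meant to be loaded through the DocFakerDataset class.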

Disclaimer

The data used to generate the DocXPand-25k dataset are not personal data within the meaning of the GDPR, as they do not relate to an identified or identifiable natural person. They are test data generated with Faker, Stable Diffusion v1.5, and QuickSign anonymized scene images. The photos of faces are generated by an AI system. Therefore, the GDPR does not apply. Should personal data within the meaning of the GDPR be used with the algorithm, an assessment should be made to evaluate whether the data processing complies with the GDPR.
The ID designs used to generate the DocXPand-25k dataset, and the other fictitious ID documents in DocXPand-25k, are fictitious IDs and cannot be equated with forged IDs.
The purpose of DocXPand-25k and the algorithm is to provide a dataset and an algorithm for generating fictitious ID documents, in order to train models for document localization and text recognition, but not fraud detection, as the fictitious ID documents cannot be equated with valid ID documents.
QuickSign disclaims all responsibility for the use of the DocXPand-25k dataset and the associated code.