hershey dataset

A tool to extract SVGs from Hershey font definitions. A dataset of already extracted fonts in SVG format is available in the compressed font_svgs.zip file.

Binary data streams turn out to compress well; the compressed pickled dataset can be found in ./datasets/
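As a rough illustration of why the pickled data compresses so well, here is a minimal sketch that pickles a mostly empty bitmap and compresses it. The use of gzip and the helper names are assumptions for illustration; the files in ./datasets/ may use a different compression format.

```python
import gzip
import pickle


def dumps_compressed(obj):
    """Pickle an object and gzip the resulting bytes (gzip is an assumption)."""
    return gzip.compress(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def loads_compressed(blob):
    """Inverse of dumps_compressed: gunzip, then unpickle."""
    return pickle.loads(gzip.decompress(blob))


def save_compressed(obj, path):
    """Write the compressed pickle to a file (hypothetical helper)."""
    with open(path, "wb") as fh:
        fh.write(dumps_compressed(obj))


def load_compressed(path):
    """Read a compressed pickle back from a file (hypothetical helper)."""
    with open(path, "rb") as fh:
        return loads_compressed(fh.read())
```

A stroke bitmap is mostly zeros, so the pickled byte stream is highly repetitive, which is the case gzip handles best. Only unpickle files you trust, since pickle can execute arbitrary code on load.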

Local Dataset

Global Dataset

NOTE !

Abandoning this dataset for a few reasons and moving to KanjiVG characters instead:

The set of characters in Hershey is relatively small (even with x4 data augmentation, the model is prone to overfitting).
Redundant strokes or overwriting: even though Hershey is a single-stroke font, strokes are drawn over each other for better visual appearance, which is not a good feature for training.

While it is easy to create a dataset from SVGs that use only M and L commands, that is not a good trade-off against KanjiVG characters, which have Bézier curves.
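To show why paths with only M and L commands are easy to work with, here is a minimal sketch that splits such a path string into strokes. The function name is hypothetical, and it assumes absolute, uppercase commands only; it is not the parser used by the scripts in this repository.

```python
import re


def path_to_strokes(d):
    """Split an SVG path 'd' string with only absolute M/L commands into strokes.

    Each stroke is a list of (x, y) tuples. M starts a new stroke, L extends
    the current one. Extra coordinate pairs after M are treated as implicit
    line-to points, as the SVG path grammar specifies.
    """
    strokes = []
    for cmd, args in re.findall(r"([ML])([^ML]*)", d):
        nums = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", args)]
        points = list(zip(nums[0::2], nums[1::2]))
        if cmd == "M":
            strokes.append(points)        # pen lifted: begin a new stroke
        elif strokes:
            strokes[-1].extend(points)    # pen down: continue current stroke
    return strokes
```

Bézier commands (C, Q, S, T) would need curve sampling on top of this, which is the extra work that comes with KanjiVG.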

Steps to extract the dataset yourself:

A few scripts need certain directories with data in place; figure it out from the code or wait for the commit that handles all directory creation.

$./extract_hershey_font.py
$./remove_invalid_svg.py  # removes invalid files, e.g. empty SVG files
$mkdir global_dataset
$./create_globaldataset.py
$mkdir local_dataset
$./create_localdataset.py
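Until the scripts create their own directories, the two mkdir steps above could be folded into a small helper like the sketch below. The function name is hypothetical, and only the two directories named in the commands are covered; other scripts may expect further directories that are not listed here.

```python
from pathlib import Path

# Directories taken from the mkdir commands above; any others are unknown.
REQUIRED_DIRS = ("global_dataset", "local_dataset")


def ensure_dirs(root=".", names=REQUIRED_DIRS):
    """Create each directory under root if it is missing and return the paths."""
    created = []
    for name in names:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

Calling ensure_dirs() once before the create_* scripts makes the run repeatable, since existing directories are left untouched.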

Dependencies

matplotlib
opencv
numpy

Files

./extract_hershey_font.py Python script that reads hershey.jhf and outputs the font in .svg format

./hershey.jhf original file by Dr. Hershey, contains all characters

./visualise_dataset.py visualises the pickled dataset

./create_globaldataset.py creates the global dataset in the ./global_dataset directory

./create_localdataset.py creates the local dataset in the ./local_dataset directory

./create_metadata.py creates a metadata file in the respective dataset directory with train, test and validation sample counts; helpful when experimenting with the number of files to include for training and validation, e.g. for validation_steps and steps_per_epoch

./tools/chartrace.py simple tkinter application to trace characters or draw them into SVG files (it needs a folder containing the characters you want to trace as .png files)
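The sample counts recorded by create_metadata.py feed directly into step counts for training. A minimal sketch of that arithmetic follows; the function names and dictionary keys are hypothetical and do not reflect the actual metadata file layout.

```python
import math


def steps_for(sample_count, batch_size):
    """Number of batches needed to see every sample once (last batch may be partial)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(sample_count / batch_size)


def build_metadata(train, validation, test, batch_size):
    """Bundle sample counts with the derived step counts (keys are illustrative)."""
    return {
        "train_samples": train,
        "validation_samples": validation,
        "test_samples": test,
        "steps_per_epoch": steps_for(train, batch_size),
        "validation_steps": steps_for(validation, batch_size),
    }
```

For example, 1000 training samples with a batch size of 32 need 32 steps per epoch, because 31 full batches cover only 992 samples.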

This dataset was inspired by the paper:

Teaching Robots To Draw

Atsunobu Kotani and Stefanie Tellex

Department of Computer Science

Brown University

ANOTHER NOTE !

The original paper was based on Japanese characters; you can extract the same using the script in this repository. Get the Japanese and Roman jhf files from http://paulbourke.net/dataformats/hershey/ (UPDATE: both files are now available in this repository)
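For readers unfamiliar with the jhf files, the sketch below decodes one glyph record into an SVG path with M and L commands. It follows the commonly documented Hershey layout (5 columns of glyph number, 3 columns of pair count, then character pairs offset from 'R', with " R" meaning pen up). The function name is hypothetical, the record is assumed to be already joined onto one line, and extract_hershey_font.py may do this differently.

```python
def jhf_record_to_path(record):
    """Decode one Hershey .jhf glyph record into (glyph_id, (left, right), path_d).

    Assumes the record is a single, already-joined line; long glyphs may be
    wrapped across several lines in the file.
    """
    origin = ord("R")                      # 'R' encodes coordinate 0
    glyph_id = int(record[0:5])
    n_pairs = int(record[5:8])             # count includes the left/right pair
    left = ord(record[8]) - origin
    right = ord(record[9]) - origin
    body = record[10:10 + 2 * (n_pairs - 1)]

    commands = []
    pen_down = False
    for i in range(0, len(body) - 1, 2):
        pair = body[i:i + 2]
        if pair == " R":                   # pen up: next point starts a new stroke
            pen_down = False
            continue
        x = ord(pair[0]) - origin
        y = ord(pair[1]) - origin
        commands.append(f"{'L' if pen_down else 'M'} {x} {y}")
        pen_down = True
    return glyph_id, (left, right), " ".join(commands)
```

Coordinates are centred on zero and can be negative, so an SVG wrapper would need a viewBox or an offset to keep the glyph visible.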

TODO :

create global dataset
create local dataset
create a pickled form of global dataset
create guide and script to get global and local dataset
add Japanese characters
upload visualisations
create separate file to get metadata
upload actual dataset
