- Modify `config/job.json`:

```json
{
    "max-depth": 4,             # max depth of generated elements
    "parts": 4,                 # number of threads
    "samples-in-part": 10,      # number of elements generated by one thread without level augmentation
    "level-augmentation": true, # run the same job for depth = 0, 1, 2, ..., max-depth - 1, each job on its own thread
    "deeper-chance": 1          # probability that the generating tree recurses deeper, bounded by max-depth
}
```
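A minimal sketch of how the generator might consume these settings. The `generate_part` worker, the binary tree shape, and the per-depth thread layout are illustrative assumptions, not the toolkit's actual implementation:

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Values mirror config/job.json above; the real file must be valid JSON,
# i.e. without the inline comments shown in this README.
job = {"max-depth": 4, "parts": 4, "samples-in-part": 10,
       "level-augmentation": True, "deeper-chance": 1}

def generate_part(root_depth, job):
    # Hypothetical worker: build `samples-in-part` trees whose roots sit at
    # `root_depth`, recursing deeper with probability `deeper-chance` and
    # never past `max-depth`.
    def build(depth):
        if depth < job["max-depth"] and random.random() < job["deeper-chance"]:
            return {"depth": depth, "children": [build(depth + 1), build(depth + 1)]}
        return {"depth": depth, "children": []}
    return [build(root_depth) for _ in range(job["samples-in-part"])]

# With level augmentation, the same job is repeated for each depth 0..max-depth-1.
depths = range(job["max-depth"]) if job["level-augmentation"] else [0]
with ThreadPoolExecutor(max_workers=job["parts"] * len(depths)) as pool:
    futures = [pool.submit(generate_part, d, job)
               for d in depths for _ in range(job["parts"])]
    samples = [tree for f in futures for tree in f.result()]
print(len(samples), "sample trees")
```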
- Modify `config/aug_job.json` if you need augmentation:

```json
{
    "threads": 12,
    "samples-lvl-percent": {    # percentage of samples from a given level that will be augmented
        "1": 0.2,
        "2": 0.1,
        "3": 0.05
    }
}
```
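A sketch of how the per-level percentages might be applied, assuming each sample is tagged with its level; `augment` and the sample layout here are placeholders, not the toolkit's code:

```python
import random
from concurrent.futures import ThreadPoolExecutor

aug_job = {"threads": 12,
           "samples-lvl-percent": {"1": 0.2, "2": 0.1, "3": 0.05}}

def augment(sample):
    # Placeholder for whatever transformation the augmentation step applies.
    return {**sample, "augmented": True}

# Assumes each generated sample carries a "level" key (its depth level).
samples = [{"id": i, "level": random.randint(1, 3)} for i in range(1000)]

selected = []
for level, percent in aug_job["samples-lvl-percent"].items():
    at_level = [s for s in samples if s["level"] == int(level)]
    selected += random.sample(at_level, int(len(at_level) * percent))

with ThreadPoolExecutor(max_workers=aug_job["threads"]) as pool:
    augmented = list(pool.map(augment, selected))
print(len(augmented), "augmented samples")
```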
- Modify `config/rescale.json`:

```json
{
    "1": 0.5,   # Expressions with depth 1 are treated as if they were 50% of the original size.
    "2": 0.35,  # Expressions with depth 2 are treated as if they were 35% of the original size.
    "3": 0.2,   # ...
    "4": 0.15   # Leaves are treated as if they were 15% of the original size.
}
```
```
python3 toolkit/generate_dataset.py --job config/job.json --aug-job config/aug_job.json
```

WARNING: `parts * (level-augmentation ? (max-depth)! : 1)` threads will be created.
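For example, with the `config/job.json` values shown above (parts = 4, max-depth = 4, level-augmentation enabled), the formula as written gives 4 * 4! = 96 threads.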