- Modify `config/job.json`:

```json
{
    "max-depth": 4,             # max depth of generated elements
    "parts": 4,                 # number of threads
    "samples-in-part": 10,      # number of elements generated by one thread without level augmentation
    "level-augmentation": true, # run the same job for depth = 0, 1, 2, ..., max-depth - 1, each job on its own thread
    "deeper-chance": 1          # probability that the generating tree recurses deeper, bounded by max-depth
}
```
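A minimal sketch of how the generator might consume these settings. The `generate_part` worker, the binary tree shape, and the per-depth thread layout are illustrative assumptions, not the toolkit's actual implementation:

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Values mirror config/job.json above; the real file must be valid JSON,
# i.e. without the inline comments shown in this README.
job = {"max-depth": 4, "parts": 4, "samples-in-part": 10,
       "level-augmentation": True, "deeper-chance": 1}

def generate_part(root_depth, job):
    # Hypothetical worker: build `samples-in-part` trees whose roots sit at
    # `root_depth`, recursing deeper with probability `deeper-chance` and
    # never past `max-depth`.
    def build(depth):
        if depth < job["max-depth"] and random.random() < job["deeper-chance"]:
            return {"depth": depth, "children": [build(depth + 1), build(depth + 1)]}
        return {"depth": depth, "children": []}
    return [build(root_depth) for _ in range(job["samples-in-part"])]

# With level augmentation, the same job is repeated for each depth 0..max-depth-1.
depths = range(job["max-depth"]) if job["level-augmentation"] else [0]
with ThreadPoolExecutor(max_workers=job["parts"] * len(depths)) as pool:
    futures = [pool.submit(generate_part, d, job)
               for d in depths for _ in range(job["parts"])]
    samples = [tree for f in futures for tree in f.result()]
print(len(samples), "sample trees")
```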
- Modify `config/aug_job.json` if you need augmentation:

```json
{
    "threads": 12,
    "samples-lvl-percent": {    # percentage of samples from a given level that will be augmented
        "1": 0.2,
        "2": 0.1,
        "3": 0.05
    }
}
```
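A sketch of how the per-level percentages might be applied, assuming each sample is tagged with its level; `augment` and the sample layout here are placeholders, not the toolkit's code:

```python
import random
from concurrent.futures import ThreadPoolExecutor

aug_job = {"threads": 12,
           "samples-lvl-percent": {"1": 0.2, "2": 0.1, "3": 0.05}}

def augment(sample):
    # Placeholder for whatever transformation the augmentation step applies.
    return {**sample, "augmented": True}

# Assumes each generated sample carries a "level" key (its depth level).
samples = [{"id": i, "level": random.randint(1, 3)} for i in range(1000)]

selected = []
for level, percent in aug_job["samples-lvl-percent"].items():
    at_level = [s for s in samples if s["level"] == int(level)]
    selected += random.sample(at_level, int(len(at_level) * percent))

with ThreadPoolExecutor(max_workers=aug_job["threads"]) as pool:
    augmented = list(pool.map(augment, selected))
print(len(augmented), "augmented samples")
```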
- Modify `config/rescale.json`:

```json
{
    "1": 0.5,   # Expressions with depth 1 are treated as if they were 50% of the original size.
    "2": 0.35,  # Expressions with depth 2 are treated as if they were 35% of the original size.
    "3": 0.2,   # ...
    "4": 0.15   # Leaves are treated as if they were 15% of the original size.
}
```
```
python3 toolkit/generate_dataset.py --job config/job.json --aug-job config/aug_job.json
```

WARNING: `parts * (level-augmentation ? (max-depth)! : 1)` threads will be created.
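For example, with the `config/job.json` values shown above (parts = 4, max-depth = 4, level-augmentation enabled), the formula as written gives 4 * 4! = 96 threads.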