"Bug fixed" make_dataset supports dicts as input and doesn't do unnecessary save-load #65

egorssed · 2022-03-18T13:19:57Z

In the documentation it was declared that you can generate a dataset from python's dict

def make_dataset(config, ...):
    """
    Generate a dataset from a config file.
    Args:
        config (str or dict): name of yaml file specifying dataset characteristics or pre-parsed yaml file as dictionary
    ...
    """

, but all the realization relied on parser and manipulation with files, hence the actual generation from dict didn't work. I fixed it in the first commit.

Some lines are very questionable, e.g. line 349

    try:
        dataset.seed = int(dataset.config_dict['DATASET']['PARAMETERS']["SEED"])
    except: 
        ...

Just check that the key 'SEED' exists and is a 32-bit unsigned integer convertible. Try-catch blocks in C++ were super slow, not speaking of Python.

or line 310-316

# paths and configurations of 'BACKGROUNDS' images (copy-paste from Parser._get_image_locations)
image_paths = []
image_configurations = []
if "BACKGROUNDS" in config.keys():
    image_paths.append(config['BACKGROUNDS']['PATH'])
    image_configurations = config['BACKGROUNDS']['CONFIGURATIONS'][:]

both image_paths and image_configurations were declared as empty lists in lines 310,311, but one of them is appended and the other is overwritten. I copy-pasted it from Parser._get_image_locations (to keep the same behavior), so it might be worth checking what happens if you have more than one background image (I guess only the last one will be used)

Previously if you did make_dataset with an option save_to_disk=False the dictionaries with config files we saved to the disk anyway. The reason was that in line 405 you saved dicts and in line 411 you read them back. First, it is unnecessary, second it is a hard drive use bottleneck. I fixed this behavior in the second commit, but there is a question about the lines 352-354 in the old code.

    for configuration in dataset.configurations:
        np.save("{0}/{1}_sim_dicts.npy".format(dataset.outdir, configuration), {0: organizer.configuration_sim_dicts[configuration]}, allow_pickle=True)
        del organizer.configuration_sim_dicts[configuration]

So you saved dicts to the disk, cleared them from organizer's class fields (as well as from dataset.organizer's fields), and then reread the dicts back. I don't see any reason to clear those from fields, but if there is something behind it you might want to change the behavior.

Overall I would say that the logic behind the code is quite questionable. I might not know the general paradigm and underlying details, so think through the corrections I've made. To check that everything is fine I also added unit tests for the changes I've made. They don't follow your usual paradigm of unit tests, so you can delete them if they are not needed.

egorssed added 2 commits March 17, 2022 19:40

make_dataset support dicts

002d8ed

Remove unnecessary save-load in save_to_disk=False

474a318

egorssed added bug Something isn't working enhancement New feature or request invalid This doesn't seem right labels Mar 18, 2022

egorssed requested a review from bnord March 18, 2022 13:19

egorssed changed the title ~~"Bug fixed" make_dataset supports dicts as input + doesn't do unnecessary save-load~~ "Bug fixed" make_dataset supports dicts as input + don't do unnecessary save-load Mar 18, 2022

egorssed changed the title ~~"Bug fixed" make_dataset supports dicts as input + don't do unnecessary save-load~~ "Bug fixed" make_dataset supports dicts as input and doesn't do unnecessary save-load Mar 18, 2022

AeRabelais changed the base branch from master to development June 26, 2023 16:23

AeRabelais merged commit 0a57f6e into development Jun 26, 2023

AeRabelais deleted the dev-egor branch June 26, 2023 16:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

"Bug fixed" make_dataset supports dicts as input and doesn't do unnecessary save-load #65

"Bug fixed" make_dataset supports dicts as input and doesn't do unnecessary save-load #65

egorssed commented Mar 18, 2022 •

edited

Loading

"Bug fixed" make_dataset supports dicts as input and doesn't do unnecessary save-load #65

"Bug fixed" make_dataset supports dicts as input and doesn't do unnecessary save-load #65

Conversation

egorssed commented Mar 18, 2022 • edited Loading

egorssed commented Mar 18, 2022 •

edited

Loading