Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Bug fixed" make_dataset supports dicts as input and doesn't do unnecessary save-load #65

Merged
merged 2 commits into from
Jun 26, 2023

Conversation

egorssed
Copy link

@egorssed egorssed commented Mar 18, 2022

  1. In the documentation it was declared that you can generate a dataset from python's dict
def make_dataset(config, ...):
    """
    Generate a dataset from a config file.
    Args:
        config (str or dict): name of yaml file specifying dataset characteristics or pre-parsed yaml file as dictionary
    ...
    """

, but all the realization relied on parser and manipulation with files, hence the actual generation from dict didn't work. I fixed it in the first commit.

  1. Some lines are very questionable, e.g. line 349
    try:
        dataset.seed = int(dataset.config_dict['DATASET']['PARAMETERS']["SEED"])
    except: 
        ...

Just check that the key 'SEED' exists and is a 32-bit unsigned integer convertible. Try-catch blocks in C++ were super slow, not speaking of Python.

or line 310-316

# paths and configurations of 'BACKGROUNDS' images (copy-paste from Parser._get_image_locations)
image_paths = []
image_configurations = []
if "BACKGROUNDS" in config.keys():
    image_paths.append(config['BACKGROUNDS']['PATH'])
    image_configurations = config['BACKGROUNDS']['CONFIGURATIONS'][:]

both image_paths and image_configurations were declared as empty lists in lines 310,311, but one of them is appended and the other is overwritten. I copy-pasted it from Parser._get_image_locations (to keep the same behavior), so it might be worth checking what happens if you have more than one background image (I guess only the last one will be used)

  1. Previously if you did make_dataset with an option save_to_disk=False the dictionaries with config files we saved to the disk anyway. The reason was that in line 405 you saved dicts and in line 411 you read them back. First, it is unnecessary, second it is a hard drive use bottleneck. I fixed this behavior in the second commit, but there is a question about the lines 352-354 in the old code.
    for configuration in dataset.configurations:
        np.save("{0}/{1}_sim_dicts.npy".format(dataset.outdir, configuration), {0: organizer.configuration_sim_dicts[configuration]}, allow_pickle=True)
        del organizer.configuration_sim_dicts[configuration]

So you saved dicts to the disk, cleared them from organizer's class fields (as well as from dataset.organizer's fields), and then reread the dicts back. I don't see any reason to clear those from fields, but if there is something behind it you might want to change the behavior.

Overall I would say that the logic behind the code is quite questionable. I might not know the general paradigm and underlying details, so think through the corrections I've made. To check that everything is fine I also added unit tests for the changes I've made. They don't follow your usual paradigm of unit tests, so you can delete them if they are not needed.

@egorssed egorssed added bug Something isn't working enhancement New feature or request invalid This doesn't seem right labels Mar 18, 2022
@egorssed egorssed requested a review from bnord March 18, 2022 13:19
@egorssed egorssed changed the title "Bug fixed" make_dataset supports dicts as input + doesn't do unnecessary save-load "Bug fixed" make_dataset supports dicts as input + don't do unnecessary save-load Mar 18, 2022
@egorssed egorssed changed the title "Bug fixed" make_dataset supports dicts as input + don't do unnecessary save-load "Bug fixed" make_dataset supports dicts as input and doesn't do unnecessary save-load Mar 18, 2022
@AeRabelais AeRabelais changed the base branch from master to development June 26, 2023 16:23
@AeRabelais AeRabelais merged commit 0a57f6e into development Jun 26, 2023
@AeRabelais AeRabelais deleted the dev-egor branch June 26, 2023 16:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request invalid This doesn't seem right
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants