
Catalog #13

Merged
merged 74 commits into main from catalog
Oct 24, 2024
Conversation

@stroblme (Collaborator) commented Oct 21, 2024

This PR resolves #11 by:

  • moving IO interaction to Kedro by introducing various datasets in the catalog
  • removing file actions in various nodes and pipelines:
    • adiabatic maxcut
    • adiabatic trackrec
    • anneal schedule
    • generation
    • qaoa maxcut
    • qaoa trackrec
    • qubo
    • visualization
  • cleaning up variable dependencies
  • copying the sample events into 01_raw (now persisted in git); in the long run we should use e.g. the trackml library to load arbitrary event data (e.g. based on a parameter)
  • adding cleanup.sh to remove any data created by nodes
  • updating docstrings (once the concept is approved by @majafranz)
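As a rough sketch of what such catalog entries could look like (dataset names, types, and paths below are illustrative assumptions, not the PR's actual keys):

```yaml
# Hypothetical conf/base/catalog.yml entries; names and paths are assumptions.
qubo:
  type: pickle.PickleDataset
  filepath: data/02_intermediate/qubo.pkl

qaoa_results:
  type: pandas.CSVDataset
  filepath: data/07_model_output/qaoa_results.csv
  save_args:
    index: False
```

With entries like these, nodes only pass objects in and out; Kedro handles all reading and writing, which is what makes the removed file actions in the nodes unnecessary.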

Note 1: When merging from main, I broke the geometric_index handling because of conflicts. This should be fixed before merging back. Update: fixed; the parameter is now considered when building the QUBO.
Note 2: I added a mean-value calculation in the anneal-schedules method because the lengths of the vectors did not add up. However, I might have messed up the datatype when saving, so please check that.
Note 3: The visualization pipeline is only roughly implemented and lacks detail; imo we should discuss in a meeting how to proceed. Because this PR changes the approach from "put everything in one CSV with the parameters embedded" to "separate result files, with parameters inferred from the config", context is always required when visualizing from the result files. One quick-and-dirty solution would be a final "collection" node that gathers all results and the required parameters and builds the solution.csv file currently used for plotting. A much cleaner solution would be to plot the individual results directly in a Kedro node, which, however, would require switching from R to Python or using a ggplot2 port.
Note 4: In the long run we could use namespaces to allow switching between track reconstruction and maxcut, reusing existing nodes to stay a bit closer to the Kedro "spirit".

lc3267 added 30 commits October 17, 2024 12:10
@majafranz (Member)

Thanks for the effort and the great work! Looks good to me. 🚀

But I would prefer CSV files instead of JSON for the results, as CSV is easier to plot with R.

I think pandas DataFrames are easy to export via pandas.CSVDataset, but we have to make sure the columns are properly set. I can try to fix that.

@stroblme (Collaborator, Author)

CSV is fine for me; just keep in mind what I wrote in Note 3. TL;DR: eventually we won't have to read the "intermediate" results with R. :) You may need to tweak the load_args or save_args.

@majafranz (Member)

> CSV is fine for me; just keep in mind what I wrote in Note 3. TL;DR: eventually we won't have to read the "intermediate" results with R. :) You may need to tweak the load_args or save_args.

Ah, I think there might be a misunderstanding. I am not only using the previous solution.csv but also the other files for plotting (and for plotting different things). But let's discuss this later.

```diff
@@ -383,7 +383,9 @@ def times_from_QAOA_params(betas: np.ndarray, gammas: np.ndarray) -> np.ndarray:
     time = 0.0
     time_midpoints = np.zeros(p + 1)
     for i in range(p):
-        time_segment = np.abs(gammas[i]) + np.abs(betas[i])
+        time_segment = np.mean(
```
@majafranz (Member)
I think the mean should not be necessary. Maybe the beta and gamma values are somehow lists of lists now? betas[i] and gammas[i] should each just be a float.
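For context, here is a minimal pure-Python sketch of the 1-D case, where each betas[i]/gammas[i] is a single float (the body is an assumption reconstructed from the diff hunk above, not the repository's actual code):

```python
def times_from_qaoa_params(betas, gammas):
    """Hypothetical mirror of times_from_QAOA_params for plain 1-D inputs.

    Each QAOA layer i contributes a time segment of |gamma_i| + |beta_i|;
    the midpoints mark the centre of each segment, with the total elapsed
    time stored as the final entry.
    """
    p = len(betas)
    time = 0.0
    time_midpoints = [0.0] * (p + 1)
    for i in range(p):
        time_segment = abs(gammas[i]) + abs(betas[i])
        time_midpoints[i] = time + time_segment / 2.0
        time += time_segment
    time_midpoints[p] = time  # no mean needed when the entries are floats
    return time_midpoints
```

If the inputs really are lists of lists (e.g. one parameter set per p value), the np.mean would paper over a shape problem upstream rather than fix it, which matches the review concern here.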

@stroblme (Collaborator, Author) commented Oct 23, 2024

As discussed, let's go with max_p. Can you fix that, @majafranz?

@majafranz (Member)
done in 1e13311

```python
    "xy_angle_parts ": num_angle_parts,
}
model = QallseSplit(data_wrapper, **extra_config)
build_model(doublets=doublets, model=model, add_missing=False)
```
@majafranz (Member)

Did you add an additional build_model function? The qallse version does not accept the `doublets` keyword.

@stroblme (Collaborator, Author)

fixed with 807836a


stroblme commented Oct 23, 2024

  • @stroblme: check how to connect config params and output files in Kedro (eventually create an export-config node)
  • @majafranz: convert JSON output to CSV

lc3267 added 6 commits October 23, 2024 13:21
@stroblme (Collaborator, Author)

Added an export-parameters pipeline (merged with viz).

@stroblme stroblme mentioned this pull request Oct 24, 2024
@stroblme stroblme changed the title [DRAFT] Catalog Catalog Oct 24, 2024
@stroblme stroblme merged commit 1f6fea2 into main Oct 24, 2024
1 check passed
@stroblme stroblme deleted the catalog branch October 24, 2024 08:06

Successfully merging this pull request may close these issues.

Use Kedro's capabilities of having dataset catalog