-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Keep (some) input data ordered in MUSE2 #395
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #395 +/- ##
=======================================
Coverage 95.42% 95.43%
=======================================
Files 31 31
Lines 4004 4007 +3
Branches 4004 4007 +3
=======================================
+ Hits 3821 3824 +3
Misses 96 96
Partials 87 87 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think deterministic outputs making debugging easier is enough justification for this. I'm very on the fence with this in principle though.
It's possibly an interesting discussion for another time, but there's something about separation of concerns, and not baking in order to the output so as not to impose arbitrary structure. Data processing pipelines should be robust enough to handle unordered data.
Code changes look good.
I agree in principle, but this is mostly about debugability/convenience really -- I think it will confuse users if they get different-looking data files every time they run the program, for no obvious reason. Users won't necessarily be particularly technically savvy either and their Excel-based workflows might not be that robust 😛. It will also make it annoying to write regression tests, for example, if the outputs are in a different order on every run. This probably wouldn't be worth doing if it was annoying to implement, but given it's an easy fix, I think we should just do it to make life a bit easier. Anyhoo, thanks for the review! |
Description
We make heavy use of
HashMap
s to store data in MUSE2, usually with the key being some kind of user-specified string ID (representing a commodity, process etc.).HashMap
s are not ordered data structures, meaning if you iterate over their contents you will get a different ordering every time you run the program, even when using the same input files.While the dispatch optimisation model doesn't care about ordering (even time slices within a year are independent of one another), a consequence of this design is that the output files will be effectively randomly shuffled every time the user runs the program with the same input data, which is annoying and potentially confusing. As it stands, if a user was to e.g. make a bar chart of commodity prices by time slice in Excel, unless they sorted the data properly beforehand they would get a different-looking plot every time they run the program. You'd have the same problem with Python too: users will end up having to write a whole bunch of logic to sort the input data just in order to plot it. Note that sorting time slices alphabetically will give a confusing order, so they'll have to have an ordered list of time slices -- and if they add/remove time slices to the model then they'll end up having to change the script. In addition, this will also make it really annoying to debug the code: every time you start the program under a debugger, your decision variables will be in a different randomised order.
So let's use an ordered map where it makes sense (i.e. if the ordering will be user-visible). There's an external crate (
indexmap
) which maintains the insertion order of elements. We're using it already, so I've just replaced variousHashMap
s withIndexMap
s (as well as using a few ordered sets in the time slice code). While I was at it, I defined type aliases for various common map types we're using to make the code a bit cleaner: e.g. aCommodityMap
as an alias forIndexMap<Rc<str>, Rc<Commodity>>
.Closes #327.
Type of change
Key checklist
$ cargo test
$ cargo doc
Further checks