-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent util.execute from reading output files #296
base: master
Are you sure you want to change the base?
Conversation
What about making it like Perhaps add a note that |
I like your suggestion. I'll refine this and improve the docstring. |
qcengine/util.py
Outdated
@@ -377,6 +377,7 @@ def execute( | |||
infiles: Optional[Dict[str, str]] = None, | |||
outfiles: Optional[List[str]] = None, | |||
*, | |||
outfiles_track: Optional[List[str]] = [], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to have None
here, like as_binary
. https://stackoverflow.com/questions/26320899/why-is-the-empty-dictionary-a-dangerous-default-value-in-python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point but shouldn't matter here since outfiles_track
should never be modified, but if you're concerned someone might I could change its type to an immutable object like a Tuple
OR if you prefer a list for consistency that is by default None
, I could add a single line to disk_files
:
outfiles_track = outfiles_track or []
that would make the current code work with no additional changes.
qcengine/util.py
Outdated
except (OSError, FileNotFoundError): | ||
if "*" in fl: | ||
filename = lwd / fl | ||
if fl not in outfiles_track: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you handling the case when outfiles=["subdir/*"], outfiles_track=["subdir/bigfile"]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this will not work (bigfile will be loaded in memory). For outfiles_track
to work, it has to be a subset of outfiles
. So for the following input:
{
...
"outfiles": ["*"],
"outfiles_track": ["bigfile"],
...
}
or the opposite:
{
...
"outfiles": ["bigfile", ...],
"outfiles_track": ["*"],
...
}
outfiles_track
will be ignored. There are also no sanity checks being done on outfiles_track
so if one specifies a filename not part of outfiles
, it will be ignored without any warnings.
I guess the main advantage in setting outfiles_load: bool
is we don't have to worry about individual cases like these.
Description
Adds an extra keyword (
outfiles_load
) to util.execute that instructs disk_files how to handle output files. Ifoutfiles_load=True
(default), the output files are read and their contents returned inoutfiles
. Whenoutfiles_load=False
, only the file paths are returned instead. This is useful and can be necessary when output files are large so keeping track of the output files while retaining the scratch dir becomes a viable alternative:The default behavior is preserved (
outfiles_load=True
inutil.execute
), so by defaultoutfiles
stores the file contents i.e.Changelog description
util.execute supports tracking output files without loading them in memory
Status