-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-31824: Add experimental mtransfer class method #107
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
✅ All tests successful. No failed tests found. Additional details and impacted files@@ Coverage Diff @@
## main #107 +/- ##
==========================================
+ Coverage 86.80% 86.96% +0.15%
==========================================
Files 27 27
Lines 4690 4793 +103
Branches 566 578 +12
==========================================
+ Hits 4071 4168 +97
- Misses 474 477 +3
- Partials 145 148 +3 ☔ View full report in Codecov by Sentry. |
Instead of using tmpdir and moving to the new location which might then involve a copy (and in some cases fill up tmpdir)
Only go over the loop once.
Do not download directly to destination directory if the destination directory does not exist.
|
||
Returns | ||
------- | ||
copy_status : `dict` [ `ResourcePath`, `bool` ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dhirving once sticking point for mtransfer
over transfer_from
in a loop is that the latter raises an exception and tells you the problem immediately and you stop doing the transfers. mtransfer
as written lets everything complete and then tells you a simple yes/no when the caller would like to know some reasons for failures (such as FileExistsError). I could store the Exceptions in the returned dict. Or I could let the first failure raise (is that allowed in concurrent futures?)? Or at the end of the transfers all the number of failures could be counted and the final exception encountered could be raised (with a note saying how many other failures there were). Should there be a parameter to indicate whether the caller wants a raise vs dict returned? I would be interested in your opinion on this. (butler does need to have some idea as to why a failure happened to try to help in its error message)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One idea might be to raise an ExceptionGroup with every failure.
I think your existing dict thing would work too, I wonder if it could be like
class TransferResult(NamedTuple):
success: bool
exception: Exception | None
dict[ResourcePath, TransferResult]
or something like that.
Bailing immediately on the first failure is potentially problematic because you will still have concurrent transfers going in the background... I think you can cancel the unscheduled ones but you would want the already-executing ones to finish before throwing. I think your idea of adding a parameter to choose "bail on first failure" vs "continue as far as possible" is decent.
(which is not currently public)
Checklist
doc/changes