DM-31824: Add experimental mtransfer class method #107

timj · 2025-03-04T19:14:22Z

Checklist

ran Jenkins
added a release note for user-visible changes to doc/changes

codecov · 2025-03-04T19:17:19Z

Codecov Report

Attention: Patch coverage is 94.81481% with 7 lines in your changes missing coverage. Please review.

Project coverage is 86.96%. Comparing base (2cf0f8f) to head (aa90387).

✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
python/lsst/resources/_resourcePath.py	89.58%	2 Missing and 3 partials ⚠️
python/lsst/resources/gs.py	50.00%	1 Missing ⚠️
python/lsst/resources/tests.py	97.82%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #107      +/-   ##
==========================================
+ Coverage   86.80%   86.96%   +0.15%     
==========================================
  Files          27       27              
  Lines        4690     4793     +103     
  Branches      566      578      +12     
==========================================
+ Hits         4071     4168      +97     
- Misses        474      477       +3     
- Partials      145      148       +3


timj · 2025-03-06T19:59:47Z

python/lsst/resources/_resourcePath.py

+
+        Returns
+        -------
+        copy_status : `dict` [ `ResourcePath`, `bool` ]


@dhirving one sticking point for mtransfer over transfer_from in a loop is that the latter raises an exception, tells you the problem immediately, and stops doing the transfers. mtransfer as written lets everything complete and then reports a simple yes/no, when the caller would like to know the reasons for failures (such as FileExistsError). I could store the exceptions in the returned dict. Or I could let the first failure raise (is that allowed in concurrent.futures?). Or, at the end of the transfers, the number of failures could be counted and the final exception encountered could be raised (with a note saying how many other failures there were). Should there be a parameter to indicate whether the caller wants a raise vs. a dict returned? I would be interested in your opinion on this. (butler does need some idea of why a failure happened so it can give a helpful error message.)

dhirving · 2025-03-06

One idea might be to raise an ExceptionGroup with every failure.

I think your existing dict thing would work too, I wonder if it could be like

    class TransferResult(NamedTuple):
        success: bool
        exception: Exception | None

    dict[ResourcePath, TransferResult]

or something like that.

Bailing immediately on the first failure is potentially problematic because you will still have concurrent transfers going in the background... I think you can cancel the unscheduled ones but you would want the already-executing ones to finish before throwing. I think your idea of adding a parameter to choose "bail on first failure" vs "continue as far as possible" is decent.

timj force-pushed the tickets/DM-31824 branch from b1d09d8 to b821655 Compare March 4, 2025 19:37

Add experimental mtransfer class method

edee89d

timj force-pushed the tickets/DM-31824 branch from b821655 to edee89d Compare March 4, 2025 20:29

timj added 2 commits March 4, 2025 15:10

Allow transfer_from and as_local to control multithreading

0e02e0a

Disable multithreading in bulk transfers

efe6ba9

timj force-pushed the tickets/DM-31824 branch from f1f6b31 to 2eb4cef Compare March 5, 2025 21:21

Use env var to determine number of workers

de1ecf0

timj force-pushed the tickets/DM-31824 branch from 2eb4cef to de1ecf0 Compare March 5, 2025 21:32

timj added 4 commits March 6, 2025 12:44

Allow as_local to specify the temp directory to use

4a9d719

Change transfer_from for posix destination to use output dir

cef7a5c

Instead of using tmpdir and moving to the new location which might then involve a copy (and in some cases fill up tmpdir)

Allow mtransfer to be called with iterable

d4ad5c3

Only go over the loop once.

Reorganize file.transfer_from to move some logic earlier

1ceaecb

Do not download directly to destination directory if the destination directory does not exist.

timj force-pushed the tickets/DM-31824 branch from 66bad34 to 1ceaecb Compare March 6, 2025 19:44

Fix sphinx docstring

79f29fa

timj force-pushed the tickets/DM-31824 branch from c781ae9 to 79f29fa Compare March 6, 2025 19:52

timj commented Mar 6, 2025

timj added 2 commits March 6, 2025 15:19

Raise an ExceptionGroup in mtransfer

0d2b2be

Add mtransfer tests

b52a09f

timj force-pushed the tickets/DM-31824 branch from 9f8e277 to 6977e01 Compare March 6, 2025 23:21

timj added 2 commits March 6, 2025 16:50

Fix warning in doc build with MTransferResult

f7f22cc

(which is not currently public)

Add tests for as_local tmpdir

aa90387

timj force-pushed the tickets/DM-31824 branch from 6977e01 to aa90387 Compare March 6, 2025 23:50
