Benchmark amount of intermediate data stored #10

Open
TomNicholas wants to merge 2 commits into main

Conversation

@TomNicholas (Member) commented Mar 4, 2024

WIP: an attempt to record the amount of intermediate data stored, just by looking at the work directory.
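As a rough sketch of the idea (a hypothetical helper, not the code in this PR): after a computation has run, resolve a filesystem for the work directory and total up the bytes stored beneath it.

import fsspec

def intermediate_bytes(work_dir: str) -> int:
    """Total size in bytes of everything under work_dir (local path or e.g. s3:// URL)."""
    # Resolve the filesystem implementation matching the URL scheme.
    fs, _, _ = fsspec.get_fs_token_paths(work_dir)
    # du(total=True) walks the directory tree and sums the file sizes.
    return fs.du(work_dir, total=True)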

Comment on lines 72 to +73
executor=spec.executor,
spec=spec,

@TomNicholas (Member, Author):

Passing both executor=spec.executor and spec seems redundant.

@@ -34,7 +58,7 @@ def run(
callbacks.append(history)
kwargs['callbacks'] = callbacks

- with benchmarks(history):
+ with benchmarks(history, spec.work_dir):

@TomNicholas (Member, Author):

The spec.work_dir isn't the right thing to pass - it's not specific enough. I want the directory containing only the intermediate results written during the last cubed run.

I can see that each such directory is uniquely labeled, which is nice, but I'm not sure exactly how to get this label from the computation. Should I import cubed.core.plan.CONTEXT_ID, or is there a simpler way?

Member:

The event object passed to on_compute_start (and on_compute_end) in the callback contains the context_id (see https://github.com/cubed-dev/cubed/blob/fda2f1e7bedec5389cae00ee751a54076ccf28e3/cubed/runtime/types.py#L28-L49). The HistoryCallback doesn't capture this - perhaps it could be changed, or maybe it's simpler to write a custom callback?
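A minimal sketch of such a custom callback, assuming the Callback base class that HistoryCallback builds on, and that the start event exposes an identifier for the computation (attribute names follow the linked types.py and may differ between cubed versions):

from cubed.runtime.types import Callback  # same module as the event classes linked above

class ComputeIdCallback(Callback):
    """Record an id for each computation so its work subdirectory can be located afterwards."""

    def __init__(self):
        self.compute_ids = []

    def on_compute_start(self, event):
        # The start event carries an id for this call to compute()
        # (compute_id in recent cubed versions; see the discussion below).
        self.compute_ids.append(event.compute_id)

It would be passed alongside the existing HistoryCallback via the callbacks argument.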

@TomNicholas (Member, Author) commented Mar 5, 2024

In cubed.core.plan there is a CONTEXT_ID and a compute_id. The former seems to be used to make temporary directories, whilst the latter is what's stored in the event object. What's the difference between them?

Member:

CONTEXT_ID is the name of the directory created for the duration of the client Python process. The compute_id is a recent addition, used for creating directories to store history data; it is unique to each call to compute.

So you're right - you want CONTEXT_ID, but the intermediate data for all computations run in the pytest session will be there, so it's hard to pick out the arrays for a particular computation.

We should probably make Cubed store its intermediate data in a directory named {CONTEXT_ID}/{compute_id}, but that's a bit more work.

@TomNicholas (Member, Author):

Thanks for the explanation!

> We should probably make Cubed store its intermediate data in a directory named {CONTEXT_ID}/{compute_id}, but that's a bit more work.

I would like to do that - I can make a PR to cubed.

@TomNicholas (Member, Author):

(Continuing discussion from cubed-dev/cubed#413)

> I think the easier way to solve the original problem in #10 would be to just get the intermediate array paths from the DAG

So then the benchmarks context manager needs to know about the plan object, right? Or can we add it to what's saved in history.plan?

I tried looking at the plan object, but if this information is stored in there I can't seem to find it. I thought looking at target would give me what I need, but for the smallest quad means example I just get None for each node, even though something was clearly written.

I'm also wondering if there is any subtlety here with the optimized vs unoptimized plan?

Member:

The plan only has the op names now. Optimized vs unoptimized shouldn't make a difference. Try something like (untested)

array_names = [name for name in dag if name.startswith("array-")]

Where dag comes from the callback object.
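For example (untested, and assuming the dag is available from the compute-start event and that each intermediate array is materialised under a path containing its node name), per-array sizes could then be collected roughly like this:

import fsspec

array_names = [name for name in dag if name.startswith("array-")]

fs, _, _ = fsspec.get_fs_token_paths(work_dir)
all_files = fs.find(work_dir, detail=True)  # recursive listing: path -> info dict

# Attribute each file to the array whose node name appears in its path.
# The exact directory layout is an assumption here, not documented cubed behaviour.
sizes = {
    name: sum(info["size"] for path, info in all_files.items() if name in path)
    for name in array_names
}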

@@ -34,7 +58,7 @@ def run(
callbacks.append(history)
kwargs['callbacks'] = callbacks

- with benchmarks(history):
+ with benchmarks(history, spec.work_dir):

@TomNicholas (Member, Author):

I can't see a way to avoid passing extra information to the benchmarks context manager - the history does not contain this information.
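For reference, a sketch of the shape that change might take (an illustration only, not this repository's actual benchmarks implementation, which presumably records results rather than printing them):

from contextlib import contextmanager
import time

import fsspec

@contextmanager
def benchmarks(history, work_dir):
    """Time the computation, then record how much data was left behind in work_dir."""
    # history is kept in the signature to match the call site above; unused in this sketch.
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        fs, _, _ = fsspec.get_fs_token_paths(work_dir)
        intermediate = fs.du(work_dir, total=True)
        print(f"took {elapsed:.1f}s, left {intermediate} bytes of intermediate data")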

Comment on lines +36 to +37
# List all files and subdirectories in the directory
contents = fs.glob(f'{work_dir}/**', detail=True)

@TomNicholas (Member, Author):

This is currently failing to discover any of the contents
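One thing worth trying (an untested suggestion): recursive ** globbing has behaved differently across fsspec versions, whereas find() always lists files recursively:

# find() recurses unconditionally and returns {path: info}; each info dict has a "size" entry.
contents = fs.find(work_dir, detail=True)
total_size = sum(info["size"] for info in contents.values())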

total_size = 0

if work_dir.startswith('s3://'):
    fs = fsspec.filesystem('s3')

Member:

A simpler way to do this (that I only discovered yesterday!) is

fs, _, _ = fsspec.get_fs_token_paths(work_dir)
