Issue64 #66 (Open)

Wants to merge 3 commits into master.
Conversation

@bnlawrence (Author)

This is the simple fix described in #64. It includes a test that exposes the bug and the one-liner that fixes it, and it works for a number of other data files. The commits now include the correct issue number.

assert 'data' in hfile
d = hfile['data']
# the point of this test is to ensure we can actually retrieve the compression opts
x = d.compression_opts

As an enhancement to the unit test, could you also check that the value of x is correct?
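
For example, a minimal sketch of such a check (the expected value here is an assumption; it should match whatever gzip level the test file was actually written with):

assert 'data' in hfile
d = hfile['data']
x = d.compression_opts
# Check the value itself, not just that the attribute is retrievable.
assert x == 4  # hypothetical: the gzip level used when creating the test file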

Comment on lines 323 to 326

     if d['filter_id'] == GZIP_DEFLATE_FILTER][0]
-    return gzip_entry['client_data'][0]
+    key = {0:'client_data_values',1:'client_data'}['client_data' in gzip_entry]
+    return gzip_entry[key][0]
     return None

According to the documentation pointed to in your issue, [link], it seems there is only a single list of client data values. Instead of your suggested fix, I think we should search-and-replace in the code so that we never produce a dictionary with "client_data" as the key. (Disclaimer: I haven't actually studied the source code yet.)

@bnlawrence (Author)

I can see arguments for doing it both ways. The code I've committed is as faithful as possible to what is in the file at the point of use; doing it differently would require hacking the message structure before using it. I'd defer to the package owners on which option is best.


That's fine, thanks for elaborating!

@bnlawrence (Author)

(At some point you'd have to do what I've got above: insofar as files exist with both compression option structures, we can't control what we will be given to read.)

@bmaranville (Collaborator)

I think there is a bug in dataobjects.py, when reading the filter pipeline.

For version 1 pipeline descriptions, filter_info["client_data_values"] is read directly from the message and corresponds to the length of the client data array. The values are then read into filter_info["client_data"].

For version 2 pipelines, the values in the client data array are stored in filter_info["client_data_values"], and a different variable (num_client_values) is used to temporarily store the length of the array.

This is incorrect: the values should be stored in filter_info["client_data"], as they are for v1, and for consistency we could also store the length of the value array in filter_info["client_data_values"] (though that value would probably not be used again).

If this is fixed then the changes to dataobjects.py above are not needed.
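
For illustration, a sketch of the entry shape both pipeline versions would then produce for a gzip filter (the concrete values are assumptions, not read from a real file):

# Assumed example entry after the proposed fix; GZIP_DEFLATE_FILTER is the
# pyfive constant for the gzip/deflate filter (filter id 1 in the HDF5 format).
filter_info = {
    'filter_id': GZIP_DEFLATE_FILTER,
    'client_data': (4,),        # the client values themselves (here, the gzip level)
    'client_data_values': 1,    # length of client_data, kept for v1/v2 consistency
}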

@bmaranville self-requested a review on January 7, 2025 at 14:38
@bmaranville (Collaborator) left a comment

I recommend fixing the section of dataobjects.py that reads the client data from the file into a dict, rather than trying to deal with the broken dict after the fact.

@bmaranville (Collaborator)

Ha... looking back through the history, it looks like it might be me who committed the code that I'm suggesting be fixed here.

@bmaranville (Collaborator)

diff --git a/pyfive/dataobjects.py b/pyfive/dataobjects.py
index adb19eb..5cdc169 100644
--- a/pyfive/dataobjects.py
+++ b/pyfive/dataobjects.py
@@ -409,7 +409,8 @@ class DataObjects(object):
                 filter_info['name'] = name
                 client_values = struct.unpack_from("<{:d}i".format(num_client_values), self.msg_data, offset)
                 offset += (4 * num_client_values)
-                filter_info['client_data_values'] = client_values
+                filter_info['client_data'] = client_values
+                filter_info['client_data_values'] = num_client_values
 
                 filters.append(filter_info)
         else:

@bmaranville (Collaborator)

Your new test passes after the diff above (without the other change in ca1b6d6)

@kalvdans commented Jan 7, 2025

+                filter_info['client_data'] = client_values
+                filter_info['client_data_values'] = num_client_values

Storing the number of values in a separate key is redundant, as it could be extracted with len(filter_info['client_data']).

@bmaranville (Collaborator)

Oh, I agree it's redundant; I was just trying to keep the dict consistent between v1 and v2 pipelines. (In v1 you have to have that key because it's part of the struct message parsed directly from disk, while in v2 we're creating the dict manually.)
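
For context, a minimal hypothetical sketch of why the key is unavoidable in v1: the count is a field of the on-disk struct (the HDF5 v1 filter description is filter id, name length, flags, then the number of client data values, each two bytes). This is an illustration, not pyfive's actual code:

import struct

def parse_v1_filter_header(buf, offset):
    # All four fields are unsigned 16-bit little-endian integers, so the
    # count of client data values arrives as part of the parsed struct.
    filter_id, name_length, flags, num_values = struct.unpack_from('<4H', buf, offset)
    return {
        'filter_id': filter_id,
        'name_length': name_length,
        'flags': flags,
        'client_data_values': num_values,  # read straight from disk
    }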

@bnlawrence (Author)

Yes, it works for me too! What's the best way forward? Do you want me to fix this in my pull request and go from there?
