Error processing SEQNAME_ARRAY with h5py #295

Open
bjeight opened this issue Feb 15, 2024 · 1 comment

bjeight commented Feb 15, 2024

For one reason or another, I want to parse the HAL file output by Cactus in Python.

Using h5py like this:

import h5py

f = h5py.File('my.hal', 'r')
print(f['Anc00']['SEQNAME_ARRAY'])

gives:

...
  File "h5py/h5t.pyx", line 435, in h5py.h5t.TypeID.dtype.__get__
  File "h5py/h5t.pyx", line 951, in h5py.h5t.TypeIntegerID.py_dtype
TypeError: data type '<i15' not understood

Using h5dump to look at the header:

❯ h5dump -H my.hal | grep -A50 "Anc00" | grep -A2 "SEQNAME_ARRAY"
      DATASET "SEQNAME_ARRAY" {
         DATATYPE  120-bit little-endian integer 8-bit precision
         DATASPACE  SIMPLE { ( 32 ) / ( 32 ) }

Doing the same for another genome (Pfa) gives:

TypeError: data type '<i14' not understood

and:

❯ h5dump -H my.hal | grep -A50 "Pfa" | grep -A2 "SEQNAME_ARRAY"
      DATASET "SEQNAME_ARRAY" {
         DATATYPE  112-bit little-endian integer 8-bit precision
         DATASPACE  SIMPLE { ( 16 ) / ( 16 ) }

I am out of my depth here, but 8 * 15 = 120 and 8 * 14 = 112, so it looks as if the Python library is treating this field as 8 lots of 15-bit (or 14-bit) integers instead of the other way around, i.e. 15 (or 14) lots of 8-bit integers.
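Reading the two error messages next to the h5dump output suggests a slightly different picture: in numpy's type codes the number after `i` is an item size in bytes, not bits, so `'<i15'` asks for a single 15-byte (120-bit) little-endian integer per element. That is exactly the `120-bit` size h5dump reports, with only 8 of those bits marked as significant ("8-bit precision"). numpy only provides 1-, 2-, 4- and 8-byte integers, so it cannot build that dtype. A minimal pure-Python sketch of the arithmetic; the helper names are hypothetical and the regexes assume the exact h5dump wording shown above:

```python
import re

# Integer item sizes (in bytes) numpy provides: int8, int16, int32, int64.
NUMPY_INT_BYTES = {1, 2, 4, 8}


def parse_numpy_int_code(code: str) -> int:
    """Return the item size in BYTES of a numpy-style integer code like '<i15'."""
    m = re.fullmatch(r"[<>=|]?i(\d+)", code)
    if m is None:
        raise ValueError(f"not an integer type code: {code!r}")
    return int(m.group(1))


def parse_h5dump_int_type(line: str) -> tuple[int, int]:
    """Return (size_bits, precision_bits) from an h5dump DATATYPE line."""
    m = re.search(r"(\d+)-bit (?:little|big)-endian integer (\d+)-bit precision", line)
    if m is None:
        raise ValueError(f"unrecognised DATATYPE line: {line!r}")
    return int(m.group(1)), int(m.group(2))


# The two genomes from this issue: Anc00 and Pfa.
for code, dump in [
    ("<i15", "DATATYPE  120-bit little-endian integer 8-bit precision"),
    ("<i14", "DATATYPE  112-bit little-endian integer 8-bit precision"),
]:
    nbytes = parse_numpy_int_code(code)
    size_bits, precision = parse_h5dump_int_type(dump)
    print(f"{code}: {nbytes} bytes = {nbytes * 8} bits "
          f"(h5dump: {size_bits}-bit, {precision}-bit precision); "
          f"numpy has this width: {nbytes in NUMPY_INT_BYTES}")
```

In both cases the byte count times 8 matches h5dump's bit count, and neither width is one numpy supports.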

Or maybe there's simply no appropriate numpy type for integers of these unusual, genome-dependent widths?
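If the missing numpy type is the problem, one possible workaround is to avoid asking h5py for a dtype at all: open the dataset with the low-level `h5py.h5d.open`, take its on-disk type from `get_type()`, and read with that same type as the memory type so HDF5 copies the raw bytes into a `uint8` buffer without conversion. This is a sketch under a loud assumption that nothing in this thread confirms: that each element is one sequence name padded with NUL bytes to the element width (the 8-bit precision hints at characters, but that is a guess). `read_seqnames` and `decode_names` are hypothetical helpers, not part of h5py or HAL:

```python
def decode_names(raw: bytes, width: int) -> list[str]:
    """Split a buffer of fixed-width records into strings.

    ASSUMPTION: each record holds one name padded with NUL bytes.
    """
    if width <= 0 or len(raw) % width != 0:
        raise ValueError("buffer length must be a positive multiple of width")
    names = []
    for start in range(0, len(raw), width):
        record = raw[start:start + width]
        # Keep everything before the first NUL; an all-NUL record gives "".
        names.append(record.split(b"\0", 1)[0].decode("utf-8"))
    return names


def read_seqnames(path: str, genome: str) -> list[str]:
    """Read <genome>/SEQNAME_ARRAY via h5py's low-level API (hypothetical helper)."""
    # Imported lazily so decode_names() works without h5py/numpy installed.
    import h5py
    import numpy as np

    with h5py.File(path, "r") as f:
        # Open the dataset directly so no numpy dtype is ever requested.
        dsid = h5py.h5d.open(f[genome].id, b"SEQNAME_ARRAY")
        ftype = dsid.get_type()      # the 120-/112-bit integer type h5dump shows
        width = ftype.get_size()     # element size in bytes (15 for Anc00)
        (count,) = dsid.shape        # 1-D dataspace (32 elements for Anc00)
        buf = np.zeros((count, width), dtype=np.uint8)
        # Using the file type as the memory type means HDF5 performs no
        # integer conversion; it just copies each element's bytes.
        dsid.read(h5py.h5s.ALL, h5py.h5s.ALL, buf, ftype)
    return decode_names(buf.tobytes(), width)
```

For the file in this issue the call would be `read_seqnames("my.hal", "Anc00")`. If the decoded names look wrong, the NUL-padded-string assumption is the first thing to revisit.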

Thanks very much for your time.

@glennhickey
Collaborator

Re-implementing the HAL API in Python is probably not something you want to undertake.
