Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugin issue expected LP_struct_BFarray_ instance instead of struct_BFarray_ #1

Open
telegraphic opened this issue Dec 15, 2021 · 6 comments

Comments

@telegraphic
Copy link
Collaborator

(note: this is 'solved', but I'm writing up here for prosperity and discussion)

@LDRyan0 and I have been working on getting the Bifrost TCC plugin working on the topaz supercomputer.

To get it to compile in py 3.6, with the autoconf branch required some makefile edits (we will PR these once we've tidied it up).

Once compiled, we ran into an 'interesting' error when running it:

ctypes.ArgumentError: argument 2: <class 'TypeError'>: 
expected LP_struct_BFarray_ instance instead of struct_BFarray_

This turns out to be due to the definition of class struct_BFarray_(Structure) in btcc_generated.py, which even though it looks identical the definition of class struct_BFarray_ in bifrost.libbifrost_generated, is being considered a different class by ctypes.

Our fix was straightforward in the end: remove this from btcc_generated.py:

# ../src/bifrost/array.h: 128
class struct_BFarray_(Structure):
    pass

struct_BFarray_.__slots__ = [
    'data',
    'space',
    'dtype',
    'ndim',
    'shape',
    'strides',
    'immutable',
    'big_endian',
    'conjugated',
]
struct_BFarray_._fields_ = [
    ('data', POINTER(None)),
    ('space', BFspace),
    ('dtype', BFdtype),
    ('ndim', c_int),
    ('shape', c_long * int(BF_MAX_DIMS)),
    ('strides', c_long * int(BF_MAX_DIMS)),
    ('immutable', BFbool),
    ('big_endian', BFbool),
    ('conjugated', BFbool),
]

BFarray = struct_BFarray_# ../src/bifrost/array.h: 128

and replace with:

from bifrost.libbifrost_generated import struct_BFarray_
BFarray = struct_BFarray_

@jaycedowell: I'm assuming you didn't have to do this?

Background debug info

For some more debug information, the error was being raised here in btcc_genereatd.py:

    BTccExecute = _lib.BTccExecute
    # print("HERE", type(BFarray), type(POINTER(BFarray)))
    BTccExecute.argtypes = [btcc, POINTER(BFarray), POINTER(BFarray), BFbool]
    BTccExecute.restype = BFstatus

The LP_ prefix in LP_struct_BFarray_ is ctypes telling us it is a pointer to the struct_BFarray,and we are indeed passing POINTER(BFarray) to BTccExecute.argtypes. For example, changing the code to POINTER(POINTER(BFarray)) will result in LP_LP_struct_BFarray_.

The expected LP_struct_BFarray_ instance instead of struct_BFarray_ error message is misleading here, removing the POINTER() and running the code will result in a segfault (we do want to pass a pointer after all). The cryptic hint that helped me solve it was when after some trial and error the traceback said:

expected struct_BFarray_ instance instead of struct_BFarray_

!

@jaycedowell
Copy link
Collaborator

I did not but I've also only tested it under Python2.Dinosaur so this is useful feedback. It will also help with the development of plugin-wrapper since BTCC is based on that.

@jaycedowell
Copy link
Collaborator

Actually, could you transfer this issue over to BTCC so I can keep track of it?

@jaycedowell jaycedowell transferred this issue from lwa-project/bifrost Dec 15, 2021
@jaycedowell
Copy link
Collaborator

That seems to have worked.

@telegraphic
Copy link
Collaborator Author

Yup, looks like it! (I can't see the transfer button so guessing I don't have right permissions)

@jaycedowell
Copy link
Collaborator

I started looking at this today since I'm working on moving ADP over to Py3 but I'm just not hitting this problem. The strange thing is that my btcc_generated.py doesn't have any reference to a # ../src/bifrost/array.h: 128 in it or any of those struct_BFarray_ declarations. What does your call to make_bifrost_plugin.py look like?

@telegraphic
Copy link
Collaborator Author

In btcc_generated.py, the header is:

'''Wrapper for btcc.h

Generated with:
-c -lbtcc -I. -Igroup/director2183/dancpr/src/bifrost_tcc_wrapper/src btcc.h -o btcc_generated.py

Do not modify this file.
'''

libbtcc was generated with:

nvcc -Xcompiler=-Wall,-Winvalid-pch,-Wnon-virtual-dtor -g -G -g -Xcompiler=-fPIC -I/pawsey/centos7.6/devel/binary/cuda/11.1/include -I/group/director2183/dancpr/software/centos7.6/apps/cascadelake/gcc/8.3.0/bifrost/0.0.2/include -I/group/director2183/dancpr/src/tensor-core-correlator/libtc    c -I/group/director2183/dancpr/src/tensor-core-correlator/libtcc -I/group/director2183/dancpr/src/tensor-core-correlator/util -I/group/director2183/dancpr/src/tensor-core-correlator/util -I/group/director2183/dancpr/src/tensor-core-correlator -I/group/director2183/dancpr/src/tensor-core-correlator -I.. -I    . -Ilibbtcc.so.0.0.2.p -Ilibbtcc.so.0.0.2.p -o libbtcc.so.0.0.2.p/btcc.cu.o -c ../btcc.cu

It is a pain to install things on the supercomputer, so my most recent setup is using the meson build system (which I quite like). bifrost itself is built using a strange Pawsey-specific build system called MAALI. However I recall hitting this error when compiling on a more regular GPU system. I'll try and find a minimal example to recreate

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants