[FEA] Better error reporting #858
Comments
Hi @jwmelto, we need to update that location to provide the error number/string. We will add a fix for that soon. In the meantime, it looks like your code is using a CUDA executor but trying to pass host memory.
I will try to reproduce the error on our side.
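For reference, a minimal sketch of the pattern this describes, assuming MatX's pointer-initialized make_tensor overload and the default CUDA executor (the names here are illustrative, not taken from the issue):

```cpp
#include <matx.h>
#include <vector>

int main() {
  using Complex = cuda::std::complex<float>;
  constexpr matx::index_t N = 1024;

  std::vector<Complex> data(N);  // plain pageable host memory

  // Wrap the host buffer in a tensor view (no copy, no allocation)
  auto in  = matx::make_tensor<Complex>(data.data(), {N});
  auto out = matx::make_tensor<Complex>({N});  // managed memory by default

  // The FFT executes on the GPU (CUDA executor), but `in` points at host
  // memory the kernel cannot read, so execution fails at runtime.
  (out = matx::fft(in)).run();
  return 0;
}
```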
Thanks. It isn't quite clear what "magic" is handled by the MatX tensors and what I need to handle. In particular, the "locality" of memory is a complete mystery. What I had before seemed grossly less efficient (but may work) (updated for clarity):

```cpp
auto dataT = matx::make_tensor<Complex>({ N });
for (auto idx = 0; idx < data.size(); ++idx)
  dataT(idx) = data[idx];
```

Does this construct mystically handle the data movement from host to device?
I agree, and this is something we need to improve. Ideally the user would never need to worry about where the memory is located and we would handle that for them. That's exactly what managed memory does for you (it's the default), but that doesn't help if you have existing code using host containers like std::vector.

When you do a straight assignment of A = B, MatX could likely detect the memory types and do a cudaMemcpy behind the scenes, including for vectors. This would make it so the user doesn't have to do it themselves. However, this would not work if you wrote a larger expression (for example, A = B + C). Does this all make sense?

By the way, I added better cuFFT error reporting here: #860. For your example, there is now a proper cuFFT error message.
In this case cuFFT only knows that its kernel crashed. For more information you can run compute-sanitizer (for example, compute-sanitizer ./your_app) to localize the failing access.
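To illustrate the managed-memory point above, a minimal sketch: because make_tensor allocates CUDA managed memory by default, host-side element writes followed by a GPU FFT are both legal, with the driver migrating pages as needed. Complex, N, and data are assumed from the earlier snippet:

```cpp
auto dataT = matx::make_tensor<Complex>({N});   // managed memory (the default)
for (size_t idx = 0; idx < data.size(); ++idx)
  dataT(idx) = data[idx];                       // host write to managed memory: OK

auto freqT = matx::make_tensor<Complex>({N});
(freqT = matx::fft(dataT)).run();               // GPU read of managed memory: OK
cudaStreamSynchronize(0);                       // wait before reading results on host
```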
Hi @jwmelto, after talking internally about this, doing memory type checks at runtime can be expensive and it's likely better to leave this to the user. This is similar to what Python requires, where you have to explicitly call a conversion function to move data between host and device.
My loop was using a "normal" tensor, not the pointer-initialized version. That does work, and I'm moving forward. As I contemplated reasons for the FFT to fail (when elsewhere in my code it did not), the memory mismatch did come to mind, but I was looking for a little more confirmation, like an "invalid parameter" error.

For a performance-sensitive operation, do you recommend explicitly doing the cudaMemcpy yourself?
You definitely don't want to copy each item individually if you can avoid it. A single cudaMemcpy (or cudaMemcpyAsync) of the whole buffer will be far faster.
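A sketch of that bulk-copy approach, assuming MatX's MATX_DEVICE_MEMORY space tag and the tensor's Data() accessor (check your MatX version for the exact names); Complex, N, and data are assumed from the earlier snippet:

```cpp
cudaStream_t stream;
cudaStreamCreate(&stream);

// Allocate the tensor directly in device memory and move the whole
// vector in one transfer instead of N per-element writes.
auto devT  = matx::make_tensor<Complex>({N}, matx::MATX_DEVICE_MEMORY);
auto freqT = matx::make_tensor<Complex>({N});
cudaMemcpyAsync(devT.Data(), data.data(), N * sizeof(Complex),
                cudaMemcpyHostToDevice, stream);

(freqT = matx::fft(devT)).run(stream);  // FFT runs on the same stream as the copy
cudaStreamSynchronize(stream);
cudaStreamDestroy(stream);
```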
Thanks. It's what I would have expected, but I've been wrong before.
One last thing on your example: using a std::vector to back a tensor is technically undefined anyway. The reason is that if your vector grows, the standard library may reallocate the data in a completely different memory location, copy it, and free the old pointer. At that point the MatX tensor would point to invalid memory and could crash.
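A short illustration of that hazard (hypothetical code, not from the issue):

```cpp
std::vector<Complex> vec(N);
auto t = matx::make_tensor<Complex>(vec.data(), {N}); // t views vec's buffer

vec.push_back(Complex{}); // may reallocate: the old buffer is copied and freed
// t still points at the old buffer; any use of t is now undefined behavior
```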
If you see my last comment in the "Discussion", I was looking for a more "natural" container-assignment idiom. I found the pointer-initialized tensor constructor and thought that tensor-to-tensor assignment might invoke the movement to device. I'm OK with doing the explicit declaration and copies, but learning the MatX idioms is slow going.
I added a small section to the documentation; hopefully it will clarify things, since I think you have a very common use case. If you've read through the quickstart guide and the "external libraries" section and things still aren't making sense, please let us know. We're always looking to add more docs.
That looks good. I'll try to create discrete issues for each suggestion. Thanks for your help.
Is your feature request related to a problem? Please describe.
I ran an FFT and got an obtuse exception, with no detail about the cause.
This occurs at fft/fft_cuda.h:437 in v0.9.0
The exception is raised from the MATX_ASSERT macro, which lacks fidelity to describe the failure condition.

Describe the solution you'd like
The actual error code is the minimum requirement for problem resolution; the error string would be helpful. Something like the ubiquitous cudaCheck macro (sketched below).
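A minimal sketch of such a macro, using the standard CUDA runtime error API (this is not MatX's actual implementation):

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Evaluate a CUDA runtime call and report code, string, file, and line on failure
#define cudaCheck(call)                                                     \
  do {                                                                      \
    cudaError_t err_ = (call);                                              \
    if (err_ != cudaSuccess) {                                              \
      std::fprintf(stderr, "CUDA error %d (%s) at %s:%d\n",                 \
                   static_cast<int>(err_), cudaGetErrorString(err_),        \
                   __FILE__, __LINE__);                                     \
      std::exit(EXIT_FAILURE);                                              \
    }                                                                       \
  } while (0)

// Usage: cudaCheck(cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice));
```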
Describe alternatives you've considered
I'm currently at a loss as to how to get the error code out without modifying the MatX code directly.
Additional context
I can guess at the issue; I created a tensor over host memory and passed that into the fft. I surmise that it's a host/device memory issue.
I took a look at v01.1.1, but my project had numerous compilation errors with this version. Further investigation is ongoing.