single and double floating-point numbers are confused #98

Closed
zjin-lcf opened this issue May 12, 2021 · 10 comments · Fixed by #100

@zjin-lcf

~/BabelStream/build$ ./sycl-stream --float --device 2
BabelStream
Version: 3.4
Implementation: SYCL
Running kernels 100 times
Precision: float
Array size: 134.2 MB (=0.1 GB)
Total size: 402.7 MB (=0.4 GB)
Using SYCL device Intel(R) Iris(R) Xe Graphics
terminate called after throwing an instance of 'cl::sycl::compile_program_error'
what(): The program was built for 1 devices
Build program log for 'Intel(R) Iris(R) Xe Graphics':

error :double type is not supported on this platform
in kernel: 'typeinfo name for sycl_kernels::copy'
error: backend compiler failed build.

Thanks

@tomdeakin
Contributor

Thanks for reporting this. I had forgotten that support for FP64 on the device is optional in SYCL just as it is in OpenCL. In the OpenCL code, we have to check that the double version will work.

I've added a SYCL 1.2.1 solution to the current version of the code in #100 in the issue-98 branch. Can you check if this works on your device please?

SYCL 2020 makes this check a bit nicer, so I'll update #77 with a similar check ready for when we merge that in.

tomdeakin added a commit that referenced this issue May 17, 2021
This will resolve #98 in the future SYCL 2020 version.
@tomdeakin tomdeakin added the bug label May 17, 2021
@tomdeakin tomdeakin self-assigned this May 17, 2021
@zjin-lcf
Author

Sorry, the issue may not really be fixed.

./sycl-stream --float --device device_id
error :double type is not supported on this platform
in kernel: 'typeinfo name for sycl_kernels::init'
error: backend compiler failed build.

error :double type is not supported on this platform
in kernel: 'typeinfo name for sycl_kernels::init'
error: backend compiler failed build.

error :double type is not supported on this platform
in kernel: 'typeinfo name for sycl_kernels::init'
error: backend compiler failed build.
-11 (CL_BUILD_PROGRAM_FAILURE)
Aborted (core dumped)

@zjin-lcf
Author

Sorry, I didn't check it before you closed the issue.

@tomdeakin
Contributor

It auto-closed when I merged the PR, sorry!

@tomdeakin tomdeakin reopened this May 18, 2021
@tomdeakin
Contributor

Thanks for testing @zjin-lcf. I think the problem is now that both float and double have template instantiations (the final two lines of SYCLStream.cpp), so both are being passed to the SYCL compiler at runtime. The compiler must be running before this code selects the device and exits if the feature is not supported.

I need to double check the SYCL specifications to find out if there is anything we can do in application code at compile time; but I suspect not.
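To illustrate why the `--float` flag alone does not help, here is a minimal plain C++ analogy (not SYCL code, and not taken from SYCLStream.cpp). The names `Stream`, `TypeName`, `kernels_in_binary`, `kernel_variant` and `kernel_variant_count` are hypothetical stand-ins. The sketch only shows that explicitly instantiating a template for both `float` and `double` puts both variants into the program, whichever one is selected at runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for "the set of kernels handed to the device compiler".
inline std::vector<std::string>& kernels_in_binary() {
  static std::vector<std::string> names;
  return names;
}

template <typename T> struct TypeName;
template <> struct TypeName<float>  { static const char* get() { return "float"; } };
template <> struct TypeName<double> { static const char* get() { return "double"; } };

// Simplified stand-in for a stream class templated on the element type.
template <typename T>
struct Stream {
  // Initialised at program start-up for every T that gets instantiated.
  static const bool registered;
};

template <typename T>
const bool Stream<T>::registered =
    (kernels_in_binary().push_back(TypeName<T>::get()), true);

// Explicit instantiations for both precisions, mirroring the pattern described above.
template struct Stream<float>;
template struct Stream<double>;

// Number of variants present, independent of any runtime precision flag.
inline std::size_t kernel_variant_count() { return kernels_in_binary().size(); }

// Name of the i-th registered variant (registration order is unspecified).
inline const std::string& kernel_variant(std::size_t i) {
  assert(i < kernels_in_binary().size());
  return kernels_in_binary()[i];
}
```

Calling `kernel_variant_count()` from `main` reports both variants even if the program only ever uses `Stream<float>`, which is the same shape as the double kernels reaching the device compiler in a `--float` run.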

@tomdeakin
Contributor

@tom91136 also suggested that you could try to enable FP64 emulation on the DevCloud Xe GPUs with the following environment variables:

export OverrideDefaultFP64Settings=1 
export IGC_EnableDPEmulation=1

This should silence the compiler issues and run the float code. The double will probably run, just in emulated mode.

@zjin-lcf
Author

zjin-lcf commented May 18, 2021

Thank you for your suggestion. Running the program shows the message:

Validation failed on sum. Error 3.64297e-06

It seems that the option --arraysize does not change the number of array elements.

Could you reproduce that?

Thanks

@tomdeakin
Contributor

Issue #20 summarises the problems with the dot product kernel and single precision. There isn't a good solution that we've found.
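As a rough illustration of why a single-precision sum can miss a tight bound, here is a minimal plain C++ sketch. It is an assumption-laden toy, not the SYCL dot kernel and not a summary of issue #20: the function name `naive_dot_relative_error` and the constant inputs 0.1 and 0.2 are chosen only for the example, and the strictly sequential accumulation exaggerates the effect compared with a parallel reduction on a GPU.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Relative error of a naive running-sum dot product of two constant arrays,
// accumulated in type T and compared against a closed-form double reference.
template <typename T>
double naive_dot_relative_error(std::size_t n) {
  assert(n > 0);
  const T a = static_cast<T>(0.1);
  const T b = static_cast<T>(0.2);
  T sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += a * b;  // rounding error accumulates in T on every addition
  }
  // Reference value computed in double from the same (already rounded) inputs.
  const double gold = static_cast<double>(a) * static_cast<double>(b) *
                      static_cast<double>(n);
  return std::fabs(static_cast<double>(sum) - gold) / gold;
}
```

With about a million elements, `naive_dot_relative_error<float>` lands well above 1.0E-8 while `naive_dot_relative_error<double>` stays below it, so a bound that suits double precision is not automatically reachable in single precision.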

@tomdeakin tomdeakin reopened this May 19, 2021
@zjin-lcf
Author

Okay. I assume that the error bounds are different for single precision and double precision. People may just care about bandwidth for the benchmark, though.

@tomdeakin
Contributor

Same error bounds (1.0E-8), but that might be a good way to account for the difference. I agree it's off-putting to have an error about correct values. I'll make a note in #20 with this suggestion. Thanks!
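One way the precision-dependent bound suggested here could look is sketched below in plain C++. This is only an assumption about a possible design, not what BabelStream does: `validation_tolerance` and `sum_within_tolerance` are hypothetical names, the factor of 100 times machine epsilon is arbitrary, and the reported error is assumed to be a relative error.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical tolerance: a fixed multiple of the machine epsilon of T.
// The factor 100 is an arbitrary choice for this sketch.
template <typename T>
constexpr double validation_tolerance() {
  return 100.0 * static_cast<double>(std::numeric_limits<T>::epsilon());
}

// Returns true when the relative error of the computed sum is within
// the tolerance chosen for precision T.
template <typename T>
bool sum_within_tolerance(double computed, double expected) {
  assert(expected != 0.0);
  const double rel = std::fabs((computed - expected) / expected);
  return rel < validation_tolerance<T>();
}
```

Under these assumptions a relative error of 3.64297e-06 would pass for `float` (tolerance about 1.2e-05) but still fail for `double`, whereas the single fixed bound of 1.0E-8 rejects it for both.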
