
Global load #25

Open
karloballa opened this issue Aug 23, 2019 · 2 comments

Comments

@karloballa

I want to optimize memory loads and tried several different instructions to see how it goes. So far, however, I have only had success with flat_load.

I would like to compile a kernel for RX 4xx/5xx, but if I use global_load_dword** I get "unknown instruction". I suppose there are some flags for that...

I also have a question about "buffer resources". How do I access them, and how do I use them? I tried several kernel setups but had no luck. Is there any table that explains what goes where? Who decides which information goes into which register (the driver?)? Information on the net is sparse and much of it is contradictory...

Thanks in advance

@matszpk
Member

matszpk commented Aug 23, 2019

The GLOBAL_* and SCRATCH_* instructions were introduced with the RX Vega GPU and are not available on Fiji/Polaris GPUs. Buffer resources were used by the old OpenCL implementation and on the first GCN GPU generation (Tahiti, Pitcairn, HD 7xxx). For the newer OpenCL drivers and newer GPUs, the FLAT_* instructions are the recommended way to access memory.
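To illustrate the difference, here is a minimal, untested sketch in CLRX-style GCN assembly (the register numbers are illustrative assumptions, not taken from any real kernel): the FLAT form takes a 64-bit address in a VGPR pair, while the BUFFER form goes through a 128-bit resource descriptor held in four consecutive SGPRs.

```asm
# FLAT addressing (recommended on current drivers):
# the full 64-bit address lives in a VGPR pair.
flat_load_dword v2, v[0:1]         # load dword from address in v[0:1]
s_waitcnt vmcnt(0) & lgkmcnt(0)    # wait for the load before using v2

# BUFFER addressing (old OpenCL implementation / first GCN generation):
# s[4:7] is assumed to hold a 128-bit buffer resource descriptor
# (base address, num_records, format flags); v0 is a byte offset.
buffer_load_dword v2, v0, s[4:7], 0 offen
s_waitcnt vmcnt(0)
```

The practical consequence is the one discussed above: on Fiji/Polaris the address arithmetic for FLAT loads costs extra VALU instructions to build the 64-bit address, whereas a buffer load can offload the base address into the SGPR descriptor, but the new drivers no longer set such descriptors up for you.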

@karloballa
Author

Thanks for the explanation.

It seems I misinterpreted the GCN generations. I thought that Polaris was GCN 1.4 and Vega GCN 1.5.

As for buffer_load, it looked to me as if I could save a few instructions with it. Now I understand why the compiler forces flat_load.
