Optimize memory access #67

Draft
wants to merge 63 commits into
base: main

sfiligoi
Collaborator

The weighted method benefits drastically from a transposed access pattern in CPU mode, due to the shorter vector length and finer-grained logic.
Also optimized the CPU vs GPU sizes a bit.
Newer NVCC also seems to optimize better without an explicit vector_size in OpenACC.
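
To illustrate what a transposed access pattern can mean for a weighted sum like this, here is a minimal sketch. It is not the code from this PR: the buffer layout (`emb[e * n_samples + k]`), the function names and the `offset` parameter are all assumptions made for the example. Both functions compute the same result; the second keeps the inner loop walking along a contiguous row, which is the kind of access that short CPU SIMD vectors usually handle well.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical layout (not taken from the PR): emb[e * n_samples + k] is the
// value of feature e for sample k; lengths[e] is the weight of feature e.
// Both functions compute
//   out[k] = sum_e lengths[e] * |emb[e][k] - emb[e][(k + offset) % n_samples]|

// Sample-outer order: the inner loop jumps through memory in steps of n_samples.
std::vector<double> weighted_sample_outer(const std::vector<double>& emb,
                                          const std::vector<double>& lengths,
                                          std::size_t n_samples,
                                          std::size_t offset) {
  const std::size_t n_features = lengths.size();
  assert(emb.size() == n_features * n_samples);
  std::vector<double> out(n_samples, 0.0);
  for (std::size_t k = 0; k < n_samples; ++k) {
    const std::size_t l = (k + offset) % n_samples;
    double sum = 0.0;
    for (std::size_t e = 0; e < n_features; ++e) {
      const double* row = emb.data() + e * n_samples;
      sum += lengths[e] * std::fabs(row[k] - row[l]);
    }
    out[k] = sum;
  }
  return out;
}

// Feature-outer (transposed) order: the inner loop reads row[k] with unit
// stride and accumulates into out[k], also with unit stride.
std::vector<double> weighted_feature_outer(const std::vector<double>& emb,
                                           const std::vector<double>& lengths,
                                           std::size_t n_samples,
                                           std::size_t offset) {
  const std::size_t n_features = lengths.size();
  assert(emb.size() == n_features * n_samples);
  std::vector<double> out(n_samples, 0.0);
  for (std::size_t e = 0; e < n_features; ++e) {
    const double* row = emb.data() + e * n_samples;
    const double len = lengths[e];
    for (std::size_t k = 0; k < n_samples; ++k) {
      const std::size_t l = (k + offset) % n_samples;
      out[k] += len * std::fabs(row[k] - row[l]);
    }
  }
  return out;
}
```

Which order is faster depends on the hardware and the compiler; the sketch only shows that the two loop orders are interchangeable with respect to the result.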

sfiligoi and others added 30 commits January 23, 2025 14:38
@sfiligoi sfiligoi marked this pull request as draft February 12, 2025 02:39
@sfiligoi
Collaborator Author

sfiligoi commented Feb 12, 2025

Some benchmark numbers:
On an 8-core AMD Ryzen 9 7940HS (16 threads, using AVX512)
EMP weighted normalized times went from 822s to 273s (vs 682s in v1.4)
EMP unweighted times went from 155s to 139s (vs 190s in v1.4)

On an NVIDIA RTX4060 GPU
EMP weighted normalized times went from 247s to 173s
EMP unweighted times went from 41s to 37s

@sfiligoi
Collaborator Author

On an Apple M2 Pro CPU (12 threads, ARM)
EMP weighted normalized times went from 690s to 331s
EMP unweighted times went from 164s to 161s

@sfiligoi
Collaborator Author

sfiligoi commented Feb 15, 2025

On barnacle2 b2-006 node, which has 2x AMD EPYC 7302 CPUs

A single 16-core AMD EPYC 7302 CPU (32 threads, AVX2)
EMP weighted normalized times went from 602s to 200s (vs 507s in v1.4)
EMP unweighted times went from 120s to 112s (vs 147s in v1.4)

Using both 16-core AMD EPYC 7302 CPUs (64 threads total, AVX2)
EMP weighted normalized times went from 341s to 116s (vs 291s in v1.4)
EMP unweighted times went from 65s to 63s (vs 80s in v1.4)
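
For readers who want to compare the timings in this thread as ratios, a small helper along these lines may be convenient. The function name `speedup` is made up for this example and is not part of the project.

```cpp
#include <cassert>

// Speedup factor: how many times faster the new run is than the old one.
// Both arguments are wall-clock times in seconds and must be positive.
double speedup(double before_s, double after_s) {
  assert(before_s > 0.0 && after_s > 0.0);
  return before_s / after_s;
}
```

For instance, `speedup(822, 273)` gives roughly 3.0 for the weighted normalized run on the Ryzen 9 7940HS, while `speedup(155, 139)` gives roughly 1.1 for the unweighted run on the same CPU.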
