Adding cuda device synchronizes frequently and peeking at cuda errors #59

Open
wants to merge 1 commit into
base: develop
from

Conversation

koparasy
Member

No description provided.

@ggeorgakoudis
Member

Can you explain more why we are doing this? Is this to catch errors that may have happened in other modules that do not explicitly check the return value of cuda calls?

@koparasy
Member Author

> Can you explain more why we are doing this? Is this to catch errors that may have happened in other modules that do not explicitly check the return value of cuda calls?

To be clear, I will not merge this yet, as it is too brute force. Briefly, though: any CUDA call made directly by AMS already checks the error code. However, we have no control over the copies performed by Umpire or over invocations that happen in other modules (torch/FAISS), so errors raised there can go unnoticed. This is more of a PR/commit to be cherry-picked.
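For readers unfamiliar with the pattern, here is a minimal sketch of what "synchronize, then peek at the error" can look like. It is not the code from this PR: the helper name `syncAndPeek` and its `where` label are hypothetical, and only the CUDA runtime calls (`cudaDeviceSynchronize`, `cudaPeekAtLastError`, `cudaGetErrorString`) are real API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (an assumption, not part of AMS): surface errors left
// behind by code that does not check its own CUDA return values.
static cudaError_t syncAndPeek(const char* where) {
  // Block the host until all previously issued device work has finished.
  // Asynchronous failures (e.g. from a kernel launched by another library)
  // are reported here.
  cudaError_t syncErr = cudaDeviceSynchronize();

  // Read the last error recorded by the runtime without resetting it, so
  // later checks elsewhere can still observe it.
  cudaError_t lastErr = cudaPeekAtLastError();

  // Prefer the synchronization error if there is one.
  cudaError_t err = (syncErr != cudaSuccess) ? syncErr : lastErr;
  if (err != cudaSuccess) {
    std::fprintf(stderr, "[%s] CUDA error: %s\n", where,
                 cudaGetErrorString(err));
  }
  return err;
}
```

A call such as `syncAndPeek("after FAISS search")` would go right after an invocation into a module whose CUDA calls AMS cannot check. Because `cudaDeviceSynchronize` stalls the host until the device is idle, sprinkling it everywhere serializes otherwise asynchronous work, which is presumably why the approach is described above as brute force.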
