Signal 11 error #294

Open
nataneb opened this issue Oct 23, 2016 · 5 comments

Comments

@nataneb

nataneb commented Oct 23, 2016

Hello,
I get this error when I try to run my Grappa program:

Graph memory breakdown:
locale_heap_size: 0.133796 GB
global_heap_size: 0.0735908 GB
graph_total_size: 0.207387 GB
Exiting due to signal 11 with siginfo 0x400149f6f270 and payload 0x400149f6f140
srun: error: n25: task 0: Exited with exit code 1

I'm also using your GraphLab implementation and running the program on the sampa server.
What can I do to solve the problem?
Please, let me know if you need more information.

Thanks

@bmyerz
Member

bmyerz commented Oct 24, 2016

A backtrace will provide more information.

See https://github.com/uwsampa/grappa/blob/master/doc/debugging.md

@nelsonje
Member

Natalia, since you're running on our cluster you should just send me an email with a pointer to the code that's failing, and I'll take a look when I get a chance.

@metolent

I'm sitting with Natalia looking at the issue. We disassembled her binary to look at the assembly at the location of the segfault and it wasn't clear precisely where it was occurring. We observed a number of calls to Boost library functions before and after the faulting address, but weren't able to track them down as the addresses were not included within the binary. The sizes of the vertices used increased, but not by more than 2KB.

I believe we would have to rebuild Grappa in debug mode to get a backtrace, right? We don't own the cluster, so I believe that would be problematic.

Is there another debug mechanism you could propose that would enable us to glean some insight from the segfault?

@nelsonje
Member

You're running on our cluster, and Natalia has sent me the details on the segfaulting binary, so it's easy for me to take a look at the backtrace myself; I just haven't had a chance yet.

In fact the backtrace from the non-debug binary is often useful in debugging these sorts of problems (we compile the optimized binary with debugging symbols too), but it's hard to make sense of it without understanding the guts of Grappa. When the backtrace just shows addresses instead of code, it usually means the problem actually occurred before the segfault happened, but it corrupted some scheduler data structure and screwed up the stack. The backtrace won't be helpful in this case.

I'll get back to you two as soon as I've found a moment to take a look at the code.

@nataneb
Author

nataneb commented Oct 27, 2016

We tried some experiments. Since it's a label propagation algorithm, we have an array attached to each vertex. We changed the array's size and found that once it is big enough, we get a segmentation fault. For example, the program works with arrays of size 62 and fails with arrays of size 75 or more. We also tried another, similar program, and it fails too. This might give you some idea about the failure.
