Program received signal SIGSEGV: Segmentation fault - invalid memory reference. #9

asterismo · 2018-05-10T00:05:34Z

I changed NMAX to 20000 objects and recompiled. When I ran the integrator, I got a segfault.

I pasted the output in the Los Molinos Observatory pastebin:

https://pastebin.oalm.gub.uy/view/94773e01

The error is at the bottom of the paste.

texadactyl · 2018-05-10T17:17:46Z

I cannot reproduce the reported segmentation fault, but I have fixed some bugs and gotten rid of the warnings. You can find my work under the issue "donationware".

I am using Xubuntu 17.10 on a Celeron 1.8 GHz motherboard.

asterismo · 2018-05-14T13:26:58Z

I'm using Debian 8 Jessie on a 1st Gen Core i5 (2.4 GHz) in my personal laptop, and I also tried on an Intel Xeon CPU E3-1225 v3 @ 3.20GHz. It throws the same error on both machines. I will try your code, thanks!

texadactyl · 2018-05-14T14:03:25Z

I discovered character-handling anomalies (not in the Science code). I had different symptoms with NMAX=20,000. Cannot claim that I found them all.

asterismo · 2018-05-14T14:49:06Z

I got the same error:

./mercury6
Integrating massive bodies and particles up to the same epoch.
Beginning the main integration.

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x7F6CA8F86407
#1 0x7F6CA8F86A1E
#2 0x7F6CA84A30DF
#3 0x414EF4 in mco_x2ov_
#4 0x415775 in mio_ce_
#5 0x419C89 in mal_hvar_
#6 0x424DAD in MAIN__ at mercury6_2.for:?
Violación de segmento (Segmentation fault)

texadactyl · 2018-05-14T14:53:53Z

Could you try adding a -g parameter to FFLAGS in the Makefile to see if we could get line numbers in the traceback?

FFLAGS=-g -O2 -Wline-truncation -Wsurprising -Werror
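Since the trace points into array-heavy routines, a debug build with runtime bounds checking might narrow this down further. A minimal sketch, assuming gfortran and the source/binary paths from the make output shown in this thread; the names `FFLAGS_DEBUG` and `build_debug` are made up for illustration, while `-fcheck=bounds` and `-fbacktrace` are standard gfortran options:

```shell
# Hypothetical debug build:
#   -g              adds line numbers to the traceback
#   -O0             disables optimization so lines map cleanly to code
#   -fcheck=bounds  aborts with a message on an out-of-range array subscript
#   -fbacktrace     prints a backtrace on a fatal runtime error
FFLAGS_DEBUG="-g -O0 -fcheck=bounds -fbacktrace"

build_debug() {
    # Paths are assumptions taken from the make output in this thread.
    # $FFLAGS_DEBUG is left unquoted on purpose so it splits into flags.
    gfortran $FFLAGS_DEBUG -o ./bin/mercury6 ./src/mercury6_2.for
}
```

Calling `build_debug` from the project root and rerunning the integrator should then report the first bad subscript, if there is one, instead of a bare SIGSEGV.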

texadactyl · 2018-05-14T14:57:18Z

After the make step, are you running the exec_all.sh script that I created? See file 20180507_texadactyl.txt.

texadactyl · 2018-05-14T14:59:33Z

My last results (should still be in the log folder):

===== Thu May 10 13:04:57 CDT 2018 ==============================================================
Begin mercury6 (basic integration) .....

Integrating massive bodies and particles up to the same epoch.
Beginning the main integration.
Date: 2119 8 6.8 dE/E: -5.74125E-13 dL/L: -3.01314E-13
Date: 2229 2 12.0 dE/E: -3.80231E-13 dL/L: -2.23733E-13
Date: 2338 8 24.2 dE/E: -2.39739E-13 dL/L: -2.57743E-13
Date: 2448 3 1.5 dE/E: -5.57036E-13 dL/L: -4.18236E-13
Date: 2557 9 10.4 dE/E: -1.39440E-12 dL/L: -8.00990E-13
Date: 2667 3 22.5 dE/E: -1.55674E-12 dL/L: -9.02654E-13
Date: 2776 9 27.8 dE/E: -2.84598E-13 dL/L: -4.58864E-13
Date: 2886 4 6.7 dE/E: -3.24691E-13 dL/L: -5.87001E-13
Date: 2995 10 18.5 dE/E: 1.11851E-12 dL/L: -1.20047E-13
Date: 3105 4 26.1 dE/E: 1.33508E-12 dL/L: -2.04062E-13
Date: 3214 11 1.1 dE/E: 2.18526E-12 dL/L: 9.32068E-14
Date: 3324 5 8.6 dE/E: 2.61544E-12 dL/L: 2.91202E-13
Date: 3433 11 17.8 dE/E: 2.35533E-12 dL/L: 1.08282E-13
Date: 3543 5 27.1 dE/E: 1.56611E-12 dL/L: -2.68590E-13
Date: 3652 12 2.6 dE/E: 2.16965E-12 dL/L: -1.24092E-13
Date: 3762 6 10.9 dE/E: 1.91726E-12 dL/L: -2.17666E-13
Date: 3871 12 24.0 dE/E: 2.18246E-12 dL/L: -3.00946E-13
etc.

texadactyl · 2018-05-14T15:03:54Z

I set NMAX back to 20000, then:

elkins@biostar:/projects/mercury-master$ simple_clean.sh
elkins@biostar:/projects/mercury-master$ make clean all
rm -f ./bin/element6 ./bin/close6 ./bin/mercury6 ./src/flog_*.log
gfortran -g -O2 -Wline-truncation -Wsurprising -Werror -o ./bin/close6 ./src/close6.for 2>&1 | tee ./src/flog_close6.log
gfortran -g -O2 -Wline-truncation -Wsurprising -Werror -o ./bin/element6 ./src/element6.for 2>&1 | tee ./src/flog_element6.log
gfortran -g -O2 -Wline-truncation -Wsurprising -Werror -o ./bin/mercury6 ./src/mercury6_2.for 2>&1 | tee ./src/flog_mercury6.log
elkins@biostar:~/projects/mercury-master$ exec_all.sh

===== Mon May 14 10:02:18 CDT 2018 ==============================================================
Begin mercury6 (basic integration) .....

Integrating massive bodies and particles up to the same epoch.
Beginning the main integration.
Date: 2119 8 6.8 dE/E: -5.74125E-13 dL/L: -3.01314E-13
Date: 2229 2 12.0 dE/E: -3.80231E-13 dL/L: -2.23733E-13
Date: 2338 8 24.2 dE/E: -2.39739E-13 dL/L: -2.57743E-13
Date: 2448 3 1.5 dE/E: -5.57036E-13 dL/L: -4.18236E-13
Date: 2557 9 10.4 dE/E: -1.39440E-12 dL/L: -8.00990E-13
etc.

asterismo · 2018-05-14T15:19:57Z

Nope, I spotted the issue. I also had to edit the CMAX parameter and recompile.

This is the info.out. Now it is running fine:

tail -f info.out
Integration details
-------------------

Initial energy: -3.32264E-08 solar masses AU^2 day^-2
Initial angular momentum: 6.08216E-05 solar masses AU^2 day^-1

Integrating massive bodies and particles up to the same epoch.

Beginning the main integration.

WARNING: Total number of current close encounters exceeds CMAX.
Modify mercury.inc and recompile Mercury.

texadactyl · 2018-05-14T15:22:17Z

When you had the crash, what were the parameters set to? Cut and paste or just tell me. I'll try to see if there is a relatively simple fix and put in some diagnostics. Crashing is dumb.

texadactyl · 2018-05-14T15:37:23Z

Actually, that "warning" should be a termination message. When it appears, various vector and array initialization is bypassed, leaving random values. See the if-statement at line 1663 in mercury6_2.for:
if (nclo.gt.CMAX) then

asterismo · 2018-05-14T15:45:48Z

param.in
param.in.txt

asterismo · 2018-05-14T15:47:37Z

small.in
small.in.txt

texadactyl · 2018-05-14T15:49:47Z

So, you modified param.in and small.in in the exec subfolder.
How about the src subfolder before you compiled? mercury.inc (NMAX=20000, right?)

asterismo · 2018-05-14T15:50:23Z

Steps to reproduce the initial problem:

1. Clone and compile.
2. Execute; expect a crash and a warning about NMAX asking you to recompile.
3. Change NMAX from 2000 to 20000 and recompile; it crashes with "...received signal SIGSEGV: Segmentation fault - invalid memory reference."
4. Change CMAX (close encounters) from 50 to 5000 and recompile.

Then it works.
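The two parameter edits in these steps could be scripted roughly as follows. This is only a sketch: it assumes mercury.inc declares the limits on lines of the form `parameter (NMAX = 2000)`, and the helper name `set_param` is invented for illustration, so check your copy of the file first.

```shell
# set_param FILE NAME VALUE
# Rewrites "NAME = <digits>" in FILE to "NAME = VALUE".
# Assumes the limit appears as, e.g., "parameter (NMAX = 2000)".
set_param() {
    file=$1; name=$2; value=$3
    tmp=$(mktemp) || return 1
    # Portable alternative to "sed -i": edit into a temp file, then copy
    # the result back so the original file keeps its permissions.
    sed "s/\(${name}[[:space:]]*=[[:space:]]*\)[0-9][0-9]*/\1${value}/" \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

For example, `set_param ./src/mercury.inc NMAX 20000 && set_param ./src/mercury.inc CMAX 5000` followed by a rebuild, where the `./src/mercury.inc` path is an assumption about the repository layout.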

texadactyl · 2018-05-14T15:51:39Z

Compiled with -g option? Did you see a traceback with line numbers this time?

asterismo · 2018-05-14T15:55:58Z

I executed ./compile from the git repo and got plenty of warnings.
Now I executed make in your personal mercury-master, and it compiled without warnings.
I'm using Debian 8 Jessie.

texadactyl · 2018-05-14T16:01:42Z

Okay, with your small.in and param.in, I have mercury6 in a loop.
top shows mercury6 eating nearly 100% of a CPU core.
I was waiting to see if it would crash finally.
It seems to be writing to info.out over & over again:

WARNING: Total number of current close encounters exceeds CMAX.
Modify mercury.inc and recompile Mercury.

Again, this is not a "WARNING" situation, in my opinion. This should be a fatal error.
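Until the code itself treats this as fatal, a wrapper script could stop a run that has started looping on that message. A rough sketch, assuming the message text shown above and a fresh info.out (no leftovers from an earlier run); the function names `cmax_exceeded` and `run_guarded` are invented here:

```shell
# cmax_exceeded FILE
# Succeeds if FILE contains the "exceeds CMAX" message.
# A missing file counts as "not exceeded".
cmax_exceeded() {
    grep -q 'exceeds CMAX' "$1" 2>/dev/null
}

# run_guarded PROGRAM INFO_FILE
# Runs PROGRAM in the background and polls INFO_FILE once per second.
# Kills PROGRAM and returns 1 as soon as the CMAX message appears.
run_guarded() {
    "$1" &
    pid=$!
    while kill -0 "$pid" 2>/dev/null; do
        if cmax_exceeded "$2"; then
            kill "$pid"
            wait "$pid" 2>/dev/null
            echo "ERROR: close encounters exceed CMAX; stopped $1" >&2
            return 1
        fi
        sleep 1
    done
    wait "$pid"   # propagate the program's own exit status
}
```

Something like `run_guarded ./mercury6 info.out` from the exec folder would then end the run instead of leaving it spinning at 100% CPU.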

asterismo · 2018-05-14T16:05:26Z

Yes, with your version I am still waiting for it to start... The full-of-warnings git version seems to do the trick.

texadactyl · 2018-05-14T16:05:37Z

I suspect that this old Fortran IV/77 code has never really been fully diagnosed. Ideally, it will someday be converted to Python and use the numpy libraries for vector and matrix calls without hand-coding.

But, in the meantime, Fortran diagnostic code can be added.

texadactyl · 2018-05-14T16:07:10Z

Finally, a traceback with line numbers!

Begin mercury6 (basic integration) .....

Integrating massive bodies and particles up to the same epoch.
Beginning the main integration.

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x7fb31e67d16a in ???
#1 0x7fb31e67c393 in ???
#2 0x7fb31dd4c13f in ???
#3 0x56300c09836c in mco_x2ov_
at ./src/mercury6_2.for:2878
#4 0x56300c098b87 in mio_ce_
at ./src/mercury6_2.for:5155
#5 0x56300c09d054 in mal_hvar_
at ./src/mercury6_2.for:401
#6 0x56300c0a6f3e in MAIN__
at ./src/mercury6_2.for:170
#7 0x56300c085ffe in main
at ./src/mercury6_2.for:217
exec_all.subsh: line 11: 15109 Segmentation fault (core dumped) $BIN/mercury6

texadactyl · 2018-05-14T16:43:31Z

Memory is trashed. Doesn't matter which version is used.

The crash is caused by calculations in subroutine mco_x2ov, line 2878:
v2 = u*u + v*v + w*w

Variables u, v, and w are parameters passed in. E.g. ending on line 5155 in the middle of a do-loop from 1 to nclo in subroutine mio_ce:
call mco_x2ov (rcen,rmax,m(1),0.d0,jxvclo(1,k),jxvclo(2,k),
% jxvclo(3,k),jxvclo(4,k),jxvclo(5,k),jxvclo(6,k),fr,theta,phi,
% fv,vtheta,vphi)

but k exceeds CMAX!

See subroutine mce_stat starting at line 1662.
nclo keeps getting incremented regardless of the array bounds.

See local data arrays of dimension CMAX starting at line 296. nclo, as used all over the place, is passed in as an array-element counter. nclo must never exceed CMAX.

Failure to control nclo causes the array bounds to be exceeded upon reference.
Therefore, illegal memory reference.

texadactyl · 2018-05-14T16:45:51Z

Suggested solution: Once CMAX is breached in line 1663, put out an ERROR message and exit to the O/S.

Make sense to you?

asterismo · 2018-05-14T16:53:42Z

I'm afraid that is too technical for my knowledge, but if you say so... go ahead.

texadactyl · 2018-05-14T18:25:56Z

Sorry, I wasn't trying to be obtuse. You are too modest!

texadactyl · 2018-05-14T18:33:52Z

Ok, the new code, with your small.in and param.in displays this and exits peacefully:

Begin mercury6 (basic integration) .....

Integrating massive bodies and particles up to the same epoch.
Beginning the main integration.

ERROR: Total number of current close encounters exceeds CMAX.
Modify mercury.inc and recompile Mercury.

That error message also appears in info.out. I just copied it to the console to wake up the sleeping scientist.

asterismo · 2018-05-14T18:35:42Z

Is that valid advice? If I keep increasing the CMAX value, will it eventually run?

texadactyl · 2018-05-14T18:42:46Z

That depends on how much your small.in population causes close encounters. Your case was a lot of asteroids (4667?) compared to the default which was 2. On the other hand, CMAX cannot be arbitrarily large or it won't fit into RAM. This code needs to give some better advice.

I have not read all of John Chambers' code.

smirik · 2018-05-14T21:00:43Z

As far as I know, mercury6 was initially designed to work with no more than 1000 objects. If you increase the number of objects, it might cause segfault errors.

Please also take into account that increasing the number of objects usually decreases the accuracy of the calculations.

So, I would like to advise:

Decrease the number of the asteroids to 1000 or less.
Test the software with different numbers of the bodies and compare the results.

smirik · 2018-05-16T07:09:17Z

Fix was applied in #11.

asterismo · 2018-05-17T04:48:36Z

Thanks, I will take that into account.

texadactyl · 2018-05-21T16:41:04Z

@asterismo, did you solve your issue by a combination of decreasing the number of objects and/or increasing CMAX?

Aneeskhan673 · 2022-01-07T13:28:51Z

properly specified in dipoleconfig.f
Dipole configurations for this process not
properly specified in dipoleconfig.f
Dipole configurations for this process not
properly specified in dipoleconfig.f

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
free(): corrupted unsorted chunks
Dipole configurations for this process not
properly specified in dipoleconfig.f

Program received signal SIGABRT: Process abort signal.
Thanks for using LHAPDF 6.2.3. Please make sure to cite the paper:
Eur.Phys.J. C75 (2015) 3, 132 (http://arxiv.org/abs/1412.7420)

Backtrace for this error:
#0 0x7f7803a8ed21 in ???
#1 0x7f7803a8def5 in ???
#0 0x7f7803a8ed21 in ???
#1 0x7f7803a8def5 in ???
#2 0x7f780375620f in ???
#2 0x7f780375620f in ???
#3 0x7f780375618b in ???
#3 0x7f78037599f6 in ???
#4 0x7f7803735858 in ???
#4 0x7f7803759bdf in ???
#5 0x7f78037a03ed in ???
#5 0x7f7803a90f14 in ???
#6 0x7f78037a847b in ???
#6 0x559f7b0087e3 in ???
#7 0x559f7b008e66 in ???
#7 0x7f78037aa1c1 in ???
#8 0x559f7b0fba15 in ???
#8 0x559f802784f0 in ???
#9 0x559f7af809ab in ???
#9 0x7f780375a43e in ???
#10 0x7f7803f7c78d in ???
#10 0x7f7803759b8c in ???
#11 0x7f7803759bdf in ???
#12 0x7f7803a90f14 in ???
#13 0x559f7b0087e3 in ???
#14 0x559f7b008e66 in ???
#15 0x559f7b0fba15 in ???
#16 0x559f7af809ab in ???
#17 0x7f7803f7c78d in ???
#18 0x7f78036f0608 in start_thread
#11 0x7f78036f0608 in start_thread
at /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477
at /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477
#19 0x7f7803832292 in ???
#12 0x7f7803832292 in ???
#20 0xffffffffffffffff in ???
#13 0xffffffffffffffff in ???
Aborted (core dumped)
Why did this error occur?

smirik closed this as completed May 16, 2018
