FireMarshal br-build crashes #206

Closed
michael-etzkorn opened this issue Aug 24, 2021 · 5 comments

@michael-etzkorn

Trying to run ./marshal -v -d build br-base.json still.
I'm not sure what the issue is, but I'm guessing it has something to do with RAM, since the errors seem resource- and fork-related. Is 16 GB insufficient for this? Is there a workaround?

2021-08-24 17:09:47,914 [run         ] [DEBUG]  scripts/Makefile.build:266: recipe for target 'fs/nfs/nfs4namespace.o' failed
2021-08-24 17:09:47,914 [run         ] [DEBUG]  make[2]: *** [fs/nfs/nfs4namespace.o] Error 1
2021-08-24 17:09:47,914 [run         ] [DEBUG]  scripts/Makefile.build:266: recipe for target 'drivers/gpu/drm/radeon/rs690.o' failed
2021-08-24 17:09:47,914 [run         ] [DEBUG]  make[4]: *** [drivers/gpu/drm/radeon/rs690.o] Error 1
2021-08-24 17:09:47,914 [run         ] [DEBUG]  scripts/Makefile.build:266: recipe for target 'drivers/gpu/drm/radeon/radeon_legacy_tv.o' failed
2021-08-24 17:09:47,914 [run         ] [DEBUG]  make[4]: *** [drivers/gpu/drm/radeon/radeon_legacy_tv.o] Error 1
2021-08-24 17:09:47,914 [run         ] [DEBUG]  scripts/Makefile.build:266: recipe for target 'lib/earlycpio.o' failed
2021-08-24 17:09:47,914 [run         ] [DEBUG]  make[1]: *** [lib/earlycpio.o] Error 1
2021-08-24 17:09:47,914 [run         ] [DEBUG]  riscv64-unknown-linux-gnu-gcc: fatal error: cannot execute '/mnt/Vivado_part/chipyard/chipyard/riscv-tools-install/libexec/gcc/riscv64-unknown-linux-gnu/9.2.0/cc1': vfork: Resource temporarily unavailable
2021-08-24 17:09:18,784 [run         ] [DEBUG]  /bin/sh: fork: retry: No child processes
2021-08-24 17:09:18,784 [run         ] [DEBUG]  riscv64-unknown-linux-gnu-gcc: fatal error: cannot execute '/mnt/Vivado_part/chipyard/chipyard/riscv-tools-install/libexec/gcc/riscv64-unknown-linux-gnu/9.2.0/cc1': vfork: Resource temporarily unavailable
2021-08-24 17:09:18,785 [run         ] [DEBUG]  compilation terminated.
2021-08-24 17:09:18,785 [run         ] [DEBUG]  riscv64-unknown-linux-gnu-gcc: fatal error: cannot execute '/mnt/Vivado_part/chipyard/chipyard/riscv-tools-install/libexec/gcc/riscv64-unknown-linux-gnu/9.2.0/cc1': vfork: Resource temporarily unavailable
2021-08-24 17:09:18,785 [run         ] [DEBUG]  /bin/sh: fork: retry: No child processes
2021-08-24 17:09:18,785 [run         ] [DEBUG]  /bin/sh: fork: retry: No child processes
2021-08-24 17:09:18,785 [run         ] [DEBUG]  compilation terminated.
2021-08-24 17:09:18,785 [run         ] [DEBUG]  /bin/sh: fork: retry: No child processes
2021-08-24 17:09:18,785 [run         ] [DEBUG]  riscv64-unknown-linux-gnu-gcc: fatal error: cannot execute '/mnt/Vivado_part/chipyard/chipyard/riscv-tools-install/libexec/gcc/riscv64-unknown-linux-gnu/9.2.0/cc1': vfork: Resource temporarily unavailable
2021-08-24 17:09:18,786 [run         ] [DEBUG]  compilation terminated.
2021-08-24 17:09:18,786 [run         ] [DEBUG]  riscv64-unknown-linux-gnu-gcc: fatal error: cannot execute '/mnt/Vivado_part/chipyard/chipyard/riscv-tools-install/libexec/gcc/riscv64-unknown-linux-gnu/9.2.0/cc1': vfork: Resource temporarily unavailable
2021-08-24 17:09:18,786 [run         ] [DEBUG]  riscv64-unknown-linux-gnu-gcc: fatal error: cannot execute '/mnt/Vivado_part/chipyard/chipyard/riscv-tools-install/libexec/gcc/riscv64-unknown-linux-gnu/9.2.0/cc1': vfork: Resource temporarily unavailable
2021-08-24 17:09:18,786 [run         ] [DEBUG]  /bin/sh: fork: retry: No child processes
2021-08-24 17:09:18,786 [run         ] [DEBUG]  riscv64-unknown-linux-gnu-gcc: fatal error: cannot execute '/mnt/Vivado_part/chipyard/chipyard/riscv-tools-install/libexec/gcc/riscv64-unknown-linux-gnu/9.2.0/cc1': vfork: Resource temporarily unavailable
2021-08-24 17:09:18,786 [run         ] [DEBUG]  CC      lib/irq_regs.o
2021-08-24 17:09:18,787 [run         ] [DEBUG]  /bin/sh: fork: retry: No child processes
2021-08-24 17:09:18,787 [run         ] [DEBUG]  riscv64-unknown-linux-gnu-gcc: fatal error: cannot execute '/mnt/Vivado_part/chipyard/chipyard/riscv-tools-install/libexec/gcc/riscv64-unknown-linux-gnu/9.2.0/cc1': vfork: Resource temporarily unavailable
2021-08-24 17:09:18,787 [run         ] [DEBUG]  compilation terminated.
2021-08-24 17:09:18,787 [run         ] [DEBUG]  /bin/sh: fork: Resource temporarily unavailable
2021-08-24 17:09:18,787 [run         ] [DEBUG]  CC      drivers/gpu/drm/radeon/evergreen_blit_shaders.o
2021-08-24 17:09:18,787 [run         ] [DEBUG]  /bin/sh: fork: retry: No child processes
2021-08-24 17:09:18,788 [run         ] [DEBUG]  compilation terminated.
2021-08-24 17:09:18,788 [run         ] [DEBUG]  /bin/sh: fork: retry: No child processes
2021-08-24 17:09:18,796 [run         ] [DEBUG]  riscv64-unknown-linux-gnu-gcc: fatal error: cannot execute '/mnt/Vivado_part/chipyard/chipyard/riscv-tools-install/libexec/gcc/riscv64-unknown-linux-gnu/9.2.0/cc1': vfork: Resource temporarily unavailable
2021-08-24 17:09:18,796 [run         ] [DEBUG]  riscv64-unknown-linux-gnu-gcc: fatal error: cannot execute '/mnt/Vivado_part/chipyard/chipyard/riscv-tools-install/libexec/gcc/riscv64-unknown-linux-gnu/9.2.0/cc1': vfork: Resource temporarily unavailable
2021-08-24 17:09:18,796 [run         ] [DEBUG]  riscv64-unknown-linux-gnu-gcc: fatal error: cannot execute '/mnt/Vivado_part/chipyard/chipyard/riscv-tools-install/libexec/gcc/riscv64-unknown-linux-gnu/9.2.0/cc1': vfork: Resource temporarily unavailable
2021-08-24 17:09:18,796 [run         ] [DEBUG]  compilation terminated.
2021-08-24 17:09:18,796 [run         ] [DEBUG]  scripts/Makefile.build:266: recipe for target 'drivers/gpu/drm/radeon/r600.o' failed
2021-08-24 17:09:18,796 [run         ] [DEBUG]  make[4]: *** [drivers/gpu/drm/radeon/r600.o] Error 1
2021-08-24 17:09:18,796 [run         ] [DEBUG]  scripts/Makefile.build:266: recipe for target 'lib/extable.o' failed
2021-08-24 17:09:18,797 [run         ] [DEBUG]  make[1]: *** [lib/extable.o] Error 1
2021-08-24 17:09:18,797 [run         ] [DEBUG]  make[1]: *** Waiting for unfinished jobs....
@NathanTP
Contributor

Hmm, this is a new one to me. It looks like it's happening fairly early in the Linux build process, before any of the --no-disk stuff would come into play. Does this still happen without -d? You might be out of disk space, or, if /mnt/Vivado_part is a network mount or something similar, there could be an issue there. Has anything ever built?

You might try building something simpler, like one of the bare-metal tests (try "marshal test -s test/bare.py"). I don't think you need 16GB of RAM for this, and even if you did, I would hope you have a swap device to pick up the slack. One other possibility: marshal uses unbounded parallelism by default, which in hindsight probably isn't great (though it's never caused any problems before), and you might have an unusually low process cap on your machine. I should probably fix that, but for now you can try adding "jlevel: 4" to marshal-config.yaml and see if that fixes anything (replace 4 with however many cores you have).
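To check the process-cap theory before picking a jlevel value, a minimal Python sketch along these lines may help. It reads the per-user process limit (RLIMIT_NPROC, the same value `ulimit -u` reports) and the core count, then prints a candidate jlevel line. The helper name `suggest_jlevel` and its `headroom` fraction are hypothetical, not part of FireMarshal, and the cap could also come from somewhere this script does not look (for example a cgroup pids limit).

```python
import os
import resource


def suggest_jlevel(cpu_count, nproc_soft_limit, headroom=0.5):
    """Return a job count: the core count, capped at a fraction of the process limit.

    headroom is an assumed safety margin (hypothetical heuristic), since the
    limit is shared by every process the user runs, not just the build.
    """
    jobs = max(1, cpu_count)
    if nproc_soft_limit != resource.RLIM_INFINITY:
        jobs = min(jobs, max(1, int(nproc_soft_limit * headroom)))
    return jobs


if __name__ == "__main__":
    # Soft and hard per-user process limits for the current process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    cores = os.cpu_count() or 1
    print(f"cores={cores} nproc_soft={soft} nproc_hard={hard}")
    # Candidate line for marshal-config.yaml.
    print(f"jlevel: {suggest_jlevel(cores, soft)}")
```

If the reported soft limit is small compared with the number of compiler processes an unbounded build can start, that would be consistent with the fork failures in the log.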

@michael-etzkorn
Author

The mnt directory isn't actually mounted over a network; it's an artifact of cloning a server specifically to test out chipyard. We have 55GB available on the disk, so we should have enough room.

I'm guessing the disk-included binary wouldn't be bootable on the VCU118, which is the goal of this experimenting. At any rate, I've launched the build once more to see if it'll fail. Before that, I tried running marshal test -s test/bare.py, which resulted in:

WARNING: Unrecognized Option: InvalidOption!
WARNING: Skipping inherit-childOwnBin.yaml:
WARNING: 	Missing required option 'modules'
WARNING: Base config 'bare not found.
WARNING: Skipping spike-jobs.yaml:
WARNING: 	Missing required option 'bare'
WARNING: Base config 'bare not found.
WARNING: Skipping dummy.yaml:
WARNING: 	Missing required option 'bare'
WARNING: Base config 'bare not found.
WARNING: Skipping dummy-bare.yaml:
WARNING: 	Missing required option 'bare'
WARNING: Base config 'fedora-base.json not found.
WARNING: Skipping fed-run.yaml:
WARNING: 	Missing required option 'fedora-base.json'
WARNING: Base config 'bare not found.
WARNING: Skipping bare.yaml:
WARNING: 	Missing required option 'bare'
WARNING: Base config 'bare not found.
WARNING: Skipping spike.yaml:
WARNING: 	Missing required option 'bare'
WARNING: Skipping linux-src.yaml:
WARNING: 	Missing required option 'modules'
WARNING: Base config 'fedora-base.json not found.
WARNING: Skipping fed-smoke0.yaml:
WARNING: 	Missing required option 'fedora-base.json'
WARNING: Base config 'bare not found.
WARNING: Skipping rocc.yaml:
WARNING: 	Missing required option 'bare'
WARNING: Skipping kfrag.yaml:
WARNING: 	Missing required option 'modules'
To check on progress, either call marshal with '-v' or see the live output at: 
/mnt/Vivado_part/chipyard/chipyard/software/firemarshal/logs/bare-test-2021-08-24--22-31-03-VV3FACESO709TB5Q.log
ERROR: Cannot locate workload: bare.py

Not sure if I needed to add anything to my marshal-config.yaml for that. If it crashes building the disk version, I'll try setting the jlevel and hopefully that fixes it.

@NathanTP
Contributor

Sorry, that was a typo on my part. It should be 'test/bare.yaml' not 'bare.py'.

I'm still not clear on the issue with Linux, but you should try the jlevel thing and report back (you can also see #207 for a cleaner fix). Also, if you're going for a nodisk build, be aware of ucb-bar/chipyard#950 which is affecting workloads with >~100MB of data in them (we're still looking into that bug, it used to work with much larger images).

@michael-etzkorn
Author

Adding jlevel: 4 to marshal-config.yaml fixed the problem! Maybe there's some cap on processes for this machine. It seems like br-base builds OK now (hopefully it boots), and we'll probably be sizing down from here, but thanks for warning me about ucb-bar/chipyard#950.

@NathanTP
Contributor

Nice! Ya, so #207 will fix the issue permanently. I don't know how many processes Linux can spawn while building but clearly it was more than your system cap.
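For anyone landing here with the same log, a rough sketch of how the two likely causes differ at the errno level, based on the documented fork(2) error codes: EAGAIN means a limit on the number of processes or threads was hit, while ENOMEM means the kernel could not allocate memory. On Linux with glibc, EAGAIN is the code behind the "Resource temporarily unavailable" text seen in the log above. The function name `classify_fork_errno` is hypothetical and only illustrates the distinction.

```python
import errno
import os


def classify_fork_errno(code):
    """Map an errno value from a failed fork/vfork to a likely cause."""
    if code == errno.EAGAIN:
        # A process or thread limit was reached (e.g. RLIMIT_NPROC).
        return "process limit reached"
    if code == errno.ENOMEM:
        # The kernel could not allocate the memory needed for the child.
        return "out of memory"
    return "other: " + os.strerror(code)


if __name__ == "__main__":
    for code in (errno.EAGAIN, errno.ENOMEM):
        print(os.strerror(code), "->", classify_fork_errno(code))
```

Under that reading, the vfork errors in this issue point at a process cap rather than the 16GB of RAM, which matches the jlevel fix.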
