WIP: Write a CMakeLists.txt to build the NaCl loader #2

Draft
wants to merge 21 commits into
base: master

illwieckz
Member

@illwieckz illwieckz commented Apr 7, 2025

I added some CMakeLists.txt code to rebuild sel_ldr and nacl_helper_bootstrap.

This relies on a unified DaemonPlatform framework, deliberately copied from DaemonEngine/Daemon#1641, to give this CMake code the same ease of cross-compiling.

I need help to complete the src/trusted/service_runtime/CMakeLists.txt file.

The CMakeLists.txt files are a rewrite of the SConstruct file and the *.scons files, with all unit tests deleted. Code that has not been ported yet is commented out on lines starting with #TODO:.

Build status:

| system | arch | build sel_ldr | build nacl_helper_bootstrap | run helloworld nexe |
|---|---|---|---|---|
| windows | amd64 | not tested | N/A | not tested |
| windows | i686 | not tested | N/A | not tested |
| mingw | amd64 | ✅️ | N/A | not tested |
| mingw | i686 | ✅️ | N/A | not tested |
| linux | amd64 | ✅️ | ✅️ | ✅️ |
| linux | i686 | ✅️ | ✅️ | not tested |
| linux | armhf | ✅️ | ✅️ | not tested |
| linux | armhf 16k | ✅️ | ✅️ | not tested |
| linux | armel | ✅️ | ✅️ | not tested |
| linux | mips | ✅️ | ✅️ | not tested |
| android | armel | ✅️ | ✅️ | ❌️ |
| macos | amd64 | ✅️ | N/A | not tested |

Dynamically linked loader:

| system | arch | shared link sel_ldr | run helloworld nexe |
|---|---|---|---|
| windows | amd64 | not tested | not tested |
| windows | i686 | not tested | not tested |
| mingw | amd64 | ✅️ | not tested |
| mingw | i686 | ✅️ | not tested |
| linux | amd64 | ✅️ | ✅️ |
| linux | i686 | ✅️ | not tested |
| linux | armhf | ✅️ | ✅️ |
| linux | armhf 16k | ✅️ | not tested |
| linux | armel | ✅️ | not tested |
| linux | mips | ✅️ | not tested |
| android | armel | ✅️ | not tested |
| macos | amd64 | ✅️ | not tested |

Statically linked loader:

| system | arch | static link sel_ldr | run helloworld nexe |
|---|---|---|---|
| windows | amd64 | not implemented | not implemented |
| windows | i686 | not implemented | not implemented |
| mingw | amd64 | ✅️ | not tested |
| mingw | i686 | ✅️ | not tested |
| linux | amd64 | ✅️ | ✅️ |
| linux | i686 | ✅️ | not tested |
| linux | armhf | ✅️ | ❌️ segfault |
| linux | armhf 16k | ✅️ | not tested |
| linux | armel | ✅️ | not tested |
| linux | mips | ✅️ | not tested |
| android | armel | ✅️ | not tested |
| macos | amd64 | ✅️ | not tested |

Things like linux-mips were tested only because I ported the SCons code to CMake for completeness, to make sure I forgot nothing.

I tested android-armel because I remember that in the past @cu-kai tried to get daemon-tty running on Android to get a console for his server.

@illwieckz illwieckz marked this pull request as draft April 7, 2025 18:42
@illwieckz illwieckz force-pushed the illwieckz/cmake branch 2 times, most recently from 2775579 to 993144e Compare April 7, 2025 20:34
@slipher
Member

slipher commented Apr 7, 2025

Is it really so bad to use Scons? Unlike CMake, it can handle multiple toolchains, which makes it well-suited for this repo. Also if we keep the same build system, we can easily compare things between our version and upstream.

@illwieckz
Member Author

illwieckz commented Apr 7, 2025

> Is it really so bad to use Scons?

Yes. 😅️

> Unlike CMake, it can handle multiple toolchains, which makes it well-suited for this repo.

Uh, totally not. The Scons scripts are not compatible with cross-compiling to begin with.

The first purpose of this effort is to make it possible to use multiple toolchains.

@slipher
Member

slipher commented Apr 7, 2025

> The Scons scripts are not compatible with cross-compiling to begin with.

Wrong. With Chromium native_client I can do ./scons --mode=nacl,opt-linux platform=x86-32 sel_ldr irt_core_raw or ./scons --mode=nacl,opt-linux platform=arm sel_ldr irt_core_raw and everything completes successfully and produces executables of the expected architectures.

@illwieckz
Member Author

How do I build with MinGW for Windows on Linux? For Linux Arm?

How do I make a static nacl_loader? How do I rebuild with a 16k PageSize for Arm?

This scons stuff is over-convoluted…

@illwieckz
Member Author

illwieckz commented Apr 7, 2025

> Wrong. With Chromium native_client I can do ./scons --mode=nacl,opt-linux platform=arm sel_ldr

With this exact command I get that:

Exception: Cannot find a toolchain for arm in toolchain/linux_x86/pnacl_newlib_raw:
  File "SConstruct", line 2799:
    if UsingNaclMode(): nacl_env = nacl_env.Clone(
  File "/usr/lib/python3/dist-packages/SCons/Environment.py", line 1610:
    apply_tools(clone, tools, toolpath)
  File "/usr/lib/python3/dist-packages/SCons/Environment.py", line 117:
    _ = env.Tool(tool)
  File "/usr/lib/python3/dist-packages/SCons/Environment.py", line 2033:
    tool(self)
  File "/usr/lib/python3/dist-packages/SCons/Tool/__init__.py", line 265:
    self.generate(env, *args, **kw)
  File "site_scons/site_tools/naclsdk.py", line 756:
    _SetEnvForNativeSdk(env, root)
  File "site_scons/site_tools/naclsdk.py", line 109:
    raise Exception("Cannot find a toolchain for %s in %s" %

I have arm-linux-gnueabihf-gcc (from the gcc-arm-linux-gnueabihf package), and also clang.

@slipher
Member

slipher commented Apr 7, 2025

> How do I build with MinGW for Windows on Linux?

Found some documentation: https://github.com/DaemonEngine/native_client/blob/master/docs/build_systems.md. Although cross-architecture builds are supported, cross-OS builds are not. And you can't build with MinGW at all as it is designed for the MSVC toolchain. It seems building with MinGW would imply a porting effort beyond just the build system, as there is, e.g., a Microsoft assembler file.

The lack of cross-OS support would seem to be a limitation of Native Client though, not a limitation of Scons.

> For Linux Arm?

I posted that in the previous message.

> > Wrong. With Chromium native_client I can do ./scons --mode=nacl,opt-linux platform=arm sel_ldr
>
> With this exact command I get that:

You're getting an error finding a PNaCl toolchain, which should have been downloaded by gclient or whatever. Make sure you are in a fully equipped Chromium environment.

@illwieckz
Member Author

> > > Wrong. With Chromium native_client I can do ./scons --mode=nacl,opt-linux platform=arm sel_ldr
> >
> > With this exact command I get that:
>
> You're getting an error finding a PNaCl toolchain, which should have been downloaded by gclient or whatever. Make sure you are in a fully equipped Chromium environment.

Why do I need a PNaCl toolchain to build an Arm sel_ldr?

> The lack of cross-OS support would seem to be a limitation of Native Client though, not a limitation of Scons.

Yes, all that scons code in the repository is not meant for cross-compilation, that's what I meant.

> And you can't build with MinGW at all as it is designed for the MSVC toolchain. It seems building with MinGW would imply a porting effort beyond just the build system, as there is, e.g., a Microsoft assembler file.

Very annoying… I have seen they also have some Cygwin code, so I wonder if that can be used on MinGW. I haven't looked at this deeply, though.

@illwieckz
Member Author

Sorry, I forgot to remove the nacl mode. The output of ./scons --mode=opt-linux platform=arm sel_ldr is:

AttributeError: 'SConsEnvironment' object has no attribute 'Program':
  File "SConstruct", line 3889:
    BuildEnvironments(selected_envs)
  File "site_init", line 198:
    
  File "/usr/lib/python3/dist-packages/SCons/Util/envs.py", line 242:
    return self.method(*nargs, **kwargs)
  File "site_scons/site_tools/defer.py", line 148:
    func(env)
  File "site_init", line 125:
    
  File "/usr/lib/python3/dist-packages/SCons/Script/SConscript.py", line 598:
    return _SConscript(self.fs, *files, **subst_kw)
  File "/usr/lib/python3/dist-packages/SCons/Script/SConscript.py", line 285:
    exec(compile(scriptdata, scriptname, 'exec'), call_stack[-1].globals)
  File "src/trusted/validator_arm/build.scons", line 271:
    nexe = untrusted_env.ComponentProgram(test, 'testdata/' + test + '.S',
  File "/usr/lib/python3/dist-packages/SCons/Util/envs.py", line 242:
    return self.method(*nargs, **kwargs)
  File "site_scons/site_tools/component_builders.py", line 485:
    out_nodes = env.Program(prog_name, *args, **kwargs)

@illwieckz
Member Author

This can already build sel_ldr on linux-amd64.

@illwieckz
Member Author

This can now rebuild sel_ldr on linux-armhf.

@illwieckz
Member Author

This can now rebuild sel_ldr on linux-i686.

@illwieckz
Member Author

This can now rebuild nacl_helper_bootstrap on linux-amd64, linux-i686 and linux-armhf.

@illwieckz
Member Author

This can now rebuild sel_ldr on macos-amd64.

@illwieckz illwieckz force-pushed the illwieckz/cmake branch 6 times, most recently from 21e0ec0 to 49d6042 Compare April 10, 2025 00:43
@illwieckz
Member Author

So now, when the compiler is not a Clang derivative, it builds the nacl_helper_bootstrap binary with Clang in a subproject (like we do with NaCl VMs), because nacl_helper_bootstrap only works when built with Clang.

@illwieckz
Member Author

So, it looks like the ET_DYN check in nacl_helper_bootstrap was just a lazy check that the file is in an executable format. Adding another check for ET_EXEC makes it possible to run a statically built sel_ldr!

$ file sel_ldr
sel_ldr: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), statically linked, BuildID[sha1]=cbf906bedf2088500b220f244900ad9c94399fbf, for GNU/Linux 3.2.0, not stripped

$ ./nacl_helper_bootstrap ./sel_ldr --r_debug=0xXXXXXXXXXXXXXXXX --reserved_at_zero=0xXXXXXXXXXXXXXXXX -a -S -B irt_core.nexe -- helloworld-amd64.nexe
DEBUG MODE ENABLED (bypass acl)
[306836,200389696:02:55:52.963842] BYPASSING ALL ACL CHECKS
[306836,200389696:02:55:52.964008] Native Client module will be loaded at base address 0x0000039700000000
hello world
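
To illustrate the kind of check discussed above, here is a minimal Python sketch that reads the e_type field of an ELF header and accepts both ET_DYN and ET_EXEC. It is not the actual nacl_helper_bootstrap code; the function names `elf_type` and `is_loadable` are hypothetical, and only the ELF constants come from the ELF specification.

```python
import struct

# ELF constants from the ELF specification (see <elf.h>).
ELF_MAGIC = b"\x7fELF"
ET_EXEC = 2  # fixed-address executable, e.g. a statically linked binary
ET_DYN = 3   # shared object or position-independent executable


def elf_type(header: bytes) -> int:
    """Return the e_type field of an ELF header, or raise ValueError."""
    if len(header) < 18 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    if header[5] == 1:
        endian = "<"
    elif header[5] == 2:
        endian = ">"
    else:
        raise ValueError("unknown ELF data encoding")
    # e_type is the 16-bit field right after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return e_type


def is_loadable(header: bytes) -> bool:
    """Hypothetical relaxed check: accept ET_DYN and ET_EXEC alike."""
    return elf_type(header) in (ET_EXEC, ET_DYN)
```

For a real file, passing the first 18 bytes read in binary mode (for example `open("sel_ldr", "rb").read(18)`) to `is_loadable` is enough.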

@illwieckz
Member Author

Now, what remains on the list of things I would like to see implemented is a 16k PageSize sel_ldr binary for linux-armhf.

@illwieckz
Member Author

So, I found a way to build a sel_ldr for linux-armhf that uses 16K PageSize:

$ readelf -l sel_ldr | grep LOAD
  LOAD           0x000000 0x00000000 0x00000000 0xf9704 0xf9704 R E 0x4000
  LOAD           0x0fb980 0x000ff980 0x000ff980 0x08f4c 0x1b6c8 RW  0x4000

But the nacl_helper_bootstrap is still using a 4K PageSize:

$ readelf -l nacl_helper_bootstrap | grep LOAD
  LOAD           0x000000 0x00010000 0x00010000 0x02666 0x02666 R E 0x1000
  LOAD           0x000000 0x00013000 0x00013000 0x00000 0x01005 RW  0x1000
  LOAD           0x000000 0x00015000 0x00015000 0x3ffed000 0x3ffed000     0x1000
  LOAD           0x003000 0x40002000 0x40002000 0x00014 0x00014 RW  0x1000

@illwieckz illwieckz force-pushed the illwieckz/cmake branch 2 times, most recently from fd5719a to 555055d Compare April 10, 2025 02:29
@illwieckz
Member Author

So, I found a way to build nacl_helper_bootstrap for linux-armhf with 16K PageSize:

$ readelf -l nacl_helper_bootstrap | grep LOAD
  LOAD           0x000000 0x00010000 0x00010000 0x02666 0x02666 R E 0x4000
  LOAD           0x000000 0x00014000 0x00014000 0x00000 0x01005 RW  0x4000
  LOAD           0x002000 0x00016000 0x00016000 0x3ffec000 0x3ffec000     0x4000
  LOAD           0x006000 0x40002000 0x40002000 0x00014 0x00014 RW  0x4000

I have not tested it though, and I wonder whether the armhf nexe also has to be built with a 16K PageSize.

@illwieckz
Member Author

To run on a 16K PageSize kernel, it looks like we would need to rebuild the nexe as well:

$ readelf -l irt_core-armhf.nexe | grep LOAD
  LOAD           0x010000 0x0ffc0000 0x0ffc0000 0x30000 0x30000 R E 0x10000
  LOAD           0x000000 0x3efe0000 0x3efe0000 0x041b8 0x041b8 R   0x10000
  LOAD           0x0041b8 0x3eff41b8 0x3eff41b8 0x00b90 0x011b0 RW  0x10000
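
The readelf checks above can also be scripted. The following Python sketch parses the program headers of an ELF image and returns the p_align value of every PT_LOAD segment, which is the last column in the readelf output. The function name `load_alignments` is hypothetical; the field offsets are those of the standard ELF32 and ELF64 layouts.

```python
import struct

PT_LOAD = 1  # program header type of a loadable segment


def load_alignments(data: bytes) -> list:
    """Return p_align of every PT_LOAD segment in an ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = data[4] == 2                   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    if is_64:
        # ELF64: e_phoff at 32, e_phentsize at 54, e_phnum at 56.
        (e_phoff,) = struct.unpack_from(endian + "Q", data, 32)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 54)
        align_offset, align_fmt = 48, "Q"
    else:
        # ELF32: e_phoff at 28, e_phentsize at 42, e_phnum at 44.
        (e_phoff,) = struct.unpack_from(endian + "I", data, 28)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 42)
        align_offset, align_fmt = 28, "I"
    alignments = []
    for index in range(e_phnum):
        base = e_phoff + index * e_phentsize
        # p_type is the first 32-bit field in both layouts.
        (p_type,) = struct.unpack_from(endian + "I", data, base)
        if p_type == PT_LOAD:
            (p_align,) = struct.unpack_from(
                endian + align_fmt, data, base + align_offset)
            alignments.append(p_align)
    return alignments
```

Calling it on the bytes of a file read in binary mode should give the same alignments as the last column of `readelf -l … | grep LOAD`.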

@slipher
Member

slipher commented Apr 10, 2025

The nexe is loaded by sel_ldr, not the system executable loader, right? So maybe its "page size" doesn't matter, or maybe the code of sel_ldr has to be changed. Hard to speculate since I don't understand the concept of page size of an executable.

@illwieckz
Member Author

illwieckz commented Apr 10, 2025

As far as I know, this is related to the kernel, not the executable loader.

Systems like Box64 or FEX-Emu, which run amd64/i686 binaries on arm64, are required to do some translation on the binaries (not just for the architecture, but for the page size itself). Wine cannot run them out of the box for the same reason, even with an amd64 translator.

To run 4k binaries on a 16k kernel, the current solution is to run a 4k secondary kernel in a microvm using some pass-through techniques for input and graphics: https://asahilinux.org/2024/10/aaa-gaming-on-asahi-linux

I assume the NaCl virtual machine is not low-level enough to emulate a kernel interface; that would be surprising.

@slipher
Member

slipher commented Apr 10, 2025

https://tristanxr.com/post/why-16k-page-size/ has more details on Asahi Linux's experiences with page size portability issues. It says that most Linux programs work without issues. The ones with problems are ones that manage their own memory mappings somehow, e.g. using a custom allocator instead of libc's. In the thing about Windows games you linked, the problem is probably with the Windows memory allocator or some other part of the Windows runtime. The problem is surely not with executable alignment: given that you are translating to a different ISA, you can lay out the translated instructions however you want. So I don't think the section alignments in the program header really matter for 16k page compatibility. As a last bit of evidence, I tried setting a binary's executable section's alignment to 1, and it still worked.

I do very much expect that sel_ldr is a program that manages its own memory mappings. Firstly, I believe that it maps the nexe into memory itself, rather than using the kernel's executable loading. Secondly, the NaCl sandbox model specifies a specific range of memory that the untrusted code is allowed to read and write. All memory allocations by untrusted code must go in this range and all allocations by trusted code outside it. If there is a NaCl syscall for mapping more memory pages, we should check whether the page size is a hard-coded number, baked in at compile time from a system header, or queried at runtime.

Note that if you read the nexe program header dump closely, the alignment is not 4k, but 64k. So no problem even if it somehow mattered.
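
As a rough illustration of the two points above (querying the page size at runtime rather than hard-coding it, and a 64k alignment being compatible with 16k pages), here is a small Python sketch. The helper names are hypothetical and this says nothing about how sel_ldr itself obtains the page size; `os.sysconf` and `mmap.PAGESIZE` are standard library facilities.

```python
import mmap
import os


def runtime_page_size() -> int:
    """Ask the running system for its page size instead of assuming 4096."""
    try:
        return os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is not available everywhere (e.g. Windows).
        return mmap.PAGESIZE


def round_up_to_page(size: int, page_size: int) -> int:
    """Round size up to the next multiple of page_size (a power of two)."""
    return (size + page_size - 1) & ~(page_size - 1)


def alignment_fits(alignment: int, page_size: int) -> bool:
    """True if a segment alignment is a non-zero multiple of the page size."""
    return alignment >= page_size and alignment % page_size == 0


# The nexe segments above are aligned to 0x10000 (64k), which is a
# multiple of both 4k and 16k pages; a 4k alignment is not enough for 16k.
nexe_fits_16k = alignment_fits(0x10000, 0x4000)
bootstrap_4k_fits_16k = alignment_fits(0x1000, 0x4000)
```

A mapping helper that hard-codes 4096 in `round_up_to_page` would produce sizes that are not page multiples on a 16k kernel, which is the kind of issue worth looking for in code that manages its own mappings.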

@illwieckz illwieckz force-pushed the illwieckz/cmake branch 2 times, most recently from 2dc28fc to a5ae566 Compare April 11, 2025 00:31
@illwieckz
Member Author

The nacl_helper_bootstrap for linux-armhf crashes when the sel_ldr is linked statically.

It works on linux-amd64, but not on linux-armhf… That's annoying because arm is the platform where it would be easier to have a static sel_ldr (fewer files to ship, and a simpler loading chain).

Even when building with debug symbols, I cannot debug the tool:

Reading symbols from ./nacl_helper_bootstrap...
(gdb) r
Starting program: ./nacl_helper_bootstrap ./sel_ldr --r_debug=0xXXXXXXXXXXXXXXXX --reserved_at_zero=0xXXXXXXXXXXXXXXXX -a -S -B irt_core.nexe -- helloworld-armhf.nexe

Program received signal SIGSEGV, Segmentation fault.
0xf7f34924 in ?? ()
(gdb) bt
#0  0xf7f34924 in ?? ()
#1  0xf7ea78c0 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

@illwieckz illwieckz force-pushed the illwieckz/cmake branch 4 times, most recently from a8436ab to 961bfa0 Compare April 11, 2025 04:31
@illwieckz
Member Author

illwieckz commented Apr 13, 2025

The inability to build a working nacl_helper_bootstrap on amd64 is in fact a GCC bug.

When running in a debugger, it fails at the empty line preceding the assembly startup code (outside of any code block).

When building with GCC 8 instead of GCC 13, it works.

I also noticed that building it with optimization enabled introduces other bugs, whatever the GCC version tested. Disabling optimization (-O0) fixes it on older GCC versions like GCC 8, but is not enough to fix it on GCC 13.

I'll make a bug report to GCC at some point, with some reduced source sample.
