Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(compiler)!: Custom Grain object files #2104

Open
wants to merge 20 commits into
base: main
Choose a base branch
from
Open

Conversation

ospencer
Copy link
Member

@ospencer ospencer commented May 4, 2024

Overview

This PR replaces Grain's intermediate .gr.wasm files with Grain object files. This is for a variety of reasons, including:

  • Debuggability. Our wasm-based linker is unable to reconstruct debugging/sourcemap information from wasm binaries. This approach once again generates proper sourcemaps. There is additional work to be done to make the sourcemaps truly usable, but this is a major step in the right direction.
  • Optimizations. This change will allow us to do "whole program" optimizations to better assist Binaryen with its optimizations.
  • Performance. Going from our IR to Binaryen IR and serializing is the bulk of compilation time. Since we no longer are serializing/deserializing many files, there is a performance boost. For a hello world program, cold compile time is down 10% and warm compile time is down 25%. When running in JS, cold compile time is down 15% and warm compile time is down 21%. This also gets subsecond warm builds in JS on my machine.

Technical details

Grain object files use the .gro extension. As no wasm files are specific to Grain, the default output filename from the compiler uses the .wasm extension rather than .gr.wasm.

Object files follow this layout:

magic version len version cmi length cmi mash_code
0xF0 0x9F 0x8C 0xBE 5 0.6.3 1024 <cmi> <mash_code>

We use OCaml's Marshal module as our binary format (and for Whole Grain, we can use our Marshal module). js_of_ocaml can't marshal floats into a format that can safely be read back by native code. I wanted object files to be the same between native and js, so we store floats as int64s in the mashtree to guarantee consistent behavior.

Reviewing this PR

The code delta for this PR is fairly small, but snapshot tests are now sexprs of mashtrees. As such, we have all new snapshots. I'd recommend clicking the first commit to review the code diff, or using the GitHub UI to hide changes to snapshots.

Closes #622
Closes #842
Closes #1754
Closes #1903
Closes #1997

@@ -377,6 +377,6 @@ describe("basic functionality", ({test, testSkip}) => {
~config_fn=smallestFileConfig,
"smallest_grain_program",
"",
4769,
6424,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this go up?

Is it because of names changing and debug info?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to leave a comment about this; thanks for pointing it out. The old linker had a concept of hard/soft dependencies. If a file was a dependency but no values/functions from that module were actually uses (just types or an unused import), then the linker would compile in type information but not link in the actual files.

At first thought this seems like good behavior, but if this is done, side effects from importing those modules don't happen, which is bad. For example, a normal Grain program that doesn't use anything in Pervasives doesn't link in Pervasives, which means the setupExceptions side effects don't happen, and throwing exceptions in those programs prints GrainError instead of the actual error. In practice, users don't run into this because most programs use something from Pervasives.

This new linker links in all dependencies, which is more correct. As I mentioned, in practice this doesn't make a real difference for most users. Side note: the major reason the original implementation did this was because the compiler wasn't great at removing unused code from the resulting binary and this resulted in large module sizes. The compiler has gotten much better and this isn't really an issue anymore.

I also discovered a few more configurations producing smaller programs while working on this, as small as 25 bytes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay then this probably closes #1903 then.

@ospencer ospencer force-pushed the oscar/gro-linking branch 2 times, most recently from 92e6423 to 89fcb11 Compare May 4, 2024 16:15
@ospencer ospencer changed the base branch from main to oscar/digest May 4, 2024 16:19
compiler/src/codegen/transl_anf.re Show resolved Hide resolved
compiler/src/codegen/mashtree.re Outdated Show resolved Hide resolved
compiler/src/typed/cmi_format.re Show resolved Hide resolved
compiler/src/codegen/compcore.re Show resolved Hide resolved
Base automatically changed from oscar/digest to main July 28, 2024 22:04
@ospencer
Copy link
Member Author

@alex-snezhko This should be ready for you to look at again.

Copy link
Member

@alex-snezhko alex-snezhko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not see any obvious problems, LGTM!

Function.set_start(wasm_mod, start);
} else {
ignore @@
Export.add_function_export(wasm_mod, "_start", Comp_utils.grain_start);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: inconsistent use of grain_start constant and "_start" literal (a few other times in the lines above)

Copy link
Member

@spotandjake spotandjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks great to me, I love the new implementation but I have a few questions.

We should probably also update the contributer docs in this pr so they do not get too far out of date.

compiler/src/codegen/compcore.re Show resolved Hide resolved
compiler/src/codegen/compcore.re Show resolved Hide resolved
compiler/src/codegen/compcore.re Show resolved Hide resolved
compiler/src/codegen/compcore.re Outdated Show resolved Hide resolved
compiler/src/codegen/emitmod.re Show resolved Hide resolved
compiler/src/codegen/emitmod.re Outdated Show resolved Hide resolved
compiler/src/codegen/emitmod.re Show resolved Hide resolved
compiler/src/codegen/emitmod.re Show resolved Hide resolved
compiler/src/codegen/linkedtree.re Outdated Show resolved Hide resolved
compiler/src/typed/cmi_format.re Show resolved Hide resolved
Copy link
Member

@spotandjake spotandjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, can we also open an issue to update the linking section in the contributer docs, we should probably include the binary format there, and some stuff from the pr description.

@ospencer ospencer force-pushed the oscar/gro-linking branch 3 times, most recently from d6920a8 to e4632f3 Compare December 13, 2024 00:58
Copy link
Member

@spotandjake spotandjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me, I had one tiny question and it looks like some of the tests are failing currently.

I love that we are using an exceptionMod flag it seems a lot less fragile then file paths.

Before we get this merged could we update the static linking contributor docs I think the pr description covers a lot of the content we would want.

compiler/src/codegen/compcore.re Show resolved Hide resolved
@@ -149,6 +149,7 @@ let next_state = (~is_root_file=false, {cstate_desc, cstate_filename} as cs) =>
Grain_utils.Config.apply_attribute_flags(
~no_pervasives=has_attr("noPervasives"),
~runtime_mode=has_attr("runtimeMode"),
~no_exception_mod=has_attr("exceptionMod"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about "noExceptionMod" to keep with the pattern of "noPervasives"? I assume you did this because currently this attribute would only be used on the exception module so it is taken to mean "this is the exception module", but "noExceptionMod" more accurately reflects what the flag is actually doing I feel.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had thought of noExceptions but I thought that was too ambiguous, but I've come around on it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@ospencer
Copy link
Member Author

@spotandjake I updated the contributor docs.

Copy link
Member

@spotandjake spotandjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great oscar, im excited to get this into main I had one tiny NIT left but otherwise lgtm.

@@ -1,5 +1,5 @@
@noPervasives
@exceptionMod
@noExceptions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One tiny NIT, do we just want to add a comment here something like, // Exceptions cannot be thrown from the exception module or we would have a circular dependency.?

Copy link
Member

@marcusroberts marcusroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I really like the new approach, and the code read fine - I'll not spot any bugs though but looking forward to this being the new way of generating code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants