Releases: tetratelabs/wazero
v1.3.1
Aren't you excited for the imminent release of Go 1.21 with GOOS=wasip1 GOARCH=wasm? We are!
wazero 1.3.1 is a patch release with important changes. No public APIs have been affected in any way, but we improved portability and moved support for nonblocking I/O further forward, especially on Windows!
Remember to star wazero, and check our community for other cool wazero users like you!
Improve Support to Cross-Compilation
@SlashScreen noticed (#1578) that builds for GOOS=plan9 were failing due to some missing constant definitions; this is somewhat related to an earlier issue reported by @supermigo (#1526) that prevents building wazero against the new GOOS=wasip1 GOARCH=wasm target pair with Go 1.21.
@codefromthecrypt has led a spike (#1582, #1584, #1586, #1588) of refactorings to decouple our core from the direct dependency on the syscall package, which is highly platform-specific. For instance, some error codes or flags exist only on some platforms, and referring to such constants in shared code paths prevents a successful cross-platform build. #1582 introduces sys.Errno as an abstraction over platform-specific error codes, and #1586 defines fsapi.Oflag to abstract over file open flags.
If you are interested in that kind of oddity, support for GOOS=wasip1 will allow you to run a wasm wazero on a native wazero because self-hosting is the ultimate test for a runtime, but also because... why not? 🔥
Improve Nonblocking I/O
This release is also improving support for nonblocking I/O, first by fixing a bug (#1538) that was originally addressed with #1542, but ultimately was not completely resolved. @johanbrandhorst reported that the Go CI was still producing corrupted output when writing to a redirected stdout. @evacchi uncovered the root cause of the bug rather quickly; in fact, most time was spent writing an automated test case that reproduced the conditions of the Go CI (i.e. spawning a stand-alone wazero process and hooking that stdout to a buffer). This was fixed with #1581.
Notably, @evacchi moved issue #1500 (nonblocking I/O on Windows) another step closer to resolution by emulating select using a few different Windows APIs. Details on how this has been realized are described in #1579 and are summarized in /RATIONALE.md. Further work is still needed to close the gap in poll_oneoff.
Minor changes
- @abraithwaite contributed the RunReveal project to the community page (#1577)
- @codefromthecrypt bumped the CI version of Go to the latest Go 1.21rc3 (#1589)
v1.3.0
wazero 1.3.0 is ready for next month's release of Go 1.21.
The new GOOS=wasip1 GOARCH=wasm will be very popular: It is Go's first WebAssembly platform for non-browser use. A %.wasm file compiled with Go 1.21 runs fine in wazero. This is thanks to efforts on both sides, including CI with gotip and the latest release candidate (1.21rc2).
Go 1.21 is already bringing new developers to WebAssembly: We suggest you unconditionally upgrade in anticipation of demand. Besides this and bug fixes, you may also be interested in our new sys.Stat_t type used for fs.FS integration.
Don't forget to star wazero and any project relevant to you, made by our community. Let's get into the details of what's new!
Go 1.21 ready!
wazero has two relationships to Go 1.21:
- Can wazero be a dependency of a project compiled with Go 1.21?
- Can wazero run wasm compiled with Go 1.21's GOOS=wasip1 GOARCH=wasm?
We have made significant progress on both these points. We now run tests on every change with the latest Go 1.21 release candidate and gotip (last commit refreshed weekly).
wazero tests now run with go1.21rc2
wazero includes logic conditional on Go versions, notably to dodge known problems on Windows with Go 1.18. This logic was made more flexible to be forwards compatible with any subsequent version. There was also a platform difference in setting file times, fixed by @evacchi. wazero now runs tests with the latest release candidate on every change, as well as everything it was testing before.
Go no longer skips wasip1 tests if the runtime is wazero
Before, there were a couple of standard library tests skipped on wazero. Notably, these were around non-blocking I/O (os.Stdout and io.Pipe()) and pre-opened sockets. @chriso and @evacchi collaborated to both fix these and also remove the special casing in golang/go. Edo even went further, improving code not tested upstream, such as named pipes on Windows!
Go and TinyGo share the same heuristics when reading directories
Many compilers use wasi-libc directly to implement functions such as readdir. Go doesn't rely on this, so all logic in GOOS=wasip1 is implemented directly. TinyGo's -target=wasi is a hybrid where some features are implemented in Go and others imported from wasi-libc.
@GeorgeMac noticed some inconsistency when wrapping file systems due to this, where files without inodes were filtered out. Details such as this are not defined in wasip1: while POSIX has something to say, the waters are murky. After a long and detailed investigation by @achille-roussel and @codefromthecrypt, three things happened:
- wazero improved its /RATIONALE.md dramatically around inodes.
- @achille-roussel championed consistent behavior ending in merged pull requests both to Go and TinyGo.
- wazero added a new way to control stat info, discussed below.
Custom file stat
fs.FileInfo returned from Stat can optionally include raw info from its Sys() method. In Unix systems, this returns *syscall.Stat_t, allowing access to more timestamps and the file's inode.
However, *syscall.Stat_t is not platform agnostic: its field lengths vary, and the type doesn't exist in Windows or virtual filesystems such as go:embed.
Our first answer to this problem is to define and process a new type, *sys.Stat_t, if returned from info.Sys(). This allows developers to intercept *os.File's Readdir or any fs.File.Stat functions to backfill inode information. For example, if a Readdir returns file info with this, it can set a non-zero inode. This prevents a performance penalty caused by wasi-libc, which would otherwise fan out with a Stat call for each directory entry.
It is understood this first feature is not for most integrators. However, it is an important milestone for filesystem performance and customization. wazero 1.4 will promote our internal writeable API to experimental, allowing full control over filesystem features used by wasip1.
Minor changes
There were numerous small changes in 1.3, thanks to folks who brought them to our attention. Notable call out to @abraithwaite, @ncruces, @leighmcculloch who helped us understand edge cases on resource cleanup and close state concerns.
- fixed a performance regression on fd_readdir since 1.2.0
- adds integration tests for TinyGo and GOOS=wasip1 covering custom file systems.
- added IsClosed() to api.Module for better error handling when a module exits during initialization (_start).
- numerous improvements in trap code by @ncruces.
- compiler memory leak mitigation by @inkeliz.
- experimental: Adds CloseNotifier for custom cleanup tasks.
- experimental: Removes HTTP and filesystem features only supported in GOOS=js.
v1.2.1
wazero 1.2.1 is an easy upgrade decision for everyone: There is no public API change of any kind, yet updating gives you a more portable and faster wasm runtime.
This release was helped and motivated by friendly projects in our ecosystem, please star their repos!
- go needs non-blocking I/O for the new GOOS=wasip1, for example, running an HTTP server with middleware, using only one thread.
- go-sqlite3 wants smaller and faster machine code to make queries more performant.
- wasi-go collaborated on a different approach to non-blocking I/O.
As usual, we have a hefty amount of background for you to enjoy, even if you aren't a wazero user, yet. Please star our repo if you like it!
Progress in non-blocking I/O
WebAssembly does not yet support true parallelism; it lacks support for multiple threads, atomics, and memory barriers. This may be addressed portably in the future, when the threads proposal standardizes and common concerns like garbage collection employ it. For now, the only way to safely use wasm is sequentially, like how GOMAXPROCS=1 works in Go.
This isn't a problem in a lot of cases, but in I/O it can be. Network services often need to accept new connections while processing existing ones. Interleaving of this isn't viable with blocking I/O. The typical way to address this is using non-blocking I/O, with a loop that looks for channels which are ready and processes accordingly.
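The readiness loop described above can be sketched in plain Go. This is only an analogy: buffered Go channels stand in for sockets or file descriptors, and pollOnce is a hypothetical helper name. A real WASI guest would ask the host which descriptors are ready (for example via poll_oneoff) rather than use Go's select.

```go
package main

import "fmt"

// pollOnce checks each source without blocking and handles whatever is
// ready. It returns how many messages were processed in this pass.
func pollOnce(sources []chan string, handle func(i int, msg string)) int {
	n := 0
	for i, src := range sources {
		select {
		case msg := <-src:
			handle(i, msg)
			n++
		default:
			// Not ready: move on instead of blocking the only thread.
		}
	}
	return n
}

func main() {
	a, b := make(chan string, 1), make(chan string, 1)
	b <- "request on b" // only the second source has data

	n := pollOnce([]chan string{a, b}, func(i int, msg string) {
		fmt.Printf("source %d: %s\n", i, msg)
	})
	fmt.Println("processed:", n)
}
```

With blocking reads, the loop would stall on the first idle source and never reach the second one, which is the interleaving problem the paragraph above refers to.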
As mentioned in the 1.2.0 release, non-blocking I/O is a work in progress, and it is improved in 1.2.1. It improved so much that @chriso was able to remove skips for HTTP and file-based non-blocking tests in the emerging GOOS=wasip1 in Go 1.21.
@evacchi focused on the problem of non-blocking I/O, both network sockets and files. He both invented some solutions and leveraged existing approaches from wasi-go to fix issues such as inability to use HTTP middleware in Go source compiled to wasm. To make sure it worked, he landed new tests, for example C code compiled with wasi-libc and go(tip) compiled with GOOS=wasip1.
The changes are all transparent to end users, and while Edo led this effort, @achille-roussel and @chriso deserve a large thank you for support and for some prior art in wasi-go.
One final note is that the battle is not over. We still have work to do on Windows, and surely there will be more edge cases. Please track issue 1500 and add comments if you notice any more glitches.
Significant execution time savings
While developing go-sqlite3, @ncruces noticed some opportunities both to save size in machine code and performance. This focused on "traps" which are unresolvable execution errors that happen in cases such as divide by zero or invalid bounds conditions. The basic idea was to centralize the concern, so that any instruction that could trap uses the same way out of machine code.
While developing this, Nuno found a glitch: fast-pathing these cases can interfere with source code mapping. For example, if you were using a debugger or a DWARF-enabled trace, the line numbers could be wrong. After discussing with others, a pragmatic way out was chosen: optimize when there is either no debug information (usually the case in release builds) or RuntimeConfig.WithDebugInfoEnabled is set to false. The latter is helpful, because it can be difficult to prevent a compiler from obfuscating function names, so some may always use debug builds. This is handy in the case of a rare bug such as a nil pointer, because you can still identify the original source function by name.
While led by @ncruces, others were involved in review and design. This was made easier because Nuno kept excellent notes and comments, and also made speedtest1 to test results. @achille-roussel and @mathetake contributed feedback, and the rigor involved made it a lot easier for @evacchi to port the implementation over to arm64.
The end results are really quite excellent, especially as debug info is rarely used in release builds. For example, an unrelated project kube-scheduler-wasm-extension performance improved with real data, up to 6.5%, with no other changes except upgrading to the latest patch.
Note on TinyGo v0.28.1
We updated our TinyGo examples to v0.28.1, which supports more features (such as ReadDir and json), and more idiomatic wasm import signatures. A lot went into this release, so please thank the TinyGo team with a star!
- //go:wasm-module env
- //export log
+ //go:wasmimport env log
func _log(ptr, size uint32)
v1.2.0
wazero 1.2.0 includes 33 days of valiant effort towards performance, interop and debug goals, shared not only in wazero, but WebAssembly in general. We appreciate folks waiting a couple more days than usual and expect you'll enjoy what you see below.
While we haven't set a formal post 1.0 release cadence, you can expect another patch or minor within a month. Meanwhile, this is our most performant and best tested release yet. As always, star all the projects that interest you, and say thanks when you can.
Performance
Performance is something we aim to always improve, release to release. This includes looking at memory usage as well as latency. While there were multiple people involved in efficiency work, @achille-roussel and @lburgazzoli deserve special call outs for leading efforts, and @ncruces deserves a big pat on the back for contributing reviews, cleanups and advice.
@achille-roussel made many changes internal to our compiler, as well as Linux-only specializations such as using huge pages for the memory mapped regions under wasm functions. These were all profile and benchmark guided changes, proposed with great rigor.
@lburgazzoli tracked down best practice in TinyGo, consolidating advice from various experts, such as the primary developer of TinyGo @aykevl. He worked with @ncruces to make sure our allocation example is not just a code snippet, but an exemplar of good practice, without risk of memory leaks and performance validated with benchmarks.
The combination of backend work (e.g. runtime improvements) and frontend work (e.g. changes to our TinyGo example) resulted in a notable holistic gain across the board. This was true teamwork and a job well done!
$ benchstat v1.1.0.txt v1.2.0.txt
goos: darwin
goarch: arm64
pkg: github.com/tetratelabs/wazero/internal/integration_test/vs/compiler
│ v1.1.0.txt │ v1.2.0.txt │
│ sec/op │ sec/op vs base │
Allocation/Compile-12 3.365m ± 1% 3.174m ± 1% -5.66% (p=0.002 n=6)
Allocation/Instantiate-12 149.1µ ± 26% 120.4µ ± 7% -19.23% (p=0.002 n=6)
Allocation/Call-12 1.404µ ± 2% 1.297µ ± 2% -7.66% (p=0.002 n=6)
geomean 88.97µ 79.13µ -11.05%
│ v1.1.0.txt │ v1.2.0.txt │
│ B/op │ B/op vs base │
Allocation/Compile-12 2.404Mi ± 0% 1.292Mi ± 0% -46.24% (p=0.002 n=6)
Allocation/Instantiate-12 319.4Ki ± 0% 230.5Ki ± 0% -27.84% (p=0.002 n=6)
Allocation/Call-12 48.00 ± 0% 48.00 ± 0% ~ (p=1.000 n=6) ¹
geomean 33.28Ki 24.27Ki -27.07%
¹ all samples are equal
│ v1.1.0.txt │ v1.2.0.txt │
│ allocs/op │ allocs/op vs base │
Allocation/Compile-12 1.830k ± 0% 1.595k ± 0% -12.84% (p=0.002 n=6)
Allocation/Instantiate-12 803.0 ± 0% 508.0 ± 0% -36.74% (p=0.002 n=6)
Allocation/Call-12 5.000 ± 0% 5.000 ± 0% ~ (p=1.000 n=6) ¹
geomean 194.4 159.4 -18.00%
¹ all samples are equal
Interop
Compatibility is a moving target as both specifications and our understanding of them change. For example, the WebAssembly Core Specification 2.0 remains in a draft state, and expectations of the VM change as it changes. Also, the de facto WASI version preview1 (a.k.a. wasip1) had no tests, nor detailed documentation for the first several years of its existence. This left interop as more a quorum of implementation practice than a spec. While new initiatives such as the wasi-testsuite and wasix aim to stabilize this, WASI compatibility remains a source of work for wazero maintainers and compiler developers. We really appreciate the efforts spent here to keep as many users unaware of these glitches as possible.
On the WebAssembly Core (VM) side, we appreciate @mathetake updating our code and spec suite to pass latest changes there. Also, we appreciate an attempt by @anuraaga with support by @ncruces on the Threads proposal, despite us ending up parking the idea until the proposal finishes.
On the WASI side, we appreciate a lot of work driven by the team working on Go. Specifically, the upcoming GOOS=wasip1 planned for 1.21 helped reveal a number of grey areas that required work to support in Go without breaking other languages. Championing these came from various team members including @Pryz, @achille-roussel and @evacchi on various file rights and non-blocking related glitches, some fixing other language runtimes such as Python.
We're also excited that @evacchi began an experiment to support sockets, currently working for blocking requests. As wasm only has one thread to use, libraries often need non-blocking functionality to do anything notable. We'll report more on sockets once non-blocking glitches sort out.
Meanwhile, those using sockets know that the preview1 version of WASI is extremely limited. There are other ABIs such as wasmedge_wasi_socket, wasi-sockets and most recently wasix. All of these go beyond the simple TCP sock accept, read/write in wasip1. If you'd like bleeding edge socket support, please try wasi-go and request the features you want to experiment with. This project is a layer over wazero with an alternate syscall layer. wasi-go can move faster because it has fewer constraints than upstream, such as Windows or virtual files. This makes it a lower risk and ideal playground to develop evolving host function (ABI) specifications. As these functions mature, what makes sense to build in will land upstream in wazero.
Debug
We are very excited to power the only known out-of-browser CPU and memory profiler for WebAssembly, wzprof. wzprof brings the power of pprof to wasm, regardless of if the source language is Go or not.
wzprof has already served a lot of benefits in its short life so far. For example, the kube-scheduler-wasm-extension used it to isolate a garbage collection problem endemic in large protobuf decoders compiled to wasm.
Implementing this required significant work in wazero, which we are happy went upstream! @pelletier added the source offset to experimental.StackIterator which allows source-mapping in debugging use cases. @achille-roussel polished the experimental.FunctionListener to be more performant, including optimizing its API around errors and removing context propagation. @chriso and @mathetake helped fix some glitches along the way. Finally, @achille-roussel made it easier to develop 3rd party listeners by adding experimental.FunctionListenerFactory to supply them and wazerotest.Module to test them.
While we don't expect a lot of people to implement listeners, it was a great team effort to get the substrate together to the point you can build a profiler on top of it. Kudos especially to the wzprof team on finally giving wasm developers a decent profiler!
What's next?
In the short term, we'll try to close the gaps on non-blocking I/O inside WASI. We still aim to have a fully pluggable filesystem soon, including an in-memory option for those needing to provide something like tmpfs from Go. Of course compatibility issues and user demands will take priority as they always do.
Longer term, @mathetake is taking on the task of an optimizing compiler affectionately named wazevo. This will likely take a year to mature, and will narrow the performance gap on certain libraries like libsodium without adding any platform dependencies whatsoever.
Meanwhile, if you want updates you can always contact the community and ask, or just wait for the next release. Until next time!
v1.1.0
wazero 1.1.0 improves debug, reduces memory usage and adds new APIs for advanced users.
1.1 includes every change from prior versions. The below elaborates the main differences, brought to you by many excellent engineers who designed, reviewed, implemented and tested this work. Many also contribute to other areas in Go, TinyGo and other languages, as well as specifications. If you are happy with the work wazero is doing for Go and Wasm in general, please star our repo as well as any projects mentioned!
Now, let's dig in!
Debug
This section is about our debug story, which is better than before thanks to several contributors!
Go stack trace in the face of Go runtime errors
When people had bugs in their host code, you would get a high-level stack trace like below
2023/04/25 10:35:16 runtime error: invalid memory address or nil pointer dereference (recovered by wazero)
wasm stack trace:
env.hostTalk(i32,i32,i32,i32) i32
.hello(i32,i32) i64
While helpful, especially to give the wasm context of the error, this didn't point to the specific function that had the bug. Thanks to @mathetake, we now include the Go stack trace in the face of Go runtime errors. Specifically, you can see the line that erred and quickly fix it!
2023/04/25 10:35:16 runtime error: invalid memory address or nil pointer dereference (recovered by wazero)
wasm stack trace:
env.hostTalk(i32,i32,i32,i32) i32
.hello(i32,i32) i64
Go runtime stack trace:
goroutine 1 [running]:
runtime/debug.Stack()
/usr/local/go/src/runtime/debug/stack.go:24 +0x64
github.com/tetratelabs/wazero/internal/wasmdebug.(*stackTrace).FromRecovered(0x140001e78d0?, {0x100285760?, 0x100396360?})
/Users/mathetake/wazero/internal/wasmdebug/debug.go:139 +0xc4
--snip--
experimental.InternalModule
Stealth Rocket are doing a lot of great work for the Go ecosystem, including implementation of GOOS=wasip1 in Go and various TinyGo improvements such as implementing ReadDir in its wasi target.
A part of success is the debug story. wazero already has some pretty excellent logging support built into the command-line interface. For example, you can add -hostlogging=filesystem to see a trace of all sys calls made (e.g. via wasi). This is great for debugging. Behind the scenes, this is implemented with an experimental listener.
Recently, @pelletier added StackIterator to the listener, which allows inspection of stack value parameters. This enables folks to build better debugging tools as they can inspect which values caused an exception for example. Since it can walk the stack, propagation isn't required to generate images like this, especially as listeners can access wasm memory.
However, in practice, stack values and memory aren't enough. For example, Go maintains its own stack in the linear memory, instead of using the regular wasm stack. The Go runtime stores the stack pointer in global 0. In order to retrieve arguments from the stack, the listener has to read the value of global 0, then the memory. Notably, this global isn't exported.
To work around this, Thomas exposed an experimental interface experimental.InternalModule which can inspect values given an api.Module. This is experimental as we still aren't quite sure if we should allow custom host code to access unexported globals. However, without this, you can't effectively debug either. If you have an opinion, please join our slack channel and share it with us! Meanwhile, thank @pelletier and Stealth Rocket in general for all the help in the Go ecosystem, not just their help with wazero!
Memory Usage
This section describes an advanced internal change most users don't need to know about. Basically, it makes wazero more memory efficient. If you are interested in details, read on, otherwise thank @mathetake for the constant improvements!
Most users of wazero use the implicit compiler runtime configuration. This memory maps (mmap syscall) the platform-specific code wazero generates from the WebAssembly bytecode. Since it was written, this used mmap once per function in a module.
One problem with this was page size. Basically, mmap can only allocate in multiples of the page size of the underlying OS. For example, a very simple function can be as small as several bytes, but would still reserve a whole page (marked as executable and not reusable by the Go runtime). Therefore, each function wasted roughly osPageSize - (len(body) % osPageSize) bytes.
The new compiler changes the mmap scope to module. Even though we still need to align each function on a 16-byte boundary when mmapping per module, the wasted space is much less than before. Moreover, with the code behind functions managed at module scope, it can be cleaned up with the module. We no longer have to abuse runtime.SetFinalizer for cleanup.
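A small worked example makes the difference concrete. The sketch below compares padding when every function body gets its own page-rounded mapping against one mapping per module with 16-byte alignment between functions. The 4096-byte page size, the body sizes and the helper names are assumptions chosen for illustration, not measurements from wazero.

```go
package main

import "fmt"

// roundUp rounds n up to the next multiple of align (align must be > 0).
func roundUp(n, align int) int { return (n + align - 1) / align * align }

// perFunctionWaste models one mapping per function: each body is rounded
// up to a whole page, and the padding is wasted.
func perFunctionWaste(bodies []int, pageSize int) int {
	waste := 0
	for _, b := range bodies {
		waste += roundUp(b, pageSize) - b
	}
	return waste
}

// perModuleWaste models one mapping per module: bodies are packed with
// 16-byte alignment and only the whole region is rounded up to a page.
func perModuleWaste(bodies []int, pageSize int) int {
	used, total := 0, 0
	for _, b := range bodies {
		total = roundUp(total, 16) + b // align each function start to 16 bytes
		used += b
	}
	return roundUp(total, pageSize) - used
}

func main() {
	bodies := []int{10, 100, 1000} // hypothetical machine code sizes in bytes
	fmt.Println(perFunctionWaste(bodies, 4096)) // prints "11178"
	fmt.Println(perModuleWaste(bodies, 4096))   // prints "2986"
}
```

In this toy case, three pages shrink to one. Real modules have far more functions, so the per-function padding adds up much faster than the alignment padding does.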
Those using Go benchmarks should see improved compilation performance, even if they appear to show more allocations than before. One tricky thing about Go benchmarks is they can't report what happens via mmap. The net result for wazero is less wasted memory, even if you see slightly more allocations when compiling than before. These allocations are a target of GC and should be negligible in a long-running program compared with the wasted page problem in the prior implementation, as that waste persisted until the compiled module closed.
In summary, this is another example of close attention to the big picture, even numbers hard to track. We're grateful for the studious eyes of @mathetake always looking for ways to improve holistic performance.
Advanced APIs
This section can be skipped unless you are really interested in advanced APIs!
Function.CallWithStack
WebAssembly is a stack-based virtual machine. Parameters and results of functions are pushed and popped from the stack, and in Go, the stack is implemented with a []uint64 slice. Since before 1.0, authors of host functions could implement exported functions with a stack-based API, which both avoids reflection and reduces allocation of these []uint64 slices.
For example, the below is verbose, but appropriate for advanced users who are ok with the technical implementation of WebAssembly functions. What you see is 'x' and 'y' being taken off the stack, and the result placed back on it at position zero.
builder.WithGoFunction(api.GoFunc(func(ctx context.Context, stack []uint64) {
x, y := api.DecodeI32(stack[0]), api.DecodeI32(stack[1])
sum := x + y
stack[0] = api.EncodeI32(sum)
}), []api.ValueType{api.ValueTypeI32, api.ValueTypeI32}, []api.ValueType{api.ValueTypeI32})
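To show what happens to the values in a host function like the one above, here is a standalone sketch with no wazero dependency. encodeI32 and decodeI32 are hypothetical local helpers, assumed to behave like the api.EncodeI32 and api.DecodeI32 calls: a 32-bit integer occupies the low 32 bits of a uint64 stack slot.

```go
package main

import "fmt"

// encodeI32 stores an int32 in the low 32 bits of a uint64 stack slot.
// Hypothetical helper for this sketch only.
func encodeI32(v int32) uint64 { return uint64(uint32(v)) }

// decodeI32 reads the low 32 bits of a stack slot back as an int32.
func decodeI32(v uint64) int32 { return int32(v) }

// add takes its parameters from the stack and writes the result to
// position zero, like the host function shown above.
func add(stack []uint64) {
	x, y := decodeI32(stack[0]), decodeI32(stack[1])
	stack[0] = encodeI32(x + y)
}

func main() {
	stack := []uint64{encodeI32(-5), encodeI32(3)}
	add(stack)
	fmt.Println(decodeI32(stack[0])) // prints "-2"
}
```

Going through uint32 first avoids sign-extending negative values into the upper half of the slot.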
This solves the problem for host functions that don't make callbacks. Calls from the host into wasm were another matter. For example, even though the normal api.Function call doesn't use reflection, the below would allocate both a slice for params and one for the result.
Here's how the current API works to call a calculator function
results, _ := add.Call(ctx, x, y)
sum := results[0]
Specifically, both the size and ptr would be housed by a slice of size one. Code that makes a lot of callbacks from the host spends resources on these slices. @inkeliz began a design of a way out. @ncruces took the lead on implementation with a lot of feedback from others. This resulted in several designs, with the below chosen to align with the semantics of how host functions are defined.
wazero now has a second API for calling an exported function, Function.CallWithStack.
Here's how the stack-based API works to call a calculator function
stack := []uint64{x,y}
_ = add.CallWithStack(ctx, stack)
sum := stack[0]
As you can see above, the caller provides the stack, eliminating implicit allocations. The results of this are more significant on calls that are in a pattern, where the stack is re-used for many calls. For example, like this:
stack := make([]uint64, 4)
for i, search := range searchParams {
// copy the next params to the stack
copy(stack, search)
if err := searchFn.CallWithStack(ctx, stack); err != nil {
return err
} else if stack[0] == 1 { // found
return i // searchParams[i] matched!
}
}
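The allocation difference between the two calling styles can be measured with the standard library. The sketch below does not use wazero: callAlloc and callWithStack are hypothetical stand-ins that mimic a call returning a fresh results slice versus a call writing into a caller-provided stack, and testing.AllocsPerRun counts the heap allocations of each.

```go
package main

import (
	"fmt"
	"testing"
)

// callAlloc mimics an API that returns results in a newly allocated slice.
//
//go:noinline
func callAlloc(x, y uint64) []uint64 {
	return []uint64{x + y}
}

// callWithStack mimics an API that reads params from, and writes the
// result to, a stack owned by the caller.
//
//go:noinline
func callWithStack(stack []uint64) {
	stack[0] = stack[0] + stack[1]
}

// allocsPerCall reports the average heap allocations per call of each style.
func allocsPerCall() (withResults, withStack float64) {
	var sink uint64
	withResults = testing.AllocsPerRun(1000, func() {
		sink += callAlloc(1, 2)[0]
	})
	stack := make([]uint64, 2) // allocated once, reused for every call
	withStack = testing.AllocsPerRun(1000, func() {
		stack[0], stack[1] = 1, 2
		callWithStack(stack)
		sink += stack[0]
	})
	_ = sink
	return withResults, withStack
}

func main() {
	withResults, withStack := allocsPerCall()
	fmt.Println("allocs/call returning a slice:", withResults)
	fmt.Println("allocs/call reusing a stack:  ", withStack)
}
```

The noinline directives keep the compiler from optimizing the returned slice away, so the comparison reflects what happens across a real API boundary.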
While most end users won't ever see this API, those using shared libraries can get wins for free. For example, @anuraaga updated go-re2 to use this internally and it significantly improved a regex benchmark, reducing its ns/op from 350 to 311 and eliminating all allocations.
wazero treats core APIs really seriously and doesn't plan to add any without a lot of consideration. We're happy to have had the help of @inkeliz and @ncruces, as well as the many other participants including @achille-roussel, @anuraaga, @codefromthecrypt and @mathetake.
emscripten.InstantiateForModule
@jerbob92 has been porting various PDF libraries to compile to wasm. The compilation process uses emscripten, which generates dynamic functions. At first, we hard-coded the set of "invoke_xxx" functions required by PDFium. This was a decision to defer design until we understood how common end users would need these. Recently, Jeroen began porting more PDF libraries, including ghostscript and xpdf . These need...
v1.0.3
wazero v1.0.3 optimizes compilation on the amd64 platform and fixes bugs, notably in Windows packaging.
A few days ago, we released wazero v1.0.2, which notably improved compilation performance. A lot of folks jumped on that release and we found a few glitches in the process. Meanwhile @mathetake finished up optimization work on amd64 compilation.
Here are some of the notable fixes.
- Windows (MSI and winget) installers didn't set the PATH correctly, so wazero wasn't a one-step install. @evacchi fixed this by updating our infrastructure to the latest native WiX toolchain, scrubbing some other glitches along the way. Thanks to @inliquid for reporting this and verifying the fix.
- @jerbob92 has been porting various PDF libraries to compile to wasm, and in the process noticed we didn't handle invalid file descriptors properly. @codefromthecrypt changed our internal code to treat file descriptors as signed integers and negative ones as EBADF.
- @jerbob92 also noticed compiler cache corruption when you switch on and off experimental function listeners. @mathetake fixed this bug.
- @kevburnsjr tried to copy/paste some of our godoc examples and they had drifted. Thanks for fixing that!
Thanks so much for being such an engaging community, where opportunities and misses can be actioned so quickly! If you haven't already, please thank our contributors with a star!
v1.0.2
wazero 1.0.2 improves compiler performance, supports non-blocking stdin and adds a couple new experimental APIs.
Many people were involved in a lot of work in the last 3 weeks. Please reach out and thank them!
Improved compiler performance
wazero has a compile phase (CompileModule) which lowers WebAssembly bytecode into an intermediate representation (IR) and into machine code. This process is CPU and memory intensive and has been optimized significantly since 1.0.1.
We used SQLite compiled to wasm to ensure the encouraging improvements were relevant to real-world use cases.
goos: linux
goarch: amd64
pkg: github.com/tetratelabs/wazero/internal/integration_test/bench
cpu: AMD Ryzen 9 3950X 16-Core Processor
│ v1.0.1.txt │ new.txt │
│ sec/op │ sec/op vs base │
Compilation_sqlite3/compiler-32 1001.9m ± 2% 544.0m ± 2% -45.70% (p=0.001 n=7)
Compilation_sqlite3/interpreter-32 208.57m ± 5% 83.81m ± 5% -59.82% (p=0.001 n=7)
│ v1.0.1.txt │ new.txt │
│ B/op │ B/op vs base │
Compilation_sqlite3/compiler-32 305.10Mi ± 0% 55.31Mi ± 0% -81.87% (p=0.001 n=7)
Compilation_sqlite3/interpreter-32 142.24Mi ± 0% 51.77Mi ± 0% -63.60% (p=0.001 n=7)
│ v1.0.1.txt │ new.txt │
│ allocs/op │ allocs/op vs base │
Compilation_sqlite3/compiler-32 5217.0k ± 0% 343.2k ± 0% -93.42% (p=0.001 n=7)
Compilation_sqlite3/interpreter-32 1770.43k ± 0% 14.00k ± 0% -99.21% (p=0.001 n=7)
The changes that brought the above included a series of refactorings by @evacchi on union types, as well as dozens of optimizations by @mathetake, and a couple by @ckaznocha.
All of this was easier due to frequent and thorough advice by @achille-roussel and our latest core maintainer @ncruces. Thanks to all involved for the epic improvement in less than 3 weeks!
Non-blocking stdin
container2wasm is an interesting project that converts containers such that they can run in a WebAssembly runtime, such as a browser or wazero.
One feature this relies on is non-blocking access to STDIN. Like some other runtimes, wazero didn't handle this properly.
Thanks to a lot of effort by @evacchi with advice from @achille-roussel and support from the container2wasm author @ktock, wazero now handles non-blocking STDIN properly (via the select syscall).
Experimental changes
Code in our "experimental" directory isn't under an API guarantee, so can change even in a patch version. Here are a couple new experiments since last release.
- @pelletier added a StackIterator parameter to listeners, allowing inspection of the stack leading to a function call. Thanks to @Pryz for the initial design and background, as this is used for CPU profiling data.
- @codefromthecrypt added emscripten.InstantiateForModule to dynamically build function imports given a CompiledModule. Thanks to @jerbob92 for the idea and testing with various PDF tools.
Fixes and behavior changes
1.0.2 includes some bug fixes:
- @twilly and @mathetake fixed some concurrency and ordering issues closing modules
- @mathetake fixed a bug in
RuntimeConfig.WithMemoryCapacityFromMax
- @ckaznocha fixed a module cleanup related issue.
- @codefromthecrypt fixed a bug in the logging listener
It also includes a couple behavior changes..
- @abraithwaite made it possible to use
errors.Is
for context-done related error cases. - @codefromthecrypt made host functions retain insertion order (instead of lexicographic).
v1.0.1
wazero v1.0.1 fixes a stdio glitch, improves performance and polishes documentation. We decided to cut an early patch mainly to ensure python works properly.
Python repl hang
Despite trying many things prior to v1.0.0, a glitch escaped us. @evacchi tried the VMware Labs python-wasm, and noticed a repl hang. Edo and @achille-roussel collaborated on a fix, which also ended up deleting tricky code. He verified python-wasm works, and @ncruces verified dcraw still works as well. Thanks to these folks for the teamwork and rigor!
Optimizations
Due to the nature of our team, you can expect optimizations in every release. A lot of work by @mathetake has been optimization, both in line count and in performance. Although only several days have passed since v1.0.0, the culmination of work by Takeshi and @evacchi (with review support by @achille-roussel) resulted in less code and a slight bump in performance in an end-user benchmark:
goos: darwin
goarch: arm64
pkg: github.com/dapr/components-contrib/bindings/wasm
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
Example-12 12.11µ ± 2% 12.02µ ± 1% ~ (p=0.132 n=6)
pkg: github.com/dapr/components-contrib/middleware/http/wasm
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
Native/rewrite/rewrite-12 573.7n ± 0% 575.0n ± 0% ~ (p=0.240 n=6)
Tinygo/rewrite/rewrite-12 1.161µ ± 1% 1.155µ ± 1% -0.52% (p=0.026 n=6)
Wat/rewrite/rewrite-12 986.2n ± 0% 988.4n ± 1% ~ (p=0.485 n=6)
geomean 869.3n 869.0n -0.03%
Docs
Our documentation improved in the last few days as well: @jcchavezs fixed some glitches on our home page around trying out wazero, @jerbob92 added PDFium tools to our users page, and @codefromthecrypt implemented @Xe's suggestion to improve our walltime clock documentation. We really appreciate the proactivity on user-facing documentation!
v1.0.0
wazero v1.0.0 completes our six month pre-release period and begins our compatibility promise. We will use semantic versions to label releases, and not break APIs we've exposed as non-experimental.
Those not familiar with wazero can check out this blog which overviews the zero dependency runtime. You can also check out our website especially the community and users pages.
Many of you have been following along with our pre-releases over the last 6 months. We did introduce changes since v1.0.0-rc.2, with a particularly notable feature we call "anonymous modules". So, let's talk about that first.
Anonymous modules
There are two main ways wazero is used for high-volume request handling. One way is pooling modules and the other is instantiating per-request.
The pool approach is used for functions designed to be invoked many times, such as http-wasm's handler functions. A host, such as dapr, keeps a pool of modules, and checks one out per request.
The re-instantiate approach is where you know you can't re-use a module, because the code is not safe to invoke more than once. For example, WASI commands are not safe to re-invoke. So, you have to instantiate a fresh module per request. You can also re-instantiate for higher security on otherwise safe functions.
The latter case was expensive before, because we had to make sure each request had not just a new module, but also a unique name in the runtime. You would see things like this to do that.
// Currently, concurrent modules can conflict on name. Make sure we have
// a unique one.
instanceNum := out.instanceCounter.Add(1)
instanceName := out.binaryName + "-" + strconv.FormatUint(instanceNum, 10)
moduleConfig := out.moduleConfig.WithName(instanceName)
Both allocating a unique name and name-based locks have a cost, and very high throughput use cases, such as event handling, would show some contention around this.
Through a lot of brainstorming and work, @achille-roussel, @ckaznocha and @mathetake found a clever way to improve performance. When a module has no name, it has nothing to export to other modules. Most of the lock contention was around things to export, and an unnamed module is basically a leaf node with no consumer except the host. We could avoid a lot of the more expensive locking by special-casing modules instantiated without a name.
In the end, to improve re-instantiation performance (when you can't pool modules), clear your module name!
- // Currently, concurrent modules can conflict on name. Make sure we have
- // a unique one.
- instanceNum := out.instanceCounter.Add(1)
- instanceName := out.binaryName + "-" + strconv.FormatUint(instanceNum, 10)
- moduleConfig := out.moduleConfig.WithName(instanceName)
+ // Clear the module name so that instantiations won't conflict.
+ moduleConfig := out.moduleConfig.WithName("")
Other changes
There were a myriad of changes from wazero regulars, all of them in the bucket of stabilization, bug fixes or efficiency in general. @achille-roussel, @codefromthecrypt and @jerbob92 put a lot of work into triage on WASI edge cases, both discussion and code. @ncruces fixed platform support for solaris/illumos. @mathetake optimized wazero performance even more than before. @evacchi fixed a really important poll issue.
These were driven by and thanks to community work. For example, @Pryz led feedback and problem resolution for go compiler tests. Both @ncruces on go-sqlite and @jerbob92 on pdfium shared wins and opportunities for next steps.
In short, there was a lot of exciting, relevant work in the short period between rc2 and 1.0.0, and we are lucky for it!
v1.0.0-rc.2
wazero v1.0.0-rc.2 is a stabilizing release, and the last version before 1.0 next week.
wazero 1.0.0 will happen at our release party attended by many contributors present at wasmio in Barcelona. While this is our first community meetup, it won't be our last. Please join us to suggest or help organize subsequent events.
Below is a list of changes; notably new is operating system packaging. Read on to get the full story!
Packaging
@mathetake and @evacchi worked together to publish OS artifacts, which you can see attached to this release. Most of the work was around Windows, as MSI installers need to be signed to avoid warnings. We'll begin distributing wazero via homebrew and winget soon, as well.
Tests
Thanks particularly to @codefromthecrypt and @evacchi, wazero is more tested than it was before, and more than any other runtime we are aware of. We've notably closed gaps not just in WASI, but also in edge cases around Windows and GOOS=js. @mathetake stepped in not just in support of tests, but also by adding a test flow so we can code in confidence.
Website
wazero already has extensive code documentation, examples, low-level RATIONALE and language guides.
We still have work to do on high-level and conceptual documentation. To start, @mathetake added a documentation page, covering architecture and some low-level questions. @evacchi also polished the home page, now that we're focusing a lot more on our CLI user base.
Performance
@mathetake has worked relentlessly to improve performance, especially around initialization of modules. This is analogous to startup time. You can see some of the dramatic improvements below:
$ benchstat v1.0.0-rc.1.txt v1.0.0-rc.2.txt
name old time/op new time/op delta
Initialization/interpreter-32 52.9µs ± 2% 36.1µs ± 2% -31.74% (p=0.000 n=28+25)
Initialization/interpreter-multiple-32 39.9µs ± 2% 39.7µs ±13% ~ (p=0.140 n=29+30)
Initialization/compiler-32 32.2µs ± 7% 28.3µs ± 9% -12.09% (p=0.000 n=30+30)
Initialization/compiler-multiple-32 25.3µs ± 5% 24.5µs ± 8% -3.29% (p=0.000 n=30+26)
Compilation/with_extern_cache-32 206µs ± 2% 200µs ± 2% -2.78% (p=0.000 n=29+30)
Compilation/without_extern_cache-32 6.00ms ± 1% 5.94ms ± 1% -0.95% (p=0.000 n=29+30)
name old alloc/op new alloc/op delta
Initialization/interpreter-32 137kB ± 0% 136kB ± 0% -0.35% (p=0.000 n=30+30)
Initialization/interpreter-multiple-32 137kB ± 0% 137kB ± 0% -0.04% (p=0.000 n=27+27)
Initialization/compiler-32 141kB ± 0% 137kB ± 0% -3.09% (p=0.000 n=30+23)
Initialization/compiler-multiple-32 142kB ± 0% 142kB ± 0% -0.03% (p=0.000 n=27+25)
Compilation/with_extern_cache-32 55.6kB ± 0% 54.6kB ± 0% -1.79% (p=0.000 n=29+30)
Compilation/without_extern_cache-32 1.99MB ± 0% 1.99MB ± 0% -0.12% (p=0.000 n=30+30)
name old allocs/op new allocs/op delta
Initialization/interpreter-32 52.0 ± 0% 38.0 ± 0% -26.92% (p=0.000 n=30+30)
Initialization/interpreter-multiple-32 58.0 ± 0% 57.0 ± 0% -1.72% (p=0.000 n=30+30)
Initialization/compiler-32 42.0 ± 0% 38.0 ± 0% -9.52% (p=0.000 n=30+30)
Initialization/compiler-multiple-32 48.0 ± 0% 47.0 ± 0% -2.08% (p=0.000 n=30+30)
Compilation/with_extern_cache-32 1.10k ± 0% 0.98k ± 0% -10.86% (p=0.000 n=27+30)
Compilation/without_extern_cache-32 32.7k ± 0% 32.6k ± 0% -0.37% (p=0.000 n=30+30)
To support future improvements, we no longer allow importing unnamed modules (moduleName=""). This is an edge case allowed by the spec, but not used in practice. By disallowing this, future versions of wazero can be considerably faster at instantiating anonymous modules than today.
Changes in support of wasm compiled by Go
Some of you know Go builds wasm binaries when the environment variables GOARCH=wasm and GOOS=js are set. We include an experimental package gojs which supports this until Go includes a WASI operating system for at least 2 releases. We support gojs in part due to users who want an alternative runtime besides node.js. The other reason is to help support future development in the Go compiler. Our hope is developers can quickly check behavior between JS and WASI, so that problems are solved quicker.
The main change in this version is moving the gojs directory under the experimental folder. It was always experimental, but that was only stated in documentation. This should help people know that this operating system is temporary until Go supports WASI (GOOS=wasip1) for at least 2 releases.
The other change is exposing gojs.Config with additional Go-specific feature toggles, enabled by default in our CLI. This allows things not defined in WASI to pass, for example functionality about the working directory or user IDs of the process. This allows wazero to pass 100% of the os package tests defined by Go.