Skip to content

c-api: component-model: Values and function calling #10697

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

MangoPeachGrape
Copy link
Contributor

@MangoPeachGrape MangoPeachGrape commented Apr 29, 2025

Only contains primitive values and no docs as of now. Trying to gauge if this is the correct way to approach this.

Some things of note:

  • wasmtime_component_valunion_t::boolean instead of ::bool because keyword.
  • wasmtime_component_valunion_t::f32 and WASMTIME_COMPONENT_F32 instead of float32, should this be changed to be consistent?
  • C++20 needed for syntax used when creating wasmtime_component_val_t in the test, so I bumped to 20 only for tests. Is that fine? Could the whole project be bumped to 20?

Also, is there any better way to test all values, other than copying the test for each kind of value?

@MangoPeachGrape MangoPeachGrape requested a review from a team as a code owner April 29, 2025 16:36
@MangoPeachGrape MangoPeachGrape requested review from pchickey and removed request for a team April 29, 2025 16:36
@github-actions github-actions bot added the wasmtime:c-api Issues pertaining to the C API. label Apr 29, 2025
@ac000
Copy link

ac000 commented Apr 29, 2025

Could the whole project be bumped to 20?

My feeling is that is a bit too recent, even GCC 15.1 just released still defaults to -std=gnu++17 for C++. Though admittedly I don't really know which bits of wasmtime that would effect. Like how would that affect building wasmtime on older (but still supported, thinking Debian, RHEL etc) Linux distributions?.

Also, is there any better way to test all values, other than copying the test for each kind of value?

An array of structs of the values perhaps?

@pchickey pchickey requested review from alexcrichton and removed request for pchickey April 30, 2025 00:00
@MangoPeachGrape
Copy link
Contributor Author

My feeling is that is a bit too recent, even GCC 15.1 just released still defaults to -std=gnu++17 for C++. Though admittedly I don't really know which bits of wasmtime that would effect. Like how would that affect building wasmtime on older (but still supported, thinking Debian, RHEL etc) Linux distributions?.

The feature used was 'designated initializer', which was implemented in gcc 8 and clang 10, in 2018 and 2020 respectively. AFAIK it would only be required for building tests, and when a consumer depends on the wasmtime-cpp project, as the build step for the C-API doesn't build any C++ code.

Feel free to let me know if its too new/not wanted!

@ac000
Copy link

ac000 commented Apr 30, 2025

The feature used was 'designated initializer', which was implemented in gcc 8 and clang 10, in 2018 and 2020

Heh, only 20 years after C!

respectively. AFAIK it would only be required for building tests, and when a consumer depends on the wasmtime-cpp project, as the build step for the C-API doesn't build any C++ code.

Feel free to let me know if its too new/not wanted!

Not that this is really any business of mine, but that's probably OK then, Debian 11 has GCC 10.x and Clang 11.x and RHEL/etc 8 has GCC 8.x

B.t.w thanks for doing this work!

@alexcrichton
Copy link
Member

On the PR at-hand here, one thing I am worried about with values is ownership of lists and transitive pointers. For example the string variant here is currently implemented as an owning pointer which means that even for invocations such as wasmtime_component_func_call it'll require that ownership is passed into the component. It also means that on the "other half", defining a function in a linker, ownership will be given and the calling code will be responsible for deallocating it. More-or-less this is I think the main sticking point of component model values and figuring this out will pave the way for the rest of the API.

Historically I've thought that we want to not have ownership during wasmtime_component_func_call and maybe give ownership when wasm calls the host. That being said this strategy has a large downside of now wasmtime_component_val_t doesn't have static ownership semantics which makes it much more difficult to bind in languages and such. Static ownership semantics definitely makes the type more usable (as implemented here), but less efficient.

Personally I'm sort of leaning towards this-style API that you've prototyped here. It's less efficient than a hypothetical alternative but the "hypothetical alternative" is IMO so unusable it's not worth pursuing (e.g. not being able to write down static semantics for a type and how it's owned).

One way I can perhaps think about this API is that it's similar to wasmtime::component::Val which is intended for expressive power, not efficiency. The downside is that there's no way to implement a "typed" API in C so it means that this would be the only way to invoke a component, which isn't great. This is where I've historically gotten stuck...


For your specific questions though:

wasmtime_component_valunion_t::boolean instead of ::bool because keyword.

sounds reasonable to me yeah

wasmtime_component_valunion_t::f32 and WASMTIME_COMPONENT_F32 instead of float32, should this be changed to be consistent?

The "f32" naming is the right naming to use, we just haven't completed the rename from "float32" to "f32" in all places yet (but if you see them feel free to flag them)

C++20 needed for syntax used when creating wasmtime_component_val_t in the test, so I bumped to 20 only for tests. Is that fine? Could the whole project be bumped to 20?

It seems reasonable to bump this for tests yeah if it works on CI given that it's test-only. I'm not really sure how this would impact consumers since this library is primarily built in Rust and otherwise just shipped as a bunch of headers, but we can handle anything in issues and such. As you say if -DBUILD_TESTS=OFF, which is the default, I think will handle most of the impact here.

Also, is there any better way to test all values, other than copying the test for each kind of value?

Given the dynamic ownership nature of values I think this is basically going to involve a lot of copy/paste. That's where I think C++ dtors can help a lot by reducing the amount of boilerplate, but that would also require binding in C++ APIs. Maybe something where tests are mostly calling a central helper function with a few customized callbacks? The callbacks would customize the module/value per-test in that case maybe?


And as a final thought: a natural dual for this feature will be the ability to define host functions that can be called from wasm (e.g. defining a function in a component linker). I've personally found values tricky enough in the past that I've found it worthwhile to implement both calling wasm and wasm calling the host at the same time. If you'd like I think it'd be reasonable to include the basics of such infrastructure in this PR. If you'd prefer not to, however, I think it's reasonable to defer this to a future PR as well.

@MangoPeachGrape
Copy link
Contributor Author

wasmtime_component_val_t doesn't have static ownership semantics which makes it much more difficult to bind in languages and such. Static ownership semantics definitely makes the type more usable

It's less efficient than a hypothetical alternative

Could you expand on what does the "static ownership semantics" and the "hypothetical alternative" mean?


One idea that popped into my head was to have some helper functions for each type to lift/lower to the raw values, or maybe give access to the raw values? I have zero idea if that would be a valid approach. I might be missing something big, as I don't have much at all knowledge of the existing codebase.

@alexcrichton
Copy link
Member

Oh sure, by static ownership semantics I mean that a type always prescribes a particular way to manage its memory. For example wasmtime_component_t always requires that the caller deallocates it, no questions asked. this is also reflected in your wasmtime_component_val_t where the string variant is "always owned" and whenever it's passed along that's a transfer of ownership.

This is in contrast to thinking I've had historically about component values. For example invoking a component function does not in theory require giving up ownership of anything. You could in theory pass in string pointers that aren't free'd by wasmtime_component_func_call. If we were to go down this road though this is the "hypothetical alternative" where all of a sudden the memory management of wasmtime_component_val_t is dependent on where the value came from. Sometimes it might be transferred by ownership, sometimes not. Personally I think such a system, while possibly more efficient, is too complex to reliably use correctly.

Basically what you've sketched out here is I think a good idea and we should keep it this way.

In terms of prototyping I think it might be useful to, here in this PR, sketch out not only the terminal types of wasmtime_component_val_t but something that also involves recursion, for example a record type. That might help sort out some of these questions perhaps?

@MangoPeachGrape
Copy link
Contributor Author

After thinking about this for few days, I started to dislike passing args as mut, and the forced allocation. For strings requiring the C-API to allocate a wasm_name_t isn't a big issue, as rust can use that allocation, but for lists and other values that isn't possible.

What I tried to achiveve in this iteration is being able to create arguments for a function fully on the stack, and rust to be able to still return values.

In practice this would mean that you need to call wasmtime_component_val_delete() on each return value that rust might have allocated.

Please let me know if you feel like I'm going in the wrong direction!

@alexcrichton
Copy link
Member

This is possible yeah and it's the alternative I mentioned above too. The main downside is that this is quite difficult to bind in other languages because the ownership semantics of wasmtime_component_val_t is not clear. Sometimes you need to call a destructor and sometimes not, and that makes it quite difficult to use correctly in any context, including bindings in other languages.

This is basically where I've always stopped short historically in thinking about what this would look like. I get stuck here not knowing how best to bind component values. I've yet to find a scheme that feels like it balances pros/cons effectively unfortunately :(

My best thinking at this time is that we should have two methods of calling component functions, similar to the wasmtime_func_call{,_unchecked}. One takes wasmtime_component_val_t which is flexible, but slow. This would look mostly like this PR wasmtime_component_val_t but it always has a destructor with it. How exactly that would work out I'm not entirely certain. The "unchecked" path though would take a "canonical ABI blob" which is sort of a packed representation of the canonical ABI of a type, but for the host, and in theory has much lower overhead.

Or... something like that, I really don't have concrete ideas about how best to progress here. I really am worried though about this style of API though in that I don't know how to bind it in languages like Python safely.

@MangoPeachGrape
Copy link
Contributor Author

Would something like this be an acceptable API for bindings in other languages:

wasmtime_component_val_t argument = wasmtime_component_valrecord_new(2);
assert(argument.kind == WASMTIME_COMPONENT_RECORD);
assert(argument.of.record.len == 2);

wasm_name_new(&argument.of.record.ptr[0].name, "first")
argument.of.record.ptr[0].val.kind = WASMTIME_COMPONENT_U32;
argument.of.record.ptr[0].val.of.u32 = 1;

wasm_name_new(&argument.of.record.ptr[1].name, "second")
argument.of.record.ptr[1].val.kind = WASMTIME_COMPONENT_U32;
argument.of.record.ptr[1].val.of.u32 = 1;

wasmtime_component_val_t result = {};

wasmtime_component_func_call(func, &argument, 1, &result, 1);

wasmtime_component_val_delete(&argument);
wasmtime_component_val_delete(&result);

This kind of API would allow the record entries to be both heap allocated (like this example) or stack allocated (not calling wasmtime_component_valrecord_new() and wasmtime_component_val_delete()). I haven't looked at the Python bindings, does it allow having wasmtime_component_val_t on the stack, does that also have to be a pointer to heap?

The "unchecked" path though would take a "canonical ABI blob" which is sort of a packed representation of the canonical ABI of a type, but for the host, and in theory has much lower overhead.

I agree, that would be great in the long run.

Why I leaned towards having the option for stack allocated values is that I thought it would make some cases simpler and faster, like:

wasmtime_component_valrecord_entry_t entry = (wasmtime_component_valrecord_entry_t) {
    .name.ptr = "namee",
    .name.len = 5,
    .val.kind = WASMTIME_COMPONENT_U8,
    .val.of.u8 = 123,
};
wasmtime_component_func_call(
    func,
    &(wasmtime_component_val_t) {
        .kind = WASMTIME_COMPONENT_RECORD,
        .of.record.ptr = &entry,
        .of.record.len = 1
    },
    1,
    &return,
    1
)

(not sure if that's 100% correct C syntax, but the idea should remain)

@alexcrichton
Copy link
Member

I think the trickier parts of memory management are going to come up during host-defined functions being inserted into a linker. For example when arguments are passed to the host, should the host or wasmtime deallocate them? When arguments are returned, does wasmtime need to deallocate them? The precise answer to this question ends up informing how difficult this is to bind in other languages.

For other languages though I don't really have a concrete worry per se. Historically wasmtime_val_t had an allocation inside of it for wasmtime_externref_t which I remember being a huge pain binding. Everything got way easier when it became a bland u32 from a guest bindings perspective. Unfortunately though I don't recall the exact pain and issues that I came up with. I think the only real way to bottom out such a concern would be to write such bindings, but I also don't think that's reasonable to expect of you.


Perhaps though another radical alternative. This is something I've had rattling around in my head for awhile that I keep forgetting about and have also not fully fleshed out. What I'm imagining is something like this:

typedef struct wasmtime_component_vals wasmtime_component_vals_t;
typedef struct wasmtime_component_call wasmtime_component_call_t;

wasmtime_component_call_t *call = wasmtime_component_func_call_start(store, &func);
wasmtime_component_vals_t *params = wasmtime_component_call_params(call);
// set the first parameter, in this case a `u32`
wasmtime_component_vals_set_u32(params, 3);
// set the second parameter, in this case a `string`
wasmtime_component_vals_set_string(params, &my_variable_typed_as_wasm_name_t);
// set the third parameter, in this case a `record point { x: u32, y: u32 }`
wasmtime_component_vals_set_record(params, /*nfields=*/ 2);
  // set "x: 0"
wasmtime_component_vals_set_record_field(params, &x_as_wasm_name_t);
wasmtime_component_vals_set_u32(params, 0);
  // set "y: 1"
wasmtime_component_vals_set_record_field(params, &y_as_wasm_name_t);
wasmtime_component_vals_set_u32(params, 1);

// dispatch the call
wasmtime_component_vals_t *results = wasmtime_component_call_finish(call);

// if the result is a u32
wasmtime_component_vals_get_u32(results, &ret);
// if the result is a string
wasmtime_component_vals_get_string(results, &something_with_wasm_name_t);
// etc ..

The rough idea is that this is a much "chattier" C ABI boundary but is, in theory, much more flexible about where things are stored and how exactly the host represents things. The wasmtime_component_vals_t type is an "iterator" of sorts that walks over the structure of the type tree in tandem with where values are actually being stored in a WebAssembly module's linear memory and such. This would be, in essence, interleaving lowering code with actually storing into wasm linear memory.

Actually implementing this would require new support on the Wasmtime side of things, which would arguably be a good thing as well. Whether or not this is a good idea I don't know, this definitely isn't a fully fleshed out idea. It would, however, remove the need for strict ownership around wasmtime_component_val_t

@MangoPeachGrape
Copy link
Contributor Author

MangoPeachGrape commented May 6, 2025

I think the trickier parts of memory management are going to come up during host-defined functions being inserted into a linker. For example when arguments are passed to the host, should the host or wasmtime deallocate them? When arguments are returned, does wasmtime need to deallocate them? The precise answer to this question ends up informing how difficult this is to bind in other languages.

These are the semantics I've been thinking of:

  • When calling guest function, you own both the arguments and return values, so you need to clean both up (i.e. call wasmtime_component_val_delete()).
  • When guest calls your host function, the arguments and return values will be owned by rust, so if you create a record return value (what would be wasmtime_component_valrecord_new(/* size */) from my last comment) and assign in to the return value array, then the rust side cleans it up after your function has executed.

Edit: Just to be clear (because the current naming of wasmtime_component_val_delete() isn't great), wasmtime_component_val_delete() doesn't actually delete the pointer passed to it, it just calls the destructor on the pointer, which will delete the allocations inside the value, e.g. a record that has allocated entries.

I feel like this is what I had initially, just that you need to clean up the arguments you pass to a function after calling it, which I feel should be fine, as return values would have had to been cleaned up anyway.

Perhaps though another radical alternative. This is something I've had rattling around in my head for awhile that I keep forgetting about and have also not fully fleshed out. What I'm imagining is something like this:

I see the idea, I'll need some time to properly think through it.

I'll try to implement host functions now, to understand the problem better.

How should testing of records be done? I've been doing using this guest component locally:

record ccc {
	value: u64,
	multiplier: u64,
}

export cccc: func(a: ccc) -> ccc;
fn cccc(a: Ccc) -> Ccc {
    Ccc {
        value: a.value * a.multiplier,
        multiplier: a.multiplier,
    }
}

... and then calling it and checking the return value. Is there any resonable way to test these kinds of values without a guest component written in a higher level language?
I saw the /examples/component/ that used a guest component written in rust, should it be done similarly here?

@alexcrichton
Copy link
Member

Those ownership semantics make sense to me, but they're a bit tricky to bind in a higher level language like Python. For example Python will have some sort of Val type which is the union wasmtime_component_val_t under the hood. When calling a function you'd provide a list of Val and get back a list of Val, and that particular one would have to be owned and managed in Python itself. When wasm calls back into the host, in this case Python, then you'd still get a list of Val and produce a list of Val but the ownership is different where Python can't persist the Val beyond the function call, for example, so various pointers would have to be invalidated just before the host function returns. This is all doable, but will be tricky basically.

How should testing of records be done?

Yeah it's possible to use the text format of components, albeit it's a bit verbose. You can get a bit of a feel for the text format for components from this directory

@MangoPeachGrape
Copy link
Contributor Author

Wrote some tests for the complex types, tried to abstract away most of the creation. Followed the tests in tests/all/component_model/ so I duplicated the REALLOC_AND_FREE helper. Not sure if its needed though, as the current tests only require one allocation. Could you look thoroughly through the test code, as I was not at all familiar with the text format.

What's your current thoughts on the design? Do you think its good enough to go forward?

I think one case is not covered: in a host function, setting the return value to the argument value, as doing rets[0] = args[0] would result in a double free. A wasmtime_component_val_clone() should fix this though.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks for being patient I was a bit too busy last week! Overall I think let's commit to this approach. This looks to be workable enough and if it's difficult to integrate into other languages we can tackle that then.

Thank you again very much for working on this!

@MangoPeachGrape MangoPeachGrape force-pushed the c-api/component-model/val branch from b7f9dd6 to e644e3e Compare May 20, 2025 18:26
@MangoPeachGrape
Copy link
Contributor Author

MangoPeachGrape commented May 20, 2025

Oops... thought I could rebase on top of main cleanly because I didn't get any conflicts, sorry.

@MangoPeachGrape MangoPeachGrape force-pushed the c-api/component-model/val branch from 7467bd3 to d9e81ef Compare May 22, 2025 13:59
@MangoPeachGrape
Copy link
Contributor Author

Refactored to use declare_vecs!, but I'm not entirely happy with it:

_new(), _empty(), and _uninit() using an out param, instead of a return value, but I also understand the backward compatibility requirements of the other APIs...

_copy() and _delete() are duplicated for each inner type, like wasmtime_component_vallist_copy() and wasmtime_component_valrecord_copy().
What I've been pondering is would it be simpler to have a "global" wasmtime_component_val_copy() instead? I can't seem to think of scenarios where an inner type copy() would be needed, but they might exist?


Feel free to bikeshed all the names, like:
wasmtime_component_vallist_t vs wasmtime_component_val_list_t (i.e. should there be _)
wasmtime_component_valrecord_entry_t vs wasmtime_component_valrecord_field_t or something else?

I also assume rest of the value types should come in a later PR?

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good to have wasmtime_component_val_copy but I also think it's good to have intermediate helpers as well in case they're needed, so personally I'm ok having both the top-level copy method plus helpers for intermediate ones as needed.

@alexcrichton alexcrichton added this pull request to the merge queue May 22, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 22, 2025
@MangoPeachGrape
Copy link
Contributor Author

Hmm, the doxygen output for the component files don't show up like it does for other files..
See https://docs.wasmtime.dev/c-api/component_2component_8h_source.html

@alexcrichton alexcrichton added this pull request to the merge queue May 22, 2025
@alexcrichton
Copy link
Member

Odd! Not that I'm enough of a doxygen expert to know why...

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 22, 2025
@alexcrichton alexcrichton added this pull request to the merge queue May 23, 2025
@alexcrichton
Copy link
Member

If this bounces again you can put prtest:full in a commit message somewhere and it'll run full CI on this PR instead of just a subset (avoids the need to have me in the middle of the iteration loop)

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 23, 2025
@alexcrichton alexcrichton enabled auto-merge May 23, 2025 14:02
@alexcrichton alexcrichton added this pull request to the merge queue May 23, 2025
Merged via the queue into bytecodealliance:main with commit b2c64de May 23, 2025
160 checks passed
@MangoPeachGrape MangoPeachGrape deleted the c-api/component-model/val branch May 23, 2025 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wasmtime:c-api Issues pertaining to the C API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants