Share a common richly-typed description of Linux syscalls with other projects #181

Closed
catern opened this issue Jan 19, 2022 · 7 comments

@catern

catern commented Jan 19, 2022

Hi, I've followed Rustix (and other downstream projects) with great interest - very exciting! I'm excited for the safer future it can bring!

A really detailed, richly-typed, machine-readable description of the types/behavior of Linux system calls would, I think, be really useful for Rustix and other projects like strace, gvisor, WSL1, etc. This is a big missing part of the Linux ecosystem, I think. (And I've seen other projects, e.g. strace, bemoan the lack of it before too.)

In theory, you could use it to generate Rustix instead of implementing it by hand - or at least to sanity-check that Rustix is compatible with ground truth. I've been wanting this kind of shared description from my work on https://github.com/catern/rsyscall which I think has a similar philosophy to Rustix. (specifically the "type-safe" and "low-level" bullet points in the summary)

Have you thought about such a description? Do you think it would be useful to you?

@sunfishcode
Member

I like the idea, in theory :-).

A concern I have is that a lot of Linux syscalls end up having special considerations, like 32-bit vs. 64-bit time_t, stat's st_mode field holding both discrete flags and an enum or'd together, sendto taking what is effectively a discriminated union, sendmsg being really complex, syscalls that are only available in some kernel versions or some architectures or different versions of different architectures, or some syscalls being "socketcalls" on 32-bit x86, futex which is really about 5 or more different operations that awkwardly share a single signature, and more. I'd be interested to see a sketch of what some of these might look like.

@catern
Author

catern commented Jan 19, 2022

For sure, it would require a much richer ability to describe types and bit-level data formats than is available in Rust or other general-purpose languages.

Just some thoughts about features to address the specific things you brought up:

> stat's st_mode field holding both discrete flags and an enum or'd together

The ability to describe bit-level data formats
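To make the st_mode case concrete, here is a minimal sketch of what a bit-level description has to express: one field of the word is an enum (selected by the `S_IFMT` mask) and the remaining low bits are independent flags. The constant values are the standard Linux ones; the `FileType`, `Mode`, and `decode_mode` names are hypothetical and only illustrate the shape such a description might generate.

```rust
// Linux st_mode layout (octal values from <sys/stat.h>).
const S_IFMT: u32 = 0o170000; // mask selecting the file-type field (an enum)
const S_IFREG: u32 = 0o100000;
const S_IFDIR: u32 = 0o040000;
const S_IFLNK: u32 = 0o120000;
const PERM_MASK: u32 = 0o7777; // setuid/setgid/sticky plus rwx bits (discrete flags)

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum FileType {
    Regular,
    Directory,
    Symlink,
    Other(u32), // file types this sketch does not name
}

// Hypothetical decoded form: the enum part and the flag part kept separate.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
struct Mode {
    file_type: FileType,
    permissions: u32,
}

fn decode_mode(raw: u32) -> Mode {
    let file_type = match raw & S_IFMT {
        S_IFREG => FileType::Regular,
        S_IFDIR => FileType::Directory,
        S_IFLNK => FileType::Symlink,
        other => FileType::Other(other),
    };
    Mode { file_type, permissions: raw & PERM_MASK }
}
```

For example, `decode_mode(0o100644)` yields a regular file with permission bits `0o644`.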

> sendto taking what is effectively a discriminated union

Explicit support for tagged unions
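As a rough illustration of the tagged-union idea, the sketch below treats the leading address-family field of a raw sockaddr as the tag that selects the payload layout. Only `AF_UNIX` and `AF_INET` are handled, abstract Unix sockets are ignored, and the `SocketAddress` and `decode_sockaddr` names are hypothetical.

```rust
use std::net::Ipv4Addr;

// Linux address-family tags (the first field of every sockaddr).
const AF_UNIX: u16 = 1;
const AF_INET: u16 = 2;

#[derive(Debug, PartialEq, Eq, Clone)]
enum SocketAddress {
    Unix { path: Vec<u8> },
    Inet { addr: Ipv4Addr, port: u16 },
}

// Hypothetical decoder: the family tag decides how the rest is interpreted.
fn decode_sockaddr(raw: &[u8]) -> Option<SocketAddress> {
    if raw.len() < 2 {
        return None;
    }
    // sa_family is stored in native byte order.
    let family = u16::from_ne_bytes([raw[0], raw[1]]);
    match family {
        AF_INET if raw.len() >= 8 => {
            // sockaddr_in: port and address are in network byte order.
            let port = u16::from_be_bytes([raw[2], raw[3]]);
            let addr = Ipv4Addr::new(raw[4], raw[5], raw[6], raw[7]);
            Some(SocketAddress::Inet { addr, port })
        }
        AF_UNIX => {
            // sockaddr_un: sun_path runs up to the first NUL byte.
            let path = raw[2..].iter().copied().take_while(|&b| b != 0).collect();
            Some(SocketAddress::Unix { path })
        }
        _ => None, // unknown tag, or a payload too short for its tag
    }
}
```

A real description would also need to cover `AF_INET6`, `AF_NETLINK`, and the other families, each with its own payload layout.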

> sendmsg being really complex

Different types for sendmsg and recvmsg msg_hdr, for starters, and explicitly describing the possible variants that can be sent through sendmsg CMSGs etc.
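One possible shape for this, as a sketch only: the send side owns a list of typed control messages, while the receive side only offers buffer space, so the two headers get distinct types. The `SCM_RIGHTS` and `SCM_CREDENTIALS` values are the Linux ones; `SOL_SOCKET` is 1 on x86 and ARM but differs on a few architectures. All type and method names here are hypothetical.

```rust
// Linux constants from <sys/socket.h>. SOL_SOCKET is 1 on x86 and ARM;
// a few architectures (for example MIPS) use a different value.
const SOL_SOCKET: i32 = 1;
const SCM_RIGHTS: i32 = 1;
const SCM_CREDENTIALS: i32 = 2;

// The variants that may travel as control messages, spelled out explicitly.
#[derive(Debug, PartialEq, Eq, Clone)]
enum ControlMessage {
    Rights(Vec<i32>),                             // file descriptors to pass
    Credentials { pid: i32, uid: u32, gid: u32 }, // struct ucred
}

impl ControlMessage {
    // The (cmsg_level, cmsg_type) pair that tags this variant on the wire.
    fn level_and_type(&self) -> (i32, i32) {
        match self {
            ControlMessage::Rights(_) => (SOL_SOCKET, SCM_RIGHTS),
            ControlMessage::Credentials { .. } => (SOL_SOCKET, SCM_CREDENTIALS),
        }
    }

    // Payload size in bytes, excluding the cmsghdr and alignment padding.
    fn payload_len(&self) -> usize {
        match self {
            ControlMessage::Rights(fds) => fds.len() * std::mem::size_of::<i32>(),
            ControlMessage::Credentials { .. } => 12, // pid_t + uid_t + gid_t
        }
    }
}

// Hypothetical send-side header: data is only read, control messages are typed.
struct SendMsg<'a> {
    buffers: Vec<&'a [u8]>,
    control: Vec<ControlMessage>,
}

// Hypothetical receive-side header: data is written, control space is a capacity.
struct RecvMsg<'a> {
    buffers: Vec<&'a mut [u8]>,
    control_capacity: usize,
}
```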

> syscalls that are only available in some kernel versions or some architectures or different versions of different architectures, or some syscalls being "socketcalls" on 32-bit x86,

A different syscall description for each architecture

> futex which is really about 5 or more different operations that awkwardly share a single signature

Support for overloaded syscalls which dispatch to different variants based on a parameter (useful for ioctl as well); possibly implemented with literal types/singleton types + overloads
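In Rust terms, the overload idea could be sketched as an enum with one variant per operation, lowered to the shared raw futex signature. The operation numbers are the Linux ones; the sketch covers only three operations, and `FutexOp` and `lower` are hypothetical names.

```rust
// Linux futex operation numbers from <linux/futex.h>.
const FUTEX_WAIT: u32 = 0;
const FUTEX_WAKE: u32 = 1;
const FUTEX_REQUEUE: u32 = 3;

// One typed variant per operation; each carries only the arguments it uses.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum FutexOp {
    Wait { expected: u32, timeout_ptr: usize },
    Wake { max_waiters: u32 },
    Requeue { max_wake: u32, max_requeue: u32, target_addr: usize },
}

// Lower a typed operation to the raw arguments that follow `uaddr`:
// (futex_op, val, timeout-or-val2, uaddr2, val3).
fn lower(op: FutexOp) -> (u32, u32, usize, usize, u32) {
    match op {
        FutexOp::Wait { expected, timeout_ptr } => (FUTEX_WAIT, expected, timeout_ptr, 0, 0),
        FutexOp::Wake { max_waiters } => (FUTEX_WAKE, max_waiters, 0, 0, 0),
        // For FUTEX_REQUEUE the timeout slot is reinterpreted as val2.
        FutexOp::Requeue { max_wake, max_requeue, target_addr } => {
            (FUTEX_REQUEUE, max_wake, max_requeue as usize, target_addr, 0)
        }
    }
}
```

The same pattern would apply to ioctl, with the request number playing the role of the discriminant.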

@sunfishcode
Member

Those sound like they're heading in the right direction overall. With this one:

> A different syscall description for each architecture

There is a lot of commonality between the architectures, so it might be desirable to have a common set that all architectures share, while allowing architectures to diverge as needed.
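A minimal sketch of that layering, assuming a flat table: the name and argument count are shared, while the syscall number is per-architecture and may be absent. The numbers are the real x86_64 and aarch64 ones; the `Arch`, `Syscall`, and `lookup` names are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Arch {
    X86_64,
    Aarch64,
}

// Shared description plus per-architecture divergence.
struct Syscall {
    name: &'static str,
    arg_count: u8,        // common to every architecture
    x86_64: Option<u32>,  // None means the syscall does not exist there
    aarch64: Option<u32>,
}

const SYSCALLS: &[Syscall] = &[
    Syscall { name: "read", arg_count: 3, x86_64: Some(0), aarch64: Some(63) },
    Syscall { name: "write", arg_count: 3, x86_64: Some(1), aarch64: Some(64) },
    // aarch64 never had plain open; userspace uses openat instead.
    Syscall { name: "open", arg_count: 3, x86_64: Some(2), aarch64: None },
    Syscall { name: "openat", arg_count: 4, x86_64: Some(257), aarch64: Some(56) },
];

// Returns (syscall number, argument count) if the syscall exists on `arch`.
fn lookup(arch: Arch, name: &str) -> Option<(u32, u8)> {
    let sc = SYSCALLS.iter().find(|s| s.name == name)?;
    let number = match arch {
        Arch::X86_64 => sc.x86_64,
        Arch::Aarch64 => sc.aarch64,
    }?;
    Some((number, sc.arg_count))
}
```

A fuller design would also let an architecture override argument types or order, not just the number.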

@sunfishcode
Member

The original question here is answered: yes, such a project would be useful!

@cgwalters
Contributor

Some related bits in https://github.com/google/syzkaller/blob/master/sys/syz-sysgen/sysgen.go
and seccomp/libseccomp#11 at least.

@catern
Author

catern commented Apr 30, 2022

FWIW I posted about this on LKML here: https://lore.kernel.org/lkml/[email protected]/t/

@sunfishcode
Member

Thanks for posting that!

My one observation is that it isn't necessary to describe the semantics of io_uring or eBPF to be useful :-). It doesn't really even need to describe what a "file", "socket", or "pipe" is, or how I/O works. Just being able to say which arguments are file descriptors (and which of those are consumed), which arguments are buffer lengths, whether buffers are read, written, or both, and so on, would be very useful.
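A sketch of what that minimal level of description could look like, with every name hypothetical: each argument gets a role, buffers record their direction and which argument holds their length, and file descriptors record whether the call consumes them.

```rust
// Direction of buffer access, from the kernel's point of view.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum BufferAccess {
    KernelReads,
    KernelWrites,
    Both,
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum ArgRole {
    Fd { consumed: bool },
    // `length_arg` is the index of the argument holding this buffer's length.
    Buffer { access: BufferAccess, length_arg: usize },
    Length,
}

struct SyscallDesc {
    name: &'static str,
    args: &'static [ArgRole],
}

// read(fd, buf, count): the kernel writes into buf, whose length is argument 2.
const READ: SyscallDesc = SyscallDesc {
    name: "read",
    args: &[
        ArgRole::Fd { consumed: false },
        ArgRole::Buffer { access: BufferAccess::KernelWrites, length_arg: 2 },
        ArgRole::Length,
    ],
};

// close(fd): the descriptor is no longer valid afterwards.
const CLOSE: SyscallDesc = SyscallDesc {
    name: "close",
    args: &[ArgRole::Fd { consumed: true }],
};

// Indices of the arguments that are file descriptors consumed by the call.
fn consumed_fds(desc: &SyscallDesc) -> Vec<usize> {
    desc.args
        .iter()
        .enumerate()
        .filter(|(_, role)| matches!(role, ArgRole::Fd { consumed: true }))
        .map(|(index, _)| index)
        .collect()
}
```

Even this much would let a tool like a tracer or sandbox know which arguments to treat as descriptors and which memory ranges a call may touch.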
