2022/11/29-gpu-story: Typo fixes
Signed-off-by: Asahi Lina <[email protected]>
asahilina committed Nov 29, 2022
1 parent 837d46a commit a78843a
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions content/blog/2022/11/29-gpu-story.md
@@ -23,7 +23,7 @@ You probably know what a GPU is, but do you know how they work under the hood? L

(This is all very simplified and in reality there are a lot more parts that vary from GPU to GPU, but those are the most important bits!)

-In order to handle all these moving parts in a reasonably safest way, modern GPU drivers are split into two parts: a *user space driver* and a *kernel driver*. The user space part is in charge of compiling shader programs and translating API calls (like OpenGL or Vulkan) into the specific command lists that the command processor will use to render the scene. Meanwhile, the kernel part is in charge of managing the MMU and handling memory allocation/deallocation from different apps, as well as deciding how and when to send their commands to the command processor. All modern GPU drivers work this way, on all major OSes!
+In order to handle all these moving parts in a reasonably safe way, modern GPU drivers are split into two parts: a *user space driver* and a *kernel driver*. The user space part is in charge of compiling shader programs and translating API calls (like OpenGL or Vulkan) into the specific command lists that the command processor will use to render the scene. Meanwhile, the kernel part is in charge of managing the MMU and handling memory allocation/deallocation from different apps, as well as deciding how and when to send their commands to the command processor. All modern GPU drivers work this way, on all major OSes!

Between the user space driver and the kernel driver, there is some kind of custom API that is customized for each GPU family. These APIs are usually different for every driver! In Linux we call that the UAPI, but every OS has something similar. This UAPI is what lets the user space part ask the kernel to allocate/deallocate memory and submit command lists to the GPU.
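The alloc/free/submit contract that the UAPI provides can be sketched as a tiny mock in Rust. This is purely illustrative: every name here (`GpuUapi`, `MockKernel`, `Handle`) is invented for the example, and a real UAPI is a set of driver-specific ioctls, not a Rust trait.

```rust
use std::collections::HashSet;

/// Hypothetical handle to a GPU buffer object, as returned by the kernel.
type Handle = u32;

/// Invented stand-in for the user-space-facing kernel interface.
trait GpuUapi {
    fn alloc(&mut self, size: usize) -> Handle;
    fn free(&mut self, handle: Handle);
    fn submit(&mut self, commands: &[Handle]) -> Result<(), String>;
}

struct MockKernel {
    next: Handle,
    live: HashSet<Handle>,
}

impl GpuUapi for MockKernel {
    fn alloc(&mut self, _size: usize) -> Handle {
        let h = self.next;
        self.next += 1;
        self.live.insert(h);
        h
    }
    fn free(&mut self, handle: Handle) {
        self.live.remove(&handle);
    }
    fn submit(&mut self, commands: &[Handle]) -> Result<(), String> {
        // The kernel side validates that submitted command buffers refer
        // to live allocations before handing them to the command processor.
        for h in commands {
            if !self.live.contains(h) {
                return Err(format!("unknown buffer {h}"));
            }
        }
        Ok(())
    }
}
```

The important structural point is the trust boundary: user space only ever sees opaque handles, while the kernel owns the actual memory bookkeeping.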

@@ -41,7 +41,7 @@ Earlier this year, her work was so far ahead that she was running [games](https:

In April this year, I decided to start trying to figure out how to write an M1 GPU kernel driver! [Scott Mansell](https://github.com/phire) had already done a bit of reconnaissance work on that front when I got started... and it was already clear this was no ordinary GPU. Over the first couple of months, I worked on writing and improving a [m1n1 hypervisor](https://asahilinux.org/2021/08/progress-report-august-2021/#hardware-reverse-engineering-with-the-m1n1-hypervisor) tracer for the GPU, and what I found was very, very unusual in the GPU world.

-Normally, the GPU driver is responsible for details such as scheduling and prioritizing work on the GPU, and preempting jobs when they take too long to run to allow apps to use the GPU fairly. Sometimes the driver takes care of power management, and sometimes that is done by dedicated firmware running on a power management coprocessor. And sometimes there is other firmware taking care of some details of command processing, but it's mostly invisible to the kernel driver. In the end, especially for simpler "mobile-style" GPUs like ARM Mali, the actual hardware interface for getting the GPU to render something is usually pretty simple: There's the MMU, which works like a standard CPU MMU or IOMMU, and then the command processor usually takes pointers userspace command buffers directly, in some kind of registers or ring buffer. So the kernel driver doesn't really need to do much other than manage the memory and schedule work on the GPU, and the Linux kernel DRM (Direct Rendering Manager) subsystem already provides a ton of helpers to make writing drivers easy! There are some tricky bits like preemption, but those are not critical to get the GPU working in a brand new driver. But the M1 GPU is different...
+Normally, the GPU driver is responsible for details such as scheduling and prioritizing work on the GPU, and preempting jobs when they take too long to run to allow apps to use the GPU fairly. Sometimes the driver takes care of power management, and sometimes that is done by dedicated firmware running on a power management coprocessor. And sometimes there is other firmware taking care of some details of command processing, but it's mostly invisible to the kernel driver. In the end, especially for simpler "mobile-style" GPUs like ARM Mali, the actual hardware interface for getting the GPU to render something is usually pretty simple: There's the MMU, which works like a standard CPU MMU or IOMMU, and then the command processor usually takes pointers to userspace command buffers directly, in some kind of registers or ring buffer. So the kernel driver doesn't really need to do much other than manage the memory and schedule work on the GPU, and the Linux kernel DRM (Direct Rendering Manager) subsystem already provides a ton of helpers to make writing drivers easy! There are some tricky bits like preemption, but those are not critical to get the GPU working in a brand new driver. But the M1 GPU is different...

Just like other parts of the M1 chip, the GPU has a coprocessor called an "ASC" that runs Apple firmware and manages the GPU. This coprocessor is a full ARM64 CPU running an Apple-proprietary real-time OS called RTKit... and it is in charge of everything! It handles power management, command scheduling and preemption, fault recovery, and even performance counters, statistics, and things like temperature measurement! In fact, the macOS kernel driver doesn't communicate with the GPU hardware at all. All communication with the GPU happens via the firmware, using data structures in shared memory to tell it what to do. And there are a lot of those structures...

@@ -56,7 +56,7 @@ Just like other parts of the M1 chip, the GPU has a coprocessor called an "ASC"
* **Vertex rendering commands**, which tell the vertex processing and tiling part of the GPU how to process commands and shaders from userspace to run the vertex part of a whole render pass.
* **Fragment rendering commands**, which tell the rasterization and fragment processing part of the GPU how to render the tiled vertex data from the vertex processing into an actual framebuffer.

-It gets even more complicated than that! The vertex and fragment rendering commands are actually very complicated structures with many nested structures within, and then each command actually has a pointer to a "microsequence" of smaller commands that are interpreted by the GPU firmware, like a custom virtual CPU! Normally those commands set up the rendering pass, wait for it to complete, and clean up... but it also supports things like timestamping commands, and even loops and arithmetic! It's crazy! And all of these structures need to be filled in with intimate details about what is going to be rendered, like pointers to the depth and stencil buffers are, the framebuffer size, whether MSAA (multisampled antialiasing) is enabled and how it is configured, pointers to specific helper shader programs, and much more!
+It gets even more complicated than that! The vertex and fragment rendering commands are actually very complicated structures with many nested structures within, and then each command actually has a pointer to a "microsequence" of smaller commands that are interpreted by the GPU firmware, like a custom virtual CPU! Normally those commands set up the rendering pass, wait for it to complete, and clean up... but it also supports things like timestamping commands, and even loops and arithmetic! It's crazy! And all of these structures need to be filled in with intimate details about what is going to be rendered, like pointers to the depth and stencil buffers, the framebuffer size, whether MSAA (multisampled antialiasing) is enabled and how it is configured, pointers to specific helper shader programs, and much more!
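The "firmware as a virtual CPU" idea can be pictured as a toy interpreter. To be clear, the real microsequence encoding is Apple-proprietary and undocumented; every op name below (`SetupRender`, `WaitCompletion`, etc.) is invented just to show the shape of the concept.

```rust
// Toy model of a firmware "microsequence" interpreter. These ops are
// hypothetical stand-ins for the real (proprietary) command encoding.
enum MicroOp {
    SetupRender,
    WaitCompletion,
    Timestamp(u64),
    // The firmware even supports arithmetic (and loops) in its sequences:
    Add(u64),
    Finalize,
}

/// Walk the sequence, keeping a tiny accumulator and an event log.
fn run(seq: &[MicroOp]) -> (u64, Vec<&'static str>) {
    let mut acc = 0u64;
    let mut log = Vec::new();
    for op in seq {
        match op {
            MicroOp::SetupRender => log.push("setup"),
            MicroOp::WaitCompletion => log.push("wait"),
            MicroOp::Timestamp(t) => acc = *t,
            MicroOp::Add(n) => acc += *n,
            MicroOp::Finalize => log.push("cleanup"),
        }
    }
    (acc, log)
}
```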

In fact, the GPU firmware has a strange relationship with the GPU MMU. It uses the same page tables! The firmware literally takes the same page table base pointer used by the GPU MMU, and configures it as its ARM64 page table. So GPU memory *is* firmware memory! That's crazy! There's a shared "kernel" address space (similar to the kernel address space in Linux) which is what the firmware uses for itself and for most of its communication with the driver, and then some buffers are shared with the GPU hardware itself and have "user space" addresses which are in a separate address space for each app using the GPU.

@@ -122,7 +122,7 @@ Normally, when you write a brand new kernel driver as complicated as this one, t

But all that just... didn't happen! I only had to fix a few logic bugs and one issue in the core of the memory management code, and then everything else just worked stably! Rust is truly magical! Its safety features mean that the design of the driver is guaranteed to be thread-safe and memory-safe as long as there are no issues in the few unsafe sections. It really guides you towards not just safe but good design.

-Of course, there are always unsafe sections of code, but since Rust makes you think in terms of safe abstractions, it's very easy to keep the surface area of possible bugs very low. There were still some safety issues! For example, I had a bug in my DRM memory management abstraction that could end up wtih an allocator being freed before all of its allocations were freed. But since those kinds of bugs are specific to one given piece of code, they tend to be major things that are obvious (and can be audited or caught in code review), instead of hard-to-catch race conditions or error cases that span the entire driver. You end up reducing the amount of possible bugs to worry about to a tiny number, by only having to think about specific code modules and safety-relevant sections individually, instead of their interactions with everything else. It's hard to describe unless you've tried Rust, but it makes a huge difference!
+Of course, there are always unsafe sections of code, but since Rust makes you think in terms of safe abstractions, it's very easy to keep the surface area of possible bugs very low. There were still some safety issues! For example, I had a bug in my DRM memory management abstraction that could end up with an allocator being freed before all of its allocations were freed. But since those kinds of bugs are specific to one given piece of code, they tend to be major things that are obvious (and can be audited or caught in code review), instead of hard-to-catch race conditions or error cases that span the entire driver. You end up reducing the amount of possible bugs to worry about to a tiny number, by only having to think about specific code modules and safety-relevant sections individually, instead of their interactions with everything else. It's hard to describe unless you've tried Rust, but it makes a huge difference!
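The "allocator freed before its allocations" class of bug can be ruled out structurally by ownership: if each allocation holds a reference-counted handle to its allocator, the allocator's destructor cannot run while any allocation is alive. This is a generic sketch of that pattern, not the actual driver code, and the type names are illustrative.

```rust
use std::sync::{Arc, Mutex};

struct Allocator {
    // Count of outstanding allocations, for demonstration purposes.
    live: Mutex<usize>,
}

struct Allocation {
    // Each allocation keeps its allocator alive via the refcount,
    // so Allocator::drop can only run after every Allocation is gone.
    owner: Arc<Allocator>,
}

impl Allocator {
    fn new() -> Arc<Self> {
        Arc::new(Allocator { live: Mutex::new(0) })
    }

    fn alloc(this: &Arc<Self>) -> Allocation {
        *this.live.lock().unwrap() += 1;
        Allocation { owner: this.clone() }
    }
}

impl Drop for Allocation {
    fn drop(&mut self) {
        *self.owner.live.lock().unwrap() -= 1;
    }
}
```

The point is that the use-after-free is no longer a property you have to remember to uphold across the whole driver; the type system enforces the drop ordering for you.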

Oh, and there's also error and cleanup handling! All the error-prone `goto cleanup` style error handling to clean up resources in C just... vanishes with Rust. Even just that is worth it on its own. Plus you get real iterators and reference counting is automatic! ❤
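As a minimal sketch of why the `goto cleanup` pattern disappears: in Rust, destructors run on every exit path, including early error returns, so cleanup never has to be written by hand. This is a generic illustration, not code from the driver.

```rust
struct Resource(&'static str);

impl Drop for Resource {
    fn drop(&mut self) {
        // Cleanup runs automatically here, even on the error path,
        // in reverse declaration order.
        println!("released {}", self.0);
    }
}

fn setup(fail: bool) -> Result<(), &'static str> {
    let _buffer = Resource("buffer");
    let _mapping = Resource("mapping");
    if fail {
        // Early return: no goto chain needed, both resources
        // are still released.
        return Err("mapping failed");
    }
    Ok(())
}
```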

@@ -132,7 +132,7 @@ Oh, and there's also error and cleanup handling! All the error-prone `goto clean

With the kernel driver on the right track, it was time to join forces with Alyssa and start working together! No longer bound by the confines of testing only on macOS, she started making major improvements to the Mesa driver! I even helped a little bit ^^

-We gave a [joint talk](https://www.youtube.com/watch?v=SDJCzJ1ETsM) at XDC 2022, and at the time we ran the entire talk on an M1 using our drivers! Since then we've been working on adding new features, bug fixes, and performance improvements to both sides. I added support for the M1 Pro/Max/Ultra family and the M2 to the kernel side, as well as more and better debugging tools and memory allocation performance improvements. She's been steadily improving GL comformace, with OpenGL ES 2.0 conformance practically complete and 3.0 conformance at over 96%! She also added many new features and performance improvements, and today you can play games like Xonotic and Quake at 4K!
+We gave a [joint talk](https://www.youtube.com/watch?v=SDJCzJ1ETsM) at XDC 2022, and at the time we ran the entire talk on an M1 using our drivers! Since then we've been working on adding new features, bug fixes, and performance improvements to both sides. I added support for the M1 Pro/Max/Ultra family and the M2 to the kernel side, as well as more and better debugging tools and memory allocation performance improvements. She's been steadily improving GL conformance, with OpenGL ES 2.0 conformance practically complete and 3.0 conformance at over 96%! She also added many new features and performance improvements, and today you can play games like Xonotic and Quake at 4K!

<div style="text-align: center">
<iframe src="https://social.treehouse.systems/@alyssa/109311591472543702/embed" class="mastodon-embed" style="max-width: 100%; border: 0" width="400" allowfullscreen="allowfullscreen"></iframe><script src="https://social.treehouse.systems/embed.js" async="async"></script>
@@ -155,6 +155,6 @@ But even with those limitations, the drivers can run stable desktops today and p

So where do you get it? We're not quite there yet! Right now the driver stack is complicated to build and install (you need custom m1n1, kernel, and mesa builds), so please wait a little bit longer! We have a few loose ends to tie still... but we hope we can bring it to Asahi Linux as an opt-in testing build before the end of the year! ✨✨

-If you're interested in following my work on the GPU, you can follow me at [@lina@vt.social](https://vt.social/@lina) or subscribe to my [YouTube channel](https://youtube.com/AsahiLina)! Tomorrorow I'm going to be working on figuring out the power consumption calculations for the M1 Pro/Max/Ultra and M2, and I hope to see you there! ✨
+If you're interested in following my work on the GPU, you can follow me at [@lina@vt.social](https://vt.social/@lina) or subscribe to my [YouTube channel](https://youtube.com/AsahiLina)! Tomorrow I'm going to be working on figuring out the power consumption calculations for the M1 Pro/Max/Ultra and M2, and I hope to see you there! ✨

If you want to support my work, you can donate to marcan's Asahi Linux support funds on [GitHub Sponsors](http://github.com/sponsors/marcan) or [Patreon](https://patreon.com/marcan), which helps me out too! And if you're looking forward to a Vulkan driver, check out Ella's [GitHub Sponsors](https://github.com/sponsors/Ella-0) page! Alyssa doesn't take donations herself, but she'd love it if you donate to a charity like the [Software Freedom Conservancy](https://sfconservancy.org/) instead. (Although maybe one day I'll convince her to let me buy her an M2... ^^;;)
