2023/03/road-to-vulkan/index.html
Lines changed: 1 addition & 1 deletion
@@ -74,7 +74,7 @@ <h1 class=entry-title>Paving the Road to Vulkan on Asahi Linux</h1><ul class=blo
<li>Create a framebuffer (possibly shareable), but don’t share it yet.</li><li>Render stuff into the buffer.</li><li>Share it.</li></ol><p>When we submit the rendering command, it doesn’t look like it’s shared yet, so the driver doesn’t do the implicit sync dance… and then when the app shares it, it’s too late, and it doesn’t have the right fence attached to it. Whoever is on the other side will try to use the buffer, and won’t wait until the render is complete. Whoops!</p><p>I had to add a mechanism that keeps track of sync object IDs for all submitted but not complete batches, and attaches them to all buffers that are written. Then if those buffers are shared before we know those batches are complete, we can retroactively attach the fences.</p><p>Interestingly, when I brought this up with the Intel folks working on the Xe merge request… they hadn’t heard of this before! It looks like their driver might have the same bug… I guess they might want to start testing with Sway ^^;;</p><p>Are we done yet? Mostly, though there are still bugs to squash… and we haven’t even talked about the kernel yet!</p><h2 id=explicit-sync-meets-rust>Explicit Sync Meets Rust</h2><p>The previous version of the Asahi DRM kernel driver was pretty bare-bones in how it interacted with the rest of the kernel, since it had a very simple UAPI. I only had to add Rust abstractions for these DRM APIs:</p><ul>
<li><code>drv</code> and <code>device</code>, the core of DRM drivers and handling devices.</li><li><code>file</code>, which is how DRM drivers interact with userspace.</li><li><code>gem</code>, which manages memory for GPUs with unified memory.</li><li><code>mm</code>, a generic memory range allocator which my driver uses for several things.</li><li><code>ioctl</code>, just some wrappers to calculate DRM ioctl numbers for the UAPI.</li></ul><p>To add proper explicit sync support, I had to add a bunch of new abstractions!</p><ul>
<li><code>dma_fence</code>, the core Linux DMA fence mechanism.</li><li><code>syncobj</code>, DRM’s sync object API.</li><li><code>sched</code>, which is the DRM component in charge of actually queuing GPU work and scheduling it.</li><li><code>xarray</code>, a generic kernel data structure that is basically an <code>int</code> → <code>void *</code> mapping, which I use to keep track of userspace UAPI objects like VMs and queues by their unique ID.</li></ul><p>I’ve now <a href=https://lore.kernel.org/asahi/687b54e7-b9a6-f37b-e5e6-8972e3670cc1@asahilina.net/T/#t>sent out</a> all the DRM abstractions for initial review, so we can get them upstream as soon as possible and, after that, upstream the driver itself!</p><p>As part of this work, I even found two memory safety bugs in the DRM scheduler component that were causing kernel oopses for Alyssa and other developers, so the Rust driver work also benefits other kernel drivers that use this shared code! Meanwhile, I still haven’t gotten any reports of kernel oopses due to bugs in the Rust code at all~ ✨</p><h2 id=even-more-stuff>Even more stuff!</h2><p>Explicit sync is the biggest change for this release, but there’s even more! Since we want to get the UAPI as close as possible to the final version, I’ve been working on adding lots more stuff:</p><ul>
-
<li>Multiple GPU VMs (virtual memory address spaces) and GEM object binding based on the Xe UAPI model, to support future Vulkan requirements.</li><li>A result buffer, so the kernel driver can send GPU job execution results back to Mesa. This includes things like statistics and timings, but also whether the command succeeded and detailed fault information, so you can get verbose fault decoding right in Mesa!</li><li>Compute job support, to run compute shaders. We’re still working on the Mesa side of this, but it should be enough to pass most tests and eventually add OpenCL support with <a href=https://www.phoronix.com/news/Rusticl-OpenCL-3.0-Conformance>Rusticl</a>!</li><li>The ability to submit multiple GPU jobs at once, and specify their dependencies directly, without using sync objects. This allows the GPU firmware to autonomously execute everything, which is a lot more efficient than going through the DRM scheduler every time. The Gallium driver doesn’t use this yet, but it probably will in the future, and our upcoming Vulkan driver definitely will! There are <a href=https://github.com/AsahiLinux/docs/wiki/SW:AGX-driver-notes#queues>a lot of subtleties</a> around how all the queuing stuff works…</li><li>Stub support for blit commands. We don’t know how these work yet, but at least we have some skeleton support in the UAPI.</li></ul><p>To make all this work on the driver side, I ended up refactoring the <a href=https://github.com/AsahiLinux/linux/blob/gpu/rust-wip/drivers/gpu/drm/asahi/workqueue.rs>workqueue</a> code and adding a whole new <a href=https://github.com/AsahiLinux/linux/tree/gpu/rust-wip/drivers/gpu/drm/asahi/queue>queue</a> module which adds all the infrastructure to use sync objects to track command dependencies and completions and manage work via the DRM scheduler. Phew!</p><h2 id=conclusions>Conclusions</h2><p>So what does this all mean for users of the Asahi Linux reference distro today?
It means… things are way faster!</p><p>Since the Mesa driver no longer serializes GPU and CPU work, performance has improved a ton. Now we can run Xonotic at over 800 FPS, which is faster than macOS on the same hardware (M2 MacBook Air) at around 600*! This proves that open source reverse engineered GPU drivers really have the power to beat Apple’s drivers in real-world scenarios!</p><p>Not only that, our driver passes 100% of the dEQP-GLES2 and dEQP-EGL conformance tests, which is better OpenGL conformance than macOS for that version. But we’re not stopping there of course, with full GLES 3.0 and 3.1 support well underway thanks to Alyssa’s tireless efforts! You can follow the driver’s feature support progress over at the <a href=https://mesamatrix.net/>Mesa Matrix</a>. There have been many, many other improvements over the past few months, and we hope you find things working better and more smoothly across the board!</p><p>Of course, there are lots of new corner cases we can hit now that we have support for implicit sync with an explicit sync driver. We already know of at least one minor regression (brief magenta squares for a couple of frames when KDE starts up), and there’s probably more, so please report any issues on <a href=https://github.com/AsahiLinux/linux/issues/72>the GitHub tracker bug</a>! The more issue reports we get, especially if they come with easy ways to reproduce the problem, the easier it is for us to debug these problems and fix them ^^.</p><p>* <em>Please don’t take the exact number <strong>too</strong> seriously, as there are other differences too (Xonotic runs under Rosetta on macOS, but it was also rendering at a lower resolution there due to being a non-Retina app). The point is that the results are in the same league, and we will only keep improving our driver going forward!</em></p><h2 id=get-it>Get it!</h2><p>If you’re already using the GPU drivers, just update your system and reboot to get the new version!
Keep in mind that since the UAPI changed (a lot), apps will probably stop launching or will launch with software rendering until you reboot.</p><p>If you still haven’t tried the new drivers, just install the packages:</p><pre tabindex=0><code>$ sudo pacman -Syu
+
<li>Multiple GPU VMs (virtual memory address spaces) and GEM object binding based on the Xe UAPI model, to support future Vulkan requirements.</li><li>A result buffer, so the kernel driver can send GPU job execution results back to Mesa. This includes things like statistics and timings, but also whether the command succeeded and detailed fault information, so you can get verbose fault decoding right in Mesa!</li><li>Compute job support, to run compute shaders. We’re still working on the Mesa side of this, but it should be enough to pass most tests and eventually add OpenCL support with <a href=https://www.phoronix.com/news/Rusticl-OpenCL-3.0-Conformance>Rusticl</a>!</li><li>The ability to submit multiple GPU jobs at once, and specify their dependencies directly, without using sync objects. This allows the GPU firmware to autonomously execute everything, which is a lot more efficient than going through the DRM scheduler every time. The Gallium driver doesn’t use this yet, but it probably will in the future, and our upcoming Vulkan driver definitely will! There are <a href=/docs/SW-AGX-driver-notes#queues>a lot of subtleties</a> around how all the queuing stuff works…</li><li>Stub support for blit commands. We don’t know how these work yet, but at least we have some skeleton support in the UAPI.</li></ul><p>To make all this work on the driver side, I ended up refactoring the <a href=https://github.com/AsahiLinux/linux/blob/gpu/rust-wip/drivers/gpu/drm/asahi/workqueue.rs>workqueue</a> code and adding a whole new <a href=https://github.com/AsahiLinux/linux/tree/gpu/rust-wip/drivers/gpu/drm/asahi/queue>queue</a> module which adds all the infrastructure to use sync objects to track command dependencies and completions and manage work via the DRM scheduler. Phew!</p><h2 id=conclusions>Conclusions</h2><p>So what does this all mean for users of the Asahi Linux reference distro today?
It means… things are way faster!</p><p>Since the Mesa driver no longer serializes GPU and CPU work, performance has improved a ton. Now we can run Xonotic at over 800 FPS, which is faster than macOS on the same hardware (M2 MacBook Air) at around 600*! This proves that open source reverse engineered GPU drivers really have the power to beat Apple’s drivers in real-world scenarios!</p><p>Not only that, our driver passes 100% of the dEQP-GLES2 and dEQP-EGL conformance tests, which is better OpenGL conformance than macOS for that version. But we’re not stopping there of course, with full GLES 3.0 and 3.1 support well underway thanks to Alyssa’s tireless efforts! You can follow the driver’s feature support progress over at the <a href=https://mesamatrix.net/>Mesa Matrix</a>. There have been many, many other improvements over the past few months, and we hope you find things working better and more smoothly across the board!</p><p>Of course, there are lots of new corner cases we can hit now that we have support for implicit sync with an explicit sync driver. We already know of at least one minor regression (brief magenta squares for a couple of frames when KDE starts up), and there’s probably more, so please report any issues on <a href=https://github.com/AsahiLinux/linux/issues/72>the GitHub tracker bug</a>! The more issue reports we get, especially if they come with easy ways to reproduce the problem, the easier it is for us to debug these problems and fix them ^^.</p><p>* <em>Please don’t take the exact number <strong>too</strong> seriously, as there are other differences too (Xonotic runs under Rosetta on macOS, but it was also rendering at a lower resolution there due to being a non-Retina app). The point is that the results are in the same league, and we will only keep improving our driver going forward!</em></p><h2 id=get-it>Get it!</h2><p>If you’re already using the GPU drivers, just update your system and reboot to get the new version!
Keep in mind that since the UAPI changed (a lot), apps will probably stop launching or will launch with software rendering until you reboot.</p><p>If you still haven’t tried the new drivers, just install the packages:</p><pre tabindex=0><code>$ sudo pacman -Syu
$ sudo pacman -S linux-asahi-edge mesa-asahi-edge
$ sudo update-grub
</code></pre><p>Then if you’re using KDE, make sure you have the Wayland session installed too:</p><pre tabindex=0><code>$ sudo pacman -S plasma-wayland-session