WIP: Decoder: fix mesa driver support (ANV and RADV) #34

dabrain34 · 2025-04-23T14:55:30Z

Several fixes were needed to support the Intel Mesa ANV driver:

The ANV driver exposes a transfer queue that is separate from the decode queue. Add support for this queue.

A transfer filter should be implemented as described in #34 (comment)

This patch fixes #13

dabrain34 · 2025-04-24T15:16:12Z

related to nvpro-samples/vk_video_samples#71

dabrain34 · 2025-04-24T15:20:38Z

@lolzballs could you take a look at this change too?

lolzballs

Did not test with RADV, but the changes make sense and it works on AMD's Windows driver.

zlatinski · 2025-05-01T15:25:52Z

common/libs/VkCodecUtils/VulkanFrame.cpp

    imageCreateInfo.pQueueFamilyIndices = &queueFamilyIndices;
    imageCreateInfo.initialLayout = VK_IMAGE_LAYOUT_PREINITIALIZED;
-    imageCreateInfo.flags = 0;
+    imageCreateInfo.flags = VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT;


Why is this required?
From the spec:
If image was created with a multi-planar format, and the image view’s aspectMask is one of VK_IMAGE_ASPECT_PLANE_0_BIT, VK_IMAGE_ASPECT_PLANE_1_BIT or VK_IMAGE_ASPECT_PLANE_2_BIT, the view’s aspect mask is considered to be equivalent to VK_IMAGE_ASPECT_COLOR_BIT when used as a framebuffer attachment.

Does this have something to do with:
If image was created with the VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT and the image has a multi-planar format, and if subresourceRange.aspectMask is VK_IMAGE_ASPECT_PLANE_0_BIT, VK_IMAGE_ASPECT_PLANE_1_BIT, or VK_IMAGE_ASPECT_PLANE_2_BIT, format must be compatible with the corresponding plane of the image, and the sampler to be used with the image view must not enable sampler Y′CBCR conversion. The width and height of the single-plane image view must be derived from the multi-planar image’s dimensions in the manner listed for plane compatibility for the plane.

Right, I think this is just a workaround, but the spec also says:

If the image has a multi-planar format, subresourceRange.aspectMask is VK_IMAGE_ASPECT_COLOR_BIT, and usage includes VK_IMAGE_USAGE_SAMPLED_BIT, then the format must be identical to the image format and the sampler to be used with the image view must enable sampler Y′CBCR conversion.

So I think the image view (IV) format should be identical to the image format. Thoughts?

Yes, the sentence above covers usage with the graphics pipeline. It says: YCbCr images must be used with a sampler in the graphics/compute pipelines or as individual planes.

Yeah, I mean that in the code the IV format is NOT identical to the image format, which I think is not valid. So do we need to fix this?

This violates the spec for sure.

The validation layers (VVL) report the following:

vkCreateImageView(): pCreateInfo->format VK_FORMAT_R8_UNORM is different from VkImage 0x190000000019 format (VK_FORMAT_G8_B8R8_2PLANE_420_UNORM). Formats MUST be IDENTICAL unless VK_IMAGE_CREATE_MUTABLE_FORMAT BIT was set on image creation. The Vulkan spec states: If image was not created with the VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT flag, or if the format of the image is a multi-planar format and if subresourceRange.aspectMask is VK_IMAGE_ASPECT_COLOR_BIT, format must be identical to the format used to create image (https://docs.vulkan.org/spec/latest/chapters/resources.html#VUID-VkImageViewCreateInfo-image-01762)

So we need to either set VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT (as this patch does) or stop creating an image view for each plane.
Thoughts?

Yeah, I agree with the need for MUTABLE.
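
To make the constraint discussed above concrete, here is a minimal sketch of the per-plane view parameters for VK_FORMAT_G8_B8R8_2PLANE_420_UNORM, following the plane-compatibility rules in the Vulkan spec. It is deliberately standalone: `PlaneViewFormat`, `PlaneView`, `Nv12PlaneView` and `PlaneViewAllowed` are hypothetical names used for illustration, not the Vulkan headers and not code from this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Local stand-ins for the two VkFormat values involved (assumption: we avoid
// the real Vulkan headers so the sketch compiles on its own).
enum class PlaneViewFormat { R8_UNORM, R8G8_UNORM };

struct PlaneView {
    PlaneViewFormat format;
    uint32_t width;
    uint32_t height;
};

// Plane-compatible view parameters for a G8_B8R8_2PLANE_420_UNORM image:
//   plane 0: luma, R8_UNORM, full width and height
//   plane 1: interleaved chroma, R8G8_UNORM, half width and half height
PlaneView Nv12PlaneView(uint32_t plane, uint32_t imageWidth, uint32_t imageHeight) {
    switch (plane) {
    case 0:
        return {PlaneViewFormat::R8_UNORM, imageWidth, imageHeight};
    case 1:
        return {PlaneViewFormat::R8G8_UNORM, imageWidth / 2, imageHeight / 2};
    default:
        throw std::out_of_range("this format has only two planes");
    }
}

// Simplified form of the rule quoted by the validation layers for a
// single-plane aspect: a view format that differs from the image format is
// only acceptable when the image was created with the mutable-format flag.
bool PlaneViewAllowed(bool imageHasMutableFormatBit, bool viewFormatEqualsImageFormat) {
    return viewFormatEqualsImageFormat || imageHasMutableFormatBit;
}
```

For a 1920x1080 image this gives a 1920x1080 R8 view for plane 0 and a 960x540 R8G8 view for plane 1, and `PlaneViewAllowed(false, false)` is false, which is the case the validation message above flags.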

zlatinski

I don't believe this is the most effective way to implement the transfer. Consider using a filter for your transfer. Furthermore, we can derive a filter class that uses transfer-only operations. The filter framework doesn't allocate and free resources on each frame and uses a semaphore for synchronization (it doesn't stall the pipeline).

zlatinski · 2025-05-01T15:42:50Z

vk_video_decoder/libs/VkVideoDecoder/VkVideoDecoder.cpp

    m_vkDevCtx->CmdEndVideoCodingKHR(frameDataSlot.commandBuffer, &decodeEndInfo);

-    if (m_useTransferOperation == VK_TRUE) {
+    if (m_useTransferOperation == VK_TRUE && m_transferCommandPool == VK_NULL_HANDLE) {


Have you tried using the filter?

--enablePostProcessFilter 0

This should work on any implementation.

I confirm this is working. Thanks for the recommendation.
It was not obvious from the command-line help, so I updated the documentation to explain this parameter and its default value.

Which is more efficient: the transfer on the decode queue or the compute filter? I enabled the compute filter YCBCRCOPY(1) by default in order to support the Mesa driver.

Why would we prefer the compute-based copy over the transfer-based one, which uses vkCmdCopyImage?

If an implementation has dedicated transfer HW it would be more efficient to use the transfer queue.

So vkCmdCopyImage on a dedicated transfer queue will be more efficient than the compute-based copy?

Yes, since it might use a DMA-based copy, so that the compute units are free to do other work or can even be powered off to conserve power.

So I should keep the transfer queue usage, given that it is not used by NVIDIA.

I brought back the use of this transfer queue as it does not harm if the decode and transfer are on the same queue.
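
As an illustration of the queue choice discussed above, here is a minimal sketch of one possible selection policy: prefer a dedicated transfer family, otherwise stay on the decode family if it can copy, otherwise fall back to any family that can. This ordering is an assumption made for the example, not the policy implemented by this patch. The flag constants mirror the values of VkQueueFlagBits but are redefined locally so the sketch builds without Vulkan headers; `PickCopyFamily` and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Same numeric values as the corresponding VkQueueFlagBits entries.
constexpr uint32_t kGraphics    = 0x01;
constexpr uint32_t kCompute     = 0x02;
constexpr uint32_t kTransfer    = 0x04;
constexpr uint32_t kVideoDecode = 0x20;
constexpr uint32_t kVideoEncode = 0x40;

// Graphics and compute families support transfer commands even when they do
// not report the transfer bit explicitly.
inline bool SupportsTransfer(uint32_t flags) {
    return (flags & (kTransfer | kGraphics | kCompute)) != 0;
}

// "Dedicated" here means: transfer, and none of the heavier capabilities.
inline bool IsDedicatedTransfer(uint32_t flags) {
    return (flags & kTransfer) != 0 &&
           (flags & (kGraphics | kCompute | kVideoDecode | kVideoEncode)) == 0;
}

// familyFlags[i] holds the queueFlags of queue family i.
std::optional<uint32_t> PickCopyFamily(const std::vector<uint32_t>& familyFlags,
                                       uint32_t decodeFamily) {
    // 1. A dedicated transfer family may map to DMA hardware.
    for (uint32_t i = 0; i < familyFlags.size(); ++i) {
        if (IsDedicatedTransfer(familyFlags[i])) return i;
    }
    // 2. Otherwise copy on the decode family when it is able to.
    if (decodeFamily < familyFlags.size() && SupportsTransfer(familyFlags[decodeFamily])) {
        return decodeFamily;
    }
    // 3. Otherwise take the first family that can copy at all.
    for (uint32_t i = 0; i < familyFlags.size(); ++i) {
        if (SupportsTransfer(familyFlags[i])) return i;
    }
    return std::nullopt;
}
```

When the selected family differs from the decode family, the copy additionally needs cross-queue synchronization (and a queue family ownership transfer for exclusively shared images), which this sketch leaves out.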

I compared the transfer queue against the post-process filter, and the results are quite telling:

With Intel mesa driver:

--noPresent --postProcessFilterType 1 : Frame 301, FPS: 1197.6

--noPresent --postProcessFilterType 0 : Frame 301, FPS: 2461.32

With nvidia:

--noPresent --postProcessFilterType 1 : Frame 301, FPS: 1534.82

--noPresent --postProcessFilterType 0 : Frame 301, FPS: 3040.68
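
For reference, the gap between the two settings can be expressed as a simple ratio of the reported frame rates. The helper below is only a convenience for that arithmetic; `Speedup` is a hypothetical name and not part of the sample.

```cpp
#include <cassert>

// Ratio between two frame rates, e.g. the type 0 run over the type 1 run.
double Speedup(double fasterFps, double slowerFps) {
    assert(slowerFps > 0.0);
    return fasterFps / slowerFps;
}
```

With the numbers above, `Speedup(2461.32, 1197.6)` is about 2.06 on the Intel Mesa driver and `Speedup(3040.68, 1534.82)` is about 1.98 on NVIDIA, so the type 0 run is roughly twice as fast in both cases.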

I'm not saying that compute would be more efficient here on all HW. But if we are going to allocate and free resources on each frame and stall the pipeline with fences, then the compute filter would be much more efficient.

The filter interface is a generic class. Instead of using the compute implementation, one can derive a transfer-based filter from it. The class provides pre-allocated command buffers, fences, and semaphores.

So, when you allocate the filter object, just create an instance of the transfer class rather than the compute class. The rest of the code would work the same. The filter uses a semaphore to synchronize with the video queues without stalling the pipeline.
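
The structure described here can be sketched as follows. This is a rough illustration under stated assumptions, not the project's filter API: `CopyFilter`, `ComputeCopyFilter`, `TransferCopyFilter`, `FrameSlot` and `MakeCopyFilter` are hypothetical names, and a `FrameSlot` merely stands in for a pre-allocated command buffer with its semaphore.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Stand-in for per-frame resources that are allocated once, up front.
struct FrameSlot {
    uint32_t index = 0;
    uint64_t lastFrame = 0;
};

// Generic filter: owns the pre-allocated slots and reuses them round-robin,
// so nothing is allocated or freed per frame.
class CopyFilter {
public:
    explicit CopyFilter(size_t slotCount) : slots_(slotCount > 0 ? slotCount : 1) {
        for (size_t i = 0; i < slots_.size(); ++i) {
            slots_[i].index = static_cast<uint32_t>(i);
        }
    }
    virtual ~CopyFilter() = default;

    virtual std::string Name() const = 0;

    // Picks the slot for this frame and lets the subclass record its copy.
    const FrameSlot& Process(uint64_t frameNumber) {
        FrameSlot& slot = slots_[frameNumber % slots_.size()];
        slot.lastFrame = frameNumber;
        RecordCopy(slot);
        return slot;
    }

    size_t SlotCount() const { return slots_.size(); }

protected:
    virtual void RecordCopy(FrameSlot& slot) = 0;

private:
    std::vector<FrameSlot> slots_;
};

class ComputeCopyFilter : public CopyFilter {
public:
    using CopyFilter::CopyFilter;
    std::string Name() const override { return "compute"; }
protected:
    void RecordCopy(FrameSlot&) override { /* a compute dispatch would go here */ }
};

class TransferCopyFilter : public CopyFilter {
public:
    using CopyFilter::CopyFilter;
    std::string Name() const override { return "transfer"; }
protected:
    void RecordCopy(FrameSlot&) override { /* an image copy command would go here */ }
};

// The caller only chooses which subclass to instantiate.
std::unique_ptr<CopyFilter> MakeCopyFilter(bool useTransfer, size_t slotCount) {
    if (useTransfer) return std::make_unique<TransferCopyFilter>(slotCount);
    return std::make_unique<ComputeCopyFilter>(slotCount);
}
```

The point of the design is that the decoder code calling `Process` does not change when the transfer variant replaces the compute variant; only the factory call differs.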

zlatinski · 2025-05-09T08:28:32Z

I suggest this MR be split into multiple topic-specific MRs. Some of the changes are ready to be merged; others need more work.

dabrain34 · 2025-05-19T14:50:08Z

#40 and #43 have been created to split this MR.

dabrain34 requested a review from zlatinski April 23, 2025 14:56

dabrain34 force-pushed the dab_fix_anv_support branch 2 times, most recently from 2e390c8 to 1f9bcf8 Compare April 23, 2025 15:19

dabrain34 changed the title ~~Fix Intel ANV support~~ Decoder: fix mesa driver support (ANV and RADV) Apr 24, 2025

lolzballs approved these changes Apr 25, 2025

zlatinski reviewed May 1, 2025

zlatinski requested changes May 1, 2025

zlatinski reviewed May 1, 2025

common/libs/VkShell/Shell.cpp (outdated, resolved)

dabrain34 force-pushed the dab_fix_anv_support branch 4 times, most recently from a112eef to 19a1e10 Compare May 7, 2025 09:49

zlatinski reviewed May 9, 2025

common/libs/VkCodecUtils/DecoderConfig.h (outdated, resolved)

dabrain34 force-pushed the dab_fix_anv_support branch from 19a1e10 to 3b4d826 Compare May 13, 2025 15:47

dabrain34 mentioned this pull request May 19, 2025

VulkanImage: set the image create format #43

Merged

dabrain34 force-pushed the dab_fix_anv_support branch 2 times, most recently from b63af07 to c197c0a Compare May 19, 2025 14:44

dabrain34 force-pushed the dab_fix_anv_support branch from c197c0a to 70baac6 Compare June 2, 2025 11:52

dabrain34 force-pushed the main branch from fdd3cfc to c29c26e Compare June 11, 2025 10:28

VkVideoDecoder: support a separate transfer queue

822dedf

dabrain34 force-pushed the dab_fix_anv_support branch from 70baac6 to 822dedf Compare June 17, 2025 16:27

dabrain34 force-pushed the main branch from 3c8497b to c5155c0 Compare June 20, 2025 09:21

dabrain34 changed the title ~~Decoder: fix mesa driver support (ANV and RADV)~~ WIP: Decoder: fix mesa driver support (ANV and RADV) Jun 25, 2025

dabrain34 force-pushed the main branch from 82ffea8 to 9de2b32 Compare August 19, 2025 09:45

dabrain34 mentioned this pull request Dec 18, 2025

Add support for cross-queue family transfer operations #133

Open
