Skip to content

Conversation

@dabrain34
Copy link
Contributor

@dabrain34 dabrain34 commented Apr 23, 2025

Multiple fixes were necessary to address the support of Intel Mesa ANV driver:

  • ANV driver has a separate transfer queue from the decode queue. Add support of this queue.

A transfer filter should be implemented as described in #34 (comment)

This patch fixes #13

@dabrain34 dabrain34 requested a review from zlatinski April 23, 2025 14:56
@dabrain34 dabrain34 force-pushed the dab_fix_anv_support branch 2 times, most recently from 2e390c8 to 1f9bcf8 Compare April 23, 2025 15:19
@dabrain34 dabrain34 changed the title Fix Intel ANV support Decoder: fix mesa driver support (ANV and RADV) Apr 24, 2025
@dabrain34
Copy link
Contributor Author

related to nvpro-samples/vk_video_samples#71

@dabrain34
Copy link
Contributor Author

@lolzballs can you have a look to this change too ?

Copy link
Contributor

@lolzballs lolzballs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not test with RADV, but the changes make sense and it works on AMD's Windows driver.

imageCreateInfo.pQueueFamilyIndices = &queueFamilyIndices;
imageCreateInfo.initialLayout = VK_IMAGE_LAYOUT_PREINITIALIZED;
imageCreateInfo.flags = 0;
imageCreateInfo.flags = VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this required?
From the spec:
If image was created with a multi-planar format, and the image view’s aspectMask is one of VK_IMAGE_ASPECT_PLANE_0_BIT, VK_IMAGE_ASPECT_PLANE_1_BIT or VK_IMAGE_ASPECT_PLANE_2_BIT, the view’s aspect mask is considered to be equivalent to VK_IMAGE_ASPECT_COLOR_BIT when used as a framebuffer attachment.

Does this have somethig todo with:
If image was created with the VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT and the image has a multi-planar format, and if subresourceRange.aspectMask is VK_IMAGE_ASPECT_PLANE_0_BIT, VK_IMAGE_ASPECT_PLANE_1_BIT, or VK_IMAGE_ASPECT_PLANE_2_BIT, format must be compatible with the corresponding plane of the image, and the sampler to be used with the image view must not enable sampler Y′CBCR conversion. The width and height of the single-plane image view must be derived from the multi-planar image’s dimensions in the manner listed for plane compatibility for the plane.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this required? From the spec: If image was created with a multi-planar format, and the image view’s aspectMask is one of VK_IMAGE_ASPECT_PLANE_0_BIT, VK_IMAGE_ASPECT_PLANE_1_BIT or VK_IMAGE_ASPECT_PLANE_2_BIT, the view’s aspect mask is considered to be equivalent to VK_IMAGE_ASPECT_COLOR_BIT when used as a framebuffer attachment.

Right, I think this is just a workaround.. but still the spec also says:

If the image has a multi-planar format, subresourceRange.aspectMask is VK_IMAGE_ASPECT_COLOR_BIT,
and usage includes VK_IMAGE_USAGE_SAMPLED_BIT, then the format must be identical to
the image format and  the sampler to be used with the image view must enable 
sampler Y′CBCR conversion.

So, I think, the IV format should be identical to the image format. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so this above sentence is for the usage with graphics pipeline. It says: YCbCr images must be used with a sampler in the graphics/compute pipelines or as individual planes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I mean in the code, the IV format is NOT identical to the image format, which I think it's not valid. So we need to fix this?

Copy link
Contributor

@zzoon zzoon May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This violates the spec for sure.

VVL complains like that:

vkCreateImageView(): pCreateInfo->format VK_FORMAT_R8_UNORM is different from VkImage 
0x190000000019 format (VK_FORMAT_G8_B8R8_2PLANE_420_UNORM). Formats MUST be IDENTICAL 
unless VK_IMAGE_CREATE_MUTABLE_FORMAT BIT was set on image creation.

The Vulkan spec states: If image was not created with the VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT flag,
or if the format of the image is a multi-planar format and if subresourceRange.aspectMask is 
VK_IMAGE_ASPECT_COLOR_BIT, format must be identical to the format used to create image 
(https://docs.vulkan.org/spec/latest/chapters/resources.html#VUID-VkImageViewCreateInfo-image-01762)

So we need to set VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT(as this patch) or do not try to create iv for each plane.
Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree with the need for MUTABLE.

Copy link
Contributor

@zlatinski zlatinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this is the most effective way to implement the transfer. Consider using a filter for your transfer. Furthermore, we can derive a filter class that uses transfer-only operations. The filter framework doesn't allocate and free resources on each frame and uses a semaphore for synchronization (it doesn't stall the pipeline).

m_vkDevCtx->CmdEndVideoCodingKHR(frameDataSlot.commandBuffer, &decodeEndInfo);

if (m_useTransferOperation == VK_TRUE) {
if (m_useTransferOperation == VK_TRUE && m_transferCommandPool == VK_NULL_HANDLE) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried using the filter?

--enablePostProcessFilter 0

This should work on any implementations.

Copy link
Contributor Author

@dabrain34 dabrain34 May 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirm this is working. Thanks for your recommendation.
It was not obvious from the command line help.
So I changed the documentation to explain this param and its default value.

What is the most efficient the transfer in the decode queue or the compute filter ? I enabled the compute filter YCBCRCOPY(1) in order to support Mesa driver by default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we prefer the compute based copy rather than the transfer based which uses vkCmdCopyImage?

If an implementation has dedicated transfer HW it would be more efficient to use the transfer queue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So vkCmdCopyImage with dedicated transfer queue will be more efficient that the compute based copy ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes since it might use DMA based copy, so that the compute units are free to do other stuff or even be off the conserve power.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I should keep the transfer queue usage seen that it is not used by nvidia

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I brought back the use of this transfer queue as it does not harm if the decode and transfer are on the same queue.

Copy link
Contributor Author

@dabrain34 dabrain34 May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I compared the use of transfer queue against Process Filter and the result is quite eloquent:

With Intel mesa driver:

  • --noPresent --postProcessFilterType 1 : Frame 301, FPS: 1197.6
  • --noPresent --postProcessFilterType 0 : Frame 301, FPS: 2461.32

With nvidia:

  • --noPresent --postProcessFilterType 1 : Frame 301, FPS: 1534.82
  • --noPresent --postProcessFilterType 0 : Frame 301, FPS: 3040.68

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we prefer the compute based copy rather than the transfer based which uses vkCmdCopyImage?
If an implementation has dedicated transfer HW it would be more efficient to use the transfer queue.
I'm not saying that compute would be more efficient here on all HW. But if we are going to allocate and free resources on each frame and stall the pipeline with fences, then the compute filter would be much more efficient.

The filter interface is a generic class. Instead of using the compute implementation, one can inherit a transfer-based filter. This class provides pre-allocated command buffers, fences, and semaphores. So,

When you are allocating the object for the filter, just create an instance of class transfer, not compute. The rest of the code would work the same. The filter is using a semaphore to synchronize with the video queues without stalling the pipeline.

@dabrain34 dabrain34 force-pushed the dab_fix_anv_support branch 4 times, most recently from a112eef to 19a1e10 Compare May 7, 2025 09:49
@zlatinski
Copy link
Contributor

I suggest this MR be split into multiple topic-specific MRs. Some of the changes are ready to be merged; others need more work.

@dabrain34 dabrain34 force-pushed the dab_fix_anv_support branch from 19a1e10 to 3b4d826 Compare May 13, 2025 15:47
@dabrain34 dabrain34 force-pushed the dab_fix_anv_support branch 2 times, most recently from b63af07 to c197c0a Compare May 19, 2025 14:44
@dabrain34
Copy link
Contributor Author

#40 has been created to separate the MR
#43 has been created to separate the MR

@dabrain34 dabrain34 force-pushed the dab_fix_anv_support branch from 70baac6 to 822dedf Compare June 17, 2025 16:27
@dabrain34 dabrain34 changed the title Decoder: fix mesa driver support (ANV and RADV) WIP: Decoder: fix mesa driver support (ANV and RADV) Jun 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RADV, ANV: Decoder: fails to decode any codec: Assertion `image_view->format == image->format'

5 participants