-
Notifications
You must be signed in to change notification settings - Fork 13
WIP: Decoder: fix mesa driver support (ANV and RADV) #34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
2e390c8 to
1f9bcf8
Compare
|
related to nvpro-samples/vk_video_samples#71 |
|
@lolzballs can you have a look to this change too ? |
lolzballs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did not test with RADV, but the changes make sense and it works on AMD's Windows driver.
| imageCreateInfo.pQueueFamilyIndices = &queueFamilyIndices; | ||
| imageCreateInfo.initialLayout = VK_IMAGE_LAYOUT_PREINITIALIZED; | ||
| imageCreateInfo.flags = 0; | ||
| imageCreateInfo.flags = VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this required?
From the spec:
If image was created with a multi-planar format, and the image view’s aspectMask is one of VK_IMAGE_ASPECT_PLANE_0_BIT, VK_IMAGE_ASPECT_PLANE_1_BIT or VK_IMAGE_ASPECT_PLANE_2_BIT, the view’s aspect mask is considered to be equivalent to VK_IMAGE_ASPECT_COLOR_BIT when used as a framebuffer attachment.
Does this have somethig todo with:
If image was created with the VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT and the image has a multi-planar format, and if subresourceRange.aspectMask is VK_IMAGE_ASPECT_PLANE_0_BIT, VK_IMAGE_ASPECT_PLANE_1_BIT, or VK_IMAGE_ASPECT_PLANE_2_BIT, format must be compatible with the corresponding plane of the image, and the sampler to be used with the image view must not enable sampler Y′CBCR conversion. The width and height of the single-plane image view must be derived from the multi-planar image’s dimensions in the manner listed for plane compatibility for the plane.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this required? From the spec: If image was created with a multi-planar format, and the image view’s aspectMask is one of VK_IMAGE_ASPECT_PLANE_0_BIT, VK_IMAGE_ASPECT_PLANE_1_BIT or VK_IMAGE_ASPECT_PLANE_2_BIT, the view’s aspect mask is considered to be equivalent to VK_IMAGE_ASPECT_COLOR_BIT when used as a framebuffer attachment.
Right, I think this is just a workaround.. but still the spec also says:
If the image has a multi-planar format, subresourceRange.aspectMask is VK_IMAGE_ASPECT_COLOR_BIT,
and usage includes VK_IMAGE_USAGE_SAMPLED_BIT, then the format must be identical to
the image format and the sampler to be used with the image view must enable
sampler Y′CBCR conversion.
So, I think, the IV format should be identical to the image format. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, so this above sentence is for the usage with graphics pipeline. It says: YCbCr images must be used with a sampler in the graphics/compute pipelines or as individual planes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I mean in the code, the IV format is NOT identical to the image format, which I think it's not valid. So we need to fix this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This violates the spec for sure.
VVL complains like that:
vkCreateImageView(): pCreateInfo->format VK_FORMAT_R8_UNORM is different from VkImage
0x190000000019 format (VK_FORMAT_G8_B8R8_2PLANE_420_UNORM). Formats MUST be IDENTICAL
unless VK_IMAGE_CREATE_MUTABLE_FORMAT BIT was set on image creation.
The Vulkan spec states: If image was not created with the VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT flag,
or if the format of the image is a multi-planar format and if subresourceRange.aspectMask is
VK_IMAGE_ASPECT_COLOR_BIT, format must be identical to the format used to create image
(https://docs.vulkan.org/spec/latest/chapters/resources.html#VUID-VkImageViewCreateInfo-image-01762)
So we need to set VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT(as this patch) or do not try to create iv for each plane.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree with the need for MUTABLE.
zlatinski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe this is the most effective way to implement the transfer. Consider using a filter for your transfer. Furthermore, we can derive a filter class that uses transfer-only operations. The filter framework doesn't allocate and free resources on each frame and uses a semaphore for synchronization (it doesn't stall the pipeline).
| m_vkDevCtx->CmdEndVideoCodingKHR(frameDataSlot.commandBuffer, &decodeEndInfo); | ||
|
|
||
| if (m_useTransferOperation == VK_TRUE) { | ||
| if (m_useTransferOperation == VK_TRUE && m_transferCommandPool == VK_NULL_HANDLE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you tried using the filter?
--enablePostProcessFilter 0
This should work on any implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I confirm this is working. Thanks for your recommendation.
It was not obvious from the command line help.
So I changed the documentation to explain this param and its default value.
What is the most efficient the transfer in the decode queue or the compute filter ? I enabled the compute filter YCBCRCOPY(1) in order to support Mesa driver by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we prefer the compute based copy rather than the transfer based which uses vkCmdCopyImage?
If an implementation has dedicated transfer HW it would be more efficient to use the transfer queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So vkCmdCopyImage with dedicated transfer queue will be more efficient that the compute based copy ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes since it might use DMA based copy, so that the compute units are free to do other stuff or even be off the conserve power.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I should keep the transfer queue usage seen that it is not used by nvidia
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I brought back the use of this transfer queue as it does not harm if the decode and transfer are on the same queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I compared the use of transfer queue against Process Filter and the result is quite eloquent:
With Intel mesa driver:
--noPresent --postProcessFilterType 1: Frame 301, FPS: 1197.6--noPresent --postProcessFilterType 0: Frame 301, FPS: 2461.32
With nvidia:
--noPresent --postProcessFilterType 1: Frame 301, FPS: 1534.82--noPresent --postProcessFilterType 0: Frame 301, FPS: 3040.68
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would we prefer the compute based copy rather than the transfer based which uses vkCmdCopyImage?
If an implementation has dedicated transfer HW it would be more efficient to use the transfer queue.
I'm not saying that compute would be more efficient here on all HW. But if we are going to allocate and free resources on each frame and stall the pipeline with fences, then the compute filter would be much more efficient.
The filter interface is a generic class. Instead of using the compute implementation, one can inherit a transfer-based filter. This class provides pre-allocated command buffers, fences, and semaphores. So,
When you are allocating the object for the filter, just create an instance of class transfer, not compute. The rest of the code would work the same. The filter is using a semaphore to synchronize with the video queues without stalling the pipeline.
a112eef to
19a1e10
Compare
|
I suggest this MR be split into multiple topic-specific MRs. Some of the changes are ready to be merged; others need more work. |
19a1e10 to
3b4d826
Compare
b63af07 to
c197c0a
Compare
c197c0a to
70baac6
Compare
70baac6 to
822dedf
Compare
Multiple fixes were necessary to address the support of Intel Mesa ANV driver:
A transfer filter should be implemented as described in #34 (comment)
This patch fixes #13