Skip to content

Conversation

@xlz
Copy link
Member

@xlz xlz commented Feb 7, 2016

Previous attempt: #210 (including performance numbers)

Now tested with each commit tested on Debian/Ubuntu.

abigail reports some "indirect sub-type change" of PacketPipeline::comp_. Actual testing using old Protonect with new libfreenect.so works. I consider this OK, unless any ABI experts point out the problem.

Interestingly there is a performance regression from kernel 3.19 (230Hz) to 4.2 (160Hz) without userspace change. The kernel performance regression is fixed in torvalds/linux@f87a780 (4.4-rc7). For Ubuntu kernels, 4.2.0-28.33~14.04.1, or 4.4+ have this fixed. Kernels of 4.0 or earlier are not affected.

@xlz xlz mentioned this pull request Feb 7, 2016
@xlz
Copy link
Member Author

xlz commented Feb 10, 2016

@christiankerl I implemented your original idea of memory pool of size 2. As I expected, this is tricky to get right across multiple threads, as there is no explicit guarantee (there are some implicit conditions, but easy to get wrong) that the frame will be freed before the next allocation. I used your design such that a potential future improvement of zerocopy between processors and frame listeners can be done in similar ways without changing the API.

@rastaxe @tlind @guanchar The VAAPI feature is now ready for merge. Can you give it a run, and I'll merge it next week.

Note that there seems to be some performance regression between kernel 3.19 and 4.x.

return new VTRgbPacketProcessor();
#elif defined(LIBFREENECT2_WITH_VAAPI_SUPPORT)
RgbPacketProcessor *vaapi = new VaapiRgbPacketProcessor();
if (vaapi->good())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add a factory method to VaapiRgbPacketProcessor like tryCreate, which returns true if vaapi is available and false otherwise, depending on this you choose the code path here. this would avoid extension of the packetprocessor api with the good() method which is only used here (as far as I can see)?!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could. I am thinking of adding check against errors in packet processors. Errors can happen anytime at runtime in packet processors. Once errors happen, there should be a way to report this to the user. Something outside of packet processors needs the check for the error state. I chose the interface as PackteProcessor::good(). What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, didn't know this

@christiankerl
Copy link
Contributor

did you try the allocator stuff with your cuda depth processor? just want to see if the api is sufficient or needs more changes.

@xlz
Copy link
Member Author

xlz commented Feb 13, 2016

I had not. I have been cleaning up the cuda series since yesterday.

@xlz
Copy link
Member Author

xlz commented Feb 13, 2016

v6:

  • Made *RgbPacketProcess:process() public from protected
  • Changed PoolAllocator from a template to accepting base allocator in its constructor.
  • Moved the allocator definitions to allocator.{cpp,h} replacing double_buffer.{cpp.h}

Major changes, need retesting.

@xlz
Copy link
Member Author

xlz commented Feb 13, 2016

v7: Cosmetic

@xlz
Copy link
Member Author

xlz commented Feb 14, 2016

I tried it with cuda processor. The API is OK.

@xlz
Copy link
Member Author

xlz commented Feb 14, 2016

v8: Fixed a memory leak in NewAllocator::free(). Removed unnecessary releaseBuffer() in VaapiRgbPacketProcessor::process().

It was somehow protected accidentally.
@xlz
Copy link
Member Author

xlz commented Feb 15, 2016

v9: Changed PoolAllocator::allocate() from try_lock(); lock() to a single buffers[next].lock(); next = !next;. Added comments specifying PoolAllocator's behavior.

PoolAllocator can allocate in the same thread at most twice, then the next allocate() will block until free()'d by another thread (but this won't happen between stream parsers and processors - this blocking is intended for between processors and frame listeners).

xlz added 7 commits February 15, 2016 10:52
Provide memory-mapped packet buffers allocated by VA-API to the
RGB stream parser to save a 700KB malloc & memcpy.

Reuse decoding results from the first JPEG packet for all
following packets, assuming JPEG coding parameters do not change
based on some testing.
@xlz xlz merged commit b305053 into OpenKinect:master Feb 15, 2016
@xlz xlz deleted the vaapi-again branch February 15, 2016 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants