Fix: Thunderbolt eGPU hot-unplug kernel support #985
bdandy wants to merge 20 commits into NVIDIA:main
I think that #984 is not related, as it's about wrong detection of an external GPU. This PR was tested on Thunderbolt 3 with a 3060 GPU and everything works perfectly now (I had been waiting for a fix for more than a few years). As the PR was initially created for 580.105.08, I merged it with master. Please review the changes and apply if possible! Additionally, I created an AUR package for those who need it right now: https://aur.archlinux.org/packages/nvidia-open-egpu-dkms
Sorry, I should have been more clear: I'm also facing crashes on hot unplug and driver unload, mine just happened to be on a TB5 enclosure (which also happened to fail eGPU detection). Will try your patch alongside mine (on #984) when I get a moment and see if it resolves the crashes on TB5 as well 👍
I don't think this repository is maintained… I looked through the list of commits and PRs, and it turns out they never merge pull requests: at least as far back as 2023, there wasn't a single one they merged.
@bdandy Can you modify this patch to work without a timeout on forced driver unbind?
@neon12345 Well, the patch is not perfect. Some things were added only through battle testing: I unplugged the eGPU on my laptop many times (with games running etc.), so the patch reflects the issues I found that way. However, if you can explain the issue with the timeout, I will update the PR.
I switch between NVIDIA and vfio_pci for GPU passthrough on the host, and in my opinion, the driver should just unload as requested, no matter the usage count. I have now managed to get it to zero, but if you need to know what happened, you could in nv-pci.c:
Thunderbolt eGPU Surprise Removal Support
Prevents kernel crashes when a Thunderbolt eGPU is unplugged unexpectedly.
Changes
nvInvalidateDeviceReferences(), nvKmsIsDeviceValid() checks, safe event dispatch
inSurpriseRemoval flag, skip nvKms calls on unplug
uvm_parent_gpu_is_accessible() guards in ISR, cleanup, and memory paths
gpuIsLost checks, graceful session teardown
Key Protections
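The guard pattern named in the changes above (a removal flag plus an accessibility check before each hardware access) can be sketched roughly as follows. This is a minimal userspace illustration under stated assumptions, not code from the patch: `struct fake_gpu` and every `fake_gpu_*` function are hypothetical names invented for the example, and the all-ones check relies on the common PCIe behavior that reads from a removed device return `0xFFFFFFFF`.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a device; not a structure from the driver. */
struct fake_gpu {
    atomic_bool removed;    /* set once surprise removal is detected */
    volatile uint32_t reg;  /* stands in for a memory-mapped register */
};

/* Assumption: a read from a removed PCIe device returns all ones. */
#define FAKE_GPU_READ_LOST 0xFFFFFFFFu

static void fake_gpu_init(struct fake_gpu *gpu, uint32_t reg_value)
{
    atomic_init(&gpu->removed, false);
    gpu->reg = reg_value;
}

/* Called from the removal path (or when a read reveals a lost device). */
static void fake_gpu_mark_removed(struct fake_gpu *gpu)
{
    atomic_store(&gpu->removed, true);
}

/* Cheap check that callers run before touching the hardware. */
static bool fake_gpu_is_accessible(struct fake_gpu *gpu)
{
    return !atomic_load(&gpu->removed);
}

/*
 * Guarded register read: returns true and stores the value in *out on
 * success, or returns false without touching *out if the device is gone.
 */
static bool fake_gpu_read_reg(struct fake_gpu *gpu, uint32_t *out)
{
    if (!fake_gpu_is_accessible(gpu))
        return false;

    uint32_t value = gpu->reg;
    if (value == FAKE_GPU_READ_LOST) {
        /* The device vanished mid-operation: latch the flag so that
         * later callers bail out early instead of reading again. */
        fake_gpu_mark_removed(gpu);
        return false;
    }

    *out = value;
    return true;
}
```

The point of the sketch is that every path fails softly once the flag is set, so interrupt handlers and cleanup code can return early instead of dereferencing state that belongs to a device that is no longer there.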
Testing
RTX 3060 + Thunderbolt 3: idle unplug, workload unplug, reconnect, module reload
fixes #842