Windows: Freezing in Kernel with libusb-win32 #1255
Comments
libusb0.dll (libusb-win32 API) library is not used. libusbK.dll will talk to libusb0.sys directly through IOCTLs. libusbK.sys (KMDF based) supports all the IOCTLs from libusb0.sys (WDM based). Does the following Wiki FAQ entry answer your question?
Edit --> Wiki updated. From libusb 1.0.24 onwards, it always uses the following scheme, regardless of the presence of libusbK.dll.
|
Hello, thank you for your prompt response. Sorry that I missed the Wiki FAQ. The reason behind my question is that I have an issue with libusb_clear_halt: when called on a libusb0.sys device, it sometimes freezes completely, making the whole process unkillable. I am trying to understand where exactly in the call it freezes, and started by looking at the libusb sources. But then I guess I should have a look at libusbK to start with. |
Just wondering, which version of libusb0.sys are you using? You may want to try the latest 1.2.7.3 release if you are still using the older libusb0.sys 1.2.6.0 version. Does the problem occur if you switch to the WinUSB driver or the libusbK.sys driver? If you suspect a libusbK.dll issue, you may want to remove libusbK.dll and use the WinUSB driver to try it out. Another thing, some devices may choke at |
Hello @mcuee : I am using 1.2.7.3 indeed. And this behavior is happening on plain physical Windows (in case you wonder whether I am on a virtual machine, since I posted PR #1202). I still need to test with other drivers, though it is currently not an option to use them for real, only for testing. And yes: I am pretty sure that there are some bugs in the device FW. Right before I get a freezing clear_halt I can see a failed ControlTransfer (out), which should not happen. What I see is that as soon as I have a failed ControlTransfer, the next clear_halt will freeze forever. My problem is that I currently don't have access to the FW, so fixing the bug is not an option. I am writing complicated code to detect the failure condition and find a way to reset the device without freezing or causing other issues on the software side. However, I feel that a function that can freeze so hard that it prevents even Task Manager from killing its calling process, with no way of cancelling it, is not desirable in general. I don't understand why clear_halt could not return with an error if it cannot do its task in a reasonable amount of time. And I haven't tested it yet, but I am pretty sure (based on what I have observed so far) that reset_device would freeze as well. Did I miss something in the API that could either let me specify a timeout for those functions, or let me call another function to cancel them? |
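A minimal sketch of the application-level guard described in the comment above, assuming the rule "once a control transfer has failed, do not issue clear_halt on this handle any more". The names `usb_guard`, `guard_init`, `guard_record_transfer` and `guard_may_clear_halt` are hypothetical and not part of libusb. The only libusb convention relied on is that transfer functions such as libusb_control_transfer return a negative value on error and a byte count otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state kept by the application, not a libusb type. */
typedef struct {
    bool transfer_failed;   /* set once any transfer has returned an error */
    int  last_error;        /* last negative return code seen, 0 if none */
} usb_guard;

static void guard_init(usb_guard *g)
{
    assert(g != NULL);
    g->transfer_failed = false;
    g->last_error = 0;
}

/* Feed in the return value of a transfer call: negative means error,
 * zero or positive is the number of bytes transferred. */
static void guard_record_transfer(usb_guard *g, int rc)
{
    assert(g != NULL);
    if (rc < 0) {
        g->transfer_failed = true;
        g->last_error = rc;
    }
}

/* In the scenario reported here, clear_halt froze when issued after a failed
 * control transfer, so this guard refuses it once a failure was recorded. */
static bool guard_may_clear_halt(const usb_guard *g)
{
    assert(g != NULL);
    return !g->transfer_failed;
}
```

The caller would check `guard_may_clear_halt()` before every clear_halt and treat a refusal as "close and reopen the device" or whatever recovery the application prefers. This avoids the trigger condition only; it does not make the driver call itself cancellable.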
Interesting. However buggy the device is, this shouldn't freeze the program or the machine if a timeout* is given. There are some parts in libusb0.sys, especially the error paths, that I am unsure about. Do you have the WDK so that you can build and test patched drivers? Can you reproduce this in a VM (would be nice for such tests)? *) if not in the high-level API, there should be some timeout /somewhere/. libusb0.sys uses the timeout specified in the IOCTL, also for clear_feature. |
Thank you for your interest in my issue. "However buggy the device is, this shouldn't freeze the program": yes, that's my point, I am happy that we agree. For now I am working on finding a workaround at the application level, that's the priority. But in parallel I will swap drivers to better understand which component is responsible for this behavior, and in the end decide whether I could try to fix it or rather change driver for good. I should be able to build patched libusb0.sys versions if it turns out that this code is where the freeze occurs. For reference: my device is made of an FX3 connected to a Zynq. The issue happens when the Zynq is power-cycled while the FX3 is kept alive. It occurs with a probability of approximately 1/5, so there is probably some dependency on the exact timing of the events. Also, I just noticed that I have the same issue with various devices when I hibernate my computer, remove the devices, then wake it up: it seems that some parts of the library / driver do not realize what just happened and go on with the tasks they were doing. If I call clear_halt at this point, it will freeze 100% of the time. |
I am wondering whether the "freezes" I experience are somehow related to what is described in #1127 . I will try with the code proposed in this PR as well. |
Any updates on this issue? Thanks. |
@mcuee : Regarding the "freeze": I should soon be able to reproduce it and study it more in depth. I'll keep you posted |
@mcuee : So I have been able to reproduce a similar issue, quite easily actually. I start the program, I can see that control transfers (out) are correctly transmitted and received, then I simply disconnect the USB cable (physically). The behavior that occurs depends on the driver used. With libusb0.sys 1.2.7.3, the next transfer fails with error code LIBUSB_ERROR_IO, but the one after that completely freezes the program. From there I have an unkillable process that stays there until I reboot the OS. Using a kernel debugger I can see that it is frozen somewhere within libusb0:
I tried to have a look at the libusb-win32 source code, but so far I haven't understood exactly what the issue is, and I doubt I will. Using Zadig to replace the driver with either WinUSB or libusbK and trying the same experiment results in no freeze at all: after I unplugged the USB cable, So, it is not exactly the issue with I think this also shows that there is no real problem with libusb-1.0 itself. |
I am wondering whether this #1018 may help, I will give it a try |
I don't think 1018 will help for the kernel hang, but it is in any case appreciated if you test that PR. I notice your kernel dump is stuck in KeWaitForSingleObject(), which is called from the libusb0.sys parts that I mentioned above. From looking at the code I had a suspicion that the IRP locking is not done properly and there is a chance of races or deadlocks. I started working on some patches (based on the MS documentation) but haven't been able to test them myself. I could compile it with the WDK but didn't figure out the installation part. I can push this to an experimental libusb0.sys PR and hope someone can look at it or test it. |
@tormodvolden : thank you. I can try to build a patched libusb0.sys and try again, or better just try to replace mine with your built libusb0.sys. I'll do this soon. |
@tormodvolden : would you mind attaching a built libusb0.sys containing your latest patch somewhere? Thanks in advance! |
@dontech |
Ok, no, it doesn't help with the kernel hang, but I get 2 control transfer failures (with ERROR_IO) before the freeze ;-) I was hoping that the Apart from this I see no difference under normal behavior between master and your PR. I guess this is good.. |
@sonatique : This should be the patched libusb0.sys. I also included the other build artifacts (from running make_all.bat in libusb/ddk_make in a "Windows Win7 x64 Free Build Environment" window). I have done minimal testing of the patched libusb0.sys with xusb just to see that it works and doesn't BSOD (copied the file into system32/drivers in a Recovery Command Line window and rebooted with "Disable driver signature enforcement"). |
@tormodvolden : so I tested your patched libusb0.sys by replacing my current one with yours (and of course rebooting with disabled driver signature enforcement). I then repeated the exact same procedure as described above:
So very good improvement since no hang at all, but still strange behavior I think: I thank you very much for this attempt, and for sure it goes in the right direction, but I think it is not usable as is since it could fool the application into thinking everything is fine when there is no longer any device attached. Anyway this "disable driver signature enforcement" thing makes me think we are doomed: we can no longer use drivers that we built ourselves (I mean, yes, we could use them but not distribute them, etc.), so basically we are stuck with either staying with drivers we built before the change in requirements made by MS, or switching to WinUSB.... This basically makes any future special driver development meaningless, except maybe for niche/hobby things. What do you think? |
Thanks for testing this. I am very glad that the hang went away. I also wonder why "then OK again" happens. Can you please attach debug logs? This is still with PR 1018 applied, right?
For signed libusb0.sys driver updates, this is still possible because dontech has the appropriate certifications to sign it. |
Here is a debug build, I gave it version number 1.2.7.101. EDIT: Seems to work fine with DebugView. |
@tormodvolden : OK here are some results with your debug build (was tricky because of the overwhelming amount of log produced by the very fast looping). Here are some logs:
Current master, commit 6bf2db6 (without PR 1018)
Output from my modified sam3u_benchmark (a few printf and libusb-1.0 debug-level logs):
Corresponding libusb0.sys driver debug log:
then:
With PR 1018 commits (3) applied on top of current master (commit 6bf2db6)
My program output:
Corresponding libusb0.sys driver debug log:
then:
|
@tormodvolden @dontech : I just tried with 1.2.7.4 snapshot of libusb0.sys driver (thanks for it) and got the same results (minus the driver logs obviously) |
@tormodvolden : I retested exactly the same with libusbK and WinUSB drivers, and (as previously mentioned) I get a completely different, and in my opinion correct, behavior: As of now, I would say that unfortunately current libusb-1.0 is not usable with libusb0.sys: with version <= 1.2.7.3 we get a kernel hang and with >= 1.2.7.4 we incorrectly get OK status for write requests that cannot physically be completed successfully... What do you think? Are there any reasonably possible improvements? Thank you anyway, this at least gave me some precious indications. EDIT: Could the "return success after cable removal problem" be caused by libusbK.dll? And not by libusb-1.0 nor libusb0.sys? Given the results this is what seems most probable to me... I don't understand why "removal during transfer" would end up in a different state than "removal while idle", except if there is a bug in libusbK.dll in the specific code that interacts with libusb0.sys |
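The version boundary mentioned in the comment above could be checked by an application before it decides how to handle disconnects. The following is only a sketch: `drv_version`, `drv_version_parse`, `drv_version_cmp` and `drv_version_hang_reported` are hypothetical helpers, and how the version string is obtained from the installed driver is left out. The threshold 1.2.7.3 is taken from the observation reported in this thread, not from any driver documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical four-part driver version, e.g. 1.2.7.3. */
typedef struct { int part[4]; } drv_version;

/* Parse "a.b.c.d"; returns true only if exactly four numeric fields
 * were read and nothing follows them. */
static bool drv_version_parse(const char *s, drv_version *out)
{
    assert(s != NULL && out != NULL);
    char trailing;
    return sscanf(s, "%d.%d.%d.%d%c", &out->part[0], &out->part[1],
                  &out->part[2], &out->part[3], &trailing) == 4;
}

/* Returns -1, 0 or 1 when a is lower than, equal to or higher than b. */
static int drv_version_cmp(const drv_version *a, const drv_version *b)
{
    for (int i = 0; i < 4; i++) {
        if (a->part[i] != b->part[i])
            return a->part[i] < b->part[i] ? -1 : 1;
    }
    return 0;
}

/* Versions up to and including 1.2.7.3 showed the kernel hang in this thread. */
static bool drv_version_hang_reported(const drv_version *v)
{
    const drv_version last_hanging = { {1, 2, 7, 3} };
    return drv_version_cmp(v, &last_hanging) <= 0;
}
```

An application could use `drv_version_hang_reported()` to refuse further requests after the first failed transfer on affected versions, while still applying its own presence checks on later versions.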
@mcuee : And yes: I can try with a direct libusb-win32.dll only program, let's see if I can do this. |
@mcuee : update (sorry for posting frantically): When I remove the cable, those 2 lines, only once:
Then after, this line for each false successful transfer: So it seems that somehow libusb0.sys is unloaded, but libusbK does not correctly handle this... |
Can you create an issue with libusbK project so that @TravisRo can take a look when he is available? Thanks. |
If you can debug a bit, please check the calling stack when k_Init_Version is called for a transfer, because that is a bit weird although an issue on its own. And the return value from k_Init_Version after unplugging, also inside it SubmitSimpleSyncRequestEx() -> Ioctl_Sync -> DeviceIoControl(). Somewhere a failure is ignored or turned into success. |
I am not very successful in using IRP. I must be missing something. I did what is explained under the link you posted earlier: and then tried to use |
I have no experience with it. I am also not sure if that tracing (on libusb0.sys) will do anything after libusb0.sys has been unloaded. It may confirm that it actually has been unloaded though. |
@tormodvolden :
libusb-1.0 call But, if I do claim_interface before executing my control transfer loop, I get a completely different outcome:
So my issue is specific to cases where auto_claim -> Initialize is called, as if calling Initialize after driver unload would trigger a wrong behavior of libusbK |
Well, probably that libusb-1.0 code around the call to Initialize should understand that the device is no longer there, I am investigating why
does not return, for instance. EDIT: well, no, it's the Initialize call that should fail, I think, at least this is the expectation from libusb-1.0:
I think libusb-1.0 expects a non-zero return value from Initialize and then fails with LIBUSB_ERROR_NO_DEVICE, I cannot say whether this expectation is correct or not. |
But I still think, both in the case of failing control transfer reported as success, and the failing version retrieval reported as success, the DeviceIoControl() should have failed, and I don't think it does here. |
Yes, I also think Initialize should have failed already when it fails to retrieve the version information. But I wonder if it is the same underlying issue as in the control transfers, that the DeviceIoControl() returns success. |
Yes, something is strange in libusbK: if I break right before the call to Initialize (from libusb-1.0), remove the cable, then go on with the execution, all subsequent calls (including claim_interface) are successful. So I think the root bug lies in libusbK Initialize |
Well, I instrumented the libusbK code and ran my faulting scenario. All I can say is that This is very strange to me, especially since doing something similar from inside the libusbK solution (for instance using the open-device example and removing the cable in between the enumeration and the init) works, I mean: fails as expected. So far I didn't figure out what could produce this behavior. I am thinking about some level of caching, but I don't know. If |
I won't exclude the possibility of a bug in libusb0.sys or the driver stack. That's also why I would like to be sure the driver is indeed unloaded, and not being passed IRPs from the I/O manager any longer. In our case userspace has been given a handle from CreateFileA() and is using it with DeviceIoControl() after the physical device has been removed. I am not sure what happens in detail. Since the driver unload has at least been initiated by the I/O manager, I would think the device object was already removed, but then I would expect DeviceIoControl() to fail. Actually I also thought the driver unload would only be called after all file handles have been closed, even if the physical device has disappeared. libusb is not closing it but maybe libusbk does in some cases? |
At this point I have to say I am a bit lost. I did not succeed in showing IRPs. What I observe is that if I run my program, which does claim_interface, then a few control transfers, then break, then remove the cable, then continue, everything happens as if the device was there: all transfers are reported to have successfully transferred bytes. The only way to get errors is to remove the cable at some point during the control transfer, in which case I get errors from this point on. Indeed I don't see libusb-1.0 closing any handle upon disconnection, maybe because hotplug is not supported. All handles seem to stay valid after removal if removal does not occur right in the middle of an operation, and low level / driver calls made from the libusbK dll all succeed. I will try to reproduce with libusb-win32 only. |
I switched to libusb-win32 source code, latest master commit from github. I opened bulk.c from project testbulk. I just modified defines for VID and PID, then did the following:
Looking at the code I can see this call is eventually implemented with DeviceIoControl, and so, again this call fails to fail when there is no device. I have no idea why, and for now I don't know how to go on. |
So this is using libusb-win32 library/dll -> libusb0.sys without any libusbK involved? If you see the same here, it looks like libusb0.sys (or driver installation) is at fault. Maybe an experienced driver developer like @dontech could help us to get closer? |
Ah, I think this may be due to the fact that libusb-win32 API does not support hotplug. Same for libusb-1.0 under Windows. libusbK API does support hotplug. Can you use libusbK API and libusb0.sys to see if that works? |
I believe hotplug support would help here, but I don't see it as necessary. If a transfer cannot be completed successfully (whether due to unplugging or something else) there should be an error. |
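Until transfers reliably report an error after unplugging, an application could cross-check a "successful" result against a fresh enumeration. The sketch below is an assumption-laden illustration: `usb_id`, `usb_id_listed` and `checked_transfer_result` are hypothetical helpers, and the list of IDs is assumed to come from something like libusb_get_device_list plus libusb_get_device_descriptor. Whether a re-enumeration reflects the removal promptly on Windows has not been verified in this thread. `SKETCH_ERROR_NO_DEVICE` uses the same numeric value as LIBUSB_ERROR_NO_DEVICE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: one entry per currently enumerated device. */
typedef struct { uint16_t vid, pid; } usb_id;

#define SKETCH_ERROR_NO_DEVICE (-4)  /* same value as LIBUSB_ERROR_NO_DEVICE */

/* Returns true if the wanted VID/PID pair appears in the enumerated list. */
static bool usb_id_listed(const usb_id *list, size_t count, usb_id wanted)
{
    assert(list != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (list[i].vid == wanted.vid && list[i].pid == wanted.pid)
            return true;
    }
    return false;
}

/* Turn a reported success into an error when the device is no longer
 * enumerated; real errors (negative rc) are passed through unchanged. */
static int checked_transfer_result(int rc, bool still_listed)
{
    if (rc >= 0 && !still_listed)
        return SKETCH_ERROR_NO_DEVICE;
    return rc;
}
```

This is a workaround at the application level only. It does not replace a driver or library fix, and re-enumerating on every transfer would be costly, so a real program would probably run the check periodically or after suspicious results.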
Previously I verified that my libusb-1.0 test program indeed calls this line (I put some printf in the libusbK code, for instance right before this line, compiled the dll and made sure to use it from libusb-1.0). So: when I run in debug, with my device initially plugged in: the first call to SubmitSimpleSyncRequestEx succeeds (as expected). This is the expected behavior but I have no idea what magic makes it work, while something similar called from libusb-1.0 results in the second call (while disconnected) succeeding. I will be happy to continue to help you, but at this point the experiment is quite simple and you should be able to reproduce it with any device, I guess, maybe it's easier. Anyway, if I have a new idea on how to further understand the problem I'll try, but I am a bit stuck right now. |
@sonatique Just a summary of your findings. Is the following correct?
I will agree with you here. |
@mcuee : yes, correct. I tested this as carefully as I can to avoid going in the wrong direction, though I struggle to find a logic in these results. |
So I tested 1.3.0.0 on Windows 10/11 x64 as well as Windows 11 ARM64 and it worked fine in all cases. I didn't see any functional issue so far, thanks! Strangely, while installation worked like a charm under x64, the signature was considered invalid under Windows 11 ARM64. I was still able to install it and run an application using it, though. This seems to be a different behavior than on Windows x64, I think, where a failed signature verification completely prevents installation. So far I wasn't able to figure out whether this installation glitch under ARM64 is caused by my INF file or by something else. While the signature of libusb0.sys is said to be valid by Explorer under ARM64 (right click on the file), something goes wrong during the install process.. I'll try to figure out why. |
@sonatique Reference: as of now, Zadig does not work with Windows 11 on ARM64. |
@mcuee Since the driver is signed, on Windows x64 there was no warning, no nothing, installation went straight through. On ARM64 I got a red warning dialog telling me that the publisher is unknown, but it gave me the choice to install anyway and afterward it didn't complain anymore. I did not disable signing check nor enable test signing, I used the plain configuration. I installed the driver under ARM64 using the same CAT / INF file I use for Windows x64, just modified for adding ARM64 (I signed the .cat, but not the .sys). But maybe I made a mistake causing the red warning, I don't know. Maybe someone with more experience could tell. As of now I cannot share my CAT / INF, I'll try to see if I can later. Do you know whether there exists a reference for it? Could Zadig be considered a reference? Maybe the fact that Zadig is having issues as well could be caused by an issue on Windows ARM64 itself, I don't know |
Interesting. Thanks for the info. So you have a valid code signing certificate (and it is an EV certificate), right? Then you sign the driver package to get a signed cat file under Windows 11 on ARM64, and then it works, even though there is a red warning dialog telling you that the publisher is unknown. libusb0.sys was already signed in this case by @dontech. This is very interesting. Supposedly the right way is to submit the driver package to Microsoft and carry out attestation signing which is what @dontech has done to get signed libusb0.sys. But you did not need to do that. And from what you mentioned, your method will not work for Windows 11 on x64, but strangely it works under Windows 11 on ARM64. Interesting. @pbatard |
@mcuee : Actually I tried two things: an INF based on the "legacy" system where the driver ends up installed in system32/drivers. (The problem is that it is not unique: different apps could end up installing different versions of libusb0.sys and the last one will override the previous one, but that is the way of doing it that has been used so far). I also tried the more modern approach where the file ends up in the "Driver Store" with a unique name. I would prefer using this method. This worked without glitch for x64, but completely failed on ARM64: I just got a dialog saying "Windows encountered a problem installing the driver for your device" but no mention of a signing issue (or any other issue). I am currently trying to understand how I could get more details about this error. DebugView doesn't work at all on ARM64, and I have no idea so far where to look for logs or something that could give me a hint about what is wrong. |
OK, I think I need to install WIN11 ARM64 on one of my boxes and debug the problem. I added ARM64 more or less blind-folded and did not have any way of verifying this. |
I have now tried this myself on ARM64 Windows 11. It seems to work great. I tried both:
Both worked without any problems. |
@sonatique I can recommend using the INF file from the repository under "ddk_make" as a template. This seems to work great on all supported platforms. |
And does this mean that, even if a given device is "officially" using libusb0 (i.e. libusb0.sys, libusb0.dll reported by Device Manager), then in the end all functions of winusb_interface are backed by libusbK implementation and none of libusb0 code will be called?
My understanding is based on this code snippet from windows_winusb.c, function winusbx_init, line 2505:
sub_api takes values from 0 to 1, i.e. SUB_API_LIBUSBK and SUB_API_LIBUSB0, meaning that function pointers corresponding to libusb0 are actually pointing to libusbK.
Thank you in advance for your help understanding this.
Best regards.
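The pattern asked about can be illustrated with a small function table. This is not the libusb source: every name below (`sketch_sub_api`, `sketch_backend`, `sketch_init_table`, `sketch_backend_for`) is hypothetical, and the sketch only assumes what the question and the first reply state, namely that the slots for indices 0 and 1 are filled in one loop from the same library, libusbK.dll, which then talks to libusb0.sys through IOCTLs.

```c
#include <assert.h>
#include <stddef.h>

/* Indices mirror the order described in the question (0 = libusbK, 1 = libusb0). */
enum sketch_sub_api {
    SKETCH_SUB_API_LIBUSBK = 0,
    SKETCH_SUB_API_LIBUSB0 = 1,
    SKETCH_SUB_API_COUNT
};

typedef const char *(*backend_name_fn)(void);

/* One slot per sub API; a real table would hold many function pointers. */
struct sketch_backend { backend_name_fn initialize; };

/* Stand-in for a function resolved from libusbK.dll. */
static const char *libusbk_initialize(void) { return "libusbK.dll"; }

static struct sketch_backend sketch_table[SKETCH_SUB_API_COUNT];

/* A single loop fills both slots from the same library, so the libusb0
 * entry ends up pointing at the libusbK implementation. */
static void sketch_init_table(void)
{
    for (int sub_api = SKETCH_SUB_API_LIBUSBK; sub_api <= SKETCH_SUB_API_LIBUSB0; sub_api++)
        sketch_table[sub_api].initialize = libusbk_initialize;
}

/* Returns the name of the library that actually serves the given sub API. */
static const char *sketch_backend_for(int sub_api)
{
    assert(sub_api >= 0 && sub_api < SKETCH_SUB_API_COUNT);
    assert(sketch_table[sub_api].initialize != NULL);
    return sketch_table[sub_api].initialize();
}
```

After `sketch_init_table()`, asking for `SKETCH_SUB_API_LIBUSB0` returns the same implementation as `SKETCH_SUB_API_LIBUSBK`. That matches the reading in the question: the driver in use can be libusb0.sys while the user-space calls go through libusbK.dll.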