ViGEm v1.10.2.0 released

Ey,

a big update of the bus driver was just released. I’m happy to announce, that I was finally able to fix two of the nastiest long-time bugs in the bus driver:

Fortunately both were linked to the same (stupid) mistake I made when porting some sections over from the legacy ScpVBus driver sub-project of the ScpServer/ScpToolkit. Since it’s a rather subtle yet “hidden in plain sight”-obvious bug I’d like to explain it in greater detail.

When you emulate USB devices, you typically want to mimic the physical devices properties as close as possible in your virtual environment as well. Sometimes (depending on the device and the driver stack it uses) you can get away with either slight or even major simplifications of the original implementation and everything is still working fine (the device might have unused interfaces you can ditch or special power/sleep attributes you don’t have to replicate in an emulated device) but typically – especially when debugging – it’s the best idea to emulate everything as precise as possible. Now why is that important? Let’s dive into more detail on how the Xbox 360 Controller interacts with its USB host.

An XUSB-compatible game pad typically exposes four interfaces, where the first one has two endpoints (Interrupt In and Interrupt Out), the second one has four endpoints (two pairs of Interrupt In and Interrupt Out) and the last two interfaces are unused and non of our concern.

When emulating the X360 pad, only the first interface and its Interrupt In endpoint is used to transfer pad state changes to the XUSB/XInput subsystem (while the Interrupt Out endpoint delivers vibration and LED state changes) which then get reflected to games using the XInput API (and DInput, ofc.). This was also the first logic I ported over and tested when first developing the ViGEm Bus ground up.

Now you have to know, that ViGEm is built using the Windows Driver Frameworks, an abstraction layer for lots and lots of boilerplate and pitfalls from WDM-land, which ScpVBus is written in. Why do I bring this up? Well, the same code you need in WDF to queue an IRP and mark it pending are roughly two, with proper locking, maybe five lines of code whereas in WDM it’s more like thirteen to fifteen lines of code. When hunting down the bus I mentioned I used the most basic option available; compare the code of the legacy yet working project to the brand new shiny yet faulty project. I read the same passages over and over and over and didn’t see any difference so I made the mistake of wasting my time looking in other areas for the issue. Then I came back and read again. And again. And ag… wait a minute. What’s this line doing, I’m not checking the value in ViGEm. Ok, that’s a major fail in plain sight but why is the check which pipe the request is associated to important?

if (Pipe == 0x81)
{
	status = STATUS_PENDING;
	IoMarkIrpPending(Irp);

	KeAcquireSpinLock(&pdoData->PendingQueueLock, &PrevIrql);
	{
		IoSetCancelRoutine(Irp, Bus_CancelIrp);

		if (Irp->Cancel == TRUE)
		{
			IoSetCancelRoutine(Irp, NULL);
			KeReleaseSpinLock(&pdoData->PendingQueueLock, PrevIrql);

			Irp->IoStatus.Status = STATUS_CANCELLED;
			Irp->IoStatus.Information = 0;

			IoCompleteRequest(Irp, IO_NO_INCREMENT);
		}
		else
		{
			PPENDING_IRP le = ExAllocateFromNPagedLookasideList(&g_LookAside);

			le->Irp = Irp; InsertTailList(&pdoData->PendingQueue, &le->Link);
			KeReleaseSpinLock(&pdoData->PendingQueueLock, PrevIrql);
		}
	}
}
else
{
	status = STATUS_PENDING;
	IoMarkIrpPending(Irp);

	KeAcquireSpinLock(&pdoData->PendingQueueLock, &PrevIrql);
	{
		IoSetCancelRoutine(Irp, Bus_CancelIrp);

		if (Irp->Cancel == TRUE)
		{
			IoSetCancelRoutine(Irp, NULL);
			KeReleaseSpinLock(&pdoData->PendingQueueLock, PrevIrql);

			Irp->IoStatus.Status = STATUS_CANCELLED;
			Irp->IoStatus.Information = 0;

			IoCompleteRequest(Irp, IO_NO_INCREMENT);
		}
		else
		{
			PPENDING_IRP le = ExAllocateFromNPagedLookasideList(&g_LookAside);

			le->Irp = Irp; InsertTailList(&pdoData->HoldingQueue, &le->Link);
			KeReleaseSpinLock(&pdoData->PendingQueueLock, PrevIrql);
		}
	}
}

Both sections look identical so what’s this check for? 🤔 Wait a minute:

&pdoData->HoldingQueue

What the hell are you? 😮 I’ve never seen you in my entire life, what’s this queue for?! Are you kidding me, how could I have missed that?

Dammit! Flash of genius! That’s the culprit right there; in ViGEm I simply checked for the transfer direction but didn’t distinguish the pipes which are mapped to different endpoints. So since both IRPs from different pipes ended up in the same queue reserved for input reports only, some packets got randomly (race condition too, yay 😖) sent to the wrong pipe. When you send a D-Pad Up state change, it would look like this on the wire:

0x00 0x14 0x01 ...

Which happens to be one of the “headset got inserted” control codes of the XUSB protocol 😂 Feeding two pipes from one queue also caused every second or even third submit request getting swallowed by the wrong pipe and therefore no state changes in XInput. What a beast!

Thankfully the fix was very easy; just introduce a second queue in ViGEm which holds the control interrupt pipe requests pending until device destruction, distinguish the pipes properly and everything is fine! 😀

’till next time!