Warning: This post includes ETW, Windbg, Kernel and Process memory dumps. If you don't want to dive deep into Windows internals you should stop reading now.
One strange issue was a UI hang. Normally these are easy to solve because something is doing CPU intensive things on the UI thread, the UI thread is stuck because of a blocking call to wait for something to happen (e.g. to read a 2 GB large file), or a deadlock has occurred. But this case was different. The UI was stuck but sometimes it did recover after 20s. The UI looked like this while it was not responding:
With Windbg we can examine where the UI thread is stuck from a live process or a memory dump. For managed code we need to load sos.dll as usual.
From the screenshot above we find that the managed stack is calling WaitMessage
0:000> !ClrStack
OS Thread Id: 0x2dbc (0)
Child SP IP Call Site
0053ec9c 761a2a9c [InlinedCallFrame: 0053ec9c] System.Windows.Forms.UnsafeNativeMethods.WaitMessage()
0053ec98 58a4d1ea System.Windows.Forms.Application+ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(IntPtr, Int32, Int32)
0053ed24 58a4cbee System.Windows.Forms.Application+ThreadContext.RunMessageLoopInner(Int32, System.Windows.Forms.ApplicationContext)
0053ed74 58a4ca60 System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
0053eda0 58a35d59 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
UIHang.Program.Main() [D:\Source\FocusHang\UIHang\Program.cs @ 19]
which is a perfectly legal call stack and in no way indicates that the UI thread with the id 0x2dbc is hung. The deadlock check command for unmanaged locks !locks yielded no results and the managed counterpart !SyncBlk also showed nothing.
Dumping the other threads can be done in Windbg but when many threads are involved the Parallel Stacks window of Visual Studio is much better:
The other threads also look normal. By the way, what would an abnormal stack look like? If something has got stuck I simply check out the threads with the longest stack traces because these are usually the ones which are actually doing more than waiting for things to happen. But as you can see from the picture above there are no long stacks involved.
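If you only have Windbg at hand, the "longest stack first" heuristic can be scripted. The following is a minimal Python sketch, assuming you saved the output of a command like ~*k to a text file, that every thread starts with a header line of the form "n  Id: pid.tid ..." and that every frame line starts with a hex frame number followed by two hex columns. The sample text, module names and thread ids are made up for illustration and are not taken from the dump discussed here.

```python
import re
from collections import Counter

# Assumed layout of a Windbg "~*k" listing (32-bit or 64-bit addresses).
THREAD_HEADER = re.compile(r"^\W*(\d+)\s+Id:\s*[0-9a-f]+\.([0-9a-f]+)", re.IGNORECASE)
FRAME = re.compile(r"^\s*[0-9a-f]{2,}\s+[0-9a-f`]+\s+[0-9a-f`]+\s+\S+", re.IGNORECASE)

# Synthetic sample input, only here to show the expected shape of the text.
SAMPLE = """
.  0  Id: 1a2c.10 Suspend: 0 Teb: 00000000 Unfrozen
 # ChildEBP RetAddr
00 0053e000 76000001 moduleA!Wait+0xc
01 0053e010 76000002 moduleA!MessageLoop+0x10
   1  Id: 1a2c.20 Suspend: 0 Teb: 00000000 Unfrozen
 # ChildEBP RetAddr
00 0866f000 76000003 moduleB!ReadFile+0xc
01 0866f010 76000004 moduleB!LoadDocument+0x20
02 0866f020 76000005 moduleB!OnClick+0x30
"""


def stack_depths(dump_text):
    """Return (thread id, frame count) pairs, deepest stack first."""
    depths = Counter()
    current = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            # Thread ids are printed in hex after the process id.
            current = int(header.group(2), 16)
            depths[current] = 0
        elif current is not None and FRAME.match(line):
            depths[current] += 1
    return depths.most_common()


if __name__ == "__main__":
    for tid, depth in stack_depths(SAMPLE):
        print(f"thread 0x{tid:x}: {depth} frames")
```

The threads at the top of the list are the first candidates to look at. A deep stack is only a hint, not proof that the thread is doing something interesting.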
The current dump shows nothing. What can we do? Get more dumps! These showed hangs happening in
but nowhere was a reason visible why they were hanging. The window manager of Windows inside the kernel is the win32k subsystem. If something is stuck at a deeper level then it is happening inside the kernel and user mode stacks are useless. Procdump (my favorite memory dump creation tool) can give you a peek inside the kernel by dumping not only the user mode part of the call stack but also the kernel stacks (this works on Windows 10 only as far as I know). If you have looked carefully at the Windbg output you will notice that the memory dump was performed with the -mk option (see Comment: in the Windbg window) which creates a second dump file besides the user mode dump
D:\UIHang>procdump -mk -ma UIHang.exe
ProcDump v9.0 – Sysinternals process dump utility
Copyright (C) 2009-2017 Mark Russinovich and Andrew Richards
Sysinternals – http://www.sysinternals.com
[16:56:31] Dump 1 initiated: D:\UIHang\UIHang.exe_180218_165631.dmp
[16:56:31] Dump 1 writing: Estimated dump file size is 177 MB.
[16:56:31] Dump 1 complete: 177 MB written in 0.3 seconds
[16:56:31] Dump 1 kernel: D:\UIHang\UIHang.exe_180218_165631.Kernel.dmp
[16:56:32] Dump count reached.
When you open the xxxx.Kernel.dmp file you can navigate to the user mode thread 0x2dbc from our stuck UI thread to see where the call stack continues in the kernel:
Sometimes you can learn something new by looking at the kernel side. In this case the kernel waits for new window messages in NtUserWaitMessage but it is still not clear why this call never wakes up. In that case it makes sense to examine the contents of the window message queue. Unfortunately that can only be done by MS support because the whole windowing stuff is not exposed in Windbg or any published Windbg extension that I am aware of. When sending data to someone else we should gather as much evidence as possible. My current favorite data collection for such types of issues is
- ETW Sample Profiling with 8kHz sample rate and Context Switch Tracing
- Memory Dump of frozen process
- Kernel Memory Dump
Full Kernel Memory dumps are a pain because they are huge. If you are on Windows 10 or Server 2016 there is the option to take a kernel memory dump of only the active memory (https://blogs.msdn.microsoft.com/clustering/2015/05/18/windows-server-2016-failover-cluster-troubleshooting-enhancements-active-dump/) which is great because this excludes the file system cache which is usually many GB in size. To force the creation of a kernel dump which excludes the file system cache you can create a reg file with the contents below:
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kbdhid\Parameters]
"CrashOnCtrlScroll"=dword:00000001

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters]
"CrashOnCtrlScroll"=dword:00000001

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl]
"CrashDumpEnabled"=dword:00000001
"FilterPages"=dword:00000001
and import the reg file. After that you need to reboot the machine. When you now press
Right Ctrl + Scroll Lock + Scroll Lock
you get a sad face screen showing the bug check code MANUALLY INITIATED CRASH, which is just what we want. You can use the .reg file also on Windows 7 machines, where you get a full memory dump because the FilterPages registry value is ignored on older Windows versions.
On my 16 GB machine I now get a small 2.2 GB dump file.
If you want to transfer large files you should compress the data as much as possible. 7z archives are in my experience about 20% smaller than regular .zip files, at the expense of roughly 5 times longer compression times. You can use multithreaded compression with the LZMA2 switch, which splits the data into blocks that can be compressed by multiple threads. If you fall back to LZMA then the speedup is much less. If you are doing this on a server machine where users start working after you have taken a memory dump, you should perhaps stick to the .zip format to compress on a single core and stay nicely in the background.
7z a -m0=LZMA2 c:\temp\Kernel.7z MEMORY.DMP
If you use LZMA2, or LZMA with many different files (which then also parallelizes well, like LZMA2 does within one file), 7z will use all cores it can get. The compressed file is then a 577MB file which is much easier to send around.
A quick look by Microsoft support revealed that the message queue on our main UI thread is stuck waiting for window messages on another thread with the thread id 1880. Let's check the dump file for that thread with e.g. Windbg
0:011> ~~s
eax=00000000 ebx=00000002 ecx=00000000 edx=00000000 esi=00000000 edi=0000050c
eip=7769e7ac esp=0866f7a8 ebp=0866f818 iopl=0 nv up ei pl nz na po nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202
ntdll!NtWaitForSingleObject+0xc:
7769e7ac c20c00 ret 0Ch
0:011> k
 # ChildEBP RetAddr
00 0866f7a4 7642ebf9 ntdll!NtWaitForSingleObject+0xc
01 0866f818 70e0d5bd KERNELBASE!WaitForSingleObjectEx+0x99
02 0866f87c 70e0d80e clr!CLRSemaphore::Wait+0xc0
03 0866f8b8 70e0d8a8 clr!ThreadpoolMgr::UnfairSemaphore::Wait+0x132
04 0866f924 70d6edf1 clr!ThreadpoolMgr::WorkerThreadStart+0x389
05 0866fa44 76568654 clr!Thread::intermediateThreadProc+0x55
06 0866fa58 77694a77 kernel32!BaseThreadInitThunk+0x24
07 0866faa0 77694a47 ntdll!__RtlUserThreadStart+0x2f
08 0866fab0 00000000 ntdll!_RtlUserThreadStart+0x1b
Hm. That is a .NET Threadpool thread which waits for more work to come. There is no indication of what happened before on that thread. But since we have recorded ETW tracing we know a lot of the history of that thread, thanks to high frequency sample profiling data with 8kHz. The default sampling rate of 1kHz is not sufficient when you are searching for cheap method calls which can have far reaching consequences. In this case we are not looking for a performance issue but we want to know which methods this thread has executed before it was waiting for more work.
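To get a feeling for why the sampling rate matters for cheap calls, here is a small back-of-the-envelope sketch in Python. It assumes a periodic sampler with a random phase relative to the call, so the expected number of samples that land inside a method is simply its on-CPU time multiplied by the sampling rate. The 100 microsecond call duration is a hypothetical value chosen for illustration, not a measurement from this trace.

```python
def sample_interval_us(sample_rate_hz):
    """Time between two profiling samples in microseconds."""
    return 1_000_000 / sample_rate_hz


def expected_samples(on_cpu_us, sample_rate_hz):
    """Expected number of samples hitting a call that runs for on_cpu_us.

    For calls shorter than the sample interval this is also the chance
    that the call shows up in the profile at all.
    """
    return on_cpu_us * sample_rate_hz / 1_000_000


if __name__ == "__main__":
    hypothetical_call_us = 100  # assumed duration of a "cheap" call
    for rate in (1_000, 8_000):
        print(
            f"{rate} Hz: one sample every {sample_interval_us(rate)} us, "
            f"expected samples in the call: {expected_samples(hypothetical_call_us, rate)}"
        )
```

Under these assumptions a 100 microsecond call is caught in roughly one out of ten runs at 1kHz, but in about eight out of ten runs at 8kHz.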
Let's check out what thread 0x1880=6272 was doing in WPA:
It was executing a TPL task on that thread where the suspiciously named ctor of HangForm was called. Even more important is that the thread was destroyed exactly 20s after it had executed the ctor of the form, and Windows destroys the window of our hung form on thread exit (see Selection Duration of 20.011s). After that the UI was responsive again. At least that is the observation. But that still does not explain why the main UI thread was stuck. Even if you create a window on another thread, you can have multiple UI threads within an application without issues as long as you do not mix them up. WinForms is very vigilant in this respect and will always throw an InvalidOperationException of the form
InvalidOperationException: “Cross-thread operation not valid: Control ‘<name>’ accessed from a thread other than the thread it was created on.”
if you try bad things like that.
The only way I know how to connect window message pumps from different threads is to call AttachThreadInput. When we search the profiling data for that method we find this:
| | | | |- UIHang.exe!UIHang.HangForm::StartUIOnOtherThread 0x0
| | | | | |- UIHang.exe!UIHang.HangForm::.ctor 0x0
| | | | | |- System.Windows.Forms.ni.dll!System.Windows.Forms.Control.Show()
| | | | | |- UIHang.exe!dynamicClass::IL_STUB_PInvoke 0x0
| | | | | | |- user32.dll!SetParentStub
| | | | | | | win32u.dll!NtUserSetParent
| | | | | | | ntdll.dll!LdrInitializeThunk
| | | | | | | ntdll.dll!LdrpInitialize
| | | | | | | ntdll.dll!_LdrpInitialize
| | | | | | | wow64.dll!Wow64LdrpInitialize
| | | | | | | wow64.dll!RunCpuSimulation
| | | | | | | wow64cpu.dll!Thunk0Arg
| | | | | | | wow64cpu.dll!CpupSyscallStub
| | | | | | | ntoskrnl.exe!KiSystemServiceCopyEnd
| | | | | | | win32kfull.sys!NtUserSetParent
| | | | | | | win32kfull.sys!xxxSetParentWorker
| | | | | | | |- win32kfull.sys!xxxShowWindowEx
| | | | | | | |- win32kfull.sys!zzzAttachThreadInput
The window was created on another thread but it did not attach the thread input queue directly. Instead it was calling user32.dll!SetParent, which attached the window input queues inside the kernel's win32k subsystem by calling zzzAttachThreadInput. That is all happening on our non message pumping TPL Task thread, which is the missing ingredient to finally understand why our main UI thread was blocked due to a programming error on a seemingly unrelated thread. Even with an 8kHz sampling rate we have only one stack trace from the zzzAttachThreadInput method, so one still needs a bit of luck to see the root cause so nicely in the ETW data.
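When a stack tree has been copied out as text, finding the call chain that leads to one frame can be scripted. The sketch below is written in Python and rests on assumptions about the text layout: one frame per line, nesting shown by "| " columns, "|-" marking the start of a new branch, and lines without the dash continuing the previous frame as its only child. The function names parse_frames and path_to_frame are made up for this example, and SAMPLE is a shortened copy of the excerpt above.

```python
import re

# Assumed line layout: "| | |- module!function" or "| | | module!function".
FRAME_LINE = re.compile(r"^\s*((?:\|\s)*\|)(-?)\s*(.+?)\s*$")

SAMPLE = """
| | | | |- UIHang.exe!UIHang.HangForm::StartUIOnOtherThread 0x0
| | | | | |- UIHang.exe!UIHang.HangForm::.ctor 0x0
| | | | | |- UIHang.exe!dynamicClass::IL_STUB_PInvoke 0x0
| | | | | | |- user32.dll!SetParentStub
| | | | | | | win32kfull.sys!NtUserSetParent
| | | | | | | win32kfull.sys!xxxSetParentWorker
| | | | | | | |- win32kfull.sys!xxxShowWindowEx
| | | | | | | |- win32kfull.sys!zzzAttachThreadInput
"""


def parse_frames(text):
    """Turn each tree line into (depth, starts_branch, frame name)."""
    frames = []
    for line in text.splitlines():
        match = FRAME_LINE.match(line)
        if match:
            depth = match.group(1).count("|")
            frames.append((depth, match.group(2) == "-", match.group(3)))
    return frames


def path_to_frame(text, needle):
    """Return the frames from the outermost caller down to the first match."""
    frames = parse_frames(text)
    hit = next((i for i, f in enumerate(frames) if needle in f[2]), None)
    if hit is None:
        return []
    path = [frames[hit][2]]
    i = hit
    while True:
        depth, starts_branch, _ = frames[i]
        k = i - 1
        # A branch start hangs below the nearest shallower line, while a
        # continuation line hangs below the nearest line of equal or
        # smaller depth, which is normally the line right before it.
        while k >= 0 and (frames[k][0] >= depth if starts_branch else frames[k][0] > depth):
            k -= 1
        if k < 0:
            break
        path.append(frames[k][2])
        i = k
    path.reverse()
    return path


if __name__ == "__main__":
    print("\n".join(path_to_frame(SAMPLE, "zzzAttachThreadInput")))
```

For the sample this lists StartUIOnOtherThread, the P/Invoke stub, SetParentStub and the kernel frames down to zzzAttachThreadInput, while the sibling branches (.ctor and xxxShowWindowEx) are left out.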
More documentation about that Win32 behavior would be great. These details seem to be discussed on the Old New Thing Blog (Sharing an input queue takes what used to be asynchronous and makes it synchronous, like focus changes). A few more hints are shown at P41 https://www.slideshare.net/wvdang/five-things-every-win32-developer-should-know. According to that you will attach thread input queues implicitly if you
- Set a parent window (user32.dll!SetParent)
- Set an Owner window (user32.dll!SetWindowLongPtr(win32window, GWLP_HWNDPARENT, formhandle))
- Or install journal hooks (user32.dll!SetWindowsHookEx with a JournalRecordProc)
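In a live process you can guard against the first two cases by checking which threads own the two windows before you connect them. The following Python sketch uses ctypes to call the documented user32 function GetWindowThreadProcessId. The helper names window_thread_id and would_link_input_queues are made up for this example, and the rule it encodes (different owning threads means the input queues get attached) is my reading of the list above, not an official statement.

```python
import ctypes
import sys


def would_link_input_queues(child_thread_id, parent_thread_id):
    """True if parenting/owning would connect windows of two different threads."""
    return child_thread_id != parent_thread_id


def window_thread_id(hwnd):
    """Return the id of the thread that created the window (Windows only)."""
    if sys.platform != "win32":
        raise OSError("GetWindowThreadProcessId is only available on Windows")
    user32 = ctypes.WinDLL("user32", use_last_error=True)
    get_thread = user32.GetWindowThreadProcessId
    # DWORD GetWindowThreadProcessId(HWND hWnd, LPDWORD lpdwProcessId)
    get_thread.argtypes = [ctypes.c_void_p, ctypes.POINTER(ctypes.c_ulong)]
    get_thread.restype = ctypes.c_ulong
    thread_id = get_thread(hwnd, None)  # the process id is not needed here
    if thread_id == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    return thread_id


def check_before_set_parent(child_hwnd, parent_hwnd):
    """Warn when a SetParent call would tie two threads together."""
    child_tid = window_thread_id(child_hwnd)
    parent_tid = window_thread_id(parent_hwnd)
    if would_link_input_queues(child_tid, parent_tid):
        print(
            f"Warning: child window belongs to thread {child_tid}, "
            f"parent window to thread {parent_tid}. Both threads must pump messages."
        )
        return False
    return True
```

A check like this would have flagged the window created on the TPL task thread, because its owning thread differs from the main UI thread.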
Win32K ETW Tracing?
You can also enable tracing for the Win32K subsystem to track the window focus events by adding this ETW provider to your xperf command line
But if the window message pump is stuck the results of this ETW provider and the WPA Window In Focus chart can be misleading.
Window message queue issues are notoriously hard to debug because most relevant data is only available during live debugging while you still can query window states with e.g. Spy++. If you only have a memory dump you will have a hard time figuring out what went wrong. A kernel dump would give you all information, but since there is no public information on how to examine the contents of a window message queue this must be left as an exercise for Microsoft support. If someone knows how to get e.g. thread affinity from a user mode memory dump from an HWND please leave a note below.
Once again memory dumps and ETW tracing have helped to find the actual root cause. The memory dump helps to find stuck threads and strange data points. ETW helps you to find how you got into that state once you know from the dump file where you need to look further.
This time I have learned that .NET Threadpool threads seem to be shut down 20s after they had no real work to do and that Windows will destroy window handles once the creating thread terminates. That can unblock a previously stuck UI as a side effect. If you want to play with the UIHang application you can find it here: https://1drv.ms/f/s!AhcFq7XO98yJgrklCE9_p4RuHoG0Mg