The Mysterious UI Hang Which Resolved Itself After 20s

Warning: This post includes ETW, Windbg, Kernel and Process memory dumps. If you don't want to deep dive into the Windows Internals you should stop reading now.

One strange issue was a UI hang. Normally these are easy to solve: something is doing CPU intensive work on the UI thread, the UI thread is stuck in a blocking call waiting for something to happen (e.g. reading a 2 GB file), or a deadlock has occurred. But this case was different. The UI was stuck, but sometimes it recovered after 20s. The UI looked like this while it was not responding:

image

With Windbg we can examine where the UI thread is stuck from a live process or a memory dump. For managed code we need to load sos.dll as usual.
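For a 32 bit .NET Framework process that is the usual command:

.loadby sos clr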

image

From the screenshot above we find that the managed stack is calling WaitMessage:

0:000> !ClrStack
OS Thread Id: 0x2dbc (0)
Child SP       IP Call Site
0053ec9c 761a2a9c [InlinedCallFrame: 0053ec9c] System.Windows.Forms.UnsafeNativeMethods.WaitMessage()
0053ec98 58a4d1ea System.Windows.Forms.Application+ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(IntPtr, Int32, Int32)
0053ed24 58a4cbee System.Windows.Forms.Application+ThreadContext.RunMessageLoopInner(Int32, System.Windows.Forms.ApplicationContext)
0053ed74 58a4ca60 System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
0053eda0 58a35d59 System.Windows.Forms.Application.Run(System.Windows.Forms.Form)
UIHang.Program.Main() [D:\Source\FocusHang\UIHang\Program.cs @ 19]

which is a perfectly legal call stack and in no way an indication of a hung UI thread (id 0x2dbc). The deadlock check command for unmanaged locks, !locks, yielded no results, and the managed counterpart !SyncBlk also showed nothing.

Dumping the other threads can be done in Windbg but when many threads are involved the Parallel Stacks window of Visual Studio is much better:

image

The other threads also look normal. By the way, what would an abnormal stack look like? If something is stuck I simply check the threads with the longest stack traces, because these are usually the ones actually doing more than waiting for things to happen. But as you can see from the picture above there are no long stacks involved.

The current dump shows nothing. What can we do? Get more dumps! These showed hangs happening in

  • user32.dll!PeekMessage
  • user32.dll!SetFocus
  • user32.dll!ShowWindow

but nowhere was a reason visible why they were hanging. The window manager of Windows inside the kernel is the win32k subsystem. If something is stuck at a deeper level then it is happening inside the kernel, and user mode stacks alone are useless. Procdump (my favorite memory dump creation tool) can give you a peek inside the kernel by dumping not only the user mode part of the call stack but also the kernel stacks (as far as I know this works only on Windows 10). If you have looked carefully at the Windbg output you will notice that the memory dump was taken with the -mk option (see the Comment: line in the Windbg window), which creates a second dump file besides the user mode dump:

D:\UIHang>procdump -mk -ma UIHang.exe

ProcDump v9.0 – Sysinternals process dump utility
Copyright (C) 2009-2017 Mark Russinovich and Andrew Richards
Sysinternals – http://www.sysinternals.com

[16:56:31] Dump 1 initiated: D:\UIHang\UIHang.exe_180218_165631.dmp
[16:56:31] Dump 1 writing: Estimated dump file size is 177 MB.
[16:56:31] Dump 1 complete: 177 MB written in 0.3 seconds
[16:56:31] Dump 1 kernel: D:\UIHang\UIHang.exe_180218_165631.Kernel.dmp
[16:56:32] Dump count reached.

When you open the xxxx.Kernel.dmp file you can navigate to the user mode thread 0x2dbc from our stuck UI thread to see where the call stack continues in the kernel:

image

Sometimes you can learn something new by looking at the kernel side. In this case the kernel waits for new window messages in NtUserWaitMessage, but it is still not clear why this call never wakes up. In that case it makes sense to examine the contents of the window message queue. Unfortunately that can only be done by MS support, because the whole windowing machinery is not exposed in Windbg or any published Windbg extension that I am aware of. When sending data to someone else we should gather as much evidence as possible. My current favorite data collection for this type of issue is:

  • ETW Sample Profiling with an 8 kHz sample rate and Context Switch Tracing (an example xperf command follows the list)
  • Memory Dump of frozen process
  • Kernel Memory Dump
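
For the profiling part, a kernel trace with a high sample rate and context switch/ready thread stacks can be recorded with xperf roughly like this. This is a sketch rather than the exact command line used here; 1221 is the sample interval in 100 ns units, which gives roughly 8 kHz:

xperf -SetProfInt 1221 cached
xperf -on PROC_THREAD+LOADER+PROFILE+CSWITCH+DISPATCHER -stackwalk Profile+CSwitch+ReadyThread
rem ... reproduce the hang ...
xperf -d c:\temp\UIHang_Kernel.etl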

Full kernel memory dumps are a pain because they are huge. If you are on Windows 10 or Server 2016 there is the option to take a kernel memory dump of only the active memory (https://blogs.msdn.microsoft.com/clustering/2015/05/18/windows-server-2016-failover-cluster-troubleshooting-enhancements-active-dump/), which is great because it excludes the file system cache, which is usually many GB in size. To force the creation of a kernel dump which excludes the file system cache you can create a reg file with the contents below:

CrashOnCtrlScroll_ActiveMemory.reg

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kbdhid\Parameters]
"CrashOnCtrlScroll"=dword:00000001

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters]
"CrashOnCtrlScroll"=dword:00000001

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl]
"CrashDumpEnabled"=dword:00000001
"FilterPages"=dword:00000001

and import the reg file. After that you need to reboot the machine. When you now press

Right Ctrl + Scroll Lock + Scroll Lock

you get a sad face blue screen with the bug check code MANUALLY INITIATED CRASH, which is just what we want. You can also use the .reg file on Windows 7 machines, where you get a full memory dump because the FilterPages registry key is ignored on older Windows versions.

image

On my 16 GB machine I now get a small 2.2 GB dump file.

image

If you want to transfer large files you should compress the data as much as possible. 7z archives are in my experience about 20% smaller than regular .zip files, at the expense of roughly 5x longer compression times. You can use multithreaded compression with the LZMA2 switch, which splits the data into blocks that can be compressed by multiple threads. If you fall back to LZMA the speedup is much smaller. If you are doing this on a server machine where users continue working after you have taken the memory dump, you should perhaps stick to the .zip format to compress on a single core and stay nicely in the background.

7z a -m0=LZMA2 c:\temp\Kernel.7z MEMORY.DMP

If you use LZMA2 (or LZMA with many different files, which then also parallelizes well across files) 7z will use all cores it can get. The compressed file is then a 577 MB file which is much easier to send around. A quick look by Microsoft support revealed that the message queue of our main UI thread is stuck waiting for window messages of another thread with the thread id 0x1880. Let's check the dump file for that thread with e.g. Windbg:

0:011> ~~[1880]s
eax=00000000 ebx=00000002 ecx=00000000 edx=00000000 esi=00000000 edi=0000050c
eip=7769e7ac esp=0866f7a8 ebp=0866f818 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
ntdll!NtWaitForSingleObject+0xc:
7769e7ac c20c00          ret     0Ch
0:011> k
 # ChildEBP RetAddr  
00 0866f7a4 7642ebf9 ntdll!NtWaitForSingleObject+0xc
01 0866f818 70e0d5bd KERNELBASE!WaitForSingleObjectEx+0x99
02 0866f87c 70e0d80e clr!CLRSemaphore::Wait+0xc0
03 0866f8b8 70e0d8a8 clr!ThreadpoolMgr::UnfairSemaphore::Wait+0x132
04 0866f924 70d6edf1 clr!ThreadpoolMgr::WorkerThreadStart+0x389
05 0866fa44 76568654 clr!Thread::intermediateThreadProc+0x55
06 0866fa58 77694a77 kernel32!BaseThreadInitThunk+0x24
07 0866faa0 77694a47 ntdll!__RtlUserThreadStart+0x2f
08 0866fab0 00000000 ntdll!_RtlUserThreadStart+0x1b

Hm. That is a .NET Threadpool thread which waits for more work to arrive. There is no indication what happened before on that thread. But since we have recorded ETW tracing, the high frequency sample profiling data at 8 kHz tells us a lot about the history of that thread. The default sampling rate of 1 kHz is not sufficient when you are searching for cheap method calls which can have far reaching consequences. In this case we are not looking for a performance issue; we want to know which methods this thread executed before it ended up waiting for more work.

Let's check out what thread 0x1880 = 6272 was doing in WPA:

image

It was executing a TPL task on that thread, where the suspiciously named ctor of HangForm was called. But what is even more important: the thread was terminated exactly 20s after it had executed the ctor of the form, and on thread exit Windows destroys the windows of our hung form (see the Selection Duration of 20.011s). After that the UI was responsive again. At least that is the observation. But that still does not explain why the main UI thread was stuck when a window was created on another thread. You can create multiple UI threads within an application without issues as long as you do not mix them up. WinForms is very vigilant in this respect and will always throw an InvalidOperationException of the form

InvalidOperationException: "Cross-thread operation not valid: Control '<name>' accessed from a thread other than the thread it was created on."

if you try bad things like that.

The only way I know of to connect the window message pumps of different threads is to call AttachThreadInput. When we search the profiling data for that method we find this:

 |    |    |    |    |- UIHang.exe!UIHang.HangForm::StartUIOnOtherThread 0x0
 |    |    |    |    |    |- UIHang.exe!UIHang.HangForm::.ctor 0x0
 |    |    |    |    |    |- System.Windows.Forms.ni.dll!System.Windows.Forms.Control.Show()
 |    |    |    |    |    |- UIHang.exe!dynamicClass::IL_STUB_PInvoke 0x0
 |    |    |    |    |    |    |- user32.dll!SetParentStub
 |    |    |    |    |    |    |    win32u.dll!NtUserSetParent
 |    |    |    |    |    |    |    ntdll.dll!LdrInitializeThunk
 |    |    |    |    |    |    |    ntdll.dll!LdrpInitialize
 |    |    |    |    |    |    |    ntdll.dll!_LdrpInitialize
 |    |    |    |    |    |    |    wow64.dll!Wow64LdrpInitialize
 |    |    |    |    |    |    |    wow64.dll!RunCpuSimulation
 |    |    |    |    |    |    |    wow64cpu.dll!Thunk0Arg
 |    |    |    |    |    |    |    wow64cpu.dll!CpupSyscallStub
 |    |    |    |    |    |    |    ntoskrnl.exe!KiSystemServiceCopyEnd
 |    |    |    |    |    |    |    win32kfull.sys!NtUserSetParent
 |    |    |    |    |    |    |    win32kfull.sys!xxxSetParentWorker
 |    |    |    |    |    |    |    |- win32kfull.sys!xxxShowWindowEx
 |    |    |    |    |    |    |    |- win32kfull.sys!zzzAttachThreadInput

The window was created on another thread, but the code did not attach the thread input queues directly. Instead it called user32.dll!SetParent, which inside the kernel, in the win32k subsystem, attaches the window input queues by calling zzzAttachThreadInput. All of that happens on our non message pumping TPL task thread, which is the missing ingredient to finally understand why our main UI thread was blocked by a programming error on a seemingly unrelated thread. Even with the 8 kHz sampling rate we got only a single stack trace containing zzzAttachThreadInput, so one still needs a bit of luck to see the root cause so nicely in the ETW data.

More documentation about that Win32 behavior would be great. These details seem to be discussed on the Old New Thing blog (Sharing an input queue takes what used to be asynchronous and makes it synchronous, like focus changes). A few more hints are shown on slide 41 of https://www.slideshare.net/wvdang/five-things-every-win32-developer-should-know. According to that you will attach thread input queues implicitly if you (a minimal repro sketch follows the list):

  • Set a parent window (user32.dll!SetParent)
  • Set an owner window (user32.dll!SetWindowLongPtr(win32window, GWLP_HWNDPARENT, formhandle))
  • Or install journal hooks (user32.dll!SetWindowsHookEx with a JournalRecordProc)
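
To make that concrete, here is a minimal sketch of the pattern that caused our hang. It is a reconstruction for illustration only; the class and method names merely mirror the UIHang sample, they are not its exact code:

using System;
using System.Runtime.InteropServices;
using System.Threading.Tasks;
using System.Windows.Forms;

static class HangRepro
{
    [DllImport("user32.dll", SetLastError = true)]
    static extern IntPtr SetParent(IntPtr hWndChild, IntPtr hWndNewParent);

    // Create a child window on a TPL task thread and parent it to the main form.
    // SetParent implicitly attaches the input queues of both threads (zzzAttachThreadInput
    // in win32k), so the main UI thread now depends on a thread that never pumps messages.
    public static void StartUIOnOtherThread(IntPtr mainFormHandle)
    {
        Task.Factory.StartNew(() =>
        {
            var form = new Form();                    // window is created on the task thread
            form.Show();
            SetParent(form.Handle, mainFormHandle);   // input queues get attached here
            // No message loop runs on this thread. The UI only unfreezes when the idle
            // threadpool thread is retired (~20s later) and its windows are destroyed.
        });
    }
}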

Win32K ETW Tracing?

You can also enable tracing for the Win32k subsystem to track window focus events by adding this ETW provider to your xperf command line:

Microsoft-Windows-Win32k:0x42e3000:0xff
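
For example, a separate user mode session with that provider could be started alongside the kernel session like this (the session and file names are just placeholders):

xperf -start Win32kSession -on Microsoft-Windows-Win32k:0x42e3000:0xff -f c:\temp\win32k.etl
rem ... reproduce the issue ...
xperf -stop Win32kSession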

But if the window message pump is stuck the results of this ETW provider and the WPA Window In Focus chart can be misleading.

Conclusions

Window message queue issues are notoriously hard to debug because the most relevant data is only available during live debugging, when you can still query window states with e.g. Spy++. If you only have a memory dump you will have a hard time figuring out what went wrong. A kernel dump would give you all the information, but since there is no public documentation on how to examine the contents of a window message queue, this must be left as an exercise for Microsoft support. If someone knows how to get e.g. the thread affinity of an HWND from a user mode memory dump, please leave a note below. Once again memory dumps and ETW tracing have helped to find the actual root cause. The memory dump helps to find stuck threads and strange data points. ETW helps you to find out how you got into that state once you know from the dump file where you need to look further.

This time I have learned that .NET Threadpool threads seem to be shut down 20s after they last had real work to do, and that Windows destroys window handles once the creating thread terminates. That can unblock a previously stuck UI as a side effect. If you want to play with the UIHang application you can find it here: https://1drv.ms/f/s!AhcFq7XO98yJgrklCE9_p4RuHoG0Mg


Be Careful Where You Put GC.SuppressFinalize

I had an interesting issue to debug which resulted in a race condition where the finalizer was being called while the object was still in use. If you know how .NET works this should ring some alarm bells since this should never happen. The finalizer is expected to run only when no one has a reference to the finalizable object anymore.

A simple reproducer is below. It creates 50K finalizable objects. Each object allocates 500 bytes of unmanaged memory, which is released either by calling Dispose on it on a dedicated thread or by the finalizer thread, which cleans up the rest during application shutdown.

using System;
using System.Linq;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        // create 50K events
        var events = Enumerable.Range(1, 50 * 1000)
                                .Select(x => new Event())
                                .ToList();

        ManualResetEvent startEvent = new ManualResetEvent(false);

        Task.Factory.StartNew(() =>
        {
            startEvent.WaitOne();  // wait for event
            foreach (var ev in events) // dispose events
            {
                ev.Dispose();
            }
        });

        startEvent.Set(); // start disposing events
        Thread.Sleep(1);  // wait a bit and then exit
    }
}

public class Event : IDisposable
{
    internal IntPtr hGlobal;  // allocate some unmanaged memory

    public Event()
    {
        hGlobal = Marshal.AllocHGlobal(500);
    }

    ~Event()  // finalizer 
    {
        Dispose();
    }

    public void Dispose()
    {
        if( hGlobal !=  IntPtr.Zero) // check if memory is gone
        {
            Marshal.FreeHGlobal(hGlobal); // free it
            GC.SuppressFinalize(this); // Prevent finalizer from running it again
            hGlobal = IntPtr.Zero;
        }
    }
}

Looks good to you? Let it run:

image

Oops, that should not happen. When trying to run the application under the VS debugger everything works on my machine™. No matter how hard I try it will never crash under the debugger. But if I start it without debugging it will crash every time.

Debug The Problem

When the application crashes without the debugger on a machine where VS is installed you will get a nice dialog

image

where you can click Debug. Then I choose to debug with both the managed and native debugging engines

image

Because part of the issue has to do with the .NET Runtime, we need both managed and unmanaged debugging. It is therefore wise to enable Native and Managed debugging.

image

If you do not manually select both debugging engines, VS will default to unmanaged debugging only, where we miss our managed stack frames, which is not particularly helpful:

image

With the correct debugging engines we find that a heap corruption was reported while the finalizer was being called:

image

While another thread is also disposing events

image

So what is the problem here? Could it be that the finalizer is disposing the same instance on which our TPL thread is still working? A concurrent double free sounds likely, but using only Visual Studio we cannot prove it. If a finalizer were called while the object is still alive we would have found a pretty serious GC bug. On the other hand, if that were the case many people would have complained already.

Gather More Evidence

To analyze the crash with other tools it is good to save a memory dump of the crashing application. You can do this pretty easily with

D:\Source\FinalizerFun\bin\Release>procdump -ma -e -x . FinalizerFunNetFull.exe

ProcDump v9.0 – Sysinternals process dump utility
Copyright (C) 2009-2017 Mark Russinovich and Andrew Richards
Sysinternals – www.sysinternals.com

[21:13:50] Exception: 04242420
[21:13:52] Exception: 80000003.BREAKPOINT
[21:13:52] Exception: C0000374
[21:13:52] Unhandled: C0000374
[21:13:52] Dump 1 initiated: .\FinalizerFunNetFull.exe_180204_211352.dmp
[21:13:52] Dump 1 writing: Estimated dump file size is 83 MB.
[21:13:52] Dump 1 complete: 83 MB written in 0.1 seconds
[21:13:52] Dump count reached.

procdump is a command line tool to take memory dumps in many ways. This time we take a full memory dump (-ma) for unhandled exceptions (-e) while executing a process (-x) and put the dump into the current directory (.), followed by the executable and optional command line arguments for it. The most difficult part is that I always forget that the first parameter after -x is not the executable and its arguments but the dump folder. If you want to capture a dump on first chance exceptions, before they become unhandled, you normally use -e 1, but for reasons not known to me this never triggered the creation of a dump file. If all fails you can still take a memory dump while the "… has stopped working" dialog is shown with procdump for a given pid, like "procdump -ma pid".

You can open the memory dump with Visual Studio without problems by dragging and dropping the .dmp file from Explorer into VS.

image

Press Debug with Mixed to see managed and unmanaged code. Many people shy away from memory dumps. But if you dare to open them, the debugging experience is the same as for a live process stopped at a breakpoint. The only difference is that you cannot continue execution. VS will show your source code and the crashing thread just like it would during a live debugging session:

image

VS has great memory dump support (since ca. VS 2012/2013). If you have an automated build system it is possible to get full source code debugging for your released application. The feature is called Source Server support. For TFS builds it is a simple config switch in your build. With git things are more involved: https://shonnlyga.wordpress.com/2016/05/28/source-server-with-git-repository/. If you have source indexed builds you definitely want to enable Source Server support in the debugger to get live and memory dump debugging without the need to download the source files manually. In Debug – Options

image

check all items below Enable source server support. Unfortunately VS 2017 has broken Source Server support, which is tracked here: https://developercommunity.visualstudio.com/content/problem/169309/debugger-cant-create-folder-structure-when-trying.html

It works in VS 2013, 2015 and 2017 15.6 (still beta). As a workaround you can copy srcsrv.dll from an earlier VS edition over to the VS 2017 one to get Source Server support back.

No Not Windbg!

We have reached a dead end with Visual Studio. It is time to admit that the nice GUI based tools, although powerful, are not always the most helpful ones when you want to completely understand an issue. First we need to download Windbg, for which MS has put up a page: https://developer.microsoft.com/en-us/windows/hardware/download-windbg. This will point you to the Windows SDK page

image

from where you can download the Windows SDK installer. If the installer won't start you already have a newer version of the Win 10 SDK installed. In that case you can download the latest SDK installer from https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk.

When you start the installer you need to press next a few times to get to the list of features you want to install. Check Debugging Tools for Windows and press Install.

image

Now you will find the 32 bit version of Windbg in

“C:\Program Files (x86)\Windows Kits\10\Debuggers\x86\windbg.exe”

and the 64 bit version at

“C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\windbg.exe”

Start the correct Windbg version (x86,x64) and load the crash dump file.

image

Since it is a managed process we need to load the managed debugging extension named sos.dll. This is done with the Windbg command

.loadby sos clr

If you get a failure of the form

0:005> .loadby sos clr
The call to LoadLibrary(C:\Windows\Microsoft.NET\Framework\v4.0.30319\sos) failed, Win32 error 0n193
    "%1 is not a valid Win32 application."
Please check your debugger configuration and/or network access.

then you have loaded the wrong Windbg, because you opened a 32 bit dump with the 64 bit version of Windbg. Things get easier with the upcoming new Windbg (currently in beta), where there is only one debugger which loads the right engine.
The cryptic command tells the debugger to load sos.dll from the same directory where the .NET Runtime dll clr.dll is located. If you wish you can also fully qualify the name like

.load C:\Windows\Microsoft.NET\Framework\v4.0.30319\sos.dll

The 64 bit framework dll is located at

.load C:\Windows\Microsoft.NET\Framework64\v4.0.30319\sos.dll

If you dare to use " for the path then you need to adhere to the C style escape rules, where you need \\ to get a \. If you analyze a memory dump on another machine with a different .NET Framework version installed, some of the SOS commands might not work or sos.dll refuses to load. In that case you can check out my OneDrive folder https://1drv.ms/f/s!AhcFq7XO98yJgoMwuPd7LNioVKAp_A which contains a pretty up to date collection of nearly all .NET Framework sos dlls. You need to extend the symbol path to the downloaded sos dlls (.sympath+ c:\mscordackwksDownloadDir) and then load it via the full path. Things will become easier in the future once Windbg automatically loads the right sos.dll from the symbol server, which seems now to be in place.

We have a managed debugging extension loaded. Now what? First we test if the extension works by executing the !Threads command

0:000> !Threads
c0000005 Exception in C:\Windows\Microsoft.NET\Framework\v4.0.30319\sos.Threads debugger extension.
      PC: 0b13b8e3  VA: 00000000  R/W: 0  Parameter: ed04c8b4
0:000> !Threads
ThreadCount:      4
UnstartedThread:  0
BackgroundThread: 4
PendingThread:    0
DeadThread:       0
Hosted Runtime:   no
                                                                         Lock  
       ID OSID ThreadOBJ    State GC Mode     GC Alloc Context  Domain   Count Apt Exception
   0    1 2594 02959160   2022220 Preemptive  046E6CF8:00000000 02952d00 0     MTA 
   5    2 1a08 02966f30     2b220 Preemptive  046F3CDC:00000000 02952d00 0     MTA (Finalizer) System.BadImageFormatException 046ebff4
   9    3 4300 06d3f690   3021220 Preemptive  046E829C:00000000 02952d00 0     MTA (Threadpool Worker) 
  11    4 2cec 06d41e78   1029220 Preemptive  046EA1E4:00000000 02952d00 0     MTA (Threadpool Worker) 

For some reason the first time I execute the command I get an exception, but it works the second time. This has been happening to me for years on many different machines. I have no idea what the bug is, but it should be fixed someday. We know that we have 4 threads and that one thread threw a BadImageFormatException. Let's examine that thread. The first column contains the thread numbers which Windbg assigns to make switching between threads easier. The command to switch to the specific thread 5, where our exception lives, is

~5s

Then we can execute the sos command to dump the managed thread stack with

0:005> !ClrStack
OS Thread Id: 0x1a08 (5)
Child SP       IP Call Site
0676f888 7748ed3c [HelperMethodFrame: 0676f888] System.Runtime.InteropServices.Marshal.ThrowExceptionForHRInternal(Int32, IntPtr)
0676f8fc 70d0065e System.Runtime.InteropServices.Marshal.FreeHGlobal(IntPtr) [f:\dd\ndp\clr\src\BCL\system\runtime\interopservices\marshal.cs @ 1211]
0676f908 0291116a Event.Dispose() [D:\Source\vc17\FinalizerFuncNetFull\FinalizerFuncNetFull\Program.cs @ 51]
0676f914 029111a9 Event.Finalize() [D:\Source\vc17\FinalizerFuncNetFull\FinalizerFuncNetFull\Program.cs @ 43]
0676fb10 714e63f2 [DebuggerU2MCatchHandlerFrame: 0676fb10] 

We know that thread number 5 is the finalizer thread and we see that it is indeed calling into Event.Finalize where our exception happens. So far we have not gained more information than the much easier to use Visual Studio debugger gave us. Now let's check on which Event instance the finalizer was called. For that we can use a heuristic command named !dso, which is short for Dump Stack Objects.

0:005> !dso
OS Thread Id: 0x1a08 (5)
ESP/REG  Object   Name
0676F5A8 046ebff4 System.BadImageFormatException
0676F698 046ebff4 System.BadImageFormatException
0676F6AC 046ebff4 System.BadImageFormatException
0676F6D0 046ebff4 System.BadImageFormatException
0676F6FC 046ebff4 System.BadImageFormatException
0676F710 046ebff4 System.BadImageFormatException
0676F718 046ebff4 System.BadImageFormatException
0676F71C 046ebff4 System.BadImageFormatException
0676F7BC 046ebff4 System.BadImageFormatException
0676F7FC 046ebff4 System.BadImageFormatException
0676F8FC 046507c0 Event
0676F958 046507c0 Event
0676F98C 046507c0 Event
0676F998 046507c0 Event
0676F9A8 046507c0 Event
0676F9B0 046507c0 Event
0676F9C0 046507c0 Event

The command is rather dumb and dumps the same object reference several times, wherever it happens to be located as a pointer on the thread stack. There is actually a much better extension out there for that, called netext (https://github.com/rodneyviana/netext/tree/master/Binaries). To "install" the extension you copy it into the Windbg default extension folder, which allows you to load the dll without a directory qualifier. On my machine these folders are

  • C:\Program Files (x86)\Windows Kits\10\Debuggers\x86\winext
  • C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\winext

Now we can load it

0:000> .load netext
netext version 2.1.19.5000 Feb  6 2018
License and usage can be seen here: !whelp license
Check Latest version: !wupdate
For help, type !whelp (or in WinDBG run: ‘.browse !whelp’)
Questions and Feedback:
https://github.com/rodneyviana/netext/issues
Copyright (c) 2014-2015 Rodney Viana (
http://blogs.msdn.com/b/rodneyviana)
Type: !windex -tree or ~*e!wstack to get started

0:005> !wstack

Listing objects from: 0676b000 to 06770000 from thread: 5 [1a08]

046ebff4 701d13c4   0  0         92 System.BadImageFormatException
046507c0 028b6260   0  0         12 Event

2 unique object(s) found in 104 bytes

to get a much less cluttered output. This extension is pure gold because it allows you to write LINQ style debugger queries to e.g. dump all object instances which derive from a common base class. It has extended support for WCF connections, sockets and ASP.NET specific things.

From the dump we know that the Event instance 046507c0 caused an exception in the unmanaged heap. Was someone else working with this object? Visual Studio is of no help here, but we can use the !GCRoot command to find out who else references this object:

0:005> !GCRoot 046507c0
Thread 1a08:
    0676f908 0291116a Event.Dispose() [D:\Source\vc17\FinalizerFuncNetFull\FinalizerFuncNetFull\Program.cs @ 51]
        esi: 
            ->  046507c0 Event

Thread 4300:
    08edf790 0291116a Event.Dispose() [D:\Source\vc17\FinalizerFuncNetFull\FinalizerFuncNetFull\Program.cs @ 51]
        esi: 
            ->  046507c0 Event

    08edf79c 02911108 Program+<>c__DisplayClass0_0.<Main>b__1() [D:\Source\vc17\FinalizerFuncNetFull\FinalizerFuncNetFull\Program.cs @ 23]
        ebp+2c: 08edf7ac
            ->  046324b4 System.Collections.Generic.List`1[[Event, FinalizerFuncNetFull]]
            ->  05655530 Event[]
            ->  046507c0 Event

    08edf79c 02911108 Program+<>c__DisplayClass0_0.<Main>b__1() [D:\Source\vc17\FinalizerFuncNetFull\FinalizerFuncNetFull\Program.cs @ 23]
        ebp+28: 08edf7b0
            ->  046507c0 Event

Found 4 unique roots (run '!GCRoot -all' to see all roots).

The finalizer thread 1a08 was expected, but what is thread 4300 doing with our object? Let's switch to that thread. We can use either the thread number or the OS thread id with the even more cryptic command

0:005> ~~[4300]s
eax=00000000 ebx=00000001 ecx=00000000 edx=00000000 esi=00000001 edi=00000001
eip=7748ed3c esp=08edf2b8 ebp=08edf448 iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
ntdll!NtWaitForMultipleObjects+0xc:
7748ed3c c21400          ret     14h
0:009> !ClrStack
OS Thread Id: 0x4300 (9)
Child SP       IP Call Site
08edf754 7748ed3c [InlinedCallFrame: 08edf754] 
08edf750 7013bb80 DomainNeutralILStubClass.IL_STUB_PInvoke(IntPtr)
08edf754 7010d8b9 [InlinedCallFrame: 08edf754] Microsoft.Win32.Win32Native.LocalFree(IntPtr)
08edf784 7010d8b9 System.Runtime.InteropServices.Marshal.FreeHGlobal(IntPtr) [f:\dd\ndp\clr\src\BCL\system\runtime\interopservices\marshal.cs @ 1212]
08edf790 0291116a Event.Dispose() [D:\Source\vc17\FinalizerFuncNetFull\FinalizerFuncNetFull\Program.cs @ 51]
08edf79c 02911108 Program+<>c__DisplayClass0_0.<Main>b__1() [D:\Source\vc17\FinalizerFuncNetFull\FinalizerFuncNetFull\Program.cs @ 23]
08edf7e0 70097328 System.Threading.Tasks.Task.InnerInvoke() [f:\dd\ndp\clr\src\BCL\system\threading\Tasks\Task.cs @ 2884]
08edf7ec 70096ed0 System.Threading.Tasks.Task.Execute() [f:\dd\ndp\clr\src\BCL\system\threading\Tasks\Task.cs @ 2498]
08edf810 700972fa System.Threading.Tasks.Task.ExecutionContextCallback(System.Object) [f:\dd\ndp\clr\src\BCL\system\threading\Tasks\Task.cs @ 2861]
08edf814 7010bcd5 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) [f:\dd\ndp\clr\src\BCL\system\threading\executioncontext.cs @ 954]
08edf880 7010bbe6 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) [f:\dd\ndp\clr\src\BCL\system\threading\executioncontext.cs @ 902]
08edf894 70097178 System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef) [f:\dd\ndp\clr\src\BCL\system\threading\Tasks\Task.cs @ 2827]
08edf8f8 7009704d System.Threading.Tasks.Task.ExecuteEntry(Boolean) [f:\dd\ndp\clr\src\BCL\system\threading\Tasks\Task.cs @ 2767]
08edf908 70096fcc System.Threading.Tasks.Task.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem() [f:\dd\ndp\clr\src\BCL\system\threading\Tasks\Task.cs @ 2704]
08edf90c 700e87f2 System.Threading.ThreadPoolWorkQueue.Dispatch() [f:\dd\ndp\clr\src\BCL\system\threading\threadpool.cs @ 820]
08edf95c 700e865a System.Threading._ThreadPoolWaitCallback.PerformWaitCallback() [f:\dd\ndp\clr\src\BCL\system\threading\threadpool.cs @ 1161]
08edfb80 7143eb16 [DebuggerU2MCatchHandlerFrame: 08edfb80] 

Ahh, that is our TPL thread which is also freeing the object. The call stack shows that we have caught it in the act: while it was still inside Marshal.FreeHGlobal the finalizer freed the very same object right away! That is pretty serious since it must never happen. To see the full picture we need a mixed mode call stack with no hidden stack frames, which Visual Studio does not show us. For mixed mode stacks another Windbg extension is best suited: sosex (http://www.stevestechspot.com/).

0:009> .load sosex
This dump has no SOSEX heap index.
The heap index makes searching for references and roots much faster.
To create a heap index, run !bhi
0:009> !mk
Thread 9:
        SP       IP
00:U 08edf2b8 7748ed3c ntdll!NtWaitForMultipleObjects+0xc
01:U 08edf2bc 753f1293 KERNELBASE!WaitForMultipleObjectsEx+0x103
02:U 08edf450 714dff96 clr!WaitForMultipleObjectsEx_SO_TOLERANT+0x3c
03:U 08edf4a0 714dfcd8 clr!Thread::DoAppropriateWaitWorker+0x237
04:U 08edf52c 714dfdc9 clr!Thread::DoAppropriateWait+0x64
05:U 08edf598 714dff3c clr!CLREventBase::WaitEx+0x128
06:U 08edf5e4 71560152 clr!CLREventBase::Wait+0x1a
07:U 08edf5fc 714fe9dc clr!WaitForEndOfShutdown_OneIteration+0x81
08:U 08edf670 714fea29 clr!WaitForEndOfShutdown+0x1b
09:U 08edf67c 714fcd76 clr!Thread::RareDisablePreemptiveGC+0x52f
0a:U 08edf6c8 714e8374 clr!JIT_RareDisableHelper+0x24
0b:M 08edf74c 7013bb95 DomainNeutralILStubClass.IL_STUB_PInvoke(IntPtr)
0c:M 08edf750 7013bb80 DomainNeutralILStubClass.IL_STUB_PInvoke(IntPtr)
0d:M 08edf784 7010d8b9 System.Runtime.InteropServices.Marshal.FreeHGlobal(IntPtr)(+0xe IL,+0x19 Native) [f:\dd\ndp\clr\src\BCL\system\runtime\interopservices\marshal.cs @ 1212,17]
0e:M 08edf790 0291116a Event.Dispose()(+0x1d IL,+0x12 Native) [D:\Source\vc17\FinalizerFuncNetFull\FinalizerFuncNetFull\Program.cs @ 51,13]
0f:M 08edf79c 02911108 Program+<>c__DisplayClass0_0.<Main>b__1()(+0x21 IL,+0x70 Native) [D:\Source\vc17\FinalizerFuncNetFull\FinalizerFuncNetFull\Program.cs @ 23,17]

There we see that the thread called into unmanaged code to free the heap memory, but the CLR will not let it run managed code anymore because we are shutting down the process. There are some gotchas related to managed application shutdown which I wrote about a long time (12 years now) ago here: https://www.codeproject.com/Articles/16164/Managed-Application-Shutdown. Most things are still the same. The key takeaway is that when managed application shutdown is initiated, the .NET Runtime ensures that

  • managed code calling into unmanaged code never returns (clr!WaitForEndOfShutdown)
  • All managed threads are suspended except for the finalizer thread

0:009> ~
#  0  Id: aa0.2594 Suspend: 1 Teb: 002d2000 Unfrozen
   1  Id: aa0.18e0 Suspend: 1 Teb: 002d5000 Unfrozen
   2  Id: aa0.3ac4 Suspend: 1 Teb: 002d8000 Unfrozen
   3  Id: aa0.30c0 Suspend: 1 Teb: 002db000 Unfrozen
   4  Id: aa0.1d34 Suspend: 1 Teb: 002de000 Unfrozen
  5  Id: aa0.1a08 Suspend: 0 Teb: 002e1000 Unfrozen
   6  Id: aa0.2954 Suspend: 1 Teb: 002e4000 Unfrozen
   7  Id: aa0.3cf4 Suspend: 1 Teb: 002e7000 Unfrozen
   8  Id: aa0.3d2c Suspend: 1 Teb: 002ea000 Unfrozen
.  9  Id: aa0.4300 Suspend: 1 Teb: 002ed000 Unfrozen
  10  Id: aa0.4224 Suspend: 1 Teb: 002f0000 Unfrozen
  11  Id: aa0.2cec Suspend: 1 Teb: 002f3000 Unfrozen

  • All finalizable objects are treated as garbage, regardless of whether they are still referenced
  • Only the finalizer thread is allowed to run, to finalize all these now considered garbage objects

The problem with that approach is that there is an inherent race condition: if a not yet completed Dispose call is calling into unmanaged code, the finalizer will perform the unmanaged cleanup a second time.

Is This A Problem?

Well, let's check who in the .NET Framework calls GC.SuppressFinalize:

image

There are quite a few classes in the Base Class Library which implement finalizers this way. GC.SuppressFinalize is always called last, which is a time bomb waiting to go off at the worst possible time, e.g. killing your UI while you are closing everything. Let's try an experiment by changing our code to create brushes instead of events:

// create 50K brushes
var events = Enumerable.Range(1, 50 * 1000)
                        .Select(x => new SolidBrush(Color.AliceBlue))
                        .ToList();

When I let it run I get a nice AccessViolationException, which some of us have certainly seen sporadically and were left wondering why that exception happened to them:

image

To be fair: not all classes listed above are susceptible to that race condition. Some classes already check if a shutdown is running and do nothing in that case:

~OverlappedCache()
{
    if (!NclUtilities.HasShutdownStarted)
    {
        this.InternalFree();
    }
}

The Fix

There are several ways to get around that. The easiest is to call GC.SuppressFinalize at the beginning of Dispose, before the actual cleanup, which prevents the finalizer from running during shutdown if a Dispose call is already executing. If an exception then escapes from the Dispose call, the finalizer will not retry the cleanup a second time, which sounds like a good deal for most resources.

public void Dispose()
{
    if( hGlobal !=  IntPtr.Zero) // check if memory is gone
    {
        GC.SuppressFinalize(this); // Prevent finalizer from running it again
        Marshal.FreeHGlobal(hGlobal); // free it            
        hGlobal = IntPtr.Zero;
    }
}

Another way is to check if a shutdown or an AppDomain unload is happening right now:

    if (!Environment.HasShutdownStarted && !AppDomain.CurrentDomain.IsFinalizingForUnload())
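
Applied to the Event class from above, that check belongs in the finalizer, similar to the OverlappedCache example (a sketch):

~Event()
{
    // During process shutdown or an AppDomain unload another thread may still be
    // inside Dispose, so skip the unmanaged cleanup here. The OS reclaims the
    // memory when the process exits anyway.
    if (!Environment.HasShutdownStarted && !AppDomain.CurrentDomain.IsFinalizingForUnload())
    {
        Dispose();
    }
}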

.NET Core on the other hand does not suffer from that issue because a .NET Core application does not run finalizers during process shutdown, which prevents that race condition entirely. Now go and check your finalizers to make sure your application shuts down correctly.

Update 1

As requested by Steve I present a fixed safe version:

In 2018 you should not write a finalizer at all. The basic Dispose(bool disposing) pattern is from a time when we had no SafeHandles. Today I would write my Event class entirely without a finalizer; the unmanaged resources are self contained in finalizable SafeHandles. A typical wrapper would look like the one below, which owns the memory pointer:

using System;
using System.Runtime.InteropServices;    // Marshal.AllocHGlobal / FreeHGlobal
using Microsoft.Win32.SafeHandles;       // SafeHandleZeroOrMinusOneIsInvalid

sealed class SafeNativeMemoryHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    public SafeNativeMemoryHandle(int size):base(true)
    {
        SetHandle(Marshal.AllocHGlobal(size));
    }

    protected override bool ReleaseHandle()
    {
        if (this.handle != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(this.handle);
            this.handle = IntPtr.Zero;
            return true;
        }
        return false;
    }
}

With that infrastructure in place we can reduce the event class to a much simpler version which will never leak any memory, although it contains no finalizer at all:

/// <summary>
/// SafeEvent class needs no finalizer because unmanaged resources
/// are managed by the SafeNativeMemoryHandle which is the only class which needs a finalizer.
/// </summary>
public class SafeEvent : IDisposable
{
    internal SafeNativeMemoryHandle hGlobal;  // allocate some unmanaged memory

    public SafeEvent()
    {
        hGlobal = new SafeNativeMemoryHandle(500);
    }

    public void Dispose()
    {
        hGlobal.Dispose();
        hGlobal = null;
    }
}

You can also create event hierarchies by making the Dispose method virtual, without fear of leaking any handles from derived classes. Each class which contains unmanaged resources should wrap them in its own self cleaning members, and you are done as long as there are no dependencies between them.
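
For illustration, a derived class could then look like this. This sketch assumes the Dispose method of SafeEvent above is declared virtual; SafeDerivedEvent and its extra buffer are made up for this example:

public class SafeDerivedEvent : SafeEvent
{
    SafeNativeMemoryHandle extraBuffer = new SafeNativeMemoryHandle(1024);

    public override void Dispose()
    {
        extraBuffer.Dispose();  // release our own unmanaged memory first
        extraBuffer = null;
        base.Dispose();         // then let the base class release its memory
    }
}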