The Case Of A Stuck LoadLibrary Call

I frequently encounter seemingly impossible bugs which, after full analysis, have a reasonable explanation, even though I have no source code for the problematic library. Today's fault is brought to you by the Intel Math Kernel Library.

Let's say you use some fancy neural network library like Berkeley AI's caffee.dll, called from AlgoLibrary!UsingCaffee, which in turn uses the Intel Math Kernel Library (mkl_core.dll), and the call hangs forever in LoadLibrary. The first thing to do when you are stuck is to take a memory dump (e.g. with Task Manager or procdump) and load it into Windbg or Visual Studio.

Since you are reading this, I assume you want to know more about the mysterious commands Windbg has to offer. You can load a memory dump into Windbg from the File menu.

To get an overview of where all threads of a process are standing or executing, you can run a command on all threads by prefixing it with ~*. The command to dump a thread's stack in Windbg is k. To dump the stacks of all threads you therefore need to execute

~*k

107  Id: 67b0.8094 Suspend: 0 Teb: 00000049`9df88000 Unfrozen
Call Site

When you load a memory dump into Windbg, you can issue a full stack walk with the k command, which will hopefully output a nice stack trace once you have resolved the symbols from Microsoft, Intel and caffee.dll. We see that LoadLibrary is stuck while caffee tries to allocate memory, which internally loads a library. We would expect our call from AlgoLibrary!UsingCaffee to finish within a few ms, but this one hangs forever.

The allocation code comes from this:

 *ptr = mkl_malloc(size ? size:1, 64);

When LoadLibrary hangs, it is usually because some other thread is holding the OS loader lock. This loader lock effectively serializes DLL loading as well as the initialization and release of global and thread static variables. See for a more elaborate description.

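With Windbg we can check further. The !peb command gives us the address of the Process Environment Block, which we can then dump to find the loader lock: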

dt ntdll!_PEB 499df0d000
+0x0e8 NumberOfHeaps    : 0x15
+0x0ec MaximumNumberOfHeaps : 0x20
+0x0f8 GdiSharedHandleTable : 0x00000183`71d00000 Void
+0x100 ProcessStarterHelper : (null) 
+0x108 GdiDCAttributeList : 0x14
+0x10c Padding3         : [4]  ""
+0x110 LoaderLock       : 0x00007ffe`f447a568 _RTL_CRITICAL_SECTION
+0x118 OSMajorVersion   : 0xa
+0x11c OSMinorVersion   : 0
+0x120 OSBuildNumber    : 0x3839
+0x122 OSCSDVersion     : 0
+0x124 OSPlatformId     : 2

We can dump the loader lock structure by clicking on the blue underlined link in Windbg, although that formatting has not survived in WordPress.

0:107> dx -r1 ((ntdll!_RTL_CRITICAL_SECTION *)0x7ffef447a568)
((ntdll!_RTL_CRITICAL_SECTION *)0x7ffef447a568)                 : 0x7ffef447a568 [Type: _RTL_CRITICAL_SECTION *]
[+0x000] DebugInfo        : 0x7ffef447a998 [Type: _RTL_CRITICAL_SECTION_DEBUG *]
[+0x008] LockCount        : -2 [Type: long]
[+0x00c] RecursionCount   : 1 [Type: long]
[+0x010] OwningThread     : 0x6710 [Type: void *]
[+0x018] LockSemaphore    : 0x0 [Type: void *]
[+0x020] SpinCount        : 0x4000000 [Type: unsigned __int64]
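The LockCount field in the output above is not a plain counter. The following is a minimal sketch of how it can be decoded, assuming the bit layout Microsoft documents for Windows Server 2003 SP1 and later (lowest bit clear means locked, next bit clear means a waiter has been woken, the remaining bits hold the ones' complement of the number of waiting threads). The struct and function names are made up for this illustration and are not part of any Windows API.

```cpp
#include <cstdint>

// Hypothetical helper type: a decoded view of RTL_CRITICAL_SECTION::LockCount.
struct LockCountInfo {
    bool locked;            // bit 0 clear => the critical section is held
    bool threadWoken;       // bit 1 clear => a waiting thread has been woken
    std::uint32_t waiters;  // ones' complement of the remaining bits
};

// Decode the raw LockCount value as shown by dx or dt.
// Assumes the encoding used since Windows Server 2003 SP1.
inline LockCountInfo decodeLockCount(std::int32_t lockCount) {
    // Work on the unsigned bit pattern to avoid shifting a negative number.
    const std::uint32_t raw = static_cast<std::uint32_t>(lockCount);
    LockCountInfo info;
    info.locked = (raw & 1u) == 0;
    info.threadWoken = (raw & 2u) == 0;
    info.waiters = (~raw) >> 2;
    return info;
}
```

For the value -2 shown above, decodeLockCount(-2).locked is true, which matches OwningThread being set.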

Aha, now we know that thread 0x6710 owns the lock. Switch to that thread with the command ~~[0x6710]s, which lets you specify the OS thread id directly. The usual ~<number>s command (here ~114s) switches by the debugger's own thread numbers, which always start at 0. That is convenient, but sometimes you need to switch by thread id.

0:114> k


That stack trace is strange because it calls SwitchToThread, which basically tells the OS that if any other thread is ready to run on this core, it should run now. If there is none, the call returns immediately and this is essentially a CPU-burning endless loop. Such constructs are normally used in spinlocks, where one at first takes no real lock at all in order to avoid the cost of a context switch at the expense of burning CPU cycles. In this case it looks like mkl_core is cleaning up some thread static variables, which requires synchronization via some lock object that is held by another thread. Since I do not have the code for this, I can only speculate which thread it is.
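To make the spin-and-yield pattern concrete, here is a minimal sketch in portable C++. It is not the mkl_core code (which I do not have); std::this_thread::yield() stands in for SwitchToThread, and the class and function names are invented for this example.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// A spinlock that yields its time slice instead of blocking in the kernel.
// If the owner never releases the lock, lock() spins forever and burns CPU.
class YieldingSpinLock {
public:
    void lock() {
        // exchange() returns the previous value: true means someone else holds it.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();  // portable stand-in for SwitchToThread
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical demo: several threads increment one counter under the lock.
inline long countWithSpinLock(int threads, int iterationsPerThread) {
    YieldingSpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterationsPerThread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (std::thread& worker : workers) worker.join();
    return counter;
}
```

As long as every owner eventually calls unlock(), this works fine and is cheap for short critical sections. The trouble in the dump is the case where the owner can never make progress, so the loop in lock() never ends.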

Let's get the CPU time of this thread to see how long it has been trying to enter the lock:

0:114> !runaway
 User Mode Time
  Thread       Time
  114:6710     0 days 2:14:28.671
   17:62a4     0 days 0:00:47.656
   92:88bc     0 days 0:00:08.218
   22:107c     0 days 0:00:02.953
   19:3a9c     0 days 0:00:02.765

Nearly all of the CPU time of the process (2h 15 minutes) was spent by this thread (2h 14 minutes), which has been wasting CPU cycles for hours.
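If you want to put a number on that, the times printed by !runaway are easy to convert. The sketch below parses the "D days H:MM:SS.mmm" layout shown above and computes the share of the first thread among the five threads listed here. The function names are made up for this example, and it only looks at the threads you pass in, not at the whole process.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Convert one !runaway time such as "0 days 2:14:28.671" to seconds.
// Returns a negative value if the text does not match that layout.
inline double runawayToSeconds(const std::string& text) {
    int days = 0, hours = 0, minutes = 0;
    double seconds = 0.0;
    const int parsed = std::sscanf(text.c_str(), "%d days %d:%d:%lf",
                                   &days, &hours, &minutes, &seconds);
    if (parsed != 4) return -1.0;
    return ((days * 24.0 + hours) * 60.0 + minutes) * 60.0 + seconds;
}

// Fraction of the summed time that belongs to the first entry.
// Entries that fail to parse are ignored; returns 0 if nothing is usable.
inline double shareOfFirstThread(const std::vector<std::string>& times) {
    if (times.empty()) return 0.0;
    const double first = runawayToSeconds(times.front());
    if (first < 0.0) return 0.0;
    double total = 0.0;
    for (const std::string& t : times) {
        const double s = runawayToSeconds(t);
        if (s > 0.0) total += s;
    }
    return total > 0.0 ? first / total : 0.0;
}
```

Fed with the five times from the listing above, shareOfFirstThread returns a bit more than 0.99, so thread 114 accounts for over 99% of the listed user mode time.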

Since the only other thread running mkl code is our thread stuck in the LoadLibrary call, it all makes sense now. The LoadLibrary call hangs because thread 0x6710 is exiting and cleaning up some thread static variables, for which it needs a lock that our stuck LoadLibrary thread already holds. But since the exiting thread owns the loader lock, we have two dependent locks owned by two different threads -> deadlock.
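The shape of this deadlock can be reproduced in a few lines of portable C++. This is only a model of my theory, not the real code: loaderLock stands in for the OS loader lock, libraryLock for the internal mkl lock I am speculating about, and all names are invented. To keep the demo from hanging, each thread gives up on the second lock after a timeout instead of waiting forever.

```cpp
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical result type: did either thread get the lock the other one holds?
struct DeadlockProbe {
    bool exitingGotLibraryLock;
    bool loadingGotLoaderLock;
};

inline DeadlockProbe probeLockOrderInversion(std::chrono::milliseconds timeout) {
    std::timed_mutex loaderLock;   // stand-in for the OS loader lock
    std::timed_mutex libraryLock;  // stand-in for the assumed internal mkl lock
    std::promise<void> exitingHolds, loadingHolds, exitingTried, loadingTried;
    std::future<void> exitingHoldsF = exitingHolds.get_future();
    std::future<void> loadingHoldsF = loadingHolds.get_future();
    std::future<void> exitingTriedF = exitingTried.get_future();
    std::future<void> loadingTriedF = loadingTried.get_future();
    DeadlockProbe result{false, false};

    // Models the exiting thread: owns the loader lock, wants the library lock.
    std::thread exiting([&] {
        std::lock_guard<std::timed_mutex> first(loaderLock);
        exitingHolds.set_value();
        loadingHoldsF.wait();  // make sure the other thread holds its lock too
        if (libraryLock.try_lock_for(timeout)) {
            result.exitingGotLibraryLock = true;
            libraryLock.unlock();
        }
        exitingTried.set_value();
        loadingTriedF.wait();  // keep our lock until the other attempt is over
    });

    // Models the LoadLibrary caller: owns the library lock, wants the loader lock.
    std::thread loading([&] {
        std::lock_guard<std::timed_mutex> first(libraryLock);
        loadingHolds.set_value();
        exitingHoldsF.wait();
        if (loaderLock.try_lock_for(timeout)) {
            result.loadingGotLoaderLock = true;
            loaderLock.unlock();
        }
        loadingTried.set_value();
        exitingTriedF.wait();
    });

    exiting.join();
    loading.join();
    return result;
}
```

Both flags come back false: neither thread can ever get its second lock while the other one holds on to its first. In the real process there is no timeout, so one side waits forever and the other spins forever.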

You need to be careful about which threads execute MKL code if those threads exit at some point while other threads are initializing new mkl variables. Every now and then you can get funny deadlocks which look really strange: an exiting thread burns CPU like hell, while your own code is stuck in a LoadLibrary call waiting for the loader lock to become free.
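One way to act on that advice is to funnel the work through a single long-lived thread, so that no thread which has touched the library exits while others are still starting up. The sketch below shows the idea in plain C++. It is an assumption on my part that this avoids the problem; I have not verified it against MKL, and the class name and the double-returning job signature are invented for this example.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Runs every submitted job on one worker thread that lives as long as the object.
class LongLivedWorker {
public:
    LongLivedWorker() : thread_([this] { run(); }) {}

    ~LongLivedWorker() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            stopping_ = true;
        }
        wakeup_.notify_one();
        thread_.join();  // the worker drains the queue before it exits
    }

    // Queue a job and get a future for its result.
    std::future<double> submit(std::function<double()> job) {
        std::packaged_task<double()> task(std::move(job));
        std::future<double> result = task.get_future();
        {
            std::lock_guard<std::mutex> guard(mutex_);
            queue_.push(std::move(task));
        }
        wakeup_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::packaged_task<double()> task;
            {
                std::unique_lock<std::mutex> guard(mutex_);
                wakeup_.wait(guard, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stopping and nothing left to do
                task = std::move(queue_.front());
                queue_.pop();
            }
            task();  // run outside the mutex so submit() is never blocked by a job
        }
    }

    std::mutex mutex_;
    std::condition_variable wakeup_;
    std::queue<std::packaged_task<double()>> queue_;
    bool stopping_ = false;
    std::thread thread_;  // declared last so the members above exist before it starts
};
```

You would create one LongLivedWorker early, keep it alive for the lifetime of the process, and wrap each call into the library in worker.submit(...).get(). Note that destroying the worker makes its thread exit, which is exactly the situation to avoid while other threads are still loading libraries.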

If you dig deep enough, you can find the root cause even when you do not have the full source code. As an added bonus I have learned some arcane Windbg commands along the way. These will be helpful the next time you need to dig into some stuck threads.


2 thoughts on “The Case Of A Stuck LoadLibrary Call”

  1. Instead of ~*k you can use !uniqstack which coalesces identical stacks. Or even better, open the dump in Visual Studio, and use the Parallel Stacks view, that gets you a nice and handy prefix tree breakdown of all call stacks.


    • Yes I use these also. Visual Studio has become pretty good at showing nice stacks but it often annoys me that some threads are missing, or stacks are shortened. E.g. if you have a stuck finalizer it will not show the full finalizer thread call stack. Or if you have an exiting thread VS tends to forget about this one and does not even show that thread in the thread list. For a normal analysis VS is a good start.

