MS Technical Summit 2016 at Darmstadt

I had the pleasure of giving a talk about ETW (Event Tracing for Windows) at the MS Technical Summit 2016 in Darmstadt. You can find our slides, the sample code, a crash dump and the resulting sample ETW trace here. It was a great event together with Wolfgang Kroneder from MS. Here is the video:

https://channel9.msdn.com/Events/microsoft-techncial-summit/Technical-Summit-2016/Deep-Dive-Event-Tracing-for-Windows/player

For some problems you need custom ETW tracing to find out what is going on. Reference counting bugs are among the nasty ones: when they are timing dependent, the root cause is very hard to find. If you add ordinary printf or OutputDebugString tracing to the AddReference/ReleaseReference calls, the error usually goes away because the added overhead changes the timing. Below is the sample code from my talk, minus the tracing macros generated from the ETW manifest. It mimics a reference counting issue where we suspected that someone was calling ReleaseReference too early on a reference counted class, which led to premature destruction of the reference counted object. Interestingly, the object was being deleted more than once, which resulted in either a heap corruption or a pure virtual function call exit, depending on the exact timing.
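The listing that follows includes ETWRefCounter.h, which mc generates from the manifest, so it does not compile on its own. If you only want to build the sample without the manifest, one option is a stand-in header like the sketch below. This is an assumption and not the header from the talk: the function names and argument counts are taken from the calls in the listing, the parameter types are guesses, and every function is a no-op that returns 0, so nothing is traced.

```cpp
// ETWRefCounter.h stand-in (hypothetical): replaces the mc-generated header
// with no-op functions so that the sample builds without the manifest.
// Nothing is written to ETW; every function simply returns 0.
#ifndef ETW_REFCOUNTER_STUB_H
#define ETW_REFCOUNTER_STUB_H

// Provider registration and unregistration (no-ops here).
inline unsigned long EventRegisterRefCounter()   { return 0; }
inline unsigned long EventUnregisterRefCounter() { return 0; }

// One stub per event used in the sample. The first argument is the object
// address; the second, where present, is the reference count at that point.
// The parameter types are assumptions based on the casts in the listing.
inline unsigned long EventWriteAddRefEvent(long long /*object*/, long /*count*/)       { return 0; }
inline unsigned long EventWriteReleaseRefEvent(long long /*object*/, long /*count*/)   { return 0; }
inline unsigned long EventWriteFinalReleaseEvent(long long /*object*/)                 { return 0; }
inline unsigned long EventWriteIsSingleUserEvent(long long /*object*/, long /*count*/) { return 0; }

#endif // ETW_REFCOUNTER_STUB_H
```

Because the stubs remove the tracing entirely, the timing of the sample changes as well, so the problem may show up more or less often than with the real header.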

The code below calls AddReference once in main and then makes many balanced AddReference/ReleaseReference calls from 5 different threads, which should never cause the reference count to drop to zero.

 

#include <windows.h>
#include <cstdio>   // printf
#include <vector>
#include <thread>
#include <mutex>
#include "ETWRefCounter.h"  // ETW tracing header generated from the ETW manifest on each build by a pre-build command:
                            // mc -um $(ProjectDir)RefCounterETW.man  -h $(ProjectDir)  -z $(ProjectName)
                            // ecmangen is the editor used to create ETW manifest files.


// Thread safe refcounter base class
class CRefCounterBase
{
public:
    CRefCounterBase() : m_Counter(0) {}
    
    CRefCounterBase(CRefCounterBase &other) = delete;
    CRefCounterBase& operator=(const CRefCounterBase&) = delete;
    
    virtual void DeleteObject() = 0;

    // Return new reference count incremented by one
    long AddReference()
    {
        auto lret = InterlockedIncrement(&m_Counter);
        EventWriteAddRefEvent((__int64)this, lret); // trace new value
        return lret;
    }

    // Return new reference count decremented by one
    long ReleaseReference()
    {
        auto old = m_Counter;
        auto newValue = InterlockedDecrement(&m_Counter);
        EventWriteReleaseRefEvent((__int64)this, newValue); // trace new value
        if (newValue == 0)
        {
            EventWriteFinalReleaseEvent((__int64)this);
            DeleteObject();
        }
        return newValue;
    }

    // Check if RefCount == 1. This can only be safely used when the threads using this object are
    // guaranteed to not change the refcount after this check has been made. 
    bool OnlySingleUser()
    {
        EventWriteIsSingleUserEvent((__int64) this, m_Counter);
        return InterlockedExchange(&m_Counter, m_Counter) == 1;
    }

protected:
    virtual ~CRefCounterBase() { m_Counter = 0; }

    long m_Counter;
};

/// Actual reference counted class
class RefCountedObject : public CRefCounterBase
{
public:
    RefCountedObject() {}
    RefCountedObject(RefCountedObject &other) = delete;
    RefCountedObject& operator=(const RefCountedObject&) = delete;
    virtual ~RefCountedObject() {}

    virtual void DeleteObject()
    {
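        // GetCurrentThreadId returns a DWORD (unsigned long), so print it with %lu; %p expects a void*
        printf("\nDeleteObject called 0x%p, %lu", static_cast<void*>(this), ::GetCurrentThreadId());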
        delete this;
    };
};


// Using RefCounter from multiple threads in a balanced way which works
void AsyncWorker(CRefCounterBase *counter)
{
    while (true)
    {
        counter->AddReference();
        if (counter->OnlySingleUser())
        {
            // some optimized code not shown here which also causes add/releaseref
        }
        counter->ReleaseReference();
    }
}

int main()
{
    EventRegisterRefCounter();

    static const int ThreadsCount = 5;
    std::vector<std::thread> threads; // thread list 

    auto pCounter = new RefCountedObject();   // construct object
    pCounter->AddReference(); // ensure that refcount does not drop to zero while other threads are working with it.

    for (int i = 0; i < ThreadsCount; i++) // start some threads and increment and decrement stuff in a loop
    {
        std::thread th(AsyncWorker, pCounter);
        threads.push_back(std::move(th));
    }

    threads[0].join();

    EventUnregisterRefCounter();
    return 0;
}
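
The expectation stated before the listing, that balanced AddReference/ReleaseReference pairs on top of one initial reference never bring the count to zero, can be checked in isolation. The following is a minimal, portable sketch and not the code from the talk: it uses std::atomic instead of the Interlocked functions, bounds the worker loop so that it terminates, and models only the increment/decrement pairs. RunBalancedWorkers and BalancedRunResult are hypothetical names introduced for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical result type for this sketch (not part of the talk's sample).
struct BalancedRunResult
{
    long finalCount;     // counter value after all workers have finished
    bool droppedToZero;  // true if any decrement ever produced a value <= 0
};

// Takes one initial reference, then lets threadCount workers each perform
// `iterations` balanced increment/decrement pairs on the shared counter.
BalancedRunResult RunBalancedWorkers(int threadCount, int iterations)
{
    assert(threadCount > 0 && iterations > 0);

    std::atomic<long> counter{0};
    std::atomic<bool> droppedToZero{false};

    counter.fetch_add(1);  // corresponds to the AddReference call in main

    auto worker = [&counter, &droppedToZero, iterations]()
    {
        for (int i = 0; i < iterations; ++i)
        {
            counter.fetch_add(1);                      // AddReference
            long newValue = counter.fetch_sub(1) - 1;  // ReleaseReference: value after the decrement
            if (newValue <= 0)
            {
                droppedToZero = true;  // would trigger DeleteObject in the sample
            }
        }
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i)
    {
        threads.emplace_back(worker);
    }
    for (auto &t : threads)
    {
        t.join();  // unlike the sample, wait for every worker
    }

    return { counter.load(), droppedToZero.load() };
}
```

Calling RunBalancedWorkers(5, 100000) should report a final count of 1 and no drop to zero, so the increment/decrement pairs by themselves keep the object alive.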

When you let the sample exe run, it will either run happily forever or crash. When it crashes, it terminates either with an unhandled exception caused by a pure virtual function call or with a heap corruption. In reality the code was not as simple, but it pretty much boils down to the sample code above. Can you spot the error?
