How .NET 4.8 Can Break Your Application

A colleague was calling me in to look at a broken .NET application after installing a preview up an upcoming Windows KB. The application was not starting up all the time. That are easy issues because it is easy and fast to reproduce the problem. Since it was happening on a VM with no Visual Studio installed I used the best und most underestimated tool named dnSpy. I learned about this tool by reading the book Pro .NET Memory Management from Konrad Kokosa is not only an in depth explanation how the GC works but it covers also many not so well known tools to drill into .NET for Windows and Linux. The most useful sections are in my opinion the last ones which cover in depth usage of Span and  System.IO.Pipelines. The GC is a fascinating piece of software but in practice you are fastest if you do not need to rely on it. dnSpy is a xcopy deployable Visual Studio Debugger clone which automatically decompiles managed code on the fly where you can set breakpoints and do line level debugging without having the actual source code. If you ever have to do production debugging you need to have this tool in your toolbox. In many ways dnSpy is better than Windbg because until today Windbg has no way to debug .NET code with a source code window which is in my opinion a shame.

One really cool option of dnSpy is to edit the decompiled source code and modify a running binary. Why am I excited about that? Will strong name verification not prevent loading such a tampered assembly? First of all: No you can recompile a strong named assembly and it will load normally since .NET 3.5 SP1

Bypass signature verification of trusted assemblies

Starting with the .NET Framework 3.5 Service Pack 1, strong-name signatures are not validated when an assembly is loaded into a full-trust application domain,

Not many people are aware of this. To repeat: Strong naming won´t give you any security since over 10 years.

This makes it easy to force a  highly sporadic exception on production machines (e.g. connection loss, disk full, ….) with ease by modifying the original method where the exception was thrown. If you have e.g. a small application which for some reason ends up in the general exception handler and there the error handling does bad things with some code like this:

        static void Main(string[] args)
        {
            try
            {
                new FileStream(@"C:\temp\NotexistingFile.txt", FileMode.Open);
                Console.WriteLine("Got Here");
            }
            catch(FileNotFoundException ex)
            {
                Console.WriteLine("File was not found.");
            }
            catch(Exception ex)
            {
                Console.WriteLine("Fatal exception encountered!");
            }
        }

image

If you start the application from dnSpy it will skip using Native Images which will give you even for release builds all local variables with full fidelity. While debugging the same application in VS 2019 even with Source Server support often it cannot locate .NET Framework sources and you see no local variables and no C# code:

image

That is not nice. What is the reason for that? First Visual Studio uses Native Images if present. If these were compiled with no debug settings then the local variables will be optimized away. Even if you check

image

in Debugger – Options will not help because it only affects JITed code like your application but not loaded NGenned images. But there is a way out

1. Set the environment variable COMPLUS_ZAPDISABLE=1  or the registry key HKLM\SOFTWARE\Microsoft\.NETFramework\ZapDisable  and launch from that shell either VS or your application to which you attach later. See here for more information.

2. Create a file in C:\Windows\Microsoft.NET\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.ini with the content

[.NET Framework Debugging Control]
GenerateTrackingInfo=1
AllowOptimize=0

Point 1 will disable loading of native images. If you set this system wide your application will start much slower but all of your code will be JITed now. Point 2 will disable JIT optimizations for a given dll where the ini file must be named for an assembly xxxx.dll like xxxx.ini but NOT xxxx.dll.ini or xxxx.ini.dll. You need to do this for each assembly you want to disable JIT optimizations. The  post of Scott Hanselman gives misleading advice. His post leaves you with the impression you need to create that ini file only for your executable. That is not the case. To make things easier I have wrapped that stuff into a powershell script which allows me to disable the use of native images system wide and generate the ini files in the GAC and additionally in my application directory.

PS D:\Utils\EnableLocalVars.ps1 -enable 1 -dir "C:\Program Files\Test" -gac -ngen 0
Did set NGen Flag to 0. NGen Images are disabled now. This will impact application startup!
Adding 1518 files from GAC
Adding 68 files from GAC64
Adding 2 files from C:\Program Files\Test
Skipped 8 files because ini file already exists
Created 1580 ini files.

That script makes it easy to enable/disable on an installed machine. I have put a link to my OneDrive Folder here.

.\EnableLocalVars.ps1 
EnableLocalVars.ps1 -enable 0/1 -dir xxxx [-recurse] or [-gac] [-Ngen 0/1]
 by Alois Kraus 2018-2019
 Use this script when you want to attach to a running process or you want to get memory dumps of crashed processes of release builds with all local variables not optimized away.
 This will disable JIT optimizations to enable viewing local variables of methods in a debugger which otherwise would be optimized away.
  -enable 0/1         Enable local variables to view in debugger
                      If 1 enable local variables. The .ini files are created.
                      If 0 disable local variables. The .ini files are removed.
  -dir directory      Select directory where all dll/exe files get a .ini file
  -recurse            If set it will search recursively for all dlls below that path
  -gac                Make all GAC dlls debuggable
  -NGen 0/1           Disable/enable usage of NGen binaries system wide by setting the registry key HKLM\SOFTWARE\Microsoft\.NETFramework\ZapDisable
                      1 Enable NGen Images  == ZapDisable=1
                      0 Disable NGen Images == ZapDisable=0
                      You need to enable this if you want to view variables of .NET Framework code or if your application was NGenned
You need to restart the applications after executing this command for the changes to take effect.
Examples
 Enable locals for all GAC and  application dlls on an installed and NGenned machine
   EnableLocalVars.ps1 -enable 1 -gac -dir "c:\program files\myApp\bin" -ngen 0
 Reset settings
   EnableLocalVars.ps1 -enable 0 -gac -dir "c:\program files\myApp\bin" -ngen 1
 Enable NGen Images again
  EnableLocalVars.ps1  -ngen 1
WARNING: NGen images are disabled. Do not forget to enable them (EnableLocalVars.ps1 -Ngen 1) when you are done or all .NET applications will start ca. x2 times slower!

It comes with extensive help so it should make it easy enough that you perform this step automatically before taking a memory dump. This will make memory dump debugging a whole lot easier. Usually I make the error configurable by e.g. checking for the existence of a file to force the error when I want to inject some error in .NET Framework code like this:

image

This trick helped me a lot to force the error only when the application has got into the initial non faulted state. Now I can trigger the issue by creating a file from the command line with

 C>echo Hi > c:\temp\ForceError.txt

This will trigger the exception in the patched file:

if (File.Exists(“C:\\Temp\\ForceError.txt”))
{
         throw new Exception(“This one is rare!”);
}

When you try to save the changed mscorlib.dll you get an access denied error because the file is in use

image

But since we are pros we rename the running mscorlib.dll with no fear

image

because we know that Windows opens dlls always with the FILE_SHARE_DELETE sharing permission. This  gives us the right to always rename a running dll. We can now save the patched mscorlib.dll without any issue:

image

When we run the modified mscorlib.dll we get indeed into the impossible error region once the trigger is activated:

image

That is a great way to test how robust your application really is if you suspect that the error handling is wonky. But I am digressing here. Now back to the .NET 4.8 issue.

When I debugged it with Windbg the error did occur but not with dnSpy which left me back puzzled. If you have followed this far you can probably guess that this issue only surfaces when native images are loaded into the process. That was the reason why dnSpy was working perfectly. By enabling local variables and Ngenning the binaries again it did help to find out what was happening. The exact details are not important but it suffices to say that the application did use an old Dependency Injection framework named ObjectBuilder which was part of the Enterprise Library many years ago. To select the right ctor the default policy was to always use the first one by a call to Type.GetConstructors:

// Microsoft.Practices.ObjectBuilder.DefaultCreationPolicy
public ConstructorInfo SelectConstructor(IBuilderContext context, Type typeToBuild, string idToBuild)
{
    ConstructorInfo[] constructors = typeToBuild.GetConstructors();
    if (constructors.Length > 0)
    {
        return constructors[0];
    }
    return null;
}

That worked well for a over a decade but no more with .NET 4.8. If you read the API help for Type.GetConstructors

Remarks

The GetConstructors method does not return constructors in a particular order, such as declaration order. Your code must not depend on the order in which constructors are returned, because that order varies.

it becomes evident that this approach can cause problems. In this particular case the first ctor was the default ctor for a given type but with .NET 4.8 and NGenning the application the order of ctors did change. The first ctor was now ctor(string) which did cause object creation to fail miserably. To hit this issue many things must come together. If you e.g. add another type to the specific assembly that contained the type the issue goes away because for the error to surface you need to craft your IL metadata a specific offsets to make the issue happen. I have created a small reproducer which contains 10K classes with two ctors which shows the opposite issue. With JITed code class I have one class which has mixed ctor order. The Source code is here.

D:\CtorOrderNgenNotNGen\bin\Debug>CtorOrderNgenNotNGen.exe
Type DerivedSealed1808 has different ordered ctor!

The application checks if the first ctor is the default ctor which should be nearly always the same:

        private static void CheckCtorOrder()
        {
            foreach(var type in typeof(Base).Assembly.GetTypes())
            {
                if( type.Name.StartsWith("DerivedSealed"))
                {
                    ConstructorInfo[] infos = type.GetConstructors();
                    if( infos[0].GetParameters().Length > 0 )
                    {
                        Console.WriteLine($"Type {type.Name} has different ordered ctor!");
                    }
                }
            }
        }

After NGenning the application all classes have the same ctor order. This is pretty fragile. The advice on MSDN is therefore there for a reason. I have checked other more popular Dependency Injection Framework’s namely

  • Spring.NET
  • Windsor
  • AutoFac
  • Unity

which all have done their homework and contain not such problematic code. I might be the only one hitting that issue. But if you find strange application failures after a Windows KB check if the KB contains .NET 4.8 which could be the reason why your .NET application fails. .NET 4.8 has been available for 6 month and is pushed out via Windows Update as mandatory update now. MS is careful to not roll this out globally at once. Currently every 20th machine which requests all mandatory Windows Updates will get it but I am not sure when the switch will be flipped to 100%. This approach of MS is good but if you wonder why only some machines of your department behave strangely you should check if they really got all the same KBs installed even if everyone did install all mandatory updates. If you have a larger department and maintain your own Windows Update Service you should definitely keep an eye on strange .NET application behavior after you get the update which contains .NET 4.8. There are some issues with themed WPF context menus which can lead to misplaced, empty or non themed context menus but these should happen only sporadically. WPF still has issues with grouped Virtualizing Stackpanels (since .NET 4.5 where it was introduced) which can cause indefinite layout calls which can surface as UI hangs due to semi memory leaks. But that is part of another story.

Advertisements

7 thoughts on “How .NET 4.8 Can Break Your Application

  1. That’s a cool bug. Very nasty.

    I remember when .NET 4.7.2 broke out production apps because a new LINQ method was added that conflicted with out own, identical method.

    The update was auto-installed in production and we woke up to a broken app.

    Not sure I can fault Microsoft for that. Compatibility is very hard. .NET compat is amazing compared to anything else. Open source developer platforms break things all the time.

    Like

    1. Yes Appcompat is really hard. I was informed that they have stopped the occasional rollout via Windows Update now and are investigating further if they can make it non breaking. Lets keep the fingers crossed that they can improve it further to really break not anyone.

      Like

  2. This was one of the most helpful posts I’ve read in the recent months. That trick of .ini files made me feel decieved for years – I even thought it was a descontinued feature. All clear now.

    Like

  3. Alois, I work for the team that recently shipped .NET 4.8. You’re absolutely right, getting app-compat right is hard. Even though the official GetConstructors() documentation calls out that the return order should be not be relied upon we realize some customers might still have problematic code that does rely on ordering. Unfortunately this realization came after we had rolled out .NET 4.8 earlier this summer. We have a fix that will more closely align constructor ordering for NGEN to match the IL behavior. This fix is currently being built and tested, and getting into the pipeline to ship out to customers using 4.8 via a cumulative update over the next ~6 weeks.

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.