What do you do when you get an impossible exception of the form
System.InvalidCastException: Object of type 'MyApp.Type' cannot be converted to type 'System.Collections.Generic.IEnumerable`1[MyApp.Type]'.
 System.RuntimeType.TryChangeType(…)
 System.RuntimeType.CheckValue(…)
 System.Reflection.MethodBase.CheckArguments(…)
 System.Reflection.RuntimeMethodInfo.InvokeArgumentsCheck(…)
 ...
Here, basically, an array of type X cannot be cast to IEnumerable<X>. Any LINQ-savvy programmer does that 20 times a day. What the heck is going on here? It looks like some bad guy has corrupted the CLR type system. The cause could be anything from spurious memory corruption due to cosmic rays to some code on this or another thread that corrupted memory somehow. Random bit flips are not that uncommon once you know that cosmic muons hit the earth's surface at a rate of ca. 60 000 muons/(h·m²). It is only a matter of time until some bits in your memory randomly change their value.
This issue surfaced quite reproducibly on some machines, so we can put exotic theories about possible sources of memory corruption back on the bookshelf. First of all I needed a memory dump of the process in question. This is much easier if you know how to "attach" procdump to a process and let it create a dump on any first-chance exception with a specific type or message. That comes in handy when you want a dump not when a process crashes, but when an internally caught exception occurs and you want to inspect the application state at that point in time.
The Sysinternals tool ProcDump can do this from the command line without any extra installation. -ma writes a full memory dump, -e 1 instructs it to dump on first-chance exceptions, and -f applies a substring filter to the exception message that procdump shows in the console.
C:\>procdump -ma -e 1 -f "InvalidCast" Contractor.exe
ProcDump v7.1 - Writes process dump files
Copyright (C) 2009-2014 Mark Russinovich
Sysinternals - www.sysinternals.com
With contributions from Andrew Richards
Process: Contractor.exe (12656)
CPU threshold: n/a
Performance counter: n/a
Commit threshold: n/a
Threshold seconds: 10
Hung window check: Disabled
Log debug strings: Disabled
Exception monitor: First Chance+Unhandled
Exception filter: *InvalidCast*
Terminate monitor: Disabled
Cloning type: Disabled
Concurrent limit: n/a
Avoid outage: n/a
Number of dumps: 1
Dump folder: C:\
Dump filename/mask: PROCESSNAME_YYMMDD_HHMMSS
Press Ctrl-C to end monitoring without terminating the process.
CLR Version: v4.0.30319
[19:48:12] Exception: E0434F4D.System.InvalidCastException ("Object of type 'MyApp.Type' cannot be converted to type 'System.Collections.Generic.IEnumerable`1[MyApp.Type]'.")
[19:48:12] Dump 1 initiated: C:\Contractor.exe_160718_194812.dmp
[19:48:12] Dump 1 writing: Estimated dump file size is 50 MB.
[19:48:12] Dump 1 complete: 50 MB written in 0.1 seconds
[19:48:13] The process has exited.
[19:48:13] Dump count reached.
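If you need to arm ProcDump repeatedly, for example on several test machines, you can script the same invocation. The following is a minimal sketch, not a tool from this post. It only rebuilds the command line used above as an argument list. The helper name procdump_args is hypothetical, and the script assumes procdump.exe is on the PATH.

```python
import subprocess
from typing import List


def procdump_args(process_name: str, message_filter: str) -> List[str]:
    """Build the ProcDump command line shown above as an argument list (hypothetical helper)."""
    return [
        "procdump",
        "-ma",                 # write a full memory dump
        "-e", "1",             # also dump on first-chance exceptions
        "-f", message_filter,  # only dump when the shown exception text contains this substring
        process_name,          # process to attach to
    ]


if __name__ == "__main__":
    # Assumes procdump.exe is on PATH; runs while ProcDump monitors the process.
    subprocess.run(procdump_args("Contractor.exe", "InvalidCast"), check=False)
```

Passing a list instead of a single string avoids quoting problems with filter texts that contain spaces.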
Once I had the dumps I looked into them, but I could not make much sense of the InvalidCastException. Without the exact CLR source code it is very hard to interpret the highly optimized data structures inside the CLR. At that point I reached out to Microsoft to look into the issue. Since the dump did not provide enough clues either, I packed up a virtual machine and transferred it directly to MS support via their upload tool. The support engineer was not able to get the VM running, which seems to happen quite frequently when you upload 50+ GB files over the internet. The solution: MS mailed me a USB hard drive, I copied the VM onto the disk and sent it back to MS. That worked out, and then I heard nothing from MS support for some time.
It seems that my issue was routed through various hands until it reached one of the CoreCLR devs, who have certainly debugged many similar issues. Just when I suspected I would never hear about the issue again, I got a mail back saying that they had found the cause.
The problem is that the CLR type system gets confused by assembly references that have invalid flags set. The flags of an assembly reference in the assembly metadata must be zero, but in my case they were set to 0x70, which is normally a C# compiler internal tracking code used to mark reference assemblies. In the final code emitting phase the C# compiler zeroes out the assembly reference flags and all is fine. But for some reason one of my assemblies still had 0x70 set on all references to reference assemblies.
If you wonder what reference assemblies are: they are the assemblies the .NET Framework consists of, one set per framework version. If you target a specific .NET Framework version like 4.6.2, you "link" against the assemblies in the matching Reference Assemblies directory (e.g. C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.6.2). This ensures that you can only call APIs that exist in the configured .NET target framework.
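To see why 0x70 hints at reference assemblies, here is a small decoder for an assembly flags value. This is a minimal sketch. It assumes the bit values of the CorAssemblyFlags enum from the CLR's corhdr.h header. There, bits 0x0070 hold the processor architecture, and the value 7 ("NoPlatform") is used for reference assemblies. The function names are hypothetical.

```python
from __future__ import annotations

# Assumption: bit values from the CorAssemblyFlags enum in the CLR header corhdr.h.
AF_PUBLIC_KEY = 0x0001          # reference holds the full public key
AF_PA_MASK = 0x0070             # processor-architecture bits
AF_PA_SHIFT = 4                 # shift to turn the PA bits into an index
AF_RETARGETABLE = 0x0100
AF_CONTENT_TYPE_MASK = 0x0E00

# Processor-architecture index -> name (only the values assumed above).
PA_NAMES = {
    0: "None", 1: "MSIL", 2: "x86", 3: "IA64",
    4: "AMD64", 5: "ARM", 7: "NoPlatform (reference assembly)",
}


def describe_assembly_ref_flags(flags: int) -> list[str]:
    """Return readable names for the bits set in an AssemblyRef Flags value."""
    names = []
    if flags & AF_PUBLIC_KEY:
        names.append("PublicKey")
    pa = (flags & AF_PA_MASK) >> AF_PA_SHIFT
    if pa:
        names.append("PA_" + PA_NAMES.get(pa, f"unknown({pa})"))
    if flags & AF_RETARGETABLE:
        names.append("Retargetable")
    if flags & AF_CONTENT_TYPE_MASK:
        names.append(f"ContentType=0x{flags & AF_CONTENT_TYPE_MASK:04x}")
    return names


def has_leaked_architecture_bits(flags: int) -> bool:
    """True if processor-architecture bits such as 0x70 ended up in an AssemblyRef row."""
    return (flags & AF_PA_MASK) != 0


if __name__ == "__main__":
    for value in (0x0, 0x70):
        print(hex(value), describe_assembly_ref_flags(value), has_leaked_architecture_bits(value))
```

Under these assumptions, 0x70 decodes to "NoPlatform". That is an architecture marker of the referenced reference assembly, not something that belongs in a compiled assembly's reference table.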
I was told that an issue that could trigger this behavior had been reported for the .NET 4.5.2 C# compiler. Whether the newer Roslyn-based C# 6 compiler still suffers from it is not known to me. That sounded like an arcane and spurious compiler bug. A few days later, however, MS asked me whether we modify the generated assemblies. It turns out some of our targets use Code Contracts to enforce runtime checks, which requires assembly rewriting. After querying all assembly references I found that ALL assemblies rewritten by Code Contracts have invalid assembly reference flags with 0x70 set. This is still true with the newest Code Contracts version, which is currently 1.9.10714.2.
If you try to check the assembly references via Reflection, it will always tell you that the flags of the assembly references are zero. This is true even if you load the assembly with Assembly.ReflectionOnlyLoadFrom. Ildasm will not show you the flags of the assembly references either. Windbg? Nope.
C:\>apichange -sr D:\bin\SubContractor.dll
SubContractor.dll ->
 System, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089 0x70
 mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089 0x70
Here I could finally see that my assembly had invalid assembly references. How can I fix it? That is easy: turn off "Perform Runtime Contract Checking"!
Now you get correct assembly references, as the ECMA spec requires.
C:\>apichange -sr D:\bin\SubContractor.dll
SubContractor.dll ->
 mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089 0x0
 System, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089 0x0
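To check many binaries at once, you can scan the apichange -sr output for references whose flags are not zero. This is a minimal sketch that assumes the output format shown above: each reference is printed as a display name followed by a hex flags value. The regular expression and the function name invalid_references are hypothetical and not part of apichange.

```python
import re

# Assumption: each reference looks like
# "<name>, Version=..., Culture=..., PublicKeyToken=... 0x<flags>"; line breaks are optional.
REF_PATTERN = re.compile(
    r"(?P<name>[\w.]+), Version=(?P<version>[\d.]+), Culture=(?P<culture>[\w-]+), "
    r"PublicKeyToken=(?P<token>[0-9a-fA-F]+|null)\s+(?P<flags>0x[0-9a-fA-F]+)"
)


def invalid_references(apichange_output: str) -> list:
    """Return (assembly name, flags) for every reference whose flags are not zero."""
    result = []
    for match in REF_PATTERN.finditer(apichange_output):
        flags = int(match.group("flags"), 16)
        if flags != 0:
            result.append((match.group("name"), flags))
    return result


if __name__ == "__main__":
    sample = (
        "SubContractor.dll ->\n"
        " System, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089 0x70\n"
        " mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089 0x70\n"
    )
    for name, flags in invalid_references(sample):
        print(f"{name}: invalid AssemblyRef flags 0x{flags:x}")
```

Any non-empty result points to an assembly that was probably post-processed, for example by the Code Contracts rewriter.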
The source of the spurious InvalidCastExceptions is found, but why does it happen so infrequently? I tried to create a simple reproducer, but I was not able to force the cast error. Triggering it seems to depend on the machine, memory layout, number of loaded types, IL layout, moon phase, …. When something goes wrong, it seems that in one assembly with correct references to the reference assemblies, a cast is attempted from
mscorlib; IEnumerable<T>   to   mscorlib, Ref=0x70; IEnumerable<T>
Here the type identity of generic types and their arguments gets mixed up. The CLR then correctly tells me that seemingly identical types are not identical, because their full type expansions differ in the assembly reference.
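The effect can be illustrated with a toy model. It is not how the CLR implements type identity. It simply assumes that a type's identity is the tuple of its assembly reference (including the reference flags), its full name and its generic arguments. The class names are hypothetical. Two types that print identically still compare unequal when only the flags of the mscorlib reference differ.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AssemblyRef:
    """Toy assembly reference; 'flags' models the AssemblyRef Flags column."""
    name: str
    flags: int = 0


@dataclass(frozen=True)
class TypeIdentity:
    """Toy type identity: assembly reference + full name + generic arguments."""
    assembly: AssemblyRef
    full_name: str
    generic_args: Tuple["TypeIdentity", ...] = ()

    def display_name(self) -> str:
        # What an error message shows: the assembly reference flags are not part of it.
        if not self.generic_args:
            return self.full_name
        args = ",".join(arg.display_name() for arg in self.generic_args)
        return f"{self.full_name}[{args}]"


my_type = TypeIdentity(AssemblyRef("MyApp"), "MyApp.Type")
clean = TypeIdentity(AssemblyRef("mscorlib"), "System.Collections.Generic.IEnumerable`1", (my_type,))
tainted = TypeIdentity(AssemblyRef("mscorlib", flags=0x70), "System.Collections.Generic.IEnumerable`1", (my_type,))

if __name__ == "__main__":
    print(clean.display_name() == tainted.display_name())  # True: same printed name
    print(clean == tainted)                                # False: different identities
```

This mirrors the exception message: both sides read IEnumerable`1[MyApp.Type], yet the cast fails.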
I was never a big fan of Code Contracts because of the way they were implemented. Rewriting existing assemblies can and will cause unforeseen issues in production code. If I can, I want to keep just my C# and JIT compiler bugs and not add Code Contracts bugs on top of them. Do not get me wrong: I really like the idea of contract checking. But such checks should be visible in the source code, or at least be compiled directly by a Code Contracts plugin for the now much more open Roslyn compiler.
As it is now, Code Contracts considerably increase compilation time. If you enable static code checks you can easily get builds that take several hours, which is far too hefty a price for some additional checks.
If you now want to remove Code Contracts, you need to remove all Contract.Requires<Exn>(…) checks from your source code. These will assert in non-rewritten Release builds as well, which is certainly not what you want. As it stands now, Code Contracts violate the contract of generating valid assembly references. The Code Contracts assembly rewriter generates assemblies with invalid metadata, which can trigger impossible runtime errors in the CLR under specific circumstances.
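Finding all those Contract.Requires<Exn>(…) calls in a large code base can be scripted. This is a minimal sketch using a plain regular expression, not a C# parser. It matches only the generic Requires<…> form mentioned above. It will also report commented-out code. The function names are hypothetical.

```python
import re
import sys
from pathlib import Path
from typing import Iterator, Tuple

# Only the generic overload Contract.Requires<TException>(...) is matched, because that is
# the form that keeps asserting in builds which are not rewritten.
GENERIC_REQUIRES = re.compile(r"\bContract\.Requires\s*<")


def find_generic_requires(source: str) -> Iterator[Tuple[int, str]]:
    """Yield (line number, stripped line) for each Contract.Requires<...> call in C# source text."""
    for line_number, line in enumerate(source.splitlines(), start=1):
        if GENERIC_REQUIRES.search(line):
            yield line_number, line.strip()


def scan_tree(root: str) -> Iterator[Tuple[str, int, str]]:
    """Walk all *.cs files below root and report every generic Contract.Requires call."""
    for path in sorted(Path(root).rglob("*.cs")):
        text = path.read_text(encoding="utf-8-sig", errors="replace")
        for line_number, line in find_generic_requires(text):
            yield str(path), line_number, line


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for file_name, line_number, line in scan_tree(root):
        print(f"{file_name}({line_number}): {line}")
```

The file(line) output format is chosen so that Visual Studio's output window can jump to each hit.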