The last test is from 2019 which is still accurate but the world is changing and we have arrived at .NET 7.0 which is reason enough to spin up my test suite again and measure from .NET 4.8, 3.1, 5.0, 6.0 up to 7.0. The “old” articles are still relevant despite their age.
The following serializers are tested
Serializers with – in front are in maintenance mode.
New entries since 2019 are
- Google Protobuf
Which of these 23 serializers are in maintenance mode? Conditions (logical or) for maintenance mode are
- Last commit > 1 year
- It is archived
- All commits within one year changed no product code
- The author tells you to not use it
Maintenance mode serializers at the point of writing are:
- JIL (2019 last commit)
You can still look at them, but I would not use them for real products where future updates to e.g. .NET 8.0 or security fixes are mandatory.
What else is not recommended in 2022?
BinaryFormatter should nowhere used in production anymore because it allows, if you deserialize attacker controlled data, to open a shell on your box. See https://learn.microsoft.com/en-us/dotnet/standard/serialization/binaryformatter-security-guide.
The following serializers which are part of .NET Framework are unsafe:
- BinaryFormatter (System.Runtime.Serialization.Formatters.Binary)
- SoapFormatter (System.Runtime.Serialization.Formatters.Soap obsolete since .NET 2.0)
- NetDataContractSerializer (System.Runtime.Serialization aka WCF)
- LosFormatter (System.Web.UI)
- ObjectStateFormatter (System.Web.UI)
I did write the SerializerTests test suite which makes it easy to test all these serializers. If you want to measure for yourself, clone the repository, compile it, execute RunAll.cmd and you get a directory with CSV files which you can graph as you like.
Tooling is getting better and I have also added profiling support so you can check oddities at your own if you call RunAll.cmd -profile you get ETW traces besides the CSV files. This assumes you have installed the Windows Performance Toolkit from the Windows SDK.
The winner in terms of serialize speed is again FlatBuffer which still has the lowest overhead, but you need to go through a more complex compiler chain to generate the de/serialization code.
After discussions about the fairness how FlatBuffer should be tested I have changed how FlatBuffer performance is evaluated. Now we use an existing object (BookShelf) which is used as input to create a FlatBuffer object. Additionally it was mentioned that some byte buffer should also be part of the tested object. This moves FlatBuffer from rank 1 to 15, because the new test assumes you have an object you want to convert to the efficient FlatBuffer binary format. This assumption is reasonable, because few people will want to work with the FlatBuffer object directly. If you use the FlatBuffer object directly then the previous test results still apply.
The layout of the serialized Book object include
- int Id
- string Title
- byte Payload Before it was null now it contains 10 bytes of data
Our BookShelf object looks as serialized xml like this
<?xml version="1.0" encoding="utf-8"?> <BookShelf xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <Books> <Book> <Title>Book 1</Title> <Id>1</Id> <BookData>YWJjZGVmZ2hpag==</BookData> </Book> </Books> </BookShelf>
Now we serialize nearly 40% of binary data which contains just printable ASCII characters. This “simple” change moves the landscape quite a bit: GroBuf is first and MemoryPack second. This contains also an update from MemoryPack from 1.4.4 -> 1.8.10. If GroBuf has less overhead (but double the serialized size because it stores strings as Utf-16 while most others convert the data to Utf-8), or if MemoryPack has some regression I cannot tell. To reproduce the tests you can run the tests with
to use a 10 byte buffer payload for each Book object. If you omit the 10 then you will serialize with 0 bytes of payload. The combined csv file SerializationPerf_Combined_10.csv contains as suffix how much payload per book object was used.
*Update 1 End*
The new entry MemoryPack is second which is the newest serializer by Yoshifumi Kawai who seems to write serializers for a living. He has several entries in this list namely
His latest MemoryPack library shows that you still can get faster once you know what things cost and what you can avoid without becoming too specialized. This one has pretty much all features one could wish for including a Source Generator for Typescript to read the binary data inside the browser or wherever your Typescript code is running.
Additionally he provided a helper to compress the data with Brotli, a very efficient compression scheme which competes with Bois_LZ4 which is currently the only serializer which supports also compression of the data. The Brotli algorithm is nice but there is zstd from Facebook which should be a bit faster which would make a nice addition perhaps in the future?
That is really impressive!
From a higher level we find that the old .NET Framework is getting behind in terms of speed and we see a nice step pattern which proves that every .NET Framework release has become faster than its predecessor.
Binary is good, but Json has also a large fan base. Microsoft did add with .NET 7.0 Source Generators to generate the serialization code at compile time for your objects. See https://github.com/Alois-xx/SerializerTests/blob/master/Serializers/SystemTextJsonSourceGen.cs how you can enable code generation for serialization. This feature generates code only for serialize but uses pregenerated metadata for deserialize. The feature to add pregenerated deserialization code is tagged as future (post .NET 8?). If you think this is important you can upvote the issue. The main reason to add this was not only performance but also AOT scenarios where you have no JIT compiler.
With pregenerated serialization code in SystemTextJsonSourceGen our BookShelf object is 20+% faster serialized compared to SystemTextJson which uses the “normal” runtime generated code. A significant part of the performance improvement seems to be related to directly write UTF8 without the need to convert an UTF-16 char memory stream to an UTF-8 byte stream.
We access after deserialization of the FlatBuffer object all properties which cause the FlatBuffer struct to create a new string or byte array on each access again. AllocPerf does not use a binary blob yet. The numbers are therefore identical.
Even with these changes FlatBuffer wins the deserialize race with some margin. The second place goes to GroBuf instead of MemoryPack. But these are really close and had before been fighting for the second place. As already mentioned before GroBuf doubles the serialized data size which allows it to omit the Utf-16 to Utf-8 conversion which saves some speed at the expense of data size.
*Update 1 End*
Normally data is more often read than written. Deserialization needs to allocate all the new objects which makes it largely GC bound. This is the prime reason why deserialize can never be faster than serialize because it needs to allocate new objects while in the serialize case these are already existing.
The fastest serializer is AllocPerf which is not a serializer, but an allocation test which walks over an array of binary data without parsing and constructs strings out of the data. This is a test how fast one could get without any parsing overhead.
MemoryPack sets the mark again which is nearly overhead free. With so little overhead there must be a downside? Yes you loose versioning support to some extent.
The follower after MemoryPack is BinaryPack which is not even trying to be version tolerant and can be used for simple caches but other than that I would stay away from it. But beware it is in maintenance mode!
Another notable change in Protobuf_net was the deprecation since 3.0 of the AsReference attribute which many people are missing. This made it easy to deserialize object graphs with cyclic dependencies, which is with Protobuf_net 3.0 no longer the case.
The now in maintenance mode Utf8JsonSerializer is still 35% faster than SystemTextJson. There is still room for improvement for .NET.
First Startup Results
Besides the pure runtime performance startup costs are also important. That is more tricky to measure since you not only need to pay the costs of reflection, code gen, JIT time for the precompiled (Cross/Ngen) and not precompiled scenario inside a fresh executable. If you do not use a new executable each time you would attribute the costs for first time init effects to the first called serializer in a process which would introduce systematic bias. SerializerTests starts for each serializer a new executable which is started three times (last time is added to CSV) to warmup the loaded dlls to not induce expensive hard page faults which are served from the disk, but from the file system cache. Then it serializes 4 different object types with one object and takes the maximum value which should be the first call. These details allow one to check if just the first code gen is so expensive and subsequent code gen is much cheaper because many one time init effects already have happened.
The old .NET Framework 4.8 looses by a large margin here, but only if Ngen is not used. If the images are Ngenned then it is not too bad compared to crossgenned .NET core binaries.
Is this the full story? Of course not. We are only measuring inside a small loop long after many things have happened, but since precompilation will cause all dependent dlls to be eagerly loaded by the OS loader we should better look at the runtime of the process as a whole. This can show interesting … gaps in our testing methodology.
To look deeper you can run the test suite with ETW profiling enabled with
to generate ETW data while executing the tests. You need to download Windows Performance Toolkit to generate the profiling data and ETWAnalyzer to easily query the generated data. You get from the RunAll.cmd command a bunch of CSV files, a combined file CSV file and some ETW files.
When you put ETWAnalyzer into your path you can extract the ETW file data into Json files which are put into the Extract folder. To convert the ETL files into Json files you add to the ETWAnalyzer command line -Extract All, the folder where the ETL files are located and a symbol server e.g. MS which is predefined. You can also use NtSymbolPath to use your own Symbol search path defined by the _NT_SYMBOL_PATH environment variable.
ETWAnalyzer -Extract All -FileDir D:\Source\git\SerializerTests\SerializerTestResults\_0_14_06 -SymServer MS
The extracted Json files have a total size of 112 MB vs 5 GB of the input ETL files which is a good compression ratio! Now we can generate a nice chart with an Excel Pivot Chart to look how long each of the 3 invocations of the startup tests did run:
And the winner is .NET 4.8! That is counter intuitive after we have measured that .NET 7 is faster everywhere. What is happening here? This chart was generated with input from ETWAnalyzer to dump process start/stop times command line and their duration into a CSV file:
ETWAnalyzer.exe -dump process -cmdline *firstcall#* -timefmt s -csv FirstCall_Process.csv
ETWAnalyzer can also dump the CPU consumption to console so we can drill deeper
Although the command line syntax looks complex it is conceptually easy. You first select what to dump e.g. CPU, Process, Disk, … and then add filters and sorting. Because ETW records a lot of data you get a lot of sorting and filter options for each command of ETWAnalyzer which may look overwhelming but it is just filtering, sorting and formatting after you have selected what to dump.
The following call dumps process CPU consumption for all processes with a command line that contains
- firstcall and cross and datacontract to select the .NET Core crossgenned processes OR
- firstcall and ngen and datacontract to select the .NET 4.8 Ngened SerializerTests calls
-ProcessFmt s prints process start stop times in ETW recording time (time since session start) and the process duration. You can format also your local time, or utc by using local or utc as time formats.
EtwAnalyzer -dump cpu -cmdline "*firstcall* cross*#datacontract*";*firstcall*ngen*#datacontract* -ProcessFmt s
After drilling into the data it became apparent that since .NET Core 3.1 we are now having two additional serializers (MemoryPack and BinaryPack) which both have significant startup costs. These costs are always payed even if you never use them! I have created a branch of SerializerTests (noBinaryMemoryPack) to remove them so we can compare .NET 4.8 and .NET Core+ with the same number of used dependencies.
Each .NET version has two bars, the lower one is crossgen and the higher one involves JIT compilation of the binaries. Since each process is started 3 times we see in the y-axis the sum of 3 process starts. NET 7.0 seems to have become a bit slower compared to the previous versions. The winner is still .NET 4.8 (455 ms) but at least the difference is not as big as before (558ms vs 972ms).
It is very easy to fool yourself with isolated tests which do not measure the complete (system) picture. Measuring something and drawing a picture is easy. Measuring and understanding what the data means in full details is a much bigger endeavor.
Cool Console Tools
I work a lot with command line tools which produce large width output. The cmd shell has the nice feature to configure a horizontal scroll bar by setting the console width to e.g. 500. This works but newer shells like Windows Terminal are becoming the norm. It works like Linux shells which never had support for horizontal scroll bars. The issue to add a scroll bar is denied https://github.com/microsoft/terminal/issues/1860 which is a pity. But there is an open source solution for it. A little known terminal extension named vtm is offering all features I need. When configuring my command prompt with vtm -r cmd
I get a Windows Terminal window which supports horizontal scrolling, PlainText, RTF, HTML text copy, Ctrl+V (Ctrl+C not yet) and Unix style copying with right mouse click. This makes working with ETWAnalyzer much easier and you can also properly read the help which is also wide output …
EtwAnalyzer -dump CPU -ProcessName SerializerTests.exe -ShowModuleInfo -TopN 1 -NoCmdLine
The command dumps of all extracted files in the current directory the SerializerTests executable with the highest CPU consumption and the module information which includes file version, product name, and directory from where it was started. That can be useful to check if e.g. your local IT has installed the updated software to your machine, or you need to call again to get the update rolled out. By making such previously hard to get information easily queryable I have found that for some reason SerializerTests was running under false flag as protobuf product and since a long time without any file version update. Both issues are fixed, because ETWAnalyzer made it very obvious that something was wrong.
This is the end of this post, but hopefully your start to make a better informed decision which serializer is best suited for your needs. You should always measure in your environment with your specific data types to get numbers which apply to your and not my still synthetic scenario. When you execute RunAll.cmd dd you add to each Book object a byte array with dd bytes which may change the performance landscape dramatically. This can simulate a test where you use a serializer as wrapper over another already serialized payload which did already generate a byte array. That should be cheap, in theory. But when larger byte arrays were used with the fastest Json serializer (UTF8Json) it became one of the slower ones. I tried hard to get the numbers right, but you still need to measure for yourself to get your own right numbers for your use case and scenario.
Performance numbers are not everything. If essential features are there, and you trust the library author that he will provide support over the foreseeable future, only then you should consider performance as next item on your decision matrix to choose a library for mission critical long lived projects. If you are running a pet project, use whatever you seem fit.
4 thoughts on “.NET Serialization Roundup 2022”
Suggestion: Include FlatSharp ( https://github.com/jamescourtney/FlatSharp ) in the benchmark. It is a more developer-friendly implementation of FlatBuffers format.
I always welcome PRs. I cannot test every serializer. If you think it is important you are welcome to add it.
FlatBuffers is missed in “Deserialize Results”
FlatBuffers numbers are currently under discussion how a more fair test should look like. See https://github.com/Alois-xx/SerializerTests/issues/25 for some preliminary results.