With the release of the first Windows Anniversary SDK beta, a new version of the Windows Performance Toolkit was shipped as well. If you are looking for what changed, here are the most significant changes. I am not sure whether the version number will change after the beta, but it seems to target 10.0.14366.1000.
Stack Tags for Context Switch Events
You can now assign stack tags to wait call stacks, which is a big help if you want to see at a glance why your application is waiting for something. Here is a sample snippet that covers common sources of waits:
<Tag Name="Waits">
  <Tag Name="Thread Sleep">
    <Entrypoint Module="ntdll.dll" Method="NtDelayExecution*"/>
  </Tag>
  <Tag Name="IO Completion Port Wait">
    <Entrypoint Module="ntoskrnl.exe" Method="IoRemoveIoCompletion*"/>
  </Tag>
  <Tag Name="CLR Wait" Priority="-1">
    <Entrypoint Module="clr.dll" Method="Thread::DoAppropriateWait*"/>
  </Tag>
  <Tag Name=".NET Thread Join">
    <Entrypoint Module="clr.dll" Method="Thread::JoinEx*"/>
  </Tag>
  <Tag Name="Socket">
    <Tag Name="Socket Receive Wait">
      <Entrypoint Module="mswsock.dll" Method="WSPRecv*"/>
    </Tag>
    <Tag Name="Socket Send Wait">
      <Entrypoint Module="mswsock.dll" Method="WSPSend*"/>
    </Tag>
    <Tag Name="Socket Select Wait">
      <Entrypoint Module="ws2_32.dll" Method="select*"/>
    </Tag>
  </Tag>
  <Tag Name="Garbage Collector Wait">
    <Entrypoint Module="clr.dll" Method="WKS::GCHeap::WaitUntilGCComplete*"/>
  </Tag>
</Tag>
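To make the idea behind the tags more tangible, here is a small Python sketch of how such a tag file could be applied to a call stack. It is a simplified model and not WPA's actual algorithm: the Tag class, the tag_for_stack function and the rule "highest Priority wins, first match breaks ties" are my assumptions for illustration. It does show why a catch-all tag like "CLR Wait" gets Priority="-1": a more specific tag such as ".NET Thread Join" should win when both match the same stack.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Tag:
    """Hypothetical in-memory form of one <Tag> with a single <Entrypoint>."""
    name: str
    module: str
    method: str        # wildcard pattern, e.g. "NtDelayExecution*"
    priority: int = 0  # assumed default when no Priority attribute is given


# A subset of the tags from the snippet above.
TAGS = [
    Tag("Thread Sleep", "ntdll.dll", "NtDelayExecution*"),
    Tag("CLR Wait", "clr.dll", "Thread::DoAppropriateWait*", priority=-1),
    Tag(".NET Thread Join", "clr.dll", "Thread::JoinEx*"),
    Tag("Socket Receive Wait", "mswsock.dll", "WSPRecv*"),
]


def tag_for_stack(frames, tags=TAGS):
    """Return the name of the best matching tag for a call stack.

    frames is a list of (module, method) tuples. A frame matches a tag when
    the module name is equal (case-insensitive) and the method matches the
    wildcard pattern. Among all matches the tag with the highest priority
    wins; on equal priority the first match is kept.
    """
    best = None
    for module, method in frames:
        for tag in tags:
            same_module = module.lower() == tag.module.lower()
            if same_module and fnmatchcase(method, tag.method):
                if best is None or tag.priority > best.priority:
                    best = tag
    return best.name if best else None


# A Thread.Join stack contains both JoinEx and DoAppropriateWait frames.
join_stack = [
    ("clr.dll", "Thread::JoinEx"),
    ("clr.dll", "Thread::DoAppropriateWait"),
    ("ntdll.dll", "NtWaitForMultipleObjects"),
]
print(tag_for_stack(join_stack))
```

With the lower priority on "CLR Wait" the join stack is reported as ".NET Thread Join", while a stack that only contains a DoAppropriateWait frame still falls back to "CLR Wait".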
Stack tags for context switch events are actually already part of the Windows 10 Update 1 WPA (10.0.10586.15, th2_release.151119-1817), but I have only now stumbled across them. They make hang analysis a whole lot easier. A really new feature is Flame Graphs, which Bruce Dawson has wanted built into WPA for a long time. Some time ago I also wrote a small tool to generate them (see Visualize Your Callstacks Via Flame Graphs), but it never got much traction.
If you want to see, for example, why your threads are blocking, the context switch view now gives you a nice overview of the blocking reasons and of which thread unblocked your threads most often. You can stack two Wait views on top of each other, so you can drill down to the time range where something is hanging and reach conclusions much faster.
Another nice feature: if you select a region in the graph and hold down the Ctrl key, you can select multiple regions at once and highlight them. This is useful if you frequently need to zoom into different regions and move between them. The current flame graphs cannot really display large call stacks, because the resulting flames become so small that you have no chance to select them. Zooming with Ctrl and the mouse wheel only zooms into the time region, which is perhaps not what I want for a flame graph. I would like a zoom that makes parts of the graph readable again.
Symbol loading has become significantly faster and it also seems to crash less often, which is quite surprising for a beta.
My Presets Window for managing presets
Another new feature is the My Presets window, which is useful if you work with custom WPA profiles. From there you can pick a graph from any of the loaded profiles and simply add it to your current view.
Reference Set Tracing
WPR now also supports Reference Set analysis, along with a new graph in WPA. It basically traces every page access and release in the system, which allows you to track exactly how your working set evolved over time and why.
Since page-in and page-out operations happen very frequently, this is a high-volume provider. With xperf you can enable it with the keyword REFSET. The xperf documentation only says
REFSET : Support footprint analysis
but until now no tool was publicly available to decode the events in a meaningful way. I still do not understand how everything fits together, but it looks like a powerful way to understand in depth when something is paged in or out.
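As a rough sketch of how a Reference Set recording with xperf could be scripted, the following Python helper builds the start and stop command lines. The function names and the choice to combine REFSET with PROC_THREAD and LOADER are my assumptions; I have not verified which other kernel flags WPA needs to render the Reference Set graph properly, so treat the flag list as a starting point only.

```python
import subprocess


def xperf_start_command(kernel_flags):
    """Build 'xperf -on FLAG1+FLAG2+...' for the NT Kernel Logger session."""
    return ["xperf", "-on", "+".join(kernel_flags)]


def xperf_stop_command(output_file):
    """Build 'xperf -d file.etl', which stops the session and merges the trace."""
    return ["xperf", "-d", output_file]


def record_trace(kernel_flags, output_file, wait=input):
    """Start tracing, wait until the user presses Enter, then stop.

    Needs an elevated prompt and xperf on the PATH. The wait callable is
    only a hypothetical hook that makes the pause replaceable.
    """
    subprocess.check_call(xperf_start_command(kernel_flags))
    try:
        wait("Reproduce the scenario, then press Enter to stop tracing...")
    finally:
        subprocess.check_call(xperf_stop_command(output_file))


# Assumed flag combination: process/thread and image load events plus REFSET.
REFSET_FLAGS = ["PROC_THREAD", "LOADER", "REFSET"]
print(" ".join(xperf_start_command(REFSET_FLAGS)))
print(" ".join(xperf_stop_command("refset.etl")))
```

Calling record_trace(REFSET_FLAGS, "refset.etl") from an elevated prompt would run both commands around your repro. Because this is a high-volume provider, keep the recording window as short as possible.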
For a long time WPA knew nothing about the network. That is now changing a bit.
ACK delays are nice to have, but as far as I understand TCP, the application is free to hold back the ACK until the server has its data ready. If you see high ACK delay times you cannot point directly at the network; you still need to investigate whether something was happening on the server as well. For network issues I still like Wireshark much more, because it understands not only raw TCP but also the higher-level protocols. For a network delay analysis, TCP retransmit events are the most important ones. The next thing to look at is the packet round trip time (RTT), where Wireshark does an incredibly good job. I have seen some ETW events in the kernel that have an SRTT field (in TCP terminology usually the smoothed RTT), but I do not know how significant it actually is.
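In TCP terminology SRTT normally stands for the smoothed round trip time as defined in RFC 6298. Whether the ETW field uses exactly this definition is an assumption on my side, but the following Python sketch shows how that value is derived from individual RTT samples, which helps to judge what a single SRTT number can and cannot tell you. The function name is hypothetical; the constants alpha = 1/8 and beta = 1/4 are the ones recommended by the RFC.

```python
ALPHA = 1 / 8  # weight of a new sample in SRTT (RFC 6298)
BETA = 1 / 4   # weight of a new sample in RTTVAR (RFC 6298)


def smoothed_rtt(samples):
    """Return a list of (srtt, rttvar) pairs, one per RTT sample.

    The first sample R initializes SRTT = R and RTTVAR = R / 2.
    Every later sample R' updates
        RTTVAR = (1 - beta) * RTTVAR + beta * |SRTT - R'|
        SRTT   = (1 - alpha) * SRTT + alpha * R'
    where RTTVAR is updated first, using the old SRTT.
    """
    history = []
    srtt = rttvar = None
    for rtt in samples:
        if srtt is None:
            srtt, rttvar = rtt, rtt / 2
        else:
            rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - rtt)
            srtt = (1 - ALPHA) * srtt + ALPHA * rtt
        history.append((srtt, rttvar))
    return history


# RTT samples in milliseconds with one outlier in the middle.
for srtt, rttvar in smoothed_rtt([100, 100, 400, 100, 100]):
    print(f"SRTT={srtt:.1f} ms  RTTVAR={rttvar:.1f} ms")
```

Because each new sample contributes only one eighth to the result, a single slow packet barely moves SRTT. That is good for estimating retransmit timeouts, but it also means that short latency spikes are easier to spot in per-packet round trip times than in the smoothed value.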
Load files from Zip/CAB
You no longer need to extract ETL files from the archive first; WPA can open them directly, which is great if you deal with compressed files a lot.
- The new WPT no longer works on Windows 7, so beware.
- For WPR/UI I have mixed feelings, because it is not really configurable and records too much. The amount of recorded data easily exceeds one GB on an idle machine if I enable CPU, Disk, File and Reference Set tracing. For the Disk profile the beta version also records Storeport traces, which are only useful if you suspect bugs in your hard disk firmware or in special hard disk drivers. If I enable context switch tracing with call stacks, I usually do not need the call stacks for every file/disk operation, since I will find them in the context switch traces at that time anyway. If VirtualAlloc tracing is enabled, the stack traces for the free calls are recorded as well. These are seldom necessary, because double frees are most often not the issue. Memory leaks are much more common, and for these the allocation call stacks are the only relevant ones.
- The improvements to the ETW infrastructure in Windows 8.1, which support filtering of specific events or recording stack traces only for specific events, have made it into neither WPR nor xperf. I would really like to see more feature parity between what the kernel supports and what xperf allows me to configure, to reduce the sometimes very high ETW event rates to something manageable.
- Currently xperf can start only two kernel trace sessions (NT Kernel Logger and Circular Kernel Context Logger), but not a generic kernel tracing session, of which there can be up to 8 since Windows 8. Besides this, I am not sure whether the profiling frequency can even be set for a specific kernel session.
The new version has many improvements that can help a lot to gain insights into your system in ways that were not possible before. Flame Graphs look nice, and I hope that the final version makes it possible to zoom in somehow.
The WPA viewer is really a great tool and has gained a lot of features, which shows that more and more people are using it.
3 thoughts on “New Beta of Windows Performance Toolkit”
[…] New Beta of Windows Performance Toolkit – Alois Kraus […]
I asked Microsoft and they told me that they refactored the code to use API sets (https://msdn.microsoft.com/en-us/library/windows/desktop/hh802935(v=vs.85).aspx), which are not part of Windows 7, so WPT now requires Windows 8.