Networking is hard and many things can go wrong. I have bought an Oculus Quest VR headset for my daughter. Soon I have found it frustrating to connect it via an USB cable to my PC. To get rid of the cable I have bought a cheap TP-Link Archer C5 AC1200 router which I connect via my LAN cable to the PC. After becoming wireless I can send realtime VR images over my dedicated WIFI to the VR headset with stable 150 MBit/s.

The “real” WIFI (yellow) which is connected to the internet is located in another room which is not capable to transfer 150 MBit/s without any interrupts which would kill the VR experience. That is the reason why nearly everyone recommends to use either an USB cable (inconvenient) or a LAN cable (blue) connected to a dedicated WIFI router for the VR headset.
That setup works great, but I have noticed that whenever I have the second router running that browsing the internet would feel very slow. Every page refresh was taking a lot longer compared to when I did turn the second WIFI router off.
Since all browsers have great development tools I just needed to press F12 to see what was blocking my browser requests. In the example below I have tried to access a far far away web site which shows in the network tab for the first request a delay 10,55s for host name resolution (Dns).

On my box these timings were always around 10s so I knew that I did suffer from slow Dns queries to resolve the host name e.g. http://www.dilbert.com to the actual IP address. Knowing that page load times are blocked by 10+ seconds by Dns is great, but now what? I was suspecting that, since I have two network tabs in Task Manager, that the DNS query was first sent via the LAN cable to a dead end because my Oculus is not connected to the internet and then via WIFI to the actual internet connected router.
Initially I tried the route print command to see where why packets were routed, but that looked too complicated to configure the order for every local route. After a bit more searching I have found the powershell command Get-NetIPInterface which shows you over which network interface Windows sends network packets to the outside world. The relevant column is InterfaceMetric. If multiple network cards are are considered the network interface with the lowest InterfaceMetric number is tried first.

As expected I have found my local Ethernet connection with InterfaceMetric 5 with the preferred connection to send network packets to the outside world. But what should be the other network? The easy way out is to cause some traffic in the browser and check in Task Manager in the Performance Tab which network gets all the traffic:

I need therefore to change the order vEthernet (New Virtual Switch) with metric 15 and interface id 49 and vEthernet (New Virtual Switch) with metric 25 and interface id 10.
Set-NetIPInterface -InterfaceIndex 10 -InterfaceMetric 5
Set-NetIPInterface -InterfaceIndex 49 -InterfaceMetric 25

The average reader can stop reading now.
What follows is a deep dive into how Dns works on Windows and how you can diagnose pretty much any Dns issue with the builtin Dns tracing of Windows with the help of ETWAnalyzer.
So far we have just looked at Dns as a black box, but you can trace all Dns requests with the help of ETW (Event Tracing for Windows) and the Microsoft-Windows-DNS-Client ETW provider. The easiest way to record the data is to download a recording profile like Multiprofile.wprp, download the file e.g. via curl from the command line
curl https://raw.githubusercontent.com/Alois-xx/FileWriter/master/MultiProfile.wprp
> c:\temp\MultiProfile.wprp
Then start from an Admin command prompt tracing with
wpr -start c:\temp\MultiProfile.wprp!Network
Now surf to some web sites or do your slow use case. After you have found some slowdown you can stop recording with
wpr -start c:\temp\SlowBrowsing_%ComputerName%.etl
Good naming is hard, but if you later want to revisit your recorded files you should try to come up with a descriptive name.
After recording the data you can extract a Json file with ETWAnalyzer by extracting everything (-extract all) or just Dns data with -extract DNS
C:>EtwAnalyzer -extract all -fd c:\temp\SlowBrowsing_Skyrmion.etl -symserver ms 1 - files found to extract. Success Extraction of c:\temp\Extract\SlowBrowsing_Skyrmion.json Extracted 1/1 - Failed 0 files. Extracted: 1 files in 00 00:02:40, Failed Files 0 |
ETWAnalyzer supports many different queries for e.g. Disk, File, CPU and other things. Since 2.4.0.0 support for dumping Dns Queries was added. To show the aggregated top 5 slowest Dns queries you just need -fd (File or Directory) and the file name of the extracted data and -dump Dns to dump all Dns queries. To select the top 5 add -topn 5 and you get this output per process printed:
C:\temp\Extract>EtwAnalyzer -dump DNS -topn 5 -fd SlowBrowsing_Skyrmion.json 02.10.2022 17:47:32 SlowBrowsing_Skyrmion NonOverlapping Min Max Count TimeOut DNS Query Total s s s # OVRServer_x64.exe(6624) 124 128 12,038 s 0,000 s 12,038 s 4 1 graph.oculus.com backgroundTaskHost.exe(6692) -ServerName:BackgroundTaskHost.WebAccountProvider 12,043 s 0,000 s 12,043 s 2 1 login.microsoftonline.com firefox.exe(13612) 12,047 s 0,021 s 12,047 s 2 1 d21lj84g4rjzla.cloudfront.net 12,052 s 0,000 s 12,052 s 2 1 loadus.exelator.com 12,052 s 0,000 s 12,052 s 2 1 cdn.adrtx.net Totals: 60,232 s Dns query time for 12 Dns queries |
We find that we have indeed some slow Dns queries which all did time out after ca. 12s. The magenta named processes are issuing all of the following DNS queries. To see more we can add -details to see the timing of every slow DNS Query
C:\temp\Extract>EtwAnalyzer -dump DNS -topn 5 -details -fd SlowBrowsing_Skyrmion.json 02.10.2022 17:47:32 SlowBrowsing_Skyrmion NonOverlapping Min Max Count TimeOut DNS Query Total s s s # 12,038 s 0,000 s 12,038 s 4 1 graph.oculus.com 2022-10-02 17:47:38.416 Duration: 0,000 s OVRServer_x64.exe(6624) DnsAnswer: 2022-10-02 17:47:38.417 Duration: 0,000 s OVRServer_x64.exe(6624) DnsAnswer: 2022-10-02 17:47:38.417 Duration: 12,038 s OVRServer_x64.exe(6624) TimedOut: 192.169.0.1:53 Servers: 192.169.0.1;fe80::1;0.0.0.0 DnsAnswer: 2a03:2880:f21c:80c1:face:b00c:0:32c2;::ffff:31.13.92.49 2022-10-02 17:47:38.417 Duration: 12,038 s OVRServer_x64.exe(6624) TimedOut: 192.169.0.1:53 Servers: 192.169.0.1;fe80::1;0.0.0.0 DnsAnswer: 2a03:2880:f21c:80c1:face:b00c:0:32c2;::ffff:31.13.92.49 12,043 s 0,000 s 12,043 s 2 1 login.microsoftonline.com 2022-10-02 17:47:42.020 Duration: 0,000 s backgroundTaskHost.exe(6692) DnsAnswer: 2022-10-02 17:47:42.020 Duration: 12,043 s backgroundTaskHost.exe(6692) TimedOut: 192.169.0.1:53 Servers: 192.169.0.1;fe80::1;0.0.0.0 DnsAnswer: ::ffff:40.126.32.74;::ffff:20.190.160.14;::ffff:40.126.32.134;::ffff:40.126.32.72;::ffff:20.190.160.22;::ffff:40.126.32.136;::ffff:40.126.32.138;::ffff:40.126.32.68 12,047 s 0,021 s 12,047 s 2 1 d21lj84g4rjzla.cloudfront.net 2022-10-02 17:47:54.203 Duration: 0,021 s firefox.exe(13612) DnsAnswer: 13.227.153.112;13.227.153.94;13.227.153.81;13.227.153.66 2022-10-02 17:47:54.224 Duration: 12,047 s firefox.exe(13612) TimedOut: 192.169.0.1:53 Servers: 0.0.0.0;192.169.0.1;fe80::1 DnsAnswer: 2600:9000:225b:fc00:1e:b6b1:7b80:93a1;2600:9000:225b:3600:1e:b6b1:7b80:93a1;2600:9000:225b:2e00:1e:b6b1:7b80:93a1;2600:9000:225b:f000:1e:b6b1:7b80:93a1;2600:9000:225b:7200:1e:b6b1:7b80:93a1;2600:9000:225b:a400:1e:b6b1:7b80:93a1;2600:9000:225b:9200:1e:b6b1:7b80:93a1;2600:9000:225b:dc00:1e:b6b1:7b80:93a1 12,052 s 0,000 s 12,052 s 2 1 loadus.exelator.com 2022-10-02 17:47:42.091 Duration: 0,000 s firefox.exe(13612) DnsAnswer: 2022-10-02 17:47:42.091 Duration: 12,052 s firefox.exe(13612) TimedOut: 192.169.0.1:53 Servers: 192.169.0.1;fe80::1;0.0.0.0 DnsAnswer: ::ffff:54.78.254.47 12,052 s 0,000 s 12,052 s 2 1 cdn.adrtx.net 2022-10-02 17:47:42.091 Duration: 0,000 s firefox.exe(13612) DnsAnswer: 2022-10-02 17:47:42.091 Duration: 12,052 s firefox.exe(13612) TimedOut: 192.169.0.1:53 Servers: 192.169.0.1;fe80::1;0.0.0.0 DnsAnswer: ::ffff:34.248.125.130;::ffff:3.248.26.101 Totals: 60,232 s Dns query time for 12 Dns queries |
You might ask yourself. What are NonOverlapping Totals? The ETW provider Microsoft-Windows-DNS-Client instruments dnsapi.dll!DnsQueryEx which supports synchronous and async queries. By default an IPV4 and IPV6 query is started whenever when you try to resolve a host name. If both queries need e.g. 10s, and are started at the same time the observed delay by the user is 10s (Non Overlapping time) and not 20s (that would be the sum of both query times). To estimate the observed user delay for each query ETWAnalyzer calculates the non overlapping query time which should much better reflect the actual user delays. The summary time Totals is also taking overlapping DNS query times into account which should be a better metric than a simple sum.
For more details please refer to https://github.com/Siemens-Healthineers/ETWAnalyzer/blob/main/ETWAnalyzer/Documentation/DumpDNSCommand.md.
The details view shows that every slow timed out query has a red TimedOut Dns server name (192.169.0.1:53). This is the Dns server IP for which a Dns query did time out. By looking at the interfaces with ipconfig we can indeed verify that the new TPLink Router was tried first, did time out and after that the next network (WIFI) was tried which did finally succeed.
If you are in an IPV6 network the Dns server IP might be a generic default Link Local IP which can be the same for several network interfaces. In that case you can add -ShowAdapter to show which network interfaces were tried. That should give you all information to diagnose common Dns issues. Of course you can also use WireShark to directly get your hands on the Dns queries, but with WireShark you will not know which process was waiting for how long. The ETW instrumented DnsQueryEx Api will give you a clear process correlation and also which queries (IPV4, IPV6) were tried. To get all details you can export the data to a CSV file (add -csv xxx.csv) and post process the data further if you need to. To select just one process (e.g. firefox) you can filter by process with -pn firefox which is is the shorthand notation for -processname firefox. For more information on the query syntax of ETWAnalyzer see https://github.com/Siemens-Healthineers/ETWAnalyzer/blob/main/ETWAnalyzer/Documentation/Filters.md.

If you have just Context switch traces but no DNS Client traces not all is lost. If you suspect DNS delays you can still query the thread wait times of all dns related methods with ETWAnalyzer which also calculates for wait and ready the non overlapping times summed for all threads.
EtwAnalyzer -dump cpu -pn firefox -fd WebBrowsingRouterOn_SKYRMION.10-06-2022.19-46-16 -methods *dns* -fld s s

If you have async queries in place, but you know that just one async operation was initiated you can add -fld s s to show the time difference the method was seen first and last in the trace which should give you a good estimate how long that async method was running, if upon completion it was called again (and consumed enough CPU so we could get a sample). For more details please refer to https://github.com/Siemens-Healthineers/ETWAnalyzer/blob/main/ETWAnalyzer/Documentation/DumpCPUCommand.md.
You can also use one specific method or event as zero point and let ETWAnalyzer calculate first/last timings relative to a specific event which can be useful to calculate e.g. timing of methods relative to a key event like OnLoadClicked or something similar. That advanced feature is explained at https://github.com/Siemens-Healthineers/ETWAnalyzer/blob/main/ETWAnalyzer/Documentation/ZeroTime.md in much greater detail. It is also possible to correlate traces across machines without synchronized clocks by using different zero times for different files within one query and export the data to a CSV file for further processing.
That was a lot of information, but I hope I have convinced you that you can access previously hard to get information now much easier with ETWAnalyzer. It is not only useful for performance investigations, but it might be also interesting to get an overview which servers were tried to query from your machine for forensic audits.
A fine walkthrough/demo/showcase of the tool – reminiscent of Winternals postings back in the good ol’ days. I’ve added ETWAnalyzer to the shelf of tools that need a good play.
LikeLike