
Troubleshooting network issues – Junior Developer Handbook

Your day isn’t really going well so far. Someone told you to set something up for the application you’re working on, and you have read dozens of articles and manuals about how to get the damn thing running, but to no avail. It’s 3pm already and the system still sings you the song of its people. Sad.

A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 – Could not open a connection to SQL Server) (Microsoft SQL Server, Error: 53)

Microsoft SQL Server

System.Net.Http.HttpRequestException: ‘The SSL connection could not be established, see inner exception.’

.NET’s HttpClient

These and many other error messages do not really tell us where the problem comes from. We could just blindly go at it with trial and error, but the fact that we often operate in non-trivial systems makes this strategy infeasible. We need to work our way towards the solution strategically, which, at least for me, always starts with validating network connectivity. This article tells you how to do that.

A vast number of devices and services stand between you (top left) and the service endpoint (top right) you’re trying to reach. Our goal is to rule out everything between the two as the source of the problem, which narrows our failure domain significantly.

DNS – They call me how? That’s not my name!

“It’s always DNS” is a well-known meme in the sysadmin and developer space for a reason: it really (almost) always is DNS when you try to find the culprit in a network-related failure analysis. The single most important part of the job DNS does for us is resolving domain names to IP addresses. Interestingly, that also seems to be the thing that never works, so you’ll run into problems here a lot, trust me.

To see whether DNS is the problem, we need to use the service endpoint’s IP address instead of its name. To do this, we open up a terminal, type nslookup <servername> and look for the server IP address in the output. There may be more than one address in the output; in that case you can use any one of them.

PS C:\Users\michael> nslookup api.github.com
...

Non-authoritative answer:
Name:    api.github.com
Address:  140.82.121.5

Now replace the name with the IP in your application and try again. Did it work? Congratulations! The problem was, once again, DNS. If it did not, DNS is not at fault and you’ll need to continue your investigation.
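If you would rather check name resolution from code, here is a minimal Python sketch of the same lookup. The function name resolve_host is made up for this example. Note that it asks the operating system's resolver (hosts file included), which is roughly what your application does, so the result can differ from what nslookup reports.

```python
import socket


def resolve_host(name: str) -> list[str]:
    """Return the sorted, de-duplicated IP addresses that `name` resolves to.

    An empty list means the lookup failed, which points at DNS.
    (resolve_host is a hypothetical helper, not part of any library.)
    """
    try:
        infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# Example call: resolve_host("api.github.com")
```

An empty result suggests the name cannot be resolved from your machine; a non-empty one gives you the addresses to try in your application.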

TCP and UDP – I’m caught up in the middle

Every application layer protocol needs a lower-level transport layer protocol. Almost all of them use either TCP, UDP or both. For the higher-layer communication to work, the lower-layer communication needs to work, too. The good news is that you can test whether it does.

We first need to figure out which connection our protocol needs to work. You’ll find this information by querying the search engine of your choice. Type <the application/protocol/thing> ports and you’ll find it. Examples:

  • sql server ports
    • TCP 1433, 4022, 135, 1434, UDP 1434
  • https ports
    • TCP 443, some deployments use 8443
  • http3 ports
    • HTTP/3 doesn’t have a designated port
    • a client will just use regular HTTPS (HTTP/2 or HTTP/1.1 over TCP) for initial communication
    • the server will report the UDP port to use in a response header called Alt-Svc
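To illustrate the HTTP/3 case, the following Python sketch pulls the advertised UDP ports out of an Alt-Svc header value. The helper name h3_ports and the sample header are assumptions made for this example, and the parsing is deliberately simple: it handles the common h3=":443" form but not, for instance, bracketed IPv6 hosts.

```python
import re


def h3_ports(alt_svc: str) -> list[int]:
    """Extract the UDP ports advertised for HTTP/3 from an Alt-Svc header value.

    (h3_ports is a hypothetical helper written for this article.)
    """
    ports: list[int] = []
    # Entries look like: h3=":443"; ma=86400
    # i.e. a protocol id, then an optional host and a port inside quotes.
    for proto, _host, port in re.findall(r'([\w-]+)="([^":]*):(\d+)"', alt_svc):
        # "h3" is the final protocol id; "h3-29" and similar are draft versions.
        if proto == "h3" or proto.startswith("h3-"):
            if int(port) not in ports:
                ports.append(int(port))
    return ports


# Example call with a made-up header value:
# h3_ports('h3=":443"; ma=86400, h3-29=":443"; ma=86400')
```

The port you get back is the one to use for the UDP test later on.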

With our newly acquired knowledge, we can now see if we can get a connection going. For TCP we use telnet, an ancient Windows command line tool. It is so ancient that it’s not even enabled by default, so you’ll need to enable it first if you haven’t already. To find out how, use a search engine and type “activate telnet windows”; there are plenty of articles about how that works.

To test if a TCP connection is possible, type telnet <host> <port> in your terminal and observe the output. If no TCP connection could be established it will look like this:

PS C:\Users\michael> telnet bogus.example.com 443
Connecting To bogus.example.com...Could not open connection to the host, on port 443: Connect failed

If it succeeds, there will be no output; you will see a blank terminal without any text instead. To get out of it, simply close the terminal window.
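If telnet is not available, the same check can be sketched in a few lines of Python. This is only an illustration; tcp_port_open is a name invented here, and the three-second timeout is an arbitrary default.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    (tcp_port_open is a hypothetical helper, not a library function.)
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name resolution.
        return False


# Example call: tcp_port_open("api.github.com", 443)
```

Because a failed name lookup also ends up as False here, run the DNS check first so you know which of the two you are looking at.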

Now for UDP, things are a little more complicated. We first need an external tool because Windows lacks built-in functionality for this. Fortunately, Mark Russinovich has something in his awesome Sysinternals suite of tools we can use: psping. Download it at https://learn.microsoft.com/en-us/sysinternals/downloads/psping, unzip it somewhere and open a terminal in the folder with all the .exe files. For me this is C:\Users\michael\Downloads\PSTools.

Now we’ll do a latency test via UDP to see if the server responds. To do that, type .\psping.exe -l 512 -n 1 -u <Host>:<Port> and observe the output. If the connection was successful, it will look like this:

.\psping.exe -l 512 -n 1 -u 1.1.1.1:53

PsPing v2.12 - PsPing - ping, latency, bandwidth measurement utility
Copyright (C) 2012-2023 Mark Russinovich
Sysinternals - www.sysinternals.com

Setting warmup count to match number of outstanding I/Os: 16
UDP bandwidth test connecting to 1.1.1.1:53: Connected
17 iterations (16 warmup) sending 512 bytes UDP bandwidth test: 100%

UDP sender bandwidth statistics:
  Sent = 1, Size = 512, Total Bytes: 8192,
  Minimum = 0.01 b/s, Maximum = 0.01 b/s, Average = 0.01 b/s
UDP receiver bandwidth statistics:
  Received = 1, Size = 512, Total Bytes: 8192,
  Minimum = 0.01 b/s, Maximum = 0.01 b/s, Average = 0.01 b/s
UDP packet rate: 100.00%
Error exchanging UDP statistics: The operation completed successfully.

If the connection was not successful it will look like this:

.\psping.exe -l 512 -n 1 -u 192.168.1.1:30923

PsPing v2.12 - PsPing - ping, latency, bandwidth measurement utility
Copyright (C) 2012-2023 Mark Russinovich
Sysinternals - www.sysinternals.com

Setting warmup count to match number of outstanding I/Os: 16
UDP bandwidth test connecting to 192.168.1.1:30923:
The remote computer refused the network connection.
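For comparison, here is a rough Python sketch of a UDP probe. It is not what psping does internally; it just sends one datagram and classifies what comes back. The name udp_probe and the default payload are assumptions for this example. UDP has no handshake, so a missing reply is ambiguous: many services only answer a payload they understand (a DNS server expects a real DNS query, for instance).

```python
import socket


def udp_probe(host: str, port: int, payload: bytes = b"ping", timeout: float = 2.0) -> str:
    """Send one datagram and classify the outcome as 'reply', 'refused' or 'no-reply'.

    (udp_probe is a hypothetical helper; it is IPv4-only to keep the sketch short.)
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            # connect() on a UDP socket only sets the peer address, but it lets the
            # OS report an ICMP "port unreachable" answer as a connection error.
            sock.connect((host, port))
            sock.send(payload)
            sock.recv(4096)
            return "reply"
        except socket.timeout:
            # Ambiguous: the packet may have been dropped, or the service
            # simply does not answer this payload.
            return "no-reply"
        except ConnectionError:
            return "refused"


# Example call: udp_probe("192.168.1.1", 30923)
```

Treat "refused" as a clear negative and "reply" as a clear positive; "no-reply" needs a protocol-specific payload before you can draw a conclusion.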

ICMP – Pretty fPing useless

There is also the ping command. It tells you whether a remote host is reachable from your machine on the network level, which means basic IP packet exchange is possible. In many corporate networks, however, ICMP is completely blocked for security reasons, so ping will always report that communication is impossible even if it works via UDP/TCP. It gets an honourable mention here because it’s often the first thing people on forums tell you to do. Don’t bother: it’s a useless tool for the kind of troubleshooting you’ll do.

Nothing worked – now what?

You can be pretty sure at this point that the issue is network related. Go write down the findings you already have and ask someone for help, be it the network team, a colleague or your boss. They’ll be glad to help, especially when they see you tried to investigate yourself but ran into a wall.

Summary – TL;DR

  • Our goal is to eliminate the network as the source of failure
  • Test if DNS is the issue by finding the IP behind the hostname with nslookup <servername> and trying that in your application instead of the hostname
  • Find the TCP/UDP ports that your application/protocol uses
  • Test TCP connections with telnet <host> <port>
  • Test UDP connections with psping.exe -l 512 -n 1 -u <Host>:<Port>
  • Forget about ping. TCP/UDP tests include network level connectivity.

Photo by Jordan Harrison on Unsplash