DNS troubleshooting for beginners

Introduction

DNS generally just works (at least as far as you’re concerned), which is good as the internet would be far less fun without it. However this does mean that many people don’t really know how to tell if a problem is a DNS error or something else – this makes life difficult for support desks and even worse causes work for DNS admins. It needn’t be so!

Telling if something is a DNS issue is actually quite simple, and troubleshooting it isn’t much more difficult.

To start with, there are really only a few ways that DNS can go wrong (from a user perspective; from an admin perspective DNS can go wrong in many and varied ways).

  1. Not responding at all
  2. Returning the wrong data
  3. Not returning a record when it should

From an end user’s point of view those really are the only ways that DNS can go wrong, and this guide will cover each in turn. But first….

Things that aren’t DNS errors

Most things that get reported as DNS issues aren’t!

Consider these common web errors. If you’re being told that the page isn’t found, that access is denied or that the page is for some other reason unavailable, then DNS is working. DNS might be returning the wrong data, but the error page you’re seeing doesn’t tell you that.

  • 404 Page not found
  • 403 Forbidden (access denied)
  • 503 Service unavailable

Of course modern “friendly” errors really don’t help anyone have any clue what’s going on, but if you’re getting any of the above errors then DNS is working; it may just not be telling you the right thing. Equally, the following error message might indicate a problem with DNS, or it might just show that you’ve made a spelling mistake.

Not Found

In this case either you’ve made a typo, the name doesn’t exist and isn’t meant to, or there really is a problem (top tip: don’t assume that things must start with “www.” or that they will work without it). Sadly, from the error given by your web browser there’s no way you or anyone else can tell. So the first lesson to learn is that error messages from web browsers, or any other application, are by and large useless for troubleshooting DNS. Save yourself and everyone else time and don’t mention them at all when reporting a DNS issue.
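For the curious, here is a rough sketch of why an HTTP error means DNS already did its job. It uses Python's standard urllib exceptions purely as an illustration; the function name dns_verdict is my own invention, and the wording of the verdicts is an assumption rather than anything a browser would print.

```python
import socket
import urllib.error


def dns_verdict(exc):
    """Say what an exception raised while fetching a URL implies about DNS."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        # A status code can only come from a web server we managed to reach,
        # which means the name resolved to some address.
        return "DNS answered: the server replied with HTTP %d" % exc.code
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, socket.gaierror):
        # The lookup itself failed, but that alone does not say why.
        return "name lookup failed: could be DNS, a typo, or client configuration"
    return "not enough information to blame DNS"
```

Note that even the "lookup failed" case stops short of blaming DNS, which is exactly why the error message on its own is no use to your support desk.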

Other things that don’t help

So you’re not going to send in a web page or application error, but you might be tempted to send in the output of ping or traceroute/tracert/tracepath or some other command you’ve heard of. Don’t! They’re not useful – even if they look useful they’re not. Except possibly to show the errors you’ve made.

Consider this seemingly diagnostic command:
> ping nonexistent
Ping request could not find host nonexistent. Please check the name and try again.

You may be thinking that this shows that there’s a problem with DNS. You’re wrong, as you only used a short name and not the fully qualified domain name (FQDN). A DNS name will almost invariably consist of at least two parts separated by a “.” (localhost being the notable exception). If it doesn’t, you’re relying on either:

  • DNS search suffixes
  • NetBIOS resolution
  • A local hosts file
  • Some other form of name resolution
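The "at least two parts separated by a dot" rule is easy to check before you report anything. This is only a sketch in Python; looks_fully_qualified is a name I made up, and it tests the shape of the name, not whether it exists.

```python
def looks_fully_qualified(name):
    """Return True when a name has at least two non-empty dot-separated parts."""
    # A trailing dot ("www.example.org.") is legal, so ignore trailing dots.
    labels = name.rstrip(".").split(".")
    # Every part must be non-empty, which rejects things like "a..b".
    return len(labels) >= 2 and all(labels)
```

With this, looks_fully_qualified("nonexistent") is False, so a failure with that name says nothing about DNS. Note that it also returns False for localhost, the exception mentioned above.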

Using the FQDN isn’t much better, consider:
> ping nonexistant.scranworks.net
Ping request could not find host nonexistant.scranworks.net. Please check the name and try again.

At least with this we can probably rule out some of the issues mentioned above, but it doesn’t tell us which name server is being used, or whether that name server returned a “not found” (NXDomain) response or simply timed out. The same issues apply to traceroutes and almost any other application you might like to name. So please save everyone’s time: if you’re thinking about sending in such things as a means of troubleshooting, just don’t.

Whilst we’re on the subject: if a short name doesn’t work, that isn’t a DNS issue, that’s a client configuration issue. It’s only a DNS issue if the fully qualified domain name (FQDN) isn’t working. So don’t ever report a problem with a short name as a DNS issue and, even better, get out of the habit of using short names. They make you far more vulnerable to attack and make life far easier for bad people to do bad things.

So what should I do?

I’m afraid that if you want to do any sort of troubleshooting at all, you’re going to have to use the command line (or install some sort of pretty app, but I’m not going to talk about those). Don’t worry though, you only need to use one command:

nslookup

It’s not my favourite command for troubleshooting DNS but it has the advantage of being available almost everywhere, and it is the only command you need to tell if the problem is DNS related and to give useful information to your overworked help desk or sysadmin. Please don’t ever send them the output of ping, traceroute or any application error, at least not unless they ask you to.

Now before we go any further let’s have a quick look at nslookup. Go ahead and open a command prompt; if you don’t know how, an internet search should tell you how for your device. On mobile devices troubleshooting DNS is much harder unless you install a suitable application, so I’m just going to ignore mobile devices for now. Right, if you’ve got a command line open just type:
nslookup localhost

You should get output a bit like this:
> nslookup localhost
Server: 213.133.99.99
Address: 213.133.99.99#53

Name: localhost
Address: 127.0.0.1

The first line is obviously the command. The second and third lines tell you the name and IP address of the DNS server you’re querying. In this case the DNS server doesn’t have its own DNS set up properly, which is very bad practice, but is why the IP address is repeated. The last two lines are the answer the DNS server gave. In most cases that will tell both you and everyone else everything they need to know. Now let’s look at the three possible DNS errors you might encounter.
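If it helps to see that layout spelled out, here is a small Python sketch that splits an nslookup transcript into the "which server did I ask" part and the "what did it say" part. It assumes the simple layout shown above (a blank line between the two parts); split_nslookup_output and SAMPLE are names I've made up for the illustration.

```python
SAMPLE = """> nslookup localhost
Server: 213.133.99.99
Address: 213.133.99.99#53

Name: localhost
Address: 127.0.0.1
"""


def split_nslookup_output(text):
    """Split a transcript into (server_lines, answer_lines)."""
    lines = [line.strip() for line in text.strip().splitlines()]
    # Drop the echoed command line, if the transcript includes it.
    if lines and lines[0].startswith(">"):
        lines = lines[1:]
    if "" in lines:
        # The first blank line separates the server details from the answer.
        gap = lines.index("")
        server, answer = lines[:gap], lines[gap + 1:]
    else:
        # No blank line at all: treat everything as server details.
        server, answer = lines, []
    return server, [line for line in answer if line]
```

Calling split_nslookup_output(SAMPLE) gives the two server lines first and the two answer lines second. Real output varies a little between operating systems, so treat this as a reading aid rather than a parser.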

The Errors

Not responding at all

This is a really easy error to spot as (virtually) nothing will be resolving. However it may still not be a DNS problem; it could be a network error. If the DNS server isn’t responding then nslookup will return something like this:
> nslookup localhost
Server: [10.0.0.1]
Address: 10.0.0.1

DNS request timed out.
timeout was 2 seconds.
DNS request timed out.
timeout was 2 seconds.
*** Request to [10.0.0.1] timed-out

This tells us that the request to the DNS server timed out for some unknown reason. That may mean the DNS server isn’t working, or that a network problem is stopping you reaching it. At this point contact your local support, who can fix either DNS or the network. If you’re at home then the problem may be your router, and restarting it might fix things; it’s always worth a try. If you provide that nslookup output to your support people they at least know which DNS server you can’t reach, and will be able to talk you through checking whether it’s a network issue, an issue with the DNS server on your router (if you happen to be configured that way) or an actual issue with the DNS server. If it is the DNS server, you’ve already given them the address of the problem server so they’re good to go.

Not returning a record when it should

This one is also quite easy to identify but trickier to solve. Again we turn to nslookup to see what’s going on:
> nslookup nothere.scramworks.net
Server: UnKnown
Address: 2001:470:1f09:86a::53

*** UnKnown can't find nothere.scramworks.net: Non-existent domain

As ever the first two lines show the DNS server (in this case on an IPv6 address). But now the last line indicates that the domain doesn’t exist. Obviously the very first thing to check here is that you’ve typed the domain name correctly. Assuming that you have, and that you know for absolutely sure that the name really does exist, you need to do a bit more troubleshooting to tell where the problem is.

At this point you may be thinking that the problem is with DNS and so you need to talk to your DNS admin. If you are: stop it. DNS is a distributed system and your local DNS admin only looks after a very small bit of it. If you know that they are responsible for the record you’re trying to look up, perhaps because it belongs to your company, then it may well be their problem to fix. Otherwise it could well be a problem with someone else’s DNS server, at which point your poor overworked sysadmin can do nothing to help you. So how do you tell? Quite easily: all you do is ask another DNS server, or even better several of them. If you know the address of another DNS server you could run nslookup again, telling it to talk to a different server, e.g.:
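The timeout and "Non-existent domain" outputs seen so far can be told apart mechanically. A minimal Python sketch, assuming the Windows wording shown in this guide; the Unix wording ("connection timed out", "NXDOMAIN") is added from memory of the BIND version of nslookup, so treat those strings as an assumption. classify_nslookup is a made-up name.

```python
def classify_nslookup(output):
    """Sort nslookup output into one of three rough buckets."""
    text = output.lower()
    # "DNS request timed out." / "timed-out" on Windows,
    # "connection timed out" on Unix-like systems (assumed wording).
    if "timed out" in text or "timed-out" in text:
        return "no response"
    # "Non-existent domain" on Windows, "NXDOMAIN" on Unix-like systems.
    if "non-existent domain" in text or "nxdomain" in text:
        return "no such name"
    # Anything else still needs a human to read it: it may be a good
    # answer, a wrong answer, or an error this sketch does not know about.
    return "answer or other error"
```

It deliberately never says "DNS is fine", because a returned answer can still be the wrong answer.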
> nslookup
> server 192.168.1.53
Default Server: [192.168.1.53]
Address: 192.168.1.53

> nothere.scramworks.net
Server: [192.168.1.53]
Address: 192.168.1.53
*** [192.168.1.53] can't find nothere.scramworks.net: Non-existent domain

Still not there, so it may be a problem with the domain and not with your local DNS (unless your local DNS is responsible for that domain). Just to be sure you could check with a web-based tool like whatsmydns.net

Widespread DNS problems

In this case it looks like it’s not found anywhere, so it’s a problem for whoever owns the domain concerned. It’s only worth contacting your support people or sysadmin if you know for certain that they are responsible for that domain; otherwise there’s precious little they can do. If on the other hand the address is working on other servers then it could well be a local DNS issue, so contact support.
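Asking several servers by hand gets tedious, so here is a sketch of doing it in one go. It relies on the ordinary "nslookup name server" form of the command and keeps the complete output for each server, which is what your support people will want. The function name ask_servers and the 30 second limit are my own choices, not anything nslookup requires.

```python
import subprocess


def ask_servers(name, servers, run=subprocess.run):
    """Run 'nslookup name server' for each server; return {server: full output}."""
    outputs = {}
    for server in servers:
        try:
            done = run(["nslookup", name, server],
                       capture_output=True, text=True, timeout=30)
            # Keep everything, errors included: the whole output is the evidence.
            outputs[server] = done.stdout + done.stderr
        except subprocess.TimeoutExpired:
            outputs[server] = "(nslookup itself did not finish within 30 seconds)"
    return outputs
```

For example, ask_servers("nothere.scramworks.net", ["192.168.1.53", "10.0.0.1"]) would query the two example servers used in this guide; substitute whichever servers you actually have access to. The run parameter only exists so the function can be tried out without touching the network.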

Returning the wrong data

Pretty much everything that applies to the DNS server returning no data applies to it returning the wrong data. Returning no data is, after all, just a special case of returning the wrong data. Except of course you need to know what the right data is: if you’re landing on the wrong website, that could be a problem with the web server configuration rather than DNS. As with the case of no record being found, check against multiple DNS servers if you can. If you get varying answers then there are several possibilities, but the two most likely ones are:

  • The address was recently changed and not everywhere has caught up yet
  • The address is being load balanced and so has a different address depending on where you ask from

However, yet again, unless your admin is responsible for the domain/address there may not be much they can do, unless the problem is only apparent on your local DNS server. As a rule of thumb your local support or sysadmin can only help if either:

  • They are responsible for the address that is misbehaving
  • The problem with the address is only apparent on the DNS servers they are responsible for

If they’re not responsible for the address and the problem is widespread beyond the servers they manage then there’s unlikely to be anything they can do.
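That rule of thumb is simple enough to write down as a few lines of Python. This is just the two bullet points above restated; the function and parameter names are hypothetical, and deciding whether an answer "looks right" is still up to you.

```python
def local_support_can_help(they_own_the_name, ok_locally, ok_elsewhere):
    """Apply the rule of thumb.

    they_own_the_name: True if your admins are responsible for the name.
    ok_locally:        True if the local DNS server gave the right answer.
    ok_elsewhere:      list of True/False, one per other server you asked.
    """
    if they_own_the_name:
        return True
    # Not their name: they can only help when it is broken on their server
    # and fine on every other server you checked. With no other servers
    # checked there is no evidence either way, so the answer stays False.
    return (not ok_locally) and bool(ok_elsewhere) and all(ok_elsewhere)
```

So a name that fails locally but works on two outside servers gives True, while a name that fails everywhere (and isn't theirs) gives False.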

Things that complicate matters

Sadly things are of course never that simple, so here are a few things that can complicate matters.

CNAMEs

Many names are CNAMEs (aliases): friendly names used to hide a more meaningful but less memorable name. CNAMEs are widely used with cloud services. For instance consider this example:
> nslookup www.shrnk.org
Server: 192.168.1.53
Address: 192.168.1.53
Non-authoritative answer:
Name: www.shrnk.org.cdn.cloudflare.net
Addresses: 2400:cb00:2048:1::681b:9312
2400:cb00:2048:1::681b:9212
104.27.147.18
104.27.146.18
Aliases: www.shrnk.org

If the “Name” (the cloudflare bit) wasn’t responding then an initial glance would suggest that the problem was with the www.shrnk.org address. However the problem could actually be with cloudflare.net, which might well be someone else’s problem to fix. The good news is that nslookup can help determine where the problem is. If you look up the CNAME explicitly it will return only the alias itself and not what the alias points to. That removes the dependency, and the target can then be looked up separately, e.g.:

> nslookup -type=cname www.shrnk.org
Server: 192.168.1.53
Address: 192.168.1.53

Non-authoritative answer:
www.shrnk.org canonical name = www.shrnk.org.cdn.cloudflare.net
> nslookup www.shrnk.org.cdn.cloudflare.net
Server: 192.168.1.53
Address: 192.168.1.53

Non-authoritative answer:
Name: www.shrnk.org.cdn.cloudflare.net
Addresses: 2400:cb00:2048:1::681b:9312
2400:cb00:2048:1::681b:9212
104.27.147.18
104.27.146.18

Looking up the CNAME and its target separately will rapidly reveal where the problem actually is.
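To make the "look up each link separately" idea concrete, here is a toy Python sketch that walks a CNAME chain through a hand-written table of records rather than real DNS. The table copies the example above; follow_cnames and the record layout are made up for illustration.

```python
RECORDS = {
    "www.shrnk.org": ("CNAME", "www.shrnk.org.cdn.cloudflare.net"),
    "www.shrnk.org.cdn.cloudflare.net": ("A", ["104.27.147.18", "104.27.146.18"]),
}


def follow_cnames(name, records, max_hops=8):
    """Return (chain_of_names, final_value); final_value is None if the chain breaks."""
    chain = [name]
    for _ in range(max_hops):
        rtype, value = records.get(chain[-1], (None, None))
        if rtype != "CNAME":
            # Either a real answer or nothing at all. If it is nothing,
            # the last name in the chain is where the problem lives.
            return chain, value
        chain.append(value)
    raise ValueError("CNAME chain too long or looping")
```

If the second entry were missing from the table, the function would return the two-name chain with None, pointing the finger at the cloudflare.net name rather than at www.shrnk.org.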

Split DNS, load balancing, RPZs and cache poisoning

So far we’ve been assuming that DNS looks broadly the same no matter where you are. This however is very far from true. Many networks use what is known as “private address space”; you’re almost certainly using it at home. This address space is reused by many separate networks because it’s not allowed to talk to the rest of the internet directly, and so it’s also not allowed to appear in public DNS. This leads to what is known as “split DNS”, which just means that the DNS on one network isn’t the same as it is on the internet or on a different network. So sometimes a name that exists on your local DNS may not exist for the rest of the internet, or may have a different address there. This is deliberate and often a good thing, but it does mean that using external DNS tools to check things may give a misleading answer.
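If you’re not sure whether an address in an nslookup answer is private, Python's standard ipaddress module can tell you. A minimal sketch; is_private_address is just a convenience name of mine around the library's is_private property.

```python
import ipaddress


def is_private_address(text):
    """Return True if the address is in a range not routed on the public internet."""
    return ipaddress.ip_address(text).is_private
```

The 192.168.1.53 and 10.0.0.1 servers used in the examples here come out as private, while 104.27.147.18 does not. A private address in the answer is a strong hint that an external checking tool won’t see the same thing you do.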

Having established that DNS isn’t the same everywhere the rest of this is really just special cases of the same thing.

  • Load balancing
    To make things faster, some addresses are served by several servers, often in several locations. Load balancing just applies “intelligence” to a DNS request to decide what answer to return based on any number of factors. This means that the address you get returned may vary depending on which DNS server you query, and can also change over time as server and network load shifts.
  • RPZs
    An RPZ (Response Policy Zone) is a mechanism for hiding or redirecting a DNS name (DNS vendors call this a “DNS firewall” and like to charge a lot of money for it). Basically the DNS server lies when you ask it to look up an address. This can be done for all manner of reasons, such as to block viruses, interrupt malware, block adverts or enforce some company policy (and sometimes just because it’s funny). In this instance this will be the problem of your local DNS admin, but they may just tell you that the security team made them do it.
  • Cache poisoning
    This is basically the same thing as an RPZ, except that it was done by the bad guys tricking a DNS server into giving the wrong result. Your DNS admin will definitely want to know about this, but it won’t make them happy.

Not found redirection

As a footnote to the fun and interesting ways DNS can be deliberately made more interesting, a special mention should be given to the redirection of “not found” responses. Some ISPs and DNS service providers think it’s terribly “helpful” to redirect failed DNS requests to a search engine. There is a special place in hell for these people. In this circumstance you as a humble user can do nothing but treat it as if the server is returning the wrong address, because it is, and making everyone’s troubleshooting take longer in the process. But hey, it probably generates ad revenue for someone so that’s all good. Right?
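One way to spot this behaviour is to look up a name that cannot exist and see whether you get an address back anyway. The sketch below uses a random name under the reserved ".invalid" top-level domain, which is set aside so that it never resolves. The function name is made up, and some resolvers answer ".invalid" locally without asking upstream, so treat a False result as "probably not redirected" rather than proof.

```python
import socket
import uuid


def not_found_is_redirected(resolve=socket.gethostbyname):
    """Return True if a name that should not exist resolves anyway."""
    # A random label makes it vanishingly unlikely anyone has cached it.
    probe = "nx-%s.invalid" % uuid.uuid4().hex
    try:
        resolve(probe)
    except socket.gaierror:
        # The lookup failed, which is the honest answer.
        return False
    # We were handed an address for a name that cannot exist.
    return True
```

The resolve parameter is only there so the logic can be exercised without a network; in normal use you would call it with no arguments.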

DNSMasq

DNSMasq is a DNS program that is installed on a lot of ADSL routers and is generally terribly helpful. However on some routers you can’t turn it off and you can’t avoid it, so even if you think you’re querying a different DNS server, nope, it’s still your local DNSMasq server responding. This makes querying different DNS servers pointless, and worse, it’s very difficult to tell that it’s there doing so. Things like this are why restarting your ADSL router can actually be good troubleshooting practice. Larger corporate firewalls can also exhibit the same behaviour.

Firewalls

As previously mentioned, firewalls can hijack your DNS requests and either respond to your query themselves or redirect your request to a DNS server of their choosing. This makes debugging trickier as it’s normally transparent, so you may think you’re querying different DNS servers when you’re not really. Also in many environments, especially large corporate ones, DNS requests to the outside world are blocked except from approved servers. There’s nothing you can do about either of these problems, so just curtail your troubleshooting, send that nslookup off to support and let them worry about it. Maybe go and make a nice cup of tea instead.

Finally

There is a lot more you can do to troubleshoot a DNS issue and identify where exactly the problem lies, but this is about DNS troubleshooting for beginners. Really the only thing you need to remember is:
Always provide the full results of an nslookup
Really that’s it. If you think you have a DNS issue, don’t bother with browser errors, pings, traceroutes or anything else of that nature. Just go straight for nslookup and send that to your local support. It saves everyone time and your problem will be fixed (or you’ll be told they can’t fix it) far faster.
