Visualization of DNS Query and Response
Recently I have become very interested in exploratory data analysis and network analysis, so I created a visualization of DNS queries and responses that combines the two. As you may know, we use a domain name to look up an IP address by querying a DNS server. Traditionally, domain names and IP addresses were configured symmetrically, one name for one address. However, recent web services do not use such symmetrical settings because of load balancing, virtual private hosting, and so on. As a result, it is difficult for users to understand how the domain name system is actually used.
So I am trying to visualize DNS queries and responses to understand the DNS architecture behind these services. I used 24 hours of traffic data from my home network. It contains 75,098 records, and I used my own traffic analysis tool and Graphviz for the visualization.
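To give an idea of how such records can be handed to Graphviz, here is a minimal sketch and not my actual tool. It assumes each record is a hypothetical `(kind, name, value)` tuple, where `kind` is `"CNAME"` or `"A"`. The node colors follow the legend used in the figures of this post. The solid edge style for address links and the use of a directed graph are my assumptions. The sample names and addresses are reserved documentation values, not real traffic.

```python
# Hypothetical sample records: (kind, queried name, answer).
SAMPLE_RECORDS = [
    ("CNAME", "www.example.com", "cdn.example.net"),
    ("A", "cdn.example.net", "192.0.2.10"),
]


def to_dot(records):
    """Build Graphviz DOT text from (kind, name, value) tuples."""
    # Names that appear as the answer of a CNAME query.
    cname_targets = {value for kind, _, value in records if kind == "CNAME"}

    colors = {}
    for kind, name, value in records:
        # A queried name is red if it is itself a CNAME answer, else yellow.
        colors.setdefault(name, "red" if name in cname_targets else "yellow")
        # The answer is red for a CNAME and green for an IP address.
        colors[value] = "red" if kind == "CNAME" else "green"

    lines = ["digraph dns {", "  node [style=filled];"]
    for node, color in sorted(colors.items()):
        lines.append(f'  "{node}" [fillcolor={color}];')
    for kind, name, value in records:
        # Dotted edge for a CNAME link; solid for an address (assumption).
        style = "dotted" if kind == "CNAME" else "solid"
        lines.append(f'  "{name}" -> "{value}" [style={style}];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(SAMPLE_RECORDS))
```

If the output is saved as `dns.dot`, it can be rendered with `dot -Tpng dns.dot -o dns.png`.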
First, an easy case. A yellow node is a domain name that was queried and answered with a CNAME (e.g. a browser querying the DNS server for that name), a red node linked to a yellow node by a dotted line is the CNAME returned for that query, and a green node is an actual IP address. The image shows:
- Some program queries IP address of
- DNS server tells
- The program queries IP address of
- DNS server tells IP address of
- The program connects to
Please note that this is just a rough explanation :-)
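The steps above can be sketched in a few lines of code. This is only an illustration of following a CNAME to its IP addresses, not how a real resolver works internally. The `cnames` and `addresses` tables, the `resolve` function, and the sample names are all hypothetical.

```python
# Hypothetical lookup tables built from captured responses.
CNAMES = {"www.example.com": "cdn.example.net"}
ADDRESSES = {"cdn.example.net": ["192.0.2.10"]}


def resolve(name, cnames, addresses, max_hops=8):
    """Follow CNAME links from `name` and return (chain, ip_list)."""
    chain = [name]
    # max_hops guards against CNAME loops in broken data.
    while name in cnames and len(chain) <= max_hops:
        name = cnames[name]
        chain.append(name)
    # The last name in the chain is the one that owns the addresses.
    return chain, addresses.get(name, [])


if __name__ == "__main__":
    chain, ips = resolve("www.example.com", CNAMES, ADDRESSES)
    print(" -> ".join(chain), ips)
```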
The next one shows a query for one domain name that returned multiple IP addresses. It is not clear from the image whether these IP addresses were returned all at once or one by one, but this is a well-known load-sharing technique.
There is also the opposite case. The image shows a single IP address assigned to multiple domain names. I guess these are several separate services that do not have high load, so they are consolidated on one server. This is also a well-known service deployment technique.
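Both cases (one name with many addresses, and one address with many names) can be found without drawing anything. The following is a rough sketch that assumes the address answers are available as hypothetical `(name, ip)` pairs. The function names and the sample data are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (name, ip) pairs taken from address answers.
A_RECORDS = [
    ("lb.example.com", "192.0.2.1"),
    ("lb.example.com", "192.0.2.2"),
    ("blog.example.org", "198.51.100.7"),
    ("shop.example.org", "198.51.100.7"),
]


def group_pairs(pairs):
    """Collect the set of values seen for each key."""
    grouped = defaultdict(set)
    for key, value in pairs:
        grouped[key].add(value)
    return grouped


def one_to_many(a_records, minimum=2):
    """Return (names with several IPs, IPs shared by several names)."""
    ips_by_name = group_pairs(a_records)
    names_by_ip = group_pairs((ip, name) for name, ip in a_records)
    multi_ip_names = {
        name: sorted(ips)
        for name, ips in ips_by_name.items()
        if len(ips) >= minimum
    }
    shared_ips = {
        ip: sorted(names)
        for ip, names in names_by_ip.items()
        if len(names) >= minimum
    }
    return multi_ip_names, shared_ips


if __name__ == "__main__":
    multi, shared = one_to_many(A_RECORDS)
    print(multi)
    print(shared)
```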
OK, now let's look at a tougher case. When I (we?) use Dropbox, the domain name clientXX.dropbox.com is queried first. These names are aggregated to client.dropbox.com, which is forwarded to client.v.dropbox.com. Eventually client.v.dropbox.com returns a lot of IP addresses. I imagine this architecture is meant to allow for load balancing in the future, but I'm not sure.
The next one is more aggressive. The image shows domain names and IP addresses related to cloudfront.net. As you may know, CloudFront is the Content Delivery Network service provided by AWS. Several sites (evernote.com) are entry points, but there are a lot of IP addresses and complex links between the domain names and the IP addresses. This suggests that the internal system is shared across sites, not separated per tenant at the infrastructure level.
The next image shows domain names related to Google. I cannot describe all of Google's sophisticated system from this image alone, but we can find several Google services (such as YouTube, Google Analytics, and Google Maps) on the same servers. I imagine that many Google services use common middleware/infrastructure, or that these IP addresses are just entry points.
The last one is related to Akamai, one of the biggest Content Delivery Network companies. Their internal system looks more complex than those of other giants such as Google and Amazon. A yellow node is a domain name that was queried first, and the image shows that they have a lot of customers.
Additionally, there may be an “Apple Island” in the Akamai network. It seems independent from the rest of the Akamai network that is used by other customers. The nodes for service1.ess.apple.com.akadns.net. are at the center of the image (although other nodes hide them), and they are linked to a lot of IP addresses. For example, I observed on my home network that us-courier.push-apple.com.akadns.net. returned 440 IP addresses.
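I found the "island" only by looking at the image. One possible way to check such an impression is to compute the connected components of the name/address graph: a group of nodes that shares no link with the rest forms its own component. The sketch below assumes hypothetical `(node, node)` edges and uses a plain depth-first search. It is an illustration with made-up sample data, not the analysis behind the figure.

```python
from collections import defaultdict

# Hypothetical edges between names and addresses.
EDGES = [
    ("a.example.com", "192.0.2.1"),
    ("b.example.com", "192.0.2.1"),
    ("push.example.net", "198.51.100.1"),
]


def islands(edges):
    """Return connected components (sets of nodes), largest first."""
    neighbors = defaultdict(set)
    for left, right in edges:
        # Treat every link as undirected for the purpose of grouping.
        neighbors[left].add(right)
        neighbors[right].add(left)

    seen = set()
    components = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first search from an unvisited node.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return sorted(components, key=len, reverse=True)


if __name__ == "__main__":
    for component in islands(EDGES):
        print(len(component), sorted(component))
```

A component that contains only one customer's names and their addresses would support the "island" guess. One that also contains other customers' names would contradict it.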
You can find the full-size image here. However, please note that the image file is a 37.5 MB PNG. If your PC does not have enough resources, opening it may crash your viewer.