Introduction to Computer Networking (Part 1)
The computer networking stack is commonly described as 7 layers, known as the OSI (Open Systems Interconnection) model. Each article in this series covers one of these layers. We will start from the “highest” layer, which is known as the Application Layer.
Application Layer
Computers communicate with each other at the application layer. How the application layer behaves is generally defined by the developer building the application. There are usually two designs to choose from: a client-server model and a P2P (peer-to-peer) model. In the client-server model, there is a host (the server) that is always online and accepts connections from clients. In the P2P model, instead of relying on an always-online host, each computer serves as both a client and a server.
Irrespective of the model in which computers communicate, there is always a process on each computer running to send and receive the messages over the network. Processes send and receive these messages through sockets.
A socket is the interface through which a process sends and receives messages. An operating system provides a network socket interface that lets processes write messages to and read messages from the network without having to worry about the underlying implementation of how the message is sent. In Linux, a socket is exposed as a file descriptor. This allows developers to call syscalls such as `read()`, `write()`, and `close()` on it just as they would on a normal file descriptor (the socket itself is created with `socket()` rather than `open()`).
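To make the file-descriptor point concrete, here is a minimal sketch in Python, assuming a Unix-like system. It uses `socket.socketpair()` (a pair of already-connected sockets, chosen here only so that no real network is needed) and then reads and writes through the generic `os.read()` and `os.write()` calls.

```python
import os
import socket

# socketpair() returns two already-connected sockets, which lets us
# demonstrate the interface without touching a real network.
left, right = socket.socketpair()

# Each socket is backed by an ordinary file descriptor (a small integer).
left_fd = left.fileno()
right_fd = right.fileno()

# The generic file-descriptor calls work on the socket's descriptor.
os.write(left_fd, b"hello over a socket")
received = os.read(right_fd, 1024)
print(received.decode())

left.close()
right.close()
```

In everyday code you would normally call the socket's own `send()` and `recv()` methods; the raw descriptor calls are used here only to show that the same interface applies.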
After specifying a message and the protocol (we will talk about this later) to send it with, an application developer can simply write the message to the socket and it will be sent. But where will it be sent? The process that writes the message also needs to specify which computer the message should go to, along with the process on that computer that should receive it.
On a computer network, computers are identified by IP addresses. An IPv4 address takes the form of four numbers between 0 and 255 separated by three periods, for example 10.100.3.255. To specify the process on the computer, the sender also specifies a port. A port is a number that ranges from 0 to 65535. The process on the receiving end of the message will be “listening” for input on one of these ports. Port numbers 0 to 1023 are generally reserved for well-known services (for example, port 80 for HTTP).
An IP address and port work well for the network to identify the referenced computer and process. However, for humans, these numbers are difficult to remember. Imagine having to remember and type in such a number every time you wanted to go to google.com or facebook.com. For a more human-readable version of these IP addresses, hosts can also be identified by a hostname (like google.com or facebook.com). The DNS system takes the human-readable hostname and translates it into an IP address, which is what the underlying network technology works with. We will talk about the DNS system later in this post.
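As an illustration of addressing a process by IP address and port, the sketch below runs a tiny echo server and a client on the same machine over the loopback address. The helper name `serve_once` and the message are made up for this example, and port `0` simply asks the operating system to pick any free port.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one connection and echo back whatever arrives."""
    conn, _addr = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


# Bind to the loopback address; port 0 asks the OS for any free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
server_ip, server_port = listener.getsockname()

worker = threading.Thread(target=serve_once, args=(listener,))
worker.start()

# The client names the destination as an (IP address, port) pair.
with socket.create_connection((server_ip, server_port), timeout=5) as client:
    client.sendall(b"ping")
    reply = client.recv(1024)

worker.join()
listener.close()
print(f"sent to {server_ip}:{server_port}, got back {reply!r}")
```

The server process is the one “listening” on the port; the client only needs to know the pair `(server_ip, server_port)` to reach it.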
Once the message and destination are specified, we also need to specify the protocol that the message will travel over. Different protocols have different use cases, which is why it is important to understand them when deciding how to send a message through the network. Some important considerations that an application developer needs to make when deciding which protocol to send messages with are:
- Does my application need to guarantee delivery from one computer to another? Because computer networks are unreliable (messages can be dropped, messages can be corrupted, etc…), simply sending a message to a receiver does not guarantee that the receiver will receive it. We will talk about how reliable data transfer protocols are built on top of unreliable computer networks in the following article.
- How fast does this data need to reach the receiver? There is generally a tradeoff between speed of data transfer and reliability. Not all applications require reliable data transfer, and those that don’t can opt for faster data transfer by using a different transport protocol.
- How secure does my data transfer need to be? Sometimes it is important for the data that is being sent across the network to be encrypted so that bad actors can’t see it.
Two of the most popular protocols on the internet are TCP and UDP. The key distinction between these protocols is that TCP sacrifices speed for reliable delivery of messages while UDP does not guarantee that the receiver will get the message. These protocols are part of the Transport Layer (the layer right below the application layer) and we will talk about them in the following article.
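In the socket API, the choice between TCP and UDP shows up as the socket type. The following sketch, which assumes only the loopback interface is available, creates one socket of each kind and sends a single UDP datagram. Note that there is no connection setup and nothing in UDP itself would report a lost datagram; on loopback it arrives in practice, which is why the example can read it back.

```python
import socket

# SOCK_STREAM selects TCP; SOCK_DGRAM selects UDP.
tcp_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
udp_receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# The receiver listens on loopback; port 0 lets the OS choose a free port.
udp_receiver.bind(("127.0.0.1", 0))
udp_receiver.settimeout(5)
destination = udp_receiver.getsockname()

# UDP: no handshake, each sendto() is an independent datagram.
udp_sender.sendto(b"datagram", destination)
payload, sender_addr = udp_receiver.recvfrom(1024)
print(f"received {payload!r} from {sender_addr}")

# A TCP socket would instead need connect() before any data is sent.
for sock in (tcp_sock, udp_receiver, udp_sender):
    sock.close()
```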
HTTP
For messages to be sent throughout the web and understood easily by many computers, it is helpful to agree on a common message format. Similar to how humans use language to communicate, computers have message formats. HTTP (Hypertext Transfer Protocol) is one of the most common message formats on the Internet. HTTP is an application-level protocol and traditionally operates on top of TCP (HTTP/1.1 and HTTP/2 run over TCP, while HTTP/3 runs over QUIC, which is built on UDP). Application developers generally write an HTTP message to a socket, and the transport protocol carries it to a receiving process. The receiving process unpacks the HTTP message and then sends a response to the client with the information requested.
The HTTP request message format generally looks as follows:
In this message, GET represents the HTTP verb. Each request has an HTTP verb associated with it. These verbs specify what type of HTTP message is being sent. Generally, HTTP GET requests ask for a resource, while HTTP POST requests send a resource from client to server. Some other HTTP verbs include PUT, DELETE, and HEAD. After the HTTP verb, still on line 1, is the path of the URL (or path of the resource) that the request is targeting. After that is the HTTP protocol version. The latest version as of the time of this article is HTTP/3! The next four lines are HTTP headers. Headers are sent as key-value pairs. In our case, for example, Host is a key and www.test.com is the value. There is a standard set of header keys that clients and servers can include to describe common functions. Accept-Language is one of these; it tells the server which languages the client prefers for the response. Another is Content-Type, which specifies the format of the content sent in the body. The body is the final part of the message and sits below the headers.
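As a rough sketch of the layout described above, the snippet below assembles an HTTP/1.1-style request as text. The function name `build_request`, the path `/index.html`, and the specific header values are assumptions made for illustration; only the `Host: www.test.com` pair comes from the discussion above.

```python
def build_request(method, path, host, headers=None, body=""):
    """Assemble a text HTTP request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for key, value in (headers or {}).items():
        lines.append(f"{key}: {value}")
    # Lines end with CRLF; an empty line separates the headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


request = build_request(
    "GET",
    "/index.html",          # hypothetical resource path
    "www.test.com",
    {"Accept-Language": "en", "Connection": "close"},
)
print(request)

# Splitting the first line recovers the verb, the path, and the version.
verb, target, version = request.split("\r\n")[0].split(" ")
```

A GET request like this one has an empty body; a POST would place its content after the blank line and describe it with a Content-Type header.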
Once the receiver receives the message, it sends a response to the sender. The HTTP response message is formatted as follows:
Most of this should look familiar from the HTTP request message format. However, there is the “200 OK”, which is slightly different. This part of the message is the HTTP response code. Some common response codes are:
- 200 OK: the request succeeded
- 400 Bad Request: the request was malformed or contained invalid information
- 401 Unauthorized: the request lacked valid authentication credentials
- 404 Not Found: the server could not find the requested resource
- 500 Internal Server Error: there was an error with the server
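To show how a client might pick these pieces apart, here is a small sketch that parses a raw response string. The `parse_response` helper and the sample response are invented for this example; the reason phrases come from Python's standard `http.HTTPStatus` enum.

```python
from http import HTTPStatus


def parse_response(raw: str):
    """Split a text HTTP response into version, code, reason, headers, body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        key, _, value = line.partition(":")
        headers[key.strip()] = value.strip()
    return version, int(code), reason, headers, body


# A made-up response used only to exercise the parser.
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 12\r\n"
    "\r\n"
    "<p>hello</p>"
)
version, code, reason, headers, body = parse_response(raw)
print(version, code, reason, headers, body)

# The standard library knows the reason phrase for each common code.
for status in (200, 400, 401, 404, 500):
    print(status, HTTPStatus(status).phrase)
```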
Now we have an understanding of how processes communicate with each other through a commonly used message format called HTTP. Let’s try to understand what actually happens when you type www.google.com into a browser’s URL bar. But first, we need to understand DNS (Domain Name System).
DNS
DNS is the way that the Internet translates human-readable hostnames like google.com to IP addresses that the underlying network technology understands. The Domain Name System is an application-level protocol that translates hostnames to IP addresses, which are then passed to the Transport Layer. Specifically, DNS is a distributed database implemented across many DNS servers, and DNS servers generally listen on port 53.
When you type google.com into your browser’s URL bar, the browser first sends a DNS request to get the IP address of the google.com hostname. The request is typically transported using UDP which, as we discussed, is a Transport Layer protocol. The DNS server then responds to the client with the set of IP addresses belonging to that hostname. The client will generally pick the first one and use it when constructing the request to the Transport Layer.
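For a sense of what such a request looks like on the wire, the sketch below builds the bytes of a simple DNS query for an A record, following the standard DNS message layout (a 12-byte header followed by one question). The function name `build_dns_query` and the query ID are arbitrary choices for this example, and the packet is only constructed here, not sent.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query packet asking for the A record of hostname."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question,
    # and zero answer / authority / additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)

    # The name is encoded as length-prefixed labels ending in a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"

    # QTYPE 1 = A record, QCLASS 1 = Internet.
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


query = build_dns_query("google.com")
print(len(query), query.hex())

# Sending it would be a single UDP datagram to a DNS server on port 53,
# e.g. sock.sendto(query, (dns_server_ip, 53)) with a SOCK_DGRAM socket.
```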
The underlying architecture of DNS is much more complicated than it may initially seem. To support the entire Internet, it is not practical to have one DNS server serving all the requests to translate hostnames to IP addresses. First off, if that server were to crash, DNS for the entire Internet would go down. Second, it would be really slow if many clients were attempting to do this translation at the same time. The designers of the DNS system chose a distributed approach, which is more fault tolerant and scales better for large numbers of requests. Additionally, no single DNS server holds all of the hostname-to-IP-address mappings. This information is also distributed across DNS servers.
There are four classifications of DNS servers: local DNS servers, root DNS servers, top-level domain servers, and authoritative DNS servers. When a client makes a DNS request, the request first hits a local DNS server. The local DNS server then makes iterative queries to other DNS servers to get the hostname-to-IP-address mapping. The first of these DNS servers it queries is a root DNS server. The root DNS servers are managed by a group of organizations. The root DNS server responds with the IP addresses of the top-level domain servers, which it picks based on the suffix of the hostname. For example, hostnames ending in .com all go to the same set of top-level domain servers. The top-level domain server in turn points the local DNS server to an authoritative DNS server, which is generally run by or on behalf of the owner of the hostname. It is possible that the top-level domain server doesn’t know the authoritative server, in which case it refers the query to an intermediate DNS server, which then works to find the authoritative server for this hostname. When receiving a DNS query response, servers generally cache the information so that subsequent lookups are much faster.
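The lookup chain above can be sketched as a toy simulation. Everything in it is hypothetical: the dictionaries stand in for root, top-level domain, and authoritative servers, the server names are invented, and the address comes from a range reserved for documentation. It only illustrates the order of the queries and the effect of caching at the local DNS server.

```python
# Hypothetical stand-ins for the three tiers of DNS servers.
ROOT = {"com": "tld-com"}                                   # suffix -> TLD server
TLD = {"tld-com": {"example.com": "auth-example"}}          # domain -> authoritative
AUTHORITATIVE = {"auth-example": {"www.example.com": "192.0.2.10"}}


class LocalDNSServer:
    """Resolves hostnames iteratively and caches the answers."""

    def __init__(self):
        self.cache = {}
        self.queries_sent = 0

    def resolve(self, hostname: str) -> str:
        if hostname in self.cache:
            return self.cache[hostname]      # answered locally, no queries

        labels = hostname.split(".")
        suffix = labels[-1]                  # e.g. "com"
        domain = ".".join(labels[-2:])       # e.g. "example.com"

        self.queries_sent += 1               # 1. ask a root server
        tld_server = ROOT[suffix]
        self.queries_sent += 1               # 2. ask the TLD server
        auth_server = TLD[tld_server][domain]
        self.queries_sent += 1               # 3. ask the authoritative server
        address = AUTHORITATIVE[auth_server][hostname]

        self.cache[hostname] = address
        return address


local = LocalDNSServer()
first = local.resolve("www.example.com")
queries_after_first = local.queries_sent
second = local.resolve("www.example.com")    # served from the cache
print(first, second, queries_after_first, local.queries_sent)
```

A real resolver also honors the TTL of each cached answer and handles failures and referrals to intermediate servers, which this sketch leaves out.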
DNS servers also communicate in a common “language.” They communicate using something called DNS resource records. A DNS resource record consists of a name, value, type, and TTL. The TTL (time to live) determines how long a resource record should be cached. The meaning of the name and value of the resource record depends on its type. There are four types of resource records:
Sometimes a hostname can have an alias, which is why CNAME is a useful record type. The two types of resource records that we will focus on are A records and NS records. A records map a hostname to an IP address and are served by authoritative DNS servers, while NS records point a resolver from a non-authoritative DNS server toward the authoritative one. An A record is also sent along with an NS record so that the querying server knows the IP address of the DNS server it needs to query next.
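One way to picture resource records is as small tuples of (name, value, type, TTL). The sketch below models the A, NS, and CNAME types mentioned here with a Python dataclass; the hostnames, addresses, TTL values, and helper names (`lookup`, `resolve_address`, `is_fresh`) are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRecord:
    name: str
    value: str
    rtype: str   # the record type: "A", "NS", or "CNAME" in this sketch
    ttl: int     # how long the record may be cached, in seconds


# Hypothetical records for an example domain.
RECORDS = [
    ResourceRecord("example.com", "ns1.example.com", "NS", 86400),
    ResourceRecord("ns1.example.com", "192.0.2.53", "A", 86400),
    ResourceRecord("www.example.com", "web.example.com", "CNAME", 300),
    ResourceRecord("web.example.com", "192.0.2.10", "A", 300),
]


def lookup(name: str, rtype: str):
    """Return all records matching a name and type."""
    return [r for r in RECORDS if r.name == name and r.rtype == rtype]


def resolve_address(name: str, max_aliases: int = 5):
    """Follow CNAME aliases until an A record is found (or give up)."""
    for _ in range(max_aliases):
        a_records = lookup(name, "A")
        if a_records:
            return a_records[0].value
        aliases = lookup(name, "CNAME")
        if not aliases:
            return None
        name = aliases[0].value
    return None


def is_fresh(record: ResourceRecord, cached_at: float, now: float) -> bool:
    """A cached record is usable only while its TTL has not elapsed."""
    return now - cached_at < record.ttl


# The NS record names the next server; its A record supplies the address.
name_server = lookup("example.com", "NS")[0].value
print(name_server, resolve_address(name_server))
print(resolve_address("www.example.com"))
```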
Conclusion
This ends our discussion of the application layer of computer networking. For an application developer, it is important to consider the protocol, the message, the message format, and the destination of the messages they want to send. If the destination of the message is a hostname rather than an IP address, then a DNS query needs to happen to translate the hostname to an IP address before the message is passed into the socket. The next article in this series will discuss what happens after the message is written to the network socket and how servers can read from network sockets to receive messages. This is the job of the Transport Layer of the OSI model.