What exactly happens to data that we send over the network? How does it go from a text file on one system, to bits on a wire, and back to a text file on another system? Let’s find out!


The concept

It is no easy feat getting data from one system to another on the other side of the world. Many things need to operate seamlessly for the data to complete its journey. Fortunately, ever since the internet was first created (different versions of which date back as far as 1969!), its overlords have formalized many standards which we still use today. These standards ensure that all different kinds of systems are able to send and receive data over the internet and decode it into something they can read and use.

The process of how data gets encapsulated and decapsulated with various headers, each adding vital information for transporting the data, can be described by the OSI model. The OSI model is in fact just a model. It doesn’t inherently do anything and it doesn’t run on our computers; it is a reference we use to learn what is happening under the hood and to help troubleshoot issues with networks. You might also hear of the TCP/IP model. It is similar to the OSI model, although it divides the layers less finely. The TCP/IP model is closer to what is actually used under the hood, because it directly describes TCP and IP, the protocols which move our data. The whole TCP/IP vs OSI situation is a bit of a confusing mess, but in practice, as network engineers, we can safely focus on the OSI model for our learning.


The layers

There are 7 layers to the OSI model and each has a different purpose. As network engineers, we will mostly be concerned with the lower 4 layers, as they relate more to what is happening to the data at a network level. The top 3 layers are more in the domain of software engineering and application development. That said, it is important that we know the basic function of the top 3 layers so we have a holistic view of how data moves.

The layers are:

  1. Physical
  2. Data Link
  3. Network
  4. Transport
  5. Session
  6. Presentation
  7. Application

The list above runs from layer 1 at the top down to the application layer at number 7. Conceptually, though, it is better to picture a stack with the physical layer (layer 1) at the bottom and the application layer (layer 7) at the top.

This is because data starts at the application layer and is encapsulated down the stack until it gets to the physical layer, where it is physically transferred from the system to the wire plugged into the network card.


A note on the clip above: the TCP header would contain a source and a destination port. For an HTTPS request like this one, the destination port on the server is usually 443 (plain HTTP usually uses 80), and the source port on the local machine is a temporary (ephemeral) port picked by the operating system.
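
To make the port fields a little more concrete, here is a minimal Python sketch that packs and unpacks only the first four bytes of a TCP header, which hold the source and destination ports. The source port 51514 is an arbitrary stand-in for whatever the operating system would pick, and the helper names are made up for this illustration.

```python
import struct

# The first 4 bytes of a TCP header: a 16-bit source port followed by a
# 16-bit destination port, both in network byte order (big-endian, "!").
PORT_FIELDS = struct.Struct("!HH")

def pack_ports(src_port: int, dst_port: int) -> bytes:
    """Return the 4 bytes that would start a TCP header."""
    return PORT_FIELDS.pack(src_port, dst_port)

def unpack_ports(header: bytes) -> tuple[int, int]:
    """Read (source port, destination port) back out of a TCP header."""
    return PORT_FIELDS.unpack(header[:4])

# 51514 is an assumed example source port; 443 is the usual HTTPS port.
raw = pack_ports(51514, 443)
print(raw.hex())
print(unpack_ports(raw))
```

A real TCP header carries many more fields after these two (sequence numbers, flags, checksum and so on), which are left out here.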


Encapsulation

Encapsulation is the act of adding new headers (and, in one case, a trailer) to the original data, preparing it to ultimately go onto the wire and across the network. Let us now have a look at the function of each layer and how the original data changes at each stage.

During the encapsulation and decapsulation process, we have different names for what exists at each stage. At the top end of town, the application, presentation and session layers just hand down data; it is not known as anything else, it is just data. Once the transport layer takes control, it breaks the data into what are called segments. Next, when the network layer headers are added, the segments become packets. The packets are handed down to receive the header and trailer from the data link layer and become frames. Frames are then what is put onto the physical wire and transmitted as 1s and 0s.
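
The naming can be sketched with a toy example in Python. The bracketed tags such as [TCP] and [ETH] are placeholders standing in for real binary headers, and the function names are invented for this illustration; nothing here is how an actual network stack is written.

```python
def encapsulate(data: bytes) -> bytes:
    """Walk the data down the stack, wrapping it at each layer."""
    segment = b"[TCP]" + data             # transport layer: data -> segment
    packet = b"[IP]" + segment            # network layer: segment -> packet
    frame = b"[ETH]" + packet + b"[FCS]"  # data link layer: header and trailer -> frame
    return frame

def decapsulate(frame: bytes) -> bytes:
    """Walk a received frame back up the stack, unwrapping at each layer."""
    packet = frame.removeprefix(b"[ETH]").removesuffix(b"[FCS]")
    segment = packet.removeprefix(b"[IP]")
    return segment.removeprefix(b"[TCP]")

frame = encapsulate(b"hello")
print(frame)
print(decapsulate(frame))
```

The receiving system removes the wrappers in the opposite order to the one in which the sender added them, which is why the stack picture is useful.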


What does each layer do

Application layer

The application layer is of little concern to a network engineer. Nonetheless, it is still important to the whole picture of data getting transferred between networks. This is where the applications that users interface with interact with the protocols that give them access to the network.

Presentation layer

The presentation layer is where data gets encrypted, encoded, formatted or optimised in such a way that the receiving application on the other end of the data transfer can interpret it and make use of it. The data may go through any, or several, of these steps in the presentation layer. This process also allows the data to be transferred between different architectures and operating systems and still be usable.
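
As a rough illustration of two of those steps, the Python sketch below encodes text as UTF-8 and then compresses it, with the receiving side reversing both steps. The choice of UTF-8 and zlib is an assumption made for the example, the function names are hypothetical, and encryption is left out to keep it short.

```python
import zlib

def present(text: str) -> bytes:
    """Sending side: encode the text, then compress the resulting bytes."""
    encoded = text.encode("utf-8")   # encoding: characters -> bytes
    return zlib.compress(encoded)    # optimisation: fewer bytes to send

def interpret(payload: bytes) -> str:
    """Receiving side: undo the steps in reverse order."""
    return zlib.decompress(payload).decode("utf-8")

message = "héllo from another operating system"
payload = present(message)
print(len(message.encode("utf-8")), len(payload))
print(interpret(payload))
```

Because both sides agree on the same encoding and compression, the text comes out identical regardless of the architecture or operating system at either end.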

Session layer

This is where applications will establish sessions with each other so they have a direct communication channel. The session will ensure that the applications can coordinate their communication so data transfer is reliable and able to recover from unexpected interruptions.

A note on the upper layers

In practice, the top layers, and to an extent even the transport layer, are most often intertwined and essentially operate as one big layer performing all of their functions. This is modern computing working more efficiently. Conceptually, we can think of each of these functions as belonging to its respective layer to aid in understanding the life of a packet, but practically it just doesn’t work that way.

Transport layer

This is where things start to get interesting for a network engineer. This is where the transport protocol for the data is chosen, most commonly TCP or UDP. Depending on the type of application, the connection needs to be either more reliable or more concerned with speed: TCP is more reliable, while UDP has less overhead and is generally faster. I will not go into the detail of TCP and UDP here, but that is the short story of it. This is also where the TCP or UDP port numbers are added to the data, and where the chunk of data the application wants to send is broken down into what we call segments. An application may want to send 10 megabytes of data across the network, but the most a standard Ethernet frame can carry is 1500 bytes (disregarding jumbo frames for now), so the data must be broken into segments.
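
The arithmetic behind that can be sketched in Python. The sketch assumes 20-byte IP and TCP headers with no options, which leaves 1460 bytes of application data per segment, and it takes "10 megabytes" to mean 10 × 1024 × 1024 bytes. The constant and function names are made up for the example.

```python
MTU = 1500         # bytes a standard Ethernet frame can carry
IP_HEADER = 20     # assumed: IPv4 header without options
TCP_HEADER = 20    # assumed: TCP header without options
MSS = MTU - IP_HEADER - TCP_HEADER  # data bytes left per segment (1460)

def segment(data: bytes, mss: int = MSS) -> list[bytes]:
    """Split application data into chunks of at most `mss` bytes."""
    return [data[i:i + mss] for i in range(0, len(data), mss)]

payload = bytes(10 * 1024 * 1024)  # 10 MB of zero bytes as stand-in data
segments = segment(payload)
print(len(segments))       # 7183 segments
print(len(segments[-1]))   # the last one holds the remaining 40 bytes
```

In a real stack the segment size is negotiated and can vary, so treat the numbers as an order-of-magnitude guide.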

Network layer

The network layer is where IP addresses are added to the segments, turning them into packets, and it is the core of what makes transport between two different networks possible. Routers, guided by routing protocols, use the network layer information to route the packets to where they need to go. If, for example, we connected a laptop directly to a PC with an Ethernet cable, no routing would be needed, because the devices are on the same network. That example is relatively impractical, but the same applies to multiple devices connected via switches.
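
The "same network or not" decision can be sketched with Python's standard ipaddress module. The addresses, the /24 prefix and the next_hop function are all assumptions chosen for the example.

```python
import ipaddress

def next_hop(src_interface: str, dst: str, gateway: str) -> str:
    """Return the IP address the sender should deliver to next.

    src_interface is the sender's address with its prefix length,
    for example "192.168.1.10/24".
    """
    local_network = ipaddress.ip_interface(src_interface).network
    if ipaddress.ip_address(dst) in local_network:
        return dst       # same network: deliver directly, no routing needed
    return gateway       # different network: hand the packet to the router

print(next_hop("192.168.1.10/24", "192.168.1.20", "192.168.1.1"))
print(next_hop("192.168.1.10/24", "8.8.8.8", "192.168.1.1"))
```

The first call stays on the local network, while the second is sent to the default gateway to be routed onward.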

Data link layer

Layer 2 is what allows data to be transferred within the same network. This is where MAC addresses are added to the packet in a header, and the CRC/FCS trailer is added to the other end, turning the packet into a frame. Devices use the MAC addresses to know which device on the local network to send the frame to next. If the frame is destined for the internet, it will go via the default gateway for that local network (i.e. the router). Once the frame hits the router, the MAC addresses from the local network are stripped away and a new frame is built: the source MAC becomes the router’s, and the destination MAC is that of the next hop, which is determined by the network layer. This repeats at every hop until the packet reaches the desired network.
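
Here is a simplified Python sketch of that framing and of the per-hop rewrite. It assumes an Ethernet II style layout (destination MAC, source MAC, a 2-byte type field, payload, 4-byte FCS), uses zlib.crc32 to stand in for the FCS calculation, and ignores details such as the preamble and minimum frame padding. The MAC addresses and function names are invented for the example.

```python
import struct
import zlib

def mac_to_bytes(mac: str) -> bytes:
    """Turn "aa:bb:cc:dd:ee:ff" into its 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))

def build_frame(dst_mac: str, src_mac: str, packet: bytes,
                ethertype: int = 0x0800) -> bytes:
    """Add the layer 2 header in front of the packet and the FCS trailer after it."""
    header = mac_to_bytes(dst_mac) + mac_to_bytes(src_mac) + struct.pack("!H", ethertype)
    body = header + packet
    fcs = struct.pack("<I", zlib.crc32(body))  # checksum over header + packet
    return body + fcs

def fcs_ok(frame: bytes) -> bool:
    """Recompute the checksum and compare it with the trailer."""
    body, fcs = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body)) == fcs

def rewrite_for_next_hop(frame: bytes, router_mac: str, next_hop_mac: str) -> bytes:
    """What a router does at layer 2: keep the packet, build a fresh frame."""
    packet = frame[14:-4]  # strip the 14-byte header and the 4-byte trailer
    return build_frame(next_hop_mac, router_mac, packet)

frame = build_frame("aa:bb:cc:dd:ee:ff", "11:22:33:44:55:66", b"packet")
forwarded = rewrite_for_next_hop(frame, "aa:bb:cc:dd:ee:ff", "de:ad:be:ef:00:01")
print(fcs_ok(frame), fcs_ok(forwarded))
```

Note that the packet inside is carried over untouched by this sketch; only the layer 2 wrapper changes from hop to hop.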

Physical layer

Electricity and light. This is where our data is physically put onto a cable or into a radio wave and sent across the wire or through the air to the next device. This is the layer where we talk about Ethernet cables, Wi-Fi signals and fibre optic cables.
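
As a very loose illustration, the Python sketch below turns the bytes of a frame into a string of 1s and 0s and back again. Real hardware goes further than this (bit ordering, line coding, signalling), so this only shows the idea that a frame is ultimately a sequence of bits. The function names are made up for the example.

```python
def to_bits(frame: bytes) -> str:
    """Write each byte as 8 binary digits, most significant bit first."""
    return "".join(f"{byte:08b}" for byte in frame)

def from_bits(bits: str) -> bytes:
    """Group the digits back into bytes, 8 at a time."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

bits = to_bits(b"Hi")
print(bits)             # 0100100001101001
print(from_bits(bits))  # b'Hi'
```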


In conclusion

So there is a basic summary of the OSI model and how it works. I have skipped many things that are involved at each layer; these are mostly more advanced topics that we don’t need to know just to get a conceptual understanding of the OSI model and how data is moved across networks.