Understanding Network Protocols

What is a Protocol?

  • A protocol is a set of rules that dictate how computer systems or programs should behave.
  • The term "protocol" is borrowed from the non-technical world, such as sociology and politics.
  • In sociology and politics, "protocol" can mean:
    • A formal agreement between nation states.
    • Etiquette, which is the set of conventional rules of personal behavior in polite society.
  • In both senses, the core idea is the same one used in computing: an agreed set of rules that every party follows.

HTTP - Hypertext Transfer Protocol

  • HTTP is a familiar example of a protocol.
  • You can view HTTP requests and responses using browser dev tools or HTTP proxy tools like Burp Suite or Fiddler.
  • The browser sends an HTTP request to the web server, which understands the message.
  • The web server responds with an HTTP response, which the browser understands how to read.
  • This communication works because HTTP is precisely specified, so both sides follow the same rules.
  • The rules for HTTP/1.1 messages are described in RFC 9112, an Internet Standard published by the Internet Engineering Task Force (IETF).

Understanding an HTTP Message

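  • An HTTP message is made up of multiple parts: a start-line, zero or more header field lines, an empty line, and an optional message body.
  • The start-line and each header line end with CRLF (carriage return and line feed, the two bytes \r\n), which marks a line break; a line containing only CRLF ends the header section.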
  • An HTTP message can be a request from client to server or a response from server to client.
  • The start line is either a request-line or a status-line.
  • A request-line consists of a method, a single space, the request-target, a single space, and the HTTP version.
  • Example: GET /test HTTP/1.1
  • The request method is case-sensitive.
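  • If the method is changed from uppercase to lowercase (get instead of GET), the request no longer uses the GET method, and servers typically reject it.
  • When a request breaks the rules, the server typically responds with 400 Bad Request (an unrecognized method may instead get 501 Not Implemented).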
  • The RFC is a long document describing all the rules of behavior, almost like a contract or formal agreement for how the Hypertext Transfer Protocol is supposed to work.
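As a minimal sketch (Python, standard library only), the message structure above can be assembled and parsed by hand. The helper names build_request, parse_request_line, and is_known_method, and the host example.com, are illustrative assumptions; the method list covers only the methods defined in RFC 9110, and a real parser enforces many more rules.

```python
CRLF = "\r\n"

# Methods defined in RFC 9110; comparison is case-sensitive on purpose.
KNOWN_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE", "CONNECT", "OPTIONS", "TRACE"}

def build_request(method: str, target: str, host: str) -> bytes:
    """Start-line, one header field (Host), then the empty line; no body."""
    lines = [f"{method} {target} HTTP/1.1", f"Host: {host}", ""]
    # Joining with CRLF and appending one more CRLF yields "...\r\n\r\n" at the end.
    return (CRLF.join(lines) + CRLF).encode("ascii")

def parse_request_line(line: str) -> tuple[str, str, str]:
    """Split 'method SP request-target SP HTTP-version'; raise ValueError if malformed."""
    parts = line.split(" ")
    if len(parts) != 3:
        raise ValueError("request-line must have exactly three space-separated parts")
    method, target, version = parts
    if not version.startswith("HTTP/"):
        raise ValueError("missing HTTP version")
    return method, target, version

def is_known_method(method: str) -> bool:
    # "get" is a different token from "GET", so it is not the GET method.
    return method in KNOWN_METHODS

raw = build_request("GET", "/test", "example.com")
start_line = raw.decode("ascii").split(CRLF, 1)[0]
print(parse_request_line(start_line))  # ('GET', '/test', 'HTTP/1.1')
print(is_known_method("get"))          # False
```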

Importance of Detailed Rulebooks

  • Thanks to internet standards, different programs can fulfill the same roles.
  • Browsers such as Chrome, Firefox, or Safari, and command-line tools such as curl or wget, can all talk to a server such as nginx or Apache because they all implement the same rules for how HTTP works.

Protocols and Languages

  • Protocols are to computers what languages are to humans.
  • If two different programs speak the same language (e.g., HTTP), they can communicate with each other.

Web APIs as Protocols

  • When you implement a web API for your own website, you have effectively invented a new protocol.
  • For example, Twitter has an API to look up tweets.
  • There is no standardized protocol for that, so Twitter had to invent its own.
  • That protocol is just a set of rules.
  • It says to use HTTP in a very specific way: send an HTTP request to a particular endpoint with particular values, and you get back the tweets.
  • It is really just another protocol on top of HTTP.
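A sketch of such an API-level protocol, assuming a made-up endpoint /api/tweets with an id parameter and a JSON reply containing a text field. These rules are hypothetical and are not Twitter's actual API; the point is only that the API adds its own rules on top of plain HTTP.

```python
import json
from urllib.parse import urlencode

# Hypothetical API rules (NOT Twitter's real API):
#   request:  GET /api/tweets?id=<tweet id> over HTTP/1.1
#   response: a JSON object with at least a "text" field

def build_lookup_request(tweet_id: str, host: str) -> bytes:
    query = urlencode({"id": tweet_id})
    return (
        f"GET /api/tweets?{query} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Accept: application/json\r\n"
        "\r\n"
    ).encode("ascii")

def parse_lookup_response(body: bytes) -> str:
    """Apply the API's own rule on top of HTTP: the body must be JSON with a 'text' field."""
    data = json.loads(body)
    if "text" not in data:
        raise ValueError("response does not follow the API's rules")
    return data["text"]

req = build_lookup_request("12345", "api.example.com")
print(req.split(b"\r\n")[0])                                     # b'GET /api/tweets?id=12345 HTTP/1.1'
print(parse_lookup_response(b'{"id": "12345", "text": "hello"}'))  # hello
```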

Stacking of Protocols

  • Stacking of protocols on top of each other is very common.
  • HTTP uses TCP.
  • The Twitter API uses HTTP.
  • Keep this in mind, because I want to talk more about it in another video.
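The layering can be sketched as nested function calls, where each layer treats the output of the layer above as opaque data. Everything here is a simplified illustration: the /api/tweets endpoint is hypothetical, and Segment is a stand-in that keeps only ports and payload, not a real TCP segment.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """Simplified stand-in for a TCP segment: only ports and payload (real TCP has many more fields)."""
    src_port: int
    dst_port: int
    payload: bytes

def api_call(tweet_id: str) -> str:
    # Top layer (hypothetical API): decide which HTTP request-target to use.
    return f"/api/tweets?id={tweet_id}"

def http_request(target: str, host: str) -> bytes:
    # Middle layer: wrap the target in an HTTP/1.1 request message.
    return f"GET {target} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")

def tcp_segment(data: bytes, src_port: int, dst_port: int = 80) -> Segment:
    # Lower layer: the HTTP bytes simply become the TCP payload.
    return Segment(src_port, dst_port, data)

seg = tcp_segment(http_request(api_call("12345"), "example.com"), src_port=50000)
print(seg.dst_port)                   # 80
print(seg.payload.split(b"\r\n")[0])  # b'GET /api/tweets?id=12345 HTTP/1.1'
```

TCP never looks inside the payload; only the HTTP layer (and the API rules above it) give those bytes meaning.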

Transmission Control Protocol (TCP)

  • TCP also has an RFC (today RFC 9293, which replaced the original RFC 793): a detailed document describing exactly how the Transmission Control Protocol works.
  • It also describes the language, meaning the messages that systems send to each other.
  • Unlike text-based HTTP, TCP messages are defined as actual bits and bytes.
  • A TCP message consists of multiple parts, including the source port, destination port, sequence number, acknowledgment number, flags, a checksum, and the data.
  • The source port is 16 bits (two bytes) long, and so is the destination port.
  • The sequence number is a 32-bit number.
  • Tools like Wireshark decode and show us this data in a human-readable way.
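To see these fields as raw bytes, here is a minimal sketch of the fixed 20-byte TCP header (no options) using Python's struct module. The function names are illustrative, only six of the flag bits are modeled, and the checksum is left at 0 because computing the real one requires a pseudo-header from the IP layer.

```python
import struct

# Fixed 20-byte TCP header without options, in network (big-endian) byte order:
# source port, destination port (16 bits each), sequence and acknowledgment
# numbers (32 bits each), data-offset byte, flags byte, then window,
# checksum and urgent pointer (16 bits each).
TCP_HEADER = struct.Struct("!HHIIBBHHH")

# Flag bit values; CWR and ECE are omitted in this sketch.
FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def pack_tcp_header(src_port, dst_port, seq, ack, flags, window=65535):
    data_offset = 5 << 4  # header length = 5 words of 32 bits (20 bytes), stored in the upper 4 bits
    flag_bits = 0
    for name in flags:
        flag_bits |= FLAGS[name]
    checksum = 0          # placeholder: the real checksum also covers an IP pseudo-header
    urgent_pointer = 0
    return TCP_HEADER.pack(src_port, dst_port, seq, ack, data_offset,
                           flag_bits, window, checksum, urgent_pointer)

def unpack_tcp_header(raw: bytes) -> dict:
    (src, dst, seq, ack, offset, flag_bits,
     window, checksum, _urgent) = TCP_HEADER.unpack(raw[:TCP_HEADER.size])
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": (offset >> 4) * 4,
        "flags": sorted(name for name, bit in FLAGS.items() if flag_bits & bit),
        "window": window,
        "checksum": checksum,
    }

header = pack_tcp_header(50000, 80, seq=100, ack=0, flags=["SYN"])
print(len(header))                         # 20
print(unpack_tcp_header(header)["flags"])  # ['SYN']
```

This decoding step is, in miniature, what Wireshark does when it shows the fields in human-readable form.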

TCP Experiment with Wireshark

  • First, I capture all network traffic on my system with Wireshark.
  • Then I open http://liveoverflow.com in the browser, so the browser sends an HTTP request.
  • Then I filter for the HTTP protocol in Wireshark.
  • Wireshark recognizes the HTTP request and response data, but here we are not interested in HTTP itself.
  • HTTP is actually sent and received using TCP.
  • In the TCP layer of a packet we can see the source port, destination port, sequence number, acknowledgment number, flags, and so forth: all the fields described in the RFC.
  • When I right-click this entry and choose "Follow TCP Stream," Wireshark shows all TCP packets related to this HTTP request and response. Suddenly, we see many more TCP packets.

TCP Three-Way Handshake

  • A protocol is not just the message itself, but it also describes rules on how and when these messages are used.
  • The three-way handshake: SYN, SYN-ACK, ACK.
  • Fun fact: conceptually it is four steps (A sends SYN, B sends ACK, B sends SYN, A sends ACK), but because B's ACK and SYN can be combined into a single message, it is called a three-way handshake.
  • System A sends B a TCP message with the SYN flag set, including a sequence number, for example 100.
  • SYN stands for synchronize.
  • B responds to A with an ACK, acknowledging that it received that sequence number (the acknowledgment number is 101, one more than A's sequence number).
  • In the same message, B also sets the SYN flag and includes its own sequence number.
  • Now B waits for A to send back an acknowledgment for that.
  • After that, actual data can now be sent.
  • In the Wireshark capture, the browser sent a TCP SYN, the server responded with a SYN-ACK, and then the browser responded with an ACK.
  • After that, data could be sent, so the browser sent a TCP packet carrying the HTTP request as its data.
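The sequence and acknowledgment numbers in the handshake can be modeled in a few lines. This is an illustrative sketch, not a TCP implementation: the Packet class and function names are hypothetical, and the initial sequence numbers 100 and 300 are made up (real TCP stacks choose hard-to-guess initial values). It relies on the fact that a SYN consumes one sequence number, so it is acknowledged with that number plus one.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Packet:
    sender: str
    flags: str                  # "SYN", "SYN-ACK" or "ACK" in this model
    seq: int
    ack: Optional[int] = None   # acknowledgment number, if the ACK flag is set

def three_way_handshake(client_isn: int, server_isn: int) -> list:
    syn = Packet("A", "SYN", seq=client_isn)
    # B acknowledges A's SYN (ISN + 1) and sends its own SYN in the same packet.
    syn_ack = Packet("B", "SYN-ACK", seq=server_isn, ack=syn.seq + 1)
    # A acknowledges B's SYN; its own SYN already used up one sequence number.
    ack = Packet("A", "ACK", seq=client_isn + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]

def is_valid_handshake(packets: list) -> bool:
    """Check the flag order and that each SYN is acknowledged with its sequence number + 1."""
    syn, syn_ack, ack = packets
    return (
        (syn.flags, syn_ack.flags, ack.flags) == ("SYN", "SYN-ACK", "ACK")
        and syn_ack.ack == syn.seq + 1
        and ack.ack == syn_ack.seq + 1
    )

for p in three_way_handshake(client_isn=100, server_isn=300):
    print(p.sender, p.flags, "seq =", p.seq, "ack =", p.ack)
# A SYN seq = 100 ack = None
# B SYN-ACK seq = 300 ack = 101
# A ACK seq = 101 ack = 301
```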

Why TCP?

  • A computer usually has a single network connection, shared by all of its programs.
  • So when a computer receives some data, which program on the computer should get it?
  • This is what the port is for.
  • The TCP packets were sent to port 80, which allowed the server's operating system to deliver the HTTP data to the web server program listening on that port.
  • So with port numbers, many different programs on the same computer can share one network connection.
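A simplified model of that demultiplexing step, assuming a hypothetical table that maps port numbers to programs. A real operating system keeps this mapping inside its network stack and fills it in when programs bind to ports; the handler functions here only stand in for those programs.

```python
from typing import Callable, Dict

# Hypothetical table of listening programs, keyed by destination port.
listeners: Dict[int, Callable[[bytes], str]] = {
    80: lambda data: f"web server got {len(data)} bytes",
    22: lambda data: f"ssh server got {len(data)} bytes",
}

def deliver(dst_port: int, payload: bytes) -> str:
    """Hand the payload to whichever program listens on dst_port."""
    handler = listeners.get(dst_port)
    if handler is None:
        return f"no program listening on port {dst_port}"
    return handler(payload)

print(deliver(80, b"GET / HTTP/1.1\r\n\r\n"))  # web server got 18 bytes
print(deliver(9999, b"hello"))                 # no program listening on port 9999
```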

UDP - User Datagram Protocol

  • If we sent an HTTP request to a web server using UDP, we would wait and nothing would happen, because the server expects HTTP over TCP.
  • UDP packets, or UDP messages (datagrams), are very similar to TCP messages.
  • A UDP message has a source port, destination port, length, checksum, and data.
  • But it lacks other parts, such as the sequence and acknowledgment numbers and the flags that indicate whether a packet is a SYN or an ACK.
  • The UDP RFC (RFC 768, from 1980) is very short and old, and it never had to be updated.
  • This is because UDP is extremely simple: it is just this one message, with no back-and-forth sequence required.
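For comparison with TCP's 20-byte minimum header, here is a sketch of the 8-byte UDP header with Python's struct module. The function names are illustrative; the checksum is set to 0, which UDP over IPv4 allows as "no checksum computed", whereas a real stack would normally compute it.

```python
import struct

# Source port, destination port, length, checksum: four 16-bit fields, 8 bytes total.
UDP_HEADER = struct.Struct("!HHHH")

def build_udp_datagram(src_port: int, dst_port: int, data: bytes) -> bytes:
    length = UDP_HEADER.size + len(data)  # the length field covers header + data, in bytes
    checksum = 0                          # 0 = "no checksum" (permitted for UDP over IPv4)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + data

def parse_udp_datagram(raw: bytes):
    src, dst, length, _checksum = UDP_HEADER.unpack(raw[:UDP_HEADER.size])
    return src, dst, raw[UDP_HEADER.size:length]

dgram = build_udp_datagram(50000, 53, b"hi")
print(len(dgram))                 # 10
print(parse_udp_datagram(dgram))  # (50000, 53, b'hi')
```

There is no handshake to model: sending a UDP message is just building these bytes once.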

TCP vs UDP

  • TCP first sends a SYN with a sequence number.
  • If we get a TCP ACK back whose acknowledgment number is our sequence number + 1, then we know for certain the server really received that packet.
  • The client now knows the connection works: the server can receive and respond to its TCP messages.
  • The server doesn’t yet know if the client can receive its response.
  • So it also sends a SYN packet with its own sequence number.
  • When the client responds to that packet with another ACK, including the correct sequence number, the server is now also sure the client can receive all packets.
  • So the connection can be considered established, and you can start sending data.
  • Because sequence numbers increase as data is sent, the receiver can also recognize when data is missing. (Real TCP sequence numbers count bytes; here, for simplicity, each packet increments them by one.)
  • When you receive sequence numbers 105 and 107, you know you are missing 106.
  • It may still arrive a bit later, out of order, or it may have to be retransmitted.
  • That's why the TCP protocol is so much more complex and requires a very detailed description of exactly how each system has to behave.
  • Even the RFC notes that its TCP connection state diagram is only a summary and must not be taken as the total specification, since many details are not included.
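Gap detection in the simplified one-number-per-packet model can be sketched as follows. The function name is illustrative, and real TCP works with byte ranges rather than whole packet numbers.

```python
def find_missing(received: list[int]) -> list[int]:
    """Return sequence numbers missing between the lowest and highest received (one per packet)."""
    if not received:
        return []
    seen = set(received)  # order does not matter, so out-of-order arrival is handled
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]

print(find_missing([105, 107]))       # [106]
print(find_missing([108, 105, 106]))  # [107]
```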

Definition of a Computer Protocol

  • A computer protocol is a collection of rules, definitions, and specifications of how systems can communicate with each other.
  • Each protocol tries to solve specific problems of communication.
  • If no existing protocol suits your needs, you can always invent your own.
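As a sketch of what inventing your own protocol can look like, here is a toy message format made up purely for illustration: each message is a 2-byte big-endian length followed by that many bytes of UTF-8 text. It solves one specific problem, telling where one message ends and the next begins in a continuous byte stream.

```python
import struct

# Toy protocol (invented for illustration):
#   message = 2-byte big-endian length + that many bytes of UTF-8 text

def encode_message(text: str) -> bytes:
    body = text.encode("utf-8")
    if len(body) > 0xFFFF:
        raise ValueError("message too long for a 16-bit length field")
    return struct.pack("!H", len(body)) + body

def decode_stream(stream: bytes) -> list[str]:
    """Split a byte stream back into messages; raise if the last one is cut off."""
    messages, pos = [], 0
    while pos < len(stream):
        if pos + 2 > len(stream):
            raise ValueError("truncated length field")
        (length,) = struct.unpack_from("!H", stream, pos)
        pos += 2
        if pos + length > len(stream):
            raise ValueError("truncated message body")
        messages.append(stream[pos:pos + length].decode("utf-8"))
        pos += length
    return messages

wire = encode_message("hello") + encode_message("world")
print(decode_stream(wire))  # ['hello', 'world']
```

Any two programs that implement these two rules can exchange messages, which is all a protocol needs.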

UART - Universal Asynchronous Receiver-Transmitter

  • UART is an example from the hardware world.
  • If you have ever done Arduino programming or hardware hacking, you might recognize UART, also called serial.
  • Although it doesn't have "protocol" in its name, it really is a protocol.
  • It works with single wires: one wire to transmit and one to receive.
  • The sender and receiver have to agree on the exact configuration: the baud rate, the number of data bits, the parity setting, and the number of stop bits.
  • As long as both systems agree on the configuration, you can use UART.
  • Using a single wire, where each bit is a 0 or 1 represented as low or high voltage, you can follow the UART protocol to transmit entire bytes.
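A sketch of how one byte becomes a sequence of line levels, assuming the common 8N1 configuration (8 data bits, no parity, 1 stop bit). The baud rate only decides how long each level is held on the wire, so it does not appear here; the function names are illustrative.

```python
def uart_frame(value: int, data_bits: int = 8, stop_bits: int = 1) -> list[int]:
    """Return the line levels (0 = low, 1 = high) for one frame without a parity bit."""
    if not 0 <= value < (1 << data_bits):
        raise ValueError("value does not fit in the configured number of data bits")
    start = [0]                                          # the idle line is high; a low start bit begins a frame
    data = [(value >> i) & 1 for i in range(data_bits)]  # data bits go out least-significant bit first
    stop = [1] * stop_bits                               # stop bit(s) return the line to high
    return start + data + stop

def uart_decode(levels: list[int], data_bits: int = 8, stop_bits: int = 1) -> int:
    """Recover the value from one frame, checking the start and stop bits."""
    if len(levels) != 1 + data_bits + stop_bits:
        raise ValueError("wrong frame length for this configuration")
    if levels[0] != 0:
        raise ValueError("missing start bit")
    if any(level != 1 for level in levels[1 + data_bits:]):
        raise ValueError("framing error: stop bit not high")
    data = levels[1:1 + data_bits]
    return sum(bit << i for i, bit in enumerate(data))

frame = uart_frame(ord("A"))  # 'A' is 0x41 = 0b01000001
print(frame)                  # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
print(uart_decode(frame))     # 65
```

If the two sides disagree on the configuration (for example, one expects 7 data bits), the receiver reads the wrong bits or reports a framing error, which is why both must agree up front.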

Importance of Protocols

  • Protocols are really important because when systems communicate, we need rules on how to do that.
  • Protocols are everywhere, and they are very different.
  • Some protocols are text-based, like HTTP, while some protocols are based on raw binary data like TCP and UDP.
  • Some protocols even specify the expected voltage levels on wires, as in UART.
  • Some protocols just have a single message, like a UART frame or a UDP packet.
  • Other protocols can define a lot of back-and-forth interaction, like the whole sequence diagram of how connections are established with TCP.
  • To use the Twitter API, you first need to follow the OAuth protocol, which defines how to use HTTP requests and responses in a specific way to authenticate or authorize yourself to Twitter before using its API.

Protocols Summary

  • Protocols are everywhere around you.
  • They are just a set of rules on how systems communicate with each other.
  • Anytime something sends or receives data, it is using some kind of protocol.
  • To attack a system, we need to be able to communicate with it, and that's why it is important for us to learn about different protocols and how to use them.