Link Layer: Error and Flow Control

Data Link Layer Functionalities

The Data Link Layer (DLL) has two sublayers:

Data Link Control (DLC): Handles error control, flow control, and framing for both point-to-point and broadcast links.
Medium Access Control (MAC): Manages channel access for broadcast links, determining when to send a packet.

Data Link Control (DLC)

The DLC provides:

Error Control: Mechanisms to detect and correct errors in transmitted data.
Flow Control: Manages the rate of data transmission to prevent overwhelming the receiver.
Framing: Defines the structure of data packets (frames) for transmission.

Reliable Transmission

Given an error detection scheme (e.g., Cyclic Redundancy Check - CRC) that identifies corrupted packets and sequence numbers to detect lost packets, the following questions arise:

Can we make the channel appear reliable to the upper layer?
How do we cope with packet errors to provide a reliable service?
How do we maintain packet order for in-order delivery service?
How do we provide reliability at full link capacity (link utilization)?

The answer is YES, using Automatic Repeat reQuest (ARQ) algorithms.

Automatic Repeat reQuest (ARQ)

ARQ is an error-control method that uses acknowledgments (ACKs) to achieve reliable data transmission over an unreliable channel. If the sender doesn't receive an ACK, it retransmits the packet until an ACK is received or a predefined number of retransmissions is exceeded.

ARQ protocols must satisfy two requirements:

Accuracy: Packets must be delivered to layer N+1 at the receiver correctly (without errors) and exactly once (no duplicates).
Efficiency: Capacity loss due to useless retransmissions and time wasted waiting for packets or ACKs must be minimized.

Terminology: ACK, SACK, NACK

Cyclic Redundancy Check (CRC): Used at the receiver to detect errors. The probability of undetected errors can be made very small (e.g., $10^{-10}$ ).
Acknowledgment (ACK): The receiver informs the sender that a frame (packet) has been correctly received.
Selective Acknowledgment (SACK): Specifies the set of frames correctly received (e.g., using a bitmask).
Cumulative Acknowledgment (ACK): "ACK i" means all frames 1, 2, .., i-1 have been correctly received, and frame i is expected.
Negative Acknowledgment (NACK): The receiver refuses to accept the frame due to errors or out-of-order delivery.

In-order vs Out-of-order Delivery

Due to channel errors and retransmissions (RETX), the frame sequence at the receiver can contain holes (missing frames.) Missing packets undergo retransmission, which can lead to out-of-order (OOO) delivery.

Example:

Sequence numbers are used to identify frames uniquely.
The transmission sequence is frames 1, 2, 3, 4, 5.
Frames 1 and 3 are corrupted (lost or contain errors).
The receiver asks for retransmission of 1 and 3, which are transmitted in the next round.
The second transmission of 1 and 3 is successful.
The final sequence at the receiver is 2, 4, 5, 1, 3 (OOO).

This sequence can be passed to the upper layers as is or re-ordered.

Terminology – ACK, SACK details

ACK(n): Cumulative acknowledgment, indicating that the next PDU expected at the receiver, in-order, is PDU n, and all previous LL PDUs 1, 2, …, n-1 have been successfully received.
Note: Other ACK types are possible, where ACK(n) only indicates the successful delivery of PDU n.
Cumulative ACKs are also used by TCP (layer 4) for normal operations (no errors).
NACK(n): The receiver is out-of-order; PDU n is the first expected in-order PDU at the RX.
SACK(n): Selective acknowledgment, indicating PDU n has been successfully received.
Note: ACK(k, l, n) could contain a “bitmask” indicating which packets (k, l, n) are still missing at the RX.
Modern versions of TCP also use SACK for fast recovery of error events involving multiple packets in a Round Trip Time (RTT).

ARQ Diagram (General)

Key components in a general ARQ Diagram consist of a sequential input of Link Layer PDUs, a SEND (TX) buffer, a RETX buffer, channel with feedback for ACK/SACK/NACK, delay in feedback, Frame error check functionality and a Resequencing buffer to re-order OOO sequences producing a Sequential Output of LL PDUs.

ARQ Issues: Deadlock

Problem: Deadlock arises when the sender waits for a positive feedback (ACK) from the receiver acknowledging the correct reception of the frame. If the frame doesn't arrive at the receiver, the sender may wait for the ACK forever, violating the efficiency requirement.

Solution: ACK time-out. After sending the frame, the sender starts a timer. If the timer expires before the sender receives the ACK, the timeout expires, and the frame is retransmitted (different retransmission strategies can be chosen).

ARQ Issues: Duplicate Packets

Problem: Duplicate packets occur when a frame is correctly received and passed to the upper layer, and the ACK is returned to the sender but gets lost. The timeout expires, and the sender retransmits the same frame again, leading to duplicate packets, violating the accuracy requirement.

Solution: Sequence Number (SN). Each new sent frame has an identifier number (sequence number). The RX discards consecutive frames with the same sequence number. The PDU format is Type | SN | ACK | INFO | CRC.

Key Assumption: Heavy-Traffic

The SEND BUFFER at the transmitter is never emptied, corresponding to a scenario where the channel is always filled with transmitted frames. This allows for a simpler mathematical analysis and allows obtaining bounds (max throughput).

Stop & Wait ARQ (SW-ARQ)

The simplest ARQ protocol, where simplicity doesn't equate to being bad.

Basic algorithm (SENDER):

Send FRAME i
Wait for an ACK or timeout (TO)
If TO occurs, go to 1
If ACK, get new FRAME from send BUFFER (increase i), go to 1

Link Layer Protocol Data Unit (PDU)

Consider a Link Layer PDU with:

H bits - header
I bits - payload
F=H+I - LL DATA packet size [bits]
A - LL ACK packet size [bits]
RLL - constant bitrate @ LL [bits/s]

Definitions for ARQ

$R_{LL}$ : bitrate at the LL [bit/s]
$\rho$ : channel utilization factor ∈[0,1]
F: (H+I) frame size [bits]
$\eta$ : normalized goodput efficiency
$tF$ : F/ $R{LL}$ frame TX time [s]
$\tau_p$ : propagation delay [s]
$t_I$ : TX time (DATA only) [s]
$tA$ : A/ $R{LL}$ ACK TX time [s]
$t_T$ : total frame TX time [s]
RTT: time to TX and ACK a pkt [s]
p: LL PDU error rate

Stop & Wait ARQ Round Trip Time Illustration

The channel is used inefficiently in SW-ARQ (e.g., IEEE 802.11 exhibits this problem).

Stop & Wait ARQ (SW-ARQ) Operation

Normal Operation: Frame 0 is sent, ACK 0 is received.
Timeout (to avoid deadlocks): Frame 0 is sent, Timeout occurs, Frame 0 is re-sent, ACK 0 is received.

SW-ARQ Expected Total Retransmission Time E[TT]

Given:

p: LL PDU error probability over the link
i: number of RETXs for the same LL PDU
$ti$ : total RETX time, where $ti = i \cdot RTT$
$P_i = p^i (1 - p)$ : probability that i RETXs are needed

The expected total transmission time E[TT] is given by:

$E[tT] = RTT + \sum{i=1}^{\infty} ti Pi = RTT \left[1 + (1 - p) \sum{i=1}^{\infty} i p^i \right] = RTT \left[ \frac{1}{1-p} \right] = \frac{tF + tA + 2Tp}{1-p}$

Utilization Factor

The utilization factor, $\rho$ , is defined as:

$\rho = \frac{tF}{E[tT]} = \frac{tF (1 - p)}{RTT} = \frac{tF (1 - p)}{tF + tA + 2T_p}$

Note that $\lim{p \to 0} \rho(p) = \frac{tF}{RTT}$ , which is not good since it never goes to 1, even for an error-free channel if RTT > tF.

Goodput Efficiency

The normalized goodput efficiency, $\eta$ , is defined as:

$\eta = \frac{tI}{E[tT]} = \frac{tI}{tF} \cdot (1 - p) = \frac{I/R{LL}}{F/R{LL}} \cdot (1 - p) = \frac{I}{F} \cdot (1 - p)$

Which can also be expressed as:

$\eta = \frac{I}{F} (1 - p) = \frac{I (1 - p)}{F + A + 2R{LL}Tp} = \frac{I (1 - p)}{I + H + A + 2R{LL}Tp}$

Approximating p as $p = 1 - (1 - P{bit})^F \approx FP{bit}$ if P_{bit} << 1, then

$\eta \approx \frac{I (1 - (I + H)P{bit})}{I + H + A + 2R{LL}T_p}$

Drawback of SW-ARQ

SW-ARQ allows the transmission of only one FRAME at a time. Given a bitrate $R{LL}$ [bit/s] and a round trip delay RTT [s], the Bandwidth-Delay Product (BDP) is $R{LL} \cdot RTT$ . It indicates the total number of bits that the channel pipe can host in a full round-trip-time. Consequently, SW-ARQ is inefficient if R{LL} \cdot RTT >> F, or equivalently, RTT >> tF.

Solution: Pipelining

The IDEA is to transmit multiple unacknowledged frames back-to-back, filling the “channel pipe” with LL PDUs, especially when the BDP is large. ARQ protocols that use pipelining are Go-Back-N (GBN) and Selective Repeat (SR).

Sliding Window Protocols: Introduction

GBN and SR are sliding window protocols. The window is a sequence of frames (identified by sequence numbers) to be transmitted or received. On the TX-side, the window size corresponds to the number of unacknowledged DATA frames that can be transmitted over the channel.

The Sender Window (sendW) increases and/or moves when an ACK is received. Packets in sendW must be buffered at the TX. The window may grow & shrink in some protocols (e.g., TCP).

Sliding Window ARQ - SENDER

The SENDER is provided with a sending window of size N. The Sending Window = [SNmin, SNmax], where:

SNmin= SN of the first unacknowledged packet
SNnxt = SN of the next packet that can be sent
SNmax=SN of the last packet that can be sent within the current sending window
SNnxt-SNmin+1= total number of in-flight packets, i.e., packets that have been sent but not yet ACKed (N) → these packets are buffered at the sender
SNmax-SNmin+1 = N = sending window size = max number of in-flight packets

Sliding Window ARQ - RECEIVER

The RECEIVER has a receiving window of size M. The Receiving Window = [RNmin, RNmin+M-1]: sequence numbers of the packets that can be accepted by the receiver (depends on buffer).

RNmin = SN of the next expected packet (in-order)
M = maximum number of packets that can be buffered at the receiver

If in-order-delivery mode is configured, PDUs are passed to the upper layer in sequence (without gaps). The packets in the receiving window are not passed to the upper layer until the packet with SN=RNmin is also received.

Sliding Window ARQ – Operation

If the sender receives ACK(n) with n ∈ (SNmin, SNmax+1], all packets up to and including n-1 are OK, n is the next expected, and the sending window is slid forward setting SNmin=n (cumulative ACK), and SNmax=SNmin+N-1.

When a packet with SN=x arrives from the lower level at the receiver:

If x falls outside the receiving window, the packet is discarded!
If x > RNmin, the packet is buffered
If x= RNmin, the receiver sets RNmin equal to SN of the next missing packet in the sequence of received packets and delivers to the upper layer the packets with SNs up to the new RNmin
Receiver sends ACK(RNmin) (the ACK of the next in-order packet to be received).

Go-Back-N ARQ (GBN-ARQ)

Inefficient because packets with SN 4–7 could have been received even before.
OVERALL, N=5 LL PDUs (SN3-7) WERE SENT TWICE!
A full RTT of packets is RETX.
SNnext goes back to 3 (first unacked).
PDUs from SNnext to SNmax are resent.
Note: prop. delay is asymmetric.

Can lead to full channel utilization. A NACK(i) or timeout restarts TX counter from i.

Full Channel Utilization vs N & RTT

Continuous transmission requires that the ACK relative to one in-flight packet is received while the transmitter is still transmitting (one of the) N packets in the previous window.

GBN-ARQ - Timeout vs Window Size

The timeout should be set such that: $to \geq tF + Tp + tA + T_p = RTT$

For a stringent timeout we set: $to = N tF$

GBN-ARQ – Timeout vs Window Size

With the stringent timeout assumption, the expected total TX time for a single packet is calculated. For a successful (last) TX of the PDU, we use tF in place of RTT; this is the only difference with respect to SW-ARQ.

GBN-ARQ - Utilization Factor

The utilization factor, $\rho$ , is:

$\rho = \frac{tF}{E[tT]} = \frac{tF (1 - p)}{tF + (tA + 2Tp)p} = \frac{F(1 - p)}{F + (A + 2R{LL}Tp)p}$

$\lim_{p \to 0} \rho(p) = 1$

Selective Repeat ARQ (SR-ARQ)

Requires a NACK(i) or timeout.

SR ARQ Example

Timeout relative to SN 3 expires.
PACKET 4 IS SAVED in queue
PACKET 5 IS SAVED in queue
PACKET 6 IS SAVED in queue
PACKET 7 IS SAVED in queue
SN 3 - RETX
Now packet with SN 3 is received, so the receiver can send to the upper layers also packets with SN 4–7 that were kept in the queue → REORDER AND SEND to upper layer
OVERALL, ONLY 1 PACKET IS RESENT → decrease no. of in-flight PDUs, advance SNmax, so that SNnext can be TX

SRQ-ARQ Examples

Link layer of mobile systems
5G cellular systems (and older generations LTE, UMTS)
Modern versions of TCP
NewReno with SACK
Modern versions of IEEE802.11
IEEE802.11ax (uses S&W+SR)

SR-ARQ- E[tT]

Given:

$t_i$ = i * tF: total RETX time
$P_i = p^i (1 - p)$ : prob. that i RETX are needed

Expected total TX time for a single packet:

$E[tT] = tF + \sum{i=1}^{+\infty} ti Pi = tF +tF(1 - p) \sum{i=1}^{+\infty} i p^i = \frac{t_F}{1 - p}$

Utilization Factor & Goodput Efficiency

Utilization factor: $\rho = \frac{tF}{E[tT]} = 1 - p \approx 1 - (I + H)P_{bit}$
Normalized goodput efficiency: $\eta = \frac{I}{tF} \cdot (1 - p) = \frac{I}{I + H} \cdot (1 - (I + H)P{bit})$

Example Results

The results show the throughput efficiency h as a function of the bit error probability Pb for different ARQ protocols (SW-ARQ, GBN-ARQ, SR-ARQ) under small BDP (tC=10 bits) and larger BDP (tC=1000 bits) conditions.

Optimal Packet Size for ARQ

Observations:

ARQ Performance depends on packet size
Adaptive selection of packet size as a function of BER is possible
Shorter packets are a better choice when the channel is error prone (BER >>)

Optimal PDU Size for SR-ARQ

Problem: find the value of x that maximizes the SR goodput (x),

where:

$R_{LL}$ : available bitrate at the link layer [bit/s]
p: packet (PDU) error probability
Pb: bit error probability (assumption: iid errors on bits)
x: PDU length (data+overhead) [bits]
o: PDU overhead (header H +CRC C) [bits]

SR Goodput

$\eta(x) = R{LL} \cdot \frac{x - o}{x} \cdot (1 - p) = R{LL} \cdot \frac{x - o}{x} \cdot (1 - P_b)^x$

Optimal PDU Size for SR-ARQ

$\eta(x) = R{LL} \cdot \frac{x - o}{x} \cdot (1 - p) = R{LL} \cdot \frac{x - o}{x} \cdot (1 - P_b)^x$

We can use the following formulas to derivate and solve for x:

$\frac{d(f(x)/g(x))}{dx} = \frac{f'(x)g(x) - f(x)g'(x)}{g^2(x)}$

$\frac{d(f(x)g(x))}{dx} = f'(x)g(x) + f(x)g'(x)$

$\frac{d(a^x)}{dx} = a^x log a$

Optimal PDU Size for SR-ARQ derivation

Setting $\frac{dn(x)}{dx} = 0$

We have: $0 = x^2\xi - xo\xi + o$
where $\xi = log(1 - P_b)$

With x > 0, x^2 > 0, (1 - P_b) > 0

Optimal PDU Size for SR-ARQ solutions

The solutions are:

$x{1,2} = \frac{o \pm \sqrt{o^2 - \frac{4o}{\log(1 - Pb)}}}{2}$

ARQ Performance Considerations

ARQ performance generally depends on pipe capacity.
ARQ Selective Repeat always achieves the best performance.
ARQ S&W is the simplest protocol, and its performance rapidly decreases for large pipe capacities; however, it performs optimally when the pipe capacity is close to one.
In point-to-point connections, the pipe capacity is often very close to one → S&W is very common at the DLL.
ARQ mechanisms are also used at other layers (e.g., transport (TCP), application). In this case, the pipe capacity can be much larger and, hence, GBN and SR (or their combinations) are generally preferred.
For sliding window protocols, the optimal size of the transmit window is equal to the pipe capacity (in packets). However, a larger window is fine if the sending rate is paced by the ACK reception process.
For Selective Repeat, the receive window should be equal to the sender window (N PDUs) – reordering occurs.

Selective Repeat ARQ protocols and TCP

Selective Repeat ARQ protocols are often used with a fixed window size.
Transmission Control Protocol (TCP) dynamically adapts the window size.
cwnd is increased/decreased in TCP, but always kept smaller than Wmax (fixed by design).
Specific details/rules might change, but the basic ideas apply to most protocols.