
Lecture 21–22 Sockets, Files, and Inter-Process Communication

Unix Philosophy – “Everything is a File”

  • Core design choice in early Unix: treat most system resources as files addressed only by a file descriptor (FD)
    • Standard streams: stdin (0), stdout (1), stderr (2)
    • Ordinary on-disk files
    • Pipes acting as byte buffers between processes (anonymous pipes need related processes; named pipes do not)
  • Benefits
    • Uniform, minimal API (read, write, close, dup2, …) regardless of the “thing” behind the FD
    • Easy redirection/composition in shells (|, >, <)
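To make the "uniform API" point concrete, here is a minimal sketch of a copy routine that only ever sees two file descriptors. The helper name copy_fd is hypothetical (it is not from the lecture); the same code should work whether the FDs refer to files, pipes, terminals, or sockets.

```c
#include <unistd.h>

/* Hypothetical helper: copy everything readable from `in` to `out` until
 * end-of-file. Returns total bytes copied, or -1 on error.
 * Nothing here depends on what kind of object is behind either FD. */
ssize_t copy_fd(int in, int out)
{
    char buf[4096];
    ssize_t total = 0, n;

    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {                 /* write() may accept only part */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

Calling copy_fd(0, 1) would behave roughly like cat: it copies stdin to stdout, and shell redirection decides what those two FDs actually are.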

Connecting Independent Processes

  • Naïve method: intermediary on-disk temporary file
    • One process writes, the other reads
    • Drawbacks
      • Disk I/O latency (slow)
      • No built-in synchronisation ⇒ races: Is writer done? Is reader still reading?
  • Pipes (pipe() + fork())
    • Fast in-memory buffer
    • Require a parent/child (common-ancestor) relationship, because the pipe FDs are inherited across fork(); cannot connect two arbitrary, already-running programs
    • Only work inside one process tree on a single machine
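The pipe() + fork() pattern can be sketched as follows. The function name pipe_roundtrip is hypothetical; the child writes a message into the pipe and the parent reads it back. Note that each side closes the end it does not use, otherwise the reader never sees end-of-file.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that writes `msg` into a pipe; the
 * parent reads it into `out`. Returns bytes received, or -1 on error. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t cap)
{
    int fds[2];                       /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                   /* child: writer */
        close(fds[0]);
        ssize_t w = write(fds[1], msg, strlen(msg));
        close(fds[1]);
        _exit(w < 0 ? 1 : 0);
    }

    close(fds[1]);                    /* parent: reader; drop the write end */
    ssize_t total = 0, n = 0;
    while ((size_t)total < cap &&
           (n = read(fds[0], out + total, cap - (size_t)total)) > 0)
        total += n;
    close(fds[0]);
    waitpid(pid, NULL, 0);            /* reap the child */
    return n < 0 ? -1 : total;
}
```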

fopen, fdopen, popen, pclose

  • FILE * fopen(const char *path, const char *mode)
    • Opens/creates file, returns high-level C stream
  • FILE * fdopen(int fd, const char *mode)
    • Wraps an existing FD inside a FILE * stream
  • FILE * popen(const char *command, const char *type)
    • Creates a process instead of a file
    • Under the hood performs
      • pipe() → builds the data channel
      • fork() → child inherits ends of pipe
      • execl("/bin/sh", "sh", "-c", command, NULL)
    • Modes
      • "w" ⇒ caller writes → child’s stdin; child’s stdout/stderr unchanged
      • "r" ⇒ caller reads ← child’s stdout; child’s stdin inherited from parent (adjustable)
  • int pclose(FILE *stream) waits for the child and returns its termination status (same encoding as waitpid; decode with WIFEXITED / WEXITSTATUS)

Interfaces & Encapsulation

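  • Analogy to a C++ function interface: identical signature, different implementation (the two versions are alternatives, not meant to coexist in one file)
  // Loop implementation: sums the integers a..b inclusive (assumes a <= b)
  int sum(int a, int b){
      int total = 0;
      for(int i = a; i <= b; i++) total += i;
      return total;
  }
  // Closed-form implementation of the same sum: (a + b)(b - a + 1) / 2
  int sum(int a, int b){
      return (a + b) * (b - a + 1) / 2;   // multiply first so integer division does not truncate
  }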
  • In IPC we need an agreed protocol (interface) so either side can evolve independently as long as it obeys the contract
    • Example: client never needs to care whether data came from live computation or cached file

Sockets – High-Level Idea

  • Generalise pipe concept to two totally independent processes (can be on different computers)
  • Acts like a bidirectional byte stream; still accessed by FD ⇒ read / write semantics hold
  • Only extra requirement: both ends must speak the same application-level protocol (HTTP, FTP, custom, …)
  • High flexibility: connect, disconnect, reconnect at runtime; no parent/child requirement

Typical Uses

  • Web servers / browsers (HTTP)
  • FTP clients/servers
  • Any networked multiplayer or distributed application

Client-Server Paradigm

  • One side (server) = long-lived “receptionist” waiting (listen) on a known address + port
  • Other side (client) initiates (connect) when it wants service
  • After connection established, roles can blur; both sides may read/write arbitrarily
  • Real-world metaphor: historical telephone “Time” service — server continuously answered calls with current time

Addressing: IP & Port

  • IP address → where (host)
    • Unique per network interface; many processes share it
  • Port number → which service (application)
    • 0–65535, per-host namespace
    • Multiple hosts can use the same port (e.g. 80 for HTTP) without conflict
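An IP address plus a port together identify one endpoint. As a small sketch, the hypothetical helper format_endpoint below renders an IPv4 sockaddr_in as "a.b.c.d:port", using inet_ntop for the address and ntohs for the port (both fields are stored in network byte order).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: render an IPv4 endpoint as "a.b.c.d:port".
 * Returns 0 on success, -1 if conversion fails or `out` is too small. */
int format_endpoint(const struct sockaddr_in *addr, char *out, size_t cap)
{
    char ip[INET_ADDRSTRLEN];                 /* room for "255.255.255.255" */
    if (inet_ntop(AF_INET, &addr->sin_addr, ip, sizeof ip) == NULL)
        return -1;

    int n = snprintf(out, cap, "%s:%u", ip, (unsigned)ntohs(addr->sin_port));
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```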

Handshake Sequence (TCP Stream)

  1. Server
    • socket() → obtain FD
    • bind() → associate FD with local port (and optionally IP)
    • listen() → kernel moves FD to passive state; queue length = backlog
  2. Client (after server ready)
    • socket()
    • connect() → specify remote IP + port (server’s)
  3. Server
    • accept() → removes one pending request from queue, returns new FD dedicated to that client
  4. Both sides read/write freely; original listening FD keeps waiting for more clients
SERVER                            CLIENT
socket → bind → listen            socket
        ^                           |
        | (blocks)           connect()
accept() ———————————————→  (3-way handshake in TCP)
   ↕  dedicated FD                ↕
 read/write …                 read/write …
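The whole sequence can be exercised inside a single process over the loopback interface, which is convenient for experimenting. This is a sketch under assumptions: the function name loopback_handshake is hypothetical, port 0 asks the kernel for any free port, and the connect() call completes before accept() because the kernel finishes the handshake and queues the connection on the listening socket.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: run server and client steps in one process.
 * Returns bytes received on the server side, or -1 on error. */
ssize_t loopback_handshake(const char *msg, char *out, size_t cap)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    ssize_t n = -1;
    int lfd = -1, cfd = -1, sfd = -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                 /* 0 = kernel picks a free port */

    /* 1. Server: socket -> bind -> listen, then learn the chosen port */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 8) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto done;

    /* 2. Client: socket -> connect to the server's IP + port */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto done;

    /* 3. Server: accept returns a new FD dedicated to this client */
    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0)
        goto done;

    /* 4. Data flows over the dedicated FDs; lfd could keep accepting */
    if (write(cfd, msg, strlen(msg)) < 0)
        goto done;
    n = read(sfd, out, cap);

done:
    if (sfd >= 0) close(sfd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return n;
}
```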

Low-Level API (Server Side)

  • Headers (POSIX / BSD sockets)
  #include <sys/types.h>
  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <arpa/inet.h>    // htons, inet_aton
  #include <unistd.h>       // read, write, close
  • int sockfd = socket(PF_INET, SOCK_STREAM, 0);
    • PF_INET ⇒ IPv4 protocol family
    • SOCK_STREAM ⇒ reliable byte stream (TCP)
    • SOCK_DGRAM ⇒ datagrams (UDP)
  • struct sockaddr_in addr;
    • addr.sin_family = AF_INET;
    • addr.sin_port = htons(PORT); // host → network byte order
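    • addr.sin_addr = …; via gethostbyname() or inet_aton() (gethostbyname() is obsolete; getaddrinfo() is the modern replacement). A server commonly uses INADDR_ANY to accept on every local interface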
  • bind(sockfd, (struct sockaddr*)&addr, sizeof(addr));
  • listen(sockfd, backlog); // e.g. backlog = 128
  • int clientfd = accept(sockfd, NULL, NULL);
    • Returns new FD dedicated to that connection
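The server-side calls above omit error handling. A hedged sketch of how they might be bundled, with every return value checked, is shown below. The helper name make_listener is hypothetical, binding to INADDR_ANY (all local IPv4 interfaces) is an assumption, and SO_REUSEADDR is set so the port can be re-bound promptly after a restart.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on `port` on all local
 * IPv4 interfaces. Returns the listening FD, or -1 on error with errno set. */
int make_listener(unsigned short port, int backlog)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int yes = 1;                              /* option value for SO_REUSEADDR */
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY); /* assumption: any interface */
    addr.sin_port = htons(port);              /* host -> network byte order */

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        int saved = errno;                    /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

An accept loop would then call accept(listener, NULL, NULL) repeatedly on the returned FD.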

Low-Level API (Client Side)

  • Same headers
  • Build sockaddr_in server; (set IP+port)
  • int fd = socket(PF_INET, SOCK_STREAM, 0);
  • connect(fd, (struct sockaddr*)&server, sizeof(server));
  • Data transfer
  ssize_t n = read(fd, buffer, sizeof(buffer));
  write(fd, msg, strlen(msg));
  close(fd);
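The client-side steps can be wrapped the same way. The sketch below assumes the server address is given as dotted-quad IPv4 text and uses inet_pton to parse it; the helper name connect_to is hypothetical. On success the returned FD is ready for read / write.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect to an IPv4 server given as text plus port.
 * Returns a connected FD, or -1 on error. */
int connect_to(const char *ip, unsigned short port)
{
    struct sockaddr_in server;
    memset(&server, 0, sizeof server);
    server.sin_family = AF_INET;
    server.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &server.sin_addr) != 1)
        return -1;                            /* not a valid IPv4 address */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&server, sizeof server) < 0) {
        close(fd);                            /* do not leak the FD */
        return -1;
    }
    return fd;
}
```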

Dealing with Many Clients – Fork + Dup Example

  • Goal: decouple receptionist (accept loop) from worker (actual job)
  • Pattern
  void process_request(int fd){
      pid_t pid = fork();
      if(pid < 0){                 // fork failed
          close(fd);
          return;
      }
      if(pid > 0){                 // parent = server
          close(fd);               // child holds its own copy of the socket
          wait(NULL);              // optional: reap child (blocks until it exits)
          return;                  // back to accept loop
      }
      // child becomes worker
      dup2(fd, 1);     // redirect stdout to socket
      close(fd);
      execlp("date", "date", (char *)NULL); // run arbitrary program, looked up via PATH
      _exit(1);                    // exec failed
  }
  • After dup2, the client reads directly from the spawned program’s stdout

Designing a Real Web Server (Roadmap)

  • Repeat: socket → bind → listen (server) & socket → connect (client)
  • After accept:
    • Fork/thread or event-driven hand-off to worker pool
    • Parse request according to protocol (e.g. HTTP/1.1 headers & body)
    • Generate response (file content, dynamic CGI, etc.)
    • Close client FD
  • In higher-level languages (Python, Go, Rust…) libraries/frameworks abstract most of the boilerplate; in C you manually manage parsing, buffers, byte-ordering, error handling
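As a taste of the manual parsing mentioned above, the first line of an HTTP/1.1 request has the form "METHOD path version". A minimal sketch using sscanf with bounded field widths follows; the helper name and buffer sizes are assumptions, and a real server would need to handle much more (headers, long paths, malformed input).

```c
#include <stdio.h>

/* Hypothetical helper: split a request line such as
 * "GET /index.html HTTP/1.1" into its three parts.
 * Buffers must hold at least 8, 256 and 16 bytes respectively.
 * Returns 0 on success, -1 if the line does not have three fields. */
int parse_request_line(const char *line,
                       char method[8], char path[256], char version[16])
{
    /* Field widths leave room for the terminating '\0' in each buffer */
    return sscanf(line, "%7s %255s %15s", method, path, version) == 3 ? 0 : -1;
}
```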

Cheat-Sheet: Key Functions & Macros

  • socket(domain, type, proto)
  • bind(fd, struct sockaddr*, len)
  • listen(fd, backlog)
  • accept(fd, struct sockaddr*, socklen_t*)
  • connect(fd, struct sockaddr*, len)
  • read / write / send / recv (blocking by default); select or poll (wait on many FDs at once, usually paired with non-blocking I/O)
  • htons, htonl, ntohs, ntohl – host/network byte order conversion (endianness)
  • PF_INET, AF_INET, SOCK_STREAM, SOCK_DGRAM
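Network byte order is big-endian: the most significant byte goes on the wire first. The sketch below (hypothetical helper name) shows what htons does to a port number by exposing the two bytes as they sit in memory after conversion.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert `port` to network byte order and report the
 * two bytes in wire order: *hi is sent first, *lo second. */
void port_wire_bytes(uint16_t port, unsigned char *hi, unsigned char *lo)
{
    uint16_t net = htons(port);       /* host -> network byte order */
    unsigned char raw[2];
    memcpy(raw, &net, sizeof raw);    /* inspect the in-memory layout */
    *hi = raw[0];
    *lo = raw[1];
}
```

For port 8080 (0x1F90) this yields 0x1F followed by 0x90 on any host, whether the host itself is little-endian or big-endian.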

Performance & Practical Tips

  • Always check return values (-1) and errno
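  • Set SO_REUSEADDR with setsockopt before bind() to re-bind quickly after a crash or restart (otherwise bind can fail with EADDRINUSE while old connections sit in TIME_WAIT)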
  • Consider non-blocking I/O + select/epoll for high-concurrency servers instead of fork per connection
  • Backlog too low ⇒ dropped connections under high load
  • Understand TCP vs UDP trade-offs (reliability vs speed, ordering, connection overhead)
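As a sketch of the multiplexing idea behind select/poll, the hypothetical helper below waits on several FDs at once and reports which one has data. A single-threaded server could use this kind of call to serve many clients without forking. The fixed limit of 16 FDs is an arbitrary assumption to keep the example short.

```c
#include <poll.h>

/* Hypothetical helper: wait up to `timeout_ms` for any of the `n` FDs to
 * become readable. Returns the index of the first readable FD, or -1 on
 * timeout or error. */
int first_readable(const int *fds, int n, int timeout_ms)
{
    struct pollfd pfds[16];
    int i;

    if (n < 0 || n > 16)
        return -1;
    for (i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;      /* interested in "data to read" */
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)n, timeout_ms) <= 0)
        return -1;                    /* 0 = timeout, <0 = error */
    for (i = 0; i < n; i++)
        if (pfds[i].revents & (POLLIN | POLLHUP))
            return i;                 /* readable, or peer closed */
    return -1;
}
```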

Glossary

  • FD (File Descriptor): small non-negative integer handle (0, 1, 2, …) returned by the kernel for any open file-like object
  • Pipe: unidirectional in-memory buffer connecting related processes
  • Socket: endpoint for bidirectional communication between processes – may traverse networks
  • Server: waits for incoming connections on known port
  • Client: initiates connection to server
  • Backlog: max pending connection requests queue length in kernel before accept
  • sockaddr_in: IPv4-specific address struct (family, port, address)
  • htons / ntohs: convert 16-bit values between host and network byte order