
CMPSC 311 – File Input/Output & Systems Programming

Input/Output (I/O) Fundamentals

  • I/O = movement of raw bytes between a running process and external entities (devices, files, networks, …).
    • Terminal / keyboard ⇒ “terminal I/O”.
    • Secondary storage ⇒ “file I/O”.
    • Network sockets ⇒ “network I/O”.
  • Each class of I/O comes with its own API quirks.
    • Terminal I/O notoriously messy because of legacy behavior (not covered in detail here).
    • This lecture = focus on File I/O + Network-style blocking semantics.

Buffered vs. Unbuffered I/O

  • Buffered I/O
    • Runtime/library allocates an in-memory buffer.
    • Read buffering: library may pre-fetch more than the N bytes requested, anticipating future reads.
    • Write buffering: data you “write” may sit in buffer; not yet forwarded to kernel/device.
    • Benefits
      • Dramatically fewer system calls ⇒ fewer context switches.
      • Memory (RAM) is orders of magnitude faster than external devices.
    • Canonical examples: printf(), scanf(), and all other FILE* (stdio) functions.
  • Unbuffered I/O
    • Bytes go straight to kernel/device on each call.
    • Examples: read(), write(), low-level syscalls.

Blocking, Non-Blocking, & Asynchronous Models

  • Blocking (default): caller sleeps until requested bytes are transferred.
  • Non-blocking (same API as blocking, but the fd has the O_NONBLOCK flag set)
    • Call returns immediately.
    • Possible results: nothing transferred (call returns -1 with errno = EAGAIN/EWOULDBLOCK), a short read/write, or full completion.
    • Short read/write: fewer bytes than requested.
  • Asynchronous (AIO)
    • Separate API family.
    • Submission returns immediately.
    • Completion delivered via callback / signal / event.
  • Design choice: pick matching coding style for device’s blocking behavior.

Terminal I/O Review

  • Three predefined FILE* streams, already open when main() starts (the underlying fds are inherited from the parent process, usually the shell):
    • STDIN (fd 0)
    • STDOUT (fd 1)
    • STDERR (fd 2)
  • Shell utilities for simple output
    • echo "hello world" ⇒ prints its arguments to STDOUT.
    • cat path/to/file ⇒ copies file → STDOUT.
    • less file ⇒ read-only pager.

I/O Redirection in the Shell

  • Redirect output (>) ⇒ the program's STDOUT goes to the file (created if absent, truncated if present).
    • Example: echo "cmpsc311 output redirection" > this.dat
  • Redirect input (<) ⇒ program reads come from file.
    • Example: cat < this.dat (equivalent to cat this.dat)
  • Combine both: cat < this.dat > other.dat
  • Common test session
$ echo "cmpsc311 output redirection" > this.dat
$ cat this.dat
cmpsc311 output redirection
$ cat < this.dat
cmpsc311 output redirection

Reading from STDIN vs. File – Demo Program

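#include <stdio.h>
int main(void){
  char buf[80];
  printf("What is your name? ");
  fflush(stdout);                 // make sure the prompt appears before blocking on input
  if (scanf("%79s", buf) != 1)    // %79s: at most 79 chars + '\0' fit in buf
    return 1;
  printf("Hello, %s\n", buf);
  return 0;
}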
  • Interactive run: ./hello ⇒ prompts and reads via keyboard.
  • File-fed run: ./hello <name (where file ‘name’ contains Trinity)
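    • The prompt is still printed, but the name comes from the file (and is not echoed), so the terminal shows: What is your name? Hello, Trinity
  • Add output redirection: ./hello <name >out
    • Both prompt and greeting land in file out; nothing appears on the terminal.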

Pipes (|)

  • Not redirection; connects two processes.
    • cat this.dat | less ⇒ stdout of *cat* becomes stdin of *less*.
  • Can chain arbitrarily:
    • cat numbers.txt | sort -n | cat ⇒ sort -n orders the lines numerically; the final cat just prints them (it is there only to show chaining).
$ cat numbers.txt
14
21
7
4
$ cat numbers.txt | sort -n | cat
4
7
14
21

libc: The C Standard Library

  • Provides portable, standardized (ISO C) interfaces, implemented on top of OS syscalls.
  • Core headers used here
    • stdio.h → buffered I/O, FILE*, printf family.
    • stdlib.h → memory mgmt, conversions, exit(), …
    • stdint.h → fixed-width integer types (int32_t, uint64_t, etc.).
    • signal.h → POSIX signal handling.
    • math.h, time.h, … many more.
  • Conceptual layering
User Code  →  libc (fopen, printf, …)  →  Kernel Syscalls (open, write, …)

Library Call vs. System Call

  • open() → direct OS syscall (section 2 of man pages).
  • fopen() → library wrapper (section 3 of man pages).
    • Often implemented on top of open().
  • Use man 3 fopen, man 2 open etc.
    • Notation foo(2) means “documented in section 2”.

General File I/O Lifecycle

  1. Open file (library or syscall).
  2. Perform any sequence of reads / writes.
  3. Close file.
  • The open file keeps a cursor/offset: each read or write starts there and advances it; fseek()/lseek() move it explicitly (random access).

Paths: Absolute vs. Relative

  • Absolute: begins with root ‘/’ ⇒ /home/mcdaniel/courses/cmpsc311-sum19/this.dat
  • Relative forms from current directory
    • ./courses/cmpsc311-sum19/this.dat
    • courses/cmpsc311-sum19/this.dat
  • If the current directory is /home/mcdaniel, all three paths name the same file (same inode).

High-Level I/O – FILE* Abstraction

  • FILE is an opaque structure storing buffer, flags, fd, etc.
    • gdb printout shows internals (_IO_* fields).
  • Designed for text / line-oriented data, but can handle binary too.

fopen()

  • Prototype: FILE *fopen(const char *path, const char *mode);
  • Returns pointer (stream) on success, NULL + errno set on error.
  • Library allocates/deallocates structure; you only hold pointer.
  • Common mode strings
    • "r", "r+", "w", "w+", "a", "a+" (semantics below).

fopen Mode Semantics

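  • "r" ⇒ read-only, cursor @ start; file must exist.
  • "r+" ⇒ read/write, cursor @ start; file must exist.
  • "w" ⇒ write-only; create, or truncate to length 0.
  • "w+" ⇒ read/write; create, or truncate to length 0.
  • "a" ⇒ append (write-only); create if absent; every write goes to EOF.
  • "a+" ⇒ read + append; create if absent; reads may seek anywhere, writes always go to EOF.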

Reading

  • fscanf(FILE *stream, const char *fmt, …)
    • Behaves like scanf; returns the number of items matched and assigned, or EOF if input ends (or fails) before the first conversion.
  • fgets(char *buf, int size, FILE *stream)
    • Reads until (and including) the first \n, until EOF, or until size-1 chars, then appends '\0'; returns buf, or NULL if nothing could be read (EOF/error).
char str[128];
if (fgets(str, sizeof str, file) != NULL) {   // str keeps the trailing '\n', if one was read
  printf("Read line [%s]\n", str);
}

Writing

  • fprintf(FILE *stream, const char *fmt, …)
    • Analogous to printf.
  • fputs(const char *s, FILE *stream)
    • Writes string w/out adding newline.

Buffer Control – fflush()

  • Prototype: int fflush(FILE *stream);
  • Use-cases
    1. Ensure buffered writes are pushed to kernel.
    2. Discard prefetched read buffer so program sees up-to-date file (defined by POSIX for seekable input streams; undefined behavior in ISO C).
  • stream = NULL ⇒ flush all open output streams.
  • fflush is not a durability guarantee: data only reaches the kernel; call fsync(2) on fileno(stream) after fflush if it must be committed to disk.

Closing – fclose()

  • int fclose(FILE *stream);
  • Implicitly calls fflush; releases internal buffers; stream pointer invalid afterwards.

Complete stdio Example (Highlights)

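  • Open "r+", loop while !feof(file) reading coordinates with fscanf + text with fgets (testing the return values of fscanf/fgets is the more robust loop condition, since feof() only becomes true after a read has already failed).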
  • Later, append new coordinates and line, fflush, fclose.
  • Outcome shows original data plus appended block.

Low-Level I/O – open(), read(), write(), close()

open()

  • Prototypes: int open(const char *path, int flags); and int open(const char *path, int flags, mode_t mode);
  • Returns file descriptor (small non-negative int), or -1 with errno set on error.
  • flags choose access mode + modifiers; mode supplies permission bits and is required only when flags include O_CREAT.

Common Flag Bits

  • Basic
    • O_RDONLY, O_WRONLY, O_RDWR (pick exactly one).
  • Modifiers
    • O_CREAT (create if absent).
    • O_EXCL (with O_CREAT ⇒ fail with EEXIST if the file already exists).
    • O_TRUNC (truncate to length 0).
    • Combine via bitwise-OR: e.g. O_WRONLY | O_CREAT | O_EXCL.

UNIX Permission Model Recap

  • Discretionary Access Control (DAC); owner controls permissions.
  • Three subjects: Owner (user), Group, World (others).
  • Three rights: Read, Write, Execute.
    • Execute ⇒ allowed to run file as program / enter directory.
    • Read ≠ Execute; reading text does not grant run privilege.
  • Display format: rwxrwxrwx (bits cleared shown as ‘-’).
    • Eg: rwxrw---x ⇒ owner = rwx, group = rw-, world = --x.

Numeric/Constant Forms (mode_t constants)

  • Provided in <sys/stat.h>
    • S_IRUSR (00400), S_IWUSR (00200), S_IXUSR (00100)
    • S_IRWXU (00700) etc.
  • Build creation mode via bitwise-OR, e.g.
mode_t mode = S_IRUSR | S_IWUSR | S_IRGRP; // 00640

File Descriptors Internals

  • Kernel maintains per-process table mapping fd ⇒ open-file description.
  • open() returns index into that table ⇒ you pass fd back to read/write/close.
    • Why int? simple, cheap, copyable.

read() / write()

  • Prototypes:
ssize_t read(int fd, void *buf, size_t count);
ssize_t write(int fd, const void *buf, size_t count);
  • Return value = bytes transferred (may be < count); read() returns 0 at EOF; both return -1 with errno set on error.
  • Always verify result; loop until desired bytes processed.
  • Caller allocates buffer for read.

close()

  • int close(int fd); removes table entry; fd becomes free.
  • Common idiom: after close set variable to -1 to catch accidental reuse.

Complete open()/read()/write() Example

  • Phase 1: create ‘/tmp/open.dat’, flags O_WRONLY|O_CREAT|O_EXCL, mode S_IRUSR|S_IWUSR|S_IRGRP.
    • Write 1000 ints, each with value 0xff, to the file, then close.
  • Phase 2: reopen read-only, load into vals2, verify.
  • Hex dump with od -x -N 256 … shows 00ff 0000 repeating (each int is 0x000000ff, stored little-endian).

fopen() vs open() – Practical Guidelines

  • fopen advantages
    • Transparent buffering ⇒ possible speed gains for line-oriented text.
    • Automatic EOL translation when not in binary mode (matters on platforms such as Windows; POSIX systems perform no translation).
    • Access to rich parsing/formatting functions: fscanf, fprintf, getline.
  • open advantages
    • Precise control, no hidden buffering.
    • Suitable for binary blobs, memory-mapped I/O, async, select/poll, etc.
  • Rule of thumb: use FILE*/stdio for ASCII/text; use fd syscalls for binary/performance-critical paths.

Required Header Files Cheat-Sheet

  • High-level (FILE*) API ⇒ #include <stdio.h>.
  • Low-level syscalls ⇒
    • #include <sys/types.h>
    • #include <sys/stat.h>
    • #include <fcntl.h> (open flags)
    • #include <unistd.h> (read/write/close)
  • Always consult man page section “SYNOPSIS” for definitive list.