
CMPSC 311 – Strings in C

ASCII Encoding

  • Definition: American Standard Code for Information Interchange (ASCII) maps characters to integer codes.
    • Seven-bit code; values 0–127.
    • Control characters 0–31 and 127 (e.g., 0 = NUL, 7 = BEL).
    • Printable characters 32–126 (space, punctuation, digits, upper/lowercase letters, etc.).
  • Example (type–conversion)
    • int a = 65;
    • printf("a is %d or in ASCII '%c'\n", a, (char)a );
    • Output: a is 65 or in ASCII 'A'.
  • Take-away: A C char is simply a small integer; when it represents text, that integer is the character's ASCII code.

Strings in C – General Model

  • A string == null-terminated array of char in C.
    • Terminating byte is 0x00, written '\0'.
    • All library functions rely on this terminator; without it they keep reading past the end of the array.
  • Library header: #include <string.h> exposes string API.
  • Three declarations that hold the same text (compile-time string literal "hello\n") but differ in storage:
    • char *x = "hello\n"; ➔ pointer into read-only memory.
    • char x1[] = "hello\n"; ➔ array with compiler-copied bytes.
    • char x2[7] = "hello\n"; ➔ explicit length (6 chars including '\n' + null).
    • All yield identical runtime output: printf("%s\n", var); prints hello followed by a blank line, because the literal already ends in '\n'.

How String Storage Differs

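  • String literal (char *x): pointer to read-only segment.
    • Attempting x[4] = 'j'; is undefined behavior; on typical systems it ends in a segmentation fault.
  • Array initialized from literal (char x1[]): a private copy in writable memory (the stack for a local, the data segment for a global or static).
    • x1[4] = 'j'; is legal; modifies the array's own copy.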

Determining Size vs. Length

  • sizeof(str)
    • Compile-time operator; returns allocated object size.
      • Pointer variable → typically 8 bytes on 64-bit machines.
      • Array variable → total bytes in the array; for a char array that equals the number of elements (includes null terminator if declared from literal).
  • strlen(str)
    • Run-time function; counts characters until first \0 (excludes terminator).
  • Code illustration:
  char *str  = "text for example";     // 16 chars + \0 in read-only seg.
  char  str2[17] = "text for example"; // 17-byte array in data seg.
  printf("sizeof(str)  = %zu\n", sizeof(str));  // ⇒ 8 on a typical 64-bit machine
  printf("sizeof(str2) = %zu\n", sizeof(str2)); // ⇒ 17
  printf("strlen(str)  = %zu\n", strlen(str));  // ⇒ 16

String Initialization – Good & Bad

  • Good forms (proper null):
    • char *str1 = "abc";
    • char str2[] = "abc";
    • char str3[4] = "abc";
    • char str5[] = {'a','b','c','\0'};
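  • Dangerous / Unterminated:
    • char str4[3] = "abcd"; → initializer longer than the array; C compilers typically warn and keep only 'a','b','c', so no terminator is stored.
    • char str6[3] = {'a','b','c'}; → no '\0' stored.
  • Looks dangerous but is terminated:
    • char str7[9] = {'a','b','c'}; → C zero-initializes the six elements that have no initializer, so str7[3] is already '\0'; writing the terminator explicitly still makes the intent clearer.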
  • Runtime symptom: Printing unterminated strings leaks adjacent memory until an accidental zero; output like abc*@.

Copying Strings – Buffer Overflows

  • Unsafe: strcpy(dest, src) blindly copies until \0.
  char *str1 = "abcde";
  char str2[6], str3[3];
  strcpy(str2, str1);   // OK (exact fit)
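  strcpy(str3, str1);   // OVERFLOW! 6 bytes into a 3-byte array; overwrites i, stack, etc.
  • Demonstrated stomp: a neighboring local variable i changed from 255 to 101 (101 is the ASCII code of 'e', one of the overflowing bytes).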
    • Moral: Always ensure destination large enough, or switch to length-bounded variants.

n-Variants: Safer Alternatives

  • Key idea: Provide max-bytes parameter n.
    • strncpy(dest, src, n)
    • strncat(dest, src, n)
    • strncmp(s1, s2, n)
  • Example (safe copy + manual terminator):
  strncpy(str3, str1, 2); // copy at most 2 chars → "ab"
  str3[2] = 0;            // force terminator
  • Caveat: strncpy writes a terminator only if it meets one within the first n bytes of the source; otherwise the destination remains unterminated. Always append your own 0.

Concatenation

  • Combine strings end-to-end.
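    • strcat(dest, src); → appends entire src, starting at dest's terminator (which it overwrites).
    • strncat(dest, src, n); → appends at most n bytes of src and then always writes a terminator, so dest needs room for n + 1 more bytes.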
  • Example:
  char str1[20] = "abcde", *str2 = "efghi";
  strcat(str1, str2);   // str1 = "abcdeefghi"
  • Requirement: dest must already contain valid terminator and enough space for combined length + null.

String Comparison

  • Lexicographic functions:
    • strcmp(s1, s2) → examines until mismatch or null.
    • strncmp(s1, s2, n) → examines at most n bytes.
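  • Return value semantics:
    • < 0 → s1 sorts before s2.
    • 0 → the strings (or their first n bytes, for strncmp) are equal.
    • > 0 → s1 sorts after s2.
    • Only the sign is portable; the magnitude is unspecified.
  • Use comparisons for sorting, equality checks, etc.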

Searching within Strings

  • Character search:
    • strchr(str, ch) → first occurrence front-to-back.
    • strrchr(str, ch) → last occurrence back-to-front.
  • Substring search:
    • strstr(str, "sub") → case-sensitive front search.
    • strcasestr(str, "sub") → case-insensitive (GNU extension; on glibc, define _GNU_SOURCE before including <string.h>).
  • All return char* pointer to found position or NULL.
  • Demonstrative output:
  strchr : 0xxxFindm...
  strrchr: 0xxxxFindme2xxxxx
  strstr : Findmexxxx0...
  strcasestr: Findmexxxx0...  (case-insensitive success)

Parsing Strings with sscanf

  • Convert formatted text into typed variables.
    • Prototype: int sscanf(const char *str, const char *fmt, ... );
    • Works like inverse of printf; pointers to targets mandatory.
    • Returns the number of input items successfully matched and assigned; compare it against the number you expected.
  • Example:
  char *src = "1 3.14 a bob";
  int   i; float f; char c; char s[20];
  int ret = sscanf(src, "%d %f %c %19s", &i, &f, &c, s); // %19s leaves room for \0 in s[20]
  // ret==4, i==1, f≈3.14, c=='a', s holds "bob"

Security & Practical Implications

  • The standard C string interface is not memory-safe; common source of:
    • Buffer overflows (overflowing return addresses, data corruption).
    • Information leakage (reading past terminator).
    • Crashes (segmentation faults on read-only literals).
  • Guidelines:
    • Prefer n-variants (strncpy, strncat, strncmp).
    • Manually ensure null terminators after bounded copies.
    • Follow CERT Secure C Coding Standards.
    • Consider safer languages (e.g., Rust) for new systems-level code.

Key Takeaways / Study Checklist

  • Understand ASCII mapping and its numerical nature.
  • Memorize how C represents strings (null-terminated arrays) and why that matters.
  • Distinguish between pointer to literal vs array initializations and mutability.
  • Correctly use sizeof vs strlen.
  • Recognize unterminated string bugs & buffer-overflows.
  • Use n-variants for safer copying, concatenation, and comparison.
  • Employ search & parse utilities (strchr, strstr, sscanf) effectively.
  • Always validate buffer capacity and terminate strings explicitly.
  • Keep security in mind; uncontrolled string manipulation is a common exploit path.