ASCII Encoding
- Definition: American Standard Code for Information Interchange (ASCII) maps characters to integer codes.
- Seven-bit code; values 0–127.
- Control characters: 0–31 and 127 (e.g., 0 = NUL, 7 = BEL).
- Printable characters: 32–126 (space, punctuation, digits, upper/lowercase letters, etc.).
- Example (type conversion):
int a = 65;
printf("a is %d or in ASCII '%c'\n", a, (char)a );
- Output:
a is 65 or in ASCII 'A'
- Take-away: A single C char is literally a small integer; on ASCII-based platforms its value is the character's ASCII code.
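Because a char is just a small integer, ordinary arithmetic works on characters. The sketch below assumes an ASCII-compatible character set; the helper names digit_value and to_upper_ascii are hypothetical and exist only for illustration (the standard library already provides toupper in <ctype.h>).

```c
/* Hypothetical helper: numeric value of a decimal digit character,
   or -1 if c is not a digit. */
int digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';   /* digit codes are contiguous: '7' - '0' == 7 */
    return -1;
}

/* Hypothetical helper: uppercase an ASCII lowercase letter by arithmetic. */
char to_upper_ascii(char c)
{
    if (c >= 'a' && c <= 'z')
        return (char)(c - ('a' - 'A'));   /* 'a' is 97, 'A' is 65: offset 32 */
    return c;                             /* everything else is unchanged */
}
```

Under these assumptions, digit_value('7') evaluates to 7 and to_upper_ascii('b') to 'B'.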
Strings in C – General Model
- In C, a string is a null-terminated array of char.
  - The terminating byte is 0x00, written '\0'.
  - All library functions rely on this terminator; forgetting it creates an unterminated string.
- Library header: #include <string.h> exposes the string API.
- Three declarations that hold the same characters (string literal "hello\n") but differ in storage:
  - char *x = "hello\n"; ➔ pointer into read-only memory.
  - char x1[] = "hello\n"; ➔ array with compiler-copied bytes.
  - char x2[7] = "hello\n"; ➔ explicit length (6 characters + null).
- All yield identical runtime output: printf("%s\n", var); prints "hello" followed by a blank line, because the literal's own newline comes before the one in the format string.
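To make the role of the terminator concrete, here is a minimal sketch of how a length function can walk a string. The name my_strlen is hypothetical; it imitates what strlen does but is not the library's actual implementation.

```c
#include <stddef.h>

/* Hypothetical re-implementation of strlen, for illustration only. */
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')   /* the loop stops only at the terminator */
        n++;
    return n;              /* the terminator itself is not counted */
}
```

For the literal "hello\n" this returns 6. If the terminator were missing, the loop would keep reading past the end of the array, which is why every library routine depends on it.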
How String Storage Differs
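- String literal (char *x): pointer to a read-only segment.
  - Attempting x[4] = 'j'; is undefined behavior and typically ends in a segmentation fault.
- Array initialized from literal (char x1[]): resides in writable memory (the stack for a local array, the data segment for a global or static one).
  - x1[4] = 'j'; is legal; it modifies the array's own copy.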
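One common way to avoid the crash is to declare pointers to literals as const char *, so the compiler rejects a write instead of letting it fail at run time, and to modify only a writable copy. A minimal sketch follows; greeting and patched_copy are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Pointer to a literal, declared const so that a write such as
   greeting[4] = 'j' is a compile-time error rather than a crash. */
static const char *greeting = "hello\n";

/* Hypothetical helper: copies the literal into the caller's writable
   buffer and then patches one byte of the copy.
   Returns 0 on success, -1 if the buffer is too small. */
int patched_copy(char *out, size_t out_size)
{
    size_t need = strlen(greeting) + 1;   /* characters plus '\0' */
    if (out_size < need)
        return -1;
    memcpy(out, greeting, need);
    out[4] = 'j';                         /* legal: out is writable storage */
    return 0;
}
```

With a 16-byte buffer the call succeeds and the buffer holds "hellj\n"; the literal itself is never modified.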
Determining Size vs. Length
- sizeof(str): compile-time operator; returns the allocated object size in bytes.
  - Pointer variable → typically 8 bytes on 64-bit machines.
  - Array variable → total bytes in the array; for a char array this equals the number of elements (including the null terminator if declared from a literal).
- strlen(str): run-time function; counts characters until the first '\0' (excludes the terminator).
- Code illustration:
char *str = "text for example";     // 16 chars + '\0' in read-only segment
char str2[17] = "text for example"; // 17-byte writable array
printf("sizeof(str) = %zu\n", sizeof(str));   // ⇒ 8 on typical 64-bit systems
printf("sizeof(str2) = %zu\n", sizeof(str2)); // ⇒ 17
printf("strlen(str) = %zu\n", strlen(str));   // ⇒ 16 (%zu is the size_t conversion)
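A related pitfall: once an array is passed to a function it decays to a pointer, so sizeof inside the callee reports the pointer size, not the caller's array size. The sketch below illustrates this; size_seen_by_callee is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: reports what sizeof sees inside a function.
   The parameter is a pointer, so the result is sizeof(char *),
   regardless of how large the caller's array is. */
size_t size_seen_by_callee(const char *s)
{
    return sizeof s;
}
```

Passing a 17-byte array therefore yields the pointer size (typically 8 on 64-bit machines), which is why functions that fill buffers usually take the capacity as a separate argument.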
String Initialization – Good & Bad
- Good forms (proper null):
char *str1 = "abc";
char str2[] = "abc";
char str3[4] = "abc";
char str5[] = {'a','b','c','\0'};
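- Dangerous / Unterminated:
  - char str4[3] = "abcd"; → initializer too long for the array; compilers that accept it (with a warning) truncate the literal and store no terminator.
  - char str6[3] = {'a','b','c'}; → no '\0' stored.
- Looks dangerous but is terminated:
  - char str7[9] = {'a','b','c'}; → C zero-initializes every element without an initializer, so str7[3] through str7[8] are '\0'.
- Runtime symptom: Printing an unterminated string reads adjacent memory until it happens to hit a zero byte, producing output like abc*@.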
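When a buffer's origin is uncertain, its termination can be checked before it is handed to a string function. A minimal sketch, assuming the caller knows the buffer's capacity; is_terminated is a hypothetical helper built on the standard memchr.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf contains a '\0' somewhere in
   its first size bytes, 0 otherwise. Never reads past buf[size - 1]. */
int is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

For char str2[] = "abc" the check succeeds with size 4; for a three-element array holding only 'a', 'b', 'c' it fails, signalling that printing it with %s would read out of bounds.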
Copying Strings – Buffer Overflows
- Unsafe: strcpy(dest, src) blindly copies until '\0', regardless of the destination's size.
char *str1 = "abcde";
char str2[6], str3[3];
strcpy(str2, str1); // OK (exact fit)
strcpy(str3, str1); // OVERFLOW! overwrites i, stack, etc.
- Demonstrated stomp: a neighboring local variable i changed from 255 to 101 (101 is the ASCII code for 'e', one of the overflowing bytes).
- Moral: Always ensure the destination is large enough, or switch to length-bounded variants.
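One way to ensure the destination is large enough is to check the length before copying. The following is a sketch under the assumption that the caller knows the destination's capacity; checked_copy is a hypothetical name, not a library function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dest only if it fits, including
   the terminator. Returns 0 on success; returns -1 and leaves dest
   untouched if src is too long. */
int checked_copy(char *dest, size_t dest_size, const char *src)
{
    if (strlen(src) + 1 > dest_size)
        return -1;
    strcpy(dest, src);   /* safe here: the length was verified above */
    return 0;
}
```

Copying "abcde" into a 6-byte array succeeds, while the same call with a 3-byte array is refused instead of overwriting neighboring memory.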
n-Variants: Safer Alternatives
- Key idea: Provide max-bytes parameter n.
strncpy(dest, src, n)
strncat(dest, src, n)
strncmp(s1, s2, n)
- Example (safe copy + manual terminator):
strncpy(str3, str1, 2); // copy at most 2 chars → "ab"
str3[2] = 0; // force terminator
- Caveat: If the first n bytes of the source contain no null, strncpy leaves the destination unterminated. Always write your own '\0' after a bounded copy.
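The copy-then-terminate pattern can be wrapped in a small function so the terminator is never forgotten. This is a minimal sketch; bounded_copy is a hypothetical name, and silently truncating is a design assumption that may not suit every caller.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies as much of src as fits into dest and
   always terminates the result. Longer input is truncated. */
void bounded_copy(char *dest, size_t dest_size, const char *src)
{
    if (dest_size == 0)
        return;                          /* no room even for '\0' */
    strncpy(dest, src, dest_size - 1);   /* leave one byte for '\0' */
    dest[dest_size - 1] = '\0';          /* force the terminator */
}
```

With a 3-byte destination and the source "abcde", the result is "ab", matching the manual two-step version.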
Concatenation
- Combine strings end-to-end.
- strcat(dest, src); → appends all of src, starting at dest's terminator.
- strncat(dest, src, n); → appends at most n characters from src, then always writes a terminator (so up to n + 1 bytes).
- Example:
char str1[20] = "abcde", *str2 = "efghi";
strcat(str1, str2); // str1 = "abcdeefghi"
- Requirement: dest must already contain a valid terminator and have enough space for the combined length plus the null.
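The space requirement can be enforced by computing the remaining room before appending. A sketch, assuming dest is already terminated within dest_size bytes; bounded_append is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends as much of src as fits after the current
   contents of dest. dest must already be a terminated string. */
void bounded_append(char *dest, size_t dest_size, const char *src)
{
    size_t used = strlen(dest);
    if (used + 1 >= dest_size)
        return;                                 /* no room left */
    /* strncat writes at most n characters plus a terminator,
       so n must leave one byte spare for the '\0'. */
    strncat(dest, src, dest_size - used - 1);
}
```

Appending "efghi" to "abcde" in a 20-byte array gives "abcdeefghi"; in an 8-byte array the result is cut to "abcdeef" rather than overflowing.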
String Comparison
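- Lexicographic functions:
  - strcmp(s1, s2) → examines until mismatch or null.
  - strncmp(s1, s2, n) → examines at most n bytes.
- Return value semantics:
  - negative if s1 sorts before s2
  - zero if the strings are equal (within the first n bytes for strncmp)
  - positive if s1 sorts after s2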
- Use comparisons for sorting, equality checks, etc.
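As an illustration of sorting, strcmp can serve as the core of a qsort comparator. The sketch below assumes an array of pointers to strings; compare_strings and sort_words are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* qsort hands the comparator pointers to the array elements; each
   element is itself a char pointer, hence the double indirection. */
static int compare_strings(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);   /* negative, zero, or positive */
}

/* Hypothetical helper: sorts an array of string pointers in place. */
void sort_words(const char **words, size_t count)
{
    qsort(words, count, sizeof words[0], compare_strings);
}
```

Sorting the array {"pear", "apple", "fig"} reorders it to apple, fig, pear. For equality checks, test strcmp(s1, s2) == 0 rather than treating the result as a boolean.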
Searching within Strings
- Character search:
  - strchr(str, ch) → first occurrence, scanning front to back.
  - strrchr(str, ch) → last occurrence.
- Substring search:
  - strstr(str, "sub") → case-sensitive search for the first occurrence.
  - strcasestr(str, "sub") → case-insensitive (GNU extension; with glibc, define _GNU_SOURCE before including <string.h>).
- All return a char* pointing to the found position, or NULL if there is no match.
- Demonstrative output:
strchr : 0xxxFindm...
strrchr: 0xxxxFindme2xxxxx
strstr : Findmexxxx0...
strcasestr: Findmexxxx0... (case-insensitive success)
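Since the search functions return a pointer into the original string, the result must be checked against NULL before use, and subtracting the start pointer converts it to an index. A minimal sketch; find_offset and extension_of are hypothetical helpers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: index of the first occurrence of needle in
   haystack, or -1 if it does not occur. */
long find_offset(const char *haystack, const char *needle)
{
    const char *hit = strstr(haystack, needle);
    if (hit == NULL)
        return -1;
    return (long)(hit - haystack);   /* pointer difference = index */
}

/* Hypothetical helper: text after the last '.', or "" if there is none. */
const char *extension_of(const char *path)
{
    const char *dot = strrchr(path, '.');
    return dot != NULL ? dot + 1 : "";
}
```

For example, find_offset("xxxFindmexxx", "Findme") is 3, while searching for "findme" returns -1 because strstr is case-sensitive; extension_of("archive.tar.gz") points at "gz".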
Parsing Strings with sscanf
- Convert formatted text into typed variables.
- Prototype:
int sscanf(const char *str, const char *fmt, ... );
- Works like the inverse of printf; every conversion needs a pointer to its target variable.
- Returns the number of successfully matched fields.
- Example:
char *src = "1 3.14 a bob";
int i; float f; char c; char s[20];
int ret = sscanf(src, "%d %f %c %19s", &i, &f, &c, s); // width 19 keeps %s inside s[20]
// ret==4, i==1, f≈3.14, c=='a', s=="bob"
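Because sscanf stops at the first field that fails to match, the return value should be compared with the number of fields expected. The sketch below wraps this check; struct record and parse_record are hypothetical, and the field width 19 is chosen to fit the 20-byte name array.

```c
#include <stdio.h>

/* Hypothetical record layout mirroring the four fields parsed above. */
struct record {
    int   id;
    float value;
    char  tag;
    char  name[20];
};

/* Returns 0 if all four fields were parsed, -1 otherwise. */
int parse_record(const char *line, struct record *out)
{
    int matched = sscanf(line, "%d %f %c %19s",
                         &out->id, &out->value, &out->tag, out->name);
    return matched == 4 ? 0 : -1;   /* partial matches count as failure */
}
```

The line "1 3.14 a bob" parses successfully, whereas "1 oops" matches only the first field and is rejected.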
Security & Practical Implications
- The standard C string interface is not memory-safe; common source of:
- Buffer overflows (overflowing return addresses, data corruption).
- Information leakage (reading past terminator).
- Crashes (segmentation faults on read-only literals).
- Guidelines:
- Prefer n-variants (strncpy, strncat, strncmp).
- Manually ensure null terminators after bounded copies.
- Follow CERT Secure C Coding Standards.
- Consider safer languages (e.g., Rust) for new systems-level code.
Key Takeaways / Study Checklist
- Understand ASCII mapping and its numerical nature.
- Memorize how C represents strings (null-terminated arrays) and why that matters.
- Distinguish between pointer to literal vs array initializations and mutability.
- Correctly use sizeof vs strlen.
- Recognize unterminated string bugs & buffer-overflows.
- Use n-variants for safer copying, concatenation, and comparison.
- Employ search & parse utilities (strchr, strstr, sscanf) effectively.
- Always validate buffer capacity and terminate strings explicitly.
- Keep security in mind; uncontrolled string manipulation is a common exploit path.