Files and Input Processing

Lecture 11: Files and Input Processing - Fall 2025

Introduction

  • Overview of the topics to be covered in the lecture:

    • Writing to and reading from files

    • Processing strings (particularly to handle input)

Files

Basics of Files
  • Input/Output Basics:

    • So far, all input and output has gone through the console (terminal).

    • Use the print function to write output to the console.

    • Use the input function to read input from the console; it can also display a prompt message first.

    • Question: how do we read from and write to files?

Definition of Files
  • Files serve as a mechanism for storing information outside of main memory.

  • Accessing this information is different from accessing data in main memory.

  • Characteristics of memory levels:

    • CPU: Registers, Cache (near CPU)

    • Main Memory: Quicker access, but less permanent storage.

    • Secondary Memory (Files):

      • Slower to access

      • Capable of storing more data

      • Data is more long-lasting

    • Offline Memory (e.g., Cloud): Even slower to access but provides long-term storage.

File Extensions
  • File names typically include an extension, indicated by a period followed by a designation; examples:

    • Common extensions: .pdf, .docx, .jpg, .mov, .mp3, .xlsx, .csv, .txt, .dat, etc.

  • The extension provides a hint about how the data in a file is organized but does not guarantee it.

  • Renaming a file's extension does not change the file's contents; the operating system uses the extension only as a hint, for example to choose which program opens the file.

  • You can create files with non-standard extensions, but take care to write the data in the format that readers of the file expect.
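
The point that an extension is only a hint can be checked with a small sketch. The file name notes.jpg is a made-up example, and the sketch relies on open, write, and read, which are covered later in these notes; it works in a temporary directory so that no real files are touched.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file name with a deliberately misleading extension.
    path = os.path.join(tmpdir, "notes.jpg")

    outfile = open(path, "w")
    outfile.write("Just plain text.\n")
    outfile.close()

    # os.path.splitext separates the extension from the rest of the name.
    root, extension = os.path.splitext(path)

    # The file still holds ordinary text, whatever its name suggests.
    infile = open(path, "r")
    contents = infile.read()
    infile.close()

print(extension)
print(contents, end="")
```

The extension says "image", yet the contents read back as the text that was written.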

File Identifier
  • To work with files, designate a file identifier (a variable) that refers to the specific file.

  • Example:

    • A variable named fileID is assigned the open file and is then used for every read, write, and close operation on that file.

File Operations

Opening Files
  • Opening Format:

    • Syntax: <fileID> = open("<File Name>", "<designator>")

    • The first component names the variable that holds the file ID, which is used to refer to that file afterwards.

    • The assignment operator (=) binds the ID to the opened file.

    • The open command opens the named file in the given mode and returns the object that is stored in the ID.

    • The file name string must include the exact name and the extension, as seen in file explorers.

Designators (File Modes)
  • Different designators determine how a file will be accessed:

    • Modes:

      • r: Reading (reads data from an existing file; an error is raised if the file does not exist)

      • w: Writing (creates a new file, or erases the contents of an existing file with the same name)

      • a: Appending (adds data to the end of a file, creating the file if it does not exist)

      • rb, wb, ab: Reading, writing, or appending binary data (used for non-text data)

      • r+: Reading and writing to an existing file

    • If no designator is provided, the default mode is r.
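
A minimal sketch of how the w, a, and r designators behave, assuming a made-up file name (modes_demo.txt) inside a temporary directory so that nothing on disk is disturbed:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "modes_demo.txt")  # hypothetical file name

    f = open(path, "w")      # 'w' creates the file
    f.write("first\n")
    f.close()

    f = open(path, "a")      # 'a' keeps what is there and adds to the end
    f.write("second\n")
    f.close()

    f = open(path)           # no designator given, so 'r' is used
    after_append = f.read()
    f.close()

    f = open(path, "w")      # 'w' on an existing file erases its contents
    f.write("third\n")
    f.close()

    f = open(path, "r")
    after_rewrite = f.read()
    f.close()

print(repr(after_append))    # 'first\nsecond\n'
print(repr(after_rewrite))   # 'third\n'
```

The second open with w is the case to watch for: the earlier two lines are gone without any warning.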

Examples of Opening Files
  • To open and read from Measurements.dat, the code would be:

    • myfile = open('Measurements.dat', 'r')

  • To open Results.out for writing:

    • output_file = open('Results.out', 'w')

  • To open a binary file for writing and reading simultaneously:

    • df = open('data', 'rb+')

Closing Files
  • Importance: Closing a file writes any buffered data to disk and releases the file, which leaves it in a valid state once the operations are finished.

  • Syntax for Closing:

    • Format: <fileID>.close()

    • Example calls:

      • myfile.close()

      • output_file.close()

      • df.close()
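
As a rough illustration of what closing does, the sketch below checks the file object's closed attribute before and after close() and shows that writing to a closed file fails. The file name close_demo.txt is a made-up example in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "close_demo.txt")  # hypothetical file name

    myfile = open(path, "w")
    myfile.write("some data\n")
    closed_before = myfile.closed    # False: the file is still open

    myfile.close()
    closed_after = myfile.closed     # True: the file has been closed

    # Any further write on a closed file raises ValueError.
    try:
        myfile.write("more data\n")
        write_failed = False
    except ValueError:
        write_failed = True

print(closed_before, closed_after, write_failed)   # False True True
```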

Alternative File Handling
  • An improved method for opening files uses a with statement:

    • Format: with <open command> as <fileID>:

    • The read/write code is indented under the with line. The file is closed automatically when that block ends, even if an error occurs during reading or writing.
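
A short sketch of the with form, assuming a made-up file name (with_demo.txt) in a temporary directory. The second part raises an error on purpose to show that the file is still closed afterwards.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "with_demo.txt")  # hypothetical file name

    with open(path, "w") as outfile:
        outfile.write("line one\n")
    # The indented block has ended, so the file is already closed here.
    closed_after_with = outfile.closed

    try:
        with open(path, "a") as outfile2:
            outfile2.write("line two\n")
            raise RuntimeError("simulated failure while writing")
    except RuntimeError:
        pass
    # The file was closed even though the block ended with an error.
    closed_after_error = outfile2.closed

print(closed_after_with, closed_after_error)   # True True
```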

Writing to Files

Writing Operations
  • Assume the file is open in text mode (not binary). To write data, use the following command:

    • <fileID>.write(<string to write>)

    • The file identifier must reference a file that is open for writing or appending.

Important Differences Between the write Command and the print Function
  • Unlike print, the write command:

    • Accepts exactly one string; several values cannot be passed in a single call.

    • Requires numbers to be converted to strings first (for example, with str).

    • Does not add a newline automatically; include \n explicitly wherever a line break is needed.
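
The differences can be seen side by side in a small sketch. It uses io.StringIO, an in-memory text buffer from the standard library, as a stand-in for a file opened for writing; that substitution is an assumption made only to keep the example free of real files.

```python
import io

buffer = io.StringIO()   # in-memory stand-in for a file opened with 'w'
x = 987

# print accepts several values, converts numbers, and adds '\n' itself.
print("Value:", x, file=buffer)

# write needs one ready-made string, including the newline.
buffer.write("Value: " + str(x) + "\n")

# Passing a number straight to write is an error.
try:
    buffer.write(x)
    number_rejected = False
except TypeError:
    number_rejected = True

result = buffer.getvalue()
print(repr(result))      # 'Value: 987\nValue: 987\n'
```

Both calls produce the same line, but only after the string for write has been assembled by hand.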

Example of Writing to a File
  • Example code:
    ```python
    outfile = open("MyOutput.txt", 'w')    # create (or overwrite) MyOutput.txt
    outfile.write("Testing the write command.\n")
    x = 987
    outfile.write("Here's a number: " + str(x) + '\n')    # numbers need str()
    outfile.write("And another number:")    # no '\n', so the next write stays on this line
    outfile.write(str(21))
    outfile.write("\n")
    outfile.close()
    ```

- Result in `MyOutput.txt`:  

```text
Testing the write command.
Here's a number: 987
And another number:21
```

- The file is created in the current working directory, which is usually the directory the script is run from.  

### Alternate File Location  
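- To open a file in a different directory, include the directory path in the file name string:  
  - macOS:  `infile = open('data/data.txt', 'r+')`  
  - Windows:  `infile = open('data\\data.txt', 'w')` (a backslash starts an escape sequence in a Python string, so either double it or use a raw string such as `r'data\data.txt'`; forward slashes also work on Windows)  
- Note: Opening an existing file in `'w'` mode overwrites it, erasing its previous contents.  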
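
One way to avoid typing separators by hand is to let the standard library build the path. The sketch below is only an illustration; the directory and file names repeat the `data/data.txt` example, and nothing is opened.

```python
import os
from pathlib import Path

# os.path.join inserts the correct separator for the current system.
joined = os.path.join("data", "data.txt")

# pathlib offers the same idea with the / operator.
as_path = Path("data") / "data.txt"

# Two equivalent ways to spell a Windows-style path safely.
raw = r"data\data.txt"
escaped = "data\\data.txt"

print(joined)
print(raw == escaped)   # True
```

Either `joined` or `as_path` can be passed to `open` in place of a hand-written string.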

## Reading from Files  
### Basic Reading Operations  
- Assume working with text files. The most common method to read from a file is to retrieve one line at a time:  
  - Syntax: `<string variable> = <fileID>.readline()`  
- Each call returns the next line as a string, including its trailing newline character (`\n`). At the end of the file, `readline()` returns an empty string (`''`).  
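
A small sketch of `readline()` in action, assuming a made-up two-line file (`readline_demo.txt`) created in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "readline_demo.txt")  # hypothetical file name
    with open(path, "w") as outfile:
        outfile.write("alpha\nbeta\n")

    with open(path, "r") as infile:
        first = infile.readline()    # first line, newline included
        second = infile.readline()   # second line, newline included
        at_end = infile.readline()   # nothing left to read

print(repr(first), repr(second), repr(at_end))   # 'alpha\n' 'beta\n' ''
```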

### Reading Multiple Lines  
- To process every line of a file, there are several options:  
  - Option 1: Using a `for` loop, which assigns each line in turn to the loop variable:  

```python
for <string variable> in <fileID>:
    # Do something with <string variable>
```

  - Option 2: Using a `while` loop to read until the end of the file, which is signaled by `readline()` returning an empty string:  

```python
nextline = myfile.readline()
while nextline != '':
    # Do something with nextline
    nextline = myfile.readline()
```

### Examples of Reading a File  
- Reading and printing lines:  

```python
myfile = open("WarHymn.txt", 'r')
for nextline in myfile:
    print(nextline, end='')   # each line already ends with '\n'
myfile.close()
```

- Alternative with `with`:  

```python
with open("WarHymn.txt", "r") as myfile:
    nextline = myfile.readline()
    while nextline != '':
        print(nextline, end='')   # end='' avoids a blank line after every line
        nextline = myfile.readline()
```

### Reading Alternatives  
- To read the entire file into a single string:  
  - Syntax: `<string variable> = <fileID>.read()`  
- To read all lines into a list of strings:  
  - Syntax: `<list variable> = <fileID>.readlines()` or `<list variable> = list(<fileID>)`  
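
The three whole-file options can be compared on a small made-up file. The name `read_demo.txt` and its contents are assumptions for the sake of the sketch.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "read_demo.txt")  # hypothetical file name
    with open(path, "w") as outfile:
        outfile.write("one\ntwo\nthree\n")

    with open(path) as infile:
        whole_text = infile.read()        # one string for the entire file

    with open(path) as infile:
        line_list = infile.readlines()    # list with one string per line

    with open(path) as infile:
        same_list = list(infile)          # same result as readlines()

print(repr(whole_text))   # 'one\ntwo\nthree\n'
print(line_list)          # ['one\n', 'two\n', 'three\n']
```

Note that every element of the list keeps its trailing `\n`.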

## String Processing  
### Working with Strings  
- After reading a file and storing its content in strings, you often need to break those strings up for further processing.  
- A common tool is the `split` method, which separates a string into a list of strings at each occurrence of a specified separator.  

### Using the `.split()` Method  
- Syntax:  

```python
<list variable> = <string variable>.split(<separator>)
```

- Examples of separators include `','`, `'\n'`, or `':'`.  

### Example of String Splitting  
- Example code:  

```python
s = "1,2,3,4"
elems = s.split(',')
print(elems)  # Console Output: ['1', '2', '3', '4']
```
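
A few further `split` behaviors are worth a quick sketch: calling it with no separator, limiting the number of splits, and converting the pieces to numbers. The sample strings are made up for illustration.

```python
line = "3.5 4.25   10\n"

# With no separator, split breaks on any run of whitespace
# and ignores leading and trailing whitespace.
fields = line.split()

# The pieces are still strings; convert them to use them as numbers.
values = [float(field) for field in fields]

# An optional second argument limits how many splits are made.
limited = "a:b:c".split(":", 1)

print(fields)    # ['3.5', '4.25', '10']
print(values)    # [3.5, 4.25, 10.0]
print(limited)   # ['a', 'b:c']
```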

### Useful String Methods  
- Commonly used methods include:  
  - **Strip Method**:  
    - Removes leading and trailing whitespace.  
    - Syntax: `<string variable>.strip()`  
  - **Join Method**:  
    - Concatenates a list of strings with a specified separator.  
    - Syntax: `<string separator>.join(<list of strings>)`  
    - Example of chaining:

```python
mystr = " 1,2,3,4,5 \n"
print(mystr.strip())
# Output: 1,2,3,4,5
print(mystr.strip().split(','))
# Output: ['1', '2', '3', '4', '5']
print('.'.join(mystr.strip().split(',')))
# Output: 1.2.3.4.5
```

### Combining Processes  
- An application combining reading, splitting, and processing could look like this:  

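```python
with open("WarHymn.txt", "r") as myfile:
    alllines = myfile.readlines()

# Assumes the file has at least 12 lines.
x = alllines[10].split('-')                      # list of pieces from line 11
y = '.'.join(alllines[11].strip().split('-'))    # line 12 with '-' replaced by '.'
print(x, y)
alllines[11] = x    # store the results back in swapped positions
alllines[10] = y
print(alllines)
```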

## Conclusion and Best Practices  
### Tips for Handling Files  
- Opening and closing files are straightforward in Python.  
- Processing strings from files requires careful consideration; develop a robust strategy for data management.  
- Always ensure proper error handling when opening/closing files and processing string data.  
- Start by outlining the steps in pseudocode to have a clear direction before coding.  
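
As a sketch of what the error-handling tip can look like in practice, the hypothetical helper `read_numbers` reads one number per line, reports a missing file instead of crashing, and skips lines that are not numbers. The function name, file names, and messages are all assumptions made for illustration.

```python
import os
import tempfile


def read_numbers(path):
    """Return the numbers found one per line in path, skipping bad lines."""
    numbers = []
    try:
        with open(path, "r") as infile:
            for line in infile:
                text = line.strip()
                if text == "":
                    continue                      # ignore blank lines
                try:
                    numbers.append(float(text))
                except ValueError:
                    print(f"Skipping non-numeric line: {text!r}")
    except FileNotFoundError:
        print(f"Could not find file: {path}")
    return numbers


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "numbers.txt")    # hypothetical file name
    with open(path, "w") as outfile:
        outfile.write("1.5\nabc\n\n2\n")

    good = read_numbers(path)
    missing = read_numbers(os.path.join(tmpdir, "missing.txt"))

print(good)      # [1.5, 2.0]
print(missing)   # []
```

Catching `FileNotFoundError` and `ValueError` separately keeps "the file is not there" distinct from "the file has bad data".
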
### Example in Pseudocode  
- To read a student grade file:  


```text
Open file grades.csv
Read entire file
Create an empty dictionary
Skip header line, read the rest of the lines
Split each line by commas
Extract student name and grade
Store name and grade in a dictionary
Format output for printing
```

### Example Code  

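```python
with open("grades.csv") as myfile:
    lines = myfile.read().split("\n")

students = {}
for i in range(1, len(lines)):        # start at 1 to skip the header line
    if lines[i].strip() == "":        # skip blank lines, e.g. after the final newline
        continue
    line = lines[i].split(",")
    # Column positions as used here: 0 and 1 form the name, 2 is the key, 4 is the grade.
    students[line[2]] = {}
    students[line[2]]["name"] = line[1] + " " + line[0]
    students[line[2]]["grade"] = float(line[4])

for student in students:
    name = students[student]["name"]
    grade = students[student]["grade"]
    print(f"Name: {name:15s} Grade: {grade:6.2f}")
```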

  • Possible output includes student names and grades formatted neatly.
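
For comparison, the same steps can be sketched with the standard library's csv module, which is a different technique from the manual split used in the example. The sample rows and the column layout (last name, first name, ID, section, grade) are made up for illustration, and io.StringIO stands in for the real grades.csv file.

```python
import csv
import io

# Hypothetical sample data; the column layout is an assumption.
sample = io.StringIO(
    "Last,First,ID,Section,Grade\n"
    "Lovelace,Ada,1001,501,95.5\n"
    "Turing,Alan,1002,502,88\n"
)

reader = csv.reader(sample)
next(reader)                      # skip the header line

students = {}
for row in reader:
    if not row:                   # ignore empty lines
        continue
    students[row[2]] = {
        "name": row[1] + " " + row[0],
        "grade": float(row[4]),
    }

report = []
for info in students.values():
    report.append(f"Name: {info['name']:15s} Grade: {info['grade']:6.2f}")

for line in report:
    print(line)
```

The csv module also copes with quoted fields that contain commas, which a plain split(",") would break apart.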