Chapter 5.9 - 5.10 Summary: File Processing and Chapter Summary
When a statement is broken in the middle of a function call (like print), Python knows the statement isn't finished until the final closing parenthesis is reached.
It's preferable to break a long statement across two lines rather than having one really long line.
The format string passed to print contains slots for dollars (as an integer) and cents.
Cents are printed with the specifier 0>2, where the zero pads the field with zeroes instead of spaces.
e.g., 10 dollars and 5 cents prints as $10.05 rather than $10. 5.
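A minimal sketch of the 0>2 specifier in action. The function names and the idea of starting from a total number of cents are assumptions made for this illustration, not part of the chapter's program.

```python
def format_money(total_cents):
    """Return a string such as $10.05 (hypothetical helper)."""
    dollars, cents = divmod(total_cents, 100)
    # {0} is the dollars slot; {1:0>2} right-aligns the cents in a field
    # of width 2 and fills the unused position with "0".
    return "${0}.{1:0>2}".format(dollars, cents)


def format_money_unpadded(total_cents):
    """Same layout with the default space fill, for comparison."""
    dollars, cents = divmod(total_cents, 100)
    return "${0}.{1:>2}".format(dollars, cents)


print(format_money(1005))           # $10.05
print(format_money_unpadded(1005))  # $10. 5
```

Comparing the two results shows why the zero fill matters for single-digit cent values.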
File Processing
File processing is a form of string processing.
Multi-line Strings
A file is a sequence of data stored in secondary memory (usually on a disk drive).
Files can contain any data type, but text files are the easiest to work with because they can be read/understood by humans.
Text files can be easily created and edited using text editors and word processors.
Python can easily convert between strings and other types, making text files flexible.
A text file can be thought of as a long string stored on disk.
Files generally contain more than one line of text.
A special character (or sequence) marks the end of each line.
Python uses the newline character \n to indicate line breaks, handling different end-of-line conventions.
Example:
Input:
Hello
World

Goodbye 32
Stored in a file (or string) as:
Hello\nWorld\n\nGoodbye 32\n
A blank line becomes a bare newline in the file/string.
Embedding newline characters into output strings produces multiple lines with a single print call.
Evaluating a string containing newline characters in the shell displays the \n escapes themselves rather than line breaks.
The special characters only affect display when the string is printed.
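The difference between printing a string and evaluating it can be sketched as follows. The variable names are chosen for this illustration only.

```python
text = "Hello\nWorld\n\nGoodbye 32\n"

# print interprets each \n as a line break, so one call produces four
# lines of output (the third one blank).
print(text, end="")

# repr gives the form the shell shows when the string is evaluated:
# the \n escapes stay visible.
shell_view = repr(text)
print(shell_view)

# The newline characters can also be counted or used to split the text.
newline_count = text.count("\n")
lines = text.splitlines()
print(newline_count, lines)
```

The empty string in the list of lines corresponds to the blank line, which is stored as a bare newline.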
File Manipulation Concepts:
Opening a file: Associating a file on disk with an object in a program.
Contents can then be accessed through the associated file object.
Operations to manipulate the file object:
Reading information from a file.
Writing new information to a file.
Reading and writing operations for text files are similar to interactive input/output.
Closing a file: Finishing up bookkeeping to maintain the correspondence between the file on disk and the file object.
Changes written to a file object might not appear on disk until the file is closed.
Opening a file in a word processor:
The file is read from the disk and stored into RAM.
The file is opened for reading, and the contents are read into memory via file-reading operations.
The file is then closed (in the programming sense).
Editing the file involves making changes to data in memory, not the file itself.
Changes only show up on the disk when the application "saves" it.
Saving a file:
The original file on the disk is reopened in a mode that allows it to store information (opened for writing).
This erases the old contents of the file.
File writing operations copy the current contents of the in-memory version into the new file on the disk.
From the program's perspective, a new file is created (with the same name), the modified contents are written into it, and then it's closed.
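The load, edit, and save cycle described above might look roughly like this in Python. The helper name edit_and_save, the file name letter.txt, and its contents are all invented for the sketch; a real word processor does far more.

```python
import os
import tempfile


def edit_and_save(path, old, new):
    """Mimic the load / edit-in-memory / save cycle (hypothetical helper)."""
    # "Open" the document: read everything into memory, then close the file.
    infile = open(path, "r")
    contents = infile.read()
    infile.close()

    # Editing changes only the in-memory copy; the file on disk is untouched.
    contents = contents.replace(old, new)

    # "Save": reopening with "w" erases the old contents, then the
    # in-memory version is written out and the file is closed.
    outfile = open(path, "w")
    print(contents, end="", file=outfile)
    outfile.close()
    return contents


# Demonstration on a throwaway file in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "letter.txt")
outfile = open(demo_path, "w")
print("Dear Sir,", file=outfile)
print("Hello.", file=outfile)
outfile.close()

edit_and_save(demo_path, "Sir", "Susan")

infile = open(demo_path, "r")
saved = infile.read()
infile.close()
print(saved, end="")
```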
Working with text files in Python:
Create a file object using the open function:
<variable> = open(<name>, <mode>)
name is a string providing the file's name on disk.
mode is "r" for reading or "w" for writing.
Example:
infile = open("numbers.dat", "r")
Opens "numbers.dat" for reading, associating it with the file object infile.
Python's file reading operations:
<file>.read(): Returns the entire remaining contents of the file as a single (potentially large, multi-line) string.
<file>.readline(): Returns the next line of the file, including the newline character.
<file>.readlines(): Returns a list of the remaining lines in the file, each including the newline character at the end.
Example program to print a file's contents to the screen using the read operation:
# printfile.py
# Prints a file to the screen.
def main():
    fname = input("Enter filename: ")
    infile = open(fname, "r")
    data = infile.read()   # the whole file as one string
    infile.close()         # finished reading, so release the file
    print(data)

main()
The program prompts for a filename, opens it for reading, reads the entire content as one string, and prints it.
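To see how the three reading operations differ, the sketch below builds a small sample file and reads it back. The file name sample.txt and its three lines are made up for the demonstration.

```python
import os
import tempfile

# Build a small sample file to read back.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
outfile = open(sample_path, "w")
print("one", file=outfile)
print("two", file=outfile)
print("three", file=outfile)
outfile.close()

infile = open(sample_path, "r")
first = infile.readline()    # the next line, newline included
rest = infile.readlines()    # a list of the remaining lines
leftover = infile.read()     # nothing remains, so an empty string
infile.close()

infile = open(sample_path, "r")
everything = infile.read()   # the whole file as a single string
infile.close()

print(repr(first), rest, repr(leftover), repr(everything))
```

Each operation picks up where the previous one stopped, which is why read returns an empty string once readlines has consumed the rest of the file.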
The readline operation reads the next line from a file.
Successive calls get successive lines.
Analogous to input(), but readline keeps the newline character, while input() discards it.
Code to print the first five lines of a file:
infile = open(someFile, "r")
for i in range(5):
    line = infile.readline()
    print(line[:-1])   # slice off the trailing newline
infile.close()
Slicing [:-1] removes the newline character at the end of the line.
Alternatively, print(line, end="") prevents print from adding its own newline character.
Looping through the entire file content using readlines:
infile = open(someFile, "r")
for line in infile.readlines():
    # process the line here
    pass  # placeholder so the loop body is valid Python
infile.close()
Drawback: reading the entire file into a list at once may take up too much RAM for very large files.
Python treats the file itself as a sequence of lines for direct iteration:
infile = open(someFile, "r")
for line in infile:
    # process the line here
    pass  # placeholder so the loop body is valid Python
infile.close()
Handy way to process the lines of a file one at a time.
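A short sketch of line-by-line processing follows. It also compares the [:-1] slice with rstrip("\n"), a standard string method that the notes above do not cover and that is brought in here as an alternative. The sample file, whose last line deliberately lacks a trailing newline, is invented for the demonstration.

```python
import os
import tempfile

# Sample file whose last line has no trailing newline.
path = os.path.join(tempfile.mkdtemp(), "names.txt")
outfile = open(path, "w")
print("alpha", file=outfile)
print("beta", end="", file=outfile)
outfile.close()

sliced = []
stripped = []
infile = open(path, "r")
for line in infile:                      # one line at a time, no big list
    sliced.append(line[:-1])             # always drops the last character
    stripped.append(line.rstrip("\n"))   # drops only a trailing newline
infile.close()

print(sliced)
print(stripped)
```

The slice loses a real character when the final line does not end in a newline, so rstrip("\n") can be the safer choice for files of unknown origin.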
Opening a file for writing ("w") prepares it to receive data.
If the file doesn't exist, a new file is created.
If the file exists, Python will delete it and create a new, empty file.
Make sure you don't clobber files you will need later!
Example:
outfile = open("mydata.out", "w")
The print function can write information into a text file.
Add a file keyword parameter:
print(..., file=<outputFile>)
This sends the output to outputFile instead of the screen.
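The following sketch shows both points: opening with "w" discards existing contents, and print with the file keyword writes text exactly as it would appear on screen. The temporary directory and the values written are assumptions for the demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mydata.out")

# First write: the file does not exist yet, so "w" creates it.
outfile = open(path, "w")
print("old contents", file=outfile)
outfile.close()

# Second write: "w" starts from an empty file, so the old line is gone.
outfile = open(path, "w")
print("count:", 3, file=outfile)
print("total: {0:0.2f}".format(4.5), file=outfile)
outfile.close()

infile = open(path, "r")
contents = infile.read()
infile.close()
print(contents, end="")
```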
Example Program: Batch Usernames
Generates usernames from a file of names in batch mode.
Input file: each line contains the first and last names of a new user, separated by spaces.
Output file: contains a line for each generated username.
Example:
# userfile.py
# Program to create a file of usernames in batch mode.
def main():
    print("This program creates a file of usernames from a")
    print(" file of names.")

    # get the file names
    infileName = input("What file are the names in? ")
    outfileName = input("What file should the usernames go in? ")

    # open the files
    infile = open(infileName, "r")
    outfile = open(outfileName, "w")

    # process each line of the input file
    for line in infile:
        # get the first and last names from line
        first, last = line.split()
        # create the username
        uname = (first[0] + last[:7]).lower()
        # write it to the output file
        print(uname, file=outfile)

    # close both files
    infile.close()
    outfile.close()

    print("Usernames have been written to", outfileName)

main()
The program opens two files simultaneously (one for input, one for output).
The lower() string method is applied to the concatenated string to ensure all usernames are lowercase.
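The username rule can be isolated in a small function for experimenting. This is a sketch, not the program's own structure: the name make_username is hypothetical, and treating the last word on the line as the surname is an assumption added here, because the two-variable unpacking in the program raises ValueError on lines with middle names or on blank lines.

```python
def make_username(line):
    """Build a username from a 'first last' line (hypothetical helper)."""
    parts = line.split()
    if len(parts) < 2:
        raise ValueError("expected at least a first and a last name")
    # Assumption: the first word is the given name, the last word the surname.
    first, last = parts[0], parts[-1]
    # First initial plus up to seven letters of the last name, lowercased.
    return (first[0] + last[:7]).lower()


print(make_username("Zaphod Beeblebrox\n"))  # zbeebleb
print(make_username("Mary Ann Smith\n"))     # msmith
```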
File Dialogs (Optional)
Programs that manipulate files often need the user to specify which file to use.
If a data file is in the same directory as the program, simply type the correct file name.
Python will look for it in the "current" directory.
Operating systems use file names like <name>.<type>, where type is a 3- or 4-letter extension describing the data.
e.g., users.txt, where .txt indicates a text file.
Some operating systems (Windows, macOS) show only the name part by default, hiding the extension and making it hard to know the full file name.
If the file is not in the current directory, the complete path must be specified.
Windows example:
C:/users/susan/Documents/Python_Programs/users.txt
Users may not know the complete path+filename.
Solution: allow users to visually browse the file system using a dialog box.
The tkinter GUI library provides functions for creating dialog boxes for getting file names.
askopenfilename: asks the user for the name of a file to open.
from tkinter.filedialog import askopenfilename
infileName = askopenfilename()
asksaveasfilename: asks the user for the name of a file for saving.
from tkinter.filedialog import asksaveasfilename
outfileName = asksaveasfilename()
Both functions have optional parameters to customize the dialog (title, default file name).
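A possible way to wrap the open dialog is sketched below. The function name choose_input_file, the dialog title, and the file-type filter are illustrative choices. Hiding the root window and returning None on cancel are conventions assumed here, not something the notes prescribe.

```python
def choose_input_file():
    """Ask for a file with a dialog; return None if cancelled (sketch)."""
    # Imported inside the function so this module still loads on systems
    # where tkinter or a graphical display is unavailable.
    import tkinter
    from tkinter.filedialog import askopenfilename

    root = tkinter.Tk()
    root.withdraw()  # hide the empty main window behind the dialog
    name = askopenfilename(
        title="Choose the names file",
        filetypes=[("Text files", "*.txt"), ("All files", "*.*")],
    )
    root.destroy()
    # The dialog returns an empty value when the user cancels.
    return name if name else None


# Typical use (needs a graphical desktop, so it is left commented out):
# infileName = choose_input_file()
# if infileName is not None:
#     infile = open(infileName, "r")
```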
Strings Are Sequences Of Characters
String literals can be put in single or double quotes.
Strings and lists can be manipulated with the built-in sequence operations.
Concatenation (+)
Repetition (*)
Indexing ([])
Slicing ([ : ])
Length (len())
A for loop can be used to iterate through the characters of a string, items in a list, or lines of a file.
One way of converting numeric information into string information is to use a string or a list as a lookup table.
Lists are more general than strings.
Strings are always sequences of characters, whereas lists can contain values of any type.
Lists are mutable, allowing the items in a list to be modified by assigning new values.
Strings are represented in the computer as numeric codes.
ASCII and Unicode are compatible standards that are used for specifying the correspondence between characters and the underlying codes.
Python provides the ord and chr functions for translating between Unicode codes and characters.
Python string and list objects include many useful built-in methods for string and list processing.
The process of encoding data to keep it private is called encryption.
There are two different kinds of encryption systems: private key and public key.
Program input and output often involve string processing.
Python provides numerous functions (such as int, float, and str) for converting back and forth between numbers and strings.
The string formatting method (format) is particularly useful for producing nicely formatted output.
Text files are multi-line strings stored in secondary memory.
A text file may be opened for reading or writing. When opened for writing, the existing contents of the file are erased.
Python provides three file reading methods: read(), readline(), and readlines().
It is also possible to iterate through the lines of a file with a for loop.
Data is written to a file using the print function. When processing is finished, a file should be closed.
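Two of the summary points, the lookup table and the ord/chr pair, can be sketched in a few lines. The function names and the month string are illustrative assumptions rather than code quoted from the chapter.

```python
# A string used as a lookup table: each month occupies three characters.
months = "JanFebMarAprMayJunJulAugSepOctNovDec"


def month_abbrev(n):
    """Return the three-letter abbreviation for month n (1 to 12)."""
    pos = (n - 1) * 3
    return months[pos:pos + 3]   # slicing picks out one entry


def encode(message):
    """Turn a string into a list of Unicode code numbers."""
    return [ord(ch) for ch in message]


def decode(codes):
    """Turn a list of Unicode code numbers back into a string."""
    return "".join(chr(c) for c in codes)


print(month_abbrev(3))       # Mar
print(encode("Hi"))          # [72, 105]
print(decode([72, 105]))     # Hi
```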