Chapter 5.9 - 5.10 Summary: File Processing and Chapter Summary

  • When a statement is broken across lines inside a function call's parentheses (as with print), Python knows the statement is not finished until it reaches the final closing parenthesis.

  • It's preferable to break a long statement across two lines rather than having one really long line.

  • The format string passed to print contains two slots: one for dollars (as an integer) and one for cents.

  • Cents are printed with the specifier 0>2, where the zero pads the field with zeroes instead of spaces.

  • e.g., 10 dollars and 5 cents prints as $10.05 rather than $10. 5.
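The padding behavior described above can be checked with a short sketch. The variable names and amounts are illustrative assumptions, chosen to match the $10.05 example.

```python
# Hypothetical amounts, chosen to match the $10.05 example.
dollars = 10
cents = 5

# 0>2 means: fill with "0", right-justify (>), field width 2.
padded = "${0}.{1:0>2}".format(dollars, cents)

# A plain width of 2 pads with a space instead of a zero.
unpadded = "${0}.{1:2}".format(dollars, cents)

# A long statement may be split inside the parentheses of the call;
# Python keeps reading until the closing parenthesis.
print("The total value of your change is ${0}.{1:0>2}"
      .format(dollars, cents))
print(padded)
print(unpadded)
```

Only the fill character differs between the two format strings, which is what turns "$10. 5" into "$10.05".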

File Processing
  • File processing is a form of string processing.

Multi-line Strings
  • A file is a sequence of data stored in secondary memory (usually on a disk drive).

  • Files can contain any data type, but text files are the easiest to work with because they can be read/understood by humans.

  • Text files can be easily created and edited using text editors and word processors.

  • Python can easily convert between strings and other types, making text files flexible.

  • A text file can be thought of as a long string stored on disk.

  • Files generally contain more than one line of text.

  • A special character (or sequence) marks the end of each line.

  • Python uses the newline character \n to indicate line breaks and takes care of the different end-of-line conventions used by different operating systems.

  • Example:

    • Input:

    Hello
    World
    
    Goodbye 32
    
    • Stored in a file (or string) as: Hello\nWorld\n\nGoodbye 32\n

    • A blank line becomes a bare newline in the file/string.

    • Embedding newline characters into output strings produces multiple lines with a single print statement.

    • Evaluating a string containing newline characters in the shell displays the string with the \n escapes visible, rather than as separate lines.

    • The special characters only affect display when the string is printed.
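As a small illustration of the difference between evaluating and printing such a string, the sketch below uses repr() as a stand-in for what the interactive shell displays. The variable names are assumptions made for the example.

```python
# The four-line example from above, stored as one string.
text = "Hello\nWorld\n\nGoodbye 32\n"

# repr() produces the form the shell shows when the string is evaluated:
# the \n escapes stay visible.
shell_view = repr(text)
print(shell_view)

# print() interprets each \n as a line break, so four lines appear.
# end="" stops print from adding a fifth newline of its own.
print(text, end="")

# The blank line is nothing more than two newlines in a row.
newline_count = text.count("\n")
```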

File Manipulation Concepts:
  • Opening a file: Associating a file on disk with an object in a program.

  • Contents can then be accessed through the associated file object.

  • Operations to manipulate the file object:

    • Reading information from a file.

    • Writing new information to a file.

    • Reading and writing operations for text files are similar to interactive input/output.

  • Closing a file: Finishing up bookkeeping to maintain the correspondence between the file on disk and the file object.

    • Changes written to a file object might not appear on disk until the file is closed.

  • Opening a file in a word processor:

    • The file is read from the disk and stored into RAM.

    • The file is opened for reading, and the contents are read into memory via file-reading operations.

    • The file is then closed (in the programming sense).

    • Editing the file involves making changes to data in memory, not the file itself.

    • Changes only show up on the disk when the application "saves" it.

  • Saving a file:

    • The original file on the disk is reopened in a mode that allows it to store information (opened for writing).

    • This erases the old contents of the file.

    • File writing operations copy the current contents of the in-memory version into the new file on the disk.

    • From the program's perspective, a new file is created (with the same name), the modified contents are written into it, and then it's closed.
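The open, edit, and save cycle described above can be sketched roughly as follows. The scratch file name draft.txt and the temporary directory are assumptions made only so the example can run anywhere; a real application would use the user's own file.

```python
import os
import tempfile

# Hypothetical scratch file standing in for a document on disk.
path = os.path.join(tempfile.mkdtemp(), "draft.txt")
outfile = open(path, "w")
print("first draft", file=outfile)
outfile.close()

# "Open" in the word-processor sense: read everything into memory, then close.
infile = open(path, "r")
contents = infile.read()
infile.close()

# Editing changes only the in-memory copy.
contents = contents.replace("first", "second")

# The file on disk still holds the old text at this point.
infile = open(path, "r")
on_disk_before_save = infile.read()
infile.close()

# "Save": reopening for writing erases the old contents,
# then the in-memory version is copied out to disk.
outfile = open(path, "w")
print(contents, end="", file=outfile)
outfile.close()

infile = open(path, "r")
on_disk_after_save = infile.read()
infile.close()
```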

  • Working with text files in Python:

    • Create a file object using the open function:

    <variable> = open(<name>, <mode>)
    
    • name is a string providing the file's name on disk.

    • mode is "r" for reading or "w" for writing.

    • Example:

    infile = open("numbers.dat", "r")
    
    • Opens "numbers.dat" for reading, associating it with the file object infile.

  • Python's file reading operations:

    • <file>.read(): Returns the entire remaining contents of the file as a single (potentially large, multi-line) string.

    • <file>.readline(): Returns the next line of the file, including the newline character.

    • <file>.readlines(): Returns a list of the remaining lines in the file, each including the newline character at the end.
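To see how the three reading operations differ, the following sketch writes a small sample file and then reads it back. The file name sample.txt and the temporary directory are assumptions for the sake of a runnable example.

```python
import os
import tempfile

# Hypothetical sample file containing the four-line example text.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
outfile = open(path, "w")
print("Hello\nWorld\n\nGoodbye 32", file=outfile)
outfile.close()

infile = open(path, "r")
first = infile.readline()    # next line, newline included
rest = infile.readlines()    # list of the remaining lines
leftover = infile.read()     # nothing remains, so an empty string
infile.close()

# Reopening starts again from the beginning of the file.
infile = open(path, "r")
whole = infile.read()        # the entire file as one string
infile.close()
```

Each operation picks up where the previous one stopped, which is why read() returns an empty string once readlines() has consumed the rest of the file.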

  • Example program to print a file's content to the screen using read operation:

  # printfile.py
  # Prints a file to the screen.
  def main():
      fname = input("Enter filename: ")
      infile = open(fname, "r")
      data = infile.read()
      infile.close()  # done reading, so release the file
      print(data)
  main()
  • The program prompts for a filename, opens it for reading, reads the entire content as one string, and prints it.

  • The readline operation reads the next line from a file.

    • Successive calls get successive lines.

    • Analogous to input(), but readline keeps the newline character, while input() discards it.

  • Code to print the first five lines of a file:

  infile = open(someFile, "r")
  for i in range(5):
      line = infile.readline()
      print(line[:-1])
  • Slicing [:-1] removes the newline character at the end of the line.

  • Alternatively, print(line, end="") prevents print from adding its own newline character.
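One caveat with the [:-1] slice is that it removes the last character whether or not that character is a newline. The sketch below, which assumes a hypothetical two-line file whose final line has no trailing newline, compares the slice with the rstrip string method as a more forgiving alternative.

```python
import os
import tempfile

# Hypothetical file whose last line lacks a trailing newline.
path = os.path.join(tempfile.mkdtemp(), "short.txt")
outfile = open(path, "w")
print("alpha\nbeta", end="", file=outfile)
outfile.close()

infile = open(path, "r")
sliced = []
stripped = []
for i in range(3):  # deliberately ask for more lines than the file has
    line = infile.readline()            # returns "" once the file is exhausted
    sliced.append(line[:-1])            # always drops the last character
    stripped.append(line.rstrip("\n"))  # drops only a trailing newline
infile.close()
```

With this file the slice turns "beta" into "bet", while rstrip("\n") leaves it intact.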

  • Looping through the entire file content using readlines:

  infile = open(someFile, "r")
  for line in infile.readlines():
      pass  # placeholder: process the line here
  infile.close()
  • Drawback: reading the entire file into a list at once may take up too much RAM for very large files.

  • Python treats the file itself as a sequence of lines for direct iteration:

  infile = open(someFile, "r")
  for line in infile:
      pass  # placeholder: process the line here
  infile.close()
  • Handy way to process the lines of a file one at a time.

  • Opening a file for writing ("w") prepares it to receive data.

    • If the file doesn't exist, a new file is created.

    • If the file exists, Python will delete it and create a new, empty file.

    • Make sure you don't clobber any files you will need later!

  • Example:

    outfile = open("mydata.out", "w")
    
  • The print function writes information into a text file.

  • Add a file keyword parameter:

    print(..., file=<outputFile>)

  • This sends the output to outputFile instead of the screen.
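A minimal sketch of writing with print and then reading the result back is shown here. The temporary directory is an assumption to keep the example from touching real files, and the values written are made up.

```python
import os
import tempfile

# Hypothetical output file placed in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "mydata.out")

outfile = open(path, "w")
# print behaves as it does on screen: spaces between items, newline at the end.
print("Total:", 42, file=outfile)
# The usual keyword parameters such as sep still apply.
print("a", "b", "c", sep=",", file=outfile)
outfile.close()  # closing makes sure the data reaches the disk

infile = open(path, "r")
written = infile.read()
infile.close()
```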

Example Program: Batch Usernames
  • Generates usernames from a file of names in batch mode.

  • Input file: each line contains the first and last names of a new user, separated by spaces.

  • Output file: contains a line for each generated username.

  • Example:

  # userfile.py
  # Program to create a file of usernames in batch mode.
  def main():
      print("This program creates a file of usernames from a")
      print(" file of names.")

      # get the file names
      infileName = input("What file are the names in? ")
      outfileName = input("What file should the usernames go in? ")

      # open the files
      infile = open(infileName, "r")
      outfile = open(outfileName, "w")

      # process each line of the input file
      for line in infile:
          # get the first and last names from line
          first, last = line.split()

          # create the username
          uname = (first[0] + last[:7]).lower()

          # write it to the output file
          print(uname, file=outfile)

      # close both files
      infile.close()
      outfile.close()

      print("Usernames have been written to", outfileName)
  main()
  • The program opens two files simultaneously (one for input, one for output).

  • The lower() string method is applied to the concatenated string to ensure all usernames are lowercase.
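The username rule can be tried in isolation with the sketch below. The helper name make_username and the sample names are assumptions introduced for illustration; the rule itself is the one used in userfile.py.

```python
def make_username(line):
    """Build a username from one 'First Last' line (hypothetical helper)."""
    first, last = line.split()  # split() also discards the trailing newline
    return (first[0] + last[:7]).lower()

# Lines as they would arrive from iterating over the input file.
sample_lines = ["Zaphod Beeblebrox\n", "Ada Lovelace\n"]
usernames = []
for line in sample_lines:
    usernames.append(make_username(line))

# A line with three words does not fit the two-name unpacking
# and raises ValueError.
try:
    make_username("Mary Jane Watson\n")
    three_names_ok = True
except ValueError:
    three_names_ok = False
```

The last check shows a limitation of the simple two-variable unpacking: input lines are assumed to contain exactly two names.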

File Dialogs (Optional)
  • A common issue in file manipulation programs is how to specify which file to use.

  • If a data file is in the same directory as the program, simply type the correct file name.

    • Python will look for it in the "current" directory.

  • Operating systems use file names like <name>.<type>, where type is a 3- or 4-letter extension describing the data.

    • e.g., users.txt where .txt indicates a text file.

  • Some operating systems (Windows, macOS) by default show only the name part and hide the extension, which can make it hard to work out the full file name.

  • If the file is not in the current directory, the complete path must be specified.

    • Windows example: C:/users/susan/Documents/Python_Programs/users.txt

  • Users may not know the complete path+filename.

  • Solution: allow users to visually browse the file system using a dialog box.

  • The tkinter GUI library provides functions for creating dialog boxes for getting file names.

    • askopenfilename: asks the user for the name of a file to open.

    from tkinter.filedialog import askopenfilename
    infileName = askopenfilename()
    
    • asksaveasfilename: asks the user for the name of a file for saving.

    from tkinter.filedialog import asksaveasfilename
    outfileName = asksaveasfilename()
    
  • Both functions have optional parameters to customize the dialog (title, default file name).

Strings Are Sequences Of Characters
  • String literals can be put in single or double quotes.

  • Strings and lists can be manipulated with the built-in sequence operations.

    • Concatenation (+)

    • Repetition (*)

    • Indexing ([])

    • Slicing ([ : ])

    • Length (len())
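The sequence operations listed above work the same way on strings and lists, as this short sketch with made-up values suggests.

```python
word = "spam"
items = [1, 2, 3]

joined = word + "eggs"      # concatenation
doubled = items * 2         # repetition
first_char = word[0]        # indexing from the front
last_item = items[-1]       # indexing from the end
middle = word[1:3]          # slicing: positions 1 and 2
front = items[:2]           # slicing: the first two items
sizes = (len(word), len(items))

print(joined, doubled, first_char, last_item, middle, front, sizes)
```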

  • A for loop can be used to iterate through the characters of a string, items in a list, or lines of a file.

  • One way of converting numeric information into string information is to use a string or a list as a lookup table.

  • Lists are more general than strings.

    • Strings are always sequences of characters, whereas lists can contain values of any type.

    • Lists are mutable, which allows the items in a list to be modified by assigning new values.
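The difference between mutable lists and immutable strings can be seen in a brief sketch; the values are made up for illustration.

```python
# A list can hold values of any type and can be changed in place.
mixed = [1, "two", 3.0]
values = [10, 20, 30]
values[1] = 99

# Assigning to a position in a string is not allowed and raises TypeError.
name = "spam"
try:
    name[0] = "S"
    string_changed = True
except TypeError:
    string_changed = False
```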

  • Strings are represented in the computer as numeric codes.

    • ASCII and Unicode are compatible standards that are used for specifying the correspondence between characters and the underlying codes.

    • Python provides the ord and chr functions for translating between Unicode codes and characters.
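A quick sketch of translating in both directions with ord and chr follows; the sample message is an assumption.

```python
code = ord("A")   # character to numeric code
char = chr(97)    # numeric code to character

# Encode a short message as a list of codes, then decode it again.
message = "Hi!"
codes = []
for ch in message:
    codes.append(ord(ch))

decoded = ""
for number in codes:
    decoded = decoded + chr(number)

print(code, char, codes, decoded)
```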

  • Python string and list objects include many useful built-in methods for string and list processing.

  • The process of encoding data to keep it private is called encryption.

    • There are two different kinds of encryption systems: private key and public key.

  • Program input and output often involve string processing.

    • Python provides numerous operators for converting back and forth between numbers and strings.

    • The string formatting method (format) is particularly useful for producing nicely formatted output.

  • Text files are multi-line strings stored in secondary memory.

    • A text file may be opened for reading or writing. When opened for writing, the existing contents of the file are erased.

    • Python provides three file reading methods: read(), readline(), and readlines().

    • It is also possible to iterate through the lines of a file with a for loop.

    • Data is written to a file using the print function. When processing is finished, a file should be closed.