Thus far in the course, all of the storage we have used has been temporary. Data stored in variables lives in RAM (random access memory). The RAM used by a program is released back to the operating system when the program terminates, and all data stored there is lost. In other words, data in RAM is volatile. In this chapter we begin working with files. Data stored in files is persistent: it remains accessible even after the program has terminated. The original program, or another program, can access data stored in a file days, months, or years after it was created.
Attributes of File Input & Output
The following are some of the most relevant attributes of file I/O.
- open - Use the built-in open() function to open files. open() returns a file object that can then be used to read from or write to the file. The general form is open(file_name[, access_mode][, buffering]). The file_name is the name of the file that you want to open. See the table below for the access_mode options; 'r' is the default. The buffering argument sets the buffering policy. If buffering is set to 1, the file is line buffered (in Python 3 this applies to text mode only). If set to 0, no buffering takes place (in Python 3 this is allowed only in binary mode). A value greater than 1 gives the approximate buffer size in bytes. If set to a negative value, the buffer size is chosen by the system, which is also the default when no buffering value is supplied. Both access_mode and buffering are optional.
- binary - Binary operations are specified by the b character in the file access mode. Examples of binary file types include images, PDFs, word processing documents, spreadsheets, BLOBs (Binary Large Objects), and executables. Each binary file type requires its own handling, which is typically provided by an imported module. For example, the PyPDF2 and python-docx modules are used to process PDF and MS Word files respectively.
The table below contains the access modes that can be supplied as the second argument to the open() function.
|File Open Access Modes|
|Mode|Description|
|r|Read only. File pointer placed at the beginning. Default.|
|rb|Same as 'r' but in binary form.|
|r+|Read & write. File pointer placed at the beginning.|
|rb+|Same as 'r+' but in binary form.|
|w|Write only. Overwrites if exists. Creates if does not exist.|
|wb|Same as 'w' but in binary form.|
|w+|Write & read. Overwrites if exists. Creates if does not exist.|
|wb+|Same as 'w+' but in binary form.|
|a|Appends. Creates if does not exist.|
|ab|Same as 'a' but in binary form.|
|a+|Appends and reads. Creates if does not exist.|
|ab+|Same as 'a+' but in binary form.|
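To make the table concrete, here is a minimal sketch that exercises the 'w', 'a', 'r', and 'rb' modes on one small file. The file name modes_demo.txt and the temporary directory are assumptions chosen for illustration only.

```python
import os
import tempfile

# Hypothetical file inside a throwaway directory (an assumption for this sketch).
work_dir = tempfile.mkdtemp()
path = os.path.join(work_dir, "modes_demo.txt")

# 'w' creates the file, or overwrites it if it already exists.
f = open(path, "w")
f.write("first\n")
f.close()

# 'a' adds new data after the existing content.
f = open(path, "a")
f.write("second\n")
f.close()

# 'r' is the default mode, so it can be omitted when reading text.
f = open(path)
text = f.read()
f.close()

# 'rb' returns the same content as a bytes object instead of a string.
f = open(path, "rb")
raw = f.read()
f.close()

# Opening with 'w' a second time discards everything written earlier.
f = open(path, "w")
f.write("replaced\n")
f.close()

f = open(path, "r")
after_overwrite = f.read()
f.close()

print(repr(text))
print(repr(raw))
print(repr(after_overwrite))
```

The difference between 'w' and 'a' is the one most likely to cause surprises: reopening a file with 'w' silently erases it, while 'a' preserves what is already there.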
- close - Use the close() method to flush any unwritten content from the file buffer and close the file object.
- write - The write(string) method writes data (string) to a file.
- read - The read([count]) method reads data from a file. If 'count' is supplied, then at most that number of characters (or bytes, in binary mode) will be read. Without count, the default is to read until the end of the file.
- readline - The readline([count]) method reads a line from a file. If 'count' is supplied, then a maximum of that number of characters (or bytes, in binary mode) will be read.
- tell - The tell() method reports the current file pointer position in a file.
- seek - Use the seek(offset[, from]) method to change the file pointer position. 'offset' is the number of bytes to move. In Python 3, files opened in text mode generally allow only seeks relative to the beginning of the file, so open the file in binary mode to use the other reference points. The settings for 'from' are:
- 0: the beginning of the file is the reference point (the default)
- 1: the current position is the reference point
- 2: the end of the file is the reference point
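- rename - The os.rename(current_name, new_name) function changes a file's name from current_name to new_name. It is part of the os module rather than the file object.
- remove - The os.remove(file_name) function deletes the file. It is also part of the os module.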
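The methods in the list above can be sketched together in one short program. This is an illustrative example rather than a definitive listing: the file names notes.txt and notes_renamed.txt and the temporary directory are assumptions, and the file is opened in binary mode so that every offset is a plain byte count. Note that rename and remove are called through the os module.

```python
import os
import tempfile

# Hypothetical file names in a throwaway directory (assumptions for this sketch).
work_dir = tempfile.mkdtemp()
old_path = os.path.join(work_dir, "notes.txt")
new_path = os.path.join(work_dir, "notes_renamed.txt")

# Write three lines. Binary mode keeps the byte layout identical on every platform.
f = open(old_path, "wb")
f.write(b"alpha\nbeta\ngamma\n")
f.close()  # close() flushes the buffer to disk

f = open(old_path, "rb")
first_three = f.read(3)       # read exactly 3 bytes
position = f.tell()           # the pointer has advanced by 3
rest_of_line = f.readline()   # read from the pointer to the end of the line

f.seek(0)                     # from defaults to 0: back to the beginning
first_line = f.readline()

f.seek(2, 1)                  # from = 1: skip 2 bytes past the current position
two_bytes = f.read(2)

f.seek(-6, 2)                 # from = 2: 6 bytes before the end of the file
last_line = f.read()          # no count: read to the end of the file
f.close()

# rename and remove operate on file names, not on file objects.
os.rename(old_path, new_path)
renamed = os.path.exists(new_path) and not os.path.exists(old_path)
os.remove(new_path)
removed = not os.path.exists(new_path)

print(first_three, position, rest_of_line)
print(first_line, two_bytes, last_line)
print(renamed, removed)
```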
Functions from the 'os' (operating system) module can be used to work with directories. Include the 'import os' statement to use them.
- mkdir - The mkdir("new_directory") function makes a directory named by the string supplied.
- chdir - The chdir("/users/bob/new_directory") function changes the current working directory.
- getcwd - The getcwd() function returns the current working directory.
- rmdir - The rmdir("/users/bob/old_directory") function removes the directory supplied. The directory must be empty.
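The four directory functions can be tried out safely with a short sketch like the one below. The temporary base directory is an assumption made so the example does not touch real folders such as /users/bob.

```python
import os
import tempfile

# Remember where we started so the working directory can be restored.
start_dir = os.getcwd()

# Hypothetical location: a fresh temporary directory (an assumption for this sketch).
base_dir = tempfile.mkdtemp()
new_dir = os.path.join(base_dir, "new_directory")

os.mkdir(new_dir)         # create the directory
os.chdir(new_dir)         # make it the current working directory
inside = os.getcwd()      # report where we are now
print(inside)

os.chdir(start_dir)       # step back out before removing it
os.rmdir(new_dir)         # works because the directory is empty
removed = not os.path.isdir(new_dir)
print(removed)
```

Changing back to the starting directory before calling rmdir() matters: some operating systems refuse to remove a directory that is still the current working directory.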
In the code example below, a few of the file methods are demonstrated.
Here's the output.
Modules enable programmers to segment their code into smaller, more manageable pieces and to more easily reuse those code sections across multiple programs. This ability to 'modularize' code means that modules can be written once and reused many times.
Attributes of Modules
The following are some of the most commonly relevant attributes of modules.
- Import - Use the import statement to include the module in your program. Only include the file name and not the '.py' extension.
- Reuse - Modules can be imported by multiple programs.
- Calling Functions - In the importing program, call functions in the module by using the module name, followed by the dot operator, followed by the function name (e.g., my_module.f1()).
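- Module Locations - Python will check for modules in the following locations, in order:
- in the directory containing the running program (or the current directory in an interactive session)
- in the directories listed in the PYTHONPATH environment variable
- in the installation-dependent default path, for example /usr/local/lib/python on Linux or C:\pythonx.x\lib on Windows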
- Change Propagation - Changes made to the module affect all programs that import the module.
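The search locations described above can be inspected at run time. As a minimal sketch, the standard sys.path list holds every directory Python will search, and the PYTHONPATH environment variable (which may be unset on your machine) contributes any extra entries. The variable names below are chosen for illustration.

```python
import os
import sys

# sys.path lists the directories Python searches, in order, when importing.
for location in sys.path:
    print(location)

# PYTHONPATH may hold several directories joined by the platform's
# path separator (':' on Linux and macOS, ';' on Windows). It is often unset.
raw_setting = os.environ.get("PYTHONPATH", "")
pythonpath_entries = [entry for entry in raw_setting.split(os.pathsep) if entry]
print(pythonpath_entries)

# A loaded module records the file it came from, which shows
# which search location supplied it.
print(os.__file__)
```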
The code example below demonstrates importing a module named 'my_module' and using the module's functions 'f1()' and 'f2()'.
This is the module. Notice on line 24 a print statement is included to show that the file is being included as a module and not being run as the main program.
This is the program that uses the module.
Here's the output. Notice that the first line in the output is from line 24 of my_module.py which demonstrates that the file has been loaded as a module and is not being run as 'main()'.
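A comparable module and importing program can be sketched in a single runnable file. This is not the course's original listing: the bodies of f1() and f2(), the printed messages, and the use of a temporary directory to hold my_module.py are all assumptions made for illustration. The sketch writes the module to disk, adds its directory to the search path, and then imports it.

```python
import os
import sys
import tempfile

# Source for the module. The function bodies and messages are hypothetical.
module_source = '''
def f1():
    return "f1 called"

def f2(name):
    return "f2 called with " + name

# __name__ is "__main__" only when this file is run directly.
if __name__ == "__main__":
    print("my_module is running as the main program")
else:
    print("my_module has been loaded as a module")
'''

# Save the module as my_module.py in a throwaway directory.
module_dir = tempfile.mkdtemp()
f = open(os.path.join(module_dir, "my_module.py"), "w")
f.write(module_source)
f.close()

# Make that directory one of the locations Python searches.
sys.path.insert(0, module_dir)

# Import by file name, without the '.py' extension.
import my_module

# Call the functions with the module name, the dot operator, and the function name.
result1 = my_module.f1()
result2 = my_module.f2("Bob")
print(result1)
print(result2)
```

The message from the else branch is printed once, at import time, before either function is called, because Python executes a module's top-level code the first time it is imported.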