File I/O & Modules
Foreword
Thus far in the course, all of the storage we have used has been temporary. Data stored in variables lives in RAM (random access memory). RAM is volatile: when a program terminates, the RAM it used is released back to the operating system and all data stored there is lost. In this chapter we begin working with files. Data stored in files is persistent. That is, data in files remains accessible even after the program has terminated. The original program, or another program, can access data stored in a file days, months, or years after it was created.
Attributes of File Input & Output
The following are some of the most relevant attributes of file I/O.
- open - Use the built-in open() function to open files. open() returns a file object that can then be used to read from or write to the file. The general form is: open(file_name[, access_mode][, buffering]). The file_name is the name (or path) of the file that you want to open. See the table below for the access_mode options; 'r' is the default. The buffering argument controls how file data is buffered: 0 turns buffering off (allowed only in binary mode), 1 selects line buffering (text mode only), a value greater than 1 sets the buffer size in bytes, and a negative value uses the system default, which is also what you get if no buffering value is supplied. Both access_mode and buffering are optional.
The table below contains the access modes that can be supplied as the second argument to open().
File Open Access Modes
Mode | Description
r | Read only. File pointer placed at the beginning. The file must exist. Default.
rb | Same as 'r' but in binary form.
r+ | Read & write. File pointer placed at the beginning. The file must exist.
rb+ | Same as 'r+' but in binary form.
w | Write only. Erases the contents if the file exists. Creates the file if it does not exist.
wb | Same as 'w' but in binary form.
w+ | Write & read. Erases the contents if the file exists. Creates the file if it does not exist.
wb+ | Same as 'w+' but in binary form.
a | Append (write at the end of the file). Creates the file if it does not exist.
ab | Same as 'a' but in binary form.
a+ | Append and read. Creates the file if it does not exist.
ab+ | Same as 'a+' but in binary form.
- binary - Binary mode is specified by the b character in the access mode. In binary mode, data is read and written as bytes objects rather than strings. Examples of binary file types include images, PDFs, word processing documents, spreadsheets, BLOBs (Binary Large Objects), and executables. Each binary file format has its own internal structure, so processing it usually requires special handling, typically provided by an imported module. For example, the PyPDF2 and python-docx packages are used to process PDF and MS Word files, respectively.
- close - Use the close() method to flush any unwritten content from the file buffer and close the file object.
- write - The write(string) method writes a string to a file (a bytes object in binary mode). It does not add a newline automatically, so include '\n' where one is needed.
- read - The read([count]) method reads data from a file. If count is supplied, at most that many characters (text mode) or bytes (binary mode) are read. Without count, read() reads until the end of the file.
- readline - The readline([count]) method reads one line from a file, including its trailing newline character. If count is supplied, at most that many characters (or bytes, in binary mode) are read.
- tell - The tell() method reports the current file pointer position in a file.
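- seek - Use the seek(offset[, from]) method to change the file pointer position. offset is the number of bytes to move. The from argument (called whence in the Python documentation) is optional and defaults to 0. In text mode, seeking relative to the current position or the end of the file works only with an offset of 0; use a binary mode for other relative seeks. The settings for from are:
- If from = 0, the beginning of the file is the reference point
- If from = 1, the current position is the reference point
- If from = 2, the end of the file is the reference point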
- rename - The os.rename(current_name, new_name) function changes a file's name from current_name to new_name.
- remove - The os.remove(file_name) function deletes the file.
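The access modes and methods above can be sketched as a short round trip: create a file with 'w', add to it with 'a', then read it back with the default 'r' mode. This is a minimal sketch, assuming a throwaway folder from the standard tempfile module and a hypothetical file name, notes.txt; os.path.join(), os.remove(), and os.rmdir() are used only for housekeeping.

```python
import os
import tempfile

# Scratch folder so the example leaves nothing behind;
# "notes.txt" is a hypothetical file name.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "notes.txt")

# 'w' creates the file (erasing any existing contents) and opens it for writing.
f = open(path, "w")
f.write("first line\n")     # write() does not add a newline for you
f.write("second line\n")
f.close()                   # flush the buffer and release the file

# 'a' keeps the existing contents and writes at the end of the file.
f = open(path, "a")
f.write("third line\n")
f.close()

# No mode given, so 'r' (read only, pointer at the beginning) is used.
f = open(path)
first = f.readline()        # one line, including its trailing '\n'
rest = f.read()             # everything from the current position to the end
f.close()

print(repr(first))          # 'first line\n'
print(repr(rest))           # 'second line\nthird line\n'

# Housekeeping: delete the file, then the now-empty folder.
os.remove(path)
os.rmdir(folder)
```

Closing the file after writing matters here: it flushes the buffer, so the later read sees everything that was written.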
Functions from the os (operating system) module can be used to work with directories. Include the import os statement to use these functions.
- mkdir - The os.mkdir("new_directory") function creates a directory with the name supplied.
- chdir - The os.chdir("/users/bob/new_directory") function changes the current working directory.
- getcwd - The os.getcwd() function returns the current working directory.
- rmdir - The os.rmdir("/users/bob/old_directory") function removes the directory supplied. The directory must be empty.
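The directory functions above, together with os.rename() and os.remove(), might be exercised as follows. This is a sketch that assumes a scratch folder created with tempfile.mkdtemp() and hypothetical names (reports, old.txt, new.txt). os.listdir(), which is not covered above, is used only to show the folder's contents.

```python
import os
import tempfile

start_dir = os.getcwd()             # remember where we started

# Scratch folder; "reports", "old.txt", and "new.txt" are hypothetical names.
base = tempfile.mkdtemp()
os.chdir(base)                      # change the current working directory
print("Working in:", os.getcwd())

os.mkdir("reports")                 # created relative to the current directory
old_name = os.path.join("reports", "old.txt")
new_name = os.path.join("reports", "new.txt")

f = open(old_name, "w")
f.write("draft\n")
f.close()

os.rename(old_name, new_name)       # change the file's name
listing = os.listdir("reports")
print(listing)                      # ['new.txt']

# rmdir() only removes empty directories, so delete the file first.
os.remove(new_name)
os.rmdir("reports")

# Return to the starting directory, then remove the scratch folder.
os.chdir(start_dir)
os.rmdir(base)
```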
In the code example below, a few of the file methods are demonstrated.
Here's the output.
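As a further illustration of tell() and seek(), the sketch below opens a hypothetical ten-byte file, digits.bin, in binary mode so that all three from settings can be used with nonzero offsets. The file name and its contents are assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical file containing the ten bytes b"0123456789".
folder = tempfile.mkdtemp()
path = os.path.join(folder, "digits.bin")
f = open(path, "wb")        # 'wb': binary write, so the data must be bytes
f.write(b"0123456789")
f.close()

f = open(path, "rb")        # 'rb': binary read, so read() returns bytes
pos0 = f.tell()             # 0: the pointer starts at the beginning
first3 = f.read(3)          # b'012'
pos3 = f.tell()             # 3: reading moved the pointer forward

f.seek(2, 1)                # from = 1: skip 2 bytes past the current position (to 5)
mid = f.read(1)             # b'5'

f.seek(-2, 2)               # from = 2: start 2 bytes before the end of the file
tail = f.read()             # b'89'

f.seek(0)                   # from defaults to 0: back to the beginning
head = f.read(1)            # b'0'
f.close()

print(pos0, first3, pos3, mid, tail, head)   # 0 b'012' 3 b'5' b'89' b'0'

os.remove(path)
os.rmdir(folder)
```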
Modules
Modules enable programmers to segment their code into smaller, more manageable pieces and to more easily reuse those code sections across multiple programs. This ability to 'modularize' code means that modules can be written once and reused many times.
Attributes of Modules
The following are some of the most relevant attributes of modules.
- Import - Use the import statement to include the module in your program. Use only the file name, without the '.py' extension (e.g. import my_module for the file my_module.py).
- Reuse - Modules can be imported by multiple programs.
- Calling Functions - In the importing program, call functions in the module by using the module name, followed by the dot operator, followed by the function name (e.g. my_module.f1()).
- Module Locations - Python searches for modules in the following locations, in order:
- the directory containing the script being run (or the current directory in an interactive session)
- the directories listed in the PYTHONPATH environment variable
- the installation-dependent default path, for example /usr/local/lib/pythonX.Y on Linux or the Lib folder inside the Python installation directory on Windows
- Change Propagation - Changes made to the module affect all programs that import the module.
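The import, function-calling, and module-location rules above can be seen together in a small sketch. It writes a hypothetical module, greetings.py, into a scratch folder, appends that folder to sys.path (the ordered list of locations Python searches, which begins with the script's folder and any PYTHONPATH entries), and then imports the module by its file name without the .py extension. In an ordinary project you would simply save greetings.py next to your script rather than editing sys.path.

```python
import importlib
import os
import sys
import tempfile

# Write a hypothetical module, greetings.py, into a scratch folder.
module_dir = tempfile.mkdtemp()
f = open(os.path.join(module_dir, "greetings.py"), "w")
f.write(
    "def hello(name):\n"
    "    return 'Hello, ' + name\n"
)
f.close()

# sys.path is the ordered list of folders Python searches for modules.
# Appending module_dir works much like adding a PYTHONPATH entry.
sys.path.append(module_dir)
importlib.invalidate_caches()       # let the import system notice the new file

import greetings                    # file name only, no '.py' extension

# Module name, dot operator, function name.
print(greetings.hello("Bob"))       # Hello, Bob
print(greetings.__file__)           # where Python found the module
```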
The code example below demonstrates importing modules named bob, jayjay, and phineas. The file friends.py is the primary file: it is the file that is run, so its __name__ is "__main__". Remember, Python assigns the name __main__ to the file being run from the command line or from within PyCharm.
Note that bob.py and jayjay.py can also be run as standalone scripts (i.e. when run standalone, each of those scripts has its __name__ set to __main__ by Python). To observe the effect of omitting the __name__ == "__main__" check in files that can be run standalone, comment out lines 19-23 in bob.py and uncomment line 25. Notice the difference in the output.
Notice that three variations of imports are shown. The module bob is assigned the alias the_builder, which is used on line 18. The entire jayjay module is imported. Only catch_perry() is imported from the phineas module.
Also, the lines "bob.py included as a module" and "jayjay.py included as a module" are printed in the output. Be sure you understand why these lines were printed.
Here's the output.
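The three import styles and the __name__ check described above can be approximated in one self-contained sketch. The module names bob, jayjay, and phineas, the alias the_builder, and catch_perry() come from the text; the function bodies, the helper names build() and fly(), and the choice of a plain import jayjay for the whole-module import are assumptions (from jayjay import * would also bring in every public name). The sketch writes the modules to a scratch folder and puts that folder first on sys.path so they can be imported.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module contents. Only the module names, the alias
# the_builder, and catch_perry() come from the text.
MODULES = {
    "bob.py": (
        "def build():\n"
        "    return 'bob is building'\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    print(build())   # only when bob.py is run directly\n"
        "else:\n"
        "    print('bob.py included as a module')\n"
    ),
    "jayjay.py": (
        "def fly():\n"
        "    return 'jayjay is flying'\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    print(fly())\n"
        "else:\n"
        "    print('jayjay.py included as a module')\n"
    ),
    "phineas.py": (
        "def catch_perry():\n"
        "    return 'Perry caught'\n"
    ),
}

# Write the modules to a scratch folder and search it first.
folder = tempfile.mkdtemp()
for file_name, source in MODULES.items():
    f = open(os.path.join(folder, file_name), "w")
    f.write(source)
    f.close()
sys.path.insert(0, folder)
importlib.invalidate_caches()

import bob as the_builder           # variation 1: module imported under an alias
import jayjay                       # variation 2: the whole module
from phineas import catch_perry     # variation 3: a single function


def main():
    print(the_builder.__name__)     # 'bob': an imported module keeps its own name
    print(the_builder.build())
    print(jayjay.fly())
    print(catch_perry())            # called without the 'phineas.' prefix


if __name__ == "__main__":          # True only when this file is run directly
    main()
```

Because each module's top-level code runs when it is imported, the two "included as a module" lines print before main() runs. Running bob.py directly would instead print the result of build(), since its __name__ would then be "__main__".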
If PyCharm does not recognize your local modules, you may need to add your working folder in repos to the Project Structure as shown below. Go to File | Settings | Project : itse1359 | Project Structure and add itse1359 to your Source Folders.
For more practice with importing modules, see this extensive example.