How to Think Like a Computer Scientist: Learning with Python 2nd Edition/Modules and files
Modules and files
A module is a file containing Python definitions and statements intended for use in other Python programs. There are many Python modules that come with Python as part of the standard library. We have seen two of these already, the doctest module and the string module.
You can use pydoc to search through the Python libraries installed on your system. At the command prompt type the following:
$ pydoc -g
and the pydoc Tk window will appear (note: see exercise 2 if you get an error).
Click on the open browser button to launch a web browser window containing the documentation generated by pydoc:
The page that opens, titled Python: Index of Modules, is a listing of all the Python libraries found by Python on your system. Clicking on a module name opens a new page with documentation for that module. Clicking keyword, for example, opens a page titled Python: module keyword.
Documentation for most modules contains three color coded sections:
- Classes in pink
- Functions in orange
- Data in green
Classes will be discussed in later chapters, but for now we can use pydoc to see the functions and data contained within modules.
The keyword module contains a single function, iskeyword, which as its name suggests is a boolean function that returns True if a string passed to it is a keyword:
The data item, kwlist, contains a list of all the current keywords in Python:
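A short session along these lines can be reproduced with the sketch below. It uses Python 3 print syntax, and the exact contents of kwlist depend on the Python version installed, so no fixed list is shown.

```python
import keyword

# iskeyword returns True only for reserved words of the language.
print(keyword.iskeyword('def'))     # True
print(keyword.iskeyword('spam'))    # False

# kwlist is an ordinary list of strings.
print(len(keyword.kwlist))
print(keyword.kwlist[:5])
```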
We encourage you to use pydoc to explore the extensive libraries that come with Python. There are so many treasures to discover!
All we need to create a module is a text file with a .py extension on the filename:
We can now use our module in both scripts and the Python shell. To do so, we must first import the module. There are two ways to do this:
In the first example, remove_at is called just like the functions we have seen previously. In the second example the name of the module and a dot (.) are written before the function name.
Notice that in either case we do not include the .py file extension when importing. Python expects the file names of Python modules to end in .py, so the file extension is not included in the import statement.
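The two import styles can be tried with the sketch below. It is only an illustration under assumptions: the body given to remove_at is one plausible definition, and the module file is written to a scratch directory from within the script (normally you would create seqtools.py in a text editor and keep it next to your program).

```python
import importlib
import os
import sys
import tempfile

# Create seqtools.py in a scratch directory (assumed module contents).
module_dir = tempfile.mkdtemp()
f = open(os.path.join(module_dir, 'seqtools.py'), 'w')
f.write("def remove_at(pos, seq):\n"
        "    return seq[:pos] + seq[pos+1:]\n")
f.close()

# Let Python find modules in that directory during this session.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

# First way: bring the function name into the current namespace.
from seqtools import remove_at
print(remove_at(4, "A string!"))             # A sting!

# Second way: import the module and write its name and a dot before the function.
import seqtools
print(seqtools.remove_at(4, "A string!"))    # A sting!
```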
The use of modules makes it possible to break up very large programs into manageably sized parts, and to keep related parts together.
A namespace is a syntactic container which permits the same name to be used in different modules or functions (and as we will see soon, in classes and methods).
Each module determines its own namespace, so we can use the same name in multiple modules without causing an identification problem.
We can now import both modules and access question and answer in each:
If we had used from module1 import * and from module2 import * instead, we would have a naming collision and would not be able to access question and answer from module1.
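The sketch below illustrates both situations. The module contents and the helper write_module are hypothetical stand-ins invented for the illustration, and explicit names are imported instead of * so that the example can run anywhere; the effect on question is the same.

```python
import importlib
import os
import sys
import tempfile

def write_module(directory, name, source):
    """Hypothetical helper: save source as name.py inside directory."""
    f = open(os.path.join(directory, name + '.py'), 'w')
    f.write(source)
    f.close()

module_dir = tempfile.mkdtemp()
write_module(module_dir, 'module1',
             'question = "What is six times seven?"\nanswer = 42\n')
write_module(module_dir, 'module2',
             'question = "What is your quest?"\nanswer = "To seek the grail."\n')
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import module1
import module2

# Same names, different namespaces: both values remain reachable.
print(module1.question)
print(module1.answer)
print(module2.question)
print(module2.answer)

# Importing the names directly makes the second import replace the first.
from module1 import question
from module2 import question
print(question)    # the value from module2
```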
Functions also have their own namespace:
Running this program produces the following output:
The three n's here do not collide since they are each in a different namespace.
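A minimal sketch of such a program, with values chosen only for illustration, is:

```python
def f():
    n = 7        # this n lives in f's namespace
    return n

def g():
    n = 42       # this n lives in g's namespace
    return n

n = 11           # this n lives in the module's namespace

print("n before calling f: %d" % n)    # 11
print("n inside f: %d" % f())          # 7
print("n after calling f: %d" % n)     # 11
print("n inside g: %d" % g())          # 42
print("n after calling g: %d" % n)     # 11
```

Assigning to n inside f or g never changes the module-level n.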
Namespaces permit several programmers to work on the same project without having naming collisions.
Attributes and the dot operator
Variables defined inside a module are called attributes of the module. They are accessed by using the dot operator (.). The question attributes of module1 and module2 are accessed using module1.question and module2.question.
Modules contain functions as well as attributes, and the dot operator is used to access them in the same way. seqtools.remove_at refers to the remove_at function in the seqtools module.
In Chapter 7 we introduced the find function from the string module. The string module contains many other useful functions:
You should use pydoc to browse the other functions and attributes in the string module.
String and list methods
As the Python language developed, most of the functions from the string module were also added as methods of string objects. A method acts much like a function, but the syntax for calling it is a bit different:
String methods are built into string objects, and they are invoked (called) by following the object with the dot operator and the method name.
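A few method calls, chosen here only as examples, show the syntax:

```python
# The object comes first, then a dot, then the method name and its arguments.
print('maryland'.capitalize())               # Maryland
print('   maryland  '.strip())               # maryland
print('Mississippi'.replace('ss', 'SS'))     # MiSSiSSippi
print('Mississippi'.find('ss'))              # 2
```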
We will be learning how to create our own objects with their own methods in later chapters. For now we will only be using methods that come with Python's built-in objects.
The dot operator can also be used to access built-in methods of list objects:
append is a list method which adds the argument passed to it to the end of the list. Continuing with this example, we show several other list methods:
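The following sketch exercises append and a handful of other list methods. The values are arbitrary.

```python
mylist = []
mylist.append(5)
mylist.append(27)
mylist.append(3)
mylist.append(12)
print(mylist)              # [5, 27, 3, 12]

mylist.insert(1, 12)       # put 12 at index 1, shifting the rest right
print(mylist)              # [5, 12, 27, 3, 12]
print(mylist.count(12))    # 2
print(mylist.index(27))    # 2

mylist.reverse()           # reverses the list in place
print(mylist)              # [12, 3, 27, 12, 5]
mylist.sort()              # sorts the list in place
print(mylist)              # [3, 5, 12, 12, 27]
mylist.remove(12)          # removes only the first 12
print(mylist)              # [3, 5, 12, 27]
```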
Experiment with the list methods in this example until you feel confident that you understand how they work.
Reading and writing text files
While a program is running, its data is stored in random access memory (RAM). RAM is fast and inexpensive, but it is also volatile, which means that when the program ends, or the computer shuts down, data in RAM disappears. To make data available the next time you turn on your computer and start your program, you have to write it to a non-volatile storage medium, such as a hard drive, USB drive, or CD-RW.
Data on non-volatile storage media is stored in named locations on the media called files. By reading and writing files, programs can save information between program runs.
Working with files is a lot like working with a notebook. To use a notebook, you have to open it. When you're done, you have to close it. While the notebook is open, you can either write in it or read from it. In either case, you know where you are in the notebook. You can read the whole notebook in its natural order or you can skip around.
All of this applies to files as well. To open a file, you specify its name and indicate whether you want to read or write.
Opening a file creates a file object. In this example, the variable myfile refers to the new file object.
The open function takes two arguments. The first is the name of the file, and the second is the mode. Mode 'w' means that we are opening the file for writing.
If there is no file named test.dat, it will be created. If there already is one, it will be replaced by the file we are writing.
When we print the file object, we see the name of the file, the mode, and the location of the object.
To put data in the file we invoke the write method on the file object:
Closing the file tells the system that we are done writing and makes the file available for reading:
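The steps so far (open for writing, write, close) can be sketched as follows. The scratch directory is an assumption added so that running the example cannot overwrite an existing test.dat; in the shell you would simply write open('test.dat', 'w').

```python
import os
import tempfile

# Keep test.dat in a scratch directory for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'test.dat')

myfile = open(path, 'w')     # mode 'w': create the file, or replace an existing one
print(myfile)                # shows the file name and mode, among other details
myfile.write("Now is the time")
myfile.write("to close the file")
myfile.close()               # finish writing and release the file

print(os.path.getsize(path))    # 32
```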
Now we can open the file again, this time for reading, and read the contents into a string. This time, the mode argument is 'r' for reading:
If we try to open a file that doesn't exist, we get an error:
Not surprisingly, the read method reads data from the file. With no arguments, it reads the entire contents of the file into a single string:
There is no space between time and to because we did not write a space between the strings.
read can also take an argument that indicates how many characters to read:
If not enough characters are left in the file, read returns the remaining characters. When we get to the end of the file, read returns the empty string:
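The reading behaviour described above can be checked with this sketch. It first recreates the file in a scratch directory (an assumption made so the example is self-contained), then reads it in the three ways discussed and finally tries to open a file that does not exist.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.dat')
myfile = open(path, 'w')
myfile.write("Now is the time")
myfile.write("to close the file")
myfile.close()

# With no argument, read returns the whole file as one string.
myfile = open(path, 'r')
whole = myfile.read()
myfile.close()
print(whole)             # Now is the timeto close the file

# With an argument, read returns at most that many characters.
myfile = open(path, 'r')
first = myfile.read(5)
rest = myfile.read(1000)     # fewer than 1000 characters remain
at_end = myfile.read()       # nothing is left
myfile.close()
print(first)             # Now i
print(rest)              # s the timeto close the file
print(repr(at_end))      # ''

# Opening a missing file for reading raises an error.
try:
    open(os.path.join(os.path.dirname(path), 'test.cat'), 'r')
except IOError as err:
    print(err)
```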
The following function copies a file, reading and writing up to fifty characters at a time. The first argument is the name of the original file; the second is the name of the new file:
This function continues looping, reading 50 characters from infile and writing the same 50 characters to outfile until the end of infile is reached, at which point text is empty and the break statement is executed.
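A sketch of such a function, using the names infile, outfile and text from the description, is given below. The function name copy_file and the scratch files in the demonstration are assumptions made for the illustration.

```python
import os
import tempfile

def copy_file(oldfile, newfile):
    infile = open(oldfile, 'r')
    outfile = open(newfile, 'w')
    while True:
        text = infile.read(50)     # at most 50 characters per pass
        if text == "":             # the empty string signals end of file
            break
        outfile.write(text)
    infile.close()
    outfile.close()

# Demonstration on a 120-character scratch file (three passes through the loop).
scratch = tempfile.mkdtemp()
src = os.path.join(scratch, 'original.txt')
dst = os.path.join(scratch, 'copy.txt')
f = open(src, 'w')
f.write("abcdefghij" * 12)
f.close()

copy_file(src, dst)

f = open(dst, 'r')
copied = f.read()
f.close()
print(copied == "abcdefghij" * 12)    # True
```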
A text file is a file that contains printable characters and whitespace, organized into lines separated by newline characters. Since Python is specifically designed to process text files, it provides methods that make the job easy.
To demonstrate, we'll create a text file with three lines of text separated by newlines:
The readline method reads all the characters up to and including the next newline character:
readlines returns all of the remaining lines as a list of strings:
In this case, the output is in list format, which means that the strings appear with quotation marks and the newline character appears as an escape sequence (\n in current versions of Python; very old versions displayed it as \012).
At the end of the file, readline returns the empty string and readlines returns the empty list:
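The behaviour of readline and readlines can be tried with the sketch below. The three lines of text and the scratch file name are made up for the illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'lines.txt')
outfile = open(path, 'w')
outfile.write("line one\nline two\nline three\n")
outfile.close()

infile = open(path, 'r')
first = infile.readline()        # up to and including the first newline
print(repr(first))               # 'line one\n'
remaining = infile.readlines()   # every line that is left, as a list
print(remaining)                 # ['line two\n', 'line three\n']

# At the end of the file both methods return an empty result.
last_line = infile.readline()
last_list = infile.readlines()
infile.close()
print(repr(last_line))           # ''
print(last_list)                 # []
```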
The following is an example of a line-processing program. filter makes a copy of oldfile, omitting any lines that begin with #:
The continue statement ends the current iteration of the loop, but continues looping. The flow of execution moves to the top of the loop, checks the condition, and proceeds accordingly.
Thus, if text is the empty string, the loop exits. If the first character of text is a hash mark, the flow of execution goes to the top of the loop. Only if both conditions fail do we copy text into the new file.
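One way to write the function described here is sketched below. Note that naming it filter, as the text does, hides Python's built-in function of the same name within this program; the sample file contents are invented for the demonstration.

```python
import os
import tempfile

def filter(oldfile, newfile):
    infile = open(oldfile, 'r')
    outfile = open(newfile, 'w')
    while True:
        text = infile.readline()
        if text == "":          # end of file: leave the loop
            break
        if text[0] == '#':      # comment line: skip to the next iteration
            continue
        outfile.write(text)     # reached only when both tests fail
    infile.close()
    outfile.close()

# Demonstration with a scratch file.
scratch = tempfile.mkdtemp()
old = os.path.join(scratch, 'old.txt')
new = os.path.join(scratch, 'new.txt')
f = open(old, 'w')
f.write("# a comment\nkeep me\n#another comment\nkeep me too\n")
f.close()

filter(old, new)

f = open(new, 'r')
kept = f.read()
f.close()
print(repr(kept))    # 'keep me\nkeep me too\n'
```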
Files on non-volatile storage media are organized by a set of rules known as a file system. File systems are made up of files and directories, which are containers for both files and other directories.
When you create a new file by opening it and writing, the new file goes in the current directory (wherever you were when you ran the program). Similarly, when you open a file for reading, Python looks for it in the current directory.
If you want to open a file somewhere else, you have to specify the path to the file, which names the directory (or folder) where the file is located, followed by the name of the file itself:
This example opens a file named words that resides in a directory named dict, which resides in share, which resides in usr, which resides in the top-level directory of the system, called /. It then reads each line into a list using readlines, and prints out the first 5 elements from that list.
You cannot use / as part of a filename; it is reserved as a delimiter between directory and filenames.
The file /usr/share/dict/words should exist on Unix-based systems, and contains a list of words in alphabetical order.
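Because the word list is not present on every system, the sketch below opens a small stand-in file by its full path instead. The scratch directory and the six words are assumptions made for the illustration; only the last line touches /usr/share/dict/words, and it merely checks whether the file exists.

```python
import os
import tempfile

# Build a small file and refer to it by directory plus file name.
scratch_dir = tempfile.mkdtemp()
full_path = os.path.join(scratch_dir, 'words')
outfile = open(full_path, 'w')
outfile.write("aardvark\nabacus\nabbey\nable\nabout\nabove\n")
outfile.close()

# With a full path, the current directory does not matter.
print(os.path.isabs(full_path))    # True
wordsfile = open(full_path, 'r')
wordlist = wordsfile.readlines()
wordsfile.close()
print(wordlist[:5])

# The system word list, where present, is opened in exactly the same way.
print(os.path.exists('/usr/share/dict/words'))
```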
The ord function returns the integer representation of a character:
This example explains why 'Apple' < 'apple' evaluates to True.
The chr function is the inverse of ord. It takes an integer as an argument and returns its character representation:
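A brief sketch of both functions, and of the string comparison mentioned above:

```python
print(ord('a'))             # 97
print(ord('A'))             # 65

# Strings are compared character by character using these integer values,
# so an uppercase 'A' (65) sorts before a lowercase 'a' (97).
print('Apple' < 'apple')    # True

print(chr(65))              # A
print(chr(ord('q')))        # q
```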
The following program, countletters.py, counts the number of times each character occurs in the book Alice in Wonderland:
Run this program and look at the output file it generates using a text editor. You will be asked to analyze the program in the exercises below.
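The sketch below shows one way a program of this kind could be organised; it is a reconstruction under assumptions, not necessarily the original listing. The counting code is wrapped in a hypothetical function, count_letters, so that it can be tried on a small sample file, and it assumes the input contains only ASCII characters.

```python
import os
import tempfile

def display(i):
    """Return a readable label for the character whose code is i."""
    if i == 10: return 'LF'
    if i == 13: return 'CR'
    if i == 32: return 'SPACE'
    return chr(i)

def count_letters(inname, outname):
    infile = open(inname, 'r')
    text = infile.read()
    infile.close()

    counts = 128 * [0]
    for letter in text:
        counts[ord(letter)] += 1

    outfile = open(outname, 'w')
    outfile.write("%-12s%s\n" % ("Character", "Count"))
    outfile.write("=================\n")
    for i in range(len(counts)):
        if counts[i]:
            outfile.write("%-12s%d\n" % (display(i), counts[i]))
    outfile.close()
    return counts

# Try it on a one-line sample instead of the whole book.
scratch = tempfile.mkdtemp()
sample = os.path.join(scratch, 'sample.txt')
report = os.path.join(scratch, 'sample_counts.dat')
f = open(sample, 'w')
f.write("Alice was here\n")
f.close()

counts = count_letters(sample, report)
f = open(report, 'r')
print(f.read())
f.close()
```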
The sys module and argv
The sys module contains functions and variables which provide access to the environment in which the Python interpreter runs.
The following example shows the values of a few of these variables on one of our systems:
Starting Jython on the same machine produces different values for the same variables:
The results will be different on your machine of course.
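To see the values on your own system, a sketch such as the following can be used. The three variables shown are a selection made for this illustration, and no output is given because it varies from machine to machine.

```python
import sys

print(sys.platform)     # a short name for the operating system
print(sys.version)      # the interpreter's version string
print(sys.path[:3])     # the first few directories searched for modules
```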
The argv variable holds a list of strings read in from the command line when a Python script is run. These command line arguments can be used to pass information into a program at the same time it is invoked.
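A minimal demo_argv.py might contain nothing more than the following (Python 3 print syntax is assumed):

```python
import sys

# sys.argv holds the script name followed by the command line arguments.
print(sys.argv)
```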
Running this program from the Unix command prompt demonstrates how sys.argv works:
$ python demo_argv.py this and that 1 2 3
['demo_argv.py', 'this', 'and', 'that', '1', '2', '3']
$
argv is a list of strings. Notice that the first element is the name of the program. Arguments are separated by white space and split into a list in the same way that string.split operates. If you want an argument with white space in it, use quotes:
$ python demo_argv.py "this and" that "1 2" 3
['demo_argv.py', 'this and', 'that', '1 2', '3']
$
With argv we can write useful programs that take their input directly from the command line. For example, here is a program that finds the sum of a series of numbers:
In this program we use the from <module> import <attribute> style of importing, so argv is brought into the module's main namespace.
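A sketch of such a program is shown below. The helper name sum_args is hypothetical, and converting every argument with float is an assumption about how the numbers should be interpreted.

```python
from sys import argv

def sum_args(args):
    """Hypothetical helper: convert each argument string to a float and add them."""
    total = 0.0
    for value in args:
        total += float(value)
    return total

if __name__ == '__main__':
    # argv[0] is the program name, so the numbers start at index 1.
    print(sum_args(argv[1:]))
```

Command line arguments always arrive as strings, which is why each one is converted before it is added.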
We can now run the program from the command prompt like this:
You are asked to write similar programs as exercises.
Complete the following:
- Start the pydoc server with the command pydoc -g at the command prompt.
- Click on the open browser button in the pydoc tk window.
- Find the calendar module and click on it.
While looking at the Functions section, try out the following in a Python shell:
Experiment with calendar.isleap. What does it expect as an argument? What does it return as a result? What kind of a function is this?
If you don't have Tkinter installed on your computer, then pydoc -g will return an error, since the graphics window that it opens requires Tkinter. An alternative is to start the web server directly:
$ pydoc -p 7464
This starts the pydoc web server on port 7464. Now point your web browser at http://localhost:7464 and you will be able to browse the Python libraries installed on your system. Use this approach to start pydoc and take a look at the math module.
- How many functions are in the math module?
- What does math.ceil do? What about math.floor? (hint: both floor and ceil expect floating point arguments.)
- Describe how we have been computing the same value as math.sqrt without using the math module.
- What are the two data constants in the math module?
- Use pydoc to investigate the copy module. What does deepcopy do? In which exercises from last chapter would deepcopy have come in handy?
Create a module named mymodule1.py. Add attributes myage set to your current age, and year set to the current year. Create another module named mymodule2.py. Add attributes myage set to 0, and year set to the year you were born. Now create a file named namespace_test.py. Import both of the modules above and write the following statement:
When you run namespace_test.py you will see either True or False as output depending on whether or not you've already had your birthday this year.
Add the following statement to mymodule1.py, mymodule2.py, and namespace_test.py from the previous exercise:
Run namespace_test.py. What happens? Why? Now add the following to the bottom of mymodule1.py:
Run mymodule1.py and namespace_test.py again. In which case do you see the new print statement?
In a Python shell try the following:
What does Tim Peters have to say about namespaces?
- Use pydoc to find and test three other functions from the string module. Record your findings.
- Rewrite matrix_mult from the last chapter using what you have learned about list methods.
- The dir function, which we first saw in Chapter 7, prints out a list of the attributes of an object passed to it as an argument. In other words, dir returns the contents of the namespace of its argument. Use dir(str) and dir(list) to find at least three string and list methods which have not been introduced in the examples in the chapter. You should ignore anything that begins with double underscore (__) for the time being. Be sure to make detailed notes of your findings, including names of the new methods and examples of their use. (hint: Print the docstring of a function you want to explore. For example, to find out how str.join works, print str.join.__doc__)
Give the Python interpreter's response to each of the following from a continuous interpreter session:
Be sure you understand why you get each result. Then apply what you have learned to fill in the body of the function below using the split and join methods of str objects:
Your solution should pass all doctests.
Create a module named wordtools.py with the following at the bottom:
Explain how this statement makes both using and testing this module convenient. What will be the value of __name__ when wordtools.py is imported from another module? What will it be when it is run as a main program? In which case will the doctests run? Now add bodies to each of the following functions to make the doctests pass:
Save this module so you can use the tools it contains in your programs.
- unsorted_fruits.txt contains a list of 26 fruits, each one with a name that begins with a different letter of the alphabet. Write a program named sort_fruits.py that reads in the fruits from unsorted_fruits.txt and writes them out in alphabetical order to a file named sorted_fruits.txt.
Answer the following questions about countletters.py:
Explain in detail what the three lines do:
What would type(text) return after these lines have been executed?
- What does the expression 128 * [0] evaluate to? Read about ASCII in Wikipedia and explain why you think the variable counts is assigned 128 * [0] in light of what you read.
What does the loop that updates counts do to counts?
- Explain the purpose of the display function. Why does it check for values 10, 13, and 32? What is special about those values?
Describe in detail what the lines that create alice_counts.dat do. What will be in alice_counts.dat when they finish executing?
Finally, explain in detail what the last loop in the program does. What is the purpose of if counts[i]?
Write a program named mean.py that takes a sequence of numbers on the command line and returns the mean of their values:
$ python mean.py 3 4
3.5
$ python mean.py 3 4 5
4.0
$ python mean.py 11 15 94.5 22
35.625
A session of your program running on the same input should produce the same output as the sample session above.
Write a program named median.py that takes a sequence of numbers on the command line and returns the median of their values:
$ python median.py 3 7 11
7
$ python median.py 19 85 121
85
$ python median.py 11 15 16 22
15.5
A session of your program running on the same input should produce the same output as the sample session above.
Modify the countletters.py program so that it takes the file to open as a command line argument. How will you handle the naming of the output file?