Python Programming/Print version
From Wikibooks, the open-content textbooks collection
Python is a general purpose programming language.
Note: current version of this book can be found at http://en.wikibooks.org/wiki/Python_Programming
Table of contents
Introduction
Learning to program in Python
Python concepts
Rocking the Python (Modules)
- Regular Expression
- Graphical User Interfaces in Python
- Game Programming in Python
- Socket programming
- Files (I/O)
- Databases
- Extracting info from web pages
- Threading
- Extending with C
- Extending with C++
- WSGI web programming
References
Authors
License
Overview
Python is a high-level, structured, open-source programming language that can be used for a wide variety of programming tasks. It is good for simple quick-and-dirty scripts, as well as complex and intricate applications.
It is an interpreted programming language that is automatically compiled into bytecode before execution (the bytecode is then normally saved to disk, just as automatically, so that compilation need not happen again until and unless the source gets changed). It is also a dynamically typed language that includes (but does not require one to use) object oriented features and constructs.
The most unusual aspect of Python is that whitespace is significant; instead of block delimiters (braces → "{}" in the C family of languages), indentation is used to indicate where blocks begin and end.
For example, the following Python code can be interactively typed at an interpreter prompt, to display the beginning values in the Fibonacci series:
>>> a,b = 0,1
>>> print b
1
>>> while b < 100:
...     a,b = b,(a+b)
...     print b,
...
1 2 3 5 8 13 21 34 55 89 144
Another interesting aspect in Python is reflection. The dir() function returns the list of the names of objects in the current scope. However, dir(object) will return the names of the attributes of the specified object. The locals() routine returns a dictionary in which the names in the local namespace are the keys and their values are the objects to which the names refer. Combined with the interactive interpreter, this provides a useful environment for exploration and prototyping.
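As a small sketch of the reflection facilities just described, the lines below can be saved as a script or typed at the prompt. The names greeting and show_locals are made up for this illustration, and print is written with a single parenthesized argument so that the code behaves the same under the Python 2 used in this book and under Python 3.

```python
# Illustrative only: 'greeting' and 'show_locals' are hypothetical names.
greeting = "hello"

# dir(object) lists the attribute names of the object passed in.
string_attributes = dir(greeting)
print('upper' in string_attributes)   # strings have an 'upper' attribute

def show_locals():
    answer = 42
    # locals() maps each local name to the object it refers to.
    return locals()

print(show_locals())
```

Inside show_locals() the only local name is answer, so the dictionary returned by locals() has exactly one entry.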
Python provides a powerful assortment of built-in types (e.g., lists, dictionaries and strings), a number of built-in functions, and a few constructs, mostly statements. For example, its loop constructs can iterate over the items in a collection instead of being limited to a simple range of integer values. Python also comes with a powerful standard library, which includes hundreds of modules that provide routines for a wide variety of services, including regular expressions and TCP/IP sessions.
Python is used and supported by a large community on the Internet. Mailing lists and newsgroups such as the tutor list actively help new Python programmers. While they discourage doing your homework for you, they are quite helpful and are populated by the authors of many of the Python textbooks currently on the market.
Getting Python
In order to program in Python you need the Python interpreter.
Installing Python in Windows
Go to the Python Homepage or the ActiveState website and get the proper version for your platform. Download it, read the instructions and get it installed.
In order to run Python from the command line, you will need to have the python directory in your PATH. Alternatively, you could use an Integrated Development Environment (IDE) for Python like DrPython[1], eric[2], PyScripter[3], or Python's own IDLE (which ships with every version of Python since 2.3).
Installing Python on Mac
Apple Mac OS X already ships with Python (version 2.3 on OS X 10.4 Tiger). If you want a more recent version, go to the Python Download Page and follow the instructions on the page and in the installers. As a bonus, you will also install the Python IDE.
Installing Python on Ubuntu
Users of Ubuntu 6.06 (Dapper Drake) and earlier will notice that Python comes installed by default, although it is sometimes not the latest version. If you would like to update it, just open a terminal and type at the prompt:
$ sudo apt-get update # This will update the software repository
$ sudo apt-get install python # This one will actually install python
Interactive mode
Python has two basic modes. In normal mode, finished .py script files are run by the Python interpreter. Interactive mode is a command-line shell that gives immediate feedback for each statement while keeping previously entered statements in active memory. As new lines are fed into the interpreter, the program entered so far is evaluated both in part and in whole.
To get into interactive mode, simply type "python" without any arguments. This is a good way to play around and try variations on syntax. Python should print something like this:
$ python
Python 2.3.4 (#2, Aug 29 2004, 02:04:10)
[GCC 3.3.4 (Debian 1:3.3.4-9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
(If Python doesn't run, make sure your path is set correctly. See Getting Python.)
The >>> is Python's way of telling you that you are in interactive mode. In interactive mode what you type is immediately run. Try typing 1+1 in. Python will respond with 2. Interactive mode allows you to test out and see what Python will do. If you ever feel the need to play with new Python statements, go into interactive mode and try them out.
A sample interactive session:
>>> 5
5
>>> print 5*7
35
>>> "hello" * 4
'hellohellohellohello'
>>> "hello".__class__
<type 'str'>
However, you need to be careful in the interactive environment. If you aren't careful, confusion may ensue. For example, the following is a valid Python script:
if 1:
    print "True"
print "Done"
If you try to enter this, as written in the interactive environment, you might be surprised by the result:
>>> if 1:
...     print "True"
... print "Done"
  File "<stdin>", line 3
    print "Done"
        ^
SyntaxError: invalid syntax
What the interpreter is saying is that the indentation of the second print was unexpected. What you should have entered was a blank line, to end the first (i.e., "if") statement, before you started writing the next print statement. For example, you should have entered the statements as though they were written:
if 1:
    print "True"

print "Done"
Which would have resulted in the following:
>>> if 1:
...     print "True"
...
True
>>> print "Done"
Done
>>>
Creating Python programs
Welcome to Python! This tutorial will show you how to start writing programs.
Python programs are nothing more than text files, and they may be edited with a standard text editor program.[1] What text editor you use will probably depend on your operating system: any text editor can create Python programs. It is easier to use a text editor that includes Python syntax highlighting, however.
Hello, World!
The first program that every programmer writes is called the "Hello, World!" program. This program simply outputs the phrase "Hello, World!" and then quits. Let's write "Hello, World!" in Python!
Open up your text editor and create a new file called hello.py containing just this line (you can copy-paste if you want):
print "Hello, world!"
This program uses the print statement, which simply outputs the rest of that line to the terminal. print ends with a newline character, which simply moves the cursor to the next line: if you want to continue printing on the same line, you can end the print statement with a comma, like so:
print "Hello, world!",
Now that you've written your first program, let's run it in Python! This process differs slightly depending on your operating system.
Windows
- Create a folder on your computer to use for your Python programs, such as C:\pythonpractice, and save your hello.py program in that folder.
- In the Start menu, select "Run...", and type in cmd. This will cause the Windows terminal to open.
- Type cd \pythonpractice to change directory to your pythonpractice folder, and hit Enter.
- Type python hello.py to run your program!
If it didn't work, make sure your PATH contains the python directory. See Getting Python.
Mac
- Create a folder on your computer to use for your Python programs. A good suggestion would be to name it pythonpractice and place it in your Home folder (the one that contains folders for Documents, Movies, Music, Pictures, etc). Save your hello.py program into this folder.
- Open the Applications folder, go into the Utilities folder, and open the Terminal program.
- Type cd ~/pythonpractice to change directory to your pythonpractice folder, and hit Enter.
- Type python hello.py to run your program!
Linux
- Create a folder on your computer to use for your Python programs, such as ~/pythonpractice, and save your hello.py program in that folder.
- Open up the terminal program. In KDE, open the main menu and select "Run Command..." to open Konsole. In GNOME, open the main menu, open the Applications folder, open the Accessories folder, and select Terminal.
- Type cd ~/pythonpractice to change directory to your pythonpractice folder, and hit Enter.
- Type python hello.py to run your program!
Result
The program should print Hello, world!. Congratulations! You're well on your way to becoming a Python programmer.
Interactive mode
Instead of Python exiting when the program is finished, you can use the -i flag to start an interactive session. This can be very useful for debugging and prototyping.
python -i hello.py
Exercises
- Modify the hello.py program to say hello to a historical political leader (or to Ada Lovelace).
- Change the program so that after the greeting, it asks, "How did you get here?".
- Re-write the original program to use two print statements: one for "Hello" and one for "world". The program should still only print out on one line.
Notes
- [1] Sometimes, Python programs are distributed in compiled form. We won't have to worry about that for quite a while.
Using variables and math
Using a variable
A variable is something with a value that may change. In Python, variables are dynamically typed, meaning that within the same program a variable can hold a number at one point and a string later, or vice versa. Here is a program that uses a variable:
#!/usr/bin/python

name = 'Ada Lovelace'
print "Goodbye, " + name + '!'
(Oops! I used single quotes for Ada's name, then double quotes around Goodbye. That's OK, however, because these two quotes do exactly the same thing in Python. The only thing you can't do is mix them and try to make a string like this: "will not work'.)
This program isn't much use, of course. But what about variables that the program truly can't guess about?
raw_input()
#!/usr/bin/python

print 'Please enter your name.'
name = raw_input()
print 'How are you, ' + name + '?'
(What's raw_input() doing? Evidently, it's getting input from you. See Input and output.)
Of course, with the power of Python at hand, the urge to determine one's mass in stone is nearly irresistible. A concise program can make short work of this task. Since a stone is 14 pounds, and there are about 2.2 pounds in a kilogram, the following formula should do the trick:
$m_{\text{stone}} = \frac{m_{\text{kg}} \times 2.2}{14}$
Simple math
#!/usr/bin/python

print "What is your mass in kilograms?",
mass_kg = int(raw_input())
mass_stone = mass_kg * 2.2 / 14
print "You weigh " + str(mass_stone) + " stone."
Run this program and get your weight in stone!
This program is starting to get a little bit cluttered. That's because, in addition to all the math, I snuck in some new features.
- When the previous program asked for your name, you were typing below the question. This time, you're typing at the end of the line that asks, "What is your mass in kilograms?". What's happening here is that, normally, the print statement will add a newline to the end of what you're printing. That's why the cursor went to the next line in the previous program. But in this program, I added a little comma to the end. That makes print omit the newline.
- int() - this handy function takes a string and returns an integer. Python is dynamically typed, but it is also strict about types: it won't allow us to do math on a string. Whatever you type is a string, even if it consists of digits. But int() will recognize a string made of digits and return an integer.
- The str(mass_stone) in the print statement. It turns out that you can't add together strings and numbers; "You weigh " + mass_stone just wouldn't work. So, we have to take the number and turn it into a string. Incidentally, enclosing the expression in backquotes (`mass_stone`) would also turn it into a string (it is equivalent to the repr() function), but that is deprecated.
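The conversions described in the list above can be tried without typing any input. This is only a minimal sketch: the function name kg_to_stone is invented here, and the string '65' stands in for what raw_input() would return. print is written with one parenthesized argument so the code runs under both Python 2 and Python 3.

```python
def kg_to_stone(text):
    """Convert a mass typed as a string of digits (kilograms) to stone."""
    mass_kg = int(text)            # int() turns the string '65' into the integer 65
    return mass_kg * 2.2 / 14      # 2.2 pounds per kilogram, 14 pounds per stone

mass_stone = kg_to_stone('65')

# Strings and numbers cannot be added directly, so convert with str() first.
message = "You weigh " + str(mass_stone) + " stone."
print(message)

# Adding a string and a number raises TypeError instead of guessing.
try:
    "You weigh " + mass_stone
except TypeError:
    print("cannot add a string and a number")
```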
Formatting output
In the previous program, we used this line of code to print the result:
print "You weigh " + str(mass_stone) + " stone."
There are a couple of problems with this. First, it mixes up operators and quotes, and can be a little tough to read. Second, the number won't be printed very nicely, as the following example illustrates:
$ ./kg2stone
What is your mass in kilograms? 65
You weigh 10.214285714285714 stone.
Not only is that much accuracy unjustified, it doesn't look nice. Python's % operator comes to the rescue. It allows printf-like formatting, in the form:
STRING % (arg1, arg2, ...)
The string contains one format code for each argument. There are several types of format codes; see the strings section for a complete list.
To improve our program, we just need the %f format code:
print "You weigh %.1f stone." % (mass_stone)
The %.1f format code causes a floating point number to be printed, with exactly one digit after the decimal. This produces much nicer output:
$ ./kg2stone
What is your mass in kilograms? 65
You weigh 10.2 stone.
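Here is a short sketch of the % operator with one and with several arguments. The value of mass_stone is copied from the sample run, and the name "Ada" and the 65 kg figure are just example data. The %s and %d codes (string and integer) are standard format codes alongside %f.

```python
mass_stone = 10.214285714285714   # example value from the sample run

one_digit = "You weigh %.1f stone." % mass_stone
# Several arguments are passed as a tuple, one per format code.
summary = "%s weighs %d kg, or %.2f stone." % ("Ada", 65, mass_stone)

print(one_digit)   # You weigh 10.2 stone.
print(summary)     # Ada weighs 65 kg, or 10.21 stone.
```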
Strings and arrays
Strings
What is a string? In computer programming and formal language theory (and other branches of mathematics), a string is an ordered sequence of symbols chosen from a predetermined set. One of the most powerful features of Python is its built-in string data type, and its ability to work with strings is what attracts many programmers to Python. There are three ways of writing a string in Python.
The first is to surround the characters with single quotes (' '), the second is to surround the string in double quotes (" "), and lastly there is a triple quote method (""" """). Each has different advantages.
If you need to use a single or double quote character in the string itself, you can use the other kind to delimit the string. For example, "The symbol for feet is '" would return the string: The symbol for feet is '. Conversely, 'The symbol for inches is "' would return the string: The symbol for inches is ".
However, 'The symbol for feet is '' and "The symbol for inches is "" would both generate an error because the Python interpreter can't tell which quote is supposed to be printed and which terminates the string. You can fix this problem by adding the character \ before the ' or ", which tells Python that you want that character included in the string itself. If you need to include both characters in the string you can use triple quotes like this: """The symbol for feet is ' and the symbol for inches is " """ (Note: if you leave out the space at the end of the string in this instance it will generate an error; however, as long as the quote isn't at the end or the beginning of the string it should execute as expected.)
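The quoting rules above can be checked with a few assignments. This is just an illustrative sketch; the variable names are invented for the example.

```python
feet = "The symbol for feet is '"          # single quote inside double quotes
inches = 'The symbol for inches is "'      # double quote inside single quotes
escaped = 'The symbol for feet is \''      # backslash escapes the delimiter
both = """The symbol for feet is ' and the symbol for inches is " """

print(feet == escaped)    # both spellings build the same string
print(both)
```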
Let's do an example:
>>> print 'I am a single quoted string'
I am a single quoted string
>>> print "I am a double quoted string"
I am a double quoted string
>>> print """I am a triple quoted string"""
I am a triple quoted string
Not only can you create strings, but you can also operate on them (such as concatenation). For example:
#!/usr/bin/python

print 'Spam' + 'Eggs'
print 'Ni!' * 10
This will print out SpamEggs on one line and Ni!Ni!Ni!Ni!Ni!Ni!Ni!Ni!Ni!Ni! on the next.
Python lets you access a small part, or "slice" as it is generally called, of a string. For example :
#!/usr/bin/python

string = "And now for something completely different."
print string[22:42]
This prints completely different. When you slice a string, Python begins reading at the index on the left side of the colon (:) and ends at the character with index one less than the index on the right side of the colon. Slices never raise an out-of-range error: if the variable string were only ten characters long and you asked for string[22:42], Python would simply return an empty string. Likewise, string[5:10000] would cause no error; Python would just read from index 5 (the sixth character) to the end of the string.
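Slicing is forgiving about indexes that run past the end of a string, whereas asking for a single character is not. The sketch below shows both; the ten-character string short is made up for the example.

```python
string = "And now for something completely different."
short = "0123456789"            # a ten-character string

print(string[22:42])
print(repr(short[22:42]))       # slice entirely past the end: empty string, no error
print(short[5:10000])           # an end index past the end is clipped

# Indexing a single character, unlike slicing, does raise an error.
try:
    short[22]
except IndexError:
    print("index out of range")
```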
Lists
When you want to deal with many Python objects efficiently, you will want to use an array object, such as a list. A list is created using square brackets and values are separated by commas. You can access an array as a whole, or as individual items. For example:
#!/usr/bin/python

spam = ['eggs', 42, 'bacon']
print spam
print spam[0]
The first print statement displays the entire list, while the second only displays the first item, 'eggs'. Individual items are accessed (indexed is the correct term) in a 0-based manner; that is to say, the first item is index 0, the second is index 1, and so on. If you use a negative index, it offsets from the end of the list (e.g. -1 is the last item, -2 is the second to last item, etc).
You can also modify lists:
#!/usr/bin/python

spam = ['eggs', 42, 'bacon']
print "Before modification:", spam
spam[1] = 5
print "After modification:", spam
This will change the second item in the list to 5.
To add an item to the end of a list, you can assign to the slice that starts just past the last item. The length of the list gives that index (note the colon after the index):
#!/usr/bin/python

spam = ['eggs', 1, 'bacon']
print "Length: ", len(spam)
spam[len(spam):] = [9]
print spam
A 9 will be appended to the end of the list.
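Lists also have an append() method, which is the more common way to add a single item. The sketch below uses both approaches side by side; 'toast' is just example data.

```python
spam = ['eggs', 1, 'bacon']

spam[len(spam):] = [9]     # slice assignment just past the end
spam.append('toast')       # the list method append() adds one item

print(spam)                # ['eggs', 1, 'bacon', 9, 'toast']
```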
If you want to insert a string into a list you need to use brackets around the string:
#!/usr/bin/python

mylist1 = []
mylist2 = []
mystring = "breakfast"
mylist1[0:] = mystring
print mylist1
mylist2[0:] = [mystring]
print mylist2

#['b','r','e','a','k','f','a','s','t']
#['breakfast']
Tuples
Tuples are similar to lists, except you cannot change them. You can reassign the variable, but you cannot change individual members.
#!/usr/bin/python

spam = ('eggs', 42, 'bacon')
print spam
print spam[0]
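To see what "you cannot change them" means in practice, here is a small sketch. Assigning to a member of a tuple raises TypeError, while giving the variable a whole new tuple is fine.

```python
spam = ('eggs', 42, 'bacon')

try:
    spam[1] = 5                    # tuples do not support item assignment
except TypeError:
    print("tuples cannot be changed in place")

spam = ('eggs', 5, 'bacon')        # rebinding the name to a new tuple is fine
print(spam[1])
```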
Basic syntax
There are five fundamental concepts in Python.
Case Sensitivity
All variables are case-sensitive. Python treats 'number' and 'Number' as separate, unrelated entities.
Spaces and tabs don't mix
Because whitespace is significant, remember that spaces and tabs don't mix, so use only one or the other when indenting your programs. A common error is to mix them. While they may look the same in an editor, the interpreter reads them differently, which results in either an error or unexpected behavior. The interpreter treats a tab as advancing to the next multiple of 8 columns, so setting your editor's tab width to 8 (in other words, a tab "stop" on every 8th column) helps if you find yourself frequently making this mistake.
Objects
In Python, like all object oriented languages, there are aggregations of code and data called Objects, which typically represent the pieces in a conceptual model of a system.
Objects in Python are created (i.e., instantiated) from templates called Classes (which are covered later, as much of the language can be used without understanding classes). They have "attributes", which represent the various pieces of code and data which comprise the object. To access attributes, one writes the name of the object followed by a period (henceforth called a dot), followed by the name of the attribute.
An example is the 'upper' attribute of strings, which refers to the code that returns a copy of the string in which all the letters are uppercase. To get to this, it is necessary to have a way to refer to the object (in the following example, the way is the literal string that constructs the object).
'bob'.upper
Code attributes are called "methods". So in this example, upper is a method of 'bob' (as it is of all strings). To execute the code in a method, use a matched pair of parentheses surrounding a comma separated list of whatever arguments the method accepts (upper doesn't accept any arguments). So to find an uppercase version of the string 'bob', one could use the following:
'bob'.upper()
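The difference between referring to a method and calling it can be seen in a short sketch. The variable names are invented for the example; callable() is a built-in function that reports whether an object can be called.

```python
name = 'bob'

method = name.upper        # no parentheses: refers to the method, does not run it
print(callable(method))

shouted = name.upper()     # parentheses call the method
print(shouted)
print(name)                # the original string is unchanged
```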
Scope
In a large system, it is important that one piece of code does not affect another in difficult to predict ways. One of the simplest ways to further this goal is to prevent one programmer's choice of names from preventing another from choosing that name. Because of this, the concept of scope was invented. A scope is a "region" of code in which a name can be used and outside of which the name cannot be easily accessed. There are two ways of delimiting regions in Python: with functions or with modules. They each have different ways of accessing the useful data that was produced within the scope from outside the scope. With functions, that way is to return the data. The way to access names from other modules leads us to another concept.
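Function scope can be sketched in a few lines. The function total_price and its data are hypothetical. The point is that the name total exists only inside the function, and the result gets out by being returned.

```python
def total_price(prices):
    total = 0                  # 'total' exists only inside this function
    for price in prices:
        total = total + price
    return total               # returning is how the result leaves the scope

result = total_price([3, 4, 5])
print(result)

# The name 'total' was never defined outside the function.
try:
    total
except NameError:
    print("'total' is not visible outside the function")
```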
Namespaces
It would be possible to teach Python without the concept of namespaces because they are so similar to attributes, which we have already mentioned, but the concept of namespaces is one that transcends any particular programming language, and so it is important to teach. To begin with, there is a built-in function dir() that can be used to help one understand the concept of namespaces. When you first start the Python interpreter (i.e., in interactive mode), you can list the objects in the current (or default) namespace using this function.
Python 2.3.4 (#53, Oct 18 2004, 20:35:07) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> dir()
['__builtins__', '__doc__', '__name__']
This function can also be used to show the names available within a module namespace. To demonstrate this, first we can use the type() function to show what __builtins__ is:
>>> type(__builtins__)
<type 'module'>
Since it is a module, we can list the names within the __builtins__ namespace, again using the dir() function (note the complete list of names has been abbreviated):
>>> dir(__builtins__)
['ArithmeticError', ... 'copyright', 'credits', ... 'help', ... 'license', ... 'zip']
>>>
Namespaces are a simple concept. A namespace is a place in which a name resides. Each name within a namespace is distinct from names outside of the namespace. This layering of namespaces is called scope. A name is placed within a namespace when that name is given a value. For example:
>>> dir()
['__builtins__', '__doc__', '__name__']
>>> name = "Bob"
>>> import math
>>> dir()
['__builtins__', '__doc__', '__name__', 'math', 'name']
Note that I was able to add the "name" variable to the namespace using a simple assignment statement. The import statement was used to add the "math" name to the current namespace. To see what math is, we can simply evaluate it:
>>> math
<module 'math' (built-in)>
Since it is a module, it also has a namespace. To display the names within this namespace, we again use dir():
>>> dir(math)
['__doc__', '__name__', 'acos', 'asin', 'atan', 'atan2', 'ceil', 'cos', 'cosh', 'degrees', 'e', 'exp', 'fabs', 'floor', 'fmod', 'frexp', 'hypot', 'ldexp', 'log', 'log10', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh']
>>>
If you look closely, you will notice that both the default namespace and the math module namespace have a '__name__' object. The fact that each layer can contain an object with the same name is what scope is all about. To access objects inside a namespace, simply use the name of the module, followed by a dot, followed by the name of the object. This allows us to differentiate between the __name__ object within the current namespace and the object with the same name within the math module. For example:
>>> print __name__
__main__
>>> print math.__name__
math
>>> print math.__doc__
This module is always available.  It provides access to the
mathematical functions defined by the C standard.
>>> math.pi
3.1415926535897931
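The same idea works with names you define yourself. In this sketch the variable pi in the current namespace is a deliberately silly, made-up value, chosen to show that it does not collide with the pi inside the math module.

```python
import math

pi = 'apple'                    # a name in the current namespace (illustrative)

print(pi)                       # the local name
print(math.pi > 3.14)           # the unrelated name inside the math module

print('math' in dir())          # import added 'math' to the current namespace
print('sqrt' in dir(math))      # names inside the math module's namespace
print(math.__name__)
```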
Data types
Data types determine whether an object can do something, or whether it just would not make sense. Other programming languages often determine whether an operation makes sense for an object by making sure the object can never be stored somewhere where the operation will be performed on the object (this type system is called static typing). Python does not do that. Instead it stores the type of an object with the object, and checks when the operation is performed whether that operation makes sense for that object (this is called dynamic typing).
Python's basic datatypes are:
- Integers, equivalent to C longs
- Floating-Point numbers, equivalent to C doubles
- Long integers of non-limited length
- Complex Numbers.
- Strings
- Some others, such as type and function
Python's composite datatypes are:
- lists
- tuples
- dictionaries, also called dicts, hashmaps, or associative arrays
Literal integers can be entered as in C:
- decimal numbers can be entered directly
- octal numbers can be entered by prepending a 0 (0732 is octal 732, for example)
- hexadecimal numbers can be entered by prepending a 0x (0xff is hex FF, or 255 in decimal)
Floating point numbers can be entered directly.
Long integers are entered either directly (1234567891011121314151617181920 is a long integer) or by appending an L (0L is a long integer). Computations involving short integers that overflow are automatically turned into long integers.
Complex numbers are entered by adding a real number and an imaginary one, which is entered by appending a j (e.g., 10+5j is a complex number; so is 10j). Note that j by itself does not constitute a number; if the imaginary unit is desired, use 1j.
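A few of these literals can be tried directly. This sketch sticks to the hexadecimal and complex forms; the variable z is just an example name.

```python
print(0xff)              # hexadecimal literal, 255 in decimal

z = 10 + 5j              # a real part plus an imaginary part
print(z.real)
print(z.imag)

print(1j * 1j)           # the imaginary unit squared is -1
```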
Strings can be either single or triple quoted strings. The difference is in the starting and ending delimiters, and in that single quoted strings cannot span more than one line. Single quoted strings are entered by entering either a single quote (') or a double quote (") followed by its match. So therefore
'foo' works, and
"moo" works as well,
but
'bar" does not work, and
"baz' does not work either.
"quux'' is right out.
Triple quoted strings are like single quoted strings, but can span more than one line. Their starting and ending delimiters must also match. They are entered with three consecutive single or double quotes, so
'''foo''' works, and
"""moo""" works as well,
but
'"'bar'"' does not work, and
"""baz''' does not work either.
'"'quux"'" is right out.
Tuples are entered in parentheses, with commas between the entries:
(10, 'Mary had a little lamb')
Also, the parentheses can be left out when it's not ambiguous to do so:
10, 'whose fleece was as white as snow'
Note that one-element tuples can be entered by surrounding the entry with parentheses and adding a comma like so:
('this is a stupid tuple',)
Lists are similar, but with brackets:
['abc', 1,2,3]
Dicts are created by surrounding a list of key:value pairs with curly braces. Each key is separated from its value by a colon, and the pairs are separated from each other by commas:
{ 'hello': 'world', 'weight': 'African or European?' }
Any of these composite types can contain any other, to any depth:
((((((((('bob',),['Mary', 'had', 'a', 'little', 'lamb']), { 'hello' : 'world' } ),),),),),),)
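Nested values are reached by chaining index operations, one per level. The record below is made-up example data, sketched only to show the idea.

```python
record = {
    'name': ('Mary', 'Lamb'),                  # a tuple inside a dict
    'words': ['had', 'a', 'little', 'lamb'],   # a list inside a dict
    'answers': {'hello': 'world'},             # a dict inside a dict
}

print(record['name'][0])
print(record['words'][-1])
print(record['answers']['hello'])

single = ('only one entry',)    # the trailing comma makes this a tuple
print(len(single))
```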
Numbers
Python supports four types of numbers: the int, the long, the float and the complex. You don't have to specify what type of variable you want; Python does that automatically.
- Int: This is the basic integer type in Python; it is equivalent to the hardware C long for the platform you are using.
- Long: This is an integer of unlimited length. In Python 2.2 and later, ints are automatically turned into longs when they overflow.
- Float: This is a binary floating point number. Longs and ints are automatically converted to floats when a float is used in an expression, and when the true-division / operator is used (after from __future__ import division). The // operator, by contrast, performs floor division.
- Complex: This is a complex number consisting of two floats. It is written in engineering-style notation, with j for the imaginary unit.
In general, the number types are automatically 'up cast' in this order:
Int → Long → Float → Complex. The farther to the right you go, the higher the precedence.
>>> x = 5
>>> type(x)
<type 'int'>
>>> x = 187687654564658970978909869576453
>>> type(x)
<type 'long'>
>>> x = 1.34763
>>> type(x)
<type 'float'>
>>> x = 5 + 2j
>>> type(x)
<type 'complex'>
However, some expressions may be confusing since, in the current version of Python, using the / operator on two integers returns another integer, using floor division. For example, 5/2 will give you 2. You have to write one of the operands as a float to get true division, e.g. 5/2. or 5./2 (the dot specifies that you want to work with a float), to get 2.5. This behavior is deprecated and will disappear in a future Python release; the new behavior can already be enabled with from __future__ import division.
>>> 5/2
2
>>> 5/2.
2.5
>>> 5./2
2.5
>>> from __future__ import division
>>> 5/2
2.5
>>> 5//2
2
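If you want code that gives the same answers regardless of which division rule is in effect, one approach (a sketch, not the only option) is to use // whenever you mean floor division and to make one operand a float whenever you mean true division.

```python
quotient = 5 // 2          # floor division: an integer result
exact = 5 / 2.0            # a float operand forces true division

print(quotient)
print(exact)
print(-5 // 2)             # floor division rounds toward negative infinity
print(float(5) / 2)        # float() converts an integer held in a variable
```

Note that -5 // 2 is -3, not -2, because the result is rounded down rather than toward zero.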
Strings
String manipulation
String operations
Equality
Two strings are equal if and only if they have exactly the same contents, meaning that they are both the same length and each character has a one-to-one positional correspondence. Many other languages test strings only for identity; that is, they only test whether two strings occupy the same space in memory. This latter operation is possible in Python using the operator is.
Example:
>>> a = 'hello'; b = 'hello'   # Assign 'hello' to a and b.
>>> print a == b               # True
True
>>> print a == 'hello'         # True
True
>>> print a == "hello"         # (choice of delimiter is unimportant)
True
>>> print a == 'hello '        # (extra space)
False
>>> print a == 'Hello'         # (wrong case)
False
Numerical
There are two quasi-numerical operations which can be done on strings -- addition and multiplication. String addition is just another name for concatenation. String multiplication is repetitive addition, or concatenation. So:
>>> c = 'a'
>>> c + 'b'
'ab'
>>> c * 5
'aaaaa'
Containment
There is a simple operator 'in' that returns True if the first operand is contained in the second. This also works on substrings:
>>> x = 'hello'
>>> y = 'll'
>>> x in y
False
>>> y in x
True
Note that 'print x in y' would have also returned the same value.
Indexing and Slicing
Much like arrays in other languages, the individual characters in a string can be accessed by an integer representing its position in the string. The first character in string s would be s[0] and the nth character would be at s[n-1].
>>> s = "Xanadu"
>>> s[1]
'a'
Unlike arrays in other languages, Python also indexes the arrays backwards, using negative numbers. The last character has index -1, the second to last character has index -2, and so on.
>>> s[-4]
'n'
We can also use "slices" to access a substring of s. s[a:b] will give us a string starting with s[a] and ending with s[b-1].
>>> s[1:4]
'ana'
Neither of these is assignable.
>>> print s
>>> s[0] = 'J'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object does not support item assignment
>>> s[1:3] = "up"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object does not support slice assignment
>>> print s
Outputs (assuming the errors were suppressed):
Xanadu
Xanadu
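Since a string cannot be changed in place, the usual workaround is to build a new string from slices of the old one. The following is a small sketch of that idea; the names t and u are made up for the illustration.

```python
s = "Xanadu"

# "Replace" the first character by concatenating a new one with the rest.
t = 'J' + s[1:]

# "Replace" the slice s[1:3] by joining the parts around it.
u = s[:1] + "up" + s[3:]

print(t)   # Janadu
print(u)   # Xupadu
print(s)   # Xanadu (the original string is untouched)
```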
Another feature of slices is that if the beginning or end is left empty, it will default to the first or last index, depending on context:
>>> s[2:]
'nadu'
>>> s[:3]
'Xan'
>>> s[:]
'Xanadu'
You can also use negative numbers in slices:
>>> print s[-2:]
du
To understand slices, it's easiest not to count the elements themselves. It is a bit like counting not on your fingers, but in the spaces between them. The list is indexed like this:
Element:     1     2     3     4
Index:    0     1     2     3     4
         -4    -3    -2    -1
So, when we ask for the [1:3] slice, that means we start at index 1, and end at index 3, and take everything in between them. If you are used to indexes in C or Java, this can be a bit disconcerting until you get used to it.
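A few properties follow from the "spaces between the elements" picture, and they can be checked directly. The snippet below is only an illustrative sketch using the string from the earlier examples.

```python
s = "Xanadu"

# The slice [1:3] starts at boundary 1 and stops at boundary 3.
print(s[1:3])            # an

# The length of s[a:b] is b - a (for in-range, non-negative a <= b).
print(len(s[1:4]))       # 3

# Cutting a string at any boundary i and gluing the halves back
# together always gives the original string.
for i in range(len(s) + 1):
    assert s[:i] + s[i:] == s

# Negative boundaries count from the right-hand end.
print(s[-4:-1])          # nad
```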
String constants
The standard string module defines a number of useful constants, such as string.lowercase and string.printable. String literals may be delimited with either single or double quotes.
String methods
Strings have a number of built-in methods:
- capitalize
- center
- count
- decode
- encode
- endswith
- expandtabs
- find
- index
- isalnum
- isalpha
- isdigit
- islower
- isspace
- istitle
- isupper
- join
- ljust
- lower
- lstrip
- replace
- rfind
- rindex
- rjust
- rstrip
- split
- splitlines
- startswith
- strip
- swapcase
- title
- translate
- upper
- zfill
Only some of these methods are covered below.
is*
isalnum(), isalpha(), isdigit(), islower(), isupper(), isspace(), and istitle() fit into this category.
The string must contain at least one character, or the is* methods return False. In other words, an empty string (one for which len(string) == 0) never passes any of these tests.
- isalnum returns True if the string is entirely composed of alphabetic or numeric characters (i.e. no punctuation).
- isalpha and isdigit work similarly for alphabetic characters or numeric characters only.
- isspace returns True if the string is composed entirely of whitespace.
- islower, isupper, and istitle return True if the string is in lowercase, uppercase, or titlecase respectively. Uncased characters, such as digits, are "allowed", but there must be at least one cased character in the string in order to return True. Titlecase means the first cased character of each word is uppercase, and any immediately following cased characters are lowercase. Curiously, 'Y2K'.istitle() returns True. That is because uppercase characters can only follow uncased characters. Likewise, lowercase characters can only follow cased characters. Hint: whitespace is uncased.
Example:
>>> '2YK'.istitle()
False
>>> '2Yk'.istitle()
True
>>> '2Y K'.istitle()
True
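The rules above, including the special case of the empty string, can be checked with a handful of calls. This is an illustrative sketch; the sample strings and the checks dictionary are arbitrary choices.

```python
# Each entry maps a description to the result of one is* call.
checks = {
    "empty isalnum":    ''.isalnum(),          # empty string: always False
    "empty isspace":    ''.isspace(),          # empty string: always False
    "letters+digits":   'abc123'.isalnum(),    # no punctuation or spaces
    "with a space":     'abc 123'.isalnum(),   # the space is not alphanumeric
    "whitespace only":  ' \t\n'.isspace(),
    "digits islower":   '123'.islower(),       # no cased character at all
    "mixed islower":    'abc123'.islower(),    # digits are uncased, so allowed
    "Y2K istitle":      'Y2K'.istitle(),       # K follows an uncased character
}

for name in sorted(checks):
    print(name + ": " + str(checks[name]))
```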
title, upper, lower, swapcase, capitalize
These methods return a copy of the string converted to title case, upper case, or lower case, with its case inverted, or capitalized, respectively.
The title method capitalizes the first letter of each word in the string (and makes the rest lower case). Words are identified as substrings of alphabetic characters that are separated by non-alphabetic characters, such as digits, or whitespace. This can lead to some unexpected behavior. For example, the string "x1x" will be converted to "X1X" instead of "X1x".
The swapcase method makes all uppercase letters lowercase and vice versa.
The capitalize method is like title except that it considers the entire string to be a word. (i.e. it makes the first character upper case and the rest lower case)
Example:
>>> s = 'Hello, wOrLD'
>>> s
'Hello, wOrLD'
>>> s.title()
'Hello, World'
>>> s.swapcase()
'hELLO, WoRld'
>>> s.upper()
'HELLO, WORLD'
>>> s.lower()
'hello, world'
>>> s.capitalize()
'Hello, world'
count
Returns the number of non-overlapping occurrences of the specified substring in the string. For example:
>>> s = 'Hello, world'
>>> s.count('l') # print the number of 'l's in 'Hello, world' (3)
3
strip, rstrip, lstrip
Returns a copy of the string with the leading (lstrip) and trailing (rstrip) whitespace removed. strip removes both.
>>> s = '\t Hello, world\n\t '
>>> print s
         Hello, world
         
>>> print s.strip()
Hello, world
>>> print s.lstrip()
Hello, world
         # ends here
>>> print s.rstrip()
         Hello, world
Note the leading and trailing tabs and newlines.
Strip methods can also be used to remove other types of characters.
import string
s = 'www.wikibooks.org'
print s
print s.strip('w')              # Removes all w's from outside
print s.strip(string.lowercase) # Removes all lowercase letters from outside
print s.strip(string.printable) # Removes all printable characters
Outputs:
www.wikibooks.org
.wikibooks.org
.wikibooks.

(The last line of output is empty, because every character was stripped.)
Note that string.lowercase and string.printable require an import string statement.
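One point worth spelling out is that the argument to strip is treated as a set of characters, not as a prefix or suffix to match. The sketch below illustrates this with the same example string; the variable names are invented for the illustration.

```python
s = 'www.wikibooks.org'

# Only the character 'w' is stripped, so stripping stops at the first '.'.
outer_w = s.strip('w')

# Both 'w' and '.' are stripped, so the 'w' of 'wikibooks' goes too.
both = s.strip('w.')

# rstrip removes any trailing run of the characters g, r, o and '.',
# in whatever order they happen to appear.
right = s.rstrip('gro.')

print(outer_w)   # .wikibooks.org
print(both)      # ikibooks.org
print(right)     # www.wikibooks
```

If the intent is to remove a fixed prefix such as 'www.', slicing after a startswith check is usually the safer approach.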
ljust, rjust, center
These methods left, right or center justify a string within a field of the given width (the rest is padded with spaces).
>>> s = 'foo'
>>> s
'foo'
>>> s.ljust(7)
'foo    '
>>> s.rjust(7)
'    foo'
>>> s.center(7)
'  foo  '
join
Joins together the given sequence with the string as separator:
>>> seq = ['1', '2', '3', '4', '5']
>>> ' '.join(seq)
'1 2 3 4 5'
>>> '+'.join(seq)
'1+2+3+4+5'
The items being joined must be strings. If they are not, map may be helpful (here it converts the numbers in seq into strings):
>>> seq = [1,2,3,4,5]
>>> ' '.join(map(str, seq))
'1 2 3 4 5'
Now arbitrary objects may be in seq instead of just strings.
find, index, rfind, rindex
The find and index methods return the index of the first occurrence of the given substring. If it is not found, find returns -1 but index raises a ValueError. rfind and rindex are the same as find and index except that they search through the string from right to left (i.e. they find the last occurrence).
>>> s = 'Hello, world'
>>> s.find('l')
2
>>> s[s.index('l'):]
'llo, world'
>>> s.rfind('l')
10
>>> s[:s.rindex('l')]
'Hello, wor'
>>> s[s.index('l'):s.rindex('l')]
'llo, wor'
Because Python strings accept negative subscripts, index is probably the better choice in situations like the one shown. If the substring were missing, find would return -1, which is itself a valid subscript, so the slice would silently yield an incorrect value instead of raising an error.
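The pitfall with find can be reproduced in a few lines. This is a minimal sketch; the searched-for letter 'z' is simply chosen because it does not occur in the string.

```python
s = 'Hello, world'

# find signals "not found" with -1 ...
pos = s.find('z')

# ... but -1 is also a valid subscript, so the slice quietly succeeds
# and returns the last character instead of failing.
wrong = s[pos:]

# index raises ValueError instead, which makes the problem visible.
try:
    s[s.index('z'):]
    caught = False
except ValueError:
    caught = True

print(pos)      # -1
print(wrong)    # d
print(caught)   # True
```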
replace
Replace works just like it sounds. It returns a copy of the string with all occurrences of the first parameter replaced with the second parameter.
>>> 'Hello, world'.replace('o', 'X')
'HellX, wXrld'
Or, using variable assignment:
string = 'Hello, world'
newString = string.replace('o', 'X')
print string
print newString
Outputs:
Hello, world
HellX, wXrld
Notice, the original variable (string) remains unchanged after the call to replace.
expandtabs
Replaces tabs with the appropriate number of spaces. (default number of spaces per tab = 8; this can be changed by passing the tab size as an argument)
s = 'abcdefg\tabc\ta'
print s
print len(s)
t = s.expandtabs()
print t
print len(t)
Outputs:
abcdefg abc     a
13
abcdefg abc     a
17
Notice how (although these both look the same) the second string (t) has a different length because each tab is represented by spaces not tab characters.
To use a tab size of 4 instead of 8:
v = s.expandtabs(4)
print v
print len(v)
Outputs:
abcdefg abc a
13
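The lengths quoted above follow from the way each tab advances to the next tab stop (the next column that is a multiple of the tab size). The following sketch checks this for the same string; the names wide and narrow are only for illustration.

```python
s = 'abcdefg\tabc\ta'

# Tab size 8: 'abcdefg' ends at column 7, so the first tab adds 1 space;
# 'abc' ends at column 11, so the second tab adds 5 spaces (up to column 16).
wide = s.expandtabs()

# Tab size 4: the stops are at columns 8 and 12, so each tab adds 1 space.
narrow = s.expandtabs(4)

print(repr(wide))
print(len(wide))     # 17
print(repr(narrow))
print(len(narrow))   # 13
```

With a tab size of 4 the expanded string happens to have the same length as the original, because each tab is replaced by exactly one space.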
split, splitlines
The split method returns a list of the words in the string. It can take a separator argument to use instead of whitespace.
>>> s = 'Hello, world'
>>> s.split()
['Hello,', 'world']
>>> s.split('l')
['He', '', 'o, wor', 'd']
Note that in neither case is the separator included in the split strings, but empty strings are allowed.
The splitlines method breaks a multiline string into many single line strings. It is analogous to split('\n') (but accepts '\r' and '\r\n' as delimiters as well) except that if the string ends in a newline character, splitlines ignores that final character (see example).
>>> s = """
... One line
... Two lines
... Red lines
... Blue lines
... Green lines
... """
>>> s.split('\n')
['', 'One line', 'Two lines', 'Red lines', 'Blue lines', 'Green lines', '']
>>> s.splitlines()
['', 'One line', 'Two lines', 'Red lines', 'Blue lines', 'Green lines']
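The difference in line-ending handling shows up as soon as a string mixes '\r\n' and '\n'. This is a small sketch with a made-up sample string.

```python
# One Windows-style line ending, one Unix-style, and a trailing newline.
text = 'one\r\ntwo\nthree\n'

# split('\n') leaves the '\r' attached and adds an empty final item.
by_newline = text.split('\n')

# splitlines understands both endings and ignores the trailing newline.
by_lines = text.splitlines()

print(by_newline)
print(by_lines)
```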
Lists
About lists in Python
A list in Python is an ordered group of items (or elements). It is a very general structure, and list elements don't have to be of the same type. For instance, you could put numbers, letters, strings and donkeys all on the same list.
If you are using a modern version of Python (and you should be), there is a class called 'list'. If you wish, you can make your own subclass of it, and determine list behaviour which is different than the default standard. But first, you should be familiar with the current behaviour of lists.
List notation
There are two different ways to make a list in Python. The first is through assignment ("statically"), the second is using list comprehensions ("actively").
To make a static list of items, write them between square brackets. For example:
[ 1,2,3,"This is a list",'c',Donkey("kong") ]
A couple of things to look at.
- There are different data types here. Lists in python may contain more than one data type.
- Objects can be created 'on the fly' and added to lists. The last item is a new kind of Donkey.
Writing lists this way is very quick (and obvious). However, it does not take into account the current state of anything else. The other way to make a list is to form it using list comprehension. That means you actually describe the process. To do that, the list is broken into two pieces. The first is a picture of what each element will look like, and the second is what you do to get it.
For instance, let's say we have a list of words:
listOfWords = ["this","is","a","list","of","words"]
We will take the first letter of each word and make a list out of it.
>>> listOfWords = ["this","is","a","list","of","words"]
>>> items = [ word[0] for word in listOfWords ]
>>> print items
['t', 'i', 'a', 'l', 'o', 'w']
A list comprehension can contain more than one for clause. The clauses behave like nested loops: for each item of the first sequence, the comprehension runs through every item of the second, so the result contains every combination.
>>> item = [x+y for x in 'flower' for y in 'pot']
>>> print item
['fp', 'fo', 'ft', 'lp', 'lo', 'lt', 'op', 'oo', 'ot', 'wp', 'wo', 'wt', 'ep', 'eo', 'et', 'rp', 'ro', 'rt']
In Python 2, a list comprehension does not define its own scope (Python 3 changed this). Any variables that are bound in an evaluation remain bound to whatever they were last bound to when the evaluation was completed:
>>> print x, y
r t
This is exactly the same as if the comprehension had been expanded into an explicitly-nested group of one or more 'for' statements and 0 or more 'if' statements.
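The equivalence with nested for and if statements can be made concrete. The sketch below uses a shorter pair of strings and an arbitrary filter condition, both chosen only for the illustration.

```python
# A comprehension with two for clauses and one if clause ...
pairs = [x + y for x in 'ab' for y in 'xyz' if y != 'y']

# ... and the same thing written as explicit nested statements.
# The clauses nest in the order they are written, left to right.
expanded = []
for x in 'ab':
    for y in 'xyz':
        if y != 'y':
            expanded.append(x + y)

print(pairs)
print(pairs == expanded)   # True
```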
List creation shortcuts
Python provides a shortcut to initialize a list to a particular size and with an initial value for each element:
>>> zeros=[0]*5
>>> print zeros
[0, 0, 0, 0, 0]
This works for any data type:
>>> foos=['foo']*8
>>> print foos
['foo', 'foo', 'foo', 'foo', 'foo', 'foo', 'foo', 'foo']
There is a caveat, however. When building a new list by multiplying, Python copies each item by reference. This poses a problem for mutable items, for instance in a multidimensional array where each element is itself a list. You'd guess that the easy way to generate a two dimensional array would be:
listoflists=[ [0]*4 ] *5
and this works, but probably doesn't do what you expect:
>>> listoflists=[ [0]*4 ] *5
>>> print listoflists
[[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
>>> listoflists[0][2]=1
>>> print listoflists
[[0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 1, 0]]
What's happening here is that Python is using the same reference to the inner list as the elements of the outer list. Another way of looking at this issue is to examine how Python sees the above definition:
>>> innerlist=[0]*4
>>> listoflists=[innerlist]*5
>>> print listoflists
[[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
>>> innerlist[2]=1
>>> print listoflists
[[0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 1, 0]]
Assuming the above effect is not what you intend, one way around this issue is to use list comprehensions:
>>> listoflists=[[0]*4 for i in range(5)]
>>> print listoflists
[[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
>>> listoflists[0][2]=1
>>> print listoflists
[[0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
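The is operator gives a direct way to see whether the rows are one shared object or separate objects. This is a minimal sketch; the names shared and separate are invented for the comparison.

```python
# Multiplying the outer list repeats a reference to ONE inner list.
shared = [[0] * 4] * 5

# The comprehension evaluates [0] * 4 afresh for every row.
separate = [[0] * 4 for i in range(5)]

print(shared[0] is shared[1])       # True: the same object five times
print(separate[0] is separate[1])   # False: five distinct lists

shared[0][2] = 1
separate[0][2] = 1

print(shared[4])     # the change shows up in every row
print(separate[4])   # only row 0 was changed
```

Multiplying the inner list ([0] * 4) is harmless here because integers are immutable.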
Operations on lists
List Attributes
To find the length of a list use the built-in len() function.
>>> len([1,2,3])
3
>>> a = [1,2,3,4]
>>> len( a )
4
Combining lists
Lists can be combined in several ways. The easiest is just to 'add' them. For instance:
>>> [1,2] + [3,4]
[1, 2, 3, 4]
Another way to combine lists is with extend, which adds the items of one list to the end of another in place. Because extend is a method call rather than a statement, it can also be used inside a lambda.
>>> a = [1,2,3]
>>> b = [4,5,6]
>>> a.extend(b)
>>> print a
[1, 2, 3, 4, 5, 6]
To add a single value to the end of a list, use append. Note that appending a list adds it as one nested item rather than merging its elements. For example:
>>> p=[1,2]
>>> p.append([3,4])
>>> p
[1, 2, [3, 4]]
>>> # or
>>> print p
[1, 2, [3, 4]]
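The three ways of combining lists differ in whether they create a new list and in how they treat a list argument. The following sketch puts them side by side; the variable names are arbitrary.

```python
a = [1, 2]

# + builds a brand new list and leaves its operands alone.
b = a + [3, 4]

# extend adds each item of its argument to the existing list, in place.
c = [1, 2]
result = c.extend([3, 4])   # extend itself returns None

# append adds its argument as ONE item, even if that item is a list.
d = [1, 2]
d.append([3, 4])

print(a)        # [1, 2]
print(b)        # [1, 2, 3, 4]
print(c)        # [1, 2, 3, 4]
print(result)   # None
print(d)        # [1, 2, [3, 4]]
```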
Getting pieces of lists (slices)
Like strings, lists can be indexed and sliced.
>>> list = [2, 4, "usurp", 9.0,"n"]
>>> list[2]
'usurp'
>>> list[3:]
[9.0, 'n']
Much like the slice of a string is a substring, the slice of a list is a list. However, lists differ from strings in that we can assign new values to the items in a list.
>>> list[1] = 17
>>> list
[2, 17, 'usurp', 9.0, 'n']
We can even assign new values to slices of a list, and the replacement doesn't have to be the same length as the slice:
>>> list[1:4] = ["opportunistic", "elk"]
>>> list
[2, 'opportunistic', 'elk', 'n']
It's even possible to insert things at the beginning of a list by assigning to an empty slice:
>>> list[:0] = [3.14,2.71]
>>> list
[3.14, 2.71, 2, 'opportunistic', 'elk', 'n']
You can also replace the entire contents of a list:
>>> list[:] = ['new', 'list', 'contents']
>>> list
['new', 'list', 'contents']
The right side of the assignment can be any iterable:
>>> list[:2] = ('element',('t',),[])
>>> list
['element', ('t',), [], 'contents']
Because a slice returns a new list, slicing can be used to create a copy of a list:
>>> original = [1, 'element', []]
>>> list_copy = original[:]
>>> list_copy
[1, 'element', []]
>>> list_copy.append('new element')
>>> list_copy
[1, 'element', [], 'new element']
>>> original
[1, 'element', []]
However, this is a shallow copy: it contains references to the same elements as the original list, so be careful with mutable types:
>>> list_copy[2].append('something')
>>> original
[1, 'element', ['something']]
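When the nested elements need to be independent as well, the standard copy module offers deepcopy, which copies the contained objects recursively. The sketch below contrasts it with a slice copy; it is offered as one possible approach, and the names shallow and deep are made up for the example.

```python
import copy

original = [1, 'element', []]

shallow = original[:]             # new outer list, same inner objects
deep = copy.deepcopy(original)    # new outer list AND new inner objects

# Changing the inner list through the shallow copy affects the original ...
shallow[2].append('shared')

# ... while the deep copy has its own inner list.
deep[2].append('private')

print(original)   # [1, 'element', ['shared']]
print(deep)       # [1, 'element', ['private']]
```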
Comparing lists
Lists can be compared for equality.
>>> [1,2] == [1,2]
True
>>> [1,2] == [3,4]
False
Sorting lists
Sorting lists is easy with a sort method.
>>> list = [2, 3, 1, 'a', 'b']
>>> list.sort()
>>> list
[1, 2, 3, 'a', 'b']
Note that the list is sorted in place, and the sort() method returns None to emphasize this side effect.
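If you use Python 2.4 or higher, sort accepts some additional parameters:
sort(cmp,key,reverse)
cmp : comparison function to be used for sorting; it takes two items and returns a negative number, zero, or a positive number
key : function to be called on each element; the list is sorted by the return values of the function
reverse : if True, the list is sorted in descending order instead of ascending order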
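The key and reverse parameters can be sketched with a short list of words. The example deliberately leaves out cmp, which was removed in Python 3, so that it runs under both Python 2.4+ and Python 3; the word list is arbitrary.

```python
words = ['pear', 'fig', 'banana', 'kiwi']

# Plain sort: alphabetical order. Each sort works on its own copy.
plain = words[:]
returned = plain.sort()          # sorts in place and returns None

# key=len: sort by the length of each word. The sort is stable, so
# 'pear' stays ahead of 'kiwi' (both have length 4).
by_length = words[:]
by_length.sort(key=len)

# reverse=True: longest first; equal-length words keep their order.
longest_first = words[:]
longest_first.sort(key=len, reverse=True)

print(plain)           # ['banana', 'fig', 'kiwi', 'pear']
print(returned)        # None
print(by_length)       # ['fig', 'pear', 'kiwi', 'banana']
print(longest_first)   # ['banana', 'pear', 'kiwi', 'fig']
```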
List methods
append(x)
Add item x onto the end of the list.
>>> list = [1, 2, 3] >>> list.