# Python Programming/Print version

Python is a general purpose programming language.

Overview
Getting Python
Interactive mode

## Learning to program in Python

Creating Python programs

Basic syntax
Data types
Numbers
Strings
Lists
Tuples
Dictionaries
Sets
Operators
Control Flow
Functions
Scoping
Exceptions
Input and output
Modules
Classes
MetaClasses

## Rocking the Python (Modules)

Regular Expression
Graphical User Interfaces in Python
Game Programming in Python
Socket programming
Files (I/O)
Databases
Extracting info from web pages
Extending with C
Extending with C++
Extending with ctypes
WSGI web programming

Authors

# Overview

Python is a high-level, structured, open-source programming language that can be used for a wide variety of programming tasks. Python was created by Guido van Rossum in the early 1990s; its following has grown steadily, and interest has increased markedly in recent years. It is named after the comedy program Monty Python's Flying Circus.

Python is used extensively for system administration (many vital components of Linux distributions are written in it), and it is a great language for teaching programming to novices. NASA has used Python for its software systems and has adopted it as the standard scripting language for its Integrated Planning System. Google uses Python extensively to implement many components of its web crawler and search engine, and Yahoo! uses it for managing its discussion groups.

Python is an interpreted programming language: source code is automatically compiled into bytecode before execution (the bytecode is then normally saved to disk, just as automatically, so that compilation need not happen again until the source changes). It is also a dynamically typed language that includes (but does not require one to use) object-oriented features and constructs.

The most unusual aspect of Python is that whitespace is significant; instead of block delimiters (braces → "{}" in the C family of languages), indentation is used to indicate where blocks begin and end.

For example, the following Python code, typed interactively at an interpreter prompt, displays the famous "Hello World!" on the user's screen:

 >>> print "Hello World!"
Hello World!


Another great feature of Python is its availability on many platforms. Python runs on Microsoft Windows, Mac OS X and Linux distributions with ease. This makes programs very portable, as a program written for one platform can usually be used on another with little or no change.

Python provides a powerful assortment of built-in types (e.g., lists, dictionaries and strings), a number of built-in functions, and a few constructs, mostly statements. One example is loop constructs that can iterate over the items in a collection instead of being limited to a simple range of integer values. Python also comes with a powerful standard library, which includes hundreds of modules that provide routines for a wide variety of services, including regular expressions and TCP/IP sessions.
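
As a small illustration of iterating over the items in a collection rather than over a range of integers, the sketch below loops over a list and a dictionary. The names fruits and prices are made up for this example, and the single-argument print calls are chosen so that the code should run under both Python 2 and Python 3.

```python
# Hypothetical sample data, not taken from the text above.
fruits = ["apple", "banana", "cherry"]
prices = {"apple": 3, "banana": 1, "cherry": 7}

# Iterate directly over the items of the list; no index variable is needed.
for fruit in fruits:
    print(fruit)

# Iterating over a dictionary yields its keys.
total = 0
for name in prices:
    total += prices[name]

print("total = %d" % total)   # total = 11
```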

Python is used and supported by a large community that exists on the Internet. Mailing lists and newsgroups, such as the tutor list, actively support and help new Python programmers. While they discourage doing homework for you, they are quite helpful and are populated by the authors of many of the Python textbooks currently available on the market.

 Python 2 vs Python 3: Several years ago, the Python developers made the decision to come up with a major new version of Python. Initially called “Python 3000”, this became the 3.x series of versions of Python. What was radical about this was that the new version is backward-incompatible with Python 2.x: certain old features (like the handling of Unicode strings) were deemed to be too unwieldy or broken to be worth carrying forward. Instead, new, cleaner ways of achieving the same results were added.

# Getting Python

In order to program in Python you need the Python interpreter. If it is not already installed or if the version you are using is obsolete, you will need to obtain and install Python using the methods below:

## Python 2 vs Python 3

In 2008, a new version of Python (version 3) was published that was not entirely backward compatible. Developers were asked to switch to the new version as soon as possible, but many of the common external modules are not yet (as of Aug 2010) available for Python 3. There is a program called 2to3 to convert the source code of a Python 2 program to the source code of a Python 3 program. Consider this fact before you start working with Python.
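
If you are unsure which major version a given interpreter provides, a script can check for itself. The following is a minimal sketch that uses only the standard sys module; the messages it prints are arbitrary.

```python
import sys

# sys.version_info is a tuple-like object; its first element is the major version.
major = sys.version_info[0]

if major >= 3:
    print("Running under Python 3 or later")
else:
    print("Running under Python 2")
```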

## Installing Python in Windows

Go to the Python Homepage or the ActiveState website and get the proper version for your platform. Download it, read the instructions and get it installed.

In order to run Python from the command line, you will need to have the python directory in your PATH. Alternatively, you could use an Integrated Development Environment (IDE) for Python like DrPython[1], eric[2], PyScripter[3], or Python's own IDLE (which ships with every version of Python since 2.3).

The PATH variable can be modified from the Windows System control panel. To add Python to the PATH in Windows 7:

1. Go to Start.
2. Right-click on Computer.
3. Click on Properties.
4. Click on 'Advanced system settings'.
5. Click on 'Environment Variables'.
6. In the system variables, select Path and edit it by appending a ';' (without quotes) followed by 'C:\python27' (without quotes).

If you prefer having a temporary environment, you can create a new command prompt short-cut that automatically executes the following statement:

PATH %PATH%;c:\python27


If you downloaded a different version (such as Python 3.1), replace the "27" with the version number of the Python you have (27 stands for 2.7.x, the current version of Python 2).

### Cygwin

By default, the Cygwin installer for Windows does not include Python in the downloads. However, it can be selected from the list of packages.

## Installing Python on Mac

Users of Apple Mac OS X will find that it already ships with Python 2.3 (OS X 10.4 Tiger) or Python 2.6.1 (OS X Snow Leopard). If you want a more recent version, go to the Python download page and follow the instructions on the page and in the installers. As a bonus, you will also install the Python IDE.

## Installing Python on Unix environments

Python is available as a package for some Linux distributions. In some cases, the distribution CD will contain the python package for installation, while other distributions require downloading the source code and using the compilation scripts.

### Gentoo Linux

Gentoo is an example of a distribution that installs Python by default — the package management system Portage depends on Python.

### Ubuntu Linux

Users of Ubuntu will notice that Python comes installed by default, although it is sometimes not the latest version.

### Arch Linux

Arch Linux does not come with Python pre-installed by default, but it is easily available for installation through the package manager, pacman. As root (or using sudo if you've installed and configured it), type:

pacman -S python


This will install Python 3. Python 2 can be installed with:

pacman -S python2


Other versions can be built from source from the Arch User Repository.

### Source code installations

Some platforms do not have a version of Python installed, and do not have pre-compiled binaries. In these cases, you will need to download the source code from the official site. Once the download is complete, you will need to unpack the compressed archive into a folder.

To build Python, simply run the configure script (requires the Bash shell) and compile using make.

### Other Distributions

Python, which is also referred to as CPython, is written in the C programming language. The C source code is generally portable, which means that CPython can run on various platforms. More precisely, CPython can be made available on all platforms that provide a compiler to translate the C source code to binary code for that platform.

Apart from CPython, there are other implementations that run on top of a virtual machine, for example on Java's JRE (Java Runtime Environment) or Microsoft's .NET CLR (Common Language Runtime). Both can access and use the libraries available on their platform. Specifically, they make use of reflection, which allows complete inspection and use of all classes and objects of the host platform.

Python Implementations (Platforms)

| Environment | Description | Get From |
|---|---|---|
| Jython | Java Version of Python | Jython |
| IronPython | C# Version of Python | IronPython |

### Integrated Development Environments (IDE)

CPython ships with IDLE; however, IDLE is not considered user-friendly.[1] For Linux, KDevelop and Spyder are popular. For Windows, PyScripter is free, quick to install, and comes included with PortablePython.

Some Integrated Development Environments (IDEs) for Python

| Environment | Description | Get From |
|---|---|---|
| ActivePython | Highly flexible, Pythonwin IDE | ActivePython |
| Anjuta | IDE for Linux/Unix | Anjuta |
| Eclipse (PyDev plugin) | Open-source IDE | Eclipse |
| Eric | Open-source Linux/Windows IDE | Eric |
| KDevelop | Cross-language IDE for KDE | KDevelop |
| Ninja-IDE | Cross-platform open-source IDE | Ninja-IDE |
| PyScripter | Free Windows IDE (portable) | PyScripter |
| Pythonwin | Windows-oriented environment | Pythonwin |
| Spyder | Free cross-platform IDE (math-oriented) | Spyder |
| VisualWx | Free GUI Builder | VisualWx |

The Python official wiki has a complete list of IDEs.

There are several commercial IDEs such as Komodo, BlackAdder, Code Crusader, Code Forge, and PyCharm. However, for beginners learning to program, purchasing a commercial IDE is unnecessary.

## Keeping Up to Date

Python has a very active community and the language itself is evolving continuously. Make sure to check python.org for recent releases and relevant tools. The website is an invaluable asset.

Public Python-related mailing lists are hosted at mail.python.org. Two examples of such mailing lists are the Python-announce-list to keep up with newly released third party-modules or software for Python and the general discussion list Python-list. These lists are mirrored to the Usenet newsgroups comp.lang.python.announce & comp.lang.python.

# Interactive mode

Python has two basic modes: normal and interactive. The normal mode is the mode where the scripted and finished .py files are run in the Python interpreter. Interactive mode is a command line shell which gives immediate feedback for each statement, while running previously fed statements in active memory. As new lines are fed into the interpreter, the fed program is evaluated both in part and in whole.

To start interactive mode, simply type "python" without any arguments. This is a good way to play around and try variations on syntax. Python should print something like this:

### Result

The program should print:

Hello, world!


Congratulations! You're well on your way to becoming a Python programmer.

## Exercises

1. Modify the hello.py program to say hello to someone from your family or your friends (or to Ada Lovelace).
2. Change the program so that after the greeting, it asks, "How did you get here?".
3. Re-write the original program to use two print statements: one for "Hello" and one for "world". The program should still only print out on one line.

Solutions

## Notes

1. Sometimes, Python programs are distributed in compiled form. We won't have to worry about that for quite a while.
2. A Quick Introduction to Unix/My First Shell Script explains what a hash bang line does.

# Basic syntax

There are five fundamental concepts in Python.

### Case Sensitivity

All variables are case-sensitive. Python treats 'number' and 'Number' as separate, unrelated entities.

### Spaces and tabs don't mix

Because whitespace is significant, remember that spaces and tabs don't mix, so use only one or the other when indenting your programs. A common error is to mix them. While they may look the same in an editor, the interpreter reads them differently, and the result is either an error or unexpected behavior. Most decent text editors can be configured to make the Tab key emit spaces instead.

Python's style guide (PEP 8) recommends using 4 spaces per indentation level.

Tip: If you invoke Python from the command line, you can pass the -t or -tt option to make Python issue a warning or an error, respectively, on inconsistent tab usage.

pythonprogrammer@wikibook:~$ python -tt myscript.py


This will issue an error if you have mixed spaces and tabs.

### Objects

In Python, like all object-oriented languages, there are aggregations of code and data called objects, which typically represent the pieces in a conceptual model of a system.

Objects in Python are created (i.e., instantiated) from templates called classes (which are covered later, as much of the language can be used without understanding classes). They have attributes, which represent the various pieces of code and data which make up the object. To access attributes, one writes the name of the object followed by a period (henceforth called a dot), followed by the name of the attribute.

An example is the 'upper' attribute of strings, which refers to the code that returns a copy of the string in which all the letters are uppercase. To get to this, it is necessary to have a way to refer to the object (in the following example, the way is the literal string that constructs the object).

'bob'.upper


Code attributes are called methods. So in this example, upper is a method of 'bob' (as it is of all strings). To execute the code in a method, use a matched pair of parentheses surrounding a comma separated list of whatever arguments the method accepts (upper doesn't accept any arguments). So to find an uppercase version of the string 'bob', one could use the following:

'bob'.upper()
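
The difference between looking up a method and calling it can be seen in a short sketch. The variable names (name, method, shouted) are chosen only for this illustration.

```python
name = 'bob'

# Looking up the attribute gives the method object itself; nothing is executed yet.
method = name.upper

# Adding parentheses calls the method and produces a new string.
shouted = name.upper()

print(callable(method))   # True: the attribute is a piece of code
print(shouted)            # BOB
print(name)               # bob (the original string is unchanged)
```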


### Scope

In a large system, it is important that one piece of code does not affect another in difficult-to-predict ways. One of the simplest ways to further this goal is to prevent one programmer's choice of a name from blocking another's use of that name. The concept of scope was invented to do this. A scope is a "region" of code in which a name can be used and outside of which the name cannot be easily accessed. There are two ways of delimiting regions in Python: with functions or with modules. Each offers a different way for code outside the scope to access useful data produced within it. With functions, that way is to return the data. The way to access names from other modules leads us to another concept.
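
A minimal sketch of function scope follows. The function compute_area and the variable names are hypothetical; the point is only that a name assigned inside the function does not disturb a name of the same spelling outside it, and that the value leaves the scope by being returned.

```python
def compute_area(width, height):
    # 'result' exists only inside this function's scope.
    result = width * height
    return result          # returning is how the value leaves the scope

result = "unrelated"       # a module-level name that happens to match

area = compute_area(3, 4)

print(area)                # 12
print(result)              # unrelated: the function did not touch this name
```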

### Namespaces

It would be possible to teach Python without the concept of namespaces because they are so similar to attributes, which we have already mentioned, but the concept of namespaces is one that transcends any particular programming language, and so it is important to teach. To begin with, there is a built-in function dir() that can be used to help one understand the concept of namespaces. When you first start the Python interpreter (i.e., in interactive mode), you can list the objects in the current (or default) namespace using this function.

Python 2.3.4 (#53, Oct 18 2004, 20:35:07) [MSC v.1200 32 bit (Intel)] on win32
>>> dir()
['__builtins__', '__doc__', '__name__']


This function can also be used to show the names available within a module's namespace. To demonstrate this, first we can use the type() function to show what kind of object __builtins__ is:

>>> type(__builtins__)
<type 'module'>


Since it is a module, it has a namespace. We can list the names within the __builtins__ namespace, again using the dir() function (note that the complete list of names has been abbreviated):

>>> dir(__builtins__)
['ArithmeticError', ... 'copyright', 'credits', ... 'help', ... 'license', ... 'zip']
>>>


Namespaces are a simple concept. A namespace is a particular place in which names specific to a module reside. Each name within a namespace is distinct from names outside of that namespace. This layering of namespaces is called scope. A name is placed within a namespace when that name is given a value. For example:

>>> dir()
['__builtins__', '__doc__', '__name__']
>>> name = "Bob"
>>> import math
>>> dir()
['__builtins__', '__doc__', '__name__', 'math', 'name']


Note that I was able to add the "name" variable to the namespace using a simple assignment statement. The import statement was used to add the "math" name to the current namespace. To see what math is, we can simply type its name:

>>> math
<module 'math' (built-in)>


Since it is a module, it also has a namespace. To display the names within this namespace, we again use dir():

>>> dir(math)
['__doc__', '__name__', 'acos', 'asin', 'atan', 'atan2', 'ceil', 'cos', 'cosh', 'degrees', 'e',
'exp', 'fabs', 'floor', 'fmod', 'frexp', 'hypot', 'ldexp', 'log', 'log10', 'modf', 'pi', 'pow',
'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh']
>>>


If you look closely, you will notice that both the default namespace and the math module namespace have a '__name__' object. The fact that each layer can contain an object with the same name is what scope is all about. To access objects inside a namespace, simply use the name of the module, followed by a dot, followed by the name of the object. This allows us to differentiate between the __name__ object within the current namespace, and that of the object with the same name within the math module. For example:

>>> print (__name__)
__main__
>>> print (math.__name__)
math
>>> print (math.__doc__)
This module is always available.  It provides access to the
mathematical functions defined by the C standard.
>>> math.pi
3.1415926535897931
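
The same idea can be put into a short script. In this sketch the module-level name pi is deliberately given an unrelated value (an assumption made only for the demonstration) to show that it does not collide with the pi inside the math namespace.

```python
import math

pi = "not the number"      # a name in the current (module) namespace

# The same name in two namespaces refers to two unrelated objects.
print(pi)
print(math.pi)

# dir() lists the names in a module's namespace; 'pi' is one of them.
has_pi = 'pi' in dir(math)
print(has_pi)              # True

# Each module records its own name in __name__.
print(math.__name__)       # math
```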


# Data types

Data types determine whether an object can do something, or whether it just would not make sense. Other programming languages often determine whether an operation makes sense for an object by making sure the object can never be stored somewhere where the operation will be performed on the object (this type system is called static typing). Python does not do that. Instead it stores the type of an object with the object, and checks when the operation is performed whether that operation makes sense for that object (this is called dynamic typing).
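
A minimal sketch of dynamic typing: the type travels with the object, a name can be rebound to an object of another type, and an operation that makes no sense is rejected only when it is actually executed. The variable names are illustrative.

```python
value = 'abc'
print(type(value).__name__)    # str

# The same name can later refer to an object of a different type.
value = 42
print(type(value).__name__)    # int

# The type check happens when the operation runs, not before.
try:
    'abc' + 1
    outcome = 'no error'
except TypeError:
    outcome = 'TypeError'

print(outcome)                 # TypeError
```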

##### Built-in Data types

Python's built-in (or standard) data types can be grouped into several classes. Sticking to the hierarchy scheme used in the official Python documentation these are numeric types, sequences, sets and mappings (and a few more not discussed further here). Some of the types are only available in certain versions of the language as noted below.

• boolean: the type of the built-in values True and False. Useful in conditional expressions, and anywhere else you want to represent the truth or falsity of some condition. Mostly interchangeable with the integers 1 and 0. In fact, conditional expressions will accept values of any type, treating special ones like boolean False, integer 0 and the empty string "" as equivalent to False, and all other values as equivalent to True. But for safety’s sake, it is best to only use boolean values in these places.
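
The truth values described above can be checked with the built-in bool() function. This is a small sketch; the list samples is made up for the demonstration.

```python
# Values the text lists as equivalent to False, plus a few others for contrast.
samples = [False, 0, "", True, 1, "text"]

truthiness = [bool(item) for item in samples]
print(truthiness)   # [False, False, False, True, True, True]

# Booleans are mostly interchangeable with the integers 1 and 0.
print(True == 1)    # True
print(True + True)  # 2
```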

Numeric types:

• int: Integers; equivalent to C longs in Python 2.x, unlimited length in Python 3.x
• long: Long integers of unlimited length; exists only in Python 2.x
• float: Floating-Point numbers, equivalent to C doubles
• complex: Complex Numbers

Sequences:

• str: String; represented as a sequence of 8-bit characters in Python 2.x, but as a sequence of Unicode characters (in the range of U+0000 - U+10FFFF) in Python 3.x
• bytes: a sequence of integers in the range of 0-255; only available in Python 3.x
• bytearray: like bytes, but mutable (see below); available since Python 2.6
• list
• tuple

Sets:

• set: an unordered collection of unique objects; available as a built-in type since Python 2.4
• frozenset: like set, but immutable (see below); available as a built-in type since Python 2.4

Mappings:

• dict: Python dictionaries, also called hashmaps or associative arrays, in which each key is associated with a value, rather like a Map in Java

Some others, such as type and callables

##### Mutable vs Immutable Objects

In general, data types in Python can be distinguished based on whether objects of the type are mutable or immutable. The content of objects of immutable types cannot be changed after they are created.

Some immutable types:

• int, float, long, complex
• str
• bytes
• tuple
• frozenset

Some mutable types:

• bytearray
• list
• set
• dict

Only mutable objects support methods that change the object in place, such as reassignment of a sequence slice, which will work for lists, but raise an error for tuples and strings.

It is important to understand that variables in Python are really just references to objects in memory. If you assign an object to a variable as below,

a = 1
s = 'abc'
l = ['a string', 456, ('a', 'tuple', 'inside', 'a', 'list')]


all you really do is make this variable (a, s, or l) point to the object (1, 'abc', ['a string', 456, ('a', 'tuple', 'inside', 'a', 'list')]), which is kept somewhere in memory, as a convenient way of accessing it. If you reassign a variable as below

a = 7
s = 'xyz'
l = ['a simpler list', 99, 10]


you make the variable point to a different object (newly created ones in our examples). As stated above, only mutable objects can be changed in place (l[0] = 1 is ok in our example, but s[0] = 'a' raises an error). This becomes tricky, when an operation is not explicitly asking for a change to happen in place, as is the case for the += (increment) operator, for example. When used on an immutable object (as in a += 1 or in s += 'qwertz'), Python will silently create a new object and make the variable point to it. However, when used on a mutable object (as in l += [1,2,3]), the object pointed to by the variable will be changed in place. While in most situations, you do not have to know about this different behavior, it is of relevance when several variables are pointing to the same object. In our example, assume you set p = s and m = l, then s += 'etc' and l += [9,8,7]. This will change s and leave p unaffected, but will change both m and l since both point to the same list object. Python's built-in id() function, which returns a unique object identifier for a given variable name, can be used to trace what is happening under the hood.
Typically, this behavior of Python causes confusion in functions. As an illustration, consider this code:

def append_to_sequence (myseq):
    myseq += (9,9,9)
    return myseq

tuple1 = (1,2,3)     # tuples are immutable
list1  = [1,2,3]     # lists are mutable

tuple2 = append_to_sequence(tuple1)
list2  = append_to_sequence(list1)

print 'tuple1 = ', tuple1  # outputs (1, 2, 3)
print 'tuple2 = ', tuple2  # outputs (1, 2, 3, 9, 9, 9)
print 'list1  = ', list1   # outputs [1, 2, 3, 9, 9, 9]
print 'list2  = ', list2   # outputs [1, 2, 3, 9, 9, 9]


This gives the output indicated above, which is usually unintended. myseq is a local variable of the append_to_sequence function, but when the function is called, myseq nevertheless points to the same object as the variable that we pass in (tuple1 or list1 in our example). If that object is immutable (like a tuple), there is no problem: the += operator creates a new tuple, and myseq is set to point to it. However, if we pass in a reference to a mutable object, that object is manipulated in place (so myseq and list1, in our case, end up pointing to the same list object).
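
The aliasing case mentioned earlier (p = s and m = l, followed by += on s and l) can be traced with id() and the is operator. The sketch below reuses the variable names from that passage.

```python
s = 'xyz'
l = ['a simpler list', 99, 10]

p = s            # p and s refer to the same string object
m = l            # m and l refer to the same list object

s += 'etc'       # strings are immutable: s is rebound to a new object
l += [9, 8, 7]   # lists are mutable: the shared object is changed in place

print(p)                 # xyz
print(m)                 # ['a simpler list', 99, 10, 9, 8, 7]
print(id(m) == id(l))    # True
print(m is l)            # True
print(p is s)            # False
```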

##### Creating Objects of Defined Types

Literal integers can be entered in three ways:

• decimal numbers can be entered directly
• hexadecimal numbers can be entered by prepending a 0x or 0X (0xff is hex FF, or 255 in decimal)
• the format of octal literals depends on the version of Python:
• Python 2.x: octals can be entered by prepending a 0 (0732 is octal 732, or 474 in decimal)
• Python 3.x: octals can be entered by prepending a 0o or 0O (0o732 is octal 732, or 474 in decimal)
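
The literal forms above can be verified directly. This sketch uses the 0o octal form, so it assumes an interpreter that accepts it (Python 2.6 or later, or Python 3); the variable names are illustrative.

```python
decimal_value = 255
hex_value = 0xff       # hexadecimal literal
octal_value = 0o732    # octal literal in the 0o form

print(hex_value == decimal_value)   # True
print(octal_value)                  # 474

# int() with an explicit base parses the same notations from strings.
print(int('ff', 16))                # 255
print(int('732', 8))                # 474
```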

Floating point numbers can be entered directly.

Long integers are entered either directly (1234567891011121314151617181920 is a long integer) or by appending an L (0L is a long integer). Computations involving short integers that overflow are automatically turned into long integers.

Complex numbers are entered by adding a real number and an imaginary one, which is entered by appending a j (e.g., 10+5j is a complex number; so is 10j). Note that j by itself does not constitute a number. If this is desired, use 1j.

Strings can be either single or triple quoted strings. The difference is in the starting and ending delimiters, and in that single quoted strings cannot span more than one line. Single quoted strings are entered by entering either a single quote (') or a double quote (") followed by its match. So therefore

'foo' works, and
"moo" works as well,
but
'bar" does not work, and
"baz' does not work either.
"quux'' is right out.


Triple quoted strings are like single quoted strings, but can span more than one line. Their starting and ending delimiters must also match. They are entered with three consecutive single or double quotes, so

'''foo''' works, and
"""moo""" works as well,
but
'"'bar'"' does not work, and
"""baz''' does not work either.
'"'quux"'" is right out.


Tuples are entered in parentheses, with commas between the entries:

(10, 'Mary had a little lamb')


Also, the parentheses can be left out when it's not ambiguous to do so:

10, 'whose fleece was as white as snow'


Note that one-element tuples can be entered by surrounding the entry with parentheses and adding a comma like so:

('this is a singleton tuple',)


Lists are similar, but with brackets:

['abc', 1,2,3]


Dicts are created by surrounding a list of key/value pairs with curly braces; each key is separated from its value by a colon, and the pairs are separated from each other by commas:

{ 'hello': 'world', 'weight': 'African or European?' }


Any of these composite types can contain any other, to any depth:

((((((((('bob',),['Mary', 'had', 'a', 'little', 'lamb']), { 'hello' : 'world' } ),),),),),),)
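
The tuple, list and dict literal forms above can be tried out as follows. This is a sketch with illustrative variable names; note in particular how the trailing comma, not the parentheses, is what makes a one-element tuple.

```python
pair = (10, 'Mary had a little lamb')
bare = 10, 'whose fleece was as white as snow'   # parentheses omitted
single = ('this is a singleton tuple',)          # trailing comma makes a tuple
not_a_tuple = ('just a string')                  # no comma: only a parenthesized string

print(type(bare).__name__)          # tuple
print(len(single))                  # 1
print(type(not_a_tuple).__name__)   # str

# Composite types nest: a dict holding a list holding a tuple.
nested = {'hello': ['world', (1, 2, 3)]}
print(nested['hello'][1][2])        # 3
```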


## Null object

The Python analogue of the null pointer known from other programming languages is None. None is not a null pointer or a null reference but an actual object of which there is only one instance. One of the uses of None is in default argument values of functions; see the section on default argument values in the Functions chapter. Comparisons to None are usually made using is rather than ==.

Testing for None and assignment:

if item is None:
    ...
    another = None

if not item is None:
    ...

if item is not None: # Also possible
    ...


Using None in a default argument value:

def log(message, type = None):
    ...


PEP8 states that "Comparisons to singletons like None should always be done with is or is not, never the equality operators." Therefore, "if item == None:" is inadvisable. A class can redefine the equality operator (==) such that instances of it will equal None.
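
To see why == can mislead here, consider a deliberately contrived class. AlwaysEqual is hypothetical and exists only for this sketch; its instances claim to be equal to everything, including None.

```python
class AlwaysEqual(object):
    """Hypothetical class whose instances claim to equal everything."""

    def __eq__(self, other):
        return True

item = AlwaysEqual()

print(item == None)   # True: the overridden == is consulted
print(item is None)   # False: identity cannot be overridden
```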

## Exercises

1. Write a program that instantiates a single object, adds [1,2] to the object, and returns the result.
    1. Find an object that returns an output of the same length (if one exists?).
    2. Find an object that returns an output length 2 greater than it started.
    3. Find an object that causes an error.
2. Find two data types X and Y such that X = X + Y will cause an error, but X += Y will not.

# Numbers

Python 2.x supports 4 numeric types - int, long, float and complex. Of these, the long type has been dropped in Python 3.x - the int type is now of unlimited length by default. You don’t have to specify what type of variable you want; Python does that automatically.

• Int: The basic integer type in Python, equivalent to the hardware C long for the platform you are using in Python 2.x, unlimited in length in Python 3.x.
• Long: Integer type with unlimited length. In Python 2.2 and later, Ints are automatically turned into long ints when they overflow. Dropped since Python 3.0; use the int type instead.
• Float: This is a binary floating point number. Longs and Ints are automatically converted to floats when a float is used in an expression, and with the true-division / operator.
• Complex: This is a complex number consisting of two floats. Complex literals are written as a + bj where a and b are floating-point numbers denoting the real and imaginary parts respectively.

In general, the number types are automatically 'up cast' in this order:

Int → Long → Float → Complex. The farther to the right you go, the higher the precedence.

>>> x = 5
>>> type(x)
<type 'int'>
>>> x = 187687654564658970978909869576453
>>> type(x)
<type 'long'>
>>> x = 1.34763
>>> type(x)
<type 'float'>
>>> x = 5 + 2j
>>> type(x)
<type 'complex'>


The result of division can be somewhat confusing. In Python 2.x, using the / operator on two integers returns another integer, using floor division. For example, 5/2 gives 2. You have to write one of the operands as a float to get true division: 5/2. or 5./2 (the dot specifies that you want to work with a float) yields 2.5. Starting with Python 2.2, this behavior can be changed to true division with the future division statement, from __future__ import division. In Python 3.x, the / operator always performs true division. Floor division can be requested explicitly with the // operator, available since Python 2.2.

This illustrates the behavior of the / operator in Python 2.2+:

>>> 5/2
2
>>> 5/2.
2.5
>>> 5./2
2.5
>>> from __future__ import division
>>> 5/2
2.5
>>> 5//2
2
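
For comparison, here is a sketch of the same operators under Python 3, where no future import is needed. The outputs in the comments assume Python 3; under Python 2 without the future statement, the first result would be 2.

```python
true_quotient = 5 / 2     # true division
floor_quotient = 5 // 2   # floor division

print(true_quotient)      # 2.5
print(floor_quotient)     # 2

# Floor division rounds toward negative infinity, not toward zero.
print(-5 // 2)            # -3

# divmod() returns the floor quotient and the remainder together.
print(divmod(5, 2))       # (2, 1)
```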


# Strings

## Overview

Strings in Python at a glance:

str1 = "Hello"                # A new string using double quotes
str2 = 'Hello'                # Single quotes do the same
str3 = "Hello\tworld\n"       # One with a tab and a newline
str4 = str1 + " world"        # Concatenation
str5 = str1 + str(4)          # Concatenation with a number
str6 = str1[2]                # 3rd character
str6a = str1[-1]              # Last character
#str1[0] = "M"                # No way; strings are immutable
for char in str1: print char  # For each character
str7 = str1[1:]               # Without the 1st character
str8 = str1[:-1]              # Without the last character
str9 = str1[1:4]              # Substring: 2nd to 4th character
str10 = str1 * 3              # Repetition
str11 = str1.lower()          # Lowercase
str12 = str1.upper()          # Uppercase
str13 = str1.rstrip()         # Strip right (trailing) whitespace
str14 = str1.replace('l','h') # Replacement
list15 = str1.split('l')      # Splitting
if str1 == str2: print "Equ"  # Equality test
if "el" in str1: print "In"   # Substring test
length = len(str1)            # Length
pos1 = str1.find('llo')       # Index of substring or -1
pos2 = str1.rfind('l')        # Index of substring, from the right
count = str1.count('l')       # Number of occurrences of a substring

print str1, str2, str3, str4, str5, str6, str7, str8, str9, str10
print str11, str12, str13, str14, list15
print length, pos1, pos2, count


See also chapter Regular Expression for advanced pattern matching on strings in Python.

## String operations

### Equality

Two strings are equal if they have exactly the same contents, meaning that they are both the same length and each character has a one-to-one positional correspondence. Many other languages compare strings by identity instead; that is, two strings are considered equal only if they occupy the same space in memory. Python uses the is operator to test the identity of strings and any two objects in general.

Examples:

>>> a = 'hello'; b = 'hello' # Assign 'hello' to a and b.
>>> a == b                   # check for equality
True
>>> a == 'hello'             #
True
>>> a == "hello"             # (choice of delimiter is unimportant)
True
>>> a == 'hello '            # (extra space)
False
>>> a == 'Hello'             # (wrong case)
False
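
The difference between equality (==) and identity (is) can be sketched as follows. The names alias and built are illustrative; the string built is assembled at run time so that it is equal to a without necessarily being the same object.

```python
a = 'hello'
alias = a                       # a second name for the same object
built = ''.join(['hel', 'lo'])  # an equal string assembled at run time

print(a == built)    # True: same contents
print(alias is a)    # True: same object

# Whether 'built is a' holds depends on the interpreter, so compare
# string contents with == and reserve 'is' for identity questions.
```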


### Numerical

There are two quasi-numerical operations which can be done on strings -- addition and multiplication. String addition is just another name for concatenation. String multiplication is repetitive addition, or concatenation. So:

>>> c = 'a'
>>> c + 'b'
'ab'
>>> c * 5
'aaaaa'


### Containment

There is a simple operator 'in' that returns True if the first operand is contained in the second. This also works on substrings:

>>> x = 'hello'
>>> y = 'ell'
>>> x in y
False
>>> y in x
True


Note that 'print x in y' would have printed the same value.

### Indexing and Slicing

Much like elements of arrays in other languages, each character in a string can be accessed by an integer giving its position in the string. The first character in string s is s[0] and the nth character is s[n-1].

>>> s = "Xanadu"
>>> s[1]
'a'


Unlike arrays in many other languages, Python strings can also be indexed from the end, using negative numbers. The last character has index -1, the second to last character has index -2, and so on.

>>> s[-4]
'n'


We can also use "slices" to access a substring of s. s[a:b] will give us a string starting with s[a] and ending with s[b-1].

>>> s[1:4]
'ana'


None of these are assignable, because strings are immutable.

>>> print s
Xanadu
>>> s[0] = 'J'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object does not support item assignment
>>> s[1:3] = "up"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object does not support slice assignment
>>> print s
Xanadu


Both assignments fail, and the string is unchanged afterwards.


Another feature of slices is that if the beginning or end is left empty, it will default to the first or last index, depending on context:

>>> s[2:]
'nadu'
>>> s[:3]
'Xan'
>>> s[:]
'Xanadu'


You can also use negative numbers in slices:

>>> print s[-2:]
du


To understand slices, it's easiest not to count the elements themselves. It is a bit like counting not on your fingers, but in the spaces between them. The list is indexed like this:

Element:     1     2     3     4
Index:    0     1     2     3     4
         -4    -3    -2    -1


So, when we ask for the [1:3] slice, that means we start at index 1, and end at index 3, and take everything in between them. If you are used to indexes in C or Java, this can be a bit disconcerting until you get used to it.
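The "spaces between the fingers" picture can be checked with a small sketch. It reuses the string "Xanadu" from the earlier examples; the variable name middle is illustrative.

```python
s = "Xanadu"

# Slice boundaries sit between characters, so s[a:b] holds b - a characters
# (when 0 <= a <= b <= len(s)).
middle = s[1:3]
print(middle)            # an
print(len(middle))       # 2

# Cutting at any boundary and gluing the halves back gives the original string.
for i in range(len(s) + 1):
    assert s[:i] + s[i:] == s

# Negative boundaries count from the right end: -2 is the same cut as len(s) - 2.
print(s[-2:] == s[len(s) - 2:])   # True
```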

## String constants

String constants can be found in the standard string module. An example is string.digits, which is equal to '0123456789'.
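As a brief illustration, the sketch below prints a few of these constants and uses one in a membership test. The helper only_digits is hypothetical and exists only for this example.

```python
import string

print(string.digits)            # 0123456789
print(string.ascii_lowercase)   # abcdefghijklmnopqrstuvwxyz
print(string.hexdigits)         # 0123456789abcdefABCDEF

# Hypothetical helper: True if text is non-empty and every character is a decimal digit.
def only_digits(text):
    return len(text) > 0 and all(ch in string.digits for ch in text)

print(only_digits('2024'))   # True
print(only_digits('20x4'))   # False
```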

## String methods

There are a number of methods or built-in string functions:

• capitalize
• center
• count
• decode
• encode
• endswith
• expandtabs
• find
• index
• isalnum
• isalpha
• isdigit
• islower
• isspace
• istitle
• isupper
• join
• ljust
• lower
• lstrip
• replace
• rfind
• rindex
• rjust
• rstrip
• split
• splitlines
• startswith
• strip
• swapcase
• title
• translate
• upper
• zfill

Only some of these methods are covered below.

### is*

isalnum(), isalpha(), isdigit(), islower(), isupper(), isspace(), and istitle() fit into this category.

The string being tested must contain at least one character, or the is* methods return False. In other words, for an empty string (len(string) == 0), every is* method returns False.

• isalnum returns True if the string is entirely composed of alphabetic and/or numeric characters (i.e. no punctuation).
• isalpha and isdigit work similarly for alphabetic characters or numeric characters only.
• isspace returns True if the string is composed entirely of whitespace.
• islower, isupper, and istitle return True if the string is in lowercase, uppercase, or titlecase respectively. Uncased characters are "allowed", such as digits, but there must be at least one cased character in the string object in order to return True. Titlecase means the first cased character of each word is uppercase, and any immediately following cased characters are lowercase. Curiously, 'Y2K'.istitle() returns True. That is because uppercase characters can only follow uncased characters. Likewise, lowercase characters can only follow uppercase or lowercase characters. Hint: whitespace is uncased.

Example:

>>> '2YK'.istitle()
False
>>> 'Y2K'.istitle()
True
>>> '2Y K'.istitle()
True
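A further sketch tabulates four of the is* methods over a handful of sample strings, including the empty string. The sample values and the results dictionary are illustrative choices, not part of the original text.

```python
samples = ['', 'abc', 'abc123', '123', '   ', 'abc def']

# Collect (isalnum, isalpha, isdigit, isspace) for each sample string.
results = {}
for text in samples:
    results[text] = (text.isalnum(), text.isalpha(), text.isdigit(), text.isspace())
    print('%-10r %s' % (text, results[text]))
```

The empty string yields False in every column, and 'abc def' fails isalnum and isalpha because of the space.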


### Title, Upper, Lower, Swapcase, Capitalize

These methods return a copy of the string converted to title case, upper case, or lower case, with the case of each letter inverted, or with only the first character capitalized, respectively.

The title method capitalizes the first letter of each word in the string (and makes the rest lower case). Words are identified as substrings of alphabetic characters that are separated by non-alphabetic characters, such as digits, or whitespace. This can lead to some unexpected behavior. For example, the string "x1x" will be converted to "X1X" instead of "X1x".

The swapcase method makes all uppercase letters lowercase and vice versa.

The capitalize method is like title except that it considers the entire string to be a word. (i.e. it makes the first character upper case and the rest lower case)

Example:

s = 'Hello, wOrLD'
print s              # 'Hello, wOrLD'
print s.title()      # 'Hello, World'
print s.swapcase()   # 'hELLO, WoRld'
print s.upper()      # 'HELLO, WORLD'
print s.lower()      # 'hello, world'
print s.capitalize() # 'Hello, world'
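The word-boundary rule used by title can produce surprising results, as a short sketch shows. The sample strings (other than "x1x" from the text above) and the converted dictionary are illustrative.

```python
converted = {}
for text in ['x1x', "they're here", 'hello wORLD']:
    converted[text] = (text.title(), text.capitalize())
    print('%r -> title: %r, capitalize: %r' % ((text,) + converted[text]))

# 'x1x'          -> title: 'X1X',          capitalize: 'X1x'
# "they're here" -> title: "They'Re Here", capitalize: "They're here"
# 'hello wORLD'  -> title: 'Hello World',  capitalize: 'Hello world'
```

The apostrophe counts as a non-alphabetic separator, which is why title capitalizes the letter after it.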


Keywords: to lower case, to upper case, lcase, ucase, downcase, upcase.

### count

Returns the number of non-overlapping occurrences of the specified substring in the string. For example:

>>> s = 'Hello, world'
>>> s.count('o') # print the number of 'o's in 'Hello, world' (2)
2


Hint: .count() is case-sensitive, so this example will only count the number of lowercase letter 'o's. For example, if you ran:

>>> s = 'HELLO, WORLD'
>>> s.count('o') # print the number of lowercase 'o's in 'HELLO, WORLD' (0)
0
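A few variations, sketched here with an illustrative sample string: a case-insensitive count can be obtained by lowercasing first, count accepts optional start and end positions, and overlapping matches are counted only once.

```python
s = 'Hello, World'

counts = [
    s.count('o'),           # 2
    s.lower().count('w'),   # 1  (case-insensitive count via lower())
    s.count('l', 0, 4),     # 2  (only s[0:4] == 'Hell' is searched)
    'aaaa'.count('aa'),     # 2  (matches do not overlap)
]
for value in counts:
    print(value)
```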


### strip, rstrip, lstrip

Returns a copy of the string with the leading (lstrip) and trailing (rstrip) whitespace removed. strip removes both.

>>> s = '\t Hello, world\n\t '
>>> print s
         Hello, world

>>> print s.strip()
Hello, world
>>> print s.lstrip()
Hello, world
         # ends here
>>> print s.rstrip()
         Hello, world


Note the leading and trailing tabs and newlines.

Strip methods can also be used to remove other types of characters.

import string
s = 'www.wikibooks.org'
print s
print s.strip('w')                 # Removes all w's from outside
print s.strip(string.lowercase)    # Removes all lowercase letters from outside
print s.strip(string.printable)    # Removes all printable characters


Outputs:

www.wikibooks.org
.wikibooks.org
.wikibooks.



Note that string.lowercase and string.printable require an import string statement.
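It may help to see that the argument to the strip methods is treated as a set of characters, not as a prefix or suffix to match. The following sketch reuses the string from the example above; the character sets passed in are arbitrary illustrations.

```python
s = 'www.wikibooks.org'

# Every leading or trailing character that appears in the argument is removed,
# in any order, until a character outside the set is reached.
both = s.strip('w.gro')
left = s.lstrip('w.')
right = s.rstrip('gro.')

print(both)    # ikibooks
print(left)    # ikibooks.org
print(right)   # www.wikibooks
```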

### ljust, rjust, center

These methods left-justify, right-justify, or center a string within a field of the given width (the rest is padded with spaces).

>>> s = 'foo'
>>> s
'foo'
>>> s.ljust(7)
'foo    '
>>> s.rjust(7)
'    foo'
>>> s.center(7)
'  foo  '
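These methods also accept an optional fill character as a second argument, and they never truncate a string that is already wider than the field. The sketch below shows both points; the two-column listing and its data are made up for illustration.

```python
s = 'foo'
print(s.ljust(7, '.'))     # foo....
print(s.rjust(7, '0'))     # 0000foo
print(s.center(7, '*'))    # **foo**
print(s.ljust(2))          # foo  (unchanged: never truncated)

# A hypothetical two-column listing built with ljust and rjust.
rows = [('apples', 3), ('kiwis', 12)]
lines = [name.ljust(10, '.') + str(qty).rjust(4) for name, qty in rows]
for line in lines:
    print(line)
```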


### join

Joins together the given sequence with the string as separator:

>>> seq = ['1', '2', '3', '4', '5']
>>> ' '.join(seq)
'1 2 3 4 5'
>>> '+'.join(seq)
'1+2+3+4+5'


map may be helpful here (it converts the numbers in seq into strings):

>>> seq = [1,2,3,4,5]
>>> ' '.join(map(str, seq))
'1 2 3 4 5'


Now arbitrary objects may be in seq instead of just strings.
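To see why the conversion is needed, the sketch below passes a list of integers to join directly, which raises a TypeError, and then converts each item with a generator expression as an alternative to map. The variable names are illustrative.

```python
seq = [1, 2, 3]

try:
    ' '.join(seq)              # join accepts only strings
    failed = False
except TypeError as err:
    failed = True
    print('TypeError: %s' % err)

# Convert each item first; a generator expression works like map(str, seq).
joined = ', '.join(str(item) for item in seq)
print(joined)                  # 1, 2, 3
```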

### find, index, rfind, rindex

The find and index methods return the index of the first occurrence of the given substring. If it is not found, find returns -1 but index raises a ValueError. rfind and rindex are the same as find and index except that they search through the string from right to left (i.e. they find the last occurrence).

>>> s = 'Hello, world'
>>> s.find('l')
2
>>> s[s.index('l'):]
'llo, world'
>>> s.rfind('l')
10
>>> s[:s.rindex('l')]
'Hello, wor'
>>> s[s.index('l'):s.rindex('l')]
'llo, wor'


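Because Python strings accept negative subscripts, index is probably the better choice in situations like the one shown. If the substring were missing, find would return -1, which is a valid subscript referring to the last character, so the slice would silently yield an unintended value instead of raising an error.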
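A minimal sketch of this pitfall, searching for a character ('z') that does not occur in the string; the names pos and tail are illustrative.

```python
s = 'Hello, world'

# 'z' is absent: find returns -1, and s[-1:] silently yields the last character.
wrong = s[s.find('z'):]
print(wrong)                 # d

# index raises instead, so the mistake cannot go unnoticed.
try:
    print(s[s.index('z'):])
except ValueError:
    print('substring not found')

# When using find, test the result explicitly before slicing.
pos = s.find('z')
tail = s[pos:] if pos != -1 else ''
print(repr(tail))            # ''
```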

### replace

Replace works just like it sounds. It returns a copy of the string with all occurrences of the first parameter replaced with the second parameter.

>>> 'Hello, world'.replace('o', 'X')
'HellX, wXrld'


Or, using variable assignment:

string = 'Hello, world'
newString = string.replace('o', 'X')
print string
print newString


Outputs:

Hello, world
HellX, wXrld


Notice, the original variable (string) remains unchanged after the call to replace.
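replace also takes an optional third argument that limits how many occurrences are replaced. The sketch below shows that, along with deleting a substring and the no-match case; the variable names are illustrative.

```python
s = 'Hello, world'

first_only = s.replace('o', 'X', 1)
deleted = s.replace('l', '')
no_match = s.replace('z', 'X')

print(first_only)      # HellX, world  (first occurrence only)
print(deleted)         # Heo, word     (replace with '' to delete)
print(no_match == s)   # True          (no match: contents unchanged)
```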

### expandtabs

Replaces tabs with the appropriate number of spaces (default number of spaces per tab = 8; this can be changed by passing the tab size as an argument).

s = 'abcdefg\tabc\ta'
print s
print len(s)
t = s.expandtabs()
print t
print len(t)


Outputs:

abcdefg abc     a
13
abcdefg abc     a
17


Notice how (although these both look the same) the second string (t) has a different length because each tab is represented by spaces not tab characters.

To use a tab size of 4 instead of 8:

v = s.expandtabs(4)
print v
print len(v)


Outputs:

abcdefg abc a
13


Please note that a tab is not always counted as eight spaces. Rather, a tab "pushes" the count to the next multiple of eight. For example:

s = '\t\t'
print s.expandtabs().replace(' ', '*')
print len(s.expandtabs())


Output:

****************
16

s = 'abc\tabc\tabc'
print s.expandtabs().replace(' ', '*')
print len(s.expandtabs())


Outputs:

abc*****abc*****abc
19
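The "next multiple" rule can be made explicit with a hand-written equivalent. The function expand below is a hypothetical re-implementation for single-line text only, meant to illustrate the arithmetic rather than to mirror how Python implements expandtabs.

```python
def expand(text, tabsize=8):
    """Hypothetical re-implementation of expandtabs for single-line text."""
    out = ''
    for ch in text:
        if ch == '\t':
            # Pad up to the next multiple of tabsize (always at least one space).
            out += ' ' * (tabsize - len(out) % tabsize)
        else:
            out += ch
    return out

lengths = []
for sample in ['\t\t', 'abc\tabc\tabc', 'abcdefg\tabc\ta']:
    assert expand(sample) == sample.expandtabs()
    lengths.append(len(expand(sample)))

print(lengths)   # [16, 19, 17]
```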


### split, splitlines

The split method returns a list of the words in the string. It can take a separator argument to use instead of whitespace.

>>> s = 'Hello, world'
>>> s.split()
['Hello,', 'world']
>>> s.split('l')
['He', '', 'o, wor', 'd']


Note that in neither case is the separator included in the split strings, but empty strings are allowed.
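Splitting on whitespace (no argument) and splitting on an explicit space behave differently when separators repeat. The sketch below shows this, together with the optional second argument that limits the number of splits; the sample string is illustrative.

```python
s = 'a  b   c'

by_whitespace = s.split()       # runs of whitespace count as one separator
by_space = s.split(' ')         # every single space is a separator
limited = s.split(' ', 1)       # at most one split

print(by_whitespace)   # ['a', 'b', 'c']
print(by_space)        # ['a', '', 'b', '', '', 'c']
print(limited)         # ['a', ' b   c']
```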

The splitlines method breaks a multiline string into many single line strings. It is analogous to split('\n') (but accepts '\r' and '\r\n' as delimiters as well) except that if the string ends in a newline character, splitlines ignores that final character (see example).

>>> s = """
... One line
... Two lines
... Red lines
... Blue lines
... Green lines
... """
>>> s.split('\n')
['', 'One line', 'Two lines', 'Red lines', 'Blue lines', 'Green lines', '']
>>> s.splitlines()
['', 'One line', 'Two lines', 'Red lines', 'Blue lines', 'Green lines']
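The handling of '\r\n' and the optional keepends argument of splitlines can be seen in a further sketch. The sample text is illustrative and mixes line-ending styles on purpose.

```python
text = 'One line\r\nTwo lines\nRed lines\n'

plain = text.split('\n')
lines = text.splitlines()
kept = text.splitlines(True)    # keepends: line endings are retained

print(plain)   # ['One line\r', 'Two lines', 'Red lines', '']
print(lines)   # ['One line', 'Two lines', 'Red lines']
print(kept)    # ['One line\r\n', 'Two lines\n', 'Red lines\n']

# With keepends, joining the pieces restores the original string.
print(''.join(kept) == text)    # True
```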


## Exercises

1. Write a program that takes a string, (1) capitalizes the first letter, (2) creates a list containing each word, and (3) searches for the last occurrence of "a" in the first word.
2. Run the program on the string "Bananas are yellow."
3. Write a program that replaces all instances of "one" with "one (1)". For this exercise capitalization does not matter, so it should treat "one", "One", and "oNE" identically.
4. Run the program on the string "One banana was brown, but one was green."