
Python Strings, Lists, and Files

April 24, 2010: We are pleased to announce that Version 4 of this course is now under development. For updates and an early peek at the content, please check out the Software Carpentry blog at http://www.software-carpentry.org/blog/.

1) Introduction

2) You Can Skip This Lecture If...

3) Strings

element = "boron"
i = 0
while i < len(element):
    print element[i]
    i += 1
b
o
r
o
n

4) Immutability

$ python
>>> element = 'gold'
>>> print 'element is', element
element is gold
>>> element[0] = 's'
TypeError: object does not support item assignment
element = 'gold'
print 'element is', element
element = 'lead'
print 'element is now', element
element is gold
element is now lead
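Because a string cannot be changed in place, the usual workaround is to build a new string from pieces of the old one. The sketch below is one way to do that, using slicing (covered in the next section). It is written for Python 3, so print is a function, and the helper name replace_first_char is hypothetical, chosen only for illustration.

```python
def replace_first_char(text, new_char):
    """Return a new string whose first character is new_char."""
    # Concatenation builds a brand-new string; the original is left untouched.
    return new_char + text[1:]

element = 'gold'
changed = replace_first_char(element, 's')
print('original:', element)
print('changed:', changed)
```

The name element still refers to the original string afterwards; only the new name refers to the modified text.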

5) Slicing

element = "helium"
print element[1:3], element[:2], element[4:]
el he um

Figure 3.1: Visualizing Indices

6) Bounds Checking

$ python
>>> element = 'helium'
>>> print element[1:22]
elium
>>> x = element[22]
IndexError: string index out of range

7) Negative Indices

element = "carbon"
print element[-2], element[-4], element[-6]
o r c

Figure 3.2: Visualizing Negative Indices

8) Consequences

9) Methods

10) String Methods

Method | Purpose | Example | Result
capitalize | Capitalize the first letter of the string. | "text".capitalize() | "Text"
lower | Convert all letters to lowercase. | "aBcD".lower() | "abcd"
upper | Convert all letters to uppercase. | "aBcD".upper() | "ABCD"
strip | Remove leading and trailing whitespace (blanks, tabs, newlines, etc.). | " a b ".strip() | "a b"
lstrip | Remove whitespace at the left (leading) edge of the string. | " a b ".lstrip() | "a b "
rstrip | Remove whitespace at the right (trailing) edge of the string. | " a b ".rstrip() | " a b"
count | Count how many times one string appears in another. | "abracadabra".count("ra") | 2
find | Return the index of the first occurrence of one string in another, or -1. | "abracadabra".find("ra") | 2
 | | "abracadabra".find("xyz") | -1
replace | Replace occurrences of one string with another. | "abracadabra".replace("ra", "-") | "ab-cadab-"

Table 3.1: String Methods
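As a rough illustration of how several methods from Table 3.1 fit together, the sketch below cleans up a messy string and then searches it. It is written for Python 3 (print is a function); the variable names are made up for this example.

```python
raw = "  Abracadabra \n"

stripped = raw.strip()        # remove leading and trailing whitespace
cleaned = stripped.lower()    # 'abracadabra'

print(repr(cleaned))
print('count of "ra":', cleaned.count("ra"))
print('first "ra" at index:', cleaned.find("ra"))
print('index of "xyz":', cleaned.find("xyz"))   # -1 means "not found"
print('replaced:', cleaned.replace("ra", "-"))
```

Each call returns a new value, so raw itself is never modified.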

11) Notes on String Methods

element = 'helium'
print element.upper()
print element.replace('el', 'afn')
print 'element after calls:', element
HELIUM
hafnium
element after calls: helium

12) Chaining Method Calls

element = "cesium"
print ':' + element.upper()[4:7].center(10) + ':'
:    UM    :
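A chained call is evaluated from left to right, and each method works on the result of the one before it. The sketch below spells out the same chain one step at a time (Python 3 syntax; the intermediate variable names are invented for illustration).

```python
element = "cesium"

upper = element.upper()       # 'CESIUM'
tail = upper[4:7]             # 'UM': a slice may run past the end of the string
centered = tail.center(10)    # 'UM' padded with spaces to a width of 10
result = ':' + centered + ':'

print(result)
```

Breaking a chain up like this can make debugging easier, since each intermediate value can be printed on its own.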

13) Testing for Membership

print "ant" in "tantalum"
print "mat" in "tantalum"
True
False

14) Lists

gases = ['He', 'Ne', 'Ar', 'Kr']
print gases
print gases[0], gases[-1]
['He', 'Ne', 'Ar', 'Kr']
He Kr

15) Modifying Lists

gases = ['He', 'Ne', 'Ar', 'Kr']
print 'before:', gases
gases[0] = 'H'
gases[-1] = 'Xe'
print 'after:', gases
before: ['He', 'Ne', 'Ar', 'Kr']
after: ['H', 'Ne', 'Ar', 'Xe']
$ python
>>> gases = ['He', 'Ne', 'Ar', 'Kr']
>>> print 'before:', gases
before: ['He', 'Ne', 'Ar', 'Kr']
>>> gases[10] = 'Ra'
IndexError: list assignment index out of range
characters = []
print characters
for c in 'aeiou':
    characters.append(c)
    print characters
[]
['a']
['a', 'e']
['a', 'e', 'i']
['a', 'e', 'i', 'o']
['a', 'e', 'i', 'o', 'u']

16) Concatenation

element = 'carbon'
mass = '14'
print element + '-' + mass

lanthanides = ['Ce', 'Pr', 'Nd']
actinides = ['Th', 'Pa', 'U']
combined = lanthanides + actinides
print combined
carbon-14
['Ce', 'Pr', 'Nd', 'Th', 'Pa', 'U']
water = 'H2O'
print 'before conversion:', water
water = list(water)
print 'after conversion:', water
before conversion: H2O
after conversion: ['H', '2', 'O']

17) Deleting List Elements

organics = ['H', 'C', 'O', 'N']
print 'original:', organics
del organics[2]
print 'after deleting item 2:', organics
del organics[-2:]
print 'after deleting the last two remaining items:', organics
original: ['H', 'C', 'O', 'N']
after deleting item 2: ['H', 'C', 'N']
after deleting the last two remaining items: ['H']
organics = ['H', 'C', 'O', 'N']
print 'original:', organics
del organics[1:-1]
print 'after deleting the middle:', organics
original: ['H', 'C', 'O', 'N']
after deleting the middle: ['H', 'N']

18) List Methods

19) Notes on List Methods

20) For Loops

for c in 'lead':
    print '/' + c + '/',
print

for v in ['he', 'ar', 'ne', 'kr']:
    print v.capitalize()
/l/ /e/ /a/ /d/
He
Ar
Ne
Kr

21) Ranges

print 'up to 5:', range(5)
print '2 to 5:', range(2, 5)
print '2 to 10 by 2:', range(2, 10, 2)
print '10 to 2:', range(10, 2)
print '10 to 2 by -2:', range(10, 2, -2)
up to 5: [0, 1, 2, 3, 4]
2 to 5: [2, 3, 4]
2 to 10 by 2: [2, 4, 6, 8]
10 to 2: []
10 to 2 by -2: [10, 8, 6, 4]

22) Ranged Loops

element = 'sulfur'
for i in range(len(element)):
    print i, element[i]
0 s
1 u
2 l
3 f
4 u
5 r
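When both the index and the item are needed, the built-in enumerate function is a common alternative to range(len(...)). The sketch below should produce the same index/letter pairs as the loop above; it is written for Python 3 (print is a function), and the list name lines is made up for this example.

```python
element = 'sulfur'
lines = []

# enumerate yields the index and the item together on each pass.
for i, letter in enumerate(element):
    lines.append(str(i) + ' ' + letter)

for line in lines:
    print(line)
```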

23) Membership

24) Nesting Lists

elements = [['H', 'Li', 'Na'], ['F', 'Cl']]
print 'first item in outer list:', elements[0]
print 'second item of second sublist:', elements[1][1]

first item in outer list: ['H', 'Li', 'Na']
second item of second sublist: Cl

25) Aliasing

elements = [['H', 'Li'], ['F', 'Cl']]
gases = elements[1]
print 'before'
print 'elements:', elements
print 'gases:', gases

gases[1] = 'Br'

print 'after'
print 'elements:', elements
before
elements: [['H', 'Li'], ['F', 'Cl']]
gases: ['F', 'Cl']
after
elements: [['H', 'Li'], ['F', 'Br']]

Figure 3.4: Aliasing In Action
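If aliasing is not wanted, one option is to make a separate copy of the list before changing it. The sketch below contrasts an alias with a copy made by list(); it uses Python 3 syntax, and the names alias and duplicate are chosen only for illustration.

```python
elements = [['H', 'Li'], ['F', 'Cl']]

alias = elements[1]            # a second name for the same list
duplicate = list(elements[1])  # a new list holding the same items

alias_is_same = alias is elements[1]          # True: one list, two names
duplicate_is_same = duplicate is elements[1]  # False: a separate list

duplicate[1] = 'Br'            # changes only the copy
print('after changing duplicate:', elements)

alias[1] = 'I'                 # changes the list that elements also refers to
print('after changing alias:', elements)
```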

26) Indexing vs. Slicing

metals = ['Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn']
middle = metals[2:-2]
print 'before'
print 'metals:', metals
print 'middle:', middle

middle[0] = 'Al'
del middle[1]

print 'after'
print 'metals:', metals
print 'middle:', middle
before
metals: ['Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn']
middle: ['Fe', 'Co', 'Ni']
after
metals: ['Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn']
middle: ['Al', 'Ni']

Figure 3.5: Slicing Lists
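A slice copies only the outer list. If the items are themselves lists, the copy and the original still share them. The sketch below (Python 3 syntax, with made-up variable names) shows both effects using a full slice.

```python
elements = [['H', 'Li'], ['F', 'Cl']]
outer_copy = elements[:]       # new outer list, same inner lists

outer_copy[0] = ['He']         # replaces an item in the copy only
outer_copy[1][1] = 'Br'        # modifies an inner list that both lists share

print('elements:', elements)
print('outer_copy:', outer_copy)
```

The first assignment leaves elements alone, while the second is visible through both names because the inner list was never copied.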

27) Tuples

28) Multi-Valued Assignment

29) Unpacking Structures in Loops

elements = [
    ['H', 'hydrogen', 1.008],
    ['He', 'helium', 4.003],
    ['Li', 'lithium', 6.941],
    ['Be', 'beryllium', 9.012]
]

for (symbol, name, weight) in elements:
    print name + ' (' + symbol + '): ' + str(weight)
hydrogen (H): 1.008
helium (He): 4.003
lithium (Li): 6.941
beryllium (Be): 9.012
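The loop above relies on multi-valued assignment: each three-item sublist is unpacked into three names. The sketch below shows the same mechanism outside a loop, including a swap and what happens when the number of names does not match. It is written for Python 3, and all variable names are invented for illustration.

```python
# The right side must supply exactly as many items as there are names.
symbol, name, weight = ['Li', 'lithium', 6.941]
print(name + ' (' + symbol + '): ' + str(weight))

# Swapping two values without a temporary variable.
left, right = 'Na', 'K'
left, right = right, left
print(left, right)

# A length mismatch raises ValueError rather than silently dropping items.
try:
    first, second = ['Be', 'beryllium', 9.012]
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
print('mismatch detected:', mismatch_detected)
```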

30) Files

input_file = open('count_bytes.py', 'r')
content = input_file.read()
input_file.close()
print len(content), 'bytes in file'
121 bytes in file
Method | Purpose | Example
close | Close the file; no more reading or writing is allowed. | input_file.close()
read | Read up to N bytes from the file, returning the empty string at the end of the file. | next_block = input_file.read(1024)
 | If N is not given, read the rest of the file. | rest = input_file.read()
readline | Read the next line of text from the file, returning the empty string at the end of the file. | line = input_file.readline()
readlines | Return the remaining lines in the file as a list, or an empty list at the end of the file. | rest = input_file.readlines()
write | Write a string to a file. write does not automatically append a newline. | output_file.write("Element 8: Oxygen")
writelines | Write each string in a list to a file (without appending newlines). | output_file.writelines(["H", "He", "Li"])

Table 3.3: File Methods
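The sketch below exercises several of the methods in Table 3.3 on a small scratch file. It assumes Python 3 syntax and writes to a temporary directory so that no real file is overwritten; the file name elements.txt is hypothetical.

```python
import os
import tempfile

# Scratch file in a fresh temporary directory (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), 'elements.txt')

output_file = open(path, 'w')
output_file.write('H\n')                   # write does not add a newline itself
output_file.writelines(['He\n', 'Li\n'])   # neither does writelines
output_file.close()

input_file = open(path, 'r')
first = input_file.readline()              # the first line, newline included
rest = input_file.readlines()              # the remaining lines as a list
at_end = input_file.read()                 # '' because nothing is left to read
input_file.close()

print(repr(first), rest, repr(at_end))
os.remove(path)
```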

31) Copying a File

input_file = open('file.txt', 'r')
output_file = open('copy.txt', 'w')
line = input_file.readline()
while line:
    output_file.write(line)
    line = input_file.readline()
input_file.close()
output_file.close()

32) Looping Over Files

input_file = open('count_lines.py', 'r')
count = 0
for line in input_file:
    count += 1
input_file.close()
print count, 'lines in file'
6 lines in file

33) Other Ways To Copy Files

input_file = open('file.txt', 'r')
lines = input_file.readlines()
input_file.close()

output_file = open('copy.txt', 'w')
output_file.writelines(lines)
output_file.close()

34) Still More Ways

input_file = open('file.txt', 'r')
output_file = open('copy.txt', 'w')
for line in input_file:
    line = line.rstrip()
    print >> output_file, line
input_file.close()
output_file.close()
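For completeness, the standard library can also copy a file in a single call with shutil.copyfile. The sketch below is one possible way to use it; it assumes Python 3 syntax, creates its own scratch files in a temporary directory, and uses a with block so that files are closed automatically. Unlike the loop above, which strips trailing whitespace from each line, copyfile copies the contents unchanged.

```python
import os
import shutil
import tempfile

# Scratch directory so that no real files are touched.
folder = tempfile.mkdtemp()
source = os.path.join(folder, 'file.txt')
target = os.path.join(folder, 'copy.txt')

# Create a small input file to copy.
with open(source, 'w') as output_file:
    output_file.write('H\nHe\nLi\n')

shutil.copyfile(source, target)    # copy the contents in one call

with open(target, 'r') as input_file:
    copied = input_file.read()

print(repr(copied))
shutil.rmtree(folder)              # clean up the scratch directory
```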

35) Summary