Quality Assurance
1) Introduction
- The more you invest in quality, the less time it takes to develop working software [Glass 2002]
- Quality is not just testing
  - "Trying to improve the quality of software by doing more testing is like trying to lose weight by weighing yourself more often." (Steve McConnell)
- Quality is:
  - Designed in
  - Monitored and maintained through the whole software lifecycle
- This lecture looks at basic things every developer can do to maintain quality

2) You Can Skip This Lecture If...
- You know that no amount of testing can prove that software is correct
- You know what unit testing, integration testing, and regression testing are
- You know what a fixture is
- You know what an exception is, and how to raise one
- You know what test-driven design is
- You know what defensive programming is
- You know what design by contract is

3) Limits to Testing
- Suppose you have a function that compares two 7-digit phone numbers, and returns True if the first is greater than the second
  - 10^14 possible inputs (10^7 values for each of the two arguments)
  - At ten million tests per second, that's about 116 days
  - If they're 7-character alphabetic strings (26 letters, one case), it's about 204,000 years
- Then you move on to the second function...
- And how do you know that your tests are correct?
- All a test can do is show that there may be a bug
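
The cost of exhaustive testing can be checked with a few lines of arithmetic. This is a minimal sketch written for Python 3; the helper name `exhaustive_seconds` and the constants are made up for illustration, and the alphabetic case assumes 26 letters in a single case.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY
TESTS_PER_SECOND = 10 ** 7          # "ten million tests per second"

def exhaustive_seconds(alphabet_size, length, num_args=2):
    """Seconds needed to try every combination of num_args inputs."""
    cases = (alphabet_size ** length) ** num_args
    return cases / TESTS_PER_SECOND

# Two 7-digit numbers, then two 7-letter strings.
digit_days = exhaustive_seconds(10, 7) / SECONDS_PER_DAY
letter_years = exhaustive_seconds(26, 7) / SECONDS_PER_YEAR

print('digits: about %.0f days' % digit_days)
print('letters: about %.0f years' % letter_years)
```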

4) Terminology
- A unit test exercises one component in isolation
  - Developer-oriented: tests the program's internals
- An integration test exercises the whole system
  - User-oriented: tests the software's overall behavior
- Regression testing is the practice of rerunning tests to check that the code still works
  - I.e., make sure that today's changes haven't broken things that were working yesterday
  - Programs that don't have regression tests are difficult (sometimes impossible) to maintain [Feathers 2005]

5) Test Results and Specifications
- Any test can have one of three outcomes:
  - Pass: the actual outcome matches the expected outcome
  - Fail: the actual outcome is different from what was expected
  - Error: something went wrong inside the test (i.e., the test itself contains a bug)
    - In that case, we learn nothing about the system being tested
- A specification is something that tells you how to classify a test's result

6) Structuring Tests
- How to write tests so that:
  - It's easy to add or change tests
  - It's easy to see what's been tested, and what hasn't
- A test consists of a fixture, an action, and an expected result
  - A fixture is something that a test is run on
  - Can be as simple as a single value, or as complex as a networked database
- Every test should be independent
  - I.e., the outcome of one test shouldn't depend on what happened in another test
  - Otherwise, faults in early tests can distort the results of later ones
- So each test:
  - Creates a fresh instance of the fixture
  - Performs the operation
  - Checks and records the result
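
The fixture/action/expected-result pattern can be sketched as follows. This is only an illustration written for Python 3: `make_fixture` and the two test functions are hypothetical names, and a short list stands in for whatever a real test would run on.

```python
def make_fixture():
    """Build a fresh fixture so that no test sees another test's changes."""
    return [3, 1, 2]

def test_sort_orders_values():
    values = make_fixture()         # fixture
    values.sort()                   # action
    return values == [1, 2, 3]      # expected result

def test_append_grows_list():
    values = make_fixture()         # a new list, not the one sorted above
    values.append(4)
    return len(values) == 4

# Check and record the result of each test independently.
results = {}
for test in [test_sort_orders_values, test_append_grows_list]:
    results[test.__name__] = test()
```

Because each test builds its own list, the tests give the same answers in any order.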

7) A Simple Example
- Test string.startswith
- Specification: returns True if the string starts with the given prefix, and False otherwise
  - But what if the prefix is the empty string?
- Store the tests in a table
  - String and prefix are the fixture

    Tests = [
        # String  Prefix  Expected
        ['a',     'a',    True],
        ['a',     'b',    False],
        ['abc',   'a',    True],
        ['abc',   'ab',   True],
        ['abc',   'abc',  True],
        ['abc',   'abcd', False],
        ['abc',   '',     True]
    ]
    passes = 0
    failures = 0
    for (s, p, expected) in Tests:
        actual = s.startswith(p)
        if actual == expected:
            passes += 1
        else:
            failures += 1
    print 'passed', passes, 'out of', passes+failures, 'tests'

- But where's the code to handle and report errors in the tests themselves?

8) Catching Errors
- Python uses exceptions for error handling
  - Separates normal operation from error handling
  - Makes both easier to read
- Structured like if/else
  - Code for healthy case goes in a try block
  - Error handling code goes in a matching except block
- When something goes wrong in the try block, Python raises an exception
- Can add an optional else block
  - Executed when things don't go wrong inside the try block

9) Simple Exception Example
    for num in [-1, 0, 1]:
        try:
            inverse = 1/num
        except:
            print 'inverting', num, 'caused error'
        else:
            print 'inverse of', num, 'is', inverse

    inverse of -1 is -1
    inverting 0 caused error
    inverse of 1 is 1

10) Exception Objects
- When Python raises an exception, it creates an object to hold information about what went wrong
  - Typically contains an error message
- Can choose which errors to handle by specifying an exception type in the except statement
  - E.g., handle division by zero, but not out-of-bounds list index

    values = [0, 1, 'momentum']
    for i in range(4):
        try:
            print 'dividing by value', i
            x = 1.0 / values[i]
            print 'result is', x
        except ZeroDivisionError, e:
            print 'divide by zero:', e
        except IndexError, e:
            print 'index error:', e
        except:
            # A naked except binds no exception object, so there is no e to print here.
            print 'some other error'

    dividing by value 0
    divide by zero: float division
    dividing by value 1
    result is 1.0
    dividing by value 2
    some other error
    dividing by value 3
    index error: list index out of range

- The except blocks are tested in order: whichever matches first wins
- If a "naked" except appears, it must come last (since it catches everything)
  - Generally better to use except Exception, e so that you have the exception object

11) Exception Hierarchy
- Exceptions are organized in a hierarchy
  - E.g., ZeroDivisionError, OverflowError, and FloatingPointError are all types of ArithmeticError
  - A handler for the general type catches all its specific sub-types
- We'll see in the Basic Object-Oriented Programming lecture how this hierarchy is implemented
  - Hint: it has something to do with objects

| Name                   | Purpose                                                |
|------------------------|--------------------------------------------------------|
| Exception              | Root of exception hierarchy                            |
|   ArithmeticError      | Illegal arithmetic operation                           |
|     FloatingPointError | Generic error in floating point calculation            |
|     OverflowError      | Result too large to represent                          |
|     ZeroDivisionError  | Attempt to divide by zero                              |
|   IndexError           | Bad index to sequence (out of bounds or illegal type)  |
|   TypeError            | Illegal type (e.g., trying to add integer and string)  |
|   ValueError           | Illegal value (e.g., math.sqrt(-1))                    |
|   EnvironmentError     | Error interacting with the outside world               |
|     IOError            | Unable to create or open file, read data, etc.         |
|     OSError            | No permissions, no such device, etc.                   |
Table 15.1: Common Exception Types in Python (indentation in the Name column shows sub-types)

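
The rule that a handler for a general type also catches its sub-types can be seen in a small experiment. This is a sketch written for Python 3 (the `except ... as e` form replaces the Python 2 comma syntax used in these notes); `classify` is a hypothetical helper, not part of Python.

```python
import math

def classify(operation):
    """Run operation and report which handler, if any, caught its failure."""
    try:
        operation()
    except ArithmeticError as e:
        # Also catches ZeroDivisionError, OverflowError, and FloatingPointError.
        return 'arithmetic: ' + type(e).__name__
    except Exception as e:
        return 'other: ' + type(e).__name__
    else:
        return 'ok'

caught = [
    classify(lambda: 1 / 0),            # ZeroDivisionError
    classify(lambda: math.exp(1000)),   # OverflowError
    classify(lambda: [1, 2, 3][10]),    # IndexError, not arithmetic
    classify(lambda: 1 + 1),            # no exception
]
for line in caught:
    print(line)
```
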
12) Functions and Exceptions
- Each time Python enters a try/except block, it pushes the except handlers on a stack
  - Just like the function call stack
- When an exception is raised, Python searches this stack for the top-most matching handler
  - Often means jumping out of the middle of a function

    def invert(vals, index):
        try:
            vals[index] = 10.0/vals[index]
        except ArithmeticError, e:
            print 'inner exception handler:', e

    def each(vals, indices):
        try:
            for i in indices:
                invert(vals, i)
        except IndexError, e:
            print 'outer exception handler:', e

    # Once again, the top index will be out of bounds.
    values = [-1, 0, 1]
    print 'values before:', values
    each(values, range(4))
    print 'values after:', values

    values before: [-1, 0, 1]
    inner exception handler: float division
    outer exception handler: list index out of range
    values after: [-10.0, 0, 10.0]

13) Raising Exceptions
- Use raise to trigger exception processing
- Specify the type of exception you're raising, e.g., raise Exception('this is an error message')
  - Please make your error messages more informative...

    for i in range(4):
        try:
            if (i % 2) == 1:
                raise ValueError('index is odd')
            else:
                print 'not raising exception for %d' % i
        except ValueError, e:
            print 'caught exception for %d' % i, e

    not raising exception for 0
    caught exception for 1 index is odd
    not raising exception for 2
    caught exception for 3 index is odd

14) Exceptional Style
- Always use exceptions to report errors instead of returning None, -1, False, or some other value
  - Allows callers to separate normal code from error handling
  - And sooner or later, your function will probably actually want to return that "special" value
  - Note: Python's own str.find breaks this rule
    - Returns -1 if something can't be found
- Throw low, catch high
  - I.e., throw lots of very specific exceptions...
  - ...but only catch them where you can actually take corrective action
  - Because every application handles errors differently
    - If someone is using your library in a GUI, you don't want to be printing to stderr
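
The hazard of a "special" return value is easy to demonstrate with str.find. The sketch below is written for Python 3; `find_checked` is a hypothetical wrapper introduced only for illustration (Python's built-in str.index already behaves this way).

```python
def find_checked(text, target):
    """Return the position of target in text, raising ValueError if absent."""
    position = text.find(target)    # str.find signals failure with -1
    if position == -1:
        raise ValueError('%r not found in %r' % (target, text))
    return position

text = 'carbon'

# The sentinel silently produces a wrong answer, because -1 is a valid index.
wrong = text[text.find('x')]        # the last character of text

# The exception cannot be mistaken for a real result.
try:
    find_checked(text, 'x')
    outcome = 'found'
except ValueError as e:
    outcome = 'not found: %s' % e
```

Note that `find_checked` only raises; it leaves the decision about how to report the problem to whichever caller is able to act on it.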

15) Handling Errors in Tests
- We now know how to check for errors in tests: wrap the test in try/except

    Tests = [
        ['a',   'a', False],  # wrong expected value
        ['a',   1,   False],  # wrong type
        ['abc', 'a', True]    # everything legal
    ]
    passes = failures = errors = 0
    for (s, p, expected) in Tests:
        try:
            actual = s.startswith(p)
            if actual == expected:
                passes += 1
            else:
                failures += 1
        except:
            errors += 1
    print 'tests:', passes + failures + errors
    print 'passes:', passes
    print 'failures:', failures
    print 'errors:', errors

    tests: 3
    passes: 1
    failures: 1
    errors: 1

- Note the deliberate errors in the test cases to exercise the testing code

16) Test-Driven Design
- Tests are actually specifications
  - "Given these inputs, this code should behave the following way"
- So write the tests first, then the application code
  - Sounds backward, but...
- A great way to clarify specifications
  - I write the tests
  - "All" you have to do is write code that passes those tests
- Gives programmers a definite goal
  - Coding is finished when all tests pass
  - Particularly useful when trying to fix bugs in old code, as it forces you to figure out how to re-create the bug
  - Helps prevent the "one more feature" syndrome
- Ensures that tests actually get written
  - People are often too tired, or too rushed, to test after coding
- Helps clarify the Application Programming Interface (API) before it is set in stone
  - If something is awkward to test, it can be redesigned before it's written

17) TDD Example
- "I want you to write a function that calculates a running sum of the values in a list"
  - Doesn't specify whether to create a new list, or overwrite the input
  - Doesn't specify how to handle errors
- Compare that with this:

    Tests = [
        [[],        [],         'empty list'],
        [[1],       [1],        'single value'],
        [[1, 3],    [1, 4],     'two values'],
        [[1, 3, 7], [1, 4, 11], 'three values'],
        [[-1, 1],   [-1, 0],    'negative values'],
        [[1, 3.0],  [1, 4.0],   'mixed types'],
        ["string",  ValueError, 'non-list input'],
        [['a'],     ValueError, 'non-numeric value']
    ]

- If the expected result is an exception, pass only if that exception is raised
- If the test doesn't pass, print the comment so that the programmer knows what to look at
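
One way the table above could drive development is sketched below for Python 3. Both `run_tests` and this `running_sum` are illustrative assumptions: the table fixes the error type (ValueError), but the choice to return a new list rather than overwrite the input is a decision made here, not something the original specifies.

```python
def running_sum(values):
    """Return a new list of cumulative sums; the input list is not modified."""
    if not isinstance(values, list):
        raise ValueError('expected a list, got %r' % (values,))
    result = []
    total = 0
    for v in values:
        if not isinstance(v, (int, float)):
            raise ValueError('non-numeric value %r' % (v,))
        total += v
        result.append(total)
    return result

Tests = [
    [[],        [],         'empty list'],
    [[1],       [1],        'single value'],
    [[1, 3],    [1, 4],     'two values'],
    [[1, 3, 7], [1, 4, 11], 'three values'],
    [[-1, 1],   [-1, 0],    'negative values'],
    [[1, 3.0],  [1, 4.0],   'mixed types'],
    ["string",  ValueError, 'non-list input'],
    [['a'],     ValueError, 'non-numeric value']
]

def run_tests(func, tests):
    """Return the comments of all tests that func does not pass."""
    not_passed = []
    for (fixture, expected, comment) in tests:
        try:
            ok = (func(fixture) == expected)
        except Exception as e:
            # Pass only if the expected result is this type of exception.
            ok = isinstance(expected, type) and isinstance(e, expected)
        if not ok:
            not_passed.append(comment)
    return not_passed

problems = run_tests(running_sum, Tests)
for comment in problems:
    print('did not pass:', comment)
```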

18) Design by Contract
- Functions ought to carry their specifications around with them
  - Keeping specification and implementation together makes both easier to understand
  - And improves the odds that programmers will keep them in sync
- A function is defined by:
  - Its pre-conditions: what must be true in order for the function to work correctly
  - Its post-conditions: what the function guarantees will be true if its pre-conditions are met
  - May also have invariants: things that are true throughout the execution of the function
- Leads to a style of programming called design by contract
- Pre- and post-conditions constrain how the function can evolve
  - Can only ever relax pre-conditions (i.e., take a wider range of input)...
  - ...or tighten post-conditions (i.e., produce a narrower range of output)
  - Tightening pre-conditions, or relaxing post-conditions, would violate the function's contract with its callers

19) Assertions
- Normally specify pre- and post-conditions using assertions
  - A statement that something is true at a particular point in a program
  - If the assertion's condition is not met, Python raises an AssertionError exception
- For example:
  - Pre-condition: input argument is a non-empty list
  - Post-condition: two values from the list such that the first is less than or equal to the second

    def find_range(values):
        '''Find the non-empty range of values in the input sequence.'''
        assert (type(values) is list) and (len(values) > 0)
        left = min(values)
        right = max(values)
        assert (left in values) and (right in values) and (left <= right)
        return left, right

- Note that the post-condition isn't as exacting as it should be
  - Doesn't check that left is less than or equal to all other values, or that right is greater than or equal to them
  - The code to check the condition exactly is as likely to contain errors as the function itself
  - Which is one of the reasons design by contract isn't as popular as it might be
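
For a function this small, the exact post-condition is still short enough to write down. The version below is a sketch for Python 3, not a replacement for the listing above; it shows how much checking code even a trivial contract requires.

```python
def find_range(values):
    """Return the smallest and largest values in a non-empty list."""
    # Pre-condition: the input is a non-empty list.
    assert isinstance(values, list) and len(values) > 0
    left = min(values)
    right = max(values)
    # Exact post-condition: both results come from the list,
    # and together they bound every element in it.
    assert (left in values) and (right in values)
    assert all(left <= v <= right for v in values)
    return left, right
```

The `all(...)` check walks the whole list again, so the post-condition costs about as much to evaluate as the function itself.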

20) Defensive Programming
- You can (and should) use assert liberally
  - Even if you don't practice design by contract
- Defensive programming is like defensive driving
  - Program as if the rest of the world is out to get you
- "Fail early, fail often"
  - The less distance there is between the error and you detecting it, the easier it will be to find and fix
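
As a small illustration of failing early, the sketch below (Python 3; `mean` and `normalize` are hypothetical functions invented for this example) checks its assumptions where they are made, so that a bad input is reported at its source rather than as a confusing error several calls later.

```python
def mean(values):
    """Average of a non-empty sequence of numbers."""
    assert len(values) > 0, 'cannot average an empty sequence'
    return sum(values) / float(len(values))

def normalize(values):
    """Scale values so that they average to 1.0."""
    center = mean(values)
    assert center != 0, 'mean is zero, so values cannot be normalized'
    return [v / center for v in values]

scaled = normalize([1, 2, 3])

# An empty input is rejected immediately, with a message naming the cause.
try:
    normalize([])
    message = None
except AssertionError as e:
    message = str(e)
```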

21) It's Never Too Late to Do It Right
- Good practice: every time you fix a bug, put in an assertion and a comment
  - Because if you made the error, the right code can't be obvious
  - And you should protect yourself against someone "simplifying" the bug back in

    def can_transmute(element):
        '''Can this element be turned into gold?'''
        # Bug #172: make sure the input is actually an element.
        assert is_valid_element(element)
        # Gold is trivial.
        if element is Gold:
            return True
        # Trans-uranic metals and halogens are impossible.
        if (element.atomic_number > Uranium.atomic_number) or \
           (element in Halogens):
            return False
        # Look for a sequence of steps that leads to gold.
        steps = search_transmutations(element, Gold)
        if steps == []:
            return False
        else:
            # Bug #201: must be at least two elements in sequence.
            assert len(steps) >= 2
            return True

22) Summary
- The real goal of quality assurance isn't to find bugs: it's to figure out where they're coming from, so that they can be prevented
- But without testing, no one (including you) has any right to rely on the program's output
- The only way to ensure quality is to design it in
