2) Bhargan Basepair
Bhargan Basepair received a B.Sc. in biochemistry five years ago. He
has been working since then for Genes'R'Us, a biotech firm with labs
in four countries. He did a Java programming course as a freshman,
and a bioinformatics course using Perl as a senior.
Bhargan and his colleagues are developing fuzzy pattern-matching
algorithms for finding similarities between DNA records in standard
databases. To help other Genes'R'Us researchers, and to test his
group's heuristics, Bhargan runs an overnight sequence query service.
Researchers email sequences in a variety of formats (in-line,
attachments, URLs to pages behind the company firewall, etc.).
Bhargan saves them in files called search/a.in, search/b.in, and so
on, then edits them to add query
directives. He is very conscientious, and almost never accidentally
overwrites one query with another.
Before leaving at night, he runs a Perl script that processes these
inputs to create output files with matching names like
search/a.out. When Bhargan comes in the next morning, he pages
through his mail again, sending .out files to the appropriate
people. (He almost never sends the wrong file to the wrong person.)
He then uses another Perl script to copy all the input and output
files to a directory with a name corresponding to the date, such as
2009-07-23. He and his colleagues would like to do statistics
on these saved queries and results to see how well their algorithms
are doing, but have never found the time.
This course will teach Bhargan how to automate his overnight service
by writing simple scripts to retrieve, process, and reply to email
queries. Those scripts will automatically record queries, results,
and other data, and produce a daily summary of the performance of the
pattern-matching algorithms.
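As an illustration, the archiving step described above (copying each
day's input and output files to a directory named for the date) might
be sketched in Python as follows. This is not Bhargan's Perl script:
the function name archive_queries, its parameters, and the assumption
that every query file ends in .in or .out are hypothetical.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_queries(search_dir, archive_root, day=None):
    """Copy every .in/.out file from search_dir into archive_root/YYYY-MM-DD.

    Hypothetical helper: returns the dated directory and the names copied.
    """
    day = day or date.today()
    # date.isoformat() yields names such as 2009-07-23.
    target = Path(archive_root) / day.isoformat()
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(search_dir).iterdir()):
        # Assumption: only query inputs and results are worth archiving.
        if path.suffix in {".in", ".out"}:
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return target, copied
```

Returning the list of copied names gives a later script something to
record, which is a first step toward the daily summary.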

3) Helen Helmet
Helen Helmet, a Ph.D. student in mechanical engineering, is currently
doing a six-month internship at an engineering firm designing
carbon-fiber helmets for firefighters and other emergency service
personnel. Her undergraduate courses included an introduction to
scientific computing using MATLAB, a robotics course using C, and a
numerical methods course that also used MATLAB. She taught herself
Fortran during a co-op placement between her junior and senior years,
and used it again in a graduate course on finite elements.
Helen's task is to model the non-combustive thermal degradation
(otherwise known as "melting") of candidate materials. Her starting
point is a 14,000-line program her supervisor wrote a decade ago.
After deciding that there isn't time to re-write it in C++ (which she
would like to learn), she comments out the calls to the mesh
deformation routine in the main loop and begins to write a
replacement. She sometimes deletes what she has written and starts
over three or four times before she is satisfied.
Helen tests her program by writing the total heat content of the mesh
at each time step to a file. She then loads this data into MATLAB to
graph the percentage differences between these values and the ones
produced by the original program for six sample problems. In one
case, the difference grew as large as 30% by the end of the
simulation. Helen added write statements to her program to
display values until she managed to convince herself that the
difference was due to a bug in the original subroutines.
Helen keeps a to-do list on her home page. Every two or three days,
she updates this list to show the progress she has made. She keeps
completed tasks on the page until the end of the month, when she
writes a short status report for her supervisor.
This course will teach Helen to design software before she starts
typing, and that there are better ways to manage code evolution than
commenting out one section and replacing it with another. She will
also learn more effective testing and debugging procedures, and how to
use a version control system to ensure that she can roll back to an
old version of code when she needs to. Finally, she will be shown how
to use an issue-tracking system to manage her to-do list, and how to
write a small script to generate her monthly progress report
automatically.

4) Stefan Synthesis
Stefan Synthesis is a graduate student in chemistry who is working as
a lab technician to help cover his costs. His only programming
experience is a general first-year introduction to computational
science using Python.
Stefan's supervisor is studying the production of fullerenes (also
known as "buckyballs"). Each set of experiments involves 100
different reactant mixtures, 20 different temperature regimes, and 5
different pressures. Using a machine built by a collaborating lab,
Stefan can run all the mixture and temperature combinations at once,
so that the output of each experiment is five files containing 2000
lines of data each.
The controller for the experimental machine writes these files to
Stefan's workstation approximately an hour after the experiment
begins. To analyze them, Stefan opens them with Excel, copies and
pastes to merge the data into one spreadsheet, then creates a chart
using the chart wizard. He saves the chart as a PNG file on the
group's web site, along with the original data file.
Two or three times a week, Stefan receives results from his
supervisor's collaborators. He creates charts for each, which he
uploads to the web site, then merges summary statistics into a master
spreadsheet.
This course will teach Stefan how to automate the process described
above. More importantly, it will teach him how to track the
provenance of the data he is working with, so that scientists in his
group and others can trace backward from the final charts to the raw
data they represent.
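One simple way to keep that trail is to tag every merged row with the
file and line it came from. The sketch below assumes the controller's
output files are comma-separated text, which the description above
does not state; the function name merge_with_provenance is likewise
hypothetical.

```python
import csv
from pathlib import Path


def merge_with_provenance(paths, merged_path):
    """Concatenate CSV files, prefixing each row with its source file and line.

    Hypothetical helper: returns the number of rows written.
    """
    rows_written = 0
    with open(merged_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="") as src:
                # Line numbers start at 1 so they match what an editor shows.
                for line_number, row in enumerate(csv.reader(src), start=1):
                    writer.writerow([Path(path).name, line_number] + row)
                    rows_written += 1
    return rows_written
```

With the source file and line number stored beside each value, any
point on a chart built from the merged file can be traced to the raw
record that produced it.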
