This page contains slides and supporting material for the Summer 2010
version of BMS270: Practical Bioinformatics. The website for the
current course is here. Please see the
site updates page for information on tracking
changes.
General Resources
Python
Homepage for the Python programming language (download,
documentation, etc.)
Learning Python
Excellent introduction to Python. The 4th edition covers Python 3. For biology, it's probably best to stick with Python 2 until modules like numpy are ported to Python 3 (expect this to happen in a year or two).
Safari Bookshelf
The Safari bookshelf provides on-line access to a large selection of programming books, primarily from O'Reilly. Useful references for this course include: Learning Python, Programming Python, and BLAST.
Dive Into Python
Programmer-oriented introduction to Python. Faster paced than
Learning Python. The entire book can be downloaded as a free PDF
file.
Day 1: Introduction to Python
Slides
Mark's slides for day 1
Transcript
Mark's IDLE session from day 1
Day 2: File Formats
Slides
Mark's slides for day 2
Transcript
Mark's IDLE session from day 2
stats1.py
Annotated solutions for the homework problems from day 1
Day 3: Comparing expression profiles
Slides
Mark's slides for day 3
Transcript
Mark's IDLE session from day 3
filetest.py, filetest2.py
Mark's modules from day 3
PNAS 95:14863 (Eisen et al.)
Paper introducing cluster analysis for microarrays.
supp2data.tdt
Supplementary data for PNAS 95:14863 figure 2 (yeast expression
profiles) converted to tab-delimited text (TDT) format.
Numerical Recipes
Excellent reference for numerical methods. The older editions are available on-line at this link. The new (3rd) edition adds a chapter covering clustering and HMMs. See chapter 7.1 (and Comm. ACM 31:1192) for background on random number generators and chapters 14 and 15 for statistics and model fitting.
rpy
Module for interacting with an R session from Python (some people asked about this after
class). On Ubuntu or Debian Linux, this can be installed via "apt-get install python-rpy".
Please let me know if you successfully install rpy on Windows or OS X.
Day 4: Distance matrices
Slides
Mark's slides for day 4 (corrected for the indentation bug
that Chris caught)
Transcript
Mark's IDLE session from day 4
parseTdt.py, tdtClass.py
Mark's modules from day 4
Cluster3
Michael de Hoon's port of Michael Eisen's CLUSTER program.
cluster.c
Source code (version 1.49) for Michael de Hoon's Cluster3 implementation. Redistribution is governed by the Python License (see the Cluster3 website for full source code and license details).
JavaTreeView
Alok Saldanha's port of Michael Eisen's TREEVIEW program.
MeV
Cluster/TreeView alternative from JCVI (TIGR), similar to Acuity. Implements many statistical tools, including SAM.
Day 5: Clustering
Slides
Mark's slides for day 5
Transcript
Mark's IDLE session from day 5
TdtRatios.py
Revised tdtClass module
Here are the revision control tools that I mentioned after class.
git
Distributed version control system. Very fast, but optimized for Linux.
mercurial
Distributed version control system. Very similar to git, but with better cross-platform support. Also, it's written in Python =)
xxdiff
Graphical tool for comparing two files. Windows and OS X versions may be slightly buggy.
Day 6: Sequences
Slides
Mark's slides for day 6
Transcript
Mark's IDLE session from day 6
geneticCode.py
Universal genetic code as a Python dictionary
scoreMatrices.py
Nucleotide and amino-acid scoring matrices as Python dictionaries
The re module
Documentation for Python's regular expression module
Mastering Regular Expressions
General reference for everything you ever wanted to know
about regular expressions across popular programming languages.
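To illustrate the idea of storing the genetic code as a dictionary, here is a minimal sketch. It is not the course's geneticCode.py: the names GENETIC_CODE and translate are hypothetical, and the table is built from the standard (universal) code with bases ordered T, C, A, G. Codons that are not in the table (for example, ones containing N) are reported as "X", which is an assumption made for this example.

```python
from itertools import product

# Standard genetic code, listed with each codon position cycling through
# T, C, A, G (first position varies slowest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each three-letter codon (e.g. "ATG") to a one-letter amino acid code.
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 0; a trailing partial codon is ignored."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Unknown codons (ambiguous bases, etc.) become "X" in this sketch.
        protein.append(GENETIC_CODE.get(dna[i:i + 3], "X"))
    return "".join(protein)
```

For example, translate("ATGGCCTAA") returns "MA*": a start methionine, an alanine, and a stop.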
Day 7: Sequence Alignment
Slides
Mark's slides for day 7
Transcript
Mark's IDLE session from day 7
stringsearch.py
String searching example
CLUSTAL
Multiple sequence alignment program based on neighbor-joining trees.
JALVIEW
Program for viewing and annotating multiple alignments.
The BLAST book
Excellent coverage of sequence alignment methods with an emphasis on BLAST. Of particular interest are the coverage of dynamic programming in chapter 3 and the protocols for common BLAST tasks in chapter 9.
Day 8: Dynamic Programming
Slides
Mark's slides for day 8
Transcript
Mark's IDLE session from day 8
dp.py
Needleman-Wunsch implementation with no gap opening penalty.
Biological Sequence Analysis
This is the Sean Eddy book that I mentioned in class.
In addition to excellent coverage of hidden Markov models
(and the related dynamic programming algorithms for generating
and searching with them) this book gives good general coverage
of sequence alignment and statistics.
Needleman-Wunsch
Primary reference for global alignment by dynamic programming.
Smith-Waterman
Primary reference for local alignment by dynamic programming. (There is an earlier paper that gives the scoring method, but this paper gives the algorithm.)
Gotoh
Gotoh's speed-up of Smith-Waterman with affine gap penalties from O(m^2 n) to O(mn) time.
Myers and Miller
Myers and Miller's update to Gotoh's algorithm, improving the space requirement
from O(mn) to O(n).
BLAST
The primary reference for BLAST, giving a fast heuristic method for approximating the local alignment methods and a statistical framework for interpreting the results.
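The global alignment method covered on this day (Needleman-Wunsch with no gap opening penalty, i.e. a single linear cost per gap position) can be sketched as follows. This is a minimal illustration and not the course's dp.py: the function names nw_fill and nw_traceback and the default scores (match=1, mismatch=-1, gap=-1) are assumptions chosen for the example.

```python
def nw_fill(a, b, match=1, mismatch=-1, gap=-1):
    """Fill the global alignment score matrix for sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,     # gap in b
                              score[i][j - 1] + gap)     # gap in a
    return score


def nw_traceback(a, b, score, match=1, mismatch=-1, gap=-1):
    """Recover one optimal alignment by walking back from the last cell."""
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))
```

With the default scores, aligning "ACGT" against "AGT" gives a final score of 2 and the alignment ACGT / A-GT. The full matrix takes O(mn) space, which is the requirement that the Myers and Miller paper reduces.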
Day 9: Multiple Sequence Alignment
Slides
Mark's slides for day 9
Transcript
Mark's IDLE session from day 9
makeIdent.py
The makeIdent function that we added to dp.py during class
to debug figure 3-5 from the BLAST book.
dp2.py
Extensions of yesterday's dynamic-programming code to cover
non-zero gap opening penalties (nwg) and local alignments
(sw). Note that the two new functions use distinct fill algorithms
(nwg_fill and sw_fill) but a common traceback function.
njplot
Simple program for viewing (bootstrapped) neighbor-joining trees
Hsp82aa.fasta
(Predicted) HSP90 protein sequences for assorted fungi and E. coli
2IOQ.pdb
Crystal structure of E. coli HSP90
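The difference between a global and a local fill step, as in the nwg_fill and sw_fill functions of dp2.py, can be illustrated with a score-only Smith-Waterman fill. This sketch is not the course's sw_fill: it uses a single linear gap penalty rather than a separate gap opening penalty, it skips the traceback, and the name sw_score and the default scores are assumptions for the example.

```python
def sw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)   # previous row; the first row is all zeros
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Unlike the global fill, a cell never drops below zero, so an
            # alignment can start anywhere in either sequence.
            curr[j] = max(0,
                          prev[j - 1] + s,
                          prev[j] + gap,
                          curr[j - 1] + gap)
            # The local score is the maximum over all cells, not the last cell.
            best = max(best, curr[j])
        prev = curr
    return best
```

Because only the score is needed here, two rows of the matrix are enough. Recovering the alignment itself would require keeping the full matrix (or a divide-and-conquer scheme) and starting the traceback at the best-scoring cell.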
Day 10: Microarray statistics
Slides
Mark's slides for day 10
PNAS 98:5116
Primary reference for SAM