This page contains slides and supporting material for the Spring 2011 version of BMS270: Practical Bioinformatics without Programming. Please see the site updates page for archived versions and information on tracking changes.

General Resources

Safari Bookshelf
The Safari bookshelf provides on-line access to a large selection of computer books, primarily from O'Reilly (you must be on the UCSF network to access Safari).
The BLAST book
Excellent coverage of sequence alignment methods with an emphasis on BLAST. Of particular interest are the coverage of dynamic programming in chapter 3 and the protocols for common BLAST tasks in chapter 9.

Day 1: Sequence Similarity

Slides
Mark's slides for day 1
Entrez
NCBI Entrez cross-database search.
FASTA file format
Ad hoc FASTA specification from NCBI.
Feature Table Format
Specification for the feature table format shared by GenBank (.gb), EMBL, and DDBJ sequence files.
DOTTER
Dotplot sequence comparison program. Windows users should download windotter.zip from Eric Sonnhammer's page. OS X users should download the Q virtual machine and the biosqueeze_rsa private key. The disk image containing DOTTER will be distributed during class. (Advanced OS X users may install DOTTER via the fink package on the AceDB page).
sequences1.zip
Example sequences for Monday's exercises.
Genes 167:GC1
The primary reference for DOTTER.
Fugu
Graphical file transfer client for OS X. Useful for moving files to and from the Q virtual machine (connect to "localhost" with username "bms", port "8022", and password (when prompted) "bms").
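
For readers who want to see what the FASTA format listed above looks like to a program, here is a minimal Python sketch. It assumes the simplest form of the format: a ">" header line followed by one or more sequence lines. The function name parse_fasta and the example records are hypothetical, not part of any course material.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} for FASTA-formatted lines.

    Assumes each record starts with a '>' header line and that
    sequence lines for that record follow until the next header.
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence data may be wrapped over several lines; join them.
            records[header] += line
    return records


# Hypothetical two-record example (not taken from sequences1.zip).
example = [
    ">seq1 hypothetical example",
    "ACGTAC",
    "GTTA",
    ">seq2",
    "MKV",
]
records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```

To read a real file, pass an open file handle instead of the example list, e.g. parse_fasta(open("myfile.fasta")), where the filename is a placeholder.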

Day 2: Sequence Homology

Slides
Mark's slides for day 2
CLUSTALX
Multiple sequence alignment program based on neighbor-joining trees.
JALVIEW
Program for viewing and annotating multiple alignments. (The "Install Anywhere" version is slightly easier to install than the "Java Web Start" version).
GFF file format
Specification for the commonly used version 2 of GFF (e.g., this is what JALVIEW uses for sequence annotations).
GFFv3 file format
Recent update of GFF by Lincoln Stein, with stricter definitions and better support for relationships among features, as used by newer programs such as Gbrowse2.
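
As an illustration of the GFF version 2 layout described above, the following Python sketch splits one tab-delimited GFF line into its named columns. It assumes the standard column order (seqname, source, feature, start, end, score, strand, frame, then an optional attributes column). The names GffFeature and parse_gff_line and the example line are hypothetical.

```python
from typing import NamedTuple, Optional


class GffFeature(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int              # 1-based, inclusive
    end: int                # 1-based, inclusive
    score: Optional[float]  # None when the file gives "."
    strand: str
    frame: str
    attributes: str         # left unparsed in this sketch


def parse_gff_line(line: str) -> Optional[GffFeature]:
    """Parse one GFF2 line; return None for blank and comment lines."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated fields")
    seqname, source, feature, start, end, score, strand, frame = fields[:8]
    attributes = fields[8] if len(fields) > 8 else ""
    return GffFeature(
        seqname, source, feature,
        int(start), int(end),
        None if score == "." else float(score),
        strand, frame, attributes,
    )


# Hypothetical annotation line, built with explicit tabs for clarity.
example_line = "\t".join(
    ["seq1", "example", "exon", "100", "200", ".", "+", ".", 'Sequence "hypothetical1"']
)
feat = parse_gff_line(example_line)
print(feat.feature, feat.end - feat.start + 1)
```

Because GFF coordinates are 1-based and inclusive, the feature length is end - start + 1.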

Day 3: Heuristic Alignment and Searching

Slides
Mark's slides for day 3
NCBI BLAST
NCBI's BLAST portal
JMB 215:403
The primary reference for BLAST, giving a fast heuristic method for approximating the local alignment methods and a statistical framework for interpreting the results.
CABIOS 4:11
Myers and Miller's implementation of Smith-Waterman local alignment in O(MN) time and O(N) space. This is the pairwise local alignment method used by BLAST and CLUSTALW.
ShuffleSequence.py
At the end of class, I showed a way to shuffle sequences in Python (for use as negative controls in, e.g., BLAST searches). This script is an easier-to-understand version of what I showed in class (installation instructions are at the top of the script; read them in, e.g., Notepad or TextEdit).
Web shuffler
A web version of the ShuffleSequence.py script, if you don't want to run it locally. (If you're interested in the changes that convert the script from command-line to web, the source is here).
Cygwin
UNIX-like environment for Windows
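
The idea behind shuffled negative controls can be sketched in a few lines of Python. This is not the ShuffleSequence.py script itself, only a minimal illustration of the same idea under simple assumptions: the sequence is a plain string, and residues are permuted uniformly at random, which preserves composition but destroys order. The function name shuffle_sequence and the example sequence are hypothetical.

```python
import random
from typing import Optional


def shuffle_sequence(seq: str, rng: Optional[random.Random] = None) -> str:
    """Return a random permutation of the residues in seq.

    The result has the same length and composition as the input,
    so it can serve as a composition-matched negative control.
    """
    if rng is None:
        rng = random.Random()
    residues = list(seq)
    rng.shuffle(residues)  # in-place uniform shuffle
    return "".join(residues)


# Hypothetical example; passing a seeded generator makes the run repeatable.
original = "MKVLAAGIVGLLLAQ"
control = shuffle_sequence(original, random.Random(0))
print(original)
print(control)
```

Searching with several independently shuffled copies gives a rough feel for the scores that arise by chance from composition alone.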

Day 4: Multiple Alignment and Phylogeny

Slides
Mark's slides for day 4

Day 5: Hidden Markov Models

Slides
Mark's slides for day 5
HMMer
Sean Eddy's profile-HMM implementation
HMMer web interface
New web interface to PHMMer, HMMscan, and HMMsearch
InterProScan
Search interface for InterPro, a meta-database of 15 motif databases (mostly HMM based)
PLoS Comp. Biol. 4:e1000069
Reference for HMMer 3 heuristic
Biological Sequence Analysis
In addition to excellent coverage of hidden Markov models (and the related dynamic programming algorithms for generating and searching with them) this book gives good general coverage of sequence alignment and statistics.

Day 6: Expression Profiling

Slides
Mark's slides for day 6
PNAS 95:14863 (Eisen et al.)
Paper introducing cluster analysis for microarrays.
supp2data.txt
Supplementary data for PNAS 95:14863 figure 2 (yeast expression profiles) converted to tab-delimited text (TDT) format.
shuffled.cdt
supp2data.txt with rows shuffled to remove Eisen's gene clustering and headers modified for CDT format.
Cluster3
Michael de Hoon's port of Michael Eisen's CLUSTER program.
JavaTreeView
Alok Saldanha's port of Michael Eisen's TREEVIEW program. (Start with version 1.1.4r5)
Numerical Recipes
Excellent reference for numerical methods. The older editions are available on-line at this link. The new (3rd) edition adds a chapter covering clustering and HMMs. See chapter 7.1 (and Comm. ACM 31:1192) for background on random number generators and chapters 14 and 15 for statistics and model fitting.

Day 7: Systematic Annotation

Slides
Mark's slides for day 7
The Gene Ontology
GO Consortium homepage, featuring the AmiGO browser
GO Annotations
GO Consortium links to GO Annotated genomes
Evidence Codes
GO Consortium evidence code definitions
SGD GO Term Finder
Search for significant enrichment of GO terms in a set of S. cerevisiae genes (slow)
SGD GO Slim Mapper
Map a set of S. cerevisiae genes to a simplified GO subset, without statistics (fast)
SGD GO Tutorial
A guided tour of the GO resources at SGD
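
To make "significant enrichment" concrete, here is a minimal Python sketch of a one-sided hypergeometric test, a common choice for this kind of question. It is an assumption that any particular tool listed above computes exactly this quantity, and the sketch applies no multiple-testing correction. The function name and the example counts are hypothetical. It requires Python 3.8 or later for math.comb.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, hits: int) -> float:
    """Probability of seeing at least `hits` annotated genes by chance.

    population: number of genes in the genome (or background set)
    annotated:  number of those genes carrying the GO term
    sample:     number of genes in the set of interest
    hits:       number of genes in the set carrying the GO term
    """
    max_hits = min(annotated, sample)
    total = comb(population, sample)  # all ways to draw the sample
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 6000 genes, 100 with the term, 50 in the set, 8 hits.
p = hypergeom_enrichment_p(6000, 100, 50, 8)
print(p)
```

When many GO terms are tested against the same gene set, each p-value should be adjusted for multiple testing before it is interpreted.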

Day 8: Estimating Relative Expression

Slides
Mark's slides for day 8 (revised for continued discussion on day 9)
PNAS 98:5116
Primary reference for SAM
BMC Bioinformatics 5:54
Primary reference for BAGEL >= 3. (See also LOX, for RNA-Seq data)
BAGEL and LOX
Bayesian frameworks for estimating expression levels
MeV
Cluster/TreeView alternative from JCVI (TIGR), similar to Acuity. Implements many statistical tools, including SAM.

Day 9: Exploring Graphs

Cytoscape
Visualization/analysis tool for very large graphs. Supports pathways, interaction networks, gene ontologies, and expression data.
ScSOK2_3.sif
An example of a small network in SIF format for Cytoscape
Gene ontology
Full set of GO terms and their relationships, downloaded from SGD 4/13/2010.
gene_association.sgd
SGD annotations associating GO terms with yeast genes (including the evidence codes and literature references for these annotations), downloaded from SGD 4/13/2010.
BiNGO
Cytoscape plugin for calculating enrichment of GO terms in a gene set (note that there are many tools for solving this problem; see the GO consortium's list).
GraphViz
Command-line graph layout program from Bell Labs. Good for moderate-sized graphs. Inkscape is useful for exploring and annotating SVG output from GraphViz.
NetworkX
Python graph library from LLNL. Can connect to GraphViz and/or Matplotlib for visualization.
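
For a sense of what a SIF network file contains, the Python sketch below reads SIF-style lines into a simple undirected adjacency table using only the standard library. It assumes the basic SIF layout of "source interaction target [target ...]" per line, with a lone name denoting an unconnected node. It discards the interaction type, which a real analysis may want to keep. The function name parse_sif and the node names are hypothetical and are not taken from ScSOK2_3.sif.

```python
from typing import Dict, Iterable, Set


def parse_sif(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Build {node: set of neighbours} from SIF-formatted lines."""
    adjacency: Dict[str, Set[str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # SIF files are tab- or space-delimited; prefer tabs when present.
        fields = line.split("\t") if "\t" in line else line.split()
        source = fields[0]
        adjacency.setdefault(source, set())  # also registers isolated nodes
        # fields[1] is the interaction type; the rest are targets.
        for target in fields[2:]:
            adjacency[source].add(target)
            adjacency.setdefault(target, set()).add(source)
    return adjacency


# Hypothetical network: geneA interacts with geneB and geneC; geneD is isolated.
network = parse_sif(["geneA pd geneB geneC", "geneD"])
for node in sorted(network):
    print(node, sorted(network[node]))
```

The same edge list could be handed to NetworkX or written out for GraphViz; the plain dictionary is used here only to keep the sketch free of extra installs.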

Day 10: Exploring Graphs

Slides
Mark's slides for day 10
MochiView
Genome browser aimed at ChIP-chip, ChIP-seq, and motif finding. Download the full software (v1.45), the manual (pdf), the tutorial (pdf), and the tutorial database (cvw).
CGD
Candida albicans genome database (for the example data in the MochiView tutorial)
Web of Science
UCSF links for the Web of Science and SCOPUS citation databases
PubMed RSS feeds
Tutorial on creating RSS feeds for PubMed searches
Ubuntu
Ubuntu Linux distribution. The installation CD can be booted as a "Live CD", allowing you to try Linux with no change to your computer.
Knoppix
Knoppix is a "Live CD" version of Debian GNU/Linux (the basis for Ubuntu and the version of Linux that we used for DOTTER on the first day of class). Knoppix may be a bit less user-friendly than Ubuntu, but it may boot faster on some computers. More information about Knoppix can be found on this unofficial site.