This page contains slides and supporting material for the Spring 2012 version of BMS270: Practical Bioinformatics without Programming. Please see the site updates page for archived versions and information on tracking changes.

General Resources

Safari Bookshelf
The Safari bookshelf provides on-line access to a large selection of computer books, primarily from O'Reilly. N.B.:
The BLAST book
Excellent coverage of sequence alignment methods with an emphasis on BLAST. Of particular interest are the coverage of dynamic programming in chapter 3 and the protocols for common BLAST tasks in chapter 9.

Day 1: Sequence Similarity

Slides
Mark's slides for day 1
Entrez
NCBI Entrez cross-database search.
FASTA file format
Ad hoc FASTA specification from NCBI.
Feature Table Format
Specification for the feature table format shared by GenBank (.gb), EMBL, and DDBJ sequence files.
DOTTER
Dotplot sequence comparison program. Windows users should download windotter.zip from Erik Sonnhammer's page. OS X users should download pydotter.py instead. (Advanced OS X users may install DOTTER by installing GTK+ via fink and then downloading the OS X binary from the AceDB page.)
sequences1.zip
Example sequences for Monday's exercises.
Genes 167:GC1
The primary reference for DOTTER.
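
For readers curious about what a dotplot computes, here is a minimal Python sketch. It is not DOTTER's algorithm: DOTTER scores sliding windows and shows the scores in greyscale, while this sketch only marks exact word matches. The function names and the `window` parameter are made up for illustration.

```python
def dotplot(seq_a, seq_b, window=3):
    """Return the set of (i, j) where the words of length `window`
    starting at seq_a[i] and seq_b[j] are identical."""
    # Index every word of seq_b by its start positions.
    index = {}
    for j in range(len(seq_b) - window + 1):
        index.setdefault(seq_b[j:j + window], []).append(j)
    # Look up every word of seq_a in that index.
    hits = set()
    for i in range(len(seq_a) - window + 1):
        for j in index.get(seq_a[i:i + window], []):
            hits.add((i, j))
    return hits


def render(seq_a, seq_b, hits):
    """Draw the hits as text: rows follow seq_a, columns follow seq_b."""
    rows = []
    for i in range(len(seq_a)):
        rows.append("".join("*" if (i, j) in hits else "."
                            for j in range(len(seq_b))))
    return "\n".join(rows)


if __name__ == "__main__":
    a = "GATTACAGATTACA"
    b = "TTACAGGATTAC"
    print(render(a, b, dotplot(a, b)))
```

Shared regions show up as diagonal runs of `*`, and a repeat in one sequence shows up as parallel diagonals. A larger `window` suppresses chance matches at the cost of missing short similarities.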

Fast, approximate dotplot tools

Pairwise BLAST
NCBI BLAST of two sequences includes a dotplot graphic
UGENE
Sequence "workbench" that includes a nucleotide dotplot tool
MUMmer
Very fast whole genome aligner

Day 2: Sequence Homology

Slides
Mark's slides for day 2
CLUSTALX
Multiple sequence alignment program that aligns sequences progressively, following a neighbor-joining guide tree.
JALVIEW
Program for viewing and annotating multiple alignments. (The "Install Anywhere" version is slightly easier to install than the "Java Web Start" version.)
GFF file format
Specification for the commonly used version 2 of GFF (e.g., this is what JALVIEW uses for sequence annotations).
GFFv3 file format
Recent update of GFF by Lincoln Stein, with stricter definitions and better support for relationships among features, as used by newer programs such as GBrowse2.
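
As a rough illustration of the GFFv3 layout (nine tab-separated columns, with `key=value` pairs separated by semicolons in the ninth), here is a small Python sketch. The `Feature` tuple, the function name, and the example line are made up for illustration, and the version 2 ninth column uses a different syntax that this sketch does not handle.

```python
from collections import namedtuple
from urllib.parse import unquote

Feature = namedtuple(
    "Feature",
    "seqid source type start end score strand phase attributes")


def parse_gff3_line(line):
    """Parse one GFF3 feature line; return None for comments and blanks."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    attributes = {}
    if cols[8] != ".":
        for pair in cols[8].split(";"):
            if pair:
                key, _, value = pair.partition("=")
                # An attribute may carry several comma-separated values.
                attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    # Score, strand, and phase stay as strings because "." means "no value".
    return Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                   cols[5], cols[6], cols[7], attributes)


if __name__ == "__main__":
    example = "chrI\tdemo\tmRNA\t1000\t2000\t.\t+\t.\tID=mrna0001;Parent=gene0001"
    print(parse_gff3_line(example))
```

The `Parent` attribute is how GFFv3 expresses the relationships among features mentioned above: an mRNA line points at its gene, and exon lines point at their mRNA.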

Day 3: Heuristic Alignment and Searching

Slides
Mark's slides for day 3
NCBI BLAST
NCBI's BLAST portal
JMB 215:403
The primary reference for BLAST, giving a fast heuristic method for approximating the local alignment methods and a statistical framework for interpreting the results.
CABIOS 4:11
Myers and Miller's implementation of Smith-Waterman local alignment in O(MN) time and O(N) space. This is the pairwise local alignment method used by BLAST and CLUSTALW.
Web shuffler
A simple tool for shuffling sequences (to use as negative controls in, e.g., BLAST searches). You can also download a local version or view the source of the web version.
Cygwin
UNIX-like environment for Windows
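
The two ideas above, scoring a local alignment in O(MN) time and O(N) space and using shuffled sequences as negative controls, can be sketched together in Python. This is a simplification under stated assumptions: it uses a linear gap penalty with arbitrary scores, returns only the best score (recovering the alignment itself in linear space needs a divide-and-conquer step that is omitted here), and shuffles single letters, which may differ from what the web shuffler does. All names are made up for illustration.

```python
import random


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty.

    Only two rows of the dynamic programming matrix are kept, so memory is
    O(len(b)) while time stays O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            if score > best:
                best = score
        prev = curr
    return best


def shuffled(seq, rng):
    """Return seq with its letters in random order (same composition)."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def control_scores(query, subject, n=100, seed=0):
    """Score the query against n shuffled copies of the subject."""
    rng = random.Random(seed)
    return [local_score(query, shuffled(subject, rng)) for _ in range(n)]


if __name__ == "__main__":
    q, s = "GATTACAGATTACA", "CCGATTACAGATCC"
    print("real score:", local_score(q, s))
    print("highest control score:", max(control_scores(q, s)))
```

A real score well above every control score suggests the match is not explained by composition alone. Shuffling single letters preserves composition but not dinucleotide frequencies, so it is a fairly lenient control.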

Day 4: Multiple Alignment and Phylogeny

Slides
Mark's slides for day 4
CLUSTAL -> JALVIEW converter
Web tool to reformat bootstrapped CLUSTAL trees for JALVIEW

Day 5: Hidden Markov Models

Slides
Mark's slides for day 5
HMMer
Sean Eddy's profile-HMM implementation
HMMer web interface
New web interface to PHMMer, HMMscan, and HMMsearch
InterProScan
Search interface for InterPro, a meta-database of 15 motif databases (mostly HMM based)
PLoS Comp. Biol. 4:e1000069
Reference for HMMer 3 heuristic
Biological Sequence Analysis
In addition to excellent coverage of hidden Markov models (and the related dynamic programming algorithms for generating and searching with them) this book gives good general coverage of sequence alignment and statistics.
AGA1.fasta
Protein sequence of low-complexity mannoprotein from S. cerevisiae.
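
One informal way to see what "low-complexity" means is to measure the Shannon entropy of the residues in a sliding window. The Python sketch below does this. The window size and threshold are arbitrary assumptions, the example sequence is invented, and this is not the filter used by any particular search program.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Entropy in bits of the letter frequencies in a string."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def low_complexity_windows(seq, size=12, threshold=2.2):
    """Start positions of windows whose entropy falls below the threshold."""
    return [i for i in range(len(seq) - size + 1)
            if shannon_entropy(seq[i:i + size]) < threshold]


if __name__ == "__main__":
    # Invented sequence: a serine/threonine-rich stretch inside mixed residues.
    seq = "MKLVDERWQAHGSTSTSSTSTSSSTTSTSPLNKCFYWDE"
    print(low_complexity_windows(seq))
```

A window made of one repeated residue has entropy 0, and a window of two equally frequent residues has entropy 1 bit, so runs dominated by a few residues stand out against typical protein sequence.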

Days 6 and 7: Expression Profiling

Slides
Mark's slides for day 6
PNAS 95:14863 (Eisen et al.)
Paper introducing cluster analysis for microarrays.
supp2data.txt
Supplementary data for PNAS 95:14863 figure 2 (yeast expression profiles) converted to tab-delimited text (TDT) format.
better_shuffle.txt
supp2data.txt with rows shuffled to remove Eisen's gene clustering and headers modified for CDT format. Corrected annotations
fig1data.txt
Data for PNAS 95:14863 figure 1 (human fibroblast serum response) taken from the primary reference, Science 283:83
fig1data_shuffled.txt
fig1data.txt with rows shuffled to remove clustering
fig1data_shuffled_orderless.txt
fig1data.txt with rows shuffled and GORDER column stripped. (shuffling independent of fig1data_shuffled.txt)
Cluster3
Michael de Hoon's port of Michael Eisen's CLUSTER program.
JavaTreeView
Alok Saldanha's port of Michael Eisen's TREEVIEW program.
Numerical Recipes
Excellent reference for numerical methods. The older editions are available on-line at this link. The new (3rd) edition adds a chapter covering clustering and HMMs. See chapter 7.1 (and Comm. ACM 31:1192) for background on random number generators and chapters 14 and 15 for statistics and model fitting.
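
The shuffled data files listed above can be approximated with a few lines of Python. This sketch assumes a tab-delimited table with a single header row. It is not the script that produced the files, and the function names and the `n_header` and `seed` parameters are made up for illustration.

```python
import random


def shuffle_rows(lines, n_header=1, seed=None):
    """Keep the first n_header lines in place and shuffle the rest."""
    header = list(lines[:n_header])
    body = list(lines[n_header:])
    random.Random(seed).shuffle(body)
    return header + body


def drop_column(lines, name):
    """Remove the tab-delimited column whose first-row label equals name."""
    rows = [line.rstrip("\n").split("\t") for line in lines]
    idx = rows[0].index(name)
    return ["\t".join(cell for k, cell in enumerate(row) if k != idx)
            for row in rows]


if __name__ == "__main__":
    table = ["ID\tGORDER\tt0", "g1\t1\t0.5", "g2\t2\t-0.1", "g3\t3\t0.0"]
    for line in drop_column(shuffle_rows(table, seed=1), "GORDER"):
        print(line)
```

Shuffling the rows removes the published gene order, so any clustering you see afterwards comes from your own analysis. Dropping an order column such as GORDER removes the remaining record of that order.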

Day 8: Systematic Annotation and Graph Visualization

Slides
Mark's slides for day 8
Bioinformatics 20:3710
Primary reference for the SGD GO Term Finder
The Gene Ontology
GO Consortium homepage, featuring the AmiGO browser
GO Annotations
GO Consortium links to GO Annotated genomes
Evidence Codes
GO Consortium evidence code definitions
SGD GO Term Finder
Search for significant enrichment of GO terms in a set of S. cerevisiae genes (slow)
SGD GO Slim Mapper
Map a set of S. cerevisiae genes to a simplified GO subset, without statistics (fast)
SGD GO Tutorial
A guided tour of the GO resources at SGD
Cytoscape
Visualization/analysis tool for very large graphs. Supports pathways, interaction networks, gene ontologies, and expression data.
BiNGO
Cytoscape plugin for calculating enrichment of GO terms in a gene set (note that there are many tools for solving this problem; cf. the GO Consortium's list).
GraphViz
Command-line graph layout program from Bell Labs. Good for moderate-sized graphs. Inkscape is useful for exploring and annotating SVG output from GraphViz.
NetworkX
Python graph library from LLNL. Can connect to GraphViz and/or Matplotlib for visualization.
yeastHighQuality.sif
Yeast interactome graph from Cytoscape example directory
CellCycle_B.uids
Gene list from JavaTreeView for cluster B (cell cycle) from PNAS 95:14863
CellCycle_B.sif
Subgraph of yeastHighQuality.sif corresponding to the genes in CellCycle_B.uids
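
A subgraph like CellCycle_B.sif can be derived from a SIF file and a gene list along the following lines. This Python sketch is not the procedure used to make the file. It assumes the gene list has one identifier per line and that each SIF line reads `source interaction target [target ...]`. The function names and the example identifiers are made up for illustration.

```python
def read_sif(lines):
    """Yield (source, interaction, target) triples from SIF lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Simplification: use tabs as the delimiter when the line has any.
        fields = line.split("\t") if "\t" in line else line.split()
        if len(fields) < 3:
            continue  # a lone node with no edges
        source, interaction = fields[0], fields[1]
        for target in fields[2:]:
            yield source, interaction, target


def induced_subgraph(edges, keep):
    """Keep only the edges whose two endpoints are both in `keep`."""
    keep = set(keep)
    return [(s, i, t) for s, i, t in edges if s in keep and t in keep]


def write_sif(edges):
    """Format edges as tab-delimited SIF lines, one edge per line."""
    return ["%s\t%s\t%s" % edge for edge in edges]


if __name__ == "__main__":
    sif = ["geneA pp geneB geneC", "geneB pd geneD", "geneE"]
    wanted = ["geneA", "geneB", "geneD"]
    for line in write_sif(induced_subgraph(read_sif(sif), wanted)):
        print(line)
```

Genes from the list that have no partner inside the list drop out of the result, because only edges are written.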

Day 9: 3D Structures

Slides
Mark's slides for day 9 (post-facto)
properties.pml
Example PyMol script
Colors.pml
Gianne's version of the example PyMol script. Mac users should try this one if properties.pml doesn't work.
RCSB/PDB
The Protein Data Bank
PyMol
Viewer for 3D protein structures (PDB files)
PyMol Introduction
Getting started tutorial
PyMol Commands
More in-depth command documentation
Selection Algebra
Wiki page on selection commands
RasMol
Older alternative to PyMol. Less pretty but nice for quick structure viewing.
1CLL
Calmodulin
2CHT
Chorismate mutase (B. subtilis)
1A02
Fos/Jun/NFAT with DNA (Nature 392:42)
1ASM
AATase
1QDM
Phytepsin (EMBO J. 18:3947)
1GID
P4-P6 Ribozyme
6TNA
tRNA

Day 10: Genome Browsing

Slides
Mark's slides for day 10
MochiView
Genome browser aimed at ChIP-chip, ChIP-seq, and motif finding. Download the full software (v1.45), the manual (pdf), the tutorial (pdf), and the tutorial database (cvw).
CGD
Candida albicans genome database (for the example data in the MochiView tutorial)
PLoS Biology 7:e1000133
Primary reference for the MochiView tutorial data set
sig_genes.mochiview
Sheet 1 of supplemental dataset S4 from PLoS Biology 7:e1000133 (giving control/zap1-/- log ratios for significantly differential genes) reformatted for MochiView
IGV
Integrative Genomics Viewer from the Broad Institute. Similar to MochiView, but supports viewing aligned sequencing reads.
UGENE
Molecular biology "workbench" (similar to Vector-NTI) with support for viewing next-gen sequencing results (e.g., BAM files).
Motif finding tools
MEME is one of the first profile-based motif tools; it finds motifs in unaligned sequences using an EM algorithm. Monkey and RVista are more recent tools that use comparative genomics to reduce false positive signal.
Web of Science
UCSF links for the Web of Science and SCOPUS citation databases
PubMed RSS feeds
Tutorial on creating RSS feeds for PubMed searches
Ubuntu
Ubuntu Linux distribution. The installation CD can be booted as a "Live CD", allowing you to try Linux with no change to your computer.
Knoppix
Knoppix is a "Live CD" version of Debian GNU/Linux (the basis for Ubuntu). Knoppix may be a bit less user friendly than Ubuntu, but it may boot faster on some computers. More information about Knoppix can be found on this unofficial site.