This page contains slides and supporting material for the Spring 2012
version of BMS270: Practical Bioinformatics without Programming.
Please see the site updates page for archived
versions and information on tracking changes.
General Resources
Safari Bookshelf
The Safari bookshelf provides on-line access to a large selection of
computer books, primarily from O'Reilly. N.B.:
- You need to be on the UCSF network to access Safari.
- You need to authenticate your browser by visiting the main
Safari Bookshelf link above before the book-specific links
below will work.
The BLAST book
Excellent coverage of sequence alignment methods with an emphasis on BLAST. Of particular interest are the coverage of dynamic programming in chapter 3 and the protocols for common BLAST tasks in chapter 9.
Day 1: Sequence Similarity
Slides
Mark's slides for day 1
Entrez
NCBI Entrez cross-database search.
FASTA file format
Ad hoc FASTA specification from NCBI.
Feature Table Format
Specification for the feature table format shared by GenBank (.gb),
EMBL, and DDBJ sequence files.
DOTTER
Dotplot sequence comparison program. Windows users should
download windotter.zip from Erik Sonnhammer's page. OS X users
should download pydotter.py instead.
(Advanced OS X users may install DOTTER by installing GTK+ via fink and then downloading the OS X binary
from the AceDB page).
sequences1.zip
Example sequences for Monday's exercises.
Genes 167:GC1
The primary reference for DOTTER.
Fast, approximate dotplot tools
Pairwise BLAST
NCBI BLAST of two sequences includes a dotplot graphic
Ugene
Sequence "workbench" that includes a nucleotide dotplot tool
MUMmer
Very fast whole genome aligner
Day 2: Sequence Homology
Slides
Mark's slides for day 2
CLUSTALX
Multiple sequence alignment program based on neighbor-joining trees.
JALVIEW
Program for viewing and annotating multiple alignments. (The
"Install Anywhere" version is slightly easier to install than
the "Java Web Start" version).
GFF file format
Specification for the commonly used version 2 of GFF (e.g.,
this is what JALVIEW uses for sequence annotations).
GFFv3 file format
Recent update of GFF by Lincoln Stein, with stricter definitions
and better support for relationships among features, as used by newer
programs such as Gbrowse2.
Day 3: Heuristic Alignment and Searching
Slides
Mark's slides for day 3
NCBI BLAST
NCBI's BLAST portal
JMB 215:403
The primary reference for BLAST, giving a fast heuristic method for approximating the local alignment methods and a statistical framework for interpreting the results.
CABIOS 4:11
Myers and Miller's implementation of Smith-Waterman local alignment
in O(MN) time and O(N) space. This is the pairwise local alignment
method used by BLAST and CLUSTALW.
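The O(MN) time and O(N) space bounds can be illustrated with a minimal sketch that computes only the best local alignment score, keeping two rows of the dynamic programming matrix at a time. This is not the algorithm from the paper: it assumes a linear gap penalty, the function name and scoring values are hypothetical, and recovering the alignment itself in linear space needs the divide-and-conquer strategy of the cited reference, which is not shown.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty).

    Illustrative scoring values only; real tools use substitution
    matrices and affine gap penalties.
    """
    # Put the shorter sequence along the row so memory is O(min(M, N)).
    if len(b) > len(a):
        a, b = b, a
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for ca in a:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(0, diag, up, left))
        best = max(best, max(curr))
        prev = curr  # only the last row is needed for the next step
    return best
```

Every cell is still visited once, so the running time stays O(MN); only the storage shrinks.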
Web shuffler
A simple tool for shuffling sequences (to use as negative
controls in, e.g., BLAST searches). You can also
download a local version
or view the source of the web version.
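For readers curious what such a control looks like, here is a minimal sketch of a composition-preserving shuffle. It is not the source of the web tool: the function name and the `seed` parameter are hypothetical, and this simple approach preserves residue counts but not dinucleotide or higher-order frequencies.

```python
import random


def shuffle_sequence(seq, seed=None):
    """Return seq with its residues in random order (same composition)."""
    rng = random.Random(seed)  # a fixed seed makes the control reproducible
    residues = list(seq)
    rng.shuffle(residues)      # in-place Fisher-Yates shuffle
    return "".join(residues)
```

A shuffled query should, in principle, give only chance-level hits, which gives a rough baseline for judging the scores of the real query.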
Cygwin
UNIX-like environment for Windows
Day 4: Multiple Alignment and Phylogeny
Slides
Mark's slides for day 4
CLUSTAL -> JALVIEW converter
Web tool to reformat bootstrapped CLUSTAL trees for JALVIEW
Day 5: Hidden Markov Models
Slides
Mark's slides for day 5
HMMer
Sean Eddy's profile-HMM implementation
HMMer web interface
New web interface to PHMMer, HMMscan, and HMMsearch
InterProScan
Search interface for InterPro, a meta-database of 15 motif databases
(mostly HMM based)
PLoS Comp. Biol. 4:e1000069
Reference for HMMer 3 heuristic
Biological Sequence Analysis
In addition to excellent coverage of hidden Markov models
(and the related dynamic programming algorithms for generating
and searching with them) this book gives good general coverage
of sequence alignment and statistics.
AGA1.fasta
Protein sequence of low-complexity mannoprotein from
S. cerevisiae.
Days 6 and 7: Expression Profiling
Slides
Mark's slides for day 6
PNAS 95:14863 (Eisen et al.)
Paper introducing cluster analysis for microarrays.
supp2data.txt
Supplementary data for PNAS 95:14863 figure 2 (yeast expression
profiles) converted to tab-delimited text (TDT) format.
better_shuffle.txt
supp2data.txt with rows shuffled to remove Eisen's gene clustering
and headers modified for CDT format. Corrected annotations
fig1data.txt
Data for PNAS 95:14863 figure 1 (human fibroblast serum response)
taken from the primary reference, Science 283:83
fig1data_shuffled.txt
fig1data.txt with rows shuffled to remove clustering
fig1data_shuffled_orderless.txt
fig1data.txt with rows shuffled and GORDER column stripped.
(shuffling independent of fig1data_shuffled.txt)
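The shuffled files above could be produced along the following lines. This is a sketch under stated assumptions, not the script actually used for these files: it assumes a tab-delimited table whose first `header_rows` lines are headers, and the function and parameter names are hypothetical.

```python
import random


def shuffle_rows(text, header_rows=1, drop_column=None, seed=None):
    """Shuffle the data rows of a tab-delimited table, keeping headers first.

    If drop_column names a column in the first header row (e.g. "GORDER"),
    that column is removed from every row.
    """
    table = [line.split("\t") for line in text.splitlines() if line]
    if drop_column is not None and drop_column in table[0]:
        k = table[0].index(drop_column)
        table = [row[:k] + row[k + 1:] for row in table]
    header, body = table[:header_rows], table[header_rows:]
    random.Random(seed).shuffle(body)  # only data rows are reordered
    return "\n".join("\t".join(row) for row in header + body) + "\n"
```

Using a different `seed` for each output file gives independent shufflings.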
Cluster3
Michael de Hoon's port of Michael Eisen's CLUSTER program.
JavaTreeView
Alok Saldanha's port of Michael Eisen's TREEVIEW program.
Numerical Recipes
Excellent reference for numerical methods. The older editions are available on-line at this link. The new (3rd) edition adds a chapter covering clustering and HMMs. See chapter 7.1 (and Comm. ACM 31:1192) for background on random number generators and chapters 14 and 15 for statistics and model fitting.
Day 8: Systematic Annotation and Graph Visualization
Slides
Mark's slides for day 8
Bioinformatics 20:3710
Primary reference for the SGD GO Term Finder
The Gene Ontology
GO Consortium homepage, featuring the AmiGO browser
GO Annotations
GO Consortium links to GO Annotated genomes
Evidence Codes
GO Consortium evidence code definitions
SGD GO Term Finder
Search for significant enrichment of GO terms in a set of
S. cerevisiae genes (slow)
SGD GO Slim Mapper
Map a set of S. cerevisiae genes to a simplified GO subset,
without statistics (fast)
SGD GO Tutorial
A guided tour of the GO resources at SGD
Cytoscape
Visualization/analysis tool for very large graphs.
Supports pathways, interaction networks, gene ontologies,
and expression data.
BiNGO
Cytoscape plugin for calculating enrichment of GO terms in a gene
set (note that there are many tools for solving this problem,
cf. the GO Consortium's list).
GraphViz
Command-line graph layout program from Bell Labs.
Good for moderate-sized graphs.
Inkscape is useful
for exploring and annotating SVG output from GraphViz.
NetworkX
Python
graph library from LLNL. Can connect to GraphViz and/or
Matplotlib
for visualization.
yeastHighQuality.sif
Yeast interactome graph from Cytoscape example directory
CellCycle_B.uids
Gene list from JavaTreeView for cluster B (cell cycle)
from PNAS 95:14863
CellCycle_B.sif
Subgraph of yeastHighQuality.sif corresponding to the genes
in CellCycle_B.uids
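Extracting a subgraph like CellCycle_B.sif can be sketched in a few lines. This is an illustration, not the procedure actually used to build that file: it assumes whitespace-delimited SIF lines of the form "source interaction target [target ...]", node names without embedded spaces, and that an edge is kept only when both endpoints are in the gene list. The function name is hypothetical.

```python
def sif_subgraph(sif_lines, keep):
    """Return (source, interaction, target) edges with both ends in keep."""
    keep = set(keep)
    edges = []
    for line in sif_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank line or isolated node: no edge to keep
        source, interaction, targets = fields[0], fields[1], fields[2:]
        if source not in keep:
            continue
        for target in targets:  # one SIF line may list several targets
            if target in keep:
                edges.append((source, interaction, target))
    return edges
```

Each returned tuple can be written back out as one "source interaction target" line to give a SIF file that Cytoscape can load.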
Day 9: 3D Structures
Slides
Mark's slides for day 9 (post-facto)
properties.pml
Example PyMol script
Colors.pml
Gianne's version of the example PyMol script. Mac users should
try this one if properties.pml doesn't work.
RCSB/PDB
The protein databank
PyMol
Viewer for 3D protein structures (PDB files)
PyMol Introduction
Getting started tutorial
PyMol Commands
More in-depth command documentation
Selection Algebra
Wiki page on selection commands
RasMol
Older alternative to PyMol. Less pretty but nice for quick
structure viewing.
1CLL
Calmodulin
2CHT
Chorismate mutase (B. subtilis)
1A02
Fos/Jun/NFAT with DNA (Nature 392:42)
1ASM
AATase
1QDM
Phytepsin (EMBO J. 18:3947)
1GID
P4-P6 Ribozyme
6TNA
tRNA
Day 10: Genome Browsing
Slides
Mark's slides for day 10
MochiView
Genome browser aimed at ChIP-chip, ChIP-seq, and motif finding.
Download the full software (v1.45), the manual (pdf), the tutorial (pdf),
and the tutorial database (cvw).
CGD
Candida albicans genome database (for the example data
in the MochiView tutorial)
PLoS Biology 7:e1000133
Primary reference for the MochiView tutorial data set
sig_genes.mochiview
Sheet 1 of supplemental dataset S4 from PLoS Biology 7:e1000133
(giving control/zap1-/- log ratios for significantly differential genes)
reformatted for MochiView
IGV
Integrative Genomics Viewer from the Broad Institute. Similar
to MochiView, but supports viewing aligned sequencing reads.
UGENE
Molecular biology "workbench" (similar to Vector-NTI) with
support for viewing next-gen sequencing results (e.g., BAM
files).
Motif finding tools
MEME is one of the first
profile-based motif tools; it finds motifs in unaligned sequences
using an EM algorithm.
Monkey
and RVista are more recent
tools that use comparative genomics to reduce false positive signal.
Web of Science
UCSF links for the Web of Science and SCOPUS citation databases
PubMed RSS feeds
Tutorial on creating RSS feeds for PubMed searches
Ubuntu
Ubuntu Linux distribution. The installation CD can be booted
as a "Live CD", allowing you to try Linux with no change to your
computer.
Knoppix
Knoppix is a "Live CD" version of Debian GNU/Linux (the basis for
Ubuntu). Knoppix may be a bit less user friendly than Ubuntu,
but it may boot faster on some computers. More information about
Knoppix can be found on this
unofficial site.