Dictionaries

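Dictionaries are associative arrays that map unique, hashable keys to arbitrary values.
As discussed in class, "hashable" effectively means "not expected to change after it's been added to the dictionary". More precisely, a key's hash value must stay the same for the key's whole lifetime, which is why Python's built-in mutable containers are not hashable.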
- e.g., tuples are hashable, but lists aren't

Define a dictionary using curly braces (analogous to defining a list with square brackets)

d = {"tyr1":3,"abc1":5,"Hello":"world"}

As with lists, strings, and tuples, we can index a dictionary using square brackets but, instead of an integer index, we use one of the dictionary's keys.

d["tyr1"]

3

d["abc1"]

5

d["Hello"]

'world'

tuples are valid keys

d = {(1,2,3):5}

lists aren't

d = {[1,2,3]:5}

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-1d6c93b89271> in <module>()
----> 1 d = {[1,2,3]:5}

TypeError: unhashable type: 'list'
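One subtlety is worth a quick check: a tuple is only hashable if everything inside it is hashable. The sketch below illustrates this; the helper name is_hashable is made up for this example and is not part of the standard library.

```python
def is_hashable(obj):
    """Return True if obj could be used as a dictionary key."""
    try:
        hash(obj)          # raises TypeError for unhashable objects
    except TypeError:
        return False
    return True

print(is_hashable((1, 2, 3)))     # True
print(is_hashable([1, 2, 3]))     # False
print(is_hashable((1, [2, 3])))   # False: the tuple contains a list
```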

Dictionary keys are unique. If I define a dictionary with multiple instances of the key "Hello", only the last instance is retained.

d = {"tyr1":3,"abc1":5,"Hello":"world","Hello":2}

d

{'Hello': 2, 'abc1': 5, 'tyr1': 3}
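The same "last one wins" rule applies when a dictionary is built from a list of (key, value) pairs with the dict constructor. A small sketch, using the illustrative names pairs and d_from_pairs:

```python
# Four pairs go in, but "Hello" appears twice
pairs = [("tyr1", 3), ("abc1", 5), ("Hello", "world"), ("Hello", 2)]
d_from_pairs = dict(pairs)

print(len(pairs))              # 4
print(len(d_from_pairs))       # 3
print(d_from_pairs["Hello"])   # 2 (the later value replaced "world")
```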

You can get a list of a dictionary's keys with its keys method, and iteration over a dictionary is defined as iteration over its keys.

Dictionary iteration is deterministic, i.e., as long as the contents of the dictionary don't change, two iterations over the dictionary will visit its keys in the same order,
but not well defined: it is difficult to predict ahead of time in what order the keys will be visited.
If you do want well-defined iteration over a dictionary, use a sorted list of its keys, e.g.:

for key in sorted(d):
    i = d[key]
    # do something based on i

d.keys()
for i in d:
    print i

['tyr1', 'abc1', 'Hello']

for (key,val) in d.items():
    print key
    print val

tyr1
3
abc1
5
Hello
2
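As a concrete sketch of sorted iteration, the snippet below collects (key, value) pairs in sorted key order. It uses its own small dictionary, d_example, with the same contents as d above. Note that uppercase letters sort before lowercase ones. (In Python 3.7 and later, plain iteration follows insertion order; sorting is still the way to get an order that does not depend on how the dictionary was built.)

```python
d_example = {"tyr1": 3, "abc1": 5, "Hello": 2}

ordered = []
for key in sorted(d_example):           # sorted() returns the keys as a sorted list
    ordered.append((key, d_example[key]))

print(ordered)   # [('Hello', 2), ('abc1', 5), ('tyr1', 3)]
```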

As with lists, we can use the indexing syntax to assign a new value to a dictionary key.

d

{'Hello': 2, 'abc1': 5, 'tyr1': 3}

d["Hello"] = 5
d

{'Hello': 5, 'abc1': 5, 'tyr1': 3}

This method also works for adding a new key-value pair to the dictionary.

d["possum"] = 12

d

{'Hello': 5, 'abc1': 5, 'possum': 12, 'tyr1': 3}

There are two ways to check if a dictionary has a key without risk of throwing a KeyError or modifying the dictionary:

Use the dictionary's has_key method (Python 2 only; it was removed in Python 3)
Use the in operator (this also works for lists, tuples, sets, and strings)

d.has_key("possum")

True

d.has_key("opposum")

False

"possum" in d

True
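A related tool, not covered above, is the dictionary's get method, which returns a default instead of raising a KeyError and does not modify the dictionary. The sketch below shows get and then uses the in operator to count bases; the names d_example and counts are illustrative only.

```python
d_example = {"Hello": 5, "abc1": 5, "possum": 12, "tyr1": 3}

print(d_example.get("possum"))       # 12
print(d_example.get("opposum"))      # None (missing key, no KeyError)
print(d_example.get("opposum", 0))   # 0 (explicit default)
print("opposum" in d_example)        # False: get did not add the key

# Counting with "in": update existing keys, add new ones
counts = {}
for base in "GATTACA":
    if base in counts:
        counts[base] += 1
    else:
        counts[base] = 1

print(sorted(counts.items()))   # [('A', 3), ('C', 1), ('G', 1), ('T', 2)]
```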

Sequence transformation exercises

A dictionary for complementing DNA bases

Assuming we've normalized our DNA strings to uppercase (almost always a good idea)
If we represent RNA as cDNA, then this works for RNA as well (which is why the cDNA representation is almost always a good idea)
It's useful to let N map to itself, to handle, e.g., genomic sequence with undetermined bases

comp = {"A":"T","G":"C","T":"A","C":"G","N":"N"}
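The normalization assumed above can be sketched as follows: uppercase the input and rewrite U as T so that RNA is handled in its cDNA representation. The helper name normalize is hypothetical, chosen for this example.

```python
comp = {"A": "T", "G": "C", "T": "A", "C": "G", "N": "N"}

def normalize(seq):
    """Uppercase a sequence and rewrite RNA's U as T (cDNA representation)."""
    return seq.upper().replace("U", "T")

dna = normalize("augcnu")
print(dna)                                  # ATGCNT
print(all(base in comp for base in dna))    # True: every base is now a key of comp
print([comp[base] for base in dna])         # ['T', 'A', 'C', 'G', 'N', 'A']
```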

Using the random module from the Python standard library to generate some random sequences

from random import seed, choice

Seed the random number generator (we don't have to do this, but doing so means our "random" code will get a deterministic stream of numbers, making it reproducible/debuggable)

seed(42)

The choice function picks one element uniformly at random, so repeated calls implement uniform sampling with replacement

choice("ATGC")

'G'

choice("ATGC")

'A'

(Aside: sample implements uniform sampling without replacement)

from random import sample

sample("ATGC",3)

['T', 'A', 'C']
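To see what seeding buys us, the sketch below reseeds the generator and checks that the same draws come back, then checks that sample never repeats an element. The exact letters drawn may differ between Python versions, so the example compares runs rather than showing specific values.

```python
from random import seed, choice, sample

seed(42)
first = [choice("ATGC") for i in range(10)]
seed(42)                                  # same seed, same stream of numbers
second = [choice("ATGC") for i in range(10)]
print(first == second)                    # True

picked = sample("ATGC", 3)                # three bases, without replacement
print(len(set(picked)) == 3)              # True: no base is picked twice
```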

The other tool we'll need is the string join method. It takes an iterable of strings (the argument, on the right) and concatenates them, using the string it is called on (on the left) as the separator.

"this is a string".join(("A","B","C"))

'Athis is a stringBthis is a stringC'

" ".join(("A","B","C"))

'A B C'

"".join(("A","B","C"))

'ABC'
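Two details of join are easy to trip over: every element must already be a string, and a string argument is itself treated as an iterable of characters. A short sketch, using the illustrative name numbers:

```python
numbers = [1, 2, 3]

try:
    ",".join(numbers)                       # integers are not strings
except TypeError:
    print("join needs strings")

print(",".join(str(n) for n in numbers))    # 1,2,3
print("-".join("ABC"))                      # A-B-C
```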

Okay, we've got all of the tools we need -- choose 50 deoxynucleotides at random and concatenate them into a single sequence

s = "".join(choice("ATGC") for i in range(50))
s

'GCATAAGAAGGAGCACGTACTAACGCGGCTGCGCGGAATAAATGTTATCG'

If we want to build up strings more deliberately (e.g., in a for loop) the += operator is useful

(note that strings are immutable and have no append method; += builds a new string and rebinds the name)

x = "some string"
x += "A"
x

'some stringA'
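A quick way to convince yourself that += creates a new string rather than changing the old one is to keep a second name bound to the original. This sketch uses the names before and after for illustration:

```python
before = "some string"
after = before          # both names refer to the same string object
after += "A"            # builds a new string and rebinds "after" only

print(after)                        # some stringA
print(before)                       # some string (unchanged)
print(hasattr(before, "append"))    # False: strings have no append method
```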

Here's a useful dictionary for the translation exercise

from geneticCode import geneticCode

print geneticCode

{'CTT': 'L', 'ATG': 'M', 'ACA': 'T', 'ACG': 'T', 'ATC': 'I', 'AAC': 'N', 'ATA': 'I', 'AGG': 'R', 'CCT': 'P', 'ACT': 'T', 'AGC': 'S', 'AAG': 'K', 'AGA': 'R', 'CAT': 'H', 'AAT': 'N', 'ATT': 'I', 'CTG': 'L', 'CTA': 'L', 'CTC': 'L', 'CAC': 'H', 'AAA': 'K', 'CCG': 'P', 'AGT': 'S', 'CCA': 'P', 'CAA': 'Q', 'CCC': 'P', 'TAT': 'Y', 'GGT': 'G', 'TGT': 'C', 'CGA': 'R', 'CAG': 'Q', 'TCT': 'S', 'GAT': 'D', 'CGG': 'R', 'TTT': 'F', 'TGC': 'C', 'GGG': 'G', 'TAG': '*', 'GGA': 'G', 'TAA': '*', 'GGC': 'G', 'TAC': 'Y', 'TTC': 'F', 'TCG': 'S', 'TTA': 'L', 'TTG': 'L', 'TCC': 'S', 'ACC': 'T', 'TCA': 'S', 'GCA': 'A', 'GTA': 'V', 'GCC': 'A', 'GTC': 'V', 'GCG': 'A', 'GTG': 'V', 'GAG': 'E', 'GTT': 'V', 'GCT': 'A', 'TGA': '*', 'GAC': 'D', 'CGT': 'R', 'TGG': 'W', 'GAA': 'E', 'CGC': 'R'}
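As a rough sketch of how such a table might be used, the snippet below walks a sequence three bases at a time and looks each codon up. It is one possible approach, not the intended exercise solution. To stay self-contained it uses codon_subset, a four-entry excerpt copied from the table above, and translate is a hypothetical helper name.

```python
# Four entries copied from the geneticCode table printed above
codon_subset = {"ATG": "M", "GCA": "A", "AAG": "K", "TAA": "*"}

def translate(seq, table):
    """Translate seq codon by codon, ignoring any trailing partial codon."""
    protein = ""
    usable = len(seq) - len(seq) % 3        # length rounded down to a multiple of 3
    for start in range(0, usable, 3):
        protein += table[seq[start:start + 3]]
    return protein

print(translate("ATGGCAAAGTAA", codon_subset))   # MAK*
print(translate("ATGGCAAA", codon_subset))       # MA (the trailing "AA" is dropped)
```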

Exercise 1: simple complementation

Given s in 5' to 3' orientation, return the antisense sequence in 3' to 5' orientation

s

'GCATAAGAAGGAGCACGTACTAACGCGGCTGCGCGGAATAAATGTTATCG'

Method 1: using join on a generator expression

(note that iteration over a string yields one character at a time)

"".join(comp[i] for i in s)

'CGTATTCTTCCTCGTGCATGATTGCGCCGACGCGCCTTATTTACAATAGC'

Method 2: as in method 1, but write an explicit loop for growing the string

antisense = ""
for i in s:
    antisense += comp[i]
antisense

'CGTATTCTTCCTCGTGCATGATTGCGCCGACGCGCCTTATTTACAATAGC'

Method 3: use a loop to grow a list of strings, then concatenate them with join (this is more directly analogous to method 1 than method 2 is)

antisense = []
for i in s:
    antisense.append(comp[i])
"".join(antisense)

'CGTATTCTTCCTCGTGCATGATTGCGCCGACGCGCCTTATTTACAATAGC'
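When growing a list of strings, it helps to know that += and append treat a string differently: += extends the list with each character of the string, while append adds the string as a single element. For one-character strings the two give the same result. A small sketch, with illustrative names extended and appended:

```python
extended = []
extended += "AT"            # the string is iterated, one element per character
print(extended)             # ['A', 'T']

appended = []
appended.append("AT")       # the string is added as one element
print(appended)             # ['AT']

# Either way, join produces the same final string
print("".join(extended) == "".join(appended))   # True
```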

Exercise 2: reverse complement

Given s in 5' to 3' orientation, return the reverse complement sequence in 5' to 3' orientation

Method 1: generate the antisense sequence as before then reverse it

Here, we accomplish the reversal by passing three arguments to range (start, stop, step)

antisense = []
for i in s:
    antisense.append(comp[i])

revcomp = ""
for i in range(len(s)-1,-1,-1):
    revcomp += antisense[i]

revcomp

'CGATAACATTTATTCCGCGCAGCCGCGTTAGTACGTGCTCCTTCTTATGC'

Method 2: Same thing, combining the reversed direction of the second loop above with the complementation step from the first loop

revcomp = ""
for i in range(len(s)-1,-1,-1):
    revcomp += comp[s[i]]

revcomp

'CGATAACATTTATTCCGCGCAGCCGCGTTAGTACGTGCTCCTTCTTATGC'

Method 3: Same thing, using the three-argument version of Python's slicing syntax, which has the same semantics as the range call above

revcomp = ""
for i in s[::-1]:
    revcomp += comp[i]

revcomp

'CGATAACATTTATTCCGCGCAGCCGCGTTAGTACGTGCTCCTTCTTATGC'
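To make the correspondence between the slice and the range call concrete, this sketch reverses a short string both ways. The variable short is just an example value.

```python
short = "GCATAAG"

# Index-based reversal: start at the last index, stop before -1, step by -1
by_range = "".join(short[i] for i in range(len(short) - 1, -1, -1))

# Slice-based reversal: omitted start and stop default to the ends
by_slice = short[::-1]

print(by_range)               # GAATACG
print(by_slice == by_range)   # True
print(short[::2])             # GAAG (a step of 2 takes every other character)
```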

Method 4: reversed is a built-in function that returns an iterator over the elements of any sequence in reverse order

revcomp = ""
for i in reversed(s):
    revcomp += comp[i]

revcomp

'CGATAACATTTATTCCGCGCAGCCGCGTTAGTACGTGCTCCTTCTTATGC'
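The pieces above can be combined into a single reusable function. This is one possible packaging, with reverse_complement as a name chosen for this sketch. A handy sanity check is that taking the reverse complement twice gives back the original sequence.

```python
comp = {"A": "T", "G": "C", "T": "A", "C": "G", "N": "N"}

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(comp[base] for base in reversed(seq))

print(reverse_complement("AACG"))    # CGTT
print(reverse_complement(reverse_complement("AACGN")) == "AACGN")   # True
```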