Dictionaries

  • Dictionaries are associative arrays that map unique, hashable keys to arbitrary values

  • As discussed in class, "hashable" effectively means "not expected to change after it's been added to the dictionary"

    • e.g., tuples are hashable (provided every element they contain is hashable), but lists aren't

Defining a dictionary using curly braces (analogous to defining a list with square brackets)

In [1]:
d = {"tyr1":3,"abc1":5,"Hello":"world"}

As with lists, strings, and tuples, we can index a dictionary using square brackets, but instead of an integer index we use one of the dictionary's keys.

In [2]:
d["tyr1"]
Out[2]:
3
In [3]:
d["abc1"]
Out[3]:
5
In [4]:
d["Hello"]
Out[4]:
'world'

tuples are valid keys

In [5]:
d = {(1,2,3):5}

lists aren't

In [6]:
d = {[1,2,3]:5}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-1d6c93b89271> in <module>()
----> 1 d = {[1,2,3]:5}

TypeError: unhashable type: 'list'
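
A minimal sketch of the same distinction using the built-in hash function, written so that it runs on Python 2 or 3. The variable names are only illustrative. It also shows one common workaround, converting a list to a tuple before using it as a key.

```python
# Tuples of hashable elements can be hashed, so they work as dictionary keys.
coords = (1, 2, 3)
d = {coords: 5}
tuple_lookup = d[(1, 2, 3)]   # an equal tuple finds the same entry

# Lists cannot be hashed; hash() raises the same TypeError seen above.
try:
    hash([1, 2, 3])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Converting the list to a tuple gives a usable key.
key_from_list = tuple([1, 2, 3])
d[key_from_list] = 7          # same key as coords, so this overwrites the 5
```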

Dictionary keys are unique. If I define a dictionary with multiple instances of the key "Hello", only the last instance is retained.

In [8]:
d = {"tyr1":3,"abc1":5,"Hello":"world","Hello":2}
In [9]:
d
Out[9]:
{'Hello': 2, 'abc1': 5, 'tyr1': 3}

You can get a list of a dictionary's keys with its keys method, and iteration over a dictionary is defined as iteration over its keys.

  • Dictionary iteration is deterministic; i.e., as long as the contents of the dictionary don't change, two iterations over the dictionary will visit its keys in the same order
  • but the order is not well defined -- it is difficult to predict ahead of time in what order the keys will be visited (this is true of the Python 2 dictionaries used here; since Python 3.7, dictionaries iterate in insertion order)
  • if you do want a well-defined iteration order, iterate over a sorted list of the keys, e.g.:
        for key in sorted(d):
            i = d[key]
            # do something based on i
In [10]:
d.keys()
for i in d:
    print i
Out[10]:
['tyr1', 'abc1', 'Hello']
In [13]:
for (key,val) in d.items():
    print key
    print val
tyr1
3
abc1
5
Hello
2
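
The sorted-key iteration suggested above can be sketched as follows (the list named visited is only there to record the order). Note that Python's default string ordering places uppercase letters before lowercase ones, so "Hello" comes first.

```python
d = {"tyr1": 3, "abc1": 5, "Hello": 2}

visited = []
for key in sorted(d):          # sorted(d) returns a sorted list of the keys
    val = d[key]
    visited.append((key, val))

# visited is now [('Hello', 2), ('abc1', 5), ('tyr1', 3)]
```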

As with lists, we can use the indexing syntax to assign a new value to a dictionary key

In [14]:
d
Out[14]:
{'Hello': 2, 'abc1': 5, 'tyr1': 3}
In [15]:
d["Hello"] = 5
d
Out[15]:
{'Hello': 5, 'abc1': 5, 'tyr1': 3}

The same syntax also adds a new key/value pair to the dictionary

In [16]:
d["possum"] = 12
In [17]:
d
Out[17]:
{'Hello': 5, 'abc1': 5, 'possum': 12, 'tyr1': 3}

There are two ways to check if a dictionary has a key without risk of throwing a KeyError or modifying the dictionary:

  • Use the dictionary's has_key method (Python 2 only; it was removed in Python 3)
  • Use the in operator (this also works for lists, tuples, sets, and strings)
In [18]:
d.has_key("possum")
Out[18]:
True
In [19]:
d.has_key("opposum")
Out[19]:
False
In [20]:
"possum" in d
Out[20]:
True
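
A short sketch of the in operator across the container types mentioned above, assuming nothing beyond built-in types. For a dictionary the test is against the keys, and for a string it is a substring test.

```python
d = {"possum": 12, "tyr1": 3}

in_dict = "possum" in d                    # tests the keys...
value_in_dict = 12 in d                    # ...not the values, so this is False
in_list = "possum" in ["possum", "tyr1"]
in_tuple = 3 in (1, 2, 3)
in_set = "N" in set("ATGC")                # False: N is not one of the four bases
in_string = "ATG" in "CCATGG"              # substring test for strings

# Guarding a lookup this way avoids a KeyError for a missing key.
if "opossum" in d:
    count = d["opossum"]
else:
    count = 0
```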

Sequence transformation exercises

A dictionary for complementing DNA bases

  • Assuming we've normalized our DNA strings to uppercase (almost always a good idea)
  • If we represent RNA as cDNA, then this works for RNA as well (which is why the cDNA representation is almost always a good idea)
  • It's useful to let N map to itself, to handle, e.g., genomic sequence with undetermined bases
In [22]:
comp = {"A":"T","G":"C","T":"A","C":"G","N":"N"}
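
As a sketch of the normalization point above, a hypothetical helper (the name complement is ours, not part of the exercise) can uppercase its input before looking up each base. Any character outside the table, such as U, would still raise a KeyError.

```python
comp = {"A": "T", "G": "C", "T": "A", "C": "G", "N": "N"}

def complement(seq):
    """Complement each base of seq after normalizing it to uppercase."""
    return "".join(comp[base] for base in seq.upper())

mixed_case = complement("atgN")   # lowercase input is handled; N maps to itself
```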

Using the random module from the Python standard library to generate some random sequences

In [23]:
from random import seed, choice

Seed the random number generator (we don't have to do this, but doing so means our "random" code will get a deterministic stream of numbers, making it reproducible/debuggable)

In [24]:
seed(42)
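
A small sketch of what reproducibility means here: reseeding with the same value replays the same stream of choices. The helper random_bases is a hypothetical name used only for this illustration.

```python
from random import seed, choice

def random_bases(n):
    """Return a list of n bases drawn uniformly from ATGC."""
    return [choice("ATGC") for _ in range(n)]

seed(42)
first = random_bases(10)

seed(42)                 # reset the generator to the same starting state
second = random_bases(10)

same_stream = (first == second)   # True: same seed, same draws
```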

The choice function picks one element uniformly at random, so repeated calls amount to uniform sampling with replacement

In [27]:
choice("ATGC")
Out[27]:
'G'
In [28]:
choice("ATGC")
Out[28]:
'A'

(Aside: sample implements uniform sampling without replacement)

In [29]:
from random import sample
In [30]:
sample("ATGC",3)
Out[30]:
['T', 'A', 'C']
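
To make the contrast concrete, here is a hedged sketch (variable names are illustrative). Because sample draws without replacement, asking for all four bases returns each exactly once, and asking for more items than the population holds raises a ValueError.

```python
from random import seed, sample, choice

seed(0)
bases = list("ATGC")

without_replacement = sample(bases, 4)                 # some ordering of all four bases
with_replacement = [choice(bases) for _ in range(4)]   # repeats are possible here

# Sampling more items than exist is an error when drawing without replacement.
try:
    sample(bases, 5)
    oversample_ok = True
except ValueError:
    oversample_ok = False
```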

The other tool we'll need is string's join method. It takes any iterable of strings (the argument, on the right) and concatenates them, using the string it is called on (on the left) as the separator

In [31]:
"this is a string".join(("A","B","C"))
Out[31]:
'Athis is a stringBthis is a stringC'
In [32]:
" ".join(("A","B","C"))
Out[32]:
'A B C'
In [33]:
"".join(("A","B","C"))
Out[33]:
'ABC'
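
One detail worth a quick sketch: join expects every element to already be a string, so other values need converting first. The names below are illustrative only.

```python
counts = [3, 5, 12]

# Joining integers directly fails with a TypeError.
try:
    ",".join(counts)
    joined_directly = True
except TypeError:
    joined_directly = False

# Converting each element with str() first works.
as_text = ",".join(str(c) for c in counts)

# Lists work just as well as the tuples used above.
from_list = "-".join(["A", "B", "C"])
```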

Okay, we've got all of the tools we need -- choose 50 deoxynucleotides at random and concatenate them into a single sequence

In [34]:
s = "".join(choice("ATGC") for i in range(50))
s
Out[34]:
'GCATAAGAAGGAGCACGTACTAACGCGGCTGCGCGGAATAAATGTTATCG'

If we want to build up strings more deliberately (e.g., in a for loop) the += operator is useful

(note that strings are immutable and have no append method; += builds a new string and rebinds the name to it)

In [35]:
x = "some string"
x += "A"
x
Out[35]:
'some stringA'
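
A brief sketch of what += does to a string: it creates a new string and rebinds the name, leaving any other reference to the original untouched.

```python
x = "some string"
y = x                 # y refers to the same string object as x

x += "A"              # builds a new string; the original object is not modified

has_append = hasattr(x, "append")   # strings have no append method
```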

Here's a useful dictionary for the translation exercise

In [39]:
from geneticCode import geneticCode
In [51]:
print geneticCode
{'CTT': 'L', 'ATG': 'M', 'ACA': 'T', 'ACG': 'T', 'ATC': 'I', 'AAC': 'N', 'ATA': 'I', 'AGG': 'R', 'CCT': 'P', 'ACT': 'T', 'AGC': 'S', 'AAG': 'K', 'AGA': 'R', 'CAT': 'H', 'AAT': 'N', 'ATT': 'I', 'CTG': 'L', 'CTA': 'L', 'CTC': 'L', 'CAC': 'H', 'AAA': 'K', 'CCG': 'P', 'AGT': 'S', 'CCA': 'P', 'CAA': 'Q', 'CCC': 'P', 'TAT': 'Y', 'GGT': 'G', 'TGT': 'C', 'CGA': 'R', 'CAG': 'Q', 'TCT': 'S', 'GAT': 'D', 'CGG': 'R', 'TTT': 'F', 'TGC': 'C', 'GGG': 'G', 'TAG': '*', 'GGA': 'G', 'TAA': '*', 'GGC': 'G', 'TAC': 'Y', 'TTC': 'F', 'TCG': 'S', 'TTA': 'L', 'TTG': 'L', 'TCC': 'S', 'ACC': 'T', 'TCA': 'S', 'GCA': 'A', 'GTA': 'V', 'GCC': 'A', 'GTC': 'V', 'GCG': 'A', 'GTG': 'V', 'GAG': 'E', 'GTT': 'V', 'GCT': 'A', 'TGA': '*', 'GAC': 'D', 'CGT': 'R', 'TGG': 'W', 'GAA': 'E', 'CGC': 'R'}
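
As a rough sketch of how such a table might be used (one possible approach, not necessarily the intended solution to the exercise), the hypothetical function translate below walks a sequence three bases at a time. To stay self-contained it uses a small subset of entries copied from the table above rather than importing geneticCode.

```python
# A few entries copied from the geneticCode table printed above.
codon_subset = {"ATG": "M", "GCA": "A", "AAG": "K", "TGG": "W", "TAA": "*"}

def translate(seq, table):
    """Translate seq codon by codon, ignoring any trailing partial codon."""
    protein = []
    last_full = len(seq) - len(seq) % 3     # end of the last complete codon
    for i in range(0, last_full, 3):
        codon = seq[i:i + 3]
        protein.append(table[codon])        # KeyError if the codon is not in table
    return "".join(protein)

peptide = translate("ATGGCAAAGTGGTAAGC", codon_subset)   # trailing "GC" is dropped
```

Note that this sketch translates straight through stop codons (emitting "*") instead of halting at the first one.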

Exercise one: simple complementation

Given s in 5' to 3' orientation, return the antisense sequence in 3' to 5' orientation

In [42]:
s
Out[42]:
'GCATAAGAAGGAGCACGTACTAACGCGGCTGCGCGGAATAAATGTTATCG'

Method 1: using join on a generator expression

(note that iteration over a string yields one character at a time)

In [43]:
"".join(comp[i] for i in s)
Out[43]:
'CGTATTCTTCCTCGTGCATGATTGCGCCGACGCGCCTTATTTACAATAGC'

Method 2: as in method 1, but write an explicit loop for growing the string

In [44]:
antisense = ""
for i in s:
    antisense += comp[i]
antisense
Out[44]:
'CGTATTCTTCCTCGTGCATGATTGCGCCGACGCGCCTTATTTACAATAGC'

Method 3: use a loop to grow a list of strings, then concatenate them with join (this is more directly analogous to method 1 compared to method 2)

In [45]:
antisense = []
for i in s:
    antisense.append(comp[i])  # append one string per base to the list
"".join(antisense)
Out[45]:
'CGTATTCTTCCTCGTGCATGATTGCGCCGACGCGCCTTATTTACAATAGC'

Exercise 2: reverse complement

Given s in 5' to 3' orientation, return the reverse complement sequence in 5' to 3' orientation

Method 1: generate the antisense sequence as before then reverse it

Here, we accomplish the reversal by passing three arguments to range (start, stop, step)

In [46]:
antisense = []
for i in s:
    antisense.append(comp[i])  # append one string per base to the list

revcomp = ""
for i in range(len(s)-1,-1,-1):
    revcomp += antisense[i]

revcomp
Out[46]:
'CGATAACATTTATTCCGCGCAGCCGCGTTAGTACGTGCTCCTTCTTATGC'

Method 2: Same thing, combining the reversed direction of the second loop above with the complementation step from the first loop

In [47]:
revcomp = ""
for i in range(len(s)-1,-1,-1):
    revcomp += comp[s[i]]

revcomp
Out[47]:
'CGATAACATTTATTCCGCGCAGCCGCGTTAGTACGTGCTCCTTCTTATGC'

Method 3: Same thing, using the three-argument form of Python's slicing syntax (start:stop:step); with start and stop omitted and a step of -1, it walks the whole string backwards, just like the range call above

In [48]:
revcomp = ""
for i in s[::-1]:
    revcomp += comp[i]

revcomp
Out[48]:
'CGATAACATTTATTCCGCGCAGCCGCGTTAGTACGTGCTCCTTCTTATGC'
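
A small sketch of the correspondence between the range call and the slice, using a short illustrative string instead of s:

```python
seq = "GATTACA"

# range(start, stop, step): start at the last index and count down to 0.
indices = list(range(len(seq) - 1, -1, -1))
via_range = "".join(seq[i] for i in indices)

# seq[::-1]: omitted start and stop with step -1 visit the same positions.
via_slice = seq[::-1]
```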

Method 4: reversed is a built-in function that returns a reverse iterator over any sequence (e.g., a string, list, or tuple)

In [49]:
revcomp = ""
for i in reversed(s):
    revcomp += comp[i]

revcomp
Out[49]:
'CGATAACATTTATTCCGCGCAGCCGCGTTAGTACGTGCTCCTTCTTATGC'
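
The same idea can be packaged as a function. This is a sketch that combines reversed with the join-on-a-generator approach from exercise one; the name reverse_complement is our own. A handy sanity check is that reverse-complementing twice returns the original sequence.

```python
comp = {"A": "T", "G": "C", "T": "A", "C": "G", "N": "N"}

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(comp[base] for base in reversed(seq))

rc = reverse_complement("ATGCN")
round_trip = reverse_complement(rc)   # applying it twice undoes it
```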