Dictionaries are associative arrays that map unique, hashable keys to arbitrary values
As discussed in class, "hashable" effectively means "not expected to change after it's been added to the dictionary"
Defining a dictionary using curly braces (analogous to defining a list with square brackets)
d = {"tyr1":3,"abc1":5,"Hello":"world"}
As for lists, strings, and tuples, we can index a dictionary using square brackets, but instead of an integer index we use one of the dictionary's keys.
d["tyr1"]
d["abc1"]
d["Hello"]
Tuples are valid keys (provided every element of the tuple is itself hashable)
d = {(1,2,3):5}
Lists aren't: they are mutable and therefore unhashable, so the next line raises a TypeError
d = {[1,2,3]:5}
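A minimal sketch of the hashability rule, written to run under both Python 2 and Python 3. The helper name is_hashable and the variable names are hypothetical, introduced only for illustration; the helper simply calls the built-in hash and reports whether it raises a TypeError.

```python
def is_hashable(obj):
    """Return True if obj can be used as a dictionary key (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

tuple_ok = is_hashable((1, 2, 3))      # tuples of hashable items are hashable
list_ok = is_hashable([1, 2, 3])       # lists are mutable, so they are not
nested_ok = is_hashable((1, [2, 3]))   # a tuple that contains a list is not hashable either

# Converting the list to a tuple gives a usable key
d_from_list = {tuple([1, 2, 3]): 5}
```

If a list-like key is really needed, converting it to a tuple first, as in the last line, is one common workaround.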
Dictionary keys are unique. If I define a dictionary with multiple instances of the key "Hello", only the last instance is retained.
d = {"tyr1":3,"abc1":5,"Hello":"world","Hello":2}
d
You can get a dictionary's keys with its keys method (a list in Python 2, a view object in Python 3), and iterating over a dictionary iterates over its keys.
for key in sorted(d):
    i = d[key]
    # do something based on i
d.keys()
for i in d:
    print(i)
for (key,val) in d.items():
    print(key)
    print(val)
As for lists, we can use the indexing syntax to assign a new value to a dictionary key
d
d["Hello"] = 5
d
The same syntax also adds a new key-value pair to the dictionary
d["possum"] = 12
d
There are two ways to check whether a dictionary has a key without risking a KeyError or modifying the dictionary (has_key exists only in Python 2; the in operator works in both Python 2 and 3):
d.has_key("possum")
d.has_key("opposum")
"possum" in d
A dictionary for complementing DNA bases
comp = {"A":"T","G":"C","T":"A","C":"G","N":"N"}
Using the random module from the Python standard library to generate some random sequences
from random import seed, choice
Seed the random number generator (we don't have to do this, but doing so means our "random" code will get a deterministic stream of numbers, making it reproducible/debuggable)
seed(42)
Each call to choice returns one element chosen uniformly at random, so repeated calls amount to uniform sampling with replacement
choice("ATGC")
choice("ATGC")
(Aside: sample implements uniform sampling without replacement)
from random import sample
sample("ATGC",3)
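To make the with/without replacement distinction concrete, here is a small sketch. The seed value and the variable names are arbitrary choices for illustration. Ten draws from a four-letter alphabet must repeat at least one letter, whereas sample never returns the same position twice.

```python
from random import seed, choice, sample

seed(42)  # arbitrary seed, only so that reruns give the same draws

# With replacement: 10 independent draws from 4 letters must contain repeats
with_repl = [choice("ATGC") for _ in range(10)]

# Without replacement: each letter can be drawn at most once
without_repl = sample(list("ATGC"), 3)

n_distinct_with = len(set(with_repl))        # at most 4
n_distinct_without = len(set(without_repl))  # exactly 3
```

Asking sample for more items than the population holds (e.g., 5 from "ATGC") raises a ValueError, which is another way to see that it samples without replacement.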
The other tool we'll need is the string join method. It takes an iterable of strings (the argument) and concatenates them, using the string it is called on as the separator
"this is a string".join(("A","B","C"))
" ".join(("A","B","C"))
"".join(("A","B","C"))
Okay, we've got all of the tools we need -- choose 50 deoxynucleotides at random and concatenate them into a single sequence
s = "".join(choice("ATGC") for i in range(50))
s
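The line that builds s hands join a generator expression rather than a tuple or list. As a short sketch (all variable names here are illustrative), join accepts any iterable of strings, but it does not convert other types for us.

```python
bases = ["A", "B", "C"]

joined_list = "-".join(bases)                     # a list of strings
joined_gen = "-".join(b.lower() for b in bases)   # a generator expression
joined_str = "-".join("ABC")                      # a string iterates one character at a time

# Non-string items must be converted explicitly; join raises TypeError otherwise
joined_nums = ",".join(str(n) for n in [1, 2, 3])
```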
If we want to build up strings more deliberately (e.g., in a for loop), the += operator is useful
(note that strings are immutable and have no append method; += creates a new string and rebinds the name to it)
x = "some string"
x += "A"
x
Here's a useful dictionary for the translation exercise
from geneticCode import geneticCode
print(geneticCode)
Given s in 5' to 3' orientation, return the antisense sequence in 3' to 5' orientation
s
Method 1: using join on a generator comprehension
(note that iteration over a string yields one character at a time)
"".join(comp[i] for i in s)
Method 2: as in method 1, but write an explicit loop for growing the string
antisense = ""
for i in s:
    antisense += comp[i]
antisense
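As a sanity check, methods 1 and 2 can be wrapped in functions and compared on a short test sequence. The function names complement_join and complement_loop and the made-up sequence are hypothetical, chosen only for this sketch; comp is repeated so the block runs on its own.

```python
comp = {"A": "T", "G": "C", "T": "A", "C": "G", "N": "N"}

def complement_join(seq):
    """Method 1: join over a generator expression."""
    return "".join(comp[base] for base in seq)

def complement_loop(seq):
    """Method 2: grow the result string inside an explicit loop."""
    out = ""
    for base in seq:
        out += comp[base]
    return out

example = "ATGCN"  # short made-up sequence for illustration
anti_join = complement_join(example)
anti_loop = complement_loop(example)
```

Both functions should return "TACGN" for this input, and complementing the result again should recover the original sequence.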
Method 3: use a loop to grow a list of strings, then concatenate them with join (this is closer to method 1 than method 2 is)
antisense = []
for i in s:
    antisense.append(comp[i])
"".join(antisense)
Given s in 5' to 3' orientation, return the reverse complement sequence in 5' to 3' orientation
Method 1: generate the antisense sequence as before then reverse it
Here, we accomplish the reversal by passing three arguments to range (start, stop, step)
antisense = []
for i in s:
    antisense.append(comp[i])
revcomp = ""
for i in range(len(s)-1,-1,-1):
    revcomp += antisense[i]
revcomp
Method 2: Same thing, combining the reversed direction of the second loop above with the complementation step from the first loop
revcomp = ""
for i in range(len(s)-1,-1,-1):
    revcomp += comp[s[i]]
revcomp
Method 3: Same thing, using the three-argument form of Python's slicing syntax (start:stop:step); s[::-1] visits the characters in the same order as the range call above
revcomp = ""
for i in s[::-1]:
    revcomp += comp[i]
revcomp
Method 4: reversed is a built-in function that returns a reverse iterator over any sequence (such as a string, list, or tuple)
revcomp = ""
for i in reversed(s):
    revcomp += comp[i]
revcomp
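One way to convince ourselves that the approaches agree is to run them side by side on a short test sequence. This is only a sketch: the function name reverse_complement and the made-up sequence are hypothetical, and comp is repeated so the block runs on its own.

```python
comp = {"A": "T", "G": "C", "T": "A", "C": "G", "N": "N"}

def reverse_complement(seq):
    """Return the reverse complement of seq, read 5' to 3'."""
    return "".join(comp[base] for base in reversed(seq))

example = "ATGCCN"  # short made-up sequence for illustration

# The same result via index arithmetic with range and via slicing
by_range = "".join(comp[example[i]] for i in range(len(example) - 1, -1, -1))
by_slice = "".join(comp[base] for base in example[::-1])
by_reversed = reverse_complement(example)

# Taking the reverse complement twice recovers the original sequence
round_trip = reverse_complement(by_reversed)
```

All three variables should hold "NGGCAT", and round_trip should equal the original example.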