Practical Bioinformatics -- Day 8

Data structures for dynamic programming

This is from our in-class discussion re: how to manage the arrows needed for the dynamic-programming traceback step.

Make a 5x6 array filled in with None "sentinel" values:

In [1]:

arrows = []
for i in xrange(5):
    arrows.append([])
    for j in xrange(6):
        arrows[-1].append(None)

Default formatting of the arrays:

In [2]:

print arrows

[[None, None, None, None, None, None], [None, None, None, None, None, None], [None, None, None, None, None, None], [None, None, None, None, None, None], [None, None, None, None, None, None]]

A "pretty" printing function -- useful for debugging:

In [3]:

def dump(M):
    for i in M:
        for j in i:
            print j,
        print

Here's the array as initialized:

In [4]:

dump(arrows)

None None None None None None
None None None None None None
None None None None None None
None None None None None None
None None None None None None

A diagonal ("align") arrow pointing from the 4th row, 3rd column to the 3rd row, 2nd column:

In [5]:

arrows[3][2] = (2,1)

In [6]:

dump(arrows)

None None None None None None
None None None None None None
None None None None None None
None None (2, 1) None None None
None None None None None None

A vertical ("gap") arrow:

In [7]:

arrows[2][5] = (0,5)

In [8]:

dump(arrows)

None None None None None None
None None None None None None
None None None None None (0, 5)
None None (2, 1) None None None
None None None None None None

Note that it's easy to use the arrows directly as indices:

In [9]:

(i,j) = arrows[2][5]
arrows[i][j]

A parallel array for the subalignment scores:

In [10]:

scores = [range(6) for i in xrange(5)]

In [11]:

dump(scores)

0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5

In [12]:

dump(arrows)

None None None None None None
None None None None None None
None None None None None (0, 5)
None None (2, 1) None None None
None None None None None None

Since the arrays are parallel, I can index them in the same way:

In [13]:

(i,j) = arrows[2][5]
arrows[i][j]
scores[i][j]

Out[13]:

Let's make another 5x6 arrows array, this time initializing with empty lists rather than None values -- this gives us room to store more than one arrow per location:

In [16]:

arrows2 = []
for i in xrange(5):
    arrows2.append([])
    for j in xrange(6):
        arrows2[-1].append([])

In [18]:

dump(arrows2)

[] [] [] [] [] []
[] [] [] [] [] []
[] [] [] [] [] []
[] [] [] [] [] []
[] [] [] [] [] []

In [20]:

arrows2[2][5].append((0,5))

In [21]:

dump(arrows2)

[] [] [] [] [] []
[] [] [] [] [] []
[] [] [] [] [] [(0, 5)]
[] [] [] [] [] []
[] [] [] [] [] []

Adding a second arrow to the same location:

In [22]:

arrows2[2][5].append((1,4))

In [23]:

dump(arrows2)

[] [] [] [] [] []
[] [] [] [] [] []
[] [] [] [] [] [(0, 5), (1, 4)]
[] [] [] [] [] []
[] [] [] [] [] []

In [24]:

%logstart -o BMS270b.2013.08.log

Activating auto-logging. Current session state plus future input saved.
Filename       : BMS270b.2013.08.log
Mode           : backup
Output logging : True
Raw input log  : False
Timestamping   : False
State          : active