Practical Bioinformatics -- Day 1

Saying hello to the interpreter shell

In [1]:
print "hello, world"
hello, world

Commands that start with "%" are IPython "magics". They are not part of the Python language and are handled by IPython rather than being sent directly to the interpreter.

You can get a list of available magics with:

%magic

In [2]:
%pwd
Out[2]:
u'/home/mvoorhie/BMS270'

Commands that start with "!" are sent directly to the operating system shell rather than to the python interpreter.

On OS X and Linux, the shell is typically bash, which interprets !ls as a request to list the files in the current directory.

On Windows, !dir should give the equivalent result in the DOS shell. (If you start IPython from Cygwin, you should be able to use bash commands.)

In [3]:
!ls
notebook1.ipynb  notebook1.log

Here is a platform-independent way to have Python create a list of the filenames in the current directory:

In [4]:
import glob
glob.glob("*")
Out[4]:
['notebook1.log', 'notebook1.ipynb']
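
glob.glob also accepts shell-style wildcard patterns, and the order of its results is arbitrary (note that Out[4] is not alphabetical). The following is a minimal sketch; it uses a throwaway temporary directory and made-up filenames so that it does not depend on the contents of your working directory.

```python
import glob
import os
import shutil
import tempfile

# Build a scratch directory with three empty files (hypothetical names).
scratch = tempfile.mkdtemp()
for name in ["notebook1.ipynb", "notebook1.log", "notes.txt"]:
    open(os.path.join(scratch, name), "w").close()

# "*" matches everything; "*.ipynb" matches only notebooks.
# sorted() gives a reproducible order, which glob itself does not promise.
all_files = sorted(os.path.basename(p)
                   for p in glob.glob(os.path.join(scratch, "*")))
notebooks = sorted(os.path.basename(p)
                   for p in glob.glob(os.path.join(scratch, "*.ipynb")))

print(all_files)
print(notebooks)

shutil.rmtree(scratch)  # clean up the scratch directory
```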

The %logstart magic retroactively starts logging this IPython session (i.e., it captures even commands typed before the log was started).

The -o option turns on logging for outputs in addition to inputs.

In [5]:
%logstart -o notebook1.log
Activating auto-logging. Current session state plus future input saved.
Filename       : notebook1.log
Mode           : backup
Output logging : True
Raw input log  : False
Timestamping   : False
State          : active

Data types

Use matched single or double quotes for simple strings, triple quotes for multi-line strings.

In [6]:
"this is a string"
Out[6]:
'this is a string'
In [7]:
'i can single quote'
Out[7]:
'i can single quote'
In [8]:
"""this won't do what I want

multiple

                 lines
"""
Out[8]:
"this won't do what I want\n\nmultiple\n\n                 lines\n"
In [9]:
print """this won't do what I want

multiple

                 lines
"""
this won't do what I want

multiple

                 lines

Use a decimal point to distinguish integer (int) and double-precision floating point (float) values.

In [10]:
5
Out[10]:
5
In [11]:
5.
Out[11]:
5.0
In [12]:
5+3
Out[12]:
8
In [13]:
5-3
Out[13]:
2
In [14]:
5*2
Out[14]:
10
In [15]:
10/2
Out[15]:
5

Dividing two integers gives an integer result, rounded down.

In [16]:
10/3
Out[16]:
3

Make sure to include at least one float if you want floating point division.

In [17]:
10./3.
Out[17]:
3.3333333333333335
In [18]:
10/3.
Out[18]:
3.3333333333333335
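
The division behavior shown above is that of Python 2, which this notebook uses; in Python 3, / between two integers gives a float. The sketch below sticks to spellings that behave the same in both versions: // for floor division and float() to force floating point division. Note that "rounded down" means toward negative infinity, not toward zero.

```python
# Floor division: the result is rounded toward negative infinity.
print(10 // 3)     # 3
print(-10 // 3)    # -4, not -3

# Converting either operand to float forces floating point division.
third = float(10) / 3
print(third)

# The remainder operator pairs with floor division:
# (x // y) * y + (x % y) == x
q, r = 10 // 3, 10 % 3
print(q * 3 + r)   # 10
```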

String substitution works via the mod (%) operator.

In [19]:
"This is a %s mad lib" % "neat"
Out[19]:
'This is a neat mad lib'

Use parentheses to group multiple arguments for string substitution.

Technically, the parentheses define a tuple, which is an immutable version of a list.

In [20]:
"Now %s and %s" % ("one","two")
Out[20]:
'Now one and two'
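
To make "immutable" concrete, here is a small sketch: item assignment works on a list but raises a TypeError on a tuple. The variable names are made up for illustration.

```python
words_list = ["one", "two"]
words_tuple = ("one", "two")

words_list[0] = "uno"          # fine: lists can be modified in place

try:
    words_tuple[0] = "uno"     # tuples cannot
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(words_list)
print(words_tuple)
print("Now %s and %s" % words_tuple)

# A one-element tuple needs a trailing comma; ("one") is just a string.
single = ("one",)
print(len(single))
```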

Most Python objects can be converted to strings, which is what the %s format string is asking for.

In [21]:
"Now %s and %s" % (1,2)
Out[21]:
'Now 1 and 2'
In [22]:
"%s" % (10./3)
Out[22]:
'3.33333333333'

The %f format string lets us specify special formatting for floating point values; e.g., the number of decimal places:

In [23]:
"%.5f" % (10./3)
Out[23]:
'3.33333'
In [24]:
"%2.5f" % (10./3)
Out[24]:
'3.33333'
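
In a specifier such as %2.5f, the number before the decimal point is a minimum field width and the number after it is the count of decimal places. A width of 2 has no visible effect in In [24] because the formatted result is already 7 characters long. A short sketch with a wider field shows the padding:

```python
x = 10. / 3

narrow = "%2.5f" % x    # width 2 is smaller than the text, so no padding
wide = "%10.5f" % x     # padded on the left with spaces to 10 characters
left = "%-10.5f|" % x   # a leading "-" pads on the right instead

print(repr(narrow))
print(repr(wide))
print(repr(left))
```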

The %e format string gives scientific notation:

In [25]:
"%e" % (10./3)
Out[25]:
'3.333333e+00'
In [26]:
2**3
Out[26]:
8

Assignment

The assignment operator (=) lets a name "point at" (reference) a piece of data.

We can refer to the names as variables (because their value may vary with reassignment) or as references (to emphasize their pointing nature).

In [27]:
a = 8
b = "hello"

We can then operate on the references as if they were the underlying data.

In [28]:
a
Out[28]:
8
In [29]:
b
Out[29]:
'hello'
In [30]:
a + 3
Out[30]:
11
In [31]:
a + b
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-31-f96fb8f649b6> in <module>()
----> 1 a + b

TypeError: unsupported operand type(s) for +: 'int' and 'str'

Addition on strings is concatenation:

In [32]:
b + "hi"
Out[32]:
'hellohi'
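
The TypeError in In [31] arises because Python will not guess whether + should mean addition or concatenation when the types are mixed. Here is a minimal sketch of the two explicit conversions, using the built-ins str and int (the variable names are made up, so a and b are left untouched):

```python
count = 8
greeting = "hello"

# Convert the int to a string to concatenate...
joined = greeting + str(count)
print(joined)

# ...or convert a numeric string to an int to add.
total = count + int("3")
print(total)

# String substitution with %s does the str() conversion for you.
substituted = "%s%s" % (greeting, count)
print(substituted)
```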
In [33]:
a
Out[33]:
8
In [34]:
a = a + 3
In [35]:
a
Out[35]:
11
In [36]:
a += 3
In [37]:
a
Out[37]:
14
In [38]:
(a+2)*3
Out[38]:
48

Exercise: using variables to evaluate a formula

In [43]:
ngc = 13
L = 25
nmm = 2

# Wrong: gives an unintended rounding error due to integer division
Tm1 = 81.5 + (41*ngc - 100*nmm - 675)/L
# Here are two ways to ensure floating point division:
Tm2 = 81.5 + (41*ngc - 100*nmm - 675.)/L
Tm3 = 81.5 + (41*ngc - 100*nmm - 675)/float(L)
In [44]:
Tm1, Tm2, Tm3
Out[44]:
(67.5, 67.82, 67.82)
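
One way to avoid repeating the float conversion is to wrap the formula in a function. This is a sketch only: the function name melting_temp and its argument names are hypothetical, not part of the exercise.

```python
def melting_temp(ngc, length, nmm):
    """Evaluate 81.5 + (41*ngc - 100*nmm - 675)/length with float division."""
    return 81.5 + (41 * ngc - 100 * nmm - 675) / float(length)

# Same inputs as the exercise above.
tm = melting_temp(ngc=13, length=25, nmm=2)
print("%.2f" % tm)
```

Calling float() on the denominator inside the function means callers can pass plain integers without triggering integer division.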

IPython magic for listing just the names that we've explicitly assigned in this session:

In [45]:
%who
L	Tm1	Tm2	Tm3	a	b	glob	ngc	nmm	

How to remove a reference:

In [47]:
del Tm1
In [48]:
%who
L	Tm2	Tm3	a	b	glob	ngc	nmm	
In [49]:
a
Out[49]:
14
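
del removes a name, not the data itself: other names that reference the same object remain usable, while using the deleted name raises a NameError. A minimal sketch with made-up names:

```python
first = [1, 2, 3]
second = first        # a second reference to the same list

del first             # removes only the name "first"

print(second)         # the list is still reachable through "second"

try:
    first
    still_defined = True
except NameError:
    still_defined = False

print(still_defined)
```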

Comparison operators

= $\rightarrow$ assignment

== $\rightarrow$ test for equality

In [50]:
a == 5
Out[50]:
False
In [51]:
a = 5
In [52]:
a == 5
Out[52]:
True
In [53]:
b = True
In [54]:
a > 6
Out[54]:
False
In [55]:
a < 5
Out[55]:
False
In [56]:
a != 3
Out[56]:
True
In [57]:
a >= 3
Out[57]:
True
In [58]:
a <= 3
Out[58]:
False

Boolean values can be combined with and, or, and not.

Remember that this type of logical expression will "short circuit" as soon as only one outcome is possible.

In [59]:
(5 > 3) and (3 > 2)
Out[59]:
True
In [60]:
(5 < 3) or (3 > 2)
Out[60]:
True
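
Short-circuiting is easiest to see when the operands have side effects. In this sketch, a hypothetical helper check() records each call, so we can tell which operands were actually evaluated.

```python
calls = []

def check(label, value):
    """Record that this operand was evaluated, then return its value."""
    calls.append(label)
    return value

# "and" stops at the first False operand.
result_and = check("a", False) and check("b", True)

# "or" stops at the first True operand.
result_or = check("c", True) or check("d", False)

print("%s %s" % (result_and, result_or))
print(calls)   # "b" and "d" never ran
```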

We can act on boolean values using if/elif/else statements.

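Things to remember for this type of block statement: each if, elif, or else line ends with a colon, and the statements it controls are indented by the same amount (four spaces in the examples here). The parentheses around the condition are optional.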

In [61]:
if(5 > 6):
    print "it is"
else:
    print "it isn't"
it isn't
In [62]:
if(5 > 6):
    print "it is"
else:
    print "it isn't"
    print "it really isn't"
it isn't
it really isn't
In [63]:
if(5 > 6):
    print "hi"
elif(5> 3):
    print "hello"
else:
    print "the end"
hello
In [64]:
if(5 > 3):
    if(4 > 7):
        print "first"
    else:
        print "second"
else:
    print "oops"
second
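
The same structure extends to any number of elif branches; only the first branch whose condition is True runs. The sketch below uses a hypothetical function that labels a GC fraction. The thresholds are arbitrary and chosen only for illustration.

```python
def describe_gc(fraction):
    """Return a rough label for a GC fraction between 0 and 1."""
    if fraction < 0.4:
        label = "AT-rich"
    elif fraction < 0.6:
        label = "balanced"
    else:
        label = "GC-rich"
    return label

for value in [0.25, 0.5, 0.75]:
    print("%.2f is %s" % (value, describe_gc(value)))
```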

Lists

Lists are ordered, mutable sequences whose items can be of mixed types.

In [65]:
mylist = [1,2,"apple","orange",2**5]
In [66]:
mylist
Out[66]:
[1, 2, 'apple', 'orange', 32]
In [67]:
mylist += ["first","second"]
In [68]:
mylist
Out[68]:
[1, 2, 'apple', 'orange', 32, 'first', 'second']
In [69]:
mylist.append(5)
In [70]:
mylist
Out[70]:
[1, 2, 'apple', 'orange', 32, 'first', 'second', 5]
In [71]:
list2 = []
In [72]:
list2.append(1)
In [73]:
list2
Out[73]:
[1]
In [74]:
mylist[3]
Out[74]:
'orange'
In [75]:
mylist[3:6]
Out[75]:
['orange', 32, 'first']

We can find the length of a sequence or collection with len.

When defining your own classes, you can support len by adding a __len__ method.

In [76]:
len(mylist)
Out[76]:
8
In [77]:
len("string")
Out[77]:
6
In [78]:
len(6)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-78-d42e7c5a4468> in <module>()
----> 1 len(6)

TypeError: object of type 'int' has no len()
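
As a sketch of the __len__ method mentioned above, here is a made-up class (SequenceRecord is hypothetical) whose length is defined as the length of the string it wraps:

```python
class SequenceRecord(object):
    """A toy container for a named sequence (illustrative only)."""

    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence

    def __len__(self):
        # len(record) calls this method behind the scenes.
        return len(self.sequence)

record = SequenceRecord("example", "GATTACA")
print(len(record))
```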
In [79]:
mylist[len(mylist)-1]
Out[79]:
5
In [80]:
mylist[-1]
Out[80]:
5
In [81]:
mylist[:3]
Out[81]:
[1, 2, 'apple']
In [82]:
mylist[3:]
Out[82]:
['orange', 32, 'first', 'second', 5]
In [83]:
mylist[:]
Out[83]:
[1, 2, 'apple', 'orange', 32, 'first', 'second', 5]
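
Slices are half-open: x[i:j] includes index i but stops before index j, so x[:k] + x[k:] rebuilds the original list for any k. Negative indices count from the end. A short sketch with a made-up list:

```python
letters = ["a", "b", "c", "d", "e"]

print(letters[1:4])     # indices 1, 2, 3
print(letters[-2:])     # the last two items
print(letters[:-1])     # everything except the last item
print(letters[::2])     # every second item (a third number is the step)

# Splitting at any index and re-joining gives back the original.
rejoined = letters[:2] + letters[2:]
print(rejoined == letters)
```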

The "pointing at" nature of python references can give unexpected behavior for lists.

Use x[:] to make a "shallow" copy of a list for independent modification.

In [84]:
otherlist = mylist
In [85]:
print otherlist
print mylist
[1, 2, 'apple', 'orange', 32, 'first', 'second', 5]
[1, 2, 'apple', 'orange', 32, 'first', 'second', 5]
In [86]:
otherlist[3] = "thing"
In [87]:
otherlist
Out[87]:
[1, 2, 'apple', 'thing', 32, 'first', 'second', 5]
In [88]:
mylist
Out[88]:
[1, 2, 'apple', 'thing', 32, 'first', 'second', 5]
In [89]:
otherlist = mylist[:]
In [90]:
otherlist[3] = "new"
In [91]:
print mylist
print otherlist
[1, 2, 'apple', 'thing', 32, 'first', 'second', 5]
[1, 2, 'apple', 'new', 32, 'first', 'second', 5]
In [92]:
c = mylist[:3]
In [93]:
c[2] = 5
In [94]:
print c
print mylist
[1, 2, 5]
[1, 2, 'apple', 'thing', 32, 'first', 'second', 5]
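
The copy made by x[:] is "shallow" because only the outer list is duplicated; any lists nested inside it are still shared. The standard library's copy.deepcopy duplicates the nested levels as well. A minimal sketch with made-up data:

```python
import copy

nested = [[1, 2], [3, 4]]

shallow = nested[:]              # new outer list, same inner lists
deep = copy.deepcopy(nested)     # new outer list and new inner lists

nested[0][0] = "changed"         # modify an inner list
nested.append([5, 6])            # modify the outer list

print(nested)
print(shallow)   # sees the inner change, but not the append
print(deep)      # sees neither change
```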

We can create a (possibly ragged) multi-dimensional array with a list of lists:

In [95]:
L = [[5,6],[3,4]]
In [96]:
L
Out[96]:
[[5, 6], [3, 4]]
In [97]:
L[0][1]
Out[97]:
6
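
"Ragged" means that the inner lists need not all have the same length. A sketch with a made-up ragged list:

```python
ragged = [[5, 6, 7], [3], []]

# Each row reports its own length.
row_lengths = [len(row) for row in ragged]
print(row_lengths)

# Index the row first, then the column within that row.
print(ragged[0][2])

# Asking for a column that a short row lacks raises an IndexError.
try:
    ragged[1][2]
    found = True
except IndexError:
    found = False
print(found)
```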

We can also create multidimensional arrays with the numpy array class.

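Restrictions on arrays: every element has the same data type (the dtype), and the array must be rectangular rather than ragged.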

Benefits of arrays (due to underlying C implementation): compact storage and fast operations on whole arrays at once.

For starting out, you'll usually want Python lists.

In [98]:
from numpy import array  # needed unless numpy's names are already loaded (e.g., via %pylab)
A = array(L)
In [99]:
A
Out[99]:
array([[5, 6],
       [3, 4]])
In [100]:
A.dtype
Out[100]:
dtype('int64')

What is the largest signed or unsigned integer that can be represented by a given number of bits?

In [101]:
2**64-1
Out[101]:
18446744073709551615L
In [102]:
2**32-1
Out[102]:
4294967295
In [103]:
2**(32-1)-1
Out[103]:
2147483647
In [104]:
2**(16-1)-1
Out[104]:
32767
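
The pattern in In [101] through In [104] generalizes: n bits give 2**n distinct values, so an unsigned n-bit integer runs from 0 to 2**n - 1, while a signed (two's complement) integer spends one bit on the sign and runs from -2**(n-1) to 2**(n-1) - 1. A sketch with hypothetical helper names:

```python
def max_unsigned(bits):
    """Largest value of an unsigned integer with the given number of bits."""
    return 2 ** bits - 1

def signed_range(bits):
    """(smallest, largest) value of a two's complement signed integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

for bits in [8, 16, 32, 64]:
    low, high = signed_range(bits)
    print("%2d bits: unsigned max %d, signed %d to %d"
          % (bits, max_unsigned(bits), low, high))
```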

Iteration with for loops

In [105]:
mylist
Out[105]:
[1, 2, 'apple', 'thing', 32, 'first', 'second', 5]
In [106]:
for i in mylist:
    print i
1
2
apple
thing
32
first
second
5
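
A for loop can also build up a result as it goes. In this sketch (made-up variable names), we count how many items in a mixed list are strings and note where they sit. enumerate is a built-in that supplies each item's index alongside the item.

```python
items = [1, 2, "apple", "thing", 32, "first", "second", 5]

n_strings = 0
string_positions = []
for index, item in enumerate(items):
    if isinstance(item, str):
        n_strings += 1
        string_positions.append(index)

print(n_strings)
print(string_positions)
```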