Creating a simple 4x5 matrix (the right way)
mat = []
for i in range(4):
    row = []
    for j in range(5):
        row.append(i*j)
    mat.append(row)
mat
Same code with an indentation error
(Note that the redundant appends put five references to each row list into mat, so mat ends up with 20 entries in groups of five, each group linked to the same list)
mat = []
for i in range(4):
    row = []
    for j in range(5):
        row.append(i*j)
        mat.append(row)
mat
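To make the aliasing in the mis-indented version concrete, here is a minimal sketch that checks object identity with `is`. The function name `build_with_bad_indent` is hypothetical, introduced only for illustration.

```python
def build_with_bad_indent(n_rows, n_cols):
    """Reproduce the mis-indented loop: row is appended once per column."""
    mat = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            row.append(i * j)
            mat.append(row)  # bug: runs n_cols times per row
    return mat


bad = build_with_bad_indent(4, 5)
n_entries = len(bad)                    # 20 entries instead of 4
same_object = bad[0] is bad[4]          # entries 0..4 are one list object
different_rows = bad[4] is not bad[5]   # entry 5 starts the next row's group

# Because the five entries are aliases, mutating one changes them all.
bad[5].append("extra")
alias_sees_change = bad[9][-1] == "extra"
```

Comparing with `is` rather than `==` is the point here: the entries within a group are not merely equal, they are the same list.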
def f(x,y):
    j = x*y
    return j
A = 2
B = 4
C = f(A,B)
C
Answer: use the + operator to concatenate (this also works for strings)
x = [1,2,3]
x = [4]+x
x
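We discussed that adding to the left is more expensive than adding to the right: [4]+x builds a new list and copies every element of x into it, so the cost grows with the length of the list, whereas append usually just writes into spare space at the end.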
In the case where we know the final size of the list and where all of our data is going to go in it, we can get around this problem by pre-allocating the list:
Start with a list of 10 None values (the expected final size of the list)
x = [None]*10
x
Fill in 9 numbers, leaving room for an annotation on the left
for i in xrange(9):
    x[i+1] = i
x
Fill in the annotation
x[0] = "hi"
x
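The cost difference discussed above can be checked by building the same list both ways. This is a rough sketch: the function names are hypothetical, and the timings depend on the machine, so they are printed rather than stated.

```python
import timeit


def build_by_prepending(n):
    """Grow the list with [value] + x, copying the whole list each time."""
    x = []
    for i in range(n - 1, -1, -1):
        x = [i] + x
    return x


def build_preallocated(n):
    """Allocate the final size once, then assign into each slot."""
    x = [None] * n
    for i in range(n):
        x[i] = i
    return x


# Both strategies produce the same list.
same_result = build_by_prepending(1000) == build_preallocated(1000)

# Timings vary by machine, so they are printed rather than asserted.
t_prepend = timeit.timeit(lambda: build_by_prepending(2000), number=5)
t_prealloc = timeit.timeit(lambda: build_preallocated(2000), number=5)
print("prepend: %.4fs  preallocate: %.4fs" % (t_prepend, t_prealloc))
```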
(Equivalently -- how do you keep track of all of the data that you've put into core memory?)
In "vanilla" python, you can use the dir function. This works in ipython, but is cluttered up with a lot of extra names that ipython generates:
dir()
To deal with this clutter problem, ipython provides the %who magic, which lists just the names that you've explicitly defined
%who
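Outside IPython, a rough approximation of %who is to drop the underscore-prefixed names from a namespace. The helper name `user_names` is hypothetical, and this sketch does not reproduce IPython's own bookkeeping.

```python
def user_names(namespace):
    """Return the sorted names in a namespace that do not start with '_'."""
    return sorted(name for name in namespace if not name.startswith("_"))


# A stand-in namespace; in a plain interpreter session you could pass globals().
example_namespace = {"__name__": "__main__", "_1": 8, "A": 2, "B": 4, "C": 8}
visible = user_names(example_namespace)
print(visible)
```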
E.g., to free up memory, or for debugging purposes
Answer: use the del statement
A
del A
A
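A small sketch of what del does and does not do: it removes the name, while the underlying object is only freed once no other references remain. The variable names here are illustrative.

```python
data = [1, 2, 3]
alias = data          # a second name bound to the same list

del data              # removes the name 'data' only

try:
    data
    name_still_defined = True
except NameError:
    name_still_defined = False

# The list itself survives because 'alias' still refers to it.
surviving = alias
```

So del frees memory only when it removes the last reference to an object.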
As we develop reusable code, it is useful to factor it into our own modules, so that we don't have to keep copy-pasting all of the time. (Modules also make it easier to document and version-control our code, and are a useful way of sharing our code with others).
Here, we create a "stats" module with functions from earlier in the week which will be useful for solving last night's homework problem.
We start by saving stats.py (from a 2013 session of this course) to the directory from which we launched the IPython notebook.
import stats
A module's __file__ attribute tells us the location of the module on disk. Here I'm confirming that I imported the stats.py in my working directory:
stats.__file__
Other modules come from elsewhere
(based on the module search path, which is built from the current directory, the PYTHONPATH environment variable, and the installation defaults, and which you can inspect and manipulate via sys.path):
import sys
sys.path
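As a minimal sketch of manipulating the search path, the snippet below appends a directory for the current session only. The directory name `my_modules` is a hypothetical placeholder.

```python
import os
import sys

# Hypothetical directory holding your own modules.
extra_dir = os.path.join(os.getcwd(), "my_modules")

if extra_dir not in sys.path:
    sys.path.append(extra_dir)   # searched after the existing entries

# Entries are tried in order, so earlier directories win when names clash.
for position, entry in enumerate(sys.path[:3]):
    print(position, repr(entry))
```

Appending (rather than inserting at the front) keeps the standard library and installed packages ahead of your own directory.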
E.g., because I'm running ipython in my normal system context, numpy is imported from my system copy of numpy
(if I were running my Canopy copy of ipython from the Canopy virtual environment, the normal context for this course, then numpy would be imported from the version in my Canopy directory)
import numpy
The loaded file has a .pyc extension, indicating that it is a byte-code version, automatically generated by the python interpreter from the text-format .py version.
This is useful to know for two reasons:
numpy.__file__
!ls /usr/lib/python2.7/dist-packages/numpy/__init__.py
numpy??
Okay, let's see what's in the stats module we just loaded -- the three stats functions from our day 1 homework.
dir(stats)
Some examples of viewing the docstrings from the first lines of the function definitions.
help(stats.mean)
stats.mean?
help(mean)
from stats import mean
Here I added the CDT parsing function from last night's homework to my copy of stats.py using my favorite text editor
*The editor that comes with Canopy is a good choice for editing python code. Other popular text editors are:
Having edited stats.py, I reload the module to get access to the new code
reload(stats)
Now I can see the new function
dir(stats)
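The edit-then-reload cycle can be reproduced end to end with a throwaway module. This is a sketch under stated assumptions: the module name `demo_reload_mod` is hypothetical, the "edit" is simulated by appending to the file, and on Python 3 reload has to be imported from importlib (it is a builtin on Python 2).

```python
import os
import sys
import tempfile

try:
    from importlib import reload  # Python 3.4+
except ImportError:
    pass  # Python 2: reload is a builtin

sys.dont_write_bytecode = True  # keep the demo free of cached .pyc files

# Write a throwaway module (hypothetical name) into a temporary directory.
workdir = tempfile.mkdtemp()
module_path = os.path.join(workdir, "demo_reload_mod.py")
with open(module_path, "w") as handle:
    handle.write("def mean(values):\n    return sum(values) / float(len(values))\n")

sys.path.insert(0, workdir)
import demo_reload_mod

had_total_before = hasattr(demo_reload_mod, "total")

# Simulate editing the file in a text editor by appending a new function.
with open(module_path, "a") as handle:
    handle.write("\ndef total(values):\n    return sum(values)\n")

reload(demo_reload_mod)
has_total_after = hasattr(demo_reload_mod, "total")
```

Note that reload updates the module object itself; a name previously pulled in with a from ... import statement keeps pointing at the old function until it is imported again.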
Note that a naive implementation of Pearson correlation crashes with a divide-by-zero error when an input has zero variance
x = [0.,0.,0.]
stats.pearson(x,x)
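The implementation of stats.pearson is not shown here, so the following is only a sketch of one way to guard against the zero-variance case. The function name `safe_pearson` and the choice to return None are assumptions, not the course's solution.

```python
import math


def safe_pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns None when either input has zero variance, since the
    correlation is undefined there (the denominator would be zero).
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(x) / float(n)
    mean_y = sum(y) / float(n)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    ss_x = sum(d * d for d in dx)
    ss_y = sum(d * d for d in dy)
    if ss_x == 0 or ss_y == 0:
        return None
    covariance_sum = sum(a * b for a, b in zip(dx, dy))
    return covariance_sum / math.sqrt(ss_x * ss_y)


flat = safe_pearson([0., 0., 0.], [0., 0., 0.])
perfect = safe_pearson([1., 2., 3.], [2., 4., 6.])
```

Returning None forces the caller to decide how to treat undefined correlations; raising an exception or returning NaN would be equally defensible choices.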
Loading the example data via our new module
(cols, genes, annotations, matrix) = stats.parse_cdt3(open("supp2data.cdt"),0.)
Showing off some of the tricks available via numpy.arrays
import numpy
Make an array from a list of lists. All of the elements of the lists are floats, so we get the default floating-point type (64 bit floats on my laptop)
M = numpy.array(matrix)
M.dtype
As expected, the array has 2467 rows (genes) and 79 columns (conditions)
M.shape
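The same dtype and shape behavior can be seen on a tiny array that does not depend on the course data. The array contents below are made up for illustration.

```python
import numpy

small = numpy.array([[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0]])
print(small.dtype)    # float64 for Python floats
print(small.shape)    # (rows, columns)

# A list of ints yields an integer dtype instead (its width is platform dependent).
ints = numpy.array([[1, 2], [3, 4]])
print(ints.dtype.kind)
```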
Transposing a numpy array is essentially free, because .T returns a view of the same data rather than a copy:
M.T.shape
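A short sketch of why the transpose costs so little: it shares the original array's data and only swaps the shape and strides. The small array here is made up for illustration.

```python
import numpy

a = numpy.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])
t = a.T

is_view = t.base is a          # the transpose borrows a's data buffer
t[0, 1] = 99.0                 # writing through the view...
changed_original = a[1, 0]     # ...changes the original array

# Only the shape and strides metadata differ between the two.
print(a.shape, a.strides)
print(t.shape, t.strides)
```

The flip side of this sharing is that modifying the transpose modifies the original; call .copy() when an independent array is needed.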
Using numpy to calculate the dot product of two vectors (the expression profiles of the first two genes)
d = numpy.dot(M[0],M[1])
d.shape
d
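To see what numpy.dot computes for two vectors, here is a small sketch comparing it with the sum of elementwise products written out by hand. The vectors are made up for illustration.

```python
import numpy

u = numpy.array([1.0, 2.0, 3.0])
v = numpy.array([4.0, 5.0, 6.0])

d = numpy.dot(u, v)                           # 1*4 + 2*5 + 3*6
by_hand = sum(a * b for a, b in zip(u, v))    # same sum, written out

print(d, d.shape)   # a scalar result has the empty shape ()
```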