Goal: Explore file I/O

In [1]:
%ls example*
example1  example2  example.pdf
In [2]:
%pwd
Out[2]:
u'/home/bms270/BMS270_2017'

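Examples of the command-line download tools wget and curl. curl should be available on all Macs. Note that wget does not overwrite an existing file: because example1 was already present, it saves the new copy as example1.1. The curl command redirects its output to example1 and so overwrites that file.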

In [3]:
!wget 'http://histo.ucsf.edu/BMS270/BMS270_2017/data/example1'
--2017-04-25 15:26:47--  http://histo.ucsf.edu/BMS270/BMS270_2017/data/example1
Resolving histo.ucsf.edu (histo.ucsf.edu)... 128.218.234.54
Connecting to histo.ucsf.edu (histo.ucsf.edu)|128.218.234.54|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3252152 (3.1M)
Saving to: ‘example1.1’

example1.1          100%[===================>]   3.10M  5.74MB/s    in 0.5s    

2017-04-25 15:26:47 (5.74 MB/s) - ‘example1.1’ saved [3252152/3252152]

In [4]:
!curl 'http://histo.ucsf.edu/BMS270/BMS270_2017/data/example1' > example1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3175k  100 3175k    0     0  5365k      0 --:--:-- --:--:-- --:--:-- 5373k
In [5]:
%ls
Day1.html        Day1_warmup.ipynb  example1    example.pdf      Untitled.ipynb
Day1.ipynb       Day2b.ipynb        example1.1  Untitled1.ipynb
Day1_post.ipynb  Day2.ipynb         example2    Untitled2.ipynb
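
The same download can also be done from Python itself. The following is a minimal sketch, not part of the original notebook: it is written for Python 3 (the notebook runs Python 2), and `fetch` is a hypothetical helper name. It skips the download when the destination already exists, which avoids the duplicate example1.1 that wget produced above.

```python
import os
import shutil
from urllib.request import urlopen  # Python 3 standard library


def fetch(url, dest):
    """Download url to dest unless dest already exists (hypothetical helper)."""
    if os.path.exists(dest):
        return False  # keep the existing copy instead of writing a duplicate
    with urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # stream in chunks
    return True


# Usage (needs network access, so it is left commented out):
# fetch("http://histo.ucsf.edu/BMS270/BMS270_2017/data/example1", "example1")
```

The function returns True when it downloaded the file and False when it left an existing file alone.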

Example 1: text file

In [6]:
data = open("example1").read()
In [7]:
len(data)
Out[7]:
3252152
In [8]:
data[:100]
Out[8]:
'@SRR4244242.20068354 20068354 length=112\nCTGGGCTCCACCTCTAGGGTGATGGTCTTGCAGGTCAGGGTCTTCGCGAAGATCTGCAT'
In [9]:
ord(data[0])
Out[9]:
64
In [10]:
chr(64)
Out[10]:
'@'
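
ord and chr are inverses of each other: ord maps a one-character string to its integer code, and chr maps the code back. A small sketch (Python 3 syntax, using the first few characters of the header line shown above) that applies both to several characters at once:

```python
header = "@SRR4244242.20068354"  # start of the first line of example1

codes = [ord(c) for c in header[:4]]       # characters -> integer codes
restored = "".join(chr(n) for n in codes)  # integer codes -> characters

print(codes)     # [64, 83, 82, 82]
print(restored)  # @SRR
```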
In [11]:
print data[:100]
@SRR4244242.20068354 20068354 length=112
CTGGGCTCCACCTCTAGGGTGATGGTCTTGCAGGTCAGGGTCTTCGCGAAGATCTGCAT
In [12]:
fp = open("example1")
In [13]:
fp
Out[13]:
<open file 'example1', mode 'r' at 0x7f16e10119c0>
In [14]:
fp.read(100)
Out[14]:
'@SRR4244242.20068354 20068354 length=112\nCTGGGCTCCACCTCTAGGGTGATGGTCTTGCAGGTCAGGGTCTTCGCGAAGATCTGCAT'
In [15]:
fp.read(100)
Out[15]:
'TATGACCTGATAACAAATGTGATGAAAGCACAAACCGCCCAGCGCGTCGAAAC\n+SRR4244242.20068354 20068354 length=112\nAAAA.'
In [16]:
fp.readline()
Out[16]:
'FFFF))FFF)FFFFAF<FAF.FAFF))FAFFFFFFFFFF7)FF<FFA7FFFF.7F7FFAFFFF.AAFF<F.FF.<AFFFAAFF<F.)<FFAFF<.AF<FAFF.F..F\n'
In [17]:
fp.readline()
Out[17]:
'@SRR4244242.6143545 6143545 length=116\n'
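
The cells above show that a file object remembers its position: each read(n) or readline() call continues where the previous one stopped. The sketch below repeats that behavior on a tiny stand-in file with made-up contents, so it runs without example1. It is written for Python 3 and uses a with block, which closes the file automatically.

```python
import os
import tempfile

# Small stand-in file (made-up contents) so the sketch runs anywhere.
text = "@read1\nACGTACGT\n+read1\nFFFFFFFF\n"
path = os.path.join(tempfile.mkdtemp(), "tiny_example")
with open(path, "w") as out:
    out.write(text)

with open(path) as fp:            # the with block closes the file for us
    first = fp.read(10)           # first 10 characters
    rest_of_line = fp.readline()  # continues from the current position
    next_line = fp.readline()     # the next complete line
    fp.seek(0)                    # rewind to the beginning
    again = fp.readline()         # first line once more

print(repr(first))         # '@read1\nACG'
print(repr(rest_of_line))  # 'TACGT\n'
print(repr(next_line))     # '+read1\n'
print(repr(again))         # '@read1\n'
```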
In [18]:
count = 0
for line in open("example1"):
    print line
    count += 1
    if count > 100:
        break
@SRR4244242.20068354 20068354 length=112

CTGGGCTCCACCTCTAGGGTGATGGTCTTGCAGGTCAGGGTCTTCGCGAAGATCTGCATTATGACCTGATAACAAATGTGATGAAAGCACAAACCGCCCAGCGCGTCGAAAC

+SRR4244242.20068354 20068354 length=112

AAAA.FFFF))FFF)FFFFAF<FAF.FAFF))FAFFFFFFFFFF7)FF<FFA7FFFF.7F7FFAFFFF.AAFF<F.FF.<AFFFAAFF<F.)<FFAFF<.AF<FAFF.F..F

@SRR4244242.6143545 6143545 length=116

TTTCACCTCAGTGACGCAGCCCTTCTCTCTCCAGTCCACAGTGTCAGGCAATGTCCGATTAGAGTATGACCTGAAAGTGACAGTCTTCGGAGACTGTCGGGGAATTCTCAGAGCAC

+SRR4244242.6143545 6143545 length=116

AAAAAFFFFFFFFFFFFFFFFFFFFFFFFFFF)FFF7.FFFFFFFFFFFFFFFFFFFF<F<FFFFFFFFFFFFFFFFFA<FFFFFFFFAF<FAFFFFF.FFFFFFFFAFFFAFFAF

@SRR4244242.28027200 28027200 length=139

CTGTGATGGGGAAGACCAGAGTCTTATATCATGAATTGCATCGGTGCTGTGGGGCAGGCACATAGGATGCCAGGGCAAAGGGAGACGGAGCTCTGTGCTGACAAGGAATCACACTGAGCCCAGCTTCAGGGGGCCCAGG

+SRR4244242.28027200 28027200 length=139

AAAAAFFFFFFFFFAFFFFFFFFFFAFFFFAFFAFFFFFFFFFFFFFFFFFFFFFAFFFFFFFAFFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFFAFAFFFFFFFFFFFFA7F<FFFFFFFFFFAFFFFFFAFA

@SRR4244242.21033314 21033314 length=91

CTGGCATGTTGGAACAATGTAGGTAAGGGAAGTCGGCAAGCCGGATCCGTAACTTCGGGATAAGGATTGGCTCTAAGGGCTGGGTCGGTCG

+SRR4244242.21033314 21033314 length=91

AAAAAFFF<FFFFFFF<FFFFFF7FAF7FFFFFFF7FFFFFFFFFFFAFFF<FFFFA7FAFAFFFFFFFF<FFFF<FFF7FA<FFF.FF<A

@SRR4244242.19134434 19134434 length=149

TAAAAGACAAAAGTGAGAATGGTGCAGAAAAGGCGCAGGCACAACGGCTAGAAGAGGACCCAGCCAGCTAGGACCCTGCACGGATGTGTTGATGGCGGCCTCACAGGAACAGCGAATGGTAGAGAGTGGAGTGATCTCCCAACAACCCC

+SRR4244242.19134434 19134434 length=149

AAA.AFF<FF.7A)7)7F.AFAFFFAA7FFA.A)F.F7.F<F)F.)A.FAFFFFF..7FFF77FF)FFF.FF..AFFF).))FF))F<)F)FF)F)<.FFFFFFF<F.FFAF.<.FF.7).F<A.77F..))A7A)AAAA).<.<F.7<

@SRR4244242.2375668 2375668 length=76

TAACACAGAAGCAATGCTGTCACCTACCCCGGGGTGGACTCAGGGCATGGACGCGACCATCCTCCTCTTAGGAGTG

+SRR4244242.2375668 2375668 length=76

A.AAAFFFAFFA.FFAFFFFF.FFF.FFFF)A.FF..7FFFAFF.FFFFA<F.7F.FFFFFFFFFFFAF7.FFAFF

@SRR4244242.11970718 11970718 length=119

GAGTAATAAGAGCGAGGAGGGAGGGAAAGAACCATCTTCGAGTGCTCTCGAGGAGCCAAGCCCGCCTCAGCTGTCTTCAAAAGCAAACAAAGCCATCTTTGGAATTTGCAGACTAAGAT

+SRR4244242.11970718 11970718 length=119

AAAA<AFFFF7FFFFFFFFFFFFFFFFFF<FFFFFFFFFF.FFFFFFAFFFFFFFFFF7FF7)FFFFFFFFAFFFFFFFFFFFFFAFFFFAFFF7FF)FFFFAFFA7FFAF.FFFFFFF

@SRR4244242.10413608 10413608 length=49

GTAGACATGGGTTGCTCCTCCTTCCTCTGGCATAGACAAGTAGTATTTC

+SRR4244242.10413608 10413608 length=49

AA<AAF7FFFFFFFFF<FFFFFFFFFFFFFFFFAFFFFFFFAFFFFFFF

@SRR4244242.5105782 5105782 length=148

ATTTTTATGCTAAGTTCGAATGTATTTTTTTTGAGAATACAAAAAGTAACCCTTGAAAATCAGAATATATAACAGAAAAGAGCACAATAACTTAAGTATTAAACATCTGTATGAAATAACTTGCAAAGTTTGACAAATATGCACACAT

+SRR4244242.5105782 5105782 length=148

AAAAAFFFFFFFFFFFFFFAFFFFFFFFFFFFFFFFAFFFFFF<FFFF.FAFFFFFAFFFFAFFFFFFFFFFFFFF.FFFFFFAFF<FAFFFFAFFFFFFAFFF7FFFFFFFFF7FFFFFFF<F7FAFF<FAFF7FFFFAF.A<7FFA

@SRR4244242.11374318 11374318 length=151

CTTCTCCTCCCTCCATCAGAAGATGATCTGGAAATATTCAAGAAATACCAACACCTGTTTCTTCAGGGCAATGCAACATGGATGCCTTTCTTTCACTGCCCAAAATGGAATGTTCGGACAATTCTAAGAGGAGAGCATAACTTCTTCTCTG

+SRR4244242.11374318 11374318 length=151

AAAAAFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFF)F7FFAFFAFFFFFFFFFFFFFF<AFFFFFFFFFFFFFFFFFFFFAFFAFF<<AFFFFAAFFF<FF)FFAFAFF.7AAFF.FAAF<.A7FF7<<7.<A...

@SRR4244242.3778568 3778568 length=151

TGGCAATCCAGGAGGAAGCAAAATTCGCACTGGTTACACATGACCAGGTCACCTGGTATCTGGCAGACACGGCAGATAGTGGCACTGTCATCCAGGATACCTGGCCCACTTACTGGGGCTGAGCTCACCTCAGGAGCCACCACCTCCAAGC

+SRR4244242.3778568 3778568 length=151

AA<AAFFFAFFF.FF77FFFAF<FFFAFFFFFAFFAF7FF<)A)FF.<FFFAFFAF))FFFFFFFA<A.FF<FFFFF<AF.FAFFF.FFFF<<<F).7.F<F).FF..F<)FF.FAFA<F.7.)F..7<F<.AF.7)A7..7A.AA7.A)7

@SRR4244242.21767697 21767697 length=151

CCCCTCCTTAGGCAACCTGGTGGTCCCCCGCTCCCGGGAGGTCACCATATTGATGCCGAACTTAGTGCGGACACCCGATCGGCATAGCGCACTACAGCCCAGAACTCCTGGACTCAAGCGATCCTCCTGTCTCAGCCTCCCGAGTAGCTGG

+SRR4244242.21767697 21767697 length=151

AAAAAFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFAFFFFF<FFAFFFFF.FFFFFFAFFFFFFFAFFFF.FFFF.FFFFFFF7FFFFFFFFF<FFFFF7)<FFFFFFFAFFFFFFFFFFF<FFF7FFFF<.<AAFF<A<AFA<FFFAF

@SRR4244242.19608439 19608439 length=106

CTTTGGTCTCCACGGTTGTAGTTTGTAGCTCGTGTGTTATAATTGCTCTCGTGCTGAGCTAAACACACCCAGTCGGCCAGGCTGACTCCATAGTAGCCTGCCATTC

+SRR4244242.19608439 19608439 length=106

AAAAAFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFF.FFFFAF<FFFFFFFFFFF.FFFFFF.FFF<F7FFFFFAF7FFAAF<77FFAFFAF.7AA)FFF<F7<

@SRR4244242.9959947 9959947 length=137

CCGACCTGGGCCGGTTCACCCCTCCTTAGGCAACCTGGTGGTCCCCCGCTCCCGGGAGGTCACCATATTGATGCCGAACTTAGTGCGGACACCCGATCGGCATAGCGCACTACAGCCCAGAACTCCTGGACTCAAGC

+SRR4244242.9959947 9959947 length=137

AAA<AFFFFFFFFFFFF)F.7FFFFFF<FFFF<FFFFFFFFFFF.FFFFF<A)FFAFFFFF)<F<FFF<.FFFFFFF7FF<AAFFFFFFF7F.FF.FFFAFFFFFFF.<FFFF7AFFFFF7<FFFA7FAFFFF7F7F

@SRR4244242.9702802 9702802 length=95

AGGGAGGCATCCGCTCCGGCGAGGGAGGCATCCGCCCCGACTCGGGGCTTCTCCTGCCCAGTCTGCCCCAGCGTAGAGCCCTGCTCTCTGGGAAC

+SRR4244242.9702802 9702802 length=95

A.AA<FFFFFFFAFFFFFFFFA<FA<FAF<FFFF<)FFF.FFFFFFF.FA)7FFFF..F.FAFFFAAF)FFFF...7FF.FFF))FFFFFFF.FF

@SRR4244242.10361791 10361791 length=111

TGAATACATGACCATTTCTCTTTTAGCACGCTCTTTATTCTCCTCTTCCAGAAGTTGGAGACGACTATTTAATTTGATTATCTGACGTCTTAATGAAGCTGCATCTACAAC

+SRR4244242.10361791 10361791 length=111

AAAAAFFFFF<FFAFFFFFFFFFFAFF.FFFFFFFF<FFFAFFFFFFFF<<AFFFFF7FFFF..FF<FFF..FFFFFAA.AAFF<F.)FFF.<F77FF7F.F.77F.F.FF

@SRR4244242.8071116 8071116 length=151

CTGGAGTCTTGGAAGCTTGACTACCCTACGTTCTCCTACAATGGACCTTGAGAGCTTGTTTGGAGGTTCTAGCAGGGGAGCGCAGCTACTCGTATACCCTTGACCGAAGACCGGTCCTCCTCTATTCGGGGAAGGTCGTCCTCTTCGACCG

+SRR4244242.8071116 8071116 length=151

AAAAAFFFAFFF<AFFFFFAAFFF7FFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFAFFFFFFAFAFFFFFFFFFFFFFFFAF)FFA7FFFFFFA.FFF<F<)FFF.<F<FFFFFAFFFFFFAAAFFFFFFFFFFFAFF<.A.FF<F<F<<

@SRR4244242.7864080 7864080 length=150

GCCAGCTCTGCGGCAGGGTGTTCAGGCCTCAGTCCAGCACTGAAGGCAGGTGGTGTGGCCTCTACAGCTCATCCATGGCTTGGACAGGGGATTCTTCCTCATCTTCCTCCTTCTCATCTTCTTCGTCCTCATCTTCATCTCAATCAGATC

+SRR4244242.7864080 7864080 length=150

AAAAAFFFFFFFFFAFFFFFFAFFFF<FFFF.FFFFF.)FFFFFFFAF.FFFFFFFF.7FFFF)FFFFAF<FF7.F)FFFFA<AF77)F.FFFF.FFFFF<F)FFF)<.FF<7F77F.F77F<<77.<F7<<.77)<F.AA.)AA<<..<

@SRR4244242.9494196 9494196 length=120

TTTACATAGCAGTTCCAGATCACTCAGATACACAGTAAGACCCTGTCTAGGATCCTTTCTGAAAAACAGATTATTGCAGCTGGAACAACTATATAATGCCTACTACATGCCAAGCTCCAG

+SRR4244242.9494196 9494196 length=120

A<AAAFFFFFAFFFFF7F<FFFFFFFFFFFFAFFFFA<F.FFFFFFFFAFF.FFFFFFAAFFFFFFFFFFFFFFAFF<F<FFF..FFFFFAFFFAA<FFFFFFF7FFFFFF..FFAF7FF

@SRR4244242.16207425 16207425 length=113

CTTGATCTTGATTTTCAGTACGAATACAGACCGTGAAAGCGGGGCCTCACGATCCTTCTGACCTTTTGGGTTTTAAGCAGGAGGTGTCAGAAAAGTTACCACAGGGATAACTG

+SRR4244242.16207425 16207425 length=113

AAAAAFFFFFFFFFAFFFFFFFAFFFFAFFFFFFFFFFFFFFFFFFFFFFFAFFFFAFFF.FFFFFFFFFF7FFFFFFAFFAFFFFFFFFFFAFFFFFFFFF<FFFAFAFFFF

@SRR4244242.10008174 10008174 length=151

GTAGGACTGAGGCAGGTAGGTCCCGGCCTTAATGTTAATAAGGAGCTCCAGCAGGTTTTTGGGCGGCATAACGATGGAAGTGTTCAGAGGAATCACGTAGCACTTGTCCAGGTTAAGGTCCAAATAAGCAGTGAGTTTCTTGTTGAAGTCG

+SRR4244242.10008174 10008174 length=151

AAAAAFFFFFFFFFFFFFFFFFFFFFFF)FFFFFFFFFFFFFFFFFFFFFFFFFFFAFFFFFFFFFF7FFFFFFFFFFFFFFFFFAFFFFFAFF.FFFFFFAFFFFFF<FFFF<.FFFFFF7FFFAFFFFF<FFAFAFFF<<<AF<FFFF.

@SRR4244242.22742394 22742394 length=108

GCCGCTCGTCGGAGTACAGGATGCTAGCTGAAAGACTGTGATCCCGCTGACTGTTCCCTCGCCCACCTGGGATCTTCAGGGGTGGGCGAGGGACATCAGGAGCACCAC

+SRR4244242.22742394 22742394 length=108

AAAAAFFFFFFFFFFFFFFFFFFFFAFFFFFFAF7FFFFF<FFAFFFFF<FFFAAFFF7FFFFFFFFFF7FFFF<FFF<FFFFFFFFFFFFFFFFFFF<FFFFFFFFF

@SRR4244242.15611876 15611876 length=94

TGGACTGTTATCAAAACACCTAAGGAGGATATTAATCATGAGGAAGATATTCCTTGCATATTATATTCCTTGCATGAATATAAACTGGATGATT

+SRR4244242.15611876 15611876 length=94

AAAAAFFFFAFFAFFFF<FFFAFFFFFFFFAFFFFFFFFFFFFFFFFFFFFFFFFFFFA.F7AFFAFFFFFFFFFFFFFFFFFFFF<FFFFFFF

@SRR4244242.21281 21281 length=150

CGGTTCACCCCTCCTTAGGCAACCTGGTGGTCCCCCGCTACCGGGAGGTAACCATATTGATGCCGAACTTAGTGCGGACACCCGATCGGCAGATAGGAAAAGCACACGTCTAAACTCCAAACACAACACAAAAACATATAACATAATATA

+SRR4244242.21281 21281 length=150

AAAAAF).F<F)FFFF<)FFAFF<A7<A7FFF)7FAFA7)FFAFFFFFF.FFF<F7FAF.<FFF<.FFFF.FFF).)FAAF77)F7F)F.7F7F.))FA.A.7A)FA.)F))7.).)7.))).7<)..).<7...))..7.)<)..)..7

@SRR4244242.3210697 3210697 length=151

CCCCTGGGGCGCGCAAGTCTGCGCTGGTTGTGGCCCCGCCACACTGCGGAGGTTGGTCAGATGGTTGCCCATCTTCATGATGAGTTTCACCTCCTTATCAAGAAAGAGGCTTTCCAGGAAGTCACAGAGATGAGGGTCCGCGCGGGCAGAC

+SRR4244242.3210697 3210697 length=151

AAAA<FF<FAFF.7)<FFFAFF<FFFFFFF<AF)F)FFFF)F)<.FFFFFFF<.FFFFAF<).77)F<7F.AFF7F.FF.FFFF.FAF).F7FFA.F.F..<A))F)FF.7)F<.)FAA)F)7)FFFFFF<FAAFF7)7<FF<F.AF)..)

@SRR4244242.27105289 27105289 length=117

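The output above is double-spaced because each line read from the file already ends in a newline and print adds another. The lines also appear to come in groups of four: a header starting with @, a sequence, a separator starting with +, and a string of quality characters. The sketch below is an illustration under that assumption, in Python 3. It strips the trailing newlines and groups the lines into records. `read_records` is a hypothetical helper, and an in-memory stand-in with made-up records replaces open("example1").

```python
import io
from itertools import islice

# Stand-in for open("example1"): two made-up four-line records.
fp = io.StringIO(
    "@read1 1 length=4\nACGT\n+read1 1 length=4\nFFFF\n"
    "@read2 2 length=3\nTTG\n+read2 2 length=3\nAAF\n"
)


def read_records(handle, limit):
    """Yield up to `limit` (header, sequence, quality) tuples (hypothetical helper)."""
    lines = (line.rstrip("\n") for line in handle)  # drop trailing newlines
    for _ in range(limit):
        record = list(islice(lines, 4))  # next four lines, fewer at end of file
        if len(record) < 4:
            break  # incomplete or missing record: stop
        header, sequence, _separator, quality = record
        yield header, sequence, quality


records = list(read_records(fp, limit=25))
for header, sequence, quality in records:
    print(header)    # no blank line in between, since newlines were stripped
    print(sequence)
```

Passing a real file object, for example open("example1"), instead of the stand-in should work the same way as long as every record occupies exactly four lines.
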
Example 2: binary file

In [19]:
data2 = open("example2", "rb").read()
In [20]:
data2[:100]
Out[20]:
'RIFFF\x1f2\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x02\x00D\xac\x00\x00\x10\xb1\x02\x00\x04\x00\x10\x00LIST\x1a\x00\x00\x00INFOISFT\x0e\x00\x00\x00Lavf56.40.101\x00data\x00\x1f2\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
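
The leading bytes RIFF, WAVE and fmt suggest that example2 is a WAV audio file. Assuming the standard RIFF/WAVE header layout (an assumption, not something the notebook states), the first 36 bytes can be decoded with the struct module. The sketch below is written for Python 3 and uses the bytes copied from the output above; the variable names are illustrative.

```python
import struct

# First 36 bytes of example2, copied from the output above.
header = (b"RIFFF\x1f2\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x02\x00"
          b"D\xac\x00\x00\x10\xb1\x02\x00\x04\x00\x10\x00")

# "<" means little-endian with no padding; 4s = 4 raw bytes,
# I = unsigned 32-bit integer, H = unsigned 16-bit integer.
fields = struct.unpack("<4sI4s4sIHHIIHH", header)
(riff, riff_size, wave, fmt_id, fmt_size,
 audio_format, channels, sample_rate, byte_rate,
 block_align, bits_per_sample) = fields

print(channels, sample_rate, bits_per_sample)  # 2 44100 16
```

Under that assumption the header describes two-channel, 16-bit audio sampled at 44100 Hz. With a file on disk, the same bytes could be obtained with open("example2", "rb").read(36).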