Open the file explicitly in binary mode using "rb" (r = "read", b = "binary")
fp = open("example2","rb")
Read the first 100 bytes (in the output, the b before the opening quote marks the result as a byte string)
fp.read(100)
Read the whole file into memory; the with statement closes the file for us when the block ends
with open("example2", "rb") as f:
    data = f.read()
Check that Python and the file system agree on the number of bytes in the file (~3.2 MB)
len(data)
!ls -l example2
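The !ls line relies on IPython's shell escape and on a Unix-style ls being available. A portable alternative is os.path.getsize, which asks the operating system for the file size in bytes. So that it runs on its own, the sketch below writes a hypothetical throwaway 100-byte file; in the notebook you would pass "example2" instead.

```python
import os
import tempfile

# Write a small throwaway file so the sketch is self-contained;
# in the notebook you would use "example2" instead
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"RIFF" + bytes(96))
    path = tmp.name

# Read it back in binary mode; the with block closes the file
with open(path, "rb") as f:
    contents = f.read()

# os.path.getsize reports the size the file system records, in bytes
size_on_disk = os.path.getsize(path)
print(len(contents), size_on_disk)  # 100 100

os.remove(path)
```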
Let's take a look at the first 100 bytes again
data[:100]
The "WAVEfmt" string is a clue that we may be dealing with a WAV format audio file. A look at the WAV spec confirms our suspicions and gives us the necessary details for decoding the file (analogous to knowing the rules linking a TATA box to a transcriptional start site)
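The header fields described in the spec can be decoded with Python's struct module, where "<" selects little endian, "4s" is a 4-byte string, "I" a 4-byte unsigned integer and "H" a 2-byte unsigned integer. This is a minimal sketch that assumes the basic PCM layout: a 16-byte fmt subchunk immediately after the 12-byte RIFF/WAVE header. Real files can carry a longer fmt subchunk or extra subchunks (the notebook finds example2's data label at byte 70, later than this minimal layout would put it), so the fixed offsets here are assumptions. The header bytes and the parse_wav_header helper are made up for illustration; they are not example2's actual contents. The channels and bits-per-sample fields are what you would check to confirm the 4-byte sample width used later on.

```python
import struct

# Synthetic 36-byte header with made-up values (1 channel, 44100 Hz,
# 32 bits per sample); this is NOT example2's header
channels, rate, bits = 1, 44100, 32
block_align = channels * bits // 8
header = struct.pack(
    "<4sI4s4sIHHIIHH",
    b"RIFF", 0, b"WAVE",   # RIFF label, size (placeholder 0), form type
    b"fmt ", 16,           # fmt subchunk label and body size (16 for basic PCM)
    1, channels, rate,     # audio format (1 = PCM), channels, sample rate
    rate * block_align,    # byte rate
    block_align, bits,     # bytes per frame, bits per sample
)

def parse_wav_header(buf):
    """Hypothetical helper: decode the RIFF header and a basic PCM fmt
    subchunk that immediately follows it (common, but not guaranteed)."""
    riff, riff_size, wave = struct.unpack_from("<4sI4s", buf, 0)
    if riff != b"RIFF" or wave != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    fmt_id, fmt_size = struct.unpack_from("<4sI", buf, 12)
    if fmt_id != b"fmt ":
        raise ValueError("fmt subchunk not at offset 12")
    fmt, ch, sr, byte_rate, align, bps = struct.unpack_from("<HHIIHH", buf, 20)
    return {"format": fmt, "channels": ch, "sample_rate": sr,
            "byte_rate": byte_rate, "block_align": align,
            "bits_per_sample": bps}

print(parse_wav_header(header))
```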
We'll scan for the first data subchunk, identified by a "data" label. Note that because we read a byte string, we need to mark our search string as a byte string as well
data.find(b"data")
data[70:74]
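data.find returns the offset of the first match anywhere in the file (or -1 if there is none), so in principle it could hit the bytes d, a, t, a inside another subchunk's payload. A more defensive approach walks the subchunks using their size fields: each subchunk is a 4-byte label, a 4-byte little endian size, and then the body, padded with one extra byte when the size is odd. The iter_chunks helper and the demo bytes below are hypothetical, a sketch rather than a full RIFF parser.

```python
import struct

def iter_chunks(buf, start=12):
    """Hypothetical helper: yield (label, body_offset, body_size) for each
    RIFF subchunk, starting just past the 12-byte RIFF/WAVE header."""
    pos = start
    while pos + 8 <= len(buf):
        label, size = struct.unpack_from("<4sI", buf, pos)
        yield label, pos + 8, size
        # Skip the 8-byte label+size, the body, and a pad byte if size is odd
        pos += 8 + size + (size & 1)

# Synthetic file: header, a 3-byte "junk" subchunk (padded to 4), then "data"
demo = (b"RIFF" + struct.pack("<I", 0) + b"WAVE"
        + b"junk" + struct.pack("<I", 3) + b"abc\x00"
        + b"data" + struct.pack("<I", 8) + bytes(range(8)))

for label, offset, size in iter_chunks(demo):
    print(label, offset, size)
```

In either approach the samples start 8 bytes past the label, which is why the notebook reads the size from bytes 74:78 and the samples from byte 78 onward.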
The next 4 bytes give the size of this data subchunk, in "little endian" encoding (least significant byte first).
This is analogous to writing a decimal number with its digits reversed. For example, we conventionally write twelve with the most significant (tens) digit first: "12". In a little endian convention we would write it "21".
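Python's int.to_bytes makes the byte order concrete. Below, the hexadecimal value 0x12345678 is written out in both orders, along with decimal 1000, whose least significant byte (232 = 0xE8) comes first in little endian form:

```python
# 0x12345678: most significant byte is 0x12, least significant is 0x78
value = 0x12345678
little = value.to_bytes(4, "little")
big = value.to_bytes(4, "big")
print(little.hex(), big.hex())  # 78563412 12345678

# 1000 = 3 * 256 + 232, so the bytes are 232, 3, 0, 0 in little endian order
n = 1000
print(list(n.to_bytes(4, "little")))  # [232, 3, 0, 0]
```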
data[74:78]
We'll write a helper function to decode little endian values
def littleBytes(a):
    # Initialize decoded value to 0
    r = 0
    # Value is given _least_ significant byte first,
    # so we'll loop over the bytes in reverse order,
    # taking the most significant byte first
    for i in reversed(a):
        # Left shift the current value 8 bits.
        # This is equivalent to multiplying by 2**8=256.
        # Think of it as adding a zero on the right of
        # a decimal number (equivalent to multiplying by 10)
        r = r << 8
        # Interpret the current byte as a number and put
        # it in the right-most "ones" position (which we
        # just vacated with our left shift)
        r += i
    return r
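The standard library already offers equivalents: int.from_bytes(a, "little") for any length, and struct.unpack("<I", a) for exactly 4 bytes. The check below repeats a compact copy of littleBytes so the cell runs on its own, then compares all three decoders on random 4-byte inputs. The hand-written loop is still worth keeping for seeing what little endian decoding actually does.

```python
import random
import struct

def littleBytes(a):
    # Same algorithm as above, repeated so this cell is self-contained
    r = 0
    for i in reversed(a):
        r = (r << 8) + i
    return r

# Compare against two standard-library decoders on random 4-byte strings
for _ in range(1000):
    b = bytes(random.randrange(256) for _ in range(4))
    assert littleBytes(b) == int.from_bytes(b, "little")
    assert littleBytes(b) == struct.unpack("<I", b)[0]
print("all agree")
```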
Now we're ready to decode the size of the data subchunk
littleBytes(data[74:78])
So, this data subchunk is close to the full length of the file (~3.2 MB), implying that all of the audio content is in a single data block.
The data block is simply a sequence of little endian samples, which we can think of as measurements of the displacement of a microphone or speaker membrane at evenly spaced points in time. Let's take a look at the first 10000 bytes of data (2500 samples)
x = [littleBytes(data[i:i+4]) for i in range(78,10078,4)]
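The list comprehension calls littleBytes once per sample, which is fine for 2500 samples but slow for a whole file. Assuming NumPy is available (matplotlib depends on it), numpy.frombuffer can reinterpret the bytes directly; the dtype "<u4" means little endian, unsigned, 4 bytes. The buffer below is synthetic; in the notebook the analogous call would be np.frombuffer(data, dtype="<u4", count=2500, offset=78).

```python
import numpy as np

# Synthetic stand-in for data: a 78-byte header followed by four
# little endian 4-byte samples with made-up values
header = bytes(78)
samples = [0, 1, 2**31, 2**32 - 1]
buf = header + b"".join(s.to_bytes(4, "little") for s in samples)

# offset skips the header; count limits how many samples are decoded
x_np = np.frombuffer(buf, dtype="<u4", count=len(samples), offset=78)
print(x_np.tolist())  # [0, 1, 2147483648, 4294967295]
```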
min(x),max(x)
Note that this spans the full range available to 4-byte unsigned integer data, 0 through 256**4 - 1
256**4-1
Let's plot the first 1000 samples
%matplotlib nbagg
import matplotlib.pyplot as plt
fig = plt.figure()
plt.plot(x[:1000])
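The unsigned plot doesn't look much like a sound wave. That's because WAV samples wider than 8 bits are signed integers stored in two's complement: a small negative displacement such as -1 is stored as 256**4 - 1, right at the top of the unsigned range. Subtracting half the range (a/2, with a = 256**4) and wrapping modulo a maps each signed value s to s + a/2, so negative values fall just below the midpoint and positive values just above it. With this signed interpretation of the data, we can re-plot the first 1000 samples and get something that looks like a waveform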
a = 256**4
fig = plt.figure()
plt.plot([(i - a/2) % a for i in x[:1000]])
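The shift-and-wrap trick can be checked against an explicit two's complement conversion. The hypothetical to_signed helper below treats values at or above half the range as negative by subtracting 256**4; the loop confirms that (v - a//2) % a is always that signed value plus a//2, so the re-plotted curve is the signed waveform shifted up by half the range. It uses integer division a//2 rather than a/2 to keep everything an exact integer. int.from_bytes(..., signed=True), or the NumPy dtype "<i4", would do the signed decoding directly.

```python
a = 256**4

def to_signed(v, nbytes=4):
    """Hypothetical helper: reinterpret an unsigned value as two's complement."""
    full = 256**nbytes
    return v - full if v >= full // 2 else v

for v in [0, 5, a // 2 - 1, a // 2, a - 1]:
    s = to_signed(v)
    shifted = (v - a // 2) % a
    # The notebook's expression is the signed value offset by half the range
    assert shifted == s + a // 2
    print(v, s, shifted)

# int.from_bytes can decode signed little endian bytes directly
print(int.from_bytes((a - 1).to_bytes(4, "little"), "little", signed=True))  # -1
```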
Plotting all 2500 samples in x (the first 10000 bytes of the data block) shows increasing volume
fig = plt.figure()
plt.plot([(i - a/2) % a for i in x])
To visualize the full file, rather than trying to plot all ~800,000 samples directly (3.2 MB at 4 bytes per sample, which might overtax matplotlib's nbagg backend), we'll subsample by increasing the step size in our call to range. A step of 160 bytes keeps every 40th sample; the step must be a multiple of four so that we stay in the right reading frame.
end = min(78 + littleBytes(data[74:78]), len(data))  # stop at the end of the data subchunk
x = [littleBytes(data[i:i+4]) for i in range(78,end,160)]
len(x)
fig = plt.figure()
plt.plot([(i - a/2) % a for i in x])
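Taking every 40th sample can skip over the loudest samples in a stretch, so the subsampled plot may understate peaks. A common alternative for drawing a volume envelope, swapped in here rather than taken from the notebook, is to split the signed samples into blocks and keep the peak absolute value of each block. The signal below is a made-up sine wave with steadily growing amplitude; with real data you would start from signed samples centered on zero (for example from np.frombuffer with dtype "<i4"). The block size of 40 matches the notebook's step of 160 bytes at 4 bytes per sample.

```python
import numpy as np

# Made-up signed signal: a sine wave whose amplitude grows linearly
n = 8000
t = np.arange(n)
signal = (t / n) * np.sin(2 * np.pi * t / 50)

# Drop any leftover samples, then arrange into rows of 40 samples each
block = 40
usable = len(signal) - len(signal) % block
blocks = signal[:usable].reshape(-1, block)

# Peak absolute amplitude per block: one point per block, like the
# subsampled plot, but without skipping over the loudest samples
envelope = np.abs(blocks).max(axis=1)
print(envelope.shape)  # (200,)
```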
This gives a good view of the volume envelope for the full file. If we were interested in a particular time range, we could plot the full set of samples for just that range.
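To zoom in on a time range, we need to turn seconds into byte offsets. That requires the sample rate and the number of bytes per frame, both given in the fmt subchunk; the values below (44100 Hz, 4 bytes per frame, samples starting at byte 78) are assumptions for illustration, and byte_range is a hypothetical helper.

```python
def byte_range(t0, t1, sample_rate, bytes_per_frame, data_start):
    """Hypothetical helper: map a time window [t0, t1) in seconds to byte
    offsets into the file, aligned to whole frames."""
    first = data_start + int(t0 * sample_rate) * bytes_per_frame
    last = data_start + int(t1 * sample_rate) * bytes_per_frame
    return first, last

# Assumed parameters: 44100 Hz, 4 bytes per frame, samples starting at byte 78
start, stop = byte_range(1.0, 1.5, 44100, 4, 78)
print(start, stop)  # 176478 264678
```

The resulting offsets plug into the same comprehension as before, e.g. range(start, stop, 4), to decode every sample in the window.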