This notebook contains code examples from Chapter 8: Filtering and Convolution
Copyright 2015 Allen Downey
# Get thinkdsp.py
import os
if not os.path.exists('thinkdsp.py'):
!wget https://github.com/AllenDowney/ThinkDSP/raw/master/code/thinkdsp.py
import numpy as np
import matplotlib.pyplot as plt
from thinkdsp import decorate
# suppress scientific notation for small numbers
np.set_printoptions(precision=3, suppress=True)
As the first example, I'll look at daily closing stock prices for Facebook, from its IPO on 2012-05-18 to 2020-03-27 (note: the dataset includes only trading days )
if not os.path.exists('FB_2.csv'):
!wget https://github.com/AllenDowney/ThinkDSP/raw/master/code/FB_2.csv
import pandas as pd
df = pd.read_csv('FB_2.csv', header=0, parse_dates=[0])
df.head()
Date | Open | High | Low | Close | Adj Close | Volume | |
---|---|---|---|---|---|---|---|
0 | 2012-05-18 | 42.049999 | 45.000000 | 38.000000 | 38.230000 | 38.230000 | 573576400 |
1 | 2012-05-21 | 36.529999 | 36.660000 | 33.000000 | 34.029999 | 34.029999 | 168192700 |
2 | 2012-05-22 | 32.610001 | 33.590000 | 30.940001 | 31.000000 | 31.000000 | 101786600 |
3 | 2012-05-23 | 31.370001 | 32.500000 | 31.360001 | 32.000000 | 32.000000 | 73600000 |
4 | 2012-05-24 | 32.950001 | 33.209999 | 31.770000 | 33.029999 | 33.029999 | 50237200 |
df.tail()
Date | Open | High | Low | Close | Adj Close | Volume | |
---|---|---|---|---|---|---|---|
1972 | 2020-03-23 | 149.660004 | 152.309998 | 142.250000 | 148.100006 | 148.100006 | 29830800 |
1973 | 2020-03-24 | 155.210007 | 161.309998 | 152.570007 | 160.979996 | 160.979996 | 30440400 |
1974 | 2020-03-25 | 158.919998 | 162.990005 | 153.059998 | 156.210007 | 156.210007 | 35184300 |
1975 | 2020-03-26 | 158.250000 | 164.000000 | 157.020004 | 163.339996 | 163.339996 | 26556800 |
1976 | 2020-03-27 | 158.199997 | 160.089996 | 154.750000 | 156.789993 | 156.789993 | 24861900 |
Extract the close prices and days since start of series:
close = df['Close']
dates = df['Date']
days = (dates - dates[0]) / np.timedelta64(1,'D')
Make a window to compute a 30-day moving average and convolve the window with the data. The valid
flag means the convolution is only computed where the window completely overlaps with the signal.
M = 30
window = np.ones(M)
window /= sum(window)
smoothed = np.convolve(close, window, mode='valid')
smoothed_days = days[M//2: len(smoothed) + M//2]
Plot the original and smoothed signals.
plt.plot(days, close, color='gray', alpha=0.6, label='daily close')
plt.plot(smoothed_days, smoothed, color='C1', alpha=0.6, label='30 day average')
decorate(xlabel='Time (days)', ylabel='Price ($)')
Generate a 440 Hz sawtooth signal.
from thinkdsp import SawtoothSignal
signal = SawtoothSignal(freq=440)
wave = signal.make_wave(duration=1.0, framerate=44100)
wave.make_audio()
Make a moving average window.
window = np.ones(11)
window /= sum(window)
plt.plot(window)
decorate(xlabel='Index')
Plot the wave.
segment = wave.segment(duration=0.01)
segment.plot()
decorate(xlabel='Time (s)')
Pad the window so it's the same length as the signal, and plot it.
def zero_pad(array, n):
"""Extends an array with zeros.
array: NumPy array
n: length of result
returns: new NumPy array
"""
res = np.zeros(n)
res[:len(array)] = array
return res
N = len(segment)
padded = zero_pad(window, N)
plt.plot(padded)
decorate(xlabel='Index')
Apply the window to the signal (with lag=0).
prod = padded * segment.ys
np.sum(prod)
np.float64(-0.9001814882032255)
Compute a convolution by rolling the window to the right.
smoothed = np.zeros(N)
rolled = padded.copy()
for i in range(N):
smoothed[i] = sum(rolled * segment.ys)
rolled = np.roll(rolled, 1)
plt.plot(rolled)
decorate(xlabel='Index')
Plot the result of the convolution and the original.
from thinkdsp import Wave
segment.plot(color='gray')
smooth = Wave(smoothed, framerate=wave.framerate)
smooth.plot()
decorate(xlabel='Time(s)')
Compute the same convolution using numpy.convolve
.
segment.plot(color='gray')
ys = np.convolve(segment.ys, window, mode='valid')
smooth2 = Wave(ys, framerate=wave.framerate)
smooth2.plot()
decorate(xlabel='Time(s)')
Let's see what's happening in the frequency domain.
Compute the smoothed wave using np.convolve
, which is much faster than my version above.
convolved = np.convolve(wave.ys, window, mode='same')
smooth = Wave(convolved, framerate=wave.framerate)
smooth.make_audio()
Plot spectrums of the original and smoothed waves:
spectrum = wave.make_spectrum()
spectrum.plot(color='gray')
spectrum2 = smooth.make_spectrum()
spectrum2.plot()
decorate(xlabel='Frequency (Hz)', ylabel='Amplitude')
For each harmonic, compute the ratio of the amplitudes before and after smoothing.
amps = spectrum.amps
amps2 = spectrum2.amps
ratio = amps2 / amps
ratio[amps<280] = 0
plt.plot(ratio)
decorate(xlabel='Frequency (Hz)', ylabel='Amplitude ratio')
Plot the ratios again, but also plot the FFT of the window.
padded = zero_pad(window, len(wave))
dft_window = np.fft.rfft(padded)
plt.plot(np.abs(dft_window), color='gray', label='DFT(window)')
plt.plot(ratio, label='amplitude ratio')
decorate(xlabel='Frequency (Hz)', ylabel='Amplitude ratio')
Let's compare boxcar and Gaussian windows.
Make the boxcar window.
boxcar = np.ones(11)
boxcar /= sum(boxcar)
Make the Gaussian window.
import scipy
gaussian = scipy.signal.windows.gaussian(M=11, std=2)
gaussian /= sum(gaussian)
Plot the two windows.
plt.plot(boxcar, label='boxcar')
plt.plot(gaussian, label='Gaussian')
decorate(xlabel='Index')
Convolve the square wave with the Gaussian window.
ys = np.convolve(wave.ys, gaussian, mode='same')
smooth = Wave(ys, framerate=wave.framerate)
spectrum2 = smooth.make_spectrum()
Compute the ratio of the amplitudes.
amps = spectrum.amps
amps2 = spectrum2.amps
ratio = amps2 / amps
ratio[amps<560] = 0
Compute the FFT of the window.
padded = zero_pad(gaussian, len(wave))
dft_gaussian = np.fft.rfft(padded)
Plot the ratios and the FFT of the window.
plt.plot(np.abs(dft_gaussian), color='0.7', label='Gaussian filter')
plt.plot(ratio, label='amplitude ratio')
decorate(xlabel='Frequency (Hz)', ylabel='Amplitude ratio')
Combine the preceding example into one big function so we can interact with it.
from thinkdsp import SquareSignal
def plot_filter(M=11, std=2):
signal = SquareSignal(freq=440)
wave = signal.make_wave(duration=1, framerate=44100)
spectrum = wave.make_spectrum()
gaussian = scipy.signal.windows.gaussian(M=M, std=std)
gaussian /= sum(gaussian)
ys = np.convolve(wave.ys, gaussian, mode='same')
smooth = Wave(ys, framerate=wave.framerate)
spectrum2 = smooth.make_spectrum()
# plot the ratio of the original and smoothed spectrum
amps = spectrum.amps
amps2 = spectrum2.amps
ratio = amps2 / amps
ratio[amps<560] = 0
# plot the same ratio along with the FFT of the window
padded = zero_pad(gaussian, len(wave))
dft_gaussian = np.fft.rfft(padded)
plt.plot(np.abs(dft_gaussian), color='gray', label='Gaussian filter')
plt.plot(ratio, label='amplitude ratio')
decorate(xlabel='Frequency (Hz)', ylabel='Amplitude ratio')
plt.show()
Try out different values of M
and std
.
from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets
slider = widgets.IntSlider(min=2, max=100, value=11)
slider2 = widgets.FloatSlider(min=0, max=20, value=2)
interact(plot_filter, M=slider, std=slider2);
interactive(children=(IntSlider(value=11, description='M', min=2), FloatSlider(value=2.0, description='std', m…
Let's use the Convolution theorem to compute convolutions using FFT.
I'll use the Facebook data again, and smooth it using np.convolve
and a 30-day Gaussian window.
I ignore the dates and treat the values as if they are equally spaced in time.
window = scipy.signal.windows.gaussian(M=30, std=6)
window /= window.sum()
smoothed = np.convolve(close, window, mode='valid')
len(close), len(smoothed)
(1977, 1948)
Plot the original and smoothed data.
plt.plot(close, color='gray')
plt.plot(smoothed)
decorate(xlabel='Time (days)', ylabel='Price ($)')
Pad the window and compute its FFT.
N = len(close)
padded = zero_pad(window, N)
fft_window = np.fft.fft(padded)
plt.plot(np.abs(fft_window))
decorate(xlabel='Index')
Apply the convolution theorem.
fft_signal = np.fft.fft(close)
smoothed2 = np.fft.ifft(fft_signal * fft_window)
M = len(window)
smoothed2 = smoothed2[M-1:]
Plot the two signals (smoothed with numpy and FFT).
plt.plot(smoothed)
plt.plot(smoothed2.real)
decorate(xlabel='Time (days)', ylabel='Price ($)')
Confirm that the difference is small.
diff = smoothed - smoothed2
np.max(np.abs(diff))
np.float64(2.0017365280339838e-13)
scipy.signal
provides fftconvolve
, which computes convolutions using FFT.
smoothed3 = scipy.signal.fftconvolve(close, window, mode='valid')
Confirm that it gives the same answer, at least approximately.
diff = smoothed - smoothed3
np.max(np.abs(diff))
np.float64(1.4210854715202004e-13)
We can encapsulate the process in a function:
def fft_convolve(signal, window):
fft_signal = np.fft.fft(signal)
fft_window = np.fft.fft(window)
return np.fft.ifft(fft_signal * fft_window)
And confirm that it gives the same answer.
smoothed4 = fft_convolve(close, padded)[M-1:]
len(smoothed4)
1948
diff = smoothed - smoothed4
np.max(np.abs(diff))
np.float64(2.0017365280339838e-13)
We can also use the convolution theorem to compute autocorrelation functions.
Compute autocorrelation using numpy.correlate
:
corrs = np.correlate(close, close, mode='same')
corrs[:7]
array([10515607.461, 10533254.812, 10550927.762, 10570279.165, 10589752.032, 10609151.979, 10628488.485])
Compute autocorrelation using my fft_convolve
. The window is a reversed copy of the signal. We have to pad the window and signal with zeros and then select the middle half from the result.
def fft_autocorr(signal):
N = len(signal)
signal = zero_pad(signal, 2*N)
window = np.flipud(signal)
corrs = fft_convolve(signal, window)
corrs = np.roll(corrs, N//2+1)[:N]
return corrs
Test the function.
corrs2 = fft_autocorr(close)
corrs2[:7]
array([10515607.461-0.j, 10533254.812-0.j, 10550927.762+0.j, 10570279.165-0.j, 10589752.032-0.j, 10609151.979-0.j, 10628488.485-0.j])
Plot the results.
lags = np.arange(N) - N//2
plt.plot(lags, corrs, color='gray', alpha=0.5, label='np.convolve')
plt.plot(lags, corrs2.real, color='C1', alpha=0.5, label='fft_convolve')
decorate(xlabel='Lag', ylabel='Correlation')
len(corrs), len(corrs2)
(1977, 1977)
Confirm that the difference is small.
diff = corrs - corrs2.real
np.max(np.abs(diff))
np.float64(2.9802322387695312e-08)