Kalman filter exposed

State space
Kalman filter
Author

Andrej Muhic

Published

June 5, 2024

Kalman filter

The canonical example of a Kalman filter is the kinematic motion of a car, where the state consists of position, velocity and potentially acceleration and other variables. The noise is traditionally induced by random forces. The kinematic equations induce the prior on the state and thus on the position of the car. On top of that we are able to observe the position with some random noise, for example by GPS; this observation serves as a correction of the state distribution and induces the posterior. The cleanest setup is Bayesian. Traditionally the noise components are constant and Gaussian. We will not add a control part, which would otherwise be the known acceleration induced by known forces on the car, robot, etc.

  • \[\text{measurement equation} \; \; y_t = F s_t + \epsilon_t,\;\; \text{where } \epsilon_t \sim N(0, \Omega_\epsilon) \text{ and } F, \Omega_\epsilon \text{ are known} \tag{1}\]
  • \[\text{transition equation} \; \; s_t = G s_{t-1} + \eta_t,\;\; \text{where } \eta_t \sim N(0, \Omega_\eta) \text{ and } G, \Omega_\eta \text{ are known} \tag{2}\]
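
For concreteness, equations (1) and (2) can be simulated directly. Below is a minimal sketch with made-up matrices (a two-state, one-observation toy system; the matrices and noise scales are illustrative, not the car example):

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up model matrices for a 2-state, 1-observation toy system
G = np.array([[1.0, 0.1],
              [0.0, 1.0]])
F = np.array([[1.0, 0.0]])
Omega_eta = 0.01 * np.eye(2)   # transition noise covariance
Omega_eps = 0.25 * np.eye(1)   # measurement noise covariance

s = np.zeros(2)
ys = []
for _ in range(50):
    # transition equation (2): s_t = G s_{t-1} + eta_t
    s = G @ s + rng.multivariate_normal(np.zeros(2), Omega_eta)
    # measurement equation (1): y_t = F s_t + eps_t
    ys.append(F @ s + rng.multivariate_normal(np.zeros(1), Omega_eps))
ys = np.asarray(ys)  # shape (50, 1): the observed history Y_t
```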

Let us define: \[\text{full history of observations} \; \; \pmb{Y}_t = \left(y_1, y_2, \dots, y_{t-1}, y_{t}\right) \tag{3}\] All noise components are Gaussian, thus the joint distribution and all marginals and conditional marginals are Gaussian too. This enables simple computation of the posterior. We want to find \(P(s_t|\pmb{Y}_t).\)

Bayesian interpretation

  • \[P(y_t|s_t, \pmb{Y}_{t-1}) = \frac{P(y_t, s_t, \pmb{Y}_{t-1})}{P(s_t, \pmb{Y}_{t-1})}\]
  • \[P(s_t|\pmb{Y}_{t-1}) = \frac{P(s_t, \pmb{Y}_{t-1})}{P(\pmb{Y}_{t-1})}\]
  • \[P(y_t, \pmb{Y}_{t-1}) = \int P(s_t, y_t|\pmb{Y}_{t-1}) \;\texttt{d} s_t \times P(\pmb{Y}_{t-1})\]
  • \[P(y_t| \pmb{Y}_{t-1}) = \int P(s_t, y_t|\pmb{Y}_{t-1}) \;\texttt{d} s_t \]
  • \[P(s_t|\pmb{Y}_t) = P(s_t|y_t, \pmb{Y}_{t-1}) = \frac{P(s_t, y_t, \pmb{Y}_{t-1})}{P(y_t, \pmb{Y}_{t-1})} = \frac{P(y_t|s_t, \pmb{Y}_{t-1}) \times P(s_t|\pmb{Y}_{t-1})}{P(y_t| \pmb{Y}_{t-1})}\] Let us put this together:
  • The process is initialized by specifying initial \(s_0\) and \(\Sigma_0.\)
  • \[s_{t-1}|\pmb{Y}_{t-1} \sim N(\widehat{s}_{t-1}, \Sigma_{t-1}),\] where \(\widehat{s}_{t-1}\) is the mean of \(s_{t-1}.\)
  • Our prior from the transition is: \[s_t|\pmb{Y}_{t-1} \sim N(G \widehat{s}_{t-1}, R_t), \] where \[R_t = G\Sigma_{t-1}G^T + \Omega_\eta.\]
  • From observation we get: \[y_t|s_t, \pmb{Y}_{t-1} \sim N(Fs_t, \Omega_\epsilon).\]
  • Thus the posterior is of the form: \[P(s_t|\pmb{Y}_t) = \frac{P(y_t|s_t, \pmb{Y}_{t-1}) \times P(s_t|\pmb{Y}_{t-1})}{P(y_t| \pmb{Y}_{t-1})}.\] It is kind of nasty to compute the posterior by hand but we know it is a Gaussian distribution again!
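
For intuition, here is the scalar case \(G = F = 1\) worked numerically (all numbers are made up): the posterior precision is the sum of the prior and observation precisions, the posterior mean is the precision-weighted average, and this agrees with the Kalman-gain form.

```python
import numpy as np

# Scalar toy model: G = F = 1, all numbers made up for illustration
G, F = 1.0, 1.0
Omega_eta, Omega_eps = 0.5, 0.25
s_hat, Sigma = 0.0, 1.0   # posterior mean and variance from step t-1
y = 1.2                   # new observation

# Prior from the transition: s_t | Y_{t-1} ~ N(G s_hat, R)
R = G * Sigma * G + Omega_eta
# Posterior via Bayes: precisions add, means are precision-weighted
post_var = 1.0 / (1.0 / R + F * F / Omega_eps)
post_mean = post_var * (G * s_hat / R + F * y / Omega_eps)
# The same posterior in Kalman-gain form
K = R * F / (F * R * F + Omega_eps)
assert np.isclose(post_mean, G * s_hat + K * (y - F * G * s_hat))
assert np.isclose(post_var, R - K * (F * R * F + Omega_eps) * K)
```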

We will borrow a result that will greatly simplify this. Let \(x, y\) be jointly Gaussian: \[\begin{bmatrix} x \\ y\end{bmatrix} \sim N\left(\begin{bmatrix} \mu_x \\ \mu_y\end{bmatrix}, \begin{bmatrix}A & C\\C^T& B\end{bmatrix}\right) = N\left(\begin{bmatrix} \mu_x \\ \mu_y\end{bmatrix}, \begin{bmatrix}\widetilde{A} & \widetilde{C}\\\widetilde{C}^T& \widetilde{B}\end{bmatrix}^{-1}\right),\] then the marginal distribution of \(x\) and the conditional distribution of \(x|y\) are: \[x \sim N(\mu_x, A), \text{ and } x|y \sim N(\mu_x + CB^{-1}(y-\mu_y), A-CB^{-1}C^T)\] or \[x|y \sim N(\mu_x - \widetilde{A}^{-1}\widetilde{C}(y-\mu_y), \widetilde{A}^{-1}).\] How does this help us? Let us write in block form: \[\begin{bmatrix} s_t \\ y_t \end{bmatrix} | \pmb{Y}_{t-1}= \begin{bmatrix} G & I & 0\\ FG & F & I \end{bmatrix} \begin{bmatrix} s_{t-1} \\ \eta_t \\ \epsilon_t \end{bmatrix} | \pmb{Y}_{t-1}\] We also know: \[\begin{bmatrix} s_{t-1} \\ \eta_t \\ \epsilon_t \end{bmatrix} | \pmb{Y}_{t-1} = \begin{bmatrix}s_{t-1}|\pmb{Y}_{t-1} \\ \eta_t \\ \epsilon_t \end{bmatrix} \sim N\left(\begin{bmatrix} \widehat{s}_{t-1}\\ 0 \\0\end{bmatrix}, \begin{bmatrix} \Sigma_{t-1} & 0 & 0\\ 0 & \Omega_{\eta} & 0\\ 0 & 0 & \Omega_{\epsilon}\end{bmatrix}\right) \] We do block multiplication and use the fact that \(\texttt{var}(M v) = M \texttt{var}(v) M^T\) to obtain: \[\begin{bmatrix} s_t \\ y_t \end{bmatrix} | \pmb{Y}_{t-1} \sim N\left(\begin{bmatrix} G \widehat{s}_{t-1} \\ FG \widehat{s}_{t-1}\end{bmatrix}, \begin{bmatrix} G & I & 0\\ FG & F & I \end{bmatrix} \begin{bmatrix} \Sigma_{t-1} & 0 & 0\\ 0 & \Omega_{\eta} & 0\\ 0 & 0 & \Omega_{\epsilon} \end{bmatrix} \begin{bmatrix} G & I & 0\\ FG & F & I \end{bmatrix}^T \right).\] Let us define: \[R_t = G \Sigma_{t-1} G^T + \Omega_{\eta},\] then continue: \[\begin{bmatrix} s_t \\ y_t \end{bmatrix} | \pmb{Y}_{t-1} \sim N\left(\begin{bmatrix} G \widehat{s}_{t-1} \\ FG \widehat{s}_{t-1}\end{bmatrix}, \begin{bmatrix} R_{t} & R_{t} F^T\\ F R_{t}^T & FR_{t}F^T + \Omega_\epsilon 
\end{bmatrix} \right).\] Now we are ready to compute: \[s_t|\pmb{Y}_t = s_t|y_t, \pmb{Y}_{t-1} \sim N\left(G \widehat{s}_{t-1} + R_{t}F^T (FR_{t}F^T + \Omega_\epsilon)^{-1}(y_t - FG \widehat{s}_{t-1}) , R_t - R_t F^T (FR_tF^T+\Omega_\epsilon)^{-1} F R_t^T\right). \]
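
A quick numerical sanity check of this derivation, using randomly generated (purely illustrative) model matrices: building the joint covariance as \(M \,\texttt{var}(v)\, M^T\) reproduces the closed-form blocks, and conditioning then yields a symmetric posterior covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_s, n_y = 4, 2

def spd(n):
    # random symmetric positive-definite covariance matrix
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

# Illustrative (random) model matrices, not the car example
G = rng.standard_normal((n_s, n_s))
F = rng.standard_normal((n_y, n_s))
Sigma_prev, Omega_eta, Omega_eps = spd(n_s), spd(n_s), spd(n_y)

# Joint covariance of (s_t, y_t) | Y_{t-1} via M var(v) M^T
M = np.block([[G, np.eye(n_s), np.zeros((n_s, n_y))],
              [F @ G, F, np.eye(n_y)]])
Sigma_big = np.zeros((2 * n_s + n_y, 2 * n_s + n_y))
Sigma_big[:n_s, :n_s] = Sigma_prev
Sigma_big[n_s:2 * n_s, n_s:2 * n_s] = Omega_eta
Sigma_big[2 * n_s:, 2 * n_s:] = Omega_eps
J = M @ Sigma_big @ M.T

# The blocks match the closed forms from the text
R = G @ Sigma_prev @ G.T + Omega_eta
assert np.allclose(J[:n_s, :n_s], R)
assert np.allclose(J[:n_s, n_s:], R @ F.T)
assert np.allclose(J[n_s:, n_s:], F @ R @ F.T + Omega_eps)

# Conditioning on y_t gives the Kalman update
s_hat, y = rng.standard_normal(n_s), rng.standard_normal(n_y)
S = F @ R @ F.T + Omega_eps
K = R @ F.T @ np.linalg.inv(S)                 # Kalman gain
post_mean = G @ s_hat + K @ (y - F @ G @ s_hat)
post_cov = R - K @ S @ K.T
assert np.allclose(post_cov, post_cov.T)       # stays symmetric
```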

Simple two dimensional movement example

Let us work through an example of two-dimensional movement, where \(x\) denotes position and \(v\) velocity. The system is exposed to random forces.

\[s_t = \begin{bmatrix} x_1 \\ x_2\\ v_1 \\ v_2\end{bmatrix}.\] The transition equations are \[x_1' = x_1 + v_1 \texttt{dt} + a_1 \frac{\texttt{dt}^2}{2} \text{ and } x_2' = x_2 + v_2 \texttt{dt} + a_2 \frac{\texttt{dt}^2}{2}.\] Let us assume that all acceleration comes from independent random external forces. The \(s_t' = s_{t+1}\) is shorthand notation and acts on all vector components. This can be rewritten in matrix form as: \[ s_t' = \begin{bmatrix} x_1 \\ x_2\\ v_1 \\ v_2\end{bmatrix}' = \overbrace{\begin{bmatrix} 1 & 0 & \texttt{dt} & 0\\0 & 1 & 0 & \texttt{dt}\\0 & 0 &1 & 0 \\ 0 & 0 & 0 & 1\end{bmatrix}}^G s_t + \overbrace{\begin{bmatrix} \frac{\texttt{dt}^2}{2} & 0\\ 0 & \frac{\texttt{dt}^2}{2}\\ \texttt{dt} & 0 \\ 0 &\texttt{dt}\end{bmatrix}}^{\texttt{denote } V \texttt{ such that } \eta_{t+1} \sim N(0, VV^T)} \overbrace{\begin{bmatrix} a_1 \\ a_2 \end{bmatrix}}^{\sim N(0,I_2)}.\] Expanding all this: \[ s_{t+1} = G s_t + \eta_{t+1}, \text{ where } \eta_{t+1} \sim N(0, \Omega_\eta) \text{ and } \Omega_\eta = VV^T = \begin{bmatrix} \frac{\texttt{dt}^4}{4} & 0 & \frac{\texttt{dt}^3}{2} & 0\\ 0 & \frac{\texttt{dt}^4}{4} & 0 & \frac{\texttt{dt}^3}{2}\\ \frac{\texttt{dt}^3}{2} & 0 & \texttt{dt}^2 & 0 \\ 0 &\frac{\texttt{dt}^3}{2} & 0 &\texttt{dt}^2\end{bmatrix}.\] The observation equation is simply: \[y_{t+1} = y_t' = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}' = \overbrace{\begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 \end{bmatrix}}^F \overbrace{\begin{bmatrix} x_1 \\ x_2\\ v_1 \\ v_2\end{bmatrix}'}^{s_{t+1}} + \overbrace{\epsilon_{t+1}}^{\sim N(0, \frac{I_2}{4})}\]
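
To see the model in action, here is a minimal simulation sketch. The seed and start state are made up and the measurement noise uses the standard deviation 0.5 from the setup below, so this only mimics, and does not reproduce, the data file used later:

```python
import numpy as np

dt = 0.1
# Transition matrix G and the force-to-state map V from the text
G = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])
V = np.array([[dt**2 / 2, 0],
              [0, dt**2 / 2],
              [dt, 0],
              [0, dt]])
Omega_eta = V @ V.T  # process noise induced by unit random forces

rng = np.random.default_rng(42)
s = np.array([0.0, 0.0, 1.0, -1.0])  # start at origin, velocity (1, -1)
traj = []
for _ in range(100):
    a = rng.standard_normal(2)                 # random acceleration ~ N(0, I_2)
    s = G @ s + V @ a                          # transition equation
    y = s[:2] + 0.5 * rng.standard_normal(2)   # noisy position, sd 0.5
    traj.append(y)
traj = np.array(traj)  # shape (100, 2), analogous to the CSV data below
```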

Simple movement equation example in Python

There are several options for estimation:

Parameters and data setup

# No control input in this example; it would just be a known constant vector depending on t
import numpy as np
from numpy import linalg
import pandas as pd  # Pandas isn't necessary, but makes some output nicer
import statsmodels.api as sm
from pathlib import Path
path_csv = Path('./measurements_2d.csv')

Y_pandas = pd.read_csv(path_csv, index_col=0)
Y = Y_pandas.values


# The matrices in the dynamic model are set up as follows
omega_eta, dt, omega_eps = 1, 0.1, 0.5
G = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])
# Alternative process-noise parametrization (kept for reference, not used below)
Omega_custom = omega_eta ** 2 * np.array([[dt ** 3 / 3, dt ** 2 / 2, 0, 0],
                                          [dt ** 2 / 2, dt, 0, 0],
                                          [0, 0, dt ** 3 / 3, dt ** 2 / 2],
                                          [0, 0, dt ** 2 / 2, dt]])
Omega_eta = omega_eta ** 2 * np.array([[dt ** 4 / 4, 0, dt ** 3 / 2, 0],
                                       [0, dt ** 4 / 4, 0, dt ** 3 / 2],
                                       [dt ** 3 / 2, 0, dt ** 2, 0],
                                       [0, dt ** 3 / 2, 0, dt ** 2]])
# Matrices in the measurement model are designed as follows
F = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]])
Omega_epsilon = omega_eps ** 2 * np.eye(2)
# Starting values
# Initial state vector
s0 = np.array([[0.0, 0.0, 1.0, -1.0]]).T  # column vector
# Initial noise covariance
Sigma0 = np.eye(4)

n = 100

Manual Kalman filter (would need numerical improvements)

s = s0
Sigma = Sigma0
kf_mean = np.zeros((s.shape[0], n))
kf_Sigma = np.zeros((Sigma.shape[0], Sigma.shape[1], n))
for t in range(n):
    R_t = G @ Sigma @ G.T + Omega_eta
    S = F @ R_t @ F.T + Omega_epsilon
    K = linalg.lstsq(S.T, (R_t @ F.T).T, rcond=None)[0].T
    # K = (R_t @ F.T) @ np.linalg.pinv(S)
    s_t = G @ s
    s = s_t + K @ (Y[:, t, np.newaxis] - F @ s_t)
    Sigma = R_t - K @ S @ K.T

    kf_mean[:, t] = s.flatten()
    kf_Sigma[:, :, t] = Sigma
print(pd.DataFrame(kf_mean.T, columns=['x1', 'x2', 'dx1/dt', 'dx2/dt']))
           x1         x2    dx1/dt    dx2/dt
0   -0.281083  -0.235580  0.962081 -1.013491
1    0.100219  -0.200777  1.122475 -0.936892
2    0.228852  -0.735516  1.141854 -1.458522
3    0.379437  -0.749947  1.202244 -1.240481
4    0.587982  -0.449752  1.367730 -0.445575
..        ...        ...       ...       ...
95  11.935788  14.163066  0.888412  1.743867
96  12.036713  14.317419  0.900481  1.723859
97  12.261151  14.588231  1.034703  1.822161
98  12.322096  14.765653  0.992230  1.817373
99  12.501377  14.992161  1.072189  1.862088

[100 rows x 4 columns]

Kalman filter using statsmodels

# Now instantiate a statespace model with the data
# (data should be shaped nobs x n_variables)
kf = sm.tsa.statespace.MLEModel(pd.DataFrame(Y.T), k_states=4)
kf._state_names = ['x1', 'x2', 'dx1/dt', 'dx2/dt']
kf['design'] = F
kf['obs_cov'] = Omega_epsilon
kf['transition'] = G
kf['selection'] = np.eye(4)
kf['state_cov'] = Omega_eta

# Unfortunate difference in timing conventions
# https://www.statsmodels.org/stable/statespace.html#models-and-estimation
kf.initialize_known(G @ s0[:, 0], G @ Sigma0 @ G.T + Omega_eta)
# To perform Kalman filtering and smoothing, use:
res = kf.smooth([])
# Display filtered
print(res.states.filtered)
           x1         x2    dx1/dt    dx2/dt
0   -0.281083  -0.235580  0.962081 -1.013491
1    0.100219  -0.200777  1.122475 -0.936892
2    0.228852  -0.735516  1.141854 -1.458522
3    0.379437  -0.749947  1.202244 -1.240481
4    0.587982  -0.449752  1.367730 -0.445575
..        ...        ...       ...       ...
95  11.935788  14.163066  0.888412  1.743867
96  12.036713  14.317419  0.900481  1.723859
97  12.261151  14.588231  1.034703  1.822161
98  12.322096  14.765653  0.992230  1.817373
99  12.501377  14.992161  1.072189  1.862088

[100 rows x 4 columns]

Manual verification of first step using pymc

import pymc as pm

with pm.Model() as model_multi:  # model specifications in PyMC are wrapped in a with-statement
    ss = pm.MvNormal("ss", mu=G@s0[:, 0], cov=G@Sigma0@G.T + Omega_eta, transform=None)
    # Define likelihood
    likelihood = pm.MvNormal("y", mu=F @ ss, cov=Omega_epsilon, observed=Y[:, [0]].T)

    # Inference!
    # draw 3000 posterior samples using NUTS sampling
    idata = pm.sample(draws=3000, chains=4)
    # find MAP
    res_MAP = pm.find_MAP(maxeval=50000, return_raw=True)
print(pd.DataFrame(np.array([res_MAP[0]['ss']]), columns=['x1', 'x2', 'dx1/dt', 'dx2/dt']))
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [ss]
Sampling 4 chains for 1_000 tune and 3_000 draw iterations (4_000 + 12_000 draws total) took 6 seconds.

         x1       x2    dx1/dt    dx2/dt
0 -0.281083 -0.23558  0.962081 -1.013491