Recursive Least Squares Estimation

So, we’ve talked about least squares estimation and how we can weight that estimation based on our certainty in our measurements. Now let’s talk about doing this online, rolling in each subsequent measurement as it arrives!

Things begin to get a bit more complicated now that we have an “update” step that requires us to fold in new information, but we are getting closer and closer to the more complicated Kalman filter. In order to fold in this newly acquired data, we introduce a matrix, K_k, known as the estimator gain matrix (or, if you’re dealing with a Kalman filter, the Kalman gain matrix). Either way, it’s the same thing: just a weighting of how confident we are in our measurements based on their statistics.

We will therefore amend our state equations as follows:

\begin{aligned}  1) \displaystyle \qquad y_k &= H_k x + v_k\\  \hat{x}_k &= \hat{x}_{k-1} + K_k \left ( y_k - H_k \hat{x}_{k-1} \right ) \\  \end{aligned}  

The difference this time is that we compute \hat{x}_k based on the previous estimate, \hat{x}_{k-1}, and the measurement y_k.

For the sake of expediency I’m going to skip most of the derivation of this one because the matrix equations continue to get longer, so let’s just jump right to the useful part of it all.

Recursive Least Squares Estimation Algorithm

We begin by initializing the estimator as follows:

\begin{aligned}  2) \displaystyle \qquad \hat{x}_0 &= E(x)\\  P_0 &= E \left [ (x - \hat{x}_0)(x - \hat{x}_0)^T \right ] \\  \end{aligned}  

Where P_k is the estimation-error covariance. There’s a bit of a trick we play next to get the ball rolling: if we don’t know anything about x before measurements start rolling in, we set P_0 = \infty I. If we know exactly what x is before measurements roll in, we set P_0 = 0. So, basically, the less we know about x, the larger we set our estimation-error covariance.
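In code, that \infty is of course approximated with a suitably huge number. Here’s a minimal sketch in Python, assuming a hypothetical three-element state and no prior knowledge of x (the names x_hat, P, and the value 1e6 are just illustrative):

```python
import numpy as np

n = 3                  # hypothetical state dimension
x_hat = np.zeros(n)    # E(x): with no prior information, a zero-mean guess
P = 1e6 * np.eye(n)    # "P_0 = infinity * I" in practice: a very large diagonal
# If we knew x exactly beforehand, we would instead set P = np.zeros((n, n)).
```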

With the estimator initialized, let’s start chugging along. We update our estimate of x and the estimation-error covariance, P, with the following equations:

\begin{aligned}  3) \displaystyle \qquad K_k &= P_{k-1}H_k^T \left ( H_k P_{k-1} H_k^T + R_k \right )^{-1}\\  \hat{x}_k &= \hat{x}_{k-1} + K_k \left ( y_k - H_k \hat{x}_{k-1} \right) \\ P_k &= \left (I - K_kH_k \right ) P_{k-1} \left (I - K_kH_k \right )^T + K_kR_kK_k^T  \end{aligned}  

Where R_k is the covariance of our zero-mean noise process, v_k. In the real world this is literally just the variance in your measurement variable when your state is held constant. So, if our output measurement variable were, for example, the angular velocity of a table saw blade, we could run the table saw at constant velocity, collect our data, and determine R to be the variance in our angular velocity measurements.
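Here’s a rough sketch of one pass through equation set 3 in Python; the function name rls_update and the variable names are my own, and y_k, H_k, and R_k are assumed to be NumPy arrays of compatible shape:

```python
import numpy as np

def rls_update(x_hat, P, y_k, H_k, R_k):
    """One recursive least squares step (equation set 3)."""
    # Estimator gain: K_k = P_{k-1} H_k^T (H_k P_{k-1} H_k^T + R_k)^{-1}
    K = P @ H_k.T @ np.linalg.inv(H_k @ P @ H_k.T + R_k)
    # Fold the measurement residual into the state estimate
    x_hat = x_hat + K @ (y_k - H_k @ x_hat)
    # Joseph-form covariance update: (I - K H) P (I - K H)^T + K R K^T
    I_KH = np.eye(P.shape[0]) - K @ H_k
    P = I_KH @ P @ I_KH.T + K @ R_k @ K.T
    return x_hat, P
```

Note that the covariance update here is the Joseph form from the last line of equation set 3; the shorter P_k = (I - K_k H_k) P_{k-1} is algebraically equivalent but tends to be less forgiving numerically.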

Here’s a picture I found on ResearchGate[1] that illustrates the effect of a recursive least squares estimator (black line) on measured data (blue line). This scenario shows an RLS estimator being used to smooth data from a cutting tool. Don’t worry about the red line; that’s a Bayesian RLS estimator.

Slightly Sexier Examples (Curve Fitting)

Despite what all those resistor-based examples of least squares estimation might lead you to believe, we can actually do some pretty cool things with these estimators. Take, for example, curve fitting. Let’s suppose we are measuring the altitude of a free-falling object. From basic physics we know that the underlying governing equation is a quadratic function of time:

\begin{aligned}  4) \displaystyle \qquad r = r_0 + v_0t + \frac{a}{2}t^2  \end{aligned}  

Where r is position, r_0 is the initial position, v_0 is the initial velocity, and a is the acceleration. If we then measure r at many different times and fit a quadratic to the resulting position-vs-time curve, we will have an estimate of our initial position, initial velocity, and acceleration.

In this scenario, our output equation takes the form

\begin{aligned}  5) \displaystyle \qquad y_k &= x_1 + x_2 t_k + x_3 t_k^2 + v_k \\  E(v_k^2) &= R_k \\  H_k &= \left [ 1 \, t_k \, t_k^2 \right ]  \end{aligned}  

From here, it’s just a simple matter of initializing the estimator and letting the algorithm governed by equation set 3 do its thing (or whatever other flavor of least squares estimation you prefer). A worked sketch follows below.
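To make that concrete, here’s a self-contained Python sketch under some made-up numbers (r_0 = 100 m, v_0 = 0, a = -9.81 m/s^2, and measurement noise with a standard deviation of 0.5 m); it simulates noisy altitude readings and runs equation set 3 on them:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up truth: r = r0 + v0*t + (a/2)*t^2, so the state is x = [r0, v0, a/2]
r0, v0, a = 100.0, 0.0, -9.81
sigma = 0.5                                  # measurement noise standard deviation
R = np.array([[sigma**2]])                   # R_k, a 1x1 covariance here

# Initialize with no prior knowledge of the coefficients
x_hat = np.zeros(3)
P = 1e6 * np.eye(3)

for t_k in np.linspace(0.0, 3.0, 50):
    H = np.array([[1.0, t_k, t_k**2]])       # H_k from equation set 5
    y = np.array([r0 + v0 * t_k + 0.5 * a * t_k**2 + rng.normal(0.0, sigma)])

    # Equation set 3: gain, state update, Joseph-form covariance update
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x_hat = x_hat + K @ (y - H @ x_hat)
    I_KH = np.eye(3) - K @ H
    P = I_KH @ P @ I_KH.T + K @ R @ K.T

print(x_hat)   # should land near [r0, v0, a/2] = [100, 0, -4.905]
```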

In the words of Jerry Smith, later days, amigo.

Resources

  1. Mehta, Parikshit & Mears, Laine. (2014). Adaptive control for multistage machining process scenario—bar turning with varying material properties. International Journal of Advanced Manufacturing Technology. 10.1007/s00170-014-6739-x.