Weighted Least Squares Estimation of a Stationary Linear System

I figured I’d kind of skip over this variant of the least squares estimator; after all, my goal is to talk about all the stepping stones on the way to the Kalman filter, and I’m a little impatient. But I like talking about the weighted least squares estimator because it shows that even when some of our measurements are less reliable than others, we should never discard them. If we continue to run with the beaten-to-death example of a resistor, let’s now assume we are using a couple of different multimeters to make measurements: one that is really expensive and calibrated, and one that was purchased from Amazon for 10 dollars.

The above scenario essentially boils down to us saying we have more faith in one measuring device than the other, but how do we represent this in a mathematically meaningful way? With statistics!

Let’s take the same equations from our last little talk about least squares estimation, but modify them now by actually characterizing the noise we are experiencing. If we assume that the noise on each measurement is zero-mean and independent, we can write the variance of each measurement and the measurement covariance matrix as follows:

\begin{aligned}  1) \displaystyle \qquad E(v_i^2) &= \sigma_i^2 \qquad (i=1, \dotsb, k)\\  R &= E(vv^T) \\  &= \text{diag} \left (\sigma_1^2, \dotsb , \sigma_k^2 \right )  \end{aligned}  

We then construct our cost function like we did before, but this time weight each squared residual in the sum by the inverse of that measurement’s variance. This lets us place more emphasis on measurements that are less noisy and less emphasis on measurements that are very noisy. This technique of weighting things by how much faith we have in a measurement will continue to show up as we discuss “better” estimators. The weighting results in a cost function as follows:

\begin{aligned}  2) \displaystyle \qquad J &= \frac{\epsilon_{y1}^2}{\sigma_1^2} + \dotsb + \frac{\epsilon_{yk}^2}{\sigma_k^2}\\  &= \epsilon_y^T R^{-1} \epsilon_y \end{aligned}  
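If it helps to see that the scalar sum and the matrix form really are the same thing, here’s a quick numpy sketch (the residuals and sigma values are just made-up numbers for illustration):

```python
import numpy as np

# Made-up residuals and noise standard deviations for k = 3 measurements
eps_y = np.array([0.4, -1.2, 0.7])   # residuals y - H x_hat
sigma = np.array([0.5, 2.0, 1.0])    # std dev of each measurement's noise

# R is diagonal because we assumed the measurement noises are independent
R = np.diag(sigma**2)

# Scalar form: each squared residual divided by its variance
J_scalar = np.sum(eps_y**2 / sigma**2)

# Matrix form: eps_y^T R^{-1} eps_y
J_matrix = eps_y @ np.linalg.inv(R) @ eps_y

print(J_scalar, J_matrix)  # both print the same cost
```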

Honestly, when we carry out the matrix multiplication shown above we get a pretty long and ugly equation that I don’t really care to type out, and unless you’re looking for a proof for a homework assignment it doesn’t matter much. Let’s call it good enough to say that when we again take the partial derivative of our cost function with respect to \hat{x}, set it equal to zero, and solve, we arrive at

\begin{aligned}  3) \displaystyle \qquad \hat{x} = \left ( H^TR^{-1}H \right )^{-1}H^TR^{-1}y \end{aligned}  
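In code, equation 3 is only a couple of lines. Here’s a minimal numpy sketch; the function name and the choice to pass per-measurement standard deviations (rather than R itself) are my own conventions for illustration:

```python
import numpy as np

def weighted_least_squares(H, y, sigma):
    """Compute x_hat = (H^T R^-1 H)^-1 H^T R^-1 y.

    H     : (k, n) measurement matrix
    y     : (k,)   measurements
    sigma : (k,)   noise standard deviation of each measurement
    """
    R_inv = np.diag(1.0 / sigma**2)  # R is diagonal, so its inverse is too
    # Solve the normal equations rather than explicitly inverting H^T R^-1 H
    return np.linalg.solve(H.T @ R_inv @ H, H.T @ R_inv @ y)
```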

It goes without saying that since we are inverting R in equation 3, R must be nonsingular; in other words, we have to assume that each of our measurements is affected by at least some amount of noise (every \sigma_i^2 is greater than zero).

More Examples!

If we return to our example of estimating the resistance of a resistor, where H is just a k \times 1 column of ones, we can basically just employ equation 3 directly, which ends up being

\begin{aligned}  4) \displaystyle \qquad \hat{x} &= \left ( H^TR^{-1}H \right )^{-1}H^TR^{-1}y \\  &= \left( \sum_{i=1}^{k} \frac{1}{\sigma_i^2} \right )^{-1} \left (\frac {y_1}{\sigma_1^2} + \dotsb + \frac{y_k}{\sigma_k^2} \right ) \end{aligned}  

This time, instead of just the average of the measurements, our solution becomes a weighted average of the measurements, where each measurement is weighted by the inverse of its variance, i.e. by how much we trust it.
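To make the two-multimeter scenario concrete, here’s a small simulation sketch. The resistance value and the two standard deviations are assumptions I picked for illustration, not numbers from anywhere in particular:

```python
import numpy as np

rng = np.random.default_rng(0)

x_true = 100.0                         # "true" resistance in ohms (made up)
sigma = np.array([0.1, 5.0])           # assumed std devs: calibrated meter vs. cheap meter
y = x_true + rng.normal(0.0, sigma)    # one noisy reading from each meter

# Plain average treats both meters as equally trustworthy
x_avg = np.mean(y)

# Equation 4: weight each reading by 1 / sigma_i^2
weights = 1.0 / sigma**2
x_wls = np.sum(weights * y) / np.sum(weights)

print(f"plain average: {x_avg:.3f}   weighted estimate: {x_wls:.3f}")
```

The weighted estimate leans almost entirely on the calibrated meter, but the cheap meter still contributes a little bit of information instead of being thrown away.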

Boom.