Least squares estimators are typically used to determine a constant on the basis of several noisy measurements of that constant. Mathematically I think they’re pretty beautiful, and their derivation leads us to the ubiquitous Moore-Penrose pseudoinverse.
Let’s just jump right in!
Suppose we want to determine a “best” estimate of some constant $n$-element state vector, $x$, given a $k$-element noisy measurement vector, $y$. Typically the estimate of something is denoted as that “something” with a hat over it, so we will call our estimated vector $\hat{x}$. Let’s assume that the process governing this system is linear (otherwise our lives have just gotten a whole lot harder), such that each element of the measurement vector, $y$, is a linear combination of the elements of the state vector, $x$, with some noise added on top:

$$
y_i = H_{i1} x_1 + H_{i2} x_2 + \cdots + H_{in} x_n + v_i, \qquad i = 1, \ldots, k \tag{1}
$$
This can be written in matrix form simply as:

$$
y = Hx + v \tag{2}
$$

where $H$ is the $k \times n$ output matrix and $v$ is the measurement noise vector.
Now, let’s introduce something called the measurement residual, which is simply the difference between the noisy measurement (the actual system output) and the estimated output:

$$
\epsilon_y = y - H\hat{x} \tag{3}
$$
As we can see from equation 3, the smaller the measurement residual becomes (in some sense), the better our estimate is. Simply by inspection, it makes sense that if the measurement residual were zero, we would have done a pretty good job of estimating $x$.
Since we are dealing with least squares estimation, the above “some sense” is a least squares sense, so we will construct a cost function, $J$, that is the sum of squares of our measurement residuals as follows:

$$
J = \epsilon_{y_1}^2 + \epsilon_{y_2}^2 + \cdots + \epsilon_{y_k}^2 = \epsilon_y^T \epsilon_y \tag{4}
$$
Now, let’s substitute equation 3 into equation 4 to obtain the cost function in terms of our measurement vector, $y$, the output matrix, $H$, and the estimated state vector, $\hat{x}$:

$$
J = (y - H\hat{x})^T (y - H\hat{x}) = y^T y - \hat{x}^T H^T y - y^T H \hat{x} + \hat{x}^T H^T H \hat{x} \tag{5}
$$
Cool, now that we have our cost function constructed, the next step is to minimize it with respect to $\hat{x}$. Remembering back to calc 3, we know that in order to minimize a multivariable function with respect to one of its variables we must compute the partial derivative with respect to that variable and set it equal to zero:

$$
\frac{\partial J}{\partial \hat{x}} = -2 y^T H + 2 \hat{x}^T H^T H = 0 \tag{6}
$$
If we gently massage equation 6 and solve for our state estimate, we obtain:

$$
\hat{x} = \left(H^T H\right)^{-1} H^T y \tag{7}
$$
where $\left(H^T H\right)^{-1} H^T$ is the left pseudoinverse of $H$. The pseudoinverse exists if $k \ge n$ and $H$ is full rank. This just means that we need at least as many measurements as there are state variables, and the measurements must be linearly independent.
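To make equation 7 concrete, here’s a minimal sketch in Python using NumPy, with a made-up output matrix and measurement vector purely for illustration:

```python
import numpy as np

# Made-up example: k = 4 measurements of an n = 2 element state vector.
H = np.array([[1.0,  0.0],
              [0.0,  1.0],
              [1.0,  1.0],
              [1.0, -1.0]])            # k x n output matrix (full rank, k >= n)
y = np.array([1.1, 1.9, 3.2, -0.8])    # noisy measurement vector

# Equation 7: x_hat = (H^T H)^(-1) H^T y, the left pseudoinverse applied to y.
x_hat = np.linalg.inv(H.T @ H) @ H.T @ y

# The same estimate via the library routine.
x_hat_lstsq, *_ = np.linalg.lstsq(H, y, rcond=None)

print(x_hat)        # estimate via the explicit pseudoinverse
print(x_hat_lstsq)  # identical estimate via np.linalg.lstsq
```

In real code you’d generally prefer `np.linalg.lstsq` (or `np.linalg.pinv`) over forming $H^T H$ and inverting it explicitly, since that’s better behaved numerically, but the explicit version mirrors the derivation above.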
Example Time!
The example I always see given for this simple problem is estimating the resistance of an unmarked resistor based on $k$ measurements from a multimeter. In this scenario our state variable, $x$, is the resistance of the resistor, and $y$ holds our noisy measurements from the multimeter:

$$
y_i = x + v_i, \qquad i = 1, \ldots, k
$$
which again just reduces to the simple matrix equation

$$
y = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} x + v
$$
What the hell, our output matrix isn’t there! Well, it is, it’s just a $k \times 1$ matrix filled with 1’s: $H = \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}^T$. Now, if we just plug this into equation 7 from above, our optimal (in the least squares sense) estimate of the state $x$ is simply

$$
\hat{x} = \left(H^T H\right)^{-1} H^T y = \frac{1}{k}\left(y_1 + y_2 + \cdots + y_k\right)
$$
In this almost stupidly simple example, the least squares estimate reduces to exactly what a normal person would have done anyway: the average of the noisy measurements.
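As a quick sanity check, here’s a minimal sketch of the resistor example (the true resistance and noise level are made up for illustration) confirming that equation 7 with a column of ones really does reduce to the sample mean:

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up scenario: a 470-ohm resistor measured k times with a noisy multimeter.
true_resistance = 470.0                               # ohms (assumed)
k = 20                                                # number of measurements
y = true_resistance + rng.normal(0.0, 5.0, size=k)    # noisy measurements

# Output matrix: a k x 1 column of ones.
H = np.ones((k, 1))

# Equation 7: x_hat = (H^T H)^(-1) H^T y
x_hat = (np.linalg.inv(H.T @ H) @ H.T @ y).item()

print(x_hat)      # least squares estimate
print(y.mean())   # the plain average, identical to x_hat
assert np.isclose(x_hat, y.mean())
```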