In the previous document we assumed that the best linear estimate for the
state, x_{j}, was given by

where

The question to be answered is: Can we prove that the second statement is true?

To estimate the state we can use only the three quantities that we know: the previous estimate, the current input, and the current measured output. We combine these three quantities to form a linear estimate of the state:

where a_{j} and b_{j}
are two unknowns to be chosen to minimize the error between the value of the
state and its estimate. In other words, we want to minimize the expected
value of the error e_{j} with respect to the variables a_{j}
and b_{j}.

To do the minimization with respect to each variable we simply differentiate and set the result to zero. At this point we will only try to fin

which can be rewritten

These last two expressions are often referred to as the orthogonality conditions; i.e., the error is orthogonal to the previous estimated state, the current input and the current value of the measured output.
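The orthogonality conditions can be checked numerically. The sketch below is only an illustration under stated assumptions: it assumes the quantity being minimized is the mean squared error, it assumes a scalar model of the form x = a*x_prev + b*u + w with measurement y = h*x + z, and every constant and variable name in it is hypothetical. It fits the best linear combination of the previous estimate, the input, and the measured output to the true state, then confirms that the resulting error has zero sample correlation with each of the three quantities.

```python
import random


def solve_linear(A, rhs):
    """Solve A c = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    c = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * c[k] for k in range(i + 1, n))
        c[i] = (M[i][n] - s) / M[i][i]
    return c


def mean(values):
    return sum(values) / len(values)


rng = random.Random(0)
a, b, h = 0.9, 0.5, 2.0  # hypothetical model constants (assumptions)
prev_est, inputs, outputs, states = [], [], [], []
for _ in range(5000):
    x_prev = rng.gauss(0.0, 1.0)                   # true previous state
    prev_est.append(x_prev + rng.gauss(0.0, 0.3))  # imperfect previous estimate
    u = rng.uniform(-1.0, 1.0)                     # current input
    x = a * x_prev + b * u + rng.gauss(0.0, 0.2)   # current state with process noise
    inputs.append(u)
    states.append(x)
    outputs.append(h * x + rng.gauss(0.0, 0.4))    # measurement with noise

# Setting the derivative of the mean squared error to zero gives
# one linear equation per coefficient (the normal equations).
regressors = [prev_est, inputs, outputs]
A = [[mean([p * q for p, q in zip(Ri, Rj)]) for Rj in regressors]
     for Ri in regressors]
rhs = [mean([r * x for r, x in zip(R, states)]) for R in regressors]
coeffs = solve_linear(A, rhs)

# Error of the fitted linear estimate, sample by sample.
errors = [x - sum(c * R[i] for c, R in zip(coeffs, regressors))
          for i, x in enumerate(states)]

# Sample version of E[error * quantity] for each of the three quantities.
orthogonality = [mean([e * r for e, r in zip(errors, R)]) for R in regressors]
print(orthogonality)
```

Each printed value should be zero up to floating-point rounding, because the coefficients are exactly the ones that make the derivative of the mean squared error vanish.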

Let's use the first condition to find an expression for a_{j}
that minimizes the expected value of the error. If we add and subtract a_{j}x_{j-1}
from the equation (why we do this will become clear shortly), we get:

Now we can use the facts that

to write

Note that because of the orthogonality relationships the first term on the right can be rewritten as

We also know that the previous estimate is uncorrelated with the current value of the measurement noise:

So we can simplify the equation to the following

This is a complicated expression that we can use a bit later,
but first we need to derive one more expression. By following the same
sequence of steps as is done above (but starting with the *second*
equation in which we set the derivative to zero), it is easily shown that

We can rewrite the last two equations

or, in matrix form

For a matrix equation

we know that either

So for the matrix equation above, either

The second equation can be written as

If the last equation is true, it should be true for any
input. If the input is a constant such that u_{j}=c,
then

However

is only true if M and N are independent, and the value of the state and its estimate are not independent, so the first condition must be true. In other words,

(This last argument seems weak to me, but I haven't worked out the details. If you have a more rigorous argument, please email me.)
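As a small aid to the argument above, and assuming the relation in question is the product rule E[MN] = E[M]E[N], the following sketch compares the two sides for two toy discrete distributions. The distributions and names are hypothetical and chosen only so that the expectations can be computed exactly. Strictly speaking, the product rule holds whenever M and N are uncorrelated; independence is a stronger condition that guarantees it.

```python
from itertools import product


def product_rule_gap(pairs):
    """Return E[MN] - E[M]E[N] for equally likely (m, n) pairs."""
    e_mn = sum(m * n for m, n in pairs) / len(pairs)
    e_m = sum(m for m, _ in pairs) / len(pairs)
    e_n = sum(n for _, n in pairs) / len(pairs)
    return e_mn - e_m * e_n


values = [-1.0, 1.0]

# Independent case: every combination of M and N is equally likely.
independent_pairs = list(product(values, values))

# Dependent case: N always equals M.
dependent_pairs = [(m, m) for m in values]

gap_independent = product_rule_gap(independent_pairs)
gap_dependent = product_rule_gap(dependent_pairs)
print(gap_independent, gap_dependent)
```

The gap vanishes in the independent case and does not vanish when N is tied to M, which is the situation for a state and its estimate.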

So

Substituting this into our original equation for the estimate of x, we get

Now recall Equation 6 from the previous document

**Equation 6**

From the previous document we know that the a priori estimate of the state is given by

and if we let

we can rewrite our last equation (at the end of the previous section)

which matches Equation 6.
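The final rewriting step can be checked with a few lines of arithmetic. This is a sketch under assumptions, not a restatement of the equations in this document: it assumes the a priori estimate has the usual scalar form a*x_prev_est + b*u, that Equation 6 is the usual "a priori estimate plus gain times residual" update, and that the derived estimate weights the a priori estimate by (1 - h*k). All function names and numbers are hypothetical.

```python
def a_priori(x_prev_est, u, a, b):
    """A priori estimate, assuming the scalar model x = a*x_prev + b*u."""
    return a * x_prev_est + b * u


def update_gain_form(x_prior, y, k, h):
    """Assumed form of Equation 6: correct the prior by k times the residual."""
    return x_prior + k * (y - h * x_prior)


def update_weighted_form(x_prior, y, k, h):
    """Assumed form of the derived estimate: weighted sum of prior and measurement."""
    return (1.0 - h * k) * x_prior + k * y


a, b, h = 0.9, 0.5, 2.0  # hypothetical model constants
k = 0.3                  # arbitrary gain; the identity holds for any k
x_prior = a_priori(x_prev_est=1.2, u=0.4, a=a, b=b)
y = 2.9                  # hypothetical measurement

gain_form = update_gain_form(x_prior, y, k, h)
weighted_form = update_weighted_form(x_prior, y, k, h)
print(gain_form, weighted_form)
```

The two forms agree for any choice of gain, so the comparison says nothing about which k_{j} is best; it only confirms that the two ways of writing the estimate are the same.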

We have shown that the Kalman filter represents the optimal
linear filter. The other document goes on to derive the optimal value
for k_{j}.