**What are the prerequisites for Math 53?**

Any version of Linear Algebra and any version of Multivariate Calculus, plus some familiarity with probability.

Math 63 (analysis) is NOT required. Economics is NOT required.

**What’s that about probability?**

Probability is not a formal prerequisite, but we’ll proceed on the assumption that everyone is familiar with the basics. We’ll review each concept when it arises, but not enough to let you catch up if you’re totally clueless at the start.

Stat 11, Econ 31, or AP statistics in high school would be enough. Those courses cover mostly statistics, rather than probability. It is the probability part that we will need.

**What should I read?**

Here are some references. You can check them to see whether you know enough probability already, or read them to learn the material. Pick your favorite book --- they all cover the same material, so you don’t need to read them all!

Grinstead and Snell, *Introduction
to Probability,* 2^{nd} revised ed.,

Chapters 1-6 except sections 3.3 and 4.3.

Moore and McCabe, *Introduction
to the Practice of Statistics, *5^{th} ed.:

Sections 1.1, 1.2, 1.3, 2.1, 2.2, 4.1 through 4.4. Be comfortable with the

use of the tables inside the front cover.

Ross, *A First Course
in Probability*, 6^{th} ed.:

Sections 2.1 through 2.5, 3.1, 3.2, 3.4, 4.1 through 4.5, 5.1 through 5.4, 6.1, 6.2,

maybe 7.3.

(I’ll revise this soon to add more references.)

**Just tell me what I need to know.**

OK, below is a rough essay on probability. If you know everything in this essay, you’re in great shape. If you feel you would need review on a few points, we’ll do that in Math 53. If you think you know this material but the language seems strange, you’ll be fine. If all of this is completely strange to you, you need to read some reference material before we start.

**Here’s the essay.**

*Events* A, B, C… are subsets of a sample space,
S. Events can be constructed by taking
unions, intersections, and complements of other events. A *probability
measure* is a function P that assigns “probabilities” P(A), P(B), P(C)… to
events. All probabilities are in the
interval [0, 1], and they satisfy certain rules. For example, if the events A_{1}, A_{2}, A_{3},…
are pairwise disjoint --- that is, if A_{i} Ç A_{j} = f
whenever i ¹ j --- then P( A_{1} È A_{2}
È A_{3}
È
… ) = P(A_{1}) + P(A_{2}) + P(A_{3}) + ….

Two events A and B are *independent*
if P(AÇB) = P(A) P(B). More generally, a set of events are
independent if, whenever we partition them into two sets {A_{i}} and {B_{j}},
any combination of the A’s is independent of any combination of the B’s.

One may also consider *conditional
probabilities.* The notation P(A|B)
means the probability that we would assign to A, if we knew that event B had
already occurred. We could also define
P(A|B) = P(A Ç
B) / P(B). It is generally true that

P(A Ç B) = P(A) P(B|A) = P(B) P(A|B)

even if the events are not independent. In fact, A and B are independent precisely when P(A|B) = P(A), and this is true precisely when P(B|A) = P(B).

A (real-valued) *random
variable* is a function X:S®**R**, for which
we can define a *cumulative distribution
function* (cdf) as follows:

F(x) = P(X ≤ x) = P( { s Î S | X(s) ≤ x } ).

Then F satisfies

_{}

Any function with these properties can be a cdf for a random variable. We have:

P( a ≤ x < b ) = F(b) – F(a).

Some random variables are *continuous,* which means that they have density functions. If X has cdf F, and if F has a derivative f,
then f is called the *density function * for X.
Then f satisfies f(x) ≥ 0
(always) and _{}, and we have

P ( a
≤ x ≤ b ) = _{}.

We call f a density function even if F is not differentiable at countably many points, as long as F is continuous everywhere. We get f from F by F’ = f, and we can reconstruct F from f by

F(a) = _{}.

Some random variables are *discrete,* which means that all of the probability is concentrated
on countably many values x_{1}, x_{2}, x_{3}…. If X is discrete, then it has a *probability mass function * p given by

p(x_{i})
= P(X = x_{i}).

Now each p(x_{i}) ≥ 0 and _{}. We also have

P ( a
≤ x ≤ b ) = _{}

where the sum is taken over all of the x_{i}’s
satisfying a ≤ x_{i} ≤ b.

Two random variables X and Y have a *joint distribution function * F_{XY} given by

F_{XY}(x,
y) = P ( X ≤ x and Y ≤ y ).

We say that these variables are *independent* if

F_{XY}(x,
y) = F(x) F(y)

for every x and y.
Sometimes the random variables have a *joint density function *f_{XY}, and in that case

_{}.

(We won’t use double integrals much in Math 53 --- maybe not at all. In fact, we won’t use joint distributions much, except conceptually.)

The *expected value*
of a random variable is, intuitively, the average of a large number of
independent draws from its distribution.
Formally, it is given by

E( X ) = _{}

if X has a density function or

E( X ) = _{}

if X has a probability mass function. Not every random variable has an expected value.

We can define a new random variable Y by expressing it as a function of X: say, Y = h(X). Then we can find the expected value of Y by

E( Y ) = E(
h(X) ) = _{}

or

E( Y ) = E(
h(X) ) = _{}.

The *variance* of a
random variable is given by

Var( X ) =
E ( (X – E(X))^{2} ) = E(X^{2}) – E(X)^{2}.

Not every random variable has a variance. When X has a variance, then it also has a *standard deviation* given by

s_{X}
= _{}.

The *covariance* of
two random variables X and Y is given by

Cov(X, Y) = E(XY) – E(X)E(Y).

When two random variables are independent, their covariance
is zero. Their *correlation coefficient* is given by

Corr(X, Y)
= Cov(X, Y) / (s_{X}s_{Y});

and this always satisfies -1 ≤ Corr(X, Y) ≤ +1.

A random variable has a *uniform
distribution* on the interval [a, b] if its density function is given by

f(x) = constant when a ≤ x ≤ b, and zero otherwise.

A random variable has an *exponential
distribution* with mean m if its density function is

f(x) = (1/m) e^{-x/}^{m} when x ≥ 0, and zero otherwise.

A random variable has a *normal
distribution* with parameters m and s if its density function is

_{}.

The graph of this function is the familiar “bell curve.” If X has a normal distribution, then we can easily determine probabilities such as

P ( a ≤ x ≤ b )

by consulting a normal distribution table.

We’ll review each of these concepts when they appear in Math 53, but we’ll expect everyone to catch on fairly quickly.

(end)