Stat 11

February 16, 2004

What’s on the Exam?  #1

 

The in-class exam on February 20 covers Chapters 1-3 and Sections 4.1-4.3, as well as material covered in class and homework assignments 1-4.  I have tried to cover all the topics on this checklist, but it isn’t guaranteed.

 

Understand…

 

            Jargon for a data table:

                        columns = variables

                        rows = cases = individuals = subjects = observations = records = etc.

                        “unique keys”

 

            Kinds of variables:

                        Categorical – nominal or ordinal

                        Quantitative – discrete or continuous

 

            Shapes of distributions:

                        Unimodal, bimodal, or multimodal

                        Symmetric, skewed right, or skewed left

                        Outliers

 

            Time plots (e.g., pp 19-21)

 

            The “equal area principle” for, for example, histograms and pie charts

 

            The “rms average” of a variable --- square the values, average them, and

                        take the square root.  It’s like an average of the sizes of the values: it ignores signs, and it comes out a little higher than the plain average of the absolute values.
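As a quick illustration of the recipe (square, average, square root), here is a minimal sketch. The data values and the helper name `rms` are made up for this example.

```python
import math

def rms(values):
    """Square the values, average the squares, take the square root."""
    return math.sqrt(sum(v * v for v in values) / len(values))

data = [-3, 1, 4, -2]  # made-up values for illustration

plain_mean = sum(data) / len(data)                # signs cancel out here
mean_abs = sum(abs(v) for v in data) / len(data)  # ignores signs
rms_value = rms(data)                             # ignores signs, big values count more

print(plain_mean, mean_abs, round(rms_value, 3))
```

For these values the ordinary mean is 0, the average absolute value is 2.5, and the rms average is a bit higher still (the square root of 7.5).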

 

            The relationship between mean and median for skewed distributions (which is larger?)

 

            How outliers (and extreme values) affect the various measures of center and spread

                        (and how they affect correlations and regression lines)

 

            How mean, standard deviation, median, Q1 and other percentiles, and IQR change…

                        …when the variable is multiplied by a constant (rescaling) or

                        …when a constant is added to the variable (recentering)

 

                        (If a variable is changed in some other way—for example, by replacing each

                        value with its logarithm or its square—there are no good rules for how

                        the mean and standard deviation change.)
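One way to check the rescaling and recentering rules is to try them on a small data set. The sketch below uses made-up data and hypothetical constants `a` and `b`. Measures of center (mean, median) should follow the same transformation, while measures of spread (standard deviation, and likewise IQR) should be multiplied by |a| and ignore b.

```python
import math
import statistics as st

x = [2, 4, 4, 5, 10]          # made-up data
a, b = 3, 7                   # hypothetical constants: y = a*x + b
y = [a * v + b for v in x]

# Center follows the transformation exactly.
mean_ok = st.mean(y) == a * st.mean(x) + b
median_ok = st.median(y) == a * st.median(x) + b

# Spread is multiplied by |a|; adding b changes nothing.
sd_ok = math.isclose(st.stdev(y), abs(a) * st.stdev(x))

print(mean_ok, median_ok, sd_ok)
```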

 

            The “68-95-99.7 rule”

 

            Aspects of a scatterplot:

                        outliers, separate clusters,

                        weak / strong association,

                        positive / negative association,

                        linear / non-linear association,

                       

            How correlation (or the correlation coefficient, r) measures only the linear part of an

association
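To see what "only the linear part" means, compare a perfectly straight relationship with a perfectly curved one. This is a sketch with made-up data. The `correlation` helper is written out by hand for illustration and is not necessarily the form of the formula that will be given on the exam.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r, computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
straight = [2 * x + 1 for x in xs]   # exactly linear
curved = [x * x for x in xs]         # exactly determined by x, but not linear

r_straight = correlation(xs, straight)
r_curved = correlation(xs, curved)
print(r_straight, r_curved)
```

The curved data have r = 0 even though y is completely determined by x, so a low r does not by itself mean "no association."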

           

            The least-squares criterion, and how it tells us to choose a regression line
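The criterion can be checked numerically: the least-squares line should have a smaller sum of squared vertical errors than any other line. The sketch below uses made-up data, and the helper names `fit_line` and `sum_squared_errors` are hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx      # forces the line through the point of means
    return intercept, slope

def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4]                    # made-up data
ys = [2, 3, 5, 6]

a, b = fit_line(xs, ys)
sse_best = sum_squared_errors(xs, ys, a, b)
sse_other = sum_squared_errors(xs, ys, 1.0, 1.0)   # some other candidate line, y = 1 + x

print(a, b, sse_best, sse_other)
```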

 

            How  R²  measures the usefulness of a regression

                        (A regression with a low R² may be useful for describing the relationship

                        between variables or in some other way, but it doesn’t give good predictions.)
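One common way to read R² is as the fraction of the variation in y that the regression line accounts for, which is 1 minus (squared error around the line) / (squared error around the mean of y). The sketch below computes it that way for two made-up data sets. The helper name `r_squared` is hypothetical.

```python
def r_squared(xs, ys):
    """R^2 = 1 - SSE/SST for the least-squares line of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1 - sse / sst

xs = [1, 2, 3, 4]                 # made-up data
tight = [2, 3, 5, 6]              # points close to a line
loose = [3, 6, 2, 5]              # points scattered widely

r2_tight = r_squared(xs, tight)
r2_loose = r_squared(xs, loose)
print(round(r2_tight, 2), round(r2_loose, 2))
```

With the tight data the line removes almost all of the prediction error. With the loose data, predicting from the line is barely better than always guessing the mean of y.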

   

 

            Given a scatterplot and a regression line, what features should make you feel good

or bad about the linear regression?

 

The “restricted range” problem (if you only have a narrow range of x values in a regression, it’s likely to miss the relationship – p. 161)

 

Confounding variables and “lurking” variables

 

Connection between (a) a strong association in a scatterplot and (b) a cause-and-effect relationship  (i.e., (a) can happen without (b) for many reasons)

 

Observational studies vs. Experiments

 

Experiments:  Role of controls; “Hawthorne effect” and placebo effect; “blind and double-blind” experiments; role of randomization  (never mind matched pairs or block designs)

 

Statistical significance (main idea)

 

Kinds of samples…

            voluntary response

            convenience sample

            systematic sample

            probability sample (includes other kinds)

            SRS

            stratified sample

            multistage sample

            weighted sample

 

Levels in a sample survey…

            Population

            Sampling frame

            Sample (as selected)

            (actual) sample

 

Sources of errors in a sample survey…

            Undercoverage bias

            Sampling variation

            Non-response bias

            Response bias (mistakes, lies, badly-worded questions, etc.)

 

Bias vs. variability (see pages 236-237)

 

Sampling distributions:

            If you took lots of samples, the conclusions (sample means, proportions, etc.)

            would vary; in fact, these are random variables and have distributions we can

            try to understand

 

Dependence of sampling variability on…

            sample size (does matter)

            sampling rate (doesn’t matter)

            variability of the underlying variable
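A rough numerical check of these three points, assuming a simple random sample drawn without replacement. The sketch uses the standard formula for the standard deviation of a sample mean, including the finite population correction. All of the numbers and the function name `sd_of_sample_mean` are made up for illustration.

```python
import math

def sd_of_sample_mean(sigma, n, population_size):
    """SD of the sample mean for an SRS of size n (without replacement)."""
    correction = math.sqrt((population_size - n) / (population_size - 1))
    return sigma / math.sqrt(n) * correction

sigma = 10.0   # SD of the underlying variable (assumed)

# Same sample size, very different sampling rates (1% vs. 0.001%).
small_pop = sd_of_sample_mean(sigma, 1000, 100_000)
huge_pop = sd_of_sample_mean(sigma, 1000, 100_000_000)

# Four times the sample size: variability is cut roughly in half.
bigger_sample = sd_of_sample_mean(sigma, 4000, 100_000_000)

# Twice the underlying variability: sampling variability doubles.
more_spread = sd_of_sample_mean(2 * sigma, 1000, 100_000_000)

print(round(small_pop, 4), round(huge_pop, 4),
      round(bigger_sample, 4), round(more_spread, 4))
```

The first two results are nearly identical, which is the sense in which the sampling rate "doesn't matter" as long as the sample is a small fraction of the population.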

 

Probability:

            Sample space

            Outcomes

            Probability model (for a sample space)

            Events

            Disjoint events

            Laws of probability (for events)  (p. 262)

            Multiplication rule for independent events

            Random variables

            Probability model (for a discrete random variable)

                        0-1 random variables

                        uniform random variables

                        binomial random variables (n trials, each probability p, count successes)

            Probability model (for a continuous random variable) = density curve

                        uniform random variables

                        normal random variables
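For the binomial model in the list above, the probabilities can be written down directly. This is a minimal sketch with made-up values of n and p. The helper name `binomial_pmf` is hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 4, 0.5                                   # e.g., four fair coin tosses
model = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

total = sum(model.values())                     # a probability model must add up to 1
p_two = model[2]

# A 0-1 random variable is the special case n = 1.
zero_one = {k: binomial_pmf(k, 1, 0.3) for k in (0, 1)}

print(model, total, zero_one)
```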

 

Be able to…

 

            Construct a frequency table for a single variable, showing number of observations

                        for each value or range of values

 

            Construct a bar chart or a pie chart for a single variable

 

            Construct a histogram showing the distribution of a single quantitative variable

                       

            (never mind stem and leaf diagrams)

 

            Compute, for a single quantitative variable…

                        mean

                        median

                        Q1, Q3, or any percentile

                        the “five-number summary”

                        the standard deviation (prefer n-1 on the exam)

                        the IQR  (that’s the difference Q3-Q1)
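The summaries in this list can be sketched in a few lines. The data are made up. The quartile rule used here (Q1 and Q3 are the medians of the lower and upper halves, leaving out the middle value when n is odd) is an assumption; other books and software use slightly different rules.

```python
import math

def median(sorted_vals):
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def five_number_summary(values):
    """(min, Q1, median, Q3, max), with quartiles as medians of each half."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]
    upper = s[half + 1:] if n % 2 == 1 else s[half:]
    return s[0], median(lower), median(s), median(upper), s[-1]

def sample_sd(values):
    """Standard deviation with n-1 in the denominator."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))

data = [7, 1, 9, 4, 12, 3, 6]       # made-up values
mean = sum(data) / len(data)
summary = five_number_summary(data)
iqr = summary[3] - summary[1]       # Q3 - Q1
sd = sample_sd(data)

print(mean, summary, iqr, round(sd, 3))
```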

 

            Construct a box plot based on a five-number summary

                       

            Compute the fraction of values of a normally distributed variable that lie between

                        two numbers.  (For example: If the mean is 10 and the SD is 5, what

                        fraction of values are between 6 and 7?)  (The usual table will be provided)
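The example in parentheses can be worked by standardizing both endpoints and subtracting the two table values. The sketch below does the same thing, using Python's `math.erf` in place of the printed table. The function names are made up.

```python
import math

def normal_cdf(z):
    """Fraction of a standard normal distribution lying below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def fraction_between(low, high, mean, sd):
    z_low = (low - mean) / sd        # standardize each endpoint
    z_high = (high - mean) / sd
    return normal_cdf(z_high) - normal_cdf(z_low)

# Mean 10, SD 5: the endpoints 6 and 7 become z = -0.8 and z = -0.6.
frac = fraction_between(6, 7, mean=10, sd=5)
print(round(frac, 4))
```

The answer is about 0.062, which matches what the table gives: roughly 0.2743 (below z = -0.6) minus 0.2119 (below z = -0.8).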

 

            Estimate (roughly) a standard deviation from a histogram or density curve

           

            For a single variable X and its standardized version Z: given X, compute Z and vice versa
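Both directions are the same formula rearranged: z = (x - mean) / SD, and x = mean + z × SD. Here is a small sketch with made-up numbers. The function names are hypothetical.

```python
def to_z(x, mean, sd):
    """Standardize: how many SDs is x above (+) or below (-) the mean?"""
    return (x - mean) / sd

def from_z(z, mean, sd):
    """Undo the standardization."""
    return mean + z * sd

mean, sd = 10, 5                 # made-up mean and SD
z = to_z(6, mean, sd)            # 6 is 0.8 SD below the mean
x = from_z(1.5, mean, sd)        # 1.5 SD above the mean

print(z, x)
```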

 

            Construct a scatterplot for two variables

 

            Compute the correlation of two variables,  r   (given the formula)

 

            For a regression: 

                        You won’t need to compute the slope or intercept of a regression line,

                                    unless they follow directly from a general understanding (for

                                    example, if the line is obviously flat).

 

                        But, know that the regression line goes through the “point of means.”

 

                        And, know how the slope of the line is related to  r:  When x goes up by one

                                    standard deviation (sx),  y goes up by  r  standard deviations (r times sy).

                       

                        Given the coefficients of a regression model  (a and b),  calculate the predicted

                                    value of  y  to go with any value of  x.
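The three facts above fit together, as the sketch below shows with made-up summary statistics. It assumes the line is written as predicted y = a + b·x, with a the intercept and b the slope. Check this against the notation used in class.

```python
# Made-up summary statistics for illustration.
x_bar, s_x = 10.0, 2.0
y_bar, s_y = 50.0, 8.0
r = 0.5

b = r * s_y / s_x          # slope: one s_x in x goes with r * s_y in y
a = y_bar - b * x_bar      # intercept chosen so the line passes through the point of means

def predict(x):
    """Predicted y for a given x, using the line y = a + b*x."""
    return a + b * x

at_mean = predict(x_bar)              # should equal y_bar
one_sd_up = predict(x_bar + s_x)      # should equal y_bar + r * s_y

print(a, b, at_mean, one_sd_up)
```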

 

 

 

(end)