Stat 11

February 3, 2006

Homework #2 - Solutions

 

Normal distributions

 

1.  Scores on a typical IQ test have mean 100, standard deviation 16.  Assume that they are normally distributed.

 

            a.  What fraction of the scores are between 84 and 116 ?

          That’s one standard deviation below and above the mean, so the answer is about 68%.

            b.  An article in Parade Magazine reported that Sharon Stone has an IQ of 160.

                      About what fraction of people taking this test would score at or above 160?

          160 is 3.75 standard deviations above the mean: (160 - 100)/16 = 3.75.  My table only goes up to 3.00, and the book’s Table A only goes to 3.49, so you need to (a) guess or (b) use Excel or a calculator to find F(3.75) = 0.999912.  That’s the fraction of scores below 160, so the answer to the question is 1 - 0.999912 = 0.000088, or about 88 per million.

            c.  Ginger’s score was at the 80-th percentile.  What was her score ?

          In the table, find the value  z  such that  F(z) = 0.80.  The closest entry in either table is z = 0.84, so the 80th percentile is 0.84 standard deviations above the mean.  In this case, this means a score of  100 + (0.84 times 16) = 113.44, or about 113.
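For readers who would rather check problem 1 by machine than by table, here is a minimal sketch using `NormalDist` from Python's standard library. The variable names are illustrative only, and the use of Python is an assumption of this sketch (the solution itself mentions Excel or a calculator).

```python
from statistics import NormalDist

# IQ scores: normal with mean 100 and standard deviation 16 (from the problem).
iq = NormalDist(mu=100, sigma=16)

# (a) Fraction of scores between 84 and 116, one sd either side of the mean.
within_one_sd = iq.cdf(116) - iq.cdf(84)

# (b) Fraction at or above 160, which sits z = 3.75 sd's above the mean.
z_160 = (160 - 100) / 16
upper_tail = 1 - iq.cdf(160)

# (c) Score at the 80th percentile (the inverse of the cumulative fraction F).
p80 = iq.inv_cdf(0.80)

print(f"(a) fraction between 84 and 116: {within_one_sd:.4f}")
print(f"(b) z = {z_160}, fraction at or above 160: {upper_tail:.6f}")
print(f"(c) 80th percentile score: {p80:.1f}")
```

The percentile in (c) comes out slightly above the table-based 113.44 because the software does not round z to 0.84.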

You should be able to answer problems 2 and 3 from pictures and pure thought, without calculation.  Of course, you can calculate if you like.

 

2.  Assume that   X   is normally distributed with mean  0.0  and standard deviation  5.0,

            and  Y  is normally distributed with mean  0.0  and standard deviation  10.0.  Which variable has a larger fraction of its values above +1 ? 

          Y has a larger fraction above +1.

 

          For X, we’re asking what fraction is above 0.2 sd’s above the mean.  For Y, we’re asking what fraction is above 0.1 sd’s above the mean.  It’s easier to be 0.1 sd’s above than 0.2 sd’s above, so the second answer must be larger.

 

 

3.  Assume that   X   is normally distributed with mean  5.0  and standard deviation  10.0,

            and  Y  is normally distributed with mean  10.0  and standard deviation  5.0.  Which variable has a larger fraction of its values above +20 ? 

To be above +20, a  Y  value would have to be two standard deviations above the mean.  But an X value would only have to be 1 1/2 standard deviations above the mean, so  X  has a larger fraction above +20.
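If you do want to calculate, the comparisons in problems 2 and 3 can be checked with a short sketch like the one below. The helper `fraction_above` is a hypothetical name introduced here for illustration; it returns the cutoff's z-score and the upper-tail fraction.

```python
from statistics import NormalDist

def fraction_above(mean, sd, cutoff):
    """Return (z-score of the cutoff, fraction of a normal distribution above it)."""
    z = (cutoff - mean) / sd
    return z, 1 - NormalDist(mean, sd).cdf(cutoff)

# Problem 2: both means are 0; the cutoff is +1.
z_x2, p_x2 = fraction_above(0.0, 5.0, 1.0)    # X: 0.2 sd's above the mean
z_y2, p_y2 = fraction_above(0.0, 10.0, 1.0)   # Y: 0.1 sd's above the mean

# Problem 3: different means and sd's; the cutoff is +20.
z_x3, p_x3 = fraction_above(5.0, 10.0, 20.0)  # X: 1.5 sd's above the mean
z_y3, p_y3 = fraction_above(10.0, 5.0, 20.0)  # Y: 2.0 sd's above the mean

print(f"Problem 2: X z={z_x2:.1f} tail={p_x2:.4f};  Y z={z_y2:.1f} tail={p_y2:.4f}")
print(f"Problem 3: X z={z_x3:.1f} tail={p_x3:.4f};  Y z={z_y3:.1f} tail={p_y3:.4f}")
```

In both problems the variable whose cutoff is fewer standard deviations above its mean has the larger fraction above the cutoff, which is the point of the "pure thought" argument.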

 

Associations and Correlations

 

4.  Draw what you think a scatterplot would look like for each of these three pairs of variables.  Label your axes.

           

            a.  Apples:  weight in grams, weight in ounces.

        The dots are exactly on a line.

                                               

 

            b.  College freshmen:  reported shoe size, grade point average.  (Is shoe size bimodal?  Does that show in the scatterplot?)

 

There might be one symmetrical blob or two side-by-side blobs, depending on how much women’s shoe sizes overlap men’s.

 

            c.  Gasoline:  days since your last fill-up, gallons remaining in your tank.

 


        Negative association, maybe not very linear, with some randomness added.

 

 

 

5.  Using as little electricity as possible, calculate the correlation between these variables:

 

            Show your work.  Along the way, show the mean and standard deviation of each variable.

            Cars: mean 4.0, standard deviation 3.0

            Boats: mean 6.0, standard deviation 3.0

            cars    boats    standardized cars    standardized boats    product
            1       9        -1.0                 +1.0                  -1.0
            4       3         0.0                 -1.0                   0.0
            1       9        -1.0                 +1.0                  -1.0
            7       6        +1.0                  0.0                   0.0
            7       3        +1.0                 -1.0                  -1.0

            Sum of products: -3

            “Average” of products: -0.75  (dividing by n-1, which is 4)

                                               

            So the correlation is  r  =  -0.75.
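For anyone willing to spend a little electricity after all, the same hand calculation can be sketched in Python: standardize each variable, multiply row by row, and divide the sum of products by n - 1. The function name `standardize` is illustrative; `stdev` from the standard library uses the n - 1 divisor, matching the standard deviations quoted in the solution.

```python
from statistics import mean, stdev

cars = [1, 4, 1, 7, 7]
boats = [9, 3, 9, 6, 3]

def standardize(values):
    """Convert values to standard units: (value - mean) / sd, with sd using n - 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

z_cars = standardize(cars)
z_boats = standardize(boats)

# Row-by-row products of the standardized values.
products = [a * b for a, b in zip(z_cars, z_boats)]

# The "average" of the products, dividing by n - 1 rather than n.
r = sum(products) / (len(cars) - 1)

print("standardized cars: ", z_cars)
print("standardized boats:", z_boats)
print("products:          ", products)
print("r =", r)
```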

 

6.  Now, calculate the correlation between each of the following pairs of variables.  Don’t show your work.  Don’t even do any work.  Just write the answers.

 

                        cars      boats
                        1000      9000
                        4000      3000
                        1000      9000
                        7000      6000
                        7000      3000

                        Same as above, -0.75, because multiplying either variable by a positive constant (here 1000) doesn’t change the standardized variable, or the correlation.

 

                        cars      boats
                        5001      12009
                        5004      12003
                        5001      12009
                        5007      12006
                        5007      12003

                        Again, -0.75, because adding a constant to either variable doesn’t change the standardized variables or the correlation.  It doesn’t matter that different constants are added to the two variables; all that matters is that, within each variable, the same constant is added in every row.

 

                        cars      boats
                        5         5
                        6         6
                        11        11
                        15        15
                        20        20

                        Correlation is +1, because the points line up exactly on a (positive-sloping) line.
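The three answers to problem 6 need no work, but they can be confirmed with a sketch like the following. The helper `correlation` is a hypothetical function written for this illustration, using the same standardize-and-average recipe as problem 5. The shifts of 5000 and 12000 are read off the second table by comparing it with the problem 5 data.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation as the sum of products of standardized values, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

cars = [1, 4, 1, 7, 7]
boats = [9, 3, 9, 6, 3]

r_original = correlation(cars, boats)

# Multiply both variables by 1000.
r_scaled = correlation([1000 * c for c in cars], [1000 * b for b in boats])

# Add 5000 to every cars value and 12000 to every boats value.
r_shifted = correlation([c + 5000 for c in cars], [b + 12000 for b in boats])

# Points exactly on a positive-sloping line.
line = [5, 6, 11, 15, 20]
r_line = correlation(line, line)

print(r_original, r_scaled, r_shifted, r_line)
```

Expect tiny floating-point differences from the exact values; the first three results agree with each other and the last is 1 to within rounding error.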

 

7.  Can you reconstruct the distributions of both variables from a scatterplot?

            a.  In this scatterplot, what are the minimum and maximum values for the CARS variable?

Minimum: about 3.05

Maximum: about 4.00

            b.  Can you reconstruct the entire 5-number summary for the BOATS variable?  (That is: min, Q1, median, Q3, max.)

(All values approximate.  Callout on the plot: Q1 is here because 5 of 20 BOAT values are below this line.)

Min = 6.05

Q1 = 6.25

Median = 6.52

Q3 = 6.6

Max = 6.81

 

(end)