STAT 11

May 7, 2006

A few sample problems…

 

1.  Confidence interval for a mean.

 

            Mars Rover Two measures 64 daily high temperatures, and finds that the average of the 64 measurements is  –42.5  degrees C.  Assume that the standard deviation of daily temperatures is exactly  16  degrees C, based on data from Mars Rover One.

 

            a.  In this problem, what are  n,  μ,  x̄,  σ,  and  s ?

 

            b.  What is the standard error of the mean temperature?

 

            c.  Give a 95% confidence interval for μ.

 

            d.  Give an 82% confidence interval for μ.

 

 

2.  Confidence interval for a proportion.

 

            You randomly selected 100 of your latest homemade hand grenades, and very carefully tested them.  Unfortunately(?)  80  of them failed.

 

a.  What is  p̂ ?

 

b.  What is the SE ?

 

c.  What is a 95% CI for the true failure rate?

 

d.  If you felt like using Wilson’s method for this problem, what would change?

 

 

 

3.  Confidence interval for a difference of means.

 

            You found that 30 randomly selected Sunoco stations had an average gas price of $ 3.10, with  s = $ 0.05. 

 

Also,  40  randomly selected Lukoil stations had an average price of $ 3.12, with  s = $ 0.08. 

 

Let  D = (average of all Lukoil prices) minus (average of all Sunoco prices).

 

What is a 95% confidence interval for D ?  What can you conclude about the relative prices at Lukoil and Sunoco stations generally ?

 

4.  Interpreting a scatterplot.

 

            a.  Look at this scatterplot, and estimate…

 

                        The mean of the “x” variable;

                        The standard deviation of the “x” variable;

                       

                        The mean of the “y” variable;

                        The standard deviation of the “y” variable;

 

                        The correlation coefficient (r) for the two variables.

 

            b.  Describe the relationship in words.

 

            [Scatterplot figure not reproduced here.]


5.  More on standard errors.

 

            Science News reported that the average length of a junk-DNA sequence in Speciesus Inventedus is 128 bases, based on a sample of 100 measurements.

 

            a.  What is the SE for μ, the true average length?

 

            b.  Actually, Science News gave the SE:  They said it’s  8.  What is the standard deviation of the original sample?

 

 

 

 

 

 

6.  One-way chi-square problem.

 

            A poker-dealing machine is supposed to deal cards at random, as if from an infinite deck.

 

            In a test, you counted 1600 cards, and observed the following:

 

                        Spades             404

                        Hearts              420

                        Diamonds        400

                        Clubs               376

 

            Could it be that the suits are equally likely?  Or are these discrepancies too much to be random?

 

 

7.  Another one-way chi-square problem.

 

            Same as before, but this time jokers are included, and you counted 1662 cards, with these

            results:

 

                        Spades             404

                        Hearts              420

                        Diamonds        400

                        Clubs               356

                        Jokers                82

 

            a.  If a deck contains 54 cards and two of them are jokers, what is the probability that any particular randomly-chosen card would be a joker?

 

            b.  How many jokers would you expect out of 1662 random cards?  How many of each suit?

 

            c.  Is it possible that the cards are really random?  Or are the discrepancies too large?

 

 

-----------------------------------------------------

 

 

SOLUTIONS

 

 

1a.       n = 64

            μ = the true (long-run) average daily temperature at the Mars Rover Two site

                        (we don’t know the value)

            x̄ = –42.5 degrees C

            σ = 16 degrees C  (by assumption)

            s  = the sample standard deviation.  This could be computed from the sample, but it isn’t given in the problem.

 

1b.  SE = σ/sqrt(n) = 16/sqrt(64) = 16/8 = 2.00 degrees C exactly

 

1c.  Use z* = 1.96, since you are NOT using s as a substitute for σ.  MOE = 1.96 × 2.00 = 3.92, call it 3.9 degrees C, so the CI is

                        [  -42.5 – 3.9,  -42.5 + 3.9  ]

                        or

                        [  -46.4,  -38.6 ]

 

1d.  The problem here is to compute z* when C = 0.82.  You want to leave a probability of

            α/2 = 0.09 in each tail, so you could either…

                        Find 0.09 under Φ(z) in the z table, and see that it corresponds to

                                    z = –1.34,  which is –z*, or

                        Find 0.91 under Φ(z) in the z table, and see that it corresponds to

                                    z = +1.34, which is +z*.

 

Using z* = 1.34, we get MOE = 1.34 × 2.00 = 2.68, call it 2.7 degrees, so the CI is

            [ -45.2,  -39.8  ] .
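
The arithmetic in 1b through 1d can be checked with a short Python sketch.  It assumes Python 3.8 or later (for statistics.NormalDist), and the function name z_interval is only an illustrative choice, not something from the course.  The inverse-CDF call plays the role of the z table lookup.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf):
    """Return (z_star, moe, low, high) for a z-based CI with known sigma."""
    se = sigma / sqrt(n)                     # standard error of the mean
    tail = (1 - conf) / 2                    # alpha/2, the probability in each tail
    z_star = NormalDist().inv_cdf(1 - tail)  # same lookup as the z table
    moe = z_star * se                        # margin of error
    return z_star, moe, xbar - moe, xbar + moe


# Problem 1: n = 64, x-bar = -42.5, sigma = 16 (assumed known)
for conf in (0.95, 0.82):
    z_star, moe, low, high = z_interval(-42.5, 16, 64, conf)
    print(f"C = {conf:.2f}: z* = {z_star:.2f}, MOE = {moe:.1f}, "
          f"CI = [{low:.1f}, {high:.1f}]")
```

Rounded to one decimal place, the two intervals should agree with the ones worked out by hand above.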

 

 

 

 

2a.       p̂ = 0.80    ( = 80 divided by 100 )

 

2b.       SE = sqrt( p̂ (1 – p̂) / n ) = sqrt( 0.80 × 0.20 / 100 ) = sqrt(0.0016) = 0.04 exactly.

 

2c.       Use z* = 1.96  (always use z* for a proportion) .  The MOE is 1.96 × 0.04 = 0.08, so the CI is [ 0.72,  0.88 ].

 

            (Be careful here, if you use percentages.  It would be ok to write [ 72%, 88% ], but if you write in percentage points, then be sure to do the same thing with the MOE.  In particular, don’t write  (WRONG) [ 79.92, 80.08 ]. )
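
Here is a minimal Python sketch of 2a through 2c, including the conversion to percentages.  The variable names are illustrative assumptions; the numbers are the ones from the problem.

```python
from math import sqrt

failures, n = 80, 100
p_hat = failures / n                  # sample proportion
se = sqrt(p_hat * (1 - p_hat) / n)    # standard error of p_hat
moe = 1.96 * se                       # z* = 1.96 for 95% confidence
low, high = p_hat - moe, p_hat + moe

print(f"p_hat = {p_hat:.2f}, SE = {se:.2f}, MOE = {moe:.2f}")
print(f"95% CI as proportions: [{low:.2f}, {high:.2f}]")
# When switching to percentages, scale both endpoints (and so the MOE) by 100.
print(f"95% CI as percentages: [{100 * low:.0f}%, {100 * high:.0f}%]")
```

Building the endpoints first and converting afterwards avoids the mixed-units mistake described above.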

 

2d.  The center of the CI would still be p̂ = 0.80 (exactly), but you would also compute

 

                       

 

            and you would use it to compute SE:

 

                        SE = .

 

The resulting confidence interval would be very slightly larger than the one in 2c.

 

 

 

3.  There’s no reason to pool the “s” values here, and that isn’t supposed to be part of the course material anyway.  So compute the two SE’s separately:

 

            Sunoco:

                        n = 30,   x̄ = 3.10,   s = 0.05,   SE = s/sqrt(n) = $ 0.00913

            Lukoil:

                        n = 40,   x̄ = 3.12,   s = 0.08,   SE = s/sqrt(n) = $ 0.01265

 

            Combined SE:

 

                        sqrt( 0.00913² + 0.01265² ) = $ 0.01560

 

            MOE:  Use z* = 1.96 again, so MOE = (1.96) (0.01560) = $ 0.0306…call it 3 cents.

 

            The observed difference is

 

                        x̄(Lukoil) – x̄(Sunoco) = 0.02    (that’s 3.12 – 3.10)

 

            so the confidence interval for the difference is

 

                        [  –$0.01,  +$0.05  ].

 

            You can’t tell from these reports – at least you can’t tell with 95% confidence – whether Sunoco or Lukoil has generally higher prices.
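
The whole calculation for problem 3 fits in a few lines of Python.  This is only a sketch: the helper name se_mean is an assumption for illustration, and it follows the handout in using z* = 1.96 and unpooled standard errors.

```python
from math import sqrt


def se_mean(s, n):
    """Standard error of a sample mean, from the sample SD and sample size."""
    return s / sqrt(n)


se_sunoco = se_mean(0.05, 30)
se_lukoil = se_mean(0.08, 40)

# Independent samples: square the two SEs, add them, take the square root.
se_diff = sqrt(se_sunoco ** 2 + se_lukoil ** 2)

moe = 1.96 * se_diff
diff = 3.12 - 3.10                    # Lukoil minus Sunoco
low, high = diff - moe, diff + moe

print(f"SE(Sunoco) = {se_sunoco:.5f}, SE(Lukoil) = {se_lukoil:.5f}")
print(f"combined SE = {se_diff:.5f}, MOE = {moe:.4f}")
print(f"95% CI for D: [{low:+.2f}, {high:+.2f}]")
print("interval contains 0:", low < 0 < high)
```

Because the interval contains 0, the data are consistent with either chain being the more expensive one.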

 

 

4a.  The true values are as follows:

 

            mean of x:       2.086

            sd of x:            2.933               (using n-1)

           

            mean of y:       4.300

            sd of y:            1.925

 

            correlation:      –0.832

 

4b. In words:  The relationship is STRONG, NEGATIVE, LINEAR.

            (Although really, how strong it is depends on context.  You might ask, strong compared to what? )

 

(The same is true of linearity.  You might detect a slight downward bending at the left edge of the picture; or, you might decide that the apparent bending is caused by just one point and is probably accidental.  In fact, the data were generated using a linear equation with normally distributed errors.)

           

 

5a.  Trick question – you can’t tell what the SE is, because you would need to know σ or s.

 

5b.  You have SE = s / sqrt(100)  =  s / 10.  Since SE = 8, you have  8 = s / 10,  so they must be using 80 for s.  (You still can’t tell whether they got this from the sample or estimated it in some other way.)

 

 

6.

                                        expected     expected
                          observed      (percent)    (counts)        z
            Spades           404          0.25          400         0.200
            Hearts           420          0.25          400         1.000
            Diamonds         400          0.25          400         0.000
            Clubs            376          0.25          400        -1.200

                                                  chi-square->      2.480
                                              critical value->      7.815

 

 

            Compute each z from its own row as (observed-expected)/sqrt(expected).  Be sure to use the counts in this formula, not the percentages.  The chi-square statistic is the sum of the squares of the z-values.

 

            The number of degrees of freedom is 3 (number of categories minus 1).

 

            The critical value is from a table you’ll have on the exam (using α = 0.05).  But you don’t need it in this case.  The chi-square value is about what you would expect with 3 degrees of freedom, and none of the z statistics are out of line (not even as large as 2, certainly not beyond 4).  So, DO NOT REJECT the null hypothesis.  There is no reason to suspect that the cards are not random.
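
The table for problem 6 can be rebuilt with a short Python sketch.  The variable names are illustrative, and the critical value is simply copied from the table above rather than computed.

```python
from math import sqrt

observed = {"Spades": 404, "Hearts": 420, "Diamonds": 400, "Clubs": 376}

total = sum(observed.values())
expected = total / len(observed)      # equal likelihood: 1600 / 4 = 400 per suit

# z for each row uses the counts, not the proportions.
z_values = {suit: (count - expected) / sqrt(expected)
            for suit, count in observed.items()}

chi_square = sum(z ** 2 for z in z_values.values())
df = len(observed) - 1                # number of categories minus 1
critical_value = 7.815                # alpha = 0.05, 3 df, copied from the handout

for suit, z in z_values.items():
    print(f"{suit:<10}{observed[suit]:>5}{expected:>8.1f}{z:>9.3f}")
print(f"chi-square = {chi_square:.3f} with {df} degrees of freedom")
print("reject" if chi_square > critical_value else "do not reject")
```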

 

 

 

 

 

 

7.

                                        expected     expected
                          observed      (percent)    (counts)        z
            Spades           404         0.2407        400.1        0.194
            Hearts           420         0.2407        400.1        0.994
            Diamonds         400         0.2407        400.1       -0.006
            Clubs            356         0.2407        400.1       -2.205
            Jokers            82         0.0370         61.6        2.606
            Total           1662                       1662

                                                  chi-square->     12.680
                                              critical value->      9.488


 

            This time, the chi-square statistic (12.68) is above the α = 0.05 critical value, so you could reject the null hypothesis and declare that the cards are not random.  The problem is clearly that there are too many jokers at the expense of clubs – you can see that from the z statistics. 

 

            On the other hand, the p-value is only 0.013  (you can’t compute that on the test)  so the test isn’t totally convincing.  You wouldn’t want to arrest the machine designer on this evidence.
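
For anyone who wants to check the problem 7 numbers away from the exam, here is a Python sketch.  The variable names are illustrative.  The p-value line uses a closed-form shortcut for the chi-square upper tail that holds only for 4 degrees of freedom, so treat it as an assumption specific to this problem rather than a general method.

```python
from math import exp, sqrt

observed = {"Spades": 404, "Hearts": 420, "Diamonds": 400,
            "Clubs": 356, "Jokers": 82}
# A 54-card deck: 13 cards in each suit plus 2 jokers.
deck = {"Spades": 13, "Hearts": 13, "Diamonds": 13, "Clubs": 13, "Jokers": 2}

total = sum(observed.values())
deck_size = sum(deck.values())

# Expected count = total cards dealt times the category's share of the deck.
expected = {k: total * deck[k] / deck_size for k in observed}

z_values = {k: (observed[k] - expected[k]) / sqrt(expected[k])
            for k in observed}
chi_square = sum(z ** 2 for z in z_values.values())

# Upper-tail probability for chi-square with exactly 4 degrees of freedom:
# P(X > x) = exp(-x/2) * (1 + x/2).  Not valid for other df.
half = chi_square / 2
p_value = exp(-half) * (1 + half)

for k in observed:
    print(f"{k:<10}{observed[k]:>5}{expected[k]:>8.1f}{z_values[k]:>9.3f}")
print(f"chi-square = {chi_square:.3f}, p-value = {p_value:.3f}")
```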

 

(end)