Stat 11
February 12, 2006
Homework #3 - SOLUTIONS

2.14 (fuel consumption)
a. Clearly they want speed on the x-axis; to the extent there’s a cause-and-effect relationship, speed is the cause, AND, to the extent anyone would want to predict one from the other, they would start with speed. (Both are reasons for a variable to go on the horizontal axis.)
b. It’s a strong association, but not linear, or even linear-with-random-errors. That makes sense; cars are designed to operate best at a certain speed, and variation from that speed in either direction reduces fuel efficiency.
c. The variables could be described as negatively associated for small x values or positively associated for large x values, but neither characterization is right for both ranges.
d. We call it a very strong relationship (if you believe these data) because the value of x almost exactly determines the value of y. It's just a more complicated formula than a straight line.
2.32 (page 130; same fuel-consumption data, continued)
Also: In the scatterplot, show roughly where the regression line would be. Alas, correlation and regression are not useful tools in this situation.
The correlation is about r = -0.17. As the regression line shows, the linear part of the relationship is pretty weak, and also pretty uninformative: a nearly flat line says nothing about the U shape. Adding a few more points at the high-speed end of the graph (where fuel use climbs) could easily change the sign of r, which makes r pretty useless as a summary description of this graph.
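The fuel-consumption data themselves aren't reproduced here, so the sketch below uses made-up U-shaped data (hypothetical, with y exactly equal to (x - 4)²) to show the same two effects: an exact but curved relationship gives only a modest r, and extending the rising branch flips its sign. The helper name `pearson_r` is just for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation r, computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical U-shaped data: y is an exact function of x, but not a linear one.
xs = [1, 2, 3, 4, 5, 6]
ys = [(x - 4) ** 2 for x in xs]  # 9, 4, 1, 0, 1, 4
r_short = pearson_r(xs, ys)

# Extend the rising right-hand branch by two more points.
xs_long = xs + [7, 8]
ys_long = [(x - 4) ** 2 for x in xs_long]  # ..., 9, 16
r_long = pearson_r(xs_long, ys_long)

print(f"r using x = 1..6: {r_short:.3f}")
print(f"r using x = 1..8: {r_long:.3f}")
```

Here r is about -0.56 for the first six points and about +0.45 once two points are added on the rising side, even though y is determined exactly by x in both cases.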
2.55 (page 149; the formulas on p. 137 should help, and there is actually more information in the problem than you need.)
The slope of the regression line (husband's height regressed on wife's height, so wife's height is on the x axis) is
b = r (s_y / s_x) = (0.5)(2.7 / 2.5) = 0.54.
The line must go through the point (64.5, 68.5) (the "point of means"), so you can use the "point-slope method" to write the equation as
y = 68.5 + 0.54 ( x – 64.5 )
or you can just solve for a in the relationship
68.5 = a + (0.54) (64.5)
to get
a = 33.67
and make the equation
y = 33.67 + 0.54 x.
Plugging in x = 67 gives y = 69.85, so the predicted husband's height is 69.85 inches when the wife's height is 67 inches.
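The same calculation can be checked with a few lines of Python. This is a minimal sketch; `regression_from_summary` is a hypothetical helper name, and it simply applies b = r (s_y / s_x) and then forces the line through the point of means, exactly as done by hand above.

```python
def regression_from_summary(mean_x, mean_y, s_x, s_y, r):
    """Least-squares intercept and slope from summary statistics.

    Slope b = r * s_y / s_x; the line passes through (mean_x, mean_y),
    so the intercept is a = mean_y - b * mean_x.
    """
    b = r * s_y / s_x
    a = mean_y - b * mean_x
    return a, b


# Wife's height is x, husband's height is y (summary values from 2.55).
a, b = regression_from_summary(mean_x=64.5, mean_y=68.5, s_x=2.5, s_y=2.7, r=0.5)
predicted = a + b * 67  # wife's height of 67 inches
print(f"b = {b:.2f}, a = {a:.2f}, predicted husband's height = {predicted:.2f}")
```

This prints b = 0.54, a = 33.67 and a predicted height of 69.85, matching the hand calculation.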
2.58 (missing exams, page 149)
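a. The slope of the line is b = r (s_y / s_x) = (0.6)(8 / 30) = 0.16. Because the line passes through the point of means, the intercept is a = (mean final-exam score) – b (mean pre-exam score), which turns out to be 30.2.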
b. Julie's predicted final-exam score is 30.2 + 0.16(300) = 78.2.
c. r = 0.6, so R² = 0.36. That means only 36% of the variance in final-exam scores is explained by pre-exam scores; the other 64% is left in the residuals, much more than in the predicted part. Prof. Friedman is giving Julie an average score on the final plus a relatively small adjustment that, on average, covers only a small part of the actual difference. Of course the actual difference might have been positive or negative, but Julie might as well make her case for a bigger positive adjustment.
(Note: Most people, contra the author, capitalize R² when talking about regressions, but use a small r when talking about correlation. It's a notational clash; in this case both r's mean the same thing.)
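A quick numeric check of parts (a) to (c), as a sketch: the intercept 30.2 is taken as given from part (a) rather than recomputed, since the class means aren't repeated in this answer.

```python
# Summary values from 2.58: r = 0.6, SD of pre-exam scores s_x = 30,
# SD of final-exam scores s_y = 8.
r, s_x, s_y = 0.6, 30, 8

b = r * s_y / s_x  # slope of final-exam score on pre-exam score
a = 30.2  # intercept as worked out in part (a), not recomputed here
julie_pred = a + b * 300  # Julie's pre-exam score was 300

r_squared = r ** 2  # share of final-exam variance explained by the line
unexplained = 1 - r_squared  # share left over in the residuals

print(f"slope = {b:.2f}, Julie's predicted final = {julie_pred:.1f}")
print(f"R^2 = {r_squared:.2f}, left in residuals = {unexplained:.2f}")
```

The last line makes the point of part (c) concrete: nearly two thirds of the variation in final-exam scores is not captured by the prediction.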
2.79 (confounding variables, p. 169 --- does education make you rich?)
Discussed in class --- economists fall into two groups, and the better-educated group earns less. But for the full story, you have to look at the groups separately.
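The text doesn't give the actual salary figures, so the sketch below uses made-up numbers and a hypothetical lurking variable (sector of employment) purely to show how pooling groups can reverse a comparison. It should not be read as the textbook's data.

```python
# Made-up numbers: (number of economists, average salary in $1000s)
# for each sector and degree group. The sector is the lurking variable.
groups = {
    "academia": {"PhD": (80, 60), "no PhD": (20, 50)},
    "business": {"PhD": (20, 100), "no PhD": (80, 90)},
}


def pooled_mean(degree):
    """Average salary for one degree group with the sectors lumped together."""
    people = sum(groups[sector][degree][0] for sector in groups)
    payroll = sum(groups[sector][degree][0] * groups[sector][degree][1] for sector in groups)
    return payroll / people


# Within each sector, the PhDs earn more...
for sector, by_degree in groups.items():
    print(sector, "PhD:", by_degree["PhD"][1], "no PhD:", by_degree["no PhD"][1])

# ...but pooled over sectors, the PhDs earn less, because most of them
# work in the lower-paying sector.
print("pooled PhD:", pooled_mean("PhD"), "no PhD:", pooled_mean("no PhD"))
```

With these invented numbers the pooled averages are 68 versus 82, the reverse of what happens inside each sector.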
3.6 (p. 197 – more confounding variables)
Also discussed in class: women who took estrogen during the observational study may have been richer and more health-conscious than those who didn't, so we can't tell whether their better heart health was caused by the estrogen or by those possible confounding variables. The experiment doesn't have this defect, because the researchers (presumably at random) decided who took estrogen, rather than the women choosing for themselves (although experiments like this are hard; there might well be other defects).
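To see why assigning the treatment removes this kind of confounding, here is a small simulation with entirely hypothetical numbers: the 50% share of health-conscious women and the 70%/30% choice rates are assumptions for illustration, not data from the study.

```python
import random

rng = random.Random(2006)  # fixed seed so the run is reproducible

# Hypothetical population: True means "health-conscious", a possible lurking variable.
population = [rng.random() < 0.5 for _ in range(10_000)]


def share(group):
    """Fraction of a group that is health-conscious."""
    return sum(group) / len(group)


# Observational study: the women choose. Assume (hypothetically) that
# health-conscious women take estrogen 70% of the time, others 30%.
took, did_not = [], []
for health_conscious in population:
    p_take = 0.7 if health_conscious else 0.3
    if rng.random() < p_take:
        took.append(health_conscious)
    else:
        did_not.append(health_conscious)

# Experiment: a coin flip, not the woman's own choice, assigns the treatment.
treated, control = [], []
for health_conscious in population:
    if rng.random() < 0.5:
        treated.append(health_conscious)
    else:
        control.append(health_conscious)

obs_gap = share(took) - share(did_not)
exp_gap = share(treated) - share(control)
print(f"observational study, gap in health-conscious share: {obs_gap:.3f}")
print(f"randomized experiment, gap in health-conscious share: {exp_gap:.3f}")
```

In the self-selected groups the estrogen takers are far more often health-conscious (a gap near 0.4), so any health difference is confounded; under coin-flip assignment the gap is close to zero.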
3.7 (p. 198 – still more confounding variables)
Wine drinkers are (or might be) more affluent as a group than other people, and that, rather than the wine itself, might account for their slightly better health statistics.
Also one more problem, modified from an earlier edition of the text:
Problem A: The Joy of Extrapolation. Here is a table showing the number of Americans living on farms, by year. (Numbers of people are in millions.)
Year    Farm population (millions)
1935    32.1
1940    30.5
1945    24.4
1950    23.0
1955    19.1
1960    15.6
1965    12.4
1970     9.7
1975     8.9
1980     7.2
a. Using any software package, construct a scatterplot and find the least-squares regression line. (The Excel graphics package is good enough. Or, do it all by hand if you like.)
The least-squares line is: farm population (millions) = 1166.9 – 0.5868 (year).
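The regression line for part (a) can be found with a few lines of Python using the table above. This sketch computes only the line; the scatterplot would need a plotting library (matplotlib, for example), which is left out so the code runs with the standard library alone. The helper name `least_squares` is hypothetical.

```python
years = [1935, 1940, 1945, 1950, 1955, 1960, 1965, 1970, 1975, 1980]
farm_pop = [32.1, 30.5, 24.4, 23.0, 19.1, 15.6, 12.4, 9.7, 8.9, 7.2]  # millions


def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


a, b = least_squares(years, farm_pop)
print(f"farm population = {a:.1f} + ({b:.4f}) * year")
```

Rounded for display, this gives the intercept 1166.9 and slope -0.5868 used in parts (b) and (c).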

b. Based on the regression line, what is the predicted number of Americans living on farms in the year 1985?
1166.9 – 0.5868 (1985) = about 2.1 million
c. Based on the regression line, what is the predicted number of Americans living on farms in the year 2005?
1166.9 – 0.5868 (2005) = about MINUS 9.6 million. That can't be right, since a population can't be negative; this trend couldn't have continued linearly at this rate.
Note: Using 4-digit years in this case is begging for rounding errors. I should have numbered years from 1900, or 1935, or even 2000. The fitted line and its predictions would have been the same, but hand calculations with rounded coefficients would be more accurate. (The problem is that even four digits of that coefficient, -0.5868, might not be enough when it's getting multiplied by numbers around 2000.)
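The size of the rounding effect can be checked directly. This sketch compares the rounded coefficients applied to 4-digit years against the same least-squares line written around the point of means at full precision. Centering at the mean year (1957.5) is one convenient choice swapped in here; counting from 1900, 1935, or 2000 as the note suggests would work equally well.

```python
years = [1935, 1940, 1945, 1950, 1955, 1960, 1965, 1970, 1975, 1980]
farm_pop = [32.1, 30.5, 24.4, 23.0, 19.1, 15.6, 12.4, 9.7, 8.9, 7.2]  # millions

n = len(years)
mean_year = sum(years) / n
mean_pop = sum(farm_pop) / n
slope = (sum((x - mean_year) * (y - mean_pop) for x, y in zip(years, farm_pop))
         / sum((x - mean_year) ** 2 for x in years))


def predict_rounded(year):
    """Rounded coefficients from the answers above, applied to 4-digit years."""
    return 1166.9 - 0.5868 * year


def predict_centered(year):
    """Same least-squares line written around the point of means, at full precision.

    The slope multiplies (year - 1957.5), a small number, instead of a
    number near 2000, so any rounding in the slope matters much less.
    """
    return mean_pop + slope * (year - mean_year)


for year in (1985, 2005):
    print(year, f"rounded: {predict_rounded(year):.2f}", f"centered: {predict_centered(year):.2f}")
```

The two versions differ by about 0.05 million (roughly 50,000 people) at both dates, for example about 2.10 versus 2.15 million in 1985; that gap comes entirely from rounding the coefficients, not from the line itself.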
(end)