INFERENCE FOR REGRESSION
Is a child’s IQ linked to their crying as an infant?
Crying and IQ
Infants who cry easily may be more easily stimulated than others, and this may be a sign of higher IQ. Child development researchers explored the relationship between the crying of infants four to ten days old and their later IQ scores. A snap of a rubber band on the sole of the foot caused the infants to cry. The researchers recorded the crying and measured its intensity by the number of peaks in the most active 20 seconds. They later measured the children’s IQ at age three years using the Stanford-Binet IQ test. The data for this experiment appear below (Table 14.1 in our book).
Crying  IQ     Crying  IQ     Crying  IQ     Crying  IQ
  10     87      20     90      17     94      12     94
  12     97      16    100      19    103      12    103
   9    103      23    103      13    104      14    106
  16    106      27    108      18    109      10    109
  18    109      15    112      18    112      23    113
  15    114      21    114      16    118       9    119
  12    119      12    120      19    120      16    124
  20    132      15    133      22    135      31    135
  16    136      17    141      30    155      22    157
  33    159      13    162
Plot and Interpret
  Input the data in your calculator and create a scatterplot
  Look for form, direction, and strength in the plot
Numerical Summary
  Calculate LinReg(a+bx) L1, L2, Y1
  Make note of the value of r
Mathematical Model
  We are interested in predicting the response (IQ) from the explanatory variable (crying)
  Create an LSRL using a and b
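For anyone without the calculator at hand, the LinReg(a+bx) step can be sketched in plain Python. This is only an illustration using the textbook least-squares formulas; the names CRYING_IQ and least_squares are made up for this sketch and are not part of the lesson.

```python
import math

# (crying peaks, IQ) pairs transcribed from the data table
CRYING_IQ = [
    (10, 87), (20, 90), (17, 94), (12, 94), (12, 97), (16, 100),
    (19, 103), (12, 103), (9, 103), (23, 103), (13, 104), (14, 106),
    (16, 106), (27, 108), (18, 109), (10, 109), (18, 109), (15, 112),
    (18, 112), (23, 113), (15, 114), (21, 114), (16, 118), (9, 119),
    (12, 119), (12, 120), (19, 120), (16, 124), (20, 132), (15, 133),
    (22, 135), (31, 135), (16, 136), (17, 141), (30, 155), (22, 157),
    (33, 159), (13, 162),
]


def least_squares(pairs):
    """Return intercept a, slope b, and correlation r for y-hat = a + b*x."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    syy = sum((y - y_bar) ** 2 for _, y in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


a, b, r = least_squares(CRYING_IQ)
print(f"a = {a:.2f}, b = {b:.3f}, r = {r:.3f}")
```

The values of a and b should agree with the calculator's LinReg output up to rounding.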
Crying and IQ: Slope and Intercept
The slope of the regression line (ŷ = 91.27 + 1.493𝑥) is of particular importance. A slope is a rate of change.
The true slope 𝛽 says how much higher average IQ is for children with one more peak in their crying measurement.
Because 𝑏 = 1.493 estimates the unknown 𝛽, we estimate that on the average IQ is about 1.5 points higher for each added crying peak.
Though we need the intercept (𝑎 = 91.27) to draw the line, it has no statistical meaning in this example.
No child had fewer than 9 crying peaks, so we have no data near 𝑥 = 0.
It is safe to expect that all normal children would cry when snapped by a rubber band, so we will never observe 𝑥 = 0.
Crying and IQ: Residuals and Standard Error
The data table shows that the first recorded infant had 10 crying peaks and a later IQ of 87.
The predicted IQ for 𝑥 = 10 is
$$\hat{y} = 91.27 + 1.493x = 91.27 + 1.493(10) = 106.2$$
The residual for this observation is
$$\text{residual} = y - \hat{y} = 87 - 106.2 = -19.2$$
That is, the observed IQ for this infant lies 19.2 points below the least-squares line on the scatterplot.
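The same arithmetic can be checked in a few lines of Python. This is a small sketch that uses the rounded coefficients quoted above; the variable names are chosen only for illustration.

```python
# Rounded intercept and slope of the least-squares line quoted above
a, b = 91.27, 1.493

# First infant in the data table: 10 crying peaks, IQ of 87
x_obs, y_obs = 10, 87

y_hat = a + b * x_obs        # predicted IQ on the line
residual = y_obs - y_hat     # observed minus predicted

print(f"predicted = {y_hat:.1f}, residual = {residual:.1f}")
```

A negative residual means the observed point lies below the line.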
Let L3 = RESID or L3 = L2 – Y1(L1)
Verify that the sum of the residuals is 0
The variance about the line is found with…
$$s^2 = \frac{\sum (y - \hat{y})^2}{n - 2} \quad \text{or} \quad s^2 = \frac{\sum \text{residual}^2}{\text{d.f.}}$$
The standard error about the line is found with…
$$s = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}}$$
I would suggest letting L4 = L3² (the squared residuals), so that sum(L4) is the numerator of 𝑠².
What is the variance about the line for our data?
What is the standard error about the line for our data?
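As a cross-check on the list operations, here is a minimal Python sketch of the same steps: fit the line, build the residuals (the role of L3), square them (the role of L4), and divide by n − 2. The variable names are illustrative assumptions, not calculator commands.

```python
import math

# (crying peaks, IQ) pairs transcribed from the data table
CRYING_IQ = [
    (10, 87), (20, 90), (17, 94), (12, 94), (12, 97), (16, 100),
    (19, 103), (12, 103), (9, 103), (23, 103), (13, 104), (14, 106),
    (16, 106), (27, 108), (18, 109), (10, 109), (18, 109), (15, 112),
    (18, 112), (23, 113), (15, 114), (21, 114), (16, 118), (9, 119),
    (12, 119), (12, 120), (19, 120), (16, 124), (20, 132), (15, 133),
    (22, 135), (31, 135), (16, 136), (17, 141), (30, 155), (22, 157),
    (33, 159), (13, 162),
]

n = len(CRYING_IQ)
x_bar = sum(x for x, _ in CRYING_IQ) / n
y_bar = sum(y for _, y in CRYING_IQ) / n

# Least-squares slope and intercept
b = (sum((x - x_bar) * (y - y_bar) for x, y in CRYING_IQ)
     / sum((x - x_bar) ** 2 for x, _ in CRYING_IQ))
a = y_bar - b * x_bar

residuals = [y - (a + b * x) for x, y in CRYING_IQ]   # plays the role of L3
squared = [e ** 2 for e in residuals]                 # plays the role of L4

variance = sum(squared) / (n - 2)   # s^2, with n - 2 degrees of freedom
s = math.sqrt(variance)             # standard error about the line

print(f"sum of residuals = {sum(residuals):.6f}")
print(f"s^2 = {variance:.1f}, s = {s:.2f}")
```

The sum of the residuals should print as zero up to floating-point rounding.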
Crying and IQ: Confidence intervals for the regression slope
Remember, we will not be able to find 𝛽, but we can use 𝑏 to find a range of values in which we can be confident 𝛽 is contained…
$$b \pm t^* \, SE_b$$
The standard error of the least-squares slope 𝑏 is…
$$SE_b = \frac{s}{\sqrt{\sum (x - \bar{x})^2}}$$
where 𝑡* is the upper (1 − 𝐶)/2 critical value from the 𝑡 distribution with 𝑛 − 2 degrees of freedom.
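To see the standard-error formula in action on the crying data, the sketch below computes SE_b directly from its definition. It is an illustration only, with made-up variable names. The critical value 𝑡* still has to come from a 𝑡 table or software, so the sketch stops at SE_b.

```python
import math

# (crying peaks, IQ) pairs transcribed from the data table
CRYING_IQ = [
    (10, 87), (20, 90), (17, 94), (12, 94), (12, 97), (16, 100),
    (19, 103), (12, 103), (9, 103), (23, 103), (13, 104), (14, 106),
    (16, 106), (27, 108), (18, 109), (10, 109), (18, 109), (15, 112),
    (18, 112), (23, 113), (15, 114), (21, 114), (16, 118), (9, 119),
    (12, 119), (12, 120), (19, 120), (16, 124), (20, 132), (15, 133),
    (22, 135), (31, 135), (16, 136), (17, 141), (30, 155), (22, 157),
    (33, 159), (13, 162),
]

n = len(CRYING_IQ)
x_bar = sum(x for x, _ in CRYING_IQ) / n
y_bar = sum(y for _, y in CRYING_IQ) / n
sxx = sum((x - x_bar) ** 2 for x, _ in CRYING_IQ)    # sum of (x - x_bar)^2

b = sum((x - x_bar) * (y - y_bar) for x, y in CRYING_IQ) / sxx
a = y_bar - b * x_bar

# Standard error about the line, s, with n - 2 degrees of freedom
s = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in CRYING_IQ) / (n - 2))

# Standard error of the slope: s divided by sqrt(sum of (x - x_bar)^2)
se_b = s / math.sqrt(sxx)

print(f"b = {b:.3f}, SE_b = {se_b:.3f}, df = {n - 2}")
```

The interval is then 𝑏 ± 𝑡* · SE_b once 𝑡* has been looked up for 𝑛 − 2 = 36 degrees of freedom.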
The output below for the crying study is from the regression command in the Minitab software package. Most such packages have similar output.
The first line gives the equation of the least-squares regression line.
Let’s look at the rest of the information given.
The hypothesis 𝐻0 : 𝛽 = 0 says that crying has no straight-line relationship with IQ.
Our scatterplot and regression line suggested that there is a relationship. The analysis above gives us two values, the 𝑡 statistic for the slope and its P-value, that together give very strong evidence that IQ is linearly related to crying.
Beer and Blood Alcohol
How well does the number of beers a student drinks predict his or her blood alcohol content? Sixteen student volunteers at Ohio State University drank a randomly assigned number of cans of beer. Thirty minutes later, a police officer measured their blood alcohol content (BAC). Here are the data:

Student:    1      2      3      4      5      6      7      8
Beers:      5      2      9      8      3      7      3      5
BAC:      0.10   0.03   0.19   0.12   0.04   0.095  0.07   0.06

Student:    9     10     11     12     13     14     15     16
Beers:      3      5      4      6      5      7      1      4
BAC:      0.02   0.05   0.07   0.10   0.085  0.09   0.01   0.05
The students were equally divided between men and women and differed in weight and usual drinking habits. Because of this variation, many students don’t believe that number of drinks predicts blood alcohol well. What do the data say?
Let’s input the data into our calculators
Beers will be our explanatory variable
BAC will be our response variable
H0: 𝛽 = 0 (number of beers has no effect on BAC)
Ha: 𝛽 > 0 (number of beers increases BAC)
Do a linear regression on the data
  𝑎 = −0.012700604
  𝑏 = 0.0179637619
  𝑟² = 0.7998407228
  𝑟 = 0.8943381479
Create a list of the residuals in L3
Store the squares of the residuals in L4
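The regression values can be reproduced away from the calculator with a short Python sketch. It uses only the textbook least-squares formulas. The list names beers and bac are chosen for this illustration.

```python
import math

# Data for the 16 students, in student order
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]

n = len(beers)
x_bar = sum(beers) / n
y_bar = sum(bac) / n
sxx = sum((x - x_bar) ** 2 for x in beers)
syy = sum((y - y_bar) ** 2 for y in bac)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(beers, bac))

b = sxy / sxx                     # slope
a = y_bar - b * x_bar             # intercept
r = sxy / math.sqrt(sxx * syy)    # correlation

residuals = [y - (a + b * x) for x, y in zip(beers, bac)]   # role of L3
squared = [e ** 2 for e in residuals]                       # role of L4

print(f"a = {a:.6f}, b = {b:.6f}, r^2 = {r * r:.4f}, r = {r:.4f}")
```

The printed values should match the calculator's 𝑎, 𝑏, 𝑟², and 𝑟 up to rounding.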
Calculate the value of 𝑠
$$s = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\frac{\text{sum}(\text{L4})}{16 - 2}} \approx 0.0204409513$$
Store this value in S
Calculate 𝑆𝐸𝑏
$$SE_b = \frac{s}{\sqrt{\sum (x - \bar{x})^2}} = \frac{\mathbf{S}}{\sqrt{\text{sum}\left((\text{L1} - \text{mean}(\text{L1}))^2\right)}} \approx 0.0024017034$$
Store this value in E
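The two stored values, S and E, can be double-checked with the following Python sketch. It follows the formulas for 𝑠 and SE_b directly. The variable names are assumptions made for this example.

```python
import math

beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]

n = len(beers)
x_bar = sum(beers) / n
y_bar = sum(bac) / n
sxx = sum((x - x_bar) ** 2 for x in beers)   # sum of (x - x_bar)^2

b = sum((x - x_bar) * (y - y_bar) for x, y in zip(beers, bac)) / sxx
a = y_bar - b * x_bar

# s: square root of (sum of squared residuals) / (n - 2); the calculator's S
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(beers, bac))
s = math.sqrt(sse / (n - 2))

# SE_b: s divided by sqrt(sum of (x - x_bar)^2); the calculator's E
se_b = s / math.sqrt(sxx)

print(f"s = {s:.10f}")
print(f"SE_b = {se_b:.10f}")
```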
Why did we find 𝑆𝐸𝑏? It is needed to find the confidence interval for 𝛽:
$$b \pm t^* \, SE_b$$
𝑡* is the value from the table for 95% confidence with 14 degrees of freedom (𝑡* = 2.145)
$$0.0179637619 \pm 2.145 \cdot \mathbf{E} \quad\Rightarrow\quad (0.0128,\ 0.0231)$$
What does this interval tell us (in context)?
We are 95% confident that the true slope 𝛽 is between these two values; that is, each additional beer raises BAC by between 0.0128 and 0.0231 on average.
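The interval arithmetic is short enough to verify by hand or with a few lines of Python. This sketch simply plugs in the values already found above, with 𝑡* = 2.145 taken from the table for 14 degrees of freedom.

```python
b = 0.0179637619       # least-squares slope from the regression
se_b = 0.0024017034    # standard error of the slope (stored in E)
t_star = 2.145         # table value: 95% confidence, 14 degrees of freedom

margin = t_star * se_b
lower, upper = b - margin, b + margin

print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")
```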
Now, let’s determine whether we have statistically significant evidence that the number of beers affects BAC
We need to find the 𝑡-ratio
$$t = \frac{b}{SE_b} = \frac{0.0179637619}{\mathbf{E}} \approx 7.479592073$$
Store this value in T
According to the hypotheses, is this a one- or two-sided test?
Using the 𝑡-ratio we just found, calculate our p-value
$$p = P(T \ge t) = \text{tcdf}(t, \infty, df) = \text{tcdf}(7.479592073,\ 1\text{E}99,\ 14) \approx 1.484739926 \times 10^{-6}$$
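Python's standard library has no counterpart to tcdf, so the sketch below stands in for it by integrating the 𝑡 density numerically with Simpson's rule. This is an illustrative substitute under that assumption, not the calculator's own algorithm, and the function names t_pdf and upper_tail are invented for the example.

```python
import math


def t_pdf(u, df):
    """Density of the t distribution with df degrees of freedom at u."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + u * u / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=20000):
    """Approximate P(T >= t) for t >= 0 using Simpson's rule on [0, t].

    The density is symmetric about 0, so P(T >= t) = 0.5 - P(0 <= T <= t).
    steps must be even.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    return 0.5 - total * h / 3


b = 0.0179637619       # slope
se_b = 0.0024017034    # standard error of the slope
df = 14                # n - 2 with n = 16 students

t_ratio = b / se_b
p_value = upper_tail(t_ratio, df)   # one-sided, matching Ha: beta > 0

print(f"t = {t_ratio:.4f}, p = {p_value:.3e}")
```

Because Ha is one-sided, only the upper tail is used; a two-sided test would double it.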
What is our conclusion to this point? With a p-value so small (𝑝 ≈ 1.48E−6 < 0.001), we have very strong statistical evidence that the number of beers a person drinks does elevate their BAC. Specifically, we are 95% confident that each beer a person consumes raises their BAC by between 0.013% and 0.023% on average. However, student number 3’s BAC was 0.04% (0.19 – 0.15) higher than the line predicts for 9 beers. Student 3 also drank the most beers of any student (9, against the next largest value of 8), so this observation may be influential. To verify that our results do not depend too heavily on this one value, we can remove it and recalculate.
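Remove Student 3 and recalculate
  𝑎 = 2.481E−5   𝑏 = 0.0146   𝑟² = 0.7684   𝑟 = 0.8766
$$s = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\frac{\text{sum}(\text{L4})}{13}} \approx 0.0162$$
$$SE_b = \frac{s}{\sqrt{\sum (x - \bar{x})^2}} = \frac{\mathbf{S}}{\sqrt{\sum (x - \text{mean}(x))^2}} \approx 0.0022$$
$$t = \frac{b}{SE_b} = \frac{0.0146}{0.0022} \approx 6.5894$$
$$p = \text{tcdf}(t, \infty, df) = \text{tcdf}(6.5894,\ 1\text{E}99,\ 13) \approx 8.7189 \times 10^{-6}$$
Predicted BAC at 5 beers: 𝑌1(5) = 0.073 without Student 3, as opposed to 0.077 with all 16 students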
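One way to see how much Student 3 moves the line is to fit it twice, once with all 16 students and once with Student 3 left out, and compare the predictions at 5 beers. The sketch below does that in plain Python; the helper name fit_line is an assumption made for this example.

```python
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Fit using all 16 students
a_all, b_all = fit_line(beers, bac)

# Fit with Student 3 (list index 2) removed
beers_wo = beers[:2] + beers[3:]
bac_wo = bac[:2] + bac[3:]
a_wo, b_wo = fit_line(beers_wo, bac_wo)

pred_all = a_all + b_all * 5   # predicted BAC at 5 beers, all students
pred_wo = a_wo + b_wo * 5      # predicted BAC at 5 beers, Student 3 removed

print(f"all 16 students:   b = {b_all:.4f}, prediction at 5 beers = {pred_all:.3f}")
print(f"without Student 3: b = {b_wo:.4f}, prediction at 5 beers = {pred_wo:.3f}")
```

If the two predictions are close, the conclusions do not hinge on that single observation.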
Predicting Blood Alcohol Steve thinks he can drive legally 30 minutes after he finishes drinking 5 beers. We want to predict