Inference for Regression

Is a child’s IQ linked to their crying as an infant?

Crying and IQ

Infants who cry easily may be more easily stimulated than others, and this may be a sign of higher IQ. Child development researchers explored the relationship between the crying of infants four to ten days old and their later IQ scores. A snap of a rubber band on the sole of the foot caused the infants to cry. The researchers recorded the crying and measured its intensity by the number of peaks in the most active 20 seconds. They later measured the children’s IQ at age three years using the Stanford-Binet IQ test. The data for this experiment are on the next page (Table 14.1 in our book).

Crying and IQ

Crying  IQ     Crying  IQ     Crying  IQ     Crying  IQ
10      87     20      90     17      94     12      94
12      97     16      100    19      103    12      103
9       103    23      103    13      104    14      106
16      106    27      108    18      109    10      109
18      109    15      112    18      112    23      113
15      114    21      114    16      118    9       119
12      119    12      120    19      120    16      124
20      132    15      133    22      135    31      135
16      136    17      141    30      155    22      157
33      159    13      162

Crying and IQ

Plot and Interpret
- Input the data in your calculator and create a scatterplot
- Look for form, direction and strength in the plot

Numerical Summary
- Calculate a LinReg(a+bx) L1, L2, Y1
- Make note of the value of r

Mathematical Model
- We are interested in predicting the response variable (IQ) from the explanatory variable (crying)
- Create an LSRL using a and b
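For anyone working without a calculator, the numerical summary can be sketched in plain Python. This is only an illustration: the helper name `least_squares` is an assumption, and the data list is transcribed from the Crying and IQ table. It returns the same quantities LinReg(a+bx) reports (a, b and r).

```python
import math

# (crying peaks, IQ) pairs transcribed from the Crying and IQ table
DATA = [
    (10, 87), (20, 90), (17, 94), (12, 94),
    (12, 97), (16, 100), (19, 103), (12, 103),
    (9, 103), (23, 103), (13, 104), (14, 106),
    (16, 106), (27, 108), (18, 109), (10, 109),
    (18, 109), (15, 112), (18, 112), (23, 113),
    (15, 114), (21, 114), (16, 118), (9, 119),
    (12, 119), (12, 120), (19, 120), (16, 124),
    (20, 132), (15, 133), (22, 135), (31, 135),
    (16, 136), (17, 141), (30, 155), (22, 157),
    (33, 159), (13, 162),
]


def least_squares(xs, ys):
    """Return intercept a, slope b and correlation r for y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx               # slope of the LSRL
    a = y_bar - b * x_bar       # intercept of the LSRL
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


crying = [x for x, _ in DATA]   # plays the role of L1
iq = [y for _, y in DATA]       # plays the role of L2
a, b, r = least_squares(crying, iq)
print(f"a = {a:.2f}, b = {b:.3f}, r = {r:.3f}")
```

The printed intercept and slope should agree with the regression line used in these slides (intercept 91.27, slope 1.493).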

Crying and IQ: Slope and Intercept

- The slope of the regression line ($\hat{y} = 91.27 + 1.493x$) is of particular importance.
- A slope is a rate of change.
- The true slope 𝛽 says how much higher average IQ is for children with one more peak in their crying measurement.
- Because 𝑏 = 1.493 estimates the unknown 𝛽, we estimate that, on average, IQ is about 1.5 points higher for each added crying peak.

Crying and IQ: Slope and Intercept

- Though we need the intercept (𝑎 = 91.27) to draw the line, it has no statistical meaning in this example.
- No child had fewer than 9 crying peaks, so we have no data near 𝑥 = 0.
- It is safe to expect that all normal children would cry when snapped by a rubber band, so we will never observe 𝑥 = 0.

Crying and IQ: Residuals and Standard Error

- The data table shows that the first recorded infant had 10 crying peaks and a later IQ of 87.
- The predicted IQ for 𝑥 = 10 is
  $$ \hat{y} = 91.27 + 1.493x = 91.27 + 1.493(10) = 106.2 $$
- The residual for this observation is
  $$ \text{residual} = y - \hat{y} = 87 - 106.2 = -19.2 $$
- That is, the observed IQ for this infant lies 19.2 points below the least-squares line on the scatterplot.
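The same arithmetic can be written as a short Python sketch. The function names `predict_iq` and `residual` are illustrative assumptions; the coefficients are the ones quoted on the slide.

```python
A = 91.27   # intercept of the least-squares line, from the slide
B = 1.493   # slope of the least-squares line, from the slide


def predict_iq(peaks):
    """Predicted IQ (y-hat) for a given number of crying peaks."""
    return A + B * peaks


def residual(peaks, observed_iq):
    """Observed IQ minus predicted IQ."""
    return observed_iq - predict_iq(peaks)


predicted = predict_iq(10)
res = residual(10, 87)
print(f"predicted IQ = {predicted:.1f}")   # predicted IQ = 106.2
print(f"residual = {res:.1f}")             # residual = -19.2
```

A negative residual means the observed point lies below the line; a positive one means it lies above.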

Crying and IQ: Residuals and Standard Error

- Let L3 = RESID or L3 = L2 − Y1(L1)
- Verify that the sum of the residuals is 0

The variance about the line is found with

$$ s^2 = \frac{\sum (y - \hat{y})^2}{n - 2} \quad\text{or}\quad s^2 = \frac{\sum \text{residual}^2}{\text{d.f.}} $$

The standard error about the line is found with

$$ s = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} $$

Crying and IQ: Residuals and Standard Error

- I would suggest letting L4 = L3² (the squared residuals).
- What is the variance of our data about the line?
- What is the standard error of our data about the line?
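One way to check your calculator answers is the following Python sketch. It assumes the data transcribed from the Crying and IQ table and refits the line rather than using the rounded coefficients; the variable names simply mirror the calculator lists and are not part of the original slides.

```python
import math

# (crying peaks, IQ) pairs transcribed from the Crying and IQ table
DATA = [
    (10, 87), (20, 90), (17, 94), (12, 94),
    (12, 97), (16, 100), (19, 103), (12, 103),
    (9, 103), (23, 103), (13, 104), (14, 106),
    (16, 106), (27, 108), (18, 109), (10, 109),
    (18, 109), (15, 112), (18, 112), (23, 113),
    (15, 114), (21, 114), (16, 118), (9, 119),
    (12, 119), (12, 120), (19, 120), (16, 124),
    (20, 132), (15, 133), (22, 135), (31, 135),
    (16, 136), (17, 141), (30, 155), (22, 157),
    (33, 159), (13, 162),
]

n = len(DATA)
x_bar = sum(x for x, _ in DATA) / n
y_bar = sum(y for _, y in DATA) / n
sxx = sum((x - x_bar) ** 2 for x, _ in DATA)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in DATA)
b = sxy / sxx                  # slope
a = y_bar - b * x_bar          # intercept

residuals = [y - (a + b * x) for x, y in DATA]   # plays the role of L3
squared = [e ** 2 for e in residuals]            # plays the role of L4

variance = sum(squared) / (n - 2)   # s^2, variance about the line
std_error = math.sqrt(variance)     # s, standard error about the line

print(f"sum of residuals = {sum(residuals):.6f}")
print(f"s^2 = {variance:.2f}, s = {std_error:.2f}")
```

Dividing by n − 2 rather than n reflects the two parameters (intercept and slope) estimated from the data.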

Crying and IQ: Confidence intervals for the regression slope

Remember, we will not be able to find 𝛽, but we can use 𝑏 to find a range of values in which we can be confident 𝛽 is contained:

$$ b \pm t^* \, SE_b $$

The standard error of the least-squares slope 𝑏 is

$$ SE_b = \frac{s}{\sqrt{\sum (x - \bar{x})^2}} $$

where $t^*$ is the upper $\frac{1 - C}{2}$ critical value from the 𝑡 distribution with 𝑛 − 2 degrees of freedom.

Crying and IQ: Confidence intervals for the regression slope

- The output below for the crying study is from the regression command in the Minitab software package. Most such packages have similar output.
- The first line gives the equation of the least-squares regression line.

Crying and IQ Confidence intervals for the regression slope



Let’s look at the rest of the information given.

Crying and IQ Confidence intervals for the regression slope



The hypothesis 𝐻0: 𝛽 = 0 says that crying has no straight-line relationship with IQ.

Our previous work suggested that there is a relationship. The analysis above gives us two values, the 𝑡-ratio and its p-value, that provide very strong evidence that IQ is correlated with crying.

Beer and Blood Alcohol

How well does the number of beers a student drinks predict his or her blood alcohol content? Sixteen student volunteers at Ohio State University drank a randomly assigned number of cans of beer. Thirty minutes later, a police officer measured their blood alcohol content (BAC). Here are the data:

Student:  1      2      3      4      5      6      7      8
Beers:    5      2      9      8      3      7      3      5
BAC:      0.10   0.03   0.19   0.12   0.04   0.095  0.07   0.06

Student:  9      10     11     12     13     14     15     16
Beers:    3      5      4      6      5      7      1      4
BAC:      0.02   0.05   0.07   0.10   0.085  0.09   0.01   0.05


The students were equally divided between men and women and differed in weight and usual drinking habits. Because of this variation, many students don’t believe that number of drinks predicts blood alcohol well. What do the data say?


Let’s input the data into our calculators 

Beers will be our explanatory variable



BAC will be our response variable

Beer and Blood Alcohol

H0: 𝛽 = 0 (number of beers has no effect on BAC)
Ha: 𝛽 > 0 (number of beers increases BAC)

Do a linear regression on the data:
- 𝑎 = −0.012700604
- 𝑏 = 0.0179637619
- 𝑟² = 0.7998407228
- 𝑟 = 0.8943381479

Create a list of the residuals in L3.

Store the squares of the residuals in L4.
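The calculator steps above can be mirrored in a few lines of Python. This is a sketch under the assumption that the lists below match the beer data table; the names `residuals` and `squared_residuals` stand in for L3 and L4.

```python
import math

BEERS = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]         # L1
BAC = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]          # L2

n = len(BEERS)
x_bar = sum(BEERS) / n
y_bar = sum(BAC) / n
sxx = sum((x - x_bar) ** 2 for x in BEERS)
syy = sum((y - y_bar) ** 2 for y in BAC)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(BEERS, BAC))

b = sxy / sxx                      # slope
a = y_bar - b * x_bar              # intercept
r = sxy / math.sqrt(sxx * syy)     # correlation

residuals = [y - (a + b * x) for x, y in zip(BEERS, BAC)]   # L3
squared_residuals = [e ** 2 for e in residuals]             # L4

print(f"a = {a:.9f}")
print(f"b = {b:.10f}")
print(f"r^2 = {r * r:.10f}, r = {r:.10f}")
```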

Beer and Blood Alcohol

Calculate the value of s:

$$ s = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\frac{\operatorname{sum}(L_4)}{16 - 2}} \approx 0.0204409513 $$

Store this value in S.

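Beer and Blood Alcohol

Calculate $SE_b$:

$$ SE_b = \frac{s}{\sqrt{\sum (x - \bar{x})^2}} = \frac{\mathbf{S}}{\sqrt{\operatorname{sum}\left((L_1 - \operatorname{mean}(L_1))^2\right)}} \approx 0.0024017034 $$

Store this value in E.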
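As a cross-check on the stored values S and E, here is a minimal Python sketch of the same two formulas. It assumes the lists below match the beer data table and refits the line from scratch so that it runs on its own.

```python
import math

BEERS = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
BAC = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]

n = len(BEERS)
x_bar = sum(BEERS) / n
y_bar = sum(BAC) / n
sxx = sum((x - x_bar) ** 2 for x in BEERS)   # sum of (x - x-bar)^2
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(BEERS, BAC))
b = sxy / sxx
a = y_bar - b * x_bar

# Sum of squared residuals, the quantity sum(L4) on the calculator
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(BEERS, BAC))

s = math.sqrt(sse / (n - 2))   # standard error about the line (S)
se_b = s / math.sqrt(sxx)      # standard error of the slope (E)

print(f"s = {s:.10f}")
print(f"SE_b = {se_b:.10f}")
```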

Beer and Blood Alcohol

- Why did we find $SE_b$? It is needed to find the confidence interval for 𝛽:
  $$ b \pm t^* \, SE_b $$
- $t^*$ is the value from the table for 95% confidence with 14 degrees of freedom:
  $$ 0.0179637619 \pm 2.145 \cdot \mathbf{E} = (0.0128,\ 0.0231) $$
- What does this interval tell us (in context)?
- We are 95% confident that the true slope 𝛽 is between these two values; that is, each additional beer raises BAC by between 0.0128 and 0.0231 on average.
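The interval arithmetic is small enough to sketch directly. The function name `slope_confidence_interval` is an illustrative assumption; the inputs are the slope, standard error and table value quoted on the slides.

```python
def slope_confidence_interval(b, se_b, t_star):
    """Return (lower, upper) for the interval b +/- t* * SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin


b = 0.0179637619      # least-squares slope from the slides
se_b = 0.0024017034   # standard error of the slope (stored in E)
t_star = 2.145        # table value: 95% confidence, 14 degrees of freedom

lower, upper = slope_confidence_interval(b, se_b, t_star)
print(f"({lower:.4f}, {upper:.4f})")   # (0.0128, 0.0231)
```

Because the whole interval lies above 0, it is consistent with a positive slope.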

Beer and Blood Alcohol

- Now, let’s determine whether we have statistically significant evidence that the number of beers affects BAC.
- We need to find the 𝑡-ratio:
  $$ t = \frac{b}{SE_b} = \frac{0.0179637619}{\mathbf{E}} \approx 7.479592073 $$
- Store this value in T.

Beer and Blood Alcohol

- According to the hypotheses, is this a one- or two-sided test?
- Using the 𝑡-ratio we just found, calculate our p-value:

  p = P(T ≥ t)
  p = tcdf(t, ∞, df)
  p = tcdf(7.479592073, 1E99, 14)
  p ≈ 1.484739926E−6
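Without a tcdf command, the upper-tail probability can be approximated numerically. The sketch below is not how the calculator does it: it is an assumed approach that integrates the 𝑡 density with Simpson's rule after the substitution x = 1/u, and the function name `t_upper_tail` is hypothetical. It uses only the Python standard library and only handles t > 0 with at least 2 degrees of freedom.

```python
import math


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t > 0 and df >= 2 using Simpson's rule.

    The substitution x = 1/u maps the infinite tail [t, inf) onto (0, 1/t].
    """
    if t <= 0 or df < 2:
        raise ValueError("this sketch only handles t > 0 and df >= 2")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals

    # Normalizing constant of the t density with df degrees of freedom
    const = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )

    def integrand(u):
        if u == 0.0:
            return 0.0  # limit of the transformed density as u -> 0
        x = 1.0 / u
        return const * (1.0 + x * x / df) ** (-(df + 1) / 2) / (u * u)

    width = (1.0 / t) / steps
    total = integrand(0.0) + integrand(1.0 / t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * width)
    return total * width / 3.0


b = 0.0179637619      # slope from the slides
se_b = 0.0024017034   # standard error of the slope from the slides
t_ratio = b / se_b
p_value = t_upper_tail(t_ratio, df=14)
print(f"t = {t_ratio:.4f}, p = {p_value:.4e}")
```

Only the upper tail is computed because the alternative hypothesis is one-sided (𝛽 > 0).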

Beer and Blood Alcohol

- What is our conclusion to this point? With a p-value so small (𝑝 ≈ 1.48E−6 < 0.001), we have very strong statistical evidence that the number of beers a person drinks does elevate their BAC.
- Specifically, with 95% confidence, each beer a person consumes increases their BAC by between 0.013% and 0.023%.
- However, student number 3’s BAC was 0.04% (0.19 − 0.15) higher than the regression line predicts. Though the 𝑥 is not extreme for this student, it is possible this value is influential. To verify that our results are not too dependent on this one value, we may need to remove it and recalculate.

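Beer and Blood Alcohol

Remove Student 3 and recalculate:
- a = 2.481E-5
- b = 0.0146
- r² = 0.7684
- r = 0.8766

$$ S = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\frac{\operatorname{sum}(L_4)}{13}} \approx 0.0162 $$

$$ SE_b = \frac{s}{\sqrt{\sum (x - \bar{x})^2}} = \frac{\mathbf{S}}{\sqrt{\sum (x - \operatorname{mean}(x))^2}} \approx 0.0022 $$

$$ t = \frac{b}{SE_b} = \frac{0.0146}{0.0022} \approx 6.5894 $$

p = tcdf(t, ∞, df) = tcdf(6.5894, 1E99, 13) ≈ 8.7189E−6

Y1(5) = 0.073 as opposed to 0.077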
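To see how much Student 3 moves the line, the refit can be sketched in Python. The helper `fit_line` is an illustrative assumption, and the lists are assumed to match the beer data table, with Student 3 at index 2.

```python
BEERS = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
BAC = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Fit with all 16 students
a_all, b_all = fit_line(BEERS, BAC)

# Fit again without Student 3 (index 2 in the zero-based lists)
keep = [i for i in range(len(BEERS)) if i != 2]
a_15, b_15 = fit_line([BEERS[i] for i in keep], [BAC[i] for i in keep])

# Predicted BAC after 5 beers under each fit, i.e. Y1(5)
pred_all = a_all + b_all * 5
pred_15 = a_15 + b_15 * 5

print(f"all 16 students:   b = {b_all:.4f}, Y1(5) = {pred_all:.3f}")
print(f"without Student 3: b = {b_15:.4f}, Y1(5) = {pred_15:.3f}")
```

The slope drops when the point is removed, but the prediction at 5 beers changes only in the third decimal place.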

Predicting Blood Alcohol Steve thinks he can drive legally 30 minutes after he finishes drinking 5 beers. We want to predict
