AP Statistics Final Review: ANSWER KEY - #1-10

1. Resistant measures are not influenced by outliers or skewness. For example, the median is a resistant measure of center while the mean is not. Likewise, the interquartile range is a resistant measure of spread but both the range and standard deviation are heavily influenced by outliers and skewness.

2a. Who: the 100 volunteers. What: gender and treatment are categorical; initial weight, final weight, and change in weight are quantitative. How: a randomized experiment. When and Where: not specified. Why: to see which weight loss program works better.

2b. Shape & Unusual Values: Both distributions are approximately symmetric, but A has an outlier on the low end (note: we cannot tell anything about the number of peaks). Center: The median of distribution B is higher. Spread: Both the range and IQR are larger for distribution B.

2c. To identify outliers on the low end, calculate Q1 - 1.5(IQR). Any value lower than that is an outlier.

2d. We cannot tell. The lowest 25% for both groups includes both negative and positive values, but we don’t know how many are negative and how many are positive. For example, it is possible that only 2 values are negative in the A distribution and 12 values in the B distribution. Likewise, there may only be 1 negative value in the B distribution while A can have up to 12 (in a set of 50 values, there will be 12 values below Q1).

3a. Note: This is just one possibility. You could make 4 pie charts or a comparative bar chart.
Note: It is better to focus on each grade separately in 4 segmented bar charts than to look at two charts (Obama and McCain) split by grade level.
Note: Since the group sizes are so different, you must make your graph in terms of percents (relative frequencies).

[Figure: segmented bar chart "CDO Mock Election". Vertical axis: Percent (0% to 100%); horizontal axis: Grade Level (9, 10, 11, 12); segments: McCain, Obama.]
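The conversion to percents recommended in 3a can be sketched in Python. This is only an illustration: the counts below are made-up placeholders, not the actual mock election results, and the helper name row_percents is hypothetical.

```python
# HYPOTHETICAL counts per grade (not the real mock election data).
counts = {
    9: {"Obama": 60, "McCain": 40},
    10: {"Obama": 33, "McCain": 27},
    11: {"Obama": 9, "McCain": 11},
    12: {"Obama": 7, "McCain": 3},
}


def row_percents(table):
    """Convert each grade's counts to percents of that grade's total."""
    result = {}
    for grade, votes in table.items():
        total = sum(votes.values())
        result[grade] = {name: 100 * n / total for name, n in votes.items()}
    return result


percents = row_percents(counts)
for grade, row in percents.items():
    print(grade, {name: round(p, 1) for name, p in row.items()})
```

Because each grade is rescaled to 100%, grades of very different sizes can be compared directly. If grade level and preference were independent, the Obama percent would be about the same in every row.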
3b. Grade level and presidential preference are not independent in this sample. If they were, then the same percentage of each grade would prefer Obama. However, more than half of 9th, 10th, and 12th graders prefer Obama while less than half of 11th graders prefer Obama. So, knowing a student’s grade level would help you predict which candidate that student would vote for.

4. Use a bar chart when the data is categorical and a histogram when the data is quantitative. In a bar chart, the bars shouldn’t touch and can be in any order. In a histogram, the bars will touch unless a bin is empty.

5a. x̄ = 7.65, s = 1.30

5b. On average, the amount of sleep a student gets is about 1.30 hours away from the mean.

5c. μ is the population mean (the average we would get if we surveyed the entire population of CDO students). x̄ is the sample mean (7.65), which is an estimate of μ based on a sample of 10 CDO students. It probably isn’t equal to μ, but hopefully it is close.

5d. z = (6 - 7.65)/1.30 = -1.27. The student who got only 6 hours of sleep is 1.27 standard deviations below the mean of the sample.

5e. Since the z-score is a standardized score, it does not depend on the units, so it would be the same.
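The z-score in 5d and the unit argument in 5e can be checked with a short Python sketch. The function name z_score is just an illustrative choice, and the conversion to minutes is one assumed example of a change of units.

```python
import math


def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


# Values from 5a and 5d, in hours.
z_hours = z_score(6, 7.65, 1.30)

# Same data expressed in minutes: every quantity is multiplied by 60.
z_minutes = z_score(6 * 60, 7.65 * 60, 1.30 * 60)

print(round(z_hours, 2))
print(math.isclose(z_hours, z_minutes))
```

The factor of 60 appears in both the numerator and the denominator, so it cancels, which is why the z-score is the same in either unit.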
6a. The shape is single-peaked and approximately symmetric.

6b.
Bin       Freq   Rel. Freq.   Cum. Rel. Freq.
10-<15    1      0.0333       0.0333
15-<20    1      0.0333       0.0667
20-<25    1      0.0333       0.1000
25-<30    3      0.1000       0.2000
30-<35    7      0.2333       0.4333
35-<40    7      0.2333       0.6667
40-<45    4      0.1333       0.8000
45-<50    3      0.1000       0.9000
50-<55    2      0.0667       0.9667
55-<60    0      0.0000       0.9667
60-<65    0      0.0000       0.9667
65-<70    1      0.0333       1.0000

[Figure: "Ogive of Times for Test". Vertical axis: Cum. Rel. Freq. (0 to 1); horizontal axis: Time (0 to 80).]
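The columns in 6b, and the idea of reading a value off the ogive, can be sketched in Python as follows. This is a minimal sketch that assumes the ogive joins the points (upper bin edge, cumulative relative frequency) with straight lines; the name ogive_value is hypothetical.

```python
from itertools import accumulate

# Bin edges and frequencies from the table in 6b.
edges = list(range(10, 75, 5))            # 10, 15, ..., 70
freqs = [1, 1, 1, 3, 7, 7, 4, 3, 2, 0, 0, 1]

n = sum(freqs)                            # total number of times
rel = [f / n for f in freqs]              # relative frequencies
cum = [c / n for c in accumulate(freqs)]  # cumulative relative frequencies


def ogive_value(p):
    """Time at which the ogive reaches cumulative proportion p (0 < p <= 1).

    Uses linear interpolation inside the bin where the ogive crosses p.
    Empty bins (flat stretches) are skipped automatically.
    """
    heights = [0.0] + cum                 # the ogive starts at height 0
    for i in range(len(freqs)):
        lo, hi = heights[i], heights[i + 1]
        if lo < p <= hi:
            width = edges[i + 1] - edges[i]
            return edges[i] + (p - lo) / (hi - lo) * width
    raise ValueError("p must be in (0, 1]")


q1 = ogive_value(0.25)
q3 = ogive_value(0.75)
print(round(q1, 1), round(q3, 1), round(q3 - q1, 1))
```

Interpolating this way gives quartiles of about 31 and 43, which is in line with what tracing across the graph by eye would give.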
6c. In the histogram, most of the data is between 25 and 55, so this is where the ogive is steepest (where the most data is being added to the cumulative relative frequency). There is less data below 25 and above 55, so the ogive is relatively flat there. It is perfectly flat from 55 to 65, where there is no data at all.

6d. Tracing over from 0.75, we estimate Q3 to be 43, and tracing over from 0.25, we estimate Q1 to be 32. So, the IQR is approximately 11.

7a.
7b. Since the data is strongly skewed to the right, I would expect the mean to be greater than the median since the mean is pulled in the direction of the skew.

7c. I would use a relative frequency histogram any time I wanted to know the percent in a category and especially when I am comparing two distributions with different sample sizes.

8a. range = 20 - 6 = 14, IQR = 16.5 - 11 = 5.5

8b. mean = 13.44(3) + 10 = 50.32
s = 3.67(3) = 11.01
median = 13.5(3) + 10 = 50.5
IQR = 5.5(3) = 16.5

8c. Sally’s score was the same as or better than 39% of the test takers.

8d. Still in the 39th percentile.

8e. To make the SD = 15, multiply each value by 15/3.67 = 4.09. Multiplying everything by 4.09 will make the mean = 4.09(13.44) = 54.97, so add an additional 25.03 to each score (54.97 + 25.03 = 80).

9a. min = 1, Q1 = 510, med = 30.5, Q3 = 38, max = 50

[Figure: box plot of the collection. Horizontal axis: texts (0 to 60).]
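The rules used in 8b and 8e (multiplying changes both center and spread, adding changes only center) can be sketched in Python. The function name transform_summary is hypothetical, and the target mean of 80 in the second part is an assumption inferred from 54.97 + 25.03, not a value stated in this key.

```python
def transform_summary(mean, median, sd, iqr, scale, shift):
    """Summary statistics after replacing each value v by scale * v + shift.

    Measures of center are scaled and shifted; measures of spread are only
    scaled (by |scale|), because adding a constant does not change spread.
    """
    return {
        "mean": scale * mean + shift,
        "median": scale * median + shift,
        "sd": abs(scale) * sd,
        "iqr": abs(scale) * iqr,
    }


# 8b: each score is multiplied by 3 and then 10 is added.
new_stats = transform_summary(13.44, 13.5, 3.67, 5.5, scale=3, shift=10)
print({name: round(value, 2) for name, value in new_stats.items()})

# 8e: rescale so that SD = 15, rounding the factor to 4.09 as the key does.
factor = round(15 / 3.67, 2)
# ASSUMPTION: target mean of 80, inferred from 54.97 + 25.03.
target_mean = 80
shift = target_mean - factor * 13.44
print(factor, round(shift, 2))
```

The multiplication has to come first in 8e: it fixes the spread, and the shift afterwards moves the mean without disturbing the SD.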
9b. With the stemplot you were able to see the double-peaked shape, which is not evident in the boxplot. However, in the boxplot it is easy to see where the median is as well as measure the interquartile range (length of the box).

10. Let x = distance the golf ball travels ~ N(250, 15)
a) [Sketch: Normal curve with the horizontal axis marked at 220, 235, 250, 265, 280.]
b) P(x < 231) = P(z < (231 - 250)/15) = P(z < -1.27) = .1020
OR P(x < 231) = normalcdf(-999,231,250,15) = .1026
c) P(x > 300) = normalcdf(300,999,250,15) = .0004 (can also do with z-scores)
d) P(240 < x < 260) = normalcdf(240,260,250,15) = .4950 (can also do with z-scores)
e) (draw picture with area of .75 to left of boundary) boundary = invnorm(.75,250,15) = 260.1 yards
OR From table: z = .67 = (x - 250)/15. Solving for x = 260.05 yards
f) (draw picture with area of .10 to left of boundary so .90 is to the right of the boundary) boundary = invnorm(.10,250,15) = 230.8 yards (also can do with z-scores)
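The calculator work in problem 10 can be cross-checked with Python's standard library. This is a sketch using statistics.NormalDist in place of normalcdf and invnorm; the variable names are illustrative only.

```python
from statistics import NormalDist

# Model from problem 10: mean 250 yards, standard deviation 15 yards.
dist = NormalDist(mu=250, sigma=15)

# Areas under the curve (the role played by normalcdf on the calculator).
p_below_231 = dist.cdf(231)
p_above_300 = 1 - dist.cdf(300)
p_240_to_260 = dist.cdf(260) - dist.cdf(240)

# Boundaries for a given area to the left (the role played by invnorm).
boundary_75 = dist.inv_cdf(0.75)
boundary_10 = dist.inv_cdf(0.10)

print(round(p_below_231, 4), round(p_above_300, 4), round(p_240_to_260, 4))
print(round(boundary_75, 1), round(boundary_10, 1))
```

These agree with the normalcdf and invnorm answers. The table-based answers differ slightly (.1020 versus .1026, 260.05 versus 260.1) only because z is rounded to two decimal places before the table lookup.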