Honors Introduction to Statistics

Practice Problems for Test 3

 

Show your work for full credit.  Feel free to check your answers with your calculator, but answers without supporting work will receive little or no credit.  Always interpret results in the context of the situation.

 

1.         Harley-Davidson motorcycles make up 14% of all motorcycles registered in the United States.  In 1995, 9224 motorcycles were reported stolen; 2490 of these were Harleys.  We can think of motorcycles stolen in 1995 as an SRS of motorcycles stolen in recent years.

a.  If Harleys make up 14% of motorcycles stolen, what would be the sampling distribution of the proportion of Harleys in a sample of 9224 stolen motorcycles?

The sample proportion p̂ is approximately N(0.14, 0.003613), where the standard deviation is sqrt(0.14*0.86/9224) = 0.003613.

b.  Is the proportion of Harleys among stolen bikes significantly higher than their share of all motorcycles?  You could use a hypothesis test to answer this question, but computations are really not necessary.  Explain why not.

p̂ = 2490/9224 = .2699, which is MANY standard deviations (about 36) above the mean of .14, so yes, the proportion of Harleys among stolen bikes is much higher than the proportion of Harleys in all registered bikes.  If we did the test, the p-value would be TINY.
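As a rough check on the arithmetic in Problem 1, the sketch below computes the standard deviation of the sample proportion and how far the observed proportion lies from 0.14. The variable names are illustrative, and the calculation assumes the Normal approximation used in part (a).

```python
from math import sqrt

p0 = 0.14        # hypothesized Harley share among stolen motorcycles
n = 9224         # motorcycles reported stolen in 1995
harleys = 2490   # stolen motorcycles that were Harleys

sd = sqrt(p0 * (1 - p0) / n)   # standard deviation of the sample proportion
p_hat = harleys / n            # observed sample proportion
z = (p_hat - p0) / sd          # distance from 0.14, in standard deviations

print(f"sd = {sd:.6f}, p_hat = {p_hat:.4f}, z = {z:.1f}")
```

A z-score this large is far beyond anything listed in a standard Normal table, which is why no formal computation is needed in part (b).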

2.         A college president says, “99% of the alumni support my firing of Coach Boggs.” 

a.  Describe the population and explain in words what the parameter p is.

The population is all college alumni.  The parameter p is the proportion of all college alumni that support the firing of Coach Boggs.

b.  You contact an SRS of 200 of the college’s 15,000 living alumni and find that 152 of them support firing the coach.  Give the numerical value of the statistic p̂ that estimates p.

p̂ = 152/200 = 0.76

c.  Based on the responses of the alumni you contacted, construct a 99% confidence interval for p.  (Your work should exhibit a correct critical value.  Feel free to check your result using your calculator, but an interval with no work will receive no credit.)

0.76 ± 2.576 * sqrt(0.76*0.24/200) = 0.76 ± 0.078, or (0.682, 0.838)

d.  Explain the meaning of your computation in Part (c).  How does your response relate to the president’s assertion?

We are 99% confident that between 68% and 84% of all alumni support the firing of Coach Boggs.  Since the entire interval lies far below 99%, there is very good reason to believe that the president seriously overestimated the level of support he has from alumni.
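The interval in part (c) can be checked with a short script. This is only a sketch of the large-sample interval for a proportion; the variable names are made up for illustration, and the critical value is taken from Python's `statistics.NormalDist` rather than from a table.

```python
from math import sqrt
from statistics import NormalDist

x, n = 152, 200   # supporters and sample size from part (b)
conf = 0.99       # confidence level

p_hat = x / n
z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 2.576
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"z* = {z_star:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

Comparing `upper` with the president's claimed 0.99 makes the conclusion in part (d) concrete.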

3.         When we toss a coin and call heads or tails to make a decision, we are generally assuming that coins are “fair,” that is, that there are equal chances that a flipped coin will turn up heads and tails.  What if, instead of flipping pennies, we tip them?  (Carefully set a penny on its edge on a table or other sturdy but movable surface, then jar the table to make it fall over.)  Your friend claims that pennies are more likely to turn up heads than tails when they are tipped, and you decide to test her claim by performing a hypothesis test.

a.  State your hypotheses, both in symbols and in words.

H0:  p = 0.5           The proportion of all tipped pennies that come up heads is 0.5.
Ha:  p > 0.5           The proportion of all tipped pennies that come up heads is greater than 0.5.

b.   Suppose you randomly choose 50 pennies, set them on their edges, and tip them.  Of the 50 pennies, 32 come up heads.  Decide if this is reason to believe your friend’s claim.  (Compute the test statistic and the P-value, and then clearly state your conclusion.)

p̂ = 32/50 = 0.64
test statistic: z = (.64-.50)/sqrt(.5*.5/50) = 1.98
p-value:  0.0239

There is good evidence in support of my friend’s claim.  The data suggest that tipped pennies are more likely to turn up heads than tails.
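The one-sided z test in Problem 3 can be sketched as follows. The variable names are illustrative, and the P-value uses the unrounded z, so it can differ in the last digit from a table lookup at z = 1.98.

```python
from math import sqrt
from statistics import NormalDist

heads, n = 32, 50   # tipped pennies that landed heads, and total tipped
p0 = 0.5            # proportion of heads under H0

p_hat = heads / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard error uses p0 because H0 is assumed true
p_value = 1 - NormalDist().cdf(z)            # upper tail, since Ha is p > 0.5

print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```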

 

4.         The carapace lengths (in mm) of 15 mature gopher tortoises randomly selected from the preserve in Abacoa are shown below.

320      295      284      303      315      308      303      305     

272      315      291      294      276      318      278

 


        a.  Examine these data for shape, center, spread, and outliers. 

The shape is roughly uniform, with a center (median) of 303 mm and a spread of 272 to 320 mm.  There are no outliers.


b.  Do you believe the use of our inference techniques is justified in this situation?  Explain your answer.

Yes, the sample was an SRS, the sample size is at least 15, and there is no serious skewness in the distribution.

c.  Give a 95% confidence interval for the mean carapace length of all mature gopher tortoises in the preserve.  Write a complete sentence interpreting the meaning of your interval. (Your sentence should say something about tortoises!).

298.467 ± 2.145*15.7837/sqrt(15) or (289.73, 307.21)

We are 95% confident that the mean carapace length of all mature gopher tortoises in the preserve is between 289.7 and 307.2 mm.

d.  Estimate the sample size you would need to compute a 95% confidence interval with a margin of error less than 3 mm.  Why can’t you give an exact answer?
    
Need 2.145*15.7837/sqrt(n) = 3.  Solve for n and round up to get that n should be at least 128 tortoises.  We can’t get an exact answer because we don’t know what the standard deviation of the new, larger sample will be and have to estimate it with the one we have (s = 15.7837).  We also don’t know the correct t* critical value, since it depends on n; we have used the value for df = 14.
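The calculations in parts (c) and (d) can be reproduced with the sketch below. The critical value 2.145 is hard-coded from the t table (df = 14), as in the worked answer; the variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import mean, stdev

# Carapace lengths (mm) of the 15 sampled tortoises
lengths = [320, 295, 284, 303, 315, 308, 303, 305,
           272, 315, 291, 294, 276, 318, 278]
t_star = 2.145   # table value for 95% confidence with df = 14

n = len(lengths)
x_bar = mean(lengths)
s = stdev(lengths)                 # sample standard deviation
margin = t_star * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

# Part (d): smallest n with t_star * s / sqrt(n) <= 3, reusing s and t_star as rough estimates
target = 3
n_needed = ceil((t_star * s / target) ** 2)

print(f"mean = {x_bar:.3f}, s = {s:.4f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f}), n needed = {n_needed}")
```

Because `s` and `t_star` would both change with a larger sample, `n_needed` is only an estimate.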

5.      A study of computer-assisted learning examined the learning of “Blissymbols” by children.  The researcher designed two computer lessons that taught the same content, one in which students interacted with the material, and one in which students controlled the pace of the lesson but otherwise did not interact with the program.  After the lesson, the computer presented a quiz that asked the children to identify 56 Blissymbols.  Here are the numbers of correct identifications by the 24 children in the Active group:

 

29      28      24      31      15      24      27      23      20      22      23      21

24      35      21      24      44      28      17      21      21      20      28      16

 

And here are the counts for the 24 children in the Passive group:

 

16      14      17      15      26      17      12      25      21      20      18      21

20      16      18      15      26      15      13      17      21      19      15      12

 

a.  Is there good evidence that active learning is superior to passive learning?  State your hypotheses, give a test statistic and P-value, and clearly state your conclusion in the context of student learning.

H0:  μ_a = μ_p          The mean number of correct identifications for active and passive learners is the same
Ha:  μ_a > μ_p          The mean number of correct identifications for active learners is greater than for passive learners

T = (24.41667 - 17.875)/sqrt(6.3102^2/24 + 4.025^2/24) = 4.28, df = 23

p < 0.0005

There is very strong evidence that the average score for all active learners is greater than the average score for all passive learners.

b.  Give a 90% confidence interval for the difference in the mean number of Blissymbols identified correctly by the active learning group and the passive learning group.  Interpret your result.

(24.41667 - 17.875) ± 1.714 * sqrt(6.3102^2/24 + 4.025^2/24), or (3.92, 9.16)

We are 90% confident that the average score for all active learners is 3.9 to 9.2 points higher than the average score for all passive learners.
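A sketch of the two-sample t calculations from parts (a) and (b) follows. It uses the conservative df = 23 and hard-codes the table value t* = 1.714 for 90% confidence, matching the worked answer; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

active = [29, 28, 24, 31, 15, 24, 27, 23, 20, 22, 23, 21,
          24, 35, 21, 24, 44, 28, 17, 21, 21, 20, 28, 16]
passive = [16, 14, 17, 15, 26, 17, 12, 25, 21, 20, 18, 21,
           20, 16, 18, 15, 26, 15, 13, 17, 21, 19, 15, 12]

diff = mean(active) - mean(passive)
# Unpooled standard error of the difference in sample means
se = sqrt(stdev(active) ** 2 / len(active) + stdev(passive) ** 2 / len(passive))
t = diff / se                                  # two-sample t statistic

t_star = 1.714                                 # table value: 90% confidence, df = 23
lower, upper = diff - t_star * se, diff + t_star * se

print(f"t = {t:.2f}, 90% CI = ({lower:.2f}, {upper:.2f})")
```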

c.  What assumptions do your procedures from (a) and (b) require?  Do the data meet these assumptions?  Justify your answer.

We need SRSs from two populations, and the sum of the two sample sizes to be at least 40 if there is skew in the distributions.  Although we don’t have SRSs, we have random assignment into experimental groups, which is reasonable.  Each sample size is 24, so we’re fine.  There is some skewness in the distributions  (a back-to-back stem plot with split stems is handy here), but it’s not bad.  So, yes, we should be okay to use the t-procedures.

6.      Twelve runners are asked to run a 10-kilometer race on each of two consecutive weeks.  In one of the races the runners wear one brand of shoe and in the other a second brand.  The brand they wear in each race is determined at random.  All runners are timed and are asked to run their best in each race.  The results (in minutes) are given below.

Runner    Brand 1    Brand 2    Difference
1         31.23      32.02      -0.79
2         29.33      28.98       0.35
3         30.50      30.63      -0.13
4         32.20      32.67      -0.47
5         33.08      32.95       0.13
6         31.52      31.53      -0.01
7         30.68      30.83      -0.15
8         31.05      31.10      -0.05
9         33.00      33.12      -0.12
10        29.67      29.50       0.17
11        30.55      30.57      -0.02
12        32.12      32.20      -0.08

 

Use the appropriate procedure to determine if there is evidence that the brand of the shoe affects runners’ times.  State your hypotheses, compute the test statistic, give the P-value (or an estimate of it), and interpret your result.


This is a matched pairs design and the differences in times are computed above.  We apply the one sample t-test to the differences.

H0:  μ = 0               The mean difference in times (brand 1 minus brand 2) is 0
Ha:  μ ≠ 0               The mean difference in times is not 0

Test statistic:  T = -0.0975/(.2958/sqrt(12)) = - 1.1418, df = 11

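.20 < p < .30   (two-sided: with df = 11, |T| = 1.14 falls between the table values 1.088 and 1.363, whose one-tail areas are .15 and .10)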

There is no real evidence that shoe brand matters.  The mean difference in times (brand 1 minus brand 2) could plausibly be 0.
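The matched pairs statistic can be recomputed from the raw times with the sketch below. The list names are illustrative. Python's standard library has no t distribution, so the P-value still has to come from a t table with df = 11, doubling the tail area because the alternative is two-sided.

```python
from math import sqrt
from statistics import mean, stdev

brand1 = [31.23, 29.33, 30.50, 32.20, 33.08, 31.52,
          30.68, 31.05, 33.00, 29.67, 30.55, 32.12]
brand2 = [32.02, 28.98, 30.63, 32.67, 32.95, 31.53,
          30.83, 31.10, 33.12, 29.50, 30.57, 32.20]

# One difference per runner (brand 1 minus brand 2)
diffs = [b1 - b2 for b1, b2 in zip(brand1, brand2)]

n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)
t = d_bar / (s_d / sqrt(n))   # one-sample t statistic on the differences
df = n - 1

print(f"mean diff = {d_bar:.4f}, s = {s_d:.4f}, t = {t:.4f}, df = {df}")
```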

7.      The Physician’s Health Study examined the effects of taking an aspirin every other day.  Earlier studies suggested that aspirin might reduce the risk of heart attacks.  The subjects were 22,071 healthy male physicians at least 40 years old.  The study assigned 11,037 of the subjects at random to take aspirin.  The others took a placebo.  The study was double-blind.  The researchers found that 119 participants in the Aspirin group had strokes, while 98 of those in the Placebo group had strokes.  Is this difference significant?  Conduct the appropriate test, be sure the technical conditions for the test have been satisfied, and state your conclusion. 

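H0:  p_a = p_p             The proportion of aspirin takers who have strokes is equal to the proportion of placebo takers who have strokes
Ha:  p_a ≠ p_p             The proportion of aspirin takers who have strokes differs from the proportion of placebo takers who have strokes

p̂_a = 119/11037 = 0.0108,   p̂_p = 98/11034 = 0.0089,   pooled p̂ = 217/22071 = 0.0098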

Since the subjects were randomly assigned, we are willing to treat them as SRSs.  The sample sizes are plenty big to apply the two-sample procedures for proportions.

test statistic:  Z = (119/11037 – 98/11034) / sqrt(217/22071*(1-217/22071)*(1/11037+1/11034)) = 1.43

p = 0.1528

No, the data do not provide evidence that taking aspirin has a significant effect on the incidence of strokes.
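The two-proportion z test above can be sketched as follows. The names are illustrative, and the P-value is computed from the unrounded z, so it may differ slightly from a table lookup at z = 1.43. The last lines list the observed counts of strokes and non-strokes in each group, which is one common way to check that the samples are large enough.

```python
from math import sqrt
from statistics import NormalDist

x_a, n_a = 119, 11037   # strokes and group size, aspirin
x_p, n_p = 98, 11034    # strokes and group size, placebo

p_a, p_p = x_a / n_a, x_p / n_p
pooled = (x_a + x_p) / (n_a + n_p)             # pooled proportion under H0
se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_p))
z = (p_a - p_p) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided alternative

# Successes and failures in each group; all should be reasonably large
counts = [x_a, n_a - x_a, x_p, n_p - x_p]

print(f"z = {z:.2f}, P-value = {p_value:.4f}, smallest count = {min(counts)}")
```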

8.      How do we estimate the standard deviation of the sampling distribution when computing confidence intervals for the difference in proportions?  When conducting significance tests to compare proportions from two populations?  Explain why we use different things, and how the two are related.

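For CIs, we use the standard error formula:  SE = sqrt( p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2 )

For HTs, we use the standard error formula:  SE = sqrt( p̂(1 - p̂)(1/n1 + 1/n2) ), where p̂ = (x1 + x2)/(n1 + n2) is the pooled sample proportion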

For CIs, we make no assumptions about the relationship between the two population proportions, so the standard error estimate just replaces the unknown population proportions that appear in the standard deviation formula (for the sample proportion) with their corresponding sample proportions.  For HTs, we are assuming the two population proportions are the same, so we replace both population proportions by the pooled sample proportion (total number of successes divided by the total number of observations).  The formulas are the same except for this substitution, though they look slightly different because a common term has been factored out in the HT version.
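To see how the two standard errors relate, the sketch below computes both for the aspirin data from Problem 7. The function names `se_unpooled` and `se_pooled` are made up for this illustration.

```python
from math import sqrt

def se_unpooled(x1, n1, x2, n2):
    """Standard error for a CI: each sample proportion is used separately."""
    p1, p2 = x1 / n1, x2 / n2
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def se_pooled(x1, n1, x2, n2):
    """Standard error for a test of H0: p1 = p2, using the pooled proportion."""
    p = (x1 + x2) / (n1 + n2)
    return sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

se_ci = se_unpooled(119, 11037, 98, 11034)
se_test = se_pooled(119, 11037, 98, 11034)

print(f"CI standard error   = {se_ci:.6f}")
print(f"test standard error = {se_test:.6f}")
```

When the two sample proportions are equal, the two functions return the same value, which is the sense in which the formulas are "the same except for this substitution."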


           

9.      We might be interested in the number of final exams that are canceled (including ones given as a take-home or other alternate form).  Is the frequency of departures from an "in-class" final related to the subject area?  Suppose that 45 courses are randomly selected and the type of final exam in each is classified to give the two-way table below.

 

 

                    In-class    Other
Humanities          6           11
Social Sciences     9           6
Natural Sci/Math    12          1

 

a. What sort of test would you perform to answer the question "Is the frequency of departures from an 'in-class' final related to the subject area?"  State your hypotheses.

A chi-square test is appropriate.

H0:  p_h = p_s = p_n       The proportion of “other” finals is the same for all three subject areas
Ha:  Not all proportions are the same

b.  Given that the value of the test statistic is 9.98, test to determine if there is any relationship between the subject area of the course and the type of final given.  Estimate a P-value and state your conclusions in a complete sentence (say something about finals and subject areas).

 

df = 2 and 0.005 < p < 0.010

There is strong evidence that there is a relationship between subject area and type of final.
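The chi-square statistic given in part (b) can be recomputed from the two-way table with the sketch below. The variable names are illustrative. The P-value line relies on a special fact: for df = 2 the upper-tail area of the chi-square distribution is exactly exp(-x/2). For other degrees of freedom a table or statistical software would be needed.

```python
from math import exp

# Rows: Humanities, Social Sciences, Natural Sci/Math; columns: in-class, other
observed = [
    [6, 11],
    [9, 6],
    [12, 1],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count = (row total * column total) / table total
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = exp(-chi2 / 2)   # valid only because df = 2

print(f"chi-square = {chi2:.2f}, df = {df}, P-value = {p_value:.4f}")
```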