Exam 1                                                       Name ______________________________

STA 2023, Spring 2007

 

Instructions: Please show your work and clearly indicate your answers.  For full credit, your work and/or explanation must support your answer.  Always specify units.  Provide explanations where requested!

I expect you to exhibit a level of individual academic integrity that is commensurate with being a part of the Honors College.

Please acknowledge that integrity by signing the honor statement at the end of the test.

 

1.        (18 points) In 2004, the Boston Red Sox won the World Series for the first time in 86 years.  The table below gives the salaries of the Red Sox players as of opening day of the 2005 season. 

Player      Salary          Player      Salary
Ramirez     $19,806,820     Embree      $3,000,000
Schilling   $14,500,000     Timlin      $2,750,000
Damon       $8,250,000      Bellhorn    $2,750,000
Renteria    $8,000,000      Mueller     $2,500,000
Varitek     $8,000,000      Arroyo      $1,850,000
Nixon       $7,500,000      Mirabelli   $1,500,000
Foulke      $7,500,000      Miller      $1,500,000
Clement     $6,500,000      Halama      $850,000
Ortiz       $5,250,000      Mantei      $750,000
Wakefield   $4,670,000      Vazquez     $700,000
Wells       $4,075,000      Myers       $600,000
Payton      $3,500,000      Youkilis    $323,125
Millar      $3,500,000      Stern       $316,000

 

a.  Illustrate the distribution of salaries with a histogram or stemplot.  (If you use a histogram, say what your bins include.  If you use a stemplot, say what your stems and leaves represent.  Label appropriately.)













b.  Give an appropriate numerical summary for the distribution.

 

 

 

c.  Write a sentence or two describing the distribution.  Be sure to address features of the distribution that were discussed repeatedly in class and in your text.





d.  Which members of the team, if any, have salaries that are considered outliers by the 1.5 × IQR rule?  Your work should support your answer.
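The sketch below checks the arithmetic for parts (b) and (d) in Python. It assumes the textbook convention for quartiles: Q1 and Q3 are the medians of the lower and upper halves of the sorted data. Calculators and software that interpolate between observations may report slightly different quartiles.

```python
from statistics import median

# Opening-day 2005 salaries (dollars), copied from the Red Sox table.
salaries = {
    "Ramirez": 19_806_820, "Schilling": 14_500_000, "Damon": 8_250_000,
    "Renteria": 8_000_000, "Varitek": 8_000_000, "Nixon": 7_500_000,
    "Foulke": 7_500_000, "Clement": 6_500_000, "Ortiz": 5_250_000,
    "Wakefield": 4_670_000, "Wells": 4_075_000, "Payton": 3_500_000,
    "Millar": 3_500_000, "Embree": 3_000_000, "Timlin": 2_750_000,
    "Bellhorn": 2_750_000, "Mueller": 2_500_000, "Arroyo": 1_850_000,
    "Mirabelli": 1_500_000, "Miller": 1_500_000, "Halama": 850_000,
    "Mantei": 750_000, "Vazquez": 700_000, "Myers": 600_000,
    "Youkilis": 323_125, "Stern": 316_000,
}

def five_number_summary(values):
    """Min, Q1, median, Q3, max; Q1 and Q3 are medians of the lower and upper
    halves, leaving the middle value out of both halves when n is odd."""
    xs = sorted(values)
    n = len(xs)
    lower = xs[: n // 2]
    upper = xs[(n + 1) // 2 :]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

summary = five_number_summary(salaries.values())
_, q1, _, q3, _ = summary
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = sorted(p for p, s in salaries.items() if s < low_fence or s > high_fence)

print("five-number summary:", summary)
print("fences:", low_fence, high_fence)
print("outliers:", outliers)
```

With these 26 salaries the quartiles are $1,500,000 and $7,500,000. The fences are therefore −$7,500,000 and $16,500,000, and only Ramirez lies outside them.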



 

2.        (10 points) The figure below shows the number of servings of fruit per day claimed by 74 seventeen-year-old girls in a study in Pennsylvania.



a.  Describe this distribution in words.  Address the features discussed in class.





b.  What percent of the girls ate fewer than two servings of fruit per day?



c.  Compute the 5 number summary for these data.





3.        (12 points) Mechanical measurements on supposedly identical objects usually vary.  The variation often follows a normal distribution.  The stress required to break a type of bolt varies Normally with mean 75 kilopounds per square inch (ksi) and standard deviation 8.3 ksi.

a.  Find the z-score for a bolt that breaks at a stress of 90 ksi.




b.  What proportion of these bolts will withstand a stress of 90 ksi without breaking?






c.  What range covers the middle 50% of breaking strengths for these bolts?
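This short Python sketch reproduces parts (a) through (c) without a Normal table. It needs Python 3.8 or later for `statistics.NormalDist`. Because it does not round z to two decimals, its answers can differ from table-based answers in the third or fourth decimal place.

```python
from statistics import NormalDist

# Breaking stress in ksi: Normal with mean 75 and standard deviation 8.3.
bolts = NormalDist(mu=75, sigma=8.3)

z = (90 - bolts.mean) / bolts.stdev                   # (a) standardize 90 ksi
withstand = 1 - bolts.cdf(90)                         # (b) P(breaking stress > 90 ksi)
q1, q3 = bolts.inv_cdf(0.25), bolts.inv_cdf(0.75)     # (c) middle 50% of strengths

print(f"z = {z:.2f}")
print(f"proportion withstanding 90 ksi = {withstand:.4f}")
print(f"middle 50%: {q1:.2f} to {q3:.2f} ksi")
```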








4.        (4 points)
a.  Choose four numbers from the whole numbers 0 to 10 (repeats allowed) with the smallest possible standard deviation.



b.  Choose four numbers from the whole numbers 0 to 10 (repeats allowed) with the largest possible standard deviation.
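There are only 1,001 unordered choices of four whole numbers from 0 to 10, so a brute-force search in Python can confirm the intuition. The smallest spread comes from making all values equal, and the largest from piling values at both extremes. The sketch uses the sample standard deviation (divisor n − 1); the choice of divisor does not change which set wins.

```python
from itertools import combinations_with_replacement
from statistics import stdev

# Every multiset of four whole numbers from 0 to 10 (repeats allowed).
choices = list(combinations_with_replacement(range(11), 4))

smallest = min(choices, key=stdev)   # first choice reaching the minimum SD (any four equal values work)
largest = max(choices, key=stdev)    # the choice with the maximum SD

print(len(choices), smallest, stdev(smallest))
print(largest, round(stdev(largest), 3))
```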


 

5.        (20 points) The table below gives the heights (in inches) of 11 adult brother-sister pairs.

Brother   71   68   66   67   70   71   70   73   72   65   66
Sister    69   64   65   63   65   62   65   64   66   59   62


a.  Find the LSR line for predicting a sister’s height from a brother’s height.  Record the equation of your line.



b.  Damien is 70 inches tall.  Predict the height of his sister Tonya.



c.  Use your result from the previous part, along with another prediction (brother’s height 65 inches would be a good choice) to carefully draw the LSR line on the scatterplot below.
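A small Python sketch using only the standard library can check the least-squares computations for parts (a) through (c). It computes the slope as Sxy/Sxx, which equals r·s_y/s_x. It then uses the fact that the line passes through (x̄, ȳ) to find the intercept.

```python
from math import fsum, sqrt

# Heights in inches for the 11 brother-sister pairs.
brother = [71, 68, 66, 67, 70, 71, 70, 73, 72, 65, 66]
sister = [69, 64, 65, 63, 65, 62, 65, 64, 66, 59, 62]

def lsr_line(x, y):
    """Return (intercept a, slope b, correlation r) for the line y-hat = a + b*x."""
    n = len(x)
    x_bar, y_bar = fsum(x) / n, fsum(y) / n
    sxy = fsum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = fsum((xi - x_bar) ** 2 for xi in x)
    syy = fsum((yi - y_bar) ** 2 for yi in y)
    b = sxy / sxx              # slope, equivalently r * s_y / s_x
    a = y_bar - b * x_bar      # the line passes through (x-bar, y-bar)
    r = sxy / sqrt(sxx * syy)
    return a, b, r

a, b, r = lsr_line(brother, sister)
print(f"y-hat = {a:.2f} + {b:.4f} x,  r = {r:.4f},  r^2 = {r * r:.4f}")
for height in (70, 65):
    print(height, "->", round(a + b * height, 2))
```

For these data x̄ = 69 and ȳ = 64 inches. The fitted line is roughly ŷ = 27.64 + 0.527x, with r ≈ 0.558.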

 


       

       

 

d.  One point on the scatterplot represents two observations.  Circle it.

 

       

 

 

 

 

 

       

e.  Based on the scatterplot and the correlation r, do you expect your prediction to be very accurate?  Explain, and include the value of r in your response.





f.  How would the correlation change if all the men were 6 inches shorter than reported in the table?


g.  If heights were measured in centimeters rather than inches, how would the correlation change?  (There are 2.54 cm in an inch.)


h.  If every sister were exactly 3 inches shorter than her brother, what would be the correlation between brother and sister heights?


i.  Report the value of r² and write a sentence that carefully interprets this value in the context of sibling heights.
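Three of the questions above rest on how r behaves under linear transformations: shifting the men's heights, converting to centimeters, and making every sister exactly 3 inches shorter than her brother. A Python sketch can check them numerically. It is a check on the reasoning, not a substitute for it.

```python
from math import fsum, sqrt

def correlation(x, y):
    """Pearson correlation r computed from deviations about the means."""
    n = len(x)
    x_bar, y_bar = fsum(x) / n, fsum(y) / n
    sxy = fsum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = fsum((xi - x_bar) ** 2 for xi in x)
    syy = fsum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

brother = [71, 68, 66, 67, 70, 71, 70, 73, 72, 65, 66]
sister = [69, 64, 65, 63, 65, 62, 65, 64, 66, 59, 62]

r = correlation(brother, sister)
r_shifted = correlation([h - 6 for h in brother], sister)                  # men 6 inches shorter
r_cm = correlation([2.54 * h for h in brother], [2.54 * h for h in sister])  # inches -> cm
r_exact = correlation(brother, [h - 3 for h in brother])                   # each sister 3 in. shorter

print(round(r, 4), round(r_shifted, 4), round(r_cm, 4), r_exact)
print("r^2 =", round(r * r, 4))
```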




 

 

6.        (18 points) We sometimes hear that getting married is good for your career.  The table below presents data from one of the studies behind this generalization.  To avoid gender effects, the investigators looked only at men.  The data describe the marital status and job level of all 8235 male managers and professionals employed by a large manufacturing firm.  The job grades are a measure of the value of a job to the company, with 1 being low and 4 being high.

 

           Single   Married   Divorced   Widowed   Total
Grade 1        58       874         15         8     955
Grade 2       222      3927         70        20    4239
Grade 3        50      2396         34        10    2490
Grade 4         7       533          7         4     551
Total         337      7730        126        42    8235

 

 

a.  Give (in percents – you may round to the nearest full percent) the two marginal distributions (one for marital status and one for job grade).  You can write your percentages at the margins of the table above if you’d like.  Does each of your two sets of percentages add to exactly 100%?  If not, why not?







b. Give (in percents) the conditional distribution of job grade among single men.




 

 



c.  Give (in percents) the conditional distribution of job grade among married men.
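This Python sketch computes the marginal and conditional percentages for the marital-status table, rounding to whole percents as part (a) allows. Python's `round` sends exact halves to the nearest even number, but none of these counts produces an exact half.

```python
# Counts from the marital-status table: rows are job grades, columns are statuses.
statuses = ["Single", "Married", "Divorced", "Widowed"]
counts = {
    "Grade 1": [58, 874, 15, 8],
    "Grade 2": [222, 3927, 70, 20],
    "Grade 3": [50, 2396, 34, 10],
    "Grade 4": [7, 533, 7, 4],
}
total = sum(sum(row) for row in counts.values())

# (a) Marginal distributions, rounded to whole percents.
status_totals = [sum(row[j] for row in counts.values()) for j in range(len(statuses))]
marital_pct = [round(100 * t / total) for t in status_totals]
grade_pct = {g: round(100 * sum(row) / total) for g, row in counts.items()}

# (b), (c) Conditional distribution of job grade within one marital status.
def grade_given(status):
    j = statuses.index(status)
    return {g: round(100 * row[j] / status_totals[j]) for g, row in counts.items()}

print("marital status:", dict(zip(statuses, marital_pct)), "sum =", sum(marital_pct))
print("job grade:", grade_pct, "sum =", sum(grade_pct.values()))
print("single:", grade_given("Single"))
print("married:", grade_given("Married"))
```

The rounded marital-status percentages (4, 94, 2, 1) add to 101%, which is the rounding effect that part (a) asks about.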





 

d.  Briefly explain the relationship that your conditional distributions reveal.



 


e.  We should not conclude that single men can help their careers by getting married.  What lurking variables might help explain the association between marital status and job grade?

 

 



 

 

 

 

 

 

 

7.        (6 points)  We expect that students who do well on the midterm exam in a course will usually also do well on the final exam.  Professor Smith looked at the exam scores of all 346 students who took his statistics class over a 10-year period.  The least-squares line for predicting final-exam score from midterm-exam score was.

Octavio scores 10 points above the class mean on the midterm.  How many points above the class mean do you predict that he will score on the final?  (Hint:  Use the fact that the LSR line passes through the point (x̄, ȳ) and the fact that Octavio’s midterm score is x̄ + 10.)
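The hint comes down to one line of algebra. Write the least-squares line as ŷ = a + bx, where a and b are the numerical intercept and slope given in the problem (not reproduced here). Subtracting the fact that the line passes through (x̄, ȳ) gives:

```latex
\hat{y} = a + b\,x, \qquad \bar{y} = a + b\,\bar{x}
\;\Longrightarrow\;
\hat{y} - \bar{y} = b\,(x - \bar{x}) = b\,\bigl((\bar{x} + 10) - \bar{x}\bigr) = 10\,b
```

So the predicted final-exam score is 10b points above the class mean, where b is the slope of the line.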









8.        (12 points)  Wabash Tech has two professional schools, business and law.  The tables below show applicants to both schools, categorized by gender and admission decision, as well as the combined totals.


Business School
          Admit   Deny
Male        480    120
Female      180     20

Law School
          Admit   Deny
Male         10     90
Female      100    200

Totals
          Admit   Deny
Male        490    210
Female      280    220



a.  Calculate the percent of all male applicants that are admitted and the percent of all female applicants that are admitted.

                                        males __________              females _________

b.  Now compute separately the percents of male and female applicants admitted by the business school and by the law school.

        Business                                males __________              females _________

 

        Law                         males __________              females _________


c.  What is interesting about your answers to the previous parts?  This phenomenon has a name.  What is it?





d.  Explain carefully, as if speaking to a skeptical reporter, how it can happen that Wabash appears to favor males when each school individually favors females.  (In particular, what is going on behind the scenes?)
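This Python sketch works through the Wabash Tech arithmetic. Besides the admission rates for parts (a) and (b), it prints the share of each gender's applicants who applied to the law school. That share is the "behind the scenes" variable part (d) asks about.

```python
# Admission counts from the Wabash Tech tables: (admitted, denied).
schools = {
    "Business": {"Male": (480, 120), "Female": (180, 20)},
    "Law": {"Male": (10, 90), "Female": (100, 200)},
}

def pct_admitted(admit, deny):
    return 100 * admit / (admit + deny)

# (b) Admission rate by school and gender.
for school, by_gender in schools.items():
    for gender, (admit, deny) in by_gender.items():
        print(f"{school:8} {gender:6} {pct_admitted(admit, deny):5.1f}% admitted")

# (a) Combining the schools hides where each gender applied.
combined = {}
for by_gender in schools.values():
    for gender, (admit, deny) in by_gender.items():
        a, d = combined.get(gender, (0, 0))
        combined[gender] = (a + admit, d + deny)

for gender, (admit, deny) in combined.items():
    share_law = sum(schools["Law"][gender]) / (admit + deny)
    print(f"Overall  {gender:6} {pct_admitted(admit, deny):5.1f}% admitted; "
          f"{100 * share_law:.0f}% applied to law")
```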



 

 

 

 

 

 

 

 

 

 

I have abided by the principles of the Honor Code in completing this test.  _________________________________________

                                                                                                                                                                Signature

Exam 2                                                       Name ______________________________

STA 2023, Spring 2007

 

 

  1. (8 points)  The Ministry of Health in the Canadian province of Ontario wants to know whether the national health care system is achieving its goals in the province.  Much information about health care comes from patient records, but that source doesn’t allow us to compare people who use health services with those who don’t.  So the Ministry of Health conducted the Ontario Health Survey, which interviewed a random sample of 61,239 people who live in Ontario.

    a.  What is the population for this survey?  What is the sample?

          population:

          sample:


    b.   The survey found that 76% of males and 86% of females in the sample had visited a general practitioner at
          least once in the past year.  These values are (circle one):  parameters or statistics.

    c.  Do you think these estimates are close to the truth about the entire population?  Why?






  2. (10 points) The level of nitrogen oxides in the exhaust of a particular car model varies with mean 0.9 grams per mile (g/mi) and standard deviation 0.15 g/mi.  A company has 125 cars of this model in its fleet.

a.       What is the approximate distribution of the mean NOx emission level x̄ for the company’s cars?




b.       What is the probability that the mean NOx emission level for the company’s cars is less than 0.8 grams per mile?









c.       What is the level L such that the probability that x̄ is greater than L is only 0.01?
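This Python sketch covers the fleet-mean questions. It treats the 125 cars as a simple random sample from this model's emission distribution. Under that assumption, the central limit theorem makes x̄ approximately Normal with mean 0.9 g/mi and standard deviation 0.15/√125 g/mi.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 0.9, 0.15, 125     # g/mi per car; 125 cars in the fleet

# (a) x-bar is approximately Normal(mu, sigma / sqrt(n)).
x_bar = NormalDist(mu, sigma / sqrt(n))

p_below = x_bar.cdf(0.8)          # (b) P(x-bar < 0.8)
level_L = x_bar.inv_cdf(0.99)     # (c) the L with P(x-bar > L) = 0.01

print(f"SD of x-bar = {x_bar.stdev:.4f} g/mi")
print(f"P(x-bar < 0.8) = {p_below:.2e}")
print(f"L = {level_L:.3f} g/mi")
```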







  3. (14 points)  People who eat lots of fruits and vegetables have lower rates of colon cancer than those who eat little of these foods.  Fruits and vegetables are rich in “antioxidants” such as vitamins A, C, and E.  Will taking antioxidants help prevent colon cancer?  A medical experiment studied this question with 864 people who were at risk of colon cancer.  The subjects were divided into four groups:  daily beta-carotene, daily vitamins C and E, all three vitamins every day, or daily placebo.  After four years, the researchers were surprised to find no significant difference in colon cancer among the groups.

    a.  What are the explanatory and response variables in this study?

          explanatory:

          response:

    b.  Outline the design of the experiment, specifying group sizes and explicitly explaining how
          randomization is used.
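One way to carry out the random assignment is to shuffle the subject labels and split the shuffled list into four groups of 864/4 = 216. The Python sketch below does this with `random.shuffle`. The subject labels 1 to 864 and the seed are hypothetical, chosen only for illustration.

```python
import random

# Hypothetical labels 1..864 for the subjects.
subjects = list(range(1, 865))
treatments = ["beta-carotene", "vitamins C and E", "all three vitamins", "placebo"]

rng = random.Random(2007)     # fixed seed only so the sketch is reproducible
rng.shuffle(subjects)         # a random ordering plays the role of a random digits table

group_size = len(subjects) // len(treatments)     # 864 / 4 = 216
groups = {
    t: sorted(subjects[i * group_size:(i + 1) * group_size])
    for i, t in enumerate(treatments)
}

for t, members in groups.items():
    print(f"{t:20} {len(members)} subjects, e.g. {members[:5]}")
```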









    c.  The study was double-blind.  What does this mean?




    d.  What does “no significant difference” mean in describing the outcome of the study?






    e.  Suggest some lurking variables that could explain why people who eat lots of fruits and
          vegetables have lower rates of colon cancer.  The experiment suggests that these variables,
          rather than the antioxidants, may be responsible for the observed benefits of fruits and
          vegetables.




4.       (4 points) A researcher looking for evidence of extrasensory perception (ESP) tests 500 subjects.  Four of these subjects do significantly better (P < 0.01) than random guessing.  Is it proper to conclude that these four people have ESP?  Explain your answer.  If the researcher wanted to test whether any of these four subjects have ESP, what should she do?
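A quick Python calculation shows why four "significant" subjects out of 500 are unremarkable. It assumes the 500 tests are independent and that nobody has ESP. Under those assumptions, each subject has probability 0.01 of reaching P < 0.01 by chance alone.

```python
from math import comb

n, alpha = 500, 0.01     # 500 subjects, each tested at the 1% level

# With no ESP at all, each subject still has probability alpha of a "significant" result.
expected_false_positives = n * alpha

# P(at least 4 of 500 significant by chance), with X ~ Binomial(500, 0.01).
p_at_least_4 = 1 - sum(comb(n, k) * alpha**k * (1 - alpha) ** (n - k) for k in range(4))

print(expected_false_positives, round(p_at_least_4, 3))
```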





 

 

  5. (15 points)  A couple plans to have three children.  There are 8 possible arrangements of girls and boys.  For example, GGB means the first two children are girls and the third child is a boy.  All 8 arrangements are (approximately) equally likely.

    a.  Write down all 8 arrangements of the sexes of the three children.  What is the probability of any
          one of these arrangements?





    b.  Let X be the number of girls the couple has.  What is the probability that X = 2?





    c.  Find the distribution of the random variable X.  That is, list the possible values X can take
          along with the probability for each.






    d.  Ashley has three children.  Given that at least one is a girl, what is the conditional probability that she
          has three girls?




    e.  Brianna also has three children.  Given that her oldest child is a girl, what is the conditional
          probability that she has three girls?
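The sample space is small enough to enumerate. The Python sketch below lists the 8 outcomes, builds the distribution of X, and computes the two conditional probabilities by counting. It reads Ashley's condition as "at least one child is a girl" and treats the first letter of each outcome as the oldest child.

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely birth orders, oldest child first (e.g. "GGB").
outcomes = ["".join(kids) for kids in product("GB", repeat=3)]
p_each = Fraction(1, len(outcomes))

# Distribution of X = number of girls.
dist = {}
for o in outcomes:
    x = o.count("G")
    dist[x] = dist.get(x, 0) + p_each

def conditional(event, given):
    """P(event | given) by counting equally likely outcomes."""
    kept = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in kept if event(o)), len(kept))

def three_girls(o):
    return o == "GGG"

p_ashley = conditional(three_girls, lambda o: "G" in o)       # at least one girl
p_brianna = conditional(three_girls, lambda o: o[0] == "G")   # oldest child is a girl

print(outcomes, p_each)
print(dict(sorted(dist.items())))
print(p_ashley, p_brianna)
```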




6.    (6 points)
a.  When asked to explain the meaning of “the P-value was P = 0.03,” a student says, “This means
      there is only probability 0.03 that the null hypothesis is true.”  Is this an essentially correct
      explanation?  Explain your answer fully.





b.  Another student, when asked why statistical significance appears so often in research reports,
      says, “Because saying that results are significant tells us that they cannot easily be explained by
      chance variation alone.”  Do you think that this statement is essentially correct?  Explain your
      answer fully.


 



  7.  (16 points) Leaking from underground gasoline tanks at service stations can damage the environment.  It is estimated that 25% of these tanks leak.  You examine 15 tanks chosen at random, independently from one another.  Let X be the number of tanks (out of 15) that are leaking.

a.       Explain why you expect X to follow a binomial distribution.  (You should verify four conditions for the binomial setting.)






b.       What is the probability that exactly 5 of the 15 tanks leak?





c.       Now you do a larger study, examining a random sample of 900 tanks nationally.  What are the mean and standard deviation for the number of tanks in 900 that are leaking? 

mean:

standard deviation:

d.       What is the approximate probability that at least 300 of these 900 tanks are leaking?







e.       In part (b) you found the probability that exactly 1/3 of the sample was leaking, while in part (d) you found the probability that at least 1/3 of the sample was leaking.  Yet, the probability from part (d) is (or should be!) much smaller than that from part (b).  Explain why this makes sense.
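This Python sketch works through parts (b) through (d). Part (d) uses the Normal approximation with mean 225 and standard deviation √168.75 ≈ 12.99, without a continuity correction.

```python
from math import comb, sqrt
from statistics import NormalDist

p = 0.25                                      # proportion of tanks that leak

# (b) Exact binomial probability with n = 15.
n_small = 15
p_exactly_5 = comb(n_small, 5) * p**5 * (1 - p) ** (n_small - 5)

# (c) Mean and SD of X ~ Binomial(900, 0.25).
n_big = 900
mean = n_big * p
sd = sqrt(n_big * p * (1 - p))

# (d) Normal approximation to P(X >= 300).
p_at_least_300 = 1 - NormalDist(mean, sd).cdf(300)

print(f"P(X = 5) = {p_exactly_5:.4f}")
print(f"mean = {mean}, sd = {sd:.2f}")
print(f"P(X >= 300) ~ {p_at_least_300:.1e}")
```

The contrast in part (e) comes from the sample size. With 900 tanks, 300 is 75 above the mean of 225, almost 6 standard deviations. With 15 tanks, 5 is less than one standard deviation (about 1.68) above the mean of 3.75.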






 



  8. (12 points) A recent article in the Palm Beach Post discussed the amounts of money spent by parents to prepare their children to take the SAT test. Historically, high school students gained an average of 22 points in their second attempt at the SAT mathematics exam, with no additional preparation.  Assume the change in score has a Normal distribution with standard deviation σ = 50 points. 

a.       Find the z* critical value that would be used to compute a 92% confidence interval for the mean increase in SAT math scores for all students on their second attempt, based on a sample from a recent year.  You do NOT need to compute a confidence interval (you have no data with which to do so).



b.       After taking the SAT once, a random sample of 100 high school students is provided with computer software to prepare to take the SAT test a second time.  These students scored an average of 30 points higher in their second attempt.  Do the data provide evidence that the computer software is more valuable in raising SAT scores than just retaking the test?  State your hypotheses, compute the test statistic, estimate a P-value, and clearly state your conclusion in the context of SAT scores.

hypotheses:





test statistic:




P-value:




conclusion:
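This Python sketch covers the z* lookup in part (a) and the one-sample z test in part (b). It assumes σ = 50 is known and that the alternative is one-sided, Ha: μ > 22, as the question's wording suggests.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()                      # standard Normal

# (a) z* for 92% confidence leaves 4% in each tail.
z_star = Z.inv_cdf(1 - 0.08 / 2)

# (b) H0: mu = 22 vs Ha: mu > 22, where mu is the mean gain with the software.
mu0, sigma, n, x_bar = 22, 50, 100, 30
z = (x_bar - mu0) / (sigma / sqrt(n))
p_value = 1 - Z.cdf(z)                # one-sided, upper tail

print(f"z* = {z_star:.3f}")
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```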






  9. (15 points) Here is a distribution of the lengths in feet of 44 great white sharks, together with some summary statistics.  Assume the standard deviation for the lengths of all great white sharks is σ = 3 feet.

 

                           

 

a.       Describe the distribution.  Is it reasonable to think these data might have come from a normal population?






b.       Give a 95% confidence interval for the mean length of great white sharks.  Give a one sentence interpretation of your interval, in the context of sharks.










c.       How large a sample would be needed to estimate the mean length of all great white sharks to within 0.5 feet?
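This Python sketch applies the sample-size formula n = (z*σ/m)². The question does not state a confidence level, so the sketch assumes 95%, matching the interval in part (b). The result is always rounded up to the next whole shark.

```python
from math import ceil
from statistics import NormalDist

sigma, margin = 3, 0.5                    # feet
z_star = NormalDist().inv_cdf(0.975)      # assumed 95% confidence

n = ceil((z_star * sigma / margin) ** 2)  # round up, never down
print(round(z_star, 3), n)
```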






d.       Based on the interval you found above, is there significant evidence at the 5% level to reject the claim that “Great white sharks average 20 feet in length”?  Explain briefly.

 





 

  10. (BONUS 8 points)  The unique colors of cashmere sweaters your firm makes result from heating undyed yarn in a kettle with a dye liquor.  The pH (acidity) of the liquor is critical for regulating dye uptake and hence the final color.  There are 5 kettles, all of which receive dye liquor from a common source.  Twice each day, the pH of the liquor in each kettle is measured, giving a sample of size 5.  The process has been operating in control with μ = 4.22 and σ = 0.127. 

    a.  Give the center line and control limits for the x̄ chart.





    b.  What are the natural tolerances for the individual pH measurements?
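This Python sketch computes the x̄-chart limits and the natural tolerances using the usual 3-sigma rules. Limits for sample means of n = 5 kettles are μ ± 3σ/√n. Natural tolerances for individual pH measurements are μ ± 3σ.

```python
from math import sqrt

mu, sigma, n = 4.22, 0.127, 5     # in-control pH mean and SD; 5 kettles per sample

# (a) x-bar chart: center line mu, control limits mu +/- 3 * sigma / sqrt(n).
sigma_xbar = sigma / sqrt(n)
lcl, ucl = mu - 3 * sigma_xbar, mu + 3 * sigma_xbar

# (b) Natural tolerances for individual measurements: mu +/- 3 * sigma.
low_tol, high_tol = mu - 3 * sigma, mu + 3 * sigma

print(f"CL = {mu}, LCL = {lcl:.3f}, UCL = {ucl:.3f}")
print(f"natural tolerances: {low_tol:.3f} to {high_tol:.3f}")
```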





    c.  Explain what an x̄ control chart is used for.




 



I have adhered to the principles of the Honor Code in completing this test.  __________________________     

Exam 3                                                       Name ______________________________

STA 2023, Spring 2007

 

            Show your work on all problems.  Answers with no work will receive no credit.  When you are asked for conclusions or interpretations of your results, your response should say something about the original data.  Include units whenever appropriate.

            If you use your calculator for something other than arithmetic, say what you did with your calculator and indicate what function(s) you used.  You are strongly encouraged to use your calculator to check the results of your confidence intervals and tests for significance whenever possible (and you need not write anything about these confirmations), but your work should exhibit that you understand how to perform the required procedure. 

            I expect you to exhibit a level of individual academic integrity that is commensurate with being a part of the Honors College.  Please acknowledge that integrity by signing the honor statement at the end of the test.

 

11.   (14 points)  Some FAU students are concerned about having two final exams on the same day.  To investigate the extent of this problem, the student government would like to conduct a survey.

 

a.       If the student government wants to estimate the proportion of students who have two finals on the same day to within 0.03 with 90% confidence, how many students should be surveyed?
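This Python sketch applies the sample-size formula n = (z*/m)² p*(1 − p*). With no prior estimate of the proportion, it uses the conservative guess p* = 0.5.

```python
from math import ceil
from statistics import NormalDist

margin = 0.03
z_star = NormalDist().inv_cdf(0.95)   # 90% confidence leaves 5% in each tail
p_guess = 0.5                         # conservative guess when p is unknown

n = ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))
print(round(z_star, 3), n)
```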

 

 

 

 

 

 

 

 

 

 

 

 

 

b.       Suppose the student in charge of the survey eventually uses a sample of size n = 40 and finds that 6 students have two exams on the same day.  Use the plus 4 method to construct a 96% confidence interval for the overall proportion of students at FAU who have this problem.  Write a sentence interpreting your result.
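This Python sketch builds the plus four interval. It adds two successes and two failures, giving p̃ = (6 + 2)/(40 + 4), and then applies the usual large-sample formula with n + 4 = 44 observations.

```python
from math import sqrt
from statistics import NormalDist

successes, n = 6, 40
conf = 0.96

# Plus four: add 2 successes and 2 failures before computing the interval.
p_tilde = (successes + 2) / (n + 4)
se = sqrt(p_tilde * (1 - p_tilde) / (n + 4))
z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 2.054 for 96%

low, high = p_tilde - z_star * se, p_tilde + z_star * se
print(f"p-tilde = {p_tilde:.4f}, z* = {z_star:.3f}, interval = ({low:.3f}, {high:.3f})")
```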

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 



12.   (16 points)  A biologist who studies spiders is interested in comparing the lengths of male and female green lynx spiders.  Summary statistics and plots for the lengths, in mm, of 30 male spiders and 25 female spiders are shown below.

                            

a.       Describe the distributions of the spider lengths.  Compare and contrast the male and female distributions.




 



b.       Do these data meet our technical assumptions for inference?  Explain why or why not.





c.       Find a 95% confidence interval for the difference in mean lengths of the two groups of spiders.  Write a one- or two-sentence interpretation of your confidence interval. 








 

 

 

 

 

 

 

 

 

 

 

 

 

13.   (16 points) The design of controls and instruments affects how easily people can use them.  A student project investigated this effect by asking 15 right-handed students to turn a knob (with their right hands) that moved an indicator by screw action.  There were two identical instruments, one with a right-hand thread (the knob turns clockwise) and the other with a left-hand thread (the knob turns counterclockwise).  The table below gives the time in seconds each subject took to move the indicator a fixed distance. Note that each subject used both instruments!!!

Subject   Right thread (s)   Left thread (s)   Difference (right − left)
1         113                137               -24
2         105                105                 0
3         130                133                -3
4         101                108                -7
5         138                115                23
6         118                170               -52
7          87                103               -16
8         116                145               -29
9          75                 78                -3
10         96                107               -11
11        122                 84                38
12        103                148               -45
13        116                147               -31
14        107                 87                20
15        118                166               -48

 

a.       Each of the 15 students used both instruments.  Discuss briefly how you would use randomization in arranging the experiment.





b.       The project hoped to show that right-handed people find right-hand threads easier to use.  What is the parameter(s) for the appropriate test?  (Describe the parameter(s) in words, and define the symbol(s) you will use in conducting a hypothesis test.)



c.       Conduct the appropriate test.  State your hypotheses, compute the test statistic, give the P-value, and report your conclusions in the context of the experiment.
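This Python sketch computes the matched-pairs t statistic from the differences (right minus left). It assumes the one-sided alternative that right-hand threads take less time on average. Python's standard library has no t distribution, so the sketch stops at t with df = 14. The P-value then comes from a t table or a calculator's T-Test.

```python
from math import sqrt
from statistics import mean, stdev

# Times in seconds, subjects 1-15, copied from the table.
right = [113, 105, 130, 101, 138, 118, 87, 116, 75, 96, 122, 103, 116, 107, 118]
left = [137, 105, 133, 108, 115, 170, 103, 145, 78, 107, 84, 148, 147, 87, 166]

# Matched pairs: analyze the per-subject differences d = right - left.
d = [r - l for r, l in zip(right, left)]
n = len(d)
d_bar, s_d = mean(d), stdev(d)

# H0: mu_d = 0 vs Ha: mu_d < 0 (right-hand threads are faster).
t = d_bar / (s_d / sqrt(n))
df = n - 1

print(f"d-bar = {d_bar:.2f} s, s_d = {s_d:.2f} s, t = {t:.3f}, df = {df}")
```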












 




 

14.  (16 points) Jeannie completed a thesis project on the effect of renourished beaches on loggerhead turtle nesting.  She considered two stretches of beach, one that was renourished between the 2001 and 2002 turtle nesting seasons, and one that was left natural.  One of the things Jeannie studied was the “nesting success rate” of loggerhead turtles, which is found by dividing the number of nests by the number of trips that turtles make onto the beach (called “crawls”; a crawl may or may not result in a nest).  In the 2001 nesting season, both stretches of beach that Jeannie considered had a nesting success rate of approximately 0.396, and this is known to be close to the historical average nesting success rate in this geographic region.  In 2002, the renourished beach had 434 nests from 1339 crawls, for a success rate of 0.3241. 

 

a.       Is there evidence that the nesting success rate of loggerhead turtles in 2002 was smaller than the normal rate?  State your hypotheses, both in words and in symbols, report the test statistic and P-value and clearly interpret your results in the context of turtle nesting.
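This Python sketch runs the one-proportion z test. It assumes the historical rate 0.396 serves as p0 and that the alternative is Ha: p < 0.396.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.396                  # historical nesting success rate
nests, crawls = 434, 1339   # renourished beach, 2002

p_hat = nests / crawls
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / crawls)   # standard error uses p0 under H0
p_value = NormalDist().cdf(z)                     # lower tail, Ha: p < p0

print(f"p-hat = {p_hat:.4f}, z = {z:.2f}, P-value = {p_value:.1e}")
```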







 







b.       It might be the case that 2002 was a less successful nesting season for all stretches of beach, and had nothing to do with the beach renourishment.  To decide, Jeannie compared the 2002 nesting success rates on the renourished and natural beaches.  In 2002, the natural beach had 223 nests from 539 crawls.  Do the data provide evidence of decreased nesting success of loggerhead turtles on renourished beaches compared to natural beaches in 2002? 

State the hypotheses for this test, both in words and in symbols.




Which of the following is the correct  test statistic computation? Circle one.

      z =   OR  z =

      Find the P-value and clearly interpret your results in the context of turtle nesting.
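This Python sketch runs a two-proportion z test for the 2002 beaches. It uses the pooled proportion in the standard error, the usual choice when the null hypothesis says the two proportions are equal. The alternative is that the renourished beach's rate is smaller.

```python
from math import sqrt
from statistics import NormalDist

nests_r, crawls_r = 434, 1339    # renourished beach, 2002
nests_n, crawls_n = 223, 539     # natural beach, 2002

p_r, p_n = nests_r / crawls_r, nests_n / crawls_n
p_pool = (nests_r + nests_n) / (crawls_r + crawls_n)   # pooled under H0: p_r = p_n

se = sqrt(p_pool * (1 - p_pool) * (1 / crawls_r + 1 / crawls_n))
z = (p_r - p_n) / se
p_value = NormalDist().cdf(z)    # lower tail, Ha: p_r < p_n

print(f"p_r = {p_r:.4f}, p_n = {p_n:.4f}, pooled = {p_pool:.4f}")
print(f"z = {z:.2f}, P-value = {p_value:.5f}")
```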



 

 

 

 

 

15.   (20 points) A State Highway Patrol Department would like to assess whether the cause of an accident is related to the outcome of the accident.  The Department decides to focus on accidents that occur along the major toll road that crosses the state.  A random sample of 250 accident reports from the past six months was obtained.  The accidents were cross-classified by primary cause and by outcome.  A portion of the data and the table of expected counts are below.  Note that there are no totals in the table of expected counts.

 

DATA              Death   No Death   Total
Speeding             21         41      62
Recklessness         10       ____    ____
Drinking             39         71     110
Other                 5       ____    ____
Total                75        175     250

EXPECTED COUNTS   Death   No Death
Speeding           18.6       43.4
Recklessness       ____       28
Drinking           33         77
Other              ____       26.6

 

a.       Fill in the blank spaces in the tables above.

 

b.       Calculate the proportion of accidents resulting in deaths for each of the causes.  Based solely on this information, do you think there is an association between cause and result?  Why or why not?   

 

 

Probability of death, given…

Speeding: ________    Recklessness: ________    Drinking: ________    Other: ________

 

c.       State the null and alternative hypotheses for a chi-square test in this example.

 

 

 

 

 

d.       What are the degrees of freedom?

 

 

e.       The value of the chi-square test statistic is 7.61.  Find the P-value.  What is your conclusion?
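This Python sketch completes the table and recomputes the statistic. The blank cells follow from the expected counts. An expected no-death count of 28 means the recklessness row total is 28 × 250/175 = 40, and 26.6 gives an "Other" row total of 38. For the P-value, the sketch uses the closed form of the chi-square upper tail for 3 degrees of freedom, which applies only when df = 3. A chi-square table or a calculator serves the same purpose.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

# Observed [death, no death] counts; the recklessness and "other" rows are
# completed from the expected counts (expected = row total * column total / grand total).
observed = {
    "Speeding": [21, 41],
    "Recklessness": [10, 30],
    "Drinking": [39, 71],
    "Other": [5, 33],
}
col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]
grand = sum(col_totals)

chi2 = 0.0
for cause, row in observed.items():
    row_total = sum(row)
    print(f"P(death | {cause}) = {row[0] / row_total:.3f}")
    for j, count in enumerate(row):
        expected = row_total * col_totals[j] / grand
        chi2 += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(col_totals) - 1)

def chi2_upper_tail_df3(x):
    """P(chi-square with 3 df > x); this closed form is valid only for df = 3."""
    return 2 * (1 - NormalDist().cdf(sqrt(x))) + sqrt(2 * x / pi) * exp(-x / 2)

p_value = chi2_upper_tail_df3(chi2)
print(f"chi-square = {chi2:.2f}, df = {df}, P-value = {p_value:.3f}")
```

The statistic comes out to about 7.61, matching the value given, and the P-value is a little above 0.05.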

 

 

 

 

 

 

 

 

 

 

Multiple choice (18 points)

16.   The appraised values of three recently sold houses in the Columbus area are (in thousands of dollars) 160, 215, and 195.  The standard error of the mean of these three appraised values is
A)  190.00    B)  27.84    C)  22.73    D)  16.07
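A quick Python check: the standard error of the mean is s/√n.

```python
from math import sqrt
from statistics import mean, stdev

values = [160, 215, 195]          # appraised values, thousands of dollars
se = stdev(values) / sqrt(len(values))
print(round(mean(values), 2), round(stdev(values), 2), round(se, 2))
```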

17.   Does the mean cost of textbooks per semester differ for math students and students in the liberal arts?  A sample of six math students and a sample of six liberal arts students were asked how much their textbooks had cost the previous semester.  The most appropriate procedure is

A)  the matched-pairs t test.
B)  the one-sample t test.
C)  the two-sample t test.
D)  Any of the above are valid.

 

18. Which of the following statements is true?

A)  Two-sample t procedures are less robust than the one-sample t methods.
B)  In planning a two-sample study, it is best to choose equal sample sizes.
C)  In planning a two-sample study, if the two population distributions have different shapes, then you can use samples of size 5.
D)  None of the above is true.

 

19.   A teacher was interested in whether a test-taking-skills class improved the pass rate on a high school exit exam. Let p₁ be the proportion of all students who took the skills class that passed the exit exam, and p₂ be the proportion of students who did not take the skills class that passed the exit exam. A 95% confidence interval for p₁ − p₂ was calculated to be (–0.0578, 0.1578).

a.       The sampling distribution of the difference in sample proportions has standard error equal to

A)  0.055.
B)  0.1078.
C)  0.066.
D)  The standard error cannot be calculated without knowing the sample results.
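This Python sketch works backward from an interval. The margin of error is half the interval's width, and at 95% confidence the margin equals about 1.96 standard errors. It uses the interval (–0.0578, 0.1578) that appears in the answer choices.

```python
from statistics import NormalDist

low, high = -0.0578, 0.1578          # the 95% interval for p1 - p2
z_star = NormalDist().inv_cdf(0.975)

margin = (high - low) / 2            # half the width of the interval
se = margin / z_star
print(round(margin, 4), round(se, 3))
```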

 

b.       Which of the following statements gives a correct interpretation of the confidence interval?

A)  We can be 95% confident that the difference between the sample proportions falls between –0.0578 and 0.1578.
B)  There is a 95% probability that the difference between proportions falls between –0.0578 and 0.1578.
C)  Ninety-five percent of the confidence intervals constructed will fall between –0.0578 and 0.1578.
D)  We can be 95% confident that the true difference in proportions falls between –0.0578 and 0.1578.

 

c.       The teacher decides to use the confidence interval to test the hypothesis  using a 0.05 level of significance. His decision should be:

A)  reject the null hypothesis because 0 falls in the 95% confidence interval.
B)  fail to reject the null hypothesis because 0 falls in the 95% confidence interval.
C)  reject the null hypothesis because 0 does not fall in the 95% confidence interval.
D)  It is not possible to test the hypotheses using the 95% confidence interval.



 

 

 

 

 

 

 

 

 

I have adhered to the principles of the Honor Code in completing this test.  __________________________