Lake Pollution

 

(Adapted from the Project Intermath ILAP Lake Pollution)

 

Background

 

The Great Lakes provide the main water supply for many people in the United States and Canada.  They are also fished commercially, serve as a transportation route, and offer recreational facilities.  Unfortunately, a considerable amount of waste and sewage is dumped into these lakes.  This waste includes large quantities of phosphates, which have been traced to detergents, along with insecticides and other chemicals such as DDT and mercury.  Extensive pollution kills fish as well as other forms of animal and plant life.

 

The pollution of the Great Lakes has, in the past few decades, been a significant issue on the agenda of both local and regional environmentalist groups.  A number of clean-up initiatives started in the past 20 years have greatly improved the outlook for the future of this natural waterway.  The primary polluters have been some of the heavy industries that use the lakes as a means of transportation to get their products to both domestic and international markets.  Some of the most significant violators are the steel industries around the Pittsburgh area on Lake Erie and around the Gary, Indiana area of Lake Michigan, and the mining industries around the Superior/Duluth areas of Lake Superior.  Both of these industries have made significant strides in recent years to clean up their operations, and their efforts are slowly starting to pay off.

 

You have been hired as an analyst by a regional aluminum processing plant that is located on the shores of Lake Erie.  Two years ago, the plant, in cooperation with several other major polluters on the Great Lakes, implemented a fairly comprehensive clean-up plan which temporarily calmed the fears of concerned environmentalists.  Recently, however, a study was published in an environmental journal which implied that the clean-up plans implemented by the major polluters were not effective.  The journal article claimed that the clean-up plans were mere eyewash to quiet the environmental watchdog groups.  This revelation has energized all of the key players involved in the Great Lakes pollution issue, and each side is currently assessing its ability to defend its position in a court of law.

 

Your boss has put you in charge of the team which will study this issue.  She has given you two years' worth of data, and she wants some quality statistical analysis which will withstand any scrutiny by the environmentalists.  The data consists of weekly test samples that were taken from Lake Erie and analyzed for various attributes (104 samples in total over the two years).

 


Data

 

The data can also be downloaded directly into your calculator from the file LakePollution.8lg, which is located in the Statistics folder (in the Dr. Mellor folder) on the computers in the lab.  You will need to use the Graph Link program, which is installed on the three computers in the rear of the lab which have signs saying "Graph Link".  The six columns in the table contain the following data:

 

week:           The week in which the sample was collected.

alum_ppm:   The amount of aluminum pollutants in the weekly test sample in parts per million (ppm).

oth_ppm:      The amount of other pollutants in the weekly test sample in parts per million (ppm).

puretime:     The amount of time that it takes to purify a weekly test sample in seconds.

epacode:      This variable is 0 if the sample is not "polluted" per EPA guidelines, and 1 if the sample is "polluted" per EPA guidelines.

temp:            The temperature of the sample in degrees Fahrenheit.

 

week  alum_ppm  oth_ppm  puretime  epacode  temp
1     36.2846   59.739   12.105    1        70.0692
2     35.1909   43.6978  37.864    0        68.8021
3     38.5753   45.4508  67.275    1        71.9412
4     29.8268   55.6974  45.286    1        62.2027
5     32.7359   43.6578  81.583    0        64.9527
6     33.4072   56.6556  6.297     1        66.1605
7     31.5494   49.9503  32.174    1        63.9796
8     29.483    59.2501  15.345    1        63.8191
9     28.3125   59.2778  103.952   1        61.8358
10    33.3508   30.8085  44.684    0        67.2914
11    35.4065   50.986   16.497    1        68.2996
12    40.8595   59.0104  28.704    1        72.8905
13    38.0167   57.1471  67.667    1        70.5572
14    48.5467   48.4696  36.098    1        80.3929
15    38.2178   47.4461  17.372    1        70.1867
16    35.7263   37.0215  33.567    0        68.3591
17    44.3164   52.0141  23.307    1        75.7433
18    33.7658   53.8839  109.182   1        66.486
19    38.6447   35.8155  1.579     0        70.8557
20    44.4616   43.8406  2.717     1        76.2052
21    37.8688   58.4949  72.884    1        70.2122
22    33.2331   56.73    131.37    1        66.7332
23    39.3477   45.6032  57.226    1        72.0411
24    34.2471   37.7449  58.106    0        66.5119
25    38.9567   45.796   16.483    1        71.3661
26    37.2145   44.0806  11.4      1        69.1802
27    39.3465   55.0693  63.58     1        71.9098
28    34.671    31.5343  8.304     0        67.6962
29    26.6301   57.3854  18.576    1        60.7372
30    30.8516   30.2706  91.128    0        64.3912
31    30.8723   42.1799  5.037     0        63.3019
32    29.7076   44.9115  17.931    0        62.7968
33    42.7976   40.9325  38.233    1        74.7648
34    44.4128   57.4171  52.001    1        75.9615
35    28.4938   30.2881  12.81     0        62.3881
36    32.8682   46.1233  30.378    0        66.9608
37    35.1048   39.4139  17.783    0        67.8019
38    40.9571   46.7199  0.608     1        73.3128
39    28.7226   36.3994  27.31     0        62.4025
40    32.7272   42.1318  65.823    0        65.277
41    30.2132   54.0219  18.391    1        64.411
42    29.0897   34.4747  56.934    0        62.9142
43    34.0419   49.2455  168.731   1        67.592
44    28.1496   43.044   29.275    0        62.4702
45    35.2347   58.2225  5.884     1        67.3556
46    37.2364   57.4492  4.351     1        70.5495
47    29.162    53.2136  42.727    1        63.2839
48    40.446    30.933   32.152    0        72.5938
49    35.7201   48.8363  11.04     1        69.567
50    29.2285   32.6732  137.548   0        62.9715
51    42.9233   53.6791  40.148    1        75.5686
52    34.9017   31.7362  69.898    0        68.4558
53    26.6193   35.7099  21.848    0        59.771
54    32.8131   39.6966  19.566    0        64.9707
55    30.609    48.3676  25.052    0        63.1579
56    28.5616   42.1876  1.559     0        61.6976
57    30.4919   53.2427  19.179    1        63.7523
58    39.593    47.2459  44.036    1        71.4984
59    32.4821   31.4635  46.615    0        65.7153
60    35.1442   43.3965  22.226    0        68.2051
61    33.953    44.8305  18.446    0        67.0155
62    29.4919   46.8731  150.477   0        63.7843
63    28.4642   39.5781  15.591    0        60.9638
64    26.1369   37.0135  14.602    0        60.3454
65    35.1675   37.8624  4.265     0        67.6119
66    37.4473   37.1184  40.328    0        69.8011
67    37.6916   36.5494  33.648    0        69.9018
68    24.6438   41.2847  2.789     0        58.6964
69    30.0035   48.9843  91.2      0        62.4671
70    26.0824   50.6743  146.261   0        59.0033
71    31.5738   48.537   33.855    1        64.0707
72    33.683    33.1112  43.153    0        65.7709
73    30.7659   54.359   26.966    1        64.6611
74    28.5338   56.7471  192.736   1        61.7126
75    29.9502   34.9079  58.938    0        63.9489
76    39.0303   43.4007  7.306     1        71.5458
77    40.7136   34.6556  29.328    0        72.5465
78    47.5382   42.0995  42.509    1        80.2774
79    27.8366   55.4403  15.455    1        60.49
80    34.5073   42.9701  13.94     0        67.0364
81    33.2366   56.2188  18.492    1        66.2645
82    38.185    39.4222  15.752    0        71.2584
83    31.3074   33.5046  6.902     0        63.958
84    32.5488   38.6973  22.425    0        66.5546
85    30.3595   59.6348  25.319    1        63.0884
86    36.6709   38.3582  75.05     0        70.0097
87    37.7908   46.5821  3.351     1        70.9777
88    33.5284   36.829   9.954     0        66.3875
89    33.988    35.3832  28.603    0        67.7514
90    30.681    56.9938  41.422    1        63.1022
91    33.2926   33.039   9.068     0        66.6784
92    34.0935   37.8167  6.904     0        66.2453
93    37.59     51.4286  13.276    1        70.7118
94    35.6707   37.3356  0.195     0        67.7272
95    35.0608   30.3771  23.171    0        68.2565
96    30.6652   37.7568  40.732    0        64.8418
97    33.3158   43.9597  78.473    0        67.0687
98    32.9712   50.1189  2.845     1        66.6186
99    43.8148   47.6849  28.864    1        76.5815
100   37.1492   53.0739  53.365    1        68.9791
101   34.0851   34.636   13.298    0        66.2452
102   37.8289   36.0122  106.067   0        71.2823
103   40.5501   31.8986  228.922   0        73.7126
104   41.6061   49.6269  57.271    1        74.019

 

Part 1

 

Your first job is to "see" what the data looks like, to get a better feel for it.  Consider the three variables alum_ppm, oth_ppm and puretime.

 

A.     For each of the three variables, construct a histogram and stemplot.  Be sure to think carefully about the intervals for the histograms, and how much to round off for the stemplots.

B.     Discuss the shape and symmetry of the various plots in A.  Do any of the distributions look normal?

C.     Do a time plot for each of the three variables.  Are there any noticeable patterns?

D.     Produce the five-number summary and associated boxplot for each variable.

E.      Compute the mean and standard deviation for each variable.
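The summaries asked for in D and E can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: it uses the alum_ppm values for weeks 1 to 8 rather than all 104 samples, the function names are made up for this sketch, and the quartiles use the "median of each half" rule (other software may use a different quartile convention and give slightly different values).

```python
from statistics import mean, median, stdev

# Illustration only: alum_ppm for weeks 1-8 of the table, not the full 104 samples.
alum_ppm = [36.2846, 35.1909, 38.5753, 29.8268, 32.7359, 33.4072, 31.5494, 29.483]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]              # values below the median position
    upper = xs[half + (n % 2):]    # values above it (middle value skipped when n is odd)
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])


def histogram_counts(values, start, width, bins):
    """Count values in [start + k*width, start + (k+1)*width) for k = 0..bins-1.

    Values outside the covered range are ignored.
    """
    counts = [0] * bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < bins:
            counts[k] += 1
    return counts


print(five_number_summary(alum_ppm))
print(mean(alum_ppm), stdev(alum_ppm))   # stdev is the sample (n - 1) standard deviation
print(histogram_counts(alum_ppm, start=25, width=5, bins=3))
```

Changing start and width in histogram_counts is a quick way to see how the choice of intervals in question A changes the apparent shape of the distribution.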

 

Part 2

 

It occurs to you that there might be a relationship between the temperature of the sample and the amount of pollution detected, and you decide to look more carefully into this question.

 

A.     Plot the variable alum_ppm against the temperature (temp).  Does there appear to be a relationship?  What kind?

B.     Find the equation of the line which best fits the plot.  What is r, the correlation?  What is r^2?  What do these numbers tell you about the linear model?  Give a physical interpretation for the slope of your line.

C.     What amount of aluminum pollutants would you expect to find in a weekly sample with a temperature of 65 degrees Fahrenheit?

D.     Repeat A and B for oth_ppm.  Do aluminum pollutants and other pollutants have similar relationships to temperature?
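The least-squares line and correlation from B, and the prediction from C, might be computed along the following lines.  This is a sketch, not the required method: fit_line is a hypothetical helper written for this illustration, and the data are only weeks 1 to 8 of the table, so the printed numbers are not the answers for the full data set.

```python
from math import sqrt

# Illustration only: temp and alum_ppm for weeks 1-8, not the full data set.
temp = [70.0692, 68.8021, 71.9412, 62.2027, 64.9527, 66.1605, 63.9796, 63.8191]
alum_ppm = [36.2846, 35.1909, 38.5753, 29.8268, 32.7359, 33.4072, 31.5494, 29.483]


def fit_line(x, y):
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept, r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)      # correlation coefficient
    return slope, intercept, r


slope, intercept, r = fit_line(temp, alum_ppm)
predicted_at_65 = intercept + slope * 65   # expected alum_ppm at 65 degrees Fahrenheit
print(slope, intercept, r, r ** 2, predicted_at_65)
```

The slope is in ppm per degree Fahrenheit, which is the starting point for the physical interpretation asked for in B.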

 

Part 3

 

So far, you've ignored the variable epacode.  You need to look at this to see what is meant by "polluted."

 

A.     Your first problem is to determine what standard the EPA used to decide whether a sample was "polluted."  Your boss has only given you the relevant EPA guidelines - all 5,000 pages - and, in a typical snafu, the index was not included.  You decide it's easier just to figure it out from the data.  Find the total amount of pollutants in each sample (the sum of the aluminum and other pollutants), and compare these totals to the epacode.  Explain why it is reasonable to assume that the EPA considers a sample "polluted" when the total amount of pollutants is 80 ppm or more.

B.     What percentage of the samples were polluted?
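One way to approach A and B is to bracket the cutoff between the largest total labeled 0 and the smallest total labeled 1.  The sketch below does this for weeks 1 to 8 only (an illustration, not the full analysis); the variable names are made up here, and the percentage it prints applies to those eight rows, not to the whole table.

```python
# Illustration only: (alum_ppm, oth_ppm, epacode) for weeks 1-8 of the table.
samples = [
    (36.2846, 59.739, 1),
    (35.1909, 43.6978, 0),
    (38.5753, 45.4508, 1),
    (29.8268, 55.6974, 1),
    (32.7359, 43.6578, 0),
    (33.4072, 56.6556, 1),
    (31.5494, 49.9503, 1),
    (29.483, 59.2501, 1),
]

# Total pollutants per sample, paired with the EPA code.
totals = [(alum + oth, code) for alum, oth, code in samples]

# The EPA cutoff must lie between these two values.
largest_clean = max(total for total, code in totals if code == 0)
smallest_polluted = min(total for total, code in totals if code == 1)

# Share of samples labeled polluted (epacode is 0 or 1, so the sum counts the 1s).
percent_polluted = 100 * sum(code for _, code in totals) / len(totals)

print(largest_clean, smallest_polluted, percent_polluted)
```

Running the same comparison on all 104 rows narrows the bracket around the cutoff.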

 

Since you are working for the aluminum company, your boss wants you to take a hard look at the aluminum pollutants and see what kind of data you can collect to help the company's case.

 

C.     You recall the average amount of other pollutants (oth_ppm) from your calculations in Part 1.  Assuming this is the amount of other pollutants in your sample, what level of aluminum pollutants will cause the EPA to label the sample polluted?

D.     Assuming the amount of aluminum pollutants (alum_ppm) has a normal distribution with the mean and standard deviation you found in Part 1, what is the probability that the amount of aluminum pollutants will be at or above the level you determined in question C in a single sample?

E.      On the other hand, you decide it might be better to consider the average amount of aluminum pollutants.  What is the probability that the average value of alum_ppm over a two-year period (104 samples) will be at or above the level you determined in question C?

F.      You remember that the measurement of aluminum pollutants was highly dependent on the temperature.  Recall the value of alum_ppm you would expect if the sample temperature were 65 degrees Fahrenheit, from Part 2.  What would the level of other pollutants have to be for the sample to be polluted?

G.     Remember that the level of other pollutants (oth_ppm) was not dependent on temperature, so you can assume it has the same distribution at 65 degrees as in your sample data.  What is the probability that the average value of oth_ppm over a two-year period would reach the level in F?  What theorem did you use to compute this, since the distribution for oth_ppm is not normal?
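The difference between the single-sample probability in D and the probability for an average of 104 samples in E and G comes down to the standard deviation used: sigma for one sample, sigma divided by the square root of n for the mean.  A minimal sketch follows, assuming a normal model; the function names are hypothetical and the inputs mu, sigma and level are made-up placeholders, not values from the data.

```python
from math import sqrt
from statistics import NormalDist


def prob_single_at_or_above(level, mu, sigma):
    """P(X >= level) for one sample, X ~ Normal(mu, sigma)."""
    return 1 - NormalDist(mu, sigma).cdf(level)


def prob_mean_at_or_above(level, mu, sigma, n):
    """P(sample mean >= level) for n samples; the mean has std. dev. sigma / sqrt(n)."""
    return 1 - NormalDist(mu, sigma / sqrt(n)).cdf(level)


# Placeholder inputs for illustration only (NOT computed from the lake data).
mu, sigma, level, n = 10.0, 2.0, 11.0, 104

p_single = prob_single_at_or_above(level, mu, sigma)
p_mean = prob_mean_at_or_above(level, mu, sigma, n)
print(p_single, p_mean)
```

With a level above the mean, the probability for the average is far smaller than for a single sample, because averaging shrinks the spread.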

 

Part 4

 

Your boss wants a "solid" estimate of the amount of aluminum pollutants currently in the Great Lakes.

 

A.     Provide an estimate for the population mean of the amount of aluminum pollutants in a weekly test sample.

B.     Using the z-statistic, provide an interval estimate of the population mean of the amount of aluminum pollutants in a weekly test sample.  Use a confidence level of 90%.  Explain how you can use the z-statistic even though you do not know the population standard deviation (just the sample standard deviation you computed in Part 1), and why this will give you a reasonable answer.

C.     Interpret your answer to the previous question in terms of probability.
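The interval in B has the form (sample mean) plus or minus z* times s divided by the square root of n.  The sketch below shows that arithmetic; z_interval is a hypothetical helper, and the inputs in the example call are placeholders rather than the values from Part 1.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, s, n, confidence):
    """Two-sided z interval for a population mean.

    xbar: sample mean, s: standard deviation (sample value used in place of sigma),
    n: sample size, confidence: e.g. 0.90 for a 90% interval.
    """
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.645 for 90%
    margin = z_star * s / sqrt(n)
    return xbar - margin, xbar + margin


# Placeholder inputs for illustration only (NOT computed from the lake data).
low, high = z_interval(xbar=10.0, s=2.0, n=100, confidence=0.90)
print(low, high)
```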

 

Another method in your arsenal is hypothesis testing, and you decide to apply this to the aluminum pollutants.  You will continue to use z-procedures.

 

D.     You want to show that the true population mean of the amount of aluminum pollutants in a weekly test sample is less than the amount you computed in Part 3C.  Test this claim at a significance level of 0.05.  Use a sketch to support your conclusion.

E.      Historical data over the years shows that the true mean of the amount of aluminum pollutants that are in a weekly test sample is 35 ppm.  Perform a Type II error analysis on your test from D assuming that the historical data is accurate.  Use a sketch to support your conclusions.

F.      What is the power of your test in E?  What can you do to increase the power of your test without increasing the probability of other errors?  What is the possible drawback with using this method?
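The test in D and the error analysis in E and F can be sketched as follows for a lower-tailed z-test (alternative hypothesis: the true mean is below mu0).  The function names are made up for this illustration, the inputs are placeholders rather than values from the data, and the sample standard deviation is assumed to stand in for sigma.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_z_test(xbar, mu0, sigma, n, alpha):
    """Test H0: mu = mu0 against Ha: mu < mu0; returns (z, p_value, reject_h0)."""
    se = sigma / sqrt(n)
    z = (xbar - mu0) / se
    p_value = NormalDist().cdf(z)          # area to the left of z
    return z, p_value, p_value < alpha


def power_lower_tail(mu0, mu_true, sigma, n, alpha):
    """Return (power, beta) of the lower-tailed test when the true mean is mu_true."""
    se = sigma / sqrt(n)
    critical_xbar = mu0 + NormalDist().inv_cdf(alpha) * se   # reject H0 below this value
    power = NormalDist(mu_true, se).cdf(critical_xbar)       # P(reject | mu_true)
    return power, 1 - power                                  # beta = P(Type II error)


# Placeholder inputs for illustration only (NOT computed from the lake data).
print(lower_tail_z_test(xbar=9.5, mu0=10.0, sigma=2.0, n=100, alpha=0.05))
print(power_lower_tail(mu0=10.0, mu_true=9.5, sigma=2.0, n=104, alpha=0.05))
```

Trying larger values of n in power_lower_tail shows how the power changes with sample size while alpha stays fixed.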

 

G.     (Extra Credit)  Since we did not know the population standard deviation exactly, it would have been more precise to use t-procedures than z-procedures; do so.  You will have to work out for yourself how to do the error analysis in question E using the t-statistic!

 

Part 5

 

A.     Looking back over your previous reports, your boss mentions that you never gave a confidence interval for the linear models you computed in Part 2.  Compute these confidence intervals, and test each linear model against the hypothesis of no linear relationship. 
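One common reading of this question is a confidence interval for the slope of each regression line, together with a t-test of the hypothesis that the slope is zero.  The sketch below computes the pieces under that assumption.  The helper names and the five data points are made up for illustration, and the critical value t_star must be looked up in a t-table (or on your calculator) for n - 2 degrees of freedom; the standard library does not provide it.

```python
from math import sqrt


def slope_inference(x, y):
    """Return (slope, se_slope, t_stat, df) for the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Sum of squared residuals about the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    se_slope = sqrt(sse / df) / sqrt(sxx)   # standard error of the slope
    t_stat = slope / se_slope               # tests H0: slope = 0
    return slope, se_slope, t_stat, df


def slope_interval(slope, se_slope, t_star):
    """Confidence interval slope +/- t_star * se_slope (t_star from a t-table)."""
    return slope - t_star * se_slope, slope + t_star * se_slope


# Made-up data for illustration only (NOT from the lake table).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, se_slope, t_stat, df = slope_inference(x, y)
# 3.182 is the table value for 95% confidence with 3 degrees of freedom.
print(slope, se_slope, t_stat, df, slope_interval(slope, se_slope, t_star=3.182))
```

If the interval for the slope contains zero, the data do not rule out the hypothesis of no linear relationship at that confidence level.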

 

Your boss has now received all of your reports to date, with the details of your computations.  She wants you to prepare a summary of your results, conclusions and recommendations for her to present to the Board of Directors.  Your summary should be approximately two pages long, written in coherent paragraphs and correct English.  Remember that you are giving your recommendations as to how the company should present the results in court to their best advantage; the Board is not interested in your opinions of how the company conducts its business.

 

However, your conscience is pricking you for helping the polluters of the Great Lakes, some of the continent's greatest natural resources.  So you have decided to also write up the opposing case and leak it to the environmentalists (anonymously, of course - you don't want to lose your job!).  Again, this should be a summary of approximately two pages of your results and how the environmentalist lobby could present them to greatest effect.