Lake Pollution
(Adapted from the Project Intermath ILAP Lake Pollution)
Background
The Great Lakes provide the main water supply for many people in the United States and Canada. They are also fished commercially, provide transportation, and support recreation. Unfortunately, a considerable amount of waste and sewage is dumped into these lakes. This waste includes large quantities of phosphates, which have been traced to detergents, along with insecticides and other chemicals such as DDT and mercury. Extensive pollution kills off the fish, as well as other forms of animal and plant life.
Over the past few decades, pollution of the Great Lakes has been a significant issue for both local and regional environmentalist groups. A number of clean-up initiatives started in the past 20 years have greatly improved the outlook for this natural waterway. The primary polluters have been heavy industries that use the lakes to transport their products to domestic and international markets. Among the most significant violators are the steel industries of the Pittsburgh area near Lake Erie and of the Gary, Indiana area on Lake Michigan, and the mining industries around the Superior/Duluth area of Lake Superior. Both of these industries have made significant strides in recent years to clean up their operations, and their efforts are slowly starting to pay off.
You have been hired as an analyst by a regional aluminum processing plant located on the shores of Lake Erie. Two years ago the plant, in cooperation with several other major polluters on the Great Lakes, implemented a fairly comprehensive clean-up plan, which temporarily calmed the fears of concerned environmentalists. Recently, however, a study published in an environmentalist journal implied that the clean-up plans implemented by the major polluters were not effective. The article claimed that the plans were mere eyewash to quiet the environmental watchdog groups. This revelation has energized all of the key players in the Great Lakes pollution issue, and each side is now assessing its ability to defend its position in a court of law.
Your boss has put you in charge of the team that will study this issue. She has given you two years' worth of data, and she wants quality statistical analysis that will withstand any scrutiny by the environmentalists. The data consist of weekly test samples that were taken from Lake Erie and analyzed for various attributes (a total of 104 samples over the two years).
Data
The data can also be downloaded directly into your calculator from the file LakePollution.8lg, which is located in the Statistics folder (in the Dr. Mellor folder) on the computers in the lab. You will need to use the Graph Link program, which is installed on the three computers in the rear of the lab which have signs saying "Graph Link". The six columns in the table contain the following data:
week: The week in which the sample was collected.
alum_ppm: The amount of aluminum pollutants in the weekly test sample, in parts per million (ppm).
oth_ppm: The amount of other pollutants in the weekly test sample, in parts per million (ppm).
puretime: The amount of time that it takes to purify a weekly test sample, in seconds.
epacode: This variable is 0 if the sample is not "polluted" per EPA guidelines, and 1 if the sample is "polluted" per EPA guidelines.
temp: The temperature of the sample, in degrees Fahrenheit.
| week | alum_ppm | oth_ppm | puretime | epacode | temp |
| 1 | 36.2846 | 59.739 | 12.105 | 1 | 70.0692 |
| 2 | 35.1909 | 43.6978 | 37.864 | 0 | 68.8021 |
| 3 | 38.5753 | 45.4508 | 67.275 | 1 | 71.9412 |
| 4 | 29.8268 | 55.6974 | 45.286 | 1 | 62.2027 |
| 5 | 32.7359 | 43.6578 | 81.583 | 0 | 64.9527 |
| 6 | 33.4072 | 56.6556 | 6.297 | 1 | 66.1605 |
| 7 | 31.5494 | 49.9503 | 32.174 | 1 | 63.9796 |
| 8 | 29.483 | 59.2501 | 15.345 | 1 | 63.8191 |
| 9 | 28.3125 | 59.2778 | 103.952 | 1 | 61.8358 |
| 10 | 33.3508 | 30.8085 | 44.684 | 0 | 67.2914 |
| 11 | 35.4065 | 50.986 | 16.497 | 1 | 68.2996 |
| 12 | 40.8595 | 59.0104 | 28.704 | 1 | 72.8905 |
| 13 | 38.0167 | 57.1471 | 67.667 | 1 | 70.5572 |
| 14 | 48.5467 | 48.4696 | 36.098 | 1 | 80.3929 |
| 15 | 38.2178 | 47.4461 | 17.372 | 1 | 70.1867 |
| 16 | 35.7263 | 37.0215 | 33.567 | 0 | 68.3591 |
| 17 | 44.3164 | 52.0141 | 23.307 | 1 | 75.7433 |
| 18 | 33.7658 | 53.8839 | 109.182 | 1 | 66.486 |
| 19 | 38.6447 | 35.8155 | 1.579 | 0 | 70.8557 |
| 20 | 44.4616 | 43.8406 | 2.717 | 1 | 76.2052 |
| 21 | 37.8688 | 58.4949 | 72.884 | 1 | 70.2122 |
| 22 | 33.2331 | 56.73 | 131.37 | 1 | 66.7332 |
| 23 | 39.3477 | 45.6032 | 57.226 | 1 | 72.0411 |
| 24 | 34.2471 | 37.7449 | 58.106 | 0 | 66.5119 |
| 25 | 38.9567 | 45.796 | 16.483 | 1 | 71.3661 |
| 26 | 37.2145 | 44.0806 | 11.4 | 1 | 69.1802 |
| 27 | 39.3465 | 55.0693 | 63.58 | 1 | 71.9098 |
| 28 | 34.671 | 31.5343 | 8.304 | 0 | 67.6962 |
| 29 | 26.6301 | 57.3854 | 18.576 | 1 | 60.7372 |
| 30 | 30.8516 | 30.2706 | 91.128 | 0 | 64.3912 |
| 31 | 30.8723 | 42.1799 | 5.037 | 0 | 63.3019 |
| 32 | 29.7076 | 44.9115 | 17.931 | 0 | 62.7968 |
| 33 | 42.7976 | 40.9325 | 38.233 | 1 | 74.7648 |
| 34 | 44.4128 | 57.4171 | 52.001 | 1 | 75.9615 |
| 35 | 28.4938 | 30.2881 | 12.81 | 0 | 62.3881 |
| 36 | 32.8682 | 46.1233 | 30.378 | 0 | 66.9608 |
| 37 | 35.1048 | 39.4139 | 17.783 | 0 | 67.8019 |
| 38 | 40.9571 | 46.7199 | 0.608 | 1 | 73.3128 |
| 39 | 28.7226 | 36.3994 | 27.31 | 0 | 62.4025 |
| 40 | 32.7272 | 42.1318 | 65.823 | 0 | 65.277 |
| 41 | 30.2132 | 54.0219 | 18.391 | 1 | 64.411 |
| 42 | 29.0897 | 34.4747 | 56.934 | 0 | 62.9142 |
| 43 | 34.0419 | 49.2455 | 168.731 | 1 | 67.592 |
| 44 | 28.1496 | 43.044 | 29.275 | 0 | 62.4702 |
| 45 | 35.2347 | 58.2225 | 5.884 | 1 | 67.3556 |
| 46 | 37.2364 | 57.4492 | 4.351 | 1 | 70.5495 |
| 47 | 29.162 | 53.2136 | 42.727 | 1 | 63.2839 |
| 48 | 40.446 | 30.933 | 32.152 | 0 | 72.5938 |
| 49 | 35.7201 | 48.8363 | 11.04 | 1 | 69.567 |
| 50 | 29.2285 | 32.6732 | 137.548 | 0 | 62.9715 |
| 51 | 42.9233 | 53.6791 | 40.148 | 1 | 75.5686 |
| 52 | 34.9017 | 31.7362 | 69.898 | 0 | 68.4558 |
| 53 | 26.6193 | 35.7099 | 21.848 | 0 | 59.771 |
| 54 | 32.8131 | 39.6966 | 19.566 | 0 | 64.9707 |
| 55 | 30.609 | 48.3676 | 25.052 | 0 | 63.1579 |
| 56 | 28.5616 | 42.1876 | 1.559 | 0 | 61.6976 |
| 57 | 30.4919 | 53.2427 | 19.179 | 1 | 63.7523 |
| 58 | 39.593 | 47.2459 | 44.036 | 1 | 71.4984 |
| 59 | 32.4821 | 31.4635 | 46.615 | 0 | 65.7153 |
| 60 | 35.1442 | 43.3965 | 22.226 | 0 | 68.2051 |
| 61 | 33.953 | 44.8305 | 18.446 | 0 | 67.0155 |
| 62 | 29.4919 | 46.8731 | 150.477 | 0 | 63.7843 |
| 63 | 28.4642 | 39.5781 | 15.591 | 0 | 60.9638 |
| 64 | 26.1369 | 37.0135 | 14.602 | 0 | 60.3454 |
| 65 | 35.1675 | 37.8624 | 4.265 | 0 | 67.6119 |
| 66 | 37.4473 | 37.1184 | 40.328 | 0 | 69.8011 |
| 67 | 37.6916 | 36.5494 | 33.648 | 0 | 69.9018 |
| 68 | 24.6438 | 41.2847 | 2.789 | 0 | 58.6964 |
| 69 | 30.0035 | 48.9843 | 91.2 | 0 | 62.4671 |
| 70 | 26.0824 | 50.6743 | 146.261 | 0 | 59.0033 |
| 71 | 31.5738 | 48.537 | 33.855 | 1 | 64.0707 |
| 72 | 33.683 | 33.1112 | 43.153 | 0 | 65.7709 |
| 73 | 30.7659 | 54.359 | 26.966 | 1 | 64.6611 |
| 74 | 28.5338 | 56.7471 | 192.736 | 1 | 61.7126 |
| 75 | 29.9502 | 34.9079 | 58.938 | 0 | 63.9489 |
| 76 | 39.0303 | 43.4007 | 7.306 | 1 | 71.5458 |
| 77 | 40.7136 | 34.6556 | 29.328 | 0 | 72.5465 |
| 78 | 47.5382 | 42.0995 | 42.509 | 1 | 80.2774 |
| 79 | 27.8366 | 55.4403 | 15.455 | 1 | 60.49 |
| 80 | 34.5073 | 42.9701 | 13.94 | 0 | 67.0364 |
| 81 | 33.2366 | 56.2188 | 18.492 | 1 | 66.2645 |
| 82 | 38.185 | 39.4222 | 15.752 | 0 | 71.2584 |
| 83 | 31.3074 | 33.5046 | 6.902 | 0 | 63.958 |
| 84 | 32.5488 | 38.6973 | 22.425 | 0 | 66.5546 |
| 85 | 30.3595 | 59.6348 | 25.319 | 1 | 63.0884 |
| 86 | 36.6709 | 38.3582 | 75.05 | 0 | 70.0097 |
| 87 | 37.7908 | 46.5821 | 3.351 | 1 | 70.9777 |
| 88 | 33.5284 | 36.829 | 9.954 | 0 | 66.3875 |
| 89 | 33.988 | 35.3832 | 28.603 | 0 | 67.7514 |
| 90 | 30.681 | 56.9938 | 41.422 | 1 | 63.1022 |
| 91 | 33.2926 | 33.039 | 9.068 | 0 | 66.6784 |
| 92 | 34.0935 | 37.8167 | 6.904 | 0 | 66.2453 |
| 93 | 37.59 | 51.4286 | 13.276 | 1 | 70.7118 |
| 94 | 35.6707 | 37.3356 | 0.195 | 0 | 67.7272 |
| 95 | 35.0608 | 30.3771 | 23.171 | 0 | 68.2565 |
| 96 | 30.6652 | 37.7568 | 40.732 | 0 | 64.8418 |
| 97 | 33.3158 | 43.9597 | 78.473 | 0 | 67.0687 |
| 98 | 32.9712 | 50.1189 | 2.845 | 1 | 66.6186 |
| 99 | 43.8148 | 47.6849 | 28.864 | 1 | 76.5815 |
| 100 | 37.1492 | 53.0739 | 53.365 | 1 | 68.9791 |
| 101 | 34.0851 | 34.636 | 13.298 | 0 | 66.2452 |
| 102 | 37.8289 | 36.0122 | 106.067 | 0 | 71.2823 |
| 103 | 40.5501 | 31.8986 | 228.922 | 0 | 73.7126 |
| 104 | 41.6061 | 49.6269 | 57.271 | 1 | 74.019 |
Part 1
Your first job is to "see" what the data looks like, to get a better feel for it. Consider the three variables alum_ppm, oth_ppm and puretime.
A. For each of the three variables, construct a histogram and stemplot. Be sure to think carefully about the intervals for the histograms, and how much to round off for the stemplots.
B. Discuss the shape and symmetry of the various plots in A. Do any of the distributions look normal?
C. Do a time plot for each of the three variables. Are there any noticeable patterns?
D. Produce the five-number summary and associated boxplot for each variable.
E. Compute the mean and standard deviation for each variable.
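As a rough illustration of questions D and E, the following Python sketch computes a five-number summary, mean, and sample standard deviation with the standard library only. It uses just the first eight alum_ppm values from the table, so its output is not the answer for the full data set. The function name five_number_summary and the median-of-halves quartile rule are assumptions; other software may use a different quartile convention and report slightly different values.

```python
import statistics

# First eight alum_ppm values from the table (weeks 1-8); illustrative subset only.
alum_ppm = [36.2846, 35.1909, 38.5753, 29.8268, 32.7359, 33.4072, 31.5494, 29.483]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    with the overall median left out of both halves when the count is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

summary = five_number_summary(alum_ppm)
mean = statistics.mean(alum_ppm)
sd = statistics.stdev(alum_ppm)  # sample standard deviation (divides by n - 1)

print("five-number summary:", summary)
print("mean:", round(mean, 4), "sd:", round(sd, 4))
```

The same function can be applied to oth_ppm and puretime once all 104 values are loaded.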
Part 2
It occurs to you that there might be a relationship between the temperature of the sample and the amount of pollution detected, and you decide to look more carefully into this question.
A. Plot the variable alum_ppm against the temperature (temp). Does there appear to be a relationship? What kind?
B. Find the equation of the line which best fits the plot. What is r, the correlation? What is r^2? What do these numbers tell you about the linear model? Give a physical interpretation of the slope of your line.
C. What amount of aluminum pollutants would you expect to find in a weekly sample with a temperature of 65 degrees Fahrenheit?
D. Repeat A and B for oth_ppm. Do aluminum pollutants and other pollutants have similar relationships to temperature?
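The least-squares fit in B and the prediction in C can be sketched in Python as follows. This is a minimal example under stated assumptions: it uses only weeks 1-8 of the table, so the printed line is not the answer for the full data set, and the helper name fit_line is hypothetical.

```python
import math

# Weeks 1-8 from the table; illustrative subset only.
temp = [70.0692, 68.8021, 71.9412, 62.2027, 64.9527, 66.1605, 63.9796, 63.8191]
alum_ppm = [36.2846, 35.1909, 38.5753, 29.8268, 32.7359, 33.4072, 31.5494, 29.483]

def fit_line(x, y):
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept, r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    return slope, intercept, r

slope, intercept, r = fit_line(temp, alum_ppm)
predicted_at_65 = intercept + slope * 65  # the kind of prediction asked for in C

print(f"alum_ppm = {intercept:.3f} + {slope:.3f} * temp")
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
print(f"predicted alum_ppm at 65 F: {predicted_at_65:.2f}")
```

Passing oth_ppm in place of alum_ppm gives the corresponding fit for question D.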
Part 3
So far, you've ignored the variable epacode. You need to look at this to see what is meant by "polluted."
A. Your first problem is to determine what standard the EPA used to decide whether a sample was "polluted." Your boss has only given you the relevant EPA guidelines - all 5,000 pages - and, in a typical snafu, the index was not included. You decide it's easier just to figure it out from the data. Find the total amount of pollutants in each sample (the sum of the aluminum and other pollutants), and compare these totals to the epacode. Explain why it is reasonable to assume that the EPA considers a sample "polluted" when the total pollutant level is 80 ppm or more.
B. What percentage of the samples were polluted?
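One way to check the 80 ppm rule in A and the percentage in B is sketched below in Python. It uses only weeks 1-8 of the table, so the percentage it prints applies to that subset and not to all 104 samples; the function names are hypothetical, and treating a total of exactly 80 ppm as polluted is an assumption.

```python
# (alum_ppm, oth_ppm, epacode) for weeks 1-8 of the table; illustrative subset only.
samples = [
    (36.2846, 59.739, 1),
    (35.1909, 43.6978, 0),
    (38.5753, 45.4508, 1),
    (29.8268, 55.6974, 1),
    (32.7359, 43.6578, 0),
    (33.4072, 56.6556, 1),
    (31.5494, 49.9503, 1),
    (29.483, 59.2501, 1),
]

THRESHOLD_PPM = 80.0  # candidate cutoff from question A

def matches_threshold(rows, threshold):
    """True if epacode is 1 exactly when alum_ppm + oth_ppm >= threshold."""
    return all((alum + oth >= threshold) == (code == 1) for alum, oth, code in rows)

def percent_polluted(rows):
    """Percentage of rows whose epacode is 1."""
    return 100.0 * sum(code for _, _, code in rows) / len(rows)

print("threshold consistent with epacode:", matches_threshold(samples, THRESHOLD_PPM))
print("percent polluted in this subset:", percent_polluted(samples))
```

Trying other cutoffs with matches_threshold shows how tightly the data pin down the rule.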
Since you are working for the aluminum company, your boss wants you to take a hard look at the aluminum pollutants and see what kind of data you can collect to help the company's case.
C. You recall the average amount of other pollutants (oth_ppm) from your calculations in Part 1. Assuming this is the amount of other pollutants in your sample, what level of aluminum pollutants will cause the EPA to label the sample polluted?
D. Assuming the amount of aluminum pollutants (alum_ppm) has a normal distribution with the mean and standard deviation you found in Part 1, what is the probability that the amount of aluminum pollutants will be at or above the level you determined in question C in a single sample?
E. On the other hand, you decide it might be better to consider the average amount of aluminum pollutants. What is the probability that the average value of alum_ppm over a two-year period (104 samples) will be at or above the level you determined in question C?
F. You remember that the measurement of aluminum pollutants was highly dependent on the temperature. Recall the value of alum_ppm you would expect if the sample temperature were 65 degrees Fahrenheit, from Part 2. What would the level of other pollutants have to be for the sample to be polluted?
G. Remember that the level of other pollutants (oth_ppm) was not dependent on temperature, so you can assume it has the same distribution at 65 degrees as in your sample data. What is the probability that the average value of oth_ppm over a two-year period would reach the level in F? What theorem did you use to compute this, since the distribution for oth_ppm is not normal?
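The tail probabilities in D, E and G all have the same form, which the following Python sketch illustrates. The mean, standard deviation and level used here are made-up placeholders, not values computed from the lake data, and the function name prob_at_or_above is hypothetical.

```python
from statistics import NormalDist
import math

def prob_at_or_above(level, mean, sd, n=1):
    """P(mean of n observations >= level) under a normal model.

    With n = 1 this assumes the variable itself is normal (question D).
    For large n the sample mean is treated as approximately normal with
    standard deviation sd / sqrt(n) (questions E and G).
    """
    standard_error = sd / math.sqrt(n)
    return 1.0 - NormalDist(mean, standard_error).cdf(level)

# Placeholder inputs, NOT computed from the lake data.
mean, sd, level = 30.0, 5.0, 32.0

single = prob_at_or_above(level, mean, sd)          # one weekly sample
average = prob_at_or_above(level, mean, sd, n=104)  # average of 104 samples

print(f"single sample: {single:.4f}")
print(f"average of 104 samples: {average:.6f}")
```

Because the standard error shrinks like 1/sqrt(n), the probability for the two-year average is far smaller than for a single sample whenever the level lies above the mean.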
Part 4
Your boss wants a "solid" estimate of the amount of aluminum pollutants currently in the Great Lakes.
A. Provide an estimate for the population mean of the amount of aluminum pollutants in a weekly test sample.
B. Using the z-statistic, provide an interval estimate of the population mean of the amount of aluminum pollutants in a weekly test sample. Use a confidence level of 90%. Explain how you can use the z-statistic even though you do not know the population standard deviation (just the sample standard deviation you computed in Part 1), and why this will give you a reasonable answer.
C. Interpret your answer to the previous question in terms of probability.
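The interval in B can be sketched in Python as below. The summary statistics passed in are placeholders, not values computed from the lake data, and the function name z_interval is hypothetical; the sample standard deviation stands in for the population value, as the question describes.

```python
from statistics import NormalDist
import math

def z_interval(sample_mean, sample_sd, n, confidence=0.90):
    """Two-sided z confidence interval for a population mean."""
    # Critical value z* leaves (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Placeholder summary statistics, NOT computed from the lake data.
low, high = z_interval(sample_mean=30.0, sample_sd=5.0, n=104)
print(f"90% interval: ({low:.3f}, {high:.3f})")
```

Raising the confidence level widens the interval, and a larger n narrows it.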
Another method in your arsenal is hypothesis testing, and you decide to apply this to the aluminum pollutants. You will continue to use z-procedures.
D. You want to show that the true population mean of the amount of aluminum pollutants in a weekly test sample is less than the amount you computed in Part 3C. Test this claim at a significance level of 0.05. Use a sketch to support your conclusion.
E. Historical data over the years shows that the true mean of the amount of aluminum pollutants that are in a weekly test sample is 35 ppm. Perform a Type II error analysis on your test from D assuming that the historical data is accurate. Use a sketch to support your conclusions.
F. What is the power of your test in E? What can you do to increase the power of your test without increasing the probability of other errors? What is the possible drawback with using this method?
G. (Extra Credit) Since we did not know the population standard deviation exactly, it would have been more precise to use t-procedures than z-procedures; do so. You will have to work out for yourself how to do the error analysis in question E using the t-statistic!
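The lower-tailed z test in D and the Type II error and power calculations in E and F can be sketched in Python as follows. All numeric inputs are placeholders, not values computed from the lake data, and the function names are hypothetical.

```python
from statistics import NormalDist
import math

STD_NORMAL = NormalDist()

def lower_tail_z_test(sample_mean, sd, n, mu0, alpha=0.05):
    """Test H0: mu = mu0 against Ha: mu < mu0; returns (z, p_value, reject)."""
    z = (sample_mean - mu0) / (sd / math.sqrt(n))
    p_value = STD_NORMAL.cdf(z)
    return z, p_value, p_value < alpha

def power_lower_tail(mu_true, sd, n, mu0, alpha=0.05):
    """P(reject H0 | true mean is mu_true); the Type II error rate is 1 - power."""
    standard_error = sd / math.sqrt(n)
    # H0 is rejected when the sample mean falls below this critical value.
    critical_mean = mu0 + STD_NORMAL.inv_cdf(alpha) * standard_error
    return NormalDist(mu_true, standard_error).cdf(critical_mean)

# Placeholder inputs, NOT computed from the lake data.
z, p_value, reject = lower_tail_z_test(sample_mean=30.0, sd=5.0, n=104, mu0=31.0)
power = power_lower_tail(mu_true=30.0, sd=5.0, n=104, mu0=31.0)

print(f"z = {z:.3f}, p = {p_value:.4f}, reject H0: {reject}")
print(f"power = {power:.3f}, P(Type II error) = {1 - power:.3f}")
```

Calling power_lower_tail with a larger n shows the effect of sample size on power while alpha stays fixed.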
Part 5
A. Looking back over your previous reports, your boss mentions that you never gave a confidence interval for the linear models you computed in Part 2. Compute these confidence intervals, and test each linear model against the hypothesis of no linear relationship.
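One reading of this question is a confidence interval for the slope of each regression line together with a t test of the hypothesis that the slope is zero; that reading is an assumption. The Python sketch below follows it, using only weeks 1-8 of the table, so the numbers are not the answers for the full data set. The function name slope_inference is hypothetical, and the critical value 2.447 is the two-sided 95% value for 6 degrees of freedom from a standard t table; the full data set has 102 degrees of freedom and needs a different value.

```python
import math

# Weeks 1-8 from the table; illustrative subset only.
temp = [70.0692, 68.8021, 71.9412, 62.2027, 64.9527, 66.1605, 63.9796, 63.8191]
alum_ppm = [36.2846, 35.1909, 38.5753, 29.8268, 32.7359, 33.4072, 31.5494, 29.483]

# Two-sided 95% critical value of t with n - 2 = 6 degrees of freedom (t table).
T_STAR_DF6_95 = 2.447

def slope_inference(x, y, t_star):
    """Return (slope, standard_error, t_statistic, (low, high)) for the fitted slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual standard deviation uses n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    residual_sd = math.sqrt(sse / (n - 2))
    standard_error = residual_sd / math.sqrt(sxx)
    t_statistic = slope / standard_error  # tests H0: slope = 0
    interval = (slope - t_star * standard_error, slope + t_star * standard_error)
    return slope, standard_error, t_statistic, interval

slope, se, t_stat, (low, high) = slope_inference(temp, alum_ppm, T_STAR_DF6_95)

print(f"slope = {slope:.3f}, SE = {se:.4f}, t = {t_stat:.2f}")
print(f"95% interval for slope: ({low:.3f}, {high:.3f})")
print("reject 'no linear relationship' at the 5% level:", abs(t_stat) > T_STAR_DF6_95)
```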
Your
boss has now received all of your reports to date, with the details of your
computations. She wants you to prepare
a summary of your results, conclusions and recommendations for her to present
to the Board of Directors. Your summary
should be approximately two pages long, written in coherent paragraphs and
correct English. Remember that you are
giving your recommendations as to how the company should present the results in
court to their best advantage; the Board is not interested in your
opinions of how the company conducts its business.
However,
your conscience is pricking you for helping the polluters of the Great Lakes,
some of the continent's greatest natural resources. So you have decided to also write up the opposing case and leak
it to the environmentalists (anonymously, of course - you don't want to lose
your job!). Again, this should be a
summary of approximately two pages of your results and how the environmentalist
lobby could present them to greatest effect.