STA 2023 Practice Exam 1              

 

1.  When applying for financial aid, City U students and their families must report household income (as computed for tax purposes).  Family incomes, in thousands of dollars, for a group of 34 incoming students are shown below.

        22.1    24.5    25.0    29.3    31.2    39.8    40.0    41.0    44.2    45.6
        46.7    48.8    49.1    50.2    50.4    51.3    54.1    57.5    59.5    62.1
        64.0    64.0    68.3    68.9    70.1    74.4    75.4    80.0    81.5    86.9
        98.8   110.3   129.5   191.2

 

(a) Make a histogram or stemplot of these data.  If you choose a histogram, be sure to specify your classes.  If you choose a stemplot, be sure to explain what your stems and leaves represent.

(b) Describe the overall shape of the distribution.  Is it roughly symmetric, skewed to the right, or skewed to the left?  Are there any outliers?

(c) Would the 5 number summary or the mean and standard deviation give a better brief summary for this distribution?  Explain your choice.  Calculate the summary statistics that you choose.

 

2.  The table below summarizes the accept/reject decisions which City U has made for a sample of n=3000 applicants, broken down by the type of high school attended.                                           

 

              Public    Private    Parochial
Accept          1254        336          180
Reject          1026        144           60

 

(a) What is the acceptance rate (as a %) among all City U applicants? ___________________________

(b) What proportion of City U applicants are not from a public high school? ___________________________

(c) Find the conditional distribution of acceptance and rejection within each of the high school types.  (That is, find the acceptance and rejection rates for students who attended public high schools.  Then do the same for private high schools and again for parochial schools.)  Summarize the results in a table and with a bar chart.

(d) If there were no relationship between the type of school and the admissions decision, what would you expect for the count in the cell describing the number accepted from public high schools?

(e) With a sentence or two, summarize any relationship that you see in these data between the admission decision and the type of high school.
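
Review sketch for part (d): under no association, the expected count in a cell is (row total × column total) / table total. The short Python example below applies that arithmetic to a small made-up two-way table, not to the City U counts. The function name `expected_count` and the example counts are illustrative assumptions.

```python
def expected_count(table, row, col):
    """Expected count for cell (row, col) if rows and columns were unrelated."""
    row_total = sum(table[row])
    col_total = sum(r[col] for r in table)
    grand_total = sum(sum(r) for r in table)
    return row_total * col_total / grand_total

# Hypothetical counts (rows: outcome, columns: group), for illustration only.
example = [[30, 20],
           [10, 40]]

# Row total 50, column total 40, table total 100.
print(expected_count(example, 0, 0))
```

Comparing the expected count with the observed count in the same cell shows the direction in which the data depart from "no relationship."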

 

3.  City U has a special relationship with an inner city high school that encourages students to apply for admission.  Below are the Verbal SAT scores from a SRS of 10 applicants from that school.

                                510   430   600   540   420   380   620   520   490   540

 

(a) Find the sample mean and standard deviation for these SAT scores.

(b) Find the interquartile range for these data.  [Recall that the interquartile range is the difference between the third and first quartiles.]

(c) Use the 1.5*IQR criterion to decide if the minimum score of 380 is unusually low, given the other values in this distribution.  Carefully justify your decision.  [Recall that the 1.5*IQR criterion says that an observation is an outlier if it falls more than 1.5*IQR above the third quartile or below the first quartile.]
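
Review sketch for parts (b) and (c): the Python example below shows one way to carry out the 1.5*IQR check on a small made-up data set, not on the SAT scores above. It assumes the common textbook convention that the quartiles are the medians of the lower and upper halves of the sorted data (leaving out the middle value when n is odd); software packages may use slightly different quartile rules. The function names are illustrative.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    Assumes at least two values; the middle value is excluded when n is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[len(data) - half:]
    return median(lower), median(upper)

def iqr_fences(values):
    """Return the (low, high) cutoffs of the 1.5*IQR criterion."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data, for illustration only.
example = [2, 4, 5, 6, 7, 8, 9, 30]
low, high = iqr_fences(example)
flagged = [x for x in example if x < low or x > high]
print(low, high, flagged)
```

An observation is flagged only if it falls strictly outside the two cutoffs, so a justification should report Q1, Q3, the IQR, and the relevant cutoff.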

 

4.  Suppose that all City U applicants are required to submit a high school grade average (on a 100 point scale).  Past experience shows that these averages follow a normal distribution with a mean of 83.0 and a standard deviation of 6.0 points.

 

(a) What proportion of City U applicants should have a high school average below 80? Find the appropriate z-score and use a standard normal table.

(b) The admissions office would like to designate students in the top 10% of the high school grade distribution for a "fast track" admissions decision.  How high would a student's high school average need to be in order to make it into this special decision group?

Your work should include the relevant z-score and the relationship between the z-score and your answer.
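
Review sketch: the two parts of this question run the same calculation in opposite directions (value to z-score to proportion, then proportion to z-score to value). The Python example below illustrates both directions with a made-up normal distribution (mean 100, standard deviation 15), not the grade distribution above. It uses `NormalDist` from the standard library in place of a printed normal table, so results may differ slightly from table lookups because of rounding.

```python
from statistics import NormalDist

# Hypothetical distribution, for illustration only.
mean, sd = 100.0, 15.0
standard_normal = NormalDist()  # mean 0, standard deviation 1

# Forward: value -> z-score -> proportion below that value.
x = 85.0
z = (x - mean) / sd
prop_below = standard_normal.cdf(z)

# Backward: proportion -> z-score -> value.
# The top 5% starts at the z-score with 95% of the area below it.
z_cut = standard_normal.inv_cdf(0.95)
cutoff = mean + z_cut * sd

print(round(z, 2), round(prop_below, 4))
print(round(z_cut, 2), round(cutoff, 1))
```

The last step of the backward calculation, `mean + z_cut * sd`, is the relationship between the z-score and the answer that the question asks you to show.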

 

5.  (16 points) City U is noted for having a top-ranked water polo team.  In order to attract the best quality players, the school is quite generous in awarding scholarships to students on the team to help defray the $18,000 tuition bill.  Suppose that the boxplot below reflects the size of the scholarships awarded to the 15 current water poloists.  All scholarships are in multiples of $1,000.

                                                                               

 

                                               

Determine whether each of the statements below is VALID (definitely true), INVALID (definitely false), or UNDETERMINED (could be true or false).  Explain your reasoning in each case.

(a) __________________ At least 4 of the water polo players are on full scholarships.

(b) __________________ There is at least one player with a $12,000 scholarship.

(c) __________________ None of the 15 players has a scholarship worth exactly $10,000.

(d) Circle the value below which is the most reasonable estimate for the sample mean of the water polo scholarships.  Briefly explain your reasoning.

                                $ 9,000     $ 13,500    $ 16,000    $ 18,000

 

6.  Trying to determine the best number of students to accept is a tricky admissions decision.  City U officials must assume that some students will reject an offer from City U in order to attend another school.  If too few students are accepted, they may end up with too small an incoming class, but accepting too many students may jeopardize City U's rating in college guidebooks.  Here are several years' data on the number of students accepted and the number who later enrolled.

Year    Accepted    Enrolled
1996        2440         611
1997        2800         708
1998        2720         637
1999        2360         584
2000        2660         614
2001        2620         625

 

(a) Find the correlation between the number of students accepted and the number that enrolled.

(b) Which variable should be the explanatory variable, and which should be the response?  Explain.  Find the least squares regression line which best fits these 6 data points.

(c) Write a sentence that interprets what the value of the estimated slope of this regression line tells us about accepted and enrolled students.  Be as specific as possible.

(d) If City U accepts 2500 students in 2002, how many would you expect to enroll?

(e)  What is the residual for 1998?  Write a sentence interpreting the value of the residual.

(f) Find the value of r² for this model and interpret it as a percentage.  Be as specific as possible.  Your statement should relate to City U admissions.

(g) Sketch a time plot of the accepted data and another of the enrolled data.  Do your time plots reveal any strong trend in the number of students accepted or enrolled from year to year?
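
Review sketch: the quantities asked for in parts (a) through (f) all come from the same few sums. The Python example below computes the correlation, the least squares line, one residual, and r² for a small made-up data set, not for the admissions data above. The function name `fit_line` and the data values are illustrative assumptions.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (r, slope, intercept) for the least squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r, slope, intercept = fit_line(xs, ys)

# Residual = observed y minus predicted y, here for the point (3, 5).
predicted = intercept + slope * 3
residual = 5 - predicted

print(round(r, 4), round(slope, 2), round(intercept, 2))
print(round(residual, 2), round(r ** 2, 2))
```

A positive residual means the observed value lies above the line; r² is the fraction of the variation in the response that the line accounts for.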

 


7.  The age distribution of students at City U is modeled by the distribution shown to the right.

 

(a)  Approximate the median student age on the graph based on the distribution.

(b)  Do you expect the mean student age to be higher or lower than the median?  Explain briefly.  Approximate the mean student age, based on the distribution.

 

8.  (a)  Tell me everything you can about correlation.  (What does it measure?  What values can it have?  How is it used?) 

(b)  Sketch two scatterplots, one with a correlation of approximately -0.98 and the other with a correlation of approximately 0.45.  Label your plots so it is clear which is which.

 

9.  Explain or define the following terms as they relate to linear regression:

(a)  Influential observations

(b)  Residual

 

10.  Overweight parents tend to have overweight children.  The results of a study of Mexican American girls aged 9 to 12 years are typical.  The investigators measured body mass index (BMI), a measure of weight relative to height, for both the girls and their mothers.  People with high BMI are typically overweight.  The correlation between the BMI of daughters and the BMI of their mothers was r = 0.506.  The results of this study are confounded.  Explain what the confounding is and what you may or may not conclude from the study. 

 

11.  The table below shows numbers of flights on time and delayed for two airlines at five airports in one month. 

 

                   Alaska Airlines          America West
                   On Time    Delayed       On Time    Delayed
Los Angeles            497         62           694        117
Phoenix                221         12          4840        415
San Diego              212         20           383         65
San Francisco          503        102           320        129
Seattle               1841        305           201         61

 

(a) What proportion of all Alaska Airlines flights were delayed?  What proportion of all America West flights were delayed?

(b) Find the percentage of delayed flights for Alaska Airlines at each of the five airports.  You may record your percentages in the table, next to the number of delayed flights.  Do the same for America West.

(c) What happens?  What is the name of the phenomenon you observe?  Explain why it occurs in this situation.  (What’s the lurking variable?)

 

12.  In Professor Friedman’s economics course, the correlation between the students’ total scores prior to the final exam and their final exam scores is r = 0.6.  The pre-exam totals for all students in the course have mean 280 and standard deviation 30.  The final exam scores have mean 75 and standard deviation 8.

(a)  Professor Friedman grades on a curve so that he expects to assign A’s to approximately 15% of his students, B’s to approximately 35%, C’s to approximately 40%, and D’s or F’s to the remaining 10%.  Assuming the distribution of pre-final totals is approximately normal, what is the minimum pre-final total a student would need in order to be earning an A before the final exam?  A B?  A C?

(b)  Find the least squares regression line of final exam scores on pre-final total scores for this course.

(c)  Explain the meaning of the vertical intercept of your LSR line in the context of Professor Friedman’s class.   Is your interpretation reasonable?  Why or why not?

(d)  Julie’s total before the exam was 300.  What does LSR predict for her score on the final exam?

(e)  Should we have great confidence in our ability to predict Julie’s final exam score accurately?  Explain your answer and justify it statistically.
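
Review sketch: when only summary statistics are given, the least squares line of y on x can be built from them directly, with slope = r × (s_y / s_x) and intercept = ȳ − slope × x̄. The Python example below applies these formulas to made-up summary values, not to the course figures above. The function name `line_from_summary` and the numbers are illustrative assumptions.

```python
def line_from_summary(r, mean_x, sd_x, mean_y, sd_y):
    """Return (slope, intercept) of the least squares line of y on x."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical summary statistics, for illustration only.
r = 0.5
slope, intercept = line_from_summary(r, mean_x=50.0, sd_x=10.0,
                                     mean_y=70.0, sd_y=4.0)

# Prediction for a hypothetical x value of 60.
prediction = intercept + slope * 60.0

# Fraction of the variation in y accounted for by the line.
r_squared = r ** 2

print(slope, intercept, prediction, r_squared)
```

The value of r² gives a quick statistical basis for judging how much confidence a prediction from the line deserves.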