1.1
The data shown below consists of the price (in dollars) of 7 events at a local venue and
the number of people who attended. Determine if there is significant negative linear
correlation between ticket price and number of attendees. Use a significance level of
0.01 and round all values to 4 decimal places.
Ticket Price ($)   Attendance
6                  151
10                 144
14                 148
18                 143
22                 140
26                 140
30                 138
Ho: ρ = 0
Ha: ρ < 0
Find the Linear Correlation Coefficient
r =
Find the p-value
p-value =
The p-value is
• Less than (or equal to) α
• Greater than α
The p-value leads to a decision to
• Do Not Reject Ho
• Accept Ho
• Reject Ho
The conclusion is
• There is a significant positive linear correlation between ticket price and attendance.
• There is insufficient evidence to make a conclusion about the linear correlation between ticket price and attendance.
• There is a significant negative linear correlation between ticket price and attendance.
• There is a significant linear correlation between ticket price and attendance.
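The left-tailed test above can be reproduced with a short Python sketch. It assumes the standard t statistic for a correlation, t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom; the helper names `pearson_r` and `t_cdf` are hypothetical, and `t_cdf` approximates the Student t CDF by Simpson's-rule integration of the density rather than calling a statistics library, so its output should be checked against a calculator.

```python
import math


def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t with df degrees of freedom (Simpson's rule; steps must be even)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    total = density(0) + density(a)
    total += sum((4 if k % 2 else 2) * density(k * h) for k in range(1, steps))
    area = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area


price = [6, 10, 14, 18, 22, 26, 30]
attendance = [151, 144, 148, 143, 140, 140, 138]
n = len(price)

r = pearson_r(price, attendance)
t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
p_value = t_cdf(t, n - 2)  # left tail only, because Ha: rho < 0
alpha = 0.01
decision = "Reject Ho" if p_value <= alpha else "Do Not Reject Ho"
print(f"r = {r:.4f}, t = {t:.4f}, p-value = {p_value:.4f}: {decision}")
```

For a two-sided alternative the same sketch would double the tail area beyond |t| instead of taking the left tail.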
A study was done to look at the relationship between number of movies people watch at the
theater each year and the number of books that they read each year. The results of the
survey are shown below.
Movies   1   8   7   8   10   0   5    7   8
Books    8   3   7   4   4    9   11   6   5
Find the correlation coefficient: r = ____
Round to 2 decimal places.
The null and alternative hypotheses for correlation are:
H0: ____ = 0
H1: ____ ≠ 0
The p-value is: ____
Round to 4 decimal places.
Use a level of significance of α = 0.05 to state the conclusion of the hypothesis test in the context of the study.
a. There is statistically significant evidence to conclude that a person who watches more movies will read more books than a person who watches fewer movies.
b. There is statistically significant evidence to conclude that a person who watches more movies will read fewer books than a person who watches fewer movies.
c. There is statistically significant evidence to conclude that there is a correlation between the number of movies watched per year and the number of books read per year. Thus, the regression line is useful.
d. There is statistically insignificant evidence to conclude that there is a correlation between the number of movies watched per year and the number of books read per year. Thus, the use of the regression line is not appropriate.
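For the two-sided alternative H1: ρ ≠ 0, the p-value is twice the tail area beyond |t|. A self-contained Python sketch, assuming the standard correlation t statistic t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom; `pearson_r` and `t_cdf` are hypothetical helpers, and the t CDF is approximated numerically with Simpson's rule rather than taken from a library.

```python
import math


def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t with df degrees of freedom (Simpson's rule, even steps)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a, h = abs(t), abs(t) / steps
    total = density(0) + density(a)
    total += sum((4 if k % 2 else 2) * density(k * h) for k in range(1, steps))
    area = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area


movies = [1, 8, 7, 8, 10, 0, 5, 7, 8]
books = [8, 3, 7, 4, 4, 9, 11, 6, 5]
n = len(movies)

r = pearson_r(movies, books)
t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
p_value = 2 * t_cdf(-abs(t), n - 2)  # both tails, because H1: rho != 0
decision = "Reject H0" if p_value <= 0.05 else "Do not reject H0"
print(f"r = {r:.2f}, p-value = {p_value:.4f}: {decision}")
```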
r² = ____
(Round to two decimal places)
Interpret r²:
a. There is a large variation in the number of books people read each year, but if you only look at people who watch a fixed number of movies each year, this variation on average is reduced by 52%.
b. Given any fixed number of movies watched per year, 52% of the population reads the predicted number of books per year.
c. 52% of all people watch about the same number of movies as they read books each year.
d. There is a 52% chance that the regression line will be a good predictor for the number of books people read based on the number of movies they watch each year.
The equation of the linear regression line is:
ŷ = ____ + ____ x (Please show your answers to two decimal places)
Use the model to predict the number of books read per year for someone who watches 2 movies per year.
Books per year = ____
(Please round your answer to the nearest whole number.)
Interpret the slope of the regression line in the context of the question:
a. For every additional movie that people watch each year, there tends to be an average decrease of 0.57 books read.
b. As x goes up, y goes down.
c. The slope has no practical meaning since people cannot read a negative number of books.
Interpret the y-intercept in the context of the question:
a. The best prediction for a person who doesn't watch any movies is that they will read 10 books each year.
b. The average number of books read per year is predicted to be 10 books.
c. If someone watches 0 movies per year, then that person will read 10 books this year.
d. The y-intercept has no practical meaning for this study.
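The slope, intercept, r², and the prediction at x = 2 can be checked with the textbook least-squares formulas b₁ = Sxy/Sxx and b₀ = ȳ − b₁x̄. A minimal Python sketch; the function name `least_squares` is hypothetical. Note that x = 0 occurs in the data (one respondent watched no movies), so evaluating the line at x = 0 is not an extrapolation here.

```python
def least_squares(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line y = b0 + b1*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy * sxy / (sxx * syy)
    return slope, intercept, r_squared


movies = [1, 8, 7, 8, 10, 0, 5, 7, 8]
books = [8, 3, 7, 4, 4, 9, 11, 6, 5]

slope, intercept, r_squared = least_squares(movies, books)
books_for_2 = intercept + slope * 2
print(f"y-hat = {intercept:.2f} + ({slope:.2f})x, r^2 = {r_squared:.2f}")
print(f"predicted books for 2 movies: {round(books_for_2)}")
```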
3. A study was done to look at the relationship between number of vacation days employees
take each year and the number of sick days they take each year. The results of the survey are
shown below.
Vacation Days   1   11   4   8   1   4   3   6   4
Sick Days       4   1    5   2   9   2   6   2   4
a. Find the correlation coefficient: r = ____
Round to 2 decimal places.
b. The null and alternative hypotheses for correlation are:
H0: ____ = 0
H1: ____ ≠ 0
The p-value is: ____
(Round to four decimal places)
c. Use a level of significance of α = 0.05 to state the conclusion of the hypothesis test in the context of the study.
• There is statistically insignificant evidence to conclude that there is a correlation between the number of vacation days taken and the number of sick days taken. Thus, the use of the regression line is not appropriate.
• There is statistically significant evidence to conclude that an employee who takes more vacation days will take fewer sick days than an employee who takes fewer vacation days.
• There is statistically significant evidence to conclude that an employee who takes more vacation days will take more sick days than an employee who takes fewer vacation days.
• There is statistically significant evidence to conclude that there is a correlation between the number of vacation days taken and the number of sick days taken. Thus, the regression line is useful.
d. r² = ____
(Round to two decimal places)
e. Interpret r²:
• There is a large variation in the number of sick days employees take, but if you only look at employees who take a fixed number of vacation days, this variation on average is reduced by 57%.
• 57% of all employees will take the average number of sick days.
• Given any group with a fixed number of vacation days taken, 57% of all of those employees will take the predicted number of sick days.
• There is a 57% chance that the regression line will be a good predictor for the number of sick days taken based on the number of vacation days taken.
f. The equation of the linear regression line is:
ŷ = ____ + ____ x (Please show your answers to two decimal places)
g. Use the model to predict the number of sick days taken for an employee who took 8 vacation days this year.
Sick Days = ____
(Please round your answer to the nearest whole number.)
h. Interpret the slope of the regression line in the context of the question:
• As x goes up, y goes down.
• For every additional vacation day taken, employees tend to take on average 0.59 fewer sick days.
• The slope has no practical meaning since a negative number cannot occur with vacation days and sick days.
i. Interpret the y-intercept in the context of the question:
• The average number of sick days is predicted to be 7.
• The best prediction for an employee who doesn't take any vacation days is that the employee will take 7 sick days.
• The y-intercept has no practical meaning for this study.
• If an employee takes no vacation days, then that employee will take 7 sick days.
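The r² interpretation about reduced variation can be made concrete: for a least-squares line, r² = 1 − SSE/SST, the fraction of the total variation in sick days (SST) that is no longer left around the line (SSE). A Python sketch under that standard identity; the variable names and the helper `predict` are illustrative.

```python
vacation = [1, 11, 4, 8, 1, 4, 3, 6, 4]
sick = [4, 1, 5, 2, 9, 2, 6, 2, 4]

n = len(vacation)
mx, my = sum(vacation) / n, sum(sick) / n
sxx = sum((x - mx) ** 2 for x in vacation)
sxy = sum((x - mx) * (y - my) for x, y in zip(vacation, sick))
slope = sxy / sxx
intercept = my - slope * mx


def predict(x):
    """Value of the least-squares line at x vacation days."""
    return intercept + slope * x


# SST: variation of sick days around their mean; SSE: variation left around the line.
sst = sum((y - my) ** 2 for y in sick)
sse = sum((y - predict(x)) ** 2 for x, y in zip(vacation, sick))
r_squared = 1 - sse / sst  # fraction of the variation the line accounts for

print(f"y-hat = {intercept:.2f} + ({slope:.2f})x, r^2 = {r_squared:.2f}")
print(f"predicted sick days for 8 vacation days: {round(predict(8))}")
```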
4. What is the relationship between the number of minutes per day a woman spends talking
on the phone and the woman's weight? The time on the phone and weight for 7 women are
shown in the table below.
Time     75    79    15   40    72    39    80
Pounds   152   151   99   144   156   136   166
a. Find the correlation coefficient: r = ____
Round to 2 decimal places.
b. The null and alternative hypotheses for correlation are:
H0: ____ = 0
H1: ____ ≠ 0
The p-value is: ____
(Round to four decimal places)
c. Use a level of significance of α = 0.05 to state the conclusion of the hypothesis test in the context of the study.
• There is statistically insignificant evidence to conclude that a woman who spends more time on the phone will weigh more than a woman who spends less time on the phone.
• There is statistically significant evidence to conclude that a woman who spends more time on the phone will weigh more than a woman who spends less time on the phone.
• There is statistically insignificant evidence to conclude that there is a correlation between the time women spend on the phone and their weight. Thus, the use of the regression line is not appropriate.
• There is statistically significant evidence to conclude that there is a correlation between the time women spend on the phone and their weight. Thus, the regression line is useful.
d. r² = ____
(Round to two decimal places)
e. Interpret r²:
• Given any group of women who all weigh the same amount, 82% of all of these women will weigh the predicted amount.
• There is an 82% chance that the regression line will be a good predictor for women's weight based on their time spent on the phone.
• There is a large variation in women's weight, but if you only look at women with a fixed weight, this variation on average is reduced by 82%.
• 82% of all women will have the average weight.
f. The equation of the linear regression line is:
ŷ = ____ + ____ x (Please show your answers to two decimal places)
g. Use the model to predict the weight of a woman who spends 42 minutes on the phone.
Weight = ____
(Please round your answer to the nearest whole number.)
h. Interpret the slope of the regression line in the context of the question:
• As x goes up, y goes up.
• The slope has no practical meaning since you cannot predict a woman's weight.
• For every additional minute women spend on the phone, they tend to weigh on average 0.77 additional pounds.
i. Interpret the y-intercept in the context of the question:
• The average woman's weight is predicted to be 100.
• The best prediction for the weight of a woman who does not spend any time talking on the phone is 100 pounds.
• The y-intercept has no practical meaning for this study.
• If a woman does not spend any time talking on the phone, then that woman will weigh 100 pounds.
1.2
2. Here is a bivariate data set.
x   23    38   36    24   -17   11    13    27
y   -75   30   129   38   7     110   -48   38
Find the correlation coefficient and report it accurate to four decimal places.
r=
3. Here is a bivariate data set.
x    y
30   26
47   20
22   -48
40   17
4    104
21   21
30   -15
23   15
14   4
Find the correlation coefficient and report it accurate to four decimal places.
r=
6. The following table shows retail sales in drug stores in billions of dollars in the U.S. for
years since 1995.
Year           0        3         6         9         12        15
Retail Sales   85.851   108.426   141.781   169.256   202.297   222.266
Let S(t) be the retail sales in billions of dollars t years after 1995. A linear model for the data is F(t) = 9.44t + 84.182.
Estimate the retail sales in the U.S. in 2015.
____ billions of dollars.
Use the model to predict the year that corresponds to retail sales of $243 billion.
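Both parts reduce to evaluating or inverting the given linear model: 2015 corresponds to t = 2015 − 1995 = 20, and a sales level S is reached at t = (S − 84.182)/9.44. A Python sketch; the variable names are illustrative. It leaves the fractional year unrounded, since whether a value such as 16.82 years after 1995 is reported as 2011 or 2012 depends on the rounding convention your course uses.

```python
def F(t):
    """Linear model from the text: retail sales (billions of dollars), t years after 1995."""
    return 9.44 * t + 84.182


# 2015 is t = 2015 - 1995 = 20 years after 1995.
sales_2015 = F(2015 - 1995)

# Solve 9.44 t + 84.182 = 243 for t, then convert back to a calendar year.
t_243 = (243 - 84.182) / 9.44
year_243 = 1995 + t_243
print(f"F(20) = {sales_2015:.3f} billion; $243 billion at t = {t_243:.2f} (year {year_243:.2f})")
```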
8. A regression analysis was performed to determine if there is a relationship between hours of TV watched per day (x) and number of sit-ups a person can do (y). The results of the regression were:
y = ax + b
a = -1.386
b = 23.093
r² = 0.571536
r = -0.756
Use this to predict the number of sit-ups a person who watches 9.5 hours of TV can do, and please round your answer to a whole number.
9. A regression was run to determine if there is a relationship between hours of study per week (x) and the final exam scores (y).
The results of the regression were:
y = ax + b
a = 6.309
b = 29.15
r² = 0.763876
r = 0.874
Use this to predict the final exam score of a student who studies 3.5 hours per week, and please round your answer to a whole number.
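Both predictions are direct substitutions into y = ax + b; r and r² describe the strength of the fit but are not needed for the point prediction. A Python sketch with a hypothetical helper `predict`:

```python
def predict(a, b, x):
    """Value of the fitted line y = a*x + b at x."""
    return a * x + b


situps = predict(-1.386, 23.093, 9.5)  # problem 8: 9.5 hours of TV per day
score = predict(6.309, 29.15, 3.5)     # problem 9: 3.5 hours of study per week
print(round(situps), round(score))
```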
10. Statistics students at Oxnard College sampled 11 textbooks in the Condor bookstore and recorded the number of pages in each textbook and its cost. The bivariate data are shown below:
Number of Pages (x)   Cost (y)
817                   122.04
551                   92.12
951                   128.12
452                   75.24
794                   119.28
528                   78.36
300                   45
423                   71.76
854                   128.48
373                   53.76
792                   115.04
A student calculates a linear model
y = ____ x + ____. (Please show your answers to two decimal places)
Use the model to estimate the cost when the number of pages is 702.
Cost = $____
(Please show your answer to 2 decimal places.)
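A sketch of the least-squares fit for the textbook data, using b₁ = Sxy/Sxx and b₀ = ȳ − b₁x̄. The prediction below uses the unrounded coefficients; plugging the two-decimal coefficients into the equation instead can shift the estimated cost noticeably, so check which version your course expects. Variable names are illustrative.

```python
pages = [817, 551, 951, 452, 794, 528, 300, 423, 854, 373, 792]
cost = [122.04, 92.12, 128.12, 75.24, 119.28, 78.36, 45, 71.76, 128.48, 53.76, 115.04]

n = len(pages)
mx, my = sum(pages) / n, sum(cost) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(pages, cost))
sxx = sum((x - mx) ** 2 for x in pages)
slope = sxy / sxx              # least-squares slope
intercept = my - slope * mx    # line passes through (x-bar, y-bar)

cost_702 = slope * 702 + intercept  # unrounded coefficients
print(f"y = {slope:.2f}x + {intercept:.2f}; cost at 702 pages = ${cost_702:.2f}")
```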
1.3
1. A researcher wishes to examine the relationship between years of schooling completed and
the number of pregnancies in young women. Her research discovers a linear relationship, and
the least squares line is: ŷ = 3 − 5x, where x is the number of years of schooling
completed and y is the number of pregnancies. The slope of the regression line can be
interpreted in the following way:
• When amount of schooling increases by one year, the number of pregnancies tends to increase by 5.
• When amount of schooling increases by one year, the number of pregnancies tends to decrease by 3.
• When amount of schooling increases by one year, the number of pregnancies tends to decrease by 5.
• When amount of schooling increases by one year, the number of pregnancies tends to increase by 3.
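The slope is the predicted change in y for a one-unit increase in x, and it is the same from any starting x. A tiny Python sketch on the given line ŷ = 3 − 5x; `y_hat` is a hypothetical name:

```python
def y_hat(x):
    """Least squares line from the problem: y-hat = 3 - 5x (x = years of schooling)."""
    return 3 - 5 * x


# Predicted change in pregnancies for each extra year of schooling, from several starting points.
changes = [y_hat(x + 1) - y_hat(x) for x in range(0, 5)]
print(changes)  # [-5, -5, -5, -5, -5]
```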
2. Here is a bivariate data set.
x   12   38    26   29   -4    38   19
y   32   108   37   33   -32   90   79
Find the correlation coefficient and report it accurate to four decimal places.
r=
3. Choose the most appropriate completion of the sentence.
In order to indicate a strong correlation between variables, the correlation coefficient will be
• near 1
• near -1
• near -1 or 1
• near 1/2
• near 10
• near 0
4. A study was done asking people how much money they spend per month on their natural gas bill and how much money per month they spend on their electric bill. The correlation r was found to be 0.94 and the p-value for correlation was 0.0003. Then a person with a high natural gas bill will also have a high electric bill.
• false
• true
5. A study was done on smoking and lung capacity. 200 smokers took part in a study that asked them how many cigarettes a day they smoked and then measured their lung capacity. The correlation was found to be r = −0.992. Based solely on this study it can be concluded that smoking causes lung cancer.
• true
• false
6. A study was done that looked at how much red meat people consumed and how long they lived. The correlation r was found to be 0.98 and the p-value for correlation was 0.0005. Then a person who does not eat red meat will live longer than a person who has an 18-ounce steak every day.
• false
• true
7. A researcher found the correlation between age of death and number of cigarettes smoked per day to be -0.95. Based just on this information, the researcher can justly conclude that smoking causes early death.
• true
• false
8. If the equation of the regression line that relates percent blood alcohol, x, to reaction time in milliseconds, y, is ŷ = 36 − 1.3x, then the slope tells us that for every percent increase in blood alcohol, we can predict reaction time to go down by 1.3 milliseconds.
• true
• false
9. The table below shows the number of state-registered automatic weapons and the murder
rate for several Northwestern states.
x   11.7   8.5    6.7   3.3   2.3   2.2   2.1   0.6
y   14     11.5   9.7   6.9   5.7   6.2   6.1   4.6
x = thousands of automatic weapons
y = murders per 100,000 residents
This data can be modeled by the equation y = 0.85x + 4.12. Use this equation to answer the following:
Special Note: I suggest you verify this equation by performing linear regression on your
calculator.
A) How many murders per 100,000 residents can be expected in a state with 10.9 thousand
automatic weapons?
Answer =
Round to 3 decimal places.
B) How many murders per 100,000 residents can be expected in a state with 8 thousand
automatic weapons?
Answer =
Round to 3 decimal places.
10. The following table shows retail sales in drug stores in billions of dollars in the U.S. for
years since 1995.
Year           0        3         6         9         12        15
Retail Sales   85.851   108.426   141.781   169.256   202.297   222.266
Let S(t) be the retail sales in billions of dollars t years after 1995. A linear model for the data is F(t) = 9.44t + 84.182.
Use the above scatter plot to decide whether the linear model fits the data well.
• The function is not a good model for the data.
• The function is a good model for the data.
Estimate the retail sales in the U.S. in 2015.
____ billions of dollars.
Use the model to predict the year that corresponds to retail sales of $244 billion.
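As a numeric complement to reading the scatter plot, the residuals S(t) − F(t) and the correlation coefficient quantify how closely the line follows the data; residuals that are small relative to sales of roughly 86 to 222 billion dollars, together with r near 1, would support a good linear fit. A Python sketch; the names `F`, `residuals`, and `t_244` are illustrative.

```python
import math

years = [0, 3, 6, 9, 12, 15]
sales = [85.851, 108.426, 141.781, 169.256, 202.297, 222.266]


def F(t):
    """Linear model from the problem, t years after 1995."""
    return 9.44 * t + 84.182


residuals = [s - F(t) for t, s in zip(years, sales)]  # observed minus modeled

n = len(years)
mt, ms = sum(years) / n, sum(sales) / n
stt = sum((t - mt) ** 2 for t in years)
sss = sum((s - ms) ** 2 for s in sales)
sts = sum((t - mt) * (s - ms) for t, s in zip(years, sales))
r = sts / math.sqrt(stt * sss)

t_244 = (244 - 84.182) / 9.44  # invert the model for $244 billion
print([round(e, 3) for e in residuals], round(r, 4), round(1995 + t_244, 2))
```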
11. You wish to determine if there is a negative linear correlation between the age of a driver
and the number of driver deaths. The following table represents the age of a driver and the
number of driver deaths per 100,000. Use a significance level of 0.01 and round all values to 4
decimal places.
Driver Age   Number of Driver Deaths per 100,000
56           19
45           23
45           33
78           24
64           31
56           24
34           25
63           35
30           34
Ho: ρ = 0
Ha: ρ < 0
Find the Linear Correlation Coefficient
r =
Find the p-value
p-value =
The p-value is
• Greater than α
• Less than (or equal to) α
The p-value leads to a decision to
• Do Not Reject Ho
• Accept Ho
• Reject Ho
The conclusion is
• There is a significant positive linear correlation between driver age and number of driver deaths.
• There is a significant negative linear correlation between driver age and number of driver deaths.
• There is insufficient evidence to make a conclusion about the linear correlation between driver age and number of driver deaths.
• There is a significant linear correlation between driver age and number of driver deaths.
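A Python sketch of this left-tailed test, reading the table above row by row as (age, deaths) pairs and using the standard statistic t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. `pearson_r` and `t_cdf` are hypothetical helpers; the t CDF is approximated by Simpson's-rule integration rather than taken from a statistics library.

```python
import math


def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_cdf(t, df, steps=2000):
    """P(T <= t) for Student's t with df degrees of freedom (Simpson's rule, even steps)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a, h = abs(t), abs(t) / steps
    total = density(0) + density(a)
    total += sum((4 if k % 2 else 2) * density(k * h) for k in range(1, steps))
    area = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area


age = [56, 45, 45, 78, 64, 56, 34, 63, 30]
deaths = [19, 23, 33, 24, 31, 24, 25, 35, 34]
n = len(age)

r = pearson_r(age, deaths)
t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
p_value = t_cdf(t, n - 2)  # left tail only, because Ha: rho < 0
decision = "Reject Ho" if p_value <= 0.01 else "Do Not Reject Ho"
print(f"r = {r:.4f}, p-value = {p_value:.4f}: {decision}")
```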
12. A biologist looked at the relationship between number of seeds a plant produces and the
percent of those seeds that sprout. The results of the survey are shown below.
Seeds Produced   63     59     69     56   66   65     60   57
Sprout Percent   45.5   55.5   41.5   58   40   43.5   44   45.5
a. Find the correlation coefficient: r = ____
Round to 2 decimal places.
b. The null and alternative hypotheses for correlation are:
H0: ____ = 0
H1: ____ ≠ 0
The p-value is: ____
(Round to four decimal places)
c. Use a level of significance of α = 0.05 to state the conclusion of the hypothesis test in the context of the study.
• There is statistically insignificant evidence to conclude that a plant that produces more seeds will have seeds with a lower sprout rate than a plant that produces fewer seeds.
• There is statistically insignificant evidence to conclude that there is a correlation between the number of seeds that a plant produces and the percent of the seeds that sprout. Thus, the use of the regression line is not appropriate.
• There is statistically significant evidence to conclude that there is a correlation between the number of seeds that a plant produces and the percent of the seeds that sprout. Thus, the regression line is useful.
• There is statistically significant evidence to conclude that a plant that produces more seeds will have seeds with a lower sprout rate than a plant that produces fewer seeds.
d. r² = ____
(Round to two decimal places)
e. Interpret r²:
• 56% of all plants produce seeds whose chance of sprouting is the average chance of sprouting.
• There is a large variation in the percent of seeds that sprout, but if you only look at plants that produce a fixed number of seeds, this variation on average is reduced by 56%.
• There is a 56% chance that the regression line will be a good predictor for the percent of seeds that sprout based on the number of seeds produced.
• Given any group of plants that all produce the same number of seeds, 56% of all of these plants will produce seeds with the same chance of sprouting.
f. The equation of the linear regression line is:
ŷ = ____ + ____ x (Please show your answers to two decimal places)
g. Use the model to predict the percent of seeds that sprout if the plant produces 58 seeds.
Percent sprouting = ____
(Please round your answer to the nearest whole number.)
h. Interpret the slope of the regression line in the context of the question:
• For every additional seed that a plant produces, the chance for each of the seeds to sprout tends to decrease by 1.05 percent.
• As x goes up, y goes down.
• The slope has no practical meaning since it makes no sense to look at the percent of the seeds that sprout since you cannot have a negative number.
i. Interpret the y-intercept in the context of the question:
• The average sprouting percent is predicted to be 111.86.
• If a plant produces no seeds, then that plant's sprout rate will be 111.86.
• The best prediction for a plant that has 0 seeds is 111.86 percent.
• The y-intercept has no practical meaning for this study.
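The y-intercept question turns on extrapolation: the line is fitted only to plants producing between 56 and 69 seeds, and its value at x = 0 is a sprout rate above 100%, which is impossible. A Python sketch that fits the line with b₁ = Sxy/Sxx and b₀ = ȳ − b₁x̄ and makes both points visible; the variable names are illustrative.

```python
seeds = [63, 59, 69, 56, 66, 65, 60, 57]
sprout = [45.5, 55.5, 41.5, 58, 40, 43.5, 44, 45.5]

n = len(seeds)
mx, my = sum(seeds) / n, sum(sprout) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(seeds, sprout))
sxx = sum((x - mx) ** 2 for x in seeds)
slope = sxy / sxx
intercept = my - slope * mx

pred_58 = intercept + slope * 58
print(f"y-hat = {intercept:.2f} + ({slope:.2f})x")
# x = 0 is far below the observed range, and the "prediction" there exceeds 100%.
print(f"observed x range: {min(seeds)} to {max(seeds)}; "
      f"y-hat(0) = {intercept:.2f}%; y-hat(58) = {pred_58:.0f}%")
```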
1. Determine whether the following is an example of a sampling error or a non-sampling error.
A sociologist surveyed 300 people about their level of anxiety on a scale of 1 to 100. Unfortunately, the person inputting the data into the computer accidentally transposed six of the numbers, causing the statistics to have errors.
• Non-Sampling Error
• Sampling Error
2. Suppose you want to estimate the percentage of videos on YouTube that are cat videos. It is impossible for you to watch all videos on YouTube, so you use a random video picker to select 1000 videos for you. You find that 2% of these videos are cat videos. Determine which of the following is an observation, a variable, a sample statistic, or a population parameter.
Whether or not a video is a cat video is a/an
• observation
• sample statistic
• variable
• population parameter