Description
Statistic Homework Sheet…MUST HAVE EXCEL SHEET. NO HANDWRITTEN WORK ACCEPTED
For the module 05 Homework, please download the Microsoft Word document using the link below and follow the directions included in the assignment.
Module 05 Homework assignment
Use downloaded documents as a guide to complete assignment
Name:______________________________________
Module 5 Homework Assignment
The paired data below consists of test scores and hours of preparation for 5 randomly selected
students. Use this data set to answer the questions below:
x Hours of preparation 5 2 9 6 10
y Test score
64 48 72 73 80
1. Use the given data to find the correlation coefficient r, regression equation and scatter plot in MS
Excel.
Solution:
Instructor Comments:
2. Based on the linear correlation coefficient r, is this a good model? Explain.
Solution:
Instructor Comments:
3. What is the best predicted test score for a student who spent 7 hours preparing for the test?
Solution:
Instructor Comments:
4. Find the standard error se . Use formula or MS Excel.
Solution:
Instructor Comments:
5. Find the 99% prediction interval for the test score of a person who spent 7 hours preparing for the
test given that E=34.677. Interpret the result.
Solution:
Instructor Comments:
6. Find the explained variation.
Solution:
Instructor Comments:
7. Find the unexplained variation.
Solution:
Instructor Comments:
8. Find the total variation.
Solution:
Instructor Comments:
9. Find the value of r2 and explain its meaning.
Solution:
Instructor Comments:
10. If the data point ( 3, 100) is added to the data set, how would this effect the results of the regression
analysis? Is this data point an outlier, influential point or both? Explain.
Solution:
Instructor Comments:
Name:______________________________________
Module 5 Homework Assignment
The paired data below consists of test scores and hours of preparation for 5 randomly selected
students. Use this data set to answer the questions below:
x Hours of preparation 6 3 8 5 9
y Test score
69 50 68 65 80
1. Use the given data to find the correlation coefficient r, regression equation and scatter plot in MS
Excel.
Solution:
Instructor
Comments:
Scatter Plot:
Hours of Study v Test Score
90
80
70
60
50
40
30
20
10
0
0
2
4
6
8
10
Correlation coefficient uses the Excel formula =CORREL(y-values, x-values)
𑟠= 0.9186
Regression equation:
ð‘¦Ì‚ = ð‘Â0 + ð‘Â1 ð‘¥
The y-intercept of the regression equation (ð‘Â0 ) is found using the Excel formula
=INTERCEPT(y-values, x-values).
The slope of the regression equation (ð‘Â1 ) is found using the Excel formula
=SLOPE(y-values, x-values).
ð‘¦Ì‚ = 40.6754 + 4.1491ð‘¥
2. Based on the linear correlation coefficient r, is this a good model? Explain.
Solution:
Instructor Comments:
This question depends on what value of alpha we
choose. Let’s see what happens with both 𛼠=
0.05 and 𛼠= 0.01:
𛼠= 0.05:
Using the table of critical values of 𑟠from page
692 of the textbook we get a critical value of
0.878 for a sample size of 5. Since the correlation
coefficient is greater than the critical value we
conclude that there is evidence of linear
correlation between hours of study and test
scores and therefore the regression equation is a
good model and can be used for predicting test
scores based of hours of study.
𛼠= 0.01:
Using the table of critical values of 𑟠from page
692 of the textbook we get a critical value of
0.959 for a sample size of 5. Since the correlation
coefficient is less than the critical value we
conclude that there is not evidence of linear
correlation between hours of study and test
scores, and therefore the regression equation is
not a good model and cannot be used for
predicting test scores based of hours of study.
3. What is the best predicted test score for a student who spent 4 hours preparing for the test?
Solution:
Again this question depends on what value of
alpha we choose. Let’s see what happens with
both 𛼠= 0.05 and 𛼠= 0.01:
Instructor Comments:
𛼠= 0.05:
At this level of significance we are OK with using
the regression equation to predict test scores, so
we plug in ð‘¥ = 4 into the regression equation to
get the predicted value ð‘¦Ì‚:
ð‘¦Ì‚ = 40.6754 + 4.1491(4)
ð‘¦Ì‚ = 57.2719
𛼠= 0.01:
At this significance level we are saying that there
is no correlation between hours of study and test
scores, so the regression equation is not used. In
this case our best predicted value is simply the
average of the test scores, ð‘¦Ì…:
ð‘¦Ì… = 66.4
4. Find the standard error se . Use formula or MS Excel.
Solution:
Instructor Comments:
Use the Regression command from the Data
Analysis ToolPak add-in for Excel to find the
standard error (as well as explained variation,
unexplained variation, and total variation from
#’s 6, 7, and 8).
ð‘ ð‘’
5. Find the 99% prediction interval for the test score of a person who spent 7 hours preparing for the
test given that E=34.677. Interpret the result.
Solution:
Instructor Comments:
With the margin of error already given to us in
the problem all we need to do to construct the
confidence interval is subtract this from and add
this to the point estimate. Since we are using a
99% confidence level that would give 𛼠= 0.01 so
our point estimate is ð‘¦Ì… = 66.4:
ð‘¦Ì… − ð¸ < 𑦠< ð‘¦Ì… + ð¸
66.4 − 34.677 < 𑦠< 66.4 + 34.677
31.723 < 𑦠< 101.077
This tells us that we can be 99% confident that a
student who studies for 4 hours will score
between 31.723 and 101.077.
6. Find the explained variation.
Solution:
Instructor Comments:
Explained Variation = 392.507
7. Find the unexplained variation.
Solution:
Instructor Comments:
Unexplained variation = 72.693
8. Find the total variation.
Solution:
Instructor Comments:
Total variation = 465.2
9. Find the value of r2 and explain its meaning.
Solution:
Instructor Comments:
𑟠2 = 0.8437
This is called the coefficient of determination. It
is the proportion of the explained variation out of
total variation. It tells us the amount of change in
the y variable that correlates to a change in the x
variable.
10. If the data point ( 3, 100) is added to the data set, how would this effect the results of the regression
analysis? Is this data point an outlier, influential point or both? Explain.
Solution:
Instructor
Comments:
Hours of Study v Test Score
120
100
80
60
40
20
0
0
2
4
6
8
10
Adding in this point appears to significantly affect the regression line. The line
would have to move up quite a bit to accommodate for the new point added to
the scatter-plot. Shifting the regression line like this make it an influential point.
However, this point is still within the range of x values we’re already using so it
is not an outlier. An outlier would need to be significantly outside of the current
range of values.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.918552
R Square
0.843738
Adjusted R Square
0.791651
Standard Error
4.922499 Standard Error
Observations
5
ANOVA
df
Regression
Residual
Total
SS
MS
F
Significance F
1 392.507 392.507 16.19855 0.02756
3 72.69298 24.23099
4
465.2
Explained variation
Unexplained variation
Total variation
Coefficients
Standard Error t Stat
P-value Lower 95% Upper 95%Lower 95.0%
Upper 95.0%
Intercept 40.67544 6.760089 6.016997 0.009199 19.16182 62.18906 19.16182 62.18906
X Variable 1 4.149123 1.030904 4.024742 0.02756 0.868326 7.429919 0.868326 7.429919
Data
Hours
Score
6
3
8
5
9
3
1r
b_0
b_1
69
50
68
65
80
100
0.918552
correlation coefficient
40.67544
4.149123
y-intercept
slope
90
80
70
60
50
40
30
20
10
0
0
3x
y-hat
4
57.27193
y-bar
66.4
5E
lower bd
upper bd
9 r^2
hours of study
best predicted value using regression equation
average test score
34.677
31.723
101.077
0.843738
coefficient of determination
1
2
Hours of Study v Test Score
Hours of Study v Test S
120
100
80
60
40
20
2
3
4
5
6
7
8
9
10
0
0
1
2
3
Hours of Study v Test Score
4
5
6
7
8
9
10
Purchase answer to see full
attachment