+1(978)310-4246 credencewriters@gmail.com
  

Description

Statistic Homework Sheet…MUST HAVE EXCEL SHEET. NO HANDWRITTEN WORK ACCEPTED

For the module 05 Homework, please download the Microsoft Word document using the link below and follow the directions included in the assignment.

Module 05 Homework assignment

Use downloaded documents as a guide to complete assignment

Name:______________________________________
Module 5 Homework Assignment
The paired data below consists of test scores and hours of preparation for 5 randomly selected
students. Use this data set to answer the questions below:
x Hours of preparation 5 2 9 6 10
y Test score
64 48 72 73 80
1. Use the given data to find the correlation coefficient r, regression equation and scatter plot in MS
Excel.
Solution:
Instructor Comments:
2. Based on the linear correlation coefficient r, is this a good model? Explain.
Solution:
Instructor Comments:
3. What is the best predicted test score for a student who spent 7 hours preparing for the test?
Solution:
Instructor Comments:
4. Find the standard error se . Use formula or MS Excel.
Solution:
Instructor Comments:
5. Find the 99% prediction interval for the test score of a person who spent 7 hours preparing for the
test given that E=34.677. Interpret the result.
Solution:
Instructor Comments:
6. Find the explained variation.
Solution:
Instructor Comments:
7. Find the unexplained variation.
Solution:
Instructor Comments:
8. Find the total variation.
Solution:
Instructor Comments:
9. Find the value of r2 and explain its meaning.
Solution:
Instructor Comments:
10. If the data point ( 3, 100) is added to the data set, how would this effect the results of the regression
analysis? Is this data point an outlier, influential point or both? Explain.
Solution:
Instructor Comments:
Name:______________________________________
Module 5 Homework Assignment
The paired data below consists of test scores and hours of preparation for 5 randomly selected
students. Use this data set to answer the questions below:
x Hours of preparation 6 3 8 5 9
y Test score
69 50 68 65 80
1. Use the given data to find the correlation coefficient r, regression equation and scatter plot in MS
Excel.
Solution:
Instructor
Comments:
Scatter Plot:
Hours of Study v Test Score
90
80
70
60
50
40
30
20
10
0
0
2
4
6
8
10
Correlation coefficient uses the Excel formula =CORREL(y-values, x-values)
𝑟 = 0.9186
Regression equation:
𝑦̂ = 𝑏0 + 𝑏1 𝑥
The y-intercept of the regression equation (𝑏0 ) is found using the Excel formula
=INTERCEPT(y-values, x-values).
The slope of the regression equation (𝑏1 ) is found using the Excel formula
=SLOPE(y-values, x-values).
𝑦̂ = 40.6754 + 4.1491𝑥
2. Based on the linear correlation coefficient r, is this a good model? Explain.
Solution:
Instructor Comments:
This question depends on what value of alpha we
choose. Let’s see what happens with both 𝛼 =
0.05 and 𝛼 = 0.01:
𝛼 = 0.05:
Using the table of critical values of 𝑟 from page
692 of the textbook we get a critical value of
0.878 for a sample size of 5. Since the correlation
coefficient is greater than the critical value we
conclude that there is evidence of linear
correlation between hours of study and test
scores and therefore the regression equation is a
good model and can be used for predicting test
scores based of hours of study.
𝛼 = 0.01:
Using the table of critical values of 𝑟 from page
692 of the textbook we get a critical value of
0.959 for a sample size of 5. Since the correlation
coefficient is less than the critical value we
conclude that there is not evidence of linear
correlation between hours of study and test
scores, and therefore the regression equation is
not a good model and cannot be used for
predicting test scores based of hours of study.
3. What is the best predicted test score for a student who spent 4 hours preparing for the test?
Solution:
Again this question depends on what value of
alpha we choose. Let’s see what happens with
both 𝛼 = 0.05 and 𝛼 = 0.01:
Instructor Comments:
𝛼 = 0.05:
At this level of significance we are OK with using
the regression equation to predict test scores, so
we plug in 𝑥 = 4 into the regression equation to
get the predicted value 𝑦̂:
𝑦̂ = 40.6754 + 4.1491(4)
𝑦̂ = 57.2719
𝛼 = 0.01:
At this significance level we are saying that there
is no correlation between hours of study and test
scores, so the regression equation is not used. In
this case our best predicted value is simply the
average of the test scores, 𝑦̅:
𝑦̅ = 66.4
4. Find the standard error se . Use formula or MS Excel.
Solution:
Instructor Comments:
Use the Regression command from the Data
Analysis ToolPak add-in for Excel to find the
standard error (as well as explained variation,
unexplained variation, and total variation from
#’s 6, 7, and 8).
𝑠𝑒
5. Find the 99% prediction interval for the test score of a person who spent 7 hours preparing for the
test given that E=34.677. Interpret the result.
Solution:
Instructor Comments:
With the margin of error already given to us in
the problem all we need to do to construct the
confidence interval is subtract this from and add
this to the point estimate. Since we are using a
99% confidence level that would give 𝛼 = 0.01 so
our point estimate is 𝑦̅ = 66.4:
𝑦̅ − 𝐸 < 𝑦 < 𝑦̅ + 𝐸 66.4 − 34.677 < 𝑦 < 66.4 + 34.677 31.723 < 𝑦 < 101.077 This tells us that we can be 99% confident that a student who studies for 4 hours will score between 31.723 and 101.077. 6. Find the explained variation. Solution: Instructor Comments: Explained Variation = 392.507 7. Find the unexplained variation. Solution: Instructor Comments: Unexplained variation = 72.693 8. Find the total variation. Solution: Instructor Comments: Total variation = 465.2 9. Find the value of r2 and explain its meaning. Solution: Instructor Comments: 𝑟 2 = 0.8437 This is called the coefficient of determination. It is the proportion of the explained variation out of total variation. It tells us the amount of change in the y variable that correlates to a change in the x variable. 10. If the data point ( 3, 100) is added to the data set, how would this effect the results of the regression analysis? Is this data point an outlier, influential point or both? Explain. Solution: Instructor Comments: Hours of Study v Test Score 120 100 80 60 40 20 0 0 2 4 6 8 10 Adding in this point appears to significantly affect the regression line. The line would have to move up quite a bit to accommodate for the new point added to the scatter-plot. Shifting the regression line like this make it an influential point. However, this point is still within the range of x values we’re already using so it is not an outlier. An outlier would need to be significantly outside of the current range of values. SUMMARY OUTPUT Regression Statistics Multiple R 0.918552 R Square 0.843738 Adjusted R Square 0.791651 Standard Error 4.922499 Standard Error Observations 5 ANOVA df Regression Residual Total SS MS F Significance F 1 392.507 392.507 16.19855 0.02756 3 72.69298 24.23099 4 465.2 Explained variation Unexplained variation Total variation Coefficients Standard Error t Stat P-value Lower 95% Upper 95%Lower 95.0% Upper 95.0% Intercept 40.67544 6.760089 6.016997 0.009199 19.16182 62.18906 19.16182 62.18906 X Variable 1 4.149123 1.030904 4.024742 0.02756 0.868326 7.429919 0.868326 7.429919 Data Hours Score 6 3 8 5 9 3 1r b_0 b_1 69 50 68 65 80 100 0.918552 correlation coefficient 40.67544 4.149123 y-intercept slope 90 80 70 60 50 40 30 20 10 0 0 3x y-hat 4 57.27193 y-bar 66.4 5E lower bd upper bd 9 r^2 hours of study best predicted value using regression equation average test score 34.677 31.723 101.077 0.843738 coefficient of determination 1 2 Hours of Study v Test Score Hours of Study v Test S 120 100 80 60 40 20 2 3 4 5 6 7 8 9 10 0 0 1 2 3 Hours of Study v Test Score 4 5 6 7 8 9 10 Purchase answer to see full attachment

error: Content is protected !!