Project Goals
The main goal of this project is to help students build skills in statistical analysis by applying
descriptive statistics tools to estimate the mean COVID-19 Cases per 100,000 people
(C19CP100000) and the mean COVID-19 Proportion of Total Deaths in Total Cases (C19PTDITC)
for each of two US states of your choice, and then using those estimates and inferential
statistics to test the difference in COVID-19 incidences across the two selected states. Students are
expected to write a final research report that describes the population of interest to the
analysis, the data collection procedure, the implementation of the statistical procedure to estimate
the population parameters (mean C19CP100000 and mean C19PTDITC) using the sample data,
the interpretation of the results, and the policy recommendations.
Learning objectives
Upon completing this research project, the student will be able to:
– Collect and use data in the decision-making process;
– Calculate descriptive statistics;
– Use the Central Limit Theorem to identify the probability distributions of statistics;
– Conduct statistical inference to determine behaviors of population parameters using sample data;
– Interpret the results of analysis; and
– Make policy recommendations.
Problem Statement
The coronavirus disease 2019 (COVID-19), which first appeared in China in late 2019, has spread
quickly across the world, causing significant health, economic, demographic, and social
disruptions along the way. What was initially seen as a largely China-centric shock has ballooned
into a full-blown global crisis. On March 11, 2020, the World Health Organization (WHO) declared
COVID-19 a global pandemic. COVID-19 has brought forth new challenges such as social distancing,
requirements to wear masks in public places, teleworking, prohibition of large-scale social events,
travel restrictions, and others. Overcoming those challenges has proved to be the best way to
contain the spread of the pandemic and protect lives. In the particular case of the United States,
each state has set forth strategies to contain the spread of the disease and to reduce the number
of deaths.
Project Description
You are tasked with determining whether or not there exists a difference in COVID-19 incidences
across two US states of your choice using COVID-19 data, namely, Cases per 100,000 people,
Total Deaths, and Total Cases.
To complete your project, you will use secondary data from the CDC COVID Data Tracker – 2020
(https://covid.cdc.gov/covid-data-tracker/#county-map) to estimate the difference in COVID-19
incidences across the two states. You will also have to test the hypothesis of no difference in
COVID-19 incidences across the two states.
Steps for conducting the statistical analysis are described below.
1. Data collection and visualization
For this project, you need to download COVID-19 data using the link provided above. Once on
the data page, you will be prompted to enter your state. Data on the counties of the state will be
displayed. Your variables of interest are Cases per 100,000, Total Cases, and Total Deaths. Select
a simple random sample whose size is one third of the total number of counties. If one third of
the counties is fewer than 20 counties, increase the sample to 20 counties by randomly selecting
the additional counties. If the total number of counties is less than 20, please choose a different
state. Follow the same procedure to select the sample for the other state. Next, plot the two
samples in the same chart (visualization) to detect whether or not there exist differences in Cases
per 100,000 people, Total Cases, and Total Deaths across the two states. The charts
should be produced using SPSS visualizations.
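The sampling rule above can be sketched in a few lines of Python. This is only an illustration, not part of the required SPSS workflow: the function names `sample_size` and `draw_sample` are hypothetical, and rounding one third up to the next whole county is an assumption, since the rule does not say how to round.

```python
import random


def sample_size(total_counties: int, minimum: int = 20) -> int:
    """Return one third of the counties, raised to `minimum` if smaller.

    Assumption: a fractional third is rounded up to the next whole county.
    """
    if total_counties < minimum:
        raise ValueError("fewer than 20 counties: choose a different state")
    one_third = (total_counties + 2) // 3  # integer ceiling of total / 3
    return max(one_third, minimum)


def draw_sample(counties, seed=None):
    """Draw a simple random sample (without replacement) of counties."""
    counties = list(counties)
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(counties, sample_size(len(counties)))


if __name__ == "__main__":
    # Hypothetical state with 67 counties, labelled 0..66 for illustration.
    print(sorted(draw_sample(range(67), seed=1)))
```

Recording the seed in the methodology section lets a reader reproduce exactly the same sample of counties.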
To complete the SPSS visualization, each student must complete five modules of Statistics 101
from the following link https://cognitiveclass.ai/courses/statistics101/
Upon the completion of Statistics 101, each student must print the certificate of completion and
attach it as an appendix to the written project report.
2. Estimation of the mean, variance and standard deviation for each of the two COVID-19 variables
The estimates of the mean C19CP100000, its standard deviation, and the sample size for each
state are the inputs needed to calculate the point estimate and the interval estimate of the
C19CP100000 differential (use the confidence level of your choice, preferably between 95% and 99%).
Likewise, the estimates of the mean C19PTDITC, its standard deviation, and the sample size for
each state are the inputs needed to calculate the point estimate and the interval estimate of the
C19PTDITC differential (use the confidence level of your choice, preferably between 95% and 99%).
If the sample size of each state is 30 or more, assume that the sample standard deviation is the
same as the population standard deviation and use the Z distribution to construct the confidence
interval. But, if the sample size of either state is less than 30, use the t distribution to
construct the confidence interval.
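As a minimal sketch of the estimation step, the Python below computes the point estimate and confidence interval of a differential between two samples using only the standard library. The names `death_proportions` and `diff_ci` are hypothetical. Two assumptions are made that the instructions do not state: C19PTDITC is computed per county as Total Deaths divided by Total Cases before averaging, and the standard error is the unpooled form sqrt(s1²/n1 + s2²/n2). For samples below 30 the t critical value must be supplied by the caller (for example from a t table), because its degrees of freedom depend on the variance assumption you adopt.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def death_proportions(total_deaths, total_cases):
    """Per-county proportion of deaths in cases (assumed C19PTDITC definition)."""
    return [d / c for d, c in zip(total_deaths, total_cases)]


def diff_ci(sample_a, sample_b, confidence=0.95, t_critical=None):
    """Point estimate, margin of error, and CI for mean(a) - mean(b)."""
    n_a, n_b = len(sample_a), len(sample_b)
    point = mean(sample_a) - mean(sample_b)
    # Unpooled standard error of the difference in sample means (assumption).
    se = sqrt(stdev(sample_a) ** 2 / n_a + stdev(sample_b) ** 2 / n_b)
    if min(n_a, n_b) >= 30:
        # Both samples are large: use the Z distribution.
        critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    elif t_critical is not None:
        # Small sample: the caller supplies the t critical value.
        critical = t_critical
    else:
        raise ValueError("sample below 30: pass t_critical from a t table")
    margin = critical * se
    return point, margin, (point - margin, point + margin)
```

For example, `diff_ci(state_a_rates, state_b_rates, confidence=0.95)` would return the estimated C19CP100000 differential with its 95% interval, where `state_a_rates` and `state_b_rates` are placeholder lists of sampled county values.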
Next, reduce the margin of error by 75% and calculate the sample size needed to achieve that
target. Finally, reconstruct the confidence interval of the estimated C19CP100000 differential
that would result from that sample size. Repeat the same procedure for the C19PTDITC
differential.
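One way to reason about the required sample size, sketched below under stated assumptions: the margin of error is the critical value times the standard error, and the standard error shrinks with the square root of the sample size. If "reduce by 75%" is read as a new margin equal to 25% of the original, and both state samples are scaled by the same factor while the standard deviations and the critical value are held fixed, each sample must grow by a factor of 1 / 0.25² = 16. The function name `required_sample_size` is hypothetical.

```python
import math


def required_sample_size(current_n: int, reduction: float = 0.75) -> int:
    """Sample size that shrinks the margin of error by `reduction`.

    Assumes the standard deviation and critical value stay fixed, so the
    margin of error is proportional to 1 / sqrt(n).
    """
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    remaining = 1 - reduction  # fraction of the original margin that is kept
    return math.ceil(current_n / remaining ** 2)


if __name__ == "__main__":
    # A sample of 30 counties would need to grow to 480 counties.
    print(required_sample_size(30))
```

With small samples the t critical value also changes as the sample grows, so the result is an approximation in that case. A required size larger than the number of counties in the state is itself a finding worth discussing in the report.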
3. Hypothesis testing of the nonexistence of COVID-19 incidences differentials
In this step, the hypothesis testing procedure will be implemented to test the nonexistence of
COVID-19 incidences differentials for each of the two variables. The null hypothesis that no
COVID-19 incidences differential exists will be tested against the alternative hypothesis
that a differential exists. This step is crucial since it helps to determine
whether or not the observed estimate of the COVID-19 incidences differential is due to
random error. Choose a confidence level between 95% and 99% to conduct your hypothesis
test. Also, follow the same guidelines given in step 2 to determine the type of
distribution to be used in hypothesis testing. The hypothesis testing procedure is summarized
below.
– Determine the null and alternative hypotheses.
– Choose the level of significance (preferably, set α = 0.05).
– Validate the assumptions of the hypothesis test, identify the appropriate test statistic, and
compute its value (and its P-value).
– Use the graphs to determine whether you should conduct a two-sample test of the mean with
equal or unequal variances.
– Compare the value of your statistic to the theoretical value (from the statistical tables).
– Make a decision to reject or fail to reject the null hypothesis.
– State the conclusion.
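The procedure above can be illustrated with a short Python sketch for a two-sided test of no difference in means. The function names are hypothetical. The equal-variance branch uses the pooled variance, and the unequal-variance branch uses the Welch form with Welch-Satterthwaite degrees of freedom, which is one common choice rather than something the instructions prescribe. The P-value helper uses the Z distribution and so applies only when both samples have 30 or more counties. For smaller samples, compare the statistic with the t table value at the returned degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_statistic(sample_a, sample_b, equal_variances=False):
    """Test statistic and degrees of freedom for H0: mean_a - mean_b = 0."""
    n_a, n_b = len(sample_a), len(sample_b)
    v_a, v_b = variance(sample_a), variance(sample_b)  # sample variances
    diff = mean(sample_a) - mean(sample_b)
    if equal_variances:
        df = n_a + n_b - 2
        pooled = ((n_a - 1) * v_a + (n_b - 1) * v_b) / df
        se = sqrt(pooled * (1 / n_a + 1 / n_b))
    else:
        se = sqrt(v_a / n_a + v_b / n_b)
        # Welch-Satterthwaite approximation of the degrees of freedom.
        df = (v_a / n_a + v_b / n_b) ** 2 / (
            (v_a / n_a) ** 2 / (n_a - 1) + (v_b / n_b) ** 2 / (n_b - 1)
        )
    return diff / se, df


def z_test_decision(statistic, alpha=0.05):
    """Two-sided P-value from the Z distribution and the reject decision."""
    p_value = 2 * (1 - NormalDist().cdf(abs(statistic)))
    return p_value, p_value < alpha
```

A typical use would be `stat, df = two_sample_statistic(state_a, state_b)` followed by `p_value, reject = z_test_decision(stat, alpha=0.05)`, where `state_a` and `state_b` are placeholder lists of sampled county values for one variable.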
4. Interpretation of results
Describe the meaning of your results and how they can be used for policy recommendations.
Project Grading/Evaluation
– This project will be graded out of 100 points and will contribute 10% to your final grade in this course.
– The key success factor for this project is to use correct, cleaned data and to demonstrate a
systematic approach to data analysis by using the appropriate tools.
– This project should be completed in Excel or SPSS (or Tableau). A free version of SPSS (or Tableau)
is available through the Statistics 101 course on IBM Cognitive Class. You should complete the course
and prove its completion by attaching your certificate of completion to the final report. You should
also explain the rationale for adopting a particular method of analysis.
– Submit a typed paper with multiple line spacing (at 1.15) that contains an introduction, a section
describing your methodology, a data analysis section, and a conclusion section that summarizes the
results of your analysis. The formulas used should be shown in detail, and the calculations shown
clearly. All cited works and sources of information must be listed in the reference list.
– You should each keep a log of what you have been assigned to do and what you have accomplished.
– The project will be evaluated by me, and you will receive a reduced grade if there are significant
discrepancies.
– The assessment rubric is attached.
Format
Each project will be 5 pages maximum (appendix not included) and must be written using the following
guidelines and contents:
– Title page (Include project title and your name)
– Introduction: Problem of the proposed study, purpose and justification of the study
– Methodology
– Data Collection and Cleaning
– Data analysis
– Interpretation of results
– Findings and conclusion
– Appendices: Tables, Figures, Certificate of completion (Statistics 101)
– References
Font must be Times New Roman (or Calibri) and font size must be 12. The line spacing must be multiple
at 1.15. The spacing before must be 6 pt and the spacing after must be 6 pt.
The project must be written using the MLA style.