
Please complete Part 2 (Filter and Mine), using Part 1 (Acquire and Parse) as a guide.

All the answers are in Red
Assignment: Acquire and Parse Worksheet
Directions: Use complete sentences to answer the questions below and save the completed
document with your first and last name. Save your Excel data file with your first and last
name.
Upload the following to the Google Classroom:
• This completed document
• Excel file with your data
Acquire Step: Obtain the data
Search for Transportation or Covid-19 data. Possible sources: Kaggle or the Bureau of
Transportation Statistics (www.bts.gov).
1. Provide the name of your data source.
The data source is Kaggle. The dataset I downloaded covers Covid-19.
2. Provide the URL of your data source.
https://www.kaggle.com/datasets/imdevskp/corona-virus-report
3. Provide a brief description of your data source.
Kaggle:
Kaggle is a platform where data scientists from all over the world upload datasets. A user
can download, upload, and discuss datasets on this platform. It is one of the most widely
used platforms for data science tasks.
Dataset:
The dataset used in this worksheet covers Covid-19. It consists of 6 files covering all the
countries affected by the virus. The case counts in this dataset start in January 2020 and
reach 8,243 cases. The 6 files in this dataset are listed below:
• Full-grouped.csv (day-to-day cases in each country)
• Covid19-clean (contains the same data but without province and country)
• Country wise latest (contains the latest data for each country)
• Day wise (consists of the same data as the clean file but without country-level data)
• USA country wise (all the data related to the US)
• Worldometer data (taken from the Worldometer website, which is continuously updated)
Used File for Parsing Sheet:
The file I worked on organizes the data by country, so the same approach works for any
country. Only the values and the provinces/states change.
Dataset file: worldometer_data.csv
4. Is the data primary or secondary?
This dataset is secondary because all the data in this file was collected from the
Worldometer website. Another source of this data, a primary one, is published on GitHub by
Johns Hopkins University.
Parse Step: Understand the meaning of data variables and identify the data types
1. How many variables are in the dataset?
The total number of variables in the dataset is 16.
2. How many of the variables are quantitative?
The total number of quantitative variables is 13. The variables are listed below:
(Population, Total Cases, New cases, Total Deaths, New Deaths, Active Cases, Serious cases,
Tot Cases, Total Tests, Deaths/1M Pop, Tests/1M pop)
3. How many of the variables are qualitative?
The total number of qualitative variables is 3:
(Country, Region and Continent)
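As a rough cross-check on the quantitative/qualitative split, the sketch below labels a column as quantitative when every non-empty cell parses as a number. The column names and rows are made-up illustrative values, not taken from worldometer_data.csv, and the helper names `is_number` and `classify_columns` are hypothetical.

```python
import csv
import io


def is_number(text):
    """Return True if the cell text can be parsed as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def classify_columns(rows):
    """Label a column quantitative if all of its non-empty cells are numeric."""
    kinds = {}
    for name in rows[0].keys():
        cells = [row[name] for row in rows if row[name] != ""]
        numeric = bool(cells) and all(is_number(c) for c in cells)
        kinds[name] = "quantitative" if numeric else "qualitative"
    return kinds


# Hypothetical sample data for illustration only.
sample_csv = """Country,Continent,Population,Total Cases
Alpha,Asia,1000,50
Beta,Europe,2000,
Gamma,Asia,1500,75
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
kinds = classify_columns(rows)
for name, kind in kinds.items():
    print(name, kind)
```

This is only a heuristic: a numeric code such as a postal code would parse as a number even though it is qualitative, so the result still needs a manual check.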
4. For all the variables or the first 5 variables in your dataset, complete the information
below. If you have fewer than 5 variables, delete the sections with no variable information.
For example, if your dataset has 3 variables, delete the sections for variable 4 and variable
5.
Variable 1
a. Name: Country
b. Qualitative or Quantitative: Qualitative
c. Ordinal, Nominal, Discrete or Continuous. Nominal
d. Data type: String
Variable 2
a. Name: Continent
b. Qualitative or Quantitative: Qualitative
c. Ordinal, Nominal, Discrete or Continuous. Nominal
d. Data type: String
Variable 3
a. Name: Population
b. Qualitative or Quantitative: Quantitative
c. Ordinal, Nominal, Discrete or Continuous. Discrete
d. Data type: Float
Variable 4
a. Name: Total Cases
b. Qualitative or Quantitative: Quantitative
c. Ordinal, Nominal, Discrete or Continuous. Discrete
d. Data type: Long Int
Variable 5
a. Name: Total Deaths
b. Qualitative or Quantitative: Quantitative
c. Ordinal, Nominal, Discrete or Continuous. Discrete
d. Data type: Long Int
5. Determine two questions you can answer using the variables in your dataset.
Question 1: How can you identify a variable and an observation in your dataset?
Question 2: What is the total number of dependent and independent variables in your
dataset?
Assignment: Filter and Mine Worksheet
Directions: Answer the questions below and save the document with your first and last name.
Upload the following to the Google Classroom:
● This completed document
● The completed Excel file with your data, the formulas used and the PivotTable that was
created.
Filter: Remove data not needed to create the visualization.
1. Provide the names of the variables needed to create your visualization:
Variable 1:
Variable 2:
Before removing data, please make sure you have 1 quantitative variable and 1 qualitative
(categorical) variable to complete the Mine Step.
Mine: Calculate simple summary statistics to discern patterns or place the data in mathematical
context.
1. For the quantitative variable above, use Excel to calculate the summary statistics and record
the values below:
Summary Statistics          Value
a. Mean
b. Median
c. Mode
d. Minimum Value
e. Maximum Value
f. Range
g. Lower Quartile
h. Upper Quartile
i. Interquartile Range
j. Lower Limit of Data
k. Upper Limit of Data
l. Outliers
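The statistics listed above can also be computed outside Excel as a cross-check. The sketch below is a minimal example, assuming that the lower and upper limits of the data are defined by the common 1.5 × IQR rule (lower limit = Q1 - 1.5 × IQR, upper limit = Q3 + 1.5 × IQR), which the worksheet does not state explicitly. The function name `summarize` and the sample values are hypothetical. The "inclusive" quartile method is used because it interpolates the same way as Excel's QUARTILE.INC.

```python
import statistics


def summarize(values):
    """Return the worksheet's summary statistics for a list of numbers."""
    data = sorted(values)
    # Quartiles with the inclusive method (same interpolation as QUARTILE.INC).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: limits follow the 1.5 * IQR rule.
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(data),
        "median": median,
        "mode": statistics.multimode(data),  # list, since ties are possible
        "min": data[0],
        "max": data[-1],
        "range": data[-1] - data[0],
        "lower_quartile": q1,
        "upper_quartile": q3,
        "iqr": iqr,
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        "outliers": [x for x in data if x < lower_limit or x > upper_limit],
    }


# Made-up illustrative values, not taken from worldometer_data.csv.
sample = [2, 4, 4, 5, 7, 9, 40]
stats = summarize(sample)
for name, value in stats.items():
    print(name, value)
```

For this sample, the quartiles are 4 and 8, so the IQR is 4, the limits are -2 and 14, and 40 is the only outlier.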
2. For the qualitative variable above, use a PivotTable in Excel to:
a. List the possible categories/levels
b. Provide a count for the categories/levels
Provide a copy of your PivotTable below:
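The PivotTable counts can be checked with a short script. The sketch below is a minimal example that lists the categories of a qualitative variable and counts each one, which is what a PivotTable with the variable in both Rows and Values (Count) would show. The continent values are made up for illustration and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical values of a qualitative variable, for illustration only.
continents = ["Asia", "Europe", "Asia", "Africa", "Europe", "Asia"]

# Count how many rows fall into each category/level.
counts = Counter(continents)

# Print the categories in alphabetical order with their counts.
for level in sorted(counts):
    print(level, counts[level])
print("Total", sum(counts.values()))
```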
