Please complete Part 2 (Filter and Mine), using Part 1 (Acquire and Parse) as a guide.

All the answers are in red.

Assignment: Acquire and Parse Worksheet

Directions: Use complete sentences to answer the questions below and save the completed

document with your first and last name. Save your Excel data file with your first and last

name.

Upload the following to the Google Classroom:

• This completed document

• Excel file with your data

Acquire Step: Obtain the data

Search for Transportation or Covid-19 data. Possible sources: Kaggle or Bureau of

Transportation Statistics (www.bts.org).

1. Provide the name of your data source.

The data source is Kaggle. The dataset downloaded from it covers Covid-19.

2. Provide the URL of your data source.

https://www.kaggle.com/datasets/imdevskp/corona-virus-report

3. Provide a brief description of your data source.

Kaggle:

Kaggle is a platform where data scientists from all over the world upload datasets. A user

can download, upload, and discuss datasets on this platform. It is one of the most widely

used platforms for data science tasks.

Dataset:

The dataset used in this worksheet covers Covid-19. It consists of 6 files covering all the

countries affected by the virus. The case records in this dataset start in January 2020 and

end at a total of 8,243 cases. The 6 files in this dataset are listed below:

• Full-grouped.csv (day to day cases in each country)

• Covid19-clean (contains the same data but without province and country)

• Country wise latest (contains all the updated data)

• Day wise (consists of the same data as country clean but without level data)

• USA country wise (all the data related to the US)

• Worldometer data (linked to the Worldometer website and keeps updating)

Used File for Parsing Sheet:

The file I worked on organizes the data by country, and the same approach works for any

country. Only the values and the provinces/states change.

Dataset file: worldometer_data.csv

4. Is the data primary or secondary?

This dataset is secondary because all the data in this file was collected from the

Worldometer website. Another primary source of this data is published on GitHub by

Johns Hopkins University.

Parse Step: Understand the meaning of data variables and identify the data types

1. How many variables are in the dataset?

The total number of variables in the dataset is 16.

2. How many of the variables are quantitative?

The total number of quantitative variables is 13. The variables are listed below:

(Population, Total Cases, New Cases, Total Deaths, New Deaths, Total Recovered, New Recovered,

Active Cases, Serious cases, Tot Cases/1M pop, Deaths/1M pop, Total Tests, Tests/1M pop)

3. How many of the variables are qualitative?

The total number of qualitative variables is 3:

(Country, Region and Continent)
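The split between quantitative and qualitative variables can also be checked with a short script. The following is a minimal Python sketch, not part of the worksheet: the three sample rows, the country names, and the column names are made up for illustration, and the rule "a column is quantitative if every filled cell parses as a number" is an assumption that will not hold for numeric codes such as ZIP codes.

```python
import csv
import io

# Hypothetical three-row extract; these are NOT rows from worldometer_data.csv.
SAMPLE = """Country,Continent,Population,TotalCases
Aland,Europe,1000,50
Borduria,Europe,2500,
Carpania,Asia,4000,120
"""


def is_number(text):
    """Return True if the text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def classify_columns(csv_text):
    """Label each column as quantitative or qualitative (assumed rule)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    kinds = {}
    for name in rows[0]:
        # Ignore empty cells so missing values do not change the label.
        filled = [row[name] for row in rows if row[name] != ""]
        numeric = bool(filled) and all(is_number(value) for value in filled)
        kinds[name] = "quantitative" if numeric else "qualitative"
    return kinds


kinds = classify_columns(SAMPLE)
for name, kind in kinds.items():
    print(f"{name}: {kind}")
```

Counting the labels in `kinds` gives the totals asked for in questions 1 to 3 above.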

4. For all the variables or the first 5 variables in your dataset, complete the information

below. If you have fewer than 5 variables, delete the sections with no variable information.

For example, if your dataset has 3 variables, delete the sections for variable 4 and variable

5.

Variable 1

a. Name: Country

b. Qualitative or Quantitative: Qualitative

c. Ordinal, Nominal, Discrete or Continuous. Nominal

d. Data type: String

Variable 2

a. Name: Continent

b. Qualitative or Quantitative: Qualitative

c. Ordinal, Nominal, Discrete or Continuous. Nominal

d. Data type: String

Variable 3

a. Name: Population

b. Qualitative or Quantitative: Quantitative

c. Ordinal, Nominal, Discrete or Continuous. Discrete

d. Data type: Float

Variable 4

a. Name: Total Cases

b. Qualitative or Quantitative: Quantitative

c. Ordinal, Nominal, Discrete or Continuous. Discrete

d. Data type: Long Int

Variable 5

a. Name: Total Deaths

b. Qualitative or Quantitative: Quantitative

c. Ordinal, Nominal, Discrete or Continuous. Discrete

d. Data type: Long Int

5. Determine two questions you can answer using the variables in your dataset.

Question 1: How can you identify a variable and an observation in your dataset?

Question 2: What is the total number of dependent and independent variables in your

dataset?

Assignment: Filter and Mine Worksheet

Directions: Answer the questions below and save the document with your first and last name.

Upload the following to the Google Classroom:

● This completed document

● The completed Excel file with your data, the formulas used and the PivotTable that was

created.

Filter: Remove data not needed to create the visualization.

1. Provide the names of the variables needed to create your visualization:

Variable 1:

Variable 2:

Before removing data, please make sure you have 1 quantitative variable and 1 qualitative

(categorical) variable to complete the Mine Step.
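As an illustration of the Filter step, the following minimal Python sketch keeps one qualitative and one quantitative column and drops the rest. It is an assumption-laden example, not the worksheet's required method (the worksheet expects Excel): the sample rows are made up, and the choice of `Continent` and `TotalCases` is hypothetical, since the two variables above are left for you to fill in.

```python
import csv
import io

# Hypothetical rows for illustration; NOT taken from worldometer_data.csv.
SAMPLE = """Country,Continent,Population,TotalCases
Aland,Europe,1000,50
Borduria,Europe,2500,75
Carpania,Asia,4000,120
"""

# Hypothetical choice: one qualitative and one quantitative variable.
KEEP = ["Continent", "TotalCases"]


def filter_columns(csv_text, keep):
    """Return the rows with only the columns named in keep."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{name: row[name] for name in keep} for row in reader]


filtered = filter_columns(SAMPLE, KEEP)
for row in filtered:
    print(row)
```

In Excel the same result comes from deleting or hiding every column except the two you named.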

Mine: Calculate simple summary statistics to discern patterns or place the data in mathematical

context.

1. For the quantitative variable above, use Excel to calculate the summary statistics and record

the values below:

Summary Statistics        Value

a. Mean

b. Median

c. Mode

d. Minimum Value

e. Maximum Value

f. Range

g. Lower Quartile

h. Upper Quartile

i. Interquartile Range

j. Lower Limit of Data

k. Upper Limit of Data

l. Outliers
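The summary statistics above can be cross-checked outside Excel. The following is a minimal Python sketch under stated assumptions: the sample values are made up, the quartiles use the inclusive method (which should correspond to Excel's QUARTILE.INC), and the lower and upper limits are taken to be the common 1.5 × IQR fences, which the worksheet does not define explicitly.

```python
import statistics


def summarize(values):
    """Return the worksheet's summary statistics for a list of numbers."""
    data = sorted(values)
    # Inclusive quartiles; assumed to match Excel's QUARTILE.INC.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: limits are the 1.5 * IQR fences.
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.multimode(data),  # list, since ties are possible
        "min": data[0],
        "max": data[-1],
        "range": data[-1] - data[0],
        "lower_quartile": q1,
        "upper_quartile": q3,
        "iqr": iqr,
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        "outliers": [v for v in data if v < lower_limit or v > upper_limit],
    }


# Hypothetical values for illustration; NOT taken from the dataset.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 40]
stats = summarize(sample)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Any value below the lower limit or above the upper limit is reported as an outlier, so a single very large country total would show up in the last row.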

2. For the qualitative variable above, use a PivotTable in Excel to:

a. List the possible categories/levels

b. Provide a count for the categories/levels
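A PivotTable that counts rows per category does the same job as a frequency count. The following minimal Python sketch shows the idea with made-up rows; the country names and the use of `Continent` as the qualitative variable are hypothetical, and the worksheet itself still expects the Excel PivotTable.

```python
from collections import Counter

# Hypothetical rows for illustration; NOT taken from the dataset.
rows = [
    {"Country": "Aland", "Continent": "Europe"},
    {"Country": "Borduria", "Continent": "Europe"},
    {"Country": "Carpania", "Continent": "Asia"},
    {"Country": "Dorne", "Continent": "Africa"},
    {"Country": "Elbonia", "Continent": "Asia"},
]

# Count how many rows fall into each category/level.
counts = Counter(row["Continent"] for row in rows)

# List the levels alphabetically with their counts, like PivotTable rows.
for level, count in sorted(counts.items()):
    print(f"{level}: {count}")
print(f"Grand Total: {sum(counts.values())}")
```

The keys of `counts` answer part a (the possible levels) and the values answer part b (the count for each level).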

Provide a copy of your PivotTable below:
