
Data Analysis Project:
For this project, imagine that you are a decision maker and need to analyze multiple
alternatives (at least 3). For this project, you must collect dataset(s) that include at least 2
qualitative variables and 4 quantitative variables.
Below is a list of requirements for the project:
1. Create a Purpose slide that discusses the who?, what?, when?, why? Note that although
you have flexibility to choose an area of interest, it should be appropriately applied to
real-world decisions in fields such as sports, business, financial analysis, etc.
2. One of the most important aspects of decision making is understanding the data.
Create appropriate visualization tools, and include a slide for each that highlights its
utilization and value-add. Make sure to identify the type of data being analyzed and
ensure that appropriate statistics and visualizations are being provided.
You must have a histogram (including discussion of symmetry,
skewness, etc.), box-and-whisker plot (comparing multiple alternatives), and
scatter plot (correlation analysis for two variables for each alternative).
3. Use appropriate statistical measures to help decision makers analyze variables and/or
alternatives and make informed decisions.
You must include descriptive statistics such as mean, trimmed mean, median,
mode, range, variance, standard deviation, quartiles, interquartile range, and
coefficient of variation. All of these are available in Minitab – Stat – Display
Descriptive Statistics – Statistics.
You must determine if the dataset(s) are resistant.
You must determine any outliers that exist in your dataset(s) and discuss why
they may exist, and what you feel is appropriate to do with them.
You must find the correlation coefficient between two variables in your
dataset(s) for each of the alternatives.
If you have time series data, then you must compute a 2 and 3 period moving
average. In Minitab, use Stat – Time Series – Moving Average, see the
hyperlink for additional information. Enter your data for Moving Average Minitab
If you have grouped data, then you must find the mean and standard deviations.
You may also consider including, but are not required to include, empirical rule concepts,
measures of relative position, data subsetting, and proportions.
You may also consider including, but not required, other data visualizations
discussed throughout the course.
4. Your project should analyze multiple variables and alternatives. You must have at least
2 qualitative variables, 4 quantitative variables, and 3 alternatives, but you are
welcome to include as many as you feel are needed for your given decision problem.
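As a rough cross-check on the Minitab output, the descriptive statistics, outlier check, and moving averages named in requirement 3 can be sketched in Python. This is only a sketch under assumptions that are not part of the assignment: the function names are hypothetical, the trimmed mean drops 5% of the observations from each end, quartiles use the default method of Python's `statistics.quantiles`, and outliers are flagged with the common 1.5 × IQR fences. Minitab's own conventions may differ slightly.

```python
import statistics


def trimmed_mean(values, proportion=0.05):
    """Mean after dropping `proportion` of the observations from each end (assumed rule)."""
    data = sorted(values)
    k = int(len(data) * proportion)
    return statistics.mean(data[k:len(data) - k])


def describe(values):
    """Return the descriptive statistics listed in requirement 3 as a dict."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # assumed outlier fences
    return {
        "mean": mean,
        "trimmed_mean": trimmed_mean(values),
        "median": median,
        "mode": statistics.multimode(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),
        "stdev": stdev,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "cv_percent": stdev / mean * 100,
        "outliers": [v for v in values if v < low or v > high],
    }


def moving_average(series, periods):
    """Simple moving average; the first value covers the first `periods` points."""
    return [
        sum(series[i - periods + 1:i + 1]) / periods
        for i in range(periods - 1, len(series))
    ]


# Hypothetical sample, not data from any project.
sample = [10, 12, 13, 14, 15, 16, 18, 40]
summary = describe(sample)
two_period = moving_average(sample, 2)
three_period = moving_average(sample, 3)
```

Comparing the mean with the median and the trimmed mean in `summary` is one quick way to see how strongly an outlier pulls on a non-resistant statistic.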
Some examples that you may consider:
Comparing various sports teams/players/etc.
Stock market analysis. Yahoo Finance also includes data on cryptocurrencies, so it
may be interesting to compare various portfolio options and think about the type of
investor (e.g. conservative, moderate, aggressive) that may be appropriate for each.
Comparing multiple Universities/vehicles/etc. and selecting the best alternative.
Of course, you have the flexibility to pursue your own area of interest! If you want to discuss
a project topic proposal with me, please send me an email.
I created an Up Post rating scheme so you can “thumbs up” projects, student responses, etc.
that you feel were top tier and helpful! This is a great way to direct students to value-adding
areas within the discussion board, which will further support everyone’s learning!
I also selected a setting that requires you to create your own thread before you can read and
reply to other students’ posts. The purpose of this is for everyone to work on their own project
before seeing products produced by classmates.
Please save your work as LastName_FirstName_Project.
Assignment 2
Department of Computer Science and Engineering
Faculty of Engineering
University of North Texas
CSCE5390 sections 001/002
Summer 2022
Due on or before July 22, 2022
This assignment is on computing motion vectors for a given video. Please follow the
following steps.
1. Capture a one-minute video. (A hallway video will give you the best result)
2. Use the ffmpeg tool to extract individual frames. (Refer to the documentation on how to
do this; it is part of the assignment.)
3. Use 16×16 blocks to compute MVs. You can use sequential search.
4. Search area size can be varying, and the student must come up with a best value for
5. Compute the MVs using the language of your choice. MATLAB is preferred as it will
do most of the heavy lifting.
6. Create a CSV (comma separated values) file for each pair of frames with the
following information
Block number
Current frame
Previous frame
The header is present in the above table to make it clear for you. The CSV file you
create can have five columns without any header, which will make the programming
part easier for you. The final file will then have values as given below,
and f6
1, 0,0, 4, 3
2,16,0, 18, 4
Submit your motion vector files as a compressed file that includes all the MV CSV
files generated for each pair of the frames (name the file with the previous frame
number, e.g. 1.csv, 3.csv, etc.). We may ask you to provide the video you captured
at any time, so keep it handy. You do not need to submit the video.
Create a simple Word document and include one pair of frames from your video and
the first 10 entries of the MV CSV file that was generated for that pair of frames.
Include a snapshot of your MV calculation program in the Word document.
Include steps on how to run the program so that we can test it locally and grade it.
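To illustrate steps 3 to 6, here is a minimal sketch of sequential (exhaustive) block matching, written in Python with NumPy rather than the preferred MATLAB. Several things are assumptions and not part of the assignment: the function names, the use of the sum of absolute differences (SAD) as the matching cost, the default search range of 4 pixels, and the reading of the five CSV columns as block number, current-frame x, current-frame y, previous-frame x, previous-frame y. Frames are assumed to be grayscale 2-D arrays already loaded from the extracted images.

```python
import csv

import numpy as np


def block_motion_vectors(prev, curr, block=16, search=4):
    """Exhaustive block matching of `curr` against `prev` using SAD.

    Returns rows of (block number, current x, current y, previous x, previous y).
    """
    prev = np.asarray(prev, dtype=np.int32)
    curr = np.asarray(curr, dtype=np.int32)
    height, width = curr.shape
    rows = []
    number = 1
    for y in range(0, height - block + 1, block):
        for x in range(0, width - block + 1, block):
            target = curr[y:y + block, x:x + block]
            best_sad, best_x, best_y = None, x, y
            # Try every candidate position inside the search window.
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    py, px = y + dy, x + dx
                    if py < 0 or px < 0 or py + block > height or px + block > width:
                        continue  # candidate block falls outside the frame
                    candidate = prev[py:py + block, px:px + block]
                    sad = int(np.abs(target - candidate).sum())
                    if best_sad is None or sad < best_sad:
                        best_sad, best_x, best_y = sad, px, py
            rows.append((number, x, y, best_x, best_y))
            number += 1
    return rows


def write_mv_csv(rows, path):
    """Write the rows as a header-less five-column CSV file."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
```

A larger `search` value finds larger motions but costs more time per block, which is the trade-off step 4 asks you to explore.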
How To Improve The
Chicago Bears Offense
In The NFL draft
(based on 2021-2022 season)
James Leonard
The Chicago Bears have not won a Super Bowl since 1985. The team last appeared in a
Super Bowl in 2007 and has not had a playoff win since 2011.
Because Chicago is one of the largest cities in the nation and the Bears are one of the oldest NFL teams, the Chicago Bears
market is very large and can get even larger if the team starts winning games.
One of the biggest issues with the team has always been its offensive production, often
ranking anywhere from the bottom half of the league to the bottom quarter.
The goal of this research is to take a look at the team's biggest competition and see which
position the team can draft with its 39th overall draft pick to boost offensive production. The
three positions, or alternatives, being closely examined will be Receiver, Running Back and
Offensive Lineman. A quarterback was drafted with the first pick last season so that position will not
be measured.
Season Stats Compared To Major Opponents
One data set that will be examined to
assist in deciding which
position to draft using the #39 pick is
the team’s offensive stats compared to
those of the team's major opponents.
Minnesota (MIN), Green Bay (GB), and
Detroit (DET) are within the Bears'
division (NFC North), so the team plays
them each twice per season, which is
roughly a third of its total games.
Winning the division is a very important
step in making the playoffs and
eventually winning the Super Bowl.
Los Angeles (LAR) was the best team in
the NFL and won the Super Bowl this season.
Lastly, the NFL average of these stats is included
to compare the Bears to the rest of the league.
By using a Histogram, the
frequency at which each
sample attains each stat can
be observed easily.
This histogram is
symmetrical and exactly
three of the samples had
under 4,250 yards total and
three of the six had over
that amount.
By looking at the total
passing yardage, one can
observe that the Chicago
Bears can use some help in
the passing game. By using
this data to make further
inferences into the
passing-game issues of the
Bears it can be determined
whether or not the team
should draft some sort of
The two datasets of
Pass attempts and
yardage were chosen
to be compared in
order to see how
efficient each offense
The data
demonstrates a
positive correlation
between the number
of pass attempts and
touchdowns.
There is one outlier in
this set, but it is not
significant enough to
make it non-resistant.
Without the Bears' X
and Y, the mean of X
would change from
29.83 to 32.6 and the
mean of Y would
change from 587.33
to 596.4.
Receiving Yards – Bears are last in this category being 577 less than the mean average.
Receiving Touchdowns – Bears are last in this category being 13.83 below the mean average.
Pass Attempts – The Bears are also last in this category being 45.33 below the mean average.
Relative Frequencies – (frequency of event / total number of events)
The probability of a team having less than 4,000 passing yards = 3/6 = .5 = 50% chance
The probability of a team having over 3,635 passing yards = 5/6 = .8333 = 83% chance
The probability of a team having less than 20 touchdowns with over 500 passing attempts
is 1/6 =.167 = 16.7% chance
This data can help in determining goals for the future of the team based on this year's
stats. It tells them that they are way below the average passing yardage, passing
touchdowns and attempts which means some improvement at the receiver position is
Correlation Coefficient – Between passing attempts and touchdowns based on scatterplot
X values: 16, 34, 39, 23, 41, 26 Y values: 542, 604, 593, 593, 607, 585 X Mean: 29.833 Y Mean: 587.333
(X – Mx)(Y – My) = Product of Deviation Scores = 627.111 + 69.444 + 51.944 – 38.722 + 219.611 + 8.944 = 938.333
(Y – My)^2 = Y Sum of Squares = 2055.111 + 277.778 + 32.111 + 32.111 + 386.778 + 5.444 = 2789.333
(X – Mx)^2 = X Sum of Squares = 191.361 + 17.361 + 84.028 + 46.694 + 124.694 + 14.694 = 478.833
Hence, r = (Product of Deviation Scores) / √((X Sum of Squares)(Y Sum of Squares))
= 938.333 / √((478.833)(2789.333)) = .8119
Therefore, passing attempts and touchdowns have a strong positive
correlation because .8119 is greater than .7 and is positive. This tells the team that it needs to
increase its passing attempts in order to score more passing touchdowns, which will hopefully result in
outscoring more teams and winning more games.
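The hand calculation of r can be double-checked with a short Python sketch. The function name `pearson_r` is only illustrative; the X and Y values are the ones listed for the passing scatterplot, and the formula is the same product of deviation scores divided by the square root of the two sums of squares.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from deviation scores, mirroring the hand calculation."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    product = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return product / math.sqrt(ss_x * ss_y)


# X and Y values from the passing scatterplot.
passing_x = [16, 34, 39, 23, 41, 26]
passing_y = [542, 604, 593, 593, 607, 585]
passing_r = round(pearson_r(passing_x, passing_y), 4)
print(passing_r)
```

The same function can be reused for the rushing and sack scatterplots by swapping in their X and Y lists.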
In this histogram the
Rams are the
obvious outlier with
The Bears lead this
category with 2,018
rushing yards.
This data is
interesting because
the Rams won
Super Bowl LVI, yet
they are by far dead
last in this category.
Rush Att.
The two datasets of rushing
attempts and rushing touchdowns were
chosen to be compared in
order to see how efficient
each offense's running
productivity is.
In this case, productivity is
counted in scoring
Based on this graph, a
positive correlation can be
seen between the amount of
rushing attempts by each
team and its total number of
rushing touchdowns. The
Bears are an outlier in terms
of attempts, but are only just
in the top third of the data in
This data is resistant
because there is no
significant difference
between stats that would
change the mean.
If the Bears, with 26 more
attempts than the next, at
Rushing Yardage- The Bears lead this category with 2018 (122 above the mean average)
Rushing Touchdowns – The Bears are second in this category with 14 (1.5 above mean average)
Rushing Attempts – The Bears lead this category with 475 (30 above mean average)
Relative Frequencies – (frequency of event / total number of events)
The probability of a team having over 2,000 rushing yards is 1/6 =.167 = 16.7% chance
The probability of a team having over 1,500 rushing yards on more than 450 attempts
is 2/6 = .333 = 33% chance
The probability of a team having less than 1900 yards on less than 430 attempts is 2/6
= .333 = 33% chance
The Bears lead the rushing yardage category nearly 60 yards above the NFL average,
but are two touchdowns below the NFL average. All in all, a determination that the
Bears have a solid running game is logical. The team has one more touchdown and 118
yards more than its division winner (GB) and nearly 400 more rushing yards and 4
more touchdowns than Super Bowl LVI winner (LAR)
Correlation Coefficient – Between rushing attempts and touchdowns based on scatterplot
X values: 14, 10, 13, 12, 10, 16 Y values: 475, 449, 446, 427, 420, 453 X Mean: 12.5 Y Mean: 445
(X – Mx)(Y – My) = Product of Deviation Scores = 45 – 10 + .5 + 9 + 62.5 + 28 = 135
(Y – My)^2 = Y Sum of Squares = 900 + 16 + 1 + 324 + 625 + 64 = 1930
(X – Mx)^2 = X Sum of Squares = 2.25 + 6.25 + .25 + .25 + 6.25 + 12.25 = 27.5
Hence, r = (Product of Deviation Scores) / √((X Sum of Squares)(Y Sum of Squares))
= 135 / √((27.5)(1930)) = 0.586
Therefore, the correlation between rushing attempts and touchdowns is positive, but moderate at .586. It is less
than .7, but still close enough to it that it is moderate and not weak. Based on this, one can see
that more rushing attempts mean more touchdowns, and in the Bears' case they are ahead or in
the top percentile of both of those categories.
Sacks can be
detrimental in holding
back an offense
The Bears are the
obvious outlier in this
category with 58 sacks.
The next closest data
point is Detroit at 36.
This data is resistant
because when the Bears
data is excluded the
mean average changes
from 37.6 to 33.6. This
is not skewed and the
one outlier does not
change the data set by
too much.
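One way to check this resistance claim is to recompute the mean and the median with and without the largest value. The sketch below uses the six sack totals listed for the scatterplot later in this report; the helper names are hypothetical.

```python
import statistics

# Sacks allowed for the six samples, as listed with the sack scatterplot.
sacks_allowed = [58, 30, 33, 36, 31, 38]


def without_max(values):
    """Copy of the data with its single largest value removed."""
    remaining = list(values)
    remaining.remove(max(remaining))
    return remaining


def center(values):
    """Return (mean rounded to 3 places, median) for a list of numbers."""
    return round(statistics.mean(values), 3), statistics.median(values)


with_outlier = center(sacks_allowed)
without_outlier = center(without_max(sacks_allowed))
print(with_outlier, without_outlier)
```

Dropping the Bears moves the mean by about four sacks but the median by only 1.5, which shows that the median is the more resistant of the two measures.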
Sacks Allowed – The Bears lead this category with 58 (20.33 above the mean average)
By examining this data it becomes apparent that the Bears do have an issue on the offensive
The issue then becomes which stats can be used to find if there is a correlation between sacks
and offensive production?
The offensive line not only protects the quarterback while they are passing, but also creates
holes for a running back to rush through. The Bears are last in passing yards, but first in
To find a correlation
between the number
of sacks and its
impact on offensive
production, the total
offensive yardage of
each team will be
used in a scatterplot.
This data will be
computed by adding
each team's total
rushing yardage to
its total passing
Relative Frequencies – (frequency of event / total number of events)
The probability of a team having over 30 sacks and over 6,000 total offensive
yards = 3/6 = .5 = 50% chance
The probability of a team having under 30 total sacks is 0%
The probability of having over 7,000 yards of total offense is 1/6 = 16.7%
Despite the Bears having the most total sacks allowed by a team in the entire league,
they still managed to have the most total yardage out of their major competitors
and were almost 2,000 yards above league average (1,813 total). Most of this
comes from the team's high run volume, but the team is getting some significant
protection in order to come up with these numbers. That, or the offensive
playmakers are extremely skilled.
Correlation Coefficient – Between total sacks allowed and total yardage based on scatterplot
X values: 58, 30, 33, 36, 31, 38 Y values: 7653, 6380, 6426, 5770, 6576, 5840 X Mean: 37.667 Y Mean: 6440.833
(X – Mx)(Y – My) = Product of Deviation Scores = 24647.389 + 466.389 + 69.222 + 1118.056 – 901.111 – 200.278
= 25199.667
(Y – My)^2 = Y Sum of Squares = 1469348.028 + 3700.694 + 220.028 + 450017.361 + 18270.028 +
361000.694 = 2302556.833
(X – Mx)^2 = X Sum of Squares = 413.444 + 58.778 + 21.778 + 2.778 + 44.444 + .111 = 541.333
Hence, r = (Product of Deviation Scores) / √((X Sum of Squares)(Y Sum of Squares))
= 25199.667 / √((541.333)(2302556.833)) = 0.7138
Therefore, according to the data there is a moderately positive correlation. This doesn’t necessarily make
sense from a football analyst's perspective, since sacks mean a loss of yardage on the play, yet the result suggests that more lost
yardage goes with more total offensive yardage, which leads me to believe that the relationship may be
weak. Despite this, because of the Bears' total yards compared to other teams, the O-Line seems to be doing well
based on that statistic alone.
The data in this
Histogram is
spread out
relatively evenly.
The Bears are in
the lower third of
the data by two
The data is
resistant because
there are no
outliers or skewed
Box Plot
The goal of this boxplot is to
compare the three alternatives of
Receiver, Running Back and O-Line
needs by looking at all of the data
from each category side by side.
For passing, total passing yardage is used;
for rushing, total rushing yards is
used; and for O-line, total sacks
allowed is used.
The Bears Passing game is
obviously at the bottom, but so is
the total sacks allowed. The
Offensive line allowing that many
sacks is detrimental to offensive
Buying time for the quarterback to
be able to make throws is essential
despite providing some good
However, there may be some
qualitative data that can help
explain the passing issues.
Qualitative Data
Most Common Play Call
Chi – Pass
MIN – Pass
GB – Pass
DET – Pass
LAR – Pass
NFL – Pass
Time Of Possession Vs.
Chi – Less
MIN – More
GB – Less
DET – More
LAR – More
NFL – Less
There is no correlation between what a team's
most common play type is and the amount of
time it possesses the ball.
The time per possession also does not
determine a team's success based on the
quantitative data. However, all of the teams
who had more possession time aside from
Green Bay were able to have more possession
than their opponents per game. Therefore, from
this data it can be determined that more
successful passing attempts lead to longer
possession times which can lead to putting
more stress on the other team's offensive
At the end of the day, this basically means that
being able to hold the ball longer gives
opponents less time to score.
Passing the ball stops the clock more often than
running it and the majority of teams that have
the ball for more time pass more than run.
Final Decision
● Based on the findings in this study, the teams that won
Super Bowl LVI and the NFC North Division both had more passing attempts,
passing touchdowns and passing yardage than the rest of the
division and the league average. These two teams are winners in their
respective places.
● Therefore, it seems that there is a positive relationship between
devoting more attention and resources to the passing game than to any
other offensive strategy and winning in the NFL.
● As a result, the Chicago Bears will select a wide receiver with the
39th pick in the 2023 NFL draft.
Works Cited
FOX Sports. (2022). Chicago bears team game log – NFL. Chicago Bears Team Game Log – NFL | FOX
Sports. Retrieved July 25, 2022, from https://www.foxsports.com/nfl/chicago-bears-team-game-log
National Football League. (2022). Official site of the National Football League. NFL.com. Retrieved July
25, 2022, from https://www.nfl.com/stats/team-stats/offense/rushing/2021/reg/all
Pro Football Reference. (2022). NFL season by Season Team Offense. Pro Football Reference. Retrieved
July 25, 2022, from https://www.pro-football-reference.com/years/NFL/index.htm
Team Rankings. (2022). NFL team opponent time of Possession Percentage (excluding OT). NFL Football
Stats – NFL Team Opponent Time of Possession Percentage (Excluding OT) | TeamRankings.com.
Retrieved July 25, 2022, from
Problem: Quantitative and Qualitative Variables
Stock Market
According to numerous sources, the stock market’s liquidity is crucial because it promotes control
of savings for the lifetime of investments, enabling investors to keep access to their assets for
the duration of the investment. When investors want to change their portfolios, the stock
market’s high liquidity “allows savers to purchase and sell promptly and affordably.” The stock
market’s increased liquidity makes long-term investments more accessible. The liquidity
parameter also has a favorable effect on economic growth by increasing capital’s marginal
productivity. Furthermore, it is asserted that liquidity encourages long-term investment and
information acquisition. Other work has pointed to harmful effects of increased
market liquidity: a higher investment return can reduce the saving rate and the need for
precautionary saving, which may or may not have an impact on economic growth.
Investors may start funding high-return ventures in response to increased risk diversification.
The savings rate can drop by eliminating risk diversification through an integrated stock market
at the global level, which would therefore reduce economic development and welfare. The
allocation of resources is improved and economic growth is accelerated by risk diversification.
Risk diversification can be quantified using the multifactor international arbitrage pricing model,
which is used to measure stock market integration, or by comparing the size of transactional
equity to the size of the economy using the total value of shares traded on the stock market to
GDP, also known as the stock market total value trading ratio (STR).
Investors assess a company’s financial soundness using quantitative analysis. While some
investors prefer to use only one research technique to assess long-term investments, it is best to
employ a mix of fundamental, technical, and quantitative analysis. The advent of the computer
era led to the development of quantitative analysis, which made it simpler than ever before to
evaluate vast volumes of data quickly. Quantitative trading analysts (quants) recognize trade
patterns, create models to evaluate those patterns, and then utilize the data to forecast the price
and direction of assets. Quants use the data to set up automatic trades of securities after the
models have been created and the information has been acquired. Qualitative analysis, which
looks at things like a company’s structure, the composition of its management team, and its
strengths and shortcomings, is different from quantitative analysis.
A specialized trader known as a “quant” uses quantitative and mathematical approaches to
assess financial products or markets. They can assess risks and locate trading opportunities in
this manner. To find trading opportunities and to buy and sell stocks, they employ mathematical
models. The profession has become quite competitive due to the surge of applicants from
academia, software development, and engineering. Quantitative analysis, as opposed to
qualitative analysis, enables a reduced risk procedure through its dispassionate, objective,
numbers-based approach to determining whether or not a financial asset on the stock market is
useful for investors. Investors can use quantitative analysis to help them make investment
decisions from anywhere in the world and for less money than they would pay an expensive
analyst team. Additionally, investors can manage their portfolios in real time, make decisions,
and save time, money, and effort by using the tools while delegating the labor-intensive tasks to
the algorithms.
1. Statistical Tests
I tested for stationarity using the Augmented Dickey-Fuller unit root test to prevent the biased results
caused by the probable existence of unit roots in our variables. The Johansen-Juselius
cointegration approach was used to assess the long-run equilibrium between variables and
determine whether the time series under examination “have a similar stochastic model.” In
order to determine whether the independent variables are strongly correlated with one another, we also performed the
multicollinearity test. Using the min-max approach of linear scaling, parameters were
2. Regression
Human capital, DEPTH (the size of the financial system in relation to the size of the
economy), and investments were used as the control variables. The net secondary
school enrolment ratio will be used to gauge human capital. Investments will be assessed as
real GDP-related investments, while DEPTH is a measure of liquid liabilities plus demand and
interest-bearing liabilities. With growth as the dependent variable and liquidity, size, risk
diversification, openness, and volatility as the independent variables, we conducted a
multiple regression using ordinary least squares.
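A multiple regression by ordinary least squares, preceded by min-max scaling, can be sketched as follows. This is an illustration only: the function names are hypothetical, NumPy is an assumed tool, and no data from the study is reproduced here. In the study's setting each column of the predictor matrix would hold one independent or control variable and the response would be economic growth.

```python
import numpy as np


def min_max_scale(column):
    """Linearly rescale a column to the range [0, 1]."""
    column = np.asarray(column, dtype=float)
    return (column - column.min()) / (column.max() - column.min())


def ols_fit(predictors, response):
    """Ordinary least squares; returns the intercept followed by one slope per column."""
    predictors = np.asarray(predictors, dtype=float)
    response = np.asarray(response, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(predictors)), predictors])
    coefficients, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coefficients
```

For example, `ols_fit(X, y)` with a 20 × 2 array `X` returns three numbers: the intercept and the two slopes.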
3. Visualization of data
Individual graphs will be shown in the paragraph that follows to emphasize the data utilized
in the regression. For Germany, France, Luxembourg, and the United States of America,
graphs show annual percentage changes in the size, risk diversification, and openness of the
national stock market as well as economic growth. Economic development and stock market
openness over the studied period followed a similar pattern in all of the countries. The size
of the stock market and economic expansion also exhibit a similar trend; however, the
declines had a bigger impact on economic growth than the trend’s upward movement. Even
more intriguing is the one-year lag between the annual variations in
economic expansion and the openness of the stock market.
4. Quants construct their equations using a number of data sources.
Investors can choose whether to invest in a specific asset based on historical investment data,
stock market information, ratios, and cash flow valuations. The trading algorithms and
computer models make unbiased inferences from this data. Since everything is automated,
there is less chance of making rash decisions based on emotional upheaval. Stable algorithms
continuously evaluate each potential investment using the same data sets, such as the price-to-earnings ratio or discounted cash flow valuations. There is significantly less chance of
making hasty or foolish decisions because patterns and numbers are the only determining
Momentum Techniques:
It is important to note that momentum can also exist within a day, even if we categorize it with
a broader time period than a day. For lengthier time frames as well as during the day, traders can
find momentum. The strategies that we examine in the section below have long-term
momentum. A momentum strategy is focused on spotting and adhering to a market price trend.
It is predicated on the idea that an asset’s price will move steadily in one direction until the
strength of the price trend wanes. The trade volume and pace of price change are used to
calculate the momentum. Time-series momentum and cross-sectional momentum are two
different types of momentum.
Time-Series Momentum
Time series momentum denotes a correlation between past and present returns. Researchers
use a statistical method to calculate the correlation coefficient of the returns in order to create
a time-series momentum strategy, where the null hypothesis denotes no correlation between
returns. The correlation coefficient of returns might fluctuate between lags, and occasionally the
strongest correlation is found between returns at various lags.
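The lagged correlation of returns described above can be sketched with NumPy. The function names are hypothetical and the inputs are assumed to be evenly spaced prices or returns; testing the null hypothesis of no correlation would need an additional significance test that is not shown here.

```python
import numpy as np


def simple_returns(prices):
    """Period-over-period simple returns from a price series."""
    prices = np.asarray(prices, dtype=float)
    return prices[1:] / prices[:-1] - 1.0


def lagged_return_correlation(returns, lag):
    """Correlation between returns and the returns `lag` periods later (lag >= 1)."""
    returns = np.asarray(returns, dtype=float)
    return float(np.corrcoef(returns[:-lag], returns[lag:])[0, 1])
```

Scanning `lag` over a range of values and comparing the coefficients is one simple way to see at which lag the relationship is strongest.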
Cross-Sectional Momentum
Cross-sectional momentum is based on how two price series compare to one another when it
comes to performance, with one price series outperforming the other. In this kind of strategy,
the underlying premise is that if one price series outperforms another in the present, it
will probably continue to do so in the future.
Utilizing Momentum Strategies
Technical indicators and breakouts can serve as the foundation for momentum strategies. If the
price exceeds the upper Bollinger band, the N-day Moving Average, the Exponential Moving
Average, or a new N-day high, for instance, it may be reasonable to establish an entry signal to
We may deduce from the qualitative approach that the stock market could promote growth in a
very complicated manner, built on a web of economic levers and systems. The procedure is two-way
and built on a multivalent logic, making it capable of more intricate structural
transformations. The qualitative approach led to the following inferences: the elements are
arranged in three-field conglomerations; conglomerations have the ability to permute circularly
three letters at a time; and the three fundamental elements are built using the automorphisms
covered in this work. The qualitative model is equally viable; however, it faces competition from
a number of variables.
While endogenous growth models develop their relationship through determination functions,
the relationship within the hexagonal fractalization system is performed through feedback loops
and commutative diagrams, allowing for self-stimulation and self-inhibition of the system as well
as the formation of accumulation zones. Complex interactions, such as those of interrelationship
between various systems, develop inside the fractalized system (initially formed in a hypercubic
dimension). This makes it possible to monitor the impacts of changing one parameter, even if
those effects take place in a different system. The automation team is enduring as it is preserved
and the cascade was self-initiated and self-sustaining. A relational structure is both sustainable
and unsustainable, and while building the model, it produces economic growth in a qualitative
Quants don’t visit companies, meet the management teams, or examine the items the companies
sell in order to identify a competitive edge, in contrast to typical qualitative investment analysts.
They frequently are unaware of or uninterested in the qualitative characteristics of the
businesses they invest in or the goods or services these businesses offer. Instead, they just use
numbers to determine which investments to make.
In qualitative analysis, judgments are made based on “soft” or immeasurable data. Qualitative
analysis deals with elusive and imprecise data that can be challenging to gather and quantify.
Since intangibles cannot be quantified by numerical values, machines find it difficult to
perform qualitative analysis. The foundation of qualitative analysis is an understanding of people
and organizational cultures. Qualitative analysis is aided by knowing a company’s competitive
advantage and viewing it through the eyes of its customers.
1. Corporate governance:
Compared to the more straightforward subject of management integrity, corporate governance
has a much wider definition. Corporate governance essentially assesses whether the company’s
senior management is acting in the best interests of the shareholders, particularly the minority
shareholders. A lot depends on factors like disclosure, transparency, management ideals, and
consistency. Companies have recently suffered severe losses as a result of poor corporate
governance. Look at a few examples. After revelations of instances of secretive intra-group
transactions. After the auditors raised concerns and ultimately left under dubious circumstances,
2. Resilience of the business model:
Although it is difficult to put a number on this, we can use an example to demonstrate. Let’s
consider what happened to Nokia as an illustration. People regarded the Nokia Symbian
operating system as lacking when Apple released its smartphone in 2007. The Nokia business
strategy was built on the premise that phones would remain voice-only devices and that data
would continue to favor PCs and laptops. That was a serious error of judgment. Consumer
preference quickly switched over the following few years to data use on mobile devices, where
operating systems like Apple iOS and Google Android went on to gain dominance. Simply put,
Nokia’s business model was insufficiently sound. Consider whether the company’s business
model is reliable enough. Consider the Indian pharmaceutical sector. For 25 years, these
pharmaceutical corporations paid little attention to intellectual property (IP) and concentrated
on using reverse engineering to make generic drugs at a considerably lower price. The majority
of pharmaceutical businesses began to lose value and profits as low-cost nation competition
increased and US consumers became more demanding. Simply put, the business strategy wasn’t
strong enough.
Fundamental Analysis
Qualitative Factors:
Quantitative Factors:
~ Business Model
~ Industry Growth
~ Competitive Advantage
~ Competition
~ Management
~ Customers
~ Corporate Governance
~ Financial statement
Quantitative Data
YES: Can be computed
NO: Cannot be computed
Boston Housing Data
Aniketh Reddy Jakkidi
➢ The real estate market has been expanding over the last 30 years.
New housing tracts are planned to be constructed in Boston.
Numerous factors, such as residential land, accessibility to highways
and employment centres, and number of rooms, affect the price of
houses in the housing tracts.
➢ The goal is to predict the median house price in new tracts based
on information such as crime rate, pollution, and number of
Data Source
➢ The Boston Housing dataset is taken from Kaggle. The dataset contains 506 observations and 14 variables.
➢ The Boston Housing Dataset is derived from information collected by the U.S. Census Service concerning
housing in the area of Boston. The following describes the dataset columns:
CRIM – per capita crime rate by town
ZN – proportion of residential land zoned for lots over 25,000 sq. ft.
INDUS – proportion of non-retail business acres per town.
CHAS – Charles River dummy variable (1 if tract bounds river; 0 otherwise)
NOX – nitric oxides concentration (parts per 10 million)
RM – average number of rooms per dwelling
AGE – proportion of owner-occupied units built prior to 1940
DIS – weighted distances to five Boston employment centres
RAD – index of accessibility to radial highways
TAX – full-value property-tax rate per $10,000
PTRATIO – pupil-teacher ratio by town
B – 1000(Bk – 0.63)^2 where Bk is the proportion of blacks by town
LSTAT – % lower status of the population
MEDV – Median value of owner-occupied homes in $1000’s
➢ Data Source Link: Boston housing dataset | Kaggle
The distribution of house
price is right skewed,
with most housing
tracts having a median value
of $18,000 to $24,000.
Box Plot
Housing tracts near the
boundaries of the Charles River
have a higher median value
compared to housing tracts
far from the Charles River.
Box Plot
Housing tracts near
highways have higher prices
compared to housing tracts
far from highways. But we can
see outliers due to the impact of other
factors on the median
housing price of tracts
Box Plot
Higher category housing
tracts have a lower pupil-teacher ratio, lower
pollution, less
industrialization and fewer
people with lower
Scatter Plot
As the percentage of the
population with lower status
increases, the median value
of housing tracts
Scatter Plot
Overall, pollution and
the median value of
housing tracts have a
negative correlation,
but the amount is
not significant
Descriptive Statistics
Descriptive Statistics like
mean, median, mode,
range, skewness, IQR are
shown. MEDV has mean
value of 22.533 and
skewness of 5.22
Balance of Data set
This data set has
83.4% of lower
median value
housing tracts and
only 16.6% of
higher median
value housing
Based on Grubbs' test at
95% confidence level
there are no outliers of
MEDV value in the Boston
housing data set
Correlation and Cross Tabulation
MEDV value has highest
correlation with RM (average
number of rooms per
dwelling). As RM increases
MEDV increases.
MEDV has a negative
correlation with LSTAT,
➢ Median housing tract prices depend mostly on the average number
of rooms per dwelling, pupil-teacher ratio by town, full-value
property-tax rate and per capita crime rate by town.
➢ From all visualizations, price has a positive relationship with
average number of rooms per dwelling and a strong negative
relationship with LSTAT (% lower status of the population).
➢ The amount of variation in the median value of housing tracts due to
each predictor can be further explored using data modelling.
