Journal of Applied Sport Management
Volume 6
Issue 2
Article 11
1-1-2014
The Introduction and Application of Sports Analytics in
Professional Sport Organizations
Michael Mondello
Christopher Kamke
Follow this and additional works at: https://trace.tennessee.edu/jasm
Recommended Citation
Mondello, Michael and Kamke, Christopher (2014) “The Introduction and Application of Sports Analytics
in Professional Sport Organizations,” Journal of Applied Sport Management: Vol. 6 : Iss. 2.
Available at: https://trace.tennessee.edu/jasm/vol6/iss2/11
This Article is brought to you for free and open access by Volunteer, Open Access, Library Journals (VOL Journals),
published in partnership with The University of Tennessee (UT) University Libraries. This article has been accepted
for inclusion in Journal of Applied Sport Management by an authorized editor. For more information, please visit
https://trace.tennessee.edu/jasm.
Journal of Applied Sport Management
Vol. 6, No. 2, Summer 2014
The Introduction and Application of Sports
Analytics in Professional Sport Organizations
A Case Study of the Tampa Bay Lightning
Michael Mondello
Christopher Kamke
Abstract
While professional sports organizations continue to seek techniques to augment
their on-field success, the field of sports analytics has concurrently become increasingly competitive and complex. However, despite these recent developments
and availability of data, much of the information shared between organizations,
academicians, and practitioners is often limited and anecdotal. In this paper, we
sought to provide a brief overview of analytics and subsequently share several best
practice examples of how one National Hockey League (NHL) franchise, the Tampa Bay Lightning, integrates analytical techniques into several core business entities. An organizational emphasis on Customer Relationship Management (CRM)
provides management with valuable data about their customers’ current purchasing habits and potentially may predict future purchases. In addition, analytical
techniques have assisted the organization in developing and implementing both
dynamic and variable ticket pricing strategies to procure additional revenues. We
conclude the paper with suggestions for future research and applications.
Keywords: sports analytics; technology; innovation
Michael Mondello is the Associate Director of the Sport and Entertainment Program at the
University of South Florida.
Christopher Kamke is the Director of Business Strategy and Analytics for the Tampa Bay
Lightning.
Please send correspondence regarding this article to Michael Mondello, mmondello@usf.edu
Sports Analytics
Introduction
Although analytics is often associated with player personnel decisions and
roster movements, professional sport teams are now relying on analytical techniques to confirm or predict answers to questions related to ticket pricing strategies, sponsorship return on investment, and customer relationship management.
In today’s complex business environment, analytics has become an important tool
for organizations. As noted by Phillipps (2013), in a research study involving more than 100 surveys and in-depth interviews with senior management
representing 35 companies worldwide, 96% of the respondents indicated analytics
will become more important in operating their organizations in the next three
years. This suggests organizations should continue to develop innovative strategies to successfully implement analytic techniques within their business model or
potentially risk losing valuable market share.
While data can serve as an incredibly valuable resource, the utility of data is
largely dependent on how well it is analyzed and more importantly communicated
to a broader audience. An underlying tension between subjective and objective
data extends beyond sport organizations and can be found in other academic disciplines including behavioral economics, management, and finance (Wolfe et al.,
2007). Today, analytics are utilized in various industries with multiple applications, and successfully using analytical techniques appears to be a combination of
science and art. As noted by Rivera (2012), effective analysts have the ability to
combine a mixture of art and science with intuition to help drive decision making. For example, Internet giants Google and Amazon gather significant data from
online web searches and retail consumption to tailor specific offerings to current
and potential customers. This is just one example of how analytics has added
value—summarizing data, interpreting the findings, and subsequently utilizing
these findings to decipher patterns and help forecast future tendencies. While both college athletic departments and professional sports organizations have
implemented analytical techniques to help with ticket pricing, customer service,
game strategy, and player personnel decisions, only in the last couple of years has
this information been disseminated publicly.
The purpose of this paper is to showcase how one National Hockey League
(NHL) team, the Tampa Bay Lightning, has successfully incorporated analytics
within several organizational departments. In addition, we review analytics in
both sport and non-sport contexts and how analytics has assisted the Lightning
front office in business decisions related to customer relationship management
and ticket pricing. We conclude the paper with a few suggestions of how analytics
can be incorporated in the future.
Literature Review
Analytics
According to Davenport and Harris (2007), analytics can be classified as descriptive, predictive, or prescriptive. Descriptive analytics involves gathering
and organizing data and then detailing the qualities of the data. While this
analysis has merit, descriptive analytics provides no information about why something happened or what may occur in the future. Next, predictive analytics incorporates previous data to assist with forecasting future trends. While predictive
analytics are useful for predicting trends, one cannot assume any explicit cause/
effect relationship. Therefore, prescriptive analytics including methods such as
optimization and experimental design provide an additional layer of analysis by
offering suggestions for implementing solutions to problems.
Davenport and Kim (2013) identified three major stages of analytical thinking: framing the problem, solving the problem, and communicating and acting
on the results. Furthermore, each stage comprises various steps necessary
to achieve a desired outcome. The initial stage, framing the problem, involves two
steps: problem recognition and reviewing the previous findings. This stage is intuitively important because if the problem is framed incorrectly, all subsequent
analysis becomes significantly less valuable. Framing the problem often involves
hypothesis development within a given set of constraints. A review of previous
findings is analogous to the literature review section of an academic paper. Specifically, scholars often develop new research questions predicated on previous
related research.
Solving the problem represents stage two and encompasses three distinct
steps: modeling, data collection, and data analysis (Davenport & Kim, 2013). In
this stage, the researcher identifies the variables to include in the model, how they
will collect the data, and how the data will be analyzed. Finally, stage three involves how the results should be presented and the subsequent actions needed
to implement the recommended course of action. Furthermore, the third stage is
just as important as the first two stages but invariably is not given the appropriate
attention to detail. Results that are ineffectively communicated to their respective
audiences are limited in value.
To illustrate how important analytics can be to the success or failure of a
business, we provide an example from the insurance industry. As a major part of
its business model, Assurant Solutions sells credit insurance and debt protection
and competes for market share in the highly competitive credit insurance business,
where customer retention remains a significant industry problem. Although the
company’s 16% retention rate was consistent with industry standards, Assurant
nevertheless experienced five out of six customers dropping their coverage and essentially ignoring their other products (Hopkins & Brokaw, 2010). Although they
were analyzing the key to keeping customers loyal, they did so with the wrong
approach. Consequently, Assurant’s leadership decided to implement a new analytical strategy.
The company invited professionals from various outside industries to offer
suggestions on how they could improve their customer retention rate. With additional analysis, Assurant recognized some of their customer service representatives were superior at dealing with certain types of clients. By matching the skill set of the employee with specific customer issues, the company saw its retention rate
nearly triple. Although the applied science and technology could not explain
why something happened, by examining past experience, Assurant accurately
predicted when a successful call would occur. Consequently, Assurant was able
to receive significant guidance from outside constituents including actuaries and
mathematicians to significantly improve their customer retention rate.
Sports Analytics
In 2003, best-selling author Michael Lewis published Moneyball, detailing
how a small-market Major League Baseball (MLB) team, the Oakland A’s, utilized
various statistical techniques to assemble a competitive team without the luxury
of a high payroll relative to other Major League Baseball teams. In essence, General Manager Billy Beane recognized the value of utilizing player performance
data to drive organizational decisions involving player valuations and strategies on
the field. Although sport business scholars and practitioners often identify Lewis’s
work as one of the first analytical approaches to uncovering how statistical applications could be integrated into managerial applications, baseball historian Bill
James compiled baseball statistics into annual baseball abstracts during the 1970s
(Hanchett, 2012). Baseball provided a laboratory for statistical analysis given the
series of quantifiable individual events. Other historians suggested integration of
statistical analysis happened years earlier when legendary baseball pioneer Branch
Rickey commissioned statistical analysis in the 1940s when he served as general
manager of the Brooklyn Dodgers (Dizikes, 2013).
While the utility of analyzing data to increase new business development has
been successfully integrated within the professional sports industry, franchises
continue to explore different analytical data techniques to drive decision making.
Consequently, sport organizations have recognized the added value new technologies present. Leaders in sport business and sales are becoming increasingly savvy
with analytics. Data-driven analysis has become a competitive advantage in driving business strategies and challenging the industry overall to invest resources or
risk the possibility of falling behind the competition.
One area of sport business research that continues to remain elusive centers
on how to accurately quantify the expected return on investment (ROI) involving
corporate sponsorships. For example, both national and local brands may elect
to engage as a corporate sponsor involving arena/dashboard signage, activation,
program advertisements, or even naming rights of the facility. Considering some
companies allocate 25% or more of their sponsorship budgets toward sporting
events, understanding how the return on this invested capital impacts the overall
bottom line of the organization is significant (Meenaghan & O’Sullivan, 2013).
Titlebaum, Lawrence, Moberg, and Ramos (2013) conducted qualitative research with 15 decision-makers at Fortune 100 firms to assess how they used premium seating as part of their overall marketing efforts. When asked specifically
if their firm implemented a type of ROI analysis, most organizations surprisingly
indicated they did not. Moreover, several respondents suggested the metric of
interest to them was not necessarily ROI, but ROO (return on objectives). Furthermore, for firms using ROI to measure success, the process tended to be more
informal in nature. One respondent posited that because the sales cycle typically
extended to 12 months, there were underlying constraints in measuring sponsorship ROI.
Meenaghan and O’Sullivan (2013) suggested additional research is needed to
determine how businesses measure sponsorship effectiveness and, when
they attempt to do so, whether they utilize appropriate metrics. In a recent U.S.
sponsorship survey they cited, 20% of the respondents were unable to determine if their ROI
changed in any direction from sponsorship activities. In addition, within the same
study, over a third of the participants (34%) confessed to not measuring sponsorship returns. Based on these deficiencies, they argued a “measurement deficit” exists within sport sponsorship. Given this call to action, Meenaghan and
O’Sullivan examined two popular sponsorship metrics: media exposure and sponsorship awareness. Their analysis of several examples in which the same
sponsorship program was evaluated quite differently suggested there were notable
credibility issues related to sponsorship effectiveness. Yet, questions remain as to
the best metric to measure these benefits. Several scholars have examined this issue with varying degrees of success. As metrics capable of successfully assessing
sponsorship effectiveness continue to be developed, both sponsors and sport organizations will seek greater clarity on how to effectively measure this relationship.
The Orlando Magic of the National Basketball Association (NBA) is recognized as an industry leader for utilizing analytics within several departments.
Throughout the 2012–13 season, the Magic implemented a unique fan promotion
with restaurant sponsor Tijuana Flats. During any home game, ticket holders were
eligible to win a free taco when the Magic made at least 10 three-point field goals.
While this type of promotion at face value does not appear to be distinctively
different than other types of restaurant sponsorships, there was one key differentiator. After Magic fans redeemed their game ticket in exchange for the taco, Tijuana Flats returned the tickets to the organization for further
analysis, allowing the team to gather additional consumer information (Simmons,
2013).
Once the redeemed tickets were returned, the Magic organization followed
up with consumers and subsequently collected valuable consumer information
to help quantify the elusive ROI. Among their findings: 26% of the consumers
had never visited a Tijuana Flats restaurant, 66% would not have visited without
the promotion, and 85% indicated they would visit again. The Magic could now
tangibly provide the sponsor with several ROI metrics linked to the promotion
and identify strategies to increase awareness and ultimately revenues (Simmons,
2013).
Jensen and Cobbs (2014) noted the absence of established metrics and pricing data for nontraditional marketing techniques complicates assessing ROI. They
analyzed the ROI for sponsorship in Formula One racing by examining sponsorship prices and the exposure generated from television. Collectively, their results
suggest a link between team performance and brand exposure. Specifically, sponsor ROI was more likely to be positive as team performance increased.
In May 2013, social media giant Twitter unveiled a new in-stream advertising program aptly named Twitter Amplify. This platform allows content owners
the opportunity to link video clips and connect them with sponsors. Collectively,
the revenue is divided between the content owner, distributor, and Twitter. The
most important contribution of Twitter Amplify could be developing a revenue
strategy for sport organizations through social media (Fisher, 2013). While sport
consumers have turned to social media as a way to keep engaged with their favorite teams and players, a similar issue to traditional sponsorship remains: how to
assess advertising effectiveness and, more specifically, quantify the ROI. Although
not without criticism, other media, including television and radio, have
historically relied on either Nielsen or Arbitron ratings as a primary way to assess
and establish fair market advertising rates. As social media platforms continue to grow, further development of social media analytics is likely to
transpire.
Applications
With regard to data analytics, the Tampa Bay Lightning’s main focus is centered on making data-driven decisions. This focus has been a significant contributor to the overall brand transformation over the past three years. Specifically,
the organizational transformation fueled by a strong focus on data analytics and
technology has included a new team logo, a $60 million privately financed arena
renovation, a renewed focus on customer service, and a commitment to giving
back to the Tampa Bay community through philanthropic and volunteer efforts.
To support the strategic planning behind a successful transformation, executive leadership understood the significance of community and team support. In
addition, to incorporate outside viewpoints that would assist in developing a more
complete plan, the Lightning organization formulated several focus groups. These
focus groups were designed to glean additional information and understanding
on what the team, brand, and hockey meant to Lightning fans, the Tampa community, current and past players, coaches, and other stakeholders. Through these
focus groups, important elements to the targeted constituents surfaced and were
subsequently integrated into the tangible and intangible transformation. Similarly,
the Lightning used data, analytics, and technology to advance two significant areas related to the success of the organization: the development and strength of
customer relationship management (CRM) and the identification of new revenue
opportunities. Each of these segments will be examined in further detail.
Technology
This emphasis on data has encouraged developments in CRM strategies.
Technological growth within various digital and mobile platforms has made data collection easier and more seamless than in previous periods. Concurrently,
sport consumers have higher expectations for smart, relevant marketing. With
increased competition among teams, rival leagues, and other entertainment options, attracting and retaining consumers is especially important for the long-term
viability of professional sports teams. Consequently, understanding who the consumer is and establishing a deeper relationship based on their individual
preferences is important.
As the sport industry develops over the next few years, teams must offer individualized marketing strategies or run the risk of losing valuable clientele. Therefore, for sport business to succeed in one-to-one marketing, organizations must
invest in CRM solutions and strategies based on accurate data. Consumers, especially sport fans, possess unique attributes and consequently should be targeted
and marketed to reflect these characteristics. A strong CRM strategy can help a
team acquire a complete and authentic perspective of each consumer and ultimately understand how to effectively communicate with them.
The Lightning franchise utilizes several integrated software solutions and
third-party vendors to manage the collection, organization, and storage of consumer demographic and behavior data into a centralized Microsoft CRM platform. Daily data feeds and standard reports inform the marketing, sales, and service teams about new consumers, consumer attributes, and segmented consumer
demographics. One example is qualifying and prioritizing prospects for the sales
team. Each night, all new ticket purchase customers have Acxiom demographic
data appended to their customer record. Acxiom is an enterprise data, analytics,
and software-service company uniquely building trust, experience, and scale to
fuel data-driven results. The customer record is then put through two lead scoring
models. One model predicts how likely each customer is to eventually purchase a
Lightning ticket plan, and the other predicts how much the customer is expected to spend if the customer buys any kind
of ticket. Each model provides a rating on a 1 to 5 scale, indicating to the sales team which customers should be given
higher priority in their sales campaigns. Another method, used less frequently,
determines which current season ticket members are most likely to let their
season ticket membership lapse. The lapse ratings are used by the season ticket member service team to prioritize their time and resources when addressing customer
issues. Collectively, by consistently reviewing data reports, management can encourage their respective teams to make data-driven decisions on consumer interactions. Ultimately, this analysis can positively influence the consumer experience
and augment the Lightning’s financial position.
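The two lead scoring models described above can be illustrated with a minimal sketch. It is not the Lightning's actual implementation: the predicted probability and predicted spend are assumed to come from some upstream model, and the cut points, field names, and the rule for combining the two ratings are hypothetical choices made only for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass

# Hypothetical cut points; the paper states only that each model rates customers 1 to 5.
PLAN_PROB_CUTS = [0.05, 0.15, 0.30, 0.50]    # predicted probability of buying a ticket plan
SPEND_CUTS = [100.0, 250.0, 500.0, 1000.0]   # predicted spend in dollars


def to_rating(value: float, cuts: list[float]) -> int:
    """Map a model output to a 1-5 rating using ascending cut points."""
    return bisect_right(cuts, value) + 1


@dataclass
class Lead:
    customer_id: str
    plan_probability: float  # assumed output of the plan-purchase model
    expected_spend: float    # assumed output of the spend model

    @property
    def plan_rating(self) -> int:
        return to_rating(self.plan_probability, PLAN_PROB_CUTS)

    @property
    def spend_rating(self) -> int:
        return to_rating(self.expected_spend, SPEND_CUTS)


def prioritize(leads: list[Lead]) -> list[Lead]:
    """Order leads by combined rating, highest first (an assumed combination rule)."""
    return sorted(
        leads,
        key=lambda lead: (lead.plan_rating + lead.spend_rating, lead.plan_rating),
        reverse=True,
    )


# Made-up customers for illustration only.
leads = [
    Lead("A", 0.02, 80.0),
    Lead("B", 0.40, 600.0),
    Lead("C", 0.55, 1200.0),
]
ranked = [lead.customer_id for lead in prioritize(leads)]
```

Keeping the two ratings separate, rather than storing only a combined score, would let a sales team distinguish a likely plan buyer with modest spend from an unlikely buyer with high potential spend.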
Data retention and integration have helped the Lightning organization increase sales transactions to unprecedented levels. For example, as part of a loyalty initiative, the Lightning offered free customized team jerseys to season ticket
members. This type of program rewards both the team and the fans. For the organization, fans wearing jerseys to the games helps foster a home-team atmosphere
within the arena and on television. Other guests attending games or watching on
television see these jerseys and may subsequently inquire about becoming season
ticket holders. As an incentive for season ticket holders to wear their jerseys to
games, they received substantial discounts on concessions and merchandise. This
initiative was made possible due to a radio frequency identification (RFID) chip
sewn into the jersey sleeve capturing all in-arena transactions made by the season ticket member. Each season ticket member is made aware this information
is being collected to help enhance the overall fan experience. This information
provided new insight on individual consumer purchasing habits and preferences.
In addition to providing discounts to their season ticket members, understanding consumer preferences empowers the Lightning to create both targeted
concession and merchandise offers at the individual consumer level based on
previous transactions. This becomes extremely valuable when marketing specific
products. For example, one application of this data analysis would be a sponsorship activation program rewarding fans for purchasing sponsors’ products during Lightning games. Since the Lightning franchise features an in-arena Outback
Steakhouse location, the team can send a notification to all season ticket members
who purchased from Outback Steakhouse during the game, thanking them and
providing a coupon good for a free appetizer at a local Outback Steakhouse restaurant. The added ability to accurately market at the individual consumer level
creates opportunity to provide added value to targeted customers.
The Lightning management uses technology and analytics to identify revenue
opportunities and take a more strategic and methodical approach when addressing business issues. Consequently, the Lightning organization approaches business
problems in carefully implemented stages. Initially, the team defines the problem.
After the problem is clearly defined, the organization develops the action plan,
identifying key measurements for success. Finally, the team executes the plan by
collecting and analyzing data to derive and validate insights.
Ticketing Analytics
Adopting this new approach of using data to address problems and discover
innovative opportunities requires a willingness to change established practices.
One area the Lightning organization has relied on extensive analytics involves
ticket pricing. For example, based on existing ticket sales, the Lightning organization theorized ticket demand for seat inventory in variably priced sections of the
arena was not uniformly proportionate. Consequently, if this assumption was accurate, ticket demand based on current pricing and inventory structure resulted
in either lost or undervalued sales. To test this hypothesis, the Lightning analyst
examined ticket demand by individual arena seating section. The
analysis confirmed the hypothesis, and by the following season, the organization adjusted prices and altered the boundaries confining seat locations to price
sections. Accordingly, by increasing supply to match market demand, the organization experienced an influx of new business into these adjusted arena sections.
Inventory management represents another level of focus where the Lightning
used data analytics to drive ticket revenue. Group tickets represent an important
aspect of ticket sales and help many teams secure both a large volume of revenue
and distributed tickets. However, because group tickets are typically discounted,
seats sold at a group ticket rate that otherwise would sell at the standard ticket rate
represent lost revenue. To help minimize lost revenue, the Lightning franchise has
established both game-by-game and section-by-section inventory levels. These
inventory levels were derived from historical ticket sales and current ticket sales
trends. Both of these practices helped price future season tickets and set ticket inventory levels at each price point. This new practice facilitated maximizing overall
ticket sales revenue.
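The reasoning behind section-level group inventory limits can be expressed as simple arithmetic. The sketch below is an illustration under stated assumptions, not the Lightning's method: the function names, the rule of capping group seats at capacity minus expected standard-rate sales, and all numbers are hypothetical.

```python
def group_ticket_cap(section_capacity: int, expected_full_price_sales: int) -> int:
    """Seats that can go to groups without displacing expected standard-rate buyers."""
    return max(0, section_capacity - expected_full_price_sales)


def displaced_revenue(group_seats_sold: int, cap: int,
                      full_price: float, group_price: float) -> float:
    """Revenue given up on group seats sold beyond the cap."""
    over_cap = max(0, group_seats_sold - cap)
    return over_cap * (full_price - group_price)


# Hypothetical section for one game: 400 seats, 310 expected to sell at the standard rate.
cap = group_ticket_cap(400, 310)
lost = displaced_revenue(group_seats_sold=120, cap=cap, full_price=60.0, group_price=45.0)
```

In this made-up case the cap is 90 seats, so selling 120 group seats displaces 30 standard-rate sales at a 15 dollar discount each.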
The organization also uses analytics to estimate the demand for tickets on a
game-by-game basis. Statistical analysts incorporate regression modeling to forecast ticket sales and attendance demand on a game-by-game level. This modeling
identifies significant factors affecting sales and attendance. For example, through
modeling, the Lightning analyst identified the month of March as a significant
factor affecting ticket sales. A Lightning game played during the month of March
yielded “X” more tickets than a game against a comparable opponent played in
December. This practice, conducted monthly prior to the season, allows the Lightning to variably price tickets based on individual game demand and then target
promotions accordingly. Variable ticket pricing conducted prior to tickets going
on sale is a static solution for market demand. Because demand for sporting events
is known to be influenced by current events, the sport industry has adopted dynamic ticket pricing strategies similar to those of the airline industry. Dynamic pricing is
a valuable strategy to help sports teams address changes in market demand and
compete against the secondary ticket market businesses including ecommerce
websites StubHub and TicketsNow.
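The game-by-game regression described above can be sketched with an indicator variable for March and a control for opponent quality. This is a minimal illustration only: the paper does not report the model specification, so the two predictors, the opponent rating scale, and the data are all assumptions. The data are generated from an exact line so the recovered coefficients are easy to check, and the coefficient on the March indicator plays the role of the unreported “X”.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(rows: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up games: (is_march, opponent_rating). Ticket counts follow an exact
# line (no noise) purely so the fitted coefficients can be verified by eye.
games = [(0, 1), (0, 3), (1, 2), (1, 5), (0, 4), (1, 1)]
tickets = [14000 + 1500 * march + 800 * opp for march, opp in games]
design = [[1.0, float(march), float(opp)] for march, opp in games]

intercept, march_effect, opponent_effect = ols(design, tickets)
```

Because opponent quality is held constant in the model, the March coefficient can be read as the additional tickets expected for a March game against a comparable opponent. Real sales data would include noise and more factors, and a statistics library would normally replace the hand-written solver.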
Through daily dynamic pricing, measurements on several factors such as online search frequency or secondary ticket market ticket transactions are analyzed
to measure variations in team interest. With these measurements, management
can decide whether demand justifies an increase in ticket price for specific
sections or individual seats. Conversely, the Lightning can also lower ticket prices
to avoid being drastically undercut by the secondary ticket market when ticket
demand decreases. By applying demand modeling, the Lightning use both proactive and reactive analytics to drive both primary and ancillary revenues as well as
support community and consumer initiatives.
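A daily dynamic pricing rule of the kind described above might look like the following sketch. It is a hypothetical rule, not the Lightning's pricing logic: the demand index, the daily step limit, and the tolerance relative to the secondary market are all assumed parameters introduced for illustration.

```python
def dynamic_price(base_price: float, demand_index: float, secondary_price: float,
                  max_step: float = 0.10, undercut_tolerance: float = 0.15) -> float:
    """Return an adjusted price for one section on one day.

    demand_index is an assumed ratio of current to typical interest (1.0 = normal),
    built for example from search frequency or secondary-market transactions.
    """
    # Move the price with demand, but by no more than max_step in one day.
    change = max(-max_step, min(max_step, demand_index - 1.0))
    price = base_price * (1.0 + change)
    # Avoid being drastically undercut: cap the primary price relative to
    # the prevailing secondary-market price.
    ceiling = secondary_price * (1.0 + undercut_tolerance)
    return round(min(price, ceiling), 2)


rising = dynamic_price(50.0, demand_index=1.25, secondary_price=70.0)
falling = dynamic_price(50.0, demand_index=0.80, secondary_price=40.0)
undercut = dynamic_price(50.0, demand_index=1.00, secondary_price=40.0)
```

Limiting the daily step keeps prices from swinging sharply on a single day's signal, while the ceiling reflects the reactive side of the strategy: when the secondary market falls well below the primary price, the primary price follows it down.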
Sport teams today face almost overwhelming amounts of data, complex and
disparate systems, and a multitude of consumer behaviors. Data and digital technologies provide an unprecedented number of opportunities for sport teams to
understand consumer needs, preferences, and behaviors. In this environment of
surplus data, effectively gathering actionable information is a key factor in operating a successful business. Actionable data allows teams to implement data-driven
strategies across the organization.
Future Directions
The future direction of sports analytics and technology appears to be expanding in both scope and breadth. To fully capitalize on this trend, we recommend that sport organizations invest in integrated and automated approaches
that support large-scale data processes and create better and more relevant interactions with consumers. This would also include a call to hire trained statistical
analysts to work closely with senior management.
Investments in technology by sport teams continue to grow as evidenced by
the recent move in the National Basketball Association (NBA) to outfit each of
their 30 arenas with specialized cameras installed to help capture thousands of
data points within each game. Specifically, six motion cameras capable of filming
the game at 25 frames per second provide team analysts with unprecedented data.
Furthermore, the league has made this a mandatory policy and all of the available
data is openly shared with the public on NBA.com (Goldich, 2013).
From ticket operations to social media measurement solutions, teams continue to increase their analytic capabilities. While this growth has largely produced
positive results, it has also resulted in many teams possessing technology systems
with highly complex architecture loosely stitched together. As the complexity of
the architecture increases, data becomes increasingly difficult to fully leverage.
One primary goal of investing in technology is to provide stakeholders with data
and insights they can leverage to enhance organizational effectiveness.
Recently, the Lightning partnered with TIBCO, a company specializing in
data management and real time pricing, to utilize their “Spotfire” data visualization software to facilitate the next step in the organization’s integrated use of data.
Spotfire will permit the Lightning to integrate multiple data sources and automatically connect, source, and process data in real time to create business dashboards
and provide data insights across the organization. Previously, the Lightning have
faced time-consuming challenges by manually and repetitively merging data from
ticketing, database marketing, point of sale, and other sources. Moving forward,
the team aspires to share and integrate data and insights more efficiently.
Within the last decade, the sports industry has recognized that one key to future
business success is developing a greater understanding of, and ability to serve, its respective fans. Previous research has consistently demonstrated the passion and
loyalty sports fans have towards their team. While sport organizations have a general sense of their fans, only a select few, if any, can realistically proclaim they possess a true 360-degree understanding of their customers. By engaging fans more
effectively, sport organizations such as the Lightning believe they can acquire,
strengthen, and retain customer relationships. Data analytics, technological advances, and system integration will all be important factors as sport teams look to
expand their customer views to 360 degrees.
Although data and technology can introduce additional complexity and complications to an organization, the Lightning management views them as opportunities. The sport teams that address and master data through technological initiatives will be the teams positioned to lead sport business innovation in the
future. Finally, we strongly encourage sport business/management faculty to think creatively about strategies for incorporating projects into the
classroom that could benefit practitioners. As noted by Sutton (2012), assignments focusing on business analytics, marketing, and sales would likely be well
received by sport organizations looking to improve their operations. Furthermore,
when students observe firsthand how research can be a useful tool for identifying
and providing viable solutions to sport organizations, they may then continue this
practice upon entering the industry.
References
Davenport, T., & Harris, J. (2007). Competing on analytics. Boston, MA: Harvard
Business School Press.
Davenport, T., & Kim, J. (2013). Keeping up with the quants. Boston, MA: Harvard
Business School Press.
Dizikes, P. (2013). How numbers can reveal hidden truths about sports. Retrieved
from http://web.mit.edu/newsoffice/2013/how-numbers-can-reveals-hiddentruths-about-sports-0301.html
Fisher, E. (2013). Signs point toward more revenue. Retrieved from
http://www.sportsbusinessdaily.com/Journal/Issues/2013/06/03/In-Depth/Social-revenue.aspx?hl=fisher%20&sc=0
Goldich, M. (2013). Race for data feeds the sports analytics revolution. Retrieved
from http://www.huffingtonpost.com/mitch-goldich/sports-data-analyticsrevolution_b_4436794.html
Hanchett, D. (2012). Playing hardball with big data: How analytics is changing the
world of sports. Retrieved from: http://www.emc.com/campaigns/global/bigdata/human-face-of-big-data.htm
Hopkins, M., & Brokaw, L. (2010). Matchmaking with math: How analytics beat
intuition to win customers. Retrieved from: http://sloanreview.mit.edu/themagazine/2011-winter/52206/matchmaking with math
Jensen, J., & Cobbs, J. (2014). Analyzing return-on-investment in sponsorship:
Modeling brand exposure, price and ROI in Formula One Racing. Journal of
Advertising Research, Forthcoming. Available at SSRN: http://ssrn.com/abstract=2322589
Lewis, M. (2003). Moneyball: The art of winning an unfair game. New York: W. W.
Norton.
Meenaghan, T., & O’Sullivan, P. (2013). Metrics in sponsorship research: Is credibility an issue? Psychology & Marketing, 30, 408–416. doi: 10.1002/mar.20615
Phillipps, T. (2013). The analytics advantage: We’re just getting started. Retrieved
from https://www.deloitte.com/view/en_GX/global/services/deloitte-analytics/3dc8095c436fe310VgnVCM1000003256f70aRCRD.htm#.Uhyaxj-wV8E
Rivera, R. (2012). What makes analytics wizards so good? They do everything
backwards. Retrieved from
http://www.forbes.com/sites/sap/2012/10/05/what-makes-analytics-wizards-so-good-they-do-everything-backwards/
Simmons, M. (2013). Business analytics: Retention rate & sponsorship activation.
Retrieved from: http://sportsanalyticsblog.com/articles/business-analyticsretention-rate-sponsorship-activation–3
Sutton, W. (2012). Academia and the Sports Industry. In Gillentine, A., Baker, R.,
& Cuneen, J. (Eds.), Critical essays in sport management (pp. 115–124). Scottsdale, AZ: Holcomb Hathaway.
Titlebaum, P., Lawrence, H., Moberg, C., & Ramos, C. (2013). Fortune 100 companies: Insight into premium seating ownership. Sport Marketing Quarterly.
Fitness Information Technology Inc. 2013. Retrieved from http://www.highbeam.com/doc/1P3-3015051431.html
Wolfe, R., Babiak, K., Cameron, K., Quinn, R. E., Smart, D. L., Terborg, J. R., &
Wright, P. M. (2007). Moneyball: A business perspective. International Journal
of Sport Finance, 2(4), 249–262. Retrieved from
http://search.proquest.com/docview/229347066?accountid=14745
International Journal of Data Science and Analytics (2018) 5:213–222
https://doi.org/10.1007/s41060-017-0093-7
TRENDS OF DATA SCIENCE
Sports analytics and the big-data era
Elia Morgulev1,2 · Ofer H. Azar1 · Ronnie Lidor2
Received: 9 August 2017 / Accepted: 28 December 2017 / Published online: 9 January 2018
© Springer International Publishing AG, part of Springer Nature 2018
Abstract
The explosion of data, with large datasets that are available for analysis, has affected virtually every aspect of our lives. The
sports industry has not been immune to these developments. In this article, we provide examples of three types of data-driven
analyses that have been performed in the domain of sport: (a) field-level analysis focused on the behavior of athletes, coaches,
and referees; (b) analysis of management and policymakers’ decisions; and (c) analysis of the literature that uses sports data
to address various questions in the fields of economics and psychology.
Keywords Big data · Data analytics · Decision making · Sports · Psychology · Economics
1 Introduction
Sport is an important endeavor in the lives of many people. One reason is that many of them are engaged in sport
as a way of exercising and improving their health and lifestyle.
Another reason is that watching and keeping track
of professional sports is a major activity shared by both
young and adult individuals. People all around the globe
watch sports on television, many on a daily basis. In addition,
sports fans tend to be highly involved, reflecting on coaches’
decisions, comparing players’ metrics, and predicting outcomes of games and the final ranks of individuals and teams
playing in competitions. Many newspapers contain a regular
sports section, and entire television channels are devoted to
both individual and team sports. Major sport competitions,
such as the Olympic Games, the World Cup in soccer, or
the World Championships in basketball and swimming, are
among the most popular events worldwide. Billions of dollars are involved in the various aspects of the sports industry,
from the cost of game tickets to payment for broadcasting
licenses, salaries of top players, and advertising.
Ofer H. Azar
azar@som.bgu.ac.il
1 Department of Business Administration, Guilford Glazer Faculty of Business and Management, Ben-Gurion University of the Negev, Beer-Sheva, Israel
2 The Academic College at Wingate, Wingate Institute, Netanya, Israel
In his book on the promise and the pitfalls of big data, Nate
Silver [51] elaborated on the possibilities of performance
assessment and sport scouting that have been unleashed in
the big-data era. Silver initially gained his reputation when
he succeeded in determining causality and separating skill
from luck, aggregating extensively large datasets on major
league baseball players’ performance. Variables that predict
future performance were elicited by Silver based on analysis
of thousands of players during more than five decades in the
major leagues. This effort enabled him to estimate predictors’
parameters and to devise a forecasting model that tends to
outperform expert scouts. Silver speculated that baseball may
offer the world’s richest dataset, where just about everything
that has happened on a major league field in the past 140
years has been accurately recorded and is now available for
analysis.
Since data from different sport events have been regularly recorded for many years, and are often available to the
public at large (entire games are recorded on video in addition to the quantitative data that are retained in datasets),
the domain of sport provides a uniquely authentic arena for
exploring research ideas. In addition to the big-data characteristic of sport, other factors also contribute to sport being an
excellent source for analytics and research, particularly those
concerning certain aspects of human behavior. The rules of
the games in sport are clear and well defined. The players
in professional sports are considered to be experts, and are
offered large incentives to perform the best that they can.
Differences between sports (for example, individual sports
versus team sports) create a variety of situations, each allowing the assessment of different aspects of performance. Sports
being prevalent worldwide allows for global analysis, either
using comparisons between countries, or aggregating data
from different parts of the world. Patterns of behavior in
sport can often provide insights about all types of human
behavior, because universal phenomena that affect human
behavior in general will often be reflected in sports behavior
as well.
The data for research in sports are provided in many cases
by companies that specialize in measuring and coding sport
performance with an eye for selling customized packages of
information for clubs, associations, broadcasters, and academic researchers. For example, these companies may assist
the team’s scouting staff with detailed information on soccer or basketball players’ performance in every league on
the globe that plays on a professional or semi-professional
level. The purpose of this article is to provide examples of
data-driven analysis in sports, sometimes with implications
outside sports as well.
2 Sports analytics defined
Sports analytics is the investigation and modeling of sports
performance, implementing scientific techniques. More
specifically, sports analytics refers to the management of
structured historical data, the application of predictive analytic models that use these data, and the utilization of
information systems, in order to inform decision makers and
enable them to assist their organizations in gaining a competitive advantage on the field of play ([1], see also [39,40]).
Historical data can be either quantitative or qualitative; these
data are typically collected from multiple sport-relevant
resources, among them biographical data, films/videos, box-score performance data, medical reports of the athletes, and
scouting reports. The collected data are standardized, centralized, integrated, and analyzed using different metrics. It
is assumed that a reliable and systematic analysis of the data
will enable coaches, athletes, and policymakers to strengthen
their decision-making processes. A sports analytics framework is described in Fig. 1.
3 Development of field-level oriented
analysis
Sports analytics originated in the 1960s in the USA, where
American football and basketball were analyzed using coded
notes (i.e., notational analysis) [25]. Notational analysis is
an objective way of recording performance, so that critical
events in that performance can be quantified in a consistent
and reliable manner [25]. Such analysis enables the coach and
the manager to objectively assess competitive performance,
and therefore to improve it.
Another popular American sport is the game of baseball.
Baseball is a less dynamic game than football or basketball,
and as a result it is more convenient to break down into distinct events to be analyzed. It was in baseball where the first
platform for statisticians to work with individual and team
performance data (box score) was developed, during the second half of the nineteenth century. In 1971, a group of baseball
analysts founded the Society for American Baseball Research
(SABR) [21].
In the 1950s, in England, a retired Royal Air Force Wing
Commander and an amateur statistician named Charles Reep
began to analyze the number of passes in soccer that led
to a goal, alongside the field positions where those passes
originated. Reep’s work led to one of the first scientific
publications in sports analytics [48] and underpinned the “long
ball” style of play which for decades stamped its mark on
English soccer.
Yet gathering sports’ data and conducting a comprehensive analysis was an extremely time-demanding task during
the pre-computerized era. For example, the first hand-notation system developed for tennis was never actually used
due to its complexity, whereas another analysis system developed for squash took five to eight hours to master and an
additional 40–50 hours to analyze the data from a single game
[30].
Fig. 1 A sports analytics framework: information gathering (multiple resources; quantitative and qualitative data), data management (standardization, centralization, and integration), data analysis (using metrics), and decision makers (coaches, players, policymakers)
Technological developments in the 1980s enabled gradual computerization both of the data-gathering process and
of its analysis. Computerized versions of notational analysis for tennis and squash were implemented by the end of
that decade [30]. In 1989 David Smith, a biology professor,
founded Retrosheet, a nonprofit organization aimed at computerizing the box score of every major league baseball game
ever played, in order to analyze the statistics of the game
[19]. Smith drew on the previous work of another well-known
baseball enthusiast, Bill James, who became frustrated about
the major league administration’s refusal to publish play-by-play game accounts, and therefore initiated the Scoresheet
project, a network of fans who would collect and distribute
this information. Insights from this collaborative effort by
Smith, James, and other members of SABR were
implemented by the professional staff of the Oakland Athletics franchise, where a more quantitative approach to baseball
was put to use in the 1990s. Billy Beane took over as general
manager of the Oakland Athletics in 1997, and capitalizing
on statistical approaches he was able to assemble a highly
efficient team. These events were later popularized in the
best-selling book [35] and movie, Moneyball.
From the mid-1990s, professional sports gradually entered
the big-data era. As an example, throughout the 1995–1996
National Basketball Association (NBA) season, Advanced
Scout software was distributed to 16 NBA teams. The raw
data from games were initially collected using a unique system designed for logging basketball data. Information was
collected on various defensive and offensive variables of the
game, among them the number of players’ shot attempts,
type of shots taken, and the number of rebounds taken by
players. At the end of each game, the data were uploaded
to an electronic bulletin board, and a team could download
their own data or the data of any other team from this billboard. The Advanced Scout software was able to seek out and
discover meaningful patterns in the game [9]. A number of
years later, in the 2003–2004 NBA season, data on players’
shot attempts were already publicly available on sites such
as espn.com [49], and therefore could be used for systematic
analytics. For example, researchers who analyzed 1270 shot
attempts made by one player in the above-mentioned season
discovered useful predictors of both shot location and field
goal percentage, and proposed a new statistical model for
analyzing basketball shot charts [49].
In soccer, teams in the English Premier League (EPL)
became relatively advanced in terms of performance analytics, and some of them have even made performance data
available to fans for open-source analysis [21]. However,
when compared to basketball, assessing players’ ability to
score in soccer is hindered by the low frequency of scoring
events. Tactical factors, such as number and length of possessions, passing sequences, and spatial analysis of the territory
played are aggregated in order to optimize performance in
offense [21]. A specific example of how players and coaches
may benefit from the assessment of large samples of events
in soccer is the information on probable directions of penalty
shots, based on the shooters’ previous statistics provided by
the analysts to the goalkeepers before critical matches [42].
In the 2010s analysis of video data became possible
across many professional sports. For example, in the NBA
a camera system (SportVU), originally based on Israeli
missile-tracking technology, became mandatory in all arenas. This system is hung above the court, and records the ball
and players’ movement data. In baseball, PITCHf/x, HITf/x,
and FIELDf/x video systems are used to capture and analyze
pitching, hitting, and fielding, respectively. Video analytics
systems in basketball are able to produce huge datasets of
players’ movements, ball touches, rebounds, and shot locations. Nowadays, the amount of data is apparently larger than
the capability to extract possible insights from it [21].
3.1 Implications of big-data analytics in the field
One notable impact of studies analyzing big-data files is the
transition toward the three-point shooting style of play, evident nowadays in the NBA. Annual data on shot location
have enabled analysts to develop a model of expected points
per shot from each location on the court. This model revealed
that long-range two-point attempts yield fewer expected points than
three-point attempts. Visualization of ball movements and shot outcomes [29] allowed the players not only to
optimize the allocation of the ball between the team members
in an attack, but also to learn about the best positioning for a
defensive rebound, depending on the player who attempted
the shot, and the spot from which the shot was taken.
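The expected-points comparison above can be made concrete with a short sketch. Expected points per shot from a zone is the shot's point value multiplied by the share of attempts made from that zone. The zone labels and the shot log below are hypothetical illustrations, not NBA data.

```python
from collections import defaultdict

# Hypothetical shot log: (zone, point value of the shot, 1 if made else 0).
shots = [
    ("long_two", 2, 1), ("long_two", 2, 0), ("long_two", 2, 0),
    ("long_two", 2, 1), ("long_two", 2, 0),
    ("three", 3, 1), ("three", 3, 0), ("three", 3, 0),
    ("three", 3, 1), ("three", 3, 0),
    ("at_rim", 2, 1), ("at_rim", 2, 1), ("at_rim", 2, 0),
    ("at_rim", 2, 1), ("at_rim", 2, 1),
]


def expected_points_by_zone(shot_log):
    """Average points scored per attempt, grouped by zone."""
    totals = defaultdict(lambda: [0, 0])  # zone -> [points scored, attempts]
    for zone, value, made in shot_log:
        totals[zone][0] += value * made
        totals[zone][1] += 1
    return {zone: points / attempts for zone, (points, attempts) in totals.items()}


expected_points = expected_points_by_zone(shots)
for zone, value in sorted(expected_points.items()):
    print(f"{zone}: {value:.2f} expected points per shot")
```

In this illustrative log both the long two and the three are made 40% of the time, so the three-point attempt is worth 1.2 expected points against 0.8 for the long two, purely because of the extra point.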
Combining modern statistical projections with traditional
scouts’ insights leads to more accurate assessments of a
player’s prospects at the professional level. For instance, the
Boston Celtics were able to pick future all-star Rajon Rondo
late in the 2006 NBA draft because they identified rebounding by guards as an undervalued skill in the NBA. Other
teams at that time did not realize the potential value of this
performance indicator as the more analytical Celtics did [1].
Based on comprehensive datasets of on-court performances in basketball, analysts were able to develop a number
of sophisticated game-related performance indicators, allowing them (a) to account for the number of minutes played
by the players when comparing points scored by starter
and backup players; (b) to distinguish shooting accuracy
from shooting selection when deciding on the acquisition
of players competing in different leagues; and (c) to control
for overall rebound opportunities when assessing a player’s
rebound ability.
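Two of the adjustments listed above, (a) and (c), can be sketched as simple rate statistics. The 36-minute scaling is an assumed convention chosen for illustration, and the player lines are hypothetical; the source does not specify the exact formulas analysts used.

```python
def points_per_36(points, minutes):
    """Points scaled to a common 36 minutes of playing time (assumed convention)."""
    return 36.0 * points / minutes


def rebound_rate(rebounds, opportunities):
    """Share of available rebounds the player secured while on court."""
    return rebounds / opportunities


# Hypothetical season lines for a starter and a backup.
starter = {"points": 1200, "minutes": 2400, "rebounds": 400, "rebound_opportunities": 4000}
backup = {"points": 500, "minutes": 900, "rebounds": 180, "rebound_opportunities": 1500}

for name, line in (("starter", starter), ("backup", backup)):
    scoring = points_per_36(line["points"], line["minutes"])
    rebounding = rebound_rate(line["rebounds"], line["rebound_opportunities"])
    print(f"{name}: {scoring:.1f} points per 36 min, rebound rate {rebounding:.2f}")
```

Here the backup has far lower raw totals but a higher scoring rate and rebound rate once playing time and opportunities are controlled for, which is the kind of comparison the raw box score hides.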
Alamar [1] described the problem of projecting a player’s
development in a position which differs from the one he has
played so far. For example, the NBA player Russell Westbrook
played predominantly as a shooting guard and attracted the
attention of the SuperSonics, who were in need of a point guard
(a player who passes rather than shoots) and not a shooting
guard (player who shoots rather than passes). The ordinary
“number of assists” metric was found to be insufficient in
the assessment of Westbrook’s passing ability, and therefore
a new performance indicator, which measured the change in
the team’s shooting percentage when a specific player made
a pass to the shooter, was created by Alamar. This analysis
revealed that Westbrook’s effect on his teammates’ shooting
ability was of the same caliber as that of the top point guards in
the NBA.
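A minimal sketch of an indicator in the spirit of the one described above follows. It assumes the baseline is the team's field goal percentage on all shots not preceded by a pass from the player in question; Alamar's actual definition may differ, and the shot log is hypothetical.

```python
def pass_impact(shots, passer):
    """FG% on shots set up by `passer` minus FG% on the team's other shots.

    `shots` is a list of (last passer or None, 1 if made else 0).
    The choice of baseline is an assumption made for this illustration.
    """
    with_pass = [made for player, made in shots if player == passer]
    without_pass = [made for player, made in shots if player != passer]
    if not with_pass or not without_pass:
        raise ValueError("need shots both with and without a pass from the player")
    return sum(with_pass) / len(with_pass) - sum(without_pass) / len(without_pass)


# Hypothetical team shot log.
team_shots = [
    ("A", 1), ("A", 1), ("A", 0), ("A", 1),          # 3 of 4 after passes from A
    ("B", 0), ("B", 1), (None, 0), (None, 1),
    ("B", 0), (None, 0), ("B", 1), (None, 0),        # 3 of 8 on all other shots
]

impact_a = pass_impact(team_shots, "A")
print(f"Change in team FG% after passes from A: {impact_a:+.3f}")
```

Unlike a raw assist count, this measure credits a passer for raising teammates' shot quality even when the volume of passes is low.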
Similar processes in football led to the development of
metrics that account for the quality of the receiver, the strength
of the defense, and the effectiveness of the linebackers,
while evaluating the quarterback’s passing skill. In baseball,
machine learning is being leveraged to predict the pitching
behavior of players to better inform in-game decision making. The outcome of this is a model that can predict upcoming
pitches using real-time game statistics, with an accuracy of
74.5%. This model incorporates factors such as the type of
pitches thrown by particular pitchers/teams, the number and
position of the players on base, the ball-strike count, and the
number of innings played [26].
In soccer, bias toward seeing “what is there” and ignoring “what is not there” makes evaluating defense difficult.
Attacking in soccer has one simple best outcome: scoring
a goal. But defending is quite the opposite: there, the best
outcome is a goal that is not conceded—an event that does
not actually happen. This may be because of a shot that did
not come, a cross that was not made, or a through ball that
could not be passed properly. As a result, for instance, in 2001
Sir Alex Ferguson, one of the most successful managers in
British soccer history, decided to sell the Dutch international
defender Jaap Stam to Lazio. The sale was prompted partly
by match data. Studying the numbers, Ferguson had spotted
that Stam was tackling less often than before. He presumed
that the defender, then twenty-nine, was declining. So he sold
him. Ferguson has called this decision the biggest mistake of
his career [2].
Advanced information systems in soccer (e.g., Opta,
Prozone) provide the decision maker with heat maps and
visualizations of ball movements on the pitch. While looking at such data, the coach is able to identify patterns for the
defender’s ability to prevent passes and penetrations to the
area that he is in charge of, even without producing clears
and tackles. Such a skill has remained under the radar so far.
Another important source of information on players’ and
teams’ performance is locational and biometric devices (e.g.,
GPS devices, radio frequency devices, accelerometers). Such
devices are most frequently used to assess the total physical activity undertaken by players in games and practices
[21]. Devices worn by players provide a rich source of objective information on their external workloads and movement
patterns. In addition to location-based (x–y coordinates)
and distance–time (speed) data, GPS units are fitted with
accelerometers, gyroscopes, and magnetometers, providing
data on accelerations, decelerations, change of direction
movements, and vertical jumps performed. Strength and conditioning coaches can modify training intensity according to
this objective information, and use it to help them decide
about in-game substitutions and player rotations. Based on
these data, sport analysts have developed models for predicting the risk of injury and are now able to alert the coach when
player workloads are mismanaged [26].
4 Management and policymakers’
decision-oriented analysis
The business-oriented analysis of sports is extremely versatile and addresses areas from economic assessment of the
impact of mega-sport events to allocation of scarce resources
while building professional teams’ rosters, to optimal ticket
pricing via evaluation of the fans’ level of interest. In this
section, we present various examples of business-oriented
research of sports, with an eye to briefly introducing the
reader to this wide spectrum of scientific endeavor.
Researchers are now able to produce a detailed cost–
benefit analysis of hosting mega-sport events (e.g., Olympic
Games, FIFA World Cup). For instance, Billings and Holladay [10] examined whether hosting the Olympic Games
improves a city’s long-term growth. The researchers matched
the host cities with cities that were finalists for the Olympic
Games, but were not selected by the International Olympic
Committee. An examination of post-Olympic impacts for
host cities between 1950 and 2005 indicated no long-term
impacts of hosting an Olympics on population, real Gross
Domestic Product per capita, or trade openness.
Baade and Matheson [5] suggested three major categories on the cost side of hosting an Olympic Games: (a)
general infrastructure such as transportation and accommodation; (b) the specific sports infrastructure required for
competition venues; and (c) operational costs. On the benefit side, they proposed immediate tourist spending during
the Games, long-term benefits (i.e., an Olympic legacy) that
might include improvements in infrastructure, and increased
trade, foreign investment, or tourism after the Games, as well
as intangible benefits such as the “feel-good effect” and civic
pride. Each of these costs and benefits was assessed, and the
main conclusion was that in most cases the Olympic Games
are a money-losing enterprise; they result in positive net benefits only under very specific circumstances in developed
countries.
As early as 1956, Rottenberg [50] proposed that
the output of teams depends on player skills, training facilities, the stadium, the management, and other owner-supplied
resources. Rottenberg formulated the “uncertainty of outcome hypothesis,” which suggests that fans receive more
utility from viewing competitions with an unpredictable outcome. This principle implies that teams should possess even
playing abilities to some extent, in order for the game’s outcome to be less certain.
On these grounds, league authorities often invoke outcome uncertainty as a rationale for intervention measures. For
instance, in the National Football League (NFL), ticket revenue sharing, equal broadcast revenue sharing, and a salary
cap are all combined steps meant to secure a certain degree
of competitive balance across the teams; that is, to prevent
large market teams from acquiring excessive talent relative
to the rest of the league [14]. Interestingly, a more recent
analysis of big datasets of annual league-level attendance
in both Europe and the USA, controlling for a large number
of plausible influences on game/match day attendance, provided evidence contradicting the uncertainty of the outcome
hypothesis [14,20,43].
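Competitive balance, as discussed above, is easier to reason about with a number attached. One simple measure, used here purely as an illustration and not attributed to the cited studies, compares the observed spread of team win percentages with the spread expected if every game were a coin flip, which is 0.5 divided by the square root of the number of games per team. The win percentages below are hypothetical.

```python
import math
import statistics


def balance_ratio(win_pcts, games_per_team):
    """Observed standard deviation of win percentage over the coin-flip benchmark.

    A value near 1 suggests outcomes about as even as pure chance;
    larger values suggest a less balanced league.
    """
    observed = statistics.pstdev(win_pcts)
    idealized = 0.5 / math.sqrt(games_per_team)
    return observed / idealized


# Hypothetical four-team league with a 16-game season.
uneven_league = [0.75, 0.50, 0.50, 0.25]
even_league = [0.50, 0.50, 0.50, 0.50]

ratio_uneven = balance_ratio(uneven_league, games_per_team=16)
ratio_even = balance_ratio(even_league, games_per_team=16)
print(f"uneven league: {ratio_uneven:.3f}, even league: {ratio_even:.3f}")
```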
An acute issue confronting team managers since the inception of the salary cap is how to optimally distribute limited
resources across their team members. Borghesi [12] examined the relationship between compensation and performance
during 10 consecutive seasons in the NFL, and provided evidence that productivity, draft, and experience variables are
significantly related to the levels of base and bonus salaries.
Teams that compensate players the most inequitably were
found to be the most likely to perform the worst. Borghesi pointed out that superstar players on a roster can be
disruptive, even if their output on the field justifies their compensation levels.
Optimization of ticket pricing is a legitimate way for teams
to increase their revenues. For decades, the seat location was
the sole determinant of price. However, in the 2010s a
dynamic pricing strategy, in which ticket prices fluctuate daily
based on changing market conditions, was introduced in
sports. Analysis of the price determinants provides ticket sellers a basis on which to set prices, and therefore is critical for
revenue generation [52]. For example, the Boston Red Sox
baseball team has monitored the flow of fans into the stadium. Analyzing the entrance used by fans relative to their
seat location, they optimized the location of concession and
memorabilia stands in order to minimize the lines and the
distance fans would have to cover to find food and souvenirs
[26].
Another example of using a big-data approach to improve
managerial decisions is the customer relationship management (CRM) systems, which gather information on customers at various touch points. The data collected are then
used to guide a sport team’s relationship with its customers,
to build a loyal fan base and to increase revenues. The
English Premier League’s Manchester City Football Club
performs customer-based, data-driven marketing very successfully. The club provides its supporters with member cards
to be used for buying tickets, entering the stadium, making
purchases at the stadium, and so on. The stadium has a system that interacts with the members’ cards and gathers data
throughout their visit to the stadium. The data are stored in a
CRM system and allow the club to understand their fans’
behavior in great detail. The club uses the insights derived
from the data analyses to engage with their fans, to build
deeper long-term relationships with them, and to add value
to their relationships with the club [26].
Recently, sports franchises have become interested in
developing targeted approaches to marketing based on a fan’s
history and past purchases. At the same time, social media
has come to be an important source of ample data and, if
used appropriately, a new lens for many new types of studies [37]. In this regard, the use of social media analytics is
a practice at the frontier of measuring fan engagement [21].
For example, Bagić Babac and Podobnik [6] analyzed user
comments published on the Facebook pages of the top five
2015–2016 Premier League soccer clubs. They shed light
on who, how, and why fans participate in social media sport
websites, and suggest that outcomes from social media mining bring insights about human behavior patterns that are not
visible otherwise. Such results have the potential to influence
soccer marketing and to encourage organizations to develop
new strategies, for example in targeting women as growing
consumers of soccer-related products.
The sports betting industry is another solid sector of sports
business where both customers and suppliers attempt to
correctly predict outcomes of future events. Consequently,
researchers continue to introduce a variety of models that are
formulated by diverse forecast methodologies. These models may be aimed at the prediction of results of individual
matches [17] or on tournament outcomes [38]. Due to the
extremely competitive nature of the gambling market, it has
become an arena for the implementation of advances in computing and machine learning, with cutting-edge predictive
algorithms (see [17,18,38,41,56]).
5 Analysis of sports data to learn about
human behavior
Over 30 years ago, Gilovich et al. [28] caused a stir by
debunking a widely accepted belief in the tendency of a hitting player to produce more and more hits (i.e., “hot hand”).
The analyzed data provided no evidence for a positive correlation between the outcomes of successive shots. The primary
focus of this study was by no means the assessment of basketball players’ performance, but rather to demonstrate a
common bias caused by implementation of the representativeness heuristic [55]. The data were taken from records of
48 home games of the Philadelphia 76ers, collected by the
team’s statistician. The authors stated that records of consecutive shots during basketball games for individual players
were not available for other teams in the NBA during that
time. Gilovich et al. suggested that the failure to detect evidence of streak shooting might also be attributed to the
selection of shots by individual players and the defensive
strategy of the opposing teams. However, they also reached
the conclusion that there is no correlation between the outcomes of successive shots, both after analyzing free-throw
records of the Boston Celtics and from a controlled shooting
experiment with male and female players from Cornell’s varsity
teams.
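The core comparison behind a hot-hand test can be sketched in a few lines: for a single player's sequence of shots, compare the hit rate after a hit with the hit rate after a miss. This is a simplified illustration of the idea, not a reproduction of the authors' full analysis, and the sequence below is hypothetical.

```python
def conditional_hit_rates(shots):
    """Return (P(hit | previous hit), P(hit | previous miss)) for a 0/1 sequence."""
    pairs = list(zip(shots, shots[1:]))
    after_hit = [current for previous, current in pairs if previous == 1]
    after_miss = [current for previous, current in pairs if previous == 0]
    if not after_hit or not after_miss:
        raise ValueError("sequence must contain both a hit and a miss before the last shot")
    return sum(after_hit) / len(after_hit), sum(after_miss) / len(after_miss)


# Hypothetical sequence of one player's consecutive shots (1 = hit, 0 = miss).
sequence = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]

rate_after_hit, rate_after_miss = conditional_hit_rates(sequence)
print(f"P(hit | hit)  = {rate_after_hit:.3f}")
print(f"P(hit | miss) = {rate_after_miss:.3f}")
```

A hot hand would show up as a hit rate after hits that is reliably higher than the hit rate after misses. With a sequence this short the difference is dominated by noise, which is why large records of consecutive shots matter for such tests.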
The study by Gilovich et al. is a showcase of research
relying on data from the field of sports. It is one of the most
notable instances of sports being used as a laboratory for
assessment of important psychological-economic theories.
On Google Scholar, for example, this study is cited over 1300
times, and the literature on the hot hand is very broad today
(see, for example, the review article [7]).
Almost 30 years later, another study showed how technological developments and the much wider availability of
data today are reflected in current research. Bocskocsky et
al. [11] used novel metrics provided by the optical tracking system SportVU alongside play-by-play data recorded
for each game in the NBA. These researchers analyzed a
dataset of over 83,000 shots from the 2012–2013 season,
combined with data of both players and ball position in each
shot attempt. Relying on such a rich database, they were
able to construct a comprehensive model of shot difficulty,
and by this to demonstrate that players who exceeded their
expectations in shots over recent attempts took shots from
significantly further away, faced tighter defense, and were
more likely to take their team’s next shot.
Economists noted that the essence of game theory is to
facilitate understanding and to predict behavior in economic,
social, and political contexts [34]. However, testing game theory predictions has proven to be extremely difficult, and as
a result even the most fundamental premises have not yet
been supported empirically in real situations [46]. One of
the fundamental tools in analyzing behavior in games
(in the field of game theory) is the notion of mixed strategies, where players may play some of their strategies with
certain probabilities rather than pursue a single pure strategy. Basic concepts in strategic situations that require the
player to be unpredictable are von Neumann’s Minimax Theorem and the mixed-strategy Nash equilibrium (“MSNE”).
These two concepts were examined empirically in laboratory
experiments, but given the advantages of field data, especially
when decision makers are experts and face large incentives,
several attempts have been made to examine these concepts
with sports data as well. These studies asked
whether professional players seem to play according
to the MSNE, and thus sought insights about the usefulness of this solution concept. For example, Walker and
Wooders [57] used data on championship (Grand Slam) professional
tennis matches and found that win rates in the serve
and return play are consistent with the Minimax hypothesis.
Penalty kicks in soccer are a good context to examine
the MSNE concept, since they present a situation with large
incentives (due to the small number of goals in soccer, a
penalty kick can determine the entire game), and start in
the same way every time (as opposed to complex situations
during play), with only two players involved, and with simultaneous play and simple rules and potential strategies. As
such, a number of studies used data on penalty kicks to examine the MSNE concept and its applicability to behavior in
real games. In one study, Chiappori, Levitt, and Groseclose
[16] collected data on penalty kicks in Division 1 (the highest division for competitive soccer) in France and Italy, and
developed a theoretical model of the penalty kick game as a
simultaneous 3×3 (each player can play right, left, or center)
game between the kicker and the goalkeeper. Chiappori et al.
made some assumptions about the payoffs in the game, and
derived predictions on which strategies or combinations of
strategies should be more common than others if the players
play the MSNE. The predictions, and therefore the MSNE
concept, were supported by the data.
In another study, Palacios-Huerta [46] collected and examined a different dataset, which consisted of 1417 penalty
kicks from various countries (mostly Italy, England, and
Spain). He performed most of the analysis on a simplified
2×2 game (right vs. left without center) and found that the
winning probabilities of each strategy of each player are
similar and that players’ choices are serially independent,
demonstrating that professional soccer players and goalkeepers behave as predicted by the Minimax Theorem when
deciding on the directions of shots and jumps in penalty duels.
In a more recent study, Azar and Bar-Eli [4], using a different dataset of penalty kicks, compared the predictions of the
MSNE to other prediction methods and found that the MSNE
predictions were the closest to the data, even though some
other prediction methods used information on the marginal
distribution of kicks or jumps whereas the MSNE did not.
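To illustrate what the MSNE predicts in the simplified 2×2 penalty game, the following Python sketch solves for the mixing probabilities that leave each opponent indifferent between left and right. The scoring probabilities in `score` and the function name `mixed_equilibrium_2x2` are assumptions made for this example. They are not the estimates reported in the studies cited above.

```python
def mixed_equilibrium_2x2(score):
    """Solve a 2x2 zero-sum penalty game.

    score[k][g] is the probability of a goal when the kicker picks side k
    and the goalkeeper picks side g (0 = left, 1 = right).
    Returns (kicker's P(left), goalkeeper's P(left), equilibrium scoring rate).
    Assumes the game has no pure-strategy equilibrium, so denom != 0.
    """
    (ll, lr), (rl, rr) = score
    denom = ll - lr - rl + rr
    p_left = (rr - rl) / denom   # kicker's mix that makes the goalkeeper indifferent
    q_left = (rr - lr) / denom   # goalkeeper's mix that makes the kicker indifferent
    value = q_left * ll + (1 - q_left) * lr  # scoring rate when the kicker goes left
    return p_left, q_left, value

# Illustrative scoring probabilities, assumed for this sketch only.
score = [[0.58, 0.95],
         [0.93, 0.70]]
p_left, q_left, value = mixed_equilibrium_2x2(score)
print(round(p_left, 3), round(q_left, 3), round(value, 3))
```

In equilibrium the kicker scores at the same rate whether kicking left or right, and the goalkeeper concedes at the same rate whichever way he jumps. This equality of winning probabilities across strategies is the property that the empirical tests examine.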
The desired characteristics of simplicity, constant situation, and large incentives, together with the fact that penalty kicks in
soccer are sometimes taken in a series as a shootout after a
tied game (in certain competitions), have made penalty kicks
a source for addressing additional research questions. For
example, Bar-Eli et al. [8] analyzed 286 penalty kicks in
top leagues and championships worldwide and found that
given the probability distribution of kick direction, the optimal strategy for goalkeepers is to stay in the goal’s center.
Goalkeepers, however, seem to behave non-optimally and
almost always jump to the right or left. The authors suggest
that this can be explained by norm theory [32]. Because
the goalkeepers’ norm is to act (jumping to a side), a goal
scored yields worse feelings for the goalkeeper following
inaction (staying in the center) than following action (jumping
to a side), leading to a bias for action. The more common
omission bias, a bias in favor of inaction, is reversed in this
context, since the norm is reversed—to act rather than to
choose inaction. The claim that jumping is the norm was
supported by a survey conducted with 32 top professional
goalkeepers.
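The logic behind the claim that staying in the center can be optimal may be sketched with a small expected-value calculation in Python. All numbers below are hypothetical and chosen only to show the arithmetic. They are not the figures reported by Bar-Eli et al., and the assumption that a kick is never stopped when the goalkeeper picks the wrong direction is a deliberate simplification.

```python
# All numbers below are hypothetical, chosen only to illustrate the calculation.
kick_share = {"left": 0.32, "center": 0.29, "right": 0.39}       # P(kick direction)
stop_if_matched = {"left": 0.30, "center": 0.60, "right": 0.25}  # P(stop | same direction)

def expected_stop_rate(action):
    """Chance of stopping the kick for a goalkeeper action, assuming
    (as a simplification) that a kick is never stopped when the
    goalkeeper chooses a different direction from the kicker."""
    return kick_share[action] * stop_if_matched[action]

stop_rates = {action: expected_stop_rate(action) for action in kick_share}
best_action = max(stop_rates, key=stop_rates.get)
print(stop_rates, best_action)
```

Under these invented numbers the center has the highest expected stop rate even though it is the least common kick direction, because central kicks are much easier to stop once the goalkeeper is there.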
In another study on penalty kicks in soccer, Apesteguia
and Palacios-Huerta [3] revealed a surprising phenomenon
that takes place during penalty shootouts. The purpose of
a penalty shootout is to decide the winning team when
competition rules require one team to be declared the winner after a drawn game. The referee tosses a coin and the
team whose captain wins the toss decides whether to take
the first or the second kick. The teams then alternate, with each
taking five kicks. The explicit randomization mechanism used
to determine which team goes first in the sequence, in a
situation where both teams have exactly the same opportunities to perform a task, suggests that we should expect the
first and second teams to have exactly the same probability
of winning the shootout. Yet, using data on 1343 penalty
kicks from 129 penalty shootouts over the period 1976–2003,
Apesteguia and Palacios-Huerta found that teams that
take the first kick in the sequence win the penalty shootout
60.5% of the time. Given the characteristics of the setting,
the researchers attributed this difference in performance to
psychological effects (e.g., pressure) resulting from the kicking
order. In particular, most kicks are scored, and this
puts the team that kicks second behind in its score most of
the time. Such a finding provides a fertile ground for the
intervention and help of a sport psychologist, and is valuable
for influencing the decisions of coaches and teams’ captains.
Interestingly, another study [33] that increased the sample
size from 129 shootouts to 540 suggests a much smaller first-mover advantage (53.3% for the first-kicking team to win),
which is no longer statistically significant. This demonstrates
the importance of relying on as much relevant data as possible, which reinforces the advantages we have today in the
big-data era.
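One rough way to see why the larger sample changes the conclusion is to put an approximate 95% confidence interval around each reported win share. The Python sketch below uses the normal-approximation (Wald) interval, which is an assumption of this example and not necessarily the method used in the cited studies. The function name `wald_interval` is likewise illustrative.

```python
from math import sqrt

def wald_interval(share, n, z=1.96):
    """Approximate 95% confidence interval for a proportion
    (normal approximation; z = 1.96)."""
    half_width = z * sqrt(share * (1 - share) / n)
    return share - half_width, share + half_width

small = wald_interval(0.605, 129)  # 60.5% first-kicker wins in 129 shootouts
large = wald_interval(0.533, 540)  # 53.3% first-kicker wins in 540 shootouts
print([round(x, 3) for x in small])
print([round(x, 3) for x in large])
```

With these inputs the interval for the smaller sample lies entirely above 50%, while the interval for the larger sample is narrower and includes 50%, in line with the statement that the smaller advantage is no longer statistically significant.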
In another study on soccer, Misirlisoy and Haggard [44]
examined all 361 kicks from the 37 penalty shootouts held in World Cup and Euro Cup games over the
period 1976–2012, and suggested that goalkeepers displayed
a sequential bias: following repeated kicks in the same direction, goalkeepers became more likely to jump in the opposite
direction on the next kick. This is an illustration of the gambler’s fallacy, a famous psychological
bias. Surprisingly, kickers did not seem to exploit these goalkeeper biases.
However, a reexamination of the same issue by other
researchers yielded different findings: Braun and Schmidt
[13] suggested that even with the original data, but using
a different statistical analysis (a binomial test instead of
bootstrapping), the results of Misirlisoy and Haggard are no
longer statistically significant, even at the 10% level with a
one-tailed test (equivalent to 20% in a two-tailed test). The
results of Misirlisoy and Haggard turn out to be very sensitive, due to the small number of cases on which their findings
are based. Although the entire dataset consists of a reasonable number of kicks (361), their main finding emerges from
only 16 kicks that represent a situation where the three previous kicks of that team went to the same direction (right or
left). This shows the challenge that still exists even in the
big-data era: if the event being considered is special and relatively rare, then even a large dataset may provide only a few
relevant events, making the analysis problematic. Moreover,
the results are sensitive not
only to the choice of statistical test but also to expanding the
sample. In particular, Braun and Schmidt found that adding
some additional competitions, which increased the number
of relevant sequences (three previous kicks to the same direction) from 16 to 26, reduced the percentage of cases in which
the goalkeeper jumps to the opposite direction (compared to
the last three kicks) from 69% to only 58%; the difference
between 58% and 50% is not statistically significant with
only 26 observations.
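The effect of such small counts can be checked with an exact one-tailed binomial test against a 50% chance of jumping to the opposite direction. In the Python sketch below, the counts 11 of 16 and 15 of 26 are inferred by rounding the reported percentages, which is an assumption of this example. The helper `binomial_upper_tail` is a hypothetical name and may differ in detail from the test specification used by Braun and Schmidt.

```python
from math import comb

def binomial_upper_tail(successes, n, p=0.5):
    """One-tailed exact binomial p-value: P(X >= successes) for X ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(successes, n + 1))

# Counts are inferred by rounding the reported percentages (an assumption):
# 69% of 16 is about 11 jumps, 58% of 26 is about 15 jumps.
p_original = binomial_upper_tail(11, 16)
p_extended = binomial_upper_tail(15, 26)
print(round(p_original, 3), round(p_extended, 3))
```

Under these assumed counts the one-tailed p-value for the original 16 sequences is slightly above 0.10, and for the extended 26 sequences it is far above it, which matches the qualitative conclusion in the text.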
Large datasets have come to be especially useful in
detecting systematic biases in the decision making of referees. For example, using field data from the Spanish soccer
league, Garicano et al. [27] examined the amount of extra
time a referee adds after 90 min (regular time in a soccer
game) and provided clear evidence that referees add significantly more extra time in the case where the home team is
behind. On a related topic, Dohmen [23] analyzed the neutrality of referees during 12 German premier league soccer
seasons, and documented evidence that social forces influence agents’ preferences and decisions. Dohmen reported
that referees tend to favor the home team in decisions to award
goals and penalty kicks. The insights generated by these studies have been incorporated in referees’ training programs.
In one study on basketball, Morgulev et al. [45] combined
an examination of referees’ decisions with the analysis of
the related behavior of players and its impact on the team.
More specifically, the researchers examined the behavior of
professional referees and players in the context of offensive
fouls in basketball. Over 500 incidents that had the potential to meet the criteria of an offensive foul were recorded
and analyzed by basketball experts. Falling intentionally to
improve the chances to get an offensive foul was found to
be a very common behavior of defenders (almost two-thirds
of the recorded falls). At first, it seems helpful, increasing
the chances to be given an offensive foul. However, an additional statistical analysis suggests that the overall impact of
an intentional fall on the team seems to be negative (e.g.,
because a fallen player is less helpful for the team than a
standing player). The authors argue that both rational reasons
and biased decision making lead players to act against their
team’s interest by falling. In addition, the authors reported
that referees almost never call an offensive foul if the player
remains on his feet, and generally call fewer fouls than the
number judged by experts as appropriate. They explain the
referees’ behavior as being partially biased by the representativeness heuristic, but also partially reflecting officiating
mistakes that are rational given the referees’ incentives.
An additional central concept in economics that was examined with data from the sports arena is Prospect
Theory, and the effect of loss aversion derived from this theory. Pope and Schweitzer [47] turned to the field of golf in
order to put this psychological mechanism to a test. In golf,
every hole has a par number of strokes associated with it,
and the par number provides a reference point for a satisfying performance. For a professional golfer, a birdie (one
stroke under par) is a gain, and a bogey (one stroke over
par) is a loss. The researchers compared a situation where
the player is putting to avoid a bogey, with a more favorable setting where the player is aiming to achieve a birdie.
A hypothesis dictated by loss aversion suggests that players will try harder when putting for a par (to avoid a bogey)
than when putting for a birdie. Their analysis of more than
2.5 million putts supported this prediction. Nobel Laureate
Daniel Kahneman referred to this finding in his 2011 book:
“These fierce competitors certainly do not make a conscious
decision to slack off on birdie putts, but their intense aversion
to a bogey apparently contributes to extra concentration on
the task at hand” [31, p. 304].
In a study on professional basketball players, Staw and
Hoang [53] used the player market in the NBA to elicit field-originated data on the existence of the well-known sunk-cost
effect. These researchers tested whether the amount teams
spent on players influenced how much playing time the players got and how long they stayed with the NBA franchises.
This study was one of the first quantitative field tests of the
sunk-cost effect.
Another area in which big-data analysis provided interesting insights is corruption in sports. Duggan and Levitt [24]
analyzed the results of all “critical” sumo matches from January 1989 until January 2000, and found strong evidence for
match rigging in professional sumo. This study was replicated and extended by Dietl et al. [22], who discovered
more intriguing trends, relying on even bigger datasets. In
basketball, Taylor and Trogdon [54] analyzed the winning
percentages of NBA teams that were eliminated from the
playoffs. They found that to gain higher draft positions, these
teams were 2.5 times more likely to lose than teams that were
still trying to secure their place in the playoffs. Recently,
Elaad, Kantor, and Krumer (2016; see footnote 1) began to utilize data from
crucial soccer games, played on the last day of a season, between a team in immediate danger
of being relegated to a lower division and a team not much
affected by the result of the game. Based on data from 75 countries from 2001 to
2013, they found that the odds that the team in danger avoids
relegation are significantly higher when the country is more
corrupt according to the Corruption Perceptions Index (CPI).
Footnote 1: Elaad, G., Kantor, J., & Krumer, A. (2016) Corruption and Contests:
Cross-Country Evidence from Sensitive Soccer Matches, mimeo.
6 Challenges of big data in sports
Silver [51] summarizes his book with the realization that
prediction in the era of big data is not going very well. The
author admits that his success in building functioning forecasting systems for baseball and politics is in large part due
to his ability to choose his “battles” well. We mentioned previously that baseball is an exceptional domain with a defined
set of distinct actions that can be counted, and have actually
been counted for more than a century. In contrast, the game
of soccer is far more susceptible to chance, with success rates
of pre-game favorites only slightly above 50% [2].
Daryl Morey gained his reputation as the first truly analytical general manager in the NBA. His staff gathered original
data by measuring items that had previously gone unmeasured. In 2008, Morey used his predictive model to select a
center who was soon revealed to be a bust, and failed to
detect DeAndre Jordan, a future dominant NBA center and the
second-best player in the entire draft class. Digging deeper
into the matter, Morey revealed that his model disregarded
the prospects’ age. He realized that an entire class of college
players existed who played better due to the fact that they
were much older than the players they were playing against.
The failure to detect DeAndre Jordan proved to be far more
complex: He had played a single year of college basketball,
hated his coach, and did not even want to be in school, so
it was impossible to see this prospect’s future in his college
statistics. The analytical model would always miss DeAndre Jordan; however, one of Morey’s scouts had wanted to
draft Jordan on the strength of what appeared to him to be Jordan’s
undeniable physical talent [36].
For this reason, Alamar [1] emphasized the importance of integrating different data sources, both quantitative
and qualitative, into the data management system. For example, if the quantitative data give a different picture from the
scouting reports, integrated information about the
player’s medical history may explain the differences. If not,
then integrated and linked videos let the decision maker see
the player in action and then conclude which source
of information is the most relevant. The installation of such
comprehensive and integrated information systems requires
significant financial investments in technology, alongside the
cooperation and change of behavior of the different departments in the organization.
Assembling and organizing all of the quantitative and
qualitative data is a monumental task, and strong and determined leadership is essential in order to move from a culture
of data silos to a centralized system. The whole organization
needs to understand the importance of the new data management system. Confident leadership is crucial when the
data-driven approach does not lead to immediate success and
it becomes open to attack in a way that the old approach to
decision making was not [36].
Some of the data in sports are quantitative (e.g., points
gained by the teams during the game, league scores, and
other objective performance criteria), but much of what else
can be analyzed is more complex because it comes from situations during the games. Except for a few situations that are
generally the same each time (e.g., a penalty kick in soccer
[4,8,44,46] or a basketball free throw), most other situations
are complex and involve many variables such as the identity
of the involved players and their abilities, how much time
remains until the end of the game, previous fouls of the players in
basketball or yellow cards in soccer, the current score, the
identity of the home team, the number of spectators, etc. In
addition, some variables are hard to quantify in a manner
that can be used in regressions or a similar statistical analysis, for example the location of multiple players and the ball,
the location of the referee, etc. Moreover, the importance of
the game may be different for the teams based on the situation in the league or the championship. Often data collection
requires drawing on multiple sources. When the game situation
is important for the analysis, a common approach is to collect video clips of game events and categorize them based on
some relevant characteristics (e.g., whether the player with
the ball in soccer is inside the 16-meter area of the opponent,
or whether a collision in basketball may be an offensive foul),
sometimes using expert judges for this purpose [45].
Because game events can differ in so many ways, one often
has to make compromises and put together in the analysis
events that have some common features although they differ in other aspects. For example, we may categorize soccer
situations based on the location of the player who holds the
ball, ignoring the location of other players, which is different
across the events.
Another important challenge in sport analysis is that it may
be hard to draw conclusions about causality even when a correlation
is found, due to endogeneity problems. That is, sports usually involve complex systems where multiple
players make decisions that affect each other and are influenced by various factors, making it hard to determine what
causes what. For example, finding that the chances
of making a shot in basketball are higher when the player
is blocked by an opponent probably does not mean
that being blocked improves accuracy. Whether the opponent team blocks a player is endogenous, i.e., it is determined
within the system (the game dynamics). It makes sense that
the opponent team blocks a player in situations that are more
dangerous, and this is why we find a correlation between
being blocked and having better chances of making the shot.
To infer causality one needs to find ways to neutralize
the endogeneity problem, which is often quite difficult.
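The blocking example can be reproduced with a few lines of arithmetic. In the Python sketch below, every number is hypothetical: being blocked is assumed to lower the chance of scoring by 10 percentage points in every situation, yet defenders are assumed to block mostly in dangerous situations. The names `situations`, `BLOCK_PENALTY`, and `make_rate` are invented for this illustration.

```python
# Hypothetical numbers, invented to illustrate the endogeneity argument.
# Each situation: (share of shots, P(blocked), P(make | not blocked)).
situations = {
    "dangerous": (0.5, 0.9, 0.70),
    "harmless":  (0.5, 0.1, 0.30),
}
BLOCK_PENALTY = 0.10  # assumed drop in make probability when blocked

def make_rate(blocked):
    """Overall make rate among blocked (True) or unblocked (False) shots."""
    made = total = 0.0
    for share, p_block, p_make in situations.values():
        weight = share * (p_block if blocked else 1 - p_block)
        p = p_make - BLOCK_PENALTY if blocked else p_make
        made += weight * p
        total += weight
    return made / total

rate_blocked = make_rate(True)
rate_unblocked = make_rate(False)
print(round(rate_blocked, 2), round(rate_unblocked, 2))
```

Although blocking hurts the shooter in each situation by construction, the pooled data show a higher make rate for blocked shots, because blocked shots come disproportionately from dangerous situations. Comparing within each type of situation, rather than across the pooled data, removes the spurious positive association in this toy setting.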
7 Conclusion
The big-data era, where large datasets are available for
analysis in many domains of life, provides various unique
opportunities for research (see [15] for a review). In this
article, we focus on the case of analyzing sports data. We
provide some examples of studies that used sports
data to address a variety of research questions, drawn from various sports: the
world’s highly popular team sports of soccer and basketball, the individual sports of golf and tennis, and the more
unique and local sport of sumo. In many cases,
although the data are collected from the domain of sport, the
lessons learned are more general and have implications for
other fields. We hope that this short literature review may
lead some readers to develop their own ideas on how to use
sports data to address interesting research topics.
Compliance with ethical standards
Conflict of interest On behalf of all authors, the corresponding author
states that there is no conflict of interest.
References
1. Alamar, B.C.: Sports Analytics—A Guide for Coaches, Managers,
and Other Decision Makers. Columbia University Press, West Sussex (2013)
2. Anderson, C., Sally, D.: The Numbers Game. Why Everything You
Know About Soccer is Wrong (2013)
3. Apesteguia, J., Palacios-Huerta, I.: Psychological pressure in
competitive environments: evidence from a randomized natural
experiment. Am. Econ. Rev. 100, 2548–2564 (2010)
4. Azar, O.H., Bar-Eli, M.: Do soccer players play the mixed-strategy
Nash equilibrium? Appl. Econ. 43, 3591–3601 (2011)
5. Baade, R.A., Matheson, V.A.: Going for the gold: the economics
of the Olympics. J. Econ. Perspect. 30, 201–218 (2016)
6. Bagić Babac, M., Podobnik, V.: A sentiment analysis of who participates, how and why, at social media sport websites: how differently
men and women write about football. Online Inf. Rev. 40, 814–833
(2016)
7. Bar-Eli, M., Avugos, S., Raab, M.: Twenty years of “hot hand”
research: review and critique. Psychol. Sport Exerc. 7, 525–553
(2006)
8. Bar-Eli, M., Azar, O.H., Ritov, I., Keidar-Levin, Y., Schein, G.:
Action bias among elite soccer goalkeepers: the case of penalty
kicks. J. Econ. Psychol. 28, 606–621 (2007)
9. Bhandari, I., Colet, E., Parker, J., Pines, Z., Pratap, R., Ramanujam,
K.: Advanced scout: data mining and knowledge discovery in NBA
data. Data Min. Knowl. Discov. 1, 121–125 (1997)
10. Billings, S.B., Holladay, J.S.: Should cities go for the gold? The
long-term impacts of hosting the Olympics. Econ. Inq. 50, 754–772
(2012)
11. Bocskocsky, A., Ezekowitz, J., Stein, C.: The hot hand: a new
approach to an old “fallacy”. In: Proceedings of the 8th MIT Sloan
Sports Analytics Conference (2014)
12. Borghesi, R.: Allocation of scarce resources: insight from the NFL
salary cap. J. Econ. Bus. 60, 536–550 (2008)
13. Braun, S., Schmidt, U.: The gambler’s fallacy in penalty shootouts.
Curr. Biol. 25, R597–R598 (2015)
14. Buraimo, B., Simmons, R.: Do sports fans really value uncertainty
of outcome? Evidence from the English Premier League. Int. J.
Sport Finance 3, 146 (2008)
15. Cao, L.: Data science: a comprehensive overview. ACM Comput.
Surv. (CSUR) 50, 43 (2017)
16. Chiappori, P.A., Levitt, S., Groseclose, T.: Testing mixed-strategy
equilibria when players are heterogeneous: the case of penalty kicks
in soccer. Am. Econ. Rev. 92, 1138–1151 (2002)
17. Constantinou, A.C., Fenton, N.E., Neil, M.: Profiting from an inefficient Association Football gambling market: prediction, risk and
uncertainty using Bayesian networks. Knowl. Based Syst. 50, 60–86 (2013)
18. Constantinou, A., Fenton, N.: Towards smart-data:
improving predictive accuracy in long-term football team performance. Knowl. Based Syst. 124, 93–104 (2017)
19. Costa, G.B., Huber, M.R., Saccoman, J.T.: Understanding Sabermetrics: An Introduction to the Science of Baseball Statistics.
McFarland (2007)
20. Cox, A.: Spectator demand, uncertainty of results, and public interest: evidence from the English Premier League. J. Sports Econ.
1527002515619655 (2015)
21. Davenport, T.H.: Analytics in sports: the new science of winning.
Int. Inst. Anal. 2, 1–28 (2014)
22. Dietl, H.M., Lang, M., Werner, S.: Corruption in professional sumo:
an update on the study of Duggan and Levitt. J. Sports Econ. 11,
383–396 (2010)
23. Dohmen, T.J.: The influence of social forces: evidence from the
behavior of football referees. Econ. Inq. 46, 411–424 (2008)
24. Duggan, M., Levitt, S.D.: Winning isn’t everything: corruption in
sumo wrestling. Am. Econ. Rev. 92, 1594–1605 (2002)
25. Franks, I., Hughes, M.: Notational Analysis of Sport: Systems for
Better Coaching and Performance in Sport. Routledge, London
(2004)
26. Fried, G., Mumcu, C. (eds.): Sport Analytics: A Data-Driven
Approach to Sport Business and Management. Taylor & Francis,
New York (2016)
27. Garicano, L., Palacios-Huerta, I., Prendergast, C.: Favoritism under
social pressure. Rev. Econ. Stat. 87, 208–216 (2005)
28. Gilovich, T., Vallone, R., Tversky, A.: The hot hand in basketball:
on the misperception of random sequences. Cogn. Psychol. 17,
295–314 (1985)
29. Goldsberry, K.: CourtVision: new visual and spatial analytics for
the NBA. In: MIT Sloan
Sports Analytics Conference (2012)
30. Hughes, M., Hughes, M.T., Behan, H.: The evolution of computerised notational analysis through the example of racket sports. Int.
J. Sports Sci. Eng. 1, 3–28 (2007)
31. Kahneman, D.: Thinking, Fast and Slow. Macmillan, London
(2011)
32. Kahneman, D., Miller, D.T.: Norm theory: comparing reality to its
alternatives. Psychol. Rev. 93, 136–153 (1986)
33. Kocher, M.G., Lenz, M.V., Sutter, M.: Psychological pressure in
competitive environments: new evidence from randomized natural
experiments. Manag. Sci. 58, 1585–1591 (2012)
34. Kreps, D.M.: Game Theory and Economic Modelling. Oxford University Press, Oxford (1990)
35. Lewis, M.: Moneyball: The Art of Winning an Unfair Game. WW
Norton & Company, New York (2004)
36. Lewis, M.: The Undoing Project: A Friendship That Changed the
World. Penguin, London (2016)
37. Liu, H., Morstatter, F., Tang, J., Zafarani, R.: The good, the bad, and
the ugly: uncovering novel research opportunities in social media
mining. Int. J. Data Sci. Anal. 1, 137–143 (2016)
38. Lopez, M.J., Matthews, G.J.: Building an NCAA men’s basketball
predictive model and quantifying its success. J. Quant. Anal. Sports
11, 5–12 (2015)
39. Martin, L.: Sports Performance Measurement and Analytics. Pearson, Old Tappan (2016)
40. Miller, T.W.: Sports Analytics and Data Science. Pearson, Old Tappan (2016)
41. Martins, R.G., Martins, A.S., Neves, L.A., Lima, L.V., Flores, E.L.,
do Nascimento, M.Z.: Exploring polynomial classifier to predict
match results in football championships. Expert Syst. Appl. 83,
79–93 (2017)
42. Memmert, D., Hüttermann, S., Hagemann, N., Loffing, F., Strauss,
B.: Dueling in the penalty box: evidence-based recommendations
on how shooters and goalkeepers can win penalty shootouts in
soccer. Int. Rev. Sport Exerc. Psychol. 6, 209–229 (2013)
43. Mills, B., Fort, R.: League-level attendance and outcome uncertainty in US pro sports leagues. Econ. Inq. 52, 205–218 (2014)
44. Misirlisoy, E., Haggard, P.: Asymmetric predictability and cognitive competition in football penalty shootouts. Curr. Biol. 24,
1918–1922 (2014)
45. Morgulev, E., Azar, O.H., Lidor, R., Sabag, E., Bar-Eli, M.: Deception and decision making in professional basketball: is it beneficial
to flop? J. Econ. Behav. Organ. 102, 108–118 (2014)
46. Palacios-Huerta, I.: Professionals play minimax. Rev. Econ. Stud.
70, 395–415 (2003)
47. Pope, D.G., Schweitzer, M.E.: Is Tiger Woods loss averse? Persistent bias in the face of experience, competition, and high stakes.
Am. Econ. Rev. 101, 129–157 (2011)
48. Reep, C., Benjamin, B.: Skill and chance in association football. J.
R. Stat. Soc. Ser. A (Gen.) 131, 581–585 (1968)
49. Reich, B.J., Hodges, J.S., Carlin, B.P., Reich, A.M.: A spatial analysis of basketball shot chart data. Am. Stat. 60, 3–12 (2006)
50. Rottenberg, S.: The baseball players’ labor market. J. Polit. Econ.
64, 242–258 (1956)
51. Silver, N.: The Signal and the Noise: Why so Many Predictions
Fail-but Some Don’t. Penguin, London (2012)
52. Shapiro, S.L., Drayer, J.: An examination of dynamic ticket pricing
and secondary market price determinants in Major League Baseball. Sport Manag. Rev. 17, 145–159 (2014)
53. Staw, B.M., Hoang, H.: Sunk costs in the NBA: why draft order
affects playing time and survival in professional basketball. Adm.
Sci. Q. 40, 474–494 (1995)
54. Taylor, B.A., Trogdon, J.G.: Losing to win: tournament incentives
in the National Basketball Association. J. Labor Econ. 20, 23–41
(2002)
55. Tversky, A., Kahneman, D.: Judgment under uncertainty: heuristics
and biases. Science 185, 1124–1131 (1974)
56. Ulmer, B., Fernandez, M., Peterson, M.: Predicting Soccer Match
Results in the English Premier League. Ph.D. dissertation,
Stanford (2013)
57. Walker, M., Wooders, J.: Minimax play at Wimbledon. Am. Econ.
Rev. 91, 1521–1538 (2001)
