Data Visualisation
Sara Miller McCune founded SAGE Publishing in 1965 to support
the dissemination of usable knowledge and educate a global
community. SAGE publishes more than 1000 journals and over
800 new books each year, spanning a wide range of subject areas.
Our growing selection of library products includes archives, data,
case studies and video. SAGE remains majority owned by our
founder and after her lifetime will become owned by a charitable
trust that secures the company’s continued independence.
Los Angeles | London | New Delhi | Singapore | Washington DC | Melbourne
2nd Edition
Data Visualisation
A Handbook for Data Driven Design
Andy Kirk
SAGE Publications Ltd
1 Oliver’s Yard
55 City Road
London EC1Y 1SP
© Andy Kirk 2019
SAGE Publications Inc.
2455 Teller Road
Thousand Oaks, California 91320
Apart from any fair dealing for the purposes of research or
private study, or criticism or review, as permitted under the
Copyright, Designs and Patents Act, 1988, this publication
may be reproduced, stored or transmitted in any form, or by
any means, only with the prior permission in writing of the
publishers, or in the case of reprographic reproduction, in
accordance with the terms of licences issued by the Copyright
Licensing Agency. Enquiries concerning reproduction outside
those terms should be sent to the publishers.
SAGE Publications India Pvt Ltd
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road
New Delhi 110 044
First edition published 2016. Reprinted four times in 2016, twice
in 2017, three times in 2018, and three times in 2019.
SAGE Publications Asia-Pacific Pte Ltd
3 Church Street
#10-04 Samsung Hub
Singapore 049483
Editor: Aly Owen
Editorial assistant: Lauren Jacobs
Production editor: Ian Antcliff
Copyeditor: Neville Hankins
Proofreader: Christine Bitten
Indexer: David Rudeforth
Marketing manager: Susheel Gokarakonda
Cover design: Shaun Mercier
Typeset by: C&M Digitals (P) Ltd, Chennai, India
Printed in the UK
Library of Congress Control Number: 2018964578
British Library Cataloguing in Publication data
A catalogue record for this book is available from
the British Library
ISBN 978-1-5264-6893-2
ISBN 978-1-5264-6892-5 (pbk)
At SAGE we take sustainability seriously. Most of our products are printed in the UK using responsibly sourced
papers and boards. When we print overseas we ensure sustainable papers are used as measured by the PREPS
grading system. We undertake an annual audit to monitor our sustainability.
Contents
Acknowledgements
About the Author
Discover Your Textbook’s Online Resources
Introduction
PART A FOUNDATIONS
1 Defining Data Visualisation
2 The Visualisation Design Process
PART B THE HIDDEN THINKING
3 Formulating Your Brief
4 Working With Data
5 Establishing Your Editorial Thinking
PART C DEVELOPING YOUR DESIGN SOLUTION
6 Data Representation
7 Interactivity
8 Annotation
9 Colour
10 Composition
Epilogue
References
Index
Acknowledgements
I could not have written this book without the unwavering support of my wonderful wife, Ellie,
and my family. The book is dedicated to my inspirational Dad who sadly passed away before
its publication. I want to acknowledge the contributions of the thousands of data visualisation
practitioners who have created such a wealth of exceptional design work and smart writing. I
have been devouring this for over a decade now and I am constantly inspired by the talents
and minds behind it all. I also want to express my gratitude to the people and organisations
who have granted me permission to reference and showcase their visualisation work in this
book. Sincere thanks to the many people at Sage who have played a role in making this book
grow from the first proposal and now to a second edition. Finally, to you the readers, I am
hugely thankful that you chose to invest in this book. I hope it helps you in your journey to
learning about this super subject.
About the Author
Andy Kirk is a freelance data visualisation specialist based in Yorkshire, UK. He is a visualisation
design consultant, training provider, teacher, author, speaker, researcher and editor of the
award-winning website visualisingdata.com.
After graduating from Lancaster University in 1999 with a BSc (hons) in Operational Research,
Andy’s working life began with a variety of business analysis and information management
roles at organisations including CIS Insurance, West Yorkshire Police and the University of
Leeds.
He discovered data visualisation in early 2007, when it was lurking somewhat on the fringes of
the Web. Fortunately, the timing of this discovery coincided with his shaping of his Master’s
(MA) degree research proposal, a self-directed research programme that gave him the opportunity to unlock and secure his passion for the subject.
He launched visualisingdata.com to continue the process of discovery and to chart the course
of the increasing popularity of the subject. Over time, this award-winning site has grown to
become a popular reference for followers of the field, offering contemporary discourse, design
techniques and vast collections of visualisation examples and resources.
Andy became a freelance professional in 2011. Since then he has been fortunate to work with
a diverse range of clients across the world, including organisations such as Google, CERN,
Electronic Arts, the EU Council, Hershey and McKinsey. At the time of publication, he will have
delivered over 270 public and private training events in 25 different countries, reaching more
than 6000 delegates. Alongside his busy training schedule, Andy also provides design consultancy, his primary client being the Arsenal FC Performance Team, since 2015.
In addition to his commercial activities, he maintains regular engagements in academia.
Between 2014 and 2015 he was an external consultant on a research project called ‘Seeing
Data’, funded by the Arts & Humanities Research Council and hosted by the University of
Sheffield. This study explored the issues of data visualisation literacy among the general public
and, inter alia, helped to shape an understanding of the human factors that affect visualisation
literacy and the effectiveness of design.
Andy joined the highly respected Maryland Institute College of Art (MICA) as a visiting lecturer
in 2013 teaching a module on the Information Visualisation Master’s Programme through to
2017. From January 2016, he taught a data visualisation module as part of the MSc in Business
Analytics at the Imperial College Business School in London through to 2018. As of May 2019,
Andy has started teaching at University College London (UCL).
Discover Your Textbook’s Online Resources
Want more support around understanding and creating data visualisations? Andy Kirk is here
to help, offline and on!
Hosted by the author and with resources organized by chapter, the supporting website for this
book has everything you need to explore, practice, and hone your data visualisation skills.
• Explore the field: expand your knowledge and reinforce your learning about working
with data through libraries of further reading, references, and tutorials.
• Try this yourself: revise, reflect, and refine your skill and understanding about the challenges of working with data through practical exercises.
• See data visualisation in action: get to grips with the nuances and intricacies of working with data in the real world by navigating instalments of the narrative case study and
seeing an additional extended example of data visualisation in practice. Follow along with
Andy’s video diary of the process and get direct insight into his thought processes, challenges, mistakes, and decisions along the way.
• Chartmaker directory: access crowd-sourced guidance that aims to answer the crucial
question ‘which tools make which charts?’ with this growing directory of examples and
technical solutions for chart building.
Ready to learn more? Go beyond the book and dive deeper into data visualisation via the rest
of Andy’s website (www.visualisingdata.com), which contains data visualisation tools
and software, links to influential further reading, and a blog with monthly
collections of the best data visualisation examples and resources.
Introduction
The primary challenge one faces when writing a book about data visualisation is to determine
what to leave in and what to leave out. Data visualisation is a big subject. There is no single
book to rule it all because there is no one book that can truly cover it all. Each and every one
of the topics covered by the chapters in this book could (and, in several cases, do) exist as books
in their own right.
The secondary challenge when writing a book about data visualisation is to decide how to
weave the content together. Data visualisation is not rocket science; it is not an especially
complicated discipline, though it can be when working on sophisticated topics and with
advanced applications. It is, however, a complex subject. There are lots of things to think about,
many things to do and, of course, things that will need making. Creative and journalistic
sensibilities need to blend harmoniously with analytical and scientific judgement. In one
moment, you might be checking the statistical rigour of an intricate calculation, in the next
deciding which shade of orange most strikingly contrasts with a vibrant blue. The complexity
of data visualisation manifests in how the myriad small ingredients interact, influence and
intersect to form a whole.
The decisions I have made when formulating this book’s content have been shaped by my own
process of learning. I have been researching, writing about and practising data visualisation for
over a decade. I believe you only truly learn about your own knowledge of a subject when you
have to explain it and teach it to others. To this extent I have been fortunate to have had
extensive experience designing and delivering commercial training as well as academic teaching.
I believe this book offers an effective and proven pedagogy that successfully translates the
complexities of this subject into a form that is fundamentally useful. I feel well placed to bridge
the gap between the everyday practitioners, who might identify themselves as beginners, and
the superstar talents expanding the potential of data visualisation. I am not going to claim to
belong to the latter cohort, but I have certainly been a novice, taking tentative early steps into
this world. Most of my working hours are spent helping others start their journey. I know what
I would have valued when I started out in this field and this helps inform how I now pass this
on to others in the same position I was several years ago.
There is a large and growing library of fantastic books offering different theoretical and
practical viewpoints on this subject. My aim is to add value to this existing collection by
approaching the subject through the perspective of process. I believe the path to mastering data
visualisation is achieved by making better decisions: namely, effective choices, efficiently made.
I will help you understand what decisions need to be made and give you the confidence to
make the right choices. Before moving on to discuss the book’s intended audience, here are its
key aims:
• To challenge your existing approaches to creating and consuming visualisations. I will
challenge your beliefs about what you consider to be effective or ineffective visualisation. I
will encourage you to eliminate arbitrary choices from your thinking, rely less on taste and
instinct, and become more reasoned in your judgements.
• To enlighten you, I will increase your awareness of the possible approaches to visualising
data. This book will broaden your visual vocabulary, giving you a wider and more sophisticated understanding of the contemporary techniques used to express your data visually.
• To equip is to provide you with robust tactics for managing your way through the myriad
options that exist in data visualisation. To help you overcome the burden of choice, an
adaptable framework is offered to help you think for yourself, rather than relying on inflexible rules and narrow instruction.
• To inspire is to open the door to a subject that will stimulate you to elevate your ambition
and broaden your confidence. Developing competency in data visualisation will take time
and will need more than just reading this book. It will require a commitment to embrace,
through practice, the obstacles that each new data visualisation opportunity poses. It will
require persistence to learn, apply, reflect and improve.
Who Is This Book Aimed At?
Anyone who has reason to use quantitative and qualitative methods in their professional or
academic duties will need to grasp the demands of data visualisation. Whether this is a large
part of your duties or just a small part, this book will support your needs.
The primary intended audiences are undergraduates, postgraduates and early-career researchers.
Although aimed at those in the social sciences, the content will be relevant to readers from
across the spectrum of arts and humanities right through to the natural sciences.
This book is intended to offer an accessible route for novices to start their data visualisation
learning journey and, for those already familiar with the basics, the content will hopefully
contribute to refining their capabilities. It is not aimed at experienced or established visualisation
practitioners, though there may be some new perspectives to enrich their thinking: some content
will reinforce existing knowledge, other content might challenge their convictions.
The people who are active in this field come from all backgrounds. Outside academia, data
visualisation has reached the mainstream consciousness in professional and commercial
contexts. An increasing number of professionals and organisations, across all industry types
and sizes, are embracing the importance of getting more value from their data and doing more
with it, for both internal and external benefit. You might be a market researcher, a librarian or
a data analyst looking to enhance your data capabilities. Perhaps you are a skilled graphic
designer or web developer looking to take your portfolio of work into a more data-driven
direction. Maybe you are in a managerial position and though not directly involved in the
creation of visualisation work, you might wish to improve the sophistication of the language
you use to coordinate or commission others who are. Everyone needs the lens and vocabulary to
evaluate work effectively.
Data visualisation is a genuinely multidisciplinary discipline. Nobody arrives fully formed with
all constituent capabilities. The pre-existing knowledge, skills or experiences which, I think,
reflect the traits needed to get the most out of this book would include:
• Strong numeracy is necessary as well as a familiarity with basic statistics.
• While it is reasonable to assume limited prior knowledge of data visualisation, there should
be a strong desire to learn it. The demands of learning a craft like this take time
and effort; the capabilities will need nurturing through ongoing learning and practice.
They are not going to be achieved overnight or acquired from reading this book alone.
Any book that claims to be able magically to inject mastery through just reading it cover to
cover is over-promising and likely to under-deliver.
• The best data visualisers possess inherent curiosity. You should be the type of person who
is naturally disposed to question the world around them. Your instinct for discovering and
sharing answers will be at the heart of this activity.
• There are no expectations of your having any prior familiarity with design principles, but
an appetite to embrace some of the creative aspects presented in this book will heighten the
impact of your work. Time to unleash that suppressed imagination!
• If you are fortunate enough to possess a strong creative flair already, this book will guide
you through when and, crucially, when not to tap into this sensibility. You should be willing
to increase the rigour of your analytical decision making and be prepared to have your
creative thinking informed more fundamentally by data rather than just instinct.
• No particular technical skills are required to get value from this book, as I will explain
shortly. But you will ideally have some basic knowledge of spreadsheets and experience of
working with data, regardless of the particular tool.
This is a portable practice involving techniques that are subject-matter agnostic. Throughout
this book you will see a broad array of examples from different industries covering many
different topics. Do not be deterred by any example being about a subject different to your
own area of interest. Look beyond the subject and you will see analytical and design choices
that are just as applicable to you and your work: a line chart showing political forecasts
involves the same thought process as would a line chart showing stock prices changing or
average global temperatures rising. A line chart is a line chart, regardless of the subject
matter.
The type of data you are working with is the only legitimate restriction to the design methods
you might employ, not your subject and certainly not traditions in your subject. Disregard
claims such as ‘waterfall charts are only for people in finance’, ‘maps are only for cartographers’ or ‘Sankey diagrams are
only for engineers’. Enter this subject with an open mind, forget what you believe or have been
told is the normal approach, and your capabilities will be expanded.
Data visualisation is an entirely global community, not the preserve of any geographic region.
Although the English language dominates written discourse, the interest in the subject and
work created from studios through to graphics teams originates everywhere. There are cultural
influences and different flavours in design sensibility around the world which enrich the field
but, otherwise, it is a practice common and accessible to all.
Finding the Balance
Handbook vs Manual
The description of this book as a ‘handbook’ positions it as distinct from a tutorial-based manual. It aims to offer conceptual and practical guidance, rather than technical instruction. Think
of it more as a guidebook for a tourist visiting a city than an instruction manual for how to fix
a washing machine.
Apart from a small proportion of visualisation work that is created manually, the reliance on
technology to create visualisation work is an inseparable necessity. For many beginners in
visualisation there is an understandable appetite for step-by-step tutorials that help them
immediately to implement their newly acquired techniques.
However, writing about data visualisation through the lens of selected tools is hard, given the
diversity of technical options that exist in the context of such varied skills, access and needs.
The visualisation technology space is characterised by flux. New tools are constantly
emerging to supplement the many that already exist. Some are proprietary, others are open
source; some are easier to learn but do not offer much functionality; others do offer rich
potential but require a great deal of foundation understanding before you even accomplish
your first bar chart. Some tools evolve to keep up with current techniques; they are well
supported by vendors and have thriving user communities, others less so. Some will exist as
long-term options whereas others are deprecated. Many have briefly burnt brightly but quickly
become obsolete or have been swallowed up by others higher up the food chain. Tools come
and go but the craft remains.
There is a role for all book types and a need for more than one to acquire true competency in
a subject. Different people want different sources of insight at different stages in their
development. If you are seeking a text that provides instructive tutorials, you will learn from
it how to accomplish technical developments in a given technology. However, if you only
read tutorial-based books, you will likely fall short in the fundamental critical thinking that will
be needed to harness data visualisation as a skill.
I believe a practical, rather than technical, text focusing on the underlying craft of data
visualisation through a tool-agnostic approach offers the most effective guide to help people
learn this subject.
The content of this book will be relevant to readers regardless of their technical knowledge and
experience. The focus will be to take your critical thinking towards a detailed, fully reasoned
design specification – a declaration of intent of what you want to develop. Think of the
distinction as similar to that between architecture (design specification) and engineering
(design execution).
There is a section in Chapter 3 that describes the influence technology has on your work and
the places it will shape your ambitions. Furthermore, among the digital resources offered online
are further profiles of applications, tools and libraries in common use in the field today and a
vast directory of resources offering instructive tutorials. These will help you to apply technically
the critical capabilities you acquire throughout this book.
Useful vs Beautiful
Another important distinction to make is that this book is not intended to be seen as a beauty
pageant. I love flicking through glossy ‘coffee table’ books as they offer great inspiration, but
often lack substance beyond the evident beauty. This book serves a different purpose to that.
I believe, for a beginner or relative beginner, the most valuable inspiration comes more from
understanding the thinking behind some of the amazing works encountered today, learning
about the decisions that led to their conceptual development.
My desire is to make this the most useful text available, a reference that will spend more time
on your desk than on your bookshelf. To be useful is to be used. I want the pages to be dog-eared. I want to see scribbles and annotated notes made across its pages and key passages
underlined. I want to see sticky labels peering out above identified pages of note. I want to see
creases where pages have been folded back or a double-page spread that has been weighed
down to keep it open. It will be an elegantly presented and packaged book, but it should not
be something that invites you to look but not touch.
Pragmatic vs Theoretical
The content of this book has been formed through years of absorbing knowledge from as
many books as my shelves can hold, generations of academic work, endless web articles,
hundreds of conference talks, personal interactions with the great and the good of the
field, and lots and lots of practice. More accurately, lots and lots of mistakes. What I present here is a pragmatic distillation of what I have learned and feel others will benefit from
learning too.
It is not a deeply academic or theoretical book. Experienced or especially curious practitioners
may have a desire for deeper theoretical discourse, but that is beyond the intent of this
particular text. You have to draw a line somewhere to determine the depth you can reasonably
explore about a given topic. Take the science of visual perception, for example, arguably the
subject’s foundation. There is no value in replicating or attempting to better what has already
been covered by other books to a higher standard than I could achieve.
An important reason for giving greater weight to pragmatism is the inherent
imperfections of this subject. Although there is so much important empirical thinking in this
subject, the practical application can sometimes fail to translate beyond the somewhat artificial
context of a research study. Real-world circumstances and the strong influence of human
factors can easily distort the significance of otherwise robust concepts.
Critical thinking will be the watchword, equipping you with the independence of thought
to decide rationally for yourself which solutions best fit your context, your data,
your message and your audience. To accomplish this, you will need to develop an
appreciation of all the options available to you (the different things you could do) and a
reliable approach for critically determining what choices you should make (the things you
will do and why).
Contemporary vs Historical
I have huge respect for the ancestors of this field, the dominant names who, despite primitive
means, pioneered new concepts in the visual display of statistics to shape the foundations of
the field being practised today. The field’s lineage is decorated by pioneers such as William
Playfair, W. E. B. Du Bois, Florence Nightingale and John Snow, to name but a few. To many
beginners in the field, the historical context of this subject is of huge interest. However, this
kind of content has already been covered by plenty of other book and article authors.
I do not want to bloat this book with the unnecessary reprising of topics that have been covered
at length elsewhere. I am not going to spend time attempting to enlighten you about how we
live in the age of ‘Big Data’ and how occupations related to data are or will be the ‘sexiest jobs’
of our time. The former is no longer news, the latter claim emerged from a single source. There
is more valuable and useful content I want you to focus your time on.
The subject matter, the ideas and the practices presented here will hopefully not date a great
deal. Of course, many of the graphic examples included in the book will be surpassed by newer
work demonstrating similar concepts as the field continues to develop. However, their worth
as exhibits of a particular perspective covered in the text should prove timeless. As time passes
there will be new techniques, new concepts and new, empirically evidenced rules. There will be
new thought-leaders, new sources of reference and new visualisers to draw insight from. Things
that prove a manual burden now may become seamlessly automated in the near future. That is
the nature of a fast-growing field.
Analysis vs Communication
A further distinction to make concerns the subtle but critical difference between visualisation
used for analysing data and visualisation used for communicating data.
Before a visualiser can confidently decide what to communicate to others, he or she needs to
have developed an intimate understanding of the qualities and potential of the data. In certain
contexts, this might only be achieved through exploratory data analysis. Here, the visualiser
and the viewer are the same person. Through visual exploration, interrogations of the data can
be conducted to learn about its qualities and to unearth confirmatory or enlightening
discoveries about what insights exist.
Visualisation for analysis is part of the journey towards creating visualisation for
communication, but the techniques used for visual analysis do not have to be visually
polished or necessarily appealing. They are only serving the purpose of helping you truly
to learn about your data. When a data visualisation is being created to communicate to
others, many careful considerations come into play about the requirements and interests of
the intended audience. This influences many design decisions that do not arise with
visual analysis alone.
For the scope of this book the content is weighted more towards methods and concerns about
communicating data visually to others. If your role is concerned more with techniques for
exploratory analysis rather than visual communication, you will likely require a deeper
treatment of the topic than this book can reasonably offer.
Another matter to touch on here concerns the coverage of statistics, or lack thereof. For many
people, statistics can be a difficult topic to grasp. Even for those who are relatively numerate
and comfortable working with simple statistical methods, it is quite easy to become rusty
without frequent practice. The fear of making errors with intricate statistical calculations
depresses confidence and a vicious circle begins.
You cannot avoid the need to use some statistical techniques if you are going to work with data.
I will describe some of the most relevant statistical techniques in Chapter 4, at the point in your
thinking where they are most applicable. However, I do believe the range and level of statistical
techniques most people will need to employ on most of their visualisation tasks can be
overstated. I know there will be exceptions, and a significant minority will
require advanced statistical thinking in their work.
It all depends, of course. In my experience, however, the majority of data visualisation
challenges will generally involve relatively straightforward univariate and bivariate statistical
techniques to describe data. Univariate techniques help you to understand the shape, size and
range of a single variable of data, such as determining the minimum, maximum and average
height of a group of people. Bivariate techniques are used to observe possible relationships
between two different variables. For example, you might look at the relationship between gross
domestic product and medal success for countries competing at the Olympics. You may also
encounter visualisation challenges that require a basic understanding of probabilities to assist
with forecasting risk or modelling uncertainty.
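As a rough illustration of the univariate and bivariate techniques just described, the short Python sketch below summarises a single variable (minimum, maximum and average height) and then measures the strength of a linear relationship between two variables using the Pearson correlation coefficient. This is only one of many ways to do it: the function names `describe` and `pearson_r` are hypothetical, and all of the figures are invented purely for illustration; they are not real heights, GDP values or medal counts.

```python
from math import sqrt
from statistics import mean


def describe(values):
    """Univariate summary: minimum, maximum and mean of a single variable."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def pearson_r(xs, ys):
    """Bivariate summary: Pearson correlation between two paired variables."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired sequences of equal length (2 or more)")
    mean_x, mean_y = mean(xs), mean(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented heights in centimetres for a group of five people.
heights_cm = [158, 163, 170, 175, 184]
summary = describe(heights_cm)

# Invented GDP figures (billions) and medal counts for five imaginary countries.
gdp = [50, 200, 450, 1200, 3000]
medals = [1, 4, 9, 20, 46]
r = pearson_r(gdp, medals)

print(summary)
print(round(r, 3))
```

A coefficient close to 1 or -1 suggests a strong linear relationship and a value near 0 suggests little or none, but it says nothing about causation; in practice a scatter plot of the two variables would usually accompany the number.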
The more advanced applications of statistics will be required when working with larger
complicated datasets, where multivariate techniques are employed simultaneously to model the
significance of relationships between multiple variables. Above and beyond that, you are
moving towards advanced statistical modelling and algorithm design.
Though it may seem unsatisfactory to offer little coverage of this topic, there is no value in
reinventing the wheel. There are hundreds of existing books better placed to offer the depth
you might need. That statistics is such a prolific and vast field in itself further demonstrates
how deeply multidisciplinary a field visualisation truly is.
Chapter Contents
The book is organised into three main parts (A, B and C) comprising ten chapters and an
Epilogue. Each chapter opens with a preview of the content to be covered and closes with a
summary of the most salient learning points to emerge. There are collections of further
resources available online to substantiate the learning from each chapter.
For most readers, especially beginners, it is recommended that you start from the beginning
and proceed through each chapter as presented. For those setting out to begin working on their
own visualisation, you might jump straight into Chapters 2–5 to ensure you are fully prepared
for some of the important preparatory activities you need to accomplish before moving on to
look at developing your design solution. For those with more experience and/or prior exposure
to this subject, who are perhaps looking to fine-tune specific aspects of their design skills, most
of your interest will lie in Part C, comprising Chapters 6–10. For readers who just want to dip
in and out of specific topic areas, although each chapter builds sequentially from the preceding
ones, they can all be read in isolation. Follow any sequence that satisfies your needs. The
coloured tabs on the outer edge will provide quick visual navigation through the distinct parts
and chapters within.
Part A: Foundations
Part A introduces some important foundational understanding about data visualisation as a
subject area and as an activity. The contents of the first two chapters give shape to the coverage
across the rest of the book.
Chapter 1 ‘Defining Data Visualisation’ will be the logical starting point for those who are
new to the field, providing a definition for the subject and exploring some of the tensions that
enrich this subject. The second section explains some of the distinctions and overlaps with
other related disciplines. If you already know what data visualisation is about, you might
choose to pass on this; it does, though, help frame many of the discussions elsewhere.
Chapter 2 ‘The Visualisation Design Process’ introduces the value of following a design
process, the sequence of activities around which the book’s contents in Parts B and C are
organised. It explains what is involved and offers some useful tips to help you seamlessly
adopt this approach. Where the process offers organisation and efficiency, design principles ensure effectiveness. The second section will describe what separates the good from
the bad in visualisation design, building up your convictions to help with your upcoming
decision making.
Part B: The Hidden Thinking
Part B profiles the first three stages of the data visualisation design process. These are the hidden preparatory stages that will significantly influence the path you take towards an eventual
solution.
Chapter 3 ‘Formulating Your Brief’ covers the opening tasks involved in initiating, defining
and planning the requirements of your work. The first section looks at issues around context,
specifically about the importance of defining curiosity and identifying the circumstances that
will shape your project. The second section considers the vision of your work, looking at what
purpose it intends to serve and how you might creatively define the type of work you will need
to pursue. Finally, a short section looks at the value of harnessing initial ideas.
Chapter 4 ‘Working With Data’ commences your practical involvement with your data,
stepping through the four distinct steps that acquaint you with the potential of your
critical raw material. Data acquisition outlines the different origins of and methods for
obtaining your data. Data examination profiles the different characteristics that define
the type, extent and condition of your data. Data transformation builds on your examination work to find ways of modifying and enhancing your data to prepare it for use.
Finally, data exploration discusses methods for discovering more about the qualities and
insights hidden away in your data.
Chapter 5 ‘Establishing Your Editorial Thinking’ reflects on the possibilities offered by your
data and explains the importance of committing to an editorial path. The chapter opens with
a definition about the influence of editorial thinking, using two case studies to explain how
editorial definitions influence design choices later in the process.
Part C: Developing Your Design Solution
Part C represents the main part of this book and covers the five distinct layers of the data visualisation anatomy. They are presented in separate chapters to help organise your thinking and
to avoid being overwhelmed by the detailed options that exist. However, they are ultimately
interrelated matters and the chapter sequencing across this part is carefully arranged to support
this. Each chapter follows a similar structure, opening with an array of different possible design
options and supplemented by guidance on the factors that will influence your choices. Initially,
you will need to make decisions about what elements to include around data representation
(charts), interactivity and annotation. You will then complete your thinking about the appearance of these elements, through colour and composition.
Chapter 6 ‘Data Representation’ introduces the act of visual encoding and then expands on
this to provide a detailed profile of 49 distinct chart types to help broaden your visual vocabulary. The chapter closes with a run through the key factors that will influence the suitability of
your data representation choices.
Chapter 7 ‘Interactivity’ introduces the potential value of incorporating interactive features in
your work, profiling a wide range of options – such as filtering, highlighting and animating –
that will enable users to interrogate and control a visualisation. The chapter closes with the
main considerations that will influence your selection of interactive features.
Chapter 8 ‘Annotation’ describes the importance of providing useful assistance to your viewers, including headings, chart apparatus, and labels. The chapter closes with a look at which
factors will inform the choices you make.
Chapter 9 ‘Colour’ commences with an overview of different colour models. This provides the
basis for understanding the different ways of applying colour to facilitate data legibility and
deliver functional decoration. Once again, having introduced the options, we will look at how
you arrive at appropriate choices.
Chapter 10 ‘Composition’ explores the final element of developing your design solution concerning how you organise the placement and sizing of all your visual elements within the space
you have to work. Looking at matters of layout, arrangement and chart sizing, we will then
wrap up this topic with a discussion about how to make your decisions.
Epilogue: To close the book, the epilogue will summarise the development cycle of activities
you will need to undertake as you move your detailed design specification to a fully executed
solution.
Digital Resources
The opportunity to supplement the print version of this book with further digital companion
resources helps to offer readers a range of additional learning materials:
• a written and video-based case-study of a visualisation project that demonstrates the design
process in action;
• an extensive and up-to-date catalogue of over 350 data visualisation tools;
• a large collection of tutorials and resources to help develop your technical capabilities in
making a wide range of different charts;
• useful exercises designed to help embed the learning covered in each chapter;
• a digital gallery of all the artwork included in this book and many further examples of the
concepts presented across all chapters;
• refreshed reading resources to support ongoing learning about the subjects covered in each
chapter.
Glossary
Consistency in the meaning of language and terms used in data visualisation is important.
Though data visualisation is no different to many fields that get bogged down by superfluous
semantic noise, it can only help to establish clarity about how terms are used in this book at least.
Roles
Visualiser: This is the role I am assigning to you – the person making the visualisation.
Sometimes people prefer to use terms like researcher, analyst, developer, storyteller or even
‘visualist’. Designer would also be particularly appropriate, but I want to broaden the scope of
the role beyond just design to cover all activities involved in this discipline.
Viewer: This is the role assigned to the recipient, who is viewing or using your visualisation
product. It offers a broader and better fit than alternatives such as consumer, reader, user or
customer. However, ‘user’ will be temporarily adopted during the more active chapter about
interactivity.
Audience: This concerns the collective group of viewers for whom your work is intended.
Within an audience there will be cohorts of different viewer types that you might characterise
through distinct personas to help your thinking about serving their varied needs.
Consuming: This will be the general act of the viewer, to consume. I will use more active
descriptions like ‘reading’ and ‘using’ when consuming becomes too passive or vague, and
when distinctions are needed between reading a chart and using interactive features.
Data
Raw data: For the purpose of this book, raw data will be the initial state of data you have
collected, received or downloaded that has not yet been subjected to any statistical or transforming treatment. Some people take issue with the ‘rawness’ this label implies, given
that data will have already lost its raw state having been recorded by some instrument, stored,
retrieved and maybe cleaned already. I appreciate this viewpoint but think it is the most pragmatic label relevant to most people’s understanding.
Data source: This is the term used to describe the origin(s) of the raw data used in a
visualisation.
Dataset: A table of data is an array of values visually arranged into rows and columns, usually
existing in a spreadsheet or database. The rows are the records – instances or items – and the
columns are the variables – details about the items. Datasets are visualised in order to ‘see’ the
size, patterns and relationships that are otherwise hard to observe. A dataset may comprise one
or a collection of several tables.
Tabulation: For the purpose of this book, I distinguish between types of datasets that are ‘normalised’ and others that are ‘cross-tabulated’. This distinction will be explained in context in
Chapter 4.
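As a rough illustration of this distinction, the sketch below reshapes one form into the other. It assumes that ‘normalised’ means one row per record and that ‘cross-tabulated’ means a grid in which one variable’s categories are spread across the columns; the fuller explanation in Chapter 4 may differ in detail. The function name and all sample values are hypothetical.

```python
from collections import defaultdict


def cross_tabulate(records, row_key, col_key, value_key):
    """Pivot normalised records (one dict per record) into a nested dict
    keyed first by the row variable and then by the column variable."""
    table = defaultdict(dict)
    for record in records:
        table[record[row_key]][record[col_key]] = record[value_key]
    return dict(table)


# Hypothetical normalised data: one row per month-and-channel record.
normalised = [
    {"month": "Jan", "channel": "Store", "share": 70},
    {"month": "Jan", "channel": "Online", "share": 25},
    {"month": "Jan", "channel": "Telephone", "share": 5},
    {"month": "Feb", "channel": "Store", "share": 66},
    {"month": "Feb", "channel": "Online", "share": 30},
    {"month": "Feb", "channel": "Telephone", "share": 4},
]

# Cross-tabulated form: months down the rows, channels across the columns.
cross_tab = cross_tabulate(normalised, "month", "channel", "share")
```

Both forms hold the same values; the normalised form tends to be easier for software to process, while the cross-tabulated form is often easier for a person to scan.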
Data types: The variables (columns) in a table that hold details about items (records) will have
different scales of measurement or data types. At the most general level, distinctions between quantitative (e.g. salary) and categorical (e.g. gender) data are important in how you will statistically
and visually handle them. A detailed distinction between data types, with examples, will again
be offered in Chapter 4.
Series: A series of values is essentially a sequence of related values in a table. An example of a
series would be the highest recorded temperatures in a city for each day over a month. Though
individual daily values will be stored as distinct moment-in-time measurements, the activity of
temperature never stops ‘happening’ and therefore the collected values have a legitimate continuous relationship through the series.
Visualisation
Project: For the purpose of this book, we will consider the development of a data visualisation
as being a project. Even though you might consider something a quick, small task, it will still
need to involve the thinking consistent with the stages of the process covered in this book.
Chart type: Charts are visual representations of data. There are many ways of representing your data, using different combinations of marks, attributes, layouts and apparatus.
Their combinations form archetypes of charts more commonly named chart types, such as the
bar chart, dendrogram or treemap.
Graphs, plots, diagrams and maps: Traditionally the term graph has been used to describe
visualisations that display network relationships, while chart would be commonly used to label
common devices like the bar or pie chart. Plots and diagrams are more specifically attached to
special types of displays but with no pattern of consistency in their usage. All these terms are so
interchangeable that any energy expended in explaining meaningful difference is redundant. For
the purpose of this book, I will generally stick to the term chart to act as the main label to cover
all representation types. In places, this ‘umbrella’ term will incorporate thematic maps, for the
sake of convenience, even though they clearly have a visual structure that is quite different to
standard charts.
Graphic: The term graphic will be used when referring to visuals more focused on information-led displays such as explanation or process diagrams as distinct from charts that are
concerned with data-driven visuals. It might also be used to refer more broadly to a visualisation that incorporates charts, text and images.
Format: This concerns the difference in output form between printed work, digital work and
physical visualisation work.
Functionality: This concerns the difference in whether a visualisation is static or interactive.
Interactive visualisations allow you to manipulate and interrogate a computer-based display of
data. They are published on the Web, exist within apps, or are on larger digital displays, as in
galleries. In contrast, a static visualisation displays a non-changeable, still display of data that
could be published in print but also digitally. Just because something is published digitally does
not automatically make it interactive.
Axes: Many common chart types have axis lines that provide a reference for measuring quantitative values or positioning categorical values. The horizontal axis is known as the x-axis and
the vertical axis is known as the y-axis.
Scales: Scales exist in two forms, typically. Firstly, as a set of marks along an axis that indicate
positions for the range of values included in a chart. Scales are normally presented in regular
intervals (10, 20, 30, etc.) representing units of measurement, such as prices, distances, years or
percentages. A scale may also be presented in a key to explain associations between, for example, different sizes of areas or classifications of different colour attributes.
Legend: Charts that employ visual attributes, such as colours, shapes or sizes to represent values of data, will often be accompanied by a legend to house visual explanations of classifications,
known as keys.
Outliers: Outliers are points of data that are outside the normal range of values. They are
the unusually large or small or simply different values that stand out and generally draw a
viewer’s attention.
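The definition above is deliberately loose about what counts as ‘outside the normal range’. One common convention, used here purely as an assumption and not as this book’s method, flags any value lying more than 1.5 times the interquartile range beyond the quartiles. The function name and sample values are hypothetical.

```python
from statistics import quantiles


def find_outliers(values, k=1.5):
    """Return the values lying more than k interquartile ranges
    below the first quartile or above the third quartile."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < lower or v > upper]


# Hypothetical daily values with one unusually large observation.
daily_values = [12, 14, 15, 15, 16, 17, 18, 95]
outliers = find_outliers(daily_values)
```

A rule like this only nominates candidates; whether an unusual value is an error or a genuinely interesting feature remains a judgement about the subject.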
Correlation: This is a measure of the presence and extent of a mutual relationship between
two or more variables of data. For example, you would expect to see a correlation between the
height and weight of people or age and salary of workers. Devices like scatter plots, in particular, help visually to portray possible correlations between two quantitative values.
Part A
Foundations
1
Defining Data Visualisation
This opening chapter will introduce data visualisation through the prism of a proposed definition. Each component that forms this definition will be explored in depth to illustrate some of
the main characteristics and complexities of this subject.
The second part of the chapter will position data visualisation in the context of other related
disciplines or fields, explaining where overlaps or clear distinctions exist. Overall, this chapter
will seek to forge a shared understanding that will help set the tone and reasoning for the
structure of this book.
1.1 What Is Data Visualisation?
It is useful to commence this book with a definition of data visualisation (Figure 1.1). It helps
to ensure we (you the reader, me the writer) have a mutual understanding, from the outset,
about what is meant by data visualisation in the context of this text. The components of this
definition carve the subject into distinct perspectives around which the contents of this book
are organised.
Figure 1.1 A Definition
for Data Visualisation
Let me delve into this and describe the roles of and relationships between each component
expressed. I will also explain where and how these topics will be covered. Firstly, let’s look at data.
Data is names and amounts. It is groupings, descriptions and measurements. It is dates and
locations. It will be helpful for discussions in this book to think of data as being typically
structured in table form, with rows of records and columns of variables. Most data we
commonly encounter will exist in textual, numeric or a combined form, but it is also worth
noting the opportunities that increasingly exist for working with data assets in media forms
of images, audio and video.
In Chapter 4 you will learn about the importance of developing an intimate understanding of
your data to acquaint yourself fully with its properties, its condition and its qualities.
You will see that data is the fundamental element driving the decisions across this design
process. Without data there is no material to feed nor necessitate a visualisation. Conversely,
without visualisation the value of data can go unfulfilled. This is not to say we should always
visualise data, but in most circumstances we miss opportunities to harness the maximum
value of data if we do not.
To explain, here is a simple illustration. When data is presented in a table, it is a straightforward
task for a viewer to scan the rows and columns to seek out values of relevance or to discover
particular data points that trigger interest. For instance, by viewing the table in Figure 1.2 it
should prove quite simple to find out what the percentage share of online sales for Company X
was during April 2016. Now look for the percentage share of store sales during December 2011.
Figure 1.2 Proportion
of Sales % by Channel
Over Time
As a viewer your task is simply to find the relevant row and column intersection: look at the
value display and read it. The percentage share of online sales for Company X during April 2016
is 84, and for store sales during December 2011 it is 71.
To find which sales channel had the second largest percentage share of sales during August
2014, again just find the relevant row, compare the three quantitative values along that row,
and then determine which channel column contains the second-ranked amount. For this
month, the online channel, at 44, had the second largest percentage share of sales.
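The two reading tasks just described, looking up one value and ranking the values in one row, can be sketched in a few lines. Only the three values quoted in the text (84, 71 and 44) come from Figure 1.2; every other number below is a placeholder chosen so that each row totals 100, and the function names are hypothetical.

```python
# Rows are months, columns are sales channels (percentage share of sales).
# Only Online Apr 2016 (84), Store Dec 2011 (71) and Online Aug 2014 (44)
# are quoted in the text; the remaining values are placeholders.
table = {
    "Dec 2011": {"Store": 71, "Online": 24, "Telephone": 5},
    "Aug 2014": {"Store": 53, "Online": 44, "Telephone": 3},
    "Apr 2016": {"Store": 14, "Online": 84, "Telephone": 2},
}


def lookup(table, month, channel):
    """Find the row and column intersection and read the value."""
    return table[month][channel]


def ranked_channel(table, month, rank):
    """Return (channel, value) holding the given rank in a month's row,
    where rank 1 is the largest share."""
    ordered = sorted(table[month].items(), key=lambda item: item[1], reverse=True)
    return ordered[rank - 1]


online_apr_2016 = lookup(table, "Apr 2016", "Online")
second_aug_2014 = ranked_channel(table, "Aug 2014", 2)
```

Both tasks touch a single cell or a single row, which is why a table serves them well.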
The limitations of reading data when it is presented in this form emerge when we want to answer
broader questions: that is, enquiries that transcend the scope of an answer originating from a
single or small number of adjacent data points. From the same table, how easy do you find it to
identify the headline trends across each sales channel over the period of time displayed?
You can probably ascertain that the percentage share of sales for stores starts quite high then
drops to nothing, the percentage share of online sales starts quite low and then reaches the
100% maximum, and the percentage share of sales via telephone is consistently tiny.
Though it takes a while to study the values under each sales channel column in order to form
this summary observation, it is still possible. But what if your observations need to be formed
more quickly? What if you needed to know more about the localised patterns of ups and downs
within those global trends? What if you wanted to identify the first occasion when the
percentage share of online sales exceeded the percentage share of store sales? When was the last
occasion the percentage share of store sales exceeded that of online sales? During which periods
did the different sales channels experience the most accelerated upward or downward changes?
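Each of these questions requires comparing many values across rows and columns at once. The sketch below shows the kind of computation involved, using a short hypothetical series in place of the data in Figure 1.2; the function names and every value are assumptions made for illustration.

```python
# Hypothetical percentage shares for six consecutive months.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
store = [60, 55, 48, 52, 40, 30]
online = [35, 41, 49, 45, 57, 68]


def first_index_exceeding(a, b):
    """Index of the first period in which series a exceeds series b, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x > y:
            return i
    return None


def last_index_exceeding(a, b):
    """Index of the last period in which series a exceeds series b, or None."""
    result = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x > y:
            result = i
    return result


def largest_change(series):
    """Return (index, change) for the biggest period-on-period move,
    judged by absolute size; index refers to the later period."""
    changes = [(i, series[i] - series[i - 1]) for i in range(1, len(series))]
    return max(changes, key=lambda change: abs(change[1]))


first_online_lead = months[first_index_exceeding(online, store)]
last_store_lead = months[last_index_exceeding(store, online)]
steepest_online = largest_change(online)
```

Software can answer these questions precisely, but only once someone has thought to ask them; a chart lets the same features be noticed without being specified in advance.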
These are harder questions to answer efficiently and accurately from the data alone. This is
because synthesising observations from multiple values across different rows and columns to
perceive broader relationships fails to exploit fully the capabilities of our visual system – how
our eyes and mind work together to make sense of objects and patterns. To read values in
isolation, store them in our short-term memory and compare them in our head with other
isolated values is mentally challenging. It is not impossible, since we can still accomplish this
with just a table of data, but it will take an excessive amount of time and effort.
This workload will also only increase as the data grows in volume and complexity. For instance,
what if this table were 1000 rows deep and there were 20, 50 or 100 different columns to work
through? Or, what if the quantities had similar value sizes and more modest variation? How
easy would it then be to notice significant patterns?
The crux of all this is that we can look at data, but we cannot really see it. To see data, we need
to represent it in a different, visual form.
Returning to the definition, the term visual representation is arguably the quintessential
activity of data visualisation. Representation involves making decisions about how you are
going to portray your data visually so that the subject understanding it offers can be made
accessible to your audience. In simple terms, this is all about charts and the act of selecting the
right chart to show the features of your data that you think are most relevant.
The building blocks of any chart are marks and attributes. Marks can be points, lines or shapes
and they are used to represent items of data. An example of an item of data from the table in
Figure 1.2 would be the ‘percentage share of sales from stores during June 2014’. Not the value
itself, more the thing the value is about.
Attributes, sometimes described as channels, are visual variations of marks to represent the
values associated with each. These include properties such as different scales of size, colour or
position. If the item of data is ‘percentage share of sales from stores during June 2014’, an
attribute would be used to represent the associated value, in this case 72. If marks and attributes
are the ingredients, the different combinations used create different chart types – the recipes.
Figure 1.3 shows a chart of the data shown in the table from Figure 1.2. Here the data is
represented using a line chart, a common chart type used to show how quantitative values
change over time. In this case the items of data are represented by point marks, positioned at
the intersection of the relevant x and y positions for each reporting month and channel. The
attributes used here are, firstly, the connected lines that join the continuous series of values for
each channel and, secondly, the distinct colours applied to distinguish each line path and
associate them with their respective sales channel category.
Figure 1.3
Proportion of Sales Percentage by Channel over Time
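The relationship between an item of data, its mark and the attributes that encode its value can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not a description of how any charting tool works: the class, the scale function, the colour names and the pixel dimensions are all hypothetical, and only the value 72 is taken from the text.

```python
from dataclasses import dataclass


@dataclass
class PointMark:
    item: str     # what the mark is about (a channel in a given month)
    value: float  # the value associated with that item
    x: float      # horizontal position, encoding the month
    y: float      # vertical position, encoding the value
    colour: str   # colour, encoding the sales channel


def linear_scale(value, domain, pixel_range):
    """Map a value from a data domain onto a pixel range proportionally."""
    d0, d1 = domain
    r0, r1 = pixel_range
    return r0 + (value - d0) / (d1 - d0) * (r1 - r0)


# Hypothetical colour assignment for the three channels.
CHANNEL_COLOURS = {"Store": "purple", "Online": "orange", "Telephone": "grey"}


def encode(channel, month_index, n_months, value, width=600, height=300):
    """Create a point mark whose position and colour encode the data."""
    x = linear_scale(month_index, (0, n_months - 1), (0, width))
    # Screen coordinates grow downwards, so 100% maps to the top (y = 0).
    y = linear_scale(value, (0, 100), (height, 0))
    return PointMark(f"{channel}, month {month_index}", value, x, y,
                     CHANNEL_COLOURS[channel])


# The store share of 72 quoted earlier, placed at a hypothetical first month.
mark = encode("Store", 0, 13, 72)
```

Changing the recipe means changing these mappings: the same items and values could instead drive bar heights or areas, producing a different chart type from the same data.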
As a viewer, you scan this chart to form observations about the three sales channels individually
and then compare them with each other. The comparisons made between separate channels are
especially relevant for this data as the quantities shown are representative of parts of a 100%
whole. This means that at any given point along the timeline, the change in value for one
channel will have an effect on the values across the two others.
Consuming this data in chart form, as opposed to reading a table, enables a viewer to
process clusters of multiple data points simultaneously to identify the slopes and flats, the
peaks and troughs, as well as gaps and cross-overs between lines. Though the precision of
determining an individual data point (e.g. from the chart, what was the percentage share
of online sales during April 2016?) is slightly diminished compared with the ease of
performing the same task with data in table form, observations about the collective patterns
and relationships, in turn, become more precise. The story of the rise in dominance of
online sales and the related decline of store sales is immediately apparent, but what is
striking here is an intense pattern of ebb and flow during the time period of mid-2014 to
mid-2015, out of which the significant respective changes in trajectory of online and store
sales materialised and continue.
The chart has the same data as the table, but it is represented differently. Whether this chart
view is better than a table view depends on the purpose of your communication and the
needs of your audience. You do not chart data because you can; you do it because it provides
a window for seeing different features of data. We will explore the judgements you need to
make about what you want to show your audience in Chapter 5 and, in particular, in Chapter 6
where you will learn about the wide range of established chart types that are commonly used
by the visualisers of today. These charts vary in complexity and composition. Each is capable
of accommodating different types of data and portraying different types of analysis. This
chapter will broaden your visual vocabulary, giving you an appreciation of more ways to
express your data. It will also increase the sophistication of how you go about making
effective choices.
The next component of the definition is presentation, which concerns all the other design
decisions that make up the full anatomy of any visualisation. As this text is focused on creating
visualisation as a means for communicating to others, presentation concerns how we choose
to ‘package up’ a visualisation work to impart it to an audience, irrespective of the medium or
disseminating method.
Visual presentation includes design choices such as the possible application of interactivity,
features of annotation, all matters around colour usage, and the composition of the work.
Considering the line chart in Figure 1.3, if this was intended for the Web, you could
envisage interactivity being useful to offer tooltip details of value labels as you hover over
parts of each line. You could offer controls to modify the x-axis time range or filtering to
hide or show different lines of interest. There are some features of annotation already on
display with this chart, such as the title, the colour legend, and the x- and y-axis scales. You
could also add captions to provide explanations about some of the most noticeable patterns
in the data. As mentioned, colour is already used as an attribute, to distinguish the lines for
each sales channel category, but the application of colour extends across every visible
element including the background shading, gridline colours, and colouring of any text or
labels. Finally, composition relates to the size and placement of all design elements, like the
dimensions of the chart area, the alignment and size of the title, and the placement of the
axis labels.
The thinking that goes into designing the full anatomy of a visualisation – combining visual
representation and presentation – is inevitably interconnected. The selection of a chart type
inherently triggers a need to think about the space and place it will occupy on your screen or
page; a clickable interactive feature that reveals annotated captions requires careful thought
about how to style the text and what colours to use.
There are lots of seemingly small design decisions to make in visualisation, little things that add
up to having a big impact. During the early stages of learning this subject it is helpful to
partition your presentation thinking and tackle these design concerns as separate layers.
Chapters 7–10 will explore each of these design matters separately, but in sufficient depth,
profiling the options available and the factors that influence your decisions. As you gain
experience and assurance, the interrelated nature of the choices you make will become more
seamless and you will be stimulated by the depth of thinking demanded of you.
The final component of the definition expresses that data visualisation aims to facilitate
understanding. Everything in this book essentially boils down to helping you accomplish
this objective. We will deal with the term facilitate shortly, but let’s focus for now on the word
understanding.
The notion of understanding is quite broad. To best explain its relevance to data visualisation
requires us, again, to turn to the perspective of a viewer.
When consuming a visualisation, a viewer will go through a process of understanding involving
three phases: perceiving, interpreting and comprehending (Figure 1.4). These are not merely
synonyms; rather, they convey distinctions in cognitive focus.
Figure 1.4
The Three Phases of Understanding
For the benefit of this illustration we will consider them to occur in a linear sequence, with
successive phases being dependent on the preceding phase having been accomplished. To a
viewer, consciously trying both to understand a visualisation and to extract understanding from
a visualisation, these different phases will feel rather indiscernible. They might appear to occur
in parallel. Viewers are human – there are occasions when rapid interpretations of a chart’s
headline features are made before the whole content has had a chance to be perceived first.
Let’s look at the characteristics of and differences between these phases referring, initially, to
an example chart (Figure 1.5) that presents some headline statistics about footballer Lionel
Messi’s career with FC Barcelona.
The first phase is perceiving, and this concerns the act of reading a chart: ‘what do I see?’. A
viewer decodes how the data is represented to form initial observations about the main features
of the displayed data:
• What chart is being used?
• What items of data do the marks represent? What value associations do the attributes
represent?
• What range of values is displayed?
• Are the data and its representation trustworthy?
Figure 1.5 Lionel Messi: Games and Goals for FC Barcelona
Source: Data from transfermarkt.com
In the example we see a clustered bar chart showing quantitative values of pairs of categories
over time. This is a chart type I am familiar with and so I feel instantly at ease with the prospect
of consuming it.
I see time is plotted on the x-axis in years – or, more specifically, football seasons – and a shared
quantitative measure is on the y-axis. There are two distinct categories of bars for each season, with
the colour association explained by a key integrated into the title. The burgundy bars
show the games played in a season and the blue bars the number of goals scored. This title also
helps establish clarity about what the data is showing. As the representation method is understood,
initial observations begin to form about the main characteristics of the display:
• What features – shapes, patterns, differences or connections – are observable?
• Where are the largest, mid-sized and smallest values? (known as ‘stepped magnitude’
judgements).
• Where are the most and the least? Where is the average or normal? (‘global comparison’
judgements).
When scanning the chart, my eyes are drawn to the dominant bars in the middle and towards the
right of the display. I am particularly interested in the highest pair of bars in 2011/12. With assistance offered by the horizontal gridlines and axis labels I can perceive with reasonable confidence
that the highest number of goals scored was 73 and the most games played was 60. I can see that
the burgundy bars – showing games played – are relatively stable in size since around 2008/09, but
the blue bars are more erratic. The bar heights for both categories are much smaller the further left
the time series goes. Looking between the categories, there is no consistency in the relationship as
the burgundy bars are sometimes larger than their blue neighbours, sometimes smaller.
Interpreting, the second phase of understanding, translates these observations into
quantitative and/or qualitative meaning. Interpreting involves assimilating what you have
observed against what you know about the subject. What does what you have seen mean,
given the subject?
• What features – shapes, patterns, differences or connections – are interesting?
• What features are expected or unexpected?
• What features are important given the subject?
The task of drawing interpretations from the observations I made on the chart is helped considerably by my interest in and knowledge of football. I know that if a player is scoring more than
25 goals in a season this is very good, and to score over 35 is exceptional. To achieve 50, 60 or
indeed 70 goals in a season is frankly preposterous, especially at the highest level of the game. I
know it is rare for a player to be scoring at a ratio of greater than one goal per game played, so
the seasons where a blue bar exceeds the height of its burgundy neighbour represent a quite remarkable statistic. I could elaborate on some of the features I expected to (and do) see in this chart based
on knowing the periods when different managers were in charge of Barcelona, which other players were in the team, and how the team performed from one season to the next. I know what to
expect in terms of the classic shape of a footballer’s career arc and can map that onto Messi’s,
anticipating that at some point – but not yet – the classic rise, peak and plateau will inevitably be
followed by steady decline.
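The goals-per-game reasoning above can be checked with a line of arithmetic. What follows is a minimal sketch, not part of the original text: it uses only the two 2011/12 values perceived from the chart (73 goals, 60 games), and the function name `goals_per_game` is a hypothetical helper chosen for illustration.

```python
def goals_per_game(goals: int, games: int) -> float:
    """Return the scoring ratio (goals divided by games) for one season."""
    if games <= 0:
        raise ValueError("games must be a positive number")
    return goals / games


# The two values read from the chart for the 2011/12 season.
ratio_2011_12 = goals_per_game(goals=73, games=60)

print(f"2011/12: {ratio_2011_12:.2f} goals per game")  # prints 1.22

# A ratio above 1.0 corresponds to a blue (goals) bar that is taller
# than its burgundy (games) neighbour.
print("More than one goal per game:", ratio_2011_12 > 1.0)
```

The calculation itself is trivial. Judging whether a ratio of about 1.2 is remarkable still depends on the subject knowledge described above.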
As this commentary demonstrates, a viewer’s ability to perform rational interpretation will be
significantly determined by factors external to the visualisation itself. The degree of knowledge
viewers possess about the portrayed subject and their capacity to close a knowledge gap is
fundamental. To fulfil the perceiving of a chart, viewers need the context of scale; to fulfil the
interpreting of a chart, viewers need the context of subject. Furthermore, there is the matter of
willingness. At the time of consuming a visualisation, not everyone has the inclination to
engage with it, especially if they have no interest in a subject or if it has no immediate relevance
to their needs.
My connection with the subject of football helped me understand more about the meaning of
the features of data compared with other viewers who might possess no knowledge of the sport.
Switching the subject from football to a completely made-up topic, but using the same chart
with the same data, reinforces this. In Figure 1.6 we see a chart displaying data about the
sightings of Winglets and Spungles.
Figure 1.6 Total Sightings of Winglets and Spungles
I can still perceive the chart, observing the same features as I did when it was portraying Messi’s
quantities of games and goals, but as I have no knowledge of this subject I cannot interpret it.
I have no idea what Winglets and Spungles are, so I cannot form any reasonable sense of what
is interesting, surprising or important about the features of this display. My process of
understanding stops after the perceiving phase.
As this illustrates, any deficit in a viewer’s connection to a subject will fundamentally impede
progress towards performing interpretation. Additionally, this may heighten the risk of the
viewer drawing spurious or unsupported interpretations from a visual display.
In situations where a potential viewer might not possess sufficient knowledge of a subject, it
will require the visualiser to assist in bridging the gap between observation and meaning. This
can be achieved through simple design elements like the provision of captions, inclusion of
headlines and astute use of colour to create emphasis, for example. The viewer must then take
responsibility to learn from the assistance provided. As the purple colouring of the middle
phase circle shown in Figure 1.4 denotes, forming useful and reasonable interpretations is a
shared responsibility.
The final phase of understanding is comprehending, which is the consequence or reflective
legacy of the communication experience. The viewers now consider what the interpretations
mean to themselves. What can be inferred as being important to you about the interpretations
you have made?
• What has been learnt? Has it reinforced or challenged existing knowledge? Has it
enlightened you with new knowledge?
• What feelings have been stirred? Has the experience had an impact emotionally?
• What does one do with this understanding? Is it just knowledge acquired or something to
inspire action, such as making a decision or motivating a change in behaviour?
In my case, the outcome of the understanding achieved from the Messi chart is nothing too
dramatic or emotional. There is no direct action linked to it, rather I simply reflect on gaining
a heightened impression, formed out of this data, about how sensational a footballer he has
been and continues to be. Barcelona fanatics who watch him play every week will
have already formed this understanding. This information would only reaffirm what they
already knew. To others less familiar with the subject, it might be more enlightening, but only
if they had any requisite interest.
One person’s ‘wow’ is another person’s ‘I knew that’ is another person’s ‘I don’t care’. Even if
you have just two people in your target audience group, you have potentially two different
viewer profiles. We cannot always anticipate what they do not know, what they want to know
and what is the relevance to them of knowing something.
Visualising data is just an agent of communication and not a guarantor for what a viewer does
with the opportunity for understanding that is presented. There are different flavours of
comprehension, different consequences of understanding formed through this final phase. Many
visualisations will be created with the ambition simply to inform, like the Messi graphic achieved
for me, perhaps to add just an extra grain to the pile of knowledge held about a subject. Not every
visualisation exists to lead a viewer towards some Hollywood-esque moment of grand discovery,
surprising insight or life-changing decision. That is OK, though, as long as the outcome fits with
the intended purpose, something we will discuss in more depth in Chapter 3.
Once again, the association a viewer has with the subject portrayed will greatly influence this
comprehending phase. Returning to the data shown earlier about the percentage sales by channel
over time for Company X, let’s suppose this was a chart produced to assess the effectiveness of a
corporate strategy to consolidate operations towards an online-only sales model.
The outcome of the interpretations formed from this chart might be to draw the conclusion
that whatever actions were taken, they have succeeded. Depending on when the 100% online
sales target was expected, it may be that this chart demonstrates complete success. It might also
reveal belated success. Maybe the company was hoping for 100% online sales far sooner than
when they were achieved. Conversely, the analysis shown might reveal unexpected patterns of
sales. Online channels are clearly dominating, but what if the company is still maintaining the
expense of running stores, with staff costs and stock tied up in what appears to be an expired
model? There might be substantial costs assigned to telephone operators waiting for the phone
to ring in order to make a potential sale. But nobody is phoning, so maybe the company should
look at restructuring.
All these are reasonable avenues that comprehending this data could lead to. But, at this point,
I should reveal that the real context of this data actually had nothing to do with sales. That
subject was picked for illustration purposes, but the data was actually about something else.
Specifically, this was data about the ever-shifting forecasts during the night of the 2016 US
Election. The data values came from FiveThirtyEight, a respected website noted for its use of
statistical techniques to analyse and tell stories about elections and several other data-rich
subjects. The quantities relate to the ‘chances of winning the presidency’ forecasts for the two
main party candidates, as well as a residual ‘other result’ percentage to make up the 100%
aggregate. The temporal dimension concerned the times during the night of the election
(8 November 2016), when key results were declared, influencing the changes shown in the
forecasted outcome at each point.
Figure 1.7 shows the same chart as before, with the same quantities plotted and with the same
design, but now reflecting the true context of the subject matter, as indicated by the updated
title, colour key and x-axis scales.
Figure 1.7 Forecasted % Chance of Winning Presidency (US Election, 8 November 2016)
Irrespective of where you sit politically, the revised context of the data portrayed in this chart
will unquestionably change how you feel about what you now see. It is no longer just a routine
sales chart restricted in relevance to a small group of people at Company X. It is now a
visualisation about a momentous event in modern history, the outcome of which most people
on the planet have some connection with or awareness of.
There are consequences of emotion to consuming this data. Some will relive the wild jubilation
of their candidate’s unexpected victory, others will recoil in horror at the memory of their
candidate’s unexpected defeat.
There are consequences of enlightenment. Some will be seeing these compelling patterns of ebb
and flow for the first time, others will at least recollect this roller-coaster story playing out via
TV or web coverage during the night itself.
There are also rational reactions. Consuming this chart now, many months or years later,
offers the opportunity for more considered analysis, in contrast to the original setting of this
data being consumed live across the USA and the rest of the world via a dynamically – and
dramatically – changing forecast tracker. In the cold light of day questions can be asked (and
have been) about the rigour of polling methods as well as the calculations used to create such
forecasts. ‘How could they be so wrong?’ some have asked, while others have countered with
‘How could they be expected to be more right, it’s a complicated electoral system!?’
From your perspective as the visualiser, this final phase of understanding is something you will
have limited control over. Everything depends. It can be frustrating for people who are learning
visualisation and who just want the answer: ‘How do I deliver understanding to my audience?!’
In my experience, the factors that most influence the success of a visualisation are not technical,
they are contextual and, furthermore, human. Viewers are people. People are different, and
people are complex. They can be irrational and unpredictable, or impassive and disengaged. You
can lead a horse to water, but you cannot make it drink: you cannot force viewers to be interested
in reading your work, nor to understand the meaning of what you present, nor control how they
react to that experience. Even if your visualisation clearly shows action needs to be taken, you
cannot guarantee the viewers will recognise there is a need to act, will be in a position to act, and
indeed will know how to act.
It is at this point that we must recognise the ambitions and – more importantly – the limitations
of what data visualisation can deliver. Returning to the definition for a final time, the
illustrations we have gone through in this chapter show why I use the term facilitating: it is
realistically the most a visualiser can do. It might feel like a rather tepid duty, something of a
cop-out that abdicates responsibility for the outcome – why not aspire to achieve something
more concrete than ‘facilitate’?
I use facilitate because it gets to the heart of the tensions that visualisers face. There are times
when the onus is on us, and other times when the onus is on the viewer. Visualisation design
cannot change the world, it can only make it run a little smoother. Visualisers can control the
output but not the outcome; at best we can expect to have only some influence on it. The rest
of this book concerns how we optimise this influence.
1.2 Distinctions
Having delved into the proposed definition for data visualisation, it is now worth
acknowledging some other associated terms and disciplines that you may be familiar with
or aware of.
The subtleties and semantics of defining fields are recurring concerns as new technologies develop
and creative techniques evolve. As participation has grown over the past decade, data visualisation
has been cross-pollinated with creative and analytical sensibilities arriving from different origins.
The traditional boundaries begin to blur and the practical value of preserving dogmatic distinctions
reduces accordingly. Ultimately, when one is tasked with creating a visual portrayal of data, does it
really matter if the creation is labelled and filed under ‘data visualisation’ or ‘infographic’ as long
as it achieves the aim of helping the audience to achieve some form of understanding?
However, subject distinctions do need to be understood. It is important for people to identify
with a particular discipline in which they have recognised expertise. It is therefore worth
clarifying some proposed distinctions, so, once again, we are on the same page of understanding.
Infographics: The classic distinction between infographics and data visualisation concerns
the format and the content. Infographics were traditionally created for print consumption, in
newspapers or magazines, for example. The best infographics explain things graphically –
systems, events, stories – and can often be generalised as explanation graphics. Infographics
contain charts (visualisation elements) but may also include illustrations, photo-imagery,
diagrams and text. These days, infographics continue to be produced for static
output – as opposed to interactive – irrespective of how and where the work is published.
Earlier this decade there was an explosion in different forms of infographics. From a purist
perspective, this wave of work was generally viewed as being an inferior form of infographic
design. These pieces were primarily driven by marketing desire for ‘clicks’, above any real desire
to facilitate understanding. If your motive is ‘bums on seats’ then I feel this is a different
endeavour to pure infographics and I would question the legitimacy of attaching the term
infographic to these designs; perhaps instead info-posters or tower graphics (they commonly
existed with a fixed-width dimension and huge length in order to be embedded into websites and
onto social media platforms) could be used. It is important not to dismiss entirely the evident – if
superficial – value of this type of work, as demonstrated by the occasional viral success story. But
I sense the popular interest in these forms has now waned and the authentic superior-quality
infographic has managed to rise back out of this noise.
Information visualisation: Smarter people than me use the labels data visualisation and
information visualisation interchangeably, without a great deal of thought for the relevant
differences. The general distinction tends to be shaped by whether one’s focus is on
the input material (data) or the nature of the output form (information). It is common
for information visualisation to be used as the term to define work that is primarily concerned
with visualising abstract data structures such as trees or graphs (networks) as well as other
qualitative data (therefore focusing more on relationships rather than quantities).
Information design: Information design is a design practice concerned with the presentation
of information. It is often associated with the activities of data visualisation; indeed sometimes
it is presented as the major field in which data visualisation belongs. Unquestionably, both
share an underlying motive to facilitate understanding. However, in my view, information
design has a much broader application concerned with the design of many different forms of
visual communication, particularly those with an instructional or functional slant, such as
way-finding devices like hospital building maps or in the design of utility bills.
Data journalism: Also known as data-driven journalism (DDJ), this concerns the increasingly
recognised importance of having numerical, data and computer skills in the journalism field.
In a sense it is an adaption of data visualisation but with unquestionably deeper roots in the
responsibilities of the reporter/journalist.
Visual analytics: Some people use this term to relate to analytical-style visualisation work,
such as dashboards, that serve the role of operational decision support systems or provide
instruments of business intelligence. The term is also used to describe the analytical reasoning
and exploration of data facilitated by interactive visual tools. This aligns with the role of
exploratory data analysis that I will be discussing in Chapter 4.
Data science: As a field, data science is hard to define, so it is easier to consider it through
the lens of a data scientist’s duties. Data scientists are somewhat unicorn-like in that they
possess – or are expected to possess – an almost preposterous repertoire of capabilities
covering the gamut of demands involved with gathering, handling, analysing and
presenting data. Typically, the data scientist works with data of large size and complexity.
Data scientists have strong mathematical, statistical and computer science skills, not to
mention astute business experience, and are also expected to possess so-called ‘softer’
abilities like problem solving, communication and presentation.
Scientific visualisation: This is another term that different people use for different
applications. Some label exploratory data analysis as scientific visualisation (drawing out the
scientific methods for analysing and reasoning about data). Others relate it to the use of visualisation
for conceiving highly complex and multivariate datasets specifically concerning matters with a
scientific bent (such as the modelling functions of the brain or molecular structures).
Data art: Apart from the disputes over the merits of certain infographic work, data art is arguably
the other discipline related to visualisation that has historically stirred up the most debate. Again,
maybe it is reasonable to suggest the noise is quieter these days, but its sheer existence still manages
to wind up certain sections of the data visualisation illuminati. Data artists work with a similar raw
material in the form of data, but their goal is not driven by facilitating the kind of understanding
that a data visualisation would seek. Data art is more about pursuing a form of self-expression or
aesthetic exhibition using data as the paint and algorithms as the brush. As a viewer, the meaning
you draw from displays of data art is entirely down to the personal interpretation it invites.
Dashboard: These are popular methods for displaying multiple visualisations and statistical
information. Dashboards often take the form of some organisational instrument that offers
both at-a-glance and detailed views of many different analytical and information dimensions.
Dashboards are not a unique chart type themselves, but rather should be considered
compositions that comprise multiple chart types.
Storytelling: This is an increasingly common term that is often misused and misunderstood,
which is quite understandable. Stories are usually constructed upon some notion of movement,
change or narrative. Charts showing trends or activities over a temporal plane or maps portraying
spatial relationships offer displays that are most consistent with the idea of a story. A bar chart
alone does not represent a story, in most people’s sense of the term, but if you show a pair of bar
charts to represent a before-and-after comparison, you have created a change dynamic.
Similarly, if you incorporate charts into some temporal presentation like a slideshow or video,
the chart becomes a prop and a narrator may draw out the story verbally. In this case it is the
setting and delivery that are consistent with the notion of storytelling, not the chart itself.
A further distinction to make is between stories that are explicitly communicated and stories that
form through interpretation. The famous six-word story ‘For sale: baby shoes, never worn’, often attributed to Ernest
Hemingway, is not presented as a story; rather, the story is triggered in our mind when we read
this passage and start to infer meaning, implication and context. A story is being presented
only if it is accompanied by some explanation of the meaning of the data. Otherwise, any story
derived is what the viewers form themselves.
Summary: Defining Data Visualisation
In this chapter you have been introduced to the subject of data visualisation, learning a definition that will shape much of the structure and content of this book:
The visual representation and presentation of data to facilitate understanding.
The different components that form this definition have been explained, with particular focus on the
nuances around facilitating understanding. The three distinct phases of understanding were described:
• Perceiving: what do I see?
• Interpreting: what does it mean, given the subject?
• Comprehending: what does it mean to me?
The second section explained some of the distinctions and overlaps with other related disciplines, supplementing the glossary provided in the Introduction.
What now? Visit book.visualisingdata.com
EXPLORE THE FIELD Expand your knowledge and reinforce your learning about working
with data through this chapter’s library of further reading, references, and tutorials.
TRY THIS YOURSELF Revise, reflect, and refine your skill and understanding about the
challenges of working with data through these practical exercises.
SEE DATA VISUALISATION IN ACTION Get to grips with the nuances and intricacies of
working with data in the real world by working through this next instalment in the narrative
case study and see an additional extended example of data visualisation in practice. Follow
along with Andy’s video diary of the process and get direct insight into his thought processes,
challenges, mistakes, and decisions along the way.
2 The Visualisation Design Process
In this second chapter I will outline the data visualisation design process around which the
book’s chapters are arranged. You will learn why using a process approach is important to
organise and optimise your thinking – taking you from the initial spark of curiosity, through
wrangling with data, to juggling the myriad options that shape a design solution.
The process organises the activities into a sequence of manageable chunks so that the right
things are tackled in the right order. You cannot expect just to land on a great solution by
chance if your working practices are chaotic and confused. You will be aided by some additional
practical tips and good habits to employ across the whole process.
The quality of your decision making is the main difference between a visualisation that
succeeds and one that fails. To maximise the effectiveness of facilitating understanding for your
audience, the second part of the chapter will introduce the three principles of good
visualisation design.
2.1 Design Process: Organising Your Decision Making
For those new to the field, one of the first things to grasp is the idea that any notion of perfect
in data visualisation does not exist. It can prove simultaneously frustrating and liberating to
learn that there are good and bad solutions, but there are no perfect ones. To have perfect you
need immaculate conditions that are free of pressure, constraint or flaw. That is not how things
operate in real life. There will always be demands pushing and pulling you in different
directions. There will be shortcomings in the data that frustrate you or limitations in technical
ability that impede you. As described in Chapter 1, people, as recipients, introduce a diversity
of need that realistically cannot always be fulfilled. Recognising that perfect is unobtainable
helps unburden us from a nagging sense that somehow we might have missed finding the perfect solution. There will never be just one single possible solution to a problem.
The central premise in this book is that decision making is the key competency in data
visualisation: namely, effective decisions, efficiently made. To accomplish this you need to
follow a design process that organises your thinking and is underpinned by robust principles
to optimise your thinking.
We will discuss principles shortly, but firstly let’s look briefly at the design process overall
(Figure 2.1).
Figure 2.1 The Four Stages of the Data Visualisation Design Process
Across the four stages that make up this process there are two main phases. The first three
stages, presented in Part B of this book through Chapters 3 to 5, involve activities that I describe
as concerning the ‘hidden thinking’ of data visualisation. These stages cover the preparatory
work that informs what you are visualising, for whom and, crucially, why:
1. Formulating your brief: planning, defining and initiating your project.
2. Working with data: gathering, handling and preparing your data.
3. Establishing your editorial thinking: defining what you will show your audience.
The second main phase of the process sits entirely with stage 4 and this involves developing
your design solution, the visual manifestation of the preparatory work you have conducted.
This stage is concerned with the how.
The five distinct design layers that make up the anatomy of any visualisation solution – data
representation, interactivity, annotation, colour and composition – are covered in Part C of this
book, in Chapters 6 to 10 respectively. As explained earlier, a detailed treatment of technical
activities is beyond the scope of this text.
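The structure described above can be summarised as a small lookup. This is only an illustrative sketch: the names `STAGES`, `DESIGN_LAYERS` and `chapter_for_layer` are hypothetical. It also rests on two assumptions: that stages 1 to 3 map one-to-one onto Chapters 3 to 5, and that ‘respectively’ pairs the five design layers, in the order listed, with Chapters 6 to 10.

```python
# Stages 1-3: the 'hidden thinking' phase (Part B).
# Assumption: one chapter per stage, in order.
STAGES = {
    1: ("Formulating your brief", 3),
    2: ("Working with data", 4),
    3: ("Establishing your editorial thinking", 5),
}

# Stage 4: developing your design solution (Part C).
# The five design layers, in the order the text lists them.
DESIGN_LAYERS = [
    "data representation",
    "interactivity",
    "annotation",
    "colour",
    "composition",
]
FIRST_LAYER_CHAPTER = 6  # Assumption: layers map onto Chapters 6 to 10 in order.


def chapter_for_layer(layer: str) -> int:
    """Return the chapter assumed to cover the given design layer."""
    return FIRST_LAYER_CHAPTER + DESIGN_LAYERS.index(layer)


for number, (name, chapter) in STAGES.items():
    print(f"Stage {number}: {name} (Chapter {chapter})")

print("Stage 4: Developing your design solution")
for layer in DESIGN_LAYERS:
    print(f"  {layer} (Chapter {chapter_for_layer(layer)})")
```

Three stages plus five layers give eight entries, matching the eight chapters (3 to 10) that cover the process in depth.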
I am not going to describe these process stages in more depth here – the next eight chapters
exist to do that. Instead, here are some observations about why it is important to follow a
design process.
Reducing the randomness of your approach: The value of this design process is that it
shapes your entry and closing points. How do you start a process? How do you know when you
have finished? As I have mentioned, the sheer extent of things you will have to think about,
even with simple projects, can be quite an overwhelming prospect. This approach breaks down
key stages into a connected system of thinking that will help progress your work and preserve
cohesion between your activities. It incrementally leads you towards developing a solution,
with each stage building on the previous one and informing the next.
Every project is different: Every visualisation presents new challenges. Even if you are just
re-producing the same report every month, no two instances of that report will involve the exact
same context. Just having one extra month of data, for example, may expose you to larger
values, smaller values, new values and expired values. Whether you have simple data, or vast
amounts of complex data, two hours or two
months, the process you follow will always be
the same. You should follow the same sequence
of thinking regardless of the size, speed and
complexity of your challenge. The main difference is that any extremes in the circumstances you face will amplify the stresses at
each stage of the process and place greater
demands on the need for thorough, effective
and timely decision making.
‘I tend to keep referring back to the original brief (even if it’s a brief I’ve made myself) to keep
checking that the concepts I’m creating tick all the right boxes. Or sometimes I get excited
about an idea but if I talk about it to friends and it’s hard to describe effectively then I know that
the concept isn’t clear enough. Sometimes just sleeping on it is all it takes to separate the good
from the bad! Having an established workflow is important to me, as it helps me cover all the
bases of a project and feel confident that my concept has a sound logic.’ Stefanie Posavec,
Information Designer
Adaptability: The term process contrasts considerably with procedure. The process outlined in
this book provides a framework for thinking, rather than instructions to learn and follow.
A good process should offer adaptability and remove the inflexibility of a defined
procedure. In any visualisation project, you will need to respond to revised requirements, additional data that emerges, or a shift in creative direction. A good process safeguards adaptability
and cushions the impact of changing circumstances like these. Although the activities presented in this book are in a linear arrangement, there will always need to be room for iteration.
There will be plenty of occasions when you have to revisit decisions or redo activities in a different way, especially if you make mistakes. What is more important in these situations is how
gracefully you fail and how quickly you recover.
Protect experimentation: The process approach I am advocating is not overly systematic
and does not compromise on allowing space for experimentation. When there are pressures on
time, the need to focus and avoid distraction is understandable. Aspiring to reduce wasted effort
and improve efficiency is entirely reasonable, but one must still seek out opportunities – in the
right circumstances – for imagination to blossom. In reality, few projects will offer much scope
for far-reaching creative exploration, but when an opportunity presents itself for you to work on a
subject that befits creativity, you should embrace it. And do not forget to enjoy it!
‘I truly feel that experimentation (even for the sake of experimentation) is important, and I
would strongly encourage it. There are infinite possibilities in diagramming and visual
communication, so we have much to explore yet. I think a good rule of thumb is to never allow your
design or implementation to obscure the reader understanding the central point of your piece.
However, I’d even be willing to forsake this, at times, to allow for innovation and
experimentation. It ends up moving us all forward, in some way or another.’ Kennedy Elliott,
Graphics Editor, National Geographic
The first occasion, not the last: Each activity you commence across the distinct
stages in the process will likely represent the first occasion you pay attention to these
matters, but not the final occasion. Think of the sequencing as being akin to a trickle-down
effect. Take, for instance, the recurring concern about thinking about your audience.
You will first encounter the need to define a profile of your anticipated audience’s
characteristics during the first stage of the process, ‘Formulating your brief’. However,
the concern about what they know, what they need to know, and how interested they will
be will reoccur right through to the end. Concerns like these should never drop off your
radar. The list of concerns will only build, but the intention is that the process gives you
the best chance of keeping all the necessary plates spinning for as long as they need to be.
Across the book there are frequent vignettes of advice and useful tips for you to adopt to get
the most out of working through this process. These are informed by interviews with people
working in the field, as well as from my own practical experiences, and are provided with each
topic in the book. There are some recommended habits that are applicable to all stages in this
process, relevant to novices or experienced visualisers alike, as follows.
Time management: Any creative work quickly swallows up all the available time. You get
tempted to try things, to explore different ideas, to attempt one final pass at seeking out interesting features of your data. It is easy to be consumed by the stretching demands of the
activities across this process. As you then reach a deadline you either sink or swim: for some
the pressure of the clock ticking is crippling, especially impacting their creative thinking; others
thrive on the adrenaline it stirs, sharpening their focus as a result. Regardless of how you
respond to looming deadlines, good planning is vital.
Time management is the essence of good planning. It keeps a process cohesive and on track.
With experience of working on different projects, your ability to anticipate how much time to
allocate to different activities will improve. That said, each project introduces its own profile of
demands, so always find time before you set off to estimate where your likely commitments
will be most required. Do not forget to factor in time for easily neglected responsibilities, such
as supervisor meetings, Skype calls, research and file management.
‘You need a design eye to design, and a non-designer eye to feel what you designed. As
Paul Klee said, “See with one eye, feel with the other.”’ Oliver Reichenstein, Founder of
Information Architects (iA)
Mindsets: Irrespective of the type of visualisation you are working on, your process will
involve a mixture of conceptual and practical activities. Sometimes these will be allocated
across a team, exploiting the range of talents at different times through the process. On
other occasions you will be working alone, and the diversity of these activities will
stretch your mind considerably. Sometimes you are thinking, sometimes you are creating;
sometimes you need to be creative, sometimes you need to have an eye for detail.
• Thinking: The duties here will be conceptual in nature, requiring imagination and judgement, such as formulating your curiosity, defining your audience’s needs, reasoning your
editorial perspectives, and making decisions about viable design choices.
• Doing: These are active duties that engage the brain through more practical undertakings, such
as sketching ideas, conducting research, holding discussions with a client, or checking data.
• Making: These are more hands-on constructive duties characterised by using tools for activities like handling data, creating charts, and designing presentation features.
For the scope of this book, the focus is largely on thinking. I find the notion of brain ‘states’
relevant here, especially the ‘alpha’ state. This is the state our mind is in, most commonly,
when we feel especially relaxed. Occupying this state helps heighten your imagination and
thought process. I find I do some of my most astute thinking in the shower or just before going
to sleep at night. These are the occasions when I am most likely drifting into a relaxed state. I
find the same conditions when undertaking long train journeys or flights. I use it to help contemplate the progress I am making on a task. It lets me escape the noise present when doing
more practical tasks.
Documenting: It is mawkish to claim the humble pen and paper are the most important tools
for visualisers. After all, unless you are producing artisan hand-drawn work, technical applications will be more applicable for most of your process. However, pen and paper will prove to be
a real ally to help you document thoughts and capture sketches. Do not rely on your memory; if
you have a great idea, sketch it down. You do not need great artistry, you just need to get things
out of your head and onto paper, particularly if you are collaborating with others. If you are fortunate enough to be fluent with a tool and find it more natural to use that for ‘sketching’ ideas than pen
and paper, then this is absolutely fine, as long as it is the quickest medium for doing so.
Whether using pen and paper, or a tool like Word or Google Docs, note-taking is a useful habit
to develop. It helps you document important details such as:
• task lists with details of deadlines and precedents;
• information about the sources of data you are using;
• details of complicated calculations or manipulations you have applied to your data;
• a log of any assumptions you have made;
• terminology, abbreviations, acronyms – technical prop…