Heart of the Dataset
Michael Owen Fisher interviews David McCandless
January 15, 2013
The data journalist and designer on the balance between content and beauty
David McCandless creates visual landscapes from the debris of internet statistics. A self-taught designer, he has established himself as a leading light in infographics and data visualization. The task today for data journalists like McCandless is far from easy. They must navigate a sea of information, steering clear of online canards and misinformation. Their product must be attractive and digestible, suitable for an audience increasingly oriented toward instant messages. Yet, in whittling data to its core, they must guard against losing a story’s nuances. McCandless manages to achieve this balance. He works with large pools of information, often fusing datasets to distill the central trends and tease out hidden relationships.
His first book, The Visual Miscellaneum, covers a wide range of subjects: from hot potatoes such as carbon emissions and conflict in the Middle East, to more recherché subjects such as dance-music genre-ology and representations of souls. The research can be time-consuming. McCandless scoured ten thousand Facebook status updates in order to map the timing of relationship breakups. The meta-analyses for his “Snake Oil?” diagram—a “balloon-race” graphic that examines the efficacy of health supplements—took several weeks of work. When designing his images, McCandless avoids the bromides of pie and bar charts. He favors more eye-catching approaches, such as a hectic, looped timeline to track TV time travel plots, or an egg chart to show amphibian extinction rates.
The success of The Visual Miscellaneum led to a TED fellowship and the appearance of his work at MoMA. Since then, McCandless has placed greater emphasis on collaboration. He is coordinating several crowdsourcing projects through his website, Information is Beautiful. His new book, Knowledge is Beautiful, which will be released later this year, features several of these large-scale collaborative graphics. We spoke over the phone a few days before the relaunch of his website.
—Michael Owen Fisher for Guernica
Guernica: You once said you had developed a “dormant design literacy” over the years. Do you think this is a result of all the exposure you’ve had to web design? Perhaps the rise of the internet has made us all more design literate?
David McCandless: Yes, all the domains I’ve worked in (programming, video games, magazines and newspapers, advertising) are very visual in their own way and are design-related. I think I absorbed a lot of design nous and design sensibility by being exposed to these media domains over the years.
I had the story running in my head that I wasn’t a designer and I couldn’t do design. I had that typical response a lot of people have to design: thinking it’s a mystical ability that some people have and some people don’t. For years I thought I was just a writer, but when I sat down to design and started playing around with it, I realized that, really, it’s pretty easy. Obviously it’s more than just a set of rules, but the basics of design are actually pretty simple and quite mathematical. The link between data and design works at quite a fundamental level.
Guernica: You mention programming. Those skills must help.
David McCandless: I have a computational quality to my mind, I suppose. When I was a kid, I was obsessed with video games. I reprogrammed games, and this eventually landed me a column in a magazine. That’s how I got into print journalism: writing about video games.
Guernica: What started you down the road of infographics and design?
David McCandless: I was a freelance journalist, and it was a struggle because I had to pitch all the time, research, and stay on top of subjects. About ten years ago I started moving into online work, and that exposed me to design and the impact it has on the flow, shape, and narrative of the story. This got me thinking that maybe this is a way of doing journalism, a way of telling stories and revealing patterns.
I remember researching a really complicated article and having trouble keeping track of all the different perspectives, the different camps within the subject. I ended up drawing a diagram to help myself follow how the ideas were interrelated. I looked at the diagram when I had finished and thought, “Oh, maybe I don’t need to write the article now. Maybe I’ve done my job as a journalist. I can convey my understanding through the diagram.”
Guernica: When it comes to the designs, how do you decide which approach to use for each dataset? Do you find it’s dependent on the nature of the data?
David McCandless: It’s a mix. Sometimes the data has a sense of a structure within it, especially if I’m looking at networks or relationships, which generally lend themselves to a certain style. But when I’m working with numbers, or data that’s more informational, I often spend time structuring the narrative and the flow of information in the spreadsheet itself. It maps my understanding and the questions I might ask journalistically, as an outline would. Often the structure in the spreadsheet can be planted directly into a design. The spreadsheet is in two dimensions, an x- and y-axis, and you can stack information left and right as well as up and down. It provides a blueprint, a skeletal structure, which, when you place it into a design, can easily become a table, a grid, a set of blocks, or something geometric. There’s parity between the two approaches: the spreadsheet and the grid of a design. Sometimes it’s as simple as transplanting it, sometimes it isn’t. I play around with five, six, or even seven approaches until one feels like it’s more in tune, more intuitive, more linked to the idea.
Guernica: Have you ever set out trying to prove a particular point with a dataset, only to find it’s not showing what you expected, or perhaps what you wanted?
David McCandless: All the time. It’s a common experience. Often I have a perspective, especially in the political arena, but when I begin researching and find the data contradict it, I have to change my opinion and my direction midstream.
Guernica: Is that frustrating?
David McCandless: At first, I found it really annoying. I wanted a clean route, and I didn’t want my idea and execution to be detoured or deflected. So it could be frustrating. But I began to see that this process actually challenges your own preconceptions. If you want to visualize something in its fullness you have to look at every perspective and every angle, and that means—sometimes uncomfortably—looking at things you don’t want to look at. It is about creating a full, multi-angled, multi-perspective image as much as possible.
I think it’s good practice, if you’re a writer, or anyone who’s shaping narrative, to be flexible and open with the direction of the narrative. But, yes, it happens a lot.
Guernica: Can you give an example? Which datasets have surprised you?
David McCandless: The classic one for me is one of my favorite images, which I discuss in a TED talk. It’s a left-versus-right political-spectrum image. I was trying to visualize the concepts on the political spectrum. I’m left-leaning, and I discovered as I was doing it that I had an impulse to make the left-hand side appear better than the right-hand side. That was manifesting in the way I was choosing certain words, framing certain ideas. I shared it with a few people, and they all said, “Oh my God, this is really biased.” I hadn’t seen it at all. I really hadn’t seen it. I thought I was doing something quite clean. As part of my process I had to go back to the right-hand side of the diagram and honor the perspectives on that side. I noticed, a little bit uncomfortably, how a lot of me was in that side, and that some of my beliefs err toward the right. So, again, the process was flossing my mind, coaxing me to be wider in my perspective, to challenge my preconceptions.
Guernica: Your aim is to capture the heart of the dataset, the most important aspect of it. That is to say, if anything, you’ll trim, elide, and simplify your datasets. Is that right?
David McCandless: Definitely, but I often forget that. I’ve been working on a graphic about carbon emissions. It’s an incredibly simple graphic (a bunch of blocks and a table below it), but it’s taken me three weeks to design. For some reason it just wasn’t working. Then finally I realized there was a number present, which I was rendering in each version, that wasn’t necessary for the understanding of the piece. This figure was getting in the way and distracting from the main flow of the narrative. As soon as I pulled that figure out of the design, it sprang into focus. Suddenly it worked.
Guernica: A eureka moment.
David McCandless: Yes, but often I forget that design is really about removing things: optimizing the flow of information, the display, so there are the fewest elements possible, but still they preserve the essence of the subject.
Guernica: But at the same time, if you slice away and streamline the data, is there a risk you could miss an added complication? I imagine there’s a balancing act: filtering to the essentials, the crux of the topic, versus ensuring you avoid overlooking more complex patterns. Have you ever found you have a dataset that’s difficult to create an infographic from? Or do you always find a route?
David McCandless: That’s a good point. I think there’s a balance between optimization and simplification. They feel like different things to me. Simplification seems to be the removal of objects for the goal of making a graphic as clean and uncluttered as possible. Whereas, with optimization, it feels like there’s more intelligence in that. It maintains the usability, but tries to distill something down to its essence. But with some datasets you have to be careful because, as with linear, print journalism, it’s easy to shave off facts that don’t quite fit the flow.
There’s a tendency in graphics to allow the trimming of certain parts. But I think that if you’re open about your process, your methodology, such as introducing thresholds, introducing filters, techniques people use in research and data management, it’s legitimate. It’s legitimate to say, “We’re only going to show data above this level, or between levels.” As long as you say that’s what you’re doing. Techniques such as rounding and normalizing—putting a percentage in relative terms—are all recognized techniques. These methods allow a data journalist to filter and reduce the amount of information, but in an open and legitimate way.
Guernica: While we’re thinking about appearance versus content, your “Rhetological Fallacies” visualization caught my eye. You included the design fallacy—the idea that aesthetics determine credibility. I know you made a joke of it in the graphic, suggesting it might be apposite to some of your work. But, more seriously, nowadays do you think there’s a danger that the design fallacy is more in play and that it’s quite easy to be misled by the beautiful presentation to which we’re exposed?
David McCandless: I definitely think that, and I feel that responsibility because design has a powerful impact on the viewer. It has authority, and data also has the same air of authenticity and detail. It can be hard to argue with a graph, and it’s hard to argue with data. So to combine data with a strong visual impact creates a powerful message. We’ve seen it in the last U.S. presidential campaign: both sides were trading graphs and circulating data visualizations to make their point. So the political establishment is waking up to the power of a good graph as well.
The way I combat it is that, wherever possible, in fact pretty much every piece I make, I pair it with some transparency. In the carbon emissions graphic I’m working on, there’s a whole visualization, and then, underneath, all the sources, and then underneath that is a link to a Google document where you can see all the data, all the math, all our workings, the whole process we used to create the diagram. You can check, if you’re interested, go back to our data, go back to our numbers, and make sure we’re not pulling the wool over anyone’s eyes and we’re not grinding a particular agenda.
Guernica: Can you check who’s looking at the raw data?
David McCandless: Yes, we’ve got logs on the website. About 20 percent of people who look at the graphic will also look at the raw data. We get lots of feedback from people—sometimes angry—but pointing out really important things that we might have missed, little details, or data we didn’t include. When I first upload a graphic, there’s always a rush of people coming forward.
Guernica: It would be quite insightful to see the flow of people—how quickly they visit when you put a new one up.
David McCandless: It would actually make a great visualization. I might be able to do that.
Guernica: And some of the visitors to your site are helping to shape your datasets because you’re crowdsourcing your new book.
David McCandless: Some parts of it, yes. I’m launching my new website, and we’re going to start doing some crowdsourcing projects because some of the new ideas are quite ambitious. I don’t think I could do them by myself.
I’ve made mistakes in the past, and it’s tapped me into the idea of a singular author versus multiple authors. In vast subject matters, such as climate change, one person can’t do it. You need the crowd. You need experts. You need a team. You need people online who can help you to check facts, to gather information. It’s a much more collaborative process, and I never consider any of the graphics to be finished.
Guernica: How will the crowdsourcing work?
David McCandless: We’re using available tools like Google Docs and WordPress, and we’re inviting people to contribute a cell, a row, or a column of data. It’s quite targeted. We might be examining the biographies of two hundred people, for example, and invite the contributors to research and fill in particular gaps. For every row and column we’re going to get two people to do each one, so we’ll double-source it, which gives a degree of internal checking. Much of the crowdsourcing is for the fun ideas, data that is easier to get hold of. I haven’t edited or changed the site since I launched it back in 2009, so it’s overdue for a re-spruce. It’ll be a new visual direction for the site.
Guernica: Another one of your graphics I found interesting was “Low Resolution,” which shows the power of our visual system. The data comes from Tor Nørretranders, who came up with the idea of comparing the processing speeds of our senses to bandwidths, which is clever. Do you think that’s why infographics are so effective, because they combine our language processing system with the speed of the visual cortex in identifying patterns and contrasts?
David McCandless: I think that’s definitely happening on the biological, cognitive level. Also, it suddenly occurred to me that when I create some graphics—like the CO2 one I mentioned and “Snake Oil?”—they take ages. They can take three or four weeks, sometimes even six—and it struck me that during that production time, I’m doing a lot of cognitive processing. I’m gathering research. I’m processing it. I’m arranging the data. I’m sorting out the narrative. I’m designing. It’s almost as if I do all the cognitive work that you then don’t have to do. I digest it, process it, and then offer something that’s very easy for you to digest.
Guernica: Like a mother bird.
David McCandless: [laughs] It’s enjoyable to just drink in an image. It’s effortless. That struck me as an interesting way of looking at it: I do all the grunt work so you can enjoy. And I don’t say any of that with resentment. I’m happy to provide that service. But that’s the power of it. A good infographic that’s been really worked on looks simple and easy, but it actually required a lot of processing.
Guernica: How about scientists and researchers? Do they do a good job when presenting data to the public?
David McCandless: [laughs] Is this on the record? I think everyone is struggling somewhat with presentation. The bar has been raised because we’re looking at the Web all the time, and what we consider good design is now influenced by the Web. The Internet is generally well designed, if you look at the most popular websites, so we expect our visuals to be at that level of quality. When you sit in a presentation and you’re looking at nonsensical pie charts and the like, your audience does disengage. People across a range of industries, not just science, are struggling with their communication because their output doesn’t compete with what people see on a day-to-day basis. Some of the commercial work I do is helping people to improve their presentations and add some design thinking. There are so many amazing things in science, and such great data, which can often be locked away. It’s in the minds of these amazing practitioners, who can’t necessarily express what they want in a visual way.
Guernica: And you’ve been collaborating with scientists?
David McCandless: I’m doing some work with the Wellcome Trust. We’re trying to work on how to achieve better pathways from the research to the audience, from the research to the eyeballs. But it’s challenging, because really I can’t design something unless I understand it.
Guernica: Like you said, you have to be able to do the research and all the cognitive processing before contemplating the design.
David McCandless: Exactly. I don’t know whether the solution is to teach scientists to design or teach designers to be better journalists so they are able to gain the necessary level of understanding.
Guernica: How will Knowledge is Beautiful differ from your previous book, other than the collaborative aspect?
David McCandless: It’s a little deeper, a little wider, a little more connected. It’s exploring the difference between what you might feel is information, and what you might feel is knowledge. It’s a similar format, but I’d say the graphics are generally richer and deeper, and they go into the whys and the hows to a greater extent. They look into the corners of things. There’s a shift in the type of data as well. In the first book, it was mostly quantitative data, but the new book blends quantitative and qualitative. There will be more subject matters like the “Rhetological Fallacies” graphic, where we’re looking at stuff that’s quite conceptual, subjective, in that realm.
Guernica: And do you think qualitative data are generally harder to form into a graphic?
David McCandless: Usually. What I tend to do is blend quantitative with the qualitative to allow me to plot the qualitative data in some way. It’s a question of what quantitative data are most applicable. So I’m playing with that, merging the two.
Guernica: What subjects have you most enjoyed getting your teeth into for the new book? Have there been any challenging datasets you’ve balked at?
David McCandless: I’ve got quite a few in the new book that make me gulp; they’re a little overambitious. I mean, I’ve got a ton more “Snake Oil?” diagrams that I’ll be including, which roam across evidence-based medicine, looking at things you can take, in a medical sense. I’m doing a lot more work around music, not from a quantitative point of view, but looking from an emotional point of view: plotting emotions. I’m working on several cosmic and astronomical graphics, trying to bring that stuff alive without being too heavy. I enjoy working on things I don’t understand initially, which these days includes pretty much everything.