Limiting Bias for Better Data Storytelling

Our own biases significantly influence how data is both interpreted and communicated.

Biases exist naturally; they’re simply part of the human condition. Still, they significantly influence how each of us analyzes, interprets, and communicates data.

What’s more, those who aren’t classically trained data practitioners (those who never set out to build a career in research, analytics, or data science, but now find themselves in a data-focused role) can often feel as if they’re “flying blind,” unaware of what they don’t know or can’t control.

This resource gets to the crux of how to improve the quality of your data and mitigate biases, so that you can learn to trust your data and make better-informed decisions. It’s a “quick-start guide” because it outlines the most important activities every enterprise leader should prioritize right now to improve data efficacy.

01–

Better understand how cognitive bias affects data interpretation, analysis, and decision-making

It can definitely be challenging to break down our own systems of thinking, but luckily, our brains are already wired to want to make sense of our underlying assumptions, biases, and habits. At the end of the day, it's not about simply ‘following your gut;’ it's about becoming more aware of where that feeling is actually coming from (and why).

—

Allison Hu

Director of Design Research

RevUnit

How Understanding Cognitive Bias Improves Data Storytelling

Highlights the fact that biases exist naturally, particularly in data analysis
Makes you more aware of your own biases and how they affect your judgement
Underscores the importance of basic bias mitigation techniques and controls
Allows you to proactively identify and adjust behaviors to mitigate bias

Recognize the types of biases most likely to affect your analysis

If left unchecked, biases can result in less objective decision making or, worse, costly financial mistakes.

So, if you’re in a position where you’re tasked with collecting, interpreting, or communicating data (nearly everyone these days), start by better understanding the types of cognitive biases that are most likely to affect your judgement.

It’s worth pointing out that cognitive bias is itself a topic deserving of a much lengthier and more complex discussion than we can cover here. So, we’ll aim only to briefly introduce two of the most common biases we’ve encountered in our own data work, particularly when working with large, complex organizations.

Firstly, confirmation bias pops up time and time again when collecting, analyzing, and presenting data. As the name suggests, confirmation bias occurs when we select data that supports our own arguments or hypotheses. That is, it’s a shortcut that uses data to confirm what we already believe. It occurs frequently in large, highly bureaucratic organizations, where leaders often jockey to advance their own priorities and personal agendas can take precedence.

Secondly (and for many of the same reasons), anchoring bias often runs rampant inside larger organizations or larger functional teams where data analysis is siloed or compartmentalized. Again as the name implies, anchoring bias occurs when you rely too heavily on, or “anchor” yourself to, one trait or piece of information when making data-driven decisions. As a result, that anchor point shapes your interpretation and analysis, and everything downstream from it.

Become familiar with bias mitigation techniques in visual analytics

We won’t pretend to be the go-to experts here, but we would offer this piece of guidance: make it a point to stay current with the latest research and opinions coming from the more academically focused voices in the analytics field.

Doing so is often an effective way to better understand where others are placing their bets. The field of visual analytics is still relatively “raw” in the sense that we’re learning more and more each day about both: (1) how humans interpret and analyze information, and (2) how to mitigate some of our natural biases as a means to more effectively analyze information and data.

For instance, a recently published doctoral dissertation (August 2020) examined how to detect and mitigate human bias in visual analytics, exploring (among other things) the bias mitigation strategies that have proven promising or effective.

The research references two high-level categories of bias mitigation techniques: a priori and real-time. A priori strategies seek to reduce bias before the analysis stage, often through educational training that examines past errors to inform future decision-making. Real-time techniques aim to identify and mitigate bias at the point of analysis, as it happens. The thought process here is clear: “If biased decision making processes can be assessed and measured in real-time, bias mitigation strategies can do more than simply educate analysts beforehand.” That is, they may be able to reduce bias at the point of interpretation and analysis.

Spotlight —

Induction bias, selection bias, and survivorship bias are also common cognitive biases that exist when working with data in any capacity. It’s a bit much to cover each of those in detail here in this guide, so, if you’re looking to go a bit deeper, we’d recommend this closer look at cognitive bias in data science.

There are a variety of bias mitigation techniques, yet one category in particular is perhaps most intriguing given the growing prevalence and promise of machine learning. These techniques are often labeled “machine-initiative” because they rely on visual analytics tools to play a central role in bias mitigation: the machine operates as an unbiased collaborator that can act on behalf of the interpreter, or take initiative, to mitigate potentially biased analysis processes.

02–

Trace data back to the source

Individuals who are tasked with interpreting and analyzing data often assume that the data they’re working with is clean and accurate. In many cases, that’s correct. In many others, it’s both an incorrect and consequential assumption. It’s vital, then, that each individual make a concerted effort to thoroughly inspect and question critical source data.

—

Colin Shaw

Director of Machine Learning

RevUnit

How Tracing Data Back to the Source Improves Data Storytelling

Encourages teams to treat data as a critical discipline, not an afterthought
Seeks to identify root errors or inconsistencies in source data right from the start
Allows for added scrutiny where and when necessary, including ongoing conditioning
Allows for flexibility in data management, shifting focus to critical needs as appropriate
Treats data as an evolving asset and enforces critical data quality standards at a global level

Assume nothing, question the source and quality of the data

More than 70 percent of organizational leaders have said that suboptimal data quality has negatively impacted their business decisions.

Now, there are a number of contributing factors at play here, yet it’s important to recognize that one of them is cognitive bias. In this case, individuals who are tasked with interpreting and analyzing data often assume that the data they’re working with is clean and accurate. In many cases, that’s correct. In many others, it’s an incorrect and consequential assumption, or a loosely held bias about the data itself.

Deep-rooted inconsistencies or glaring biases in your source data make your decision-making process fragile. Most people, especially those downstream from the source data, seldom think to question the data they’re working with until there’s a clear and compelling reason to do so. The alternative path is much more effective: indulge your curiosity, ask questions of both your team and your data, and consider possibilities beyond the ones you set out to find. Thus, to reduce unnecessary risk, it’s vital that each individual make a concerted effort to thoroughly inspect and question critical source data.

Audit, catalog, and organize existing data to ensure reliability

This step is necessary at both the organizational and functional levels.

Yet, in many cases, this step is either lacking entirely, or it’s been done rather haphazardly. Thus, it’s worth spending the time and energy necessary to map existing data flows within your specific function or team.

When doing so, identify any “dead ends” or silos you encounter along the way. Even if there is a governance team that’s working on a similar initiative, it’s still worth a second look. You want to intimately understand where critical data is housed, how it’s collected, and where it moves over time (both upstream and downstream). Even if you find that everything checks out, at worst, you’re verifying the cleanliness and accuracy of critical data sets, which is hardly a bad thing.

There are a variety of tools that can assist you in this process, many of which will allow you to simplify and automate some of the critical tasks involved in discovering, profiling, and indexing data. In fact, if your organization already has a dedicated data governance team (or a group or business function acting in a similar capacity), it’s likely that an enterprise data intelligence platform is already in place. If that’s indeed the case, check with that team to better understand how the tool and technology stack can be used to assist in this process. Any sound data governance practice will be actively using these kinds of tools at a global level.
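As a rough illustration of what profiling a data set involves, the sketch below counts missing values per field and flags repeated keys in a handful of rows. It is a minimal example in plain Python, not a substitute for a data intelligence platform; the `profile_records` function, the field names, and the sample rows are all hypothetical.

```python
from collections import Counter


def profile_records(records, key_fields):
    """Summarize missing values per field and repeated keys in a list of dict rows."""
    # Collect every field name that appears in at least one row.
    fields = sorted({name for row in records for name in row})

    # Treat None and the empty string as "missing" (an assumption for this sketch).
    missing = {
        name: sum(1 for row in records if row.get(name) in (None, ""))
        for name in fields
    }

    # Count how often each key combination appears; more than once is a duplicate.
    key_counts = Counter(tuple(row.get(k) for k in key_fields) for row in records)
    duplicate_keys = [key for key, count in key_counts.items() if count > 1]

    return {"rows": len(records), "missing": missing, "duplicate_keys": duplicate_keys}


# Hypothetical sample rows standing in for an extract of a source table.
sample = [
    {"order_id": 1, "region": "West", "revenue": 120.0},
    {"order_id": 2, "region": "", "revenue": 80.0},
    {"order_id": 2, "region": "East", "revenue": None},
]

report = profile_records(sample, key_fields=["order_id"])
print(report)
```

Even a small report like this gives a team something concrete to question: why a region is blank, or why the same order appears twice.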

Still, despite the clear benefits of auditing, cataloging, and organizing existing data, only 20% of organizations say they publish data provenance and data lineage information internally, and most of those that don’t say they have no plans to start. Don’t make this mistake. Making this information readily accessible to all teams is one of the most important steps toward making data and its sources more transparent. That transparency supports a more data-driven culture which, if maintained and governed properly, is critical to improving data quality directly at the source.

03–

Implement a system of checks and balances, ensuring all critical assumptions are double-checked

Many teams tend to limit data interpretation and analysis to a small group of individuals—usually those in an analyst role or similar—who may lack the specific domain expertise needed in order to understand the full picture. Instead, seek out other individuals who have the domain expertise that you or your team might lack.

—

CJ Weatherford

Director of UX, Data Visualization and Strategy

RevUnit

How a System of Checks and Balances Improves Data Storytelling

Encourages a continuous feedback loop built on diversity of opinion
Creates a data culture of curiosity, whereby inquisitiveness is celebrated
Allows for closer collaboration between both functional and domain experts
Forces teams and individuals to review assumptions and storylines thoroughly

Invite other domain experts to participate in the analysis

Bias mitigation techniques take many forms; some are more intricate and complex than others.

Still, one of the best ways to reduce bias in your analysis and reporting process is to include those who will bring varied perspectives to the same set of data. While it seems like a no-brainer, many teams instead tend to limit data interpretation and analysis to a small group of individuals (usually those in an analyst role or similar) who may lack the specific domain expertise needed to understand the full picture.

Instead, seek out other individuals who have the domain expertise that you or your team might lack. It’s worth noting that you’re not looking to gather perspectives just for the sake of doing so. Rather, you’re looking for those specific individuals who, by the very nature of their work, have a vested interest in the data and thus should have a seat at the table. These individuals can often help identify flaws, spot patterns, or supply context that might otherwise go unnoticed or, worse, be misinterpreted.

Ensure all critical assumptions and storylines are peer-reviewed

Finally, for many of the same reasons, it’s worth implementing a review process to ensure that the narrative that you’re trying to communicate is clear, accurate, and actionable.

It’ll be difficult (if not impossible) to remove all biases from your reporting. That’s okay. Your goal isn’t perfection, which is simply unattainable. Instead, you’re looking to implement the safeguards necessary to ensure that your team and organization are making the best decisions with the best data available. To do so consistently still requires a keen human eye, especially given our increased reliance on machine learning and artificial intelligence (AI) models and tools, particularly those that are used to collect, store, analyze, and visualize data.

In fact, Gartner predicts that, by 2023, 75% of large organizations will hire AI behavior forensic, privacy, and customer trust specialists to reduce reputation risk as a direct result of biased data. In addition, large organizations like Facebook, Google, Bank of America, NASA, and others have moved to appoint forensic specialists who primarily focus on uncovering undesired bias in AI models before they’re deployed. These specialists are validating models during the development phase and continue to monitor them once they’re released into production, as unexpected bias can be introduced because of the differences between training and real-world data.
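One simple way to picture that kind of monitoring is to compare a feature’s average in production against its average in the training data. The sketch below is an illustration under stated assumptions, not a description of how any of the organizations above work: the `standardized_mean_shift` function, the sample ages, and the alert threshold of 2.0 are all hypothetical, and a real review would look at far more than one average.

```python
from statistics import mean, stdev


def standardized_mean_shift(training_values, production_values):
    """Return how far the production mean sits from the training mean,
    measured in training standard deviations."""
    baseline_spread = stdev(training_values)
    if baseline_spread == 0:
        raise ValueError("Training values have no spread; shift is undefined.")
    return (mean(production_values) - mean(training_values)) / baseline_spread


# Hypothetical customer ages seen during training versus in production.
training_ages = [30, 32, 34, 36, 38]
production_ages = [40, 42, 44, 46, 48]

shift = standardized_mean_shift(training_ages, production_ages)

# Hypothetical threshold: flag the feature for human review when the
# production mean drifts more than two training standard deviations.
ALERT_THRESHOLD = 2.0
needs_review = abs(shift) > ALERT_THRESHOLD

print(f"shift = {shift:.2f}, needs review: {needs_review}")
```

A flag like this doesn’t prove that a model is biased; it simply tells a reviewer where to look first.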

Safeguards such as a peer review process are critically important; the more thorough those safeguards, the easier it will be to identify, correct, and reduce incidences of obvious or unexpected bias. Peer review is still one of the most effective ways to ensure that you’re not relying on inaccurate assumptions or biased preconceptions as the foundations of your data storytelling. Hopefully, more trained eyes mean fewer mistakes.
