Perspectives

Data Visualization in a Pandemic World

A data analytics leader examines errors in COVID-19 data analysis and key actions that make a difference in understanding the progression of a global pandemic.

Written by Greg Green, PhD.
7-minute read

  • Data Analytics
  • Science in Practice
  • Technology and Innovation

In Brief

  1. The outbreak of COVID-19 saw the rapid development of data structures and visualizations to track infections, hospitalizations, and deaths.
  2. Inevitably, attempts to quantify the novel disease in such a fast-evolving environment led to erroneous reporting.
  3. By highlighting three key areas where mistakes are likely, future data analysts will be prepared to take action sooner and potentially save lives.

At a time when false narratives spread rapidly and undermine traditional sources of authority, integrity in data science and analytics has become especially critical to building trust and credibility.

Since the outbreak of COVID-19, the rapid development of data structures and visualizations to track infections, hospitalizations, and deaths has served a pivotal role in leading health policy while keeping public anxiety at a minimum throughout the global pandemic. At the same time, the public's lack of knowledge about the disease, combined with the highly fluid environment in which COVID-19 has spread, has inevitably led to inaccuracies. These are exacerbated by the constant evolution of core data sets, which are released by government entities and provide data on hospitalizations and test results. What errors do data scientists need to watch out for? What causes common mistakes in such an immature and fast-moving environment?

Drawing on more than thirty years as a data analytics professional and leader, I highlight three key areas where the high cost of mistakes could negatively impact public health, politics, and the economy: data foundations, advanced analytics, and data-driven leadership. Identifying possible errors, replicating the stronger data visualizations, and using analytics for social good may help us act faster in the future and save lives.


Data Foundations

Cracks in the foundations of the data sources feeding all of the key metrics and visualizations are evident if we look closely. Let us examine a few and explore their ramifications.

Missing or suppressed data

Since the beginning of the COVID-19 outbreak in China, global case and death rate tracking has been less than ideal. In some countries with low official counts, routine mortality reporting has indicated death rates much higher than expected, suggesting a lack of testing or suppression of disease outcome data. Substantive differences in data collection, data quality, reporting gaps, and healthcare access across countries limit our capacity to draw valid cross-country conclusions.

Regional inconsistencies within testing data

Differing rules for administering tests and compiling test-related data make trends in the data inherently misleading. Without random sampling strategies, the tracked data gets communicated as biased metrics, with no estimate of the bias and no warning about the flawed nature of the metrics.
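
One way to see how non-random testing biases a headline metric is a small simulation. The prevalence, testing probabilities, and sample sizes below are purely hypothetical; this is a sketch assuming symptomatic (infected) people are far more likely to seek and receive tests than everyone else:

```python
import random

random.seed(0)

# Hypothetical population: 5% true infection prevalence.
population = [random.random() < 0.05 for _ in range(100_000)]

# Symptom-driven testing: infected people are far more likely to be tested,
# so the tested group is not a random sample of the population.
tested_biased = [infected for infected in population
                 if random.random() < (0.60 if infected else 0.02)]

# Random-sample testing: every person has the same chance of being tested.
tested_random = random.sample(population, 2_000)

positivity_biased = sum(tested_biased) / len(tested_biased)
positivity_random = sum(tested_random) / len(tested_random)

print("True prevalence:        ~5%")
print(f"Biased positivity:      {positivity_biased:.1%}")   # far above 5%
print(f"Random-sample estimate: {positivity_random:.1%}")   # close to 5%
```

The positivity rate from symptom-driven testing overstates prevalence by an order of magnitude, yet it is often the number that reaches a dashboard without any accompanying bias estimate.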

Dynamic data availability

Because data is available in real time, even inaccurate data is put to use immediately and shapes public understanding. As errors are corrected, however, improved numbers get released on dashboards and reports without notification. The expanding capacity of data to present the public with a more nuanced and comprehensive view of the pandemic can be observed in Figures 1 and 2. Each figure represents a snapshot of the Johns Hopkins Dashboard separated by an interval of weeks. As data became more available, particularly in the United States, additional toggles under each pane emerged offering additional metrics and granular levels of view. In Figure 3, for instance, which contains data for DuPage County, Illinois, demographic data captures age, case rate per 100,000, poverty level, insurance coverage, and more, leading to visualizations that enable a multifold increase in perspectives on the pandemic as well as the potential for regional comparison.


Advanced Analytics

Errors in advanced analytics, including predictive model mistakes and statistical misuse, may stem from a lack of expertise or from intentional misuse of model output. Here are some examples that have occurred publicly, along with their implications.

Reckless predictions

Without realistic scenarios to constrain potential outcomes, early models of infection rates and death tolls were extraordinarily inflammatory. This lowered their value, reduced the credibility of the science, and undercut the reception of the significantly improved models that arrived later.

Forced speculation

Lacking basic data inputs, experts have had to extrapolate excessively, relying on predictive models with error rates above 100% that yield predictions of little value. In the case of social distancing, for instance, dynamic unknowns and fluid conditions left modelers with little option but to guess its impact on case and death rates.
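
To make "error rates above 100%" concrete: one common forecast-accuracy measure is the mean absolute percentage error (MAPE), and a MAPE above 100% means the typical miss is larger than the quantity being predicted. The weekly counts below are invented for illustration:

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error: average of |forecast - actual| / actual."""
    return sum(abs(f - a) / a for a, f in zip(actuals, forecasts)) / len(actuals)

# Hypothetical weekly death counts vs. an early model's forecasts.
actuals   = [100, 150, 200, 250]
forecasts = [300, 400,  90, 600]

err = mape(actuals, forecasts)
print(f"MAPE: {err:.0%}")  # above 100%: the typical miss exceeds the value itself
```

A forecast this far off conveys almost no information about next week's count, which is why such models damage credibility more than they inform policy.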

Lack of clarity around metrics

A basic metric, such as the number of tests, is often highlighted at the top level of a dashboard (Figure 1), while more advanced metrics (Figure 3), such as the number of tests per 100,000, which more effectively support comparisons across states, regions, and countries, require additional clicks to access. In this way, dashboards invite comparisons drawn inappropriately from the more basic metrics.
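
The normalization behind a per-100,000 metric is simple arithmetic, yet it can reverse a ranking based on raw counts. The region names and counts here are hypothetical:

```python
# Hypothetical counts: raw totals vs. population-adjusted rates.
regions = {
    # region: (tests_performed, population)
    "Region A": (50_000, 1_000_000),
    "Region B": (12_000, 100_000),
}

for name, (tests, pop) in regions.items():
    per_100k = tests / pop * 100_000
    print(f"{name}: {tests:,} tests -> {per_100k:,.0f} per 100,000")

# Region A leads on the raw count (50,000 vs. 12,000), but Region B is
# testing at more than twice Region A's rate (12,000 vs. 5,000 per 100k).
```

A dashboard that surfaces only the raw count at the top level steers readers toward exactly the wrong comparison.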


Data-Driven Leadership

In contexts of high data fluidity, it falls to leadership to evaluate complex and uncertain results. Non-technical leaders must ask better questions about the data sources and the models’ underlying assumptions, while also being clear about the potential risks and uncertainties of their conclusions. More technical leaders, such as data scientists, must anticipate the misuse of their insights, models, and forecasts and step forward to challenge incorrect conclusions when they arise. If leaders are insufficiently informed or wedded to predetermined courses of action, misleading conclusions are easily drawn. Figures 4 and 5 serve to contrast a prudent and clear presentation with a more obfuscating one.


Figure 4.

  1. Strong contrasting examples: Figure 4 uses clear data analytics for decision metrics, while Figure 5’s lack of clarity suggests an arrangement intended to support a predetermined agenda.
  2. Use of layered metrics: Figure 4 presents a new, higher death count in an identical format, layering in deaths likely due to COVID-19, which builds trust in the data reporting even as it changes.
New York City Department of Health

Figure 5.

  1. Use of consistent metrics: In contrast to Figure 5, Figure 4 limits the confusion stemming from inconsistent and variable single-day metrics by using visuals that show the daily progression in the context of prior days and weeks.
Pulled from Georgia Department of Health prior to correction.

Ultimately, the effectiveness of leadership’s use of data science and analytics to gain cooperation from the population and make decisions that change the course of the epidemic will be judged in the months and years ahead. For now, as the disease continues to spread, we can take a few positive actions that will make a difference not only in how we visualize and consume the information coming our way but also in the actions we take as a result of those data visualizations.

Key Actions

Whether you are a member of the public or a leader using data to inform decisions and gain credibility with stakeholders, here are some key actions you can take to improve your data-driven influence:

  1. Be vigilant and continue to assess the approaches used by labs, governing leaders, and neighboring countries for their appropriateness and accuracy in presenting data-based insights.
     
  2. Look for action-oriented metrics and visuals, and avoid spreading informational, extraneous, or interim ones. For instance, a metric like "percentage ICU bed utilization" is an action-oriented metric that drives a region's decision to move from phase three to four, whereas a metric like "number of tests" is informational and less directly related to such actions.
     
  3. Take action on, and support, metrics that are reliable and that indicate a need to change behavior, such as rising infection rates, increasing hospitalizations, and shrinking ICU capacity.
     
  4. Support and encourage leaders with consistent, transparent, and well-designed data visualizations. As consumers of data, we need to reward the responsible actions of our leaders with votes, vocal support and feedback, and amplification of their message through social media and other influence channels.
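
The distinction in item 2 between action-oriented and informational metrics can be sketched as a simple decision rule. The 80% threshold and the function names below are hypothetical illustrations, not any jurisdiction's actual criteria:

```python
def icu_utilization(occupied_beds: int, total_beds: int) -> float:
    """Fraction of staffed ICU beds currently occupied."""
    return occupied_beds / total_beds

def reopening_phase_allowed(utilization: float, threshold: float = 0.80) -> bool:
    # Hypothetical decision rule: advance to (or remain in) the next
    # reopening phase only while ICU utilization stays below the threshold.
    return utilization < threshold

u = icu_utilization(occupied_beds=172, total_beds=200)
print(f"ICU utilization: {u:.0%}, advance phase: {reopening_phase_allowed(u)}")
```

The point of the sketch is that an action-oriented metric maps directly onto a decision; a raw test count, by contrast, feeds no such rule.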

Greg Green, PhD

Executive Director, Analytics Programs at the University of Chicago

As executive director of analytics programs at the University of Chicago, Greg Green architects and leads programs that strategically apply analytics to solve complex industry problems with greater speed and impact. 

Throughout his professional career, Greg has used his expertise in digital strategies, business analytics, and new product development to drive rapid revenue growth and accelerate business transformation. His previous work bringing innovation to an academic environment included authoring a Marketing Analytics course, designing a prerequisite applied statistics course for full- and part-time programs, and serving as a lecturer for Marketing Analytics and a Foundational Statistics Bootcamp at Northwestern University.

Greg’s industry roles include chief analytics officer at Harland Clarke Holdings, director at Google, EVP/managing director at Publicis Groupe, and analytics practice lead at PwC. Greg’s patented cloud-based media analytics platform was highlighted in Harvard Business Review and Fast Company.

Greg holds a doctor of philosophy in mathematics from Claremont Graduate School and a master of science in statistics from Michigan State University.