Data Visualization in a Pandemic World
A data analytics leader examines errors in COVID-19 data analysis and key actions that make a difference in understanding the progression of a global pandemic.
Written by Greg Green, PhD. Read time – 7 minutes
- Data Analytics
- Science in Practice
- Technology and Innovation
- The outbreak of COVID-19 saw the rapid development of data structures and visualizations to track infections, hospitalizations, and deaths.
- Inevitably, attempts to quantify the novel disease in such a fast-evolving environment led to erroneous reporting.
- Highlighting three key areas where mistakes are likely can prepare future data analysts to act sooner and potentially save lives.
At a time when false narratives spread rapidly and undermine traditional sources of authority, integrity in data science and analytics has become especially critical to building trust and credibility.
Since the outbreak of COVID-19, the rapid development of data structures and visualizations to track infections, hospitalizations, and deaths has played a pivotal role in guiding health policy while keeping public anxiety to a minimum throughout the global pandemic. At the same time, the public's lack of knowledge about the disease, combined with the highly fluid environment in which COVID-19 has spread, has inevitably led to inaccuracies. These are exacerbated by the constant evolution of core data sets, which are released by government entities and provide data on hospitalizations and test results. What errors do data scientists need to watch out for? What causes common mistakes in such an immature and fast-moving environment?
As a data analytics professional and leader for over thirty years, I am highlighting three key areas where the high cost of mistakes could negatively impact public health, politics, and the economy: data foundations, advanced analytics, and data-driven leadership. Identifying possible errors, replicating the stronger data visualizations, and using analytics for social good may help us move to act faster in the future and save lives.
Data foundations
Cracks in the foundations of the data sources feeding all of the key metrics and visualizations are evident if we look closely. Let us examine a few and explore their ramifications.
Missing or suppressed data
Since the beginning of the COVID-19 outbreak in China, global tracking of cases and death rates has been less than ideal. In some countries with low official counts, routine reporting has indicated death rates much higher than expected, suggesting a lack of testing or suppression of disease outcome data. Substantive differences across countries in data collection, data quality, reporting gaps, and healthcare access limit our capacity to draw valid cross-country conclusions.
Regional inconsistencies within testing data
Differing rules for administering tests and compiling test-related data make trends in the data inherently misleading. Without random sampling strategies, the tracked data is communicated through biased metrics, with no estimate of the bias and no warning about the metrics' flaws.
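The effect of non-random testing can be sketched in a few lines of Python. The 5% true infection rate and the tenfold testing propensity assumed for infected people are illustrative, not estimates from any real testing program:

```python
import random

random.seed(0)

# Hypothetical population with a true infection rate of 5%.
TRUE_RATE = 0.05
population = [random.random() < TRUE_RATE for _ in range(100_000)]

def biased_sample(pop, n, infected_weight=10):
    # Biased sampling: infected (symptomatic) people are assumed to be
    # ten times more likely to seek and receive a test.
    weights = [infected_weight if infected else 1 for infected in pop]
    return random.choices(pop, weights=weights, k=n)

def random_sample(pop, n):
    # Random sampling: every person is equally likely to be tested.
    return random.sample(pop, n)

biased_positivity = sum(biased_sample(population, 5_000)) / 5_000
random_positivity = sum(random_sample(population, 5_000)) / 5_000

print(f"true rate:     {TRUE_RATE:.1%}")
print(f"biased sample: {biased_positivity:.1%}")  # far above the true rate
print(f"random sample: {random_positivity:.1%}")  # close to the true rate
```

A positivity rate computed from the biased sample lands several times above the true infection rate, which is exactly why such a number should never be reported without a warning about how the sample was drawn.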
Dynamic data availability
Due to real-time availability, even inaccurate data is put to use immediately and thereby shapes public understanding. As errors are corrected, however, revised numbers are released on dashboards and reports without notification. The expanding capacity of data to present the public with a more nuanced and comprehensive view of the pandemic can be observed in Figures 1 and 2. Each figure represents a snapshot of the Johns Hopkins Dashboard separated by an interval of weeks. As data became more available, particularly in the United States, additional toggles under each pane emerged offering additional metrics and granular levels of view. In Figure 3, for instance, which contains data for DuPage County, Illinois, demographic data captures age, case rate per 100,000, poverty level, insurance coverage, and more, leading to visualizations that enable a multifold increase in perspectives on the pandemic as well as the potential for regional comparison.
Advanced analytics
Issues in advanced analytics, including predictive model errors and misuse of statistics, may stem from a lack of expertise or from the intentional misuse of model output. Here are some examples that have occurred publicly and their implications.
Unconstrained early models
Without realistic scenarios to constrain potential outcomes, early models of infection rates and death tolls were extraordinarily inflammatory. This lowered their value, reduced the credibility of the science, and undercut the effects of the significantly improved models that arrived later.
Excessive extrapolation
Lacking basic data inputs, experts have had to extrapolate excessively, using predictive models with error rates above 100% that yield predictions of little value. In the case of social distancing, for instance, dynamic unknowns and fluid conditions left modelers with little option but to guess its impact on case and death rates.
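The sensitivity of such extrapolations is easy to demonstrate: compounding even a modest disagreement about the daily growth rate over a month produces projections that differ by more than an order of magnitude. The starting count and growth rates below are illustrative, not estimates for any real outbreak:

```python
def project(cases_today, daily_growth, days):
    # Simple exponential projection: cases compound at a fixed daily rate.
    return cases_today * (1 + daily_growth) ** days

cases_today = 10_000
days = 30

# Three plausible-sounding growth rates that differ only modestly...
for g in (0.05, 0.10, 0.15):
    projected = project(cases_today, g, days)
    print(f"growth {g:.0%}/day -> {projected:,.0f} cases in {days} days")
```

A ten-point spread in the assumed daily growth rate turns into roughly a fifteenfold spread in the 30-day projection, which is why guessed inputs produce forecasts of little decision-making value.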
Lack of clarity around metrics
A basic metric, such as number of tests, will often be highlighted at the top level of a dashboard (Figure 1). More advanced metrics (Figure 3), such as number of tests per 100,000, which more effectively capture comparisons across states, regions, and countries, will therefore require additional clicks to access. In this way, dashboards facilitate comparisons drawn inappropriately from more basic metrics.
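The difference between a raw count and a per-capita rate is simple arithmetic, but it changes which region looks worse. A minimal sketch with invented county figures:

```python
# Raw counts vs. per-100,000 rates for two hypothetical counties.
counties = {
    "County A": {"cases": 9_000, "population": 300_000},
    "County B": {"cases": 12_000, "population": 2_000_000},
}

def per_100k(cases, population):
    # Normalize a raw count by population to enable fair comparison.
    return cases / population * 100_000

for name, d in counties.items():
    rate = per_100k(d["cases"], d["population"])
    print(f"{name}: {d['cases']:,} cases, {rate:,.0f} per 100,000")
```

County B has more raw cases, but County A's rate per 100,000 is five times higher. A dashboard that surfaces only the raw count invites exactly the wrong comparison.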
Data-driven leadership
In contexts of high data fluidity, it falls to leadership to evaluate complex and uncertain results. Non-technical leaders must ask better questions about the data sources and the models’ underlying assumptions, while also being clear about the potential risks and uncertainties of their conclusions. More technical leaders, such as data scientists, must anticipate the misuse of their insights, models, and forecasts and step forward to challenge incorrect conclusions when they arise. If leaders are insufficiently informed or wedded to predetermined courses of action, misleading conclusions are easily drawn. Figures 4 and 5 serve to contrast a prudent and clear presentation with a more obfuscating one.
Ultimately, the effectiveness of leadership’s use of data science and analytics to gain cooperation from the population and make decisions that change the course of the epidemic will be judged in the months and years ahead. For now, as the disease continues to spread, we can take a few positive actions that will make a difference not only in how we visualize and consume the information coming our way but also in the actions we take as a result of those data visualizations.
Whether you are a member of the public or a leader using data to inform decisions and gain credibility with stakeholders, here are some key actions you can take to improve your data-driven influence:
- Be vigilant and continue to assess the approaches used by labs, governing leaders, and neighboring countries in terms of their appropriateness and accuracy in presenting data-based insight.
- Look for action-oriented metrics and visuals while discounting, and declining to spread, informational, extraneous, or interim metrics. For instance, a metric like “percentage ICU bed utilization” is action-oriented: it drives a region’s decision to move from phase three to phase four. A metric like “number of tests,” by contrast, is informational and less directly tied to such actions.
- Take action on, and support, metrics that are reliable and that indicate a need to change behavior, such as rising infection rates, rising hospitalizations, and shrinking ICU capacity.
- Support and encourage leaders with consistent, transparent, and well-designed data visualizations. As consumers of data, we need to reward the responsible actions of our leaders with votes, vocal support and feedback, and amplification of their message through social media and other influence channels.
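To make the distinction between informational and action-oriented metrics concrete, an action-oriented metric can be wired directly to a decision rule. The sketch below is illustrative: the 80% threshold and the phase labels are assumptions for this example, not any jurisdiction's actual reopening criteria:

```python
# Turning an action-oriented metric into an explicit decision rule.
def icu_utilization(occupied_beds, total_beds):
    # Fraction of ICU beds currently occupied.
    return occupied_beds / total_beds

def recommend_phase(utilization, threshold=0.80):
    # Hypothetical rule: hold reopening whenever utilization is at or
    # above the threshold; otherwise allow the region to advance.
    if utilization >= threshold:
        return "hold at phase three"
    return "advance to phase four"

u = icu_utilization(410, 500)  # 82% of ICU beds occupied
print(f"ICU utilization: {u:.0%} -> {recommend_phase(u)}")
```

No analogous rule can be written for “number of tests”: the count, by itself, implies no action, which is the practical test of whether a metric deserves top billing on a dashboard.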