When is Data Visualisation more Effective than Stats?

Why is Data Visualisation so trendy nowadays? In reality, it has always been, but it is becoming more and more critical because large (VERY large) datasets are being collected every minute around the globe. In fact, more than 2.5 quintillion bytes of data are generated every day. Interestingly, up to 90% of the data that we have available today has been created in the last 2-3 years alone. This spike in information generation doesn’t necessarily mean we are becoming more knowledgeable. In fact, analysing multiple sources of heterogeneous data poses new challenges for sensemaking and for making informed decisions. This series of TED talks provides a quick glimpse of what can be done with all this information.

One of the most effective ways to understand data in the age of Big Data is to leverage our visual reasoning. Reasoning through visualisations is often much faster and more reliable than reasoning over raw numbers in our heads. This is related to the Gestalt Principles, which state that every stimulus is perceived in its simplest form. These principles explain why data visualisations give us a holistic view of data and let us rapidly see whether there are any potential trends or patterns that deserve further exploration. But can’t we just do the same with stats? Statistics are commonly more accurate than some visualisations. In fact, some visualisations are just bad. For example, pie charts (which can be considered the worst charts in the world) generally fail at doing what data visualisations are meant to do: facilitate the quick comparison of segments of data. One critical problem is that they encode each data segment as an arc and an area of a circle, and humans are bad at comparing perceived areas.

However, in some cases, visualisations can be very powerful, even more powerful than stats. The classic example that illustrates the advantages of describing data through visualisation over stats is Anscombe’s Quartet. It is composed of four datasets that have nearly identical descriptive statistics. For example, they have the same mean, variance, correlation and linear regression line (to about two decimal places):

Source: The Anscombe’s Quartet
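The summary statistics can also be checked directly. The following is a minimal sketch in Python using only the standard library; the data values are the ones published in Anscombe's 1973 paper, while the names `QUARTET`, `summarise` and `summaries` are just illustrative choices for this post.

```python
from statistics import mean, variance

# The four datasets of Anscombe's Quartet (values from Anscombe, 1973).
# Datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(x, y):
    """Return basic descriptive statistics and the least-squares line for (x, y)."""
    mx, my = mean(x), mean(y)
    vx, vy = variance(x), variance(y)  # sample variances (n - 1 denominator)
    # Sample covariance between x and y.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    r = cov / (vx * vy) ** 0.5         # Pearson correlation coefficient
    slope = cov / vx                   # least-squares regression slope
    intercept = my - slope * mx        # least-squares regression intercept
    return {"mean_x": mx, "mean_y": my, "var_x": vx, "var_y": vy,
            "corr": r, "slope": slope, "intercept": intercept}


summaries = {name: summarise(x, y) for name, (x, y) in QUARTET.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimals, the four printed rows come out practically indistinguishable, which is exactly why summary numbers alone can hide very different datasets.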

However, when plotted, each dataset presents a quite different shape:

Source: The Anscombe’s Quartet

Although more advanced statistics would also capture the differences that the plots reveal, this example demonstrates the power of visualisation to describe the behaviour of a dataset very quickly. In short, exploring data visually can be a very powerful tool. Under certain circumstances visualisations may fall short, and under others they may provide valuable insights. This is a call for us to use visualisation wisely.
