Thursday, November 04, 2021

The Data Detective: Ten Easy Rules to Make Sense of Statistics

On the surface, statistics appear to be concrete facts that describe the world. In reality, they are highly susceptible to manipulation, both intentional and unintentional. Understanding the details behind the numbers is key to making accurate decisions.

In the UK, one part of the country appeared to have a much higher infant mortality rate than another. What could be causing that? The rate differed even among similar demographic groups in the two locations. The "difference" turned out to lie entirely in how the statistics were reported. The same event would be recorded as a "late miscarriage" in one place and as a "live birth followed by death" in another. The events were identical, but the different reporting made one region appear to have lower infant mortality. Reporting differences like this also explain some of the variation in infant mortality across countries. Outcomes are not necessarily worse; they are simply recorded differently. It is common for elaborate charts and analyses to be built on unreliable data.

Scientific studies can be very prone to publication bias. An unexpected, novel result is likely to get a great deal of press coverage, while "boring" results are less likely to be published at all. There is also a strong incentive to publish novel results rather than try to reproduce previous ones. Several researchers may have run similar experiments, but only the novel result made it into print. Then there are issues with the studies themselves. The threshold for statistical significance is arbitrary, and a researcher may change the size of the dataset until the results come out the way they want. Psychology and medicine are both especially prone to publishing only the "good stuff," leaving us with little ability to know what is truly "good."

On an individual level, we are also susceptible to bias from our pre-existing beliefs. We are more likely to consume media that backs up those beliefs. People who think like us are more likely to be given the benefit of the doubt than those who contradict our viewpoints. The same thing happens in research: results that differ from what is expected are more likely to be discarded.

Visualizations are a great way to mislead. Numbers of different types can be placed side by side to produce misleading comparisons. Changes in color and scale can alter our interpretation of the data. Unrelated data can be shown together to imply relationships where none exist. Graphs and infographics are easily digestible, but they do not necessarily show the whole picture.

The use of data can affect the data itself. If people know that data will be used for a particular purpose, they may change their behavior in response to its collection. Wrestlers aim for a specific weight class before they weigh in, so in the data it is rare to find a wrestler at the bottom of a weight class. Teachers who are graded on student performance have an incentive to manipulate that performance, so the measure of student performance is no longer fully objective. Government officials who can control data may manipulate it to serve their own objectives. (This is why it is important to have non-partisan data organizations.)

The Data Detective does a great job of guiding us through the pitfalls and benefits of the statistics we encounter in our lives. It builds on past work while presenting concrete cases that help us better understand what is going on in the world around us.

