Friday, February 12, 2021

Calculated Risks: How to Know When Numbers Deceive You

The best way to lie to somebody is to present accurate statistics that they do not understand. In Calculated Risks, Gerd Gigerenzer presents many examples in which people make bad decisions based on a poor understanding of the numbers. The first part of the book focuses on medicine. Doctors tend to be very smart, but they are not experts in statistics. People are often confused when risk is expressed in probabilities rather than frequencies. Relative risk reduction is especially misleading. If 1000 people participate in a screening and 3 die, while 4 of 1000 in a control group die, the screening has a relative risk reduction of 25%, which sounds impressive. However, the absolute risk reduction is only 0.1%, which doesn't sound as good. Analyzed further, the average increase in life expectancy works out to 12 days, which sounds even worse. And that doesn't even take into account the costs.
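The screening numbers above can be checked directly. A minimal sketch, using only the figures quoted from the book (3 vs. 4 deaths per 1000):

```python
# Screening example: 3 of 1000 screened die vs. 4 of 1000 in the control group.
screened_deaths, control_deaths, n = 3, 4, 1000

control_risk = control_deaths / n    # 0.004
screened_risk = screened_deaths / n  # 0.003

# Relative risk reduction: the drop as a fraction of the control group's risk.
relative_risk_reduction = (control_risk - screened_risk) / control_risk
# Absolute risk reduction: the raw drop in risk.
absolute_risk_reduction = control_risk - screened_risk

print(f"Relative risk reduction: {relative_risk_reduction:.0%}")  # 25%
print(f"Absolute risk reduction: {absolute_risk_reduction:.1%}")  # 0.1%
```

The same data yields "25%" or "0.1%" depending purely on which denominator is chosen, which is the book's point.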

Breast cancer is called out as an area where bad statistical understanding has led to negative outcomes. Studies have shown no increase in life expectancy from universal mammograms for women under 50. However, there is a significant risk of false positives, as well as an increased risk of cancer caused by the radiation exposure in the test itself. Alas, rather than being upset at needless intervention, people are often relieved when a doctor cuts into the breast only to find no cancer. The many statistics about breast cancer lead to confusion. The cancer has a fairly high prevalence among women, yet it is often not the cause of death. Similar to prostate cancer, many people who die of other causes are found to have the cancer. These cancers are often benign and do not negatively impact quality of life. Treating them with surgery and/or radiation, however, does.

The book gives an example of expressing the same data differently. "The probability of breast cancer is 0.8%. If a woman has breast cancer, there is a 90% chance of a positive mammogram. If she does not have cancer, there is a 7% chance of a positive mammogram." Stated this way, it sounds like a positive test result makes it extremely likely that one has cancer. However, expressing it in natural frequencies makes things clearer. "8 of 1000 women have breast cancer. Of these 8, 7 will have a positive test. Of the 992 women without cancer, 70 will also have a positive test." This is the same data expressed differently, but it makes it much more obvious that somebody with a positive result most likely does not have cancer.
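The frequency version is just Bayes' rule made visible. A sketch using the book's numbers (0.8% prevalence, 90% sensitivity, 7% false-positive rate):

```python
# Mammogram example: how likely is cancer given a positive test?
p_cancer = 0.008             # prevalence: 8 in 1000
p_pos_given_cancer = 0.90    # sensitivity
p_pos_given_healthy = 0.07   # false-positive rate

# Total probability of a positive test (true positives + false positives).
p_pos = p_cancer * p_pos_given_cancer + (1 - p_cancer) * p_pos_given_healthy

# Bayes' rule: P(cancer | positive test).
p_cancer_given_pos = p_cancer * p_pos_given_cancer / p_pos

print(f"P(cancer | positive) = {p_cancer_given_pos:.1%}")  # about 9%
```

Roughly 7 true positives out of 77 total positives, so a positive mammogram here means about a 9% chance of cancer, not 90%.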

Knowing the prevalence in a population can be extremely important in understanding results. If somebody has HIV, there is a 99.9% chance they get a positive test result. If they are not infected, there is a 99.99% chance they will get a negative test. This makes the test seem rock solid. However, if somebody is not in a high-risk category, their chance of having HIV is about 0.01%. Thus in a group of 10,000 low-risk men, 1 will have the virus and 9,999 will not. The one with the virus will very likely get a positive result. But of the 9,999 men without HIV, about 1 will also get a false positive. Thus, even with very high specificity and sensitivity, a positive result from a low-risk population has only about a 50% chance of being accurate. In the case of AIDS, many people have committed suicide after getting a positive result, even though the accuracy of that result for their population is about the same as a coin flip.
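The same Bayes' rule calculation applies here; only the prevalence changes. A sketch with the low-risk numbers from the text (the function name is my own):

```python
# Probability of actually being infected, given a positive test.
def positive_predictive_value(prevalence, sensitivity, specificity):
    true_pos = prevalence * sensitivity            # infected and test positive
    false_pos = (1 - prevalence) * (1 - specificity)  # healthy but test positive
    return true_pos / (true_pos + false_pos)

# Low-risk population: prevalence 0.01%, sensitivity 99.9%, specificity 99.99%.
ppv = positive_predictive_value(0.0001, 0.999, 0.9999)
print(f"P(HIV | positive, low-risk) = {ppv:.0%}")  # about 50%
```

With a prevalence of 1 in 10,000, the lone true positive and the lone false positive are about equally likely, hence the coin flip.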

Criminal justice often misuses statistics on both sides. In the O.J. Simpson trial, the defense portrayed his beating of his wife as irrelevant to her eventual murder. Millions of men beat their wives, yet only a small number eventually murder them. However, expressing the data in different frequencies produces a different picture. Of 100,000 battered women, 45 were murdered. That seems to back up the defense. However, of those 45 murdered women, 40 were murdered by their partners, while 5 were murdered by somebody else. Oops! Now the wife beating seems like a very relevant indicator of the murder. The inverse of this is the "prosecutor's fallacy", in which the prosecutor confuses the probability of observing a set of characteristics in an innocent person with the probability that the defendant is innocent.
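The trick is which conditional probability gets quoted. A sketch with the battered-women frequencies from the text:

```python
# Battered-women example: two conditional probabilities from the same data.
battered = 100_000
murdered = 45
murdered_by_partner = 40

# The defense's number: P(murdered | battered) -- tiny.
p_murdered = murdered / battered

# The relevant number: P(partner did it | battered AND murdered) -- large.
p_partner_given_murdered = murdered_by_partner / murdered

print(f"P(murdered | battered) = {p_murdered:.3%}")
print(f"P(partner | battered and murdered) = {p_partner_given_murdered:.0%}")
```

Both numbers are accurate; the defense simply conditioned on the event that favored its case, since the murder had in fact already happened.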

The book ends with some hypothetical problems and examples of "deliberate misleading" with statistics. (In one case, risk was expressed using low absolute numbers, while benefit was shown using high relative numbers.) Statistical literacy is a key trait in society today. Alas, even most highly educated people lack it and will fall prey to intentional (and unintentional) misleading representations.
