Tuesday, May 30, 2017

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are

Everybody Lies weaves together two threads. The first is that what people type privately into a search engine reveals far more about their true feelings than what they tell pollsters. (People will often search for socially unacceptable things, such as racist or sexual keywords, but would not admit that to pollsters.) The second is that analyzing big data can give us answers that we could not find using smaller data sets. One example is the finding on violent movies and crime. Using hourly crime data and box office numbers for violent movies, the analysis found a decrease in violent crime when a popular violent movie was playing. This may be because the violence-inclined were watching the movie instead of getting drunk and violent that evening.

Most of the work delves into big data analysis of socially unacceptable topics. People would not tell a pollster something, but they would be willing to type it into a search engine. The difference between public postings on Facebook and private searches on Google would be an interesting avenue of exploration. (But, alas, the book doesn't go there.) The author acknowledges that there are weaknesses in using extremely large data sets. It is quite possible to find an answer that is merely coincidence. You can almost always find the answer you are looking for, but you need to be careful to make sure it is really true. (However, I wonder if he falls victim to this as well. If something is unacceptable in an area, public support for it may be lower while searches are higher. But would the reverse also be true, with people in an area where it is acceptable saying they support it while not actually searching for it?)
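The coincidence problem is easy to see with a toy simulation. This is only a sketch of the general idea, not anything taken from the book: the function names, the sample size of 20 points, and the candidate counts are all my own assumptions. It generates a random "target" series, then checks how well the best of many equally random "candidate" series happens to correlate with it.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_chance_correlation(n_candidates, n_points=20, seed=0):
    """Strongest |correlation| between a random target and random candidates.

    Everything here is pure noise (an assumption of this sketch), so any
    correlation that turns up is coincidence by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(candidate, target)))
    return best


few = best_chance_correlation(5)
many = best_chance_correlation(5000)
print(f"best of 5 random series:    {few:.2f}")
print(f"best of 5000 random series: {many:.2f}")
```

The more candidate series you search through, the stronger the best "match" looks, even though none of them has anything to do with the target. That is the trap with very large data sets: if you test enough variables, one of them will appear to explain whatever you are studying.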

Anti-Muslim behavior provides an interesting case study. After the attacks in San Bernardino, anti-Muslim sentiment was on the rise. Obama tried to quell this by giving a speech stressing peace. The speech went over well with the media. However, a spike in Muslim-hate searches occurred during the speech. The one time those searches went down was when he talked about Muslim athletes. People were then more interested in searching for ways Muslims were similar to them. This helped provide a base for future attempts at reducing violent behavior.

The analysis of "border cases" provides some interesting insights. There is a strict test score cutoff for admission to the most prestigious high school in New York. However, people who barely miss the cutoff seem to get into colleges just as prestigious as those who barely make it. This seems to show the high school itself adds very little value. (However, it could also show that admissions officers favor students from a diversity of schools and may make it more difficult for those in the best school to get into their desired college.) Similar results were shown for people who got into both Penn State and Harvard. Regardless of which school they chose, they seemed to have equally successful careers. (This does leave plenty of questions. Did the people who went to Penn State work harder? Does Penn State have an honors program that provides an environment similar to an Ivy? Is Harvard a mediocre experience for those without wealthy connections? Would people with similar academic profiles who did not apply to either show similar results?) It does seem to show that it is the person, not the circumstances, that leads to success. Or perhaps people who try and fail have a chip on their shoulder and are likely to work harder to succeed in the long run.
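The logic of the border-case comparison can be sketched in a few lines. This is my own rough illustration, not the author's method: the function name `cutoff_gap`, the `bandwidth` parameter, and the numbers in `toy` are all made up for the example and are not real admissions data.

```python
def cutoff_gap(records, cutoff, bandwidth):
    """Mean outcome just above the cutoff minus mean outcome just below it.

    records: iterable of (test_score, outcome) pairs (hypothetical format).
    Only scores within `bandwidth` points of the cutoff are compared, on the
    assumption that students that close to the line are otherwise similar.
    """
    records = list(records)
    just_above = [o for s, o in records if cutoff <= s < cutoff + bandwidth]
    just_below = [o for s, o in records if cutoff - bandwidth <= s < cutoff]
    if not just_above or not just_below:
        raise ValueError("need at least one record on each side of the cutoff")
    return sum(just_above) / len(just_above) - sum(just_below) / len(just_below)


# Made-up (score, later outcome) pairs, purely for illustration.
toy = [(478, 61), (479, 64), (481, 63), (482, 62), (430, 40), (530, 85)]

print(cutoff_gap(toy, cutoff=480, bandwidth=5))    # only near-cutoff students
print(cutoff_gap(toy, cutoff=480, bandwidth=100))  # everyone, admitted vs. not
```

In the toy numbers, comparing everyone admitted with everyone rejected shows a large gap, but that gap mostly reflects the students being different to begin with. Restricting the comparison to students right at the line removes most of that difference, so whatever gap remains is a better guess at what the school itself contributes.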

The author acknowledges a debt of gratitude to Steven Levitt and Freakonomics for inspiring him to look for quirky answers to other problems. (Though he does claim that Levitt has fallen from grace due to political incorrectness and a coding issue; I guess I missed that one.) Now big data is the force that can finally put the "science" into social science. With large data sets, we can legitimately probe human behavior the way natural scientists probe nature. However, there are still challenges. In some instances, "little data" is the better tool. Often the best results come from combining multiple sources, with big data enriched by more traditional "little data".
