Why bother with theory when you’ve got big data?

Big data is hotter than ever. A Google search for the phrase yields about 126,000,000 results, a sign of enormous demand for, and interest in, the field. Every day, companies, recognizing these powerful trends, are making huge investments in artificial intelligence, machine learning, and data science. The most ardent promoters of big data even claim that as we master data, we won’t need the scientific method or theory building.

By “theory” we mean something devilishly simple: a statement of what causes what and why. For instance, our central Theory of Disruptive Innovation explains what causes successful firms to fall to under-resourced upstarts, enabling us to predict outcomes more reliably than we could by relying on complex analysis of big data. Though we at the Christensen Institute use the lens of causal theory to understand the world, we recognize that this may seem counterintuitive, and even contrary to the hard work being done to make the best use of the data that ever-increasing computing power has made so plentiful. Isn’t tangible, countable data more trustworthy than theory?

In fact, data analysis has its limitations. The well-publicized failure of Google Flu Trends, a data-aggregating tool designed to estimate flu activity in real time from flu-related Google searches, is just one example. Though the program was undoubtedly innovative in its approach, it repeatedly overestimated incidences of the flu, demonstrating that more data does not guarantee better information about our complex world. At its best, data analysis helps us discover correlation, and correlation is not causation.

One of my former teachers, K. Codell Carter, a historian and philosopher of medicine, illustrates the power of taking a causal, rather than a correlational, approach to biomedical science. In his book, The Rise of Causal Concepts of Disease: Case Histories, Carter contrasts the trajectories of causal and non-causal thinking in curing two diseases, scurvy and beriberi, each caused by a deficiency of a single nutrient.

Scurvy, a severe vitamin C deficiency, was identified by Hippocrates (460 BC – 370 BC), the famed father of medicine, but was not understood causally until 1927. Throughout the two millennia that preceded a true understanding of its cause, many remedies were prescribed, and by the 1830s something like clinical guidelines had been developed for the prevention and treatment of the disorder. However, without a complete understanding of scurvy’s causal mechanism, physicians and researchers remained confused.

Researchers got what we might call “noisy” data from implementations of their clinical guidelines: sometimes lime juice cured scurvy, and sometimes it did not. To complicate matters, faster steamships and a proliferation of ports meant that sailors reached land sooner, where they had access to good sources of vitamin C. So although the on-ship sources of vitamin C had been attenuated to the point of being ineffective, sailors got to port quickly, found better nutrition there, and therefore felt few, if any, effects of scurvy. Without a drive to understand the causal mechanism for scurvy, these guidelines did not evolve much. Researchers were satisfied with knowing that (most of the time) lime juice cured scurvy. And so medical discoveries stalled.

Beriberi, an insufficiency of vitamin B1, followed a different trajectory. Although the disease was known in the 17th century, serious study of it did not begin until the 1880s. Thanks to the emerging preference among physicians and other researchers for a causal mindset and a theory-driven approach, by 1884 a Japanese researcher had determined that beriberi was found only among sailors whose diet consisted almost entirely of polished rice. One can imagine empirical researchers being satisfied with this correlation and simply working to ensure that sailors had a more diverse diet, just as they had been satisfied knowing that meat or citrus usually cured scurvy. However, the drive toward causal mechanisms kept scientific progress on beriberi moving forward. In the 1880s researchers ran controlled animal studies, and shortly thereafter established definitively that a diet lacking some element found in unpolished rice caused beriberi. By 1913, researchers had identified the very extract of rice bran whose absence caused beriberi.

Big data is just more sophisticated empiricism, and sometimes that’s just fine. If tests on our data show consistent correlation, we might be able to make predictions that are good enough to solve our problems. However, such an approach is shortsighted when you consider that rigorous application of the scientific method leads to theories. When we understand what causes what and why, applying that knowledge predicts outcomes far more powerfully, cheaply, and quickly than searching for patterns among whatever data happens to be available.

For this reason, as we seek to understand the world around us, we can’t be satisfied with correlations. At the Christensen Institute, we have found that a theory-first approach has powerful applications in many sectors, and we encourage skeptics and enthusiasts alike to ground their work in sound causal theories.

Author

David Sundahl
