I am a data scientist and first heard “data is the new oil” nearly fifteen years ago. But oil is useless unless we know what to do with it.
I take the position that there are no “natural” resources. Raw materials are just raw materials until they’re put to use, and how they are used determines their value. That value, in turn, can change over time. Moreover, raw materials come in different grades.
Data is Abundant
It is important to recognize that the use of data to make decisions is not new. Crows can count and use this innate ability to make some decisions. Paleoanthropologists have discovered evidence of humans keeping records thousands of years ago. Ancient civilizations used data, and the pyramids could not have been built without it. Data have been part of military decision-making for centuries, and merchants have always used data, even if crudely and unsystematically. Even astrology has some empirical basis: there are stars, after all. The Royal Statistical Society was founded in 1834 and the American Statistical Association in 1839.
So perhaps oil was the new data.
There are many ways data can be used to make decisions. One is to study change over time. In a business context, we may study a company’s stock price, sales and operating profit over the past decade. Physicians are interested in keeping an eye on our weight, blood pressure and cholesterol levels and how they vary over time. Whether an upward, downward or flat trend is good, bad or irrelevant depends on the circumstances and on how decision-makers will use knowledge of the trend.
Another way to use data is to compare groups we believe are important, whether consumers, companies or other entities, on the basis of numbers we believe indicate important things about them. “Believe” is significant here, since the beliefs we hold most strongly are often the least accurate. We can easily become ego-involved and emotional when cool heads and detachment should prevail.
Though there are deterministic models grounded in first principles, most data cannot be fully explained; hence the error terms in statistical models. We can only guess at the mechanism(s) that gave rise to the data. Statisticians often attribute errors in a model to chance (“random error”), but what this really means is that we don’t know why the results of our model depart from expectation in certain ways. Measurement error, omitted variables, model misspecification and a whole host of other causes can produce a lack of fit or render the model useless when applied to new data.
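To make the point about “random error” concrete, here is a minimal sketch on synthetic data, assuming a made-up data-generating process in which an outcome depends on two variables plus genuine noise. The variable names (`x1`, `x2`) and coefficients are hypothetical. When the model omits `x2`, its effect does not disappear; it is absorbed into the residuals, which grow larger and become strongly correlated with the variable we failed to measure.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5_000

# Hypothetical data-generating process: y depends on x1 AND x2, plus genuine noise.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                  # the variable we will "forget" to measure
noise = rng.normal(scale=0.5, size=n)    # true noise SD is 0.5
y = 2.0 * x1 + 1.5 * x2 + noise

def ols_residuals(X, y):
    """Fit ordinary least squares with an intercept and return the residuals."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

resid_full = ols_residuals(np.column_stack([x1, x2]), y)   # correctly specified model
resid_omit = ols_residuals(x1.reshape(-1, 1), y)           # x2 omitted

# Population values: about 0.5 for the full model, about sqrt(1.5**2 + 0.5**2) for the other.
print(f"residual SD, full model: {resid_full.std():.2f}")
print(f"residual SD, x2 omitted: {resid_omit.std():.2f}")
# The omitted variable's effect now masquerades as "random error".
print(f"corr(residuals, x2):     {np.corrcoef(resid_omit, x2)[0, 1]:.2f}")
```

In real data we rarely know what `x2` is, which is exactly why the leftover variation gets labeled chance.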
Not All Data Is Analyzed in Depth
Of course, a lot of data aren’t analyzed in depth, and not all of them need to be. For example, bad economic news may cause the stock market to suddenly tank. We may want to take a closer look, but we can be reasonably certain that the bad news caused the sudden drop. Not always, though. Snap judgments can prove very costly, especially when we’re making a lot of them. We need to weigh the costs of a more thorough investigation against the costs of being wrong.
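The trade-off in the last two sentences can be framed as a simple expected-cost comparison. The sketch below is a toy illustration only: the probabilities of being wrong, the dollar figures and the helper names (`expected_cost`, `worth_investigating`) are all hypothetical, and real decisions seldom come with numbers this clean.

```python
def expected_cost(p_wrong: float, cost_if_wrong: float, investigation_cost: float = 0.0) -> float:
    """Up-front investigation cost plus the probability-weighted cost of being wrong."""
    return investigation_cost + p_wrong * cost_if_wrong

def worth_investigating(p_wrong_snap: float, p_wrong_study: float,
                        cost_if_wrong: float, investigation_cost: float) -> bool:
    """Investigate only if the reduction in expected error cost exceeds what the study costs."""
    return investigation_cost < (p_wrong_snap - p_wrong_study) * cost_if_wrong

# Hypothetical figures for illustration only.
snap = expected_cost(p_wrong=0.20, cost_if_wrong=500_000)
study = expected_cost(p_wrong=0.05, cost_if_wrong=500_000, investigation_cost=40_000)

print(f"snap judgment:     {snap:,.0f}")
print(f"investigate first: {study:,.0f}")
print(worth_investigating(0.20, 0.05, 500_000, 40_000))

# Making many such decisions multiplies the gap, which is why habitual snap judgments add up.
n_decisions = 25
print(f"gap over {n_decisions} decisions: {n_decisions * (snap - study):,.0f}")
```

With these made-up inputs, investigating first comes out ahead; with a cheaper error or a costlier study, the snap judgment would win.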
Estimating the financial impact of renovating a plant may not require rocket science, but judgments regarding the effectiveness of our marketing activities, for example, are seldom clear-cut. There are many highly correlated variables whose independent effects are difficult or impossible to pry apart. A relationship between two variables may depend on other variables (moderation). It may go through other variables (mediation). Relationships may be curvilinear rather than straight lines. There may be lagged relationships in which the effect of one variable on another does not become evident until later on. Cause and effect can be hard to disentangle. For instance, marketing budgets are driven in part by past sales, but past sales are, in part, driven by past marketing activity.
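Moderation and mediation are easy to confuse, so here is a minimal simulation of each, assuming entirely made-up relationships between hypothetical variables (`ad`, `promo`, `awareness`). It does not cover curvilinear or lagged effects. Under moderation, the slope of ad spend on sales differs depending on whether a promotion is running. Under mediation, ad spend affects sales only through awareness, so its coefficient all but vanishes once awareness is in the model.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000

def ols(X, y):
    """OLS with an intercept; returns coefficients [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# --- Moderation: the effect of ad spend on sales depends on a third variable ---
ad = rng.normal(size=n)
promo = rng.integers(0, 2, size=n)        # hypothetical 0/1 flag: price promotion running
sales_mod = 1.0 * ad + 0.5 * promo + 2.0 * ad * promo + rng.normal(size=n)

# True slopes are 1.0 without the promotion and 3.0 with it.
slope_no_promo = ols(ad[promo == 0].reshape(-1, 1), sales_mod[promo == 0])[1]
slope_promo = ols(ad[promo == 1].reshape(-1, 1), sales_mod[promo == 1])[1]

# --- Mediation: ad spend affects sales *through* awareness ---
awareness = 0.8 * ad + rng.normal(size=n)            # ad -> awareness
sales_med = 1.5 * awareness + rng.normal(size=n)     # awareness -> sales; no direct ad effect

# Total effect of ad is 0.8 * 1.5 = 1.2; the direct effect, holding awareness fixed, is 0.
total_effect = ols(ad.reshape(-1, 1), sales_med)[1]
direct_effect = ols(np.column_stack([ad, awareness]), sales_med)[1]

print(f"ad slope without promo: {slope_no_promo:.2f}")
print(f"ad slope with promo:    {slope_promo:.2f}")
print(f"total effect of ad:     {total_effect:.2f}")
print(f"direct effect of ad:    {direct_effect:.2f}")
```

In real marketing data the true structure is unknown, the variables are correlated and measured with error, and these patterns are far harder to see than in a clean simulation.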
Consumer behavior is not always easy to predict, either. Remember that perfect birthday gift to a close friend or loved one that totally flopped? People can behave in similar ways for similar reasons. But they can also do the same things for different reasons, different things for the same reasons and different things for different reasons.
There’s Still More to Uncover with Data
It is also important to understand that neuroscientists are still unsure of how humans make decisions. See Neuroscience and Marketing, an interview with Professor Nicole Lazar, for a snapshot of this fascinating topic. Who Cares About Evidence? is my own less scholarly take on this subject. Ewout Steyerberg’s Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating is a dense textbook covering precision medicine and related topics, and I can recommend it to statisticians working in any field, including marketing.
Consumer behavior can also differ by product category and purchase occasion (e.g., buying for oneself or for other family members). It can also change over time, since events such as a job promotion, relocation, marriage or the birth of a child can have a dramatic impact on consumers’ behavior. Marketing can influence behavior, and that, of course, is its purpose. However, it can also influence behavior in ways we hadn’t intended and that are undesirable, for example by encouraging bargain hunting and eroding brand equity. Marketers seldom know exactly why individual consumers behave the way they do, even when they can predict that behavior quite accurately, but insight into the why can be immensely helpful to them. New product development is one example.
Let’s return to management decisions. In the heat of battle, some of us have gotten into bad habits, and one is confusing tactical and strategic decisions. Not all decisions must be made instantaneously, despite what we may hear in the blogosphere. Haste can make tons of waste. Marketing research, in fact, has historically made its main contributions at the strategic rather than the tactical level; hence the word research. In my opinion, to become the “new oil,” data will need to be used more fully at the strategic level of decision-making, not just more extensively for purposes such as automated machine-to-machine (M2M) communication.
The data | information | knowledge | wisdom hierarchy still makes plenty of sense.
The human brain is essentially the same as it was 25,000 years ago. Hadoop and Support Vector Machines have yet to play a role in natural selection. We are cave people with computers and balance sheets and have a lot more to learn about how to use data to make decisions. But we’ve been at it for eons.