In college, the one class I hated more than any other was chemistry lab. It wasn’t because of the calculations involved. It wasn’t because I couldn’t come to terms with the difference between molality and molarity. Heck, it wasn’t even the incomprehensible pseudo-English of my lab instructor. For me, it all boiled down to the humble test tube.
Yes, a test tube. More specifically, the meniscus of a liquid in a test tube. For those of you who skipped chem lab, a meniscus is the curve in the surface of a liquid close to where it touches the container, caused by surface tension. When looking at a test tube, the meniscus must be taken into account in order to obtain a precise measurement. Every time I leaned over to look at the test tube, I could never really figure out where the crescent bottomed out, and I always felt like my measurements were inexact. Every deviation from a calculated value was the result of that dang meniscus, or so I thought. I couldn’t bear the imprecision.
Scientists understand the value of precision; the higher the probability of error, the less accurate your conclusions will be. Successful experiments are predicated upon high levels of precision. But this sort of precision is a function of mathematics and the accuracy of the instruments used to measure whatever is being tested. Thank the gods of fate and circumstance that no parallel exists in the corporate world. If I’m trying to predict the purchasing behavior of a consumer based upon a set of characteristics that can only be measured by anecdotal evidence supplied by marketers, then it’s not precision I am concerned with; it’s an interpretation of reality.
Sadly, the advent of big data threatens to change all of that. Not all by itself, of course; the analysis of big data is a means to an end. The heavy lifting is through instrumentation – the ability of an application or a device to collect runtime intelligence around usage levels, errors, user behavioral patterns, and ultimately the statistical analysis of this information. Everything we touch and everything we do is slowly being instrumented, and patterns of behavior are dissected and analyzed across a wide variety of psychographic domains – gender, ethnicity, age, socioeconomic status, and even sexual orientation. Every time you click on a link, something is recorded. That recording is compared against other links you have clicked on. Many times, those links are aggregated into a repository that has a history of your clicks. Machines share this information with each other, creating specific profiles of you and generic profiles of the various market segments that you belong to.
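The click-recording-and-aggregation loop described above can be sketched in a few lines. This is a hypothetical illustration only; the names (`ClickEvent`, `Profile`, `ingest`) and the flat in-memory dictionary are my assumptions, standing in for the distributed pipelines a real tracking system would use.

```python
# Hypothetical sketch: aggregating click events into per-user profiles.
# All names and structures here are illustrative, not any vendor's API.
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class ClickEvent:
    user_id: str
    category: str  # e.g. "sports", "baby", "electronics"

@dataclass
class Profile:
    clicks_by_category: Counter = field(default_factory=Counter)

    def record(self, event: ClickEvent) -> None:
        # Each click is compared against (added to) the user's history.
        self.clicks_by_category[event.category] += 1

    def top_interest(self) -> str:
        # The most-clicked category becomes the user's inferred interest.
        return self.clicks_by_category.most_common(1)[0][0]

profiles: dict[str, Profile] = {}

def ingest(event: ClickEvent) -> None:
    profiles.setdefault(event.user_id, Profile()).record(event)

for cat in ["sports", "baby", "baby", "electronics"]:
    ingest(ClickEvent("u1", cat))

print(profiles["u1"].top_interest())  # -> baby
```

Real systems differ mainly in scale: the same per-user roll-up, repeated across tens of millions of users, is what turns individual clicks into the generic market-segment profiles mentioned above.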
Over a couple of thousand people, such data collection is only of middling interest. It’s too restricted to offer a compelling profile of a group of users. But turn that couple of thousand into tens of millions, and then parse that across multiple data elements, and you have an entirely different story. All of a sudden, it’s no longer the whiz-kid sales agent who knows how to spot a winning product; a company has a better than 50% probability of selling a product based upon specific criteria it has collected about a given product’s differentiating factors. And maybe a 75% probability of selling you something it thinks you might like.
Some time ago, Target mailed coupons for baby clothes and diapers addressed to a high school girl. An irate father showed up, demanding to talk to a manager and asking if Target was trying to encourage his daughter to get pregnant. The manager apologized profusely and even followed up a few weeks later to find out if the coupons had stopped. Over the phone, the father abashedly admitted that his daughter was in fact pregnant and due later that year. Target was able to identify not just that the girl was pregnant, but also reasonably predict her due date; it sent her marketing material in order to capture her in her second trimester – the holy grail for retailers hoping to lock in new customers. They knew she was pregnant, and how far along she was, even before her father did.
It doesn’t seem that complicated, but the amount of data that needs to be crunched in order to accurately predict the trimester is gigantic. It requires looking at tens of millions of purchases to find out what kinds of products women tend to buy when they are in their second trimester. It also requires looking at digital footprints: tracking what ads a woman looks at when pregnant, and what she clicks through in her second trimester. It involves knowing what she will buy with an online coupon, and whether a coupon sent in the mail will trigger a weekend shopping trip at her local retailer. And it involves knowing the probability of a future visit using the Starbucks coupon they are going to print on her receipt.
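At its core, a prediction like this reduces to a weighted score over purchase signals. The sketch below is a toy: the signal names and weights are invented for illustration, whereas a real retailer would fit them from millions of transactions rather than hand-tune them.

```python
# Toy illustration of a purchase-based prediction score.
# The signals and weights below are invented; a real model would learn
# them from historical transaction data, not from a hand-written table.
SIGNAL_WEIGHTS = {
    "unscented_lotion": 0.30,
    "prenatal_vitamins": 0.45,
    "large_tote_bag": 0.10,
    "cotton_balls": 0.15,
}

def pregnancy_score(purchases: set[str]) -> float:
    """Sum the weights of the signals present in a shopper's basket;
    a higher score means the predicted condition is more likely."""
    return sum(w for item, w in SIGNAL_WEIGHTS.items() if item in purchases)

score = pregnancy_score({"unscented_lotion", "prenatal_vitamins"})
print(round(score, 2))  # -> 0.75
```

The hard part is not this arithmetic but discovering which signals matter and what their weights should be, which is where the tens of millions of crunched purchases come in.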
Despite the massive amounts of data that were analyzed and crunched to create this amazing and true scenario, it only scratches the surface of what big data is ultimately capable of accomplishing. Collect enough data, and your probabilistic modeling starts approaching 100%, at least in category subsets. Using eugenics to determine the probability of a person’s suitability for a job or a field of study might be against the law, but what is to stop an employer or a university from running a person’s demographic, Facebook, Twitter, credit card, grades, and test records through a big data machine to figure it out? Why wouldn’t you want to, if the answer you ended up with was in excess of 90% accurate?
We are in an era of instrumentation. Although many things we touch are already instrumented, the approaches we see today are almost rudimentary compared to what we will see in the future. Today we measure purchases and mouse clicks; tomorrow we will be tracking how long our gaze rests on which image in an advertisement, or how often we use product names when speaking into our cell phones. We will measure how long it takes to make a decision when presented with a screen, or whether we change our mind when it comes to selection, or how many options we consider when looking at a product. Companies will measure everything they possibly can in the hope that someone will develop a way to monetize that information. And every data point collected brings that company closer to refining and perfecting its product and marketing strategy.
It doesn’t have to be about sales and marketing, of course. Another major role instrumentation will play is in the optimization of manufacturing and operations. Instrumentation will identify not just what is going wrong inside an organization, but also what is about to go wrong. How many times does an employee commit the same error? How often is that error repeated across the same type of employee doing the same task? Is it a function of the employee or does the task need to be modified to make it less complex? Before, these questions were hardly ever asked, and when they were, they could only be answered by veterans of an organization. Soon, analytics will be able to predict when processes are failing under stress, or what processes are inefficient due to human errors.
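The error-rate questions in the paragraph above amount to a simple roll-up: count attempts and errors per role and task, and flag tasks whose failure rate suggests the task, not the person, is the problem. The sketch below is illustrative; the sample data, the 20% threshold, and all names are my assumptions.

```python
# Sketch of an operational error-rate roll-up: flag (role, task) pairs
# whose failure rate exceeds a threshold. Data and threshold are invented.
from collections import defaultdict

attempts: defaultdict = defaultdict(int)
errors: defaultdict = defaultdict(int)

def log_attempt(role: str, task: str, failed: bool) -> None:
    attempts[(role, task)] += 1
    if failed:
        errors[(role, task)] += 1

def risky_tasks(threshold: float = 0.2) -> list:
    """Return (role, task) pairs failing more often than the threshold;
    a high rate across many people doing the same task hints the task
    itself needs to be simplified."""
    return [key for key in attempts
            if errors[key] / attempts[key] > threshold]

for failed in [True, True, False, False, False]:
    log_attempt("clerk", "invoice_entry", failed)

print(risky_tasks())  # -> [('clerk', 'invoice_entry')]
```

Once such counters exist, the predictive step the paragraph anticipates is a matter of watching these rates trend upward under load before a process visibly fails.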
We all like to look at ourselves as unique individuals with unique circumstances and unique capabilities, and in many ways we are. But where we are not unique is in our habits: habits that describe how we lead our lives, how we make decisions, and what attracts us to consume something. Understanding and exploiting these habits is not a challenge to our uniqueness; rather, it is an affirmation of the banality of our decision-making when we allow ourselves to run on autopilot. Nothing messes up a good model like an exception to the rule.