The term “predictive analytics” is grossly overused in the startup world these days. I’d say about 20% of the pitch decks I see reference some sort of predictive or machine-learning feature. Often it’s used aspirationally (e.g., on the product roadmap three years out) or just thrown in to add a data-driven aspect to an otherwise traditional approach. Investors, the press and industry types are in on it too. Using Google Trends data as a proxy, public interest in predictive analytics has been on a steady uptick, increasing fivefold since 2009.
We’re warming to the term “predictive” because it’s a handy catchall for everything taking advantage of a pervasive trend: the availability of tons of new data, much of which is real-time and some of which isn’t even structured (i.e., analysis-ready). “Predictive” is a common thread between Amazon and Netflix recommendations, Palantir’s signal-from-the-noise tech, Rocket Fuel’s ad optimization magic, IBM Watson’s Ken Jennings-beating secret sauce, Siri (and Cortana!), and so on. Even President Obama got in on the excitement with a recent announcement of public and private funding for tools to map the brain’s neural networks, a project that is equal parts neuroscience and computer science. The accurate prediction of complex outcomes, by systems so sophisticated as to appear “intelligent,” has been a holy grail of futurists for centuries. The prospect of harnessing “big data” in combination with new machine learning techniques would seem to mark a turning point that we’re only now approaching. The truth is, we still face a lot of hard work to make sense of the growing pile of data we’re now able to collect about ourselves, our businesses and the world around us. But the progress we’ve made puts us on a pretty exciting trajectory:
Data Analysis For Prediction Is Ancient, But The Input Required Was Lame
Humans have been able to observe patterns within complex systems in order to accurately presage events for millennia. Several predictions we’ve become quite good at involve natural events like sunrises, tides and moon phases. Generations of astronomers did so through dogged observation, literally marking the movement of astral bodies daily for decades in order to understand seemingly mysterious events like the retrograde motion of Mars or the transit of Venus. When the first computing devices came along, they were thought of as time savers, or tools that merely sped up what man could already conceptualize (and even do himself, given the time). Examples include the abacus, Babbage’s proposed engines and punch-card mainframes. We did the real thinking, machines did the grunt work, then we pulled insights from their output.
Machines! They Learned Quickly, But Still Needed Tons Of Manual Feeding
In the late 1950s, a new concept began to emerge around what computer “programs” (which began as little more than overgrown stacks of punch cards) might actually be able to do: make “intelligent” decisions rather than merely execute calculations. The term “machine learning” was coined in 1959. The basic gist was that a computer’s ability to do X could improve autonomously, without human intervention, by “learning.” We’ll illustrate with a hypothetical program whose task is to distinguish a human face from other animals’ faces. To begin with, a programmer would “teach” the program a thing or two about the general characteristics of the human face vs. other animals’ faces. He’d do this by giving it a data set of many faces (of both types) plus the correct answers (human or not human). Then the program would analyze the data and identify, for example, the graphical similarities within the human group, the similarities within the non-human group, and the most common differentiating factors. Now, having learned from its “training set,” the program can be served new, unseen photos and determine on its own whether each one depicts a human visage. Over time, whether through additional training or continuous feedback, the predictive software should get better and better at its job. But at the end of the day, the decisions such a program could make were still pretty one-dimensional, limited by the need to hand-choose factors and feed in training data for each analysis.
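To make the face example a bit more concrete, here is a minimal sketch of that train-then-predict loop in Python. It assumes each photo has already been boiled down to two hand-chosen numbers, a hypothetical “fur coverage” and “snout length” on a 0 to 1 scale (which is exactly the hand-picking limitation described above), and it uses a simple nearest-centroid classifier, not anything a real face-recognition system would use. The names `train`, `predict` and `TRAINING_SET`, and all of the data, are made up for illustration.

```python
from math import dist

# Hypothetical, hand-labeled training set: (features, correct answer).
# Features are (fur_coverage, snout_length), both on a 0-1 scale.
TRAINING_SET = [
    ((0.05, 0.10), "human"),
    ((0.10, 0.15), "human"),
    ((0.90, 0.80), "not human"),
    ((0.85, 0.60), "not human"),
]


def train(examples):
    """'Learn' one centroid (average feature vector) per label."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, features):
    """Label an unseen photo with the label of its nearest learned centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = train(TRAINING_SET)
print(predict(centroids, (0.08, 0.12)))  # prints: human
print(predict(centroids, (0.70, 0.90)))  # prints: not human
```

In this sketch, “additional training” just means appending newly labeled examples to the training set and calling `train` again; everything the program “knows” still comes from the features and labels a person chose to feed it.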
Ubiquitous Data + Autonomous Learning = Hands-Free Intelligence
Since the inception of machine learning as a concept over 50 years ago, the ability of software to crunch more data faster has skyrocketed, thanks both to the work of pioneering data scientists and the steady forward march of processor power. It’s gotten to the point where machines can run calculations so complicated that they begin to appear “intelligent.” John McCarthy (a father of artificial intelligence alongside Minsky, Shannon, Solomonoff, etc.) said it best: “every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.” Whether and when these simulations amount to human-grade “intelligence” is the real sticking point (as is whether tests like Turing’s famous thought experiment are the best measures of success). Miles Brundage and Joanna Bryson wrote a nice piece on the topic, published not too long ago on Slate. They argue that we’d be remiss to dismiss the progress of technologies like Watson as mere computation, while highlighting the view that we’re still a far cry from AI (the stance of one of my favorite modern thinkers, Douglas Hofstadter). I tend to agree, but would point out that as the amount of accessible data grows, and as our programs’ ability to learn from that data in an ad hoc manner expands, software begins to surface more “proactive analytics,” which may blur the line between what we call predictive and intelligent.
What To Expect In Software: The Transition From Pull To Push
I think we’re now likely entering a 10-year period in which predictive analytics turns the corner from being a “pull” technology (i.e., dump in the data, get a result) and develops the capability to parse a large amount and variety of data to “push” the right insights, at the right time, in the right way to users. This isn’t necessarily intelligence, but because these systems can access a variety and volume of data not easily comprehensible to humans, and analyze it with methods far beyond our computational abilities, they’ll appear to know something we don’t. This concept certainly applies to both consumer and business software, but in B2B we’re already starting to see some pretty amazing applications.
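The pull-versus-push distinction can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: the “forecast” is just a moving average of the last few readings, standing in for a real predictive model, and the names `pull_forecast`, `PushMonitor` and `notify` are hypothetical. In the pull version the user has to ask; in the push version the system watches incoming data and speaks up only when its forecast crosses a threshold the user cares about.

```python
from collections import deque
from statistics import mean
from typing import Callable


def pull_forecast(history, window=3):
    """Pull: the user asks for a forecast and gets one back on demand.
    (Placeholder model: average of the last `window` readings.)"""
    return mean(history[-window:])


class PushMonitor:
    """Push: watch each new reading and notify the user only when the
    (placeholder moving-average) forecast first crosses the threshold."""

    def __init__(self, threshold: float, notify: Callable[[float], None], window: int = 3) -> None:
        self.threshold = threshold
        self.notify = notify
        self.readings = deque(maxlen=window)  # keeps only the latest readings
        self.alerted = False  # avoids repeating the same alert

    def observe(self, value: float) -> None:
        self.readings.append(value)
        forecast = mean(self.readings)
        if forecast > self.threshold and not self.alerted:
            self.notify(forecast)  # the insight is pushed to the user
            self.alerted = True
        elif forecast <= self.threshold:
            self.alerted = False  # re-arm once things settle back down


# Usage: the pull user must remember to ask; the push user just gets told.
print(pull_forecast([4, 6, 8, 12, 14, 16]))  # prints: 14

alerts = []
monitor = PushMonitor(threshold=10, notify=alerts.append)
for reading in [4, 6, 8, 12, 14, 16]:
    monitor.observe(reading)
print(len(alerts))  # prints: 1 (only the first threshold crossing is pushed)
```

Suppressing repeat alerts with the `alerted` flag is one design choice among many; the point is that the system, not the user, decides when an insight is worth surfacing.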
For example, emerging software startups are attempting to identify the next likely location of a crime (Knightscope) or the best combination of lifestyle and medication (Lumiata) on a person-by-person basis. No one would give up their cops or doctors for the time being, but one can begin to see how the line between what we deem predictive and intelligent is starting to blur. In enterprise software, the reins have been fully handed over to machines for some decisions. The proliferation of sensor data is revolutionizing logistics and supply chain management (SCM), generally increasing transit efficiency and inventory availability by 30-40% (and much more in some cases). Online prices for tickets, hotels and other non-retail items are often fully controlled by algorithms that factor in weather, device type, macroeconomic metrics and more; as you likely know from Kayak, the results are inscrutable to the layman. Marketing is in the midst of a decade-long transformation from a gut-and-style business (think Mad Men) to a fully quantified field advanced by data scientists and hackers (Rocket Fuel on the backend, and emerging players like TrackMaven in marketer-facing applications) [disclosure: TrackMaven is one of ours!]. While it’s my view that most of this predictive analytics progress is happening in the enterprise, expect it to enter your home and daily life if it hasn’t already. The key consideration is that in all these cases, complex and actionable insights are being surfaced without a commensurate increase in the work required to feed programs training data. Not to downplay the difficult work of software engineers here, but when our programs determine not only the results, but also when and where we need them, analytics becomes a much more frictionless “push” process that we can spend less time governing. And for the scientist, salesperson, marketer and consumer alike, that’s a big deal.
Now, the moment you’ve been waiting for: “where we’re going, we don’t need roads.” Or at the very least we won’t need to look up directions.