The idea of harnessing “big data” is hardly new, yet continuing improvements in computing power have brought it to fruition in ways that should have a genuine impact on future economic growth and the way we live our lives. At one level, big data the noun is just the accumulated record of our modern, digitized life, or what has been termed the “digital exhaust” we leave behind when we use computing devices to interact with the World Wide Web, social networks, and online shopping sites. We don’t mean to make it sound trivial or wasteful. On the contrary, more of this data is purposely being stored precisely because of what clever organizations can learn from it. So at another level, big data the verb is the process of analyzing historical digital records to predict future behavior and thereby gain actionable insights. The ability to act on such insights in a timely manner is what makes big data such a big opportunity.
It has become common knowledge that the amount of data being produced and stored by the modern economy is vast. Total data stored in the year 2000, when most of it was not even digital, has been estimated at 1 exabyte (that’s 1 billion gigabytes). By 2012 it had grown to 2,800 exabytes, nearly all of which was digital. Of this amount, probably less than 10% was what is known as structured data, such as records of financial transactions. This is the stuff of spreadsheets and relational databases, which for decades have been the lifeblood of business operations.
What constitutes the rest? After years of extensive Internet activity by the broad population, as well as the proliferation of connected devices, it is unstructured data: the phone calls, texts, pictures, videos, browser clicks, etc. that we humans create, plus the machine data created by smart meters, industrial machinery, health monitors, connected cars and appliances, radio frequency ID tags, and the other digital devices that constitute what is now known as the “Internet of Things.” Estimates place the expected number of unique connected devices at between 25 and 30 billion by the year 2020. This unstructured data is where the visionaries can see incremental business opportunity, if they can figure out how to extract value from the data.
As analysts, we have expertise in reviewing structured data such as financial statements. Structured data as found in a traditional database can be characterized by its accuracy, consistency, completeness, and durability. No matter who accesses this data, or from where they do so, all parties can rely on it being basically accurate and complete. Unstructured data is just the opposite. It has no defined format, because it can represent anything from bits and words to voices and images. As digital information it can be stored easily and ever more cheaply, as the cost of storage halves approximately every two years. However, the lack of structure means that organizing, re-accessing, and using such unstructured data in a meaningful way can be difficult.
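The halving rule cited above compounds quickly, which is why storing ever more digital exhaust keeps getting cheaper in absolute terms. A minimal sketch (the $0.10 per gigabyte starting price is a hypothetical for illustration; the halving-every-two-years rate is the assumption stated in the text):

```python
# Starting price is hypothetical; the halving-every-two-years rate is the
# assumption stated in the text.
cost_per_gb = 0.10  # dollars per gigabyte at year 0

for year in (0, 2, 4, 6, 8, 10):
    cost = cost_per_gb / 2 ** (year // 2)
    print(f"year {year:2d}: ${cost:.4f}/GB")
```

Over a single decade the price falls by a factor of 2**5, or 32x, under this assumption.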
In their book Big Data, Viktor Mayer-Schönberger and Kenneth Cukier provide a brief history of the art of prediction. They note mankind’s obsession with measuring the world and everything in it, essentially trying to quantify and ultimately predict natural phenomena. But by the late 19th century, governments and businesses could not efficiently process all of the data at their disposal, which led to the rise not only of statistical sampling but also of automated data processing. Sampling, while imperfect, remains to this day a common method of predicting results when it isn’t possible to poll an entire population. The concept of big data, however, harnesses the quantum leaps in data processing capabilities to allow analysts to move beyond the sampling process.
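The shift the authors describe, from sampling to effectively complete data sets, can be illustrated with a small simulation: as the sample grows toward the full population, the estimate converges on the true value. A minimal Python sketch (the population size and the 30% rate are invented for illustration):

```python
import random

random.seed(42)

# Hypothetical population: one million "users", 30% of whom exhibit some behavior.
population = [1] * 300_000 + [0] * 700_000
true_rate = sum(population) / len(population)

for n in (100, 10_000, 1_000_000):
    sample = random.sample(population, n)
    estimate = sum(sample) / n
    print(f"sample size {n:>9,}: estimate {estimate:.4f} "
          f"(error {abs(estimate - true_rate):.4f})")
```

At a sample size equal to the full population, sampling error disappears entirely, which is the "N = all" world the book describes.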
With data stores that contain extensive records of all sorts of human (and machine) activity, available samples are becoming huge and in some cases effectively complete. In this environment, the “thinking” has to be done by computers, because it is one thing to analyze a company’s financial statements, but quite another to ask a mere human to trawl the massive data store known as the World Wide Web to find information that might have business or social value. That’s why big data is considered a subset of the field of artificial intelligence, specifically machine learning. The more interesting type of machine learning is “unsupervised,” a way for a machine to learn through observation, thereby gaining experience it can ultimately use to make predictions.
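As a toy illustration of the “unsupervised” learning described above, here is a minimal k-means clustering sketch in Python: the program is never told that any groups exist, yet it discovers them simply by iterating over the data. (The data and the choice of k-means are our illustration, not the authors’.)

```python
import random

random.seed(0)

# Two unlabeled "behavior" clusters; the algorithm is not told which point
# belongs where, or even that there are two groups worth finding.
points = ([random.gauss(2.0, 0.3) for _ in range(50)] +
          [random.gauss(8.0, 0.3) for _ in range(50)])

# k-means with k=2: alternate between assigning each point to the nearest
# center and moving each center to the mean of its assigned points.
centers = [0.0, 10.0]  # deliberately poor initial guesses
for _ in range(10):
    clusters = {0: [], 1: []}
    for p in points:
        nearest = min((0, 1), key=lambda c: abs(p - centers[c]))
        clusters[nearest].append(p)
    centers = [sum(clusters[c]) / len(clusters[c]) for c in (0, 1)]

print(f"learned centers: {centers[0]:.2f}, {centers[1]:.2f}")
```

Despite starting from poor guesses, the learned centers settle near the true group means of 2.0 and 8.0 with no labels ever provided.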
However, Mayer-Schönberger and Cukier note that when handing off the heavy lifting to a computer, one thing analysts must accept in order to process big data is the presence of inaccuracies. In the world of structured data (e.g., spreadsheets and relational databases) inaccuracies are anathema. But in the big data world, the goal is to understand general trends in order to make better business decisions. It’s not about reflecting the world with precision, but rather about focusing resources in the right direction, as fast as possible. Perhaps more difficult still, working with big data may force analysts to set aside their instincts, because a computer may tell us what is happening, and even what to do about it, but give no clue as to why. When time is of the essence, “why” doesn’t matter; only “what” matters.
Although computers can process lots of information quickly and even learn on the job, they still need to be given a set of instructions to follow, known as a program or an algorithm. Humans still design those programs, so capitalizing on big data is a function of three factors: available information, processing power, and cleverness. Some examples may help to explain. Apparently, Amazon’s recommendation engine is vastly superior to its competitors’, not only driving 30% of its sales but also putting Amazon’s book review team out of work. Although based on algorithms designed and constantly refined by humans, the engine’s ability to track customers’ browsing and purchase history makes it much more productive than human reviewers. A more general example of using big data to make business more productive is tracking the machine data produced by sensors embedded in industrial machinery, vehicles, and structures. When programmed to monitor the feedback, computers can recommend a maintenance action before normal wear becomes breakage. Finally, consider an example where social good more than business development was at issue. In 2009 Google was able to track the spread of the H1N1 flu virus more or less in real time (much more quickly than the Centers for Disease Control) because Google had invested the time to develop an algorithm that correctly identified outbreaks of the flu based on which search terms were prevalent in regions that proved to be affected.
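A recommendation engine of the “customers who bought this also bought” variety can be sketched in a few lines with simple co-purchase counting. Real systems, Amazon’s included, are far more sophisticated, and the purchase histories and titles below are invented for illustration:

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories; titles are invented for illustration.
orders = [
    {"sci-fi novel", "space opera", "headphones"},
    {"sci-fi novel", "space opera"},
    {"sci-fi novel", "cookbook"},
    {"cookbook", "apron"},
    {"space opera", "headphones"},
]

# Count how often each pair of items appears in the same order.
co_bought = Counter()
for order in orders:
    for a, b in combinations(sorted(order), 2):
        co_bought[(a, b)] += 1
        co_bought[(b, a)] += 1

def recommend(item, k=2):
    """Return the items most frequently bought alongside `item`."""
    scored = [(other, n) for (i, other), n in co_bought.items() if i == item]
    return [other for other, n in sorted(scored, key=lambda t: -t[1])[:k]]

print(recommend("sci-fi novel"))
```

The insight the text emphasizes applies even at this toy scale: the program never knows why two items go together, only that they do, and for timely action that is often enough.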
McKinsey & Company has attempted to quantify the benefits to our economy of using the analysis of big data to cut operating costs, detect fraud, and generally increase efficiency. They see $610 billion of annual cost reductions by 2020 just in the retail, manufacturing, health care, and government arenas. It may seem like a huge target, but they reason that as the most productive companies implement advanced analytics on big data sets, their share gains will prompt others to do the same, thereby creating a higher overall standard.
The Investment Implications
Most technological developments are heralded by both excitement about the potential growth of the companies that invented or first implemented them, and concern over those whose business models are threatened. Interestingly, big data doesn’t seem to present that same dynamic. One might initially suspect that as the prevalence of unstructured data overwhelms structured data, purveyors of traditional relational databases would be threatened. But as business activity continues to grow, so does structured data, and mission-critical business information is most appropriately stored in relational databases, which have been developed over decades to be highly reliable and user friendly. Major database vendors also have the resources to build or buy management tools for unstructured data.
One obvious area of growth is in simple data storage itself. As companies learn the potential value of the data that their operations have been generating, they will likely choose to store more of it. We realize that the dynamic of the storage industry is one where sales of significant additional capacity are offset by falling selling prices. Nevertheless, the trend has been a modest but positive net impact on revenues over time. We believe many companies are straining their storage assets, so we expect that a new investment cycle for storage vendors will come soon.
It takes years to build a library of data with enormous scope and scale, something with the characteristics of the Web itself, just as it takes years to build up a financial network or an unassailable patent portfolio, and we like businesses with such strong underlying, if intangible, assets. A Web-scale data store, such as a search engine or social network, can be very attractive at the right price.
More broadly, however, we agree that there are any number of ways to skin the big data cat, and if McKinsey is right, this trend will dovetail nicely with our thesis about the renaissance of American manufacturing. It would seem that this stool has yet another leg. And what applies to manufacturing can apply to many other areas. Data is varied, and finding what is relevant and tapping it for business insights needs to become a core competency for any company, one on which we will stay focused.
The one undeniable and unavoidable conclusion we can draw from the increasing adoption of big data analysis is that a big data world is essentially a Big Brother world. The fact that Target Corporation can track a teenager’s purchases of such things as vitamins and unscented lotions and correlate them with other customers’ purchase histories to figure out her due date before her parents even realize she’s pregnant is a little creepy, though many expectant parents probably just appreciate the coupons that arrive. Some of the privacy concerns about video surveillance and recorded phone calls stem not from commercial operations but from issues of national security, and most people would probably trade a bit of privacy to lessen the chance of being successfully targeted by terrorists. However, much of our digital exhaust is entirely of our own making and under our own control. If we want to keep it to a minimum, we can log off Facebook and shop only with cash at physical stores. Otherwise, we may consider ourselves quite welcome in the world of big data.
The U.S. dollar has been strong recently, and the stock market has begun to encounter a bit of turbulence. Historically, a stronger dollar has actually been supportive of stocks, reflecting a stronger economy and typically pushing price/earnings multiples higher. However, with economic growth stagnating in many parts of the world outside the U.S., such an environment may in the near term negatively impact American companies with significant overseas operations. The Federal Reserve has also clearly begun the process of ratcheting back its policy of extremely easy money. When combined with the continuing rise of stocks, we may be set up for a correction.
Longer term, two of our biggest concerns remain the very large buildup of U.S. government debt and unfunded future liabilities, and whether the political will exists to address these problems. If it doesn’t, eventually the financial markets will force the issue. But in the meantime, market participants will be more focused on the economic cycle, and the U.S. economy still seems capable of respectable if not dynamic growth for some time to come. Thus, while a general stock market correction seems overdue, we strongly emphasize our belief that if and when a correction does occur, it will be within the context of an ongoing secular bull market.