Big data, and its effects on online markets, has been thrust into the center of the tech policy chattering class debate. In the last few weeks, events have been held on both sides of the Atlantic focusing on the concept of big data as an entry barrier. (The topic has also come up in speeches by FTC Commissioners [and a paper], in discussions surrounding the EU’s forthcoming Digital Single Market strategy, and is a frequent topic of recent academic writing.) Specifically, the concept being debated is whether the accumulation of data by Internet companies hinders competition because new entrants will not be able to compete effectively with the first mover in the marketplace. In this post, I will address why startups and entrepreneurs should not be overly concerned.
In a stylized view of the Internet economy, as a platform (such as Google, Facebook, Amazon, Pinterest or Twitter) achieves scale and gains users, it acquires more data. This data leads to product improvement, which leads to more users and, subsequently, more data. The process repeats. According to proponents of the data as a barrier to entry theory, this leads to an unbreakable positive feedback loop that makes effective competition impossible.
However plausible this argument sounds, a review of the short history of the Internet economy, which has been characterized by intense competition and frequent disruption, seems to cast doubt on the soundness of the theory. (See Andres Lerner’s discussion of the User Scale – Service Quality feedback loop.) Besides the common examples of Facebook overtaking Myspace and Google overtaking prior search competitors (who, at the time, were predicted to be unassailable largely on account of the User Scale – Service Quality feedback loop discussed above), a casual look at online markets illustrates how competitive the market is. Why are online markets so competitive even though some firms are believed to have an unassailable advantage in big data?
First, this view of Internet markets is extremely simplistic. Data is just one input of many in the process of innovation and market success. Second, unique economic characteristics of data, such as its non-rivalrous nature and its diminishing marginal returns, mean that the accumulation of data, as opposed to other barriers to entry like intellectual property portfolios or high fixed capital costs, does not in and of itself function as much of a barrier at all. When you couple these characteristics with the fact that data, and the tools to use and analyze data, are readily available from numerous third party sources, the notion of an iron-clad data feedback loop falls apart.
I’ll break this down piece by piece.
1) Data is non-rivalrous and non-exclusive
Some have compared data to oil, referring to it as the essential input of the 21st century. Although, rhetorically, this analogy has superficial appeal, it is also misleading, as Geoff Manne and Ben Sperry point out:
“But to say data is like oil is a complete misnomer. If Exxon drills and extracts oil from the ground, that oil is no longer available to BP. Data is not finite in the same way.”
In economic terms all information, including data, is non-rivalrous and non-exclusive (Matt Schruers has touched on this before). In other words, if Twitter knows that I am a male, in a relationship and like sports, Facebook can also know those things. Twitter knowing these things prevents Facebook neither from knowing them nor from using that knowledge to improve its product to serve my tastes. Therefore, as an input, data does not function as a barrier to entry in the way that, say, exclusive spectrum ownership or access to rare-earth minerals serves as a barrier to entry in the mobile telecommunications space (to pull two examples from the technology policy world). In the case of spectrum, purchasing the exclusive rights to operate on a segment of a nation’s airwaves means, by definition, that the segment is not available to other competitors. This limits the number of firms that can provide nationwide mobile communications services. As for rare-earth elements (REE), mobile phone manufacturers need these to build their devices. When one firm consumes an REE in the manufacturing process, it is no longer available to other firms.
Furthermore, users tend to multihome, meaning they use many online platforms at the same time. Whether that means using both Google and Bing, or Pinterest and Twitter, the fact that someone built a successful product does not mean that consumers will use that product exclusively. In the online matchmaking world (which I will discuss in more detail later), consumers often use many online dating products at the same time. What does this mean in terms of data? Multiple companies have access to the same user data at the same time, and the use of one Internet platform does not prevent other Internet platforms from obtaining the same data from the same users.
To the extent that data, especially basic consumer behavior and preference data, is deemed essential to competitive success, the fact that no firm can control it or exclude others from using it means that it does not function as a barrier to entry in the way a finite, excludable resource could.
2) The marginal returns on data diminish rapidly
To illustrate this point theoretically, one need not go much further than Stats 101. If one is to conduct a survey of voting preferences for an upcoming election, one needs to construct a large enough sample size to ensure accuracy. However, each additional survey participant does not increase the quality of the survey by the same amount as the one before her. For example, if a pollster has 5 respondents, adding a 6th proves extremely valuable. If the pollster already has 100,000 respondents, then adding an additional one is almost insignificant. Therefore, a survey that has 100,000 respondents is not twice as accurate as a survey with 50,000 respondents. In fact, in both cases, the margin of error is less than 1%. (For the stats nerds, accuracy generally increases as the square root of the sample size, so doubling the sample size equates to only a roughly 41% increase in precision, which is the same as shrinking the margin of error by about 29%. Hence, the rapidly declining returns to scale.)
[Although the above example has been simplified for clarity, a more thorough explanation of this concept’s applicability to “big data” can be found in Andres Lerner’s paper, paragraphs 61 – 76.]
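For readers who want to check the polling arithmetic themselves, here is a minimal sketch in Python. It assumes a simple random sample, a 95% confidence level (z = 1.96), and the worst-case 50/50 split; the function name `margin_of_error` is my own label for illustration, not something taken from Lerner's paper.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion estimated from
    a simple random sample of size n (assumed: 95% level, p = 0.5)."""
    return z * math.sqrt(p * (1 - p) / n)

# Compare tiny and very large samples.
for n in (5, 6, 50_000, 100_000, 100_001):
    print(f"n = {n:>7,}: +/- {margin_of_error(n) * 100:.4f} percentage points")

# How much does one extra respondent help at each scale?
gain_small = margin_of_error(5) - margin_of_error(6)
gain_large = margin_of_error(100_000) - margin_of_error(100_001)

# How much more precise is the survey after doubling the sample?
doubling_ratio = margin_of_error(50_000) / margin_of_error(100_000)
print(f"precision ratio after doubling: {doubling_ratio:.3f}")
```

Under these assumptions, both large samples land well under one percentage point, the sixth respondent improves the estimate far more than the 100,001st does, and doubling the sample improves precision by a factor of the square root of two.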
To veer back into the real world, what are the practical effects of this mathematical reality? It is why most Internet companies test algorithmic changes on a small subset of users (see Facebook and Google). After a certain point, there is little competitive advantage to scale. And, as industries grow, the competitive advantage a larger rival has over a smaller rival becomes even smaller.
This mathematical reality leads to the conclusion that how companies utilize and parse the data is much more important than the sheer volume of data a company has.
3) Data is readily available in the marketplace
A quick read of the FTC’s recent report on data brokers makes clear how easy data is to obtain on the open market. Although the report calls for greater transparency and accountability, it also makes clear that these services facilitate dynamic online competition:
[C]onsumers benefit from increased and innovative product offerings fueled by increased competition from small businesses that are able to connect with consumers they may not have otherwise been able to reach.
Although the report focused on nine of the biggest data brokers, it also makes clear that there are many more companies and products in the market providing similar services to businesses. Therefore, a startup company can avail itself of data-driven insights similar to those of the market leaders with large user bases, as the report notes:
Among other things, the analytics products offered by some of the data brokers enable a client to more accurately target consumers for an advertising campaign, refine product and campaign messages, and gain insights and information about consumer attitudes and preferences.
4) The market for data analytics is also robust
The market for data analytics, meaning companies that make tools to help customers derive insights from data, is also incredibly robust. In 2015, the data analytics market is predicted to be worth $125 billion. Although it is beyond the scope of this article to go into great detail on this phenomenon, it is worth noting that companies looking to utilize the data they either have or acquire can quickly, and relatively cheaply (as compared to building these tools from scratch in house), benefit from the insights of big data. There are even free, widely-used open source technologies that allow users to analyze large datasets (e.g., Hadoop). And, as the FTC report discusses, many data brokers provide businesses with structured and analyzed data, not just raw data sets.
5) The value of data decreases rapidly over time
The value of big data is fleeting. Historical data can be mined for trends, which can be helpful from a product improvement standpoint, but historical data is of little value for real-time decisions, such as ad targeting, thus limiting the advantages conferred to incumbents who have caches of historical data. As noted in a paper by Darren Tucker and Hill Wellford, 70% of unstructured data is stale after 90 days. As a result, most data processing and analysis is done in real time (or on a near-real-time basis).
6) Barriers to entry online are very low
Focusing on data as a barrier to entry in online markets obscures the fact that the Internet is a dynamic marketplace that has drastically lowered barriers to entry. The capital costs of starting and scaling a business online are significantly lower than in the offline world. Worldwide reach, standardized technology and communications protocols, and rapid price decreases in things like cloud platforms and storage mean that it is cheap, and getting cheaper by the day, to build an online business. As I have discussed previously, these characteristics allow firms to scale quickly but they also allow potential competitors to scale quickly and overtake them:
On the Internet, consumers can flock to the best product or service en masse almost instantly. This means the best product or service often quickly gains impressive market share. However, the same dynamics that precipitated the rise of companies like Google and Facebook also place extreme competitive pressure on them.
In fact, the widespread availability of data (and data processing tools) lowers barriers to entry more than it entrenches current incumbents. You don’t even have to start out with users anymore to obtain data about consumer preferences and online behavior. Thus, on the first day of a product’s launch, a company can have already designed a product that is informed by consumer preferences and that has the programming infrastructure to respond intelligently to specific customers.
7) Ideas matter more than data
So, what insights can be derived from the preceding discussion of the economic characteristics of data?
Undeniably, more data helps companies refine and evolve their products, but this is true across all sectors of the economy. Traditional retailers, such as Tesco and Walmart, actively collect a myriad of data about consumers’ shopping preferences. Individual stores produce heat maps to determine the most trafficked floor space, which dictates where retailers place certain products. In the auto industry, companies like Volvo collect data on their cars through thousands of sensors that both help service current automobiles and inform later design changes. These processes mirror those of online companies that use data to better tailor their products to consumer preferences. Indeed, this is testament to how competitive these markets are and the need for constant product improvement to stay relevant.
However, a trove of data is not hugely important to building a better product and succeeding in the marketplace. The quality of service offered to users is the single biggest determinant of success for new Internet products and services. In terms of building a successful business model to compete with incumbents, it helps to build a better mousetrap. Or, in other words, attack the same problem in a different way.
Google achieved success over other search engines by conceiving of a better way of matching users’ queries to relevant websites. The fact that Yahoo and AltaVista had a lead in the race for data didn’t matter much when Google conceived of a better way to do things. In the case of Facebook, it built a social network that users liked better (even though social networks like Myspace and Friendster had large user bases and data advantages).
Or, for a possibly more illustrative cutting-edge example, it helps to look at the evolution of the market for Internet-powered dating services. If ever there were a market where data would serve as a barrier to entry, online dating would be a perfect example. Given the complex, varied, and poorly understood nature of human affection, possessing a large user base (and their detailed personal information and preferences) and troves of data about human attraction and relationship compatibility should give early movers in the online dating space an unassailable advantage. Yet, Tinder — an online dating app that launched less than 3 years ago — is adding a million users a week and is already valued at over $1 billion. Given the nature of this market, and the theoretically large data advantage rivals such as Match.com, eHarmony and OkCupid enjoyed, the iron-clad feedback loop theory of data should have meant that Tinder shouldn’t even have tried to compete, let alone achieved significant success. But, the founders of Tinder — like Internet entrepreneurs that went before them — thought they had a better idea. In the case of Tinder, this was the “double opt in”. In other words, the fact that Tinder didn’t have a matchmaking algorithm (and loads of matchmaking data) didn’t matter. As one GQ author puts it:
The key to Tinder—the “double opt-in”—is an idea born of real-world experience (this is what you want in a bar—to know that the person you want to hit on wants you to hit on him or her) as opposed to sophisticated computer metrics.
The process Tinder’s founders created through the mobile application (the double opt-in, where users declare secretly who they are attracted to and are only matched after both say yes) immediately posed a challenge to the established dating websites and their algorithms. Now, Tinder has millions of users and, according to its own stats, facilitates 21 million matches a day, which gives it a mountain of data through which it can refine its product and tailor its user experience. (Given that many of Tinder’s users use multiple online dating services, this also illustrates the multihoming concept discussed previously.) But that data isn’t going to stop the next entrepreneur who thinks she has a better approach from starting a mobile dating app. In fact, a number of Tinder competitors have emerged, with different ideas on how to best facilitate matchmaking. For example, Hinge, which combines Tinder’s location-driven approach with your friend information from Facebook and your past preferences, is trying to solve what its founder saw as a problem with Tinder: the randomness factor that makes the app uncomfortable for some users. According to a Hinge spokesman, “If Tinder feels like meeting a stranger at a bar, Hinge feels like getting warmly introduced at a cocktail party.”
Hinge is also a great example of the role of data in online competition. Although Hinge uses data it gains from its users to improve its matching algorithm, the fact that other dating platforms already had a lot of data did not prevent it from entering the market. A Hinge factsheet explains its process:
Think of setting up your pickiest friend. First, you’d think of all the people you know who he/she might like to meet. Then you would prioritize those recommendations based on what you know about your friend…. Finally, over time you would start to learn his/her tastes and refine your recommendations. That’s exactly how Hinge’s algorithm works.
In this case, data about individual users improves the app’s performance, but not having detailed information on users did not prevent market entry.
Since its launch, Hinge has grown rapidly (even as Tinder is still rapidly expanding) and recently secured significant venture funding. And, according to recent statements by its CEO, it is growing its user base by 20% a month.
The evolution of the online dating platform mirrors the evolution of other sectors of online competition. (In the above video, Hinge’s CEO likens his approach to competing with Tinder to how Facebook took on Myspace.) Data is a useful input, but a slightly different idea or algorithm can easily lead to the dethroning of the current market leaders, which parallels the success stories of Google and Facebook.
If the online dating market shows us anything, it is that data can help you improve your product or better monetize traffic, but it does little to protect you from competition, especially when a company has a better idea.