Wednesday, October 22, 2014

Data Banks

Each of us individually creates a huge amount of data online. Some of this data we create explicitly, such as when we make webpages or public-facing profiles, write emails, or author documents. But we also create a lot of data implicitly as a byproduct of our interactions with digital information. This implicit data includes the search queries we issue, the webpages we visit, and our online social networks.

The data we create is valuable. We can use it to understand more about ourselves, and services can use it to personalize our experiences and understand people’s information behavior in general. But despite the fact that we are the ones who create the data, much of it is not actually in our possession. Instead, it resides with companies that provide us with online services in exchange for it. A handful of powerful companies have a monopoly on our data.
Definition of monopoly: the exclusive possession or control of the supply or trade in a commodity or service
Definition of data monopoly: the exclusive possession or control of the supply or trade in an individual’s personal data
When a company that makes use of data to provide a service has a data monopoly, competitors cannot provide the same quality of service because they just don't have the same amount of information. Further, data monopolies are self-reinforcing: the fact that a company can provide the best service enables that company to collect the most additional data, which in turn can be used to further improve its service.

Google provides a good example of a data monopoly. Search engines incorporate usage data into their ranking algorithms. The more people who search using a search engine, the better the ranking. As a result, it is almost impossible for a new, unused search engine to rank documents well. Bing was only able to enter the scene because it had a powerhouse like Microsoft behind it to help drive through a period of significant data sparsity. Facebook is another good example. When your social network is owned by Facebook, the company is able to provide significant value that you cannot bring with you to another site. New companies without data don’t stand a chance in the face of data monopolies.

Data monopolies create other problems as well. Because our data is valuable to the companies that collect it, it often ends up fragmented across services. For example, Bing knows some of the search queries I issue, and Google knows a different set. They can each use the fraction of my queries that they have to improve their search results and personalize my experience, but neither can use all of my queries to create the best personalized experience possible. And I, meanwhile, cannot get at any of my queries without relying on both of the companies to return them to me. Additionally, academic research is inhibited by researchers’ lack of access to proprietary company data. As a result, innovation is stifled.

However, even if companies wanted to share the data they collect, it would be very difficult. Most of us would not want our personal data shared with other people, emerging companies, or even academic researchers. We tend to be willing to share some information with a service provider if it enables better experiences, but even as we do, we worry about breaches of our trust and the collecting entity’s ability to secure our data. We also worry about sharing information now that might be used against us at some unforeseen point in the future. If we could find a solution to data monopolies that enabled us to maintain control over our data, our trust in the system could enable us to share more information and receive better services.

One potential solution is to build data banks that serve as trusted third parties to collect and aggregate personal usage data. The data bank could then allow us access to our data as needed, share it with the companies we choose to enable personalized experiences, and sell anonymous data aggregated across users to companies that want large-scale usage data. Through the bank, the use of data could be audited to ensure any one individual’s data was not being used inappropriately. Further, individuals could receive a portion of the proceeds from the sale of aggregate data in the form of monetary interest.
Definition of bank: a business that keeps money for individual people or companies, exchanges currencies, makes loans, and offers other financial services
Definition of data bank: a business that keeps data for individual people or companies, aggregates data, makes information loans, and offers other information services
Data banks would allow us to have a complete picture of all of our data. For example, I would love to view all of the emails I ever received from my mother at the same time. Right now I can only see all of the emails I received at one account. Likewise, it might be nice to see all of the searches I have ever run, not just the queries I issued to a particular search engine. If our data were collected in a single place, we would be able to easily access and use it. We could also selectively share it, retroactively deleting information we don’t want others to have (example: embarrassing pictures from college) as desired.
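To make the idea a little more concrete, here is a minimal sketch of what a data bank's core operations might look like. Everything in it is an assumption made for illustration: the `DataBank` class, its method names, the example services and users, and especially the `min_users` threshold, which is only a crude stand-in for real anonymization. It is a toy in-memory model, not a design for a real system.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DataBank:
    """Toy in-memory data bank. All names here are hypothetical."""
    # owner -> list of (source service, kind, value) records
    records: dict = field(default_factory=lambda: defaultdict(list))
    # owner -> set of services the owner has allowed to read their data
    grants: dict = field(default_factory=lambda: defaultdict(set))
    # append-only log of every read attempt, so data use can be audited
    audit_log: list = field(default_factory=list)

    def deposit(self, owner, source, kind, value):
        self.records[owner].append((source, kind, value))

    def my_data(self, owner, kind):
        """Owner's view: every record of one kind, across all sources."""
        return [r for r in self.records[owner] if r[1] == kind]

    def grant(self, owner, service):
        self.grants[owner].add(service)

    def read(self, service, owner, kind):
        """A service can read an owner's data only with that owner's grant."""
        allowed = service in self.grants[owner]
        self.audit_log.append((service, owner, kind, allowed))
        if not allowed:
            raise PermissionError(f"{service} has no grant from {owner}")
        return self.my_data(owner, kind)

    def delete(self, owner, value):
        """Retroactively remove every record of the owner's with this value."""
        self.records[owner] = [r for r in self.records[owner] if r[2] != value]

    def aggregate(self, kind, min_users=2):
        """Count distinct users per value, dropping values that fewer than
        min_users people share (an assumed, very rough privacy threshold)."""
        users_per_value = defaultdict(set)
        for owner, recs in self.records.items():
            for _, k, value in recs:
                if k == kind:
                    users_per_value[value].add(owner)
        return {v: len(u) for v, u in users_per_value.items() if len(u) >= min_users}


bank = DataBank()
bank.deposit("alice", "Bing", "query", "weather seattle")
bank.deposit("alice", "Google", "query", "data banks")
bank.deposit("bob", "Google", "query", "weather seattle")

# Alice sees her queries from both search engines in one place.
all_queries = bank.my_data("alice", "query")

# Alice lets a new service personalize for her using her full history.
bank.grant("alice", "NewSearch")
results = bank.read("NewSearch", "alice", "query")

# A company buying aggregate data sees only values shared by several users.
popular = bank.aggregate("query")
```

The audit log records denied reads as well as successful ones, which is one simple way the "use of data could be audited" idea might be realized.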

One reason personal behavioral data has value to companies is that they can use it to create a uniquely personalized experience, in a way that other companies without access to the same data cannot. A challenge with data monopolies is that companies can use them to their advantage to lock users in and keep competitors out. People get locked in to companies because the companies own their data. If we were able to claim ownership of our own data using data banks, we could avoid getting stuck with companies because they own our data. Instead, when we join a new service we could grant access to our collection of relevant personal usage data.

Data banks would also allow us to monetize our data. Currently, we give our personal behavioral information to companies in exchange for the services we receive. However, this transaction is implicit. Data banks would let us get explicit value from our data instead. Companies that want to use aggregate usage data could purchase it from the bank, enabling new companies to start providing high-quality services from the get-go. The money made from the purchase could then be shared back with us, as the data owners.
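How the money would be divided is an open question. One simple possibility, sketched below purely as an assumption, is for the bank to keep a fee and split the rest among data owners in proportion to how many records each contributed to the sale. The function name `share_proceeds`, the 10% `bank_fee`, and the example numbers are all hypothetical.

```python
def share_proceeds(sale_price, records_per_owner, bank_fee=0.10):
    """Split a sale among owners in proportion to records contributed.

    The bank keeps bank_fee (a fraction of the price); the remainder is
    divided pro rata. This rule is an illustrative assumption only.
    """
    total_records = sum(records_per_owner.values())
    if total_records == 0:
        return {}
    payout_pool = sale_price * (1 - bank_fee)
    return {
        owner: payout_pool * count / total_records
        for owner, count in records_per_owner.items()
    }


# A company pays 100 for an aggregate built from Alice's and Bob's records.
payouts = share_proceeds(100.0, {"alice": 3, "bob": 1})
```

Other rules are just as plausible, such as equal shares per person or weighting by how rare a record is; the point is only that the transaction becomes explicit.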

Of course, there are also a lot of challenges that make data banks unlikely to happen anytime in the near future. Companies that currently own data are unlikely to want to give up their monopolies. Additionally, a successful data bank would require a lot of trust and would have to provide security and transparency, making it possible for others to audit how data is used. And while our behavioral data is valuable, logged data can be hard to understand outside of the context in which it was collected. Different systems log different content, and a lot of the data gathered is very fine-grained and system-dependent. The state of the system matters for understanding the data, and it may take time to identify good. But it is fun to imagine our data transactions made explicit in a way that breaks existing data monopolies to enable new opportunities for end users, companies, and researchers.
