Beyond Transparency — 2013 Code for America
The private and public sectors have begun to embrace “big data” and analytics to improve productivity and enable innovation. We have documented the tremendous economic potential that can be unlocked by using the increasing volumes and diversity of real-time data (e.g., social media, road traffic flows) to make better decisions in a wide variety of sectors, from healthcare to manufacturing to retail to public administration (Manyika et al., 2011).
Open data—governments and other institutions making their data freely available—plays an important role in maximizing the benefits of big data. Open data enables third parties to create innovative products and services using datasets such as transportation data, or data about medical treatments and their outcomes, that are generated in the course of providing public services or conducting research. This is a trend that is both global—in less than two years, the number of national governments that have become members of the Open Government Partnership has increased from a founding eight to more than fifty—and local: state, provincial, and municipal governments, including New York, Chicago, and Boston, have begun to “liberate” their data through open data initiatives.
Some of the key motivations for open data initiatives are to promote transparency of decision-making, create accountability for elected and appointed officials, and spur greater citizen engagement. In addition, however, it is increasingly clear that open data can also enable the creation of economic value beyond the walls of the governments and institutions that share their data. This data can be used not only to help increase the productivity of existing companies and institutions, but also to spur the creation of entrepreneurial businesses and to improve the welfare of individual consumers and citizens.
McKinsey & Company is engaged in ongoing research to identify the potential economic impact of open data, the findings from which will be published in the fall of 2013. In this piece, we would like to share some of our preliminary hypotheses from this work, including examples from our research into open data in healthcare (See “The ‘Big Data’ revolution in healthcare,” McKinsey Center for US Healthcare Reform and Business Technology).
It’s helpful to first clarify what we mean by open data. We use four criteria to define open data: accessibility (the data is made available to a wide range of users), machine readability (the data can be processed automatically by software), cost (the data can be accessed free of charge or at minimal cost), and rights (restrictions on the use, transformation, and redistribution of the data are minimal).
However, we also recognize that these are the ideals of “openness,” and there is still significant value in making data more widely available even if its use is not completely unrestricted. For example, the US Centers for Medicare & Medicaid Services (CMS) has released some healthcare claims data, but only for use by qualified medical researchers and with strict rules about how the data can be used. Nevertheless, providing this data outside of CMS multiplies the amount of value it can create. Similarly, there is great variation in the degree to which data can be considered machine-readable. Data in proprietary formats can be machine-readable, but it is less useful than data in open-standard formats, which do not require licenses to use and are not subject to format changes decided by a single vendor. And while a strict definition of open data requires zero cost for access, some institutions have chosen to charge a fee for accessing data while still providing considerable value.
Very closely related to this definition of open data is the concept of “my data”: giving individuals and organizations access to the data that has been collected about them. In the United States, the “Blue Button” program encourages healthcare providers to give patients access to their health information (see www.bluebuttondata.org). Similarly, the “Green Button” program encourages energy providers to give consumers access to energy usage information, such as data collected by smart meters (see www.greenbuttondata.org). In “my data” applications, information is not made accessible to all, but only to the person or organization whose activities generated the data. These users can then opt in to make their data available to other service providers (e.g., a service that analyzes energy consumption and suggests ways to improve energy efficiency).
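To make the last example concrete, the minimal sketch below shows the kind of analysis such a service might run on a consumer’s exported usage data. It assumes a hypothetical CSV export with timestamp and kWh columns; the actual Green Button standard defines its own XML-based format, which this stand-in does not attempt to reproduce.

```python
# Minimal sketch: analyze a consumer's exported energy-usage data ("my data").
# Assumes a hypothetical CSV export with columns: timestamp (ISO 8601), kwh.
# The real Green Button standard is XML-based; this file layout is illustrative only.
import csv
from collections import defaultdict
from datetime import datetime

def average_usage_by_hour(path):
    """Return average kWh consumed for each hour of the day."""
    totals, counts = defaultdict(float), defaultdict(int)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            hour = datetime.fromisoformat(row["timestamp"]).hour
            totals[hour] += float(row["kwh"])
            counts[hour] += 1
    return {h: totals[h] / counts[h] for h in totals}

if __name__ == "__main__":
    profile = average_usage_by_hour("my_usage.csv")  # hypothetical export file
    peak_hour = max(profile, key=profile.get)
    print(f"Highest average usage at hour {peak_hour}: {profile[peak_hour]:.2f} kWh")
```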
It’s also worth considering why the open data movement is gathering momentum. First, the amount and variety of valuable data that is being generated and collected by institutions has exploded: transaction data produced by government, sensor data collected from the physical world, and regulatory data collected from third parties such as transportation carriers or financial institutions. Second, the ability to process large, real-time, diverse streams of data has been improving at an exponential rate, thanks to advances in computing power. Today, a smartphone has sufficient processing power to beat a grandmaster at chess.
Equally important, there are institutional forces accelerating the adoption of open data initiatives. Both within and especially outside of government, decision makers are demanding more precise and timely insights, supported by data and experimentation (e.g., running controlled experiments on the web or in the real world to determine how people will actually behave). At the same time, governments are under pressure to become more transparent, while simultaneously doing more with less due to fiscal constraints. The financial pressure also compels governments to look for economic growth and innovation, which could be catalyzed by new businesses based on open data.
Finally, there is a social benefit: open data can democratize information, as more individuals gain access to their own data through my data initiatives, and people with programming skills gain access to more datasets. Individuals can develop applications that use open data, reflecting their interests, rather than relying on data services provided by large organizations.
Our emerging hypothesis is that the effective use of open data can unlock significant amounts of economic value. For example, in US healthcare, we found that more than $300 billion a year in value potentially could be created through the use of more open data, e.g., through the analysis of open data to determine which therapies are both medically effective and cost-efficient. We also recognize that access to data alone does not unlock value. In healthcare, many systemic reforms need to be in place before data-enabled innovations such as large-scale analyses of comparative effectiveness and genetically tailored therapies can achieve their maximum potential. Yet, if reforms are in place, truly transformative changes in the healthcare system can result. We believe similar changes can occur in many other domains.
So what are some of the archetypes for value creation that we discovered? Building on our big data research, we see five common ways in which the use of open data can unlock value.
In many cases, we find that decisions are made without access to relevant data. Simply providing data to the right decision maker at the right moment can be a huge win. For example, most patients and primary care physicians have limited knowledge about how well different hospitals perform in various types of surgery or how much different providers charge for a particular procedure. When such data exists—and is provided in a usable format—the resulting transparency can lead to better decisions. In our study of US healthcare, we estimate that ensuring that patients go to the right care setting (e.g., the one with the best record of outcomes and the lowest costs) could unlock $50 to $70 billion in annual value.
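To illustrate the kind of comparison this transparency enables, the minimal sketch below ranks care settings on published outcomes and charges. The hospitals, complication rates, and charges are hypothetical placeholders, not figures from our research.

```python
# Minimal sketch: compare care settings for a procedure by combining open outcome
# and charge data. All records below are hypothetical placeholders.
hospitals = [
    {"name": "Hospital A", "complication_rate": 0.021, "avg_charge": 18400},
    {"name": "Hospital B", "complication_rate": 0.035, "avg_charge": 12900},
    {"name": "Hospital C", "complication_rate": 0.024, "avg_charge": 15100},
]

# Sort by outcomes first, then by charge, so quality leads the comparison.
ranked = sorted(hospitals, key=lambda h: (h["complication_rate"], h["avg_charge"]))

for h in ranked:
    print(f'{h["name"]}: {h["complication_rate"]:.1%} complications, ${h["avg_charge"]:,}')
```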
Closely related to transparency is the concept of exposing variability in processes and outcomes, then using experimentation to identify the drivers of that variability. For example, open data can be used to expose variability in how well different schools or school districts improve student achievement. When this information is made transparent, it creates incentives to improve educational outcomes. In addition to simply exposing differences in performance, open data can be used to design purposeful experiments and analyze their outcomes to determine which organizational or teaching techniques raise student achievement.
Open data can also be used to ensure that individuals and organizations receive the products and services that best meet their needs. There is an old saying in marketing that half of marketing spending is wasted, but we don’t know which half. Open data can sometimes help marketers find the additional insights that make their efforts more effective. For example, a provider of rooftop solar panels could use aerial imagery and weather data available from public sources to narrow its targeted offers to customers who have both sufficient roof area and sufficient solar exposure.
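A minimal sketch of that kind of screen, using hypothetical prospect records, field names, and thresholds, might look like this; in practice the roof-area and exposure estimates would themselves be derived from the imagery and weather datasets.

```python
# Minimal sketch: filter prospective customers on roof area (e.g., derived from
# aerial imagery) and solar exposure (e.g., from public weather data).
# Records, field names, and thresholds are hypothetical.
MIN_ROOF_SQ_M = 30        # assumed minimum usable roof area
MIN_SUN_HOURS = 4.0       # assumed minimum average daily sun hours

prospects = [
    {"address": "12 Oak St",  "roof_sq_m": 55, "avg_sun_hours": 5.2},
    {"address": "48 Elm Ave", "roof_sq_m": 22, "avg_sun_hours": 5.8},
    {"address": "7 Pine Rd",  "roof_sq_m": 64, "avg_sun_hours": 3.1},
]

targets = [
    p for p in prospects
    if p["roof_sq_m"] >= MIN_ROOF_SQ_M and p["avg_sun_hours"] >= MIN_SUN_HOURS
]

for t in targets:
    print(f'Offer candidate: {t["address"]}')
```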
Open data can be used to augment the data that is being analyzed to improve or automate decision-making. We know from research in behavioral economics and other fields that human decision-making is often influenced by cognitive biases. Furthermore, our minds are limited in the number of data points we can process. Advanced analytical techniques can help overcome these limitations. For example, researchers identified the cardiovascular risks of COX-2 inhibitors (a class of anti-inflammatory drugs) only after analyzing data on millions of individual doses. In some cases, data can be used to make real-time decisions automatically. For example, by combining data from embedded sensors with open traffic data, it is possible to create systems that automatically adjust the timing of traffic signals to relieve congestion.
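The sketch below illustrates the signal-timing idea in its simplest form: a controller blends a local sensor reading with an open congestion feed and stretches the green phase accordingly. The thresholds, weights, and data sources are assumptions made for illustration, not a deployed control algorithm.

```python
# Minimal sketch: adjust a traffic signal's green time from a congestion measure
# that blends a local embedded sensor with an open traffic feed.
# Thresholds, weights, and data sources are hypothetical.
BASE_GREEN_S = 30      # default green phase, seconds
MAX_GREEN_S = 60       # safety cap

def next_green_time(sensor_queue_len, open_feed_congestion):
    """Blend two congestion signals (0-1 scale) and scale the green phase."""
    local = min(sensor_queue_len / 20.0, 1.0)          # assume 20 queued cars ~ saturated
    congestion = 0.6 * local + 0.4 * open_feed_congestion
    return min(BASE_GREEN_S * (1 + congestion), MAX_GREEN_S)

# Example: 14 cars queued locally, regional open feed reports 0.5 congestion.
print(f"Next green phase: {next_green_time(14, 0.5):.0f} seconds")
```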
Some of the most exciting applications of open data come about when it is used to create new products and services by existing companies, or to create entirely new businesses. For example, in 2012, more than two hundred new applications of open healthcare data were submitted to the US Health Data Initiative Forum. One submission, from a startup called Asthmapolis, combines usage data from sensors on asthma medicine inhalers with open environmental data (e.g., pollen counts and data on other allergens) to develop personalized treatment plans for patients with asthma.
Successful open data initiatives have many elements and the open data community is beginning to share practices and stories to make success more likely. Based on our ongoing research, we suggest that the following elements are needed for a successful open data initiative.
Too often, open data initiatives seem to prioritize releasing data based on the ease of implementation (i.e., making available the data that is easiest to release). We believe the prioritization process should also take value creation potential into account. For instance, datasets collected for regulatory or compliance purposes that enable companies to benchmark their performance against other players in the marketplace (e.g., energy efficiency data, purchasing data) can drive significant increases in economic performance for companies and consumers, even if the release of this data doesn’t directly benefit the public sector agency. Of course, it isn’t possible to predict all of the ways in which open data can be used to create value, so it’s still important to release open data to the large community of potential outside innovators, even if it’s not clear how it will be used. But in the near term, considering potential value creation along with ease of implementation should be part of the prioritization process.
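One simple way to operationalize this is to give each candidate dataset rough scores on both dimensions and rank by a weighted combination, as in the minimal sketch below. The datasets, scores, and weights are hypothetical, and the weighting itself is a policy choice rather than a formula from our research.

```python
# Minimal sketch: rank candidate datasets for release by weighing value potential
# alongside ease of implementation, rather than ease alone.
# Datasets and 1-5 scores are hypothetical placeholders.
candidates = [
    {"dataset": "building energy benchmarks", "value": 5, "ease": 2},
    {"dataset": "meeting minutes archive",    "value": 2, "ease": 5},
    {"dataset": "procurement records",        "value": 4, "ease": 3},
]

# Weight value more heavily than ease; the weights are an assumed policy choice.
for c in candidates:
    c["priority"] = 0.7 * c["value"] + 0.3 * c["ease"]

for c in sorted(candidates, key=lambda c: c["priority"], reverse=True):
    print(f'{c["dataset"]}: priority {c["priority"]:.1f}')
```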
To a certain extent, open data is a “platform play,” i.e., a foundation on which third parties can build innovative products and services. Tim O’Reilly, founder of O’Reilly Media, has famously described the concept of “Government as a Platform” (O’Reilly, 2011). To have a successful platform, you need a thriving ecosystem of contributors who build on it. For a successful open data initiative, it is important to activate a thriving ecosystem of developers who will build applications that use open data. This requires activities akin to marketing: raising awareness of the availability of open data, convincing developers to try using it (potentially through special offers or contests), supporting their experience, and encouraging them to return to use other open data. The “Datapaloozas” that the United States Government has sponsored are an example of activating an ecosystem of developers to consume open data: they convene developers at common events, celebrate successes, and raise the visibility of and excitement around open data.
Clearly, a scalable and reliable data infrastructure has to be put in place. Ideally, an institution’s internal data infrastructure will be designed in a way that makes it easy to open data to external connections when the decision is made to do so. One guiding principle that can help make this possible is to build internal interfaces as if they were external interfaces. Amazon.com requires all of its internal IT services to have standard application programming interfaces (APIs). Then, when it wants to expose to the outside world a new service that it has developed internally, the process is relatively straightforward.
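As a minimal sketch of this principle, the example below serves a small internal dataset through a versioned HTTP/JSON endpoint using only the Python standard library. The route, port, and inspection records are hypothetical, and a production service would add authentication, documentation, and throttling.

```python
# Minimal sketch: expose an internal dataset through a standard HTTP/JSON interface,
# so the same service can later be opened to external consumers with little rework.
# The endpoint path and sample data are hypothetical illustrations.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for an internal data store.
INSPECTIONS = [
    {"id": 1, "facility": "Example Cafe", "score": 92},
    {"id": 2, "facility": "Sample Diner", "score": 87},
]

class DataAPI(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/v1/inspections":   # versioned, documented route
            body = json.dumps(INSPECTIONS).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), DataAPI).serve_forever()
```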
Thoughtful consideration must also be given to the channels through which open data is distributed. These decisions can greatly affect the uptake and continuing use of open data. Are you releasing data in open data formats that make it easy for third-party developers to use? Do you provide appropriate metadata to help guide users to the data? Do you provide means through which users of the data are alerted automatically when data has been updated?
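One way to address the metadata and update questions is to publish a small machine-readable metadata file alongside each dataset, as in the sketch below. The field names, URLs, and contact address are illustrative placeholders rather than a specific metadata standard; established vocabularies exist and would be preferable in practice.

```python
# Minimal sketch: write a machine-readable metadata record alongside a dataset so
# users can discover it and detect updates. Field names and values are illustrative.
import json
from datetime import date

metadata = {
    "title": "Restaurant Inspection Results",   # hypothetical dataset
    "description": "Inspection scores for food service establishments.",
    "format": "CSV",                             # an open, non-proprietary format
    "license": "Public domain",
    "last_updated": date.today().isoformat(),    # lets consumers detect new releases
    "download_url": "https://data.example.gov/inspections.csv",  # placeholder URL
    "contact": "opendata@example.gov",           # placeholder contact
}

with open("inspections.metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```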
Some institutions have decided to make “open” the default for their data. However, there are often good reasons not to release all of an organization’s data, or to restrict openness along one or more of the open data dimensions (e.g., with fees or restrictions on use). Thoughtfully identifying the criteria for such restrictions will be important; they could include safety, security, privacy, liability, intellectual property, and confidentiality.
Last but not least, a successful open data program needs real leadership and a commitment to supporting an open data culture. In some cases, the benefits of releasing data could be outweighed by the perceived risks to managers, who might see an open data initiative as adding more work (e.g., dealing with outside stakeholders) while simultaneously raising the risk that the data will be misrepresented or will reveal issues with their operations. Leaders will have to set a tone from the top that the overall benefits make an open data initiative worth the investments and risks. Furthermore, leaders will also have to engage with the external community of data consumers, learning to treat them as “data customers” and being responsive to their concerns and suggestions.
Particularly for smaller municipalities, it can be a challenge to find the resources, both financial and human, to invest in open data initiatives. One point that can help the investment case is that much of the infrastructure for open data, e.g., building internal IT service interfaces as if they were external interfaces, actually improves the efficiency and scalability of the institution itself. Second, technology innovations, such as cloud services, are making the level of required investment more manageable. And more generally, taking advantage of external resources, from open source software to innovation fellowships and civic hackathons, can unlock additional capabilities. Ultimately, institutions will have to determine the relative priority of creating value through open data to support their missions in the context of their other priorities.
Overall, open data can generate value for multiple stakeholders, including governments themselves, established companies, entrepreneurs, and individual citizens. Understanding the scope and scale of this value potential, particularly for stakeholders outside of the organization opening its data, and how to effectively create an ecosystem of data users, will be essential in order to generate this value.
Dr. Michael Chui is a principal of the McKinsey Global Institute (MGI), McKinsey’s business and economics research arm, where he leads research on the impact of information technologies and innovation on business, the economy, and society. Prior to joining McKinsey, Michael served as the first chief information officer of the city of Bloomington, Indiana, where he also founded a cooperative Internet Service Provider. He is based in San Francisco, CA.
Diana Farrell is a director in McKinsey & Company’s Public Sector Practice, and the global leader and a co-founder of the McKinsey Center for Government (MCG). Diana rejoined McKinsey in 2011, after two years as Deputy Director of the National Economic Council and Deputy Assistant on Economic Policy to President Obama. She is based in Washington, DC.
Steve Van Kuiken is a director in McKinsey & Company’s Business Technology Office, and leads McKinsey’s Healthcare Information Technology work, serving a wide variety of healthcare organizations in developing and executing technology strategies, including payors and providers, pharmaceutical and medical products companies, and IT providers to the industry. He is based in McKinsey’s New Jersey office.
The authors wish to thank their colleague Peter Groves, a principal at McKinsey & Company based in New Jersey, for his substantial contributions to this article.