The data gold rush


Neelie KROES (European Commission):  “Nearly 200 years ago, the industrial revolution saw new networks take over. Not just a new form of transport, the railways connected industries, connected people, energised the economy, transformed society.
Now we stand facing a new industrial revolution: a digital one.
With cloud computing its new engine, big data its new fuel. Transporting the amazing innovations of the internet, and the internet of things. Running on broadband rails: fast, reliable, pervasive.
My dream is that Europe takes its full part. With European industry able to supply, European citizens and businesses able to benefit, European governments able and willing to support. But we must get all those components right.
What does it mean to say we’re in the big data era?
First, it means more data than ever at our disposal. Take all the information of humanity from the dawn of civilisation until 2003 – nowadays that is produced in just two days. We are also acting to have more and more of it become available as open data, for science, for experimentation, for new products and services.
Second, we have ever more ways – not just to collect that data – but to manage it, manipulate it, use it. That is the magic to find value amid the mass of data. The right infrastructure, the right networks, the right computing capacity and, last but not least, the right analysis methods and algorithms help us break through the mountains of rock to find the gold within.
Third, this is not just some niche product for tech-lovers. The impact and difference to people’s lives are huge: in so many fields.
Transforming healthcare, using data to develop new drugs, and save lives. Greener cities with fewer traffic jams, and smarter use of public money.
A business boost: like retailers who communicate smarter with customers, for more personalisation, more productivity, a better bottom line.
No wonder big data is growing 40% a year. No wonder data jobs grow fast. No wonder skills and profiles that didn’t exist a few years ago are now hot property: and we need them all, from data cleaner to data manager to data scientist.
This can make a difference to people’s lives. Wherever you sit in the data ecosystem – never forget that. Never forget that real impact and real potential.
Politicians are starting to get this. The EU’s Presidents and Prime Ministers have recognised the boost to productivity, innovation and better services from big data and cloud computing.
But those technologies need the right environment. We can’t go on struggling with poor quality broadband. With each country trying on its own. With infrastructure and research that are individual and ineffective, separate and subscale. With different laws and practices shackling and shattering the single market. We can’t go on like that.
Nor can we continue in an atmosphere of insecurity and mistrust.
Recent revelations show what is possible online. They show implications for privacy, security, and rights.
You can react in two ways. One is to throw up your hands and surrender. To give up and put big data in the box marked “too difficult”. To turn away from this opportunity, and turn your back on problems that need to be solved, from cancer to climate change. Or – even worse – to simply accept that Europe won’t figure on this map but will be reduced to importing the results and products of others.
Alternatively: you can decide that we are going to master big data – and master all its dependencies, requirements and implications, including cloud and other infrastructures, Internet of things technologies as well as privacy and security. And do it on our own terms.
And by the way – privacy and security safeguards do not just have to be about protecting and limiting. Data generates value, and unlocks the door to new opportunities: you don’t need to “protect” people from their own assets. What you need is to empower people, give them control, give them a fair share of that value. Give them rights over their data – and responsibilities too, and the digital tools to exercise them. And ensure that the networks and systems they use are affordable, flexible, resilient, trustworthy, secure.
One thing is clear: the answer to greater security is not just to build walls. Many millennia ago, the Greek people realised that. They realised that you can build walls as high and as strong as you like – it won’t make a difference, not without the right awareness, the right risk management, the right security, at every link in the chain. If only the Trojans had realised that too! The same is true in the digital age: keep our data locked up in Europe, engage in an impossible dream of isolation, and we lose an opportunity; without gaining any security.
But master all these areas, and we would truly have mastered big data. Then we would have showed technology can take account of democratic values; and that a dynamic democracy can cope with technology. Then we would have a boost to benefit every European.
So let’s turn this asset into gold. With the infrastructure to capture and process. Cloud capability that is efficient, affordable, on-demand. Let’s tackle the obstacles, from standards and certification, trust and security, to ownership and copyright. With the right skills, so our workforce can seize this opportunity. With new partnerships, getting all the right players together. And investing in research and innovation. Over the next two years, we are putting 90 million euros on the table for big data and 125 million for the cloud.
I want to respond to this economic imperative. And I want to respond to the call of the European Council – looking at all the aspects relevant to tomorrow’s digital economy.
You can help us build this future. All of you. Helping to bring about the digital data-driven economy of the future. Expanding and deepening the ecosystem around data. New players, new intermediaries, new solutions, new jobs, new growth….”

Coordinating the Commons: Diversity & Dynamics in Open Collaborations


Dissertation by Jonathan T. Morgan: “The success of Wikipedia demonstrates that open collaboration can be an effective model for organizing geographically-distributed volunteers to perform complex, sustained work at a massive scale. However, Wikipedia’s history also demonstrates some of the challenges that large, long-term open collaborations face: the core community of Wikipedia editors — the volunteers who contribute most of the encyclopedia’s content and ensure that articles are correct and consistent — has been gradually shrinking since 2007, in part because Wikipedia’s social climate has become increasingly inhospitable for newcomers, female editors, and editors from other underrepresented demographics. Previous research studies of change over time within other work contexts, such as corporations, suggest that incremental processes such as bureaucratic formalization can make organizations more rule-bound and less adaptable — in effect, less open — as they grow and age. There has been little research on how open collaborations like Wikipedia change over time, and on the impact of those changes on the social dynamics of the collaborating community and the way community members prioritize and perform work. Learning from Wikipedia’s successes and failures can help researchers and designers understand how to support open collaborations in other domains — such as Free/Libre Open Source Software, Citizen Science, and Citizen Journalism.

In this dissertation, I examine the role of openness, and the potential antecedents and consequences of formalization, within Wikipedia through an analysis of three distinct but interrelated social structures: community-created rules within the Wikipedia policy environment, coordination work and group dynamics within self-organized open teams called WikiProjects, and the socialization mechanisms that Wikipedia editors use to teach new community members how to participate. To inquire further, I have designed a new editor peer support space, the Wikipedia Teahouse, based on the findings from my empirical studies. The Teahouse is a volunteer-driven project that provides a welcoming and engaging environment in which new editors can learn how to be productive members of the Wikipedia community, with the goal of increasing the number and diversity of newcomers who go on to make substantial contributions to Wikipedia …”

Overcoming 'Tragedies of the Commons' with a Self-Regulating, Participatory Market Society


Paper by Dirk Helbing: “Our society is fundamentally changing. These days, almost nothing works without a computer chip. Processing power doubles every 18 months and will exceed the capabilities of human brains in about ten years from now. Some time ago, IBM’s Big Blue computer already beat the best chess player. Meanwhile, computers perform about 70 percent of all financial transactions, and IBM’s Watson advises customers better than human telephone hotlines. Will computers and robots soon replace skilled labor? In many European countries, unemployment is reaching historical heights. The forthcoming economic and social impact of future information and communication technologies (ICT) will be huge – probably more significant than that caused by the steam engine, or by nano- or biotechnology.
The storage capacity for data is growing even faster than computational capacity. Within just a year we will soon generate more data than in the entire history of humankind. The “Internet of Things” will network trillions of sensors. Unimaginable amounts of data will be collected. Big Data is already being praised as the “oil of the 21st century”. What opportunities and risks does this create for our society, economy, and environment?”

Three ways digital leaders can operate successfully in local government


in The Guardian: “The landscape of digital is constantly changing and being redefined with every new development, technology breakthrough, success and failure. We need digital public sector leaders who can properly navigate this environment, and follow these three guidelines.
1. Champion open data
We need leaders who can ensure that information and data is open by default, and secure when absolutely required. Too often councils commission digital programmes only to find the data generated does not easily integrate with other systems, or that data is not council-owned and can only be accessed at further cost.
2. Don’t get distracted by flashy products
Leaders must adopt an agnostic approach to technology, and not get seduced by the ever-increasing number of digital technologies and lose sight of real user and business needs.
3. Learn from research
Tales of misplaced IT investments plague the public sector, and senior leaders are understandably hesitant when considering future investments. To avoid causing even more disruption, we should learn from research findings such as those of the New Local Government Network’s recent digital roundtables on what works.
Making the decision to properly invest in digital leadership will not just improve decision making about digital solutions and strategies. It will also bring in the knowledge needed to navigate the complex security requirements that surround public-sector IT. And it will ensure that practices honed in the digital environment become embedded in the council more generally.
In Devon, for example, we are making sure all the services we offer online are based on the experience and behaviour of users. This has led service teams to refocus on the needs of citizens rather than those of the organisation. And our experiences of future proofing, agility and responsiveness are informing service design throughout the council.
What’s holding us back?
Across local government there is still a fragmented approach to collaboration. In central government, the Government Digital Service is charged with providing the right environment for change across all government departments. However, in local government, digital leaders often work alone without a unifying strategy across the sector. It is important to understand and recognise that the Government Digital Service is more than just a team pushing and promoting digital in central government: they are the future of central government, attempting to transform everything.
Initiatives such as LocalGov Digital, (O2’s Local Government Digital Fund), Forum (the DCLG’s local digital alliance) and the Guardian’s many public sector forums and networks are all helping to push forward debate, spread good practice and build a sense of urgent optimism around the local government digital agenda. But at present there is no equivalent to the unified force of the Government Digital Service.”

Open Data (Updated and Expanded)


As part of an ongoing effort to build a knowledge base for the field of opening governance by organizing and disseminating its learnings, the GovLab Selected Readings series provides an annotated and curated collection of recommended works on key opening governance topics. We start our series with a focus on Open Data. To suggest additional readings on this or any other topic, please email [email protected].

Data and its uses for Governance

Open data refers to data that is publicly available for anyone to use and which is licensed in a way that allows for its re-use. The common requirement that open data be machine-readable means not only that data is distributed via the Internet in a digitized form, but also that it can be processed by computers through automation, ensuring both wide dissemination and ease of re-use. Much of the focus of the open data advocacy community is on government data and government-supported research data. For example, in May 2013, the US Open Data Policy defined open data as publicly available data structured in a way that enables the data to be fully discoverable and usable by end users, and consistent with a number of principles focused on availability, accessibility and reusability.

Annotated Selected Reading List (in alphabetical order)
Fox, Mark S. “City Data: Big, Open and Linked.” Working Paper, Enterprise Integration Laboratory (2013). http://bit.ly/1bFr7oL.

  • This paper examines concepts that underlie Big City Data using data from multiple cities as examples. It begins by explaining the concepts of Open, Unified, Linked, and Grounded data, which are central to the Semantic Web. Fox then explores Big Data as an extension of Data Analytics, and provides case examples of good data analytics in cities.
  • Fox concludes that we can develop the tools that will enable anyone to analyze data, both big and small, by adopting the principles of the Semantic Web:
    • Data being openly available over the internet,
    • Data being unifiable using common vocabularies,
    • Data being linkable using Internationalized Resource Identifiers,
    • Data being accessible using a common data structure, namely triples,
    • Data being semantically grounded using Ontologies.
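
As a rough illustration of the "triples" structure named in the principles above, the following Python sketch stores a few subject-predicate-object statements and queries them by pattern. It is not taken from Fox's paper: the `example.org` identifiers, the vocabulary terms, the numbers, and the `match` helper are all hypothetical, chosen only to show how a shared vocabulary and shared identifiers let data from different cities be queried in the same way.

```python
from collections import namedtuple

# A triple is one statement: subject, predicate, object.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# Hypothetical namespace and placeholder values, for illustration only.
EX = "http://example.org/"

triples = [
    Triple(EX + "city/a", EX + "vocab/population", 100),
    Triple(EX + "city/a", EX + "vocab/locatedIn", EX + "country/x"),
    Triple(EX + "city/b", EX + "vocab/population", 200),
    Triple(EX + "city/b", EX + "vocab/locatedIn", EX + "country/y"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Because both cities use the same predicate identifier, one query covers both.
populations = {
    t.subject: t.obj
    for t in match(triples, predicate=EX + "vocab/population")
}
print(populations)
```

In practice a dedicated RDF library and a published ontology would replace the hand-rolled list and the made-up vocabulary; the sketch only shows why a single common structure makes data from different publishers easy to combine.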

Foulonneau, Muriel, Sébastien Martin, and Slim Turki. “How Open Data Are Turned into Services?” In Exploring Services Science, edited by Mehdi Snene and Michel Leonard, 31–39. Lecture Notes in Business Information Processing 169. Springer International Publishing, 2014. http://bit.ly/1fltUmR.

  • In this chapter, the authors argue that, considering the important role the development of new services plays as a motivation for open data policies, the impact of new services created through open data should play a more central role in evaluating the success of open data initiatives.
  • Foulonneau, Martin and Turki argue that the following metrics should be considered when evaluating the success of open data initiatives: “the usage, audience, and uniqueness of the services, according to the changes it has entailed in the public institutions that have open their data…the business opportunity it has created, the citizen perception of the city…the modification to particular markets it has entailed…the sustainability of the services created, or even the new dialog created with citizens.”

Goldstein, Brett, and Lauren Dyson. Beyond Transparency: Open Data and the Future of Civic Innovation. 1st edition. (Code for America Press: 2013). http://bit.ly/15OAxgF

  • This “cross-disciplinary survey of the open data landscape” features stories from practitioners in the open data space — including Michael Flowers, Brett Goldstein, Emer Coleman and many others — discussing what they’ve accomplished with open civic data. The book “seeks to move beyond the rhetoric of transparency for transparency’s sake and towards action and problem solving.”
  • The book’s editors seek to accomplish the following objectives:
    • Help local governments learn how to start an open data program
    • Spark discussion on where open data will go next
    • Help community members outside of government better engage with the process of governance
    • Lend a voice to many aspects of the open data community.
  • The book is broken into five sections: Opening Government Data, Building on Open Data, Understanding Open Data, Driving Decisions with Data and Looking Ahead.

Granickas, Karolis. “Understanding the Impact of Releasing and Re-using Open Government Data.” European Public Sector Information Platform, ePSIplatform Topic Report No. 2013/08, (2013). http://bit.ly/GU0Nx4.

  • This paper examines the impact of open government data by exploring the latest research in the field, with an eye toward enabling an environment for open data, as well as identifying the benefits of open government data and its political, social, and economic impacts.
  • Granickas concludes that to maximize the benefits of open government data: a) further research is required that structures and measures potential benefits of open government data; b) “government should pay more attention to creating feedback mechanisms between policy implementers, data providers and data-re-users”; c) “finding a balance between demand and supply requires mechanisms of shaping demand from data re-users and also demonstration of data inventory that governments possess”; and lastly, d) “open data policies require regular monitoring.”

Gurin, Joel. Open Data Now: The Secret to Hot Startups, Smart Investing, Savvy Marketing, and Fast Innovation, (New York: McGraw-Hill, 2014). http://amzn.to/1flubWR.

  • In this book, GovLab Senior Advisor and Open Data 500 director Joel Gurin explores the broad realized and potential benefit of Open Data, and how, “unlike Big Data, Open Data is transparent, accessible, and reusable in ways that give it the power to transform business, government, and society.”
  • The book provides “an essential guide to understanding all kinds of open databases – business, government, science, technology, retail, social media, and more – and using those resources to your best advantage.”
  • In particular, Gurin discusses a number of applications of Open Data with very real potential benefits:
    • “Hot Startups: turn government data into profitable ventures;
    • Savvy Marketing: understanding how reputational data drives your brand;
    • Data-Driven Investing: apply new tools for business analysis;
    • Consumer Information: connect with your customers using smart disclosure;
    • Green Business: use data to bet on sustainable companies;
    • Fast R&D: turn the online world into your research lab;
    • New Opportunities: explore open fields for new businesses.”

Jetzek, Thorhildur, Michel Avital, and Niels Bjørn-Andersen. “Generating Value from Open Government Data.” Thirty Fourth International Conference on Information Systems, 5. General IS Topics 2013. http://bit.ly/1gCbQqL.

  • In this paper, the authors “developed a conceptual model portraying how data as a resource can be transformed to value.”
  • Jetzek, Avital and Bjørn-Andersen propose a conceptual model featuring four Enabling Factors (openness, resource governance, capabilities and technical connectivity) acting on four Value Generating Mechanisms (efficiency, innovation, transparency and participation) leading to the impacts of Economic and Social Value.
  • The authors argue that their research supports that “all four of the identified mechanisms positively influence value, reflected in the level of education, health and wellbeing, as well as the monetary value of GDP and environmental factors.”

Kassen, Maxat. “A promising phenomenon of open data: A case study of the Chicago open data project.” Government Information Quarterly (2013). http://bit.ly/1ewIZnk.

  • This paper uses the Chicago open data project to explore the “empowering potential of an open data phenomenon at the local level as a platform useful for promotion of civic engagement projects and provide a framework for future research and hypothesis testing.”
  • Kassen argues that “open data-driven projects offer a new platform for proactive civic engagement” wherein governments can harness “the collective wisdom of the local communities, their knowledge and visions of the local challenges, governments could react and meet citizens’ needs in a more productive and cost-efficient manner.”
  • The paper highlights the need for independent IT developers to network in order for this trend to continue, as well as the importance of the private sector in “overall diffusion of the open data concept.”

Keen, Justin, Radu Calinescu, Richard Paige, and John Rooksby. “Big data + politics = open data: The case of health care data in England.” Policy and Internet 5 (2), (2013): 228–243. http://bit.ly/1i231WS.

  • This paper examines the assumptions regarding open datasets, technological infrastructure and access, using healthcare systems as a case study.
  • The authors specifically address two assumptions surrounding enthusiasm about Big Data in healthcare: the assumption that healthcare datasets and technological infrastructure are up to task, and the assumption of access to this data from outside the healthcare system.
  • By using the National Health Service in England as an example, the authors identify data, technology, and information governance challenges. They argue that “public acceptability of third party access to detailed health care datasets is, at best, unclear,” and that the prospects of Open Data depend on Open Data policies, which are inherently political, and the government’s assertion of property rights over large datasets. Thus, they argue that the “success or failure of Open Data in the NHS may turn on the question of trust in institutions.”

Kulk, Stefan and Bastiaan Van Loenen. “Brave New Open Data World?” International Journal of Spatial Data Infrastructures Research, May 14, 2012. http://bit.ly/15OAUYR.

  • This paper examines the evolving tension between the open data movement and the European Union’s privacy regulations, especially the Data Protection Directive.
  • The authors argue, “Technological developments and the increasing amount of publicly available data are…blurring the lines between non-personal and personal data. Open data may not seem to be personal data on first glance especially when it is anonymised or aggregated. However, it may become personal by combining it with other publicly available data or when it is de-anonymised.”
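
The point about data becoming personal once it is combined with other publicly available data can be made concrete with a small Python sketch. It is not drawn from Kulk and Van Loenen's paper: the two datasets, the field names, and the `link` helper are hypothetical, and the records are invented placeholders. The sketch assumes that both datasets share a few indirectly identifying fields (here postcode, birth year and sex).

```python
# Hypothetical "anonymised" release: no names, but indirectly identifying fields remain.
anonymised_release = [
    {"postcode": "1000", "birth_year": 1970, "sex": "F", "diagnosis": "condition-x"},
    {"postcode": "1000", "birth_year": 1985, "sex": "M", "diagnosis": "condition-y"},
    {"postcode": "2000", "birth_year": 1970, "sex": "F", "diagnosis": "condition-z"},
]

# Hypothetical second public dataset that does carry names.
public_register = [
    {"name": "Person A", "postcode": "1000", "birth_year": 1970, "sex": "F"},
    {"name": "Person B", "postcode": "2000", "birth_year": 1990, "sex": "M"},
]

# Fields assumed to be present in both datasets.
SHARED_FIELDS = ("postcode", "birth_year", "sex")


def key(record):
    """Build a comparison key from the shared fields."""
    return tuple(record[field] for field in SHARED_FIELDS)


def link(release, register):
    """Pair names with released rows when the shared fields match exactly one row."""
    rows_by_key = {}
    for row in release:
        rows_by_key.setdefault(key(row), []).append(row)

    linked = []
    for person in register:
        candidates = rows_by_key.get(key(person), [])
        if len(candidates) == 1:  # a unique match singles out one individual
            linked.append((person["name"], candidates[0]["diagnosis"]))
    return linked


print(link(anonymised_release, public_register))
```

Neither dataset is sensitive personal data on its own in this toy setting; the combination is. That is the blurring of the line between non-personal and personal data that the authors describe.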

Kundra, Vivek. “Digital Fuel of the 21st Century: Innovation through Open Data and the Network Effect.” Joan Shorenstein Center on the Press, Politics and Public Policy, Harvard College: Discussion Paper Series, January 2012, http://hvrd.me/1fIwsjR.

  • In this paper, Vivek Kundra, the first Chief Information Officer of the United States, explores the growing impact of open data, and argues that, “In the information economy, data is power and we face a choice between democratizing it and holding on to it for an asymmetrical advantage.”
  • Kundra offers four specific recommendations to maximize the impact of open data: Citizens and NGOs must demand open data in order to fight government corruption, improve accountability and government services; Governments must enact legislation to change the default setting of government to open, transparent and participatory; The press must harness the power of the network effect through strategic partnerships and crowdsourcing to cut costs and provide better insights; and Venture capitalists should invest in startups focused on building companies based on public sector data.

Noveck, Beth Simone and Daniel L. Goroff. “Information for Impact: Liberating Nonprofit Sector Data.” The Aspen Institute Philanthropy & Social Innovation Publication Number 13-004. 2013. http://bit.ly/WDxd7p.

  • This report is focused on “obtaining better, more usable data about the nonprofit sector,” which encompasses, as of 2010, “1.5 million tax-exempt organizations in the United States with $1.51 trillion in revenues.”
  • Toward that goal, the authors propose liberating data from the Form 990, an Internal Revenue Service form that “gathers and publishes a large amount of information about tax-exempt organizations,” including information related to “governance, investments, and other factors not directly related to an organization’s tax calculations or qualifications for tax exemption.”
  • The authors recommend a two-track strategy: “Pursuing the longer-term goal of legislation that would mandate electronic filing to create open 990 data, and pursuing a shorter-term strategy of developing a third party platform that can demonstrate benefits more immediately.”

Robinson, David G., Harlan Yu, William P. Zeller, and Edward W. Felten, “Government Data and the Invisible Hand.” Yale Journal of Law & Technology 11 (2009), http://bit.ly/1c2aDLr.

  • This paper proposes a new approach to online government data that “leverages both the American tradition of entrepreneurial self-reliance and the remarkable low-cost flexibility of contemporary digital technology.”
  • “In order for public data to benefit from the same innovation and dynamism that characterize private parties’ use of the Internet, the federal government must reimagine its role as an information provider. Rather than struggling, as it currently does, to design sites that meet each end-user need, it should focus on creating a simple, reliable and publicly accessible infrastructure that ‘exposes’ the underlying data.”

Ubaldi, Barbara. “Open Government Data: Towards Empirical Analysis of Open Government Data Initiatives.” OECD Working Papers on Public Governance. Paris: Organisation for Economic Co-operation and Development, May 27, 2013. http://bit.ly/15OB6qP.

  • This working paper from the OECD seeks to provide an all-encompassing look at the principles, concepts and criteria framing open government data (OGD) initiatives.
  • Ubaldi also analyzes a variety of challenges to implementing OGD initiatives, including policy, technical, economic and financial, organizational, cultural and legal impediments.
  • The paper also proposes a methodological framework for evaluating OGD Initiatives in OECD countries, with the intention of eventually “developing a common set of metrics to consistently assess impact and value creation within and across countries.”

Worthy, Ben. “David Cameron’s Transparency Revolution? The Impact of Open Data in the UK.” SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, November 29, 2013. http://bit.ly/NIrN6y.

  • In this article, Worthy “examines the impact of the UK Government’s Transparency agenda, focusing on the publication of spending data at local government level. It measures the democratic impact in terms of creating transparency and accountability, public participation and everyday information.”
  • Worthy’s findings, based on surveys of local authorities, interviews and FOI requests, are disappointing. He finds that:
    • Open spending data has led to some government accountability, but largely from those already monitoring government, not regular citizens.
    • Open Data has not led to increased participation, “as it lacks the narrative or accountability instruments to fully bring such effects.”
    • It has also not “created a new stream of information to underpin citizen choice, though new innovations offer this possibility. The evidence points to third party innovations as the key.”
  • Despite these initial findings, “Interviewees pointed out that Open Data holds tremendous opportunities for policy-making. Joined up data could significantly alter how policy is made and resources targeted. From small scale issues e.g. saving money through prescriptions to targeting homelessness or health resources, it can have a transformative impact.”

Zuiderwijk, Anneke, Marijn Janssen, Sunil Choenni, Ronald Meijer and Roexsana Sheikh Alibaks. “Socio-technical Impediments of Open Data.” Electronic Journal of e-Government 10, no. 2 (2012). http://bit.ly/17yf4pM.

  • This paper seeks to identify the socio-technical impediments to open data impact based on a review of the open data literature, as well as workshops and interviews.
  • The authors discovered 118 impediments across ten categories: 1) availability and access; 2) find-ability; 3) usability; 4) understandability; 5) quality; 6) linking and combining data; 7) comparability and compatibility; 8) metadata; 9) interaction with the data provider; and 10) opening and uploading.

Zuiderwijk, Anneke and Marijn Janssen. “Open Data Policies, Their Implementation and Impact: A Framework for Comparison.” Government Information Quarterly 31, no. 1 (January 2014): 17–29. http://bit.ly/1bQVmYT.

  • In this article, Zuiderwijk and Janssen argue that “currently there is a multiplicity of open data policies at various levels of government, whereas very little systematic and structured research [being] done on the issues that are covered by open data policies, their intent and actual impact.”
  • With this evaluation deficit in mind, the authors propose a new framework for comparing open data policies at different government levels using the following elements for comparison:
    • Policy environment and context, such as level of government organization and policy objectives;
    • Policy content (input), such as types of data not publicized and technical standards;
    • Performance indicators (output), such as benefits and risks of publicized data; and
    • Public values (impact).

To stay current on recent writings and developments on Open Data, please subscribe to the GovLab Digest.
Did we miss anything? Please submit reading recommendations to [email protected] or in the comments below.

House Bill Raises Questions about Crowdsourcing


Anne Bowser for Commons Lab (Wilson Center): “A new bill in the House is raising some key questions about how crowdsourcing is understood by scientists, government agencies, policymakers and the public at large.
Robin Bravender’s recent article in Environment & Energy Daily, “House Republicans Push Crowdsourcing on Agency Science,” (subscription required) neatly summarizes the debate around H.R. 4012, a bill introduced to the House of Representatives earlier this month. The House Science, Space and Technology Committee earlier this week held a hearing on the bill, which could see a committee vote as early as next month.
Dubbed the “Secret Science Reform Act of 2014,” the bill prohibits the Environmental Protection Agency (EPA) from “proposing, finalizing, or disseminating regulations or assessments based upon science that is not transparent or reproducible.” If the bill is passed, EPA would be unable to base assessments or regulations on any information not “publicly available in a manner that is sufficient for independent analysis.” This would include all information published in scholarly journals based on data that is not available as open source.
The bill is based on the premise that forcing EPA to use public data will inspire greater transparency by allowing “the crowd” to conduct independent analysis and interpretation. While the premise of involving the public in scientific research is sound, this characterization of crowdsourcing as a process separate from traditional scientific research is deeply problematic.
This division contrasts with the current practices of many researchers, who use crowdsourcing to directly involve the public in scientific processes. Galaxy Zoo, for example, enlists digital volunteers (called “citizen scientists”) to help classify more than 40 million photographs of galaxies taken by the Hubble Telescope. These crowdsourced morphological classifications are a powerful form of data analysis, a key aspect of the scientific process. Galaxy Zoo then publishes a catalogue of these classifications as an open-source data set. And the data reduction techniques and measures of confidence and bias for the data catalogue are documented in MNRAS, a peer-reviewed journal. A recent Google Scholar search shows that the data set published in MNRAS has been cited a remarkable 121 times.
As this example illustrates, crowdsourcing is often embedded in the process of formal scientific research. But prior to being published in a scientific journal, the crowdsourced contributions of non-professional volunteers are subject to the scrutiny of professional scientists through the rigorous process of peer review. Because peer review was designed as an institution to ensure objective and unbiased research, peer-reviewed scientific work is widely accepted as the best source of information for any science-based decision.
Separating crowdsourcing from the peer review process, as this legislation intends, means that there will be no formal filters in place to ensure that open data will not be abused by special interests. Ellen Silbergeld, a professor at Johns Hopkins University who testified at the hearing this week, made exactly this point when she cited the data manipulation commonly practiced by tobacco lobbyists in the United States.
Contributing to scientific research is one goal of crowdsourcing for science. Involving the public in scientific research also increases volunteer understanding of research topics and the scientific process and inspires heightened community engagement. These goals are supported by President Obama’s Second Open Government National Action Plan, which calls for “increased crowdsourcing and citizen science programs” to support “an informed and active citizenry.” But H.R. 4012 does not support these goals. Rather, this legislation could further degrade the public’s understanding of science by encouraging the public to distrust professional scientists rather than collaborate with them.
Crowdsourcing benefits organizations by bringing in the unique expertise held by external volunteers, which can augment and enhance the traditional scientific process. In return, these volunteers benefit from exposure to new and exciting processes, such as scientific research. This mutually beneficial relationship depends on collaboration, not opposition. Supporting an antagonistic relationship between science-based organizations like the EPA and members of “the crowd” will benefit neither institutions, nor volunteers, nor the country as a whole.”
 

What makes a good API?


Joshua Tauberer’s Blog: “There comes a time in every dataset’s life when it wants to become an API. That might be because of consumer demand or an executive order. How are you going to make a good one?…
Let’s take the common case where you have a relatively static, large dataset that you want to provide read-only access to. Here are 19 common attributes of good APIs for this situation. …
Granular Access. If the user wanted the whole thing they’d download it in bulk, so an API must be good at providing access to the most granular level practical for data users (h/t Ben Balter for the wording on that). When the data comes from a table, this usually means the ability to read a small slice of it using filters, sorting, and paging (limit/offset), the ability to get a single row by identifying it with a persistent, unique identifier (usually a numeric ID), and the ability to select just which fields should be included in the result output (good for optimizing bandwidth in mobile apps, h/t Eric Mill). (But see “intents” below.)
Deep Filtering. An API should be good at needle-in-haystack problems. Full text search is hard to do, so an API that can do it relieves a big burden for developers — if your API has any big text fields. Filters that can span relations or cross tables (i.e. joins) can be very helpful as well. But don’t go overboard. (Again, see “intents” below.)
Typed Values. Response data should be typed. That means that whether a field’s value is an integer, text, list, floating-point number, dictionary, null, or date should be encoded as a part of the value itself. JSON and XML with XSD are good at this. CSV and plain XML, on the other hand, are totally untyped. Types must be strictly enforced. Columns must choose a data type and stick with it, no exceptions. When encoding other sorts of data as text, the values must all absolutely be valid according to the most narrow regular expression that you can make. Provide that regular expression to the API users in documentation.
Normalize Tables, Then Denormalize. Normalization is the process of removing redundancy from tables by making multiple tables. You should do that. Have lots of primary keys that link related tables together. But… then… denormalize. The bottleneck of most APIs isn’t disk space but speed. Queries over denormalized tables are much faster than writing queries with JOINs over multiple tables. It’s faster to get data if it’s all in one response than if the user has to issue multiple API calls (across multiple tables) to get it. You still have to normalize first, though. Denormalized data is hard to understand and hard to maintain.
Be RESTful, And More. “REST” is a set of practices. There are whole books on this. Here it is in short. Every object named in the data (often that’s the rows of the table) gets its own URL. Hierarchical relationships in the data are turned into nice URL paths with slashes. Put the URLs of related resources in output too (HATEOAS, h/t Ed Summers). Use HTTP GET and normal query string processing (a=x&b=y) for filtering, sorting, and paging. The idea of REST is that these are patterns already familiar to developers, and reusing existing patterns, rather than making up entirely new ones, makes the API more understandable and reusable. Also, use HTTPS for everything (h/t Eric Mill), and provide the API’s status as an API itself, possibly at the root URL of the API’s URL space (h/t Eric Mill again).
….
Never Require Registration. Don’t have authentication on your API to keep people out! In fact, having a requirement of registration may contradict other guidelines (such as the 8 Principles of Open Government Data). If you do use an API key, make it optional. A non-authenticated tier lets developers quickly test the waters, and that is really important for getting developers in the door, and, again, it may be important for policy reasons as well. You can have a carrot to incentivize voluntary authentication: raise the rate limit for authenticated queries, for instance. (h/t Ben Balter)
Interactive Documentation. An API explorer is a web page that users can visit to learn how to build API queries and see results for test queries in real time. It’s an interactive browser tool, like interactive documentation. Relatedly, an “explain mode” in queries, which instead of returning results says what the query was and how it would be processed, can help developers understand how to use the API (h/t Eric Mill).
Developer Community. Life is hard. Coding is hard. The subject matter your data is about is probably very complex. Don’t make your API users wade into your API alone. Bring the users together, bring them to you, and sometimes go to them. Let them ask questions and report issues in a public place (such as github). You may find that users will answer other users’ questions. Wouldn’t that be great? Have a mailing list for longer questions and discussion about the future of the API. Gather case studies of how people are using the API and show them off to the other users. It’s not a requirement that the API owner participates heavily in the developer community — just having a hub is very helpful — but of course the more participation the better.
Create Virtuous Cycles. Create an environment around the API that makes the data and API stronger. For instance, other individuals within your organization who need the data should go through the public API to the greatest extent possible. Those users are experts and will help you make a better API, once they realize they benefit from it too. Create a feedback loop around the data, meaning find a way for API users to submit reports of data errors and have a process to carry out data updates, if applicable and possible. Do this in public as much as possible so that others see they can also join the virtuous cycle.”
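
Several of the attributes excerpted above (granular access, typed values, and plain query-string filtering, sorting and paging) can be illustrated with a short Python sketch. It is not taken from Tauberer’s post: the sample rows, the SCHEMA mapping, and the get_row, query and handle functions are hypothetical names invented here, and a real API would sit behind a web framework and a database rather than an in-memory list.

```python
import json
from urllib.parse import parse_qs

# Hypothetical sample table; field names and values are made up for illustration.
ROWS = [
    {"id": 1, "name": "alpha", "year": 2011, "score": 3.5},
    {"id": 2, "name": "beta", "year": 2012, "score": 1.0},
    {"id": 3, "name": "gamma", "year": 2012, "score": 4.25},
    {"id": 4, "name": "delta", "year": 2012, "score": 2.0},
]

# Assumed schema: every column has exactly one type ("Typed Values").
SCHEMA = {"id": int, "name": str, "year": int, "score": float}

# Query-string keys that control the query rather than filter on a column.
RESERVED = {"sort", "limit", "offset", "fields"}


def get_row(rows, row_id):
    """Return the single row with the given persistent ID, or None."""
    for row in rows:
        if row["id"] == row_id:
            return row
    return None


def query(rows, filters=None, sort=None, limit=10, offset=0, fields=None):
    """Return one page of rows after exact-match filtering and sorting."""
    filters = filters or {}
    matched = [r for r in rows if all(r.get(k) == v for k, v in filters.items())]
    if sort:
        descending = sort.startswith("-")  # "-score" means descending order
        key = sort[1:] if descending else sort
        matched.sort(key=lambda r: r[key], reverse=descending)
    page = matched[offset:offset + limit]
    if fields:
        # Field selection: return only the requested columns.
        page = [{f: r[f] for f in fields if f in r} for r in page]
    return {"total": len(matched), "offset": offset, "limit": limit, "results": page}


def handle(query_string, rows=ROWS):
    """Turn a query string such as 'year=2012&limit=2' into a JSON response."""
    params = {k: v[0] for k, v in parse_qs(query_string).items()}
    filters = {}
    for key, value in params.items():
        if key in RESERVED:
            continue
        if key not in SCHEMA:
            raise ValueError(f"unknown field: {key}")
        # Coerce the text from the URL into the column's declared type.
        filters[key] = SCHEMA[key](value)
    fields = params["fields"].split(",") if "fields" in params else None
    result = query(
        rows,
        filters=filters,
        sort=params.get("sort"),
        limit=int(params.get("limit", 10)),
        offset=int(params.get("offset", 0)),
        fields=fields,
    )
    # JSON keeps integers, floats and strings distinct in the response.
    return json.dumps(result)
```

For instance, handle("year=2012&sort=-score&limit=2&fields=id,name") returns a JSON document reporting three matching rows in total, with only the id and name of the two highest-scoring ones in the results. Keeping the single-row lookup (get_row) separate from the paged query mirrors the distinction the excerpt draws between fetching one record by ID and reading a small slice of the table.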

Selected Readings on Behavioral Economics: Nudges


The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of behavioral economics was originally published in 2014.

The 2008 publication of Richard Thaler and Cass Sunstein’s Nudge ushered in a new era of behavioral economics, and since then, policymakers in the United States and elsewhere have been applying behavioral economics to the field of public policy. Like Smart Disclosure, behavioral economics can be used in the public sector to improve the decisionmaking ability of citizens without relying on regulatory interventions. In the six years since Nudge was published, the United Kingdom has created the Behavioural Insights Team (also known as the Nudge Unit), a cross-ministerial organization that uses behavioral economics to inform public policy, and the White House has recently followed suit by convening a team of behavioral economists to bring behavioral insights to policymaking in the United States. Policymakers have been using behavioral insights to design more effective interventions in fields such as long-term unemployment, roadway safety, enrollment in retirement plans and enrollment in organ donation registries. The literature of this nascent field provides a look at the growing optimism about the potential of applying behavioral insights in the public sector to improve people’s lives.

Selected Reading List (in alphabetical order)

  • John Beshears, James Choi, David Laibson and Brigitte C. Madrian – The Importance of Default Options for Retirement Savings Outcomes: Evidence from the United States – a paper examining the role default options play in encouraging intelligent retirement savings decisionmaking.
  • Cabinet Office and Behavioural Insights Team, United Kingdom – Applying Behavioural Insights to Health – a paper outlining some examples of behavioral economics being applied to the healthcare landscape using cost-efficient interventions.
  • Matthew Darling, Saugato Datta and Sendhil Mullainathan – The Nature of the BEast: What Behavioral Economics Is Not – a paper discussing why control and behavioral economics are not as closely aligned as some think, reiterating the fact that the field is politically agnostic.
  • Antoinette Schoar and Saugato Datta – The Power of Heuristics – a paper exploring the concept of “heuristics,” or rules of thumb, which can provide helpful guidelines for pushing people toward making “reasonably good” decisions without a full understanding of the complexity of a situation.
  • Richard H. Thaler and Cass R. Sunstein – Nudge: Improving Decisions About Health, Wealth, and Happiness – an influential book describing the many ways in which the principles of behavioral economics can be and have been used to influence choices and behavior through the development of new “choice architectures.” 
  • U.K. Parliament Science and Technology Committee – Behaviour Change – an exploration of the government’s attempts to influence the behaviour of its citizens through nudges, with a focus on comparing the effectiveness of nudges to that of regulatory interventions.

Annotated Selected Reading List (in alphabetical order)

Beshears, John, James Choi, David Laibson and Brigitte C. Madrian. “The Importance of Default Options for Retirement Savings Outcomes: Evidence from the United States.” In Jeffrey R. Brown, Jeffrey B. Liebman and David A. Wise, editors, Social Security Policy in a Changing Environment, Cambridge: National Bureau of Economic Research, 2009. http://bit.ly/LFmC5s.

  • This paper examines the role default options play in pushing people toward making intelligent decisions regarding long-term savings and retirement planning.
  • Importantly, the authors provide evidence that a strategically oriented default setting from the outset is likely not enough to fully nudge people toward the best possible decisions in retirement savings. They find that the default settings in every major dimension of the savings process (from deciding whether to participate in a 401(k) to how to withdraw money at retirement) have real and distinct effects on behavior.

Cabinet Office and Behavioural Insights Team, United Kingdom. “Applying Behavioural Insights to Health.” December 2010. http://bit.ly/1eFP16J.

  • In this report, the United Kingdom’s Behavioural Insights Team does not attempt to “suggest that behaviour change techniques are the silver bullet that can solve every problem.” Rather, they explore a variety of examples where local authorities, charities, government and the private sector are using behavioural interventions to encourage healthier behaviors.
  • The report features case studies regarding the ability of behavioral insights to affect the following public health issues:
    • Smoking
    • Organ donation
    • Teenage pregnancy
    • Alcohol
    • Diet and weight
    • Diabetes
    • Food hygiene
    • Physical activity
    • Social care
  • The report concludes with a call for more experimentation and knowledge gathering to determine when, where and how behavioural interventions can be most effective in helping the public become healthier.

Darling, Matthew, Saugato Datta and Sendhil Mullainathan. “The Nature of the BEast: What Behavioral Economics Is Not.” The Center for Global Development. October 2013. https://bit.ly/2QytRmf.

  • In this paper, Darling, Datta and Mullainathan outline the three most pervasive myths that abound within the literature about behavioral economics:
    • First, they dispel the notion that behavioral economics is about control. Although tools used within behavioral economics can convince people to make certain choices, the goal is to nudge people to make the choices they want to make. For example, studies find that when retirement savings plans change the default from opt-in to opt-out, more workers set up 401(k) plans. This is an example of a nudge that guides people to make a choice that they already intend to make.
    • Second, they reiterate that the field is politically agnostic. Both liberals and conservatives have adopted behavioral economics, and its approach is neither liberal nor conservative. President Obama embraces behavioral economics, but the United Kingdom’s Conservative Party does, too.
    • Third, the article highlights that irrationality actually has little to do with behavioral economics. Context is an important consideration when one considers what behavior is rational and what behavior is not. Rather than use the term “irrational” to describe human beings, the authors assert that humans are “infinitely complex” and behavior that is often considered irrational is entirely situational.

Schoar, Antoinette and Saugato Datta. “The Power of Heuristics.” Ideas42. January 2014. https://bit.ly/2UDC5YK.

  • This paper explores the notion that being presented with a bevy of options can be desirable in many situations, but when making an intelligent decision requires a high-level understanding of the nuances of vastly different financial aid packages, for example, options can overwhelm. Heuristics (rules of thumb) provide helpful guidelines that “enable people to make ‘reasonably good’ decisions without needing to understand all the complex nuances of the situation.”
  • The underlying goal of heuristics in the policy space is to give people the type of “rules of thumb” that enable good decisionmaking on complex topics such as finance, healthcare and education. The authors point to the benefit of asking individuals to remember smaller pieces of knowledge by referencing a series of studies conducted by psychologists Beatty and Kahneman that showed people were better able to remember long strings of numbers when they were broken into smaller segments.
  • Schoar and Datta recommend these four rules when implementing heuristics:
    • Use heuristics where possible, particularly in complex situations;
    • Leverage new technology (such as text messages and Internet-based tools) to implement heuristics;
    • Determine where heuristics can be used in adult training programs and replace in-depth training programs with heuristics where possible; and
    • Consider how to apply heuristics in situations where the exception is the rule. The authors point to the example of savings and credit card debt. In most instances, saving a portion of one’s income is a good rule of thumb. However, when one has high credit card debt, paying off debt could be preferable to building one’s savings.

Thaler, Richard H. and Cass R. Sunstein. Nudge: Improving Decisions About Health, Wealth, and Happiness. Yale University Press, 2008. https://bit.ly/2kNXroe.

  • This book, likely the single piece of scholarship most responsible for bringing the concept of nudges into the public consciousness, explores how a strategic “choice architecture” can help people make the best decisions.
  • Thaler and Sunstein, while advocating for the wider and more targeted use of nudges to help improve people’s lives without resorting to overly paternal regulation, look to five common nudges for lessons and inspiration:
    • The design of menus gets you to eat (and spend) more;
    • “Flies” in urinals improve, well, aim;
    • Credit card minimum payments affect repayment schedules;
    • Automatic savings programs increase savings rate; and
    • “Defaults” can improve rates of organ donation.
  • In the simplest terms, the authors propose the wider deployment of choice architectures that follow “the golden rule of libertarian paternalism: offer nudges that are most likely to help and least likely to inflict harm.”

U.K. Parliament Science and Technology Committee. “Behaviour Change.” July 2011. http://bit.ly/1cbYv5j.

  • This report from the U.K.’s Science and Technology Committee explores the government’s attempts to influence the behavior of its citizens through nudges, with a focus on comparing the effectiveness of nudges to that of regulatory interventions.
  • The committee’s central conclusion is that “non-regulatory measures used in isolation, including ‘nudges,’ are less likely to be effective. Effective policies often use a range of interventions.”
  • The report’s other major findings and recommendations are:
    • Government must invest in gathering more evidence about what measures work to influence population behaviour change;
    • The Government should appoint an independent Chief Social Scientist to provide it with robust and independent scientific advice;
    • The Government should take steps to implement a traffic light system of nutritional labelling on all food packaging; and
    • Current voluntary agreements with businesses in relation to public health have major failings. They are not a proportionate response to the scale of the problem of obesity and do not reflect the evidence about what will work to reduce obesity. If effective agreements cannot be reached, or if they show minimal benefit, the Government should pursue regulation.

Selected Readings on Big Data


The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of big data was originally published in 2014.

Big Data refers to the wide-scale collection, aggregation, storage, analysis and use of data. Government is increasingly in control of a massive amount of raw data that, when analyzed and put to use, can lead to new insights on everything from public opinion to environmental concerns. The burgeoning literature on Big Data argues that it generates value by: creating transparency; enabling experimentation to discover needs, expose variability, and improve performance; segmenting populations to customize actions; replacing/supporting human decision making with automated algorithms; and innovating new business models, products and services. The insights drawn from data analysis can also be visualized in a manner that passes along relevant information, even to those without the tech savvy to understand the data on its own terms (see The GovLab Selected Readings on Data Visualization).

Annotated Selected Reading List (in alphabetical order)

Australian Government Information Management Office. The Australian Public Service Big Data Strategy: Improved Understanding through Enhanced Data-analytics Capability Strategy Report. August 2013. http://bit.ly/17hs2xY.

  • This Big Data Strategy, produced for Australian Government senior executives responsible for delivering services and developing policy, aims to impress on government officials that the key to increasing the value of big data held by government is the effective use of analytics. Essentially, “the value of big data lies in [our] ability to extract insights and make better decisions.”
  • The strategy positions big data as a national asset that can be used to “streamline service delivery, create opportunities for innovation, identify new service and policy approaches as well as supporting the effective delivery of existing programs across a broad range of government operations.”

Bollier, David. The Promise and Peril of Big Data. The Aspen Institute, Communications and Society Program, 2010. http://bit.ly/1a3hBIA.

  • This report captures insights from the 2009 Roundtable exploring uses of Big Data in a number of important consumer behavior and policy contexts.
  • The report concludes that “Big Data presents many exciting opportunities to improve modern society. There are incalculable opportunities to make scientific research more productive, and to accelerate discovery and innovation. People can use new tools to help improve their health and well-being, and medical care can be made more efficient and effective. Government, too, has a great stake in using large databases to improve the delivery of government services and to monitor for threats to national security.”
  • However, “Big Data also presents many formidable challenges to government and citizens precisely because data technologies are becoming so pervasive, intrusive and difficult to understand. How shall society protect itself against those who would misuse or abuse large databases? What new regulatory systems, private-law innovations or social practices will be capable of controlling anti-social behaviors–and how should we even define what is socially and legally acceptable when the practices enabled by Big Data are so novel and often arcane?”

Boyd, Danah and Kate Crawford. “Six Provocations for Big Data.” A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society. September 2011. http://bit.ly/1jJstmz.

  • In this paper, Boyd and Crawford raise challenges to unchecked assumptions and biases regarding big data. The paper makes a number of assertions about the “computational culture” of big data and pushes back against those who consider big data to be a panacea.
  • The authors’ provocations for big data are:
    • Automating Research Changes the Definition of Knowledge
    • Claims to Objectivity and Accuracy Are Misleading
    • Big Data Is Not Always Better Data
    • Not All Data Is Equivalent
    • Just Because It Is Accessible Doesn’t Make It Ethical
    • Limited Access to Big Data Creates New Digital Divides

The Economist Intelligence Unit. Big Data and the Democratisation of Decisions. October 2012. http://bit.ly/17MpH8L.

  • This report from the Economist Intelligence Unit focuses on the positive impact of big data adoption in the private sector, but its insights can also be applied to the use of big data in governance.
  • The report argues that innovation can be spurred by democratizing access to data, allowing a diversity of stakeholders to “tap data, draw lessons and make business decisions,” which in turn helps companies and institutions respond to new trends and intelligence at varying levels of decision-making power.

Manyika, James, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, and Angela Hung Byers. Big Data: The Next Frontier for Innovation, Competition, and Productivity.  McKinsey & Company. May 2011. http://bit.ly/18Q5CSl.

  • This report argues that big data “will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus,” and that “leaders in every sector will have to grapple with the implications of big data.”
  • The report offers five broad ways in which using big data can create value:
    • First, big data can unlock significant value by making information transparent and usable at much higher frequency.
    • Second, as organizations create and store more transactional data in digital form, they can collect more accurate and detailed performance information on everything from product inventories to sick days, and therefore expose variability and boost performance.
    • Third, big data allows ever-narrower segmentation of customers and therefore much more precisely tailored products or services.
    • Fourth, sophisticated analytics can substantially improve decision-making.
    • Finally, big data can be used to improve the development of the next generation of products and services.

The Partnership for Public Service and the IBM Center for The Business of Government. “From Data to Decisions II: Building an Analytics Culture.” October 17, 2012. https://bit.ly/2EbBTMg.

  • This report discusses strategies for better leveraging data analysis to aid decision-making. The authors argue that, “Organizations that are successful at launching or expanding analytics program…systematically examine their processes and activities to ensure that everything they do clearly connects to what they set out to achieve, and they use that examination to pinpoint weaknesses or areas for improvement.”
  • While the report features many strategies for government decision-makers, the central recommendation is that “leaders incorporate analytics as a way of doing business, making data-driven decisions transparent and a fundamental approach to day-to-day management. When an analytics culture is built openly, and the lessons are applied routinely and shared widely, an agency can embed valuable management practices in its DNA, to the mutual benefit of the agency and the public it serves.”

TechAmerica Foundation’s Federal Big Data Commission. “Demystifying Big Data: A Practical Guide to Transforming the Business of Government.” 2013. http://bit.ly/1aalUrs.

  • This report presents key big data imperatives that government agencies must address, the challenges and opportunities posed by the growing volume of data, and the value Big Data can provide. The discussion touches on the value of big data to businesses and organizational missions, and presents case studies of big data applications along with their technical underpinnings and public policy applications.
  • The authors argue that new digital information, “effectively captured, managed and analyzed, has the power to change every industry including cyber security, healthcare, transportation, education, and the sciences.” To ensure that this opportunity is realized, the report proposes a detailed big data strategy framework with the following steps: define, assess, plan, execute and review.

World Economic Forum. “Big Data, Big Impact: New Possibilities for International Development.” 2012. http://bit.ly/17hrTKW.

  • This report examines the potential for channeling the “flood of data created every day by the interactions of billions of people using computers, GPS devices, cell phones, and medical devices” into “actionable information that can be used to identify needs, provide services, and predict and prevent crises for the benefit of low-income populations.”
  • The report argues that “To realise the mutual benefits of creating an environment for sharing mobile-generated data, all ecosystem actors must commit to active and open participation. Governments can take the lead in setting policy and legal frameworks that protect individuals and require contractors to make their data public. Development organisations can continue supporting governments and demonstrating both the public good and the business value that data philanthropy can deliver. And the private sector can move faster to create mechanisms for sharing data that can benefit the public.”