Putting Crowdsourcing on the Map


MIT Technology Review: “Even in San Francisco, where Google’s roving Street View cars have mapped nearly every paved surface, there are still places that have remained untouched, such as the flights of stairs that serve as pathways between streets in some of the city’s hilliest neighborhoods.
It’s these places that a startup called Mapillary is focusing on. Cofounders Jan Erik Solem and Johan Gyllenspetz are attempting to build an open, crowdsourced, photographic map that lets smartphone users log all sorts of places, creating a richer view of the world than what is offered by Street View and other street-level mapping services. If contributors provide images often, that view could be more representative of how things look right now.
Google itself is no stranger to the benefits of crowdsourced map content: it paid $966 million last year for traffic and navigation app Waze, whose users contribute data. Google also lets people augment Street View content with their own images. But Solem and Gyllenspetz think there’s still plenty of room for Mapillary, which they say can be used for everything from tracking a nature hike to offering more up-to-date images to house hunters and Airbnb users.
Solem and Gyllenspetz have only been working on the project for four months; they released an iPhone app in November, and an Android app in January. So far, there are just a few hundred users who have shared about 100,000 photos on the service. While it’s free for anyone to use, the startup plans to eventually make money by licensing the data its users generate to companies.
With the app, a user can choose to collect images by walking, biking, or driving. Once you press a virtual shutter button within the app, it takes a photo every two seconds, until you press the button again. You can then upload the images to Mapillary’s service via Wi-Fi, where each photo’s location is noted through its GPS tag. Computer-vision software compares each photo with others that are within a radius of about 100 meters, searching for matching image features so it can find the geometric relationship between the photos. It then places those images properly on the map, and stitches them all together. When new images come in of an area that has already been mapped, Mapillary will add them to its database, too.
It can take less than 30 seconds for the images to show up on the Web-based map, but several minutes for the images to be fully processed. As with Google’s Street View photos, image-recognition software blurs out faces and license plate numbers.
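Mapillary has not published this pipeline, but the matching step described above can be sketched roughly: group photos whose GPS tags fall within about 100 meters of each other, then test candidate pairs for shared image features. The sketch below is illustrative only, using OpenCV’s ORB features and a haversine distance; the function names and thresholds are assumptions, not Mapillary’s actual code.

```python
# Illustrative sketch of a "nearby photos -> feature matching" step,
# NOT Mapillary's implementation. Requires opencv-python.
import math
import cv2

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS points, in meters."""
    r = 6371000  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def candidate_pairs(photos, radius_m=100):
    """Yield pairs of photos whose GPS tags lie within radius_m of each other."""
    for i, a in enumerate(photos):
        for b in photos[i + 1:]:
            if haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= radius_m:
                yield a, b

def looks_like_same_scene(path_a, path_b, min_matches=20):
    """Return True if two images share enough ORB feature matches to be linked."""
    orb = cv2.ORB_create()
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    _, desc_a = orb.detectAndCompute(img_a, None)
    _, desc_b = orb.detectAndCompute(img_b, None)
    if desc_a is None or desc_b is None:
        return False  # one image had no detectable features
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    return len(matcher.match(desc_a, desc_b)) >= min_matches
```

In a real pipeline the matched features would then feed a geometric solver to recover relative camera positions before the images are placed on the map and stitched together.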
Users can edit Mapillary’s map by moving around the icons that correspond to images—to fix a misplaced image, for instance. Eventually, users will also be able to add comments and tags.
So far, Mapillary’s map is quite sparse. But the few hundred users trying out Mapillary include some map providers in Europe, and the 100,000 or so images on the service range from a bike path on Venice Beach in California to a snow-covered ski slope in Sweden.
Street-level images can be viewed on the Web or through Mapillary’s smartphone apps (though the apps just pull up the Web page within the app). Blue lines and colored tags indicate where users have added photos to the map; you can zoom in to see them at the street level.

Navigating through photos is still quite rudimentary; you can tap or click to move from one image to the next with onscreen arrows, depending on the direction you want to explore.
Beyond technical and design challenges, the biggest issue Mapillary faces is convincing a large enough number of users to build up its store of images so that others will start using it and contributing as well, and then ensuring that these users keep coming back.”

Coordinating the Commons: Diversity & Dynamics in Open Collaborations


Dissertation by Jonathan T. Morgan: “The success of Wikipedia demonstrates that open collaboration can be an effective model for organizing geographically-distributed volunteers to perform complex, sustained work at a massive scale. However, Wikipedia’s history also demonstrates some of the challenges that large, long-term open collaborations face: the core community of Wikipedia editors — the volunteers who contribute most of the encyclopedia’s content and ensure that articles are correct and consistent — has been gradually shrinking since 2007, in part because Wikipedia’s social climate has become increasingly inhospitable for newcomers, female editors, and editors from other underrepresented demographics. Previous research on change over time within other work contexts, such as corporations, suggests that incremental processes such as bureaucratic formalization can make organizations more rule-bound and less adaptable — in effect, less open — as they grow and age. There has been little research on how open collaborations like Wikipedia change over time, and on the impact of those changes on the social dynamics of the collaborating community and the way community members prioritize and perform work. Learning from Wikipedia’s successes and failures can help researchers and designers understand how to support open collaborations in other domains — such as Free/Libre Open Source Software, Citizen Science, and Citizen Journalism.

In this dissertation, I examine the role of openness, and the potential antecedents and consequences of formalization, within Wikipedia through an analysis of three distinct but interrelated social structures: community-created rules within the Wikipedia policy environment, coordination work and group dynamics within self-organized open teams called WikiProjects, and the socialization mechanisms that Wikipedia editors use to teach new community members how to participate. To inquire further, I have designed a new editor peer support space, the Wikipedia Teahouse, based on the findings from my empirical studies. The Teahouse is a volunteer-driven project that provides a welcoming and engaging environment in which new editors can learn how to be productive members of the Wikipedia community, with the goal of increasing the number and diversity of newcomers who go on to make substantial contributions to Wikipedia …”

True Collective Intelligence? A Sketch of a Possible New Field


Paper by Geoff Mulgan in Philosophy & Technology: “Collective intelligence is much talked about but remains very underdeveloped as a field. There are small pockets in computer science and psychology and fragments in other fields, ranging from economics to biology. New networks and social media also provide a rich source of emerging evidence. However, there are surprisingly few usable theories, and many of the fashionable claims have not stood up to scrutiny. The field of analysis should be how intelligence is organised at large scale — in organisations, cities, nations and networks. The paper sets out some of the potential theoretical building blocks, suggests an experimental and research agenda, shows how it could be analysed within an organisation or business sector and points to the possible intellectual barriers to progress.”

The Problem With Serious Games–Solved


Emerging Technology From the arXiv: “Serious games are becoming increasingly popular, but the inability to generate realistic new content has hampered their progress. Until now.

Here’s an imaginary scenario: you’re a law enforcement officer confronted with John, a 21-year-old male suspect who is accused of breaking into a private house on Sunday evening and stealing a laptop, jewellery and some cash. Your job is to find out whether John has an alibi and if so whether it is coherent and believable.
That’s exactly the kind of scenario that police officers the world over face on a regular basis. But how do you train for such a situation? How do you learn the skills necessary to gather the right kind of information?
An increasingly common way of doing this is with serious games, those designed primarily for purposes other than entertainment. In the last 10 years or so, medical, military and commercial organisations all over the world have begun to experiment with game-based scenarios designed to teach people how to perform their jobs in realistic situations.
But there is a problem with serious games that require realistic interaction with another person. It’s relatively straightforward to design one or two scenarios that are coherent, lifelike and believable, but it’s much harder to generate them continually.
Imagine, in the example above, that John is a computer-generated character. What kind of activities could he describe that would serve as a believable, coherent alibi for Sunday evening? And how could he do it a thousand times, each time describing a different realistic alibi? Therein lies the problem.
Today, Sigal Sina at Bar-Ilan University in Israel, and a couple of pals, say they’ve solved this problem. These guys have come up with a novel way of generating ordinary, realistic scenarios that can be cut and pasted into a serious game to serve exactly this purpose. The secret sauce in their new approach is to crowdsource the new scenarios from real people using Amazon’s Mechanical Turk service.
The approach is straightforward. Sina and co simply ask Turkers to answer a set of questions asking what they did during each one-hour period throughout various days, offering bonuses to those who provide the most varied detail.
They then analyse the answers, categorising activities by factors such as the times they are performed, the age and sex of the person doing it, the number of people involved and so on.
This then allows a computer game to cut and paste activities into the action at appropriate times. So, for example, the computer can select an appropriate alibi for John on a Sunday evening by choosing an activity described by a male Turker for the same time, while avoiding activities that a woman might describe for a Friday morning, which might otherwise seem unbelievable. The computer also changes certain details in the narrative, such as names, locations and so on, to make the narrative coherent with John’s profile….
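As a rough illustration of that selection step (the data layout below is invented for the example, not taken from Sina and co’s paper), the matching logic might look like this:

```python
# Hypothetical sketch of selecting a crowdsourced alibi that fits a
# character's profile and the scenario's time slot. The record schema,
# field names and sample data are all invented for illustration.
import random

# Each record is one Turker-reported activity, categorized as the article
# describes: reporter demographics plus when the activity happened.
ACTIVITIES = [
    {"text": "watched the game at {friend}'s place until midnight",
     "sex": "male", "age": (18, 30), "day": "sunday", "slot": "evening"},
    {"text": "had brunch with {friend} downtown",
     "sex": "female", "age": (25, 40), "day": "friday", "slot": "morning"},
]

def pick_alibi(profile, day, slot):
    """Choose a reported activity consistent with the character's profile."""
    pool = [a for a in ACTIVITIES
            if a["sex"] == profile["sex"]
            and a["age"][0] <= profile["age"] <= a["age"][1]
            and a["day"] == day and a["slot"] == slot]
    if not pool:
        return None
    # Swap profile-specific details (names, places) into the narrative so it
    # stays coherent with the character, as the article notes.
    return random.choice(pool)["text"].format(friend=profile["friend"])

john = {"sex": "male", "age": 21, "friend": "Dave"}
print(pick_alibi(john, "sunday", "evening"))
```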
That solves a significant problem with serious games. Until now, developers have had to spend an awful lot of time producing realistic content by hand. Automated creation, a process known as procedural content generation, has always been straightforward for things like textures, models and terrain in game settings. Now, thanks to this new crowdsourcing technique, it can be just as easy for human interactions in serious games too.
Ref: arxiv.org/abs/1402.5034: Using the Crowd to Generate Content for Scenario-Based Serious-Games”

Crowdsourcing voices to study Parkinson’s disease


TedMed: “Mathematician Max Little is launching a project that aims to literally give Parkinson’s disease (PD) patients a voice in their own diagnosis and help them monitor their disease progression.
Patients Voice Analysis (PVA) is an open science project that uses phone-based voice recordings and self-reported symptoms, along with software Little designed, to track disease progression. Little, a TEDMED 2013 speaker and TED Fellow, is partnering with the online community PatientsLikeMe, co-founded by TEDMED 2009 speaker James Heywood, and Sage Bionetworks, a non-profit research organization, to conduct the research.
The new project is an extension of Little’s Parkinson’s Voice Initiative, which used speech analysis algorithms to diagnose Parkinson’s from voice recordings with the help of 17,000 volunteers. This time, he seeks not only to detect markers of PD, but also to add information reported by patients using PatientsLikeMe’s Parkinson’s Disease Rating Scale (PDRS), a tool that documents patients’ answers to questions that measure treatment effectiveness and disease progression….
As openly shared information, the collected data has potential to help vast numbers of individuals by tapping into collective ingenuity. Little has long argued that for science to progress, researchers need to democratize research and move past jostling for credit. Sage Bionetworks has designed a platform called Synapse to allow data sharing with collaborative version control, an effort led by open data advocate John Wilbanks.
“If you can’t share your data, how can you reproduce your science? One of the big problems we’re facing with this kind of medical research is the data is not open and getting access to it is a nightmare,” Little says.
With the PVA project, “Basically anyone can log on, download the anonymized data and play around with data mining techniques. We don’t really care what people are able to come up with. We just want the most accurate prediction we can get.
“In research, you’re almost always constrained by what you think is the best way to do things. Unless you open it to the community at large, you’ll never know,” he says.”
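As an illustration of the kind of tinkering Little invites, here is a hypothetical sketch: the file name and column names below are placeholders standing in for anonymized voice features paired with self-reported PDRS scores, not real PVA artifacts.

```python
# Hypothetical example of "playing around with data mining techniques" on an
# anonymized download: fit a regressor that predicts a symptom score from
# voice features. File and column names are invented for illustration.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

df = pd.read_csv("pva_anonymized.csv")      # hypothetical data file
X = df.drop(columns=["pdrs_score"])         # voice features (jitter, shimmer, ...)
y = df["pdrs_score"]                        # self-reported symptom scale

model = RandomForestRegressor(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"cross-validated R^2: {scores.mean():.2f}")
```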

The Intelligence of a City: Its Citizens


Michel Dumais: “Tick tock! we said. The hundredth is coming soon, and with the hundred-and-first, new challenges. A smart city, you say? I suspect the traditional nod to the three-letter acronyms and to an archaic administrative logic. What if we instead called on the intelligence of those who know their city best, its citizens?

To solve a problem (and sometimes even a “non-problem”), administrations look to those mammoth software packages that, on paper, are supposed to do everything, that swallow hundreds of millions of dollars, and that end up making headlines because even more money has to be poured into them. And that let IT departments tighten their grip on an administration even further.

In short, when people talk about smart cities, many see a jackpot. Ah! Still, what was “acceptable” yesterday no longer is today. And building a smart city is, above all, not a technological challenge, far from it.

THE WIRELESS QUESTION
Years ago, simple logic would have had the City stop thinking “big telcos” and quickly form an alliance with the community organization “Île sans fil,” thereby promoting the rapid deployment of wireless technology across the island.

Such an alliance, a model of its kind, does exist.

But not in Montreal. Rather in Quebec City, where the City and the community organization “Zap Québec” work hand in hand for the greater benefit of Quebec City’s residents and tourists. And in Montreal? We talk, and we talk.

So, a smart city: it is a city that knows how to use technology to harness its infrastructure and put it at the service of its citizens, while saving money and promoting sustainable development.

It is also a city that knows how to listen to and mobilize its citizens, activists and entrepreneurs, while giving them tools (such as usable data) so that they too can create services for their own organizations and for all the city’s residents. Not to mention that all these tools make decision-making easier for borough mayors and the executive committee.

In short, a smart city according to Professor Rudolf Giffinger is this: “a smart economy, smart mobility, a smart environment, smart inhabitants, a smart way of life and, finally, a smart administration.”

I invite the reader to watch LifeApps, an extraordinary TV series available on the AlJazeera network’s website. Its subject: activists and tinkerers, young and not so young, who get involved and create services for their community.”

Are bots taking over Wikipedia?


Kurzweil News: “As crowdsourced Wikipedia has grown too large — with more than 30 million articles in 287 languages — to be entirely edited and managed by volunteers, 12 Wikipedia bots have emerged to pick up the slack.

The bots use Wikidata — a free knowledge base that can be read and edited by both humans and bots — to exchange information between entries and between the 287 languages.

Which raises an interesting question: what portion of Wikipedia edits are generated by humans versus bots?

To find out (and keep track of other bot activity), Thomas Steiner of Google Germany has created an open-source application (and API): Wikipedia and Wikidata Realtime Edit Stats, described in an arXiv paper.
The percentages of bot vs. human edits shown in the application are constantly changing. A KurzweilAI snapshot on Feb. 20 at 5:19 AM EST showed an astonishing 42% of Wikipedia edits being made by bots. (The application lists the 12 bots.)


Anonymous vs. logged-in humans (credit: Thomas Steiner)
The percentages also vary by language. Only 5% of English edits were by bots; but for Serbian pages, in which few Wikipedians apparently participate, 96% of edits were by bots.

The application also tracks what percentage of edits are made by anonymous users. Globally, it was 25 percent in our snapshot, and a surprising 34 percent for English — raising interesting questions about corporate and other interests covertly manipulating Wikipedia information.”
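Steiner’s dashboard computes these shares from Wikipedia’s realtime change feeds. As a rough sketch of the same idea (not his implementation, which predates this endpoint), one could tally the bot flag on Wikimedia’s current public EventStreams recent-change feed:

```python
# Approximate the bot-vs-human edit split from Wikimedia's public
# EventStreams feed (server-sent events). This is a sketch of the statistic,
# not Steiner's app. Requires the requests library.
import json
import requests

URL = "https://stream.wikimedia.org/v2/stream/recentchange"

def bot_share(n_events=1000):
    """Return the fraction of the next n_events edits flagged as bot edits."""
    bots = total = 0
    with requests.get(URL, stream=True, timeout=60) as resp:
        for line in resp.iter_lines():
            if not line.startswith(b"data: "):
                continue  # skip SSE comments, event names and ids
            event = json.loads(line[len(b"data: "):])
            if event.get("type") not in ("edit", "new"):
                continue  # ignore log entries, categorization events, etc.
            total += 1
            bots += 1 if event.get("bot") else 0
            if total >= n_events:
                break
    return bots / total

print(f"bot share of recent edits: {bot_share():.0%}")
```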

NatureNet: a model for crowdsourcing the design of citizen science systems


Paper in CSCW Companion ’14, the companion publication of the 17th ACM conference on Computer Supported Cooperative Work & Social Computing: “NatureNet is a citizen science system designed for collecting bio-diversity data in nature park settings. Park visitors are encouraged to participate in the design of the system in addition to collecting bio-diversity data. Our goal is to increase the motivation to participate in citizen science via crowdsourcing: the hypothesis is that when the crowd plays a role in the design and development of the system, they become stakeholders in the project and work to ensure its success. This paper presents a model for crowdsourcing design and citizen science data collection, and the results from early trials with users that illustrate the potential of this approach.”

Crowdsourcing and regulatory reviews: A new way of challenging red tape in British government?


New paper by Martin Lodge and Kai Wegrich in Regulation and Governance: “Much has been said about the appeal of digital government devices to enhance consultation on rulemaking. This paper explores the most ambitious attempt by the UK central government so far to draw on “crowdsourcing” to consult and act on regulatory reform, the “Red Tape Challenge.” We find that the results of this exercise do not represent any major change to traditional challenges to consultation processes. Instead, we suggest that the extensive institutional arrangements for crowdsourcing were hardly significant in informing actual policy responses: neither the tone of the crowdsourced comments, the direction of the majority views, nor specific comments were seen to matter. Instead, it was processes within the executive that shaped the overall governmental responses to this initiative. The findings, therefore, provoke wider debates about the use of social media in rulemaking and consultation exercises.”

Open Data (Updated and Expanded)


As part of an ongoing effort to build a knowledge base for the field of opening governance by organizing and disseminating its learnings, the GovLab Selected Readings series provides an annotated and curated collection of recommended works on key opening governance topics. We start our series with a focus on Open Data. To suggest additional readings on this or any other topic, please email biblio@thegovlab.org.

Data and its uses for Governance

Open data refers to data that is publicly available for anyone to use and which is licensed in a way that allows for its re-use. The common requirement that open data be machine-readable not only means that data is distributed via the Internet in a digitized form, but can also be processed by computers through automation, ensuring both wide dissemination and ease of re-use. Much of the focus of the open data advocacy community is on government data and government-supported research data. For example, in May 2013, the US Open Data Policy defined open data as publicly available data structured in a way that enables the data to be fully discoverable and usable by end users, and consistent with a number of principles focused on availability, accessibility and reusability.

Annotated Selected Reading List (in alphabetical order)
Fox, Mark S. “City Data: Big, Open and Linked.” Working Paper, Enterprise Integration Laboratory (2013). http://bit.ly/1bFr7oL.

  • This paper examines concepts that underlie Big City Data, using data from multiple cities as examples. It begins by explaining the concepts of Open, Unified, Linked, and Grounded data, which are central to the Semantic Web. Fox then explores Big Data as an extension of Data Analytics, and provides case examples of good data analytics in cities.
  • Fox concludes that we can develop the tools that will enable anyone to analyze data, both big and small, by adopting the principles of the Semantic Web:
    • Data being openly available over the internet,
    • Data being unifiable using common vocabularies,
    • Data being linkable using International Resource Identifiers,
    • Data being accessible using a common data structure, namely triples (see the sketch after this list),
    • Data being semantically grounded using Ontologies.
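For readers unfamiliar with the triple structure Fox refers to, here is a minimal illustration using the rdflib library; the city vocabulary and URIs below are invented for the example, not drawn from the paper.

```python
# Minimal illustration of RDF triples as a common data structure for
# linked city data. The namespace and properties are made up for the example.
from rdflib import Graph, Literal, Namespace, URIRef

CITY = Namespace("http://example.org/city/")

g = Graph()
road = URIRef(CITY["road/main-street"])
# Each statement is a (subject, predicate, object) triple.
g.add((road, CITY["name"], Literal("Main Street")))
g.add((road, CITY["surfaceType"], Literal("asphalt")))
g.add((road, CITY["connectsTo"], URIRef(CITY["road/elm-avenue"])))

print(g.serialize(format="turtle"))
```

Because every fact is a triple whose subjects and objects are identifiers, datasets published by different cities can be linked simply by reusing the same identifiers and vocabularies.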

Foulonneau, Muriel, Sébastien Martin, and Slim Turki. “How Open Data Are Turned into Services?” In Exploring Services Science, edited by Mehdi Snene and Michel Leonard, 31–39. Lecture Notes in Business Information Processing 169. Springer International Publishing, 2014. http://bit.ly/1fltUmR.

  • In this chapter, the authors argue that, considering the important role the development of new services plays as a motivation for open data policies, the impact of new services created through open data should play a more central role in evaluating the success of open data initiatives.
  • Foulonneau, Martin and Turki argue that the following metrics should be considered when evaluating the success of open data initiatives: “the usage, audience, and uniqueness of the services, according to the changes it has entailed in the public institutions that have open their data…the business opportunity it has created, the citizen perception of the city…the modification to particular markets it has entailed…the sustainability of the services created, or even the new dialog created with citizens.”

Goldstein, Brett, and Lauren Dyson. Beyond Transparency: Open Data and the Future of Civic Innovation. 1st ed. (Code for America Press: 2013). http://bit.ly/15OAxgF.

  • This “cross-disciplinary survey of the open data landscape” features stories from practitioners in the open data space — including Michael Flowers, Brett Goldstein, Emer Coleman and many others — discussing what they’ve accomplished with open civic data. The book “seeks to move beyond the rhetoric of transparency for transparency’s sake and towards action and problem solving.”
  • The book’s editors seek to accomplish the following objectives:
    • Help local governments learn how to start an open data program
    • Spark discussion on where open data will go next
    • Help community members outside of government better engage with the process of governance
    • Lend a voice to many aspects of the open data community.
  • The book is broken into five sections: Opening Government Data, Building on Open Data, Understanding Open Data, Driving Decisions with Data and Looking Ahead.

Granickas, Karolis. “Understanding the Impact of Releasing and Re-using Open Government Data.” European Public Sector Information Platform, ePSIplatform Topic Report No. 2013/08, (2013). http://bit.ly/GU0Nx4.

  • This paper examines the impact of open government data by exploring the latest research in the field, with an eye toward enabling an environment for open data, as well as identifying the benefits of open government data and its political, social, and economic impacts.
  • Granickas concludes that to maximize the benefits of open government data: a) further research is required to structure and measure the potential benefits of open government data; b) “government should pay more attention to creating feedback mechanisms between policy implementers, data providers and data-re-users”; c) “finding a balance between demand and supply requires mechanisms of shaping demand from data re-users and also demonstration of data inventory that governments possess”; and lastly, d) “open data policies require regular monitoring.”

Gurin, Joel. Open Data Now: The Secret to Hot Startups, Smart Investing, Savvy Marketing, and Fast Innovation, (New York: McGraw-Hill, 2014). http://amzn.to/1flubWR.

  • In this book, GovLab Senior Advisor and Open Data 500 director Joel Gurin explores the broad realized and potential benefit of Open Data, and how, “unlike Big Data, Open Data is transparent, accessible, and reusable in ways that give it the power to transform business, government, and society.”
  • The book provides “an essential guide to understanding all kinds of open databases – business, government, science, technology, retail, social media, and more – and using those resources to your best advantage.”
  • In particular, Gurin discusses a number of applications of Open Data with very real potential benefits:
    • “Hot Startups: turn government data into profitable ventures;
    • Savvy Marketing: understanding how reputational data drives your brand;
    • Data-Driven Investing: apply new tools for business analysis;
    • Consumer Information: connect with your customers using smart disclosure;
    • Green Business: use data to bet on sustainable companies;
    • Fast R&D: turn the online world into your research lab;
    • New Opportunities: explore open fields for new businesses.”

Jetzek, Thorhildur, Michel Avital, and Niels Bjørn-Andersen. “Generating Value from Open Government Data.” Thirty Fourth International Conference on Information Systems, 5. General IS Topics 2013. http://bit.ly/1gCbQqL.

  • In this paper, the authors “developed a conceptual model portraying how data as a resource can be transformed to value.”
  • Jetzek, Avital and Bjørn-Andersen propose a conceptual model featuring four Enabling Factors (openness, resource governance, capabilities and technical connectivity) acting on four Value Generating Mechanisms (efficiency, innovation, transparency and participation) leading to the impacts of Economic and Social Value.
  • The authors argue that their research supports that “all four of the identified mechanisms positively influence value, reflected in the level of education, health and wellbeing, as well as the monetary value of GDP and environmental factors.”

Kassen, Maxat. “A promising phenomenon of open data: A case study of the Chicago open data project.” Government Information Quarterly (2013). http://bit.ly/1ewIZnk.

  • This paper uses the Chicago open data project to explore the “empowering potential of an open data phenomenon at the local level as a platform useful for promotion of civic engagement projects and provide a framework for future research and hypothesis testing.”
  • Kassen argues that “open data-driven projects offer a new platform for proactive civic engagement”: by harnessing “the collective wisdom of the local communities, their knowledge and visions of the local challenges,” governments “could react and meet citizens’ needs in a more productive and cost-efficient manner.”
  • The paper highlights the need for independent IT developers to network in order for this trend to continue, as well as the importance of the private sector in “overall diffusion of the open data concept.”

Keen, Justin, Radu Calinescu, Richard Paige, and John Rooksby. “Big data + politics = open data: The case of health care data in England.” Policy and Internet 5, no. 2 (2013): 228–243. http://bit.ly/1i231WS.

  • This paper examines the assumptions regarding open datasets, technological infrastructure and access, using healthcare systems as a case study.
  • The authors specifically address two assumptions surrounding enthusiasm about Big Data in healthcare: the assumption that healthcare datasets and technological infrastructure are up to task, and the assumption of access to this data from outside the healthcare system.
  • By using the National Health Service in England as an example, the authors identify data, technology, and information governance challenges. They argue that “public acceptability of third party access to detailed health care datasets is, at best, unclear,” and that the prospects of Open Data depend on Open Data policies, which are inherently political, and the government’s assertion of property rights over large datasets. Thus, they argue that the “success or failure of Open Data in the NHS may turn on the question of trust in institutions.”

Kulk, Stefan, and Bastiaan Van Loenen. “Brave New Open Data World?” International Journal of Spatial Data Infrastructures Research, May 14, 2012. http://bit.ly/15OAUYR.

  • This paper examines the evolving tension between the open data movement and the European Union’s privacy regulations, especially the Data Protection Directive.
  • The authors argue, “Technological developments and the increasing amount of publicly available data are…blurring the lines between non-personal and personal data. Open data may not seem to be personal data on first glance especially when it is anonymised or aggregated. However, it may become personal by combining it with other publicly available data or when it is de-anonymised.”

Kundra, Vivek. “Digital Fuel of the 21st Century: Innovation through Open Data and the Network Effect.” Joan Shorenstein Center on the Press, Politics and Public Policy, Harvard College: Discussion Paper Series, January 2012. http://hvrd.me/1fIwsjR.

  • In this paper, Vivek Kundra, the first Chief Information Officer of the United States, explores the growing impact of open data, and argues that, “In the information economy, data is power and we face a choice between democratizing it and holding on to it for an asymmetrical advantage.”
  • Kundra offers four specific recommendations to maximize the impact of open data: Citizens and NGOs must demand open data in order to fight government corruption, improve accountability and government services; Governments must enact legislation to change the default setting of government to open, transparent and participatory; The press must harness the power of the network effect through strategic partnerships and crowdsourcing to cut costs and provide better insights; and Venture capitalists should invest in startups focused on building companies based on public sector data.

Noveck, Beth Simone and Daniel L. Goroff. “Information for Impact: Liberating Nonprofit Sector Data.” The Aspen Institute Philanthropy & Social Innovation Publication Number 13-004. 2013. http://bit.ly/WDxd7p.

  • This report is focused on “obtaining better, more usable data about the nonprofit sector,” which encompasses, as of 2010, “1.5 million tax-exempt organizations in the United States with $1.51 trillion in revenues.”
  • Toward that goal, the authors propose liberating data from the Form 990, an Internal Revenue Service form that “gathers and publishes a large amount of information about tax-exempt organizations,” including information related to “governance, investments, and other factors not directly related to an organization’s tax calculations or qualifications for tax exemption.”
  • The authors recommend a two-track strategy: “Pursuing the longer-term goal of legislation that would mandate electronic filing to create open 990 data, and pursuing a shorter-term strategy of developing a third party platform that can demonstrate benefits more immediately.”

Robinson, David G., Harlan Yu, William P. Zeller, and Edward W. Felten, “Government Data and the Invisible Hand.” Yale Journal of Law & Technology 11 (2009), http://bit.ly/1c2aDLr.

  • This paper proposes a new approach to online government data that “leverages both the American tradition of entrepreneurial self-reliance and the remarkable low-cost flexibility of contemporary digital technology.”
  • “In order for public data to benefit from the same innovation and dynamism that characterize private parties’ use of the Internet, the federal government must reimagine its role as an information provider. Rather than struggling, as it currently does, to design sites that meet each end-user need, it should focus on creating a simple, reliable and publicly accessible infrastructure that ‘exposes’ the underlying data.”

Ubaldi, Barbara. “Open Government Data: Towards Empirical Analysis of Open Government Data Initiatives.” OECD Working Papers on Public Governance. Paris: Organisation for Economic Co-operation and Development, May 27, 2013. http://bit.ly/15OB6qP.

  • This working paper from the OECD seeks to provide an all-encompassing look at the principles, concepts and criteria framing open government data (OGD) initiatives.
  • Ubaldi also analyzes a variety of challenges to implementing OGD initiatives, including policy, technical, economic and financial, organizational, cultural and legal impediments.
  • The paper also proposes a methodological framework for evaluating OGD Initiatives in OECD countries, with the intention of eventually “developing a common set of metrics to consistently assess impact and value creation within and across countries.”

Worthy, Ben. “David Cameron’s Transparency Revolution? The Impact of Open Data in the UK.” SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, November 29, 2013. http://bit.ly/NIrN6y.

  • In this article, Worthy “examines the impact of the UK Government’s Transparency agenda, focusing on the publication of spending data at local government level. It measures the democratic impact in terms of creating transparency and accountability, public participation and everyday information.”
  • Worthy’s findings, based on surveys of local authorities, interviews and FOI requests, are disappointing. He finds that:
    • Open spending data has led to some government accountability, but largely from those already monitoring government, not regular citizens.
    • Open Data has not led to increased participation, “as it lacks the narrative or accountability instruments to fully bring such effects.”
    • It has also not “created a new stream of information to underpin citizen choice, though new innovations offer this possibility. The evidence points to third party innovations as the key.”
  • Despite these initial findings, “interviewees pointed out that Open Data holds tremendous opportunities for policy-making. Joined up data could significantly alter how policy is made and resources targeted. From small scale issues e.g. saving money through prescriptions to targeting homelessness or health resources, it can have a transformative impact.”

Zuiderwijk, Anneke, Marijn Janssen, Sunil Choenni, Ronald Meijer, and Roexsana Sheikh Alibaks. “Socio-technical Impediments of Open Data.” Electronic Journal of e-Government 10, no. 2 (2012). http://bit.ly/17yf4pM.

  • This paper seeks to identify the socio-technical impediments to open data impact, based on a review of the open data literature as well as workshops and interviews.
  • The authors discovered 118 impediments across ten categories: 1) availability and access; 2) find-ability; 3) usability; 4) understandability; 5) quality; 6) linking and combining data; 7) comparability and compatibility; 8) metadata; 9) interaction with the data provider; and 10) opening and uploading.

Zuiderwijk, Anneke and Marijn Janssen. “Open Data Policies, Their Implementation and Impact: A Framework for Comparison.” Government Information Quarterly 31, no. 1 (January 2014): 17–29. http://bit.ly/1bQVmYT.

  • In this article, Zuiderwijk and Janssen argue that “currently there is a multiplicity of open data policies at various levels of government, whereas very little systematic and structured research [has been] done on the issues that are covered by open data policies, their intent and actual impact.”
  • With this evaluation deficit in mind, the authors propose a new framework for comparing open data policies at different government levels using the following elements for comparison:
    • Policy environment and context, such as level of government organization and policy objectives;
    • Policy content (input), such as types of data not publicized and technical standards;
    • Performance indicators (output), such as benefits and risks of publicized data; and
    • Public values (impact).

To stay current on recent writings and developments on Open Data, please subscribe to the GovLab Digest.
Did we miss anything? Please submit reading recommendations to biblio@thegovlab.org or in the comments below.