Open Data (Updated and Expanded)


As part of an ongoing effort to build a knowledge base for the field of opening governance by organizing and disseminating its learnings, the GovLab Selected Readings series provides an annotated and curated collection of recommended works on key opening governance topics. We start our series with a focus on Open Data. To suggest additional readings on this or any other topic, please email biblio@thegovlab.org.

Data and its uses for Governance

Open data refers to data that is publicly available for anyone to use and which is licensed in a way that allows for its re-use. The common requirement that open data be machine-readable means not only that data is distributed via the Internet in a digitized form, but also that it can be processed by computers through automation, ensuring both wide dissemination and ease of re-use. Much of the focus of the open data advocacy community is on government data and government-supported research data. For example, in May 2013, the US Open Data Policy defined open data as publicly available data structured in a way that enables the data to be fully discoverable and usable by end users, and consistent with a number of principles focused on availability, accessibility and reusability.

Annotated Selected Reading List (in alphabetical order)
Fox, Mark S. “City Data: Big, Open and Linked.” Working Paper, Enterprise Integration Laboratory (2013). http://bit.ly/1bFr7oL.

  • This paper examines concepts that underlie Big City Data using data from multiple cities as examples. It begins by explaining the concepts of Open, Unified, Linked, and Grounded data, which are central to the Semantic Web. Fox then explores Big Data as an extension of Data Analytics, and provides case examples of good data analytics in cities.
  • Fox concludes that we can develop the tools that will enable anyone to analyze data, both big and small, by adopting the principles of the Semantic Web:
    • Data being openly available over the internet,
    • Data being unifiable using common vocabularies,
    • Data being linkable using Internationalized Resource Identifiers,
    • Data being accessible using a common data structure, namely triples,
    • Data being semantically grounded using Ontologies.
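
The "triples" idea in the list above can be illustrated with a minimal Python sketch. It is not taken from Fox's paper: it uses plain tuples rather than an RDF library, and every identifier under http://example.org/ (the cities, roads, and the lengthKm and locatedIn vocabulary terms) is a hypothetical placeholder. The point is only that two cities publishing data as subject-predicate-object statements with a shared vocabulary can be merged and queried without any schema negotiation.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


EX = "http://example.org/"  # hypothetical namespace, for illustration only

# Two cities publish road data using the same (assumed) vocabulary terms.
city_a = [
    Triple(EX + "city/A/road/1", EX + "vocab/lengthKm", "12.5"),
    Triple(EX + "city/A/road/1", EX + "vocab/locatedIn", EX + "city/A"),
]
city_b = [
    Triple(EX + "city/B/road/7", EX + "vocab/lengthKm", "3.2"),
    Triple(EX + "city/B/road/7", EX + "vocab/locatedIn", EX + "city/B"),
]


def objects(triples, predicate):
    """Map each subject to its object for the given predicate."""
    return {t.subject: t.obj for t in triples if t.predicate == predicate}


# Because both datasets share one structure and one vocabulary,
# combining them is simple list concatenation.
merged = city_a + city_b
lengths = objects(merged, EX + "vocab/lengthKm")
total_km = sum(float(value) for value in lengths.values())
```

In practice an RDF library and a published ontology would replace the hand-rolled tuple and the made-up vocabulary terms.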

Foulonneau, Muriel, Sébastien Martin, and Slim Turki. “How Open Data Are Turned into Services?” In Exploring Services Science, edited by Mehdi Snene and Michel Leonard, 31–39. Lecture Notes in Business Information Processing 169. Springer International Publishing, 2014. http://bit.ly/1fltUmR.

  • In this chapter, the authors argue that, considering the important role the development of new services plays as a motivation for open data policies, the impact of new services created through open data should play a more central role in evaluating the success of open data initiatives.
  • Foulonneau, Martin and Turki argue that the following metrics should be considered when evaluating the success of open data initiatives: “the usage, audience, and uniqueness of the services, according to the changes it has entailed in the public institutions that have open their data…the business opportunity it has created, the citizen perception of the city…the modification to particular markets it has entailed…the sustainability of the services created, or even the new dialog created with citizens.”

Goldstein, Brett, and Lauren Dyson. Beyond Transparency: Open Data and the Future of Civic Innovation. 1st edition. (Code for America Press: 2013). http://bit.ly/15OAxgF

  • This “cross-disciplinary survey of the open data landscape” features stories from practitioners in the open data space, including Michael Flowers, Brett Goldstein, Emer Coleman and many others, discussing what they’ve accomplished with open civic data. The book “seeks to move beyond the rhetoric of transparency for transparency’s sake and towards action and problem solving.”
  • The book’s editors seek to accomplish the following objectives:
    • Help local governments learn how to start an open data program
    • Spark discussion on where open data will go next
    • Help community members outside of government better engage with the process of governance
    • Lend a voice to many aspects of the open data community.
  • The book is broken into five sections: Opening Government Data, Building on Open Data, Understanding Open Data, Driving Decisions with Data and Looking Ahead.

Granickas, Karolis. “Understanding the Impact of Releasing and Re-using Open Government Data.” European Public Sector Information Platform, ePSIplatform Topic Report No. 2013/08, (2013). http://bit.ly/GU0Nx4.

  • This paper examines the impact of open government data by exploring the latest research in the field, with an eye toward enabling an environment for open data, as well as identifying the benefits of open government data and its political, social, and economic impacts.
  • Granickas concludes that to maximize the benefits of open government data: a) further research is required that structures and measures potential benefits of open government data; b) “government should pay more attention to creating feedback mechanisms between policy implementers, data providers and data-re-users”; c) “finding a balance between demand and supply requires mechanisms of shaping demand from data re-users and also demonstration of data inventory that governments possess”; and lastly, d) “open data policies require regular monitoring.”

Gurin, Joel. Open Data Now: The Secret to Hot Startups, Smart Investing, Savvy Marketing, and Fast Innovation, (New York: McGraw-Hill, 2014). http://amzn.to/1flubWR.

  • In this book, GovLab Senior Advisor and Open Data 500 director Joel Gurin explores the broad realized and potential benefits of Open Data, and how, “unlike Big Data, Open Data is transparent, accessible, and reusable in ways that give it the power to transform business, government, and society.”
  • The book provides “an essential guide to understanding all kinds of open databases – business, government, science, technology, retail, social media, and more – and using those resources to your best advantage.”
  • In particular, Gurin discusses a number of applications of Open Data with very real potential benefits:
    • “Hot Startups: turn government data into profitable ventures;
    • Savvy Marketing: understanding how reputational data drives your brand;
    • Data-Driven Investing: apply new tools for business analysis;
    • Consumer Information: connect with your customers using smart disclosure;
    • Green Business: use data to bet on sustainable companies;
    • Fast R&D: turn the online world into your research lab;
    • New Opportunities: explore open fields for new businesses.”

Jetzek, Thorhildur, Michel Avital, and Niels Bjørn-Andersen. “Generating Value from Open Government Data.” Thirty Fourth International Conference on Information Systems, 5. General IS Topics 2013. http://bit.ly/1gCbQqL.

  • In this paper, the authors “developed a conceptual model portraying how data as a resource can be transformed to value.”
  • Jetzek, Avital and Bjørn-Andersen propose a conceptual model featuring four Enabling Factors (openness, resource governance, capabilities and technical connectivity) acting on four Value Generating Mechanisms (efficiency, innovation, transparency and participation) leading to the impacts of Economic and Social Value.
  • The authors argue that their research supports that “all four of the identified mechanisms positively influence value, reflected in the level of education, health and wellbeing, as well as the monetary value of GDP and environmental factors.”

Kassen, Maxat. “A promising phenomenon of open data: A case study of the Chicago open data project.” Government Information Quarterly (2013). http://bit.ly/1ewIZnk.

  • This paper uses the Chicago open data project to explore the “empowering potential of an open data phenomenon at the local level as a platform useful for promotion of civic engagement projects and provide a framework for future research and hypothesis testing.”
  • Kassen argues that “open data-driven projects offer a new platform for proactive civic engagement” wherein governments can harness “the collective wisdom of the local communities, their knowledge and visions of the local challenges, governments could react and meet citizens’ needs in a more productive and cost-efficient manner.”
  • The paper highlights the need for independent IT developers to network in order for this trend to continue, as well as the importance of the private sector in “overall diffusion of the open data concept.”

Keen, Justin, Radu Calinescu, Richard Paige, John Rooksby. “Big data + politics = open data: The case of health care data in England.” Policy and Internet 5 (2), (2013): 228–243. http://bit.ly/1i231WS.

  • This paper examines the assumptions regarding open datasets, technological infrastructure and access, using healthcare systems as a case study.
  • The authors specifically address two assumptions surrounding enthusiasm about Big Data in healthcare: the assumption that healthcare datasets and technological infrastructure are up to the task, and the assumption of access to this data from outside the healthcare system.
  • By using the National Health Service in England as an example, the authors identify data, technology, and information governance challenges. They argue that “public acceptability of third party access to detailed health care datasets is, at best, unclear,” and that the prospects of Open Data depend on Open Data policies, which are inherently political, and the government’s assertion of property rights over large datasets. Thus, they argue that the “success or failure of Open Data in the NHS may turn on the question of trust in institutions.”

Kulk, Stefan and Bastiaan Van Loenen. “Brave New Open Data World?” International Journal of Spatial Data Infrastructures Research, May 14, 2012. http://bit.ly/15OAUYR.

  • This paper examines the evolving tension between the open data movement and the European Union’s privacy regulations, especially the Data Protection Directive.
  • The authors argue, “Technological developments and the increasing amount of publicly available data are…blurring the lines between non-personal and personal data. Open data may not seem to be personal data on first glance especially when it is anonymised or aggregated. However, it may become personal by combining it with other publicly available data or when it is de-anonymised.”

Kundra, Vivek. “Digital Fuel of the 21st Century: Innovation through Open Data and the Network Effect.” Joan Shorenstein Center on the Press, Politics and Public Policy, Harvard College: Discussion Paper Series, January 2012, http://hvrd.me/1fIwsjR.

  • In this paper, Vivek Kundra, the first Chief Information Officer of the United States, explores the growing impact of open data, and argues that, “In the information economy, data is power and we face a choice between democratizing it and holding on to it for an asymmetrical advantage.”
  • Kundra offers four specific recommendations to maximize the impact of open data:
    • Citizens and NGOs must demand open data in order to fight government corruption, improve accountability and government services;
    • Governments must enact legislation to change the default setting of government to open, transparent and participatory;
    • The press must harness the power of the network effect through strategic partnerships and crowdsourcing to cut costs and provide better insights; and
    • Venture capitalists should invest in startups focused on building companies based on public sector data.

Noveck, Beth Simone and Daniel L. Goroff. “Information for Impact: Liberating Nonprofit Sector Data.” The Aspen Institute Philanthropy & Social Innovation Publication Number 13-004. 2013. http://bit.ly/WDxd7p.

  • This report is focused on “obtaining better, more usable data about the nonprofit sector,” which encompasses, as of 2010, “1.5 million tax-exempt organizations in the United States with $1.51 trillion in revenues.”
  • Toward that goal, the authors propose liberating data from the Form 990, an Internal Revenue Service form that “gathers and publishes a large amount of information about tax-exempt organizations,” including information related to “governance, investments, and other factors not directly related to an organization’s tax calculations or qualifications for tax exemption.”
  • The authors recommend a two-track strategy: “Pursuing the longer-term goal of legislation that would mandate electronic filing to create open 990 data, and pursuing a shorter-term strategy of developing a third party platform that can demonstrate benefits more immediately.”

Robinson, David G., Harlan Yu, William P. Zeller, and Edward W. Felten, “Government Data and the Invisible Hand.” Yale Journal of Law & Technology 11 (2009), http://bit.ly/1c2aDLr.

  • This paper proposes a new approach to online government data that “leverages both the American tradition of entrepreneurial self-reliance and the remarkable low-cost flexibility of contemporary digital technology.”
  • “In order for public data to benefit from the same innovation and dynamism that characterize private parties’ use of the Internet, the federal government must reimagine its role as an information provider. Rather than struggling, as it currently does, to design sites that meet each end-user need, it should focus on creating a simple, reliable and publicly accessible infrastructure that ‘exposes’ the underlying data.”

Ubaldi, Barbara. “Open Government Data: Towards Empirical Analysis of Open Government Data Initiatives.” OECD Working Papers on Public Governance. Paris: Organisation for Economic Co-operation and Development, May 27, 2013. http://bit.ly/15OB6qP.

  • This working paper from the OECD seeks to provide an all-encompassing look at the principles, concepts and criteria framing open government data (OGD) initiatives.
  • Ubaldi also analyzes a variety of challenges to implementing OGD initiatives, including policy, technical, economic and financial, organizational, cultural and legal impediments.
  • The paper also proposes a methodological framework for evaluating OGD Initiatives in OECD countries, with the intention of eventually “developing a common set of metrics to consistently assess impact and value creation within and across countries.”

Worthy, Ben. “David Cameron’s Transparency Revolution? The Impact of Open Data in the UK.” SSRN Scholarly Paper. Rochester, NY: Social Science Research Network, November 29, 2013. http://bit.ly/NIrN6y.

  • In this article, Worthy “examines the impact of the UK Government’s Transparency agenda, focusing on the publication of spending data at local government level. It measures the democratic impact in terms of creating transparency and accountability, public participation and everyday information.”
  • Worthy’s findings, based on surveys of local authorities, interviews and FOI requests, are disappointing. He finds that:
    • Open spending data has led to some government accountability, but largely from those already monitoring government, not regular citizens.
    • Open Data has not led to increased participation, “as it lacks the narrative or accountability instruments to fully bring such effects.”
    • It has also not “created a new stream of information to underpin citizen choice, though new innovations offer this possibility. The evidence points to third party innovations as the key.”
  • Despite these initial findings, “Interviewees pointed out that Open Data holds tremendous opportunities for policy-making. Joined up data could significantly alter how policy is made and resources targeted. From small scale issues e.g. saving money through prescriptions to targeting homelessness or health resources, it can have a transformative impact.”

Zuiderwijk, Anneke, Marijn Janssen, Sunil Choenni, Ronald Meijer and Roexsana Sheikh Alibaks. “Socio-technical Impediments of Open Data.” Electronic Journal of e-Government 10, no. 2 (2012). http://bit.ly/17yf4pM.

  • This paper seeks to identify the socio-technical impediments to open data impact based on a review of the open data literature, as well as workshops and interviews.
  • The authors discovered 118 impediments across ten categories: 1) availability and access; 2) find-ability; 3) usability; 4) understandability; 5) quality; 6) linking and combining data; 7) comparability and compatibility; 8) metadata; 9) interaction with the data provider; and 10) opening and uploading.

Zuiderwijk, Anneke and Marijn Janssen. “Open Data Policies, Their Implementation and Impact: A Framework for Comparison.” Government Information Quarterly 31, no. 1 (January 2014): 17–29. http://bit.ly/1bQVmYT.

  • In this article, Zuiderwijk and Janssen argue that “currently there is a multiplicity of open data policies at various levels of government, whereas very little systematic and structured research [being] done on the issues that are covered by open data policies, their intent and actual impact.”
  • With this evaluation deficit in mind, the authors propose a new framework for comparing open data policies at different government levels using the following elements for comparison:
    • Policy environment and context, such as level of government organization and policy objectives;
    • Policy content (input), such as types of data not publicized and technical standards;
    • Performance indicators (output), such as benefits and risks of publicized data; and
    • Public values (impact).

To stay current on recent writings and developments on Open Data, please subscribe to the GovLab Digest.
Did we miss anything? Please submit reading recommendations to biblio@thegovlab.org or in the comments below.

House Bill Raises Questions about Crowdsourcing


Anne Bowser for Commons Lab (Wilson Center): “A new bill in the House is raising some key questions about how crowdsourcing is understood by scientists, government agencies, policymakers and the public at large.
Robin Bravender’s recent article in Environment & Energy Daily, “House Republicans Push Crowdsourcing on Agency Science,” (subscription required) neatly summarizes the debate around H.R. 4012, a bill introduced to the House of Representatives earlier this month. The House Science, Space and Technology Committee earlier this week held a hearing on the bill, which could see a committee vote as early as next month.
Dubbed the “Secret Science Reform Act of 2014,” the bill prohibits the Environmental Protection Agency (EPA) from “proposing, finalizing, or disseminating regulations or assessments based upon science that is not transparent or reproducible.” If the bill is passed, EPA would be unable to base assessments or regulations on any information not “publicly available in a manner that is sufficient for independent analysis.” This would include all information published in scholarly journals based on data that is not available as open source.
The bill is based on the premise that forcing EPA to use public data will inspire greater transparency by allowing “the crowd” to conduct independent analysis and interpretation. While the premise of involving the public in scientific research is sound, this characterization of crowdsourcing as a process separate from traditional scientific research is deeply problematic.
This division contrasts with the current practices of many researchers, who use crowdsourcing to directly involve the public in scientific processes. Galaxy Zoo, for example, enlists digital volunteers (called “citizen scientists”) to help classify more than 40 million photographs of galaxies taken by the Hubble Telescope. These crowdsourced morphological classifications are a powerful form of data analysis, a key aspect of the scientific process. Galaxy Zoo then publishes a catalogue of these classifications as an open-source data set. And the data reduction techniques and measures of confidence and bias for the data catalogue are documented in MNRAS, a peer-reviewed journal. A recent Google Scholar search shows that the data set published in MNRAS has been cited a remarkable 121 times.
As this example illustrates, crowdsourcing is often embedded in the process of formal scientific research. But prior to being published in a scientific journal, the crowdsourced contributions of non-professional volunteers are subject to the scrutiny of professional scientists through the rigorous process of peer review. Because peer review was designed as an institution to ensure objective and unbiased research, peer-reviewed scientific work is widely accepted as the best source of information for any science-based decision.
Separating crowdsourcing from the peer review process, as this legislation intends, means that there will be no formal filters in place to ensure that open data will not be abused by special interests. Ellen Silbergeld, a professor at Johns Hopkins University who testified at the hearing this week, made exactly this point when she pointed to data manipulation commonly practiced by tobacco lobbyists in the United States.
Contributing to scientific research is one goal of crowdsourcing for science. Involving the public in scientific research also increases volunteer understanding of research topics and the scientific process and inspires heightened community engagement. These goals are supported by President Obama’s Second Open Government National Action Plan, which calls for “increased crowdsourcing and citizen science programs” to support “an informed and active citizenry.” But H.R. 4012 does not support these goals. Rather, this legislation could further degrade the public’s understanding of science by encouraging the public to distrust professional scientists rather than collaborate with them.
Crowdsourcing benefits organizations by bringing in the unique expertise held by external volunteers, which can augment and enhance the traditional scientific process. In return, these volunteers benefit from exposure to new and exciting processes, such as scientific research. This mutually beneficial relationship depends on collaboration, not opposition. Supporting an antagonistic relationship between science-based organizations like the EPA and members of “the crowd” will benefit neither institutions, nor volunteers, nor the country as a whole.”

What Is Impossible


impossible.com is a new website and app that encourages people to do things for others for free. People can post wishes for things that they want or need help with, and offer what they can give, whether things or skills. Impossible shows these wishes and offers so that people can connect with one another. You can also create thank-you posts to send to people.

Managing Innovation in a Crowd


New paper by Daron Acemoglu, Mohamed Mostagir, and Asuman E. Ozdaglar: “Crowdsourcing is an emerging technology where innovation and production are sourced out to the public through an open call. At the center of crowdsourcing is a resource allocation problem: there is an abundance of workers but a scarcity of high skills, and an easy task assigned to a high-skill worker is a waste of resources. This problem is complicated by the fact that the exact difficulties of innovation tasks may not be known in advance, so tasks that require high-skill labor cannot be identified and allocated ahead of time. We show that the solution to this problem takes the form of a skill hierarchy, where tasks are first attempted by low-skill labor, and high skill workers only engage with a task if less skilled workers are unable to finish it. This hierarchy can be constructed and implemented in a decentralized manner even though neither the difficulties of the tasks nor the skills of the candidate workers are known. We provide a dynamic pricing mechanism that achieves this implementation by inducing workers to self-select into different layers. The mechanism is simple: each time a task is attempted and not finished, its price (reward upon completion) goes up.”
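
The escalating-price rule described in the abstract can be illustrated with a toy simulation. This is a minimal sketch, not the authors' model: it assumes each worker has a private reservation reward that rises with skill (so higher-skill workers only step in once the price is high enough), and the names Worker and run_task, the price step, and the price cap are all hypothetical. As a further simplification, the price also rises when nobody is willing to attempt the task at the current reward.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    skill: int      # private: not observed by the platform
    reserve: float  # minimum reward accepted (assumed to rise with skill)


def run_task(difficulty, workers, start_price=1.0, step=1.0, max_price=10.0):
    """Offer a task at a rising price until someone finishes it.

    Returns (finisher_name_or_None, final_price, attempts), where attempts
    is a list of (worker_name, price_at_attempt) pairs.
    """
    price = start_price
    tried = set()
    attempts = []
    while price <= max_price:
        # Workers self-select: only those whose reserve is met step forward.
        willing = [w for w in workers
                   if w.reserve <= price and w.name not in tried]
        if willing:
            worker = min(willing, key=lambda w: w.reserve)
            tried.add(worker.name)
            attempts.append((worker.name, price))
            if worker.skill >= difficulty:
                return worker.name, price, attempts
        # Task not finished at this price: raise the reward.
        price += step
    return None, price, attempts


workers = [
    Worker("novice", skill=1, reserve=1.0),
    Worker("intermediate", skill=3, reserve=2.0),
    Worker("expert", skill=5, reserve=4.0),
]

easy = run_task(1, workers)
hard = run_task(5, workers)
```

In this toy setup the easy task is finished by the novice at the starting price, while the hard task reaches the expert only after the two cheaper workers have tried and failed, which is the layered behavior the abstract describes. The platform never needs to know the task difficulty or the workers' skills.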

A Task-Fit Model of Crowdsourcing: Finding the Right Crowdsourcing Approach to Fit the Task


New paper by RT Nakatsu, EB Grossman, CL Iacovou: “We develop a framework for classifying crowdsourcing approaches in terms of the types of tasks for which they are best suited. The method we used to develop our task-fit taxonomy followed an iterative approach that considered over 100 well-known examples of crowdsourcing. Our taxonomy considers three dimensions of task complexity: (1) task structure: is the task well-defined, or does it require a more open-ended solution; (2) task interdependence: can the task be solved by an individual or does it require a community of problem solvers; and (3) task commitment: what level of commitment is expected from crowd members? Based on our taxonomy, we identify seven categories of crowdsourcing, and discuss prototypical examples of each approach. Furnished with such an understanding, one should be able to determine which crowdsourcing approach is most suitable for a particular task situation.”

Online Video Game Plugs Players Into Real Biochemistry Lab


Science Now: “Crowdsourcing is the latest research rage—Kickstarter to raise funding, screen savers that number-crunch, and games to find patterns in data—but most efforts have been confined to the virtual lab of the Internet. In a new twist, researchers have now crowdsourced their experiments by connecting players of a video game to an actual biochemistry lab. The game, called EteRNA, allows players to remotely carry out real experiments to verify their predictions of how RNA molecules fold. The first big result: a study published this week in the Proceedings of the National Academy of Sciences, bearing the names of more than 37,000 authors—only 10 of them professional scientists. “It’s pretty amazing stuff,” says Erik Winfree, a biophysicist at the California Institute of Technology in Pasadena.
Some see EteRNA as a sign of the future for science, not only for crowdsourcing citizen scientists but also for giving them remote access to a real lab. “Cloud biochemistry,” as some call it, isn’t just inevitable, Winfree says: It’s already here. DNA sequencing, gene expression testing, and many biochemical assays are already outsourced to remote companies, and any “wet lab” experiment that can be automated will be automated, he says. “Then the scientists can focus on the non-boring part of their work.”
EteRNA grew out of an online video game called Foldit. Created in 2008 by a team led by David Baker and Zoran Popović, a molecular biologist and computer scientist, respectively, at the University of Washington, Seattle, Foldit focuses on predicting the shape into which a string of amino acids will fold. By tweaking virtual strings, Foldit players can surpass the accuracy of the fastest computers in the world at predicting the structure of certain proteins. Two members of the Foldit team, Adrien Treuille and Rhiju Das, conceived of EteRNA back in 2009. “The idea was to make a version of Foldit for RNA,” says Treuille, who is now based at Carnegie Mellon University in Pittsburgh, Pennsylvania. Treuille’s doctoral student Jeehyung Lee developed the needed software, but then Das persuaded them to take it a giant step further: hooking players up directly to a real-world, robot-controlled biochemistry lab. After all, RNA can be synthesized and its folded-up structure determined far more cheaply and rapidly than protein can.
Lee went back to the drawing board, redesigning the game so that it had not only a molecular design interface like Foldit, but also a laboratory interface for designing RNA sequences for synthesis, keeping track of hypotheses for RNA folding rules, and analyzing data to revise those hypotheses. By 2010, Lee had a prototype game ready for testing. Das had the RNA wet lab ready to go at Stanford University in Palo Alto, California, where he is now a professor. All they lacked were players.
A message to the Foldit community attracted a few hundred players. Then in early 2011, The New York Times wrote about EteRNA and tens of thousands of players flooded in.
The game comes with a detailed tutorial and a series of puzzles involving known RNA structures. Only after winning 10,000 points do you unlock the ability to join EteRNA’s research team. There the goal is to design RNA sequences that will fold into a target structure. Each week, eight sequences are chosen by vote and sent to Stanford for synthesis and structure determination. The data that come back reveal how well the sequences’ true structures matched their targets. That way, Treuille says, “reality keeps score.” The players use that feedback to tweak a set of hypotheses: design rules for determining how an RNA sequence will fold.
Two years and hundreds of RNA structures later, the players of EteRNA have proven themselves to be a potent research team. Of the 37,000 who played, about 1000 graduated to participating in the lab for the study published today. (EteRNA now has 133,000 players, 4000 of them doing research.) They generated 40 new rules for RNA folding. For example, at the junctions between different parts of the RNA structure—such as between a loop and an arm—the players discovered that it is far more stable if enriched with guanines and cytosines, the strongest bonding of the RNA base pairs. To see how well those rules describe reality, the humans then competed toe to toe against computers in a new series of RNA structure challenges. The researchers distilled the humans’ 40 rules into an algorithm called EteRNA Bot.”

How Government Can Make Open Data Work


Joel Gurin in Information Week: “At the GovLab at New York University, where I am senior adviser, we’re taking a different approach than McKinsey’s to understand the evolving value of government open data: We’re studying open data companies from the ground up. I’m now leading the GovLab’s Open Data 500 project, funded by the John S. and James L. Knight Foundation, to identify and examine 500 American companies that use government open data as a key business resource.
Our preliminary results show that government open data is fueling companies both large and small, across the country, and in many sectors of the economy, including health, finance, education, energy, and more. But it’s not always easy to use this resource. Companies that use government open data tell us it is often incomplete, inaccurate, or trapped in hard-to-use systems and formats.
It will take a thorough and extended effort to make government data truly useful. Based on what we are hearing and the research I did for my book, here are some of the most important steps the federal government can take, starting now, to make it easier for companies to add economic value to the government’s data.
1. Improve data quality
The Open Data Policy not only directs federal agencies to release more open data; it also requires them to release information about data quality. Agencies will have to begin improving the quality of their data simply to avoid public embarrassment. We can hope and expect that they will do some data cleanup themselves, demand better data from the businesses they regulate, or use creative solutions like turning to crowdsourcing for help, as USAID did to improve geospatial data on its grantees.
2. Keep improving open data resources
The government has steadily made Data.gov, the central repository of federal open data, more accessible and useful, including a significant relaunch last week. To the agency’s credit, the GSA, which administers Data.gov, plans to keep working to make this key website still better. As part of implementing the Open Data Policy, the administration has also set up Project Open Data on GitHub, the world’s largest community for open-source software. These resources will be helpful for anyone working with open data either inside or outside of government. They need to be maintained and continually improved.
3. Pass DATA
The Digital Accountability and Transparency Act would bring transparency to federal government spending at an unprecedented level of detail. The Act has strong bipartisan support. It passed the House with only one dissenting vote and was unanimously approved by a Senate committee, but still needs full Senate approval and the President’s signature to become law. DATA is also supported by technology companies who see it as a source of new open data they can use in their businesses. Congress should move forward and pass DATA as the logical next step in the work that the Obama administration’s Open Data Policy has begun.
4. Reform the Freedom of Information Act
Since it was passed in 1966, the federal Freedom of Information Act has gone through two major revisions, both of which strengthened citizens’ ability to access many kinds of government data. It’s time for another step forward. Current legislative proposals would establish a centralized web portal for all federal FOIA requests, strengthen the FOIA ombudsman’s office, and require agencies to post more high-interest information online before they receive formal requests for it. These changes could make more information from FOIA requests available as open data.
5. Engage stakeholders in a genuine way
Up to now, the government’s release of open data has largely been a one-way affair: Agencies publish datasets that they hope will be useful without consulting the organizations and companies that want to use them. Other countries, including the UK, France, and Mexico, are building in feedback loops from data users to government data providers, and the US should, too. The Open Data Policy calls for agencies to establish points of contact for public feedback. At the GovLab, we hope that the Open Data 500 will help move that process forward. Our research will provide a basis for new, productive dialogue between government agencies and the businesses that rely on them.
6. Keep using federal challenges to encourage innovation
The federal Challenge.gov website applies the best principles of crowdsourcing and collective intelligence. Agencies should use this approach extensively, and should pose challenges using the government’s open data resources to solve business, social, or scientific problems. Other approaches to citizen engagement, including federally sponsored hackathons and the White House Champions of Change program, can play a similar role.
Through the Open Data Policy and other initiatives, the Obama administration has set the right goals. Now it’s time to implement and move toward what US CTO Todd Park calls “data liberation.” Thousands of companies, organizations, and individuals will benefit.”

Use big data and crowdsourcing to detect nuclear proliferation, says DSB


FierceGovernmentIT: “A changing set of counter-nuclear proliferation problems requires a paradigm shift in monitoring that should include big data analytics and crowdsourcing, says a report from the Defense Science Board.
Much has changed since the Cold War when it comes to ensuring that nuclear weapons are subject to international controls, meaning that monitoring in support of treaties covering declared capabilities should be only one part of overall U.S. monitoring efforts, says the board in a January report (.pdf).
There are challenges related to covert operations, such as testing calibrated to fall below detection thresholds, and non-traditional technologies that present ambiguous threat signatures. Knowledge about how to make nuclear weapons is widespread and in the hands of actors who will give the United States or its allies limited or no access….
The report recommends using a slew of technologies including radiation sensors, but also exploitation of digital sources of information.
“Data gathered from the cyber domain establishes a rich and exploitable source for determining activities of individuals, groups and organizations needed to participate in either the procurement or development of a nuclear device,” it says.
Big data analytics could be used to take advantage of the proliferation of potential data sources including commercial satellite imaging, social media and other online sources.
The report notes that the proliferation of readily available commercial satellite imagery has created concerns about the introduction of more noise than genuine signal. “On balance, however, it is the judgment from the task force that more information from remote sensing systems, both commercial and dedicated national assets, is better than less information,” it says.
In fact, the ready availability of commercial imagery should be an impetus of governmental ability to find weak signals “even within the most cluttered and noisy environments.”
Crowdsourcing also holds potential, although the report again notes that nuclear proliferation analysis by non-governmental entities “will constrain the ability of the United States to keep its options open in dealing with potential violations.” The distinction between gathering information and making political judgments “will erode.”
An effort by Georgetown University students (reported in the Washington Post in 2011) to use open source data analyzing the network of tunnels used in China to hide its missile and nuclear arsenal provides a proof-of-concept on how crowdsourcing can be used to augment limited analytical capacity, the report says – despite debate on the students’ work, which concluded that China’s arsenal could be many times larger than conventionally accepted…
For more:
download the DSB report, “Assessment of Nuclear Monitoring and Verification Technologies” (.pdf)
read the WaPo article on the Georgetown University crowdsourcing effort”

Citizen roles in civic problem-solving and innovation


Satish Nambisan: “Can citizens be fruitfully engaged in solving civic problems? Recent initiatives in cities such as Boston (Citizens Connect), Chicago (Smart Chicago Collaborative), San Francisco (ImproveSF) and New York (NYC BigApps) indicate that citizens can be involved in not just identifying and reporting civic problems but in conceptualizing, designing and developing, and implementing solutions as well.
The availability of new technologies (e.g. social media) has radically lowered the cost of collaboration and the “distance” between government agencies and the citizens they serve. Further involving citizens — who are often closest to and possess unique knowledge about the problems they face — makes a lot of sense given the increasing complexity of the problems that need to be addressed.
A recent research report that I wrote highlights four distinct roles that citizens can play in civic innovation and problem-solving.
As explorer, citizens can identify and report emerging and existing civic problems. For example, Boston’s Citizens Connect initiative enables citizens to use specially built smartphone apps to report minor and major civic problems (from potholes and graffiti to water/air pollution). Closer to home, both Wisconsin and Minnesota have engaged thousands of citizen volunteers in collecting data on the quality of water in their neighborhood streams, lakes and rivers (the data thus gathered are analyzed by the state pollution control agency). Citizens also can be engaged in data analysis. The N.Y.-based Datakind initiative involves citizen volunteers using their data analysis skills to mine public data in health, education, environment, etc., to identify important civic issues and problems.
As “ideator,” citizens can conceptualize novel solutions to well-defined problems in public services. For example, the federal government’s Challenge.gov initiative employs online contests and competitions to solicit innovative ideas from citizens to solve important civic problems. Such “crowdsourcing” initiatives also have been launched at the county, city and state levels (e.g. Prize2theFuture competition in Birmingham, Ala.; ImproveSF in San Francisco).
As designer, citizens can design and/or develop implementable solutions to well-defined civic problems. For example, as part of initiatives such as NYC BigApps and Apps for California, citizens have designed mobile apps to address specific issues such as public parking availability, public transport delays, etc. Similarly, the City Repair project in Portland, Ore., focuses on engaging citizens in co-designing and creatively transforming public places into sustainable community-oriented urban spaces.
As diffuser, citizens can play the role of a change agent and directly support the widespread adoption of civic innovations and solutions. For example, in recent years, physicians interacting with peer physicians in dedicated online communities have assisted federal and state government agencies in diffusing health technology innovations such as electronic medical record systems (EMRs).
In the private sector, companies across industries have benefited much from engaging with their customers in innovation. Evidence so far suggests that the benefits from citizen engagement in civic problem-solving are equally tangible, valuable and varied. However, the challenges associated with organizing such citizen co-creation initiatives are also many and imply the need for government agencies to adopt an intentional, well-thought-out approach….”

MIT Crowdsources the Next Great (free) IQ Test


ThePsychReport: “Raven’s Matrices have long been a gold standard for psychologists needing to measure general intelligence. But the good ones, the ones scientists like to use, are too expensive for most research projects.

Christopher Chabris, associate professor of psychology at Union College, and David Engel, postdoctoral associate at MIT Sloan School of Management, think the public can help. They recently launched a campaign to crowdsource “the next great IQ test.” The Matrix Reasoning Challenge, created through MIT’s Center for Collective Intelligence with Anita Woolley and Tom Malone, calls on the public to design and submit matrix puzzles – 3×3 grids that ask subjects to complete a pattern by filling in a missing square.

Chabris says they aren’t trying to compete with commercially available tests used for diagnostic or clinical purposes, but rather want to provide a trustworthy and free alternative for scientists. Because these types of puzzles are nonverbal, culturally neutral, and objective, they have wide-ranging applications and are particularly useful when conducting research across various demographics. If this project is successful, a lot more scientists could do a lot more research.

A simple example of a matrix puzzle. Source: Matrix Reasoning Challenge

“Researchers typically don’t have that much money,” Chabris said. “They can’t afford pay per use tests. Sometimes they have no research budgets, or if they do, they’re not large enough for that kind of thing. Our real goal is to create something useful for researchers.”

Through the Matrix Reasoning Challenge, Chabris and Engel also hope to better understand how crowdsourcing can be used to problem-solve in social and cognitive sciences.

Social scientists already widely use crowdsourcing sites like Amazon’s Mechanical Turk to recruit participants for their studies, but the matrix project is different in that it seeks to tap into the public’s expertise to help solve scientific problems. Scientists in computer science and bioinformatics have been able to harness this expertise to yield some incredible results. Using TopCoder.com, NASA was able to find a more efficient way to deploy solar panels on the International Space Station. Harvard Medical School was able to develop better software for analyzing immune-system genes. With The Matrix Reasoning Challenge, Chabris and Engel are beginning to explore crowdsourcing’s potential in the social sciences.”