Paper by Gianluigi Viscusi and Christopher L. Tucci: “In conventional wisdom on crowdsourcing, the number of people defines the crowd, and maximizing this number is often assumed to be the goal of any crowdsourcing exercise. However, we propose that there are structural characteristics of the crowd that might be more important than the sheer number of participants. These characteristics include (1) growth rate and its attractiveness to the members, (2) the equality among members, (3) the density within provisional boundaries, (4) the goal orientation of the crowd, and (5) the “seriality” of the interactions between members of the crowd. We then propose a typology that may allow managers to position their companies’ initiatives among four strategic types: crowd crystals, online communities, closed crowds, and open crowd-driven innovation. We show that incumbent companies may prefer closed and controlled access to the crowd, limiting the potential for gaining results and insights from fully open crowd-driven innovation initiatives. Consequently, we argue that the effects of open crowds on industries and organizations are still to be explored, possibly via the mechanisms of entrepreneurs exploiting open crowds as new entrants, but also for the configuration of industries such as finance, pharmaceuticals, or even the public sector, where the value created usually comes from interpretation issues and exploratory problem solving…(More).”
When Lobbyists Write Legislation, This Data Mining Tool Traces The Paper Trail
FastCoExist: “Most kids learn the grade school civics lesson about how a bill becomes a law. What those lessons usually neglect to show is how legislation today is often birthed on a lobbyist’s desk.
But even for expert researchers, journalists, and government transparency groups, tracing a bill’s lineage isn’t easy—especially at the state level. Last year alone, 70,000 state bills were introduced across the 50 states. It would take one person five weeks just to read them all. Groups that do track state legislation usually focus narrowly on a single topic, such as abortion, or perhaps a single lobby group.
Computers can do much better. A prototype tool, presented in September at Bloomberg’s Data for Good Exchange 2015 conference, mines the Sunlight Foundation’s database of more than 500,000 bills and 200,000 resolutions for the 50 states from 2007 to 2015. It also compares them to 1,500 pieces of “model legislation” written by a few lobbying groups that made their work available, such as the conservative group ALEC (American Legislative Exchange Council) and the liberal group the State Innovation Exchange (formerly called ALICE).
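To make the bill-matching idea concrete, here is a minimal, hypothetical sketch of one common approach to this kind of comparison: vectorizing bill texts with TF-IDF and flagging pairs with high cosine similarity. It is not the team’s actual pipeline; the texts, threshold, and parameters below are illustrative assumptions.

```python
# Illustrative sketch (not the researchers' pipeline): match state bills
# against model legislation using TF-IDF vectors and cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

model_bills = ["...text of an ALEC model bill...", "...text of an ALICE model bill..."]
state_bills = ["...text of a bill introduced in New York...",
               "...text of a bill introduced in Illinois..."]

# Build one shared vocabulary over all documents.
vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
vectors = vectorizer.fit_transform(model_bills + state_bills)

# Compare each state bill against each piece of model legislation.
similarities = cosine_similarity(vectors[len(model_bills):], vectors[:len(model_bills)])

# Flag likely matches above an arbitrary, assumed threshold.
for i, row in enumerate(similarities):
    for j, score in enumerate(row):
        if score > 0.8:
            print(f"state bill {i} resembles model bill {j} (similarity {score:.2f})")
```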

The results are interesting. In one example of the program in use, the team—all from the Data Science for Social Good fellowship program in Chicago—created a graphic that presents the relative influence of ALEC and ALICE in different states. The thickness of each line in the graphic corresponds to the percentage of bills introduced in each state that are modeled on either group’s legislation. So a relatively liberal state like New York has mostly ALICE bills, while a “swing” state like Illinois has a lot from both groups….
Along with researchers from the University of Chicago, the Wikimedia Foundation, Microsoft Research, and Northwestern University, Walsh is also co-author of another paper, presented at the Bloomberg conference, that shows how data science can increase government transparency.
Walsh and these co-authors developed software that automatically identifies earmarks in U.S. Congressional bills, showing how representatives are benefiting their own states with pork barrel projects. They verified that it works by comparing it to the results of a massive effort from the U.S. Office of Management and Budget to analyze earmarks for a few limited years. Their results, extended back to 1995 in a public database, showed that there may be many more earmarks than anyone thought.
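As a rough illustration of what automated earmark detection can look like, here is a deliberately simple sketch; the paper’s actual method is more sophisticated, and the pattern below is an assumed heuristic, not the authors’ approach.

```python
# Toy heuristic (an assumption, not the paper's method): flag sentences
# that pair a dollar amount with appropriation language.
import re

EARMARK_CUES = re.compile(
    r"\$[\d,]+\s+(?:is|shall be)\s+(?:made available|provided|appropriated)\s+for",
    re.IGNORECASE,
)

def flag_earmark_candidates(bill_text: str) -> list[str]:
    """Return sentences pairing a dollar amount with appropriation language."""
    sentences = re.split(r"(?<=[.;])\s+", bill_text)
    return [s for s in sentences if EARMARK_CUES.search(s)]

sample = ("$1,500,000 shall be made available for a visitor center in Springfield. "
          "The Secretary shall submit an annual report.")
print(flag_earmark_candidates(sample))
```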
“Governments are making more data available. It’s something like a needle in a haystack problem, trying to extract all that information out,” says Walsh. “Both of these projects are really about shining light to these dark places where we don’t know what’s going on.”
The state legislation tracker data is available for download here, and the team is working on an expanded system that automatically downloads new state legislation so it can stay up to date…(More)”
Citizen-Generated Data and Governments: Towards a Collaborative Model
Civicus: “…we’re very happy today to launch “Citizen-Generated Data and Governments: Towards a Collaborative Model”.
This piece explores the idea that governments could host and publish citizen-generated data (CGD) themselves, and whether this could mean that data is applied more widely and in a more sustainable way. It was inspired by a recent meeting in Buenos Aires with Argentine civil society organizations and government representatives, hosted by the City of Buenos Aires Innovation and Open Government Lab (Laboratorio de innovación y Gobierno Abierto de la Ciudad de Buenos Aires).
The meeting was organized to explore how people within government think about citizen-generated data, and discuss what would be needed for them to consider it as a valid method of data generation. One of the most novel and exciting ideas that surfaced was the potential for government open data portals, such as that managed by the Buenos Aires Innovation Lab, to host and publish CGD.
We wrote this report to explore this issue further, looking at existing models of data collaboration and outlining our first thoughts on the benefits and obstacles this kind of model might face. We welcome feedback from those with deeper expertise in different aspects of citizen-generated data, and look forward to refining these thoughts in the future together with the broader community…(More)”
Science is best when the data is an open book
It was 1986, and the American space agency, NASA, was reeling from the loss of seven lives. The space shuttle Challenger had broken apart about one minute after its launch.
A Congressional commission was formed to report on the tragedy. The physicist Richard Feynman was one of its members.
NASA officials had testified to Congress that the chance of a shuttle failure was around 1 in 100,000. Feynman wanted to look beyond the official testimony to the numbers and data that backed it up.
After completing his investigation, Feynman summed up his findings in an appendix to the Commission’s official report, in which he declared that NASA officials had “fooled themselves” into thinking that the shuttle was safe.
After a launch, shuttle parts sometimes came back damaged or behaved in unexpected ways. In many of those cases, NASA came up with convenient explanations that minimised the importance of these red flags. The people at NASA badly wanted the shuttle to be safe, and this coloured their reasoning.
To Feynman, this sort of behaviour was not surprising. In his career as a physicist, Feynman had observed that not just engineers and managers, but also basic scientists have biases that can lead to self-deception.
Feynman believed that scientists should constantly remind themselves of their biases. “The first principle” of being a good researcher, according to Feynman, “is that you must not fool yourself, and you are the easiest person to fool”….In the official report to Congress, Feynman and his colleagues recommended an independent oversight group be established to provide a continuing analysis of risk that was less biased than could be provided by NASA itself. The agency needed input from people who didn’t have a stake in the shuttle being safe.
Individual scientists also need that kind of input. The system of science ought to be set up in such a way that researchers subscribing to different theories can give independent interpretations of the same data set.
This would help protect the scientific community from the tendency for individuals to fool themselves into seeing support for their theory that isn’t there.
To me it’s clear: researchers should routinely examine others’ raw data. But in many fields today there is no opportunity to do so.
Scientists communicate their findings to each other via journal articles. These articles provide summaries of the data, often with a good deal of detail, but in many fields the raw numbers aren’t shared. And the summaries can be artfully arranged to conceal contradictions and maximise the apparent support for the author’s theory.
Occasionally, an article is true to the data behind it, showing warts and all. But we shouldn’t count on it. As the chemist Matthew Todd has said to me, that would be like expecting a real estate agent’s brochure for a property to show the property’s flaws. You wouldn’t buy a house without seeing it with your own eyes. It can be unwise to buy into a theory without seeing the unfiltered data.
Many scientific societies recognise this. For many years now, some of the journals they oversee have had a policy of requiring authors to provide the raw data when other researchers request it.
Unfortunately, this policy has failed spectacularly, at least in some areas of science. Studies have found that when one researcher requests the data behind an article, that article’s authors respond with the data in fewer than half of cases. This is a major deficiency in the system of science, an embarrassment really.
The well-intentioned policy of requiring that data be provided upon request has turned out to be a formula for unanswered emails, for excuses, and for delays. By contrast, a policy of posting data before anyone has to ask for it can be effective.
A few journals have implemented this, requiring that data be posted online upon publication of the article…(More)”
Advancing Open and Citizen-Centered Government
The White House: “Today, the United States released our third Open Government National Action Plan, announcing more than 40 new or expanded initiatives to advance the President’s commitment to an open and citizen-centered government….In the third Open Government National Action Plan, the Administration both broadens and deepens efforts to help government become more open and more citizen-centered. The plan includes new and impactful steps the Administration is taking to openly and collaboratively deliver government services and to support open government efforts across the country. These efforts prioritize a citizen-centric approach to government, including improved access to publicly available data to provide everyday Americans with the knowledge and tools necessary to make informed decisions.
One example is the College Scorecard, which shares data through application programming interfaces (APIs) to help students and families make informed choices about education. Open APIs help create an ecosystem around government data in which civil society can provide useful visual tools that make this data more accessible, and commercial developers can extract even more value to further empower students and their families. In addition to these newer approaches, the plan also highlights significant longstanding open government priorities such as access to information, fiscal transparency, and records management, and continues to push for greater progress in that work.
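As an illustration of how developers can build on such APIs, here is a hedged example of querying the College Scorecard API. The endpoint and field names follow the public documentation but should be checked against the current docs; YOUR_KEY is a placeholder for a free api.data.gov key.

```python
# Illustrative College Scorecard API query (verify endpoint and field
# names against the current documentation before relying on them).
import requests

resp = requests.get(
    "https://api.data.gov/ed/collegescorecard/v1/schools",
    params={
        "api_key": "YOUR_KEY",  # placeholder: obtain a key from api.data.gov
        "school.state": "NY",
        "fields": "school.name,latest.cost.tuition.in_state",
        "per_page": 5,
    },
)
for school in resp.json().get("results", []):
    print(school)
```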
The plan also focuses on supporting implementation of the landmark 2030 Agenda for Sustainable Development, which sets out a vision and priorities for global development over the next 15 years and was adopted last month by 193 world leaders including President Obama. The plan includes commitments to harness open government to make progress toward the Sustainable Development Goals (SDGs) both in the United States and globally, including in the areas of education, health, food security, climate resilience, science and innovation, and justice and law enforcement. It also includes a commitment to take stock of existing U.S. government data that relates to the 17 SDGs, and to create and use data to support progress toward the SDGs.
Some examples of open government efforts newly included in the plan:
- Promoting employment by unlocking workforce data, including training, skill, job, and wage listings.
- Enhancing transparency and participation by expanding available Federal services to the Open311 platform currently available to cities, giving the public a seamless way to report problems and request assistance (see the sketch after this list).
- Releasing public information from the electronically filed tax forms of nonprofit and charitable organizations (990 forms) as open, machine-readable data.
- Expanding access to justice through the White House Legal Aid Interagency Roundtable.
- Promoting open and accountable implementation of the Sustainable Development Goals….(More)”
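For the Open311 item above, here is a hedged sketch of reading public service requests through the Open311 GeoReport v2 interface. The base URL is a placeholder, since each participating city publishes its own endpoint, and the response fields should be checked against that city’s documentation.

```python
# Hedged Open311 GeoReport v2 sketch; the base URL is hypothetical.
import requests

BASE = "https://open311.example.gov/v2"  # placeholder: each city hosts its own

# List the service types a city exposes (potholes, graffiti, etc.).
services = requests.get(f"{BASE}/services.json").json()

# Fetch recent public service requests for the first service code.
reqs = requests.get(
    f"{BASE}/requests.json",
    params={"service_code": services[0]["service_code"]},
).json()
for r in reqs:
    print(r.get("status"), r.get("description"))
```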
How open company data was used to uncover the powerful elite benefiting from Myanmar’s multi-billion dollar jade industry
OpenCorporates: “Today, we’re pleased to release a white paper on how OpenCorporates data was used to uncover the powerful elite benefiting from Myanmar’s multi-billion dollar jade industry, in a ground-breaking report from Global Witness. This investigation is an important case study in how open company data and identifiers are a critical tool for uncovering corruption and the links between companies and the real people benefiting from them.
This white paper shows that it was critical not only that OpenCorporates held this information (much of it was removed from the official register during the investigation), but also that the data was machine-readable, available via an API (data service), and programmatically combinable with other data. These qualities were essential to discovering the hidden connections between the key actors and the jade industry. Global Witness was able to analyse this data with the help of Open Knowledge.
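To illustrate the kind of programmatic access the white paper describes, here is a hedged sketch of searching the OpenCorporates API for companies. The v0.4 endpoint, parameters, and response shape follow the public documentation and should be verified against the current docs.

```python
# Hedged OpenCorporates search sketch (verify parameters, response shape,
# and rate limits against the current API documentation).
import requests

resp = requests.get(
    "https://api.opencorporates.com/v0.4/companies/search",
    params={"q": "jade", "jurisdiction_code": "mm"},  # mm = Myanmar
)
for item in resp.json().get("results", {}).get("companies", []):
    company = item.get("company", {})
    print(company.get("name"), company.get("company_number"))
```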
In this white paper, we make recommendations about the collection and publishing of statutory company information as open data to facilitate the creation of a hostile environment for corruption by providing a rigorous framework for public scrutiny and due diligence.
How Big Data Could Open The Financial System For Millions Of People
But that’s changing as the poor start leaving data trails on the Internet and on their cell phones. Now that data can be mined for what it says about someone’s creditworthiness, likelihood of repaying, and all that hardcore stuff lenders want to know.
“Every time these individuals make a phone call, send a text, browse the Internet, engage social media networks, or top up their prepaid cards, they deepen the digital footprints they are leaving behind,” says a new report from the Omidyar Network. “These digital footprints are helping to spark a new kind of revolution in lending.”
The report, called “Big Data, Small Credit,” looks at the potential to expand credit access by analyzing mobile and smartphone usage data, utility records, Internet browsing patterns, and social media behavior….
“In the last few years, a cluster of fast-emerging and innovative firms has begun to use highly predictive technologies and algorithms to interrogate and generate insights from these footprints,” the report says.
“Though these are early days, there is enough to suggest that hundreds of millions of mass-market consumers may not have to remain ‘invisible’ to formal, unsecured credit for much longer.”…(More)
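As a toy illustration of the scoring idea (not any firm’s actual model), the sketch below fits a simple classifier on digital-footprint features such as top-up frequency; every number and feature here is invented purely for demonstration.

```python
# Toy thin-file credit-scoring illustration; all data and features are
# fabricated for demonstration and do not reflect any real lender's model.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: monthly top-ups, distinct contacts, nights phone was active
X = np.array([[4, 25, 28], [1, 5, 10], [6, 40, 30], [2, 8, 12]])
y = np.array([1, 0, 1, 0])  # 1 = repaid a small past loan

model = LogisticRegression().fit(X, y)
# Estimated repayment probability for a new applicant.
print(model.predict_proba([[3, 20, 25]])[:, 1])
```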
Can Mobile Phone Surveys Identify People’s Development Priorities?
Ben Leo and Robert Morello at the Center for Global Development: “Mobile phone surveys are fast, flexible, and cheap. But, can they be used to engage citizens on how billions of dollars in donor and government resources are spent? Over the last decade, donor governments and multilateral organizations have repeatedly committed to support local priorities and programs. Yet, how are they supposed to identify these priorities on a timely, regular basis? Consistent discussions with the local government are clearly essential, but so are feeding ordinary people’s views into those discussions. However, traditional tools, such as household surveys or consultative roundtables, present a range of challenges for high-frequency citizen engagement. That’s where mobile phone surveys could come in, enabled by the exponential rise in mobile coverage throughout the developing world.
Despite this potential, there have been only a handful of studies into whether mobile surveys are a reliable and representative tool across a broad range of developing-country contexts. Moreover, there have been almost none that specifically look at collecting information about people’s development priorities. Along with Tiago Peixoto, Steve Davenport, and Jonathan Mellon, who focus on promoting citizen engagement and open government practices at the World Bank, we sought to address this policy research gap. Through a study focused on four low-income countries (Afghanistan, Ethiopia, Mozambique, and Zimbabwe), we rigorously tested the feasibility of interactive voice response (IVR) surveys for gauging citizens’ development priorities.
Specifically, we wanted to know whether respondents’ answers are sensitive to a range of different factors, such as (i) the specified executing actor (national government or external partners); (ii) time horizons; or (iii) question formats. In other words, can we be sufficiently confident that surveys about people’s priorities can be applied more generally to a range of development actors and across a range of country contexts?
Several of these potential sensitivity concerns were raised in response to an earlier CGD working paper, which found that US foreign aid is only modestly aligned with Africans’ and Latin Americans’ most pressing concerns. This analysis relied upon Afrobarometer and Latinobarometro survey data (see explanatory note below). For instance, some argued that people’s priorities for their own government might be far less relevant for donor organizations. Put differently, the World Bank or USAID shouldn’t prioritize job creation in Nigeria simply because ordinary Nigerians cite it as a pressing government priority. Our hypothesis was that development priorities would likely transcend all development actors, and possibly different timeframes and question formats as well. But, we first needed to test these assumptions.
So, what did we find? We’ve included some of the key highlights below. For a more detailed description of the study and the underlying analysis, please see our new working paper. Along with our World Bank colleagues, we also published an accompanying paper that considers a range of survey method issues, including survey representativeness….(More)”
Using data to improve the environment
Sir Philip Dilley at the UK Environment Agency: “We live in a data rich world. As an engineer I know the power of data in the design and implementation of new urban spaces, iconic buildings and the infrastructure on which we all depend.
Data also is a powerful force in helping us to protect the environment and it can be mined from a variety of sources.
Since Victorian times, naturalists have collected data on the natural world. At the Environment Agency we continue to rely on local enthusiasts to track rainfall, and their observations feed into and support local projections of flood risk. But the advent of computing power and the Government’s move to open data means we can now all use data in a new and exciting way. The result is a more informed approach to improving the environment and protecting people.
For the last 17 years the Environment Agency has used lasers in planes to map and scan the English landscape from above to help us carry out work such as flood modelling (data now available for everyone to use). The same information has been used to track changing coastal habitats and to help us use the power of nature to adapt to a changing climate.
We’ve used our LIDAR height data together with aerial photography to inform the location and design of major coastal realignment sites. The award-winning Medmerry project, which created 183 hectares of new coastal habitat and protects 348 properties from flooding, was based on this data-led approach.
Those who live near rivers or who use them for sport and recreation know the importance of getting up to date information on river flows. We already provide online services to the public so they can see current warnings and river levels information, but opening our data means everyone can get bespoke information through one postcode or location search.
We are not the only ones seeing the power of environmental data. Data entrepreneurs know how to get accurate and easily accessible information to the public. And that means that we can all make informed choices. FloodAlerts provides a graphical representation of flood warnings and gives localised updates every 15 minutes, and the Flood Risk Finder app provides flood risk profiles for any property in England; both use data made available for public use by the Environment Agency.
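As a hedged example of what opening this data enables, the snippet below pulls current flood warnings from the Environment Agency’s public flood-monitoring API; the endpoint follows the public documentation, and the response fields should be checked against the current docs.

```python
# Hedged sketch: fetch current flood warnings from the Environment
# Agency's open flood-monitoring API (verify fields against current docs).
import requests

resp = requests.get("https://environment.data.gov.uk/flood-monitoring/id/floods")
for warning in resp.json().get("items", []):
    print(warning.get("severity"), "-", warning.get("description"))
```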
Our bathing waters data directs those who like to swim, surf or paddle with vital information on water quality. The Safer Seas Service app alerts water users when water quality is reduced at beaches and our bathing water data is also used by the Marine Conservation Society’s Good Beach Guide….(More)”
What is Citizensourcing?