Open Data Impact: When Demand and Supply Meet


Stefaan Verhulst and Andrew Young at the GovLab: “Today, in “Open Data Impact: When Demand and Supply Meet,” the GovLab and Omidyar Network release key findings about the social, economic, cultural and political impact of open data. The findings are based on 19 detailed case studies of open data projects from around the world. These case studies were prepared in order to address an important shortcoming in our understanding of when, and how, open data works. While there is no shortage of enthusiasm for open data’s potential, nor of conjectural estimates of its hypothetical impact, few rigorous, systematic analyses exist of its concrete, real-world impact…. The 19 case studies that inform this report, all of which can be found at Open Data’s Impact (odimpact.org), a website specially set up for this project, were chosen for their geographic and sectoral representativeness. They seek to go beyond the descriptive (what happened) to the explanatory (why it happened, and what is the wider relevance or impact)….

In order to achieve the potential of open data and scale the impact of the individual projects discussed in our report, we need a better – and more granular – understanding of the enabling conditions that lead to success. We found 4 central conditions (“4Ps”) that play an important role in ensuring success:

Conditions

  • Partnerships: Intermediaries and data collaboratives play an important role in ensuring success, allowing for enhanced matching of supply and demand of data.
  • Public infrastructure: Developing open data as a public infrastructure, open to all, enables wider participation, and a broader impact across issues and sectors.
  • Policies: Clear policies regarding open data, including those promoting regular assessments of open data projects, are also critical for success.
  • Problem definition: Open data initiatives that have a clear target or problem definition have more impact and are more likely to succeed than those with vaguely worded statements of intent or unclear reasons for existence. 

Core Challenges

Finally, the success of a project is also determined by the obstacles and challenges it confronts. Our research uncovered 4 major challenges (“4Rs”) confronting open data initiatives across the globe:

Challenges

  • Readiness: A lack of readiness or capacity (evident, for example, in low Internet penetration or technical literacy rates) can severely limit the impact of open data.
  • Responsiveness: Open data projects are significantly more likely to be successful when they remain agile and responsive—adapting, for instance, to user feedback or early indications of success and failure.
  • Risks: For all its potential, open data does pose certain risks, notably to privacy and security; a greater, more nuanced understanding of these risks will be necessary to address and mitigate them.
  • Resource Allocation: While open data projects can often be launched cheaply, those projects that receive generous, sustained and committed funding have a better chance of success over the medium and long term.

Toward a Next Generation Open Data Roadmap

The report we release today concludes with ten recommendations for policymakers, advocates, users, funders and other stakeholders in the open data community. For each step, we include a few concrete methods of implementation – ways to translate the broader recommendation into meaningful impact.

Together, these 10 recommendations and their means of implementation amount to what we call a “Next Generation Open Data Roadmap.” This roadmap is just a start, and we plan to continue fleshing it out in the near future. For now, it offers a way forward. It is our hope that this roadmap will help guide future research and experimentation so that we can continue to better understand how the potential of open data can be fulfilled across geographies, sectors and demographics.

Additional Resources

In conjunction with the release of our key findings paper, we also launch today an “Additional Resources” section on the Open Data’s Impact website. The goal of that section is to provide context on our case studies, and to point in the direction of other, complementary research. It includes the following elements:

  • A “repository of repositories,” including other compendiums of open data case studies and sources;
  • A compilation of some popular open data glossaries;
  • A number of open data research publications and reports, with a particular focus on impact;
  • A collection of open data definitions and a matrix of analysis to help assess those definitions….(More)

How to Crowdsource the Syrian Cease-Fire


Colum Lynch at Foreign Policy: “Can the wizards of Silicon Valley develop a set of killer apps to monitor the fragile Syria cease-fire without putting foreign boots on the ground in one of the world’s most dangerous countries?

They’re certainly going to try. The “cessation of hostilities” in Syria brokered by the United States and Russia last month has sharply reduced the levels of violence in the war-torn country and sparked a rare burst of optimism that it could lead to a broader cease-fire. But if the two sides lay down their weapons, the international community will face the challenge of monitoring the battlefield to ensure compliance without deploying peacekeepers or foreign troops. The emerging solution: using crowdsourcing, drones, satellite imaging, and other high-tech tools.

The high-level interest in finding a technological solution to the monitoring challenge was on full display last month at a closed-door meeting convened by the White House that brought together U.N. officials, diplomats, digital cartographers, and representatives of Google, DigitalGlobe, and other technology companies. Their assignment was to brainstorm ways of using high-tech tools to keep track of any future cease-fires from Syria to Libya and Yemen.

The off-the-record event came as the United States, the U.N., and other key powers struggle to find ways of enforcing cease-fires from Syria at a time when there is little political will to run the risk of sending foreign forces or monitors to such dangerous places. The United States has turned to high-tech weapons like armed drones as weapons of war; it now wants to use similar systems to help enforce peace.

Take the Syria Conflict Mapping Project, a geomapping program developed by the Atlanta-based Carter Center, a nonprofit founded by former U.S. President Jimmy Carter and his wife, Rosalynn, to resolve conflict and promote human rights. The project has developed an interactive digital map that tracks military formations by government forces, Islamist extremists, and more moderate armed rebels in virtually every disputed Syrian town. It is now updating its technology to monitor cease-fires.

The project began in January 2012 because of a single 25-year-old intern, Christopher McNaboe. McNaboe realized it was possible to track the state of the conflict by compiling disparate strands of publicly available information — including the shelling and aerial bombardment of towns and rebel positions — from YouTube, Twitter, and other social media sites. It has since developed a mapping program using software provided by Palantir Technologies, a Palo Alto-based big data company that does contract work for U.S. intelligence and defense agencies, from the CIA to the FBI….

Walter Dorn, an expert on technology in U.N. peace operations who attended the White House event, said he had promoted what he calls a “coalition of the connected.”

The U.N. or other outside powers could start by tracking social media sites, including Twitter and YouTube, for reports of possible cease-fire violations. That information could then be verified by “seeded crowdsourcing” — that is, reaching out to networks of known advocates on the ground — and technological monitoring through satellite imagery or drones.
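To make that tiered workflow concrete, here is a minimal Python sketch of how a report's credibility might be scored as corroboration accumulates. The class, field names, and weights are illustrative assumptions, not features of any system discussed at the White House event:

```python
# Hypothetical sketch of the "coalition of the connected" verification tiers:
# raw social-media reports, seeded crowdsourcing, then technical monitoring.
from dataclasses import dataclass

@dataclass
class ViolationReport:
    location: str                     # town or coordinates from the post
    source: str                       # e.g. "twitter", "youtube"
    crowd_confirmations: int = 0      # confirmations from known on-the-ground advocates
    imagery_confirmed: bool = False   # corroborated by satellite or drone imagery

def confidence(report: ViolationReport) -> float:
    """Combine the three evidence tiers into a rough confidence score (0-1)."""
    score = 0.2                                         # a lone social-media report is weak evidence
    score += 0.2 * min(report.crowd_confirmations, 3)   # seeded crowdsourcing, capped at three
    if report.imagery_confirmed:
        score += 0.2                                    # technical monitoring as the strongest tier
    return min(score, 1.0)

report = ViolationReport("Aleppo", "twitter", crowd_confirmations=2, imagery_confirmed=True)
print(f"confidence: {confidence(report):.1f}")          # -> confidence: 0.8
```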

Matthew McNabb, the founder of First Mile Geo, a start-up which develops geolocation technology that can be used to gather data in conflict zones, has another idea. McNabb, who also attended the White House event, believes “on-demand” technologies like SurveyMonkey, which provides users a form to create their own surveys, can be applied in conflict zones to collect data on cease-fire violations….(More)

Technology and politics: The signal and the noise


Special Issue of The Economist: “…The way these candidates are fighting their campaigns, each in his own way, is proof that politics as usual is no longer an option. The internet and the availability of huge piles of data on everyone and everything are transforming the democratic process, just as they are upending many industries. They are becoming a force in all kinds of things, from running election campaigns and organising protest movements to improving public policy and the delivery of services. This special report will argue that, as a result, the relationship between citizens and those who govern them is changing fundamentally.

Incongruous though it may seem, the forces that are now powering the campaign of Mr Trump—as well as that of Bernie Sanders, the surprise candidate on the Democratic side (Hillary Clinton is less of a success online)—were first seen in full cry during the Arab spring in 2011. The revolution in Egypt and other Arab countries was not instigated by Twitter, Facebook and other social-media services, but they certainly helped it gain momentum. “The internet is an intensifier,” says Marc Lynch of George Washington University, a noted scholar of the protest movements in the region…..

However, this special report will argue that, in the longer term, online crusading and organising will turn out to matter less to politics in the digital age than harnessing those ever-growing piles of data. The internet and related technologies, such as smart phones and cloud computing, make it cheap and easy not only to communicate but also to collect, store and analyse immense quantities of information. This is becoming ever more important in influencing political outcomes.

America’s elections are a case in point. Mr Cruz with his data savvy is merely following in the footsteps of Barack Obama, who won his first presidential term with the clever application of digital know-how. Campaigners are hoovering up more and more digital information about every voting-age citizen and stashing it away in enormous databases. With the aid of complex algorithms, these data allow campaigners to decide, say, who needs to be reminded to make the trip to the polling station and who may be persuaded to vote for a particular candidate.
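As a rough illustration of the kind of modeling the report describes, the toy sketch below trains a turnout classifier on invented voter records. The features, data, and threshold are assumptions for demonstration only, not anything drawn from an actual campaign database:

```python
# Toy turnout model: predict who needs a reminder to get to the polling station.
from sklearn.linear_model import LogisticRegression

# invented features per voter: [age, voted_last_election (0/1), donation_count]
X = [[22, 0, 0], [35, 1, 0], [60, 1, 2], [45, 0, 1], [70, 1, 3], [28, 0, 0]]
y = [0, 1, 1, 0, 1, 0]  # 1 = voted in the most recent election

model = LogisticRegression().fit(X, y)

# A campaign might flag supporters whose predicted turnout falls below a threshold.
turnout_prob = model.predict_proba([[30, 0, 1]])[0][1]
print(f"predicted turnout probability: {turnout_prob:.2f}")
```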

No hiding place

In the case of protest movements, the waves of collective action leave a big digital footprint. Using ever more sophisticated algorithms, governments can mine these data. That is changing the balance of power. In the event of another Arab spring, autocrats would not be caught off guard again because they are now able to monitor protests and intervene when they consider it necessary. They can also identify and neutralise the most influential activists. Governments that were digitally blind when the internet first took off in the mid-1990s now have both a telescope and a microscope.

But data are not just changing campaigns and political movements; they affect how policy is made and public services are offered. This is most visible at local-government level. Cities have begun to use them for everything from smoothing traffic flows to identifying fire hazards. Having all this information at their fingertips is bound to change the way these bureaucracies work, and how they interact with citizens. This will not only make cities more efficient, but provide them with data and tools that could help them involve their citizens more.

This report will look at electoral campaigns, protest movements and local government in turn. Readers will note that most of the examples quoted are American and that most of the people quoted are academics. That is because the study of the interrelationship between data and politics is relatively new and most developed in America. But it is beginning to spill out from the ivory towers, and is gradually spreading to other countries.

The growing role of technology in politics raises many questions. How much of a difference, for instance, do digitally enabled protest surges really make? Many seem to emerge from nowhere, then crash almost as suddenly, defeated by hard political realities and entrenched institutions. The Arab spring uprising in Egypt is one example. Once the incumbent president, Hosni Mubarak, was toppled, the coalition that brought him down fell apart, leaving the stage to the old powers, first the Muslim Brotherhood and then the armed forces.

In party politics, some worry that the digital targeting of voters might end up reducing the democratic process to a marketing exercise. Ever more data and better algorithms, they fret, could lead politicians to ignore those unlikely to vote for them. And in cities it is not clear that more data will ensure that citizens become more engaged….(More)


Crowdsourced Health


Book by Elad Yom-Tov: “Most of us have gone online to search for information about health. What are the symptoms of a migraine? How effective is this drug? Where can I find more resources for cancer patients? Could I have an STD? Am I fat? A Pew survey reports more than 80 percent of American Internet users have logged on to ask questions like these. But what if the digital traces left by our searches could show doctors and medical researchers something new and interesting? What if the data generated by our searches could reveal information about health that would be difficult to gather in other ways? In this book, Elad Yom-Tov argues that Internet data could change the way medical research is done, supplementing traditional tools to provide insights not otherwise available. He describes how studies of Internet searches have, among other things, already helped researchers track the side effects of prescription drugs, understand the information needs of cancer patients and their families, and recognize some of the causes of anorexia.
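As a rough sketch of the search-log method the book describes, the snippet below counts how often queries mentioning a fictional drug also mention candidate symptom terms. The query log and word lists are invented; real studies work with de-identified logs at vastly larger scale:

```python
# Count symptom terms in queries that mention a drug vs. queries that do not;
# symptoms that co-occur disproportionately with the drug are candidate
# side-effect signals worth clinical follow-up.
from collections import Counter

queries = [
    "drugx headache",
    "drugx dizziness morning",
    "headache remedies",
    "drugx side effects dizziness",
    "best running shoes",
]
drug, symptoms = "drugx", {"headache", "dizziness", "nausea"}

with_drug, without_drug = Counter(), Counter()
for q in queries:
    terms = set(q.lower().split())
    bucket = with_drug if drug in terms else without_drug
    bucket.update(terms & symptoms)

print(with_drug)     # e.g. Counter({'dizziness': 2, 'headache': 1})
print(without_drug)  # e.g. Counter({'headache': 1})
```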

Yom-Tov shows that the information collected can benefit humanity without sacrificing individual privacy. He explains why people go to the Internet with health questions; for one thing, it seems to be a safe place to ask anonymously about such matters as obesity, sex, and pregnancy. He describes the detrimental effects of “pro-anorexia” online content; tells how computer scientists can scour search engine data to improve public health by, for example, identifying risk factors for disease and centers of contagion; and tells how analyses of how people deal with upsetting diagnoses help doctors to treat patients and patients to understand their conditions….(More)

The Social Intranet: Insights on Managing and Sharing Knowledge Internally


Paper by Ines Mergel for IBM Center for the Business of Government: “While much of the federal government lags behind, some agencies are pioneers in the internal use of social media tools.  What lessons and effective practices do they have to offer other agencies?

“Social intranets,” Dr. Mergel writes, “are in-house social networks that use technologies – such as automated newsfeeds, wikis, chats, or blogs – to create engagement opportunities among employees.”  They also include the use of internal profile pages that help people identify expertise and interest (similar to Facebook or LinkedIn profiles), and that are used in combination with other social intranet tools such as online communities or newsfeeds.

The report documents four case studies of government use of social intranets – two federal government agencies (the Department of State and the National Aeronautics and Space Administration) and two cross-agency networks (the U.S. Intelligence Community and the Government of Canada).

The author observes: “Most enterprise social networking platforms fail,” but identifies what causes these failures and how successful social intranet initiatives can avoid that fate and thrive.  She offers a series of insights for successfully implementing social intranets in the public sector, based on her observations and case studies. …(More)”

Access to Government Information in the United States: A Primer


Wendy Ginsberg and Michael Greene at Congressional Research Service: “No provision in the U.S. Constitution expressly establishes a procedure for public access to executive branch records or meetings. Congress, however, has legislated various public access laws. Among these laws are two records access statutes,

  • the Freedom of Information Act (FOIA; 5 U.S.C. §552), and
  • the Privacy Act (5 U.S.C. §552a),

and two meetings access statutes,

  • the Federal Advisory Committee Act (FACA; 5 U.S.C. App.), and
  • the Government in the Sunshine Act (5 U.S.C. §552b).

These four laws provide the foundation for access to executive branch information in the American federal government. The records-access statutes provide the public with a variety of methods to examine how executive branch departments and agencies execute their missions. The meeting-access statutes provide the public the opportunity to participate in and inform the policy process. These four laws are also among the most used and most litigated federal access laws.

While the four statutes provide the public with access to executive branch federal records and meetings, they do not apply to the legislative or judicial branches of the U.S. government. The American separation of powers model of government provides a collection of formal and informal methods that the branches can use to provide information to one another. Moreover, the separation of powers anticipates conflicts over the accessibility of information. These conflicts are neither unexpected nor necessarily destructive. Although there is considerable interbranch cooperation in the sharing of information and records, such conflicts over access may continue on occasion.

This report offers an introduction to the four access laws and provides citations to additional resources related to these statutes. This report includes statistics on the use of FOIA and FACA and on litigation related to FOIA. The 114th Congress may have an interest in overseeing the implementation of these laws or may consider amending the laws. In addition, this report provides some examples of the methods Congress, the President, and the courts have employed to provide or require the provision of information to one another. This report is a primer on information access in the U.S. federal government and provides a list of resources related to transparency, secrecy, access, and nondisclosure….(More)”

It’s not big data that discriminates – it’s the people that use it


In The Conversation: “Data can’t be racist or sexist, but the way it is used can help reinforce discrimination. The internet means more data is collected about us than ever before and it is used to make automatic decisions that can hugely affect our lives, from our credit scores to our employment opportunities.

If that data reflects unfair social biases against sensitive attributes, such as our race or gender, the conclusions drawn from that data might also be based on those biases.

But this era of “big data” doesn’t need to entrench inequality in this way. If we build smarter algorithms to analyse our information and ensure we’re aware of how discrimination and injustice may be at work, we can actually use big data to counter our human prejudices.

This kind of problem can arise when computer models are used to make predictions in areas such as insurance, financial loans and policing. If members of a certain racial group have historically been more likely to default on their loans, or been more likely to be convicted of a crime, then the model can deem these people more risky. That doesn’t necessarily mean that these people actually engage in more criminal behaviour or are worse at managing their money. They may just be disproportionately targeted by police and sub-prime mortgage salesmen.

Excluding sensitive attributes

Data scientist Cathy O’Neil has written about her experience of developing models for homeless services in New York City. The models were used to predict how long homeless clients would be in the system and to match them with appropriate services. She argues that including race in the analysis would have been unethical.

If the data showed white clients were more likely to find a job than black ones, the argument goes, then staff might focus their limited resources on those white clients that would more likely have a positive outcome. While sociological research has unveiled the ways that racial disparities in homelessness and unemployment are the result of unjust discrimination, algorithms can’t tell the difference between just and unjust patterns. And so datasets should exclude characteristics that may be used to reinforce the bias, such as race.

But this simple response isn’t necessarily the answer. For one thing, machine learning algorithms can often infer sensitive attributes from a combination of other, non-sensitive facts. People of a particular race may be more likely to live in a certain area, for example. So excluding those attributes may not be enough to remove the bias….
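A toy example makes the proxy problem concrete. In the synthetic data below, a classifier is never shown the sensitive attribute, yet recovers it almost perfectly from a correlated neighborhood code, which is why simply dropping the attribute can leave the bias intact:

```python
# Synthetic demonstration: a non-sensitive feature acts as a proxy
# for the excluded sensitive attribute.
from sklearn.tree import DecisionTreeClassifier

# invented rows: [neighborhood_code, income_band]; labels: sensitive group (0/1)
X = [[1, 2], [1, 3], [1, 1], [2, 2], [2, 3], [2, 1]]
sensitive = [0, 0, 0, 1, 1, 1]  # group membership tracks neighborhood exactly here

proxy_model = DecisionTreeClassifier().fit(X, sensitive)
print(proxy_model.predict([[1, 2], [2, 1]]))  # -> [0 1]: group inferred from proxies alone
```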

An enlightened service provider might, upon seeing the results of the analysis, investigate whether and how racism is a barrier to their black clients getting hired. Equipped with this knowledge they could begin to do something about it. For instance, they could ensure that local employers’ hiring practices are fair and provide additional help to those applicants more likely to face discrimination. The moral responsibility lies with those responsible for interpreting and acting on the model, not the model itself.

So the argument that sensitive attributes should be stripped from the datasets we use to train predictive models is too simple. Of course, collecting sensitive data should be carefully regulated because it can easily be misused. But misuse is not inevitable, and in some cases, collecting sensitive attributes could prove absolutely essential in uncovering, predicting, and correcting unjust discrimination. For example, in the case of homeless services discussed above, the city would need to collect data on ethnicity in order to discover potential biases in employment practices….(More)

A new data viz tool shows what stories are being undercovered in countries around the world


Joseph Lichterman at NiemanLab: “It’s a common lament: Though the Internet provides us access to a nearly unlimited number of sources for news, most of us rarely venture beyond the same few sources or topics. And as news consumption shifts to our phones, people are using even fewer sources: On average, consumers access 1.52 trusted news sources on their phones, according to the 2015 Reuters Digital News Report, which studied news consumption across several countries.

To try and diversify people’s perspectives on the news, Jigsaw — the tech incubator, formerly known as Google Ideas, that’s run by Google’s parent company Alphabet — this week launched Unfiltered.News, an experimental site that uses Google News data to show users what topics are being underreported or are popular in regions around the world.


Unfiltered.News’ main data visualization shows which topics are most reported in countries around the world. A column on the right side of the page highlights stories that are being reported widely elsewhere in the world, but aren’t in the top 100 stories on Google News in the selected country. In the United States yesterday, five of the top 10 underreported topics, unsurprisingly, dealt with soccer. In China, Barack Obama was the most undercovered topic….(More)”
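The underlying comparison seems simple enough to sketch: take the topics ranked highly worldwide and flag those absent from a country's own top stories. The ranking data below is invented for illustration; Unfiltered.News presumably works from live Google News rankings:

```python
# Flag topics prominent globally but missing from one country's top stories.
global_rank = ["refugees", "elections", "soccer", "obama", "climate"]
us_top_stories = {"elections", "obama", "oscars"}  # stand-in for a country's top 100

underreported = [topic for topic in global_rank if topic not in us_top_stories]
print(underreported)  # -> ['refugees', 'soccer', 'climate']
```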

Can Big Data Help Measure Inflation?


Bourree Lam in The Atlantic: “…As more and more people are shopping online, calculating this index has gotten more difficult, because there haven’t been any great ways of recording prices from the sites of disparate retailers. Data shared by retailers and compiled by the technology firm Adobe might help close this gap. The company is perhaps known best for its visual software, including Photoshop, but the company has also become a provider of software and analytics for online retailers. Adobe is now aggregating the sales data that flows through their software for its Digital Price Index (DPI) project, an initiative that’s meant to answer some of the questions that have been dogging researchers now that e-commerce is such a big part of the economy.

The project, which tracks billions of online transactions and the prices of over a million products, was developed with the help of the economists Austan Goolsbee, the former chairman of Obama’s Council of Economic Advisors and a professor at the University of Chicago’s Booth School of Business, and Peter Klenow, a professor at Stanford University. “We’ve been excited to help them set up various measures of the digital economy, and of prices, and also to see what the Adobe data can teach us about some of the questions that everybody’s had about the CPI,” says Goolsbee. “People are asking questions like ‘How price sensitive is online commerce?’ ‘How much is it growing?’ ‘How substitutable is it for non-electronic commerce?’ A lot of issues you can address with this in a way that we haven’t really been able to do before.” These are some questions that the DPI has the potential to answer.
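One standard way to turn matched online prices into an index is the Jevons formula, the geometric mean of price relatives, widely used for elementary price indexes. The sketch below uses invented prices; Adobe's actual DPI methodology is not detailed in this excerpt and presumably handles complications such as sales weighting and product churn:

```python
# Jevons elementary price index: geometric mean of matched products'
# price relatives between a base period and the current period.
from math import prod

base_prices    = {"laptop": 900.0, "toaster": 40.0, "headphones": 120.0}
current_prices = {"laptop": 870.0, "toaster": 42.0, "headphones": 120.0}

relatives = [current_prices[p] / base_prices[p] for p in base_prices]
jevons = prod(relatives) ** (1 / len(relatives))
print(f"price level vs base period: {jevons:.4f}")  # -> 1.0050 (slight rise)
```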

…While this new trove of data will certainly be helpful to economists and analysts looking at inflation, it surely won’t replace the CPI. Currently, the government sends out hundreds of BLS employees to stores around the country to collect price data. Online pricing is a small part of the BLS calculation, which is incorporated into its methodology as people increasingly report shopping from retailers online, but there’s a significant time lag. While it’s unlikely that the BLS would incorporate private sources of data into its inflation calculations, as e-commerce grows they might look to improve the way they include online prices. Still, economists are optimistic about the potential of Adobe’s DPI. “I don’t think we know the digital economy as well as we should,” says Klenow, “and this data can help us eventually nail that better.”…(More)

Crowdlaw and open data policy: A perfect match?


At Sunlight: “The open government community has long envisioned a future where all public policy is collaboratively drafted online and in the open — a future in which we (the people) don’t just have a say in who writes and votes on the rules that govern our society, but are empowered in a substantive way to participate, annotating or even crafting those rules ourselves. If that future seems far away, it’s because we’ve seen few successful instances of this approach in the United States. But an increasing number of open and collaborative online approaches to drafting legislation — a set of practices the NYU GovLab and others have called “crowdlaw” — seem to have found their niche in open data policy.

This trend has taken hold at the local level, where multiple cities have employed crowdlaw techniques to draft or revise the regulations which establish and govern open data initiatives. But what explains this trend and the apparent connection between crowdlaw and the proactive release of government information online? Is it simply that both are “open government” practices? Or is there something more fundamental at play here?…

Since 2012, several high-profile U.S. cities have utilized collaborative tools such as Google Docs, GitHub, and Madison to open up the process of open data policymaking. The below chronology of notable instances of open data policy drafted using crowdlaw techniques gives the distinct impression of a good idea spreading in American cities:….

While many cities may not be ready to take their hands off of the wheel and trust the public to help engage in meaningful decisions about public policy, it’s encouraging to see some giving it a try when it comes to open data policy. Even for cities still feeling skeptical, this approach can be applied internally; it allows other departments impacted by changes that come about through an open data policy to weigh in, too. Cities can open up varying degrees of the process, retaining as much autonomy as they feel comfortable with. In the end, utilizing the crowdlaw process with open data legislation can increase its effectiveness and accountability by engaging the public directly — a win-win for governments and their citizens alike….(More)”