Findings of the Big Data and Privacy Working Group Review

John Podesta at the White House Blog: “Over the past several days, severe storms have battered Arkansas, Oklahoma, Mississippi and other states. Dozens of people have been killed and entire neighborhoods turned to rubble and debris as tornadoes have touched down across the region. Natural disasters like these present a host of challenges for first responders. How many people are affected, injured, or dead? Where can they find food, shelter, and medical attention? What critical infrastructure might have been damaged?
Drawing on open government data sources, including Census demographics and NOAA weather data, along with their own demographic databases, Esri, a geospatial technology company, has created a real-time map showing where the twisters have been spotted and how the storm systems are moving. They have also used these data to show how many people live in the affected area, and summarize potential impacts from the storms. It’s a powerful tool for emergency services and communities. And it’s driven by big data technology.
In January, President Obama asked me to lead a wide-ranging review of “big data” and privacy—to explore how these technologies are changing our economy, our government, and our society, and to consider their implications for our personal privacy. Together with Secretary of Commerce Penny Pritzker, Secretary of Energy Ernest Moniz, the President’s Science Advisor John Holdren, the President’s Economic Advisor Jeff Zients, and other senior officials, our review sought to understand what is genuinely new and different about big data and to consider how best to encourage the potential of these technologies while minimizing risks to privacy and core American values.
Over the course of 90 days, we met with academic researchers and privacy advocates, with regulators and the technology industry, with advertisers and civil rights groups. The President’s Council of Advisors on Science and Technology conducted a parallel study of the technological trends underpinning big data. The White House Office of Science and Technology Policy jointly organized three university conferences at MIT, NYU, and U.C. Berkeley. We issued a formal Request for Information seeking public comment, and hosted a survey to generate even more public input.
Today, we presented our findings to the President. We knew better than to try to answer every question about big data in three months. But we are able to draw important conclusions and make concrete recommendations for Administration attention and policy development in a few key areas.
There are a few technological trends that bear drawing out. The declining cost of collection, storage, and processing of data, combined with new sources of data like sensors, cameras, and geospatial technologies, means that we live in a world of near-ubiquitous data collection. All this data is being crunched at a speed that is increasingly approaching real-time, meaning that big data algorithms could soon have immediate effects on decisions being made about our lives.
The big data revolution presents incredible opportunities in virtually every sector of the economy and every corner of society.
Big data is saving lives. Infections are dangerous—even deadly—for many babies born prematurely. By collecting and analyzing millions of data points from a NICU, one study was able to identify factors, like slight increases in body temperature and heart rate, that serve as early warning signs an infection may be taking root—subtle changes that even the most experienced doctors wouldn’t have noticed on their own.
Big data is making the economy work better. Jet engines and delivery trucks now come outfitted with sensors that continuously monitor hundreds of data points and send automatic alerts when maintenance is needed. Utility companies are starting to use big data to predict periods of peak electric demand, adjusting the grid to be more efficient and potentially averting brown-outs.
Big data is making government work better and saving taxpayer dollars. The Centers for Medicare and Medicaid Services have begun using predictive analytics—a big data technique—to flag likely instances of reimbursement fraud before claims are paid. The Fraud Prevention System helps identify the highest-risk health care providers for waste, fraud, and abuse in real time and has already stopped, prevented, or identified $115 million in fraudulent payments.
But big data raises serious questions, too, about how we protect our privacy and other values in a world where data collection is increasingly ubiquitous and where analysis is conducted at speeds approaching real time. In particular, our review raised the question of whether the “notice and consent” framework, in which a user grants permission for a service to collect and use information about them, still allows us to meaningfully control our privacy as data about us is increasingly used and reused in ways that could not have been anticipated when it was collected.
Big data raises other concerns, as well. One significant finding of our review was the potential for big data analytics to lead to discriminatory outcomes and to circumvent longstanding civil rights protections in housing, employment, credit, and the consumer marketplace.
No matter how quickly technology advances, it remains within our power to ensure that we both encourage innovation and protect our values through law, policy, and the practices we encourage in the public and private sector. To that end, we make six actionable policy recommendations in our report to the President:
Advance the Consumer Privacy Bill of Rights. Consumers deserve clear, understandable, reasonable standards for how their personal information is used in the big data era. We recommend the Department of Commerce take appropriate consultative steps to seek stakeholder and public comment on what changes, if any, are needed to the Consumer Privacy Bill of Rights, first proposed by the President in 2012, and to prepare draft legislative text for consideration by stakeholders and submission by the President to Congress.
Pass National Data Breach Legislation. Big data technologies make it possible to store significantly more data, and further derive intimate insights into a person’s character, habits, preferences, and activities. That makes the potential impacts of data breaches at businesses or other organizations even more serious. A patchwork of state laws currently governs requirements for reporting data breaches. Congress should pass legislation that provides for a single national data breach standard, along the lines of the Administration’s 2011 Cybersecurity legislative proposal.
Extend Privacy Protections to non-U.S. Persons. Privacy is a worldwide value that should be reflected in how the federal government handles personally identifiable information about non-U.S. citizens. The Office of Management and Budget should work with departments and agencies to apply the Privacy Act of 1974 to non-U.S. persons where practicable, or to establish alternative privacy policies that apply appropriate and meaningful protections to personal information regardless of a person’s nationality.
Ensure Data Collected on Students in School is used for Educational Purposes. Big data and other technological innovations, including new online course platforms that provide students real time feedback, promise to transform education by personalizing learning. At the same time, the federal government must ensure educational data linked to individual students gathered in school is used for educational purposes, and protect students against their data being shared or used inappropriately.
Expand Technical Expertise to Stop Discrimination. The detailed personal profiles held about many consumers, combined with automated, algorithm-driven decision-making, could lead—intentionally or inadvertently—to discriminatory outcomes, or what some are already calling “digital redlining.” The federal government’s lead civil rights and consumer protection agencies should expand their technical expertise to be able to identify practices and outcomes facilitated by big data analytics that have a discriminatory impact on protected classes, and develop a plan for investigating and resolving violations of law.
Amend the Electronic Communications Privacy Act. The laws that govern protections afforded to our communications were written before email, the internet, and cloud computing came into wide use. Congress should amend ECPA to ensure the standard of protection for online, digital content is consistent with that afforded in the physical world—including by removing archaic distinctions between email left unread or over a certain age.
We also identify several broader areas ripe for further study, debate, and public engagement that, collectively, we hope will spark a national conversation about how to harness big data for the public good. We conclude that we must find a way to preserve our privacy values in both the domestic and international marketplace. We urgently need to build capacity in the federal government to identify and prevent new modes of discrimination that could be enabled by big data. We must ensure that law enforcement agencies using big data technologies do so responsibly, and that our fundamental privacy rights remain protected. Finally, we recognize that data is a valuable public resource, and call for continuing the Administration’s efforts to open more government data sources and make investments in research and technology.
While big data presents new challenges, it also presents immense opportunities to improve lives. The United States is perhaps better suited to lead this conversation than any other nation on earth. Our innovative spirit, technological know-how, and deep commitment to values of privacy, fairness, non-discrimination, and self-determination will help us harness the benefits of the big data revolution and encourage the free flow of information while working with our international partners to protect personal privacy. This review is but one piece of that effort, and we hope it spurs a conversation about big data across the country and around the world.
Read the Big Data Report.
See the fact sheet from today’s announcement.

Mapping the Intersection Between Social Media and Open Spaces in California

Stamen Design: “Last month, Stamen launched, a project we created in partnership with the Electric Roadrunner Lab, with the goal of revealing the diversity of social media activity that happens inside parks and other open spaces in California. If you haven’t already looked at the site, please go visit it now! Find your favorite park, or the parks that are nearest to you, or just stroll between random parks using the wander button. For more background about the goals of the project, read Eric’s blog post: A Conversation About California Parks.
In this post I’d like to describe some of the algorithms we use to collect the social media data that feeds the park pages. Currently we collect data from four social media platforms: Twitter, Foursquare, Flickr, and Instagram. We chose these because they all have public APIs (Application Programming Interfaces) that are easy to work with, and we expect they will provide a view into the different facets of each park, and the diverse communities who enjoy these parks. Each social media service creates its own unique geographies, and its own way of representing these parks. For example, the kinds of photos you upload to Instagram might be different from the photos you upload to Flickr. The way you describe experiences using Twitter might be different from the moments you document by checking into Foursquare. In the future we may add more feeds, but for now there’s a lot we can learn from these four.
In the course of collecting data from these social network services, I also found that each service’s public API imposes certain constraints on our queries, producing its own intricate patterns. Thus, the quirks of how each API was written result in distinct and fascinating geometries. Also, since we are only interested in parks for this project, the process of culling non-park-related content further produces unusual and interesting patterns. Rural areas have large parks that cover huge areas, while cities have lots of (relatively) tiny parks, which creates its own challenges for how we query the APIs.
Broadly, we followed a similar approach for all the social media services. First, we grab the geocoded data from the APIs. This ignores any media that don’t have a latitude and longitude associated with them. In Foursquare, almost all checkins have a latitude and longitude, and for Flickr and Instagram most photos have a location associated with them. However, for Twitter, only around 1% of all tweets have geographic coordinates. But as we will see, even 1% still results in a whole lot of tweets!
After grabbing the social media data, we intersect it with the outlines of parks and open spaces in California, using polygons from the California Protected Areas Database maintained by GreenInfo Network. Everything that doesn’t intersect one of these parks, we throw away. The following maps represent the data as it looks before the filtering process.
But enough talking, let’s look at some maps!”
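
The filtering step described above (drop items without coordinates, then keep only those that fall inside a park outline) can be sketched in a few lines of Python. This is a minimal illustration, not Stamen's actual code: the function names, the `lon`/`lat` field names, and the rectangular "Example Park" are all hypothetical, and a real pipeline would more likely use a geospatial library and a spatial index than this plain ray-casting test.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a ray going right crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_to_parks(items: List[dict], parks: Dict[str, Sequence[Point]]) -> List[dict]:
    """Keep geocoded items that fall inside some park; tag each with the park name."""
    kept = []
    for item in items:
        lon, lat = item.get("lon"), item.get("lat")
        if lon is None or lat is None:
            continue  # not geocoded, so it is ignored
        for name, polygon in parks.items():
            if point_in_polygon((lon, lat), polygon):
                kept.append({**item, "park": name})
                break
    return kept


# Hypothetical data for illustration only.
parks = {"Example Park": [(-122.5, 37.7), (-122.4, 37.7), (-122.4, 37.8), (-122.5, 37.8)]}
items = [
    {"id": 1, "lon": -122.45, "lat": 37.75},  # inside the park
    {"id": 2, "lon": -121.00, "lat": 37.75},  # geocoded, but outside
    {"id": 3},                                # no coordinates
]
print(filter_to_parks(items, parks))
```

Only the first item survives the filter. Checking every item against every polygon is fine for a sketch but would be slow at the scale of a statewide park database.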

The Right Colors Make Data Easier To Read

Sharon Lin and Jeffrey Heer at HBR Blog: “What is the color of money? Of love? Of the ocean? In the United States, most people respond that money is green, love is red and the ocean is blue. Many concepts evoke related colors, whether due to physical appearance, common metaphors, or cultural conventions. When colors are paired with the concepts that evoke them, we call these ‘semantically resonant color choices.’
Artists and designers regularly use semantically resonant colors in their work. And in the research we conducted with Julie Fortuna, Chinmay Kulkarni, and Maureen Stone, we found they can be remarkably important to data visualization.
Consider these charts of (fictional) fruit sales:
The only difference between the charts is the color assignment. The left-hand chart uses colors from a default palette. The right-hand chart has been assigned semantically resonant colors. (In this case, the assignment was computed automatically using an algorithm that analyzes the colors in relevant images retrieved from Google Image Search using queries for each data category name.)
Now, try answering some questions about the data in each of these charts. Which fruit had higher sales: blueberries or tangerines? How about peaches versus apples? Which chart do you find easier to read?…
To make effective visualization color choices, you need to take a number of factors into consideration. To name just two: All the colors need to be suitably different from one another, for instance, so that readers can tell them apart – what’s called “discriminability.” You also need to consider what the colors look like to the color blind — roughly 8% of the U.S. male population! Could the colors be distinguished from one another if they were reprinted in black and white?
One easy way to assign semantically resonant colors is to use colors from an existing color palette that has been carefully designed for visualization applications (ColorBrewer offers some options) but assign the colors to data values in a way that best matches concept color associations. This is the basis of our own algorithm, which acquires images for each concept and then analyzes them to learn concept color associations. However, keep in mind that color associations may vary across cultures. For example, in the United States and many western cultures, luck is often associated with green (four-leaf clovers), while red can be considered a color of danger. However, in China, luck is traditionally symbolized with the color red.

Semantically resonant colors can reinforce perception of a wide range of data categories. We believe similar gains would likely be seen for other forms of visualizations like maps, scatterplots, and line charts. So when designing visualizations for presentation or analysis, consider color choice and ask yourself how well the colors resonate with the underlying data.”
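
The "easy way" described above, taking a fixed palette and assigning its colors to categories so they best match each concept's associated color, can be sketched as a small assignment problem. This is a simplified illustration under stated assumptions, not the authors' algorithm: the typical color for each concept is supplied by hand here rather than learned from images, squared RGB distance stands in for a perceptual color difference, and the function names and color values are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


def color_distance(a: RGB, b: RGB) -> int:
    """Squared Euclidean distance in RGB (a crude stand-in for perceptual distance)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def assign_colors(concept_colors: Dict[str, RGB], palette: List[RGB]) -> Dict[str, RGB]:
    """Try every assignment of palette colors to concepts; keep the closest overall."""
    concepts = list(concept_colors)
    if len(palette) < len(concepts):
        raise ValueError("palette needs at least one color per concept")
    best, best_cost = None, float("inf")
    for perm in permutations(palette, len(concepts)):
        cost = sum(color_distance(concept_colors[c], col) for c, col in zip(concepts, perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return dict(zip(concepts, best))


# Assumed "typical" colors per concept (hand-picked for illustration).
concept_colors = {
    "blueberry": (70, 90, 200),
    "tangerine": (240, 140, 30),
    "apple": (200, 30, 40),
}
# An example three-color palette: a red, a blue, and an orange.
palette = [(228, 26, 28), (55, 126, 184), (255, 127, 0)]

print(assign_colors(concept_colors, palette))
```

With these inputs, blueberry gets the blue, tangerine the orange, and apple the red. Brute-force search over permutations is only practical for a handful of categories; larger problems would call for a proper assignment solver.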

The Open Data 500: Putting Research Into Action

TheGovLab Blog: “On April 8, the GovLab made two significant announcements. At an open data event in Washington, DC, I was pleased to announce the official launch of the Open Data 500, our study of 500 companies that use open government data as a key business resource. We also announced that the GovLab is now planning a series of Open Data Roundtables to bring together government agencies with the businesses that use their data – and that five federal agencies have agreed to participate. Video of the event, which was hosted by the Center for Data Innovation, is available here.
The Open Data 500, funded by the John S. and James L. Knight Foundation, is the first comprehensive study of U.S.-based companies that rely on open government data. Our website includes searchable, sortable information on 500 of these companies. Our data about them comes from responses to a survey we’ve sent to all the companies (190 have responded) and what we’ve been able to learn from research using public information. Anyone can now explore this website, read about specific companies or groups of companies, or download our data to analyze it. The website features an interactive tool on the home page, the Open Data Compass, that shows the connections between government agencies and different categories of companies visually.
We began work on the Open Data 500 study last fall with three goals. First, we wanted to collect information that will ultimately help calculate the economic value of open data – an important question for policymakers and others. Second, we wanted to present examples of open data companies to inspire others to use this important government resource in new ways. And third – and perhaps most important – we’ve hoped that our work will be a first step in creating a dialogue between the government agencies that provide open data and the companies that use it.
That dialogue is critically important to make government open data more accessible and useful. While open government data is a huge potential resource, and federal agencies are working to make it more available, it’s too often trapped in legacy systems that make the data difficult to find and to use. To solve this problem, we plan to connect agencies to their clients in the business community and help them work together to find and liberate the most valuable datasets.
We now plan to convene and facilitate a series of Open Data Roundtables – a new approach to bringing businesses and government agencies together. In these Roundtables, which will be informed by the Open Data 500 study, companies and the agencies that provide their data will come together in structured, results-oriented meetings that we will facilitate. We hope to help figure out what can be done to make the most valuable datasets more available and usable quickly.
We’ve been gratified by the immediate positive response to our plan from several federal agencies. The Department of Commerce has committed to help plan and participate in the first of our Roundtables, now being scheduled for May. By the time we announced our launch on April 8, the Departments of Labor, Transportation, and Treasury had also signed up. And at the end of the launch event, the Deputy Chief Information Officer of the USDA publicly committed her agency to participate as well…”

“Government Entrepreneur” is Not an Oxymoron

Mitchell Weiss in Harvard Business Review Blog: “Entrepreneurship almost always involves pushing against the status quo to capture opportunities and create value. So it shouldn’t be surprising when a new business model, such as ridesharing, disrupts existing systems and causes friction between entrepreneurs and local government officials, right?
But imagine if the road that led to the Seattle City Council ridesharing hearings this month — with rulings that sharply curtail UberX, Lyft, and Sidecar’s operations there — had been a vastly different one.  Imagine that public leaders had conceived and built a platform to provide this new, shared model of transit.  Or at the very least, that instead of having a revolution of the current transit regime done to Seattle public leaders, it was done with them.  Amidst the acrimony, it seems hard to imagine that public leaders could envision and operate such a platform, or that private innovators could work with them more collaboratively on it — but it’s not impossible. What would it take? Answer: more public entrepreneurs.
The idea of “public entrepreneurship” may sound to you like it belongs on a list of oxymorons right alongside “government intelligence.” But it doesn’t. Public entrepreneurs around the world are improving our lives, inventing entirely new ways to serve the public. They are using sensors to detect potholes; using word pedometers to help students learn; harnessing behavioral economics to encourage organ donation; crowdsourcing patent review; and transforming Medellin, Colombia with cable cars. They are coding in civic hackathons and competing in the Bloomberg challenge. They are partnering with an Office of New Urban Mechanics in Boston or in Philadelphia, co-developing products in San Francisco’s Entrepreneurship-in-Residence program, or deploying some of the more than $430 million invested into civic-tech in the last two years.
There is, however, a big problem with public entrepreneurs: there just aren’t enough of them.  Without more public entrepreneurship, it’s hard to imagine meeting our public challenges or making the most of private innovation. One might argue that bungled healthcare website roll-outs or internet spying are evidence of too much activity on the part of public leaders, but I would argue that what they really show is too little entrepreneurial skill and judgment.
The solution to creating more public entrepreneurs is straightforward: train them. But, by and large, we don’t. Consider Howard Stevenson’s definition of entrepreneurship: “the pursuit of opportunity without regard to resources currently controlled.” We could teach that approach to people heading towards the public sector. But now consider the following list of terms: “acknowledgement of multiple constituencies,” “risk reduction,” “formal planning,” “coordination,” “efficiency measures,” “clearly defined responsibility,” and “organizational culture.” It reads like a list of the kinds of concepts we would want a new public official to know; like it might be drawn from an interview evaluation form or graduate school syllabus. In fact, it’s from Stevenson’s list of pressures that pull managers away from entrepreneurship and towards administration. Of course, that’s not all bad. We must have more great public administrators. But with all our challenges and amidst all the dynamism, we are going to need more than analysts and strategists in the public sector; we need inventors and builders, too.
Public entrepreneurship is not simply innovation in the public sector (though it makes use of innovation), and it’s not just policy reform (though it can help drive reform). Public entrepreneurs build something from nothing with resources, be they financial capital or human talent or new rules, that they didn’t command. In Boston, I worked with many amazing public managers and a handful of outstanding public entrepreneurs. Chris Osgood and Nigel Jacob brought the country’s first major-city mobile 311 app to life, and they are public entrepreneurs. They created Citizens Connect in 2009 by bringing together iPhones on loan, a local coder, and the most under-tapped resource in the public sector: the public. They transformed the way basic neighborhood issues are reported and responded to (20% of all constituent cases in Boston are reported over smartphones now), and their model is now accessible to 40 towns in Massachusetts and cities across the country. The members of the Mayor’s team in Boston who started up the One Fund in the days after the Marathon bombings were public entrepreneurs. We built the organization from PayPal and a Post Office Box, and it went on to channel $61 million from donors to victims and survivors in just 75 days. It still operates today….
It’s worth noting that public entrepreneurship, perhaps newly buzzworthy, is not actually new. Elinor Ostrom (44 years before her Nobel Prize) observed public entrepreneurs inventing new models in the 1960s. Back when Ronald Reagan was president, Peter Drucker wrote that it was entrepreneurship that would keep public service “flexible and self-renewing.” And almost two decades have passed since David Osborne and Ted Gaebler’s “Reinventing Government” (the then handbook for public officials) carried the promising subtitle: “How the Entrepreneurial Spirit is Transforming the Public Sector”.  Public entrepreneurship, though not nearly as widespread as its private complement, or perhaps as fashionable as its “social” counterpart (focussed on non-profits and their ecosystem), has been around for a while and so have those who practiced it.
But still today, we mostly train future public leaders to be public administrators. We school them in performance management and leave them too inclined to run from risk instead of managing it. And we communicate often, explicitly or not, to private entrepreneurs that government officials are failures and dinosaurs. It’s easy to see how that road led to Seattle this month, but hard to see how it empowers public officials to take on the enormous challenges that still lie ahead of us, or how it enables the public to help them.”

Climate Data Initiative Launches with Strong Public and Private Sector Commitments

John Podesta and Dr. John P. Holdren at the White House blog:  “…today, delivering on a commitment in the President’s Climate Action Plan, we are launching the Climate Data Initiative, an ambitious new effort bringing together extensive open government data and design competitions with commitments from the private and philanthropic sectors to develop data-driven planning and resilience tools for local communities. This effort will help give communities across America the information and tools they need to plan for current and future climate impacts.
The Climate Data Initiative builds on the success of the Obama Administration’s ongoing efforts to unleash the power of open government data. Since, the central site to find U.S. government data resources, launched in 2009, the Federal government has released troves of valuable data that were previously hard to access in areas such as health, energy, education, public safety, and global development. Today these data are being used by entrepreneurs, researchers, tech innovators, and others to create countless new applications, tools, services, and businesses.
Data from NOAA, NASA, the U.S. Geological Survey, the Department of Defense, and other Federal agencies will be featured on, a new section within that opens for business today. The first batch of climate data being made available will focus on coastal flooding and sea level rise. NOAA and NASA will also be announcing an innovation challenge calling on researchers and developers to create data-driven simulations to help plan for the future and to educate the public about the vulnerability of their own communities to sea level rise and flood events.
These and other Federal efforts will be amplified by a number of ambitious private commitments. For example, Esri, the company that produces the ArcGIS software used by thousands of city and regional planning experts, will be partnering with 12 cities across the country to create free and open “maps and apps” to help state and local governments plan for climate change impacts. Google will donate one petabyte—that’s 1,000 terabytes—of cloud storage for climate data, as well as 50 million hours of high-performance computing with the Google Earth Engine platform. The company is challenging the global innovation community to build a high-resolution global terrain model to help communities build resilience to anticipated climate impacts in decades to come. And the World Bank will release a new field guide for the Open Data for Resilience Initiative, which is working in more than 20 countries to map millions of buildings and urban infrastructure….”

“Open-washing”: The difference between opening your data and simply making them available

Christian Villum at the Open Knowledge Foundation Blog: “Last week, the Danish IT magazine Computerworld, in an article entitled “Check-list for digital innovation: These are the things you must know”, emphasised how more and more companies are discovering that giving your users access to your data is a good business strategy. Among other things, they wrote:

(Translation from Danish) According to Accenture it is becoming clear to many progressive businesses that their data should be treated as any other supply chain: It should flow easily and unhindered through the whole organisation and perhaps even out into the whole eco-system – for instance through fully open API’s.

They then use Google Maps as an example, which firstly isn’t entirely correct, as also pointed out by the Neogeografen, a geodata blogger, who explains how Google Maps isn’t offering raw data, but merely an image of the data. You are not allowed to download and manipulate the data – or run it off your own server.

But secondly I don’t think it’s very appropriate to highlight Google and their Maps project as a golden example of a business that lets its data flow unhindered to the public. It’s true that they are offering some data, but only in a very limited way – and definitely not as open data – and thereby not as progressively as the article suggests.

Surely it’s hard to accuse Google of not being progressive in general. The article states how Google Maps’ data are used by over 800,000 apps and businesses across the globe. So yes, Google has opened its silo a little bit, but only in a very controlled and limited way, which leaves these 800,000 businesses dependent on the continual flow of data from Google and thereby not allowing them to control the very commodities they’re basing their business on. This particular way of releasing data brings me to the problem that we’re facing: Knowing the difference between making data available and making them open.

Open data is characterized by not only being available, but being both legally open (released under an open license that allows full and free reuse, conditioned at most on giving credit to its source and sharing under the same license) and technically available in bulk and in machine-readable formats – contrary to the case of Google Maps. It may be that their data are available, but they’re not open. This – among other reasons – is why the global community around the 100% open alternative OpenStreetMap is growing rapidly and an increasing number of businesses choose to base their services on this open initiative instead.

But why is it important that data are open and not just available? Open data strengthens society and builds a shared resource, where all users, citizens and businesses are enriched and empowered, not just the data collectors and publishers. “But why would businesses spend money on collecting data and then give them away?” you ask. Opening your data and making a profit are not mutually exclusive. Doing a quick Google search reveals many businesses that both offer open data and drive a business on them – and I believe these are the ones that should be highlighted as particularly progressive in articles such as the one from Computerworld….

We are seeing a rising trend of what can be termed “open-washing” (inspired by “greenwashing”) – meaning data publishers that are claiming their data is open, even when it’s not – but rather just available under limiting terms. If we – at this critical time in the formative period of the data driven society – aren’t critically aware of the difference, we’ll end up putting our vital data streams in siloed infrastructure built and owned by international corporations. We’ll also end up giving our praise and support to the wrong kind of unsustainable technological development.”

This algorithm can predict a revolution

Russell Brandom at the Verge: “For students of international conflict, 2013 provided plenty to examine. There was civil war in Syria, ethnic violence in China, and riots to the point of revolution in Ukraine. For those working at Duke University’s Ward Lab, all specialists in predicting conflict, the year looks like a betting sheet, full of predictions that worked and others that didn’t pan out.

When the lab put out their semiannual predictions in July, they gave Paraguay a 97 percent chance of insurgency, largely based on reports of Marxist rebels. The next month, guerrilla campaigns intensified, proving out the prediction. In the case of China’s armed clashes between Uighurs and Hans, the models showed a 33 percent chance of violence, even as the cause of each individual flare-up was concealed by the country’s state-run media. On the other hand, the unrest in Ukraine didn’t start raising alarms until the action had already started, so the country was left off the report entirely.

According to Ward Lab’s staff, the purpose of the project isn’t to make predictions but to test theories. If a certain theory of geopolitics can predict an uprising in Ukraine, then maybe that theory is onto something. And even if these specialists could predict every conflict, it would only be half the battle. “It’s a success only if it doesn’t come at the cost of predicting a lot of incidents that don’t occur,” says Michael D. Ward, the lab’s founder and chief investigator, who also runs the blog Predictive Heuristics. “But it suggests that we might be on the right track.”

Forecasting the future of a country wasn’t always done this way. Traditionally, predicting revolution or war has been a secretive project, for the simple reason that any reliable prediction would be too valuable to share. But as predictions lean more on data, they’ve actually become harder to keep secret, ushering in a new generation of open-source prediction models that butt against the siloed status quo.

The story of automated conflict prediction starts at the Defense Advanced Research Projects Agency, known as the Pentagon’s R&D wing. In the 1990s, DARPA wanted to try out software-based approaches to anticipating which governments might collapse in the near future. The CIA was already on the case, with section chiefs from every region filing regular forecasts, but DARPA wanted to see if a computerized approach could do better. They looked at a simple question: will this country’s government face an acute existential threat in the next six months? When CIA analysts were put to the test, they averaged roughly 60 percent accuracy, so DARPA’s new system set the bar at 80 percent, looking at 29 different countries in Asia with populations over half a million. It was dubbed ICEWS, the Integrated Conflict Early Warning System, and it succeeded almost immediately, clearing 80 percent with algorithms built on simple regression analysis….

On the data side, researchers at Georgetown University are cataloging every significant political event of the past century into a single database called GDELT, and leaving the whole thing open for public research. Already, projects have used it to map the Syrian civil war and diplomatic gestures between Japan and South Korea, looking at dynamics that had never been mapped before. And then, of course, there’s Ward Lab, releasing a new sheet of predictions every six months and tweaking its algorithms with every development. It’s a mirror of the same open-vs.-closed debate in software — only now, instead of fighting over source code and security audits, it’s a fight over who can see the future the best.”

Developing an open government plan in the open

Tim Hughes at OGP: “New laws, standards, policies, processes and technologies are critical for opening up government, but arguably just as (if not more) important are new cultures, behaviours and ways of working within government and civil society.
The development of an OGP National Action Plan, therefore, presents a twofold opportunity for opening up government: On the one hand it should be used to deliver a set of robust and ambitious commitments to greater transparency, participation and accountability. But just as importantly, the process of developing a NAP should also be used to model new forms of open and collaborative working within government and civil society. These two purposes of a NAP should be mutually reinforcing. An open and collaborative process can – as was the case in the UK – help to deliver a more robust and ambitious action plan, which in turn can demonstrate the efficacy of working in the open.
You could even go one step further to say that the development of a National Action Plan should present an (almost) “ideal” vision of what open government in a country could look like. If governments aren’t being open as they’re developing an open government action plan, then there’s arguably little hope that they’ll be open elsewhere.
As coordinators of the UK OGP civil society network, this was on our mind at the beginning and throughout the development of the UK’s 2013-15 National Action Plan. Crucially, it was also on the minds of our counterparts in the UK Government. From the start, therefore, the process was developed with the intention that it should itself model the principles of open government. Members of the UK OGP civil society network met with policy officials from the UK Government on a regular basis to scope out and develop the action plan, and we published regular updates of our discussions and progress for others to follow and engage with. The process wasn’t without its challenges – and there’s still much more we can do to open it up further in the future – but it was successful in moving far beyond the typical model of government deciding, announcing and defending its intentions and in delivering an action plan with some strong and ambitious commitments.
One of the benefits of working in an open and collaborative way is that it enabled us to conduct and publish a full – warts and all – review of what went well and what didn’t. So, consider this an invitation to delve into our successes and failures, a challenge to do it better and a request to help us to do so too. Head over to the UK OGP civil society network blog to read about what we did, and tell us what you think:

11 ways to rethink open data and make it relevant to the public

Miguel Paz at IJNET: “It’s time to transform open data from a trendy concept among policy wonks and news nerds into something tangible to everyday life for citizens, businesses and grassroots organizations. Here are some ideas to help us get there:
1. Improve access to data
Craig Hammer from the World Bank has tackled this issue, stating that “Open Data could be the game changer when it comes to eradicating global poverty”, but only if governments make available online data that become actionable intelligence: a launch pad for investigation, analysis, triangulation, and improved decision making at all levels.
2. Create open data for the end user
As Hammer wrote in a blog post for the Harvard Business Review, while the “opening” has generated excitement from development experts, donors, several government champions, and the increasingly mighty geek community, the hard reality is that much of the public has been left behind, or tacked on as an afterthought. Let’s get out of the building and start working for the end user.
3. Show, don’t tell
Regular folks don’t know what “open data” means. Actually, they probably don’t care what we call it and don’t know if they need it. Apple’s Steve Jobs said that a lot of times, people don’t know what they want until you show it to them. We need to stop telling them they need it and start showing them why they need it, through actionable user experience.
4. Make it relevant to people’s daily lives, not just to NGOs and policymakers’ priorities
A study of the use of open data and transparency in Chile showed the top 10 uses were for things that affect their lives directly for better or for worse: data on government subsidies and support, legal certificates, information services, paperwork. If the data doesn’t speak to priorities at the household or individual level, we’ve lost the value of both the “opening” of data, and the data itself.
5. Invite the public into the sandbox
We need to give people “better tools to not only consume, but to create and manipulate data,” says my colleague Alvaro Graves, Poderopedia’s semantic web developer and researcher. This is what Code for America does, and it’s also what happened with the advent of Web 2.0, when the availability of better tools, such as blogging platforms, helped people create and share content.
6. Realize that open data are like QR codes
Everyone talks about open data the way they used to talk about QR codes – as something groundbreaking. But as with QR codes, open data only succeeds with the proper context to satisfy the needs of citizens. Context is the most important thing to funnel use and success of open data as a tool for global change.
7. Make open data sexy and pop, like
Geeks became popular because they made useful and cool things that could be embraced by end users. Open data geeks need to stick with that program.
8. Help journalists embrace open data
Jorge Lanata, a famous Argentinian journalist who is now being targeted by the Cristina Fernández administration due to his unfolding of government corruption scandals, once said that 50 percent of the success of a story or newspaper is assured if journalists like it.
That’s true of open data as well. If journalists understand its value for the public interest and learn how to use it, so will the public. And if they do, the winds of change will blow. Governments and the private sector will be forced to provide better, more up-to-date and standardized data. Open data will be understood not as a concept but as a public information source as relevant as any other. We need to teach Latin American journalists to be part of this.
9. News nerds can help you put your open data to good use
In order to boost the use of open data by journalists we need news nerds, teams of lightweight and tech-heavy armored journalist-programmers who can teach colleagues how open data brings us high-impact storytelling that can change public policies and hold authorities accountable.
News nerds can also help us with “institutionalizing data literacy across societies” as Hammer puts it. ICFJ Knight International Journalism Fellow and digital strategist Justin Arenstein calls these folks “mass mobilizers” of information. Alex Howard “points to these groups because they can help demystify data, to make it understandable by populations and not just statisticians.”
I call them News Ninja Nerds, accelerator taskforces that can foster innovations in news, data and transparency in a speedy way, saving governments and organizations time and a lot of money. Projects like ProPublica’s Dollars For Docs are great examples of what can be achieved if you mix FOIA, open data and the will to provide news in the public interest.
10. Rename open data
Part of the reason people don’t embrace concepts such as open data is that they are part of a lingo that has nothing to do with them. No empathy involved. Let’s start talking about people’s right to know and use the data generated by governments. As Tim O’Reilly puts it: “Government as a Platform for Greatness,” with examples we can relate to, instead of dead PDFs and dirty databases.
11. Don’t expect open data to substitute for thinking or reporting
Investigative reporting can benefit from it. But “there is no substitute for the kind of street-level digging, personal interviews, and detective work” great journalism projects entail, says David Kaplan in a great post entitled Why Open Data is Not Enough.”