Can Big Data Stop Wars Before They Happen?


Foreign Policy: “It has been almost exactly two decades since conflict prevention shot to the top of the peace-building agenda, as large-scale killings shifted from interstate wars to intrastate and intergroup conflicts. What could we have done to anticipate and prevent the 100 days of genocidal killing in Rwanda that began in April 1994, or the massacre of thousands of Bosnian Muslims at Srebrenica just over a year later? The international community recognized that conflict prevention could no longer be limited to diplomatic and military initiatives, but that it also required earlier intervention to address the causes of violence between nonstate actors, including tribal, religious, economic, and resource-based tensions.
For years, even as it was pursued as doggedly as personnel and funding allowed, early intervention remained elusive, a kind of Holy Grail for peace-builders. This might finally be changing. The rise of data on social dynamics and what people think and feel — obtained through social media, SMS questionnaires, increasingly comprehensive satellite information, news-scraping apps, and more — has given the peace-building field hope of harnessing a new vision of the world. But to cash in on that hope, we first need to figure out how to understand all the numbers and charts and figures now available to us. Only then can we expect to predict and prevent events like the recent massacres in South Sudan or the ongoing violence in the Central African Republic.
A growing number of initiatives have tried to make it across the bridge between data and understanding. They’ve ranged from small nonprofit shops of a few people to massive government-funded institutions, and they’ve been moving forward in fits and starts. Few of these initiatives have been successful in documenting incidents of violence actually averted or stopped. Sometimes that’s simply because violence, or the absence of it, isn’t verifiable. The growing literature on big data and conflict prevention today is replete with caveats about “overpromising and underdelivering” and the persistent gap between early warning and early action. In the case of the Conflict Early Warning and Response Mechanism (CEWARN) system in the Horn of Africa — one of the earlier and most prominent attempts at early intervention — it is widely accepted that the project largely failed to use the data it retrieved for effective conflict management. It relied heavily on technology to produce large databases, while lacking the personnel to effectively analyze them or take meaningful early action.
To be sure, disappointments are to be expected when breaking new ground. But they don’t have to continue forever. This pioneering work demands not just data and technology expertise. Also critical is cross-discipline collaboration between the data experts and the conflict experts, who know intimately the social, political, and geographic terrain of different locations. What was once a clash of cultures over the value and meaning of metrics when it comes to complex human dynamics needs to morph into collaboration. This is still pretty rare, but if the past decade’s innovations are any prologue, we are hopefully headed in the right direction.
* * *
Over the last three years, the U.S. Defense Department, the United Nations, and the CIA have all launched programs to parse the masses of public data now available, scraping and analyzing details from social media, blogs, market data, and myriad other sources to achieve variations of the same goal: anticipating when and where conflict might arise. The Defense Department’s Information Volume and Velocity program is designed to use “pattern recognition to detect trends in a sea of unstructured data” that would point to growing instability. The U.N.’s Global Pulse initiative’s stated goal is to track “human well-being and emerging vulnerabilities in real-time, in order to better protect populations from shocks.” The Open Source Indicators program at the CIA’s Intelligence Advanced Research Projects Activity aims to anticipate “political crises, disease outbreaks, economic instability, resource shortages, and natural disasters.” Each looks to the growing stream of public data to detect significant population-level changes.
Large institutions with deep pockets have always been at the forefront of efforts in the international security field to design systems for improving data-driven decision-making. They’ve followed the lead of large private-sector organizations where data and analytics rose to the top of the corporate agenda. (In that sector, the data revolution is promising “to transform the way many companies do business, delivering performance improvements not seen since the redesign of core processes in the 1990s,” as David Court, a director at consulting firm McKinsey, has put it.)
What really defines the recent data revolution in peace-building, however, is that it is transcending size and resource limitations. It is finding its way to small organizations operating at local levels and using knowledge and subject experts to parse information from the ground. It is transforming the way peace-builders do business, delivering data-led programs and evidence-based decision-making not seen since the field’s inception in the latter half of the 20th century.
One of the most famous recent examples is the 2013 Kenyan presidential election.
In March 2013, the world was watching and waiting to see whether the vote would produce more of the violence that had left at least 1,300 people dead and 600,000 homeless during and after the 2007 elections. In the intervening years, a web of NGOs worked to set up early-warning and early-response mechanisms to defuse tribal rivalries, party passions, and rumor-mongering. Many of the projects were technology-based initiatives trying to leverage data sources in new ways — including a collaborative effort spearheaded and facilitated by a Kenyan nonprofit called Ushahidi (“witness” in Swahili) that designs open-source data collection and mapping software. The Umati (meaning “crowd”) project used an Ushahidi program to monitor media reports, tweets, and blog posts to detect rising tensions, frustration, calls to violence, and hate speech — and then sorted and categorized it all on one central platform. The information fed into election-monitoring maps built by the Ushahidi team, while mobile-phone provider Safaricom donated 50 million text messages to a local peace-building organization, Sisi ni Amani (“We are Peace”), so that it could act on the information by sending texts — which had been used to incite and fuel violence during the 2007 elections — aimed at preventing violence and quelling rumors.
The first challenges came around 10 a.m. on the opening day of voting. “Rowdy youth overpowered police at a polling station in Dandora Phase 4,” one of the informal settlements in Nairobi that had been a site of violence in 2007, wrote Neelam Verjee, programs manager at Sisi ni Amani. The young men were blocking others from voting, and “the situation was tense.”
Sisi ni Amani sent a text blast to its subscribers: “When we maintain peace, we will have joy & be happy to spend time with friends & family but violence spoils all these good things. Tudumishe amani [“Maintain the peace”] Phase 4.” Meanwhile, security officers, who had been called separately, arrived at the scene and took control of the polling station. Voting resumed with little violence. According to interviews collected by Sisi ni Amani after the vote, the message “was sent at the right time” and “helped to calm down the situation.”
In many ways, Kenya’s experience is the story of peace-building today: Data is changing the way professionals in the field think about anticipating events, planning interventions, and assessing what worked and what didn’t. But it also underscores the possibility that we might be edging closer to a time when peace-builders at every level and in all sectors — international, state, and local, governmental and not — will have mechanisms both to know about brewing violence and to save lives by acting on that knowledge.
Three important trends underlie the optimism. The first is the sheer amount of data that we’re generating. In 2012, humans plugged into digital devices managed to generate more data in a single year than over the course of world history — and that rate more than doubles every year. As of 2012, 2.4 billion people — 34 percent of the world’s population — had a direct Internet connection. The growth is most stunning in regions like the Middle East and Africa where conflict abounds; access has grown 2,634 percent and 3,607 percent, respectively, in the last decade.
The growth of mobile-phone subscriptions, which allow their owners to be part of new data sources without a direct Internet connection, is also staggering. In 2013, there were almost as many cell-phone subscriptions in the world as there were people. In Africa, there were 63 subscriptions per 100 people, and there were 105 per 100 people in the Arab states.
The second trend has to do with our expanded capacity to collect and crunch data. Not only do we have more computing power enabling us to produce enormous new data sets — such as the Global Database of Events, Language, and Tone (GDELT) project, which tracks almost 300 million conflict-relevant events reported in the media between 1979 and today — but we are also developing more-sophisticated methodological approaches to using these data as raw material for conflict prediction. New machine-learning methodologies, which use algorithms to make predictions (like a spam filter, but much, much more advanced), can provide “substantial improvements in accuracy and performance” in anticipating violent outbreaks, according to Chris Perry, a data scientist at the International Peace Institute.
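To make the spam-filter analogy concrete, the sketch below shows the general shape of such a supervised model: a logistic-regression classifier trained on country-month counts of media-reported events, predicting whether violence breaks out the following month. The file name, feature columns, and label are illustrative assumptions, not GDELT's actual schema or any particular study's model.
```python
# Hypothetical sketch: predict conflict onset in the next month from
# counts of media-reported event types (columns are illustrative only).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Assumed file: one row per country-month with event-count features
# and a 0/1 label for whether violence broke out the following month.
df = pd.read_csv("country_month_events.csv")
features = ["protests", "threats", "troop_movements", "hate_speech_mentions"]
X, y = df[features], df["conflict_next_month"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_train, y_train)

print(classification_report(y_test, model.predict(X_test)))
```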
This brings us to the third trend: the nature of the data itself. When it comes to conflict prevention and peace-building, progress is not simply a question of “more” data, but also different data. For the first time, digital media — user-generated content and online social networks in particular — tell us not just what is going on, but also what people think about the things that are going on. Excitement in the peace-building field centers on the possibility that we can tap into data sets to understand, and preempt, the human sentiment that underlies violent conflict.
Realizing the full potential of these three trends means figuring out how to distinguish between the information, which abounds, and the insights, which are actionable. It is a distinction that is especially hard to make because it requires cross-discipline expertise that combines the wherewithal of data scientists with that of social scientists and the knowledge of technologists with the insights of conflict experts.

Is Your City’s Crime Data Private Property?


Adam Wisnieski at the Crime Report: “In February, the Minneapolis Police Department (MPD) announced it was moving into a new era of transparency and openness with the launch of a new public crime map.
“Crime analysis and mapping data is now in the hands of the city’s citizens,” reads the first line of the press release.
According to the release, the MPD will feed incident report data to RAIDS (Regional Analysis and Information Data Sharing) Online, a nationwide crime map operated by crime analysis software company BAIR Analytics.
Since the announcement, Minneapolis residents have used RAIDS to look at reports of murder, robbery, burglary, assault, rape and other crimes reported in their neighborhoods on a sleek, easy-to-use map, which includes data as recent as yesterday.
On the surface, it’s a major leap forward for transparency in Minneapolis. But some question why the data feed is given exclusively to a single private company.
Transparency advocates argue that the data is not truly in the hands of the city’s residents until citizens can download the raw data and analyze, chart, or map it on their own.
“For it to actually be open data, it needs to be available to the public in machine readable format,” said Lauren Reid, senior public affairs manager for Code for America, a national non-profit that promotes participation in government through technology.
“Anybody should be able to go download it and read it if they want. That’s open data.”
The Open Knowledge Foundation, a national non-profit that advocates for more government openness, argues open data is important so citizens can participate and engage government in a way that was not possible before.
“Much of the time, citizens are only able to engage with their own governance sporadically — maybe just at an election every 4 or 5 years,” reads the Open Knowledge website. “By opening up data, citizens are enabled to be much more directly informed and involved in decision-making.
“This is more than transparency: it’s about making a full ‘read/write’ society — not just about knowing what is happening in the process of governance, but being able to contribute to it.”
Minneapolis is not alone.
As Americans demand more information on criminal activity from the government, police departments are flocking to private companies to help them get the information into the public domain.
For many U.S. cities, hooking up with these third-party mapping vendors is the most public their police department has ever been. But the trend has started a messy debate about how “public” the public data actually is.
Outsourcing Makes It Easy
For police departments, outsourcing the presentation of their crime data to a private firm is an easy decision.
Most of the crime mapping sites are free or cost very little. (The Omega Group’s CrimeMapping.com charges between $600 and $2,400 per year, depending on the size of the agency.)
The department chooses what information it wants to provide. Once the system is set up, the data flows to the companies and then to the public without much effort on the part of the department.
For the most part, the move doesn’t need legislative approval, just a memorandum of understanding. A police department can even fulfill a new law requiring a public crime map by releasing report data through one of these vendors.
Commander Scott Gerlicher of the MPD’s Strategic Information and Crime Analysis Division says the software has saved the department time.
“I don’t think we are entertaining quite as many requests from the media or the public,” he told The Crime Report. “Plus the price was right: it was free.”
The companies that run some of the most popular sites — The Omega Group’s CrimeMapping.com, Public Engines’ CrimeReports and BAIR Analytics’ RAIDS — are in the business of selling crime analysis and mapping software to police departments.
Some departments buy internal software from these companies, though some cities, like Minneapolis, just use RAIDS’ free map and have no contracts with BAIR for internal software.
Susan Smith, director of operations at BAIR Analytics, said the goal of RAIDS is to create one national map that includes all crime reports from across all jurisdictions and departments (state and local police).
For people who live near or at the edge of a city line, finding relevant crime data can be hard.
The MPD’s Gerlicher said that was one reason his department chose RAIDS — because many police agencies in the Minneapolis area had already hooked up with the firm.
The operators of these crime maps say they provide a community service.
“We try to get as many agencies as we possibly can. We truly believe this is a good service for the community,” says Gabriela Coverdale, a marketing director at the Omega Group.
Raw Data ‘Off Limits’
However, the sites do not allow the public to download any of the raw data and prohibit anyone from “scraping,” using a program to automatically pull the data from their maps.
In Minneapolis, the police department continues to post PDFs and Excel spreadsheets with data, but only RAIDS gets a feed with the most recent data.
Alan Palazzolo, a Code for America fellow who works as an interactive developer for the online non-profit newspaper MinnPost, used monthly reports from the MPD to build a crime application with a map and geographic-oriented chart of crime in Minneapolis.
Nevertheless, he finds the new tool limiting.
“[The MPD’s] ability to actually put out more data, and more timely data, really opens things up,” he said. “It’s great, but they are not doing that with us.”
According to Palazzolo, the arrangement gives BAIR a market advantage and effectively prevents the data from being used for purposes the company cannot control.
“Having granular, complete, and historical data would allow us to do more in-depth analysis,” wrote Palazzolo and Kaeti Hinck in an article in MinnPost last year.
“Granular data would allow us to look at smaller areas,” reads the article. “[N]eighborhoods are a somewhat arbitrary boundary when it comes to crime. Often high crime is isolated to a couple of blocks, but aggregated data does not allow us to explore this.
“More complete data would allow us to look at factors like exact locations, time of day, demographic issues, and detailed categories (like bike theft).”
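If the raw incident-level feed were public, the block-level and time-of-day breakdowns Palazzolo describes would take only a few lines of analysis. The sketch below assumes a hypothetical CSV export with timestamp, coordinate, and category columns; it is not the MPD's actual data format.
```python
# Hypothetical sketch: block-level and time-of-day breakdowns from a
# raw incident feed. Column names are assumptions, not the MPD schema.
import pandas as pd

incidents = pd.read_csv("incidents.csv", parse_dates=["reported_at"])

# Time-of-day profile for a single detailed category, e.g. bike theft.
bike_theft = incidents[incidents["category"] == "bike theft"]
by_hour = bike_theft["reported_at"].dt.hour.value_counts().sort_index()

# Approximate "block-level" hot spots by snapping coordinates to a
# roughly 100-meter grid and counting incidents per cell.
incidents["cell"] = list(zip(incidents["latitude"].round(3),
                             incidents["longitude"].round(3)))
hot_cells = incidents["cell"].value_counts().head(10)

print(by_hour)
print(hot_cells)
```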
The question of preference gets even messier when looking at another national crime mapping website called SpotCrime.
Unlike the other third-party mapping sites, SpotCrime is not in the business of selling crime analysis software to police departments. It operates more like a newspaper — a newspaper focused solely on the police blotter pages — and makes money off advertising.
Years ago, SpotCrime requested and received crime report data via e-mail from the Minneapolis Police Department and mapped the data on its website. According to SpotCrime owner Colin Drane, the MPD stopped sending e-mails when terminals were set up in the police department for the public to access the data.
So he instead started going through the painstaking process of transferring data from PDFs the MPD posted online and mapping them.
When the MPD hooked up with RAIDS in February, Drane asked for the same feed and was denied. He says more and more police departments around the country are hooking up with one of his competitors and not giving him the same timely data.
The MPD said it prefers RAIDS over SpotCrime and criticized some of the advertisements on SpotCrime.
“We’re not about supporting ad money,” said Gerlicher.
Drane believes all crime data in every city should be open to everyone, in order to prevent any single firm from monopolizing how the information is presented and used.
“The onus needs to be on the public agencies,” he adds. “They need to be fair with the data and they need to be fair with the public.”
Transparency advocates worry that the trend is going in the opposite direction.
Ohio’s Columbus Police Department, for example, recently discontinued its public crime statistic feed and started giving the data exclusively to RAIDS.
The Columbus Dispatch wrote that the new system had less information than the old…”

Innovative State: How New Technologies Can Transform Government


“In Innovative State, America’s first Chief Technology Officer Aneesh Chopra tells the story of a new revolution in America. Over the course of our history, America has had a pioneering government matched to the challenges of the day. But over the past twenty years, as our economy and our society have been completely changed by technology, and the private sector has innovated, government has stalled, trapped in models that were designed for the America of the past. Aneesh Chopra, tasked with leading the charge for a more open, tech-savvy government, here shows how we can reshape our government and tackle our most vexing problems, from economic development to affordable healthcare. Drawing on interviews with leaders and building on his firsthand experience, Chopra’s Innovative State is a fascinating look at how to be smart, do more with less, and reshape American government for the twenty-first century.”
Website: http://www.innovativestate.com/
 

Sharing in a Changing Climate


Helen Goulden in the Huffington Post: “Every month, a social research agency conducts a public opinion survey of 30,000 UK households. As part of this, households are asked which issues they think are the most important; things such as crime, unemployment, inequality, public health, etc. Climate change has ranked so consistently low on these surveys that they don’t bother asking any more.
On first glance, it would appear that most people don’t care about a changing climate.
Yet, that’s simply not true. Many people care deeply, but fleetingly – in the same way they may consider their own mortality before getting back to thinking about what to have for tea. And others care, but fail to change their behaviour in a way that’s proportionate to their concerns. Certainly that’s my unhappy stomping ground.
Besides, what choices do we really have? Even the most progressive, large organisations have been glacially slow to move towards any real form of sustainability. For many years we have struggled with the Frankenstein-like task of stitching ‘sustainability’ onto existing business and economic models and the results, I think, speak for themselves.
That the Collaborative Economy presents us with an opportunity – in Napster-like ways – to disrupt and evolve toward something more sustainable is a compelling idea, one that looks out to a future filled with opportunities to reconfigure how we produce, consume and dispose of the things we want and need to live, work and play.
Whether the journey toward sustainability is short or long, it will be punctuated with a good degree of turbulence, disruption and some largely unpredictable events. How we deal with those events and what role communities, collaboration and technology play may set the framework and tone for how that future evolves. Crises and disruption to our entrenched living patterns present ripe opportunities for innovation and space for adopting new behaviours and practices.
No-one is immune from the impact of erratic and extreme weather events. And if we accept that these events are going to increase in frequency, we must draw the conclusion that state and government emergency resources may be stretched more thinly over time.
Across the world, there is a fairly well organised state and international infrastructure for dealing with emergencies, involving everyone from the Disaster Emergency Committee, the UN, central and local government and municipalities, not for profit organisations and of course, the military. There is a clear reason why we need this kind of state emergency response; I’m not suggesting that we don’t.
But through the rise of open data and mass participation in platforms that share location, identity and inventory, we are creating a new kind of mesh; a social and technological infrastructure that could considerably strengthen our ability to respond to unpredictable events.
In the last few years we have seen a sharp rise in the number of tools and crowdsourcing platforms and open source sensor networks that are focused on observing, predicting or responding to extreme events:
• Apps like Shake Alert, which emits a minute’s warning that an earthquake is coming
• Rio’s sensor network, which measures rainfall outside the city and can predict flooding
• The open-source hardware and software platform Arduino, which is being used to crowd-source weather and pollution data
• Propeller Health, which is using asthma sensors on inhalers to crowd-source pollution hotspots
• Safecast, which was developed for crowdsourcing radiation levels in Japan
Increasingly we have the ability to deploy open source, distributed and networked sensors and devices for capturing and aggregating data that can help us manage our responses to extreme weather (and indeed, other kinds of) events.
Look at platforms like LocalMind and Foursquare. Today, I might be using them to find out whether there’s a free table at a bar or finding out what restaurant my friends are in. But these kinds of social locative platforms present an infrastructure that could be life-saving in any kind of situation where you need to know where to go quickly to get out of trouble. We know that in the wake of disruptive events and disasters, like bombings, riots, etc., people now intuitively and instinctively take to technology to find out what’s happening, where to go and how to co-ordinate response efforts.
During the 2013 BART strike in San Francisco, ventures like Liquid Space and SideCar enabled people to quickly find alternative places to work, or alternatives to public transport, to mitigate the inconvenience of the strike. The strike was a minor inconvenience compared to the impact of a hurricane and flood but nevertheless, in both those instances, ventures decided to waive their fees; as did AirBnB when 1,400 New York AirBnB hosts opened their doors to people who had been left homeless through Hurricane Sandy in 2012.
The impulse to help is not new. The matching of people’s offers of help and resources to on-the-ground need, in real time, is.”

Sammies finalists are harnessing technology to help the public


Lisa Rein in the Washington Post: “One team of federal agents led Medicare investigations that resulted in more than 600 convictions in South Florida, recovering hundreds of millions of dollars. Another official boosted access to burial sites for veterans across the country. And one guided an initiative to provide safe drinking water to 5 million people in Uganda and Kenya. These are some of the 33 individuals and teams of federal employees nominated for the 13th annual Samuel J. Heyman Service to America Medals, among the highest honors in government. The 2014 finalists reflect the achievements of public servants in fields from housing to climate change, their work conducted in Washington and locations as far-flung as Antarctica and Alabama…
Many of them have excelled in harnessing new technology in ways that are pushing the limits of what government thought was possible even a few years ago. Michael Byrne of the Federal Communications Commission, for example, put detailed data about broadband availability in the hands of citizens and policymakers using interactive online maps and other visualizations. At the Environmental Protection Agency, Douglas James Norton made water quality data that had never been public available on the Web for citizens, scientists and state agencies.”

Out in the Open: An Open Source Website That Gives Voters a Platform to Influence Politicians


Klint Finley in Wired: “This is the decade of the protest. The Arab Spring. The Occupy Movement. And now the student demonstrations in Taiwan.
Argentine political scientist Pia Mancini says we’re caught in a “crisis of representation.” Most of these protests have popped up in countries that are at least nominally democratic, but so many people are still unhappy with their elected leaders. The problem, Mancini says, is that elected officials have drifted so far from the people they represent that it’s too hard for the average person to be heard.
“If you want to participate in the political system as it is, it’s really costly,” she says. “You need to study politics in university, and become a party member and work your way up. But not every citizen can devote their lives to politics.”

That’s why Mancini started the Net Democracy foundation, a not-for-profit that explores ways of improving civic engagement through technology. The foundation’s first project is something called Democracy OS, an online platform for debating and voting on political issues, and it’s already finding a place in the world. The federal government in Mexico is using this open-source tool to gather feedback on a proposed public data policy, and in Tunisia, a non-government organization called iWatch has adopted it in an effort to give the people a stronger voice.
Mancini’s dissatisfaction with electoral politics stems from her experience working for the Argentine political party Unión Celeste y Blanco from 2010 until 2012. “I saw some practices that I thought were harmful to societies,” she says. Parties were too interested in the appearances of the candidates, and not interested enough in their ideas. Worse, citizens were only consulted for their opinions once every two to four years, meaning politicians could get away with quite a bit in the meantime.
Democracy OS is designed to address that problem by getting citizens directly involved in debating specific proposals when their representatives are actually voting on them. It operates on three levels: one for gathering information about political issues, one for public debate about those issues, and one for actually voting on specific proposals.
Various communities now use a tool called Madison to discuss policy documents, and many activists and community organizations have adopted Loomio to make decisions internally. But Democracy OS aims higher: to provide a common platform for any city, state, or government to actually put proposals to a vote. “We’re able to actually overthrow governments, but we’re not using technology to decide what to do next,” Mancini says. “So the risk is that we create power vacuums that get filled with groups that are already very well organized. So now we need to take it a bit further. We need to decide what democracy for the internet era looks like.”

Software Shop as Political Party

Today Net Democracy is more than just a software development shop. It’s also a local political party based in Buenos Aires. Two years ago, the foundation started pitching the first prototype of the software to existing political parties as a way for them to gather feedback from constituents, but it didn’t go over well. “They said: ‘Thank you, this is cool, but we’re not interested,’” Mancini remembers. “So we decided to start our own political party.”
The Net Democracy Party hasn’t won any seats yet, but it promises that if it does, it will use Democracy OS to enable any local registered voter to tell party representatives how to vote. Mancini says the party representatives will always vote the way constituents tell them to vote through the software.

She also uses the term “net democracy” to refer to the type of democracy that the party advocates, a form of delegative democracy that attempts to strike a balance between representative democracy and direct democracy. “We’re not saying everyone should vote on every issue all the time,” Mancini explains. “What we’re saying is that issues should be open for everyone to participate.”
Individuals will also be able to delegate their votes to other people. “So, if you’re not comfortable voting on health issues, you can delegate to someone else to vote for you in that area,” she says. “That way people with a lot of experience in an issue, like a community leader who doesn’t have lobbyist access to the system, can build more political capital.”
She envisions a future where decisions are made on two levels. Decisions that involve specific knowledge — macroeconomics, tax reforms, judiciary regulations, penal code, etc. — or that affect human rights are delegated “upwards” to representatives. But then decisions related to local issues — transport, urban development, city codes, etc. — can be delegated “downwards” to the citizens.
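As a rough sketch of how topic-scoped delegation can be tallied (an illustration of the general idea, not Democracy OS's actual code), each voter either casts a ballot on an issue or names a delegate for that issue's topic, and the count follows delegation chains while discarding cycles:
```python
# Illustrative sketch of topic-scoped ("liquid") delegation, not the
# Democracy OS implementation. Voters either vote directly on an issue
# or delegate their vote for that issue's topic to another voter.
from collections import Counter

def resolve_vote(voter, topic, direct_votes, delegations):
    """Follow delegation chains for one topic; return the ballot or None."""
    seen = set()
    while voter not in direct_votes:
        if voter in seen:            # delegation cycle: vote is lost
            return None
        seen.add(voter)
        voter = delegations.get((voter, topic))
        if voter is None:            # no vote and no delegate
            return None
    return direct_votes[voter]

def tally(topic, voters, direct_votes, delegations):
    ballots = (resolve_vote(v, topic, direct_votes, delegations) for v in voters)
    return Counter(b for b in ballots if b is not None)

# Usage: Ana votes yes, Ben delegates health issues to Ana, Cruz abstains.
direct_votes = {"ana": "yes"}
delegations = {("ben", "health"): "ana"}
print(tally("health", ["ana", "ben", "cruz"], direct_votes, delegations))
# Counter({'yes': 2})
```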

The Secret Ballot Conundrum

Ensuring the integrity of the votes gathered via Democracy OS will be a real challenge. The U.S. non-profit organization Black Box Voting has long criticized electronic voting schemes as inherently flawed. “Our criticism of internet voting is that it is not transparent and cannot be made publicly transparent,” says Black Box Voting founder Bev Harris. “With transparency for election integrity defined as public ability to see and authenticate four things: who can vote, who did vote, vote count, and chain of custody.”
In short, there’s no known way to do a secret ballot online because any system for verifying that the votes were counted properly will inevitably reveal who voted for what.
Democracy OS deals with that by simply doing away with secret ballots. For now, the Net Democracy party will have people sign up for Democracy OS accounts in person with their government-issued ID cards. “There is a lot to be said about how anonymity allows you to speak more freely,” Mancini says. “But in the end, we decided to prioritize the reliability, accountability and transparency of the system. We believe that by making our arguments and decisions public we are fostering a civic culture. We will be more responsible for what we say and do if it’s public.”
But making binding decisions based on these online discussions would be problematic, since they would skew not just towards those tech savvy enough to use the software, but also towards those willing to have their names attached to their votes publicly. Fortunately, the software isn’t yet being used to gather real votes, just to gather public feedback….”

Findings of the Big Data and Privacy Working Group Review


John Podesta at the White House Blog: “Over the past several days, severe storms have battered Arkansas, Oklahoma, Mississippi and other states. Dozens of people have been killed and entire neighborhoods turned to rubble and debris as tornadoes have touched down across the region. Natural disasters like these present a host of challenges for first responders. How many people are affected, injured, or dead? Where can they find food, shelter, and medical attention? What critical infrastructure might have been damaged?
Drawing on open government data sources, including Census demographics and NOAA weather data, along with their own demographic databases, Esri, a geospatial technology company, has created a real-time map showing where the twisters have been spotted and how the storm systems are moving. They have also used these data to show how many people live in the affected area, and summarize potential impacts from the storms. It’s a powerful tool for emergency services and communities. And it’s driven by big data technology.
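The population figures behind a map like Esri's come from overlaying the storm-affected area on demographic geographies such as census tracts. The sketch below is a hypothetical version of that overlay; the file names and the population column are assumptions made for illustration.
```python
# Hypothetical sketch of the kind of overlay such a map performs:
# estimate how many people live inside a storm-affected area by
# intersecting it with census tracts. File names are placeholders.
import geopandas as gpd

tracts = gpd.read_file("census_tracts.geojson")      # assumed "population" column
storm = gpd.read_file("storm_damage_area.geojson")   # one or more damage polygons

storm_area = storm.to_crs(tracts.crs).unary_union    # merge into one geometry
affected = tracts[tracts.intersects(storm_area)]

print(f"{len(affected)} tracts affected, "
      f"~{int(affected['population'].sum()):,} residents in the area")
```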
In January, President Obama asked me to lead a wide-ranging review of “big data” and privacy—to explore how these technologies are changing our economy, our government, and our society, and to consider their implications for our personal privacy. Together with Secretary of Commerce Penny Pritzker, Secretary of Energy Ernest Moniz, the President’s Science Advisor John Holdren, the President’s Economic Advisor Jeff Zients, and other senior officials, our review sought to understand what is genuinely new and different about big data and to consider how best to encourage the potential of these technologies while minimizing risks to privacy and core American values.
Over the course of 90 days, we met with academic researchers and privacy advocates, with regulators and the technology industry, with advertisers and civil rights groups. The President’s Council of Advisors for Science and Technology conducted a parallel study of the technological trends underpinning big data. The White House Office of Science and Technology Policy jointly organized three university conferences at MIT, NYU, and U.C. Berkeley. We issued a formal Request for Information seeking public comment, and hosted a survey to generate even more public input.
Today, we presented our findings to the President. We knew better than to try to answer every question about big data in three months. But we are able to draw important conclusions and make concrete recommendations for Administration attention and policy development in a few key areas.
There are a few technological trends that bear drawing out. The declining cost of collection, storage, and processing of data, combined with new sources of data like sensors, cameras, and geospatial technologies, mean that we live in a world of near-ubiquitous data collection. All this data is being crunched at a speed that is increasingly approaching real-time, meaning that big data algorithms could soon have immediate effects on decisions being made about our lives.
The big data revolution presents incredible opportunities in virtually every sector of the economy and every corner of society.
Big data is saving lives. Infections are dangerous—even deadly—for many babies born prematurely. By collecting and analyzing millions of data points from a NICU, one study was able to identify factors, like slight increases in body temperature and heart rate, that serve as early warning signs an infection may be taking root—subtle changes that even the most experienced doctors wouldn’t have noticed on their own.
Big data is making the economy work better. Jet engines and delivery trucks now come outfitted with sensors that continuously monitor hundreds of data points and send automatic alerts when maintenance is needed. Utility companies are starting to use big data to predict periods of peak electric demand, adjusting the grid to be more efficient and potentially averting brown-outs.
Big data is making government work better and saving taxpayer dollars. The Centers for Medicare and Medicaid Services have begun using predictive analytics—a big data technique—to flag likely instances of reimbursement fraud before claims are paid. The Fraud Prevention System helps identify the highest-risk health care providers for waste, fraud, and abuse in real time and has already stopped, prevented, or identified $115 million in fraudulent payments.
But big data raises serious questions, too, about how we protect our privacy and other values in a world where data collection is increasingly ubiquitous and where analysis is conducted at speeds approaching real time. In particular, our review raised the question of whether the “notice and consent” framework, in which a user grants permission for a service to collect and use information about them, still allows us to meaningfully control our privacy as data about us is increasingly used and reused in ways that could not have been anticipated when it was collected.
Big data raises other concerns, as well. One significant finding of our review was the potential for big data analytics to lead to discriminatory outcomes and to circumvent longstanding civil rights protections in housing, employment, credit, and the consumer marketplace.
No matter how quickly technology advances, it remains within our power to ensure that we both encourage innovation and protect our values through law, policy, and the practices we encourage in the public and private sector. To that end, we make six actionable policy recommendations in our report to the President:
Advance the Consumer Privacy Bill of Rights. Consumers deserve clear, understandable, reasonable standards for how their personal information is used in the big data era. We recommend the Department of Commerce take appropriate consultative steps to seek stakeholder and public comment on what changes, if any, are needed to the Consumer Privacy Bill of Rights, first proposed by the President in 2012, and to prepare draft legislative text for consideration by stakeholders and submission by the President to Congress.
Pass National Data Breach Legislation. Big data technologies make it possible to store significantly more data, and further derive intimate insights into a person’s character, habits, preferences, and activities. That makes the potential impacts of data breaches at businesses or other organizations even more serious. A patchwork of state laws currently governs requirements for reporting data breaches. Congress should pass legislation that provides for a single national data breach standard, along the lines of the Administration’s 2011 Cybersecurity legislative proposal.
Extend Privacy Protections to non-U.S. Persons. Privacy is a worldwide value that should be reflected in how the federal government handles personally identifiable information about non-U.S. citizens. The Office of Management and Budget should work with departments and agencies to apply the Privacy Act of 1974 to non-U.S. persons where practicable, or to establish alternative privacy policies that apply appropriate and meaningful protections to personal information regardless of a person’s nationality.
Ensure Data Collected on Students in School is used for Educational Purposes. Big data and other technological innovations, including new online course platforms that provide students real time feedback, promise to transform education by personalizing learning. At the same time, the federal government must ensure educational data linked to individual students gathered in school is used for educational purposes, and protect students against their data being shared or used inappropriately.
Expand Technical Expertise to Stop Discrimination. The detailed personal profiles held about many consumers, combined with automated, algorithm-driven decision-making, could lead—intentionally or inadvertently—to discriminatory outcomes, or what some are already calling “digital redlining.” The federal government’s lead civil rights and consumer protection agencies should expand their technical expertise to be able to identify practices and outcomes facilitated by big data analytics that have a discriminatory impact on protected classes, and develop a plan for investigating and resolving violations of law.
Amend the Electronic Communications Privacy Act. The laws that govern protections afforded to our communications were written before email, the internet, and cloud computing came into wide use. Congress should amend ECPA to ensure the standard of protection for online, digital content is consistent with that afforded in the physical world—including by removing archaic distinctions between email left unread or over a certain age.
We also identify several broader areas ripe for further study, debate, and public engagement that, collectively, we hope will spark a national conversation about how to harness big data for the public good. We conclude that we must find a way to preserve our privacy values in both the domestic and international marketplace. We urgently need to build capacity in the federal government to identify and prevent new modes of discrimination that could be enabled by big data. We must ensure that law enforcement agencies using big data technologies do so responsibly, and that our fundamental privacy rights remain protected. Finally, we recognize that data is a valuable public resource, and call for continuing the Administration’s efforts to open more government data sources and make investments in research and technology.
While big data presents new challenges, it also presents immense opportunities to improve lives, and the United States is perhaps better suited to lead this conversation than any other nation on earth. Our innovative spirit, technological know-how, and deep commitment to values of privacy, fairness, non-discrimination, and self-determination will help us harness the benefits of the big data revolution and encourage the free flow of information while working with our international partners to protect personal privacy. This review is but one piece of that effort, and we hope it spurs a conversation about big data across the country and around the world.
Read the Big Data Report.
See the fact sheet from today’s announcement.

Saving Big Data from Big Mouths


Cesar A. Hidalgo in Scientific American: “It has become fashionable to bad-mouth big data. In recent weeks the New York Times, Financial Times, Wired and other outlets have all run pieces bashing this new technological movement. To be fair, many of the critiques have a point: There has been a lot of hype about big data and it is important not to inflate our expectations about what it can do.
But little of this hype has come from the actual people working with large data sets. Instead, it has come from people who see “big data” as a buzzword and a marketing opportunity—consultants, event organizers and opportunistic academics looking for their 15 minutes of fame.
Most of the recent criticism, however, has been weak and misguided. Naysayers have been attacking straw men, focusing on worst practices, post hoc failures and secondary sources. The common theme has been to a great extent obvious: “Correlation does not imply causation,” and “data has biases.”
Critics of big data have been making three important mistakes:
First, they have misunderstood big data, framing it narrowly as a failed revolution in social science hypothesis testing. In doing so they ignore areas where big data has made substantial progress, such as data-rich Web sites, information visualization and machine learning. If there is one group of big-data practitioners that the critics should worship, they are the big-data engineers building the social media sites where their platitudes spread. Engineering a site rich in data, like Facebook, YouTube, Vimeo or Twitter, is extremely challenging. These sites are possible because of advances made quietly over the past five years, including improvements in database technologies and Web development frameworks.
Big data has also contributed to machine learning and computer vision. Thanks to big data, Facebook algorithms can now match faces almost as accurately as humans do.
And detractors have overlooked big data’s role in the proliferation of computational design, data journalism and new forms of artistic expression. Computational artists, journalists and designers—the kinds of people who congregate at meetings like Eyeo—are using huge sets of data to give us online experiences that are unlike anything we experienced in paper. If we step away from hypothesis testing, we find that big data has made big contributions.
The second mistake critics often make is to confuse the limitations of prototypes with fatal flaws. This is something I have experienced often. For example, in Place Pulse—a project I created with my team at the M.I.T. Media Lab—we used Google Street View images and crowdsourced visual surveys to map people’s perception of a city’s safety and wealth. The original method was rife with limitations that we dutifully acknowledged in our paper. Google Street View images are taken at arbitrary times of the day and showed cities from the perspective of a car. City boundaries were also arbitrary. To overcome these limitations, however, we needed a first data set. Producing that first limited version of Place Pulse was a necessary part of the process of making a working prototype.
A year has passed since we published Place Pulse’s first data set. Now, thanks to our focus on “making,” we have computer vision and machine-learning algorithms that we can use to correct for some of these easy-to-spot distortions. Making is allowing us to correct for time of the day and dynamically define urban boundaries. Also, we are collecting new data to extend the method to new geographical boundaries.
Those who fail to understand that the process of making is iterative are in danger of being too quick to condemn promising technologies. In 1920 the New York Times published a prediction that a rocket would never be able to leave the atmosphere. Similarly erroneous predictions were made about the car or, more recently, about the iPhone’s market share. In 1969 the Times had to publish a retraction of their 1920 claim. What similar retractions will need to be published in the year 2069?
Finally, the doubters have relied too heavily on secondary sources. For instance, they made a piñata out of the 2008 Wired piece by Chris Anderson framing big data as “the end of theory.” Others have criticized projects for claims that their creators never made. A couple of weeks ago, for example, Gary Marcus and Ernest Davis published a piece on big data in the Times. There they wrote about another of my group’s projects, Pantheon, which is an effort to collect, visualize and analyze data on historical cultural production. Marcus and Davis wrote that Pantheon “suggests a misleading degree of scientific precision.” As an author of the project, I have been unable to find where I made such a claim. Pantheon’s method section clearly states that: “Pantheon will always be—by construction—an incomplete resource.” That same section contains a long list of limitations and caveats as well as the statement that “we interpret this data set narrowly, as the view of global cultural production that emerges from the multilingual expression of historical figures in Wikipedia as of May 2013.”
Bickering is easy, but it is not of much help. So I invite the critics of big data to lead by example. Stop writing op-eds and start developing tools that improve on the state of the art. They are much appreciated. What we need are projects that are worth imitating and that we can build on, not obvious advice such as “correlation does not imply causation.” After all, true progress is not something that is written, but made.”

Twitter Can Now Predict Crime, and This Raises Serious Questions


Motherboard: “Police departments in New York City may soon be using geo-tagged tweets to predict crime. It sounds like a far-fetched sci-fi scenario a la Minority Report, but when I contacted Dr. Matthew Gerber, the University of Virginia researcher behind the technology, he explained that the system is far more mathematical than metaphysical.
The system Gerber has devised is an amalgam of both old and new techniques. Currently, many police departments target hot spots for criminal activity based on actual occurrences of crime. This approach, called kernel density estimation (KDE), involves pairing a historical crime record with a geographic location and using a probability function to calculate the possibility of future crimes occurring in that area. While KDE is a serviceable approach to anticipating crime, it pales in comparison to the dynamism of Twitter’s real-time data stream, according to Dr. Gerber’s research paper “Predicting Crime Using Twitter and Kernel Density Estimation”.
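As a rough illustration of that baseline technique (the coordinates below are invented, and this is not the code from Gerber's paper), a kernel density estimate turns past incident locations into a smooth surface whose peaks mark likely hot spots:
```python
# Rough illustration of the KDE baseline: smooth historical incident
# locations into a density surface and rank candidate cells by risk.
# Coordinates here are made up for illustration.
import numpy as np
from scipy.stats import gaussian_kde

# Past incident locations (longitude, latitude).
incidents = np.array([
    [-87.62, 41.88], [-87.63, 41.88], [-87.62, 41.89],
    [-87.70, 41.95], [-87.61, 41.87], [-87.63, 41.89],
])

kde = gaussian_kde(incidents.T)   # density estimated from past incidents

# Evaluate the surface on a grid of candidate cells and rank them.
lons = np.linspace(-87.72, -87.60, 25)
lats = np.linspace(41.85, 41.97, 25)
grid = np.array(np.meshgrid(lons, lats)).reshape(2, -1)
risk = kde(grid)

top = np.argsort(risk)[::-1][:5]
for lon, lat, r in zip(grid[0, top], grid[1, top], risk[top]):
    print(f"candidate hot spot near ({lat:.2f}, {lon:.2f}) score={r:.1f}")
```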
Dr. Gerber’s approach is similar to KDE, but deals in the ethereal realm of data and language, not paperwork. The system involves mapping the Twitter environment, much like how police currently map the physical environment with KDE. The big difference is that Gerber is looking at what people are talking about in real time, as well as what they do after the fact, and seeing how well they match up. The algorithms look for certain language that is likely to indicate the imminent occurrence of a crime in the area, Gerber says. “We might observe people talking about going out, getting drunk, going to bars, sporting events, and so on—we know that these sort of events correlate with crime, and that’s what the models are picking up on.”
Once this data is collected, the GPS tags in tweets allow Gerber and his team to pin them to a virtual map and outline hot spots for potential crime. However, everyone who tweets about hitting the club later isn’t necessarily going to commit a crime. Gerber tests the accuracy of his approach by comparing Twitter-based KDE predictions with traditional KDE predictions based on police data alone. The big question is, does it work? For Gerber, the answer is a firm “sometimes.” “It helps for some, and it hurts for others,” he says.
According to the study’s results, Twitter-based KDE analysis yielded improvements in predictive accuracy over traditional KDE for stalking, criminal damage, and gambling. Arson, kidnapping, and intimidation, on the other hand, showed a decrease in accuracy from traditional KDE analysis. It’s not clear why these crimes are harder to predict using Twitter, but the study notes that the issue may lie with the kind of language used on Twitter, which is characterized by shorthand and informal language that can be difficult for algorithms to parse.
This kind of approach to high-tech crime prevention brings up the familiar debate over privacy and the use of users’ data for purposes they didn’t explicitly agree to. The case becomes especially sensitive when data will be used by police to track down criminals. On this point, though he acknowledges post-Snowden societal skepticism regarding data harvesting for state purposes, Gerber is indifferent. “People sign up to have their tweets GPS tagged. It’s an opt-in thing, and if you don’t do it, your tweets won’t be collected in this way,” he says. “Twitter is a public service, and I think people are pretty aware of that.”…

The California Report Card


The California Report Card (CRC) is an online platform developed by the CITRIS Data and Democracy Initiative at UC Berkeley and Lt. Governor Gavin Newsom that explores how smartphones and networks can enhance communication between the public and government leaders. The California Report Card allows visitors to grade issues facing California and to suggest issues for future report cards.

The CRC is a mobile-optimized web application that allows participants to advise the state government on timely policy issues.  We are exploring how technology can streamline and structure input from the public to elected officials, to provide them with timely feedback on the changing opinions and priorities of their constituents.

Version 1.0 of the CRC was launched in California on 28 January 2014. Since then, over 7000 people from almost every county have assigned over 20,000 grades to the State of California and suggested issues for the next report card.
Lt. Governor Gavin Newsom: “The California Report Card is a new way for me to keep an ear to the ground.  This new app/website makes it easy for Californians to assign grades and suggest pressing issues that merit our attention.  In the first few weeks, participants conveyed that they approve of our rollout of Obamacare but are very concerned about the future of California schools and universities.  I’m also gaining insights on issues ranging from speed limits to fracking to disaster preparedness.”
“This platform allows us to have our voices heard. The ability to review and grade what others suggest is important. It enables us and elected officials to hear directly how Californians feel.” – Matt Harris, Truck Driver, Ione, CA
“This is the first system that lets us directly express our feelings to government leaders.  I also really enjoy reading and grading the suggestions from other participants.”  – Patricia Ellis Pasko, Senior Care Giver, Apple Valley, CA
“Everyone knows that report cards can motivate learning by providing quantitative feedback on strengths and weaknesses.  Similarly, the California Report Card has potential to motivate Californians and their leaders to learn from each other about timely issues.  As researchers, the patterns of participation and how they vary over time and across geography will help us learn how to design future platforms.” – Prof. Ken Goldberg, UC Berkeley.
It takes only two minutes and works on all screens (best on mobile phones held vertically), just click “Participate”.
Anyone can participate by taking a few minutes to assign grades to the State of California on issues such as: Healthcare, Education, Marriage Equality, Immigrant Rights, and Marijuana Decriminalization. Participants are also invited to enter an online “cafe” to propose issues that they’d like to see included in the next report card (version 2.0 will come out later this Spring).
Lt. Gov. Gavin Newsom and UC Berkeley Professor Ken Goldberg reviewed the data and lessons learned from version 1.0 in a public forum at UC Berkeley on 20 March 2014 that included participants who actively contributed to identifying the most important issues for version 2.0. The event can be viewed at http://bit.ly/1kv6523.
We offer community outreach programs/workshops to train local leaders on how to use the CRC and how to reach and engage under-represented groups (low-income, rural, persons with disabilities, etc.). If you are interested in participating in or hosting a workshop, please contact Brandie Nonnecke at nonnecke@citris-uc.org”