The big cost of using big data in elections


Michael McDonald, Peter Licari and Lia Merivaki in the Washington Post: “In modern campaigns, buzzwords like “microtargeting” and “big data” are often bandied about as essential to victory. These terms refer to the practice of analyzing (or “microtargeting”) millions of voter registration records (“big data”) to predict who will vote and for whom.

If you’ve ever gotten a message from a campaign, there’s a good chance you’ve been microtargeted. Serious campaigns use microtargeting to persuade voters through mailings, phone calls, knocking on doors, and — in our increasingly connected world — social media.

But the big data that fuels such efforts comes at a big price, which can create a serious barrier to entry for candidates and groups seeking to participate in elections — that is, if they are allowed to buy the data at all.

When we asked state election officials about prices and restrictions on who can use their voter registration files, we learned that the rules are unsettlingly arbitrary.

Contrast Arizona and Washington. Arizona sells its statewide voter file for an estimated $32,500, while Washington gives its file away for free. Before jumping to the conclusion that this is a red-state/blue-state thing, consider that Oklahoma gives its file away, too.

A number of states base their prices on a per-record formula, which can massively drive up the price despite the fact that files are often delivered electronically. Alabama sells its records for 1 cent per voter, which yields a charge of approximately $30,000 for the lot. Seriously, in this day and age, who prices an electronic database by the record?
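As a back-of-the-envelope sketch of what per-record pricing implies, the arithmetic can be written out in a few lines; the 1-cent rate comes from the article, while the registration count is an assumed round figure, not an official total.

```python
# Back-of-the-envelope cost of a per-record-priced voter file. The 1-cent
# rate is from the article; the registration count is an assumed round
# figure, not an official total.
PRICE_PER_RECORD = 0.01        # Alabama: 1 cent per voter record
REGISTERED_VOTERS = 3_000_000  # assumed statewide registration count

cost = PRICE_PER_RECORD * REGISTERED_VOTERS
print(f"${cost:,.0f}")  # $30,000 -- matching the figure quoted above
```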

Some states will give more data to candidates than to outside groups. Delaware will provide phone numbers to candidates but not to nonprofit organizations doing nonpartisan voter mobilization.

In some states, the voter file is not even available to the general public. States such as South Carolina and Maryland permit access only to residents who are registered voters. States including Kentucky and North Dakota grant access only to campaigns, parties and other political organizations.

We estimate that it would cost roughly $140,000 for an independent presidential campaign or national nonprofit organization to compile a national voter file, and this would not be a one-time cost. Voter lists frequently change as voters are added and deleted.

Guess who most benefits from all the administrative chaos? Political parties and their candidates. Not only are they capable of raising the vast amounts of money needed to purchase the data, but, adding insult to injury, they sometimes don’t even have to. Some states literally bequeath the data to parties at no cost. Alabama goes so far as to give parties a free statewide copy for every election.

Who is hurt by this? Independent candidates and nonprofit organizations that want to run national campaigns but don’t have deep pockets. If someone like Donald Trump launched an independent presidential run, he could buy the necessary data without much difficulty. But a nonprofit focused on mobilizing low-income voters could be stretched thin….(More)”

Infographic: World Statistics Day 2015


Press Release: “The U.S. Census Bureau will join statistical organizations throughout the world to celebrate the second World Statistics Day on Oct. 20, 2015.

This interactive infographic is a compilation of news graphics that highlights the wide range of ways the Census Bureau supports this year’s theme of “Better data. Better lives.”

The Census Bureau uses statistics to provide critical and timely information about the people, places and economy of the United States.

For more information on World Statistics Day 2015, please see the links provided below.

Where the right to know comes from


Michael Schudson in Columbia Journalism Review: “…what began as an effort to keep the executive under check by the Congress became a law that helped journalists, historians, and ordinary citizens monitor federal agencies. Nearly 50 years later, it may all sound easy and obvious. It was neither. And this burst of political engagement is rarely, if ever, mentioned by journalists themselves as an exception to normal “acts of journalism.”

But how did it happen at all? In 1948, the American Society of Newspaper Editors set up its first-ever committee on government restrictions on the freedom to gather and publish news. It was called the “Committee on World Freedom of Information”—a name that implied that limiting journalists’ access or straightforward censorship was a problem in other countries. The committee protested Argentina’s restrictions on what US correspondents could report, censorship in Guatemala, and—closer to home—US military censorship in occupied Japan.

When the ASNE committee turned to the problem of secrecy in the US government in the early 1950s, it chose to actively criticize such secrecy, but not to “become a legislative committee.” Even in 1953, when ASNE leaders realized that significant progress on government secrecy might require federal legislation, they concluded that “watching all such legislation” would be an important task for the committee, but did not suggest taking a public position.

Representative Moss changed this. Moss was a small businessman who had served several terms in the California legislature before his election to Congress in 1952. During his first term, he requested some data from the Civil Service Commission about dismissals of government employees on suspicion of disloyalty. The commission flatly turned him down. “My experience in Washington quickly proved that you had a hell of a time getting any information,” Moss recalled. Two years later, a newly re-elected Moss became chair of a House subcommittee on government information….(More)”

Can non-Western democracy help to foster political transformation?


Richard Youngs at Open Democracy: “…many non-Western countries are showing signs of a newly-vibrant civic politics, organized in ways that are not centered on NGOs but on more loosely structured social movements in participatory forms of democracy where active citizenship is crucial—not just structured or formal, representative democratic institutions. Bolivia is a good example.

Many Western governments were skeptical about President Evo Morales’ political project, fearing that he would prove to be just as authoritarian as Hugo Chavez in Venezuela. But some Western donors (including Germany and the European Union) have already increased their support to indigenous social movements in Bolivia because they’ve become a vital channel of influence and accountability between government and society.

Secondly, it’s clear that the political dimensions of democracy will be undermined if economic conditions and inequalities are getting worse, so democracy promotion efforts need to be delinked from pressures to adopt neo-liberal economic policies. Western interests need to do more to prove that they are not supporting democracy primarily as a means to further their economic interest in ‘free markets.’ That’s why the European Union is supporting a growing number of projects designed to build up social insurance schemes during the early phases of democratic transitions. European diplomats, at least, say that they see themselves as supporters of social and economic democracy.

Donors are becoming more willing to support the role of labor unions in pro-democracy coalition-building; and to protect labor standards as a crucial part of political transitions in countries as diverse as Tunisia, Georgia, China, Egypt and Ecuador. But they should do more to assess how the embedded structures of economic power can undermine the quality of democratic processes. Support for civil society organizations that are keen on exploring heterodox economic models should also be stepped up.

Thirdly, non-Western structures and traditions can help to reduce violent conflict successfully. Tribal chiefs, traditional decision-making circles and customary dispute resolution mechanisms are commonplace in Africa and Asia, and have much to teach their counterparts in the West. In Afghanistan, for example, international organizations realized that the standard institutions of Western liberal democracy were gaining little traction, and were probably deepening rather than healing pre-existing divisions, so they’ve started to support local-level deliberative forums instead.

Something similar is happening in the Balkans, where the United States and the European Union are giving priority to locally tailored, consensual power-sharing arrangements. The United Nations is working with customary justice systems in Somalia. And in South Sudan and Kenya, donors have worked with tribal chiefs and supported traditional authorities to promote a better understanding of human rights and gender justice issues. These forms of power-sharing and ‘consensual communitarianism’ can be quite effective in protecting minorities while also encouraging dialogue and deliberation.

As these brief examples show, different countries can both offer and receive ideas about democratic transformation regardless of geography, though this is never straightforward. It involves finding a balance between defending genuinely-universal norms on the one hand, and encouraging democratic experimentation on the other. This is a thin line to walk, and it requires, for example, recognition that the basic precepts of liberal democracy are not synonymous with what can be seen as an amoral individualism, particularly in highly religious communities.

Pro-democracy reformers and civic groups in non-Western countries often take international organizations to task for pushing too hard on questions of ‘Western liberal rights’ rather than supporting variations to the standard, individualist template, even where tribal structures and traditional conflict-resolution mechanisms work reasonably well. This has led to resistance against international support in places as diverse as Libya, Mali and Pakistan…..

Academic critical theorists argue that Western democracy promoters fail to take alternative models of democracy on board because they would endanger their own geostrategic and economic interests….(More)”

How the USGS uses Twitter data to track earthquakes


Twitter Blog: “After the disastrous Sichuan earthquake in 2008, people turned to Twitter to share firsthand information about the earthquake. What amazed many was the impression that Twitter was faster at reporting the earthquake than the U.S. Geological Survey (USGS), the official government organization in charge of tracking such events.

This Twitter activity wasn’t a big surprise to the USGS. The USGS National Earthquake Information Center (NEIC) processes data from about 2,000 real-time earthquake sensors, with the majority based in the United States. That leaves a lot of empty space in the world with no sensors. On the other hand, there are hundreds of millions of people using Twitter who can report earthquakes. At first, the USGS staff was a bit skeptical that Twitter could be used as a detection system for earthquakes – but when they looked into it, they were surprised at the effectiveness of Twitter data for detection.

USGS staffers Paul Earle, a seismologist, and Michelle Guy, a software developer, teamed up to look at how Twitter data could be used for earthquake detection and verification. Using Twitter’s Public API, they applied the same time-series event-detection method they use for seismic data. This gave them a baseline for earthquake-related chatter, but they decided to dig in even further. They found that people Tweeting about actual earthquakes kept their Tweets really short, even just to ask, “earthquake?” Concluding that people who are experiencing earthquakes aren’t very chatty, they started filtering out Tweets with more than seven words. They also recognized that people sharing links or the size of the earthquake were significantly less likely to be offering firsthand reports, so they filtered out any Tweets sharing a link or a number. Ultimately, this filtered stream proved highly effective at determining when earthquakes occurred globally.
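The filtering heuristics described above can be sketched in a few lines of Python; the function name and regular expressions here are illustrative assumptions based on this description, not the USGS implementation.

```python
import re

def is_candidate_report(tweet_text, max_words=7):
    """Heuristic filter for likely firsthand earthquake reports, following
    the description above: short tweets with no links and no numbers.
    Names and patterns are illustrative, not the USGS code."""
    if re.search(r"https?://|www\.", tweet_text):
        return False  # shared links suggest secondhand news
    if re.search(r"\d", tweet_text):
        return False  # magnitudes and figures suggest reporting, not experience
    return len(tweet_text.split()) <= max_words

print(is_candidate_report("earthquake?"))                        # True
print(is_candidate_report("M6.2 quake near Santiago http://x"))  # False
```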

USGS Modeling Twitter Data to Detect Earthquakes

While I was at the USGS office in Golden, Colo., interviewing Michelle and Paul, three earthquakes happened in a relatively short time. Using Twitter data, their system was able to pick up on an aftershock in Chile within one minute and 20 seconds – and it only took 14 Tweets from the filtered stream to trigger an email alert. The other two earthquakes, off Easter Island and Indonesia, weren’t picked up because they were not widely felt…..
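A minimal sketch of how a count-over-threshold trigger on the filtered stream might work: the 14-tweet figure comes from the article, but the sliding-window length and the structure of the detector are assumptions for illustration.

```python
from collections import deque

def make_detector(window_seconds=120, threshold=14):
    """Fire an alert when enough filtered tweets arrive in a short window.
    The 14-tweet threshold echoes the article; the window length is an
    illustrative assumption, not a USGS parameter."""
    times = deque()

    def ingest(ts):
        # ts: arrival time (seconds) of a tweet that passed the filter
        times.append(ts)
        while times and ts - times[0] > window_seconds:
            times.popleft()
        return len(times) >= threshold  # True -> trigger an alert email

    return ingest

detect = make_detector()
alerts = [detect(t) for t in range(0, 28, 2)]  # one tweet every 2 seconds
print(alerts[-1])  # True: 14 reports arrived well inside the window
```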

The USGS monitors for earthquakes in many languages, and the words used can be a clue as to the magnitude and location of the earthquake. Chile has two words for earthquakes: terremoto and temblor; terremoto is used to indicate a bigger quake. This one in Chile started with people asking if it was a terremoto, with others concluding that it was a temblor.

As the USGS team notes, Twitter data augments their own detection work on felt earthquakes. If they’re getting reports of an earthquake in a populated area but no Tweets from there, that’s a good indicator to them that it’s a false alarm. It’s also very cost effective for the USGS, because they use Twitter’s Public API and open-source software such as Kibana and ElasticSearch to help determine when earthquakes occur….(More)”

As a Start to NYC Prison Reform, Jail Data Will Be Made Public


Brentin Mock at CityLab: “…In New York City, 40 percent of the jailed population are there because they couldn’t afford bail—most of them for nonviolent drug crimes. The city spends $42 million on average annually incarcerating non-felony defendants….

Wednesday, NYC Mayor Bill de Blasio signed into law legislation aimed at helping correct these bail problems, providing inmates a bill of rights for when they’re detained and addressing other problems that lead to overstuffing city jails with poor people of color.

The omnibus package of criminal justice reform bills will require the city to produce better accounting of how many people are in city jails, what their average incarceration time is while waiting for trial, the average bail amounts imposed on defendants, and a whole host of other data points on incarceration. Under the new legislation, the city will have to release reports quarterly and semi-annually to the public—much of it from data now sheltered within the city’s Department of Corrections.

“This is bringing sunshine to information that is already being looked at internally, but is better off being public data,” New York City council member Helen Rosenthal tells CityLab. “We can better understand what policies we need to change if we have the data to understand what’s going on in the system.”…

The city passed a package of transparency bills last month that focused on Rikers, but the legislation passed Wednesday will focus on the city’s courts and jails system as a whole….(More)”

Anxieties of Democracy


Debate at the Boston Review opened by Ira Katznelson: “…..Across the range of established democracies, we see skepticism bordering on cynicism about whether parliamentary governments can successfully address pressing domestic and global challenges. These doubts about representative democracy speak to both its fairness and its ability to make good policy.

Since the late eighteenth century, liberal constitutional regimes have recurrently collided with forms of autocratic rule—including fascism and communism—that claim moral superiority and greater efficacy. Today, there is no formal autocratic alternative competing with democracy for public allegiance. Instead, two other concerns characterize current debates. First, there is a sense that constitutional democratic forms, procedures, and practices are softening in the face of allegedly more authentic and more efficacious types of political participation—those that take place outside representative institutions and seem closer to the people. There is also widespread anxiety that national borders no longer define a zone of security, a place more or less safe from violent threats and insulated from rules and conditions established by transnational institutions and seemingly inexorable global processes.

These are recent anxieties. One rarely heard them voiced in liberal democracies when, in 1989, Francis Fukuyama designated the triumph of free regimes and free markets “the end of history.” Fukuyama described “the universalization of Western liberal democracy as the final form of human government,” a “victory of liberalism” in “the realm of ideas and consciousness,” even if “as yet incomplete in the real or material world.” Tellingly, the disruption of this seemingly irresistible trend has recently prompted him to ruminate on the brittleness of democratic institutions across the globe.

Perhaps today’s representative democracies—the ones that do not appear to be candidates for collapse or supersession—are merely confronting ephemeral worries. But the challenge seems starker: a profound crisis of moral legitimacy, practical capacity, and institutional sustainability….(More)


What we can learn from the failure of Google Flu Trends


David Lazer and Ryan Kennedy at Wired: “….The issue of using big data for the common good is far more general than Google—which deserves credit, after all, for offering the occasional peek at their data. These records exist because of a compact between individual consumers and the corporation. The legalese of that compact is typically obscure (how many people carefully read terms and conditions?), but the essential bargain is that the individual gets some service, and the corporation gets some data.

What is left out of that bargain is the public interest. Corporations and consumers are part of a broader society, and many of these big data archives offer insights that could benefit us all. As Eric Schmidt, CEO of Google, has said, “We must remember that technology remains a tool of humanity.” How can we, and corporate giants, then use these big data archives as a tool to serve humanity?

Google’s sequel to GFT, done right, could serve as a model for collaboration around big data for the public good. Google is making flu-related search data available to the CDC as well as select research groups. A key question going forward will be whether Google works with these groups to improve the methodology underlying GFT. Future versions should, for example, continually update the fit of the data to flu prevalence—otherwise, the value of the data stream will rapidly decay.

This is just an example, however, of the general challenge of how to build models of collaboration amongst industry, government, academics, and general do-gooders to use big data archives to produce insights for the public good. This came to the fore with the struggle (and delay) to find a way to appropriately share mobile phone data in west Africa during the Ebola epidemic (mobile phone data are likely the best tool for understanding human—and thus Ebola—movement). Companies need to develop efforts to share data for the public good in a fashion that respects individual privacy.

There is not going to be a single solution to this issue, but for starters, we are pushing for a “big data” repository in Boston to allow holders of sensitive big data to share those collections with researchers while keeping them totally secure. The UN has its Global Pulse initiative, setting up collaborative data repositories around the world. Flowminder, based in Sweden, is a nonprofit dedicated to gathering mobile phone data that could help in response to disasters. But these are still small, incipient, and fragile efforts.

The question going forward now is how to build on and strengthen these efforts, while still guarding the privacy of individuals and the proprietary interests of the holders of big data….(More)”

Harnessing the Data Revolution for Sustainable Development


US State Department Fact Sheet on “U.S. Government Commitments and Collaboration with the Global Partnership for Sustainable Development Data”: “On September 27, 2015, the member states of the United Nations agreed to a set of Sustainable Development Goals (Global Goals) that define a common agenda to achieve inclusive growth, end poverty, and protect the environment by 2030. The Global Goals build on tremendous development gains made over the past decade, particularly in low- and middle-income countries, and set actionable steps with measurable indicators to drive progress. The availability and use of high quality data is essential to measuring and achieving the Global Goals. By harnessing the power of technology, mobilizing new and open data sources, and partnering across sectors, we will achieve these goals faster and make their progress more transparent.

Harnessing the data revolution is a critical enabler of the global goals—not only to monitor progress, but also to inclusively engage stakeholders at all levels – local, regional, national, global—to advance evidence-based policies and programs to reach those who need it most. Data can show us where girls are at greatest risk of violence so we can better prevent it; where forests are being destroyed in real-time so we can protect them; and where HIV/AIDS is enduring so we can focus our efforts and finish the fight. Data can catalyze private investment; build modern and inclusive economies; and support transparent and effective investment of resources for social good…..

The Global Partnership for Sustainable Development Data (Global Data Partnership), launched on the sidelines of the 70th United Nations General Assembly, is mobilizing a range of data producers and users—including governments, companies, civil society, data scientists, and international organizations—to harness the data revolution to achieve and measure the Global Goals. Working together, signatories to the Global Data Partnership will address the barriers to accessing and using development data, delivering outcomes that no single stakeholder can achieve working alone….The United States, through the U.S. President’s Emergency Plan for AIDS Relief (PEPFAR), is joining a consortium of funders to seed this initiative. The U.S. Government has many initiatives that are harnessing the data revolution for impact domestically and internationally. Highlights of our international efforts are found below:

Health and Gender

Country Data Collaboratives for Local Impact – PEPFAR and the Millennium Challenge Corporation (MCC) are partnering to invest $21.8 million in Country Data Collaboratives for Local Impact in sub-Saharan Africa that will use data on HIV/AIDS, global health, gender equality, and economic growth to improve programs and policies. Initially, the Country Data Collaboratives will align with and support the objectives of DREAMS, a PEPFAR, Bill & Melinda Gates Foundation, and Girl Effect partnership to reduce new HIV infections among adolescent girls and young women in high-burden areas.

Measurement and Accountability for Results in Health (MA4Health) Collaborative – USAID is partnering with the World Health Organization, the World Bank, and over 20 other agencies, countries, and civil society organizations to establish the MA4Health Collaborative, a multi-stakeholder partnership focused on reducing fragmentation and better aligning support to country health-system performance and accountability. The Collaborative will provide a vehicle to strengthen country-led health information platforms and accountability systems by improving data and increasing capacity for better decision-making; facilitating greater technical collaboration and joint investments; and developing international standards and tools for better information and accountability. In September 2015, partners agreed to a set of common strategic and operational principles, including a strong focus on 3–4 pathfinder countries where all partners will initially come together to support country-led monitoring and accountability platforms. Global actions will focus on promoting open data, establishing common norms and standards, and monitoring progress on data and accountability for the Global Goals. A more detailed operational plan will be developed through the end of the year, and implementation will start on January 1, 2016.

Data2X: Closing the Gender Gap – Data2X is a platform for partners to work together to identify innovative sources of data, including “big data,” that can provide an evidence base to guide development policy and investment on gender data. As part of its commitment to Data2X—an initiative of the United Nations Foundation, Hewlett Foundation, Clinton Foundation, and Bill & Melinda Gates Foundation—PEPFAR and the Millennium Challenge Corporation (MCC) are working with partners to sponsor an open data challenge to incentivize the use of gender data to improve gender policy and practice….(More)”

See also: Data matters: the Global Partnership for Sustainable Development Data. Speech by UK International Development Secretary Justine Greening at the launch of the Global Partnership for Sustainable Development Data.

Researchers wrestle with a privacy problem


Erika Check Hayden at Nature: “The data contained in tax returns, health and welfare records could be a gold mine for scientists — but only if they can protect people’s identities….In 2011, six US economists tackled a question at the heart of education policy: how much does great teaching help children in the long run?

They started with the records of more than 11,500 Tennessee schoolchildren who, as part of an experiment in the 1980s, had been randomly assigned to high- and average-quality teachers between the ages of five and eight. Then they gauged the children’s earnings as adults from federal tax returns filed in the 2000s. The analysis showed that the benefits of a good early education last for decades: each year of better teaching in childhood boosted an individual’s annual earnings by some 3.5% on average. Other data showed the same individuals besting their peers on measures such as university attendance, retirement savings, marriage rates and home ownership.

The economists’ work was widely hailed in education-policy circles, and US President Barack Obama cited it in his 2012 State of the Union address when he called for more investment in teacher training.

But for many social scientists, the most impressive thing was that the authors had been able to examine US federal tax returns: a closely guarded data set that was then available to researchers only with tight restrictions. This has made the study an emblem for both the challenges and the enormous potential power of ‘administrative data’ — information collected during routine provision of services, including tax returns, records of welfare benefits, data on visits to doctors and hospitals, and criminal records. Unlike Internet searches, social-media posts and the rest of the digital trails that people establish in their daily lives, administrative data cover entire populations with minimal self-selection effects: in the US census, for example, everyone sampled is required by law to respond and tell the truth.

This puts administrative data sets at the frontier of social science, says John Friedman, an economist at Brown University in Providence, Rhode Island, and one of the lead authors of the education study. “They allow researchers to not just get at old questions in a new way,” he says, “but to come at problems that were completely impossible before.”….

But there is also concern that the rush to use these data could pose new threats to citizens’ privacy. “The types of protections that we’re used to thinking about have been based on the twin pillars of anonymity and informed consent, and neither of those hold in this new world,” says Julia Lane, an economist at New York University. In 2013, for instance, researchers showed that they could uncover the identities of supposedly anonymous participants in a genetic study simply by cross-referencing their data with publicly available genealogical information.
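The cross-referencing risk described above can be illustrated with entirely made-up records: an “anonymous” study table is joined to a public registry on shared quasi-identifiers, and a unique match reveals an identity.

```python
# Toy linkage attack on entirely made-up records: an "anonymous" study
# table is joined to a public registry on quasi-identifiers.
anonymous_study = [
    {"zip": "02139", "birth_year": 1971, "sex": "F", "diagnosis": "X"},
]
public_registry = [
    {"name": "Jane Doe", "zip": "02139", "birth_year": 1971, "sex": "F"},
    {"name": "John Roe", "zip": "02139", "birth_year": 1985, "sex": "M"},
]

KEYS = ("zip", "birth_year", "sex")
identified = {}
for record in anonymous_study:
    matches = [p["name"] for p in public_registry
               if all(p[k] == record[k] for k in KEYS)]
    if len(matches) == 1:  # quasi-identifiers pin down exactly one person
        identified[matches[0]] = record["diagnosis"]

print(identified)  # {'Jane Doe': 'X'}
```

With real data the joining columns are often richer (genealogical entries, voter rolls, commercial databases), which is why anonymity alone is a weak protection.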

Many people are looking for ways to address these concerns without inhibiting research. Suggested solutions include policy measures, such as an international code of conduct for data privacy, and technical methods that allow the use of the data while protecting privacy. Crucially, notes Lane, although preserving privacy sometimes complicates researchers’ lives, it is necessary to uphold the public trust that makes the work possible.

“Difficulty in access is a feature, not a bug,” she says. “It should be hard to get access to data, but it’s very important that such access be made possible.” Many nations collect administrative data on a massive scale, but only a few, notably in northern Europe, have so far made it easy for researchers to use those data.

In Denmark, for instance, every newborn child is assigned a unique identification number that tracks his or her lifelong interactions with the country’s free health-care system and almost every other government service. In 2002, researchers used data gathered through this identification system to retrospectively analyse the vaccination and health status of almost every child born in the country from 1991 to 1998 — 537,000 in all. At the time, it was the largest study ever to disprove the now-debunked link between measles vaccination and autism.

Other countries have begun to catch up. In 2012, for instance, Britain launched the unified UK Data Service to facilitate research access to data from the country’s census and other surveys. A year later, the service added a new Administrative Data Research Network, which has centres in England, Scotland, Northern Ireland and Wales to provide secure environments for researchers to access anonymized administrative data.

In the United States, the Census Bureau has been expanding its network of Research Data Centers, which currently includes 19 sites around the country at which researchers with the appropriate permissions can access confidential data from the bureau itself, as well as from other agencies. “We’re trying to explore all the available ways that we can expand access to these rich data sets,” says Ron Jarmin, the bureau’s assistant director for research and methodology.

In January, a group of federal agencies, foundations and universities created the Institute for Research on Innovation and Science at the University of Michigan in Ann Arbor to combine university and government data and measure the impact of research spending on economic outcomes. And in July, the US House of Representatives passed a bipartisan bill to study whether the federal government should provide a central clearing house of statistical administrative data.

Yet vast swathes of administrative data are still inaccessible, says George Alter, director of the Inter-university Consortium for Political and Social Research based at the University of Michigan, which serves as a data repository for approximately 760 institutions. “Health systems, social-welfare systems, financial transactions, business records — those things are just not available in most cases because of privacy concerns,” says Alter. “This is a big drag on research.”…

Many researchers argue, however, that there are legitimate scientific uses for such data. Jarmin says that the Census Bureau is exploring the use of data from credit-card companies to monitor economic activity. And researchers funded by the US National Science Foundation are studying how to use public Twitter posts to keep track of trends in phenomena such as unemployment.

 

….Computer scientists and cryptographers are experimenting with technological solutions. One, called differential privacy, adds a small amount of distortion to a data set, so that querying the data gives a roughly accurate result without revealing the identity of the individuals involved. The US Census Bureau uses this approach for its OnTheMap project, which tracks workers’ daily commutes.

….In any case, although synthetic data potentially solve the privacy problem, there are some research applications that cannot tolerate any noise in the data. A good example is the work showing the effect of neighbourhood on earning potential [3], which was carried out by Raj Chetty, an economist at Harvard University in Cambridge, Massachusetts. Chetty needed to track specific individuals to show that the areas in which children live their early lives correlate with their ability to earn more or less than their parents. In subsequent studies [5], Chetty and his colleagues showed that moving children from resource-poor to resource-rich neighbourhoods can boost their earnings in adulthood, proving a causal link.
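The differential-privacy idea mentioned above can be sketched with the standard Laplace mechanism on a count query; the epsilon value and the query are illustrative choices, not the Census Bureau’s actual parameters.

```python
import math
import random

def dp_count(true_count, epsilon=0.1, sensitivity=1):
    """Release a count with Laplace noise of scale sensitivity/epsilon,
    so adding or removing one person barely changes the distribution of
    the released answer. Epsilon here is an illustrative choice."""
    u = random.random() - 0.5  # uniform on (-0.5, 0.5)
    scale = sensitivity / epsilon
    noise = -scale * math.copysign(math.log(1 - 2 * abs(u)), u)
    return true_count + noise

random.seed(0)  # seeded only to make the illustration reproducible
released = dp_count(12_345)
print(round(released))  # close to the true count of 12,345, but not exact
```

Smaller epsilon means stronger privacy but noisier answers, which is exactly the noise-intolerance trade-off the passage above describes.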

Secure multiparty computation is a technique that attempts to address this issue by allowing multiple data holders to analyse parts of the total data set, without revealing the underlying data to each other. Only the results of the analyses are shared….(More)”
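One simple flavor of secure multiparty computation is additive secret sharing, sketched below with made-up salary figures; this is a toy protocol for illustration, not any production MPC system.

```python
import random

MODULUS = 2**61 - 1  # arithmetic is done modulo a large prime

def share(value, n_parties=3):
    """Split a value into additive shares: any n-1 shares alone look
    like random numbers and reveal nothing about the value."""
    shares = [random.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

# Three data holders each secret-share a private figure (made-up salaries)...
salaries = [52_000, 61_000, 48_000]
all_shares = [share(s) for s in salaries]

# ...party i sums the i-th share from every holder, then the partial
# sums are combined; only the total is ever revealed.
partials = [sum(column) % MODULUS for column in zip(*all_shares)]
total = sum(partials) % MODULUS
print(total)  # 161000: the sum, but not any individual salary
```

Each party sees only one random-looking share per input, yet the combined result equals the true sum, matching the description of analyses whose results, but not underlying data, are shared.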