The big cost of using big data in elections


Michael McDonald, Peter Licari and Lia Merivaki in the Washington Post: “In modern campaigns, buzzwords like “microtargeting” and “big data” are often bandied about as essential to victory. These terms refer to the practice of analyzing (or “microtargeting”) millions of voter registration records (“big data”) to predict who will vote and for whom.

If you’ve ever gotten a message from a campaign, there’s a good chance you’ve been microtargeted. Serious campaigns use microtargeting to persuade voters through mailings, phone calls, knocking on doors, and — in our increasingly connected world — social media.

But the big data that fuels such efforts comes at a big price, which can create a serious barrier to entry for candidates and groups seeking to participate in elections — that is, if they are allowed to buy the data at all.

When we asked state election officials about prices and restrictions on who can use their voter registration files, we learned that the rules are unsettlingly arbitrary.

Contrast Arizona and Washington. Arizona sells its statewide voter file for an estimated $32,500, while Washington gives its file away for free. Before jumping to the conclusion that this is a red-state/blue-state thing, consider that Oklahoma gives its file away, too.

A number of states base their prices on a per-record formula, which can massively drive up the price despite the fact that files are often delivered electronically. Alabama sells its records for 1 cent per voter, which yields an approximately $30,000 charge for the lot. Seriously, in this day and age, who prices an electronic database by the record?
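For a back-of-the-envelope sense of how per-record pricing adds up, here is a minimal sketch, assuming roughly 3 million registered voters in Alabama (our assumption for illustration; the article cites only the per-record price and the approximate total):

```python
# Back-of-the-envelope check of per-record pricing.
# The voter count is an assumption for illustration, not a figure from the article.
price_per_record = 0.01          # 1 cent per voter record
registered_voters = 3_000_000    # assumed statewide total
total_cost = price_per_record * registered_voters
print(f"Estimated cost of the statewide file: ${total_cost:,.0f}")  # -> roughly $30,000
```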

Some states will give more data to candidates than to outside groups. Delaware will provide phone numbers to candidates but not to nonprofit organizations doing nonpartisan voter mobilization.

In some states, the voter file is not even available to the general public. States such as South Carolina and Maryland permit access only to residents who are registered voters. States including Kentucky and North Dakota grant access only to campaigns, parties and other political organizations.

We estimate that it would cost roughly $140,000 for an independent presidential campaign or national nonprofit organization to compile a national voter file, and this would not be a one-time cost. Voter lists frequently change as voters are added and deleted.

Guess who most benefits from all the administrative chaos? Political parties and their candidates. Not only are they capable of raising the vast amounts of money needed to purchase the data, but, adding insult to injury, they sometimes don’t even have to. Some states literally bequeath the data to parties at no cost. Alabama goes so far as to give parties a free statewide copy for every election.

Who is hurt by this? Independent candidates and nonprofit organizations that want to run national campaigns but don’t have deep pockets. If someone like Donald Trump launched an independent presidential run, he could buy the necessary data without much difficulty. But a nonprofit focused on mobilizing low-income voters could be stretched thin….(More)”

Big Data and Mass Shootings


Holman W. Jenkins in the Wall Street Journal: “As always, the dots are connected after the fact, when the connecting is easy. …The day may be coming, sooner than we think, when such incidents can be stopped before they get started. A software program alerts police to a social-media posting by an individual of interest in their jurisdiction. An algorithm reminds them why the individual had become a person of interest—a history of mental illness, an episode involving a neighbor. Months earlier, discreet inquiries by police had revealed an unhealthy obsession with weapons—key word, unhealthy. There’s no reason why gun owners, range operators and firearms dealers shouldn’t be a source of information for local police seeking information about who might merit special attention.

Sound scary? Big data exists to find the signal among the noise. Your data is the noise. It’s what computerized systems seek to disregard in their quest for information that actually would be useful to act on. Big data is interested in needles, not hay.

Still don’t trust the government? You’re barking up an outdated tree. Consider the absurdly ancillary debate last year on whether the government should be allowed to hold telephone “metadata” when the government already holds vastly more sensitive data on all of us in the form of tax, medical, legal and census records.

All this seems doubly silly given the copious information about each of us contained in private databases, freely bought and sold by marketers. Bizarre is the idea that Facebook should be able to use our voluntary Facebook postings to decide what we might like to buy, but police shouldn’t use the same information to prevent crime.

Hitachi, the big Japanese company, began testing its crime-prediction software in several unnamed American cities this month. The project, called Hitachi Visualization Predictive Crime Analytics, culls crime records, map and transit data, weather reports, social media and other sources for patterns that might otherwise go unnoticed by police.
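The following is an illustrative sketch only, not Hitachi’s algorithm: it shows the general pattern of pooling heterogeneous sources (here, invented incident records and weather data) by map grid cell and flagging cells with unusually high activity. All names and values are assumptions.

```python
import pandas as pd

# Invented incident records and weather data, keyed by map grid cell and week.
incidents = pd.DataFrame({
    "grid_cell": ["A1", "A1", "B2", "B2", "B2", "C3"],
    "week": [1, 2, 1, 2, 2, 1],
})
weather = pd.DataFrame({"week": [1, 2], "mean_temp_c": [18, 27]})

# Aggregate incident counts per cell and week, then join the weather feature.
counts = (incidents.groupby(["grid_cell", "week"]).size()
          .rename("n_incidents").reset_index()
          .merge(weather, on="week"))

# Naive "pattern" rule: flag any cell-week whose count exceeds the overall mean.
threshold = counts["n_incidents"].mean()
print(counts[counts["n_incidents"] > threshold])
```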

Colorado-based Intrado, working with LexisNexis and Motorola Solutions, already sells police a service that instantly scans legal, business and social-media records for information about persons and circumstances that officers may encounter when responding to a 911 call at a specific address. Hundreds of public safety agencies find the system invaluable, though that didn’t stop the city of Bellingham, Wash., from rejecting it last year on the odd grounds that such software must be guilty of racial profiling.

Big data is changing how police allocate resources and go about fighting crime. …It once was freely asserted that police weren’t supposed to prevent crime, only solve it. But recent research shows investment in policing actually does reduce crime rates—and produces a large positive return measured in dollars and cents. A day will come when failing to connect the dots in advance of a mass shooting won’t be a matter for upturned hands. It will be a matter for serious recrimination…(More)

Nudge 2.0


Philipp Hacker: “This essay is both a review of the excellent book “Nudge and the Law. A European Perspective”, edited by Alberto Alemanno and Anne-Lise Sibony, and an assessment of the major themes and challenges that the behavioural analysis of law will and should face in the immediate future.

The book makes important and novel contributions in a range of topics, both on a theoretical and a substantive level. Regarding theoretical issues, four themes stand out: First, it highlights the differences between the EU and the US nudging environments. Second, it questions the reliance on expertise in rulemaking. Third, it unveils behavioural trade-offs that have too long gone unnoticed in behavioural law and economics. And fourth, it discusses the requirement of the transparency of nudges and the related concept of autonomy. Furthermore, the different authors discuss the impact of behavioural regulation on a number of substantive fields of law: health and lifestyle regulation, privacy law, and the disclosure paradigm in private law.

This paper aims to take some of the book’s insights one step further in order to point at crucial challenges – and opportunities – for the future of the behavioural analysis of law. In recent years, the movement has gained tremendously in breadth and depth. It is now time to make it scientifically even more rigorous, e.g. by openly embracing empirical uncertainty and by moving beyond the neo-classical/behavioural dichotomy. Simultaneously, the field ought to discursively readjust its normative compass. Finally and perhaps most strikingly, however, the power of big data holds the promise of taking behavioural interventions to an entirely new level. If these challenges can be overcome, this paper argues, the intersection between law and behavioural sciences will remain one of the most fruitful approaches to legal analysis in Europe and beyond….(More)”

Data-Driven Innovation: Big Data for Growth and Well-Being


“A new OECD report on data-driven innovation finds that countries could be getting much more out of data analytics in terms of economic and social gains if governments did more to encourage investment in “Big Data” and promote data sharing and reuse.

The migration of economic and social activities to the Internet and the advent of the Internet of Things – along with dramatically lower costs of data collection, storage and processing and rising computing power – mean that data analytics is increasingly driving innovation and is potentially an important new source of growth.

The report suggests countries act to seize these benefits by training more and better data scientists, reducing barriers to cross-border data flows, and encouraging investment in business processes to incorporate data analytics.

Few companies outside of the ICT sector are changing internal procedures to take advantage of data. For example, data gathered by companies’ marketing departments is not always used by other departments to drive decisions and innovation. And in particular, small and medium-sized companies face barriers to the adoption of data-related technologies such as cloud computing, partly because they have difficulty implementing organisational change due to limited resources, including the shortage of skilled personnel.

At the same time, governments will need to anticipate and address the disruptive effects of big data on the economy and overall well-being, as issues as broad as privacy, jobs, intellectual property rights, competition and taxation will be impacted. Read the Policy Brief

TABLE OF CONTENTS
Preface
Foreword
Executive summary
The phenomenon of data-driven innovation
Mapping the global data ecosystem and its points of control
How data now drive innovation
Drawing value from data as an infrastructure
Building trust for data-driven innovation
Skills and employment in a data-driven economy
Promoting data-driven scientific research
The evolution of health care in a data-rich environment
Cities as hubs for data-driven innovation
Governments leading by example with public sector data

 

Big Data Privacy Scenarios


E. Bruce, K. Sollins, M. Vernon, and D. Weitzner at DSpace@MIT: “This paper is the first in a series on privacy in Big Data. As an outgrowth of a series of workshops on the topic, the Big Data Privacy Working Group undertook a study of a series of use scenarios to highlight the challenges to privacy that arise in the Big Data arena. This is a report on those scenarios. The deeper question explored by this exercise is what is distinctive about privacy in the context of Big Data. In addition, we discuss an initial list of issues for privacy that derive specifically from the nature of Big Data. These derive from observations across the real-world scenarios and use cases explored in this project as well as wider reading and discussions:

* Scale: The sheer size of the datasets leads to challenges in creating, managing and applying privacy policies.

* Diversity: The increased likelihood of more and more diverse participants in Big Data collection, management, and use, leads to differing agendas and objectives. By nature, this is likely to lead to contradictory agendas and objectives.

* Integration: As data management technologies proliferate (e.g. cloud services, data lakes, and so forth) and integration across datasets becomes routine, new and often surprising opportunities for cross-product inferences will yield new information about individuals and their behaviors (a minimal sketch of this follows after the list).

* Impact on secondary participants: Because many pieces of information reflect not only the targeted subject but also secondary, often unintended, participants, the resulting inferences will increasingly concern other people who were never originally considered the subject of privacy concerns and approaches.

* Need for emergent policies for emergent information: As inferences are drawn over merged data sets, emergent information or understanding arises.

Although each unique data set may have existing privacy policies and enforcement mechanisms, it is not clear that the requisite and appropriate emergent privacy policies, and appropriate enforcement of them, can be developed automatically…(More)”
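As a minimal illustration of the integration and secondary-participant issues above, the hypothetical sketch below joins two individually innocuous datasets; the emergent information is the combined per-user profile, which neither dataset contained on its own. All names and values are invented.

```python
import pandas as pd

# Dataset A: check-in times from a fitness app (no demographics, no address).
checkins = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "gym_zip": ["02139", "02139", "10001"],
    "checkin_hour": [6, 6, 22],
})

# Dataset B: a marketing list keyed only by ZIP code.
marketing = pd.DataFrame({
    "gym_zip": ["02139", "10001"],
    "median_income": [95000, 61000],
    "dominant_age_band": ["25-34", "35-44"],
})

# The join is trivial; the emergent information is the combination:
# a per-user profile (habits plus neighbourhood demographics).
profile = checkins.merge(marketing, on="gym_zip")
print(profile.groupby("user_id").agg(
    usual_hour=("checkin_hour", "median"),
    est_income=("median_income", "first"),
))
```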

What we can learn from the failure of Google Flu Trends


David Lazer and Ryan Kennedy at Wired: “….The issue of using big data for the common good is far more general than Google—which deserves credit, after all, for offering the occasional peek at their data. These records exist because of a compact between individual consumers and the corporation. The legalese of that compact is typically obscure (how many people carefully read terms and conditions?), but the essential bargain is that the individual gets some service, and the corporation gets some data.

What is left out of that bargain is the public interest. Corporations and consumers are part of a broader society, and many of these big data archives offer insights that could benefit us all. As Eric Schmidt, CEO of Google, has said, “We must remember that technology remains a tool of humanity.” How can we, and corporate giants, then use these big data archives as a tool to serve humanity?

Google’s sequel to GFT, done right, could serve as a model for collaboration around big data for the public good. Google is making flu-related search data available to the CDC as well as select research groups. A key question going forward will be whether Google works with these groups to improve the methodology underlying GFT. Future versions should, for example, continually update the fit of the data to flu prevalence—otherwise, the value of the data stream will rapidly decay.
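As a minimal sketch of that “continually update the fit” idea (not Google’s actual method, and with invented placeholder numbers), one could refit a simple linear mapping from search volume to reported prevalence on a rolling window, so the relationship can drift as search behaviour changes:

```python
import numpy as np

search_volume = np.array([1.2, 1.5, 2.1, 2.8, 3.0, 2.6, 2.2, 1.9])   # weekly search index (invented)
cdc_prevalence = np.array([0.8, 1.0, 1.6, 2.3, 2.5, 2.1, 1.8, 1.4])  # % ILI visits (invented)

window = 4  # refit on the most recent four weeks only
for week in range(window, len(search_volume)):
    x = search_volume[week - window:week]
    y = cdc_prevalence[week - window:week]
    slope, intercept = np.polyfit(x, y, 1)              # rolling re-estimation of the fit
    nowcast = slope * search_volume[week] + intercept   # nowcast for the current week
    print(f"week {week}: nowcast={nowcast:.2f}, reported={cdc_prevalence[week]:.2f}")
```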

This is just an example, however, of the general challenge of how to build models of collaboration amongst industry, government, academics, and general do-gooders to use big data archives to produce insights for the public good. This came to the fore with the struggle (and delay) in finding a way to appropriately share mobile phone data in West Africa during the Ebola epidemic (mobile phone data are likely the best tool for understanding human—and thus Ebola—movement). Companies need to develop efforts to share data for the public good in a fashion that respects individual privacy.

There is not going to be a single solution to this issue, but for starters, we are pushing for a “big data” repository in Boston to allow holders of sensitive big data to share those collections with researchers while keeping them totally secure. The UN has its Global Pulse initiative, setting up collaborative data repositories around the world. Flowminder, based in Sweden, is a nonprofit dedicated to gathering mobile phone data that could help in response to disasters. But these are still small, incipient, and fragile efforts.

The question going forward now is how to build on and strengthen these efforts, while still guarding the privacy of individuals and the proprietary interests of the holders of big data….(More)”

Datafication and empowerment: How the open data movement re-articulates notions of democracy, participation, and journalism


Stefan Baack at Big Data and Society: “This article shows how activists in the open data movement re-articulate notions of democracy, participation, and journalism by applying practices and values from open source culture to the creation and use of data. Focusing on the Open Knowledge Foundation Germany and drawing from a combination of interviews and content analysis, it argues that this process leads activists to develop new rationalities around datafication that can support the agency of datafied publics. Three modulations of open source are identified: First, by regarding data as a prerequisite for generating knowledge, activists transform the sharing of source code to include the sharing of raw data. Sharing raw data should break the interpretative monopoly of governments and would allow people to make their own interpretation of data about public issues. Second, activists connect this idea to an open and flexible form of representative democracy by applying the open source model of participation to political participation. Third, activists acknowledge that intermediaries are necessary to make raw data accessible to the public. This leads them to an interest in transforming journalism to become an intermediary in this sense. At the same time, they try to act as intermediaries themselves and develop civic technologies to put their ideas into practice. The article concludes with suggesting that the practices and ideas of open data activists are relevant because they illustrate the connection between datafication and open source culture and help to understand how datafication might support the agency of publics and actors outside big government and big business….(More)”

Personalising data for development


Wolfgang Fengler and Homi Kharas in the Financial Times: “When world leaders meet this week for the UN’s general assembly to adopt the Sustainable Development Goals (SDGs), they will also call for a “data revolution”. In a world where almost everyone will soon have access to a mobile phone, where satellites will take high-definition pictures of the whole planet every three days, and where inputs from sensors and social media make up two thirds of the world’s new data, the opportunities to leverage this power for poverty reduction and sustainable development are enormous. We are also on the verge of major improvements in government administrative data and data gleaned from the activities of private companies and citizens, in big and small data sets.

But these opportunities are yet to materialize at any scale. In fact, despite the exponential growth in connectivity and the emergence of big data, policy making is rarely based on good data. Almost every report from development institutions starts with a disclaimer highlighting “severe data limitations”. Like castaways on an island, surrounded by water they cannot drink unless the salt is removed, today’s policy makers are in a sea of data that need to be refined and treated (simplified and aggregated) to make them “consumable”.

To make sense of big data, we used to depend on data scientists, computer engineers and mathematicians who would process requests one by one. But today, new programs and analytical solutions are putting big data at anyone’s fingertips. Tomorrow, it won’t be technical experts driving the data revolution but anyone operating a smartphone. Big data will become personal. We will be able to monitor and model social and economic developments faster, more reliably, more cheaply and on a far more granular scale. The data revolution will affect both the harvesting of data through new collection methods, and the processing of data through new aggregation and communication tools.

In practice, this means that data will become more actionable by becoming more personal, more timely and more understandable. Today, producing a poverty assessment and poverty map takes at least a year: it involves hundreds of enumerators, lengthy interviews and laborious data entry. In the future, thanks to hand-held connected devices, data collection and aggregation will happen in just a few weeks. Many more instances come to mind where new and higher-frequency data could generate development breakthroughs: monitoring teacher attendance, stocks and quality of pharmaceuticals, or environmental damage, for example…..

Despite vast opportunities, there are very few examples that have generated sufficient traction and scale to change policy and behaviour and create the feedback loops to further improve data quality. Two tools have personalised the abstract subjects of environmental degradation and demography (see table):

  • Monitoring forest fires. The World Resources Institute has launched Global Forest Watch, which enables users to monitor forest fires in near real time and overlay relevant spatial information such as property boundaries and ownership data. This is being developed into a model to anticipate the impact on air quality in affected areas in Indonesia, Singapore and Malaysia.
  • Predicting your own life expectancy. The World Population Program developed a predictive tool – www.population.io – showing each person’s place in the distribution of world population and corresponding statistical life expectancy. In just a few months, this prototype attracted some 2m users who shared their results more than 25,000 times on social media. The traction of the tool resulted from making demography personal and converting an abstract subject matter into a question of individual ranking and life expectancy.
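A toy illustration of that kind of ranking calculation (assumptions only; this is not population.io’s methodology or data) might count how many people were born before a given year and look up remaining life expectancy from a small table:

```python
# Invented figures: millions of births per year, and remaining years of life by current age.
births_by_year = {1980: 123, 1990: 135, 2000: 130, 2010: 140}
life_expectancy_at_age = {25: 55.0, 35: 46.0, 45: 37.0}

def world_rank(birth_year: int, current_year: int = 2015) -> None:
    # Crude model: everyone born in an earlier year counts as "older"; deaths are ignored.
    older = sum(n for y, n in births_by_year.items() if y < birth_year)
    total = sum(births_by_year.values())
    age = current_year - birth_year
    remaining = life_expectancy_at_age.get(age, float("nan"))
    print(f"~{older}M people older, ~{total - older}M younger; "
          f"expected remaining years at age {age}: {remaining}")

world_rank(1990)
```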

A new Global Partnership for Sustainable Development Data will be launched at the time of the UN General Assembly….(More)”

Research on digital identity ecosystems


Francesca Bria et al. at NESTA/D-CENT: “This report presents a concrete analysis of the latest evolution of the identity ecosystem in the big data context, focusing on the economic and social value of data and identity within the current digital economy. This report also outlines economic, policy, and technical alternatives to develop an identity ecosystem and management of data for the common good that respects citizens’ rights, privacy and data protection.

Key findings

  • This study presents a review of the concept of identity and a map of the key players in the identity industry (such as data brokers and data aggregators), including empirical case studies of identity management in key sectors.
    ….
  • The “datafication” of individuals’ social lives, thoughts and moves is a valuable commodity and constitutes the backbone of the “identity market”, within which “data brokers” (collectors, purchasers or sellers) play different key roles in creating the market by offering services such as fraud detection, customer relations, predictive analytics, marketing and advertising.
  • Economic, political and technical alternatives for identity to preserve trust, privacy and data ownership in today’s big data environments are formulated. The report looks into access to data, economic strategies to manage data as commons, consent and licensing, tools to control data, and terms of service. It also looks into policy strategies such as privacy and data protection by design and trust and ethical frameworks. Finally, it assesses technical implementations, looking at identity and anonymity, cryptographic tools, security, decentralisation and blockchains. It also analyses the future steps needed in order to move toward the suggested technical strategies….(More)”

Data Collaboratives: Sharing Public Data in Private Hands for Social Good


Beth Simone Noveck (The GovLab) in Forbes: “Sensor-rich consumer electronics such as mobile phones, wearable devices, commercial cameras and even cars are collecting zettabytes of data about the environment and about us. According to one McKinsey study, the volume of data is growing at fifty percent a year. No one needs convincing that these private storehouses of information represent a goldmine for business, but these data can do double duty as rich social assets—if they are shared wisely.

Think about a couple of recent examples: Sharing data held by businesses and corporations (i.e. public data in private hands) can help to improve policy interventions. California planners make water allocation decisions based upon expertise, data and analytical tools from public and private sources, including Intel, the Earth Research Institute at the University of California at Santa Barbara, and the World Food Center at the University of California at Davis.

In Europe, several phone companies have made anonymized datasets available, making it possible for researchers to track calling and commuting patterns and gain better insight into social problems from unemployment to mental health. In the United States, LinkedIn is providing free data about demand for IT jobs in different markets which, when combined with open data from the Department of Labor, helps communities target efforts around training….

Despite the promise of data sharing, these kinds of data collaboratives remain relatively new. There is a need to accelerate their use by giving companies strong tax incentives for sharing data for public good. There’s a need for more study to identify models for data sharing in ways that respect personal privacy and security and enable companies to do well by doing good. My colleagues at The GovLab together with UN Global Pulse and the University of Leiden, for example, published this initial analysis of terms and conditions used when exchanging data as part of a prize-backed challenge. We also need philanthropy to start putting money into “meta-research”; it’s not going to be enough to just open up databases: we need to know if the data is good.

After years of growing disenchantment with closed-door institutions, the push for greater use of data in governing can be seen as both a response and as a mirror to the Big Data revolution in business. Although more than 1,000,000 government datasets about everything from air quality to farmers markets are openly available online in downloadable formats, much of the data about environmental, biometric, epidemiological, and physical conditions rest in private hands. Governing better requires a new empiricism for developing solutions together. That will depend on access to these private, not just public data….(More)”