Regulation of Big Data: Perspectives on Strategy, Policy, Law and Privacy


Paper by Pompeu Casanovas, Louis de Koker, Danuta Mendelson and David Watts: “…presents four complementary perspectives stemming from governance, law, ethics, and computer science. Big, Linked, and Open Data constitute complex phenomena whose economic and political dimensions require a plurality of instruments to enhance and protect citizens’ rights. Some conclusions are offered at the end to foster a more general discussion.

This article contends that the effective regulation of Big Data requires a combination of legal tools and other instruments of a semantic and algorithmic nature. It commences with a brief discussion of the concept of Big Data and views expressed by Australian and UK participants in a study of Big Data use from a law enforcement and national security perspective. The second part of the article highlights the UN Special Rapporteur on the Right to Privacy’s interest in these themes and the focus of the Rapporteur’s new program on Big Data. UK law reforms regarding the authorisation of warrants for the exercise of bulk data powers are discussed in the third part. Reflecting on these developments, the paper closes with an exploration of the complex relationship between law and Big Data and the implications for regulation and governance of Big Data….(More)”.

Big Data, Data Science, and Civil Rights


Paper by Solon Barocas, Elizabeth Bradley, Vasant Honavar, and Foster Provost:  “Advances in data analytics bring with them civil rights implications. Data-driven and algorithmic decision making increasingly determine how businesses target advertisements to consumers, how police departments monitor individuals or groups, how banks decide who gets a loan and who does not, how employers hire, how colleges and universities make admissions and financial aid decisions, and much more. As data-driven decisions increasingly affect every corner of our lives, there is an urgent need to ensure they do not become instruments of discrimination, barriers to equality, threats to social justice, and sources of unfairness. In this paper, we argue for a concrete research agenda aimed at addressing these concerns, comprising five areas of emphasis: (i) Determining if models and modeling procedures exhibit objectionable bias; (ii) Building awareness of fairness into machine learning methods; (iii) Improving the transparency and control of data- and model-driven decision making; (iv) Looking beyond the algorithm(s) for sources of bias and unfairness—in the myriad human decisions made during the problem formulation and modeling process; and (v) Supporting the cross-disciplinary scholarship necessary to do all of that well…(More)”.

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are


Book by Seth Stephens-Davidowitz: “Blending the informed analysis of The Signal and the Noise with the instructive iconoclasm of Think Like a Freak, a fascinating, illuminating, and witty look at what the vast amounts of information now instantly available to us reveal about ourselves and our world—provided we ask the right questions.

By the end of an average day in the early twenty-first century, human beings searching the internet will amass eight trillion gigabytes of data. This staggering amount of information—unprecedented in history—can tell us a great deal about who we are—the fears, desires, and behaviors that drive us, and the conscious and unconscious decisions we make. From the profound to the mundane, we can gain astonishing knowledge about the human psyche that, less than twenty years ago, seemed unfathomable.

Everybody Lies offers fascinating, surprising, and sometimes laugh-out-loud insights into everything from economics to ethics to sports to race to sex, gender and more, all drawn from the world of big data. What percentage of white voters didn’t vote for Barack Obama because he’s black? Does where you go to school affect how successful you are in life? Do parents secretly favor boy children over girls? Do violent films affect the crime rate? Can you beat the stock market? How regularly do we lie about our sex lives and who’s more self-conscious about sex, men or women?

Investigating these questions and a host of others, Seth Stephens-Davidowitz offers revelations that can help us understand ourselves and our lives better. Drawing on studies and experiments on how we really live and think, he demonstrates in fascinating and often funny ways the extent to which all the world is indeed a lab. With conclusions ranging from strange-but-true to thought-provoking to disturbing, he explores the power of this digital truth serum and its deeper potential—revealing biases deeply embedded within us, information we can use to change our culture, and the questions we’re afraid to ask that might be essential to our health—both emotional and physical. All of us are touched by big data every day, and its influence is multiplying. Everybody Lies challenges us to think differently about how we see it and the world…(More)”.

The law and big data


Article by Teppo Felin, Caryn Devins, Stuart Kauffman and Roger Koppl: “In this article we critically examine the use of Big Data in the legal system. Big Data is driving a trend towards behavioral optimization and “personalized law,” in which legal decisions and rules are optimized for best outcomes and where law is tailored to individual consumers based on analysis of past data. Big Data, however, has serious limitations and dangers when applied in the legal context. Advocates of Big Data make theoretically problematic assumptions about the objectivity of data and scientific observation. Law is always theory-laden. Although Big Data strives to be objective, law and data have multiple possible meanings and uses and thus require theory and interpretation in order to be applied. Further, the meanings and uses of law and data are indefinite and continually evolving in ways that cannot be captured or predicted by Big Data.

Due to these limitations, the use of Big Data will likely generate unintended consequences in the legal system. Large-scale use of Big Data will create distortions that adversely influence legal decision-making, causing irrational herding behaviors in the law. The centralized nature of the collection and application of Big Data also poses serious threats to legal evolution and democratic accountability. Furthermore, its focus on behavioral optimization necessarily restricts and even eliminates the local variation and heterogeneity that makes the legal system adaptive. In all, though Big Data has legitimate uses, this article cautions against using Big Data to replace independent legal judgment….(More)”

We use big data to sentence criminals. But can the algorithms really tell us what we need to know?


Article at The Conversation: “In 2013, a man named Eric L. Loomis was sentenced for eluding police and driving a car without the owner’s consent.

When the judge weighed Loomis’ sentence, he considered an array of evidence, including the results of an automated risk assessment tool called COMPAS. Loomis’ COMPAS score indicated he was at a “high risk” of committing new crimes. Considering this prediction, the judge sentenced him to seven years.

Loomis challenged his sentence, arguing it was unfair to use the data-driven score against him. The U.S. Supreme Court now must consider whether to hear his case – and perhaps settle a nationwide debate over whether it’s appropriate for any court to use these tools when sentencing criminals.

Today, judges across the U.S. use risk assessment tools like COMPAS in sentencing decisions. In at least 10 states, these tools are a formal part of the sentencing process. Elsewhere, judges informally refer to them for guidance.

I have studied the legal and scientific bases for risk assessments. The more I investigate the tools, the more my caution about them grows.

The scientific reality is that these risk assessment tools cannot do what advocates claim. The algorithms cannot actually make predictions about future risk for the individual defendants being sentenced….

Algorithms such as COMPAS cannot make predictions about individual defendants, because data-driven risk tools are based on group statistics. This creates an issue that academics sometimes call the “group-to-individual” or G2i problem.

Scientists study groups. But the law sentences the individual. Consider the disconnect between science and the law here.

The algorithms in risk assessment tools commonly assign specific points to different factors. The points are totaled. The total is then often translated to a risk bin, such as low or high risk. Typically, more points means a higher risk of recidivism.
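To make those mechanics concrete, here is a minimal sketch in Python of how a point-based tool of this kind might work. The factors, point values, and cut-off below are invented for illustration; they are not the weights of COMPAS or any real instrument.

```python
# A toy point-based risk tool. Factors, point values, and the
# cut-off are hypothetical, not those of COMPAS or any real tool.

def risk_points(offender: dict) -> int:
    points = 0
    if offender["prior_convictions"] >= 3:
        points += 4
    if offender["age_at_first_arrest"] < 18:
        points += 3
    if offender["unemployed"]:
        points += 3
    return points  # 0-10 in this toy scale

def risk_bin(points: int, cutoff: int = 6) -> str:
    # The point total is translated into a coarse bin.
    return "high risk" if points >= cutoff else "low risk"

offender = {"prior_convictions": 4, "age_at_first_arrest": 17,
            "unemployed": False}
total = risk_points(offender)
print(total, risk_bin(total))  # -> 7 high risk
```

Note that the bin label is derived entirely from the point total; nothing in the calculation looks at the individual beyond the handful of factors scored.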

Say a score of 6 points out of 10 on a certain tool is considered “high risk.” In the historical groups studied, perhaps 50 percent of people with a score of 6 points did reoffend.

Thus, one might be inclined to think that a new offender who also scores 6 points is at a 50 percent risk of reoffending. But that would be incorrect.

It may be the case that half of those with a score of 6 in the historical groups studied would later reoffend. However, the tool is unable to select which of the offenders with 6 points will reoffend and which will go on to lead productive lives.
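A toy illustration of this group-to-individual gap, with invented outcomes: two offenders receive identical scores, so the tool necessarily assigns both the same historical group rate, even though only one of them will in fact reoffend.

```python
# Two hypothetical offenders with the same point total. The tool can
# only report the historical group rate; it cannot say which of the
# two will reoffend (the outcomes below are invented).
group_rate = {6: 0.50}  # 50% of past 6-point scorers reoffended

offenders = {
    "A": {"score": 6, "later_reoffended": True},   # unknowable at sentencing
    "B": {"score": 6, "later_reoffended": False},  # unknowable at sentencing
}

for name, o in offenders.items():
    print(f"Offender {name}: predicted risk {group_rate[o['score']]:.0%}")
# Both lines print "predicted risk 50%", yet only one went on to reoffend.
```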

The studies of factors associated with reoffending are not causation studies. They can tell only which factors are correlated with new crimes. Individuals retain some measure of free will to decide to break the law again, or not.

These issues may explain why risk tools often have significant false positive rates. The predictions made by the most popular risk tools for violence and sex offending have been shown to get it wrong for some groups over 50 percent of the time.

A ProPublica investigation found that COMPAS, the tool used in Loomis’ case, is burdened by large error rates. For example, COMPAS failed to predict reoffending in one study at a 37 percent rate. The company that makes COMPAS has disputed the study’s methodology….
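For readers unfamiliar with how such error rates are tallied, a brief sketch may help. A false positive is a person flagged high risk who did not reoffend; a failure to predict reoffending, like the 37 percent figure above, is a false negative. The eight records below are invented purely to show the arithmetic; they are not ProPublica’s data.

```python
# Computing the two error rates discussed above from invented data.
predicted_high = [1, 1, 0, 0, 1, 0, 1, 0]  # 1 = tool said "high risk"
reoffended     = [0, 1, 1, 0, 0, 0, 1, 1]  # 1 = actually reoffended

false_pos = sum(1 for p, r in zip(predicted_high, reoffended) if p and not r)
false_neg = sum(1 for p, r in zip(predicted_high, reoffended) if not p and r)
non_reoffenders = reoffended.count(0)
actual_reoffenders = reoffended.count(1)

print(f"false positive rate: {false_pos / non_reoffenders:.0%}")
print(f"false negative rate: {false_neg / actual_reoffenders:.0%}")
```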

There are also a host of thorny issues with risk assessment tools incorporating, either directly or indirectly, sociodemographic variables, such as gender, race and social class. Law professor Anupam Chander has named it the problem of the “racist algorithm.”

Big data may have its allure. But, data-driven tools cannot make the individual predictions that sentencing decisions require. The Supreme Court might helpfully opine on these legal and scientific issues by deciding to hear the Loomis case…(More)”.

Big data allows India to map its fight against human trafficking


Nita Bhalla for Reuters: “An Indian charity is using big data to pinpoint human trafficking hot spots in a bid to prevent vulnerable women and girls vanishing from high-risk villages into the sex trade.

My Choices Foundation uses specially designed technology to identify those villages that are most at risk of modern slavery, then launches local campaigns to sound the alarm….

The analytics tool – developed by Australian firm Quantium – uses a range of factors to identify the most dangerous villages. It draws on India’s census, education and health data and factors such as drought risk, poverty levels, education and job opportunities to identify vulnerable areas….
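The article does not disclose how Quantium’s tool combines these factors, but conceptually such a system resembles a supervised classifier trained on village-level features. The sketch below, with invented feature names, data, and model choice, shows the general shape of that approach; it is not the actual tool.

```python
# A conceptual sketch of village-level risk ranking. Features, data,
# and model choice are invented; this is NOT Quantium's tool.
from sklearn.linear_model import LogisticRegression

# Each row: [drought_risk, poverty_rate, literacy_rate, job_opportunity_index]
X_train = [
    [0.9, 0.8, 0.30, 0.10],
    [0.2, 0.3, 0.80, 0.70],
    [0.7, 0.9, 0.40, 0.20],
    [0.1, 0.2, 0.90, 0.80],
]
y_train = [1, 0, 1, 0]  # 1 = trafficking incidents recorded historically

model = LogisticRegression().fit(X_train, y_train)

new_village = [[0.8, 0.7, 0.35, 0.15]]
risk = model.predict_proba(new_village)[0][1]
print(f"estimated risk: {risk:.2f}")  # high-scoring villages get outreach first
```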

There are an estimated 46 million people enslaved worldwide, with more than 18 million living in India, according to the 2016 Global Slavery Index. The Index was compiled by the Walk Free Foundation, a global organisation seeking to end modern slavery. Many are villagers lured by traffickers with the promise of a good job and an advance payment, only to find themselves or their children forced to work in fields or brick kilns, enslaved in brothels and sold into sexual slavery.

Almost 20,000 women and children were victims of human trafficking in India in 2016, a rise of nearly 25 percent from the previous year, according to government data. While India has strengthened its anti-trafficking policy in recent years, activists say a lack of public awareness remains one of the biggest impediments…(More)”.

Routledge Handbook on Information Technology in Government


Book edited by Yu-Che Chen and Michael J. Ahn: “The explosive growth in information technology has ushered in unparalleled new opportunities for advancing public service. Featuring 24 chapters from foremost experts in the field of digital government, this Handbook provides an authoritative survey of key emerging technologies, their current state of development and use in government, and insightful discussions on how they are reshaping and influencing the future of public administration. This Handbook explores:

  • Key emerging technologies (i.e., big data, social media, Internet of Things (IoT), GIS, smartphones & mobile technologies) and their impacts on public administration
  • The impacts of the new technologies on the relationships between citizens and their governments with the focus on collaborative governance
  • Key theories of IT innovations in government on the interplay between technological innovations and public administration
  • The relationship between technology and democratic accountability and the various ways of harnessing the new technologies to advance public value
  • Key strategies and conditions for fostering success in leveraging technological innovations for public service

This Handbook will prove to be an invaluable guide and resource for students, scholars and practitioners interested in this growing field of technological innovations in government….(More)”.

Could Big Data Help End Hunger in Africa?


Lenny Ruvaga at VOA News: “Computer algorithms power much of modern life from our Facebook feeds to international stock exchanges. Could they help end malnutrition and hunger in Africa? The International Center for Tropical Agriculture thinks so.

The International Center for Tropical Agriculture has spent the past four years developing the Nutrition Early Warning System, or NEWS.

The goal is to catch the subtle signs of a hunger crisis brewing in Africa as much as a year in advance.

CIAT says the system uses machine learning. As more information is fed into the system, the algorithms will get better at identifying patterns and trends. The system will get smarter.

Information Technology expert Andy Jarvis leads the project.

“The cutting edge side of this is really about bringing in streams of information from multiple sources and making sense of it. … But it is a huge volume of information and what it does, the novelty then, is making sense of that using things like artificial intelligence, machine learning, and condensing it into simple messages,” he said.
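CIAT has not published the internals of NEWS, but the “many streams in, one simple message out” pattern Jarvis describes can be sketched in miniature. In the toy example below, the data sources, weights, and thresholds are all invented assumptions, not CIAT’s model.

```python
# Toy fusion of several data streams into one plain-language warning.
# Sources, weights, and thresholds are invented, not NEWS internals.
def malnutrition_alert(region: str, signals: dict) -> str:
    # Each signal is a 0-1 deviation from its seasonal baseline.
    weights = {
        "rainfall_deficit": 0.40,
        "food_price_rise": 0.35,
        "child_wasting_trend": 0.25,
    }
    score = sum(weights[k] * signals[k] for k in weights)
    if score > 0.6:
        return f"{region}: HIGH malnutrition risk within 12 months"
    if score > 0.3:
        return f"{region}: elevated risk, monitor closely"
    return f"{region}: no warning"

print(malnutrition_alert("Region A", {
    "rainfall_deficit": 0.9,
    "food_price_rise": 0.7,
    "child_wasting_trend": 0.5,
}))  # -> Region A: HIGH malnutrition risk within 12 months
```

As more regions and seasons of outcome data accumulate, the fixed weights in a sketch like this would be replaced by coefficients learned from the data, which is the sense in which such a system “gets smarter.”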

Other nutrition surveillance systems exist, like FEWS NET, the Famine Early Warning Systems Network, which was created in the mid-1980s.

But CIAT says NEWS will be able to draw insights from a massive amount of diverse data enabling it to identify hunger risks faster than traditional methods.

“What is different about NEWS is that it pays attention to malnutrition, not just drought or famine, but the nutrition outcome that really matters, malnutrition especially in women and children. For the first time, we are saying these are the options way ahead of time. That gives policy makers an opportunity to really do what they intend to do which is make the lives of women and children better in Africa,” said Dr. Mercy Lung’aho, a CIAT nutrition expert.

While food emergencies like famine and drought grab headlines, the International Center for Tropical Agriculture says chronic malnutrition affects one in four people in Africa, taking a serious toll on economic growth and leaving them especially vulnerable in times of crisis….(More)”.

Big Data: A New Empiricism and its Epistemic and Socio-Political Consequences


Chapter by Gernot Rieder and Judith Simon in Berechenbarkeit der Welt? Philosophie und Wissenschaft im Zeitalter von Big Data: “…paper investigates the rise of Big Data in contemporary society. It examines the most prominent epistemological claims made by Big Data proponents, calls attention to the potential socio-political consequences of blind data trust, and proposes a possible way forward. The paper’s main focus is on the interplay between an emerging new empiricism and an increasingly opaque algorithmic environment that challenges democratic demands for transparency and accountability. It concludes that a responsible culture of quantification requires epistemic vigilance as well as a greater awareness of the potential dangers and pitfalls of an ever more data-driven society….(More)”.

Data Collaboratives: exchanging data to create public value across Latin America and the Caribbean


Stefaan Verhulst, Andrew Young and Prianka Srinivasan at IADB’s Abierto al Publico: “Data is playing an ever-increasing role in bolstering businesses across Latin America – and the rest of the world. In Brazil, Mexico and Colombia alone, the revenue from Big Data is calculated at more than US$603.7 million, a market that is only set to increase as more companies across Latin America and the Caribbean embrace data-driven strategies to enhance their bottom-line. Brazilian banking giant Itau plans to create six data centers across the country, and already uses data collected from consumers online to improve cross-selling techniques and streamline their investments. Data from web-clicks, social media profiles, and telecommunication services is fueling a new generation of entrepreneurs keen to make big dollars from big data.

What if this same data could be used not just to improve business, but to improve the collective well-being of our communities, public spaces, and cities? Analysis of social media data can offer powerful insights to city officials into public trends and movements to better plan infrastructure and policies. Public health officials and humanitarian workers can use mobile phone data to, for instance, map human mobility and better target their interventions. By repurposing the data collected by companies for their business interests, governments, international organizations and NGOs can leverage big data insights for the greater public good.

The key question is thus: how can the useful data collected by corporations be unlocked in a responsible manner, so that its vast potential does not go to waste?

“Data Collaboratives” are emerging as a possible answer. Data collaboratives are a new type of public-private partnership aimed at creating public value by exchanging data across sectors.

Research conducted by the GovLab finds that Data Collaboratives offer several potential benefits across a number of sectors, including humanitarian and anti-poverty efforts, urban planning, natural resource stewardship, health, and disaster management. As a greater number of companies in Latin America look to data to spur business interests, our research suggests that some companies are also sharing and collaborating around data to confront some of society’s most pressing problems.

Consider the following Data Collaboratives that seek to enhance…(More)”