Tool for Surveillance or Spotlight on Inequality? Big Data and the Law


Paper by Rebecca A. Johnson and Tanina Rostain: “The rise of big data and machine learning is a polarizing force among those studying inequality and the law. Big data and tools like predictive modeling may amplify inequalities in the law, subjecting vulnerable individuals to enhanced surveillance. But these data and tools may also serve an opposite function, shining a spotlight on inequality and subjecting powerful institutions to enhanced oversight. We begin with a typology of the role of big data in inequality and the law. The typology asks questions—Which type of individual or institutional actor holds the data? What problem is the actor trying to use the data to solve?—that help situate the use of big data within existing scholarship on law and inequality. We then highlight the dual uses of big data and computational methods—data for surveillance and data as a spotlight—in three areas of law: rental housing, child welfare, and opioid prescribing. Our review highlights asymmetries where the lack of data infrastructure to measure basic facts about inequality within the law has impeded the spotlight function….(More)”.

Interventions to mitigate the racially discriminatory impacts of emerging tech including AI


Joint Civil Society Statement: “As widespread recent protests have highlighted, racial inequality remains an urgent and devastating issue around the world, and this is as true in the context of technology as it is everywhere else. In fact, it may be more so, as algorithmic technologies based on big data are deployed at previously unimaginable scale, reproducing the discriminatory systems that build and govern them.

The undersigned organizations welcome the publication of the report “Racial discrimination and emerging digital technologies: a human rights analysis,” by Special Rapporteur on contemporary forms of racism, racial discrimination, xenophobia and related intolerance, E. Tendayi Achiume, and wish to underscore the importance and timeliness of a number of the recommendations made therein:

  1. Technologies that have had or will have significant racially discriminatory impacts should be banned outright.
    While incremental regulatory approaches may be appropriate in some contexts, where a technology is demonstrably likely to cause racially discriminatory harm, it should not be deployed until that harm can be prevented. Moreover, certain technologies may always have disparate racial impacts, no matter how much their accuracy can be improved. In the present moment, racially discriminatory technologies include facial and affect recognition technology and so-called predictive analytics. We support Special Rapporteur Achiume’s call for mandatory human rights impact assessments as a prerequisite for the adoption of new technologies. We also believe that where such assessments reveal that a technology has a high likelihood of deleterious racially disparate impacts, states should prevent its use through a ban or moratorium. We join the Special Rapporteur in welcoming recent municipal bans, for example, on the use of facial recognition technology, and encourage national governments to adopt similar policies.  Correspondingly, we reiterate our support for states’ imposition of an immediate moratorium on the trade and use of privately developed surveillance tools until such time as states enact appropriate safeguards, and congratulate Special Rapporteur Achiume on joining that call.
  2. Gender mainstreaming and representation along racial, national and other intersecting identities requires radical improvement at all levels of the tech sector.
  3. Technologists cannot solve political, social, and economic problems without the input of domain experts and those personally impacted.
  4. Access to technology is as urgent an issue of racial discrimination as inequity in the design of technologies themselves.
  5. Representative and disaggregated data is a necessary, if not sufficient, condition for racial equity in emerging digital technologies, but it must be collected and managed equitably as well.
  6. States as well as corporations must provide remedies for racial discrimination, including reparations.… (More)”.

Social Research in Times of Big Data: The Challenges of New Data Worlds and the Need for a Sociology of Social Research


Paper by Rainer Diaz-Bone et al: “The phenomenon of big data does not only deeply affect current societies but also poses crucial challenges to social research. This article argues for moving towards a sociology of social research in order to characterize the new qualities of big data and its deficiencies. We draw on the neopragmatist approach of economics of convention (EC) as a conceptual basis for such a sociological perspective.

This framework suggests investigating processes of quantification in their interplay with orders of justifications and logics of evaluation. Methodological issues such as the question of the “quality of big data” must accordingly be discussed in their deep entanglement with epistemic values, institutional forms, and historical contexts and as necessarily implying political issues such as who controls and has access to data infrastructures. On this conceptual basis, the article uses the example of health to discuss the challenges of big data analysis for social research.

Phenomena such as the rise of new and massive privately owned data infrastructures, the economic valuation of huge amounts of connected data, or the movement of “quantified self” are presented as indications of a profound transformation compared to established forms of doing social research. Methodological and epistemological, but also institutional and political, strategies are presented to face the risk of being “outperformed” and “replaced” by big data analysis as is already done in big US American and Chinese Internet enterprises. In conclusion, we argue that the sketched developments have important implications both for research practices and methods teaching in the era of big data…(More)”.

The European data market


European Commission: “It was the first European Data Market study (SMART 2013/0063) contracted by the European Commission in 2013 that made a first attempt to provide facts and figures on the size and trends of the EU data economy by developing a European data market monitoring tool.

The final report of the updated European Data Market (EDM) study (SMART 2016/0063) now presents in detail the results of the final round of measurement of the updated European Data Market Monitoring Tool contracted for the 2017-2020 period.

Designed along a modular structure, as the first pillar of the study, the European Data Market Monitoring Tool is built around a core set of quantitative indicators to provide a series of assessments of the emerging data market at present, i.e. for the years 2018 through 2020, with projections to 2025.

The key areas covered by the indicators measured in this report are:

  • The data professionals and the balance between demand and supply of data skills;
  • The data companies and their revenues;
  • The data user companies and their spending for data technologies;
  • The market of digital products and services (“Data market”);
  • The data economy and its impacts on the European economy;
  • Forecast scenarios of all the indicators, based on alternative market trajectories.

Additionally, as a second major work stream, the study also presents a series of descriptive stories providing a complementary view to the one offered by the Monitoring Tool (for example, “How Big Data is driving AI” or “The Secondary Use of Health Data and Data-driven Innovation in the European Healthcare Industry”), adding fresh, real-life information around the quantitative indicators. By focusing on specific issues and aspects of the data market, the stories offer an initial, indicative “catalogue” of good practices of what is happening in the data economy today in Europe and what is likely to affect the development of the EU data economy in the medium term.

Finally, as a third work stream of the study, a landscaping exercise on the EU data ecosystem was carried out together with some community building activities to bring stakeholders together from all segments of the data value chain. The map containing the results of the landscaping of the EU data economy as well as reports from the webinars organised by the study are available on the www.datalandscape.eu website….(More)”.

Social Distancing and Social Capital: Why U.S. Counties Respond Differently to Covid-19


NBER Paper by Wenzhi Ding et al: “Since social distancing is the primary strategy for slowing the spread of many diseases, understanding why U.S. counties respond differently to COVID-19 is critical for designing effective public policies. Using daily data from about 45 million mobile phones to measure social distancing, we examine how counties responded to both local COVID-19 cases and statewide shelter-in-place orders. We find that social distancing increases more in response to cases and official orders in counties where individuals historically (1) engaged less in community activities and (2) demonstrated greater willingness to incur individual costs to contribute to social objectives. Our work highlights the importance of these two features of social capital—community engagement and individual commitment to societal institutions—in formulating public health policies….(More)”
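The style of analysis described in the abstract can be sketched as a county-day panel regression. The file name, column names, and specification below are illustrative assumptions for readers unfamiliar with this kind of design, not the authors' actual data or model.

```python
# Illustrative sketch only: a county-day panel in which a mobility-based
# distancing measure responds to local cases and shelter-in-place orders,
# with the response allowed to vary by a county social-capital index.
# "county_day_panel.csv" and all column names are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

panel = pd.read_csv("county_day_panel.csv", parse_dates=["date"])
# Assumed columns: county, date, distancing (e.g., share of devices staying
# home), log_cases, shelter_order (0/1), social_capital (standardized index).

model = smf.ols(
    "distancing ~ log_cases * social_capital"
    " + shelter_order * social_capital"
    " + C(county) + C(date)",          # county and date fixed effects
    data=panel,
).fit(cov_type="cluster", cov_kwds={"groups": panel["county"]})

# The interaction coefficients measure how the distancing response to cases
# and to official orders varies with the county's social-capital measure.
print(model.summary().tables[1])
```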

Are there laws of history?


Amanda Rees at AEON: “…If big data could enable us to turn big history into mathematics rather than narratives, would that make it easier to operationalise our past? Some scientists certainly think so.

In February 2010, Peter Turchin, an ecologist from the University of Connecticut, predicted that 2020 would see a sharp increase in political volatility for Western democracies. Turchin was responding critically to the optimistic speculations about scientific progress in the journal Nature: the United States, he said, was coming to the peak of another instability spike (regularly occurring every 50 years or so), while the world economy was reaching the point of a ‘Kondratiev wave’ dip, that is, a steep downturn in a growth-driven supercycle. Along with a number of ‘seemingly disparate’ social pointers, all indications were that serious problems were looming. In the decade since that prediction, the entrenched, often vicious, social, economic and political divisions that have increasingly characterised North American and European society have made Turchin’s ‘quantitative historical analysis’ seem remarkably prophetic.

A couple of years earlier, in July 2008, Turchin had made a series of trenchant claims about the nature and future of history. Totting up in excess of ‘200 explanations’ proposed to account for the fall of the Roman empire, he was appalled that historians were unable to agree ‘which explanations are plausible and which should be rejected’. The situation, he maintained, was ‘as risible as if, in physics, phlogiston theory and thermodynamics coexisted on equal terms’. Why, Turchin wanted to know, were the efforts in medicine and environmental science to produce healthy bodies and ecologies not mirrored by interventions to create stable societies? Surely it was time ‘for history to become an analytical, and even a predictive, science’. Knowing that historians were themselves unlikely to adopt such analytical approaches to the past, he proposed a new discipline: ‘theoretical historical social science’ or ‘cliodynamics’ – the science of history.

Like C P Snow 60 years before him, Turchin wanted to challenge the boundary between the sciences and humanities – even as periodic attempts to apply the theories of natural science to human behaviour (sociobiology, for example) or to subject natural sciences to the methodological scrutiny of the social sciences (science wars, anyone?) have frequently resulted in hostile turf wars. So what are the prospects for Turchin’s efforts to create a more desirable future society by developing a science of history?…

In 2010, Cliodynamics, the flagship journal for this new discipline, appeared, with its very first article (by the American sociologist Randall Collins) focusing on modelling victory and defeat in battle in relation to material resources and organisational morale. In a move that paralleled Comte’s earlier argument regarding the successive stages of scientific complexity (from physics, through chemistry and biology, to sociology), Turchin passionately rejected the idea that complexity made human societies unsuitable for quantitative analysis, arguing that it was precisely that complexity which made mathematics essential. Weather predictions were once considered unreliable because of the sheer complexity of managing the necessary data. But improvements in technology (satellites, computers) mean that it’s now possible to describe mathematically, and therefore to model, interactions between the system’s various parts – and therefore to know when it’s wise to carry an umbrella. With equal force, Turchin insisted that the cliodynamic approach was not deterministic. It would not predict the future, but instead lay out for governments and political leaders the likely consequences of competing policy choices.

Crucially, and again on the back of abundantly available and cheap computing power, cliodynamics benefited from the surge in interest in the digital humanities. Existing archives were being digitised, uploaded and made searchable: every day, it seemed, more data were being presented in a format that encouraged quantification and enabled mathematical analysis – including the Old Bailey’s online database, of which Wolf had fallen foul. At the same time, cliodynamicists were repositioning themselves. Four years after its initial launch, the subtitle of their flagship journal was changed from The Journal of Theoretical and Mathematical History to The Journal of Quantitative History and Cultural Evolution. As Turchin’s editorial stated, this move was intended to position cliodynamics within a broader evolutionary analysis; paraphrasing the Russian-American geneticist Theodosius Dobzhansky, he claimed that ‘nothing in human history makes sense except in the light of cultural evolution’. Given Turchin’s ecological background, this evolutionary approach to history is unsurprising. But given the historical outcomes of making politics biological, it is potentially worrying….

Mathematical, data-driven, quantitative models of human experience that aim at detachment, objectivity and the capacity to develop and test hypotheses need to be balanced by explicitly fictional, qualitative and imaginary efforts to create and project a lived future that enable their audiences to empathically ground themselves in the hopes and fears of what might be to come. Both, after all, are unequivocally doing the same thing: using history and historical experience to anticipate the global future so that we might – should we so wish – avoid civilisation’s collapse. That said, the question of who ‘we’ are does, always, remain open….(More)”.

‘For good measure’: data gaps in a big data world


Paper by Sarah Giest & Annemarie Samuels: “Policy and data scientists have paid ample attention to the amount of data being collected and the challenge for policymakers to use and utilize it. However, far less attention has been paid towards the quality and coverage of this data specifically pertaining to minority groups. The paper makes the argument that while there is seemingly more data to draw on for policymakers, the quality of the data in combination with potential known or unknown data gaps limits government’s ability to create inclusive policies. In this context, the paper defines primary, secondary, and unknown data gaps that cover scenarios of knowingly or unknowingly missing data and how that is potentially compensated through alternative measures.

Based on the review of the literature from various fields and a variety of examples highlighted throughout the paper, we conclude that the big data movement combined with more sophisticated methods in recent years has opened up new opportunities for government to use existing data in different ways as well as fill data gaps through innovative techniques. Focusing specifically on the representativeness of such data, however, shows that data gaps affect the economic opportunities, social mobility, and democratic participation of marginalized groups. The big data movement in policy may thus create new forms of inequality that are harder to detect and whose impact is more difficult to predict….(More)“.

How data privacy leader Apple found itself in a data ethics catastrophe


Article by Daniel Wu and Mike Loukides: “…Apple learned a critical lesson from this experience. User buy-in cannot end with compliance with rules. It requires ethics, constantly asking how to protect, fight for, and empower users, regardless of what the law says. These strategies contribute to perceptions of trust.

Trust has to be earned, is easily lost, and is difficult to regain….

In our more global, diverse, and rapidly changing world, ethics may be embodied by the “platinum rule”: Do unto others as they would want done to them. One established field of ethics—bioethics—offers four principles that are related to the platinum rule: nonmaleficence, justice, autonomy, and beneficence.

For organizations that want to be guided by ethics, regardless of what the law says, these principles are essential tools for a purpose-driven mission: protecting (nonmaleficence), fighting for (justice), and empowering users and employees (autonomy and beneficence).

An ethics leader protects users and workers in its operations by using governance best practices. 

Before creating the product, it understands both the qualitative and quantitative contexts of key stakeholders, especially those who will be most impacted, identifying their needs and fears. When creating the product, it uses data protection by design, working with cross-functional roles like legal and privacy engineers to embed ethical principles into the lifecycle of the product and formalize data-sharing agreements. Before launching, it audits the product thoroughly and conducts scenario planning to understand potential ethical mishaps, such as perceived or real gender bias or human rights violations in its supply chain. After launching, its terms of service and collection methods are highly readable and enable even disaffected users to resolve issues delightfully.

Ethics leaders also fight for users and workers, who can be forgotten. These leaders may champion enforceable consumer protections in the first place, before a crisis erupts. With social movements, leaders fight powerful actors preying on vulnerable communities or the public at large—and critically examine and ameliorate their own participation in systemic violence. As a result, instead of making last-minute heroic efforts to change compromised operations, they have been iterating all along.

Finally, ethics leaders empower their users and workers. With diverse communities and employees, they co-create new products that help improve basic needs and enable more, including the vulnerable, to increase their autonomy and their economic mobility. These entrepreneurial efforts validate new revenue streams and relationships while incubating next-generation workers who self-govern and push the company’s mission forward. Employees voice their values and diversify their relationships. Alison Taylor, the Executive Director of Ethical Systems, argues that internal processes should “improve [workers’] reasoning and creativity, instead of short-circuiting them.” Enabling this is a culture of psychological safety and training to engage kindly with divergent ideas.

These purpose-led strategies boost employee performance and retention, drive deep customer loyalty, and carve legacies.

To be clear, Apple may be implementing at least some of these strategies already—but perhaps not uniformly or transparently. For instance, Apple has implemented some provisions of the European Union’s General Data Protection Regulation for all US residents—not just EU and CA residents—including the ability to access and edit data. This expensive move, which goes beyond strict legal requirements, was implemented even without public pressure.

But ethics strategies have major limitations leaders must address

As demonstrated by the waves of ethical “principles” released by Fortune 500 companies and commissions, ethics programs can be murky, dominated by a white, male, and Western interpretation.

Furthermore, focusing purely on ethics gives companies an easy way to “free ride” off social goodwill, but ultimately stay unaccountable, given the lack of external oversight over ethics programs. When companies substitute unaccountable data ethics principles for thoughtful engagement with the enforceable data regulation principles, users will be harmed.

Long-term, without the ability to wave a $100 million fine with clear-cut requirements and lawyers trained to advocate for them internally, ethics leaders may face barriers to buy-in. Unlike their sales, marketing, or compliance counterparts, ethics programs do not directly add revenue or reduce costs. In recessions, these “soft” programs may be the first on the chopping block.

As a result of these factors, we will likely see a surge in ethics-washing: well-intentioned companies that talk ethics, but don’t walk it. More will view these efforts as PR-driven ethics stunts, which don’t deeply engage with actual ethical issues. If harmful business models do not change, ethics leaders will be fighting a losing battle….(More)”.

Synthetic data offers advanced privacy for the Census Bureau, business


Kate Kaye at IAPP: “In the early 2000s, internet accessibility made risks of exposing individuals from population demographic data more likely than ever. So, the U.S. Census Bureau turned to an emerging privacy approach: synthetic data.

Some argue the algorithmic techniques used to develop privacy-secure synthetic datasets go beyond traditional deidentification methods. Today, along with the Census Bureau, clinical researchers, autonomous vehicle system developers and banks use these fake datasets that mimic statistically valid data.

In many cases, synthetic data is built from existing data by filtering it through machine learning models. Real data representing real individuals flows in, and fake data mimicking individuals with corresponding characteristics flows out.
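In spirit, that pipeline can be sketched in a few lines: fit a simple generative model to (stand-in) real records, then sample new records from it. The Gaussian-mixture model, column names, and simulated input below are illustrative assumptions, not the Census Bureau's actual methodology.

```python
# Minimal sketch: build synthetic tabular records by fitting a generative
# model to real data and sampling from it. The input data here is simulated
# and the model choice is illustrative only.
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-in for confidential microdata: age, earnings, household size.
real = pd.DataFrame({
    "age": rng.integers(18, 90, size=5_000),
    "earnings": rng.lognormal(mean=10.5, sigma=0.6, size=5_000),
    "household_size": rng.integers(1, 7, size=5_000),
}).astype(float)

# Real records flow in: the model learns their joint distribution.
model = GaussianMixture(n_components=8, random_state=0).fit(real.values)

# Fake records flow out: statistically similar individuals who do not exist.
samples, _ = model.sample(n_samples=5_000)
synthetic = pd.DataFrame(samples, columns=real.columns)
synthetic["age"] = synthetic["age"].clip(18, 90).round()
synthetic["household_size"] = synthetic["household_size"].clip(1, 7).round()

# Compare marginal statistics to check the synthetic image is still useful.
print(real.describe().loc[["mean", "std"]])
print(synthetic.describe().loc[["mean", "std"]])
```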

When data scientists at the Census Bureau began exploring synthetic data methods, adoption of the internet had made deidentified, open-source data on U.S. residents, their households and businesses more accessible than in the past.

Especially concerning, census-block-level information was now widely available. Because in rural areas, a census block could represent data associated with as few as one house, simply stripping names, addresses and phone numbers from that information might not be enough to prevent exposure of individuals.

“There was pretty widespread angst” among statisticians, said John Abowd, the bureau’s associate director for research and methodology and chief scientist. The hand-wringing led to a “gradual awakening” that prompted the agency to begin developing synthetic data methods, he said.

Synthetic data built from the real data preserves privacy while providing information that is still relevant for research purposes, Abowd said: “The basic idea is to try to get a model that accurately produces an image of the confidential data.”

The plan for the 2020 census is to produce a synthetic image of that original data. The bureau also produces On the Map, a web-based mapping and reporting application that provides synthetic data showing where workers are employed and where they live along with reports on age, earnings, industry distributions, race, ethnicity, educational attainment and sex.

Of course, the real census data is still locked away, too, Abowd said: “We have a copy and the national archives have a copy of the confidential microdata.”…(More)”.

Scraping the Web for Public Health Gains: Ethical Considerations from a ‘Big Data’ Research Project on HIV and Incarceration


Stuart Rennie, Mara Buchbinder, Eric Juengst, Lauren Brinkley-Rubinstein, and David L Rosen at Public Health Ethics: “Web scraping involves using computer programs for automated extraction and organization of data from the Web for the purpose of further data analysis and use. It is frequently used by commercial companies, but also has become a valuable tool in epidemiological research and public health planning. In this paper, we explore ethical issues in a project that “scrapes” public websites of U.S. county jails as part of an effort to develop a comprehensive database (including individual-level jail incarcerations, court records and confidential HIV records) to enhance HIV surveillance and improve continuity of care for incarcerated populations. We argue that the well-known framework of Emanuel et al. (2000) provides only partial ethical guidance for the activities we describe, which lie at a complex intersection of public health research and public health practice. We suggest some ethical considerations from the ethics of public health practice to help fill gaps in this relatively unexplored area….(More)”.
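For readers unfamiliar with the technique, a minimal scraping loop looks roughly like the sketch below. The URL and CSS selectors are hypothetical placeholders, not the jail-roster sites used in the project, and any real use would need to respect the ethical and legal constraints the paper discusses.

```python
# Minimal sketch of web scraping as described above: fetch a public page and
# extract structured records from its HTML. The URL and selectors are
# hypothetical placeholders, not the actual county jail websites.
import requests
from bs4 import BeautifulSoup

def scrape_roster(url: str) -> list[dict]:
    """Download one roster page and return one dict per listed booking."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    records = []
    for row in soup.select("table.roster tr")[1:]:  # skip the header row
        cells = [c.get_text(strip=True) for c in row.find_all("td")]
        if len(cells) >= 3:
            records.append({
                "name": cells[0],
                "booking_date": cells[1],
                "charges": cells[2],
            })
    return records

if __name__ == "__main__":
    # Hypothetical example page; real use requires checking the site's terms
    # of use and applicable law before automated collection.
    print(scrape_roster("https://example.org/county-jail/roster"))
```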