A Rule of Persons, Not Machines: The Limits of Legal Automation


Paper by Frank A. Pasquale: “For many legal futurists, attorneys’ work is a prime target for automation. They view the legal practice of most businesses as algorithmic: data (such as facts) are transformed into outputs (agreements or litigation stances) via application of set rules. These technophiles promote substituting computer code for contracts and descriptions of facts now written by humans. They point to early successes in legal automation as proof of concept. TurboTax has helped millions of Americans file taxes, and algorithms have taken over certain aspects of stock trading. Corporate efforts to “formalize legal code” may bring new efficiencies in areas of practice characterized by both legal and factual clarity.

However, legal automation can also elide or exclude important human values, necessary improvisations, and irreducibly deliberative governance. Due process, appeals, and narratively intelligible explanation from persons, for persons, depend on forms of communication that are not reducible to software. Language is constitutive of these aspects of law. To preserve accountability and a humane legal order, these reasons must be expressed in language by a responsible person. This basic requirement for legitimacy limits legal automation in several contexts, including corporate compliance, property recordation, and contracting. A robust and ethical legal profession respects the flexibility and subtlety of legal language as a prerequisite for a just and accountable social order. It ensures a rule of persons, not machines…(More)”

Algorithm Observatory: Where anyone can study any social computing algorithm.


About: “We know that social computing algorithms are used to categorize us, but the way they do so is not always transparent. To take just one example, ProPublica recently uncovered that Facebook allows housing advertisers to exclude users by race.

Even so, there are no simple and accessible resources for us, the public, to study algorithms empirically, and to engage critically with the technologies that are shaping our daily lives in such profound ways.

That is why we created Algorithm Observatory.

Part media literacy project and part citizen experiment, the goal of Algorithm Observatory is to provide a collaborative online lab for the study of social computing algorithms. The data collected through this site is analyzed to compare how a particular algorithm handles data differently depending on the characteristics of users.

Algorithm Observatory is a work in progress. This prototype only allows users to explore Facebook advertising algorithms, and the functionality is limited. We are currently looking for funding to realize the project’s full potential: to allow anyone to study any social computing algorithm….

Our future plans

This is a prototype, which only begins to showcase the things that Algorithm Observatory will be able to do in the future.

Eventually, the website will allow anyone to design an experiment involving a social computing algorithm. The platform will allow researchers to recruit volunteer participants, who will be able to contribute content to the site securely and anonymously. Researchers will then be able to conduct an analysis to compare how the algorithm handles users differently depending on individual characteristics. The results will be shared by publishing a report evaluating the social impact of the algorithm. All data and reports will become publicly available and open for comments and reviews. Researchers will be able to study any algorithm, because the site does not require direct access to the source code, but relies instead on empirical observation of the interaction between the algorithm and volunteer participants….(More)”.

The Unlinkable Data Challenge: Advancing Methods in Differential Privacy


National Institute of Standards and Technology: “Databases across the country include information with potentially important research implications and uses, e.g., contingency planning in disaster scenarios, identifying safety risks in aviation, tracking contagious diseases, and identifying patterns of violence in local communities. However, these datasets include personally identifiable information (PII), and it is not enough to simply remove PII from them. It is well known that records in a dataset can be matched to uniquely identifiable individuals by combining them with auxiliary, and possibly completely unrelated, datasets (known as a linkage attack). Today’s efforts to remove PII do not provide adequate protection against linkage attacks. With the advent of “big data” and technological advances in linking data, there are far too many other possible data sources related to each of us that can lead to our identity being uncovered.
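The mechanics of a linkage attack can be sketched in a few lines. The records and auxiliary “voter roll” below are invented for illustration; the point is that quasi-identifiers (ZIP code, birth year, sex) that survive PII removal can be joined against a public dataset to re-identify individuals.

```python
# Toy illustration of a linkage attack: a "de-identified" medical dataset
# still carries quasi-identifiers that can be joined against a public
# auxiliary dataset (e.g. a voter roll). All records here are fictional.
deidentified = [
    {"zip": "02138", "birth_year": 1962, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1985, "sex": "M", "diagnosis": "flu"},
]
voter_roll = [
    {"name": "Alice Smith", "zip": "02138", "birth_year": 1962, "sex": "F"},
    {"name": "Bob Jones", "zip": "02140", "birth_year": 1985, "sex": "M"},
]

def linkage_attack(target_rows, auxiliary_rows, keys=("zip", "birth_year", "sex")):
    """Re-identify rows whose quasi-identifiers match exactly one auxiliary record."""
    reidentified = []
    for row in target_rows:
        matches = [aux for aux in auxiliary_rows
                   if all(aux[k] == row[k] for k in keys)]
        if len(matches) == 1:  # a unique match defeats the "anonymization"
            reidentified.append({"name": matches[0]["name"], **row})
    return reidentified

print(linkage_attack(deidentified, voter_roll))
# Alice Smith matches uniquely, so her diagnosis is exposed.
```

The second record survives only because its quasi-identifiers are ambiguous; with a richer auxiliary dataset, it too would likely be matched.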

Get Involved – How to Participate

The Unlinkable Data Challenge is a multi-stage Challenge.  This first stage of the Challenge is intended to source detailed concepts for new approaches, inform the final design in the two subsequent stages, and provide recommendations for matching stage 1 competitors into teams for subsequent stages.  Teams will predict and justify where their algorithm fails with respect to the utility-privacy frontier curve.

In this stage, competitors are asked to propose how to de-identify a dataset using less than the available privacy budget, while also maintaining the dataset’s utility for analysis.  For example, the de-identified data, when put through the same analysis pipeline as the original dataset, produces comparable results (i.e. similar coefficients in a linear regression model, or a classifier that produces similar predictions on sub-samples of the data).
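As a minimal sketch of the kind of randomized mechanism competitors might build on (not the Challenge’s own code), the snippet below uses the Laplace mechanism to release a differentially private mean of bounded values while spending only part of a total privacy budget. The function names and dataset are hypothetical.

```python
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling of a Laplace(0, scale) variate.
    u = random.random() - 0.5
    return -scale * math.copysign(1, u) * math.log(1 - 2 * abs(u))

def private_mean(values, lower, upper, epsilon):
    """Differentially private mean of values clipped to [lower, upper].

    With n records and one record's contribution bounded by (upper - lower),
    the mean has sensitivity (upper - lower) / n, so Laplace noise with
    scale (upper - lower) / (epsilon * n) suffices for epsilon-DP.
    """
    n = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / n
    return true_mean + laplace_noise((upper - lower) / (epsilon * n))

random.seed(0)
ages = [23, 35, 41, 52, 29, 60, 44, 38] * 100  # hypothetical dataset, n = 800
total_budget = 1.0
# Spend half the budget on this release, keeping the rest for later queries.
release = private_mean(ages, lower=18, upper=90, epsilon=total_budget / 2)
print(round(release, 2))  # close to the true mean of 40.25
```

Because differential privacy composes, the remaining budget can be spent on further releases, which is exactly the trade-off the utility-privacy frontier curve captures.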

This stage of the Challenge seeks Conceptual Solutions that describe how to use and/or combine methods in differential privacy to mitigate privacy loss when publicly releasing datasets in a variety of industries such as public safety, law enforcement, healthcare/biomedical research, education, and finance.  We are limiting the scope to addressing research questions and methodologies that require regression, classification, and clustering analysis on datasets that contain numerical, geo-spatial, and categorical data.

To compete in this stage, we are asking that you propose a new algorithm utilizing existing or new randomized mechanisms with a justification of how this will optimize privacy and utility across different analysis types.  We are also asking you to propose a dataset that you believe would make a good use case for your proposed algorithm, and provide a means of comparing your algorithm and other algorithms.

All submissions must be made using the submission form provided on the HeroX website….(More)”.

The Researcher Passport: Improving Data Access and Confidentiality Protection


Report by Margaret C. Levenstein, Allison R.B. Tyler, and Johanna Davidson Bleckman: “Research and evidence-building benefit from the increased availability of administrative datasets, linkage across datasets, detailed geospatial data, and other confidential data. Systems and policies for provisioning access to confidential data, however, have not kept pace and indeed restrict and unnecessarily encumber leading-edge science.

One series of roadblocks can be smoothed or removed by establishing a common understanding of what constitutes different levels of data sensitivity and risk as well as minimum researcher criteria for data access within these levels. This report presents the results of a recently completed study of 23 data repositories.

It describes the extant landscape of policies, procedures, practices, and norms for restricted data access and identifies the significant challenges faced by researchers interested in accessing and analyzing restricted use datasets.

It identifies commonalities among these repositories to articulate shared community standards that can be the basis of a community-normed researcher passport: a credential that identifies a trusted researcher to multiple repositories and other data custodians.

Three main developments are recommended.

First, language harmonization: establishing a common set of terms and definitions – that will evolve over time through collaboration within the research community – will allow different repositories to understand and integrate shared standards and technologies into their own processes.

Second: develop a researcher passport, a durable and transferable digital identifier issued by a central, community-recognized data steward. This passport will capture researcher attributes that emerged as common elements of user access requirements across repositories, including training, as well as verification of those attributes (e.g., academic degrees, institutional affiliation, citizenship status, and country of residence).

Third: data custodians issue visas that grant a passport holder access to particular datasets for a particular project for a specific period of time. Like stamps on a passport, these visas provide a history of a researcher’s access to restricted data. This history is integrated into the researcher’s credential, establishing the researcher’s reputation as a trusted data steward….(More)

Big Data against Child Obesity


European Commission: “Childhood and adolescent obesity is a major global and European public health problem. Currently, public actions are detached from local needs, mostly consisting of indiscriminate blanket policies and single-element strategies, which limits their efficacy and effectiveness. The need for community-targeted actions has long been obvious, but the lack of a monitoring and evaluation framework and the methodological inability to objectively quantify local community characteristics in a reasonable timeframe have hindered it.


Big Data based Platform

Technological achievements in mobile and wearable electronics and Big Data infrastructures make it possible to engage European citizens in the data collection process, allowing us to reshape policies at the regional, national and European levels. In BigO, this will be facilitated through the development of a platform that quantifies community behavioural patterns through Big Data provided by wearables and eHealth devices.

Estimate child obesity through community data

BigO has set detailed scientific, technological, validation and business objectives in order to build a system that collects Big Data on children’s behaviour and helps plan health policies against obesity. In addition, during the project, BigO will reach out to more than 25,000 schoolchildren and age-matched obese children and adolescents as sources of community data. Comprehensive models of the obesity prevalence dependence matrix will be created, allowing data-driven predictions about the effectiveness of specific policies on a community and real-time monitoring of the population response, supported by powerful real-time data visualisations….(More)

Data Governance in the Digital Age


Centre for International Governance Innovation: “Data is being hailed as “the new oil.” The analogy seems appropriate given the growing amount of data being collected, and the advances made in its gathering, storage, manipulation and use for commercial, social and political purposes.

Big data and its application in artificial intelligence, for example, promises to transform the way we live and work — and will generate considerable wealth in the process. But data’s transformative nature also raises important questions around how the benefits are shared, privacy, public security, openness and democracy, and the institutions that will govern the data revolution.

The delicate interplay between these considerations means that they have to be treated jointly, and at every level of the governance process, from local communities to the international arena. This series of essays by leading scholars and practitioners, which is also published as a special report, will explore topics including the rationale for a data strategy, the role of a data strategy for Canadian industries, and policy considerations for domestic and international data governance…

RATIONALE OF A DATA STRATEGY

THE ROLE OF A DATA STRATEGY FOR CANADIAN INDUSTRIES

BALANCING PRIVACY AND COMMERCIAL VALUES

DOMESTIC POLICY FOR DATA GOVERNANCE

INTERNATIONAL POLICY CONSIDERATIONS

EPILOGUE

Ten Reasons Not to Measure Impact—and What to Do Instead


Essay by Mary Kay Gugerty & Dean Karlan in the Stanford Social Innovation Review: “Good impact evaluations—those that answer policy-relevant questions with rigor—have improved development knowledge, policy, and practice. For example, the NGO Living Goods conducted a rigorous evaluation to measure the impact of its community health model based on door-to-door sales and promotions. The evidence of impact was strong: Their model generated a 27-percent reduction in child mortality. This evidence subsequently persuaded policy makers, replication partners, and major funders to support the rapid expansion of Living Goods’ reach to five million people. Meanwhile, rigorous evidence continues to further validate the model and help to make it work even better.

Of course, not all rigorous research offers such quick and rosy results. Consider the many studies required to discover a successful drug and the lengthy process of seeking regulatory approval and adoption by the healthcare system. The same holds true for fighting poverty: Innovations for Poverty Action (IPA), a research and policy nonprofit that promotes impact evaluations for finding solutions to global poverty, has conducted more than 650 randomized controlled trials (RCTs) since its inception in 2002. These studies have sometimes provided evidence about how best to use scarce resources (e.g., give away bed nets for free to fight malaria), as well as how to avoid wasting them (e.g., don’t expand traditional microcredit). But the vast majority of studies did not paint a clear picture that led to immediate policy changes. Developing an evidence base is more like building a mosaic: Each individual piece does not make the picture, but bit by bit a picture becomes clearer and clearer.

How do these investments in evidence pay off? IPA estimated the benefits of its research by looking at its return on investment—the ratio of the benefit from the scale-up of the demonstrated large-scale successes divided by the total costs since IPA’s founding. The ratio was 74x—a huge result. But this is far from a precise measure of impact, since IPA cannot establish what would have happened had IPA never existed. (Yes, IPA recognizes the irony of advocating for RCTs while being unable to subject its own operations to that standard. Yet IPA’s approach is intellectually consistent: Many questions and circumstances do not call for RCTs.)

Even so, a simple thought exercise helps to demonstrate the potential payoff. IPA never works alone—all evaluations and policy engagements are conducted in partnership with academics and implementing organizations, and increasingly with governments. Moving from an idea to the research phase to policy takes multiple steps and actors, often over many years. But even if IPA deserves only 10 percent of the credit for the policy changes behind the benefits calculated above, the ratio of benefits to costs is still 7.4x. That is a solid return on investment.
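The arithmetic of this thought exercise is simple enough to check directly. The figures below are normalized, since the excerpt gives only the 74x ratio, not the underlying dollar amounts:

```python
def roi_with_credit_share(benefits, costs, credit_share=1.0):
    """Benefit-cost ratio, discounted by the share of credit claimed."""
    return (benefits * credit_share) / costs

# Normalized so that benefits / costs = 74, matching the essay's estimate.
full = roi_with_credit_share(benefits=74.0, costs=1.0)
partial = roi_with_credit_share(benefits=74.0, costs=1.0, credit_share=0.10)
print(full, round(partial, 2))  # 74.0 7.4
```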

Despite the demonstrated value of high-quality impact evaluations, a great deal of money and time has been wasted on poorly designed, poorly implemented, and poorly conceived impact evaluations. Perhaps some studies had too small a sample or paid insufficient attention to establishing causality and quality data, and hence any results should be ignored; others perhaps failed to engage stakeholders appropriately, and as a consequence useful results were never put to use.

The push for more and more impact measurement can not only lead to poor studies and wasted money, but also distract and take resources from collecting data that can actually help improve the performance of an effort. To address these difficulties, we wrote a book, The Goldilocks Challenge, to help guide organizations in designing “right-fit” evidence strategies. The struggle to find the right fit in evidence resembles the predicament that Goldilocks faces in the classic children’s fable. Goldilocks, lost in the forest, finds an empty house with a large number of options: chairs, bowls of porridge, and beds of all sizes. She tries each but finds that most do not suit her: The porridge is too hot or too cold, the bed too hard or too soft—she struggles to find options that are “just right.” Like Goldilocks, the social sector has to navigate many choices and challenges to build monitoring and evaluation systems that fit their needs. Some will push for more and more data; others will not push for enough….(More)”.

City Data Exchange – Lessons Learned From A Public/Private Data Collaboration


Report by the Municipality of Copenhagen: “The City Data Exchange (CDE) is the product of a collaborative project between the Municipality of Copenhagen, the Capital Region of Denmark, and Hitachi. The purpose of the project is to examine the possibilities of creating a marketplace for the exchange of data between public and private organizations.

The CDE consists of three parts:

  • A collaboration between the different partners on the supply of, and demand for, specific data;
  • A platform for selling and purchasing data, aimed at both public and private organizations;
  • An effort to build further experience in the field of data exchange between public and private organizations.

In 2013, the City of Copenhagen and the Copenhagen Region decided to invest in the creation of a marketplace for the exchange of public- and private-sector data. The initial investment was meant as a seed for a self-sustaining marketplace. This was an innovative approach to test the readiness of the market to deliver new data-sharing solutions.

The CDE is the result of a tender by the Municipality of Copenhagen and the Capital Region of Denmark in 2015. Hitachi Consulting won the tender and has invested in, and worked with, the Municipality of Copenhagen and the Capital Region of Denmark to establish an organization and a technical platform.

The City Data Exchange (CDE) has closed a gap in the regional data infrastructure. Both public- and private-sector organizations have used the CDE to gain insights into data use cases, new external data sources, and GDPR issues, and to explore the value of their data. Before the CDE was launched, there were only a few options available to purchase or sell data.

The City and the Region of Copenhagen are utilizing the insights from the CDE project to improve their internal activities and to shape new policies. The lessons from the CDE also provide insights into a wider national infrastructure for effective data sharing. Based on input from the approximately 1,000 people the CDE has been in contact with, the recommendations are:

  • Start with the use case, as it is key to engage the data community that will use the data;
  • Create a data competence hub, where the data community can meet and get support;
  • Create simple standards and guidelines for data publishing.

The following paper presents some of the key findings from our work with the CDE. It has been compiled by Smart City Insights on behalf of the partners of the City Data Exchange project…(More)”.

Free Speech is a Triangle


Essay by Jack Balkin: “The vision of free expression that characterized much of the twentieth century is inadequate to protect free expression today.

The twentieth century featured a dyadic or dualist model of speech regulation with two basic kinds of players: territorial governments on the one hand, and speakers on the other. The twenty-first century model is pluralist, with multiple players. It is easiest to think of it as a triangle. On one corner are nation states and the European Union. On the second corner are privately-owned Internet infrastructure companies, including social media companies, search engines, broadband providers, and electronic payment systems. On the third corner are many different kinds of speakers: legacy media, civil society organizations, hackers, and trolls.

Territorial governments continue to regulate speakers and legacy media through traditional or “old-school” speech regulation. But nation states and the European Union also now employ “new-school” speech regulation that is aimed at Internet infrastructure owners and designed to get these private companies to surveil, censor, and regulate speakers for them. Finally, infrastructure companies like Facebook also regulate and govern speakers through techniques of private governance and surveillance.

The practical ability to speak in the digital world emerges from the struggle for power between these various forces, with old-school, new-school and private regulation directed at speakers, and both nation states and civil society organizations pressuring infrastructure owners to regulate speech.

If the characteristic feature of free speech regulation in our time is a triangle that combines new school speech regulation with private governance, then the best way to protect free speech values today is to combat and compensate for that triangle’s evolving logic of public and private regulation. The first goal is to prevent or ameliorate as much as possible collateral censorship and new forms of digital prior restraint. The second goal is to protect people from new methods of digital surveillance and manipulation—methods that emerged from the rise of large multinational companies that depend on data collection, surveillance, analysis, control, and distribution of personal data.

This essay describes how nation states should and should not regulate the digital infrastructure consistent with the values of freedom of speech and press; it emphasizes that different models of regulation are appropriate for different parts of the digital infrastructure. Some parts of the digital infrastructure are best regulated along the lines of common carriers or places of public accommodation. But governments should not impose First Amendment-style or common carriage obligations on social media and search engines. Rather, governments should require these companies to provide due process toward their end-users. Governments should also treat these companies as information fiduciaries who have duties of good faith and non-manipulation toward their end-users. Governments can implement all of these reforms—properly designed—consistent with constitutional guarantees of free speech and free press….(More)”.

Doing Research In and On the Digital: Research Methods across Fields of Inquiry


Book edited by Cristina Costa and Jenna Condie: “As a social space, the web provides researchers both with a tool and an environment to explore the intricacies of everyday life. As a site of mediated interactions and interrelationships, the ‘digital’ has evolved from being a space of information to a space of creation, thus providing new opportunities regarding how, where, and why to conduct social research.

Doing Research In and On the Digital aims to deliver on two fronts: first, by detailing how researchers are devising and applying innovative research methods for and within the digital sphere, and, secondly, by discussing the ethical challenges and issues implied and encountered in such approaches.

In two core Parts, this collection explores:

  • content collection: methods for harvesting digital data
  • engaging research informants: digital participatory methods and data stories.

With contributions from a diverse range of fields such as anthropology, sociology, education, healthcare and psychology, this volume will particularly appeal to post-graduate students and early career researchers who are navigating through new terrain in their digital-mediated research endeavours….(More)”.