AI and Big Data: Disruptive Regulation


Book by Mark Findlay, Josephine Seah, and Willow Wong: “This provocative and timely book identifies and disrupts the conventional regulation and governance discourses concerning AI and big data. It suggests that, instead of being used as tools for exclusionist commercial markets, AI and big data can be employed in governing digital transformation for social good.

Analysing the ways in which global technology companies have colonised data access, the book reveals how trust, ethics, and digital self-determination can be reconsidered and engaged to promote the interests of marginalised stakeholders in data arrangements. Chapters examine the regulation of labour engagement in digital economies, the landscape of AI ethics, and a multitude of questions regarding participation, costs, and sustainability. Presenting several informative case studies, the book challenges some of the accepted qualifiers of frontier tech and data use and proposes innovative ways of actioning the more conventional regulatory components of big data.

Scholars and students in information and media law, regulation and governance, and law and politics will find this book to be critical reading. It will also be of interest to policymakers and the AI and data science community…(More)”.

We, the Data


Book by Wendy H. Wong: “Our data-intensive world is here to stay, but does that come at the cost of our humanity in terms of autonomy, community, dignity, and equality? In We, the Data, Wendy H. Wong argues that we cannot allow that to happen. Exploring the pervasiveness of data collection and tracking, Wong reminds us that we are all stakeholders in this digital world, who are currently being left out of the most pressing conversations around technology, ethics, and policy. This book clarifies the nature of datafication and calls for an extension of human rights to recognize how data complicate what it means to safeguard and encourage human potential.

As we go about our lives, we are co-creating data through what we do. We must embrace that these data are a part of who we are, Wong explains, even as current policies do not yet reflect the extent to which human experiences have changed. This means we are more than mere “subjects” or “sources” of data “by-products” that can be harvested and used by technology companies and governments. By exploring data rights, facial recognition technology, our posthumous rights, and our need for a right to data literacy, Wong has crafted a compelling case for engaging as stakeholders to hold data collectors accountable. Just as the Universal Declaration of Human Rights laid the global groundwork for human rights, We, the Data gives us a foundation upon which we claim human rights in the age of data…(More)”.

Demographic Parity: Mitigating Biases in Real-World Data


Paper by Orestis Loukas and Ho-Ryun Chung: “Computer-based decision systems are widely used to automate decisions in many aspects of everyday life, which include sensitive areas like hiring, lending and even criminal sentencing. A decision pipeline heavily relies on large volumes of historical real-world data for training its models. However, historical training data often contains gender, racial or other biases which are propagated to the trained models, influencing computer-based decisions. In this work, we propose a robust methodology that guarantees the removal of unwanted biases while maximally preserving classification utility. Our approach can always achieve this in a model-independent way by deriving from real-world data the asymptotic dataset that uniquely encodes demographic parity and realism. As a proof-of-principle, we deduce from public census records such an asymptotic dataset from which synthetic samples can be generated to train well-established classifiers. Benchmarking the generalization capability of these classifiers trained on our synthetic data, we confirm the absence of any explicit or implicit bias in the computer-aided decisions…(More)”.
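The abstract above turns on the demographic-parity condition: the rate of favourable outcomes should be equal across protected groups. The sketch below, which is not the paper's asymptotic-dataset construction but only the parity condition itself, measures the gap on a toy dataset; the group labels and records are invented for illustration.

```python
from collections import Counter

def parity_gap(records):
    """Max difference in positive-outcome rate between groups.

    records: iterable of (group, label) pairs, label in {0, 1}.
    Demographic parity holds when this gap is (close to) zero.
    """
    pos, tot = Counter(), Counter()
    for group, label in records:
        tot[group] += 1
        pos[group] += label
    rates = {g: pos[g] / tot[g] for g in tot}
    return max(rates.values()) - min(rates.values())

# Toy data: group "a" receives a positive outcome 2/3 of the time,
# group "b" only 1/3 of the time, so the parity gap is 1/3.
data = [("a", 1), ("a", 1), ("a", 0), ("b", 1), ("b", 0), ("b", 0)]
print(round(parity_gap(data), 3))  # 0.333
```

In the paper's setting, the debiasing step would reshape the training distribution so that a classifier fit to it drives this gap to zero while preserving as much predictive utility as possible.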

Five types of urban digital twins


Blog by Darrel Ronald: “The definition of urban digital twins is too vague — so it is important to create a clearer picture of the types of urban digital twins that are available. Not all digital twins are the same, and each one comes with features and capabilities, strengths and weaknesses, as well as appropriate and inappropriate use cases….

Urban Digital Twin Taxonomy. Source: Darrel Ronald, Spatiomatics

As shown in the Urban Digital Twin Taxonomy above, I propose that we classify these products first by their Main Functionality (the Use Case), then by their Technology Platform. I highlight some of the main products within the different categories and their product scope. Next, I detail the different types of twins and offer brief strengths and weaknesses for each type. This taxonomy could apply to other industries such as architecture or manufacturing, but here it is specifically applied to cities and urban development projects.

The main functionalities can be grouped by:

  • Modelling Twin
  • Computational Twin
  • Scenario Twin
  • Operational Twin
  • Experiential Twin

The technology platforms can be grouped by:

  • Computer Aided Design (CAD)
  • Web GIS
  • Geographic Information System (GIS)
  • Gaming…(More)”.
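The two axes of the taxonomy, functionality and platform, can be sketched as a small data structure. This is only an illustration of the classification scheme from the blog; the example product name and its placement are invented.

```python
from dataclasses import dataclass
from enum import Enum

class Functionality(Enum):
    MODELLING = "Modelling Twin"
    COMPUTATIONAL = "Computational Twin"
    SCENARIO = "Scenario Twin"
    OPERATIONAL = "Operational Twin"
    EXPERIENTIAL = "Experiential Twin"

class Platform(Enum):
    CAD = "Computer Aided Design (CAD)"
    WEB_GIS = "Web GIS"
    GIS = "Geographic Information System (GIS)"
    GAMING = "Gaming"

@dataclass
class UrbanDigitalTwin:
    """A product classified first by use case, then by platform."""
    name: str
    functionality: Functionality
    platform: Platform

# Hypothetical entry to show how a product slots into the taxonomy.
example = UrbanDigitalTwin("ExampleCityTwin", Functionality.SCENARIO, Platform.WEB_GIS)
print(example.functionality.value)  # Scenario Twin
```

Classifying by use case first mirrors the blog's argument: the same platform (say, Web GIS) can host very different twins, so functionality is the more informative primary axis.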

From Happiness Data to Economic Conclusions


Paper by Daniel J. Benjamin, Kristen Cooper, Ori Heffetz & Miles S. Kimball: “Happiness data—survey respondents’ self-reported well-being (SWB)—have become increasingly common in economics research, with recent calls to use them in policymaking. Researchers have used SWB data in novel ways, for example to learn about welfare or preferences when choice data are unavailable or difficult to interpret. Focusing on leading examples of this pioneering research, the first part of this review uses a simple theoretical framework to reverse-engineer some of the crucial assumptions that underlie existing applications. The second part discusses evidence bearing on these assumptions and provides practical advice to the agencies and institutions that generate SWB data, the researchers who use them, and the policymakers who may use the resulting research. While we advocate creative uses of SWB data in economics, we caution that their use in policy will likely require both additional data collection and further research to better understand the data…(More)”.

Evidence 2.0: The Next Era of Evidence-Based Policymaking


Interview with Nick Hart & Jason Saul: “One of the great—if largely unsung—bipartisan congressional acts of recent history was the passage in 2018 of the Foundations for Evidence-Based Policymaking Act. In essence, the “Evidence Act” codified the goal of using solid, consistent evidence as the basis for funding decisions on trillions of dollars of public money. Agencies use this data to decide on the most effective and most promising solutions for a vast array of issues, from early-childhood education to environmental protection.

Five years later, while most federal agencies have created fairly robust evidence bases, unlocking that evidence for practical use by decision makers remains challenging. One might argue that if Evidence 1.0 was focused on the production of evidence, then the next five years—let’s call it Evidence 2.0—will be focused on the effective use of that evidence. Now that evidence is readily available to policymakers, the question is, how can that data be standardized, aggregated, derived, applied, and used for predictive decision-making?…(More)”.

Essential requirements for the governance and management of data trusts, data repositories, and other data collaborations


Paper by Alison Paprica et al: “Around the world, many organisations are working on ways to increase the use, sharing, and reuse of person-level data for research, evaluation, planning, and innovation while ensuring that data are secure and privacy is protected. As a contribution to broader efforts to improve data governance and management, in 2020 members of our team published 12 minimum specification essential requirements (min specs) to provide practical guidance for organisations establishing or operating data trusts and other forms of data infrastructure… We convened an international team, consisting mostly of participants from Canada and the United States of America, to test and refine the original 12 min specs. Twenty-three (23) data-focused organisations and initiatives recorded the various ways they address the min specs. Sub-teams analysed the results, used the findings to make improvements to the min specs, and identified materials to support organisations/initiatives in addressing the min specs.
Analyses and discussion led to an updated set of 15 min specs covering five categories: one min spec for Legal, five for Governance, four for Management, two for Data Users, and three for Stakeholder & Public Engagement. Multiple changes were made to make the min specs language more technically complete and precise. The updated set of 15 min specs has been integrated into a Canadian national standard that, to our knowledge, is the first to include requirements for public engagement and Indigenous Data Sovereignty…(More)”.

What Big Tech Knows About Your Body


Article by Yael Grauer: “If you were seeking online therapy from 2017 to 2021—and a lot of people were—chances are good that you found your way to BetterHelp, which today describes itself as the world’s largest online-therapy purveyor, with more than 2 million users. Once you were there, after a few clicks, you would have completed a form—an intake questionnaire, not unlike the paper one you’d fill out at any therapist’s office: Are you new to therapy? Are you taking any medications? Having problems with intimacy? Experiencing overwhelming sadness? Thinking of hurting yourself? BetterHelp would have asked you if you were religious, if you were LGBTQ, if you were a teenager. These questions were just meant to match you with the best counselor for your needs, small text would have assured you. Your information would remain private.

Except BetterHelp isn’t exactly a therapist’s office, and your information may not have been completely private. In fact, according to a complaint brought by federal regulators, for years, BetterHelp was sharing user data—including email addresses, IP addresses, and questionnaire answers—with third parties, including Facebook and Snapchat, for the purposes of targeting ads for its services. It was also, according to the Federal Trade Commission, poorly regulating what those third parties did with users’ data once they got them. In July, the company finalized a settlement with the FTC and agreed to refund $7.8 million to consumers whose privacy, regulators claimed, had been compromised. (In a statement, BetterHelp admitted no wrongdoing and described the alleged sharing of user information as an “industry-standard practice.”)

We leave digital traces about our health everywhere we go: by completing forms like BetterHelp’s. By requesting a prescription refill online. By clicking on a link. By asking a search engine about dosages or directions to a clinic or pain in chest dying. By shopping, online or off. By participating in consumer genetic testing. By stepping on a smart scale or using a smart thermometer. By joining a Facebook group or a Discord server for people with a certain medical condition. By using internet-connected exercise equipment. By using an app or a service to count your steps or track your menstrual cycle or log your workouts. Even demographic and financial data unrelated to health can be aggregated and analyzed to reveal or infer sensitive information about people’s physical or mental-health conditions…(More)”.

The Man Who Trapped Us in Databases


McKenzie Funk in The New York Times Magazine: “One of Asher’s innovations — or more precisely one of his companies’ innovations — was what is now known as the LexID. My LexID, I learned, is 000874529875. This unique string of digits is a kind of shadow Social Security number, one of many such “persistent identifiers,” as they are called, that have been issued not by the government but by data companies like Acxiom, Oracle, Thomson Reuters, TransUnion — or, in this case, LexisNexis.

My LexID was created sometime in the early 2000s in Asher’s computer room in South Florida, as many still are, and without my consent it began quietly stalking me. One early data point on me would have been my name; another, my parents’ address in Oregon. From my birth certificate or my driver’s license or my teenage fishing license — and from the fact that the three confirmed one another — it could get my sex and my date of birth. At the time, it would have been able to collect the address of the college I attended, Swarthmore, which was small and expensive, and it would have found my first full-time employer, the National Geographic Society, quickly amassing more than enough data to let someone — back then, a human someone — infer quite a bit more about me and my future prospects…(More)”
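The linkage the passage describes, where independent documents that agree on core identifiers get resolved to one persistent ID, is a form of deterministic record linkage. The sketch below shows the idea only; the hashing scheme, 12-digit format, and sample records are invented, not LexisNexis's actual method.

```python
import hashlib

def persistent_id(name: str, dob: str, sex: str) -> str:
    """Mint a stable 12-digit identifier from core personal identifiers.

    Any record carrying the same (name, dob, sex) triple resolves to the
    same ID, which is how separate documents get linked to one person.
    """
    digest = hashlib.sha256(f"{name}|{dob}|{sex}".encode()).hexdigest()
    return str(int(digest, 16))[:12]

# Hypothetical documents about one (fictional) person: because the core
# fields confirm one another, both collapse to a single shadow identifier.
records = [
    {"source": "birth certificate", "name": "jane q. sample", "dob": "1980-05-01", "sex": "F"},
    {"source": "driver's license",  "name": "jane q. sample", "dob": "1980-05-01", "sex": "F"},
]
ids = {persistent_id(r["name"], r["dob"], r["sex"]) for r in records}
print(len(ids))  # 1
```

Real brokers also use probabilistic matching to absorb typos and name variants, which is what lets a teenage fishing license and a driver's license "confirm one another" even when the fields are not byte-identical.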

Data Repurposing through Compatibility: A Computational Perspective


Paper by Asia Biega: “Reuse of data in new contexts beyond the purposes for which it was originally collected has contributed to technological innovation and reducing the consent burden on data subjects. One of the legal mechanisms that makes such reuse possible is purpose compatibility assessment. In this paper, I offer an in-depth analysis of this mechanism through a computational lens. I moreover consider what should qualify as repurposing apart from using data for a completely new task, and argue that typical purpose formulations are an impediment to meaningful repurposing. Overall, the paper positions compatibility assessment as a constructive practice beyond an ineffective standard…(More)”