Improving data access democratizes and diversifies science

Research article by Abhishek Nagaraj, Esther Shears, and Mathijs de Vaan: “Data access is critical to empirical research, but past work on open access is largely restricted to the life sciences and has not directly analyzed the impact of data access restrictions. We analyze the impact of improved data access on the quantity, quality, and diversity of scientific research. We focus on the effects of a shift in the accessibility of satellite imagery data from Landsat, a NASA program that provides valuable remote-sensing data. Our results suggest that improved access to scientific data can lead to a large increase in the quantity and quality of scientific research. Further, better data access disproportionately enables the entry of scientists with fewer resources, and it promotes diversity of scientific research….(More)”.

The Cruel New Era of Data-Driven Deportation

Article by Alvaro M. Bedoya: “For a long time, mass deportations were a small-data affair, driven by tips, one-off investigations, or animus-driven hunches. But beginning under George W. Bush, and expanding under Barack Obama, ICE leadership started to reap the benefits of Big Data. The centerpiece of that shift was the “Secure Communities” program, which gathered the fingerprints of arrestees at local and state jails across the nation and compared them with immigration records. That program quickly became a major driver for interior deportations. But ICE wanted more data. The agency had long tapped into driver address records through law enforcement networks. Eyeing the breadth of DMV databases, agents began to ask state officials to run face recognition searches on driver photos against the photos of undocumented people. In Utah, for example, ICE officers requested hundreds of face searches starting in late 2015. Many immigrants avoid contact with any government agency, even the DMV, but they can’t go without heat, electricity, or water; ICE aimed to find them, too. So, that same year, ICE paid for access to a private database that includes the addresses of customers from 80 national and regional electric, cable, gas, and telephone companies.

Amid this bonanza, at least, the Obama administration still acknowledged red lines. Some data were too invasive, some uses too immoral. Under Donald Trump, these limits fell away.

In 2017, breaking with prior practice, ICE started to use data from interviews with scared, detained kids and their relatives to find and arrest more than 500 sponsors who stepped forward to take in the children. At the same time, ICE announced a plan for a social media monitoring program that would use artificial intelligence to automatically flag 10,000 people per month for deportation investigations. (It was scuttled only when computer scientists helpfully indicated that the proposed system was impossible.) The next year, ICE secured access to 5 billion license plate scans from public parking lots and roadways, a hoard that tracks the drives of 60 percent of Americans—an initiative blocked by Department of Homeland Security leadership four years earlier. In August, the agency cut a deal with Clearview AI, whose technology identifies people by comparing their faces not to millions of driver photos, but to 3 billion images from social media and other sites. This is a new era of immigrant surveillance: ICE has transformed from an agency that tracks some people sometimes to an agency that can track anyone at any time….(More)”.

AI planners in Minecraft could help machines design better cities

Article by Will Douglas Heaven: “A dozen or so steep-roofed buildings cling to the edges of an open-pit mine. High above them, on top of an enormous rock arch, sits an inaccessible house. Elsewhere, a railway on stilts circles a group of multicolored tower blocks. Ornate pagodas decorate a large paved plaza. And a lone windmill turns on an island, surrounded by square pigs. This is Minecraft city-building, AI style.

Minecraft has long been a canvas for wild invention. Fans have used the hit block-building game to create replicas of everything from downtown Chicago and King’s Landing to working CPUs. In the decade since its first release, anything that can be built has been.

Since 2018, Minecraft has also been the setting for a creative challenge that stretches the abilities of machines. The annual Generative Design in Minecraft (GDMC) competition asks participants to build an artificial intelligence that can generate realistic towns or villages in previously unseen locations. The contest is just for fun, for now, but the techniques explored by the various AI competitors are precursors of ones that real-world city planners could use….(More)”.

Smart Rural: The Open Data Gap

Paper by Johanna Walker et al: “The smart city paradigm has underpinned a great deal of the use and production of open data for the benefit of policymakers and citizens. This paper posits that this further enhances the existing urban rural divide. It investigates the availability and use of rural open data along two parameters: pertaining to rural populations, and to key parts of the rural economy (agriculture, fisheries and forestry). It explores the relationship between key statistics of national / rural economies and rural open data; and the use and users of rural open data where it is available. It finds that although countries with more rural populations are not necessarily earlier in their Open Data Maturity journey, there is still a lack of institutionalisation of open data in rural areas; that there is an apparent gap between the importance of agriculture to a country’s GDP and the amount of agricultural data published openly; and lastly, that the smart city paradigm cannot simply be transferred to the rural setting. It suggests instead the adoption of the emerging ‘smart region’ paradigm as that most likely to support the specific data needs of rural areas….(More)”.

Emerging models of data governance in the age of datafication

Paper by Marina Micheli et al: “The article examines four models of data governance emerging in the current platform society. While major attention is currently given to the dominant model of corporate platforms collecting and economically exploiting massive amounts of personal data, other actors, such as small businesses, public bodies and civic society, take also part in data governance. The article sheds light on four models emerging from the practices of these actors: data sharing pools, data cooperatives, public data trusts and personal data sovereignty. We propose a social science-informed conceptualisation of data governance. Drawing from the notion of data infrastructure we identify the models as a function of the stakeholders’ roles, their interrelationships, articulations of value, and governance principles. Addressing the politics of data, we considered the actors’ competitive struggles for governing data. This conceptualisation brings to the forefront the power relations and multifaceted economic and social interactions within data governance models emerging in an environment mainly dominated by corporate actors. These models highlight that civic society and public bodies are key actors for democratising data governance and redistributing value produced through data. Through the discussion of the models, their underpinning principles and limitations, the article wishes to inform future investigations of socio-technical imaginaries for the governance of data, particularly now that the policy debate around data governance is very active in Europe….(More)”.

Models and Modeling in the Sciences: A Philosophical Introduction

Book by Stephen M. Downes: “Biologists, climate scientists, and economists all rely on models to move their work forward. In this book, Stephen M. Downes explores the use of models in these and other fields to introduce readers to the various philosophical issues that arise in scientific modeling. Readers learn that paying attention to models plays a crucial role in appraising scientific work. 

This book first presents a wide range of models from a number of different scientific disciplines. After assembling some illustrative examples, Downes demonstrates how models shed light on many perennial issues in philosophy of science and in philosophy in general. Reviewing the range of views on how models represent their targets introduces readers to the key issues in debates on representation, not only in science but in the arts as well. Also, standard epistemological questions are cast in new and interesting ways when readers confront the question, “What makes for a good (or bad) model?”…(More)”.

How Algorithms Can Fight Bias Instead of Entrench It

Essay by Tobias Baer: “…How can we build algorithms that correct for biased data and that live up to the promise of equitable decision-making?

When we consider changing an algorithm to eliminate bias, it is helpful to distinguish what we can change at three different levels (from least to most technical): the decision algorithm, formula inputs, and the formula itself.

In discussing the levels, I will use a fictional example, involving Martians and Zeta Reticulans. I do this because picking a real-life example would, in fact, be stereotyping—I would perpetuate the very biases I try to fight by reiterating a simplified version of the world, and every time I state that a particular group of people is disadvantaged, I also can negatively affect the self-perception of people who consider themselves members of these groups. I do apologize if I unintentionally insult any Martians reading this article!

On the simplest and least technical level, we would adjust only the overall decision algorithm that takes one or more statistical formulas (typically to predict unknown outcomes such as academic success, recidivation, or marital bliss) as an input and applies rules to translate the predictions of these formulas into decisions (e.g., by comparing predictions with externally chosen cutoff values or contextually picking one prediction over another). Such rules can be adjusted without touching the statistical formulas themselves.

An example of such an intervention is called boxing. Imagine you have a score of astrological ability. The astrological ability score is a key criterion for shortlisting candidates for the Interplanetary Economic Forecasting Institute. You would have no objective reason to believe that Martians are any less apt at prognosticating white noise than Zeta Reticulans; however, due to racial prejudice in our galaxy, Martian children tend to get asked a lot less for their opinion and therefore have a lot less practice in gabbing than Zeta Reticulans, and as a result only one percent of Martian applicants achieve the minimum score required to be hired for the Interplanetary Economic Forecasting Institute as compared to three percent of Zeta Reticulans.

Boxing would posit that for hiring decisions to be neutral of race, for each race two percent of applicants should be eligible, and boxing would achieve it by calibrating different cut-off scores (i.e., different implied probabilities of astrological success) for Martians and Zeta Reticulans.

Another example of a level-one adjustment would be to use multiple rank-ordering scores and to admit everyone who achieves a high score on any one of them. This approach is particularly well suited if you have different methods of assessment at your disposal, but each method implies a particular bias against one or more subsegments. An example for a crude version of this approach is admissions to medical school in Germany, where routes include college grades, a qualitative assessment through an interview, and a waitlist….(More)”.
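
The two level-one adjustments described in the excerpt, boxing and admitting on any one of several scores, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the essay's own implementation: the function names (boxing_cutoffs, is_eligible, admit_on_any), the data layout, and the toy scores are all hypothetical, and ties at the cutoff are resolved by admitting everyone who matches it, which can push a group slightly above the target rate.

```python
import math
from collections import defaultdict


def boxing_cutoffs(applicants, target_rate):
    """Compute one cutoff score per group (hypothetical helper).

    `applicants` is an iterable of (group, score) pairs. The cutoff for each
    group is the score of its last applicant inside the top `target_rate`
    share, so roughly the same fraction of every group is eligible.
    """
    by_group = defaultdict(list)
    for group, score in applicants:
        by_group[group].append(score)

    cutoffs = {}
    for group, scores in by_group.items():
        ranked = sorted(scores, reverse=True)
        # Small tolerance guards against floating-point overshoot in the product.
        n_eligible = max(1, math.ceil(target_rate * len(ranked) - 1e-9))
        cutoffs[group] = ranked[n_eligible - 1]
    return cutoffs


def is_eligible(group, score, cutoffs):
    """Apply the group-specific cutoff; the scoring formula itself is untouched."""
    return score >= cutoffs[group]


def admit_on_any(scores, route_cutoffs):
    """Admit if any one assessment route clears its own cutoff.

    `scores` and `route_cutoffs` are dicts keyed by route name.
    """
    return any(scores[route] >= cutoff for route, cutoff in route_cutoffs.items())


if __name__ == "__main__":
    # Invented toy data: four applicants per group, target eligibility of 50%.
    toy = [
        ("martian", 10), ("martian", 20), ("martian", 30), ("martian", 40),
        ("zeta", 50), ("zeta", 60), ("zeta", 70), ("zeta", 80),
    ]
    cutoffs = boxing_cutoffs(toy, 0.5)
    print(cutoffs)
    print(is_eligible("martian", 30, cutoffs), is_eligible("zeta", 60, cutoffs))
    print(admit_on_any({"grades": 1.0, "interview": 9.0},
                       {"grades": 3.5, "interview": 8.0}))
```

Both functions operate only on the decision rules, which matches the excerpt's point that level-one changes leave the underlying statistical formulas alone. Whether equal eligibility rates per group is the right fairness target is a policy choice the sketch does not settle.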

How Tech Companies Can Advance Data Science for Social Good

Essay by Nick Martin: “As the world struggles to achieve the UN’s Sustainable Development Goals (SDGs), the need for reliable data to track our progress is more important than ever. Government, civil society, and private sector organizations all play a role in producing, sharing, and using this data, but their information-gathering and -analysis efforts have been able to shed light on only 68 percent of the SDG indicators so far, according to a 2019 UN study.

To help fill the gap, the data science for social good (DSSG) movement has for years been making datasets about important social issues—such as health care infrastructure, school enrollment, air quality, and business registrations—available to trusted organizations or the public. Large tech companies such as Facebook, Google, Amazon, and others have recently begun to embrace the DSSG movement. Spurred on by advances in the field, the Development Data Partnership, the World Economic Forum’s 2030Vision consortium, and Data Collaboratives, they’re offering information about social media users’ mobility during COVID-19, cloud computing infrastructure to help nonprofits analyze large datasets, and other important tools and services.

But sharing data resources doesn’t mean they’ll be used effectively, if at all, to advance social impact. High-impact results require recipients of data assistance to inhabit a robust, holistic data ecosystem that includes assets like policies for safely handling data and the skills to analyze it. As tech firms become increasingly involved with using data and data science to help achieve the SDGs, it’s important that they understand the possibilities and limitations of the nonprofits and other civil society organizations they’re working with. Without a firm grasp on the data ecosystems of their partners, all the technical wizardry in the world may be for naught.

Companies must ask questions such as: What incentives or disincentives are in place for nonprofits to experiment with data science in their work? What gaps remain between what nonprofits or data scientists need and the resources funders provide? What skills must be developed? To help find answers, TechChange, an organization dedicated to using technology for social good, partnered with Project17, Facebook’s partnerships-led initiative to accelerate progress on the SDGs. Over the past six months, the team led interviews with top figures in the DSSG community from industry, academia, and the public sector. The 14 experts shared numerous insights into using data and data science to advance social good and the SDGs. Four takeaways emerged from our conversations and research…(More)”.

Ethical Challenges and Opportunities Associated With the Ability to Perform Medical Screening From Interactions With Search Engines

Viewpoint by Elad Yom-Tov and Yuval Cherlow: “Recent research has shown the efficacy of screening for serious medical conditions from data collected while people interact with online services. In particular, queries to search engines and the interactions with them were shown to be advantageous for screening a range of conditions including diabetes, several forms of cancer, eating disorders, and depression. These screening abilities offer unique advantages in that they can serve a broad strata of the society, including people in underserved populations and in countries with poor access to medical services. However, these advantages need to be balanced against the potential harm to privacy, autonomy, and nonmaleficence, which are recognized as the cornerstones of ethical medical care. Here, we discuss these opportunities and challenges, both when collecting data to develop online screening services and when deploying them. We offer several solutions that balance the advantages of these services with the ethical challenges they pose….(More)”.

AI ethics groups are repeating one of society’s classic mistakes

Article by Abhishek Gupta and Victoria Heath: “International organizations and corporations are racing to develop global guidelines for the ethical use of artificial intelligence. Declarations, manifestos, and recommendations are flooding the internet. But these efforts will be futile if they fail to account for the cultural and regional contexts in which AI operates.

AI systems have repeatedly been shown to cause problems that disproportionately affect marginalized groups while benefiting a privileged few. The global AI ethics efforts under way today—of which there are dozens—aim to help everyone benefit from this technology, and to prevent it from causing harm. Generally speaking, they do this by creating guidelines and principles for developers, funders, and regulators to follow. They might, for example, recommend routine internal audits or require protections for users’ personally identifiable information.

We believe these groups are well-intentioned and are doing worthwhile work. The AI community should, indeed, agree on a set of international definitions and concepts for ethical AI. But without more geographic representation, they’ll produce a global vision for AI ethics that reflects the perspectives of people in only a few regions of the world, particularly North America and northwestern Europe.

This work is not easy or straightforward. “Fairness,” “privacy,” and “bias” mean different things in different places. People also have disparate expectations of these concepts depending on their own political, social, and economic realities. The challenges and risks posed by AI also differ depending on one’s locale.

If organizations working on global AI ethics fail to acknowledge this, they risk developing standards that are, at best, meaningless and ineffective across all the world’s regions. At worst, these flawed standards will lead to more AI systems and tools that perpetuate existing biases and are insensitive to local cultures….(More)”.