Unlocking AI for All: The Case for Public Data Banks


Article by Kevin Frazier: “The data relied on by OpenAI, Google, Meta, and other artificial intelligence (AI) developers is not readily available to other AI labs. Google and Meta relied, in part, on data gathered from their own products to train and fine-tune their models. OpenAI used tactics to acquire data that now would not work or may be more likely to be found in violation of the law (whether such tactics violated the law when originally used by OpenAI is being worked out in the courts). Upstart labs as well as research outfits find themselves with a dearth of data. Full realization of the positive benefits of AI, such as being deployed in costly but publicly useful ways (think tutoring kids or identifying common illnesses), as well as complete identification of the negative possibilities of AI (think perpetuating cultural biases) requires that labs other than the big players have access to quality, sufficient data.

The proper response is not to return to an exploitative status quo. Google, for example, may have relied on data from YouTube videos without meaningful consent from users. OpenAI may have hoovered up copyrighted data with little regard for the legal and social ramifications of that approach. In response to these questionable approaches, data has (rightfully) become harder to acquire. Cloudflare has equipped websites with the tools necessary to limit data scraping—the process of extracting data from another computer program. Regulators have developed new legal limits on data scraping or enforced old ones. Data owners have become more defensive over their content and, in some cases, more litigious. All of these largely positive developments from the perspective of data creators (which is to say, anyone and everyone who uses the internet) diminish the odds of newcomers entering the AI space. The creation of a public AI training data bank is necessary to ensure the availability of enough data for upstart labs and public research entities. Such banks would prevent those new entrants from having to go down the costly and legally questionable path of trying to hoover up as much data as possible…(More)”.

Zillow introduces First Street’s comprehensive climate risk data on for-sale listings across the US


Press Release: “Zillow® is introducing climate risk data, provided by First Street, the standard for climate risk financial modeling, on for-sale property listings across the U.S. Home shoppers will gain insights into five key risks—flood, wildfire, wind, heat and air quality—directly from listing pages, complete with risk scores, interactive maps and insurance requirements.

With more than 80% of buyers now considering climate risks when purchasing a home, this feature provides a clearer understanding of potential hazards, helping buyers to better assess long-term affordability and plan for the future. In assisting buyers to navigate the growing risk of climate change, Zillow is the only platform to feature tailored insurance recommendations alongside detailed historical insights, showing if or when a property has experienced past climate events, such as flooding or wildfires…

When using Zillow’s search map view, home shoppers can explore climate risk data through an interactive map highlighting five key risk categories: flood, wildfire, wind, heat and air quality. Each risk is color-coded and has its own color scale, helping consumers intuitively navigate their search. Informative labels give more context to climate data and link to First Street’s property-specific climate risk reports for full insights.

When viewing a for-sale property on Zillow, home shoppers will see a new climate risk section. This section includes a separate module for each risk category—flood, wildfire, wind, heat and air quality—giving detailed, property-specific data from First Street. This section not only shows how these risks might affect the home now and in the future, but also provides crucial information on wind, fire and flood insurance requirements.

Nationwide, more new listings came with major climate risk, compared to homes listed for sale five years ago, according to a Zillow analysis conducted in August. That trend holds true for all five of the climate risk categories Zillow analyzed. Across all new listings in August, 16.7% were at major risk of wildfire, while 12.8% came with a major risk of flooding…(More)”.

Federal Court Invalidates NYC Law Requiring Food Delivery Apps to Share Customer Data with Restaurants


Article by Hunton Andrews Kurth: “On September 24, 2024, a federal district court held that New York City’s “Customer Data Law” violates the First Amendment. Passed in the summer of 2021, the law requires food-delivery apps to share customer-specific data with restaurants that prepare delivered meals.

The New York City Council enacted the Customer Data Law to boost the local restaurant industry in the wake of the pandemic. The law requires food-delivery apps to provide restaurants (upon the restaurants’ request) with each diner’s full name, email address, phone number, delivery address, and order contents. Customers may opt out of such sharing. The law’s supporters argue that requiring such disclosure addresses exploitation by the delivery apps and helps restaurants advertise more effectively.

Normally, when a customer places an order through a food-delivery app, the app provides the restaurant with the customer’s first name, last initial and food order. Food-delivery apps share aggregate data analytics with restaurants but generally do not share customer-specific data beyond the information necessary to fulfill an order. Some apps, for example, provide restaurants with data related to their menu performance, customer feedback and daily operations.

Major food-delivery app companies challenged the Customer Data Law, arguing that its data sharing requirement compels speech impermissibly under the First Amendment. Siding with the apps, the U.S. District Court for the Southern District of New York declared the city’s law invalid, holding that its data sharing requirement is not appropriately tailored to a substantial government interest…(More)”.

Rethinking ‘Checks and Balances’ for the A.I. Age


Article by Steve Lohr: “A new project, orchestrated by Stanford University and published on Tuesday, is inspired by the Federalist Papers and contends that today is a broadly similar historical moment of economic and political upheaval that calls for a rethinking of society’s institutional arrangements.

In an introduction to its collection of 12 essays, called the Digitalist Papers, the editors overseeing the project, including Erik Brynjolfsson, director of the Stanford Digital Economy Lab, and Condoleezza Rice, secretary of state in the George W. Bush administration and director of the Hoover Institution, identify their overarching concern.

“A powerful new technology, artificial intelligence,” they write, “explodes onto the scene and threatens to transform, for better or worse, all legacy social institutions.”

The most common theme in the diverse collection of essays: Citizens need to be more involved in determining how to regulate and incorporate A.I. into their lives. “To build A.I. for the people, with the people,” as one essay summed it up.

The project is being published as the technology is racing ahead. A.I. enthusiasts see a future of higher economic growth, increased prosperity and a faster pace of scientific discovery. But the technology is also raising fears of a dystopian alternative — A.I. chatbots and automated software not only replacing millions of workers, but also generating limitless misinformation and worsening political polarization. How to govern and guide A.I. in the public interest remains an open question…(More)”.

Wired Wisdom


Book by Eszter Hargittai and John Palfrey: “Everyone has that one older relative who loves to post misinformation on social media. That older coworker who fell prey to a phishing attack. Or a parent who still can’t quite get the hang of using emoji in texts. By popular account, these incidents are typical of older generations who inevitably struggle with tech woes. But is that the full story?

Absolutely not, according to the findings of Internet researchers Eszter Hargittai and John Palfrey. Their eye-opening book on the Internet’s fastest-growing demographic offers a more nuanced picture—debunking common myths about older adults’ Internet use to offer hope and a necessary call to action. Incorporating original interviews and survey results from thousands of people sixty and over, Wired Wisdom shows that many, in fact, use technology in ways that put younger peers to shame. Over-sixties are often nimble online, and quicker to abandon social media platforms that don’t meet their needs. Despite being targeted more often, they also may be less likely to fall for scams than younger peers. And fake news actually fools fewer people over sixty, who have far more experience evaluating sources and detecting propaganda. Still, there are unseen risks and missed opportunities for this group. Hargittai and Palfrey show that our stereotypes can be hurdles—keeping us from building intergenerational support communities, aiding loved ones to adopt new technology that may improve their lives, and helping us all thrive.

Full of surprising insights, Wired Wisdom helps push readers beyond ageist assumptions, offers practical advice for older tech users and their communities, and ultimately questions what it really means to age well online—no matter your birthdate…(More)”.

The Road to Wisdom: On Truth, Science, Faith, and Trust


Book by Francis Collins: “As the COVID-19 pandemic revealed, we have become not just a hyper-partisan society but also a deeply cynical one, distrustful of traditional sources of knowledge and wisdom. Skepticism about vaccines led to the needless deaths of at least 230,000 Americans. “Do your own research” is now a rallying cry in many online rabbit holes. Yet experts can make mistakes, and institutions can lose their moral compass. So how can we navigate through all this?

In The Road to Wisdom, Francis Collins reminds us of the four core sources of judgement and clear thinking: truth, science, faith, and trust. Drawing on his experience leading the Human Genome Project and heading the National Institutes of Health, as well as on ethics, philosophy, and Christian theology, Collins makes a robust, thoughtful case for each of these sources—their reliability, and their limits. Ultimately, he shows how they work together, not separately—and certainly not in conflict. It is only when we relink these four foundations of wisdom that we can begin to discern the best path forward in life.

​Thoughtful, accessible, winsome, and deeply wise, The Road to Wisdom leads us beyond current animosities to surer footing. Here is the moral, philosophical, and scientific framework with which to address the problems of our time—including distrust of public health, partisanship, racism, response to climate change, and threats to our democracy—but also to guide us in our daily lives. This is a book that will repay many readings, and resolve dilemmas that we all face every day…(More)”.

Utilizing big data without domain knowledge impacts public health decision-making


Paper by Miao Zhang, Salman Rahman, Vishwali Mhasawade and Rumi Chunara: “…New data sources and AI methods for extracting information are increasingly abundant and relevant to decision-making across societal applications. A notable example is street view imagery, available in over 100 countries, and purported to inform built environment interventions (e.g., adding sidewalks) for community health outcomes. However, biases can arise when decision-making does not account for data robustness or relies on spurious correlations. To investigate this risk, we analyzed 2.02 million Google Street View (GSV) images alongside health, demographic, and socioeconomic data from New York City. Findings demonstrate robustness challenges; built environment characteristics inferred from GSV labels at the intracity level often do not align with ground truth. Moreover, as average individual-level behavior of physical inactivity significantly mediates the impact of built environment features by census tract, intervention on features measured by GSV would be misestimated without proper model specification and consideration of this mediation mechanism. Using a causal framework accounting for these mediators, we determined that intervening by improving 10% of samples in the two lowest tertiles of physical inactivity would lead to a 4.17 (95% CI 3.84–4.55) or 17.2 (95% CI 14.4–21.3) times greater decrease in the prevalence of obesity or diabetes, respectively, compared to the same proportional intervention on the number of crosswalks by census tract. This study highlights critical issues of robustness and model specification in using emergent data sources, showing the data may not measure what is intended, and ignoring mediators can result in biased intervention effect estimates…(More)”
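The bias the authors describe can be seen in a simple product-of-coefficients mediation decomposition. The sketch below is purely illustrative: the variable names and coefficient values are hypothetical, not estimates from the paper, but they show how an intervention's effect splits into a direct path and an indirect path through a mediator, and why omitting the mediator misstates what an intervention on the built environment would achieve.

```python
# Product-of-coefficients mediation decomposition (Baron-Kenny style sketch).
# All coefficients are hypothetical illustration values, not estimates from
# the paper: X = crosswalk density, M = physical inactivity rate (mediator),
# Y = obesity prevalence.

a = -0.20         # effect of X on the mediator M (more crosswalks -> less inactivity)
b = 0.50          # effect of M on the outcome Y, holding X fixed
c_direct = -0.02  # direct effect of X on Y, holding M fixed

indirect = a * b              # effect of X transmitted through the mediator
total = c_direct + indirect   # total causal effect of X on Y

print(f"indirect effect: {indirect:.3f}")  # -0.100
print(f"direct effect:   {c_direct:.3f}")  # -0.020
print(f"total effect:    {total:.3f}")     # -0.120

# A model that regresses Y on X alone attributes the full -0.12 to the
# built-environment feature, even though most of it flows through the
# mediator. Intervening on the mediator itself targets that -0.10 pathway
# directly, which is the intuition behind the paper's finding that
# mediator-targeted interventions yield far larger effects.
```

This is why, in the paper's causal framework, correct model specification (including the inactivity mediator) changes which intervention looks most effective.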

The Complexities of Differential Privacy for Survey Data


Paper by Jörg Drechsler & James Bailie: “The concept of differential privacy (DP) has gained substantial attention in recent years, most notably since the U.S. Census Bureau announced the adoption of the concept for its 2020 Decennial Census. However, despite its attractive theoretical properties, implementing DP in practice remains challenging, especially when it comes to survey data. In this paper we present some results from an ongoing project funded by the U.S. Census Bureau that is exploring the possibilities and limitations of DP for survey data. Specifically, we identify five aspects that need to be considered when adopting DP in the survey context: the multi-staged nature of data production; the limited privacy amplification from complex sampling designs; the implications of survey-weighted estimates; the weighting adjustments for nonresponse and other data deficiencies, and the imputation of missing values. We summarize the project’s key findings with respect to each of these aspects and also discuss some of the challenges that still need to be addressed before DP could become the new data protection standard at statistical agencies…(More)”.
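For readers unfamiliar with the mechanism underlying DP, the sketch below implements the standard Laplace mechanism for a bounded mean. This is a textbook illustration, not the project's methodology; as the paper discusses, survey weights, complex sampling designs, and multi-stage data production complicate the sensitivity analysis well beyond this simple unweighted case.

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def dp_mean(values, lower, upper, epsilon, rng):
    """Epsilon-DP estimate of a mean, assuming one record per person."""
    # Clamping bounds each record's contribution, so changing one record
    # moves the mean by at most (upper - lower) / n -- the sensitivity.
    clamped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / len(clamped)
    true_mean = sum(clamped) / len(clamped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)

# Example: a private mean of 0/1 survey responses under a modest budget.
rng = random.Random(0)
responses = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
print(dp_mean(responses, 0, 1, epsilon=1.0, rng=rng))
```

In a weighted survey estimate, a single record's influence depends on its weight, so the flat sensitivity bound above no longer holds — one concrete reason the paper treats survey-weighted estimates as a distinct challenge.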

Geographies of missing data: Spatializing counterdata production against feminicide


Paper by Catherine D’Ignazio et al: “Feminicide is the gender-related killing of cisgender and transgender women and girls. It reflects patriarchal and racialized systems of oppression and reveals how territories and socio-economic landscapes configure everyday gender-related violence. In recent decades, many grassroots data production initiatives have emerged with the aim of monitoring this extreme but invisibilized phenomenon. We bridge scholarship in feminist and information geographies with data feminism to examine the ways in which space, broadly defined, shapes the counterdata production strategies of feminicide data activists. Drawing on a qualitative study of 33 monitoring efforts led by civil society organizations across 15 countries, primarily in Latin America, we provide a conceptual framework for examining the spatial dimensions of data activism. We show how there are striking transnational patterns related to where feminicide goes unrecorded, resulting in geographies of missing data. In response to these omissions, activists deploy multiple spatialized strategies to make these geographies visible, to situate and contextualize each case of feminicide, to reclaim databases as spaces for memory and witnessing, and to build transnational networks of solidarity. In this sense, we argue that data activism about feminicide constitutes a space of resistance and resignification of everyday forms of gender-related violence…(More)”.

Constructing Valid Geospatial Tools for Environmental Justice


Report from the National Academies of Sciences, Engineering, and Medicine: “Decades of research have shown that the most disadvantaged communities exist at the intersection of high levels of hazard exposure, racial and ethnic marginalization, and poverty.

Mapping and geographical information systems have been crucial for analyzing the environmental burdens of marginalized communities, and several federal and state geospatial tools have emerged to help address environmental justice concerns — such as the Climate and Economic Justice Screening Tool developed in 2022 in response to Justice40 initiatives from the Biden administration.

Constructing Valid Geospatial Tools for Environmental Justice, a new report from the National Academies of Sciences, Engineering, and Medicine, offers recommendations for developing environmental justice tools that reflect the experiences of the communities they measure.

The report recommends data strategies focused on community engagement, validation, and documentation. It emphasizes using a structured development process and offers guidance for selecting and assessing indicators, integrating indicators, and incorporating cumulative impact scoring. Tool developers should choose measures of economic burden beyond the federal poverty level that account for additional dimensions of wealth and geographic variations in cost of living. They should also use indicators that measure the impacts of racism in policies and practices that have led to current disparities…(More)”.
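One common way to operationalize the report's guidance on integrating indicators is percentile-based cumulative scoring, the general approach behind screening tools of this kind. The sketch below is schematic — the tract values and indicator names are invented, and it does not reproduce the methodology of any specific federal tool — but it illustrates ranking each indicator across tracts and averaging the ranks into a cumulative burden score.

```python
def percentile_ranks(values):
    """Rank each tract's value against all tracts, scaled to [0, 1].

    Ties are counted only once (strict 'less than'), which is adequate
    for this sketch but would need refinement for real data.
    """
    n = len(values)
    return [sum(other < v for other in values) / (n - 1) for v in values]

def cumulative_scores(indicators):
    """Average each tract's percentile rank across all burden indicators."""
    ranked = {name: percentile_ranks(vals) for name, vals in indicators.items()}
    n_tracts = len(next(iter(ranked.values())))
    return [
        sum(ranks[i] for ranks in ranked.values()) / len(ranked)
        for i in range(n_tracts)
    ]

# Hypothetical burden indicators for three census tracts. A cost-of-living
# adjusted poverty measure stands in for the report's recommendation to go
# beyond the federal poverty level.
indicators = {
    "pm25_exposure":         [5.0, 9.0, 12.0],
    "adjusted_poverty_rate": [0.10, 0.30, 0.50],
    "flood_risk_share":      [0.00, 0.20, 0.60],
}
print(cumulative_scores(indicators))  # the third tract carries the highest cumulative burden
```

Even in this toy form, the design choice the report stresses is visible: the final score is only as meaningful as the indicators fed into it, which is why indicator selection, validation, and community engagement come first.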