To Understand Global Migration, You Have to See It First


Data visualization by The New York Times: “In the maps below, Times Opinion can provide the clearest picture to date of how people move across the globe: a record of permanent migration to and from 181 countries based on a single, consistent source of information, for every month from the beginning of 2019 through the end of 2022. These estimates are drawn not from government records but from the location data of three billion anonymized Facebook users all over the world.

The analysis — the result of new research published on Wednesday from Meta, the University of Hong Kong and Harvard University — reveals migration’s true global sweep. And yes, it excludes business travelers and tourists: Only people who remain in their destination country for more than a year are counted as migrants here.

The data comes with some limitations. Migration to and from certain countries that have banned or restricted the use of Facebook, including China, Iran and Cuba, is not included in this data set, and it’s impossible to know each migrant’s legal status. Nevertheless, this is the first time that estimates of global migration flows have been made publicly available at this scale. The researchers found that from 2019 to 2022, an annual average of 30 million people — approximately one-third of a percent of the world’s population — migrated each year.

If you would like to see the data behind this analysis for yourself, we made an interactive tool that you can use to explore the full data set…(More)”

AI Needs Your Data. That’s Where Social Media Comes In.


Article by Dave Lee: “Having scraped just about the entire sum of human knowledge, ChatGPT and other AI efforts are making the same rallying cry: Need input!

One solution is to create synthetic data and to train a model using that, though this comes with inherent challenges, particularly around perpetuating bias or introducing compounding inaccuracies.

The other is to find a great gushing spigot of new and fresh data, the more “human” the better. That’s where social networks come in, digital spaces where millions, even billions, of users willingly and constantly post reams of information. Photos, posts, news articles, comments — every interaction of interest to companies that are trying to build conversational and generative AI. Even better, this content is not riddled with the copyright violation risk that comes with using other sources.

Lately, top AI companies have moved more aggressively to own or harness social networks, trampling over the rights of users to dictate how their posts may be used to build these machines. Social network users have long been “the product,” as the famous saying goes. They’re now also a quasi-“product developer” through their posts.

Some companies had the benefit of a social network to begin with. Meta Platforms Inc., the biggest social networking company on the planet, used in-app notifications to inform users that it would be harnessing their posts and photos for its Llama AI models. Late last month, Elon Musk’s xAI acquired X, formerly Twitter, in what was primarily a financial sleight of hand but one that made ideal sense for Musk’s Grok AI. It has been able to gain a foothold in the chatbot market by harnessing timely tweets posted on the network as well as the huge archive of online chatter dating back almost two decades. Then there’s Microsoft Corp., which owns the professional network LinkedIn and has been pushing heavily for users (and journalists) to post more and more original content to the platform.

Microsoft doesn’t, however, share LinkedIn data with its close partner OpenAI, which may explain reports that the ChatGPT maker was in the early stages of building a social network of its own…(More)”

DOGE’s Growing Reach into Personal Data: What it Means for Human Rights


Article by Deborah Brown: “Expansive interagency sharing of personal data could fuel abuses against vulnerable people and communities who are already being targeted by Trump administration policies, like immigrants, lesbian, gay, bisexual, and transgender (LGBT) people, and student protesters. The personal data held by the government reveals deeply sensitive information, such as people’s immigration status, race, gender identity, sexual orientation, and economic status.

A massive centralized government database could easily be used for a range of abusive purposes, like to discriminate against current federal employees and future job applicants on the basis of their sexual orientation or gender identity, or to facilitate the deportation of immigrants. It could result in people forgoing public services out of fear that their data will be weaponized against them by another federal agency.

But the danger doesn’t stop with those already in the administration’s crosshairs. The removal of barriers keeping private data siloed could allow the government or DOGE to deny federal loans for education or Medicaid benefits based on unrelated or even inaccurate data. It could also facilitate the creation of profiles containing all of the information various agencies hold on every person in the country. Such profiles, combined with social media activity, could facilitate the identification and targeting of people for political reasons, including in the context of elections.

Information silos exist for a reason. Personal data should be collected for a determined, specific, and legitimate purpose, and not used for another purpose without notice or justification, according to the key internationally recognized data protection principle, “purpose limitation.” Sharing data seamlessly across federal or even state agencies in the name of an undefined and unmeasurable goal of efficiency is incompatible with this core data protection principle…(More)”.

Data Cooperatives: Democratic Models for Ethical Data Stewardship


Paper by Francisco Mendonca, Giovanna DiMarzo, and Nabil Abdennadher: “Data cooperatives offer a new model for fair data governance, enabling individuals to collectively control, manage, and benefit from their information while adhering to cooperative principles such as democratic member control, economic participation, and community concern. This paper reviews data cooperatives, distinguishing them from models like data trusts, data commons, and data unions, and defines them based on member ownership, democratic governance, and data sovereignty. It explores applications in sectors like healthcare, agriculture, and construction. Despite their potential, data cooperatives face challenges in coordination, scalability, and member engagement, requiring innovative governance strategies, robust technical systems, and mechanisms to align member interests with cooperative goals. The paper concludes by advocating for data cooperatives as a sustainable, democratic, and ethical model for the future data economy…(More)”.

Can We Measure the Impact of a Database?


Article by Peter Buneman, Dennis Dosso, Matteo Lissandrini, Gianmaria Silvello, and He Sun: “Databases publish data. This is undoubtedly the case for scientific and statistical databases, which have largely replaced traditional reference works. Database and Web technologies have led to an explosion in the number of databases that support scientific research, for obvious reasons: Databases provide faster communication of knowledge, hold larger volumes of data, are more easily searched, and are both human- and machine-readable. Moreover, they can be developed rapidly and collaboratively by a mixture of researchers and curators. For example, more than 1,500 curated databases are relevant to molecular biology alone. The value of these databases lies not only in the data they present but also in how they organize that data.

In the case of an author or journal, most bibliometric measures are obtained from citations to an associated set of publications. There are typically many ways of decomposing a database into publications, so we might use its organization to guide our choice of decompositions. We will show that when the database has a hierarchical structure, there is a natural extension of the h-index that works on this hierarchy…(More)”.
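The excerpt mentions extending the h-index to a hierarchically organized database. As a point of reference, the standard h-index is the largest h such that h publications each have at least h citations; a minimal sketch of that computation, applied to a purely hypothetical database hierarchy (the tree structure and citation counts below are illustrative, not from the article), might look like:

```python
def h_index(citations):
    """Largest h such that h items each have at least h citations."""
    counts = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(counts, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

# Toy hierarchy: a database decomposed into sections, each holding
# records with citation counts. One natural extension is to compute a
# node's h-index over the citation counts gathered from its subtree.
tree = {
    "pathways": {"records": [12, 9, 4, 1]},
    "genes":    {"records": [7, 7, 2]},
}

def node_h_index(tree):
    """h-index of the whole database, pooled across its sections."""
    all_counts = [c for sec in tree.values() for c in sec["records"]]
    return h_index(all_counts)
```

How citations are attributed to nodes in a real curated database is exactly the decomposition question the authors raise; this sketch only illustrates the base metric being generalized.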

Code Shift: Using AI to Analyze Zoning Reform in American Cities


Report by Arianna Salazar-Miranda & Emily Talen: “Cities are at the forefront of addressing global sustainability challenges, particularly those exacerbated by climate change. Traditional zoning codes, which often segregate land uses, have been linked to increased vehicular dependence, urban sprawl and social disconnection, undermining broader social and environmental sustainability objectives. This study investigates the adoption and impact of form-based codes (FBCs), which aim to promote sustainable, compact and mixed-use urban forms as a solution to these issues. Using natural language processing techniques, we analyzed zoning documents from over 2,000 United States census-designated places to identify linguistic patterns indicative of FBC principles. Our findings reveal widespread adoption of FBCs across the country, with notable variations within regions. FBCs are associated with higher floor-to-area ratios, narrower and more consistent street setbacks and smaller plots. We also find that places with FBCs have improved walkability, shorter commutes and a higher share of multifamily housing. Our findings highlight the utility of natural language processing for evaluating zoning codes and underscore the potential benefits of form-based zoning reforms for enhancing urban sustainability…(More)”.
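The report describes detecting linguistic patterns indicative of form-based-code principles in zoning documents. The study's actual NLP pipeline is not specified in this excerpt; a toy keyword-matching sketch, with an entirely hypothetical indicator lexicon, conveys the general idea:

```python
import re

# Hypothetical FBC indicator phrases; the lexicon and model actually
# used in the study are not described in this excerpt.
FBC_PATTERNS = [
    r"build-to line", r"frontage type", r"form-based",
    r"street wall", r"transect zone",
]

def fbc_score(zoning_text):
    """Fraction of indicator patterns that appear in the document."""
    text = zoning_text.lower()
    hits = sum(bool(re.search(p, text)) for p in FBC_PATTERNS)
    return hits / len(FBC_PATTERNS)

sample = "Each transect zone specifies a build-to line and frontage type."
print(round(fbc_score(sample), 2))  # 0.6
```

A production pipeline would need far more than a fixed phrase list (document parsing, disambiguation, possibly a trained classifier), but the scoring-by-linguistic-signal framing is the same.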

Artificial Intelligence and the Future of Work


Report by National Academies of Sciences, Engineering, and Medicine: “Advances in artificial intelligence (AI) promise to improve productivity significantly, but there are many questions about how AI could affect jobs and workers.

Recent technical innovations have driven the rapid development of generative AI systems, which produce text, images, or other content based on user requests – advances which have the potential to complement or replace human labor in specific tasks, and to reshape demand for certain types of expertise in the labor market.

Artificial Intelligence and the Future of Work evaluates recent advances in AI technology and their implications for economic productivity, the workforce, and education in the United States. The report notes that AI is a tool with the potential to enhance human labor and create new forms of valuable work – but this is not an inevitable outcome. Tracking progress in AI and its impacts on the workforce will be critical to helping inform and equip workers and policymakers to flexibly respond to AI developments…(More)”.

AI Is Evolving — And Changing Our Understanding Of Intelligence


Essay by Blaise Agüera y Arcas and James Manyika: “Dramatic advances in artificial intelligence today are compelling us to rethink our understanding of what intelligence truly is. Our new insights will enable us to build better AI and understand ourselves better.

In short, we are in paradigm-shifting territory.

Paradigm shifts are often fraught because it’s easier to adopt new ideas when they are compatible with one’s existing worldview but harder when they’re not. A classic example is the collapse of the geocentric paradigm, which dominated cosmological thought for roughly two millennia. In the geocentric model, the Earth stood still while the Sun, Moon, planets and stars revolved around us. The belief that we were at the center of the universe — bolstered by Ptolemy’s theory of epicycles, a major scientific achievement in its day — was both intuitive and compatible with religious traditions. Hence, Copernicus’s heliocentric paradigm wasn’t just a scientific advance but a hotly contested heresy and perhaps even, for some, as Benjamin Bratton notes, an existential trauma. So, today, artificial intelligence.

In this essay, we will describe five interrelated paradigm shifts informing our development of AI:

  1. Natural Computing — Computing existed in nature long before we built the first “artificial computers.” Understanding computing as a natural phenomenon will enable fundamental advances not only in computer science and AI but also in physics and biology.
  2. Neural Computing — Our brains are an exquisite instance of natural computing. Redesigning the computers that power AI so they work more like a brain will greatly increase AI’s energy efficiency — and its capabilities too.
  3. Predictive Intelligence — The success of large language models (LLMs) shows us something fundamental about the nature of intelligence: it involves statistical modeling of the future (including one’s own future actions) given evolving knowledge, observations and feedback from the past. This insight suggests that current distinctions between designing, training and running AI models are transitory; more sophisticated AI will evolve, grow and learn continuously and interactively, as we do.
  4. General Intelligence — Intelligence does not necessarily require biologically based computation. Although AI models will continue to improve, they are already broadly capable, tackling an increasing range of cognitive tasks with a skill level approaching and, in some cases, exceeding individual human capability. In this sense, “Artificial General Intelligence” (AGI) may already be here — we just keep shifting the goalposts.
  5. Collective Intelligence — Brains, AI agents and societies can all become more capable through increased scale. However, size alone is not enough. Intelligence is fundamentally social, powered by cooperation and the division of labor among many agents. In addition to causing us to rethink the nature of human (or “more than human”) intelligence, this insight suggests social aggregations of intelligences and multi-agent approaches to AI development that could reduce computational costs, increase AI heterogeneity and reframe AI safety debates.

But to understand our own “intelligence geocentrism,” we must begin by reassessing our assumptions about the nature of computing, since it is the foundation of both AI and, we will argue, intelligence in any form…(More)”.

‘We are flying blind’: RFK Jr.’s cuts halt data collection on abortion, cancer, HIV and more


Article by Alice Miranda Ollstein: “The federal teams that count public health problems are disappearing — putting efforts to solve those problems in jeopardy.

Health Secretary Robert F. Kennedy Jr.’s purge of tens of thousands of federal workers has halted efforts to collect data on everything from cancer rates in firefighters to mother-to-baby transmission of HIV and syphilis to outbreaks of drug-resistant gonorrhea to cases of carbon monoxide poisoning.

The cuts threaten to obscure the severity of pressing health threats and whether they’re getting better or worse, leaving officials clueless on how to respond. They could also make it difficult, if not impossible, to assess the impact of the administration’s spending and policies. Both outside experts and impacted employees argue the layoffs will cost the government more money in the long run by eliminating information on whether programs are effective or wasteful, and by allowing preventable problems to fester.

“Surveillance capabilities are crucial for identifying emerging health issues, directing resources efficiently, and evaluating the effectiveness of existing policies,” said Jerome Adams, who served as surgeon general in the first Trump administration. “Without robust data and surveillance systems, we cannot accurately assess whether we are truly making America healthier.”…(More)”.

Statistical methods in public policy research


Chapter by Andrew Heiss: “This essay provides an overview of statistical methods in public policy, focused primarily on the United States. I trace the historical development of quantitative approaches in policy research, from early ad hoc applications through the 19th and early 20th centuries, to the full institutionalization of statistical analysis in federal, state, local, and nonprofit agencies by the late 20th century.

I then outline three core methodological approaches to policy-centered statistical research across social science disciplines: description, explanation, and prediction, framing each in terms of the focus of the analysis. In descriptive work, researchers explore what exists, examining variables of interest to understand their distributions and relationships. In explanatory work, researchers ask why something exists and how it can be influenced. The focus of the analysis is on explanatory variables (X), either to (1) accurately estimate their relationship with an outcome variable (Y) or (2) causally attribute the effect of specific explanatory variables on outcomes. In predictive work, researchers ask what will happen next, focusing on the outcome variable (Y) and on generating accurate forecasts, classifications, and predictions from new data. For each approach, I examine key techniques, their applications in policy contexts, and important methodological considerations.
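The description/explanation/prediction distinction can be made concrete with a small worked example. The data below are invented for illustration; the sketch shows the same X and Y serving all three purposes: summarizing the variables, estimating the X–Y relationship (here a simple OLS slope), and forecasting Y for new X:

```python
import statistics

# Hypothetical policy data: X = program spending, Y = outcome measure.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

# Description: what do the variables look like?
mx, my = statistics.mean(x), statistics.mean(y)

# Explanation: estimate the relationship between X and Y
# (closed-form ordinary least squares for one predictor).
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
intercept = my - slope * mx

# Prediction: what outcome do we forecast for new data?
def predict(new_x):
    return intercept + slope * new_x
```

In practice the three modes diverge sharply in technique (causal explanation demands identification strategies, prediction demands out-of-sample validation), but this shows how they share the same raw ingredients.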

I then consider critical perspectives on quantitative policy analysis framed around issues related to a three-part “data imperative,” where governments are driven to count, gather, and learn from data. Each of these imperatives entails substantial issues related to privacy, accountability, democratic participation, and epistemic inequalities—issues at odds with public sector values of transparency and openness. I conclude by identifying some emerging trends in public sector-focused data science, inclusive ethical guidelines, open research practices, and future directions for the field…(More)”.