To Understand Global Migration, You Have to See It First


Data visualization by The New York Times: “In the maps below, Times Opinion can provide the clearest picture to date of how people move across the globe: a record of permanent migration to and from 181 countries based on a single, consistent source of information, for every month from the beginning of 2019 through the end of 2022. These estimates are drawn not from government records but from the location data of three billion anonymized Facebook users all over the world.

The analysis — the result of new research published on Wednesday from Meta, the University of Hong Kong and Harvard University — reveals migration’s true global sweep. And yes, it excludes business travelers and tourists: Only people who remain in their destination country for more than a year are counted as migrants here.
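The counting rule described here (a move counts as migration only if the person stays in the destination for more than a year) is simple enough to state as code. What follows is a hypothetical simplification, not the researchers' method. The `Stay` record, its field names, the ISO-style country codes and the observation cut-off date are all assumptions for illustration. The published study works from aggregated, anonymized location signals, and its exact criterion may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Stay:
    """Hypothetical record of one person's stay in a destination country."""
    person_id: str
    origin: str
    destination: str
    arrived: date
    left: Optional[date]  # None: still in the destination when observation ends


def one_year_after(d: date) -> date:
    """Same calendar date one year later (Feb 29 maps to Feb 28)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def is_migrant(stay: Stay, observed_until: date) -> bool:
    """Count a move as migration only if the stay lasts more than one year.

    Stays still open at the observation cut-off are measured up to that date,
    so a recent arrival is not (yet) counted.
    """
    end = stay.left if stay.left is not None else observed_until
    return end > one_year_after(stay.arrived)


tourist = Stay("p1", "MX", "US", date(2020, 1, 10), date(2020, 1, 20))
mover = Stay("p2", "IN", "AE", date(2019, 3, 1), None)
print(is_migrant(tourist, date(2022, 12, 31)))  # False
print(is_migrant(mover, date(2022, 12, 31)))    # True
```

A stay of exactly one year is excluded because the rule says "more than a year". Stays that have not yet reached their anniversary at the cut-off are treated as not-yet-migration, which is one way to handle such censored records.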

The data comes with some limitations. Migration to and from certain countries that have banned or restricted the use of Facebook, including China, Iran and Cuba, is not included in this data set, and it’s impossible to know each migrant’s legal status. Nevertheless, this is the first time that estimates of global migration flows have been made publicly available at this scale. The researchers found that from 2019 to 2022, an annual average of 30 million people — approximately one-third of a percent of the world’s population — migrated each year.

If you would like to see the data behind this analysis for yourself, we made an interactive tool that you can use to explore the full data set…(More)”

Inside arXiv—the Most Transformative Platform in All of Science


Article by Sheon Han: “Nearly 35 years ago, Paul Ginsparg created arXiv, a digital repository where researchers could share their latest findings—before those findings had been systematically reviewed or verified. Visit arXiv.org today (it’s pronounced like “archive”) and you’ll still see its old-school Web 1.0 design, featuring a red banner and the seal of Cornell University, the platform’s institutional home. But arXiv’s unassuming facade belies the tectonic reconfiguration it set off in the scientific community. If arXiv were to stop functioning, scientists from every corner of the planet would suffer an immediate and profound disruption. “Everybody in math and physics uses it,” Scott Aaronson, a computer scientist at the University of Texas at Austin, told me. “I scan it every night.”

Every industry has certain problems universally acknowledged as broken: insurance in health care, licensing in music, standardized testing in education, tipping in the restaurant business. In academia, it’s publishing. Academic publishing is dominated by for-profit giants like Elsevier and Springer. Calling their practice a form of thuggery isn’t so much an insult as an economic observation. Imagine if a book publisher demanded that authors write books for free and, instead of employing in-house editors, relied on other authors to edit those books, also for free. And not only that: The final product was then sold at prohibitively expensive prices to ordinary readers, and institutions were forced to pay exorbitant fees for access…(More)”.

AI Needs Your Data. That’s Where Social Media Comes In.


Article by Dave Lee: “Having scraped just about the entire sum of human knowledge, ChatGPT and other AI efforts are making the same rallying cry: Need input!

One solution is to create synthetic data and to train a model using that, though this comes with inherent challenges, particularly around perpetuating bias or introducing compounding inaccuracies.

The other is to find a great gushing spigot of new and fresh data, the more “human” the better. That’s where social networks come in, digital spaces where millions, even billions, of users willingly and constantly post reams of information. Photos, posts, news articles, comments — every interaction of interest to companies that are trying to build conversational and generative AI. Even better, this content is not riddled with the copyright violation risk that comes with using other sources.

Lately, top AI companies have moved more aggressively to own or harness social networks, trampling over the rights of users to dictate how their posts may be used to build these machines. Social network users have long been “the product,” as the famous saying goes. They’re now also a quasi-“product developer” through their posts.

Some companies had the benefit of a social network to begin with. Meta Platforms Inc., the biggest social networking company on the planet, used in-app notifications to inform users that it would be harnessing their posts and photos for its Llama AI models. Late last month, Elon Musk’s xAI acquired X, formerly Twitter, in what was primarily a financial sleight of hand but one that made ideal sense for Musk’s Grok AI. It has been able to gain a foothold in the chatbot market by harnessing timely tweets posted on the network as well as the huge archive of online chatter dating back almost two decades. Then there’s Microsoft Corp., which owns the professional network LinkedIn and has been pushing heavily for users (and journalists) to post more and more original content to the platform.

Microsoft doesn’t, however, share LinkedIn data with its close partner OpenAI, which may explain reports that the ChatGPT maker was in the early stages of building a social network of its own…(More)”

DOGE’s Growing Reach into Personal Data: What it Means for Human Rights


Article by Deborah Brown: “Expansive interagency sharing of personal data could fuel abuses against vulnerable people and communities who are already being targeted by Trump administration policies, like immigrants, lesbian, gay, bisexual, and transgender (LGBT) people, and student protesters. The personal data held by the government reveals deeply sensitive information, such as people’s immigration status, race, gender identity, sexual orientation, and economic status.

A massive centralized government database could easily be used for a range of abusive purposes, like to discriminate against current federal employees and future job applicants on the basis of their sexual orientation or gender identity, or to facilitate the deportation of immigrants. It could result in people forgoing public services out of fear that their data will be weaponized against them by another federal agency.

But the danger doesn’t stop with those already in the administration’s crosshairs. The removal of barriers keeping private data siloed could allow the government or DOGE to deny federal loans for education or Medicaid benefits based on unrelated or even inaccurate data. It could also facilitate the creation of profiles containing all of the information various agencies hold on every person in the country. Such profiles, combined with social media activity, could facilitate the identification and targeting of people for political reasons, including in the context of elections.

Information silos exist for a reason. Personal data should be collected for a determined, specific, and legitimate purpose, and not used for another purpose without notice or justification, according to the key internationally recognized data protection principle, “purpose limitation.” Sharing data seamlessly across federal or even state agencies in the name of an undefined and unmeasurable goal of efficiency is incompatible with this core data protection principle…(More)”.

Can We Measure the Impact of a Database?


Article by Peter Buneman, Dennis Dosso, Matteo Lissandrini, Gianmaria Silvello, and He Sun: “Databases publish data. This is undoubtedly the case for scientific and statistical databases, which have largely replaced traditional reference works. Database and Web technologies have led to an explosion in the number of databases that support scientific research, for obvious reasons: Databases provide faster communication of knowledge, hold larger volumes of data, are more easily searched, and are both human- and machine-readable. Moreover, they can be developed rapidly and collaboratively by a mixture of researchers and curators. For example, more than 1,500 curated databases are relevant to molecular biology alone. The value of these databases lies not only in the data they present but also in how they organize that data.

In the case of an author or journal, most bibliometric measures are obtained from citations to an associated set of publications. There are typically many ways of decomposing a database into publications, so we might use its organization to guide our choice of decompositions. We will show that when the database has a hierarchical structure, there is a natural extension of the h-index that works on this hierarchy…(More)”.
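An author has h-index h if h of their publications have each been cited at least h times. The sketch below computes that standard index and then applies it to one possible decomposition of a hierarchical database. It treats every node at a chosen depth as a "publication" whose citation count includes citations to everything beneath it. The `Node` structure, the depth-based cut and the subtree-summing rule are assumptions made for illustration. This is not the authors' extension, which the excerpt does not spell out.

```python
from dataclasses import dataclass, field


def h_index(citations: list[int]) -> int:
    """Largest h such that at least h items have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break  # sorted descending, so no later item can qualify
    return h


@dataclass
class Node:
    """Hypothetical unit of a hierarchical database (e.g. a family or an entry)."""
    name: str
    direct_citations: int = 0
    children: list["Node"] = field(default_factory=list)


def total_citations(node: Node) -> int:
    """Citations to a node plus citations to everything beneath it."""
    return node.direct_citations + sum(total_citations(c) for c in node.children)


def nodes_at_depth(root: Node, depth: int) -> list[Node]:
    """All nodes exactly `depth` levels below the root."""
    level = [root]
    for _ in range(depth):
        level = [child for n in level for child in n.children]
    return level


def h_index_at_depth(root: Node, depth: int) -> int:
    """h-index when each node at `depth` is counted as one publication."""
    return h_index([total_citations(n) for n in nodes_at_depth(root, depth)])


root = Node("db", 0, [
    Node("A", 2, [Node("A1", 5), Node("A2", 3)]),
    Node("B", 1, [Node("B1", 4)]),
    Node("C", 0, [Node("C1", 1)]),
])
print(h_index_at_depth(root, 1))  # 2  (subtree totals 10, 5, 1)
print(h_index_at_depth(root, 2))  # 3  (leaf totals 5, 3, 4, 1)
```

The choice of depth is the decomposition choice the excerpt mentions: in this toy tree the coarse cut gives h = 2 and the finer cut gives h = 3. The database's own hierarchy supplies the candidate cuts, but deciding which one to use is still a modeling decision.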

AI Is Evolving — And Changing Our Understanding Of Intelligence


Essay by Blaise Agüera y Arcas and James Manyika: “Dramatic advances in artificial intelligence today are compelling us to rethink our understanding of what intelligence truly is. Our new insights will enable us to build better AI and understand ourselves better.

In short, we are in paradigm-shifting territory.

Paradigm shifts are often fraught because it’s easier to adopt new ideas when they are compatible with one’s existing worldview but harder when they’re not. A classic example is the collapse of the geocentric paradigm, which dominated cosmological thought for roughly two millennia. In the geocentric model, the Earth stood still while the Sun, Moon, planets and stars revolved around us. The belief that we were at the center of the universe — bolstered by Ptolemy’s theory of epicycles, a major scientific achievement in its day — was both intuitive and compatible with religious traditions. Hence, Copernicus’s heliocentric paradigm wasn’t just a scientific advance but a hotly contested heresy and perhaps even, for some, as Benjamin Bratton notes, an existential trauma. So, today, artificial intelligence.

In this essay, we will describe five interrelated paradigm shifts informing our development of AI:

  1. Natural Computing — Computing existed in nature long before we built the first “artificial computers.” Understanding computing as a natural phenomenon will enable fundamental advances not only in computer science and AI but also in physics and biology.
  2. Neural Computing — Our brains are an exquisite instance of natural computing. Redesigning the computers that power AI so they work more like a brain will greatly increase AI’s energy efficiency — and its capabilities too.
  3. Predictive Intelligence — The success of large language models (LLMs) shows us something fundamental about the nature of intelligence: it involves statistical modeling of the future (including one’s own future actions) given evolving knowledge, observations and feedback from the past. This insight suggests that current distinctions between designing, training and running AI models are transitory; more sophisticated AI will evolve, grow and learn continuously and interactively, as we do.
  4. General Intelligence — Intelligence does not necessarily require biologically based computation. Although AI models will continue to improve, they are already broadly capable, tackling an increasing range of cognitive tasks with a skill level approaching and, in some cases, exceeding individual human capability. In this sense, “Artificial General Intelligence” (AGI) may already be here — we just keep shifting the goalposts.
  5. Collective Intelligence — Brains, AI agents and societies can all become more capable through increased scale. However, size alone is not enough. Intelligence is fundamentally social, powered by cooperation and the division of labor among many agents. In addition to causing us to rethink the nature of human (or “more than human”) intelligence, this insight suggests social aggregations of intelligences and multi-agent approaches to AI development that could reduce computational costs, increase AI heterogeneity and reframe AI safety debates.

But to understand our own “intelligence geocentrism,” we must begin by reassessing our assumptions about the nature of computing, since it is the foundation of both AI and, we will argue, intelligence in any form…(More)”.

‘We are flying blind’: RFK Jr.’s cuts halt data collection on abortion, cancer, HIV and more


Article by Alice Miranda Ollstein: “The federal teams that count public health problems are disappearing — putting efforts to solve those problems in jeopardy.

Health Secretary Robert F. Kennedy Jr.’s purge of tens of thousands of federal workers has halted efforts to collect data on everything from cancer rates in firefighters to mother-to-baby transmission of HIV and syphilis to outbreaks of drug-resistant gonorrhea to cases of carbon monoxide poisoning.

The cuts threaten to obscure the severity of pressing health threats and whether they’re getting better or worse, leaving officials clueless on how to respond. They could also make it difficult, if not impossible, to assess the impact of the administration’s spending and policies. Both outside experts and impacted employees argue the layoffs will cost the government more money in the long run by eliminating information on whether programs are effective or wasteful, and by allowing preventable problems to fester.

“Surveillance capabilities are crucial for identifying emerging health issues, directing resources efficiently, and evaluating the effectiveness of existing policies,” said Jerome Adams, who served as surgeon general in the first Trump administration. “Without robust data and surveillance systems, we cannot accurately assess whether we are truly making America healthier.”..(More)”.

So You Want to Be a Dissident?


Essay by Julia Angwin and Ami Fields-Meyer: “…Heimans points to an increasingly hostile digital landscape as one barrier to effective grassroots campaigns. At the dawn of the digital era, in the two-thousands, e-mail transformed the field of political organizing, enabling groups like MoveOn.org to mobilize huge campaigns against the Iraq War, and allowing upstart candidates like Howard Dean and Barack Obama to raise money directly from people instead of relying on Party infrastructure. But now everyone’s e-mail inboxes are overflowing. The tech oligarchs who control the social-media platforms are less willing to support progressive activism. Globally, autocrats have more tools to surveil and disrupt digital campaigns. And regular people are burned out on actions that have failed to remedy fundamental problems in society.

It’s not clear what comes next. Heimans hopes that new tactics will be developed, such as, perhaps, a new online platform that would help organizing, or the strengthening of a progressive-media ecosystem that will engage new participants. “Something will emerge that kind of revitalizes the space.”

There’s an oft-told story about Andrei Sakharov, the celebrated twentieth-century Soviet activist. Sakharov made his name working as a physicist on the development of the U.S.S.R.’s hydrogen bomb, at the height of the Cold War, but shot to global prominence after Leonid Brezhnev’s regime punished him for speaking publicly about the dangers of those weapons, and also about Soviet repression.

When an American friend was visiting Sakharov and his wife, the activist Yelena Bonner, in Moscow, the friend referred to Sakharov as a dissident. Bonner corrected him: “My husband is a physicist, not a dissident.”

This is a fundamental tension of building a principled dissident culture—it risks wrapping people up in a kind of negative identity, a cloak of what they are not. The Soviet dissidents understood their work as a struggle to uphold the laws and rights that were enshrined in the Soviet constitution, not as a fight against a regime.

“They were fastidious about everything they did being consistent with Soviet law,” Benjamin Nathans, a history professor at the University of Pennsylvania and the author of a book on Soviet dissidents, said. “I call it radical civil obedience.”

An affirmative vision of what the world should be is the inspiration for many of those who, in these tempestuous early months of Trump 2.0, have taken meaningful risks—acts of American dissent.

Consider Mariann Budde, the Episcopal bishop who used her pulpit before Trump on Inauguration Day to ask the President’s “mercy” for two vulnerable groups for whom he has reserved his most visceral disdain. For her sins, a congressional ally of the President called for the pastor to be “added to the deportation list.”..(More)”.

The Future of Health Is Preventive — If We Get Data Governance Right


Article by Stefaan Verhulst: “After a long gestation period of three years, the European Health Data Space (EHDS) is now coming into effect across the European Union, potentially ushering in a new era of health data access, interoperability, and innovation. As this ambitious initiative enters the implementation phase, it brings with it the opportunity to fundamentally reshape how health systems across Europe operate. More generally, the EHDS contains important lessons (and some cautions) for the rest of the world, suggesting how a fragmented, reactive model of healthcare may transition to one that is more integrated, proactive, and prevention-oriented.

For too long, health systems, in the EU and around the world, have been built around treating diseases rather than preventing them. Now, we have an opportunity to change that paradigm. Data, and especially the advent of AI, give us the tools to predict and intervene before illness takes hold. Data offers the potential for a system that prioritizes prevention, one where individuals receive personalized guidance to stay healthy, policymakers access real-time evidence to address risks before they escalate, and epidemics are predicted weeks in advance, enabling proactive, rapid, and highly effective responses.

But to make AI-powered preventive health care a reality, and to make the EHDS a success, we need a new data governance approach, one that would include two key components:

  • The ability to reuse data collected for other purposes (e.g., mobility, retail sales, workplace trends) to improve health outcomes.
  • The ability to integrate different data sources, not only clinical records and electronic health records (EHRs) but also environmental, social, and economic data, to build a complete picture of health risks.

In what follows, we outline some critical aspects of this new governance framework, including responsible data access and reuse (so-called secondary use), moving beyond traditional consent models to a social license for reuse, data stewardship, and the need to prioritize high-impact applications. We conclude with some specific recommendations for the EHDS, built from the preceding general discussion about the role of AI and data in preventive health…(More)”.

Trump Wants to Merge Government Data. Here Are 314 Things It Might Know About You.


Article by Emily Badger and Sheera Frenkel: “The federal government knows your mother’s maiden name and your bank account number. The student debt you hold. Your disability status. The company that employs you and the wages you earn there. And that’s just a start. It may also know your … and at least 263 more categories of data. These intimate details about the personal lives of people who live in the United States are held in disconnected data systems across the federal government — some at the Treasury, some at the Social Security Administration and some at the Department of Education, among other agencies.

The Trump administration is now trying to connect the dots of that disparate information. Last month, President Trump signed an executive order calling for the “consolidation” of these segregated records, raising the prospect of creating a kind of data trove about Americans that the government has never had before, and that members of the president’s own party have historically opposed.

The effort is being driven by Elon Musk, the world’s richest man, and his lieutenants with the Department of Government Efficiency, who have sought access to dozens of databases as they have swept through agencies across the federal government. Along the way, they have elbowed past the objections of career staff, data security protocols, national security experts and legal privacy protections…(More)”.