For sale: Data on US servicemembers — and lots of it


Article by Alfred Ng: “Active-duty members of the U.S. military are vulnerable to having their personal information collected, packaged and sold to overseas companies without any vetting, according to a new report funded by the U.S. Military Academy at West Point.

The report highlights a significant American security risk, according to military officials, lawmakers and the experts who conducted the research, and who say the data available on servicemembers exposes them to blackmail based on their jobs and habits.

It also casts a spotlight on the practices of data brokers, a set of firms that specialize in scraping and packaging people’s digital records such as health conditions and credit ratings.

“It’s really a case of being able to target people based on specific vulnerabilities,” said Maj. Jessica Dawson, a research scientist at the Army Cyber Institute at West Point who initiated the study.

Data brokers gather government files, publicly available information and financial records into packages they can sell to marketers and other interested companies. As the practice has grown into a $214 billion industry, it has raised privacy concerns and come under scrutiny from lawmakers in Congress and state capitals.

Worried it could also present a risk to national security, the U.S. Military Academy at West Point funded the study from Duke University to see how servicemembers’ information might be packaged and sold.

Posing as buyers in the U.S. and Singapore, Duke researchers contacted multiple data-broker firms that listed datasets about active-duty servicemembers for sale. Three agreed and sold datasets to the researchers, while two declined, saying the requests came from companies that didn’t meet their verification standards.

In total, the datasets contained information on nearly 30,000 active-duty military personnel. The researchers also purchased a dataset on an additional 5,000 friends and family members of military personnel…(More)”

AI models could help negotiators secure peace deals


The Economist: “In a messy age of grinding wars and multiplying tariffs, negotiators are as busy as the stakes are high. Alliances are shifting and political leaders are adjusting—if not reversing—positions. The resulting tumult is giving even seasoned negotiators trouble keeping up with their superiors back home. Artificial-intelligence (AI) models may be able to lend a hand.

Some such models are already under development. One of the most advanced projects, dubbed Strategic Headwinds, aims to help Western diplomats in talks on Ukraine. Work began during the Biden administration in America, with officials on the White House’s National Security Council (NSC) offering guidance to the Centre for Strategic and International Studies (CSIS), a think-tank in Washington that runs the project. With peace talks under way, CSIS has speeded up its effort. Other outfits are doing similar work.

The CSIS programme is led by a unit called the Futures Lab. This team developed an AI language model using software from Scale AI, a firm based in San Francisco, and unique training data. The lab designed a tabletop strategy game called “Hetman’s Shadow” in which Russia, Ukraine and their allies hammer out deals. Data from 45 experts who played the game were fed into the model. So were media analyses of issues at stake in the Russia-Ukraine war, as well as answers provided by specialists to a questionnaire about the relative values of potential negotiation trade-offs. A database of 374 peace agreements and ceasefires was also poured in.

Thus was born, in late February, the first iteration of the Ukraine-Russia Peace Agreement Simulator. Users enter preferences for outcomes grouped under four rubrics: territory and sovereignty; security arrangements; justice and accountability; and economic conditions. The AI model then cranks out a draft agreement. The software also scores, on a scale of one to ten, the likelihood that each of its components would be satisfactory, negotiable or unacceptable to Russia, Ukraine, America and Europe. The model was provided to government negotiators from those last three territories, but a limited “dashboard” version of the software can be run online by interested members of the public…(More)”.
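
To make the simulator’s input and output shape concrete, here is a minimal sketch in Python. The four rubrics, the four parties and the one-to-ten scale come from the article; the data structures, the `score_draft` function and the placeholder rating callable are invented for illustration and are not the CSIS implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Rubrics and parties named in the article; everything below is illustrative.
RUBRICS = [
    "territory and sovereignty",
    "security arrangements",
    "justice and accountability",
    "economic conditions",
]
PARTIES = ["Russia", "Ukraine", "America", "Europe"]

@dataclass
class DraftComponent:
    rubric: str
    text: str
    # One-to-ten score per party for how acceptable the component is likely to be.
    scores: Dict[str, int] = field(default_factory=dict)

def score_draft(components: List[DraftComponent], rate) -> List[DraftComponent]:
    """Score every component of a draft agreement for every party.

    `rate` stands in for the underlying AI model: any callable mapping
    (component_text, party) to an integer from 1 to 10.
    """
    for c in components:
        c.scores = {p: rate(c.text, p) for p in PARTIES}
    return components

# Toy usage with a placeholder rating function that always returns 5.
draft = [DraftComponent(r, f"Provisions on {r}") for r in RUBRICS]
score_draft(draft, rate=lambda text, party: 5)
```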

The New Commons Challenge: Advancing AI for Public Good through Data Commons


Press Release: “The Open Data Policy Lab, a collaboration between The GovLab at New York University and Microsoft, has launched the New Commons Challenge, an initiative to advance the responsible reuse of data for AI-driven solutions that enhance local decision-making and humanitarian response. 

The Challenge will award two winning institutions $100,000 each to develop data commons that fuel responsible AI innovation in these critical areas.

With the increasing use of generative AI in crisis management, disaster preparedness, and local decision-making, access to diverse and high-quality data has never been more essential. 

The New Commons Challenge seeks to support organizations—including start-ups, non-profits, NGOs, universities, libraries, and AI developers—in building shared data ecosystems that improve real-world outcomes, from public health to emergency response.

Bridging Research and Real-World Impact

“The New Commons Challenge is about putting data into action,” said Stefaan Verhulst, Co-Founder and Chief Research and Development Officer at The GovLab. “By enabling new models of data stewardship, we aim to support AI applications that save lives, strengthen communities, and enhance local decision-making where it matters most.”

The Challenge builds on the Open Data Policy Lab’s recent report, “Blueprint to Unlock New Data Commons for AI,” which advocates for creating collaboratively governed data ecosystems that support responsible AI development.

How the Challenge Works

The Challenge unfolds in two phases:

Phase One: Open Call for Concept Notes (April 14 – June 2, 2025)

Innovators worldwide are invited to submit concept notes outlining their ideas.

Phase Two: Full Proposal Submissions & Expert Review (June 2025)

  • Selected applicants will be invited to submit a full proposal.
  • An interdisciplinary panel will evaluate proposals based on their impact potential, feasibility, and ethical governance.

Winners Announced in Late Summer 2025

Two selected projects will each receive $100,000 in funding, alongside technical support, mentorship, and global recognition…(More)”.

To Understand Global Migration, You Have to See It First


Data visualization by The New York Times: “In the maps below, Times Opinion can provide the clearest picture to date of how people move across the globe: a record of permanent migration to and from 181 countries based on a single, consistent source of information, for every month from the beginning of 2019 through the end of 2022. These estimates are drawn not from government records but from the location data of three billion anonymized Facebook users all over the world.

The analysis — the result of new research published on Wednesday from Meta, the University of Hong Kong and Harvard University — reveals migration’s true global sweep. And yes, it excludes business travelers and tourists: Only people who remain in their destination country for more than a year are counted as migrants here.

The data comes with some limitations. Migration to and from certain countries that have banned or restricted the use of Facebook, including China, Iran and Cuba, is not included in this data set, and it’s impossible to know each migrant’s legal status. Nevertheless, this is the first time that estimates of global migration flows have been made publicly available at this scale. The researchers found that from 2019 to 2022, an average of 30 million people — approximately one-third of a percent of the world’s population — migrated each year.

If you would like to see the data behind this analysis for yourself, we made an interactive tool that you can use to explore the full data set…(More)”
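
The one-year threshold mentioned above is the crux of how the researchers separate migrants from travelers. Below is a minimal sketch of that classification rule, with invented field names and none of the careful location-signal handling the underlying research applies.

```python
from datetime import date, timedelta

def is_migrant(arrival: date, last_observed_in_destination: date,
               min_stay: timedelta = timedelta(days=365)) -> bool:
    """Count a move as migration only if the person remained in the
    destination country for more than a year; shorter stays
    (tourism, business travel) are excluded, as in the study."""
    return last_observed_in_destination - arrival > min_stay

# A 14-month stay counts as migration; a two-week trip does not.
assert is_migrant(date(2020, 1, 1), date(2021, 3, 1))
assert not is_migrant(date(2020, 1, 1), date(2020, 1, 15))
```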

AI Needs Your Data. That’s Where Social Media Comes In.


Article by Dave Lee: “Having scraped just about the entire sum of human knowledge, ChatGPT and other AI efforts are making the same rallying cry: Need input!

One solution is to create synthetic data and to train a model using that, though this comes with inherent challenges, particularly around perpetuating bias or introducing compounding inaccuracies.

The other is to find a great gushing spigot of new and fresh data, the more “human” the better. That’s where social networks come in, digital spaces where millions, even billions, of users willingly and constantly post reams of information. Photos, posts, news articles, comments — every interaction of interest to companies that are trying to build conversational and generative AI. Even better, this content is not riddled with the copyright violation risk that comes with using other sources.

Lately, top AI companies have moved more aggressively to own or harness social networks, trampling over the rights of users to dictate how their posts may be used to build these machines. Social network users have long been “the product,” as the famous saying goes. They’re now also a quasi-“product developer” through their posts.

Some companies had the benefit of a social network to begin with. Meta Platforms Inc., the biggest social networking company on the planet, used in-app notifications to inform users that it would be harnessing their posts and photos for its Llama AI models. Late last month, Elon Musk’s xAI acquired X, formerly Twitter, in what was primarily a financial sleight of hand but one that made ideal sense for Musk’s Grok AI. It has been able to gain a foothold in the chatbot market by harnessing timely tweets posted on the network as well as the huge archive of online chatter dating back almost two decades. Then there’s Microsoft Corp., which owns the professional network LinkedIn and has been pushing heavily for users (and journalists) to post more and more original content to the platform.

Microsoft doesn’t, however, share LinkedIn data with its close partner OpenAI, which may explain reports that the ChatGPT maker was in the early stages of building a social network of its own…(More)”

DOGE’s Growing Reach into Personal Data: What it Means for Human Rights


Article by Deborah Brown: “Expansive interagency sharing of personal data could fuel abuses against vulnerable people and communities who are already being targeted by Trump administration policies, like immigrants, lesbian, gay, bisexual, and transgender (LGBT) people, and student protesters. The personal data held by the government reveals deeply sensitive information, such as people’s immigration status, race, gender identity, sexual orientation, and economic status.

A massive centralized government database could easily be used for a range of abusive purposes, like to discriminate against current federal employees and future job applicants on the basis of their sexual orientation or gender identity, or to facilitate the deportation of immigrants. It could result in people forgoing public services out of fear that their data will be weaponized against them by another federal agency.

But the danger doesn’t stop with those already in the administration’s crosshairs. The removal of barriers keeping private data siloed could allow the government or DOGE to deny federal loans for education or Medicaid benefits based on unrelated or even inaccurate data. It could also facilitate the creation of profiles containing all of the information various agencies hold on every person in the country. Such profiles, combined with social media activity, could facilitate the identification and targeting of people for political reasons, including in the context of elections.

Information silos exist for a reason. Personal data should be collected for a determined, specific, and legitimate purpose, and not used for another purpose without notice or justification, according to the key internationally recognized data protection principle, “purpose limitation.” Sharing data seamlessly across federal or even state agencies in the name of an undefined and unmeasurable goal of efficiency is incompatible with this core data protection principle…(More)”.

Data Cooperatives: Democratic Models for Ethical Data Stewardship


Paper by Francisco Mendonca, Giovanna DiMarzo, and Nabil Abdennadher: “Data cooperatives offer a new model for fair data governance, enabling individuals to collectively control, manage, and benefit from their information while adhering to cooperative principles such as democratic member control, economic participation, and community concern. This paper reviews data cooperatives, distinguishing them from models like data trusts, data commons, and data unions, and defines them based on member ownership, democratic governance, and data sovereignty. It explores applications in sectors like healthcare, agriculture, and construction. Despite their potential, data cooperatives face challenges in coordination, scalability, and member engagement, requiring innovative governance strategies, robust technical systems, and mechanisms to align member interests with cooperative goals. The paper concludes by advocating for data cooperatives as a sustainable, democratic, and ethical model for the future data economy…(More)”.

Can We Measure the Impact of a Database?


Article by Peter Buneman, Dennis Dosso, Matteo Lissandrini, Gianmaria Silvello, and He Sun: “Databases publish data. This is undoubtedly the case for scientific and statistical databases, which have largely replaced traditional reference works. Database and Web technologies have led to an explosion in the number of databases that support scientific research, for obvious reasons: Databases provide faster communication of knowledge, hold larger volumes of data, are more easily searched, and are both human- and machine-readable. Moreover, they can be developed rapidly and collaboratively by a mixture of researchers and curators. For example, more than 1,500 curated databases are relevant to molecular biology alone. The value of these databases lies not only in the data they present but also in how they organize that data.

In the case of an author or journal, most bibliometric measures are obtained from citations to an associated set of publications. There are typically many ways of decomposing a database into publications, so we might use its organization to guide our choice of decompositions. We will show that when the database has a hierarchical structure, there is a natural extension of the h-index that works on this hierarchy…(More)”.
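
The hierarchical extension the authors describe can be made concrete with a short sketch. The classic h-index below is standard; the recursive aggregation over a hierarchy is one plausible illustrative reading of the idea, not necessarily the exact generalization defined in the article.

```python
def h_index(citation_counts):
    """Classic h-index: the largest h such that at least h items
    have at least h citations each."""
    h = 0
    for i, c in enumerate(sorted(citation_counts, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h

def hierarchical_h_index(node):
    """Illustrative extension to a hierarchy: a leaf publication is scored
    by its citation count, and an internal node by the h-index of its
    children's scores, applied recursively up the tree.

    `node` is either an int (citations to a leaf) or a list of child nodes.
    """
    if isinstance(node, int):
        return node
    return h_index(hierarchical_h_index(child) for child in node)

# Toy database with two sections of leaf entries (citation counts).
database = [[10, 4, 3, 1], [2, 2, 0]]
print(hierarchical_h_index(database))  # section h-indexes 3 and 2 -> overall 2
```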

Code Shift: Using AI to Analyze Zoning Reform in American Cities


Report by Arianna Salazar-Miranda & Emily Talen: “Cities are at the forefront of addressing global sustainability challenges, particularly those exacerbated by climate change. Traditional zoning codes, which often segregate land uses, have been linked to increased vehicular dependence, urban sprawl and social disconnection, undermining broader social and environmental sustainability objectives. This study investigates the adoption and impact of form-based codes (FBCs), which aim to promote sustainable, compact and mixed-use urban forms as a solution to these issues. Using natural language processing techniques, we analyzed zoning documents from over 2,000 United States census-designated places to identify linguistic patterns indicative of FBC principles. Our findings reveal widespread adoption of FBCs across the country, with notable variations within regions. FBCs are associated with higher floor-to-area ratios, narrower and more consistent street setbacks and smaller plots. We also find that places with FBCs have improved walkability, shorter commutes and a higher share of multifamily housing. Our findings highlight the utility of natural language processing for evaluating zoning codes and underscore the potential benefits of form-based zoning reforms for enhancing urban sustainability…(More)”.
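
At its core, the method turns on detecting form-based language in zoning text at scale, which the following highly simplified sketch illustrates. The indicator phrases, equal weighting and scoring are invented for illustration; the report’s actual NLP pipeline is considerably richer.

```python
import re

# Hypothetical indicator phrases for form-based code principles.
FBC_PATTERNS = [
    r"build-?to line",
    r"frontage type",
    r"form-based",
    r"regulating plan",
    r"mixed[- ]use",
    r"transect",
]

def fbc_score(zoning_text: str) -> float:
    """Return the fraction of indicator patterns appearing at least once."""
    text = zoning_text.lower()
    hits = sum(bool(re.search(p, text)) for p in FBC_PATTERNS)
    return hits / len(FBC_PATTERNS)

sample = "The regulating plan establishes frontage types and a build-to line."
print(f"{fbc_score(sample):.2f}")  # prints 0.50 for this toy snippet
```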

Artificial Intelligence and the Future of Work


Report by National Academies of Sciences, Engineering, and Medicine: “Advances in artificial intelligence (AI) promise to improve productivity significantly, but there are many questions about how AI could affect jobs and workers.

Recent technical innovations have driven the rapid development of generative AI systems, which produce text, images, or other content based on user requests – advances which have the potential to complement or replace human labor in specific tasks, and to reshape demand for certain types of expertise in the labor market.

Artificial Intelligence and the Future of Work evaluates recent advances in AI technology and their implications for economic productivity, the workforce, and education in the United States. The report notes that AI is a tool with the potential to enhance human labor and create new forms of valuable work – but this is not an inevitable outcome. Tracking progress in AI and its impacts on the workforce will be critical to helping inform and equip workers and policymakers to flexibly respond to AI developments…(More)”.