Data, waves and wind to be counted in the economy


Article by Robert Cuffe: “Wind and waves are set to be included in calculations of the size of countries’ economies for the first time, as part of changes approved at the United Nations.

Assets like oilfields were already factored in under the rules – last updated in 2008.

This update aims to capture areas that have grown since then, such as the cost of using up natural resources and the value of data.

The changes come into force in 2030 and could mean an increase in estimates of the size of the UK economy, making promises to spend a fixed share of the economy on defence or aid more expensive.

The economic value of wind and waves can be estimated from the price of all the energy that can be generated from the turbines in a country.
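The valuation approach the article describes, pricing the stream of energy a country's turbines can generate, resembles a net-present-value calculation. A minimal sketch, with entirely hypothetical figures (this is not the UN's actual methodology, just an illustration of the idea):

```python
# Illustrative sketch: valuing a national wind resource as the discounted
# value of the electricity its turbines can generate over their lifetime.
# All inputs below are hypothetical, not official statistics.

def wind_asset_value(annual_mwh, price_per_mwh, years, discount_rate):
    """Net present value of future generation revenue."""
    return sum(
        annual_mwh * price_per_mwh / (1 + discount_rate) ** t
        for t in range(1, years + 1)
    )

value = wind_asset_value(
    annual_mwh=80_000_000,   # hypothetical national generation (MWh/year)
    price_per_mwh=50.0,      # hypothetical wholesale electricity price
    years=25,                # assumed asset lifetime
    discount_rate=0.035,     # assumed discount rate
)
print(f"Estimated wind asset value: ${value / 1e9:.1f}bn")
```

The discount rate matters a great deal here: a higher rate shrinks the estimated asset value, which is one reason such natural-capital estimates can vary widely between statistical agencies.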

The update also treats data as an asset in its own right on top of the assets that house it like servers and cables.

Governments use a common rule book for measuring the size of their economies and how they grow over time.

These changes to the rule book are “tweaks, rather than a rewrite”, according to Prof Diane Coyle of the University of Cambridge.

Ben Zaranko of the Institute for Fiscal Studies (IFS) calls it an “accounting” change, rather than a real change. He explains: “We’d be no better off in a material sense, and tax revenues would be no higher.”

But it could make economies look bigger, creating a possible future spending headache for the UK government…(More)”.

Bridging the Data Provenance Gap Across Text, Speech and Video


Paper by Shayne Longpre et al: “Progress in AI is driven largely by the scale and quality of training data. Despite this, there is a deficit of empirical analysis examining the attributes of well-established datasets beyond text. In this work we conduct the largest and first-of-its-kind longitudinal audit across modalities (popular text, speech, and video datasets), from their detailed sourcing trends and use restrictions to their geographical and linguistic representation. Our manual analysis covers nearly 4000 public datasets between 1990 and 2024, spanning 608 languages, 798 sources, 659 organizations, and 67 countries. We find that multimodal machine learning applications have overwhelmingly turned to web-crawled, synthetic, and social media platforms, such as YouTube, for their training sets, eclipsing all other sources since 2019. Secondly, tracing the chain of dataset derivations, we find that while less than 33% of datasets are restrictively licensed, over 80% of the source content in widely-used text, speech, and video datasets carries non-commercial restrictions. Finally, counter to the rising number of languages and geographies represented in public AI training datasets, our audit demonstrates that measures of relative geographical and multilingual representation have failed to significantly improve their coverage since 2013. We believe the breadth of our audit enables us to empirically examine trends in data sourcing, restrictions, and Western-centricity at an ecosystem level, and that visibility into these questions is essential to progress in responsible AI. As a contribution to ongoing improvements in dataset transparency and responsible use, we release our entire multimodal audit, allowing practitioners to trace data provenance across text, speech, and video…(More)”.
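The abstract's two percentages measure different things: the share of *datasets* with restrictive licenses versus the share of underlying *source content* carrying non-commercial terms. A toy sketch with invented numbers (not the paper's data) shows how a minority of restrictively licensed datasets can coexist with a large majority of restricted source content:

```python
# Toy illustration of dataset-level vs source-content-level restriction
# shares. All figures are invented for the example, not taken from the audit.

datasets = [
    # (name, has_restrictive_license, noncommercial_source_items, total_source_items)
    ("corpus_a", False, 900, 1000),
    ("corpus_b", False, 850, 1000),
    ("corpus_c", True,  700, 1000),
]

# Share of datasets that are themselves restrictively licensed.
dataset_share = sum(d[1] for d in datasets) / len(datasets)

# Share of the underlying source items that carry non-commercial terms,
# pooled across all datasets.
content_share = sum(d[2] for d in datasets) / sum(d[3] for d in datasets)

print(f"Restrictively licensed datasets: {dataset_share:.0%}")
print(f"Source content with non-commercial terms: {content_share:.0%}")
```

Here only one dataset in three is restrictively licensed, yet over 80% of the pooled source items are non-commercial, which is the shape of the gap the audit reports.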

Artificial intelligence for modelling infectious disease epidemics


Paper by Moritz U. G. Kraemer et al: “Infectious disease threats to individual and public health are numerous, varied and frequently unexpected. Artificial intelligence (AI) and related technologies, which are already supporting human decision making in economics, medicine and social science, have the potential to transform the scope and power of infectious disease epidemiology. Here we consider the application to infectious disease modelling of AI systems that combine machine learning, computational statistics, information retrieval and data science. We first outline how recent advances in AI can accelerate breakthroughs in answering key epidemiological questions and we discuss specific AI methods that can be applied to routinely collected infectious disease surveillance data. Second, we elaborate on the social context of AI for infectious disease epidemiology, including issues such as explainability, safety, accountability and ethics. Finally, we summarize some limitations of AI applications in this field and provide recommendations for how infectious disease epidemiology can harness most effectively current and future developments in AI…(More)”.

A Roadmap to Accessing Mobile Network Data for Statistics


Guide by Global Partnership for Sustainable Development Data: “… introduces milestones on the path to mobile network data access. While it is aimed at stakeholders in national statistical systems and across national governments in general, the lessons should resonate with others seeking to take this route. The steps in this guide are written in the order in which they should be taken, and some readers who have already embarked on this journey may find they have completed some of these steps. 

The roadmap is meant to be followed step by step, but readers may start, stop, and return to any point on the path.

The path to mobile network data access has three milestones:

  1. Evaluating the opportunity – setting clear goals for the desired impact of data innovation.
  2. Engaging with stakeholders – getting critical stakeholders to support your cause.
  3. Executing collaboration agreements – signing a written agreement among partners…(More)”

Moving Toward the FAIR-R principles: Advancing AI-Ready Data


Paper by Stefaan Verhulst, Andrew Zahuranec and Hannah Chafetz: “In today’s rapidly evolving AI ecosystem, making data ready for AI (optimized for training, fine-tuning, and augmentation) is more critical than ever. While the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) have guided data management and open science, they do not inherently address AI-specific needs. Expanding FAIR to FAIR-R, incorporating Readiness for AI, could accelerate the responsible use of open data in AI applications that serve the public interest. This paper introduces the FAIR-R framework and identifies current efforts for enhancing AI-ready data through improved data labeling, provenance tracking, and new data standards. However, key challenges remain: How can data be structured for AI without compromising ethics? What governance models ensure equitable access? How can AI itself be leveraged to improve data quality? Answering these questions is essential for unlocking the full potential of AI-driven innovation while ensuring responsible and transparent data use…(More)”.

Elon Musk Also Has a Problem with Wikipedia


Article by Margaret Talbot: “If you have spent time on Wikipedia—and especially if you’ve delved at all into the online encyclopedia’s inner workings—you will know that it is, in almost every aspect, the inverse of Trumpism. That’s not a statement about its politics. The thousands of volunteer editors who write, edit, and fact-check the site manage to adhere remarkably well, over all, to one of its core values: the neutral point of view. Like many of Wikipedia’s principles and procedures, the neutral point of view is the subject of a practical but sophisticated epistemological essay posted on Wikipedia. Among other things, the essay explains, N.P.O.V. means not stating opinions as facts, and also, just as important, not stating facts as opinions. (So, for example, the third sentence of the entry titled “Climate change” states, with no equivocation, that “the current rise in global temperatures is driven by human activities, especially fossil fuel burning since the Industrial Revolution.”)…So maybe it should come as no surprise that Elon Musk has lately taken time from his busy schedule of dismantling the federal government, along with many of its sources of reliable information, to attack Wikipedia. On January 21st, after the site updated its page on Musk to include a reference to the much-debated stiff-armed salute he made at a Trump inaugural event, he posted on X that “since legacy media propaganda is considered a ‘valid’ source by Wikipedia, it naturally simply becomes an extension of legacy media propaganda!” He urged people not to donate to the site: “Defund Wikipedia until balance is restored!” It’s worth taking a look at how the incident is described on Musk’s page, quite far down, and judging for yourself. 
What I see is a paragraph that first describes the physical gesture (“Musk thumped his right hand over his heart, fingers spread wide, and then extended his right arm out, emphatically, at an upward angle, palm down and fingers together”), goes on to say that “some” viewed it as a Nazi or a Roman salute, then quotes Musk disparaging those claims as “politicized,” while noting that he did not explicitly deny them. (There is also now a separate Wikipedia article, “Elon Musk salute controversy,” that goes into detail about the full range of reactions.)

This is not the first time Musk has gone after the site. In December, he posted on X, “Stop donating to Wokepedia.” And that wasn’t even his first bad Wikipedia pun. “I will give them a billion dollars if they change their name to Dickipedia,” he wrote, in an October, 2023, post. It seemed to be an ego thing at first. Musk objected to being described on his page as an “early investor” in Tesla, rather than as a founder, which is how he prefers to be identified, and seemed frustrated that he couldn’t just buy the site. But lately Musk’s beef has merged with a general conviction on the right that Wikipedia—which, like all encyclopedias, is a tertiary source that relies on original reporting and research done by other media and scholars—is biased against conservatives.

The Heritage Foundation, the think tank behind the Project 2025 policy blueprint, has plans to unmask Wikipedia editors who maintain their privacy using pseudonyms (these usernames are displayed in the article history but don’t necessarily make it easy to identify the people behind them) and whose contributions on Israel it deems antisemitic…(More)”.

Presenting the StanDat database on international standards: improving data accessibility on marginal topics


Article by Solveig Bjørkholt: “This article presents an original database on international standards, constructed using modern data gathering methods. StanDat facilitates studies into the role of standards in the global political economy by (1) being a source for descriptive statistics, (2) enabling researchers to assess scope conditions of previous findings, and (3) providing data for new analyses, for example the exploration of the relationship between standardization and trade, as demonstrated in this article. The creation of StanDat aims to stimulate further research into the domain of standards. Moreover, by exemplifying data collection and dissemination techniques applicable to investigating less-explored subjects in the social sciences, it serves as a model for gathering, systematizing, and sharing data in areas where information is plentiful yet not readily accessible for research…(More)”.

Diversifying Professional Roles in Data Science


Policy Briefing by Emma Karoune and Malvika Sharan: “The interdisciplinary nature of the data science workforce extends beyond the traditional notion of a “data scientist.” A successful data science team requires a wide range of technical expertise, domain knowledge and leadership capabilities. To strengthen such a team-based approach, this note recommends that institutions, funders and policymakers invest in developing and professionalising diverse roles, fostering a resilient data science ecosystem for the future. 


By recognising the diverse specialist roles that collaborate within interdisciplinary teams, organisations can leverage deep expertise across multiple skill sets, enhancing responsible decision-making and fostering innovation at all levels. Ultimately, this note seeks to shift the perception of data science professionals from the conventional view of individual data scientists to a competency-based model of specialist roles within a team, each essential to the success of data science initiatives…(More)”.

To Stop Tariffs, Trump Demands Opioid Data That Doesn’t Yet Exist


Article by Josh Katz and Margot Sanger-Katz: “One month ago, President Trump agreed to delay tariffs on Canada and Mexico after the two countries agreed to help stem the flow of fentanyl into the United States. On Tuesday, the Trump administration imposed the tariffs anyway, saying that the countries had failed to do enough — and claiming that tariffs would be lifted only when drug deaths fall.

But the administration has seemingly established an impossible standard. Real-time, national data on fentanyl overdose deaths does not exist, so there is no way to know whether Canada and Mexico were able to “adequately address the situation” since February, as the White House demanded.

“We need to see material reduction in autopsied deaths from opioids,” said Howard Lutnick, the commerce secretary, in an interview on CNBC on Tuesday, indicating that such a decline would be a precondition to lowering tariffs. “But you’ve seen it — it has not been a statistically relevant reduction of deaths in America.”

In a way, Mr. Lutnick is correct that there is no evidence that overdose deaths have fallen in the last month — since there is no such national data yet. His stated goal to measure deaths again in early April will face similar challenges.

But data through September shows that fentanyl deaths had already been falling at a statistically significant rate for months, causing overall drug deaths to drop at a pace unlike any seen in more than 50 years of recorded drug overdose mortality data.

The declines can be seen in provisional data from the Centers for Disease Control and Prevention, which compiles death records from states, which in turn collect data from medical examiners and coroners in cities and towns. Final national data generally takes more than a year to produce. But, as the drug overdose crisis has become a major public health emergency in recent years, the C.D.C. has been publishing monthly data, with some holes, at around a four-month lag…(More)”.

Commerce Secretary’s Comments Raise Fears of Interference in Federal Data


Article by Ben Casselman and Colby Smith: “Comments from a member of President Trump’s cabinet over the weekend have renewed concerns that the new administration could seek to interfere with federal statistics — especially if they start to show that the economy is slipping into a recession.

In an interview on Fox News on Sunday, Howard Lutnick, the commerce secretary, suggested that he planned to change the way the government reports data on gross domestic product in order to remove the impact of government spending.

“You know that governments historically have messed with G.D.P.,” he said. “They count government spending as part of G.D.P. So I’m going to separate those two and make it transparent.”

It wasn’t immediately clear what Mr. Lutnick meant. The basic definition of gross domestic product is widely accepted internationally and has been unchanged for decades. It tallies consumer spending, private-sector investment, net exports, and government investment and spending to arrive at a broad measure of all goods and services produced in a country.

The Bureau of Economic Analysis, which is part of Mr. Lutnick’s department, already produces a detailed breakdown of G.D.P. into its component parts. Many economists focus on a measure — known as “final sales to private domestic purchasers” — that excludes government spending and is often seen as a better indicator of underlying demand in the economy. That measure has generally shown stronger growth in recent quarters than overall G.D.P. figures.
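The expenditure identity behind these measures can be made concrete. A short sketch with hypothetical figures (in billions of dollars, not actual BEA data) shows how G.D.P. is assembled from its components and how the private-demand measure strips out government spending, trade, and inventories:

```python
# Hypothetical quarterly figures ($bn) illustrating the GDP expenditure
# identity and "final sales to private domestic purchasers".

consumption = 15_000              # personal consumption expenditures
private_fixed_investment = 4_000  # business and residential investment
inventory_change = 100            # change in private inventories
government = 3_800                # government consumption and investment
exports, imports = 2_500, 3_200

# GDP = C + I + G + (X - M), with investment split into its fixed and
# inventory pieces.
gdp = (consumption + private_fixed_investment + inventory_change
       + government + (exports - imports))

# The private-demand measure keeps only consumption and private fixed
# investment, excluding government, trade, and inventory swings.
final_sales_private = consumption + private_fixed_investment

print(f"GDP: {gdp}")                                       # 22200
print(f"Final sales to private purchasers: {final_sales_private}")  # 19000
```

Because the published accounts already break G.D.P. down this way, anyone can compute the ex-government measure from existing tables, which is why the proposed change would be presentational rather than substantive.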

In recent weeks, however, there have been mounting signs elsewhere that the economy could be losing momentum. Consumer spending fell unexpectedly in January, applications for unemployment insurance have been creeping upward, and measures of housing construction and home sales have turned down. A forecasting model from the Federal Reserve Bank of Atlanta predicts that G.D.P. could contract sharply in the first quarter of the year, although most private forecasters still expect modest growth.

Cuts to federal spending and the federal work force could act as a further drag on economic growth in coming months. Removing federal spending from G.D.P. calculations, therefore, could obscure the impact of the administration’s policies…(More)”.