We could run out of data to train AI language programs 


Article by Tammy Xu: “Large language models are one of the hottest areas of AI research right now, with companies racing to release programs like GPT-3 that can write impressively coherent articles and even computer code. But there’s a problem looming on the horizon, according to a team of AI forecasters: we might run out of data to train them on.

Language models are trained using texts from sources like Wikipedia, news articles, scientific papers, and books. In recent years, the trend has been to train these models on more and more data in the hope that it’ll make them more accurate and versatile.

The trouble is, the types of data typically used for training language models may be used up in the near future—as early as 2026, according to a paper by researchers from Epoch, an AI research and forecasting organization, that is yet to be peer reviewed. The issue stems from the fact that, as researchers build more powerful models with greater capabilities, they have to find ever more texts to train them on. Large language model researchers are increasingly concerned that they are going to run out of this sort of data, says Teven Le Scao, a researcher at AI company Hugging Face, who was not involved in Epoch’s work.

The issue stems partly from the fact that language AI researchers filter the data they use to train models into two categories: high quality and low quality. The line between the two categories can be fuzzy, says Pablo Villalobos, a staff researcher at Epoch and the lead author of the paper, but text from the former is viewed as better-written and is often produced by professional writers…(More)”.
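
The split between "high quality" and "low quality" text is usually operationalised with cheap heuristics applied long before any model sees the data. As a rough illustration only, here is the kind of rule-of-thumb filter corpus builders commonly use; the function name, thresholds and rules are assumptions for this sketch, not the criteria used by Epoch or Hugging Face.

```python
import re

def looks_high_quality(text: str,
                       min_words: int = 50,
                       min_alpha_ratio: float = 0.8,
                       max_symbol_ratio: float = 0.05) -> bool:
    """Toy heuristic filter: keep documents that look like edited prose.

    All thresholds are illustrative assumptions, not the rules of any
    production training pipeline.
    """
    words = text.split()
    if len(words) < min_words:                          # too short to be an article
        return False

    alpha = sum(ch.isalpha() for ch in text)
    if alpha / max(len(text), 1) < min_alpha_ratio:     # mostly numbers, markup or code
        return False

    symbols = len(re.findall(r"[#{}<>|\\]", text))
    if symbols / max(len(text), 1) > max_symbol_ratio:  # scraped boilerplate
        return False

    sentences = [s for s in re.split(r"[.!?]", text) if s.strip()]
    mean_len = sum(len(s.split()) for s in sentences) / max(len(sentences), 1)
    return 3 <= mean_len <= 60                          # sentence lengths typical of prose

docs = ["...web-crawled page...", "...scanned book chapter..."]
kept = [d for d in docs if looks_high_quality(d)]       # keep only documents that pass
```

Filters of this kind are cheap enough to run over web-scale crawls, which is part of why the pool of text that survives them is so much smaller than the raw web.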

How many yottabytes in a quettabyte? Extreme numbers get new names


Article by Elizabeth Gibney: “By the 2030s, the world will generate around a yottabyte of data per year — that’s 10²⁴ bytes, or the amount that would fit on DVDs stacked all the way to Mars. Now, the booming growth of the data sphere has prompted the governors of the metric system to agree on new prefixes beyond that magnitude, to describe the outrageously big and small.

Representatives from governments worldwide, meeting at the General Conference on Weights and Measures (CGPM) outside Paris on 18 November, voted to introduce four new prefixes to the International System of Units (SI) with immediate effect. The prefixes ronna and quetta represent 10²⁷ and 10³⁰, and ronto and quecto signify 10⁻²⁷ and 10⁻³⁰. Earth weighs around one ronnagram, and an electron’s mass is about one quectogram.

This is the first update to the prefix system since 1991, when the organization added zetta (10²¹), zepto (10⁻²¹), yotta (10²⁴) and yocto (10⁻²⁴). In that case, metrologists were adapting to fit the needs of chemists, who wanted a way to express SI units on the scale of Avogadro’s number — the 6 × 10²³ units in a mole, a measure of the quantity of substances. The more familiar prefixes peta and exa were added in 1975 (see ‘Extreme figures’).

Extreme figures

Advances in scientific fields have led to increasing need for prefixes to describe very large and very small numbers.

Factor   Name     Symbol   Adopted
10³⁰     quetta   Q        2022
10²⁷     ronna    R        2022
10²⁴     yotta    Y        1991
10²¹     zetta    Z        1991
10¹⁸     exa      E        1975
10¹⁵     peta     P        1975
10⁻¹⁵    femto    f        1964
10⁻¹⁸    atto     a        1964
10⁻²¹    zepto    z        1991
10⁻²⁴    yocto    y        1991
10⁻²⁷    ronto    r        2022
10⁻³⁰    quecto   q        2022

Prefixes are agreed at the General Conference on Weights and Measures.

Today, the driver is data science, says Richard Brown, a metrologist at the UK National Physical Laboratory in Teddington. He has been working on plans to introduce the latest prefixes for five years, and presented the proposal to the CGPM on 17 November. With the annual volume of data generated globally having already hit zettabytes, informal suggestions for 10²⁷ — including ‘hella’ and ‘bronto’ — were starting to take hold, he says. Google’s unit converter, for example, already tells users that 1,000 yottabytes is 1 hellabyte, and at least one UK government website quotes brontobyte as the correct term….(More)”
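
For readers who want to play with the new vocabulary, a minimal converter in the spirit of the unit tools mentioned above is sketched below; the function name and the restriction to prefixes for powers of 1,000 are choices made for this example, not part of the SI resolution.

```python
# SI prefixes for positive powers of 1,000, including ronna (10^27) and
# quetta (10^30), adopted by the CGPM in November 2022.
SI_PREFIXES = [
    (30, "quetta", "Q"), (27, "ronna", "R"), (24, "yotta", "Y"),
    (21, "zetta", "Z"), (18, "exa", "E"), (15, "peta", "P"),
    (12, "tera", "T"), (9, "giga", "G"), (6, "mega", "M"), (3, "kilo", "k"),
]

def si_format(value: float, unit: str = "B") -> str:
    """Express a value using the largest SI prefix that keeps it at or above 1."""
    for exponent, _name, symbol in SI_PREFIXES:
        if value >= 10 ** exponent:
            return f"{value / 10 ** exponent:g} {symbol}{unit}"
    return f"{value:g} {unit}"

print(si_format(1_000 * 10 ** 24))  # 1,000 yottabytes -> "1 RB", one ronnabyte
print(si_format(10 ** 24))          # the yearly data volume forecast above -> "1 YB"
```

Under the official vocabulary, 1,000 yottabytes is one ronnabyte, not a 'hellabyte'.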

Institutions, Experts & the Loss of Trust


Essay by Henry E. Brady and Kay Lehman Schlozman: “Institutions are critical to our personal and societal well-being. They develop and disseminate knowledge, enforce the law, keep us healthy, shape labor relations, and uphold social and religious norms. But institutions and the people who lead them cannot fulfill their missions if they have lost legitimacy in the eyes of the people they are meant to serve.

Americans’ distrust of Congress is long-standing. What is less well-documented is how partisan polarization now aligns with the growing distrust of institutions once thought of as nonpolitical. Refusals to follow public health guidance about COVID-19, calls to defund the police, the rejection of election results, and disbelief of the press highlight the growing polarization of trust. But can these relationships be broken? And how does the polarization of trust affect institutions’ ability to confront shared problems, like climate change, epidemics, and economic collapse?…(More)”.

Humanizing Science and Engineering for the Twenty-First Century


Essay by Kaye Husbands Fealing, Aubrey Deveny Incorvaia and Richard Utz: “Solving complex problems is never a purely technical or scientific matter. When science or technology advances, insights and innovations must be carefully communicated to policymakers and the public. Moreover, scientists, engineers, and technologists must draw on subject matter expertise in other domains to understand the full magnitude of the problems they seek to solve. And interdisciplinary awareness is essential to ensure that taxpayer-funded policy and research are efficient and equitable and are accountable to citizens at large—including members of traditionally marginalized communities…(More)”.

Science and the World Cup: how big data is transforming football


Essay by David Adam: “The scowl on Cristiano Ronaldo’s face made international headlines last month when the Portuguese superstar was pulled from a match between Manchester United and Newcastle with 18 minutes left to play. But he’s not alone in his sentiment. Few footballers agree with a manager’s decision to substitute them in favour of a fresh replacement.

During the upcoming football World Cup tournament in Qatar, players will have a more evidence-based way to argue for time on the pitch. Within minutes of the final whistle, tournament organizers will send each player a detailed breakdown of their performance. Strikers will be able to show how often they made a run and were ignored. Defenders will have data on how much they hassled and harried the opposing team when it had possession.

It’s the latest incursion of numbers into the beautiful game. Data analysis now helps to steer everything from player transfers and the intensity of training, to targeting opponents and recommending the best direction to kick the ball at any point on the pitch.

Meanwhile, footballers face the kind of data scrutiny more often associated with an astronaut. Wearable vests and straps can now sense motion, track position with GPS and count the number of shots taken with each foot. Cameras at multiple angles capture everything from headers won to how long players keep the ball. And to make sense of this information, most elite football teams now employ data analysts, including mathematicians, data scientists and physicists plucked from top companies and labs such as computing giant Microsoft and CERN, Europe’s particle-physics laboratory near Geneva, Switzerland….(More)”.
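
To give a flavour of what those analysts do with the tracking feeds, here is a minimal sketch that turns a positional trace into summary numbers of the sort described above; the sampling interval and the sprint threshold are illustrative assumptions, not values used by any particular club or by FIFA.

```python
import math

def movement_summary(positions, dt=0.1, sprint_speed=7.0):
    """Summarise a tracking trace for one player.

    positions: (x, y) pitch coordinates in metres, sampled every dt seconds.
    sprint_speed: illustrative threshold in m/s above which movement counts
    as sprinting; real analytics departments use their own definitions.
    """
    total_distance = 0.0
    sprint_time = 0.0
    top_speed = 0.0
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        step = math.hypot(x1 - x0, y1 - y0)   # distance covered in this sample
        speed = step / dt
        total_distance += step
        top_speed = max(top_speed, speed)
        if speed >= sprint_speed:
            sprint_time += dt
    return {
        "distance_m": round(total_distance, 1),
        "top_speed_ms": round(top_speed, 2),
        "sprint_time_s": round(sprint_time, 1),
    }
```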

The network science of collective intelligence


Article by Damon Centola: “In the last few years, breakthroughs in computational and experimental techniques have produced several key discoveries in the science of networks and human collective intelligence. This review presents the latest scientific findings from two key fields of research: collective problem-solving and the wisdom of the crowd. I demonstrate the core theoretical tensions separating these research traditions and show how recent findings offer a new synthesis for understanding how network dynamics alter collective intelligence, both positively and negatively. I conclude by highlighting current theoretical problems at the forefront of research on networked collective intelligence, as well as vital public policy challenges that require new research efforts…(More)”.
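
The wisdom-of-the-crowd effect at the centre of this literature can be reproduced in a few lines of simulation: when estimates are independent and unbiased, the error of the group mean shrinks roughly with the square root of group size. The sketch below is only that toy model, not a reproduction of the experiments the review covers.

```python
import random
import statistics

def crowd_error(n_people, truth=100.0, noise_sd=20.0, trials=2_000):
    """Mean absolute error of the crowd's average guess in a toy model where
    each guess is the truth plus independent Gaussian noise."""
    errors = []
    for _ in range(trials):
        guesses = [random.gauss(truth, noise_sd) for _ in range(n_people)]
        errors.append(abs(statistics.mean(guesses) - truth))
    return statistics.mean(errors)

for n in (1, 10, 100, 1_000):
    print(n, round(crowd_error(n), 2))  # error falls roughly as 1 / sqrt(n)
```

Network structure matters because social influence can break the independence this toy model assumes; the review examines when that helps collective judgment and when it hurts.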

How government can capitalise on a revolution in data sharing


Article by Alison Pritchard: “A watershed moment in the culture of data sharing, the pandemic has led to the use of linked data increasingly becoming standard practice. From linking census and NHS data to track the virus’s impact among minority ethnic groups, to the linking of timely local data sources to support local authorities’ responses, the value of sharing data across boundaries was self-evident. 

Using data to inform a multidisciplinary pandemic response accelerated our longstanding work on data capability. To continue this progress, there is now a need to make government data more organised, easier to access, and integrated for use. Our learning has guided the development of a new cloud-based platform that will ensure that anonymised data about our society and economy are now linked and accessible for vital research and decision-making in the UK.
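
The excerpt does not describe the linkage machinery itself, but a common pattern for joining anonymised records across departments is to match on keyed pseudonyms rather than raw identifiers. The sketch below illustrates that general idea only; the identifiers, key handling and example values are placeholders, not ONS or IDS practice.

```python
import hashlib
import hmac

SECRET_KEY = b"example-linkage-key"   # placeholder; a real service manages keys securely

def pseudonym(identifier: str) -> str:
    """Keyed hash so the same person receives the same token in every dataset,
    while the raw identifier is never shared between them."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

# Placeholder records keyed by pseudonym rather than by the identifier itself.
census = {pseudonym("AB123456C"): {"region": "London"}}
health = {pseudonym("AB123456C"): {"outcome": "hospitalised"}}

linked = {token: {**census[token], **health[token]}
          for token in census.keys() & health.keys()}
```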

The idea of sharing data to maximise impact isn’t new to us at the ONS – we’ve been doing this successfully for over 15 years through our well-respected Secure Research Service (SRS). The new Integrated Data Service (IDS) is the next step in this data-sharing journey, where, in a far more advanced form, government will have the ability to work with data at source – in a safe and secure environment – rather than moving data around, which currently creates friction and significant cost. The service, being compliant with the Digital Economy Act, opens up opportunities to capitalise on the often-underutilised research elements of that key legislation.

The launch of the full IDS in the spring of 2023 will see ready-to-use datasets made available to cross-government teams and wider research communities, enabling them to securely share, link and access them for vital research. The service is a collaboration among institutions to work on projects that shed light on some of the big challenges of the day, and to provide the ability to answer questions that we don’t yet know we need to answer…(More)”.

GDP is getting a makeover — what it means for economies, health and the planet


Article by Ehsan Masood: “The numbers are heading in the wrong direction. If the world continues on its current track, it will fall well short of achieving almost all of the 17 Sustainable Development Goals (SDGs) that the United Nations set to protect the environment and end poverty and inequality by 2030.

The projected grade for:

Eliminating hunger: F.

Ensuring healthy lives for all: F.

Protecting and sustainably using ocean resources: F.

The trends were there before 2020, but then problems increased with the COVID-19 pandemic, war in Ukraine and the worsening effects of climate change. The world is in “a new uncertainty complex”, says economist Pedro Conceição, lead author of the United Nations Human Development Report.

One measure of this is the drastic change in the Human Development Index (HDI), which combines educational outcomes, income and life expectancy into a single composite indicator. Since 2019, the index has fallen for two successive years, the first such decline since its creation in 1990. “I don’t think this is a one-off, or a blip. I think this could be a new reality,” Conceição says.
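
For reference, the HDI is the geometric mean of three normalised indices. The sketch below follows the method set out in recent UNDP technical notes; treat the goalposts as assumptions of this example, and the inputs as placeholders rather than any country's published statistics.

```python
import math

def hdi(life_expectancy, expected_schooling, mean_schooling, gni_per_capita):
    """Human Development Index, post-2010 UNDP method: the geometric mean of
    normalised health, education and income indices (goalposts as published
    in recent Human Development Reports)."""
    def clamp(x):
        return max(0.0, min(1.0, x))

    health = clamp((life_expectancy - 20) / (85 - 20))
    education = clamp((min(expected_schooling, 18) / 18 +
                       min(mean_schooling, 15) / 15) / 2)
    income = clamp((math.log(gni_per_capita) - math.log(100)) /
                   (math.log(75_000) - math.log(100)))
    return (health * education * income) ** (1 / 3)

print(round(hdi(72.0, 12.5, 8.5, 16_000), 3))  # placeholder inputs, not real data
```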

UN secretary-general António Guterres is worried. “We need an urgent rescue effort for the SDGs,” he wrote in the foreword to the latest progress report, published in July. Over the past year, Guterres and the heads of big UN agencies, such as the Statistics Division and the UN Development Programme, have been assessing what’s gone wrong and what needs to be done. They’re converging on the idea that it’s time to stop using gross domestic product (GDP) as the world’s main measure of prosperity, and to complement it with a dashboard of indicators, possibly ones linked to the SDGs. If this happens, it would be the biggest shift in how economies are measured since nations first started using GDP in 1953, almost 70 years ago.

Guterres’s is the latest in a crescendo of voices calling for GDP to be dropped as the world’s primary go-to indicator, and for a dashboard of metrics instead. In 2008, then French president Nicolas Sarkozy endorsed such a call from a team of economists, including Nobel laureates Amartya Sen and Joseph Stiglitz.

And in August, the White House announced a 15-year plan to develop a new summary statistic that would show how changes to natural assets — the natural wealth on which economies depend — affect GDP. The idea, according to the project’s main architect, economist Eli Fenichel at the White House Office of Science and Technology Policy, is to help society to determine whether today’s consumption is being accomplished without compromising the future opportunities that nature provides. “GDP only gives a partial and — for many common uses — an incomplete picture of economic progress,” Fenichel says.

The fact that Guterres has made this a priority, amid so many major crises, is a sign that “going beyond GDP has been picked up at the highest level”, says Stefan Schweinfest, the director of the UN Statistics Division, based in New York City…(More)”.

The Case for Abolishing Elections


Essay by Nicholas Coccoma: “Terry Bouricius remembers the moment he converted to democracy by lottery. A bookish Vermonter, now 68, he was elected to the State House in 1990 after working for years as a public official in Burlington. At first state government excited him, but he quickly grew disillusioned. “During my time as a legislator,” he told me in an interview last year, “it became obvious to me that the ‘people’s house’ was not very representative of the people who actually lived in Vermont.”

The revelation came while Bouricius was working on a housing committee. “The committee members were an outgoing and garrulous bunch,” he observed. “Shy wallflowers almost never become legislators.” More disturbing, he noted how his fellow politicians—all of whom owned their homes—tended to legislate in favor of landlords and against tenants. “I saw that the experiences and beliefs of legislators shape legislation far more than facts,” he said. “After that, I frequently commented that any 150 Vermonters pulled from the phone book would be more representative than the elected House membership.”

There is widespread disgust with electoral politics and a hunger for greater responsiveness—a hunger, in other words, for democracy.

Many Americans agree. In a poll conducted in January 2020, 65 percent of respondents said that everyday people selected by lottery—who meet some basic requirements and are willing and able to serve—would perform better or much better compared to elected politicians. In March last year a Pew survey found that a staggering 79 percent believe it’s very or somewhat important for the government to create assemblies where everyday citizens from all walks of life can debate issues and make recommendations about national laws. “My decade of experience serving in the state legislature convinces me that this popular assessment is correct,” Bouricius said.

The idea—technically known as “sortition”—has been spreading. Perhaps its most prominent academic advocate is Yale political theorist Hélène Landemore. Her 2020 book Open Democracy: Reinventing Popular Rule for the Twenty-First Century explores the limitations of both direct democracy and electoral-representative democracy, advocating instead for government by large, randomly selected “mini-publics.” As she put it in conversation with Ezra Klein at the New York Times last year, “I think we are realizing the limits of just being able to choose rulers, as opposed to actually being able to choose outcomes.” She is not alone. Rutgers philosopher Alex Guerrero and Belgian public intellectual David Van Reybrouck have made similar arguments in favor of democracy by lottery. In the 2016 translation of his book Against Elections, Van Reybrouck characterizes elections as “the fossil fuel of politics.” “Whereas once they gave democracy a huge boost,” he writes, “much like the boost that oil gave the economy, it now turns out they cause colossal problems of their own.”…(More)”.
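
In practice, sortition proposals usually call for a stratified lottery, so that the drawn assembly mirrors the population along a few demographic lines rather than being a purely uniform draw. The sketch below illustrates only that mechanism; the strata, shares and pool are invented for the example and do not describe how any actual assembly was selected.

```python
import random

def draw_assembly(pool, seats, stratum_key, population_shares, seed=None):
    """Stratified lottery: give each stratum seats in proportion to its share of
    the population, then sample members at random within each stratum."""
    rng = random.Random(seed)
    assembly = []
    for stratum, share in population_shares.items():
        candidates = [p for p in pool if p[stratum_key] == stratum]
        k = min(round(seats * share), len(candidates))
        assembly.extend(rng.sample(candidates, k))
    return assembly

# Invented pool of volunteers tagged by region, for illustration only.
pool = [{"name": f"resident_{i}", "region": region}
        for i, region in enumerate(random.choices(["north", "south"], k=500))]
shares = {"north": 0.6, "south": 0.4}
assembly = draw_assembly(pool, seats=150, stratum_key="region",
                         population_shares=shares, seed=1)
```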

Algorithms Quietly Run the City of DC—and Maybe Your Hometown


Article by Khari Johnson: “Washington, DC, is the home base of the most powerful government on earth. It’s also home to 690,000 people—and 29 obscure algorithms that shape their lives. City agencies use automation to screen housing applicants, predict criminal recidivism, identify food assistance fraud, determine if a high schooler is likely to drop out, inform sentencing decisions for young people, and many other things.

That snapshot of semiautomated urban life comes from a new report from the Electronic Privacy Information Center (EPIC). The nonprofit spent 14 months investigating the city’s use of algorithms and found they were used across 20 agencies, with more than a third deployed in policing or criminal justice. For many systems, city agencies would not provide full details of how their technology worked or was used. The project team concluded that the city is likely using still more algorithms that they were not able to uncover.

The findings are notable beyond DC because they add to the evidence that many cities have quietly put bureaucratic algorithms to work across their departments, where they can contribute to decisions that affect citizens’ lives.

Government agencies often turn to automation in hopes of adding efficiency or objectivity to bureaucratic processes, but it’s often difficult for citizens to know they are at work, and some systems have been found to discriminate and lead to decisions that ruin human lives. In Michigan, an unemployment-fraud detection algorithm with a 93 percent error rate caused 40,000 false fraud allegations. A 2020 analysis by Stanford University and New York University found that nearly half of federal agencies are using some form of automated decisionmaking systems…(More)”.