Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Rival Human Crowd Accuracy


Paper by Philipp Schoenegger, Indre Tuminauskaite, Peter S. Park, and Philip E. Tetlock: “Human forecasting accuracy in practice relies on the ‘wisdom of the crowd’ effect, in which predictions about future events are significantly improved by aggregating across a crowd of individual forecasters. Past work on the forecasting ability of large language models (LLMs) suggests that frontier LLMs, as individual forecasters, underperform compared to the gold standard of a human crowd forecasting tournament aggregate. In Study 1, we expand this research by using an LLM ensemble approach consisting of a crowd of twelve LLMs. We compare the aggregated LLM predictions on 31 binary questions to that of a crowd of 925 human forecasters from a three-month forecasting tournament. Our preregistered main analysis shows that the LLM crowd outperforms a simple no-information benchmark and is not statistically different from the human crowd. In exploratory analyses, we find that these two approaches are equivalent with respect to medium-effect-size equivalence bounds. We also observe an acquiescence effect, with mean model predictions being significantly above 50%, despite an almost even split of positive and negative resolutions. Moreover, in Study 2, we test whether LLM predictions (of GPT-4 and Claude 2) can be improved by drawing on human cognitive output. We find that both models’ forecasting accuracy benefits from exposure to the median human prediction as information, improving accuracy by between 17% and 28%: though this leads to less accurate predictions than simply averaging human and machine forecasts. Our results suggest that LLMs can achieve forecasting accuracy rivaling that of human crowd forecasting tournaments: via the simple, practically applicable method of forecast aggregation. This replicates the ‘wisdom of the crowd’ effect for LLMs, and opens up their use for a variety of applications throughout society…(More)”.

Eat, Click, Judge: The Rise of Cyber Jurors on China’s Food Apps


Article from Ye Zhanhang: “From unwanted ingredients in takeaway meals and negative restaurant reviews to late deliveries and poor product quality, digital marketplaces teem with minor frustrations. 

But because they affect customer satisfaction and business reputations, several Chinese online shopping platforms have come up with a unique solution: Ordinary users can become “cyber jurors” to deliberate and cast decisive votes in resolving disputes between buyers and sellers.

Though introduced in 2020, the concept has surged in popularity among young Chinese in recent months, primarily fueled by viral cases that users eagerly follow, scrutinizing every detail and deliberation online…

To be eligible for the role, a user must meet certain criteria, including having a verified account, maintaining consumption records within the past three months, and successfully navigating five mock cases as part of an entry test. Cyber jurors don’t receive any money for completing cases but may be rewarded with coupons.

Xianyu, an online secondhand shopping platform, has also introduced a “court” system that assembles a jury of 17 volunteer users to adjudicate disputes between buyers and sellers. 

Miao Mingyu, a law professor at the University of Chinese Academy of Social Sciences, told China Youth Daily that this public jury function, with its impartial third-party perspective, has the potential to enhance transaction transparency and the fairness of the platform’s evaluation system.

Despite Chinese law prohibiting platforms from removing user reviews of products, Miao noted that this feature has enabled the platform to effectively address unfair negative reviews without violating legal constraints…(More)”.

How to improve economic forecasting


Article by Nicholas Gruen: “Today’s four-day weather forecasts are as accurate as one-day forecasts were 30 years ago. Economic forecasts, on the other hand, aren’t noticeably better. Former Federal Reserve chair Ben Bernanke should ponder this in his forthcoming review of the Bank of England’s forecasting.

There’s growing evidence that we can improve. But myopia and complacency get in the way. Myopia is an issue because economists think technical expertise is the essence of good forecasting when, actually, two things matter more: forecasters’ understanding of the limits of their expertise and their judgment in handling those limits.

Enter Philip Tetlock, whose 2005 book on geopolitical forecasting showed how little experts added to forecasting done by informed non-experts. To compare forecasts between the two groups, he forced participants to drop their vague weasel words — “probably”, “can’t be ruled out” — and specify exactly what they were forecasting and with what probability. 

That started sorting the sheep from the goats. The simple “point forecasts” provided by economists — such as “growth will be 3.0 per cent” — are doubly unhelpful in this regard. They’re silent about what success looks like. If I have forecast 3.0 per cent growth and actual growth comes in at 3.2 per cent — did I succeed or fail? Such predictions also don’t tell us how confident the forecaster is.

By contrast, “a 70 per cent chance of rain” specifies a clear event with a precise estimation of the weather forecaster’s confidence. Having rigorously specified the rules of the game, Tetlock has since shown how what he calls “superforecasting” is possible and how diverse teams of superforecasters do even better. 

What qualities does Tetlock see in superforecasters? As well as mastering necessary formal techniques, they’re open-minded, careful, curious and self-critical — in other words, they’re not complacent. Aware, like Socrates, of how little they know, they’re constantly seeking to learn — from unfolding events and from colleagues…(More)”.

The Crowdless Future? How Generative AI Is Shaping the Future of Human Crowdsourcing


Paper by Leonard Boussioux, Jacqueline Lane, Miaomiao Zhang, Vladimir Jacimovic, and Karim Lakhani: “This study investigates the capability of generative artificial intelligence (AI) in creating innovative business solutions compared to human crowdsourcing methods. We initiated a crowdsourcing challenge focused on sustainable, circular economy business opportunities. The challenge attracted a diverse range of solvers from a myriad of countries and industries. Simultaneously, we employed GPT-4 to generate AI solutions using three different prompt levels, each calibrated to simulate distinct human crowd and expert personas. 145 evaluators assessed a randomized selection of 10 out of 234 human and AI solutions, a total of 1,885 evaluator-solution pairs. Results showed comparable quality between human and AI-generated solutions. However, human ideas were perceived as more novel, whereas AI solutions delivered better environmental and financial value. We use natural language processing techniques on the rich solution text to show that although human solvers and GPT-4 cover a similar range of industries of application, human solutions exhibit greater semantic diversity. The connection between semantic diversity and novelty is stronger in human solutions, suggesting differences in how novelty is created by humans and AI or detected by human evaluators. This study illuminates the potential and limitations of both human and AI crowdsourcing to solve complex organizational problems and sets the groundwork for a possible integrative human-AI approach to problem-solving…(More)”.

Wikipedia’s Moment of Truth


Article by Jon Gertner at the New York Times: “In early 2021, a Wikipedia editor peered into the future and saw what looked like a funnel cloud on the horizon: the rise of GPT-3, a precursor to the new chatbots from OpenAI. When this editor — a prolific Wikipedian who goes by the handle Barkeep49 on the site — gave the new technology a try, he could see that it was untrustworthy. The bot would readily mix fictional elements (a false name, a false academic citation) into otherwise factual and coherent answers. But he had no doubts about its potential. “I think A.I.’s day of writing a high-quality encyclopedia is coming sooner rather than later,” he wrote in “Death of Wikipedia,” an essay that he posted under his handle on Wikipedia itself. He speculated that a computerized model could, in time, displace his beloved website and its human editors, just as Wikipedia had supplanted the Encyclopaedia Britannica, which in 2012 announced it was discontinuing its print publication.

Recently, when I asked this editor — he asked me to withhold his name because Wikipedia editors can be the targets of abuse — if he still worried about his encyclopedia’s fate, he told me that the newer versions made him more convinced that ChatGPT was a threat. “It wouldn’t surprise me if things are fine for the next three years,” he said of Wikipedia, “and then, all of a sudden, in Year 4 or 5, things drop off a cliff.”..(More)”.

The wisdom of crowds for improved disaster resilience: a near-real-time analysis of crowdsourced social media data on the 2021 flood in Germany


Paper by Mahsa Moghadas, Alexander Fekete, Abbas Rajabifard & Theo Kötter: “Transformative disaster resilience in times of climate change underscores the importance of reflexive governance, facilitation of socio-technical advancement, co-creation of knowledge, and innovative and bottom-up approaches. However, implementing these capacity-building processes by relying on census-based datasets and nomothetic (or top-down) approaches remains challenging for many jurisdictions. Web 2.0 knowledge sharing via online social networks, whereas, provides a unique opportunity and valuable data sources to complement existing approaches, understand dynamics within large communities of individuals, and incorporate collective intelligence into disaster resilience studies. Using Twitter data (passive crowdsourcing) and an online survey, this study draws on the wisdom of crowds and public judgment in near-real-time disaster phases when the flood disaster hit Germany in July 2021. Latent Dirichlet Allocation, an unsupervised machine learning technique for Topic Modeling, was applied to the corpora of two data sources to identify topics associated with different disaster phases. In addition to semantic (textual) analysis, spatiotemporal patterns of online disaster communication were analyzed to determine the contribution patterns associated with the affected areas. Finally, the extracted topics discussed online were compiled into five themes related to disaster resilience capacities (preventive, anticipative, absorptive, adaptive, and transformative). The near-real-time collective sensing approach reflected optimized diversity and a spectrum of people’s experiences and knowledge regarding flooding disasters and highlighted communities’ sociocultural characteristics. This bottom-up approach could be an innovative alternative to traditional participatory techniques of organizing meetings and workshops for situational analysis and timely unfolding of such events at a fraction of the cost to inform disaster resilience initiatives…(More)”.

How one group of ‘fellas’ is winning the meme war in support of Ukraine


Article by Suzanne Smalley: “The North Atlantic Fella Organization, or NAFO, has arrived.

Ukraine’s Defense Ministry celebrated the group on Twitter for waging a “fierce fight” against Kremlin trolls. And Rep. Adam Kinzinger, D-Ill., tweeted that he was “self-declaring as a proud member of #NAFO” and “the #fellas shall prevail.”

The brainchild of former Marine Matt Moores, NAFO launched in May and quickly blew up on Twitter. It’s become something of a movement, drawing support in military and cybersecurity circles who circulate its meme backing Ukraine in its war against Russia.

“The power of what we’re doing is that instead of trying to come in and point-by-point refute, and argue about what’s true and what isn’t, it’s coming and saying, ‘Hey, that’s dumb,’” Moores said during a panel on Wednesday at the Center for International and Strategic Studies in Washington. “And the moment somebody’s replying to a cartoon dog online, you’ve lost if you work for the government of Russia.”

Memes have figured heavily in the information war following the Russian invasion. The Ukrainian government has proven eager to highlight memes on agency websites and officials have been known to personally thank online communities that spread anti-Russian memes. The NAFO meme shared by the defense ministry in August showed a Shiba Inu dog in a military uniform appearing to celebrate a missile launch.

The Shiba Inu has long been a motif in internet culture. According to Vice’s Motherboard, the use of Shiba Inu to represent a “fella” waging online war against the Russians dates to at least May when an artist started rewarding fellas who donated money to the Georgian Legion by creating customized fella art for online use…(More)”.

Policy Choice and the Wisdom of Crowds


Paper by Nicholas Otis: “Using data from seven large-scale randomized experiments, I test whether crowds of academic experts can forecast the relative effectiveness of policy interventions. Eight-hundred and sixty-three academic experts provided 9,295 forecasts of the causal effects from these experiments, which span a diverse set of interventions (e.g., information provision, psychotherapy, soft-skills training), outcomes (e.g., consumption, COVID-19 vaccination, employment), and locations (Jordan, Kenya, Sweden, the United States). For each policy comparisons (a pair of policies and an outcome), I calculate the percent of crowd forecasts that correctly rank policies by their experimentally estimated treatment effects. While only 65% of individual experts identify which of two competing policies will have a larger causal effect, the average forecast from bootstrapped crowds of 30 experts identifies the better policy 86% of the time, or 92% when restricting analysis to pairs of policies who effects differ at the p < 0.10 level. Only 10 experts are needed to produce an 18-percentage point (27%) improvement in policy choice…(More)”.

Crowdsourced Politics


Book by Ariadne Vromen, Darren Halpin, Michael Vaughan: “This book focuses on online petitioning and crowdfunding platforms to demonstrate the everyday impact that digital communications have had on contemporary citizen participation. It argues that crowdsourced participation has become normalised and institutionalised into the repertoires of citizens and their organisations. 

To illustrate their arguments the authors use an original survey on acts of political engagement, undertaken with Australian citizens. Through detailed interviews and online analysis they show how advocacy organisations now use online petitions for strategic interventions and mobilisation. They also analyse the policy issues that mobilise citizens on crowdsourcing platforms, including a unique dataset of 17,000 petitions from the popular non-government platform, Change.org. Contrasting mass public concerns with the policy agenda of the government of the day shows there is a disjuncture and lack of responsiveness to crowdsourced citizen expression. Ultimately the book explores the long-term implications of citizen-led change for democracy. ..(More)”.

Mapping community resources for disaster preparedness: humanitarian data capability and automated futures


Report by Anthony McCosker et al: “This report details the rationale, background research and design for a platform to help local communities map resources for disaster preparedness. It sets out a first step in improving community data capability through resource mapping to enhance humanitarian action before disaster events occur.The project seeks to enable local community disaster preparedness and thus build community resilience by improving the quality of data about community strengths, resources and assets.

In this report, the authors define a gap in existing humanitarian mapping approaches and the uses of open, public and social media data in humanitarian contexts. The report surveys current knowledge and present a selection of case studies delivering data and humanitarian mapping in local communities.

Drawing on this knowledge and practice review and stakeholder workshops throughout 2021, the authors also define a method and toolkit for the effective use of community assets data…(More)”