DAOs of Collective Intelligence? Unraveling the Complexity of Blockchain Governance in Decentralized Autonomous Organizations


Paper by Mark C. Ballandies, Dino Carpentras, and Evangelos Pournaras: “Decentralized autonomous organizations (DAOs) have transformed organizational structures by shifting from traditional hierarchical control to decentralized approaches, leveraging blockchain and cryptoeconomics. Despite managing significant funds and building global networks, DAOs face challenges like declining participation, increasing centralization, and an inability to adapt to changing environments, which stifle innovation. This paper explores DAOs as complex systems and applies complexity science to explain their inefficiencies. In particular, we discuss DAO challenges, their complex nature, and introduce the self-organization mechanisms of collective intelligence, digital democracy, and adaptation. By applying these mechanisms to improve DAO design and construction, a practical design framework for DAOs is created. This contribution lays a foundation for future research at the intersection of complexity science and DAOs…(More)”.

Using internet search data as part of medical research


Blog by Susan Thomas and Matthew Thompson: “…In the UK, almost 50 million health-related searches are made using Google per year. Globally there are hundreds of millions of health-related searches every day. And, of course, people are doing these searches in real-time, looking for answers to their concerns in the moment. It’s also possible that, even if people aren’t noticing and searching about changes to their health, their behaviour is changing. Maybe they are searching more at night because they are having difficulty sleeping, or maybe they are spending more (or less) time online. Maybe an individual’s search history could actually be really useful for researchers. This realisation has led medical researchers to start to explore whether individuals’ online search activity could help provide those subtle, almost unnoticeable signals that point to the beginning of a serious illness.

Our recent review found that 23 studies have been published so far that have done exactly this. These studies suggest that online search activity among people later diagnosed with a variety of conditions, ranging from pancreatic cancer and stroke to mood disorders, was different from that of people who did not have one of these conditions.

One of these studies was published by researchers at Imperial College London, who used online search activity to identify signals of women with gynaecological malignancies. They found that women with malignant (e.g. ovarian cancer) and benign conditions had different search patterns up to two months prior to a GP referral.

Pause for a moment, and think about what this could mean. Ovarian cancer is one of the most devastating cancers women get. It’s desperately hard to detect early – and yet there are signals of this cancer visible in women’s internet searches months before diagnosis?…(More)”.

Even laypeople use legalese


Paper by Eric Martínez, Francis Mollica and Edward Gibson: “Whereas principles of communicative efficiency and legal doctrine dictate that laws be comprehensible to the common world, empirical evidence suggests legal documents are largely incomprehensible to lawyers and laypeople alike. Here, a corpus analysis (n = 59 million words) first replicated and extended prior work revealing laws to contain strikingly higher rates of complex syntactic structures relative to six baseline genres of English. Next, two preregistered text generation experiments (n = 286) tested two leading hypotheses regarding how these complex structures enter into legal documents in the first place. In line with the magic spell hypothesis, we found people tasked with writing official laws wrote in a more convoluted manner than when tasked with writing unofficial legal texts of equivalent conceptual complexity. Contrary to the copy-and-edit hypothesis, we did not find evidence that people editing a legal document wrote in a more convoluted manner than when writing the same document from scratch. From a cognitive perspective, these results suggest law to be a rare exception to the general tendency in human language toward communicative efficiency. In particular, these findings indicate law’s complexity to be derived from its performativity, whereby low-frequency structures may be inserted to signal law’s authoritative, world-state-altering nature, at the cost of increased processing demands on readers. From a law and policy perspective, these results suggest that the tension between the ubiquity and impenetrability of the law is not an inherent one, and that laws can be simplified without a loss or distortion of communicative content…(More)”.

Regulating the Direction of Innovation


Paper by Joshua S. Gans: “This paper examines the regulation of technological innovation direction under uncertainty about potential harms. We develop a model with two competing technological paths and analyze various regulatory interventions. Our findings show that market forces tend to inefficiently concentrate research on leading paths. We demonstrate that ex post regulatory instruments, particularly liability regimes, outperform ex ante restrictions in most scenarios. The optimal regulatory approach depends critically on the magnitude of potential harm relative to technological benefits. Our analysis reveals subtle insurance motives in resource allocation across research paths, challenging common intuitions about diversification. These insights have important implications for regulating emerging technologies like artificial intelligence, suggesting the need for flexible, adaptive regulatory frameworks…(More)”.

Rejecting Public Utility Data Monopolies


Paper by Amy L. Stein: “The threat of monopoly power looms large today. Although not the telecommunications and tobacco monopolies of old, the Goliaths of Big Tech have become today’s target for potential antitrust violations. It is not only their control over the social media infrastructure and digital advertising technologies that gives people pause, but their monopolistic collection, use, and sale of customer data. But large technology companies are not the only private companies that have exclusive access to your data; that can crowd out competitors; and that can hold, use, or sell your data with little to no regulation. These other private companies are not data companies, platforms, or even brokers. They are public utilities.

Although termed “public utilities,” these entities are overwhelmingly private, shareholder-owned entities. Like private Big Tech, utilities gather incredible amounts of data from customers and use this data in various ways. And like private Big Tech, these utilities can exercise exclusionary and self-dealing anticompetitive behavior with respect to customer data. But there is one critical difference—unlike Big Tech, utilities enjoy an implied immunity from antitrust laws. This state action immunity has historically applied to utility provision of essential services like electricity and heat. As utilities find themselves in the position of unsuspecting data stewards, however, there is a real and unexplored question about whether their long-enjoyed antitrust immunity should extend to their data practices.

As the first exploration of this question, this Article tests the continuing application and rationale of the state action immunity doctrine to the evolving services that a utility provides as the grid becomes digitized. It demonstrates the importance of staunching the creep of state action immunity over utility data practices. And it recognizes the challenges of developing remedies for such data practices that do not disrupt the state-sanctioned monopoly powers of utilities over the provision of essential services. This Article analyzes both antitrust and regulatory remedies, including a new customer-focused “data duty,” as possible mechanisms to enhance consumer (ratepayer) welfare in this space. Exposing utility data practices to potential antitrust liability may be just the lever that is needed to motivate states, public utility commissions, and utilities to develop a more robust marketplace for energy data…(More)”.

Generative Discrimination: What Happens When Generative AI Exhibits Bias, and What Can Be Done About It


Paper by Philipp Hacker, Frederik Zuiderveen Borgesius, Brent Mittelstadt and Sandra Wachter: “Generative AI (genAI) technologies, while beneficial, risk increasing discrimination by producing demeaning content and subtle biases through inadequate representation of protected groups. This chapter examines these issues, categorizing problematic outputs into three legal categories: discriminatory content; harassment; and legally hard cases like harmful stereotypes. It argues for holding genAI providers and deployers liable for discriminatory outputs and highlights the inadequacy of traditional legal frameworks to address genAI-specific issues. The chapter suggests updating EU laws to mitigate biases in training and input data, mandating testing and auditing, and evolving legislation to enforce standards for bias mitigation and inclusivity as technology advances…(More)”.

The problem of ‘model collapse’: how a lack of human data limits AI progress


Article by Michael Peel: “The use of computer-generated data to train artificial intelligence models risks causing them to produce nonsensical results, according to new research that highlights looming challenges to the emerging technology. 

Leading AI companies, including OpenAI and Microsoft, have tested the use of “synthetic” data — information created by AI systems that is then used to train large language models (LLMs) — as they reach the limits of human-made material that can improve the cutting-edge technology.

Research published in Nature on Wednesday suggests the use of such data could lead to the rapid degradation of AI models. One trial using synthetic input text about medieval architecture descended into a discussion of jackrabbits after fewer than 10 generations of output. 

The work underlines why AI developers have hurried to buy troves of human-generated data for training — and raises questions of what will happen once those finite sources are exhausted. 

“Synthetic data is amazing if we manage to make it work,” said Ilia Shumailov, lead author of the research. “But what we are saying is that our current synthetic data is probably erroneous in some ways. The most surprising thing is how quickly this stuff happens.”

The paper explores the tendency of AI models to collapse over time because of the inevitable accumulation and amplification of mistakes from successive generations of training.

The speed of the deterioration is related to the severity of shortcomings in the design of the model, the learning process and the quality of data used. 

The early stages of collapse typically involve a “loss of variance”, which means majority subpopulations in the data become progressively over-represented at the expense of minority groups. In late-stage collapse, all parts of the data may descend into gibberish…(More)”.
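The “loss of variance” stage can be reduced to a back-of-the-envelope calculation. In a toy Gaussian setting (our sketch, not the authors’ code), each generation fits a model to n samples drawn from the previous generation’s model; because the maximum-likelihood variance estimate is biased low by a factor of (n − 1)/n, the expected model variance shrinks geometrically with each generation:

```python
def expected_variance(v0, n, generations):
    """Expected model variance after repeated fit-and-resample.

    Toy Gaussian setting: each generation draws n samples from the
    previous generation's model and refits by maximum likelihood.
    The MLE variance estimator is biased low by (n - 1) / n, so the
    expected variance decays geometrically -- a minimal caricature
    of the 'loss of variance' stage of model collapse.
    """
    return v0 * ((n - 1) / n) ** generations

# With 100 samples per generation, an initial variance of 1.0 is
# almost entirely gone after 500 generations of self-training.
for gens in (0, 50, 200, 500):
    print(gens, expected_variance(1.0, 100, gens))
```

The decay rate depends directly on n and on estimator bias, which mirrors the article’s point that the speed of deterioration tracks data quality and shortcomings in the learning process.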

Anonymization: The imperfect science of using data while preserving privacy


Paper by Andrea Gadotti et al: “Information about us, our actions, and our preferences is created at scale through surveys or scientific studies or as a result of our interaction with digital devices such as smartphones and fitness trackers. The ability to safely share and analyze such data is key for scientific and societal progress. Anonymization is considered by scientists and policy-makers as one of the main ways to share data while minimizing privacy risks. In this review, we offer a pragmatic perspective on the modern literature on privacy attacks and anonymization techniques. We discuss traditional de-identification techniques and their strong limitations in the age of big data. We then turn our attention to modern approaches to share anonymous aggregate data, such as data query systems, synthetic data, and differential privacy. We find that, although no perfect solution exists, applying modern techniques while auditing their guarantees against attacks is the best approach to safely use and share data today…(More)”.
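One of the modern approaches the review discusses, differential privacy, can be illustrated in a few lines. This is a generic sketch of the standard Laplace mechanism for a counting query, not code from the paper; the dataset and predicate are hypothetical:

```python
import random

def laplace_noise(scale):
    """Laplace(0, scale) noise, drawn as the difference of two
    independent exponentials with mean `scale`."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_count(records, predicate, epsilon):
    """Release a counting query under epsilon-differential privacy.

    A count has sensitivity 1 -- adding or removing one person changes
    it by at most 1 -- so adding Laplace noise with scale 1/epsilon
    makes the released aggregate epsilon-DP.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical example: a noisy count of smokers in a tiny dataset.
people = [{"age": 34, "smoker": True}, {"age": 51, "smoker": False},
          {"age": 29, "smoker": True}]
noisy = dp_count(people, lambda p: p["smoker"], epsilon=1.0)
```

Smaller epsilon means more noise and stronger privacy; the review’s caveat applies here too, since the guarantee holds only for the aggregate release, not for arbitrary downstream uses of the raw records.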

The Data That Powers A.I. Is Disappearing Fast


Article by Kevin Roose: “For years, the people building powerful artificial intelligence systems have used enormous troves of text, images and videos pulled from the internet to train their models.

Now, that data is drying up.

Over the past year, many of the most important web sources used for training A.I. models have restricted the use of their data, according to a study published this week by the Data Provenance Initiative, an M.I.T.-led research group.

The study, which looked at 14,000 web domains that are included in three commonly used A.I. training data sets, discovered an “emerging crisis in consent,” as publishers and online platforms have taken steps to prevent their data from being harvested.

The researchers estimate that in the three data sets — called C4, RefinedWeb and Dolma — 5 percent of all data, and 25 percent of data from the highest-quality sources, has been restricted. Those restrictions are set up through the Robots Exclusion Protocol, a decades-old method for website owners to prevent automated bots from crawling their pages using a file called robots.txt.
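The opt-outs the study measures are expressed in ordinary robots.txt files that any crawler can parse. A minimal sketch (the site and rules are hypothetical, using Python’s standard urllib.robotparser and OpenAI’s published GPTBot user agent as an example):

```python
import urllib.robotparser

# A hypothetical robots.txt of the kind the study describes: the site
# blocks an AI-training crawler entirely while leaving other bots alone.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("GPTBot", "https://example.com/article.html"))    # False
print(parser.can_fetch("SearchBot", "https://example.com/article.html")) # True
```

Compliance with these rules is voluntary, which is part of why the study frames the trend as a crisis of consent rather than a technical barrier.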

The study also found that as much as 45 percent of the data in one set, C4, had been restricted by websites’ terms of service.

“We’re seeing a rapid decline in consent to use data across the web that will have ramifications not just for A.I. companies, but for researchers, academics and noncommercial entities,” said Shayne Longpre, the study’s lead author, in an interview.

Data is the main ingredient in today’s generative A.I. systems, which are fed billions of examples of text, images and videos. Much of that data is scraped from public websites by researchers and compiled in large data sets, which can be downloaded and freely used, or supplemented with data from other sources…(More)”.

Governance of deliberative mini-publics: emerging consensus and divergent views


Paper by Lucy J. Parry, Nicole Curato, and John S. Dryzek: “Deliberative mini-publics are forums for citizen deliberation composed of randomly selected citizens convened to yield policy recommendations. These forums have proliferated in recent years but there are no generally accepted standards to govern their practice. Should there be? We answer this question by bringing the scholarly literature on citizen deliberation into dialogue with the lived experience of the people who study, design and implement mini-publics. We use Q methodology to locate five distinct perspectives on the integrity of mini-publics, and map the structure of agreement and dispute across them. We find that, across the five viewpoints, there is emerging consensus as well as divergence on integrity issues, with disagreement over what might be gained or lost by adopting common standards of practice, and possible sources of integrity risks. This article provides an empirical foundation for further discussion on integrity standards in the future…(More)”.