How Much of the World Is It Possible to Model?


Article by Dan Rockmore: “…Modelling, in general, is now routine. We model everything, from elections to economics, from the climate to the coronavirus. Like model cars, model airplanes, and model trains, mathematical models aren’t the real thing—they’re simplified representations that get the salient parts right. Like fashion models, model citizens, and model children, they’re also idealized versions of reality. But idealization and abstraction can be forms of strength. In an old mathematical-modelling joke, a group of experts is hired to improve milk production on a dairy farm. One of them, a physicist, suggests, “Consider a spherical cow.” Cows aren’t spheres any more than brains are jiggly sponges, but the point of modelling—in some ways, the joy of it—is to see how far you can get by using only general scientific principles, translated into mathematics, to describe messy reality.

To be successful, a model needs to replicate the known while generalizing into the unknown. This means that, as more becomes known, a model has to be improved to stay relevant. Sometimes new developments in math or computing enable progress. In other cases, modellers have to look at reality in a fresh way. For centuries, a predilection for perfect circles, mixed with a bit of religious dogma, produced models that described the motion of the sun, moon, and planets in an Earth-centered universe; these models worked, to some degree, but never perfectly. Eventually, more data, combined with more expansive thinking, ushered in a better model—a heliocentric solar system based on elliptical orbits. This model, in turn, helped kick-start the development of calculus, reveal the law of gravitational attraction, and fill out our map of the solar system. New knowledge pushes models forward, and better models help us learn.

Predictions about the universe are scientifically interesting. But it’s when models make predictions about worldly matters that people really pay attention. We anxiously await the outputs of models run by the Weather Channel, the Fed, and fivethirtyeight.com. Models of the stock market guide how our pension funds are invested; models of consumer demand drive production schedules; models of energy use determine when power is generated and where it flows. Insurers model our fates and charge us commensurately. Advertisers (and propagandists) rely on A.I. models that deliver targeted information (or disinformation) based on predictions of our reactions.

But it’s easy to get carried away…(More)”.

Missing Evidence: Tracking Academic Data Use around the World


World Bank Report: “Data-driven research on a country is key to producing evidence-based public policies. Yet little is known about where data-driven research is lacking and how it could be expanded. This paper proposes a method for tracking academic data use by country of subject, applying natural language processing to open-access research papers. The model’s predictions produce country estimates of the number of articles using data that are highly correlated with a human-coded approach, with a correlation of 0.99. Analyzing more than 1 million academic articles, the paper finds that the number of articles on a country is strongly correlated with its gross domestic product per capita, population, and the quality of its national statistical system. The paper identifies data sources that are strongly associated with data-driven research and finds that availability of subnational data appears to be particularly important. Finally, the paper classifies countries into groups based on whether they could most benefit from increasing their supply of or demand for data. The findings show that the former applies to many low- and lower-middle-income countries, while the latter applies to many upper-middle- and high-income countries…(More)”.
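
To make the method concrete, here is a rough, hypothetical sketch of the approach described above, not the paper's actual pipeline or data: a simple text classifier flags abstracts that use data on a country, and the resulting country counts are compared against a human-coded benchmark. All texts, labels, and counts below are invented.

```python
# Hypothetical illustration only -- not the paper's pipeline or data.
import numpy as np
from scipy.stats import pearsonr
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy labelled abstracts: 1 = data-driven study of a country, 0 = not.
abstracts = [
    "We analyse household survey microdata from the 2015 census round.",
    "A theoretical review of governance debates, with no empirical data.",
    "Using subnational administrative records, we estimate school enrolment.",
    "An essay on the history of constitutional reform.",
]
labels = np.array([1, 0, 1, 0])

# A bag-of-words classifier stands in for the paper's NLP model.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(abstracts)
classifier = LogisticRegression().fit(X, labels)
print("In-sample predictions:", classifier.predict(X))

# Compare model-derived article counts per country with a human-coded
# benchmark, as the paper does (its reported correlation is 0.99).
predicted_counts = np.array([120, 45, 310, 12, 80])
human_coded_counts = np.array([118, 50, 300, 15, 78])
r, _ = pearsonr(predicted_counts, human_coded_counts)
print(f"Correlation with human-coded counts: {r:.2f}")
```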

Ground Truths Are Human Constructions


Article by Florian Jaton: “Artificial intelligence algorithms are human-made, cultural constructs, something I saw first-hand as a scholar and technician embedded with AI teams for 30 months. Among the many concrete practices and materials these algorithms need in order to come into existence are sets of numerical values that enable machine learning. These referential repositories are often called “ground truths,” and when computer scientists construct or use these datasets to design new algorithms and attest to their efficiency, the process is called “ground-truthing.”

Understanding how ground-truthing works can reveal inherent limitations of algorithms—how they enable the spread of false information, pass biased judgments, or otherwise erode society’s agency—and this could also catalyze more thoughtful regulation. As long as ground-truthing remains clouded and abstract, society will struggle to prevent algorithms from causing harm and to optimize algorithms for the greater good.

Ground-truth datasets define AI algorithms’ fundamental goal of reliably predicting and generating a specific output—say, an image with requested specifications that resembles other input, such as web-crawled images. In other words, ground-truth datasets are deliberately constructed. As such, they, along with their resultant algorithms, are limited and arbitrary and bear the sociocultural fingerprints of the teams that made them…(More)”.
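
As a minimal, hypothetical illustration of what a ground-truth dataset is in practice, the sketch below shows a small set of human-assigned (input, label) pairs being used both to train a model and to attest to its accuracy; all values are invented, and any real ground truth inherits the choices of the team that constructed it.

```python
# Minimal, invented example of "ground-truthing": a human-constructed set
# of (input, label) pairs both trains the model and defines the benchmark
# later used to attest to its accuracy.
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Deliberately constructed reference data: feature vectors with labels a
# team of humans assigned (e.g., 1 = "relevant image", 0 = "not relevant").
inputs = [[0.1, 0.9], [0.8, 0.2], [0.2, 0.8], [0.9, 0.1], [0.3, 0.7], [0.7, 0.3]]
labels = [1, 0, 1, 0, 1, 0]

train_X, test_X, train_y, test_y = train_test_split(
    inputs, labels, test_size=0.33, random_state=0
)
model = DecisionTreeClassifier().fit(train_X, train_y)

# The reported "efficiency" of the algorithm is measured only against this
# constructed reference set, never against reality itself.
print("Accuracy against the ground truth:", accuracy_score(test_y, model.predict(test_X)))
```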

The Data Revolution and the Study of Social Inequality: Promise and Perils


Paper by Mario L. Small: “The social sciences are in the midst of a revolution in access to data, as governments and private companies have accumulated vast digital records of rapidly multiplying aspects of our lives and made those records available to researchers. The accessibility and comprehensiveness of the data are unprecedented. How will the data revolution affect the study of social inequality? I argue that the speed, breadth, and low cost with which large-scale data can be acquired promise a dramatic transformation in the questions we can answer, but this promise can be undercut by size-induced blindness, the tendency to ignore important limitations amidst a source with billions of data points. The likely consequences for what we know about the social world remain unclear…(More)”.

Where Did the Open Access Movement Go Wrong?


An Interview with Richard Poynder by Richard Anderson: “…Open access was intended to solve three problems that have long blighted scholarly communication – the problems of accessibility, affordability, and equity. 20+ years after the Budapest Open Access Initiative (BOAI) we can see that the movement has signally failed to solve the latter two problems. And with the geopolitical situation deteriorating, solving the accessibility problem now also looks to be at risk. The OA dream of “universal open access” remains a dream and seems likely to remain one.

What has been the essence of the OA movement’s failure?

The fundamental problem was that OA advocates did not take ownership of their own movement. They failed, for instance, to establish a central organization (an OA foundation, if you like) in order to organize and better manage the movement; and they failed to publish a single, canonical definition of open access. This is in contrast to the open source movement, and is an omission I drew attention to in 2006.

This failure to take ownership saw responsibility for OA pass to organizations whose interests are not necessarily in sync with the objectives of the movement.

It did not help that the BOAI definition failed to specify that to be classified as open access, scholarly works needed to be made freely available immediately on publication and that they should remain freely available in perpetuity. Nor did it give sufficient thought to how OA would be funded (and OA advocates still fail to do that).

This allowed publishers to co-opt OA for their own purposes, most notably by introducing embargoes and developing the pay-to-publish gold OA model, with its now infamous article processing charge (APC).

Pay-to-publish OA is now the dominant form of open access and looks set to increase the cost of scholarly publishing and so worsen the affordability problem. Amongst other things, this has disenfranchised unfunded researchers and those based in the global south (notwithstanding APC waiver promises).

What also did not help is that OA advocates passed responsibility for open access over to universities and funders. This was contradictory, because OA was conceived as something that researchers would opt into. The assumption was that once the benefits of open access were explained to them, researchers would voluntarily embrace it – primarily by self-archiving their research in institutional or preprint repositories. But while many researchers were willing to sign petitions in support of open access, few (outside disciplines like physics) proved willing to practice it voluntarily.

In response to this lack of engagement, OA advocates began to petition universities, funders, and governments to introduce OA policies recommending that researchers make their papers open access. When these policies also failed to have the desired effect, OA advocates demanded their colleagues be forced to make their work OA by means of mandates requiring them to do so.

Most universities and funders (certainly in the global north) responded positively to these calls, in the belief that open access would increase the pace of scientific development and allow them to present themselves as forward-thinking, future-embracing organizations. Essentially, they saw it as a way of improving productivity and ROI while enhancing their public image.

But in light of researchers’ continued reluctance to make their works open access, universities and funders began to introduce increasingly bureaucratic rules, sanctions, and reporting tools to ensure compliance, and to manage the more complex billing arrangements that OA has introduced.

So, what had been conceived as a bottom-up movement founded on principles of voluntarism morphed into a top-down system of command and control, and open access evolved into an oppressive bureaucratic process that has failed to address either the affordability or equity problems. And as the process, and the rules around that process, have become ever more complex and oppressive, researchers have tended to become alienated from open access.

As a side benefit for universities and funders, OA has allowed them to better micromanage their faculty and fundees, and to monitor their publishing activities in ways not previously possible. This has served to further proletarianize researchers, and today they are becoming the academic equivalent of workers on an assembly line. Philip Mirowski has predicted that open access will lead to the deskilling of academic labor. The arrival of generative AI might seem to make that outcome all the more likely…

Can these failures be remedied by means of an OA reset? With this aim in mind (and aware of the failures of the movement), OA advocates are now devoting much of their energy to trying to persuade universities, funders, and philanthropists to invest in a network of alternative nonprofit open infrastructures. They envisage these being publicly owned and focused on facilitating a flowering of new diamond OA journals, preprint servers, and Publish, Review, Curate (PRC) initiatives. In the process, they expect commercial publishers will be marginalized and eventually dislodged.

But it is highly unlikely that the large sums of money that would be needed to create these alternative infrastructures will be forthcoming, certainly not at sufficient levels or on anything other than a temporary basis.

While it is true that more papers and preprints are being published open access each year, I am not convinced this is taking us down the road to universal open access, or that there is a global commitment to open access.

Consequently, I do not believe that a meaningful reset is possible: open access has reached an impasse and there is no obvious way forward that could see the objectives of the OA movement fulfilled.

Partly for this reason, we are seeing attempts to rebrand, reinterpret, and/or reimagine open access and its objectives…(More)”.

Introduction to Digital Humanism


Open access textbook edited by Hannes Werthner et al: “…introduces and defines digital humanism from a diverse range of disciplines. Following the 2019 Vienna Manifesto, the book calls for a digital humanism that describes, analyzes, and, most importantly, influences the complex interplay of technology and humankind, for a better society and life, fully respecting universal human rights. The book is organized in three parts: Part I “Background” provides the multidisciplinary background needed to understand digital humanism in its philosophical, cultural, technological, historical, social, and economic dimensions. The goal is to present the necessary knowledge upon which an effective interdisciplinary discourse on digital humanism can be founded. Part II “Digital Humanism – a System’s View” focuses on an in-depth presentation and discussion of the main digital humanism concerns arising in current digital systems. The goal of this part is to make readers aware of and sensitive to these issues, including, e.g., the control and autonomy of AI systems, privacy and security, and the role of governance. Part III “Critical and Societal Issues of Digital Systems” delves into critical societal issues raised by advances of digital technologies. While the public debate in the past has often focused on them separately, especially when they became visible through sensational events, the aim here is to shed light on the entire landscape and show their interconnected relationships. This includes issues such as AI and ethics, fairness and bias, privacy and surveillance, platform power and democracy.

This textbook is intended for students, teachers, and policy makers interested in digital humanism. It is designed for stand-alone or complementary courses in computer science, as well as for curricula in science, engineering, the humanities, and the social sciences. Each chapter includes questions for students and an annotated reading list to dive deeper into the associated chapter material. The book aims to provide readers with as wide an exposure as possible to digital advances and their consequences for humanity. It includes constructive ideas and approaches that seek to ensure that our collective digital future is determined through human agency…(More)”.

Computational social science is growing up: why puberty consists of embracing measurement validation, theory development, and open science practices


Paper by Timon Elmer: “Puberty is a phase in which individuals often test the boundaries of themselves and surrounding others and further define their identity – and thus their uniqueness compared to other individuals. Similarly, as Computational Social Science (CSS) grows up, it must strike a balance between its own practices and those of neighboring disciplines to achieve scientific rigor and refine its identity. However, there are certain areas within CSS that are reluctant to adopt rigorous scientific practices from other fields, which can be observed through an overreliance on passively collected data (e.g., through digital traces, wearables) without questioning the validity of such data. This paper argues that CSS should embrace the potential of combining both passive and active measurement practices to capitalize on the strengths of each approach, including objectivity and psychological quality. Additionally, the paper suggests that CSS would benefit from integrating practices and knowledge from other established disciplines, such as measurement validation, theoretical embedding, and open science practices. Based on this argument, the paper provides ten recommendations for CSS to mature as an interdisciplinary field of research…(More)”.
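
One of the practices the paper recommends, measurement validation, can be sketched in a few lines: check whether a passively collected digital-trace measure converges with an actively collected self-report of the same construct. The example below is hypothetical and the numbers are invented.

```python
# Hypothetical convergent-validity check between a passive measure
# (e.g., smartphone-logged social interactions per day) and an active one
# (e.g., a daily self-report survey). All numbers are invented.
import numpy as np

passive_trace = np.array([12, 5, 8, 20, 3, 15, 9, 11])   # logged interactions
self_report = np.array([10, 6, 7, 18, 4, 14, 10, 12])    # survey answers

r = np.corrcoef(passive_trace, self_report)[0, 1]
print(f"Convergent validity (Pearson r) = {r:.2f}")
# A weak correlation would signal that the digital trace may not measure
# the construct researchers assume it does.
```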

Informing Decisionmakers in Real Time


Article by Robert M. Groves: “In response, the National Science Foundation (NSF) proposed the creation of a complementary group to provide decisionmakers at all levels with the best available evidence from the social sciences to inform pandemic policymaking. In May 2020, with funding from NSF and additional support from the Alfred P. Sloan Foundation and the David and Lucile Packard Foundation, the National Academies of Sciences, Engineering, and Medicine (NASEM) established the Societal Experts Action Network (SEAN) to connect “decisionmakers grappling with difficult issues to the evidence, trends, and expert guidance that can help them lead their communities and speed their recovery.” We chose to build a network because of the widespread recognition that no one small group of social scientists would have the expertise or the bandwidth to answer all the questions facing decisionmakers. What was needed was a structure that enabled an ongoing feedback loop between researchers and decisionmakers. This structure would foster the integration of evidence, research, and advice in real time, which broke with NASEM’s traditional form of aggregating expert guidance over lengthier periods.

In its first phase, SEAN’s executive committee set about building a network that could both gather and disseminate knowledge. To start, we brought in organizations of decisionmakers—including the National Association of Counties, the National League of Cities, the International City/County Management Association, and the National Conference of State Legislatures—to solicit their questions. Then we added capacity to the network by inviting social and behavioral science organizations—like the National Bureau of Economic Research, the Natural Hazards Center at the University of Colorado Boulder, the Kaiser Family Foundation, the National Opinion Research Center at the University of Chicago, The Policy Lab at Brown University, and Testing for America—to join and respond to questions and disseminate guidance. In this way, SEAN connected teams of experts with evidence and answers to leaders and communities looking for advice…(More)”.

WikiCrow: Automating Synthesis of Human Scientific Knowledge


About: “As scientists, we stand on the shoulders of giants. Scientific progress requires curation and synthesis of prior knowledge and experimental results. However, the scientific literature is so expansive that synthesis, the comprehensive combination of ideas and results, is a bottleneck. The ability of large language models to comprehend and summarize natural language will transform science by automating the synthesis of scientific knowledge at scale. Yet current LLMs are limited by hallucinations, lack access to the most up-to-date information, and do not provide reliable references for statements.

Here, we present WikiCrow, an automated system that can synthesize cited Wikipedia-style summaries for technical topics from the scientific literature. WikiCrow is built on top of Future House’s internal LLM agent platform, PaperQA, which, in our testing, achieves state-of-the-art (SOTA) performance on a retrieval-focused version of PubMedQA and other benchmarks, including a new retrieval-first benchmark, LitQA, developed internally to evaluate systems retrieving full-text PDFs across the entire scientific literature.

As a demonstration of the potential for AI to impact scientific practice, we use WikiCrow to generate draft articles for the 15,616 human protein-coding genes that currently lack Wikipedia articles, or that have article stubs. WikiCrow creates articles in 8 minutes, is much more consistent than human editors at citing its sources, and makes incorrect inferences or statements about 9% of the time, a number that we expect to improve as we mature our systems. WikiCrow will be a foundational tool for the AI Scientists we plan to build in the coming years, and will help us to democratize access to scientific research…(More)”.
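
To give a sense of the retrieve-then-summarise pattern described above, here is a deliberately toy, self-contained sketch; the corpus, lexical scoring, and drafting step are stand-ins for an embedding index and an LLM, and none of it is WikiCrow's or PaperQA's actual code.

```python
# Toy sketch of retrieval-augmented, citation-grounded drafting.
# Real systems use embedding search and an LLM; this is a stand-in.
from collections import Counter

corpus = [
    {"citation": "Doe et al. 2021", "text": "GENE1 encodes a kinase involved in cell cycle regulation."},
    {"citation": "Lee et al. 2019", "text": "Loss of GENE1 function is associated with impaired DNA repair."},
    {"citation": "Park et al. 2020", "text": "An unrelated study of soil bacterial metabolism."},
]

def score(query: str, text: str) -> int:
    """Crude lexical overlap standing in for embedding-based retrieval."""
    q, t = Counter(query.lower().split()), Counter(text.lower().split())
    return sum((q & t).values())

def retrieve(query: str, k: int = 2) -> list[dict]:
    """Return the k passages most relevant to the query."""
    return sorted(corpus, key=lambda doc: score(query, doc["text"]), reverse=True)[:k]

def draft_summary(topic: str) -> str:
    """Assemble a cited draft; a real system hands this context to an LLM."""
    lines = [f"{p['text']} [{p['citation']}]" for p in retrieve(topic)]
    return f"Draft article on {topic}:\n" + "\n".join(lines)

print(draft_summary("GENE1 cell cycle kinase"))
```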

How to make data open? Stop overlooking librarians


Article by Jessica Farrell: “The ‘Year of Open Science’, as declared by the US Office of Science and Technology Policy (OSTP), is now wrapping up. This followed an August 2022 memo from OSTP acting director Alondra Nelson, which mandated that data and peer-reviewed publications from federally funded research should be made freely accessible by the end of 2025. Federal agencies are required to publish full plans for the switch by the end of 2024.

But the specifics of how data will be preserved and made publicly available are far from being nailed down. I worked in archives for ten years and now facilitate two digital-archiving communities, the Software Preservation Network and BitCurator Consortium, at Educopia in Atlanta, Georgia. The expertise of people such as myself is often overlooked. More open-science projects need to integrate digital archivists and librarians, to capitalize on the tools and approaches that we have already created to make knowledge accessible and open to the public.

Making data open and ‘FAIR’ — findable, accessible, interoperable and reusable — poses technical, legal, organizational and financial questions. How can organizations best coordinate to ensure universal access to disparate data? Who will do that work? How can we ensure that the data remain open long after grant funding runs dry?

Many archivists agree that technical questions are the most solvable, given enough funding to cover the labour involved. But they are nonetheless complex. Ideally, any open research should be testable for reproducibility, but re-running scripts or procedures might not be possible unless all of the required coding libraries and environments used to analyse the data have also been preserved. Besides the contents of spreadsheets and databases, scientific-research data can include 2D or 3D images, audio, video, websites and other digital media, all in a variety of formats. Some of these might be accessible only with proprietary or outdated software…(More)”.
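
A small, hedged example of the kind of preservation step archivists would add here: record the interpreter and package versions alongside a dataset so that analysis scripts stand a chance of being re-run later. The output file name is illustrative.

```python
# Record the computational environment next to the archived data so the
# analysis can be reconstructed later. The output file name is illustrative.
import json
import sys
from importlib import metadata

environment = {
    "python_version": sys.version,
    "packages": {dist.metadata["Name"]: dist.version for dist in metadata.distributions()},
}

with open("analysis_environment.json", "w") as fh:
    json.dump(environment, fh, indent=2)

print(f"Recorded {len(environment['packages'])} package versions for the archive.")
```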