Using Data Science for Improving the Use of Scholarly Research in Public Policy


Blog by Basil Mahfouz: “Scientists worldwide published over 2.6 million papers in 2022 – almost 5 papers per minute and more than double what they published in the year 2000. Are policy makers making the most of the wealth of available scientific knowledge? In this blog, we describe how we are applying data science methods to the bibliometric database of Elsevier’s International Centre for the Study of Research (ICSR) to analyse how scholarly research is being used by policy makers. More specifically, we will discuss how we are applying natural language processing and network dynamics to identify where there is policy action and also strong evidence; where there is policy interest but a lack of evidence; and where potential policies and strategies are not making full use of available knowledge or tools…(More)”.
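
As a rough, hypothetical illustration of the kind of analysis the blog describes (not code from the project), the sketch below links policy documents to papers by text similarity and labels each policy by the strength of its matching evidence; the corpora, threshold, and labels are invented.

```python
# Hypothetical sketch of the quadrant analysis described above: score each
# policy document by how strongly it matches the scholarly literature, then
# label it as "action backed by evidence" or "interest with little evidence".
# Corpora, threshold and labels are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

papers = {
    "paper_1": "flood risk modelling for coastal cities",
    "paper_2": "school feeding programmes and learning outcomes",
}
policy_docs = {
    "policy_a": "national strategy for coastal flood protection",
    "policy_b": "guidance on urban air quality monitoring",
}

# Embed both corpora in a shared TF-IDF space (a stand-in for the NLP step).
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(list(papers.values()) + list(policy_docs.values()))
paper_vecs, policy_vecs = matrix[: len(papers)], matrix[len(papers):]

# Link each policy document to its most similar paper (the "network" step).
similarity = cosine_similarity(policy_vecs, paper_vecs)

for row, policy_id in enumerate(policy_docs):
    evidence_strength = float(similarity[row].max())
    if evidence_strength > 0.3:                       # illustrative threshold
        label = "policy action backed by strong evidence"
    else:
        label = "policy interest but little matching evidence"
    print(policy_id, round(evidence_strength, 2), label)
```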

Designing Research For Impact


Blog by Duncan Green: “The vast majority of proposals seem to conflate impact with research dissemination (a heroic leap of faith – changing the world one seminar at a time), or to outsource impact to partners such as NGOs and thinktanks.

Of the two, the latter looks more promising, but then the funder should ask to see both evidence of genuine buy-in from the partners, and appropriate budget for the work. Bringing in a couple of NGOs as ‘bid candy’ with little money attached is unlikely to produce much impact.

There is plenty written on how to genuinely design research for impact, e.g. this chapter from a number of Oxfam colleagues on its experience, or How to Engage Policy Makers with your Research (an excellent book I reviewed recently and on the LSE Review of Books). In brief, proposals should:

  • Identify the kind(s) of impact being sought: policy change, attitudinal shifts (public or among decision makers), implementation of existing laws and policies, etc.
  • Provide a stakeholder mapping of the positions of key players around those impacts – supporters, waverers and opponents.
  • Explain how the research plans to target some/all of these different individuals/groups, including during the research process itself (not just ‘who do we send the papers to once they’re published?’).
  • Say which messengers/intermediaries will be recruited to convey the research to the relevant targets (researchers themselves are not always best placed to do the persuading).
  • Flag potential ‘critical junctures’, such as crises or changes of political leadership, that could open windows of opportunity for uptake, and explain how the research team is set up to spot and respond to them.
  • Anticipate attacks/backlash against research on sensitive issues and explain how the researchers plan to respond.
  • Set out plans for reviewing and adapting the influencing strategy.

I am not arguing for proposals to indicate specific impact outcomes – most systems are way too complex for that. But, an intentional plan based on asking questions on the points above would probably help researchers improve their chances of impact.

Based on the conversations I’ve been having, I also have some thoughts on what is blocking progress.

Impact is still too often seen as an annoying hoop to jump through at the funding stage (and then largely forgotten, at least until reporting at the end of the project). The incentives are largely personal/moral (‘I want to make a difference’), whereas the weight of professional incentives is around accumulating academic publications and earning the approval of peers (hence the focus on seminars).

The timeline of advocacy, with its focus on ‘dancing with the system’, jumping on unexpected windows of opportunity etc., does not mesh with the relentless but slow pressure to write and publish. An academic is likely to pay a price if they drop their current research plans to rehash prior work to take advantage of a brief policy ‘window of opportunity’.

There is still some residual snobbery, at least in some disciplines. You still hear terms like ‘media don’, which is not meant as a compliment. For instance, my friend Ha-Joon Chang is now an economics professor at SOAS, but what on earth was Cambridge University thinking not making a global public intellectual and brilliant mind into a prof, while he was there?

True, there is also some more justified concern that designing research for impact can damage the research’s objectivity/credibility – hence the desire to pull in NGOs and thinktanks as intermediaries. But, this conversation still feels messy and unresolved, at least in the UK…(More)”.

AI and new standards promise to make scientific data more useful by making it reusable and accessible


Article by Bradley Wade Bishop: “…AI makes it highly desirable for any data to be machine-actionable – that is, usable by machines without human intervention. Now, scholars can consider machines not only as tools but also as potential autonomous data reusers and collaborators.

The key to machine-actionable data is metadata. Metadata are the descriptions scientists set for their data and may include elements such as creator, date, coverage and subject. Minimal metadata is minimally useful, but correct and complete standardized metadata makes data more useful for both people and machines.
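
To make the contrast between minimal and complete metadata concrete, here is a small, hypothetical sketch using Dublin Core-style field names; the required-field set and the records are invented for illustration rather than drawn from any formal metadata profile.

```python
# Hypothetical metadata records with a naive completeness check.
# Field names echo Dublin Core elements; the required set is illustrative.
REQUIRED_FIELDS = {"title", "creator", "date", "coverage", "subject", "identifier", "license"}

minimal_record = {
    "title": "Ocean temperature readings",
}

complete_record = {
    "title": "Ocean temperature readings, North Atlantic, 2015-2020",
    "creator": "Example Oceanographic Institute",        # invented creator
    "date": "2021-03-01",
    "coverage": "North Atlantic, 2015/2020",
    "subject": "sea surface temperature",
    "identifier": "https://doi.org/10.xxxx/example",     # placeholder DOI
    "license": "CC-BY-4.0",
}

def completeness(record: dict) -> float:
    """Fraction of required fields that are present and non-empty."""
    present = {field for field, value in record.items() if field in REQUIRED_FIELDS and value}
    return len(present) / len(REQUIRED_FIELDS)

print(completeness(minimal_record))   # about 0.14: minimally useful
print(completeness(complete_record))  # 1.0: a machine-actionable candidate
```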

It takes a cadre of research data managers and librarians to make machine-actionable data a reality. These information professionals work to facilitate communication between scientists and systems by ensuring the quality, completeness and consistency of shared data.

The FAIR data principles, created by a group of researchers called FORCE11 in 2016 and used across the world, provide guidance on how to enable data reuse by machines and humans. FAIR data is findable, accessible, interoperable and reusable – meaning it has robust and complete metadata.

In the past, I’ve studied how scientists discover and reuse data. I found that scientists tend to use mental shortcuts when they’re looking for data – for example, they may go back to familiar and trusted sources or search for certain key terms they’ve used before. Ideally, my team could build this expert decision-making process into AI while removing as many biases as possible. The automation of these mental shortcuts should reduce the time-consuming chore of locating the right data…(More)”.
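
The ‘mental shortcuts’ the author describes can be made explicit, and therefore inspectable, in a ranking function. The sketch below is a hypothetical illustration (not the author’s system): it scores datasets by keyword overlap and, optionally, by source familiarity, so the familiarity bias can be examined or switched off. All entries, weights, and source names are invented.

```python
# Hypothetical encoding of two search shortcuts: keyword overlap (relevance)
# and source familiarity (the bias). Entries, weights and sources are invented.
datasets = [
    {"id": "ds1", "source": "trusted_repo", "keywords": {"temperature", "ocean"}},
    {"id": "ds2", "source": "unknown_repo", "keywords": {"temperature", "ocean", "salinity"}},
]
FAMILIAR_SOURCES = {"trusted_repo"}

def score(dataset: dict, query: set, use_familiarity: bool) -> float:
    relevance = len(dataset["keywords"] & query) / len(query)   # keyword overlap
    familiarity = 1.0 if dataset["source"] in FAMILIAR_SOURCES else 0.0
    return relevance + (0.5 * familiarity if use_familiarity else 0.0)

query = {"ocean", "temperature", "salinity"}
# With the shortcut the familiar source wins; without it the richer dataset does.
for use_familiarity in (True, False):
    ranked = sorted(datasets, key=lambda d: score(d, query, use_familiarity), reverse=True)
    print("familiarity bias on " if use_familiarity else "familiarity bias off",
          [d["id"] for d in ranked])
```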

City/Science Intersections: A Scoping Review of Science for Policy in Urban Contexts


Paper by Gabriela Manrique Rueda et al: “Science is essential for cities to understand and intervene on increasing global risks. However, challenges in effectively utilizing scientific knowledge in decision-making processes limit cities’ abilities to address these risks. This scoping review examines the development of science for urban policy, exploring the contextual factors, organizational structures, and mechanisms that facilitate or hinder the integration of science and policy. It investigates the challenges faced and the outcomes achieved. The findings reveal that science has gained influence in United Nations (UN) policy discourses, leading to the expansion of international, regional, and national networks connecting science and policy. Boundary-spanning organizations and collaborative research initiatives with stakeholders have emerged, creating platforms for dialogue, knowledge sharing, and experimentation. However, cultural differences between the science and policy realms impede the effective utilization of scientific knowledge in decision-making. While efforts are being made to develop methods and tools for knowledge co-production, translation, and mobilization, more attention is needed to establish science-for-policy organizational structures and address power imbalances in research processes that give rise to ethical challenges…(More)”.

Do People Like Algorithms? A Research Strategy


Paper by Cass R. Sunstein and Lucia Reisch: “Do people like algorithms? In this study, intended as a promissory note and a description of a research strategy, we offer the following highly preliminary findings. (1) In a simple choice between a human being and an algorithm, across diverse settings and without information about the human being or the algorithm, people in our tested groups are about equally divided in their preference. (2) When people are given a very brief account of the data on which an algorithm relies, there is a large shift in favor of the algorithm over the human being. (3) When people are given a very brief account of the experience of the relevant human being, without an account of the data on which the relevant algorithm relies, there is a moderate shift in favor of the human being. (4) When people are given both (a) a very brief account of the experience of the relevant human being and (b) a very brief account of the data on which the relevant algorithm relies, there is a large shift in favor of the algorithm over the human being. One lesson is that in the tested groups, at least one-third of people seem to have a clear preference for either a human being or an algorithm – a preference that is unaffected by brief information that seems to favor one or the other. Another lesson is that a brief account of the data on which an algorithm relies does have a significant effect on a large percentage of the tested groups, whether or not people are also given positive information about the human alternative. Across the various surveys, we do not find persistent demographic differences, with one exception: men appear to like algorithms more than women do. These initial findings are meant as proof of concept, or more accurately as a suggestion of concept, intended to inform a series of larger and more systematic studies of whether and when people prefer to rely on algorithms or human beings, and also of international and demographic differences…(More)”.

The Early History of Counting


Essay by Keith Houston: “Figuring out when humans began to count systematically, with purpose, is not easy. Our first real clues are a handful of curious, carved bones dating from the final few millennia of the three-million-year expanse of the Old Stone Age, or Paleolithic era. Those bones are humanity’s first pocket calculators: For the prehistoric humans who carved them, they were mathematical notebooks and counting aids rolled into one. For the anthropologists who unearthed them thousands of years later, they were proof that our ability to count had manifested itself no later than 40,000 years ago.

In 1973, while excavating a cave in the Lebombo Mountains, near South Africa’s border with Swaziland, Peter Beaumont found a small, broken bone with twenty-nine notches carved across it. The so-called Border Cave had been known to archaeologists since 1934, but the discovery during World War II of skeletal remains dating to the Middle Stone Age heralded a site of rare importance. It was not until Beaumont’s dig in the 1970s, however, that the cave gave up its most significant treasure: the earliest known tally stick, in the form of a notched, three-inch-long baboon fibula.

On the face of it, the numerical instrument known as the tally stick is exceedingly mundane. Used since before recorded history—still used, in fact, by some cultures—to mark the passing days, or to account for goods or monies given or received, most tally sticks are no more than wooden rods incised with notches along their length. They help their users to count, to remember, and to transfer ownership. All of which is reminiscent of writing, except that writing did not arrive until a scant 5,000 years ago—and so, when the Lebombo bone was determined to be some 42,000 years old, it instantly became one of the most intriguing archaeological artifacts ever found. Not only does it put a date on when Homo sapiens started counting, it also marks the point at which we began to delegate our memories to external devices, thereby unburdening our minds so that they might be used for something else instead. Writing in 1776, the German historian Justus Möser knew nothing of the Lebombo bone, but his musings on tally sticks in general are strikingly apposite:

The notched tally stick itself testifies to the intelligence of our ancestors. No invention is simpler and yet more significant than this…(More)”.

Philosophy of Open Science


Book by Sabina Leonelli: “The Open Science [OS] movement aims to foster the wide dissemination, scrutiny and re-use of research components for the good of science and society. This Element examines the role played by OS principles and practices within contemporary research and how this relates to the epistemology of science. After reviewing some of the concerns that have prompted calls for more openness, it highlights how the interpretation of openness as the sharing of resources, so often encountered in OS initiatives and policies, may have the unwanted effect of constraining epistemic diversity and worsening epistemic injustice, resulting in unreliable and unethical scientific knowledge. By contrast, this Element proposes to frame openness as the effort to establish judicious connections among systems of practice, predicated on a process-oriented view of research as a tool for effective and responsible agency…(More)”.

AI tools are designing entirely new proteins that could transform medicine


Article by Ewen Callaway: “‘OK. Here we go.’ David Juergens, a computational chemist at the University of Washington (UW) in Seattle, is about to design a protein that, in 3-billion-plus years of tinkering, evolution has never produced.

On a video call, Juergens opens a cloud-based version of an artificial intelligence (AI) tool he helped to develop, called RFdiffusion. This neural network, and others like it, are helping to bring the creation of custom proteins — until recently a highly technical and often unsuccessful pursuit — to mainstream science.

These proteins could form the basis for vaccines, therapeutics and biomaterials. “It’s been a completely transformative moment,” says Gevorg Grigoryan, the co-founder and chief technical officer of Generate Biomedicines in Somerville, Massachusetts, a biotechnology company applying protein design to drug development.

The tools are inspired by AI software that synthesizes realistic images, such as the Midjourney software that, this year, was famously used to produce a viral image of Pope Francis wearing a designer white puffer jacket. A similar conceptual approach, researchers have found, can churn out realistic protein shapes to criteria that designers specify — meaning, for instance, that it’s possible to speedily draw up new proteins that should bind tightly to another biomolecule. And early experiments show that when researchers manufacture these proteins, a useful fraction do perform as the software suggests.
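
As a purely conceptual illustration (this is not RFdiffusion or any real protein model), the toy sketch below mimics the diffusion idea the article describes: start from random coordinates and repeatedly denoise them while nudging the structure toward a designer-specified criterion. Every function and constant here is invented for illustration.

```python
# Toy "diffusion-style" loop: denoise random coordinates while steering them
# toward a simple design criterion. A stand-in for a learned model, not a real one.
import numpy as np

rng = np.random.default_rng(0)
coords = rng.normal(size=(16, 3))               # 16 "residues", pure noise to start
target_centroid = np.zeros(3)                   # toy designer-specified criterion

def toy_denoise_step(x: np.ndarray, strength: float) -> np.ndarray:
    """Stand-in for a learned denoiser: pull each point toward its chain neighbours."""
    smoothed = 0.5 * (np.roll(x, 1, axis=0) + np.roll(x, -1, axis=0))
    return (1 - strength) * x + strength * smoothed

for step in range(50):
    coords = toy_denoise_step(coords, strength=0.1)
    # Conditioning: nudge the whole chain toward the designer's criterion.
    coords += 0.05 * (target_centroid - coords.mean(axis=0))
    # Re-inject a little noise early on, as diffusion samplers do.
    coords += rng.normal(scale=max(0.0, 0.2 - 0.004 * step), size=coords.shape)

print("final centroid (should sit near the target):", coords.mean(axis=0).round(3))
```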

The tools have revolutionized the process of designing proteins in the past year, researchers say. “It is an explosion in capabilities,” says Mohammed AlQuraishi, a computational biologist at Columbia University in New York City, whose team has developed one such tool for protein design. “You can now create designs that have sought-after qualities.”

“You’re building a protein structure customized for a problem,” says David Baker, a computational biophysicist at UW whose group, which includes Juergens, developed RFdiffusion. The team released the software in March 2023, and a paper describing the neural network appears this week in Nature [1]. (A preprint version was released in late 2022, at around the same time that several other teams, including AlQuraishi’s [2] and Grigoryan’s [3], reported similar neural networks)…(More)”.

Just Citation


Paper by Amanda Levendowski: “Contemporary citation practices are often unjust. Data cartels, like Google, Westlaw, and Lexis, prioritize profits and efficiency in ways that threaten people’s autonomy, particularly that of pregnant people and immigrants. Women and people of color have been legal scholars for more than a century, yet colleagues consistently under-cite and under-acknowledge their work. Other citations frequently lead to materials that cannot be accessed by disabled people, poor people or the public due to design, paywalls or link rot. Yet scholars and students often understand citation practices as “just” citation and perpetuate these practices unknowingly. This Article is an intervention. Using an intersectional feminist framework for understanding how cyberlaws oppress and liberate oppressed people – an emerging movement known as feminist cyberlaw – this Article investigates problems posed by prevailing citation practices and introduces practical methods that bring citation into closer alignment with the feminist values of safety, equity, and accessibility. Escaping data cartels, engaging marginalized scholars, embracing free and public resources, and ensuring that those resources remain easily available represent small, radical shifts that promote just citation. This Article provides powerful, practical tools for pursuing all of them…(More)”.

Engaging Scientists to Prevent Harmful Exploitation of Advanced Data Analytics and Biological Data


Proceedings from the National Academies of Sciences, Engineering, and Medicine: “Artificial intelligence (AI), facial recognition, and other advanced computational and statistical techniques are accelerating advancements in the life sciences and many other fields. However, these technologies and the scientific developments they enable also hold the potential for unintended harm and malicious exploitation. To examine these issues and to discuss practices for anticipating and preventing the misuse of advanced data analytics and biological data in a global context, the National Academies of Sciences, Engineering, and Medicine convened two virtual workshops on November 15, 2022, and February 9, 2023. The workshops engaged scientists from the United States, South Asia, and Southeast Asia through a series of presentations and scenario-based exercises to explore emerging applications and areas of research, their potential benefits, and the ethical issues and security risks that arise when AI applications are used in conjunction with biological data. This publication highlights the presentations and discussions of the workshops…(More)”.