Open data: The building block of 21st century (open) science

Paper by Corina Pascu and Jean-Claude Burgelman: “Given this irreversibility of data driven and reproducible science and the role machines will play in that, it is foreseeable that the production of scientific knowledge will be more like a constant flow of updated data driven outputs, rather than a unique publication/article of some sort. Indeed, the future of scholarly publishing will be more based on the publication of data/insights with the article as a narrative.

For open data to be valuable, reproducibility is a sine qua non (King 2011; Piwowar, Vision and Whitlock 2011) and, equally important as most of the societal grand challenges require several sciences to work together, essential for interdisciplinarity.

This trend correlates with the already ongoing observed epistemic shift in the rationale of science: from demonstrating the absolute truth via a unique narrative (article or publication), to the best possible understanding what at that moment is needed to move forward in the production of knowledge to address problem “X” (de Regt 2017).

Science in the 21st century will thus be more “liquid,” enabled by open science and data practices and supported or even co-produced by artificial intelligence (AI) tools and services, and thus a continuous flow of knowledge produced and used by (mainly) machines and people. In this paradigm, an article will be the “atomic” entity and often the least important output of the knowledge stream and scholarship production. Publishing will offer in the first place a platform where all parts of the knowledge stream will be made available as such via peer review.

The new frontier in open science as well as where most of future revenue will be made, will be via value added data services (such as mining, intelligence, and networking) for people and machines. The use of AI is on the rise in society, but also on all aspects of research and science: what can be put in an algorithm will be put; the machines and deep learning add factor “X.”

AI services for science are already being made along the research process: data discovery and analysis and knowledge extraction out of research artefacts are accelerated with the use of AI. AI technologies also help to maximize the efficiency of the publishing process and make peer-review more objective (Table 1).

Table 1. Examples of AI services for science already being developed

Abbreviation: AI, artificial intelligence.

Source: Authors’ research based on public sources, 2021.

Ultimately, actionable knowledge and translation of its benefits to society will be handled by humans in the “machine era” for decades to come. But as computers are indispensable research assistants, we need to make what we publish understandable to them.

The availability of data that are “FAIR by design” and shared Application Programming Interfaces (APIs) will allow new ways of collaboration between scientists and machines to make the best use of research digital objects of any kind. The more findable, accessible, interoperable, and reusable (FAIR) data resources will become available, the more it will be possible to use AI to extract and analyze new valuable information. The main challenge is to master the interoperability and quality of research data…(More)”.

Facebook-owner Meta to share more political ad targeting data

Article by Elizabeth Culliford: “Facebook owner Meta Platforms Inc (FB.O) will share more data on targeting choices made by advertisers running political and social-issue ads in its public ad database, it said on Monday.

Meta said it would also include detailed targeting information for these individual ads in its “Facebook Open Research and Transparency” database used by academic researchers, in an expansion of a pilot launched last year.

“Instead of analyzing how an ad was delivered by Facebook, it’s really going and looking at an advertiser strategy for what they were trying to do,” said Jeff King, Meta’s vice president of business integrity, in a phone interview.

The social media giant has faced pressure in recent years to provide transparency around targeted advertising on its platforms, particularly around elections. In 2018, it launched a public ad library, though some researchers criticized it for glitches and a lack of detailed targeting data. Meta said the ad library will soon show a summary of targeting information for social issue, electoral or political ads run by a page…. The company has run various programs with external researchers as part of its transparency efforts. Last year, it said a technical error meant flawed data had been provided to academics in its “Social Science One” project…(More)”.

The Bare Minimum of Theory: A Definitional Definition for the Social Sciences

Paper by Chitu Okoli: “The ongoing debates in the information systems (IS) discipline on the nature of theory are implicitly rooted in different epistemologies of the social sciences and in a lack of consensus on a definition of theory. Thus, we focus here on the much-neglected topic of what constitutes the bare minimum of what can possibly be considered theory—only by carefully understanding the bare minimum can we really understand the essence of what makes a theory a theory. We definitionally define a theory in the social sciences as an explanation of the relationship between two or more measurable concepts. (“Measurable” refers to qualitative coding and inference of mechanisms, as well as quantitative magnitudes.) The rigorous justification of each element of this definition helps to resolve issues such as providing a consistent basis of determining what qualifies as theory; the value of other knowledge contributions that are not theory; how to identify theories regardless of if they are named; and a call to recognize diverse forms of theorizing across the social science epistemologies of positivism, interpretivism, critical social theory, critical realism, and pragmatism. Although focused on IS, most of these issues are pertinent to any scholarly discipline within the social sciences…(More)”.

A Computational Inflection for Scientific Discovery

Paper by Tom Hope, Doug Downey, Oren Etzioni, Daniel S. Weld, and Eric Horvitz: “We stand at the foot of a significant inflection in the trajectory of scientific discovery. As society continues on its fast-paced digital transformation, so does humankind’s collective scientific knowledge and discourse. We now read and write papers in digitized form, and a great deal of the formal and informal processes of science are captured digitally — including papers, preprints and books, code and datasets, conference presentations, and interactions in social networks and communication platforms. The transition has led to the growth of a tremendous amount of information, opening exciting opportunities for computational models and systems that analyze and harness it. In parallel, exponential growth in data processing power has fueled remarkable advances in AI, including self-supervised neural models capable of learning powerful representations from large-scale unstructured text without costly human supervision. The confluence of societal and computational trends suggests that computer science is poised to ignite a revolution in the scientific process itself.
However, the explosion of scientific data, results and publications stands in stark contrast to the constancy of human cognitive capacity. While scientific knowledge is expanding with rapidity, our minds have remained static, with severe limitations on the capacity for finding, assimilating and manipulating information. We propose a research agenda of task-guided knowledge retrieval, in which systems counter humans’ bounded capacity by ingesting corpora of scientific knowledge and retrieving inspirations, explanations, solutions and evidence synthesized to directly augment human performance on salient tasks in scientific endeavors. We present initial progress on methods and prototypes, and lay out important opportunities and challenges ahead with computational approaches that have the potential to revolutionize science…(More)”.

How does research data generate societal impact?

Blog by Eric Jensen and Mark Reed: “Managing data isn’t exciting and it can feel like a hassle to deposit data at the end of a project, when you want to focus on publishing your findings.

But if you want your research to have impact, paying attention to data could make a big difference, according to new research we published recently in the journal PLOS ONE.

We analysed case studies from the UK Research Excellence Framework (REF) exercise in 2014 to show how data analysis and curation can generate benefits for policy and practice, and sought to understand the pathways through which data typically leads to impact. In this series of blog posts we will unpack this research and show you how you can manage your data for impact.

We were commissioned by the Australian Research Data Commons (ARDC) to investigate how research data contributes to demonstrable non-academic benefits to society from research, drawing on existing impact case studies from the REF. We then analyzed case studies from the Australian Research Council (ARC) Engagement and Impact Assessment 2018, a similar exercise to the UK’s…

The most prevalent type of research data-driven impact was benefits for professional practice (45% UK; 44% Australia).

This category of impact includes changing the ways professionals operate and improving the quality of products or services through better methods, technologies, and responses to issues through better understanding. It also includes changing organisational culture and improving workplace productivity or outcomes.

Government impacts were the next most prevalent category identified in this research (21% UK; 20% Australia).

These impacts include the introduction of new policies and changes to existing policies, as well as

  • reducing the cost to deliver government services
  • enhancing the effectiveness or efficiency of government services and operations
  • more efficient government planning

Other relatively common types of research data-driven impacts were economic impact (13% UK; 14% Australia) and public health impacts (10% UK; 8% Australia)…(More)”.

Accelerating ethics, empathy, and equity in geographic information science

Paper by T. A. Nelson, M. F. Goodchild and D. J. Wright: “Science has traditionally been driven by curiosity and followed one goal: the pursuit of truth and the advancement of knowledge. Recently, ethics, empathy, and equity, which we term “the 3Es,” are emerging as new drivers of research and disrupting established practices. Drawing on our own field of GIScience (geographic information science), our goal is to use the geographic approach to accelerate the response to the 3Es by identifying priority issues and research needs that, if addressed, will advance ethical, empathic, and equitable GIScience. We also aim to stimulate similar responses in other disciplines. Organized around the 3Es we discuss ethical issues arising from locational privacy and cartographic integrity, how our ability to build knowledge that will lead to empathy can be curbed by data that lack representativeness and by inadvertent inferential error, and how GIScientists can lead toward equity by supporting social justice efforts and democratizing access to spatial science and its tools. We conclude with a call to action and invite all scientists to join in a fundamentally different science that responds to the 3Es and mobilizes for change by engaging in humility, broadening measures of excellences and success, diversifying our networks, and creating pathways to inclusive education. Science united around the 3Es is the right response to this unique moment where society and the planet are facing a vast array of challenges that require knowledge, truth, and action…(More)”

Opening Up to Open Science

Essay by Chelle Gentemann, Christopher Erdmann and Caitlin Kroeger: “The modern Hippocratic Oath outlines ethical standards that physicians worldwide swear to uphold. “I will respect the hard-won scientific gains of those physicians in whose steps I walk,” one of its tenets reads, “and gladly share such knowledge as is mine with those who are to follow.”

But what form, exactly, should knowledge-sharing take? In the practice of modern science, knowledge in most scientific disciplines is generally shared through peer-reviewed publications at the end of a project. Although publication is both expected and incentivized—it plays a key role in career advancement, for example—many scientists do not take the extra step of sharing data, detailed methods, or code, making it more difficult for others to replicate, verify, and build on their results. Even beyond that, professional science today is full of personal and institutional incentives to hold information closely to retain a competitive advantage.

This way of sharing science has some benefits: peer review, for example, helps to ensure (even if it never guarantees) scientific integrity and prevent inadvertent misuse of data or code. But the status quo also comes with clear costs: it creates barriers (in the form of publication paywalls), slows the pace of innovation, and limits the impact of research. Fast science is increasingly necessary, and with good reason. Technology has not only improved the speed at which science is carried out, but many of the problems scientists study, from climate change to COVID-19, demand urgency. Whether modeling the behavior of wildfires or developing a vaccine, the need for scientists to work together and share knowledge has never been greater. In this environment, the rapid dissemination of knowledge is critical; closed, siloed knowledge slows progress to a degree society cannot afford. Imagine the consequences today if, as in the 2003 SARS disease outbreak, the task of sequencing genomes still took months and tools for labs to share the results openly online didn’t exist. Today’s challenges require scientists to adapt and better recognize, facilitate, and reward collaboration.

Open science is a path toward a collaborative culture that, enabled by a range of technologies, empowers the open sharing of data, information, and knowledge within the scientific community and the wider public to accelerate scientific research and understanding. Yet despite its benefits, open science has not been widely embraced…(More)”

Citizen science and environmental justice: exploring contradictory outcomes through a case study of air quality monitoring in Dublin

Paper by Fiadh Tubridy et al: “Citizen science is advocated as a response to a broad range of contemporary societal and ecological challenges. However, there are widely varying models of citizen science which may either challenge or reinforce existing knowledge paradigms and associated power dynamics. This paper explores different approaches to citizen science in the context of air quality monitoring in terms of their implications for environmental justice. This is achieved through a case study of air quality management in Dublin which focuses on the role of citizen science in this context. The evidence shows that the dominant interpretation of citizen science in Dublin is that it provides a means to promote awareness and behaviour change rather than to generate knowledge and inform new regulations or policies. This is linked to an overall context of technocratic governance and the exclusion of non-experts from decision-making. It is further closely linked to neoliberal governance imperatives to individualise responsibility and promote market-based solutions to environmental challenges. Last, the evidence highlights that this model of citizen science risks compounding inequalities by transferring responsibility and blame for air pollution to those who have limited resources to address it. Overall, the paper highlights the need for critical analysis of the implications of citizen science in different instances and for alternative models of citizen science whereby communities would contribute to setting objectives and determining how their data is used…(More)”.

Time to recognize authorship of open data

Nature Editorial: “At times, it seems there’s an unstoppable momentum towards the principle that data sets should be made widely available for research purposes (also called open data). Research funders all over the world are endorsing the open data-management standards known as the FAIR principles (which ensure data are findable, accessible, interoperable and reusable). Journals are increasingly asking authors to make the underlying data behind papers accessible to their peers. Data sets are accompanied by a digital object identifier (DOI) so they can be easily found. And this citability helps researchers to get credit for the data they generate.

But reality sometimes tells a different story. The world’s systems for evaluating science do not (yet) value openly shared data in the same way that they value outputs such as journal articles or books. Funders and research leaders who design these systems accept that there are many kinds of scientific output, but many reject the idea that there is a hierarchy among them.

In practice, those in powerful positions in science tend not to regard open data sets in the same way as publications when it comes to making hiring and promotion decisions or awarding memberships to important committees, or in national evaluation systems. The open-data revolution will stall unless this changes….

Universities, research groups, funding agencies and publishers should, together, start to consider how they could better recognize open data in their evaluation systems. They need to ask: how can those who have gone the extra mile on open data be credited appropriately?

There will always be instances in which researchers cannot be given access to human data. Data from infants, for example, are highly sensitive and need to pass stringent privacy and other tests. Moreover, making data sets accessible takes time and funding that researchers don’t always have. And researchers in low- and middle-income countries have concerns that their data could be used by researchers or businesses in high-income countries in ways that they have not consented to.

But crediting all those who contribute their knowledge to a research output is a cornerstone of science. The prevailing convention — whereby those who make their data open for researchers to use make do with acknowledgement and a citation — needs a rethink. As long as authorship on a paper is significantly more valued than data generation, this will disincentivize making data sets open. The sooner we change this, the better….(More)”.

Measuring costs and benefits of citizen science

Article by Kathy Tzilivakis: “It’s never been easy to accurately measure the impact of any scientific research, but it’s even harder for citizen science projects, which don’t follow traditional methods. Public involvement places citizen science in a new era of data collection, one that requires a new measurement plan.

As you read this, thousands of ordinary people across Europe are busy tagging, categorizing and counting in the name of science. They may be reporting crop yields, analyzing plastic waste found in nature or monitoring the populations of wildlife. This relatively new method of public participation in scientific enquiry is experiencing a considerable upswing in both quality and scale of projects.

Of course, people have been sharing observations about the natural world for millennia, way before the term “citizen science” appeared on the cover of sociologist Alan Irwin’s 1995 book “Citizen Science: A Study of People, Expertise, and Sustainable Development.”

Today, citizen science is on the rise with bigger projects that are more ambitious and better networked than ever before. And while collecting seawater samples and photographing wild birds are two well-known examples of citizen science, this is just the tip of the iceberg.

Citizen science is evolving thanks to new data collection techniques enabled by the internet, smartphones and social media. Increased connectivity is encouraging a wide range of observations that can be easily recorded and shared. The reams of crowd-sourced data from members of the public are a boon for researchers working on large-scale and geographically diverse projects. Often it would be too difficult and expensive to obtain this data otherwise.

Both sides win because scientists are helped to collect much better data and an enthusiastic public gets to engage with the fascinating world of science.

But success has been difficult to define, let alone to translate into indicators for assessment. Until now.

A group of EU researchers has taken on the challenge of building the first integrated and interactive platform to measure costs and benefits of citizen science….

“The platform will be very complex but able to capture the characteristics and the results of projects, and measure their impact on several domains like society, economy, environment, science and technology and governance,” said Dr. Luigi Ceccaroni, who is coordinating the Measuring Impact of Citizen Science (MICS) project behind the platform. Currently at the testing stage, the platform is slated to go live before the end of this year….(More)”