Ready, set, share: Researchers brace for new data-sharing rules

Jocelyn Kaiser and Jeffrey Brainard in Science: “…By 2025, new U.S. requirements for data sharing will extend beyond biomedical research to encompass researchers across all scientific disciplines who receive federal research funding. Some funders in the European Union and China have also enacted data-sharing requirements. The new U.S. moves are feeding hopes that a worldwide movement toward increased sharing is in the offing. Supporters think it could speed the pace and reliability of science.

Some scientists may need to make only a few adjustments to comply with the policies. That’s because data sharing is already common in fields such as protein crystallography and astronomy. But in other fields the task could be weighty, because sharing is often an afterthought. For example, a study of 7,750 medical research papers found that just 9% of those published from 2015 to 2020 promised to make their data publicly available, and authors of just 3% actually shared, says lead author Daniel Hamilton of the University of Melbourne, who described the finding at the International Congress on Peer Review and Scientific Publication in September 2022. Even when authors promise to share their data, they often fail to follow through. A study published in PLOS ONE in 2020 found that of 21,000 journal articles that included data-sharing plans, fewer than 21% provided links to the repository storing the data.

Journals and funders, too, have a mixed record when it comes to supporting data sharing. Research presented at the September 2022 peer-review congress found only about half of the 110 largest public, corporate, and philanthropic funders of health research around the world recommend or require grantees to share data…

“Health research is the field where the ethical obligation to share data is the highest,” says Aidan Tan, a clinician-researcher at the University of Sydney who led the study. “People volunteer in clinical trials and put themselves at risk to advance medical research and ultimately improve human health.”

Across many fields of science, researchers’ support for sharing data has increased during the past decade, surveys show. But given the potential cost and complexity, many are apprehensive about the NIH policy and the other requirements that will follow. “How we get there is pretty messy right now,” says Parker Antin, a developmental biologist and associate vice president for research at the University of Arizona. “I’m really not sure whether the total return will justify the cost. But I don’t know of any other way to find out than trying to do it.”

Science offers this guide as researchers prepare to plunge in…(More)”.

Computational Social Science for the Public Good: Towards a Taxonomy of Governance and Policy Challenges

Chapter by Stefaan G. Verhulst: “Computational Social Science (CSS) has grown exponentially as the process of datafication and computation has increased. This expansion, however, is yet to translate into effective actions to strengthen public good in the form of policy insights and interventions. This chapter presents 20 limiting factors in how data is accessed and analysed in the field of CSS. The challenges are grouped into the following six categories based on their area of direct impact: Data Ecosystem, Data Governance, Research Design, Computational Structures and Processes, the Scientific Ecosystem, and Societal Impact. Through this chapter, we seek to construct a taxonomy of CSS governance and policy challenges. By first identifying the problems, we can then move to effectively address them through research, funding, and governance agendas that drive stronger outcomes…(More)”. Full Book: Handbook of Computational Social Science for Policy

Kid-edited journal pushes scientists for clear writing on complex topics

Article by Mark Johnson: “The reviewer was not impressed with the paper written by Israeli brain researcher Idan Segev and a colleague from Switzerland.

“Professor Idan,” she wrote to Segev. “I didn’t understand anything that you said.”

Segev and co-author Felix Schürmann revised their paper on the Human Brain Project, a massive effort seeking to channel all that we know about the mind into a vast computer model. But once again the reviewer sent it back. Still not clear enough. It took a third version to satisfy the reviewer.

“Okay,” said the reviewer, an 11-year-old girl from New York named Abby. “Now I understand.”

Such is the stringent editing process at the online science journal Frontiers for Young Minds, where top scientists, some of them Nobel Prize winners, submit papers on gene editing, gravitational waves and other topics — to demanding reviewers ages 8 through 15.

Launched in 2013, the Lausanne, Switzerland-based publication is coming of age at a moment when skeptical members of the public look to scientists for clear guidance on the coronavirus and on potentially catastrophic climate change, among other issues. At Frontiers for Young Minds, the goal is not just to publish science papers but also to make them accessible to young readers like the reviewers. In doing so, it takes direct aim at a long-standing problem in science — poor communication between professionals and the public.

“Scientists tend to default to their own jargon and don’t think carefully about whether this is a word that the public actually knows,” said Jon Lorsch, director of the National Institute of General Medical Sciences. “Sometimes to actually explain something you need a sentence as opposed to the one word scientists are using.”

Dense language sends a message “that science is for scientists; that you have to be an ‘intellectual’ to read and understand scientific literature; and that science is not relevant or important for everyday life,” according to a paper published last year in Advances in Physiology Education.

Frontiers for Young Minds, which has drawn nearly 30 million online page views in its nine years, offers a different message on its homepage: “Science for kids, edited by kids.”…(More)”.

Report on the Future of Conferences

Arxiv Report by Steven Fraser and Dennis Mancl: “In 2020, virtual conferences became almost the only alternative to cancellation. Now that the pandemic is subsiding, the pros and cons of virtual conferences need to be reevaluated. In this report, we scrutinize the dynamics and economics of conferences and highlight the history of successful virtual meetings in industry. We also report on the attitudes of conference attendees from an informal survey we ran in spring 2022…(More)”.

The ethical and legal landscape of brain data governance

Paper by Paschal Ochang, Bernd Carsten Stahl, and Damian Eke: “Neuroscience research is producing big brain data which both informs advancements in neuroscience research and drives the development of datasets that support advanced medical solutions. These brain data are produced under different jurisdictions in different formats and are governed under different regulations. The governance of data has become essential and critical, resulting in the development of various governance structures to ensure that the quality, availability, findability, accessibility, usability, and utility of data is maintained. Furthermore, data governance is influenced by various ethical and legal principles. However, it is still not clear which ethical and legal principles should be used as a standard or baseline when managing brain data, owing to varying practices and evolving concepts. Therefore, this study asks: what ethical and legal principles shape the current brain data governance landscape? A systematic scoping review and thematic analysis of articles focused on biomedical, neuro and brain data governance was carried out to identify the ethical and legal principles which shape the current brain data governance landscape. The results revealed that there is currently large variation in how the principles are presented, and discussions around the terms are very multidimensional. Some of the principles are still in their infancy and are barely visible. A range of principles emerged during the thematic analysis, providing a potential list of principles which can provide a more comprehensive framework for brain data governance and a conceptual expansion of neuroethics…(More)”.

The Strength of Knowledge Ties

Paper by Luca Maria Aiello: “Social relationships are probably the most important things we have in our life. They help us get new jobs, live longer, and be happier. At the scale of cities, networks of diverse social connections determine the economic prospects of a population. The strength of social ties is believed to be one of the key factors that regulate these outcomes. According to Granovetter’s classic theory about tie strength, information flows through social ties of two strengths: weak ties, which are used infrequently but bridge distant groups that tend to possess diverse knowledge; and strong ties, which are used frequently, knit communities together, and provide dependable sources of support.

For decades, tie strength has been quantified using the frequency of interaction. Yet, frequency does not reflect Granovetter’s initial conception of strength, which in his view is a multidimensional concept: the “combination of the amount of time, the emotional intensity, intimacy, and services which characterize the tie.” Frequency of interaction is traditionally used as a proxy for more complex social processes mostly because it is relatively easy to measure (e.g., the number of calls in phone records). But what if we had a way to measure these social processes directly?

We used advanced techniques in Natural Language Processing (NLP) to quantify whether the text of a message conveys knowledge (whether the message provides information about a specific domain) or support (expressions of emotional or practical help), and applied them to a large conversation network from Reddit composed of 630K users residing in the United States, linked by 12.8M ties. Our hypothesis was that the resulting knowledge and support networks would fare better in predicting social outcomes than a traditional social network weighted by interaction frequency. In particular, borrowing a classic experimental setup, we tested whether the diversity of social connections of Reddit users residing in a specific US state would correlate with the economic opportunities in that state (estimated with GDP per capita)…(More)”.
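To make the pipeline concrete, here is a minimal sketch of the three ingredients the paper describes: scoring a tie's messages for knowledge versus support, summarizing the diversity of a user's connections, and correlating state-level diversity with GDP per capita. This is not the authors' actual method — they use trained NLP classifiers, while the keyword cues, helper names, and toy data below are hypothetical stand-ins:

```python
import math
from collections import Counter

# Hypothetical keyword cues standing in for the trained NLP classifiers
# described in the paper (knowledge = domain information, support = help).
KNOWLEDGE_CUES = {"tutorial", "documentation", "benchmark", "dataset"}
SUPPORT_CUES = {"hang in there", "you can do it", "happy to help"}

def classify_tie(messages):
    """Label a tie by whether its messages convey knowledge, support, or neither."""
    text = " ".join(m.lower() for m in messages)
    labels = set()
    if any(cue in text for cue in KNOWLEDGE_CUES):
        labels.add("knowledge")
    if any(cue in text for cue in SUPPORT_CUES):
        labels.add("support")
    return labels or {"neither"}

def diversity(connection_groups):
    """Shannon entropy of a user's connections across social groups."""
    counts = Counter(connection_groups)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def pearson(xs, ys):
    """Plain Pearson correlation, e.g. state-level diversity vs. GDP per capita."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

For example, `classify_tie(["Here is the documentation link"])` labels the tie as a knowledge tie, and `diversity(["a", "a", "b", "b"])` yields an entropy of 1.0 bit for a user split evenly across two groups.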

We need data infrastructure as well as data sharing – conflicts of interest in video game research

Article by David Zendle & Heather Wardle: “Industry data sharing has the potential to revolutionise evidence on video gaming and mental health, as well as a host of other critical topics. However, collaborative data sharing agreements between academics and industry partners may also afford industry enormous power in steering the development of this evidence base. In this paper, we outline how nonfinancial conflicts of interest may emerge when industry share data with academics. We then go on to describe ways in which such conflicts may affect the quality of the evidence base. Finally, we suggest strategies for mitigating this impact and preserving research independence. We focus on the development of data infrastructure: technological, social, and educational architecture that facilitates unfettered and free access to the kinds of high-quality data that industry hold, but without industry involvement…(More)”.

ResearchDataGov

“ResearchDataGov is a product of the federal statistical agencies and units, created in response to the Foundations of Evidence-based Policymaking Act of 2018. The site is the single portal for discovery of restricted data in the federal statistical system. The agencies have provided detailed descriptions of each data asset. Users can search for data by topic, agency, and keywords. Questions related to the data should be directed to the owning agency, using the contact information on the page that describes the data. In late 2022, users will be able to apply for access to these data using a single-application process built into ResearchDataGov. The site is built by and hosted at ICPSR at the University of Michigan, under contract and guidance from the National Center for Science and Engineering Statistics within the National Science Foundation.

The data described in ResearchDataGov are owned by and accessed through the agencies and units of the federal statistical system. Data access is determined by the owning or distributing agency and is limited to specific physical or virtual data enclaves. Even though all data assets are listed in a single inventory, they are not necessarily available for use in the same location(s). Multiple data assets accessed in the same location may not be able to be used together due to disclosure risk and other requirements. Please note the access modality of the data in which you are interested and seek guidance from the owning agency about whether assets can be linked or otherwise used together…(More)”.

A Landscape of Open Science Policies Research

Paper by Alejandra Manco: “This literature review aims to examine how open science policy is approached in different studies. The main findings are that the approach to open science has several aspects: policy framing and its geopolitical dimensions are described as tools of asymmetry replication and epistemic governance. The main geopolitical aspects of open science policies described in the literature are the relations between international, regional, and national policies. Different components of open science are also covered in the literature: open data is much discussed in English-language works, while open access is the main component discussed in Portuguese- and Spanish-language papers. Finally, the relationship between open science policies and science policy more broadly is framed by highlighting the innovation and transparency that open science can bring to it…(More)”

Data Analysis for Social Science: A Friendly and Practical Introduction

Book by Elena Llaudet and Kosuke Imai: “…provides a friendly introduction to the statistical concepts and programming skills needed to conduct and evaluate social scientific studies. Using plain language and assuming no prior knowledge of statistics and coding, the book provides a step-by-step guide to analyzing real-world data with the statistical program R for the purpose of answering a wide range of substantive social science questions. It teaches not only how to perform the analyses but also how to interpret results and identify strengths and limitations. This one-of-a-kind textbook includes supplemental materials to accommodate students with minimal knowledge of math and clearly identifies sections with more advanced material so that readers can skip them if they so choose.

  • Analyzes real-world data using the powerful, open-source statistical program R, which is free for everyone to use
  • Teaches how to measure, predict, and explain quantities of interest based on data
  • Shows how to infer population characteristics using survey research, predict outcomes using linear models, and estimate causal effects with and without randomized experiments
  • Assumes no prior knowledge of statistics or coding
  • Specifically designed to accommodate students with a variety of math backgrounds
  • Provides cheatsheets of statistical concepts and R code
  • Supporting materials available online, including real-world datasets and the code to analyze them, plus—for instructor use—sample syllabi, sample lecture slides, additional datasets, and additional exercises with solutions…(More)”.
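As a taste of the kind of analysis the book teaches — estimating a causal effect from a randomized experiment — the average treatment effect reduces to a difference in mean outcomes between treatment and control groups. The book itself works in R; this is a hypothetical Python sketch with made-up data, not an excerpt from the book:

```python
def difference_in_means(outcomes, treated):
    """Estimate the average treatment effect in a randomized experiment
    as mean(outcome | treated) - mean(outcome | control)."""
    treat = [y for y, t in zip(outcomes, treated) if t]
    control = [y for y, t in zip(outcomes, treated) if not t]
    return sum(treat) / len(treat) - sum(control) / len(control)

# Made-up data: outcome for six units; 1 = received treatment, 0 = control.
outcomes = [12.0, 15.0, 14.0, 9.0, 10.0, 11.0]
treated = [1, 1, 1, 0, 0, 0]
print(difference_in_means(outcomes, treated))  # ≈ 3.67
```

Because treatment was randomized, the two groups are comparable in expectation, so this simple difference is an unbiased estimate of the causal effect — the same logic the book builds up to with linear models.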