Ten lessons for data sharing with a data commons


Article by Robert L. Grossman: “…Lesson 1. Build a commons for a specific community with a specific set of research challenges

Although a few data repositories serving the general scientific community have proved successful, data commons that target a specific user community have generally proven the most successful. The first lesson is to build a data commons for a specific research community that is struggling to answer specific research challenges with data. As a consequence, a data commons is a partnership between the data scientists developing and supporting the commons and the disciplinary scientists with the research challenges.

Lesson 2. Successful commons curate and harmonize the data

Successful commons curate and harmonize the data and produce data products of broad interest to the community. It’s time consuming, expensive, and labor intensive to curate and harmonize data, but much of the value of a data commons is centralizing this work so that it can be done once instead of many times by each group that needs the data. These days, it is very easy to think of a data commons as a platform containing data, not spend the time curating or harmonizing it, and then be surprised that the data in the commons is not more widely used and its impact is not as high as expected.

Lesson 3. It’s ultimately about the data and its value to generate new research discoveries

Despite the importance of a study, few scientists will try to replicate previously published studies. Instead, data is usually accessed when it can lead to a new high-impact paper. For this reason, data commons play two different but related roles. First, they preserve data for reproducible science. This accounts for a small fraction of data access, but plays a critical role in reproducible science. Second, data commons make data available for new high-value science.

Lesson 4. Reduce barriers to access to increase usage

A useful rule of thumb is that every barrier to data access cuts usage by a factor of 10, so a commons with three barriers can expect roughly a thousandfold less access than a fully open one. Common barriers that reduce use of a commons include: registration vs. no registration; open access vs. controlled access; click-through agreements vs. signed data-use agreements and approval by data access committees; license restrictions on the use of the data vs. no license restrictions…(More)”.
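The factor-of-10 rule of thumb in the excerpt above amounts to an exponential fall-off in usage. A quick back-of-the-envelope sketch (illustrative only, not from the article):

```python
# Illustrative sketch of the "factor of 10 per barrier" rule of thumb:
# each additional access barrier multiplies expected usage by 1/10.

def expected_relative_access(num_barriers: int, attrition_factor: float = 10.0) -> float:
    """Expected usage relative to a fully open commons (1.0 = no barriers)."""
    return attrition_factor ** -num_barriers

# A commons requiring registration, a signed data-use agreement, and
# committee approval (three barriers) would see ~1/1000th of the usage.
for barriers in range(4):
    print(f"{barriers} barrier(s): relative access = {expected_relative_access(barriers):g}")
```

The 10× attrition factor is a heuristic, not a measured constant; the point of the sketch is only that barriers compound multiplicatively, not additively.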

Researchers scramble as Twitter plans to end free data access


Article by Heidi Ledford: “Akin Ünver has been using Twitter data for years. He investigates some of the biggest issues in social science, including political polarization, fake news and online extremism. But earlier this month, he had to set aside time to focus on a pressing emergency: helping relief efforts in Turkey and Syria after the devastating earthquake on 6 February.

Aid workers in the region have been racing to rescue people trapped by debris and to provide health care and supplies to those displaced by the tragedy. Twitter has been invaluable for collecting real-time data and generating crucial maps to direct the response, says Ünver, a computational social scientist at Özyeğin University in Istanbul.

So when he heard that Twitter was about to end its policy of providing free access to its application programming interface (API) — a pivotal set of rules that allows people to extract and process large amounts of data from the platform — he was dismayed. “Couldn’t come at a worse time,” he tweeted. “Most analysts and programmers that are building apps and functions for Turkey earthquake aid and relief, and are literally saving lives, are reliant on Twitter API.”…

Twitter has long offered academics free access to its API, an unusual approach that has been instrumental in the rise of computational approaches to studying social media. So when the company announced on 2 February that it would end that free access in a matter of days, it sent the field into a tailspin. “Thousands of research projects running over more than a decade would not be possible if the API wasn’t free,” says Patty Kostkova, who specializes in digital health studies at University College London…(More)”.
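To make concrete what researchers stand to lose, a typical free-tier pull of recent tweets goes through the v2 search endpoint. The endpoint and parameter names below follow Twitter’s publicly documented v2 API; the bearer token and query are placeholders (a hedged sketch, not an endorsement of any particular access tier):

```python
# Sketch of the kind of API access the article describes: fetching recent
# tweets matching a query via Twitter's v2 recent-search endpoint.
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"

def build_search_url(query: str, max_results: int = 100) -> str:
    """Assemble the recent-search URL with its query parameters."""
    params = urllib.parse.urlencode({
        "query": query,
        "max_results": max_results,
        "tweet.fields": "created_at,geo",
    })
    return f"{SEARCH_URL}?{params}"

def fetch_recent(bearer_token: str, query: str) -> list:
    """Return the list of matching tweets (requires a valid bearer token)."""
    req = urllib.request.Request(
        build_search_url(query),
        headers={"Authorization": f"Bearer {bearer_token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("data", [])

# e.g. fetch_recent(TOKEN, "(deprem OR earthquake) -is:retweet lang:tr")
```

Pipelines like this, feeding maps and relief dashboards, are what break when free API access ends.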

Managing Intellectual Property Rights in Citizen Science: A Guide for Researchers and Citizen Scientists


Report by Teresa Scassa & Haewon Chung: “IP issues arise in citizen science in a variety of different ways. Indeed, the more broadly the concept of citizen science is cast, the more diverse the potential IP interests. Some community-based projects, for example, may well involve the sharing of traditional knowledge, whereas open innovation projects are the ones most likely to raise patent issues, and to do so in a context where commercialization is a project goal. Trademark issues may also arise, particularly where a project gains a certain degree of renown. In this study we touch on issues of patenting and commercialization; however, we also recognize that most citizen science projects do not have commercialization as an objective, and have IP issues that flow predominantly from copyright law. This guide navigates these issues topically and points the reader towards further research and law in this area should they wish to gain an even more comprehensive understanding of the nuances. It accompanies a prior study conducted by the same authors that created a Typology of Citizen Science Projects from an Intellectual Property Perspective…(More)”.

ChatGPT reminds us why good questions matter


Article by Stefaan Verhulst and Anil Ananthaswamy: “Over 100 million people used ChatGPT in January alone, according to one estimate, making it the fastest-growing consumer application in history. By producing resumes, essays, jokes and even poetry in response to prompts, the software brings into focus not just language models’ arresting power, but the importance of framing our questions correctly.

To that end, a few years ago I initiated the 100 Questions Initiative, which seeks to catalyse a cultural shift in the way we leverage data and develop scientific insights. The project aims not only to generate new questions, but also to reimagine the process of asking them…

As a species and a society, we tend to look for answers. Answers appear to provide a sense of clarity and certainty, and can help guide our actions and policy decisions. Yet any answer represents a provisional end-stage of a process that begins with questions – and often can generate more questions. Einstein drew attention to the critical importance of how questions are framed, which can often determine (or at least play a significant role in determining) the answers we ultimately reach. Frame a question differently and one might reach a different answer. Yet as a society we undervalue the act of questioning – who formulates questions, how they do so, the impact they have on what we investigate, and on the decisions we make. Nor do we pay sufficient attention to whether the answers are in fact addressing the questions initially posed…(More)”.

Ready, set, share: Researchers brace for new data-sharing rules


Jocelyn Kaiser and Jeffrey Brainard in Science: “…By 2025, new U.S. requirements for data sharing will extend beyond biomedical research to encompass researchers across all scientific disciplines who receive federal research funding. Some funders in the European Union and China have also enacted data-sharing requirements. The new U.S. moves are feeding hopes that a worldwide movement toward increased sharing is in the offing. Supporters think it could speed the pace and reliability of science.

Some scientists may only need to make a few adjustments to comply with the policies. That’s because data sharing is already common in fields such as protein crystallography and astronomy. But in other fields the task could be weighty, because sharing is often an afterthought. For example, a study involving 7750 medical research papers found that just 9% of those published from 2015 to 2020 promised to make their data publicly available, and authors of just 3% actually shared, says lead author Daniel Hamilton of the University of Melbourne, who described the finding at the International Congress on Peer Review and Scientific Publication in September 2022. Even when authors promise to share their data, they often fail to follow through: a study published in PLOS ONE in 2020 found that, out of 21,000 journal articles that included data-sharing plans, fewer than 21% provided links to the repository storing the data.

Journals and funders, too, have a mixed record when it comes to supporting data sharing. Research presented at the September 2022 peer-review congress found only about half of the 110 largest public, corporate, and philanthropic funders of health research around the world recommend or require grantees to share data…

“Health research is the field where the ethical obligation to share data is the highest,” says Aidan Tan, a clinician-researcher at the University of Sydney who led the study. “People volunteer in clinical trials and put themselves at risk to advance medical research and ultimately improve human health.”

Across many fields of science, researchers’ support for sharing data has increased during the past decade, surveys show. But given the potential cost and complexity, many are apprehensive about the NIH policy and the other requirements to follow. “How we get there is pretty messy right now,” says Parker Antin, a developmental biologist and associate vice president for research at the University of Arizona. “I’m really not sure whether the total return will justify the cost. But I don’t know of any other way to find out than trying to do it.”

Science offers this guide as researchers prepare to plunge in….(More)”.

Computational Social Science for the Public Good: Towards a Taxonomy of Governance and Policy Challenges


Chapter by Stefaan G. Verhulst: “Computational Social Science (CSS) has grown exponentially as the process of datafication and computation has increased. This expansion, however, is yet to translate into effective actions to strengthen public good in the form of policy insights and interventions. This chapter presents 20 limiting factors in how data is accessed and analysed in the field of CSS. The challenges are grouped into the following six categories based on their area of direct impact: Data Ecosystem, Data Governance, Research Design, Computational Structures and Processes, the Scientific Ecosystem, and Societal Impact. Through this chapter, we seek to construct a taxonomy of CSS governance and policy challenges. By first identifying the problems, we can then move to effectively address them through research, funding, and governance agendas that drive stronger outcomes…(More)”. Full Book: Handbook of Computational Social Science for Policy

Kid-edited journal pushes scientists for clear writing on complex topics


Article by Mark Johnson: “The reviewer was not impressed with the paper written by Israeli brain researcher Idan Segev and a colleague from Switzerland.

“Professor Idan,” she wrote to Segev. “I didn’t understand anything that you said.”

Segev and co-author Felix Schürmann revised their paper on the Human Brain project, a massive effort seeking to channel all that we know about the mind into a vast computer model. But once again the reviewer sent it back. Still not clear enough. It took a third version to satisfy the reviewer.

“Okay,” said the reviewer, an 11-year-old girl from New York named Abby. “Now I understand.”

Such is the stringent editing process at the online science journal Frontiers for Young Minds, where top scientists, some of them Nobel Prize winners, submit papers on gene editing, gravitational waves and other topics — to demanding reviewers ages 8 through 15.

Launched in 2013, the Lausanne, Switzerland-based publication is coming of age at a moment when skeptical members of the public look to scientists for clear guidance on the coronavirus and on potentially catastrophic climate change, among other issues. At Frontiers for Young Minds, the goal is not just to publish science papers but also to make them accessible to young readers like the reviewers. In doing so, it takes direct aim at a long-standing problem in science — poor communication between professionals and the public.

“Scientists tend to default to their own jargon and don’t think carefully about whether this is a word that the public actually knows,” said Jon Lorsch, director of the National Institute of General Medical Sciences. “Sometimes to actually explain something you need a sentence as opposed to the one word scientists are using.”

Dense language sends a message “that science is for scientists; that you have to be an ‘intellectual’ to read and understand scientific literature; and that science is not relevant or important for everyday life,” according to a paper published last year in Advances in Physiology Education.

Frontiers for Young Minds, which has drawn nearly 30 million online page views in its nine years, offers a different message on its homepage: “Science for kids, edited by kids.”…(More)”.

Report on the Future of Conferences


arXiv report by Steven Fraser and Dennis Mancl: “In 2020, virtual conferences became almost the only alternative to cancellation. Now that the pandemic is subsiding, the pros and cons of virtual conferences need to be reevaluated. In this report, we scrutinize the dynamics and economics of conferences and highlight the history of successful virtual meetings in industry. We also report on the attitudes of conference attendees from an informal survey we ran in spring 2022…(More)”.

The ethical and legal landscape of brain data governance


Paper by Paschal Ochang, Bernd Carsten Stahl, and Damian Eke: “Neuroscience research is producing big brain data which both informs advancements in neuroscience research and drives the development of advanced datasets to provide advanced medical solutions. These brain data are produced under different jurisdictions in different formats and are governed under different regulations. The governance of data has become essential and critical, resulting in the development of various governance structures to ensure that the quality, availability, findability, accessibility, usability, and utility of data are maintained. Furthermore, data governance is influenced by various ethical and legal principles. However, it is still not clear which ethical and legal principles should be used as a standard or baseline when managing brain data, due to varying practices and evolving concepts. Therefore, this study asks: what ethical and legal principles shape the current brain data governance landscape? A systematic scoping review and thematic analysis of articles focused on biomedical, neuro and brain data governance was carried out to identify the ethical and legal principles which shape the current brain data governance landscape. The results revealed that there is currently large variation in how the principles are presented, and discussions around the terms are very multidimensional. Some of the principles are still in their infancy and are barely visible. A range of principles emerged during the thematic analysis, providing a potential list of principles which can provide a more comprehensive framework for brain data governance and a conceptual expansion of neuroethics…(More)”.

The Strength of Knowledge Ties


Paper by Luca Maria Aiello: “Social relationships are probably the most important things we have in our life. They help us to get new jobs, live longer, and be happier. At the scale of cities, networks of diverse social connections determine the economic prospects of a population. The strength of social ties is believed to be one of the key factors that regulate these outcomes. According to Granovetter’s classic theory about tie strength, information flows through social ties of two strengths: weak ties, which are used infrequently but bridge distant groups that tend to possess diverse knowledge; and strong ties, which are used frequently, knit communities together, and provide dependable sources of support.

For decades, tie strength has been quantified using the frequency of interaction. Yet, frequency does not reflect Granovetter’s initial conception of strength, which in his view is a multidimensional concept, such as the “combination of the amount of time, the emotional intensity, intimacy, and services which characterize the tie.” Frequency of interaction is traditionally used as a proxy for more complex social processes mostly because it is relatively easy to measure (e.g., the number of calls in phone records). But what if we had a way to measure these social processes directly?

We used advanced techniques in Natural Language Processing (NLP) to quantify whether the text of a message conveys knowledge (whether the message provides information about a specific domain) or support (expressions of emotional or practical help), and applied them to a large conversation network from Reddit composed of 630K users resident in the United States, linked by 12.8M ties. Our hypothesis was that the resulting knowledge and support networks would fare better in predicting social outcomes than a traditional social network weighted by interaction frequency. In particular, borrowing a classic experimental setup, we tested whether the diversity of social connections of Reddit users resident in a specific US state would correlate with the economic opportunities in that state (estimated with GDP per capita)…(More)”.
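The final step of the setup described above can be sketched with stdlib Python: score each state's out-of-state ties for diversity (here, Shannon entropy of tie counts, a common choice though the paper may use a different measure), then correlate state-level diversity with GDP per capita. The tiny dataset below is invented for illustration; the study uses 630K Reddit users and 12.8M ties.

```python
# Illustrative sketch: tie diversity per state (Shannon entropy) vs. GDP
# per capita, correlated with Pearson's r. Toy data, stdlib only.
import math
from collections import Counter

def shannon_entropy(tie_counts: Counter) -> float:
    """Diversity of a state's ties: entropy of its tie-count distribution."""
    total = sum(tie_counts.values())
    return -sum((c / total) * math.log2(c / total) for c in tie_counts.values())

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented toy data: counts of ties from each state to other states,
# plus rough GDP-per-capita figures (placeholders, not real statistics).
state_ties = {
    "CA": Counter({"NY": 5, "TX": 4, "WA": 4, "IL": 3}),  # diverse ties
    "WY": Counter({"CO": 9, "MT": 1}),                    # concentrated ties
    "NY": Counter({"NJ": 4, "CA": 4, "MA": 3, "FL": 3}),
}
gdp_per_capita = {"CA": 85_000, "WY": 70_000, "NY": 95_000}

states = sorted(state_ties)
diversity = [shannon_entropy(state_ties[s]) for s in states]
gdp = [gdp_per_capita[s] for s in states]
print(f"Pearson r(diversity, GDP per capita) = {pearson(diversity, gdp):.2f}")
```

The paper's contribution is upstream of this step: weighting the ties by NLP-detected knowledge and support exchange rather than by raw interaction frequency before computing diversity.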