Becoming a data steward

Shalini Kurapati at the LSE Impact Blog: “In the context of higher education, data stewards are the first point of reference for all data related questions. In my role as a data steward at TU Delft, I was able to advise, support and train researchers on various aspects of data management throughout the life cycle of a research project, from initial planning to post-publication. This included storing, managing and sharing research outputs such as data, images, models and code.

Data stewards also advise researchers on the ethical, policy and legal considerations during data collection, processing and dissemination. In a way, they are general practitioners for research data management and can usually solve most problems faced by academics. In cases that require specialist intervention, they also serve as a key point for referral (e.g. IT, patent, legal experts).

Data stewardship is often organised centrally through the university library. (Subject) Data librarians, research data consultants and research data officers usually perform similar roles to data stewards. However, TU Delft operates a decentralised model, where data stewards are placed within faculties as disciplinary experts with research experience. This allows data stewards to provide discipline-specific support to researchers, which is particularly beneficial, as the concept of what data is itself varies across disciplines….(More)”.

Timing Technology

Blog by Gwern Branwen: “Technological forecasts are often surprisingly prescient in terms of predicting that something was possible & desirable and what they predict eventually happens; but they are far less successful at predicting the timing, and almost always fail, with the success (and riches) going to another.

Why is their knowledge so useless? The right moment cannot be known exactly in advance, so attempts to forecast will typically be off by years or worse. For many claims, there is no way to invest in an idea except by going all in and launching a company, resulting in extreme variance in outcomes, even when the idea is good and the forecasts correct about the (eventual) outcome.

Progress can happen and can be foreseen long before, but the details and exact timing due to bottlenecks are too difficult to get right. Launching too early means failure, but being conservative & launching later is just as bad because regardless of forecasting, a good idea will draw overly-optimistic researchers or entrepreneurs to it like moths to a flame: all get immolated but the one with the dumb luck to kiss the flame at the perfect instant, who then wins everything, at which point everyone can see that the optimal time is past. All major success stories overshadow their long list of predecessors who did the same thing, but got unlucky. So, ideas can be divided into the overly-optimistic & likely doomed, or the fait accompli. On an individual level, ideas are worthless because so many others have them too—‘multiple invention’ is the rule, and not the exception.

This overall problem falls under the reinforcement learning paradigm, and successful approaches are analogous to Thompson sampling/posterior sampling: even an informed strategy can’t reliably beat random exploration which gradually shifts towards successful areas while continuing to take occasional long shots. Since people tend to systematically over-exploit, how is this implemented? Apparently by individuals acting suboptimally on the personal level, but optimally on societal level by serving as random exploration.

A major benefit of R&D, then, is in laying fallow until the ‘ripe time’ when they can be immediately exploited in previously-unpredictable ways; applied R&D or VC strategies should focus on maintaining diversity of investments, while continuing to flexibly revisit previous failures which forecasts indicate may have reached ‘ripe time’. This balances overall exploitation & exploration to progress as fast as possible, showing the usefulness of technological forecasting on a global level despite its uselessness to individuals….(More)”.
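The Thompson sampling analogy in the excerpt above can be made concrete with a small sketch. This is not code from the essay: it is a minimal Beta-Bernoulli bandit, assuming each "idea" is an arm with a fixed, hidden success rate. The function name `thompson_sampling` and the example rates are hypothetical, chosen only for illustration.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Beta-Bernoulli Thompson sampling over hypothetical 'ideas' (arms).

    true_rates: hidden success probability of each arm (an assumption of
    this sketch; real technologies have no fixed, stationary rate).
    Returns the number of times each arm was tried.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    successes = [0] * n
    failures = [0] * n
    pulls = [0] * n

    for _ in range(rounds):
        # Sample one plausible success rate per arm from its posterior,
        # Beta(successes + 1, failures + 1), i.e. a uniform prior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n)
        ]
        # Try the arm whose sampled rate is highest. Uncertain arms still
        # win occasionally, which is the "occasional long shot".
        arm = max(range(n), key=lambda i: samples[i])

        # Observe a success or failure and update that arm's posterior.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    # Three hypothetical ideas; the third is the one whose time has come.
    print(thompson_sampling([0.1, 0.2, 0.6], rounds=2000))
```

In a run like this, attempts drift towards the arm that keeps paying off, while the weaker arms are still retried now and then. That is the behaviour the excerpt describes: random exploration that gradually shifts towards successful areas without abandoning long shots.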

Supporting priority setting in science using research funding landscapes

Report by the Research on Research Institute: “In this working paper, we describe how to map research funding landscapes in order to support research funders in setting priorities. Based on data on scientific publications, a funding landscape highlights the research fields that are supported by different funders. The funding landscape described here has been created using data from the Dimensions database. It is presented using a freely available web-based tool that provides an interactive visualization of the landscape. We demonstrate the use of the tool through a case study in which we analyze funding of mental health research…(More)”.

Robotic Bureaucracy: Administrative Burden and Red Tape in University Research

Essay by Barry Bozeman and Jan Youtie: “…examines university research administration and the use of software systems that automate university research grants and contract administration, including the automatic sending of emails for reporting and compliance purposes. These systems are described as “robotic bureaucracy.” The rise of regulations and their contribution to administrative burden on university research have led university administrators to increasingly rely on robotic bureaucracy to handle compliance. This article draws on the administrative burden, behavioral public administration, and electronic communications and management literatures, which are increasingly focused on the psychological and cognitive bases of behavior. These literatures suggest that the assumptions behind robotic bureaucracy ignore the extent to which these systems shift the burden of compliance from administrators to researchers….(More)”.

Why Trust Science?

Book by Naomi Oreskes: “Do doctors really know what they are talking about when they tell us vaccines are safe? Should we take climate experts at their word when they warn us about the perils of global warming? Why should we trust science when our own politicians don’t? In this landmark book, Naomi Oreskes offers a bold and compelling defense of science, revealing why the social character of scientific knowledge is its greatest strength—and the greatest reason we can trust it.

Tracing the history and philosophy of science from the late nineteenth century to today, Oreskes explains that, contrary to popular belief, there is no single scientific method. Rather, the trustworthiness of scientific claims derives from the social process by which they are rigorously vetted. This process is not perfect—nothing ever is when humans are involved—but she draws vital lessons from cases where scientists got it wrong. Oreskes shows how consensus is a crucial indicator of when a scientific matter has been settled, and when the knowledge produced is likely to be trustworthy.

Based on the Tanner Lectures on Human Values at Princeton University, this timely and provocative book features critical responses by climate experts Ottmar Edenhofer and Martin Kowarsch, political scientist Jon Krosnick, philosopher of science Marc Lange, and science historian Susan Lindee, as well as a foreword by political theorist Stephen Macedo….(More)”.

Restricting data’s use: A spectrum of concerns in need of flexible approaches

Dharma Akmon and Susan Jekielek at IASSIST Quarterly: “As researchers consider making their data available to others, they are concerned with the responsible use of data. As a result, they often seek to place restrictions on secondary use. The Research Connections archive at ICPSR makes available the datasets of dozens of studies related to childcare and early education. Of the 103 studies archived to date, 20 have some restrictions on access. While ICPSR’s data access systems were designed primarily to accommodate public use data (i.e. data without disclosure concerns) and potentially disclosive data, our interactions with depositors reveal a more nuanced range of needs for restricting use. Some data present a relatively low risk of threatening participants’ confidentiality, yet the data producers still want to monitor who is accessing the data and how they plan to use them. Other studies contain data with such a high risk of disclosure that their use must be restricted to a virtual data enclave. Still other studies rest on agreements with participants that require continuing oversight of secondary use by data producers, funders, and participants. This paper describes data producers’ range of needs to restrict data access and discusses how systems can better accommodate these needs….(More)”.

The Church of Techno-Optimism

Margaret O’Mara at the New York Times: “…But Silicon Valley does have a politics. It is neither liberal nor conservative. Nor is it libertarian, despite the dog-eared copies of Ayn Rand’s novels that you might find strewn about the cubicles of a start-up in Palo Alto.

It is techno-optimism: the belief that technology and technologists are building the future and that the rest of the world, including government, needs to catch up. And this creed burns brightly, undimmed by the anti-tech backlash. “It’s now up to all of us together to harness this tremendous energy to benefit all humanity,” the venture capitalist Frank Chen said in a November 2018 speech about artificial intelligence. “We are going to build a road to space,” Jeff Bezos declared as he unveiled plans for a lunar lander last spring. And as Elon Musk recently asked his Tesla shareholders, “Would I be doing this if I weren’t optimistic?”

But this is about more than just Silicon Valley. Techno-optimism has deep roots in American political culture, and its belief in American ingenuity and technological progress. Reckoning with that history is crucial to the discussion about how to rein in Big Tech’s seemingly limitless power.

The language of techno-optimism first appears in the rhetoric of American politics after World War II. “Science, the Endless Frontier” was the title of the soaringly techno-optimistic 1945 report by Vannevar Bush, the chief science adviser to Franklin Roosevelt and Harry Truman, which set in motion the American government’s unprecedented postwar spending on research and development. That wave of money transformed the Santa Clara Valley and turned Stanford University into an engineering powerhouse. Dwight Eisenhower filled the White House with advisers whom he called “my scientists.” John Kennedy, announcing America’s moon shot in 1962, declared that “man, in his quest for knowledge and progress, is determined and cannot be deterred.”

In a 1963 speech, a founder of Hewlett-Packard, David Packard, looked back on his life during the Depression and marveled at the world that he lived in, giving much of the credit to technological innovation unhindered by bureaucratic interference: “Radio, television, Teletype, the vast array of publications of all types bring to a majority of the people everywhere in the world information in considerable detail, about what is going on everywhere else. Horizons are opened up, new aspirations are generated.”…(More)”

Goodhart’s Law: Are Academic Metrics Being Gamed?

Essay by Michael Fire: “…We attained the following five key insights from our study:

First, these results support Goodhart’s Law as it relates to academic publishing; that is, traditional measures (e.g., number of papers, number of citations, h-index, and impact factor) have become targets, and are no longer true measures of importance/impact. By making papers shorter and collaborating with more authors, researchers are able to produce more papers in the same amount of time. Moreover, the majority of changes in papers’ structure are correlated with papers that receive higher numbers of citations. Authors can use longer titles and abstracts, or use question or exclamation marks in titles, to make their papers more appealing for readers and increase citations, i.e. academic clickbait. These results support our hypothesis that academic papers have evolved in order to score a bullseye on target metrics.

Second, it is clear that citation number has become a target for some researchers. We observe a general increasing trend for researchers to cite their previous work in their new studies, with some authors self-citing dozens, or even hundreds, of times. Moreover, a huge quantity of papers – over 72% of all papers and 25% of all papers with at least 5 references – have no citations at all after 5 years. Clearly, a significant amount of resources is spent on papers with limited impact, which may indicate that researchers are publishing more papers of poorer quality to boost their total number of publications. Additionally, we noted that different decades have very different paper citation distributions. Consequently, comparing citation records of researchers who published papers in different time periods can be challenging.

[Figure: Number of self-citations over time]

Third, we observed an exponential growth in the number of new researchers who publish papers, likely due to career pressures. …(More)”.

The Why of the World

Book review by Tim Maudlin of The Book of Why: The New Science of Cause and Effect by Judea Pearl and Dana Mackenzie: “Correlation is not causation.” Though true and important, the warning has hardened into the familiarity of a cliché. Stock examples of so-called spurious correlations are now a dime a dozen. As one example goes, a Pacific island tribe believed flea infestations to be good for one’s health because they observed that healthy people had fleas while sick people did not. The correlation is real and robust, but fleas do not cause health, of course: they merely indicate it. Fleas on a fevered body abandon ship and seek a healthier host. One should not seek out and encourage fleas in the quest to ward off sickness.

The rub lies in another observation: that the evidence for causation seems to lie entirely in correlations. But for seeing correlations, we would have no clue about causation. The only reason we discovered that smoking causes lung cancer, for example, is that we observed correlations in that particular circumstance. And thus a puzzle arises: if causation cannot be reduced to correlation, how can correlation serve as evidence of causation?

The Book of Why, co-authored by the computer scientist Judea Pearl and the science writer Dana Mackenzie, sets out to give a new answer to this old question, which has been around—in some form or another, posed by scientists and philosophers alike—at least since the Enlightenment. In 2011 Pearl won the Turing Award, computer science’s highest honor, for “fundamental contributions to artificial intelligence through the development of a calculus of probabilistic and causal reasoning,” and this book sets out to explain what all that means for a general audience, updating his more technical book on the same subject, Causality, published nearly two decades ago. Written in the first person, the new volume mixes theory, history, and memoir, detailing both the technical tools of causal reasoning Pearl has developed as well as the tortuous path by which he arrived at them—all along bucking a scientific establishment that, in his telling, had long ago contented itself with data-crunching analysis of correlations at the expense of investigation of causes. There are nuggets of wisdom and cautionary tales in both these aspects of the book, the scientific as well as the sociological…(More)”.

Computational Communication Science

Introduction to Special Issue of the International Journal of Communication: “Over the past two decades, processes of digitalization and mediatization have shaped the communication landscape and have had a strong impact on various facets of communication. The digitalization of communication results in completely new forms of digital traces that make communication processes observable in new and unprecedented ways. Although many scholars in the social sciences acknowledge the chances and requirements of the digital revolution in communication, they are also facing fundamental challenges in implementing successful research programs, strategies, and designs that are based on computational methods and “big data.” This Special Section aims at bringing together seminal perspectives on challenges and chances of computational communication science (CCS). In this introduction, we highlight the impulses provided by the research presented in the Special Section, discuss the most pressing challenges in the context of CCS, and sketch a potential roadmap for future research in this field….(More)”.