Impact through Engagement: Co-production of administrative data research


Paper by Elizabeth Nelson and Frances Burns: “The Administrative Data Research Centre Northern Ireland (ADRC NI) is a research partnership between Queen’s University Belfast and Ulster University to facilitate access to linked administrative data for research purposes for public benefit and for evidence-based policy development. This requires a social licence extended by publics which is maintained by a robust approach to engagement and involvement.

Public engagement is central to the ADRC NI’s approach to research. Research impact is pursued and secured through robust engagement and co-production of research with publics and key stakeholders. This is done by focusing on data subjects (the cohort of people whose lives make up the datasets), placing value on experts by experience outside of academic knowledge, and working with public(s) as key data advocates, through project steering committees and targeted events with stakeholders. The work is led by a dedicated Public Engagement, Communications and Impact Manager.

While there are strengths and weaknesses to the ADRC NI approach, examples of successful partnerships and clear pathways to impact demonstrate its utility and ability to amplify the positive impact of administrative data research. Working with publics as data use becomes more ubiquitous in a post-COVID-19 world will become more critical. ADRC NI’s model is a potential way forward….(More)”.

See also Special Issue on Public Involvement and Engagement by the International Journal of Population Data Science.

Commission proposes measures to boost data sharing and support European data spaces


Press Release: “To better exploit the potential of ever-growing data in a trustworthy European framework, the Commission today proposes new rules on data governance. The Regulation will facilitate data sharing across the EU and between sectors to create wealth for society, increase control and trust of both citizens and companies regarding their data, and offer an alternative European model to data handling practice of major tech platforms.

The amount of data generated by public bodies, businesses and citizens is constantly growing. It is expected to multiply by five between 2018 and 2025. These new rules will allow this data to be harnessed and will pave the way for sectoral European data spaces to benefit society, citizens and companies. In the Commission’s data strategy of February this year, nine such data spaces have been proposed, ranging from industry to energy, and from health to the European Green Deal. They will, for example, contribute to the green transition by improving the management of energy consumption, make delivery of personalised medicine a reality, and facilitate access to public services.

The Regulation includes:

  • A number of measures to increase trust in data sharing, as the lack of trust is currently a major obstacle and results in high costs.
  • New EU rules on neutrality to allow novel data intermediaries to function as trustworthy organisers of data sharing.
  • Measures to facilitate the reuse of certain data held by the public sector. For example, the reuse of health data could advance research to find cures for rare or chronic diseases.
  • Means to give Europeans control on the use of the data they generate, by making it easier and safer for companies and individuals to voluntarily make their data available for the wider common good under clear conditions….(More)”.

Introducing Reach: find and track research being put into action


Blog by Dawn Duhaney: “At Wellcome Data Labs we’re releasing our first product, Reach. Our goal is to support funding organisations and researchers by making it easier to find and track scientific research being put into action by governments and global health organisations.

https://reach.wellcomedatalabs.org/

We focused on solving this problem in collaboration with our internal Insights and Analysis team for Wellcome and with partner organisations before deciding to release Reach more widely.

We found that evaluation teams wanted tools to help them measure the influence academic research was having on policy making institutions. We noticed that it is often challenging to track how scientific evidence makes its way into policy making. Institutions like the UK Government and the World Health Organisation have hundreds of thousands of policy documents available — it’s a heavily manual task to search through them to find evidence of our funded research.

At Wellcome we have some established methods for collecting evidence of policy influence from our funded research such as end of scheme reporting and via word of mouth. Through these methods we found great examples of how funded research was being put into policy and practice by government and global health organisations.

One example is from Kenya. The KEMRI Research Programme, a collaboration between the Kenyan Medical Research Institute, Wellcome and Oxford University, launched a research programme to improve maternal health in 2005. Their research was cited by the World Health Organisation and, with advocacy efforts from the KEMRI team, influenced the development of new Kenyan national guidelines for paediatric care.

In Wellcome Data Labs we wanted to build a tool that would aid the discovery of evidence based policy making and be a step in the process of assessing research influence for evaluators, researchers and funding institutions….(More)”.

Armchair Survey Research: A Possible Post-COVID-19 Boon in Social Science


Paper by Samiul Hasan: “Post-COVID-19 technologies for higher education and corporate communication have opened up a wonderful opportunity for Online Survey Research. These technologies could be used for one-to-one interviews, group interviews, group questionnaire surveys, online questionnaire surveys, or even ‘focus group’ discussions. This new trend, which may aptly be called ‘armchair survey research’, may be the only or new trend in social science research. If that is the case, an obvious question might be what is ‘survey research’ and how is it going to be easier in the post-COVID-19 world? My intention is to offer some help to the promising researchers who have all the quality and eagerness to undertake good social science research for publication, but no funds.

The text is divided into three main parts. Part one deals with “Science, Social Science and Research” to highlight some important points about the importance of ‘What’, ‘Why’, and ‘So what’ and ‘framing of a research question’ for good research. Then the discussion moves to ‘reliability and validity’ in social science research including falsifiability, content validity, and construct validity. This part ends with discussions on concepts, constructs, and variables in a theoretical (conceptual) framework. The second part deals categorically with ‘survey research’ highlighting the use and features of interviews and questionnaire surveys. It deals primarily with the importance and use of nominal response or scale and ordinal response or scale as well as the essentials of question content and wording, and question sequencing. The last part deals with survey research in the post-COVID-19 period highlighting strategies for undertaking better online survey research, without any funding….(More)”.

Scaling up Citizen Science


Report for the European Commission: “The rapid pace of technology advancements, the open innovation paradigm, and the ubiquity of high-speed connectivity, greatly facilitate access to information to individuals, increasing their opportunities to achieve greater emancipation and empowerment. This provides new opportunities for widening participation in scientific research and policy, thus opening a myriad of avenues driving a paradigm shift across fields and disciplines, including the strengthening of Citizen Science. Nowadays, the application of Citizen Science principles spans across several scientific disciplines, covering different geographical scales. While the interdisciplinary approach taken so far has shown significant results and findings, the current situation depicts a wide range of projects that are heavily context-dependent and where the learning outcomes of pilots are very much situated within the specific areas in which these projects are implemented. There is little evidence on how to foster the spread and scalability in Citizen Science. Furthermore, the Citizen Science community currently lacks a general agreement on what these terms mean, entail and how these can be approached.

To address these issues, we developed a theoretically grounded framework to unbundle the meaning of scaling and spreading in Citizen Science. In this framework, we defined nine constructs that represent the enablers of these complex phenomena. We then validated, enriched, and instantiated this framework through four qualitative case studies of diverse, successful examples of scaling and spreading in Citizen Science. The framework and the rich experiences allow formulating four theoretically and empirically grounded scaling scenarios. We propose the framework and the in-depth case studies as the main contribution from this report. We hope to stimulate future research to further refine our understanding of the important, complex and multifaceted phenomena of scaling and spreading in Citizen Science. The framework also proposes a structured mindset for practitioners that either want to ideate and start a new Citizen Science intervention that is scalable-by-design, or for those that are interested in assessing the scalability potential of an existing initiative….(More)”.

tl;dr: this AI sums up research papers in a sentence


Jeffrey M. Perkel & Richard Van Noorden at Nature: “The creators of a scientific search engine have unveiled software that automatically generates one-sentence summaries of research papers, which they say could help scientists to skim-read papers faster.

The free tool, which creates what the team calls TLDRs (the common Internet acronym for ‘Too long, didn’t read’), was activated this week for search results at Semantic Scholar, a search engine created by the non-profit Allen Institute for Artificial Intelligence (AI2) in Seattle, Washington. For the moment, the software generates sentences only for the ten million computer-science papers covered by Semantic Scholar, but papers from other disciplines should be getting summaries in the next month or so, once the software has been fine-tuned, says Dan Weld, who manages the Semantic Scholar group at AI2…

Weld was inspired to create the TLDR software in part by the snappy sentences his colleagues share on Twitter to flag up articles. Like other language-generation software, the tool uses deep neural networks trained on vast amounts of text. The team included tens of thousands of research papers matched to their titles, so that the network could learn to generate concise sentences. The researchers then fine-tuned the software to summarize content by training it on a new data set of a few thousand computer-science papers with matching summaries, some written by the papers’ authors and some by a class of undergraduate students. The team has gathered training examples to improve the software’s performance in 16 other fields, with biomedicine likely to come first.

The TLDR software is not the only scientific summarizing tool: since 2018, the website Paper Digest has offered summaries of papers, but it seems to extract key sentences from text, rather than generate new ones, Weld notes. TLDR can generate a sentence from a paper’s abstract, introduction and conclusion. Its summaries tend to be built from key phrases in the article’s text, so are aimed squarely at experts who already understand a paper’s jargon. But Weld says the team is working on generating summaries for non-expert audiences….(More)”.

Facial-recognition research needs an ethical reckoning


Editorial in Nature: “…As Nature reports in a series of Features on facial recognition this week, many in the field are rightly worried about how the technology is being used. They know that their work enables people to be easily identified, and therefore targeted, on an unprecedented scale. Some scientists are analysing the inaccuracies and biases inherent in facial-recognition technology, warning of discrimination, and joining the campaigners calling for stronger regulation, greater transparency, consultation with the communities that are being monitored by cameras — and for use of the technology to be suspended while lawmakers reconsider where and how it should be used. The technology might well have benefits, but these need to be assessed against the risks, which is why it needs to be properly and carefully regulated.

Responsible studies

Some scientists are urging a rethink of ethics in the field of facial-recognition research, too. They are arguing, for example, that scientists should not be doing certain types of research. Many are angry about academic studies that sought to study the faces of people from vulnerable groups, such as the Uyghur population in China, whom the government has subjected to surveillance and detained on a mass scale.

Others have condemned papers that sought to classify faces by scientifically and ethically dubious measures such as criminality…. One problem is that AI guidance tends to consist of principles that aren’t easily translated into practice. Last year, the philosopher Brent Mittelstadt at the University of Oxford, UK, noted that at least 84 AI ethics initiatives had produced high-level principles on both the ethical development and deployment of AI (B. Mittelstadt Nature Mach. Intell. 1, 501–507; 2019). These tended to converge around classical medical-ethics concepts, such as respect for human autonomy, the prevention of harm, fairness and explicability (or transparency). But Mittelstadt pointed out that different cultures disagree fundamentally on what principles such as ‘fairness’ or ‘respect for autonomy’ actually mean in practice. Medicine has internationally agreed norms for preventing harm to patients, and robust accountability mechanisms. AI lacks these, Mittelstadt noted. Specific case studies and worked examples would be much more helpful to prevent ethics guidance becoming little more than window-dressing….(More)”.

Leveraging Open Data with a National Open Computing Strategy


Policy Brief by Lara Mangravite and John Wilbanks: “Open data mandates and investments in public data resources, such as the Human Genome Project or the U.S. National Oceanic and Atmospheric Administration Data Discovery Portal, have provided essential data sets at a scale not possible without government support. By responsibly sharing data for wide reuse, federal policy can spur innovation inside the academy and in citizen science communities. These approaches are enabled by private-sector advances in cloud computing services and the government has benefited from innovation in this domain. However, the use of commercial products to manage the storage of and access to public data resources poses several challenges.

First, too many cloud computing systems fail to properly secure data against breaches, improperly share copies of data with other vendors, or use data to add to their own secretive and proprietary models. As a result, the public does not trust technology companies to responsibly manage public data—particularly private data of individual citizens. These fears are exacerbated by the market power of the major cloud computing providers, which may limit the ability of individuals or institutions to negotiate appropriate terms. This impacts the willingness of U.S. citizens to have their personal information included within these databases.

Second, open data solutions are springing up across multiple sectors without coordination. The federal government is funding a series of independent programs that are working to solve the same problem, leading to a costly duplication of effort across programs.

Third and most importantly, the high costs of data storage, transfer, and analysis preclude many academics, scientists, and researchers from taking advantage of governmental open data resources. Cloud computing has radically lowered the costs of high-performance computing, but it is still not free. The cost of building the wrong model at the wrong time can quickly run into tens of thousands of dollars.

Scarce resources mean that many academic data scientists are unable or unwilling to spend their limited funds to reuse data in exploratory analyses outside their narrow projects. And citizen scientists must use personal funds, which are especially scarce in communities traditionally underrepresented in research. The vast majority of public data made available through existing open science policy is therefore left unused, either as reference material or as “foreground” for new hypotheses and discoveries…. The Solution: Public Cloud Computing…(More)”.

Not fit for Purpose: A critical analysis of the ‘Five Safes’


Paper by Chris Culnane, Benjamin I. P. Rubinstein, and David Watts: “Adopted by government agencies in Australia, New Zealand, and the UK as policy instrument or as embodied into legislation, the ‘Five Safes’ framework aims to manage risks of releasing data derived from personal information. Despite its popularity, the Five Safes has undergone little legal or technical critical analysis. We argue that the Five Safes is fundamentally flawed: from being disconnected from existing legal protections and appropriation of notions of safety without providing any means to prefer strong technical measures, to viewing disclosure risk as static through time and not requiring repeat assessment. The Five Safes provides little confidence that resulting data sharing is performed using ‘safety’ best practice or for purposes in service of public interest….(More)”.

Statistical illiteracy isn’t a niche problem. During a pandemic, it can be fatal


Article by Carlo Rovelli: “In the institute where I used to work a few years ago, a rare non-infectious illness hit five colleagues in quick succession. There was a sense of alarm, and a hunt for the cause of the problem. In the past the building had been used as a biology lab, so we thought that there might be some sort of chemical contamination, but nothing was found. The level of apprehension grew. Some looked for work elsewhere.

One evening, at a dinner party, I mentioned these events to a friend who is a mathematician, and he burst out laughing. “There are 400 tiles on the floor of this room; if I throw 100 grains of rice into the air, will I find,” he asked us, “five grains on any one tile?” We replied in the negative: there was only one grain for every four tiles: not enough to have five on a single tile.

We were wrong. We tried numerous times, actually throwing the rice, and there was always a tile with two, three, four, even five or more grains on it. Why? Why would grains “flung randomly” not arrange themselves into good order, equidistant from each other?

Because they land, precisely, by chance, and there are always disorderly grains that fall on tiles where others have already gathered. Suddenly the strange case of the five ill colleagues seemed very different. Five grains of rice falling on the same tile does not mean that the tile possesses some kind of “rice-attracting” force. Five people falling ill in a workplace did not mean that it must be contaminated. The institute where I worked was part of a university. We, know-all professors, had fallen into a gross statistical error. We had become convinced that the “above average” number of sick people required an explanation. Some had even gone elsewhere, changing jobs for no good reason.

Life is full of stories such as this. Insufficient understanding of statistics is widespread. The current pandemic has forced us all to engage in probabilistic reasoning, from governments having to recommend behaviour on the basis of statistical predictions, to people estimating the probability of catching the virus while taking part in common activities. Our extensive statistical illiteracy is today particularly dangerous.

We use probabilistic reasoning every day, and most of us have a vague understanding of averages, variability and correlations. But we use them in an approximate fashion, often making errors. Statistics sharpen and refine these notions, giving them a precise definition, allowing us to reliably evaluate, for instance, whether a medicine or a building is dangerous or not.

Society would gain significant advantages if children were taught the fundamental ideas of probability theory and statistics: in simple form in primary school, and in greater depth in secondary school….(More)”.
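
The rice-and-tiles experiment in the excerpt above can be tried without a bag of rice. The sketch below is an illustration only, not something from the article: it assumes every grain lands on a uniformly random tile, independently of the others, and the function names, trial count and seed are all hypothetical choices. Under that assumption, two grains sharing a tile is almost certain and larger piles become progressively rarer. Hand-thrown rice does not land uniformly, so real throws probably cluster more than this model suggests.

```python
import random
from collections import Counter


def max_grains_on_a_tile(grains=100, tiles=400, rng=random):
    """Throw `grains` grains onto `tiles` tiles and return the size of the largest pile.

    Assumption: each grain picks a tile uniformly at random, independently.
    """
    counts = Counter(rng.randrange(tiles) for _ in range(grains))
    return max(counts.values())


def pile_size_distribution(trials=2000, grains=100, tiles=400, seed=0):
    """For k = 2..5, return the fraction of throws whose fullest tile holds at least k grains."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    maxima = [max_grains_on_a_tile(grains, tiles, rng) for _ in range(trials)]
    return {k: sum(m >= k for m in maxima) / trials for k in range(2, 6)}


if __name__ == "__main__":
    for k, share in pile_size_distribution().items():
        print(f"fullest tile holds >= {k} grains in {share:.1%} of throws")
```

The point matches the mathematician's: an average of one grain per four tiles says nothing about how evenly the grains spread, and some clustering is what chance alone produces.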