Formalised data citation practices would encourage more authors to make their data available for reuse


Hyoungjoo Park and Dietmar Wolfram at the LSE Impact Blog: “Today’s researchers work in a heavily data-intensive and collaborative environment in order to further scientific discovery across and within fields. It is becoming routine for researchers (i.e. authors and data publishers) to submit their research data, such as datasets, biological samples in biomedical fields, and computer code, as supplementary information in order to comply with data sharing requirements of major funding agencies, high-profile journals, and data journals. This is part of open science, where data and any publication products are expected to be made available to anyone interested.

Given that researchers benefit from publicly shared data through data reuse in their own research, researchers who provide access to data should be acknowledged for their contributions, much in the same way that authors are recognised for their research publications through citation. Researchers who use shared data or other shared research products (e.g. open access software, tissue cultures) should also acknowledge the providers of these resources through formal citation. At present, data citation is not widely practised in most disciplines and as an object of study remains largely overlooked….

We found that data citations appear in the references section of an article less frequently than in the main text, making it difficult to identify the reward and credit for data authors (i.e. data sharers). Consistent data citation formats could not be found. Current data citation practices do not (yet) benefit data sharers. Also, data citation was sometimes located in the supplementary information, outside of the references. Data that had been reused was often not acknowledged in the reference lists, but was rather hidden in the representation of data (e.g. tables, figures, images, graphs, and other elements), which may be a consequence of the fact that data citation practices are not yet common in scholarly communications.

Ongoing challenges remain in identifying and documenting data citation. First, the practice of informal data citation presents a challenge for accurately documenting data citation. …

Second, data recitation by one or more co-authors of earlier studies (i.e. self-citation) is common, which reduces the broader impact of data sharing by limiting much of the reuse to the original authors.

Third, currently indexed data citations may not include rapidly advancing areas, such as in the hard sciences or computer engineering, because approximately 90% of indexed works were associated with journal articles…

Fourth, the number of authors associated with shared datasets raises questions of the ownership of and responsibility for a collective work, although some journals require one author to be responsible for the data used in the study…(More)”. (See also An examination of research data sharing and re-use: implications for data citation practice, published in Scientometrics)

Open Governance as a Service


Andrei Sambra and Lalana Kagal for the 2017 ACM on Web Science Conference: “This extended abstract discusses how public services can become more open and engage citizens more actively, by providing the local, public administration with the right tools. It calls for public services to think more creatively about how they can collaborate with the public to make better use of the energy and enthusiasm, as well as missing skills that people have and want to offer. It explores the challenges, both in terms of policy and technology, that public services face in mobilizing resources that are by nature voluntary. We intend to provide the governance tools that enable public services to leverage skills coming from the local community, and improve their autonomy, transparency and analytical tools required for true open governance….(More)”.

Civic Tech for Urban Collaborative Governance


Hollie Russon-Gilman in PS: Political Science & Politics: “This article aims to contribute to a burgeoning field of ‘civic technology’ to identify precise pathways through which multi-stakeholder partnerships can foster, embed, and encourage more collaborative governance, outlining a research agenda to guide next steps. Instead of looking at technology as a civic panacea or, at the other extreme, as an irrelevant force, this article takes seriously both the democratic potential and the political constraints of the use of technology for more collaborative governance. The article begins by delineating contours of a civic definition of technology focused on generating public good, provides case study examples of civic tech deployed in America’s cities, raises research questions to inform future multi-stakeholder partnerships, and concludes with implications for the public sector workforce and ecosystem…(More)”.

AI, people, and society


Eric Horvitz at Science: “In an essay about his science fiction, Isaac Asimov reflected that “it became very common…to picture robots as dangerous devices that invariably destroyed their creators.” He rejected this view and formulated the “laws of robotics,” aimed at ensuring the safety and benevolence of robotic systems. Asimov’s stories about the relationship between people and robots were only a few years old when the phrase “artificial intelligence” (AI) was used for the first time in a 1955 proposal for a study on using computers to “…solve kinds of problems now reserved for humans.” Over the half-century since that study, AI has matured into subdisciplines that have yielded a constellation of methods that enable perception, learning, reasoning, and natural language understanding.

Growing exuberance about AI has come in the wake of surprising jumps in the accuracy of machine pattern recognition using methods referred to as “deep learning.” The advances have put new capabilities in the hands of consumers, including speech-to-speech translation and semi-autonomous driving. Yet, many hard challenges persist—and AI scientists remain mystified by numerous capabilities of human intellect.

Excitement about AI has been tempered by concerns about potential downsides. Some fear the rise of superintelligences and the loss of control of AI systems, echoing themes from age-old stories. Others have focused on nearer-term issues, highlighting potential adverse outcomes. For example, data-fueled classifiers used to guide high-stakes decisions in health care and criminal justice may be influenced by biases buried deep in data sets, leading to unfair and inaccurate inferences. Other imminent concerns include legal and ethical issues regarding decisions made by autonomous systems, difficulties with explaining inferences, threats to civil liberties through new forms of surveillance, precision manipulation aimed at persuasion, criminal uses of AI, destabilizing influences in military applications, and the potential to displace workers from jobs and to amplify inequities in wealth.

As we push AI science forward, it will be critical to address the influences of AI on people and society, on short- and long-term scales. Valuable assessments and guidance can be developed through focused studies, monitoring, and analysis. The broad reach of AI’s influences requires engagement with interdisciplinary groups, including computer scientists, social scientists, psychologists, economists, and lawyers. On longer-term issues, conversations are needed to bridge differences of opinion about the possibilities of superintelligence and malevolent AI. Promising directions include working to specify trajectories and outcomes, and engaging computer scientists and engineers with expertise in software verification, security, and principles of failsafe design….Asimov concludes in his essay, “I could not bring myself to believe that if knowledge presented danger, the solution was ignorance. To me, it always seemed that the solution had to be wisdom. You did not refuse to look at danger, rather you learned how to handle it safely.” Indeed, the path forward for AI should be guided by intellectual curiosity, care, and collaboration….(More)”

NIH-funded team uses smartphone data in global study of physical activity


National Institutes of Health: “Using a larger dataset than for any previous human movement study, National Institutes of Health-funded researchers at Stanford University in Palo Alto, California, have tracked physical activity by population for more than 100 countries. Their research follows on a recent estimate that more than 5 million people die each year from causes associated with inactivity.

The large-scale study of daily step data from anonymous smartphone users dials in on how countries, genders, and community types fare in terms of physical activity and what results may mean for intervention efforts around physical activity and obesity. The study was published July 10, 2017, in the advance online edition of Nature.

“Big data is not just about big numbers, but also the patterns that can explain important health trends,” said Grace Peng, Ph.D., director of the National Institute of Biomedical Imaging and Bioengineering (NIBIB) program in Computational Modeling, Simulation and Analysis.

“Data science and modeling can be immensely powerful tools. They can aid in harnessing and analyzing all the personalized data that we get from our phones and wearable devices.”

Almost three quarters of adults in developed countries and half of adults in developing economies carry a smartphone. The devices are equipped with tiny accelerometers, computer chips that maintain the orientation of the screen and can also automatically record stepping motions. The users whose data contributed to this study subscribed to the Azumio Argus app, a free application for tracking physical activity and other health behaviors….

In addition to the step records, the researchers accessed age, gender, and height and weight status of users who registered the smartphone app. They used the same calculation that economists use for income inequality — called the Gini index — to calculate activity inequality by country.
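As a rough illustration of the calculation mentioned above, the Gini index can be computed from a list of per-person values with a standard rank-weighted formula. This is a minimal sketch, not the study's actual code: the function name `gini` and the step counts below are hypothetical, and the researchers' data preparation is not described in this excerpt.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini index of non-negative values.

    0 means everyone has the same amount; for n people the maximum
    is (n - 1) / n, reached when one person holds everything.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    # Standard rank-weighted form:
    #   G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    # where x_i are sorted ascending and i runs from 1 to n.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical average daily step counts for two made-up populations.
even_city = [5000, 5000, 5000, 5000]   # everyone equally active
uneven_city = [0, 0, 0, 10]            # one person does all the walking

print(gini(even_city))    # no activity inequality
print(gini(uneven_city))  # highest possible inequality for four people
```

Applied to step counts rather than incomes, a higher value means daily activity is concentrated among fewer people, which is the sense of "activity inequality" used in the excerpt.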

“These results reveal how much of a population is activity-rich, and how much of a population is activity-poor,” Delp said. “In regions with high activity inequality there are many people who are activity poor, and activity inequality is a strong predictor of health outcomes.”…

The researchers investigated the idea that making improvements in a city’s walkability — creating an environment that is safe and enjoyable to walk — could reduce activity inequality and the activity gender gap.

“If you must cross major highways to get from point A to point B in a city, the walkability is low; people rely on cars,” Delp said. “In cities like New York and San Francisco, where you can get across town on foot safely, the city has high walkability.”

Data from 69 U.S. cities showed that higher walkability scores are associated with lower activity inequality. Higher walkability is associated with significantly more daily steps across all age, gender, and body-mass-index categories. However, the researchers found that women recorded comparatively less activity than men in places that are less walkable.

The study exemplifies how smartphones can deliver new insights about key health behaviors, including what the authors categorize as the global pandemic of physical inactivity….(More)”.

Data and the City


Book edited by Rob Kitchin, Tracey P. Lauriault, and Gavin McArdle: “There is a long history of governments, businesses, science and citizens producing and utilizing data in order to monitor, regulate, profit from and make sense of the urban world. Recently, we have entered the age of big data, and now many aspects of everyday urban life are being captured as data and city management is mediated through data-driven technologies.

Data and the City is the first edited collection to provide an interdisciplinary analysis of how this new era of urban big data is reshaping how we come to know and govern cities, and the implications of such a transformation. This book looks at the creation of real-time cities and data-driven urbanism and considers the relationships at play. By taking a philosophical, political, practical and technical approach to urban data, the authors analyse the ways in which data is produced and framed within socio-technical systems. They then examine the constellation of existing and emerging urban data technologies. The volume concludes by considering the social and political ramifications of data-driven urbanism, questioning whom it serves and for what ends.

This book, the companion volume to 2016’s Code and the City, offers the first critical reflection on the relationship between data, data practices and the city, and how we come to know and understand cities through data. It will be crucial reading for those who wish to understand and conceptualize urban big data, data-driven urbanism and the development of smart cities….(More)”

Carnegie Mellon scientists use app to track foul odors in Pittsburgh


Ashley Murray at Pittsburgh Post-Gazette: “If you smell something, say something. Scientists at Carnegie Mellon University want Pittsburghers to put their collective noses to the task and report foul smells using a mobile reporting application called Smell PGH.

Since the app launched last year, more than 1,300 users have reported foul smells more than 4,300 times — most of which they’ve described as “industrial,” “sulfur” or “woodsmoke.”

The app was developed at CMU’s Community Robotics, Education and Technology Empowerment (CREATE) Lab.

“The app is really about the community,” said Beatrice Dias, project director at the CREATE Lab. “To show that you’re not alone in your negative experiences of pollution impact.”

Smartphone users can create a “smell report” within the app, which has the capability to alert the Allegheny County Health Department.

Health department spokeswoman Melissa Wade said the agency has received and followed up on 3,000 reports generated from the app.

Users can also view a real-time map of all smell reports in and around the city. A new feature added last month allows users to go back in time and play a time-lapse animation of little colored triangles — green, yellow and red, symbolizing varying degrees of smell — that pop up and disappear as odors were reported….

“The goal is I’m trying to predict the smell in the next few hours, like a weather forecast,” Mr. Hsu said. “Let’s say today from 12 to 1 p.m. we have 10 smell reports. I can check not only the smell reports, but the data from other sensor stations around Pittsburgh, so I know during this hour what the reading is of all the air-quality related variables, like PM 2.5, like sulfur and nitrogen oxides, [and] the wind speed, the wind direction. There are a lot of parameters we need to consider.”…

Another goal of this citizen science initiative, Mr. Hsu said, is to improve communication between the public and governmental regulation agencies, like the health department.

“Before this technology if you smelled something bad, you might not be sure if this came from ambient air, your neighborhood or just traffic issues,” Mr. Hsu said. “But if you use the app, you can see a lot of your neighbors are reporting, too. And then maybe the government can use this to see the problems in a city.”…(More)”.

Children and the Data Cycle: Rights And Ethics in a Big Data World


Gabrielle Berman and Kerry Albright at UNICEF: “In an era of increasing dependence on data science and big data, the voices of one set of major stakeholders – the world’s children and those who advocate on their behalf – have been largely absent. A recent paper estimates one in three global internet users is a child, yet there has been little rigorous debate or understanding of how to adapt traditional, offline ethical standards for research involving data collection from children, to a big data, online environment (Livingstone et al., 2015). This paper argues that due to the potential for severe, long-lasting and differential impacts on children, child rights need to be firmly integrated onto the agendas of global debates about ethics and data science. The authors outline their rationale for a greater focus on child rights and ethics in data science and suggest steps to move forward, focusing on the various actors within the data chain including data generators, collectors, analysts and end-users. It concludes by calling for a much stronger appreciation of the links between child rights, ethics and data science disciplines and for enhanced discourse between stakeholders in the data chain, and those responsible for upholding the rights of children, globally….(More)”.

Research data infrastructures in the UK


The Open Research Data Task Force: “This report is intended to inform the work of the Open Research Data Task Force, which has been established with the aim of building on the principles set out in Open Research Data Concordat (published in July 2016) to co-ordinate creation of a roadmap to develop the infrastructure for open research data across the UK. As an initial contribution to that work, the report provides an outline of the policy and service infrastructure in the UK as it stands in the first half of 2017, including some comparisons with other countries; and it points to some key areas and issues which require attention. It does not seek to identify possible courses of action, nor even to suggest priorities the Task Force might consider in creating its final report to be published in 2018. That will be the focus of work for the Task Force over the next few months.

Why is this important?

The digital revolution continues to bring fundamental changes to all aspects of research: how it is conducted, the findings that are produced, and how they are interrogated and transmitted not only within the research community but more widely. We are as yet still in the early stages of a transformation in which progress is patchy across the research community, but which has already posed significant challenges for research funders and institutions, as well as for researchers themselves. Research data is at the heart of those challenges: not simply the datasets that provide the core of the evidence analysed in scholarly publications, but all the data created and collected throughout the research process. Such data represents a potentially-valuable resource for people and organisations in the commercial, public and voluntary sectors, as well as for researchers. Access to such data, and more general moves towards open science, are also critically-important in ensuring that research is reproducible, and thus in sustaining public confidence in the work of the research community. But effective use of research data depends on an infrastructure – of hardware, software and services, but also of policies, organisations and individuals operating at various levels – that is as yet far from fully-formed. The exponential increases in volumes of data being generated by researchers create in themselves new demands for storage and computing power. But since the data is characterised more by heterogeneity than by uniformity, development of the infrastructure to manage it involves a complex set of requirements in preparing, collecting, selecting, analysing, processing, storing and preserving that data throughout its life cycle.

Over the past decade and more, there have been many initiatives on the part of research institutions, funders, and members of the research community at local, national and international levels to address some of these issues. Diversity is a key feature of the landscape, in terms of institutional types and locations, funding regimes, and nature and scope of partnerships, as well as differences between disciplines and subject areas. Hence decision-makers at various levels have fostered via their policies and strategies many community-organised developments, as well as their own initiatives and services. Significant progress has been achieved as a result, through the enthusiasm and commitment of key organisations and individuals. The less positive features have been a relative lack of harmonisation or consolidation, and there is an increasing awareness of patchiness in provision, with gaps, overlaps and inconsistencies. This is not surprising, since policies, strategies and services relating to research data necessarily affect all aspects of support for the diverse processes of research itself. Developing new policies and infrastructure for research data implies significant re-thinking of structures and regimes for supporting, fostering and promoting research itself. That in turn implies taking full account of widely-varying characteristics and needs of research of different kinds, while also keeping in clear view the benefits to be gained from better management of research data, and from greater openness in making data accessible for others to re-use for a wide range of different purposes….(More)”.

Using Collaboration to Harness Big Data for Social Good


Jake Porway at SSIR: “These days, it’s hard to get away from the hype around “big data.” We read articles about how Silicon Valley is using data to drive everything from website traffic to autonomous cars. We hear speakers at social sector conferences talk about how nonprofits can maximize their impact by leveraging new sources of digital information like social media data, open data, and satellite imagery.

Braving this world can be challenging, we know. Creating a data-driven organization can require big changes in culture and process. Some nonprofits, like Crisis Text Line and Watsi, started off boldly by building their own data science teams. But for the many other organizations wondering how to best use data to advance their mission, we’ve found that one ingredient works better than all the software and tech that you can throw at a problem: collaboration.

As a nonprofit dedicated to applying data science for social good, DataKind has run more than 200 projects in collaboration with other nonprofits worldwide by connecting them to teams of volunteer data scientists. What do the most successful ones have in common? Strong collaborations on three levels: with data science experts, within the organization itself, and across the nonprofit sector as a whole.

1. Collaborate with data science experts to define your project. As we often say, finding problems can be harder than finding solutions. ….

2. Collaborate across your organization to “build with, not for.” Our projects follow the principles of human-centered design and the philosophy pioneered in the civic tech world of “design with, not for.” ….

3. Collaborate across your sector to move the needle. Many organizations think about building data science solutions for unique challenges they face, such as predicting the best location for their next field office. However, most of us are fighting common causes shared by many other groups….

By focusing on building strong collaborations on these three levels—with data experts, across your organization, and across your sector—you’ll go from merely talking about big data to making big impact….(More)”.