Open data could have helped us learn from another mining dam disaster

Paulo A. de Souza Jr. at Nature: “The recent Brumadinho dam disaster in Brazil is an example of infrastructure failure with catastrophic consequences. Over 300 people were reported dead or missing, and nearly 400 more were rescued alive. The environmental impact is massive and difficult to quantify. The frequency of these disasters demonstrates that the current assets for monitoring integrity and alerting managers, authorities and the public to ongoing change in tailings are, in many cases, not working as they should. There is also the need for adequate prevention procedures. Monitoring can be perfect, but without timely and appropriate action, it will be useless. Good management therefore requires quality data. Undisputedly, management practices of industrial sites, including audit procedures, must improve, and data and metadata available from preceding accidents should be better used. There is a rich literature available about design, construction, operation, maintenance and decommissioning of tailing facilities. These include guidelines, standards, case studies, technical reports, consultancy and audit practices, and scientific papers. Regulation varies from country to country and in some cases, like Australia and Canada, it is controlled by individual state agencies. There are, however, few datasets available that are shared with the technical and scientific community more globally; particularly for prior incidents. Conspicuously lacking are comprehensive data related to monitoring of large infrastructures such as mining dams.

Today, Scientific Data published a Data Descriptor presenting a dataset obtained from 54 laboratory experiments on the breaching of fluvial dikes because of flow overtopping. (Re)use of such data can help improve our understanding of fundamental processes underpinning industrial infrastructure collapse (e.g., fluvial dike breaching, mining dam failure), and assess the accuracy of numerical models for the prediction of such incidents. This is absolutely essential for better management of floods, mitigation of dam collapses, and similar accidents. The authors propose a framework that could exemplify how data involving similar infrastructure can be stored, shared, published, and reused…(More)”.

Problematizing data-driven urban practices: Insights from five Dutch ‘smart cities’

Paper by Damion J. Bunders and Krisztina Varró: “Recently, the concept of the smart city has gained growing popularity. As cities worldwide have set the aim to harness digital technologies to their development, increasing focus came to lie on the potential challenges and concerns related to data-driven urban practices. In the existing literature, these challenges and concerns have been dominantly approached from a pragmatic approach based on the a priori assumed ‘goodness’ of the smart city; for a small group of critics, the very notion of the smart city is questionable. This paper takes the middle-way by interrogating how municipal and civil society stakeholders problematize the challenges and concerns related to data-driven practices in five Dutch cities, and how they act on these concerns in practice.

The lens of problematization posits that the ways of problematizing data-driven practices contribute to their actual enactment, and that this is an inherently political process. The case study shows that stakeholders do not only perceive practical challenges but are widely aware of and are (partly) pro-actively engaging with perceived normative-ethical and societal concerns, leading to different (sometimes inter-related) technological, legal/political, organizational, informative and participative strategies. Nonetheless, the explicit contestation of smart city policies through these strategies remains limited in scope. The paper argues that more research is needed to uncover the structural-institutional dynamics that facilitate and/or prevent the repoliticization of smart city projects….(More)”.

Data Collaboration for the Common Good: Enabling Trust and Innovation Through Public-Private Partnerships

World Economic Forum Report: “As the digital technologies of the Fourth Industrial Revolution continue to drive change throughout all sectors of the global economy, a unique moment exists to create a more inclusive, innovative and resilient society. Central to this change is the use of data. It is abundantly available but if improperly used will be the source of dangerous and unwelcome results.

When data is shared, linked and combined across sectoral and institutional boundaries, a multiplier effect occurs. Connecting one bit with another unlocks new insights and understandings that often weren’t anticipated. Yet, due to commercial limits and liabilities, the full value of data is often unrealized. This is particularly true when it comes to using data for the common good. While public-private data collaborations represent an unprecedented opportunity to address some of the world’s most urgent and complex challenges, they have generally been small and limited in impact. An entangled set of legal, technical, social, ethical and commercial risks have created an environment where the incentives for innovation have stalled. Additionally, the widening lack of trust among individuals and institutions creates even more uncertainty. After nearly a decade of anticipation on the promise of public-private data collaboration – with relatively few examples of success at global scale – a pivotal moment has arrived to encourage progress and move forward….(More)”

San Francisco becomes the first US city to ban facial recognition by government agencies

Colin Lecher at The Verge: “In a first for a city in the United States, San Francisco has voted to ban its government agencies from using facial recognition technology.

The city’s Board of Supervisors voted eight to one to approve the proposal, set to take effect in a month, that would bar city agencies, including law enforcement, from using the tool. The ordinance would also require city agencies to get board approval for their use of surveillance technology, and set up audits of surveillance tech already in use. Other cities have approved similar transparency measures.

The plan, called the Stop Secret Surveillance Ordinance, was spearheaded by Supervisor Aaron Peskin. In a statement read ahead of the vote, Peskin said it was “an ordinance about having accountability around surveillance technology.”

“This is not an anti-technology policy,” he said, stressing that many tools used by law enforcement are still important to the city’s security. Still, he added, facial recognition is “uniquely dangerous and oppressive.”

The ban comes amid a broader debate over facial recognition, which can be used to rapidly identify people and has triggered new questions about civil liberties. Experts have raised specific concerns about the tools, as studies have demonstrated instances of troubling bias and error rates.

Microsoft, which offers facial recognition tools, has called for some form of regulation for the technology — but how, exactly, to regulate the tool has been contested. Proposals have ranged from light regulation to full moratoriums. Legislation has largely stalled, however.

San Francisco’s decision will inevitably be used as an example as the debate continues and other cities and states decide whether and how to regulate facial recognition. Civil liberties groups like the ACLU of Northern California have already thrown their support behind the San Francisco plan, while law enforcement in the area has pushed back….(More)”.

The death of the literature review and the rise of the dynamic knowledge map

Gorgi Krlev at LSE Impact Blog: “Literature reviews are a core part of academic research that are loathed by some and loved by others. The LSE Impact Blog recently presented two proposals on how to deal with the issues raised by literature reviews: Richard P. Phelps argues, due to their numerous flaws, we should simply get rid of them as a requirement in scholarly articles. In contrast, Arnaud Vaganay proposes, despite their flaws, we can save them by means of standardization that would make them more robust. Here, I put forward an alternative that strikes a balance between the two: Let’s build databases that help systemize academic research. There are examples of such databases in evidence-based health-care, why not replicate those examples more widely?

The seed of the thought underlying my proposition of building dynamic knowledge maps in the social sciences and humanities was planted in 2014. I was attending a talk within Oxford’s evidence-based healthcare programme. Jon Brassey, the main speaker of the event and founder of the TRIP database, was explaining his life goal: making systematic reviews and meta-analyses in healthcare research redundant! His argument was that a database containing all available research on treatment of a symptom, migraine for instance, would be able to summarize and display meta-effects within seconds, whereas a thorough meta-analysis would require weeks, if not months, if done by a conventional research team.

Although still imperfect, TRIP has made significant progress in realizing this vision. The most recent addition to the database are “evidence maps” that visualize what we know about effective treatments. Evidence maps compare alternative treatments based on all available studies. They indicate effectiveness of a treatment, the “size” of evidence underscoring the claim and the risk of bias contained in the underlying studies. Here and below is an example based on 943 studies, as of today, dealing with effective treatment of migraine, indicating aggregated study size and risk of bias.

Source: TRIP database

There have been heated debates about the value and relevance of academic research (propositions have centred on intensifying research on global challenges or harnessing data for policy impact), its rigor (for example reproducibility), and the speed of knowledge production, including the “glacial pace of academic publishing”. Literature reviews, for the reasons laid out by Phelps and Vaganay, suffer from imperfections that make them: time consuming, potentially incomplete or misleading, erratic, selective, and ultimately blurry rather than insightful. As a result, conducting literature reviews is arguably not an effective use of research time and only adds to wider inefficiencies in research….(More)”.

Big Data and the Computable Society: Algorithms and People in the Digital World

Book by Domenico Talia: “Data and algorithms are changing our life. The awareness of importance and pervasiveness of the digital revolution is the primary element from which to start a path of knowledge to grasp what is happening in the world of big data and digital innovation and to understand these impacts on our minds and relationships between people, traceability and the computability of behavior of individuals and social organizations.

This book analyses contemporary and future issues related to big data, algorithms, data analysis, artificial intelligence and the internet. It introduces and discusses relationships between digital technologies and power, the role of the pervasive algorithms in our life and the risk of technological alienation, the relationships between the use of big data, the privacy of citizens and the exercise of democracy, the techniques of artificial intelligence and their impact on the labor world, the Industry 4.0 at the time of the Internet of Things, social media, open data and public innovation.

Each chapter raises a set of questions and answers to help the reader to know the key issues in the enormous maze that the tools of info-communication have built around us….(More)”.

Data Trusts, Health Data, and the Professionalization of Data Management

Paper by Keith Porcaro: “This paper explores how trusts can provide a legal model for professionalizing health data management. Data is potential. Over time, data collected for one purpose can support others. Clinical records at a hospital, created to manage a patient’s care, can be internally analyzed to identify opportunities for process and safety improvements at a hospital, or externally analyzed with other records to identify optimal treatment patterns. Data also carries the potential for harm. Personal data can be leaked or exposed. Proprietary models can be used to discriminate against patients, or price them out of care.

As novel uses of data proliferate, an individual data holder may be ill-equipped to manage complex new data relationships in a way that maximizes value and minimizes harm. A single organization may be limited by management capacity or risk tolerance. Organizations across sectors have digitized unevenly or late, and may not have mature data controls and policies. Collaborations that involve multiple organizations may face coordination problems, or disputes over ownership.

Data management is still a relatively young field. Most models of external data-sharing are based on literally transferring data—copying data between organizations, or pooling large datasets together under the control of a third party—rather than facilitating external queries of a closely held dataset.

Few models to date have focused on the professional management of data on behalf of a data holder, where the data holder retains control over not only their data, but the inferences derived from their data. Trusts can help facilitate the professionalization of data management. Inspired by the popularity of trusts for managing financial investments, this paper argues that data trusts are well-suited as a vehicle for open-ended professional management of data, where a manager’s discretion is constrained by fiduciary duties and a trust document that defines the data holder’s goals…(More)”.

The Pathologies of Digital Consent

Paper by Neil M. Richards and Woodrow Hartzog: “Consent permeates both our law and our lives — especially in the digital context. Consent is the foundation of the relationships we have with search engines, social networks, commercial web sites, and any one of the dozens of other digitally mediated businesses we interact with regularly. We are frequently asked to consent to terms of service, privacy notices, the use of cookies, and so many other commercial practices. Consent is important, but it’s possible to have too much of a good thing. As a number of scholars have documented, while consent models permeate the digital consumer landscape, the practical conditions of these agreements fall far short of the gold standard of knowing and voluntary consent. Yet as scholars, advocates, and consumers, we lack a common vocabulary for talking about the different ways in which digital consents can be flawed.

This article offers four contributions to improve our understanding of consent in the digital world. First, we offer a conceptual vocabulary of “the pathologies of consent” — a framework for talking about different kinds of defects that consent models can suffer, such as unwitting consent, coerced consent, and incapacitated consent. Second, we offer three conditions for when consent will be most valid in the digital context: when choice is infrequent, when the potential harms resulting from that choice are vivid and easy to imagine, and where we have the correct incentives to choose consciously and seriously. The further we fall from these conditions, the more a particular consent will be pathological and thus suspect. Third, we argue that our theory of consent pathologies sheds light on the so-called “privacy paradox” — the notion that there is a gap between what consumers say about wanting privacy and what they actually do in practice. Understanding the “privacy paradox” in terms of consent pathologies shows how consumers are not hypocrites who say one thing but do another. On the contrary, the pathologies of consent reveal how consumers can be nudged and manipulated by powerful companies against their actual interests, and that this process is easier when consumer protection law falls far from the gold standard. In light of these findings, we offer a fourth contribution — the theory of consumer trust we have suggested in prior work and which we further elaborate here as an alternative to our over-reliance on consent and its many pathologies….(More)”.

Data Science for Local Government

Report by Jonathan Bright, Bharath Ganesh, Cathrine Seidelin and Thomas Vogl: “The Data Science for Local Government project was about understanding how the growth of ‘data science’ is changing the way that local government works in the UK. We define data science as a dual shift which involves both bringing in new decision making and analytical techniques to local government work (e.g. machine learning and predictive analytics, artificial intelligence and A/B testing) and also expanding the types of data local government makes use of (for example, by repurposing administrative data, harvesting social media data, or working with mobile phone companies). The emergence of data science is facilitated by the growing availability of free, open-source tools for both collecting data and performing analysis.

Based on extensive documentary review, a nationwide survey of local authorities, and in-depth interviews with over 30 practitioners, we have sought to produce a comprehensive guide to the different types of data science being undertaken in the UK, the types of opportunities and benefits created, and also some of the challenges and difficulties being encountered.

Our aim was to provide a basis for people working in local government to start on their own data science projects, both by providing a library of dozens of ideas which have been tried elsewhere and also by providing hints and tips for overcoming key problems and challenges….(More)”

How AI could save lives without spilling medical secrets

Will Knight at MIT Technology Review: “The potential for artificial intelligence to transform health care is huge, but there’s a big catch.

AI algorithms will need vast amounts of medical data on which to train before machine learning can deliver powerful new ways to spot and understand the cause of disease. That means imagery, genomic information, or electronic health records—all potentially very sensitive information.

That’s why researchers are working on ways to let AI learn from large amounts of medical data while making it very hard for that data to leak.

One promising approach is now getting its first big test at Stanford Medical School in California. Patients there can choose to contribute their medical data to an AI system that can be trained to diagnose eye disease without ever actually accessing their personal details.

Participants submit ophthalmology test results and health record data through an app. The information is used to train a machine-learning model to identify signs of eye disease in the images. But the data is protected by technology developed by Oasis Labs, a startup spun out of UC Berkeley, which guarantees that the information cannot be leaked or misused. The startup was granted permission by regulators to start the trial last week.

The sensitivity of private patient data is a looming problem. AI algorithms trained on data from different hospitals could potentially diagnose illness, prevent disease, and extend lives. But in many countries medical records cannot easily be shared and fed to these algorithms for legal reasons. Research on using AI to spot disease in medical images or data usually involves relatively small data sets, which greatly limits the technology’s promise….

Oasis stores the private patient data on a secure chip, designed in collaboration with other researchers at Berkeley. The data remains within the Oasis cloud; outsiders are able to run algorithms on the data, and receive the results, without its ever leaving the system. A smart contract (software that runs on top of a blockchain) is triggered when a request to access the data is received. This software logs how the data was used and also checks to make sure the machine-learning computation was carried out correctly….(More)”.