Sharing Private Data for Public Good


Stefaan G. Verhulst at Project Syndicate: “After Hurricane Katrina struck New Orleans in 2005, the direct-mail marketing company Valassis shared its database with emergency agencies and volunteers to help improve aid delivery. In Santiago, Chile, analysts from Universidad del Desarrollo, ISI Foundation, UNICEF, and the GovLab collaborated with Telefónica, the city’s largest mobile operator, to study gender-based mobility patterns in order to design a more equitable transportation policy. And as part of the Yale University Open Data Access project, health-care companies Johnson & Johnson, Medtronic, and SI-BONE give researchers access to previously walled-off data from 333 clinical trials, opening the door to possible new innovations in medicine.

These are just three examples of “data collaboratives,” an emerging form of partnership in which participants exchange data for the public good. Such tie-ups typically involve public bodies using data from corporations and other private-sector entities to benefit society. But data collaboratives can help companies, too – pharmaceutical firms share data on biomarkers to accelerate their own drug-research efforts, for example. Data-sharing initiatives also have huge potential to improve artificial intelligence (AI). But they must be designed responsibly and take data-privacy concerns into account.

Understanding the societal and business case for data collaboratives, as well as the forms they can take, is critical to gaining a deeper appreciation of the potential and limitations of such ventures. The GovLab has identified over 150 data collaboratives spanning continents and sectors; they include companies such as Air France, Zillow, and Facebook. Our research suggests that such partnerships can create value in three main ways….(More)”.

Companies Collect a Lot of Data, But How Much Do They Actually Use?


Article by Priceonomics Data Studio: “For all the talk of how data is the new oil and the most valuable resource of any enterprise, there is a deep dark secret companies are reluctant to share — most of the data collected by businesses simply goes unused.

This unknown and unused data, known as dark data, comprises more than half the data collected by companies. Given that some estimates indicate that 7.5 septillion (7,700,000,000,000,000,000,000) gigabytes of data are generated every single day, not using most of it is a considerable issue.

In this article, we’ll look at this dark data: just how much of it is created by companies, why this data isn’t being analyzed, and what the costs and implications are of companies not using the majority of the data they collect.

Before diving into the analysis, it’s worth spending a moment clarifying what we mean by the term “dark data.” Gartner defines dark data as:

“The information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).”

To learn more about this phenomenon, Splunk commissioned a global survey of 1,300+ business leaders to better understand how much data they collect, and how much is dark. Respondents were from IT and business roles across various industries, and were located in Australia, China, France, Germany, Japan, the United States, and the United Kingdom. For the report, Splunk defines dark data as: “all the unknown and untapped data across an organization, generated by systems, devices and interactions.”

While the cost of storing data has decreased over time, the cost of saving septillions of gigabytes of wasted data is still significant. What’s more, during this time the strategic importance of data has increased as companies have found more and more uses for it. Given the cost of storage and the value of data, why does so much of it go unused?

The following chart shows the reasons why dark data isn’t currently being harnessed:

By a large margin, the number one reason given for not using dark data is that companies lack a tool to capture or analyze the data. Companies accumulate data from server logs, GPS networks, security tools, call records, web traffic and more. Companies track everything from digital transactions to the temperature of their server rooms to the contents of retail shelves. Most of this data lies in separate systems, is unstructured, and cannot be connected or analyzed.

Second, the data captured just isn’t good enough. You might have important customer information about a transaction, but it’s missing location or other important metadata because that information sits somewhere else or was never captured in a usable format.

Additionally, dark data exists because there is simply too much data out there and a lot of it is unstructured. The larger the dataset (or the less structured it is), the more sophisticated the tool required for analysis. Moreover, these kinds of datasets often require analysis by individuals with significant data science expertise, who are often in short supply.

The implications of the prevalence of dark data are vast. As a result of the data deluge, companies often don’t know where all the sensitive data is stored and can’t be confident they are complying with consumer data protection measures like GDPR. …(More)”.

How does Finland use health and social data for the public benefit?


Karolina Mackiewicz at ICT & Health: “…Better innovation opportunities, quicker access to comprehensive ready-combined data, smoother permit procedures needed for research – those are some of the benefits for society, academia or business announced by the Ministry of Social Affairs and Health of Finland when the Act on the Secondary Use of Health and Social Data was introduced.

It came into force on 1st of May 2019. According to the Finnish Innovation Fund SITRA, which was involved in the development of the legislation and carried out the pilot projects, it’s a ‘groundbreaking’ piece of legislation. It not only effectively introduces a one-stop-shop for data but it’s also one of the first, if not the first, implementations of the GDPR (the EU’s General Data Protection Regulation) for the secondary use of data in Europe.

The aim of the Act is “to facilitate the effective and safe processing and access to the personal social and health data for steering, supervision, research, statistics and development in the health and social sector”. A second objective is to guarantee an individual’s legitimate expectations as well as their rights and freedoms when processing personal data. In other words, the Ministry of Health promises that the Act will help eliminate the administrative burden in access to the data by the researchers and innovative businesses while respecting the privacy of individuals and providing conditions for the ethically sustainable way of using data….(More)”.

Introduction to Decision Intelligence


Blog post by Cassie Kozyrkov: “…Decision intelligence is a new academic discipline concerned with all aspects of selecting between options. It brings together the best of applied data science, social science, and managerial science into a unified field that helps people use data to improve their lives, their businesses, and the world around them. It’s a vital science for the AI era, covering the skills needed to lead AI projects responsibly and design objectives, metrics, and safety-nets for automation at scale.

Let’s take a tour of its basic terminology and concepts. The sections are designed to be friendly to skim-reading (and skip-reading too, that’s where you skip the boring bits… and sometimes skip the act of reading entirely).

What’s a decision?

Data are beautiful, but it’s decisions that are important. It’s through our decisions — our actions — that we affect the world around us.

We define the word “decision” to mean any selection between options by any entity, so the conversation is broader than MBA-style dilemmas (like whether to open a branch of your business in London).

In this terminology, labeling a photo as cat versus not-cat is a decision executed by a computer system, while figuring out whether to launch that system is a decision taken thoughtfully by the human leader (I hope!) in charge of the project.

What’s a decision-maker?

In our parlance, a “decision-maker” is not that stakeholder or investor who swoops in to veto the machinations of the project team, but rather the person who is responsible for decision architecture and context framing. In other words, a creator of meticulously-phrased objectives as opposed to their destroyer.

What’s decision-making?

Decision-making is a word that is used differently by different disciplines, so it can refer to:

  • taking an action when there were alternative options (in this sense it’s possible to talk about decision-making by a computer or a lizard).
  • performing the function of a (human) decision-maker, part of which is taking responsibility for decisions. Even though a computer system can execute a decision, it will not be called a decision-maker because it does not bear responsibility for its outputs — that responsibility rests squarely on the shoulders of the humans who created it.

Decision intelligence taxonomy

One way to approach learning about decision intelligence is to break it along traditional lines into its quantitative aspects (largely overlapping with applied data science) and qualitative aspects (developed primarily by researchers in the social and managerial sciences)….(More)”.


Blockchain and the General Data Protection Regulation


Report by the European Directorate-General for Parliamentary Research Services (EPRS): “Blockchain is a much-discussed instrument that, according to some, promises to inaugurate a new era of data storage and code-execution, which could, in turn, stimulate new business models and markets. The precise impact of the technology is, of course, hard to anticipate with certainty, in particular as many remain sceptical of blockchain’s potential impact. In recent times, there has been much discussion in policy circles, academia and the private sector regarding the tension between blockchain and the European Union’s General Data Protection Regulation (GDPR). Indeed, many of the points of tension between blockchain and the GDPR are due to two overarching factors.

First, the GDPR is based on an underlying assumption that in relation to each personal data point there is at least one natural or legal person – the data controller – whom data subjects can address to enforce their rights under EU data protection law. These data controllers must comply with the GDPR’s obligations. Blockchains, however, are distributed databases that often seek to achieve decentralisation by replacing a unitary actor with many different players. The lack of consensus as to how (joint-)controllership ought to be defined hampers the allocation of responsibility and accountability.

Second, the GDPR is based on the assumption that data can be modified or erased where necessary to comply with legal requirements, such as Articles 16 and 17 GDPR. Blockchains, however, render the unilateral modification of data purposefully onerous in order to ensure data integrity and to increase trust in the network. Furthermore, blockchains underline the challenges of adhering to the requirements of data minimisation and purpose limitation in the current form of the data economy.

This study examines the European data protection framework and applies it to blockchain technologies so as to document these tensions. It also highlights the fact that blockchain may help further some of the GDPR’s objectives. Concrete policy options are developed on the basis of this analysis….(More)”

Aliens in Europe. An open approach to involve more people in invasive species detection


Paper by Sven Schade et al: “Amplified by the phenomenon of globalisation, such as increased human mobility and the worldwide shipping of goods, we observe an increasing spread of animals and plants outside their native habitats. A few of these ‘aliens’ have negative impacts on their environment, including threats to local biodiversity, agricultural productivity, and human health. Our work addresses these threats, particularly within the European Union (EU), where a related legal framework has been established. We follow an open and participatory approach that allows more people to share their experiences of invasive alien species (IAS) in their surroundings. Over the past three years, we developed a mobile phone application, together with the underlying data management and validation infrastructure, which allows smartphone users to report a selected list of IAS. We put quality assurance and data integration mechanisms into place that allow the uptake of information into existing official systems in order to make it accessible to the relevant policy-making at EU level.

This article summarises our scientific methodology and technical approach, explains our decisions, and provides an outlook to the future of IAS monitoring involving citizens and utilising the latest technological advancements. Last but not least we emphasise software design for reuse, within the domain of IAS monitoring, but also for supporting citizen science apps more generally. Whereas much could already be achieved, many scientific, technical and organizational challenges still remain to be addressed before data can be seamlessly shared and integrated. Here, we particularly highlight issues that emerge in an international setting, which involves many different stakeholders….(More)”.

How technology can enable a more sustainable agriculture industry


Matt High at CSO: “…The sector also faces considerable pressure in terms of its transparency, largely driven by shifting consumer preferences for responsibly sourced and environmentally-friendly goods. The UK, for example, has seen shoppers transition away from typical agricultural commodities towards ‘free-from’ or alternative options that combine health, sustainability and quality.

It means that farmers worldwide must work harder and smarter in embedding corporate social responsibility (CSR) practices into their operations. Davis, who through Anthesis delivers financially driven sustainability strategies, strongly believes that sustainability is no longer a choice. “The agricultural sector is intrinsic to a wide range of global systems, societies and economies,” he says, adding that those organisations that do not embed sustainability best practice into their supply chains will face “increasing risk of price volatility, security of supply, commodity shortages, fraud and uncertainty.” To counter this, he urges businesses to develop CSR founded on a core set of principles that enable sustainable practices to be successfully adopted at a pace and scale that mitigates those risks discussed.

Data is proving a particularly useful tool in this regard. Take the Cool Farm Tool, for example, which is a global, free-to-access online greenhouse gas (GHG), water and biodiversity footprint calculator used by farmers in more than 115 countries worldwide to enable effective management of critical on-farm sustainability challenges. Member organisations such as Pepsi, Tesco and Danone aggregate their supply chain data to report total agricultural footprint against key sustainability metrics – outputs from which are used to share knowledge and best practice on carbon and water reduction strategies….(More)”.

The Data Protection Officer Handbook


Handbook by Douwe Korff and Marie Georges: “This Handbook was prepared for and is used in the EU-funded “T4DATA” training-of-trainers programme. Part I explains the history and development of European data protection law and provides an overview of European data protection instruments including the Council of Europe Convention and its “Modernisation” and the various EU data protection instruments relating to Justice and Home Affairs, the CFSP and the EU institutions, before focusing on the GDPR in Part II. The final part (Part III) consists of detailed practical advice on the various tasks of the Data Protection Officer now institutionalised by the GDPR. Although produced for the T4DATA programme that focusses on DPOs in the public sector, it is hoped that the Handbook will be useful also to anyone else interested in the application of the GDPR, including DPOs in the private sector….(More)”.

Bringing machine learning to the masses


Matthew Hutson at Science: “Artificial intelligence (AI) used to be the specialized domain of data scientists and computer programmers. But companies such as Wolfram Research, which makes Mathematica, are trying to democratize the field, so scientists without AI skills can harness the technology for recognizing patterns in big data. In some cases, they don’t need to code at all. Insights are just a drag-and-drop away. One of the latest systems is software called Ludwig, first made open-source by Uber in February and updated last week. Uber used Ludwig for projects such as predicting food delivery times before releasing it publicly. At least a dozen startups are using it, plus big companies such as Apple, IBM, and Nvidia. And scientists: Tobias Boothe, a biologist at the Max Planck Institute of Molecular Cell Biology and Genetics in Dresden, Germany, uses it to visually distinguish thousands of species of flatworms, a difficult task even for experts. To train Ludwig, he just uploads images and labels….(More)”.

Exploring Digital Ecosystems: Organizational and Human Challenges


Proceedings edited by Alessandra Lazazzara, Francesca Ricciardi and Stefano Za: “The recent surge of interest in digital ecosystems is not only transforming the business landscape, but also poses several human and organizational challenges. Due to the pervasive effects of the transformation on firms and societies alike, both scholars and practitioners are interested in understanding the key mechanisms behind digital ecosystems, their emergence and evolution. In order to disentangle such factors, this book presents a collection of research papers focusing on the relationship between technologies (e.g. digital platforms, AI, infrastructure) and behaviours (e.g. digital learning, knowledge sharing, decision-making). Moreover, it provides critical insights into how digital ecosystems can shape value creation and benefit various stakeholders. The plurality of perspectives offered makes the book particularly relevant for users, companies, scientists and governments. The content is based on a selection of the best papers – original double-blind peer-reviewed contributions – presented at the annual conference of the Italian chapter of the AIS, which took place in Pavia, Italy in October 2018….(More)”.