Companies Collect a Lot of Data, But How Much Do They Actually Use?


Article by Priceonomics Data Studio: “For all the talk of how data is the new oil and the most valuable resource of any enterprise, there is a deep dark secret companies are reluctant to share — most of the data collected by businesses simply goes unused.

This unknown and unused data, known as dark data, comprises more than half of the data collected by companies. Given that some estimates indicate that 7.5 septillion (7,700,000,000,000,000,000,000) gigabytes of data are generated every single day, not using most of it is a considerable issue.

In this article, we’ll look at this dark data: just how much of it is created by companies, why this data isn’t being analyzed, and what the costs and implications are of companies not using the majority of the data they collect.

Before diving into the analysis, it’s worth spending a moment clarifying what we mean by the term “dark data.” Gartner defines dark data as:

“The information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).”

To learn more about this phenomenon, Splunk commissioned a global survey of 1,300+ business leaders to better understand how much data they collect, and how much of it is dark. Respondents were from IT and business roles across various industries, and were located in Australia, China, France, Germany, Japan, the United States, and the United Kingdom. For the report, Splunk defines dark data as: “all the unknown and untapped data across an organization, generated by systems, devices and interactions.”

While the cost of storing data has decreased over time, the cost of saving septillions of gigabytes of wasted data is still significant. What’s more, during this time the strategic importance of data has increased as companies have found more and more uses for it. Given the cost of storage and the value of data, why does so much of it go unused?

The following chart shows the reasons why dark data isn’t currently being harnessed:

By a large margin, the number one reason given for not using dark data is that companies lack a tool to capture or analyze the data. Companies accumulate data from server logs, GPS networks, security tools, call records, web traffic and more. Companies track everything from digital transactions to the temperature of their server rooms to the contents of retail shelves. Most of this data lies in separate systems, is unstructured, and cannot be connected or analyzed.

Second, the data captured just isn’t good enough. You might have important customer information about a transaction, but it’s missing location or other important metadata because that information sits somewhere else or was never captured in a usable format.

Additionally, dark data exists because there is simply too much data out there and a lot of it is unstructured. The larger the dataset (or the less structured it is), the more sophisticated the tool required for analysis. These kinds of datasets often require analysis by individuals with significant data science expertise, who are often in short supply.

The implications of the prevalence of dark data are vast. As a result of the data deluge, companies often don’t know where all their sensitive data is stored and can’t be confident they are complying with consumer data protection measures like GDPR. …(More)”.

Exploring the Smart City Indexes and the Role of Macro Factors for Measuring Cities Smartness


María Verónica Alderete in Social Indicators Research: “The main objective of this paper is to discuss the key factors involved in the definition of smart city indexes. Although recent literature has explored the smart city subject, it is of concern whether macro ICT factors should also be considered for assessing the technological innovation of a city. To achieve this goal, firstly a literature review of smart city is provided. An analysis of the smart city concept together with a theoretical framework based on the knowledge society and the Quintuple Helix innovation model are included. Secondly, the study analyzes some smart city cases in developed and developing countries. Thirdly, it describes, criticizes and compares some well-known smart city indexes. Lastly, the empirical literature is explored to detect if there are studies proposing changes in smart city indexes or methodologies to consider the macro level variables. It results that cities at the top of the indexes rankings are from developed countries. On the other hand, most cities at the bottom of the rankings are from developing or less developed countries. As a result, the paper finds that the ICT development of smart cities depends both on the cities’ characteristics and features, and on macro-technological factors. Secondly, there are few papers on the subject that include macro or country factors, and most of them are revisions of the literature or case studies. There is a lack of studies discussing the indexes’ methodologies. This paper provides some guidelines to build one….(More)”.

Stop the Open Data Bus, We Want to Get Off


Paper by Chris Culnane, Benjamin I. P. Rubinstein, and Vanessa Teague: “The subject of this report is the re-identification of individuals in the Myki public transport dataset released as part of the Melbourne Datathon 2018. We demonstrate the ease with which we were able to re-identify ourselves, our co-travellers, and complete strangers; our analysis raises concerns about the nature and granularity of the data released, in particular the ability to identify vulnerable or sensitive groups…

This work highlights how a large number of passengers could be re-identified in the 2018 Myki data release, with detailed discussion of specific people. The implications of re-identification are potentially serious: ex-partners, one-time acquaintances, or other parties can determine places of home, work, times of travel, co-travelling patterns—presenting risk to vulnerable groups in particular…

In 2018 the Victorian Government released a large passenger-centric transport dataset to a data science competition—the 2018 Melbourne Datathon. Access to the data was unrestricted, with a URL provided on the datathon’s website to download the complete dataset from an Amazon S3 Bucket. Over 190 teams continued to analyse the data through the two-month competition period. The data consisted of touch on and touch off events for the Myki smart card ticketing system used throughout the state of Victoria, Australia. With such data, contestants would be able to apply retrospective analyses on an entire public transport system, explore the suitability of predictive models, etc.

The Myki ticketing system is used across Victorian public transport: on trains, buses and trams. The dataset was a longitudinal dataset, consisting of touch on and touch off events from Week 27 in 2015 through to Week 26 in 2018. Each event contained a card identifier (cardId; not the actual card number), the card type, the time of the touch on or off, and various location information, for example a stop ID or route ID, along with other fields which we omit here for brevity. Events could be indexed by the cardId and as such, all the events associated with a single card could be retrieved. There are a total of 15,184,336 cards in the dataset—more than twice the 2018 population of Victoria. It appears that all touch on and off events for metropolitan trains and trams have been included, though other forms of transport such as intercity trains and some buses are absent. In total there are nearly 2 billion touch on and off events in the dataset.

No information was provided as to the de-identification that was performed on the dataset. Our analysis indicates that little to no de-identification took place on the bulk of the data, as will become evident in Section 3. The exception is the cardId, which appears to have been mapped in some way from the Myki Card Number. The exact mapping has not been discovered, although concerns remain as to its security effectiveness….(More)”.
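The mechanics the authors describe are easy to sketch. The following is a minimal illustration (with hypothetical field names and toy data, not the actual Myki schema) of why event-level data keyed by a card identifier enables re-identification: once all events for a card can be retrieved together, an attacker who knows just a few of a target’s trips can filter 15 million cards down to one.

```python
from collections import defaultdict

# Hypothetical touch-on/off events in the style described above:
# (card_id, timestamp, stop_id). Field names and values are
# illustrative only.
events = [
    ("c1", "2018-03-05T08:01", "stop_12"),
    ("c1", "2018-03-05T17:32", "stop_47"),
    ("c2", "2018-03-05T08:01", "stop_12"),
    ("c1", "2018-03-06T08:03", "stop_12"),
]

# Index all events by card identifier: one lookup recovers a
# card's entire travel history for the release window.
by_card = defaultdict(list)
for card_id, ts, stop in events:
    by_card[card_id].append((ts, stop))

def matching_cards(known_trips):
    """Return the cards whose history contains every trip the
    attacker observed (e.g. by once travelling with the target)."""
    return [c for c, trips in by_card.items()
            if all(t in trips for t in known_trips)]

# Two observed trips already single out card c1.
print(matching_cards([("2018-03-05T08:01", "stop_12"),
                      ("2018-03-06T08:03", "stop_12")]))  # ['c1']
```

The point of the sketch is that pseudonymising the card number alone (the only de-identification the authors found evidence of) does nothing to prevent this query: the linkage across events, not the identifier itself, is what re-identifies people.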

Datafication and accountability in public health


Introduction to a special issue of Social Studies of Science by Klaus Hoeyer, Susanne Bauer, and Martyn Pickersgill: “In recent years and across many nations, public health has become subject to forms of governance that are said to be aimed at establishing accountability. In this introduction to a special issue, From Person to Population and Back: Exploring Accountability in Public Health, we suggest opening up accountability assemblages by asking a series of ostensibly simple questions that inevitably yield complicated answers: What is counted? What counts? And to whom, how and why does it count? Addressing such questions involves staying attentive to the technologies and infrastructures through which data come into being and are made available for multiple political agendas. Through a discussion of public health, accountability and datafication we present three key themes that unite the various papers as well as illustrate their diversity….(More)”.

Governance sinkholes


Blog post by Geoff Mulgan: “Governance sinkholes appear when shifts in technology, society and the economy throw up the need for new arrangements. Each industrial revolution has created many governance sinkholes – and prompted furious innovation to fill them. The fourth industrial revolution will be no different. But most governments are too distracted to think about what to do to fill these holes, let alone to act. This blog sets out my diagnosis – and where I think the most work is needed to design new institutions….

It’s not too hard to get a map of the fissures and gaps – and to see where governance is needed but is missing. There are all too many of these now.

Here are a few examples. One is long-term care, currently missing adequate financing, regulation, information and navigation tools, despite its huge and growing significance. The obvious contrast is with acute healthcare, which, for all its problems, is rich in institutions and governance.

A second example is lifelong learning and training. Again, there is a striking absence of effective institutions to provide funding, navigation, policy and problem solving, and again, the contrast with the institution-rich fields of primary, secondary and tertiary education is striking. The position on welfare is similar, as is the absence of institutions fit for purpose in supporting people in precarious work.

I’m particularly interested in another kind of sinkhole: the absence of the right institutions to handle data and knowledge – at global, national and local levels – now that these dominate the economy, and much of daily life. In field after field, there are huge potential benefits to linking data sets and connecting artificial and human intelligence to spot patterns or prevent problems. But we lack any institutions with either the skills or the authority to do this well, and in particular to think through the trade-offs between the potential benefits and the potential risks….(More)”.

What the Hack? – Towards a Taxonomy of Hackathons


Paper by Christoph Kollwitz and Barbara Dinter: “In order to master the digital transformation and to survive in global competition, companies face the challenge of improving transformation processes, such as innovation processes. However, the design of these processes poses a challenge, as the related knowledge is still largely in its infancy. A popular trend since the mid-2000s is collaborative development events, so-called hackathons, where people with different professional backgrounds work collaboratively on development projects for a defined period. While hackathons are a widespread phenomenon in practice and many field reports and individual observations exist, there is still a lack of holistic and structured representations of the new phenomenon in the literature.

The paper at hand aims to develop a taxonomy of hackathons in order to illustrate their nature and underlying characteristics. For this purpose, a systematic literature review is combined with existing taxonomies or taxonomy-like artifacts (e.g. morphological boxes, typologies) from similar research areas in an iterative taxonomy development process. The results contribute to an improved understanding of the phenomenon hackathon and allow the more effective use of hackathons as a new tool in organizational innovation processes. Furthermore, the taxonomy provides guidance on how to apply hackathons for organizational innovation processes….(More)”.

Governing Complexity: Analyzing and Applying Polycentricity


Book edited by Andreas Thiel, William A. Blomquist, and Dustin E. Garrick: “There has been a rapid expansion of academic interest and publications on polycentricity. In the contemporary world, nearly all governance situations are polycentric, but people are not necessarily used to thinking this way. Governing Complexity provides an updated explanation of the concept of polycentric governance. The editors provide examples of it in contemporary settings involving complex natural resource systems, as well as a critical evaluation of the utility of the concept. With contributions from leading scholars in the field, this book makes the case that polycentric governance arrangements exist and it is possible for polycentric arrangements to perform well, persist for long periods, and adapt. Whether they actually function well, persist, or adapt depends on multiple factors that are reviewed and discussed, both theoretically and with examples from actual cases….(More)”.

After Technopoly


Alan Jacobs at the New Atlantis: “Technocratic solutionism is dying. To replace it, we must learn again the creation and reception of myth….
What Neil Postman called “technopoly” may be described as the universal and virtually inescapable rule of our everyday lives by those who make and deploy technology, especially, in this moment, the instruments of digital communication. It is difficult for us to grasp what it’s like to live under technopoly, or how to endure or escape or resist the regime. These questions may best be approached by drawing on a handful of concepts meant to describe a slightly earlier stage of our common culture.

First, following on my earlier essay in these pages, “Wokeness and Myth on Campus” (Summer/Fall 2017), I want to turn again to a distinction by the Polish philosopher Leszek Kołakowski between the “technological core” of culture and the “mythical core” — a distinction he believed is essential to understanding many cultural developments.

“Technology” for Kołakowski is something broader than we usually mean by it. It describes a stance toward the world in which we view things around us as objects to be manipulated, or as instruments for manipulating our environment and ourselves. This is not necessarily meant in a negative sense; some things ought to be instruments — the spoon I use to stir my soup — and some things need to be manipulated — the soup in need of stirring. Besides tools, the technological core of culture includes also the sciences and most philosophy, as those too are governed by instrumental, analytical forms of reasoning by which we seek some measure of control.

By contrast, the mythical core of culture is that aspect of experience that is not subject to manipulation, because it is prior to our instrumental reasoning about our environment. Throughout human civilization, says Kołakowski, people have participated in myth — they may call it “illumination” or “awakening” or something else — as a way of connecting with “nonempirical unconditioned reality.” It is something we enter into with our full being, and all attempts to describe the experience in terms of desire, will, understanding, or literal meaning are ways of trying to force the mythological core into the technological core by analyzing and rationalizing myth and pressing it into a logical order. This is why the two cores are always in conflict, and it helps to explain why rational argument is often a fruitless response to people acting from the mythical core….(More)”.

How does Finland use health and social data for the public benefit?


Karolina Mackiewicz at ICT & Health: “…Better innovation opportunities, quicker access to comprehensive ready-combined data, smoother permit procedures needed for research – those are some of the benefits for society, academia or business announced by the Ministry of Social Affairs and Health of Finland when the Act on the Secondary Use of Health and Social Data was introduced.

It came into force on 1st of May 2019. According to the Finnish Innovation Fund SITRA, which was involved in the development of the legislation and carried out the pilot projects, it’s a ‘groundbreaking’ piece of legislation. It not only effectively introduces a one-stop-shop for data, but it’s also one of the first, if not the first, implementations of the GDPR (the EU’s General Data Protection Regulation) for the secondary use of data in Europe. 

The aim of the Act is “to facilitate the effective and safe processing and access to the personal social and health data for steering, supervision, research, statistics and development in the health and social sector”. A second objective is to guarantee an individual’s legitimate expectations as well as their rights and freedoms when processing personal data. In other words, the Ministry of Health promises that the Act will help eliminate the administrative burden in access to the data by the researchers and innovative businesses while respecting the privacy of individuals and providing conditions for the ethically sustainable way of using data….(More)”.

Introduction to Decision Intelligence


Blog post by Cassie Kozyrkov: “…Decision intelligence is a new academic discipline concerned with all aspects of selecting between options. It brings together the best of applied data science, social science, and managerial science into a unified field that helps people use data to improve their lives, their businesses, and the world around them. It’s a vital science for the AI era, covering the skills needed to lead AI projects responsibly and design objectives, metrics, and safety-nets for automation at scale.

Let’s take a tour of its basic terminology and concepts. The sections are designed to be friendly to skim-reading (and skip-reading too, that’s where you skip the boring bits… and sometimes skip the act of reading entirely).

What’s a decision?

Data are beautiful, but it’s decisions that are important. It’s through our decisions — our actions — that we affect the world around us.

We define the word “decision” to mean any selection between options by any entity, so the conversation is broader than MBA-style dilemmas (like whether to open a branch of your business in London).

In this terminology, labeling a photo as cat versus not-cat is a decision executed by a computer system, while figuring out whether to launch that system is a decision taken thoughtfully by the human leader (I hope!) in charge of the project.
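Kozyrkov’s broad sense of “decision” can be made concrete with a toy sketch (the threshold value and function names here are hypothetical, chosen only for illustration): a one-line rule is a decision executed by a computer, while choosing the rule’s threshold is a decision taken by the human in charge.

```python
# A "decision" in the broad sense used above: any selection between
# options by any entity. Here a computer system selects between the
# labels "cat" and "not-cat" based on a model score. The threshold
# itself is a choice made by a human decision-maker, who bears the
# responsibility for the system's outputs.
CAT_THRESHOLD = 0.5  # hypothetical value, set by the project lead

def label_photo(model_score: float) -> str:
    # The machine executes the decision; it does not own it.
    return "cat" if model_score >= CAT_THRESHOLD else "not-cat"

print(label_photo(0.91))  # "cat"
print(label_photo(0.12))  # "not-cat"
```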

What’s a decision-maker?

In our parlance, a “decision-maker” is not that stakeholder or investor who swoops in to veto the machinations of the project team, but rather the person who is responsible for decision architecture and context framing. In other words, a creator of meticulously-phrased objectives as opposed to their destroyer.

What’s decision-making?

Decision-making is a word that is used differently by different disciplines, so it can refer to:

  • taking an action when there were alternative options (in this sense it’s possible to talk about decision-making by a computer or a lizard).
  • performing the function of a (human) decision-maker, part of which is taking responsibility for decisions. Even though a computer system can execute a decision, it will not be called a decision-maker because it does not bear responsibility for its outputs — that responsibility rests squarely on the shoulders of the humans who created it.

Decision intelligence taxonomy

One way to approach learning about decision intelligence is to break it along traditional lines into its quantitative aspects (largely overlapping with applied data science) and qualitative aspects (developed primarily by researchers in the social and managerial sciences)….(More)”.