Index: Secondary Uses of Personal Data


By Alexandra Shaw, Andrew Zahuranec, Andrew Young, Stefaan Verhulst

The Living Library Index, inspired by the Harper’s Index, provides important statistics and highlights global trends in governance innovation. This installment focuses on public perceptions regarding secondary uses of personal data (the re-use of data initially collected for a different purpose). It summarizes societal perspectives toward personal data usage, sharing, and control. It is not meant to be comprehensive; rather, it is intended to illustrate conflicting, and often confusing, attitudes toward the re-use of personal data.

Please share any additional, illustrative statistics on data, or other issues at the nexus of technology and governance, with us at [email protected]

Data ownership and control 

  • Percentage of Americans who say it is “very important” they control information collected about them: 74% – 2016
  • Americans who think that today’s privacy laws are not good enough at protecting people’s privacy online: 68% – 2016
  • Americans who say they have “a lot” of control over how companies collect and use their information: 9% – 2015
  • In a survey of 507 online shoppers, the percentage of respondents who indicated they don’t want brands tracking their location: 62% – 2015
  • In a survey of 507 online shoppers, the percentage who “prefer offers that are targeted to where they are and what they are doing:” 60% – 2015
  • Percentage of surveyed American consumers willing to provide data to corporations under the following conditions:
    • “Data about my social concerns to better connect me with non-profit organizations that advance those causes:” 19% – 2018
    • “Data about my DNA to help me uncover any hereditary illnesses:” 21% – 2018
    • “Data about my interests and hobbies to receive relevant information and offers from online sellers:” 32% – 2018
    • “Data about my location to help me find the fastest route to my destination:” 40% – 2018
    • “My email address to receive exclusive offers from my favorite brands:” 56% – 2018

Consumer Attitudes 

  • Academic study participants willing to donate personal data to research if it could lead to public good: 60% – 2014
  • Academic study participants willing to share personal data for research purposes in the interest of public good: 25% – 2014
  • Percentage who expect companies to “treat [them] like an individual, not as a member of some segment like ‘millennials’ or ‘suburban mothers:’” 74% – 2018 
    • Percentage who believe that brands should understand a “consumer’s individual situation (e.g. marital status, age, location, etc.)” when they’re being marketed to: 70% – 2018
    • Percentage who are “more annoyed” by companies now compared to 5 years ago: 40% – 2018
    • Percentage worried their data is shared across companies without their permission: 88% – 2018
    • Percentage worried about a brand’s ability to track their behavior while on the brand’s website or app: 75% – 2018
  • Consumers globally who expect brands to anticipate needs before they arise: 33% – 2018
  • Surveyed residents of the United Kingdom who identify as:
    • “Data pragmatists” willing to share personal data “under the right circumstances:” 58% – 2017
    • “Fundamentalists,” who would not share personal data for better services: 24% – 2017
    • Respondents who think data sharing is part of participating in the modern economy: 62% – 2018
    • Respondents who believe that data sharing benefits enterprises more than consumers: 75% – 2018
    • People who want more control over their data that enterprises collect: 84% – 2018
    • Percentage “unconcerned” about personal data protection: 18% – 2018
  • Percentage of Americans who think that government should do more to regulate large technology companies: 55% – 2018
  • Registered American voters who trust broadband companies with personal data “a great deal” or “a fair amount”: 43% – 2017
  • Americans who report experiencing a major data breach: 64% – 2017
  • Percentage of Americans who believe that their personal data is less secure than it was 5 years ago: 49% – 2019
  • Percentage of surveyed Americans who consider trust in a company an important factor for sharing data: 54% – 2018

Convenience

Microsoft’s 2015 Consumer Data Value Exchange Report attempts to understand consumer attitudes toward the exchange of personal data across the global markets of Australia, Brazil, Canada, Colombia, Egypt, Germany, Kenya, Mexico, Nigeria, Spain, South Africa, the United Kingdom, and the United States. From its survey of 16,500 users, the report finds:

  • The most popular incentives for sharing data are: 
    • Cash rewards: 64% – 2015
    • Significant discounts: 49% – 2015
    • Streamlined processes: 29% – 2015
    • New ideas: 28% – 2015
  • Respondents who would prefer to see more ads to get new services: 34% – 2015
  • Respondents willing to share search terms for a service that enabled fewer steps to get things done: 70% – 2015 
  • Respondents willing to share activity data for such an improvement: 82% – 2015
  • Respondents willing to share their gender for “a service that inspires something new based on others like them:” 79% – 2015

A 2015 Pew Research Center survey presented Americans with several data-sharing scenarios related to convenience. Participants could respond “acceptable,” “it depends,” or “not acceptable” to each of the following scenarios:

  • Share health information to get access to personal health records and arrange appointments more easily:
    • Acceptable: 52% – 2015
    • It depends: 20% – 2015
    • Not acceptable: 26% – 2015
  • Share data for discounted auto insurance rates: 
    • Acceptable: 37% – 2015
    • It depends: 16% – 2015
    • Not acceptable: 45% – 2015
  • Share data for free social media services: 
    • Acceptable: 33% – 2015
    • It depends: 15% – 2015
    • Not acceptable: 51% – 2015
  • Share data on smart thermostats for cheaper energy bills: 
    • Acceptable: 33% – 2015
    • It depends: 15% – 2015
    • Not acceptable: 51% – 2015

Other Studies

  • Surveyed banking and insurance customers who would exchange personal data for:
    • Targeted auto insurance premiums: 64% – 2019
    • Better life insurance premiums for healthy lifestyle choices: 52% – 2019 
  • Surveyed banking and insurance customers willing to share data specifically related to income, location and lifestyle habits to: 
    • Secure faster loan approvals: 81.3% – 2019
    • Lower the chances of injury or loss: 79.7% – 2019 
    • Receive discounts on non-insurance products or services: 74.6% – 2019
    • Receive text alerts related to banking account activity: 59.8% – 2019 
    • Get saving advice based on spending patterns: 56.6% – 2019
  • In a survey of over 7,000 members of the public around the globe, respondents indicated:
    • “Smartphone and tablet apps used for navigation, chat, and news that can access your contacts, photos, and browsing history” are “creepy:” 16% – 2016
    • Emailing a friend about a trip to Paris and receiving advertisements for hotels, restaurants and excursions in Paris is “creepy:” 32% – 2016
    • A free fitness-tracking device that monitors your well-being and sends a monthly report to you and your employer is “creepy:” 45% – 2016
    • A telematics device that allows emergency services to track your vehicle is “creepy:” 78% – 2016
  • Percentage of British residents who do not want to work with virtual agents of any kind: 48% – 2017
  • Americans who disagree that “if companies give me a discount, it is a fair exchange for them to collect information about me without my knowing”: 91% – 2015

Data Brokers, Intermediaries, and Third Parties 

  • Americans who consider it acceptable for a grocery store to offer a free loyalty card in exchange for selling their shopping data to third parties: 47% – 2016
  • Percentage of people who know that “searches, site visits and purchases” are reviewed without consent: 55% – 2015
  • Percentage of people who wanted companies to ask them for permission before collecting their personal information and selling that data to intermediaries: 93% – 1991
    • Percentage of Americans who “would be very concerned if the company at which their data were stored sold it to another party:” 90% – 2008
    • Percentage of Americans who think it’s unacceptable for their grocery store to share their shopping data with third parties in exchange for a free loyalty card: 32% – 2016
  • Percentage of Americans who think that government needs to do more to regulate advertisers: 64% – 2016
    • Percentage of Americans who “want to have control over what marketers can learn about” them online: 84% – 2015
    • Percentage of Americans who think they have no power over marketers to figure out what they’re learning about them: 58% – 2015
  • Registered American voters who are “somewhat uncomfortable” or “very uncomfortable” with companies like Internet service providers or websites using personal data to recommend stories, articles, or videos: 56% – 2017
  • Registered American voters who are “somewhat uncomfortable” or “very uncomfortable” with companies like Internet service providers or websites selling their personal information to third parties for advertising purposes: 64% – 2017

Personal Health Data

The Robert Wood Johnson Foundation’s 2014 Health Data Exploration Project Report analyzes attitudes about personal health data (PHD). PHD is self-tracking data related to health that is traceable through wearable devices and sensors. The three major stakeholder groups involved in using PHD for public good are users, companies that track the users’ data, and researchers. 

  • Overall Respondents:
    • Percentage who believe anonymity is “very” or “extremely” important: 67% – 2014
    • Percentage who “probably would” or “definitely would” share their personal data with researchers: 78% – 2014
    • Percentage who believe that they own—or should own—all the data about them, even when it is indirectly collected: 54% – 2014
    • Percentage who think they share or ought to share ownership with the company: 30% – 2014
    • Percentage who think companies alone own or should own all the data about them: 4% – 2014
    • Percentage for whom data ownership “is not something I care about”: 13% – 2014
    • Percentage who indicated they wanted to own their data: 75% – 2014 
    • Percentage who would share data only if “privacy were assured:” 68% – 2014
    • Percentage who would supply data regardless of privacy or compensation: 27% – 2014
      • Percentage of participants who mentioned privacy, anonymity, or confidentiality when asked under what conditions they would share their data: 63% – 2014
      • Percentage who would be “more” or “much more” likely to share data for compensation: 56% – 2014
      • Percentage who indicated compensation would make no difference: 38% – 2014
      • Percentage opposed to commercial or profit-making use of their data: 13% – 2014
    • Percentage of people who would only share personal health data with a guarantee of:
      • Privacy: 57% – 2014
      • Anonymization: 90% – 2014
  • Surveyed Researchers: 
    • Percentage who agree or strongly agree that self-tracking data would help provide more insights in their research: 89% – 2014
    • Percentage who say PHD could answer questions that other data sources could not: 95% – 2014
    • Percentage who have used public datasets: 57% – 2014
    • Percentage who have paid for data for research: 19% – 2014
    • Percentage who have used self-tracking data before for research purposes: 46% – 2014
    • Percentage who have worked with application, device, or social media companies: 23% – 2014
    • Percentage who “somewhat disagree” or “strongly disagree” that there are barriers to using self-tracking data in their research that cannot be overcome: 82% – 2014

SOURCES: 

“2019 Accenture Global Financial Services Consumer Study: Discover the Patterns in Personality”, Accenture, 2019. 

“Americans’ Views About Data Collection and Security”, Pew Research Center, 2015. 

“Data Donation: Sharing Personal Data for Public Good?”, ResearchGate, 2014.

“Data privacy: What the consumer really thinks”, Acxiom, 2018.

“Exclusive: Public wants Big Tech regulated”, Axios, 2018.

“Consumer data value exchange”, Microsoft, 2015.

“Crossing the Line: Staying on the right side of consumer privacy”, KPMG International Cooperative, 2016.

“How do you feel about the government sharing our personal data? – livechat”, The Guardian, 2017. 

“Personal data for public good: using health information in medical research”, The Academy of Medical Sciences, 2006. 

“Personal Data for the Public Good: New Opportunities to Enrich Understanding of Individual and Population Health”, Robert Wood Johnson Foundation, Health Data Exploration Project, Calit2, UC Irvine and UC San Diego, 2014. 

“Pew Internet and American Life Project: Cloud Computing Raises Privacy Concerns”, Pew Research Center, 2008. 

“Poll: Little Trust That Tech Giants Will Keep Personal Data Private”, Morning Consult & Politico, 2017. 

“Privacy and Information Sharing”, Pew Research Center, 2016. 

“Privacy, Data and the Consumer: What US Thinks About Sharing Data”, MarTech Advisor, 2018. 

“Public Opinion on Privacy”, Electronic Privacy Information Center, 2019. 

“Selligent Marketing Cloud Study Finds Consumer Expectations and Marketer Challenges are Rising in Tandem”, Selligent Marketing Cloud, 2018. 

“The Data-Sharing Disconnect: The Impact of Context, Consumer Trust, and Relevance in Retail Marketing”, Boxever, 2015.

“Microsoft Research reveals understanding gap in the brand-consumer data exchange”, Microsoft Research, 2015.

“Survey: 58% will share personal data under the right circumstances”, Marketing Land: Third Door Media, 2019. 

“The state of privacy in post-Snowden America”, Pew Research Center, 2016. 

“The Tradeoff Fallacy: How Marketers Are Misrepresenting American Consumers And Opening Them Up to Exploitation”, University of Pennsylvania, 2015.

Great Policy Successes


Book edited by Mallory Compton and Paul ’t Hart: “With so much media and political criticism of their shortcomings and failures, it is easy to overlook the fact that many governments work pretty well much of the time. Great Policy Successes turns the spotlight on instances of public policy that are remarkably successful. It develops a framework for identifying and assessing policy successes, paying attention not just to their programmatic outcomes but also to the quality of the processes by which policies are designed and delivered, the level of support and legitimacy they attain, and the extent to which successful performance endures over time. The bulk of the book is then devoted to 15 detailed case studies of striking policy successes from around the world, including Singapore’s public health system, Copenhagen and Melbourne’s rise from stilted backwaters to the highly liveable and dynamic urban centres they are today, Brazil’s Bolsa Familia poverty relief scheme, the US’s GI Bill, and Germany’s breakthrough labour market reforms of the 2000s. Each case is set in context, its main actors are introduced, key events and decisions are described, the assessment framework is applied to gauge the nature and level of its success, key contributing factors to success are identified, and potential lessons and future challenges are identified. Purposefully avoiding the kind of heavy theorizing that characterizes many accounts of public policy processes, each case is written in an accessible and narrative style ideally suited for classroom use in conjunction with mainstream textbooks on public policy design, implementation, and evaluation….(More)”.

GovChain


Introduction to Report by Tom Rodden: “This report addresses the most discussed digital technologies of the last few years. There has been considerable debate about the potential benefits and threats that arise from the use of Distributed Ledger Technologies. What is clear from these debates is that blockchain is an important technology that has the potential to transform a range of sectors. The importance of Distributed Ledger Technology was identified and discussed in a 2016 report produced by Sir Mark Walport, the UK Government’s Chief Scientific Adviser at the time.

The report provided recommendations for the use of blockchain to meet national needs, and to ensure the UK’s competitiveness in the global arena. The report outlined the need for a broad response that spanned the public and private sector, whilst also recognising the need for leadership in the development and deployment of blockchain technologies.

This report provides an update and reflection on the use of blockchain technologies by Governments and Public Sector bodies around the world. Much has happened since 2016 and this report provides a reminder of the importance of Distributed Ledger Technologies for the public sector, and the various orientations of blockchains adopted across the globe. The team have mapped the various regulatory and policy responses to blockchain, and cryptocurrencies more broadly. This mapping not only reveals a varying degree of friendliness towards blockchain, it also highlights the challenges involved in implementing Distributed Ledger Technology systems in the public sector.

Distributed Ledger Technologies are an important technology for the public sector, albeit there exists a number of policy implications. If we are to show leadership in the use of blockchain and its application it is imperative that we are aware of both its benefits and limitations; and the issues that need to be addressed to ensure we gain value from the use of Distributed Ledger Technologies. This report captures the public sector experiences of blockchain technologies across the globe, and also documents the issues raised and the various responses. This is a hugely informative and useful document for those who seek to make use of blockchains in the public sector….(More)”.

Towards “Government as a Platform”? Preliminary Lessons from Australia, the United Kingdom and the United States


Paper by J. Ramon Gil‐Garcia, Paul Henman, and Martha Alicia Avila‐Maravilla: “In the last two decades, Internet portals have been used by governments around the world as part of very diverse strategies from service provision to citizen engagement. Several authors propose that there is an evolution of digital government reflected in the functionality and sophistication of these portals and other technologies. More recently, scholars and practitioners are proposing different conceptualizations of “government as a platform” and, for some, this could be the next stage of digital government. However, it is not clear what are the main differences between a sophisticated Internet portal and a platform. Therefore, based on an analysis of three of the most advanced national portals, this ongoing research paper explores to what extent these digital efforts clearly represent the basic characteristics of platforms. So, this paper explores questions such as: (1) to what extent current national portals reflect the characteristics of what has been called “government as a platform”; and (2) are current national portals evolving towards “government as a platform”?…(More)”.

What statistics can and can’t tell us about ourselves


Hannah Fry at The New Yorker: “Harold Eddleston, a seventy-seven-year-old from Greater Manchester, was still reeling from a cancer diagnosis he had been given that week when, on a Saturday morning in February, 1998, he received the worst possible news. He would have to face the future alone: his beloved wife had died unexpectedly, from a heart attack.

Eddleston’s daughter, concerned for his health, called their family doctor, a well-respected local man named Harold Shipman. He came to the house, sat with her father, held his hand, and spoke to him tenderly. Pushed for a prognosis as he left, Shipman replied portentously, “I wouldn’t buy him any Easter eggs.” By Wednesday, Eddleston was dead; Dr. Shipman had murdered him.

Harold Shipman was one of the most prolific serial killers in history. In a twenty-three-year career as a mild-mannered and well-liked family doctor, he injected at least two hundred and fifteen of his patients with lethal doses of opiates. He was finally arrested in September, 1998, six months after Eddleston’s death.

David Spiegelhalter, the author of an important and comprehensive new book, “The Art of Statistics” (Basic), was one of the statisticians tasked by the ensuing public inquiry to establish whether the mortality rate of Shipman’s patients should have aroused suspicion earlier. Then a biostatistician at Cambridge, Spiegelhalter found that Shipman’s excess mortality—the number of his older patients who had died in the course of his career over the number that would be expected of an average doctor’s—was a hundred and seventy-four women and forty-nine men at the time of his arrest. The total closely matched the number of victims confirmed by the inquiry….

In 1825, the French Ministry of Justice ordered the creation of a national collection of crime records. It seems to have been the first of its kind anywhere in the world—the statistics of every arrest and conviction in the country, broken down by region, assembled and ready for analysis. It’s the kind of data set we take for granted now, but at the time it was extraordinarily novel. This was an early instance of Big Data—the first time that mathematical analysis had been applied in earnest to the messy and unpredictable realm of human behavior.

Or maybe not so unpredictable. In the early eighteen-thirties, a Belgian astronomer and mathematician named Adolphe Quetelet analyzed the numbers and discovered a remarkable pattern. The crime records were startlingly consistent. Year after year, irrespective of the actions of courts and prisons, the number of murders, rapes, and robberies reached almost exactly the same total. There is a “terrifying exactitude with which crimes reproduce themselves,” Quetelet said. “We know in advance how many individuals will dirty their hands with the blood of others. How many will be forgers, how many poisoners.”

To Quetelet, the evidence suggested that there was something deeper to discover. He developed the idea of a “Social Physics,” and began to explore the possibility that human lives, like planets, had an underlying mechanistic trajectory. There’s something unsettling in the idea that, amid the vagaries of choice, chance, and circumstance, mathematics can tell us something about what it is to be human. Yet Quetelet’s overarching findings still stand: at some level, human life can be quantified and predicted. We can now forecast, with remarkable accuracy, the number of women in Germany who will choose to have a baby each year, the number of car accidents in Canada, the number of plane crashes across the Southern Hemisphere, even the number of people who will visit a New York City emergency room on a Friday evening….(More)”

Index: The Data Universe 2019


By Michelle Winowatan, Andrew J. Zahuranec, Andrew Young, Stefaan Verhulst, Max Jun Kim

The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on the data universe.

Please share any additional, illustrative statistics on data, or other issues at the nexus of technology and governance, with us at [email protected]

Internet Traffic:

  • Percentage of the world’s population that uses the internet: 51.2% (3.9 billion people) – 2018
  • Number of searches processed worldwide by Google every year: at least 2 trillion – 2016
  • Share of website traffic worldwide generated through mobile phones: 52.2% – 2018
  • Total number of mobile subscriptions in the first quarter of 2019: 7.9 billion (44 million added during the quarter) – 2019
  • Amount of mobile data traffic worldwide: nearly 30 billion GB – 2018
  • Data category with highest traffic worldwide: video (60%) – 2018
  • Global average of data traffic per smartphone per month: 5.6 GB – 2018
    • North America: 7 GB – 2018
    • Latin America: 3.1 GB – 2018
    • Western Europe: 6.7 GB – 2018
    • Central and Eastern Europe: 4.5 GB – 2018
    • North East Asia: 7.1 GB – 2018
    • Southeast Asia and Oceania: 3.6 GB – 2018
    • India, Nepal, and Bhutan: 9.8 GB – 2018
    • Middle East and Africa: 3.0 GB – 2018
  • Average time between the creation of each new bitcoin block: 9.27 minutes – 2019

Streaming Services:

  • Total hours of video streamed by Netflix users every minute: 97,222 – 2017
  • Hours of YouTube watched per day: over 1 billion – 2018
  • Number of tracks uploaded to Spotify every day: over 20,000 – 2019
  • Number of Spotify’s monthly active users: 232 million – 2019
  • Spotify’s total subscribers: 108 million – 2019
  • Hours of content listened to on Spotify: 17 billion – 2019
  • Total number of songs in Spotify’s catalog: over 30 million – 2019
  • Apple Music’s total subscribers: 60 million – 2019
  • Total number of songs in Apple Music’s catalog: 45 million – 2019

Social Media:

Calls and Messaging:

Retail/Financial Transaction:

  • Number of packages shipped by Amazon in a year: 5 billion – 2017
  • Total value of payments processed by Venmo in a year: USD 62 billion – 2019
  • Based on an independent analysis of public transactions on Venmo in 2017:
  • Based on a non-representative survey of 2,436 US consumers between the ages of 21 and 72 on P2P platforms:
    • The average volume of transactions handled by Venmo: USD 64.2 billion – 2019
    • The average volume of transactions handled by Zelle: USD 122.0 billion – 2019
    • The average volume of transactions handled by PayPal: USD 141.8 billion – 2019 
    • Platform with the highest percent adoption among all consumers: PayPal (48%) – 2019 

Internet of Things:

Sources:

The hidden assumptions in public engagement: A case study of engaging on ethics in government data analysis


Paper by Emily S. Rempel, Julie Barnett and Hannah Durrant: “This study examines the hidden assumptions around running public-engagement exercises in government. We study an example of public engagement on the ethics of combining and analysing data in national government – often called data science ethics. We study hidden assumptions, drawing on hidden curriculum theories in education research, as it allows us to identify conscious and unconscious underlying processes related to conducting public engagement that may impact results. Through participation in the 2016 Public Dialogue for Data Science Ethics in the UK, four key themes were identified that exposed underlying public engagement norms. First, that organizers had constructed a strong imagined public as neither overly critical nor supportive, which they used to find and engage participants. Second, that official aims of the engagement, such as including publics in developing ethical data regulations, were overshadowed by underlying meta-objectives, such as counteracting public fears. Third, that advisory group members, organizers and publics understood the term ‘engagement’ in varying ways, from creating interest to public inclusion. And finally, that stakeholder interests, particularly government hopes for a positive report, influenced what was written in the final report. Reflection on these underlying mechanisms, such as the development of meta-objectives that seek to benefit government and technical stakeholders rather than publics, suggests that the practice of public engagement can, in fact, shut down opportunities for meaningful public dialogue….(More)”.

Sharing Private Data for Public Good


Stefaan G. Verhulst at Project Syndicate: “After Hurricane Katrina struck New Orleans in 2005, the direct-mail marketing company Valassis shared its database with emergency agencies and volunteers to help improve aid delivery. In Santiago, Chile, analysts from Universidad del Desarrollo, ISI Foundation, UNICEF, and the GovLab collaborated with Telefónica, the city’s largest mobile operator, to study gender-based mobility patterns in order to design a more equitable transportation policy. And as part of the Yale University Open Data Access project, health-care companies Johnson & Johnson, Medtronic, and SI-BONE give researchers access to previously walled-off data from 333 clinical trials, opening the door to possible new innovations in medicine.

These are just three examples of “data collaboratives,” an emerging form of partnership in which participants exchange data for the public good. Such tie-ups typically involve public bodies using data from corporations and other private-sector entities to benefit society. But data collaboratives can help companies, too – pharmaceutical firms share data on biomarkers to accelerate their own drug-research efforts, for example. Data-sharing initiatives also have huge potential to improve artificial intelligence (AI). But they must be designed responsibly and take data-privacy concerns into account.

Understanding the societal and business case for data collaboratives, as well as the forms they can take, is critical to gaining a deeper appreciation of the potential and limitations of such ventures. The GovLab has identified over 150 data collaboratives spanning continents and sectors; they include companies such as Air France, Zillow, and Facebook. Our research suggests that such partnerships can create value in three main ways….(More)”.

Companies Collect a Lot of Data, But How Much Do They Actually Use?


Article by Priceonomics Data Studio: “For all the talk of how data is the new oil and the most valuable resource of any enterprise, there is a deep dark secret companies are reluctant to share — most of the data collected by businesses simply goes unused.

This unknown and unused data, known as dark data, comprises more than half the data collected by companies. Given that some estimates indicate that 7.5 septillion (7,700,000,000,000,000,000,000) gigabytes of data are generated every single day, not using most of it is a considerable issue.

In this article, we’ll look at this dark data: just how much of it companies create, why it isn’t being analyzed, and what the costs and implications are of companies not using the majority of the data they collect.

Before diving into the analysis, it’s worth spending a moment clarifying what we mean by the term “dark data.” Gartner defines dark data as:

“The information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).”

To learn more about this phenomenon, Splunk commissioned a global survey of 1,300+ business leaders to better understand how much data they collect and how much is dark. Respondents held IT and business roles across various industries and were located in Australia, China, France, Germany, Japan, the United States, and the United Kingdom. For the report, Splunk defines dark data as: “all the unknown and untapped data across an organization, generated by systems, devices and interactions.”

While the cost of storing data has decreased over time, the cost of saving septillions of gigabytes of wasted data is still significant. What’s more, during this time the strategic importance of data has increased as companies have found more and more uses for it. Given the cost of storage and the value of data, why does so much of it go unused?

The following chart shows the reasons why dark data isn’t currently being harnessed:

By a large margin, the number one reason given for not using dark data is that companies lack a tool to capture or analyze the data. Companies accumulate data from server logs, GPS networks, security tools, call records, web traffic and more. Companies track everything from digital transactions to the temperature of their server rooms to the contents of retail shelves. Most of this data lies in separate systems, is unstructured, and cannot be connected or analyzed.

Second, the data captured just isn’t good enough. You might have important customer information about a transaction, but it’s missing location or other important metadata because that information sits somewhere else or was never captured in a usable format.

Additionally, dark data exists because there is simply too much data out there and a lot of it is unstructured. The larger the dataset (or the less structured it is), the more sophisticated the tool required for analysis. These kinds of datasets also often require analysis by individuals with significant data science expertise, who are often in short supply.

The implications of this prevalence are vast. As a result of the data deluge, companies often don’t know where all their sensitive data is stored and can’t be confident they are complying with consumer data protection measures like GDPR. …(More)”.

How does Finland use health and social data for the public benefit?


Karolina Mackiewicz at ICT & Health: “…Better innovation opportunities, quicker access to comprehensive ready-combined data, smoother permit procedures needed for research – those are some of the benefits for society, academia or business announced by the Ministry of Social Affairs and Health of Finland when the Act on the Secondary Use of Health and Social Data was introduced.

It came into force on 1 May 2019. According to the Finnish Innovation Fund SITRA, which was involved in the development of the legislation and carried out the pilot projects, it’s a ‘groundbreaking’ piece of legislation. It not only effectively introduces a one-stop shop for data but is also one of the first, if not the first, implementations of the GDPR (the EU’s General Data Protection Regulation) for the secondary use of data in Europe.

The aim of the Act is “to facilitate the effective and safe processing and access to the personal social and health data for steering, supervision, research, statistics and development in the health and social sector”. A second objective is to guarantee an individual’s legitimate expectations as well as their rights and freedoms when processing personal data. In other words, the Ministry promises that the Act will help eliminate the administrative burden that researchers and innovative businesses face in accessing the data, while respecting the privacy of individuals and providing conditions for the ethically sustainable use of data….(More)”.