
APEC: “The objectives of this study are to better understand: 1) how firms from different sectors use data in their business models; and, considering the significant increase in data-related policies and regulations enacted by governments across the world, 2) how such policies and regulations are affecting their use of data and hence their business models. The study also tries: 3) to identify some of the middle-ground approaches that would enable governments to achieve public policy objectives, such as data security and privacy, and at the same time also promote the growth of data-utilizing businesses. 39 firms from 12 economies have participated in this project, and they come from a diverse group of industries, including aviation, logistics, shipping, payment services, encryption services, and manufacturing. The synthesis report can be found in Chapter 1, while the case study chapters can be found in Chapters 2 to 10….(More)”.

Fostering an Enabling Policy and Regulatory Environment in APEC for Data-Utilizing Businesses

Stefaan G. Verhulst at Project Syndicate: “After Hurricane Katrina struck New Orleans in 2005, the direct-mail marketing company Valassis shared its database with emergency agencies and volunteers to help improve aid delivery. In Santiago, Chile, analysts from Universidad del Desarrollo, ISI Foundation, UNICEF, and the GovLab collaborated with Telefónica, the city’s largest mobile operator, to study gender-based mobility patterns in order to design a more equitable transportation policy. And as part of the Yale University Open Data Access project, health-care companies Johnson & Johnson, Medtronic, and SI-BONE give researchers access to previously walled-off data from 333 clinical trials, opening the door to possible new innovations in medicine.

These are just three examples of “data collaboratives,” an emerging form of partnership in which participants exchange data for the public good. Such tie-ups typically involve public bodies using data from corporations and other private-sector entities to benefit society. But data collaboratives can help companies, too – pharmaceutical firms share data on biomarkers to accelerate their own drug-research efforts, for example. Data-sharing initiatives also have huge potential to improve artificial intelligence (AI). But they must be designed responsibly and take data-privacy concerns into account.

Understanding the societal and business case for data collaboratives, as well as the forms they can take, is critical to gaining a deeper appreciation of the potential and limitations of such ventures. The GovLab has identified over 150 data collaboratives spanning continents and sectors; they include companies such as Air France, Zillow, and Facebook. Our research suggests that such partnerships can create value in three main ways….(More)”.

Sharing Private Data for Public Good

Molly Wood at Wired: “…But now that data is being used to train artificial intelligence, and the insights those future algorithms create could quite literally save lives.

So while targeted advertising is an easy villain, data-hogging artificial intelligence is a dangerously nuanced and highly sympathetic bad guy, like Erik Killmonger in Black Panther. And it won’t be easy to hate.

I recently met with a company that wants to do a sincerely good thing. They’ve created a sensor that pregnant women can wear, and it measures their contractions. It can reliably predict when women are going into labor, which can help reduce preterm births and C-sections. It can get women into care sooner, which can reduce both maternal and infant mortality.

All of this is an unquestionable good.

And this little device is also collecting a treasure trove of information about pregnancy and labor that is feeding into clinical research that could upend maternal care as we know it. Did you know that the way most obstetricians learn to track a woman’s progress through labor is based on a single study from the 1950s, involving 500 women, all of whom were white?…

To save the lives of pregnant women and their babies, researchers and doctors, and yes, startup CEOs and even artificial intelligence algorithms need data. To cure cancer, or at least offer personalized treatments that have a much higher possibility of saving lives, those same entities will need data….

And for we consumers, well, a blanket refusal to offer up our data to the AI gods isn’t necessarily the good choice either. I don’t want to be the person who refuses to contribute my genetic data via 23andMe to a massive research study that could, and I actually believe this is possible, lead to cures and treatments for diseases like Parkinson’s and Alzheimer’s and who knows what else.

I also think I deserve a realistic assessment of the potential for harm to find its way back to me, because I didn’t think through or wasn’t told all the potential implications of that choice—like how, let’s be honest, we all felt a little stung when we realized the 23andMe research would be through a partnership with drugmaker (and reliable drug price-hiker) GlaxoSmithKline. Drug companies, like targeted ads, are easy villains—even though this partnership actually could produce a Parkinson’s drug. But do we know what GSK’s privacy policy looks like? That deal was a level of sharing we didn’t necessarily expect….(More)”.

The Ethics of Hiding Your Data From the Machines

Eric Gordon and Rogelio Alejandro Lopez in Media and Communication: “This article reports on a qualitative study of community based organizations’ (CBOs) adoption of information communication technologies (ICT). As ICTs in the civic sector, otherwise known as civic tech, get adopted with greater regularity in large and small organizations, there is a need to understand how these technologies shape and challenge the nature of civic work. Based on a nine-month ethnographic study of one organization in Boston and additional interviews with fourteen other organizations throughout the United States, the study addresses a guiding research question: how do CBOs reconcile the changing (increasingly mediated) nature of civic work as ICTs, and their effective adoption and use for civic purposes, increasingly come to represent forward-thinking, progress, and innovation in the civic sector, with civic tech serving as a measure of “keeping up with the times”?

From a sense of top-down pressures to innovate in a fast-moving civic sector, to changing bottom-up media practices among community constituents, our findings identify four tensions in the daily practice of civic tech, including: 1) function vs. representation, 2) amplification vs. transformation, 3) grassroots vs. grasstops, and 4) youth vs. adults. These four tensions, derived from a grounded theory approach, provide a conceptual picture of a civic tech landscape that is much more complicated than a suite of tools to help organizations become more efficient. The article concludes with recommendations for practitioners and researchers….(More)”.

The Practice of Civic Tech: Tensions in the Adoption and Use of New Technologies in Community Based Organizations

Article by Priceonomics Data Studio: “For all the talk of how data is the new oil and the most valuable resource of any enterprise, there is a deep dark secret companies are reluctant to share — most of the data collected by businesses simply goes unused.

This unknown and unused data, known as dark data, comprises more than half the data collected by companies. Given that some estimates indicate that 7.5 septillion (7,700,000,000,000,000,000,000) gigabytes of data are generated every single day, not using most of it is a considerable issue.

In this article, we’ll look at this dark data: just how much of it companies create, why it isn’t being analyzed, and what the costs and implications are of companies not using the majority of the data they collect.

Before diving into the analysis, it’s worth spending a moment clarifying what we mean by the term “dark data.” Gartner defines dark data as:

“The information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).”

To learn more about this phenomenon, Splunk commissioned a global survey of 1,300+ business leaders to better understand how much data they collect, and how much of it is dark. Respondents were from IT and business roles across various industries, and were located in Australia, China, France, Germany, Japan, the United States, and the United Kingdom. For the report, Splunk defines dark data as: “all the unknown and untapped data across an organization, generated by systems, devices and interactions.”

While the cost of storing data has decreased over time, the cost of saving septillions of gigabytes of wasted data is still significant. What’s more, during this time the strategic importance of data has increased as companies have found more and more uses for it. Given the cost of storage and the value of data, why does so much of it go unused?

There are several reasons why dark data isn’t currently being harnessed:

By a large margin, the number one reason given for not using dark data is that companies lack a tool to capture or analyze the data. Companies accumulate data from server logs, GPS networks, security tools, call records, web traffic and more. Companies track everything from digital transactions to the temperature of their server rooms to the contents of retail shelves. Most of this data lies in separate systems, is unstructured, and cannot be connected or analyzed.

Second, the data captured just isn’t good enough. You might have important customer information about a transaction, but it’s missing location or other important metadata because that information sits somewhere else or was never captured in a usable format.

Additionally, dark data exists because there is simply too much data out there, and a lot of it is unstructured. The larger the dataset (or the less structured it is), the more sophisticated the tool required for analysis. These kinds of datasets also often require analysis by individuals with significant data science expertise, who are in short supply.

The implications of the prevalence of dark data are vast. As a result of the data deluge, companies often don’t know where all the sensitive data is stored and can’t be confident they are complying with consumer data protection measures like GDPR. …(More)”.

Companies Collect a Lot of Data, But How Much Do They Actually Use?

María Verónica Alderete in Social Indicators Research: “The main objective of this paper is to discuss the key factors involved in the definition of smart city indexes. Although recent literature has explored the smart city subject, it remains an open question whether macro ICT factors should also be considered when assessing the technological innovation of a city. To achieve this goal, firstly, a literature review of the smart city is provided, including an analysis of the smart city concept together with a theoretical framework based on the knowledge society and the Quintuple Helix innovation model. Secondly, the study analyzes some smart city cases in developed and developing countries. Thirdly, it describes, criticizes and compares some well-known smart city indexes. Lastly, the empirical literature is explored to detect whether there are studies proposing changes to smart city indexes or methodologies that take macro-level variables into account. It turns out that the cities at the top of the index rankings are from developed countries, while most cities at the bottom are from developing or less developed countries. The paper therefore argues that the ICT development of smart cities depends both on the cities’ own characteristics and features and on macro-technological factors. Moreover, few papers on the subject include macro or country factors, and most of them are reviews of the literature or case studies; there is a lack of studies discussing the indexes’ methodologies. This paper provides some guidelines to build one….(More)”.
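
As a rough editorial illustration of how composite smart city indexes are typically assembled, and of where a macro-level factor could enter the calculation, the sketch below min-max normalizes a handful of indicators per city and aggregates them with weights. The indicators, weights, and values are invented for illustration and are not taken from the paper.

    # Illustrative only: indicator names, weights, and values are assumptions.
    cities = {
        # Per-city indicators: broadband coverage (%), e-government score,
        # open data portals, plus a macro factor: the country's ICT
        # Development Index.
        "City A": [92.0, 0.81, 14, 8.2],
        "City B": [64.0, 0.55, 3, 5.1],
        "City C": [78.0, 0.70, 7, 8.2],
    }
    weights = [0.3, 0.3, 0.2, 0.2]  # last weight applies to the macro-level factor

    def normalize(values):
        """Min-max scale a column of raw indicator values to [0, 1]."""
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

    # Normalize each indicator across cities, then compute a weighted score.
    columns = [normalize(col) for col in zip(*cities.values())]
    scores = {
        city: sum(w * columns[j][i] for j, w in enumerate(weights))
        for i, city in enumerate(cities)
    }

    for city, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{city}: {score:.2f}")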

Exploring the Smart City Indexes and the Role of Macro Factors for Measuring Cities Smartness

Paper by Chris Culnane, Benjamin I. P. Rubinstein, and Vanessa Teague: “The subject of this report is the re-identification of individuals in the Myki public transport dataset released as part of the Melbourne Datathon 2018. We demonstrate the ease with which we were able to re-identify ourselves, our co-travellers, and complete strangers; our analysis raises concerns about the nature and granularity of the data released, in particular the ability to identify vulnerable or sensitive groups…..

This work highlights how a large number of passengers could be re-identified in the 2018 Myki data release, with detailed discussion of specific people. The implications of re-identification are potentially serious: ex-partners, one-time acquaintances, or other parties can determine places of home, work, times of travel, co-travelling patterns—presenting risk to vulnerable groups in particular…

In 2018 the Victorian Government released a large passenger centric transport dataset to a data science competition—the 2018 Melbourne Datathon. Access to the data was unrestricted, with a URL provided on the datathon’s website to download the complete dataset from an Amazon S3 Bucket. Over 190 teams continued to analyse the data through the two-month competition period. The data consisted of touch on and touch off events for the Myki smart card ticketing system used throughout the state of Victoria, Australia. With such data, contestants would be able to apply retrospective analyses on an entire public transport system, explore the suitability of predictive models, etc.

The Myki ticketing system is used across Victorian public transport: on trains, buses and trams. The dataset was a longitudinal dataset, consisting of touch on and touch off events from Week 27 in 2015 through to Week 26 in 2018. Each event contained a card identifier (cardId; not the actual card number), the card type, the time of the touch on or off, and various location information, for example a stop ID or route ID, along with other fields which we omit here for brevity. Events could be indexed by the cardId and as such, all the events associated with a single card could be retrieved. There are a total of 15,184,336 cards in the dataset—more than twice the 2018 population of Victoria. It appears that all touch on and off events for metropolitan trains and trams have been included, though other forms of transport such as intercity trains and some buses are absent. In total there are nearly 2 billion touch on and off events in the dataset.
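
To make the linkage idea concrete, here is a minimal sketch (in Python, using a simplified, assumed schema rather than the actual Myki release format) of the kind of query the authors describe: a handful of trips known from outside the dataset, each just a stop and an approximate time, is often enough to narrow the candidate cards down to a single cardId, and with it that card's entire travel history.

    # Hypothetical illustration only: the field names and values below are
    # assumptions, not the actual Myki schema. Each event records a card,
    # a touch-on time, and a stop.
    from datetime import datetime, timedelta

    events = [
        ("card_A", datetime(2017, 3, 6, 8, 2), "stop_12"),
        ("card_A", datetime(2017, 3, 7, 8, 5), "stop_12"),
        ("card_B", datetime(2017, 3, 6, 8, 3), "stop_12"),
        ("card_B", datetime(2017, 3, 7, 17, 40), "stop_48"),
    ]

    # Trips known from outside the dataset (e.g. having travelled with the
    # person): a stop and an approximate time for each.
    known_trips = [
        ("stop_12", datetime(2017, 3, 6, 8, 0)),
        ("stop_12", datetime(2017, 3, 7, 8, 0)),
    ]

    def candidates(events, known_trips, window=timedelta(minutes=10)):
        """Return the card IDs that have a touch-on matching every known trip."""
        cards = {card for card, _, _ in events}
        for stop, when in known_trips:
            matching = {card for card, t, s in events
                        if s == stop and abs(t - when) <= window}
            cards &= matching
        return cards

    remaining = candidates(events, known_trips)
    if len(remaining) == 1:
        # A unique match re-identifies the card, and with it every trip that
        # card made over the three-year window of the release.
        print("Re-identified:", remaining.pop())
    else:
        print(len(remaining), "candidate cards remain")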

No information was provided as to the de-identification that was performed on the dataset. Our analysis indicates that little to no de-identification took place on the bulk of the data, as will become evident in Section 3. The exception is the cardId, which appears to have been mapped in some way from the Myki Card Number. The exact mapping has not been discovered, although concerns remain as to its security effectiveness….(More)”.

Stop the Open Data Bus, We Want to Get Off

Introduction to a special issue of Social Studies of Science by Klaus Hoeyer, Susanne Bauer, and Martyn Pickersgill: “In recent years and across many nations, public health has become subject to forms of governance that are said to be aimed at establishing accountability. In this introduction to a special issue, From Person to Population and Back: Exploring Accountability in Public Health, we suggest opening up accountability assemblages by asking a series of ostensibly simple questions that inevitably yield complicated answers: What is counted? What counts? And to whom, how and why does it count? Addressing such questions involves staying attentive to the technologies and infrastructures through which data come into being and are made available for multiple political agendas. Through a discussion of public health, accountability and datafication we present three key themes that unite the various papers as well as illustrate their diversity….(More)”.

Datafication and accountability in public health

Blog post by Geoff Mulgan: “Governance sinkholes appear when shifts in technology, society and the economy throw up the need for new arrangements. Each industrial revolution has created many governance sinkholes – and prompted furious innovation to fill them. The fourth industrial revolution will be no different. But most governments are too distracted to think about what to do to fill these holes, let alone to act. This blog sets out my diagnosis – and where I think the most work is needed to design new institutions….

It’s not too hard to get a map of the fissures and gaps – and to see where governance is needed but is missing. There are all too many of these now.

Here are a few examples. One is long-term care, currently missing adequate financing, regulation, information and navigation tools, despite its huge and growing significance. The obvious contrast is with acute healthcare, which, for all its problems, is rich in institutions and governance.

A second example is lifelong learning and training. Again, there is a striking absence of effective institutions to provide funding, navigation, policy and problem solving, and again, the contrast with the institution-rich fields of primary, secondary and tertiary education is striking. The position on welfare is not so different, as is the absence of institutions fit for purpose in supporting people in precarious work.

I’m particularly interested in another kind of sinkhole: the absence of the right institutions to handle data and knowledge – at global, national and local levels – now that these dominate the economy, and much of daily life. In field after field, there are huge potential benefits to linking data sets and connecting artificial and human intelligence to spot patterns or prevent problems. But we lack any institutions with either the skills or the authority to do this well, and in particular to think through the trade-offs between the potential benefits and the potential risks….(More)”.

Governance sinkholes

Paper by Christoph Kollwitz and Barbara Dinter: “In order to master the digital transformation and to survive in global competition, companies face the challenge of improving transformation processes, such as innovation processes. However, the design of these processes poses a challenge, as the related knowledge is still largely in its infancy. A popular trend since the mid-2000s is collaborative development events, so-called hackathons, where people with different professional backgrounds work collaboratively on development projects for a defined period. While hackathons are a widespread phenomenon in practice and many field reports and individual observations exist, there is still a lack of holistic and structured representations of this new phenomenon in the literature.

The paper at hand aims to develop a taxonomy of hackathons in order to illustrate their nature and underlying characteristics. For this purpose, a systematic literature review is combined with existing taxonomies or taxonomy-like artifacts (e.g. morphological boxes, typologies) from similar research areas in an iterative taxonomy development process. The results contribute to an improved understanding of the hackathon phenomenon and allow the more effective use of hackathons as a new tool in organizational innovation processes. Furthermore, the taxonomy provides guidance on how to apply hackathons for organizational innovation processes….(More)”.
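
For readers unfamiliar with the morphological-box style of artifact the authors build on, the toy sketch below shows the general shape of such a taxonomy: a set of dimensions, each with a fixed set of characteristics, against which a concrete hackathon can be classified. The dimensions and characteristics are invented for illustration and are not the ones derived in the paper.

    # Invented dimensions and characteristics, purely to illustrate the form.
    taxonomy = {
        "orientation": {"competition", "community", "internal innovation"},
        "participants": {"open public", "invited experts", "employees only"},
        "duration": {"single day", "weekend", "multi-week"},
        "outcome focus": {"prototype", "concept", "implemented solution"},
    }

    def classify(event, taxonomy):
        """Check that an event takes exactly one valid characteristic per dimension."""
        for dimension, allowed in taxonomy.items():
            if event.get(dimension) not in allowed:
                raise ValueError(f"invalid or missing '{dimension}' characteristic")
        return {dim: event[dim] for dim in taxonomy}

    corporate_hackathon = {
        "orientation": "internal innovation",
        "participants": "employees only",
        "duration": "weekend",
        "outcome focus": "prototype",
    }

    print(classify(corporate_hackathon, taxonomy))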

What the Hack? – Towards a Taxonomy of Hackathons
