Sharing Private Data for Public Good


Stefaan G. Verhulst at Project Syndicate: “After Hurricane Katrina struck New Orleans in 2005, the direct-mail marketing company Valassis shared its database with emergency agencies and volunteers to help improve aid delivery. In Santiago, Chile, analysts from Universidad del Desarrollo, ISI Foundation, UNICEF, and the GovLab collaborated with Telefónica, the city’s largest mobile operator, to study gender-based mobility patterns in order to design a more equitable transportation policy. And as part of the Yale University Open Data Access project, health-care companies Johnson & Johnson, Medtronic, and SI-BONE give researchers access to previously walled-off data from 333 clinical trials, opening the door to possible new innovations in medicine.

These are just three examples of “data collaboratives,” an emerging form of partnership in which participants exchange data for the public good. Such tie-ups typically involve public bodies using data from corporations and other private-sector entities to benefit society. But data collaboratives can help companies, too – pharmaceutical firms share data on biomarkers to accelerate their own drug-research efforts, for example. Data-sharing initiatives also have huge potential to improve artificial intelligence (AI). But they must be designed responsibly and take data-privacy concerns into account.

Understanding the societal and business case for data collaboratives, as well as the forms they can take, is critical to gaining a deeper appreciation of the potential and limitations of such ventures. The GovLab has identified over 150 data collaboratives spanning continents and sectors; they include companies such as Air France, Zillow, and Facebook. Our research suggests that such partnerships can create value in three main ways….(More)”.

The Ethics of Hiding Your Data From the Machines


Molly Wood at Wired: “…But now that data is being used to train artificial intelligence, and the insights those future algorithms create could quite literally save lives.

So while targeted advertising is an easy villain, data-hogging artificial intelligence is a dangerously nuanced and highly sympathetic bad guy, like Erik Killmonger in Black Panther. And it won’t be easy to hate.

I recently met with a company that wants to do a sincerely good thing. They’ve created a sensor that pregnant women can wear, and it measures their contractions. It can reliably predict when women are going into labor, which can help reduce preterm births and C-sections. It can get women into care sooner, which can reduce both maternal and infant mortality.

All of this is an unquestionable good.

And this little device is also collecting a treasure trove of information about pregnancy and labor that is feeding into clinical research that could upend maternal care as we know it. Did you know that the way most obstetricians learn to track a woman’s progress through labor is based on a single study from the 1950s, involving 500 women, all of whom were white?…

To save the lives of pregnant women and their babies, researchers and doctors, and yes, startup CEOs and even artificial intelligence algorithms need data. To cure cancer, or at least offer personalized treatments that have a much higher possibility of saving lives, those same entities will need data….

And for we consumers, well, a blanket refusal to offer up our data to the AI gods isn’t necessarily the good choice either. I don’t want to be the person who refuses to contribute my genetic data via 23andMe to a massive research study that could, and I actually believe this is possible, lead to cures and treatments for diseases like Parkinson’s and Alzheimer’s and who knows what else.

I also think I deserve a realistic assessment of the potential for harm to find its way back to me, because I didn’t think through or wasn’t told all the potential implications of that choice—like how, let’s be honest, we all felt a little stung when we realized the 23andMe research would be through a partnership with drugmaker (and reliable drug price-hiker) GlaxoSmithKline. Drug companies, like targeted ads, are easy villains—even though this partnership actually could produce a Parkinson’s drug. But do we know what GSK’s privacy policy looks like? That deal was a level of sharing we didn’t necessarily expect….(More)”.

The Practice of Civic Tech: Tensions in the Adoption and Use of New Technologies in Community Based Organizations


Eric Gordon and Rogelio Alejandro Lopez in Media and Communication: “This article reports on a qualitative study of community based organizations’ (CBOs) adoption of information communication technologies (ICT). As ICTs in the civic sector, otherwise known as civic tech, get adopted with greater regularity in large and small organizations, there is need to understand how these technologies shape and challenge the nature of civic work. Based on a nine-month ethnographic study of one organization in Boston and additional interviews with fourteen other organizations throughout the United States, the study addresses a guiding research question: how do CBOs reconcile the changing (increasingly mediated) nature of civic work as ICTs, and their effective adoption and use for civic purposes, increasingly represent forward-thinking, progress, and innovation in the civic sector?—of civic tech as a measure of “keeping up with the times.”

From a sense of top-down pressures to innovate in a fast-moving civic sector, to changing bottom-up media practices among community constituents, our findings identify four tensions in the daily practice of civic tech, including: 1) function vs. representation, 2) amplification vs. transformation, 3) grassroots vs. grasstops, and 4) youth vs. adults. These four tensions, derived from a grounded theory approach, provide a conceptual picture of a civic tech landscape that is much more complicated than a suite of tools to help organizations become more efficient. The article concludes with recommendations for practitioners and researchers….(More)”.

Companies Collect a Lot of Data, But How Much Do They Actually Use?


Article by Priceonomics Data Studio: “For all the talk of how data is the new oil and the most valuable resource of any enterprise, there is a deep dark secret companies are reluctant to share — most of the data collected by businesses simply goes unused.

This unknown and unused data, known as dark data, comprises more than half the data collected by companies. Given that some estimates indicate that 7.5 septillion (7,700,000,000,000,000,000,000) gigabytes of data are generated every single day, not using most of it is a considerable issue.

In this article, we’ll look at this dark data. Just how much of it is created by companies, what are the reasons this data isn’t being analyzed, and what are the costs and implications of companies not using the majority of the data they collect.  

Before diving into the analysis, it’s worth spending a moment clarifying what we mean by the term “dark data.” Gartner defines dark data as:

“The information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).”

To learn more about this phenomenon, Splunk commissioned a global survey of 1,300+ business leaders to better understand how much data they collect, and how much is dark. Respondents were from IT and business roles, and were located in Australia, China, France, Germany, Japan, the United States, and the United Kingdom, across various industries. For the report, Splunk defines dark data as: “all the unknown and untapped data across an organization, generated by systems, devices and interactions.”

While the cost of storing data has decreased over time, the cost of saving septillions of gigabytes of wasted data is still significant. What’s more, during this time the strategic importance of data has increased as companies have found more and more uses for it. Given the cost of storage and the value of data, why does so much of it go unused?

The following chart shows the reasons why dark data isn’t currently being harnessed:

By a large margin, the number one reason given for not using dark data is that companies lack a tool to capture or analyze the data. Companies accumulate data from server logs, GPS networks, security tools, call records, web traffic and more. Companies track everything from digital transactions to the temperature of their server rooms to the contents of retail shelves. Most of this data lies in separate systems, is unstructured, and cannot be connected or analyzed.

Second, the data captured just isn’t good enough. You might have important customer information about a transaction, but it’s missing location or other important metadata because that information sits somewhere else or was never captured in a usable format.

Additionally, dark data exists because there is simply too much data out there and a lot of it is unstructured. The larger the dataset (or the less structured it is), the more sophisticated the tool required for analysis. Additionally, these kinds of datasets oftentimes require analysis by individuals with significant data science expertise, who are often in short supply.

The implications of the prevalence of dark data are vast. As a result of the data deluge, companies often don’t know where all the sensitive data is stored and can’t be confident they are complying with consumer data protection measures like GDPR. …(More)”.

Datafication and accountability in public health


Introduction to a special issue of Social Studies of Science by Klaus Hoeyer, Susanne Bauer, and Martyn Pickersgill: “In recent years and across many nations, public health has become subject to forms of governance that are said to be aimed at establishing accountability. In this introduction to a special issue, From Person to Population and Back: Exploring Accountability in Public Health, we suggest opening up accountability assemblages by asking a series of ostensibly simple questions that inevitably yield complicated answers: What is counted? What counts? And to whom, how and why does it count? Addressing such questions involves staying attentive to the technologies and infrastructures through which data come into being and are made available for multiple political agendas. Through a discussion of public health, accountability and datafication we present three key themes that unite the various papers as well as illustrate their diversity….(More)”.

Data versus Democracy


Book by Kris Shaffer: “Human attention is in the highest demand it has ever been. The drastic increase in available information has compelled individuals to find a way to sift through the media that is literally at their fingertips. Content recommendation systems have emerged as the technological solution to this social and informational problem, but they’ve also created a bigger crisis in confirming our biases by showing us only, and exactly, what it predicts we want to see. Data versus Democracy investigates and explores how, in the era of social media, human cognition, algorithmic recommendation systems, and human psychology are all working together to reinforce (and exaggerate) human bias. The dangerous confluence of these factors is driving media narratives, influencing opinions, and possibly changing election results.


In this book, algorithmic recommendations, clickbait, familiarity bias, propaganda, and other pivotal concepts are analyzed and then expanded upon via fascinating and timely case studies: the 2016 US presidential election, Ferguson, GamerGate, international political movements, and more events that come to affect every one of us. What are the implications of how we engage with information in the digital age? Data versus Democracy explores this topic and an abundance of related crucial questions. We live in a culture vastly different from any that has come before. In a society where engagement is currency, we are the product. Understanding the value of our attention, how organizations operate based on this concept, and how engagement can be used against our best interests is essential in responsibly equipping ourselves against the perils of disinformation….(More)”.

The Costs of Connection: How Data Is Colonizing Human Life and Appropriating It for Capitalism


Book by Nick Couldry: “We are told that progress requires human beings to be connected, and that science, medicine and much else that is good demands the kind of massive data collection only possible if every thing and person are continuously connected.

But connection, and the continuous surveillance that connection makes possible, usher in an era of neocolonial appropriation. In this new era, social life becomes a direct input to capitalist production, and data – the data collected and processed when we are connected – is the means for this transformation. Hence the need to start counting the costs of connection.

Capturing and processing social data is today handled by an emerging social quantification sector. We are familiar with its leading players, from Acxiom to Equifax, from Facebook to Uber. Together, they ensure the regular and seemingly natural conversion of daily life into a stream of data that can be appropriated for value. This stream is extracted from sensors embedded in bodies and objects, and from the traces left by human interaction online. The result is a new social order based on continuous tracking, and offering unprecedented new opportunities for social discrimination and behavioral influence. This order has disturbing consequences for freedom, justice and power — indeed, for the quality of human life.

The true violence of this order is best understood through the history of colonialism. But because we assume that colonialism has been replaced by advanced capitalism, we often miss the connection. The concept of data colonialism can thus be used to trace continuities from colonialism’s historic appropriation of territories and material resources to the datafication of everyday life today. While the modes, intensities, scales and contexts of dispossession have changed, the underlying function remains the same: to acquire resources from which economic value can be extracted.

In data colonialism, data is appropriated through a new type of social relation: data relations. We are living through a time when the organization of capital and the configurations of power are changing dramatically because of this contemporary form of social relation. Data colonialism justifies what it does as an advance in scientific knowledge, personalized marketing, or rational management, just as historic colonialism claimed a civilizing mission. Data colonialism is global, dominated by powerful forces in East and West, in the USA and China. The result is a world where, wherever we are connected, we are colonized by data.

Where is data colonialism heading in the long term? Just as historical colonialism paved the way for industrial capitalism, data colonialism is paving the way for a new stage of capitalism whose outlines we only partly see: the capitalization of life without limit. There will be no part of human life, no layer of experience, that is not extractable for economic value. Human life will be there for mining by corporations without reserve as governments look on appreciatively. This process of capitalization will be the foundation for a highly unequal new social arrangement, a social order that is deeply incompatible with human freedom and autonomy.

But resistance is still possible, drawing on past and present decolonial struggles, as well as on the best of the humanities, philosophy, political economy, information and social science. The goal is to name what is happening and imagine better ways of living together without the exploitation on which today’s models of ‘connection’ are founded….(More)”

This High-Tech Solution to Disaster Response May Be Too Good to Be True


Sheri Fink in The New York Times: “The company called One Concern has all the characteristics of a buzzy and promising Silicon Valley start-up: young founders from Stanford, tens of millions of dollars in venture capital and a board with prominent names.

Its particular niche is disaster response. And it markets a way to use artificial intelligence to address one of the most vexing issues facing emergency responders in disasters: figuring out where people need help in time to save them.

That promise to bring new smarts and resources to an anachronistic field has generated excitement. Arizona, Pennsylvania and the World Bank have entered into contracts with One Concern over the past year. New York City and San Jose, Calif., are in talks with the company. And a Japanese city recently became One Concern’s first overseas client.

But when T.J. McDonald, who works for Seattle’s office of emergency management, reviewed a simulated earthquake on the company’s damage prediction platform, he spotted problems. A popular big-box store was grayed out on the web-based map, meaning there was no analysis of the conditions there, and shoppers and workers who might be in danger would not receive immediate help if rescuers relied on One Concern’s results.

“If that Costco collapses in the middle of the day, there’s going to be a lot of people who are hurt,” he said.

The error? The simulation, the company acknowledged, missed many commercial areas because damage calculations relied largely on residential census data.

One Concern has marketed its products as lifesaving tools for emergency responders after earthquakes, floods and, soon, wildfires. But interviews and documents show the company has often exaggerated its tools’ abilities and has kept outside experts from reviewing its methodology. In addition, some product features are available elsewhere at no charge, and data-hungry insurance companies — whose interests can diverge from those of emergency workers — are among One Concern’s biggest investors and customers.

Some critics even suggest that shortcomings in One Concern’s approach could jeopardize lives….(More)”.

Trust and Mistrust in Americans’ Views of Scientific Experts


Report by the Pew Research Center: “In an era when science and politics often appear to collide, public confidence in scientists is on the upswing, and six-in-ten Americans say scientists should play an active role in policy debates about scientific issues, according to a new Pew Research Center survey.

The survey finds public confidence in scientists on par with confidence in the military. It also exceeds the levels of public confidence in other groups and institutions, including the media, business leaders and elected officials.

At the same time, Americans are divided along party lines in terms of how they view the value and objectivity of scientists and their ability to act in the public interest. And, while political divides do not carry over to views of all scientists and scientific issues, there are particularly sizable gaps between Democrats and Republicans when it comes to trust in scientists whose work is related to the environment.

Higher levels of familiarity with the work of scientists are associated with more positive and more trusting views of scientists regarding their competence, credibility and commitment to the public, the survey shows….(More)”.

De-risking custom technology projects


Paper by Robin Carnahan, Randy Hart, and Waldo Jaquith: “Only 13% of large government software projects are successful. State IT projects, in particular, are often challenged because states lack basic knowledge about modern software development, relying on outdated procurement processes.

State governments are increasingly reliant on modern software and hardware to deliver essential services to the public, and the success of any major policy initiative depends on the success of the underlying software infrastructure. Government agencies all confront similar challenges, facing budget and staffing constraints while struggling to modernize legacy technology systems that are out-of-date, inflexible, expensive, and ineffective. Government officials and agencies often rely on the same legacy processes that led to problems in the first place.

The public deserves a government that provides the same world-class technology they get from the commercial marketplace. Trust in government depends on it.

This handbook is designed for executives, budget specialists, legislators, and other “non-technical” decision-makers who fund or oversee state government technology projects. It can help you set these projects up for success by asking the right questions, identifying the right outcomes, and equally important, empowering you with a basic knowledge of the fundamental principles of modern software design.

This handbook also gives you the tools you need to start tackling related problems like:

  • The need to use, maintain, and modernize legacy systems simultaneously
  • Lock-in from legacy commercial arrangements
  • Siloed organizations and risk-averse cultures
  • Long budget cycles that don’t always match modern software design practices
  • Security threats
  • Hiring, staffing, and other resource constraints

This is written specifically for procurement of custom software, but it’s important to recognize that commercial off-the-shelf software (COTS) is often custom and Software as a Service (SaaS) often requires custom code. Once any customization is made, the bulk of the advice in this handbook applies to these commercial offerings. (See “Beware the customized commercial software trap” for details.)

As government leaders, we must be good stewards of public money by demanding easy-to-use, cost-effective, sustainable digital tools for use by the public and civil servants. This handbook will help you do just that….(More)”