Big Data, Thick Mediation, and Representational Opacity


Rafael Alvarado and Paul Humphreys in the New Literary History: “In 2008, the phrase “big data” shifted in meaning. It turned from referring to a problem and an opportunity for organizations with very large data sets to being the talisman for an emerging economic and cultural order that is both celebrated and feared for its deep and pervasive effects on the human condition. Economically, the phrase now denotes a data-mediated form of commerce exemplified by Google. Culturally, the phrase stands for a new form of knowledge and knowledge production. In this essay, we explore the connection between these two implicit meanings, considered as dimensions of a real social and scientific transformation with observable properties. We develop three central concepts: the datasphere, thick mediation, and representational opacity. These concepts provide a theoretical framework for making sense of how the economic and cultural dimensions interact to produce a set of effects, problems, and opportunities, not all of which have been addressed by big data’s critics and advocates….(More)”.

Is your software racist?


Li Zhou at Politico: “Late last year, a St. Louis tech executive named Emre Şarbak noticed something strange about Google Translate. He was translating phrases from Turkish — a language that uses a single gender-neutral pronoun “o” instead of “he” or “she.” But when he asked Google’s tool to turn the sentences into English, they seemed to read like a children’s book out of the 1950’s. The ungendered Turkish sentence “o is a nurse” would become “she is a nurse,” while “o is a doctor” would become “he is a doctor.”

The website Quartz went on to compose a sort-of poem highlighting some of these phrases; Google’s translation program decided that soldiers, doctors and entrepreneurs were men, while teachers and nurses were women. Overwhelmingly, the professions were male. Finnish and Chinese translations had similar problems of their own, Quartz noted.

What was going on? Google’s Translate tool “learns” language from an existing corpus of writing, and the writing often includes cultural patterns regarding how men and women are described. Because the model is trained on data that already has biases of its own, the results that it spits out serve only to further replicate and even amplify them.
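
To make the mechanism concrete, here is a minimal, hypothetical sketch (not Google's actual system) of how a purely statistical translator ends up gendering a gender-neutral pronoun: if the training corpus pairs "doctor" with "he" more often than with "she," then the highest-probability rendering of the Turkish "o" becomes "he." The corpus and counts below are invented purely for illustration.

```python
from collections import Counter

# Toy "training corpus": sentences a statistical translator might learn from.
# The gender skew here is invented purely to illustrate the mechanism.
corpus = [
    "he is a doctor", "he is a doctor", "she is a doctor",
    "she is a nurse", "she is a nurse", "he is a nurse",
]

def most_likely_pronoun(profession):
    """Count which pronoun co-occurs most often with a profession."""
    counts = Counter(
        sentence.split()[0] for sentence in corpus if profession in sentence
    )
    return counts.most_common(1)[0][0] if counts else "they"

def translate_neutral_pronoun(profession):
    """Render the gender-neutral Turkish 'o' by picking the pronoun the
    training data most often associates with this profession."""
    return f"{most_likely_pronoun(profession)} is a {profession}"

print(translate_neutral_pronoun("doctor"))  # "he is a doctor"
print(translate_neutral_pronoun("nurse"))   # "she is a nurse"
```

Nothing in this toy code "decides" to be sexist; the skew in the training data simply becomes the model's most probable output, which is the replication-and-amplification effect described above.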

It might seem strange that a seemingly objective piece of software would yield gender-biased results, but the problem is an increasing concern in the technology world. The term is “algorithmic bias” — the idea that artificially intelligent software, the stuff we count on to do everything from power our Netflix recommendations to determine our qualifications for a loan, often turns out to perpetuate social bias.

Voice-based assistants, like Amazon’s Alexa, have struggled to recognize different accents. A Microsoft chatbot on Twitter started spewing racist posts after learning from other users on the platform. In a particularly embarrassing example in 2015, a black computer programmer found that Google’s photo-recognition tool labeled him and a friend as “gorillas.”

Sometimes the results of hidden computer bias are insulting, other times merely annoying. And sometimes the effects are potentially life-changing….(More)”.

Our Hackable Political Future


Henry J. Farrell and Rick Perlstein at the New York Times: “….A program called Face2Face, developed at Stanford, films one person speaking, then manipulates that person’s image to resemble someone else’s. Throw in voice manipulation technology, and you can literally make anyone say anything — or at least seem to….

Another harrowing potential is the ability to trick the algorithms behind self-driving cars into not recognizing traffic signs. Computer scientists have shown that nearly invisible changes to a stop sign can fool algorithms into thinking it says yield instead. Imagine if one of these cars contained a dissident challenging a dictator.
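
The trick behind the stop-sign result is what researchers call an adversarial example: a perturbation too small for a human to notice, but crafted to push a classifier across a decision boundary. One standard laboratory technique is the fast gradient sign method (FGSM); the sketch below is illustrative only, with a placeholder model and labels, and is not the specific physical-world attack used in the stop-sign research.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, true_label, epsilon=0.01):
    """Fast Gradient Sign Method: nudge every pixel by +/- epsilon in the
    direction that most increases the classifier's loss on the true label.
    With a small epsilon the change is nearly invisible, yet the predicted
    class can flip (for example, from 'stop' to 'yield')."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()

# Usage sketch; the classifier and tensors below are hypothetical placeholders.
# model = load_pretrained_sign_classifier()
# adv = fgsm_perturb(model, stop_sign_images, stop_sign_labels)
# model(adv).argmax(dim=1)  # may no longer predict the 'stop' class
```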

In 2007, Barack Obama’s political opponents insisted that footage existed of Michelle Obama ranting against “whitey.” In the future, they may not have to worry about whether it actually existed. If someone called their bluff, they may simply be able to invent it, using data from stock photos and pre-existing footage.

The next step would be one we are already familiar with: the exploitation of the algorithms used by social media sites like Twitter and Facebook to spread stories virally to those most inclined to show interest in them, even if those stories are fake.

It might be impossible to stop the advance of this kind of technology. But the relevant algorithms here aren’t only the ones that run on computer hardware. They are also the ones that undergird our too easily hacked media system, where garbage acquires the perfumed scent of legitimacy with all too much ease. Editors, journalists and news producers can play a role here — for good or for bad.

Outlets like Fox News spread stories about the murder of Democratic staff members and F.B.I. conspiracies to frame the president. Traditional news organizations, fearing that they might be left behind in the new attention economy, struggle to maximize “engagement with content.”

This gives them a built-in incentive to spread informational viruses that enfeeble the very democratic institutions that allow a free media to thrive. Cable news shows consider it their professional duty to provide “balance” by giving partisan talking heads free rein to spout nonsense — or amplify the nonsense of our current president.

It already feels as though we are living in an alternative science-fiction universe where no one agrees on what is true. Just think how much worse it will be when fake news becomes fake video. Democracy assumes that its citizens share the same reality. We’re about to find out whether democracy can be preserved when this assumption no longer holds….(More)”.

Should We Treat Data as Labor? Moving Beyond ‘Free’


Paper by Imanol Arrieta Ibarra, Leonard Goff, Diego Jiménez Hernández and Jaron Lanier: “In the digital economy, user data is typically treated as capital created by corporations observing willing individuals. This neglects users’ role in creating data, reducing incentives for users, distributing the gains from the data economy unequally and stoking fears of automation. Instead, treating data (at least partially) as labor could help resolve these issues and restore a functioning market for user contributions, but may run against the near-term interests of dominant data monopsonists who have benefited from data being treated as ‘free’. Countervailing power, in the form of competition, a data labor movement and/or thoughtful regulation could help restore balance….(More)”.

Artificial intelligence and privacy


Report by the Norwegian Data Protection Authority (DPA): “…If people cannot trust that information about them is being handled properly, it may limit their willingness to share information – for example with their doctor, or on social media. If we find ourselves in a situation in which sections of the population refuse to share information because they feel that their personal integrity is being violated, we will be faced with major challenges to our freedom of speech and to people’s trust in the authorities.

A refusal to share personal information will also represent a considerable challenge with regard to the commercial use of such data in sectors such as the media, retail trade and finance services.

About the report

This report elaborates on the legal opinions and the technologies described in the 2014 report «Big Data – privacy principles under pressure». In this report we will provide greater technical detail in describing artificial intelligence (AI), while also taking a closer look at four relevant AI challenges associated with the data protection principles embodied in the GDPR:

  • Fairness and discrimination
  • Purpose limitation
  • Data minimisation
  • Transparency and the right to information

This represents a selection of the data protection concerns that, in our opinion, are most relevant to the use of AI today.

The target group for this report consists of people who work with, or who for other reasons are interested in, artificial intelligence. We hope that engineers, social scientists, lawyers and other specialists will find this report useful….(More) (Download Report)”.

Data As Infrastructure


Report by Peter Kawalek and Ali Bayat: “In the 21st Century, data is infrastructure. This is because the managed and built environments increasingly depend upon data in real-time. Moreover, the sources of this data are potentially multiple, not necessarily arising from within the control of traditional institutions, and yet this data can be complex in form. It follows from this that data is a critical component and needs to be understood as a key part of 21st-century infrastructure, but that it presents new challenges to those institutions concerned with the safe and effective management of infrastructure.

It is now widely accepted that the digitization of the economy has taken root in a way that means it is not confined to one sector. All sectors are affected in some common ways. Brynjolfsson and McAfee are among those who have described this. The economic drivers behind digitization are successfully isolated and described by Goldfarb & Tucker, and there are important contributions to understanding given by economists including Levin and Nordhaus. It is proportionate to describe the economic and social ramifications within the frame of ‘Creative Destruction’, originally described by Schumpeter in 1942. In this light, the importance of data can be expected to grow across most or all industry sectors. Its effective management will become ever more critical to the economy and to society more widely….(More)”

Views on Open Data Business from Software Development Companies


Antti Herala, Jussi Kasurinen, and Erno Vanhala in the Journal of Theoretical and Applied Electronic Commerce Research: “The main concept of open data and its application is simple; access to publicly funded data provides greater returns from the public investment and can generate wealth through the downstream use of outputs, such as traffic information or weather forecast services. However, even though open data and data sharing as concepts are forty years old, with the open data initiative reaching ten, the practical actions and applications have tended to stay at a superficial level, and significant progress or success stories are hard to find. The current trend is that governments and municipalities are opening their data, but the impact and usefulness of raw open data repositories to citizens – and even to businesses – can be questioned. Besides the governments, a handful of private organizations are opening their data in an attempt to unlock the economic value of open data, but even they have difficulties finding innovative usage, let alone generating additional profit.

In a previous study it was found that companies are interested in open data and that this mindset spans different industries, covering both publicly available data and private business-to-business data access. Open data is not only a resource for software companies, but also for traditional engineering industries and even for small, non-franchised local markets and shops. That study also established evidence that companies recognize the applicability of open data, and that private organizations opening their data to clients creates business opportunities and new value.

However, while there is interest towards open data in a wide variety of businesses, the question remains whether open data is actually used to generate income, or whether other sharing methods in use are more efficient and more profitable.

For this study, four research questions were formulated. The first three concentrate on the usage of open data and the interest in opening or sharing data, while the fourth revolves around the different types of openness:

  • How do new clients express interest towards open data?
  • What kind of open data-based solutions is the existing clientele expecting?
  • How does the product portfolio of a software company respond to open data?
  • What are the current trends of open initiatives?…(More)”.

Earth Observation Open Science and Innovation


Open Access book edited by Pierre-Philippe Mathieu and Christoph Aubrecht: “Over the past decades, rapid developments in digital and sensing technologies, such as the Cloud, Web and Internet of Things, have dramatically changed the way we live and work. The digital transformation is revolutionizing our ability to monitor our planet and transforming the way we access, process and exploit Earth Observation data from satellites.

This book reviews these megatrends and their implications for the Earth Observation community as well as the wider data economy. It provides insight into new paradigms of Open Science and Innovation applied to space data, which are characterized by openness, access to large volumes of complex data, wide availability of new community tools, new techniques for big data analytics such as Artificial Intelligence, unprecedented levels of computing power, and new types of collaboration among researchers, innovators, entrepreneurs and citizen scientists. In addition, this book aims to provide readers with some reflections on the future of Earth Observation, highlighting through a series of use cases not just the new opportunities created by the New Space revolution, but also the new challenges that must be addressed in order to make the most of the large volume of complex and diverse data delivered by the new generation of satellites….(More)”.

Foursquare to The Rescue: Predicting Ambulance Calls Across Geographies


Paper by Anastasios Noulas, Colin Moffatt, Desislava Hristova, and Bruno Gonçalves: “Understanding how ambulance incidents are spatially distributed can shed light on the epidemiological dynamics of geographic areas and inform healthcare policy design. Here we analyze a longitudinal dataset of more than four million ambulance calls across a region of twelve million residents in the North West of England. With the aim of explaining geographic variations in ambulance call frequencies, we employ a wide range of data layers, including open government datasets describing population demographics and socio-economic characteristics, as well as geographic activity in online services such as Foursquare.

Working at a fine level of spatial granularity, we demonstrate that daytime population levels and the deprivation status of an area are the most important variables when it comes to predicting the volume of ambulance calls in an area. Foursquare check-ins, on the other hand, complement these government-sourced indicators, offering a novel view of local nightlife and commercial activity. We demonstrate how check-in activity can provide an edge when predicting certain types of emergency incidents in a multivariate regression model…(More)”.
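
As a rough illustration of the kind of model the authors describe (a hedged sketch with synthetic data, not the paper's actual code or variables), an area-level multivariate regression combining census-style indicators with check-in counts might look like this:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Hypothetical area-level features: daytime population, deprivation score,
# and Foursquare check-in counts for nightlife and retail venues.
# All values below are synthetic, generated only to make the example runnable.
rng = np.random.default_rng(42)
n_areas = 500
X = np.column_stack([
    rng.uniform(1_000, 50_000, n_areas),  # daytime population
    rng.uniform(0, 1, n_areas),           # deprivation index
    rng.poisson(40, n_areas),             # nightlife check-ins
    rng.poisson(120, n_areas),            # retail check-ins
])
# Synthetic target: ambulance call volume loosely driven by the features above.
y = 0.002 * X[:, 0] + 30 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(0, 5, n_areas)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print("Held-out R^2:", round(r2_score(y_test, model.predict(X_test)), 3))
print("Coefficients (population, deprivation, nightlife, retail):", model.coef_)
```

In a model of this form, comparing each feature's contribution (for example, via standardized coefficients) is how one would assess whether check-in activity adds predictive signal beyond the government-sourced indicators.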

A Roadmap to a Nationwide Data Infrastructure for Evidence-Based Policymaking


Introduction by Julia Lane and Andrew Reamer of a Special Issue of the Annals of the American Academy of Political and Social Science: “Throughout the United States, there is broad interest in expanding the nation’s capacity to design and implement public policy based on solid evidence. That interest has been stimulated by the new types of data that are available that can transform the way in which policy is designed and implemented. Yet progress in making use of sensitive data has been hindered by the legal, technical, and operational obstacles to access for research and evaluation. Progress has also been hindered by an almost exclusive focus on the interest and needs of the data users, rather than the interest and needs of the data providers. In addition, data stewardship is largely artisanal in nature.

There are very real consequences that result from lack of action. State and local governments are often hampered in their capacity to effectively mount and learn from innovative efforts. Although jurisdictions often have treasure troves of data from existing programs, the data are stove-piped, underused, and poorly maintained. The experience reported by one large city public health commissioner is too common: “We commissioners meet periodically to discuss specific childhood deaths in the city. In most cases, we each have a thick file on the child or family. But the only time we compare notes is after the child is dead.” In reality, most localities lack the technical, analytical, staffing, and legal capacity to make effective use of existing and emerging resources.

It is our sense that fundamental changes are necessary and a new approach must be taken to building data infrastructures. In particular,

  1. Privacy and confidentiality issues must be addressed at the beginning—not added as an afterthought.
  2. Data providers must be involved as key stakeholders throughout the design process.
  3. Workforce capacity must be developed at all levels.
  4. The scholarly community must be engaged to identify the value to research and policy….

To develop a roadmap for the creation of such an infrastructure, the Bill and Melinda Gates Foundation, together with the Laura and John Arnold Foundation, hosted a day-long workshop of more than sixty experts to discuss the findings of twelve commissioned papers and their implications for action. This volume of The ANNALS showcases those twelve articles. The workshop papers were grouped into three thematic areas: privacy and confidentiality, the views of data producers, and comprehensive strategies that have been used to build data infrastructures in other contexts. The authors and the attendees included computer scientists, social scientists, practitioners, and data producers.

This introductory article places the research in both an historical and a current context. It also provides a framework for understanding the contribution of the twelve articles….(More)”.