Online Political Microtargeting: Promises and Threats for Democracy


Frederik Zuiderveen Borgesius et al. in Utrecht Law Review: “Online political microtargeting involves monitoring people’s online behaviour, and using the collected data, sometimes enriched with other data, to show people targeted political advertisements. Online political microtargeting is widely used in the US; Europe may not be far behind.

This paper maps microtargeting’s promises and threats to democracy. For example, microtargeting promises to optimise the match between the electorate’s concerns and political campaigns, and to boost campaign engagement and political participation. But online microtargeting could also threaten democracy. For instance, a political party could, misleadingly, present itself as a different one-issue party to different individuals. And data collection for microtargeting raises privacy concerns. We sketch possibilities for policymakers if they seek to regulate online political microtargeting. We discuss which measures would be possible, while complying with the right to freedom of expression under the European Convention on Human Rights….(More)”.
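To make the targeting mechanism concrete, here is a minimal sketch of rule-based microtargeting. Everything in it — the behavioural signals, segments, and messages — is invented for illustration; the paper describes the practice, not this code.

```python
# Toy illustration of the microtargeting loop described above:
# observed behaviour -> inferred interest segment -> tailored message.
# All segments, URLs, and messages are hypothetical.

def infer_segment(browsing_history: list[str]) -> str:
    """Crudely infer an interest segment from observed page visits."""
    if any("farming" in url for url in browsing_history):
        return "rural"
    if any("housing" in url for url in browsing_history):
        return "urban_renter"
    return "general"

# The threat scenario from the abstract: the same party presents a
# different single-issue face to each segment.
PARTY_MESSAGES = {
    "rural": "Party X: we stand for farmers, first and only.",
    "urban_renter": "Party X: rent control is our sole priority.",
    "general": "Party X: a party for everyone.",
}

def select_ad(browsing_history: list[str]) -> str:
    return PARTY_MESSAGES[infer_segment(browsing_history)]

print(select_ad(["news.example/farming-subsidies"]))  # rural-facing message
print(select_ad(["blog.example/housing-crisis"]))     # renter-facing message
```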

Data Collaboratives can transform the way civil society organisations find solutions


Stefaan G. Verhulst at Disrupt & Innovate: “The need for innovation is clear: The twenty-first century is shaping up to be one of the most challenging in recent history. From climate change to income inequality to geopolitical upheaval and terrorism: the difficulties confronting International Civil Society Organisations (ICSOs) are unprecedented not only in their variety but also in their complexity. At the same time, today’s practices and tools used by ICSOs seem stale and outdated. Increasingly, it is clear, we need not only new solutions but new methods for arriving at solutions.

Data will likely become more central to meeting these challenges. We live in a quantified era. It is estimated that 90% of the world’s data was generated in just the last two years. We know that this data can help us understand the world in new ways and help us meet the challenges mentioned above. However, we need new data collaboration methods to help us extract the insights from that data.

UNTAPPED DATA POTENTIAL

For all of data’s potential to address public challenges, the truth remains that most data generated today is in fact collected by the private sector – including ICSOs, which often collect vast amounts of data – such as, for instance, the International Committee of the Red Cross, which generates various (often sensitive) data related to humanitarian activities. This data, typically ensconced in tightly held databases in order to maintain competitive advantage or to protect against harmful intrusion, contains tremendous potential insights and avenues for innovation in how we solve public problems. But because of access restrictions and often limited data science capacity, its vast potential often goes untapped.

DATA COLLABORATIVES AS A SOLUTION

Data Collaboratives offer a way around this limitation. They represent an emerging public-private partnership model, in which participants from different areas — including the private sector, government, and civil society — come together to exchange data and pool analytical expertise.

While still an emerging practice, examples of such partnerships now exist around the world, across sectors and public policy domains. Importantly, several ICSOs have started to collaborate with others around their own data and that of the private and public sectors. For example:

  • Several civil society organisations, academics, and donor agencies are partnering in the Health Data Collaborative to improve the global data infrastructure necessary to make smarter global and local health decisions and to track progress against the Sustainable Development Goals (SDGs).
  • Additionally, the UN Office for the Coordination of Humanitarian Affairs (UNOCHA) built the Humanitarian Data Exchange (HDX), a platform for sharing humanitarian data from and for ICSOs – including Caritas, InterAction and others – donor agencies, national and international bodies, and other humanitarian organisations.

These are a few examples of Data Collaboratives that ICSOs are participating in. Yet, the potential for collaboration goes beyond these examples. Likewise, so do the concerns regarding data protection and privacy….(More)”.

The future of statistics and data science


Paper by Sofia C. Olhede and Patrick J. Wolfe in Statistics & Probability Letters: “The Danish physicist Niels Bohr is said to have remarked: “Prediction is very difficult, especially about the future”. Predicting the future of statistics in the era of big data is not so very different from prediction about anything else. Ever since we started to collect data to predict cycles of the moon, seasons, and hence future agricultural yields, humankind has worked to infer information from indirect observations for the purpose of making predictions.

Even while acknowledging the momentous difficulty in making predictions about the future, a few topics stand out clearly as lying at the current and future intersection of statistics and data science. Not all of these topics are of a strictly technical nature, but all have technical repercussions for our field. How might these repercussions shape the still relatively young field of statistics? And what can sound statistical theory and methods bring to our understanding of the foundations of data science? In this article we discuss these issues and explore how new open questions motivated by data science may in turn necessitate new statistical theory and methods now and in the future.

Together, the ubiquity of sensing devices, the low cost of data storage, and the commoditization of computing have led to a volume and variety of modern data sets that would have been unthinkable even a decade ago. We see four important implications for statistics.

First, many modern data sets are related in some way to human behavior. Data might have been collected by interacting with human beings, or personal or private information traceable back to a given set of individuals might have been handled at some stage. Mathematical or theoretical statistics traditionally does not concern itself with the finer points of human behavior, and indeed many of us have only had limited training in the rules and regulations that pertain to data derived from human subjects. Yet inevitably in a data-rich world, our technical developments cannot be divorced from the types of data sets we can collect and analyze, and how we can handle and store them.

Second, the importance of data to our economies and civil societies means that the future of regulation will look not only to protect our privacy, and how we store information about ourselves, but also to include what we are allowed to do with that data. For example, as we collect high-dimensional vectors about many family units across time and space in a given region or country, privacy will be limited by that high-dimensional space, but our wish to control what we do with data will go beyond that….

Third, the growing complexity of algorithms is matched by an increasing variety and complexity of data. Data sets now come in a variety of forms that can be highly unstructured, including images, text, sound, and various other new forms. These different types of observations have to be understood together, resulting in multimodal data, in which a single phenomenon or event is observed through different types of measurement devices. Rather than having one phenomenon corresponding to single scalar values, a much more complex object is typically recorded. This could be a three-dimensional shape, for example in medical imaging, or multiple types of recordings such as functional magnetic resonance imaging and simultaneous electroencephalography in neuroscience. Data science therefore challenges us to describe these more complex structures, modeling them in terms of their intrinsic patterns.
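As a hypothetical illustration of such multimodal records, a single observation might bundle several measurement types into one object; the field names and array shapes below are invented:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MultimodalObservation:
    """One phenomenon observed through several measurement devices.
    Field names and shapes are hypothetical, for illustration only."""
    subject_id: str
    fmri_volume: np.ndarray   # e.g. a 3-D scan, shape (64, 64, 40)
    eeg_trace: np.ndarray     # simultaneous EEG, shape (channels, samples)
    clinical_notes: str       # unstructured free text

obs = MultimodalObservation(
    subject_id="s001",
    fmri_volume=np.zeros((64, 64, 40)),
    eeg_trace=np.zeros((32, 5000)),
    clinical_notes="Resting-state session; no artefacts reported.",
)
```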

Finally, the types of data sets we now face are far from satisfying the classical statistical assumptions of identically distributed and independent observations. Observations are often “found” or repurposed from other sampling mechanisms, rather than necessarily resulting from designed experiments….
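A small simulation, under invented assumptions, shows why “found” data can mislead: when the mechanism that puts an observation into the data set is correlated with the quantity of interest, the naive estimate stays biased no matter how much data accumulates.

```python
import numpy as np

rng = np.random.default_rng(0)

# True population: incomes, log-normally distributed (hypothetical units).
population = rng.lognormal(mean=10.0, sigma=0.8, size=100_000)

# A designed, random sample estimates the population mean well.
random_sample = rng.choice(population, size=5_000, replace=False)

# "Found" data: suppose the platform whose records we repurpose is used
# disproportionately by higher earners (a made-up inclusion mechanism).
inclusion_prob = population / population.max()
found = population[rng.random(population.size) < inclusion_prob]

print(f"true mean:          {population.mean():12.0f}")
print(f"random-sample mean: {random_sample.mean():12.0f}")  # close to truth
print(f"found-data mean:    {found.mean():12.0f}")          # biased upward
```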

 Our field will either meet these challenges and become increasingly ubiquitous, or risk rapidly becoming irrelevant to the future of data science and artificial intelligence….(More)”.

Spanning Today’s Chasms: Seven Steps to Building Trusted Data Intermediaries


James Shulman at the Mellon Foundation: “In 2001, when hundreds of individual colleges and universities were scrambling to scan their slide libraries, The Andrew W. Mellon Foundation created a new organization, Artstor, to assemble a massive library of digital images from disparate sources to support teaching and research in the arts and humanities.

Rather than encouraging—or paying for—each school to scan its own slide of the Mona Lisa, the Mellon Foundation created an intermediary organization that would balance the interests of those who created, photographed and cared for art works, such as artists and museums, and those who wanted to use such images for the admirable calling of teaching and studying history and culture.  This organization would reach across the gap that separated these two communities and would respect and balance the interests of both sides, while helping each accomplish their missions.  At the same time that Napster was using technology to facilitate the un-balanced transfer of digital content from creators to users, the Mellon Foundation set up a new institution aimed at respecting the interests of one side of the market and supporting the socially desirable work of the other.

As the internet has enabled the sharing of data across the world, new intermediaries have emerged as entire platforms. A networked world needs such bridges—think Etsy or eBay sitting between sellers and buyers, or Facebook sitting between advertisers and users. While intermediaries that match sellers and buyers of things provide a marketplace to bridge from one side or the other, aggregators of data work in admittedly more shadowy territories.

In the many realms that market forces won’t support, however, a great deal of public good can be done by aggregating and managing access to datasets that might otherwise continue to live in isolation. Whether due to institutional sociology that favors local solutions, the technical challenges associated with merging heterogeneous databases built with different data models, intellectual property limitations, or privacy concerns, datasets are built and maintained by independent groups that—if networked—could be used to further each other’s work.

Think of those studying coral reefs, or those studying labor practices in developing markets, or child welfare offices seeking to call upon court records in different states, or medical researchers working in different sub-disciplines but on essentially the same disease.  What intermediary invests in joining these datasets?  Many people assume that computers can simply “talk” to each other and share data intuitively, but without targeted investment in connecting them, they can’t.  Unlike modern databases that are now often designed with the cloud in mind, decades of locally created databases churn away in isolation, at great opportunity cost to us all.
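A minimal sketch of the problem, using two invented child-welfare-style tables: the records describe the same kind of case, but the schemas differ in column names, identifier formats, and units, so the join only happens once someone writes the mapping by hand.

```python
import pandas as pd

# Two hypothetical state systems recording the same kind of case data.
state_a = pd.DataFrame({
    "case_id": ["A-001", "A-002"],
    "child_age_months": [30, 54],
    "county": ["Adams", "Butler"],
})
state_b = pd.DataFrame({
    "CaseNumber": ["b/001", "b/002"],
    "age_years": [4.0, 7.5],
    "CountyName": ["CLARK", "DOUGLAS"],
})

# The databases cannot "talk" until the mapping is built explicitly.
state_b_harmonised = pd.DataFrame({
    "case_id": state_b["CaseNumber"].str.upper().str.replace("/", "-"),
    "child_age_months": (state_b["age_years"] * 12).astype(int),
    "county": state_b["CountyName"].str.title(),
})

merged = pd.concat([state_a, state_b_harmonised], ignore_index=True)
print(merged)
```

Multiply this hand-built mapping across decades of locally designed databases, and the need for a dedicated intermediary to invest in the joining becomes clear.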

Art history research is an unusually vivid example. Most people can understand that if you want to study Caravaggio, you don’t want to hunt and peck across hundreds of museums, books, photo archives, libraries, churches, and private collections.  You want all that content in one place—exactly what Mellon sought to achieve by creating Artstor.

What did we learn in creating Artstor that might be distilled as lessons for others taking on an aggregation project to serve the public good?….(More)”.

Facebook’s next project: American inequality


Nancy Scola at Politico: “Facebook CEO Mark Zuckerberg is quietly cracking open his company’s vast trove of user data for a study on economic inequality in the U.S. — the latest sign of his efforts to reckon with divisions in American society that the social network is accused of making worse.

The study, which hasn’t previously been reported, is mining the social connections among Facebook’s American users to shed light on the growing income disparity in the U.S., where the top 1 percent of households is said to control 40 percent of the country’s wealth. Facebook is an incomparably rich source of information for that kind of research: By one estimate, about three of five American adults use the social network….

Facebook confirmed the broad contours of its partnership with Stanford economist Raj Chetty but declined to elaborate on the substance of the study. Chetty, in a brief interview following a January speech in Washington, said he and his collaborators — who include researchers from Stanford and New York University — have been working on the inequality study for at least six months.

“We’re using social networks, and measuring interactions there, to understand the role of social capital much better than we’ve been able to,” he said.

Researchers say they see Facebook’s enormous cache of data as a remarkable resource, offering an unprecedentedly detailed and sweeping look at American society. That store of information contains both details that a user might tell Facebook — their age, hometown, schooling, family relationships — and insights that the company has picked up along the way, such as the interest groups they’ve joined and geographic distribution of who they call a “friend.”

It’s all the more significant, researchers say, when you consider that Facebook’s user base — about 239 million monthly users in the U.S. and Canada at last count — cuts across just about every demographic group.

And all that information, say researchers, lets them take guesses about users’ wealth. Facebook itself recently patented a way of figuring out someone’s socioeconomic status using factors ranging from their stated hobbies to how many internet-connected devices they own.
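The patent’s actual method is not public in detail; the toy sketch below, with entirely fabricated features and labels, only illustrates the general idea of predicting a socioeconomic label from behavioural signals such as device counts and stated hobbies.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Fabricated features per user: [internet-connected device count,
# has_travel_hobby (0/1), has_couponing_hobby (0/1)].
X = rng.integers(0, 2, size=(200, 3)).astype(float)
X[:, 0] = rng.integers(1, 8, size=200)  # device counts 1-7

# Fabricated labels: 1 = "higher SES", made to correlate with the features.
y = ((X[:, 0] > 4) & (X[:, 1] == 1)).astype(int)

model = LogisticRegression().fit(X, y)
print(model.predict([[6.0, 1.0, 0.0]]))  # many devices + travel -> likely 1
print(model.predict([[2.0, 0.0, 1.0]]))  # few devices + couponing -> likely 0
```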

A Facebook spokesman addressed the potential privacy implications of the study’s access to user data, saying, “We conduct research at Facebook responsibly, which includes making sure we protect people’s information.” The spokesman added that Facebook follows an “enhanced” review process for research projects, adopted in 2014 after a controversy over a study that manipulated some people’s news feeds to see if it made them happier or sadder.

According to a Stanford University source familiar with Chetty’s study, the Facebook account data used in the research has been stripped of any details that could be used to identify users. The source added that academics involved in the study have gone through security screenings that include background checks, and can access the Facebook data only in secure facilities….(More)”.
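The article does not say how the stripping was done. A common minimal approach, sketched here with invented fields, is to drop direct identifiers and replace the account key with a salted one-way hash, so rows can still be linked across tables without exposing whom they belong to:

```python
import hashlib

SALT = b"project-specific-secret"  # held by the data custodian, not analysts

def pseudonymise(user_record: dict) -> dict:
    """Drop direct identifiers; keep a salted hash as a stable join key."""
    token = hashlib.sha256(SALT + user_record["account_id"].encode()).hexdigest()
    return {
        "user_token": token,
        "hometown_region": user_record["hometown_region"],
        "friend_count": user_record["friend_count"],
        # name, email, and account_id are deliberately omitted
    }

raw = {"account_id": "12345", "name": "A. Person", "email": "a@example.com",
       "hometown_region": "Midwest", "friend_count": 310}
print(pseudonymise(raw))
```

Note that pseudonymisation of this kind is weaker than true anonymisation: re-identification from the remaining attributes can still be possible, which is presumably why the security screenings and secure facilities described above matter.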

An AI That Reads Privacy Policies So That You Don’t Have To


Andy Greenberg at Wired: “…Today, researchers at Switzerland’s Federal Institute of Technology at Lausanne (EPFL), the University of Wisconsin and the University of Michigan announced the release of Polisis—short for “privacy policy analysis”—a new website and browser extension that uses their machine-learning-trained app to automatically read and make sense of any online service’s privacy policy, so you don’t have to.

In about 30 seconds, Polisis can read a privacy policy it’s never seen before and extract a readable summary, displayed in a graphic flow chart, of what kind of data a service collects, where that data could be sent, and whether a user can opt out of that collection or sharing. Polisis’ creators have also built a chat interface they call Pribot that’s designed to answer questions about any privacy policy, intended as a sort of privacy-focused paralegal advisor. Together, the researchers hope those tools can unlock the secrets of how tech firms use your data that have long been hidden in plain sight….

Polisis isn’t actually the first attempt to use machine learning to pull human-readable information out of privacy policies. Both Carnegie Mellon University and Columbia have made their own attempts at similar projects in recent years, points out NYU Law Professor Florencia Marotta-Wurgler, who has focused her own research on user interactions with terms of service contracts online. (One of her own studies showed that only 0.07 percent of users actually click on a terms of service link before clicking “agree.”) The Usable Privacy Policy Project, a collaboration that includes both Columbia and CMU, released its own automated tool to annotate privacy policies just last month. But Marotta-Wurgler notes that Polisis’ visual and chat-bot interfaces haven’t been tried before, and says the latest project is also more detailed in how it defines different kinds of data. “The granularity is really nice,” Marotta-Wurgler says. “It’s a way of communicating this information that’s more interactive.”…(More)”.
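Polisis itself relies on purpose-built, machine-learning models trained on expert-annotated policies; the sketch below is only a toy stand-in showing the general shape of the task — classifying policy sentences into categories such as collection, sharing, and opt-out — with a fabricated miniature training set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A fabricated, tiny training set; real systems learn from thousands of
# expert-annotated policy segments.
segments = [
    "We collect your location and browsing history.",
    "We gather device identifiers when you use the app.",
    "Your data may be shared with advertising partners.",
    "We disclose information to third parties for marketing.",
    "You may opt out of data sharing in your settings.",
    "Users can disable collection at any time.",
]
labels = ["collection", "collection", "sharing", "sharing", "opt-out", "opt-out"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(segments, labels)

# With a corpus this small the output is only suggestive of the approach.
print(clf.predict(["We record the pages you visit on our site."]))
```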

Building Trust in Data and Statistics


Shaida Badiee at UN World Data Forum: “…What do we want for a 2030 data ecosystem?

Hope to achieve: A world where data are part of the DNA and culture of decision-making, used by all and valued as an important public good. A world where citizens trust the systems that produce data and have the skills and means to use and verify their quality and accuracy. A world where there are safeguards in place to protect privacy, while bringing the benefits of open data to all. In this world, countries value their national statistical systems, which are working independently with trusted partners in the public and private sectors and citizens to continuously meet the changing and expanding demands from data users and policy makers. Private sector data generators are generously sharing their data with the public sector. And gaps in data are closing, making the dream of “leaving no one behind” come true, with SDG goals on the path to being met by 2030.

Hope to avoid: A world where large corporations control the bulk of national and international data and statistics with only limited sharing with the public sector, academics, and citizens. A culture of “every man for himself” and “who pays, wins” dominates data-sharing practices. National statistical systems are under-resourced and under-valued, with low trust from users, further weakening them and undermining their independence from political interference and their ability to control quality. The divide between those who have and those who do not have access, skills, and the ability to use data for decision-making and policy has widened. Data systems and their promise to count the uncounted and “leave no one behind” are falling behind due to low capacity and poor standards and institutions, and the hope of the 2030 agenda is fading.

With this vision in mind, are we on the right path? An optimist would say we are closer to the data ecosystem that we want to achieve. However, there are also some examples of movement in the wrong direction. There is no magic wand to make our wish come true, but a powerful enabler would be building trust in data and statistics. Therefore, this should be included as a goal in all our data strategies and action plans.

Here are some important building blocks underlying trust in data and statistics:

  1. Building strong organizational infrastructure, governance, and partnerships;
  2. Following sound data standards and principles for production, sharing, interoperability, and dissemination; and
  3. Addressing the last mile in the data value chain to meet users’ needs, create value with data, and ensure meaningful impacts…(More)”.

Self-Tracking: Empirical and Philosophical Investigations


Book edited by Btihaj Ajana: “…provides an empirical and philosophical investigation of self-tracking practices. In recent years, there has been an explosion of apps and devices that enable the data capturing and monitoring of everyday activities, behaviours and habits. Encouraged by movements such as the Quantified Self, a growing number of people are embracing this culture of quantification and tracking in the spirit of improving their health and wellbeing.

The aim of this book is to enhance understanding of this fast-growing trend, bringing together scholars who are working at the forefront of the critical study of self-tracking practices. Each chapter provides a different conceptual lens through which one can examine these practices, while grounding the discussion in relevant empirical examples.

From phenomenology to discourse analysis, from questions of identity, privacy and agency to issues of surveillance and tracking at the workplace, this edited collection takes on a wide, and yet focused, approach to the timely topic of self-tracking. It constitutes a useful companion for scholars, students and everyday users interested in the Quantified Self phenomenon…(More)”.

A Really Bad Blockchain Idea: Digital Identity Cards for Rohingya Refugees


Wayan Vota at ICTworks: “The Rohingya Project claims to be a grassroots initiative that will empower Rohingya refugees with a blockchain-leveraged financial ecosystem tied to digital identity cards….

What Could Possibly Go Wrong?

Concerns about Rohingya data collection are not new, so Linda Raftree‘s Facebook post about blockchain for biometrics started a spirited discussion on this escalation of techno-utopia. Several people put forth great points about the Rohingya Project’s potential failings. For me, there were four key questions originating in the discussion that we should all be debating:

1. Who Determines Ethnicity?

Ethnicity isn’t a scientific way to categorize humans. Ethnic groups are based on human constructs such as common ancestry, language, society, culture, or nationality. Who are the Rohingya Project to be the ones determining who is Rohingya or not? And what is this rigorous assessment they have that will do what science cannot?

Might it be better not to perpetuate the very divisions that cause these issues? Or at the very least, let people self-determine their own ethnicity.

2. Why Digitally Identify Refugees?

Let’s say that we could group a people based on objective metrics. Should we? Especially if that group is persecuted where it currently lives and in many of its surrounding countries? Wouldn’t making a list of who is persecuted be a handy reference for those who seek to persecute more?

Instead, shouldn’t we focus on changing the mindset of the persecutors and stop the persecution?

3. Why Blockchain for Biometrics?

How could linking a highly persecuted people’s biometric information, such as fingerprints, iris scans, and photographs, to a public, universal, and immutable distributed ledger be a good thing?

Might it be highly irresponsible to digitize all that information? Couldn’t that data be used by nefarious actors to perpetuate new and worse exploitation of the Rohingya? India has already lost Aadhaar data, and Equifax lost Americans’ data. How will the small, lightly funded Rohingya Project do better?

Could it be possible that old-fashioned paper forms are a better solution than digital identity cards? Maybe laminate them for greater durability, but paper identity cards can be hidden, even destroyed if needed, to conceal information that could be used against the owner.
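The core of the objection fits in a few lines: an append-only, hash-chained ledger deliberately has no update or delete operation, so whatever is written — even just a digest of a fingerprint — is permanent by design. This is a generic toy chain, not the Rohingya Project’s actual system.

```python
import hashlib, json, time

class ToyLedger:
    """A minimal hash-chained, append-only log. There is deliberately no
    method to amend or delete an entry: immutability is the design goal."""
    def __init__(self):
        self.blocks = []

    def append(self, payload: dict) -> None:
        prev = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        body = json.dumps({"prev": prev, "ts": time.time(), "data": payload},
                          sort_keys=True)
        self.blocks.append({"body": body,
                            "hash": hashlib.sha256(body.encode()).hexdigest()})

ledger = ToyLedger()
# Once written, even a *digest* of biometric data can never be withdrawn,
# and a stable identifier lets observers link records to a person forever.
ledger.append({"person": "hashed-id-abc", "biometric_digest": "e3b0c442..."})
print(len(ledger.blocks), ledger.blocks[0]["hash"][:16])
```

Contrast this with the paper card above: paper can be hidden or destroyed; a public, replicated ledger cannot.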

4. Why Experiment on the Powerless?

Rohingya refugees already suffer from massive power imbalances, and now they’ll be asked to give up their digital privacy, and use experimental technology, as part of an NGO’s experiment, in order to get needed services.

It’s not like they’ll have the agency to say no. They are homeless, often penniless refugees, who will probably have no realistic way to opt out of digital identity cards, even if they don’t want to be experimented on while they flee persecution….(More)”

Artificial intelligence and privacy


Report by the Norwegian Data Protection Authority (DPA): “…If people cannot trust that information about them is being handled properly, it may limit their willingness to share information – for example with their doctor, or on social media. If we find ourselves in a situation in which sections of the population refuse to share information because they feel that their personal integrity is being violated, we will be faced with major challenges to our freedom of speech and to people’s trust in the authorities.

A refusal to share personal information will also represent a considerable challenge with regard to the commercial use of such data in sectors such as the media, retail trade and finance services.

About the report

This report elaborates on the legal opinions and the technologies described in the 2014 report «Big Data – privacy principles under pressure». In this report we will provide greater technical detail in describing artificial intelligence (AI), while also taking a closer look at four relevant AI challenges associated with the data protection principles embodied in the GDPR:

  • Fairness and discrimination
  • Purpose limitation
  • Data minimisation
  • Transparency and the right to information

This represents a selection of the data protection concerns that, in our opinion, are most relevant to the use of AI today.
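To make one of these principles concrete, here is a hedged illustration of data minimisation — processing no more personal data than the stated purpose requires — applied at ingestion. The fields, purpose, and values are all invented.

```python
import pandas as pd

# Hypothetical raw records collected by an insurer (all fields invented).
raw = pd.DataFrame({
    "customer_id": [1, 2],
    "name": ["A. Hansen", "B. Olsen"],
    "religion": ["(collected by mistake)", "(collected by mistake)"],
    "postcode": ["0150", "5003"],
    "annual_mileage": [8000, 22000],
    "claims_last_5y": [0, 2],
})

# Stated purpose: estimate accident risk. Keep only the fields that purpose
# requires; identifiers and unrelated sensitive attributes never enter the
# training pipeline.
PURPOSE_FIELDS = ["annual_mileage", "claims_last_5y"]
minimised = raw[PURPOSE_FIELDS]
print(minimised)
```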

The target group for this report consists of people who work with, or who for other reasons are interested in, artificial intelligence. We hope that engineers, social scientists, lawyers and other specialists will find this report useful….(More) (Download Report)”.