Eurobarometer survey shows support for sustainability and data sharing


Press Release: “Europeans want their digital devices to be easier to repair or recycle and are willing to share their personal information to improve public services, as a special Eurobarometer survey shows. The survey, released today, measured attitudes towards the impact of digitalisation on the daily lives of Europeans in 27 EU Member States and the United Kingdom. It covers several different areas, including digitalisation and the environment, sharing personal information, disinformation, digital skills and the use of digital ID….

Overall, 59% of respondents would be willing to share some of their personal information securely to improve public services. In particular, most respondents are willing to share their data to improve medical research and care (42%), to improve the response to crises (31%) or to improve public transport and reduce air pollution (26%).

An overwhelming majority of respondents who use their social media accounts to log in to other online services (74%) want to know how their data is used. A large majority would consider it useful to have a secure single digital ID that could serve for all online services and give them control over the use of their data….

In addition to the Special Eurobarometer report, the latest iteration of the Standard Eurobarometer, conducted in November 2019, also tested public perceptions related to Artificial Intelligence. The findings were also published in a separate report today.

Around half of the respondents (51%) said that public policy intervention is needed to ensure ethical applications. Half of the respondents (50%) mention the healthcare sector as the area where AI could be most beneficial. A strong majority (80%) of the respondents think that they should be informed when a digital service or mobile application uses AI in various situations….(More)”.

Accelerating AI with synthetic data


Essay by Khaled El Emam: “The application of artificial intelligence and machine learning to solve today’s problems requires access to large amounts of data. One of the key obstacles faced by analysts is access to this data (for example, these issues were reflected in reports from the Government Accountability Office and the McKinsey Global Institute).

Synthetic data can help solve this data problem in a privacy preserving manner.

What is synthetic data?

Data synthesis is an emerging privacy-enhancing technology that can enable access to realistic data, that is, information that is synthetic but has the properties of an original dataset. At the same time, it ensures that such information can be used and disclosed with reduced obligations under contemporary privacy statutes. Synthetic data retains the statistical properties of the original data, so there is an increasing number of use cases where it can serve as a proxy for real data.

Synthetic data is created by taking an original (real) dataset and then building a model to characterize the distributions and relationships in that data — this is called the “synthesizer.” The synthesizer is typically an artificial neural network or other machine learning technique that learns these (original) data characteristics. Once that model is created, it can be used to generate synthetic data. The data is generated from the model and does not have a 1:1 mapping to real data, meaning that the likelihood of mapping the synthetic records to real individuals would be very small; it is therefore not considered personal information.
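To make the pipeline concrete, here is a minimal sketch of synthesizing a small structured dataset. It is an illustration under stated assumptions, not the essay’s method: a Gaussian mixture model stands in for the synthesizer (the essay notes that neural networks are more typical), and the toy age/income data is invented for the example.

```python
# Minimal sketch of data synthesis for structured (tabular) data.
# Assumption: a Gaussian mixture model stands in for the "synthesizer".
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)

# Toy "original" dataset: age (years) and income (thousands), loosely correlated.
age = rng.normal(45, 12, size=1_000)
income = 20 + 1.2 * age + rng.normal(0, 10, size=1_000)
original = np.column_stack([age, income])

# 1. Fit the "synthesizer": learn the distributions and relationships in the data.
synthesizer = GaussianMixture(n_components=5, random_state=0)
synthesizer.fit(original)

# 2. Generate synthetic records from the fitted model, not by copying real rows.
synthetic, _ = synthesizer.sample(n_samples=1_000)

# The synthetic data should reproduce aggregate statistics of the original.
print("original means: ", original.mean(axis=0).round(1))
print("synthetic means:", synthetic.mean(axis=0).round(1))
print("original corr:  ", round(float(np.corrcoef(original.T)[0, 1]), 2))
print("synthetic corr: ", round(float(np.corrcoef(synthetic.T)[0, 1]), 2))
```

The point to notice is that the synthetic rows are drawn from the fitted model rather than copied from real rows, while aggregate statistics such as means and correlations are approximately preserved.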

Many different types of data can be synthesized, including images, video, audio, text and structured data. The main focus in this article is on the synthesis of structured data.

Even though data can be generated in this manner, that does not mean it cannot be personal information. If the synthesizer is overfit to real data, then the generated data will replicate the original real data. Therefore, the synthesizer has to be constructed in a manner that avoids such overfitting. A formal privacy assurance should also be performed on the synthesized data to validate that there is only a weak mapping between synthetic records and real individuals….(More)”.
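Continuing the sketch above (and reusing its `original` and `synthetic` arrays), one rough way to probe for this kind of overfitting is to compare how close synthetic records sit to their nearest real record against how close real records sit to one another; distances collapsing toward zero would suggest the synthesizer is reproducing real individuals. This heuristic is only an informal check, not the formal privacy assurance the essay calls for.

```python
# Rough overfitting/memorization check (not a formal privacy assurance):
# synthetic records should not sit unusually close to individual real records.
import numpy as np
from sklearn.neighbors import NearestNeighbors

nn_real = NearestNeighbors(n_neighbors=2).fit(original)

# Distance from each synthetic record to its nearest real record.
synth_to_real, _ = nn_real.kneighbors(synthetic, n_neighbors=1)

# Baseline: distance from each real record to its nearest *other* real record
# (column 0 is the record itself, at distance 0).
real_to_real, _ = nn_real.kneighbors(original, n_neighbors=2)
real_to_real = real_to_real[:, 1]

print("median synthetic-to-real distance:", round(float(np.median(synth_to_real)), 2))
print("median real-to-real distance:     ", round(float(np.median(real_to_real)), 2))

# A red flag would be synthetic-to-real distances collapsing toward zero
# relative to the real-to-real baseline, i.e. near-copies of real individuals.
```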

We All Wear Tinfoil Hats Now


Article by Geoff Shullenberger on “How fears of mind control went from paranoid delusion to conventional wisdom”: “In early 2017, after the double shock of Brexit and the election of Donald Trump, the British data-mining firm Cambridge Analytica gained sudden notoriety. The previously little-known company, reporters claimed, had used behavioral influencing techniques to turn out social media users to vote in both elections. By its own account, Cambridge Analytica had worked with both campaigns to produce customized propaganda for targeting individuals on Facebook likely to be swept up in the tide of anti-immigrant populism. Its methods, some news sources suggested, might have sent enough previously disengaged voters to the polls to have tipped the scales in favor of the surprise victors. To a certain segment of the public, this story seemed to answer the question raised by both upsets: How was it possible that the seemingly solid establishment consensus had been rejected? What’s more, the explanation confirmed everything that seemed creepy about the Internet, evoking a sci-fi vision of social media users turned into an army of political zombies, mobilized through subliminal manipulation.

Cambridge Analytica’s violations of Facebook users’ privacy have made it an enduring symbol of the dark side of social media. However, the more dramatic claims about the extent of the company’s political impact collapse under closer scrutiny, mainly because its much-hyped “psychographic targeting” methods probably don’t work. As former Facebook product manager Antonio García Martínez noted in a 2018 Wired article, “the public, with no small help from the media sniffing a great story, is ready to believe in the supernatural powers of a mostly unproven targeting strategy,” but “most ad insiders express skepticism about Cambridge Analytica’s claims of having influenced the election, and stress the real-world difficulty of changing anyone’s mind about anything with mere Facebook ads, least of all deeply ingrained political views.” According to García, the entire affair merely confirms a well-established truth: “In the ads world, just because a product doesn’t work doesn’t mean you can’t sell it….(More)”.

How Philanthropy Can Help Lead on Data Justice


Louise Lief at Stanford Social Innovation Review: “Today, data governs almost every aspect of our lives, shaping the opportunities we have, how we perceive reality and understand problems, and even what we believe to be possible. Philanthropy is particularly data-driven, relying on data to inform decision-making, define problems, and measure impact. But what happens when data design and collection methods are flawed, lack context, or contain critical omissions and misdirected questions? With bad data, data-driven strategies can misdiagnose problems and worsen inequities with interventions that don’t reflect what is needed.

Data justice begins by asking who controls the narrative. Who decides what data is collected and for which purpose? Who interprets what it means for a community? Who governs it? In recent years, affected communities, social justice philanthropists, and academics have all begun looking deeper into the relationship between data and social justice in our increasingly data-driven world. But philanthropy can play a game-changing role in developing practices of data justice to more accurately reflect the lived experience of communities being studied. Simply incorporating data justice principles into everyday foundation practice—and requiring them of grantees—would be transformative: It would not only revitalize research, strengthen communities, influence policy, and accelerate social change, but also help address deficiencies in current government data sets.

When Data Is Flawed

Some of the most pioneering work on data justice has been done by Native American communities, who have suffered more than most from problems with bad data. A 2017 analysis of American Indian data challenges—funded by the W.K. Kellogg Foundation and the Morris K. Udall and Stewart L. Udall Foundation—documented how much data on Native American communities is of poor quality: inaccurate, inadequate, inconsistent, irrelevant, and/or inaccessible. The National Congress of American Indians even described Native American communities as “The Asterisk Nation,” because in many government data sets they are represented only by an asterisk denoting sampling errors instead of data points.

Where Native Americans are concerned, data is often not standardized: different government databases identify tribal members in at least seven different ways, using varying criteria; federal and state statistics often misclassify race and ethnicity; and some data collection methods don’t allow tribes to count tribal citizens living off the reservation. For over a decade the Department of the Interior’s Bureau of Indian Affairs has struggled to capture the data it needs for a crucial labor force report it is legally required to produce; methodology errors and reporting problems have been so extensive that at times they have prevented the report from even being published. But when the Department of the Interior changed several reporting requirements in 2014 and combined data submitted by tribes with US Census data, it only compounded the problem, making historical comparisons more difficult. Moreover, Native Americans have charged that the Census Bureau significantly undercounts both the American Indian population and key indicators like joblessness….(More)”.

Self-interest and data protection drive the adoption and moral acceptability of big data technologies: A conjoint analysis approach


Paper by Rabia I. Kodapanakkal, Mark J. Brandt, Christoph Kogler, and Ilja van Beest: “Big data technologies have both benefits and costs which can influence their adoption and moral acceptability. Prior studies look at people’s evaluations in isolation without pitting costs and benefits against each other. We address this limitation with a conjoint experiment (N = 979), using six domains (criminal investigations, crime prevention, citizen scores, healthcare, banking, and employment), where we simultaneously test the relative influence of four factors (the status quo, outcome favorability, data sharing, and data protection) on decisions to adopt the technologies and on perceptions of their moral acceptability.

We present two key findings. (1) People adopt technologies more often when data is protected and when outcomes are favorable. They place equal or more importance on data protection in all domains except healthcare, where outcome favorability has the strongest influence. (2) Data protection is the strongest driver of moral acceptability in all domains except healthcare, where the strongest driver is outcome favorability. Additionally, sharing data lowers preference for all technologies, but has a relatively smaller influence. People do not show a status quo bias in the adoption of technologies. When evaluating moral acceptability, people show a status quo bias, but this is driven by the citizen scores domain. Differences across domains arise from differences in the magnitude of the effects, but the effects are in the same direction. Taken together, these results highlight that people are not always primarily driven by self-interest and do place importance on potential privacy violations. They also challenge the assumption that people generally prefer the status quo….(More)”.
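For readers unfamiliar with the method, the sketch below shows in generic terms how the relative influence of conjoint factors can be estimated by regressing choices on attribute levels. The factor names follow the abstract, but the simulated data, coding, and linear probability model are illustrative assumptions, not the authors’ design or results.

```python
# Illustrative conjoint-style estimation: regress adoption choices on factor levels.
# Simulated data and a linear probability model; not the authors' actual analysis.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 4_000  # hypothetical choice tasks

df = pd.DataFrame({
    "status_quo":     rng.integers(0, 2, n),  # 1 = technology is already in place
    "favorable":      rng.integers(0, 2, n),  # 1 = outcome favorable to the respondent
    "data_shared":    rng.integers(0, 2, n),  # 1 = data shared with third parties
    "data_protected": rng.integers(0, 2, n),  # 1 = strong data protection
})

# Simulated decision rule mirroring the reported pattern: protection and
# favorability raise adoption, sharing lowers it, the status quo has no effect.
p = 0.3 + 0.25 * df.data_protected + 0.20 * df.favorable - 0.10 * df.data_shared
df["adopt"] = rng.binomial(1, p)

# Linear probability model: each coefficient approximates a factor's marginal effect.
model = smf.ols("adopt ~ status_quo + favorable + data_shared + data_protected", data=df).fit()
print(model.params.round(3))
```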

How to Put the Data Subject's Sovereignty into Practice. Ethical Considerations and Governance Perspectives



Paper by Peter Dabrock: “Ethical considerations and governance approaches of AI are at a crossroads. Either one tries to convey the impression that one can bring back a status quo ante of our given “onlife” era, or one accepts the need to engage responsibly with a digital world in which informational self-determination can no longer be safeguarded and fostered through the old-fashioned data protection principles of informed consent, purpose limitation and data economy. The main focus of the talk is on how, under the given conditions of AI and machine learning, data sovereignty (interpreted as controllability [not control (!)] of the data subject over the use of her data throughout the entire data processing cycle) can be strengthened without hindering the innovation dynamics of the digital economy or the social cohesion of fully digitized societies. In order to put this approach into practice, the talk combines a presentation of the concept of data sovereignty put forward by the German Ethics Council with recent research trends in effectively applying the AI ethics principles of explainability and enforceability…(More)”.

NGOs embrace GDPR, but will it be used against them?


Report by Vera Franz et al: “When the world’s most comprehensive digital privacy law – the EU General Data Protection Regulation (GDPR) – took effect in May 2018, media and tech experts focused much of their attention on how corporations, which hold massive amounts of data, would be affected by the law.

This focus was understandable, but it left some important questions under-examined, specifically about non-profit organizations that operate in the public’s interest. How would non-governmental organizations (NGOs) be impacted? What does GDPR compliance mean in very practical terms for NGOs? What are the challenges they are facing? Could the GDPR be ‘weaponized’ against NGOs and, if so, how? What good compliance practices can be shared among non-profits?

Ben Hayes and Lucy Hannah from Data Protection Support & Management and I have examined these questions in detail and released our findings in this report.

Our key takeaway: GDPR compliance is an integral part of organisational resilience, and it requires resources and attention from NGO leaders, foundations and regulators to defend their organisations against attempts by governments and corporations to misuse the GDPR against them.

In a political climate where human rights and social justice groups are under increasing pressure, GDPR compliance needs to be given the attention it deserves by NGO leaders and funders. Lack of compliance will attract enforcement action by data protection regulators and create opportunities for retaliation by civil society adversaries.

At the same time, since the law came into force, we recognise that some NGOs have over-complied with the law, possibly diverting scarce resources and hampering operations.

For example, during our research, we discovered a small NGO that undertook an advanced and resource-intensive compliance process (a Data Protection Impact Assessment or DPIA) for all processing operations. DPIAs are only required for large-scale and high-risk processing of personal data. Yet this NGO, which holds very limited personal data and undertakes no marketing or outreach activities, engaged in this complex and time-consuming assessment because the organization was under enormous pressure from its government. They told us they “wanted to do everything possible to avoid attracting attention.”…

Our research also found that private companies, individuals and governments who oppose the work of an organisation have used GDPR to try to keep NGOs from publishing their work. To date, NGOs have successfully fought against this misuse of the law….(More)“.

Federal Agencies Use Cellphone Location Data for Immigration Enforcement


Byron Tau and Michelle Hackman at the Wall Street Journal: “The Trump administration has bought access to a commercial database that maps the movements of millions of cellphones in America and is using it for immigration and border enforcement, according to people familiar with the matter and documents reviewed by The Wall Street Journal.

The location data is drawn from ordinary cellphone apps, including those for games, weather and e-commerce, for which the user has granted permission to log the phone’s location.

The Department of Homeland Security has used the information to detect undocumented immigrants and others who may be entering the U.S. unlawfully, according to these people and documents.

U.S. Immigration and Customs Enforcement, a division of DHS, has used the data to help identify immigrants who were later arrested, these people said. U.S. Customs and Border Protection, another agency under DHS, uses the information to look for cellphone activity in unusual places, such as remote stretches of desert that straddle the Mexican border, the people said.

The federal government’s use of such data for law enforcement purposes hasn’t previously been reported.

Experts say the information amounts to one of the largest known troves of bulk data being deployed by law enforcement in the U.S.—and that the use appears to be on firm legal footing because the government buys access to it from a commercial vendor, just as a private company could, though its use hasn’t been tested in court.

“This is a classic situation where creeping commercial surveillance in the private sector is now bleeding directly over into government,” said Alan Butler, general counsel of the Electronic Privacy Information Center, a think tank that pushes for stronger privacy laws.

According to federal spending contracts, a division of DHS that creates experimental products began buying location data in 2017 from Venntel Inc. of Herndon, Va., a small company that shares several executives and patents with Gravy Analytics, a major player in the mobile-advertising world.

In 2018, ICE bought $190,000 worth of Venntel licenses. Last September, CBP bought $1.1 million in licenses for three kinds of software, including Venntel subscriptions for location data. 

The Department of Homeland Security and its components acknowledged buying access to the data, but wouldn’t discuss details about how they are using it in law-enforcement operations. People familiar with some of the efforts say it is used to generate investigative leads about possible illegal border crossings and for detecting or tracking migrant groups.

CBP has said it has privacy protections and limits on how it uses the location information. The agency says that it accesses only a small amount of the location data and that the data it does use is anonymized to protect the privacy of Americans….(More)”

International Humanitarian and Development Aid and Big Data Governance


Chapter by Andrej Zwitter: “Modern technology and innovations constantly transform the world. This also applies to humanitarian action and development aid; examples include humanitarian drones, crowdsourcing of information, and the use of Big Data in crisis analytics and humanitarian intelligence. The acceleration of modernization in these adjacent fields can in part be attributed to new partnerships between aid agencies and new private stakeholders that are increasingly becoming active, such as individual crisis mappers, mobile telecommunication companies, or technological SMEs.

These partnerships, however, must be described as simultaneously beneficial and problematic. Many private actors do not subscribe to the humanitarian principles (humanity, impartiality, independence, and neutrality), which govern UN and NGO operations, or are not even aware of them. Their interests are not solely humanitarian, but may include entrepreneurial agendas. The unregulated use of data in humanitarian intelligence has already caused negative consequences, such as the exposure of sensitive data about aid agencies and about victims of disasters.

This chapter investigates the emergent governance trends around data innovation in the humanitarian and development field. It takes a look at the ways in which the field tries to regulate itself and at the utility of the humanitarian principles for Big Data analytics and data-driven innovation. It argues that it is crucial to formulate principles for data governance in the humanitarian context in order to safeguard beneficiaries who are particularly vulnerable. To do that, the chapter proposes to reinterpret the humanitarian principles to accommodate the new reality of datafication of different aspects of society…(More)”.

Why It’s So Hard for Users to Control Their Data


Bhaskar Chakravorti at the Harvard Business Review: “A recent IBM study found that 81% of consumers say they have become more concerned about how their data is used online. But most users continue to hand over their data online and tick consent boxes impatiently, giving rise to a “privacy paradox,” where users’ concerns aren’t reflected in their behaviors. It’s a daunting challenge for regulators and companies alike to navigate the future of data governance.

In my view, we’re missing a system that defines and grants users “digital agency” — the ability to own the rights to their personal data, manage access to this data and, potentially, be compensated fairly for such access. This would make data similar to other forms of personal property: a home, a bank account or even a mobile phone number. But before we can imagine such a state, we need to examine three central questions: Why don’t users care enough to take actions that match their concerns? What are the possible solutions? Why is this so difficult?

Why don’t users’ actions match their concerns?

To start, data is intangible. We don’t actively hand it over, and because it is a byproduct of our online activity, it is easy to ignore or forget about. A lot of data harvesting is invisible to consumers — they see the results in marketing offers, free services, customized feeds, tailored ads, and beyond.

Second, even if users wanted to negotiate more data agency, they have little leverage. Normally, in well-functioning markets, customers can choose from a range of competing providers. But this is not the case if the service is a widely used digital platform. For many, leaving a platform like Facebook feels like it would come at a high cost in time and effort, and they see no other option for an equivalent service with connections to the same people. Plus, many people use their Facebook logins on numerous apps and services. On top of that, Facebook has bought up many of its natural alternatives, like Instagram. It’s equally hard to switch away from other major platforms, like Google or Amazon, without a lot of personal effort.

Third, while a majority of American users believe more regulation is needed, they are not as enthusiastic about broad regulatory solutions being imposed. Instead, they would prefer to have better data management tools at their disposal. However, managing one’s own data would be complex – and that would deter users from embracing such an option….(More)”.