
Stefaan Verhulst

Chaya Nayak at Facebook: “In 2018, Facebook began an initiative to support independent academic research on social media’s role in elections and democracy. This first-of-its-kind project seeks to provide researchers access to privacy-preserving data sets in order to support research on these important topics.

Today, we are announcing that we have substantially increased the amount of data we’re providing to 60 academic researchers across 17 labs and 30 universities around the world. This release delivers on the commitment we made in July 2018 to share a data set that enables researchers to study information and misinformation on Facebook, while also ensuring that we protect the privacy of our users.

This new data release supplants data we released in the fall of 2019. That 2019 data set consisted of links that had been shared publicly on Facebook by at least 100 unique Facebook users. It included information about share counts, ratings by Facebook’s third-party fact-checkers, and user reporting on spam, hate speech, and false news associated with those links. We have expanded the data set to now include more than 38 million unique links with new aggregated information to help academic researchers analyze how many people saw these links on Facebook and how they interacted with that content – including views, clicks, shares, likes, and other reactions. We’ve also aggregated these shares by age, gender, country, and month. And, we have expanded the time frame covered by the data from January 2017 – February 2019 to January 2017 – August 2019.
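
To picture the structure of such a release, here is a minimal sketch of what one aggregated row might look like; every field name is invented for illustration and none of this reflects Facebook's actual schema:

```python
# Hypothetical sketch of one row of an aggregated URL-shares table, in the
# spirit of the data set described above. All field names are invented for
# illustration; this is not Facebook's actual schema.
from dataclasses import dataclass

@dataclass
class UrlSharesRow:
    url: str              # a link publicly shared by at least 100 users
    country: str          # aggregation bucket, e.g. ISO country code
    age_bucket: str       # aggregation bucket, e.g. "25-34"
    gender: str           # aggregation bucket, not an individual attribute
    month: str            # e.g. "2019-08"
    views: int            # aggregated, noise-protected engagement counts
    clicks: int
    shares: int
    likes: int
    other_reactions: int

row = UrlSharesRow(
    url="https://example.com/article",
    country="US", age_bucket="25-34", gender="female", month="2019-08",
    views=10_482, clicks=1_203, shares=311, likes=942, other_reactions=87,
)
print(row)
```

Because every row describes a demographic bucket rather than a person, researchers can study reach and engagement without ever seeing an individual user's activity.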

With this data, researchers will be able to understand important aspects of how social media shapes our world. They’ll be able to make progress on the research questions they proposed, such as “how to characterize mainstream and non-mainstream online news sources in social media” and “studying polarization, misinformation, and manipulation across multiple platforms and the larger information ecosystem.”

In addition to the data set of URLs, researchers will continue to have access to CrowdTangle and Facebook’s Ad Library API to augment their analyses. Per the original plan for this project, outside of a limited review to ensure that no confidential or user data is inadvertently released, these researchers will be able to publish their findings without approval from Facebook.

We are sharing this data with researchers while continuing to prioritize the privacy of people who use our services. This new data set, like the data we released before it, is protected by a method known as differential privacy. Researchers have access to data tables from which they can learn about aggregated groups, but where they cannot identify any individual user. As Harvard University’s Privacy Tools project puts it:

“The guarantee of a differentially private algorithm is that its behavior hardly changes when a single individual joins or leaves the dataset — anything the algorithm might output on a database containing some individual’s information is almost as likely to have come from a database without that individual’s information. … This gives a formal guarantee that individual-level information about participants in the database is not leaked.” …(More)”
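
For intuition about how such a guarantee is achieved, here is a minimal sketch of the Laplace mechanism, the textbook way to answer count queries under differential privacy; this is a generic illustration, not Facebook's actual implementation:

```python
import random

def dp_count(true_count: int, epsilon: float) -> float:
    """Answer a count query with epsilon-differential privacy.

    A count has sensitivity 1 (one person changes it by at most 1), so
    adding Laplace noise with scale 1/epsilon suffices: the output
    distribution barely changes when any single individual joins or
    leaves the data set.
    """
    scale = 1.0 / epsilon
    # Sample Laplace(0, scale) as the difference of two exponentials.
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return true_count + noise

# Neighbouring databases (differing in one person) produce almost
# indistinguishable outputs, so no individual can be singled out.
print(dp_count(311, epsilon=0.5))  # e.g. 308.7
print(dp_count(312, epsilon=0.5))  # statistically near-identical
```

Smaller values of epsilon mean more noise and stronger privacy; the analyst trades a little accuracy on each aggregate for a formal guarantee about every individual.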

New privacy-protected Facebook data for independent research on social media’s impact on democracy

Rebecca Ruiz at Mashable: “Since its founding in 2013, the free mental health support service Crisis Text Line has focused on using data and technology to better aid those who reach out for help. 

Unlike helplines that offer assistance based on the order in which users dialed, texted, or messaged, Crisis Text Line has an algorithm that determines who is in most urgent need of counseling. The nonprofit is particularly interested in learning which emoji and words texters use when their suicide risk is high, so as to quickly connect them with a counselor. Crisis Text Line just released new insights about those patterns. 

Based on its analysis of 129 million messages processed between 2013 and the end of 2019, the nonprofit found that the pill emoji, or 💊, was 4.4 times more likely to end in a life-threatening situation than the word “suicide”.

Other words that indicate imminent danger include 800mg, acetaminophen, Excedrin, and antifreeze; those are two to three times more likely than the word “suicide” to involve an active rescue of the texter. The loudly crying face emoji, or 😭, is similarly high-risk. In general, the words that trigger the greatest alarm suggest the texter has a method or plan to attempt suicide or may be in the process of taking their own life. …(More)”.
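
The underlying analysis can be pictured as comparing, token by token, how often conversations containing a given word or emoji end in an active rescue relative to conversations containing the word “suicide”. A minimal sketch with made-up data, not Crisis Text Line's actual model or figures:

```python
# Toy relative-risk calculation over invented conversations; each entry is
# (set of tokens used, whether the conversation ended in an active rescue).
conversations = [
    ({"💊", "800mg"}, True),
    ({"suicide"}, False),
    ({"suicide"}, True),
    ({"💊"}, True),
    ({"😭", "antifreeze"}, True),
    ({"sad"}, False),
]

def rescue_rate(token: str) -> float:
    """Fraction of conversations containing `token` that ended in a rescue."""
    outcomes = [rescued for tokens, rescued in conversations if token in tokens]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

baseline = rescue_rate("suicide")
for token in ["💊", "😭", "antifreeze"]:
    print(f"{token}: {rescue_rate(token) / baseline:.1f}x the rate of 'suicide'")
```

A production triage system would estimate these rates from millions of messages and use them, among other signals, to rank the queue of incoming texters.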

This emoji could mean your suicide risk is high, according to AI

Paper by Payam Aminpour et al: “Sustainable management of natural resources requires adequate scientific knowledge about complex relationships between human and natural systems. Such understanding is difficult to achieve in many contexts due to data scarcity and knowledge limitations.

We explore the potential of harnessing the collective intelligence of resource stakeholders to overcome this challenge. Using a fisheries example, we show that by aggregating the system knowledge held by stakeholders through graphical mental models, a crowd of diverse resource users produces a system model of social–ecological relationships that is comparable to the best scientific understanding.

We show that the averaged model from a crowd of diverse resource users outperforms those of more homogeneous groups. Importantly, however, we find that the averaged model from a larger sample of individuals can perform worse than one constructed from a smaller sample. Yet when mental models are first averaged within stakeholder-specific subgroups and the subgroup models are then aggregated, this effect is reversed. Our work identifies an inexpensive yet robust way to develop scientific understanding of complex social–ecological systems by leveraging the collective wisdom of non-scientist stakeholders…(More)”.
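
The two-stage aggregation the authors describe can be pictured as averaging the edge-weight matrices of individual cognitive maps, first within each stakeholder subgroup and then across the subgroup means, so that every group counts equally regardless of size. An illustrative sketch, not the paper's actual implementation:

```python
import numpy as np

def aggregate_mental_models(models_by_group):
    """Two-stage aggregation of mental models given as edge-weight matrices.

    Stage 1: average the matrices within each stakeholder subgroup.
    Stage 2: average the subgroup means, weighting each group equally
             regardless of how many individuals it contributed.
    """
    group_means = [np.mean(models, axis=0) for models in models_by_group.values()]
    return np.mean(group_means, axis=0)

# Toy example: 3 system variables, two stakeholder groups of unequal size.
rng = np.random.default_rng(0)
models = {
    "commercial_fishers": [rng.uniform(-1, 1, (3, 3)) for _ in range(10)],
    "recreational_anglers": [rng.uniform(-1, 1, (3, 3)) for _ in range(3)],
}
print(aggregate_mental_models(models).round(2))
```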

Wisdom of stakeholder crowds in complex social–ecological systems

Eerke Boiten at The Guardian: “…It is clear that the black box society does not only feed on internet surveillance information. Databases collected by public bodies are becoming more and more part of the dark data economy. Last month, it emerged that a data broker in receipt of the UK’s national pupil database had shared its access with gambling companies. This is likely to be the tip of the iceberg; even where initial recipients of shared data might be checked and vetted, it is much harder to oversee who the data is passed on to from there.

Health data, the rich population-wide information held within the NHS, is another such example. Pharmaceutical companies and internet giants have been eyeing the NHS’s extensive databases for commercial exploitation for many years. Google infamously claimed it could save 100,000 lives if only it had free rein with all our health data. If there really is such value hidden in NHS data, do we really want Google to extract it to sell it to us? Google still holds health data that its subsidiary DeepMind Health obtained illegally from the NHS in 2016.

Although many health data-sharing schemes, such as those listed in the NHS’s register of approved data releases, are said to be “anonymised”, this offers only a limited guarantee against abuse.

There is just too much information included in health data that points to other aspects of patients’ lives and existence. If recipients of anonymised health data want to use it to re-identify individuals, they will often be able to do so by combining it, for example, with publicly available information. That this would be illegal under UK data protection law is a small consolation as it would be extremely hard to detect.
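
The risk described here is a linkage attack: joining “anonymised” records to a public data set on shared quasi-identifiers such as postcode, birth year and sex, which are rarely identifying alone but often unique in combination. A minimal illustration with invented records:

```python
# Invented records only; illustrates why "anonymised" is a weak guarantee.
anonymised_health = [
    {"postcode": "LE1 7RH", "birth_year": 1974, "sex": "F", "diagnosis": "diabetes"},
    {"postcode": "CT2 7NZ", "birth_year": 1981, "sex": "M", "diagnosis": "depression"},
]
public_register = [
    {"name": "Jane Doe", "postcode": "LE1 7RH", "birth_year": 1974, "sex": "F"},
    {"name": "John Roe", "postcode": "CT2 7NZ", "birth_year": 1981, "sex": "M"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")

for record in anonymised_health:
    key = tuple(record[q] for q in QUASI_IDENTIFIERS)
    matches = [p for p in public_register
               if tuple(p[q] for q in QUASI_IDENTIFIERS) == key]
    if len(matches) == 1:  # a unique match re-identifies the "anonymous" record
        print(f"{matches[0]['name']} -> {record['diagnosis']}")
```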

It is clear that providing access to public organisations’ data for research purposes can serve the greater good and it is unrealistic to expect bodies such as the NHS to keep this all in-house.

However, there are other methods by which to do this, beyond the sharing of anonymised databases. CeLSIUS, for example, holds UK census information spanning many years in a physical facility where researchers can interrogate the data under tightly controlled conditions, for specific registered purposes.

These arrangements prevent abuse such as deanonymisation, avoid the problem of shared data being passed on to third parties, and ensure complete transparency in the use of the data. Online analogues of such set-ups do not yet exist, but that is where the future of safe and transparent access to sensitive data lies….(More)”.

Our personal health history is too valuable to be harvested by the tech giants

Thesis by Hiska Ubels: “Enduring depopulation and ageing have affected the liveability of many of the smaller villages in the more peripheral rural municipalities of the Netherlands. Combined with a general climate of austerity and structural public budget cuts, this has led both communities and local governments to search for solutions in which citizens take on more responsibilities and higher levels of local autonomy in dealing with local liveability challenges.

This PhD thesis explores how novel forms of governance with high levels of civic self-reliance can be understood from the perspectives of the residents involved, local governments and the supposed beneficiaries. It also discusses the dynamics, potentials and limitations that come to the fore. To achieve this, it first focusses on how roles, responsibilities and decision-making power have shifted between local governments and citizens in experimental governance initiatives over time, and on the main factors that enable or obstruct higher levels of civic autonomy. It then investigates the influence of government involvement on a civic initiative’s organisational structure and governance process, and thereby on the key conditions of its capacity for civic self-steering. In addition, it examines how novel governance forms with citizens in the lead are experienced by the community members to whose liveability they are supposed to contribute. Lastly, it explores the reasons why citizens do not engage in such initiatives….(More)”.

Novel forms of governance with high levels of civic self-reliance

Paper by Rabia I. Kodapanakkal, Mark J. Brandt, Christoph Kogler, and Ilja van Beest: “Big data technologies have both benefits and costs that can influence their adoption and moral acceptability. Prior studies have examined people’s evaluations in isolation, without pitting costs and benefits against each other. We address this limitation with a conjoint experiment (N = 979) spanning six domains (criminal investigations, crime prevention, citizen scores, healthcare, banking, and employment), in which we simultaneously test the relative influence of four factors (the status quo, outcome favorability, data sharing, and data protection) on decisions to adopt the technologies and on perceptions of their moral acceptability.

We present two key findings. (1) People adopt technologies more often when data is protected and when outcomes are favorable. They place equal or more importance on data protection in all domains except healthcare, where outcome favorability has the strongest influence. (2) Data protection is the strongest driver of moral acceptability in all domains except healthcare, where the strongest driver is outcome favorability. Additionally, sharing data lowers preference for all technologies, but has a relatively smaller influence. People do not show a status quo bias in the adoption of technologies. When evaluating moral acceptability, people do show a status quo bias, but this is driven by the citizen scores domain. Differences across domains arise from differences in the magnitude of the effects, but the effects are in the same direction. Taken together, these results highlight that people are not always primarily driven by self-interest and do place importance on potential privacy violations. They also challenge the assumption that people generally prefer the status quo….(More)”.
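
The logic of such a conjoint analysis can be sketched by regressing adoption choices on binary attribute indicators and reading the coefficients as relative importance. Everything below is simulated for illustration and bears no relation to the paper's data or estimates:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulate conjoint-style profiles: four binary attributes per profile.
rng = np.random.default_rng(42)
n = 979  # sample size borrowed from the paper; the data itself is invented
X = rng.integers(0, 2, size=(n, 4))             # columns named below
true_weights = np.array([0.0, 1.0, -0.5, 1.5])  # assumed for the simulation
logits = X @ true_weights - 1.0
y = rng.random(n) < 1.0 / (1.0 + np.exp(-logits))  # simulated adopt / reject

model = LogisticRegression().fit(X, y)
names = ["status_quo", "outcome_favorable", "data_shared", "data_protected"]
for name, coef in zip(names, model.coef_[0]):
    print(f"{name:18s} {coef:+.2f}")  # larger magnitude = stronger driver
```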

Self-interest and data protection drive the adoption and moral acceptability of big data technologies: A conjoint analysis approach

Paper by Yoonsang Kim, Rachel Nordgren and Sherry Emery: “Public health and social science increasingly use Twitter for behavioral and marketing surveillance. However, few studies provide sufficient detail about Twitter data collection to allow direct comparisons between studies or to support replication.

The three primary application programming interfaces (APIs) for Twitter data are Streaming, Search, and Firehose. To date, no clear guidance exists about the advantages and limitations of each API, or about the comparability of the amount, content, and user accounts of the tweets each retrieves. Such information is crucial to the validity, interpretation, and replicability of research findings.

This study examines whether tweets collected using the same search filters over the same time period, but via different APIs, yield comparable datasets. We collected tweets about anti-smoking, e-cigarettes, and tobacco using the aforementioned APIs. The retrieved tweets largely overlapped across the three APIs, but each also retrieved unique tweets, and the extent of overlap varied over time and by topic, resulting in different trends and potentially supporting diverging inferences. Researchers need to understand how different data sources influence the amount, content, and user accounts of the data they retrieve from social media, in order to assess the implications of their choice of data source…(More)”.
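
One simple way to quantify such overlap is to compare the sets of tweet IDs each collection method returned. The sketch below assumes the tweets have already been collected; the IDs are made up and no Twitter API calls are involved:

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two sets of tweet IDs."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical tweet-ID sets retrieved with identical filters from each API.
apis = {
    "streaming": {101, 102, 103, 104, 105, 108},
    "search":    {103, 104, 105, 106},
    "firehose":  {101, 102, 103, 104, 105, 106, 107},
}

names = list(apis)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} vs {b}: Jaccard = {jaccard(apis[a], apis[b]):.2f}; "
              f"only in {a}: {sorted(apis[a] - apis[b])}")
```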

The Story of Goldilocks and Three Twitter’s APIs: A Pilot Study on Twitter Data Sources and Disclosure


Paper by Peter Dabrock: “Ethical considerations and governance approaches to AI are at a crossroads. Either one tries to convey the impression that one can bring back a status quo ante of our given “onlife” era, or one accepts the need to get responsibly involved in a digital world in which informational self-determination can no longer be safeguarded and fostered through the old-fashioned data-protection principles of informed consent, purpose limitation and data economy. The main focus of the talk is on how, under the given conditions of AI and machine learning, data sovereignty (interpreted as controllability [not control (!)] of the data subject over the use of her data throughout the entire data-processing cycle) can be strengthened without hindering the innovation dynamics of the digital economy and the social cohesion of fully digitized societies. In order to put this approach into practice, the talk combines a presentation of the concept of data sovereignty put forward by the German Ethics Council with recent research trends in effectively applying the AI ethics principles of explainability and enforceability…(More)”.

How to Put the Data Subject's Sovereignty into Practice. Ethical Considerations and Governance Perspectives

Book by Srikanta Patnaik, Siddhartha Sen and Magdi S. Mahmoud: “This book offers a transdisciplinary perspective on the concept of “smart villages”. Written by an authoritative group of scholars, it discusses various aspects that are essential to fostering the development of successful smart villages. Presenting cutting-edge technologies, such as big data and the Internet of Things, and showing how they have been successfully applied to promote rural development, it also addresses important policy and sustainability issues. As such, this book offers a timely snapshot of the state of the art in smart village research and practice….(More)”.

Smart Village Technology

Greg Bensinger in the Washington Post: “For more than 70 years, India and Pakistan have waged sporadic and deadly skirmishes over control of the mountainous region of Kashmir. Tens of thousands have died in the conflict, including three just this month.

Both sides claim the Himalayan outpost as their own, but Web surfers in India could be forgiven for thinking the dispute is all but settled: The borders on Google’s online maps there display Kashmir as fully under Indian control. Elsewhere, users see the region’s snaking outlines as a dotted line, acknowledging the dispute.

Google’s corporate mission is “to organize the world’s information,” but it also bends it to its will. From Argentina to the United Kingdom to Iran, the world’s borders look different depending on where you’re viewing them from. That’s because Google — and other online mapmakers — simply change them.
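
In principle, viewer-dependent borders reduce to a lookup keyed on both the region and the viewer's location, as in this purely hypothetical sketch (it reflects nothing about Google's actual systems):

```python
# Hypothetical illustration of viewer-dependent border rendering.
# Per the article: viewers in India see Kashmir's border as settled,
# while viewers elsewhere see it drawn as a dotted, disputed line.
BORDER_STYLES = {
    ("kashmir", "IN"): "solid",   # shown as settled to viewers in India
    ("kashmir", "*"):  "dotted",  # shown as disputed everywhere else
}

def border_style(region: str, viewer_country: str) -> str:
    """Return the line style for a region as seen from a viewer's country."""
    return BORDER_STYLES.get((region, viewer_country),
                             BORDER_STYLES[(region, "*")])

print(border_style("kashmir", "IN"))  # solid
print(border_style("kashmir", "US"))  # dotted
```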

With some 80 percent market share in mobile maps and over a billion users, Google Maps has an outsize impact on people’s perception of the world — from driving directions to restaurant reviews to naming attractions to adjudicating historical border wars.

And while maps are meant to bring order to the world, the Silicon Valley firm’s decision-making on maps is often shrouded in secrecy, even to some of those who work to shape its digital atlases every day. It is influenced not just by history and local laws, but also the shifting whims of diplomats, policymakers and its own executives, say people familiar with the matter, who asked not to be identified because they weren’t authorized to discuss internal processes….(More)”.

Google redraws the borders on maps depending on who’s looking
