Stefaan Verhulst
Lee Rainie and Janna Anderson at Pew Research Center: “The years of almost unfettered enthusiasm about the benefits of the internet have been followed by a period of techlash as users worry about the actors who exploit the speed, reach and complexity of the internet for harmful purposes. Over the past four years – a time of the Brexit decision in the United Kingdom, the American presidential election and a variety of other elections – the digital disruption of democracy has been a leading concern.
The hunt for remedies is at an early stage. Resistance to American-based big tech firms is increasingly evident, and some tech pioneers have joined the chorus. Governments are actively investigating technology firms, and some tech firms themselves are requesting government regulation. Additionally, nonprofit organizations and foundations are directing resources toward finding the best strategies for coping with the harmful effects of disruption. For example, the Knight Foundation announced in 2019 that it is awarding $50 million in grants to encourage the development of a new field of research centered on technology’s impact on democracy.
In light of this furor, Pew Research Center and Elon University’s Imagining the Internet Center canvassed technology experts in the summer of 2019 to gain their insights about the potential future effects of people’s use of technology on democracy….
The main themes found in an analysis of the experts’ comments are outlined in the next two tables….(More)”.


Paper by Darcy W E Allen and Chris Berg: “Understanding the considerations and complexities of blockchain governance is urgent. The aim of this paper is to draw on institutional governance theory, including corporate governance, to provide insights into the core considerations in designing blockchain governance mechanisms.
We define blockchain governance as the processes by which stakeholders (those who are affected by and can affect the network) exercise bargaining power over the network. The main considerations include how we define stakeholders in blockchain governance, how the consensus mechanism itself distributes endogenous bargaining power between those stakeholders, the role of exogenous governance mechanisms and institutional frameworks, and the need for bootstrapping. While we can learn from corporate and internet governance, blockchain governance should be understood as an institutionally distinct organisational form with distinct governance systems….(More)”.
Barend Mons at Nature: “It is irresponsible to support research but not data stewardship…
Many of the world’s hardest problems can be tackled only with data-intensive, computer-assisted research. And I’d speculate that the vast majority of research data are never published. Huge sums of taxpayer funds go to waste because such data cannot be reused. Policies for data reuse are falling into place, but fixing the situation will require more resources than the scientific community is willing to face.
In 2013, I was part of a group of Dutch experts from many disciplines that called on our national science funder to support data stewardship. Seven years later, policies that I helped to draft are starting to be put into practice. These require data created by machines and humans to meet the FAIR principles (that is, they are findable, accessible, interoperable and reusable). I now direct an international Global Open FAIR office tasked with helping communities to implement the guidelines, and I am convinced that doing so will require a large cadre of professionals, about one for every 20 researchers.
Even when data are shared, the metadata, expertise, technologies and infrastructure necessary for reuse are lacking. Most published data sets are scattered into ‘supplemental files’ that are often impossible for machines or even humans to find. These and other sloppy data practices keep researchers from building on each other’s work. In cases of disease outbreaks, for instance, this might even cost lives….(More)”.
David Eaves and Naeha Rashid in Policy Options: “A few weeks ago, members of the Nexus traveller identification program were notified that Canadian Border Services is upgrading its automated system, from iris scanners to facial recognition technology. This is meant to simplify identification and increase efficiency without compromising security. But it also raises profound questions concerning how we discuss and develop public policies around such technology – questions that may not be receiving sufficiently open debate in the rush toward promised greater security.
Analogous to the U.S. Customs and Border Protection (CBP) program Global Entry, Nexus is a joint Canada-US border control system designed for low-risk, pre-approved travellers. Nexus does provide a public good, and there are valid reasons to improve surveillance at airports. Even before 9/11, border surveillance was an accepted annoyance, and since then checkpoint operations have become more vigilant and complex in response to public demand for safety.
Nexus is one of the first North American government-sponsored services to adopt facial recognition, and as such it could be a pilot program that other services will follow. Left unchecked, the technology will likely become ubiquitous at North American border crossings within the next decade, and it will probably be adopted by governments to solve domestic policy challenges.
Facial recognition software is imperfect and has documented bias, but it will continue to improve and become superior to humans in identifying individuals. Given this, questions arise such as: what policies guide the use of this technology? What policies should inform future government use? In our headlong rush toward enhanced security, we risk replicating the justification used by the private sector in an attempt to balance effectiveness, efficiency and privacy.
One key question involves citizens’ capacity to consent. Previously, Nexus members submitted to fingerprint and retinal scans – biometric markers that are relatively unique and enable government to verify identity at the border. Facial recognition technology uses visual data and seeks, analyzes, and stores identifying facial information in a database, which is then used to compare with new images and video….(More)”.
Paper by Luca Maria Aiello, Daniele Quercia, Rossano Schifanella & Lucia Del Prete: “We present the Tesco Grocery 1.0 dataset: a record of 420 M food items purchased by 1.6 M fidelity card owners who shopped at the 411 Tesco stores in Greater London over the course of the entire year of 2015, aggregated at the level of census areas to preserve anonymity. For each area, we report the number of transactions and nutritional properties of the typical food item bought including the average caloric intake and the composition of nutrients.
The set of Global Trade Item Numbers (GTINs, i.e. barcodes) for each food type is also included. To establish data validity we: i) compare food purchase volumes to population from census to assess representativeness, and ii) match nutrient and energy intake to official statistics of food-related illnesses to appraise the extent to which the dataset is ecologically valid. Given its unprecedented scale and geographic granularity, the data can be used to link food purchases to a number of geographically-salient indicators, which enables studies on health outcomes, cultural aspects, and economic factors….(More)”.
Article by Viola Zhou: “On Valentine’s Day, a 36-year-old lawyer, Matt Ma, in the eastern Chinese province of Zhejiang discovered he had been coded “red”. The colour, displayed in a payment app on his smartphone, indicated that he needed to be quarantined at home even though he had no symptoms of the dangerous coronavirus.
Without a green light from the system, Ma could not travel from his ancestral hometown of Lishui to his new home city of Hangzhou, which is now surrounded by checkpoints set up to contain the epidemic.
Ma is one of the millions of people whose movements are being choreographed by the government through software that feeds on troves of data and issues orders that effectively dictate whether they must stay in or can go to work. Their experience represents a slice of China’s desperate attempt to stop the coronavirus by using a mixed bag of cutting-edge technologies and old-fashioned surveillance. It was also a rare real-world test of the use of technology on a large scale to halt the spread of communicable diseases.
“This kind of massive use of technology is unprecedented,” said Christos Lynteris, a medical anthropologist at the University of St Andrews who has studied epidemics in China.
But Hangzhou’s experiment has also revealed the pitfalls of applying opaque formulas to a large population.
In the city’s case, there are reports of people being marked incorrectly, falling victim to an algorithm that is, by the government’s own admission, not perfect….(More)”.
Essay by Khaled El Emam: “The application of artificial intelligence and machine learning to solve today’s problems requires access to large amounts of data. One of the key obstacles faced by analysts is access to this data (for example, these issues were reflected in reports from the Government Accountability Office and the McKinsey Institute).
Synthetic data can help solve this data problem in a privacy preserving manner.
What is synthetic data?
Data synthesis is an emerging privacy-enhancing technology that can enable access to realistic data, which is information that may be synthetic, but has the properties of an original dataset. It also simultaneously ensures that such information can be used and disclosed with reduced obligations under contemporary privacy statutes. Synthetic data retains the statistical properties of the original data. Therefore, there are an increasing number of use cases where it would serve as a proxy for real data.
Synthetic data is created by taking an original (real) dataset and then building a model to characterize the distributions and relationships in that data — this is called the “synthesizer.” The synthesizer is typically an artificial neural network or other machine learning technique that learns these (original) data characteristics. Once that model is created, it can be used to generate synthetic data. The data is generated from the model and does not have a 1:1 mapping to real data, meaning that the likelihood of mapping the synthetic records to real individuals would be very small — it is not considered personal information.
Many different types of data can be synthesized, including images, video, audio, text and structured data. The main focus in this article is on the synthesis of structured data.
Even though data can be generated in this manner, that does not mean it cannot be personal information. If the synthesizer is overfit to real data, then the generated data will replicate the original real data. Therefore, the synthesizer has to be constructed in a manner to avoid such overfitting. A formal privacy assurance should also be performed on the synthesized data to validate that there is a weak mapping between synthetic records and individuals….(More)”.
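To make the synthesizer idea concrete, the sketch below swaps the neural network the essay mentions for a much simpler model. It fits a multivariate Gaussian (column means plus covariance) to numeric structured data and then samples new records from that model. It also adds a distance-to-closest-record check as a crude stand-in for the formal privacy assurance the essay calls for. The toy dataset, the function names, and the choice of check are assumptions for illustration only. A Gaussian captures only linear relationships and is not a production synthesizer.

```python
import math
import random
from statistics import fmean


def fit_gaussian(rows):
    """Fit the 'synthesizer': column means and the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    mu = [fmean(r[j] for r in rows) for j in range(d)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov


def cholesky(a):
    """Lower-triangular L with L @ L.T == a (a must be positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def sample(mu, cov, n, rng):
    """Draw n synthetic records as mu + L z, with z ~ N(0, I)."""
    L = cholesky(cov)
    d = len(mu)
    out = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        out.append([mu[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                    for i in range(d)])
    return out


def min_distance_to_real(synthetic, real):
    """Smallest Euclidean distance from any synthetic record to any real one."""
    return min(math.dist(s, r) for s in synthetic for r in real)


# Hypothetical real records: (age, systolic blood pressure).
real = [[34, 118], [51, 135], [47, 128], [29, 112], [62, 141], [55, 137]]

rng = random.Random(0)
mu, cov = fit_gaussian(real)
synthetic = sample(mu, cov, 500, rng)

print("real means     ", [round(m, 1) for m in mu])
print("synthetic means", [round(fmean(s[j] for s in synthetic), 1) for j in range(2)])
print("closest synthetic-to-real distance:",
      round(min_distance_to_real(synthetic, real), 3))

# An overfit "synthesizer" that simply copies the real records fails the check.
copied = [list(r) for r in real]
print(min_distance_to_real(copied, real))  # → 0.0
```

A distance of zero means some synthetic record is an exact copy of a real one, which is the replication risk the essay describes. Real assurance methods are more involved than this single number.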
Paper by Palotti et al.: “Venezuela is going through the worst economic, political and social crisis in its modern history. Basic products like food or medicine are scarce and hyperinflation is combined with economic depression. This situation is creating an unprecedented refugee and migrant crisis in the region. Governments and international agencies have not been able to consistently leverage reliable information using traditional methods. Therefore, to organize and deploy any kind of humanitarian response, it is crucial to evaluate new methodologies to measure the number and location of Venezuelan refugees and migrants across Latin America.
In this paper, we propose to use Facebook’s advertising platform as an additional data source for monitoring the ongoing crisis. We estimate and validate national and sub-national numbers of refugees and migrants and break down their socio-economic profiles to further understand the complexity of the phenomenon. Although limitations exist, we believe that the presented methodology can be of value for real-time assessment of refugee and migrant crises world-wide….(More)”.
Gabriel M Leung and Kathy Leung at The Lancet: “Coronavirus disease 2019 (COVID-19) has spread with unprecedented speed and scale since the first zoonotic event that introduced the causative virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), into humans, probably during November, 2019, according to phylogenetic analyses suggesting the most recent common ancestor of the sequenced genomes emerged between Oct 23 and Dec 16, 2019. The reported cumulative number of confirmed patients worldwide already exceeds 70 000 in almost 30 countries and territories as of Feb 19, 2020, although the actual number of infections is likely to far outnumber this case count.
During any novel emerging epidemic, let alone one with such magnitude and speed of global spread, a first task is to put together a line list of suspected, probable, and confirmed individuals on the basis of working criteria of the respective case definitions. This line list would allow for quick preliminary assessment of epidemic growth and potential for spread, evidence-based determination of the period of quarantine and isolation, and monitoring of the efficiency of detection of potential cases. Frequent refreshing of the line list would further enable real-time updates as more clinical, epidemiological, and virological (including genetic) knowledge becomes available as the outbreak progresses….
We surveyed different and varied sources of possible line lists for COVID-19 (appendix pp 1–4). A bottleneck remains in carefully collating as much relevant data as possible, sifting through and verifying these data, extracting intelligence to forecast and inform outbreak strategies, and thereafter repeating this process in iterative cycles to monitor and evaluate progress. A possible methodological breakthrough would be to develop and validate algorithms for automated bots to search through cyberspaces of all sorts, by text mining and natural language processing (in languages not limited to English) to expedite these processes. In this era of smartphones and their accompanying applications, the authorities are required to combat not only the epidemic per se, but perhaps an even more sinister outbreak of fake news and false rumours, a so-called infodemic…(More)”.
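The line list in the Lancet excerpt is essentially a table with one row per individual and a case-status column. Frequent refreshing then amounts to appending or reclassifying rows. The Python sketch below rests on several assumptions: the record layout (`case_id`, `status`, `report_date`) and the toy entries are hypothetical. It estimates doubling time with the standard formula T_d = Δt × ln 2 / ln(C2/C1), which assumes exponential growth between two dates. That is only one crude way to make the "quick preliminary assessment of epidemic growth" the authors mention.

```python
import math
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Status(Enum):
    SUSPECTED = "suspected"
    PROBABLE = "probable"
    CONFIRMED = "confirmed"


@dataclass
class LineListEntry:
    case_id: str
    status: Status
    report_date: date


def cumulative_confirmed(line_list, on):
    """Number of confirmed cases reported on or before the given date."""
    return sum(1 for e in line_list
               if e.status is Status.CONFIRMED and e.report_date <= on)


def doubling_time_days(c1, c2, days_between):
    """Doubling time, assuming exponential growth between two cumulative counts."""
    if c1 <= 0 or c2 <= c1:
        raise ValueError("need 0 < c1 < c2 for a finite doubling time")
    return days_between * math.log(2) / math.log(c2 / c1)


# Hypothetical entries, one row per individual.
line_list = [
    LineListEntry("A1", Status.CONFIRMED, date(2020, 1, 20)),
    LineListEntry("A2", Status.CONFIRMED, date(2020, 1, 21)),
    LineListEntry("A3", Status.SUSPECTED, date(2020, 1, 22)),
    LineListEntry("A4", Status.CONFIRMED, date(2020, 1, 24)),
    LineListEntry("A5", Status.CONFIRMED, date(2020, 1, 25)),
    LineListEntry("A6", Status.PROBABLE, date(2020, 1, 26)),
    LineListEntry("A7", Status.CONFIRMED, date(2020, 1, 27)),
]

d1, d2 = date(2020, 1, 21), date(2020, 1, 25)
c1 = cumulative_confirmed(line_list, d1)
c2 = cumulative_confirmed(line_list, d2)
print(c1, c2)  # → 2 4
print(round(doubling_time_days(c1, c2, (d2 - d1).days), 1))  # → 4.0

# Refreshing the list as knowledge improves: a suspected case is confirmed.
line_list[2].status = Status.CONFIRMED
print(cumulative_confirmed(line_list, d2))  # → 5
```

Keeping status as an explicit field, rather than keeping separate lists per case definition, lets reclassification update every downstream count without re-collating the data.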
Press Release: “The European Data Portal publishes its study “The Economic Impact of Open Data: Opportunities for value creation in Europe”. It researches the value created by open data in Europe. It is the second study by the European Data Portal, following the 2015 report. The open data market size is estimated at €184 billion and forecast to reach between €199.51 and €334.21 billion in 2025. The report additionally considers how this market size is distributed along different sectors and how many people are employed due to open data. The efficiency gains from open data, such as potential lives saved, time saved, environmental benefits, and improvement of language services, as well as associated potential costs savings are explored and quantified where possible. Finally, the report also considers examples and insights from open data re-use in organisations. The key findings of the report are summarised below:
- The specification and implementation of high-value datasets as part of the new Open Data Directive is a promising opportunity to address quality & quantity demands of open data.
- Addressing quality & quantity demands is important, yet not enough to reach the full potential of open data.
- Open data re-users have to be aware and capable of understanding and leveraging the potential.
- Open data value creation is part of the wider challenge of skill and process transformation: a lengthy process whose change and impact are not always easy to observe and measure.
- Sector-specific initiatives and collaboration in and across private and public sector foster value creation.
- Combining open data with personal, shared, or crowdsourced data is vital for the realisation of further growth of the open data market.
- For different challenges, we must explore and improve multiple approaches of data re-use that are ethical, sustainable, and fit-for-purpose….(More)”.
