Automation in Moderation


Article by Hannah Bloch-Wehba: “This Article assesses recent efforts to compel or encourage online platforms to use automated means to prevent the dissemination of unlawful online content before it is ever seen or distributed. As lawmakers in Europe and around the world closely scrutinize platforms’ “content moderation” practices, automation and artificial intelligence appear increasingly attractive options for ridding the Internet of many kinds of harmful online content, including defamation, copyright infringement, and terrorist speech. Proponents of these initiatives suggest that requiring platforms to screen user content using automation will promote healthier online discourse and will aid efforts to limit Big Tech’s power.

In fact, however, the regulations that incentivize platforms to use automation in content moderation come with unappreciated costs for civil liberties and unexpected benefits for platforms. The new automation techniques exacerbate existing risks to free speech and user privacy and create ripe new sources of information for surveillance, aggravating threats to free expression, associational rights, religious freedoms, and equality. Automation also worsens transparency and accountability deficits. Far from curtailing private power, the new regulations endorse and expand platform authority to police online speech, with little in the way of oversight and few countervailing checks. New regulations of online intermediaries should therefore incorporate checks on the use of automation to avoid exacerbating these dynamics. Carefully drawn transparency obligations, algorithmic accountability mechanisms, and procedural safeguards can help to ameliorate the effects of these regulations on users and competition…(More)”.

Many Tech Experts Say Digital Disruption Will Hurt Democracy


Lee Rainie and Janna Anderson at Pew Research Center: “The years of almost unfettered enthusiasm about the benefits of the internet have been followed by a period of techlash as users worry about the actors who exploit the speed, reach and complexity of the internet for harmful purposes. Over the past four years – a time of the Brexit decision in the United Kingdom, the American presidential election and a variety of other elections – the digital disruption of democracy has been a leading concern.

The hunt for remedies is at an early stage. Resistance to American-based big tech firms is increasingly evident, and some tech pioneers have joined the chorus. Governments are actively investigating technology firms, and some tech firms themselves are requesting government regulation. Additionally, nonprofit organizations and foundations are directing resources toward finding the best strategies for coping with the harmful effects of disruption. For example, the Knight Foundation announced in 2019 that it is awarding $50 million in grants to encourage the development of a new field of research centered on technology’s impact on democracy.

In light of this furor, Pew Research Center and Elon University’s Imagining the Internet Center canvassed technology experts in the summer of 2019 to gain their insights about the potential future effects of people’s use of technology on democracy….

The main themes found in an analysis of the experts’ comments are outlined in the next two tables….(More)”.

Blockchain Governance: What We Can Learn From the Economics of Corporate Governance


Paper by Darcy W E Allen and Chris Berg: ” Understanding the considerations and complexities of blockchain governance is urgent. The aim of this paper is to draw on institutional governance theory — including corporate governance — to provide insights into the core considerations in designing blockchain governance mechanisms.

We define blockchain governance are the processes by which stakeholders (those who are affected by and can affect the network) exercise bargaining power over the network. The main considerations include how we define stakeholders in blockchain governance, how the consensus mechanism itself distributes endogenous bargaining power between those stakeholders, the role of exogenous governance mechanisms and institutional frameworks, and the needs for bootstrapping. While we can learn from corporate and internet governance, blockchain governance should be understood as being an institutionally distinct organisational form with distinct governance systems….(More)”.

Invest 5% of research funds in ensuring data are reusable


Barend Mons at Nature: “It is irresponsible to support research but not data stewardship…

Many of the world’s hardest problems can be tackled only with data-intensive, computer-assisted research. And I’d speculate that the vast majority of research data are never published. Huge sums of taxpayer funds go to waste because such data cannot be reused. Policies for data reuse are falling into place, but fixing the situation will require more resources than the scientific community is willing to face.

In 2013, I was part of a group of Dutch experts from many disciplines that called on our national science funder to support data stewardship. Seven years later, policies that I helped to draft are starting to be put into practice. These require data created by machines and humans to meet the FAIR principles (that is, they are findable, accessible, interoperable and reusable). I now direct an international Global Open FAIR office tasked with helping communities to implement the guidelines, and I am convinced that doing so will require a large cadre of professionals, about one for every 20 researchers.

Even when data are shared, the metadata, expertise, technologies and infrastructure necessary for reuse are lacking. Most published data sets are scattered into ‘supplemental files’ that are often impossible for machines or even humans to find. These and other sloppy data practices keep researchers from building on each other’s work. In cases of disease outbreaks, for instance, this might even cost lives….(More)”.

Facial Recognition Software requires Checks and Balances


David Eaves,  and Naeha Rashid in Policy Options: “A few weeks ago, members of the Nexus traveller identification program were notified that Canadian Border Services is upgrading its automated system, from iris scanners to facial recognition technology. This is meant to simplify identification and increase efficiency without compromising security. But it also raises profound questions concerning how we discuss and develop public policies around such technology – questions that may not be receiving sufficiently open debate in the rush toward promised greater security.

Analogous to the U.S. Customs and Border Protection (CBP) program Global Entry, Nexus is a joint Canada-US border control system designed for low-risk, pre-approved travellers. Nexus does provide a public good, and there are valid reasons to improve surveillance at airports. Even before 9/11, border surveillance was an accepted annoyance and since then, checkpoint operations have become more vigilant and complex in response to the public demand for safety.

Nexus is one of the first North America government-sponsored services to adopt facial recognition, and as such it could be a pilot program that other services will follow. Left unchecked, the technology will likely become ubiquitous at North American border crossings within the next decade, and it will probably be adopted by governments to solve domestic policy challenges.

Facial recognition software is imperfect and has documented bias, but it will continue to improve and become superior to humans in identifying individuals. Given this, questions arise such as, what policies guide the use of this technology? What policies should inform future government use? In our headlong rush toward enhanced security, we risk replicating the justification the used by the private sector in an attempt to balance effectiveness, efficiency and privacy.

One key question involves citizens’ capacity to consent. Previously, Nexus members submitted to fingerprint and retinal scans – biometric markers that are relatively unique and enable government to verify identity at the border. Facial recognition technology uses visual data and seeks, analyzes, and stores identifying facial information in a database, which is then used to compare with new images and video….(More)”.

Tesco Grocery 1.0, a large-scale dataset of grocery purchases in London


Paper by Luca Maria Aiello, Daniele Quercia, Rossano Schifanella & Lucia Del Prete: “We present the Tesco Grocery 1.0 dataset: a record of 420 M food items purchased by 1.6 M fidelity card owners who shopped at the 411 Tesco stores in Greater London over the course of the entire year of 2015, aggregated at the level of census areas to preserve anonymity. For each area, we report the number of transactions and nutritional properties of the typical food item bought including the average caloric intake and the composition of nutrients.

The set of global trade international numbers (barcodes) for each food type is also included. To establish data validity we: i) compare food purchase volumes to population from census to assess representativeness, and ii) match nutrient and energy intake to official statistics of food-related illnesses to appraise the extent to which the dataset is ecologically valid. Given its unprecedented scale and geographic granularity, the data can be used to link food purchases to a number of geographically-salient indicators, which enables studies on health outcomes, cultural aspects, and economic factors….(More)”.

How big data is dividing the public in China’s coronavirus fight – green, yellow, red


Article by Viola Zhou: “On Valentine’s Day, a 36-year-old lawyer Matt Ma in the eastern Chinese province of Zhejiang discovered he had been coded “red”.The colour, displayed in a payment app on his smartphone, indicated that he needed to be quarantined at home even though he had no symptoms of the dangerous coronavirus.

Without a green light from the system, Ma could not travel from his ancestral hometown of Lishui to his new home city of Hangzhou, which is now surrounded by checkpoints set up to contain the epidemic.

Ma is one of the millions of people whose movements are being choreographed by the government through software that feeds on troves of data and issues orders that effectively dictate whether they must stay in or can go to work.Their experience represents a slice of China’s desperate attempt to stop the coronavirus by using a mixed bag of cutting-edge technologies and old-fashioned surveillance. It was also a rare real-world test of the use of technology on a large scale to halt the spread of communicable diseases.

“This kind of massive use of technology is unprecedented,” said Christos Lynteris, a medical anthropologist at the University of St Andrews who has studied epidemics in China.

But Hangzhou’s experiment has also revealed the pitfalls of applying opaque formulas to a large population.

In the city’s case, there are reports of people being marked incorrectly, falling victim to an algorithm that is, by the government’s own admission, not perfect….(More)”.

Accelerating AI with synthetic data


Essay by Khaled El Emam: “The application of artificial intelligence and machine learning to solve today’s problems requires access to large amounts of data. One of the key obstacles faced by analysts is access to this data (for example, these issues were reflected in reports from the General Accountability Office and the McKinsey Institute).

Synthetic data can help solve this data problem in a privacy preserving manner.

What is synthetic data ?

Data synthesis is an emerging privacy-enhancing technology that can enable access to realistic data, which is information that may be synthetic, but has the properties of an original dataset. It also simultaneously ensures that such information can be used and disclosed with reduced obligations under contemporary privacy statutes. Synthetic data retains the statistical properties of the original data. Therefore, there are an increasing number of use cases where it would serve as a proxy for real data.

Synthetic data is created by taking an original (real) dataset and then building a model to characterize the distributions and relationships in that data — this is called the “synthesizer.” The synthesizer is typically an artificial neural network or other machine learning technique that learns these (original) data characteristics. Once that model is created, it can be used to generate synthetic data. The data is generated from the model and does not have a 1:1 mapping to real data, meaning that the likelihood of mapping the synthetic records to real individuals would be very small — it is not considered personal information.

Many different types of data can be synthesized, including images, video, audio, text and structured data. The main focus in this article is on the synthesis of structured data.

Even though data can be generated in this manner, that does not mean it cannot be personal information. If the synthesizer is overfit to real data, then the generated data will replicate the original real data. Therefore, the synthesizer has to be constructed in a manner to avoid such overfitting. A formal privacy assurance should also be performed on the synthesized data to validate that there is a weak mapping between synthetic records to individuals….(More)”.

Monitoring of the Venezuelan exodus through Facebook’s advertising platform


Paper by Palotti et al: “Venezuela is going through the worst economical, political and social crisis in its modern history. Basic products like food or medicine are scarce and hyperinflation is combined with economic depression. This situation is creating an unprecedented refugee and migrant crisis in the region. Governments and international agencies have not been able to consistently leverage reliable information using traditional methods. Therefore, to organize and deploy any kind of humanitarian response, it is crucial to evaluate new methodologies to measure the number and location of Venezuelan refugees and migrants across Latin America.

In this paper, we propose to use Facebook’s advertising platform as an additional data source for monitoring the ongoing crisis. We estimate and validate national and sub-national numbers of refugees and migrants and break-down their socio-economic profiles to further understand the complexity of the phenomenon. Although limitations exist, we believe that the presented methodology can be of value for real-time assessment of refugee and migrant crises world-wide….(More)”.

Crowdsourcing data to mitigate epidemics


Gabriel M Leung and Kathy Leung at The Lancet: “Coronavirus disease 2019 (COVID-19) has spread with unprecedented speed and scale since the first zoonotic event that introduced the causative virus—severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)—into humans, probably during November, 2019, according to phylogenetic analyses suggesting the most recent common ancestor of the sequenced genomes emerged between Oct 23, and Dec 16, 2019. The reported cumulative number of confirmed patients worldwide already exceeds 70 000 in almost 30 countries and territories as of Feb 19, 2020, although that the actual number of infections is likely to far outnumber this case count.

During any novel emerging epidemic, let alone one with such magnitude and speed of global spread, a first task is to put together a line list of suspected, probable, and confirmed individuals on the basis of working criteria of the respective case definitions. This line list would allow for quick preliminary assessment of epidemic growth and potential for spread, evidence-based determination of the period of quarantine and isolation, and monitoring of efficiency of detection of potential cases. Frequent refreshing of the line list would further enable real-time updates as more clinical, epidemiological, and virological (including genetic) knowledge become available as the outbreak progresses….

We surveyed different and varied sources of possible line lists for COVID-19 (appendix pp 1–4). A bottleneck remains in carefully collating as much relevant data as possible, sifting through and verifying these data, extracting intelligence to forecast and inform outbreak strategies, and thereafter repeating this process in iterative cycles to monitor and evaluate progress. A possible methodological breakthrough would be to develop and validate algorithms for automated bots to search through cyberspaces of all sorts, by text mining and natural language processing (in languages not limited to English) to expedite these processes.In this era of smartphone and their accompanying applications, the authorities are required to combat not only the epidemic per se, but perhaps an even more sinister outbreak of fake news and false rumours, a so-called infodemic…(More)”.