The plan to mine the world’s research papers


Priyanka Pulla in Nature: “Carl Malamud is on a crusade to liberate information locked up behind paywalls — and his campaigns have scored many victories. He has spent decades publishing copyrighted legal documents, from building codes to court records, and then arguing that such texts represent public-domain law that ought to be available to any citizen online. Sometimes, he has won those arguments in court. Now, the 60-year-old American technologist is turning his sights on a new objective: freeing paywalled scientific literature. And he thinks he has a legal way to do it.

Over the past year, Malamud has — without asking publishers — teamed up with Indian researchers to build a gigantic store of text and images extracted from 73 million journal articles dating from 1847 up to the present day. The cache, which is still being created, will be kept on a 576-terabyte storage facility at Jawaharlal Nehru University (JNU) in New Delhi. “This is not every journal article ever written, but it’s a lot,” Malamud says. It’s comparable to the size of the core collection in the Web of Science database, for instance. Malamud and his JNU collaborator, bioinformatician Andrew Lynn, call their facility the JNU data depot.

No one will be allowed to read or download work from the repository, because that would breach publishers’ copyright. Instead, Malamud envisages, researchers could crawl over its text and data with computer software, scanning through the world’s scientific literature to pull out insights without actually reading the text.

The unprecedented project is generating much excitement because it could, for the first time, open up vast swathes of the paywalled literature for easy computerized analysis. Dozens of research groups already mine papers to build databases of genes and chemicals, map associations between proteins and diseases, and generate useful scientific hypotheses. But publishers control — and often limit — the speed and scope of such projects, which typically confine themselves to abstracts, not full text. Researchers in India, the United States and the United Kingdom are already making plans to use the JNU store instead. Malamud and Lynn have held workshops at Indian government laboratories and universities to explain the idea. “We bring in professors and explain what we are doing. They get all excited and they say, ‘Oh gosh, this is wonderful’,” says Malamud.

But the depot’s legal status isn’t yet clear. Malamud, who contacted several intellectual-property (IP) lawyers before starting work on the depot, hopes to avoid a lawsuit. “Our position is that what we are doing is perfectly legal,” he says. For the moment, he is proceeding with caution: the JNU data depot is air-gapped, meaning that no one can access it from the Internet. Users have to physically visit the facility, and only researchers who want to mine for non-commercial purposes are currently allowed in. Malamud says his team does plan to allow remote access in the future. “The hope is to do this slowly and deliberately. We are not throwing this open right away,” he says….(More)”.

The Governance Turn in Information Privacy Law


Paper by Jane K. Winn: “The governance turn in information privacy law is a turn away from a model of bureaucratic administration of individual control rights and toward a model of collaborative governance of shared interests in information. Collaborative information governance has roots in the American pragmatic philosophy of Peirce, James and Dewey and the 1973 HEW Report that rejected unilateral individual control rights, recognizing instead the essential characteristic of mutuality of shared purposes that are mediated through information governance. America’s current information privacy law regime consists of market mechanisms supplemented by sector-specific, risk-based laws designed to foster a culture of compliance. Prior to the GDPR, data protection law compliance in Europe was more honored in the breach than the observance, so the EU’s strengthening of its bureaucratic individual control rights model reveals more about the EU’s democratic deficit than a commitment to compliance.

The conventional “Europe good, America bad” wisdom about information privacy law obscures a paradox: if the focus shifts from what “law in the books” says to what “law in action” does, it quickly becomes apparent that American businesses lead the world with their efforts to comply with information privacy law, so “America good, Europe bad” might be more accurate. Creating a federal legislative interface through which regulators and voluntary, consensus standards organizations can collaborate could break the current political stalemate triggered by California’s 2018 EU-style information privacy law. Such a pragmatic approach to information governance can safeguard Americans’ continued access to the benefits of innovation and economic growth as well as providing risk-based protection from harm. America can preserve its leadership of the global information economy by rejecting EU-style information privacy laws and building instead a flexible, dynamic framework of information governance capable of addressing both privacy and disclosure issues simultaneously….(More)”.

Proposal for an International Taxonomy on the Various Forms of the ‘Right to Be Forgotten’: A Study on the Convergence of Norms


Paper by W. Gregory Voss and Céline Castets-Renard: “The term “right to be forgotten” is used today to represent a multitude of rights, and this fact causes difficulties in interpretation, analysis, and comprehension of such rights. These rights have become of utmost importance due to the increased risks to the privacy of individuals on the Internet, where social media, blogs, fora, and other outlets have entered into common use as part of human expression. Search engines, as Internet intermediaries, have been enrolled to assist in the attempt to regulate the Internet, and the rights falling under the moniker of the “right to be forgotten,” without truly knowing the extent of the related rights. In part to alleviate such problems, and focusing on digital technology and media, this paper proposes a taxonomy to identify various rights from different countries, which today are often regrouped under the banner “right to be forgotten,” and to do so in an understandable and coherent way. As an integral part of this exercise, this study aims to measure the extent to which there is a convergence of legal rules internationally in order to regulate private life on the Internet and to elucidate the impact that the important Google Spain “right to be forgotten” ruling of the Court of Justice of the European Union has had on law in other jurisdictions on this matter.

This paper will first introduce the definition and context of the “right to be forgotten.” Second, it will trace some of the sources of the rights discussed around the world to survey various forms of the “right to be forgotten” internationally and propose a taxonomy. This work will allow for a determination on whether there is a convergence of norms regarding the “right to be forgotten” and, more generally, with respect to privacy and personal data protection laws. Finally, this paper will provide certain criteria for the relevant rights and organize them into a proposed analytical grid to establish more precisely the proposed taxonomy of the “right to be forgotten” for the use of scholars, practitioners, policymakers, and students alike….(More)”.

Foundations of Information Ethics


Book by John T. F. Burgess and Emily J. M. Knox: “As discussions about the roles played by information in economic, political, and social arenas continue to evolve, the need for an intellectual primer on information ethics that also functions as a solid working casebook for LIS students and professionals has never been more urgent. This text, written by a stellar group of ethics scholars and contributors from around the globe, expertly fills that need. Organized into twelve chapters, making it ideal for use by instructors, this volume from editors Burgess and Knox

  • thoroughly covers principles and concepts in information ethics, as well as the history of ethics in the information professions;
  • examines human rights, information access, privacy, discourse, intellectual property, censorship, data and cybersecurity ethics, intercultural information ethics, and global digital citizenship and responsibility;
  • synthesizes the philosophical underpinnings of these key subjects with abundant primary source material to provide historical context along with timely and relevant case studies;
  • features contributions from John M. Budd, Paul T. Jaeger, Rachel Fischer, Margaret Zimmerman, Kathrine A. Henderson, Peter Darch, Michael Zimmer, and Masooda Bashir, among others; and
  • offers a special concluding chapter by Amelia Gibson that explores emerging issues in information ethics, including discussions ranging from the ethics of social media and social movements to AI decision making…(More)”.

Data & Policy: A new venue to study and explore policy–data interaction


Opening editorial by Stefaan G. Verhulst, Zeynep Engin and Jon Crowcroft: “…Policy–data interactions or governance initiatives that use data have been the exception rather than the norm, isolated prototypes and trials rather than an indication of real, systemic change. There are various reasons for the generally slow uptake of data in policymaking, and several factors will have to change if the situation is to improve. ….

  • Despite the number of successful prototypes and small-scale initiatives, policy makers’ understanding of data’s potential and its value proposition generally remains limited (Lutes, 2015). There is also limited appreciation of the advances data science has made the last few years. This is a major limiting factor; we cannot expect policy makers to use data if they do not recognize what data and data science can do.
  • The recent (and justifiable) backlash against how certain private companies handle consumer data has had something of a reverse halo effect: There is a growing lack of trust in the way data is collected, analyzed, and used, and this often leads to a certain reluctance (or simply risk-aversion) on the part of officials and others (Engin, 2018).
  • Despite several high-profile open data projects around the world, much (probably the majority) of data that could be helpful in governance remains either privately held or otherwise hidden in silos (Verhulst and Young, 2017b). There remains a shortage not only of data but, more specifically, of high-quality and relevant data.
  • With few exceptions, the technical capacities of officials remain limited, and this has obviously negative ramifications for the potential use of data in governance (Giest, 2017).
  • It’s not just a question of limited technical capacities. There is often a vast conceptual and values gap between the policy and technical communities (Thompson et al., 2015; Uzochukwu et al., 2016); sometimes it seems as if they speak different languages. Compounding this difference in world views is the fact that the two communities rarely interact.
  • Yet, data about the use and evidence of the impact of data remain sparse. The impetus to use more data in policy making is stymied by limited scholarship and a weak evidential basis to show that data can be helpful and how. Without such evidence, data advocates are limited in their ability to make the case for more data initiatives in governance.
  • Data are not only changing the way policy is developed, but they have also reopened the debate around theory- versus data-driven methods in generating scientific knowledge (Lee, 1973; Kitchin, 2014; Chivers, 2018; Dreyfuss, 2017) and thus directly questioning the evidence base to utilization and implementation of data within policy making. A number of associated challenges are being discussed, such as: (i) traceability and reproducibility of research outcomes (due to “black box processing”); (ii) the use of correlation instead of causation as the basis of analysis, biases and uncertainties present in large historical datasets that cause replication and, in some cases, amplification of human cognitive biases and imperfections; and (iii) the incorporation of existing human knowledge and domain expertise into the scientific knowledge generation processes—among many other topics (Castelvecchi, 2016; Miller and Goodchild, 2015; Obermeyer and Emanuel, 2016; Provost and Fawcett, 2013).
  • Finally, we believe that there should be a sound under-pinning a new theory of what we call Policy–Data Interactions. To date, in reaction to the proliferation of data in the commercial world, theories of data management, privacy, and fairness have emerged. From the Human–Computer Interaction world, a manifesto of principles of Human–Data Interaction (Mortier et al., 2014) has found traction, which intends reducing the asymmetry of power present in current design considerations of systems of data about people. However, we need a consistent, symmetric approach to consideration of systems of policy and data, how they interact with one another.

All these challenges are real, and they are sticky. We are under no illusions that they will be overcome easily or quickly….

During the past four conferences, we have hosted an incredibly diverse range of dialogues and examinations by key global thought leaders, opinion leaders, practitioners, and the scientific community (Data for Policy, 2015, 2016, 2017, 2019). What became increasingly obvious was the need for a dedicated venue to deepen and sustain the conversations and deliberations beyond the limitations of an annual conference. This leads us to today and the launch of Data & Policy, which aims to confront and mitigate the barriers to greater use of data in policy making and governance.

Data & Policy is a venue for peer-reviewed research and discussion about the potential for and impact of data science on policy. Our aim is to provide a nuanced and multistranded assessment of the potential and challenges involved in using data for policy and to bridge the “two cultures” of science and humanism—as CP Snow famously described in his lecture on “Two Cultures and the Scientific Revolution” (Snow, 1959). By doing so, we also seek to bridge the two other dichotomies that limit an examination of datafication and its interaction with policy from various angles: the divide between practice and scholarship; and between private and public…

So these are our principles: scholarly, pragmatic, open-minded, interdisciplinary, focused on actionable intelligence, and, most of all, innovative in how we will share insight and pushing at the boundaries of what we already know and what already exists. We are excited to launch Data & Policy with the support of Cambridge University Press and University College London, and we’re looking for partners to help us build it as a resource for the community. If you’re reading this manifesto it means you have at least a passing interest in the subject; we hope you will be part of the conversation….(More)”.

Cyberdiplomacy: Managing Security and Governance Online


Book by Shaun Riordan: “The world has been sleep-walking into cyber chaos. The spread of misinformation via social media and the theft of data and intellectual property, along with regular cyberattacks, threaten the fabric of modern societies. All the while, the Internet of Things increases the vulnerability of computer systems, including those controlling critical infrastructure. What can be done to tackle these problems? Does diplomacy offer ways of managing security and containing conflict online?

In this provocative book, Shaun Riordan shows how traditional diplomatic skills and mindsets can be combined with new technologies to bring order and enhance international cooperation. He explains what cyberdiplomacy means for diplomats, foreign services and corporations and explores how it can be applied to issues such as internet governance, cybersecurity, cybercrime and information warfare. Cyberspace, he argues, is too important to leave to technicians. Using the vital tools offered by cyberdiplomacy, we can reduce the escalation and proliferation of cyberconflicts by proactively promoting negotiation and collaboration online….(More)”.

Five myths about whistleblowers


Dana Gold in the Washington Post: “When a whistleblower revealed the Trump administration’s decision to overturn 25 security clearance denials, it was the latest in a long and storied history of insiders exposing significant abuses of public trust. Whistles were blown on U.S. involvement in Vietnam, the Watergate coverup, Enron’s financial fraud, the National Security Agency’s mass surveillance of domestic electronic communications and, during the Trump administration, the corruption of former Environmental Protection Agency chief Scott Pruitt, Cambridge Analytica’s theft of Facebook users’ data to develop targeted political ads, and harm to children posed by the “zero tolerance” immigration policy. Despite the essential role whistleblowers play in illuminating the truth and protecting the public interest, several myths persist about them, some pernicious.

MYTH NO. 1 Whistleblowers are employees who report problems externally….

MYTH NO. 2 Whistleblowers are either disloyal or heroes….

MYTH NO. 3 ‘Leaker’ is another term for ‘whistleblower.’…

MYTH NO. 4 Remaining anonymous is the best strategy for whistleblowing….

MYTH NO. 5 Julian Assange is a whistleblower….(More)”.

Trustworthy Privacy Indicators: Grades, Labels, Certifications and Dashboards


Paper by Joel R. Reidenberg et al: “Despite numerous groups’ efforts to score, grade, label, and rate the privacy of websites, apps, and network-connected devices, these attempts at privacy indicators have, thus far, not been widely adopted. Privacy policies, however, remain long, complex, and impractical for consumers. Communicating in some short-hand form, synthesized privacy content is now crucial to empower internet users and provide them more meaningful notice, as well as nudge consumers and data processors toward more meaningful privacy. Indeed, on the basis of these needs, the National Institute of Standards and Technology and the Federal Trade Commission in the United States, as well as lawmakers and policymakers in the European Union, have advocated for the development of privacy indicator systems.

Efforts to develop privacy grades, scores, labels, icons, certifications, seals, and dashboards have wrestled with various deficiencies and obstacles for the wide-scale deployment as meaningful and trustworthy privacy indicators. This paper seeks to identify and explain these deficiencies and obstacles that have hampered past and current attempts. With these lessons, the article then offers criteria that will need to be established in law and policy for trustworthy indicators to be successfully deployed and adopted through technological tools. The lack of standardization prevents user-recognizability and dependability in the online marketplace, diminishes the ability to create automated tools for privacy, and reduces incentives for consumers and industry to invest in privacy indicators. Flawed methods in selection and weighting of privacy evaluation criteria and issues interpreting language that is often ambiguous and vague jeopardize success and reliability when baked into an indicator of privacy protectiveness or invasiveness. Likewise, indicators fall short when those organizations rating or certifying the privacy practices are not objective, trustworthy, and sustainable.

Nonetheless, trustworthy privacy rating systems that are meaningful, accurate, and adoptable can be developed to assure effective and enduring empowerment of consumers. This paper proposes a framework using examples from prior and current attempts to create privacy indicator systems in order to provide a valuable resource for present-day, real world policymaking….(More)”.

Mapping the challenges and opportunities of artificial intelligence for the conduct of diplomacy


DiploFoundation: “This report provides an overview of the evolution of diplomacy in the context of artificial intelligence (AI). AI has emerged as a very hot topic on the international agenda impacting numerous aspects of our political, social, and economic lives. It is clear that AI will remain a permanent feature of international debates and will continue to shape societies and international relations.

It is impossible to ignore the challenges – and opportunities – AI is bringing to the diplomatic realm. Its relevance as a topic for diplomats and others working in international relations will only increase….(More)”.

A Behavioral Economics Approach to Digitalisation


Paper by Dirk Beerbaum and Julia M. Puaschunder: “A growing body of academic research in the field of behavioural economics, political science and psychology demonstrates how an invisible hand can nudge people’s decisions towards a preferred option. Contrary to the assumptions of the neoclassical economics, supporters of nudging argue that people have problems coping with a complex world, because of their limited knowledge and their restricted rationality. Technological improvement in the age of information has increased the possibilities to control the innocent social media users or penalise private investors and reap the benefits of their existence in hidden persuasion and discrimination. Nudging enables nudgers to plunder the simple uneducated and uninformed citizen and investor, who is neither aware of the nudging strategies nor able to oversee the tactics used by the nudgers (Puaschunder 2017a, b; 2018a, b).

The nudgers are thereby legally protected by democratically assigned positions they hold. The law of motion of the nudging societies holds an unequal concentration of power of those who have access to compiled data and coding rules, relevant for political power and influencing the investor’s decision usefulness (Puaschunder 2017a, b; 2018a, b). This paper takes as a case the “transparency technology XBRL (eXtensible Business Reporting Language)” (Sunstein 2013, 20), which should make data more accessible as well as usable for private investors. It is part of the choice architecture on regulation by governments (Sunstein 2013). However, XBRL is bounded to a taxonomy (Piechocki and Felden 2007).

Considering theoretical literature and field research, a representation issue (Beerbaum, Piechocki and Weber 2017) for principles-based accounting taxonomies exists, which intelligent machines applying Artificial Intelligence (AI) (Mwilu, Prat and Comyn-Wattiau 2015) nudge to facilitate decision usefulness. This paper conceptualizes ethical questions arising from the taxonomy engineering based on machine learning systems: Should the objective of the coding rule be to support or to influence human decision making or rational artificiality? This paper therefore advocates for a democratisation of information, education and transparency about nudges and coding rules (Puaschunder 2017a, b; 2018a, b)…(More)”.