The war to free science


Brian Resnick and Julia Belluz at Vox: “The 27,500 scientists who work for the University of California generate 10 percent of all the academic research papers published in the United States.

Their university recently put them in a strange position: Sometime this year, these scientists will not be able to directly access much of the world’s published research they’re not involved in.

That’s because in February, the UC system — one of the country’s largest academic institutions, encompassing Berkeley, Los Angeles, Davis, and several other campuses — dropped its nearly $11 million annual subscription to Elsevier, the world’s largest publisher of academic journals.

On the face of it, this seemed like an odd move. Why cut off students and researchers from academic research?

In fact, it was a principled stance that may herald a revolution in the way science is shared around the world.

The University of California decided it doesn’t want scientific knowledge locked behind paywalls, and thinks the cost of academic publishing has gotten out of control.

Elsevier owns around 3,000 academic journals, and its articles account for some 18 percent of all the world’s research output. “They’re a monopolist, and they act like a monopolist,” says Jeffrey MacKie-Mason, head of the campus libraries at UC Berkeley and co-chair of the team that negotiated with the publisher. Elsevier makes huge profits on its journals, generating billions of dollars a year for its parent company RELX.

This is a story about more than subscription fees. It’s about how a private industry has come to dominate the institutions of science, and how librarians, academics, and even pirates are trying to regain control.

The University of California is not the only institution fighting back. “There are thousands of Davids in this story,” says University of California Davis librarian MacKenzie Smith, who, like so many other librarians around the world, has been pushing for more open access to science. “But only a few big Goliaths.”…(More)”.

Data & Policy: A new venue to study and explore policy–data interaction


Opening editorial by Stefaan G. Verhulst, Zeynep Engin and Jon Crowcroft: “…Policy–data interactions or governance initiatives that use data have been the exception rather than the norm, isolated prototypes and trials rather than an indication of real, systemic change. There are various reasons for the generally slow uptake of data in policymaking, and several factors will have to change if the situation is to improve. ….

  • Despite the number of successful prototypes and small-scale initiatives, policy makers’ understanding of data’s potential and its value proposition generally remains limited (Lutes, 2015). There is also limited appreciation of the advances data science has made the last few years. This is a major limiting factor; we cannot expect policy makers to use data if they do not recognize what data and data science can do.
  • The recent (and justifiable) backlash against how certain private companies handle consumer data has had something of a reverse halo effect: There is a growing lack of trust in the way data is collected, analyzed, and used, and this often leads to a certain reluctance (or simply risk-aversion) on the part of officials and others (Engin, 2018).
  • Despite several high-profile open data projects around the world, much (probably the majority) of data that could be helpful in governance remains either privately held or otherwise hidden in silos (Verhulst and Young, 2017b). There remains a shortage not only of data but, more specifically, of high-quality and relevant data.
  • With few exceptions, the technical capacities of officials remain limited, and this has obviously negative ramifications for the potential use of data in governance (Giest, 2017).
  • It’s not just a question of limited technical capacities. There is often a vast conceptual and values gap between the policy and technical communities (Thompson et al., 2015; Uzochukwu et al., 2016); sometimes it seems as if they speak different languages. Compounding this difference in world views is the fact that the two communities rarely interact.
  • Yet, data about the use and evidence of the impact of data remain sparse. The impetus to use more data in policy making is stymied by limited scholarship and a weak evidential basis to show that data can be helpful and how. Without such evidence, data advocates are limited in their ability to make the case for more data initiatives in governance.
  • Data are not only changing the way policy is developed, but they have also reopened the debate around theory- versus data-driven methods in generating scientific knowledge (Lee, 1973; Kitchin, 2014; Chivers, 2018; Dreyfuss, 2017) and thus directly questioning the evidence base to utilization and implementation of data within policy making. A number of associated challenges are being discussed, such as: (i) traceability and reproducibility of research outcomes (due to “black box processing”); (ii) the use of correlation instead of causation as the basis of analysis, biases and uncertainties present in large historical datasets that cause replication and, in some cases, amplification of human cognitive biases and imperfections; and (iii) the incorporation of existing human knowledge and domain expertise into the scientific knowledge generation processes—among many other topics (Castelvecchi, 2016; Miller and Goodchild, 2015; Obermeyer and Emanuel, 2016; Provost and Fawcett, 2013).
  • Finally, we believe that there should be a sound under-pinning a new theory of what we call Policy–Data Interactions. To date, in reaction to the proliferation of data in the commercial world, theories of data management, privacy, and fairness have emerged. From the Human–Computer Interaction world, a manifesto of principles of Human–Data Interaction (Mortier et al., 2014) has found traction, which intends reducing the asymmetry of power present in current design considerations of systems of data about people. However, we need a consistent, symmetric approach to consideration of systems of policy and data, how they interact with one another.

All these challenges are real, and they are sticky. We are under no illusions that they will be overcome easily or quickly….

During the past four conferences, we have hosted an incredibly diverse range of dialogues and examinations by key global thought leaders, opinion leaders, practitioners, and the scientific community (Data for Policy, 2015, 2016, 2017, 2019). What became increasingly obvious was the need for a dedicated venue to deepen and sustain the conversations and deliberations beyond the limitations of an annual conference. This leads us to today and the launch of Data & Policy, which aims to confront and mitigate the barriers to greater use of data in policy making and governance.

Data & Policy is a venue for peer-reviewed research and discussion about the potential for and impact of data science on policy. Our aim is to provide a nuanced and multistranded assessment of the potential and challenges involved in using data for policy and to bridge the “two cultures” of science and humanism, as CP Snow famously described in his lecture on “Two Cultures and the Scientific Revolution” (Snow, 1959). By doing so, we also seek to bridge the two other dichotomies that limit an examination of datafication and its interaction with policy from various angles: the divide between practice and scholarship; and between private and public…

So these are our principles: scholarly, pragmatic, open-minded, interdisciplinary, focused on actionable intelligence, and, most of all, innovative in how we will share insight and pushing at the boundaries of what we already know and what already exists. We are excited to launch Data & Policy with the support of Cambridge University Press and University College London, and we’re looking for partners to help us build it as a resource for the community. If you’re reading this manifesto it means you have at least a passing interest in the subject; we hope you will be part of the conversation….(More)”.

From Theory to Practice: Open Government Data, Accountability, and Service Delivery


Report by Michael Christopher Jelenic: “Open data and open government data have recently attracted much attention as a means to innovate, add value, and improve outcomes in a variety of sectors, public and private. Although some of the benefits of open data initiatives have been assessed in the past, particularly their economic and financial returns, it is often more difficult to evaluate their social and political impacts. In the public sector, a murky theory of change has emerged that links the use of open government data with greater government accountability as well as improved service delivery in key sectors, including health and education, among others. In the absence of cross-country empirical research on this topic, this paper asks the following: Based on the evidence available, to what extent and for what reasons is the use of open government data associated with higher levels of accountability and improved service delivery in developing countries?

To answer this question, the paper constructs a unique data set that operationalizes open government data, government accountability, service delivery, as well as other intervening and control variables. Relying on data from 25 countries in Sub-Saharan Africa, the paper finds a number of significant associations between open government data, accountability, and service delivery. However, the findings suggest differentiated effects of open government data across the health and education sectors, as well as with respect to service provision and service delivery outcomes. Although this early research has limitations and does not attempt to establish a purely causal relationship between the variables, it provides initial empirical support for claims about the efficacy of open government data for improving accountability and service delivery….(More)”

How to use data for good — 5 priorities and a roadmap


Stefaan Verhulst at apolitical: “…While the overarching message emerging from these case studies was promising, several barriers were identified that if not addressed systematically could undermine the potential of data science to address critical public needs and limit the opportunity to scale the practice more broadly.

Below we summarise the five priorities that emerged through the workshop for the field moving forward.

1. Become People-Centric

Much of the data currently used for drawing insights involve or are generated by people.

These insights have the potential to impact people’s lives in many positive and negative ways. Yet, the people and the communities represented in this data are largely absent when practitioners design and develop data for social good initiatives.

To ensure data is a force for positive social transformation (i.e., they address real people’s needs and impact lives in a beneficial way), we need to experiment with new ways to engage people at the design, implementation, and review stage of data initiatives beyond simply asking for their consent.

As we explain in our People-Led Innovation methodology, different segments of people can play multiple roles ranging from co-creation to commenting, reviewing and providing additional datasets.

The key is to ensure their needs are front and center, and that data science for social good initiatives seek to address questions related to real problems that matter to society-at-large (a key concern that led The GovLab to instigate the 100 Questions Initiative).

2. Establish Data About the Use of Data (for Social Good)

Many data for social good initiatives remain fledgling.

As currently designed, the field often struggles with translating sound data projects into positive change. As a result, many potential stakeholders (private sector and government “owners” of data as well as public beneficiaries) remain unsure about the value of using data for social good, especially against the background of high risks and transaction costs.

The field needs to overcome such limitations if data insights and its benefits are to spread. For that, we need hard evidence about data’s positive impact. Ironically, the field is held back by an absence of good data on the use of data—a lack of reliable empirical evidence that could guide new initiatives.

The field needs to prioritise developing a far more solid evidence base and “business case” to move data for social good from a good idea to reality.

3. Develop End-to-End Data Initiatives

Too often, data for social good focus on the “data-to-knowledge” pipeline without focusing on how to move “knowledge into action.”

As such, the impact remains limited and many efforts never reach an audience that can actually act upon the insights generated. Without becoming more sophisticated in our efforts to provide end-to-end projects and taking “data from knowledge to action,” the positive impact of data will be limited….

4. Invest in Common Trust and Data Steward Mechanisms 

For data for social good initiatives (including data collaboratives) to flourish and scale, there must be substantial trust between all parties involved; and amongst the public-at-large.

Establishing such a platform of trust requires each actor to invest in developing essential trust mechanisms such as data governance structures, contracts, and dispute resolution methods. Today, designing and establishing these mechanisms take tremendous time, energy, and expertise. These high transaction costs result from the lack of common templates and the need to each time design governance structures from scratch…

5. Build Bridges Across Cultures

As C.P. Snow famously described in his lecture on “Two Cultures and the Scientific Revolution,” we must bridge the “two cultures” of science and humanism if we are to solve the world’s problems….

To implement these five priorities we will need experimentation at the operational but also institutional level. This involves the establishment of “data stewards” within organisations that can accelerate data for social good initiatives in a responsible manner integrating the five priorities above….(More)”

We should extend EU bank data sharing to all sectors


Carlos Torres Vila in the Financial Times: “Data is now driving the global economy — just look at the list of the world’s most valuable companies. They collect and exploit the information that users generate through billions of online interactions taking place every day. 


But companies are hoarding data too, preventing others, including the users to whom the data relates, from accessing and using it. This is true of traditional groups such as banks, telcos and utilities, as well as the large digital enterprises that rely on “proprietary” data.

Global and national regulators must address this problem by forcing companies to give users an easy way to share their own data, if they so choose. This is the logical consequence of personal data belonging to users. There is also the potential for enormous socio-economic benefits if we can create consent-based free data flows.

We need data-sharing across companies in all sectors in a real time, standardised way, not at a speed and in a format dictated by the companies that stockpile user data. These new rules should apply to all electronic data generated by users, whether provided directly or observed during their online interactions with any provider, across geographic borders and in any sector. This could include everything from geolocation history and electricity consumption to recent web searches, pension information or even most recently played songs.

This won’t be easy to achieve in practice, but the good news is that we already have a framework that could be the model for a broader solution. The UK’s Open Banking system provides a tantalising glimpse of what may be possible. In Europe, the regulation known as the Payment Services Directive 2 allows banking customers to share data about their transactions with multiple providers via secure, structured IT interfaces. We are already seeing this unlock new business models and drive competition in digital financial services. But these rules do not go far enough — they only apply to payments history, and that isn’t enough to push forward a data-driven economic revolution across other sectors of the economy. 

We need a global framework with common rules across regions and sectors. This has already happened in financial services: after the 2008 financial crisis, the G20 strengthened global banking standards and created the Financial Stability Board. The rules, while not perfect, have delivered uniformity which has strengthened the system. 

We need a similar global push for common rules on the use of data. While it will be difficult to achieve consensus on data, and undoubtedly more difficult still to implement and enforce it, I believe that now is the time to decide what we want. The involvement of the G20 in setting up global standards will be essential to realising the potential that data has to deliver a better world for all of us. There will be complaints about the cost of implementation. I know first hand how expensive it can be to simultaneously open up and protect sensitive core systems. 

The alternative is siloed data that holds back innovation. There will also be justified concerns that easier data sharing could lead to new user risks. Security must be a non-negotiable principle in designing intercompany interfaces and protecting access to sensitive data. But Open Banking shows that these challenges are resolvable. …(More)”.

France Bans Judge Analytics, 5 Years In Prison For Rule Breakers


Artificial Lawyer: “In a startling intervention that seeks to limit the emerging litigation analytics and prediction sector, the French Government has banned the publication of statistical information about judges’ decisions – with a five year prison sentence set as the maximum punishment for anyone who breaks the new law.

Owners of legal tech companies focused on litigation analytics are the most likely to suffer from this new measure.

The new law, encoded in Article 33 of the Justice Reform Act, is aimed at preventing anyone – but especially legal tech companies focused on litigation prediction and analytics – from publicly revealing the pattern of judges’ behaviour in relation to court decisions.

A key passage of the new law states:

‘The identity data of magistrates and members of the judiciary cannot be reused with the purpose or effect of evaluating, analysing, comparing or predicting their actual or alleged professional practices.’ *

As far as Artificial Lawyer understands, this is the very first example of such a ban anywhere in the world.

Insiders in France told Artificial Lawyer that the new law is a direct result of an earlier effort to make all case law easily accessible to the general public, which was seen at the time as improving access to justice and a big step forward for transparency in the justice sector.

However, judges in France had not reckoned on NLP and machine learning companies taking the public data and using it to model how certain judges behave in relation to particular types of legal matter or argument, or how they compare to other judges.

In short, they didn’t like how the pattern of their decisions – now relatively easy to model – were potentially open for all to see.

Unlike in the US and the UK, where judges appear to have accepted the fait accompli of legal AI companies analysing their decisions in extreme detail and then creating models as to how they may behave in the future, French judges have decided to stamp it out….(More)”.

The Landscape of Open Data Policies


Apograf: “Open Access (OA) publishing has a long history, going back to the early 1990s, and was born with the explicit intention of improving access to scholarly literature. The internet has played a pivotal role in garnering support for free and reusable research publications, as well as stronger and more democratic peer-review systems, ones that are not bogged down by the restrictions of influential publishing platforms….

Looking back, looking forward

Launched in 1991, ArXiv.org was a pioneering platform in this regard, a telling example of how researchers could cooperate to publish academic papers for free and in full view for the public. Though it has limitations — papers are curated by moderators and are not peer-reviewed — arXiv is a demonstration of how technology can be used to overcome some of the incentive and distribution problems that scientific research had long been subjected to.

The scientific community has itself assumed the mantle to this end: the Budapest Open Access Initiative (BOAI) and the Berlin Declaration on Open Access Initiative, launched in 2002 and 2003 respectively, are considered landmark movements in the push for unrestricted access to scientific research. While mostly symbolic, the effort highlighted the growing desire to solve the problems plaguing the space through technology.

The BOAI manifesto begins with a statement that is an encapsulation of the movement’s purpose,

“An old tradition and a new technology have converged to make possible an unprecedented public good. The old tradition is the willingness of scientists and scholars to publish the fruits of their research in scholarly journals without payment, for the sake of inquiry and knowledge. The new technology is the internet. The public good they make possible is the world-wide electronic distribution of the peer-reviewed journal literature and completely free and unrestricted access to it by all scientists, scholars, teachers, students, and other curious minds.”

Plan S is a more recent attempt to make publicly funded research available to all. Launched by Science Europe in September 2018, Plan S — short for ‘Shock’ — has energized the research community with its resolution to make access to publicly funded knowledge a right to everyone and dissolve the profit-driven ecosystem of research publication. Members of the European Union have vowed to achieve this by 2020.

Plan S has been supported by governments outside Europe as well. China has thrown itself behind it, and the state of California has enacted a law that requires open access to research one year after publishing. It is, of course, not without its challenges: advocacy and ensuring that publishing is not restricted to a few venues are two such obstacles. However, the organization behind forming the guidelines, cOAlition S, has agreed to make the guidelines more flexible.

The emergence of this trend is not without its difficulties, however, and numerous obstacles continue to hinder the dissemination of information in a manner that is truly transparent and public. Chief among these are the many gates that continue to keep research as somewhat of an exclusive property, besides the fact that the infrastructure and development for such systems are short on funding and staff….(More)”.

Opening Data for Global Health


Chapter by Matt Laessig, Bryon Jacob and Carla AbouZahr in The Palgrave Handbook of Global Health Data Methods for Policy and Practice: “…provide best practices for organizations to adopt to disseminate data openly for others to use. They describe development of the open data movement and its rapid adoption by governments, non-governmental organizations, and research groups. The authors provide examples from the health sector—an early adopter—but acknowledge concerns specific to health relating to informed consent, intellectual property, and ownership of personal data. Drawing on their considerable contributions to the open data movement, Laessig and Jacob share their Open Data Progression Model. They describe six stages to make data open: from data collection, documentation of the data, opening the data, engaging the community of users, making the data interoperable, to finally linking the data….(More)”

Come to Finland if you want to glimpse the future of health data!


Jukka Vahti at Sitra: “The Finnish tradition of establishing, maintaining and developing data registers goes back to the 1600s, when parish records were first kept.

When this old custom is combined with the opportunities afforded by digitisation, the positive approach Finns have towards research and technology, and the recently updated legislation enabling the data economy, Finland and the Finnish people can lead the way as Europe gradually, or even suddenly, switches to a fair data economy.

The foundations for a fair data economy already exist

The fair data economy is a natural continuation of the former projects promoting e-services that were undertaken in Finland.

For example, the Data Exchange Layer is already speeding up the transfer of data from one system to another in Finland and in Estonia, the country where the system originated, and a system unique to just these two countries.

In May 2019 Finland also saw the entry into force of the Act on the Secondary Use of Health and Social Data, according to which the information on social welfare and healthcare held in registers may be used for purposes of statistics, research, education, knowledge management, control and supervision conducted by authorities, and development and innovation activity.

The new law will make the work of researchers and service developers more effective, as the business of acquiring a permit will take place through a one-stop-shop principle and it will be possible to use data from more than one source more readily than before….(More)”.

Open Data and the Private Sector


Chapter by Joel Gurin, Carla Bonini and Stefaan Verhulst in State of Open Data: “The open data movement launched a decade ago with a focus on transparency, good governance, and citizen participation. As other chapters in this collection have documented in detail, those critical uses of open data have remained paramount and are continuing to grow in importance at a time of fake news and increased secrecy. But the value of open data extends beyond transparency and accountability – open data is also an important resource for business and economic growth.

The past several years have seen an increased focus on the value of open data to the private sector. In 2012, the Open Data Institute (ODI) was founded in the United Kingdom (UK) and backed with GBP 10 million by the UK government to maximise the value of open data in business and government. A year later, McKinsey released a report suggesting open data could help unlock USD 3 to 5 trillion in economic value annually. At around the same time, Monsanto acquired the Climate Corporation, a digital agriculture company that leverages open data to inform farmers for approximately USD 1.1 billion. In 2014, the GovLab launched the Open Data 500, the first national study of businesses using open government data (now in six countries), and, in 2015, Open Data for Development (OD4D) launched the Open Data Impact Map, which today contains more than 1 100 examples of private sector companies using open data. The potential business applications of open data continue to be a priority for many governments around the world as they plan and develop their data programmes.

The use of open data has become part of the broader business practice of using data and data science to inform business decisions, ranging from launching new products and services to optimising processes and outsmarting the competition. In this chapter, we take stock of the state of open data and the private sector by analysing how the private sector both leverages and contributes to the open data ecosystem….(More)”.