The war to free science


Brian Resnick and Julia Belluz at Vox: “The 27,500 scientists who work for the University of California generate 10 percent of all the academic research papers published in the United States.

Their university recently put them in a strange position: Sometime this year, these scientists will not be able to directly access much of the world’s published research they’re not involved in.

That’s because in February, the UC system — one of the country’s largest academic institutions, encompassing Berkeley, Los Angeles, Davis, and several other campuses — dropped its nearly $11 million annual subscription to Elsevier, the world’s largest publisher of academic journals.

On the face of it, this seemed like an odd move. Why cut off students and researchers from academic research?

In fact, it was a principled stance that may herald a revolution in the way science is shared around the world.

The University of California decided it doesn’t want scientific knowledge locked behind paywalls, and thinks the cost of academic publishing has gotten out of control.

Elsevier owns around 3,000 academic journals, and its articles account for some 18 percent of all the world’s research output. “They’re a monopolist, and they act like a monopolist,” says Jeffrey MacKie-Mason, head of the campus libraries at UC Berkeley and co-chair of the team that negotiated with the publisher. Elsevier makes huge profits on its journals, generating billions of dollars a year for its parent company RELX.

This is a story about more than subscription fees. It’s about how a private industry has come to dominate the institutions of science, and how librarians, academics, and even pirates are trying to regain control.

The University of California is not the only institution fighting back. “There are thousands of Davids in this story,” says University of California Davis librarian MacKenzie Smith, who, like so many other librarians around the world, has been pushing for more open access to science. “But only a few big Goliaths.”…(More)”.

Data & Policy: A new venue to study and explore policy–data interaction


Opening editorial by Stefaan G. Verhulst, Zeynep Engin and Jon Crowcroft: “…Policy–data interactions or governance initiatives that use data have been the exception rather than the norm, isolated prototypes and trials rather than an indication of real, systemic change. There are various reasons for the generally slow uptake of data in policymaking, and several factors will have to change if the situation is to improve. ….

  • Despite the number of successful prototypes and small-scale initiatives, policy makers’ understanding of data’s potential and its value proposition generally remains limited (Lutes, 2015). There is also limited appreciation of the advances data science has made in the last few years. This is a major limiting factor; we cannot expect policy makers to use data if they do not recognize what data and data science can do.
  • The recent (and justifiable) backlash against how certain private companies handle consumer data has had something of a reverse halo effect: There is a growing lack of trust in the way data is collected, analyzed, and used, and this often leads to a certain reluctance (or simply risk-aversion) on the part of officials and others (Engin, 2018).
  • Despite several high-profile open data projects around the world, much (probably the majority) of data that could be helpful in governance remains either privately held or otherwise hidden in silos (Verhulst and Young, 2017b). There remains a shortage not only of data but, more specifically, of high-quality and relevant data.
  • With few exceptions, the technical capacities of officials remain limited, and this has obviously negative ramifications for the potential use of data in governance (Giest, 2017).
  • It’s not just a question of limited technical capacities. There is often a vast conceptual and values gap between the policy and technical communities (Thompson et al., 2015; Uzochukwu et al., 2016); sometimes it seems as if they speak different languages. Compounding this difference in world views is the fact that the two communities rarely interact.
  • Yet data about the use of data, and evidence of its impact, remain sparse. The impetus to use more data in policy making is stymied by limited scholarship and a weak evidential basis to show that data can be helpful and how. Without such evidence, data advocates are limited in their ability to make the case for more data initiatives in governance.
  • Data are not only changing the way policy is developed, but they have also reopened the debate around theory- versus data-driven methods in generating scientific knowledge (Lee, 1973; Kitchin, 2014; Chivers, 2018; Dreyfuss, 2017), thus directly calling into question the evidence base for the utilization and implementation of data within policy making. A number of associated challenges are being discussed, such as (i) traceability and reproducibility of research outcomes (due to “black box processing”); (ii) the use of correlation instead of causation as the basis of analysis, and biases and uncertainties present in large historical datasets that cause replication and, in some cases, amplification of human cognitive biases and imperfections; and (iii) the incorporation of existing human knowledge and domain expertise into the scientific knowledge generation processes, among many other topics (Castelvecchi, 2016; Miller and Goodchild, 2015; Obermeyer and Emanuel, 2016; Provost and Fawcett, 2013).
  • Finally, we believe that there should be a sound underpinning for a new theory of what we call Policy–Data Interactions. To date, in reaction to the proliferation of data in the commercial world, theories of data management, privacy, and fairness have emerged. From the Human–Computer Interaction world, a manifesto of principles of Human–Data Interaction (Mortier et al., 2014) has found traction; it aims to reduce the asymmetry of power present in current design considerations of systems of data about people. However, we need a consistent, symmetric approach to considering systems of policy and data and how they interact with one another.

All these challenges are real, and they are sticky. We are under no illusions that they will be overcome easily or quickly….

During the past four conferences, we have hosted an incredibly diverse range of dialogues and examinations by key global thought leaders, opinion leaders, practitioners, and the scientific community (Data for Policy, 2015, 2016, 2017, 2019). What became increasingly obvious was the need for a dedicated venue to deepen and sustain the conversations and deliberations beyond the limitations of an annual conference. This leads us to today and the launch of Data & Policy, which aims to confront and mitigate the barriers to greater use of data in policy making and governance.

Data & Policy is a venue for peer-reviewed research and discussion about the potential for and impact of data science on policy. Our aim is to provide a nuanced and multistranded assessment of the potential and challenges involved in using data for policy and to bridge the “two cultures” of science and humanism, as C.P. Snow famously described in his lecture on “Two Cultures and the Scientific Revolution” (Snow, 1959). By doing so, we also seek to bridge the two other dichotomies that limit an examination of datafication and its interaction with policy from various angles: the divide between practice and scholarship, and between private and public…

So these are our principles: scholarly, pragmatic, open-minded, interdisciplinary, focused on actionable intelligence, and, most of all, innovative in how we will share insight and in pushing at the boundaries of what we already know and what already exists. We are excited to launch Data & Policy with the support of Cambridge University Press and University College London, and we’re looking for partners to help us build it as a resource for the community. If you’re reading this manifesto it means you have at least a passing interest in the subject; we hope you will be part of the conversation….(More)”.

Introducing ‘AI Commons’: A framework for collaboration to achieve global impact


Press Release: “Last week’s 3rd annual AI for Good Global Summit once again showcased the growing number of Artificial Intelligence (AI) projects with promise to advance the United Nations Sustainable Development Goals (SDGs).

Now, using the Summit’s momentum, AI innovators and humanitarian leaders are prepared to take the ‘AI for Good’ movement to the next level.

They are working together to launch an ‘AI Commons’ that aims to scale AI for Good projects and maximize their impact across the world.

The AI Commons will enable AI adopters to connect with AI specialists and data owners to align incentives for innovation and develop AI solutions to precisely defined problems.

“The concept of AI Commons has developed over three editions of the Summit and is now motivating implementation,” said ITU Secretary-General Houlin Zhao in closing remarks to the summit. “AI and data need to be a shared resource if we are serious about scaling AI for good. The community supporting the Summit is creating infrastructure to scale up their collaboration, to convert the principles underlying the Summit into global impact.”…

The AI Commons will provide an open framework for collaboration, a decentralized system to democratize problem solving with AI.

It aims to be a “knowledge space”, says Banifatemi, answering a key question: “How can problem solving with AI become common knowledge?”

“The goal is to be an open initiative, like a Linux effort, like an open-source network, where everyone can participate and we jointly share and we create an abundance of knowledge, knowledge of how we can solve problems with AI,” said Banifatemi.

AI development and application will build on the state of the art, enabling AI solutions to scale with the help of shared datasets, testing and simulation environments, AI models and associated software, and storage and computing resources….(More)”.

How not to conduct a consultation – and why asking the public is not always such a great idea


Agnes Batory & Sara Svensson at Policy and Politics: “Involving people in policy-making is generally a good thing. Policy-makers themselves often pay at least lip-service to the importance of giving citizens a say. In the academic literature, participatory governance has been, with some exaggeration, almost universally hailed as a panacea to all ills in Western democracies. In particular, it is advocated as a way to remedy the alienation of voters from politicians who seem to be oblivious to the concerns of the common man and woman, with an ensuing decline in public trust in government. Representation by political parties is ridden with problems, so the argument goes, and in any case it is overly focused on the act of voting in elections – a one-off event once every few years which limits citizens’ ability to control the policy agenda. On the other hand, various forms of public participation are expected to educate citizens, help develop a civic culture, and boost the legitimacy of decision-making. Consequently, practices to ensure that citizens can provide direct input into policy-making are to be welcomed on both pragmatic and normative grounds.  

I do not disagree with these generally positive expectations. However, the main objective of my recent article in Policy and Politics, co-authored with Sara Svensson, is to inject a dose of healthy scepticism into the debate or, more precisely, to show that there are circumstances in which public consultations will achieve anything but greater legitimacy and better policy outcomes. We do this partly by discussing the more questionable assumptions in the participatory governance literature, and partly by examining a recent, glaring example of the misuse, and abuse, of popular input….(More)”.

Number of fact-checking outlets surges to 188 in more than 60 countries


Mark Stencel at Poynter: “The number of fact-checking outlets around the world has grown to 188 in more than 60 countries amid global concerns about the spread of misinformation, according to the latest tally by the Duke Reporters’ Lab.

Since the last annual fact-checking census in February 2018, we’ve added 39 more outlets that actively assess claims from politicians and social media, a 26% increase. The new total is also more than four times the 44 fact-checkers we counted when we launched our global database and map in 2014.

Globally, the largest growth came in Asia, which went from 22 to 35 outlets in the past year. Nine of the 27 fact-checking outlets that launched since the start of 2018 were in Asia, including six in India. Latin American fact-checking also saw a growth spurt in that same period, with two new outlets in Costa Rica, and others in Mexico, Panama and Venezuela.

The actual worldwide total is likely much higher than our current tally. That’s because more than a half-dozen of the fact-checkers we’ve added to the database since the start of 2018 began as election-related partnerships that involved the collaboration of multiple organizations. And some of those election partners are discussing ways to continue or reactivate that work, either together or on their own.

Over the past 12 months, five separate multimedia partnerships enlisted more than 60 different fact-checking organizations and other news companies to help debunk claims and verify information for voters in Mexico, Brazil, Sweden, Nigeria and the Philippines. And the Poynter Institute’s International Fact-Checking Network assembled a separate team of 19 media outlets from 13 countries to consolidate and share their reporting during the run-up to last month’s elections for the European Parliament. Our database includes each of these partnerships, along with several others, but not each of the individual partners. And because they were intentionally short-run projects, three of these big partnerships appear among the 74 inactive projects we also document in our database.

Politics isn’t the only driver for fact-checkers. Many outlets in our database are concentrating efforts on viral hoaxes and other forms of online misinformation — often in coordination with the big digital platforms on which that misinformation spreads.

We also continue to see new topic-specific fact-checkers such as Metafact in Australia and Health Feedback in France, both of which launched in 2018 to focus on claims about health and medicine for a worldwide audience….(More)”.

How Organizations with Data and Technology Skills Can Play a Critical Role in the 2020 Census


Blog Post by Kathryn L.S. Pettit and Olivia Arena: “The 2020 Census is less than a year away, and it’s facing new challenges that could result in an inaccurate count. The proposed inclusion of a citizenship question, the lack of comprehensive and unified messaging, and the new internet-response option could worsen the undercount of vulnerable and marginalized communities and deprive these groups of critical resources.

The US Census Bureau aims to count every US resident. But some groups are more likely to be missed than others. Communities of color, immigrants, young children, renters, people experiencing homelessness, and people living in rural areas have long been undercounted in the census. Because the census count is used to apportion federal funding and draw legislative districts for political seats, an inaccurate count means that these populations receive less than their fair share of resources and representation.

Local governments and community-based organizations have begun forming Complete Count Committees, coalitions of trusted community voices established to encourage census responses, to achieve a more accurate count in 2020. Local organizations with data and technology skills—like civic tech groups, libraries, technology training organizations, and data intermediaries—can harness their expertise to help these coalitions achieve a complete count.

As the coordinator of the National Neighborhood Indicators Partnership (NNIP), we are learning about 2020 Census mobilization in communities across the country. We have found that data and technology groups are natural partners in this work; they understand what is at risk in 2020, are embedded in communities as trusted data providers, and can amplify the importance of the census.

Threats to a complete count

The proposed citizenship question, currently being challenged in court, would likely suppress the count of immigrants and households in immigrant communities in the US. Though federal law prohibits the Census Bureau from disclosing individual-level data, even to other agencies, people may still be skeptical about the confidentiality of the data or generally distrust the government. Acknowledging these fears is important for organizations partnering in outreach to vulnerable communities.

Another potential hurdle is that, for the first time, the Census Bureau will encourage people to complete their census forms online (though answering by mail or phone will still be options). Though a high-tech census could be more cost-effective, the digital divide, compounded by the underfunding of the Census Bureau that limited initial testing of new methods and outreach, could worsen the undercount….(More)”.

Open government and citizen engagement: From theory to action


Camilo Romero Galeano at apolitical: “…According to the 2016 Corruption Perceptions Index, which analysed the behaviour of 178 countries, 69% of the countries evaluated again raised the alarm about what has been referred to as “the cancer of the public service”.

The scandals of misappropriation of public funds, illicit enrichment of public officials, the slippery labyrinths of procurement and all kinds of practices that challenge ethics in the public service are daily news around the world.

Colombia and the department of Nariño suffer from the same problems. Bad practices of traditional politics and chiefdoms have ended up destroying the trust that citizens once had in political institutions. Corruption and its devastating effects always end up undermining people’s dignity.

With this as the current state of affairs, and in our capacity as a subnational government, we have designed hand in hand with the citizens of Nariño a new government program. It is based on an approach to innovation called “New Government” that relies on three pillars: open government; social innovation; and collaborative economy.

The new program has been endorsed by more than 300,000 voters and subsequently concretised in our roadmap for the territory: “Nariño heart of the World”. The creation of this policy document brought together 31,700 participants and involved travelling around the 13 subregions into which Nariño’s 64 municipalities are grouped.

In this way, citizen participation has become an essential tool in the fight against corruption.

Our open government strategy is called GANA — Gobierno Abierto de Nariño (in English, “Win — Open Government of Nariño”). The strategy takes a step forward in ensuring cabinet officials become transparent and publicly declare private assets. Citizens can now find out the financial conditions in which public officials begin and finish their administrative periods. Each one of us….(More)”

The Tricky Ethics of Using YouTube Videos for Academic Research


Jane C. Hu in P/S Magazine: “…But just because something is legal doesn’t mean it’s ethical. That doesn’t mean it’s necessarily unethical, either, but it’s worth asking questions about how and why researchers use social media posts, and whether those uses could be harmful. I was once a researcher who had to obtain human-subjects approval from a university institutional review board, and I know it can be a painstaking application process with long wait times. Collecting data from individuals takes a long time too. If you could just sub in YouTube videos in place of collecting your own data, that saves time, money, and effort. But that could be at the expense of the people whose data you’re scraping.

But, you might say, if people don’t want to be studied online, then they shouldn’t post anything. But most people don’t fully understand what “publicly available” really means or its ramifications. “You might know intellectually that technically anyone can see a tweet, but you still conceptualize your audience as being your 200 Twitter followers,” Fiesler says. In her research, she’s found that the majority of people she’s polled have no clue that researchers study public tweets.

Some may disagree that it’s researchers’ responsibility to work around social media users’ ignorance, but Fiesler and others are calling for their colleagues to be more mindful about any work that uses publicly available data. For instance, Ashley Patterson, an assistant professor of language and literacy at Penn State University, ultimately decided to use YouTube videos in her dissertation work on biracial individuals’ educational experiences. That’s a decision she arrived at after carefully considering her options each step of the way. “I had to set my own levels of ethical standards and hold myself to it, because I knew no one else would,” she says. One of Patterson’s first steps was to ask herself what YouTube videos would add to her work, and whether there were any other ways to collect her data. “It’s not a matter of whether it makes my life easier, or whether it’s ‘just data out there’ that would otherwise go to waste. The nature of my question and the response I was looking for made this an appropriate piece [of my work],” she says.

Researchers may also want to consider qualitative, hard-to-quantify contextual cues when weighing ethical decisions. What kind of data is being used? Fiesler points out that tweets about, say, a television show are way less personal than ones about a sensitive medical condition. Anonymized written materials, like Facebook posts, could be less invasive than using someone’s face and voice from a YouTube video. And the potential consequences of the research project are worth considering too. For instance, Fiesler and other critics have pointed out that researchers who used YouTube videos of people documenting their experience undergoing hormone replacement therapy to train an artificial intelligence to identify trans people could be putting their unwitting participants in danger. It’s not obvious how the results of Speech2Face will be used, and, when asked for comment, the paper’s researchers said they’d prefer to quote from their paper, which pointed to a helpful purpose: providing a “representative face” based on the speaker’s voice on a phone call. But one can also imagine dangerous applications, like doxing anonymous YouTubers.

One way to get ahead of this, perhaps, is to take steps to explicitly inform participants their data is being used. Fiesler says that, when her team asked people how they’d feel after learning their tweets had been used for research, “not everyone was necessarily super upset, but most people were surprised.” They also seemed curious; 85 percent of participants said that, if their tweet were included in research, they’d want to read the resulting paper. “In human-subjects research, the ethical standard is informed consent, but inform and consent can be pulled apart; you could potentially inform people without getting their consent,” Fiesler suggests….(More)”.

How to use data for good — 5 priorities and a roadmap


Stefaan Verhulst at apolitical: “…While the overarching message emerging from these case studies was promising, several barriers were identified that, if not addressed systematically, could undermine the potential of data science to address critical public needs and limit the opportunity to scale the practice more broadly.

Below we summarise the five priorities that emerged through the workshop for the field moving forward.

1. Become People-Centric

Much of the data currently used for drawing insights involve or are generated by people.

These insights have the potential to impact people’s lives in many positive and negative ways. Yet, the people and the communities represented in this data are largely absent when practitioners design and develop data for social good initiatives.

To ensure data is a force for positive social transformation (i.e., that data initiatives address real people’s needs and affect lives in a beneficial way), we need to experiment with new ways to engage people at the design, implementation, and review stages of data initiatives beyond simply asking for their consent.


As we explain in our People-Led Innovation methodology, different segments of people can play multiple roles ranging from co-creation to commenting, reviewing and providing additional datasets.

The key is to ensure their needs are front and center, and that data science for social good initiatives seek to address questions related to real problems that matter to society at large (a key concern that led The GovLab to instigate the 100 Questions Initiative).

2. Establish Data About the Use of Data (for Social Good)

Many data for social good initiatives remain fledgling.

As currently designed, the field often struggles with translating sound data projects into positive change. As a result, many potential stakeholders (private sector and government “owners” of data as well as public beneficiaries) remain unsure about the value of using data for social good, especially against the background of high risks and transaction costs.

The field needs to overcome such limitations if data insights and their benefits are to spread. For that, we need hard evidence about data’s positive impact. Ironically, the field is held back by an absence of good data on the use of data: a lack of reliable empirical evidence that could guide new initiatives.

The field needs to prioritise developing a far more solid evidence base and “business case” to move data for social good from a good idea to reality.

3. Develop End-to-End Data Initiatives

Too often, data for social good initiatives focus on the “data-to-knowledge” pipeline without focusing on how to move “knowledge into action.”

As such, the impact remains limited and many efforts never reach an audience that can actually act upon the insights generated. Without becoming more sophisticated in our efforts to provide end-to-end projects and taking “data from knowledge to action,” the positive impact of data will be limited….

4. Invest in Common Trust and Data Steward Mechanisms 

For data for social good initiatives (including data collaboratives) to flourish and scale, there must be substantial trust between all parties involved, and among the public at large.

Establishing such a platform of trust requires each actor to invest in developing essential trust mechanisms such as data governance structures, contracts, and dispute resolution methods. Today, designing and establishing these mechanisms takes tremendous time, energy, and expertise. These high transaction costs result from the lack of common templates and the need to design governance structures from scratch each time…

5. Build Bridges Across Cultures

As C.P. Snow famously described in his lecture on “Two Cultures and the Scientific Revolution,” we must bridge the “two cultures” of science and humanism if we are to solve the world’s problems….

To implement these five priorities we will need experimentation at the operational but also the institutional level. This involves the establishment of “data stewards” within organisations who can accelerate data for social good initiatives in a responsible manner, integrating the five priorities above….(More)”

The Landscape of Open Data Policies


Apograf: “Open Access (OA) publishing has a long history, going back to the early 1990s, and was born with the explicit intention of improving access to scholarly literature. The internet has played a pivotal role in garnering support for free and reusable research publications, as well as stronger and more democratic peer-review systems: ones that are not bogged down by the restrictions of influential publishing platforms….

Looking back, looking forward

Launched in 1991, arXiv.org was a pioneering platform in this regard, a telling example of how researchers could cooperate to publish academic papers for free and in full view of the public. Though it has limitations (papers are curated by moderators and are not peer-reviewed), arXiv is a demonstration of how technology can be used to overcome some of the incentive and distribution problems that scientific research had long been subjected to.

The scientific community has itself assumed the mantle to this end: the Budapest Open Access Initiative (BOAI) and the Berlin Declaration on Open Access Initiative, launched in 2002 and 2003 respectively, are considered landmark movements in the push for unrestricted access to scientific research. While mostly symbolic, these efforts highlighted the growing desire to solve the problems plaguing the space through technology.

The BOAI manifesto begins with a statement that encapsulates the movement’s purpose:

“An old tradition and a new technology have converged to make possible an unprecedented public good. The old tradition is the willingness of scientists and scholars to publish the fruits of their research in scholarly journals without payment, for the sake of inquiry and knowledge. The new technology is the internet. The public good they make possible is the world-wide electronic distribution of the peer-reviewed journal literature and completely free and unrestricted access to it by all scientists, scholars, teachers, students, and other curious minds.”

Plan S is a more recent attempt to make publicly funded research available to all. Launched by Science Europe in September 2018, Plan S (short for ‘Shock’) has energized the research community with its resolution to make access to publicly funded knowledge a right for everyone and to dissolve the profit-driven ecosystem of research publication. Members of the European Union have vowed to achieve this by 2020.

Plan S has been supported by governments outside Europe as well. China has thrown itself behind it, and the state of California has enacted a law that requires open access to research one year after publishing. It is, of course, not without its challenges: advocacy and ensuring that publishing is not restricted to a few venues are two such obstacles. However, the organization behind forming the guidelines, cOAlition S, has agreed to make the guidelines more flexible.

The emergence of this trend is not without its difficulties, however, and numerous obstacles continue to hinder the dissemination of information in a manner that is truly transparent and public. Chief among these are the many gates that continue to keep research as something of an exclusive property, in addition to the fact that the infrastructure and development for such systems are short on funding and staff…..(More)”.