Preprints: The What, The Why, The How.


Center for Open Science: “The use of preprint servers by scholarly communities is definitely on the rise. Many developments in the past year indicate that preprints will be a huge part of the research landscape. Developments with DOIs, changes in funder expectations, and the launch of many new services indicate that preprints will become much more pervasive and reach beyond the communities where they started.

From funding agencies that want to realize impact from their efforts sooner to researchers’ desire to disseminate their research more quickly, the growth of these servers, and of the number of works being shared, has been substantial. At COS, we already host twenty different organizations’ services via the OSF Preprints platform.

So what’s a preprint and what is it good for? A preprint is a manuscript submitted to a dedicated repository (like OSF Preprints, PeerJ, bioRxiv or arXiv) prior to peer review and formal publication. Some of those repositories may also accept other types of research outputs, like working papers, posters or conference proceedings. Getting a preprint out there has a variety of benefits for authors and other stakeholders in the research:

  • They increase the visibility of research, and sooner. While traditional papers can languish in the peer review process for months, even years, a preprint is live the minute it is submitted and moderated (if the service moderates). This means your work gets indexed by Google Scholar and Altmetric, and discovered by more relevant readers than ever before.
  • You can get feedback on your work and make improvements prior to journal submission. Many authors have publicly commented about the recommendations for improvements they’ve received on their preprint that strengthened their work and even led to finding new collaborators.
  • Papers with an accompanying preprint get cited 30% more often than papers without. This research from PeerJ sums it up; that’s a big benefit for scholars looking to get more visibility and impact from their efforts.
  • Preprints get a permanent DOI, which makes them part of the freely accessible scientific record forever. This means others can rely on that permanence when citing your work in their research. It also means that your idea, developed by you, has a “stake in the ground” where potential scooping and intellectual theft are concerned.

So, preprints can really help lubricate scientific progress. But there are some things to keep in mind before you post. Usually, you can’t post a preprint of an article that’s already been submitted to a journal for peer review. Policies among journals vary widely, so it’s important to check with the journal you’re interested in sending your paper to BEFORE you submit a preprint that might later be published. A good resource for doing this is JISC’s SHERPA/RoMEO database. It’s also a good idea to understand the licensing choices available. At OSF Preprints, we recommend the CC-BY license suite, but you can check choosealicense.com or https://osf.io/6uupa/ for good overviews on how best to license your submissions….(More)”.

Data Protection and e-Privacy: From Spam and Cookies to Big Data, Machine Learning and Profiling


Chapter by Lilian Edwards in L Edwards (ed), Law, Policy and the Internet (Hart, 2018): “In this chapter, I examine in detail how data subjects are tracked, profiled and targeted by their activities online and, increasingly, in the “offline” world as well. Tracking is part of both commercial and state surveillance, but in this chapter I concentrate on the former. The European law relating to spam, cookies, online behavioural advertising (OBA), machine learning (ML) and the Internet of Things (IoT) is examined in detail, using both the GDPR and the forthcoming draft ePrivacy Regulation. The chapter concludes by examining both code and law solutions which might find a way forward to protect user privacy and still enable innovation, by looking to paradigms not based around consent, and less likely to rely on a “transparency fallacy”. Particular attention is drawn to the new work around Personal Data Containers (PDCs) and distributed ML analytics….(More)”.

Civic Tech: Making Technology Work for People


Book by Andrew Schrock: “The term “Civic Tech” has gained international recognition as a way to unite communities and government through technology design. But what does it mean for our shared future? In this book, Andrew Schrock cuts through the hype by telling stories of the people and ideas driving the movement. He argues that Civic Tech emerged in response to inequality and persistent social problems. The collaborative approaches and early successes of “techies” may not be easy solutions, but they exemplify a powerful political alternative. Civic Tech draws our attention to the challenges of public ownership and democratizing technology design—vital goals for the years ahead….(More)”.

Activating Agency or Nudging?


Article by Michael Walton: “Two ideas in development – activating agency of citizens and using “nudges” to change their behavior – seem diametrically opposed in spirit: activating latent agency at the ground level versus top-down designs that exploit people’s behavioral responses. Yet both start from a psychological focus and a belief that changes in people’s behavior can lead to “better” outcomes, for the individuals involved and for society. So how should we think of these contrasting sets of ideas? When should each approach be used?…

Let’s compare the two approaches with respect to diagnostic frame, practice and ethics.

Diagnostic frame.  

The common ground is recognition that people use short-cuts for decision-making, in ways that can hurt their own interests. In both approaches, there is an emphasis that decision-making is particularly tough for poor people, given the sheer weight of daily problem-solving. In behavioral economics, one core idea is that we have limited mental “bandwidth” and that this form of scarcity hampers decision-making. The “agency” tradition, by contrast, puts much more emphasis on unearthing and working with the origins of the prevailing mental models: social exclusion, stigmatization, and the typically unequal economic and cultural relations with more powerful groups in a society. One approach works more with symptoms, the other with root causes.

Implications for practice.  

The two approaches on display in Cerrito both concern social gains, and both involve a role for an external actor. But here the contrast is sharp. In the “nudge” approach the external actor is a beneficent technocrat, trying out alternative offers to poor (or non-poor) people to improve outcomes. A vivid example is alternative messages to taxpayers in Guatemala that induce varying improvements in tax payments. In the “agency” approach the essence of the interaction is between a front-line worker and an individual or family, with a co-created diagnosis and plan, designed around goals and specific actions that the poor person chooses. This is akin to what anthropologist Arjun Appadurai termed increasing the “capacity to aspire,” and can extend to greater engagement in civic and political life.

Ethics.

In both approaches, ethics is central. As the framing of “nudging for social good as opposed to electoral gain” implies, some form of ethical regulation is surely needed. In “action to activate agency,” the central ethical issue is maintaining equality in design between activist and citizen, and explicit ownership of any decisions.

What does this imply?

To some degree this is a question of domain of action.  Nudging is most appropriate in a program for which there is a fully supported political and social program, and the issue is how to make it work (as in paying taxes).  The agency approach has a broader ambition, but starts from domains that are potentially within an individual’s control once the sources of “ineffective” or inhibited behavior are tackled, including via front-line interactions with public or private actors….(More)”.

Data Ethics Framework


Introduction by Matt Hancock MP, Secretary of State for Digital, Culture, Media and Sport to the UK’s Data Ethics Framework: “Making better use of data offers huge benefits, in helping us provide the best possible services to the people we serve.

However, all new opportunities present new challenges. The pace of technology is changing so fast that we need to make sure we are constantly adapting our codes and standards. Those of us in the public sector need to lead the way.

As we set out to develop our National Data Strategy, getting the ethics right, particularly in the delivery of public services, is critical. To do this, it is essential that we agree collective standards and ethical frameworks.

Ethics and innovation are not mutually exclusive. Thinking carefully about how we use our data can help us be better at innovating when we use it.

Our new Data Ethics Framework sets out clear principles for how data should be used in the public sector. It will help us maximise the value of data whilst also setting the highest standards for transparency and accountability when building or buying new data technology.

We have come a long way since we published the first version of the Data Science Ethical Framework. This new version focuses on the need for technology, policy and operational specialists to work together, so we can make the most of expertise from across disciplines.

We want to work with others to develop transparent standards for using new technology in the public sector, promoting innovation in a safe and ethical way.

This framework will build the confidence in public sector data use needed to underpin a strong digital economy. I am looking forward to working with all of you to put it into practice…. (More)”

The Data Ethics Framework principles

1. Start with clear user need and public benefit

2. Be aware of relevant legislation and codes of practice

3. Use data that is proportionate to the user need

4. Understand the limitations of the data

5. Ensure robust practices and work within your skillset

6. Make your work transparent and be accountable

7. Embed data use responsibly

The Data Ethics Workbook

I want your (anonymized) social media data


Anthony Sanford at The Conversation: “Social media sites’ responses to the Facebook-Cambridge Analytica scandal and new European privacy regulations have given users much more control over who can access their data, and for what purposes. To me, as a social media user, these are positive developments: It’s scary to think what these platforms could do with the troves of data available about me. But as a researcher, increased restrictions on data sharing worry me.

I am among the many scholars who depend on data from social media to gain insights into people’s actions. In a rush to protect individuals’ privacy, I worry that an unintended casualty could be knowledge about human nature. My most recent work, for example, analyzes feelings people express on Twitter to explain why the stock market fluctuates so much over the course of a single day. There are applications well beyond finance. Other scholars have studied mass transit rider satisfaction, emergency alert systems’ function during natural disasters and how online interactions influence people’s desire to lead healthy lifestyles.
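To make the flavor of that kind of analysis concrete, here is a minimal sketch in Python: score tweets against a tiny hand-made sentiment lexicon, aggregate by hour, and correlate the result with hourly stock returns. The lexicon, column names and inputs are hypothetical placeholders, not the author’s actual method or data.

```python
# Illustrative sketch only: lexicon, column names and inputs are hypothetical.
import pandas as pd

POSITIVE = {"gain", "bullish", "optimistic", "rally", "strong"}
NEGATIVE = {"loss", "bearish", "worried", "crash", "weak"}

def tweet_sentiment(text: str) -> int:
    """Crude lexicon score: +1 per positive word, -1 per negative word."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def sentiment_return_correlation(tweets: pd.DataFrame, prices: pd.DataFrame) -> float:
    """Correlate mean hourly tweet sentiment with hourly stock returns.

    Assumes `tweets` has columns ['timestamp', 'text'] and `prices` has
    columns ['timestamp', 'close'], both with datetime timestamps.
    """
    scores = tweets.assign(score=tweets["text"].map(tweet_sentiment))
    hourly_sentiment = scores.set_index("timestamp")["score"].resample("1h").mean()
    hourly_returns = prices.set_index("timestamp")["close"].resample("1h").last().pct_change()
    combined = pd.concat(
        [hourly_sentiment, hourly_returns], axis=1, keys=["sentiment", "return"]
    ).dropna()
    return combined["sentiment"].corr(combined["return"])
```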

This poses a dilemma – not just for me personally, but for society as a whole. Most people don’t want social media platforms to share or sell their personal information, unless specifically authorized by the individual user. But as members of a collective society, it’s useful to understand the social forces at work influencing everyday life and long-term trends. Before the recent crises, Facebook and other companies had already been making it hard for legitimate researchers to use their data, including by making it more difficult and more expensive to download and access data for analysis. The renewed public pressure for privacy means it’s likely to get even tougher….

It’s true – and concerning – that some presumably unethical people have tried to use social media data for their own benefit. But the data are not the actual problem, and cutting researchers’ access to data is not the solution. Doing so would also deprive society of the benefits of social media analysis.

Fortunately, there is a way to resolve this dilemma. Anonymization of data can keep people’s individual privacy intact, while giving researchers access to collective data that can yield important insights.

There’s even a strong model for how to strike that balance efficiently: the U.S. Census Bureau. For decades, that government agency has collected extremely personal data from households all across the country: ages, employment status, income levels, Social Security numbers and political affiliations. The results it publishes are very rich, but also not traceable to any individual.

It often is technically possible to reverse anonymity protections on data, using multiple pieces of anonymized information to identify the person they all relate to. The Census Bureau takes steps to prevent this.

For instance, when members of the public access census data, the Census Bureau restricts information that is likely to identify specific individuals, such as reporting there is just one person in a community with a particularly high- or low-income level.
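A minimal sketch of what such a disclosure rule can look like in code, assuming a simple minimum-cell-size threshold; the threshold and column names are assumptions for illustration, not the Bureau’s actual disclosure-avoidance procedures.

```python
# Toy minimum-cell-size suppression: publish a group statistic only when
# enough people fall in the group. Threshold and column names are assumed.
import pandas as pd

MIN_CELL_SIZE = 5  # assumed disclosure threshold

def suppress_small_cells(records: pd.DataFrame, group_col: str, value_col: str) -> pd.DataFrame:
    """Report a group mean only for groups with at least MIN_CELL_SIZE members.

    A community containing a single unusually high- or low-income person
    falls below the threshold, so its value is withheld.
    """
    summary = records.groupby(group_col)[value_col].agg(["mean", "count"])
    summary.loc[summary["count"] < MIN_CELL_SIZE, "mean"] = None  # suppress small cells
    return summary.rename(columns={"mean": f"published_{value_col}", "count": "n"})
```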

For researchers the process is somewhat different, but provides significant protections both in law and in practice. Scholars have to pass the Census Bureau’s vetting process to make sure they are legitimate, and must undergo training about what they can and cannot do with the data. The penalties for violating the rules include not only being barred from using census data in the future, but also civil fines and even criminal prosecution.

Even then, what researchers get comes without a name or Social Security number. Instead, the Census Bureau uses what it calls “protected identification keys”: random numbers that replace the data that would allow researchers to identify individuals.

Each person’s data is labeled with his or her own identification key, allowing researchers to link information of different types. For instance, a researcher wanting to track how long it takes people to complete a college degree could follow individuals’ education levels over time, thanks to the identification keys.
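A toy sketch of that pseudonymous-key idea is below; the field names and key format are illustrative assumptions, not the Bureau’s actual system.

```python
# Toy pseudonymization: the custodian maps each real identifier to a random
# key once, and only the key ever reaches researchers. Field names and key
# format are assumptions for illustration.
import secrets

def assign_keys(real_ids):
    """Issue one random key per person; the lookup table stays with the custodian."""
    return {real_id: secrets.token_hex(8) for real_id in real_ids}

def pseudonymize(records, key_table, id_field="ssn"):
    """Replace the identifying field with the person's key before release."""
    return [
        {**{k: v for k, v in rec.items() if k != id_field}, "key": key_table[rec[id_field]]}
        for rec in records
    ]

# Because the same key recurs across releases, a researcher can follow one
# (anonymous) person over time without ever seeing a name or SSN.
keys = assign_keys(["111-11-1111", "222-22-2222"])
survey_2015 = pseudonymize([{"ssn": "111-11-1111", "education": "enrolled"}], keys)
survey_2019 = pseudonymize([{"ssn": "111-11-1111", "education": "degree completed"}], keys)
assert survey_2015[0]["key"] == survey_2019[0]["key"]  # records link; identity stays hidden
```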

Social media platforms could implement a similar anonymization process instead of increasing hurdles – and cost – to access their data…(More)”.

Six or Seven Things Social Media Can Do For Democracy


Ethan Zuckerman: “I am concerned that we’ve not had a robust conversation about what we want social media to do for us.

We know what social media does for platform companies like Facebook and Twitter: it generates enormous masses of user-generated content that can be monetized with advertising, and reams of behavioral data that make that advertising more valuable. Perhaps we have a sense for what social media does for us as individuals, connecting us to distant friends, helping us maintain a lightweight awareness of each other’s lives even when we are not co-present. Or perhaps it’s a machine for disappointment and envy, a window into lives better lived than our own. It’s likely that what social media does for us personally is a deeply idiosyncratic question, dependent on our own lives, psyches and decisions, better discussed with our therapists than spoken about in generalities.

I’m interested in what social media should do for us as citizens in a democracy. We talk about social media as a digital public sphere, invoking Habermas and coffeehouses frequented by the bourgeoisie. Before we ask whether the internet succeeds as a public sphere, we ought to ask whether that’s actually what we want it to be.

I take my lead here from journalism scholar Michael Schudson, who took issue with a hyperbolic statement made by media critic James Carey: “journalism as a practice is unthinkable except in the context of democracy; in fact, journalism is usefully understood as another name for democracy.” For Schudson, this was a step too far. Journalism may be necessary for democracy to function well, but journalism by itself is not democracy and cannot produce democracy. Instead, we should work to understand the “Six or Seven Things News Can Do for Democracy”, the title of an incisive essay Schudson wrote to anchor his book, Why Democracies Need an Unlovable Press….

In this same spirit, I’d like to suggest six or seven things social media can do for democracy. I am neither as learned nor as wise as Schudson, so I fully expect readers to offer half a dozen functions that I’ve missed. In the spirit of Schudson’s public forum and Benkler’s digital public sphere, I offer these in the hopes of starting, not ending, a conversation.

Social media can inform us…

Social media can amplify important voices and issues…

Social media can be a tool for connection and solidarity…

Social media can be a space for mobilization…

Social media can be a space for deliberation and debate…

Social media can be a tool for showing us a diversity of views and perspectives…

Social media can be a model for democratically governed spaces…(More)”.

Technology and satellite companies open up a world of data


Gabriel Popkin at Nature: “In the past few years, technology and satellite companies’ offerings to scientists have increased dramatically. Thousands of researchers now use high-resolution data from commercial satellites for their work. Thousands more use cloud-computing resources provided by big Internet companies to crunch data sets that would overwhelm most university computing clusters. Researchers use the new capabilities to track and visualize forest and coral-reef loss; monitor farm crops to boost yields; and predict glacier melt and disease outbreaks. Often, they are analysing much larger areas than has ever been possible — sometimes even encompassing the entire globe. Such studies are landing in leading journals and grabbing media attention.

Commercial data and cloud computing are not panaceas for all research questions. NASA and the European Space Agency carefully calibrate the spectral quality of their imagers and test them with particular types of scientific analysis in mind, whereas the aim of many commercial satellites is to take good-quality, high-resolution pictures for governments and private customers. And no company can compete with Landsat’s free, publicly available, 46-year archive of images of Earth’s surface. For commercial data, scientists must often request images of specific regions taken at specific times, and agree not to publish raw data. Some companies reserve cloud-computing assets for researchers with aligned interests such as artificial intelligence or geospatial-data analysis. And although companies publicly make some funding and other resources available for scientists, getting access to commercial data and resources often requires personal connections. Still, by choosing the right data sources and partners, scientists can explore new approaches to research problems.

Mapping poverty

Joshua Blumenstock, an information scientist at the University of California, Berkeley (UCB), is always on the hunt for data he can use to map wealth and poverty, especially in countries that do not conduct regular censuses. “If you’re trying to design policy or do anything to improve living conditions, you generally need data to figure out where to go, to figure out who to help, even to figure out if the things you’re doing are making a difference.”

In a 2015 study, he used records from mobile-phone companies to map Rwanda’s wealth distribution (J. Blumenstock et al. Science 350, 1073–1076; 2015). But to track wealth distribution worldwide, patching together data-sharing agreements with hundreds of these companies would have been impractical. Another potential information source — high-resolution commercial satellite imagery — could have cost him upwards of US$10,000 for data from just one country….

Use of commercial images can also be restricted. Scientists are free to share or publish most government data or data they have collected themselves. But they are typically limited to publishing only the results of studies of commercial data, and at most a limited number of illustrative images.

Many researchers are moving towards a hybrid approach, combining public and commercial data, and running analyses locally or in the cloud, depending on need. Weiss still uses his tried-and-tested ArcGIS software from Esri for studies of small regions, and jumps to Earth Engine for global analyses.
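For readers unfamiliar with what jumping to a cloud platform for a global-scale analysis looks like, here is a hedged sketch using the Earth Engine Python API: build a seasonal Landsat composite and summarise vegetation greenness over a large region. The dataset ID, band names and region are illustrative assumptions, and an authenticated Earth Engine account is required.

```python
# Hedged sketch of a cloud-based analysis with the Earth Engine Python API.
# Dataset ID, bands and region are illustrative assumptions.
import ee

ee.Initialize()  # requires prior authentication with an Earth Engine account

region = ee.Geometry.Rectangle([-10.0, 35.0, 30.0, 60.0])  # rough bounding box over Europe

composite = (
    ee.ImageCollection("LANDSAT/LC08/C02/T1_TOA")  # assumed Landsat 8 TOA collection
    .filterDate("2017-05-01", "2017-09-30")
    .filterBounds(region)
    .median()
)

# NDVI from the near-infrared (B5) and red (B4) bands, averaged over the region.
ndvi = composite.normalizedDifference(["B5", "B4"]).rename("ndvi")
mean_ndvi = ndvi.reduceRegion(
    reducer=ee.Reducer.mean(), geometry=region, scale=1000, maxPixels=1e9
)
print(mean_ndvi.getInfo())  # the heavy computation runs on Google's servers, not locally
```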

The new offerings herald a shift from an era when scientists had to spend much of their time gathering and preparing data to one in which they’re thinking about how to use them. “Data isn’t an issue any more,” says Roy. “The next generation is going to be about what kinds of questions are we going to be able to ask?”…(More)”.

A Rule of Persons, Not Machines: The Limits of Legal Automation


Paper by Frank A. Pasquale: “For many legal futurists, attorneys’ work is a prime target for automation. They view the legal practice of most businesses as algorithmic: data (such as facts) are transformed into outputs (agreements or litigation stances) via application of set rules. These technophiles promote substituting computer code for contracts and descriptions of facts now written by humans. They point to early successes in legal automation as proof of concept. TurboTax has helped millions of Americans file taxes, and algorithms have taken over certain aspects of stock trading. Corporate efforts to “formalize legal code” may bring new efficiencies in areas of practice characterized by both legal and factual clarity.

However, legal automation can also elide or exclude important human values, necessary improvisations, and irreducibly deliberative governance. Due process, appeals, and narratively intelligible explanation from persons, for persons, depend on forms of communication that are not reducible to software. Language is constitutive of these aspects of law. To preserve accountability and a humane legal order, these reasons must be expressed in language by a responsible person. This basic requirement for legitimacy limits legal automation in several contexts, including corporate compliance, property recordation, and contracting. A robust and ethical legal profession respects the flexibility and subtlety of legal language as a prerequisite for a just and accountable social order. It ensures a rule of persons, not machines…(More)”

Algorithm Observatory: Where anyone can study any social computing algorithm.


About: “We know that social computing algorithms are used to categorize us, but the way they do so is not always transparent. To take just one example, ProPublica recently uncovered that Facebook allows housing advertisers to exclude users by race.

Even so, there are no simple and accessible resources for us, the public, to study algorithms empirically, and to engage critically with the technologies that are shaping our daily lives in such profound ways.

That is why we created Algorithm Observatory.

Part media literacy project and part citizen experiment, the goal of Algorithm Observatory is to provide a collaborative online lab for the study of social computing algorithms. The data collected through this site is analyzed to compare how a particular algorithm handles data differently depending on the characteristics of users.
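A minimal sketch of that kind of observational comparison is below: for each group of participants, compute the share who were shown a given ad. The field names and data are hypothetical, not Algorithm Observatory’s actual pipeline.

```python
# Toy observational audit: compare how often an outcome (e.g. seeing a housing
# ad) occurs across groups of volunteer participants. Field names are assumed.
from collections import defaultdict

def exposure_rates(observations, group_field="group", outcome_field="saw_ad"):
    """Share of participants in each group for whom the outcome occurred."""
    shown = defaultdict(int)
    total = defaultdict(int)
    for obs in observations:
        total[obs[group_field]] += 1
        shown[obs[group_field]] += bool(obs[outcome_field])
    return {g: shown[g] / total[g] for g in total}

observations = [
    {"group": "A", "saw_ad": True},
    {"group": "A", "saw_ad": True},
    {"group": "A", "saw_ad": False},
    {"group": "B", "saw_ad": False},
    {"group": "B", "saw_ad": True},
    {"group": "B", "saw_ad": False},
]
print(exposure_rates(observations))  # roughly {'A': 0.67, 'B': 0.33}: a gap an audit would flag
```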

Algorithm Observatory is a work in progress. This prototype only allows users to explore Facebook advertising algorithms, and the functionality is limited. We are currently looking for funding to realize the project’s full potential: to allow anyone to study any social computing algorithm….

Our future plans

This is a prototype, which only begins to showcase the things that Algorithm Observatory will be able to do in the future.

Eventually, the website will allow anyone to design an experiment involving a social computing algorithm. The platform will allow researchers to recruit volunteer participants, who will be able to contribute content to the site securely and anonymously. Researchers will then be able to conduct an analysis to compare how the algorithm handles users differently depending on individual characteristics. The results will be shared by publishing a report evaluating the social impact of the algorithm. All data and reports will become publicly available and open for comments and reviews. Researchers will be able to study any algorithm, because the site does not require direct access to the source code, but relies instead on empirical observation of the interaction between the algorithm and volunteer participants….(More)”.