“Anonymous” Data Won’t Protect Your Identity


Sophie Bushwick at Scientific American: “The world produces roughly 2.5 quintillion bytes of digital data per day, adding to a sea of information that includes intimate details about many individuals’ health and habits. To protect privacy, data brokers must anonymize such records before sharing them with researchers and marketers. But a new study finds it is relatively easy to reidentify a person from a supposedly anonymized data set—even when that set is incomplete.

Massive data repositories can reveal trends that teach medical researchers about disease, demonstrate issues such as the effects of income inequality, coach artificial intelligence into humanlike behavior and, of course, aim advertising more efficiently. To shield people who—wittingly or not—contribute personal information to these digital storehouses, most brokers send their data through a process of deidentification. This procedure involves removing obvious markers, including names and social security numbers, and sometimes taking other precautions, such as introducing random “noise” data to the collection or replacing specific details with general ones (for example, swapping a birth date of “March 7, 1990” for “January–April 1990”). The brokers then release or sell a portion of this information.
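
To make those steps concrete, here is a minimal Python sketch of that kind of de-identification: dropping direct identifiers, generalizing an exact birth date into a four-month window, and adding random noise to a numeric attribute. The toy records and column names are illustrative assumptions, not taken from any broker's actual pipeline.

```python
import random
import pandas as pd

# Illustrative sketch of the de-identification steps described above.
# The toy data and column names are hypothetical.
records = pd.DataFrame([
    {"name": "A. Smith", "ssn": "123-45-6789", "birth_date": "1990-03-07",
     "zip": "10027", "weekly_spend": 212.50},
    {"name": "B. Jones", "ssn": "987-65-4321", "birth_date": "1985-11-21",
     "zip": "10013", "weekly_spend": 87.10},
])

# 1. Remove obvious direct identifiers.
deidentified = records.drop(columns=["name", "ssn"])

# 2. Generalize specific details (exact birth date -> four-month window).
def generalize_birth_date(date_str: str) -> str:
    month = int(date_str[5:7])
    start = ((month - 1) // 4) * 4 + 1          # 1, 5, or 9
    window = {1: "January–April", 5: "May–August", 9: "September–December"}
    return f"{window[start]} {date_str[:4]}"

deidentified["birth_date"] = deidentified["birth_date"].map(generalize_birth_date)

# 3. Add random noise to numeric attributes.
deidentified["weekly_spend"] = deidentified["weekly_spend"].map(
    lambda x: round(x + random.gauss(0, 10), 2)
)

print(deidentified)
```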

“Data anonymization is basically how, for the past 25 years, we’ve been using data for statistical purposes and research while preserving people’s privacy,” says Yves-Alexandre de Montjoye, an assistant professor of computational privacy at Imperial College London and co-author of the new study, published this week in Nature Communications. Many commonly used anonymization techniques, however, originated in the 1990s, before the Internet’s rapid development made it possible to collect such an enormous amount of detail about things such as an individual’s health, finances, and shopping and browsing habits. This discrepancy has made it relatively easy to connect an anonymous line of data to a specific person: if a private detective is searching for someone in New York City and knows the subject is male, is 30 to 35 years old and has diabetes, the sleuth would not be able to deduce the man’s name—but could likely do so quite easily if he or she also knows the target’s birthday, number of children, zip code, employer and car model….(More)”
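
The detective example works because each extra attribute shrinks the pool of matching records. The self-contained Python sketch below illustrates that logic on entirely synthetic data with hypothetical column names: a few quasi-identifiers leave many candidate rows, but a handful more can narrow a large table to one or a few.

```python
import numpy as np
import pandas as pd

# Hypothetical "anonymized" records: no names, only quasi-identifiers.
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    "sex": rng.choice(["M", "F"], n),
    "age_band": rng.choice(["25-29", "30-35", "36-40"], n),
    "has_diabetes": rng.choice([True, False], n, p=[0.1, 0.9]),
    "zip": rng.choice([f"100{i:02d}" for i in range(40)], n),
    "children": rng.integers(0, 5, n),
    "car_model": rng.choice(["Civic", "Corolla", "F-150", "Model 3", "None"], n),
})

def candidates(data: pd.DataFrame, **known) -> pd.DataFrame:
    """Return the rows consistent with everything the searcher already knows."""
    mask = pd.Series(True, index=data.index)
    for column, value in known.items():
        mask &= data[column] == value
    return data[mask]

# A few attributes still leave many candidates...
print(len(candidates(df, sex="M", age_band="30-35", has_diabetes=True)))

# ...but each extra attribute shrinks the pool, often to one row or none.
print(len(candidates(df, sex="M", age_band="30-35", has_diabetes=True,
                     zip="10002", children=3, car_model="Civic")))
```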

Battling Information Illiteracy


Article by Paul T. Jaeger and Natalie Greene Taylor on “How misinformation affects the future of policy…”: “California wildfires are being magnified and made so much worse by the bad environmental laws which aren’t allowing massive amounts of readily available water to be properly utilized. It is being diverted into the Pacific Ocean. Must also tree clear to stop fire from spreading!”

This tweet was a statement by a US president about a major event, suggesting changes to existing policies. It is also not true. Every element of the tweet—other than the existence of California, the Pacific Ocean, and wildfires—is false. And it was not a simple misunderstanding, because a tweet from Trump the next day reiterated these themes and blamed the state’s governor personally for holding back water to fight the fires.

So how does this pertain to information policy, since the tweet is about environmental policy issues? The answer is in the information. The use and misuse of information in governance and policymaking may be turning into the biggest information policy issue of all. And as technologies and methods of communication evolve, a large part of engaging with and advocating for information policy will consist of addressing the new challenges of teaching information literacy and behavior.

Misinformation literacy

The internet has made it easy for people to be information illiterate in new ways. Anyone can create information now—regardless of quality—and get it in front of a large number of people. The ability of social media to spread information as fast as possible, and to as many people as possible, challenges literacy, as does the ability to manipulate images, sounds, and video with ease….(More)”

The value of data in Canada: Experimental estimates


Statistics Canada: “As data and information take on a far more prominent role in Canada and, indeed, all over the world, data, databases and data science have become a staple of modern life. When the electricity goes out, Canadians are as much in search of their data feed as they are food and heat. Consumers are using more and more data that is embodied in the products they buy, whether those products are music, reading material, cars and other appliances, or a wide range of other goods and services. Manufacturers, merchants and other businesses depend increasingly on the collection, processing and analysis of data to make their production processes more efficient and to drive their marketing strategies.

The increasing use of and investment in all things data is driving economic growth, changing the employment landscape and reshaping how and from where we buy and sell goods. Yet the rapid rise in the use and importance of data is not well measured in the existing statistical system. Given the ‘lack of data on data’, Statistics Canada has initiated new research to produce a first set of estimates of the value of data, databases and data science. The development of these estimates benefited from collaboration with the Bureau of Economic Analysis in the United States and the Organisation for Economic Co-operation and Development.

In 2018, Canadian investment in data, databases and data science was estimated to be as high as $40 billion. This was greater than the annual investment in industrial machinery, transportation equipment, and research and development and represented approximately 12% of total non-residential investment in 2018….

Statistics Canada recently released a conceptual framework outlining how one might measure the economic value of data, databases and data science. Thanks to this new framework, the growing role of data in Canada can be measured through time. This framework is described in a paper that was released in The Daily on June 24, 2019 entitled “Measuring investments in data, databases and data science: Conceptual framework.” That paper describes the concept of an ‘information chain’ in which data are derived from everyday observations, databases are constructed from data, and data science creates new knowledge by analyzing the contents of databases….(More)”.

The Impact of Citizen Environmental Science in the United States


Paper by George Wyeth, Lee C. Paddock, Alison Parker, Robert L. Glicksman and Jecoliah Williams: “An increasingly sophisticated public, rapid changes in monitoring technology, the ability to process large volumes of data, and social media are increasing the capacity for members of the public and advocacy groups to gather, interpret, and exchange environmental data. This development has the potential to alter the government-centric approach to environmental governance; however, citizen science has had a mixed record in influencing government decisions and actions. This Article reviews the rapid changes that are going on in the field of citizen science and examines what makes citizen science initiatives impactful, as well as the barriers to greater impact. It reports on 10 case studies, and evaluates these to provide findings about the state of citizen science and recommendations on what might be done to increase its influence on environmental decisionmaking….(More)”.

The plan to mine the world’s research papers


Priyanka Pulla in Nature: “Carl Malamud is on a crusade to liberate information locked up behind paywalls — and his campaigns have scored many victories. He has spent decades publishing copyrighted legal documents, from building codes to court records, and then arguing that such texts represent public-domain law that ought to be available to any citizen online. Sometimes, he has won those arguments in court. Now, the 60-year-old American technologist is turning his sights on a new objective: freeing paywalled scientific literature. And he thinks he has a legal way to do it.

Over the past year, Malamud has — without asking publishers — teamed up with Indian researchers to build a gigantic store of text and images extracted from 73 million journal articles dating from 1847 up to the present day. The cache, which is still being created, will be kept on a 576-terabyte storage facility at Jawaharlal Nehru University (JNU) in New Delhi. “This is not every journal article ever written, but it’s a lot,” Malamud says. It’s comparable to the size of the core collection in the Web of Science database, for instance. Malamud and his JNU collaborator, bioinformatician Andrew Lynn, call their facility the JNU data depot.

No one will be allowed to read or download work from the repository, because that would breach publishers’ copyright. Instead, Malamud envisages, researchers could crawl over its text and data with computer software, scanning through the world’s scientific literature to pull out insights without actually reading the text.

The unprecedented project is generating much excitement because it could, for the first time, open up vast swathes of the paywalled literature for easy computerized analysis. Dozens of research groups already mine papers to build databases of genes and chemicals, map associations between proteins and diseases, and generate useful scientific hypotheses. But publishers control — and often limit — the speed and scope of such projects, which typically confine themselves to abstracts, not full text. Researchers in India, the United States and the United Kingdom are already making plans to use the JNU store instead. Malamud and Lynn have held workshops at Indian government laboratories and universities to explain the idea. “We bring in professors and explain what we are doing. They get all excited and they say, ‘Oh gosh, this is wonderful’,” says Malamud.
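
As a rough illustration of the kind of non-consumptive mining described here, the Python sketch below counts gene–chemical co-occurrences in article text without any human reading the papers. The term lists and sample sentences are hypothetical placeholders, not drawn from the JNU depot or any publisher's corpus.

```python
import itertools
import re
from collections import Counter

# Hypothetical term lists; a real project would use curated vocabularies.
GENES = {"BRCA1", "TP53", "EGFR"}
CHEMICALS = {"cisplatin", "tamoxifen", "gefitinib"}

def cooccurrences(text: str) -> Counter:
    """Count gene-chemical pairs that appear together in one article."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", text))
    genes = {t for t in tokens if t in GENES}
    chems = {t.lower() for t in tokens} & CHEMICALS
    return Counter(itertools.product(sorted(genes), sorted(chems)))

# Stand-in "articles"; in practice the software would crawl full text.
corpus = [
    "EGFR mutations predicted response to gefitinib in the trial cohort.",
    "TP53 status did not modify the effect of cisplatin-based therapy.",
]

totals = Counter()
for article in corpus:
    totals += cooccurrences(article)
print(totals.most_common())
```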

But the depot’s legal status isn’t yet clear. Malamud, who contacted several intellectual-property (IP) lawyers before starting work on the depot, hopes to avoid a lawsuit. “Our position is that what we are doing is perfectly legal,” he says. For the moment, he is proceeding with caution: the JNU data depot is air-gapped, meaning that no one can access it from the Internet. Users have to physically visit the facility, and only researchers who want to mine for non-commercial purposes are currently allowed in. Malamud says his team does plan to allow remote access in the future. “The hope is to do this slowly and deliberately. We are not throwing this open right away,” he says….(More)”.

Participatory Citizenship and Crisis in Contemporary Brazil


Book by Valesca Lima: “This book discusses the issues of citizen rights, governance and political crisis in Brazil. The project has a focus on “citizenship in times of crisis,” i.e., seeking to understand how citizenship rights have changed since the Brazilian political and economic crisis that started in 2014. Building on theories of citizenship and governance, the author examines policy-based evidence on the retractions of participatory rights, which are consequence of a stagnant economic scenario and the re-organization of conservative sectors. This work will appeal to scholarly audiences interested in citizenship, Brazilian politics, and Latin American policy and governance….(More)”.

Improving access to information and restoring the public’s faith in democracy through deliberative institutions


Katherine R. Knobloch at Democratic Audit: “Both scholars and citizens have begun to believe that democracy is in decline. Authoritarian power grabs, polarising rhetoric, and increasing inequality can all claim responsibility for democratic systems that feel broken. Democracy depends on a polity who believe that their engagement matters, but evidence suggests democratic institutions have become unresponsive to the will of the public. How can we restore faith in self-government when both research and personal experience tell us that the public is losing power, not gaining it?

Deliberative public engagement

Deliberative democracy offers one solution, and it’s slowly shifting how the public engages in political decision-making. In Oregon, the Citizens’ Initiative Review (CIR) asks a group of randomly selected voters to carefully study public issues and then make policy recommendations based on their collective experience and insight. In Ireland, Citizens’ Assemblies are being used to amend the country’s constitution to better reflect changing cultural norms. In communities across the world, Participatory Budgeting is giving the public control over local government spending. Far from squashing democratic power, these deliberative institutions bolster it. They exemplify a new wave in democratic government, one that aims to bring community members together across political and cultural divides to make decisions about how to govern themselves.

Though the contours of deliberative events vary, most share key characteristics. A diverse body of community members gather together to learn from experts and one another, think through the short- and long-term implications of different policy positions, and discuss how issues affect not only themselves but their wider communities. At the end of those conversations, they make decisions that are representative of the diversity of participants and their ideas and which have been tested through collective reasoning….(More)”.

Studying Crime and Place with the Crime Open Database


M. P. J. Ashby in Research Data Journal for the Humanities and Social Sciences: “The study of spatial and temporal crime patterns is important for both academic understanding of crime-generating processes and for policies aimed at reducing crime. However, studying crime and place is often made more difficult by restrictions on access to appropriate crime data. This means understanding of many spatio-temporal crime patterns is limited to data from a single geographic setting, and there are few attempts at replication. This article introduces the Crime Open Database (CODE), a database of 16 million offenses from 10 of the largest United States cities over 11 years and more than 60 offense types. Open crime data were obtained from each city, having been published in multiple incompatible formats. The data were processed to harmonize geographic co-ordinates, dates and times, offense categories and location types, as well as adding census and other geographic identifiers. The resulting database allows the wider study of spatio-temporal patterns of crime across multiple US cities, allowing greater understanding of variations in the relationships between crime and place across different settings, as well as facilitating replication of research….(More)”.
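
Because offense categories, timestamps and coordinates are harmonized, the same analysis code can run unchanged across all ten cities. The brief Python sketch below shows that idea; the file name and column names are assumptions for illustration, not the database's documented schema.

```python
import pandas as pd

# Load a harmonized extract of the database (hypothetical file and columns).
crimes = pd.read_csv(
    "crime_open_database_sample.csv",
    parse_dates=["date"],
)

# With harmonized categories and timestamps, one grouping covers every city:
# monthly counts of a single offense type, broken out by city.
burglary = crimes[crimes["offense_type"] == "residential burglary"]
monthly = (
    burglary
    .groupby(["city", burglary["date"].dt.to_period("M").rename("month")])
    .size()
    .rename("offenses")
    .reset_index()
)
print(monthly.head())
```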

Governing Smart Data in the Public Interest: Lessons from Ontario’s Smart Metering Entity


Paper by Teresa Scassa and Merlynda Vilain: “The collection of vast quantities of personal data from embedded sensors is increasingly an aspect of urban life. This type of data collection is a feature of so-called smart cities, and it raises important questions about data governance. This is particularly the case where the data may be made available for reuse by others and for a variety of purposes.

This paper focuses on the governance of data captured through “smart” technologies and uses Ontario’s smart metering program as a case study. Ontario rolled out mandatory smart metering for electrical consumption in the early 2000s largely to meet energy conservation goals. In doing so, it designed a centralized data governance system overseen by the Smart Metering Entity to manage smart meter data and to protect consumer privacy. As interest in access to the data grew among third parties, and as new potential applications for the data emerged, the regulator sought to develop a model for data sharing that would protect privacy in relation to these new uses and that would avoid uses that might harm the public interest…(More)”.

The Remarkable Unresponsiveness of College Students to Nudging And What We Can Learn from It


Paper by Philip Oreopoulos and Uros Petronijevic: “We present results from a five-year effort to design promising online and text-message interventions to improve college achievement through several distinct channels. From a sample of nearly 25,000 students across three different campuses, we find some improvement from coaching-based interventions on mental health and study time, but none of the interventions we evaluate significantly influences academic outcomes (even for those students more at risk of dropping out). We interpret the results with our survey data and a model of student effort. Students study about five to eight hours fewer each week than they plan to, though our interventions do not alter this tendency. The coaching interventions make some students realize that more effort is needed to attain good grades but, rather than working harder, they settle by adjusting grade expectations downwards. Our study-time impacts are not large enough to translate into significant academic benefits. More comprehensive but expensive programs appear more promising for helping college students outside the classroom….(More)”