What privacy preserving techniques make possible: for transport authorities


Blog by Georgina Bourke: “The Mayor of London listed cycling and walking as key population health indicators in the London Health Inequalities Strategy. The pandemic has only amplified the need for people to use cycling as a safer and healthier mode of transport. Yet as the majority of cyclists are white, Black communities are less likely to get the health benefits that cycling provides. Groups like Transport for London (TfL) should monitor how different communities cycle and who is excluded. Organisations like the London Office of Technology and Innovation (LOTI) could help boroughs procure privacy preserving technology to help their efforts.

But at the moment, it’s difficult for public organisations to access mobility data held by private companies. One reason is that mobility data is sensitive. Even if you remove identifiers like name and address, there’s still a risk you can reidentify someone by linking different data sets together. This means you could track how an individual moved around a city. I wrote more about the privacy risks with mobility data in a previous blog post. The industry’s awareness of privacy issues in using and sharing mobility data is rising. In the case of the Los Angeles Department of Transportation’s (LADOT) Mobility Data Specification, Uber is concerned about sharing anonymised data because of the privacy risk. Both organisations are now involved in a legal battle to see which has the rights to the data. This might have been avoided if Uber had applied privacy preserving techniques….

Privacy preserving techniques can help mobility providers share important insights with authorities without compromising people’s privacy.

Instead of requiring access to all customer trip data, authorities could ask specific questions like: where are the least popular places to cycle? If mobility providers apply techniques like randomised response, an individual’s identity is obscured by the noise added to the data. This means it’s highly unlikely that someone could be reidentified later on. And because this technique requires authorities to ask very specific questions – for randomised response to work, the answer has to be binary, i.e. Yes or No – authorities will also be practising data minimisation by default.

It’s easy to imagine transport authorities like TfL combining privacy preserved mobility data from multiple mobility providers to compare insights and measure service provision. They could cross reference the privacy preserved bike trip data with demographic data in the local area to learn how different communities cycle. The first step to addressing inequality is being able to measure it….(More)”.
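
The randomised response technique mentioned in the excerpt is simple enough to sketch. In the classic formulation, each answer is randomised before it leaves the respondent: with some probability the true answer is reported, otherwise a coin flip is reported instead, so no individual answer can be trusted on its own, yet the true population rate can still be recovered from the aggregate. A minimal illustration in Python – the coin-flip parameter and the example cycling question are our own assumptions, not a description of TfL’s or any provider’s actual pipeline:

```python
import random

def randomised_response(true_answer: bool, p_truth: float = 0.5) -> bool:
    """Report the true answer with probability p_truth; otherwise report a fair coin flip."""
    if random.random() < p_truth:
        return true_answer
    return random.random() < 0.5

def estimate_true_rate(responses, p_truth: float = 0.5) -> float:
    """Invert the noise: observed_rate = p_truth * true_rate + (1 - p_truth) * 0.5."""
    observed = sum(responses) / len(responses)
    return (observed - (1 - p_truth) * 0.5) / p_truth

# Toy example: 10,000 cyclists, 20% of whom would truthfully answer "Yes" to a
# binary question such as "Did you cycle through area X this week?"
true_answers = [random.random() < 0.20 for _ in range(10_000)]
reported = [randomised_response(a) for a in true_answers]
print(f"Observed 'Yes' rate: {sum(reported) / len(reported):.3f}")  # ~0.35
print(f"Estimated true rate: {estimate_true_rate(reported):.3f}")   # ~0.20
```

Because the estimate only works for binary questions, an authority has to pose narrow, specific queries – which is exactly the data-minimisation effect the post describes.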

The Truth Is Paywalled But The Lies Are Free


Essay by Nathan J. Robinson: “…This means that a lot of the most vital information will end up locked behind the paywall. And while I am not much of a New Yorker fan either, it’s concerning that the Hoover Institution will freely give you Richard Epstein’s infamous article downplaying the threat of coronavirus, but Isaac Chotiner’s interview demolishing Epstein requires a monthly subscription, meaning that the lie is more accessible than its refutation. Eric Levitz of New York is one of the best and most prolific left political commentators we have. But unless you’re a subscriber of New York, you won’t get to hear much of what he has to say each month.

Possibly even worse is the fact that so much academic writing is kept behind vastly more costly paywalls. A white supremacist on YouTube will tell you all about race and IQ but if you want to read a careful scholarly refutation, obtaining a legal PDF from the journal publisher would cost you $14.95, a price nobody in their right mind would pay for one article if they can’t get institutional access. (I recently gave up on trying to access a scholarly article because I could not find a way to get it for less than $39.95, though in that case the article was garbage rather than gold.) Academic publishing is a nightmarish patchwork, with lots of articles advertised at exorbitant fees on one site, and then for free on another, or accessible only through certain databases, which your university or public library may or may not have access to. (Libraries have to budget carefully because subscription prices are often nuts. A library subscription to the Journal of Coordination Chemistry, for instance, costs $11,367 annually.)

Of course, people can find their ways around paywalls. Sci-Hub is a completely illegal but extremely convenient means of obtaining academic research for free. (I am purely describing it, not advocating it.) You can find a free version of the article debunking race and IQ myths on ResearchGate, a site that has engaged in mass copyright infringement in order to make research accessible. Often, because journal publishers tightly control access to their copyrighted work in order to charge those exorbitant fees for PDFs, the versions of articles that you can get for free are drafts that have not yet gone through peer review, and have thus been subjected to less scrutiny. This means that the more reliable an article is, the less accessible it is. On the other hand, pseudo-scholarship is easy to find. Right-wing think tanks like the Cato Institute, the Foundation for Economic Education, the Hoover Institution, the Mackinac Center, the American Enterprise Institute, and the Heritage Foundation pump out slickly-produced policy documents on every subject under the sun. They are utterly untrustworthy—the conclusion is always going to be “let the free market handle the problem,” no matter what the problem or what the facts of the case. But it is often dressed up to look sober-minded and non-ideological.

It’s not easy or cheap to be an “independent researcher.” When I was writing my first book, Superpredator, I wanted to look through newspaper, magazine, and journal archives to find everything I could about Bill Clinton’s record on race. I was lucky I had a university affiliation, because this gave me access to databases like LexisNexis. If I hadn’t, the cost of finding out what I wanted to find out would likely have run into the thousands of dollars.  

A problem beyond cost, though, is convenience. I find that even when I am doing research through databases and my university library, it is often an absolute mess: the sites are clunky and constantly demanding login credentials. The amount of time wasted in figuring out how to obtain a piece of research material is a massive cost on top of the actual pricing. The federal court document database, PACER, for instance, charges 10 cents a page for access to records, which adds up quickly since legal research often involves looking through thousands of pages. They offer an exemption if you are a researcher or can’t afford it, but to get the exemption you have to fill out a three-page form and provide an explanation of both why you need each document and why you deserve the exemption. This is a waste of time that inhibits people’s productivity and limits their access to knowledge.

In fact, to see just how much human potential is being squandered by having knowledge dispensed by the “free market,” let us briefly picture what “totally democratic and accessible knowledge” would look like…(More)”.

How open data could tame Big Tech’s power and avoid a breakup


Patrick Leblond at The Conversation: “…Traditional antitrust approaches such as breaking up Big Tech firms and preventing potential competitor acquisitions are never-ending processes. Even if you break them up and block their ability to acquire other, smaller tech firms, Big Tech will start growing again because of network effects and their data advantage.

And how do we know when a tech firm is big enough to ensure competitive markets? What are the size or scope thresholds for breaking up firms or blocking mergers and acquisitions?

A small startup acquired for millions of dollars can be worth billions of dollars for a Big Tech acquirer once integrated in its ecosystem. A series of small acquisitions can result in a dominant position in one area of the digital economy. Knowing this, competition/antitrust authorities would potentially have to examine every tech transaction, however small.

Not only would this be administratively costly or burdensome on resources, but it would also be difficult for government officials to assess with some precision (and therefore legitimacy) the likely future economic impact of an acquisition in a rapidly evolving technological environment.

Open data access, level the playing field

Given that mass data collection is at the core of Big Tech’s power as gatekeepers to customers, a key solution is to open up data access for other firms so that they can compete better.

Anonymized data (to protect an individual’s privacy rights) about people’s behaviour, interests, views, etc., should be made available for free to anyone wanting to pursue a commercial or non-commercial endeavour. Data about a firm’s operations or performance would, however, remain private.

Using an analogy from the finance world, Big Tech firms act as insider traders. Stock market insiders often possess insider (or private) information about companies that the public does not have. Such individuals then have an incentive to profit by buying or selling shares in those companies before the public becomes aware of the information.

Big Tech’s incentives are no different from those of stock market insiders. They trade on exclusively available private information (data) to generate extraordinary profits.

Continuing the finance analogy, financial securities regulators forbid the use of inside or non-publicly available information for personal benefit. Individuals found to illegally use such information are punished with jail time and fines.

They also require companies to publicly report relevant information that affects or could significantly affect their performance. Finally, they oblige insiders to publicly report when they buy and sell shares in a company in which they have access to privileged information.

Transposing stock market insider trading regulation to Big Tech implies that data access and use should be monitored by an independent regulatory body — call it a Data Market Authority. Such a body would be responsible for setting and enforcing principles, rules and standards of behaviour among individuals and organizations in the data-driven economy.

For example, a Data Market Authority would require firms to publicly report how they acquire and use personal data. It would prohibit personal data hoarding by ensuring that data is easily portable from one platform, network or marketplace to another. It would also prohibit the buying and selling of personal data as well as protect individuals’ privacy by imposing penalties on firms and individuals in cases of non-compliance.

Data openly and freely available under a strict regulatory environment would likely be a better way to tame Big Tech’s power than breaking them up and having antitrust authorities approve every acquisition that they wish to make….(More)”.

A Time for More Democracy Not Less


Graham Smith at Involve: “As part of the “A democratic response to COVID-19” project, we have been scanning print and social media to get a sense of how arguments for participation and deliberation are resonating in public debates….

Researchers from the Institute of Development Studies point to learning from previous pandemics. Drawing from their experience of working on the Ebola epidemic in West Africa, they argue that pandemics are not just technical problems to be solved, but are social in character. They call for more deliberation and participation to ensure that decisions reflect not only the diversity of expert opinion, but also respond to the experiential knowledge of the most vulnerable….

A number of these proposals call for citizens’ assemblies, perhaps to the detriment of other participatory and deliberative processes. The Carnegie Trust offers a broader agenda, reminding us of the pressing contemporary significance of their pre-COVID-19 calls for co-design and co-production. 

The Nuffield Council offers some simple guidance to government about how to act:

  • Show us (the public) what it is doing and thinking across the range of issues of concern
  • Set out the ethical considerations that inform(ed) its judgements
  • Explain how it has arrived at decisions (including taking advice from e.g. SAGE, MEAG), and not that it is just ‘following the science’
  • Invite a broad range of perspectives into the room, including wider public representation 
  • Think ahead – consult and engage other civic interests

We have found only a small number of examples of specific initiatives taking a participatory or deliberative approach to bringing in a broader range of voices in response to the pandemic. Our Covid Voices is gathering written statements of the experience of COVID-19 from those with health conditions or disabilities. The thinktank Demos is running a ‘People’s Commission’, inviting stories of lockdown life. It is not only reflections or stories. The Scottish Government invited ideas on how to tackle the virus, receiving and synthesising 4,000 suggestions. The West Midlands Combined Authority has established a citizens’ panel to guide its recovery work. The UK Citizens’ Assembly (and the French Convention) produced recommendations on how commitments to reach net zero carbon emissions need to be central to a post-COVID-19 recovery. We are sure that these examples only touch the surface of activity and that there will be many more initiatives that we are yet to hear about.

Of course, in one area, citizens have already taken matters into their own hands, with the huge growth in mutual-aid groups to ensure people’s emergency needs are met. The New Local Government Network has considered how public authorities could best support and work with such groups, and Danny Kruger MP was invited by the Prime Minister to investigate how to build on this community-level response.

The call for a more participatory and deliberative approach to governance needs to be more than a niche concern. As the Financial Times recognises, we need a “new civic contract” between government and the people….(More)”.

Cities, crowding, and the coronavirus: Predicting contagion risk hotspots


Paper by Gaurav Bhardwaj et al: “Today, over 4 billion people around the world—more than half the global population—live in cities. By 2050, with the urban population more than doubling its current size, nearly 7 of 10 people in the world will live in cities. Evidence from today’s developed countries and rapidly emerging economies shows that urbanization and the development of cities is a source of dynamism that can lead to enhanced productivity. In fact, no country in the industrial age has ever achieved significant economic growth without urbanization.

The underlying driver of this dynamism is the ability of cities to bring people together. Social and economic interactions are the hallmark of city life, making people more productive and often creating a vibrant market for innovations by entrepreneurs and investors. International evidence suggests that the elasticity of income per capita with respect to city population is between 3% and 8% (Rosenthal & Strange 2003). Each doubling of city size raises its productivity by 5%.

But the coronavirus pandemic is now seriously limiting social interactions. With no vaccine available, prevention through containment and social distancing, along with frequent handwashing, appear to be, for now, the only viable strategies against the virus. The goal is to slow transmission and avoid overwhelming health systems that have finite resources. Hence non-essential businesses have been closed and social distancing measures, including lockdowns, are being applied in many countries. Will such measures defeat the virus in dense urban areas? In principle, yes. Wealthier people in dense neighborhoods can isolate themselves while having amenities and groceries delivered to them. Many can connect remotely to work, and some can even afford to live without working for a time. But poorer residents of crowded neighborhoods cannot afford such luxuries.

To help city leaders prioritize resources towards places with the highest exposure and contagion risk, we have developed a simple methodology that can be rapidly deployed. This methodology identifies hotspots for contagion and vulnerability, based on:
– The practical impossibility of keeping people apart, based on a combination of population density and livable floor space that does not allow for 2 meters of physical distancing.
– Conditions where, even under lockdown, people might have little option but to cluster (e.g., to access public toilets and water pumps)…(More)”.
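
The first criterion above – density and floor space that leave no room for 2 metres of separation – can be expressed as a simple per-neighbourhood threshold. A rough sketch of such a screen in Python, assuming gridded data on residents and livable floor area; the 4 m² per-person cut-off is our own illustrative stand-in for a 2-metre spacing requirement, not the authors’ exact parameter:

```python
from dataclasses import dataclass

# Illustrative assumption: roughly a 2 m x 2 m patch (4 square metres) of livable
# floor space per person is needed before 2 m of physical distancing is feasible.
MIN_AREA_PER_PERSON_M2 = 4.0

@dataclass
class Neighbourhood:
    name: str
    residents: int
    livable_floor_space_m2: float
    relies_on_shared_facilities: bool  # e.g. public toilets or water pumps

def is_contagion_hotspot(n: Neighbourhood) -> bool:
    """Flag areas where keeping people apart is practically impossible, or where
    even a lockdown forces residents to cluster around shared facilities."""
    area_per_person = n.livable_floor_space_m2 / max(n.residents, 1)
    return area_per_person < MIN_AREA_PER_PERSON_M2 or n.relies_on_shared_facilities

areas = [
    Neighbourhood("A", residents=12_000, livable_floor_space_m2=30_000, relies_on_shared_facilities=True),
    Neighbourhood("B", residents=2_000, livable_floor_space_m2=25_000, relies_on_shared_facilities=False),
]
for a in areas:
    print(a.name, "hotspot" if is_contagion_hotspot(a) else "lower risk")  # A: hotspot, B: lower risk
```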

Creating a digital commons


Report by the IPPR (UK): “There are, today, almost no parts of life that are untouched by the presence of data. Virtually every action we take produces some form of digital trail – our phones track our locations, our browsers track searches, our social network apps log our friends and family – even when we are only dimly aware of it.

It is the combination of this near-ubiquitous gathering of data with fast processing that has generated the economic and social transformation of the last few years – one that, if current developments in artificial intelligence (AI) continue, is only likely to accelerate. Combined with data-enabled technology, from the internet of things to 3D printing, we are potentially on the cusp of a radically different economy and society.

As the world emerges from the first phase of the pandemic, the demands for a socially just and sustainable recovery have grown. The data economy can and should be an essential part of that reconstruction, from the efficient management of energy systems to providing greater flexibility in working time. However, without effective public policy, and democratic oversight and management, the danger is that the tendencies in the data economy that we have already seen towards monopoly and opacity – reinforced, so far, by the crisis – will continue to dominate. It is essential, then, that planning for a fairer, more sustainable economy in the future build in active public policy for data…

This report focusses closely on data as the fundamental building block of the emerging economy, and argues that its use, management, ownership, and control are critical to shaping the future…(More)”.

Public perceptions on data sharing: key insights from the UK and the USA


Paper by Saira Ghafur, Jackie Van Dael, Melanie Leis, Ara Darzi, and Aziz Sheikh: “Data science and artificial intelligence (AI) have the potential to transform the delivery of health care. Health care as a sector, with all of the longitudinal data it holds on patients across their lifetimes, is positioned to take advantage of what data science and AI have to offer. The current COVID-19 pandemic has shown the benefits of sharing data globally to permit a data-driven response through rapid data collection, analysis, modelling, and timely reporting.

Despite its obvious advantages, data sharing is a controversial subject, with researchers and members of the public justifiably concerned about how and why health data are shared. The most common concern is privacy; even when data are (pseudo-)anonymised, there remains a risk that a malicious hacker could, using only a few datapoints, re-identify individuals. For many, it is often unclear whether the risks of data sharing outweigh the benefits.

A series of surveys over recent years indicate that the public holds a range of views about data sharing. Over the past few years, there have been several important data breaches and cyberattacks. This has resulted in patients and the public questioning the safety of their data, including the prospect or risk of their health data being shared with unauthorised third parties.

We surveyed people across the UK and the USA to examine public attitudes towards data sharing, data access, and the use of AI in health care. These two countries were chosen as comparators as both are high-income countries that have had substantial national investments in health information technology (IT) with established track records of using data to support health-care planning, delivery, and research. The UK and USA, however, have sharply contrasting models of health-care delivery, making it interesting to observe whether these differences affect public attitudes.

Willingness to share anonymised personal health information varied across receiving bodies (figure). The more commercial the purpose of the receiving institution (eg, for an insurance or tech company), the less often respondents were willing to share their anonymised personal health information in both the UK and the USA. Older respondents (≥35 years) in both countries were generally less likely to trust any organisation with their anonymised personal health information than younger respondents (<35 years)…

Despite the benefits of big data and technology in health care, our findings suggest that the rapid development of novel technologies has been received with concern. Growing commodification of patient data has increased awareness of the risks involved in data sharing. There is a need for public standards that secure regulation and transparency of data use and sharing and support patient understanding of how data are used and for what purposes….(More)”.
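
The re-identification risk the authors raise – that a few datapoints can single a person out even after names are removed – is easy to demonstrate. A minimal sketch with a made-up table of quasi-identifiers (postcode district, birth year, sex); the records and column names are our own illustrative assumptions, not data from the survey:

```python
from collections import Counter

# A toy "anonymised" dataset: names stripped, but quasi-identifiers retained.
records = [
    {"postcode": "SW1A", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "SW1A", "birth_year": 1984, "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "E2",   "birth_year": 1990, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "E2",   "birth_year": 1990, "sex": "F", "diagnosis": "hypertension"},
]

def uniquely_identifiable(rows, keys):
    """Return the records whose combination of quasi-identifiers is unique in the dataset.
    Anyone who already knows those few facts about a person can re-identify such a record."""
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    return [r for r in rows if counts[tuple(r[k] for k in keys)] == 1]

unique = uniquely_identifiable(records, ["postcode", "birth_year", "sex"])
print(f"{len(unique)} of {len(records)} records are unique on just three attributes")
```

In real datasets with many more attributes, the share of unique combinations grows quickly, which is why removing direct identifiers alone is rarely a sufficient safeguard.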

The Open Innovation in Science research field: a collaborative conceptualisation approach


Paper by Susanne Beck et al: “Openness and collaboration in scientific research are attracting increasing attention from scholars and practitioners alike. However, a common understanding of these phenomena is hindered by disciplinary boundaries and disconnected research streams. We link dispersed knowledge on Open Innovation, Open Science, and related concepts such as Responsible Research and Innovation by proposing a unifying Open Innovation in Science (OIS) Research Framework. This framework captures the antecedents, contingencies, and consequences of open and collaborative practices along the entire process of generating and disseminating scientific insights and translating them into innovation. Moreover, it elucidates individual-, team-, organisation-, field-, and society‐level factors shaping OIS practices. To conceptualise the framework, we employed a collaborative approach involving 47 scholars from multiple disciplines, highlighting both tensions and commonalities between existing approaches. The OIS Research Framework thus serves as a basis for future research, informs policy discussions, and provides guidance to scientists and practitioners….(More)”.

Calling Bullshit: The Art of Scepticism in a Data-Driven World


Book by Carl Bergstrom and Jevin West: “Politicians are unconstrained by facts. Science is conducted by press release. Higher education rewards bullshit over analytic thought. Startup culture elevates bullshit to high art. Advertisers wink conspiratorially and invite us to join them in seeing through all the bullshit — and take advantage of our lowered guard to bombard us with bullshit of the second order. The majority of administrative activity, whether in private business or the public sphere, seems to be little more than a sophisticated exercise in the combinatorial reassembly of bullshit.

We’re sick of it. It’s time to do something, and as educators, one constructive thing we know how to do is to teach people. So, the aim of this course is to help students navigate the bullshit-rich modern environment by identifying bullshit, seeing through it, and combating it with effective analysis and argument.

What do we mean, exactly, by bullshit and calling bullshit? As a first approximation:

Bullshit involves language, statistical figures, data graphics, and other forms of presentation intended to persuade by impressing and overwhelming a reader or listener, with a blatant disregard for truth and logical coherence.

Calling bullshit is a performative utterance, a speech act in which one publicly repudiates something objectionable. The scope of targets is broader than bullshit alone. You can call bullshit on bullshit, but you can also call bullshit on lies, treachery, trickery, or injustice.

In this course we will teach you how to spot the former and effectively perform the latter.

While bullshit may reach its apogee in the political domain, this is not a course on political bullshit. Instead, we will focus on bullshit that comes clad in the trappings of scholarly discourse. Traditionally, such highbrow nonsense has come couched in big words and fancy rhetoric, but more and more we see it presented instead in the guise of big data and fancy algorithms — and these quantitative, statistical, and computational forms of bullshit are those that we will be addressing in the present course.

Of course an advertisement is trying to sell you something, but do you know whether the TED talk you watched last night is also bullshit — and if so, can you explain why? Can you see the problem with the latest New York Times or Washington Post article fawning over some startup’s big data analytics? Can you tell when a clinical trial reported in the New England Journal or JAMA is trustworthy, and when it is just a veiled press release for some big pharma company?…(More)”.

Project Patient Voice


Press Release: “The U.S. Food and Drug Administration today launched Project Patient Voice, an initiative of the FDA’s Oncology Center of Excellence (OCE). Through a new website, Project Patient Voice creates a consistent source of publicly available information describing patient-reported symptoms from cancer trials for marketed treatments. While this patient-reported data has historically been analyzed by the FDA during the drug approval process, it is rarely included in product labeling and, therefore, is largely inaccessible to the public.

“Project Patient Voice has been initiated by the Oncology Center of Excellence to give patients and health care professionals unique information on symptomatic side effects to better inform their treatment choices,” said FDA Principal Deputy Commissioner Amy Abernethy, M.D., Ph.D. “The Project Patient Voice pilot is a significant step in advancing a patient-centered approach to oncology drug development. Where patient-reported symptom information is collected rigorously, this information should be readily available to patients.” 

Patient-reported outcome (PRO) data is collected using questionnaires that patients complete during clinical trials. These questionnaires are designed to capture important information about disease- or treatment-related symptoms. This includes how severe or how often a symptom or side effect occurs.

Patient-reported data can provide additional, complementary information for health care professionals to discuss with patients, specifically when discussing the potential side effects of a particular cancer treatment. In contrast to the clinician-reported safety data in product labeling, the data in Project Patient Voice is obtained directly from patients and can show symptoms before treatment starts and at multiple time points while receiving cancer treatment. 

The Project Patient Voice website will include a list of cancer clinical trials that have available patient-reported symptom data. Each trial will include a table of the patient-reported symptoms collected. Each patient-reported symptom can be selected to display a series of bar and pie charts describing the patient-reported symptom at baseline (before treatment starts) and over the first 6 months of treatment. This information provides insights into side effects not currently available in standard FDA safety tables, including existing symptoms before the start of treatment, symptoms over time, and the subset of patients who did not have a particular symptom prior to starting treatment….(More)”.