Inside the ‘Wikipedia of Maps,’ Tensions Grow Over Corporate Influence


Corey Dickinson at Bloomberg: “What do Lyft, Facebook, the International Red Cross, the U.N., the government of Nepal and Pokémon Go have in common? They all use the same source of geospatial data: OpenStreetMap, a free, open-source online mapping service akin to Google Maps or Apple Maps. But unlike those corporate-owned mapping platforms, OSM is built on a network of mostly volunteer contributors. Researchers have described it as the ‘Wikipedia for maps.’

Since it launched in 2004, OpenStreetMap has become an essential part of the world’s technology infrastructure. Hundreds of millions of monthly users interact with services derived from its data, from ride-hailing apps to social media geotagging on Snapchat and Instagram to humanitarian relief operations in the wake of natural disasters.

But recently the map has been changing, due to the growing impact of private sector companies that rely on it. In a 2019 paper published in the ISPRS International Journal of Geo-Information, a cross-institutional team of researchers traced how Facebook, Apple, Microsoft and other companies have gained prominence as editors of the map. Their priorities, the researchers say, are driving significant changes in what gets mapped, compared with the past.

“OpenStreetMap’s data is crowdsourced, which has always made spectators to the project a bit wary about the quality of the data,” says Dipto Sarkar, a professor of geoscience at Carleton University in Ottawa, and one of the paper’s co-authors. “As the data becomes more valuable and is used for an ever-increasing list of projects, the integrity of the information has to be almost perfect. These companies need to make sure there’s a good map of the places they want to expand in, and nobody else is offering that, so they’ve decided to fill it in themselves.”…(More)”.

Collective bargaining on digital platforms and data stewardship


Paper by Astha Kapoor: “… there is a need to think of exploitation on platforms not only through the lens of labour rights but also that of data rights. In the current context, it is impossible to imagine well-being without more agency on the way data are collected, stored and used. It is imperative to envision structures through which worker communities and representatives can be more involved in determining their own data lives on platforms. There is a need to organize and mobilize workers on data rights.

One of the ways in which this can be done is through a mechanism of community data stewards who represent the needs and interests of workers to their platforms, negotiating and navigating data-based decisions on their behalf. This paper examines the need for data rights as a critical requirement for worker well-being in the platform economy and the ways in which it can be actualized. It argues, given that workers on platforms produce data through collective labour on and off the platform, that worker data are a community resource and should be governed by representatives of workers who can negotiate with platforms on the use of that data for workers and for the public interest. The paper analyses the opportunity for a community data steward mechanism that represents workers’ interests and intermediates on data issues, such as transparency and accountability, with offline support systems, and that gives voice to online action to address some of the injustices of the data economy. Thus, a data steward is a tool through which workers can better control their data (consent, privacy and rights) and organize online. Essentially, it is a way forward for workers to mobilize collective bargaining on data rights.

The paper covers the impact of the COVID-19 pandemic on workers’ rights and well-being. It explores the idea of community data rights in the platform economy and why collective bargaining on data is imperative for any kind of meaningful negotiation with technology companies. The role of a community data steward in reclaiming workers’ power in the platform economy is explained, concluding with policy recommendations for a community data steward structure in the Indian context….(More)”.

Public-Private Partnerships: Compound and Data Sharing in Drug Discovery and Development


Paper by Andrew M. Davis et al: “Collaborative efforts between public and private entities such as academic institutions, governments, and pharmaceutical companies form an integral part of scientific research, and notable instances of such initiatives have been created within the life science community. Several examples of alliances exist with the broad goal of collaborating toward scientific advancement and improved public welfare. Such collaborations can be essential in catalyzing ground-breaking areas of science within high-risk or global public health strategies that may have otherwise not progressed. A common term used to describe these alliances is public-private partnership (PPP). This review discusses different aspects of such partnerships in drug discovery/development and provides example applications as well as successful case studies. Specific areas that are covered include PPPs for sharing compounds at various phases of the drug discovery process—from compound collections for hit identification to sharing clinical candidates. Instances of PPPs to support better data integration and build better machine learning models are also discussed. The review also provides examples of PPPs that address the gap in knowledge or resources among involved parties and advance drug discovery, especially in disease areas with unmet medical and/or social needs, like neurological disorders, cancer, and neglected and rare diseases….(More)”.

Governance for Innovation and Privacy: The Promise of Data Trusts and Regulatory Sandboxes


Essay by Chantal Bernier: “Innovation feeds on data, both personal, identified data and de-identified data. To protect the data from increasing privacy risks, governance structures emerge to allow the use and sharing of data as necessary for innovation while addressing privacy risks. Two frameworks proposed to fulfill this purpose are data trusts and regulatory sandboxes.

The Government of Canada introduced the concept of “data trust” into the Canadian privacy law modernization discussion through Canada’s Digital Charter in Action: A Plan by Canadians, for Canadians, to “enable responsible innovation.” At a high level, a data trust may be defined, according to the Open Data Institute, as a legal structure that is appropriate to the data sharing it is meant to govern and that provides independent stewardship of data.

Bill C-11, known as the Digital Charter Implementation Act, 2020, and tabled on November 17, 2020, lays the groundwork for the possibility of creating data trusts for private organizations to disclose de-identified data to specific public institutions for “socially beneficial purposes.” In her recent article “Replacing Canada’s 20-Year-Old Data Protection Law,” Teresa Scassa provides a superb overview and analysis of the bill.

Another instrument for privacy protective innovation is referred to as the “regulatory sandbox.” The United Kingdom’s Information Commissioner’s Office (ICO) provides a regulatory sandbox service that encourages organizations to submit innovative initiatives without fear of enforcement action. From there, the ICO sandbox team provides advice related to privacy risks and how to embed privacy protection.

Both governance measures may hold the future of privacy and innovation, provided that we accept this equation: De-identified data may no longer be considered irrevocably anonymous and therefore should not be released unconditionally, but the risk of re-identification is so remote that the data may be released under a governance structure that mitigates the residual privacy risk.  

Innovation Needs Identified Personal Data and De-identified Data   

The role of data in innovation does not need to be explained. Innovation requires a full understanding of what is, to project toward what could be. The need for personal data, however, calls for far more than an explanation. Its use must be justified. Applications abound, and they may not be obvious to the layperson. Researchers and statisticians, however, underline the critical role of personal data with one word: reliability.

Processing data that can be traced, either through identifiers or through pseudonyms, allows superior machine learning, longitudinal studies and essential correlations, which provide, in turn, better data in which to ground innovation. Statistics Canada has developed a “Continuum of Microdata Access” to its databases on the premise that “researchers require access to microdata at the individual business, household or person level for research purposes. To preserve the privacy and confidentiality of respondents, and to encourage the use of microdata, Statistics Canada offers a wide range of options through a series of online channels, facilities and programs.”

Since the first national census in 1871, Canada has put data — derived from personal data collected through the census and surveys — to good use in the public and private sectors alike. Now, new privacy risks emerge, as the unprecedented volume of data collection and the power of analytics bring into question the notion that the de-identification of data — and therefore its anonymization — is irreversible.

And yet, data to inform innovation for the good of humanity cannot exclude data about humans. So, we must look to governance measures to release de-identified data for innovation in a privacy-protective manner. …(More)”.
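The essay’s central premise — that de-identified data carries a residual, if remote, re-identification risk — can be illustrated with a toy k-anonymity check, a standard heuristic for gauging how exposed “de-identified” records remain. This is a hedged sketch for illustration only: the records, field names and quasi-identifiers below are hypothetical and not drawn from the essay.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the k-anonymity level of a dataset: the size of the smallest
    group of records sharing the same quasi-identifier values. A low k means
    some individuals remain easy to re-identify despite de-identification."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# "De-identified" records: names removed, but age band and postcode remain.
records = [
    {"age_band": "30-39", "postcode": "K1A", "diagnosis": "flu"},
    {"age_band": "30-39", "postcode": "K1A", "diagnosis": "asthma"},
    {"age_band": "40-49", "postcode": "K2B", "diagnosis": "flu"},
]
print(k_anonymity(records, ["age_band", "postcode"]))  # → 1: one record is unique
```

A result of k = 1 means at least one person is uniquely identifiable from the remaining attributes alone — exactly the residual risk a governance structure would need to mitigate before release.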

Fostering trustworthy data sharing: Establishing data foundations in practice


Paper by Sophie Stalla-Bourdillon, Laura Carmichael and Alexsis Wintour: “Independent data stewardship remains a core component of good data governance practice. Yet, there is a need for more robust independent data stewardship models that are able to oversee data-driven, multi-party data sharing, usage and re-usage, which can better incorporate citizen representation, especially in relation to personal data. We propose that data foundations—inspired by Channel Islands’ foundations laws—provide a workable model for good data governance not only in the Channel Islands, but also elsewhere. A key advantage of this model—in addition to leveraging existing legislation and building on established precedent—is the statutory role of the guardian, which is a unique requirement in the Channel Islands and, when interpreted in a data governance model, provides the independent data steward. The principal purpose of this paper, therefore, is to demonstrate why data foundations are well suited to the needs of data sharing initiatives. We further examine how data foundations could be established in practice—and provide key design principles that should be used to guide the design and development of any data foundation….(More)”.

Tracking COVID-19 using online search


Paper by Vasileios Lampos et al: “Previous research has demonstrated that various properties of infectious diseases can be inferred from online search behaviour. In this work we use time series of online search query frequencies to gain insights about the prevalence of COVID-19 in multiple countries. We first develop unsupervised modelling techniques based on associated symptom categories identified by the United Kingdom’s National Health Service and Public Health England. We then attempt to minimise an expected bias in these signals caused by public interest—as opposed to infections—using the proportion of news media coverage devoted to COVID-19 as a proxy indicator. Our analysis indicates that models based on online searches precede the reported confirmed cases and deaths by 16.7 (10.2–23.2) and 22.1 (17.4–26.9) days, respectively. We also investigate transfer learning techniques for mapping supervised models from countries where the spread of the disease has progressed extensively to countries that are in earlier phases of their respective epidemic curves. Furthermore, we compare time series of online search activity against confirmed COVID-19 cases or deaths jointly across multiple countries, uncovering interesting querying patterns, including the finding that rarer symptoms are better predictors than common ones. Finally, we show that web searches improve the short-term forecasting accuracy of autoregressive models for COVID-19 deaths. Our work provides evidence that online search data can be used to develop complementary public health surveillance methods to help inform the COVID-19 response in conjunction with more established approaches….(More)”.
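The last finding — that web searches improve short-term autoregressive forecasts — can be sketched with a minimal model of the kind the paper describes: an autoregressive forecast of deaths augmented with a lagged search-frequency regressor. The synthetic data, lag choices, and plain least-squares fit below are illustrative assumptions, not the authors’ implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic series in which search interest leads deaths by one step
# (purely illustrative; real signals are noisier and news-biased).
search = rng.random(60)
deaths = 5 * np.roll(search, 1) + rng.normal(0, 0.1, 60)

def fit_ar_exog(y, x, p=3):
    """Fit y_t = sum_i a_i * y_{t-i} + b * x_{t-1} + c by ordinary least squares."""
    rows, targets = [], []
    for t in range(p, len(y)):
        rows.append(np.concatenate([y[t - p:t], [x[t - 1], 1.0]]))
        targets.append(y[t])
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return coef  # [a_1..a_p, b, c]

coef = fit_ar_exog(deaths, search)
print(coef[-2])  # weight on the lagged search signal; ≈ 5 on this synthetic data
```

If the search-signal weight is materially non-zero on held-out data, the exogenous signal is adding predictive value beyond the series’ own history — the paper’s transfer-learning and bias-correction steps address the harder problem of making that signal trustworthy across countries.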

Robot census: Gathering data to improve policymaking on new technologies


Essay by Robert Seamans: There is understandable excitement about the impact that new technologies like artificial intelligence (AI) and robotics will have on our economy. In our everyday lives, we already see the benefits of these technologies: when we use our smartphones to navigate from one location to another using the fastest available route or when a predictive typing algorithm helps us finish a sentence in our email. At the same time, there are concerns about possible negative effects of these new technologies on labor. The Councils of Economic Advisers of the past two Administrations have addressed these issues in the annual Economic Report of the President (ERP). For example, the 2016 ERP included a chapter on technology and innovation that linked robotics to productivity and growth, and the 2019 ERP included a chapter on artificial intelligence that discussed the uneven effects of technological change. Both these chapters used data at highly aggregated levels, in part because that is the data that is available. As I’ve noted elsewhere, AI and robots are everywhere, except, as it turns out, in the data.

To date, there have been no large-scale, systematic studies in the U.S. on how robots and AI affect productivity and labor in individual firms or establishments (a firm could own one or more establishments, which for example could be a plant in a manufacturing setting or a storefront in a retail setting). This is because the data are scarce. Academic researchers interested in the effects of AI and robotics on economic outcomes have mostly used aggregate country and industry-level data. Very recently, some have studied these issues at the firm level using data on robot imports to France, Spain, and other countries. I review a few of these academic papers in both categories below, which provide early findings on the nuanced role these new technologies have on labor. Thanks to some excellent work being done by the U.S. Census Bureau, however, we may soon have more data to work with. This includes new questions on robot purchases in the Annual Survey of Manufacturers and Annual Capital Expenditures Survey and new questions on other technologies including cloud computing and machine learning in the Annual Business Survey….(More)”.

Governance of Data Sharing: a Law & Economics Proposal


Paper by Jens Prufer and Inge Graef: “To prevent market tipping, which inhibits innovation, there is an urgent need to mandate sharing of user information in data-driven markets. Existing legal mechanisms to impose data sharing under EU competition law and data portability under the GDPR are not sufficient to tackle this problem. Mandated data sharing requires the design of a governance structure that combines elements of economically efficient centralization with legally necessary decentralization. We identify three feasible options. One is to centralize investigations and enforcement in a European Data Sharing Agency (EDSA), while decision-making power lies with National Competition Authorities in a Board of Supervisors. The second option is to set up a Data Sharing Cooperation Network coordinated through a European Data Sharing Board, with the National Competition Authority best placed to run the investigation, adjudicating and enforcing the mandatory data-sharing decision across the EU. A third option is to mix both governance structures and to task national authorities to investigate and adjudicate and the EU-level EDSA with enforcement of data sharing….(More)”

Democratizing data in a 5G world


Blog by Dimitrios Dosis at Mastercard: “The next generation of mobile technology has arrived, and it’s more powerful than anything we’ve experienced before. 5G can move data faster, with little delay — in fact, with 5G, you could’ve downloaded a movie in the time you’ve read this far. 5G will also create a vast network of connected machines. The Internet of Things will finally deliver on its promise to fuse all our smart products — vehicles, appliances, personal devices — into a single streamlined ecosystem.

My smartwatch could monitor my blood pressure and schedule a doctor’s appointment, while my car could collect data on how I drive and how much gas I use while behind the wheel. In some cities, petrol trucks already act as roving gas stations, receiving pings when cars are low on gas and refueling them as needed, wherever they are.

This amounts to an incredible proliferation of data. By 2025, every connected person will conduct nearly 5,000 data interactions every day — one every 18 seconds — whether they know it or not. 

Enticing and convenient as new 5G-powered developments may be, they also raise complex questions about data. Namely, who is privy to our personal information? As your smart refrigerator records the foods you buy, will the refrigerator’s manufacturer be able to see your eating habits? Could it sell that information to a consumer food product company for market research without your knowledge? And where would the information go from there?

People are already asking critical questions about data privacy. In fact, 72% of them say they are paying attention to how companies collect and use their data, according to a global survey released last year by the Harvard Business Review Analytic Services. The survey, sponsored by Mastercard, also found that while 60% of executives believed consumers think the value they get in exchange for sharing their data is worthwhile, only 44% of consumers actually felt that way.

There are many reasons for this data disconnect, including the lack of transparency that currently exists in data sharing and the tension between an individual’s need for privacy and his or her desire for personalization.

This paradox can be solved by putting data in the hands of the people who create it — giving consumers the ability to manage, control and share their own personal information when they want to, with whom they want to, and in a way that benefits them.

That’s the basis of Mastercard’s core set of principles regarding data responsibility – and in this 5G world, it’s more important than ever. We will be able to gain from these new technologies, but this change must come with trust and user control at its core. The data ecosystem needs to evolve from schemes dominated by third parties, where some data brokers collect inferred, often unreliable and inaccurate data, then share it without the consumer’s knowledge….(More)”.

Give more data, awareness and control to individual citizens, and they will help COVID-19 containment


Paper by Mirco Nanni et al: “The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the “phase 2” of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large scale adoption by many countries. A centralized approach, where data sensed by the app are all sent to a nation-wide server, raises concerns about citizens’ privacy and needlessly strong digital surveillance, thus alerting us to the need to minimize personal data collection and avoiding location tracking. We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens’ “personal data stores”, to be shared separately and selectively (e.g., with a backend system, but possibly also with other citizens), voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows for detailed information gathering for infected people in a privacy-preserving fashion; and, in turn this enables both contact tracing, and, the early detection of outbreak hotspots on more finely-granulated geographic scale. The decentralized approach is also scalable to large populations, in that only the data of positive patients need be handled at a central level. Our recommendation is two-fold. First to extend existing decentralized architectures with a light touch, in order to manage the collection of location data locally on the device, and allow the user to share spatio-temporal aggregates—if and when they want and for specific aims—with health authorities, for instance. 

Second, we favour a longer-term pursuit of realizing a Personal Data Store vision, giving users the opportunity to contribute to collective good in the measure they want, enhancing self-awareness, and cultivating collective efforts for rebuilding society….(More)”.
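The decentralized design the authors advocate — events kept on-device, with only consent-gated, coarse aggregates ever leaving — can be sketched as a toy data structure. This is a hedged illustration of the architectural idea only; the class, field names and granularity below are hypothetical, not the paper’s protocol.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class PersonalDataStore:
    """Toy model of the decentralized approach: all location/contact events
    stay on-device; only coarse aggregates leave, and only with consent."""
    events: list = field(default_factory=list)  # (day, coarse_area) tuples

    def record(self, day, coarse_area):
        self.events.append((day, coarse_area))

    def share_aggregates(self, tested_positive, consent):
        # Nothing leaves the device unless the user tested positive AND consents.
        if not (tested_positive and consent):
            return None
        # Share only a privacy-preserving aggregate, never the raw trajectory.
        return Counter(area for _, area in self.events)

store = PersonalDataStore()
store.record(1, "district-A")
store.record(2, "district-A")
store.record(2, "district-B")
print(store.share_aggregates(tested_positive=False, consent=True))  # → None
print(store.share_aggregates(tested_positive=True, consent=True))   # per-district counts
```

Because only positive, consenting users ever transmit anything, the central backend handles a small fraction of the population’s data — which is the scalability property the paper highlights.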