Selected Readings on Data Portability


By Juliet McMurren, Andrew Young, and Stefaan G. Verhulst

As part of an ongoing effort to build a knowledge base for the field of improving governance through technology, The GovLab publishes a series of Selected Readings, which provide an annotated and curated collection of recommended works on themes such as open data, data collaboration, and civic technology.

In this edition, we explore selected literature on data portability.

To suggest additional readings on this or any other topic, please email info@thelivinglib.org. All our Selected Readings can be found here.

Context

Data today exists largely in silos, generating problems and inefficiencies for the individual, business and society at large. These include:

  • difficulty switching (data) between competitive service providers;
  • delays in sharing data for important societal research initiatives;
  • barriers for data innovators to reuse data that could generate insights to inform individuals’ decision making; and
  • inhibitions to scale data donation.

Data portability — the principle that individuals have a right to obtain, copy, and reuse their own personal data and to transfer it from one IT platform or service to another for their own purposes — is positioned as a solution to these problems. When fully implemented, it would make data liquid, giving individuals the ability to access their own data in a usable and transferable format, transfer it from one service provider to another, or donate data for research and enhanced data analysis by those working in the public interest.

Some companies, including Google, Apple, Twitter and Facebook, have sought to advance data portability through initiatives like the Data Transfer Project, an open source software project designed to facilitate data transmittals. Newly enacted data protection legislation such as Europe’s General Data Protection Regulation (2018) and the California Consumer Privacy Act (2018) give data holders a right to data portability. However, despite the legal and technical advances made, many questions toward scaling up data liquidity and portability responsibly and systematically remain. These new data rights have generated complex and as yet unanswered questions about the limits of data ownership, the implications for privacy, security and intellectual property rights, and the practicalities of how, when, and to whom data can be transferred.

In this edition of the GovLab’s Selected Readings series, we examine the emerging literature on data portability to provide a foundation for future work on the value proposition of data portability. Readings are listed in alphabetical order.

Selected readings

Cho, Daegon, Pedro Ferreira, and Rahul Telang, The Impact of Mobile Number Portability on Price and Consumer Welfare (2016)

  • In this paper, the authors analyze how Mobile Number Portability (MNP) — the ability for consumers to maintain their phone number when changing providers, thus reducing switching costs — affected the relationship between switching costs, market price and consumer surplus after it was introduced in most European countries in the early 2000s.
  • Theory holds that when switching costs are high, market leaders will enjoy a substantial advantage and are able to keep prices high. Policy makers will therefore attempt to decrease switching costs to intensify competition and reduce prices to consumers.
  • The study reviewed quarterly data from 47 wireless service providers in 15 EU countries between 1999 and 2006. The data showed that MNP simultaneously decreased market price by over four percent and increased consumer welfare by an average of at least €2.15 per person per quarter. This increase amounted to a total of €880 million per quarter across the 15 EU countries analyzed in this paper and accounted for 15 percent of the increase in consumer surplus observed over this time.

CtrlShift, Data Mobility: The data portability growth opportunity for the UK economy (2018)

  • Commissioned by the UK Department of Digital, Culture, Media and Sport (DCMS), this study was intended to identify the potential of personal data portability for the UK economy.
  • Its scope went beyond the legal right to data portability envisaged by the GDPR, to encompass the current state of personal data portability and mobility, requirements for safe and secure data sharing, and the potential economic benefits through stimulation of innovation, productivity and competition.
  • The report concludes that increased personal data mobility has the potential to be a vital stimulus for the development of the digital economy, driving growth by empowering individuals to make use of their own data and consent to others using it to create new data-driven services and technologies.
  • However, the report concludes that there are significant challenges to be overcome, and new risks to be addressed, before the value of personal data can be realized. Much personal data remains locked in organizational silos, and systemic issues related to data security and governance and the uneven sharing of benefits need to be resolved.

Data Guidance and Future of Privacy Forum, Comparing Privacy Laws: GDPR v. CCPA (2018)

  • This paper compares the provisions of the GDPR with those of the California Consumer Privacy Act (2018).
  • Both article 20 of the GDPR and section 1798 of the CCPA recognize a right to data portability. Both also confer on data subjects the right to receive data from controllers free of charge upon request, and oblige controllers to create mechanisms to provide subjects with their data in portable and reusable form so that it can be transmitted to third parties for reuse.
  • In the CCPA, the right to data portability is an extension of the right to access, and only confers on data subjects the right to apply for data collected within the past 12 months and have it delivered to them. The GDPR does not impose a time limit, and allows data to be transferred from one data controller to another, but limits the right to automatically collected personal data provided by the data subject themselves through consent or contract.

Data Transfer Project, Data Transfer Project Overview and Fundamentals (2018)

  • The paper presents an overview of the goals, principles, architecture, and system components of the Data Transfer Project. The intent of the DTP is to increase the number of services offering data portability and provide users with the ability to transfer data directly in and out of participating providers through systems that are easy and intuitive to use, private and secure, reciprocal between services, and focused on user data. The project, which is supported by Microsoft, Google, Twitter and Facebook, is an open-source initiative that encourages the participation of other providers to reduce the infrastructure burden on providers and users.
  • In addition to benefits to innovation, competition, and user choice, the authors point to benefits to security, through allowing users to backup, organize, or archive their data, recover from account hijacking, and retrieve their data from deprecated services.
  • The DTP’s remit was to test concepts and feasibility for the transfer of specific types of user data between online services using a system of adapters to transfer proprietary formats into canonical formats that can be used to transfer data while allowing providers to maintain control over the security of their service. While not resolving all formatting or support issues, this approach would allow substantial data portability and encourage ecosystem sustainability.

Deloitte, How to Flourish in an Uncertain Future: Open Banking(2017)

  • This report addresses the innovative and disruptive potential of open banking, in which data is shared between members of the banking ecosystem at the authorization of the customer, with the potential to increase competition and facilitate new products and services. In the resulting marketplace model, customers could use a single banking interface to access products from multiple players, from established banks to newcomers and fintechs.
  • The report’s authors identify significant threats to current banking models. Banks that failed to embrace open banking could be relegated to a secondary role as an infrastructure provider, while third parties — tech companies, fintech, and price comparison websites — take over the customer relationship.
  • The report identifies four overlapping operating models banks could adopt within an open banking model: full service providers, delivering proprietary products through their own interface with little or no third-party integration; utilities, which provide other players with infrastructure without customer-facing services; suppliers, which offer proprietary products through third-party interfaces; and interfaces,which provide distribution services through a marketplace interface. To retain market share, incumbents are likely to need to adopt a combination of these roles, offering their own products and services and those of third parties through their own and others’ interfaces.

Digital Competition Expert Panel Unlocking Digital Competition(2019)

  • This report captures the findings of the UK Digital Competition Expert Panel, which was tasked in 2018 with considering opportunities and challenges the digital economy might pose for competition and competition policy and to recommend any necessary changes. The panel focused on the impact of big players within the sector, appropriate responses to mergers or anticompetitive practices, and the impact on consumers.
  • The panel found that the digital economy is creating many benefits, but that digital markets are subject to tipping, in which emerging winners can scoop much of the market. This concentration can give rise to substantial costs, especially to consumers, and cannot be solved by competition alone. However, government policy and regulatory solutions have limitations, including the slowness of policy change, uneven enforcement and profound informational asymmetries between companies and government.
  • The panel proposed the creation of a digital markets unit that would be tasked with developing a code of competitive conduct, enabling greater personal data mobility and systems designed with open standards, and advancing access to non-personal data to reduce barriers to market entry.
  • The panel’s model of data mobility goes beyond data portability, which involves consumers being able to request and transfer their own data from one provider to another. Instead, the panel recommended empowering consumers to instigate transfers of data between a business and a third party in order to access price information, compare goods and services, or access tailored advice and recommendations. They point to open banking as an example of how this could function in practice.
  • It also proposed updating merger policy to make it more forward-looking to better protect consumers and innovation and preserve the competitiveness of the market. It recommended the creation of antitrust policy that would enable the implementation of interim measures to limit damage to competition while antitrust cases are in process.

Egan, Erin, Charting a Way Forward: Data Portability and Privacy(2019)

  • This white paper by Facebook’s VP and Chief Privacy Officer, Policy, represents an attempt to advance the conversation about the relationship between data portability, privacy, and data protection. The author sets out five key questions about data portability: what is it, whose and what data should be portable, how privacy should be protected in the context of portability, and where responsibility for data misuse or improper protection should lie.
  • The paper finds that definitions of data portability still remain imprecise, particularly with regard to the distinction between data portability and data transfer. In the interest of feasibility and a reasonable operational burden on providers, it proposes time limits on providers’ obligations to make observed data portable.
  • The paper concludes that there are strong arguments both for and against allowing users to port their social graph — the map of connections between that user and other users of the service — but that the key determinant should be a capacity to ensure the privacy of all users involved. Best-practice data portability protocols that would resolve current differences of approach as to what, how and by whom information should be made available would help promote broader portability, as would resolution of liability for misuse or data exposure.

Engels, Barbara, Data portability among online platforms (2016)

  • The article examines the effects on competition and innovation of data portability among online platforms such as search engines, online marketplaces, and social media, and how relations between users, data, and platform services change in an environment of data portability.
  • The paper finds that the benefits to competition and innovation of portability are greatest in two kinds of environments: first, where platforms offer complementary products and can realize synergistic benefits by sharing data; and secondly, where platforms offer substitute or rival products but the risk of anti-competitive behaviour is high, as for search engines.
  • It identifies privacy and security issues raised by data portability. Portability could, for example, allow an identity fraudster to misuse personal data across multiple platforms, compounding the harm they cause.
  • It also suggests that standards for data interoperability could act to reduce innovation in data technology, encouraging data controllers to continue to use outdated technologies in order to comply with inflexible, government-mandated standards.

Graef, Inge, Martin Husovec and Nadezhda Purtova, Data Portability and Data Control: Lessons for an Emerging Concept in EU Law (2018)

  • This paper situates the data portability right conferred by the GDPR within rights-based data protection law. The authors argue that the right to data portability should be seen as a new regulatory tool aimed at stimulating competition and innovation in data-driven markets.
  • The authors note the potential for conflicts between the right to data portability and the intellectual property rights of data holders, suggesting that the framers underestimated the potential impact of such conflicts on copyright, trade secrets and sui generis database law.
  • Given that the right to data portability is being replicated within consumer protection law and the regulation of non-personal data, the authors argue framers of these laws should consider the potential for conflict and the impact of such conflict on incentives to innovate.

Mohsen, Mona Omar and Hassan A. Aziz The Blue Button Project: Engaging Patients in Healthcare by a Click of a Button (2015)

  • This paper provides a literature review on the Blue Button initiative, an early data portability project which allows Americans to access, view or download their health records in a variety of formats.
  • Originally launched through the Department of Veterans’ Affairs in 2010, the Blue Button initiative had expanded to more than 500 organizations by 2014, when the Department of Health and Human Services launched the Blue Button Connector to facilitate both patient access and development of new tools.
  • The Blue Button has enabled the development of tools such as the Harvard-developed Growth-Tastic app, which allows parents to check their child’s growth by submitting their downloaded pediatric health data. Pharmacies across the US have also adopted the Blue Button to provide patients with access to their prescription history.

More than Data and Mission: Smart, Got Data? The Value of Energy Data to Customers (2016)

  • This report outlines the public value of the Green Button, a data protocol that provides customers with private and secure access to their energy use data collected by smart meters.
  • The authors outline how the use of the Green Button can help states meet their energy and climate goals by enabling them to structure renewables and other distributed energy resources (DER) such as energy efficiency, demand response, and solar photovoltaics. Access to granular, near real time data can encourage innovation among DER providers, facilitating the development of applications like “virtual energy audits” that identify efficiency opportunities, allowing customers to reduce costs through time-of-use pricing, and enabling the optimization of photovoltaic systems to meet peak demand.
  • Energy efficiency receives the greatest boost from initiatives like the Green Button, with studies showing energy savings of up to 18 percent when customers have access to their meter data. In addition to improving energy conservation, access to meter data could improve the efficiency of appliances by allowing devices to trigger sleep modes in response to data on usage or price. However, at the time of writing, problems with data portability and interoperability were preventing these benefits from being realized, at a cost of tens of millions of dollars.
  • The authors recommend that commissions require utilities to make usage data available to customers or authorized third parties in standardized formats as part of basic utility service, and tariff data to developers for use in smart appliances.

MyData, Understanding Data Operators (2020)

  • MyData is a global movement of data users, activists and developers with a common goal to empower individuals with their personal data to enable them and their communities to develop knowledge, make informed decisions and interact more consciously and efficiently.
  • This introductory paper presents the state of knowledge about data operators, trusted data intermediaries that provide infrastructure for human-centric personal data management and governance, including data sharing and transfer. The operator model allows data controllers to outsource issues of legal compliance with data portability requirements, while offering individual users a transparent and intuitive way to manage the data transfer process.
  • The paper examines use cases from 48 “proto-operators” from 15 countries who fulfil some of the functions of an operator, albeit at an early level of maturity. The paper finds that operators offer management of identity authentication, data transaction permissions, connections between services, value exchange, data model management, personal data transfer and storage, governance support, and logging and accountability. At the heart of these functions is the need for minimum standards of data interoperability.
  • The paper reviews governance frameworks from the general (legislative) to the specific (operators), and explores proto-operator business models. In keeping with an emerging field, business models are currently unclear and potentially unsustainable, and one of a number of areas, including interoperability requirements and governance frameworks, that must still be developed.

National Science and Technology Council Smart Disclosure and Consumer Decision Making: Report of the Task Force on Smart Disclosure (2013)

  • This report summarizes the work and findings of the 2011–2013 Task Force on Smart Disclosure: Information and Efficiency, an interagency body tasked with advancing smart disclosure, through which data is made more available and accessible to both consumers and innovators.
  • The Task Force recognized the capacity of smart disclosure to inform consumer choices, empower them through access to useful personal data, enable the creation of new tools, products and services, and promote efficiency and growth. It reviewed federal efforts to promote smart disclosure within sectors and in data types that crosscut sectors, such as location data, consumer feedback, enforcement and compliance data and unique identifiers. It also surveyed specific public-private partnerships on access to data, such as the Blue and Green Button and MyData initiatives in health, energy and education respectively.
  • The Task Force reviewed steps taken by the Federal Government to implement smart disclosure, including adoption of machine readable formats and standards for metadata, use of APIs, and making data available in an unstructured format rather than not releasing it at all. It also reviewed “choice engines” making use of the data to provide services to consumers across a range of sectors.
  • The Task Force recommended that smart disclosure should be a core component of efforts to institutionalize and operationalize open data practices, with agencies proactively identifying, tagging, and planning the release of candidate data. It also recommended that this be supported by a government-wide community of practice.

Nicholas, Gabriel Taking It With You: Platform Barriers to Entry and the Limits of Data Portability (2020)

  • This paper considers whether, as is often claimed, data portability offers a genuine solution to the lack of competition within the tech sector.
  • It concludes that current regulatory approaches to data portability, which focus on reducing switching costs through technical solutions such as one-off exports and API interoperability, are not sufficient to generate increased competition. This is because they fail to address other barriers to entry, including network effects, unique data access, and economies of scale.
  • The author proposes an alternative approach, which he terms collective portability, which would allow groups of users to coordinate the transfer of their data to a new platform. This model raises questions about how such collectives would make decisions regarding portability, but would enable new entrants to successfully target specific user groups and scale rapidly without having to reach users one by one.

OECD, Enhancing Access to and Sharing of Data: Reconciling Risks and Benefits for Data Re-use across Societies (2019)

  • This background paper to a 2017 expert workshop on risks and benefits of data reuse considers data portability as one strategy within a data openness continuum that also includes open data, market-based B2B contractual agreements, and restricted data-sharing agreements within research and data for social good applications.
  • It considers four rationales offered for data portability. These include empowering individuals towards the “informational self-determination” aspired to by GDPR, increased competition within digital and other markets through reductions in information asymmetries between individuals and providers, switching costs, and barriers to market entry; and facilitating increased data flows.
  • The report highlights the need for both syntactic and semantic interoperability standards to ensure data can be reused across systems, both of which may be fostered by increased rights to data portability. Data intermediaries have an important role to play in the development of these standards, through initiatives like the Data Transfer Project, a collaboration which brought together Facebook, Google, Microsoft, and Twitter to create an open-source data portability platform.

Personal Data Protection Commission Singapore Response to Feedback on the Public Consultation on Proposed Data Portability and Data Innovation Provisions (2020)

  • The report summarizes the findings of the 2019 PDPC public consultation on proposals to introduce provisions on data portability and data innovation in Singapore’s Personal Data Protection Act.
  • The proposed provision would oblige organizations to transmit an individual’s data to another organization in a commonly used machine-readable format, upon the individual’s request. The obligation does not extend to data intermediaries or organizations that do not have a presence in Singapore, although data holders may choose to honor those requests.
  • The obligation would apply to electronic data that is either provided by the individual or generated by the individual’s activities in using the organization’s service or product, but not derived data created by the processing of other data by the data holder. Respondents were concerned that including derived data could harm organizations’ competitiveness.
  • Respondents were concerned about how to honour data portability requests where the data of third parties was involved, as in the case of a joint account holder, for example. The PDPC opted for a “balanced, reasonable, and pragmatic approach,” allowing data involving third parties to be ported where it was under the requesting individual’s control, was to be used for domestic and personal purposes, and related only to the organization’s product or service.

Quinn, Paul Is the GDPR and Its Right to Data Portability a Major Enabler of Citizen Science? (2018)

  • This article explores the potential of data portability to advance citizen science by enabling participants to port their personal data from one research project to another. Citizen science — the collection and contribution of large amounts of data by private individuals for scientific research — has grown rapidly in response to the development of new digital means to capture, store, organize, analyze and share data.
  • The GDPR right to data portability aids citizen science by requiring transfer of data in machine-readable format and allowing data subjects to request its transfer to another data controller. This requirement of interoperability does not amount to compatibility, however, and data thus transferred would probably still require cleaning to be usable, acting as a disincentive to reuse.
  • The GDPR’s limitation of transferability to personal data provided by the data subject excludes some forms of data that might possess significant scientific potential, such as secondary personal data derived from further processing or analysis.
  • The GDPR right to data portability also potentially limits citizen science by restricting the grounds for processing data to which the right applies to data obtained through a subject’s express consent or through the performance of a contract. This limitation excludes other forms of data processing described in the GDPR, such as data processing for preventive or occupational medicine, scientific research, or archiving for reasons of public or scientific interest. It is also not clear whether the GDPR compels data controllers to transfer data outside the European Union.

Wong, Janis and Tristan Henderson, How Portable is Portable? Exercising the GDPR’s Right to Data Portability (2018)

  • This paper presents the results of 230 real-world requests for data portability in order to assess how — and how well — the GDPR right to data portability is being implemented. The authors were interested in establishing the kinds of file formats that were returned in response to requests, and to identify practical difficulties encountered in making and interpreting requests, over a three month period beginning on the day the GDPR came into effect.
  • The findings revealed continuing problems around ensuring portability for both data controllers and data subjects. Of the 230 requests, only 163 were successfully completed.
  • Data controllers frequently had difficulty understanding the requirements of GDPR, providing data in incomplete or inappropriate formats: only 40 percent of the files supplied were in a fully compliant format. Additionally, some data controllers were confused between the right to data portability and other rights conferred by the GDPR, such as the right to access or erasure.

Netnography: The Essential Guide to Qualitative Social Media Research


Book by Robert Kozinets: “Netnography is an adaptation of ethnography for the online world, pioneered by Robert Kozinets, and is concerned with the study of online cultures and communities as distinct social phenomena, rather than isolated content. In this landmark third edition, Netnography: The Essential Guide provides the theoretical and methodological groundwork as well as the practical applications, helping students both understand and do netnographic research projects of their own.

Packed with enhanced learning features throughout, linking concepts to structured activities in a step by step way, the book is also now accompanied by a striking new visual design and further case studies, offering the essential student resource to conducting online ethnographic research. Real world examples provided demonstrate netnography in practice across the social sciences, in media and cultural studies, anthropology, education, nursing, travel and tourism, and others….(More)”.

The Economics of Social Data: An Introduction


Paper by Dirk Bergemann and Alessandro Bonatti: “Large internet platforms collect data from individual users in almost every interaction on the internet. Whenever an individual browses a news website, searches for a medical term or for a travel recommendation, or simply checks the weather forecast on an app, that individual generates data. A central feature of the data collected from the individuals is its social aspect. Namely, the data captured from an individual user is not only informative about this specific individual, but also about users in some metric similar to the individual. Thus, the individual data is really social data. The social nature of the data generates an informational externality that we investigate in this note….(More)”.

Real-time flu tracking. By monitoring social media, scientists can monitor outbreaks as they happen.


Charles Schmidt at Nature: “Conventional influenza surveillance describes outbreaks of flu that have already happened. It is based on reports from doctors, and produces data that take weeks to process — often leaving the health authorities to chase the virus around, rather than get on top of it.

But every day, thousands of unwell people pour details of their symptoms and, perhaps unknowingly, locations into search engines and social media, creating a trove of real-time flu data. If such data could be used to monitor flu outbreaks as they happen and to make accurate predictions about its spread, that could transform public-health surveillance.

Powerful computational tools such as machine learning and a growing diversity of data streams — not just search queries and social media, but also cloud-based electronic health records and human mobility patterns inferred from census information — are making it increasingly possible to monitor the spread of flu through the population by following its digital signal. Now, models that track flu in real time and forecast flu trends are making inroads into public-health practice.

“We’re becoming much more comfortable with how these models perform,” says Matthew Biggerstaff, an epidemiologist who works on flu preparedness at the US Centers for Disease Control and Prevention (CDC) in Atlanta, Georgia.

In 2013–14, the CDC launched the FluSight Network, a website informed by digital modelling that predicts the timing, peak and short-term intensity of the flu season in ten regions of the United States and across the whole country. According to Biggerstaff, flu forecasting helps responders to plan ahead, so they can be ready with vaccinations and communication strategies to limit the effects of the virus. Encouraged by progress in the field, the CDC announced in January 2019 that it will spend US$17.5 million to create a network of influenza-forecasting centres of excellence, each tasked with improving the accuracy and communication of real-time forecasts.

The CDC is leading the way on digital flu surveillance, but health agencies elsewhere are following suit. “We’ve been working to develop and apply these models with collaborators using a range of data sources,” says Richard Pebody, a consultant epidemiologist at Public Health England in London. The capacity to predict flu trajectories two to three weeks in advance, Pebody says, “will be very valuable for health-service planning.”…(More)”.

So­cial me­dia data re­veal where vis­it­ors to nature loca­tions provide po­ten­tial be­ne­fits or threats to biodiversity


University of Helsinki: “In a new article published in the journal Science of the Total Environment, a team of researchers assessed global patterns of visitation rates, attractiveness and pressure to more than 12,000 Important Bird and Biodiversity Areas (IBAs), which are sites of international significance for nature conservation, by using geolocated data mined from social media (Twitter and Flickr).

The study found that Important Bird and Biodiversity Areas located in Europe and Asia, and in temperate biomes, had the highest density of social media users. Results also showed that sites of importance for congregatory species, which were also more accessible, more densely populated and provided more tourism facilities, received higher visitation than did sites richer in bird species.

 “Resources in biodiversity conservation are woefully inadequate and novel data sources from social media provide openly available user-generated information about human-nature interactions, at an unprecedented spatio-temporal scale”, says Dr Anna Hausmann from the University of Helsinki, a conservation scientist leading the study. “Our group has been exploring and validating data retrieved from social media to understand people´s preferences for experiencing nature in national parks at a local, national and continental scale”, she continues, “in this study, we expand our analyses at a global level”. …

“Social media content and metadata contain useful information for understanding human-nature interactions in space and time”, says Prof. Tuuli Toivonen, another co-author in the paper and the leader of the Digital Geography Lab at the University of Helsinki. “Social media data can also be used to cross-validate and enrich data collected by conservation organizations”, she continues. The study found that the 17 percent of all Important Bird and Biodiversity Areas (IBA) that were assessed by experts to be under greater human disturbance also had higher density of social media users….(More)”.

Social Media Monitoring: How the Department of Homeland Security Uses Digital Data in the Name of National Security


Report by the Brennan Center for Justice: “The Department of Homeland Security (DHS) is rapidly expanding its collection of social media information and using it to evaluate the security risks posed by foreign and American travelers. This year marks a major expansion. The visa applications vetted by DHS will include social media handles that the State Department is set to collect from some 15 million travelers per year.1 Social media can provide a vast trove of information about individuals, including their personal preferences, political and religious views, physical and mental health, and the identity of their friends and family. But it is susceptible to misinterpretation, and wholesale monitoring of social media creates serious risks to privacy and free speech. Moreover, despite the rush to implement these programs, there is scant evidence that they actually meet the goals for which they are deployed…(More)”

Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Instagram, GitHub, and More


Book (New 3rd Edition) by Matthew A. Russell and Mikhail Klassen:  “Mine the rich data tucked away in popular social websites such as Twitter, Facebook, LinkedIn, and Instagram. With the third edition of this popular guide, data scientists, analysts, and programmers will learn how to glean insights from social media—including who’s connecting with whom, what they’re talking about, and where they’re located—using Python code examples, Jupyter notebooks, or Docker containers.

In part one, each standalone chapter focuses on one aspect of the social landscape, including each of the major social sites, as well as web pages, blogs and feeds, mailboxes, GitHub, and a newly added chapter covering Instagram. Part two provides a cookbook with two dozen bite-size recipes for solving particular issues with Twitter….(More)”.

Index: Open Data


By Alexandra Shaw, Michelle Winowatan, Andrew Young, and Stefaan Verhulst

The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on open data and was originally published in 2018.

Value and Impact

  • The projected year at which all 28+ EU member countries will have a fully operating open data portal: 2020

  • Between 2016 and 2020, the market size of open data in Europe is expected to increase by 36.9%, and reach this value by 2020: EUR 75.7 billion

Public Views on and Use of Open Government Data

  • Number of Americans who do not trust the federal government or social media sites to protect their data: Approximately 50%

  • Key findings from The Economist Intelligence Unit report on Open Government Data Demand:

    • Percentage of respondents who say the key reason why governments open up their data is to create greater trust between the government and citizens: 70%

    • Percentage of respondents who say OGD plays an important role in improving lives of citizens: 78%

    • Percentage of respondents who say OGD helps with daily decision making especially for transportation, education, environment: 53%

    • Percentage of respondents who cite lack of awareness about OGD and its potential use and benefits as the greatest barrier to usage: 50%

    • Percentage of respondents who say they lack access to usable and relevant data: 31%

    • Percentage of respondents who think they don’t have sufficient technical skills to use open government data: 25%

    • Percentage of respondents who feel the number of OGD apps available is insufficient, indicating an opportunity for app developers: 20%

    • Percentage of respondents who say OGD has the potential to generate economic value and new business opportunity: 61%

    • Percentage of respondents who say they don’t trust governments to keep data safe, protected, and anonymized: 19%

Efforts and Involvement

  • Time that’s passed since open government advocates convened to create a set of principles for open government data – the instance that started the open data government movement: 10 years

  • Countries participating in the Open Government Partnership today: 79 OGP participating countries and 20 subnational governments

  • Percentage of “open data readiness” in Europe according to European Data Portal: 72%

    • Open data readiness consists of four indicators which are presence of policy, national coordination, licensing norms, and use of data.

  • Number of U.S. cities with Open Data portals: 27

  • Number of governments who have adopted the International Open Data Charter: 62

  • Number of non-state organizations endorsing the International Open Data Charter: 57

  • Number of countries analyzed by the Open Data Index: 94

  • Number of Latin American countries that do not have open data portals as of 2017: 4 total – Belize, Guatemala, Honduras and Nicaragua

  • Number of cities participating in the Open Data Census: 39

Demand for Open Data

  • Open data demand measured by frequency of open government data use according to The Economist Intelligence Unit report:

    • Australia

      • Monthly: 15% of respondents

      • Quarterly: 22% of respondents

      • Annually: 10% of respondents

    • Finland

      • Monthly: 28% of respondents

      • Quarterly: 18% of respondents

      • Annually: 20% of respondents

    •  France

      • Monthly: 27% of respondents

      • Quarterly: 17% of respondents

      • Annually: 19% of respondents

        •  
    • India

      • Monthly: 29% of respondents

      • Quarterly: 20% of respondents

      • Annually: 10% of respondents

    • Singapore

      • Monthly: 28% of respondents

      • Quarterly: 15% of respondents

      • Annually: 17% of respondents 

    • UK

      • Monthly: 23% of respondents

      • Quarterly: 21% of respondents

      • Annually: 15% of respondents

    • US

      • Monthly: 16% of respondents

      • Quarterly: 15% of respondents

      • Annually: 20% of respondents

  • Number of FOIA requests received in the US for fiscal year 2017: 818,271

  • Number of FOIA request processed in the US for fiscal year 2017: 823,222

  • Distribution of FOIA requests in 2017 among top 5 agencies with highest number of request:

    • DHS: 45%

    • DOJ: 10%

    • NARA: 7%

    • DOD: 7%

    • HHS: 4%

Examining Datasets

  • Country with highest index score according to ODB Leaders Edition: Canada (76 out of 100)

  • Country with lowest index score according to ODB Leaders Edition: Sierra Leone (22 out of 100)

  • Number of datasets open in the top 30 governments according to ODB Leaders Edition: Fewer than 1 in 5

  • Average percentage of datasets that are open in the top 30 open data governments according to ODB Leaders Edition: 19%

  • Average percentage of datasets that are open in the top 30 open data governments according to ODB Leaders Edition by sector/subject:

    • Budget: 30%

    • Companies: 13%

    • Contracts: 27%

    • Crime: 17%

    • Education: 13%

    • Elections: 17%

    • Environment: 20%

    • Health: 17%

    • Land: 7%

    • Legislation: 13%

    • Maps: 20%

    • Spending: 13%

    • Statistics: 27%

    • Trade: 23%

    • Transport: 30%

  • Percentage of countries that release data on government spending according to ODB Leaders Edition: 13%

  • Percentage of government data that is updated at regular intervals according to ODB Leaders Edition: 74%

  • Number of datasets available through:

  • Number of datasets classed as “open” in 94 places worldwide analyzed by the Open Data Index: 11%

  • Percentage of open datasets in the Caribbean, according to Open Data Census: 7%

  • Number of companies whose data is available through OpenCorporates: 158,589,950

City Open Data

  • New York City

  • Singapore

    • Number of datasets published in Singapore: 1,480

    • Percentage of datasets with standardized format: 35%

    • Percentage of datasets made as raw as possible: 25%

  • Barcelona

    • Number of datasets published in Barcelona: 443

    • Open data demand in Barcelona measured by:

      • Number of unique sessions in the month of September 2018: 5,401

    • Quality of datasets published in Barcelona according to Tim Berners Lee 5-star Open Data: 3 stars

  • London

    • Number of datasets published in London: 762

    • Number of data requests since October 2014: 325

  • Bandung

    • Number of datasets published in Bandung: 1,417

  • Buenos Aires

    • Number of datasets published in Buenos Aires: 216

  • Dubai

    • Number of datasets published in Dubai: 267

  • Melbourne

    • Number of datasets published in Melbourne: 199

Sources

  • About OGP, Open Government Partnership. 2018.  

The Global Commons of Data


Paper by Jennifer Shkabatur: “Data platform companies (such as Facebook, Google, or Twitter) amass and process immense amounts of data that is generated by their users. These companies primarily use the data to advance their commercial interests, but there is a growing public dismay regarding the adverse and discriminatory impacts of their algorithms on society at large. The regulation of data platform companies and their algorithms has been hotly debated in the literature, but current approaches often neglect the value of data collection, defy the logic of algorithmic decision-making, and exceed the platform companies’ operational capacities.

This Article suggests a different approach — an open, collaborative, and incentives-based stance toward data platforms that takes full advantage of the tremendous societal value of user-generated data. It contends that this data shall be recognized as a “global commons,” and access to it shall be made available to a wide range of independent stakeholders — research institutions, journalists, public authorities, and international organizations. These external actors would be able to utilize the data to address a variety of public challenges, as well as observe from within the operation and impacts of the platforms’ algorithms.

After making the theoretical case for the “global commons of data,” the Article explores the practical implementation of this model. First, it argues that a data commons regime should operate through a spectrum of data sharing and usage modalities that would protect the commercial interests of data platforms and the privacy of data users. Second, it discusses regulatory measures and incentives that can solicit the collaboration of platform companies with the commons model. Lastly, it explores the challenges embedded in this approach….(More)”.

The Nail Finds a Hammer: Self-Sovereign Identity, Design Principles, and Property Rights in the Developing World


Report by Michael Graglia, Christopher Mellon and Tim Robustelli: “Our interest in identity systems was an inevitable outgrowth of our earlier work on blockchain-based1 land registries.2 Property registries, which at the simplest level are ledgers of who has which rights to which asset, require a very secure and reliable means of identifying both people and properties. In the course of investigating solutions to that problem, we began to appreciate the broader challenges of digital identity and its role in international development. And the more we learned about digital identity, the more convinced we became of the need for self-sovereign identity, or SSI. This model, and the underlying principles of identity which it incorporates, will be described in detail in this paper.

We believe that the great potential of SSI is that it can make identity in the digital world function more like identity in the physical world, in which every person has a unique and persistent identity which is represented to others by means of both their physical attributes and a collection of credentials attested to by various external sources of authority. These credentials are stored and controlled by the identity holder—typically in a wallet—and presented to different people for different reasons at the identity holder’s discretion. Crucially, the identity holder controls what information to present based on the environment, trust level, and type of interaction. Moreover, their fundamental identity persists even though the credentials by which it is represented may change over time.

The digital incarnation of this model has many benefits, including both greatly improved privacy and security, and the ability to create more trustworthy online spaces. Social media and news sites, for example, might limit participation to users with verified identities, excluding bots and impersonators.

The need for identification in the physical world varies based on location and social context. We expect to walk in relative anonymity down a busy city street, but will show a driver’s license to enter a bar, and both a driver’s license and a birth certificate to apply for a passport. There are different levels of ID and supporting documents required for each activity. But in each case, access to personal information is controlled by the user who may choose whether or not to share it.

Self-sovereign identity gives users complete control of their own identities and related personal data, which sits encrypted in distributed storage instead of being stored by a third party in a central database. In older, “federated identity” models, a single account—a Google account, for example—might be used to log in to a number of third-party sites, like news sites or social media platforms. But in this model a third party brokers all of these ID transactions, meaning that in exchange for the convenience of having to remember fewer passwords, the user must sacrifice a degree of privacy.

A real world equivalent would be having to ask the state to share a copy of your driver’s license with the bar every time you wanted to prove that you were over the age of 21. SSI, in contrast, gives the user a portable, digital credential (like a driver’s license or some other document that proves your age), the authenticity of which can be securely validated via cryptography without the recipient having to check with the authority that issued it. This means that while the credential can be used to access many different sites and services, there is no third-party broker to track the services to which the user is authenticating. Furthermore, cryptographic techniques called “zero-knowledge proofs” (ZKPs) can be used to prove possession of a credential without revealing the credential itself. This makes it possible, for example, for users to prove that they are over the age of 21 without having to share their actual birth dates, which are both sensitive information and irrelevant to a binary, yes-or-no ID transaction….(More)”.