The Global Commons of Data


Paper by Jennifer Shkabatur: “Data platform companies (such as Facebook, Google, or Twitter) amass and process immense amounts of data that is generated by their users. These companies primarily use the data to advance their commercial interests, but there is a growing public dismay regarding the adverse and discriminatory impacts of their algorithms on society at large. The regulation of data platform companies and their algorithms has been hotly debated in the literature, but current approaches often neglect the value of data collection, defy the logic of algorithmic decision-making, and exceed the platform companies’ operational capacities.

This Article suggests a different approach — an open, collaborative, and incentives-based stance toward data platforms that takes full advantage of the tremendous societal value of user-generated data. It contends that this data shall be recognized as a “global commons,” and access to it shall be made available to a wide range of independent stakeholders — research institutions, journalists, public authorities, and international organizations. These external actors would be able to utilize the data to address a variety of public challenges, as well as observe from within the operation and impacts of the platforms’ algorithms.

After making the theoretical case for the “global commons of data,” the Article explores the practical implementation of this model. First, it argues that a data commons regime should operate through a spectrum of data sharing and usage modalities that would protect the commercial interests of data platforms and the privacy of data users. Second, it discusses regulatory measures and incentives that can solicit the collaboration of platform companies with the commons model. Lastly, it explores the challenges embedded in this approach….(More)”.

Beyond Open vs. Closed: Balancing Individual Privacy and Public Accountability in Data Sharing


Paper by Bill Howe et al: “Data too sensitive to be “open” for analysis and re-purposing typically remains “closed” as proprietary information. This dichotomy undermines efforts to make algorithmic systems more fair, transparent, and accountable. Access to proprietary data in particular is needed by government agencies to enforce policy, researchers to evaluate methods, and the public to hold agencies accountable; all of these needs must be met while preserving individual privacy and firm competitiveness. In this paper, we describe an integrated legal-technical approach provided by a third-party public-private data trust designed to balance these competing interests.

Basic membership allows firms and agencies to enable low-risk access to data for compliance reporting and core methods research, while modular data sharing agreements support a wide array of projects and use cases. Unless specifically stated otherwise in an agreement, all data access is initially provided to end users through customized synthetic datasets that offer a) strong privacy guarantees, b) removal of signals that could expose competitive advantage for the data providers, and c) removal of biases that could reinforce discriminatory policies, all while maintaining empirically good fidelity to the original data. We find that the liberal use of synthetic data, in conjunction with strong legal protections over raw data, strikes a tunable balance between transparency, proprietorship, privacy, and research objectives; and that the legal-technical framework we describe can form the basis for organizational data trusts in a variety of contexts….(More)”.
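The synthetic-data step described above can be illustrated with a minimal sketch: fit independent per-column marginal distributions to a tabular dataset and sample new rows from them, so that released records are not the raw records while column-level statistics remain empirically close to the original. This shows only the simplest flavor of synthesis; the paper's actual pipeline, formal privacy guarantees, and bias-removal steps are not reproduced here, and all function names are ours.

```python
import random
from collections import Counter

def fit_marginals(rows):
    """Empirical per-column value distributions (independent marginals)."""
    cols = list(zip(*rows))
    return [Counter(col) for col in cols]

def sample_synthetic(marginals, n, rng):
    """Draw n synthetic rows, sampling each column independently."""
    out = []
    for _ in range(n):
        row = tuple(
            rng.choices(list(c.keys()), weights=list(c.values()))[0]
            for c in marginals
        )
        out.append(row)
    return out

rng = random.Random(42)
raw = [("A", 1), ("A", 2), ("B", 1), ("B", 2), ("A", 1)]
synth = sample_synthetic(fit_marginals(raw), 1000, rng)

# Column marginals are approximately preserved (true share of "A" is 0.6),
# but cross-column correlations are deliberately broken.
share_A = sum(1 for r in synth if r[0] == "A") / len(synth)
```

Because each column is sampled independently, cross-column signals (exactly the kind that could expose a data provider's competitive advantage) do not survive into the synthetic rows; real systems would trade some of that destruction back for fidelity with more sophisticated generative models.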

The Nail Finds a Hammer: Self-Sovereign Identity, Design Principles, and Property Rights in the Developing World


Report by Michael Graglia, Christopher Mellon and Tim Robustelli: “Our interest in identity systems was an inevitable outgrowth of our earlier work on blockchain-based land registries. Property registries, which at the simplest level are ledgers of who has which rights to which asset, require a very secure and reliable means of identifying both people and properties. In the course of investigating solutions to that problem, we began to appreciate the broader challenges of digital identity and its role in international development. And the more we learned about digital identity, the more convinced we became of the need for self-sovereign identity, or SSI. This model, and the underlying principles of identity which it incorporates, will be described in detail in this paper.

We believe that the great potential of SSI is that it can make identity in the digital world function more like identity in the physical world, in which every person has a unique and persistent identity which is represented to others by means of both their physical attributes and a collection of credentials attested to by various external sources of authority. These credentials are stored and controlled by the identity holder—typically in a wallet—and presented to different people for different reasons at the identity holder’s discretion. Crucially, the identity holder controls what information to present based on the environment, trust level, and type of interaction. Moreover, their fundamental identity persists even though the credentials by which it is represented may change over time.

The digital incarnation of this model has many benefits, including both greatly improved privacy and security, and the ability to create more trustworthy online spaces. Social media and news sites, for example, might limit participation to users with verified identities, excluding bots and impersonators.

The need for identification in the physical world varies based on location and social context. We expect to walk in relative anonymity down a busy city street, but will show a driver’s license to enter a bar, and both a driver’s license and a birth certificate to apply for a passport. There are different levels of ID and supporting documents required for each activity. But in each case, access to personal information is controlled by the user who may choose whether or not to share it.

Self-sovereign identity gives users complete control of their own identities and related personal data, which sits encrypted in distributed storage instead of being stored by a third party in a central database. In older, “federated identity” models, a single account—a Google account, for example—might be used to log in to a number of third-party sites, like news sites or social media platforms. But in this model a third party brokers all of these ID transactions, meaning that in exchange for the convenience of having to remember fewer passwords, the user must sacrifice a degree of privacy.

A real world equivalent would be having to ask the state to share a copy of your driver’s license with the bar every time you wanted to prove that you were over the age of 21. SSI, in contrast, gives the user a portable, digital credential (like a driver’s license or some other document that proves your age), the authenticity of which can be securely validated via cryptography without the recipient having to check with the authority that issued it. This means that while the credential can be used to access many different sites and services, there is no third-party broker to track the services to which the user is authenticating. Furthermore, cryptographic techniques called “zero-knowledge proofs” (ZKPs) can be used to prove possession of a credential without revealing the credential itself. This makes it possible, for example, for users to prove that they are over the age of 21 without having to share their actual birth dates, which are both sensitive information and irrelevant to a binary, yes-or-no ID transaction….(More)”.
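The selective-disclosure idea in the passage above can be sketched with hash commitments: the issuer commits to each attribute of a credential and signs the commitments once; the holder later reveals a single attribute (plus its salt), and the verifier checks it against the signed commitments without contacting the issuer and without seeing the other attributes. This is a toy sketch, not a real zero-knowledge proof, and to stay standard-library-only it stubs the issuer's public-key signature with an HMAC over a demo key; a real SSI system would use asymmetric signatures (e.g., Ed25519) so the verifier never holds signing material.

```python
import hashlib, hmac, os

DEMO_ISSUER_KEY = b"demo-key"  # stand-in for the issuer's signing key

def commit(salt: bytes, name: str, value: str) -> bytes:
    """Salted hash commitment to one attribute."""
    return hashlib.sha256(salt + name.encode() + b"=" + value.encode()).digest()

def issue(attrs):
    """Issuer: commit to every attribute, sign the commitment list once."""
    salts = {k: os.urandom(16) for k in attrs}
    commits = sorted(commit(salts[k], k, v) for k, v in attrs.items())
    sig = hmac.new(DEMO_ISSUER_KEY, b"".join(commits), "sha256").digest()
    return salts, commits, sig

def present(salts, attrs, name):
    """Holder: disclose a single attribute plus its salt, nothing else."""
    return name, attrs[name], salts[name]

def verify(disclosed, commits, sig):
    """Verifier: check the issuer's signature over the commitments, then
    that the disclosed attribute matches one of the signed commitments."""
    name, value, salt = disclosed
    ok_sig = hmac.compare_digest(
        sig, hmac.new(DEMO_ISSUER_KEY, b"".join(commits), "sha256").digest())
    return ok_sig and commit(salt, name, value) in commits

attrs = {"name": "Alice", "birth_date": "1990-01-01", "over_21": "yes"}
salts, commits, sig = issue(attrs)
proof = present(salts, attrs, "over_21")
assert verify(proof, commits, sig)
# name and birth_date stay hidden behind their unopened commitments
```

A true ZKP goes further still: it could prove "birth date implies over 21" without even a dedicated `over_21` attribute existing in the credential.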

Why Doctors Hate Their Computers


Atul Gawande at the New Yorker: “….More than ninety per cent of American hospitals have been computerized during the past decade, and more than half of Americans have their health information in the Epic system. Seventy thousand employees of Partners HealthCare—spread across twelve hospitals and hundreds of clinics in New England—were going to have to adopt the new software. I was in the first wave of implementation, along with eighteen thousand other doctors, nurses, pharmacists, lab techs, administrators, and the like.

The surgeons at the training session ranged in age from thirty to seventy, I estimated—about sixty per cent male, and one hundred per cent irritated at having to be there instead of seeing patients. Our trainer looked younger than any of us, maybe a few years out of college, with an early-Justin Bieber wave cut, a blue button-down shirt, and chinos. Gazing out at his sullen audience, he seemed unperturbed. I learned during the next few sessions that each instructor had developed his or her own way of dealing with the hostile rabble. One was encouraging and parental, another unsmiling and efficient. Justin Bieber took the driver’s-ed approach: You don’t want to be here; I don’t want to be here; let’s just make the best of it.

I did fine with the initial exercises, like looking up patients’ names and emergency contacts. When it came to viewing test results, though, things got complicated. There was a column of thirteen tabs on the left side of my screen, crowded with nearly identical terms: “chart review,” “results review,” “review flowsheet.” We hadn’t even started learning how to enter information, and the fields revealed by each tab came with their own tools and nuances.

But I wasn’t worried. I’d spent my life absorbing changes in computer technology, and I knew that if I pushed through the learning curve I’d eventually be doing some pretty cool things. In 1978, when I was an eighth grader in Ohio, I built my own one-kilobyte computer from a mail-order kit, learned to program in BASIC, and was soon playing the arcade game Pong on our black-and-white television set. The next year, I got a Commodore 64 from RadioShack and became the first kid in my school to turn in a computer-printed essay (and, shortly thereafter, the first to ask for an extension “because the computer ate my homework”). As my Epic training began, I expected my patience to be rewarded in the same way.

My hospital had, over the years, computerized many records and processes, but the new system would give us one platform for doing almost everything health professionals needed—recording and communicating our medical observations, sending prescriptions to a patient’s pharmacy, ordering tests and scans, viewing results, scheduling surgery, sending insurance bills. With Epic, paper lab-order slips, vital-signs charts, and hospital-ward records would disappear. We’d be greener, faster, better.

But three years later I’ve come to feel that a system that promised to increase my mastery over my work has, instead, increased my work’s mastery over me. I’m not the only one. A 2016 study found that physicians spent about two hours doing computer work for every hour spent face to face with a patient—whatever the brand of medical software. In the examination room, physicians devoted half of their patient time facing the screen to do electronic tasks. And these tasks were spilling over after hours. The University of Wisconsin found that the average workday for its family physicians had grown to eleven and a half hours. The result has been epidemic levels of burnout among clinicians. Forty per cent screen positive for depression, and seven per cent report suicidal thinking—almost double the rate of the general working population.

Something’s gone terribly wrong. Doctors are among the most technology-avid people in society; computerization has simplified tasks in many industries. Yet somehow we’ve reached a point where people in the medical profession actively, viscerally, volubly hate their computers….(More)”.

A Behavioral Economics Approach to Digitalisation


Paper by Dirk Beerbaum and Julia M. Puaschunder: “A growing body of academic research in the field of behavioural economics, political science and psychology demonstrates how an invisible hand can nudge people’s decisions towards a preferred option. Contrary to the assumptions of the neoclassical economics, supporters of nudging argue that people have problems coping with a complex world, because of their limited knowledge and their restricted rationality. Technological improvement in the age of information has increased the possibilities to control the innocent social media users or penalise private investors and reap the benefits of their existence in hidden persuasion and discrimination. Nudging enables nudgers to plunder the simple uneducated and uninformed citizen and investor, who is neither aware of the nudging strategies nor able to oversee the tactics used by the nudgers (Puaschunder 2017a, b; 2018a, b).

The nudgers are thereby legally protected by democratically assigned positions they hold. The law of motion of the nudging societies holds an unequal concentration of power of those who have access to compiled data and coding rules, relevant for political power and influencing the investor’s decision usefulness (Puaschunder 2017a, b; 2018a, b). This paper takes as a case the “transparency technology XBRL (eXtensible Business Reporting Language)” (Sunstein 2013, 20), which should make data more accessible as well as usable for private investors. It is part of the choice architecture on regulation by governments (Sunstein 2013). However, XBRL is bound to a taxonomy (Piechocki and Felden 2007).

Considering theoretical literature and field research, a representation issue (Beerbaum, Piechocki and Weber 2017) for principles-based accounting taxonomies exists, which intelligent machines applying Artificial Intelligence (AI) (Mwilu, Prat and Comyn-Wattiau 2015) nudge to facilitate decision usefulness. This paper conceptualizes ethical questions arising from the taxonomy engineering based on machine learning systems: Should the objective of the coding rule be to support or to influence human decision making or rational artificiality? This paper therefore advocates for a democratisation of information, education and transparency about nudges and coding rules (Puaschunder 2017a, b; 2018a, b)…(More)”.

Big data analytics to identify illegal construction waste dumping: A Hong Kong study


Weisheng Lu at Resources, Conservation and Recycling: “Illegal dumping, referring to the intentional and criminal abandonment of waste in unauthorized areas, has long plagued governments and environmental agencies worldwide. Despite the tremendous resources spent to combat it, the surreptitious nature of illegal dumping indicates the extreme difficulty in its identification. In 2006, the Construction Waste Disposal Charging Scheme (CWDCS) was implemented, regulating that all construction waste must be disposed of at government waste facilities if not otherwise properly reused or recycled.

While the CWDCS has significantly improved construction waste management in Hong Kong, it has also triggered illegal dumping problems. Inspired by the success of big data in combating urban crime, this paper aims to identify illegal dumping cases by mining a publicly available data set containing more than 9 million waste disposal records from 2011 to 2017. Using behavioral indicators and up-to-date big data analytics, possible drivers for illegal dumping (e.g., long queuing times) were identified. The analytical results also produced a list of 546 waste hauling trucks suspected of involvement in illegal dumping. This paper contributes to the understanding of illegal dumping behavior and joins the global research community in exploring the value of big data, particularly for combating urban crime. It also presents a three-step big data-enabled urban crime identification methodology comprising ‘Behavior characterization’, ‘Big data analytical model development’, and ‘Model training, calibration, and evaluation’….(More)”.
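The "behavior characterization" step of the three-step methodology can be illustrated with a small sketch over disposal records: compute per-truck gaps between consecutive weigh-in events and flag trucks whose typical gap is far above the fleet norm, long unexplained gaps being one possible proxy for detours off the sanctioned route. The specific indicator, threshold, and record fields here are our assumptions for illustration; the paper's own behavioral indicators (e.g., queuing times) and model calibration are not reproduced.

```python
from datetime import datetime
from collections import defaultdict
from statistics import median

def gaps_by_truck(records):
    """records: (truck_id, ISO timestamp) disposal events.
    Returns per-truck gaps, in minutes, between consecutive events."""
    events = defaultdict(list)
    for truck, ts in records:
        events[truck].append(datetime.fromisoformat(ts))
    gaps = {}
    for truck, times in events.items():
        times.sort()
        gaps[truck] = [(b - a).total_seconds() / 60
                       for a, b in zip(times, times[1:])]
    return gaps

def flag_suspects(gaps, factor=3.0):
    """Flag trucks whose median gap exceeds `factor` x the fleet median."""
    all_gaps = [g for gs in gaps.values() for g in gs]
    fleet = median(all_gaps)
    return sorted(t for t, gs in gaps.items()
                  if gs and median(gs) > factor * fleet)

records = [
    ("T1", "2017-05-01T08:00"), ("T1", "2017-05-01T08:30"), ("T1", "2017-05-01T09:00"),
    ("T2", "2017-05-01T08:00"), ("T2", "2017-05-01T08:30"), ("T2", "2017-05-01T09:00"),
    ("T9", "2017-05-01T08:00"), ("T9", "2017-05-01T13:00"), ("T9", "2017-05-01T18:00"),
]
suspects = flag_suspects(gaps_by_truck(records))  # T9's 5-hour gaps stand out
```

At the scale of 9 million records, the same characterize-then-threshold logic would feed the "model development" and "training, calibration, and evaluation" steps rather than a fixed multiplier.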

Governing Open Data Platforms to Cultivate Innovation Ecosystems: The Case of the Government of Buenos Aires


Paper by Carla Bonina, Ben Eaton and Stefan Henningsson: “Open Government Data (OGD) is increasingly an object of research. Whilst referred to as a platform problem, few studies examine the phenomenon using platform concepts. One challenge governments face is to establish thriving OGD ecosystems through appropriate platform governance. The governance of innovating complements on the demand side of platforms, such as services using OGD datasets by entrepreneurs for citizen users, is well studied in platform literature. However, understanding of the supply side and how third parties can be governed to help innovate platform core architecture, such as ministries sourcing quality datasets for OGD platforms, is lacking. In our preliminary study of emergent OGD platforms in Latin America, we construct a model extending concepts of boundary resources from the demand side to the supply side to expand our understanding of platform governance. In addition to contributing to platform governance theories, we improve our understanding of OGD platform ecosystem cultivation….(More)”.

An open-science crowdsourcing approach for producing community noise maps using smartphones


Judicaël Picaut et al. at Building and Environment: “An alternative method is proposed for the assessment of the noise environment, on the basis of a crowdsourcing approach. For this purpose, a smartphone application and a spatial data infrastructure have been specifically developed in order to collect physical data (noise indicators, GPS positions, etc.) and perceptual data (pleasantness), without territorial limits, of the sound environment.
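Turning such crowdsourced samples into a community noise map requires aggregating decibel readings per map cell, and because the decibel scale is logarithmic this is an energetic mean, not an arithmetic one: Leq = 10·log10((1/n)·Σ 10^(Li/10). The sketch below, with a grid resolution and record layout of our own choosing rather than the paper's, groups GPS-tagged dB(A) samples into lat/lon cells and computes each cell's equivalent level.

```python
import math
from collections import defaultdict

def leq(levels_db):
    """Equivalent continuous level: energetic mean of dB values.
    Note a single loud sample dominates quieter ones."""
    return 10 * math.log10(
        sum(10 ** (l / 10) for l in levels_db) / len(levels_db))

def grid_map(samples, cell_deg=0.001):
    """samples: (lat, lon, dB) tuples -> {cell: Leq} on a lat/lon grid."""
    cells = defaultdict(list)
    for lat, lon, db in samples:
        key = (round(lat / cell_deg), round(lon / cell_deg))
        cells[key].append(db)
    return {k: leq(v) for k, v in cells.items()}

# Two samples in the same cell: 60 dB and 70 dB average to ~67.4 dB,
# not 65 dB, because the louder event carries ten times the energy.
m = grid_map([(48.8566, 2.3522, 60.0), (48.8566, 2.3522, 70.0)])
```

A production pipeline would add the points the paper raises (sensor calibration across phone models, temporal weighting, perceptual "pleasantness" layers), but the energetic-mean core stays the same.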

As the project is developed within an Open Science framework, all source codes, methodologies, tools and raw data are freely available, and if necessary, can be duplicated for any specific use. In particular, the collected data can be used by the scientific community, cities, associations, or any institution, which would like to develop new tools for the evaluation and representation of sound environments. In this paper, all the methodological and technical issues are detailed, and a first analysis of the collected data is proposed….(More)”.