The Global Commons of Data


Paper by Jennifer Shkabatur: “Data platform companies (such as Facebook, Google, or Twitter) amass and process immense amounts of data that is generated by their users. These companies primarily use the data to advance their commercial interests, but there is a growing public dismay regarding the adverse and discriminatory impacts of their algorithms on society at large. The regulation of data platform companies and their algorithms has been hotly debated in the literature, but current approaches often neglect the value of data collection, defy the logic of algorithmic decision-making, and exceed the platform companies’ operational capacities.

This Article suggests a different approach — an open, collaborative, and incentives-based stance toward data platforms that takes full advantage of the tremendous societal value of user-generated data. It contends that this data shall be recognized as a “global commons,” and access to it shall be made available to a wide range of independent stakeholders — research institutions, journalists, public authorities, and international organizations. These external actors would be able to utilize the data to address a variety of public challenges, as well as observe from within the operation and impacts of the platforms’ algorithms.

After making the theoretical case for the “global commons of data,” the Article explores the practical implementation of this model. First, it argues that a data commons regime should operate through a spectrum of data sharing and usage modalities that would protect the commercial interests of data platforms and the privacy of data users. Second, it discusses regulatory measures and incentives that can solicit the collaboration of platform companies with the commons model. Lastly, it explores the challenges embedded in this approach….(More)”.

A Third of Wikipedia Discussions Are Stuck in Forever Beefs


Samantha Cole at Motherboard: “Wikipedia, the internet’s encyclopedia, is run entirely by volunteers—people who spend large swaths of their personal time making sure the information that hundreds of millions of people access every day stays accurate and up-to-date. Among those volunteers, just one percent of Wikipedia editors write 77 percent of Wikipedia articles. As such, tensions tend to get a little high, because these editors are often highly invested. They’ve been arguing about corn for nearly a decade, for example, and there’s a long-running edit war about the meaning of neuroticism.

When editors disagree about an edit to be made on a Wikipedia article, they start by discussing it on the article’s Talk page. When that doesn’t result in a decision, they can open a Request for Comment (RfC). From there, any editor can choose a side or discuss the merits of whatever edit is up for discussion, and—in theory—come to an agreement. Or at least, some kind of decision about how to make the edit.

But a new study by MIT researchers found that as many as one-third of RfC disputes go unresolved, often abandoned out of frustration or exhaustion. The most common sticking points were chalked up to inexperience, inattention from experienced editors, and just plain petty bickering….

But they didn’t just critique how Wikipedians argue: The researchers developed a tool called Wikum that they say will help resolve more discussions, and make it easier for editors to stay involved when arguments get gnarly. The tool uses the data they found and analyzed in this research to summarize threads and predict when they’re at risk of going stale….(More)”.
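The excerpt does not describe Wikum's features or model, so the sketch below only illustrates what a staleness predictor of this general kind might look like, using hypothetical features (comment count, participant count, days since the last reply) and a hand-tuned heuristic in place of the researchers' trained model.

```python
from dataclasses import dataclass

@dataclass
class Thread:
    """Minimal stand-in for an RfC discussion thread (illustrative fields only)."""
    num_comments: int
    num_participants: int
    days_since_last_reply: float

def staleness_risk(t: Thread) -> float:
    """Heuristic risk score in [0, 1]: long, low-participation, quiet threads score high.

    A hand-tuned placeholder, not the model described in the MIT study.
    """
    length_factor = min(t.num_comments / 100.0, 1.0)            # very long threads drag on
    silence_factor = min(t.days_since_last_reply / 30.0, 1.0)   # a month of silence is a bad sign
    crowd_factor = 1.0 - min(t.num_participants / 20.0, 1.0)    # few voices, little momentum
    return round((length_factor + silence_factor + crowd_factor) / 3.0, 2)

if __name__ == "__main__":
    rfc = Thread(num_comments=140, num_participants=4, days_since_last_reply=21)
    print(f"Estimated staleness risk: {staleness_risk(rfc)}")
```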

Another Use for A.I.: Finding Millions of Unregistered Voters


Steve Lohr at The New York Times: “The mechanics of elections that attract the most attention are casting and counting, snafus with voting machines and ballots and allegations of hacking and fraud. But Jeff Jonas, a prominent data scientist, is focused on something else: the integrity, updating and expansion of voter rolls.

“As I dove into the subject, it grew on me, the complexity and relevance of the problem,” he said.

As a result, Mr. Jonas has played a geeky, behind-the-scenes role in encouraging turnout for the midterm elections on Tuesday.

For the last four years, Mr. Jonas has used his software for a multistate project known as the Electronic Registration Information Center (ERIC) that identifies eligible voters and cleans up voter rolls. Since its founding in 2012, the nonprofit center has identified 26 million people who are eligible but unregistered to vote, as well as 10 million registered voters who have moved, appear on more than one list or have died.

“I have no doubt that more people are voting as a result of ERIC,” said John Lindback, a former senior election administrator in Oregon and Alaska who was the center’s first executive director.

Voter rolls, like nearly every aspect of elections, are a politically charged issue. ERIC, brought together by the Pew Charitable Trusts, is meant to play it down the middle. It was started largely with professional election administrators, from both red and blue states.

But the election officials recognized that their headaches often boiled down to a data-handling challenge. Then Mr. Jonas added his technology, which has been developed and refined for decades. It is artificial intelligence software fine-tuned for spotting and resolving identities, whether people or things….(More)”.
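The article leaves the matching technology at a high level; identity resolution of this kind typically means linking records that refer to the same person across differently formatted files. The sketch below is a toy illustration under that assumption, with invented voter-file fields and a crude string-similarity threshold, not a description of Mr. Jonas's software.

```python
from difflib import SequenceMatcher

# Illustrative records from two hypothetical state voter files.
state_a = [
    {"name": "JONATHAN Q SMITH", "dob": "1970-03-02", "addr": "12 Oak St, Springfield"},
    {"name": "MARIA GARCIA",     "dob": "1988-11-15", "addr": "44 Elm Ave, Portland"},
]
state_b = [
    {"name": "Jon Q. Smith",     "dob": "1970-03-02", "addr": "980 Pine Rd, Denver"},
    {"name": "Maria L. Garcia",  "dob": "1962-07-09", "addr": "3 Birch Ct, Eugene"},
]

def normalize(s: str) -> str:
    """Lowercase and strip punctuation so superficial formatting differences don't matter."""
    return "".join(ch for ch in s.lower() if ch.isalnum() or ch.isspace()).strip()

def same_person(r1: dict, r2: dict, name_threshold: float = 0.75) -> bool:
    """Block on date of birth, then compare normalized names by string similarity."""
    if r1["dob"] != r2["dob"]:
        return False
    ratio = SequenceMatcher(None, normalize(r1["name"]), normalize(r2["name"])).ratio()
    return ratio >= name_threshold

# Flag likely movers: the same person registered at different addresses in two states.
for a in state_a:
    for b in state_b:
        if same_person(a, b) and normalize(a["addr"]) != normalize(b["addr"]):
            print(f"Possible duplicate/mover: {a['name']!r} <-> {b['name']!r}")
```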

Governments fail to capitalise on swaths of open data


Valentina Romei in the Financial Times: “…Behind the push for open data is a desire to make governments more transparent, accountable and efficient — but also to allow businesses to create products and services that spark economic development. The global annual opportunity cost of failing to do this effectively is about $5tn, according to one estimate from McKinsey, the consultancy.

The UK is not the only country falling short, says the Open Data Barometer, which monitors the status of government data across the world. Among the 30 leading governments — those that have championed the open data movement and have made progress over five years — “less than a quarter of the data with the biggest potential for social and economic impact” is truly open. This goal of transparency, it seems, has not proved sufficient for “creating value” — the movement’s latest focus. In 2015, nearly a decade after advocates first discussed the principles of open government data, 62 countries adopted the six Open Data Charter principles — which called for data to be open by default, usable and comparable….

The use of open data has already borne fruit for some countries. In 2015, Japan’s ministry of land, infrastructure and transport set up an open data site aimed at disabled and elderly people. The 7,000 data points published are downloadable and the service can be used to generate a map that shows which passenger terminals on train, bus and ferry networks provide barrier-free access.

In the US, The Climate Corporation, a digital agriculture company, combined 30 years of weather data and 60 years of crop yield data to help farmers increase their productivity. And in the UK, subscription service Land Insight merges different sources of land data to help individuals and developers compare property information, forecast selling prices, contact land owners and track planning applications…
Open Data 500, an international network of organisations that studies the use and impact of open data, reveals that private companies in South Korea are using government agency data, with technology, advertising and business services among the biggest users. It shows, for example, that Archidraw, a four-year-old Seoul-based company that provides 3D visualisation tools for interior design and property remodelling, has used mapping data from the Ministry of Land, Infrastructure and Transport…(More)”.
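The article does not publish any schemas, but the Land Insight example amounts to joining several open datasets on a shared property identifier. A toy sketch of that pattern, with entirely hypothetical column names and values, might look like this:

```python
import pandas as pd

# Hypothetical extracts from two open datasets, keyed by a shared parcel identifier.
land_registry = pd.DataFrame({
    "parcel_id": ["P001", "P002", "P003"],
    "owner": ["Acme Estates Ltd", "J. Doe", "Crown"],
    "last_sale_price_gbp": [450_000, 320_000, None],
})
planning_applications = pd.DataFrame({
    "parcel_id": ["P001", "P003"],
    "application_status": ["approved", "pending"],
    "proposed_use": ["residential", "commercial"],
})

# Left-join so every parcel is kept even when no planning application exists.
merged = land_registry.merge(planning_applications, on="parcel_id", how="left")
print(merged)
```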

Beyond Open vs. Closed: Balancing Individual Privacy and Public Accountability in Data Sharing


Paper by Bill Howe et al: “Data too sensitive to be “open” for analysis and re-purposing typically remains “closed” as proprietary information. This dichotomy undermines efforts to make algorithmic systems more fair, transparent, and accountable. Access to proprietary data in particular is needed by government agencies to enforce policy, researchers to evaluate methods, and the public to hold agencies accountable; all of these needs must be met while preserving individual privacy and firm competitiveness. In this paper, we describe an integrated legal-technical approach provided by a third-party public-private data trust designed to balance these competing interests.

Basic membership allows firms and agencies to enable low-risk access to data for compliance reporting and core methods research, while modular data sharing agreements support a wide array of projects and use cases. Unless specifically stated otherwise in an agreement, all data access is initially provided to end users through customized synthetic datasets that offer a) strong privacy guarantees, b) removal of signals that could expose competitive advantage for the data providers, and c) removal of biases that could reinforce discriminatory policies, all while maintaining empirically good fidelity to the original data. We find that the liberal use of synthetic data, in conjunction with strong legal protections over raw data, strikes a tunable balance between transparency, proprietorship, privacy, and research objectives; and that the legal-technical framework we describe can form the basis for organizational data trusts in a variety of contexts….(More)”.
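The excerpt does not spell out the synthesis method, but the combination of "strong privacy guarantees" with "empirically good fidelity" can be illustrated at toy scale with a differentially private histogram synthesizer: perturb the counts of a sensitive categorical column with Laplace noise, then sample synthetic rows from the noisy distribution. The column, the data, and the epsilon value below are all invented for illustration.

```python
import random
from collections import Counter

def dp_synthesize(values, epsilon=1.0, n_synthetic=1000, rng=random.Random(42)):
    """Toy differentially private synthesizer for one categorical column.

    Adds Laplace(1/epsilon) noise to each category count (sensitivity 1 for a
    single-attribute histogram), clips negatives, renormalizes, and samples.
    """
    counts = Counter(values)
    noisy = {k: max(v + rng.expovariate(epsilon) - rng.expovariate(epsilon), 0.0)
             for k, v in counts.items()}  # difference of exponentials ~ Laplace(1/epsilon)
    total = sum(noisy.values()) or 1.0
    categories = list(noisy)
    weights = [noisy[c] / total for c in categories]
    return rng.choices(categories, weights=weights, k=n_synthetic)

# Invented "sensitive" column: e.g., service type in an agency's case records.
real_column = ["housing"] * 520 + ["transit"] * 310 + ["permits"] * 170
synthetic_column = dp_synthesize(real_column)
print(Counter(synthetic_column))  # roughly preserves the real proportions
```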

The Nail Finds a Hammer: Self-Sovereign Identity, Design Principles, and Property Rights in the Developing World


Report by Michael Graglia, Christopher Mellon and Tim Robustelli: “Our interest in identity systems was an inevitable outgrowth of our earlier work on blockchain-based land registries. Property registries, which at the simplest level are ledgers of who has which rights to which asset, require a very secure and reliable means of identifying both people and properties. In the course of investigating solutions to that problem, we began to appreciate the broader challenges of digital identity and its role in international development. And the more we learned about digital identity, the more convinced we became of the need for self-sovereign identity, or SSI. This model, and the underlying principles of identity which it incorporates, will be described in detail in this paper.

We believe that the great potential of SSI is that it can make identity in the digital world function more like identity in the physical world, in which every person has a unique and persistent identity which is represented to others by means of both their physical attributes and a collection of credentials attested to by various external sources of authority. These credentials are stored and controlled by the identity holder—typically in a wallet—and presented to different people for different reasons at the identity holder’s discretion. Crucially, the identity holder controls what information to present based on the environment, trust level, and type of interaction. Moreover, their fundamental identity persists even though the credentials by which it is represented may change over time.

The digital incarnation of this model has many benefits, including both greatly improved privacy and security, and the ability to create more trustworthy online spaces. Social media and news sites, for example, might limit participation to users with verified identities, excluding bots and impersonators.

The need for identification in the physical world varies based on location and social context. We expect to walk in relative anonymity down a busy city street, but will show a driver’s license to enter a bar, and both a driver’s license and a birth certificate to apply for a passport. There are different levels of ID and supporting documents required for each activity. But in each case, access to personal information is controlled by the user who may choose whether or not to share it.

Self-sovereign identity gives users complete control of their own identities and related personal data, which sits encrypted in distributed storage instead of being stored by a third party in a central database. In older, “federated identity” models, a single account—a Google account, for example—might be used to log in to a number of third-party sites, like news sites or social media platforms. But in this model a third party brokers all of these ID transactions, meaning that in exchange for the convenience of having to remember fewer passwords, the user must sacrifice a degree of privacy.

A real world equivalent would be having to ask the state to share a copy of your driver’s license with the bar every time you wanted to prove that you were over the age of 21. SSI, in contrast, gives the user a portable, digital credential (like a driver’s license or some other document that proves your age), the authenticity of which can be securely validated via cryptography without the recipient having to check with the authority that issued it. This means that while the credential can be used to access many different sites and services, there is no third-party broker to track the services to which the user is authenticating. Furthermore, cryptographic techniques called “zero-knowledge proofs” (ZKPs) can be used to prove possession of a credential without revealing the credential itself. This makes it possible, for example, for users to prove that they are over the age of 21 without having to share their actual birth dates, which are both sensitive information and irrelevant to a binary, yes-or-no ID transaction….(More)”.
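The report describes the mechanism in prose only; the following sketch shows the core move in simplified form using Ed25519 signatures from the third-party cryptography package. An issuer signs a minimal claim ("over 21"), and any verifier can check that signature offline against the issuer's published public key, without contacting the issuer or seeing a birth date. This illustrates selective disclosure with an ordinary digital signature, not an actual zero-knowledge proof, and the claim format is invented.

```python
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# --- Issuer side (e.g., a DMV): sign a minimal claim, not the birth date itself. ---
issuer_key = Ed25519PrivateKey.generate()
issuer_public_key = issuer_key.public_key()  # published once, e.g., on a ledger

claim = json.dumps({"subject": "did:example:alice", "over_21": True}, sort_keys=True).encode()
signature = issuer_key.sign(claim)

# --- Holder stores (claim, signature) in a wallet and presents both to a verifier. ---

# --- Verifier side (e.g., a bar): check the signature offline, no callback to the issuer. ---
try:
    issuer_public_key.verify(signature, claim)
    print("Credential accepted:", json.loads(claim))
except InvalidSignature:
    print("Credential rejected: signature does not match the trusted issuer key.")
```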

Why Doctors Hate Their Computers


Atul Gawande at the New Yorker: “….More than ninety per cent of American hospitals have been computerized during the past decade, and more than half of Americans have their health information in the Epic system. Seventy thousand employees of Partners HealthCare—spread across twelve hospitals and hundreds of clinics in New England—were going to have to adopt the new software. I was in the first wave of implementation, along with eighteen thousand other doctors, nurses, pharmacists, lab techs, administrators, and the like.

The surgeons at the training session ranged in age from thirty to seventy, I estimated—about sixty per cent male, and one hundred per cent irritated at having to be there instead of seeing patients. Our trainer looked younger than any of us, maybe a few years out of college, with an early-Justin Bieber wave cut, a blue button-down shirt, and chinos. Gazing out at his sullen audience, he seemed unperturbed. I learned during the next few sessions that each instructor had developed his or her own way of dealing with the hostile rabble. One was encouraging and parental, another unsmiling and efficient. Justin Bieber took the driver’s-ed approach: You don’t want to be here; I don’t want to be here; let’s just make the best of it.

I did fine with the initial exercises, like looking up patients’ names and emergency contacts. When it came to viewing test results, though, things got complicated. There was a column of thirteen tabs on the left side of my screen, crowded with nearly identical terms: “chart review,” “results review,” “review flowsheet.” We hadn’t even started learning how to enter information, and the fields revealed by each tab came with their own tools and nuances.

But I wasn’t worried. I’d spent my life absorbing changes in computer technology, and I knew that if I pushed through the learning curve I’d eventually be doing some pretty cool things. In 1978, when I was an eighth grader in Ohio, I built my own one-kilobyte computer from a mail-order kit, learned to program in basic, and was soon playing the arcade game Pong on our black-and-white television set. The next year, I got a Commodore 64 from RadioShack and became the first kid in my school to turn in a computer-printed essay (and, shortly thereafter, the first to ask for an extension “because the computer ate my homework”). As my Epic training began, I expected my patience to be rewarded in the same way.

My hospital had, over the years, computerized many records and processes, but the new system would give us one platform for doing almost everything health professionals needed—recording and communicating our medical observations, sending prescriptions to a patient’s pharmacy, ordering tests and scans, viewing results, scheduling surgery, sending insurance bills. With Epic, paper lab-order slips, vital-signs charts, and hospital-ward records would disappear. We’d be greener, faster, better.

But three years later I’ve come to feel that a system that promised to increase my mastery over my work has, instead, increased my work’s mastery over me. I’m not the only one. A 2016 study found that physicians spent about two hours doing computer work for every hour spent face to face with a patient—whatever the brand of medical software. In the examination room, physicians devoted half of their patient time facing the screen to do electronic tasks. And these tasks were spilling over after hours. The University of Wisconsin found that the average workday for its family physicians had grown to eleven and a half hours. The result has been epidemic levels of burnout among clinicians. Forty per cent screen positive for depression, and seven per cent report suicidal thinking—almost double the rate of the general working population.

Something’s gone terribly wrong. Doctors are among the most technology-avid people in society; computerization has simplified tasks in many industries. Yet somehow we’ve reached a point where people in the medical profession actively, viscerally, volubly hate their computers….(More)”.

Beyond democracy: could seasteads and cryptocurrencies replace the nation state?


Patri Friedman in The Spectator: “For the past 20 years I’ve been working to enable start-up societies: permanent autonomous zones on land or at sea intended to accelerate economic development and to serve as laboratories for voluntary political experiments.

For just as long (in fact since I first read The Sovereign Individual), I’ve been interested in the potential of digital cash, which is finally arriving in the form of bitcoin and the emerging cryptocurrency industry.

Start-up societies and cryptocurrencies have many parallels. Both grew from individualist movements seeking ways to take their philosophy from online message boards to the real world. Both seek to decentralise power in order to disrupt traditional institutions seen as having been captured by selfish elites. And both are critically dependent on ‘governance’ — the technology of designing and enforcing rules for collective decision-making.

Because of these parallels, people are often curious about how the two movements relate. Will seasteads — as manmade permanent dwellings at sea are known — use cryptocurrencies? Will blockchain projects such as Bitnation replace the nation state? In a world of competing virtual economic systems, do we even need to reform government in real life? (Answers: maybe, not soon and absolutely.)

There’s an old saying that we overestimate what we can accomplish in a week, but underestimate what we can accomplish in a decade. Similarly, I think people greatly overestimate the immediate impact of blockchain on start-up countries, while underestimating the degree to which the fates of start-up countries and blockchain are ultimately intertwined.

In the near term, I don’t believe that blockchain will somehow enable start-up societies. The reason is simple: the hard thing about starting a new country is not the payment system. That’s why we live in a world with 1,000 cryptocurrencies but no sovereign micro-nations.

I’m also sceptical of the crypto-anarchy theory that rapidly evolving online institutions will somehow remove the need for improving offline ones. Physical space underpins virtual space, and most human activity still happens in physical space. Moreover, no matter how transcendently effulgent your networked life is, it can be ended by a single bullet. So the performance of your friendly neighbourhood nation state, with its monopoly on physical violence, still matters in the digital age…(More)”

A Behavioral Economics Approach to Digitalisation


Paper by Dirk Beerbaum and Julia M. Puaschunder: “A growing body of academic research in the field of behavioural economics, political science and psychology demonstrates how an invisible hand can nudge people’s decisions towards a preferred option. Contrary to the assumptions of neoclassical economics, supporters of nudging argue that people have problems coping with a complex world, because of their limited knowledge and their restricted rationality. Technological improvement in the age of information has increased the possibilities to control innocent social media users or penalise private investors and reap the benefits of their existence in hidden persuasion and discrimination. Nudging enables nudgers to plunder the simple uneducated and uninformed citizen and investor, who is neither aware of the nudging strategies nor able to oversee the tactics used by the nudgers (Puaschunder 2017a, b; 2018a, b).

The nudgers are thereby legally protected by the democratically assigned positions they hold. The law of motion of the nudging societies holds an unequal concentration of power of those who have access to compiled data and coding rules, relevant for political power and influencing the investor’s decision usefulness (Puaschunder 2017a, b; 2018a, b). This paper takes as a case the “transparency technology XBRL (eXtensible Business Reporting Language)” (Sunstein 2013, 20), which should make data more accessible as well as usable for private investors. It is part of the choice architecture on regulation by governments (Sunstein 2013). However, XBRL is bound to a taxonomy (Piechocki and Felden 2007).

Considering theoretical literature and field research, a representation issue (Beerbaum, Piechocki and Weber 2017) for principles-based accounting taxonomies exists, which intelligent machines applying Artificial Intelligence (AI) (Mwilu, Prat and Comyn-Wattiau 2015) nudge to facilitate decision usefulness. This paper conceptualizes ethical questions arising from the taxonomy engineering based on machine learning systems: Should the objective of the coding rule be to support or to influence human decision making or rational artificiality? This paper therefore advocates for a democratisation of information, education and transparency about nudges and coding rules (Puaschunder 2017a, b; 2018a, b)…(More)”.
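For readers unfamiliar with XBRL, the argument is easier to follow against a concrete instance: facts are machine-readable XML elements whose meaning is fixed by a taxonomy, so whoever engineers the taxonomy shapes what software, and therefore an investor, can extract. The fragment and namespace below are simplified and purely illustrative, not an official taxonomy.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative XBRL-style instance fragment (not a real taxonomy or filing).
INSTANCE = """
<xbrl xmlns:demo="http://example.com/demo-taxonomy">
  <demo:Revenues contextRef="FY2018" unitRef="EUR" decimals="0">125000000</demo:Revenues>
  <demo:NetIncome contextRef="FY2018" unitRef="EUR" decimals="0">9800000</demo:NetIncome>
</xbrl>
"""

root = ET.fromstring(INSTANCE)
for fact in root:  # each child element is one tagged financial fact
    concept = fact.tag.split("}")[-1]  # strip the taxonomy namespace for display
    print(f"{concept}: {int(fact.text):,} {fact.get('unitRef')} ({fact.get('contextRef')})")
```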

Big data analytics to identify illegal construction waste dumping: A Hong Kong study


Weisheng Lu at Resources, Conservation and Recycling: “Illegal dumping, referring to the intentional and criminal abandonment of waste in unauthorized areas, has long plagued governments and environmental agencies worldwide. Despite the tremendous resources spent to combat it, the surreptitious nature of illegal dumping indicates the extreme difficulty in its identification. In 2006, the Construction Waste Disposal Charging Scheme (CWDCS) was implemented, regulating that all construction waste must be disposed of at government waste facilities if not otherwise properly reused or recycled.

While the CWDCS has significantly improved construction waste management in Hong Kong, it has also triggered illegal dumping problems. Inspired by the success of big data in combating urban crime, this paper aims to identify illegal dumping cases by mining a publicly available data set containing more than 9 million waste disposal records from 2011 to 2017. Using behavioral indicators and up-to-date big data analytics, possible drivers for illegal dumping (e.g., long queuing times) were identified. The analytical results also produced a list of 546 waste hauling trucks suspected of involvement in illegal dumping. This paper contributes to the understanding of illegal dumping behavior and joins the global research community in exploring the value of big data, particularly for combating urban crime. It also presents a three-step big data-enabled urban crime identification methodology comprising ‘Behavior characterization’, ‘Big data analytical model development’, and ‘Model training, calibration, and evaluation’….(More)”.
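The full methodology sits behind the excerpt, but the "behavior characterization" step can be illustrated with a toy version: aggregate disposal records per truck, compute a simple behavioral indicator, and flag trucks that deviate sharply from the fleet norm. The records, the indicator (dwell time at the facility, a loose stand-in for the queuing-time signal mentioned above), and the threshold are all invented.

```python
from statistics import mean, stdev
from collections import defaultdict

# Invented disposal records: (truck_id, minutes spent inside the facility per visit).
records = [
    ("TRK-101", 18), ("TRK-101", 22), ("TRK-101", 20),
    ("TRK-202", 19), ("TRK-202", 23), ("TRK-202", 21),
    ("TRK-303", 55), ("TRK-303", 61), ("TRK-303", 58),  # consistently abnormal dwell times
]

# Step 1: behavior characterization -- one indicator per truck (mean dwell time).
dwell = defaultdict(list)
for truck, minutes in records:
    dwell[truck].append(minutes)
indicator = {truck: mean(times) for truck, times in dwell.items()}

# Steps 2-3: a crude "model" -- z-score each truck against the fleet and flag outliers.
fleet_mean = mean(indicator.values())
fleet_sd = stdev(indicator.values())
suspects = [t for t, v in indicator.items() if abs(v - fleet_mean) / fleet_sd > 1.0]

print("Suspect trucks for follow-up inspection:", suspects)
```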