Who will benefit most from the data economy?


Special Report by The Economist: “The data economy is a work in progress. Its economics still have to be worked out; its infrastructure and its businesses need to be fully built; geopolitical arrangements must be found. But there is one final major tension: between the wealth the data economy will create and how it will be distributed. The data economy—or the “second economy”, as Brian Arthur of the Santa Fe Institute terms it—will make the world a more productive place no matter what, he predicts. But who gets what and how is less clear. “We will move from an economy where the main challenge is to produce more and more efficiently,” says Mr Arthur, “to one where distribution of the wealth produced becomes the biggest issue.”

The data economy as it exists today is already very unequal. It is dominated by a few big platforms. In the most recent quarter, Amazon, Apple, Alphabet, Microsoft and Facebook made a combined profit of $55bn, more than the next five most valuable American tech firms over the past 12 months. This corporate inequality is largely the result of network effects—economic forces that mean size begets size. A firm that can collect a lot of data, for instance, can make better use of artificial intelligence and attract more users, who in turn supply more data. Such firms can also recruit the best data scientists and have the cash to buy the best ai startups.

It is also becoming clear that, as the data economy expands, these sorts of dynamics will increasingly apply to non-tech companies and even countries. In many sectors, the race to become a dominant data platform is on. This is the mission of Compass, a startup, in residential property. It is one goal of Tesla in self-driving cars. And Apple and Google hope to repeat the trick in health care. As for countries, America and China account for 90% of the market capitalisation of the world’s 70 largest platforms (see chart), Africa and Latin America for just 1%. Economies on both continents risk “becoming mere providers of raw data…while having to pay for the digital intelligence produced,” the United Nations Conference on Trade and Development recently warned.

Yet it is the skewed distribution of income between capital and labour that may turn out to be the most pressing problem of the data economy. As it grows, more labour will migrate into the mirror worlds, just as other economic activity will. It is not only that people will do more digitally, but they will perform actual “data work”: generating the digital information needed to train and improve ai services. This can mean simply moving about online and providing feedback, as most people already do. But it will increasingly include more active tasks, such as labelling pictures, driving data-gathering vehicles and perhaps, one day, putting one’s digital twin through its paces. This is the reason why some say ai should actually be called “collective intelligence”: it takes in a lot of human input—something big tech firms hate to admit….(More)”.

We All Wear Tinfoil Hats Now


Article by Geoff Shullenberger on “How fears of mind control went from paranoid delusion to conventional wisdom”: “In early 2017, after the double shock of Brexit and the election of Donald Trump, the British data-mining firm Cambridge Analytica gained sudden notoriety. The previously little-known company, reporters claimed, had used behavioral influencing techniques to turn out social media users to vote in both elections. By its own account, Cambridge Analytica had worked with both campaigns to produce customized propaganda for targeting individuals on Facebook likely to be swept up in the tide of anti-immigrant populism. Its methods, some news sources suggested, might have sent enough previously disengaged voters to the polls to have tipped the scales in favor of the surprise victors. To a certain segment of the public, this story seemed to answer the question raised by both upsets: How was it possible that the seemingly solid establishment consensus had been rejected? What’s more, the explanation confirmed everything that seemed creepy about the Internet, evoking a sci-fi vision of social media users turned into an army of political zombies, mobilized through subliminal manipulation.

Cambridge Analytica’s violations of Facebook users’ privacy have made it an enduring symbol of the dark side of social media. However, the more dramatic claims about the extent of the company’s political impact collapse under closer scrutiny, mainly because its much-hyped “psychographic targeting” methods probably don’t work. As former Facebook product manager Antonio García Martínez noted in a 2018 Wired article, “the public, with no small help from the media sniffing a great story, is ready to believe in the supernatural powers of a mostly unproven targeting strategy,” but “most ad insiders express skepticism about Cambridge Analytica’s claims of having influenced the election, and stress the real-world difficulty of changing anyone’s mind about anything with mere Facebook ads, least of all deeply ingrained political views.” According to García, the entire affair merely confirms a well-established truth: “In the ads world, just because a product doesn’t work doesn’t mean you can’t sell it….(More)”.

Experts say privately held data available in the European Union should be used better and more


European Commission: “Data can solve problems from traffic jams to disaster relief, but European countries are not yet using this data to its full potential, experts say in a report released today. More secure and regular data sharing across the EU could help public administrations use private sector data for the public good.

In order to increase Business-to-Government (B2G) data sharing, the experts advise to make data sharing in the EU easier by taking policy, legal and investment measures in three main areas:

  1. Governance of B2G data sharing across the EU: such as putting in place national governance structures, setting up a recognised function (‘data stewards’) in public and private organisations, and exploring the creation of a cross-EU regulatory framework.
  2. Transparency, citizen engagement and ethics: such as making B2G data sharing more citizen-centric, developing ethical guidelines, and investing in training and education.
  3. Operational models, structures and technical tools: such as creating incentives for companies to share data, carrying out studies on the benefits of B2G data sharing, and providing support to develop the technical infrastructure through the Horizon Europe and Digital Europe programmes.

They also revised the principles on private sector data sharing in B2G contexts and included new principles on accountability and on fair and ethical data use, which should guide B2G data sharing for the public interest. Examples of successful B2G data sharing partnerships in the EU include an open forest data system in Finland to help manage the ecosystem, mapping of EU fishing activities using ship tracking data, and genome sequencing data of breast cancer patients to identify new personalised treatments. …

The High-Level Expert Group on Business-to-Government Data Sharing was set up in autumn 2018 and includes members from a broad range of interests and sectors. The recommendations presented today in its final report feed into the European strategy for data and can be used as input for other possible future Commission initiatives on Business-to-Government data sharing….(More)”.

Government by Algorithm: Artificial Intelligence in Federal Administrative Agencies


The Administrative Conference of the United States: “Artificial intelligence (AI) promises to transform how government agencies do their work. Rapid developments in AI have the potential to reduce the cost of core governance functions, improve the quality of decisions, and unleash the power of administrative data, thereby making government performance more efficient and effective. Agencies that use AI to realize these gains will also confront important questions about the proper design of algorithms and user interfaces, the respective scope of human and machine decision-making, the boundaries between public actions and private contracting, their own capacity to learn over time using AI, and whether the use of AI is even permitted.

These are important issues for public debate and academic inquiry. Yet little is known about how agencies are currently using AI systems beyond a few headlinegrabbing examples or surface-level descriptions. Moreover, even amidst growing public and  scholarly discussion about how society might regulate government use of AI, little attention has been devoted to how agencies acquire such tools in the first place or oversee their use. In an effort to fill these gaps, the Administrative Conference of the United States (ACUS) commissioned this report from researchers at Stanford University and New York University. The research team included a diverse set of lawyers, law students, computer scientists, and social scientists with the capacity to analyze these cutting-edge issues from technical, legal, and policy angles. The resulting report offers three cuts at federal agency use of AI:

  • a rigorous canvass of AI use at the 142 most significant federal departments, agencies, and sub-agencies (Part I)
  • a series of in-depth but accessible case studies of specific AI applications at seven leading agencies covering a range of governance tasks (Part II); and
  • a set of cross-cutting analyses of the institutional, legal, and policy challenges raised by agency use of AI (Part III)….(More)”

Urban Systems Design Creating Sustainable Smart Cities in the Internet of Things Era


Book edited by Yoshiki Yamagata and Perry P.J. Yang: “…shows how to design, model and monitor smart communities using a distinctive IoT-based urban systems approach. Focusing on the essential dimensions that constitute smart communities energy, transport, urban form, and human comfort, this helpful guide explores how IoT-based sharing platforms can achieve greater community health and well-being based on relationship building, trust, and resilience. Uncovering the achievements of the most recent research on the potential of IoT and big data, this book shows how to identify, structure, measure and monitor multi-dimensional urban sustainability standards and progress.

This thorough book demonstrates how to select a project, which technologies are most cost-effective, and their cost-benefit considerations. The book also illustrates the financial, institutional, policy and technological needs for the successful transition to smart cities, and concludes by discussing both the conventional and innovative regulatory instruments needed for a fast and smooth transition to smart, sustainable communities….(More)”.

Dark Data: Why What You Don’t Know Matters


Book by David J. Hand: “In the era of big data, it is easy to imagine that we have all the information we need to make good decisions. But in fact the data we have are never complete, and may be only the tip of the iceberg. Just as much of the universe is composed of dark matter, invisible to us but nonetheless present, the universe of information is full of dark data that we overlook at our peril. In Dark Data, data expert David Hand takes us on a fascinating and enlightening journey into the world of the data we don’t see.

Dark Data explores the many ways in which we can be blind to missing data and how that can lead us to conclusions and actions that are mistaken, dangerous, or even disastrous. Examining a wealth of real-life examples, from the Challenger shuttle explosion to complex financial frauds, Hand gives us a practical taxonomy of the types of dark data that exist and the situations in which they can arise, so that we can learn to recognize and control for them. In doing so, he teaches us not only to be alert to the problems presented by the things we don’t know, but also shows how dark data can be used to our advantage, leading to greater understanding and better decisions.

Today, we all make decisions using data. Dark Data shows us all how to reduce the risk of making bad ones….(More)”.

How Philanthropy Can Help Lead on Data Justice


Louise Lief at Stanford Social Innovation Review: “Today, data governs almost every aspect of our lives, shaping the opportunities we have, how we perceive reality and understand problems, and even what we believe to be possible. Philanthropy is particularly data driven, relying on it to inform decision-making, define problems, and measure impact. But what happens when data design and collection methods are flawed, lack context, or contain critical omissions and misdirected questions? With bad data, data-driven strategies can misdiagnose problems and worsen inequities with interventions that don’t reflect what is needed.

Data justice begins by asking who controls the narrative. Who decides what data is collected and for which purpose? Who interprets what it means for a community? Who governs it? In recent years, affected communities, social justice philanthropists, and academics have all begun looking deeper into the relationship between data and social justice in our increasingly data-driven world. But philanthropy can play a game-changing role in developing practices of data justice to more accurately reflect the lived experience of communities being studied. Simply incorporating data justice principles into everyday foundation practice—and requiring it of grantees—would be transformative: It would not only revitalize research, strengthen communities, influence policy, and accelerate social change, it would also help address deficiencies in current government data sets.

When Data Is Flawed

Some of the most pioneering work on data justice has been done by Native American communities, who have suffered more than most from problems with bad data. A 2017 analysis of American Indian data challenges—funded by the W.K. Kellogg Foundation and the Morris K. Udall and Stewart L. Udall Foundation—documented how much data on Native American communities is of poor quality, inaccurate, inadequate, inconsistent, irrelevant, and/or inaccessible. The National Congress of American Indians even described American Native communities as “The Asterisk Nation,” because in many government data sets they are represented only by an asterisk denoting sampling errors instead of data points.

Where it concerns Native Americans, data is often not standardized and different government databases identify tribal members at least seven different ways using different criteria; federal and state statistics often misclassify race and ethnicity; and some data collection methods don’t allow tribes to count tribal citizens living off the reservation. For over a decade the Department of the Interior’s Bureau of Indian Affairs has struggled to capture the data it needs for a crucial labor force report it is legally required to produce; methodology errors and reporting problems have been so extensive that at times it prevented the report from even being published. But when the Department of the Interior changed several reporting requirements in 2014 and combined data submitted by tribes with US Census data, it only compounded the problem, making historical comparisons more difficult. Moreover, Native Americans have charged that the Census Bureau significantly undercounts both the American Indian population and key indicators like joblessness….(More)”.

New privacy-protected Facebook data for independent research on social media’s impact on democracy


Chaya Nayak at Facebook: “In 2018, Facebook began an initiative to support independent academic research on social media’s role in elections and democracy. This first-of-its-kind project seeks to provide researchers access to privacy-preserving data sets in order to support research on these important topics.

Today, we are announcing that we have substantially increased the amount of data we’re providing to 60 academic researchers across 17 labs and 30 universities around the world. This release delivers on the commitment we made in July 2018 to share a data set that enables researchers to study information and misinformation on Facebook, while also ensuring that we protect the privacy of our users.

This new data release supplants data we released in the fall of 2019. That 2019 data set consisted of links that had been shared publicly on Facebook by at least 100 unique Facebook users. It included information about share counts, ratings by Facebook’s third-party fact-checkers, and user reporting on spam, hate speech, and false news associated with those links. We have expanded the data set to now include more than 38 million unique links with new aggregated information to help academic researchers analyze how many people saw these links on Facebook and how they interacted with that content – including views, clicks, shares, likes, and other reactions. We’ve also aggregated these shares by age, gender, country, and month. And, we have expanded the time frame covered by the data from January 2017 – February 2019 to January 2017 – August 2019.

With this data, researchers will be able to understand important aspects of how social media shapes our world. They’ll be able to make progress on the research questions they proposed, such as “how to characterize mainstream and non-mainstream online news sources in social media” and “studying polarization, misinformation, and manipulation across multiple platforms and the larger information ecosystem.”

In addition to the data set of URLs, researchers will continue to have access to CrowdTangle and Facebook’s Ad Library API to augment their analyses. Per the original plan for this project, outside of a limited review to ensure that no confidential or user data is inadvertently released, these researchers will be able to publish their findings without approval from Facebook.

We are sharing this data with researchers while continuing to prioritize the privacy of people who use our services. This new data set, like the data we released before it, is protected by a method known as differential privacy. Researchers have access to data tables from which they can learn about aggregated groups, but where they cannot identify any individual user. As Harvard University’s Privacy Tools project puts it:

“The guarantee of a differentially private algorithm is that its behavior hardly changes when a single individual joins or leaves the dataset — anything the algorithm might output on a database containing some individual’s information is almost as likely to have come from a database without that individual’s information. … This gives a formal guarantee that individual-level information about participants in the database is not leaked.” …(More)”

This emoji could mean your suicide risk is high, according to AI


Rebecca Ruiz at Mashable: “Since its founding in 2013, the free mental health support service Crisis Text Line has focused on using data and technology to better aid those who reach out for help. 

Unlike helplines that offer assistance based on the order in which users dialed, texted, or messaged, Crisis Text Line has an algorithm that determines who is in most urgent need of counseling. The nonprofit is particularly interested in learning which emoji and words texters use when their suicide risk is high, so as to quickly connect them with a counselor. Crisis Text Line just released new insights about those patterns. 

Based on its analysis of 129 million messages processed between 2013 and the end of 2019, the nonprofit found that the pill emoji, or ?, was 4.4 times more likely to end in a life-threatening situation than the word suicide. 

Other words that indicate imminent danger include 800mg, acetaminophen, excedrin, and antifreeze; those are two to three times more likely than the word suicide to involve an active rescue of the texter. The loudly crying emoji face, or ?, is similarly high-risk. In general, the words that trigger the greatest alarm suggest the texter has a method or plan to attempt suicide or may be in the process of taking their own life. …(More)”.

Our personal health history is too valuable to be harvested by the tech giants


Eerke Boiten at The Guardian: “…It is clear that the black box society does not only feed on internet surveillance information. Databases collected by public bodies are becoming more and more part of the dark data economy. Last month, it emerged that a data broker in receipt of the UK’s national pupil database had shared its access with gambling companies. This is likely to be the tip of the iceberg; even where initial recipients of shared data might be checked and vetted, it is much harder to oversee who the data is passed on to from there.

Health data, the rich population-wide information held within the NHS, is another such example. Pharmaceutical companies and internet giants have been eyeing the NHS’s extensive databases for commercial exploitation for many years. Google infamously claimed it could save 100,000 lives if only it had free rein with all our health data. If there really is such value hidden in NHS data, do we really want Google to extract it to sell it to us? Google still holds health data that its subsidiary DeepMind Health obtained illegally from the NHS in 2016.

Although many health data-sharing schemes, such as in the NHS’s register of approved data releases], are said to be “anonymised”, this offers a limited guarantee against abuse.

There is just too much information included in health data that points to other aspects of patients’ lives and existence. If recipients of anonymised health data want to use it to re-identify individuals, they will often be able to do so by combining it, for example, with publicly available information. That this would be illegal under UK data protection law is a small consolation as it would be extremely hard to detect.

It is clear that providing access to public organisations’ data for research purposes can serve the greater good and it is unrealistic to expect bodies such as the NHS to keep this all in-house.

However, there are other methods by which to do this, beyond the sharing of anonymised databases. CeLSIUS, for example, a physical facility where researchers can interrogate data under tightly controlled conditions for specific registered purposes, holds UK census information over many years.

These arrangements prevent abuse, such as through deanonymisation, do not have the problem of shared data being passed on to third parties and ensure complete transparency of the use of the data. Online analogues of such set-ups do not yet exist, but that is where the future of safe and transparent access to sensitive data lies….(More)”.