Facebook’s next project: American inequality


Nancy Scola at Politico: “Facebook CEO Mark Zuckerberg is quietly cracking open his company’s vast trove of user data for a study on economic inequality in the U.S. — the latest sign of his efforts to reckon with divisions in American society that the social network is accused of making worse.

The study, which hasn’t previously been reported, is mining the social connections among Facebook’s American users to shed light on the growing income disparity in the U.S., where the top 1 percent of households is said to control 40 percent of the country’s wealth. Facebook is an incomparably rich source of information for that kind of research: By one estimate, about three of five American adults use the social network….

Facebook confirmed the broad contours of its partnership with [economist Raj] Chetty but declined to elaborate on the substance of the study. Chetty, in a brief interview following a January speech in Washington, said he and his collaborators — who include researchers from Stanford and New York University — have been working on the inequality study for at least six months.

“We’re using social networks, and measuring interactions there, to understand the role of social capital much better than we’ve been able to,” he said.

Researchers say they see Facebook’s enormous cache of data as a remarkable resource, offering an unprecedentedly detailed and sweeping look at American society. That store of information contains both details that a user might tell Facebook — their age, hometown, schooling, family relationships — and insights that the company has picked up along the way, such as the interest groups they’ve joined and the geographic distribution of who they call a “friend.”

It’s all the more significant, researchers say, when you consider that Facebook’s user base — about 239 million monthly users in the U.S. and Canada at last count — cuts across just about every demographic group.

And all that information, say researchers, lets them make educated guesses about users’ wealth. Facebook itself recently patented a way of figuring out someone’s socioeconomic status using factors ranging from their stated hobbies to how many internet-connected devices they own.
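
A minimal sketch can make that kind of inference concrete. The following is illustrative only, not the method in Facebook’s patent (whose details the excerpt does not describe); the features, training data, and labels are all hypothetical:

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical profile signals per user:
# [internet-connected devices owned, has a degree (0/1), travel check-ins/year]
X_train = [
    [1, 0, 0],
    [2, 0, 1],
    [2, 1, 0],
    [3, 1, 2],
    [5, 1, 6],
    [6, 1, 9],
]
# Hypothetical labels: 1 = higher socioeconomic status, 0 = lower
y_train = [0, 0, 0, 1, 1, 1]

model = LogisticRegression().fit(X_train, y_train)

# Estimate P(higher SES) for a new, equally hypothetical profile
new_user = [[4, 1, 3]]
print(model.predict_proba(new_user)[0][1])
```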

A Facebook spokesman addressed the potential privacy implications of the study’s access to user data, saying, “We conduct research at Facebook responsibly, which includes making sure we protect people’s information.” The spokesman added that Facebook follows an “enhanced” review process for research projects, adopted in 2014 after a controversy over a study that manipulated some people’s news feeds to see if it made them happier or sadder.

According to a Stanford University source familiar with Chetty’s study, the Facebook account data used in the research has been stripped of any details that could be used to identify users. The source added that academics involved in the study have gone through security screenings that include background checks, and can access the Facebook data only in secure facilities….(More)”.

Can Crowdsourcing and Collaboration Improve the Future of Human Health?


Ben Wiegand at Scientific American: “The process of medical research has been likened to searching for a needle in a haystack. With the continued acceleration of novel science and health care technologies in areas like artificial intelligence, digital therapeutics and the human microbiome, we have a tremendous opportunity to search the haystack in new and exciting ways. Applying these high-tech advances to today’s most pressing health issues increases our ability to address the root cause of disease, intervene earlier and change the trajectory of human health.

Global crowdsourcing forums, like the Johnson & Johnson Innovation QuickFire Challenges, can be incredibly valuable tools for searching the “haystack.” An initiative of JLABS—the no-strings-attached incubators of Johnson & Johnson Innovation—these contests spur scientific diversity through crowdsourcing, inspiring and attracting fresh thinking. They seek to stimulate the global innovation ecosystem through funding, mentorship and access to resources that can kick-start breakthrough ideas.

Our most recent challenge, the Next-Gen Baby Box QuickFire Challenge, focused on updating the 80-year-old “Finnish baby box,” a free, government-issued maternity supply kit for new parents containing such essentials as baby clothing, bath and sleep supplies packaged in a sleep-safe cardboard box. Since it first launched, the baby box has, together with increased use of maternal healthcare services early in pregnancy, helped to significantly reduce the Finnish infant mortality rate from 65 in every 1,000 live births in the 1930s to 2.5 per 1,000 today—one of the lowest rates in the world.

Partnering with Finnish innovation and government groups, we set out to see if updating this popular early parenting tool with the power of personalized health technology might one day impact Finland’s rate of type 1 diabetes, the highest in the world. We issued the call globally to help create “the Baby Box of the future” as part of the Janssen and Johnson & Johnson Innovation vision to create a world without disease by accelerating science and delivering novel solutions to prevent, intercept and cure disease. The contest brought together entrepreneurs, researchers and innovators to focus on ideas with the potential to promote child health, detect childhood disease earlier and facilitate healthy parenting.

Incentive challenges like this one reward participants who have most effectively met a predefined objective or task. It’s a concept that emerged well before our time—as far back as the 18th century—from Napoleon’s Food Preservation Prize, meant to find a way to keep troops fed during battle, to the Longitude Prize for improved marine navigation.

Research shows that prize-based challenges that attract talent across a wide range of disciplines can generate greater risk-taking and yield more dramatic solutions….(More)”.

The Social Media Threat to Society and Security


George Soros at Project Syndicate: “It takes significant effort to assert and defend what John Stuart Mill called the freedom of mind. And there is a real chance that, once lost, those who grow up in the digital age – in which the power to command and shape people’s attention is increasingly concentrated in the hands of a few companies – will have difficulty regaining it.

The current moment in world history is a painful one. Open societies are in crisis, and various forms of dictatorships and mafia states, exemplified by Vladimir Putin’s Russia, are on the rise. In the United States, President Donald Trump would like to establish his own mafia-style state but cannot, because the Constitution, other institutions, and a vibrant civil society won’t allow it….

The rise and monopolistic behavior of the giant American Internet platform companies are contributing mightily to the US government’s impotence. These companies have often played an innovative and liberating role. But as Facebook and Google have grown ever more powerful, they have become obstacles to innovation, and have caused a variety of problems of which we are only now beginning to become aware…

Social media companies’ true customers are their advertisers. But a new business model is gradually emerging, based not only on advertising but also on selling products and services directly to users. They exploit the data they control, bundle the services they offer, and use discriminatory pricing to keep more of the benefits that they would otherwise have to share with consumers. This enhances their profitability even further, but the bundling of services and discriminatory pricing undermine the efficiency of the market economy.

Social media companies deceive their users by manipulating their attention, directing it toward their own commercial purposes, and deliberately engineering addiction to the services they provide. This can be very harmful, particularly for adolescents.

There is a similarity between Internet platforms and gambling companies. Casinos have developed techniques to hook customers to the point that they gamble away all of their money, even money they don’t have.

Something similar – and potentially irreversible – is happening to human attention in our digital age. This is not a matter of mere distraction or addiction; social media companies are actually inducing people to surrender their autonomy. And this power to shape people’s attention is increasingly concentrated in the hands of a few companies.

This would have far-reaching political consequences. People without the freedom of mind can be easily manipulated. This danger does not loom only in the future; it already played an important role in the 2016 US presidential election.

There is an even more alarming prospect on the horizon: an alliance between authoritarian states and large, data-rich IT monopolies, bringing together nascent systems of corporate surveillance with already-developed systems of state-sponsored surveillance. This may well result in a web of totalitarian control the likes of which not even George Orwell could have imagined….(More)”.

Free Speech in the Filter Age


Alexandra Borchardt at Project Syndicate: “In a democracy, the rights of the many cannot come at the expense of the rights of the few. In the age of algorithms, government must, more than ever, ensure the protection of vulnerable voices, even erring on the side of victims at times.

Germany’s Network Enforcement Act – under which social-media platforms like Facebook and YouTube can be fined up to €50 million ($63 million) if they fail to remove an “obviously illegal” post within 24 hours of receiving a notification – has been controversial from the start. After it entered fully into effect in January, there was a tremendous outcry, with critics from all over the political map arguing that it was an enticement to censorship. Government was relinquishing its powers to private interests, they protested.

So, is this the beginning of the end of free speech in Germany?

Of course not. To be sure, Germany’s Netzwerkdurchsetzungsgesetz (or NetzDG) is the strictest regulation of its kind in a Europe that is growing increasingly annoyed with America’s powerful social-media companies. And critics do have some valid points about the law’s weaknesses. But the possibilities for free expression will remain abundant, even if some posts are deleted mistakenly.

The truth is that the law sends an important message: democracies won’t stay silent while their citizens are exposed to hateful and violent speech and images – content that, as we know, can spur real-life hate and violence. Refusing to protect the public, especially the most vulnerable, from dangerous content in the name of “free speech” actually serves the interests of those who are already privileged, beginning with the powerful companies that drive the dissemination of information.

Speech has always been filtered. In democratic societies, everyone has the right to express themselves within the boundaries of the law, but no one has ever been guaranteed an audience. To have an impact, citizens have always needed to appeal to – or bypass – the “gatekeepers” who decide which causes and ideas are relevant and worth amplifying, whether through the media, political institutions, or protest.

The same is true today, except that the gatekeepers are the algorithms that automatically filter and rank all contributions. Of course, algorithms can be programmed any way the companies like, meaning that they may place a premium on qualities shared by professional journalists: credibility, intelligence, and coherence.

But today’s social-media platforms are far more likely to prioritize potential for advertising revenue above all else. So the noisiest are often rewarded with a megaphone, while less polarizing, less privileged voices are drowned out, even if they are providing the smart and nuanced perspectives that can truly enrich public discussions….(More)”.
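
A stylized sketch of that dynamic (not any platform’s actual ranking code; the weights and fields are invented for illustration) shows how an engagement-heavy scoring function rewards the noisiest posts:

```python
def rank_score(post, w_engagement=0.9, w_credibility=0.1):
    """Stylized feed score: engagement dominates when the weights
    are tuned for advertising revenue rather than credibility."""
    engagement = post["clicks"] + 2 * post["shares"] + 3 * post["comments"]
    return w_engagement * engagement + w_credibility * post["credibility"]

posts = [
    {"id": "outrage-bait", "clicks": 900, "shares": 400, "comments": 300, "credibility": 0.2},
    {"id": "nuanced-analysis", "clicks": 120, "shares": 30, "comments": 20, "credibility": 0.9},
]
for post in sorted(posts, key=rank_score, reverse=True):
    print(post["id"], round(rank_score(post), 2))
```

With these weights, the low-credibility post outranks the nuanced one by an order of magnitude; no plausible credibility weight can compensate unless engagement stops dominating the score.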

Small Data for Big Impact


Liz Luckett at the Stanford Social Innovation Review: “As an investor in data-driven companies, I’ve been thinking a lot about my grandfather—a baker, a small business owner, and, I now realize, a pioneering data scientist. Without much more than pencil, paper, and extraordinarily deep knowledge of his customers in Washington Heights, Manhattan, he bought, sold, and managed inventory while also managing risk. His community was poor, but his business prospered. This was not because of what we celebrate today as the power and predictive promise of big data, but rather because of what I call small data: nuanced market insights that come through regular and trusted interactions.

Big data takes into account volumes of information from largely electronic sources—such as credit cards, pay stubs, test scores—and segments people into groups. As a result, people participating in the formalized economy benefit from big data. But people who are paid in cash and have no recognized accolades, such as higher education, are left out. Small data captures those insights to address this market failure. My grandfather, for example, had critical customer information he carefully gathered over the years: who could pay now, who needed a few days more, and which tabs to close. If he had access to a big data algorithm, it likely would have told him all his clients were unlikely to repay him, based on the fact that they were low income (vs. high income) and low education level (vs. college degree). Today, I worry that in our enthusiasm for big data and aggregated predictions, we often lose the critical insights we can gain from small data, because we don’t collect it. In the process, we are missing vital opportunities to both make money and create economic empowerment.
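
The contrast the author draws can be sketched in a few lines. This is a toy model with invented fields and thresholds, not an actual credit-scoring system: a coarse segment-based score rejects the customer, while a relationship-based repayment history approves them.

```python
def big_data_score(customer):
    # Coarse segment proxies: penalize low income and no degree
    score = 0.5
    if customer["income_bracket"] == "low":
        score -= 0.3
    if not customer["college_degree"]:
        score -= 0.2
    return score

def small_data_score(customer):
    # Shopkeeper-style signal: observed repayment behavior
    history = customer["repayment_history"]  # 1 = paid, 0 = defaulted
    return sum(history) / len(history) if history else 0.5

customer = {
    "income_bracket": "low",
    "college_degree": False,
    "repayment_history": [1, 1, 1, 1, 1, 0, 1, 1],  # pays reliably
}
print("big data score:", big_data_score(customer))      # 0.0 -> reject
print("small data score:", small_data_score(customer))  # 0.875 -> extend credit
```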

We won’t solve this problem of big data by returning to my grandfather’s shop floor. What we need is more and better data—a small data movement to supply vital missing links in marketplaces and supply chains the world over. What are the proxies that allow large companies to discern whom among the low income are good customers in the absence of a shopkeeper? At The Social Entrepreneurs’ Fund (TSEF), we are profitably investing in a new breed of data company: enterprises that are intentionally and responsibly serving low-income communities, and generating new and unique insights about the behavior of individuals in the process. The value of the small data they collect is becoming increasingly useful to other partners, including corporations who are willing to pay for it. It is a kind of dual market opportunity that for the first time makes it economically advantageous for these companies to reach the poor. We are betting on small data to transform opportunities and quality of life for the underserved, tap into markets that were once seen as too risky or too costly to reach, and earn significant returns for investors….(More)”.

An AI That Reads Privacy Policies So That You Don’t Have To


Andy Greenberg at Wired: “…Today, researchers at Switzerland’s Federal Institute of Technology at Lausanne (EPFL), the University of Wisconsin and the University of Michigan announced the release of Polisis—short for “privacy policy analysis”—a new website and browser extension that uses their machine-learning-trained app to automatically read and make sense of any online service’s privacy policy, so you don’t have to.

In about 30 seconds, Polisis can read a privacy policy it’s never seen before and extract a readable summary, displayed in a graphic flow chart, of what kind of data a service collects, where that data could be sent, and whether a user can opt out of that collection or sharing. Polisis’ creators have also built a chat interface they call Pribot that’s designed to answer questions about any privacy policy, intended as a sort of privacy-focused paralegal advisor. Together, the researchers hope those tools can unlock the secrets of how tech firms use your data that have long been hidden in plain sight….
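
Polisis’ own models are not reproduced here, but the core idea (classifying each policy segment into categories such as first-party collection, third-party sharing, and user choice) can be sketched with off-the-shelf tools. The toy corpus and labels below are invented; a real system would train on thousands of annotated segments:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy annotated policy segments (invented for illustration)
segments = [
    "We collect your email address and location when you sign up.",
    "We may share your information with advertising partners.",
    "You can opt out of data collection in your account settings.",
    "Your usage data is gathered to improve our services.",
    "Third parties may receive aggregated user data.",
    "To stop sharing, disable personalization in settings.",
]
labels = [
    "first-party-collection", "third-party-sharing", "user-choice",
    "first-party-collection", "third-party-sharing", "user-choice",
]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(segments, labels)

sentence = "We may disclose your information to advertising partners."
print(clf.predict([sentence])[0])  # expected: third-party-sharing
```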

Polisis isn’t actually the first attempt to use machine learning to pull human-readable information out of privacy policies. Both Carnegie Mellon University and Columbia have made their own attempts at similar projects in recent years, points out NYU Law Professor Florencia Marotta-Wurgler, who has focused her own research on user interactions with terms of service contracts online. (One of her own studies showed that only 0.07 percent of users actually click on a terms of service link before clicking “agree.”) The Usable Privacy Policy Project, a collaboration that includes both Columbia and CMU, released its own automated tool to annotate privacy policies just last month. But Marotta-Wurgler notes that Polisis’ visual and chat-bot interfaces haven’t been tried before, and says the latest project is also more detailed in how it defines different kinds of data. “The granularity is really nice,” Marotta-Wurgler says. “It’s a way of communicating this information that’s more interactive.”…(More)”.

World’s biggest city database shines light on our increasingly urbanised planet


EU Joint Research Centre: “The JRC has launched a new tool with data on all 10,000 urban centres scattered across the globe. It is the largest and most comprehensive database on cities ever published.

With data derived from the JRC’s Global Human Settlement Layer (GHSL), researchers have discovered that the world has become even more urbanised than previously thought.

Populations in urban areas doubled in Africa and grew by 1.1 billion in Asia between 1990 and 2015.

Globally, more than 400 cities have a population between 1 and 5 million. More than 40 cities have 5 to 10 million people, and there are 32 ‘megacities’ with more than 10 million inhabitants.
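
As a sketch of the kind of query such a database supports (the column names and figures here are hypothetical; the real data is published through the JRC), those population classes can be tallied directly:

```python
import pandas as pd

# Hypothetical extract; the real database covers ~10,000 urban centres
cities = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "population": [12_500_000, 7_200_000, 3_100_000, 1_400_000, 800_000],
})

bins = [0, 1_000_000, 5_000_000, 10_000_000, float("inf")]
labels = ["<1M", "1-5M", "5-10M", "megacity (10M+)"]
cities["size_class"] = pd.cut(cities["population"], bins=bins, labels=labels)
print(cities["size_class"].value_counts().sort_index())
```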

There are some promising signs for the environment: Cities became 25% greener between 2000 and 2015. And although air pollution in urban centres had been increasing since 1990, the trend reversed between 2000 and 2015.

With every high-density area of at least 50,000 inhabitants covered, the city centres database shows growth in population and built-up areas over the past 40 years. Environmental factors tracked include:

  • ‘Greenness’: the estimated amount of healthy vegetation in the city centre
  • Soil sealing: the covering of the soil surface with materials like concrete and stone, as a result of new buildings, roads and other public and private spaces
  • Air pollution: the level of polluting particles such as PM2.5 in the air
  • Vicinity to protected areas: the percentage of natural protected space within 30 km distance from the city centre’s border
  • Disaster risk-related exposure of population and buildings in low lying areas and on steep slopes.

The data is free to access and open to everyone. It applies big data analytics and a global, people-based definition of cities, providing support to monitor global urbanisation and the 2030 Sustainable Development Agenda.

The information gained from the GHSL is used to map out population density and settlement maps. Satellite, census and local geographic information are used to create the maps….(More)”.

Building Trust in Data and Statistics


Shaida Badiee at UN World Data Forum: …What do we want for a 2030 data ecosystem?

Hope to achieve: A world where data are part of the DNA and culture of decision-making, used by all and valued as an important public good. A world where citizens trust the systems that produce data and have the skills and means to use and verify their quality and accuracy. A world where there are safeguards in place to protect privacy, while bringing the benefits of open data to all. In this world, countries value their national statistical systems, which are working independently with trusted partners in the public and private sectors and citizens to continuously meet the changing and expanding demands from data users and policy makers. Private-sector data generators are generously sharing their data with the public sector. And gaps in data are closing, making the dream of “leaving no one behind” come true, with SDG goals on the path to being met by 2030.

Hope to avoid: A world where large corporations control the bulk of national and international data and statistics with only limited sharing with the public sector, academics, and citizens. A culture of “every man for himself” and “who pays, wins” dominates data-sharing practices. National statistical systems are under-resourced and under-valued, with low trust from users, further weakening them and undermining their independence from political interference and their ability to control quality. The divide between those who have and those who do not have access, skills, and the ability to use data for decision-making and policy has widened. Data systems and their promise to count the uncounted and “leave no one behind” are falling behind due to low capacity and weak standards and institutions, and the hope of the 2030 agenda is fading.

With this vision in mind, are we on the right path? An optimist would say we are closer to the data ecosystem that we want to achieve. However, there are also some examples of movement in the wrong direction. There is no magic wand to make our wish come true, but a powerful enabler would be building trust in data and statistics. Therefore, this should be included as a goal in all our data strategies and action plans.

Here are some important building blocks underlying trust in data and statistics:

  1. Building strong organizational infrastructure, governance, and partnerships;
  2. Following sound data standards and principles for production, sharing, interoperability, and dissemination; and
  3. Addressing the last mile in the data value chain to meet users’ needs, create value with data, and ensure meaningful impacts…(More)”.

Republics of Makers: From the Digital Commons to a Flat Marginal Cost Society


Mario Carpo at eFlux: “…as the costs of electronic computation have been steadily decreasing for the last forty years at least, many have recently come to the conclusion that, for most practical purposes, the cost of computation is asymptotically tending to zero. Indeed, the current notion of Big Data is based on the assumption that an almost unlimited amount of digital data will soon be available at almost no cost, and similar premises have further fueled the expectation of a forthcoming “zero marginal cost society”: a society where, except for some upfront and overhead costs (the costs of building and maintaining some facilities), many goods and services will be free for all. And indeed, against all odds, an almost zero marginal cost society is already a reality in the case of many services based on the production and delivery of electricity: from the recording, transmission, and processing of electrically encoded digital information (bits) to the production and consumption of electrical power itself. Using renewable energies (solar, wind, hydro) the generation of electrical power is free, except for the cost of building and maintaining installations and infrastructure. And given the recent progress in the micro-management of intelligent electrical grids, it is easy to imagine that in the near future the cost of servicing a network of very small, local hydro-electric generators, for example, could easily be devolved to local communities of prosumers who would take care of those installations as they tend to their living environment, on an almost voluntary, communal basis. This was already often the case during the early stages of electrification, before the rise of AC (alternating current, which, unlike DC, or direct current, could be carried over long distances): AC became the industry’s choice only after Galileo Ferraris’s and Nikola Tesla’s developments in AC technologies in the 1880s.
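
The arithmetic behind the zero-marginal-cost claim is straightforward: the average cost per unit is the fixed infrastructure cost spread across all units served, plus the marginal cost of one more unit; as usage grows, only the marginal cost remains. A minimal sketch with hypothetical numbers:

```python
def average_cost(fixed_cost, marginal_cost, units):
    """Average cost per unit = fixed_cost / units + marginal_cost.
    As units grow, this tends toward the marginal cost alone."""
    return fixed_cost / units + marginal_cost

# Hypothetical: $1M of infrastructure, near-zero marginal cost per unit
for units in (10, 1_000, 100_000, 10_000_000):
    print(units, round(average_cost(1_000_000, 0.001, units), 4))
```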

Likewise, at the micro-scale of the electronic production and processing of bits and bytes of information, the Open Source movement and the phenomenal surge of some crowdsourced digital media (including some so-called social media) in the first decade of the twenty-first century have already proven that a collaborative, zero-cost business model can effectively compete with products priced for profit on a traditional marketplace. As the success of Wikipedia, Linux, or Firefox proves, many are happy to volunteer their time and labor for free when all can profit from the collective work of an entire community without having to pay for it. This is now technically possible precisely because the fixed costs of building, maintaining, and delivering these services are very small; hence, from the point of view of the end-user, negligible.

Yet, regardless of the fixed costs of the infrastructure, content—even user-generated content—has costs, albeit for the time being these are mostly hidden, voluntarily borne, or inadvertently absorbed by the prosumers themselves. For example, the wisdom of Wikipedia is not really a wisdom of crowds: most Wikipedia entries are de facto curated by fairly traditional scholar communities, and these communities can contribute their expertise for free only because their work has already been paid for by others—often by universities. In this sense, Wikipedia is only piggybacking on someone else’s research investments (but multiplying their outreach, which is one reason for its success). Ditto for most Open Source software, as training a software engineer, coder, or hacker takes time and money—an investment for future returns that in many countries around the world is still borne, at least in part, by public institutions….(More)”.

Crowdsourcing Judgments of News Source Quality


Paper by Gordon Pennycook and David G. Rand: “The spread of misinformation and disinformation, especially on social media, is a major societal challenge. Here, we assess whether crowdsourced ratings of trust in news sources can effectively differentiate between more and less reliable sources. To do so, we ran a preregistered experiment (N = 1,010 from Amazon Mechanical Turk) in which individuals rated familiarity with, and trust in, 60 news sources from three categories: 1) Mainstream media outlets, 2) Websites that produce hyper-partisan coverage of actual facts, and 3) Websites that produce blatantly false content (“fake news”).

Our results indicate that, despite substantial partisan bias, laypeople across the political spectrum rate mainstream media outlets as far more trustworthy than either hyper-partisan or fake news sources (every mainstream source except Salon was rated as more trustworthy than every hyper-partisan or fake news source when ratings from Democrats and Republicans were weighted equally).

Critically, however, excluding ratings from participants who are not familiar with a given news source dramatically reduces the difference between mainstream media sources and hyper-partisan or fake news sites. For example, 30% of the mainstream media websites (Salon, the Guardian, Fox News, Politico, Huffington Post, and Newsweek) received lower trust scores than the most trusted fake news site (news4ktla.com) when excluding unfamiliar ratings.
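
The two analytic moves described here (weighting each party’s ratings equally, and optionally excluding raters unfamiliar with a source) reduce to a short computation. A sketch with invented ratings; the study’s actual data is in the paper:

```python
import statistics

# Hypothetical ratings: (party, familiar_with_source, trust_rating on 1-5)
ratings = [
    ("D", True, 4.5), ("D", False, 3.0), ("D", True, 4.0),
    ("R", True, 2.0), ("R", False, 1.5), ("R", True, 2.5),
]

def balanced_trust(ratings, familiar_only=False):
    """Mean of per-party mean ratings, so each party counts equally;
    optionally drop raters unfamiliar with the source."""
    by_party = {}
    for party, familiar, score in ratings:
        if familiar_only and not familiar:
            continue
        by_party.setdefault(party, []).append(score)
    return statistics.mean(statistics.mean(v) for v in by_party.values())

print(balanced_trust(ratings))                      # all raters
print(balanced_trust(ratings, familiar_only=True))  # familiar raters only
```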

This suggests that rather than being initially agnostic about unfamiliar sources, people are initially skeptical – and thus a lack of familiarity is an important cue for untrustworthiness. Overall, our findings indicate that crowdsourcing media trustworthiness judgments is a promising approach for fighting misinformation and disinformation online, but that trustworthiness ratings from participants who are unfamiliar with a given source should not be ignored….(More)”.