The Internet Freedom League: How to Push Back Against the Authoritarian Assault on the Web


Essay by Richard A. Clarke and Rob Knake in Foreign Affairs: “The early days of the Internet inspired a lofty dream: authoritarian states, faced with the prospect of either connecting to a new system of global communication or being left out of it, would choose to connect. According to this line of utopian thinking, once those countries connected, the flow of new information and ideas from the outside world would inexorably pull them toward economic openness and political liberalization. In reality, something quite different has happened. Instead of spreading democratic values and liberal ideals, the Internet has become the backbone of authoritarian surveillance states all over the world. Regimes in China, Russia, and elsewhere have used the Internet’s infrastructure to build their own national networks. At the same time, they have installed technical and legal barriers to prevent their citizens from reaching the wider Internet and to limit Western companies from entering their digital markets.

But despite handwringing in Washington and Brussels about authoritarian schemes to split the Internet, the last thing Beijing and Moscow want is to find themselves relegated to their own networks and cut off from the global Internet. After all, they need access to the Internet to steal intellectual property, spread propaganda, interfere with elections in other countries, and threaten critical infrastructure in rival countries. China and Russia would ideally like to re-create the Internet in their own images and force the world to play by their repressive rules. But they haven’t been able to do that—so instead they have ramped up their efforts to tightly control outside access to their markets, limit their citizens’ ability to reach the wider Internet, and exploit the vulnerability that comes with the digital freedom and openness enjoyed in the West.

The United States and its allies and partners should stop worrying about the risk of authoritarians splitting the Internet. Instead, they should split it themselves, by creating a digital bloc within which data, services, and products can flow freely…(More)”.

Fostering an Enabling Policy and Regulatory Environment in APEC for Data-Utilizing Businesses


APEC: “The objectives of this study are to better understand: 1) how firms from different sectors use data in their business models; and, considering the significant increase in data-related policies and regulations enacted by governments across the world, 2) how such policies and regulations are affecting their use of data and hence their business models. The study also tries 3) to identify some of the middle-ground approaches that would enable governments to achieve public policy objectives, such as data security and privacy, while also promoting the growth of data-utilizing businesses. Thirty-nine firms from 12 economies participated in this project, drawn from a diverse group of industries, including aviation, logistics, shipping, payment services, encryption services, and manufacturing. The synthesis report can be found in Chapter 1, while the case study chapters can be found in Chapters 2 to 10….(More)”.

Companies Collect a Lot of Data, But How Much Do They Actually Use?


Article by Priceonomics Data Studio: “For all the talk of how data is the new oil and the most valuable resource of any enterprise, there is a deep dark secret companies are reluctant to share — most of the data collected by businesses simply goes unused.

This unknown and unused data, known as dark data, comprises more than half the data collected by companies. Given that some estimates indicate that 7.5 septillion (7,700,000,000,000,000,000,000) gigabytes of data are generated every single day, not using most of it is a considerable issue.

In this article, we’ll look at this dark data: just how much of it companies create, why it isn’t being analyzed, and what the costs and implications are of companies not using the majority of the data they collect.

Before diving into the analysis, it’s worth spending a moment clarifying what we mean by the term “dark data.” Gartner defines dark data as:

“The information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).”

To learn more about this phenomenon, Splunk commissioned a global survey of 1,300+ business leaders to better understand how much data they collect, and how much of it is dark. Respondents came from IT and business roles across various industries and were located in Australia, China, France, Germany, Japan, the United States, and the United Kingdom. For the report, Splunk defines dark data as: “all the unknown and untapped data across an organization, generated by systems, devices and interactions.”

While the cost of storing data has decreased over time, the cost of saving septillions of gigabytes of wasted data is still significant. What’s more, during this time the strategic importance of data has increased as companies have found more and more uses for it. Given the cost of storage and the value of data, why does so much of it go unused?

The survey reveals several reasons why dark data isn’t currently being harnessed:

By a large margin, the number one reason given for not using dark data is that companies lack a tool to capture or analyze the data. Companies accumulate data from server logs, GPS networks, security tools, call records, web traffic and more. Companies track everything from digital transactions to the temperature of their server rooms to the contents of retail shelves. Most of this data lies in separate systems, is unstructured, and cannot be connected or analyzed.

Second, the data captured just isn’t good enough. You might have important customer information about a transaction, but it’s missing location or other important metadata because that information sits somewhere else or was never captured in a usable format.

Additionally, dark data exists because there is simply too much data out there, and a lot of it is unstructured. The larger the dataset (or the less structured it is), the more sophisticated the tool required for analysis. These kinds of datasets also often require analysis by individuals with significant data science expertise, who are in short supply.

The implications of the prevalence of dark data are vast. As a result of the data deluge, companies often don’t know where all their sensitive data is stored and can’t be confident they are complying with consumer data protection measures like GDPR. …(More)”.

The Costs of Connection: How Data Is Colonizing Human Life and Appropriating It for Capitalism


Book by Nick Couldry: “We are told that progress requires human beings to be connected, and that science, medicine and much else that is good demands the kind of massive data collection only possible if every thing and person are continuously connected.

But connection, and the continuous surveillance that connection makes possible, usher in an era of neocolonial appropriation. In this new era, social life becomes a direct input to capitalist production, and data – the data collected and processed when we are connected – is the means for this transformation. Hence the need to start counting the costs of connection.

Capturing and processing social data is today handled by an emerging social quantification sector. We are familiar with its leading players, from Acxiom to Equifax, from Facebook to Uber. Together, they ensure the regular and seemingly natural conversion of daily life into a stream of data that can be appropriated for value. This stream is extracted from sensors embedded in bodies and objects, and from the traces left by human interaction online. The result is a new social order based on continuous tracking, and offering unprecedented new opportunities for social discrimination and behavioral influence.  This order has disturbing consequences for freedom, justice and power — indeed, for the quality of human life.

The true violence of this order is best understood through the history of colonialism. But because we assume that colonialism has been replaced by advanced capitalism, we often miss the connection. The concept of data colonialism can thus be used to trace continuities from colonialism’s historic appropriation of territories and material resources to the datafication of everyday life today. While the modes, intensities, scales and contexts of dispossession have changed, the underlying function remains the same: to acquire resources from which economic value can be extracted.

In data colonialism, data is appropriated through a new type of social relation: data relations. We are living through a time when the organization of capital and the configurations of power are changing dramatically because of this contemporary form of social relation. Data colonialism justifies what it does as an advance in scientific knowledge, personalized marketing, or rational management, just as historic colonialism claimed a civilizing mission. Data colonialism is global, dominated by powerful forces in East and West, in the USA and China. The result is a world where, wherever we are connected, we are colonized by data.

Where is data colonialism heading in the long term? Just as historical colonialism paved the way for industrial capitalism, data colonialism is paving the way for a new stage of capitalism whose outlines we only partly see: the capitalization of life without limit. There will be no part of human life, no layer of experience, that is not extractable for economic value. Human life will be there for mining by corporations without reserve as governments look on appreciatively. This process of capitalization will be the foundation for a highly unequal new social arrangement, a social order that is deeply incompatible with human freedom and autonomy.

But resistance is still possible, drawing on past and present decolonial struggles, as well as on the best of the humanities, philosophy, political economy, information and social science. The goal is to name what is happening and imagine better ways of living together without the exploitation on which today’s models of ‘connection’ are founded….(More)”

Trust in Contemporary Society


Book edited by Masamichi Sasaki: “… deals with conceptual, theoretical and social interaction analyses, historical data on societies, national surveys or cross-national comparative studies, and methodological issues related to trust. The authors are from a variety of disciplines: psychology, sociology, political science, organizational studies, history, and philosophy, and from Britain, the United States, the Czech Republic, the Netherlands, Australia, Germany, and Japan. They bring their vast knowledge from different historical and cultural backgrounds to illuminate contemporary issues of trust and distrust. The socio-cultural perspective of trust is important and increasingly acknowledged as central to trust research. Accordingly, future directions for comparative trust research are also discussed….(More)”.

This Is Not an Atlas.


Book by kollektiv orangotango: “This Is Not an Atlas gathers more than 40 counter-cartographies from all over the world. This collection shows how maps are created and transformed as a part of political struggle, for critical research or in art and education: from indigenous territories in the Amazon to the anti-eviction movement in San Francisco; from defending commons in Mexico to mapping refugee camps with balloons in Lebanon; from slums in Nairobi to squats in Berlin; from supporting communities in the Philippines to reporting sexual harassment in Cairo. This Is Not an Atlas seeks to inspire, to document the underrepresented, and to be a useful companion when becoming a counter-cartographer yourself….(More)”.

The Psychology of Prediction


Blog post by Morgan Housel: “During the Vietnam War, Secretary of Defense Robert McNamara tracked every combat statistic he could, creating a mountain of analytics and predictions to guide the war’s strategy.

Edward Lansdale, head of special operations at the Pentagon, once looked at McNamara’s statistics and told him something was missing.

“What?” McNamara asked.

“The feelings of the Vietnamese people,” Lansdale said.

That’s not the kind of thing a statistician pays attention to. But, boy, did it matter.

I believe in prediction. I think you have to in order to get out of bed in the morning.

But prediction is hard. Either you know that or you’re in denial about it.

A lot of the reason it’s hard is because the visible stuff that happens in the world is a small fraction of the hidden stuff that goes on inside people’s heads. The former is easy to overanalyze; the latter is easy to ignore.

This report describes 12 common flaws, errors, and misadventures that occur in people’s heads when predictions are made….(More)”.

The plan to mine the world’s research papers


Priyanka Pulla in Nature: “Carl Malamud is on a crusade to liberate information locked up behind paywalls — and his campaigns have scored many victories. He has spent decades publishing copyrighted legal documents, from building codes to court records, and then arguing that such texts represent public-domain law that ought to be available to any citizen online. Sometimes, he has won those arguments in court. Now, the 60-year-old American technologist is turning his sights on a new objective: freeing paywalled scientific literature. And he thinks he has a legal way to do it.

Over the past year, Malamud has — without asking publishers — teamed up with Indian researchers to build a gigantic store of text and images extracted from 73 million journal articles dating from 1847 up to the present day. The cache, which is still being created, will be kept on a 576-terabyte storage facility at Jawaharlal Nehru University (JNU) in New Delhi. “This is not every journal article ever written, but it’s a lot,” Malamud says. It’s comparable to the size of the core collection in the Web of Science database, for instance. Malamud and his JNU collaborator, bioinformatician Andrew Lynn, call their facility the JNU data depot.

No one will be allowed to read or download work from the repository, because that would breach publishers’ copyright. Instead, Malamud envisages, researchers could crawl over its text and data with computer software, scanning through the world’s scientific literature to pull out insights without actually reading the text.

The unprecedented project is generating much excitement because it could, for the first time, open up vast swathes of the paywalled literature for easy computerized analysis. Dozens of research groups already mine papers to build databases of genes and chemicals, map associations between proteins and diseases, and generate useful scientific hypotheses. But publishers control — and often limit — the speed and scope of such projects, which typically confine themselves to abstracts, not full text. Researchers in India, the United States and the United Kingdom are already making plans to use the JNU store instead. Malamud and Lynn have held workshops at Indian government laboratories and universities to explain the idea. “We bring in professors and explain what we are doing. They get all excited and they say, ‘Oh gosh, this is wonderful’,” says Malamud.
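To make the idea of non-consumptive mining concrete, here is a minimal sketch of the kind of computational pass such a facility could support: software scans article text for terms of interest and reports only aggregate counts, never the readable text. The corpus path, the toy gene list, and the simple regex matching are illustrative assumptions, not a description of the JNU depot’s actual tooling (real projects typically rely on curated dictionaries or trained named-entity recognizers).

```python
import re
from collections import Counter
from pathlib import Path

GENES = ["BRCA1", "TP53", "EGFR"]                        # toy dictionary of terms to mine for
pattern = re.compile(r"\b(" + "|".join(GENES) + r")\b")

gene_hits = Counter()
for article in Path("depot/articles").glob("*.txt"):    # hypothetical corpus location
    text = article.read_text(errors="ignore")
    gene_hits.update(pattern.findall(text))              # tally mentions; the text itself is never displayed

print(gene_hits.most_common())                           # only aggregate counts leave the analysis
```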

But the depot’s legal status isn’t yet clear. Malamud, who contacted several intellectual-property (IP) lawyers before starting work on the depot, hopes to avoid a lawsuit. “Our position is that what we are doing is perfectly legal,” he says. For the moment, he is proceeding with caution: the JNU data depot is air-gapped, meaning that no one can access it from the Internet. Users have to physically visit the facility, and only researchers who want to mine for non-commercial purposes are currently allowed in. Malamud says his team does plan to allow remote access in the future. “The hope is to do this slowly and deliberately. We are not throwing this open right away,” he says….(More)”.

What Restaurant Reviews Reveal About Cities


Linda Poon at CityLab: “Online review sites can tell you a lot about a city’s restaurant scene, and they can reveal a lot about the city itself, too.

Researchers at MIT recently found that information about restaurants gathered from popular review sites can be used to uncover a number of socioeconomic factors of a neighborhood, including its employment rates and demographic profiles of the people who live, work, and travel there.

A report published last week in the Proceedings of the National Academy of Sciences explains how the researchers used information found on Dianping—a Yelp-like site in China—to find information that might usually be gleaned from an official government census. The model could prove especially useful for gathering information about cities that don’t have that kind of reliable or up-to-date government data, especially in developing countries with limited resources to conduct regular surveys….

Zheng and her colleagues tested out their machine-learning model using restaurant data from nine Chinese cities of various sizes—from crowded ones like Beijing, with a population of more than 10 million, to smaller ones like Baoding, a city of fewer than 3 million people.

They pulled data from 630,000 restaurants listed on Dianping, including each business’s location, menu prices, opening day, and customer ratings. Then they ran it through a machine-learning model with official census data and with anonymous location and spending data gathered from cell phones and bank cards. By comparing the information, they were able to determine where the restaurant data reflected the other data they had about neighborhoods’ characteristics.
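The article does not include the researchers’ code, but the modeling step it describes can be pictured as a standard supervised-learning setup: restaurant listings aggregated into per-neighborhood features and regressed against a census or mobile-phone target. The sketch below follows that shape; the input file, the feature names, and the choice of a gradient-boosting regressor are assumptions for illustration, not the paper’s actual pipeline.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# One row per neighborhood, built by aggregating Dianping-style listings (hypothetical file).
df = pd.read_csv("neighborhood_restaurants.csv")

features = df[[
    "n_restaurants",        # number of listed restaurants
    "median_menu_price",    # price level taken from menus
    "mean_rating",          # average customer rating
    "share_coffeeshops",    # venue/cuisine mix
    "mean_business_age",    # years since opening day
]]
target = df["daytime_population"]   # e.g., measured from mobile phone data

model = GradientBoostingRegressor(random_state=0)
scores = cross_val_score(model, features, target, cv=5, scoring="r2")
print(f"cross-validated R^2: {scores.mean():.2f}")
```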

They found that the local restaurant scene can predict, with 95 percent accuracy, variations in a neighborhood’s daytime and nighttime populations, which are measured using mobile phone data. They can also predict, with 90 and 93 percent accuracy, respectively, the number of businesses and the volume of consumer consumption. The types of cuisine offered and the kinds of eateries available (coffee shops vs. traditional teahouses, for example) can also predict the proportion of immigrants or the age and income breakdown of residents. The predictions are more accurate for neighborhoods near urban centers as opposed to those near suburbs, and for smaller cities, where neighborhoods don’t vary as widely as those in bigger metropolises….(More)”.

Stop Surveillance Humanitarianism


Mark Latonero at The New York Times: “A standoff between the United Nations World Food Program and Houthi rebels in control of the capital region is threatening the lives of hundreds of thousands of civilians in Yemen.

Alarmed by reports that food is being diverted to support the rebels, the aid program is demanding that Houthi officials allow it to deploy biometric technologies like iris scans and digital fingerprints to monitor suspected fraud during food distribution.

The Houthis have reportedly blocked food delivery, painting the biometric effort as an intelligence operation, and have demanded access to the personal data of the aid’s beneficiaries. The impasse led the aid organization last month to decide to suspend food aid to parts of the starving population — a step once thought of as a last resort — unless the Houthis allow biometrics.

With program officials saying their staff is prevented from doing its essential jobs, turning to a technological solution is tempting. But biometrics deployed in crises can lead to a form of surveillance humanitarianism that can exacerbate risks to privacy and security.

By surveillance humanitarianism, I mean the enormous data collection systems deployed by aid organizations that inadvertently increase the vulnerability of people in urgent need….(More)”.