Seeing Like a Finite State Machine


Henry Farrell at the Crooked Timber: “…So what might a similar analysis say about the marriage of authoritarianism and machine learning? Something like the following, I think. There are two notable problems with machine learning. One – that while it can do many extraordinary things, it is not nearly as universally effective as the mythology suggests. The other is that it can serve as a magnifier for already existing biases in the data. The patterns that it identifies may be the product of the problematic data that goes in, which is (to the extent that it is accurate) often the product of biased social processes. When this data is then used to make decisions that may plausibly reinforce those processes (by singling e.g. particular groups that are regarded as problematic out for particular police attention, leading them to be more liable to be arrested and so on), the bias may feed upon itself.

This is a substantial problem in democratic societies, but it is a problem where there are at least some counteracting tendencies. The great advantage of democracy is its openness to contrary opinions and divergent perspectives. This opens up democracy to a specific set of destabilizing attacks but it also means that there are countervailing tendencies to self-reinforcing biases. When there are groups that are victimized by such biases, they may mobilize against it (although they will find it harder to mobilize against algorithms than overt discrimination). When there are obvious inefficiencies or social, political or economic problems that result from biases, then there will be ways for people to point out these inefficiencies or problems.

These correction tendencies will be weaker in authoritarian societies; in extreme versions of authoritarianism, they may barely even exist. Groups that are discriminated against will have no obvious recourse. Major mistakes may go uncorrected: they may be nearly invisible to a state whose data is polluted both by the means employed to observe and classify it, and the policies implemented on the basis of this data. A plausible feedback loop would see bias leading to error leading to further bias, and no ready ways to correct it. This of course, will be likely to be reinforced by the ordinary politics of authoritarianism, and the typical reluctance to correct leaders, even when their policies are leading to disaster. The flawed ideology of the leader (We must all study Comrade Xi thought to discover the truth!) and of the algorithm (machine learning is magic!) may reinforce each other in highly unfortunate ways.

In short, there is a very plausible set of mechanisms under which machine learning and related techniques may turn out to be a disaster for authoritarianism, reinforcing its weaknesses rather than its strengths, by increasing its tendency to bad decision making, and reducing further the possibility of negative feedback that could help correct against errors. This disaster would unfold in two ways. The first will involve enormous human costs: self-reinforcing bias will likely increase discrimination against out-groups, of the sort that we are seeing against the Uighur today. The second will involve more ordinary self-ramifying errors, that may lead to widespread planning disasters, which will differ from those described in Scott’s account of High Modernism in that they are not as immediately visible, but that may also be more pernicious, and more damaging to the political health and viability of the regime for just that reason….(More)”
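The feedback loop described in the excerpt above can be made concrete with a toy simulation. This is only an illustrative sketch and is not taken from Farrell's essay: the function name, the two-group setup, the `sharpness` parameter, and every number are assumptions chosen for illustration. Both groups are given the same underlying offence rate, so any divergence in recorded arrests comes purely from where attention is directed.

```python
def simulate_feedback(initial_records, true_rate=0.1, patrols=100,
                      rounds=20, sharpness=2.0):
    """Toy model (hypothetical): attention follows past records, records follow attention.

    initial_records: recorded arrests per group before the loop starts.
    true_rate: offence rate, identical for every group by assumption.
    sharpness: how strongly the allocator over-weights groups with more records
               (1.0 = proportional, >1.0 = concentrates on the higher-record group).
    Returns the attention share per group for each round.
    """
    records = list(initial_records)
    history = []
    for _ in range(rounds):
        # The "model" allocates attention from the records it has seen so far.
        weights = [r ** sharpness for r in records]
        total = sum(weights)
        shares = [w / total for w in weights]
        history.append(shares)
        # New recorded arrests scale with attention, not with behaviour,
        # because true_rate is the same for every group.
        for i, share in enumerate(shares):
            records[i] += patrols * share * true_rate
    return history


if __name__ == "__main__":
    history = simulate_feedback([50.0, 60.0])
    print("first round share of group B:", round(history[0][1], 3))
    print("last round share of group B: ", round(history[-1][1], 3))
```

With `sharpness=1.0` the initial skew neither grows nor shrinks in this sketch, which illustrates that even a proportional allocation does not correct biased records on its own; a corrective signal from outside the loop would have to be added.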

How We Became Our Data


Book by Colin Koopman: “We are now acutely aware, as if all of a sudden, that data matters enormously to how we live. How did information come to be so integral to what we can do? How did we become people who effortlessly present our lives in social media profiles and who are meticulously recorded in state surveillance dossiers and online marketing databases? What is the story behind data coming to matter so much to who we are?


In How We Became Our Data, Colin Koopman excavates early moments of our rapidly accelerating data-tracking technologies and their consequences for how we think of and express our selfhood today. Koopman explores the emergence of mass-scale record-keeping systems like birth certificates and social security numbers, as well as new data techniques for categorizing personality traits, measuring intelligence, and even racializing subjects. This all culminates in what Koopman calls the “informational person” and the “informational power” we are now subject to. The recent explosion of digital technologies that are turning us into a series of algorithmic data points is shown to have a deeper and more turbulent past than we commonly think. Blending philosophy, history, political theory, and media theory in conversation with thinkers like Michel Foucault, Jürgen Habermas, and Friedrich Kittler, Koopman presents an illuminating perspective on how we have come to think of our personhood—and how we can resist its erosion….(More)”.

Big Data, Big Impact? Towards Gender-Sensitive Data Systems


Report by Data2X: “How can insights drawn from big data sources improve understanding about the lives of women and girls?

This question has underpinned Data2X’s groundbreaking work at the intersection of big data and gender — work that funded ten research projects that examined the potential of big data to fill the global gender data gap.

Big Data, Big Impact? Towards Gender-Sensitive Data Systems summarizes the findings and potential policy implications of the Big Data for Gender pilot projects funded by Data2X, and lays out five cross-cutting messages that emerge from this body of work:

  1. Big data offers unique insights on women and girls.
  2. Gender-sensitive big data is ready to scale and integrate with traditional data.
  3. Identify and correct bias in big datasets.
  4. Protect the privacy of women and girls.
  5. Women and girls must be central to data governance.

This report argues that the time for pilot projects has passed. Data privacy concerns must be addressed; investment in scale up is needed. Big data offers great potential for women and girls, and indeed for all people….(More)”.

User Data as Public Resource: Implications for Social Media Regulation


Paper by Philip Napoli: “Revelations about the misuse and insecurity of user data gathered by social media platforms have renewed discussions about how best to characterize property rights in user data. At the same time, revelations about the use of social media platforms to disseminate disinformation and hate speech have prompted debates over the need for government regulation to assure that these platforms serve the public interest. These debates often hinge on whether any of the established rationales for media regulation apply to social media. This article argues that the public resource rationale that has been utilized in traditional media regulation in the United States applies to social media.

The public resource rationale contends that, when a media outlet utilizes a public resource—such as the broadcast spectrum, or public rights of way—the outlet must abide by certain public interest obligations that may infringe upon its First Amendment rights. This article argues that aggregate user data can be conceptualized as a public resource that triggers the application of a public interest regulatory framework to social media sites and other digital platforms that derive their revenue from the gathering, sharing, and monetization of massive aggregations of user data….(More)”.

The weather data gap: How can mobile technology make smallholder farmers climate resilient?


Rishi Raithatha at GSMA: “In the new GSMA AgriTech report, Mobile Technology for Climate Resilience: The role of mobile operators in bridging the data gap, we explore how mobile network operators (MNOs) can play a bigger role in developing and delivering services to strengthen the climate resilience of smallholder farmers. By harnessing their own assets and data, MNOs can improve a broad suite of weather products that are especially relevant for farming communities. These include a variety of weather forecasts (daily, weekly, sub-seasonal and seasonal) and nowcasts, as real-time monitoring and one- to two-hour predictions are often used for Early Warning Systems (EWS) to prevent weather-related disasters. MNOs can also help strengthen the value proposition of other climate products, such as weather index insurance and decision agriculture.

Why do we need more weather data?

Agriculture is highly dependent on regional climates, especially in developing countries where farming is largely rain-fed. Smallholder farmers, who are responsible for the bulk of agricultural production in developing countries, are particularly vulnerable to changing weather patterns – especially given their reliance on natural resources and exclusion from social protection schemes. However, the use of climate adaptation approaches, such as localised weather forecasts and weather index insurance, can enhance smallholder farmers’ ability to withstand the risks posed by climate change and maintain agricultural productivity.

Ground-level measurements are an essential component of climate resilience products; the creation of weather forecasts and nowcasts starts with the analysis of ground, spatial and aerial observations. This involves the use of algorithms, weather models and current and historical observational weather data. Observational instruments, such as radar, weather stations and satellites, are necessary in measuring ground-level weather. However, National Hydrological and Meteorological Services (NHMSs) in developing countries often lack the capacity to generate accurate ground-level measurements beyond a few areas, resulting in gaps in local weather data.

While satellite offers better quality resolution than before, and is more affordable and available to NHMSs, there is a need to complement this data with ground-level measurements. This is especially true in tropical and sub-tropical regions where most smallholder farmers live, where variable local weather patterns can lead to skewed averages from satellite data….(More).”

Big Data Analytics in Healthcare


Book edited by Anand J. Kulkarni, Patrick Siarry, Pramod Kumar Singh, Ajith Abraham, Mengjie Zhang, Albert Zomaya and Fazle Baki: “This book includes state-of-the-art discussions on various issues and aspects of the implementation, testing, validation, and application of big data in the context of healthcare. The concept of big data is revolutionary, both from a technological and societal well-being standpoint. This book provides a comprehensive reference guide for engineers, scientists, and students studying/involved in the development of big data tools in the areas of healthcare and medicine. It also features a multifaceted and state-of-the-art literature review on healthcare data, its modalities, complexities, and methodologies, along with mathematical formulations.

The book is divided into two main sections, the first of which discusses the challenges and opportunities associated with the implementation of big data in the healthcare sector. In turn, the second addresses the mathematical modeling of healthcare problems, as well as current and potential future big data applications and platforms…(More)”.

Big Data, Political Campaigning and the Law


Book edited by Normann Witzleb, Moira Paterson, and Janice Richardson on “Democracy and Privacy in the Age of Micro-Targeting”…: “In this multidisciplinary book, experts from around the globe examine how data-driven political campaigning works, what challenges it poses for personal privacy and democracy, and how emerging practices should be regulated.

The rise of big data analytics in the political process has triggered official investigations in many countries around the world, and become the subject of broad and intense debate. Political parties increasingly rely on data analytics to profile the electorate and to target specific voter groups with individualised messages based on their demographic attributes. Political micro-targeting has become a major factor in modern campaigning, because of its potential to influence opinions, to mobilise supporters and to get out votes. The book explores the legal, philosophical and political dimensions of big data analytics in the electoral process. It demonstrates that the unregulated use of big personal data for political purposes not only infringes voters’ privacy rights, but also has the potential to jeopardise the future of the democratic process, and proposes reforms to address the key regulatory and ethical questions arising from the mining, use and storage of massive amounts of voter data.

Providing an interdisciplinary assessment of the use and regulation of big data in the political process, this book will appeal to scholars from law, political science, political philosophy, and media studies, policy makers and anyone who cares about democracy in the age of data-driven political campaigning….(More)”.

What statistics can and can’t tell us about ourselves


Hannah Fry at The New Yorker: “Harold Eddleston, a seventy-seven-year-old from Greater Manchester, was still reeling from a cancer diagnosis he had been given that week when, on a Saturday morning in February, 1998, he received the worst possible news. He would have to face the future alone: his beloved wife had died unexpectedly, from a heart attack.

Eddleston’s daughter, concerned for his health, called their family doctor, a well-respected local man named Harold Shipman. He came to the house, sat with her father, held his hand, and spoke to him tenderly. Pushed for a prognosis as he left, Shipman replied portentously, “I wouldn’t buy him any Easter eggs.” By Wednesday, Eddleston was dead; Dr. Shipman had murdered him.

Harold Shipman was one of the most prolific serial killers in history. In a twenty-three-year career as a mild-mannered and well-liked family doctor, he injected at least two hundred and fifteen of his patients with lethal doses of opiates. He was finally arrested in September, 1998, six months after Eddleston’s death.

David Spiegelhalter, the author of an important and comprehensive new book, “The Art of Statistics” (Basic), was one of the statisticians tasked by the ensuing public inquiry to establish whether the mortality rate of Shipman’s patients should have aroused suspicion earlier. Then a biostatistician at Cambridge, Spiegelhalter found that Shipman’s excess mortality—the number of his older patients who had died in the course of his career over the number that would be expected of an average doctor’s—was a hundred and seventy-four women and forty-nine men at the time of his arrest. The total closely matched the number of victims confirmed by the inquiry….
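The excess-mortality arithmetic in the passage above can be written out as a short sketch. The function name and dictionary layout are assumptions for illustration; the only figures used are the ones quoted in the excerpt (174 women, 49 men, and at least 215 victims), and the sketch says nothing about how the inquiry's statisticians actually computed the expected counts.

```python
def excess_deaths(observed, expected):
    """Excess mortality: deaths observed minus deaths expected for an average doctor."""
    return observed - expected


# Excess deaths among older patients at the time of arrest, as quoted in the excerpt.
excess_by_sex = {"women": 174, "men": 49}
total_excess = sum(excess_by_sex.values())

# The excerpt gives "at least two hundred and fifteen" as the number of victims.
victims_at_least = 215

print("total excess deaths:", total_excess)
print("difference from the quoted victim count:", total_excess - victims_at_least)
```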

In 1825, the French Ministry of Justice ordered the creation of a national collection of crime records. It seems to have been the first of its kind anywhere in the world—the statistics of every arrest and conviction in the country, broken down by region, assembled and ready for analysis. It’s the kind of data set we take for granted now, but at the time it was extraordinarily novel. This was an early instance of Big Data—the first time that mathematical analysis had been applied in earnest to the messy and unpredictable realm of human behavior.

Or maybe not so unpredictable. In the early eighteen-thirties, a Belgian astronomer and mathematician named Adolphe Quetelet analyzed the numbers and discovered a remarkable pattern. The crime records were startlingly consistent. Year after year, irrespective of the actions of courts and prisons, the number of murders, rapes, and robberies reached almost exactly the same total. There is a “terrifying exactitude with which crimes reproduce themselves,” Quetelet said. “We know in advance how many individuals will dirty their hands with the blood of others. How many will be forgers, how many poisoners.”

To Quetelet, the evidence suggested that there was something deeper to discover. He developed the idea of a “Social Physics,” and began to explore the possibility that human lives, like planets, had an underlying mechanistic trajectory. There’s something unsettling in the idea that, amid the vagaries of choice, chance, and circumstance, mathematics can tell us something about what it is to be human. Yet Quetelet’s overarching findings still stand: at some level, human life can be quantified and predicted. We can now forecast, with remarkable accuracy, the number of women in Germany who will choose to have a baby each year, the number of car accidents in Canada, the number of plane crashes across the Southern Hemisphere, even the number of people who will visit a New York City emergency room on a Friday evening….(More)”

Government wants access to personal data while it pushes privacy


Sara Fischer and Scott Rosenberg at Axios: “Over the past two years, the U.S. government has tried to rein in how major tech companies use the personal data they’ve gathered on their customers. At the same time, government agencies are themselves seeking to harness those troves of data.

Why it matters: Tech platforms use personal information to target ads, whereas the government can use it to prevent and solve crimes, deliver benefits to citizens — or (illegally) target political dissent.

Driving the news: A new report from the Wall Street Journal details the ways in which family DNA testing sites like FamilyTreeDNA are pressured by the FBI to hand over customer data to help solve criminal cases using DNA.

  • The trend has privacy experts worried about the potential implications of the government having access to large pools of genetic data, even though many people whose data is included never agreed to its use for that purpose.

The FBI has particular interest in data from genetic and social media sites, because it could help solve crimes and protect the public.

  • For example, the FBI is “soliciting proposals from outside vendors for a contract to pull vast quantities of public data” from Facebook, Twitter Inc. and other social media companies, the Wall Street Journal reports.
  • The request is meant to help the agency surveil social behavior to “mitigate multifaceted threats, while ensuring all privacy and civil liberties compliance requirements are met.”
  • Meanwhile, the Trump administration has also urged social media platforms to cooperate with the government in efforts to flag individual users as potential mass shooters.

Other agencies have their eyes on big data troves as well.

  • Earlier this year, settlement talks between Facebook and the Department of Housing and Urban Development broke down over an advertising discrimination lawsuit when, according to a Facebook spokesperson, HUD “insisted on access to sensitive information — like user data — without adequate safeguards.”
  • HUD presumably wanted access to the data to ensure advertising discrimination wasn’t occurring on the platform, but it’s unclear whether the agency needed user data to be able to support that investigation….(More)”.

Companies Collect a Lot of Data, But How Much Do They Actually Use?


Article by Priceonomics Data Studio: “For all the talk of how data is the new oil and the most valuable resource of any enterprise, there is a deep dark secret companies are reluctant to share — most of the data collected by businesses simply goes unused.

This unknown and unused data, known as dark data, comprises more than half the data collected by companies. Given that some estimates indicate that 7.5 septillion (7,700,000,000,000,000,000,000) gigabytes of data are generated every single day, not using most of it is a considerable issue.

In this article, we’ll look at this dark data. Just how much of it is created by companies? What are the reasons this data isn’t being analyzed? And what are the costs and implications of companies not using the majority of the data they collect?

Before diving into the analysis, it’s worth spending a moment clarifying what we mean by the term “dark data.” Gartner defines dark data as:

“The information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).”

To learn more about this phenomenon, Splunk commissioned a global survey of 1,300+ business leaders to better understand how much data they collect, and how much is dark. Respondents were from IT and business roles, and were located in Australia, China, France, Germany, Japan, the United States, and the United Kingdom, across various industries. For the report, Splunk defines dark data as: “all the unknown and untapped data across an organization, generated by systems, devices and interactions.”

While the cost of storing data has decreased over time, the cost of saving septillions of gigabytes of wasted data is still significant. What’s more, during this time the strategic importance of data has increased as companies have found more and more uses for it. Given the cost of storage and the value of data, why does so much of it go unused?

The following chart shows the reasons why dark data isn’t currently being harnessed:

By a large margin, the number one reason given for not using dark data is that companies lack a tool to capture or analyze the data. Companies accumulate data from server logs, GPS networks, security tools, call records, web traffic and more. Companies track everything from digital transactions to the temperature of their server rooms to the contents of retail shelves. Most of this data lies in separate systems, is unstructured, and cannot be connected or analyzed.

Second, the data captured just isn’t good enough. You might have important customer information about a transaction, but it’s missing location or other important metadata because that information sits somewhere else or was never captured in a usable format.

Additionally, dark data exists because there is simply too much data out there and a lot of it is unstructured. The larger the dataset (or the less structured it is), the more sophisticated the tool required for analysis. These kinds of datasets also often require analysis by individuals with significant data science expertise, who are often in short supply.

The implications of the prevalence of dark data are vast. As a result of the data deluge, companies often don’t know where all the sensitive data is stored and can’t be confident they are complying with consumer data protection measures like GDPR. …(More)”.
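The daily data-volume figure quoted in the dark-data excerpt above is given both in words and in digits, and a quick arithmetic check shows how the two relate. The sketch below assumes the short-scale (US) meaning of "septillion" and "sextillion"; the constant and variable names are hypothetical, and the check does not establish which of the two readings the original estimate intended.

```python
# Short-scale number names (assumption: US usage).
SEXTILLION = 10 ** 21
SEPTILLION = 10 ** 24

# "7.5 septillion", as written in words in the excerpt.
in_words = 75 * SEPTILLION // 10

# The digits printed in parentheses in the excerpt.
in_digits = 7_700_000_000_000_000_000_000

# The printed digits correspond to 7.7 sextillion, not 7.5 septillion.
print(in_digits == 77 * SEXTILLION // 10)

# Whole-number ratio between the two readings (roughly a factor of a thousand).
print(in_words // in_digits)
```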