Babbage among the insurers: big 19th-century data and the public interest.


Paper by D. C. S. Wilson in History of the Human Sciences: “This article examines life assurance and the politics of ‘big data’ in mid-19th-century Britain. The datasets generated by life assurance companies were vast archives of information about human longevity. Actuaries distilled these archives into mortality tables – immensely valuable tools for predicting mortality and so pricing risk. The status of mortality tables was ambiguous, being both public and private objects: often computed from company records, they could also be extrapolated from quasi-public projects such as the Census or clerical records. Life assurance more generally straddled the line between private enterprise and collective endeavour, though its advocates stressed the public interest in its success. Reforming actuaries such as Thomas Rowe Edmonds wanted the data on which mortality tables were based to be made publicly available, but faced resistance. Such resistance undermined insurers’ claims to be scientific in spirit and hindered Edmonds’s personal quest for a law of mortality. Edmonds pushed instead for an open actuarial science alongside fellow-travellers at the Statistical Society of London, which was populated by statisticians such as William Farr (whose subsequent work, it is argued, was influenced by Edmonds) as well as by radical mathematicians such as Charles Babbage. The article explores Babbage’s little-known foray into the world of insurance, both as a budding actuary and as a fierce critic of the industry. These debates over the construction, ownership, and accessibility of insurance datasets show that concern about the politics of big data did not begin in the 21st century….(More)”.
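To see what made such tables so valuable, here is a minimal sketch (our illustration, not drawn from the article) of the arithmetic they support: raw records of deaths and lives exposed yield a mortality rate, which in turn prices a simple one-year term assurance. All figures are hypothetical.

```python
# A minimal sketch (our illustration, not drawn from the article) of the
# arithmetic behind a mortality table: counts of deaths and lives exposed at
# an age give a mortality rate q_x, which then prices a one-year term
# assurance. All figures below are hypothetical.

def mortality_rate(deaths: int, exposed: int) -> float:
    """Crude one-year mortality rate q_x: deaths per life exposed at age x."""
    return deaths / exposed

def one_year_term_premium(q_x: float, sum_assured: float, interest: float) -> float:
    """Net single premium: the expected payout, discounted one year."""
    return q_x * sum_assured / (1.0 + interest)

q_40 = mortality_rate(deaths=87, exposed=10_000)                 # hypothetical age-40 experience
premium = one_year_term_premium(q_40, sum_assured=100.0, interest=0.03)
print(f"q_40 = {q_40:.4f}, net premium = {premium:.4f}")
```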

Privacy and Synthetic Datasets


Paper by Steven M. Bellovin, Preetam K. Dutta and Nathan Reitinger: “Sharing is a virtue, instilled in us from childhood. Unfortunately, when it comes to big data — i.e., databases possessing the potential to usher in a whole new world of scientific progress — the legal landscape prefers a hoggish motif. The historic approach to the resulting database–privacy problem has been anonymization, a subtractive technique incurring not only poor privacy results, but also lackluster utility. In anonymization’s stead, differential privacy arose; it provides better, near-perfect privacy, but is nonetheless subtractive in terms of utility.

Today, another solution is coming to the fore: synthetic data. Using the magic of machine learning, synthetic data offers a generative, additive approach — the creation of almost-but-not-quite replica data. In fact, as we recommend, synthetic data may be combined with differential privacy to achieve a best-of-both-worlds scenario. After unpacking the technical nuances of synthetic data, we analyze its legal implications, finding both over- and under-inclusive applications. Privacy statutes either overweigh or downplay the potential for synthetic data to leak secrets, inviting ambiguity. We conclude by finding that synthetic data is a valid, privacy-conscious alternative to raw data, but is not a cure-all for every situation. In the end, computer science progress must be met with proper policy in order to move the area of useful data dissemination forward….(More)”.
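One common way to realise the combination the authors recommend is to fit a differentially private summary of the data and sample synthetic records from it. The sketch below (our illustration, not the paper's method; all names and numbers are hypothetical) releases synthetic values drawn from a Laplace-noised histogram, so the released sample depends on the raw data only through ε-differentially private counts.

```python
# A sketch of differentially private synthetic data (our illustration, not the
# paper's method): perturb a histogram of the raw values with Laplace noise
# (the classic epsilon-DP mechanism for counting queries), then sample
# synthetic records from the noised distribution.
import numpy as np

def dp_synthetic_sample(values, bins, epsilon, n_synthetic, rng=None):
    """Return synthetic draws from a Laplace-noised histogram of `values`."""
    rng = rng or np.random.default_rng()
    counts, edges = np.histogram(values, bins=bins)
    # A histogram of disjoint bins has sensitivity 1: adding or removing one
    # person changes one bin count by one.
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    probs = np.clip(noisy, 0, None)
    probs = probs / probs.sum()
    # Sample bin indices, then draw uniformly within each chosen bin.
    idx = rng.choice(len(probs), size=n_synthetic, p=probs)
    return rng.uniform(edges[idx], edges[idx + 1])

# Hypothetical raw attribute (e.g. ages), released only as synthetic values.
raw_ages = np.random.default_rng(0).normal(45, 12, size=5_000)
synthetic_ages = dp_synthetic_sample(raw_ages, bins=30, epsilon=1.0, n_synthetic=5_000)
```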

After the flood, the flood map: uncovering values at risk


Rebecca Elliott at Work in Progress: “Climate change is making the planet we inhabit a more dangerous place to live. After the devastating 2017 hurricane season in the U.S. and Caribbean, it has become easier, and more frightening, to comprehend what a world of more frequent and severe storms and extreme weather might portend for our families and communities.

When policymakers, officials, and experts talk about such threats, they often do so in a language of “value at risk”: a measurement of the financial worth of assets exposed to potential losses in the face of natural hazards. This language is not only descriptive, expressing the extent of the threat; it is also, in some ways, prescriptive.

Information about value and risk provides a way for us to exert some control, to “tame uncertainty” and, if not precisely predict, at least to plan and prepare prudently for the future. If we know the value at risk, we can take smart steps to protect it.

This logic can, however, break down in practice.

After Hurricane Sandy in 2012, I went to New York City to find out how residents there, particularly homeowners, were responding to a new landscape of “value at risk.” In the wake of the catastrophe, they had received a new “flood insurance rate map” that expanded the boundaries of the city’s high-risk flood zones.

129 billion dollars of property was now officially “at risk” of flood, an increase of more than 120 percent over the previous map.

And yet, I found that many New Yorkers were less worried about the threat of flooding than they were about the flood map itself. As one Rockaway man put it, the map was “scarier than another storm.”

Far from producing clear strategies of action, the map produced ambivalent actors and outcomes. Even when people took steps to reduce their flood risk, they often did not feel that they were better off or more secure for having done so. How can we understand this?

By examining the social life of the flood insurance rate map, talking to its users (affected residents as well as the experts, officials, and professionals who work with them), I found that the stakes on the ground were bigger than just property values and floods. Other kinds of values were threatened here, and not just from floods, complicating the decision of what to do and when….(More)”.

Who represents the human in the digital age?


Anni Rowland-Campbell at NPC: “In his book The Code Economy, Philip E. Auerswald talks about the long history of humans developing code as a mechanism by which to create and regulate activities and markets.[1] We have Codes of Practice, Ethical Codes, Building Codes, and Legal Codes, just to name a few.

Each and every one of these is based on the data of human behaviour, and that data can now be collected, analysed, harvested and repurposed as never before through the application of intelligent machines that operate and are instructed by algorithms. Anything that can be articulated as an algorithm—a self-contained sequence of actions to be performed—is now fertile ground for machine analysis, and increasingly machine activity.

So, what does this mean for us humans, who are ourselves a conglomeration of DNA code? I have spent many years thinking about this. Not that long ago my friends and family tolerated my speculations with good humour but a fair degree of scepticism. Now I run workshops for boards, and even my children are listening far more intently, because people are sensing that the invasion of the ‘Social Machine’ is changing our relationship with such things as privacy, as well as with both ourselves and each other. It is changing how we understand our role as humans.

The Social Machine is the name given to the systems we have created that blur the lines between computational processes and human input, of which the World Wide Web is the largest and best-known example. These ‘smart machines’ are increasingly pervading almost every aspect of human existence and, in many ways, getting to know us better than we know ourselves.

So who stands up for us humans? Who determines how society will harness and utilise the power of information technologies whilst ensuring that the human remains both relevant and important?…

Philanthropists must equip themselves with the knowledge they need in order to do good with digital

Consider the Luddites as they smashed the looms in the early 1800s. Their struggle is instructive because they were amongst the first to experience technological displacement. They sensed the degradation of humankind and they fought for social equality and fairness in the distribution of the benefits of science and technology to all. If knowledge is power, philanthropy must arm itself with knowledge of digital to ensure the power of digital lies with the many and not the few.

The best place to start in understanding the digital world as it stands now is to begin to see the world, and all human activities, through the lens of data and as a form of digital currency. This links back to the earlier idea of codes. Our activities, up until recently, were tacit and experiential, but now they are becoming increasingly explicit and quantified. Where we go, who we meet, what we say, what we do is all being registered, monitored and measured as long as we are connected to the digital infrastructure.

A new currency is emerging that is based on the world’s most valuable resource: data. It is this currency that connects the arteries and capillaries, and reaches across all disciplines and fields of expertise. The kind of education that is required now is to be able to make connections and to see the opportunities in the interstice between policy and day-to-day reality.

The dominant players in this space thus far have been the large corporations and governments that have harnessed and exploited digital currencies for their own benefit. Shoshana Zuboff describes this as the ‘surveillance economy’. But this data actually belongs to each and every human who generates it. As people begin to wake up to this we are gradually realising that this is what fuels the social currency of entrepreneurship, leadership and innovation, and provides the legitimacy upon which trust is based.

Trust is an outcome of experiences and interactions, but governments and corporations have transactionalised their interactions with citizens and consumers by exploiting data. As a consequence, they have eroded the esteem in which they are held. The more they try to garner greater insights through data and surveillance, the more they alienate the people they seek to reach.

If we are smart, what we need to do as philanthropists is understand the fundamentals of data as a currency and integrate this into each and every interaction we have. This will enable us to create relationships with people that are based on authenticity of purpose, supported by the data of proof. Yes, there have been some instances where the sector has not done as well as it could and betrayed that trust. But this only serves as a lesson in how fragile the world of trust and legitimacy is. It shows how crucial it is that we define all that we do in terms of social outcomes and impact, however that is defined….(More)”

DNA databases are too white. This man aims to fix that.


Interview of Carlos D. Bustamante by David Rotman: “In the 15 years since the Human Genome Project first exposed our DNA blueprint, vast amounts of genetic data have been collected from millions of people in many different parts of the world. Carlos D. Bustamante’s job is to search that genetic data for clues to everything from ancient history and human migration patterns to the reasons people with different ancestries are so varied in their response to common diseases.

Bustamante’s career has roughly spanned the period since the Human Genome Project was completed. A professor of genetics and biomedical data science at Stanford and 2010 winner of a MacArthur genius award, he has helped to tease out the complex genetic variation across different populations. These variants mean that the causes of diseases can vary greatly between groups. Part of the motivation for Bustamante, who was born in Venezuela and moved to the US when he was seven, is to use those insights to lessen the medical disparities that still plague us.

But while it’s an area ripe with potential for improving medicine, it’s also fraught with controversies over how to interpret genetic differences between human populations. In an era still obsessed with race and ethnicity—and marred by the frequent misuse of science in defining the characteristics of different groups—Bustamante remains undaunted in searching for the nuanced genetic differences that these groups display.

Perhaps his optimism is due to his personality—few sentences go by without a “fantastic” or “extraordinarily exciting.” But it is also his recognition as a population geneticist of the incredible opportunity that understanding differences in human genomes presents for improving health and fighting disease.

David Rotman, MIT Technology Review’s editor at large, discussed with Bustamante why it’s so important to include more people in genetic studies and understand the genetics of different populations.

How good are we at making sure that the genomic data we’re collecting is inclusive?

I’m optimistic, but it’s not there yet.

In our 2011 paper, the statistic we had was that more than 96% of participants in genome-wide association studies were of European descent. In the follow-up in 2016, the number went from 96% to around 80%. So that’s getting better. Unfortunately, or perhaps fortunately, a lot of that is due to the entry of China into genetics. A lot of that was due to large-scale studies in Chinese and East Asian populations. Hispanics, for example, make up less than 1% of genome-wide association studies. So we need to do better. Ultimately, we want precision medicine to benefit everybody.

Aside from a fairness issue, why is diversity in genomic data important? What do we miss without it?

First of all, it has nothing to do with political correctness. It has everything to do with human biology and the fact that human populations and the great diaspora of human migrations have left their mark on the human genome. The genetic underpinnings of health and disease have shared components across human populations and things that are unique to different populations….(More)”.

Crowdsourcing the vote: New horizons in citizen forecasting


Article by Mickael Temporão, Yannick Dufresne, Justin Savoie, and Clifton van der Linden in the International Journal of Forecasting: “People do not know much about politics. This is one of the most robust findings in political science and is backed by decades of research. Most of this research has focused on people’s ability to know about political issues and party positions on these issues. But can people predict elections? Our research uses a very large dataset (n>2,000,000) collected during ten provincial and federal elections in Canada to test whether people can predict the electoral victor and the closeness of the race in their district throughout the campaign. The results show that they can. This paper also contributes to the emerging literature on citizen forecasting by developing a scaling method that allows us to compare the closeness of races and that can be applied to multiparty contexts with varying numbers of parties. Finally, we assess the accuracy of citizen forecasting in Canada when compared to voter expectations weighted by past votes and political competency….(More)”.
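As a toy illustration of the idea (not the authors' scaling method), a district-level citizen forecast can be built by aggregating respondents' predicted winners, with "closeness" taken as the gap between the two most-predicted parties; the parties and counts below are hypothetical.

```python
# Toy citizen-forecast aggregation (our illustration, not the paper's method).
from collections import Counter

def citizen_forecast(predictions):
    """predictions: one predicted winning party per respondent in a district."""
    counts = Counter(predictions)
    total = sum(counts.values())
    shares = {party: n / total for party, n in counts.items()}
    ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    winner, top_share = ranked[0]
    runner_up_share = ranked[1][1] if len(ranked) > 1 else 0.0
    return {"winner": winner, "closeness": top_share - runner_up_share, "shares": shares}

# Hypothetical district with three parties and 100 respondents:
print(citizen_forecast(["LPC"] * 52 + ["CPC"] * 40 + ["NDP"] * 8))
```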

When the Rule of Law Is Not Working


A conversation with Karl Sigmund at Edge: “…Now, I’m getting back to evolutionary game theory, the theory of evolution of cooperation and the social contract, and how the social contract can be subverted by corruption. That’s what interests me most currently. Of course, that is not a new story. I believe it explains a lot of what I see happening in my field and in related fields. The ideas that survive are the ideas that are fruitful in the sense of quickly producing a lot of publications, and that’s not necessarily correlated with these ideas being important to advancing science.

Corruption is a wicked problem, wicked in the technical sense of sociology, and it’s not something that will go away. You can reduce it, but as soon as you stop your efforts, it comes back again. Of course, there are many sides to corruption, but everybody seems now to agree that it is a very important problem. In fact, there was a Gallup poll recently in which people were asked what the number one problem in today’s world is. You would think it would be climate change or overpopulation, but it turned out the majority said “corruption.” So, it’s a problem that is affecting us deeply.

There are so many different types of corruption, but the official definition is “a misuse of public trust for private means.” And this need not be by state officials; it could be also by CEOs, or by managers of non-governmental organizations, or by a soccer referee for that matter. It is always the misuse of public trust for private means, which of course takes many different forms; for instance, you have something called pork barreling, which is a wonderful expression in the United States, or embezzlement of funds, and so on.

I am mostly interested in the effect of bribery upon the judiciary system. If the trust in contracts breaks down, then the economy breaks down, because trust is at the root of the economy. There are staggering statistics which illustrate that the economic welfare of a state is closely related to the corruption perception index. Every year there are statistics about corruption published by organizations such as Transparency International or other such non-governmental organizations. It is truly astonishing how closely this gradient between the different countries on the corruption level aligns with the gradient in welfare, in household income and things like this.

The paralyzing effect of this type of corruption upon the economy is something that is extremely interesting. Lots of economists are now turning their interest to that, which is new. In the 1970s, the Nobel Prize-winning economist Gunnar Myrdal said that corruption was practically taboo as a research topic among economists. That has certainly changed in the decades since. It has become a very interesting topic for students of law, economics, and sociology, and for historians, of course, because corruption has always been with us. This is now a booming field, and I would like to approach it with evolutionary game theory.

Evolutionary game theory has a long tradition, and I have witnessed its development practically from the beginning. Some of the most important pioneers were Robert Axelrod and John Maynard Smith. In particular Axelrod, in the late ’70s, wrote a truly seminal book called The Evolution of Cooperation, about the iterated prisoner’s dilemma. He showed that there is a way out of the social dilemma, which is based on reciprocity. This surprised economists, particularly game theoreticians. He showed that by viewing social dilemmas in the context of a population where people learn from each other, where social learning imitates whatever type of behavior is currently the best, you can place it into a context where cooperative strategies based on reciprocation, like tit for tat, can evolve….(More)”.
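A minimal sketch makes Axelrod's result concrete: in the iterated prisoner's dilemma, tit for tat opens by cooperating and then simply mirrors the opponent's previous move. The payoff values below are the usual textbook ones, chosen only for illustration.

```python
# Iterated prisoner's dilemma with tit for tat (a minimal sketch).
# Payoffs follow the standard ordering T > R > P > S: (C,C)=3 each, (D,C)=5 to the defector, etc.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(opponent_history):
    return "C" if not opponent_history else opponent_history[-1]  # mirror the last move

def always_defect(opponent_history):
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    score_a = score_b = 0
    seen_by_a, seen_by_b = [], []          # each player's record of the other's moves
    for _ in range(rounds):
        move_a = strategy_a(seen_by_a)
        move_b = strategy_b(seen_by_b)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a, score_b = score_a + pay_a, score_b + pay_b
        seen_by_a.append(move_b)
        seen_by_b.append(move_a)
    return score_a, score_b

print(play(tit_for_tat, always_defect))   # (9, 14): exploited once, then mutual defection
print(play(tit_for_tat, tit_for_tat))     # (30, 30): mutual cooperation throughout
```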

Statistics and data science degrees: Overhyped or the real deal?


At The Conversation: “Data science” is hot right now. The number of undergraduate degrees in statistics has tripled in the past decade, and as a statistics professor, I can tell you that it isn’t because freshmen love statistics.

Way back in 2009, economist Hal Varian of Google dubbed statistician the “next sexy job.” Since then, statistician, data scientist and actuary have topped various “best jobs” lists. Not to mention the enthusiastic press coverage of industry applications: Machine learning! Big data! AI! Deep learning!

But is it good advice? I’m going to voice an unpopular opinion for the sake of starting a conversation. Stats is indeed useful, but not in the way that the popular media – and all those online data science degree programs – seem to suggest….

While all the press tends to go to the sensationalist applications – computers that watch cat videos, anyone? – the data science boom reflects a broad increase in demand for data literacy, as a baseline requirement for modern jobs.

The “big data era” doesn’t just mean large amounts of data; it also means increased ease and ability to collect data of all types, in all walks of life. Although the big five tech companies – Google, Apple, Amazon, Facebook and Microsoft – represent about 10 percent of the U.S. market cap and dominate the public imagination, they employ only one-half of one percent of all employees.

Therefore, to be a true revolution, data science will need to infiltrate nontech industries. And it is. The U.S. has seen its impact on political campaigns. I myself have consulted in the medical devices sector. A few years back, Walmart held a data analysis competition as a recruiting tool. The need for people that can dig into the data and parse it is everywhere.

In a speech at the National Academy of Sciences in 2015, Steven “Freakonomics” Levitt related his insights about the need for data-savvy workers, based on his experience as a sought-after consultant in fields ranging from the airline industry to fast food….(More)”.

Craft metrics to value co-production


Liz Richardson and Beth Perry at Nature: “Advocates of co-production encourage collaboration between professional researchers and those affected by that research, to ensure that the resulting science is relevant and useful. Opening up science beyond scientists is essential, particularly where problems are complex, solutions are uncertain and values are salient. For example, patients should have input into research on their conditions, and first-hand experience of local residents should shape research on environmental-health issues.

But what constitutes success on these terms? Without a better understanding of this, it is harder to incentivize co-production in research. A key way to support co-production is reconfiguring that much-derided feature of academic careers: metrics.

Current indicators of research output (such as paper counts or the h-index) conceptualize the value of research narrowly. They are already roundly criticized as poor measures of quality or usefulness. Less appreciated is the fact that these metrics also leave out the societal relevance of research and omit diverse approaches to creating knowledge about social problems.
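For reference, the h-index mentioned above is simple to state in code: it is the largest h such that h of a researcher's papers each have at least h citations (a minimal sketch, with made-up citation counts).

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

print(h_index([10, 8, 5, 4, 3]))   # 4: four papers each cited at least 4 times
```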

Peer review also has trouble assessing the value of research that sits at disciplinary boundaries or that addresses complex social challenges. It denies broader social accountability by giving scientists a monopoly on determining what is legitimate knowledge [1]. Relying on academic peer review as a means of valuing research can discourage broader engagement.

This privileges abstract and theoretical research over work that is localized and applied. For example, research on climate-change adaptation, conducted in the global south by researchers embedded in affected communities, can make real differences to people’s lives. Yet it is likely to be valued less highly by conventional evaluation than research that is generalized from afar and then published in a high-impact English-language journal….(More)”.

What is machine learning?


Chris Meserole at Brookings: “In the summer of 1955, while planning a now famous workshop at Dartmouth College, John McCarthy coined the term “artificial intelligence” to describe a new field of computer science. Rather than writing programs that tell a computer how to carry out a specific task, McCarthy pledged that he and his colleagues would instead pursue algorithms that could teach themselves how to do so. The goal was to create computers that could observe the world and then make decisions based on those observations—to demonstrate, that is, an innate intelligence.

The question was how to achieve that goal. Early efforts focused primarily on what’s known as symbolic AI, which tried to teach computers how to reason abstractly. But today the dominant approach by far is machine learning, which relies on statistics instead. Although the approach dates back to the 1950s—one of the attendees at Dartmouth, Arthur Samuel, was the first to describe his work as “machine learning”—it wasn’t until the past few decades that computers had enough storage and processing power for the approach to work well. The rise of cloud computing and customized chips has powered breakthrough after breakthrough, with research centers like OpenAI or DeepMind announcing stunning new advances seemingly every week.
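A minimal sketch (our illustration, not from the article) of what "relying on statistics" means in practice: instead of hand-coding the rule y = 2x + 1, the program estimates it from noisy examples by ordinary least squares.

```python
# Learning a rule from data rather than being told the rule (a minimal sketch).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2 * x + 1 + rng.normal(scale=0.5, size=200)   # noisy observations of y = 2x + 1

# Ordinary least squares: pick the slope and intercept that best fit the data.
X = np.column_stack([x, np.ones_like(x)])
(w, b), *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"learned rule: y ≈ {w:.2f}x + {b:.2f}")    # close to the true 2x + 1
```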

The extraordinary success of machine learning has made it the default method of choice for AI researchers and experts. Indeed, machine learning is now so popular that it has effectively become synonymous with artificial intelligence itself. As a result, it’s not possible to tease out the implications of AI without understanding how machine learning works—as well as how it doesn’t….(More)”.