Paper by Sofia C. Olhede and Patrick J. Wolfe in Statistics & Probability Letters: “The Danish physicist Niels Bohr is said to have remarked: ‘Prediction is very difficult, especially about the future’. Predicting the future of statistics in the era of big data is not so very different from prediction about anything else. Ever since we started to collect data to predict cycles of the moon, seasons, and hence future agriculture yields, humankind has worked to infer information from indirect observations for the purpose of making predictions.
Even while acknowledging the momentous difficulty in making predictions about the future, a few topics stand out clearly as lying at the current and future intersection of statistics and data science. Not all of these topics are of a strictly technical nature, but all have technical repercussions for our field. How might these repercussions shape the still relatively young field of statistics? And what can sound statistical theory and methods bring to our understanding of the foundations of data science? In this article we discuss these issues and explore how new open questions motivated by data science may in turn necessitate new statistical theory and methods now and in the future.
Together, the ubiquity of sensing devices, the low cost of data storage, and the commoditization of computing have led to a volume and variety of modern data sets that would have been unthinkable even a decade ago. We see four important implications for statistics.
First, many modern data sets are related in some way to human behavior. Data might have been collected by interacting with human beings, or personal or private information traceable back to a given set of individuals might have been handled at some stage. Mathematical or theoretical statistics traditionally does not concern itself with the finer points of human behavior, and indeed many of us have only had limited training in the rules and regulations that pertain to data derived from human subjects. Yet inevitably in a data-rich world, our technical developments cannot be divorced from the types of data sets we can collect and analyze, and how we can handle and store them.
Second, the importance of data to our economies and civil societies means that the future of regulation will look not only to protect our privacy, and how we store information about ourselves, but also to include what we are allowed to do with that data. For example, as we collect high-dimensional vectors about many family units across time and space in a given region or country, privacy will be limited by that high-dimensional space, but our wish to control what we do with data will go beyond that….
Third, the growing complexity of algorithms is matched by an increasing variety and complexity of data. Data sets now come in a variety of forms that can be highly unstructured, including images, text, sound, and various other new forms. These different types of observations have to be understood together, resulting in multimodal data, in which a single phenomenon or event is observed through different types of measurement devices. Rather than having one phenomenon corresponding to single scalar values, a much more complex object is typically recorded. This could be a three-dimensional shape, for example in medical imaging, or multiple types of recordings such as functional magnetic resonance imaging and simultaneous electroencephalography in neuroscience. Data science therefore challenges us to describe these more complex structures, modeling them in terms of their intrinsic patterns.
Finally, the types of data sets we now face are far from satisfying the classical statistical assumptions of identically distributed and independent observations. Observations are often “found” or repurposed from other sampling mechanisms, rather than necessarily resulting from designed experiments….
Our field will either meet these challenges and become increasingly ubiquitous, or risk rapidly becoming irrelevant to the future of data science and artificial intelligence…”
“The current moment in world history is a painful one. Open societies are in crisis, and various forms of dictatorships and mafia states, exemplified by Vladimir Putin’s Russia, are on the rise. In the United States, President Donald Trump would like to establish his own mafia-style state but cannot, because the Constitution, other institutions, and a vibrant civil society won’t allow it….
The rise and monopolistic behavior of the giant American Internet platform companies is contributing mightily to the US government’s impotence. These companies have often played an innovative and liberating role. But as Facebook and Google have grown ever more powerful, they have become obstacles to innovation, and have caused a variety of problems of which we are only now beginning to become aware…
Social media companies’ true customers are their advertisers. But a new business model is gradually emerging, based not only on advertising but also on selling products and services directly to users. They exploit the data they control, bundle the services they offer, and use discriminatory pricing to keep more of the benefits that they would otherwise have to share with consumers. This enhances their profitability even further, but the bundling of services and discriminatory pricing undermine the efficiency of the market economy.
Social media companies deceive their users by manipulating their attention, directing it toward their own commercial purposes, and deliberately engineering addiction to the services they provide. This can be very harmful, particularly for adolescents.
There is a similarity between Internet platforms and gambling companies. Casinos have developed techniques to hook customers to the point that they gamble away all of their money, even money they don’t have.
Something similar – and potentially irreversible – is happening to human attention in our digital age. This is not a matter of mere distraction or addiction; social media companies are actually inducing people to surrender their autonomy. And this power to shape people’s attention is increasingly concentrated in the hands of a few companies.
It takes significant effort to assert and defend what John Stuart Mill called the freedom of mind. Once lost, those who grow up in the digital age may have difficulty regaining it.
This would have far-reaching political consequences. People without the freedom of mind can be easily manipulated. This danger does not loom only in the future; it already played an important role in the 2016 US presidential election.
There is an even more alarming prospect on the horizon: an alliance between authoritarian states and large, data-rich IT monopolies, bringing together nascent systems of corporate surveillance with already-developed systems of state-sponsored surveillance. This may well result in a web of totalitarian control the likes of which not even George Orwell could have imagined…”