Big Data: the End of the Scientific Method?


Paper by S. Succi and P.V. Coveney at arXiv: “We argue that the boldest claims of Big Data are in need of revision and toning-down, in view of a few basic lessons learned from the science of complex systems. We point out that, once the most extravagant claims of Big Data are properly discarded, a synergistic merging of BD with big theory offers considerable potential to spawn a new scientific paradigm capable of overcoming some of the major barriers confronted by the modern scientific method originating with Galileo. These obstacles are due to the presence of nonlinearity, nonlocality and hyperdimensions which one encounters frequently in multiscale modelling….(More)”.

Doing good data science


Mike Loukides, Hilary Mason and DJ Patil at O’Reilly: “(This post is the first in a series on data ethics) The hard thing about being an ethical data scientist isn’t understanding ethics. It’s the junction between ethical ideas and practice. It’s doing good data science.

There has been a lot of healthy discussion about data ethics lately. We want to be clear: that discussion is good, and necessary. But it’s also not the biggest problem we face. We already have good standards for data ethics. The ACM’s code of ethics, which dates back to 1993, is clear, concise, and surprisingly forward-thinking; 25 years later, it’s a great start for anyone thinking about ethics. The American Statistical Association has a good set of ethical guidelines for working with data. So, we’re not working in a vacuum.

And, while there are always exceptions, we believe that most people want to be fair. Data scientists and software developers don’t want to harm the people using their products. There are exceptions, of course; we call them criminals and con artists. Defining “fairness” is difficult, and perhaps impossible, given the many crosscutting layers of “fairness” that we might be concerned with. But we don’t have to solve that problem in advance, and it’s not going to be solved in a simple statement of ethical principles, anyway.

The problem we face is different: how do we put ethical principles into practice? We’re not talking about an abstract commitment to being fair. Ethical principles are worse than useless if we don’t allow them to change our practice, if they don’t have any effect on what we do day-to-day. For data scientists, whether you’re doing classical data analysis or leading-edge AI, that’s a big challenge. We need to understand how to build the software systems that implement fairness. That’s what we mean by doing good data science.

Any code of data ethics will tell you that you shouldn’t collect data from experimental subjects without informed consent. But that code won’t tell you how to implement “informed consent.” Informed consent is easy when you’re interviewing a few dozen people in person for a psychology experiment. Informed consent means something different when someone clicks on an item in an online catalog (hello, Amazon), and ads for that item start following them around ad infinitum. Do you use a pop-up to ask for permission to use their choice in targeted advertising? How many customers would you lose? Informed consent means something yet again when you’re asking someone to fill out a profile for a social site, and you might (or might not) use that data for any number of experimental purposes. Do you pop up a consent form in impenetrable legalese that basically says “we will use your data, but we don’t know for what”? Do you phrase this agreement as an opt-out, and hide it somewhere on the site where nobody will find it?…
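The gap between "get informed consent" as a principle and as running software can be made concrete. Below is a minimal, hypothetical sketch (not drawn from the post) of a purpose-specific, opt-in consent record: the default is deny, and each use of the data must name a purpose the user explicitly granted. The class and function names are illustrative assumptions, not an existing API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentRecord:
    """Purpose-specific, opt-in consent for one user (hypothetical model)."""
    user_id: str
    granted_purposes: set = field(default_factory=set)  # e.g. {"recommendations"}
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def grant(self, purpose: str) -> None:
        self.granted_purposes.add(purpose)

    def revoke(self, purpose: str) -> None:
        self.granted_purposes.discard(purpose)


def may_use(record: ConsentRecord, purpose: str) -> bool:
    """Default-deny: data may be used only for purposes explicitly opted into."""
    return purpose in record.granted_purposes


consent = ConsentRecord(user_id="u123")
consent.grant("recommendations")
print(may_use(consent, "recommendations"))  # True
print(may_use(consent, "targeted_ads"))     # False: each purpose needs its own opt-in
```

The design choice encoded here is the opposite of the "impenetrable legalese" blanket form the post criticizes: consent is scoped per purpose and revocable, so an unanticipated experimental use requires going back to the user rather than relying on an opt-out nobody found.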

To put ethical principles into practice, we need space to be ethical. We need the ability to have conversations about what ethics means, what it will cost, and what solutions to implement. As technologists, we frequently share best practices at conferences, write blog posts, and develop open source technologies—but we rarely discuss problems such as how to obtain informed consent.

There are several facets to this space that we need to think about.

First, we need corporate cultures in which discussions about fairness, about the proper use of data, and about the harm that can be done by inappropriate use of data can take place. In turn, this means that we can’t rush products out the door without thinking about how they’re used. We can’t allow “internet time” to mean ignoring the consequences. Indeed, computer security has shown us the consequences of ignoring the consequences: many companies that have never taken the time to implement good security practices and safeguards are now paying with damage to their reputations and their finances. We need to do the same when thinking about issues like fairness, accountability, and unintended consequences….(More)”.

Making a 21st Century Constitution: Playing Fair in Modern Democracies



Book by Frank Vibert: “Democratic constitutions are increasingly unfit for purpose with governments facing increased pressures from populists and distrust from citizens. The only way to truly solve these problems is through reform. Within this important book, Frank Vibert sets out the key challenges to reform, the ways in which constitutions should be revitalised and provides the standards against which reform should be measured…

Democratic governments are increasingly under pressure from populists, and distrust of governmental authority is on the rise. Economic causes are often blamed. Making a 21st Century Constitution proposes instead that constitutions no longer provide the kind of support that democracies need in today’s conditions, and outlines ways in which reformers can rectify this.

Frank Vibert addresses key sources of constitutional obsolescence, identifies the main challenges for constitutional updating and sets out the ways in which constitutions may be made suitable for the 21st century. The book highlights the need for reformers to address the deep diversity of values in today’s urbanized societies, the blind spots and content-lite nature of democratic politics, and the dispersion of authority among new chains of intermediaries.

This book will be invaluable for students of political science, public administration and policy, law and constitutional economics. Its analysis of how constitutions can be made fit for purpose again will appeal to all concerned with governance, practitioners and reformers alike…(More)”.

Ethics as Methods: Doing Ethics in the Era of Big Data Research—Introduction


Introduction to the Special Issue of Social Media + Society on “Ethics as Methods: Doing Ethics in the Era of Big Data Research”: Building on a variety of theoretical paradigms (i.e., critical theory, [new] materialism, feminist ethics, theory of cultural techniques) and frameworks (i.e., contextual integrity, deflationary perspective, ethics of care), the Special Issue contributes specific cases and fine-grained conceptual distinctions to ongoing discussions about ethics in data-driven research.

In the second decade of the 21st century, a grand narrative is emerging that posits knowledge derived from data analytics as true, because of the objective qualities of data, their means of collection and analysis, and the sheer size of the data set. The by-product of this grand narrative is that the qualitative aspects of behavior and experience that form the data are diminished, and the human is removed from the process of analysis.

This situates data science as a process of analysis performed by the tool, which obscures human decisions in the process. The scholars involved in this Special Issue problematize the assumptions and trends in big data research and point out the crisis in accountability that emerges from using such data to make societal interventions.

Our collaborators offer a range of answers to the question of how to configure ethics through a methodological framework in the context of the prevalence of big data, neural networks, and automated, algorithmic governance of much of human socia(bi)lity…(More)”.

Open Science by Design: Realizing a Vision for 21st Century Research


Report by the National Academies of Sciences: “Openness and sharing of information are fundamental to the progress of science and to the effective functioning of the research enterprise. The advent of scientific journals in the 17th century helped power the Scientific Revolution by allowing researchers to communicate across time and space, using the technologies of that era to generate reliable knowledge more quickly and efficiently. Harnessing today’s stunning, ongoing advances in information technologies, the global research enterprise and its stakeholders are moving toward a new open science ecosystem. Open science aims to ensure the free availability and usability of scholarly publications, the data that result from scholarly research, and the methodologies, including code or algorithms, that were used to generate those data.

Open Science by Design is aimed at overcoming barriers and moving toward open science as the default approach across the research enterprise. This report explores specific examples of open science and discusses a range of challenges, focusing on stakeholder perspectives. It is meant to provide guidance to the research enterprise and its stakeholders as they build strategies for achieving open science and take the next steps….(More)”.

Forty years of wicked problems literature: forging closer links to policy studies


Brian W. Head at Policy and Society: “Rittel and Webber boldly challenged the conventional assumption that ‘scientific’ approaches to social policy and planning provide the most reliable guidance for practitioners and researchers who are addressing complex, and contested, social problems.

This provocative claim, that scientific-technical approaches would not ‘work’ for complex social issues, has engaged policy analysts, academic researchers and planning practitioners since the 1970s. Grappling with the implications of complexity and uncertainty in policy debates, the first generation of ‘wicked problem’ scholars generally agreed that wicked issues require correspondingly complex and iterative approaches. This tended to quarantine complex ‘wicked’ problems as a special category that required special collaborative processes.

Most often they recommended the inclusion of multiple stakeholders in exploring the relevant issues, interests, value differences and policy responses. More than four decades later, however, there are strong arguments for developing a second-generation approach which would ‘mainstream’ the analysis of wicked problems in public policy. While continuing to recognize the centrality of complexity and uncertainty, and the need for creative thinking, a broader approach would make better use of recent public policy literatures on such topics as problem framing, policy design, policy capacity and the contexts of policy implementation….(More)”.

‘Data is a fingerprint’: why you aren’t as anonymous as you think online


Olivia Solon at The Guardian: “In August 2016, the Australian government released an “anonymised” data set comprising the medical billing records, including every prescription and surgery, of 2.9 million people.

Names and other identifying features were removed from the records in an effort to protect individuals’ privacy, but a research team from the University of Melbourne soon discovered that it was simple to re-identify people, and learn about their entire medical history without their consent, by comparing the dataset to other publicly available information, such as reports of celebrities having babies or athletes having surgeries.

The government pulled the data from its website, but not before it had been downloaded 1,500 times.

This privacy nightmare is one of many examples of seemingly innocuous, “de-identified” pieces of information being reverse-engineered to expose people’s identities. And it’s only getting worse as people spend more of their lives online, sprinkling digital breadcrumbs that can be traced back to them to violate their privacy in ways they never expected.

Nameless New York taxi logs were compared with paparazzi shots at locations around the city to reveal that Bradley Cooper and Jessica Alba were bad tippers. In 2017 German researchers were able to identify people based on their “anonymous” web browsing patterns. This week University College London researchers showed how they could identify an individual Twitter user based on the metadata associated with their tweets, while the fitness tracking app Polar revealed the homes and in some cases names of soldiers and spies.

“It’s convenient to pretend it’s hard to re-identify people, but it’s easy. The kinds of things we did are the kinds of things that any first-year data science student could do,” said Vanessa Teague, one of the University of Melbourne researchers to reveal the flaws in the open health data.

One of the earliest examples of this type of privacy violation occurred in 1996 when the Massachusetts Group Insurance Commission released “anonymised” data showing the hospital visits of state employees. As with the Australian data, the state removed obvious identifiers like name, address and social security number. Then the governor, William Weld, assured the public that patients’ privacy was protected….(More)”.
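The mechanics behind these stories are simple enough to sketch. A linkage attack joins a "de-identified" dataset to a public one (a voter roll, paparazzi photos, fitness-app traces) on quasi-identifiers such as ZIP code, birth date and sex; any record with a unique match is re-identified. The toy example below uses entirely fabricated data and is an illustration of the general technique, not the Melbourne or Massachusetts researchers' actual code.

```python
# Toy linkage attack: "de-identified" records joined to a public list
# on quasi-identifiers (ZIP code, date of birth, sex). All data fabricated.

deidentified = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "M", "diagnosis": "flu"},
    {"zip": "02139", "dob": "1962-03-14", "sex": "F", "diagnosis": "asthma"},
]

public_roll = [  # e.g. a voter roll: names alongside the same quasi-identifiers
    {"name": "Alice Smith", "zip": "02139", "dob": "1962-03-14", "sex": "F"},
    {"name": "Bob Jones", "zip": "02140", "dob": "1950-01-01", "sex": "M"},
]


def reidentify(records, roll):
    """Return (name, diagnosis) pairs where the quasi-identifiers match uniquely."""
    matches = []
    for rec in records:
        hits = [p for p in roll
                if (p["zip"], p["dob"], p["sex"]) == (rec["zip"], rec["dob"], rec["sex"])]
        if len(hits) == 1:  # a unique match pins the record to a named person
            matches.append((hits[0]["name"], rec["diagnosis"]))
    return matches


print(reidentify(deidentified, public_roll))  # [('Alice Smith', 'asthma')]
```

This is why removing names, addresses and social security numbers is not enough: the combination of a few innocuous attributes is often unique, which is exactly the "first-year data science student" point Teague makes above.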

Data infrastructure literacy


Paper by Jonathan Gray, Carolin Gerlitz and Liliana Bounegru at Big Data & Society: “A recent report from the UN makes the case for “global data literacy” in order to realise the opportunities afforded by the “data revolution”. Here and in many other contexts, data literacy is characterised in terms of a combination of numerical, statistical and technical capacities. In this article, we argue for an expansion of the concept to include not just competencies in reading and working with datasets but also the ability to account for, intervene around and participate in the wider socio-technical infrastructures through which data is created, stored and analysed – which we call “data infrastructure literacy”. We illustrate this notion with examples of “inventive data practice” from previous and ongoing research on open data, online platforms, data journalism and data activism. Drawing on these perspectives, we argue that data literacy initiatives might cultivate sensibilities not only for data science but also for data sociology, data politics as well as wider public engagement with digital data infrastructures. The proposed notion of data infrastructure literacy is intended to make space for collective inquiry, experimentation, imagination and intervention around data in educational programmes and beyond, including how data infrastructures can be challenged, contested, reshaped and repurposed to align with interests and publics other than those originally intended….(More)”

Microsoft Research Open Data


Microsoft Research Open Data: “… is a data repository that makes available datasets that researchers at Microsoft have created and published in conjunction with their research. You can browse available datasets and either download them or directly copy them to an Azure-based Virtual Machine or Data Science Virtual Machine. To the extent possible, we follow FAIR (findable, accessible, interoperable and reusable) data principles and will continue to push towards the highest standards for data sharing. We recognize that there are dozens of data repositories already in use by researchers and expect that the capabilities of this repository will augment existing efforts. Datasets are categorized by their primary research area. You can find links to research projects or publications with the dataset.

What is our goal?

Our goal is to provide a simple platform to Microsoft’s researchers and collaborators to share datasets and related research technologies and tools. The site has been designed to simplify access to these data sets, facilitate collaboration between researchers using cloud-based resources, and enable the reproducibility of research. We will continue to evolve and grow this repository and add features to it based on feedback from the community.

How did this project come to be?

Over the past few years, our team, based at Microsoft Research, has worked extensively with the research community to create cloud-based research infrastructure. We started this project as a prototype about a year ago and are excited to finally share it with the research community to support data-intensive research in the cloud. Because almost all research projects have a data component, there is real need for curated and meaningful datasets in the research community, not only in computer science but in interdisciplinary and domain sciences. We have now made several such datasets available for download or use directly on cloud infrastructure….(More)”.

The Global Council on Extended Intelligence


“The IEEE Standards Association (IEEE-SA) and the MIT Media Lab are joining forces to launch a global Council on Extended Intelligence (CXI) composed of individuals who agree on the following:

One of the most powerful narratives of modern times is the story of scientific and technological progress. While our future will undoubtedly be shaped by the use of existing and emerging technologies – in particular, of autonomous and intelligent systems (A/IS) – there is no guarantee that progress defined by “the next” is beneficial. Growth for humanity’s future should not be defined by reductionist ideas of speed or size alone but as the holistic evolution of our species in positive alignment with the environmental and other systems comprising the modern algorithmic world.

We believe all systems must be responsibly created to best utilize science and technology for tangible social and ethical progress. Individuals, businesses and communities involved in the development and deployment of autonomous and intelligent technologies should mitigate predictable risks at the inception and design phase and not as an afterthought. This will help ensure these systems are created in such a way that their outcomes are beneficial to society, culture and the environment.

Autonomous and intelligent technologies also need to be created via participatory design, where systems thinking can help us avoid repeating past failures stemming from attempts to control and govern the complex-adaptive systems we are part of. Responsible living with or in the systems we are part of requires an awareness of the constrictive paradigms we operate in today. Our future practices will be shaped by our individual and collective imaginations and by the stories we tell about who we are and what we desire, for ourselves and the societies in which we live.

These stories must move beyond the “us versus them” media mentality pitting humans against machines. Autonomous and intelligent technologies have the potential to enhance our personal and social skills; they are much more fully integrated and less discrete than the term “artificial intelligence” implies. And while this process may enlarge our cognitive intelligence or make certain individuals or groups more powerful, it does not necessarily make our systems more stable or socially beneficial.

We cannot create sound governance for autonomous and intelligent systems in the Algorithmic Age while utilizing reductionist methodologies. By proliferating the ideals of responsible participant design, data symmetry and metrics of economic prosperity prioritizing people and the planet over profit and productivity, the Council on Extended Intelligence will work to transform the reductionist thinking of the past to prepare for a flourishing future.

Three Priority Areas to Fulfill Our Vision

1 – Build a new narrative for intelligent and autonomous technologies inspired by principles of systems dynamics and design.

“Extended Intelligence” is based on the hypothesis that intelligence, ideas, analysis and action are not formed in any one individual collection of neurons or code….

2 – Reclaim our digital identity in the algorithmic age

Business models based on tracking behavior and using outdated modes of consent are compounded by the appetites of states, industries and agencies for all data that may be gathered….

3 – Rethink our metrics for success

Although very widely used, concepts of exponential growth and productivity such as the gross domestic product (GDP) index are insufficient to holistically measure societal prosperity. … (More)”.