Big and open data are prompting a reform of scientific governance


Sabina Leonelli in Times Higher Education: “Big data are widely seen as a game-changer in scientific research, promising new and efficient ways to produce knowledge. And yet, large and diverse data collections are nothing new – they have long existed in fields such as meteorology, astronomy and natural history.

What, then, is all the fuss about? In my recent book, I argue that the true revolution is in the status accorded to data as research outputs in their own right. Along with this has come an emphasis on open data as crucial to excellent and reliable science.

Previously – ever since scientific journals emerged in the 17th century – data were private tools, owned by the scientists who produced them and scrutinised by a small circle of experts. Their usefulness lay in their function as evidence for a given hypothesis. This perception has shifted dramatically in the past decade. Increasingly, data are research components that can and should be made publicly available and usable.

Rather than the birth of a data-driven epistemology, we are thus witnessing the rise of a data-centric approach in which efforts to mobilise, integrate and visualise data become contributions to discovery, not just a by-product of hypothesis testing.

The rise of data-centrism highlights the challenges involved in gathering, classifying and interpreting data, and the concepts, technologies and social structures that surround these processes. This has implications for how research is conducted, organised, governed and assessed.

Data-centric science requires shifts in the rewards and incentives provided to those who produce, curate and analyse data. This challenges established hierarchies: laboratory technicians, librarians and database managers turn out to have crucial skills, subverting the common view of their jobs as marginal to knowledge production. Ideas of research excellence are also being challenged. Data management is increasingly recognised as crucial to the sustainability and impact of research, and national funders are moving away from citation counts and impact factors in evaluations.

New uses of data are forcing publishers to re-assess their business models and dissemination procedures, and research institutions are struggling to adapt their management and administration.

Data-centric science is emerging in concert with calls for increased openness in research….(More)”

Human Decisions and Machine Predictions


NBER Working Paper by Jon Kleinberg, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan: “We examine how machine learning can be used to improve and understand human decision-making. In particular, we focus on a decision that has important policy consequences. Millions of times each year, judges must decide where defendants will await trial—at home or in jail. By law, this decision hinges on the judge’s prediction of what the defendant would do if released. This is a promising machine learning application because it is a concrete prediction task for which there is a large volume of data available. Yet comparing the algorithm to the judge proves complicated. First, the data are themselves generated by prior judge decisions. We only observe crime outcomes for released defendants, not for those whom judges detained. This makes it hard to evaluate counterfactual decision rules based on algorithmic predictions. Second, judges may have a broader set of preferences than the single variable that the algorithm focuses on; for instance, judges may care about racial inequities or about specific crimes (such as violent crimes) rather than just overall crime risk. We deal with these problems using different econometric strategies, such as quasi-random assignment of cases to judges. Even accounting for these concerns, our results suggest potentially large welfare gains: a policy simulation shows crime can be reduced by up to 24.8% with no change in jailing rates, or jail populations can be reduced by 42.0% with no increase in crime rates. Moreover, we see reductions in all categories of crime, including violent ones. Importantly, such gains can be had while also significantly reducing the percentage of African-Americans and Hispanics in jail. We find similar results in a national dataset as well. In addition, by focusing the algorithm on predicting judges’ decisions, rather than defendant behavior, we gain some insight into decision-making: a key problem appears to be that judges respond to ‘noise’ as if it were signal. These results suggest that while machine learning can be valuable, realizing this value requires integrating these tools into an economic framework: being clear about the link between predictions and decisions; specifying the scope of payoff functions; and constructing unbiased decision counterfactuals….(More)”
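The basic setup the authors describe (learning a risk score from past cases while only observing outcomes for defendants whom judges released, then simulating a detention rule at the current jailing rate) can be sketched roughly as follows. This is a minimal illustration, not the paper's actual model: the data file, feature names, and choice of a gradient-boosting classifier are assumptions made for the sake of example.

```python
# Minimal sketch of the prediction task described above (not the paper's model).
# Assumes a hypothetical cases.csv with defendant features, a 'released' flag,
# and a 'failed' outcome (re-arrest or failure to appear), observed only for
# released defendants -- the selective-labels problem the authors discuss.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

cases = pd.read_csv("cases.csv")                         # hypothetical data file
features = ["age", "prior_arrests", "charge_severity"]   # illustrative features only

# Outcomes exist only for defendants judges chose to release.
released = cases[cases["released"] == 1]
X_train, X_test, y_train, y_test = train_test_split(
    released[features], released["failed"], test_size=0.3, random_state=0
)

model = GradientBoostingClassifier().fit(X_train, y_train)
print("held-out AUC among released defendants:",
      roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# Score every defendant, then simulate a rule that detains the same share of
# defendants as judges currently do, but picks those with the highest predicted risk.
cases["risk"] = model.predict_proba(cases[features])[:, 1]
jail_rate = 1 - cases["released"].mean()
threshold = cases["risk"].quantile(1 - jail_rate)
cases["algorithm_detains"] = cases["risk"] >= threshold
```

Evaluating such a rule honestly is exactly the hard part the paper emphasizes: outcomes for defendants the algorithm would release but judges detained are never observed, which is why the authors turn to strategies such as quasi-random assignment of cases to judges.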

How a Political Scientist Knows What Our Enemies Will Do (Often Before They Do)


Political scientists have now added rigorous mathematical techniques to their social-science toolbox, creating methods to explain—and even predict—the actions of adversaries, thus making society safer as well as smarter. Such techniques allowed the U.S. government to predict the fall of President Ferdinand Marcos of the Philippines in 1986, helping hatch a strategy to ease him out of office and avoid political chaos in that nation. And at Los Angeles International Airport a computer system predicts the tactical calculations of criminals and terrorists, making sure that patrols and checkpoints are placed in ways that adversaries can’t exploit.

The advances in solving the puzzle of human behavior represent a dramatic turnaround for the field of political science, notes Bruce Bueno de Mesquita, a professor of politics at New York University. “In the mid-1960s, I took a statistics course,” he recalls, “and my undergraduate advisor was appalled. He told me that I was wasting my time.” It took researchers many years of patient work, putting piece after piece of the puzzle of human behavior together, to arrive at today’s new knowledge. The result has been dramatic progress in the nation’s ability to protect its interests at home and abroad.

Social scientists have not abandoned the proven tools that Bueno de Mesquita and generations of other scholars acquired as they mastered their discipline. Rather, adding the rigor of mathematical analysis has allowed them to solve more of the puzzle. Mathematical models of human behavior let social scientists assemble a picture of the previously unnoticed forces that drive behavior—forces common to all situations, operating below the emotions, drama, and history that make each conflict unique….(More)”
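The airport example above rests on a simple game-theoretic idea: the defender commits to a randomized patrol schedule, and the adversary is assumed to observe that schedule and strike wherever the expected payoff is highest, so the defender chooses the randomization that leaves the least to exploit. The toy sketch below illustrates that reasoning only; the two checkpoints and all payoff numbers are invented, and this is not the actual airport system or any specific model from the article.

```python
# Toy Stackelberg-style security calculation (illustration only; all numbers invented).
ATTACK_REWARD = {"terminal_A": 5.0, "terminal_B": 3.0}     # adversary payoff if target is unguarded
ATTACK_PENALTY = {"terminal_A": -1.0, "terminal_B": -1.0}  # adversary payoff if caught

def attacker_best_payoff(p_cover_a: float) -> float:
    """Adversary's best expected payoff when terminal_A is patrolled with probability p_cover_a."""
    coverage = {"terminal_A": p_cover_a, "terminal_B": 1.0 - p_cover_a}  # one patrol to allocate
    expected = {
        t: coverage[t] * ATTACK_PENALTY[t] + (1.0 - coverage[t]) * ATTACK_REWARD[t]
        for t in ATTACK_REWARD
    }
    return max(expected.values())

# Defender: choose the coverage probability the adversary can exploit least.
payoff, p = min((attacker_best_payoff(x / 100), x / 100) for x in range(101))
print(f"Patrol terminal_A with probability {p:.2f}; adversary's best payoff drops to {payoff:.2f}")
```

Real deployments involve many targets, scheduling constraints, and richer adversary models, but the underlying logic is the same: randomize so that no single choice is predictably exploitable.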

Mapping open data governance models: Who makes decisions about government data and how?


Ana Brandusescu, Danny Lämmerhirt and Stefaan Verhulst call for a systematic and comparative investigation of the different governance models for open data policy and publication….

“An important value proposition behind open data involves increased transparency and accountability of governance. Yet little is known about how open data itself is governed. Who decides and how? How accountable are data holders to both the demand side and policy makers? How do data producers and actors ensure the quality of government data? Who, if any, are the data stewards within government tasked with making its data open?

Getting a better understanding of open data governance is not only important from an accountability point of view. With better insight into the diversity of decision-making models and structures across countries, the implementation of common open data principles, such as those advocated by the International Open Data Charter, can be accelerated.

In what follows, we seek to develop the initial contours of a research agenda on open data governance models. We start from the premise that different countries have different models to govern and administer their activities – in short, different ‘governance models’. Some countries are more devolved in their decision making, while others seek to organize “public administration” activities more centrally. These governance models clearly shape how open data is governed – producing a broad patchwork of open data governance arrangements across the world and making it difficult to identify who the open data decision makers and data gatekeepers or stewards are within a given country.

For example, if one wants to accelerate the opening up of education data across borders, in some countries this may fall under the authority of sub-national government (such as states, provinces, territories or even cities), while in other countries education is governed by central government or implemented through public-private partnership arrangements. Similarly, transportation or water data may be privatised in some countries, while in others it may be the responsibility of municipal or regional government. Responsibilities are therefore often distributed across administrative levels and agencies, affecting how (open) government data is produced and published….(More)”

Corporate Social Responsibility for a Data Age


Stefaan G. Verhulst in the Stanford Social Innovation Review: “Proprietary data can help improve and save lives, but fully harnessing its potential will require a cultural transformation in the way companies, governments, and other organizations treat and act on data….

We live, as it is now common to point out, in an era of big data. The proliferation of apps, social media, and e-commerce platforms, as well as sensor-rich consumer devices like mobile phones, wearable devices, commercial cameras, and even cars, generates zettabytes of data about the environment and about us.

Yet much of the most valuable data resides with the private sector—for example, in the form of click histories, online purchases, sensor data, and call data records. This limits its potential to benefit the public and to turn data into a social asset. Consider how data held by business could help improve policy interventions (such as better urban planning) or resiliency at a time of climate change, or help design better public services to increase food security.

Data responsibility suggests steps that organizations can take to break down these private barriers and foster so-called data collaboratives, or ways to share their proprietary data for the public good. For the private sector, data responsibility represents a new type of corporate social responsibility for the 21st century.

While Nepal’s Ncell belongs to a relatively small group of corporations that have shared their data, there are a few encouraging signs that the practice is gaining momentum. In Jakarta, for example, Twitter exchanged some of its data with researchers who used it to gather and display real-time information about massive floods. The resulting website, PetaJakarta.org, enabled better flood assessment and management processes. And in Senegal, the Data for Development project has brought together leading cellular operators to share anonymous data to identify patterns that could help improve health, agriculture, urban planning, energy, and national statistics.

Examples like these suggest that proprietary data can help improve and save lives. But to fully harness the potential of data, data holders need to fulfill at least three conditions. I call these “the three pillars of data responsibility.”…

The difficulty of translating insights into results points to some of the larger social, political, and institutional shifts required to achieve the vision of data responsibility in the 21st century. The move from data shielding to data sharing will require that we make a cultural transformation in the way companies, governments, and other organizations treat and act on data. We must incorporate new levels of pro-activeness, and make often-unfamiliar commitments to transparency and accountability.

By way of conclusion, here are four immediate steps—essential but not exhaustive—we can take to move forward:

  1. Data holders should issue a public commitment to data responsibility so that it becomes the default—an expected, standard behavior within organizations.
  2. Organizations should hire data stewards to determine what and when to share, and how to protect and act on data.
  3. We must develop a data responsibility decision tree to assess the value and risk of corporate data along the data lifecycle.
  4. Above all, we need a data responsibility movement; it is time to demand data responsibility to ensure data improves and safeguards people’s lives…(More)”

The value of crowdsourcing in public policymaking: epistemic, democratic and economic value


Article in The Theory and Practice of Legislation: “While national and local governments increasingly deploy crowdsourcing in lawmaking as an open government practice, it remains unclear how crowdsourcing creates value when it is applied in policymaking. Therefore, in this article, we examine value creation in crowdsourcing for public policymaking. We introduce a framework for analysing value creation in public policymaking in the following three dimensions: democratic, epistemic and economic. Democratic value is created by increasing transparency, accountability, inclusiveness and deliberation in crowdsourced policymaking. Epistemic value is developed when crowdsourcing serves as a knowledge search mechanism and a learning context. Economic value is created when crowdsourcing makes knowledge search in policymaking more efficient and enables government to produce policies that better address citizens’ needs and societal issues. We show how these tenets of value creation are manifest in crowdsourced policymaking by drawing on instances of crowdsourced lawmaking, and we also discuss the contingencies and challenges preventing value creation…(More)”

Information for accountability: Transparency and citizen engagement for improved service delivery in education systems


Lindsay Read and Tamar Manuelyan Atinc at Brookings: “There is a wide consensus among policymakers and practitioners that while access to education has improved significantly for many children in low- and middle-income countries, learning has not kept pace. A large amount of research that has attempted to pinpoint the reasons behind this quality deficit in education has revealed that providing extra resources such as textbooks, learning materials, and infrastructure is largely ineffective in improving learning outcomes at the system level without accompanying changes to the underlying structures of education service delivery and associated systems of accountability.

Information is a key building block of a wide range of strategies that attempt to tackle weaknesses in service delivery and accountability at the school level, even where political systems disappoint at the national level. The dissemination of more and better quality information is expected to empower parents and communities to make better decisions about their children’s schooling and to put pressure on school administrators and public officials to make changes that improve learning and learning environments. This theory of change underpins both social accountability and open data initiatives, which are designed to use information to enhance accountability and thereby influence education delivery.

This report seeks to extract insight into the nuanced relationship between information and accountability, drawing upon a vast literature on bottom-up efforts to improve service delivery, increase citizen engagement, and promote transparency, as well as case studies in Australia, Moldova, Pakistan, and the Philippines. In an effort to clarify processes and mechanisms behind information-based reforms in the education sector, this report also categorizes and evaluates recent impact evaluations according to the intensity of interventions and their target change agents—parents, teachers, school principals, and local officials. The idea here is not just to help clarify what works but why reforms work (or do not)….(More)”

Documenting Hate


Shan Wang at NiemanLab: “A family’s garage vandalized with an image of a swastika and a hateful message targeted at Arabs. Jewish community centers receiving bomb threats. These are just a slice of the incidents of hate across the country after the election of Donald Trump — but getting reliable data on the prevalence of hate and bias crimes to answer questions about whether these sorts of crimes are truly on the rise is nearly impossible.

ProPublica, which led an effort of more than a thousand reporters and students across the U.S. to cover voting problems on Election Day as part of its Electionland project, is now leaning on the collaborative and data-driven Electionland model to track and cover hate crimes.

Documenting Hate, launched last week, is a hate and bias crime-tracking project headed up by ProPublica and supported by a coalition of news and digital media organizations, universities, and civil rights groups like the Southern Poverty Law Center (which has been tracking hate groups across the country). Like Electionland, the project is seeking local partners, and will share its data with and guide local reporters interested in writing relevant stories.

“Hate crimes are inadequately tracked,” Scott Klein, assistant managing editor at ProPublica, said. “Local police departments do not report hate crimes up in any consistent way, so the federal data is woefully inadequate, and there’s no good national data on hate crimes. The data is at best locked up by local police departments, and the best we can know is a local undercount.”

Documenting Hate offers a form for anyone to report a hate or bias crime (emphasizing that “we are not law enforcement and will not report this information to the police,” nor will it “share your name and contact information with anybody outside our coalition without your permission”). ProPublica is working with Meedan (whose verification platform Check it also used for Electionland) and crowdsourced crisis-mapping group Ushahidi, as well as several journalism schools, to verify reports coming in through social channels. Ken Schwencke, who helped build the infrastructure for Electionland, is now focused on things like building backend search databases for Documenting Hate, which can be shared with local reporters. The hope is that many stories, interactives, and a comprehensive national database will emerge and paint a fuller picture of the scope of hate crimes in the U.S.

ProPublica is actively seeking local partners, who will have access to the data as well as advice on how to report on sensitive information (no partners to announce just yet, though there’s been plenty of inbound interest, according to Klein). Some of the organizations working with ProPublica were already seeking reader stories of their own….(More)”.

Mass Observation: The amazing 80-year experiment to record our daily lives


William Cook at BBC Arts: “Eighty years ago, on 30th January 1937, the New Statesman published a letter which launched the largest (and strangest) writers’ group in British literary history.

An anthropologist called Tom Harrisson, a journalist called Charles Madge and a filmmaker called Humphrey Jennings wrote to the magazine asking for volunteers to take part in a new project called Mass Observation. Over a thousand readers responded, offering their services. Remarkably, this ‘scientific study of human social behaviour’ is still going strong today.

Mass Observation was the product of a growing interest in the social sciences, and a growing belief that the mass media wasn’t accurately reflecting the lives of so-called ordinary people. Instead of entrusting news gathering to jobbing journalists, who were under pressure to provide the stories their editors and proprietors wanted, Mass Observation recruited a secret army of amateur reporters to track the habits and opinions of ‘the man in the street.’

Ironically, the three founders of this egalitarian movement were all extremely well-to-do. They’d all been to public schools and Oxbridge, but this was the ‘Age of Anxiety’, when capitalism was in chaos and dangerous demagogues were on the rise (plus ça change…).

For these idealistic public schoolboys, socialism was the answer, and Mass Observation was the future. By finding out what ‘ordinary’ folk were really doing, and really thinking, they would forge a new society, more attuned to the needs of the common man.

Mass Observation selected 500 citizen journalists, and gave them regular ‘directives’ to report back on virtually every aspect of their daily lives. They were guaranteed anonymity, which gave them enormous freedom. People opened up about themselves (and their peers) to an unprecedented degree.

Even though they were all unpaid, correspondents devoted a great deal of time to this endeavour – writing at great length, in great detail, over many years. As well as its academic value, Mass Observation proved that autobiography is not the sole preserve of the professional writer. For all of us, the urge to record and reflect upon our lives is a basic human need.

The Second World War was the perfect forum for this vast collective enterprise. Mass Observation became a national diary of life on the home front. For historians, the value of such uncensored revelations is enormous. These intimate accounts of air raids and rationing are far more revealing and evocative than the jolly state-sanctioned reportage of the war years.

After the war, Mass Observation became more commercial, supplying data for market research, and during the 1960s this extraordinary experiment gradually wound down. It was rescued from extinction by the historian Asa Briggs….

The founders of Mass Observation were horrified by what they called “the revival of racial superstition.” Hitler, Franco and Mussolini were in the forefront of their minds. “We are all in danger of extinction from such outbursts of atavism,” they wrote, in 1937. “We look to science to help us, only to find that science is too busy forging new weapons of mass destruction.”

For its founders, Mass Observation was a new science which would build a better future. For its countless correspondents, however, it became something more than that – not merely a social science, but a communal work of art….(More)”.

The science of society: From credible social science to better social policies


Nancy Cartwright and Julian Reiss at LSE Blog: “Society invests a great deal of money in social science research. Surely the expectation is that some of it will be useful not only for understanding ourselves and the societies we live in but also for changing them? This is certainly the hope of the very active evidence-based policy and practice movement, which is heavily endorsed in the UK both by the last Labour Government and by the current Coalition Government. But we still do not know how to use the results of social science in order to improve society. This has to change, and soon.

Last year the UK launched an extensive – and expensive – new What Works Network that, as the Government press release describes, consists of “two existing centres of excellence – the National Institute for Health and Clinical Excellence (NICE) and the Education Endowment Foundation – plus four new independent institutions responsible for gathering, assessing and sharing the most robust evidence to inform policy and service delivery in tackling crime, promoting active and independent ageing, effective early intervention, and fostering local economic growth”.

This is an exciting and promising initiative. But it faces a serious challenge: we remain unable to build real social policies based on the results of social science or to predict reliably what the outcomes of these policies will actually be. This contrasts with our understanding of how to establish the results in the first place. There we have a handle on the problem. We have a reasonable understanding of what kinds of methods are good for establishing what kinds of results and with what (at least rough) degrees of certainty.

There are methods – well thought through – that social scientists learn in the course of their training for constructing a questionnaire, running a randomised controlled trial, conducting an ethnographic study, looking for patterns in large data sets. There is nothing comparably explicit and well thought through about how to use social science knowledge to help predict what will happen when we implement a proposed policy in real, complex situations. Nor is there anything to help us estimate and balance the effectiveness, the evidence, the chances of success, the costs, the benefits, the winners and losers, and the social, moral, political and cultural acceptability of the policy.

To see why this is so difficult think of an analogy: not building social policies but building material technologies. We do not just read off instructions for building a laser – which may ultimately be used to operate on your eyes – from knowledge of basic science. Rather, we piece together a detailed model using heterogeneous knowledge from a mix of physics theories, from various branches of engineering, from experience of how specific materials behave, from the results of trial-and-error, etc. By analogy, building a successful social policy equally requires a mix of heterogeneous kinds of knowledge from radically different sources. Sometimes we are successful at doing this and some experts are very good at it in their own specific areas of expertise. But in both cases – both for material technology and for social technology – there is no well thought through, defensible guidance on how to do it: what are better and worse ways to proceed, what tools and information might be needed, and how to go about getting these. This is true whether we look for general advice that might be helpful across subject areas or advice geared to specific areas or specific kinds of problems. Though we indulge in social technology – indeed we can hardly avoid it – and are convinced that better social science will make for better policies, we do not know how to turn that conviction into a reality.

This presents a real challenge to the hopes for evidence-based policy….(More)”