Data Disrupts Corruption


Carlos Santiso & Ben Roseth at Stanford Social Innovation Review: “…The Panama Papers scandal demonstrates the power of data analytics to uncover corruption in a world flooded with terabytes needing only the computing capacity to make sense of it all. The Rousseff impeachment illustrates how open data can be used to bring leaders to account. Together, these stories show how data, both “big” and “open,” is driving the fight against corruption with fast-paced, evidence-driven, crowd-sourced efforts. Open data can put vast quantities of information into the hands of countless watchdogs and whistleblowers. Big data can turn that information into insight, making corruption easier to identify, trace, and predict. To realize the movement’s full potential, technologists, activists, officials, and citizens must redouble their efforts to integrate data analytics into policy making and government institutions….

Making big data open cannot, in itself, drive anticorruption efforts. “Without analytics,” a 2014 White House report on big data and individual privacy underscored, “big datasets could be stored, and they could be retrieved, wholly or selectively. But what comes out would be exactly what went in.”

In this context, it is useful to distinguish the four main stages of data analytics to illustrate its potential in the global fight against corruption:

  1. Descriptive analytics uses data to describe what has happened when analyzing complex policy issues.
  2. Diagnostic analytics goes a step further, mining and triangulating data to explain why a specific policy problem has happened, identify its root causes, and decipher underlying structural trends.
  3. Predictive analytics uses data and algorithms, often drawing on machine learning, to predict what is most likely to occur.
  4. Prescriptive analytics proposes what should be done to cause or prevent something from happening….
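To make the distinction concrete, here is a minimal sketch (not from the article) of the descriptive and predictive stages applied to synthetic procurement records in Python. All column names, the "flagged" label, and the risk formula are invented assumptions for illustration, not a real anticorruption dataset or model.

```python
# Hypothetical illustration: descriptive vs. predictive analytics on synthetic
# procurement data. Everything here (columns, labels, formula) is invented.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
contracts = pd.DataFrame({
    "agency": rng.choice(["health", "transport", "education"], n),
    "amount": rng.lognormal(mean=10, sigma=1, size=n),
    "single_bidder": rng.integers(0, 2, n),      # 1 = only one bid received
    "days_to_award": rng.integers(1, 120, n),
})
# Synthetic ground truth: single-bidder awards made quickly are riskier.
risk = 1 / (1 + np.exp(-(2 * contracts["single_bidder"] - 0.02 * contracts["days_to_award"])))
contracts["flagged"] = rng.random(n) < risk

# Descriptive analytics: summarise what has happened, per agency.
print(contracts.groupby("agency")[["amount", "single_bidder"]].mean())

# Predictive analytics: estimate which contracts are most likely to be flagged.
features = ["amount", "single_bidder", "days_to_award"]
model = LogisticRegression(max_iter=1000).fit(contracts[features], contracts["flagged"])
contracts["risk_score"] = model.predict_proba(contracts[features])[:, 1]
print(contracts.sort_values("risk_score", ascending=False).head())
```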

Despite the big data movement’s promise for fighting corruption, many challenges remain. The smart use of open and big data should focus not only on uncovering corruption, but also on better understanding its underlying causes and preventing its recurrence. Anticorruption analytics cannot exist in a vacuum; it must fit in a strategic institutional framework that starts with quality information and leads to reform. Even the most sophisticated technologies and data innovations cannot prevent what French novelist Théophile Gautier described as the “inexplicable attraction of corruption, even amongst the most honest souls.” Unless it is harnessed for improvements in governance and institutions, data analytics will not have the impact that it could, nor be sustainable in the long run…(More)”.

Big and open data are prompting a reform of scientific governance


Sabina Leonelli in Times Higher Education: “Big data are widely seen as a game-changer in scientific research, promising new and efficient ways to produce knowledge. And yet, large and diverse data collections are nothing new – they have long existed in fields such as meteorology, astronomy and natural history.

What, then, is all the fuss about? In my recent book, I argue that the true revolution is in the status accorded to data as research outputs in their own right. Along with this has come an emphasis on open data as crucial to excellent and reliable science.

Previously – ever since scientific journals emerged in the 17th century – data were private tools, owned by the scientists who produced them and scrutinised by a small circle of experts. Their usefulness lay in their function as evidence for a given hypothesis. This perception has shifted dramatically in the past decade. Increasingly, data are research components that can and should be made publicly available and usable.

Rather than the birth of a data-driven epistemology, we are thus witnessing the rise of a data-centric approach in which efforts to mobilise, integrate and visualise data become contributions to discovery, not just a by-product of hypothesis testing.

The rise of data-centrism highlights the challenges involved in gathering, classifying and interpreting data, and the concepts, technologies and social structures that surround these processes. This has implications for how research is conducted, organised, governed and assessed.

Data-centric science requires shifts in the rewards and incentives provided to those who produce, curate and analyse data. This challenges established hierarchies: laboratory technicians, librarians and database managers turn out to have crucial skills, subverting the common view of their jobs as marginal to knowledge production. Ideas of research excellence are also being challenged. Data management is increasingly recognised as crucial to the sustainability and impact of research, and national funders are moving away from citation counts and impact factors in evaluations.

New uses of data are forcing publishers to re-assess their business models and dissemination procedures, and research institutions are struggling to adapt their management and administration.

Data-centric science is emerging in concert with calls for increased openness in research….(More)”

Why Big Data Is a Big Deal for Cities


John M. Kamensky in Governing: “We hear a lot about “big data” and its potential value to government. But is it really fulfilling the high expectations that advocates have assigned to it? Is it really producing better public-sector decisions? It may be years before we have definitive answers to those questions, but new research suggests that it’s worth paying a lot of attention to.

University of Kansas Prof. Alfred Ho recently surveyed 65 mid-size and large cities to learn what is going on, on the front line, with the use of big data in making decisions. He found that big data has made it possible to “change the time span of a decision-making cycle by allowing real-time analysis of data to instantly inform decision-making.” This decision-making occurs in areas as diverse as program management, strategic planning, budgeting, performance reporting and citizen engagement.

Cities are natural repositories of big data that can be integrated and analyzed for policy- and program-management purposes. These repositories include data from public safety, education, health and social services, environment and energy, culture and recreation, and community and business development. They include both structured data, such as financial and tax transactions, and unstructured data, such as recorded sounds from gunshots and videos of pedestrian movement patterns. And they include data supplied by the public, such as the Boston residents who use a phone app to measure road quality and report problems.
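As a purely hypothetical sketch of this kind of citizen-supplied, unstructured sensor data, the snippet below flags likely road defects from a simulated vertical-acceleration trace. The sampling rate, threshold, and readings are invented; this is not the Boston app's actual method.

```python
# Invented example: detect jolts in a phone's vertical-acceleration signal that
# may indicate potholes. All values are simulated for illustration.
import numpy as np

rng = np.random.default_rng(1)
samples_per_second = 50
accel_z = rng.normal(loc=9.8, scale=0.3, size=60 * samples_per_second)  # one minute of driving
accel_z[1200] += 6.0   # simulated jolt from a pothole
accel_z[2750] += 7.5   # another simulated jolt

threshold = 9.8 + 5 * accel_z.std()           # crude spike detector
spikes = np.flatnonzero(accel_z > threshold)  # indices exceeding the threshold
for idx in spikes:
    print(f"Possible road defect at t = {idx / samples_per_second:.1f} s")
```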

These data repositories, Ho writes, are “fundamental building blocks,” but the challenge is to shift the ownership of data from separate departments to an integrated platform where the data can be shared.

There’s plenty of evidence that cities are moving in that direction and that they already are systematically using big data to make operational decisions. Among the 65 cities that Ho examined, he found that 49 have “some form of data analytics initiatives or projects” and that 30 have established “a multi-departmental team structure to do strategic planning for these data initiatives.”….The effective use of big data can lead to dialogs that cut across school-district, city, county, business and nonprofit-sector boundaries. But more importantly, it provides city leaders with the capacity to respond to citizens’ concerns more quickly and effectively….(More)”

Corporate Social Responsibility for a Data Age


Stefaan G. Verhulst in the Stanford Social Innovation Review: “Proprietary data can help improve and save lives, but fully harnessing its potential will require a cultural transformation in the way companies, governments, and other organizations treat and act on data….

We live, as it is now common to point out, in an era of big data. The proliferation of apps, social media, and e-commerce platforms, along with sensor-rich consumer devices like mobile phones, wearable devices, commercial cameras, and even cars, generates zettabytes of data about the environment and about us.

Yet much of the most valuable data resides with the private sector—for example, in the form of click histories, online purchases, sensor data, and call data records. This limits its potential to benefit the public and to turn data into a social asset. Consider how data held by business could help improve policy interventions (such as better urban planning) or resiliency at a time of climate change, or help design better public services to increase food security.

Data responsibility suggests steps that organizations can take to break down these private barriers and foster so-called data collaboratives, or ways to share their proprietary data for the public good. For the private sector, data responsibility represents a new type of corporate social responsibility for the 21st century.

While Nepal’s Ncell belongs to a relatively small group of corporations that have shared their data, there are a few encouraging signs that the practice is gaining momentum. In Jakarta, for example, Twitter exchanged some of its data with researchers who used it to gather and display real-time information about massive floods. The resulting website, PetaJakarta.org, enabled better flood assessment and management processes. And in Senegal, the Data for Development project has brought together leading cellular operators to share anonymous data to identify patterns that could help improve health, agriculture, urban planning, energy, and national statistics.

Examples like this suggest that proprietary data can help improve and save lives. But to fully harness the potential of data, data holders need to fulfill at least three conditions. I call these the “three pillars of data responsibility.”…

The difficulty of translating insights into results points to some of the larger social, political, and institutional shifts required to achieve the vision of data responsibility in the 21st century. The move from data shielding to data sharing will require that we make a cultural transformation in the way companies, governments, and other organizations treat and act on data. We must incorporate new levels of proactiveness and make often-unfamiliar commitments to transparency and accountability.

By way of conclusion, here are four immediate steps—essential but not exhaustive—we can take to move forward:

  1. Data holders should issue a public commitment to data responsibility so that it becomes the default—an expected, standard behavior within organizations.
  2. Organizations should hire data stewards to determine what and when to share, and how to protect and act on data.
  3. We must develop a data responsibility decision tree to assess the value and risk of corporate data along the data lifecycle.
  4. Above all, we need a data responsibility movement; it is time to demand data responsibility to ensure data improves and safeguards people’s lives…(More)”

Why big data may be having a big effect on how our politics plays out


 in The Conversation: “…big data… is an inconceivably vast mass of information, which at first glance would seem a giant mess; just white noise.

Unless you know how to decipher it.

According to a story first published in Zurich-based Das Magazin in December and more recently taken up by Motherboard, events such as Brexit and Trump’s ascendancy may have been made possible through just such deciphering. The argument is that technology combining psychological profiling and data analysis may have played a pivotal part in exploiting unconscious bias at the individual voter level. The theory is that this was used in the recent US election to increase or suppress votes to benefit particular candidates in crucial locations. It is claimed that the company behind this may be active in numerous countries.

The technology at play is based on the integration of a model of psychological profiling known as OCEAN. This uses the details contained within individuals’ digital footprints to create user-specific profiles. These map to the level of the individual, identifiable voter, who can then be manipulated by exploiting beliefs, preferences and biases that they might not even be aware of, but which their data has revealed about them in glorious detail.
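The following is a deliberately simplified sketch of the general idea, assuming synthetic data throughout: a model is fitted from invented binary “page like” indicators to a single OCEAN trait score, then used to segment new users for differently framed messages. None of this reflects the company’s actual data, features, or methods.

```python
# Toy illustration of mapping digital-footprint features to one personality trait.
# All data and weights are synthetic; this is not any firm's real pipeline.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
n_users, n_pages = 2000, 300
likes = rng.integers(0, 2, size=(n_users, n_pages))          # who liked which pages
true_weights = rng.normal(0, 1, n_pages)                      # unknown in practice
openness = likes @ true_weights + rng.normal(0, 5, n_users)   # survey-measured trait

model = Ridge(alpha=10.0).fit(likes, openness)                # learn likes -> trait

new_users = rng.integers(0, 2, size=(5, n_pages))
scores = model.predict(new_users)
for i, s in enumerate(scores):
    # Segment users into differently framed messages based on predicted trait.
    segment = "novelty-framed message" if s > np.median(openness) else "tradition-framed message"
    print(f"user {i}: predicted openness {s:.1f} -> {segment}")
```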

As well as enabling the creation of tailored media content, this can also be used to create scripts of relevant talking points for campaign doorknockers to focus on, according to the address and identity of the householder to whom they are speaking.

This goes well beyond the scope and detail of previous campaign strategies. If the theory about the role of these techniques is correct, it signals a new landscape of political strategising. An active researcher in the field, writing about the company behind this technology (whose services the Trump campaign paid for during the election), described the potential scale of such technologies:

Marketers have long tailored their placement of advertisements based on their target group, for example by placing ads aimed at conservative consumers in magazines read by conservative audiences. What is new about the psychological targeting methods implemented by Cambridge Analytica, however, is their precision and scale. According to CEO Alexander Nix, the company holds detailed psycho-demographic profiles of more than 220 million US citizens and used over 175,000 different ad messages to meet the unique motivations of their recipients….(More)”

What Communication Can Contribute to Data Studies: Three Lenses on Communication and Data


Andrew Schrock at the International Journal of Communication: “We are awash in predictions about our data-driven future. Enthusiasts believe big data imposes new ways of knowing, while critics worry it will enable powerful regimes of institutional control. This debate has been of keen interest to communication scholars. To encourage conceptual clarity, this article draws on communication scholarship to suggest three lenses for data epistemologies. I review the common social scientific perspective of communication as data. A data as discourse lens interrogates the meanings that data carries. Communication around data describes moments where data are constructed. By employing multiple perspectives, we might understand how data operate as a complex structure of dominance….(More)”

Big data may be reinforcing racial bias in the criminal justice system


Laurel Eckhouse at the Washington Post: “Big data has expanded to the criminal justice system. In Los Angeles, police use computerized “predictive policing” to anticipate crimes and allocate officers. In Fort Lauderdale, Fla., machine-learning algorithms are used to set bond amounts. In states across the country, data-driven estimates of the risk of recidivism are being used to set jail sentences.

Advocates say these data-driven tools remove human bias from the system, making it more fair as well as more effective. But even as they have become widespread, we have little information about exactly how they work. Few of the organizations producing them have released the data and algorithms they use to determine risk.

We need to know more, because it’s clear that such systems face a fundamental problem: The data they rely on are collected by a criminal justice system in which race makes a big difference in the probability of arrest — even for people who behave identically. Inputs derived from biased policing will inevitably make black and Latino defendants look riskier than white defendants to a computer. As a result, data-driven decision-making risks exacerbating, rather than eliminating, racial bias in criminal justice.

Consider a judge tasked with making a decision about bail for two defendants, one black and one white. Our two defendants have behaved in exactly the same way prior to their arrest: They used drugs in the same amounts, committed the same traffic offenses, owned similar homes, and took their two children to the same school every morning. But the criminal justice algorithms do not rely on all of a defendant’s prior actions to reach a bail assessment — just those actions for which he or she has been previously arrested and convicted. Because of racial biases in arrest and conviction rates, the black defendant is more likely to have a prior conviction than the white one, despite identical conduct. A risk assessment relying on racially compromised criminal-history data will unfairly rate the black defendant as riskier than the white defendant.

To make matters worse, risk-assessment tools typically evaluate their success in predicting a defendant’s dangerousness on rearrests — not on defendants’ overall behavior after release. If our two defendants return to the same neighborhood and continue their identical lives, the black defendant is more likely to be arrested. Thus, the tool will falsely appear to predict dangerousness effectively, because the entire process is circular: Racial disparities in arrests bias both the predictions and the justification for those predictions.
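A toy simulation, with invented numbers, makes the circularity concrete: two groups offend at identical rates, but one faces a higher arrest probability, so both the prior-arrest “risk” signal and the rearrest outcome used to validate it are inflated for the same group.

```python
# Minimal simulation of the feedback loop described above. All rates are invented.
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
offending_rate = 0.30                      # identical underlying behaviour
arrest_prob = {"A": 0.10, "B": 0.25}       # group B is policed more heavily

for group, p_arrest in arrest_prob.items():
    offends = rng.random(n) < offending_rate
    prior_arrest = offends & (rng.random(n) < p_arrest)   # what the risk score sees
    reoffends = rng.random(n) < offending_rate            # same behaviour again
    rearrest = reoffends & (rng.random(n) < p_arrest)     # what the tool is scored on
    print(f"group {group}: prior-arrest rate {prior_arrest.mean():.3f}, "
          f"rearrest rate {rearrest.mean():.3f}")
# Group B shows both a higher "risk" (prior arrests) and a higher rearrest rate,
# even though offending is identical: the biased measure validates itself.
```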

We know that a black person and a white person are not equally likely to be stopped by police: Evidence on New York’s stop-and-frisk policy, investigatory stops, vehicle searches and drug arrests shows that black and Latino civilians are more likely to be stopped, searched and arrested than whites. In 2012, a white attorney spent days trying to get himself arrested in Brooklyn for carrying graffiti stencils and spray paint, a Class B misdemeanor. Even when police saw him tagging the City Hall gateposts, they sped past him, ignoring a crime for which 3,598 people were arrested by the New York Police Department the following year.

Before adopting risk-assessment tools in the judicial decision-making process, jurisdictions should demand that any tool being implemented undergo a thorough and independent peer-review process. We need more transparency and better data to learn whether these risk assessments have disparate impacts on defendants of different races. Foundations and organizations developing risk-assessment tools should be willing to release the data used to build these tools to researchers to evaluate their techniques for internal racial bias and problems of statistical interpretation. Even better, with multiple sources of data, researchers could identify biases in data generated by the criminal justice system before the data is used to make decisions about liberty. Unfortunately, producers of risk-assessment tools — even nonprofit organizations — have not voluntarily released anonymized data and computational details to other researchers, as is now standard in quantitative social science research….(More)”.

Big data is adding a whole new dimension to public spaces – here’s how


 at the Conversation: “Most of us encounter public spaces in our daily lives: whether it’s physical space (a sidewalk, a bench, or a road), a visual element (a panorama, a cityscape) or a mode of transport (bus, train or bike share). But over the past two decades, digital technologies such as smart phones and the internet of things are adding extra layers of information to our public spaces, and transforming the urban environment.

Traditionally, public spaces have been carefully designed by urban planners and architects, and managed by private companies or public bodies. The theory goes that people’s attention and behaviour in public spaces can be guided by the way that architects plan the built environment. Take, for example, Leicester Square in London: the layout of green areas, pathways and benches makes it clear where people are supposed to walk, sit down and look at the natural elements. The public space is a given, which people receive and use within the terms and guidelines provided.

While these ideas are still relevant today, information is now another key material in public spaces. It changes the way that people experience the city. Uber shows us the position of its closest drivers, even when they’re out of sight; route-finding apps such as Google Maps help us to navigate through unfamiliar territory; Pokémon Go places otherworldly creatures on the pavement right before our eyes.

But we’re not just receiving information – we’re also generating it. Whether you’re “liking” something on Facebook, searching Google, shopping online, or even exchanging an email address for Wi-Fi access; all of the data created by these actions are collected, stored, managed, analysed and brokered to generate monetary value.

Data deluge

But as well as creating profits for private companies, these data provide accurate and continuous updates of how society is evolving, which can be used by governments and designers to manage and design public spaces.

Before big data, architects designed spaces based on mere assumptions about how people were likely to use them. Success was measured by “small”, localised data methods, such as post-occupancy evaluations, where built projects are observed during their use and assessed against the designers’ original intentions, as well as fitness for purpose and performance. For the most part, the people who used public spaces did not have a say in how they were designed or managed….(More)”

Beyond prediction: Using big data for policy problems


Susan Athey at Science: “Machine-learning prediction methods have been extremely productive in applications ranging from medicine to allocating fire and health inspectors in cities. However, there are a number of gaps between making a prediction and making a decision, and underlying assumptions need to be understood in order to optimize data-driven decision-making…(More)”
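As a hedged illustration of that gap (not taken from Athey’s article), the toy example below compares allocating a fixed inspection budget by predicted risk with allocating it by the estimated effect of an inspection. All numbers are synthetic, and in practice the inspection effect would itself have to be estimated from data.

```python
# Invented example of the prediction/decision gap: the highest-risk buildings
# are not always the ones where an inspection changes the outcome most.
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
baseline_risk = rng.uniform(0.01, 0.30, n)          # predicted fire risk without inspection
effect_of_inspection = rng.uniform(0.0, 0.10, n)    # risk reduction an inspection would achieve
effect_of_inspection = np.minimum(effect_of_inspection, baseline_risk)

budget = 1000
by_risk = np.argsort(-baseline_risk)[:budget]            # allocate by prediction
by_effect = np.argsort(-effect_of_inspection)[:budget]   # allocate by expected impact

print("fires averted, ranked by predicted risk:   ", effect_of_inspection[by_risk].sum())
print("fires averted, ranked by inspection effect:", effect_of_inspection[by_effect].sum())
```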

Data ideologies of an interested public: A study of grassroots open government data intermediaries


 and  in Big Data & Society: “Government officials claim open data can improve internal and external communication and collaboration. These promises hinge on “data intermediaries”: extra-institutional actors that obtain, use, and translate data for the public. However, we know little about why these individuals might regard open data as a site of civic participation. In response, we draw on Ilana Gershon to conceptualize culturally situated and socially constructed perspectives on data, or “data ideologies.” This study employs mixed methodologies to examine why members of the public hold particular data ideologies and how they vary. In late 2015 the authors engaged the public through a commission in a diverse city of approximately 500,000. Qualitative data was collected from three public focus groups with residents. Simultaneously, we obtained quantitative data from surveys. Participants’ data ideologies varied based on how they perceived data to be useful for collaboration, tasks, and translations. Bucking the “geek” stereotype, only a minority of those surveyed (20%) were professional software developers or engineers. Although only a nascent movement, we argue open data intermediaries have important roles to play in a new political landscape….(More)”