With great power comes great responsibility: crowdsourcing raises methodological and ethical questions for academia


Isabell Stamm and Lina Eklund at LSE Impact Blog: “Social scientists are expanding the landscape of academic knowledge production by adopting online crowdsourcing techniques used by businesses to design, innovate, and produce. Researchers employ crowdsourcing for a number of tasks, such as taking pictures, writing text, recording stories, or digesting web-based data (tweets, posts, links, etc.). In an increasingly competitive academic climate, crowdsourcing offers researchers a cutting-edge tool for engaging with the public. Yet this socio-technical practice emerged as a business procedure rather than a research method and thus contains many hidden assumptions about the world which concretely affect the knowledge produced. With this comes a problematic reduction of research participants into a single, faceless crowd. This requires a critical assessment of crowdsourcing’s methodological assumptions….(More)”

AI, machine learning and personal data


Jo Pedder at the Information Commissioner’s Office Blog: “Today sees the publication of the ICO’s updated paper on big data and data protection.

But why now? What’s changed in the two and a half years since we first visited this topic? Well, quite a lot actually:

  • big data is becoming the norm for many organisations, which use it to profile people and inform their decision-making processes, whether that’s to determine your car insurance premium or to accept/reject your job application;
  • artificial intelligence (AI) is stepping out of the world of science fiction and into real life, providing the ‘thinking’ power behind virtual personal assistants and smart cars; and
  • machine learning algorithms are discovering patterns in data that traditional data analysis couldn’t hope to find, helping to detect fraud and diagnose diseases.

The complexity and opacity of these types of processing operations mean that it’s often hard to know what’s going on behind the scenes. This can be problematic when personal data is involved, especially when decisions are made that have significant effects on people’s lives. The combination of these factors has led some to call for new regulation of big data, AI and machine learning, to increase transparency and ensure accountability.

In our view though, whilst the means by which personal data is processed are changing, the underlying issues remain the same. Are people being treated fairly? Are decisions accurate and free from bias? Is there a legal basis for the processing? These are issues that the ICO has been addressing for many years, through oversight of existing European data protection legislation….(More)”

From Nairobi to Manila, mobile phones are changing the lives of bus riders


Shomik Mehndiratta at Transport for Development Blog: “Every day around the world, millions of people rely on buses to get around. In many cities, these services carry the bulk of urban trips, especially in Africa and Latin America. They are known by many different names—matatus, dalalas, minibus taxis, colectivos, diablos rojos, micros, etc.—but all have one thing in common: they are either hardly regulated… or not regulated at all. Although buses play a critical role in the daily life of many urban dwellers, there are a variety of complaints that have spurred calls for improvement and reform.

However, we are now witnessing a different, more organic kind of change that is disrupting the world of informal buses using ubiquitous cheap sensors and mobile technology. One hotbed of innovation is Nairobi, Kenya’s bustling capital. Two years ago, Nairobi made a splash in the world of urban transport by mapping all the routes of informal matatus. Other countries have sought to replicate this model, with open source tools and crowdsourcing supporting similar efforts in Mexico, Manila, and beyond. Back in Nairobi, the Magic Bus app helps commuters use SMS services to reserve and pay for seats in matatus; in September 2016, Magic Bus’s potential for easing commuter pain in the Kenyan capital was rewarded with a $1 million prize. Other programs implemented in collaboration with insurers and operators are experimenting with on-board sensors to identify and correct dangerous driver behavior such as sudden braking and acceleration. Ma3Route, also in Nairobi (there is a pattern here!), used crowdsourcing to identify dangerous drivers as well as congestion. At the same time, operators are upping their game: using technology to improve system management, control and routing in La Paz, and working with universities to improve their financial planning and capabilities in Cape Town.

Against this backdrop, the question is then: can these ongoing experimental initiatives offer a coherent alternative to formal reform? …(More)”.

Think tanks can transform into the standard-setters and arbiters of quality of 21st century policy analysis


Marcos Hernando, Diane Stone and Hartwig Pautz in LSE Impact Blog: “Last month, the annual Global Go To Think Tank Index Report was released, amid claims “think tanks are more important than ever before”. It is unclear whether this was said in spite of, or because of, the emergence of ‘post-truth politics’. Experts have become targets of anger and derision, struggling to communicate facts and advance evidence-based policy. Popular dissatisfaction with ‘policy wonks’ has meant think tanks face challenges to their credibility at a time they are under pressure from increased competition. The 20th century witnessed the rise of the think tank, but the 21st century might yet see its decline. To avoid such a fate, we believe think tanks must reposition themselves as the credible arbiters able to distinguish between poor analysis and good quality research….

In recent years, think tanks have faced three major challenges: financial limits in a world characterised by austerity; increased competition both among think tanks and with other types of policy research organisations; and a growing questioning of, and popular dissatisfaction with, the role of the ‘expert’ itself. Here, we look at each of these in turn…

Nevertheless, think tanks do retain some competitive advantages. The rapid proliferation of knowledge complicates the absorption of information among policymakers. To put it simply, there are limits to the quantity and diversity of knowledge that government actors can make sense of, especially in states hollowed out by austerity programmes and burdened by ever-higher public demands. Managing the over-supply of (occasionally dubious) evidence and policy analysis from research-based NGOs, universities and advocacy groups has become a problem of governance. But this issue also opens a space for the reinvention of think tanks.

With information overload comes a need for talented editors and skilled curators. That is, organisations as much as individuals that help those within policy processes to discern the reliability and usefulness of analytic products. Potentially, think tanks could transform into significant standard-setters and arbiters of quality of 21st century policy analysis. If they do not, they risk becoming just another group in the overpopulated ‘post-truth’ policy advice industry….(More)”

Open-Sourcing Google Earth Enterprise


Geo Developers Blog: “We are excited to announce that we are open-sourcing Google Earth Enterprise (GEE), the enterprise product that allows developers to build and host their own private maps and 3D globes. With this release, GEE Fusion, GEE Server, and GEE Portable Server source code (all 470,000+ lines!) will be published on GitHub under the Apache2 license in March.

Originally launched in 2006, Google Earth Enterprise provides customers the ability to build and host private, on-premise versions of Google Earth and Google Maps. In March 2015, we announced the deprecation of the product and the end of all sales. To provide ample time for customers to transition, we have provided a two-year maintenance period ending on March 22, 2017. During this maintenance period, product updates have been regularly shipped and technical support has been available to licensed customers….

Google Cloud Platform (GCP) is increasingly used as a source for geospatial data. Google’s Earth Engine has made available over a petabyte of raster datasets, which are readily accessible to the public on Google Cloud Storage. Additionally, Google uses Cloud Storage to provide data to customers who purchase Google Imagery today. Having access to massive amounts of geospatial data, on the same platform as your flexible compute and storage, makes generating high quality Google Earth Enterprise Databases and Portables easier and faster than ever.
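For readers who want a feel for what “readily accessible on Google Cloud Storage” means in practice, here is a minimal, illustrative sketch using the google-cloud-storage Python client. It is not part of the announcement; the bucket name and object prefix are assumptions chosen for illustration (a public Landsat bucket), and anonymous access only works for datasets that are genuinely public.

```python
# Illustrative sketch: list a few objects in an assumed public geospatial
# bucket on Google Cloud Storage without credentials. The bucket name and
# prefix below are assumptions for illustration, not from the announcement.
from google.cloud import storage

client = storage.Client.create_anonymous_client()  # public data needs no credentials

for blob in client.list_blobs("gcp-public-data-landsat", prefix="LC08/", max_results=5):
    print(blob.name, blob.size)

# To pull one object down for processing (object path is hypothetical):
# client.bucket("gcp-public-data-landsat").blob("LC08/some/scene/band4.TIF") \
#     .download_to_filename("band4.TIF")
```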

We will be sharing a series of white papers and other technical resources to make it as frictionless as possible to get open source GEE up and running on Google Cloud Platform. We are excited about the possibilities that open-sourcing enables, and we trust this is good news for our community. We will be sharing more information when we launch the code in March on GitHub. For general product information, visit the Google Earth Enterprise Help Center. Review the essential and advanced training for how to use Google Earth Enterprise, or learn more about the benefits of Google Cloud Platform….(More)”

The science of society: From credible social science to better social policies


Nancy Cartwright and Julian Reiss at LSE Blog: “Society invests a great deal of money in social science research. Surely the expectation is that some of it will be useful not only for understanding ourselves and the societies we live in but also for changing them? This is certainly the hope of the very active evidence-based policy and practice movement, which is heavily endorsed in the UK both by the last Labour Government and by the current Coalition Government. But we still do not know how to use the results of social science in order to improve society. This has to change, and soon.

Last year the UK launched an extensive – and expensive – new What Works Network that, as the Government press release describes, consists of “two existing centres of excellence – the National Institute for Health and Clinical Excellence (NICE) and the Educational Endowment Foundation – plus four new independent institutions responsible for gathering, assessing and sharing the most robust evidence to inform policy and service delivery in tackling crime, promoting active and independent ageing, effective early intervention, and fostering local economic growth”.

This is an exciting and promising initiative. But it faces a serious challenge: we remain unable to build real social policies based on the results of social science or to predict reliably what the outcomes of these policies will actually be. This contrasts with our understanding of how to establish the results in the first place. There we have a handle on the problem. We have a reasonable understanding of what kinds of methods are good for establishing what kinds of results and with what (at least rough) degrees of certainty.

There are methods – well thought through – that social scientists learn in the course of their training for constructing a questionnaire, running a randomised controlled trial, conducting an ethnographic study, looking for patterns in large data sets. There is nothing comparably explicit and well thought through about how to use social science knowledge to help predict what will happen when we implement a proposed policy in real, complex situations. Nor is there anything to help us estimate and balance the effectiveness, the evidence, the chances of success, the costs, the benefits, the winners and losers, and the social, moral, political and cultural acceptability of the policy.

To see why this is so difficult think of an analogy: not building social policies but building material technologies. We do not just read off instructions for building a laser – which may ultimately be used to operate on your eyes – from knowledge of basic science. Rather, we piece together a detailed model using heterogeneous knowledge from a mix of physics theories, from various branches of engineering, from experience of how specific materials behave, from the results of trial-and-error, etc. By analogy, building a successful social policy equally requires a mix of heterogeneous kinds of knowledge from radically different sources. Sometimes we are successful at doing this and some experts are very good at it in their own specific areas of expertise. But in both cases – both for material technology and for social technology – there is no well thought through, defensible guidance on how to do it: what are better and worse ways to proceed, what tools and information might be needed, and how to go about getting these. This is true whether we look for general advice that might be helpful across subject areas or advice geared to specific areas or specific kinds of problems. Though we indulge in social technology – indeed we can hardly avoid it – and are convinced that better social science will make for better policies, we do not know how to turn that conviction into a reality.

This presents a real challenge to the hopes for evidence-based policy….(More)”

Can you crowdsource water quality data?


Pratibha Mistry at The Water Blog (World Bank): “The recently released Contextual Framework for Crowdsourcing Water Quality Data lays out a strategy for citizen engagement in decentralized water quality monitoring, enabled by the “mobile revolution.”

According to the WHO, 1.8 billion people lack access to safe drinking water worldwide. Poor source water quality, non-existent or insufficient treatment, and defects in water distribution systems and storage mean these consumers use water that often doesn’t meet the WHO’s Guidelines for Drinking Water Quality.

The crowdsourcing framework develops a strategy to engage citizens in measuring and learning about the quality of their own drinking water. Through their participation, citizens provide utilities and water supply agencies with cost-effective water quality data in near-real time. Following a typical crowdsourcing model, consumers use their mobile phones to report water quality information to a central service. That service receives the information, then repackages and shares it via mobile phone messages, websites, dashboards, and social media. Individual citizens can thus be educated about their water quality, and water management agencies and other stakeholders can use the data to improve water management; it’s a win-win.

[Figure: A well-implemented crowdsourcing project both depends on and benefits end users. Source: modified from Hutchings, M., Dev, A., Palaniappan, M., Srinivasan, V., Ramanathan, N., and Taylor, J. (2012). “mWASH: Mobile Phone Applications for the Water, Sanitation, and Hygiene Sector.” Pacific Institute, Oakland, California. 114 p.]
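To make the reporting loop above concrete, here is a minimal sketch of the kind of central service a crowdsourcing pilot might run, written in Python with Flask. It is purely illustrative: the framework does not prescribe an API, and the endpoint and field names (site, ecoli_cfu) are hypothetical.

```python
# A minimal, hypothetical sketch of the crowdsourcing loop: phones POST water
# quality reports to a central service, which stores them and serves them back
# for dashboards, websites or SMS digests. Field names are illustrative only.
from flask import Flask, request, jsonify

app = Flask(__name__)
reports = []  # stand-in for a real datastore

@app.route("/reports", methods=["POST"])
def submit_report():
    # e.g. {"site": "well-12", "ecoli_cfu": 3, "reported_by": "citizen-007"}
    report = request.get_json(force=True)
    reports.append(report)
    return jsonify({"status": "received", "total_reports": len(reports)}), 201

@app.route("/reports", methods=["GET"])
def list_reports():
    # Repackaged view of the crowd's data for agencies and citizens alike.
    return jsonify(reports)

if __name__ == "__main__":
    app.run(port=8000)
```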

Several groups, from the private sector to academia to non-profits, have taken a recent interest in developing a variety of so-called mWASH apps (mobile phone applications for the water, sanitation, and hygiene, or WASH, sector). A recent academic study analyzed how mobile phones might facilitate the flow of water quality data between water suppliers and public health agencies in Africa. USAID has invested in piloting a mobile application in Tanzania to help consumers test their water for E. coli….(More)”

What does Big Data mean to public affairs research?


Ines Mergel, R. Karl Rethemeyer, and Kimberley R. Isett at LSE’s The Impact Blog: “…Big Data promises access to vast amounts of real-time information from public and private sources that should allow insights into behavioral preferences, policy options, and methods for public service improvement. In the private sector, marketing preferences can be aligned with customer insights gleaned from Big Data. In the public sector, however, government agencies are less responsive and agile in their real-time interactions by design – instead using time for deliberation to respond to broader public goods. The responsiveness Big Data promises is a virtue in the private sector but could be a vice in the public.

Moreover, we raise several important concerns with respect to relying on Big Data as a decision and policymaking tool. While in the abstract Big Data is comprehensive and complete, in practice today’s version of Big Data has several features that should give public sector practitioners and scholars pause. First, most of what we think of as Big Data is really ‘digital exhaust’ – that is, data collected for purposes other than public sector operations or research. Data sets that might be publicly available from social networking sites such as Facebook or Twitter were designed for purely technical reasons. The degree to which this data lines up conceptually and operationally with public sector questions is purely coincidental. Use of digital exhaust for purposes not previously envisioned can go awry. A good example is Google’s attempt to predict the flu based on search terms.

Second, we believe there are ethical issues that may arise when researchers use data that was created as a byproduct of citizens’ interactions with each other or with a government social media account. Citizens are not able to understand or control how their data is used and have not given consent for storage and re-use of their data. We believe that research institutions need to examine their institutional review board processes to help researchers and their subjects understand important privacy issues that may arise. Too often it is possible to infer individual-level insights about private citizens from a combination of data points and thus predict their behaviors or choices.

Lastly, Big Data can only represent those that spend some part of their life online. Yet we know that certain segments of society opt in to life online (by using social media or network-connected devices), opt out (either knowingly or passively), or lack the resources to participate at all. The demography of the internet matters. For instance, researchers tend to use Twitter data because its API allows data collection for research purposes, but many forget that Twitter users are not representative of the overall population. Instead, as a recent Pew Social Media 2016 update shows, only 24% of all online adults use Twitter. Internet participation generally is biased in terms of age, educational attainment, and income – all of which correlate with gender, race, and ethnicity. We believe therefore that predictive insights are potentially biased toward certain parts of the population, making generalisations highly problematic at this time….(More)”
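To see why this matters in practice, consider a rough sketch of post-stratification weighting, one common way analysts try to correct a skewed online sample. This illustration is ours, not the authors’, and every number in it is a hypothetical placeholder rather than real survey data.

```python
# Illustrative only: hypothetical age-group shares for the general population
# versus a Twitter-derived sample. Post-stratification weights re-scale the
# sample toward population shares, and show how thin some groups' data is.
population_share = {"18-29": 0.20, "30-49": 0.35, "50-64": 0.25, "65+": 0.20}
sample_share     = {"18-29": 0.45, "30-49": 0.35, "50-64": 0.15, "65+": 0.05}

weights = {g: population_share[g] / sample_share[g] for g in population_share}
for group, w in sorted(weights.items()):
    print(f"{group}: weight each sampled user by {w:.2f}")

# Sparse groups (here, 65+) receive very large weights, meaning a handful of
# users stand in for a large population segment; that is the fragility behind
# the authors' warning about generalisation.
```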

Toward Evidence-Based Open Governance by Curating and Exchanging Research: OGRX 2.0


Andrew Young and Stefaan Verhulst at OGRX: “The Open Governance Research Exchange (OGRX) is a platform that seeks to identify, collect and share curated insights on new ways of solving public problems. It was created last year by the GovLab, World Bank Digital Engagement Evaluation Team and mySociety. Today, while more than 3000 representatives from more than 70 countries are gathering in Paris for the Open Government Partnership Summit, we are launching OGRX 2.0 with new features and functionalities to further help users identify the signal in the noise of research and evidence on more innovative means of governing….

What is new?

First, the new OGRX Blog provides an outlet for more easily digestible and shareable insights from the open governance research community. OGRX currently features over 600 publications on governance innovation – but how to digest and identify insights? This space will provide summaries of important works, analyses of key trends in the field of research, and guest posts from researchers working at the leading edge of governance innovation across regions and domains. Check back often to stay on top of what’s new in open governance research.

Second, the new OGRX Selected Readings series offers curated reading lists from well-known experts in open governance. These Selected Readings will give readers a sense of how to jumpstart their knowledge by focusing on those publications that have been curated by those in the know about the topics at hand. Today we are launching this new series with the Selected Readings on Civic Technology, curated by mySociety’s head of research Rebecca Rumbul; and the Selected Readings on Policy Informatics, curated by Erik Johnston of the MacArthur Foundation Research Network on Opening Governance and director of the Arizona State University Center for Policy Informatics. New Selected Readings will be posted each month, so check back often!…Watch this space and #OGRX to stay abreast of new developments….”

How the Circle Line rogue train was caught with data


Daniel Sim at the Data.gov.sg Blog: “Singapore’s MRT Circle Line was hit by a spate of mysterious disruptions in recent months, causing much confusion and distress to thousands of commuters.

Like most of my colleagues, I take a train on the Circle Line to my office at one-north every morning. So on November 5, when my team was given the chance to investigate the cause, I volunteered without hesitation.

From prior investigations by train operator SMRT and the Land Transport Authority (LTA), we already knew that the incidents were caused by some form of signal interference, which led to loss of signals in some trains. The signal loss would trigger the emergency brake safety feature in those trains and cause them to stop randomly along the tracks.

But the incidents — which first happened in August — seemed to occur at random, making it difficult for the investigation team to pinpoint the exact cause.

We were given a dataset compiled by SMRT that contained the following information:

  • Date and time of each incident
  • Location of incident
  • ID of train involved
  • Direction of train…

LTA and SMRT eventually published a joint press release on November 11 to share the findings with the public….

When we first started, my colleagues and I were hoping to find patterns that may be of interest to the cross-agency investigation team, which included many officers at LTA, SMRT and DSTA. The tidy incident logs provided by SMRT and LTA were instrumental in getting us off to a good start, as minimal cleaning up was required before we could import and analyse the data. We were also gratified by the effective follow-up investigations by LTA and DSTA that confirmed the hardware problems on PV46.

From the data science perspective, we were lucky that incidents happened so close to one another. That allowed us to identify both the problem and the culprit in such a short time. If the incidents were more isolated, the zigzag pattern would have been less apparent, and it would have taken us more time — and data — to solve the mystery….(More).”
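As a rough illustration of the kind of exploration described here, the sketch below plots each incident by time and position along the line; the file name and column names (timestamp, location_km, direction) are hypothetical, and the real analysis is detailed in the Data.gov.sg team’s full post.

```python
# Illustrative sketch: plot emergency-brake incidents by time and position
# along the line. A single "rogue" train running back and forth would leave
# a zigzag trail through the scatter. File and column names are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("incidents.csv", parse_dates=["timestamp"])

fig, ax = plt.subplots(figsize=(10, 4))
for direction, group in df.groupby("direction"):
    ax.scatter(group["timestamp"], group["location_km"], label=direction, alpha=0.7)

ax.set_xlabel("Time of incident")
ax.set_ylabel("Position along line (km)")
ax.set_title("Circle Line emergency-brake incidents")
ax.legend(title="Train direction")
plt.tight_layout()
plt.show()
```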