Policy in the data age: Data enablement for the common good


Karim Tadjeddine and Martin Lundqvist of McKinsey: “Like companies in the private sector, governments from national to local can smooth the process of digital transformation—and improve services to their ‘customers,’ the public—by adhering to certain core principles. Here’s a road map.

By virtue of their sheer size, visibility, and economic clout, national, state or provincial, and local governments are central to any societal transformation effort, in particular a digital transformation. Governments at all levels, which account for 30 to 50 percent of most countries’ GDP, exert profound influence not only by executing their own digital transformations but also by catalyzing digital transformations in other societal sectors (Exhibit 1).

The tremendous impact that digital services have had on governments and society has been the subject of extensive research that has documented the rapid, extensive adoption of public-sector digital services around the globe. We believe that the coming data revolution will be even more deeply transformational and that data enablement will produce a radical shift in the public sector’s quality of service, empowering governments to deliver better constituent service, better policy outcomes, and more-productive operations….(More)”

Data and Democracy


(Free) book by Andrew Therriault: “The 2016 US elections will be remembered for many things, but for those who work in politics, 2016 may be best remembered as the year that the use of data in politics reached its maturity. Through a collection of essays from leading experts in the field, this report explores how political data science helps to drive everything from overall strategy and messaging to individual voter contacts and advertising.

Curated by Andrew Therriault, former Director of Data Science for the Democratic National Committee, this illuminating report includes first-hand accounts from Democrats, Republicans, and members of the media. Tech-savvy readers will get a comprehensive account of how data analysis has prevailed over political instinct and experience and examples of the challenges these practitioners face.

Essays include:

  • The Role of Data in Campaigns—Andrew Therriault, former Director of Data Science for the Democratic National Committee
  • Essentials of Modeling and Microtargeting—Dan Castleman, cofounder and Director of Analytics at Clarity Campaign Labs, a leading modeler in Democratic politics
  • Data Management for Political Campaigns—Audra Grassia, Deputy Political Director for the Democratic Governors Association in 2014
  • How Technology Is Changing the Polling Industry—Patrick Ruffini, cofounder of Echelon Insights and Founder/Chairman of Engage, who was a digital strategist for President Bush in 2004 and for the Republican National Committee in 2006
  • Data-Driven Media Optimization—Alex Lundry, cofounder and Chief Data Scientist at Deep Root Analytics, a leading expert on media and voter analytics, electoral targeting, and political data mining
  • How (and Why) to Follow the Money in Politics—Derek Willis, ProPublica’s news applications developer, formerly with The New York Times
  • Digital Advertising in the Post-Obama Era—Daniel Scarvalone, Associate Director of Research and Data at Bully Pulpit Interactive (BPI), a digital marketer for the Democratic party
  • Election Forecasting in the Media—Natalie Jackson, Senior Polling Editor at The Huffington Post…(More)”

White House, Transportation Dept. want help using open data to prevent traffic crashes


Samantha Ehlinger in FedScoop: “The Transportation Department is looking for public input on how to better interpret and use data on fatal crashes after 2015 data revealed a startling 7.2 percent spike in traffic deaths that year.

Looking for new solutions that could prevent more deaths on the roads, the department released the 2015 open dataset on each fatal crash three months earlier than usual. With it, the department and the White House announced a call to action for people to use the data set as a jumping-off point for a dialogue on how to prevent crashes, as well as to understand what might be causing the spike.

“What we’re ultimately looking for is getting more people engaged in the data … matching this with other publicly available data, or data that the private sector might be willing to make available, to dive in and to tell these stories,” said Bryan Thomas, communications director for the National Highway Traffic Safety Administration, to FedScoop.

One striking statistic was that “pedestrian and pedalcyclist fatalities increased to a level not seen in 20 years,” according to a DOT press release. …

“We want folks to be engaged directly with our own data scientists, so we can help people through the dataset and help answer their questions as they work their way through, bounce ideas off of us, etc.,” Thomas said. “We really want to be accessible in that way.”

He added that as ideas “come to fruition,” there will be opportunities to present what people have learned.

“It’s a very, very rich data set, there’s a lot of information there,” Thomas said. “Our own ability is, frankly, limited to investigate all of the questions that you might have of it. And so we want to get the public really diving in as well.”…

Here are the questions “worth exploring,” according to the call to action; a rough code sketch for a first pass over the dataset follows the list:

  • How might improving economic conditions around the country change how Americans are getting around? What models can we develop to identify communities that might be at a higher risk for fatal crashes?
  • How might climate change increase the risk of fatal crashes in a community?
  • How might we use studies of attitudes toward speeding, distracted driving, and seat belt use to better target marketing and behavioral change campaigns?
  • How might we monitor public health indicators and behavior risk indicators to target communities that might have a high prevalence of behaviors linked with fatal crashes (drinking, drug use/addiction, etc.)? What countermeasures should we create to address these issues?”…(More)”
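
For readers taking up the call to action, below is a minimal pandas sketch of a first pass over the FARS release. The file and column names (accident.csv, person.csv, FATALS, PER_TYP, INJ_SEV, ST_CASE, MONTH) follow the public FARS coding conventions but are assumptions to be checked against the actual download; treat this as a starting point, not an official NHTSA recipe.

```python
import pandas as pd

# Load the FARS "accident" and "person" files for 2015. File and column
# names follow the public FARS coding conventions (an assumption --
# check them against the release you actually download from NHTSA).
accidents = pd.read_csv("FARS2015/accident.csv")
persons = pd.read_csv("FARS2015/person.csv")

# Total traffic deaths: each crash record carries a FATALS count.
print("Total fatalities in 2015:", accidents["FATALS"].sum())

# Pedestrian (PER_TYP == 5) and pedalcyclist (PER_TYP == 6) deaths;
# INJ_SEV == 4 marks a fatal injury in the FARS coding scheme.
vulnerable = persons[persons["PER_TYP"].isin([5, 6]) & (persons["INJ_SEV"] == 4)]
print("Pedestrian/pedalcyclist fatalities:", len(vulnerable))

# Join person-level records back to crash-level attributes via the
# ST_CASE identifier to look for seasonal patterns worth a closer dive.
crash_month = accidents[["ST_CASE", "MONTH"]].rename(columns={"MONTH": "crash_month"})
print(vulnerable.merge(crash_month, on="ST_CASE").groupby("crash_month").size())
```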

Counterterrorism and Counterintelligence: Crowdsourcing Approach


Literature review by Sanket Subhash Khanwalkar: “Despite heavy investment by the United States and several other national governments, terrorism-related problems are rising at an alarming rate. In the last decade, lone-wolf terrorism in particular has caused 70% of all terrorism-related deaths in the US and the West. This literature survey describes lone-wolf terrorism in detail to analyse its structure, characteristics, strengths and weaknesses. It also investigates crowdsourcing intelligence, as an unorthodox approach to counter lone-wolf terrorism, by reviewing its current state-of-the-art and identifying areas for improvement….(More)”

Why Zika, Malaria and Ebola should fear analytics


Frédéric Pivetta at Real Impact Analytics: “Big data is a hot business topic. It turns out to be an equally hot topic for the non-profit sector now that we know the vital role analytics can play in addressing public health issues and reaching sustainable development goals.

Big players like IBM just announced they will help fight Zika by analyzing social media, transportation and weather data, among other indicators. Telecom data takes it further by helping to predict the spread of disease, identifying isolated and fragile communities and prioritizing the actions of aid workers.

The power of telecom data

Human mobility contributes significantly to epidemic transmission into new regions. However, there are gaps in understanding human mobility due to the limited and often outdated data available from travel records. In some countries, these data are collected by health officials in hospitals or through occasional surveys.

Telecom data, constantly updated and covering a large portion of the population, is rich in terms of mobility insights. But there are other benefits:

  • it’s recorded automatically (in the Call Detail Records, or CDRs), so that we avoid data collection and response bias.
  • it contains localization and time information, which is great for understanding human mobility.
  • it contains info on connectivity between people, which helps in understanding social networks.
  • it contains info on phone spending, which allows tracking of socio-economic indicators.

Aggregated and anonymized, mobile telecom data fills the public data gap without raising privacy concerns. Mixing it with other public data sources results in a very precise and reliable view of human mobility patterns, which is key for preventing epidemic spread.

Using telecom data to map epidemic risk flows

So how does it work? As in any other big data application, the challenge is to build the right predictive model, allowing decision-makers to take the most appropriate actions. In the case of epidemic transmission, the methodology typically includes five steps (a code sketch of the first step follows the list):

  • Identify mobility patterns relevant for each particular disease. For example, short-term trips for fast-spreading diseases like Ebola. Or overnight trips for diseases like Malaria, as it spreads by mosquitoes that are active only at night. Such patterns can be deduced from the CDRs: we can actually find the home location of each user by looking at the most active night tower, and then tracking calls to identify short or long-term trips. Aggregating data per origin-destination pairs is useful as we look at intercity or interregional transmission flows. And it protects the privacy of individuals, as no one can be singled out from the aggregated data.
  • Get data on epidemic incidence, typically from local organisations like national healthcare systems or, in case of emergency, from NGOs or dedicated emergency teams. This data should be aggregated at the same level of granularity as the CDRs.
  • Knowing how many travelers go from one place to another, for how long, and the disease incidence at origin and destination, build an epidemiological model that can account for the way and speed of transmission of the particular disease.
  • With an import/export scoring model, map epidemic risk flows and flag areas that are at risk of becoming the new hotspots because of human travel.
  • On that basis, prioritize and monitor public health measures, focusing on restraining mobility to and from hotspots. Mapping risk also allows launching prevention campaigns in the right places and setting up the necessary infrastructure on time. Eventually, the tool reduces public health risks and helps stem the epidemic.
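
As a rough illustration of the first step, the sketch below infers a home tower from night-time activity and aggregates trips into origin-destination flows. The schema (user_id, tower_id, timestamp) and the 8pm–6am night window are illustrative assumptions, not Real Impact Analytics’ actual pipeline.

```python
import pandas as pd

# Infer each subscriber's home tower from night-time activity, then
# aggregate trips into origin-destination flows. The schema
# (user_id, tower_id, timestamp) is illustrative, not a real CDR layout.
cdrs = pd.read_csv("cdrs.csv", parse_dates=["timestamp"])

# Home tower: the tower a user connects to most often at night
# (here taken as 8pm-6am, an assumed window).
night = cdrs[(cdrs.timestamp.dt.hour >= 20) | (cdrs.timestamp.dt.hour < 6)]
home = (night.groupby(["user_id", "tower_id"]).size()
             .groupby("user_id").idxmax()           # (user, tower) of the max
             .apply(lambda pair: pair[1])           # keep only the tower id
             .rename("home_tower"))

# A crude trip signal: any record handled by a tower other than the
# user's home tower counts as a visit to that destination.
visits = cdrs.join(home, on="user_id")
trips = visits[visits["tower_id"] != visits["home_tower"]]

# Aggregate to origin-destination pairs, so no individual can be
# singled out from the published flows.
od_flows = (trips.groupby(["home_tower", "tower_id"]).size()
                 .rename("n_trips").reset_index())
print(od_flows.sort_values("n_trips", ascending=False).head())
```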

That kind of application works in a variety of epidemiological contexts, including Zika, Ebola, malaria, influenza and tuberculosis. No doubt the global boom of mobile data will prove extraordinarily helpful in fighting these fierce enemies….(More)”

Open Data for Social Change and Sustainable Development


Special issue of the Journal of Community Informatics edited by Raed M. Sharif and Francois Van Schalkwyk: “As the second phase of the Emerging Impacts of Open Data in Developing Countries (ODDC) drew to a close, discussions started on a possible venue for publishing some of the papers that emerged from the research conducted by the project partners. In 2012 the Journal of Community Informatics published a special issue titled ‘Community Informatics and Open Government Data’. Given the journal’s previous interest in the field of open data, its established reputation and the fact that it is a peer-reviewed open access journal, the Journal of Community Informatics was approached and agreed to a second special issue with a focus on open data. A closed call for papers was sent out to the project research partners. Shortly afterwards, the first Open Data Research Symposium was held ahead of the International Open Data Conference 2015 in Ottawa, Canada. For the first time, a forum was provided to academics and researchers to present papers specifically on open data. Again there were discussions about an appropriate venue to publish selected papers from the Symposium. The decision was taken by the Symposium Programme Committee to invite the twenty plus presenters to submit full papers for consideration in the special issue.

The seven papers published in this special issue are those that were selected through a double-blind peer review process. Researchers are often given a rough ride by open data advocates – the research community is accused of taking too long, not being relevant enough and of speaking in tongues unintelligible to social movements and policy-makers. And yet nine years after the ground-breaking meeting in Sebastopol at which the eight principles of open government data were penned, seven after President Obama injected political legitimacy into a movement, and five after eleven nation states formed the global Open Government Partnership (OGP), which has grown six-fold in membership; an email crosses our path in which the authors of a high-level report commit to developing a comprehensive understanding of a continental open data ecosystem through an examination of open data supply. Needless to say, a single example is not necessarily representative of global trends in thinking about open data. Yet, the focus on government and on the supply of open data by open data advocates – with little consideration of open data use, the differentiation of users, intermediaries, power structures or the incentives that propel the evolution of ecosystems – is still all too common. Empirical research has already revealed the limitations of ‘supply it and they will use it’ open data practices, and has started to fill critical knowledge gaps to develop a more holistic understanding of the determinants of effective open data policy and practice.

As open data policies and practices evolve, the need to capture the dynamics of this evolution and to trace unfolding outcomes becomes critical to advance a more efficient and progressive field of research and practice. The trajectory of the existing body of literature on open data and the role of public authorities, both local and national, in the provision of open data is logical and needed in light of the central role of government in producing a wide range of types and volumes of data. At the same time, the complexity of the open data ecosystem and the plethora of actors (local, regional and global suppliers, intermediaries and users) make a compelling case for opening avenues for more diverse discussion and research beyond the supply of open data. The research presented in this special issue of the Journal of Community Informatics touches on many of these issues, sets the pace and contributes to the much-needed knowledge base required to promote the likelihood of open data living up to its promise. … (More)”

How Medical Crowdsourcing Empowers Patients & Doctors


Rob Stretch at Rendia: “Whether you’re a solo practitioner in a rural area, or a patient who’s bounced from doctor to doctor with a difficult-to-diagnose condition, there are many reasons why you might seek out expert medical advice from a larger group. Fortunately, in 2016, seeking feedback from other physicians or getting a second opinion is as easy as going online.

“Medical crowdsourcing” sites and apps are gathering steam, from provider-only forums like SERMOsolves and Figure 1, to patient-focused sites like CrowdMed. They share the same mission of empowering doctors and patients, reducing misdiagnosis, and improving medicine. Is crowdsourcing the future of medicine? Read on to find out more.

Fixing misdiagnosis

An estimated 10 to 20 percent of medical cases are misdiagnosed, a higher error rate than drug errors or surgery on the wrong patient or body part, according to the National Center for Policy Analysis. And diagnostic errors are the leading cause of malpractice litigation. Doctors often report that with many of their patient cases, they would benefit from the support and advice of their peers.

The photo-sharing app for health professionals, Figure 1, is filling that need. Since we reported on it last year, the app has reached 1 million users and added a direct-messaging feature. The app is geared towards verified medical professionals, and goes to great lengths to protect patient privacy in keeping with HIPAA laws. According to co-founder and CEO Gregory Levey, an average of 10,000 unique users check in to Figure 1 every hour, and medical professionals and students in 190 countries currently use the app.

Using Figure 1 to crowdsource advice from the medical community has saved at least one life. Emily Nayar, a physician assistant in rural Oklahoma and a self-proclaimed “Figure 1 addict,” told Wired magazine that because of photos she’d seen on the app, she was able to correctly diagnose a patient with shingles meningitis. Another doctor had misdiagnosed him previously, and the wrong medication could have killed him.

Collective knowledge at zero cost

In addition to serving as “virtual colleagues” for isolated medical providers, crowdsourcing forums can pool knowledge from an unprecedented number of doctors in different specialties and even countries, and can do so very quickly.

When we first reported on SERMO, the company billed itself as a “virtual doctors’ lounge.” Now, the global social network with 600,000 verified, credentialed physician members has pivoted to medical crowdsourcing with SERMOsolves, one of its most popular features, according to CEO Peter Kirk.

“Crowdsourcing patient cases through SERMOsolves is an ideal way for physicians to gain valuable information from the collective knowledge of hundreds of physicians instantly,” he said in a press release. According to SERMO, 3,500 challenging patient cases were posted in 2014, viewed 700,000 times, and received 50,000 comments. Most posted cases received responses within 1.5 hours and were resolved within a day. “We have physicians from more than 96 specialties and subspecialties posting on the platform, working together to share their valuable insights at zero cost to the healthcare system.”

While one early user of SERMO wrote on KevinMD.com that he felt the site’s potential was overshadowed by the anonymous rants and complaining, other users have noted that the medical crowdsourcing site has, like Figure 1, directly benefited patients.

In an article on PhysiciansPractice.com, Richard Armstrong, M.D., cites the example of a family physician in Canada who posted a case of a young girl with an E. coli infection. “Physicians from around the world immediately began to comment and the recommendations resulted in a positive outcome for the patient. This instance offered cross-border learning experiences for the participating doctors, not only regarding the specific medical issue but also about how things are managed in different health systems,” wrote Dr. Armstrong.

Patients get proactive

While patients have long turned to social media to (questionably) crowdsource their medical queries, there are now more reputable sources than Facebook.

Tech entrepreneur Jared Heyman launched the health startup CrowdMed in 2013 after his sister endured a “terrible, undiagnosed medical condition that could have killed her,” he told the Wall Street Journal. She saw about 20 doctors over three years, racking up six-figure medical bills. The NIH Undiagnosed Diseases Program finally gave her a diagnosis: fragile X-associated primary ovarian insufficiency, a rare disease that affects just 1 in 15,000 women. A hormone patch resolved her debilitating symptoms….(More)”

Open Data for Developing Economies


Scan of the literature by Andrew Young, Stefaan Verhulst, and Juliet McMurren: “This edition of the GovLab Selected Readings was developed as part of the Open Data for Developing Economies research project (in collaboration with the Web Foundation, USAID and fhi360). Special thanks to Maurice McNaughton, Francois van Schalkwyk, Fernando Perini, Michael Canares and David Opoku for their input on an early draft. Please contact Stefaan Verhulst (stefaan@thegovlab.org) for any additional input or suggestions.

Open data is increasingly seen as a tool for economic and social development. Across sectors and regions, policymakers, NGOs, researchers and practitioners are exploring the potential of open data to improve government effectiveness, create new economic opportunity, empower citizens and solve public problems in developing economies. Open data for development does not exist in a vacuum – rather, it is a phenomenon that is relevant to and studied from different vantage points, including Data4Development (D4D), Open Government, the United Nations’ Sustainable Development Goals (SDGs), and Open Development. The selected readings below provide a view of the current research and practice on the use of open data for development and its relationship to related interventions.

Selected Reading List (in alphabetical order)

  • Open Data and Open Development…
  • Open Data and Developing Countries (National Case Studies)….(More)”

The Potential of M-health for Improved Data Use


IDS Evidence Report: “The Institute of Development Studies (IDS), in partnership with World Vision Indonesia, is exploring whether a recently implemented nutrition surveillance intervention, known as M-health, is being used to improve community-based data collection on nutrition.

The M-health mobile phone application has been integrated into the Indonesian national nutrition service delivery through the community-based health service called ‘posyandu’. Established in 1986, the posyandu is Indonesia’s main national community nutrition programme. It functions at the village level, enabling communities to access primary health care. The aim of the intervention is to reduce maternal, infant and child (under five) mortality rates. The posyandu involves five priority programmes: maternal and child health, which includes the ‘weighing post’ (growth monitoring); family planning; immunisation; nutrition, which includes nutrition counselling; and diarrhoea prevention and treatment.

The programme works by having the mobile phone application (M-health) automatically send a referral to health workers at the sub-district level in cases where a child does not meet the required growth targets. The application also provides the community-based health cadres with reminders and steps to accurately plan follow-up visits. These data are then sent to the community health centres at the sub-district level, known in Indonesia as the puskesmas.
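
As an illustration of the referral trigger just described, here is a minimal sketch in Python. The underweight threshold (a WHO weight-for-age z-score below -2) and all field names are assumptions made for the example, not World Vision’s actual implementation.

```python
from dataclasses import dataclass

# WHO growth standards express weight-for-age as a z-score; below -2 is
# conventionally classed as underweight. Threshold and field names are
# assumptions for illustration, not World Vision's implementation.
UNDERWEIGHT_Z = -2.0

@dataclass
class WeighingRecord:
    child_id: str
    village: str
    weight_for_age_z: float  # computed against WHO reference tables

def needs_referral(record: WeighingRecord) -> bool:
    """Flag a child whose growth monitoring result misses the target."""
    return record.weight_for_age_z < UNDERWEIGHT_Z

def route_referrals(records, send):
    """Forward each flagged record to the sub-district health centre
    (the puskesmas), mirroring the automatic referral described above."""
    for r in records:
        if needs_referral(r):
            send(to="puskesmas", child_id=r.child_id, village=r.village)
```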

In the period 2013–15, researchers at IDS worked with World Vision Indonesia to assess whether data produced through mobile phone technology might trigger faster response by nutrition stakeholders. This short report supports ongoing work and focuses on how posyandu-level data might be used by different stakeholders….(More)”

Matchmaker, matchmaker make me a mortgage: What policymakers can learn from dating websites


Angelina Carvalho, Chiranjit Chakraborty and Georgia Latsi at Bank Underground: “Policy makers have access to more and more detailed datasets. These can be joined together to give an unprecedentedly rich description of the economy. But the data are often noisy and individual entries are not uniquely identifiable. This leads to a trade-off: very strict matching criteria may result in a limited and biased sample; making them too loose risks inaccurate data. The problem gets worse when joining large datasets, as the potential number of matches increases exponentially. Even with today’s astonishing computer power, we need efficient techniques. In this post we describe a bipartite matching algorithm on such big data to deal with these issues. Similar algorithms are often used in online dating, which is closely modelled by the stable marriage problem.
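
For readers unfamiliar with the stable marriage problem, here is a minimal Gale-Shapley sketch, assuming complete preference lists on both sides; the ids and toy data are invented for illustration and are not the Bank’s actual matching code.

```python
def stable_match(proposer_prefs, reviewer_prefs):
    """Gale-Shapley stable matching. Each argument maps an id to an
    ordered preference list over ids on the other side (assumed
    complete). Returns a dict of reviewer -> matched proposer."""
    # Precompute each reviewer's ranking for O(1) offer comparisons.
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in reviewer_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next list index to try
    engaged_to = {}                               # reviewer -> proposer
    free = list(proposer_prefs)

    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if r not in engaged_to:
            engaged_to[r] = p                      # first offer is accepted
        elif rank[r][p] < rank[r][engaged_to[r]]:
            free.append(engaged_to[r])             # reviewer trades up
            engaged_to[r] = p
        else:
            free.append(p)                         # rejected; try next choice

    return engaged_to

# Toy example: two new loans matched against two candidate properties.
movers = {"loanA": ["prop1", "prop2"], "loanB": ["prop1", "prop2"]}
props = {"prop1": ["loanB", "loanA"], "prop2": ["loanA", "loanB"]}
print(stable_match(movers, props))  # {'prop1': 'loanB', 'prop2': 'loanA'}
```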

The home-mover problem

The housing market matters and affects almost everything that central banks care about. We want to know why, when and how people move home. And a lot do move: one in nine UK households in 2013/4 according to the Office for National Statistics (ONS). Fortunately, it is also a market that we have an increasing amount of information about. We are going to illustrate the use of the matching algorithm in the context of identifying the characteristics of these movers and the mortgages that many of them took out.

A Potential Solution

The FCA’s Product Sales Data (PSD) on owner-occupied mortgage lending contains loan-level product, borrower and property characteristics for all loans originated in the UK since Q2 2005. This dataset captures the attributes of each loan at the point of origination but does not follow the borrowers afterwards. Hence, it does not meaningfully capture whether a loan was transferred to another property or closed for some reason. Also, there is no unique borrower identifier, which is why we cannot easily monitor whether a borrower repaid their old mortgage and got a new one against another property.

However, the dataset identifies whether a borrower is a first-time buyer or a home-mover, together with other information. Even though we do not have information before 2005, we can still try to use this dataset to identify some of the owners’ moving patterns. We try to find where a home-mover may have moved from (origination point) and who moved into the vacated property. If we can successfully track the movers, it will also help us to remove the corresponding old mortgages when calculating the stock of mortgages from our flow data. A previous Bank Underground post showed how probabilistic record linkage techniques can be used to join related datasets that do not have unique common identifiers. We have used bipartite graph matching techniques here to extend those ideas….(More)”
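
To make the link between record linkage and matching concrete, the sketch below scores candidate mover-to-origination pairs and turns the scores into the preference lists a stable-matching routine like the one above would consume. Every field name, weight and threshold is an invented assumption, not the Bank’s actual criteria; note too that with the incomplete lists produced here, the earlier sketch would need a bounds check, since some movers may legitimately stay unmatched.

```python
def match_score(mover, candidate):
    """Plausibility that `mover` (a home-mover's new loan) continues
    `candidate` (an earlier loan by the presumed same borrower). All
    field names and weights are invented for illustration."""
    score = 0.0
    if mover["borrower_dob"] == candidate["borrower_dob"]:
        score += 0.5
    if mover["previous_postcode"] == candidate["property_postcode"]:
        score += 0.3
    # Income should not jump implausibly between the two records.
    ratio = mover["income"] / max(candidate["income"], 1)
    if 0.8 <= ratio <= 2.0:
        score += 0.2
    return score

def build_preferences(movers, candidates, threshold=0.5):
    """Turn pairwise scores into ordered preference lists; pairs below
    the threshold are dropped, so weak matches never enter the match."""
    prefs = {}
    for m_id, m in movers.items():
        scored = [(match_score(m, c), c_id) for c_id, c in candidates.items()]
        kept = sorted((s, c_id) for s, c_id in scored if s >= threshold)
        prefs[m_id] = [c_id for _, c_id in reversed(kept)]  # best first
    return prefs
```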