Data for Policy: Data Science and Big Data in the Public Sector


Innar Liiv at OXPOL: “How can big data and data science help policy-making? This question has recently gained increasing attention. Both the European Commission and the White House have endorsed the use of data for evidence-based policy making.

Still, a gap remains between theory and practice. In this blog post, I make a number of recommendations for systematic development paths.

RESEARCH TRENDS SHAPING DATA FOR POLICY

‘Data for policy’ as an academic field is still in its infancy. A typology of the field’s foci and research areas is summarised in the figure below.

 

[Figure: a typology of ‘data for policy’ foci and research areas]

 

Besides the ‘data for policy’ community, there are two important research trends shaping the field: 1) computational social science; and 2) the emergence of politicised social bots.

Computational social science (CSS) is a new interdisciplinary research trend in social science, which tries to transform advances in big data and data science into research methodologies for understanding, explaining and predicting underlying social phenomena.

Social science has a long tradition of using computational and agent-based modelling approaches (e.g. Schelling’s Model of Segregation), but the new challenge is to feed real-life, and sometimes even real-time, information into those systems to gain rapid insights into the validity of research hypotheses.
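To make the modelling tradition concrete, here is a minimal sketch of a Schelling-style segregation model in Python. The grid size, empty-cell ratio, tolerance threshold and movement rule are illustrative assumptions chosen for brevity, not parameters from Schelling’s work:

```python
import random

# Minimal Schelling-style segregation model: agents of two types move to
# random empty cells until enough of their neighbours are of the same type.
SIZE, EMPTY_RATIO, THRESHOLD, STEPS = 20, 0.2, 0.5, 50

def make_grid():
    empties = int(SIZE * SIZE * EMPTY_RATIO)
    rest = SIZE * SIZE - empties
    cells = [0] * empties + [1] * (rest // 2) + [2] * (rest - rest // 2)
    random.shuffle(cells)                        # 0 = empty, 1/2 = agent types
    return [cells[i * SIZE:(i + 1) * SIZE] for i in range(SIZE)]

def unhappy(grid, r, c):
    """An agent is unhappy if too few occupied neighbours share its type."""
    me = grid[r][c]
    neigh = [grid[(r + dr) % SIZE][(c + dc) % SIZE]
             for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    occupied = sum(n != 0 for n in neigh)
    return occupied > 0 and sum(n == me for n in neigh) / occupied < THRESHOLD

def step(grid):
    movers = [(r, c) for r in range(SIZE) for c in range(SIZE)
              if grid[r][c] != 0 and unhappy(grid, r, c)]
    empties = [(r, c) for r in range(SIZE) for c in range(SIZE) if grid[r][c] == 0]
    random.shuffle(empties)
    for (r, c), (er, ec) in zip(movers, empties):  # relocate unhappy agents
        grid[er][ec], grid[r][c] = grid[r][c], 0

grid = make_grid()
for _ in range(STEPS):
    step(grid)
```

The CSS twist described above is to initialise and validate such a model with real-life (even real-time) data rather than random placements.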

For example, one could use mobile phone call records to assess the acculturation processes of different communities. Such a project would involve translating different acculturation theories into computational models, researching the ethical and legal issues inherent in using mobile phone data, and developing a vision for generating policy recommendations and new research hypotheses from the analysis.

Politicised social bots are also beginning to make their mark. In 2011, DARPA solicited research proposals dealing with social media in strategic communication. The term ‘political bot’ was not used, but the expected results left no doubt about the goals…

The next wave of e-government innovation will be about analytics and predictive models.  Taking advantage of their potential for social impact will require a solid foundation of e-government infrastructure.

The most important questions going forward are as follows:

  • What are the relevant new data sources?
  • How can we use them?
  • What should we do with the information? Who cares? Which political decisions need faster information from novel sources? Do we need faster information? Does it come with unanticipated risks?

These questions barely scratch the surface, because the complex interplay between general advances in computational social science and satellite topics like political bots will have an enormous impact on research and on the use of data for policy. But it’s an important start….(More)”

Crowdsourced map of safe drinking water


Springwise: “Just over two years ago, in April 2014, city officials in Flint, Michigan decided to save costs by switching the city’s water supply from Lake Huron to the Flint River. Because of the switch, residents of the town and their children were exposed to dangerous levels of lead. Much of the population suffered from the side effects of lead poisoning, including skin lesions, hair loss, depression, anxiety and, in severe cases, permanent brain damage. Media attention, although focussed at first, inevitably died down. To avoid future similar disasters, Sean Montgomery, a neuroscientist and the CEO of technology company Connected Future Labs, set up CitizenSpring.

CitizenSpring is an app which enables individuals to test their water supply using readily available water testing kits. Users hold a test strip under running water, hold the strip up to a smartphone camera and press a button. The app then reveals the results of the test, cataloguing them and storing them in the cloud in the form of a digital map. Using what Montgomery describes as “computer vision,” the app is able to detect lead levels in a given water source and confirm whether they exceed the Environmental Protection Agency’s “safe” threshold. The idea is that communities can inform themselves about their own and nearby water supplies so that they can act as guardians of their own health. “It’s an impoverished data problem,” says Montgomery. “We don’t have enough data. By sharing the results of test[s], people can, say, find out if they’re testing a faucet that hasn’t been tested before.”
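The article does not describe CitizenSpring’s actual algorithm, but the general idea of colorimetric strip reading can be sketched. Everything below (the calibration points, the crop box, the red-channel heuristic) is an invented illustration rather than the app’s method; the only sourced figure is the EPA’s action level for lead of 15 parts per billion:

```python
from PIL import Image, ImageStat  # pip install Pillow

EPA_ACTION_LEVEL_PPB = 15.0  # EPA action level for lead in drinking water

# Hypothetical calibration: (mean red-channel value of the reaction pad,
# lead concentration in ppb). Real kits ship their own colour charts.
CALIBRATION = [(250, 0.0), (220, 5.0), (190, 15.0), (150, 50.0)]

def estimate_lead_ppb(image_path, pad_box):
    """Estimate lead concentration from a photo of a test strip.

    pad_box is the (left, upper, right, lower) pixel box of the strip's
    reaction pad within the photo.
    """
    pad = Image.open(image_path).convert("RGB").crop(pad_box)
    red_mean = ImageStat.Stat(pad).mean[0]  # average red-channel value
    # Piecewise-linear interpolation over the calibration points.
    for (r_hi, ppb_lo), (r_lo, ppb_hi) in zip(CALIBRATION, CALIBRATION[1:]):
        if r_lo <= red_mean <= r_hi:
            frac = (r_hi - red_mean) / (r_hi - r_lo)
            return ppb_lo + frac * (ppb_hi - ppb_lo)
    return CALIBRATION[-1][1] if red_mean < CALIBRATION[-1][0] else 0.0

# Hypothetical usage: the file name and crop box are placeholders.
ppb = estimate_lead_ppb("strip.jpg", (100, 200, 160, 260))
status = "above" if ppb > EPA_ACTION_LEVEL_PPB else "within"
print(f"Estimated ~{ppb:.1f} ppb, {status} the EPA action level")
```

In a real app, each per-test estimate would be geotagged and pushed to the cloud, which is what turns isolated readings into the shared map the article describes.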

CitizenSpring narrowly missed its funding target on Kickstarter. However, collective monitoring can work. We have already seen the power of communities harnessed to crowdsource pollution data in the EU and map conflict zones through user-submitted camera footage….(More)”

Driving government transformation through design thinking


Michael McHugh at Federal Times: “According to Gartner, “Design thinking is a multidisciplinary process that builds solutions for complex, intractable problems in a technically feasible, commercially sustainable and emotionally meaningful way.”

Design thinking as an approach puts the focus on people — their likes, dislikes, desires and experience — for designing new services and products. It encourages a free flow of ideas within a team to build and test prototypes by setting a high tolerance for failure. The approach is more holistic, as it considers both human and technological aspects to cater to mission-critical needs. Due to its innovative and agile problem-solving technique, design thinking inspires teams to collaborate and contribute towards driving mission goals.

How Can Design Thinking Help Agencies?

Whether it is problem solving, streamlining a process or increasing the adoption rate of a new service, design thinking calls for agencies to be empathetic towards people’s needs while remaining open to continuous learning and willing to fail — fast. A fail-fast model enables agencies to detect errors in the course of finding a solution, learn from those mistakes, and then develop a more suitable solution that is likely to add value to the user.

Consider an example of a federal agency whose legacy inspection application was affecting the productivity of its inspectors. By leveraging an agile approach, the agency built a mobile inspection solution to streamline and automate the inspection process. The methodology involved multiple iterations based on observations and findings from inspector actions. Here is a step-by-step synopsis of this methodology:

  • Problem presentation: Identifying the problems faced by inspectors.
  • Empathize with users: Understanding the needs and challenges of inspectors.
  • Define the problem: Redefining the problem based on input from inspectors.
  • Team collaboration: Brainstorming and discussing multiple solutions.
  • Prototype creation: Determining and building viable design solutions.
  • Testing with constituents: Releasing the prototype and testing it with inspectors.
  • Collection of feedback: Incorporating feedback from pilot testing and making required changes.

The insights drawn from each step helped the agency design a secure platform in the form of a mobile inspection tool, optimized for tablets with a smartphone companion app for enhanced mobility. Packed with features like rich media capture with video, speech-to-text and photographs, the mobile inspection tool dramatically reduces manual labor and speeds up the on-site inspection process. It delivers significant efficiencies by improving processes, increasing productivity and enhancing the visibility of information. Additionally, its integration with legacy systems helps leverage existing investments, thereby justifying the innovation, which is based on a tightly defined test-and-learn cycle….(More)”

The SAGE Handbook of Digital Journalism


Book edited by Tamara Witschge, C. W. Anderson, David Domingo, and Alfred Hermida: “The production and consumption of news in the digital era is blurring the boundaries between professionals, citizens and activists. Actors producing information are multiplying, but media companies still hold a central position. Journalism research faces important challenges to capture, examine, and understand the current news environment. The SAGE Handbook of Digital Journalism starts from the pressing need for a thorough and bold debate to redefine the assumptions of research in the changing field of journalism. The 38 chapters, written by a team of global experts, are organised into four key areas:

Section A: Changing Contexts

Section B: News Practices in the Digital Era

Section C: Conceptualizations of Journalism

Section D: Research Strategies

By addressing both institutional and non-institutional news production and providing ample attention to the question ‘who is a journalist?’ and the changing practices of news audiences in the digital era, this Handbook shapes the field and defines the roadmap for the research challenges that scholars will face in the coming decades….(More)”

Data and Democracy


(Free) book by Andrew Therriault:  “The 2016 US elections will be remembered for many things, but for those who work in politics, 2016 may be best remembered as the year that the use of data in politics reached its maturity. Through a collection of essays from leading experts in the field, this report explores how political data science helps to drive everything from overall strategy and messaging to individual voter contacts and advertising.

Curated by Andrew Therriault, former Director of Data Science for the Democratic National Committee, this illuminating report includes first-hand accounts from Democrats, Republicans, and members of the media. Tech-savvy readers will get a comprehensive account of how data analysis has prevailed over political instinct and experience, along with examples of the challenges these practitioners face.

Essays include:

  • The Role of Data in Campaigns—Andrew Therriault, former Director of Data Science for the Democratic National Committee
  • Essentials of Modeling and Microtargeting—Dan Castleman, cofounder and Director of Analytics at Clarity Campaign Labs, a leading modeler in Democratic politics
  • Data Management for Political Campaigns—Audra Grassia, Deputy Political Director for the Democratic Governors Association in 2014
  • How Technology Is Changing the Polling Industry—Patrick Ruffini, cofounder of Echelon Insights and Founder/Chairman of Engage, was a digital strategist for President Bush in 2004 and for the Republican National Committee in 2006
  • Data-Driven Media Optimization—Alex Lundry, cofounder and Chief Data Scientist at Deep Root Analytics, a leading expert on media and voter analytics, electoral targeting, and political data mining
  • How (and Why) to Follow the Money in Politics—Derek Willis, ProPublica’s news applications developer, formerly with The New York Times
  • Digital Advertising in the Post-Obama Era—Daniel Scarvalone, Associate Director of Research and Data at Bully Pulpit Interactive (BPI), a digital marketer for the Democratic party
  • Election Forecasting in the Media—Natalie Jackson, Senior Polling Editor at The Huffington Post…(More)”

Managing Federal Information as a Strategic Resource


White House: “Today the Office of Management and Budget (OMB) is releasing an update to the Federal Government’s governing document for the management of Federal information resources: Circular A-130, Managing Information as a Strategic Resource.

The way we manage information technology (IT), security, data governance, and privacy has rapidly evolved since A-130 was last updated in 2000.  In today’s digital world, we are creating and collecting large volumes of data to carry out the Federal Government’s various missions to serve the American people.  This data is duplicated, stored, processed, analyzed, and transferred with ease.  As government continues to digitize, we must manage data not only to keep it secure, but also to harness it to provide the best possible service to our citizens.

Today’s update to Circular A-130 gathers in one resource a wide range of policy updates for Federal agencies regarding cybersecurity, information governance, privacy, records management, open data, and acquisitions.  It also establishes general policy for IT planning and budgeting through governance, acquisition, and management of Federal information, personnel, equipment, funds, IT resources, and supporting infrastructure and services.  In particular, A-130 focuses on three key elements to help spur innovation throughout the government:

  • Real Time Knowledge of the Environment.  In today’s rapidly changing environment, threats and technology are evolving at previously unimagined speeds.  In such a setting, the Government cannot afford to authorize a system and not look at it again for years at a time.  In order to keep pace, we must move away from periodic, compliance-driven assessment exercises and, instead, continuously assess our systems and build in security and privacy with every update and re-design.  Throughout the Circular, we make clear the shift away from check-list exercises and toward the ongoing monitoring, assessment, and evaluation of Federal information resources.
  • Proactive Risk Management.  To keep pace with the needs of citizens, we must constantly innovate.  As part of such efforts, however, the Federal Government must modernize the way it identifies, categorizes, and handles risk to ensure both privacy and security.  Significant increases in the volume of data processed and utilized by Federal resources require new ways of storing, transferring, and managing it.  Circular A-130 emphasizes the need for strong data governance that encourages agencies to proactively identify risks, determine practical and implementable solutions to address said risks, and implement and continually test the solutions.  This repeated testing of agency solutions will help to proactively identify additional risks, starting the process anew.
  • Shared Responsibility.  Citizens are connecting with each other in ways never before imagined.  From social media to email, the connectivity we have with one another can lead to tremendous advances.  The updated A-130 helps to ensure everyone remains responsible and accountable for assuring privacy and security of information – from managers to employees to citizens interacting with government services….(More)”

The ‘who’ and ‘what’ of #diabetes on Twitter


Mariano Beguerisse-Díaz, Amy K. McLennan, Guillermo Garduño-Hernández, Mauricio Barahona, and Stanley J. Ulijaszek at arXiv: “Social media are being increasingly used for health promotion. Yet the landscape of users and messages in such public fora is not well understood. So far, studies have typically focused either on people suffering from a disease, or on agencies that address it, but have not looked more broadly at all the participants in the debate and discussions. We study the conversation about diabetes on Twitter through the systematic analysis of a large collection of tweets containing the term ‘diabetes’, as well as the interactions between their authors. We address three questions: (1) what themes arise in these messages?; (2) who talks about diabetes and in what capacity?; and (3) which type of users contribute to which themes? To answer these questions, we employ a mixed-methods approach, using techniques from anthropology, network science and information retrieval. We find that diabetes-related tweets fall within broad thematic groups: health information, news, social interaction, and commercial. Humorous messages and messages with references to popular culture appear constantly over time, more than any other type of tweet in this corpus. Top ‘authorities’ are found consistently across time and comprise bloggers, advocacy groups and NGOs related to diabetes, as well as stockmarket-listed companies with no specific diabetes expertise. These authorities fall into seven interest communities in their Twitter follower network. In contrast, the landscape of ‘hubs’ is diffuse and fluid over time. We discuss the implications of our findings for public health professionals and policy makers. Our methods are generally applicable to investigations where similar data are available….(More)”
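The abstract’s language of ‘hubs’ and ‘authorities’ echoes Kleinberg’s HITS algorithm. The paper’s exact pipeline is not spelled out in this excerpt, so the following is only a minimal sketch of how such scores can be computed on a toy, invented follower network:

```python
import networkx as nx  # pip install networkx

# Toy directed follower network; an edge u -> v means "u follows v".
# All account names and edges are invented for illustration.
G = nx.DiGraph()
G.add_edges_from([
    ("patient_blog", "diabetes_ngo"),
    ("patient_blog", "advocacy_group"),
    ("news_outlet", "diabetes_ngo"),
    ("pharma_co", "advocacy_group"),
    ("diabetes_ngo", "advocacy_group"),
])

# HITS: authorities are accounts followed by good hubs;
# hubs are accounts that follow good authorities.
hubs, authorities = nx.hits(G, max_iter=1000)

for name, score in sorted(authorities.items(), key=lambda kv: -kv[1])[:3]:
    print(f"authority {name}: {score:.3f}")
```

On the real data, stable top authorities (bloggers, advocacy groups, NGOs) alongside a diffuse, fluid hub landscape would show up as exactly the pattern the authors report.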

Countries with strong public service media have less rightwing extremism


Tara Conlan in The Guardian: “Countries that have popular, well-funded public service broadcasters encounter less rightwing extremism and corruption and have more press freedom, a report from the European Broadcasting Union has found.

For the first time, an analysis has been done of the contribution of public service media, such as the BBC, to democracy and society.

Following Brexit and the rise in rightwing extremism across Europe, the report shows the impact that strong publicly funded television and radio have had on voter turnout, control of corruption and press freedom.

The EBU, which founded Eurovision, carried out the study across 25 countries after noticing that the more well-funded a country’s public service outlets were, the less likely the nation was to endure extremism.

The report says that in “countries where public service media funding … is higher there tends to be more press freedom” and where they have a higher market share “there also tends to be a higher voter turnout”. It also says there is a strong correlation between how much of a country’s market its public service broadcaster has and the “demand for rightwing extremism” and “control of corruption”.

“These correlations are especially interesting given the current public debates about low participation in elections, corruption and the rise of far right politics across Europe,” said EBU head of media intelligence service Roberto Suárez Candel, who conducted the research….(More)”
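The article does not publish the underlying figures, but the kind of correlation the EBU reports can be illustrated with a toy computation. The numbers below are invented for demonstration only:

```python
from scipy.stats import pearsonr  # pip install scipy

# Invented illustrative figures, NOT the EBU's data: public service media
# market share (%) and a 0-100 press-freedom score for five countries.
psm_market_share = [12.0, 21.5, 28.0, 33.5, 40.0]
press_freedom    = [55.0, 62.0, 71.0, 78.0, 86.0]

r, p_value = pearsonr(psm_market_share, press_freedom)
print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")
```

A strong positive r on real data would mirror the report’s finding that better-funded, higher-share public service media coincide with more press freedom; these are correlations, not proof of causation.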

See also: PSM Correlations Report and Trust in Media 2016

Why Zika, Malaria and Ebola should fear analytics


Frédéric Pivetta at Real Impact Analytics: “Big data is a hot business topic. It turns out to be an equally hot topic for the non-profit sector now that we know the vital role analytics can play in addressing public health issues and reaching sustainable development goals.

Big players like IBM just announced they will help fight Zika by analyzing social media, transportation and weather data, among other indicators. Telecom data takes it further by helping to predict the spread of disease, identifying isolated and fragile communities and prioritizing the actions of aid workers.

The power of telecom data

Human mobility contributes significantly to epidemic transmission into new regions. However, there are gaps in understanding human mobility due to the limited and often outdated data available from travel records. In some countries, these are collected by health officials in hospitals or through occasional surveys.

Telecom data, constantly updated and covering a large portion of the population, is rich in terms of mobility insights. But there are other benefits:

  • It is recorded automatically (in the Call Detail Records, or CDRs), so we avoid data collection and response bias.
  • It contains localization and time information, which is great for understanding human mobility.
  • It contains information on connectivity between people, which helps in understanding social networks.
  • It contains information on phone spending, which allows tracking of socio-economic indicators.

Aggregated and anonymized, mobile telecom data fills the public data gap without raising privacy issues. Mixing it with other public data sources yields a very precise and reliable view of human mobility patterns, which is key for preventing epidemic spread.

Using telecom data to map epidemic risk flows

So how does it work? As in any other big data application, the challenge is to build the right predictive model, allowing decision-makers to take the most appropriate actions. In the case of epidemic transmission, the methodology typically includes five steps:

  • Identify mobility patterns relevant to each particular disease: for example, short-term trips for fast-spreading diseases like Ebola, or overnight trips for diseases like Malaria, which is spread by mosquitoes that are active only at night. Such patterns can be deduced from the CDRs: we can find the home location of each user by looking at their most active night-time tower, and then track calls to identify short- or long-term trips. Aggregating data by origin-destination pair is useful as we look at intercity or interregional transmission flows, and it protects the privacy of individuals, as no one can be singled out from the aggregated data (see the sketch after this list).
  • Get data on epidemic incidence, typically from local organisations like national healthcare systems or, in case of emergency, from NGOs or dedicated emergency teams. This data should be aggregated at the same level of granularity as the CDRs.
  • Knowing how many travelers go from one place to another and for how long, plus the disease incidence at origin and destination, build an epidemiological model that can account for the mode and speed of transmission of the particular disease.
  • With an import/export scoring model, map epidemic risk flows and flag areas that are at risk of becoming the new hotspots because of human travel.
  • On that basis, prioritize and monitor public health measures, focusing on restraining mobility to and from hotspots. Mapping risk also allows prevention campaigns to be launched in the right places and the necessary infrastructure to be set up on time. Ultimately, the tool reduces public health risks and helps stem the epidemic.
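As a rough illustration of the first step above, here is a minimal sketch of inferring home towers and aggregating overnight origin-destination (OD) flows from CDRs. The schema, field names and night-hour window are assumptions for demonstration; real pipelines process billions of records with far more careful preprocessing:

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical, simplified CDR rows: (user_id, tower_id, timestamp).
cdrs = [
    ("u1", "tower_A", datetime(2016, 3, 1, 23, 10)),
    ("u1", "tower_A", datetime(2016, 3, 2, 1, 40)),
    ("u1", "tower_B", datetime(2016, 3, 5, 23, 5)),  # a night spent away
    ("u2", "tower_C", datetime(2016, 3, 1, 22, 30)),
]

NIGHT_HOURS = set(range(22, 24)) | set(range(0, 6))  # 22:00-06:00

def home_towers(records):
    """Home tower = the tower with the most night-time events per user."""
    counts = defaultdict(Counter)
    for user, tower, ts in records:
        if ts.hour in NIGHT_HOURS:
            counts[user][tower] += 1
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}

def od_flows(records, homes):
    """Count user-nights observed at a tower other than the user's home."""
    flows = Counter()
    for user, tower, ts in records:
        home = homes.get(user)
        if home and tower != home and ts.hour in NIGHT_HOURS:
            flows[(home, tower)] += 1  # aggregated: no individual singled out
    return flows

homes = home_towers(cdrs)
print(od_flows(cdrs, homes))  # Counter({('tower_A', 'tower_B'): 1})
```

Overnight flows like these would feed the Malaria-style model in the third step; for Ebola-style diseases one would count short trips instead.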

That kind of application works in a variety of epidemiological contexts, including Zika, Ebola, Malaria, Influenza and Tuberculosis. No doubt the global boom of mobile data will prove extraordinarily helpful in fighting these fierce enemies….(More)”