NZ to perform urgent algorithm ‘stocktake’ fearing data misuse within government


Asha McLean at ZDNet: “The New Zealand government has announced it will be assessing how government agencies are using algorithms to analyse data, hoping to ensure transparency and fairness in decisions that affect citizens.

A joint statement from Minister for Government Digital Services Clare Curran and Minister of Statistics James Shaw said the algorithm “stocktake” will be conducted with urgency, but cited only the growing interest in data analytics as the reason for the probe.

“The government is acutely aware of the need to ensure transparency and accountability as interest grows regarding the challenges and opportunities associated with emerging technology such as artificial intelligence,” Curran said.

It was revealed in April that Immigration New Zealand may have been using citizen data for less than desirable purposes, with claims that data collected through the country’s visa application process, ostensibly used to identify people in breach of their visa conditions, was in fact filtering people based on their age, gender, and ethnicity.

Rejecting the idea that the data-collection project was racial profiling, Immigration Minister Iain Lees-Galloway told Radio New Zealand that Immigration looks at a range of issues, including people who have made, and have had rejected, multiple visa applications.

“It looks at people who place the greatest burden on the health system, people who place the greatest burden on the criminal justice system, and uses that data to prioritise those people,” he said.

“It is important that we protect the integrity of our immigration system and that we use the resources that immigration has as effectively as we can — I do support them using good data to make good decisions about where best to deploy their resources.”

In the statement on Wednesday, Shaw pointed to two further data-modelling projects the government had embarked on, with one from the Ministry of Health looking into the probability of five-year post-transplant survival in New Zealand.

“Using existing data to help model possible outcomes is an important part of modern government decision-making,” Shaw said….(More)”.

The Unlinkable Data Challenge: Advancing Methods in Differential Privacy


National Institute of Standards and Technology: “Databases across the country include information with potentially important research implications and uses, e.g. contingency planning in disaster scenarios, identifying safety risks in aviation, assisting in tracking contagious diseases, and identifying patterns of violence in local communities.  However, these datasets include personally identifiable information (PII), and it is not enough to simply remove PII from them.  It is well known that records in a dataset, in combination with auxiliary and possibly completely unrelated datasets, can be linked back to uniquely identifiable individuals (known as a linkage attack).  Today’s efforts to remove PII do not provide adequate protection against linkage attacks. With the advent of “big data” and technological advances in linking data, there are far too many other possible data sources related to each of us that can lead to our identity being uncovered.
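The linkage risk described here can be illustrated with a toy sketch (all names, records, and field choices below are fabricated for illustration): even after names are stripped, quasi-identifiers such as ZIP code, birth year, and gender can be joined against a public auxiliary dataset to re-identify individuals.

```python
# Toy illustration of a linkage attack. Names are removed from the
# "de-identified" health records, but quasi-identifiers (ZIP code,
# birth year, gender) remain and can be joined against a public
# auxiliary dataset such as a voter roll. All data here is invented.

health_records = [  # PII (names) already stripped
    {"zip": "10001", "birth_year": 1975, "gender": "F", "diagnosis": "asthma"},
    {"zip": "10001", "birth_year": 1982, "gender": "M", "diagnosis": "diabetes"},
]

voter_roll = [  # public auxiliary data, seemingly unrelated to health
    {"name": "Alice Example", "zip": "10001", "birth_year": 1975, "gender": "F"},
    {"name": "Bob Example", "zip": "10001", "birth_year": 1982, "gender": "M"},
]

def link(records, auxiliary, keys=("zip", "birth_year", "gender")):
    """Re-identify records whose quasi-identifiers match exactly one auxiliary row."""
    reidentified = []
    for rec in records:
        matches = [a for a in auxiliary if all(a[k] == rec[k] for k in keys)]
        if len(matches) == 1:  # a unique match reveals the identity
            reidentified.append({"name": matches[0]["name"], **rec})
    return reidentified

print(link(health_records, voter_roll))  # both "anonymous" patients are named
```

Because the join uses only fields that routinely survive PII removal, simply deleting the name column offers no protection once a single auxiliary dataset with the same quasi-identifiers is available.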

Get Involved – How to Participate

The Unlinkable Data Challenge is a multi-stage Challenge.  This first stage of the Challenge is intended to source detailed concepts for new approaches, inform the final design in the two subsequent stages, and provide recommendations for matching stage 1 competitors into teams for subsequent stages.  Teams will predict and justify where their algorithm fails with respect to the utility-privacy frontier curve.

In this stage, competitors are asked to propose how to de-identify a dataset using less than the available privacy budget, while also maintaining the dataset’s utility for analysis.  For example, the de-identified data, when put through the same analysis pipeline as the original dataset, produces comparable results (i.e. similar coefficients in a linear regression model, or a classifier that produces similar predictions on sub-samples of the data).
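One standard building block for this kind of budgeted release (a common differential-privacy mechanism, not the challenge's prescribed method) is the Laplace mechanism: a counting query has sensitivity 1, so adding Laplace noise with scale 1/ε makes the released count ε-differentially private, and each released statistic spends part of the total budget. A minimal sketch:

```python
import random

def dp_count(data, predicate, epsilon):
    """Release a counting query with epsilon-differential privacy.

    A count has sensitivity 1 (adding or removing one person changes it
    by at most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for x in data if predicate(x))
    # A Laplace(1/epsilon) sample is the difference of two Exp(epsilon) samples.
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

# Toy dataset: ages 0..99. Each query spends part of the privacy budget.
ages = list(range(100))
over_65 = dp_count(ages, lambda a: a >= 65, epsilon=0.5)  # noisy estimate of 35
```

Smaller ε means more noise, so more privacy but less utility; the challenge asks competitors to keep total spending under the available budget while statistics like this one remain close enough to their true values for downstream analysis.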

This stage of the Challenge seeks Conceptual Solutions that describe how to use and/or combine methods in differential privacy to mitigate privacy loss when publicly releasing datasets in a variety of industries such as public safety, law enforcement, healthcare/biomedical research, education, and finance.  We are limiting the scope to addressing research questions and methodologies that require regression, classification, and clustering analysis on datasets that contain numerical, geo-spatial, and categorical data.

To compete in this stage, we are asking that you propose a new algorithm utilizing existing or new randomized mechanisms with a justification of how this will optimize privacy and utility across different analysis types.  We are also asking you to propose a dataset that you believe would make a good use case for your proposed algorithm, and provide a means of comparing your algorithm and other algorithms.

All submissions must be made using the submission form provided on HeroX website….(More)“.

Big Data against Child Obesity


European Commission: “Childhood and adolescent obesity is a major global and European public health problem. Currently, public actions are detached from local needs, mostly including indiscriminate blanket policies and single-element strategies, limiting their efficacy and effectiveness. The need for community-targeted actions has long been obvious, but the lack of a monitoring and evaluation framework and the methodological inability to objectively quantify local community characteristics in a reasonable timeframe have hindered this.

Graph showing BigO policy planner

Big Data based Platform

Technological achievements in mobile and wearable electronics and Big Data infrastructures allow the engagement of European citizens in the data collection process, enabling policies to be reshaped at a regional, national and European level. In BigO, this will be facilitated through the development of a platform that quantifies behavioural community patterns through Big Data provided by wearables and eHealth devices.

Estimate child obesity through community data

BigO has set detailed scientific, technological, validation and business objectives in order to build a system that collects Big Data on children’s behaviour and helps plan health policies against obesity. In addition, during the project, BigO will reach out to more than 25,000 school-aged children and age-matched obese children and adolescents as sources of community data. Comprehensive models of the obesity prevalence dependence matrix will be created, allowing data-driven predictions of the effectiveness of specific policies on a community and real-time monitoring of the population response, supported by powerful real-time data visualisations….(More)

Data Governance in the Digital Age


Centre for International Governance Innovation: “Data is being hailed as “the new oil.” The analogy seems appropriate given the growing amount of data being collected, and the advances made in its gathering, storage, manipulation and use for commercial, social and political purposes.

Big data and its application in artificial intelligence, for example, promises to transform the way we live and work — and will generate considerable wealth in the process. But data’s transformative nature also raises important questions around how the benefits are shared, privacy, public security, openness and democracy, and the institutions that will govern the data revolution.

The delicate interplay between these considerations means that they have to be treated jointly, and at every level of the governance process, from local communities to the international arena. This series of essays by leading scholars and practitioners, which is also published as a special report, will explore topics including the rationale for a data strategy, the role of a data strategy for Canadian industries, and policy considerations for domestic and international data governance…

RATIONALE OF A DATA STRATEGY

THE ROLE OF A DATA STRATEGY FOR CANADIAN INDUSTRIES

BALANCING PRIVACY AND COMMERCIAL VALUES

DOMESTIC POLICY FOR DATA GOVERNANCE

INTERNATIONAL POLICY CONSIDERATIONS

EPILOGUE

Ten Reasons Not to Measure Impact—and What to Do Instead


Essay by Mary Kay Gugerty & Dean Karlan in the Stanford Social Innovation Review: “Good impact evaluations—those that answer policy-relevant questions with rigor—have improved development knowledge, policy, and practice. For example, the NGO Living Goods conducted a rigorous evaluation to measure the impact of its community health model based on door-to-door sales and promotions. The evidence of impact was strong: Their model generated a 27-percent reduction in child mortality. This evidence subsequently persuaded policy makers, replication partners, and major funders to support the rapid expansion of Living Goods’ reach to five million people. Meanwhile, rigorous evidence continues to further validate the model and help to make it work even better.

Of course, not all rigorous research offers such quick and rosy results. Consider the many studies required to discover a successful drug and the lengthy process of seeking regulatory approval and adoption by the healthcare system. The same holds true for fighting poverty: Innovations for Poverty Action (IPA), a research and policy nonprofit that promotes impact evaluations for finding solutions to global poverty, has conducted more than 650 randomized controlled trials (RCTs) since its inception in 2002. These studies have sometimes provided evidence about how best to use scarce resources (e.g., give away bed nets for free to fight malaria), as well as how to avoid wasting them (e.g., don’t expand traditional microcredit). But the vast majority of studies did not paint a clear picture that led to immediate policy changes. Developing an evidence base is more like building a mosaic: Each individual piece does not make the picture, but bit by bit a picture becomes clearer and clearer.

How do these investments in evidence pay off? IPA estimated the benefits of its research by looking at its return on investment—the ratio of the benefit from the scale-up of the demonstrated large-scale successes divided by the total costs since IPA’s founding. The ratio was 74x—a huge result. But this is far from a precise measure of impact, since IPA cannot establish what would have happened had IPA never existed. (Yes, IPA recognizes the irony of advocating for RCTs while being unable to subject its own operations to that standard. Yet IPA’s approach is intellectually consistent: Many questions and circumstances do not call for RCTs.)

Even so, a simple thought exercise helps to demonstrate the potential payoff. IPA never works alone—all evaluations and policy engagements are conducted in partnership with academics and implementing organizations, and increasingly with governments. Moving from an idea to the research phase to policy takes multiple steps and actors, often over many years. But even if IPA deserves only 10 percent of the credit for the policy changes behind the benefits calculated above, the ratio of benefits to costs is still 7.4x. That is a solid return on investment.

Despite the demonstrated value of high-quality impact evaluations, a great deal of money and time has been wasted on poorly designed, poorly implemented, and poorly conceived impact evaluations. Perhaps some studies had too small of a sample or paid insufficient attention to establishing causality and quality data, and hence any results should be ignored; others perhaps failed to engage stakeholders appropriately, and as a consequence useful results were never put to use.

The push for more and more impact measurement can not only lead to poor studies and wasted money, but also distract and take resources from collecting data that can actually help improve the performance of an effort. To address these difficulties, we wrote a book, The Goldilocks Challenge, to help guide organizations in designing “right-fit” evidence strategies. The struggle to find the right fit in evidence resembles the predicament that Goldilocks faces in the classic children’s fable. Goldilocks, lost in the forest, finds an empty house with a large number of options: chairs, bowls of porridge, and beds of all sizes. She tries each but finds that most do not suit her: The porridge is too hot or too cold, the bed too hard or too soft—she struggles to find options that are “just right.” Like Goldilocks, the social sector has to navigate many choices and challenges to build monitoring and evaluation systems that fit their needs. Some will push for more and more data; others will not push for enough….(More)”.

Doing Research In and On the Digital: Research Methods across Fields of Inquiry


Book edited by Cristina Costa and Jenna Condie: “As a social space, the web provides researchers both with a tool and an environment to explore the intricacies of everyday life. As a site of mediated interactions and interrelationships, the ‘digital’ has evolved from being a space of information to a space of creation, thus providing new opportunities regarding how, where, and why to conduct social research.

Doing Research In and On the Digital aims to deliver on two fronts: first, by detailing how researchers are devising and applying innovative research methods for and within the digital sphere, and, secondly, by discussing the ethical challenges and issues implied and encountered in such approaches.

In two core Parts, this collection explores:

  • content collection: methods for harvesting digital data
  • engaging research informants: digital participatory methods and data stories .

With contributions from a diverse range of fields such as anthropology, sociology, education, healthcare and psychology, this volume will particularly appeal to post-graduate students and early career researchers who are navigating through new terrain in their digital-mediated research endeavours….(More)”.

What Kind of Evidence Influences Local Officials? A Great Example from Guatemala


Paper by Walter Flores: “Between 2007 and the present, we have implemented five different methods for gathering evidence:

1) Surveys of health clinics with random sampling,

2) Surveys using tracers and convenience-based sampling,

3) Life histories of the users of health services,

4) User complaints submitted via text messages,

5) Video and photography documenting service delivery problems.

Each of these methods was deployed for a period of 2-3 years and accompanied by detailed monitoring to track its effects on two outcome variables:

1) the level of community participation in planning, data collection and analysis; and

2) the responsiveness of the authorities to the evidence presented.

Our initial intervention generated evidence by surveying a random sample of health clinics—widely considered to be a highly rigorous method for collecting evidence. As the surveys were long and technically complicated, participation from the community was close to zero. Yet our expectation was that, given its scientific rigor, authorities would be responsive to the evidence we presented. The government instead used technical methodological objections as a pretext to reject the service delivery problems we identified. It was clear that such arguments were an excuse and authorities did not want to act.

Our next effort was to simplify the survey and involve communities in surveying, analysis, and report writing. However, as the table shows, participation was still “minimal,” as was the responsiveness of the authorities. Many community members still struggled to participate and the authorities rejected the evidence as unreliable, again citing methodological concerns. Together with community leaders, we decided to move away from surveys altogether, so authorities could no longer use technical arguments to disregard the evidence.

For our next method, we introduced collecting life-stories of real patients and users of health services. The decision about this new method was taken together with communities. Community members were trained to identify cases of poor service delivery, interview users, and write down their experiences. These testimonies vividly described the impact of poor health services: children unable to go to school because they needed to attend to sick relatives; sick parents unable to care for young children; breadwinners unable go to work, leaving families destitute.

This type of evidence changed the meetings between community leaders and authorities considerably, shifting from arguments over data to discussing the struggles real people faced due to nonresponsive services. After a year of responding to individual life-stories, however, authorities started to treat the information presented as “isolated cases” and became less responsive.

We regrouped again with community leaders to reflect on how to further boost community participation and achieve a response from authorities. We agreed that more agile and less burdensome methods for community volunteers to collect and disseminate evidence might increase the response from authorities. After reviewing different options, we agreed to build a complaint system that allowed users to send coded text messages to an open-access platform….(More)”.

Plunging response rates to household surveys worry policymakers


The Economist: “Response rates to surveys are plummeting all across the rich world. Last year only around 43% of households contacted by the British government responded to the LFS, down from 70% in 2001 (see chart). In America the share of households responding to the Current Population Survey (CPS) has fallen from 94% to 85% over the same period. The rest of Europe and Canada have seen similar trends.

Poor response rates drain budgets, as it takes surveyors more effort to hunt down interviewees. And a growing reluctance to give interviewers information threatens the quality of the data. Politicians often complain about inaccurate election polls. Increasingly misleading economic surveys would be even more disconcerting.

Household surveys derive their power from randomness. Since it is impractical to get every citizen to complete a long questionnaire regularly, statisticians interview what they hope is a representative sample instead. But some types are less likely to respond than others—people who live in flats not houses, for example. A study by Christopher Bollinger of the University of Kentucky and three others matched data from the CPS with social-security records and found that poorer and very rich households were more likely to ignore surveyors than middle-income ones. Survey results will be skewed if the types who do not answer are different from those who do, or if certain types of people are more loth to answer some questions, or more likely to fib….
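The skew described in the Bollinger study can be sketched with a deterministic toy example (the incomes and response rates below are invented for illustration): when poorer households respond less often, the naive survey mean overstates population income, and the standard remedy, inverse-probability weighting, recovers the true mean when the group response rates are known.

```python
# Deterministic sketch of nonresponse bias and the standard correction
# (inverse-probability weighting). All numbers are illustrative only.

low, high = 20_000, 80_000
pop = [low] * 5000 + [high] * 5000
pop_mean = sum(pop) / len(pop)                     # 50_000: true average income

# Response rates by income group (invented): poorer households answer less often.
rates = {low: 0.4, high: 0.9}
respondents = [low] * 2000 + [high] * 4500         # 40% of 5000, 90% of 5000

naive_mean = sum(respondents) / len(respondents)   # ~61_538: biased upward

# Weight each respondent by 1 / (their group's response rate), so the
# under-represented poor group counts for more in the estimate.
weighted_mean = (sum(x / rates[x] for x in respondents)
                 / sum(1 / rates[x] for x in respondents))  # recovers ~50_000
```

The catch, as the article notes, is that real response rates must themselves be estimated, and the correction fails if the people who refuse differ from responders in ways the weighting variables do not capture.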

Statisticians have been experimenting with methods of improving response rates: new ways to ask questions, or shorter questionnaires, for example. Payment raises response rates, and some surveys offer more money for the most reluctant interviewees. But such persistence can have drawbacks. One study found that more frequent attempts to contact interviewees raised the average response rate, but lowered the average quality of answers.

Statisticians have also been exploring supplementary data sources, including administrative data. Such statistics come with two big advantages. One is that administrative data sets can include many more people and observations than is practical in a household survey, giving researchers the statistical power to run more detailed studies. Another is that governments already collect them, so they can offer huge cost savings over household surveys. For instance, Finland’s 2010 census, which was based on administrative records rather than surveys, cost its government just €850,000 ($1.1m) to produce. In contrast, America’s government spent $12.3bn on its 2010 census, roughly 200 times as much on a per-person basis.

Recent advances in computing mean that vast data sets are no longer too unwieldy for use by researchers. However, in many rich countries (those in Scandinavia are exceptions), socioeconomic statistics are collected by several agencies, meaning that researchers who want to combine, say, health records with tax data, face formidable bureaucratic and legal challenges.

Governments in English-speaking countries are especially keen to experiment. In January HMRC, the British tax authority, started publishing real-time tax data as an “experimental statistic” to be compared with labour-market data from household surveys. Two-fifths of Canada’s main statistical agency’s programmes are based at least in part on administrative records. Last year, Britain passed the Digital Economy Act, which will give its Office for National Statistics (ONS) the right to requisition data from other departments and from private sources for statistics-and-research purposes. America is exploring using such data as part of its 2020 census.

Administrative data also have their limitations (see article). They are generally not designed to be used in statistical analyses. A data set on income taxes might be representative of the population receiving benefits or earning wages, but not the population as a whole. Most important, some things are not captured in administrative records, such as well-being, informal employment and religious affiliation….(More)”.

Governance on the Drug Supply Chain via Gcoin Blockchain


Paper by Jen-Hung Tseng et al in the International Journal of Environmental Research and Public Health: “…blockchain was recently introduced to the public to provide an immutable, consensus-based and transparent system in the Fintech field. However, there are ongoing efforts to apply blockchain to other fields where trust and value are essential. In this paper, we suggest the Gcoin blockchain as the base of the data flow of drugs to create transparent drug transaction data. Additionally, the regulation model of the drug supply chain could be altered from the inspection-and-examination-only model to the surveillance-net model, and every unit involved in the drug supply chain would be able to participate simultaneously to prevent counterfeit drugs and to protect public health, including patients….(More)”.

Information to Action: Strengthening EPA Citizen Science Partnerships for Environmental Protection


Report by the National Advisory Council for Environmental Policy and Technology: “Citizen science is catalyzing collaboration; new data and information brought about by greater public participation in environmental research are helping to drive a new era of environmental protection. As the body of citizen-generated data and information in the public realm continues to grow, EPA must develop a clear strategy to lead change and encourage action beyond the collection of data. EPA should recognize the variety of opportunities that it has to act as a conduit between the public and key partners, including state, territorial, tribal and local governments; nongovernmental organizations; and leading technology groups in the private sector. The Agency should build collaborations with new partners, identify opportunities to integrate equity into all relationships, and ensure that grassroots and community-based organizations are well supported and fairly resourced in funding strategies.

Key recommendations under this theme:

  • Recommendation 1. Catalyze action from citizen science data and information by providing guidance and leveraging collaboration.
  • Recommendation 2. Build inclusive and equitable partnerships by understanding partners’ diverse concerns and needs, including prioritizing better support for grassroots and community-based partnerships in EPA grant-funding strategies.

Increase state, territorial, tribal and local government engagement with citizen science

The Agency should reach out to tribes, states, territories and local governments throughout the country to understand the best practices and strategies for encouraging and incorporating citizen science in environmental protection. For states and territories looking for ways to engage in citizen science, EPA can help design strategies that recognize community perspectives while building capacity in state and territorial governments. Recognizing the direct connection between EPA and tribes, the Agency should seek tribal input and support tribes in using citizen science for environmental priorities. EPA should help to increase awareness of citizen science and, where jurisdictional efforts already exist, assist in making citizen science accessible through local government agencies. EPA should more proactively listen to the voices of local stakeholders and encourage partners to embrace a vision for citizen science to accelerate the achievement of environmental goals. As part of this approach, EPA should find ways to define and communicate the Agency’s role as a resource in helping communities achieve environmental outcomes.

Key recommendations under this theme:

  • Recommendation 3. Provide EPA support and engage states and territories to better integrate citizen science into program goals.
  • Recommendation 4. Build on the unique strengths of EPA-tribal relationships.
  • Recommendation 5. Align EPA citizen science work to the priorities of local governments.

Leverage external organizations for expertise and project level support

Collaborations between communities and other external organizations—including educational institutions, civic organizations, and community-based organizations— are accelerating the growth of citizen science. Because EPA’s direct connection with members of the public often is limited, the Agency could benefit significantly by consulting with key external organizations to leverage citizen science efforts to provide the greatest benefit for the protection of human health and the environment. EPA should look to external organizations as vital connections to communities engaged in collaboratively led scientific investigation to address community-defined questions, referred to as community citizen science. External organizations can help EPA in assessing gaps in community-driven research and help the Agency to design effective support tools and best management practices for facilitating effective environmental citizen science programs….(More)”.