The Unlinkable Data Challenge: Advancing Methods in Differential Privacy


National Institute of Standards and Technology: “Databases across the country include information with potentially important research implications and uses, e.g. contingency planning in disaster scenarios, identifying safety risks in aviation, assisting in tracking contagious diseases, and identifying patterns of violence in local communities.  However, these datasets include personally identifiable information (PII), and it is not enough simply to remove it.  It is well known that auxiliary, and possibly completely unrelated, datasets can be combined with records in the dataset to uniquely identify individuals (known as a linkage attack).  Today’s efforts to remove PII do not provide adequate protection against linkage attacks. With the advent of “big data” and technological advances in linking data, there are far too many other possible data sources related to each of us that can lead to our identity being uncovered.

Get Involved – How to Participate

The Unlinkable Data Challenge is a multi-stage Challenge.  This first stage of the Challenge is intended to source detailed concepts for new approaches, inform the final design in the two subsequent stages, and provide recommendations for matching stage 1 competitors into teams for subsequent stages.  Teams will predict and justify where their algorithm fails with respect to the utility-privacy frontier curve.

In this stage, competitors are asked to propose how to de-identify a dataset using less than the available privacy budget, while also maintaining the dataset’s utility for analysis.  For example, the de-identified data, when put through the same analysis pipeline as the original dataset, should produce comparable results (e.g. similar coefficients in a linear regression model, or a classifier that produces similar predictions on sub-samples of the data).

This stage of the Challenge seeks Conceptual Solutions that describe how to use and/or combine methods in differential privacy to mitigate privacy loss when publicly releasing datasets in a variety of industries such as public safety, law enforcement, healthcare/biomedical research, education, and finance.  We are limiting the scope to addressing research questions and methodologies that require regression, classification, and clustering analysis on datasets that contain numerical, geo-spatial, and categorical data.

To compete in this stage, we are asking that you propose a new algorithm utilizing existing or new randomized mechanisms with a justification of how this will optimize privacy and utility across different analysis types.  We are also asking you to propose a dataset that you believe would make a good use case for your proposed algorithm, and provide a means of comparing your algorithm and other algorithms.
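
The challenge text does not prescribe a mechanism, but the best-known building block is the Laplace mechanism combined with sequential composition of the privacy budget. The sketch below illustrates the idea; the dataset, the 50/50 budget split, and the [0, 100] clipping bound are illustrative assumptions, not part of the challenge.

```python
import numpy as np

def laplace_release(true_value, sensitivity, epsilon, rng=None):
    """Release a statistic with epsilon-differential privacy by adding
    Laplace noise with scale sensitivity / epsilon."""
    rng = rng if rng is not None else np.random.default_rng()
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Toy dataset of ages, clipped to [0, 100] so sensitivities are bounded.
ages = np.clip(np.array([23.0, 35.0, 41.0, 29.0, 52.0, 38.0, 44.0, 31.0]), 0, 100)

# Sequential composition: spending eps_count on one query and eps_mean on
# another consumes eps_count + eps_mean of the total privacy budget.
total_budget = 1.0
eps_count = 0.5 * total_budget
eps_mean = 0.5 * total_budget

# A counting query has sensitivity 1 (one person changes the count by 1).
noisy_count = laplace_release(len(ages), sensitivity=1.0, epsilon=eps_count)

# A mean over values in [0, 100] changes by at most 100/n when one
# record changes, so that is its sensitivity.
noisy_mean = laplace_release(ages.mean(), sensitivity=100.0 / len(ages), epsilon=eps_mean)
```

Running the same regression or classifier on such noisy releases and comparing the results against the originals is one way to measure utility; shrinking epsilon adds more noise, which is exactly the utility-privacy trade-off the challenge asks competitors to characterize.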

All submissions must be made using the submission form provided on the HeroX website…(More)”.

Big Data against Child Obesity


European Commission: “Childhood and adolescent obesity is a major global and European public health problem. Currently, public actions are detached from local needs, mostly consisting of indiscriminate blanket policies and single-element strategies, which limits their efficacy and effectiveness. The need for community-targeted actions has long been obvious, but the lack of a monitoring and evaluation framework, and the methodological inability to objectively quantify local community characteristics in a reasonable timeframe, have hindered that.

Graph showing BigO policy planner

Big Data based Platform

Technological achievements in mobile and wearable electronics and Big Data infrastructures allow European citizens to be engaged in the data collection process, making it possible to reshape policies at a regional, national and European level. In BigO, that will be facilitated through the development of a platform that quantifies behavioural community patterns through Big Data provided by wearables and eHealth devices.

Estimate child obesity through community data

BigO has set detailed scientific, technological, validation and business objectives in order to build a system that collects Big Data on children’s behaviour and helps plan health policies against obesity. In addition, during the project, BigO will reach out to more than 25,000 school children and age-matched obese children and adolescents as sources for community data. Comprehensive models of the obesity prevalence dependence matrix will be created, allowing data-driven predictions of the effectiveness of specific policies on a community and real-time monitoring of the population response, supported by powerful real-time data visualisations….(More)

Data Governance in the Digital Age


Centre for International Governance Innovation: “Data is being hailed as “the new oil.” The analogy seems appropriate given the growing amount of data being collected, and the advances made in its gathering, storage, manipulation and use for commercial, social and political purposes.

Big data and its application in artificial intelligence, for example, promises to transform the way we live and work — and will generate considerable wealth in the process. But data’s transformative nature also raises important questions around how the benefits are shared, privacy, public security, openness and democracy, and the institutions that will govern the data revolution.

The delicate interplay between these considerations means that they have to be treated jointly, and at every level of the governance process, from local communities to the international arena. This series of essays by leading scholars and practitioners, which is also published as a special report, will explore topics including the rationale for a data strategy, the role of a data strategy for Canadian industries, and policy considerations for domestic and international data governance…

RATIONALE OF A DATA STRATEGY

THE ROLE OF A DATA STRATEGY FOR CANADIAN INDUSTRIES

BALANCING PRIVACY AND COMMERCIAL VALUES

DOMESTIC POLICY FOR DATA GOVERNANCE

INTERNATIONAL POLICY CONSIDERATIONS

EPILOGUE

City Data Exchange – Lessons Learned From A Public/Private Data Collaboration


Report by the Municipality of Copenhagen: “The City Data Exchange (CDE) is the product of a collaborative project between the Municipality of Copenhagen, the Capital Region of Denmark, and Hitachi. The purpose of the project is to examine the possibilities of creating a marketplace for the exchange of data between public and private organizations.

The CDE consists of three parts:

  • A collaboration between the different partners on the supply and demand of specific data;
  • A platform for selling and purchasing data aimed at both public and private organizations;
  • An effort to gain further experience in the field of data exchange between public and private organizations.

In 2013, the City of Copenhagen and the Copenhagen Region decided to invest in the creation of a marketplace for the exchange of public- and private-sector data. The initial investment was meant as a seed towards a self-sustaining marketplace. This was an innovative approach to test the readiness of the market to deliver new data-sharing solutions.

The CDE is the result of a tender by the Municipality of Copenhagen and the Capital Region of Denmark in 2015. Hitachi Consulting won the tender and has invested in, and worked with, the Municipality of Copenhagen and the Capital Region of Denmark to establish an organization and a technical platform.

The City Data Exchange (CDE) has closed a gap in regional data infrastructure. Both public- and private-sector organizations have used the CDE to gain insights into data use cases, new external data sources, GDPR issues, and to explore the value of their data. Before the CDE was launched, there were only a few options available to purchase or sell data.

The City and the Region of Copenhagen are utilizing the insights from the CDE project to improve their internal activities and to shape new policies. The lessons from the CDE also provide insights into a wider national infrastructure for effective data sharing. Based on the insights from the approximately 1,000 people that the CDE has been in contact with, the recommendations are:

  • Start with the use case, as it is key to engage the data community that will use the data;
  • Create a data competence hub, where the data community can meet and get support;
  • Create simple standards and guidelines for data publishing.

The following paper presents some of the key findings from our work with the CDE. It has been compiled by Smart City Insights on behalf of the partners of the City Data Exchange project…(More)”.

The 2018 Atlas of Sustainable Development Goals: an all-new visual guide to data and development


World Bank Data Team: “We’re pleased to release the 2018 Atlas of Sustainable Development Goals. With over 180 maps and charts, the new publication shows the progress societies are making towards the 17 SDGs.

It’s filled with annotated data visualizations, which can be reproducibly built from source code and data. You can view the SDG Atlas online, download the PDF publication (30 MB), and access the data and source code behind the figures.

This Atlas would not be possible without the efforts of statisticians and data scientists working in national and international agencies around the world. It is produced in collaboration with professionals across the World Bank’s data and research groups and our sectoral global practices.

Trends and analysis for the 17 SDGs

The Atlas draws on World Development Indicators, a database of over 1,400 indicators for more than 220 economies, many going back over 50 years. For example, the chapter on SDG4 includes data from the UNESCO Institute for Statistics on education and its impact around the world.

Throughout the Atlas, data are presented by country, region and income group and often disaggregated by sex, wealth and geography.

The Atlas also explores new data from scientists and researchers where standards for measuring SDG targets are still being developed. For example, the chapter on SDG14 features research led by Global Fishing Watch, published this year in Science. Their team tracked over 70,000 industrial fishing vessels from 2012 to 2016, processing 22 billion automatic identification system messages to map and quantify fishing around the world….(More)”.

Plunging response rates to household surveys worry policymakers


The Economist: “Response rates to surveys are plummeting all across the rich world. Last year only around 43% of households contacted by the British government responded to the Labour Force Survey (LFS), down from 70% in 2001 (see chart). In America the share of households responding to the Current Population Survey (CPS) has fallen from 94% to 85% over the same period. The rest of Europe and Canada have seen similar trends.

Poor response rates drain budgets, as it takes surveyors more effort to hunt down interviewees. And a growing reluctance to give interviewers information threatens the quality of the data. Politicians often complain about inaccurate election polls. Increasingly misleading economic surveys would be even more disconcerting.

Household surveys derive their power from randomness. Since it is impractical to get every citizen to complete a long questionnaire regularly, statisticians interview what they hope is a representative sample instead. But some types are less likely to respond than others—people who live in flats not houses, for example. A study by Christopher Bollinger of the University of Kentucky and three others matched data from the CPS with social-security records and found that poorer and very rich households were more likely to ignore surveyors than middle-income ones. Survey results will be skewed if the types who do not answer are different from those who do, or if certain types of people are more loth to answer some questions, or more likely to fib….
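
The skew that Bollinger and colleagues document can be reproduced in a toy simulation. Everything below is invented for illustration: the income distribution, the response probabilities, and the income thresholds are stylized assumptions, not estimates from the study.

```python
import random

random.seed(42)

# Stylized population of 100,000 household incomes (right-skewed).
population = [random.lognormvariate(10, 0.8) for _ in range(100_000)]
median = sorted(population)[len(population) // 2]

def responds(income):
    # Invented response rates echoing the pattern in the study:
    # middle-income households answer most often, the very rich least.
    if income > 4 * median:
        return random.random() < 0.1
    if income < 0.5 * median:
        return random.random() < 0.5
    return random.random() < 0.8

respondents = [x for x in population if responds(x)]

true_mean = sum(population) / len(population)
survey_mean = sum(respondents) / len(respondents)
# The rich tail holds a large share of total income, so undercounting
# it drags the survey mean several percent below the true mean.
```

Under these assumptions the survey mean lands several percent below the true mean. Reweighting respondents is the standard correction, but it only works if the response pattern is known, which is why matching surveys to administrative records (as the Bollinger study did) matters.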

Statisticians have been experimenting with methods of improving response rates: new ways to ask questions, or shorter questionnaires, for example. Payment raises response rates, and some surveys offer more money for the most reluctant interviewees. But such persistence can have drawbacks. One study found that more frequent attempts to contact interviewees raised the average response rate, but lowered the average quality of answers.

Statisticians have also been exploring supplementary data sources, including administrative data. Such statistics come with two big advantages. One is that administrative data sets can include many more people and observations than is practical in a household survey, giving researchers the statistical power to run more detailed studies. Another is that governments already collect them, so they can offer huge cost savings over household surveys. For instance, Finland’s 2010 census, which was based on administrative records rather than surveys, cost its government just €850,000 ($1.1m) to produce. In contrast, America’s government spent $12.3bn on its 2010 census, roughly 200 times as much on a per-person basis.
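
The “roughly 200 times” figure follows from simple per-person arithmetic. A quick check, with approximate 2010 population figures supplied here as assumptions (they do not appear in the article):

```python
# Back-of-the-envelope check of the "roughly 200 times" per-person claim.
finland_cost_usd = 1.1e6        # €850,000, roughly $1.1m
finland_population = 5.4e6      # approximate 2010 population of Finland
us_cost_usd = 12.3e9
us_population = 309e6           # approximate 2010 population of the US

finland_per_person = finland_cost_usd / finland_population  # about $0.20
us_per_person = us_cost_usd / us_population                 # about $40
ratio = us_per_person / finland_per_person                  # about 195
```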

Recent advances in computing mean that vast data sets are no longer too unwieldy for use by researchers. However, in many rich countries (those in Scandinavia are exceptions), socioeconomic statistics are collected by several agencies, meaning that researchers who want to combine, say, health records with tax data, face formidable bureaucratic and legal challenges.

Governments in English-speaking countries are especially keen to experiment. In January HMRC, the British tax authority, started publishing real-time tax data as an “experimental statistic” to be compared with labour-market data from household surveys. Two-fifths of Canada’s main statistical agency’s programmes are based at least in part on administrative records. Last year, Britain passed the Digital Economy Act, which will give its Office for National Statistics (ONS) the right to requisition data from other departments and from private sources for statistics and research purposes. America is exploring using such data as part of its 2020 census.

Administrative data also have their limitations (see article). They are generally not designed to be used in statistical analyses. A data set on income taxes might be representative of the population receiving benefits or earning wages, but not the population as a whole. Most important, some things are not captured in administrative records, such as well-being, informal employment and religious affiliation….(More)”.

Information to Action: Strengthening EPA Citizen Science Partnerships for Environmental Protection


Report by the National Advisory Council for Environmental Policy and Technology: “Citizen science is catalyzing collaboration; new data and information brought about by greater public participation in environmental research are helping to drive a new era of environmental protection. As the body of citizen-generated data and information in the public realm continues to grow, EPA must develop a clear strategy to lead change and encourage action beyond the collection of data. EPA should recognize the variety of opportunities that it has to act as a conduit between the public and key partners, including state, territorial, tribal and local governments; nongovernmental organizations; and leading technology groups in the private sector. The Agency should build collaborations with new partners, identify opportunities to integrate equity into all relationships, and ensure that grassroots and community-based organizations are well supported and fairly resourced in funding strategies.

Key recommendations under this theme:

  • Recommendation 1. Catalyze action from citizen science data and information by providing guidance and leveraging collaboration.
  • Recommendation 2. Build inclusive and equitable partnerships by understanding partners’ diverse concerns and needs, including prioritizing better support for grassroots and community-based partnerships in EPA grant-funding strategies.

Increase state, territorial, tribal and local government engagement with citizen science

The Agency should reach out to tribes, states, territories and local governments throughout the country to understand the best practices and strategies for encouraging and incorporating citizen science in environmental protection. For states and territories looking for ways to engage in citizen science, EPA can help design strategies that recognize community perspectives while building capacity in state and territorial governments. Recognizing the direct connection between EPA and tribes, the Agency should seek tribal input and support tribes in using citizen science for environmental priorities. EPA should help to increase awareness of citizen science and, where jurisdictional efforts already exist, assist in making citizen science accessible through local government agencies. EPA should more proactively listen to the voices of local stakeholders and encourage partners to embrace a vision for citizen science to accelerate the achievement of environmental goals. As part of this approach, EPA should find ways to define and communicate the Agency’s role as a resource in helping communities achieve environmental outcomes.

Key recommendations under this theme:

  • Recommendation 3. Provide EPA support and engage states and territories to better integrate citizen science into program goals.
  • Recommendation 4. Build on the unique strengths of EPA-tribal relationships.
  • Recommendation 5. Align EPA citizen science work to the priorities of local governments.

Leverage external organizations for expertise and project level support

Collaborations between communities and other external organizations—including educational institutions, civic organizations, and community-based organizations— are accelerating the growth of citizen science. Because EPA’s direct connection with members of the public often is limited, the Agency could benefit significantly by consulting with key external organizations to leverage citizen science efforts to provide the greatest benefit for the protection of human health and the environment. EPA should look to external organizations as vital connections to communities engaged in collaboratively led scientific investigation to address community-defined questions, referred to as community citizen science. External organizations can help EPA in assessing gaps in community-driven research and help the Agency to design effective support tools and best management practices for facilitating effective environmental citizen science programs….(More)”.

Most Maps of the New Ebola Outbreak Are Wrong


Ed Yong in The Atlantic: “Almost all the maps of the outbreak zone that have thus far been released contain mistakes of this kind. Different health organizations all seem to use their own maps, most of which contain significant discrepancies. Things are roughly in the right place, but their exact positions can be off by miles, as can the boundaries between different regions.

But the Congo is a massive country—a quarter the size of the United States with considerably fewer resources. Until very recently, they haven’t had the resources to get accurate geolocalized data. Instead, the boundaries of the health zones and their constituent “health areas,” as well as the position of specific villages, towns, rivers, hospitals, clinics, and other landmarks, are often based on local knowledge and hand-drawn maps. Here’s an example, which I saw when I visited the National Institute for Biomedical Research in March. It does the job, but it’s clearly not to scale.

Cyrus Sinai, a cartographer at UCLA, has been working with the Ministry of Health to improve the accuracy of the Congo’s maps, and flew over on Saturday at their request. For each health zone within the outbreak region, Sinai compiled a list of the constituent villages, plotted them using the most up-to-date sources of geographical data, and drew boundaries that include these places and no others. The maps at the top of this piece show the before (left) and after (right) images….

Consider Bikoro, the health zone where the outbreak may have originated, and where most cases are found. Sinai took a list of all Bikoro’s villages, plotted them using the most up-to-date sources of geographical data, and drew a boundary that includes these places and no others. This new shape is roughly similar to the one on current maps, but with critical differences. Notably, existing maps have the village of Ikoko Impenge—one of the epicenters of the outbreak—outside the Bikoro health zone, when it actually lies within the zone.

 “These visualizations are important for communicating the reality on the ground to all levels of the health hierarchy, and to international partners who don’t know the country,” says Mathias Mossoko, the head of disease surveillance data in DRC.

“It’s really important for the outbreak response to have real and accurate data,” adds Bernice Selo, who leads the cartographic work from the Ministry of Health’s command center in Kinshasa. “You need to know exactly where the villages are, where the health facilities are, where the transport routes and waterways are. All of this helps you understand where the outbreak is, where it’s moving, how it’s moving. You can see which villages have the highest risk.”

To be clear, there’s no evidence that these problems are hampering the response to the current outbreak. It’s not like doctors are showing up in the middle of the forest, wondering why they’re in the wrong place. “Everyone on the ground knows where the health zones start and end,” says Sinai. “I don’t think this will make or break the response. But you surely want the most accurate data.”

It feels unusual to not have this information readily at hand, especially in an era when digital maps are so omnipresent and so supposedly truthful. If you search for San Francisco on Google Maps, you can be pretty sure that what comes up is actually where San Francisco is. On Google Street View, you can even walk along a beach at the other end of the world….(More)”.

Mapping the economy in real time is almost ‘within our grasp’


Delphine Strauss at the Financial Times: “The goal of mapping economic activity in real time, just as we do for weather or traffic, is “closer than ever to being within our grasp”, according to Andy Haldane, the Bank of England’s chief economist. In recent years, “data has become the new oil . . . and data companies have become the new oil giants”, Mr Haldane told an audience at King’s Business School …

But economics and finance have been “rather reticent about fully embracing this oil-rush”, partly because economists have tended to prefer a deductive approach that puts theory ahead of measurement. This needs to change, he said, because relying too much on either theory or real-world data in isolation can lead to serious mistakes in policymaking — as was seen when the global financial crisis exposed the “empirical fragility” of macroeconomic models.

Parts of the private sector and academia have been far swifter to exploit the vast troves of ever-accumulating data now available — 90 per cent of which has been created in the last two years alone. Massachusetts Institute of Technology’s “Billion Prices Project”, name-checked in Mr Haldane’s speech, now collects enough data from online retailers for its commercial arm to provide daily inflation updates for 22 economies….

The UK’s Office for National Statistics — which has faced heavy criticism over the quality of its data in recent years — is experimenting with “web-scraping” to collect price quotes for food and groceries, for example, and making use of VAT data from small businesses to improve its output-based estimates of gross domestic product. In both cases, the increased sample size and granularity could bring considerable benefits on top of existing surveys, Mr Haldane said.

The BoE itself is trying to make better use of financial data — for example, by using administrative data on owner-occupied mortgages to better understand pricing decisions in the UK housing market. Mr Haldane sees scope to go further with the new data coming on stream on payment, credit and banking flows. …New data sources and techniques could also help policymakers think about human decision-making — which rarely conforms with the rational process assumed in many economic models. Data on music downloads from Spotify, used as an indicator of sentiment, has recently been shown to do at least as well as a standard consumer confidence survey in tracking consumer spending….(More)”.

CrowdLaw Manifesto


At the Rockefeller Foundation Bellagio Center this spring, assembled participants met to discuss CrowdLaw: how to use technology to improve the quality and effectiveness of law and policymaking through greater public engagement. We put together and signed 12 principles promoting the use of CrowdLaw by local legislatures and national parliaments, calling on legislatures, technologists and the public to participate in creating more open and participatory lawmaking practices. We invite you to sign the Manifesto using the form below.

Draft dated May 29, 2018

  1. To improve public trust in democratic institutions, we must improve how we govern in the 21st century.
  2. CrowdLaw is any law, policy-making or public decision-making that offers a meaningful opportunity for the public to participate in one or multiple stages of decision-making, including but not limited to the processes of problem identification, solution identification, proposal drafting, ratification, implementation or evaluation.
  3. CrowdLaw draws on innovative processes and technologies and encompasses diverse forms of engagement among elected representatives, public officials, and those they represent.
  4. When designed well, CrowdLaw may help governing institutions obtain more relevant facts and knowledge as well as more diverse perspectives, opinions and ideas to inform governing at each stage and may help the public exercise political will.
  5. When designed well, CrowdLaw may help democratic institutions build trust and the public to play a more active role in their communities and strengthen both active citizenship and democratic culture.
  6. When designed well, CrowdLaw may enable engagement that is thoughtful, inclusive, informed but also efficient, manageable and sustainable.
  7. Therefore, governing institutions at every level should experiment and iterate with CrowdLaw initiatives to create formal processes for diverse members of society to participate, in order to improve the legitimacy of decision-making, strengthen public trust and produce better outcomes.
  8. Governing institutions at every level should encourage research and learning about CrowdLaw and its impact on individuals, on institutions and on society.
  9. The public also has a responsibility to improve our democracy by demanding and creating opportunities to engage and then actively contributing expertise, experience, data and opinions.
  10. Technologists should work collaboratively across disciplines to develop, evaluate and iterate varied, ethical and secure CrowdLaw platforms and tools, keeping in mind that different participation mechanisms will achieve different goals.
  11. Governing institutions at every level should encourage collaboration across organizations and sectors to test what works and share good practices.
  12. Governing institutions at every level should create the legal and regulatory frameworks necessary to promote CrowdLaw and better forms of public engagement and usher in a new era of more open, participatory and effective governing.

The CrowdLaw Manifesto has been signed by the following individuals and organizations:

Individuals

  • Victoria Alsina, Senior Fellow at The GovLab and Faculty Associate at Harvard Kennedy School, Harvard University
  • Marta Poblet Balcell, Associate Professor, RMIT University
  • Robert Bjarnason, President & Co-founder, Citizens Foundation; Better Reykjavik
  • Pablo Collada, Former Executive Director, Fundación Ciudadano Inteligente
  • Mukelani Dimba, Co-chair, Open Government Partnership
  • Hélène Landemore, Associate Professor of Political Science, Yale University
  • Shu-Yang Lin, re:architect & co-founder, PDIS.tw
  • José Luis Martí, Vice-Rector for Innovation and Professor of Legal Philosophy, Pompeu Fabra University
  • Jessica Musila, Executive Director, Mzalendo
  • Sabine Romon, Chief Smart City Officer, General Secretariat, Paris City Council
  • Cristiano Ferri Faría, Director, Hacker Lab, Brazilian House of Representatives
  • Nicola Forster, President and Founder, Swiss Forum on Foreign Policy
  • Raffaele Lillo, Chief Data Officer, Digital Transformation Team, Government of Italy
  • Tarik Nesh-Nash, CEO & Co-founder, GovRight; Ashoka Fellow
  • Beth Simone Noveck, Director, The GovLab and Professor at New York University Tandon School of Engineering
  • Ehud Shapiro, Professor of Computer Science and Biology, Weizmann Institute of Science

Organizations

  • Citizens Foundation, Iceland
  • Fundación Ciudadano Inteligente, Chile
  • International School for Transparency, South Africa
  • Mzalendo, Kenya
  • Smart Cities, Paris City Council, Paris, France
  • Hacker Lab, Brazilian House of Representatives, Brazil
  • Swiss Forum on Foreign Policy, Switzerland
  • Digital Transformation Team, Government of Italy, Italy
  • The Governance Lab, New York, United States
  • GovRight, Morocco
  • ICT4Dev, Morocco