Why We Should Care About Bad Data


Blog by Stefaan G. Verhulst: “At a time of open and big data, data-led and evidence-based policy making has great potential to improve problem solving but will have limited, if not harmful, effects if the underlying components are riddled with bad data.

Why should we care about bad data? What do we mean by bad data? And what are the determining factors contributing to bad data that, if understood and addressed, could prevent or tackle bad data? These questions were the subject of my short presentation during a recent webinar on Bad Data: The Hobgoblin of Effective Government, hosted by the American Society for Public Administration and moderated by Richard Greene (Partner, Barrett and Greene Inc.). Other panelists included Ben Ward (Manager, Information Technology Audits Unit, California State Auditor’s Office) and Katherine Barrett (Partner, Barrett and Greene Inc.). The webinar was a follow-up to the excellent Special Issue of Governing on Bad Data written by Richard and Katherine….(More)”

Formalised data citation practices would encourage more authors to make their data available for reuse


 Hyoungjoo Park and Dietmar Wolfram at the LSE Impact Blog: “Today’s researchers work in a heavily data-intensive and collaborative environment in order to further scientific discovery across and within fields. It is becoming routine for researchers (i.e. authors and data publishers) to submit their research data, such as datasets, biological samples in biomedical fields, and computer code, as supplementary information in order to comply with data sharing requirements of major funding agencies, high-profile journals, and data journals. This is part of open science, where data and any publication products are expected to be made available to anyone interested.

Given that researchers benefit from publicly shared data through data reuse in their own research, researchers who provide access to data should be acknowledged for their contributions, much in the same way that authors are recognised for their research publications through citation. Researchers who use shared data or other shared research products (e.g. open access software, tissue cultures) should also acknowledge the providers of these resources through formal citation. At present, data citation is not widely practised in most disciplines and as an object of study remains largely overlooked….

We found that data citations appear in the references section of an article less frequently than in the main text, making it difficult to identify the reward and credit for data authors (i.e. data sharers). Consistent data citation formats could not be found. Current data citation practices do not (yet) benefit data sharers. Also, data citation was sometimes located in the supplementary information, outside of the references. Data that had been reused was often not acknowledged in the reference lists, but was rather hidden in the representation of data (e.g. tables, figures, images, graphs, and other elements), which may be a consequence of the fact that data citation practices are not yet common in scholarly communications.

Ongoing challenges remain in identifying and documenting data citation. First, the practice of informal data citation presents a challenge for accurately documenting data citation. …

Second, data recitation by one or more co-authors of earlier studies (i.e. self-citation) is common, which reduces the broader impact of data sharing by limiting much of the reuse to the original authors…

Third, currently indexed data citations may not include rapidly advancing areas, such as in the hard sciences or computer engineering, because approximately 90% of indexed works were associated with journal articles…

Fourth, the number of authors associated with shared datasets raises questions of the ownership of and responsibility for a collective work, although some journals require one author to be responsible for the data used in the study…(More). (See also An examination of research data sharing and re-use: implications for data citation practice, published in Scientometrics)

Avoiding Garbage In – Garbage Out: Improving Administrative Data Quality for Research


Blog: “In June, I presented the webinar, “Improving Administrative Data Quality for Research and Analysis”, for members of the Association of Public Data Users (APDU). APDU is a national network that provides a venue to promote education, share news, and advocate on behalf of public data users.

The webinar served as a primer to help smaller organizations begin to use their data for research. Participants were given the tools to transform their administrative data into “research-ready” datasets.

I first reviewed seven major issues for administrative data quality and discussed how these issues can affect research and analysis. For instance, issues with incorrect value formats, unit of analysis, and duplicate records can make the data difficult to use. Invalid or inconsistent values lead to inaccurate analysis results. Missing or outlier values can produce inaccurate and biased analysis results. All these issues make the data less useful for research.

Next, I presented concrete strategies for reviewing the data to identify each of these quality issues. I also discussed several tips to make the data review process easier, faster, and easier to replicate. Most important among these tips are: (1) reviewing every variable in the data set, whether you expect problems or not, and (2) relying on data documentation to understand how the data should look….(More)”.
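To make the kind of review described above concrete, here is a minimal sketch in Python/pandas of checks for duplicate records, incorrect formats, invalid codes, and missing or outlier values. The file name, column names, and valid code list are hypothetical, and the webinar itself did not prescribe any particular tooling.

```python
import pandas as pd

# Load a hypothetical administrative extract (file and column names are illustrative only).
df = pd.read_csv("admin_extract.csv", dtype=str)

# 1. Duplicate records: the same case should not appear twice at the chosen unit of analysis.
dupes = df[df.duplicated(subset=["case_id"], keep=False)]
print(f"{len(dupes)} rows share a case_id with another row")

# 2. Incorrect value formats: dates that fail to parse are flagged rather than silently dropped.
dates = pd.to_datetime(df["intake_date"], format="%Y-%m-%d", errors="coerce")
print(f"{dates.isna().sum()} rows have an unparseable intake_date")

# 3. Invalid or inconsistent values: compare every coded field against its documented code list.
valid_status = {"open", "closed", "pending"}
bad_status = df[~df["case_status"].str.lower().isin(valid_status)]
print(f"{len(bad_status)} rows have an undocumented case_status")

# 4. Missing and outlier values: review every variable, not just the ones you expect trouble from.
print(df.isna().mean().sort_values(ascending=False).head(10))  # share of missing values per column
amounts = pd.to_numeric(df["benefit_amount"], errors="coerce")
print(amounts.describe())  # extreme min/max values are candidates for follow-up, not deletion
```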

Four lessons NHS Trusts can learn from the Royal Free case


Blog by Elizabeth Denham, Information Commissioner in the UK: “Today my office has announced that the Royal Free London NHS Foundation Trust did not comply with the Data Protection Act when it turned over the sensitive medical data of around 1.6 million patients to Google DeepMind, a private sector firm, as part of a clinical safety initiative. As a result of our investigation, the Trust has been asked to sign an undertaking committing it to changes to ensure it is acting in accordance with the law, and we’ll be working with them to make sure that happens.

But what about the rest of the sector? As organisations increasingly look to unlock the huge potential that creative uses of data can have for patient care, what are the lessons to be learned from this case?

It’s not a choice between privacy and innovation

It’s welcome that the trial looks to have been positive. The Trust has reported successful outcomes. Some may reflect that data protection rights are a small price to pay for this.

But what stood out to me on looking through the results of the investigation is that the shortcomings we found were avoidable. The price of innovation didn’t need to be the erosion of legally ensured fundamental privacy rights….

Don’t dive in too quickly

Privacy impact assessments are a key data protection tool of our era, as evolving law and best practice around the world demonstrate. They play an increasingly prominent role in data protection, and they’re a crucial part of digital innovation….

New cloud processing technologies mean you can, not that you always should

Changes in technology mean that vast data sets can be made more readily available and can be processed faster, using ever more powerful data processing technologies. That’s a positive thing, but just because evolving technologies can allow you to do more doesn’t mean these tools should always be fully utilised, particularly during a trial initiative….

Know the law, and follow it

No-one suggests that red tape should get in the way of progress. But when you’re setting out to test the clinical safety of a new service, remember that the rules are there for a reason….(More)”

Blockchains, personal data and the challenge of governance


Theo Bass at NESTA: “…There are a number of dominant internet platforms (Google, Facebook, Amazon, etc.) that hoard, analyse and sell information about their users in the name of a more personalised and efficient service. This has become a problem.

People feel they are losing control over how their data is used and reused on the web. 500 million adblocker downloads is a symptom of a market which isn’t working well for people. As Irene Ng mentions in a recent guest blog on the Nesta website, the secondary data market is thriving (online advertising is a major player), as companies benefit from the opacity and lack of transparency about where profit is made from personal data.

It’s said that blockchain’s key characteristics could provide a foundational protocol for a fairer digital identity system on the web. Beyond its application as digital currency, blockchain could provide a new set of technical standards for transparency, openness, and user consent, on top of which a whole new generation of services might be built.

While the aim is ambitious, a handful of projects are rising to the challenge.

Blockstack is creating a global system of digital IDs, which are written into the bitcoin blockchain. Nobody can touch them other than the owner of that ID. Blockstack are building a new generation of applications on top of this infrastructure which promises to provide “a new decentralized internet where users own their data and apps run locally”.

Sovrin attempts to provide users with “self-sovereign identity”. The argument is that “centralized” systems for storing personal data make it a “treasure chest for attackers”. Sovrin argues that users should more easily be able to have “ownership” over their data, and the exchange of data should be made possible through a decentralised, tamper-proof ledger of transactions between users.

Our own DECODE project is piloting a set of collaboratively owned, local sharing economy platforms in Barcelona and Amsterdam. The blockchain aims to provide a public record of entitlements over where people’s data is stored, who can access it and for what purpose (with some additional help from new techniques in zero-knowledge cryptography to preserve people’s privacy).

There’s no doubt this is an exciting field of innovation. But the debate is characterised by a lot of hype. The following sections therefore discuss some of the challenges thrown up when we start thinking about implementations beyond bitcoin.

Blockchains and the challenge of governance

As mentioned above, bitcoin is a “bearer asset”. This is a necessary feature of decentralisation — all users maintain sole ownership over the digital money they hold on the network. If users get hacked (digital wallets sometimes do), or if a password gets lost, the money is irretrievable.

While the example of losing a password might seem trivial, it highlights some difficult questions for proponents of blockchain’s wider uses. What happens if there’s a dispute over an online transaction, but no intermediary to settle it? What happens if someone’s digital assets or their digital identity is breached and sensitive data falls into the wrong hands? It might be necessary to assign responsibility to a governing actor to help resolve the issue, but of course this would require the introduction of a trusted middleman.

Bitcoin doesn’t try to answer these questions; its anonymous creators deliberately tried to avoid implementing a clear model of governance over the network, probably because they knew that bitcoin would be used by people as a method for subverting the law. Bitcoin still sees a lot of use in gray economies, including for the sale of drugs and gambling.

But if blockchains are set to enter the mainstream, providing for businesses, governments and nonprofits, then they won’t be able to function irrespective of the law. They will need to find use-cases that can operate alongside legal frameworks and jurisdictional boundaries. They will need to demonstrate regulatory compliance, create systems of rules and provide accountability when things go awry. This cannot just be solved through increasingly sophisticated coding.

All of this raises a potential paradox recently elaborated in a post by Vili Lehdonvirta of the Oxford Internet Institute: is it possible to successfully govern blockchains without undermining their entire purpose?….

If blockchain advocates only work towards purely technical solutions and ignore real-world challenges of trying to implement decentralisation, then we’ll only ever see flawed implementations of the technology. This is already happening in the form of centrally administered, proprietary or ‘half-baked’ blockchains, which don’t offer much more value than traditional databases….(More)”.

Facebook Disaster Maps


Molly Jackman et al at Facebook: “After a natural disaster, humanitarian organizations need to know where affected people are located, what resources are needed, and who is safe. This information is extremely difficult and often impossible to capture through conventional data collection methods in a timely manner. As more people connect and share on Facebook, our data is able to provide insights in near-real time to help humanitarian organizations coordinate their work and fill crucial gaps in information during disasters. This morning we announced a Facebook disaster map initiative to help organizations address the critical gap in information they often face when responding to natural disasters.

Facebook disaster maps provide information about where populations are located, how they are moving, and where they are checking in safe during a natural disaster. All data is de-identified and aggregated to a 360 square meter tile or local administrative boundaries (e.g. census boundaries).

This blog describes the disaster maps datasets, how insights are calculated, and the steps taken to ensure that we’re preserving privacy….(More)”.
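The excerpt does not include the underlying method beyond the description above, but the general idea of snapping de-identified locations to fixed-size tiles and reporting only aggregate counts can be sketched as follows. The tile size, the privacy threshold, and all function names are assumptions made for illustration, not Facebook's actual pipeline.

```python
import math
from collections import Counter

TILE_METERS = 360             # assumed tile edge length, loosely following the post's "360 square meter tile"
MIN_COUNT = 10                # assumed privacy threshold: tiles with fewer people are suppressed
METERS_PER_DEG_LAT = 111_320  # rough conversion from degrees latitude to meters; good enough for a sketch

def tile_for(lat: float, lon: float) -> tuple[int, int]:
    """Snap a coordinate to a grid tile roughly TILE_METERS on a side."""
    meters_per_deg_lon = METERS_PER_DEG_LAT * math.cos(math.radians(lat))
    return (int(lat * METERS_PER_DEG_LAT // TILE_METERS),
            int(lon * meters_per_deg_lon // TILE_METERS))

def aggregate(points: list[tuple[float, float]]) -> dict[tuple[int, int], int]:
    """Count observations per tile and drop tiles below the privacy threshold."""
    counts = Counter(tile_for(lat, lon) for lat, lon in points)
    return {tile: n for tile, n in counts.items() if n >= MIN_COUNT}

# Example: a handful of (lat, lon) observations around one neighbourhood.
sample = [(41.3851, 2.1734)] * 12 + [(41.40, 2.19)] * 3
print(aggregate(sample))  # the 3-person tile is suppressed, the 12-person tile is reported
```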

UK government watchdog examining political use of data analytics


“Given the big data revolution, it is understandable that political campaigns are exploring the potential of advanced data analysis tools to help win votes,” Elizabeth Denham, the information commissioner, writes on the ICO’s blog. However, “the public have the right to expect” that this takes place in accordance with existing data protection laws, she adds.

Political parties are able to use Facebook to target voters with different messages, tailoring the advert to recipients based on their demographic. In the 2015 UK general election, the Conservative party spent £1.2 million on Facebook campaigns and the Labour party £16,000. It is expected that Labour will vastly increase that spend for the general election on 8 June….

Political parties and third-party companies are allowed to collect data from sites like Facebook and Twitter that lets them tailor these ads to broadly target different demographics. However, if those ads target identifiable individuals, they run afoul of the law….(More)”

How to increase public support for policy: understanding citizens’ perspectives


Peter van Wijck and Bert Niemeijer at LSE Blog: “To increase public support, it is essential to anticipate what reactions citizens will have to policy. But how to do that? Our framework combines insights from scenario planning and frame analysis. Scenario planning starts from the premise that we cannot predict the future. We can, however, imagine different plausible scenarios, different plausible future developments. Scenarios can be used to ask a ‘what if’ question. If a certain scenario were to develop, what policy measures would be required? By the same token, scenarios may be used as test conditions for policy measures. Kees van der Heijden calls this ‘wind tunnelling’.

Frame analysis is about how we interpret the world around us. Frames are mental structures that shape the way we see the world. Based on a frame, an individual perceives societal problems, attributes these problems to causes, and forms ideas on instruments to address the problems. Our central idea is that policy-makers may use citizens’ frames to reflect on their policy frame. Citizens’ frames may, in other words, be used as test conditions in a wind tunnel. The line of reasoning is summarized in the figure.

Policy frames versus citizens’ frames

[Figure: policy framing]

The starting points of the figure are the policy frame and the citizens’ frames. Arrows 1 and 2 indicate that citizens’ reactions depend on both frames. A citizen can be expected to respond positively in case of frame alignment. Negative responses can be expected if policy-makers do not address “the real problems”, do not attribute problems to “the real causes”, or do not select “adequate instruments”. If frames do not align, policy-makers are faced with the question of how to deal with it (arrow 3). First, they may reconsider the policy frame (arrow 4). That is, are there reasons to reconsider the definition of problems, the attribution to causes, and/or the selection of instruments? Such a “reframing” effectively amounts to the formulation of a new (or adjusted) policy frame. Second, policy-makers may try to influence citizens’ frames (arrow 5). This may lead to a change in what citizens define as problems, what they consider to be the causes of problems, and what they consider to be adequate instruments to deal with the problems.

Two cases: support for victims and confidence in the judiciary

To apply our framework in practice, we developed a three-step method. Firstly, we reconstruct the policy frame. Here we investigate what policy-makers see as social problems, what they assume to be the causes of these problems, and what they consider to be appropriate instruments to address these problems. Secondly, we reconstruct contrasting citizens’ frames. Here we use focus groups, where contrasting groups are selected based on a segmentation model. Finally, we engage in a “wind tunnelling exercise”. We present the citizens’ frames to policy-makers and ask them to reflect on the question of how the different groups can be expected to react to the policy measures selected by the policy-makers. In fact, this step is what Schön and Rein called “frame reflection”….(More)”.

ALTwitter


“ALTwitter” – as in “alternate Twitter” – is a collection of profiles of the Members of the European Parliament built from their Twitter metadata. In spite of the risks and challenges associated with the privacy of ineffectively regulated metadata, the beauty of metadata, which everyone should appreciate, lies in its brevity and flexibility.

When you navigate to the profiles of the members of the parliament listed below, you will notice that these profiles give the essence of their interaction with Twitter and the data that they generate there. Without going through all their tweets, one can learn the areas/topics they work on, the devices/mediums they use, the types of websites they refer to, their sleeping/activity patterns, etc. The amount of insight that can be derived from this metadata is indeed interesting. We intend to present such artifacts in a separate blog post soon.

This open source project is a part of the #hakunametadata series (with an earlier module on browsing metadata) and aims to educate about the immense amount of information contained in the metadata that we generate through our day-to-day internet activities. Every bit of data used for this project is from publicly available information on Twitter. Furthermore, this project will be updated periodically and automatically to track the changes.”…(More)”
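The project's own code is not reproduced in the excerpt, but the kind of profile it describes (activity patterns, posting clients, linked domains) can be derived from tweet metadata alone along these lines. The record fields below are assumed for illustration and are not taken from the ALTwitter source.

```python
from collections import Counter
from datetime import datetime
from urllib.parse import urlparse

# Hypothetical metadata records, shaped like a minimal Twitter export (fields assumed for illustration).
tweets = [
    {"created_at": "2017-06-12 07:45:00", "source": "Twitter for iPhone",
     "urls": ["https://www.europarl.europa.eu/news/en/some-press-release"]},
    {"created_at": "2017-06-12 22:10:00", "source": "Twitter Web Client", "urls": []},
    {"created_at": "2017-06-13 08:05:00", "source": "Twitter for iPhone",
     "urls": ["https://www.theguardian.com/world/some-article"]},
]

# Activity pattern: at which hours of the day does this account post?
hours = Counter(datetime.strptime(t["created_at"], "%Y-%m-%d %H:%M:%S").hour for t in tweets)

# Devices/clients: which apps does the account post from?
clients = Counter(t["source"] for t in tweets)

# Referred websites: which domains does the account link to?
domains = Counter(urlparse(u).netloc for t in tweets for u in t["urls"])

print("posting hours:", dict(hours))
print("clients:", dict(clients))
print("linked domains:", dict(domains))
```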

Going Digital: Restoring Trust In Government In Latin American Cities


Carlos Santiso at The Rockefeller Foundation Blog: “Driven by fast-paced technological innovations, an exponential growth of smartphones, and a daily stream of big data, the “digital revolution” is changing the way we live our lives. Nowhere are the changes more sweeping than in cities. In Latin America, almost 80 percent of the population lives in cities, where massive adoption of social media is enabling new forms of digital engagement. Technology is ubiquitous in cities. The expectations of Latin American “digital citizens” have grown exponentially as a result of a rising middle class and an increasingly connected youth.

This digital transformation is recasting the relation between states and citizens. Digital citizens are asking for better services, more transparency, and meaningful participation. Their rising expectations concern the quality of the services city governments ought to provide, but also the standards of integrity, responsiveness, and fairness of the bureaucracy in their daily dealings. A recent study shows that citizens’ satisfaction with public services is determined not only by the objective quality of the service, but also by their subjective expectations and how fairly they feel they are treated….

New technologies and data analytics are transforming the governance of cities. Digital-intensive and data-driven innovations are changing how city governments function and deliver services, and also enabling new forms of social participation and co-creation. New technologies help improve efficiency and further transparency through new modes of open innovation. Tech-enabled and citizen-driven innovations also facilitate participation through feedback loops from citizens to local authorities to identify and resolve failures in the delivery of public services.

Three structural trends are driving the digital revolution in governments.

  1. The digital transformation of the machinery of government. National and city governments in the region are developing digital strategies to increase connectivity, improve services, and enhance accountability. According to a recent report, 75 percent of the 23 countries surveyed have developed comprehensive digital strategies, such as Uruguay Digital, Colombia’s Vive Digital or Mexico’s Agenda Digital, that include legally recognized digital identification mechanisms. “Smart cities” are intensifying the use of modern technologies and improving the interoperability of government systems, the backbone of government, to ensure that public services are interconnected and thus avoid having citizens provide the same information to different entities. An important driver of this transformation is citizens’ demands for greater transparency and accountability in the delivery of public services. Sixteen countries in the region have developed open government strategies, and cities such as Buenos Aires in Argentina, La Libertad in Peru, and São Paulo in Brazil have also committed to opening up government to public scrutiny and new forms of social participation. This second wave of active transparency reforms follows a first, more passive wave that focused on facilitating access to information.
  2. The digital transformation of the interface with citizens. Sixty percent of the countries surveyed by the aforementioned report have established integrated service portals through which citizens can access online public services. Online portals allow for a single point of access to public services. Cities, such as Bogotá and Rio de Janeiro, are developing their own online service platforms to access municipal services. These innovations improve access to public services and contribute to simplifying bureaucratic processes and cutting red tape, as a recent study shows. Governments are resorting to crowdsourcing solutions, open intelligence initiatives, and digital apps to encourage active citizen participation in the improvement of public services and the prevention of corruption. Colombia’s Transparency Secretariat has developed an app that allows citizens to report “white elephants” — incomplete or overbilled public works. By the end of 2015, it identified 83 such white elephants, mainly in the capital Bogotá, for a total value of almost $500 million, which led to the initiation of criminal proceedings by law enforcement authorities. While many of these initiatives emerge from civic initiatives, local governments are increasingly encouraging them and adopting their own open innovation models to rethink public services.
  3. The gradual mainstreaming of social innovation in local government. Governments are increasingly resorting to public innovation labs to tackle difficult problems for citizens and businesses. Government innovation labs are helping address “wicked problems” by combining design thinking, crowdsourcing techniques, and data analytics tools. Chile, Colombia, Mexico, Brazil, and Uruguay have developed such social innovation labs within government structures. As a recent report notes, these mechanisms come in different forms and shapes. Large cities, such as Buenos Aires, Mexico City, Quito, Rio de Janeiro, and Montevideo, are at the forefront of testing such laboratory mechanisms and institutionalizing tech-driven and citizen-centered approaches through innovation labs. For example, in 2013, Mexico City created its Laboratorio para la Ciudad as a hub for civic innovation and urban creativity, relying on small-scale experiments and interventions to improve specific government services and make local government more transparent, responsive, and receptive. It spearheaded an open government law for the city that encourages residents to participate in the design of public policies and requires city agencies to consider those suggestions….(More)”.