UK’s National Data Strategy


DCMS (UK): “…With the increasing ascendance of data, it has become ever more important that the government removes the unnecessary barriers that prevent businesses and organisations from accessing such information.

The importance of data sharing was demonstrated during the first few months of the coronavirus pandemic, when government departments, local authorities, charities and the private sector came together to provide essential services. One notable example is the Vulnerable Person Service, which in a very short space of time enabled secure data-sharing across the public and private sectors to provide millions of food deliveries and access to priority supermarket delivery slots for clinically extremely vulnerable people.

Aggregation of data from different sources can also lead to new insights that otherwise would not have been possible. For example, the Connected Health Cities project anonymises and links data from different health and social care services, providing new insights into the way services are used.
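To make the linkage mechanics concrete, here is a minimal Python sketch of one common approach, keyed pseudonymisation of a shared identifier, with an invented key and an NHS-style number purely for illustration; it is not a description of the Connected Health Cities pipeline.

```python
import hashlib
import hmac

# Hypothetical sketch: a trusted linkage service derives a stable pseudonym
# from a shared identifier (e.g. an NHS number) with a keyed hash, so records
# from different services can be joined without exposing the identifier.
SECRET_KEY = b"held-only-by-the-trusted-linkage-service"

def pseudonymise(identifier: str) -> str:
    """Return a deterministic pseudonym for a record identifier."""
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Records from two services link on the pseudonym, never on the raw number.
health_record = {"pid": pseudonymise("943 476 5919"), "diagnosis": "asthma"}
social_record = {"pid": pseudonymise("943 476 5919"), "service": "home care"}
assert health_record["pid"] == social_record["pid"]
```

Because the key never leaves the linkage service, analysts who see only pseudonyms cannot recompute them from known identifiers, yet the same person’s records still join across datasets.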

Vitally, data sharing can also fuel growth and innovation.20 For new and innovative organisations, increasing data availability will mean that they, too, will be able to gain better insights from their work – from charities able to pool beneficiary data to better evaluate the effectiveness of interventions, to new entrants able to access new markets. Often this happens as part of commercial arrangements; in other instances government has sought to intervene where there are clear consumer benefits, such as in relation to Open Banking and Smart Data. Government has also invested in the research and development of new mechanisms for better data sharing, such as the Office for AI and Innovate UK’s partnership with the Open Data Institute to explore data trusts.21

However, our call for evidence, along with engagement with stakeholders, has identified a range of barriers to data availability, including:

  • a culture of risk aversion
  • issues with current licensing regulations
  • market barriers to greater re-use, including data hoarding and differential market power
  • inconsistent formatting of public sector data
  • issues pertaining to the discoverability of data
  • privacy and security concerns
  • the benefits of increased data sharing not always being felt by the organisation that incurs the costs of collection and maintenance

This is a complex environment, and heavy-handed intervention may have the unwanted effect of reducing incentives to collect, maintain and share data for the benefit of the UK. It is clear that any way forward must be carefully considered to avoid unintended negative consequences. There is a balance to be struck between maintaining appropriate commercial incentives to collect data and ensuring that data can be used widely for the benefit of the UK. For personal data, we must also take account of the balance between individual rights and public benefit.

This is a new issue for all digital economies that has come to the fore as data has become a significant modern economic asset. Our approach will take account of those incentives, and consider how innovation can overcome perceived barriers to availability. For example, access can be limited to users with specific characteristics, by licence or regulator accreditation; data can be shared within a collaborating group of organisations; and there may also be value in creating and sharing synthetic data to support research and innovation, as well as in other privacy-enhancing technologies and techniques….(More)”.

Synthetic data: Unlocking the power of data and skills for machine learning


Karen Walker at Gov.UK: “Defence generates and holds a lot of data. We want to be able to get the best out of it, unlocking new insights that aren’t currently visible, through the use of innovative data science and analytics techniques tailored to defence’s specific needs. But this can be difficult because our data is often sensitive for a variety of reasons. For example, this might include information about the performance of particular vehicles, or personnel’s operational deployment details.

It is therefore often challenging to share data with experts who sit outside the Ministry of Defence, particularly amongst the wider data science community in government, small companies and academia. The use of synthetic data gives us a way to address this challenge and to benefit from the expertise of a wider range of people by creating datasets which aren’t sensitive. We have recently published a report from this work….(More)”.

[Figure: original data and synthetic data plotted side by side in a 2D chart; the two look almost identical]
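As an illustration of why the two charts can look almost identical, here is a minimal sketch of one simple synthesis approach: fit a distribution to the sensitive data, then sample fresh records from it. This is a generic textbook method, not the technique used in the report.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Stand-in for a sensitive 2D dataset (e.g. two performance measurements).
real = rng.multivariate_normal(mean=[10.0, 3.0],
                               cov=[[2.0, 0.8], [0.8, 1.0]], size=1000)

# Fit: estimate the mean vector and covariance matrix from the real data.
mu, sigma = real.mean(axis=0), np.cov(real, rowvar=False)

# Sample: draw a synthetic dataset with the same joint distribution,
# but with no one-to-one correspondence to any real record.
synthetic = rng.multivariate_normal(mean=mu, cov=sigma, size=1000)

print(np.round(mu, 2))                      # statistics of the real data
print(np.round(synthetic.mean(axis=0), 2))  # near-identical in the synthetic data
```

Plotted side by side, the real and synthetic samples form the same cloud, which is the effect the figure above shows.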

Business-to-Business Data Sharing: An Economic and Legal Analysis


Paper by Bertin Martens et al: “The European Commission announced in its Data Strategy (2020) its intentions to propose an enabling legislative framework for the governance of common European data spaces, to review and operationalize data portability, to prioritize standardization activities and foster data interoperability and to clarify usage rights for co-generated IoT data. This Strategy starts from the premise that there is not enough data sharing and that much data remain locked up and are not available for innovative re-use. The Commission will also consider the adoption of a New Competition Tool, as well as ex ante regulation for large online gate-keeping platforms, as part of the announced Digital Services Act Package. In this context, the goal of this report is to examine the obstacles to Business-to-Business (B2B) data sharing: what keeps businesses from sharing or trading more of their data with other businesses, and what can be done about it? For this purpose, this report uses the well-known tools of legal and economic thinking about market failures. It starts from the economic characteristics of data and explores to what extent private B2B data markets result in a socially optimal degree of data sharing, or whether there are market failures in data markets that might justify public policy intervention.

It examines the conditions under which monopolistic data market failures may occur. It contrasts these welfare losses with the welfare gains from economies of scope in data aggregation in large pools. It also discusses other potential sources of B2B data market failures due to negative externalities, risks and transaction costs, and asymmetric information. As a next step, the paper explores solutions to overcome these market failures. Private third-party data intermediaries may be in a position to overcome market failures due to high transaction costs and risks. They can aggregate data in large pools to harvest the benefits of economies of scale and scope in data. Where third-party intervention fails, regulators can step in, with ex-post competition instruments and with ex-ante regulation. The latter includes data portability rights for personal data and mandatory data access rights….(More)”.

Privacy-Preserving Record Linkage in the context of a National Statistics Institute


Guidance by Rainer Schnell: “Linking existing administrative data sets on the same units is used increasingly as a research strategy in many different fields. Depending on the academic field, this kind of operation has been given different names, but in application areas, this approach is mostly denoted as record linkage. Although linking data on organisations or economic entities is common, the most interesting applications of record linkage concern data on persons. Starting in medicine, this approach is now also being used in the social sciences and official statistics. Furthermore, the joint use of survey data with administrative data is now standard practice. For example, victimisation surveys are linked to police records, labour force surveys are linked to social security databases, and censuses are linked to surveys.

Merging different databases containing information on the same unit is technically trivial if all involved databases have a common identification number, such as a social security number or, as in the Scandinavian countries, a permanent personal identification number. Most of the modern identification numbers contain checksum mechanisms so that errors in these identifiers can be easily detected and corrected. Due to the many advantages of permanent personal identification numbers, similar systems have been introduced or discussed in some European countries outside Scandinavia.
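As an illustration of such a mechanism, here is the Luhn check-digit algorithm in Python. It is the generic scheme used by many numbering systems; national identification numbers define their own formulas, so this is an example of the idea rather than any particular country’s check.

```python
def luhn_valid(number: str) -> bool:
    """Validate a numeric identifier with the Luhn checksum."""
    digits = [int(d) for d in number if d.isdigit()]
    # Double every second digit from the right; subtract 9 if the result
    # exceeds 9. A valid number's digit sum is then divisible by 10.
    for i in range(len(digits) - 2, -1, -2):
        doubled = digits[i] * 2
        digits[i] = doubled - 9 if doubled > 9 else doubled
    return sum(digits) % 10 == 0

print(luhn_valid("79927398713"))  # True: the checksum is consistent
print(luhn_valid("79927398714"))  # False: a single-digit error is detected
```

A scheme like this detects all single-digit errors and most transpositions of adjacent digits, which is why errors in such identifiers can be caught before linkage.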

In many jurisdictions, no permanent personal identification number is available for linkage. Examples are New Zealand, Australia, the UK, and Germany. Here, linkage is most often based on alphanumeric identifiers such as surname, first name, address, and place of birth. In the literature, such identifiers are most often denoted as indirect or quasi-identifiers. Such identifiers are prone to error, for example due to typographical errors, memory faults (previous addresses), different recordings of the same identifier (for example, swapping of substrings: reversal of first name and last name), deliberately false information (for example, year of birth) or changes of values over time (for example, name changes due to marriage). Linking on exact matching information therefore yields only a non-randomly selected subset of records.

Furthermore, the quality of identifying information in databases containing only indirect identifiers is much lower than usually expected. Error rates in excess of 20% (one record in five containing incomplete or erroneous identifiers) are encountered in practice….(More)”.
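When only error-prone quasi-identifiers are available, privacy-preserving record linkage can compare encoded versions of the identifiers instead of cleartext names. Below is a toy Python sketch of the Bloom-filter encoding associated with Schnell and colleagues’ work: character bigrams of a name are hashed to bit positions, and two encodings are compared with the Dice coefficient, so small spelling variations still score as similar. (Heavily simplified; production systems use keyed hash functions, fixed-length bit arrays and carefully chosen parameters.)

```python
import hashlib

def bloom_encode(name: str, size: int = 64, num_hashes: int = 4) -> set:
    """Hash a name's character bigrams into a set of Bloom-filter bit positions."""
    padded = f"_{name.lower()}_"
    bigrams = {padded[i:i + 2] for i in range(len(padded) - 1)}
    bits = set()
    for gram in bigrams:
        for k in range(num_hashes):  # k distinct hash functions via a salt
            digest = hashlib.sha256(f"{k}{gram}".encode()).hexdigest()
            bits.add(int(digest, 16) % size)
    return bits

def dice(a: set, b: set) -> float:
    """Dice similarity of two encodings: 1.0 means identical bit sets."""
    return 2 * len(a & b) / (len(a) + len(b))

# Spelling variants of the same surname remain similar after encoding;
# unrelated names do not.
print(dice(bloom_encode("meier"), bloom_encode("meyer")))    # high
print(dice(bloom_encode("meier"), bloom_encode("schmidt")))  # low
```

The linkage unit sees only bit patterns, never names, yet can still compute similarity scores and classify record pairs as matches or non-matches.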

Applying new models of data stewardship to health and care data


Report by the Open Data Institute: “The outbreak of the coronavirus (Covid-19) has amplified and accelerated the need for an effective technology ecosystem that benefits everyone’s health.

The pandemic has been accompanied by a marked increase in the use of digital technology, including the introduction of remote consultation in general practice, new data flows to support the distribution of food and other essentials, and applications to support digital contact tracing.

This report explores models of ‘data stewardship’ (the collection, maintenance and sharing of data) required to enable better evaluation. It argues that everybody involved in technology has a shared responsibility to enable evaluation, whether that means innovators sharing data for evaluation purposes, or healthcare providers being clearer, from the outset, about what data is needed to support effective evaluation.

This report re-envisages the role of evaluators as data stewards, who could use their positions as intermediaries to encourage stakeholders to share data, and help increase access to data for public benefit…(More)”.

EU risks being dethroned as world’s lead digital regulator


Marietje Schaake at the Financial Times: “With a series of executive orders, US president Donald Trump has quickly changed the digital regulatory game. His administration has adopted unprecedented sanctions against the Chinese technology group Huawei; next on the list of likely targets is the Chinese ecommerce group Alibaba.

The TikTok takeover saga continues, since the president this month ordered the sale of its US operations within 90 days. The administration’s Clean Network programme also claims to protect privacy by keeping “unsafe” companies out of US cable, cloud and app infrastructure. Engaging with a shared privacy agenda, which the EU has enshrined in law, would be a constructive step.

Instead, US secretary of state Mike Pompeo has prioritised warnings about the dangers posed by Huawei to individual EU member states during a recent visit. Yet these unilateral American actions also highlight weaknesses in Europe’s own preparedness and unity on issues of national security in the digital world. Beyond emphasising fundamental rights and economic rules, Europe must move fast if it does not want to see other global actors draw the road maps of regulation.

Recent years have seen the acceleration of national security arguments to restrict market access for global technology companies. Decisions on bans and sanctions tend to rely on the type of executive power that the EU lacks, especially in the national security domain. The bloc has never fully developed a common security policy — and deliberately so. In its white paper on artificial intelligence, the European Commission explicitly omits AI in the military context, and European geopolitical clout remains underused by politicians keen to advance their national postures.

Tensions between the promise of a digital single market and the absence of a common approach to security were revealed in fragmented responses to 5G concerns, as well as foreign acquisitions of strategic tech companies. This ad hoc policy toolbox may well prove inadequate to build the co-ordination needed for a forceful European strategy. The US tussle with TikTok and Huawei should be a lesson to European politicians on their approach to regulating tech.

A confident Europe might argue that concerns about terabytes of the most intimate information being shared with foreign companies were promptly met with the EU’s general data protection regulations. A more critical voice would counter that Europe does not appreciate the risks of integrating Chinese tech into 5G networks, and that its narrow focus on fundamental rights and market regulations in the digital world was always naive.

Either way, now that geopolitics is integrating with tech policy, the EU risks being dethroned as the lead regulator of the digital world. In many ways it is remarkable that a reckoning took this long. For decades, online products and services have evaded restrictions on their reach into global communities. But the long-anticipated collision of geopolitics and technological disruption is finally here. It will do significant collateral damage to the open internet.

The challenge for democracies is to preserve their own core values and interests, along with the benefits of an open, global internet. A series of nationalistic bans and restrictions will not achieve these goals. Instead it will unleash a digital trade war at the expense of internet users worldwide…(More)”.

Personal data, public data, privacy & power: GDPR & company data


OpenCorporates: “…there are three other aspects which are relevant when talking about access to EU company data.

Cargo-culting GDPR

The first is a tendency to take the complex and subtle legislation that is GDPR and use a poorly understood version of it in other legislation and regulation, even if that regulation is already covered by GDPR. This undermines the GDPR regime, prevents it from working effectively, and should be strongly resisted. In the tech world, such approaches are called ‘cargo-culting’.

Similarly, GDPR is often used as an excuse for not releasing company information as open data, even when the same data is being sold to third parties, apparently without concerns — if one is covered by GDPR, the other certainly should be.

Widened power asymmetries

The second issue is the unintended consequences of GDPR, specifically the way it increases asymmetries of power and agency. For example, something like the so-called Right To Be Forgotten takes very significant resources to implement, and so actually strengthens the position of the giant tech companies — for such companies, investing millions in large teams to decide who should and should not be given the Right To Be Forgotten is just a relatively small cost of doing business.

Another issue is the growth of a whole new industry dedicated to removing traces of people’s past from the internet (2), which is also increasing the asymmetries of power. The vast majority of people are not directors of companies, or beneficial owners, and it is only the relatively rich and powerful (including politicians and criminals) who can afford lawyers to stifle free speech, or remove parts of their past they would rather not be there, from business failures to associations with criminals.

OpenCorporates, for example, was threatened with a lawsuit by a member of one of the wealthiest families in Europe for reproducing a notice from the Luxembourg official gazette (a publication that contains public notices). We refused to back down, believing we had a good case in law and in the public interest, and the other side gave up. But such so-called SLAPP suits are becoming increasingly common, and, unlike in many US states, there are currently no defences in place in the EU to resist them, despite pressure from civil society to address this….

At the same time, the automatic assumption that all Personally Identifiable Information (PII), someone’s name for example, is private is highly problematic, confusing both citizens and policy makers, and further undermining democracies and fair societies. As an obvious case, it’s critical that we know the names of our elected representatives, and those in positions of power, otherwise we would have an opaque society where decisions are made by nameless individuals with opaque agendas and personal interests — such as a leader awarding a contract to their brother’s company, for example.

As the diagram below illustrates, there is some personally identifiable information that it’s strongly in the public interest to know. Take the director or beneficial owner of a company: of course their details are PII — clearly you need to know their name (and other information too), otherwise what do you actually know about them, or the company (only that some unnamed individual has been given special protection under law to be shielded from the company’s debts and actions, and yet can benefit from its profits)?

On the other hand, much of the data which is truly about our privacy — the profiles, inferences and scores that companies store on us — is explicitly outside GDPR, if it doesn’t contain PII.

[Diagram: examples of personally identifiable information that it is strongly in the public interest to know]

Hopefully, as awareness of these issues increases, we will develop a more nuanced, deeper understanding of privacy, such that case law around GDPR, and successors to this legislation, begin to rebalance it and bring clarity to its ambiguities….(More)”.

‘Telegram revolution’: App helps drive Belarus protests


Daria Litvinova at AP News: “Every day, like clockwork, to-do lists for those protesting against Belarus’ authoritarian leader appear in the popular Telegram messaging app. They lay out goals, give times and locations of rallies with business-like precision, and offer spirited encouragement.

“Today will be one more important day in the fight for our freedom. Tectonic shifts are happening on all fronts, so it’s important not to slow down,” a message in one of Telegram’s so-called channels read Tuesday. “Morning. Expanding the strike … 11:00. Supporting the Kupala (theater) … 19:00. Gathering at the Independence Square.”

The app has become an indispensable tool in coordinating the unprecedented mass protests that have rocked Belarus since Aug. 9, when election officials announced President Alexander Lukashenko had won a landslide victory to extend his 26-year rule in a vote widely seen as rigged.

Peaceful protesters who poured into the streets of the capital, Minsk, and other cities were met with stun grenades, rubber bullets and beatings from police. The opposition candidate left for Lithuania — under duress, her campaign said — and authorities shut off the internet, leaving Belarusians with almost no access to independent online news outlets or social media and protesters seemingly without a leader.

That’s where Telegram — which often remains available despite internet outages, touts the security of messages shared in the app and has been used in other protest movements — came in. Some of its channels helped scattered rallies to mature into well-coordinated action.

The people who run the channels, which used to offer political news, now post updates, videos and photos of the unfolding turmoil sent in from users, locations of heavy police presence, contacts of human rights activists, and outright calls for new demonstrations — something Belarusian opposition leaders have refrained from doing publicly themselves. Tens of thousands of people all across the country have responded to those calls.

In a matter of days, the channels — NEXTA, NEXTA Live and Belarus of the Brain are the most popular — have become the main method for facilitating the protests, said Franak Viacorka, a Belarusian analyst and non-resident fellow at the Atlantic Council….(More)”.

Health Data Privacy under the GDPR: Big Data Challenges and Regulatory Responses


Book edited by Maria Tzanou: “The growth of data-collecting goods and services, such as ehealth and mhealth apps, smart watches, mobile fitness and dieting apps, electronic skin and ingestible tech, combined with recent technological developments such as increased data storage capacity, artificial intelligence and smart algorithms, has spawned a big data revolution that has reshaped how we understand and approach health data. Recently the COVID-19 pandemic has foregrounded a variety of data privacy issues. The collection, storage, sharing and analysis of health-related data raise major legal and ethical questions relating to privacy, data protection, profiling, discrimination, surveillance, personal autonomy and dignity.

This book examines health privacy questions in light of the GDPR and the EU’s general data privacy legal framework. The GDPR is a complex and evolving body of law that aims to deal with several technological and societal health data privacy problems, while safeguarding public health interests and addressing its internal gaps and uncertainties. The book answers a diverse range of questions including: What role can the GDPR play in regulating health surveillance and big (health) data analytics? Can it catch up with Internet-age developments? Are the solutions to the challenges posed by big health data to be found in the law? Does the GDPR provide adequate tools and mechanisms to ensure public health objectives and the effective protection of privacy? How does the GDPR deal with data that concern children’s health and academic research?

By analysing a number of diverse questions concerning big health data under the GDPR from various perspectives, this book will appeal to those interested in privacy, data protection, big data, health sciences, information technology, the GDPR, EU and human rights law….(More)”.

Blame the politicians, not the technology, for A-level fiasco


The Editorial Board at the Financial Times: “The soundtrack of school students marching through Britain’s streets shouting “f*** the algorithm” captured the sense of outrage surrounding the botched awarding of A-level exam grades this year. But the students’ anger towards a disembodied computer algorithm is misplaced. This was a human failure. The algorithm used to “moderate” teacher-assessed grades had no agency and delivered exactly what it was designed to do.

It is politicians and educational officials who are responsible for the government’s latest fiasco and should be the target of students’ criticism….

Sensibly designed computer algorithms could have been used to moderate teacher assessments in a constructive way. Using past school performance data, they could have highlighted anomalies in the distribution of predicted grades between and within schools. That could have led to a dialogue between Ofqual, the exam regulator, and anomalous schools to come up with more realistic assessments….
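A minimal sketch, with invented grade-point data, of the screening step the editorial describes: flag schools whose predicted mean grade departs sharply from their own recent history, and refer them for dialogue rather than adjusting grades automatically.

```python
from statistics import mean, stdev

# Hypothetical data: mean grade points per school (A* = 6 ... U = 0)
# for the past three years, and this year's teacher-predicted mean.
historic = {"School A": [3.1, 3.0, 3.2], "School B": [2.4, 2.5, 2.3]}
predicted = {"School A": 3.3, "School B": 4.1}

for school, past in historic.items():
    mu, sd = mean(past), stdev(past)
    z = (predicted[school] - mu) / sd  # standardised deviation from history
    if abs(z) > 3:                     # anomaly threshold for human review
        print(f"{school}: predicted {predicted[school]} vs historic "
              f"{mu:.2f} (z = {z:.1f}), refer for dialogue")
```

The point is the workflow, not the statistics: an anomaly triggers a conversation with the school, and no grade is changed by the algorithm itself.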

There are broader lessons to be drawn from the government’s algo fiasco about the dangers of automated decision-making systems. The inappropriate use of such systems to assess immigration status, policing policies and prison sentencing decisions is a live danger. In the private sector, incomplete and partial data sets can also significantly disadvantage under-represented groups when it comes to hiring decisions and performance measures.

Given the severe erosion of public trust in the government’s use of technology, it might now be advisable to subject all automated decision-making systems to critical scrutiny by independent experts. The Royal Statistical Society and The Alan Turing Institute certainly have the expertise to give a Kitemark of approval or flag concerns.

As ever, technology in itself is neither good nor bad. But it is certainly not neutral. The more we deploy automated decision-making systems, the smarter we must become in considering how best to use them and in scrutinising their outcomes. We often talk about a deficit of trust in our societies. But we should also be aware of the dangers of over-trusting technology. That may be a good essay subject for next year’s philosophy A-level….(More)”.