Synthetic data: Unlocking the power of data and skills for machine learning


Karen Walker at Gov.UK: “Defence generates and holds a lot of data. We want to be able to get the best out of it, unlocking new insights that aren’t currently visible, through the use of innovative data science and analytics techniques tailored to defence’s specific needs. But this can be difficult because our data is often sensitive for a variety of reasons. For example, this might include information about the performance of particular vehicles, or personnel’s operational deployment details.

It is therefore often challenging to share data with experts who sit outside the Ministry of Defence, particularly amongst the wider data science community in government, small companies and academia. The use of synthetic data gives us a way to address this challenge and to benefit from the expertise of a wider range of people by creating datasets which aren’t sensitive. We have recently published a report from this work….(More)”.

Figure: original data and synthetic data plotted side by side in a 2D chart; the two look almost identical.
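
The figure gives the intuition. As a toy sketch of the underlying idea (a plain Gaussian model, assumed here purely for illustration and certainly far simpler than the methods in the MOD report), synthetic data can be generated by fitting a distribution to the sensitive records and then sampling entirely artificial records from the fit:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a sensitive dataset: 1,000 records with two correlated columns.
original = rng.multivariate_normal(mean=[50.0, 120.0],
                                   cov=[[25.0, 18.0], [18.0, 36.0]],
                                   size=1000)

# "Fit" a simple parametric model: the empirical mean vector and covariance.
mu = original.mean(axis=0)
sigma = np.cov(original, rowvar=False)

# Sample a brand-new dataset from the fitted model. No synthetic row maps to
# a real record, but means, variances and correlations are preserved, which
# is why the two scatter plots look almost identical.
synthetic = rng.multivariate_normal(mean=mu, cov=sigma, size=1000)

print(np.corrcoef(original, rowvar=False)[0, 1])   # correlation in the original
print(np.corrcoef(synthetic, rowvar=False)[0, 1])  # nearly the same, synthetically
```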

Business-to-Business Data Sharing: An Economic and Legal Analysis


Paper by Bertin Martens et al: “The European Commission announced in its Data Strategy (2020) its intentions to propose an enabling legislative framework for the governance of common European data spaces, to review and operationalize data portability, to prioritize standardization activities and foster data interoperability, and to clarify usage rights for co-generated IoT data. This Strategy starts from the premise that there is not enough data sharing and that much data remains locked up, unavailable for innovative re-use. The Commission will also consider the adoption of a New Competition Tool, as well as ex ante regulation for large online gate-keeping platforms, as part of the announced Digital Services Act Package. In this context, the goal of this report is to examine the obstacles to Business-to-Business (B2B) data sharing: what keeps businesses from sharing or trading more of their data with other businesses, and what can be done about it? For this purpose, this report uses the well-known tools of legal and economic thinking about market failures. It starts from the economic characteristics of data and explores to what extent private B2B data markets result in a socially optimal degree of data sharing, or whether there are market failures in data markets that might justify public policy intervention.

It examines the conditions under which monopolistic data market failures may occur, and contrasts these welfare losses with the welfare gains from economies of scope in aggregating data into large pools. It also discusses other potential sources of B2B data market failures: negative externalities, risks and transaction costs, and asymmetric information. The paper then explores solutions to overcome these market failures. Private third-party data intermediaries may be in a position to overcome failures caused by high transaction costs and risks; they can aggregate data in large pools to harvest the benefits of economies of scale and scope. Where third-party intervention fails, regulators can step in, with ex post competition instruments and with ex ante regulation. The latter includes data portability rights for personal data and mandatory data access rights….(More)”.

Privacy-Preserving Record Linkage in the context of a National Statistics Institute


Guidance by Rainer Schnell: “Linking existing administrative data sets on the same units is used increasingly as a research strategy in many different fields. Depending on the academic field, this kind of operation has been given different names, but in application areas, this approach is mostly denoted as record linkage. Although linking data on organisations or economic entities is common, the most interesting applications of record linkage concern data on persons. Starting in medicine, this approach is now also being used in the social sciences and official statistics. Furthermore, the joint use of survey data with administrative data is now standard practice. For example, victimisation surveys are linked to police records, labour force surveys are linked to social security databases, and censuses are linked to surveys.

Merging different databases containing information on the same unit is technically trivial if all the databases involved share a common identification number, such as a social security number or, as in the Scandinavian countries, a permanent personal identification number. Most modern identification numbers include a checksum mechanism, so that errors in these identifiers can be easily detected and corrected. Due to the many advantages of permanent personal identification numbers, similar systems have been introduced or discussed in some European countries outside Scandinavia.
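
As an illustration of how such a checksum works (a generic mod-11 scheme of the kind used by ISBN-10 and several national registries; real national ID schemes differ in their weights and alphabets):

```python
def mod11_check_digit(payload: str) -> str:
    """Mod-11 check digit: weight the digits 2, 3, 4, ... from the right."""
    total = sum(int(d) * w for w, d in enumerate(reversed(payload), start=2))
    remainder = total % 11
    return "X" if remainder == 1 else str((11 - remainder) % 11)

def is_valid(number: str) -> bool:
    """Valid when the last character matches the recomputed check digit."""
    payload, check = number[:-1], number[-1]
    return mod11_check_digit(payload) == check

print(mod11_check_digit("12345"))  # "5", so "123455" carries a valid check digit
print(is_valid("123455"))          # True
print(is_valid("123465"))          # False: a single-digit error is detected
```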

In many jurisdictions, no permanent personal identification number is available for linkage; examples are New Zealand, Australia, the UK, and Germany. Here, linkage is most often based on alphanumeric identifiers such as surname, first name, address, and place of birth. In the literature, such identifiers are usually denoted as indirect or quasi-identifiers. They are prone to error, for example due to typographical mistakes, memory faults (such as previous addresses), different recordings of the same identifier (for example, swapped substrings: a reversal of first name and last name), deliberately false information (for example, year of birth), or changes of values over time (for example, name changes due to marriage). Linking on exactly matching information therefore yields only a non-randomly selected subset of records.

Furthermore, the quality of identifying information in databases containing only indirect identifiers is much lower than usually expected. Error rates in excess of 20% of records containing incomplete or erroneous identifiers are encountered in practice….(More)”.
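
The excerpt does not spell out the privacy-preserving step itself, but one widely used family of PPRL techniques encodes quasi-identifiers as Bloom filters built from character q-grams, so that two databases can compare records for similarity without exchanging names in the clear. A minimal sketch follows (parameters are illustrative; production systems use keyed HMACs and carefully tuned filter sizes):

```python
import hashlib

def qgrams(value, q=2):
    """Split a padded, lowercased identifier into overlapping q-grams."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}

def bloom_encode(value, m=256, k=10):
    """Hash every q-gram k times into an m-bit Bloom filter (held in an int)."""
    bits = 0
    for gram in qgrams(value):
        for seed in range(k):
            digest = hashlib.sha256(f"{seed}:{gram}".encode()).hexdigest()
            bits |= 1 << (int(digest, 16) % m)
    return bits

def dice_similarity(a, b):
    """Dice coefficient on the set bits of two filters."""
    inter = bin(a & b).count("1")
    total = bin(a).count("1") + bin(b).count("1")
    return 2 * inter / total if total else 0.0

# A typo changes only a couple of q-grams, so the similarity stays high and
# the pair can still be linked, without either side revealing the raw name.
print(dice_similarity(bloom_encode("John Smith"), bloom_encode("Jonh Smith")))
```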

Applying new models of data stewardship to health and care data


Report by the Open Data Institute: “The outbreak of the coronavirus (Covid-19) has amplified and accelerated the need for an effective technology ecosystem that benefits everyone’s health.

The pandemic has been accompanied by a marked increase in the use of digital technology, including the introduction of remote consultation in general practice, new data flows to support the distribution of food and other essentials, and applications to support digital contact tracing.

This report explores models of ‘data stewardship’ (the collection, maintenance and sharing of data) required to enable better evaluation. It argues everybody involved in technology has a shared responsibility to enable evaluation, whether that means innovators sharing data for evaluation purposes, or healthcare providers being clearer, from the outset, about what data is needed to support effective evaluation.

This report re-envisages the role of evaluators as data stewards, who could use their positions as intermediaries to encourage stakeholders to share data, and help increase access to data for public benefit…(More)”.

EU risks being dethroned as world’s lead digital regulator


Marietje Schaake at the Financial Times: “With a series of executive orders, US president Donald Trump has quickly changed the digital regulatory game. His administration has adopted unprecedented sanctions against the Chinese technology group Huawei; next on the list of likely targets is the Chinese ecommerce group Alibaba.

The TikTok takeover saga continues, since the president this month ordered the sale of its US operations within 90 days. The administration’s Clean Network programme also claims to protect privacy by keeping “unsafe” companies out of US cable, cloud and app infrastructure. Engaging with a shared privacy agenda, which the EU has enshrined in law, would be a constructive step.

Instead, US secretary of state Mike Pompeo has prioritised warnings about the dangers posed by Huawei to individual EU member states during a recent visit. Yet these unilateral American actions also highlight weaknesses in Europe’s own preparedness and unity on issues of national security in the digital world. Beyond emphasising fundamental rights and economic rules, Europe must move fast if it does not want to see other global actors draw the road maps of regulation.

Recent years have seen the acceleration of national security arguments to restrict market access for global technology companies. Decisions on bans and sanctions tend to rely on the type of executive power that the EU lacks, especially in the national security domain. The bloc has never fully developed a common security policy — and deliberately so. In its white paper on artificial intelligence, the European Commission explicitly omits AI in the military context, and European geopolitical clout remains underused by politicians keen to advance their national postures.

Tensions between the promise of a digital single market and the absence of a common approach to security were revealed in fragmented responses to 5G concerns, as well as foreign acquisitions of strategic tech companies. This ad hoc policy toolbox may well prove inadequate to build the co-ordination needed for a forceful European strategy. The US tussle with TikTok and Huawei should be a lesson to European politicians on their approach to regulating tech.

A confident Europe might argue that concerns about terabytes of the most intimate information being shared with foreign companies were promptly met with the EU’s General Data Protection Regulation. A more critical voice would counter that Europe does not appreciate the risks of integrating Chinese tech into 5G networks, and that its narrow focus on fundamental rights and market regulations in the digital world was always naive.

Either way, now that geopolitics is integrating with tech policy, the EU risks being dethroned as the lead regulator of the digital world. In many ways it is remarkable that a reckoning took this long. For decades, online products and services have evaded restrictions on their reach into global communities. But the long-anticipated collision of geopolitics and technological disruption is finally here. It will do significant collateral damage to the open internet.

The challenge for democracies is to preserve their own core values and interests, along with the benefits of an open, global internet. A series of nationalistic bans and restrictions will not achieve these goals. Instead it will unleash a digital trade war at the expense of internet users worldwide…(More)”.

Personal data, public data, privacy & power: GDPR & company data


OpenCorporates: “…there are three other aspects which are relevant when talking about access to EU company data.

Cargo-culting GDPR

The first is a tendency to take the complex and subtle legislation that is GDPR and reproduce a poorly understood version of it in other legislation and regulation, even where the area is already covered by GDPR. This undermines the GDPR regime, prevents it from working effectively, and should be strongly resisted. In the tech world, such approaches are called ‘cargo-culting’.

Similarly, GDPR is often used as an excuse for not releasing company information as open data, even when the same data is being sold to third parties apparently without concern — if one is covered by GDPR, the other certainly should be.

Widened power asymmetries

The second issue is the unintended consequences of GDPR, specifically the way it increases asymmetries of power and agency. For example, something like the so-called Right To Be Forgotten takes very significant resources to implement, and so actually strengthens the position of the giant tech companies — for such companies, investing millions in large teams to decide who should and should not be given the Right To Be Forgotten is just a relatively small cost of doing business.

Another issue is the growth of a whole new industry dedicated to removing traces of people’s past from the internet, which is also increasing the asymmetries of power. The vast majority of people are not directors of companies, or beneficial owners, and it is only the relatively rich and powerful (including politicians and criminals) who can afford lawyers to stifle free speech, or remove parts of their past they would rather not be there, from business failures to associations with criminals.

OpenCorporates, for example, was threatened with a lawsuit by a member of one of the wealthiest families in Europe for reproducing a notice from the Luxembourg official gazette (a publication that contains public notices). We refused to back down, believing we had a good case in law and in the public interest, and the other side gave up. But such so-called SLAPP suits are becoming increasingly common, and unlike many US states the EU currently has no defences in place to resist them, despite pressure from civil society to address this….

At the same time, the automatic assumption that all Personally Identifiable Information (PII), someone’s name for example, is private is highly problematic, confusing both citizens and policy makers, and further undermining democracies and fair societies. As an obvious case, it’s critical that we know the names of our elected representatives, and those in positions of power, otherwise we would have an opaque society where decisions are made by nameless individuals with opaque agendas and personal interests — such as a leader awarding a contract to their brother’s company, for example.

As the diagram below illustrates, there is some personally identifiable information that it’s strongly in the public interest to know. Take the director or beneficial owner of a company, for example: of course their details are PII — clearly you need to know their name (and other information too), otherwise what do you actually know about them, or the company (only that some unnamed individual has been given special protection under law to be shielded from the company’s debts and actions, and yet can benefit from its profits)?

On the other hand, much of the data which is truly about our privacy — the profiles, inferences and scores that companies store on us — is explicitly outside GDPR, if it doesn’t contain PII.

Diagram: the spectrum of personally identifiable information, from purely private data to details, such as company directors’ names, that are strongly in the public interest to know.

Hopefully, as awareness of the issues increases, we will develop a more nuanced, deeper understanding of privacy, such that case law starts to bring clarity to the ambiguities of the GDPR, and successors to this legislation begin to rebalance it….(More)”.

‘Telegram revolution’: App helps drive Belarus protests


Daria Litvinova at AP News: “Every day, like clockwork, to-do lists for those protesting against Belarus’ authoritarian leader appear in the popular Telegram messaging app. They lay out goals, give times and locations of rallies with business-like precision, and offer spirited encouragement.

“Today will be one more important day in the fight for our freedom. Tectonic shifts are happening on all fronts, so it’s important not to slow down,” a message in one of Telegram’s so-called channels read Tuesday. “Morning. Expanding the strike … 11:00. Supporting the Kupala (theater) … 19:00. Gathering at the Independence Square.”

The app has become an indispensable tool in coordinating the unprecedented mass protests that have rocked Belarus since Aug. 9, when election officials announced President Alexander Lukashenko had won a landslide victory to extend his 26-year rule in a vote widely seen as rigged.

Peaceful protesters who poured into the streets of the capital, Minsk, and other cities were met with stun grenades, rubber bullets and beatings from police. The opposition candidate left for Lithuania — under duress, her campaign said — and authorities shut off the internet, leaving Belarusians with almost no access to independent online news outlets or social media and protesters seemingly without a leader.

That’s where Telegram — which often remains available despite internet outages, touts the security of messages shared in the app and has been used in other protest movements — came in. Some of its channels helped scattered rallies to mature into well-coordinated action.

The people who run the channels, which used to offer political news, now post updates, videos and photos of the unfolding turmoil sent in from users, locations of heavy police presence, contacts of human rights activists, and outright calls for new demonstrations — something Belarusian opposition leaders have refrained from doing publicly themselves. Tens of thousands of people all across the country have responded to those calls.

In a matter of days, the channels — NEXTA, NEXTA Live and Belarus of the Brain are the most popular — have become the main method for facilitating the protests, said Franak Viacorka, a Belarusian analyst and non-resident fellow at the Atlantic Council….(More)”.

Health Data Privacy under the GDPR: Big Data Challenges and Regulatory Responses


Book edited by Maria Tzanou: “The growth of data-collecting goods and services, such as ehealth and mhealth apps, smart watches, mobile fitness and dieting apps, electronic skin and ingestible tech, combined with recent technological developments such as increased data storage capacity, artificial intelligence and smart algorithms, has spawned a big data revolution that has reshaped how we understand and approach health data. Recently the COVID-19 pandemic has foregrounded a variety of data privacy issues. The collection, storage, sharing and analysis of health-related data raises major legal and ethical questions relating to privacy, data protection, profiling, discrimination, surveillance, personal autonomy and dignity.

This book examines health privacy questions in light of the GDPR and the EU’s general data privacy legal framework. The GDPR is a complex and evolving body of law that aims to deal with several technological and societal health data privacy problems, while safeguarding public health interests and addressing its internal gaps and uncertainties. The book answers a diverse range of questions, including: What role can the GDPR play in regulating health surveillance and big (health) data analytics? Can it catch up with Internet-age developments? Are the solutions to the challenges posed by big health data to be found in the law? Does the GDPR provide adequate tools and mechanisms to ensure public health objectives and the effective protection of privacy? How does the GDPR deal with data that concern children’s health and academic research?

By analysing a number of diverse questions concerning big health data under the GDPR from various perspectives, this book will appeal to those interested in privacy, data protection, big data, health sciences, information technology, the GDPR, EU and human rights law….(More)”.

Blame the politicians, not the technology, for A-level fiasco


The Editorial Board at the Financial Times: “The soundtrack of school students marching through Britain’s streets shouting “f*** the algorithm” captured the sense of outrage surrounding the botched awarding of A-level exam grades this year. But the students’ anger towards a disembodied computer algorithm is misplaced. This was a human failure. The algorithm used to “moderate” teacher-assessed grades had no agency and delivered exactly what it was designed to do.

It is politicians and educational officials who are responsible for the government’s latest fiasco and should be the target of students’ criticism….

Sensibly designed, computer algorithms could have been used to moderate teacher assessments in a constructive way. Using past school performance data, they could have highlighted anomalies in the distribution of predicted grades between and within schools. That could have led to a dialogue between Ofqual, the exam regulator, and anomalous schools to come up with more realistic assessments….
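
As a toy illustration of that kind of screening (hypothetical figures, emphatically not Ofqual's actual model), a moderation step might flag schools whose teacher-predicted grades drift unusually far from their own recent history:

```python
import statistics

# Hypothetical inputs: each school's mean predicted grade this year and its
# mean achieved grade over the previous three years (on an 8-point scale).
predicted = {"School A": 6.1, "School B": 5.2, "School C": 7.4}
historical = {"School A": 5.9, "School B": 5.1, "School C": 5.0}

# Flag schools whose prediction drifts from their own history by much more
# than is typical across schools: a prompt for dialogue, not a penalty.
drifts = {s: predicted[s] - historical[s] for s in predicted}
mean_drift = statistics.mean(drifts.values())
sd_drift = statistics.stdev(drifts.values())

for school, drift in drifts.items():
    z = (drift - mean_drift) / sd_drift
    if abs(z) > 1.0:  # the threshold would be tuned on real data
        print(f"{school}: drift {drift:+.1f} vs own history (z={z:+.2f}) -> review")
```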

There are broader lessons to be drawn from the government’s algo fiasco about the dangers of automated decision-making systems. The inappropriate use of such systems to assess immigration status, policing policies and prison sentencing decisions is a live danger. In the private sector, incomplete and partial data sets can also significantly disadvantage under-represented groups when it comes to hiring decisions and performance measures.

Given the severe erosion of public trust in the government’s use of technology, it might now be advisable to subject all automated decision-making systems to critical scrutiny by independent experts. The Royal Statistical Society and The Alan Turing Institute certainly have the expertise to give a Kitemark of approval or flag concerns.

As ever, technology in itself is neither good nor bad. But it is certainly not neutral. The more we deploy automated decision-making systems, the smarter we must become in considering how best to use them and in scrutinising their outcomes. We often talk about a deficit of trust in our societies. But we should also be aware of the dangers of over-trusting technology. That may be a good essay subject for next year’s philosophy A-level….(More)”.

The EU is launching a market for personal data. Here’s what that means for privacy.


Anna Artyushina at MIT Tech Review: “The European Union has long been a trendsetter in privacy regulation. Its General Data Protection Regulation (GDPR) and stringent antitrust laws have inspired new legislation around the world. For decades, the EU has codified protections on personal data and fought against what it viewed as commercial exploitation of private information, proudly positioning its regulations in contrast to the light-touch privacy policies in the United States.

The new European data governance strategy (pdf) takes a fundamentally different approach. With it, the EU will become an active player in facilitating the use and monetization of its citizens’ personal data. Unveiled by the European Commission in February 2020, the strategy outlines policy measures and investments to be rolled out in the next five years.

This new strategy represents a radical shift in the EU’s focus, from protecting individual privacy to promoting data sharing as a civic duty. Specifically, it will create a pan-European market for personal data through a mechanism called a data trust. A data trust is a steward that manages people’s data on their behalf and has fiduciary duties toward its clients.
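
As a rough sketch of that mechanism (the class, names and rules below are illustrative assumptions, not the Commission's design), a data trust can be modelled as an intermediary that runs queries over consenting citizens' records and logs every access:

```python
from dataclasses import dataclass, field

@dataclass
class DataTrust:
    records: dict = field(default_factory=dict)   # citizen_id -> data
    consents: dict = field(default_factory=dict)  # citizen_id -> allowed purposes
    audit_log: list = field(default_factory=list)

    def deposit(self, citizen_id, data, purposes):
        """A citizen places data with the trust, stating permitted purposes."""
        self.records[citizen_id] = data
        self.consents[citizen_id] = set(purposes)

    def query(self, company, purpose, aggregate):
        """Companies never receive raw records; the trust runs the computation
        over consenting citizens' data only, and records who asked for what."""
        eligible = [d for cid, d in self.records.items()
                    if purpose in self.consents.get(cid, set())]
        self.audit_log.append((company, purpose, len(eligible)))
        return aggregate(eligible)

trust = DataTrust()
trust.deposit("alice", {"steps": 9000}, purposes={"health-research"})
trust.deposit("bob", {"steps": 4000}, purposes={"advertising"})

avg = trust.query("acme-health", "health-research",
                  lambda rows: sum(r["steps"] for r in rows) / max(len(rows), 1))
print(avg, trust.audit_log)  # 9000.0: only Alice consented to research
```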

The EU’s new plan considers personal data to be a key asset for Europe. However, this approach raises some questions. First, the EU’s intent to profit from the personal data it collects puts European governments in a weak position to regulate the industry. Second, the improper use of data trusts can actually deprive citizens of their rights to their own data.

The Trusts Project, the first initiative put forth by the new EU policies, will be implemented by 2022. With a €7 million budget, it will set up a pan-European pool of personal and nonpersonal information that should become a one-stop shop for businesses and governments looking to access citizens’ information.

Global technology companies will not be allowed to store or move Europeans’ data. Instead, they will be required to access it via the trusts. Citizens will collect “data dividends,” which haven’t been clearly defined but could include monetary or nonmonetary payments from companies that use their personal data. With the EU’s roughly 500 million citizens poised to become data sources, the trusts will create the world’s largest data market.

For citizens, this means the data created by them and about them will be held in public servers and managed by data trusts. The European Commission envisions the trusts as a way to help European businesses and governments reuse and extract value from the massive amounts of data produced across the region, and to help European citizens benefit from their information. The project documentation, however, does not specify how individuals will be compensated.

Data trusts were first proposed by internet pioneer Sir Tim Berners-Lee in 2018, and the concept has drawn considerable interest since then. Just like the trusts used to manage one’s property, data trusts may serve different purposes: they can be for-profit enterprises, or they can be set up for data storage and protection, or to work for a charitable cause.

IBM and Mastercard have built a data trust to manage the financial information of their European clients in Ireland; the UK and Canada have employed data trusts to stimulate the growth of the AI industries there; and recently, India announced plans to establish its own public data trust to spur the growth of technology companies.

The new EU project is modeled on Austria’s digital system, which keeps track of information produced by and about its citizens by assigning them unique identifiers and storing the data in public repositories.
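
One way such identifier systems can limit cross-database profiling (a simplified sketch of the general idea, not Austria's exact scheme) is to derive a different pseudonym for each sector from the citizen's base identifier, so records link within a sector but not across sectors:

```python
import hashlib

def sector_pseudonym(base_pin: str, sector: str) -> str:
    """Derive a sector-specific identifier by hashing the base PIN with a
    sector code. Real systems use keyed, salted cryptography, not bare SHA-256."""
    return hashlib.sha256(f"{base_pin}:{sector}".encode()).hexdigest()[:16]

pin = "base-pin-0042"
print(sector_pseudonym(pin, "health"))  # stable within the health sector
print(sector_pseudonym(pin, "tax"))     # different in tax: no cross-linkage
```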

Unfortunately, data trusts do not guarantee more transparency. The trust is governed by a charter created by the trust’s settlor, and its rules can be made to prioritize someone’s interests. The trust is run by a board of directors, which means a party that has more seats gains significant control.

The Trusts Project is bound to face some governance issues of its own. Public and private actors often do not see eye to eye when it comes to running critical infrastructure or managing valuable assets. Technology companies tend to favor policies that create opportunity for their own products and services. Caught in a conflict of interest, Europe may overlook the question of privacy….(More)”.