The value of data in Canada: Experimental estimates


Statistics Canada: “As data and information take on a far more prominent role in Canada and, indeed, all over the world, data, databases and data science have become a staple of modern life. When the electricity goes out, Canadians are as much in search of their data feed as they are food and heat. Consumers are using more and more data that is embodied in the products they buy, whether those products are music, reading material, cars and other appliances, or a wide range of other goods and services. Manufacturers, merchants and other businesses depend increasingly on the collection, processing and analysis of data to make their production processes more efficient and to drive their marketing strategies.

The increasing use of and investment in all things data is driving economic growth, changing the employment landscape and reshaping how and from where we buy and sell goods. Yet the rapid rise in the use and importance of data is not well measured in the existing statistical system. Given the ‘lack of data on data’, Statistics Canada has initiated new research to produce a first set of estimates of the value of data, databases and data science. The development of these estimates benefited from collaboration with the Bureau of Economic Analysis in the United States and the Organisation for Economic Co-operation and Development.

In 2018, Canadian investment in data, databases and data science was estimated to be as high as $40 billion. This was greater than the annual investment in industrial machinery, transportation equipment, and research and development and represented approximately 12% of total non-residential investment in 2018….

Statistics Canada recently released a conceptual framework outlining how one might measure the economic value of data, databases and data science. Thanks to this new framework, the growing role of data in Canada can be measured through time. This framework is described in a paper that was released in The Daily on June 24, 2019 entitled “Measuring investments in data, databases and data science: Conceptual framework.” That paper describes the concept of an ‘information chain’ in which data are derived from everyday observations, databases are constructed from data, and data science creates new knowledge by analyzing the contents of databases….(More)”.
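
As a back-of-envelope check on the 2018 figures quoted above, the sketch below derives the total non-residential investment implied by an upper-bound estimate of $40 billion and a share of approximately 12%. The function name `implied_total` is hypothetical, and the result is only as precise as the two rounded inputs; it is not a figure published by Statistics Canada.

```python
# Figures quoted in the excerpt above (2018).
data_investment_bn = 40.0  # upper-bound estimate: "as high as $40 billion"
share_of_total = 0.12      # "approximately 12%" of total non-residential investment


def implied_total(investment_bn: float, share: float) -> float:
    """Return the total implied by a component and its share of that total."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return investment_bn / share


total_bn = implied_total(data_investment_bn, share_of_total)
print(f"Implied total non-residential investment: about ${total_bn:.0f} billion")
```

Because both inputs are rounded, the implied total of roughly $333 billion should be read as an order-of-magnitude check rather than an estimate in its own right.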

How we can place a value on health care data


Report by E&Y: “Unlocking the power of health care data to fuel innovation in medical research and improve patient care is at the heart of today’s health care revolution. When curated or consolidated into a single longitudinal dataset, patient-level records will trace a complete story of a patient’s demographics, health, wellness, diagnosis, treatments, medical procedures and outcomes. Health care providers need to recognize patient data for what it is: a valuable intangible asset desired by multiple stakeholders, a treasure trove of information.

Among the universe of providers holding significant data assets, the United Kingdom’s National Health Service (NHS) is the single largest integrated health care provider in the world. Its patient records cover the entire UK population from birth to death.

We estimate that the 55 million patient records held by the NHS today may have an indicative market value of several billion pounds to a commercial organization. We estimate also that the value of the curated NHS dataset could be as much as £5bn per annum and deliver around £4.6bn of benefit to patients per annum, in potential operational savings for the NHS, enhanced patient outcomes and generation of wider economic benefits to the UK….(More)”.

The plan to mine the world’s research papers


Priyanka Pulla in Nature: “Carl Malamud is on a crusade to liberate information locked up behind paywalls — and his campaigns have scored many victories. He has spent decades publishing copyrighted legal documents, from building codes to court records, and then arguing that such texts represent public-domain law that ought to be available to any citizen online. Sometimes, he has won those arguments in court. Now, the 60-year-old American technologist is turning his sights on a new objective: freeing paywalled scientific literature. And he thinks he has a legal way to do it.

Over the past year, Malamud has — without asking publishers — teamed up with Indian researchers to build a gigantic store of text and images extracted from 73 million journal articles dating from 1847 up to the present day. The cache, which is still being created, will be kept on a 576-terabyte storage facility at Jawaharlal Nehru University (JNU) in New Delhi. “This is not every journal article ever written, but it’s a lot,” Malamud says. It’s comparable to the size of the core collection in the Web of Science database, for instance. Malamud and his JNU collaborator, bioinformatician Andrew Lynn, call their facility the JNU data depot.

No one will be allowed to read or download work from the repository, because that would breach publishers’ copyright. Instead, Malamud envisages, researchers could crawl over its text and data with computer software, scanning through the world’s scientific literature to pull out insights without actually reading the text.

The unprecedented project is generating much excitement because it could, for the first time, open up vast swathes of the paywalled literature for easy computerized analysis. Dozens of research groups already mine papers to build databases of genes and chemicals, map associations between proteins and diseases, and generate useful scientific hypotheses. But publishers control — and often limit — the speed and scope of such projects, which typically confine themselves to abstracts, not full text. Researchers in India, the United States and the United Kingdom are already making plans to use the JNU store instead. Malamud and Lynn have held workshops at Indian government laboratories and universities to explain the idea. “We bring in professors and explain what we are doing. They get all excited and they say, ‘Oh gosh, this is wonderful’,” says Malamud.

But the depot’s legal status isn’t yet clear. Malamud, who contacted several intellectual-property (IP) lawyers before starting work on the depot, hopes to avoid a lawsuit. “Our position is that what we are doing is perfectly legal,” he says. For the moment, he is proceeding with caution: the JNU data depot is air-gapped, meaning that no one can access it from the Internet. Users have to physically visit the facility, and only researchers who want to mine for non-commercial purposes are currently allowed in. Malamud says his team does plan to allow remote access in the future. “The hope is to do this slowly and deliberately. We are not throwing this open right away,” he says….(More)”.

Governing Smart Data in the Public Interest: Lessons from Ontario’s Smart Metering Entity


Paper by Teresa Scassa and Merlynda Vilain: “The collection of vast quantities of personal data from embedded sensors is increasingly an aspect of urban life. This type of data collection is a feature of so-called smart cities, and it raises important questions about data governance. This is particularly the case where the data may be made available for reuse by others and for a variety of purposes.

This paper focuses on the governance of data captured through “smart” technologies and uses Ontario’s smart metering program as a case study. Ontario rolled out mandatory smart metering for electrical consumption in the early 2000s largely to meet energy conservation goals. In doing so, it designed a centralized data governance system overseen by the Smart Metering Entity to manage smart meter data and to protect consumer privacy. As interest in access to the data grew among third parties, and as new potential applications for the data emerged, the regulator sought to develop a model for data sharing that would protect privacy in relation to these new uses and that would avoid uses that might harm the public interest…(More)”.

How I Learned to Stop Worrying and Love the GDPR


Ariane Adam at DataStewards.net: “The General Data Protection Regulation (GDPR) was approved by the EU Parliament on 14 April 2016 and came into force on 25 May 2018….

The coming into force of this important regulation has created confusion and concern about penalties, particularly in the private sector….There is also apprehension about how the GDPR will affect the opening and sharing of valuable databases. At a time when open data is increasingly shaping the choices we make, from finding the fastest route home to choosing the best medical or education provider, misinformation about data protection principles leads to concerns that ‘privacy’ will be used as a smokescreen to not publish important information. Allaying the concerns of private organisations and businesses in this area is particularly important as often the datasets that most matter, and that could have the most impact if they were open, do not belong to governments.

Looking at the regulation and its effects about one year on, this paper advances a positive case for the GDPR and aims to demonstrate that a proper understanding of its underlying principles can not only assist in promoting consumer confidence and therefore business growth, but also enable organisations to safely open and share important and valuable datasets….(More)”.

Trusted data and the future of information sharing


MIT Technology Review: “Data in some form underpins almost every action or process in today’s modern world. Consider that even farming, the world’s oldest industry, is on the verge of a digital revolution, with AI, drones, sensors, and blockchain technology promising to boost efficiencies. The market value of an apple will increasingly reflect not only traditional farming inputs but also some value of modern data, such as weather patterns, soil acidity levels and agri-supply-chain information. By 2022 more than 60% of global GDP will be digitized, according to IDC.

Governments seeking to foster growth in their digital economies need to be more active in encouraging safe data sharing between organizations. Tolerating the sharing of data and stepping in only where security breaches occur is no longer enough. Sharing data across different organizations enables the whole ecosystem to grow and can be a unique source of competitive advantage. But businesses need guidelines and support in how to do this effectively.

This is how Singapore’s data-sharing worldview has evolved, according to Janil Puthucheary, senior minister of state for communications and information and transport, upon launching the city-state’s new Trusted Data Sharing Framework in June 2019.

The Framework, a product of consultations between Singapore’s Infocomm Media Development Authority (IMDA), its Personal Data Protection Commission (PDPC), and industry players, is intended to create a common data-sharing language for relevant stakeholders. Specifically, it addresses four common categories of concerns with data sharing: how to formulate an overall data-sharing strategy, legal and regulatory considerations, technical and organizational considerations, and the actual operationalizing of data sharing.

For instance, companies often have trouble assessing the value of their own data, a necessary first step before sharing should even be considered. The framework describes the three general approaches used: market-, cost-, and income-based. The legal and regulatory section details when businesses can, among other things, seek exemptions from Singapore’s Personal Data Protection Act.

The technical and organizational chapter includes details on governance, infrastructure security, and risk management. Finally, the section on operational aspects of data sharing includes guidelines for when it is appropriate to use shared data for a secondary purpose or not….(More)”.
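
The three valuation approaches named above (market-, cost-, and income-based) can be sketched in their generic textbook forms. This is a minimal illustration, not the Framework's own method: the function names, parameters, and every number below are hypothetical.

```python
def cost_based(collection: float, storage: float, processing: float) -> float:
    """Cost approach: what it cost to produce (or would cost to replace) the dataset."""
    return collection + storage + processing


def market_based(price_per_record: float, records: int) -> float:
    """Market approach: a price observed for comparable data, scaled to this dataset."""
    return price_per_record * records


def income_based(annual_cash_flows: list[float], discount_rate: float) -> float:
    """Income approach: present value of the cash flows the data is expected to generate."""
    return sum(
        cash_flow / (1 + discount_rate) ** year
        for year, cash_flow in enumerate(annual_cash_flows, start=1)
    )


# Hypothetical inputs, chosen only to show the three calculations side by side.
estimates = {
    "cost": cost_based(collection=50_000, storage=10_000, processing=40_000),
    "market": market_based(price_per_record=0.5, records=200_000),
    "income": income_based(annual_cash_flows=[55_000, 60_500], discount_rate=0.10),
}
for approach, value in estimates.items():
    print(f"{approach:>6}: {value:,.0f}")
```

In practice the three approaches rarely agree this neatly; comparing them is mainly useful for spotting which assumptions drive the estimate.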

How credit unions could help people make the most of personal data


Dylan Walsh at MIT Sloan: “In May of 2018, the EU adopted the General Data Protection Regulation, referred to by The New York Times as “the world’s toughest rules to protect people’s online data.” Among its many safeguards, the GDPR gave individuals ownership of their personal data and thereby restricted its collection and use by businesses.

“That’s a good first start,” said Alex Pentland, a co-creator of the MIT Media Lab who played a foundational role in the development of the GDPR. “But ownership isn’t enough. Simply having the rights to your data doesn’t allow you to do much with it.” In response to this shortcoming, Pentland and his team have proposed the establishment of data cooperatives.

The idea is conceptually straightforward: Individuals would pool their personal data in a single institution — just as they pool money in banks — and that institution would both protect the data and put it to use. Pentland and his team suggest credit unions as one type of organization that could fill this role. And while companies would need to request permission to use consumer data, consumers themselves could request analytic insights from the cooperative. Lyft drivers, for instance, might compare their respective incomes across routes, and ride-share passengers could compare how much they pay relative to other cooperative members….

Several states have now asked credit unions to look into the idea of data cooperatives, but the model has yet to gain a foothold. “Credit unions are conservative,” Pentland said. But assuming the idea gains traction, the infrastructure won’t be difficult to build. Technology exists to automatically record and organize all the data that we give to companies; and credit unions, which have 100 million members nationwide, possess charters readymade to take on data management….(More)”.
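
The kind of analytic insight described above, such as drivers comparing incomes across routes, can be sketched as a pooled aggregate that a cooperative returns to its members. Everything here is an assumption for illustration: the record layout, the function name, and the `MIN_MEMBERS` threshold, which withholds averages for groups too small to hide an individual. The excerpt does not describe how Pentland's team would implement this.

```python
from collections import defaultdict
from statistics import mean

MIN_MEMBERS = 3  # hypothetical threshold: do not report very small groups


def average_income_by_route(records, min_members=MIN_MEMBERS):
    """Average income per route, reported only where enough distinct members contributed.

    records: iterable of (member_id, route, income) tuples pooled by the cooperative.
    """
    incomes = defaultdict(list)
    members = defaultdict(set)
    for member_id, route, income in records:
        incomes[route].append(income)
        members[route].add(member_id)
    return {
        route: mean(values)
        for route, values in incomes.items()
        if len(members[route]) >= min_members
    }


# Toy pooled data: three members drove the airport route, only two drove downtown.
pooled = [
    ("a", "airport", 30.0),
    ("b", "airport", 34.0),
    ("c", "airport", 32.0),
    ("a", "downtown", 18.0),
    ("b", "downtown", 22.0),
]
print(average_income_by_route(pooled))
```

With these toy records only the airport route is reported, because the downtown group falls below the threshold.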

Mobile phone data’s potential for informing infrastructure planning in developing countries


Paper by Hadrien Salat, Zbigniew Smoreda, and Markus Schläpfer: “High-quality census data are not always available in developing countries. Instead, mobile phone data are becoming a go-to proxy to evaluate population density, activity and social characteristics. They offer additional advantages for infrastructure planning such as being updated in real time, including mobility information and recording temporary visitors’ activity. We combine various data sets from Senegal to evaluate mobile phone data’s potential to replace insufficient census data for infrastructure planning in developing countries. As an applied case, we test their ability to accurately predict domestic electricity consumption. We show that, contrary to common belief, average mobile phone activity is not well correlated with population density. However, it can provide better electricity consumption estimates than basic census data. More importantly, we successfully use curve and network clustering techniques to enhance the accuracy of the predictions, to recover good population mapping potential and to reduce the collection of informative data for planning to substantially smaller samples….(More)”.
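
A claim such as "average mobile phone activity is not well correlated with population density" is typically checked with a correlation coefficient. The sketch below computes a Pearson coefficient by hand; it assumes this measure purely for illustration, since the excerpt does not say which statistic the authors used, and the values are toy numbers, not data from Senegal.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)


# Toy values per area (hypothetical): average phone activity and population density.
phone_activity = [12.0, 45.0, 30.0, 8.0, 50.0]
population_density = [900.0, 300.0, 1200.0, 400.0, 700.0]
print(f"r = {pearson(phone_activity, population_density):.2f}")
```

A coefficient near zero would indicate a weak linear relationship, while values near 1 or -1 would indicate a strong one.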

Sharing data can help prevent public health emergencies in Africa


Moses John Bockarie at The Conversation: “Global collaboration and sharing data on public health emergencies is important to fight the spread of infectious diseases. If scientists and health workers can openly share their data across regions and organisations, countries can be better prepared and respond faster to disease outbreaks.

This was the case with the 2014 Ebola outbreak in West Africa. Close to 100 scientists, clinicians, health workers and data analysts from around the world worked together to help contain the spread of the disease.

But there’s a lack of trust when it comes to sharing data in north-south collaborations. African researchers are suspicious that their northern partners could publish data without acknowledging the input from the less resourced southern institutions where the data was first generated. Until recently, the authorship of key scientific publications, based on collaborative work in Africa, was dominated by scientists from outside Africa.

The Global Research Collaboration for Infectious Disease Preparedness, an international network of major research funding organisations, recently published a roadmap to data sharing. This may go some way to address the data sharing challenges. Members of the network are expected to encourage their grantees to be inclusive and publish their results in open access journals. The network includes major funders of research in Africa like the European Commission, Bill & Melinda Gates Foundation and Wellcome Trust.

The roadmap provides a guide on how funders can accelerate research data sharing by the scientists they fund. It recommends that research funding institutions make real-time, external data sharing a requirement, and that research be part of a multi-disciplinary disease network to advance public health emergency responses.

In addition, funding should focus on strengthening institutions’ capacity on a number of fronts. This includes data management, improving data policies, building trust and aligning tools for data sharing.

Allowing researchers to freely access data generated by global academic counterparts is critical for rapidly informing disease control strategies in public health emergencies….(More)”.

Clinical Trial Data Transparency and GDPR Compliance: Implications for Data Sharing and Open Innovation


Paper by Timo Minssen, Rajam N. and Marcel Bogers: “Recent EU initiatives and legislations have considerably increased public access to clinical trials data (CTD). These developments are generally much welcomed for the enhancement of science, trust, and open innovation. However, they also raise many questions and concerns, not least at the interface between CTD transparency and other areas of evolving EU law on the protection of trade secrets, intellectual property rights and privacy.

This paper focuses on privacy issues and on the interrelation between developments in transparency and the EU’s new General Data Protection Regulation 2016/679 (GDPR). More specifically, this paper examines: (1) the genesis of EU transparency regulations, including the incidents, developments and policy concerns that have shaped them; (2) the features and implications of the GDPR which are relevant in the context of clinical trials; and (3) the risk for tensions between the GDPR and the policy goals of CTD transparency, including their implications for data sharing and open innovation. Ultimately, we stress that these and other related factors must be carefully considered and addressed to reap the full benefits of CTD transparency….(More)”.