Introducing the Contractual Wheel of Data Collaboration


Blog by Andrew Young and Stefaan Verhulst: “Earlier this year we launched the Contracts for Data Collaboration (C4DC) initiative — an open collaborative with charter members from The GovLab, UN SDSN Thematic Research Network on Data and Statistics (TReNDS), University of Washington and the World Economic Forum. C4DC seeks to address the inefficiencies of developing contractual agreements for public-private data collaboration, informing and guiding those seeking to establish a data collaborative by developing and making available a shared repository of relevant contractual clauses taken from existing legal agreements. Today TReNDS published “Partnerships Founded on Trust,” a brief capturing some initial findings from the C4DC initiative.

The Contractual Wheel of Data Collaboration [beta] — Stefaan G. Verhulst and Andrew Young, The GovLab

As part of the C4DC effort, and to support Data Stewards in the private sector and decision-makers in the public and civil sectors seeking to establish Data Collaboratives, The GovLab developed the Contractual Wheel of Data Collaboration [beta]. The Wheel seeks to capture key elements involved in data collaboration while demystifying contracts and moving beyond the type of legalese that can create confusion and barriers to experimentation.

The Wheel was developed based on an assessment of existing legal agreements, engagement with The GovLab-facilitated Data Stewards Network, and analysis of the key elements of our Data Collaboratives Methodology. It features 22 legal considerations organized across 6 operational categories that can act as a checklist for the development of a legal agreement between parties participating in a Data Collaborative:…(More)”.
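
Readers who want to use the Wheel as a working checklist while drafting an agreement could represent it as a simple data structure; the sketch below is a hypothetical illustration, and the category and consideration names are placeholders rather than the Wheel's actual labels.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Consideration:
    """A single legal consideration to review when drafting the agreement."""
    name: str
    addressed: bool = False

@dataclass
class Category:
    """One operational category of the Wheel, grouping related considerations."""
    name: str
    considerations: List[Consideration] = field(default_factory=list)

# Placeholder names for illustration only; the actual Wheel defines
# 6 operational categories covering 22 legal considerations.
wheel = [
    Category("Data and use (example)", [Consideration("Scope of data shared"),
                                        Consideration("Permitted purposes")]),
    Category("Governance (example)", [Consideration("Decision-making roles")]),
]

def outstanding(categories):
    """Return (category, consideration) pairs not yet addressed in the draft."""
    return [(cat.name, c.name)
            for cat in categories
            for c in cat.considerations
            if not c.addressed]

print(outstanding(wheel))
```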

San Francisco teams up with Uber, location tracker on 911 call responses


Gwendolyn Wu at San Francisco Chronicle: “In an effort to shorten emergency response times in San Francisco, the city announced on Monday that it is now using location data from RapidSOS, a New York-based public safety tech company, and ride-hailing company Uber to improve location coordinates generated from 911 calls.

An increasing number of emergency calls are made from cell phones, said Michelle Cahn, RapidSOS’s director of community engagement. The new technology should allow emergency responders to narrow down the location of such callers and replace existing 911 technology that was built for landlines and tied to home addresses.

Cell phone location data currently given to dispatchers when they receive a 911 call can be vague, especially if the person can’t articulate their exact location, according to the Department of Emergency Management.

But if a dispatcher can narrow down where the emergency is happening, that increases the chance of a timely response and better result, Cahn said.

“It doesn’t matter what’s going on with the emergency if we don’t know where it is,” she said.

RapidSOS shares its location data — collected by Apple and Google for their in-house map apps — free of charge to public safety agencies. San Francisco’s 911 call center adopted the data service in September 2018.

The Federal Communications Commission estimates agencies could save as many as 10,000 lives a year if they shave a minute off response times. Federal officials issued new rules to improve wireless 911 calls in 2015, asking mobile carriers to provide more accurate locations to call centers. Carriers are required to find a way to triangulate the caller’s location within 50 meters — a much smaller radius than the eight blocks that city officials were initially presented with in October when a caller dialed 911…(More)”.
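
To make the 50-meter target concrete, the sketch below computes the great-circle distance between a caller's true position and the estimate a dispatcher receives, then checks it against the threshold; the coordinates are invented for illustration and nothing here reflects RapidSOS's actual implementation.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6_371_000  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Invented coordinates: a caller's true location vs. the dispatcher's estimate.
true_lat, true_lon = 37.7793, -122.4193
est_lat, est_lon = 37.7795, -122.4190

error_m = haversine_m(true_lat, true_lon, est_lat, est_lon)
status = "within" if error_m <= 50 else "outside"
print(f"Location error: {error_m:.0f} m ({status} the 50 m target)")
```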

Characterizing the Biomedical Data-Sharing Landscape


Paper by Angela G. Villanueva et al: “Advances in technologies and biomedical informatics have expanded capacity to generate and share biomedical data. With a lens on genomic data, we present a typology characterizing the data-sharing landscape in biomedical research to advance understanding of the key stakeholders and existing data-sharing practices. The typology highlights the diversity of data-sharing efforts and facilitators and reveals how novel data-sharing efforts are challenging existing norms regarding the role of individuals whom the data describe.

Technologies such as next-generation sequencing have dramatically expanded capacity to generate genomic data at a reasonable cost, while advances in biomedical informatics have created new tools for linking and analyzing diverse data types from multiple sources. Further, many research-funding agencies now mandate that grantees share data. The National Institutes of Health’s (NIH) Genomic Data Sharing (GDS) Policy, for example, requires NIH-funded research projects generating large-scale human genomic data to share those data via an NIH-designated data repository such as the Database of Genotypes and Phenotypes (dbGaP). Another example is Parent Project Muscular Dystrophy, a non-profit organization that requires applicants to propose a data-sharing plan and takes an applicant’s history of data sharing into account.

The flow of data to and from different projects, institutions, and sectors is creating a medical information commons (MIC), a data-sharing ecosystem consisting of networked resources sharing diverse health-related data from multiple sources for research and clinical uses. This concept aligns with the 2018 NIH Strategic Plan for Data Science, which uses the term “data ecosystem” to describe “a distributed, adaptive, open system with properties of self-organization, scalability and sustainability” and proposes to “modernize the biomedical research data ecosystem” by funding projects such as the NIH Data Commons. Consistent with Elinor Ostrom’s discussion of nested institutional arrangements, an MIC is both singular and plural and may describe the ecosystem as a whole or individual components contributing to the ecosystem. Thus, resources like the NIH Data Commons with its associated institutional arrangements are MICs, and also form part of the larger MIC that encompasses all such resources and arrangements.

Although many research funders incentivize data sharing, in practice, progress in making biomedical data broadly available to maximize its utility is often hampered by a broad range of technical, legal, cultural, normative, and policy challenges that include achieving interoperability, changing the standards for academic promotion, and addressing data privacy and security concerns. Addressing these challenges requires multi-stakeholder involvement. To identify relevant stakeholders and advance understanding of the contributors to an MIC, we conducted a landscape analysis of existing data-sharing efforts and facilitators. Our work builds on typologies describing various aspects of data sharing that focused on biobanks, research consortia, or where data reside (e.g., degree of data centralization). While these works are informative, we aimed to capture the biomedical data-sharing ecosystem with a wider scope. Understanding the components of an MIC ecosystem and how they interact, and identifying emerging trends that test existing norms (such as norms respecting the role of individuals whom the data describe), is essential to fostering effective practices, policies and governance structures, guiding resource allocation, and promoting the overall sustainability of the MIC….(More)”

The Importance of Data Access Regimes for Artificial Intelligence and Machine Learning


JRC Digital Economy Working Paper by Bertin Martens: “Digitization triggered a steep drop in the cost of information. The resulting data glut created a bottleneck because human cognitive capacity is unable to cope with large amounts of information. Artificial intelligence and machine learning (AI/ML) triggered a similar drop in the cost of machine-based decision-making and helps in overcoming this bottleneck. Substantial change in the relative price of resources puts pressure on ownership and access rights to these resources. This explains pressure on access rights to data. ML thrives on access to big and varied datasets. We discuss the implications of access regimes for the development of AI in its current form of ML. The economic characteristics of data (non-rivalry, economies of scale and scope) favour data aggregation in big datasets. Non-rivalry implies the need for exclusive rights in order to incentivise data production when it is costly. The balance between access and exclusion is at the centre of the debate on data regimes. We explore the economic implications of several modalities for access to data, ranging from exclusive monopolistic control to monopolistic competition and free access. Regulatory intervention may push the market beyond voluntary exchanges, either towards more openness or reduced access. This may generate private costs for firms and individuals. Society can choose to do so if the social benefits of this intervention outweigh the private costs.

We briefly discuss the main EU legal instruments that are relevant for data access and ownership, including the General Data Protection Regulation (GDPR) that defines the rights of data subjects with respect to their personal data and the Database Directive (DBD) that grants ownership rights to database producers. These two instruments leave a wide legal no-man’s land where data access is ruled by bilateral contracts and Technical Protection Measures that give exclusive control to de facto data holders, and by market forces that drive access, trade and pricing of data. The absence of exclusive rights might facilitate data sharing and access or it may result in a segmented data landscape where data aggregation for ML purposes is hard to achieve. It is unclear if incompletely specified ownership and access rights maximize the welfare of society and facilitate the development of AI/ML…(More)”

Data Trusts: More Data than Trust? The Perspective of the Data Subject in the Face of a Growing Problem


Paper by Christine Rinik: “In the recent report, Growing the Artificial Intelligence Industry in the UK, Hall and Pesenti suggest the use of a ‘data trust’ to facilitate data sharing. Whilst government and corporations are focusing on their need to facilitate data sharing, the perspective of many individuals is that too much data is being shared. The issue is not only about data, but about power. The individual does not often have a voice when issues relating to data sharing are tackled. Regulators can cite the ‘public interest’ when data governance is discussed, but the individual’s interests may diverge from those of the public.

This paper considers the data subject’s position with respect to data collection leading to considerations about surveillance and datafication. Proposals for data trusts will be considered applying principles of English trust law to possibly mitigate the imbalance of power between large data users and individual data subjects. Finally, the possibility of a workable remedy in the form of a class action lawsuit which could give the data subjects some collective power in the event of a data breach will be explored. Despite regulatory efforts to protect personal data, there is a lack of public trust in the current data sharing system….(More)”.

Data Collaboratives as an enabling infrastructure for AI for Good


Blog Post by Stefaan G. Verhulst: “…The value of data collaboratives stems from the fact that the supply of and demand for data are generally widely dispersed — spread across government, the private sector, and civil society — and often poorly matched. This failure (a form of “market failure”) results in tremendous inefficiencies and lost potential. Much data that is released is never used. And much data that is actually needed is never made accessible to those who could productively put it to use.

Data collaboratives, when designed responsibly, are the key to addressing this shortcoming. They draw together otherwise siloed data and a dispersed range of expertise, helping match supply and demand, and ensuring that the correct institutions and individuals are using and analyzing data in ways that maximize the possibility of new, innovative social solutions.

Roadmap for Data Collaboratives

Despite their clear potential, the evidence base for data collaboratives is thin. There’s an absence of a systemic, structured framework that can be replicated across projects and geographies, and there’s a lack of clear understanding about what works, what doesn’t, and how best to maximize the potential of data collaboratives.

At the GovLab, we’ve been working to address these shortcomings. For emerging economies considering the use of data collaboratives, whether in pursuit of Artificial Intelligence or other solutions, we present six steps that can be considered in order to create data collaboratives that are more systematic, sustainable, and responsible.

The need for making Data Collaboratives Systematic, Sustainable and Responsible
  • Increase Evidence and Awareness
  • Increase Readiness and Capacity
  • Address Data Supply and Demand Inefficiencies and Uncertainties
  • Establish a New “Data Stewards” Function
  • Develop and strengthen policies and governance practices for data collaboration

Digital Data for Development


LinkedIn: “The World Bank Group and LinkedIn share a commitment to helping workers around the world access opportunities that make good use of their talents and skills. The two organizations have come together to identify new ways that data from LinkedIn can help inform policymakers who seek to boost employment and grow their economies.

This site offers data and automated visuals of industries where LinkedIn data is comprehensive enough to provide an emerging picture. The data complements a wealth of official sources and can offer a more real-time view in some areas, particularly for new, rapidly changing digital and technology industries.

The data shared in the first phase of this collaboration focuses on 100+ countries with at least 100,000 LinkedIn members each, distributed across 148 industries and 50,000 skills categories. In the near term, it will help World Bank Group teams and government partners pinpoint ways that developing countries could stimulate growth and expand opportunity, especially as disruptive technologies reshape the economic landscape. As LinkedIn’s membership and digital platforms continue to grow in developing countries, this collaboration will assess the possibility to expand the sectors and countries covered in the next annual update.
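
As a rough illustration of the coverage threshold described above, the sketch below filters a country-level summary down to countries with at least 100,000 members; the table and column names are hypothetical stand-ins, not the schema of the files published on the site.

```python
import pandas as pd

# Hypothetical country-level summary; the real downloadable files on the
# site have their own schema and coverage.
countries = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C"],
    "linkedin_members": [2_500_000, 80_000, 450_000],
    "industries_covered": [140, 35, 120],
})

# Keep only countries that meet the 100,000-member coverage threshold.
covered = countries[countries["linkedin_members"] >= 100_000]
print(covered[["country", "industries_covered"]])
```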

This site offers downloadable data, visualizations, and an expanding body of insights and joint research from the World Bank Group and LinkedIn. The data is being made accessible as a public good, though it will be most useful for policy analysts, economists, and researchers….(More)”.

Predictive Big Data Analytics using the UK Biobank Data


Paper by Ivo D Dinov et al: “The UK Biobank is a rich national health resource that provides enormous opportunities for international researchers to examine, model, and analyze census-like multisource healthcare data. The archive presents several challenges related to aggregation and harmonization of complex data elements, feature heterogeneity and salience, and health analytics. Using 7,614 imaging, clinical, and phenotypic features of 9,914 subjects we performed deep computed phenotyping using unsupervised clustering and derived two distinct sub-cohorts. Using parametric and nonparametric tests, we determined the top 20 most salient features contributing to the cluster separation. Our approach generated decision rules to predict the presence and progression of depression or other mental illnesses by jointly representing and modeling the significant clinical and demographic variables along with the derived salient neuroimaging features. We reported consistency and reliability measures of the derived computed phenotypes and the top salient imaging biomarkers that contributed to the unsupervised clustering. This clinical decision support system identified and utilized holistically the most critical biomarkers for predicting mental health, e.g., depression. External validation of this technique on different populations may lead to reducing healthcare expenses and improving the processes of diagnosis, forecasting, and tracking of normal and pathological aging….(More)”.
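
The pipeline summarized above, unsupervised clustering into two sub-cohorts followed by per-feature statistical tests to rank the most salient features, can be sketched roughly as below. This is a simplified reconstruction on synthetic data, not the authors' code and not UK Biobank data.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the feature matrix (subjects x features); the paper
# uses 9,914 subjects and 7,614 imaging, clinical, and phenotypic features.
X = rng.normal(size=(500, 50))
X[:250, :5] += 1.5  # embed a weak two-group structure for illustration

# Step 1: unsupervised clustering into two derived sub-cohorts.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X))

# Step 2: nonparametric test per feature to rank how strongly each feature
# separates the two sub-cohorts (the paper reports the top 20).
pvals = np.array([mannwhitneyu(X[labels == 0, j], X[labels == 1, j]).pvalue
                  for j in range(X.shape[1])])
top_features = np.argsort(pvals)[:20]
print("Most salient feature indices:", top_features[:10])
```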

Statistics Estonia to coordinate data governance


Article by Miriam van der Sangen at CBS: “In 2018, Statistics Estonia launched a new strategy for the period 2018-2022. This strategy addresses the organisation’s aim to produce statistics more quickly while minimising the response burden on both businesses and citizens. Another element in the strategy is addressing the high expectations in Estonian society regarding the use of data. ‘We aim to transform Statistics Estonia into a national data agency,’ says Director General Mägi. ‘This means our role as a producer of official statistics will be enlarged by data governance responsibilities in the public sector. Taking on such responsibilities requires a clear vision of the whole public data ecosystem and also agreement to establish data stewards in most public sector institutions.’…

…the Estonian Parliament passed new legislation that effectively expanded the number of official tasks for Statistics Estonia. Mägi elaborates: ‘Most importantly, we shall be responsible for coordinating data governance. The detailed requirements and conditions of data governance will be specified further in the coming period.’ Under the new Act, Statistics Estonia will also have more possibilities to share data with other parties….

Statistics Estonia is fully committed to producing statistics which are based on big data. Mägi explains: ‘At the moment, we are actively working on two big data projects. One project involves the use of smart electricity meters. In this project, we are looking into ways to visualise business and household electricity consumption information. The second project involves web scraping of prices and enterprise characteristics. This project is still in an initial phase, but we can already see that the use of web scraping can improve the efficiency of our production process. We are aiming to extend the web scraping project by also identifying e-commerce and innovation activities of enterprises.’
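
As a rough illustration of the price web-scraping approach Mägi describes, the sketch below collects product names and prices from a single retailer category page; the URL and the CSS classes are hypothetical placeholders, and a production pipeline at a statistical office would add scheduling, robots.txt handling, and validation.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical retailer page and markup, for illustration only.
URL = "https://example.com/webshop/category/dairy"

def scrape_prices(url):
    """Return (product name, price) pairs found on one category page."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for node in soup.select(".product"):  # assumed CSS class
        name = node.select_one(".name").get_text(strip=True)
        price_text = node.select_one(".price").get_text(strip=True)
        items.append((name, float(price_text.replace("€", "").replace(",", "."))))
    return items

if __name__ == "__main__":
    for name, price in scrape_prices(URL):
        print(f"{name}: {price:.2f} EUR")
```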

Yet another ambitious goal for Statistics Estonia lies in the field of data science. ‘Similarly to Statistics Netherlands, we established experimental statistics and data mining activities years ago. Last year, we developed a so-called think-tank service, providing insights from data into all aspects of our lives. Think of birth, education, employment, et cetera. Our key clients are the various ministries, municipalities and the private sector. The main aim in the coming years is to speed up service time thanks to visualisations and data lake solutions.’ …(More)”.

Facebook’s AI team maps the whole population of Africa


Devin Coldewey at TechCrunch: “A new map of nearly all of Africa shows exactly where the continent’s 1.3 billion people live, down to the meter, which could help everyone from local governments to aid organizations. The map joins others like it from Facebook, created by running satellite imagery through a machine learning model.

It’s not exactly that there was some mystery about where people live, but the degree of precision matters. You may know that a million people live in a given region, and that about half are in the bigger city and another quarter in assorted towns. But that leaves hundreds of thousands only accounted for in the vaguest way.

Fortunately, you can always inspect satellite imagery and pick out the spots where small villages and isolated houses and communities are located. The only problem is that Africa is big. Really big. Manually labeling the satellite imagery even from a single mid-sized country like Gabon or Malawi would take a huge amount of time and effort. And for many applications of the data, such as coordinating the response to a natural disaster or distributing vaccinations, time lost is lives lost.

Better to get it all done at once then, right? That’s the idea behind Facebook’s Population Density Maps project, which had already mapped several countries over the last couple of years before the decision was made to take on the entire African continent….

“The maps from Facebook ensure we focus our volunteers’ time and resources on the places they’re most needed, improving the efficacy of our programs,” said Tyler Radford, executive director of the Humanitarian OpenStreetMap Team, one of the project’s partners.

The core idea is straightforward: Match census data (how many people live in a region) with structure data derived from satellite imagery to get a much better idea of where those people are located.

“With just the census data, the best you can do is assume that people live everywhere in the district – buildings, fields, and forests alike,” said Facebook engineer James Gill. “But once you know the building locations, you can skip the fields and forests and only allocate the population to the buildings. This gives you very detailed 30 meter by 30 meter population maps.”
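
The allocation step Gill describes, spreading a district's census count only over the grid cells that contain buildings, amounts to a simple dasymetric calculation; the sketch below is an illustrative reconstruction and not Facebook's actual pipeline.

```python
import numpy as np

def allocate_population(district_population, building_mask):
    """Spread a district's census count evenly over its building cells.

    building_mask is a boolean grid (e.g. 30 m x 30 m cells) marking cells
    with at least one detected building; every other cell receives zero.
    """
    density = np.zeros(building_mask.shape, dtype=float)
    n_building_cells = int(building_mask.sum())
    if n_building_cells:
        density[building_mask] = district_population / n_building_cells
    return density

# Toy example: a 4x4 district grid with 3 building cells and 900 residents.
mask = np.zeros((4, 4), dtype=bool)
mask[1, 1] = mask[2, 3] = mask[3, 0] = True
print(allocate_population(900, mask))  # 300 people in each building cell
```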

That’s several times more accurate than any extant population map of this size. The analysis is done by a machine learning agent trained on OpenStreetMap data from all over the world, where people have labeled and outlined buildings and other features.

First the huge amount of Africa’s surface that obviously has no structure had to be removed from consideration, reducing the amount of space the team had to evaluate by a factor of a thousand or more. Then, using a region-specific algorithm (because things look a lot different in coastal Morocco than they do in central Chad), the model identifies patches that contain a building….(More)”.
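
A minimal sketch of the two-stage approach described here, first discarding obviously empty tiles cheaply and then classifying the remaining patches, might look as follows; the variance pre-filter and the `predict_building` rule are invented stand-ins, not Facebook's region-specific model.

```python
import numpy as np

def has_texture(patch, threshold=5.0):
    """Cheap pre-filter: uniform desert, forest, or water tiles have low variance."""
    return patch.std() > threshold

def predict_building(patch):
    """Stand-in for a trained, region-specific building classifier."""
    return patch.mean() > 128  # toy rule, not a real model

def find_building_patches(tiles):
    """Indices of tiles that pass both the pre-filter and the classifier."""
    return [i for i, patch in enumerate(tiles)
            if has_texture(patch) and predict_building(patch)]

# Toy 64x64 grayscale tiles: one flat (empty), one textured and bright.
rng = np.random.default_rng(1)
flat = np.full((64, 64), 90.0)
textured = rng.normal(loc=150.0, scale=30.0, size=(64, 64))
print(find_building_patches([flat, textured]))  # -> [1]
```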