Trusted data and the future of information sharing


 MIT Technology Review: “Data in some form underpins almost every action or process in today’s modern world. Consider that even farming, the world’s oldest industry, is on the verge of a digital revolution, with AI, drones, sensors, and blockchain technology promising to boost efficiencies. The market value of an apple will increasingly reflect not only traditional farming inputs but also some value of modern data, such as weather patterns, soil acidity levels, and agri-supply-chain information. By 2022, more than 60% of global GDP will be digitized, according to IDC.

Governments seeking to foster growth in their digital economies need to be more active in encouraging safe data sharing between organizations. Tolerating the sharing of data and stepping in only where security breaches occur is no longer enough. Sharing data across different organizations enables the whole ecosystem to grow and can be a unique source of competitive advantage. But businesses need guidelines and support in how to do this effectively.   

This is how Singapore’s data-sharing worldview has evolved, according to Janil Puthucheary, senior minister of state for communications and information, and for transport, upon launching the city-state’s new Trusted Data Sharing Framework in June 2019.

The Framework, a product of consultations between Singapore’s Infocomm Media Development Authority (IMDA), its Personal Data Protection Commission (PDPC), and industry players, is intended to create a common data-sharing language for relevant stakeholders. Specifically, it addresses four common categories of concerns with data sharing: how to formulate an overall data-sharing strategy, legal and regulatory considerations, technical and organizational considerations, and the actual operationalizing of data sharing.

For instance, companies often have trouble assessing the value of their own data, a necessary first step before sharing should even be considered. The Framework describes three general valuation approaches: market-based, cost-based, and income-based. The legal and regulatory section details when businesses can, among other things, seek exemptions from Singapore’s Personal Data Protection Act.
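The three valuation approaches can be sketched in code. This is an illustrative toy, not part of the Framework itself: the function names and figures are hypothetical, and a real valuation would involve far more judgment.

```python
# Toy sketches of three common data-valuation approaches (hypothetical figures).

def market_based(comparable_sale_price, quality_adjustment):
    """Value by reference to prices of comparable datasets sold on the market."""
    return comparable_sale_price * quality_adjustment

def cost_based(collection_cost, cleaning_cost, storage_cost):
    """Value as the cost to (re)create and maintain the dataset."""
    return collection_cost + cleaning_cost + storage_cost

def income_based(annual_income, discount_rate, years):
    """Value as the discounted stream of income the data is expected to generate."""
    return sum(annual_income / (1 + discount_rate) ** t for t in range(1, years + 1))

print(market_based(100_000, 0.8))           # -> 80000.0
print(cost_based(40_000, 15_000, 5_000))    # -> 60000
print(round(income_based(30_000, 0.1, 5)))  # -> 113724
```

In practice the three methods rarely agree, which is precisely why a shared framework for discussing them is useful.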

The technical and organizational chapter includes details on governance, infrastructure security, and risk management. Finally, the section on operational aspects of data sharing includes guidelines on when it is appropriate to use shared data for a secondary purpose….(More)”.

How credit unions could help people make the most of personal data


Dylan Walsh at MIT Sloan: “In May of 2018, the EU adopted the General Data Protection Regulation, referred to by The New York Times as “the world’s toughest rules to protect people’s online data.” Among its many safeguards, the GDPR gave individuals ownership of their personal data and thereby restricted its collection and use by businesses.

“That’s a good first start,” said Alex Pentland, a co-creator of the MIT Media Lab who played a foundational role in the development of the GDPR. “But ownership isn’t enough. Simply having the rights to your data doesn’t allow you to do much with it.” In response to this shortcoming, Pentland and his team have proposed the establishment of data cooperatives.

The idea is conceptually straightforward: Individuals would pool their personal data in a single institution — just as they pool money in banks — and that institution would both protect the data and put it to use. Pentland and his team suggest credit unions as one type of organization that could fill this role. And while companies would need to request permission to use consumer data, consumers themselves could request analytic insights from the cooperative. Lyft drivers, for instance, might compare their respective incomes across routes, and ride-share passengers could compare how much they pay relative to other cooperative members….

Several states have now asked credit unions to look into the idea of data cooperatives, but the model has yet to gain a foothold. “Credit unions are conservative,” Pentland said. But assuming the idea gains traction, the infrastructure won’t be difficult to build. Technology exists to automatically record and organize all the data that we give to companies; and credit unions, which have 100 million members nationwide, possess charters ready-made to take on data management….(More)”.

Mobile phone data’s potential for informing infrastructure planning in developing countries


Paper by Hadrien Salat, Zbigniew Smoreda, and Markus Schläpfer: “High quality census data are not always available in developing countries. Instead, mobile phone data are becoming a go-to proxy for evaluating population density, activity and social characteristics. They offer additional advantages for infrastructure planning, such as being updated in real time, including mobility information and recording temporary visitors’ activity. We combine various data sets from Senegal to evaluate mobile phone data’s potential to replace insufficient census data for infrastructure planning in developing countries. As an applied case, we test their ability to accurately predict domestic electricity consumption. We show that, contrary to common belief, average mobile phone activity is not well correlated with population density. However, it can provide better electricity consumption estimates than basic census data. More importantly, we successfully use curve and network clustering techniques to enhance the accuracy of the predictions, to recover good population mapping potential and to reduce the collection of informative data for planning to substantially smaller samples….(More)”.
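The curve-clustering idea can be illustrated with a toy sketch. These are assumed details, not the authors’ actual pipeline: areas are grouped by the shape of their daily mobile-activity curves, and each cluster’s mean observed consumption serves as the estimate for its members.

```python
# Toy 2-means clustering of daily activity curves (all data hypothetical).

def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def kmeans(curves, iters=20):
    # Naive 2-means with a deterministic init (first and last curve) for the demo.
    centroids = [list(curves[0]), list(curves[-1])]
    labels = [0] * len(curves)
    for _ in range(iters):
        labels = [0 if euclid(c, centroids[0]) <= euclid(c, centroids[1]) else 1
                  for c in curves]
        for j in (0, 1):
            members = [c for c, l in zip(curves, labels) if l == j]
            if members:
                centroids[j] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels

# Hypothetical 4-bucket daily activity curves for six areas (two clear shapes):
curves = [
    [1, 1, 8, 8], [1, 2, 8, 7], [2, 1, 7, 8],   # evening-heavy activity
    [8, 8, 1, 1], [7, 8, 2, 1], [8, 7, 1, 2],   # morning-heavy activity
]
consumption = [10, 12, 11, 30, 31, 29]  # kWh, hypothetical ground truth

labels = kmeans(curves)
estimates = {}
for j in (0, 1):
    members = [x for x, l in zip(consumption, labels) if l == j]
    estimates[j] = sum(members) / len(members)
print(labels)     # [0, 0, 0, 1, 1, 1]
print(estimates)  # {0: 11.0, 1: 30.0}
```

The point of the sketch is the paper’s key claim: the *shape* of activity over the day carries information that the raw average level does not.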

Sharing data can help prevent public health emergencies in Africa


Moses John Bockarie at The Conversation: “Global collaboration and sharing data on public health emergencies is important to fight the spread of infectious diseases. If scientists and health workers can openly share their data across regions and organisations, countries can be better prepared and respond faster to disease outbreaks.

This was the case with the 2014 Ebola outbreak in West Africa. Close to 100 scientists, clinicians, health workers and data analysts from around the world worked together to help contain the spread of the disease.

But there’s a lack of trust when it comes to sharing data in north-south collaborations. African researchers are suspicious that their northern partners could publish data without acknowledging the input from the less resourced southern institutions where the data was first generated. Until recently, the authorship of key scientific publications, based on collaborative work in Africa, was dominated by scientists from outside Africa.

The Global Research Collaboration for Infectious Disease Preparedness, an international network of major research funding organisations, recently published a roadmap to data sharing. This may go some way toward addressing the data-sharing challenges. Members of the network are expected to encourage their grantees to be inclusive and publish their results in open access journals. The network includes major funders of research in Africa like the European Commission, Bill & Melinda Gates Foundation and Wellcome Trust.

The roadmap provides a guide on how funders can accelerate research data sharing by the scientists they fund. It recommends that research funding institutions make real-time, external data sharing a requirement. And that research needs to be part of a multi-disciplinary disease network to advance public health emergency responses.

In addition, funding should focus on strengthening institutions’ capacity on a number of fronts. This includes data management, improving data policies, building trust and aligning tools for data sharing.

Allowing researchers to freely access data generated by global academic counterparts is critical for rapidly informing disease control strategies in public health emergencies….(More)”.

Clinical Trial Data Transparency and GDPR Compliance: Implications for Data Sharing and Open Innovation


Paper by Timo Minssen, Rajam N. and Marcel Bogers: “Recent EU initiatives and legislation have considerably increased public access to clinical trials data (CTD). These developments are generally much welcomed for the enhancement of science, trust, and open innovation. However, they also raise many questions and concerns, not least at the interface between CTD transparency and other areas of evolving EU law on the protection of trade secrets, intellectual property rights and privacy.

This paper focuses on privacy issues and on the interrelation between developments in transparency and the EU’s new General Data Protection Regulation 2016/679 (GDPR). More specifically, this paper examines: (1) the genesis of EU transparency regulations, including the incidents, developments and policy concerns that have shaped them; (2) the features and implications of the GDPR which are relevant in the context of clinical trials; and (3) the risk for tensions between the GDPR and the policy goals of CTD transparency, including their implications for data sharing and open innovation. Ultimately, we stress that these and other related factors must be carefully considered and addressed to reap the full benefits of CTD transparency….(More)”.

We Need a Data-Rich Picture of What’s Killing the Planet


Clive Thompson at Wired: “…Marine litter isn’t the only hazard whose contours we can’t fully see. The United Nations has 93 indicators to measure the environmental dimensions of “sustainable development,” and amazingly, the UN found that we have little to no data on 68 percent of them—like how rapidly land is being degraded, the rate of ocean acidification, or the trade in poached wildlife. Sometimes this is because we haven’t collected it; in other cases some data exists but hasn’t been shared globally, or it’s in a myriad of incompatible formats. No matter what, we’re flying blind. “And you can’t manage something if you can’t measure it,” says David Jensen, the UN’s head of environmental peacebuilding.

In other words, if we’re going to help the planet heal and adapt, we need a data revolution. We need to build a “digital eco­system for the environment,” as Jensen puts it.

The good news is that we’ve got the tools. If there’s one thing tech excels at (for good and ill), it’s surveillance, right? We live in a world filled with cameras and pocket computers, titanic cloud computing, and the eerily sharp insights of machine learning. And this stuff can be used for something truly worthwhile: studying the planet.

There are already some remarkable cases of tech helping to break through the fog. Consider Global Fishing Watch, a nonprofit that tracks the world’s fishing vessels, looking for overfishing. They use everything from GPS-like signals emitted by ships to satellite infrared imaging of ship lighting, plugged into neural networks. (It’s massive, cloud-scale data: over 60 million data points per day, making the AI more than 90 percent accurate at classifying what type of fishing activity a boat is engaged in.)

“If a vessel is spending its time in an area that has little tuna and a lot of sharks, that’s questionable,” says Brian Sullivan, cofounder of the project and a senior program manager at Google Earth Outreach. Crucially, Global Fishing Watch makes its data open to anyone­­­—so now the National Geographic Society is using it to lobby for new marine preserves, and governments and nonprofits use it to target illicit fishing.
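Global Fishing Watch’s actual models are neural networks trained on tens of millions of daily data points; purely as an illustration of the underlying intuition, a toy heuristic might flag slow, frequently turning tracks as fishing and fast, straight tracks as transit. The thresholds and feature choices below are invented for the example.

```python
# Toy vessel-behavior heuristic (NOT Global Fishing Watch's model):
# active fishing gear tends to produce low speeds and frequent course changes,
# while transit produces high speeds and near-straight tracks.

def classify_track(speeds_knots, course_changes_deg):
    avg_speed = sum(speeds_knots) / len(speeds_knots)
    avg_turn = sum(abs(c) for c in course_changes_deg) / len(course_changes_deg)
    # Hypothetical thresholds chosen for the demo only.
    if avg_speed < 5 and avg_turn > 20:
        return "fishing"
    return "transit"

print(classify_track([2.1, 3.0, 2.5], [45, -60, 30]))   # -> fishing
print(classify_track([12.0, 13.5, 12.8], [2, -1, 3]))   # -> transit
```

A learned classifier replaces these hand-picked thresholds with patterns extracted from labeled tracks, which is what lets the real system distinguish gear types rather than just “fishing or not.”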

If we want better environmental data, we’ll need for-profit companies with the expertise and high-end sensors to pitch in too. Planet, a firm with an array of 140 satellites, takes daily snapshots of the entire Earth. Customers like insurance and financial firms love that sort of data. (It helps them understand weather and climate risk.) But Planet also offers it to services like Global Forest Watch, which maps deforestation and makes the information available to anyone (like activists who help bust illegal loggers). Meanwhile, Google’s skill in cloud-based data crunching helps illuminate the state of surface water: Google digitized 30 years of measurements from around the globe—extracting some from ancient magnetic tapes—then created an easy-to-use online tool that lets resource-poor countries figure out where their water needs protecting….(More)”.

Access to Data in Connected Cars and the Recent Reform of the Motor Vehicle Type Approval Regulation


Paper by Wolfgang Kerber and Daniel Moeller: “The need for regulatory solutions for access to in-vehicle data and resources of connected cars is one of the big controversial and unsolved policy issues. Last year the EU revised the Motor Vehicle Type Approval Regulation, which already included a FRAND-like solution for access to repair and maintenance information (RMI) to protect competition on the automotive aftermarkets. However, the transition to connected cars changes the technological conditions for this regulatory solution significantly. This paper analyzes the reform of the type approval regulation and shows that the regulatory solutions for access to RMI are so far insufficient for dealing with the challenges that come with increased connectivity, e.g. with regard to the new remote diagnostic, repair and maintenance services. An important result of the paper is therefore that the transition to connected cars will require a further reform of the rules for regulated access to RMI (esp. with regard to data access, interoperability, and safety/security issues). However, our analysis also suggests that the basic approach of the current regulated access regime for RMI in the type approval regulation can be a model for developing general solutions to the currently unsolved problems of access to in-vehicle data and resources in the ecosystem of connected driving….(More)”.

The Education Data Collaborative: A new kind of partnership.


About: “Whether we work within schools or as part of the broader ecosystem of parent-teacher associations, and philanthropic, nonprofit, and volunteer organizations, we need data to guide decisions about investing our time and resources.

This data is typically expensive to gather, often unvalidated (e.g. self-reported), and commonly available only to those who collect or report it. It can even be hard to ask for data when it’s not clear what’s available. At the same time, information – in the form of discrete research, report-card style PDFs, or static websites – is everywhere. The result is that many already resource-thin organizations that could be collaborating around strategies to help kids advance, spend a lot of time in isolation collecting and searching for data.

In the past decade, we’ve seen solid progress in addressing part of the problem: the emergence of connected longitudinal data systems (LDS). These warehouses and linked databases contain data that can help us understand how students progress over time. No personally identifiable information (or PII) is shared, yet the data can reveal where interventions are most needed. Because these systems are typically designed for researchers and policy professionals, they are rarely accessible to the educators, parents, and partners – arts, sports, academic enrichment (e.g. STEM), mentoring, and family support programs – that play such important roles in helping young people learn and succeed…

“We need open tools for the ecosystem – parents, volunteers, non-profit organizations and the foundations and agencies that support them. These partners can realize significant benefit from the same kind of data policy makers and education leaders hold in their LDS.


That’s why we’re launching the Education Data Collaborative. Working together, we can build tools that help us use data to improve the design, efficacy, and impact of programs and interventions, and find new ways to work with public education systems to achieve great things for kids. …Data collaboratives, data trusts, and other kinds of multi-sector data partnerships are among the most important civic innovations to emerge in the past decade….(More)”

The Age of Digital Interdependence


Report of the High-level Panel on Digital Cooperation: “The immense power and value of data in the modern economy can and must be harnessed to meet the SDGs, but this will require new models of collaboration. The Panel discussed potential pooling of data in areas such as health, agriculture and the environment to enable scientists and thought leaders to use data and artificial intelligence to better understand issues and find new ways to make progress on the SDGs. Such data commons would require criteria for establishing relevance to the SDGs, standards for interoperability, rules on access and safeguards to ensure privacy and security.

Anonymised data – information that is rendered anonymous in such a way that the data subject is not or no longer identifiable – about progress toward the SDGs is generally less sensitive and controversial than the use of personal data of the kind companies such as Facebook, Twitter or Google may collect to drive their business models, or facial and gait data that could be used for surveillance. However, personal data can also serve development goals, if handled with proper oversight to ensure its security and privacy.

For example, individual health data is extremely sensitive – but many people’s health data, taken together, can allow researchers to map disease outbreaks, compare the effectiveness of treatments and improve understanding of conditions. Aggregated data from individual patient cases was crucial to containing the Ebola outbreak in West Africa. Private and public sector healthcare providers around the world are now using various forms of electronic medical records. These help individual patients by making it easier to personalise health services, but the public health benefits require these records to be interoperable.
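The aggregation principle the report describes can be sketched simply: individual records stay private, and only group counts above a minimum size are released (a k-anonymity-style suppression threshold). The district names and threshold below are hypothetical.

```python
# Toy aggregation-with-suppression sketch: release counts only for groups
# large enough to reduce re-identification risk (hypothetical data).
from collections import Counter

K = 3  # hypothetical minimum group size for release

cases = ["district_a", "district_a", "district_a", "district_a",
         "district_b", "district_b", "district_c"]

counts = Counter(cases)
released = {d: n for d, n in counts.items() if n >= K}
suppressed = sorted(d for d, n in counts.items() if n < K)
print(released)    # -> {'district_a': 4}
print(suppressed)  # -> ['district_b', 'district_c']
```

Real disclosure-control regimes layer further protections on top of a simple threshold (noise addition, rounding, cell merging), but the trade-off is the same: aggregate enough to protect individuals while keeping the signal useful for outbreak mapping.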

There is scope to launch collaborative projects to test the interoperability of data, standards and safeguards across the globe. The World Health Assembly’s consideration of a global strategy for digital health in 2020 presents an opportunity to launch such projects, which could initially be aimed at global health challenges such as Alzheimer’s and hypertension.

Improved digital cooperation on a data-driven approach to public health has the potential to lower costs, build new partnerships among hospitals, technology companies, insurance providers and research institutes and support the shift from treating diseases to improving wellness. Appropriate safeguards are needed to ensure the focus remains on improving health care outcomes. With testing, experience and necessary protective measures as well as guidelines for the responsible use of data, similar cooperation could emerge in many other fields related to the SDGs, from education to urban planning to agriculture…(More)”.

Data & Policy: A new venue to study and explore policy–data interaction


Opening editorial by Stefaan G. Verhulst, Zeynep Engin and Jon Crowcroft: “…Policy–data interactions or governance initiatives that use data have been the exception rather than the norm, isolated prototypes and trials rather than an indication of real, systemic change. There are various reasons for the generally slow uptake of data in policymaking, and several factors will have to change if the situation is to improve. ….

  • Despite the number of successful prototypes and small-scale initiatives, policy makers’ understanding of data’s potential and its value proposition generally remains limited (Lutes, 2015). There is also limited appreciation of the advances data science has made in the last few years. This is a major limiting factor; we cannot expect policy makers to use data if they do not recognize what data and data science can do.
  • The recent (and justifiable) backlash against how certain private companies handle consumer data has had something of a reverse halo effect: There is a growing lack of trust in the way data is collected, analyzed, and used, and this often leads to a certain reluctance (or simply risk-aversion) on the part of officials and others (Engin, 2018).
  • Despite several high-profile open data projects around the world, much (probably the majority) of data that could be helpful in governance remains either privately held or otherwise hidden in silos (Verhulst and Young, 2017b). There remains a shortage not only of data but, more specifically, of high-quality and relevant data.
  • With few exceptions, the technical capacities of officials remain limited, and this has obviously negative ramifications for the potential use of data in governance (Giest, 2017).
  • It’s not just a question of limited technical capacities. There is often a vast conceptual and values gap between the policy and technical communities (Thompson et al., 2015; Uzochukwu et al., 2016); sometimes it seems as if they speak different languages. Compounding this difference in world views is the fact that the two communities rarely interact.
  • Yet data about the use of data, and evidence of its impact, remain sparse. The impetus to use more data in policy making is stymied by limited scholarship and a weak evidential basis showing that data can be helpful, and how. Without such evidence, data advocates are limited in their ability to make the case for more data initiatives in governance.
  • Data are not only changing the way policy is developed; they have also reopened the debate around theory- versus data-driven methods in generating scientific knowledge (Lee, 1973; Kitchin, 2014; Chivers, 2018; Dreyfuss, 2017), directly questioning the evidence base for the utilization and implementation of data within policy making. A number of associated challenges are being discussed, such as: (i) the traceability and reproducibility of research outcomes (due to “black box” processing); (ii) the use of correlation instead of causation as the basis of analysis, and the biases and uncertainties present in large historical datasets, which cause replication and, in some cases, amplification of human cognitive biases and imperfections; and (iii) the incorporation of existing human knowledge and domain expertise into scientific knowledge generation processes—among many other topics (Castelvecchi, 2016; Miller and Goodchild, 2015; Obermeyer and Emanuel, 2016; Provost and Fawcett, 2013).
  • Finally, we believe there should be a sound theory underpinning what we call Policy–Data Interactions. To date, in reaction to the proliferation of data in the commercial world, theories of data management,1 privacy,2 and fairness3 have emerged. From the Human–Computer Interaction world, a manifesto of principles of Human–Data Interaction (Mortier et al., 2014) has found traction, which aims to reduce the asymmetry of power present in current design considerations of systems of data about people. However, we need a consistent, symmetric approach to the consideration of systems of policy and data, and how they interact with one another.

All these challenges are real, and they are sticky. We are under no illusions that they will be overcome easily or quickly….

During the past four conferences, we have hosted an incredibly diverse range of dialogues and examinations by key global thought leaders, opinion leaders, practitioners, and the scientific community (Data for Policy, 2015, 2016, 2017, 2019). What became increasingly obvious was the need for a dedicated venue to deepen and sustain the conversations and deliberations beyond the limitations of an annual conference. This leads us to today and the launch of Data & Policy, which aims to confront and mitigate the barriers to greater use of data in policy making and governance.

Data & Policy is a venue for peer-reviewed research and discussion about the potential for and impact of data science on policy. Our aim is to provide a nuanced and multistranded assessment of the potential and challenges involved in using data for policy and to bridge the “two cultures” of science and humanism—as CP Snow famously described in his lecture on “Two Cultures and the Scientific Revolution” (Snow, 1959). By doing so, we also seek to bridge the two other dichotomies that limit an examination of datafication and its interaction with policy from various angles: the divide between practice and scholarship; and between private and public…

So these are our principles: scholarly, pragmatic, open-minded, interdisciplinary, focused on actionable intelligence, and, most of all, innovative in how we will share insight and push at the boundaries of what we already know and what already exists. We are excited to launch Data & Policy with the support of Cambridge University Press and University College London, and we’re looking for partners to help us build it as a resource for the community. If you’re reading this manifesto it means you have at least a passing interest in the subject; we hope you will be part of the conversation….(More)”.