Marine Data Sharing: Challenges, Technology Drivers and Quality Attributes


Paper by Keila Lima et al: “Many companies have been adopting data-driven applications in which products and services are centered around data analysis to approach new segments of the marketplace. Data ecosystems arise from deliberate data sharing among organizations. However, this migration to the new data-sharing paradigm has not come far in the marine domain. Nevertheless, better utilization of ocean data might be crucial for humankind in the future, for food production and minerals, and to ensure the ocean’s health… We investigate the state of the art regarding data sharing in the marine domain, with a focus on aspects that affect the speed of establishing a data ecosystem for the ocean. We conducted an exploratory case study based on focus groups and workshops to understand the sharing of data in this context. Results: We identified the main challenges of current systems that need to be addressed with respect to data sharing. Additionally, aspects related to the establishment of a data ecosystem were elicited and analyzed in terms of benefits, conflicts, and solutions…(More)”.

Is Facebook’s advertising data accurate enough for use in social science research? Insights from a cross-national online survey


Paper by André Grow et al: “Social scientists increasingly use Facebook’s advertising platform for research, either in the form of conducting digital censuses of the general population, or for recruiting participants for survey research. Both approaches depend on the accuracy of the data that Facebook provides about its users, but little is known about how accurate these data are. We address this gap in a large-scale, cross-national online survey (N = 137,224), in which we compare self-reported and Facebook-classified demographic information (sex, age and region of residence). Our results suggest that Facebook’s advertising platform can be fruitfully used for conducting social science research if additional steps are taken to assess the accuracy of the characteristics under consideration…(More)”.

Ethical Considerations in Re-Using Private Sector Data for Migration-Related Policy


IOM practitioner’s paper: “This paper assesses the ethical risks of using non-traditional data sources to inform migration-related policymaking and suggests practical safeguards for various stages of the data cycle. The past decade has witnessed the rapid growth of non-traditional data (social media, mobile phones, satellite data, bank records, etc.) and their use in migration research and policy. While these data sources may be tempting and can shed light on major migration trends, ensuring the ethical and responsible use of big data at every stage of migration research and policymaking is complex.

The recognition of the potential of new data sources for migration policy has grown exponentially in recent years. Data innovation is one of the crosscutting priorities of IOM’s Migration Data Strategy. Further, the UN General Assembly recognises rapid technological developments and their potential in achieving the Sustainable Development Goals, and the Global Compact for Safe, Orderly and Regular Migration highlights the importance of harnessing data innovation to improve data and evidence for informed policies on migration. However, with big data come big risks. New technological developments have opened new challenges, particularly concerning data protection, individual privacy, human security, and fundamental rights. These risks can be greater for certain migrant and displaced groups.
The identified risks are:…(More)” (see also Big Data for Migration Alliance)

How government can capitalise on a revolution in data sharing


Article by Alison Pritchard: “The pandemic was a watershed moment in the culture of data sharing, and the use of linked data has increasingly become standard practice. From linking census and NHS data to track the virus’s impact among minority ethnic groups, to the linking of timely local data sources to support local authorities’ responses, the value of sharing data across boundaries was self-evident.

Using data to inform a multidisciplinary pandemic response accelerated our longstanding work on data capability. To continue this progress, there is now a need to make government data more organised, easier to access, and integrated for use. Our learning has guided the development of a new cloud-based platform that will ensure that anonymised data about our society and economy are now linked and accessible for vital research and decision-making in the UK.

The idea of sharing data to maximise impact isn’t new to us at the ONS – we’ve been doing this successfully for over 15 years through our well-respected Secure Research Service (SRS). The new Integrated Data Service (IDS) is the next step in this data-sharing journey, where, in a far more advanced form, government will have the ability to work with data at source – in a safe and secure environment – rather than moving data around, which currently creates friction and significant cost. The service, being compliant with the Digital Economy Act, opens up opportunities to capitalise on the often-underutilised research elements of that key legislation.

The launch of the full IDS in the spring of 2023 will see ready-to-use datasets made available to cross-government teams and wider research communities, enabling them to securely share, link and access them for vital research. The service is a collaboration among institutions to work on projects that shed light on some of the big challenges of the day, and to provide the ability to answer questions that we don’t yet know we need to answer…(More)”.

The Data4COVID-19 Review: Assessing the Use of Non-Traditional Data During a Pandemic Crisis


Report by Hannah Chafetz, Andrew J. Zahuranec, Sara Marcucci, Behruz Davletov, and Stefaan Verhulst: “As the last two years of the COVID-19 pandemic demonstrate, pandemics pose major challenges at all levels, with cataclysmic effects on society.

Decision-makers from around the world have sought to mitigate the consequences of COVID-19 through the use of data, including data from non-traditional sources such as social media, wastewater, and credit card and telecommunications companies. However, there has been little research into how non-traditional data initiatives were designed or what impacts they had on COVID-19 responses. 

Over the last eight months, The GovLab, with the support of The Knight Foundation, has sought to fill this gap by conducting a study about how non-traditional data (NTD) sources have been used during COVID-19. 

On October 31st, The GovLab published the report “The Data4COVID-19 Review: Assessing the Use of Non-Traditional Data During a Pandemic Crisis.” The report details how decision-makers around the world have used non-traditional sources through a series of briefings intended for a generalist audience.

The briefings describe and assess how non-traditional data initiatives were designed, planned, and implemented, as well as the project results. 

Findings

The briefings uncovered several findings about why, where, when, and how NTD was used during COVID-19, including that:

  • Officials increasingly called for the use of NTD to answer questions where and when traditional data such as surveys and case data were not sufficient or could not be leveraged. However, the collection and use of traditional data was often needed to validate insights.
  • NTD sources were primarily used to understand populations’ health, mobility (or physical movements), economic activity, and sentiment about the pandemic. In comparison with previous dynamic crises, COVID-19 was a watershed moment in terms of access to and re-use of non-traditional data in those four areas.
  • The majority of NTD initiatives were fragmented and uncoordinated, reflecting the larger fragmented COVID-19 response. Many projects were focused on responding to COVID-19 after outbreaks occurred. This pattern reflected an overall lack of preparedness for the pandemic and need for the rapid development of initiatives to address its consequences.
  • NTD initiatives frequently took the form of cross-sectoral data partnerships or collaborations developed to respond to specific needs. Many institutions did not have the systems and infrastructure in place for these collaborations to be sustainable.
  • Many of the NTD initiatives involving granular, personal data were implemented without the necessary social license to do so, leading to public concerns about ethics and hindering public trust in non-traditional data.

Stefaan Verhulst, Co-Founder and Chief R&D Officer of The GovLab, explains: “The use of NTD offers growing potential during crisis situations. When managed responsibly, NTD use can help us understand the current state of the crisis, forecast how it will progress, and respond to different aspects of it in real-time.”…(More)”.

Data for Social Good: Non-Profit Sector Data Projects


Open Access Book by Jane Farmer, Anthony McCosker, Kath Albury & Amir Aryani: “In February 2020, just pre-COVID, a group of managers from community organisations met with us researchers about data for social good. “We want to collaborate with data,” said one CEO. “We want to find the big community challenges, work together to fix them and monitor the change we make over ten years.” The managers created a small, pooled fund and, through the 2020–2021 COVID lockdowns, used Zoom to workshop. Together we identified organisations’ datasets, probed their strengths and weaknesses, and found ways to share and visualise data. There were early frustrations about what data was available, its ‘granularity’ and whether new insights about the community could be found, but about half-way through the project, there was a tipping point, and something changed. While still focused on discovery from visualisations comparing their data by suburb, the group started to talk about other benefits. Through drawing in staff from across their organisations, they saw how the work of departments could be integrated by using data, and they developed new confidence in using analytics techniques. Together, the organisations developed an understanding of each other’s missions and services, while developing new relationships, trust and awareness of the possibilities of collaborating to address community needs. Managers completed the pilot having codesigned an interactive Community Resilience Dashboard, which enabled them to visualise their own organisations’ data and open public data to reveal new landscapes about community financial wellbeing and social determinants of health. They agreed they also had so much more: a collective data-capable partnership, internally and across organisations, with new potential to achieve community social justice driven by data.

We use this story to signify how right now is a special—indeed critical—time for non-profit organisations and communities to build their capability to work with data. Certainly, in high-income countries, there is pressure on non-profits to operate like commercial businesses—prioritising efficiency and using data about their outputs and impacts to compete for funding. However, beyond the immediate operational horizon, non-profits can use data analytics techniques to drive community social justice and potentially impact the institutional capability of the whole social welfare sector. Non-profits generate a lot of data, but innovating with technology is not a traditional competence, and it demands infrastructure investment and a specialist workforce. Given their meagre access to funding, this book examines how non-profits of different types and sizes can use data for social good and find a path to data capability. The aim is to inspire and give practical examples of how non-profits can make data useful. While there is an emerging range of novel data for social good cases around the world, the case studies featured in this book exemplify our research and developing thinking in experimental data projects with diverse non-profits that harnessed various types of data. We outline a way to gain data capability through collaborating internally across departments and with other external non-profits and skilled data analytics partners. We term this way of working collaborative data action…(More)”.

Wicked Problems Might Inspire Greater Data Sharing


Paper by Susan Ariel Aaronson: “In 2021, the United Nations Development Program issued a plea in their 2021 Digital Economy Report. “Global data-sharing can help address major global development challenges such as poverty, health, hunger and climate change. …Without global cooperation on data and information, research to develop the vaccine and actions to tackle the impact of the pandemic would have been a much more difficult task. Thus, in the same way as some data can be public goods, there is a case for some data to be considered as global public goods, which need to be addressed and provided through global governance.” (UNDP: 2021, 178). Global public goods are goods and services with benefits and costs that potentially extend to all countries, people, and generations. Global data sharing can also help solve what scholars call wicked problems—problems so complex that they require innovative, cost-effective and global mitigating strategies. Wicked problems are problems that no one knows how to solve without creating further problems. Hence, policymakers must find ways to encourage greater data sharing among entities that hold large troves of various types of data, while protecting that data from theft, manipulation, etc. Many factors impede global data sharing for public good purposes; this analysis focuses on two.
First, policymakers generally don’t think about data as a global public good; they view data as a commercial asset that they should nurture and control. While they may understand that data can serve the public interest, they are more concerned with using data to serve their country’s economic interest. Second, many leaders of civil society and business see the data they have collected as proprietary data. So far, many leaders of private entities with troves of data are not convinced that their organization will benefit from such sharing. At the same time, companies voluntarily share some data for social good purposes.

However, data cannot meet its public good purpose if it is not shared among societal entities. Moreover, if policymakers treat data as a sovereign asset, they are unlikely to encourage data sharing across borders oriented towards addressing shared problems. Consequently, society will be less able to use data as both a commercial asset and as a resource to enhance human welfare. As the Bennett Institute and ODI have argued, “value comes from data being brought together, and that requires organizations to let others use the data they hold.” But that also means the entities that collected the data may not accrue all of the benefits from that data (Bennett Institute and ODI: 2020a: 4). In short, private entities are not sufficiently incentivized to share data in the global public good…(More)”.

Global healthcare fairness: We should be sharing more, not less, data


Paper by Kenneth P. Seastedt et al: “The availability of large, deidentified health datasets has enabled significant innovation in using machine learning (ML) to better understand patients and their diseases. However, questions remain regarding the true privacy of this data, patient control over their data, and how we regulate data sharing in a way that does not encumber progress or further potentiate biases for underrepresented populations. After reviewing the literature on potential reidentifications of patients in publicly available datasets, we argue that the cost—measured in terms of access to future medical innovations and clinical software—of slowing ML progress is too great to limit sharing data through large publicly available databases for concerns of imperfect data anonymization. This cost is especially great for developing countries where the barriers preventing inclusion in such databases will continue to rise, further excluding these populations and increasing existing biases that favor high-income countries. Preventing artificial intelligence’s progress towards precision medicine and sliding back to clinical practice dogma may pose a larger threat than concerns of potential patient reidentification within publicly available datasets. While the risk to patient privacy should be minimized, we believe this risk will never be zero, and society has to determine an acceptable risk threshold below which data sharing can occur—for the benefit of a global medical knowledge system….(More)”.

Eliminate data asymmetries to democratize data use


Article by Rahul Matthan: “Anyone who possesses a large enough store of data can reasonably expect to glean powerful insights from it. These insights are more often than not used to enhance advertising revenues or ensure greater customer stickiness. In other instances, they’ve been subverted to alter our political preferences and manipulate us into taking decisions we otherwise may not have.

The ability to generate insights places those who have access to these data sets at a distinct advantage over those whose data is contained within them. It allows the former to benefit from the data in ways that the latter may not even have thought possible when they consented to provide it. Given how easily these insights can be used to harm those to whom they pertain, there is a need to mitigate the effects of this data asymmetry.

Privacy law attempts to do this by providing data principals with tools they can use to exert control over their personal data. It requires data collectors to obtain informed consent from data principals before collecting their data and forbids them from using it for any purpose other than that which has been previously notified. This is why, even if that consent has been obtained, data fiduciaries cannot collect more data than is absolutely necessary to achieve the stated purpose and are only allowed to retain that data for as long as is necessary to fulfil the stated purpose.

In India, we’ve gone one step further and built techno-legal solutions to help reduce this data asymmetry. The Data Empowerment and Protection Architecture (DEPA) framework makes it possible to extract data from the silos in which they reside and transfer it on the instructions of the data principal to other entities, which can then use it to provide other services to the data principal. This data micro-portability dilutes the historical advantage that incumbents enjoy on account of collecting data over the entire duration of their customer engagement. It eliminates data asymmetries by establishing the infrastructure that creates a competitive market for data-based services, allowing data principals to choose from a range of options as to how their data could be used for their benefit by service providers.

This, however, is not the only type of asymmetry we have to deal with in this age of big data. In a recent article, Stefaan Verhulst of GovLab at New York University pointed out that it is no longer enough to possess large stores of data—you need to know how to effectively extract value from it. Many businesses might have vast stores of data that they have accumulated over the years they have been in operation, but very few of them are able to effectively extract useful signals from that noisy data.

Without the know-how to translate data into actionable information, merely owning a large data set is of little value.

Unlike data asymmetries, which can be mitigated by making data more widely available, information asymmetries can only be addressed by radically democratizing the techniques and know-how that are necessary for extracting value from data. This know-how is largely proprietary and hard to access even in a fully competitive market. What’s more, in many instances, the computation power required far exceeds the capacity of entities for whom data analysis is not the main purpose of their business…(More)”.
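To make the data micro-portability idea concrete, the sketch below models, in highly simplified form, how a DEPA-style consent artifact could gate the transfer of a data principal’s records from an incumbent data holder to a new service provider. This is an illustrative toy only: the class names, fields, and flow are hypothetical assumptions for this sketch and do not reflect the actual DEPA or Account Aggregator specifications.

```python
# Toy sketch of consent-mediated data portability in the spirit of DEPA-style
# frameworks. All names and fields are hypothetical and greatly simplified.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class ConsentArtifact:
    """Machine-readable record of what the data principal has agreed to."""
    principal_id: str       # whose data may be shared
    provider: str           # entity currently holding the data (the "silo")
    consumer: str           # entity the data may be transferred to
    purpose: str            # stated purpose; use beyond this is out of scope
    data_fields: List[str]  # only the fields needed for the purpose
    expires_at: datetime    # consent is time-bound, not open-ended

    def permits(self, consumer: str, fields: List[str], now: datetime) -> bool:
        return (
            consumer == self.consumer
            and now < self.expires_at
            and all(f in self.data_fields for f in fields)
        )


class DataProvider:
    """Holds the principal's records and releases them only against valid consent."""

    def __init__(self, name: str, records: Dict[str, Dict[str, str]]):
        self.name = name
        self._records = records

    def fetch(self, artifact: ConsentArtifact, consumer: str,
              fields: List[str], now: datetime) -> Dict[str, str]:
        if artifact.provider != self.name:
            raise PermissionError("consent artifact does not name this provider")
        if not artifact.permits(consumer, fields, now):
            raise PermissionError("request exceeds the scope of the principal's consent")
        record = self._records[artifact.principal_id]
        # Data minimisation: return only the consented fields, nothing more.
        return {f: record[f] for f in fields}


if __name__ == "__main__":
    bank = DataProvider("incumbent_bank", {
        "alice": {"balance": "12400", "txn_count_90d": "57", "address": "redacted"},
    })
    consent = ConsentArtifact(
        principal_id="alice",
        provider="incumbent_bank",
        consumer="new_lender",
        purpose="loan underwriting",
        data_fields=["balance", "txn_count_90d"],
        expires_at=datetime.now() + timedelta(days=30),
    )
    # The new service provider obtains exactly the consented slice of data,
    # diluting the incumbent's historical data advantage.
    print(bank.fetch(consent, "new_lender", ["balance", "txn_count_90d"], datetime.now()))
```

The point of the sketch is the design choice the article describes: the consent artifact, not the incumbent’s goodwill, determines which fields move, to whom, for what purpose, and for how long.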

Data and displacement: Ethical and practical issues in data-driven humanitarian assistance for IDPs


Blog by Vicki Squire: “Ten years since the so-called “data revolution” (Pearn et al., 2022), the rise of “innovation” and the proliferation of “data solutions” has rendered the assessment of changing data practices within the humanitarian sector ever more urgent. New data acquisition modalities have provoked a range of controversies across multiple contexts and sites (e.g. Human Rights Watch, 2021, 2022a, 2022b). Moreover, a range of concerns have been raised about data sharing (e.g. Fast, 2022) and the inequities embedded within humanitarian data (e.g. Data Values, 2022).

With this in mind, the Data and Displacement project set out to explore the practical and ethical implications of data-driven humanitarian assistance in two contexts characterised by high levels of internal displacement: north-eastern Nigeria and South Sudan. Our interdisciplinary research team includes academics from each of the regions under analysis, as well as practitioners from the International Organization for Migration. From the start, the research was designed to centre the lived experiences of Internally Displaced Persons (IDPs), while also shedding light on the production and use of humanitarian data from multiple perspectives.

We conducted primary research during 2021-2022. Our research combines dataset analysis and visualisation techniques with a thematic analysis of 174 semi-structured qualitative interviews. In total we interviewed 182 people: 42 international data experts, donors, and humanitarian practitioners from a range of governmental and non-governmental organisations; 40 stakeholders and practitioners working with IDPs across north-eastern Nigeria and South Sudan (20 in each region); and 100 IDPs in camp-like settings (50 in each region). Our findings point to a disconnect between international humanitarian standards and practices on the ground, the need to revisit existing ethical guidelines such as informed consent, and the importance of investing in data literacies…(More)”.