On regulation for data trusts


Paper by Aline Blankertz and Louisa Specht: “Data trusts are a promising concept for enabling data use while maintaining data privacy. Data trusts can pursue many goals, such as increasing the participation of consumers or other data subjects, putting data protection into practice more effectively, or strengthening data sharing along the value chain. They have the potential to become an alternative model to the large platforms, which are accused of accumulating data power and using it primarily for their own purposes rather than for the benefit of their users. To fulfill these hopes, data trusts must be trustworthy so that their users understand and trust that data is being used in their interest.

It is an important step that policymakers have recognized the potential of data trusts. This should be followed by measures that address specific risks and thus promote trust in the services. Currently, the political approach is to subject all forms of data trusts to the same rules through “one size fits all” regulation. This is the case, for example, with the Data Governance Act (DGA), which gives data trusts little leeway to evolve in the marketplace.

To encourage the development of data trusts, it makes sense to define them broadly as all organizations that manage data on behalf of others while adhering to the applicable legal framework (including competition, trade secrets, and privacy). Which additional rules are necessary to ensure trustworthiness should be decided depending on the use case, taking into account both the risk of the use case and the need for incentives to act as a data trust.

Risk factors can be identified across sectors; they include, in particular, whether data storage is centralized or decentralized and whether use of a data trust is voluntary or mandatory. The business model is not a main risk factor. Although many regulatory proposals call for strict neutrality, several data trusts that deviate from it, for example through monetization or vertical integration, nonetheless appear trustworthy. At the same time, it is unclear what incentives exist for developing strictly neutral data trusts. Neutrality requirements that go beyond what is necessary make it less likely that desired alternative models will develop and take hold….(More)”.

Lessons learned from telco data informing COVID-19 responses: toward an early warning system for future pandemics?


Introduction to a special issue of Data and Policy (Open Access) by Richard Benjamins, Jeanine Vos, and Stefaan Verhulst: “More than a year into the COVID-19 pandemic, the damage is still unfolding. While some countries have recently managed to gain an element of control through aggressive vaccine campaigns, much of the developing world — South and Southeast Asia in particular — remains in a state of crisis. Given what we now know about the global nature of this disease and the potential for mutant versions to develop and spread, a crisis anywhere is cause for concern everywhere. The world remains very much in the grip of this public health crisis.

From the beginning, there has been hope that data and technology could offer solutions to help inform governments’ response strategies and decision-making. Many of these expectations have focused on mobile data analytics in particular, whereby mobile network operators create mobility insights and decision-support tools from anonymized and aggregated telco data. This stems both from the significant investments a growing group of mobile network operators have made in systems and capabilities to develop such products and services for public- and private-sector customers, and from the value these tools have demonstrated in addressing other global challenges, ranging from models to better understand the spread of Zika in Brazil to interactive dashboards to aid emergency services during earthquakes and floods in Japan. Yet despite these experiences, many governments across the world still have limited awareness, capabilities and resources to leverage these tools in their efforts to limit the spread of COVID-19 through non-pharmaceutical interventions (NPIs), both from a medical and an economic point of view.
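
As a rough illustration of the kind of aggregation these decision-support products build on, the sketch below turns a hypothetical table of already-anonymized call-detail records into daily area-to-area flows with small counts suppressed. The column names, schema and threshold are assumptions for illustration, not any operator's actual pipeline, which applies far more extensive privacy safeguards.

```python
# Hypothetical sketch: turning anonymized call-detail records (CDRs) into
# aggregated mobility insights of the kind described above. Column names
# and the suppression threshold are illustrative assumptions, not any
# operator's actual schema or policy.
import pandas as pd

SUPPRESSION_THRESHOLD = 15  # drop flows smaller than this (assumed value)

def daily_od_flows(cdrs: pd.DataFrame) -> pd.DataFrame:
    """Aggregate pseudonymous CDRs into daily origin-destination flows between areas."""
    cdrs = cdrs.sort_values(["subscriber_id", "timestamp"]).copy()
    cdrs["date"] = cdrs["timestamp"].dt.date

    # Take each subscriber's first and last observed area per day as a crude
    # proxy for where they started and ended the day.
    daily = cdrs.groupby(["subscriber_id", "date"]).agg(
        origin=("area_code", "first"),
        destination=("area_code", "last"),
    ).reset_index()

    # Aggregate to area-to-area counts and drop small flows so no individual
    # or small group can be singled out.
    flows = (
        daily.groupby(["date", "origin", "destination"])
        .size()
        .reset_index(name="trips")
    )
    return flows[flows["trips"] >= SUPPRESSION_THRESHOLD]

# Example usage with synthetic data:
# cdrs = pd.DataFrame({
#     "subscriber_id": ["a", "a", "b"],
#     "timestamp": pd.to_datetime(["2020-04-01 08:00", "2020-04-01 18:00", "2020-04-01 09:00"]),
#     "area_code": ["X", "Y", "X"],
# })
# print(daily_od_flows(cdrs))
```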

Today, we release the first batch of papers in a special collection of Data & Policy that examines both the potential of mobile data and the challenges faced in delivering these tools to inform government decision-making. This first batch consists of 5 papers by 33 researchers and experts from academia, industry and government, covering a wide range of geographies, including Europe, Argentina, Brazil, Ecuador, France, Gambia, Germany, Ghana, Austria, Belgium, and Spain. Responding to our call for case studies illustrating the opportunities (and challenges) offered by mobile big data in the fight against COVID-19, the authors describe a number of examples of how mobile and mobile-related data have been used to address the medical, economic, socio-cultural and political aspects of the pandemic….(More)”.

Using big data for insights into the gender digital divide for girls: A discussion paper


UNICEF paper: “This discussion paper describes the findings of a study that used big data as an alternative data source to understand the gender digital divide for under-18s. It describes 6 key insights gained from analysing big data from Facebook and Instagram platforms, and discusses how big data can be further used to contribute to the body of evidence for the gender digital divide for adolescent girls….(More)”

Linux Foundation unveils new permissive license for open data collaboration


VentureBeat: “The Linux Foundation has announced a new permissive license designed to help foster collaboration around open data for artificial intelligence (AI) and machine learning (ML) projects.

Data may be the new oil, but for AI and ML projects, having access to expansive and diverse datasets is key to reducing bias and building powerful models capable of all manner of intelligent tasks. For machines, data is a little like “experience” is for humans — the more of it you have, the better decisions you are likely to make.

With CDLA-Permissive-2.0, the Linux Foundation is building on its previous efforts to encourage data-sharing through licensing arrangements that clearly define how the data — and any derivative datasets — can and can’t be used.

The Linux Foundation introduced the Community Data License Agreement (CDLA) in 2017 to entice organizations to open up their vast pools of (underused) data to third parties. There were two original licenses: a sharing license with a “copyleft” reciprocal commitment borrowed from the open source software sphere, stipulating that any derivative datasets built from the original dataset must be shared under a similar license; and a permissive license (1.0) without any such obligations (much as “true” open source software might be defined).

Licenses are basically legal documents that outline how a piece of work (in this case datasets) can be used or modified, but specific phrases, ambiguities, or exceptions can often be enough to spook companies if they think releasing content under a specific license could cause them problems down the line. This is where the CDLA-Permissive-2.0 license comes into play — it’s essentially a rewrite of version 1.0 but shorter and simpler to follow. Going further, it has removed certain provisions that were deemed unnecessary or burdensome and may have hindered broader use of the license.

For example, version 1.0 of the license included obligations that data recipients preserve attribution notices in the datasets. For context, attribution notices or statements are standard in the software sphere, where a company that releases software built on open source components has to credit the creators of these components in its own software license. But the Linux Foundation said feedback it received from the community and lawyers representing companies involved in open data projects pointed to challenges around associating attributions with data (or versions of datasets).

So while data source attribution is still an option, and might make sense for specific projects — particularly where transparency is paramount — it is no longer a condition for businesses looking to share data under the new permissive license. The chief remaining obligation is that the main community data license agreement text be included with the new datasets…(More)”.
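
As a sketch (not legal guidance) of what meeting that remaining obligation could look like in a simple file-based publishing workflow, the snippet below copies the CDLA-Permissive-2.0 text alongside a derived dataset and optionally adds an attribution notice. File names and layout are illustrative assumptions.

```python
# Illustrative sketch of packaging a derived dataset under CDLA-Permissive-2.0:
# the one firm obligation described above is shipping the agreement text with
# the data; the attribution notice is optional under 2.0. File names and
# layout are assumptions, not a prescribed structure.
from pathlib import Path
from typing import Optional
import shutil

def package_derived_dataset(data_file: Path, license_file: Path, out_dir: Path,
                            attribution: Optional[str] = None) -> None:
    """Copy the dataset and the CDLA-Permissive-2.0 text into a publishable folder."""
    out_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(data_file, out_dir / data_file.name)
    shutil.copy(license_file, out_dir / "LICENSE")  # required: include the agreement text
    if attribution:                                  # optional under 2.0
        (out_dir / "NOTICE").write_text(attribution, encoding="utf-8")

# Example:
# package_derived_dataset(Path("derived.csv"), Path("CDLA-Permissive-2.0.txt"),
#                         Path("release/"), attribution="Derived from Example Corp's open dataset.")
```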

Collective data rights can stop big tech from obliterating privacy


Article by Martin Tisne: “…There are two parallel approaches that should be pursued to protect the public.

One is better use of class or group actions, otherwise known as collective redress actions. Historically, these have been limited in Europe, but in November 2020 the European Parliament passed a measure requiring all 27 EU member states to implement mechanisms allowing for collective redress actions across the region. Compared with the US, the EU has stronger laws protecting consumer data and promoting competition, so class or group action lawsuits in Europe can be a powerful tool for lawyers and activists to force big tech companies to change their behavior even in cases where the per-person damages would be very low.

Class action lawsuits have most often been used in the US to seek financial damages, but they can also be used to force changes in policy and practice. They can work hand in hand with campaigns to change public opinion, especially in consumer cases (for example, by forcing Big Tobacco to admit to the link between smoking and cancer, or by paving the way for car seatbelt laws). They are powerful tools when there are thousands, if not millions, of similar individual harms, which add up to help prove causation. Part of the problem is getting the right information to sue in the first place. Government efforts, like a lawsuit brought against Facebook in December by the Federal Trade Commission (FTC) and a group of 46 states, are crucial. As the tech journalist Gilad Edelman puts it, “According to the lawsuits, the erosion of user privacy over time is a form of consumer harm—a social network that protects user data less is an inferior product—that tips Facebook from a mere monopoly to an illegal one.” In the US, as the New York Times recently reported, private lawsuits, including class actions, often “lean on evidence unearthed by the government investigations.” In the EU, however, it’s the other way around: private lawsuits can open up the possibility of regulatory action, which is constrained by the gap between EU-wide laws and national regulators.

Which brings us to the second approach: a little-known 2016 French law called the Digital Republic Bill, one of the few modern laws focused on automated decision-making. The law currently applies only to administrative decisions taken by public-sector algorithmic systems. But it provides a sketch of what future laws could look like. It says that the source code behind such systems must be made available to the public. Anyone can request that code.

Importantly, the law enables advocacy organizations to request information on the functioning of an algorithm and the source code behind it even if they don’t represent a specific individual or claimant who is allegedly harmed. The need to find a “perfect plaintiff” who can prove harm in order to file a suit makes it very difficult to tackle the systemic issues that cause collective data harms. Laure Lucchesi, the director of Etalab, a French government office in charge of overseeing the bill, says that the law’s focus on algorithmic accountability was ahead of its time. Other laws, like the European General Data Protection Regulation (GDPR), focus too heavily on individual consent and privacy. But both the data and the algorithms need to be regulated…(More)”

Data-driven environmental decision-making and action in armed conflict


Essay by Wim Zwijnenburg: “Our understanding of how severely armed conflicts have impacted natural resources, ecosystems and biodiversity, and of their long-term implications for the climate, has massively improved over the last decade. Without a doubt, cataclysmic events such as the 1991 Gulf War oil fires contributed to raising awareness of the conflict-environment nexus, and the images of burning wells are engraved in our collective memory. But another major contributor to this growing cognizance, more recent and under-examined, is the digital revolution, which has provided us with a wealth of data and information from conflict-affected countries, quickly made available through the internet. With just a few clicks, anyone with a computer or smartphone and a Wi-Fi connection can follow, often in near-real time, events shared through social media in warzones or satellite imagery showing what is unfolding on the ground.

These developments have significantly deepened our understanding of how military activities, both historically and in current conflicts, contribute to environmental damage and can impact the lives and livelihoods of civilians. Geospatial analysis through earth observation (EO) is now widely used to document international humanitarian law (IHL) violations, improve humanitarian response and inform post-conflict assessments.

These new insights on conflict-environment dynamics have driven humanitarian, military and political responses. The latter are essential for the protection of the environment in armed conflict: with knowledge and understanding also comes a responsibility to prevent, mitigate and minimize environmental damage, in line with existing international obligations. Of particular relevance, under international humanitarian law, militaries must take into account incidental environmental damage that is reasonably foreseeable based on an assessment of information from all sources available to them at the relevant time (ICRC Guidelines on the Protection of the Environment, Rule 7; Customary IHL Rule 43). Excessive harm is prohibited, and all feasible precautions must be taken to reduce incidental damage (Guidelines Rule 8, Customary IHL Rule 44).

How do we ensure that the data-driven strides forward in understanding conflict-driven environmental damage translate into proper military training and decision-making, humanitarian response and reconstruction efforts? How can this influence behaviour change and improve accountability for military actions and targeting decisions?…(More)”.

Investing in Data Saves Lives


Mark Lowcock and Raj Shah at Project Syndicate: “…Our experience of building a predictive model, and its use by public-health officials in these countries, showed that this approach could lead to better humanitarian outcomes. But it was also a reminder that significant data challenges, regarding both gaps and quality, limit the viability and accuracy of such models for the world’s most vulnerable countries. For example, data on the prevalence of cardiovascular diseases was 4-7 years old in several poorer countries, and not available at all for Sudan and South Sudan.

Globally, we are still missing about 50% of the data needed to respond effectively in countries experiencing humanitarian emergencies. OCHA and The Rockefeller Foundation are cooperating to provide early insight into crises, during and beyond the COVID-19 pandemic. But realizing the full potential of our approach depends on the contributions of others.
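
To make a coverage figure like that 50% concrete, here is a minimal sketch of how completeness might be scored per country: count how many required indicators are both present and reasonably current. The indicator list and freshness threshold are invented for illustration and are not OCHA's actual methodology.

```python
# Illustrative sketch of a per-country "data completeness" score of the kind
# alluded to above. Indicator names and the freshness threshold are invented.
from datetime import date
from typing import Dict, Optional

REQUIRED_INDICATORS = ["population", "cardiovascular_prevalence", "hospital_beds", "mobility"]
MAX_AGE_YEARS = 3  # assumed freshness threshold

def completeness(last_updated: Dict[str, Optional[date]], today: date) -> float:
    """Share of required indicators that exist and are recent enough."""
    ok = 0
    for indicator in REQUIRED_INDICATORS:
        updated = last_updated.get(indicator)
        if updated is not None and (today - updated).days <= MAX_AGE_YEARS * 365:
            ok += 1
    return ok / len(REQUIRED_INDICATORS)

# Hypothetical example: prevalence data is several years old, mobility is missing.
example = {
    "population": date(2020, 7, 1),
    "cardiovascular_prevalence": date(2015, 1, 1),
    "hospital_beds": date(2019, 6, 1),
    "mobility": None,
}
print(f"Completeness: {completeness(example, date(2021, 5, 1)):.0%}")  # -> 50%
```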

So, as governments, development banks, and major humanitarian and development agencies reflect on the first year of the pandemic response, as well as on discussions at the recent World Bank Spring Meetings, they must recognize the crucial role data will play in recovering from this crisis and preventing future ones. Filling gaps in critical data should be a top priority for all humanitarian and development actors.

Governments, humanitarian organizations, and regional development banks thus need to invest in data collection, data-sharing infrastructure, and the people who manage these processes. Likewise, these stakeholders must become more adept at responsibly sharing their data through open data platforms that maintain rigorous interoperability standards.

Where data are not available, the private sector should develop new sources of information through innovative methods such as using anonymized social-media data or call records to understand population movement patterns….(More)”.

Next-generation nowcasting to improve decision making in a crisis


Frank Gerhard, Marie-Paule Laurent, Kyriakos Spyrounakos, and Eckart Windhagen at McKinsey: “In light of the limitations of the traditional models, we recommend a modified approach to nowcasting that uses country- and industry-specific expertise to boil down the number of variables to a selected few for each geography or sector, depending on the individual economic setting. Given the specific selection of each core variable, the relationships between the variables will be relatively stable over time, even during a major crisis. Admittedly, the more variables used, the easier it is to explain an economic shift; however, using more variables also means a greater chance of a break in some of the statistical relationships, particularly in response to an exogenous shock.

This revised nowcasting model will be more flexible and robust in periods of economic stress. It will produce economically intuitive outcomes, incorporate complementary high-frequency data, and offer access to economic insights that are at once timely and unique.

Exhibit: Nowcast for Q1 2021 shows differing recovery speeds by sector and geography.

For example, consumer spending can be estimated in different US cities by combining data such as wages from business applications and footfall from mobility trend reports. As a more complex example: eurozone capitalization rates are, at the time of the writing of this article, available only through January 2021. However, a revamped nowcasting model can estimate current capitalization rates in various European countries by employing a handful of real-time and high-frequency variables for each, such as retail confidence indicators, stock-exchange indices, price expectations, construction estimates, base-metals prices and output, and even deposits into financial institutions. The choice of variable should, of course, be guided by industry and sector experts.
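
A minimal sketch of this kind of nowcast, assuming a handful of expert-selected high-frequency indicators with placeholder names and synthetic values (not the model described in the article): fit a simple regression of the slower official series on the indicators over the quarters where both are available, then apply it to the latest quarter where only the indicators have been published.

```python
# Minimal nowcasting sketch: regress a slow official series (e.g., a
# capitalization rate or sector GVA series) on a few high-frequency
# indicators, then predict the not-yet-published quarter. Variable names
# and values are illustrative placeholders.
import pandas as pd
from sklearn.linear_model import LinearRegression

# Quarterly history where the official series is available.
history = pd.DataFrame({
    "retail_confidence": [-5.0, -30.0, -12.0, -8.0],
    "stock_index":       [100.0, 78.0, 90.0, 96.0],
    "mobility_footfall": [1.00, 0.55, 0.80, 0.85],
    "official_series":   [250.0, 205.0, 232.0, 240.0],  # published up to the last known quarter
})

indicators = ["retail_confidence", "stock_index", "mobility_footfall"]
model = LinearRegression().fit(history[indicators], history["official_series"])

# Latest quarter: indicators are observed but the official series is not yet published.
latest = pd.DataFrame({"retail_confidence": [-6.0],
                       "stock_index": [99.0],
                       "mobility_footfall": [0.88]})
nowcast = model.predict(latest[indicators])[0]
print(f"Nowcast for the unpublished quarter: {nowcast:.1f}")
```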

Similarly, published figures for gross value added (GVA) at the sector level in Europe are available only up to the second quarter of 2020. However, by utilizing selected variables, the new approach to nowcasting can provide an estimate of GVA through the first quarter of 2021. It can also highlight the different experiences of each region and industry sector in the recent recovery. Note that the sectors reliant on in-person interactions and of a nonessential nature have been slow to recover, as have the countries more reliant on international markets (exhibit)….(More)”.

Enabling Trusted Data Collaboration in Society


Launch of Public Beta of the Data Responsibility Journey Mapping Tool: “Data Collaboratives, the purpose-driven reuse of data in the public interest, have demonstrated their ability to unlock the societal value of siloed data and create real-world impacts. Data collaboration has been key in generating new insights and action in areas like public health, education, crisis response, and economic development, to name a few. Designing and deploying a data collaborative, however, is a complex undertaking, subject to risks of misuse of data as well as missed use of data that could have provided public value if used effectively and responsibly.

Today, The GovLab is launching the public beta of a new tool intended to help Data Stewards — responsible data leaders across sectors — and other decision-makers assess and mitigate risks across the life cycle of a data collaborative. The Data Responsibility Journey is an assessment tool for Data Stewards to identify and mitigate risks, establish trust, and maximize the value of their work. Informed by The GovLab’s longstanding research and practice in the field, and myriad consultations with data responsibility experts across regions and contexts, the tool aims to support decision-making in public agencies, civil society organizations, large businesses, small businesses, and humanitarian and development organizations, in particular.

The Data Responsibility Journey guides users through important questions and considerations across the lifecycle of data stewardship and collaboration: Planning, Collecting, Processing, Sharing, Analyzing, and Using. For each stage, users are asked to consider whether important data responsibility issues have been taken into account as part of their implementation strategy. When users flag an issue as in need of more attention, it is automatically added to a customized data responsibility strategy report providing actionable recommendations, relevant tools and resources, and key internal and external stakeholders that could be engaged to help operationalize these data responsibility actions…(More)”.
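
The flow described above (stage-by-stage prompts, flagged issues, and a generated strategy report) can be pictured with a minimal sketch like the following. The stage names come from the paragraph above; the example questions and report format are invented for illustration and are not the tool's actual content.

```python
# Minimal sketch of the assessment flow described above: the user walks
# through lifecycle stages, flags issues needing attention, and the flagged
# items are collected into a strategy report. Stage names follow the text;
# the example questions and report layout are invented.
from dataclasses import dataclass, field
from typing import List

STAGES = ["Planning", "Collecting", "Processing", "Sharing", "Analyzing", "Using"]

@dataclass
class Issue:
    stage: str
    question: str
    flagged: bool = False  # True means "needs more attention"

@dataclass
class Assessment:
    issues: List[Issue] = field(default_factory=list)

    def flag(self, stage: str, question: str) -> None:
        self.issues.append(Issue(stage, question, flagged=True))

    def report(self) -> str:
        lines = ["Data responsibility strategy report", "-" * 36]
        for stage in STAGES:
            flagged = [i.question for i in self.issues if i.stage == stage and i.flagged]
            if flagged:
                lines.append(f"{stage}:")
                lines.extend(f"  - needs attention: {q}" for q in flagged)
        return "\n".join(lines)

# Example usage with invented questions:
a = Assessment()
a.flag("Collecting", "Has a lawful basis for collection been documented?")
a.flag("Sharing", "Is there a data-sharing agreement covering re-use?")
print(a.report())
```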

Data for Good Collaboration


Research Report by Swinburne University of Technology’s Social Innovation Research Institute: “…partnered with the Lord Mayor’s Charitable Foundation, Entertainment Assist, Good Cycles and Yooralla Disability Services, to create the Data for Good Collaboration. The project had two aims:

– Build organisational data capacity through knowledge sharing about data literacy, expertise and collaboration
– Deliver data insights through a methodology of collaborative data analytics

This report presents key findings from our research partnership, which involved the design and delivery of a series of webinars that built data literacy, and of participatory data capacity-building workshops facilitated by teams of social scientists and data scientists. It also draws on interviews with participants, reflecting on the benefits and opportunities data literacy can offer to individuals and organisations in the not-for-profit and NGO sectors…(More)”.