Paper by Scott R. Baker & Lorenz Kueng: “The growth of the availability and use of detailed household financial transaction microdata has dramatically expanded the ability of researchers to understand both household decision-making and aggregate fluctuations across a wide range of fields. This class of transaction data is derived from a myriad of sources including financial institutions, FinTech apps, and payment intermediaries. We review how these detailed data have been utilized in finance and economics research and the benefits they enable beyond more traditional measures of income, spending, and wealth. We discuss the future potential for this flexible class of data in firm-focused research, real-time policy analysis, and macro statistics….(More)”.
Financial data unbound: The value of open data for individuals and institutions
Paper by McKinsey Global Institute: “As countries around the world look to ensure rapid recovery once the COVID-19 crisis abates, improved financial services are emerging as a key element to boost growth, raise economic efficiency, and lift productivity. Robust digital financial infrastructure proved its worth during the crisis, helping governments cushion people and businesses from the economic shock of the pandemic. The next frontier is to create an open-data ecosystem for finance.
Already, technological, regulatory, and competitive forces are moving markets toward easier and safer financial data sharing. Open-data initiatives are springing up globally, including the United Kingdom’s Open Banking Implementation Entity, the European Union’s second payment services directive, Australia’s new consumer protection laws, Brazil’s drafting of open data guidelines, and Nigeria’s new Open Technology Foundation (Open Banking Nigeria). In the United States, the Consumer Financial Protection Bureau aims to facilitate a consumer-authorized data-sharing market, while the Financial Data Exchange consortium attempts to promote common, interoperable standards for secure access to financial data. Yet, even as many countries put in place stronger digital financial infrastructure and data-sharing mechanisms, COVID-19 has exposed limitations and gaps in their reach, a theme we explored in earlier research.
This discussion paper from the McKinsey Global Institute (download full text in 36-page PDF) looks at the potential value that could be created—and the key issues that will need to be addressed—by the adoption of open data for finance. We focus on four regions: the European Union, India, the United Kingdom, and the United States.
By open data, we mean the ability to share financial data through a digital ecosystem in a manner that requires limited effort or manipulation. Advantages include more accurate credit risk evaluation and risk-based pricing, improved workforce allocation, better product delivery and customer service, and stronger fraud protection.
Our analysis suggests that the boost to the economy from broad adoption of open-data ecosystems could range from about 1 to 1.5 percent of GDP in 2030 in the European Union, the United Kingdom, and the United States, to as much as 4 to 5 percent in India. All market participants benefit, be they institutions or consumers—either individuals or micro-, small-, and medium-sized enterprises (MSMEs)—albeit to varying degrees….(More)”.
How data governance technologies can democratize data sharing for community well-being
Paper by Dan Wu, Stefaan Verhulst, Alex Pentland, Thiago Avila, Kelsey Finch, and Abhishek Gupta in Data & Policy (Cambridge University Press) focusing on “Data sharing efforts to allow underserved groups and organizations to overcome the concentration of power in our data landscape…
A few special organizations, due to their data monopolies and resources, are able to decide which problems to solve and how to solve them. But even though data sharing creates a counterbalancing democratizing force, it must nevertheless be approached cautiously. Underserved organizations and groups must navigate difficult barriers related to technological complexity and legal risk.
To examine what those common barriers are, one type of data sharing effort, data trusts, is examined, specifically through the reports commenting on that effort. To address these practical issues, data governance technologies have a large role to play in democratizing data trusts safely and in a trustworthy manner. Yet technology is far from a silver bullet. It is dangerous to rely upon it. But technology that is no-code, flexible, and secure can help operate data trusts more responsibly. This type of technology helps innovators put relationships at the center of their efforts….(More)”.
Charting the ‘Data for Good’ Landscape
Report by Jake Porway at Data.org: “There is huge potential for data science and AI to play a productive role in advancing social impact. However, the field of “data for good” is not only overshadowed by the public conversations about the risks rampant data misuse can pose to civil society, it is also a fractured and disconnected space. There are a myriad of different interpretations of what it means to “use data for good” or “use AI for good”, which creates duplicate efforts, nonstrategic initiatives, and confusion about what a successfully data-driven social sector could look like. To add to that, funding is scarce for a field that requires expensive tools and skills to do well. These enduring challenges result in work being done at an activity and project level, but do not create a coherent set of building blocks to constitute a strong and healthy field that is capable of solving a new class of systems-level problems.
We are taking one tiny step forward in trying to make a more coherent Data for Good space with a landscape that makes clear what various Data for Good initiatives (and AI for Good initiatives) are trying to achieve, how they do it, and what makes them similar to or different from one another. One of the major points of confusion in talking about “Data for Good” is that it treats all efforts as similar by the mere fact that they use “data” and seek to do something “good”. This term is so broad as to be practically meaningless, as unhelpful as saying “Wood for Good”. We would laugh at a term as vague as “Wood for Good”, which would lump together activities as different as building houses, burning wood in cook stoves, and making paper, combining architecture with carpentry, forestry with fuel. However, we are content to say “Data for Good”, and its related phrases “we need to use our data better” or “we need to be data-driven”, when data is arguably even more general than something like wood.
We are trying to bring clarity to the conversation by going beyond mapping organizations into arbitrary groups, to define the dimensions of what it means to do data for good. By creating an ontology for what Data for Good initiatives seek to achieve, in which sector, and by what means, we can gain a better understanding of the underlying fundamentals of using data for good, and create a landscape of what initiatives are doing.
We hope that this landscape of initiatives will help to bring some more nuance and clarity to the field, as well as identify which initiatives are out there and what purpose they serve. Specifically, we hope this landscape will help:
- Data for Good field practitioners align on a shared language for the outcomes, activities, and aims of the field.
- Purpose-driven organizations that are interested in applying data and computing to their missions better understand what they might need and whom they might approach to get it.
- Funders make more strategic decisions about funding in the data/AI space based on activities that align with their interests and the amount of funding already devoted to that area.
- Organizations with Data for Good initiatives can find one another and collaborate based on similarity of mission and activities.
Below you will find a very preliminary landscape map, along with a description of the different kinds of groups in the Data for Good ecosystem and why you might need to engage with them….(More)”.
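The three dimensions named in the excerpt above (what an initiative seeks to achieve, in which sector, and by what means) can be pictured as a small data structure. The following is only a minimal sketch under stated assumptions: the `Initiative` class, the `group_by` helper, and every entry in `landscape` are hypothetical illustrations, not categories or organizations taken from the report.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Initiative:
    """One initiative described along the three dimensions (hypothetical model)."""
    name: str    # placeholder initiative name
    goal: str    # what it seeks to achieve
    sector: str  # in which sector it works
    means: str   # by what means it pursues the goal


def group_by(initiatives, dimension):
    """Group initiative names by one dimension: 'goal', 'sector' or 'means'."""
    groups = defaultdict(list)
    for item in initiatives:
        groups[getattr(item, dimension)].append(item.name)
    return dict(groups)


# Illustrative entries only; none of these come from the report.
landscape = [
    Initiative("Org A", "build capacity", "health", "training"),
    Initiative("Org B", "build capacity", "education", "pro bono projects"),
    Initiative("Org C", "share data", "health", "data platform"),
]

by_goal = group_by(landscape, "goal")
by_sector = group_by(landscape, "sector")
print(by_goal)
print(by_sector)
```

Grouping the same records by a different dimension shows why a shared vocabulary matters: two initiatives that look alike by sector may pursue different goals by different means.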
On regulation for data trusts
Paper by Aline Blankertz and Louisa Specht: “Data trusts are a promising concept for enabling data use while maintaining data privacy. Data trusts can pursue many goals, such as increasing the participation of consumers or other data subjects, putting data protection into practice more effectively, or strengthening data sharing along the value chain. They have the potential to become an alternative model to the large platforms, which are accused of accumulating data power and using it primarily for their own purposes rather than for the benefit of their users. To fulfill these hopes, data trusts must be trustworthy so that their users understand and trust that data is being used in their interest.
It is an important step that policymakers have recognized the potential of data trusts. This should be followed by measures that address specific risks and thus promote trust in the services. Currently, the political approach is to subject all forms of data trusts to the same rules through “one size fits all” regulation. This is the case, for example, with the Data Governance Act (DGA), which gives data trusts little leeway to evolve in the marketplace.
To encourage the development of data trusts, it makes sense to broadly define them as all organizations that manage data on behalf of others while adhering to a legal framework (including competition, trade secrets, and privacy). Which additional rules are necessary to ensure trustworthiness should be decided depending on the use case. The risk of a use case should be considered as well as the need for incentives to act as a data trust.
Risk factors can be identified across sectors; in particular, centralized or decentralized data storage and voluntary or mandatory use of data trusts are among them. The business model is not a main risk factor. Although many regulatory proposals call for strict neutrality, several data trusts without strict neutrality appear trustworthy in terms of monetization or vertical integration. At the same time, it is unclear what incentives exist for developing strictly neutral data trusts. Neutrality requirements that go beyond what is necessary make it less likely that desired alternative models will develop and take hold….(More)”.
Lessons learned from telco data informing COVID-19 responses: toward an early warning system for future pandemics?
Introduction to a special issue of Data & Policy (Open Access) by Richard Benjamins, Jeanine Vos, and Stefaan Verhulst: “More than a year into the COVID-19 pandemic, the damage is still unfolding. While some countries have recently managed to gain an element of control through aggressive vaccine campaigns, much of the developing world, South and Southeast Asia in particular, remains in a state of crisis. Given what we now know about the global nature of this disease and the potential for mutant versions to develop and spread, a crisis anywhere is cause for concern everywhere. The world remains very much in the grip of this public health crisis.
From the beginning, there has been hope that data and technology could offer solutions to help inform the government’s response strategy and decision-making. Many of the expectations have been focused on mobile data analytics in particular, whereby mobile network operators create mobility insights and decision-support tools generated from anonymized and aggregated telco data. This stems both from a growing group of mobile network operators having invested significantly in systems and capabilities to develop such products and services for public and private sector customers, and from the value of those products having been demonstrated in addressing different global challenges, ranging from models to better understand the spread of Zika in Brazil to interactive dashboards to aid emergency services during earthquakes and floods in Japan. Yet despite these experiences, many governments across the world still have limited awareness, capabilities and resources to leverage these tools in their efforts to limit the spread of COVID-19 using non-pharmaceutical interventions (NPI), both from a medical and economic point of view.
Today, we release the first batch of papers of a special collection of Data & Policy that examines both the potential of mobile data and the challenges faced in delivering these tools to inform government decision-making. Consisting of 5 papers from 33 researchers and experts from academia, industry and government, the articles cover a wide range of geographies, including Europe, Argentina, Brazil, Ecuador, France, Gambia, Germany, Ghana, Austria, Belgium, and Spain. Responding to our call for case studies to illustrate the opportunities (and challenges) offered by mobile big data in the fight against COVID-19, the authors of these papers describe a number of examples of how mobile and mobile-related data have been used to address the medical, economic, socio-cultural and political aspects of the pandemic….(More)”.
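To make the phrase "anonymized and aggregated telco data" more concrete, here is a minimal sketch of one common simplification: counting distinct subscribers per origin-destination flow and suppressing flows below a minimum group size. It is not the method of any paper in the collection. The `aggregate_flows` function, the record layout, and the threshold of 3 are all assumptions made for illustration, and small-count suppression on its own does not guarantee anonymity.

```python
from collections import Counter

MIN_GROUP_SIZE = 3  # hypothetical threshold; real deployments choose their own


def aggregate_flows(records, min_group_size=MIN_GROUP_SIZE):
    """Count distinct subscribers per (origin, destination) and drop small counts.

    `records` is an iterable of (subscriber_id, origin, destination) tuples.
    The result contains no subscriber identifiers.
    """
    # Deduplicate so each subscriber counts once per flow.
    unique_trips = {(sub, origin, dest) for sub, origin, dest in records}
    counts = Counter((origin, dest) for _, origin, dest in unique_trips)
    # Suppress flows with too few subscribers to report.
    return {flow: n for flow, n in counts.items() if n >= min_group_size}


# Made-up records for illustration only.
records = [
    ("s1", "A", "B"), ("s2", "A", "B"), ("s3", "A", "B"),
    ("s1", "A", "B"),  # repeat trip by the same subscriber
    ("s4", "A", "C"), ("s5", "A", "C"),
]
flows = aggregate_flows(records)
print(flows)
```

In this toy input the flow from A to C is dropped because only two subscribers made it, while the repeat trip by `s1` does not inflate the count for A to B.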
Using big data for insights into the gender digital divide for girls: A discussion paper

UNICEF paper: “This discussion paper describes the findings of a study that used big data as an alternative data source to understand the gender digital divide for under-18s. It describes 6 key insights gained from analysing big data from Facebook and Instagram platforms, and discusses how big data can be further used to contribute to the body of evidence for the gender digital divide for adolescent girls….(More)”
Linux Foundation unveils new permissive license for open data collaboration
VentureBeat: “The Linux Foundation has announced a new permissive license designed to help foster collaboration around open data for artificial intelligence (AI) and machine learning (ML) projects.
Data may be the new oil, but for AI and ML projects, having access to expansive and diverse datasets is key to reducing bias and building powerful models capable of all manner of intelligent tasks. For machines, data is a little like “experience” is for humans — the more of it you have, the better decisions you are likely to make.
With CDLA-Permissive-2.0, the Linux Foundation is building on its previous efforts to encourage data-sharing through licensing arrangements that clearly define how the data — and any derivative datasets — can and can’t be used.
The Linux Foundation introduced the Community Data License Agreement (CDLA) in 2017 to entice organizations to open up their vast pools of (underused) data to third parties. There were two original licenses: a sharing license with a “copyleft” reciprocal commitment borrowed from the open source software sphere, which stipulates that any derivative datasets built from the original dataset must be shared under a similar license; and a permissive license (1.0) without any such obligations in place (much as “true” open source software might be defined).
Licenses are basically legal documents that outline how a piece of work (in this case datasets) can be used or modified, but specific phrases, ambiguities, or exceptions can often be enough to spook companies if they think releasing content under a specific license could cause them problems down the line. This is where the CDLA-Permissive-2.0 license comes into play — it’s essentially a rewrite of version 1.0 but shorter and simpler to follow. Going further, it has removed certain provisions that were deemed unnecessary or burdensome and may have hindered broader use of the license.
For example, version 1.0 of the license included obligations that data recipients preserve attribution notices in the datasets. For context, attribution notices or statements are standard in the software sphere, where a company that releases software built on open source components has to credit the creators of these components in its own software license. But the Linux Foundation said feedback it received from the community and lawyers representing companies involved in open data projects pointed to challenges around associating attributions with data (or versions of datasets).
So while data source attribution is still an option, and might make sense for specific projects — particularly where transparency is paramount — it is no longer a condition for businesses looking to share data under the new permissive license. The chief remaining obligation is that the main community data license agreement text be included with the new datasets…(More)”.
Collective data rights can stop big tech from obliterating privacy
Article by Martin Tisne: “…There are two parallel approaches that should be pursued to protect the public.
One is better use of class or group actions, otherwise known as collective redress actions. Historically, these have been limited in Europe, but in November 2020 the European Parliament passed a measure that requires all 27 EU member states to allow collective redress actions across the region. Compared with the US, the EU has stronger laws protecting consumer data and promoting competition, so class or group action lawsuits in Europe can be a powerful tool for lawyers and activists to force big tech companies to change their behavior even in cases where the per-person damages would be very low.
Class action lawsuits have most often been used in the US to seek financial damages, but they can also be used to force changes in policy and practice. They can work hand in hand with campaigns to change public opinion, especially in consumer cases (for example, by forcing Big Tobacco to admit to the link between smoking and cancer, or by paving the way for car seatbelt laws). They are powerful tools when there are thousands, if not millions, of similar individual harms, which add up to help prove causation. Part of the problem is getting the right information to sue in the first place. Government efforts, like a lawsuit brought against Facebook in December by the Federal Trade Commission (FTC) and a group of 46 states, are crucial. As the tech journalist Gilad Edelman puts it, “According to the lawsuits, the erosion of user privacy over time is a form of consumer harm—a social network that protects user data less is an inferior product—that tips Facebook from a mere monopoly to an illegal one.” In the US, as the New York Times recently reported, private lawsuits, including class actions, often “lean on evidence unearthed by the government investigations.” In the EU, however, it’s the other way around: private lawsuits can open up the possibility of regulatory action, which is constrained by the gap between EU-wide laws and national regulators.
Which brings us to the second approach: a little-known 2016 French law called the Digital Republic Bill. The Digital Republic Bill is one of the few modern laws focused on automated decision making. The law currently applies only to administrative decisions taken by public-sector algorithmic systems. But it provides a sketch for what future laws could look like. It says that the source code behind such systems must be made available to the public. Anyone can request that code.
Importantly, the law enables advocacy organizations to request information on the functioning of an algorithm and the source code behind it even if they don’t represent a specific individual or claimant who is allegedly harmed. The need to find a “perfect plaintiff” who can prove harm in order to file a suit makes it very difficult to tackle the systemic issues that cause collective data harms. Laure Lucchesi, the director of Etalab, a French government office in charge of overseeing the bill, says that the law’s focus on algorithmic accountability was ahead of its time. Other laws, like the European General Data Protection Regulation (GDPR), focus too heavily on individual consent and privacy. But both the data and the algorithms need to be regulated…(More)”
Data-driven environmental decision-making and action in armed conflict
Essay by Wim Zwijnenburg: “Our understanding of how severely armed conflicts have affected natural resources, ecosystems and biodiversity, and of their long-term implications for the climate, has massively improved over the last decade. Without a doubt, cataclysmic events such as the 1991 Gulf War oil fires contributed to raising awareness of the conflict-environment nexus, and the images of burning wells are engraved into our collective mind. But another more recent, under-examined yet major contributor to this growing cognizance is the digital revolution, which has provided us with a wealth of data and information from conflict-affected countries quickly made available through the internet. With just a few clicks, anyone with a computer or smartphone and a Wi-Fi connection can follow, often in near-real time, events in warzones shared through social media or satellite imagery showing what is unfolding on the ground.
These developments have significantly deepened our understanding of how military activities, both historically and in current conflicts, contribute to environmental damage and can impact the lives and livelihoods of civilians. Geospatial analysis through earth observation (EO) is now widely used to document international humanitarian law (IHL) violations, improve humanitarian response and inform post-conflict assessments.
These new insights on conflict-environment dynamics have driven humanitarian, military and political responses. The latter are essential for the protection of the environment in armed conflict: with knowledge and understanding also comes a responsibility to prevent, mitigate and minimize environmental damage, in line with existing international obligations. Of particular relevance, under international humanitarian law, militaries must take into account incidental environmental damage that is reasonably foreseeable based on an assessment of information from all sources available to them at the relevant time (ICRC Guidelines on the Protection of the Environment, Rule 7; Customary IHL Rule 43). Excessive harm is prohibited, and all feasible precautions must be taken to reduce incidental damage (Guidelines Rule 8, Customary IHL Rule 44).
How do we ensure that the data-driven strides forward in understanding conflict-driven environmental damage translate into proper military training and decision-making, humanitarian response and reconstruction efforts? How can this influence behaviour change and improve accountability for military actions and targeting decisions?…(More)”.