Linux Foundation unveils new permissive license for open data collaboration


VentureBeat: “The Linux Foundation has announced a new permissive license designed to help foster collaboration around open data for artificial intelligence (AI) and machine learning (ML) projects.

Data may be the new oil, but for AI and ML projects, having access to expansive and diverse datasets is key to reducing bias and building powerful models capable of all manner of intelligent tasks. For machines, data is a little like “experience” is for humans — the more of it you have, the better decisions you are likely to make.

With CDLA-Permissive-2.0, the Linux Foundation is building on its previous efforts to encourage data-sharing through licensing arrangements that clearly define how the data — and any derivative datasets — can and can’t be used.

The Linux Foundation introduced the Community Data License Agreement (CDLA) in 2017 to entice organizations to open up their vast pools of (underused) data to third parties. There were two original licenses: a sharing license, with a “copyleft” reciprocal commitment borrowed from the open source software sphere stipulating that any derivative datasets built from the original dataset must be shared under a similar license; and a permissive license (1.0) without any such obligations in place (much as “true” open source software might be defined).

Licenses are basically legal documents that outline how a piece of work (in this case, datasets) can be used or modified, but specific phrases, ambiguities, or exceptions can often be enough to spook companies if they think releasing content under a specific license could cause them problems down the line. This is where the CDLA-Permissive-2.0 license comes into play — it is essentially a rewrite of version 1.0, but shorter and simpler to follow. It also removes certain provisions that were deemed unnecessary or burdensome and that may have hindered broader use of the license.

For example, version 1.0 of the license included obligations that data recipients preserve attribution notices in the datasets. For context, attribution notices or statements are standard in the software sphere, where a company that releases software built on open source components has to credit the creators of these components in its own software license. But the Linux Foundation said feedback it received from the community and lawyers representing companies involved in open data projects pointed to challenges around associating attributions with data (or versions of datasets).

So while data source attribution is still an option, and might make sense for specific projects — particularly where transparency is paramount — it is no longer a condition for businesses looking to share data under the new permissive license. The chief remaining obligation is that the main community data license agreement text be included with the new datasets…(More)”.
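
For teams weighing the license, here is a minimal sketch of what complying with that remaining obligation could look like when redistributing a derivative dataset. The file names, packaging approach, and use of Python are illustrative assumptions, not requirements of the license.

```python
import zipfile

def package_derivative_dataset(data_csv: str, cdla_text: str, out_zip: str) -> None:
    """Bundle a derivative dataset together with the CDLA-Permissive-2.0 text.

    Shipping the agreement text alongside the data is the chief remaining
    obligation under version 2.0; attribution files are optional and omitted here.
    """
    with zipfile.ZipFile(out_zip, "w") as bundle:
        bundle.write(data_csv, arcname="data/derived.csv")
        bundle.write(cdla_text, arcname="LICENSE-CDLA-Permissive-2.0.txt")

# Hypothetical usage; paths are placeholders for a real dataset and the
# official license text obtained from the CDLA site.
# package_derivative_dataset("derived.csv", "cdla-permissive-2.0.txt", "release.zip")
```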

Collective data rights can stop big tech from obliterating privacy


Article by Martin Tisne: “…There are two parallel approaches that should be pursued to protect the public.

One is better use of class or group actions, otherwise known as collective redress actions. Historically, these have been limited in Europe, but in November 2020 the European parliament passed a measure that requires all 27 EU member states to implement measures allowing for collective redress actions across the region. Compared with the US, the EU has stronger laws protecting consumer data and promoting competition, so class or group action lawsuits in Europe can be a powerful tool for lawyers and activists to force big tech companies to change their behavior even in cases where the per-person damages would be very low.

Class action lawsuits have most often been used in the US to seek financial damages, but they can also be used to force changes in policy and practice. They can work hand in hand with campaigns to change public opinion, especially in consumer cases (for example, by forcing Big Tobacco to admit to the link between smoking and cancer, or by paving the way for car seatbelt laws). They are powerful tools when there are thousands, if not millions, of similar individual harms, which add up to help prove causation. Part of the problem is getting the right information to sue in the first place. Government efforts, like a lawsuit brought against Facebook in December by the Federal Trade Commission (FTC) and a group of 46 states, are crucial. As the tech journalist Gilad Edelman puts it, “According to the lawsuits, the erosion of user privacy over time is a form of consumer harm—a social network that protects user data less is an inferior product—that tips Facebook from a mere monopoly to an illegal one.” In the US, as the New York Times recently reported, private lawsuits, including class actions, often “lean on evidence unearthed by the government investigations.” In the EU, however, it’s the other way around: private lawsuits can open up the possibility of regulatory action, which is constrained by the gap between EU-wide laws and national regulators.

Which brings us to the second approach: a little-known 2016 French law called the Digital Republic Bill. It is one of the few modern laws focused on automated decision-making. The law currently applies only to administrative decisions taken by public-sector algorithmic systems. But it provides a sketch for what future laws could look like. It says that the source code behind such systems must be made available to the public. Anyone can request that code.

Importantly, the law enables advocacy organizations to request information on the functioning of an algorithm and the source code behind it even if they don’t represent a specific individual or claimant who is allegedly harmed. The need to find a “perfect plaintiff” who can prove harm in order to file a suit makes it very difficult to tackle the systemic issues that cause collective data harms. Laure Lucchesi, the director of Etalab, a French government office in charge of overseeing the bill, says that the law’s focus on algorithmic accountability was ahead of its time. Other laws, like the European General Data Protection Regulation (GDPR), focus too heavily on individual consent and privacy. But both the data and the algorithms need to be regulated…(More)”

Data-driven environmental decision-making and action in armed conflict


Essay by Wim Zwijnenburg: “Our understanding of how severely armed conflicts have impacted natural resources, eco-systems, biodiversity and long-term implications on climate has massively improved over the last decade. Without a doubt, cataclysmic events such as the 1991 Gulf War oil fires contributed to raising awareness on the conflict-environment nexus, and the images of burning wells are engraved into our collective mind. But another more recent, under-examined yet major contributor to this growing cognizance is the digital revolution, which has provided us with a wealth of data and information from conflict-affected countries quickly made available through the internet. With just a few clicks, anyone with a computer or smartphone and a Wi-Fi connection can follow, often in near-real time, events shared through social media in warzones or satellite imagery showing what is unfolding on the ground.

These developments have significantly deepened our understanding of how military activities, both historically and in current conflicts, contribute to environmental damage and can impact the lives and livelihoods of civilians. Geospatial analysis through earth observation (EO) is now widely used to document international humanitarian law (IHL) violations, improve humanitarian response and inform post-conflict assessments.
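
As a simplified illustration of the kind of EO workflow involved (a sketch only, assuming pre- and post-event red and near-infrared bands have already been loaded as NumPy arrays, and not a description of any particular organization's pipeline), vegetation loss around a site can be flagged by comparing NDVI before and after an incident:

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Normalized Difference Vegetation Index; values near 1 indicate dense, healthy vegetation."""
    return (nir - red) / np.clip(nir + red, 1e-6, None)

def vegetation_loss_mask(nir_before, red_before, nir_after, red_after, threshold=0.2):
    """Flag pixels whose NDVI dropped by more than `threshold` between two acquisitions.

    A crude proxy for burn scars, crop destruction or clearance; a real assessment
    would add cloud masking, image co-registration and ground validation.
    """
    delta = ndvi(nir_before, red_before) - ndvi(nir_after, red_after)
    return delta > threshold

# Tiny illustrative 2x2 "scene": one pixel loses most of its vegetation signal.
nir_b = np.array([[0.8, 0.7], [0.6, 0.8]]); red_b = np.array([[0.1, 0.2], [0.2, 0.1]])
nir_a = np.array([[0.8, 0.2], [0.6, 0.8]]); red_a = np.array([[0.1, 0.2], [0.2, 0.1]])
print(vegetation_loss_mask(nir_b, red_b, nir_a, red_a))
```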

These new insights on conflict-environment dynamics have driven humanitarian, military and political responses. The latter are essential for the protection of the environment in armed conflict: with knowledge and understanding also comes a responsibility to prevent, mitigate and minimize environmental damage, in line with existing international obligations. Of particular relevance, under international humanitarian law, militaries must take into account incidental environmental damage that is reasonably foreseeable based on an assessment of information from all sources available to them at the relevant time (ICRC Guidelines on the Protection of the Environment, Rule 7; Customary IHL Rule 43). Excessive harm is prohibited, and all feasible precautions must be taken to reduce incidental damage (Guidelines Rule 8; Customary IHL Rule 44).

How do we ensure that the data-driven strides forward in understanding conflict-driven environmental damage translate into proper military training and decision-making, humanitarian response and reconstruction efforts? How can this influence behaviour change and improve accountability for military actions and targeting decisions?…(More)”.

Investing in Data Saves Lives


Mark Lowcock and Raj Shah at Project Syndicate: “…Our experience of building a predictive model, and its use by public-health officials in these countries, showed that this approach could lead to better humanitarian outcomes. But it was also a reminder that significant data challenges, regarding both gaps and quality, limit the viability and accuracy of such models for the world’s most vulnerable countries. For example, data on the prevalence of cardiovascular diseases was 4-7 years old in several poorer countries, and not available at all for Sudan and South Sudan.

Globally, we are still missing about 50% of the data needed to respond effectively in countries experiencing humanitarian emergencies. OCHA and The Rockefeller Foundation are cooperating to provide early insight into crises, during and beyond the COVID-19 pandemic. But realizing the full potential of our approach depends on the contributions of others.

So, as governments, development banks, and major humanitarian and development agencies reflect on the first year of the pandemic response, as well as on discussions at the recent World Bank Spring Meetings, they must recognize the crucial role data will play in recovering from this crisis and preventing future ones. Filling gaps in critical data should be a top priority for all humanitarian and development actors.

Governments, humanitarian organizations, and regional development banks thus need to invest in data collection, data-sharing infrastructure, and the people who manage these processes. Likewise, these stakeholders must become more adept at responsibly sharing their data through open data platforms that maintain rigorous interoperability standards.

Where data are not available, the private sector should develop new sources of information through innovative methods such as using anonymized social-media data or call records to understand population movement patterns….(More)”.

Next-generation nowcasting to improve decision making in a crisis


Frank Gerhard, Marie-Paule Laurent, Kyriakos Spyrounakos, and Eckart Windhagen at McKinsey: “In light of the limitations of the traditional models, we recommend a modified approach to nowcasting that uses country- and industry-specific expertise to boil down the number of variables to a selected few for each geography or sector, depending on the individual economic setting. Given the specific selection of each core variable, the relationships between the variables will be relatively stable over time, even during a major crisis. Admittedly, the more variables used, the easier it is to explain an economic shift; however, using more variables also means a greater chance of a break in some of the statistical relationships, particularly in response to an exogenous shock.

This revised nowcasting model will be more flexible and robust in periods of economic stress. It will provide economically intuitive outcomes, include the consideration of complementary, high-frequency data, and offer access to economic insights that are at once timely and unique.

Exhibit: Nowcast for Q1 2021 shows differing recovery speeds by sector and geography.

For example, consumer spending can be estimated in different US cities by combining data such as wages from business applications and footfall from mobility trend reports. As a more complex example: eurozone capitalization rates are, at the time of the writing of this article, available only through January 2021. However, a revamped nowcasting model can estimate current capitalization rates in various European countries by employing a handful of real-time and high-frequency variables for each, such as retail confidence indicators, stock-exchange indices, price expectations, construction estimates, base-metals prices and output, and even deposits into financial institutions. The choice of variable should, of course, be guided by industry and sector experts.
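
To make the mechanics concrete, the following is a minimal sketch of the general idea (not McKinsey's model; the indicator names and numbers are invented): regress the lagging official series on a handful of high-frequency indicators over the quarters where both are observed, then project the latest quarter from the indicators alone.

```python
import numpy as np

# Hypothetical quarterly series: two high-frequency indicators are already
# available for the latest quarter, while the official spending figure is not.
wages    = np.array([100.0, 102, 97, 99, 104, 106, 107])      # e.g. wages from business applications
footfall = np.array([ 80.0,  82, 60, 70,  85,  88,  90])      # e.g. footfall from mobility reports
spending = np.array([200.0, 204, 170, 185, 210, 215, np.nan]) # official series lags one quarter

observed = ~np.isnan(spending)
X = np.column_stack([np.ones(len(wages)), wages, footfall])

# Fit a simple linear relationship on the overlap, then nowcast the missing quarter.
coef, *_ = np.linalg.lstsq(X[observed], spending[observed], rcond=None)
nowcast = X[~observed] @ coef
print(f"Nowcast for the latest quarter: {nowcast[0]:.1f}")
```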

Similarly, published figures for gross value added (GVA) at the sector level in Europe are available only up to the second quarter of 2020. However, by utilizing selected variables, the new approach to nowcasting can provide an estimate of GVA through the first quarter of 2021. It can also highlight the different experiences of each region and industry sector in the recent recovery. Note that the sectors reliant on in-person interactions and of a nonessential nature have been slow to recover, as have the countries more reliant on international markets (exhibit)….(More)”.

Enabling Trusted Data Collaboration in Society


Launch of Public Beta of the Data Responsibility Journey Mapping Tool: “Data Collaboratives, the purpose-driven reuse of data in the public interest, have demonstrated their ability to unlock the societal value of siloed data and create real-world impacts. Data collaboration has been key in generating new insights and action in areas like public health, education, crisis response, and economic development, to name a few. Designing and deploying a data collaborative, however, is a complex undertaking, subject to risks of misuse of data as well as missed use of data that could have provided public value if used effectively and responsibly.

Today, The GovLab is launching the public beta of a new tool intended to help Data Stewards — responsible data leaders across sectors — and other decision-makers assess and mitigate risks across the life cycle of a data collaborative. The Data Responsibility Journey is an assessment tool for Data Stewards to identify and mitigate risks, establish trust, and maximize the value of their work. Informed by The GovLab’s long-standing research and practice in the field, and myriad consultations with data responsibility experts across regions and contexts, the tool aims to support decision-making in public agencies, civil society organizations, large businesses, small businesses, and humanitarian and development organizations, in particular.

The Data Responsibility Journey guides users through important questions and considerations across the lifecycle of data stewardship and collaboration: Planning, Collecting, Processing, Sharing, Analyzing, and Using. For each stage, users are asked to consider whether important data responsibility issues have been taken into account as part of their implementation strategy. When users flag an issue as in need of more attention, it is automatically added to a customized data responsibility strategy report providing actionable recommendations, relevant tools and resources, and key internal and external stakeholders that could be engaged to help operationalize these data responsibility actions…(More)”.
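
To give a sense of that flow, the sketch below shows how such an assessment could be represented programmatically. It is an illustration only, not The GovLab's implementation: the stage names follow the description above, while the questions and recommendations are invented.

```python
from dataclasses import dataclass, field
from typing import List

STAGES = ["Planning", "Collecting", "Processing", "Sharing", "Analyzing", "Using"]

@dataclass
class FlaggedIssue:
    stage: str
    question: str
    recommendation: str

@dataclass
class DataResponsibilityAssessment:
    """Collects the issues a Data Steward flags while working through each stage."""
    flagged: List[FlaggedIssue] = field(default_factory=list)

    def flag(self, stage: str, question: str, recommendation: str) -> None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.flagged.append(FlaggedIssue(stage, question, recommendation))

    def strategy_report(self) -> str:
        """Render flagged issues as a simple, stage-ordered action list."""
        lines = ["Data responsibility strategy (issues needing more attention):"]
        for stage in STAGES:
            for issue in (i for i in self.flagged if i.stage == stage):
                lines.append(f"- [{stage}] {issue.question} -> {issue.recommendation}")
        return "\n".join(lines)

# Hypothetical walk-through in which two issues are flagged during self-assessment.
assessment = DataResponsibilityAssessment()
assessment.flag("Planning", "Have affected communities been consulted?",
                "Schedule stakeholder consultations before collection begins.")
assessment.flag("Sharing", "Is a data-sharing agreement in place?",
                "Draft an agreement covering permitted uses, retention and redress.")
print(assessment.strategy_report())
```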

Data for Good Collaboration


Research Report by Swinburne University of Technology’s Social Innovation Research Institute: “…partnered with the Lord Mayor’s Charitable Foundation, Entertainment Assist, Good Cycles and Yooralla Disability Services, to create the data for good collaboration. The project had two aims:

  • Build organisational data capacity through knowledge sharing about data literacy, expertise and collaboration
  • Deliver data insights through a methodology of collaborative data analytics

This report presents key findings from our research partnership, which involved the design and delivery of a series of webinars that built data literacy; and participatory data capacity-building workshops facilitated by teams of social scientists and data scientists. It also draws on interviews with participants, reflecting on the benefits and opportunities data literacy can offer to individuals and organisations in the not-for-profit and NGO sectors…(More)”.

Developing a Data Reuse Strategy for Solving Public Problems


The Data Stewards Academy…A self-directed learning program from the Open Data Policy Lab (The GovLab): “Communities across the world face unprecedented challenges. Strained by climate change, crumbling infrastructure, growing economic inequality, and the continued costs of the COVID-19 pandemic, institutions need new ways of solving public problems and improving how they operate.

In recent years, data has been increasingly used to inform policies and interventions targeted at these issues. Yet, many of these data projects, data collaboratives, and open data initiatives remain scattered. As we enter into a new age of data use and re-use, a third wave of open data, it is more important than ever to be strategic and purposeful, to find new ways to connect the demand for data with its supply to meet institutional objectives in a socially responsible way.

This self-directed learning program, adapted from a selective executive education course, will help data stewards (and aspiring data stewards) develop a data re-use strategy to solve public problems. Noting the ways data resources can inform their day-to-day and strategic decision-making, the course provides learners with ways they can use data to improve how they operate and pursue goals in the public interest. By working differently—using agile methods and data analytics—public, private, and civil sector leaders can promote data re-use and reduce data access inequities in ways that advance their institution’s goals.

In this self-directed learning program, we will teach participants how to develop a 21st century data strategy. Participants will learn:

  1. Why It Matters: A discussion of the three waves of open data and how data re-use has proven to be transformative;
  2. The Current State of Play: Current practice around data re-use, including deficits of current approaches and the need to shift from ad hoc engagements to more systematic, sustainable, and responsible models;
  3. Defining Demand: Methodologies for how organizations can formulate questions that data can answer and make data collaboratives more purposeful;
  4. Mapping Supply: Methods for organizations to discover and assess the open and private data potentially available to them that could answer the questions at hand;
  5. Matching Supply with Demand: Operational models for connecting and meeting the needs of supply- and demand-side actors in a sustainable way;
  6. Identifying Risks: Overview of the risks that can emerge in the course of data re-use;
  7. Mitigating Risks and Other Considerations: Technical, legal and contractual issues that can be leveraged or may arise in the course of data collaboration and other data work; and
  8. Institutionalizing Data Re-use: Suggestions for how organizations can incorporate data re-use into their organizational structure and foster future collaboration and data stewardship.

The Data Stewardship Executive Education Course was designed and implemented by program leads Stefaan Verhulst, co-founder and chief research development officer at the GovLab, and Andrew Young, The GovLab’s knowledge director, in close collaboration with a global network of expert faculty and advisors. It aims to….(More)”.

Data Stewards Academy Canvas

WHO, Germany launch new global hub for pandemic and epidemic intelligence


Press Release: “The World Health Organization (WHO) and the Federal Republic of Germany will establish a new global hub for pandemic and epidemic intelligence, data, surveillance and analytics innovation. The Hub, based in Berlin and working with partners around the world, will lead innovations in data analytics across the largest network of global data to predict, prevent, detect, prepare for and respond to pandemic and epidemic risks worldwide.

H.E. German Federal Chancellor Dr Angela Merkel said: “The current COVID-19 pandemic has taught us that we can only fight pandemics and epidemics together. The new WHO Hub will be a global platform for pandemic prevention, bringing together various governmental, academic and private sector institutions. I am delighted that WHO chose Berlin as its location and invite partners from all around the world to contribute to the WHO Hub.”

The WHO Hub for Pandemic and Epidemic Intelligence is part of WHO’s Health Emergencies Programme and will be a new collaboration of countries and partners worldwide, driving innovations to increase availability and linkage of diverse data; to develop tools and predictive models for risk analysis; and to monitor disease control measures, community acceptance and infodemics. Critically, the WHO Hub will support the work of public health experts and policy-makers in all countries with insights so they can take rapid decisions to prevent and respond to future public health emergencies.

“We need to identify pandemic and epidemic risks as quickly as possible, wherever they occur in the world. For that aim, we need to strengthen the global early warning surveillance system with improved collection of health-related data and inter-disciplinary risk analysis,” said Jens Spahn, German Minister of Health. “Germany has consistently been committed to support WHO’s work in preparing for and responding to health emergencies, and the WHO Hub is a concrete initiative that will make the world safer.”

Working with partners globally, the WHO Hub will drive a scale-up in innovation for existing forecasting and early warning capacities in WHO and Member States. At the same time, the WHO Hub will accelerate global collaborations across public and private sector organizations, academia, and international partner networks. It will help them to collaborate and co-create the necessary tools for managing and analyzing data for early warning surveillance. It will also promote greater access to data and information….(More)”.

Responsible Data Science


Book by Peter Bruce and Grant Fleming: “The increasing popularity of data science has resulted in numerous well-publicized cases of bias, injustice, and discrimination. The widespread deployment of “Black box” algorithms that are difficult or impossible to understand and explain, even for their developers, is a primary source of these unanticipated harms, making modern techniques and methods for manipulating large data sets seem sinister, even dangerous. When put in the hands of authoritarian governments, these algorithms have enabled suppression of political dissent and persecution of minorities. To prevent these harms, data scientists everywhere must come to understand how the algorithms that they build and deploy may harm certain groups or be unfair.

Responsible Data Science delivers a comprehensive, practical treatment of how to implement data science solutions in an even-handed and ethical manner that minimizes the risk of undue harm to vulnerable members of society. Both data science practitioners and managers of analytics teams will learn how to:

  • Improve model transparency, even for black box models
  • Diagnose bias and unfairness within models using multiple metrics
  • Audit projects to ensure fairness and minimize the possibility of unintended harm…(More)”
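
Illustrating the second of those bullets, here is a minimal sketch (not drawn from the book) of one common disparity metric, the demographic parity difference, computed directly from a classifier's predictions and a protected attribute:

```python
import numpy as np

def demographic_parity_difference(y_pred: np.ndarray, group: np.ndarray) -> float:
    """Difference in positive-prediction rates between group 1 and group 0.

    A value near 0 means the model selects both groups at similar rates; larger
    magnitudes signal a disparity worth probing with further metrics
    (equalized odds, calibration, and so on).
    """
    return float(y_pred[group == 1].mean() - y_pred[group == 0].mean())

# Hypothetical predictions from a binary classifier and a protected attribute.
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_difference(y_pred, group))  # -0.5: group 1 is selected far less often
```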