Data may be the new oil, but for AI and ML projects, having access to expansive and diverse datasets is key to reducing bias and building powerful models capable of all manner of intelligent tasks. For machines, data is a little like “experience” is for humans — the more of it you have, the better decisions you are likely to make.
With CDLA-Permissive-2.0, the Linux Foundation is building on its previous efforts to encourage data-sharing through licensing arrangements that clearly define how the data — and any derivative datasets — can and can’t be used.
The Linux Foundation introduced the Community Data License Agreement (CDLA) in 2017 to entice organizations to open up their vast pools of (underused) data to third parties. There were two original licenses, a sharing license with a “copyleft” reciprocal commitment borrowed from the open source software sphere, stipulating that any derivative datasets built from the original dataset must be shared under a similar license, and a permissive license (1.0) without any such obligations in place (much as “true” open source software might be defined).
Licenses are basically legal documents that outline how a piece of work (in this case datasets) can be used or modified, but specific phrases, ambiguities, or exceptions can often be enough to spook companies if they think releasing content under a specific license could cause them problems down the line. This is where the CDLA-Permissive-2.0 license comes into play — it’s essentially a rewrite of version 1.0 but shorter and simpler to follow. Going further, it has removed certain provisions that were deemed unnecessary or burdensome and may have hindered broader use of the license.
For example, version 1.0 of the license included obligations that data recipients preserve attribution notices in the datasets. For context, attribution notices or statements are standard in the software sphere, where a company that releases software built on open source components has to credit the creators of these components in its own software license. But the Linux Foundation said feedback it received from the community and lawyers representing companies involved in open data projects pointed to challenges around associating attributions with data (or versions of datasets).
So while data source attribution is still an option, and might make sense for specific projects — particularly where transparency is paramount — it is no longer a condition for businesses looking to share data under the new permissive license. The chief remaining obligation is that the main community data license agreement text be included with the new datasets…(More)”.