Data Governance Meets the EU AI Act


Article by Axel Schwanke: “…The EU AI Act emphasizes sustainable AI through robust data governance, promoting principles like data minimization, purpose limitation, and data quality to ensure responsible data collection and processing. It mandates measures such as data protection impact assessments and retention policies. Article 10 underscores the importance of effective data management in fostering ethical and sustainable AI development… This article states that high-risk AI systems must be developed using high-quality data sets for training, validation, and testing. These data sets should be managed properly, considering factors like data collection processes, data preparation, potential biases, and data gaps. The data sets should be relevant, representative, and, as far as possible, error-free and complete, and they should reflect the specific context in which the AI system will be used. In some cases, providers may process special categories of personal data to detect and correct biases, but they must follow strict conditions to protect individuals’ rights and freedoms…

However, achieving compliance presents several significant challenges:

  • Ensuring Dataset Quality and Relevance: Organizations must establish robust data and AI platforms to prepare and manage datasets that are error-free, representative, and contextually relevant for their intended use cases. This requires rigorous data preparation and validation processes.
  • Bias and Contextual Sensitivity: Continuous monitoring for biases in data is critical. Organizations must implement corrective actions to address gaps while ensuring compliance with privacy regulations, especially when processing personal data to detect and reduce bias.
  • End-to-End Traceability: A comprehensive data governance framework is essential to track and document data flow from its origin to its final use in AI models. This ensures transparency, accountability, and compliance with regulatory requirements.
  • Evolving Data Requirements: Dynamic applications and changing schemas, particularly in industries like real estate, necessitate ongoing updates to data preparation processes to maintain relevance and accuracy.
  • Secure Data Processing: Compliance demands strict adherence to secure processing practices for personal data, ensuring privacy and security while enabling bias detection and mitigation.

Example: Real Estate Data
Immowelt’s real estate price map, rated the top performer in a 2022 comparison of such maps, exemplifies the challenges of achieving high-quality datasets. The prepared data powers numerous services and applications, including data analysis, price predictions, personalization, recommendations, and market research…(More)”
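
The first two challenges lend themselves to partial automation. Below is a minimal sketch of the kind of dataset audit a provider might run, assuming a tabular pipeline in Python with pandas; the audit_dataset helper, its column names, and its thresholds are illustrative assumptions, not requirements drawn from Article 10 or from Immowelt’s platform.

```python
import pandas as pd

def audit_dataset(df: pd.DataFrame, group_col: str,
                  max_missing: float = 0.01,
                  min_group_share: float = 0.05) -> dict:
    """Flag completeness, duplication, and representation issues."""
    report = {}
    # Completeness: columns whose share of missing values exceeds the budget.
    missing = df.isna().mean()
    report["incomplete_columns"] = missing[missing > max_missing].to_dict()
    # Error checking: exact duplicate records often point to collection faults.
    report["duplicate_rows"] = int(df.duplicated().sum())
    # Representativeness: groups falling below a minimum share of the data,
    # a crude proxy for the coverage gaps Article 10 asks providers to examine.
    shares = df[group_col].value_counts(normalize=True)
    report["underrepresented_groups"] = shares[shares < min_group_share].to_dict()
    return report

# Hypothetical real-estate listings in the spirit of the Immowelt example.
listings = pd.DataFrame({
    "price": [420_000, 310_000, 310_000, None],
    "region": ["north", "north", "north", "south"],
})
print(audit_dataset(listings, group_col="region"))
```

Checks like these cover only the mechanical side of compliance; judging contextual relevance and deciding how to correct flagged gaps remain documented human decisions.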

Why Digital Public Goods, including AI, Should Depend on Open Data


Article by Cable Green: “Acknowledging that some data should not be shared (for moral, ethical and/or privacy reasons) and some cannot be shared (for legal or other reasons), Creative Commons (CC) thinks there is value in incentivizing the creation, sharing, and use of open data to advance knowledge production. As open communities continue to imagine, design, and build digital public goods and public infrastructure services for education, science, and culture, these goods and services – whenever possible and appropriate – should produce, share, and/or build upon open data.

Open Data and Digital Public Goods (DPGs)

CC is a member of the Digital Public Goods Alliance (DPGA) and CC’s legal tools have been recognized as digital public goods (DPGs). DPGs are “open-source software, open standards, open data, open AI systems, and open content collections that adhere to privacy and other applicable best practices, do no harm, and are of high relevance for attainment of the United Nations 2030 Sustainable Development Goals (SDGs).” If we want to solve the world’s greatest challenges, governments and other funders will need to invest in, develop, openly license, share, and use DPGs.

Open data is important to DPGs because data is a key driver of economic vitality with demonstrated potential to serve the public good. In the public sector, data informs policy making and public service delivery by helping to channel scarce resources to those most in need, providing the means to hold governments accountable, and fostering social innovation. In short, data has the potential to improve people’s lives. When data is closed or otherwise unavailable, the public does not accrue these benefits.

CC was recently part of a DPGA sub-committee working to preserve the integrity of open data as part of the DPG Standard. This important update to the DPG Standard was introduced to ensure only open datasets and content collections with open licenses are eligible for recognition as DPGs. This new requirement means open data sets and content collections must meet the following criteria to be recognised as a digital public good.

  1. Comprehensive Open Licensing: The entire data set/content collection must be under an acceptable open licence. Mixed-licensed collections will no longer be accepted.
  2. Accessible and Discoverable: All data sets and content collection DPGs must be openly licensed and easily accessible from a distinct, single location, such as a unique URL.
  3. Permitted Access Restrictions: Certain access restrictions – such as logins, registrations, API keys, and throttling – are permitted as long as they do not discriminate against users or restrict usage based on geography or any other factors…(More)”.
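
Read as a checklist, the three criteria above are nearly machine-checkable. The sketch below encodes one possible reading of them in Python; the record fields, the licence allow-list, and the meets_dpg_data_criteria helper are hypothetical illustrations, not part of the DPG Standard or of the DPGA’s actual review tooling.

```python
# Hypothetical machine-readable record for a candidate data-set DPG.
OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0", "ODbL-1.0"}

def meets_dpg_data_criteria(record: dict) -> list[str]:
    """Return the criteria this record fails (an empty list means it passes)."""
    failures = []
    # 1. Comprehensive open licensing: the whole collection under one
    #    acceptable open licence; mixed licensing is not accepted.
    licenses = set(record.get("licenses", []))
    if len(licenses) != 1 or not licenses <= OPEN_LICENSES:
        failures.append("comprehensive open licensing")
    # 2. Accessible and discoverable: one distinct, single location.
    if not record.get("canonical_url"):
        failures.append("accessible and discoverable")
    # 3. Permitted access restrictions: logins, API keys and throttling are
    #    fine; geographic or otherwise discriminatory restrictions are not.
    if record.get("geo_restricted") or record.get("discriminatory_terms"):
        failures.append("permitted access restrictions")
    return failures

record = {
    "licenses": ["CC-BY-4.0"],
    "canonical_url": "https://example.org/dataset",
    "geo_restricted": False,
}
print(meets_dpg_data_criteria(record))  # [] -> eligible under this sketch
```

In practice DPG recognition is a human review; a script like this could at most pre-screen candidate submissions.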

Mindmasters: The Data-Driven Science of Predicting and Changing Human Behavior


Book by Sandra Matz: “There are more pieces of digital data than there are stars in the universe. This data helps us monitor our planet, decipher our genetic code, and take a deep dive into our psychology.

As algorithms become increasingly adept at accessing the human mind, they also become more and more powerful at controlling it, enticing us to buy a certain product or vote for a certain political candidate. Some of us say this technological trend is no big deal. Others consider it one of the greatest threats to humanity. But what if the truth is more nuanced and mind-bending than that?

In Mindmasters, Columbia Business School professor Sandra Matz reveals in fascinating detail how big data offers insights into the most intimate aspects of our psyches and how these insights empower an external influence over the choices we make. This can be creepy, manipulative, and downright harmful, with scandals like that of British consulting firm Cambridge Analytica being merely the tip of the iceberg. Yet big data also holds enormous potential to help us live healthier, happier lives—for example, by improving our mental health, encouraging better financial decisions, or enabling us to break out of our echo chambers…(More)”.

Problems of participatory processes in policymaking: a service design approach


Paper by Susana Díez-Calvo, Iván Lidón, Rubén Rebollar, Ignacio Gil-Pérez: “This study aims to identify and map the problems of participatory processes in policymaking through a Service Design approach… Fifteen problems of participatory processes in policymaking were identified, and some differences were observed in the perception of these problems between the stakeholders responsible for designing and implementing the participatory processes (backstage stakeholders) and those who are called upon to participate (frontstage stakeholders). The problems were found to occur at different stages of the service and to affect different stakeholders. A number of design actions were proposed to help mitigate these problems from a human-centred approach. These included process improvements, digital opportunities, new technologies and staff training, among others…(More)”.

The disparities and development trajectories of nations in achieving the sustainable development goals


Paper by Fengmei Ma, et al: “The Sustainable Development Goals (SDGs) provide a comprehensive framework for societal progress and planetary health. However, it remains unclear whether universal patterns exist in how nations pursue these goals and whether key development areas are being overlooked. Here, we apply the product space methodology, widely used in development economics, to construct an ‘SDG space of nations’. The SDG space models the relative performance and specialization patterns of 166 countries across 96 SDG indicators from 2000 to 2022. Our SDG space reveals a polarized global landscape, characterized by distinct groups of nations, each specializing in specific development indicators. Furthermore, we find that as countries improve their overall SDG scores, they tend to modify their sustainable development trajectories, pursuing different development objectives. Additionally, we identify orphaned SDG indicators — areas where certain country groups remain under-specialized. These patterns, and the SDG space more broadly, provide a high-resolution tool to understand and evaluate the progress and disparities of countries towards achieving the SDGs…(More)”
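
The abstract does not spell out the computation, but the product space method it borrows from development economics is standardly built from a revealed comparative advantage (RCA) specialization matrix and a pairwise proximity network between activities (Hidalgo et al., 2007). A minimal sketch under that assumption, with random numbers standing in for the paper’s 166-by-96 country-indicator matrix:

```python
import numpy as np

# Random stand-in for the paper's 166 countries x 96 SDG indicators.
rng = np.random.default_rng(0)
scores = rng.random((166, 96))

# RCA: a country counts as "specialized" in an indicator when its share of
# that indicator exceeds the indicator's share of the world total.
country_share = scores / scores.sum(axis=1, keepdims=True)
indicator_share = scores.sum(axis=0) / scores.sum()
m = (country_share / indicator_share >= 1.0).astype(float)

# Proximity between two indicators: the minimum of the two conditional
# probabilities of being co-specialized in them, as in the product space.
co_spec = m.T @ m                          # countries specialized in both
ubiquity = np.maximum(m.sum(axis=0), 1.0)  # guard against empty columns
proximity = co_spec / np.maximum.outer(ubiquity, ubiquity)
np.fill_diagonal(proximity, 0.0)
print(proximity.shape)  # (96, 96): the network underlying an 'SDG space'
```

Clustering the resulting 96-by-96 proximity network and overlaying each country’s row of the specialization matrix is the kind of step that would surface the polarized country groups the paper describes.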

Developing a Framework for Collective Data Rights


Report by Jeni Tennison: “Are collective data rights really necessary? Or, do people and communities already have sufficient rights to address harms through equality, public administration or consumer law? Might collective data rights even be harmful by undermining individual data rights or creating unjust collectivities? If we did have collective data rights, what should they look like? And how could they be introduced into legislation?

Data protection law and policy are founded on the notion of individual notice and consent, originating from the handling of personal data gathered for medical and scientific research. However, recent work on data governance has highlighted shortcomings with the notice-and-consent approach, especially in an age of big data and artificial intelligence. This special report considers the need for collective data rights by examining legal remedies currently available in the United Kingdom in three scenarios where the people affected by algorithmic decision making are not data subjects and therefore do not have individual data protection rights…(More)”.

Un-Plateauing Corruption Research? Perhaps less necessary, but more exciting than one might think


Article by Dieter Zinnbauer: “There is a sense in the anti-corruption research community that we may have reached some plateau (or less politely, hit a wall). This article argues – at least partly – against this claim.

We may have reached a plateau with regard to some recurring (staid?) scholarly and policy debates that resurface with eerie regularity, tend to suck all oxygen out of the room, yet remain essentially unsettled and irresolvable. Questions aimed at arriving at closure on what constitutes corruption, passing authoritative judgements on what works and what does not, and making rather grand pronouncements on whether progress has or has not been made all fall into this category.

At the same time, there is exciting work, often in unexpected places outside the inner ward of the anti-corruption castle, contributing new approaches and fresh-ish insights, and there are promising leads for exciting research on the horizon. Such areas include the underappreciated idiosyncrasies of corruption in the form of inaction rather than action, the use of satellites and remote sensing techniques to better understand and measure corruption, the overlooked role of short-sellers in tackling complex forms of corporate corruption, and the growing phenomenon of integrity capture, in which the anti-corruption apparatus is co-opted for sinister, corrupt purposes.

These are just four examples of the colourful opportunity tapestry for (anti)corruption research moving forward, not in the form of a great unified project or overarching new idea but as little stabs of potentiality here and there and somewhere else surprisingly unbeknownst…(More)”

Reimagining data for Open Source AI: A call to action


Report by Open Source Initiative: “Artificial intelligence (AI) is changing the world at a remarkable pace, with Open Source AI playing a pivotal role in shaping its trajectory. Yet, as AI advances, a fundamental challenge emerges: How do we create a data ecosystem that is not only robust but also equitable and sustainable?

The Open Source Initiative (OSI) and Open Future have taken a significant step toward addressing this challenge by releasing a white paper: “Data Governance in Open Source AI: Enabling Responsible and Systematic Access.” This document is the culmination of a global co-design process, enriched by insights from a vibrant two-day workshop held in Paris in October 2024…

The white paper offers a blueprint for a data ecosystem rooted in fairness, inclusivity and sustainability. It calls for two transformative shifts:

  1. From Open Data to Data Commons: Moving beyond the notion of unrestricted data to a model that balances openness with the rights and needs of all stakeholders.
  2. Broadening the stakeholder universe: Creating collaborative frameworks that unite communities, stewards and creators in equitable data-sharing practices.

To bring these shifts to life, the white paper delves into six critical focus areas:

  • Data preparation
  • Preference signaling and licensing
  • Data stewards and custodians
  • Environmental sustainability
  • Reciprocity and compensation
  • Policy interventions…(More)”

Wikenigma – an Encyclopedia of Unknowns


About: “Wikenigma is a unique wiki-based resource specifically dedicated to documenting fundamental gaps in human knowledge.

Listing scientific and academic questions to which no-one, anywhere, has yet been able to provide a definitive answer. [ 1141 so far ]

That’s to say, a compendium of so-called ‘Known Unknowns’.

The idea is to inspire and promote interest in scientific and academic research by highlighting opportunities to investigate problems which no-one has yet been able to solve.

You can start browsing the content via the main menu on the left (or in the ‘Main Menu’ section if you’re using a small-screen device). Alternatively, the search box (above right) will find any articles with details that match your search terms…(More)”.

Overcoming challenges associated with broad sharing of human genomic data


Paper by Jonathan E. LoTempio Jr & Jonathan D. Moreno: “Since the Human Genome Project, the consensus position in genomics has been that data should be shared widely to achieve the greatest societal benefit. This position relies on imprecise definitions of the concept of ‘broad data sharing’. Accordingly, the implementation of data sharing varies among landmark genomic studies. In this Perspective, we identify definitions of ‘broad’ that have been used interchangeably, despite their distinct implications. We further offer a framework with clarified concepts for genomic data sharing and probe six examples in genomics that produced public data. Finally, we articulate three challenges. First, we explore the need to reinterpret the limits of general research use data. Second, we consider the governance of public data deposition from extant samples. Third, we ask whether, in light of changing concepts of ‘broad’, participants should be encouraged to share their status as participants publicly or not. Each of these challenges is followed by recommendations…(More)”.