The emergence of non-personal data markets


Report by the Think Tank of the European Parliament: “The European Commission’s Data Strategy aims to create a single market for data, open to data from across the world, where personal and non-personal data, including sensitive business data, are secure. The EU Regulation on the free flow of non-personal data allows non-personal data to be stored and processed anywhere in the EU without unjustified restrictions, with limited exceptions based on grounds of public security. The creation of multiple common sector-specific European data spaces aims to ensure Europe’s global competitiveness and data sovereignty. The Data Act proposed by the Commission aims to remove barriers to data access for both consumers and businesses and to establish common rules to govern the sharing of data generated using connected products or related services.

The aim of the study is to provide an in-depth, comprehensive, and issue-specific analysis of the emergence of non-personal data markets in Europe. The study seeks to identify the potential value of the non-personal data market, potential challenges and solutions, and the legislative/policy measures necessary to facilitate the further development of non-personal data markets. The study also ranks the main non-personal data markets by size and growth rate and provides a sector-specific analysis for the mobility and transport, energy, and manufacturing sectors…(More)”.

Generative AI, Jobs, and Policy Response


Paper by the Global Partnership on AI: “Generative AI and the Future of Work remains notably absent from the global AI governance dialogue. Given the transformative potential of this technology in the workplace, this oversight suggests a significant gap, especially considering the substantial implications this technology has for workers, economies and society at large. As interest grows in the effects of Generative AI on occupations, debates centre around roles being replaced or enhanced by technology. Yet there is an incognita, the “Big Unknown”, an important number of workers whose future depends on decisions yet to be made.
In this brief, recent articles about the topic are surveyed with special attention to the “Big Unknown”. It is not a marginal number: nearly 9% of the workforce, or 281 million workers worldwide, are in this category. Unlike previous AI developments which focused on automating narrow tasks, Generative AI models possess the scope, versatility, and economic viability to impact jobs across multiple industries and at varying skill levels. Their ability to produce human-like outputs in areas like language, content creation and customer interaction, combined with rapid advancement and low deployment costs, suggests potential near-term impacts that are much broader and more abrupt than prior waves of AI. Governments, companies, and social partners should aim to minimize any potential negative effects from Generative AI technology in the world of work, as well as harness potential opportunities to support productivity growth and decent work. This brief presents concrete policy recommendations at the global and local level. These insights are intended to guide the discourse towards a balanced and fair integration of Generative AI in our professional landscape. To navigate this uncertain landscape and ensure that the benefits of Generative AI are equitably distributed, we recommend 10 policy actions that could serve as a starting point for discussion and implementation…(More)”.

Four Questions to Guide Decision-Making for Data Sharing and Integration


Paper by the Actionable Intelligence for Social Policy Center: “This paper presents a Four Question Framework to guide data integration partners in building a strong governance and legal foundation to support ethical data use. While this framework was developed based on work in the United States that routinely integrates public data, it is meant to be a simple, digestible tool that can be adapted to any context. The framework was developed through a series of public deliberation workgroups and 15 years of field experience working with a diversity of data integration efforts across the United States.
The Four Questions – Is this legal? Is this ethical? Is this a good idea? How do we know (and who decides)? – should be considered within an established data governance framework and alongside core partners to determine whether and how to move forward when building an Integrated Data System (IDS) and also at each stage of a specific data project. We discuss these questions in depth, with a particular focus on the role of governance in establishing legal and ethical data use. In addition, we provide example data governance structures from two IDS sites and hypothetical scenarios that illustrate key considerations for the Four Question Framework.
A robust governance process is essential for determining whether data sharing and integration is legal, ethical, and a good idea within the local context. This process is iterative and as relational as it is technical, which means authentic collaboration across partners should be prioritized at each stage of a data use project. The Four Questions serve as a guide for determining whether to undertake data sharing and integration and should be regularly revisited throughout the life of a project…(More)”.

AI chatbots do work of civil servants in productivity trial


Article by Paul Seddon: “Documents disclosed to the BBC have shed light on the use of AI-powered chatbot technology within government.

The chatbots have been used to analyse lengthy reports – a job that would normally be done by humans.

The Department for Education, which ran the trial, hopes it could boost productivity across Whitehall.

The PCS civil service union says it does not object to the use of AI – but clear guidelines are needed “so the benefits are shared by workers”.

The latest generation of chatbots, powered by artificial intelligence (AI), can quickly analyse reams of information, including images, to answer questions and summarise long articles.

They are expected to upend working practices across the economy in the coming years, and the government says they will have “significant implications” for the way officials work in future.

The education department ran the eight-week study over the summer under a contract with London-based company Faculty.ai, to test how so-called large language models (LLMs) could be used by officials.

The firm’s researchers used its access to a premium version of ChatGPT, the popular chatbot developed by OpenAI, to analyse draft local skills training plans that had been sent to the department to review.

These plans, drawn up by bodies representing local employers, are meant to influence the training offered by local further education colleges.

Results from the pilot are yet to be published, but documents and emails requested by the BBC under Freedom of Information laws offer an insight into the project’s aims.

According to an internal document setting out the reasons for the study, a chatbot would be used to summarise and compare the “main insights and themes” from the training plans.

The results, which were to be compared with summaries produced by civil servants, would test how Civil Service “productivity” might be improved.

It added that language models could analyse long, unstructured documents “where previously the only other option would be for individuals to read through all the reports”.

But the project’s aims went further, with hopes the chatbot could help provide “useful insights” that could help the department’s skills unit “identify future skills needs across the country”…(More)”.

The Participatory Turn in AI Design: Theoretical Foundations and the Current State of Practice


Paper by Fernando Delgado, Stephen Yang, Michael Madaio, and Qian Yang: “Despite the growing consensus that stakeholders affected by AI systems should participate in their design, enormous variation and implicit disagreements exist among current approaches. For researchers and practitioners who are interested in taking a participatory approach to AI design and development, it remains challenging to assess the extent to which any participatory approach grants substantive agency to stakeholders. This article thus aims to ground what we dub the “participatory turn” in AI design by synthesizing existing theoretical literature on participation and through empirical investigation and critique of its current practices. Specifically, we derive a conceptual framework through synthesis of literature across technology design, political theory, and the social sciences that researchers and practitioners can leverage to evaluate approaches to participation in AI design. Additionally, we articulate empirical findings concerning the current state of participatory practice in AI design based on an analysis of recently published research and semi-structured interviews with 12 AI researchers and practitioners. We use these empirical findings to understand the current state of participatory practice and subsequently provide guidance to better align participatory goals and methods in a way that accounts for practical constraints…(More)”.

Open: A Pan-ideological Panacea, a Free Floating Signifier


Paper by Andrea Liu: “Open” is a word that originated from FOSS (the Free and Open Source Software movement) to mean a Commons-based, non-proprietary form of computer software development (Linux, Apache) based on a decentralized, poly-hierarchical, distributed labor model. But the word “open” has now acquired an unnerving over-elasticity, a word that means so many things that at times it appears meaningless. This essay is a rhetorical analysis (if not a deconstruction) of how the term “open” functions in digital culture, the promiscuity (if not gratuitousness) with which the term “open” is utilized in the wider society, and the sometimes blatantly contradictory ideologies indiscriminately lumped together under this word…(More)”

Data Sandboxes: Managing the Open Data Spectrum


Primer by Uma Kalkar, Sampriti Saxena, and Stefaan Verhulst: “Opening up data offers opportunities to enhance governance, elevate public and private services, empower individuals, and bolster public well-being. However, achieving the delicate balance between open data access and the responsible use of sensitive and valuable information presents complex challenges. Data sandboxes are an emerging approach to balancing these needs.

In this white paper, The GovLab seeks to answer the following questions surrounding data sandboxes: What are data sandboxes? How can data sandboxes empower decision-makers to unlock the potential of open data while maintaining the necessary safeguards for data privacy and security? Can data sandboxes help decision-makers overcome barriers to data access and promote purposeful, informed data (re-)use?

The six characteristics of a data sandbox. Image by The GovLab.

After evaluating a series of case studies, we identified the following key findings:

  • Data sandboxes present six unique characteristics that make them a strong tool for facilitating open data and data re-use. These six characteristics are: controlled, secure, multi-sectoral and collaborative, high computing environments, temporal in nature, and adaptable and scalable.
  • Data sandboxes can be used for: pre-engagement assessment, data mesh enablement, rapid prototyping, familiarization, quality and privacy assurance, experimentation and ideation, white labeling and minimization, and maturing data insights.
  • There are many benefits to implementing data sandboxes. We found ten value propositions, such as: decreasing risk in accessing more sensitive data; enhancing data capacity; and fostering greater experimentation and innovation, to name a few.
  • When looking to implement a data sandbox, decision-makers should consider how they will attract and obtain high-quality, relevant data, keep the data fresh for accurate re-use, manage risks of data (re-)use, and translate and scale up sandbox solutions in real markets.
  • Advances in the use of the Internet of Things and Privacy Enhancing Technologies could help improve the creation, preparation, analysis, and security of data in a data sandbox. The development of these technologies, in parallel with European legislative measures such as the Digital Markets Act, the Data Act and the Data Governance Act, can improve the way data is unlocked in a data sandbox, improving trust and encouraging data (re-)use initiatives…(More)”.

Data Dysphoria: The Governance Challenge Posed by Large Learning Models


Paper by Susan Ariel Aaronson: “Only 8 months have passed since ChatGPT and the large learning model underpinning it took the world by storm. This article focuses on the data supply chain, that is, the data collected and then utilized to train large language models, and the governance challenge it presents to policymakers. These challenges include:

• How web scraping may affect individuals and firms which hold copyrights.
• How web scraping may affect individuals and groups who are supposed to be protected under privacy and personal data protection laws.
• How web scraping revealed the lack of protections for content creators and content providers on open access web sites; and
• How the debate over open- and closed-source LLMs reveals the lack of clear and universal rules to ensure the quality and validity of datasets. As the US National Institute of Standards and Technology explained, many LLMs depend on “large-scale datasets, which can lead to data quality and validity concerns. The difficulty of finding the “right” data may lead AI actors to select datasets based more on accessibility and availability than on suitability… Such decisions could contribute to an environment where the data used in processes is not fully representative of the populations or phenomena that are being modeled, introducing downstream risks”; in short, problems of quality and validity…(More)”.

International Definitions of Artificial Intelligence


Report by IAPP: “Computer scientist John McCarthy coined the term artificial intelligence in 1955, defining it as “the science and engineering of making intelligent machines.” He organized the Dartmouth Summer Research Project on Artificial Intelligence a year later — an event that many consider the birthplace of the field.

In today’s world, the definition of AI has been in continuous evolution, its contours and constraints changing to align with current and perhaps future technological progress and cultural contexts. In fact, most papers and articles are quick to point out the lack of common consensus around the definition of AI. As a resource from British research organization the Ada Lovelace Institute states, “We recognise that the terminology in this area is contested. This is a fast-moving topic, and we expect that terminology will evolve quickly.” The difficulty in defining AI is illustrated by what AI historian Pamela McCorduck called the “odd paradox,” referring to the idea that, as computer scientists find new and innovative solutions, computational techniques once considered AI lose the title as they become common and repetitive.

The indeterminate nature of the term poses particular challenges in the regulatory space. Indeed, in 2017 a New York City Council task force downgraded its mission to regulate the city’s use of automated decision-making systems to just defining the types of systems subject to regulation because it could not agree on a workable, legal definition of AI.

With this understanding, the following chart provides a snapshot of some of the definitions of AI from various global and sectoral (government, civil society and industry) perspectives. The chart is not an exhaustive list. It allows for cross-contextual comparisons from key players in the AI ecosystem…(More)”

Can Google Trends predict asylum-seekers’ destination choices?


Paper by Haodong Qi & Tuba Bircan: “Google Trends (GT) collates the volumes of search keywords over time and by geographical location. Such data could, in theory, provide insights into people’s ex ante intentions to migrate, and hence be useful for predictive analysis of future migration. Empirically, however, the predictive power of GT is sensitive: it may vary depending on geographical context, the search keywords selected for analysis, as well as Google’s market share and its users’ characteristics and search behavior, among others. Unlike most previous studies attempting to demonstrate the benefit of using GT for forecasting migration flows, this article addresses a critical but less discussed issue: when GT cannot enhance the performance of migration models. Using EUROSTAT statistics on first-time asylum applications and a set of push-pull indicators gathered from various data sources, we train three classes of gravity models that are commonly used in the migration literature, and examine how the inclusion of GT may affect models’ abilities to predict refugees’ destination choices. The results suggest that the effects of including GT are highly contingent on the complexity of different models. Specifically, GT can only improve the performance of relatively simple models, but not of those augmented by flow Fixed-Effects or by Auto-Regressive effects. These findings call for a more comprehensive analysis of the strengths and limitations of using GT, as well as other digital trace data, in the context of modeling and forecasting migration. It is our hope that this nuanced perspective can spur further innovations in the field, and ultimately bring us closer to a comprehensive modeling framework of human migration…(More)”.