Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality


Paper by Fabrizio Dell’Acqua et al.: “The public release of Large Language Models (LLMs) has sparked tremendous interest in how humans will use Artificial Intelligence (AI) to accomplish a variety of tasks. In our study conducted with Boston Consulting Group, a global management consulting firm, we examine the performance implications of AI on realistic, complex, and knowledge-intensive tasks. The pre-registered experiment involved 758 consultants comprising about 7% of the individual contributor-level consultants at the company. After establishing a performance baseline on a similar task, subjects were randomly assigned to one of three conditions: no AI access, GPT-4 AI access, or GPT-4 AI access with a prompt engineering overview. We suggest that the capabilities of AI create a “jagged technological frontier” where some tasks are easily done by AI, while others, though seemingly similar in difficulty level, are outside the current capability of AI. For each one of a set of 18 realistic consulting tasks within the frontier of AI capabilities, consultants using AI were significantly more productive (they completed 12.2% more tasks on average, and completed tasks 25.1% more quickly), and produced significantly higher quality results (more than 40% higher quality compared to a control group). Consultants across the skills distribution benefited significantly from having AI augmentation, with those below the average performance threshold increasing by 43% and those above increasing by 17% compared to their own scores. For a task selected to be outside the frontier, however, consultants using AI were 19 percentage points less likely to produce correct solutions compared to those without AI. Further, our analysis shows the emergence of two distinctive patterns of successful AI use by humans along a spectrum of human-AI integration. One set of consultants acted as “Centaurs,” like the mythical half-horse/half-human creature, dividing and delegating their solution-creation activities to the AI or to themselves. Another set of consultants acted more like “Cyborgs,” completely integrating their task flow with the AI and continually interacting with the technology…(More)”.

Artificial intelligence in local governments: perceptions of city managers on prospects, constraints and choices


Paper by Tan Yigitcanlar, Duzgun Agdas & Kenan Degirmenci: “Highly sophisticated capabilities of artificial intelligence (AI) have skyrocketed its popularity across many industry sectors globally. The public sector is one of these. Many cities around the world are trying to position themselves as leaders of urban innovation through the development and deployment of AI systems. Likewise, increasing numbers of local government agencies are attempting to utilise AI technologies in their operations to deliver policy and generate efficiencies in highly uncertain and complex urban environments. While the popularity of AI is on the rise in urban policy circles, there is limited understanding and a lack of empirical studies on city managers’ perceptions concerning urban AI systems. Bridging this gap is the rationale of this study. The methodological approach adopted in this study is twofold. First, the study collects data through semi-structured interviews with city managers from Australia and the US. Then, the study analyses the data using the summative content analysis technique with two data analysis software packages. The analysis identifies the following themes and generates insights into local government services: AI adoption areas, cautionary areas, challenges, effects, impacts, knowledge basis, plans, preparedness, roadblocks, technologies, deployment timeframes, and usefulness. The study findings inform city managers in their efforts to deploy AI in their local government operations, and offer directions for prospective research…(More)”.

AI and the next great tech shift


Book review by John Thornhill: “When the South Korean political activist Kim Dae-jung was jailed for two years in the early 1980s, he powered his way through some 600 books in his prison cell, such was his thirst for knowledge. One book that left a lasting impression was The Third Wave by the renowned futurist Alvin Toffler, who argued that an imminent information revolution was about to transform the world as profoundly as the preceding agricultural and industrial revolutions.

“Yes, this is it!” Kim reportedly exclaimed. When later elected president, Kim referred to the book many times in his drive to turn South Korea into a technological powerhouse.

Forty-three years after the publication of Toffler’s book, another work of sweeping futurism has appeared with a similar theme and a similar name. Although the stock in trade of futurologists is to highlight the transformational and the unprecedented, it is remarkable how much of their output appears the same.

The chief difference is that The Coming Wave by Mustafa Suleyman focuses more narrowly on the twin revolutions of artificial intelligence and synthetic biology. But the author would surely be delighted if his book were to prove as influential as Toffler’s in prompting politicians to action.

As one of the three co-founders of DeepMind, the London-based AI research company founded in 2010, and now chief executive of the AI start-up Inflection, Suleyman has been at the forefront of the industry for more than a decade. The Coming Wave bristles with breathtaking excitement about the extraordinary possibilities that the revolutions in AI and synthetic biology could bring about.

AI, we are told, could unlock the secrets of the universe, cure diseases and stretch the bounds of imagination. Biotechnology can enable us to engineer life and transform agriculture. “Together they will usher in a new dawn for humanity, creating wealth and surplus unlike anything ever seen,” he writes.

But what is striking about Suleyman’s heavily promoted book is how the optimism of his will is overwhelmed by the pessimism of his intellect, to borrow a phrase from the Marxist philosopher Antonio Gramsci. For most of history, the challenge of technology has been to unleash its power, Suleyman writes. Now the challenge has flipped.

In the 21st century, the dilemma will be how to contain technology’s power, given that the capabilities of these new technologies have exploded and the costs of developing them have collapsed. “Containment is not, on the face of it, possible. And yet for all our sakes, containment must be possible,” he writes…(More)”.

Data Commons


Paper by R. V. Guha et al.: “Publicly available data from open sources (e.g., United States Census Bureau (Census), World Health Organization (WHO), Intergovernmental Panel on Climate Change (IPCC)) are vital resources for policy makers, students and researchers across different disciplines. Combining data from different sources requires the user to reconcile the differences in schemas, formats, assumptions, and more. This data wrangling is time consuming, tedious and needs to be repeated by every user of the data. Our goal with Data Commons (DC) is to help make public data accessible and useful to those who want to understand this data and use it to solve societal challenges and opportunities. We do the data processing and make the processed data widely available via standard schemas and Cloud APIs. Data Commons is a distributed network of sites that publish data in a common schema and interoperate using the Data Commons APIs. Data from different Data Commons can be ‘joined’ easily. The aggregate of these Data Commons can be viewed as a single Knowledge Graph. This Knowledge Graph can then be searched over using Natural Language questions utilizing advances in Large Language Models. This paper describes the architecture of Data Commons, some of the major deployments and highlights directions for future work…(More)”.
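
To make the “standard schemas and Cloud APIs” described above concrete, here is a minimal sketch of pulling a statistic from the Data Commons knowledge graph using its Python client. The helper names (get_stat_value, get_stat_series) and the identifiers shown (the DCID geoId/06 for California, the statistical variable Count_Person) are assumptions to verify against the current Data Commons documentation, not details taken from the paper.

```python
# Minimal sketch: querying the Data Commons knowledge graph from Python.
# Assumes `pip install datacommons`; function names and identifiers should be
# checked against the current API docs, as the client evolves.
import datacommons as dc

CALIFORNIA = "geoId/06"  # places and variables are addressed by DCIDs

# Latest observed value of a statistical variable for a place.
population = dc.get_stat_value(CALIFORNIA, "Count_Person")
print("California population (latest):", population)

# Full time series for the same variable, already reconciled across sources.
series = dc.get_stat_series(CALIFORNIA, "Count_Person")
for date, value in sorted(series.items()):
    print(date, value)
```

Because every Data Commons instance publishes into the same schema, the same two calls work regardless of whether the underlying figures originated with the Census Bureau, WHO, or another source.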

Data Collaboratives


Policy Brief by Center for the Governance of Change: “Despite the abundance of data generated, it is becoming increasingly clear that its accessibility and advantages are not equitably or effectively distributed throughout society. Data asymmetries, driven in large part by deeply entrenched inequalities and lack of incentives by many public- and private-sector organizations to collaborate, are holding back the public good potential of data and hindering progress and innovation in key areas such as financial inclusion, health, and the future of work.

More (and better) collaboration is needed to address the data asymmetries that exist across society, but early efforts at opening data have fallen short of achieving their intended aims. In the EU, the proposed Data Act is seeking to address these shortcomings and make more data available for public use by setting up new rules on data sharing. However, critics say its current reading risks limiting the potential for delivering innovative solutions by failing to establish cross-sectoral data-sharing frameworks, leaving the issue of public data stewardship off the table, and avoiding the thorny question of business incentives.

This policy brief, based on Stefaan Verhulst’s recent policy paper for the Center for the Governance of Change, argues that data collaboratives, an emerging model of collaboration in which participants from different sectors exchange data to solve public problems, offer a promising solution to address these data asymmetries and contribute to a healthy data economy that can benefit society as a whole. However, data collaboratives require a systematic, sustainable, and responsible approach to be successful, with a particular focus on…(More):

  • Establishing a new science of questions, to help identify the most pressing public and private challenges that can be addressed with data sharing.
  • Fostering a new profession of data stewards, to promote a culture of responsible sharing within organizations and recognize opportunities for productive collaboration.
  • Clarifying incentives, to bring the private sector to the table and help operationalize data collaboration, ideally with some sort of market-led compensation model.
  • Establishing a social license for data reuse, to promote trust among stakeholders through public engagement, data stewardship, and an enabling regulatory framework.
  • Becoming more data-driven about data, to improve our understanding of collaboration, build sustainable initiatives, and achieve project accountability.

Sharing Health Data: The Why, the Will, and the Way Forward.


Book edited by Grossmann C, Chua PS, Ahmed M, et al.: “Sharing health data and information across stakeholder groups is the bedrock of a learning health system. As data and information are increasingly combined across various sources, their generative value to transform health, health care, and health equity increases significantly. Facilitating this potential is an escalating surge of digital technologies (i.e., cloud computing, broadband and wireless solutions, digital health technologies, and application programming interfaces [APIs]) that, with each successive generation, not only enhance data sharing, but also improve in their ability to preserve privacy and identify and mitigate cybersecurity risks. These technological advances, coupled with notable policy developments, new interoperability standards (particularly the Fast Healthcare Interoperability Resources [FHIR] standard), and the launch of innovative payment models within the last decade, have resulted in a greater recognition of the value of health data sharing among patients, providers, and researchers. Consequently, a number of data sharing collaborations are emerging across the health care ecosystem.

Unquestionably, the COVID-19 pandemic has had a catalytic effect on this trend. The criticality of swift data exchange became evident at the outset of the pandemic, when the scientific community sought answers about the novel SARS-CoV-2 virus and emerging disease. Then, as the crisis intensified, data sharing graduated from a research imperative to a societal one, with a clear need to urgently share and link data across multiple sectors and industries to curb the effects of the pandemic and prevent the next one.

In spite of these evolving attitudes toward data sharing and the ubiquity of data-sharing partnerships, barriers persist. The practice of health data sharing occurs unevenly, prominent in certain stakeholder communities while absent in others. A stark contrast is observed between the volume, speed, and frequency with which health data is aggregated and linked—oftentimes with non-traditional forms of health data—for marketing purposes, and the continuing challenges patients experience in contributing data to their own health records. In addition, there are varying levels of data sharing. Not all types of data are shared in the same manner and at the same level of granularity, creating a patchwork of information. As highlighted by the gaps observed in the haphazard and often inadequate sharing of race and ethnicity data during the pandemic, the consequences can be severe—impacting the allocation of much-needed resources and attention to marginalized communities. Therefore, it is important to recognize the value of data sharing in which stakeholder participation is equitable and comprehensive— not only for achieving a future ideal state in health care, but also for redressing long-standing inequities…(More)”
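
The FHIR standard and APIs referenced in the excerpt are what API-based health data sharing looks like in practice. Below is a minimal, illustrative sketch of a standard FHIR search request; it assumes the public HAPI FHIR R4 test server and the requests library, and is not drawn from the book itself.

```python
# Minimal sketch of a FHIR REST search: find Patient resources by family name.
# The base URL below is an assumed public test server; real deployments add
# authentication (e.g., SMART on FHIR / OAuth 2.0) and access scopes.
import requests

FHIR_BASE = "https://hapi.fhir.org/baseR4"  # assumed test endpoint

resp = requests.get(
    f"{FHIR_BASE}/Patient",
    params={"family": "smith", "_count": 3},  # standard FHIR search parameters
    headers={"Accept": "application/fhir+json"},
    timeout=30,
)
resp.raise_for_status()
bundle = resp.json()  # a FHIR Bundle resource wrapping the matching records

for entry in bundle.get("entry", []):
    patient = entry["resource"]
    name = (patient.get("name") or [{}])[0]
    print(patient.get("id"), name.get("family"), name.get("given"))
```

The point of the standard is that the same request shape works against any conformant server, which is what allows data to move between the stakeholder groups the book describes.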

Selected Readings on Open Data and Generative AI


By: María Esther Cervantes, Hannah Chafetz, Sampriti Saxena, & Stefaan G. Verhulst

Generative AI tools are increasingly used across sectors, including in governments. However, there is limited research on how these generative AI tools could impact open data policies and programs. What are the opportunities for generative AI and open data? What are the risks? Could generative AI transform the role of statistical agencies? Is there a need for a global charter to govern generative AI? 

Towards this end, in May 2023, The GovLab’s Open Data Policy Lab (a collaboration between The GovLab and Microsoft) hosted a panel discussion on the intersections of generative AI and open data and the ways in which generative AI could alter our existing conception of a third wave of open data. Building on the takeaways from this discussion, below we provide a curated list of annotated readings (listed alphabetically) on these topics. 

These selected readings focus on three main areas:  (1) the opportunities and risks of applying generative AI for open data, (2) generative AI governance models and discussion, and (3) the new role of national statistical agencies in the advent of these technologies. Given the speed at which these technologies are changing, we incorporate a wide variety of sources such as journal articles, reports from international organizations and think tanks, and blog posts. 

We found several common themes across these readings. First, there is general consensus that generative AI tools can provide value for open data and National Statistical Offices, whether it be for increasing data discovery, accessibility, or stakeholder collaboration. However, privacy, security, and safety risks remain prevalent and must be addressed. Second, there is a lack of common standards or policies for generative AI specifically. There are concerns that without a common language or standardization, algorithms may be misconstrued across borders. Third, governments are recommending synthetic data as a way to minimize privacy concerns with open data. If done responsibly, generative AI could help produce synthetic data at a larger scale. Lastly, governments around the world do not all have the same capabilities and resources for applying generative AI in their work. The countries that lag behind on these capabilities may have more challenges and risks when trying to incorporate generative AI into their public services.

*****

Alam, Zaidul. “Harnessing the Power of Generative AI in a World of Open Government Data.” LinkedIn Blog, June 15, 2023.

  • In this LinkedIn article, the author discusses the opportunities to leverage Open Government Data (specifically, census data) for generative AI.
  • The author explains that Open Data and generative AI could be merged in several ways, including: helping to increase interactions between citizens and governments, developing tools to engage with public institutions, and answering search queries about domain-specific data (e.g., health data). 
  • The author provides an example of how census data and AI applications could be merged: “By leveraging data APIs from the ABS and other similar institutions globally, Census Chat GPT could generate real-time, data-driven insights about demographic trends, socio-economic disparities, housing statistics, and more.” A minimal sketch of this grounding pattern appears after this entry.
  • There are many possible intersections between generative AI and Open Government Data: “In the future, we could see more sophisticated applications of generative AI to government open data. For example, AI could be used to generate comprehensive city planning scenarios based on urban development data, or to create personalized learning plans for students based on education data. Governments could also develop AI ‘public assistants’ that can explain complex legislation, provide real-time updates on policy changes, or guide citizens through bureaucratic procedures. Such AI assistants could democratize access to public information, reduce administrative burdens, and enhance civic engagement.”
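
As a concrete illustration of the grounding pattern quoted above (an assistant that answers questions from open statistics rather than its own recollection), here is a minimal sketch. It assumes the OpenAI Python SDK; the model name, the lookup_open_statistic helper, and the hard-coded figure are placeholders rather than anything described in the article.

```python
# Illustrative sketch: answer a resident's question by injecting an open-data
# figure into an LLM prompt. Assumes the OpenAI Python SDK is installed and
# OPENAI_API_KEY is set; the statistic and model name below are placeholders.
from openai import OpenAI

def lookup_open_statistic(question: str) -> str:
    # Placeholder for a real call to a census or open-data API; hard-coded
    # here to keep the sketch self-contained.
    return "Example region population, 2021 census: 5,159,735"

question = "How has the population of my region been changing?"
evidence = lookup_open_statistic(question)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model identifier
    messages=[
        {"role": "system",
         "content": "Answer using only the open-data evidence provided. "
                    "If the evidence is insufficient, say so."},
        {"role": "user", "content": f"Evidence: {evidence}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)
```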

De Boom, Cedric, and Michael Reusens. Changing Data Sources in the Age of Machine Learning for Official Statistics, 2023. https://doi.org/10.48550/arXiv.2306.04338.

  • This paper gives an overview of the main risks, liabilities and uncertainties associated with changing data sources in the context of machine learning for official statistics. 
  • The use of machine learning for official statistics has the potential to provide more timely, accurate, and comprehensive insight into a wide range of topics. By leveraging the vast amounts of data generated by individuals and entities on a daily basis, statistical agencies can gain a more nuanced understanding of trends and patterns. However, there are risks associated with this approach, chiefly concerns about data quality, privacy, and security, as well as the need for technical skills and infrastructure in government. 
  • Machine learning can be used to complement or even replace official statistics, and its ability to nowcast and forecast is an extremely valuable addition. By incorporating machine learning into official statistical production, one can benefit from the strengths of both approaches and make more informed decisions based on the most current and accurate data.
  • National statistics agencies are used to having their data completely under their control, but using external data sources to power innovative statistics can become problematic; establishing proper protocols and procedures for external data management is therefore necessary. 

Goasduff, Laurence. “Is Synthetic Data the Future of AI? Q&A with Alexander Linden.” Gartner Interview, November 20, 2022.

  • In this interview, Alexander Linden, a VP Analyst at Gartner, discusses the potential of synthetic data as a complement to open data to drive the development of more accurate AI models. 
  • He says, “Synthetic data can increase the accuracy of machine learning models. Real-world data is happenstance and does not contain all permutations of conditions or events possible in the real world. Synthetic data can counter this by generating data at the edges, or for conditions not yet seen.”
  • While synthetic data may offer a way to address biases and issues of quality in open data, Linden emphasizes the importance of transparency and explainability when it comes to the models creating and using synthetic data. 

Loukis, Euripidis, Stuti Saxena, Nina Rizun, Maria Ioanna Maratsi, Mohsan Ali, and Charalampos Alexopoulos. “ChatGPT Application Vis-a-Vis Open Government Data (OGD): Capabilities, Public Values, Issues and a Research Agenda.” In Electronic Government, edited by Ida Lindgren, Csaba Csáki, Evangelos Kalampokis, Marijn Janssen, Gabriela Viale Pereira, Shefali Virkar, Efthimios Tambouris, and Anneke Zuiderwijk, 95–110. Lecture Notes in Computer Science. Cham: Springer Nature Switzerland, 2023. https://doi.org/10.1007/978-3-031-41138-0_7.

  • In this paper, the authors analyze the opportunities and risks of using ChatGPT for Open Government Data from an Affordances Theory perspective. Through 12 expert interviews, the authors develop a series of research agendas to accelerate the understanding of how ChatGPT could impact Open Government Data. 
  • ChatGPT could have a positive impact on Open Government Data in several ways, including: increasing user engagement, awareness, and accessibility; helping to develop new Open Government strategies; offering new ways for data discovery through government chatbots; and balancing the supply and demand of Open Government Data. Additionally, from a public values perspective, ChatGPT could provide service-related and professionalism-related values for Open Government Data. It could help design user-driven Open Government Data initiatives and lower barriers to accessing Open Government Data amongst different stakeholders (e.g., citizens), thereby increasing transparency around government initiatives. 
  • The authors point to several issues that ChatGPT could pose for Open Government Data such as unknowingly collecting personal information from registered users and inaccurate summaries of Open Government Data from ChatGPT. Also, the lack of governance frameworks could lead to larger problems such as inadequate results, cybersecurity issues, and algorithmic biases caused by language differences across countries. 
  • In order to harness the value of ChatGPT for Open Government Data, additional research is needed on how ChatGPT could be used to increase use and value generation from Open Government Data, how ChatGPT could benefit the publishing of Open Government Data, and the potential issues of ChatGPT for Open Government Data. 

Sallier, Kenza, and Kate Burnett-Isaacs. “Unlocking the Power of Data Synthesis with the Starter Guide on Synthetic Data for Official Statistics.” Statistics Canada, March 10, 2023.

  • In this piece, Statistics Canada provides a set of guidelines for National Statistics Offices to use when leveraging synthetic data. 
  • Using UNECE’s report as the guide, the piece explains that using synthetic data can help increase access to statistical data in a privacy compliant manner. It can help with publishing data, testing analysis, education, and testing software. Additionally, it explains the three main ways in which synthetic data can be generated: sequential modeling, simulated data, and deep learning methods. 
  • The article provides an overview of the pros and cons of using Generative Adversarial Networks (GANs) to create synthetic data for National Statistics Offices; a minimal illustrative GAN sketch appears after this entry.
    • Pros: “GANs have been used in NSOs to generate continuous, discrete and textual datasets, while ensuring that the underlying distribution and patterns of the original data are preserved. Furthermore, recent research has been focused on the generation of free-text data which can be convenient in situations where models need to be developed to classify text data.”
    • Cons: “GANs can be seen as too complex to understand, explain or implement where there is only a minimal knowledge of neural networks. There is often a criticism associated with neural networks as lacking in transparency. The method is time consuming and has a high demand for computational resources. GANs may suffer from mode collapse, and lack of diversity, although newer variations of the algorithm seem to remedy these issues. Modelling discrete data can be difficult for GAN models.”
  • In sum, the article explains that synthetic data can provide benefits for National Statistics Offices and Generative Adversarial Networks can help produce the synthetic data. However, those undertaking the initiative need to balance the many associated risks. 
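
To ground the GAN discussion above, the following is a minimal, purely illustrative sketch (assuming PyTorch) of training a tiny GAN on a stand-in two-column dataset and sampling “synthetic” rows from the generator. A real statistical-office pipeline would add the privacy, utility, and disclosure-risk evaluation the article calls for; nothing here is taken from the Statistics Canada guide itself.

```python
# Minimal GAN sketch for tabular synthetic data (illustrative only).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for confidential microdata: two correlated numeric columns.
real = torch.randn(5000, 2)
real[:, 1] = 0.8 * real[:, 0] + 0.2 * real[:, 1]

noise_dim = 8
generator = nn.Sequential(nn.Linear(noise_dim, 32), nn.ReLU(), nn.Linear(32, 2))
discriminator = nn.Sequential(nn.Linear(2, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1))

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):
    # Discriminator step: distinguish real rows from generated ones.
    idx = torch.randint(0, real.size(0), (128,))
    real_batch = real[idx]
    fake_batch = generator(torch.randn(128, noise_dim)).detach()
    d_loss = loss_fn(discriminator(real_batch), torch.ones(128, 1)) + \
             loss_fn(discriminator(fake_batch), torch.zeros(128, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: produce rows the discriminator labels as real.
    fake_batch = generator(torch.randn(128, noise_dim))
    g_loss = loss_fn(discriminator(fake_batch), torch.ones(128, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

# "Synthetic data": new rows drawn from the trained generator, which should
# roughly preserve the correlation structure of the original columns.
synthetic = generator(torch.randn(1000, noise_dim)).detach()
print("real correlation:     ", torch.corrcoef(real.T)[0, 1].item())
print("synthetic correlation:", torch.corrcoef(synthetic.T)[0, 1].item())
```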

Ziesche, Soenke. “Open Data for AI: What Now?” UNESCO Digital Library, 2023. 

  • This report summarizes UNESCO’s guidelines for Member States in opening up data for AI systems. 
  • The report explains that there is an enormous amount of data already being collected through automated systems (a trend accelerated by the COVID-19 pandemic). This data is often too large to be manually processed. AI and data science methods have the capacity to discover new information from these large data sources. 
  • The report is divided into 3 phases: the preparation phase, the opening data phase, and the follow-up phase for data re-use: “The preparation phase guides Member States in preparing for opening their data, and includes the following suggested steps: drafting an open data policy, gathering and collecting high quality data, developing open data capacities and making the data AI-ready. The opening of the data phase consists of the following steps: selecting datasets to be opened, opening the datasets legally, opening the datasets technically, and creating an open-data-driven culture. The follow-up for reuse and sustainability phase consists of the following steps: supporting citizen engagement, supporting international engagement, supporting beneficial AI engagement, and maintaining high quality data.”

*****

We plan to explore these topics further over the coming months. Professionals interested in collaborating with The GovLab on these topics can contact Stefaan Verhulst, Co-Founder & Chief Research and Development Officer at sverhulst@thegovlab.org.

Stay up-to-date on the latest developments of this work by signing up for the Data Stewards Network Newsletter.

Learn more about the Open Data Policy Lab by visiting our website: https://opendatapolicylab.org/.

Unlocking AI’s Potential for Everyone


Article by Diane Coyle: “…But while some policymakers do have deep knowledge about AI, their expertise tends to be narrow, and most other decision-makers simply do not understand the issue well enough to craft sensible policies. Owing to this relatively low knowledge base and the inevitable asymmetry of information between regulators and regulated, policy responses to specific issues are likely to remain inadequate, heavily influenced by lobbying, or highly contested.

So, what is to be done? Perhaps the best option is to pursue more of a principles-based policy. This approach has already gained momentum in the context of issues like misinformation and trolling, where many experts and advocates believe that Big Tech companies should have a general duty of care (meaning a default orientation toward caution and harm reduction).

In some countries, similar principles already apply to news broadcasters, who are obligated to pursue accuracy and maintain impartiality. Although enforcement in these domains can be challenging, the upshot is that we do already have a legal basis for eliciting less socially damaging behavior from technology providers.

When it comes to competition and market dominance, telecoms regulation offers a serviceable model with its principle of interoperability. People with competing service providers can still call each other because telecom companies are all required to adhere to common technical standards and reciprocity agreements. The same is true of ATMs: you may incur a fee, but you can still withdraw cash from a machine at any bank.

In the case of digital platforms, a lack of interoperability has generally been established by design, as a means of locking in users and creating “moats.” This is why policy discussions about improving data access and ensuring access to predictable APIs have failed to make any progress. But there is no technical reason why some interoperability could not be engineered back in. After all, Big Tech companies do not seem to have much trouble integrating the new services that they acquire when they take over competitors.

In the case of LLMs, interoperability probably could not apply at the level of the models themselves, since not even their creators understand their inner workings. However, it can and should apply to interactions between LLMs and other services, such as cloud platforms…(More)”.

City CIOs urged to lay the foundations for generative AI


Article by Sarah Wray: “The London Office of Technology and Innovation (LOTI) has produced a collection of guides to support local authorities in using generative artificial intelligence (genAI) tools such as ChatGPT, Bard, Midjourney and Dall-E.

The resources include a guide for local authority leaders and another aimed at all staff, as well as a guide designed specifically for council Chief Information Officers (CIOs), which was developed with AI software company Faculty.

Sam Nutt, Researcher and Data Ethicist at LOTI, a membership organisation for over 20 boroughs and the Greater London Authority, told Cities Today: “Generative AI won’t solve every problem for local governments, but it could be a catalyst to transform so many processes for how we work.

“On the one hand, personal assistants integrated into programmes like Word, Excel or PowerPoint could massively improve officer productivity. On another level there is a chance to reimagine services and government entirely, thinking about how gen AI models can do so many tasks with data that we couldn’t do before, and allow officers to completely change how they spend their time.

“There are both opportunities and challenges, but the key message on both is that local governments should be ambitious in using this ‘AI moment’ to reimagine and redesign our ways of working to be better at delivering services now and in the future for our residents.”

As an initial step, local governments are advised to provide training and guidelines for staff. Some have begun to implement these steps, including US cities such as Boston, Seattle, and San Jose.

Nutt stressed that generative AI policies are useful but not a silver bullet for governance and that they will need to be revisited and updated regularly as technology and regulations evolve…(More)”.

How citywide data strategies can connect the dots, drive results


Blog by Bloomberg Cities Network: “Data is more central than ever to improving service delivery, managing performance, and identifying opportunities that better serve residents. That’s why a growing number of cities are adding a new tool to their arsenal—the citywide data strategy—to provide teams with a holistic view of data efforts and then lay out a roadmap for scaling successful approaches throughout city hall.

These comprehensive strategies are increasingly “critical to help mayors reach their visions,” according to Amy Edwards Holmes, executive director of the Bloomberg Center for Government Excellence at Johns Hopkins University, which is helping dozens of cities across the Americas up their data games as part of the Bloomberg Philanthropies City Data Alliance (CDA).

Bloomberg Cities spoke with experts in the field and leaders in pioneering cities to learn more about the importance of citywide data strategies and how they can help:

  • Turn “pockets of promise” into citywide strengths;
  • Build upon and consolidate other citywide strategic efforts; 
  • Improve performance management and service delivery;
  • Align staff data capabilities with city needs;
  • Drive lasting cultural change through leadership commitment…(More)”.