Unlocking the Potential: The Call for an International Decade of Data


Working Paper by Stefaan Verhulst: “The goal of this working paper is to reiterate the central importance of data – to Artificial Intelligence (AI) in particular, but more generally to the landscape of digital technology.

What follows serves as a clarion call to the global community to prioritize and advance data as the bedrock for social and economic development, especially for the UN’s Sustainable Development Goals. It begins by recognizing the significant challenges that remain around data, encompassing issues of accessibility, distribution, divides, and asymmetries. In light of these challenges, and as we propel ourselves into an era increasingly dominated by AI and AI-related innovation, the paper argues that establishing a more robust foundation for the stewardship of data is critical: a foundation that, for instance, embodies inclusivity, self-determination, and responsibility.

Finally, the paper advocates for the creation of an International Decade of Data (IDD), an initiative aimed at solidifying this foundation globally and advancing our collective efforts towards data-driven progress.”

Download ‘Unlocking the Potential: The Call for an International Decade of Data’ here

New Tools to Guide Data Sharing Agreements


Article by Andrew J. Zahuranec, Stefaan Verhulst, and Hannah Chafetz: “The process of forming a data-sharing agreement is not easy. It involves figuring out incentives, evaluating the degree to which others are willing and able to collaborate, and defining the specific conduct that is and is not allowed. Even under the best of circumstances, these steps can be costly and time-consuming.

Today, the Open Data Policy Lab took a step to help data practitioners control these costs. “Moving from Idea to Practice: Three Resources to Streamline the Creation of Data Sharing Agreements” provides data practitioners with three resources meant to support them throughout the process of developing an agreement. These include:

  • A Guide to Principled Data Sharing Agreement Negotiation by Design: A document outlining the different principles that a data practitioner might seek to uphold while negotiating an agreement;
  • The Contractual Wheel of Data Collaboration 2.0: A listing of the different kinds of data sharing agreement provisions that a data practitioner might include in an agreement;
  • A Readiness Matrix for Data Sharing Agreements: A form to evaluate the degree to which a partner can participate in a data-sharing agreement.

The resources are a result of a series of Open Data Action Labs, an initiative from the Open Data Policy Lab to define new strategies and tools that can help organizations resolve policy challenges they face. The Action Labs are built around a series of workshops (called “studios”) which give experts and stakeholders an opportunity to define the problems facing them and then ideate possible solutions in a collaborative setting. In February and March 2023, the Open Data Policy Lab and Trust Relay co-hosted conversations with experts in law, data, and smart cities on the challenge of forming a data-sharing agreement. Find all the resources here.”

City Science


Book by Ramon Gras and Jeremy Burke: “The Aretian team, a spin-off company from the Harvard Innovation Lab, has developed a city science methodology to evaluate the relationship between city form and urban performance. This book illuminates the relationship between a city’s spatial design and the quality of life it affords the general population. Among the frameworks presented in this volume are measuring innovation economies to design Innovation Districts, analyzing social networks and patterns to inform organizational design, and assessing city topology, morphology, entropy, and scale to create 15 Minute Cities.
Urban designers, architects, and engineers will thus be able to tackle complex urban design challenges by applying the authors’ frameworks and findings in their own work. Case studies illustrate key insights from advanced, data-driven geospatial analyses of cities around the world. This inaugural book by Aretian Urban Analytics and Design will give readers a new set of tools to learn from, expand, and develop for the healthy growth of cities and regions around the world…(More)”.

Researchers warn we could run out of data to train AI by 2026. What then?


Article by Rita Matulionyte: “As artificial intelligence (AI) reaches the peak of its popularity, researchers have warned the industry might be running out of training data – the fuel that runs powerful AI systems. This could slow down the growth of AI models, especially large language models, and may even alter the trajectory of the AI revolution.

But why is a potential lack of data an issue, considering how much data there is on the web? And is there a way to address the risk?…

We need a lot of data to train powerful, accurate and high-quality AI algorithms. For instance, ChatGPT was trained on 570 gigabytes of text data, or about 300 billion words.

Similarly, the Stable Diffusion algorithm (which is behind many AI image-generating apps such as DALL-E, Lensa and Midjourney) was trained on the LAION-5B dataset comprising 5.8 billion image-text pairs. If an algorithm is trained on an insufficient amount of data, it will produce inaccurate or low-quality outputs.

The quality of the training data is also important…This is why AI developers seek out high-quality content such as text from books, online articles, scientific papers, Wikipedia, and certain filtered web content. The Google Assistant was trained on 11,000 romance novels taken from self-publishing site Smashwords to make it more conversational.

The AI industry has been training AI systems on ever-larger datasets, which is why we now have high-performing models such as ChatGPT or DALL-E 3. At the same time, research shows online data stocks are growing much more slowly than the datasets used to train AI.

In a paper published last year, a group of researchers predicted we will run out of high-quality text data before 2026 if the current AI training trends continue. They also estimated low-quality language data will be exhausted sometime between 2030 and 2050, and low-quality image data between 2030 and 2060.
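To make the shape of such a projection concrete, here is a minimal back-of-envelope sketch in Python: it compares an exponentially growing training-dataset size against a more slowly growing stock of high-quality text and reports the first year demand overtakes supply. The token counts and growth rates are assumed placeholders for illustration, not figures from the paper cited above.

```python
# Illustrative extrapolation of when training-dataset demand could outgrow
# the stock of high-quality text. All numbers are hypothetical placeholders,
# not estimates from the paper discussed in the article.

def crossover_year(start_year: int,
                   dataset_tokens: float, dataset_growth: float,
                   stock_tokens: float, stock_growth: float,
                   horizon: int = 40) -> int | None:
    """Return the first year the projected dataset size exceeds the data stock."""
    for offset in range(horizon + 1):
        demand = dataset_tokens * (1 + dataset_growth) ** offset
        supply = stock_tokens * (1 + stock_growth) ** offset
        if demand > supply:
            return start_year + offset
    return None  # no crossover within the horizon

# Assumed inputs: a 5e11-token training set doubling every year (+100%)
# against a 5e13-token stock of high-quality text growing ~7% per year.
print(crossover_year(2023, 5e11, 1.0, 5e13, 0.07))  # -> 2031 under these assumptions
```

The specific year depends entirely on the assumed inputs; the point of the sketch is only that a fast-growing demand curve overtakes a slowly growing stock within years rather than decades.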

AI could contribute up to US$15.7 trillion (A$24.1 trillion) to the world economy by 2030, according to accounting and consulting group PwC. But running out of usable data could slow down its development…(More)”.

Democratic Policy Development using Collective Dialogues and AI


Paper by Andrew Konya, Lisa Schirch, Colin Irwin, and Aviv Ovadya: “We design and test an efficient democratic process for developing policies that reflect informed public will. The process combines AI-enabled collective dialogues that make deliberation democratically viable at scale with bridging-based ranking for automated consensus discovery. A GPT4-powered pipeline translates points of consensus into representative policy clauses from which an initial policy is assembled. The initial policy is iteratively refined with the input of experts and the public before a final vote and evaluation. We test the process three times with the US public, developing policy guidelines for AI assistants related to medical advice, vaccine information, and wars & conflicts. We show the process can be run in two weeks with 1500+ participants for around $10,000, and that it generates policy guidelines with strong public support across demographic divides. We measure 75-81% support for the policy guidelines overall, and no less than 70-75% support across demographic splits spanning age, gender, religion, race, education, and political party. Overall, this work demonstrates an end-to-end proof of concept for a process we believe can help AI labs develop common-ground policies, governing bodies break political gridlock, and diplomats accelerate peace deals…(More)”.
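For readers unfamiliar with bridging-based ranking, the core idea can be sketched in a few lines of Python: candidate statements are ranked by their weakest support across groups rather than by overall popularity, so only statements that bridge divides score well. This is a generic, minimal rendering of the concept under assumed inputs, not the authors’ GPT4-powered pipeline; the group names and votes below are hypothetical.

```python
# Minimal sketch of bridging-based ranking: a statement's score is its lowest
# approval rate across groups, so broad cross-group support beats raw popularity.
from collections import defaultdict

def bridging_scores(votes: list[tuple[str, str, bool]]) -> dict[str, float]:
    """votes: (statement_id, group, approved). Score = minimum per-group approval rate."""
    tallies: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])  # [approvals, total]
    for statement, group, approved in votes:
        tallies[(statement, group)][0] += int(approved)
        tallies[(statement, group)][1] += 1

    per_statement: dict[str, list[float]] = defaultdict(list)
    for (statement, _group), (approvals, total) in tallies.items():
        per_statement[statement].append(approvals / total)

    # A statement only scores well if every group supports it.
    return {s: min(rates) for s, rates in per_statement.items()}

# Hypothetical votes: s1 is supported by both groups, s2 by only one.
votes = [
    ("s1", "group_a", True), ("s1", "group_b", True),
    ("s2", "group_a", True), ("s2", "group_b", False),
]
print(sorted(bridging_scores(votes).items(), key=lambda kv: -kv[1]))
```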

Matchmaking Research To Policy: Introducing Britain’s Areas Of Research Interest Database


Article by Kathryn Oliver: “Areas of research interest (ARIs) were originally recommended in the 2015 Nurse Review, which argued that if government stated what it needed to know more clearly and more regularly, then it would be easier for policy-relevant research to be produced.

During our time in government, Annette Boaz and I worked to develop these areas of research interest, mobilize experts, and produce evidence syntheses and other outputs addressing them, largely in response to the COVID pandemic. As readers of this blog will know, we have learned a lot about what it takes to mobilize evidence – the hard, and often hidden, labor of creating and sustaining relationships, being part of transient teams, managing group dynamics, and honing listening and diplomatic skills.

Some of the challenges we encountered, such as the oft-cited cultural gap between research and policy, the relevance of evidence, and the difficulty of resourcing knowledge mobilization and evidence synthesis, require systemic responses. However, one challenge, the information gap Nurse noted between researchers and what government departments actually want to know, offered a simpler solution.

Up until September 2023, departmental ARIs were published on gov.uk in PDF or HTML format. Although a good start, we felt that having all the ARIs in one searchable database would make them more interactive and accessible. So, working with Overton, we developed the new ARI database. The primary benefit of the database will be to raise awareness of ARIs (through email alerts about new ARIs) and to improve accessibility (by holding all ARIs in one easily searchable place)…(More)”.

What Is Public Trust in the Health System? Insights into Health Data Use


Open Access Book by Felix Gille: “This book explores the concept of public trust in health systems.

In the context of recent events, including public response to interventions to tackle the COVID-19 pandemic, vaccination uptake and the use of health data and digital health, this important book uses empirical evidence to address why public trust is vital to a well-functioning health system.

In doing so, it provides a comprehensive contemporary explanation of public trust, how it affects health systems and how it can be nurtured and maintained as an integral component of health system governance…(More)”.

Chatbots May ‘Hallucinate’ More Often Than Many Realize


Cade Metz at The New York Times: “When the San Francisco start-up OpenAI unveiled its ChatGPT online chatbot late last year, millions were wowed by the humanlike way it answered questions, wrote poetry and discussed almost any topic. But most people were slow to realize that this new kind of chatbot often makes things up.

When Google introduced a similar chatbot several weeks later, it spewed nonsense about the James Webb telescope. The next day, Microsoft’s new Bing chatbot offered up all sorts of bogus information about the Gap, Mexican nightlife and the singer Billie Eilish. Then, in March, ChatGPT cited a half dozen fake court cases while writing a 10-page legal brief that a lawyer submitted to a federal judge in Manhattan.

Now a new start-up called Vectara, founded by former Google employees, is trying to figure out how often chatbots veer from the truth. The company’s research estimates that even in situations designed to prevent it from happening, chatbots invent information at least 3 percent of the time — and as high as 27 percent.

Experts call this chatbot behavior “hallucination.” It may not be a problem for people tinkering with chatbots on their personal computers, but it is a serious issue for anyone using this technology with court documents, medical information or sensitive business data.

Because these chatbots can respond to almost any request in an unlimited number of ways, there is no way of definitively determining how often they hallucinate. “You would have to look at all of the world’s information,” said Simon Hughes, the Vectara researcher who led the project…(More)”.

Climate data can save lives. Most countries can’t access it.


Article by Zoya Teirstein: “Earth just experienced one of its hottest, and most damaging, periods on record. Heat waves in the United States, Europe, and China; catastrophic flooding in India, Brazil, Hong Kong, and Libya; and outbreaks of malaria, dengue, and other mosquito-borne illnesses across southern Asia claimed tens of thousands of lives. The vast majority of these deaths could have been averted with the right safeguards in place.

The World Meteorological Organization, or WMO, published a report last week that shows just 11 percent of countries have the full arsenal of tools required to save lives as the impacts of climate change — including deadly weather events, infectious diseases, and respiratory illnesses like asthma — become more extreme. The United Nations climate agency predicts that significant natural disasters will hit the planet 560 times per year by the end of this decade. What’s more, countries that lack early warning systems, such as extreme heat alerts, will see eight times more climate-related deaths than countries that are better prepared. By midcentury, some 50 percent of these deaths will take place in Africa, a continent that is responsible for around 4 percent of the world’s greenhouse gas emissions each year…(More)”.

Smart City Data Governance


OECD Report: “Smart cities leverage technologies, in particular digital, to generate a vast amount of real-time data to inform policy- and decision-making for efficient and effective public service delivery. Their success largely depends on the availability and effective use of data. However, the amount of data generated is growing more rapidly than governments’ capacity to store and process them, and the growing number of stakeholders involved in data production, analysis and storage pushes cities’ data management capacity to the limit. Despite the wide range of local and national initiatives to enhance smart city data governance, urban data remains a challenge for national and city governments due to: insufficient financial resources; a lack of business models for financing and refinancing data collection; limited access to skilled experts; a lack of full compliance with national legislation on data sharing and protection; and data and security risks. Facing these challenges is essential to managing and sharing data sensibly if cities are to boost citizens’ well-being and promote sustainable environments…(More)”