Interwoven Realms: Data Governance as the Bedrock for AI Governance


Essay by Stefaan G. Verhulst and Friederike Schüür: “In a world increasingly captivated by the opportunities and challenges of artificial intelligence (AI), there has been a surge in the establishment of committees, forums, and summits dedicated to AI governance. These platforms, while crucial, often overlook a fundamental pillar: the role of data governance. As we navigate through a plethora of discussions and debates on AI, this essay seeks to illuminate the often-ignored yet indispensable link between AI governance and robust data governance.

The current focus on AI governance, with its myriad ethical, legal, and societal implications, tends to sidestep the fact that effective AI governance is, at its core, reliant on the principles and practices of data governance. This oversight has resulted in a fragmented approach, leading to a scenario where the data and AI communities operate in isolation, often unaware of the essential synergy that should exist between them.

This essay delves into the intertwined nature of these two realms. It provides six reasons why AI governance is unattainable without a comprehensive and robust framework of data governance. In addressing this intersection, the essay aims to shed light on the necessity of integrating data governance more prominently into the conversation on AI, thereby fostering a more cohesive and effective approach to the governance of this transformative technology.

Six reasons why Data Governance is the bedrock for AI Governance...(More)”.

New York City Takes Aim at AI


Article by Samuel Greengard: “As concerns over artificial intelligence (AI) grow and angst about its potential impact increases, political leaders and government agencies are taking notice. In November, U.S. president Joe Biden issued an executive order designed to build guardrails around the technology. Meanwhile, the European Union (EU) is currently developing a legal framework around responsible AI.

Yet, what is often overlooked about artificial intelligence is that it’s more likely to impact people on a local level. AI touches housing, transportation, healthcare, policing and numerous other areas relating to business and daily life. It increasingly affects citizens, government employees, and businesses in both obvious and unintended ways.

One city attempting to position itself at the vanguard of AI is New York. In October 2023, New York City announced a blueprint for developing, managing, and using the technology responsibly. The New York City Artificial Intelligence Action Plan—the first of its kind in the U.S.—is designed to help officials and the public navigate the AI space.

“It’s a fairly comprehensive plan that addresses both the use of AI within city government and the responsible use of the technology,” says Clifford S. Stein, Wai T. Chang Professor of Industrial Engineering and Operations Research and Interim Director of the Data Science Institute at Columbia University.

Adds Stefaan Verhulst, co-founder and chief research and development officer at The GovLab and Senior Fellow at the Center for Democracy and Technology (CDT), “AI localism focuses on the idea that cities are where most of the action is in regard to AI.”…(More)”.

Updates to the OECD’s definition of an AI system explained


Article by Stuart Russell: “Obtaining consensus on a definition for an AI system in any sector or group of experts has proven to be a complicated task. However, if governments are to legislate and regulate AI, they need a definition to act as a foundation. Given the global nature of AI, if all governments can agree on the same definition, it allows for interoperability across jurisdictions.

Recently, OECD member countries approved a revised version of the Organisation’s definition of an AI system. We published the definition on LinkedIn, which, to our surprise, received an unprecedented number of comments.

To respond to the interest our community has shown in the definition, here is a short explanation of the rationale behind the update and the definition itself. Later this year, we can share even more details once they are finalised.

How OECD countries updated the definition

Here are the revisions to the current text of the definition of “AI System” in detail, with additions set out in bold and subtractions in strikethrough:

An AI system is a machine-based system that ~~can,~~ for ~~a given set of human-defined~~ **explicit or implicit** objectives, **infers, from the input it receives, how to generate outputs such as** ~~makes~~ predictions, **content,** recommendations, or decisions ~~influencing~~ **that can influence physical** ~~real~~ or virtual environments. **Different** AI systems ~~are designed to operate with varying~~ **vary in their** levels of autonomy **and adaptiveness after deployment**…(More)”

Hypotheses devised by AI could find ‘blind spots’ in research


Article by Matthew Hutson: “One approach is to use AI to help scientists brainstorm. This is a task that large language models — AI systems trained on large amounts of text to produce new text — are well suited for, says Yolanda Gil, a computer scientist at the University of Southern California in Los Angeles who has worked on AI scientists. Language models can produce inaccurate information and present it as real, but this ‘hallucination’ isn’t necessarily bad, says Sendhil Mullainathan. It signifies, he says, “‘here’s a kind of thing that looks true’. That’s exactly what a hypothesis is.”

Blind spots are where AI might prove most useful. James Evans, a sociologist at the University of Chicago, has pushed AI to make ‘alien’ hypotheses — those that a human would be unlikely to make. In a paper published earlier this year in Nature Human Behaviour, he and his colleague Jamshid Sourati built knowledge graphs containing not just materials and properties, but also researchers. Evans and Sourati’s algorithm traversed these networks, looking for hidden shortcuts between materials and properties. The aim was to maximize the plausibility of AI-devised hypotheses being true while minimizing the chances that researchers would hit on them naturally. For instance, if scientists who are studying a particular drug are only distantly connected to those studying a disease that it might cure, then the drug’s potential would ordinarily take much longer to discover.
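A rough way to picture the approach: represent the literature as a graph and score material-property pairs that are well connected through concepts but poorly connected through people. The sketch below is only an illustration of that idea, not Evans and Sourati’s actual algorithm; the node names, scoring rule, and use of networkx are all assumptions.

```python
# Illustrative sketch only -- not Evans and Sourati's actual algorithm.
# Idea: score material-property pairs that are plausible (connected via a
# short chain of concepts) but 'alien' (the researchers around each side
# do not overlap). Node names and the scoring rule are assumptions.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("drug_A", "pathway_X"),        # concept co-occurrence in the literature
    ("pathway_X", "disease_B"),
    ("researcher_1", "drug_A"),     # who studies what
    ("researcher_2", "disease_B"),
])

def hypothesis_score(graph: nx.Graph, material: str, prop: str) -> float:
    """Higher score = a plausible link that humans are unlikely to find soon."""
    concept_dist = nx.shortest_path_length(graph, material, prop)
    material_team = {n for n in graph[material] if n.startswith("researcher")}
    prop_team = {n for n in graph[prop] if n.startswith("researcher")}
    shared_researchers = len(material_team & prop_team)
    # Reward short concept paths; reward disjoint research communities.
    return 1.0 / concept_dist + (1.0 if shared_researchers == 0 else 0.0)

print(hypothesis_score(G, "drug_A", "disease_B"))  # 1.5: plausible and 'alien'
```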

When Evans and Sourati fed data published up to 2001 to their AI, they found that about 30% of its predictions about drug repurposing and the electrical properties of materials had been uncovered by researchers, roughly six to ten years later. The system can be tuned to make predictions that are more likely to be correct but also less of a leap, on the basis of concurrent findings and collaborations, Evans says. But “if we’re predicting what people are going to do next year, that just feels like a scoop machine”, he adds. He’s more interested in how the technology can take science in entirely new directions….(More)”

Understanding AI jargon: Artificial intelligence vocabulary


Article by Kate Woodford: “Today, the Cambridge Dictionary announces its Word of the Year for 2023: hallucinate. You might already be familiar with this word, which we use to talk about seeing, hearing, or feeling things that don’t really exist. But did you know that it has a new meaning when it’s used in the context of artificial intelligence?

To celebrate the Word of the Year, this post is dedicated to AI terms that have recently come into the English language. AI, as you probably know, is short for artificial intelligence – the use of computer systems with qualities similar to the human brain that allow them to ‘learn’ and ‘think’. It’s a subject that arouses a great deal of interest and excitement and, it must be said, a degree of anxiety. Let’s have a look at some of these new words and phrases and see what they mean and how we’re using them to talk about AI…

As the field of AI continues to develop quickly, so does the language we use to talk about it. In a recent New Words post, we shared some words about AI that are being considered for addition to the Cambridge Dictionary…(More)”.

Boston experimented with using generative AI for governing. It went surprisingly well


Article by Santiago Garces and Stephen Goldsmith: “…we see the advances of generative AI as having the most potential. For example, Boston asked OpenAI to “suggest interesting analyses” after we uploaded 311 data. In response, it suggested two things: time series analysis by case time, and a comparative analysis by neighborhood. This meant that city officials spent less time navigating the mechanics of computing an analysis, and had more time to dive into the patterns of discrepancy in service. The tools make graphs, maps, and other visualizations with a simple prompt. With lower barriers to analyzing data, our city officials can formulate more hypotheses and challenge assumptions, resulting in better decisions.
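For a sense of what those two suggested analyses involve, here is a minimal pandas sketch; the file and column names ("311_requests.csv", "created_at", "neighborhood") are assumptions for illustration, not Boston’s actual 311 schema.

```python
# Hypothetical sketch of the two analyses ChatGPT suggested for the 311
# data. File and column names are assumptions, not Boston's actual schema.
import pandas as pd

df = pd.read_csv("311_requests.csv", parse_dates=["created_at"])

# 1. Time series analysis by case time: weekly request volume reveals
#    seasonality and sudden spikes in service demand.
weekly_volume = df.set_index("created_at").resample("W").size()

# 2. Comparative analysis by neighborhood: where requests concentrate,
#    and where patterns of discrepancy in service might hide.
by_neighborhood = df.groupby("neighborhood").size().sort_values(ascending=False)

print(weekly_volume.tail())
print(by_neighborhood.head(10))
```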

Not all city officials have the engineering and web development experience needed to run these tests and code. But this experiment shows that other city employees, without any STEM background, could, with just a bit of training, utilize these generative AI tools to supplement their work.

To make this possible, more authority would need to be granted to frontline workers who too often have their hands tied with red tape. Therefore, we encourage government leaders to allow workers more discretion to solve problems, identify risks, and check data. This is not inconsistent with accountability; rather, supervisors can utilize these same generative AI tools to identify patterns or outliers—say, where race is inappropriately playing a part in decision-making, or where program effectiveness drops off (and why). These new tools will more quickly provide an indication as to which interventions are making a difference, or precisely where a historic barrier is continuing to harm an already marginalized community.

Civic groups will be able to hold government accountable in new ways, too. This is where the linguistic power of large language models really shines: Public employees and community leaders alike can request that tools create visual process maps, build checklists based on a description of a project, or monitor progress and compliance. Imagine if people who have a deep understanding of a city—its operations, neighborhoods, history, and hopes for the future—can work toward shared goals, equipped with the most powerful tools of the digital age. Gatekeepers of formerly mysterious processes will lose their stranglehold, and expediters versed in state and local ordinances, codes, and standards will no longer be necessary to maneuver around things like zoning or permitting processes.

Numerous challenges would remain. Public workforces would still need better data analysis skills in order to verify whether a tool is following the right steps and producing correct information. City and state officials would need technology partners in the private sector to develop and refine the necessary tools, and these relationships raise challenging questions about privacy, security, and algorithmic bias…(More)”

The AI regulations that aren’t being talked about


Article by Deloitte: “…But our research shows that this focus may be overlooking some of the most important tools already on the books. Of the 1,600+ policies we analyzed, only 11% were focused on regulating AI-adjacent issues like data privacy, cybersecurity, intellectual property, and so on (Figure 5). Even when limiting the search to only regulations, 60% were focused directly on AI and only 40% on AI-adjacent issues (Figure 5). For example, several countries have data protection agencies with regulatory powers to help protect citizens’ data privacy. But while these agencies may not have AI or machine learning named specifically in their charters, the importance of data in training and using AI models makes them an important AI-adjacent tool.

This can be problematic because directly regulating a fast-moving technology like AI can be difficult. Take the hypothetical example of removing bias from home loan decisions. Regulators could accomplish this goal by mandating that AI models be trained on certain types of data to ensure that they are representative and will not produce biased results, but such an approach can become outdated when new methods of training AI models emerge. Given the diversity of AI models already in use, from recurrent neural networks to generative pretrained transformers to generative adversarial networks and more, finding a single set of rules that can deliver what the public desires both now and in the future may be a challenge…(More)”.

Researchers warn we could run out of data to train AI by 2026. What then?


Article by Rita Matulionyte: “As artificial intelligence (AI) reaches the peak of its popularity, researchers have warned the industry might be running out of training data – the fuel that runs powerful AI systems. This could slow down the growth of AI models, especially large language models, and may even alter the trajectory of the AI revolution.

But why is a potential lack of data an issue, considering how much there is on the web? And is there a way to address the risk?…

We need a lot of data to train powerful, accurate and high-quality AI algorithms. For instance, ChatGPT was trained on 570 gigabytes of text data, or about 300 billion words.

Similarly, the Stable Diffusion algorithm (which is behind many AI image-generating apps such as DALL-E, Lensa and Midjourney) was trained on the LAION-5B dataset, comprising 5.8 billion image-text pairs. If an algorithm is trained on an insufficient amount of data, it will produce inaccurate or low-quality outputs.

The quality of the training data is also important…This is why AI developers seek out high-quality content such as text from books, online articles, scientific papers, Wikipedia, and certain filtered web content. The Google Assistant was trained on 11,000 romance novels taken from self-publishing site Smashwords to make it more conversational.

The AI industry has been training AI systems on ever-larger datasets, which is why we now have high-performing models such as ChatGPT or DALL-E 3. At the same time, research shows online data stocks are growing much slower than datasets used to train AI.

In a paper published last year, a group of researchers predicted we will run out of high-quality text data before 2026 if the current AI training trends continue. They also estimated low-quality language data will be exhausted sometime between 2030 and 2050, and low-quality image data between 2030 and 2060.
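The underlying reasoning is a race between two growth curves: training datasets that roughly double each year and a stock of usable text that grows far more slowly. The back-of-envelope sketch below illustrates that logic; the stock size and growth rates are illustrative assumptions, not the estimates from the cited paper.

```python
# Back-of-envelope illustration of the 'running out of data' logic.
# All numbers below are assumptions for illustration, not the estimates
# from the cited paper.

stock_tokens = 9e12      # assumed stock of high-quality text, in tokens
stock_growth = 1.04      # assumed ~4% annual growth of that stock
dataset_tokens = 3e11    # assumed size of today's largest training sets
dataset_growth = 2.0     # assumed doubling of training sets each year

year = 2023
while dataset_tokens < stock_tokens:
    stock_tokens *= stock_growth
    dataset_tokens *= dataset_growth
    year += 1

# Under these assumptions the curves cross within the decade -- the same
# qualitative conclusion the researchers reached with real measurements.
print(f"Datasets overtake the available stock around {year}.")
```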

AI could contribute up to US$15.7 trillion (A$24.1 trillion) to the world economy by 2030, according to accounting and consulting group PwC. But running out of usable data could slow down its development…(More)”.

Chatbots May ‘Hallucinate’ More Often Than Many Realize


Cade Metz at The New York Times: “When the San Francisco start-up OpenAI unveiled its ChatGPT online chatbot late last year, millions were wowed by the humanlike way it answered questions, wrote poetry and discussed almost any topic. But most people were slow to realize that this new kind of chatbot often makes things up.

When Google introduced a similar chatbot several weeks later, it spewed nonsense about the James Webb telescope. The next day, Microsoft’s new Bing chatbot offered up all sorts of bogus information about the Gap, Mexican nightlife and the singer Billie Eilish. Then, in March, ChatGPT cited a half dozen fake court cases while writing a 10-page legal brief that a lawyer submitted to a federal judge in Manhattan.

Now a new start-up called Vectara, founded by former Google employees, is trying to figure out how often chatbots veer from the truth. The company’s research estimates that even in situations designed to prevent it from happening, chatbots invent information at least 3 percent of the time — and as high as 27 percent.
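The article describes the setup only broadly, but the measurement logic can be sketched as follows: give the chatbot source documents, ask for summaries grounded strictly in those documents, and count summaries that introduce unsupported facts. The `summarize` and `contains_unsupported_claims` functions below are hypothetical placeholders, not Vectara’s actual tooling.

```python
# Hypothetical sketch of a hallucination-rate measurement. `summarize` and
# `contains_unsupported_claims` are placeholders, not Vectara's tooling.

def summarize(document: str) -> str:
    """Placeholder: call a chatbot, prompting it to summarize using
    only facts contained in `document`."""
    raise NotImplementedError

def contains_unsupported_claims(document: str, summary: str) -> bool:
    """Placeholder: flag summaries asserting facts absent from the source,
    via human review or a trained consistency model."""
    raise NotImplementedError

def hallucination_rate(documents: list[str]) -> float:
    """Fraction of summaries that invent information, even though the
    task is designed so the model never needs to go beyond its input."""
    flagged = sum(
        contains_unsupported_claims(doc, summarize(doc)) for doc in documents
    )
    return flagged / len(documents)
```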

Experts call this chatbot behavior “hallucination.” It may not be a problem for people tinkering with chatbots on their personal computers, but it is a serious issue for anyone using this technology with court documents, medical information or sensitive business data.

Because these chatbots can respond to almost any request in an unlimited number of ways, there is no way of definitively determining how often they hallucinate. “You would have to look at all of the world’s information,” said Simon Hughes, the Vectara researcher who led the project…(More)”.

Assessing and Suing an Algorithm


Report by Elina Treyger, Jirka Taylor, Daniel Kim, and Maynard A. Holliday: “Artificial intelligence algorithms are permeating nearly every domain of human activity, including processes that make decisions about interests central to individual welfare and well-being. How do public perceptions of algorithmic decisionmaking in these domains compare with perceptions of traditional human decisionmaking? What kinds of judgments about the shortcomings of algorithmic decisionmaking processes underlie these perceptions? Will individuals be willing to hold algorithms accountable through legal channels for unfair, incorrect, or otherwise problematic decisions?

Answers to these questions matter at several levels. In a democratic society, a degree of public acceptance is needed for algorithms to become successfully integrated into decisionmaking processes. And public perceptions will shape how the harms and wrongs caused by algorithmic decisionmaking are handled. This report shares the results of a survey experiment designed to contribute to researchers’ understanding of how U.S. public perceptions are evolving in these respects in one high-stakes setting: decisions related to employment and unemployment…(More)”.