Smart OCR – Advancing the Use of Artificial Intelligence with Open Data


Article by Parth Jain, Abhinay Mannepalli, Raj Parikh, and Jim Samuel: “Optical character recognition (OCR) is growing at a projected compounded annual growth rate (CAGR) of 16%, and is expected to have a value of 39.7 billion USD by 2030, as estimated by Straits research. There has been a growing interest in OCR technologies over the past decade. Optical character recognition is the technological process for transforming images of typed, handwritten, scanned, or printed texts into machine-encoded and machine-readable texts (Tappert, et al., 1990). OCR can be used with a broad range of image or scan formats – for example, these could be in the form of a scanned document such as a .pdf file, a picture of a piece of paper in .png or .jpeg format, or images with embedded text, such as characters on a coffee cup, title on the cover page of a book, the license number on vehicular plates, and images of code on websites. OCR has proven to be a valuable technological process for tackling the important challenge of transforming non-machine-readable data into machine readable data. This enables the use of natural language processing and computational methods on information-rich data which were previously largely non-processable. Given the broad array of scanned and image documents in open government data and other open data sources, OCR holds tremendous promise for value generation with open data.

Open data has been defined as “being data that is made freely available for open consumption, at no direct cost to the public, which can be efficiently located, filtered, downloaded, processed, shared, and reused without any significant restrictions on associated derivatives, use, and reuse” (Chidipothu et al., 2022). Large segments of open data contain images, visuals, scans, and other non-machine-readable content. The size and complexity associated with the manual analysis of such content is prohibitive. The most efficient way would be to establish standardized processes for transforming documents into their OCR output versions. Such machine-readable text could then be analyzed using a range of NLP methods. Artificial Intelligence (AI) can be viewed as being a “set of technologies that mimic the functions and expressions of human intelligence, specifically cognition and logic” (Samuel, 2021). OCR was one of the earliest AI technologies implemented. The first ever optical reader to identify handwritten numerals was the advanced reading machine “IBM 1287,” presented at the World’s Fair in New York in 1965 (Mori, et al., 1990). The value of open data is well established – however, the extent of usefulness of open data is dependent on “accessibility, machine readability, quality” and the degree to which data can be processed by using analytical and NLP methods (data.gov, 2022; John, et al., 2022)…(More)”
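The step from OCR output to NLP analysis described in the excerpt above can be illustrated with a minimal sketch. It is not the authors' method: the OCR step itself is not shown, the `ocr_text` string is a hypothetical stand-in for what an OCR engine might return from a scanned page, and the helper names `clean_ocr_text` and `word_counts` are invented for illustration. The sketch only shows a standardized clean-up pass followed by a simple word-frequency count, using the Python standard library.

```python
import re
from collections import Counter

# Hypothetical OCR output: in practice this string would come from an OCR
# engine run over a scanned page; it is hard-coded here for illustration.
ocr_text = """Open data por-
tal: budget   report 2022.
The budget report lists open data sets."""


def clean_ocr_text(text: str) -> str:
    """Rejoin words hyphenated across line breaks and collapse whitespace."""
    # Caveat: this also joins genuine hyphenated compounds split at a line end.
    text = re.sub(r"-\s*\n\s*", "", text)  # "por-\ntal" becomes "portal"
    return re.sub(r"\s+", " ", text).strip()


def word_counts(text: str) -> Counter:
    """Lowercase the text, keep alphabetic tokens, and count them."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


cleaned = clean_ocr_text(ocr_text)
counts = word_counts(cleaned)

print(cleaned)
print(counts.most_common(3))
```

A real pipeline would likely add language-specific tokenization and OCR error correction; the point here is only that once scanned content is machine-readable, ordinary text-processing tools apply.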

What China’s Algorithm Registry Reveals about AI Governance


Article by Matt Sheehan and Sharon Du: “For the past year, the Chinese government has been conducting some of the earliest experiments in building regulatory tools to govern artificial intelligence (AI). In that process, China is trying to tackle a problem that will soon face governments around the world: Can regulators gain meaningful insight into the functioning of algorithms, and ensure they perform within acceptable bounds?

One particular tool deserves attention both for its impact within China, and for the lessons technologists and policymakers in other countries can draw from it: a mandatory registration system created by China’s internet regulator for recommendation algorithms.

Although the full details of the registry are not public, by digging into its online instruction manual, we can reveal new insights into China’s emerging regulatory architecture for algorithms.

The algorithm registry was created by China’s 2022 regulation on recommendation algorithms (English translation), which came into effect in March of this year and was led by the Cyberspace Administration of China (CAC). China’s algorithm regulation has largely focused on the role recommendation algorithms play in disseminating information, requiring providers to ensure that they don’t “endanger national security or the social public interest” and to “give an explanation” when they harm the legitimate interests of users. Other provisions sought to address monopolistic behavior by platforms and hot-button social issues, such as the role that dispatching algorithms play in creating dangerous labor conditions for Chinese delivery drivers…(More)”

Language and the Rise of the Algorithm


Book by Jeffrey M. Binder: “Bringing together the histories of mathematics, computer science, and linguistic thought, Language and the Rise of the Algorithm reveals how recent developments in artificial intelligence are reopening an issue that troubled mathematicians well before the computer age: How do you draw the line between computational rules and the complexities of making systems comprehensible to people? By attending to this question, we come to see that the modern idea of the algorithm is implicated in a long history of attempts to maintain a disciplinary boundary separating technical knowledge from the languages people speak day to day.
 
Here Jeffrey M. Binder offers a compelling tour of four visions of universal computation that addressed this issue in very different ways: G. W. Leibniz’s calculus ratiocinator; a universal algebra scheme Nicolas de Condorcet designed during the French Revolution; George Boole’s nineteenth-century logic system; and the early programming language ALGOL, short for algorithmic language. These episodes show that symbolic computation has repeatedly become entangled in debates about the nature of communication. Machine learning, in its increasing dependence on words, erodes the line between technical and everyday language, revealing the urgent stakes underlying this boundary.
 
The idea of the algorithm is a levee holding back the social complexity of language, and it is about to break. This book is about the flood that inspired its construction…(More)”.

The Ethics of Automated Warfare and Artificial Intelligence


Essay series introduced by Bessma Momani, Aaron Shull and Jean-François Bélanger: “…begins with a piece written by Alex Wilner titled “AI and the Future of Deterrence: Promises and Pitfalls.” Wilner looks at the issue of deterrence and provides an account of the various ways AI may impact our understanding and framing of deterrence theory and its practice in the coming decades. He discusses how different countries have expressed diverging views over the degree of AI autonomy that should be permitted in a conflict situation — as those more willing to cut humans out of the decision-making loop could gain a strategic advantage. Wilner’s essay emphasizes that differences in states’ technological capability are large, and this will hinder interoperability among allies, while diverging views on regulation and ethical standards make global governance efforts even more challenging.

Looking to the future of non-state use of drones as an example, the weapon technology transfer from nation-state to non-state actors can help us to understand how next-generation technologies may also slip into the hands of unsavoury characters such as terrorists, criminal gangs or militant groups. The effectiveness of Ukrainian drone strikes against the much larger Russian army should serve as a warning to Western militaries, suggests James Rogers in his essay “The Third Drone Age: Visions Out to 2040.” This is a technology that can level the field by asymmetrically advantaging conventionally weaker forces. The increased diffusion of drone technology enhances the likelihood that future wars will also be drone wars, whether these drones are autonomous systems or not. This technology, in the hands of non-state actors, implies future Western missions against, say, insurgent or guerilla forces will be more difficult.

Data is the fuel that powers AI and the broader digital transformation of war. In her essay “Civilian Data in Cyber Conflict: Legal and Geostrategic Considerations,” Eleonore Pauwels discusses how offensive cyber operations are aiming to alter the very data sets of other actors to undermine adversaries — whether through targeting centralized biometric facilities or individuals’ DNA sequence in genomic analysis databases, or injecting fallacious data into satellite imagery used in situational awareness. Drawing on the implications of international humanitarian law, Pauwels argues that adversarial data manipulation constitutes another form of “grey zone” operation that falls below a threshold of armed conflict. She evaluates the challenges associated with adversarial data manipulation, given that there is no internationally agreed upon definition of what constitutes cyberattacks or cyber hostilities within international humanitarian law (IHL).

In “AI and the Actual International Humanitarian Law Accountability Gap,” Rebecca Crootof argues that technologies can complicate legal analysis by introducing geographic, temporal and agency distance between a human’s decision and its effects. This makes it more difficult to hold an individual or state accountable for unlawful harmful acts. But in addition to this added complexity surrounding legal accountability, novel military technologies are bringing an existing accountability gap in IHL into sharper focus: the relative lack of legal accountability for unintended civilian harm. These unintentional acts can be catastrophic, but technically within the confines of international law, which highlights the need for new accountability mechanisms to better protect civilians.

Some assert that the deployment of autonomous weapon systems can strengthen compliance with IHL by limiting the kinetic devastation of collateral damage, but AI’s fragility and apparent capacity to behave in unexpected ways poses new and unexpected risks. In “Autonomous Weapons: The False Promise of Civilian Protection,” Branka Marijan opines that AI will likely not surpass human judgment for many decades, if ever, suggesting that there need to be regulations mandating a certain level of human control over weapon systems. The export of weapon systems to states willing to deploy them on a looser chain-of-command leash should be monitored…(More)”.

Machine Learning in Public Policy: The Perils and the Promise of Interpretability


Report by Evan D. Peet, Brian G. Vegetabile, Matthew Cefalu, Joseph D. Pane, Cheryl L. Damberg: “Machine learning (ML) can have a significant impact on public policy by modeling complex relationships and augmenting human decisionmaking. However, overconfidence in results and incorrectly interpreted algorithms can lead to peril, such as the perpetuation of structural inequities. In this Perspective, the authors give an overview of ML and discuss the importance of its interpretability. In addition, they offer the following recommendations, which will help policymakers develop trustworthy, transparent, and accountable information that leads to more-objective and more-equitable policy decisions: (1) improve data through coordinated investments; (2) approach ML expecting interpretability, and be critical; and (3) leverage interpretable ML to understand policy values and predict policy impacts…(More)”.

AI Audit-Washing and Accountability


Report by Ellen P. Goodman and Julia Tréhu: “…finds that auditing could be a robust means for holding AI systems accountable, but today’s auditing regimes are not yet adequate to the job. The report assesses the effectiveness of various auditing regimes and proposes guidelines for creating trustworthy auditing systems.

Various government and private entities rely on or have proposed audits as a way of ensuring AI systems meet legal, ethical and other standards. This report finds that audits can in fact provide an agile co-regulatory approach—one that relies on both governments and private entities—to ensure societal accountability for algorithmic systems through private oversight.

But the “algorithmic audit” remains ill-defined and inexact, whether concerning social media platforms or AI systems generally. The risk is significant that inadequate audits will obscure problems with algorithmic systems. A poorly designed or executed audit is at best meaningless and at worst even excuses harms that the audits claim to mitigate.

Inadequate audits or those without clear standards provide false assurance of compliance with norms and laws, “audit-washing” problematic or illegal practices. Like green-washing and ethics-washing before, the audited entity can claim credit without doing the work.

The paper identifies the core specifications needed in order for algorithmic audits to be a reliable AI accountability mechanism:

  • “Who” conducts the audit: clearly defined qualifications, conditions for data access, and guardrails for internal audits;
  • “What” is the type and scope of audit: including its position within a larger sociotechnical system;
  • “Why” is the audit being conducted: whether for narrow legal standards or broader ethical goals, essential for audit comparison, along with potential costs; and
  • “How” are the audit standards determined: an important baseline for the development of audit certification mechanisms and to guard against audit-washing.

Algorithmic audits have the potential to increase the reliability and innovation of technology in the twenty-first century, much as financial audits transformed the way businesses operated in the twentieth century. They will take different forms, either within a sector or across sectors, especially for systems that pose the highest risk. Ensuring that AI is accountable and trusted is key to ensuring that democracies remain centers of innovation while shaping technology to democratic values…(More)”

We could run out of data to train AI language programs 


Article by Tammy Xu: “Large language models are one of the hottest areas of AI research right now, with companies racing to release programs like GPT-3 that can write impressively coherent articles and even computer code. But there’s a problem looming on the horizon, according to a team of AI forecasters: we might run out of data to train them on.

Language models are trained using texts from sources like Wikipedia, news articles, scientific papers, and books. In recent years, the trend has been to train these models on more and more data in the hope that it’ll make them more accurate and versatile.

The trouble is, the types of data typically used for training language models may be used up in the near future—as early as 2026, according to a paper by researchers from Epoch, an AI research and forecasting organization, that is yet to be peer reviewed. The issue stems from the fact that, as researchers build more powerful models with greater capabilities, they have to find ever more texts to train them on. Large language model researchers are increasingly concerned that they are going to run out of this sort of data, says Teven Le Scao, a researcher at AI company Hugging Face, who was not involved in Epoch’s work.

The issue stems partly from the fact that language AI researchers filter the data they use to train models into two categories: high quality and low quality. The line between the two categories can be fuzzy, says Pablo Villalobos, a staff researcher at Epoch and the lead author of the paper, but text from the former is viewed as better-written and is often produced by professional writers…(More)”.

AI Localism in Practice: Examining How Cities Govern AI


Report by Sara Marcucci, Uma Kalkar, and Stefaan Verhulst: “…serves as a primer for policymakers and practitioners to learn about current governance practices and inspire their own work in the field. In this report, we present the fundamentals of AI governance, the value proposition of such initiatives, and their application in cities worldwide to identify themes among city- and state-led governance actions. We close with ten lessons on AI localism for policymakers, data and AI experts, and the informed public to keep in mind as cities grow increasingly ‘smarter’, which include:

  • Principles provide a North Star for governance;
  • Public engagement provides a social license;
  • AI literacy enables meaningful engagement;
  • Tap into local expertise;
  • Innovate in how transparency is provided;
  • Establish new means for accountability and oversight;
  • Signal boundaries through binding laws and policies;
  • Use procurement to shape responsible AI markets;
  • Establish data collaboratives to tackle asymmetries; and
  • Make good governance strategic.

Considered together, we look to use our understanding of governance practices, local AI governance examples, and the ten overarching lessons to create an incipient framework for implementing and assessing AI localism initiatives in cities around the world…(More)”

Measuring the environmental impacts of artificial intelligence compute and applications


OECD Paper: “Artificial intelligence (AI) systems can use massive computational resources, raising sustainability concerns. This report aims to improve understanding of the environmental impacts of AI, and help measure and decrease AI’s negative effects while enabling it to accelerate action for the good of the planet. It distinguishes between the direct environmental impacts of developing, using and disposing of AI systems and related equipment, and the indirect costs and benefits of using AI applications. It recommends the establishment of measurement standards, expanding data collection, identifying AI-specific impacts, looking beyond operational energy use and emissions, and improving transparency and equity to help policy makers make AI part of the solution to sustainability challenges…(More)”.

Algorithms in the Public Sector. Why context matters


Paper by Georg Wenzelburger, Pascal D. König, Julia Felfeli, and Anja Achtziger: “Algorithms increasingly govern people’s lives, including through rapidly spreading applications in the public sector. This paper sheds light on acceptance of algorithms used by the public sector emphasizing that algorithms, as parts of socio-technical systems, are always embedded in a specific social context. We show that citizens’ acceptance of an algorithm is strongly shaped by how they evaluate aspects of this context, namely the personal importance of the specific problems an algorithm is supposed to help address and their trust in the organizations deploying the algorithm. The objective performance of presented algorithms affects acceptance much less in comparison. These findings are based on an original dataset from a survey covering two real-world applications, predictive policing and skin cancer prediction, with a sample of 2661 respondents from a representative German online panel. The results have important implications for the conditions under which citizens will accept algorithms in the public sector…(More)”.