Smart OCR – Advancing the Use of Artificial Intelligence with Open Data


Article by Parth Jain, Abhinay Mannepalli, Raj Parikh, and Jim Samuel: “Optical character recognition (OCR) is growing at a projected compound annual growth rate (CAGR) of 16%, and is expected to have a value of 39.7 billion USD by 2030, as estimated by Straits Research. There has been a growing interest in OCR technologies over the past decade. Optical character recognition is the technological process for transforming images of typed, handwritten, scanned, or printed texts into machine-encoded and machine-readable texts (Tappert et al., 1990). OCR can be used with a broad range of image or scan formats – for example, these could be in the form of a scanned document such as a .pdf file, a picture of a piece of paper in .png or .jpeg format, or images with embedded text, such as characters on a coffee cup, the title on the cover of a book, license plate numbers, and images of code on websites. OCR has proven to be a valuable technological process for tackling the important challenge of transforming non-machine-readable data into machine-readable data. This enables the use of natural language processing and computational methods on information-rich data that were previously largely non-processable. Given the broad array of scanned and image documents in open government data and other open data sources, OCR holds tremendous promise for value generation with open data.

Open data has been defined as “being data that is made freely available for open consumption, at no direct cost to the public, which can be efficiently located, filtered, downloaded, processed, shared, and reused without any significant restrictions on associated derivatives, use, and reuse” (Chidipothu et al., 2022). Large segments of open data contain images, visuals, scans, and other non-machine-readable content. The size and complexity associated with the manual analysis of such content is prohibitive. The most efficient approach would be to establish standardized processes for transforming such documents into their OCR output versions. Such machine-readable text could then be analyzed using a range of NLP methods. Artificial Intelligence (AI) can be viewed as being a “set of technologies that mimic the functions and expressions of human intelligence, specifically cognition and logic” (Samuel, 2021). OCR was one of the earliest AI technologies implemented. The first ever optical reader to identify handwritten numerals was the advanced reading machine “IBM 1287,” presented at the World’s Fair in New York in 1965 (Mori et al., 1990). The value of open data is well established – however, the extent of usefulness of open data depends on “accessibility, machine readability, quality” and the degree to which data can be processed by using analytical and NLP methods (data.gov, 2022; John et al., 2022)…(More)”
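
The workflow the authors describe – run OCR over a scanned open-data document, then apply NLP to the resulting machine-readable text – can be sketched in a few lines of Python. This is purely illustrative and not the authors’ implementation; it assumes the open-source Tesseract engine is available through the pytesseract and Pillow packages, and the file name is hypothetical.

```python
# Minimal OCR-to-NLP sketch (illustrative only, not the authors' pipeline).
# Assumes Tesseract plus the pytesseract and Pillow packages are installed;
# "scanned_document.png" is a hypothetical open-data scan.
from collections import Counter

from PIL import Image
import pytesseract

# Step 1: OCR - convert the image into machine-readable text.
text = pytesseract.image_to_string(Image.open("scanned_document.png"))

# Step 2: a deliberately simple NLP pass (token counts), standing in for the
# richer analysis (entity extraction, topic modeling, etc.) the article envisions.
tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
word_counts = Counter(t for t in tokens if t)

print(word_counts.most_common(10))
```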

Leveraging Data to Improve Racial Equity in Fair Housing


Report by Temilola Afolabi: “Residential segregation is related to inequalities in education, job opportunities, political power, access to credit, access to health care, and more. Steering, redlining, mortgage lending discrimination, and other historic policies have all played a role in creating this state of affairs.

Over time, federal efforts including the Fair Housing Act and Home Mortgage Disclosure Act have been designed to improve housing equity in the United States. While these laws have not been entirely effective, they have made new kinds of data available—data that can shed light on some of the historic drivers of housing inequity and help inform tailored solutions to their ongoing impact.

This report explores a number of current opportunities to strengthen longstanding data-driven tools to address housing equity. The report also shows how the effects of mortgage lending discrimination and other historic practices are still being felt today. At the same time, it outlines opportunities to apply data to increase equity in many areas related to the homeownership gap, including negative impacts on health and well-being, socioeconomic disparities, and housing insecurity….(More)”.

Closing the gap between user experience and policy design 


Article by Cecilia Muñoz & Nikki Zeichner: “…Ask the average American to use a government system, whether it’s for a simple task like replacing a Social Security Card or a complicated process like filing taxes, and you’re likely to be met with groans of dismay. We all know that government processes are cumbersome and frustrating; we have grown used to the government struggling to deliver even basic services. 

Unacceptable as the situation is, fixing government processes is a difficult task. Behind every exhausting government application form or eligibility screener lurks a complex policy that ultimately leads to what Atlantic staff writer Annie Lowrey calls the time tax, “a levy of paperwork, aggravation, and mental effort imposed on citizens in exchange for benefits that putatively exist to help them.” 

Policies are complex, in part because they each represent many voices. The people we call policymakers are key actors in government and elected officials at every level, from city councils to the U.S. Congress. As they seek to solve public problems like reducing child poverty or improving economic mobility, they consult with experts at government agencies, researchers in academia, and advocates working directly with affected communities. They also hear from lobbyists from affected industries. They consider current events and public sentiments. All of these voices and variables, representing different and sometimes conflicting interests, contribute to the policies that become law. And as a result, laws reflect a complex mix of objectives. After a new law is in place, the relevant government agencies are responsible for implementing it by creating new programs and services to carry it out. Complex policies then get translated into complex processes and experiences for members of the public. They become long application forms, unclear directions, and too often, barriers that keep people from accessing a benefit. 

Policymakers and advocates typically declare victory when a new policy is signed into law; if they think about the implementation details at all, that work mostly happens after the ink is dry. While these policy actors may have deep expertise in a given issue area, or deep understanding of affected communities, they often lack experience designing services in a way that will be easy for the public to navigate…(More)”.

Data Analysis for Social Science: A Friendly and Practical Introduction


Book by Elena Llaudet and Kosuke Imai: “…provides a friendly introduction to the statistical concepts and programming skills needed to conduct and evaluate social scientific studies. Using plain language and assuming no prior knowledge of statistics and coding, the book provides a step-by-step guide to analyzing real-world data with the statistical program R for the purpose of answering a wide range of substantive social science questions. It teaches not only how to perform the analyses but also how to interpret results and identify strengths and limitations. This one-of-a-kind textbook includes supplemental materials to accommodate students with minimal knowledge of math and clearly identifies sections with more advanced material so that readers can skip them if they so choose…(More)”.

Machine Learning in Public Policy: The Perils and the Promise of Interpretability


Report by Evan D. Peet, Brian G. Vegetabile, Matthew Cefalu, Joseph D. Pane, Cheryl L. Damberg: “Machine learning (ML) can have a significant impact on public policy by modeling complex relationships and augmenting human decisionmaking. However, overconfidence in results and incorrectly interpreted algorithms can lead to peril, such as the perpetuation of structural inequities. In this Perspective, the authors give an overview of ML and discuss the importance of its interpretability. In addition, they offer the following recommendations, which will help policymakers develop trustworthy, transparent, and accountable information that leads to more-objective and more-equitable policy decisions: (1) improve data through coordinated investments; (2) approach ML expecting interpretability, and be critical; and (3) leverage interpretable ML to understand policy values and predict policy impacts…(More)”.
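
As a purely illustrative aside (not drawn from the report), “interpretable ML” often means preferring models whose parameters can be read and debated directly rather than hidden inside a black box. A minimal sketch with scikit-learn, using synthetic data and hypothetical feature names:

```python
# Illustrative sketch of an interpretable model; not an example from the RAND report.
# Data are synthetic and feature names hypothetical; assumes scikit-learn and NumPy.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feature_names = ["prior_enrollment", "median_income", "distance_to_clinic"]

# Synthetic stand-in for an administrative dataset.
X = rng.normal(size=(500, len(feature_names)))
y = (X[:, 0] - 0.5 * X[:, 2] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Unlike a black-box model, the fitted coefficients can be inspected directly,
# which is what leaves the model's behavior open to policy scrutiny.
for name, coef in zip(feature_names, model.coef_[0]):
    print(f"{name}: {coef:+.2f}")
```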

Institutions, Experts & the Loss of Trust


Essay by Henry E. Brady and Kay Lehman Schlozman: “Institutions are critical to our personal and societal well-being. They develop and disseminate knowledge, enforce the law, keep us healthy, shape labor relations, and uphold social and religious norms. But institutions and the people who lead them cannot fulfill their missions if they have lost legitimacy in the eyes of the people they are meant to serve.

Americans’ distrust of Congress is long-standing. What is less well-documented is how partisan polarization now aligns with the growing distrust of institutions once thought of as nonpolitical. Refusals to follow public health guidance about COVID-19, calls to defund the police, the rejection of election results, and disbelief of the press highlight the growing polarization of trust. But can these relationships be broken? And how does the polarization of trust affect institutions’ ability to confront shared problems, like climate change, epidemics, and economic collapse?…(More)”.

Humanizing Science and Engineering for the Twenty-First Century


Essay by Kaye Husbands Fealing, Aubrey Deveny Incorvaia and Richard Utz: “Solving complex problems is never a purely technical or scientific matter. When science or technology advances, insights and innovations must be carefully communicated to policymakers and the public. Moreover, scientists, engineers, and technologists must draw on subject matter expertise in other domains to understand the full magnitude of the problems they seek to solve. And interdisciplinary awareness is essential to ensure that taxpayer-funded policy and research are efficient and equitable and are accountable to citizens at large—including members of traditionally marginalized communities…(More)”.

Our Data, Ourselves


Book by Jacqueline D. Lipton: “Our Data, Ourselves addresses a common and crucial question: What can we as private individuals do to protect our personal information in a digital world? In this practical handbook, legal expert Jacqueline D. Lipton guides readers through important issues involving technology, data collection, and digital privacy as they apply to our daily lives.

Our Data, Ourselves covers a broad range of everyday privacy concerns with easily digestible, accessible overviews and real-world examples. Lipton explores the ways we can protect our personal data and monitor its use by corporations, the government, and others. She also explains our rights regarding sensitive personal data like health insurance records and credit scores, as well as what information retailers can legally gather, and how. Who actually owns our personal information? Can an employer legally access personal emails? What privacy rights do we have on social media? Answering these questions and more, Our Data, Ourselves provides a strategic approach to assuming control over, and ultimately protecting, our personal information…(More)”

Brain capital: A new vector for democracy strengthening


Report by the Brain Capital Alliance: “Democracies are increasingly under siege. Beyond direct external (e.g., warfare) and internal (e.g., populism, extremism) threats to democratic nations, multiple democracy-weakening factors are converging in our modern world. Brain health challenges, including mental, neurologic, and substance use disorders, social determinants of health, long COVID, undesired effects of technology, mis- and disinformation, and educational, health, and gender disparities, are associated with substantial economic and sociopolitical impediments. Herein, we argue that thriving democracies can distinguish themselves through provision of environments that enable each citizen to achieve their full brain health potential conducive to both personal and societal well-being. Gearing policymaking towards equitable and quality brain health may prove essential to combat brain challenges, promote societal cohesion, and boost economic productivity. We outline emerging policy innovations directed at building “pro-democratic brain health” across individual, communal, national, and international levels. While extensive research is warranted to further validate these approaches, brain health-directed policymaking harbors potential as a novel concept for democracy strengthening….(More)”.

Algorithms Quietly Run the City of DC—and Maybe Your Hometown


Article by Khari Johnson: “Washington, DC, is the home base of the most powerful government on earth. It’s also home to 690,000 people—and 29 obscure algorithms that shape their lives. City agencies use automation to screen housing applicants, predict criminal recidivism, identify food assistance fraud, determine if a high schooler is likely to drop out, inform sentencing decisions for young people, and many other things.

That snapshot of semiautomated urban life comes from a new report from the Electronic Privacy Information Center (EPIC). The nonprofit spent 14 months investigating the city’s use of algorithms and found they were used across 20 agencies, with more than a third deployed in policing or criminal justice. For many systems, city agencies would not provide full details of how their technology worked or was used. The project team concluded that the city is likely using still more algorithms that they were not able to uncover.

The findings are notable beyond DC because they add to the evidence that many cities have quietly put bureaucratic algorithms to work across their departments, where they can contribute to decisions that affect citizens’ lives.

Government agencies often turn to automation in hopes of adding efficiency or objectivity to bureaucratic processes, but it’s often difficult for citizens to know they are at work, and some systems have been found to discriminate and lead to decisions that ruin human lives. In Michigan, an unemployment-fraud detection algorithm with a 93 percent error rate caused 40,000 false fraud allegations. A 2020 analysis by Stanford University and New York University found that nearly half of federal agencies are using some form of automated decisionmaking systems…(More)”.