The unmet potential of open data


Essay by Jane Bambauer: “Open Data holds great promise — and more than thought leaders appreciate. 

Open access to data can lead to a much richer and more diverse range of research and development, hastening innovation. That’s why scientific journals are asking authors to make their data available, why governments are making publicly held records open by default, and why even private companies provide subsets of their data for general research use. Facebook, for example, launched an effort to provide research data that could be used to study the impact of social networks on election outcomes. 

Yet none of these moves have significantly changed the landscape. Because of lingering skepticism and some legitimate anxieties, we have not yet democratized access to Big Data.

There are a few well-trodden explanations for this failure — or this tragedy of the anti-commons — but none should dissuade us from pushing forward….

Finally, creating the infrastructure required to clean data, link it to other data sources, and make it useful for the most valuable research questions will not happen without a significant investment from somebody, be it the government or a private foundation. As Stefaan Verhulst, Andrew Zahuranec, and Andrew Young have explained, creating a useful data commons requires much more infrastructure and cultural buy-in than one might think. 

From my perspective, however, the greatest impediment to the open data movement has been a lack of vision within the intelligentsia. Outside a few domains like public health, intellectuals continue to traffic in and thrive on anecdotes and narratives. They have not perceived or fully embraced how access to broad and highly diverse data could radically change newsgathering (we could observe purchasing or social media data in real time), market competition (imagine designing a new robot using data collected from Uber’s autonomous cars), and responsive government (we could directly test claims of cause and effect related to highly salient issues during election time). 

With a quiet accumulation of use cases and increasing competence in handling and digesting data, we will eventually reach a tipping point where the appetite for more useful research data will outweigh the concerns and inertia that have bogged down progress in the open data movement…(More)”.

The risks and rewards of real-time data


Article by David Pringle: “Unlike many valuable resources, real-time data is both abundant and growing rapidly. But it also needs to be handled with great care.

That was one of the key takeaways from an online workshop produced by Science|Business’ Data Rules group, which explored what the rapid growth in real-time data means for artificial intelligence (AI). Real-time data is increasingly feeding machine learning systems that then adjust the algorithms they use to make decisions, such as which news item to display on your screen or which product to recommend.  

“With AI, especially, you want to make sure that the data that you have is consistent, replicable and also valid,” noted Chris Atherton, senior research engagement officer at GÉANT, who described how his organisation transmits data captured by the European Space Agency’s satellites to researchers across the world. He explained that the images of earth taken by satellites are initially processed at three levels to correct for the atmospheric conditions at the time, the angle of the viewpoint and other variables, before being made more widely available for researchers and users to process further. The satellite data is also “validated against ground-based sources…in-situ data to make sure that it is actually giving you a reliable reading,” Atherton added.

Depending on the orbit of the satellites and the equipment involved, the processing can take a few hours or a few days before the data is made available to the wider public. One way to speed things up post-publication is to place the pre-processed data into so-called data cubes, Atherton noted, which can then be integrated with AI systems. “You can send queries to the data cube itself rather than having to download the data directly to your own location to process it on your machine,” he explained….(More)”.
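
The access pattern Atherton describes, where the query travels to the cube and only the matching chunks travel back, can be illustrated with a minimal sketch. It assumes a hypothetical Zarr-backed Earth observation cube exposed over HTTP; the URL, variable, and coordinate names are illustrative, not GÉANT’s or ESA’s actual services.

```python
# Minimal sketch of querying a (hypothetical) cloud-hosted Earth observation
# data cube with xarray, rather than downloading raw scenes in bulk.
# The URL, variable and coordinate names are assumptions for illustration.
import xarray as xr

# Open the cube lazily: only metadata is fetched at this point.
ds = xr.open_zarr(
    "https://example.org/cubes/sentinel2-l2a.zarr",  # hypothetical endpoint
    consolidated=True,
)

# Select a small space/time window; still no pixel data is transferred.
subset = ds["reflectance"].sel(
    lat=slice(52.5, 52.3),  # slice order assumes descending latitude coords
    lon=slice(4.7, 5.0),
    time=slice("2021-06-01", "2021-06-30"),
)

# Only now are the chunks that intersect this query actually downloaded.
monthly_mean = subset.mean(dim="time").compute()
print(monthly_mean)
```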

Data Portals and Citizen Engagement


Series of blogs by Tim Davies: “Portals have been an integral part of the open data movement. They provided a space where governments (usually) could publish and curate data, and where users (often individuals, civil society organisations or sometimes private sector organisations building services or deriving insights from this data) could discover and access it. 

While many data portals are still maintained, and while some of them enable access to a sizeable amount of data, portals face some big questions in the decade ahead:

  1. Are open data portals still fit for purpose (and if so, which purpose)?
  2. Do open data portals still “make sense” in this decade, or are they a public sector anomaly in a context where data lakes, data meshes, and data platforms are adopted across industry? Is there a minimum viable spec for a future-proof open data “portal”?
  3. What roles and activities have emerged around data platforms and portals that deserve to be codified and supported by the future type of platforms?
  4. Could re-imagined open data “platforms” create change in the role of the public service organisation with regards to data (from publisher to… steward?)?
  5. How can a new generation of portals or data platforms better support citizen engagement and civic participation?
  6. What differences are there between the private and public approaches, and why? Does any difference introduce any significant dynamics in private / public open data ecosystems?…(More)”.

Strengthening CRVS Systems to Improve Migration Policy: A Promising Innovation


Blog by Tawheeda Wahabzada and Deirdre Appel: “Migration is one of the most pressing issues of our time, and innovation for migration policy can take several different shapes to help solve its challenges. It is seen in radical technological breakthroughs, such as biometric identifiers that completely transform the status quo, as well as in technological disruptions, like mobile phone fund transfers, that alter an existing process. There is also incremental innovation, the gradual improvement of an existing process or institution. Regardless of where they fall on the spectrum, these innovative applications are all relevant to migration policy.

Incremental innovation for civil registration and vital statistics (CRVS) systems can greatly benefit migrants and the policymakers trying to help them. According to the World Health Organization, a well-functioning CRVS system registers all births and deaths, issues birth and death certificates, and compiles and disseminates vital statistics, including cause-of-death information. It may also record marriages and divorces. Each of these services brings a world of crucial advantages. But despite the social and legal benefits for individuals, especially migrants, these systems remain underfunded and underperforming. More than 100 low- and middle-income countries lack functional CRVS systems, and about one-third of all births are not registered. This amounts to more than one billion people without a legal identity, leaving them unable to prove who they are and creating serious barriers to accessing health, education, financial, and other social services.

Throughout Africa there are great differences in CRVS coverage: birth registration ranges from above 90 percent in some North African countries to under 50 percent in several countries across other regions, and death registration shows even greater gaps, with either no information available or lower coverage rates. For countries with low-functioning CRVS systems, potential migrants could face additional obstacles in obtaining birth certificates and proof of identification….(More)”. See also https://data4migration.org/blog/

“If Everybody’s White, There Can’t Be Any Racial Bias”: The Disappearance of Hispanic Drivers From Traffic Records


Article by Richard A. Webster: “When sheriff’s deputies in Jefferson Parish, Louisiana, pulled over Octavio Lopez for an expired inspection tag in 2018, they wrote on his traffic ticket that he is white. Lopez, who is from Nicaragua, is Hispanic and speaks only Spanish, said his wife.

In fact, of the 167 tickets issued by deputies to drivers with the last name Lopez over a nearly six-year span, not one of the motorists was labeled as Hispanic, according to records provided by the Jefferson Parish clerk of court. The same was true of the 252 tickets issued to people with the last name of Rodriguez, 234 named Martinez, 223 with the last name Hernandez and 189 with the surname Garcia.
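
The tally described here is straightforward to reproduce on any ticket-level export. Below is a minimal sketch, assuming a hypothetical CSV with `last_name` and `race` columns; the actual clerk-of-court record format will differ.

```python
# Sketch of the surname tally described above, run against a hypothetical
# ticket-level CSV export; file and column names are assumptions.
import pandas as pd

tickets = pd.read_csv("jefferson_parish_tickets.csv")

common_surnames = ["Lopez", "Rodriguez", "Martinez", "Hernandez", "Garcia"]
subset = tickets[tickets["last_name"].isin(common_surnames)]

# For each surname: total tickets and how many drivers were labeled Hispanic.
summary = subset.groupby("last_name")["race"].agg(
    tickets="size",
    labeled_hispanic=lambda s: (s == "Hispanic").sum(),
)
print(summary)
```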

This kind of misidentification is widespread — and not without harm. Across America, law enforcement agencies have been accused of targeting Hispanic drivers, failing to collect data on those traffic stops, and covering up potential officer misconduct and aggressive immigration enforcement by identifying people as white on tickets.

“If everybody’s white, there can’t be any racial bias,” Frank Baumgartner, a political science professor at the University of North Carolina at Chapel Hill, told WWNO/WRKF and ProPublica.

Nationally, states have tried to patch this data loophole and tighten controls against racial profiling. In recent years, legislators have passed widely hailed traffic stop data-collection laws in California, Colorado, Illinois, Oregon, Virginia and Washington, D.C. This April, Alabama became the 22nd state to enact similar legislation.

Though Louisiana has had its own data-collection requirement for two decades, it contains a loophole unlike any other state: It exempts law enforcement agencies from collecting and delivering data to the state if they have an anti-racial-profiling policy in place. This has rendered the law essentially worthless, said Josh Parker, a senior staff attorney at the Policing Project, a public safety research nonprofit at the New York University School of Law.

Louisiana State Rep. Royce Duplessis, D-New Orleans, attempted to remove the exemption two years ago, but law enforcement agencies protested. Instead, he was forced to convene a task force to study the issue, which thus far hasn’t produced any results, he said.

“They don’t want the data because they know what it would reveal,” Duplessis said of law enforcement agencies….(More)”.

How digital minilaterals can revive international cooperation


Blog by Tanya Filer and Antonio Weiss: “From London to the Organisation for Economic Co-operation and Development, calls to “reimagine” or “revive” multilateralism have been a dime a dozen this year. The global upheaval of COVID-19 and emerging megatrends—from the climate crisis to global population growth—have afforded a new urgency to international cooperation and highlighted a growing sclerosis within multilateralism that even its greatest proponents admit. 

While these calls—and the rethinking they are beginning to provoke—are crucial, a truly new and nuanced multilateralism will require room for other models too. As we described in a paper published last year at the Bennett Institute for Public Policy at the University of Cambridge, digital minilaterals are providing a new model for international cooperation. Made up of small, trust-based, innovation-oriented networks, digital minilaterals use digital culture, practices, processes, and technologies as tools to advance peer learning, support, and cooperation between governments. 

Though far removed from great power politics, digital minilaterals are beginning to help nation-states navigate an environment of rapid technological change and problems of complex systems, including by facilitating peer learning, sharing code bases, and deliberating on major ethical questions, such as the appropriate use of artificial intelligence in society. Digital minilateralism is providing a decentralized form of global cooperation and could help revive multilateralism. To be truly effective, digital minilaterals must place as much emphasis on common values as on pooled knowledge, but it remains to be seen whether these new diplomatic groupings will deliver on their promise….(More)”.

Do we know what jobs are in high demand?


Emma Rindlisbacher at Work Shift: “…Measuring which fields are in demand is harder than it sounds. Many of the available data sources, experts say, have significant flaws. And that causes problems for education providers who are trying to understand market demand and map their programs to it.

“If you are in higher education and trying to understand where the labor market is going, use BLS data as a general guide but do not rely too heavily on it when it comes to building programs and making investments,” said Jason Tyszko, the Vice President of the Center for Education and Workforce at the US Chamber of Commerce Foundation.

What’s In-Demand?

Why it matters: Colleges are turning to labor market data as they face increasing pressure from lawmakers and the public to demonstrate value and financial ROI. A number of states also have launched specialized grant and “free college” programs for residents pursuing education in high-demand fields. And many require state agencies to determine which fields are in high demand as part of workforce planning processes.

Virginia is one of those states. To comply with state law, the Board of Workforce Development has to regularly update a list of high demand occupations. Deciding how to do so can be challenging.

According to a presentation given at a September 2021 meeting, the board chose to determine which occupations are in high demand by using BLS data. The reason: the BLS data is publicly available.

“Although in some instances, proprietary data sources have different or additional nuances, in service of guiding principle #1 (transparency, replicability), our team has relied exclusively on publicly available data for this exercise,” the presentation said. (A representative from the board declined to comment, citing the still ongoing nature of constructing the high demand occupations list.)

The limits of the gold standard

For institutions looking to study job market trends, there are typically two main data sources available. The first, from the BLS, consists of official government statistics primarily designed to track economic indicators such as the unemployment rate. The second, from proprietary companies such as Emsi Burning Glass, typically relies on postings to job board websites like LinkedIn.

The details: The two sources have different strengths and weaknesses. The Emsi Burning Glass data can be considered “real time” data, because it identifies new job postings as they are released online. The BLS data, on the other hand, is updated less frequently but is comprehensive.

The BLS data is designed to compare economic trends across decades, and to map to state systems so that statistics like unemployment rates can be compared across states. For those reasons, the agency is reluctant to change the definitions underlying the data. That consistency, however, can make it difficult for education providers to use the data to determine which fields are in high demand.

BLS data is broken down according to the Standard Occupational Classification system, or SOC, a taxonomy used to classify different occupations. That taxonomy is designed to be public-facing—the BLS website, for example, features a guide for job seekers that purports to tell them which occupation codes have the highest wages or the greatest potential for growth.

But the taxonomy was last updated in 2010, according to a BLS spokesperson…(More)”.
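
For readers who want to work with BLS data directly, the bureau exposes a public time-series API. The sketch below pulls a single series, using the national unemployment rate as a stand-in; occupation-level series follow their own SOC-based ID scheme, and the v2 endpoint generally expects a free registration key. Both of those details, and the placeholder key, are assumptions to verify against the BLS documentation.

```python
# Sketch of pulling a series from the BLS public time-series API (v2).
# LNS14000000 is the national unemployment rate; occupation-level series
# use their own SOC-based series IDs. The registration key is a placeholder.
import requests

resp = requests.post(
    "https://api.bls.gov/publicAPI/v2/timeseries/data/",
    json={
        "seriesid": ["LNS14000000"],
        "startyear": "2019",
        "endyear": "2021",
        "registrationkey": "YOUR_BLS_API_KEY",  # free signup; placeholder
    },
    timeout=30,
)
resp.raise_for_status()

for series in resp.json()["Results"]["series"]:
    for point in series["data"]:
        print(series["seriesID"], point["year"], point["period"], point["value"])
```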

New York City passed a bill requiring ‘bias audits’ of AI hiring tech


Kate Kaye at Protocol: “Let the AI auditing vendor brigade begin. A year after it was introduced, New York City Council passed a bill earlier this week requiring companies that sell AI technologies for hiring to obtain audits assessing the potential of those products to discriminate against job candidates. The bill requiring “bias audits” passed with overwhelming support in a 38-4 vote.

The bill is intended to weed out the use of tools that enable already unlawful employment discrimination in New York City. If signed into law, it will require providers of automated employment decision tools to have those systems evaluated each year by an audit service and provide the results to companies using those systems.

AI for recruitment can include software that uses machine learning to sift through resumes and help make hiring decisions, systems that attempt to decipher the sentiments of a job candidate, or even tech involving games to pick up on subtle clues about someone’s hiring worthiness. The NYC bill attempts to encompass the full gamut of AI by covering everything from old-school decision trees to more complex systems operating through neural networks.

The legislation calls on companies using automated decision tools for recruitment not only to tell job candidates when they’re being used, but to tell them what information the technology used to evaluate their suitability for a job.

The bill, however, fails to go into detail on what constitutes a bias audit other than to define one as “an impartial evaluation” that involves testing. And it already has critics who say it was rushed into passage and doesn’t address discrimination related to disability or age…(More)”.
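
Although the bill leaves “bias audit” undefined, one long-standing yardstick an auditor could reach for is the EEOC’s four-fifths rule: compare each group’s selection rate against the most-selected group’s. A minimal sketch on hypothetical screening outcomes:

```python
# Minimal sketch of one metric a "bias audit" might compute: selection-rate
# impact ratios per the EEOC four-fifths rule. All data here is hypothetical.
from collections import Counter

# (group, selected_by_tool) pairs from a hypothetical screening run.
outcomes = [("A", True), ("A", True), ("A", False),
            ("B", True), ("B", False), ("B", False)]

totals, selected = Counter(), Counter()
for group, was_selected in outcomes:
    totals[group] += 1
    selected[group] += was_selected  # bool adds as 0 or 1

rates = {g: selected[g] / totals[g] for g in totals}
best = max(rates.values())

for group, rate in rates.items():
    ratio = rate / best
    flag = "OK" if ratio >= 0.8 else "below four-fifths threshold"
    print(f"{group}: selection rate {rate:.2f}, impact ratio {ratio:.2f} ({flag})")
```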

The Census Mapper


Google blog: “…The U.S. Census is one of the largest data sets journalists can access. It has layers and layers of important data that can help reporters tell detailed stories about their own communities. But the challenge is sorting through that data and visualizing it in a way that helps readers understand trends and the bigger picture.

Today we’re launching a new tool to help reporters dig through all that data to find stories and embed visualizations on their sites. The Census Mapper project is an embeddable map that displays Census data at the national, state and county level, as well as census tracts. It was produced in partnership with Pitch Interactive and Big Local News, as part of the 2020 Census Co-op (supported by the Google News Initiative and in cooperation with the JSK Journalism Fellowships).


Census Mapper shows where populations have grown over time.

The Census data is pulled from the data collected and processed by The Associated Press, one of the Census Co-op partners. Census Mapper then lets local journalists easily embed maps showing population change at any level, helping them tell powerful stories in a more visual way about their communities.
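
Census Mapper itself draws on the AP-processed files, but a similar county-level population-change calculation can be run directly against the Census Bureau’s public API. The sketch below compares 2010 and 2020 decennial counts for North Carolina; the endpoint and variable names are taken from the public redistricting (PL 94-171) datasets and should be double-checked against the API documentation.

```python
# Sketch: county population change 2010 -> 2020 from the Census Bureau's
# public API (PL 94-171 redistricting data). Variable names should be
# verified against the API docs; no key is needed for light use.
import requests

def county_populations(year: str, variable: str) -> dict:
    """Return {county name: population} for North Carolina (state FIPS 37)."""
    rows = requests.get(
        f"https://api.census.gov/data/{year}/dec/pl",
        params={"get": f"NAME,{variable}", "for": "county:*", "in": "state:37"},
        timeout=30,
    ).json()
    _header, *data = rows  # first row is the column header
    return {row[0]: int(row[1]) for row in data}

pop_2010 = county_populations("2010", "P001001")  # 2010 total population
pop_2020 = county_populations("2020", "P1_001N")  # 2020 total population

for county in sorted(pop_2010):
    if county in pop_2020:
        change = (pop_2020[county] - pop_2010[county]) / pop_2010[county]
        print(f"{county}: {change:+.1%}")
```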


With the tool, you can zoom into states and below, such as North Carolina, shown here.

As part of our investment in data journalism we’re also making improvements to our Common Knowledge Project, a data explorer and visual journalism project to allow US journalists to explore local data. Built with journalists for journalists, the new version of Common Knowledge integrates journalist feedback and new features including geographic comparisons, new charts and visuals…(More)”.

Why Are We Failing at AI Ethics?


Article by Anja Kaspersen and Wendell Wallach: “…Extremely troubling is the fact that the people who are most vulnerable to negative impacts from such rapid expansions of AI systems are often the least likely to be able to join the conversation about these systems, either because they have no or restricted digital access, or because their lack of digital literacy makes them ripe for exploitation.

Such vulnerable groups are often theoretically included in discussions, but not empowered to take a meaningful part in making decisions. This engineered inequity, alongside human biases, risks amplifying otherness through neglect, exclusion, misinformation, and disinformation.

Society should be deeply concerned that nowhere near enough substantive progress is being made to develop and scale actionable legal and ethical oversight while simultaneously addressing existing inequalities.

So, why hasn’t more been done? There are three main issues at play: 

First, many of the existing dialogues around the ethics of AI and governance are too narrow and fail to understand the subtleties and life cycles of AI systems and their impacts.

Often, these efforts focus only on the development and deployment stages of the technology life cycle, when many of the problems occur during the earlier stages of conceptualization, research, and design. Or they fail to comprehend when, and if, an AI system operates at the level of maturity required to avoid failure in complex adaptive systems.

Or they focus on some aspects of ethics, while ignoring other aspects that are more fundamental and challenging. This is the problem known as “ethics washing” – creating a superficially reassuring but illusory sense that ethical issues are being adequately addressed, to justify pressing forward with systems that end up deepening current patterns.

Let’s be clear: every choice entails tradeoffs. “Ethics talk” is often about underscoring the various tradeoffs entailed in differing courses of action. Once a course has been selected, comprehensive ethical oversight is also about addressing the considerations not accommodated by the options selected, which is essential to any future verification effort. This vital part of the process is often a stumbling block for those trying to address the ethics of AI.

The second major issue is that to date all the talk about ethics is simply that: talk. 

We’ve yet to see these discussions translate into meaningful change in managing the ways in which AI systems are being embedded into various aspects of our lives….

A third issue at play is that discussions on AI and ethics are still largely confined to the ivory tower.

There is an urgent need for more informed public discourse and serious investment in civic education around the societal impact of the bio-digital revolution. This could help address the first two problems, but most of what the general public currently perceives about AI comes from sci-fi tropes and blockbuster movies.

A few examples of algorithmic bias have penetrated the public discourse. But the most headline-grabbing research on AI and ethics tends to focus on far-horizon existential risks. More effort needs to be invested in communicating to the public that, beyond the hypothetical risks of future AI, there are real and imminent risks posed by why and how we embed AI systems that currently shape everyone’s daily lives….(More)”.