AI By the People, For the People


Article by Billy Perrigo/Karnataka: “…To create an effective English-speaking AI, it is enough to simply collect data from where it has already accumulated. But for languages like Kannada, you need to go out and find more.

This has created huge demand for datasets—collections of text or voice data—in languages spoken by some of the poorest people in the world. Part of that demand comes from tech companies seeking to build out their AI tools. Another big chunk comes from academia and governments, especially in India, where English and Hindi have long held outsize precedence in a nation of some 1.4 billion people with 22 official languages and at least 780 more indigenous ones. This rising demand means that hundreds of millions of Indians are suddenly in control of a scarce and newly valuable asset: their mother tongue.

Data work—creating or refining the raw material at the heart of AI—is not new in India. The economy that did so much to turn call centers and garment factories into engines of productivity at the end of the 20th century has quietly been doing the same with data work in the 21st. And, like its predecessors, the industry is once again dominated by labor arbitrage companies, which pay wages close to the legal minimum even as they sell data to foreign clients for a hefty mark-up. The AI data sector, worth over $2 billion globally in 2022, is projected to rise in value to $17 billion by 2030. Little of that money has flowed down to data workers in India, Kenya, and the Philippines.

These conditions may cause harms far beyond the lives of individual workers. “We’re talking about systems that are impacting our whole society, and workers who make those systems more reliable and less biased,” says Jonas Valente, an expert in digital work platforms at Oxford University’s Internet Institute. “If you have workers with basic rights who are more empowered, I believe that the outcome—the technological system—will have a better quality as well.”

In the neighboring villages of Alahalli and Chilukavadi, one Indian startup is testing a new model. Chandrika works for Karya, a nonprofit launched in 2021 in Bengaluru (formerly Bangalore) that bills itself as “the world’s first ethical data company.” Like its competitors, it sells data to big tech companies and other clients at the market rate. But instead of keeping much of that cash as profit, it covers its costs and funnels the rest toward the rural poor in India. (Karya partners with local NGOs to ensure that its jobs go first to the poorest of the poor, as well as to historically marginalized communities.) In addition to its $5 hourly minimum, Karya gives workers de facto ownership of the data they create on the job, so whenever it is resold, the workers receive the proceeds on top of their past wages. It’s a model that doesn’t exist anywhere else in the industry…(More)”.

Public Policy and Technological Transformations in Africa


Book edited by Gedion Onyango: “This book examines the links between public policy and Fourth Industrial Revolution (4IR) technological developments in Africa. It broadly assesses three key areas – policy entrepreneurship, policy tools and citizen participation – in order to better understand the interfaces between public policy and technological transformations in African countries. The book presents incisive case studies on topics including AI policies, mobile money, e-budgeting, digital economy, digital agriculture and digital ethical dilemmas in order to illuminate technological proliferation in African policy systems. Its analysis considers the broader contexts of African state politics and governance. It will appeal to students, instructors, researchers and practitioners interested in governance and digital transformations in developing countries…(More)”.

Creating public sector value through the use of open data


Summary paper prepared as part of data.europa.eu: “This summary paper provides an overview of the different stakeholder activities undertaken, ranging from surveys to a focus group, and presents the key insights from this campaign regarding data reuse practices, barriers to data reuse in the public sector and suggestions to overcome these barriers. The following recommendations are made to help data.europa.eu support public administrations to boost open data value creation.

  • When it comes to raising awareness and communication, any action should also contain examples of data reuse by the public sector. Gathering and communicating such examples and use cases greatly helps in understanding the importance of the role of the public sector as a data reuser.
  • When it comes to policy and regulation, it would be beneficial to align the ‘better regulation’ activities and roadmaps of the European Commission with the open data publication activities, in order to better explore the internal data needs. Furthermore, it would be helpful to facilitate a similar alignment and data needs analysis for all European public administrations. For example, this could be done by providing examples, best practices and methodologies on how to map data needs for policy and regulatory purposes.
  • Existing monitoring activities, such as surveys, should be revised to ensure that data reuse by the public sector is included. It would be useful to create a panel of users, based on the existing wide community, that could be used for further surveys.
  • The role of data stewards remains central to favouring reuse. Therefore, examples, best practices and methodologies on the role of data stewards should be included in the support activities – not specifically for public sector reusers, but in general…(More)”.

Why This AI Moment May Be the Real Deal


Essay by Ari Schulman: “For many years, those in the know in the tech world have known that “artificial intelligence” is a scam. It’s been true for so long in Silicon Valley that it was true before there even was a Silicon Valley.

That’s not to say that AI hadn’t done impressive things, solved real problems, generated real wealth and worthy endowed professorships. But peek under the hood of Tesla’s “Autopilot” mode and you would find odd glitches, frustrated promise, and, well, still quite a lot of people hidden away in backrooms manually plugging gaps in the system, often in real time. Study Deep Blue’s 1997 defeat of world chess champion Garry Kasparov, and your excitement about how quickly this technology would take over other cognitive work would wane as you learned just how much brute human force went into fine-tuning the software specifically to beat Kasparov. Read press release after press release from Facebook, Twitter, and YouTube promising to use more machine learning to fight hate speech and save democracy — and then find out that the new thing was mostly a handmaid to armies of human grunts, and for many years relied on a technological paradigm that was decades old.

Call it AI’s man-behind-the-curtain effect: What appear at first to be dazzling new achievements in artificial intelligence routinely lose their luster and seem limited, one-off, jerry-rigged, with nothing all that impressive happening behind the scenes aside from sweat and tears, certainly nothing that deserves the name “intelligence” even by loose analogy.

So what’s different now? What follows in this essay is an attempt to contrast some of the most notable features of the new transformer paradigm (the T in ChatGPT) with what came before. It is an attempt to articulate why the new AIs that have garnered so much attention over the past year seem to defy some of the major lines of skepticism that have rightly applied to past eras — why this AI moment might, just might, be the real deal…(More)”.

Wikipedia’s Moment of Truth


Article by Jon Gertner at the New York Times: “In early 2021, a Wikipedia editor peered into the future and saw what looked like a funnel cloud on the horizon: the rise of GPT-3, a precursor to the new chatbots from OpenAI. When this editor — a prolific Wikipedian who goes by the handle Barkeep49 on the site — gave the new technology a try, he could see that it was untrustworthy. The bot would readily mix fictional elements (a false name, a false academic citation) into otherwise factual and coherent answers. But he had no doubts about its potential. “I think A.I.’s day of writing a high-quality encyclopedia is coming sooner rather than later,” he wrote in “Death of Wikipedia,” an essay that he posted under his handle on Wikipedia itself. He speculated that a computerized model could, in time, displace his beloved website and its human editors, just as Wikipedia had supplanted the Encyclopaedia Britannica, which in 2012 announced it was discontinuing its print publication.

Recently, when I asked this editor — he asked me to withhold his name because Wikipedia editors can be the targets of abuse — if he still worried about his encyclopedia’s fate, he told me that the newer versions made him more convinced that ChatGPT was a threat. “It wouldn’t surprise me if things are fine for the next three years,” he said of Wikipedia, “and then, all of a sudden, in Year 4 or 5, things drop off a cliff.”…(More)”.

Inclusive Cyber Policy Making


Toolkit by Global Digital Partnership: “Marginalised perspectives, particularly from women and LGBTQ+ communities, are largely absent in current cyber norm discussions. This is a serious issue, as marginalised groups often face elevated and specific threats in cyberspace.

Our bespoke toolkit provides policymakers and other stakeholders with a range of resources to address this lack of inclusion, including:

  • A how-to guide on designing an inclusive process to develop a cybernorm or implement existing agreed norms
  • An introduction to key terms and concepts relevant to inclusivity and cybernorms
  • Key questions for facilitating inclusive stakeholder mapping processes
  • A mapping of regional and global cybernorm processes…(More)”.

Cross-Border Data Policy Index


Report by the Global Data Alliance: “The ability to responsibly transfer data around the globe supports cross-border economic opportunity, cross-border technological and scientific progress, and cross-border digital transformation and inclusion, among other public policy objectives. To assess where policies have helped create an enabling environment for cross-border data and its associated benefits, the Global Data Alliance has developed the Cross-Border Data Policy Index.

The Cross-Border Data Policy Index offers a quantitative and qualitative assessment of the relative openness or restrictiveness of cross-border data policies across nearly 100 economies. Global economies are classified into four levels. At Level 1 are economies that impose relatively fewer limits on the cross-border access to knowledge, information, digital tools, and economic opportunity for their citizens and legal persons. Economies’ restrictiveness scores increase as they are found to impose greater limits on cross-border data, thereby eroding opportunities for digital transformation while also impeding other policy objectives relating to health, safety, security, and the environment…(More)”.

Patients are Pooling Data to Make Diabetes Research More Representative


Blog by Tracy Kariuki: “Saira Khan-Gallo knows how overwhelming managing and living healthily with diabetes can be. As a person living with type 1 diabetes for over two decades, she understands how tracking glucose levels, blood pressure, blood cholesterol, insulin intake, and, and, and…could all feel like drowning in an infinite pool of numbers.

But that doesn’t need to be the case. This is why Tidepool, a non-profit tech organization composed of caregivers and other people living with diabetes such as Khan-Gallo, is transforming diabetes data management. Its data visualization platform enables users to make sense of the data and derive insights into their health status….

Through its Big Data Donation Project, Tidepool has been supporting the advancement of diabetes research by sharing anonymized data from people living with diabetes with researchers.

To date, more than 40,000 individuals have chosen to donate data uploaded from their diabetes devices like blood glucose meters, insulin pumps and continuous glucose monitors, which is then shared by Tidepool with students, academics, researchers, and industry partners, making the database larger than many clinical trials. For instance, Oregon Health and Science University has used datasets collected from Tidepool to build an algorithm that predicts hypoglycemia, which is low blood sugar, with the goal of advancing closed-loop therapy for diabetes management…(More)”.

What prevents us from reusing medical real-world data in research


Paper by Julia Gehrmann, Edit Herczog, Stefan Decker & Oya Beyan: “Recent studies show that Medical Data Science (MDS) carries great potential to improve healthcare. In particular, considering data from several medical areas and of different types, i.e. using multimodal data, significantly increases the quality of the research results. On the other hand, the inclusion of more features in an MDS analysis means that more medical cases are required to represent the full range of possible feature combinations in a quantity sufficient for a meaningful analysis. Historically, medical research has relied on prospective data collection, e.g. in clinical studies. However, prospectively collecting the amount of data needed for advanced multimodal data analyses is not feasible for two reasons. First, such a data collection process would cost an enormous amount of money. Second, it would take decades to generate enough data for longitudinal analyses, while the results are needed now. A worthwhile alternative is using real-world data (RWD) from the clinical systems of, e.g., university hospitals. This data is immediately accessible in large quantities, providing full flexibility in the choice of research questions to analyze. However, compared to prospectively curated data, medical RWD usually lacks quality due to the specificities of medical RWD outlined in section 2. The reduced quality makes its preparation for analysis more challenging…(More)”.

Data-driven research and healthcare: public trust, data governance and the NHS


Paper by Angeliki Kerasidou & Charalampia (Xaroula) Kerasidou: “It is widely acknowledged that trust plays an important role in the acceptability of data sharing practices in research and healthcare, and in the adoption of new health technologies such as AI. Yet there is reported distrust in this domain. Although in the UK the NHS is one of the most trusted public institutions, public trust does not appear to extend to the data sharing practices for research and innovation, specifically with the private sector, that it has introduced in recent years. In this paper, we examine the question: what is it about sharing NHS data for research and innovation with for-profit companies that challenges public trust? To address this question, we draw on political theory to provide an account of public trust that helps us better understand the relationship between the public and the NHS within a democratic context, as well as the kind of obligations and expectations that govern this relationship. We then examine whether the way in which the NHS manages patient data and collaborates with the private sector fits within this trust-based relationship. We argue that the datafication of healthcare and the broader ‘health and wealth’ agenda adopted by successive UK governments represent a major shift in the institutional character of the NHS, which calls into question the meaning of the public good the NHS is expected to provide, challenging public trust. We conclude by suggesting that to address the problem of public trust, a theoretical and empirical examination of the benefits, but also the costs, associated with this shift needs to take place, as well as an open public conversation to determine what values should be promoted by a public institution like the NHS…(More)”.