Data Maturity Assessment for Government


UK Government: “The Data Maturity Assessment (DMA) for Government is a robust and comprehensive framework, designed by the public sector for the public sector. The DMA represents a big step forward in our shared ambition to establish and strengthen the data foundations in government by enabling a granular view of the current status of our data environments.

The systematic and detailed picture that the DMA results provide can be used to deliver value in the data function and across the enterprise. Maturity results, and the progression behaviours/features outlined in the DMA, will be essential to reviewing and setting data strategy. DMA outputs provide a way to communicate and evidence how the data ecosystem is critical to the business. When considered in the context of organisational priorities and responsibilities, DMA outputs can assist in:

  • identifying and mitigating strategic risk arising from low data maturity, and identifying where higher maturity needs to be maintained
  • targeting and prioritising investment in the most important data initiatives
  • assuring the data environment for new services and programmes…(More)”.

Whose data commons? Whose city?


Blog by Gijs van Maanen and Anna Artyushina: “In 2020, the notion of data commons became a staple of the new European Data Governance Strategy, which envisions data cooperatives as key players of the European Union’s (EU) emerging digital market. In this new legal landscape, public institutions, businesses, and citizens are expected to share their data with the licensed data-governance entities that will oversee its responsible reuse. In 2022, the Open Future Foundation released several white papers where the NGO (non-governmental organisation) detailed a vision for the publicly governed and funded EU-level data commons. Some academic researchers see data commons as a way to break the data silos maintained and exploited by Big Tech and, potentially, dismantle surveillance capitalism.

In this blog post, we discuss data commons as a concept and practice. Our argument here is that, for data commons to become a (partial) solution to the issues caused by data monopolies, they need to be politicised. As smart city scholar Shannon Mattern pointedly argues, the city is not a computer. This means that digitisation and datafication of our cities involve making choices about what is worth digitising and whose interests are prioritised. These choices and their implications must be foregrounded when we discuss data commons or any emerging forms of data governance. It is important to ask whose data is made common and, subsequently, whose city we will end up living in…(More)”.

You Can’t Regulate What You Don’t Understand


Article by Tim O’Reilly: “The world changed on November 30, 2022, as surely as it did on August 12, 1908, when the first Model T left the Ford assembly line. That was the date when OpenAI released ChatGPT, the day that AI emerged from research labs into an unsuspecting world. Within two months, ChatGPT had over a hundred million users—faster adoption than any technology in history.

The hand-wringing soon began…

All of these efforts reflect the general consensus that regulations should address issues like data privacy and ownership, bias and fairness, transparency, accountability, and standards. OpenAI’s own AI safety and responsibility guidelines cite those same goals, but in addition call out what many people consider the central, most general question: how do we align AI-based decisions with human values? They write:

“AI systems are becoming a part of everyday life. The key is to ensure that these machines are aligned with human intentions and values.”

But whose human values? Those of the benevolent idealists that most AI critics aspire to be? Those of a public company bound to put shareholder value ahead of customers, suppliers, and society as a whole? Those of criminals or rogue states bent on causing harm to others? Those of someone well-meaning who, like Aladdin, expresses an ill-considered wish to an all-powerful AI genie?

There is no simple way to solve the alignment problem. But alignment will be impossible without robust institutions for disclosure and auditing. If we want prosocial outcomes, we need to design and report on the metrics that explicitly aim for those outcomes and measure the extent to which they have been achieved. That is a crucial first step, and we should take it immediately. These systems are still very much under human control. For now, at least, they do what they are told, and when the results don’t match expectations, their training is quickly improved. What we need to know is what they are being told.

What should be disclosed? There is an important lesson for both companies and regulators in the rules by which corporations—which science-fiction writer Charlie Stross has memorably called “slow AIs”—are regulated. One way we hold companies accountable is by requiring them to share their financial results compliant with Generally Accepted Accounting Principles or the International Financial Reporting Standards. If every company had a different way of reporting its finances, it would be impossible to regulate them…(More)”
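
To make the GAAP analogy concrete, here is one way a standardized, machine-readable AI disclosure might be expressed. This is a minimal sketch: the schema and field names below are hypothetical illustrations, not an existing reporting standard or anything OpenAI publishes.

```python
# Hypothetical, GAAP-style schema for AI metric disclosure. The fields are
# illustrative stand-ins for the kinds of information O'Reilly argues
# should be reported, not an existing standard.
from dataclasses import dataclass


@dataclass
class ModelDisclosure:
    model_name: str
    training_data_sources: list[str]     # what the model was trained on
    operator_instructions: list[str]     # "what they are being told"
    reported_metrics: dict[str, float]   # prosocial targets and measured values
    auditor: str                         # who verified the figures
    audit_date: str


# Example record with placeholder values:
report = ModelDisclosure(
    model_name="example-model-v1",
    training_data_sources=["licensed corpus A", "public web crawl B"],
    operator_instructions=["refuse unsafe requests", "cite sources"],
    reported_metrics={"harmful_output_rate": 0.002, "factuality": 0.91},
    auditor="Independent Auditor LLC",
    audit_date="2023-06-01",
)
```

Just as shared accounting principles make balance sheets comparable across firms, a shared schema of this kind would make metric reports comparable across AI vendors, which is the point of the analogy.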

Future-proofing the city: A human rights-based approach to governing algorithmic, biometric and smart city technologies


Introduction to Special Issue by Alina Wernick and Anna Artyushina: “While the GDPR and other EU laws seek to mitigate a range of potential harms associated with smart cities, compliance with and enforceability of these regulations remain open issues. In addition, these proposed regulations do not sufficiently address the collective harms associated with the deployment of biometric technologies and artificial intelligence. Another relevant question is whether the initiatives put forward to secure fundamental human rights in the digital realm account for the issues brought on by the deployment of technologies in city spaces. In this special issue, we employ the smart city notion as a point of connection for interdisciplinary research on the human rights implications of algorithmic, biometric and smart city technologies and the policy responses to them. The articles included in the special issue analyse the latest European regulations as well as soft law, and the policy frameworks that are currently at work in the regions where the GDPR does not apply…(More)”.

Making AI Less “Thirsty”: Uncovering and Addressing the Secret Water Footprint of AI Models


Paper by Shaolei Ren, Pengfei Li, Jianyi Yang, and Mohammad A. Islam: “The growing carbon footprint of artificial intelligence (AI) models, especially large ones such as GPT-3 and GPT-4, has come under public scrutiny. Unfortunately, the equally important and enormous water footprint of AI models has remained under the radar. For example, training GPT-3 in Microsoft’s state-of-the-art U.S. data centers can directly consume 700,000 liters of clean freshwater (enough to produce 370 BMW cars or 320 Tesla electric vehicles), and the water consumption would have tripled had the training been done in Microsoft’s Asian data centers, yet such information has been kept a secret. This is extremely concerning, as freshwater scarcity has become one of the most pressing challenges we all share in the wake of a rapidly growing population, depleting water resources, and aging water infrastructures. To respond to the global water challenges, AI models can, and should, take social responsibility and lead by example by addressing their own water footprint. In this paper, we provide a principled methodology to estimate the fine-grained water footprint of AI models, and also discuss the unique spatial-temporal diversities of AI models’ runtime water efficiency. Finally, we highlight the necessity of holistically addressing water footprint along with carbon footprint to enable truly sustainable AI…(More)”.
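
To see how estimates of this kind are built, here is a back-of-the-envelope sketch, assuming the common decomposition into on-site cooling water (water usage effectiveness, WUE) and off-site water consumed generating the electricity (electricity water intensity, EWIF). The parameter values in the example are placeholders, not figures from the paper.

```python
# First-order sketch of a training run's operational water footprint,
# assuming the standard decomposition: on-site cooling water scales with
# server energy via WUE (L/kWh), and off-site water scales with total
# facility energy (server energy times PUE) via EWIF (L/kWh). The values
# in the example call are illustrative placeholders, not the paper's.

def training_water_liters(server_energy_kwh: float, pue: float,
                          wue_onsite: float, ewif_offsite: float) -> float:
    onsite = server_energy_kwh * wue_onsite            # cooling towers etc.
    offsite = server_energy_kwh * pue * ewif_offsite   # power generation
    return onsite + offsite

# Placeholder example: 1 GWh of server energy, PUE 1.2,
# WUE 0.55 L/kWh, EWIF 0.4 L/kWh.
print(f"{training_water_liters(1_000_000, 1.2, 0.55, 0.4):,.0f} liters")
```

The same arithmetic explains the paper’s geographic point: because WUE and EWIF vary widely by region and season, the identical training run can consume several times more water in one data center than another.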

Slow-governance in smart cities: An empirical study of smart intersection implementation in four US college towns


Paper by Madelyn Rose Sanfilippo and Brett Frischmann: “Cities cannot adopt supposedly smart technological systems and protect human rights without developing appropriate data governance, because technologies are not value-neutral. This paper proposes a deliberative, slow-governance approach to smart tech in cities. Inspired by the Governing Knowledge Commons (GKC) framework and past case studies, we empirically analyse the adoption of smart intersection technologies in four US college towns to evaluate and extend knowledge commons governance approaches to address human rights concerns. Our proposal consists of a set of questions that should guide community decision-making, extending the GKC framework via an incorporation of human-rights impact assessments and a consideration of capabilities approaches to human rights. We argue that such a deliberative, slow-governance approach enables adaptation to local norms and more appropriate community governance of smart tech in cities. By asking and answering key questions throughout smart city planning, procurement, implementation and management processes, cities can respect human rights, interests and expectations…(More)”.

Recalibrating assumptions on AI


Essay by Arthur Holland Michel: “Many assumptions about artificial intelligence (AI) have become entrenched despite the lack of evidence to support them. Basing policies on these assumptions is likely to increase the risk of negative impacts for certain demographic groups. These dominant assumptions include claims that AI is ‘intelligent’ and ‘ethical’, that more data means better AI, and that AI development is a ‘race’.

The risks of this approach to AI policymaking are often ignored, while the potential positive impacts of AI tend to be overblown. By illustrating how a more evidence-based, inclusive discourse can improve policy outcomes, this paper makes the case for recalibrating the conversation around AI policymaking…(More)”

Institutional review boards need new skills to review data sharing and management plans


Article by Vasiliki Rahimzadeh, Kimberley Serpico & Luke Gelinas: “New federal rules require researchers to submit plans for how to manage and share their scientific data, but institutional ethics boards may be underprepared to review them.

Data sharing is widely considered a conduit to scientific progress, the benefits of which should return to the individuals and communities who invested in that science. This is the central premise underpinning changes recently announced by the US Office of Science and Technology Policy (OSTP) on sharing and managing data generated from federally funded research. Researchers will now be required to make publicly accessible any scholarly publications stemming from their federally funded research, as well as supporting data, according to the OSTP announcement. However, the attendant risks to individuals’ privacy-related interests and the increasing threat of community-based harms remain barriers to fostering a trustworthy ecosystem of biomedical data science.

Institutional review boards (IRBs) are responsible for ensuring protections for all human participants engaged in research, but they rarely include members with specialized expertise needed to effectively minimize data privacy and security risks. IRBs must be prepared to meet these review demands given the new data sharing policy changes. They will need additional resources to conduct quality and effective reviews of data management and sharing (DMS) plans. Practical ways forward include expanding IRB membership, proactively consulting with researchers, and creating new research compliance resources. This Comment will focus on data management and sharing oversight by IRBs in the US, but the globalization of data science research underscores the need for enhancing similar review capacities in data privacy, management and security worldwide…(More)”.

How public money is shaping the future of AI


Report by Ethica: “The European Union aims to become the “home of trustworthy Artificial Intelligence” and has committed the largest existing public funding for AI over the next decade. However, the lack of accessible data and comprehensive reporting on the Framework Programmes’ results and impact hinders the EU’s capacity to achieve its objectives and undermines the credibility of its commitments. 

This research, commissioned by the European AI & Society Fund, recommends publicly accessible data, effective evaluation of the real-world impact of funding, and mechanisms for civil society participation in funding before further public funds are invested, if the EU is to achieve its goal of being the epicenter of trustworthy AI.

Among its findings, the research highlights shortcomings in the European Union’s investment in artificial intelligence (AI). The EU invested €10bn in AI via its Framework Programmes between 2014 and 2020, representing 13.4% of all available funding. However, the investment process is top-down, with little input from researchers or feedback from previous grantees or civil society organizations. Furthermore, despite the EU’s aim to fund market-focused innovation, research institutions and higher and secondary education establishments received 73% of the total funding between 2007 and 2020. Germany, France, and the UK were the largest recipients, receiving 37.4% of the total EU budget.

The report also explores the lack of commitment to ethical AI, with only 30.3% of funding calls related to AI mentioning trustworthiness, privacy, or ethics. Additionally, civil society organizations are not involved in the design of funding programs, and there is no evaluation of the economic or societal impact of the funded work. The report calls for political priorities to align with funding outcomes in specific, measurable ways, citing transport as the most funded sector in AI despite not being an EU strategic focus, while programs to promote SME and societal participation in scientific innovation have been dropped…(More)”.

No Ground Truth? No Problem: Improving Administrative Data Linking Using Active Learning and a Little Bit of Guile


Paper by Sarah Tahamont et al.: “While linking records across large administrative datasets [“big data”] has the potential to revolutionize empirical social science research, many administrative data files do not have common identifiers and are thus not designed to be linked to others. To address this problem, researchers have developed probabilistic record linkage algorithms, which use statistical patterns in identifying characteristics to perform linking tasks. Naturally, the accuracy of a candidate linking algorithm can be substantially improved when the algorithm has access to “ground-truth” examples — matches that can be validated using institutional knowledge or auxiliary data. Unfortunately, the cost of obtaining these examples is typically high, often requiring a researcher to manually review pairs of records in order to make an informed judgement about whether they are a match. When a pool of ground-truth information is unavailable, researchers can use “active learning” algorithms for linking, which ask the user to provide ground-truth information for selected candidate pairs. In this paper, we investigate the value of providing ground-truth examples via active learning for linking performance. We confirm the popular intuition that data linking can be dramatically improved with the availability of ground-truth examples. But critically, in many real-world applications, only a relatively small number of tactically selected ground-truth examples are needed to obtain most of the achievable gains. With a modest investment in ground truth, researchers can approximate the performance of a supervised learning algorithm that has access to a large database of ground-truth examples using a readily available off-the-shelf tool…(More)”.
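
To illustrate the active-learning idea, here is a generic uncertainty-sampling sketch, not the authors’ implementation: the exact-match comparison features and the ask_human review step are hypothetical stand-ins for a real linkage workflow.

```python
# Generic sketch of active learning for record linkage: fit a match
# classifier, then repeatedly ask a human to label only the candidate
# pair the model is least certain about (uncertainty sampling).
import numpy as np
from sklearn.linear_model import LogisticRegression

FIELDS = ("last", "first", "dob")


def similarity_features(a: dict, b: dict) -> list[float]:
    # Toy comparison vector: per-field exact-match indicators. Real linkers
    # use graded string distances (Jaro-Winkler, edit distance, phonetics).
    return [float(a[f] == b[f]) for f in FIELDS]


def ask_human(pair) -> int:
    # Stand-in for manual review: a person inspects the two records side
    # by side and returns 1 (match) or 0 (non-match).
    a, b = pair
    return int(input(f"Match? {a} vs {b} [1/0]: "))


def active_linkage(pairs, n_queries=10, seed=0):
    """Train a match classifier on candidate pairs, querying a human only
    for the pairs the current model finds most ambiguous."""
    X = np.array([similarity_features(a, b) for a, b in pairs])
    rng = np.random.default_rng(seed)
    labeled, labels = [], []
    # Seed with random labels until both classes are present.
    while len(set(labels)) < 2 and len(labeled) < len(pairs):
        i = int(rng.integers(len(pairs)))
        if i not in labeled:
            labeled.append(i)
            labels.append(ask_human(pairs[i]))
    clf = LogisticRegression()
    for _ in range(n_queries):
        if len(labeled) == len(pairs):
            break
        clf.fit(X[labeled], labels)
        p = clf.predict_proba(X)[:, 1]
        uncertainty = np.abs(p - 0.5)     # 0 = maximally ambiguous
        uncertainty[labeled] = np.inf     # never re-query a labeled pair
        i = int(np.argmin(uncertainty))
        labeled.append(i)
        labels.append(ask_human(pairs[i]))
    clf.fit(X[labeled], labels)
    return clf                            # final match classifier
```

Because each query targets the pair the classifier is least sure about, a handful of tactically chosen labels carries most of the information, which is consistent with the paper’s finding that a modest investment in ground truth can approximate fully supervised performance.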