Using Data for Good: Identifying Who Could Benefit from Simplified Tax Filing


Blog by New America: “For years, New America Chicago has been working with state agencies, national and local advocates and thought leaders, as well as community members on getting beneficial tax credits, like the Earned Income Tax Credit (EITC) and Child Tax Credit (CTC), into the hands of those who need them most. Illinois paved the way recently with its innovative simplified filing initiative which helps residents easily claim their state Earned Income Credit (EIC) by confirming their refund with a prepopulated return.

This past year we had discussions with Illinois policymakers and state agencies, like the Illinois Department of Revenue (IDoR) and the Illinois Department of Human Services (IDHS), to envision new ways of expanding the simplified filing initiative. It is currently designed to reach those who have filed a federal tax return and claimed their EITC, leaving out non-filer households who typically do not file taxes because they earn less than the federal income filing threshold or face other barriers.

In Illinois, over 600,000 households are enrolled in SNAP, and over 1 million households are enrolled in Medicaid. Every year thousands of families spend countless hours applying for these and other social safety net programs using IDHS’ Application for Benefits Eligibility (ABE). Unfortunately, many of these households are most in need of the federal EITC and the recently expanded state EIC but will never receive it. We posed the question: What if Illinois could save families time and money by using that already-provided income and household information to streamline access to the state EIC for low-income families that don’t normally file taxes?

Our friends at Inclusive Economy Lab (IEL) conducted analysis using Census microdata to estimate the number of Illinois households who are enrolled in Medicaid and SNAP but do not file their federal or state tax forms…(More)”.

Open Data Commons Licences (ODCL): Licensing personal and non-personal data supporting the commons and privacy


Paper by Yaniv Benhamou and Melanie Dulong de Rosnay: “Data are often subject to a multitude of rights (e.g. original works or personal data posted on social media, or collected through captcha, subject to copyright, database and data protection) and voluntarily shared through non-standardized, non-interoperable contractual terms. This leads to fragmented legal regimes and has become an even greater challenge in the AI era, for example when online platforms set their own Terms of Service (ToS) in business-to-consumer (B2C) relationships.

This article proposes standard terms that may apply to all kinds of data (including personal and mixed datasets subject to different legal regimes), based on the open data philosophy initially developed for Free and Open Source software and Creative Commons licenses for artistic and other copyrighted works. In the first part, we analyse how to extend open standard terms to all kinds of data (II). In the second part, we suggest combining these open standard terms with collective governance instruments, in particular data trusts, inspired by commons-based projects and by the century-old collective management of copyright (III). In the last part, after a few concluding remarks (IV), we propose a template “Open Data Commons Licenses” (ODCL) combining compulsory and optional elements to be selected by licensors, illustrated by pictograms and icons inspired by the bricks of Creative Commons licences and legal design techniques (V).

This proposal addresses the bargaining-power imbalance and information asymmetry (by offering the licensor the ability to decide the terms) and conceptualises contract law differently. It reverses the current logic of contract: instead of letting companies (licensees) impose their own ToS on users (licensors, being the copyright owner, data subject, or data producer), licensors will reclaim the ability to set their own terms for access to and use of data by selecting standard terms. This should also allow the management of complex datasets, increase data sharing, and improve trust and control over the data. Like previous open licensing standards, the model is expected to lower transaction costs by reducing the need to develop and read new, complicated contractual terms. It can also spread the virality of open data to all data in the AI era: if any input data under such terms is used for AI training purposes, it propagates its conditions to all aggregated and output data. In other words, any data distributed under our ODCL template will turn all outcomes into more or less open data and foster a data commons ecosystem. Finally, instead of full openness, our model allows for restrictions outside of certain boundaries (e.g. authorized users and uses), in order to protect the commons and certain values. The model would need to be governed and monitored by a collective data trust…(More)”.

Considerations for Governing Open Foundation Models


Brief by Rishi Bommasani et al: “Foundation models (e.g., GPT-4, Llama 2) are at the epicenter of AI, driving technological innovation and billions in investment. This paradigm shift has sparked widespread demands for regulation. Animated by factors as diverse as declining transparency and unsafe labor practices, limited protections for copyright and creative work, as well as market concentration and productivity gains, many have called for policymakers to take action.

Central to the debate about how to regulate foundation models is the process by which foundation models are released. Some foundation models, like Google DeepMind’s Flamingo, are fully closed, meaning they are available only to the model developer; others, such as OpenAI’s GPT-4, are limited access, available to the public but only as a black box; and still others, such as Meta’s Llama 2, are more open, with widely available model weights enabling downstream modification and scrutiny. As of August 2023, the U.K.’s Competition and Markets Authority documents, based on data from Stanford’s Ecosystem Graphs, that the most common release approach for publicly disclosed models is open release. Developers like Meta, Stability AI, Hugging Face, Mistral, Together AI, and EleutherAI frequently release models openly.

Governments around the world are issuing policy related to foundation models. As part of these efforts, open foundation models have garnered significant attention: The recent U.S. Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence tasks the National Telecommunications and Information Administration with preparing a report on open foundation models for the president. In the EU, open foundation models trained with fewer than 10²⁵ floating point operations (a measure of the amount of compute expended) appear to be exempted under the recently negotiated AI Act. The U.K.’s AI Safety Institute will “consider open-source systems as well as those deployed with various forms of access controls” as part of its initial priorities. Beyond governments, the Partnership on AI has introduced guidelines for the safe deployment of foundation models, recommending against open release for the most capable foundation models.
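To give a rough sense of what a 10²⁵ FLOP threshold means in practice, here is a minimal sketch using the widely cited ~6 × parameters × training-tokens heuristic for dense-transformer training compute. This heuristic is an assumption for illustration only, not the AI Act's official measurement method, and the model sizes below are hypothetical.

```python
# Back-of-envelope comparison against a 1e25 FLOP compute threshold,
# using the common ~6 * N * D approximation (N = parameters,
# D = training tokens) for dense transformer training compute.

THRESHOLD_FLOPS = 1e25


def estimated_training_flops(n_params: float, n_tokens: float) -> float:
    """Rough training-compute estimate for a dense transformer."""
    return 6.0 * n_params * n_tokens


def exceeds_threshold(n_params: float, n_tokens: float) -> bool:
    """Would this (hypothetical) training run cross the threshold?"""
    return estimated_training_flops(n_params, n_tokens) >= THRESHOLD_FLOPS


# Hypothetical example: a 70B-parameter model trained on 2T tokens
# lands around 8.4e23 FLOPs, well under the 1e25 mark.
flops = estimated_training_flops(70e9, 2e12)
print(f"{flops:.2e} FLOPs, exceeds threshold: {exceeds_threshold(70e9, 2e12)}")
```

Under this approximation, only very large training runs (for example, a trillion-parameter model on trillions of tokens) would cross the line, which illustrates why the threshold is read as targeting frontier-scale models.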

Policy on foundation models should support the open foundation model ecosystem, while providing resources to monitor risks and create safeguards to address harms. Open foundation models provide significant benefits to society by promoting competition, accelerating innovation, and distributing power. For example, small businesses hoping to build generative AI applications could choose among a variety of open foundation models that offer different capabilities and are often less expensive than closed alternatives. Further, open models are marked by greater transparency and, thereby, accountability. When a model is released with its training data, independent third parties can better assess the model’s capabilities and risks…(More)”.

Privacy-Enhancing and Privacy-Preserving Technologies: Understanding the Role of PETs and PPTs in the Digital Age


Paper by the Centre for Information Policy Leadership: “…explores how organizations are approaching privacy-enhancing technologies (“PETs”) and how PETs can advance data protection principles, and provides examples of how specific types of PETs work. It also explores potential challenges to the use of PETs and possible solutions to those challenges.

CIPL emphasizes the enormous potential inherent in these technologies to mitigate privacy risks and support innovation, and recommends a number of steps to foster further development and adoption of PETs. In particular, CIPL calls for policymakers and regulators to incentivize the use of PETs through clearer guidance on key legal concepts that impact the use of PETs, and by adopting a pragmatic approach to the application of these concepts.

CIPL’s recommendations towards wider adoption are as follows:

  • Issue regulatory guidance and incentives regarding PETs: Official regulatory guidance addressing PETs in the context of specific legal obligations or concepts (such as anonymization) will incentivize greater investment in PETs.
  • Increase education and awareness about PETs: PET developers and providers need to show tangible evidence of the value of PETs and help policymakers, regulators and organizations understand how such technologies can facilitate responsible data use.
  • Develop industry standards for PETs: Industry standards would help facilitate interoperability for the use of PETs across jurisdictions and help codify best practices that support technical reliability and foster trust in these technologies.
  • Recognize PETs as a demonstrable element of accountability: PETs complement robust data privacy management programs and should be recognized as an element of organizational accountability…(More)”.
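To make the PET category concrete, here is a minimal sketch of one widely discussed technique, the Laplace mechanism for differential privacy, which releases aggregate statistics while masking any individual record. The dataset, query, and epsilon value are invented for illustration; a production system would use a vetted library rather than this hand-rolled sampler.

```python
import math
import random


def dp_count(values, predicate, epsilon: float) -> float:
    """Answer a counting query with epsilon-differential privacy.

    A count has sensitivity 1 (adding or removing one person changes
    it by at most 1), so the Laplace mechanism adds noise with scale
    1/epsilon. The noise is drawn via inverse transform sampling.
    """
    true_count = sum(1 for v in values if predicate(v))
    u = random.random() - 0.5  # uniform on [-0.5, 0.5)
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise


# Illustrative example: release how many incomes exceed 50,000 without
# exposing whether any particular person is above the line.
incomes = [23_000, 48_000, 52_000, 75_000, 31_000]
noisy = dp_count(incomes, lambda x: x > 50_000, epsilon=1.0)
print(round(noisy, 2))  # the true count is 2; the release is 2 plus noise
```

Smaller epsilon means more noise and stronger privacy; the trade-off between accuracy and protection is exactly the kind of design decision the regulatory guidance discussed above would need to address.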

How Moral Can A.I. Really Be?


Article by Paul Bloom: “…The problem isn’t just that people do terrible things. It’s that people do terrible things that they consider morally good. In their 2014 book “Virtuous Violence,” the anthropologist Alan Fiske and the psychologist Tage Rai argue that violence is often itself a warped expression of morality. “People are impelled to violence when they feel that to regulate certain social relationships, imposing suffering or death is necessary, natural, legitimate, desirable, condoned, admired, and ethically gratifying,” they write. Their examples include suicide bombings, honor killings, and war. The philosopher Kate Manne, in her book “Down Girl,” makes a similar point about misogynistic violence, arguing that it’s partially rooted in moralistic feelings about women’s “proper” role in society. Are we sure we want A.I.s to be guided by our idea of morality?

Schwitzgebel suspects that A.I. alignment is the wrong paradigm. “What we should want, probably, is not that superintelligent AI align with our mixed-up, messy, and sometimes crappy values but instead that superintelligent AI have ethically good values,” he writes. Perhaps an A.I. could help to teach us new values, rather than absorbing old ones. Stewart, the former graduate student, argued that if researchers treat L.L.M.s as minds and study them psychologically, future A.I. systems could help humans discover moral truths. He imagined some sort of A.I. God—a perfect combination of all the great moral minds, from Buddha to Jesus. A being that’s better than us.

Would humans ever live by values that are supposed to be superior to our own? Perhaps we’ll listen when a super-intelligent agent tells us that we’re wrong about the facts—“this plan will never work; this alternative has a better chance.” But who knows how we’ll respond if one tells us, “You think this plan is right, but it’s actually wrong.” How would you feel if your self-driving car tried to save animals by refusing to take you to a steakhouse? Would a government be happy with a military A.I. that refuses to wage wars it considers unjust? If an A.I. pushed us to prioritize the interests of others over our own, we might ignore it; if it forced us to do something that we consider plainly wrong, we would consider its morality arbitrary and cruel, to the point of being immoral. Perhaps we would accept such perverse demands from God, but we are unlikely to give this sort of deference to our own creations. We want alignment with our own values, then, not because they are the morally best ones, but because they are ours…(More)”

How to make data open? Stop overlooking librarians


Article by Jessica Farrell: “The ‘Year of Open Science’, as declared by the US Office of Science and Technology Policy (OSTP), is now wrapping up. This followed an August 2022 memo from OSTP acting director Alondra Nelson, which mandated that data and peer-reviewed publications from federally funded research should be made freely accessible by the end of 2025. Federal agencies are required to publish full plans for the switch by the end of 2024.

But the specifics of how data will be preserved and made publicly available are far from being nailed down. I worked in archives for ten years and now facilitate two digital-archiving communities, the Software Preservation Network and BitCurator Consortium, at Educopia in Atlanta, Georgia. The expertise of people such as myself is often overlooked. More open-science projects need to integrate digital archivists and librarians, to capitalize on the tools and approaches that we have already created to make knowledge accessible and open to the public.

Making data open and ‘FAIR’ — findable, accessible, interoperable and reusable — poses technical, legal, organizational and financial questions. How can organizations best coordinate to ensure universal access to disparate data? Who will do that work? How can we ensure that the data remain open long after grant funding runs dry?

Many archivists agree that technical questions are the most solvable, given enough funding to cover the labour involved. But they are nonetheless complex. Ideally, any open research should be testable for reproducibility, but re-running scripts or procedures might not be possible unless all of the required coding libraries and environments used to analyse the data have also been preserved. Besides the contents of spreadsheets and databases, scientific-research data can include 2D or 3D images, audio, video, websites and other digital media, all in a variety of formats. Some of these might be accessible only with proprietary or outdated software…(More)”.
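Part of that preservation work can be automated at deposit time by recording the software environment alongside the data. Here is a minimal sketch of such a capture step; the manifest structure and filename are illustrative assumptions, not an archival standard.

```python
# Capture a machine-readable snapshot of the Python environment used
# for an analysis, so the dependency versions survive with the data.
import json
import platform
from importlib import metadata


def environment_manifest() -> dict:
    """Build a simple record of the interpreter, OS, and installed packages."""
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "packages": sorted(
            # Some broken distributions may report no name; they appear
            # here as "None==<version>" rather than raising an error.
            f"{dist.metadata['Name']}=={dist.version}"
            for dist in metadata.distributions()
        ),
    }


if __name__ == "__main__":
    # Print the manifest as JSON; an archive pipeline might instead write
    # it to a file (e.g. environment_manifest.json) next to the dataset.
    print(json.dumps(environment_manifest(), indent=2))
```

A snapshot like this does not solve proprietary-format or long-term emulation problems, but it captures the information needed to rebuild the analysis environment while that information is still cheap to collect.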

Artificial Intelligence and the City


Book edited by Federico Cugurullo, Federico Caprotti, Matthew Cook, Andrew Karvonen, Pauline McGuirk, and Simon Marvin: “This book explores in theory and practice how artificial intelligence (AI) intersects with and alters the city. Drawing upon a range of urban disciplines and case studies, the chapters reveal the multitude of repercussions that AI is having on urban society, urban infrastructure, urban governance, urban planning and urban sustainability.

Contributors also examine how the city, far from being a passive recipient of new technologies, is influencing and reframing AI through subtle processes of co-constitution. The book advances three main contributions and arguments:

  • First, it provides empirical evidence of the emergence of a post-smart trajectory for cities in which new material and decision-making capabilities are being assembled through multiple AIs.
  • Second, it stresses the importance of understanding the mutually constitutive relations between the new experiences enabled by AI technology and the urban context.
  • Third, it engages with the concepts required to clarify the opaque relations that exist between AI and the city, as well as how to make sense of these relations from a theoretical perspective…(More)”.

After USTR’s Move, Global Governance of Digital Trade Is Fraught with Unknowns


Article by Patrick Leblond: “On October 25, the United States announced at the World Trade Organization (WTO) that it was dropping its support for provisions meant to promote the free flow of data across borders. Also abandoned were efforts to continue negotiations on international e-commerce and to protect the source code in applications and algorithms (the so-called Joint Statement Initiative process).

According to the Office of the US Trade Representative (USTR): “In order to provide enough policy space for those debates to unfold, the United States has removed its support for proposals that might prejudice or hinder those domestic policy considerations.” In other words, the domestic regulation of data, privacy, artificial intelligence, online content and the like, seems to have taken precedence over unhindered international digital trade, which the United States previously strongly defended in trade agreements such as the Trans-Pacific Partnership (TPP) and the Canada-United States-Mexico Agreement (CUSMA)…

One pathway for the future sees the digital governance noodle bowl getting bigger and messier. In this scenario, international digital trade suffers. Agreements continue proliferating but remain ineffective at fostering cross-border digital trade: either they remain hortatory with attempts at cooperation on non-strategic issues, or no one pays attention to the binding provisions because business can’t keep up and governments want to retain their “policy space.” After all, why has there not yet been any dispute launched based on binding provisions in a digital trade agreement (either on its own or as part of a larger trade deal) when there has been increasing digital fragmentation?

The other pathway leads to the creation of a new international standards-setting and governance body (call it an International Digital Standards Board), like those that exist for banking and finance. Countries that are members of such an international organization and effectively apply the commonly agreed standards become part of a single digital area where they can conduct cross-border digital trade without impediments. This is the only way to realize the G7’s “data free flow with trust” vision, originally proposed by Japan…(More)”.

Steering Responsible AI: A Case for Algorithmic Pluralism


Paper by Stefaan G. Verhulst: “In this paper, I examine questions surrounding AI neutrality through the prism of existing literature and scholarship about mediation and media pluralism. Such traditions, I argue, provide a valuable theoretical framework for how we should approach the (likely) impending era of AI mediation. In particular, I suggest examining further the notion of algorithmic pluralism. Contrasting this notion to the dominant idea of algorithmic transparency, I seek to describe what algorithmic pluralism may be, and present both its opportunities and challenges. Implemented thoughtfully and responsibly, I argue, algorithmic or AI pluralism has the potential to sustain the diversity, multiplicity, and inclusiveness that are so vital to democracy…(More)”.

Want to know if your data are managed responsibly? Here are 15 questions to help you find out


Article by P. Alison Paprica et al: “As the volume and variety of data about people increases, so does the number of ideas about how data might be used. Studies show that many people want their data to be used for public benefit.

However, the research also shows that public support for the use of data is conditional, and only given when risks, such as those related to privacy, commercial exploitation and artificial intelligence misuse, are addressed.

It takes a lot of work for organizations to establish data governance and management practices that mitigate risks while also encouraging beneficial uses of data. So much so that it can be challenging for responsible organizations to communicate their data trustworthiness without providing an overwhelming amount of technical and legal detail.

To address this challenge our team undertook a multiyear project to identify, refine and publish a short list of essential requirements for responsible data stewardship.

Our 15 minimum specification requirements (min specs) are based on a review of the scientific literature and the practices of 23 different data-focused organizations and initiatives.

As part of our project, we compiled over 70 public resources, including examples of organizations that address the full list of min specs: ICES, the Hartford Data Collaborative and the New Brunswick Institute for Research, Data and Training.

Our hope is that information related to the min specs will help organizations and data-sharing initiatives share best practices and learn from each other to improve their governance and management of data…(More)”.