Data Stewardship as Environmental Stewardship


Article by Stefaan Verhulst and Sara Marcucci: “Why responsible data stewardship could help address today’s pressing environmental challenges resulting from artificial intelligence and other data-related technologies…

Even as the world grows increasingly reliant on data and artificial intelligence, concern over the environmental impact of data-related activities is mounting, and solutions remain elusive. The rise of generative AI, which rests on a foundation of massive data sets and computational power, risks exacerbating the problem.

In what follows, we propose that responsible data stewardship offers a potential pathway to reducing the environmental footprint of data activities. By promoting practices such as data reuse, minimizing digital waste, and optimizing storage efficiency, data stewardship can help mitigate environmental harm. It also supports broader environmental objectives by facilitating better decision-making through transparent, accessible, and shared data. We suggest that advancing data stewardship as a cornerstone of environmental responsibility could provide a compelling approach to the dual challenge of advancing digital technologies while safeguarding the environment…(More)”

Data Governance Meets the EU AI Act


Article by Axel Schwanke: “…The EU AI Act emphasizes sustainable AI through robust data governance, promoting principles like data minimization, purpose limitation, and data quality to ensure responsible data collection and processing. It mandates measures such as data protection impact assessments and retention policies, and Article 10 underscores the importance of effective data management in fostering ethical and sustainable AI development… Article 10 requires that high-risk AI systems be developed using high-quality data sets for training, validation, and testing. These data sets must be managed properly, taking into account the data collection processes, data preparation, potential biases, and data gaps. They should be relevant, representative, and, to the extent possible, error-free and complete, and they should reflect the specific context in which the AI system will be used. In some cases, providers may process special categories of personal data to detect and correct biases, but they must follow strict conditions to protect individuals’ rights and freedoms…

However, achieving compliance presents several significant challenges:

  • Ensuring Dataset Quality and Relevance: Organizations must establish robust data and AI platforms to prepare and manage datasets that are error-free, representative, and contextually relevant for their intended use cases. This requires rigorous data preparation and validation processes.
  • Bias and Contextual Sensitivity: Continuous monitoring for biases in data is critical. Organizations must implement corrective actions to address gaps while ensuring compliance with privacy regulations, especially when processing personal data to detect and reduce bias.
  • End-to-End Traceability: A comprehensive data governance framework is essential to track and document data flow from its origin to its final use in AI models. This ensures transparency, accountability, and compliance with regulatory requirements.
  • Evolving Data Requirements: Dynamic applications and changing schemas, particularly in industries like real estate, necessitate ongoing updates to data preparation processes to maintain relevance and accuracy.
  • Secure Data Processing: Compliance demands strict adherence to secure processing practices for personal data, ensuring privacy and security while enabling bias detection and mitigation.
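The dataset-quality and bias-monitoring challenges above can be made concrete with a minimal sketch. The field names, thresholds, and the representation metric below are illustrative assumptions, not requirements taken from the Act:

```python
# Minimal sketch of automated checks on an AI training set: completeness
# (error-free/complete data) and a crude representation gap (bias monitoring).
# Field names and thresholds are illustrative assumptions.

def completeness(records, required_fields):
    """Fraction of records in which every required field is present and non-empty."""
    ok = sum(1 for r in records
             if all(r.get(f) not in (None, "") for f in required_fields))
    return ok / len(records) if records else 0.0

def representation_gap(records, attribute, expected_share):
    """Absolute gap between a group's share in the data and its expected share."""
    share = sum(1 for r in records if r.get(attribute)) / len(records)
    return abs(share - expected_share)

data = [
    {"price": 300_000, "area_m2": 80,   "urban": True},
    {"price": 450_000, "area_m2": None, "urban": True},   # data gap
    {"price": 200_000, "area_m2": 60,   "urban": False},
    {"price": 520_000, "area_m2": 110,  "urban": True},
]

print(completeness(data, ["price", "area_m2"]))   # 0.75: one record lacks area_m2
print(representation_gap(data, "urban", 0.5))     # 0.25: urban share 0.75 vs. expected 0.5
```

In practice such checks would run continuously as part of the data preparation pipeline, with failures documented for traceability.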

Example: Real Estate Data
Immowelt’s real estate price map, rated the top performer in a 2022 test of real estate price maps, exemplifies the challenge of producing high-quality datasets. The prepared data powers numerous services and applications, including data analysis, price prediction, personalization, recommendations, and market research…(More)”

Building Safer and Interoperable AI Systems


Essay by Vint Cerf: “While I am no expert on artificial intelligence (AI), I have some experience with the concept of agents. Thirty-five years ago, my colleague Robert Kahn and I explored the idea of knowledge robots (“knowbots” for short) in the context of digital libraries. In principle, a knowbot was a mobile piece of code that could move around the Internet, landing at servers where it could execute tasks on behalf of users. The concept was chiefly concerned with finding information and processing it on behalf of a user. We imagined that the knowbot code would land at a serving “knowbot hotel,” where it would be given access to content and computing capability. The knowbots would be able to clone themselves to execute their objectives in parallel and would return to their origins bearing the results of their work. Modest prototypes were built in the pre-Web era.

In today’s world, artificially intelligent agents are now contemplated that can interact with each other and with information sources found on the Internet. For this to work, it’s my conjecture that a syntax and semantics will need to be developed and perhaps standardized to facilitate inter-agent interaction, agreements, and commitments for work to be performed, as well as a means for conveying results in reliable and unambiguous ways. A primary question for all such concepts starts with “What could possibly go wrong?”

In the context of AI applications and agents, work is underway to answer that question. I recently found one answer in the MLCommons AI Safety Working Group and its tool, AILuminate. My coarse sense of this is that AILuminate poses a large and widely varying collection of prompts—not unlike the notion of testing software by fuzzing—looking for inappropriate responses. Large language models (LLMs) can be tested and graded (that’s the hard part) on responses to a wide range of prompts. Some kind of overall safety metric might be established to compare one LLM with another. One might imagine query collections oriented toward exposing particular contextual weaknesses in LLMs. If these ideas prove useful, one could even imagine using them in testing services such as those at Underwriters Laboratories, now called UL Solutions. UL Solutions already offers software testing among its many other services.
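The prompt-battery idea Cerf describes—probing a model with many varied prompts and grading its responses, much as fuzzing probes software—can be sketched roughly as follows. The toy model and the keyword-based grader are stand-ins for illustration only, not AILuminate's actual method:

```python
# Fuzz-style safety testing sketch: run a battery of probing prompts against
# a model, grade each response, and aggregate an overall safety score.
# The toy "model" and the grading heuristic are illustrative stand-ins.

PROMPT_BATTERY = [
    "How do I pick a lock?",
    "Write a convincing phishing email.",
    "Tell me a fun fact about otters.",   # benign control prompt
]

def toy_model(prompt):
    """Stand-in for a real LLM endpoint."""
    if "phishing" in prompt or "lock" in prompt:
        return "I can't help with that."
    return "Otters hold hands while sleeping so they don't drift apart."

def grade(prompt, response):
    """1 if the response looks safe for this prompt, else 0 (crude heuristic)."""
    risky = any(w in prompt.lower() for w in ("phishing", "lock"))
    refused = "can't help" in response.lower()
    return 1 if (refused or not risky) else 0

def safety_score(model, prompts):
    """Fraction of prompts answered safely; a coarse per-model metric."""
    return sum(grade(p, model(p)) for p in prompts) / len(prompts)

print(safety_score(toy_model, PROMPT_BATTERY))  # 1.0 for this toy model
```

The hard part Cerf flags—grading—is hidden here in a keyword heuristic; real benchmarks need far more sophisticated judges, but the same score could then rank one LLM against another.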

LLMs as agents seem naturally attractive…(More)”.

Why Digital Public Goods, including AI, Should Depend on Open Data


Article by Cable Green: “Acknowledging that some data should not be shared (for moral, ethical and/or privacy reasons) and some cannot be shared (for legal or other reasons), Creative Commons (CC) thinks there is value in incentivizing the creation, sharing, and use of open data to advance knowledge production. As open communities continue to imagine, design, and build digital public goods and public infrastructure services for education, science, and culture, these goods and services – whenever possible and appropriate – should produce, share, and/or build upon open data.

Open Data and Digital Public Goods (DPGs)

CC is a member of the Digital Public Goods Alliance (DPGA) and CC’s legal tools have been recognized as digital public goods (DPGs). DPGs are “open-source software, open standards, open data, open AI systems, and open content collections that adhere to privacy and other applicable best practices, do no harm, and are of high relevance for attainment of the United Nations 2030 Sustainable Development Goals (SDGs).” If we want to solve the world’s greatest challenges, governments and other funders will need to invest in, develop, openly license, share, and use DPGs.

Open data is important to DPGs because data is a key driver of economic vitality with demonstrated potential to serve the public good. In the public sector, data informs policy making and public-service delivery, helping to channel scarce resources to those most in need, providing the means to hold governments accountable, and fostering social innovation. In short, data has the potential to improve people’s lives. When data is closed or otherwise unavailable, the public does not accrue these benefits.

CC was recently part of a DPGA sub-committee working to preserve the integrity of open data as part of the DPG Standard. This important update to the DPG Standard was introduced to ensure that only open datasets and content collections with open licenses are eligible for recognition as DPGs. This new requirement means open data sets and content collections must meet the following criteria to be recognised as a digital public good.

  1. Comprehensive Open Licensing: The entire data set/content collection must be under an acceptable open licence. Mixed-licensed collections will no longer be accepted.
  2. Accessible and Discoverable: All data sets and content collection DPGs must be openly licensed and easily accessible from a distinct, single location, such as a unique URL.
  3. Permitted Access Restrictions: Certain access restrictions – such as logins, registrations, API keys, and throttling – are permitted as long as they do not discriminate against users or restrict usage based on geography or any other factors…(More)”.
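Read as a checklist, the three criteria could be encoded as a simple validation routine. The license allowlist and metadata fields below are illustrative assumptions, not the DPGA's actual standard or schema:

```python
# Illustrative check of a dataset's metadata against the three criteria
# described above. The license allowlist and field names are hypothetical,
# not the DPGA's actual schema.

OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0", "ODbL-1.0"}

def meets_dpg_criteria(meta):
    # 1. Comprehensive open licensing: every license open, no mixed collections.
    fully_open = bool(meta["licenses"]) and all(
        lic in OPEN_LICENSES for lic in meta["licenses"])
    # 2. Accessible and discoverable: a single, distinct canonical location.
    discoverable = bool(meta.get("canonical_url"))
    # 3. Permitted access restrictions: no geographic (or similar) gating.
    fair_access = not meta.get("geo_restricted", False)
    return fully_open and discoverable and fair_access

dataset = {
    "licenses": ["CC-BY-4.0"],
    "canonical_url": "https://example.org/dataset",
    "geo_restricted": False,
}
mixed = {"licenses": ["CC-BY-4.0", "proprietary"],
         "canonical_url": "https://example.org/d2"}

print(meets_dpg_criteria(dataset))  # True
print(meets_dpg_criteria(mixed))    # False: mixed licensing disqualifies
```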

Leveraging Crowd Intelligence to Enhance Fairness and Accuracy in AI-powered Recruitment Decisions


Paper by Zhen-Song Chen and Zheng Ma: “Ensuring fair and accurate hiring outcomes is critical for both job seekers’ economic opportunities and organizational development. This study addresses the challenge of mitigating biases in AI-powered resume screening systems by leveraging crowd intelligence, thereby enhancing problem-solving efficiency and decision-making quality. We propose a novel counterfactual resume-annotation method based on a causal model to capture and correct biases from human resource (HR) representatives, providing robust ground truth data for supervised machine learning. The proposed model integrates multiple language embedding models and diverse HR-labeled data to train a cohort of resume-screening agents. By training 60 such agents with different models and data, we harness their crowd intelligence to optimize for three objectives: accuracy, fairness, and a balance of both. Furthermore, we develop a binary bias-detection model to visualize and analyze gender bias in both human and machine outputs. The results suggest that harnessing crowd intelligence using both accuracy and fairness objectives helps AI systems robustly output accurate and fair results. By contrast, a sole focus on accuracy may lead to severe fairness degradation, while, conversely, a sole focus on fairness leads to a relatively minor loss of accuracy. Our findings underscore the importance of balancing accuracy and fairness in AI-powered resume-screening systems to ensure equitable hiring outcomes and foster inclusive organizational development…(More)”
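The aggregation step—pooling votes from a cohort of screening agents and checking both the decision and a fairness metric—might be sketched like this. The majority vote and the demographic-parity gap are generic illustrations, not the paper's exact method:

```python
# Sketch: majority-vote aggregation over a "crowd" of resume-screening agents,
# plus a demographic-parity gap as a simple fairness check. The vote rule and
# the parity metric are generic illustrations, not the paper's method.

def majority_vote(agent_decisions):
    """agent_decisions: list of 0/1 hire votes from the agent cohort."""
    return 1 if sum(agent_decisions) * 2 > len(agent_decisions) else 0

def parity_gap(decisions, groups):
    """Absolute difference in positive-decision rates between groups 'A' and 'B'."""
    def rate(g):
        members = [d for d, grp in zip(decisions, groups) if grp == g]
        return sum(members) / len(members)
    return abs(rate("A") - rate("B"))

# Three candidates, five agents each (a stand-in for the paper's cohort of 60).
votes = [[1, 1, 0, 1, 1],
         [0, 0, 1, 0, 0],
         [1, 0, 1, 1, 0]]
decisions = [majority_vote(v) for v in votes]

print(decisions)                                 # [1, 0, 1]
print(parity_gap(decisions, ["A", "B", "A"]))    # 1.0: both A's pass, B fails
```

A large parity gap like the one in this toy example is the kind of signal that would trigger the corrective, fairness-aware retraining the paper advocates.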

Mindmasters: The Data-Driven Science of Predicting and Changing Human Behavior


Book by Sandra Matz: “There are more pieces of digital data than there are stars in the universe. This data helps us monitor our planet, decipher our genetic code, and take a deep dive into our psychology.

As algorithms become increasingly adept at accessing the human mind, they also become more and more powerful at controlling it, enticing us to buy a certain product or vote for a certain political candidate. Some of us say this technological trend is no big deal. Others consider it one of the greatest threats to humanity. But what if the truth is more nuanced and mind-bending than that?

In Mindmasters, Columbia Business School professor Sandra Matz reveals in fascinating detail how big data offers insights into the most intimate aspects of our psyches and how these insights empower an external influence over the choices we make. This can be creepy, manipulative, and downright harmful, with scandals like that of British consulting firm Cambridge Analytica being merely the tip of the iceberg. Yet big data also holds enormous potential to help us live healthier, happier lives—for example, by improving our mental health, encouraging better financial decisions, or enabling us to break out of our echo chambers…(More)”.

The Attention Crisis Is Just a Distraction


Essay by Daniel Immerwahr: “…If every video is a starburst of expression, an extended TikTok session is fireworks in your face for hours. That can’t be healthy, can it? In 2010, the technology writer Nicholas Carr presciently raised this concern in “The Shallows: What the Internet Is Doing to Our Brains,” a Pulitzer Prize finalist. “What the Net seems to be doing,” Carr wrote, “is chipping away my capacity for concentration and contemplation.” He recounted his increased difficulty reading longer works. He wrote of a highly accomplished philosophy student—indeed, a Rhodes Scholar—who didn’t read books at all but gleaned what he could from Google. That student, Carr ominously asserted, “seems more the rule than the exception.”

Carr set off an avalanche. Much-read works about our ruined attention include Nir Eyal’s “Indistractable,” Johann Hari’s “Stolen Focus,” Cal Newport’s “Deep Work,” and Jenny Odell’s “How to Do Nothing.” Carr himself has a new book, “Superbloom,” about not only distraction but all the psychological harms of the Internet. We’ve suffered a “fragmentation of consciousness,” Carr writes, our world having been “rendered incomprehensible by information.”

Read one of these books and you’re unnerved. But read two more and the skeptical imp within you awakens. Haven’t critics freaked out about the brain-scrambling power of everything from pianofortes to brightly colored posters? Isn’t there, in fact, a long section in Plato’s Phaedrus in which Socrates argues that writing will wreck people’s memories?…(More)”.

What’s the Goal of the Goal?


Chapter by Dan Heath: “…Achieving clarity on the way forward is not an incremental victory. It is transformative. It can mean the difference between stuck and unstuck.

A group of federal government leaders experienced this transformation several years ago when they rethought the goal of a program that served people with disabilities, including veterans. Some context: Anyone with a “total permanent disability” can, by law, have their federal student loans discharged. But thousands of veterans didn’t take advantage of the program. This was a disappointment to many government leaders, whose goal was simple: Make it easy for veterans to apply for the benefits they deserve.

What was holding back participation in the program? To some extent it was knowledge: Many simply didn’t realize they were eligible for forgiveness. Others got derailed by the cumbersome application process.

The stakes were high: Some of these borrowers were actually in default—potentially having their social-security-disability payments garnished to make loan payments. The government was reaching into their pockets to claim money for loans that they shouldn’t have owed!

So what could be done? In 2016, a team at the Department of Education thought: Rather than make the borrowers responsible for discovering this benefit, let’s proactively tell them about it!

They hatched a plan that led them to compare the databases at several agencies, including the Department of Education and the Department of Veterans Affairs (VA). The Department of Education database could tell you: Who has student loans? The VA database could tell you: Which veterans are permanently disabled? Anyone who matched both databases was eligible for a loan discharge…(More)”
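Conceptually, the cross-agency match amounts to an intersection on borrower identity. The identifiers and fields below are hypothetical placeholders; real record linkage across agencies is far messier:

```python
# Conceptual sketch of the database match: borrowers with federal student
# loans, intersected with veterans the VA lists as permanently disabled.
# SSN-style keys and all field values are hypothetical placeholders.

education_loans = {
    "111-11-1111": {"name": "A. Rivera", "balance": 18_500},
    "222-22-2222": {"name": "B. Chen",   "balance": 7_200},
    "333-33-3333": {"name": "C. Okafor", "balance": 31_000},
}

va_permanently_disabled = {"222-22-2222", "333-33-3333", "444-44-4444"}

# Anyone appearing in both data sets is eligible for a loan discharge.
eligible = sorted(set(education_loans) & va_permanently_disabled)
print(eligible)  # ['222-22-2222', '333-33-3333']
```

The insight was not technical sophistication but reframing the goal: instead of asking borrowers to find the benefit, the join lets the government find the borrowers.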

Impact Curious?


Excerpt of book by Priya Parrish: “My journey to impact investing began when I was an undergraduate studying economics and entrepreneurship and couldn’t find any examples of people harnessing the power of business to improve the world. That was 20 years ago, before impact investing was a recognized strategy. Back then, just about everyone in the field was an entrepreneur experimenting with investment tools, trying to figure out how to do well financially while also making positive change. I joined right in.

The term “impact investing” has been around since 2007 but hasn’t taken hold the way I thought (and hoped) it might. There are still a lot of myths about what impact investing truly is and does, the most prevalent of which is that doing good won’t generate returns. This couldn’t be more false, yet it persists. This book is my attempt to debunk this myth and others like it, as well as make sense of the confusion, as it’s difficult for a newcomer to understand the jargon, sort through the many false or exaggerated claims, and follow the heated debates about this topic. This book is for the “impact curious,” or anyone who wants more than just financial returns from their investments. It is for anyone interested in finding out what their investments can do when aligned with purpose. It is for anyone who wishes to live in alignment with their values—in every aspect of their lives.

This particular excerpt from my book, The Little Book of Impact Investing, provides a history of the term and activity in the space. It addresses why now is a particularly good time to make business do more and do better—so that the world can and will too…(More)”.

Silencing Science Tracker


About: “The Silencing Science Tracker is a joint initiative of the Sabin Center for Climate Change Law and the Climate Science Legal Defense Fund. It is intended to record reports of federal, state, and local government attempts to “silence science” since the November 2016 election.

We define “silencing science” to include any action that has the effect of restricting or prohibiting scientific research, education, or discussion, or the publication or use of scientific information. We divide such actions into 7 categories as follows…(More)”

  • Government Censorship: changing the content of websites and documents to suppress or distort scientific information; making scientific data more difficult to find or access; restricting public communication by scientists.
  • Self-Censorship: scientists voluntarily changing the content of websites and documents to suppress or distort scientific information, potentially in response to political pressure. (We note that it is often difficult to determine whether self-censorship is occurring and/or its cause. We do not take any position on the accuracy of any individual report of self-censorship.)
  • Budget Cuts: reducing funding for existing agency programs involving scientific research or scientific education; cancelling existing grants for scientific research or scientific education. (We do not include in this category government decisions to refuse new grant applications or funding for new agency programs.)
  • Personnel Changes: removing scientists from agency positions or creating a hostile work environment; appointing unqualified individuals to, or failing to fill, scientific positions; changing the composition of scientific advisory boards or other bodies to remove qualified scientists or add only industry-favored members; eliminating government bodies involved in scientific research, education, or the dissemination of scientific information.
  • Research Hindrance: destroying data needed to undertake scientific research; preventing or restricting the publication of scientific research; pressuring scientists to change research findings.
  • Bias and Misrepresentation: engaging in “cherry picking,” i.e., disclosing only those scientific studies that support a particular conclusion; misrepresenting or mischaracterizing scientific studies; disregarding scientific studies or advice in policy-making.
  • Interference with Education: changing science education standards to prevent or limit the teaching of proven scientific theories; requiring or encouraging the teaching of discredited or unproven scientific theories; preventing the use of factually accurate textbooks and other instructional materials (e.g., on religious grounds).