How Your Car Might Be Making Roads Safer


Article by Kashmir Hill: “Darcy Bullock, a civil engineering professor at Purdue University, turns to his computer screen to get information about how fast cars are traveling on Interstate 65, which runs 887 miles from Lake Michigan to the Gulf of Mexico. It’s midafternoon on a Monday, and his screen is mostly filled with green dots indicating that traffic is moving along nicely. But near an exit on the outskirts of Indianapolis, an angry red streak shows that cars have stopped moving.

A traffic camera nearby reveals the cause: A car has spun out, causing gridlock.

In recent years, vehicles that have wireless connectivity have become a critical source of information for transportation departments and for academics who study traffic patterns. The data these vehicles emit — including speed, how hard they brake and accelerate, and even if their windshield wipers are on — can offer insights into dangerous road conditions, congestion or poorly timed traffic signals.

“Our cars know more about our roads than agencies do,” said Dr. Bullock, who regularly works with the Indiana Department of Transportation to conduct studies on how to reduce traffic congestion and increase road safety. He credits connected-car data with detecting hazards that would have taken years — and many accidents — to find in the past.

The data comes primarily from commercial trucks and from cars made by General Motors that are enrolled in OnStar, G.M.’s internet-connected service. (Drivers know OnStar as the service that allows them to lock their vehicles from a smartphone app or find them if they have been stolen.) Federal safety guidelines require commercial truck drivers to be routinely monitored, but people driving G.M. vehicles may be surprised to know that their data is being collected, though it is indicated in the fine print of the company’s privacy policy…(More)”.
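
Of the signals listed in the excerpt above, hard braking is the easiest to illustrate. This is a minimal sketch that assumes each vehicle reports timestamped speed samples. The `SpeedSample` record and the 0.3 g cutoff are illustrative assumptions. They are not the format or threshold used by OnStar, the Indiana Department of Transportation or Purdue. Under those assumptions, a hard-braking event is any interval where deceleration exceeds a threshold:

```python
from dataclasses import dataclass

G = 9.81  # standard gravity, m/s^2

# Illustrative assumption: decelerations stronger than 0.3 g count as "hard".
HARD_BRAKE_THRESHOLD_G = 0.3


@dataclass
class SpeedSample:
    t: float          # seconds since the start of the trip
    speed_kmh: float  # reported vehicle speed in km/h


def hard_braking_events(samples, threshold_g=HARD_BRAKE_THRESHOLD_G):
    """Return (interval_start_time, deceleration_in_g) for intervals above threshold_g."""
    events = []
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        # Convert the km/h drop to m/s, then divide by elapsed time (positive = slowing).
        decel = (prev.speed_kmh - cur.speed_kmh) / 3.6 / dt
        decel_g = decel / G
        if decel_g > threshold_g:
            events.append((prev.t, round(decel_g, 2)))
    return events


if __name__ == "__main__":
    trip = [SpeedSample(0, 100), SpeedSample(1, 98), SpeedSample(2, 80), SpeedSample(3, 79)]
    print(hard_braking_events(trip))  # [(1, 0.51)]
```

A data user could aggregate flagged events by road segment rather than by driver, which is one plausible way such data might point to a hazardous curve or intersection. Any real threshold would need calibrating against field data.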

Philanthropy by the Numbers


Essay by Aaron Horvath: “Foundations make grants conditional on demonstrable results. Charities tout the evidentiary basis of their work. And impact consultants play both sides: assisting funders in their pursuit of rational beneficence and helping grantees translate the jumble of reality into orderly, spreadsheet-ready metrics.

Measurable impact has crept into everyday understandings of charity as well. There’s the extensive (often fawning) news coverage of data-crazed billionaire philanthropists, so-called thought leaders exhorting followers to rethink their contributions to charity, and popular books counseling that intuition and sentiment are poor guides for making the world a better place. Putting ideas into action, charity evaluators promote research-backed listings of the most impactful nonprofits. Why give to your local food bank when there’s one in Somerville, Massachusetts, with a better rating?

Over the past thirty years, amid a larger crisis of civic engagement, social isolation, and political alienation, measurable impact has seeped into our civic imagination and become one of the guiding ideals for public-spirited beneficence. And while its proponents do not always agree on how best to achieve or measure the extent of that impact, they have collectively recast civic engagement as objective, pragmatic, and above the fray of politics—a triumph of the head over the heart. But how did we get here? And what happens to our capacity for meaningful collective action when we think of civic life in such depersonalized and quantified terms?…(More)”.

To Whom Does the World Belong?


Essay by Alexander Hartley: “For an idea of the scale of the prize, it’s worth remembering that 90 percent of recent U.S. economic growth, and 65 percent of the value of its largest 500 companies, is already accounted for by intellectual property. By any estimate, AI will vastly increase the speed and scale at which new intellectual products can be minted. The provision of AI services themselves is estimated to become a trillion-dollar market by 2032, but the value of the intellectual property created by those services—all the drug and technology patents; all the images, films, stories, virtual personalities—will eclipse that sum. It is possible that the products of AI may, within my lifetime, come to represent a substantial portion of all the world’s financial value.

In this light, the question of ownership takes on its true scale, revealing itself as a version of Bertolt Brecht’s famous query: To whom does the world belong?


Questions of AI authorship and ownership can be divided into two broad types. One concerns the vast troves of human-authored material fed into AI models as part of their “training” (the process by which their algorithms “learn” from data). The other concerns ownership of what AIs produce. Call these, respectively, the input and output problems.

So far, attention—and lawsuits—have clustered around the input problem. The basic business model for LLMs relies on the mass appropriation of human-written text, and there simply isn’t anywhere near enough in the public domain. OpenAI hasn’t been very forthcoming about its training data, but GPT-4 was reportedly trained on around thirteen trillion “tokens,” roughly the equivalent of ten trillion words. This text is drawn in large part from online repositories known as “crawls,” which scrape the internet for troves of text from news sites, forums, and other sources. Fully aware that vast data scraping is legally untested—to say the least—developers charged ahead anyway, resigning themselves to litigating the issue in retrospect. Lawyer Peter Schoppert has called the training of LLMs without permission the industry’s “original sin”—to be added, we might say, to the technology’s mind-boggling consumption of energy and water on an overheating planet. (In September, Bloomberg reported that plans for new gas-fired power plants have exploded as energy companies are “racing to meet a surge in demand from power-hungry AI data centers.”)…(More)”.
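
The token and word figures in the excerpt imply a rough conversion ratio. The calculation below takes the reported, unconfirmed GPT-4 numbers at face value:

```latex
\frac{10 \times 10^{12}\ \text{words}}{13 \times 10^{12}\ \text{tokens}} \approx 0.77\ \text{words per token}
\qquad\Longleftrightarrow\qquad
\frac{13}{10} = 1.3\ \text{tokens per word}
```

The ratio varies with the tokenizer and with the language of the text, so it is only a back-of-the-envelope figure.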

Collaborative Intelligence


Book edited by Mira Lane and Arathi Sethumadhavan: “…The book delves deeply into the dynamic interplay between theory and practice, shedding light on the transformative potential and complexities of AI. For practitioners deeply immersed in the world of AI, Lane and Sethumadhavan offer firsthand accounts and insights from technologists, academics, and thought leaders, as well as a series of compelling case studies, ranging from AI’s impact on artistry to its role in addressing societal challenges like modern slavery and wildlife conservation.

As the global AI market burgeons, this book enables collaboration, knowledge sharing, and interdisciplinary dialogue. It caters not only to the practitioners shaping the AI landscape but also to policymakers striving to navigate the intricate relationship between humans and machines, as well as academics. Divided into two parts, the first half of the book offers readers a comprehensive understanding of AI’s historical context, its influence on power dynamics, human-AI interaction, and the critical role of audits in governing AI systems. The second half unfolds a series of eight case studies, unraveling AI’s impact on fields as varied as healthcare, vehicular safety, conservation, human rights, and the metaverse. Each chapter in this book paints a vivid picture of AI’s triumphs and challenges, providing a panoramic view of how it is reshaping our world…(More)”

Trust but Verify: A Guide to Conducting Due Diligence When Leveraging Non-Traditional Data in the Public Interest


New Report by Sara Marcucci, Andrew J. Zahuranec, and Stefaan Verhulst: “In an increasingly data-driven world, organizations across sectors are recognizing the potential of non-traditional data—data generated from sources outside conventional databases, such as social media, satellite imagery, and mobile usage—to provide insights into societal trends and challenges. When harnessed thoughtfully, this data can improve decision-making and bolster public interest projects in areas as varied as disaster response, healthcare, and environmental protection. However, with these new data streams come heightened ethical, legal, and operational risks that organizations need to manage responsibly. That’s where due diligence comes in, helping to ensure that data initiatives are beneficial and ethical.

The report, Trust but Verify: A Guide to Conducting Due Diligence When Leveraging Non-Traditional Data in the Public Interest, co-authored by Sara Marcucci, Andrew J. Zahuranec, and Stefaan Verhulst, offers a comprehensive framework to guide organizations in responsible data partnerships. Whether you’re a public agency or a private enterprise, this report provides a six-step process to ensure due diligence and maintain accountability, integrity, and trust in data initiatives…(More) (Blog)”.

Beyond checking a box: how a social licence can help communities benefit from data reuse and AI


Article by Stefaan Verhulst and Peter Addo: “In theory, consent offers a mechanism to reduce power imbalances. In reality, existing consent mechanisms are limited and, in many respects, archaic, based on binary distinctions – typically presented in check-the-box forms that most websites use to ask you to register for marketing e-mails – that fail to appreciate the nuance and context-sensitive nature of data reuse. Consent today generally means individual consent, a notion that overlooks the broader needs of communities and groups.

While we understand the need to safeguard information about an individual such as, say, their health status, this information can help address or even prevent societal health crises. Individualised notions of consent fail to consider the potential public good of reusing individual data responsibly. This makes them particularly problematic in societies that have more collective orientations, where prioritising individual choices could disrupt the social fabric.

The notion of a social licence, which has its roots in the 1990s within the extractive industries, refers to the collective acceptance of an activity, such as data reuse, based on its perceived alignment with community values and interests. Social licences go beyond the priorities of individuals and help balance the risks of data misuse and missed use (for example, the risks of violating privacy vs. neglecting to use private data for public good). Social licences permit a broader notion of consent that is dynamic, multifaceted and context-sensitive.

Policymakers, citizens, health providers, think tanks, interest groups and private industry must accept the concept of a social licence before it can be established. The goal for all stakeholders is to establish widespread consensus on community norms and an acceptable balance of social risk and opportunity.

Community engagement can create a consensus-based foundation for preferences and expectations concerning data reuse. Engagement could take place via dedicated “data assemblies” or community deliberations about data reuse for particular purposes under particular conditions. The process would need to involve voices as representative as possible of the different parties involved, and include those that are traditionally marginalised or silenced…(More)”.

Announcing SPARROW: A Breakthrough AI Tool to Measure and Protect Earth’s Biodiversity in the Most Remote Places


Blog by Juan Lavista Ferres: “The biodiversity of our planet is rapidly declining. We’ve likely reached a tipping point where it is crucial to use every tool at our disposal to help preserve what remains. That’s why I am pleased to announce SPARROW—Solar-Powered Acoustic and Remote Recording Observation Watch, developed by Microsoft’s AI for Good Lab. SPARROW is an AI-powered edge computing solution designed to operate autonomously in the most remote corners of the planet. Solar-powered and equipped with advanced sensors, it collects biodiversity data—from camera traps, acoustic monitors, and other environmental detectors—that are processed using our most advanced PyTorch-based wildlife AI models on low-energy edge GPUs. The resulting critical information is then transmitted via low-Earth orbit satellites directly to the cloud, allowing researchers to access fresh, actionable insights in real time, no matter where they are. 

Think of SPARROW as a network of Earth-bound satellites, quietly observing and reporting on the health of our ecosystems without disrupting them. By leveraging solar energy, these devices can run for a long time, minimizing their footprint and any potential harm to the environment…(More)”.
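
The SPARROW excerpt describes running models on the device and transmitting only results over a satellite link. That design trades edge compute for scarce bandwidth. A minimal sketch of such a triage step follows. The `Detection` record, the confidence cutoff and the per-message byte limit are all hypothetical. They are not SPARROW's actual interfaces or parameters.

```python
import json
from collections import Counter
from dataclasses import dataclass

# Hypothetical parameters; a real deployment would tune these to its models and link.
MIN_CONFIDENCE = 0.8
MAX_MESSAGE_BYTES = 340  # assumed payload limit for one satellite message


@dataclass
class Detection:
    sensor: str        # e.g. "camera_trap" or "acoustic"
    species: str       # label predicted by the on-device model
    confidence: float  # model confidence in [0, 1]


def summarize_for_uplink(detections, site_id):
    """Drop low-confidence detections, count the rest per (sensor, species), encode compactly."""
    counts = Counter(
        (d.sensor, d.species) for d in detections if d.confidence >= MIN_CONFIDENCE
    )
    summary = {
        "site": site_id,
        "counts": [[sensor, species, n] for (sensor, species), n in sorted(counts.items())],
    }
    payload = json.dumps(summary, separators=(",", ":")).encode("utf-8")
    if len(payload) > MAX_MESSAGE_BYTES:
        raise ValueError("summary too large for one message; split or coarsen it")
    return payload


if __name__ == "__main__":
    dets = [
        Detection("camera_trap", "jaguar", 0.93),
        Detection("camera_trap", "jaguar", 0.88),
        Detection("acoustic", "howler_monkey", 0.95),
        Detection("acoustic", "unknown", 0.41),
    ]
    print(summarize_for_uplink(dets, "site-07"))
```

Sending per-species counts instead of raw images or audio keeps each message small. The trade-off is that low-confidence detections, and the raw evidence behind the counts, stay on the device.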

Harnessing AI: How to develop and integrate automated prediction systems for humanitarian anticipatory action


CEPR Report: “Despite unprecedented access to data, resources, and wealth, the world faces an escalating wave of humanitarian crises. Armed conflict, climate-induced disasters, and political instability are displacing millions and devastating communities. Nearly one in every five children are living in or fleeing conflict zones (OCHA, 2024). Often the impacts of conflict and climatic hazards – such as droughts and floods – exacerbate each other, leading to even greater suffering. As crises unfold and escalate, the need for timely and effective humanitarian action becomes paramount.

Sophisticated systems for forecasting and monitoring natural and man-made hazards have emerged as critical tools to help inform and prompt action. The full potential for the use of such automated forecasting systems to inform anticipatory action (AA) is immense but is still to be realised. By providing early warnings and predictive insights, these systems could help organisations allocate resources more efficiently, plan interventions more effectively, and ultimately save lives and prevent or reduce humanitarian impact.


This Policy Insight provides an account of the significant technical, ethical, and organisational difficulties involved in such systems, and the current solutions in place…(More)”.

Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft


Article by Kate Knibbs: “Harvard University announced Thursday it’s releasing a high-quality dataset of nearly 1 million public-domain books that could be used by anyone to train large language models and other AI tools. The dataset was created by Harvard’s newly formed Institutional Data Initiative with funding from both Microsoft and OpenAI. It contains books scanned as part of the Google Books project that are no longer protected by copyright.

Around five times the size of the notorious Books3 dataset that was used to train AI models like Meta’s Llama, the Institutional Data Initiative’s database spans genres, decades, and languages, with classics from Shakespeare, Charles Dickens, and Dante included alongside obscure Czech math textbooks and Welsh pocket dictionaries. Greg Leppert, executive director of the Institutional Data Initiative, says the project is an attempt to “level the playing field” by giving the general public, including small players in the AI industry and individual researchers, access to the sort of highly refined and curated content repositories that normally only established tech giants have the resources to assemble. “It’s gone through rigorous review,” he says…(More)”.

How Years of Reddit Posts Have Made the Company an AI Darling


Article by Sarah E. Needleman: “Artificial-intelligence companies were one of Reddit’s biggest frustrations last year. Now they are a key source of growth for the social-media platform. 

These companies have an insatiable appetite for online data to train their models and display content in an easy-to-digest format. In mid-2023, Reddit, a social-media veteran and IPO newbie, turned off the spigot and began charging some businesses for access to its data. 

It turns out that Reddit’s ever-growing 19-year warehouse of user commentary makes it an attractive resource for AI companies. The platform recently reported its first quarterly profit as a publicly traded company, thanks partly to data-licensing deals it made in the past year with OpenAI and Google.

Reddit Chief Executive and co-founder Steve Huffman has said the company had to stop giving away its valuable data to the world’s largest companies for free. 

“It is an arms race,” he said at The Wall Street Journal’s Tech Live conference in October. “But we’re in talks with just about everybody, so we’ll see where these things land.”

Reddit’s huge amount of data works well for AI companies because it is organized by topics and uses a voting system instead of an algorithm to sort content quality, and because people’s posts tend to be candid.

For the first nine months of 2024, Reddit’s revenue category that includes licensing grew to $81.6 million from $12.3 million a year earlier.

While data-licensing revenue remains dwarfed by Reddit’s core advertising sales, the new category’s rapid growth reveals a potentially lucrative business line with relatively high margins.

Diversifying away from a reliance on advertising, while tapping into an AI-adjacent market, has also made Reddit attractive to investors who are searching for new exposure to the latest technology boom. Reddit’s stock has more than doubled in the past three months.

The source of Reddit’s newfound wealth is the burgeoning market for AI-useful data. Reddit’s willingness to sell its data to AI outfits makes it stand out, because there is only a finite amount of data available for AI companies to gobble up for free or purchase. Some executives and researchers say the industry’s need for high-quality text could outstrip supply within two years, potentially slowing AI’s development…(More)”.
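
The excerpt suggests that Reddit's votes give buyers a ready-made quality label. The sketch below assumes a hypothetical export of posts with upvote and downvote counts. The `Post` record and the cutoff are illustrative; they are not Reddit's data API or any licensee's pipeline. Under those assumptions, a buyer could keep only text whose net score clears a threshold and group it by subreddit:

```python
from dataclasses import dataclass


@dataclass
class Post:
    subreddit: str  # topic grouping the excerpt refers to
    text: str
    upvotes: int
    downvotes: int

    @property
    def net_score(self) -> int:
        # Net community vote, used here as a rough quality signal.
        return self.upvotes - self.downvotes


def select_training_text(posts, min_score=3):
    """Keep non-empty posts with net_score >= min_score, grouped by subreddit."""
    by_topic = {}
    for post in posts:
        if post.net_score >= min_score and post.text.strip():
            by_topic.setdefault(post.subreddit, []).append(post.text)
    return by_topic


if __name__ == "__main__":
    posts = [
        Post("cooking", "Salt the pasta water generously.", 120, 4),
        Post("cooking", "first!", 1, 9),
        Post("hiking", "Carry more water than you think.", 15, 2),
    ]
    print(select_training_text(posts))
```

The default cutoff of 3 echoes a heuristic OpenAI described for GPT-2's WebText corpus. That corpus kept outbound links from Reddit posts with at least 3 karma, rather than the post text itself. A real pipeline would combine vote scores with other filters, since popular posts are not always accurate ones.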