On Slicks and Satellites: An Open Source Guide to Marine Oil Spill Detection


Article by Wim Zwijnenburg: “The sheer scale of ocean oil pollution is staggering. In Europe, a suspected 3,000 major illegal oil dumps take place annually, with an estimated release of between 15,000 and 60,000 tonnes of oil ending up in the North Sea. In the Mediterranean, figures provided by the Regional Marine Pollution Emergency Response Centre estimate there are 1,500 to 2,000 oil spills every year.

The impact of any single oil spill on a marine or coastal ecosystem can be devastating and long-lasting. Animals such as birds, turtles, dolphins and otters can suffer from ingesting or inhaling oil, as well as getting stuck in the slick. The loss of water and soil quality can be toxic to both flora and fauna. Heavy metals enter the food chain, poisoning everything from plankton to shellfish, which in turn affects the livelihoods of coastal communities dependent on fishing and tourism.

However, with a wealth of open source earth observation tools at our fingertips, during such environmental disasters it’s possible for us to identify and monitor these spills, highlight at-risk areas, and even hold perpetrators accountable. …

There are several different types of remote sensing sensors we can use for collecting data about the Earth’s surface. In this article we’ll focus on two: optical and radar sensors. 

Optical imagery captures the broad light spectrum reflected from the Earth, also known as passive remote sensing. In contrast, Synthetic Aperture Radar (SAR) uses active remote sensing, sending radio waves down to the Earth’s surface and capturing them as they are reflected back. Any change in the reflection can indicate a change on the ground, which can then be investigated. For more background, see Bellingcat contributor Ollie Ballinger’s Remote Sensing for OSINT Guide…(More)”.
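
The idea that a change in the radar reflection can flag something worth investigating can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the guide: the grids, the 3 dB threshold, and the function names `to_db` and `changed_pixels` are all assumptions made up for the example, and real SAR data would first need calibration and speckle filtering.

```python
import math

def to_db(intensity):
    """Convert a linear backscatter intensity to decibels."""
    return 10.0 * math.log10(intensity)

def changed_pixels(before, after, threshold_db=3.0):
    """Return (row, col) positions whose backscatter changed by more than threshold_db."""
    flagged = []
    for r, (row_before, row_after) in enumerate(zip(before, after)):
        for c, (b, a) in enumerate(zip(row_before, row_after)):
            if abs(to_db(a) - to_db(b)) > threshold_db:
                flagged.append((r, c))
    return flagged

# Hypothetical 3x3 grids of linear backscatter values (made-up numbers).
before = [[0.10, 0.10, 0.10],
          [0.10, 0.10, 0.10],
          [0.10, 0.10, 0.10]]
after = [[0.10, 0.10, 0.10],
         [0.10, 0.01, 0.02],
         [0.10, 0.10, 0.10]]

print(changed_pixels(before, after))  # [(1, 1), (1, 2)]
```

Flagged pixels are only candidates for a closer look; wind conditions, algae, and ship wakes can also change the radar return.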

New AI standards group wants to make data scraping opt-in


Article by Kate Knibbs: “The first wave of major generative AI tools largely were trained on “publicly available” data—basically, anything and everything that could be scraped from the Internet. Now, sources of training data are increasingly restricting access and pushing for licensing agreements. With the hunt for additional data sources intensifying, new licensing startups have emerged to keep the source material flowing.

The Dataset Providers Alliance, a trade group formed this summer, wants to make the AI industry more standardized and fair. To that end, it has just released a position paper outlining its stances on major AI-related issues. The alliance is made up of seven AI licensing companies, including music copyright-management firm Rightsify, Japanese stock-photo marketplace Pixta, and generative-AI copyright-licensing startup Calliope Networks. (At least five new members will be announced in the fall.)

The DPA advocates for an opt-in system, meaning that data can be used only after consent is explicitly given by creators and rights holders. This represents a significant departure from the way most major AI companies operate. Some have developed their own opt-out systems, which put the burden on data owners to pull their work on a case-by-case basis. Others offer no opt-outs whatsoever…(More)”.
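
The practical difference between the two consent models comes down to how silence is treated. The sketch below is a hypothetical illustration only: the `Work` record, its `consent` field, and the sample catalog are assumptions for the example and do not describe any system proposed by the DPA or used by an AI company.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Work:
    title: str
    # True = rights holder opted in, False = opted out, None = never answered.
    consent: Optional[bool] = None

def usable_opt_in(works):
    """Opt-in: only works with explicit consent may be used."""
    return [w.title for w in works if w.consent is True]

def usable_opt_out(works):
    """Opt-out: everything may be used unless the owner explicitly refused."""
    return [w.title for w in works if w.consent is not False]

# Hypothetical catalog (made-up entries).
catalog = [Work("song_a", True), Work("photo_b", False), Work("essay_c")]

print(usable_opt_in(catalog))   # ['song_a']
print(usable_opt_out(catalog))  # ['song_a', 'essay_c']
```

The work whose owner never responded (`essay_c`) is excluded under opt-in and included under opt-out, which is where the burden shifts to the data owner.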

Building LLMs for the social sector: Emerging pain points


Blog by Edmund Korley: “…One of the sprint’s main tracks focused on using LLMs to enhance the impact and scale of chat services in the social sector.

Six organizations participated, with operations spanning Africa and India. Bandhu empowers India’s blue-collar workers and migrants by connecting them to jobs and affordable housing, helping them take control of their livelihoods and future stability. Digital Green enhances rural farmers’ agency with AI-driven insights to improve agricultural productivity and livelihoods. Jacaranda Health provides mothers in sub-Saharan Africa with essential information and support to improve maternal and newborn health outcomes. Kabakoo equips youth in Francophone Africa with digital skills, fostering self-reliance and economic independence. Noora Health teaches Indian patients and caregivers critical health skills, enhancing their ability to manage care. Udhyam provides micro-entrepreneurs with education, mentorship, and financial support to build sustainable businesses.

These organizations demonstrate diverse ways one can boost human agency: they help people in underserved communities take control of their lives, make more informed choices, and build better futures – and they are piloting AI interventions to scale these efforts…(More)”.

Using internet search data as part of medical research


Blog by Susan Thomas and Matthew Thompson: “…In the UK, almost 50 million health-related searches are made using Google per year. Globally there are hundreds of millions of health-related searches every day. And, of course, people are doing these searches in real-time, looking for answers to their concerns in the moment. It’s also possible that, even if people aren’t noticing and searching about changes to their health, their behaviour is changing. Maybe they are searching more at night because they are having difficulty sleeping or maybe they are spending more (or less) time online. Maybe an individual’s search history could actually be really useful for researchers. This realisation has led medical researchers to start to explore whether individuals’ online search activity could help provide those subtle, almost unnoticeable signals that point to the beginning of a serious illness.

Our recent review found 23 studies have been published so far that have done exactly this. These studies suggest that online search activity among people later diagnosed with a variety of conditions, ranging from pancreatic cancer and stroke to mood disorders, was different from that of people who did not have one of these conditions.

One of these studies was published by researchers at Imperial College London, who used online search activity to identify signals of women with gynaecological malignancies. They found that women with malignant (e.g. ovarian cancer) and benign conditions had different search patterns, up to two months prior to a GP referral. 

Pause for a moment, and think about what this could mean. Ovarian cancer is one of the most devastating cancers women get. It’s desperately hard to detect early – and yet there are signals of this cancer visible in women’s internet searches months before diagnosis?…(More)”.

Advocating an International Decade for Data under G20 Sponsorship


G20 Policy Brief by Lorrayne Porciuncula, David Passarelli, Muznah Siddiqui, and Stefaan Verhulst: “This brief draws attention to the important role of data in social and economic development. It advocates the establishment of an International Decade for Data (IDD) from 2025 to 2035 under G20 sponsorship. The IDD can be used to bridge existing data governance initiatives and deliver global ambitions to use data for social impact, innovation, economic growth, research, and social development. Despite the critical importance of data governance to achieving the SDGs and to emerging topics such as artificial intelligence, there is no unified space that brings together stakeholders to coordinate and shape the data dimension of digital societies.

While various data governance processes exist, they often operate in silos, without effective coordination and interoperability. This fragmented landscape inhibits progress toward a more inclusive and sustainable digital future. The envisaged IDD fosters an integrated approach to data governance that supports all stakeholders in navigating complex data landscapes. Central to this proposal are new institutional frameworks (e.g. data collaboratives), mechanisms (e.g. digital social licenses and sandboxes), and professional domains (e.g. data stewards), that can respond to the multifaceted issue of data governance and the multiplicity of actors involved.

The G20 can capitalize on the Global Digital Compact’s momentum and create a task force to position itself as a data champion through the launch of the IDD, enabling collective progress and steering global efforts towards a more informed and responsible data-centric society…(More)”.

Frontier AI: double-edged sword for public sector


Article by Zeynep Engin: “The power of the latest AI technologies, often referred to as ‘frontier AI’, lies in their ability to automate decision-making by harnessing complex statistical insights from vast amounts of unstructured data, using models that surpass human understanding. The introduction of ChatGPT in late 2022 marked a new era for these technologies, making advanced AI models accessible to a wide range of users, a development poised to permanently reshape how our societies function.

From a public policy perspective, this capacity offers the optimistic potential to enable personalised services at scale, potentially revolutionising healthcare, education, local services, democratic processes, and justice, tailoring them to everyone’s unique needs in a digitally connected society. The ambition is to achieve better outcomes than humanity has managed so far without AI assistance. There is certainly a vast opportunity for improvement, given the current state of global inequity, environmental degradation, polarised societies, and other chronic challenges facing humanity.

However, it is crucial to temper this optimism by recognising the significant risks. In their current trajectories, these technologies are already starting to undermine hard-won democratic gains and civil rights. Integrating AI into public policy and decision-making processes risks exacerbating existing inequalities and unfairness, potentially leading to new, uncontrollable forms of discrimination at unprecedented speed and scale. The environmental impacts, both direct and indirect, could be catastrophic, while the rise of AI-powered personalised misinformation and behavioural manipulation is contributing to increasingly polarised societies.

Steering the direction of AI to be in the public interest requires a deeper understanding of its characteristics and behaviour. To imagine and design new approaches to public policy and decision-making, we first need a comprehensive understanding of what this remarkable technology offers and its potential implications…(More)”.

Data sovereignty for local governments. Considerations and enablers


Report by JRC: “Data sovereignty for local governments refers to a capacity to control and/or access data, and to foster a digital transformation aligned with societal values and EU Commission political priorities. Data sovereignty clauses are an instrument that local governments may use to compel companies to share data of public interest. Albeit promising, little is known about the peculiarities of this instrument and how it has been implemented so far. This policy brief aims at filling the gap by systematising existing knowledge and providing policy-relevant recommendations for its wider implementation…(More)”.

Breaking the Wall of Digital Heteronomy


Interview with Julia Janssen: “The walls of algorithms increasingly shape your life. Telling what to buy, where to go, what news to believe or songs to listen to. Data helps to navigate the world’s complexity and its endless possibilities. Artificial intelligence promises frictionless experiences, tailored and targeted, seamless and optimized to serve you best. But, at what cost? Frictionlessness comes with obedience. To the machine, the market and your own prophecy.

Mapping the Oblivion researches the influence of data and AI on human autonomy. The installation visualized Netflix’s percentage-based prediction models to provoke questions about to what extent we want to quantify choices. Will you only watch movies that are over 64% to your liking? Dine at restaurants that match your appetite above 76%? Date people with a compatibility rate of 89%? Will you never choose the career you want when there is only a 12% chance you’ll succeed? Do you want to outsmart your intuition with systems you do not understand and follow the map of probabilities and statistics?

Digital heteronomy is a condition in which one is guided by data, governed by AI and ordained by the industry. Homo Sapiens, the knowing being, becomes Homo Stultus, the controllable being.

Living a quantified life in a numeric world. Not having to choose, doubt or wonder. Kept safe, risk-free and predictable within algorithmic walls. Exhausted of autonomy, creativity and randomness. Imprisoned in bubbles, profiles and behavioural tribes. Controllable, observable and monetizable.

Breaking the wall of digital heteronomy means taking back control over our data, identity, choices and chances in life. Honouring the unexpected, risk, doubt and having an unknown future. Shattering the power structures created by Big Tech to harvest information and capitalize on unfairness, vulnerabilities and fears. Breaking the wall of digital heteronomy means breaking down a system where profit is more important than people…(More)”.

AI firms must play fair when they use academic data in training


Nature Editorial: “But others are worried about principles such as attribution, the currency by which science operates. Fair attribution is a condition of reuse under CC BY, a commonly used open-access copyright license. In jurisdictions such as the European Union and Japan, there are exemptions to copyright rules that cover factors such as attribution — for text and data mining in research using automated analysis of sources to find patterns, for example. Some scientists see LLM data-scraping for proprietary LLMs as going well beyond what these exemptions were intended to achieve.

In any case, attribution is impossible when a large commercial LLM uses millions of sources to generate a given output. But when developers create AI tools for use in science, a method known as retrieval-augmented generation could help. This technique doesn’t apportion credit to the data that trained the LLM, but does allow the model to cite papers that are relevant to its output, says Lucy Lu Wang, an AI researcher at the University of Washington in Seattle.

Giving researchers the ability to opt out of having their work used in LLM training could also ease their worries. Creators have this right under EU law, but it is tough to enforce in practice, says Yaniv Benhamou, who studies digital law and copyright at the University of Geneva. Firms are devising innovative ways to make it easier. Spawning, a start-up company in Minneapolis, Minnesota, has developed tools to allow creators to opt out of data scraping. Some developers are also getting on board: OpenAI’s Media Manager tool, for example, allows creators to specify how their works can be used by machine-learning algorithms…(More)”.
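
The retrieval-augmented generation approach mentioned above can be illustrated with a toy retrieval step: find the sources most relevant to a question and return their identifiers alongside the text that would be handed to a model. This is a rough sketch under stated assumptions, not how any production system works. The paper records are invented, word overlap stands in for a real relevance model, and the language-model call itself is left out.

```python
import re

def tokens(text):
    """Lower-case a string and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query, papers, top_k=2):
    """Rank papers by how many query words appear in their abstract."""
    q = tokens(query)
    scored = []
    for paper in papers:
        overlap = len(q & tokens(paper["abstract"]))
        if overlap > 0:
            scored.append((overlap, paper))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [paper for _, paper in scored[:top_k]]

def answer_with_citations(query, papers):
    """Collect retrieved text and the ids of the papers it came from."""
    sources = retrieve(query, papers)
    context = " ".join(p["abstract"] for p in sources)
    # A real system would pass `context` to a language model here (omitted).
    return {"context": context, "citations": [p["id"] for p in sources]}

# Hypothetical paper records (made up for this example).
papers = [
    {"id": "paper-1", "abstract": "Radar imagery reveals oil slicks at sea"},
    {"id": "paper-2", "abstract": "Search queries precede cancer diagnosis"},
    {"id": "paper-3", "abstract": "Paper archives in early modern empires"},
]

result = answer_with_citations("How can radar detect oil slicks?", papers)
print(result["citations"])  # ['paper-1']
```

The citations here credit the documents retrieved at answer time, not the data the model was trained on, which matches the limitation the editorial describes.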

The Imperial Origins of Big Data


Blog and book by Asheesh Kapur Siddique: “We live in a moment of massive transformation in the nature of information. In 2020, according to one report, users of the Internet created 64.2 zettabytes of data, a quantity greater than the “number of detectable stars in the cosmos,” a colossal increase whose origins can be traced to the emergence of the World Wide Web in 1993. Facilitated by technologies like satellites, smartphones, and artificial intelligence, the scale and speed of data creation seem like they may only balloon over the rest of our lifetimes, and with them the problem of how to govern ourselves in relation to the inequalities and opportunities that the explosion of data creates.

But while much about our era of big data is indeed revolutionary, the political questions that it raises—How should information be used? Who should control it? And how should it be preserved?—are ones with which societies have long grappled. These questions attained a particular importance in Europe from the eleventh century due to a technological change no less significant than the ones we are witnessing today: the introduction of paper into Europe. Initially invented in China, paper travelled to Europe via the conduit of Islam around the eleventh century after the Moors conquered Spain. Over the twelfth, thirteenth, and fourteenth centuries, paper emerged as the fundamental substrate which politicians, merchants, and scholars relied on to record and circulate information in governance, commerce, and learning. At the same time, governing institutions sought to preserve and control the spread of written information through the creation of archives: repositories where they collected, organized, and stored documents.

The expansion of European polities overseas from the late fifteenth century onward saw governments massively scale up their use of paper, and confront the challenge of controlling its dissemination across thousands of miles of ocean and land. These pressures were felt particularly acutely in what eventually became the largest empire in world history, the British Empire. As people from the British Isles from the early seventeenth century fought, traded, and settled their way to power in the Atlantic world and South Asia, administrators faced the problem of how to govern both their emigrating subjects and the non-British peoples with whom they interacted. This meant collecting information about their behavior through the technology of paper. Just as we struggle to organize, search, and control our email boxes, text messages, and app notifications, so too did these early moderns confront the attendant challenges of developing practices of collection and storage to manage the resulting information overload. And despite the best efforts of states and companies to control information, it constantly escaped their grasp, falling into the hands of their opponents and rivals who deployed it to challenge and contest ruling powers.

The history of the early modern information state offers no simple or straightforward answers to the questions that data raises for us today. But it does remind us of a crucial truth, all too readily obscured by the deluge of popular narratives glorifying technological innovation: that questions of data are inherently questions about politics—about who gets to collect, control, and use information, and the ends to which information should be put. We should resist any effort to insulate data governance from democratic processes—and having an informed perspective on the politics of data requires that we attend not just to its present, but also to its past…(More)”.