How Years of Reddit Posts Have Made the Company an AI Darling


Article by Sarah E. Needleman: “Artificial-intelligence companies were one of Reddit’s biggest frustrations last year. Now they are a key source of growth for the social-media platform. 

These companies have an insatiable appetite for online data to train their models and display content in an easy-to-digest format. In mid-2023, Reddit, a social-media veteran and IPO newbie, turned off the spigot and began charging some businesses for access to its data. 

It turns out that Reddit’s ever-growing 19-year warehouse of user commentary makes it an attractive resource for AI companies. The platform recently reported its first quarterly profit as a publicly traded company, thanks partly to data-licensing deals it made in the past year with OpenAI and Google.

Reddit Chief Executive and co-founder Steve Huffman has said the company had to stop giving away its valuable data to the world’s largest companies for free. 

“It is an arms race,” he said at The Wall Street Journal’s Tech Live conference in October. “But we’re in talks with just about everybody, so we’ll see where these things land.”

Reddit’s huge amount of data works well for AI companies because it is organized by topics and uses a voting system instead of an algorithm to sort content quality, and because people’s posts tend to be candid.

For the first nine months of 2024, Reddit’s revenue category that includes licensing grew to $81.6 million from $12.3 million a year earlier.

While data-licensing revenue remains dwarfed by Reddit’s core advertising sales, the new category’s rapid growth reveals a potential lucrative business line with relatively high margins.

Diversifying away from a reliance on advertising, while tapping into an AI-adjacent market, has also made Reddit attractive to investors who are searching for new exposure to the latest technology boom. Reddit’s stock has more than doubled in the past three months.

The source of Reddit’s newfound wealth is the burgeoning market for AI-useful data. Reddit’s willingness to sell its data to AI outfits makes it stand out, because there is only a finite amount of data available for AI companies to gobble up for free or purchase. Some executives and researchers say the industry’s need for high-quality text could outstrip supply within two years, potentially slowing AI’s development…(More)”.

Citizen science as an instrument for women’s health research


Paper by Sarah Ahannach et al: “Women’s health research is receiving increasing attention globally, but considerable knowledge gaps remain. Across many fields of research, active involvement of citizens in science has emerged as a promising strategy to help align scientific research with societal needs. Citizen science offers researchers the opportunity for large-scale sampling and data acquisition while engaging the public in a co-creative approach that solicits their input on study aims, research design, data gathering and analysis. Here, we argue that citizen science has the potential to generate new data and insights that advance women’s health. Based on our experience with the international Isala project, which used a citizen-science approach to study the female microbiome and its influence on health, we address key challenges and lessons for generating a holistic, community-centered approach to women’s health research. We advocate for interdisciplinary collaborations to fully leverage citizen science in women’s health toward a more inclusive research landscape that amplifies underrepresented voices, challenges taboos around intimate health topics and prioritizes women’s involvement in shaping health research agendas…(More)”.

Changing Behaviour by Adding an Option


Paper by Lukas Fuchs: “Adding an option is a neglected mechanism for bringing about behavioural change. This mechanism is distinct from nudges, which are changes in the choice architecture, and instead makes it possible to pursue republican paternalism, a unique form of paternalism in which choices are changed by expanding people’s set of options. I argue that this is truly a form of paternalism (albeit a relatively soft one) and illustrate some of its manifestations in public policy, specifically public options and market creation. Furthermore, I compare it with libertarian paternalism on several dimensions, namely respect for individuals’ agency, effectiveness, and efficiency. Finally, I consider whether policymakers have the necessary knowledge to successfully change behaviour by adding options. Given that adding an option has key advantages over nudges in most if not all of these dimensions, it should be considered indispensable in the behavioural policymaker’s toolbox…(More)”.

Must NLP be Extractive?


Paper by Steven Bird: “How do we roll out language technologies across a world with 7,000 languages? In one story, we scale the successes of NLP further into ‘low-resource’ languages, doing ever more with less. However, this approach does not recognise the fact that – beyond the 500 institutional languages – the remaining languages are oral vernaculars. These speech communities interact with the outside world using a ‘con-
tact language’. I argue that contact languages are the appropriate target for technologies like speech recognition and machine translation, and that the 6,500 oral vernaculars should be approached differently. I share stories from an Indigenous community where local people reshaped an extractive agenda to align with their relational agenda. I describe the emerging paradigm of Relational NLP and explain how it opens the way to non-extractive methods and to solutions that enhance human agency…(More)”

Navigating the AI Frontier: A Primer on the Evolution and Impact of AI Agents


Report by the World Economic Forum: “AI agents are autonomous systems capable of sensing, learning and acting upon their environments. This white paper explores their development and looks at how they are linked to recent advances in large language and multimodal models. It highlights how AI agents can enhance efficiency across sectors including healthcare, education and finance.

Tracing their evolution from simple rule-based programmes to sophisticated entities with complex decision-making abilities, the paper discusses both the benefits and the risks associated with AI agents. Ethical considerations such as transparency and accountability are emphasized, highlighting the need for robust governance frameworks and cross-sector collaboration.

By understanding the opportunities and challenges that AI agents present, stakeholders can responsibly leverage these systems to drive innovation, improve practices and enhance quality of life. This primer serves as a valuable resource for anyone seeking to gain a better grasp of this rapidly advancing field…(More)”.

The Limitations of Consent as a Legal Basis for Data Processing in the Digital Society


Paper by the Centre for Information Policy Leadership: “Contemporary everyday life is increasingly permeated by digital information, whether by creating, consuming or depending on it. Most of our professional and private lives now rely to a large degree on digital interactions. As a result, access to and the use of data, and in particular personal data, are key elements and drivers of the digital economy and society. This has brought us to a significant inflection point on the issue of legitimising the processing of personal data in the wide range of contexts that are essential to our data-driven, AI-enabled digital products and services. The time has come to seriously re-consider the status of consent as a privileged legal basis and to consider alternatives that are better suited for a wide range of essential data processing contexts. The most prominent among these alternatives are the “legitimate interest” and “contractual necessity” legal bases, which have found an equivalent in a number of jurisdictions. One example is Singapore, where revisions to their data protection framework include a legitimate interest exemption…(More)”.

Humanitarian Mapping with WhatsApp: Introducing ChatMap


Article by Emilio Mariscal: “…After some exploration, I came up with an idea: what if we could export chat conversations and extract the location data along with the associated messages? The solution would involve a straightforward application where users can upload their exported chats and instantly generate a map displaying all shared locations and messages. No business accounts or complex integrations would be required—just a simple, ready-to-use tool from day one.

ChatMap —chatmap.hotosm.org — is a straightforward and simple mapping solution that leverages WhatsApp, an application used by 2.78 billion people worldwide. Its simplicity and accessibility make it an effective tool for communities with limited technical knowledge. And it even works offline! as it relies on the GPS signal for location, sending all data with the phone to gather connectivity.

This solution provides complete independence, as it does not require users to adopt a technology that depends on third-party maintenance. It’s a simple data flow with an equally straightforward script that can be improved by anyone interested on GitHub.

We’re already using it! Recently, as part of a community mapping project to assess the risks in the slopes of Comuna 8 in Medellín, an area vulnerable to repeated flooding, a group of students and local collectives collaborated with the Humanitarian OpenStreetMap (HOT) to map areas affected by landslides and other disaster impacts. This initiative facilitated the identification and characterization of settlements, supporting humanitarian aid efforts.

Humanitarian Mapping ChatMap.jpg
Photo by Daniela Arbeláez Suárez (source: WhatsApp)

As shown in the picture, the community explored the area on foot, using their phones to take photos and notes, and shared them along with the location. It was incredibly simple!

The data gathered during this activity was transformed 20 minutes later (once getting access to a WIFI network) into a map, which was then uploaded to our online platform powered by uMap (umap.hotosm.org)…(More)”.

Humanitarian Mapping ChatMap WhatsApp Colombia.jpg
See more at https://umap.hotosm.org/en/map/unaula-mapea-con-whatsapp_38

Innovating with Non-Traditional Data: Recent Use Cases for Unlocking Public Value


Article by Stefaan Verhulst and Adam Zable: “Non-Traditional Data (NTD): “data that is digitally captured (e.g. mobile phone records), mediated (e.g. social media), or observed (e.g. satellite imagery), using new instrumentation mechanisms, often privately held.”

Digitalization and the resulting datafication have introduced a new category of data that, when re-used responsibly, can complement traditional data in addressing public interest questions—from public health to environmental conservation. Unlocking these often privately held datasets through data collaboratives is a key focus of what we have called The Third Wave of Open Data

To help bridge this gap, we have curated below recent examples of the use of NTD for research and decision-making that were published the past few months. They are organized into five categories:

  • Health and Well-being;
  • Humanitarian Aid;
  • Environment and Climate;
  • Urban Systems and Mobility, and 
  • Economic and Labor Dynamics…(More)”.

It Was the Best of Times, It Was the Worst of Times: The Dual Realities of Data Access in the Age of Generative AI


Article by Stefaan Verhulst: “It was the best of times, it was the worst of times… It was the spring of hope, it was the winter of despair.” –Charles Dickens, A Tale of Two Cities

Charles Dickens’s famous line captures the contradictions of the present moment in the world of data. On the one hand, data has become central to addressing humanity’s most pressing challenges — climate change, healthcare, economic development, public policy, and scientific discovery. On the other hand, despite the unprecedented quantity of data being generated, significant obstacles remain to accessing and reusing it. As our digital ecosystems evolve, including the rapid advances in artificial intelligence, we find ourselves both on the verge of a golden era of open data and at risk of slipping deeper into a restrictive “data winter.”

A Tale of Two Cities by Charles Dickens (1902)

These two realities are concurrent: the challenges posed by growing restrictions on data reuse, and the countervailing potential brought by advancements in privacy-enhancing technologies (PETs), synthetic data, and data commons approaches. It argues that while current trends toward closed data ecosystems threaten innovation, new technologies and frameworks could lead to a “Fourth Wave of Open Data,” potentially ushering in a new era of data accessibility and collaboration…(More)” (First Published in Industry Data for Society Partnership’s (IDSP) 2024 Year in Review).

Space, Satellites, and Democracy: Implications of the New Space Age for Democratic Processes and Recommendations for Action


NDI Report: “The dawn of a new space age is upon us, marked by unprecedented engagement from both state and private actors. Driven by technological innovations such as reusable rockets and miniaturized satellites, this era presents a double-edged sword for global democracy. On one side, democratized access to space offers powerful tools for enhancing civic processes. Satellite technology now enables real-time election monitoring, improved communication in remote areas, and more effective public infrastructure planning. It also equips democratic actors with means to document human rights abuses and circumvent authoritarian internet restrictions.

However, the accessibility of these technologies also raises significant concerns. The potential for privacy infringements and misuse by authoritarian regimes or malicious actors casts a shadow over these advancements.

This report discusses the opportunities and risks that space and satellite technologies pose to democracy, human rights, and civic processes globally. It examines the current regulatory and normative frameworks governing space activities and highlights key considerations for stakeholders navigating this increasingly competitive domain.

It is essential that the global democracy community be familiar with emerging trends in space and satellite technology and their implications for the future. Failure to do so will leave the community unprepared to harness the opportunities or address the challenges that space capabilities present. It would also cede influence over the development of global norms and standards in this arena to states and private sector interests alone and, in turn, ensure those standards are not rooted in democratic norms and human rights, but rather in principles such as state sovereignty and profit maximization…(More)”.