Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper by Xin Chan et al.: “We propose a novel persona-driven data synthesis methodology that leverages various perspectives within a large language model (LLM) to create diverse synthetic data. To fully exploit this methodology at scale, we introduce Persona Hub — a collection of 1 billion diverse personas automatically curated from web data. These 1 billion personas (~13% of the world’s total population), acting as distributed carriers of world knowledge, can tap into almost every perspective encapsulated within the LLM, thereby facilitating the creation of diverse synthetic data at scale for various scenarios. By showcasing Persona Hub’s use cases in synthesizing high-quality mathematical and logical reasoning problems, instructions (i.e., user prompts), knowledge-rich texts, game NPCs and tools (functions) at scale, we demonstrate persona-driven data synthesis is versatile, scalable, flexible, and easy to use, potentially driving a paradigm shift in synthetic data creation and applications in practice, which may have a profound impact on LLM research and development…(More)”.
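The core idea is simple to sketch: the same task prompt, prefixed with different personas, steers an LLM toward different perspectives and hence more diverse outputs. A minimal illustration of the prompt construction (the personas and task string here are invented examples, not drawn from Persona Hub, and no actual LLM call is made):

```python
# Persona-driven prompting: prepend a persona to a fixed task so each
# completion reflects a different perspective. The resulting prompts
# would each be sent to a chat-completion API of your choice.
personas = [
    "a pediatric nurse in a rural clinic",
    "a high-school math teacher",
    "a freight logistics coordinator",
]

def make_prompt(persona: str, task: str) -> str:
    """Combine a persona description with a shared task prompt."""
    return f"You are {persona}. {task}"

task = "Write a challenging math word problem drawn from your daily work."
prompts = [make_prompt(p, task) for p in personas]
for p in prompts:
    print(p)
```

With a billion personas, the same pattern yields a billion distinct prompts from a single task template, which is what makes the approach scale.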

Collaborating with Journalists and AI: Leveraging Social Media Images for Enhanced Disaster Resilience and Recovery

Paper by Murthy Dhiraj et al.: “Methods to meaningfully integrate journalists into crisis informatics remain lacking. We explored the feasibility of generating a real-time, priority-driven map of infrastructure damage during a natural disaster by strategically selecting journalist networks to identify sources of image-based infrastructure-damage data. Using the Twitter REST API, we collected 1,000,522 tweets from September 13-18, 2018, during and after Hurricane Florence made landfall in the United States. Tweets were classified by source (e.g., news organizations or citizen journalists), and 11,638 images were extracted. We used Google’s AutoML Vision software to develop a machine learning image-classification model to interpret this sample of images. Of the labeled data, 80% was used for training, 10% for validation, and 10% for testing. The model achieved an average precision of 90.6%, an average recall of 77.2%, and an F1 score of 0.834. In the future, establishing strategic networks of journalists ahead of disasters will reduce the time needed to identify disaster-response targets, thereby focusing relief and recovery efforts in real time. This approach ultimately aims to save lives and mitigate harm…(More)”.
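The reported metrics are internally consistent: F1 is the harmonic mean of precision and recall, and plugging in the paper's figures reproduces the stated score. A quick check:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Figures reported in the Hurricane Florence study
precision = 0.906
recall = 0.772
print(round(f1_score(precision, recall), 3))  # → 0.834
```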

Building an AI ecosystem in a small nation: lessons from Singapore’s journey to the forefront of AI

Paper by Shaleen Khanal, Hongzhou Zhang & Araz Taeihagh: “Artificial intelligence (AI) is arguably the most transformative technology of our time. While all nations would like to mobilize their resources to play an active role in AI development and utilization, only a few nations, such as the United States and China, have the resources and capacity to do so. How, then, can smaller or less-resourced countries navigate the technological terrain to emerge at the forefront of AI development? This research presents an in-depth analysis of Singapore’s journey in constructing a robust AI ecosystem amidst the prevailing global dominance of the United States and China. By examining the case of Singapore, we argue that by designing policies that address risks associated with AI development and implementation, smaller countries can create a vibrant AI ecosystem that encourages experimentation and early adoption of the technology. In addition, through Singapore’s case, we demonstrate the active role the government can play, not only as a policymaker but also as a steward to guide the rest of the economy towards the application of AI…(More)”.

Future of Professionals

Report by Thomson Reuters: “First, the productivity benefits we have been promised are now becoming more apparent. As AI adoption has become widespread, professionals can more tangibly tell us about how they will use this transformative technology and the greater efficiency and value it will provide. The most common use cases for AI-powered technology thus far include drafting documents, summarizing information, and performing basic research. Second, there’s a tremendous sense of excitement about the value that new AI-powered technology can bring to the day-to-day lives of the professionals we surveyed. While more than half of professionals said they’re most excited about the time savings that new AI-powered technologies can bring, nearly 40% said they are most excited about the new value these technologies will create.

This report highlights how AI could free up that precious commodity of time. As with the adoption of all new technology, change appears moderate and the impact incremental. And yet, within the year, our respondents predicted that for professionals, AI could free up as much as four hours a week. What will they do with 200 extra hours of time a year? They might reinvest that time in strategic work, innovation, and professional development, which could help companies retain or advance their competitive advantage. Imagine the broader impact on the economy and GDP from this increased efficiency. For US lawyers alone, that is a combined 266 million hours of increased productivity. That could translate into $100,000 in new, billable time per lawyer each year, based on current average rates – with similar productivity gains projected across various professions. The time saved can also be reinvested in professional development, nurturing work-life balance, and focusing on wellness and mental health. Moreover, the economic and organizational benefits of these time-savings are substantial. They could lead to reduced operational costs and higher efficiency, while enabling organizations to redirect resources toward strategic initiatives, fostering growth and competitiveness.
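The report's headline figures follow from straightforward arithmetic. A sketch of how they connect (the ~50 working weeks per year, the implied lawyer count, and the implied hourly rate are back-calculated assumptions, not figures stated in the report):

```python
# Four hours saved per week, over roughly 50 working weeks a year
hours_per_week_saved = 4
working_weeks_per_year = 50          # assumption: ~50 working weeks
hours_per_year = hours_per_week_saved * working_weeks_per_year
print(hours_per_year)                # → 200

# 266 million combined hours for US lawyers implies this many lawyers
total_us_lawyer_hours = 266_000_000
implied_lawyers = total_us_lawyer_hours // hours_per_year
print(implied_lawyers)               # → 1330000

# $100,000 of new billable time over 200 hours implies this hourly rate
new_billable_per_lawyer = 100_000
implied_hourly_rate = new_billable_per_lawyer / hours_per_year
print(implied_hourly_rate)           # → 500.0
```

In other words, the $100,000-per-lawyer figure presumes roughly $500 in billings for every saved hour, so the projection is sensitive to both the hours-saved estimate and prevailing rates.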

Finally, it’s important to acknowledge there’s still a healthy amount of reticence among professionals to fully adopt AI. Respondents are concerned primarily with the accuracy of outputs, and almost two-thirds of respondents agreed that data security is a vital component of responsible use. These concerns aren’t trivial, and they warrant attention as we navigate this new era of technology. While AI can provide tremendous productivity benefits to professionals and generate greater value for businesses, that’s only possible if we build and use this technology responsibly.”…(More)”.

AI-Ready FAIR Data: Accelerating Science through Responsible AI and Data Stewardship

Article by Sean Hill: “Imagine a future where scientific discovery is unbound by the limitations of data accessibility and interoperability. In this future, researchers across all disciplines — from biology and chemistry to astronomy and social sciences — can seamlessly access, integrate, and analyze vast datasets with the assistance of advanced artificial intelligence (AI). This world is one where AI-ready data empowers scientists to unravel complex problems at unprecedented speeds, leading to breakthroughs in medicine, environmental conservation, technology, and more. The vision of a truly FAIR (Findable, Accessible, Interoperable, Reusable) and AI-ready data ecosystem, underpinned by Responsible AI (RAI) practices and the pivotal role of data stewards, promises to revolutionize the way science is conducted, fostering an era of rapid innovation and global collaboration…(More)”.

The Economy of Algorithms

Book by Marek Kowalkiewicz: “Welcome to the economy of algorithms. It’s here and it’s growing. In the past few years, we have been flooded with examples of impressive technology. Algorithms have been around for hundreds of years, but they have only recently begun to ‘escape’ our understanding. When algorithms perform certain tasks, they’re not just as good as us; they’re becoming infinitely better and, at the same time, massively more surprising. We are so impressed by what they can do that we give them a lot of agency. But because they are so hard to comprehend, this leads to all kinds of unintended consequences.

In the 20th century, things were simple: we had the economy of corporations. In the first two decades of the 21st century, we saw the emergence of the economy of people, otherwise known as the digital economy, enabled by the internet. Now we’re seeing a new economy take shape: the economy of algorithms…(More)”.

UN adopts Chinese resolution with US support on closing the gap in access to artificial intelligence

Article by Edith Lederer: “The U.N. General Assembly adopted a Chinese-sponsored resolution with U.S. support urging wealthy developed nations to close the widening gap with poorer developing countries and ensure that they have equal opportunities to use and benefit from artificial intelligence.

The resolution approved Monday follows the March 21 adoption of the first U.N. resolution on artificial intelligence spearheaded by the United States and co-sponsored by 123 countries including China. It gave global support to the international effort to ensure that AI is “safe, secure and trustworthy” and that all nations can take advantage of it.

Adoption of the two nonbinding resolutions shows that the United States and China, rivals in many areas, are both determined to be key players in shaping the future of the powerful new technology — and have been cooperating on the first important international steps.

The adoption of both resolutions by consensus by the 193-member General Assembly shows widespread global support for their leadership on the issue.

Fu Cong, China’s U.N. ambassador, told reporters Monday that the two resolutions are complementary, with the U.S. measure being “more general” and the just-adopted one focusing on “capacity building.”

He called the Chinese resolution, which had more than 140 sponsors, “great and far-reaching,” and said, “We’re very appreciative of the positive role that the U.S. has played in this whole process.”

Nate Evans, spokesperson for the U.S. mission to the United Nations, said Tuesday that the Chinese-sponsored resolution “was negotiated so it would further the vision and approach the U.S. set out in March.”

“We worked diligently and in good faith with developing and developed countries to strengthen the text, ensuring it reaffirms safe, secure, and trustworthy AI that respects human rights, commits to digital inclusion, and advances sustainable development,” Evans said.

Fu said that AI technology is advancing extremely fast and the issue has been discussed at very senior levels, including by the U.S. and Chinese leaders.

“We do look forward to intensifying our cooperation with the United States and for that matter with all countries in the world on this issue, which … will have far-reaching implications in all dimensions,” he said…(More)”.

Not all ‘open source’ AI models are actually open: here’s a ranking

Article by Elizabeth Gibney: “Technology giants such as Meta and Microsoft are describing their artificial intelligence (AI) models as ‘open source’ while failing to disclose important information about the underlying technology, say researchers who analysed a host of popular chatbot models.

The definition of open source when it comes to AI models is not yet agreed, but advocates say that ’full’ openness boosts science, and is crucial for efforts to make AI accountable. What counts as open source is likely to take on increased importance when the European Union’s Artificial Intelligence Act comes into force. The legislation will apply less strict regulations to models that are classed as open.

Some big firms are reaping the benefits of claiming to have open-source models, while trying “to get away with disclosing as little as possible”, says Mark Dingemanse, a language scientist at Radboud University in Nijmegen, the Netherlands. This practice is known as open-washing.

“To our surprise, it was the small players, with relatively few resources, that go the extra mile,” says Dingemanse, who together with his colleague Andreas Liesenfeld, a computational linguist, created a league table that identifies the most and least open models (see table). They published their findings on 5 June in the conference proceedings of the 2024 ACM Conference on Fairness, Accountability and Transparency…(More)”.

Artificial Intelligence Is Making The Housing Crisis Worse

Article by Rebecca Burns: “When Chris Robinson applied to move into a California senior living community five years ago, the property manager ran his name through an automated screening program that reportedly used artificial intelligence to detect “higher-risk renters.” Robinson, then 75, was denied after the program assigned him a low score — one that he later learned was based on a past conviction for littering.

Not only did the crime have little bearing on whether Robinson would be a good tenant, it wasn’t even one that he’d committed. The program had turned up the case of a 33-year-old man with the same name in Texas — where Robinson had never lived. He eventually corrected the error but lost the apartment and his application fee nonetheless, according to a federal class-action lawsuit that moved towards settlement this month. The credit bureau TransUnion, one of the largest actors in the multi-billion-dollar tenant screening industry, agreed to pay $11.5 million to resolve claims that its programs violated fair credit reporting laws.

Landlords are increasingly turning to private equity-backed artificial intelligence (AI) screening programs to help them select tenants, and resulting cases like Robinson’s are just the tip of the iceberg. The prevalence of incorrect, outdated, or misleading information in such reports is increasing costs and barriers to housing, according to a recent report from federal consumer regulators.

Even when screening programs turn up real data, housing and privacy advocates warn that opaque algorithms are enshrining high-tech discrimination in an already unequal housing market — the latest example of how AI can end up amplifying existing biases…(More)”.

What the Arrival of A.I. Phones and Computers Means for Our Data

Article by Brian X. Chen: “Apple, Microsoft and Google are heralding a new era of what they describe as artificially intelligent smartphones and computers. The devices, they say, will automate tasks like editing photos and wishing a friend a happy birthday.

But to make that work, these companies need something from you: more data.

In this new paradigm, your Windows computer will take a screenshot of everything you do every few seconds. An iPhone will stitch together information across many apps you use. And an Android phone can listen to a call in real time to alert you to a scam.

Is this information you are willing to share?

This change has significant implications for our privacy. To provide the new bespoke services, the companies and their devices need more persistent, intimate access to our data than before. In the past, the way we used apps and pulled up files and photos on phones and computers was relatively siloed. A.I. needs an overview to connect the dots between what we do across apps, websites and communications, security experts say.

“Do I feel safe giving this information to this company?” Cliff Steinhauer, a director at the National Cybersecurity Alliance, a nonprofit focusing on cybersecurity, said about the companies’ A.I. strategies.

All of this is happening because OpenAI’s ChatGPT upended the tech industry nearly two years ago. Apple, Google, Microsoft and others have since overhauled their product strategies, investing billions in new services under the umbrella term of A.I. They are convinced this new type of computing interface — one that is constantly studying what you are doing to offer assistance — will become indispensable.

The biggest potential security risk with this change stems from a subtle shift happening in the way our new devices work, experts say. Because A.I. can automate complex actions — like scrubbing unwanted objects from a photo — it sometimes requires more computational power than our phones can handle. That means more of our personal data may have to leave our phones to be dealt with elsewhere.

The information is being transmitted to the so-called cloud, a network of servers that are processing the requests. Once information reaches the cloud, it could be seen by others, including company employees, bad actors and government agencies. And while some of our data has always been stored in the cloud, our most deeply personal, intimate data that was once for our eyes only — photos, messages and emails — now may be connected and analyzed by a company on its servers…(More)”.