Synthetic Data and the Future of AI


Paper by Peter Lee: “The future of artificial intelligence (AI) is synthetic. Several of the most prominent technical and legal challenges of AI derive from the need to amass huge amounts of real-world data to train machine learning (ML) models. Collecting such real-world data can be highly difficult and can threaten privacy, introduce bias in automated decision making, and infringe copyrights on a massive scale. This Article explores the emergence of a seemingly paradoxical technical creation that can mitigate—though not completely eliminate—these concerns: synthetic data. Increasingly, data scientists are using simulated driving environments, fabricated medical records, fake images, and other forms of synthetic data to train ML models. Artificial data, in other words, is being used to train artificial intelligence. Synthetic data offers a host of technical and legal benefits; it promises to radically decrease the cost of obtaining data, sidestep privacy issues, reduce automated discrimination, and avoid copyright infringement. Alongside such promise, however, synthetic data offers perils as well. Deficiencies in the development and deployment of synthetic data can exacerbate the dangers of AI and cause significant social harm.

In light of the enormous value and importance of synthetic data, this Article sketches the contours of an innovation ecosystem to promote its robust and responsible development. It identifies three objectives that should guide legal and policy measures shaping the creation of synthetic data: provisioning, disclosure, and democratization. Ideally, such an ecosystem should incentivize the generation of high-quality synthetic data, encourage disclosure of both synthetic data and processes for generating it, and promote multiple sources of innovation. This Article then examines a suite of “innovation mechanisms” that can advance these objectives, ranging from open source production to proprietary approaches based on patents, trade secrets, and copyrights. Throughout, it suggests policy and doctrinal reforms to enhance innovation, transparency, and democratic access to synthetic data. Just as AI will have enormous legal implications, law and policy can play a central role in shaping the future of AI…(More)”.

Prompting Diverse Ideas: Increasing AI Idea Variance


Paper by Lennart Meincke, Ethan Mollick, and Christian Terwiesch: “Unlike routine tasks where consistency is prized, in creativity and innovation the goal is to create a diverse set of ideas. This paper delves into the burgeoning interest in employing Artificial Intelligence (AI) to enhance the productivity and quality of the idea generation process. While previous studies have found that the average quality of AI ideas is quite high, prior research also has pointed to the inability of AI-based brainstorming to create sufficient dispersion of ideas, which limits novelty and the quality of the overall best idea. Our research investigates methods to increase the dispersion in AI-generated ideas. Using GPT-4, we explore the effect of different prompting methods on Cosine Similarity, the number of unique ideas, and the speed with which the idea space gets exhausted. We do this in the domain of developing a new product for college students, priced under $50. In this context, we find that (1) pools of ideas generated by GPT-4 with various plausible prompts are less diverse than ideas generated by groups of human subjects; (2) the diversity of AI-generated ideas can be substantially improved using prompt engineering; and (3) Chain-of-Thought (CoT) prompting leads to the highest diversity of ideas of all prompts we evaluated and was able to come close to what is achieved by groups of human subjects. It also was capable of generating the highest number of unique ideas of any prompt we studied…(More)”
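The abstract measures idea dispersion with cosine similarity between ideas. A minimal sketch of that kind of metric, assuming ideas have already been turned into embedding vectors (the toy vectors and the averaging scheme below are illustrative stand-ins, not the paper's actual pipeline):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two idea-embedding vectors:
    # 1.0 means near-duplicate ideas; values near 0 mean the
    # ideas point in unrelated semantic directions.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def mean_pairwise_similarity(embeddings):
    # Average similarity over all unordered pairs; a LOWER score
    # indicates a MORE diverse pool of ideas.
    n = len(embeddings)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    total = sum(cosine_similarity(embeddings[i], embeddings[j]) for i, j in pairs)
    return total / len(pairs)

# Hypothetical embeddings for three generated ideas (a real system
# would obtain these from a sentence-embedding model).
ideas = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],   # close to the first idea -> high similarity
    [0.0, 0.1, 0.9],   # a genuinely different idea
]
print(round(mean_pairwise_similarity(ideas), 3))
```

Under a metric like this, a prompting method "increases diversity" when it lowers the mean pairwise similarity of the idea pool.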

All the News That’s Fit to Click: How Metrics Are Transforming the Work of Journalists


Book by Caitlin Petre: “Journalists today are inundated with data about which stories attract the most clicks, likes, comments, and shares. These metrics influence what stories are written, how news is promoted, and even which journalists get hired and fired. Do metrics make journalists more accountable to the public? Or are these data tools the contemporary equivalent of a stopwatch wielded by a factory boss, worsening newsroom working conditions and journalism quality? In All the News That’s Fit to Click, Caitlin Petre takes readers behind the scenes at the New York Times, Gawker, and the prominent news analytics company Chartbeat to explore how performance metrics are transforming the work of journalism.

Petre describes how digital metrics are a powerful but insidious new form of managerial surveillance and discipline. Real-time analytics tools are designed to win the trust and loyalty of wary journalists by mimicking key features of addictive games, including immersive displays, instant feedback, and constantly updated “scores” and rankings. Many journalists get hooked on metrics—and pressure themselves to work ever harder to boost their numbers.

Yet this is not a simple story of managerial domination. Contrary to the typical perception of metrics as inevitably disempowering, Petre shows how some journalists leverage metrics to their advantage, using them to advocate for their professional worth and autonomy…(More)”.

Trust in AI companies drops to 35 percent in new study


Article by Filip Timotija: “Trust in artificial intelligence (AI) companies has dipped to 35 percent over a five-year period in the U.S., according to new data.

The data, released Tuesday by public relations firm Edelman, found that trust in AI companies also dropped globally by eight points, going from 61 percent to 53 percent. 

The dwindling confidence in the rapidly-developing tech industry comes as regulators in the U.S. and across the globe are brainstorming solutions on how to regulate the sector. 

When broken down by political party, researchers found Democrats showed the most trust in AI companies at 38 percent — compared to Republicans’ 24 percent and independents’ 25 percent, per the study.

Multiple factors contributed to the decline in trust toward the companies polled in the data, according to Justin Westcott, Edelman’s chair of global technology.

“Key among these are fears related to privacy invasion, the potential for AI to devalue human contributions, and apprehensions about unregulated technological leaps outpacing ethical considerations,” Westcott said, adding “the data points to a perceived lack of transparency and accountability in how AI companies operate and engage with societal impacts.”

Technology as a whole is losing its lead in trust among sectors, Edelman said, highlighting the key findings from the study.

“Eight years ago, technology was the leading industry in trust in 90 percent of the countries we study,” researchers wrote, referring to the 28 countries. “Now it is most trusted only in half.”

Westcott argued the findings should be a “wake up call” for AI companies to “build back credibility through ethical innovation, genuine community engagement and partnerships that place people and their concerns at the heart of AI developments.”

As for the impacts on the future for the industry as a whole, “societal acceptance of the technology is now at a crossroads,” he said, adding that trust in AI and the companies producing it should be seen “not just as a challenge, but an opportunity.”

Priorities, Westcott continued, should revolve around ethical practices, transparency and a “relentless focus” on the benefits to society AI can provide…(More)”.

Unconventional data, unprecedented insights: leveraging non-traditional data during a pandemic


Paper by Kaylin Bolt et al: “The COVID-19 pandemic prompted new interest in non-traditional data sources to inform response efforts and mitigate knowledge gaps. While non-traditional data offers some advantages over traditional data, it also raises concerns related to biases, representativity, informed consent and security vulnerabilities. This study focuses on three specific types of non-traditional data: mobility, social media, and participatory surveillance platform data. Qualitative results are presented on the successes, challenges, and recommendations of key informants who used these non-traditional data sources during the COVID-19 pandemic in Spain and Italy….

Non-traditional data proved valuable in providing rapid results and filling data gaps, especially when traditional data faced delays. Increased data access and innovative collaborative efforts across sectors facilitated its use. Challenges included unreliable access and data quality concerns, particularly the lack of comprehensive demographic and geographic information. To further leverage non-traditional data, participants recommended prioritizing data governance, establishing data brokers, and sustaining multi-institutional collaborations. The value of non-traditional data was perceived as underutilized in public health surveillance, program evaluation and policymaking. Participants saw opportunities to integrate them into public health systems with the necessary investments in data pipelines, infrastructure, and technical capacity…(More)”.

The AI data scraping challenge: How can we proceed responsibly?


Article by Lee Tiedrich: “Society faces an urgent and complex artificial intelligence (AI) data scraping challenge.  Left unsolved, it could threaten responsible AI innovation.  Data scraping refers to using web crawlers or other means to obtain data from third-party websites or social media properties.  Today’s large language models (LLMs) depend on vast amounts of scraped data for training and potentially other purposes.  Scraped data can include facts, creative content, computer code, personal information, brands, and just about anything else.  At least some LLM operators directly scrape data from third-party sites.  Common Crawl, LAION, and other sites make scraped data readily accessible.  Meanwhile, Bright Data and others offer scraped data for a fee. 

In addition to fueling commercial LLMs, scraped data can provide researchers with much-needed data to advance social good.  For instance, Environmental Journal explains how scraped data enhances sustainability analysis.  Nature reports that scraped data improves research about opioid-related deaths.  Training data in different languages can help make AI more accessible for users in Africa and other underserved regions.  Access to training data can even advance the OECD AI Principles by improving safety and reducing bias and other harms, particularly when such data is suitable for the AI system’s intended purpose…(More)”.

The Computable City: Histories, Technologies, Stories, Predictions


Book by Michael Batty: “At every stage in the history of computers and communications, it is safe to say we have been unable to predict what happens next. When computers first appeared nearly seventy-five years ago, primitive computer models were used to help understand and plan cities, but as computers became faster, smaller, more powerful, and ever more ubiquitous, cities themselves began to embrace them. As a result, the smart city emerged. In The Computable City, Michael Batty investigates the circularity of this peculiar evolution: how computers and communications changed the very nature of our city models, which, in turn, are used to simulate systems composed of those same computers.

Batty first charts the origins of computers and examines how our computational urban models have developed and how they have been enriched by computer graphics. He then explores the sequence of digital revolutions and how they are converging, focusing on continual changes in new technologies, as well as the twenty-first-century surge in social media, platform economies, and the planning of the smart city. He concludes by revisiting the digital transformation as it continues to confound us, with the understanding that the city, now a high-frequency twenty-four-hour version of itself, changes our understanding of what is possible…(More)”.

Evaluating LLMs Through a Federated, Scenario-Writing Approach


Article by Bogdana “Bobi” Rakova: “What do screenwriters, AI builders, researchers, and survivors of gender-based violence have in common? I’d argue they all imagine new, safe, compassionate, and empowering approaches to building understanding.

In partnership with Kwanele South Africa, I lead an interdisciplinary team, exploring this commonality in the context of evaluating large language models (LLMs) — more specifically, chatbots that provide legal and social assistance in a critical context. The outcomes of our engagement are a series of evaluation objectives and scenarios that contribute to an evaluation protocol with the core tenet that when we design for the most vulnerable, we create better futures for everyone. In what follows I describe our process. I hope this methodological approach and our early findings will inspire other evaluation efforts to meaningfully center the margins in building more positive futures that work for everyone…(More)”

Generative AI: Navigating Intellectual Property


Factsheet by WIPO: “Generative artificial intelligence (AI) tools are rapidly being adopted by many businesses and organizations for the purpose of content generation. Such tools represent both a substantial opportunity to assist business operations and a significant legal risk due to current uncertainties, including intellectual property (IP) questions.

Many organizations are seeking to put guidance in place to help their employees mitigate these risks. While each business situation and legal context will be unique, the following Guiding Principles and Checklist are intended to assist organizations in understanding the IP risks, asking the right questions, and considering potential safeguards…(More)”.

Surveilling Alone


Essay by Christine Rosen: “When Jane Jacobs, author of the 1961 classic The Death and Life of Great American Cities, outlined the qualities of successful neighborhoods, she included “eyes on the street,” or, as she described this, the “eyes belonging to those we might call the natural proprietors of the street,” including shopkeepers and residents going about their daily routines. Not every neighborhood enjoyed the benefit of this informal sense of community, of course, but it was widely seen to be desirable. What Jacobs understood is that the combined impact of many local people practicing normal levels of awareness in their neighborhoods on any given day is surprisingly effective for community-building, with the added benefit of building trust and deterring crime.

Jacobs’s championing of these “natural proprietors of the street” was a response to a mid-century concern that aggressive city planning would eradicate the vibrant experience of neighborhoods like her own, the Village in New York City. Jacobs famously took on “master planner” Robert Moses after he proposed building an expressway through Lower Manhattan, a scheme that, had it succeeded, would have destroyed Washington Square Park and the Village, and turned neighborhoods around SoHo into highway underpasses. For Jacobs and her fellow citizen activists, the efficiency of the proposed highway was not enough to justify eliminating bustling sidewalks and streets, where people played a crucial role in maintaining the health and order of their communities.

Today, a different form of efficient design is eliminating “eyes on the street” — by replacing them with technological ones. The proliferation of neighborhood surveillance technologies such as Ring cameras, and of digital neighborhood-watch platforms and apps such as Nextdoor and Citizen, has freed us from the constraints of having to be physically present to monitor our homes and streets. Jacobs’s “eyes on the street” are now cameras on many homes, and the everyday interactions between neighbors and strangers are now a network of cameras and platforms that promise to put “neighborhood security in your hands,” as the Ring Neighbors app puts it.

Inside our homes, we monitor ourselves and our family members with equal zeal, making use of video baby monitors, GPS-tracking software for children’s smartphones (or for covert surveillance by a suspicious spouse), and “smart” speakers that are always listening and often recording when they shouldn’t. A new generation of domestic robots, such as Amazon’s Astro, combines several of these features into a roving service-machine always at your beck and call around the house and ever watchful of its security when you are away…(More)”.