A.I. Is Mastering Language. Should We Trust What It Says?


Steven Johnson at the New York Times: “You are sitting in a comfortable chair by the fire, on a cold winter’s night. Perhaps you have a mug of tea in hand, perhaps something stronger. You open a magazine to an article you’ve been meaning to read. The title suggested a story about a promising — but also potentially dangerous — new technology on the cusp of becoming mainstream, and after reading only a few sentences, you find yourself pulled into the story. A revolution is coming in machine intelligence, the author argues, and we need, as a society, to get better at anticipating its consequences. But then the strangest thing happens: You notice that the writer has, seemingly deliberately, omitted the very last word of the first .

The missing word jumps into your consciousness almost unbidden: ‘‘the very last word of the first paragraph.’’ There’s no sense of an internal search query in your mind; the word ‘‘paragraph’’ just pops out. It might seem like second nature, this filling-in-the-blank exercise, but doing it makes you think of the embedded layers of knowledge behind the thought. You need a command of the spelling and syntactic patterns of English; you need to understand not just the dictionary definitions of words but also the ways they relate to one another; you have to be familiar enough with the high standards of magazine publishing to assume that the missing word is not just a typo, and that editors are generally loath to omit key words in published pieces unless the author is trying to be clever — perhaps trying to use the missing word to make a point about your cleverness, how swiftly a human speaker of English can conjure just the right word.

Before you can pursue that idea further, you’re back into the article, where you find the author has taken you to a building complex in suburban Iowa. Inside one of the buildings lies a wonder of modern technology: 285,000 CPU cores yoked together into one giant supercomputer, powered by solar arrays and cooled by industrial fans. The machines never sleep: Every second of every day, they churn through innumerable calculations, using state-of-the-art techniques in machine intelligence that go by names like ‘‘stochastic gradient descent’’ and ‘‘convolutional neural networks.’’ The whole system is believed to be one of the most powerful supercomputers on the planet.

And what, you may ask, is this computational dynamo doing with all these prodigious resources? Mostly, it is playing a kind of game, over and over again, billions of times a second. And the game is called: Guess what the missing word is.…(More)”.

Should we get rid of the scientific paper?


Article by Stuart Ritchie: “But although the internet has transformed the way we read it, the overall system for how we publish science remains largely unchanged. We still have scientific papers; we still send them off to peer reviewers; we still have editors who give the ultimate thumbs up or down as to whether a paper is published in their journal.

This system comes with big problems. Chief among them is the issue of publication bias: reviewers and editors are more likely to give a scientific paper a good write-up and publish it in their journal if it reports positive or exciting results. So scientists go to great lengths to hype up their studies, lean on their analyses so they produce “better” results, and sometimes even commit fraud in order to impress those all-important gatekeepers. This drastically distorts our view of what really went on.

There are some possible fixes that change the way journals work. Maybe the decision to publish could be made based only on the methodology of a study, rather than on its results (this is already happening to a modest extent in a few journals). Maybe scientists could just publish all their research by default, and journals would curate, rather than decide, which results get out into the world. But maybe we could go a step further, and get rid of scientific papers altogether.

Scientists are obsessed with papers – specifically, with having more papers published under their name, extending the crucial “publications” section of their CV. So it might sound outrageous to suggest we could do without them. But that obsession is the problem. Paradoxically, the sacred status of a published, peer-reviewed paper makes it harder to get the contents of those papers right.

Consider the messy reality of scientific research. Studies almost always throw up weird, unexpected numbers that complicate any simple interpretation. But a traditional paper – word count and all – pretty well forces you to dumb things down. If what you’re working towards is a big, milestone goal of a published paper, the temptation is ever-present to file away a few of the jagged edges of your results, to help “tell a better story”. Many scientists admit, in surveys, to doing just that – making their results into unambiguous, attractive-looking papers, but distorting the science along the way.

And consider corrections. We know that scientific papers regularly contain errors. One algorithm that ran through thousands of psychology papers found that, at worst, more than 50% had one specific statistical error, and more than 15% had an error serious enough to overturn the results. With papers, correcting this kind of mistake is a slog: you have to write in to the journal, get the attention of the busy editor, and get them to issue a new, short paper that formally details the correction. Many scientists who request corrections find themselves stonewalled or otherwise ignored by journals. Imagine the number of errors that litter the scientific literature that haven’t been corrected because to do so is just too much hassle.

Finally, consider data. Back in the day, sharing the raw data that formed the basis of a paper with that paper’s readers was more or less impossible. Now it can be done in a few clicks, by uploading the data to an open repository. And yet, we act as if we live in the world of yesteryear: papers still hardly ever have the data attached, preventing reviewers and readers from seeing the full picture.

The solution to all these problems is the same as the answer to “How do I organise my journals if I don’t use cornflakes boxes?” Use the internet. We can change papers into mini-websites (sometimes called “notebooks”) that openly report the results of a given study. Not only does this give everyone a view of the full process from data to analysis to write-up – the dataset would be appended to the website along with all the statistical code used to analyse it, and anyone could reproduce the full analysis and check they get the same numbers – but any corrections could be made swiftly and efficiently, with the date and time of all updates publicly logged…(More)”.

Cities Take the Lead in Setting Rules Around How AI Is Used


Jackie Snow at the Wall Street Journal: “As cities and states roll out algorithms to help them provide services like policing and traffic management, they are also racing to come up with policies for using this new technology.

AI, at its worst, can disadvantage already marginalized groups, adding to human-driven bias in hiring, policing and other areas. And its decisions can often be opaque—making it difficult to tell how to fix that bias, as well as other problems. (The Wall Street Journal discussed calls for regulation of AI, or at least greater transparency about how the systems work, with three experts.)

Cities are looking at a number of solutions to these problems. Some require disclosure when an AI model is used in decisions, while others mandate audits of algorithms, track where AI causes harm or seek public input before putting new AI systems in place.

Here are some ways cities are redefining how AI will work within their borders and beyond.

Explaining the algorithms: Amsterdam and Helsinki

One of the biggest complaints against AI is that it makes decisions that can’t be explained, which can lead to complaints about arbitrary or even biased results.

To let their citizens know more about the technology already in use in their cities, Amsterdam and Helsinki collaborated on websites that document how each city government uses algorithms to deliver services. The registry includes information on the data sets used to train an algorithm, a description of how an algorithm is used, how public servants use the results, the human oversight involved and how the city checks the technology for problems like bias.

Amsterdam has six algorithms fully explained—with a goal of 50 to 100—on the registry website, including how the city’s automated parking-control and trash-complaint reports work. Helsinki, which is only focusing on the city’s most advanced algorithms, also has six listed on its site, with another 10 to 20 left to put up.

“We needed to assess the risk ourselves,” says Linda van de Fliert, an adviser at Amsterdam’s Chief Technology Office. “And we wanted to show the world that it is possible to be transparent.”…(More)” See also AI Localism: The Responsible Use and Design of Artificial Intelligence at the Local Level

Russia Is Leaking Data Like a Sieve


Matt Burgess at Wired: “Names, birthdays, passport numbers, job titles—the personal information goes on for pages and looks like any typical data breach. But this data set is very different. It allegedly contains the personal information of 1,600 Russian troops who served in Bucha, a Ukrainian city devastated during Russia’s war and the scene of multiple potential war crimes.

The data set is not the only one. Another allegedly contains the names and contact details of 620 Russian spies who are registered to work at the Moscow office of the FSB, the country’s main security agency. Neither set of information was published by hackers. Instead they were put online by Ukraine’s intelligence services, with all the names and details freely available to anyone online. “Every European should know their names,” Ukrainian officials wrote in a Facebook post as they published the data.

Since Russian troops crossed Ukraine’s borders at the end of February, colossal amounts of information about the Russian state and its activities have been made public. The data offers unparalleled glimpses into closed-off private institutions, and it may be a gold mine for investigators, from journalists to those tasked with investigating war crimes. Broadly, the data comes in two flavors: information published proactively by Ukrainian authorities or their allies, and information obtained by hacktivists. Hundreds of gigabytes of files and millions of emails have been made public.

“Both sides in this conflict are very good at information operations,” says Philip Ingram, a former colonel in British military intelligence. “The Russians are quite blatant about the lies that they’ll tell,” he adds. Since the war started, Russian disinformation has been consistently debunked. Ingram says Ukraine has to be more tactical with the information it publishes. “They have to make sure that what they’re putting out is credible and they’re not caught out telling lies in a way that would embarrass them or embarrass their international partners.”

Both the lists of alleged FSB officers and Russian troops were published online by Ukraine’s Central Intelligence Agency at the end of March and start of April, respectively. While WIRED has not been able to verify the accuracy of the data—and Ukrainian cybersecurity officials did not respond to a request for comment—Aric Toler, from investigative outlet Bellingcat, tweeted that the FSB details appear to have been combined from previous leaks and open source information. It is unclear how up-to-date the information is…(More)”.

The Power of Narrative


Essay by Klaus Schwab and Thierry Malleret: “…The expression “failure of imagination” captures this by describing the expectation that future opportunities and risks will resemble those of the past. Novelist Graham Greene used it in The Power and the Glory, but the 9/11 Commission made it popular by invoking it as the main reason why intelligence agencies had failed to anticipate the “unimaginable” events of that day.

Ever since, the expression has been associated with situations in which strategic thinking and risk management are stuck in unimaginative and reactive thinking. Considering today’s wide and interdependent array of risks, we can’t afford to be unimaginative, even though, as the astrobiologist Caleb Scharf points out, we risk getting imprisoned in a dangerous cognitive lockdown because of the magnitude of the task. “Indeed, we humans do seem to struggle in general when too many new things are thrown at us at once. Especially when those things are outside of our normal purview. Like, well, weird viruses or new climate patterns,” Scharf writes. “In the face of such things, we can simply go into a state of cognitive lockdown, flipping from one small piece of the problem to another and not quite building a cohesive whole.”

Imagination is precisely what is required to escape a state of “cognitive lockdown” and to build a “cohesive whole.” It gives us the capacity to dream up innovative solutions to successfully address the multitude of risks that confront us. For decades now, we’ve been destabilizing the world, having failed to imagine the consequences of our actions on our societies and our biosphere, and the way in which they are connected. Now, following this failure and the stark realization of what it has entailed, we need to do just the opposite: rely on the power of imagination to get us out of the holes we’ve dug ourselves into. It is incumbent upon us to imagine the contours of a more equitable and sustainable world. Imagination being boundless, the variety of social, economic, and political solutions is infinite.

With respect to the assertion that there are things we don’t imagine to be socially or politically possible, a recent book shows that nothing is preordained. We are in fact only bound by the power of our own imaginations. In The Dawn of Everything, David Graeber and David Wengrow (an anthropologist and an archaeologist) prove this by showing that every imaginable form of social and economic organization has existed from the very beginning of humankind. Over the past 300,000 years, we’ve pursued knowledge, experimentation, happiness, development, freedom, and other human endeavors in myriad different ways. During these times that preceded our modern world, none of the arrangements that we devised to live together exhibited a single point of origin or an invariant pattern. Early societies were peaceful and violent, authoritarian and democratic, patriarchal and matriarchal, slaveholding and abolitionist, some moving between different types of organizations all the time, others not. Antique industrial cities were flourishing at the heart of empires while others existed in the absence of a sovereign entity…(More)”

Opening up Science—to Skeptics


Essay by Rohan R. Arcot and Hunter Gehlbach: “Recently, the soaring trajectory of science skepticism seems to be rivaled only by global temperatures. Empirically established facts—around vaccines, elections, climate science, and the like—face potent headwinds. Despite the scientific consensus on these issues, much of the public remains unconvinced. In turn, science skepticism threatens our health, the health of our democracy, and the health of our planet.

The research community is no stranger to skepticism. Its own members have been questioning the integrity of many scientific findings with particular intensity of late. In response, we have seen a swell of open science norms and practices, which provide greater transparency about key procedural details of the research process, mitigating many research skeptics’ misgivings. These open practices greatly facilitate how science is communicated—but only between scientists. 

Given the present historical moment’s critical need for science, we wondered: What if scientists allowed skeptics in the general public to look under the hood at how their studies were conducted? Could opening up the basic ideas of open science beyond scholars help combat the epidemic of science skepticism?  

Intrigued by this possibility, we sought a qualified skeptic and returned to Rohan’s father. If we could chaperone someone through a scientific journey—a person who could vicariously experience the key steps along the way—could our openness assuage their skepticism?…(More)”.

Internet ‘algospeak’ is changing our language in real time, from ‘nip nops’ to ‘le dollar bean’


Article by Taylor Lorenz: “‘Algospeak’ is becoming increasingly common across the Internet as people seek to bypass content moderation filters on social media platforms such as TikTok, YouTube, Instagram and Twitch.

Algospeak refers to code words or turns of phrase users have adopted in an effort to create a brand-safe lexicon that will avoid getting their posts removed or down-ranked by content moderation systems. For instance, in many online videos, it’s common to say “unalive” rather than “dead,” “SA” instead of “sexual assault,” or “spicy eggplant” instead of “vibrator.”

As the pandemic pushed more people to communicate and express themselves online, algorithmic content moderation systems have had an unprecedented impact on the words we choose, particularly on TikTok, and given rise to a new form of internet-driven Aesopian language.

Unlike other mainstream social platforms, the primary way content is distributed on TikTok is through an algorithmically curated “For You” page; having followers doesn’t guarantee people will see your content. This shift has led average users to tailor their videos primarily toward the algorithm, rather than a following, which means abiding by content moderation rules is more crucial than ever.

When the pandemic broke out, people on TikTok and other apps began referring to it as the “Backstreet Boys reunion tour” or calling it the “panini” or “panda express” as platforms down-ranked videos mentioning the pandemic by name in an effort to combat misinformation. When young people began to discuss struggling with mental health, they talked about “becoming unalive” in order to have frank conversations about suicide without algorithmic punishment. Sex workers, who have long been censored by moderation systems, refer to themselves on TikTok as “accountants” and use the corn emoji as a substitute for the word “porn.”

As discussions of major events are filtered through algorithmic content delivery systems, more users are bending their language. Recently, in discussing the invasion of Ukraine, people on YouTube and TikTok have used the sunflower emoji to signify the country. When encouraging fans to follow them elsewhere, users will say “blink in lio” for “link in bio.”

Euphemisms are especially common in radicalized or harmful communities. Pro-anorexia eating disorder communities have long adopted variations on moderated words to evade restrictions. One paper from the School of Interactive Computing, Georgia Institute of Technology found that the complexity of such variants even increased over time. Last year, anti-vaccine groups on Facebook began changing their names to “dance party” or “dinner party” and anti-vaccine influencers on Instagram used similar code words, referring to vaccinated people as “swimmers.”

Tailoring language to avoid scrutiny predates the Internet. Many religions have avoided uttering the devil’s name lest they summon him, while people living in repressive regimes developed code words to discuss taboo topics…(More)”.

Facial Recognition Goes to War


Kashmir Hill at the New York Times: “In the weeks after Russia invaded Ukraine and images of the devastation wrought there flooded the news, Hoan Ton-That, the chief executive of the facial recognition company Clearview AI, began thinking about how he could get involved.

He believed his company’s technology could offer clarity in complex situations in the war.

“I remember seeing videos of captured Russian soldiers and Russia claiming they were actors,” Mr. Ton-That said. “I thought if Ukrainians could use Clearview, they could get more information to verify their identities.”

In early March, he reached out to people who might help him contact the Ukrainian government. One of Clearview’s advisory board members, Lee Wolosky, a lawyer who has worked for the Biden administration, was meeting with Ukrainian officials and offered to deliver a message.

Mr. Ton-That drafted a letter explaining that his app “can instantly identify someone just from a photo” and that the police and federal agencies in the United States used it to solve crimes. That feature has brought Clearview scrutiny over concerns about privacy and questions about racism and other biases within artificial-intelligence systems.

The tool, which can identify a suspect caught on surveillance video, could be valuable to a country under attack, Mr. Ton-That wrote. He said the tool could identify people who might be spies, as well as deceased people, by comparing their faces against Clearview’s database of 20 billion faces from the public web, including from “Russian social sites such as VKontakte.”

Mr. Ton-That decided to offer Clearview’s services to Ukraine for free, as reported earlier by Reuters. Now, less than a month later, the New York-based Clearview has created more than 200 accounts for users at five Ukrainian government agencies, which have conducted more than 5,000 searches. Clearview has also translated its app into Ukrainian.

“It’s been an honor to help Ukraine,” said Mr. Ton-That, who provided emails from officials from three agencies in Ukraine, confirming that they had used the tool. It has identified dead soldiers and prisoners of war, as well as travelers in the country, confirming the names on their official IDs. The fear of spies and saboteurs in the country has led to heightened paranoia.

According to one email, Ukraine’s national police obtained two photos of dead Russian soldiers, which have been viewed by The New York Times, on March 21. One dead man had identifying patches on his uniform, but the other did not, so the ministry ran his face through Clearview’s app…(More)”.

The Food Aid Delivery App


Essay by Trish Bendix: “Between 30 and 40 percent of the US food supply goes to waste each year. The US Environmental Protection Agency estimates that nearly 80 billion pounds of food end up in landfills annually. This figure takes on a greater significance in the context of another food crisis: food insecurity. More than 10 percent of US households are food insecure, and the nonprofit Feeding America reports that this number will increase due to the economic and unemployment consequences of the COVID-19 pandemic.

The food waste crisis is not new. Wasted, a 2012 report from the Natural Resources Defense Council, recorded Americans’ annual food waste at 40 percent. Horrified by the report’s findings, Leah Lizarondo, a food and health advocate who began her career working in consumer-packaged goods and technology, was inspired to find a solution.

“I tried to figure out why this inefficiency was happening—where the failing was in the supply chain,” Lizarondo says. She knew that consumer-facing businesses such as grocery stores and restaurants were the second-biggest culprits of food waste—behind American households. And even though these businesses didn’t intend to waste food, they lacked the logistics, structures, or incentives to redirect the food surplus to people experiencing food insecurity. Furthermore, because most wasted food is perishable, traditional waste methods didn’t work within the food-banking structure.

“It was so cheap to just throw food in a landfill,” Lizarondo comments. “There’s no legislation [in the United States] that prevents us from doing that, unlike other countries.” For example, France banned food waste in 2016, while Norway has stores that sell food past their sell-by dates, and Asian countries like Japan and South Korea have adopted their own regulations, including the latter charging a fee to citizens for each pound of food waste. Currently, California, Connecticut, Massachusetts, Rhode Island, and Vermont are the only US states with legislation enforcing organic waste bans.

In 2016, Lizarondo launched the nonprofit Food Rescue Hero, a technology platform that redirects food waste to the food insecure in cities across America.

Since its launch, Food Rescue Hero has given more than 68 million pounds of food to people in need. Currently, it operates in 12 cities in the United States and Canada, with more than 22,000 drivers volunteering their time….(More)”.

A 630-Billion-Word Internet Analysis Shows ‘People’ Is Interpreted as ‘Men’


Dana G. Smith at Scientific American: “A massive linguistic analysis of more than half a trillion words concludes that we assign gender to words that, by their very definition, should be gender-neutral.

Psychologists at New York University analyzed text from nearly three billion Web pages and compared how often words for person (“individual,” “people,” and so on) were associated with terms for a man (“male,” “he”) or a woman (“female,” “she”). They found that male-related words overlapped with “person” more frequently than female words did. The cultural concept of a person, from this perspective, is more often a man than a woman, according to the study, which was published on April 1 in Science Advances.

To conduct the study, the researchers turned to an enormous open-source data set of Web pages called the Common Crawl, which pulls text from everything from corporate white papers to Internet discussion forums. For their analysis of the text—a total of more than 630 billion words—the researchers used word embeddings, a computational linguistic technique that assesses how similar two words are by looking for how often they appear together.

“You can take a word like the word ‘person’ and understand what we mean by ‘person,’ how we represent the word ‘person,’ by looking at the other words that we often use around the word ‘person,’” explains April Bailey, a postdoctoral researcher at N.Y.U., who conducted the study. “We found that there was more overlap between the words for people and words for men than words for people and the words for women…, suggesting that there is this male bias in the concept of a person.”

Scientists have previously studied gender bias in language, such as the idea that women are more closely associated with family and home life and that men are more closely linked with work. “But this is the first to study this really general gender stereotype—the idea that men are sort of the default humans—in this quantitative computational social science way,” says Molly Lewis, a research scientist at the psychology department at Carnegie Mellon University, who was not involved in the study….(More)”.