Cade Metz at The New York Times: “…These are not systems that anyone can properly evaluate with the Turing test — or any other simple method. Their end goal is not conversation.
Researchers at Google and DeepMind, which is owned by Google’s parent company, are developing tests meant to evaluate chatbots and systems like DALL-E, to judge what they do well, where they lack reason and common sense, and more. One test shows videos to artificial intelligence systems and asks them to explain what has happened. After watching someone tinker with an electric shaver, for instance, the A.I. must explain why the shaver did not turn on.
These tests feel like academic exercises — much like the Turing test. We need something that is more practical, that can really tell us what these systems do well and what they cannot, how they will replace human labor in the near term and how they will not.
We could also use a change in attitude. “We need a paradigm shift — where we no longer judge intelligence by comparing machines to human behavior,” said Oren Etzioni, professor emeritus at the University of Washington and founding chief executive of the Allen Institute for AI, a prominent lab in Seattle….
At the same time, there are many ways these bots are superior to you and me. They do not get tired. They do not let emotion cloud what they are trying to do. They can instantly draw on far larger amounts of information. And they can generate text, images and other media at speeds and volumes we humans never could.
Their skills will also improve considerably in the coming years.
Researchers can rapidly hone these systems by feeding them more and more data. The most advanced systems, like ChatGPT, require months of training, but over those months, they can develop skills they did not exhibit in the past.
“We have found a set of techniques that scale effortlessly,” said Raia Hadsell, senior director of research and robotics at DeepMind. “We have a simple, powerful approach that continues to get better and better.”
The exponential improvement we have seen in these chatbots over the past few years will not last forever. The gains may soon level out. But even then, multimodal systems will continue to improve — and master increasingly complex skills involving images, sounds and computer code. And computer scientists will combine these bots with systems that can do things they cannot. ChatGPT failed Turing’s chess test. But we knew in 1997 that a computer could beat the best humans at chess. Plug ChatGPT into a chess program, and the hole is filled.
In the months and years to come, these bots will help you find information on the internet. They will explain concepts in ways you can understand. If you like, they will even write your tweets, blog posts and term papers.
They will tabulate your monthly expenses in your spreadsheets. They will visit real estate websites and find houses in your price range. They will produce online avatars that look and sound like humans. They will make mini-movies, complete with music and dialogue…
Certainly, these bots will change the world. But the onus is on you to be wary of what these systems say and do, to edit what they give you, to approach everything you see online with skepticism. Researchers know how to give these systems a wide range of skills, but they do not yet know how to give them reason or common sense or a sense of truth.
That still lies with you…”.