Google’s new AI can hear a snippet of song—and then keep on playing

Article by Tammy Xu: “The new AI system can generate natural sounds and voices after being prompted with a few seconds of audio.

AudioLM, developed by Google researchers, generates audio that matches the style of the prompt, including complex sounds like piano music or human speech, in a way that is almost indistinguishable from the original recording. The technique shows promise for speeding up the process of training AI to generate audio, and it could eventually be used to automatically generate music to accompany videos.

AI-generated audio has become ubiquitous: voices on home assistants like Alexa use natural language processing. AI music systems like OpenAI’s Jukebox have already produced impressive results, but most existing techniques require people to prepare transcriptions and label text-based training data, which takes a lot of time and human labor. Jukebox, for example, uses text-based data to generate song lyrics.

AudioLM, described in a non-peer-reviewed paper last month, is different: it does not require transcription or labeling. Instead, a database of sounds is fed into the program, and machine learning is used to compress the audio files into short sound snippets, called “tokens,” without losing too much information. This tokenized training data is then fed into a machine-learning model that uses natural language processing to learn the patterns in the audio.
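To make the idea of “tokens” concrete, here is a minimal sketch of one common way to discretize audio, vector quantization: the waveform is cut into fixed-size frames, and each frame is replaced by the index of its nearest entry in a codebook. The article does not say how AudioLM builds its tokens, so this illustrates the general idea only and is not Google’s method. The codebook, the frame size of two samples, and the function names (`frames`, `tokenize`, `detokenize`) are hypothetical; real systems learn much larger codebooks from data rather than fixing them by hand.

```python
# Hypothetical toy codebook: each entry is one 2-sample frame shape.
# Real tokenizers learn far larger codebooks from training audio.
CODEBOOK = [
    [0.0, 0.0],
    [1.0, 1.0],
    [-1.0, -1.0],
    [1.0, -1.0],
]


def frames(signal, size):
    """Split a 1-D signal into non-overlapping frames of `size` samples (drops any remainder)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tokenize(signal, codebook):
    """Replace each frame with the index of its nearest codebook entry."""
    size = len(codebook[0])
    return [
        min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))
        for frame in frames(signal, size)
    ]


def detokenize(tokens, codebook):
    """Approximate reconstruction: concatenate the codebook entry for each token."""
    out = []
    for t in tokens:
        out.extend(codebook[t])
    return out


# Example: eight raw samples become four discrete tokens.
signal = [0.1, -0.1, 0.9, 1.1, -1.0, -0.8, 0.95, -1.05]
tokens = tokenize(signal, CODEBOOK)
reconstruction = detokenize(tokens, CODEBOOK)
print(tokens)  # [0, 1, 2, 3]
```

Decoding the tokens gives back only an approximation of the input, which is the “without losing too much information” trade-off: a larger codebook reconstructs the sound more faithfully but gives the downstream model more distinct tokens to learn.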

To generate sound, a few seconds of audio are fed into AudioLM, which then predicts what comes next. The process is similar to the way language models like GPT-3 predict which sentences and words typically follow one another.
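The prediction step can be illustrated with a toy stand-in: a bigram model that counts which token tends to follow which in training sequences, then extends a prompt one token at a time by picking the most frequent follower. AudioLM and GPT-3 use large neural networks rather than simple counts, so treat this only as a sketch of the autoregressive “predict the next token, append it, repeat” loop under that simplifying assumption; `train_bigram`, `continue_tokens`, and the sample token sequences are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count, for every token, how often each other token follows it."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def continue_tokens(counts, prompt, n_new):
    """Greedily extend `prompt` by up to `n_new` tokens, one prediction at a time."""
    out = list(prompt)
    for _ in range(n_new):
        followers = counts.get(out[-1])  # .get avoids inserting unseen keys
        if not followers:
            break  # no training data for this token; stop generating
        out.append(followers.most_common(1)[0][0])
    return out


# Hypothetical tokenized "training audio" with a repeating pattern.
training = [[0, 1, 2, 3, 0, 1, 2, 3], [0, 1, 2, 3, 0, 1]]
model = train_bigram(training)
print(continue_tokens(model, [0, 1], 4))  # [0, 1, 2, 3, 0, 1]
```

Swapping the greedy `most_common` pick for random sampling weighted by the counts would yield varied continuations instead of always the same one.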

The audio clips released by the team sound quite natural. In particular, piano music generated with AudioLM sounds more fluid than piano music generated with existing AI techniques, which tends to sound chaotic…”