SampleMatch: A model that automatically retrieves matching drum samples for musical tracks

Megusta, November 14, 2022

Credit: StableDiffusion / Stefan Lattner.

Machine learning-based computational models have been successfully applied to a broad range of complex information processing tasks, including those that involve retrieving specific data items from large archives. Researchers at the Sony Computer Science Laboratories (CSL) in France have been trying to develop machine learning techniques that could help music producers to easily identify and retrieve specific audio samples from a database.

To this end, Stefan Lattner, a researcher at Sony CSL, recently introduced SampleMatch, a machine learning-based model that can automatically retrieve drum samples that match a specific music track from large archives. His model is set to be presented in December at the ISMIR 2022 conference, a leading event that focuses on music information retrieval.

“Our music team at Sony CSL is working on AI that could make the life of music producers easier,” Stefan Lattner, one of the researchers who carried out the study, told TechXplore. “In music production, there are many tasks for which AI could be valuable. One such task that is currently relatively tedious is drum sample selection.”

Drum sample selection is the process by which music producers search for drum samples that work well with a specific drum-less music track. As drum sample libraries are typically large, identifying suitable samples can take considerable time and effort.

Currently, music producers have access to only a few rudimentary computational tools to assist with drum sample selection. These tools mainly filter a large sample library by tags or keywords.

A few years ago, Lattner set out to develop a new system that could retrieve drum samples in a more intuitive and effective way. Due to the limitations of the technology available at the time, however, this system had to be relatively complex.

“I found that the system I previously created was not very elegant, so I didn’t publish it,” Lattner explained. “With the recent advances in contrastive learning (and improvements in neural network encoders), it has become much easier to estimate if two data points fit together. As a result, the system became more general, and my method could be used to estimate the fit of many kinds of sounds.”

When using SampleMatch, musicians can input their track into the system at any stage of production. The system then automatically sorts a drum sample library by how well it estimates each sample would match the track.
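As a rough illustration of this sorting step, the sketch below ranks a small drum sample library against one track. It assumes, purely for illustration, that the track and every sample have already been mapped to short embedding vectors and that the matching score is their cosine similarity. The article does not specify either detail, and the names `rank_samples`, `track`, and `library`, as well as the numbers, are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as "no match"
    return dot / (norm_a * norm_b)


def rank_samples(track_embedding, sample_embeddings):
    """Return (sample_name, score) pairs, best match first."""
    scored = [
        (name, cosine_similarity(track_embedding, emb))
        for name, emb in sample_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings; a real system would compute them with
# trained audio encoders rather than hard-code them.
track = [0.9, 0.1, 0.3]
library = {
    "kick_01": [0.8, 0.2, 0.4],
    "snare_07": [-0.5, 0.9, 0.1],
    "hat_03": [0.1, 0.1, 0.9],
}

ranking = rank_samples(track, library)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these made-up vectors, `kick_01` comes first because its embedding points in nearly the same direction as the track's. In practice the library would hold thousands of samples, and the ordering would depend entirely on the trained encoders.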

Lattner trained SampleMatch using a large dataset of 4,830 electronic music tracks and 885 famous pop/rock tracks. Specifically, he used audio pairs of instrumental music (i.e., synthetic bases, bass, guitar, pad, strings, choir, keyboard, and vocals) and matching drum tracks.

“SampleMatch was trained on audio pairs that we knew would match,” Lattner said. “Now, when we show a new pair to the model, it will provide a ‘matching score.’ While there are already systems that match audio samples using extracted musical features, their retrieval quality depends on the pre-defined features and type of samples. For drum samples, it is not even clear which features we should look at to compute a matching score.”
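To give a feel for how training on known matching pairs can yield a matching score, here is a minimal sketch of one common contrastive objective, often called InfoNCE. It is an assumption chosen for illustration: the article says only that the approach builds on contrastive learning, not which loss or encoders were used. The function name `info_nce_loss`, the `temperature` value, and the toy embeddings are all hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(track_embs, drum_embs, temperature=0.1):
    """Mean contrastive loss over a batch of pairs.

    drum_embs[i] is the known match for track_embs[i]; every other
    drum embedding in the batch serves as a non-matching example.
    """
    total = 0.0
    for i, track in enumerate(track_embs):
        logits = [dot(track, drum) / temperature for drum in drum_embs]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the true pair as the target class.
        total += log_sum - logits[i]
    return total / len(track_embs)


# Toy batch: in `aligned`, each track's true drum embedding is the
# closest one; in `swapped`, the true pairs are the least similar.
tracks = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = info_nce_loss(tracks, aligned)
loss_swapped = info_nce_loss(tracks, swapped)
print(loss_aligned < loss_swapped)
```

The loss is small when each track scores highest against its own drum track and large otherwise, so minimizing it pushes matching pairs together and non-matching pairs apart. After training, the same similarity can be read off as a matching score for a pair the model has never seen.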

While Lattner trained his model to learn which drum samples match a specific track, it could also be used for other forms of audio matching. By using different training sample pairs, SampleMatch could also be taught to retrieve matching bass, guitar, or other instrumental tracks.

“Some aesthetic choices a musician performs in music production are still mysterious,” Lattner said. “While it is obvious that an instrument should not play out of key, with drum samples, there is no theory why some fit your track, and some don’t. By showing examples, a computer can now learn the aesthetic principles we apply when listening. In some way, the computer learns to listen like a human.”

In the future, the audio retrieval model created by Lattner and his colleagues at Sony CSL could assist music producers in sourcing suitable drum samples or other instrumental samples for their tracks. In addition, a close analysis of how the system learned to organize data could help to devise new theories that might guide music production efforts. More specifically, the reverse-engineering of the system might allow the researchers to outline some general rules that musicians should follow when mixing their music.

“In our future works, we want to combine this method with our DrumGAN technology to generate drum samples that match a given track directly,” Lattner added. “Meanwhile we also want to extend SampleMatch to other kinds of samples.”