Can Machines Learn Filterbank Design?


ARI Guest Talk, more information at https://www.oeaw.ac.at/en/ari/ari/about-ari/event/ari-guest-talk-10

Organised by

Nicki Holighaus


      MLA2S is delighted to co-host the ARI Guest Talk "Can Machines Learn Filterbank Design?" by Vincent Lostanlen of the Laboratoire des Sciences du Numérique de Nantes (LS2N).

      In this talk, the speaker will introduce a “hybrid” approach to filterbank design, a critical pre-processing step for machine listening. By combining wavelets and neural networks, this method improves the stability, sample efficiency, and parameter efficiency of training. Real-world applications will be illustrated with examples ranging from bioacoustics to speech enhancement. The talk will be of particular interest to researchers working with time-series data, audio signals in particular. We look forward to seeing you there for this sound experience!

      About the Speaker
      Vincent is a scientist (chargé de recherche) at CNRS, the French National Center for Scientific Research, affiliated with the Laboratoire des Sciences du Numérique de Nantes (LS2N). He works on the mathematical and computational foundations of machine listening technologies, with applications to biodiversity monitoring and computer music. He is the recipient of a Young Researcher grant from the French National Research Agency (ANR) named “Multi-Resolution Neural Networks” (MuReNN), with Peter Balazs as a partner. Websites: https://audio.ls2n.fr, https://www.lostanlen.com

      Full Abstract
      Filterbank analysis is an essential component of machine listening as a pre-processing step before pattern recognition in the time-frequency domain. In speech and music signal processing, filterbank design is often accomplished from prior knowledge about auditory perception and the tuning of musical instruments. Yet, this kind of prior knowledge is not available in emerging domains of machine listening, such as bioacoustics, urban acoustics, industrial acoustics, and medical acoustics. In this context, one solution is to replace filterbank design with a data-driven procedure involving training a neural network on the "raw waveform." In this talk, I will outline an ongoing research program toward making this training procedure more stable, sample-efficient, and parameter-efficient. The key idea is to train separate convolutional operators over the subbands of a non-learned filterbank: typically, a discrete wavelet transform (DWT). This kind of "hybrid" approach, combining digital signal processing and machine learning, can be justified formally via simple techniques in linear algebra and probability theory. I will present some insightful numerical simulations and a real-world application to speech enhancement, conducted in collaboration with some members of the ÖAW, namely, Daniel Haider, Felix Perfler, Martin Ehler, and Peter Balazs.
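The key idea from the abstract, learning separate convolutional operators over the subbands of a non-learned discrete wavelet transform, can be sketched as follows. This is only an illustrative toy in NumPy, not the speaker's actual implementation: it assumes a Haar wavelet for the DWT and uses randomly initialized per-subband FIR filters in place of filters that would, in practice, be trained by gradient descent.

```python
import numpy as np

def haar_dwt(x, levels=3):
    """Multilevel Haar DWT: returns the detail subband at each level,
    followed by the final approximation subband."""
    subbands = []
    approx = x
    for _ in range(levels):
        if len(approx) % 2:  # zero-pad to even length if needed
            approx = np.append(approx, 0.0)
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2)  # high-pass branch
        approx = (even + odd) / np.sqrt(2)  # low-pass branch
        subbands.append(detail)
    subbands.append(approx)
    return subbands

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)       # toy "raw waveform"
subbands = haar_dwt(x, levels=3)    # non-learned multiresolution front end

# One small convolutional operator per subband (random init here;
# in a multiresolution neural network these would be learned).
kernel_size = 9
filters = [rng.standard_normal(kernel_size) * 0.1 for _ in subbands]
features = [np.convolve(s, w, mode="same") for s, w in zip(subbands, filters)]
```

Because each learned filter only sees one octave-spaced subband, it needs far fewer taps than a single convolution applied to the raw waveform, which is one intuition behind the parameter efficiency claimed in the abstract.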

      References
      Fitting auditory filterbanks with multiresolution neural networks. IEEE WASPAA 2023.
      https://arxiv.org/abs/2307.13821

      Hold Me Tight: Stable encoder–decoder design for speech enhancement. INTERSPEECH 2024.
      https://arxiv.org/abs/2408.17358

      Instabilities in convnets for raw audio. IEEE SPL 2024.
      https://arxiv.org/abs/2309.05855

      Speaker: Vincent Lostanlen (Laboratoire des Sciences du Numérique de Nantes, LS2N)