Extensions#
There are various avenues for future experimentation with hybrid scattering convnets.
In this tutorial, we only explored 1D convnets along time.
Time-frequency scattering in Kymatio offers the "joint" output format, which is indexed by (path, frequency, time).
With this format, we can apply 2D convolutions in the time-frequency domain, where the joint time-frequency scattering paths serve as input channels to the convnet.
This was shown to be effective for classification of musical instrument solos with limited annotated data [1].
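As a rough sketch (the tensor sizes, layer widths, and number of classes below are illustrative assumptions, not values from this tutorial), a joint scattering output of shape (batch, path, frequency, time) can be passed to a stack of 2D convolutions, with the paths acting as input channels:

```python
import torch
import torch.nn as nn

# Placeholder for a joint time-frequency scattering output with shape
# (batch, paths, frequency, time). In practice this tensor would come from
# Kymatio's time-frequency scattering in the "joint" format; the sizes here
# are arbitrary.
batch, n_paths, n_freq, n_time = 8, 64, 32, 128
Sx = torch.randn(batch, n_paths, n_freq, n_time)

# Small 2D convnet: scattering paths serve as input channels, and the
# convolutions act over the (frequency, time) plane.
convnet = nn.Sequential(
    nn.Conv2d(n_paths, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),   # global average pooling over (frequency, time)
    nn.Flatten(),
    nn.Linear(32, 10),         # e.g. 10 instrument classes (hypothetical)
)

logits = convnet(Sx)
print(logits.shape)  # torch.Size([8, 10])
```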
We also did not explore log-compression and mean normalization of the scattering paths, a preprocessing step shown to be beneficial in [1] and [2].
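One possible variant of this preprocessing (a sketch only; the exact constants and normalization axes used in [1] and [2] may differ) applies a pointwise logarithm followed by per-path mean subtraction:

```python
import torch

def log_compress_and_normalize(Sx, eps=1e-6):
    """Log-compress scattering coefficients and mean-normalize each path.

    Sx: tensor of shape (batch, paths, ..., time); statistics are computed
    per path over all remaining axes. The eps value is an assumption.
    """
    Sx = torch.log1p(Sx / eps)                # log-compression
    dims = tuple(range(2, Sx.dim()))          # all axes after (batch, path)
    mean = Sx.mean(dim=dims, keepdim=True)
    return Sx - mean                          # per-path mean normalization

# Placeholder joint scattering output of shape (batch, paths, frequency, time).
Sx = torch.rand(8, 64, 32, 128)
Sx_norm = log_compress_and_normalize(Sx)
```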
We could explore joint classification of musical instrument and playing technique.
Various recent works have modelled human perception of similarity between sounds at the level of expressive playing techniques [2], [3].
References#
[1] Vahidi, Cyrus, et al. "Differentiable Time-Frequency Scattering on GPU." arXiv preprint arXiv:2204.08269 (2022).

[2] Lostanlen, Vincent, et al. "Time–frequency scattering accurately models auditory similarities between instrumental playing techniques." EURASIP Journal on Audio, Speech, and Music Processing 2021.1 (2021): 1–21.

[3] Vahidi, Cyrus, et al. "Perceptual musical similarity metric learning with graph neural networks." 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE, 2023.