
Ircam - scientific articles

Timbre Characterisation and Recognition with Combined Stationary and Temporal Features






Printed document

This resource is available from the following organisation: Ircam - Centre Pompidou





Dubnov, Shlomo (author)
Rodet, Xavier (author)


Ann Arbor, USA, 1998



Keywords: Timbre, Characterisation, Recognition, Vector Quantization, Universal Classification


Classification and generation of sound require a modeling approach that takes into account, in addition to the common sound features, the statistical behaviour of the sound components. Such statistics include the stationary random fluctuations in amplitude and frequency that occur during sustained portions of the sound, and the stochastic behaviour of the sound over its lifetime. In our work so far we have considered statistical models of the variations that occur during a sustained portion of the sound. Various aspects, such as phase coupling and its relation to Higher Order Statistical (HOS) analysis, were investigated and shown to be important for sound characterisation. The purpose of the current work is to extend this research towards modeling the temporal behaviour of sound. We consider a unified model that combines spectral and HOS features, and apply a new method for comparing the temporal evolutions of these features. The applications envisioned are broad and include characterisation for analysis/synthesis, coding and sound database retrieval. To understand the problems in comparing sounds, one must note that sound behaves on several different temporal scales. These include short-term correlations related to timbral properties (such as formants), correlations due to the pitch period, slower modulations such as vibrato, expressive inflections, and transitions between different notes. Thus a sequence that seems stationary on one time scale may depart from stationarity and ergodicity on another. This makes it difficult to determine the right probability function for the sequence of samples. Moreover, for purposes of classification, similarity measures between sounds are usually based upon specific models (such as Markov models of a certain order) or a priori knowledge of the parametric shape of the probability distribution, a dependence which we would like to avoid.
A possible solution to this problem is to consider the Markovian property at different time scales by using multiple features and capturing their temporal behaviour. Thus, we consider a model composed of features that represent stationary segments (states) and the transitions between these states. For the short-time description of the sound we use a set of spectral envelopes (Mel Frequency Cepstral Coefficients (MFCC), as in speech), which allow up to 90% data reduction in the sound representation. Moreover, a vector quantisation (VQ) procedure further reduces the set of envelopes by optimally representing the complete dataset with just a few typical envelopes. Additional parameters were used to capture the information present in the higher cepstral coefficients, which correspond to the excitation signal (also called the residual): variations in the fundamental frequency, and HOS parameters that describe properties of the residual (such as kurtosis, which is related to phase coupling). The investigation into the temporal structure of the signal was done along two lines: 1) The short-time temporal evolution is described by specific features such as cepstral "difference" and "acceleration". The evolution is considered in terms of transitions between "typical" envelopes found by VQ. This method gives excellent performance for limited data sets such as isolated notes, by matching both the instantaneous spectral shapes and their evolution. 2) For the long-term behaviour of the signal we applied information-theoretic tools for classification of the feature sequences. Using the Ziv-Merhav "universal" sequence classification method, the cross-entropy comparison is done without estimating a specific Markov model. The model requires long feature sequences to reveal its structure and is applicable to complex sounds such as note sequences and some non-musical sounds.
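As an illustration of the first line of investigation, the following minimal sketch quantises a sequence of feature frames against a small codebook and counts transitions between the resulting "typical" envelopes. It is not the authors' implementation: the abstract does not say how the codebook is trained, so a plain Lloyd (k-means) iteration with evenly spaced initial frames is assumed here, the frames are assumed to be precomputed cepstral vectors, and all function names are hypothetical. The "difference" feature is taken to be a simple frame-to-frame difference, and "acceleration" the difference of differences.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, frame):
    """Index of the codebook entry closest to the frame."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(codebook[k], frame))


def train_codebook(frames, size, iterations=20):
    """Assumed VQ training: Lloyd iterations from evenly spaced frames."""
    codebook = [list(frames[i * len(frames) // size]) for i in range(size)]
    for _ in range(iterations):
        clusters = [[] for _ in range(size)]
        for frame in frames:
            clusters[nearest(codebook, frame)].append(frame)
        for k, members in enumerate(clusters):
            if members:  # keep the old entry if a cluster is empty
                codebook[k] = [sum(col) / len(members)
                               for col in zip(*members)]
    return codebook


def quantise(frames, codebook):
    """Replace each frame by the index of its nearest typical envelope."""
    return [nearest(codebook, frame) for frame in frames]


def deltas(frames):
    """First-order difference between consecutive frames."""
    return [[b - a for a, b in zip(prev, cur)]
            for prev, cur in zip(frames, frames[1:])]


def transition_counts(symbols, size):
    """counts[i][j] = number of times envelope i is followed by envelope j."""
    counts = [[0] * size for _ in range(size)]
    for a, b in zip(symbols, symbols[1:]):
        counts[a][b] += 1
    return counts


# Toy data standing in for cepstral frames: two clearly separated states.
frames = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
codebook = train_codebook(frames, size=2)
symbols = quantise(frames, codebook)
transitions = transition_counts(symbols, size=2)
acceleration = deltas(deltas(frames))  # difference of differences
```

Two sounds could then be compared both by their codebooks (instantaneous spectral shapes) and by their transition counts (evolution), which is the kind of matching the abstract describes for isolated notes.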
The model, classification scheme and refinements for specific types of sounds will be presented in the paper.
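The universal classification step can be sketched as follows. This is an illustrative reading of the Ziv-Merhav cross-parsing idea, not the code used in the paper: a test sequence z is parsed into the longest phrases that occur somewhere in a reference sequence x, and the phrase count c gives a cross-entropy estimate of the form c log2(n) / n for a test sequence of length n. The function names, the brute-force substring search and the nearest-class decision rule are assumptions made for this example; the symbols would typically be VQ indices of the feature frames.

```python
import math


def contains(haystack, needle):
    """True if needle occurs as a contiguous run inside haystack."""
    m = len(needle)
    return any(haystack[i:i + m] == needle
               for i in range(len(haystack) - m + 1))


def cross_parse_count(z, x):
    """Number of phrases when z is parsed into longest substrings of x.

    A symbol of z that never occurs in x forms a phrase of length one.
    """
    z, x = tuple(z), tuple(x)
    count, i = 0, 0
    while i < len(z):
        length = 1
        # Extend the phrase while the longer candidate still occurs in x.
        while i + length < len(z) and contains(x, z[i:i + length + 1]):
            length += 1
        i += length
        count += 1
    return count


def cross_entropy_estimate(z, x):
    """Estimate in bits per symbol: few phrases means z resembles x."""
    n = len(z)
    return cross_parse_count(z, x) * math.log2(n) / n


def classify(z, references):
    """Pick the reference label whose sequence gives the lowest estimate."""
    return min(references,
               key=lambda label: cross_entropy_estimate(z, references[label]))


# Toy symbol sequences standing in for quantised feature sequences.
references = {"ab": "abababababab", "cd": "cdcdcdcd"}
label = classify("abababab", references)
```

No Markov order or parametric distribution is chosen anywhere in this sketch, which reflects the motivation given in the abstract. The brute-force search is quadratic and is only meant for short examples; long feature sequences would call for a suffix-based index.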


Conference contribution: ICMC: International Computer Music Conference


Record date

2006-03-14 01:00:00
