Contemporary music

Ircam - scientific articles: original record

Singing Voice Detection in Music Tracks using Direct Voice Vibrato Detection

Type

text
 

Genre(s)

article
 

Form(s)

digital document
 

This resource is available from the following organization: Ircam - Centre Pompidou

Identification

Title

Singing Voice Detection in Music Tracks using Direct Voice Vibrato Detection
 

Name(s)

Regnier, Lise (author)
 
Peeters, Geoffroy (author)
 

Publication

2009
 

Description

Subject(s)

Singing voice detection; vibrato detection; voice segmentation; vibrato and tremolo parameters extraction; feature extraction
 

Abstract

In this paper we investigate the problem of locating singing voice in music tracks. As opposed to most existing methods for this task, we rely on the extraction of characteristics specific to the singing voice. In our approach we suppose that the singing voice is characterized by harmonicity, formants, vibrato and tremolo. In the present study we deal only with the vibrato and tremolo characteristics. To this end, we first extract sinusoidal partials from the musical audio signal. The frequency modulation (vibrato) and amplitude modulation (tremolo) of each partial are then studied to determine whether the partial corresponds to singing voice, in which case the corresponding segment is assumed to contain singing voice. For each partial we estimate the rate (frequency of the modulation) and the extent (amplitude of the modulation) of both vibrato and tremolo. Partials are then selected based on these values. A second criterion based on harmonicity is also introduced. Based on this, each segment can be labelled as singing or non-singing. Post-processing is then applied to the segmentation in order to remove short-duration segments. The proposed method is evaluated on a large manually annotated test set. The results of this evaluation are compared to those obtained with a usual machine learning approach (MFCC and SFM modeling with GMM). The proposed method achieves results very close to those of the machine learning approach: 76.8% compared to 77.4% F-measure (frame classification). This result is very promising, since the two approaches are orthogonal and can therefore be combined.
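
The rate and extent estimation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a partial's frequency (or amplitude) trajectory is already available as a list of values sampled at a known frame rate, and it takes the largest peak of a Hann-windowed DFT of that trajectory as the modulation. The function names, the 1 to 20 Hz search band, and the selection thresholds (4 to 8 Hz rate, 2 Hz minimum extent) are hypothetical values chosen for illustration; they do not come from the record.

```python
import cmath
import math


def modulation_rate_extent(track, frame_rate, min_rate=1.0, max_rate=20.0):
    """Return (rate in Hz, extent) of the dominant modulation of `track`.

    `track` is one partial's frequency or amplitude trajectory, sampled at
    `frame_rate` frames per second. The search band is an assumption.
    """
    n = len(track)
    mean = sum(track) / n
    # Periodic Hann window, applied after removing the mean value.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    win_sum = sum(window)
    centered = [(x - mean) * w for x, w in zip(track, window)]

    best_rate, best_extent = 0.0, 0.0
    for k in range(1, n // 2):  # skip the DC and Nyquist bins
        rate = k * frame_rate / n
        if not (min_rate <= rate <= max_rate):
            continue
        coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                    for i, c in enumerate(centered))
        # Amplitude of a sinusoid at bin k, corrected for the window gain.
        extent = 2.0 * abs(coeff) / win_sum
        if extent > best_extent:
            best_rate, best_extent = rate, extent
    return best_rate, best_extent


def is_vibrato_like(rate, extent, rate_range=(4.0, 8.0), min_extent=2.0):
    """Hypothetical selection rule: thresholds are illustrative only."""
    return rate_range[0] <= rate <= rate_range[1] and extent >= min_extent


if __name__ == "__main__":
    # Synthetic partial: 440 Hz with a 6 Hz vibrato of +/- 10 Hz, 100 frames/s.
    frames = [440.0 + 10.0 * math.sin(2 * math.pi * 6.0 * i / 100.0)
              for i in range(100)]
    rate, extent = modulation_rate_extent(frames, 100.0)
    print(rate, extent, is_vibrato_like(rate, extent))
```

The same function applies to an amplitude trajectory to obtain tremolo rate and extent, in which case the extent threshold would need a value in amplitude units rather than Hz.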
 

Note(s)

Contribution to a conference or congress: ICASSP
 

Location

OAI identifier

 

Record date

2010-02-25 01:00:00
 

Portal identifier

 