Contemporary music

Ircam - scientific articles (original record)

On Automatic Voice Casting for Expressive Speech: Speaker Recognition vs. Speech Classification

Type

text
 

Genre(s)

article
 

Form(s)

digital document
 

This resource is available from the following organization: Ircam - Centre Pompidou

Identification

Title

On Automatic Voice Casting for Expressive Speech: Speaker Recognition vs. Speech Classification
 

Name(s)

Obin, Nicolas (author)
 
Roebel, Axel (author)
 
Bachman, Grégoire (author)
 

Publication

Florence, Italy, 2014
 

Description

Subject(s)

voice casting; voice similarity; speaker recognition; speech classification
 

Abstract

This paper presents the first large-scale automatic voice casting system and explores the adaptation of speaker recognition techniques to measure voice similarity. The proposed system represents a voice by classes (e.g., age/gender, voice quality, emotion). First, a multi-label system classifies speech into these classes. Then, the output probabilities for each class are concatenated into a vector that represents the vocal signature of a speech recording. Finally, a similarity search is performed on the vocal signatures to determine the set of target actors most similar to a speech recording of a source actor. In a subjective experiment conducted in the real-world context of voice casting for video games, the multi-label system clearly outperforms standard speaker recognition systems. This provides evidence that speech classes successfully capture the principal directions used in the perception of voice similarity.
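
The last two steps of the abstract (building a vocal signature from class probabilities, then searching for the most similar target actors) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the class group names, the probability values, the actor names, and the use of cosine similarity, since the abstract does not state which similarity measure the authors use. The multi-label classifier itself is not shown; its outputs are simply given as input.

```python
import numpy as np

# Hypothetical class groups; the abstract gives age/gender, voice quality,
# and emotion only as examples of the classes used.
CLASS_GROUPS = ("age_gender", "voice_quality", "emotion")


def vocal_signature(class_probs):
    """Concatenate per-group class probabilities into one signature vector."""
    return np.concatenate(
        [np.asarray(class_probs[group], dtype=float) for group in CLASS_GROUPS]
    )


def most_similar(source_sig, target_sigs, k=2):
    """Return (name, similarity) for the k targets closest to the source.

    Cosine similarity is an assumption; the abstract does not name a measure.
    """
    names = list(target_sigs)
    matrix = np.stack([target_sigs[name] for name in names])
    sims = matrix @ source_sig / (
        np.linalg.norm(matrix, axis=1) * np.linalg.norm(source_sig)
    )
    order = np.argsort(-sims)[:k]  # indices sorted by decreasing similarity
    return [(names[i], float(sims[i])) for i in order]


# Made-up classifier outputs, for illustration only.
source = vocal_signature(
    {"age_gender": [0.8, 0.1, 0.1], "voice_quality": [0.7, 0.3], "emotion": [0.2, 0.8]}
)
targets = {
    "actor_a": vocal_signature(
        {"age_gender": [0.7, 0.2, 0.1], "voice_quality": [0.6, 0.4], "emotion": [0.3, 0.7]}
    ),
    "actor_b": vocal_signature(
        {"age_gender": [0.1, 0.1, 0.8], "voice_quality": [0.2, 0.8], "emotion": [0.9, 0.1]}
    ),
    "actor_c": vocal_signature(
        {"age_gender": [0.6, 0.3, 0.1], "voice_quality": [0.5, 0.5], "emotion": [0.5, 0.5]}
    ),
}

ranking = most_similar(source, targets, k=2)
for name, sim in ranking:
    print(f"{name}: {sim:.3f}")
```

With these invented values, the two targets whose class probabilities most resemble the source are ranked first. In a real system the signatures would come from trained classifiers applied to recordings, and the choice of distance would need to be validated against listener judgments.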
 

Note(s)

Conference contribution: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
 

Location

 

OAI identifier

 

Record date

2014-02-15 01:00:00
 

Portal identifier

 
