
Ircam - scientific articles (original record)

High-Quality Voice Conversion




Thesis or dissertation


Printed document

This resource is available from the following institution: Ircam - Centre Pompidou





Villavicencio, Fernando (author)


Université Paris 6 (UPMC), 2010



Keywords: speech synthesis; speech analysis; cepstral analysis; linear prediction.


This dissertation addresses the field known as Voice Conversion. This technology refers to the ability to modify the perceived voice identity of a speaker so that it resembles that of a specific target speaker. A Voice Conversion system essentially analyzes the source speech and modifies it after converting its timbre information (spectral envelope), a conversion commonly achieved by statistical modeling. However, current approaches rarely achieve natural speech quality. The conversion process can introduce degradations, and a reduction in the overall quality of the converted speech is generally perceived. In addition, the conversion effect is not considered fully satisfactory, since the converted speech is not always perceived as similar to that of the target speaker. Finally, the speech signals used so far have been restricted to low-to-medium-quality sample rates (8 to 16 kHz). These problems can be attributed mainly to insufficient performance of the source-target mapping of the timbre features, as well as to inefficient modeling and modification of the timbre information. In particular, the spectral envelope models used to represent the timbre features, typically based on Linear Prediction or cepstral analysis (MFCC), exhibit systematic errors and cannot, in general, be considered efficient estimators of the underlying transfer function of the signal (source-filter model). Accordingly, we consider that these techniques cannot achieve proper extraction and modeling of the timbre information.

The goal of our research work was the application of Voice Conversion to high-quality speech. Our main interests were improving the quality of current systems and using high-quality speech. To this end, we focused on the study of improved spectral envelope modeling and timbre modification.
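As a point of reference for the Linear Prediction envelopes discussed above, the following is a minimal Python sketch of the classical LP envelope estimate: the autocorrelation method solved with the Levinson-Durbin recursion, then evaluated as an all-pole magnitude response in dB. It is not code from the thesis; the function names `autocorrelation`, `levinson_durbin` and `lp_envelope_db` are hypothetical, and windowing and pre-emphasis are omitted for brevity.

```python
import math


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for A(z) = 1 + sum_k a[k] z^-k.

    Returns (a, err): the coefficients with a[0] == 1 and the final
    prediction-error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        a_prev = a[:]
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lp_envelope_db(a, err, n_bins):
    """All-pole envelope 10*log10(err / |A(e^jw)|^2) on n_bins >= 2 points of [0, pi]."""
    env = []
    for b in range(n_bins):
        w = math.pi * b / (n_bins - 1)
        # A(e^jw) = sum_k a[k] e^{-jwk}, split into real and imaginary parts.
        re = sum(a[k] * math.cos(w * k) for k in range(len(a)))
        im = -sum(a[k] * math.sin(w * k) for k in range(len(a)))
        env.append(10.0 * math.log10(err / (re * re + im * im)))
    return env
```

Because the all-pole model is fitted to the whole spectrum, including the regions between harmonics, this estimate is commonly reported to let its poles lock onto individual harmonics for high-pitched voices rather than tracing the underlying formants.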
The benefits of a cepstrum-based technique known as True Envelope for efficient envelope estimation were studied and experimentally verified. To evaluate conversion performance, a model including perceptual criteria and accurate target information was defined, replacing the classical error measure based on poorly estimated envelope parameters. The improved envelope models were applied to a Voice Conversion framework based on Gaussian Mixture Modeling (GMM), resulting in increased timbre conversion performance. A strategy to automatically select the order of the envelope models was also derived, allowing more complete extraction of the source timbre features. Finally, a technique for improved synthesis of timbre-modified speech, based on LP-PSOLA and a Line Spectral Frequency (LSF) parameterization, was proposed. The resulting Voice Conversion methodology showed improved objective and subjective performance compared to the classical approach based on Linear Prediction.
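The True Envelope technique is only named in the abstract. The following minimal Python sketch assumes the commonly published iterative formulation: the log-amplitude spectrum is cepstrally smoothed, each bin of the smoothing target is raised to the maximum of itself and the current envelope, and the process repeats until the envelope lies within a tolerance of the spectral peaks. It is an illustration, not the thesis implementation; `cepstral_smooth` and `true_envelope`, the tolerance and the iteration limit are hypothetical choices, the input is assumed to be a full N-bin symmetric log spectrum with order below N/2, and a naive cosine transform stands in for an FFT.

```python
import math


def cepstral_smooth(log_amp, order):
    """Low-pass lifter a real, even log-amplitude spectrum.

    log_amp holds all N DFT bins with log_amp[k] == log_amp[N - k], so its
    cepstrum is real and even; only coefficients 0..order are kept.
    """
    n = len(log_amp)
    # Cepstrum via a naive inverse cosine transform (stand-in for an FFT).
    ceps = [
        sum(log_amp[k] * math.cos(2 * math.pi * k * q / n) for k in range(n)) / n
        for q in range(order + 1)
    ]
    # Resynthesize the smoothed spectrum from the retained coefficients.
    return [
        ceps[0] + 2 * sum(ceps[q] * math.cos(2 * math.pi * k * q / n)
                          for q in range(1, order + 1))
        for k in range(n)
    ]


def true_envelope(log_amp, order, tol=0.1, max_iter=200):
    """Iterative cepstral envelope that rises onto the spectral peaks.

    tol is expressed in the same units as log_amp (e.g. dB).
    """
    target = list(log_amp)
    env = cepstral_smooth(target, order)
    for _ in range(max_iter):
        # Stop once no bin of the original spectrum exceeds the envelope by more than tol.
        if max(a - v for a, v in zip(log_amp, env)) <= tol:
            break
        # Raise the target to its running maximum with the envelope, filling the valleys.
        target = [max(t, v) for t, v in zip(target, env)]
        env = cepstral_smooth(target, order)
    return env
```

The cepstral order must stay below N divided by the harmonic spacing in bins; otherwise the smoothed curve can follow individual harmonics instead of the envelope. On a synthetic harmonic spectrum whose peaks lie on a smooth curve, the sketch converges onto that curve, whereas a single cepstral smoothing pass stays well below the peaks.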


Record date

2011-04-06 02:00:00
