A spectral envelope is a curve in the frequency-magnitude plane that envelops the short-time spectrum of a signal, for example by connecting the peaks that represent sinusoidal partials, or by modeling the spectral density of a noise signal. It describes the perceptually pertinent distribution of energy over frequency, which determines a large part of the timbre of instruments and the type of vowel in speech. Because spectral envelopes are so important for sound synthesis, a higher-level approach to handling them is taken here. We present programs that use spectral envelopes for analysis, representation, manipulation, and synthesis. Spectral envelopes can be estimated by linear prediction, the cepstrum, or the discrete cepstrum. The strengths and weaknesses of each method are discussed relative to the requirements for estimation, such as robustness and regularity. Improvements to discrete cepstrum estimation (regularization, statistical smoothing, a logarithmic frequency scale, added control points) are presented. For speech signals, a composite envelope is shown to be advantageous. It is estimated from the sinusoidal partials and, above the highest partial frequency, from the noise part. The representation of spectral envelopes is central to their handling: a good representation is crucial for the ease and flexibility with which they can be manipulated. Several requirements are laid out, such as stability, locality, and flexibility. The representations (filter coefficients, sampled envelopes, break-point functions, splines, formants) are then discussed relative to these requirements. The notion of fuzzy formants, based on formant regions, is introduced. Some general forms of manipulation and morphing are presented. For morphing between two or more spectral envelopes over time, linear interpolation and formant shifting, which preserves valid vocal-tract characteristics, are considered.
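As a minimal sketch of the break-point-function representation and of morphing by linear interpolation, the following Python fragment may help. It is not the implementation described here: the names `bpf_eval` and `morph`, the `(frequency in Hz, magnitude in dB)` pair layout, and the choice to interpolate magnitudes in dB are all assumptions made for illustration.

```python
from bisect import bisect_right


def bpf_eval(bpf, freq):
    """Evaluate a break-point function at freq by linear interpolation.

    bpf is a list of (frequency_hz, magnitude_db) pairs sorted by frequency
    (an assumed layout). Outside the covered range the end value is held.
    """
    freqs = [f for f, _ in bpf]
    if freq <= freqs[0]:
        return bpf[0][1]
    if freq >= freqs[-1]:
        return bpf[-1][1]
    # Index of the first break point strictly above freq.
    i = bisect_right(freqs, freq)
    (f0, m0), (f1, m1) = bpf[i - 1], bpf[i]
    return m0 + (m1 - m0) * (freq - f0) / (f1 - f0)


def morph(env_a, env_b, t, grid):
    """Interpolate linearly between two envelopes on a common frequency grid.

    t = 0 returns env_a, t = 1 returns env_b. Interpolating dB values is an
    assumption; linear magnitudes could be interpolated instead.
    """
    return [
        (f, (1.0 - t) * bpf_eval(env_a, f) + t * bpf_eval(env_b, f))
        for f in grid
    ]


if __name__ == "__main__":
    env_a = [(0.0, 0.0), (1000.0, -20.0)]
    env_b = [(0.0, -10.0), (1000.0, -30.0)]
    print(morph(env_a, env_b, 0.5, [0.0, 500.0, 1000.0]))
```

Plain interpolation of this kind blends the two curves point by point, so a formant at different frequencies in the two envelopes fades out in one place and in at another. Formant shifting instead moves the formant along the frequency axis.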
For synthesis, spectral envelopes are applied to sinusoidal additive synthesis and are used to filter the residual noise component. This is especially easy and efficient for both components with the FFT⁻¹ (inverse FFT) technique. Finally, in additive analysis, spectral envelopes can be generalized to apply not only to magnitude but also to frequency and phase, while keeping the same representation. The frequency envelope expresses the harmonicity of partials over frequency; the phase envelope expresses the phase relations between harmonic partials. With this high-level approach to spectral envelopes, additive synthesis can avoid the dilemma of how to control hundreds of partials, and the residual noise part can be treated with the same manipulations as the sinusoidal part because both use the same representation. In addition, high-quality singing voice synthesis can use morphing between sampled spectral envelopes and formants to combine natural-sounding transitions with a precisely modeled sustained part. The methods mentioned above have been implemented in a C library that uses the SDIF standard for sound description data as its file format, and they are used in various real-time and non-real-time programs on Unix and Macintosh.
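To illustrate how a spectral envelope can stand in for per-partial control in additive synthesis, here is a small Python sketch. It uses a plain time-domain oscillator bank, not the FFT⁻¹ technique, and every name in it (`partial_amplitudes`, `synthesize`, `example_envelope_db`) as well as the envelope shape is hypothetical.

```python
import math


def db_to_amp(db):
    """Convert a magnitude in dB to a linear amplitude."""
    return 10.0 ** (db / 20.0)


def example_envelope_db(freq):
    """Hypothetical envelope: flat up to 1 kHz, then -12 dB per octave."""
    if freq <= 1000.0:
        return 0.0
    return -12.0 * math.log2(freq / 1000.0)


def partial_amplitudes(f0, n_partials, envelope_db):
    """Read one amplitude per harmonic partial off the envelope.

    Returns (frequency_hz, linear_amplitude) pairs for partials k * f0.
    """
    return [
        (k * f0, db_to_amp(envelope_db(k * f0)))
        for k in range(1, n_partials + 1)
    ]


def synthesize(partials, duration, sample_rate=44100):
    """Sum sinusoids in the time domain (a simple oscillator bank).

    The caller is assumed to keep all partials below sample_rate / 2.
    """
    n = int(duration * sample_rate)
    return [
        sum(a * math.sin(2.0 * math.pi * f * i / sample_rate)
            for f, a in partials)
        for i in range(n)
    ]


if __name__ == "__main__":
    partials = partial_amplitudes(500.0, 4, example_envelope_db)
    samples = synthesize(partials, 0.25, sample_rate=8000)
    print(partials)
    print(len(samples))
```

In this sketch the only control data is the fundamental frequency and the envelope, so changing the timbre means editing one curve and leaving the individual partial amplitudes alone.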
Conference or congress contribution: ICMC: International Computer Music Conference