A-Team : Formant Wave Function Synthesizer : UTS 48770
From SoftwarePractice.org
Introduction
This is the project documentation site for the A-Team in Signal Processing 48770 at UTS in the Spring Semester 2006. We will endeavour to create a Formant Wave Function (FWF) Synthesizer in MATLAB.
A-Team Members:
Jace Davies
John Gorgees
Stephen Hinwood
Tungnguyen Vupham (Tung)
Background information
Sound Synthesis: Methods and Tools
The field of sound analysis, processing and synthesis has been developing since the 1950s and is still advancing rapidly in both algorithms and technology. Over the decades, various methods of analysing, processing and synthesising sound, namely Additive + Residual, Spectral Envelope, Resonance Model Analysis, Formant Wave Functions and the like, have been developed to suit the specific needs and environments of composers and researchers.[1] In this project we briefly cover the Additive + Residual, Spectral Envelope and Resonance Model Analysis methods so that we can spend more time investigating Formant Wave Function synthesis in depth.
Additive + Residual method
'Additive and Residual' is commonly regarded as the most powerful and flexible method of sound synthesis because of the simplicity with which it represents sound signals. The method represents a sound signal as a sum of time-varying sinusoidal components; where the signal is not purely sinusoidal, a residual (noise) component is added to the model. In doing so, the method allows the pitch and length of the sound signal to be varied independently. The main parameters of this method are magnitude and frequency. These parameters are meaningful and easily understood because frequency and magnitude map readily onto human perception, which makes the method attractive to composers and researchers. The original algorithm required cumbersome calculations to achieve high resolution and precision. A newer method based on the inverse Fast Fourier Transform (FFT⁻¹) increases efficiency, reducing the computation cost by a factor of 15 compared with the older procedure. The method can be improved further by grouping large numbers of sinusoidal partials on a statistical basis, and by applying the High Resolution Matching Pursuit (HRMP) algorithm in cases where the sinusoidal model is inadequate.[2]
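The idea of a sinusoidal part plus a residual can be sketched in a few lines of MATLAB. This is only an illustration: the sampling rate, partial amplitudes, decay rates and noise level are assumed values, not figures from the cited papers.

```matlab
% Minimal additive + residual sketch (all values are illustrative assumptions)
fs  = 8000;                      % sampling rate in Hz
dur = 0.5;                       % duration in seconds
t   = (0:round(dur*fs)-1)/fs;    % time vector

f0    = 220;                     % fundamental frequency in Hz
amps  = [1.0 0.5 0.25];          % peak magnitude of each partial
decay = [3 5 8];                 % decay rate of each partial (time-varying magnitude)

% Sinusoidal part: sum of partials whose magnitudes vary with time
sinusoidal = zeros(size(t));
for k = 1:numel(amps)
    sinusoidal = sinusoidal + amps(k)*exp(-decay(k)*t).*sin(2*pi*k*f0*t);
end

% Residual part: low-level noise standing in for the non-sinusoidal component
residual = 0.01*randn(size(t));

y = sinusoidal + residual;       % modelled signal
```

Because magnitude and frequency are separate parameters, changing `f0` alters the pitch without touching `dur`, and vice versa.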
Spectral Envelope
This is the analysis of the curve of magnitude against frequency in the short-time spectrum of a signal. Connecting the peaks of the sinusoidal partials gives a line that represents their magnitudes; the same envelope can model the spectral density of a noise signal. It can be used to determine the vowel in speech and the timbre of instruments. The envelope can be estimated by linear prediction, cepstrum or discrete cepstrum. The estimation is conducted from the sinusoidal partials, and from the noise above the maximum partial frequency. For general modification of the signal, one can morph two or more envelopes over time, or simply shift the formants. For synthesis, this method is usually used in conjunction with sinusoidal additive synthesis, or with FFT⁻¹ for greater simplicity and efficiency.
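Of the estimators mentioned, cepstral smoothing is perhaps the easiest to sketch. The example below is an assumption-laden illustration: the test signal, frame length and the number of retained cepstral coefficients `K` are arbitrary choices.

```matlab
% Spectral envelope by cepstral smoothing (a sketch; signal and sizes are assumptions)
fs = 8000;  N = 1024;
n  = 0:N-1;

% Test frame: harmonic tone at 200 Hz with decreasing partial amplitudes
x = zeros(1,N);
for k = 1:10
    x = x + (1/k)*sin(2*pi*200*k*n/fs);
end

w      = 0.5 - 0.5*cos(2*pi*n/N);        % Hann window, written out by hand
logmag = log(abs(fft(x.*w)) + eps);      % log magnitude of the short-time spectrum
c      = real(ifft(logmag));             % real cepstrum

K = 30;                                  % low-quefrency coefficients to keep
lifter = zeros(1,N);
lifter([1:K, N-K+2:N]) = 1;              % symmetric low-pass lifter

envelope = exp(real(fft(c.*lifter)));    % smoothed magnitude envelope
```

A larger `K` follows individual partials more closely, while a smaller `K` gives a smoother curve that shows only the broad formant structure.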
Resonance Model Analysis
The model is essentially a sum of exponentially damped sinusoids, which usually also correspond to the response modes of the instrument. The fundamental procedure of this method is to search for sinusoids in two FFTs. One FFT is taken over a window at the beginning of the sound signal, where the resonances are at maximum amplitude. The other is taken further into the sound signal, where the resonances have been damped. Each resonance appears as a peak at a similar frequency in both transforms. The peaks are then matched, and pairs with sufficiently close frequencies are considered resonances. The procedure is repeated with various window sizes and positions for greater precision. The result is a resonance model of the sound signal with amplitude, decay rate and frequency as parameters. Modifying these characteristics yields a synthesised sound as output.
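Once a resonance model is available, resynthesis is a direct sum of damped sinusoids. The sketch below uses a made-up three-resonance model; the frequencies, amplitudes and decay rates are hypothetical, not measured values.

```matlab
% Resynthesis from a resonance model (the model values are hypothetical)
fs = 8000;
t  = (0:fs-1)/fs;                % one second of samples

% Each row: [frequency in Hz, amplitude, decay rate in 1/s]
model = [ 440 1.0  4;
          880 0.5  6;
         1320 0.3 10];

y = zeros(size(t));
for k = 1:size(model,1)
    y = y + model(k,2)*exp(-model(k,3)*t).*sin(2*pi*model(k,1)*t);
end
```

Editing a row of `model`, for example doubling a decay rate, changes the synthesised sound in the way the paragraph above describes.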
Formant Wave Function Synthesis
The Formant Wave Function (FWF), originally known as FOF (Forme d'Onde Formantique), is a method for directly calculating the amplitude of the waveform of a signal as a function of time.[3] The method has several advantages that make it persuasive. First and foremost, the FWF formula is fast and simple: it uses a limited number of functions and is mostly used for general-purpose synthesis. Secondly, it allows signals to be modelled without the need to separate the excitation function from the filter. Lastly, Linear Predictive Coding (LPC) and/or the Fourier transform can be used to automatically extract the formant frequency, bandwidth and other parameters from pre-recorded sound.[4]
The method separates the signal, in the frequency domain, into partials corresponding to one fundamental frequency; each is called a partial wave function. Summing these partial wave functions, which can be considered FWFs, reconstitutes the original signal, and the spectrum of the FWFs forms an envelope. The FWFs are then tabulated, and each can be manipulated independently in amplitude or in time to change the spectral shape. Complex sound signals can be synthesised using a large and comprehensive lookup table; the downside, however, is file storage.[5]
FOF (Forme d'Onde Formantique) is based on a synthesis method developed by Xavier Rodet for the CHANT program at IRCAM. It produces a series of partials, shaped into a formant (resonant) region that can be used to build up a vocal or instrumental simulation. Since the operation works in the time domain, generating a sequence of excitations or grains, it can also be used for granular synthesis. One very effective technique is to move from a timbral to a granular texture.
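A single FOF grain is commonly written as a sinusoid at the formant centre frequency, shaped by an exponential decay and a smooth raised-cosine attack. The sketch below follows that common form; the centre frequency, bandwidth, attack time and grain length are illustrative assumptions.

```matlab
% One FOF grain: damped sinusoid with a raised-cosine attack (values are assumptions)
fs     = 44100;
fc     = 600;                    % formant centre frequency in Hz
bw     = 80;                     % formant bandwidth in Hz
attack = 0.003;                  % attack (rise) time in seconds
dur    = 0.02;                   % grain length in seconds

t     = (0:round(dur*fs)-1)/fs;
alpha = pi*bw;                   % decay rate linked to the bandwidth
beta  = pi/attack;               % attack rate

env       = exp(-alpha*t);                           % exponential decay
rise      = t <= attack;                             % samples inside the attack
env(rise) = env(rise).*0.5.*(1 - cos(beta*t(rise))); % smooth onset

grain = env.*sin(2*pi*fc*t);
```

A wider bandwidth makes the grain die away faster, and a longer attack narrows the skirts of the formant in the spectrum.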
A classic approach to formant synthesis was implemented in Matlab using Peterson and Barney's formant tables to create a formant matrix, which contains the first three formant frequency values of 10 American-English monophthong vowels as spoken by 76 speakers (33 men, 28 women and 15 children). Although it has only three control parameters (pitch, gender (male, female or child) and vowel type), it is far more suitable for hyper-spectral data sonification, because if data values were mapped to the amplitudes and the bandwidths of formant peaks, vowel sounds with different sonority could be obtained.
So, in a few words, what are formants? Formants are the resonances of the vocal tract, that is, the peaks of its transfer function. In speech, two formants in the harmonic spectrum are used to categorize the vowels. The rapid variation of formants during transients also contributes to the recognition of consonants, as do the frequencies of the broad spectral bands, and timing cues. Together, these parameters are used to convey phonemes, and thus code the 'textual information': the information retained in a transcription.
Formants can also be described as the enhanced bands of frequencies in the sound of instruments. For instance, the bridge and body of a good violin will resonate very efficiently in the range 1 to 4 kHz; we say it has a formant in that range. If the violin plays its lowest note, G3 with f = 196 Hz, then this formant gives strong harmonics between the 5th and 20th. If it plays the G three octaves higher, with f = 1570 Hz, then this formant gives strong 1st and 2nd harmonics.
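The harmonic numbers quoted for the violin can be checked with a little arithmetic. The sketch below takes the 1 to 4 kHz band at face value as hard limits, which is an assumption; a real formant has soft edges, so the 5th harmonic of G3 (980 Hz) sits just below the nominal lower limit but is still enhanced.

```matlab
% Which harmonics of a note fall inside the 1-4 kHz violin formant?
band   = [1000 4000];            % formant range in Hz, treated as hard limits
f_low  = 196;                    % G3, the lowest note of the violin
f_high = 196*2^3;                % 1568 Hz, i.e. about 1570 Hz

% Harmonic numbers n such that band(1) <= n*f0 <= band(2)
harm = @(f0) ceil(band(1)/f0):floor(band(2)/f0);

h_low  = harm(f_low);            % harmonics 6 to 20
h_high = harm(f_high);           % harmonics 1 and 2
```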
A more spectacular example of an instrument with strong formants is the voice. Depending on the shape of the mouth, different frequency bands (formants) are radiated more strongly, no matter what the pitch of the voice. We use these formants to identify all vowels and many consonants. Thus if we hear a note which has loud harmonics at or near the frequencies 400, 2000 and 2600 Hz, we will usually identify it as the vowel I (as in “bit”), independently of the pitch of the note.
Are there any categorizations which differentiate between formants? Steady and varying formants are the two main categories, steady formants being widely featured in the acoustics of speech and music, as they are a major component of sustained phonemes and instrumental timbre respectively.
Have many methods been devised in the field of formant analysis and synthesis? A number of methods have been developed over the past two and a half decades. These are summarized as follows:
- Excitation-synchronous formant analysis: devised particularly to analyze female speech. The most successful variant is closed-phase formant analysis, when compared with the pitch-synchronous or fixed-frame formant analysis methods. Closed-phase analysis allows more precise tracking of the transients of the vocal tract, with fewer missed or extra formants and enhanced formant continuity. Typically, closed-phase analysis is achieved by setting the analysis interval to less than the pitch interval.
- Pitch-synchronous formant analysis: the analysis interval is equal to the whole pitch interval. This method has been widely used, particularly for high-quality speech synthesis research.
- Fixed-frame analysis: this is a very common analysis method where the analysis interval is fixed, typically at 20 to 30 ms, with a frame rate of 10 ms.
- Source-filter model: To generate vowel-like sounds a model of the characteristics of the vowel sound production mechanisms is constructed using a band-limited impulse train as a glottal source subsequently filtered by a parallel or cascaded series of resonators with appropriate corresponding formant frequencies and bandwidths.
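The source-filter model in the last item can be sketched with MATLAB's `filter` function. This is a simplified illustration: it uses a plain impulse train rather than a band-limited one, a cascade of two-pole resonators, and assumed formant frequencies and bandwidths for an "ah"-like vowel.

```matlab
% Source-filter vowel sketch: impulse train through cascaded resonators
fs  = 16000;                     % sampling rate in Hz
f0  = 120;                       % pitch of the glottal source in Hz
dur = 0.5;                       % duration in seconds
N   = round(dur*fs);

% Glottal source: plain impulse train (not band-limited in this sketch)
source = zeros(1,N);
source(1:round(fs/f0):end) = 1;

formants = [730 1090 2440];      % assumed formant frequencies in Hz
bws      = [ 60   90  120];      % assumed bandwidths in Hz

y = source;
for k = 1:numel(formants)
    r     = exp(-pi*bws(k)/fs);          % pole radius set by the bandwidth
    theta = 2*pi*formants(k)/fs;         % pole angle set by the frequency
    a     = [1, -2*r*cos(theta), r^2];   % two-pole resonator denominator
    b     = sum(a);                      % scale for unity gain at DC
    y     = filter(b, a, y);             % cascade: feed output to next stage
end
y = y/max(abs(y));               % normalise to avoid clipping on playback
```

The result can be auditioned with `sound(y, fs)`. A parallel arrangement would instead filter `source` separately through each resonator and add the outputs.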
FWF synthesis is a great method of sound synthesis, but it does have its limitations. Firstly, the FWF algorithm requires a lookup table of parameters to operate. The higher the resolution and precision, the more comprehensive and cumbersome the lookup table becomes. This results in greater file storage requirements, which hinder the computation as well as the portability of the algorithm. Secondly, the FWF algorithm does not directly allow for cross-synthesis between two sounds. Thirdly, FWF is still based on a linear model, which is not the best way to synthesise voice accurately. Lastly, from a composer's point of view, it is not as convenient or as easily understood as other methods of sound synthesis.
A formant is a phenomenon that describes some of the resonances of a sound. A vowel sound is made up of a certain set of formants, while a clarinet sound is made up of another set.
Formant wave functions mathematically describe the impulse response of formants in the time domain. An instrument sound can be generated as follows:
- The designer chooses a set of formants that when added together will describe the instrument's sound
- The Formant Wave Functions of each of those formants are found
- The fundamental frequency of a tone on the well-tempered scale (see below) is fed into each Formant Wave Function as a parameter
- The output of each excited Formant Wave Function is added to the others
- The previous two steps are repeated at the beginning of each period of the tone wave
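The steps above can be sketched as an overlap-add loop: one excitation response is built by summing a wave function per formant, and a copy is started at the beginning of every period of the tone. The formant table, attack time and grain length below are hypothetical values chosen only to make the sketch run.

```matlab
% FWF synthesis sketch: one grain per period of the fundamental
fs  = 44100;
f0  = 196;                       % fundamental of the tone in Hz
dur = 0.5;                       % note length in seconds

% Hypothetical formant set: [centre frequency Hz, bandwidth Hz, amplitude]
F = [ 500  80 1.00;
     1500 100 0.50;
     2500 120 0.25];

attack = 0.002;                  % attack time of each wave function in seconds
glen   = round(0.03*fs);         % grain length in samples
tg     = (0:glen-1)/fs;
rise   = 0.5*(1 - cos(pi*min(tg,attack)/attack)); % smooth onset, then held at 1

% Response to one excitation: sum of the formant wave functions
grain = zeros(1,glen);
for k = 1:size(F,1)
    grain = grain + F(k,3)*rise.*exp(-pi*F(k,2)*tg).*sin(2*pi*F(k,1)*tg);
end

% Start a new grain at the beginning of each period and add the overlaps
N      = round(dur*fs);
y      = zeros(1, N+glen);
period = round(fs/f0);
for start = 1:period:N
    idx    = start:start+glen-1;
    y(idx) = y(idx) + grain;
end
y = y(1:N)/max(abs(y(1:N)));     % trim the tail and normalise
```

Changing `f0` changes the pitch while the formant table, and hence the timbre, stays fixed.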
We are going to achieve this in Matlab.
The Plan
- Decide on an instrument to model
- Determine the instrument's formants
- Find that instrument's Formant Wave Functions
- Write those functions as Matlab code
- Write code in Matlab that feeds an arbitrary pure (discrete time) tone into each of the Formant Wave Functions, and adds the resultant vectors together
- Loops the above process at an interval equal to the period of the pure tone
- Outputs the sound using Matlab's inbuilt tools
Function Generator
- Musical Note Fundamental, from Dr. Ty J. Prosa [6].
The following zip file contains a MATLAB function file for each musical note 'A' through 'G' and a MATLAB M-file that calls each of the musical note function files in order to play the well-known tune of 'Happy Birthday'.
The musical note function files have two inputs, (o,L), and return a single row vector whose length depends on the input parameter 'L'. 'L' is the length of the signal to be created, in seconds. 'o' is the octave, that is, the number by which the base frequency of each note is multiplied (e.g. for C with o = 5, 65.41 × 5 = 327.05 Hz; note that middle C itself is 261.63 Hz, which is 65.41 × 4). Pauses are used to prevent notes from playing over the top of each other. Note that the pauses are not as long as the notes. This has the effect of cutting each note off before it ends, and hence prevents a 'click' when the note ends.
A modified version of the script in 'happybirthday.m' will be used to feed into our FWF filters.
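A note function of the kind described might look like the sketch below. The original files are not reproduced here, so the handle `note`, the argument order and the sampling rate are assumptions; 8192 Hz is used because it is the default rate of MATLAB's `sound`.

```matlab
% Sketch of a musical note function (names and sampling rate are assumptions)
fs     = 8192;                   % default sampling rate of sound()
base_C = 65.41;                  % base frequency of the note C in Hz

% note(base, o, L): row vector holding L seconds of a pure tone at base*o Hz
note = @(base, o, L) sin(2*pi*base*o*(0:round(L*fs)-1)/fs);

c = note(base_C, 5, 0.5);        % half a second at 65.41*5 = 327.05 Hz
```

To play it in the style of the script described above, call `sound(c, fs)` followed by a slightly shorter `pause(0.4)` before starting the next note.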
Creating The Tune
As we chose to play our own variation of Happy Birthday, it was necessary first to construct the song as pure tones. This was done using the functions above. After a lot of time was spent getting the song timing right, the notes were given an exponential decay to mimic a 'real note' decaying with time. It quickly became clear that a single value for the exponential decay could not be used, because of the varying lengths of time for which the notes were called. I then constructed a simple 'IF' statement to relate the decay to the time variable that the note is called with. This mimics real-life notes, which resonate longer when played long and decay quickly when played short.
MATLAB Code for adjusting exponential decay:
d = abs((Time-1)*10);
if d==0 d = 1; end
As seen below, a fixed decay of 5 left a period of very soft sound at the end of long notes (in fact, it could not be heard). Creating all functions with less decay, on the other hand, meant that notes played for shorter periods did not ring out. The plot below demonstrates the result of the above code: each note decays but remains audible for time periods up to 1 second.
Now we have a tune that sounds similar to Happy Birthday, with simulated resonating notes that can be fed into our FWF.
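The decay rule can be tried out on a few note lengths. In the sketch below the rule itself is taken from the code above, but the way the decay is applied, as an envelope exp(-d*t) multiplying the tone, is an assumption, as are the sampling rate and the test frequency.

```matlab
% Decay constant as a function of note length (rule from the code above)
fs        = 8192;
durations = [0.25 0.5 1];        % note lengths in seconds
d         = abs((durations-1)*10);
d(d==0)   = 1;                   % same effect as the IF statement, vectorised

% Apply the decay to one half-second note (envelope form is an assumption)
Time = 0.5;
t    = (0:round(Time*fs)-1)/fs;
dT   = d(durations == Time);     % decay constant for this note length
y    = exp(-dT*t).*sin(2*pi*327.05*t);

final_level = exp(-dT*Time);     % envelope value at the end of the note
```

With this rule the short notes receive the larger decay constants (7.5 and 5) and the one-second note receives the smallest (1), which matches the behaviour described above.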
Finding the Formant Wave Function
The goal of FWF synthesis is, of course, to generate instrumental sounds from a pure sinusoidal input. This is the role of the Formant Wave Function itself. Due to its simplicity, we settled on the Additive and Residual Method described above.
Mathematically speaking, the sine wave f(t) maps all values of t in the interval [0,infinity) to the set of real numbers R, with a range in the interval [-1,1]. It is smooth, continuous and periodic.
(Note that both of the above defined intervals are sub-sets of R.)
Let g(t) be a function defined across the domain of f(t), such that g(f(t)) maps the range of f(t) to R.
g(t) is defined to be the Formant Wave Function, and it is continuous and smooth across its domain.
In engineering terms, g(t) is to act as an envelope function to scale the amplitude of f(t) throughout all values of t -- essentially applying distortion.
Team A used a tool called COLEA to generate the frequency spectrum of the formants of an audio signal in MATLAB. It is from this spectrum that the FWF is to be generated.
Since a requisite part of this project is to use MATLAB, the formants' magnitude frequency spectrum was sampled at evenly spaced intervals, and the magnitudes of the samples were used as the coefficients $A_n$ of the Fourier series expansion of the FWF:
$$g(t) = \sum_{n=0}^{\infty} A_n \cos(2\pi n f_0 t + \phi_n)$$
where $f_0$ is the frequency of the samples, $n$ marks the $n$th sample and $\phi_n$ is the phase of the $n$th term.
The values of An will, in practice, describe the nature of the sound the synthesiser makes.
To test this idea, we used MATLAB to find the inverse Fast Fourier Transform of the formants' spectrum:
% Here are samples of the formant's spectrum, taken at 10.767 Hz intervals
% These will act as the fourier co-efficients. The original spectrum had magnitude in dB
% and they've been converted to a linear scale
Gf = [72443.6 ... 125.17];
% Find the time vector (there were 512 samples)
t=[0:(1/512):(1-1/512)];
% Now find the inverse FT of the spectrum
gt=ifft(Gf);
% Find the magnitudes of gt
gta=abs(gt);
% It might help to look at the time-domain formant over a longer period
% Define a new time vector
t2=[0:(1/512):(3-1/512)];
% Now concatenate the magnitude vector from above
gta2=[gta gta gta];
% Plot the time-domain representation
plot(t2,gta2)
title('Formant Time Domain Representation')
xlabel('Time')
ylabel('Amplitude')
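% Define the frequency axis for the 512 samples (10.767 Hz spacing)
f=(0:511)*10.767;
% Plot the spectrum in a new figure so the time-domain plot is kept
figure
plot(f,Gf)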
title('Formant Spectrum')
xlabel('Frequency')
ylabel('Magnitude')
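The listing above takes the magnitude of the inverse transform because the inverse FFT of a one-sided spectrum is complex. An alternative, sketched here as a suggestion rather than as what was done in the project, is to mirror the samples into a conjugate-symmetric spectrum so that the inverse transform is real. `Gf_demo` is a synthetic stand-in, not the measured formant data.

```matlab
% Real time-domain signal from a one-sided spectrum (stand-in data)
N       = 512;
Gf_demo = 1000*exp(-(0:N-1)/50);     % synthetic magnitudes, NOT the measured values

% Mirror the samples: [positive frequencies, Nyquist bin, conjugated mirror image]
X = [Gf_demo, 0, conj(fliplr(Gf_demo(2:end)))];

g_complex     = ifft(X);                 % inverse transform, length 2N
residual_imag = max(abs(imag(g_complex))); % should be at rounding level
g             = real(g_complex);         % real time-domain signal
```

The resulting `g` has 2N samples and keeps its sign information, which is lost when only `abs(ifft(...))` is plotted.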
The inverse fourier transform (IFT) created a benchmark for g(t). We compared it with the results of summing the first fifteen elements of the FWF's fourier series expansion:
% Define the time vector. The sampling frequency is at CD quality
t3=[0:1/44100:(20-1/44100)];
% Now set up the formant wave function fourier series. I'm naming the FWF "z".
z1=Gf(1)*cos(1*2*pi*t3);
z2=Gf(2)*cos(2*2*pi*t3);
z3=Gf(3)*cos(3*2*pi*t3);
z4=Gf(4)*cos(4*2*pi*t3);
z5=Gf(5)*cos(5*2*pi*t3);
z6=Gf(6)*cos(6*2*pi*t3);
z7=Gf(7)*cos(7*2*pi*t3);
z8=Gf(8)*cos(8*2*pi*t3);
z9=Gf(9)*cos(9*2*pi*t3);
z10=Gf(10)*cos(10*2*pi*t3);
z11=Gf(11)*cos(11*2*pi*t3);
z12=Gf(12)*cos(12*2*pi*t3);
z13=Gf(13)*cos(13*2*pi*t3);
z14=Gf(14)*cos(14*2*pi*t3);
z15=Gf(15)*cos(15*2*pi*t3);
% Add the cosines together
zref=z1+z2+z3+z4+z5+z6+z7+z8+z9+z10+z11+z12+z13+z14+z15;
% Let's plot the FWF and compare it to the IFT graphed above
% First, find the magnitudes
za=abs(zref);
% Now do the plot
plot(t3,za)
title('Formant Wave Function')
xlabel('Time')
ylabel('Amplitude')
It's clear from the graphs that the function described by the summation of fifteen elements of the Fourier series is a good approximation of the formants' IFT, with the notable exception of a large amount of amplitude distortion in the function compared to the IFT.
To circumvent this, the function was scaled by a factor of 1/100000, giving a much more pleasing result when MATLAB produced sound:
% Find z, a scaled down version of the sum of the fifteen fourier elements
z=(1/100000).*(z1+z2+z3+z4+z5+z6+z7+z8+z9+z10+z11+z12+z13+z14+z15);
% Now plot the magnitude of the scaled function
za=abs(z);
plot(t3,za)
title('Formant Wave Function')
xlabel('Time')
ylabel('Amplitude')
Here was a usable g(t), ready to operate on the input sinusoid f(t).
All that remained to reach our goal, g(f(t)), was to "feed" a sinusoid at a particular frequency into the formant wave function, so that a note could be made:
% Now to feed in the fundamental frequency of a note, and apply it to the formant
% (element-wise product of the sinusoid and the scaled FWF)
sigproc=sin(2*pi*327.05*t3).*z;
% Make the sound
sound(sigproc,44100)
Success: we've found an elegant means to make a formant wave function simply by finding a set of spectrum samples: any set of 512 co-efficients can be used to create a unique sound.
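The whole procedure can be condensed into a vectorised sketch. Two things differ from the listings above and are assumptions of this sketch: the coefficients `A` are synthetic stand-ins for the measured spectrum samples, and the sum is normalised to a peak of 1 instead of being scaled by the fixed factor 1/100000.

```matlab
% Vectorised sketch: Fourier-series FWF applied to a tone (stand-in coefficients)
fs = 44100;
t  = (0:fs-1)/fs;                    % one second of samples

A = 1000*exp(-(0:14)/4);             % fifteen stand-in coefficients, NOT measured data
n = (1:15)';                         % harmonic numbers as a column vector

% Row vector of coefficients times a matrix of cosines sums all fifteen terms
g = A * cos(2*pi*n*t);
g = g/max(abs(g));                   % normalise the peak to 1

% Apply the FWF to a sinusoid at the note frequency
y = sin(2*pi*327.05*t).*g;
```

The note can be played with `sound(y, fs)`. Normalising by the peak keeps the output within the range -1 to 1 whatever coefficients are supplied, so a different spectrum does not need a new hand-picked scale factor.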
The Final Tune
The FWF above was then used in each of the note function files and called using the same 'happybirthday.m' script, giving our final version of Happy Birthday played using a FWF synthesiser in MATLAB.
Glossary
A list of some terms that we needed to understand in order to successfully research this assignment
Formant - As Above
Timbre - This is a subjective quality that enables listeners to tell the difference between two sounds of identical pitch at the same volume. Fundamentally it is the difference in frequency content of the sounds.[7]
Staccato - Detached sounds, indicated by a dot over or under a note. The opposite of legato.[8]
Legato - A smooth and gliding style of singing or playing; the opposite of legato is marcato (in a marked, punchy style) or even staccato (in an even shorter, more aggressive style).[9]
Pitch - The location of a sound on a scale ranging from high to low.[9]
Tempo - The speed of a musical passage or composition; the tempo may range from very slow ("largo" in Italian, "langsam" in German) to extremely fast ("presto" in Italian, "schnell" in German).[9]
Music - The broadest definition of music is organized sound.[10]
References
- [1] Sound Analysis, Processing and Synthesis Tools for Music Research and Production, X. Rodet - XIII CIM 2000, l'Aquila, Sept. 2000
- [2] Musical Sound Signal Analysis/Synthesis: Sinusoidal + Residual and Elementary Waveform Models, X. Rodet - TFTS, 1997
- [3] Time-Domain Formant-Wave-Function Synthesis, X. Rodet - Computer Music Journal, 8, (3): -- 14, 1980.
- [4] Toward The Perfect Audio Morph? Singing Voice Synthesis and Processing, P. Cook - Princeton University Computer Science Department
- [5] Toward The Perfect Audio Morph? Singing Voice Synthesis and Processing, P. Cook - Princeton University Computer Science Department
- [6] http://www.hamline.edu/~tjprosa/old/sound_j2004.htm
- [7] http://www.songstuff.com/glossary/T
- [8] http://library.thinkquest.org/2791/MDCTARY/S.htm
- [9] http://www.vaopera.org/html/allaboutopera/termstoknow.cfm
- [10] http://en.wikipedia.org/wiki/Music
The following link is to a useful 'frequency to note' converter which we will use to test our derived formants of the piano: http://www.phys.unsw.edu.au/music/note/
L.C. Wood and D.J.B. Pearce. Excitation synchronous formant analysis. IEE Proceedings, Vol. 136, Pt. I, No. 2, April 1989
Minkyu Lee, Jan van Santen, Bernd Möbius, and Joseph Olive. Formant Tracking Using Context-Dependent Phonemic Information. IEEE Transactions on Speech and Audio Processing, Vol. 13, No. 5, September 2005
Peterson, G.E. and H.L. Barney. Control methods used in a study of the identification of vowels. Journal of the Acoustical Society of America, 1952, 24, 175-184.
Joe Wolfe. SPEECH AND MUSIC, ACOUSTICS AND CODING, AND WHAT MUSIC MIGHT BE 'FOR'. Proceedings of the 7th International Conference on Music Perception and Cognition, Sydney, 2002
Akira Watanabe. Formant Estimation Methods Using Inverse-Filter Control. IEEE Transactions on Speech and Audio Processing, Vol. 9, No. 4, May 2001
Kyogu Lee, Gregory Sell, Jonathan Berger. SONIFICATION USING DIGITAL WAVEGUIDES AND 2- AND 3-DIMENSIONAL DIGITAL WAVEGUIDE MESH. Proceedings of ICAD 05-Eleventh Meeting of the International Conference on Auditory Display, Limerick, Ireland, July 6-9, 2005
Journal
Plan 1 - We have preliminarily chosen to simulate a piano if we can get the correct formants to allow this
Plan 2 - We have borrowed: Computer Music Tutorial - Curtis Roads From the UTS library
- We have contacted Xavier Rodet and requested a copy of his original paper on FWF. Eagerly awaiting response!
This document contains a guide to an equal tempered keyboard which we will use as a stepping stone in working out the formants of the piano. http://www.phys.unsw.edu.au/~jw/reprints/SoMBooklet2005.pdf#search=%22piano%20formants%22
Minutes from meeting on 20th of Sept:
14:45 meeting convenes: stephen, tung and john present
14:50 discussed tung's book titled 'labs for signals and systems' and its relevance to our project
15:20 discussed john's program, COLEA, and its relevance to our project (http://ccrma-www.stanford.edu/~jmccarty/formant.htm)
15:30 discussed john reekie's suggestion to approach sydney uni library to obtain a copy of Xavier's journal article
15:45 discussed alternative sources to search for information on this subject as it is rather scarce, options include sampling our own signals using the COLEA software in conjunction with a CASIO synthesiser to find the formants of the piano...
- We've received a copy of the Rodet paper - Time-Domain Formant-Wave-Function Synthesis. It describes in technical detail how to go about generating sounds using formants. The function must be excited, but the output must be passed into a parallel filter. It is the output of _that_ filter that we need to model.
- We've found a ready-made matlab function called COLEA to model the formants found at any instant in a digital sound file. This function allows us to select a point in the sounds's time domain representation and see the formants present at that point. This is extremely useful - once we've found those formants, we can then attack the task of turning them into a set of mathematical functions.
-25 October 2006 We have finally made the breakthrough of getting a usable FWF. By using a high-resolution wave file with the chromatic scale in COLEA, we are able to get an output of the amplitude vs. frequency data of the formant, corresponding to the formant's spectrum graph as generated by COLEA. This data can then be used to find the Fourier coefficients for a series representing the formant wave function.
After speaking to Rajiv, we determined that this set of sinusoids (with an encapsulating exponential to model the decay of a note) will suffice as a formant wave function.
-08 November 2006 Early Wednesday morning, we have consolidated most of the resources and updated most of the background info about the project. This documentation will then be examined by the whole group once again before the final document is agreed upon. So far so good.






