Team I Speaker recognition
From SoftwarePractice.org
This is the page for group I. We have chosen the topic "Speaker Recognition" for our project. We are interested in this technology as we believe that it is practical and useful. We look forward to getting a good learning outcome from the project.
Team members
Ke Ye 10015220
Asif Hirani 10241127
Ki-Seung (Harry) Kim 99075828
Htin Aung Moe 10157446
Meeting Minutes
Abstract
The objective of the project is to develop a speaker recognition program in MATLAB that can identify a particular individual from words spoken by him or her. The program must be text-independent.
Introduction
A speaker's voice can be analysed in both the frequency domain and the time domain. In the frequency domain, applying the Fast Fourier Transform (FFT) gives the spectrum of an input signal, which lets us analyse the frequencies present in a given segment of that signal and also allows for pitch extraction. In the time domain, the main parameters we can analyse are duration and amplitude.
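As a minimal sketch of the two views described above, the following Python snippet computes a magnitude spectrum and reads off the strongest frequency, then reports the time-domain duration and peak amplitude. It is an illustration only and not the project's MATLAB code: it uses a direct discrete Fourier transform (the FFT computes the same values, only faster), and the 200 Hz test tone, the 8000 Hz sample rate and the function names are all assumptions made for the example. Picking the strongest bin is a crude stand-in for pitch extraction on real speech.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Return |X[k]| for k = 0 .. N/2 using a direct DFT (O(N^2))."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        total = 0j
        for t, x in enumerate(samples):
            total += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(total))
    return spectrum


def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest non-DC bin."""
    spectrum = magnitude_spectrum(samples)
    peak_bin = max(range(1, len(spectrum)), key=lambda k: spectrum[k])
    return peak_bin * sample_rate / len(samples)


# Hypothetical input: a 200 Hz sine tone, 400 samples at 8000 Hz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(400)]

# Frequency-domain parameter: strongest frequency component.
peak_hz = dominant_frequency(tone, SAMPLE_RATE)

# Time-domain parameters: duration in seconds and peak amplitude.
duration_s = len(tone) / SAMPLE_RATE
peak_amplitude = max(abs(x) for x in tone)

print(peak_hz, duration_s, peak_amplitude)
```

With 400 samples at 8000 Hz each bin is 20 Hz wide, so the 200 Hz tone falls exactly on a bin centre. Real speech segments rarely do, which is one reason practical systems window the signal before transforming it.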
Theory
Speech Signal
Cepstrum
Mel-Frequency Cepstral Coefficients (MFCC)
Vector Quantization (VQ): K-means
Distance Measure: Euclidean Distance
Approach to analysis
To identify the voice of an unknown speaker we need to:
- Record set words or phrases of speech from the known speakers
- Extract characteristic features of the speech of the known speakers
- Create models of the features of the known speakers
- Compare the features from the unknown speaker's utterances with the statistical models of the voices of the speakers known to the system
- Decide which known speaker the test utterance most likely belongs to
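The modelling, comparison and decision steps above can be sketched as follows. This is a hedged illustration in Python, not the team's MATLAB implementation. It assumes that feature extraction has already produced one list of feature vectors per speaker, and it replaces real MFCC vectors with synthetic two-dimensional points. The function names, the speaker labels `speaker_a` and `speaker_b`, the codebook size of 4 and the fixed random seeds are all assumptions chosen for the example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_codebook(vectors, k, iterations=20, seed=0):
    """K-means: return k centroids summarising one speaker's feature vectors."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)  # start from k distinct training vectors
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: euclidean(v, centroids[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def average_distortion(vectors, codebook):
    """Mean distance from each test vector to its nearest codeword."""
    nearest = [min(euclidean(v, c) for c in codebook) for v in vectors]
    return sum(nearest) / len(nearest)


def identify(test_vectors, codebooks):
    """Return the speaker whose codebook gives the smallest average distortion."""
    scores = {name: average_distortion(test_vectors, cb)
              for name, cb in codebooks.items()}
    return min(scores, key=scores.get), scores


def toy_features(center, count, seed):
    """Synthetic stand-in for MFCC vectors: a Gaussian cloud around `center`."""
    rng = random.Random(seed)
    return [[rng.gauss(c, 0.5) for c in center] for _ in range(count)]


# Hypothetical training data for two known speakers.
training = {
    "speaker_a": toy_features([0.0, 0.0], 50, seed=1),
    "speaker_b": toy_features([5.0, 5.0], 50, seed=2),
}
codebooks = {name: train_codebook(feats, k=4) for name, feats in training.items()}

# Hypothetical test utterance drawn from the same region as speaker_b.
test_utterance = toy_features([5.0, 5.0], 30, seed=3)
best, scores = identify(test_utterance, codebooks)
print(best, scores)
```

Because k-means starts from randomly chosen vectors, a different seed generally yields a slightly different codebook and therefore slightly different distortion values, even though the chosen speaker stays the same when the speakers are well separated.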
Implementation - MATLAB code
Results
For example, we test a speech wave file recorded by Brian, called ‘test_brian.wav’. Assume that we do not know at the outset that the speaker is Brian. We therefore feed the .wav file into our speaker recognition system to find out who the speaker is. We run the program twice to confirm the result, since the computed distances vary slightly from run to run. The MATLAB command and its output are as follows:

    % First run
    >> speakerID('test_brian')
    Loading data...
    Calculating mel-frequency cepstral coefficients for training set...
    Harry Carli Brian In___ Hojin
    Performing K-means...
    Calculating mel-frequency cepstral coefficients for test set...
    Compute a distortion measure for each codebook...
    Display the result...
    The average of Euclidean distances between database and test wave file
    Harry    7.0183
    Carli   10.0679
    Brian    5.9630
    In___    8.4237
    Hojin    7.6526
    The test voice is most likely from Brian

    % Second run
    >> speakerID('test_brian')
    Loading data...
    Calculating mel-frequency cepstral coefficients for training set...
    Harry Carli Brian In___ Hojin
    Performing K-means...
    Calculating mel-frequency cepstral coefficients for test set...
    Compute a distortion measure for each codebook...
    Display the result...
    The average of Euclidean distances between database and test wave file
    Harry    6.9995
    Carli    9.9876
    Brian    5.8339
    In___    8.7075
    Hojin    7.6390
    The test voice is most likely from Brian

From the MATLAB output above, each run gives five measurements: the average Euclidean distances between the test wave file and each speaker's codebook in the database. In both runs, Brian's codebook gives the smallest distortion, 5.9630 and 5.8339 respectively. We can therefore conclude that the speaker is Brian, following the rule that the most likely speaker is the one whose codebook has the smallest Euclidean distance to the test utterance.
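The decision rule can be checked directly against the distances reported in the two runs. The short Python sketch below is an illustration, not part of the MATLAB program: the dictionaries copy the reported values, and the helper names `most_likely_speaker` and `margin_to_runner_up` are hypothetical.

```python
# Average Euclidean distances copied from the two runs reported above.
run_1 = {"Harry": 7.0183, "Carli": 10.0679, "Brian": 5.9630,
         "In___": 8.4237, "Hojin": 7.6526}
run_2 = {"Harry": 6.9995, "Carli": 9.9876, "Brian": 5.8339,
         "In___": 8.7075, "Hojin": 7.6390}


def most_likely_speaker(distances):
    """Speaker whose codebook has the smallest average distance."""
    return min(distances, key=distances.get)


def margin_to_runner_up(distances):
    """Gap between the smallest and second-smallest distance."""
    ordered = sorted(distances.values())
    return ordered[1] - ordered[0]


for label, run in (("first", run_1), ("second", run_2)):
    print(label, most_likely_speaker(run), round(margin_to_runner_up(run), 4))
```

In both runs the closest alternative is Harry, whose distance is larger than Brian's by roughly 1.06 in the first run and 1.17 in the second, so the decision does not hinge on the small run-to-run variation.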
Conclusion
