Code
From SoftwarePractice.org
Code Analysis:
This part analyses the speaker recognition code to explain how it works. The code is broken into parts, and each part is explained individually. Execution is not strictly sequential: the program moves from one section to another depending on the user’s input.
Section 1 - Variable Declaration
Code
Continue = 1; % while loop
Selection = 0; % Input
Time = 0; % Input
Fs = 22050; % Sampling Frequency
average_power_male = 0; % Average Power in MALE
average_power_female = 0; % Average Power in FEMALE
Energy = zeros(1,1); % Matrix for ENERGY
Power = zeros(1,1); % Matrix for POWER
Frequency = zeros(1,1); % Matrix for FREQUENCY
Analysis
This section declares the variables used by the program.
Continue controls the main while loop of the program. The user inputs are Selection and Time. Selection chooses the training mode: either training the program with a pre-existing sound file or recording a sound file for training. Time (in seconds) sets how long the user wishes to record. The Energy, Power and Frequency matrices store the values calculated for each parameter while the program is trained.
Section 2 - Start of Program
Code
while (Continue == 1)
Selection = input('Choose Training Method\n1.Input pre-existing sound files\n2.Record a file for training\nPlease Enter 1 or 2: ');
if (Selection == 1) % If the user chooses to input a pre-existing sound file
% Get input for training (MALE)
Male_File_Path = input('Please input the name of a MALE sound file.\n','s');
[y,fs] = wavread(Male_File_Path); % Store the file to Variable 'y'
% Get input for training (FEMALE)
Female_File_Path = input('Please input the path of a FEMALE sound file.\n','s');
[z,fs] = wavread(Female_File_Path); % Store the file to Variable 'z'
end
if (Selection == 2) % If the user chooses to record a file for training
% Ask the user to input the number of seconds he/she wishes to record for
Time = input('Please Enter the Number Of Seconds you wish to Record for: ');
% Ask the user to record a MALE file for Training
Ready_String=input('Type "ready" to start Recording.\nPlease provide the program with a MALE file.\n','s');
if (~strcmp(Ready_String,'ready')) % strcmp compares whole strings; ~= fails when the lengths differ
break
end
y=wavrecord(Time*Fs,Fs); % Records the wave file
% Ask the user to record a FEMALE file for Training
Ready_String=input('Type "ready" to start Recording.\nPlease provide the program with a FEMALE file.\n','s');
if (~strcmp(Ready_String,'ready')) % strcmp compares whole strings; ~= fails when the lengths differ
break
end
z=wavrecord(Time*Fs,Fs); % Records the wave file
end
Analysis
Section 2 collects the inputs used to train the program to recognise whether the speaker is male or female. Selection determines how the training data is supplied. In either case, i.e. inputting a pre-existing file or recording a file for training, the program prompts the user for a male and a female file. The male file is stored in the variable y and the female file is stored in the variable z.
Inputting a pre-existing file only requires the user to enter the name of a file stored in the working directory of the program. When recording a wave file for training, the user is prompted to enter the time for which he/she wishes to record. Once ready to record, the user types ready to start recording; any other input leaves the main loop.
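The wavread and wavrecord functions used in the listing are no longer available in recent MATLAB releases. The following is a minimal sketch of the same input step using audioread and audiorecorder instead. The file name and the recording length are hypothetical example values, and the sketch assumes a working default audio input device; it is not the original program.

```matlab
% Sketch only: audioread and audiorecorder stand in for wavread and wavrecord.
Fs   = 22050;                              % sampling frequency from Section 1
Time = 3;                                  % seconds to record (example value)

% Reading an existing file (hypothetical file name)
[y, fs] = audioread('male_sample.wav');    % y holds the samples, fs the file's rate

% Recording from the default input device
Ready_String = input('Type "ready" to start recording.\n', 's');
if strcmp(Ready_String, 'ready')           % strcmp is safe for strings of any length
    recorder = audiorecorder(Fs, 16, 1);   % 16-bit, single channel
    recordblocking(recorder, Time);        % returns after Time seconds
    z = getaudiodata(recorder);            % column vector of recorded samples
end
```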
Section 3 - Removing Noise from the waveform and Extracting the active part of the Waveform
Code
%==========================================================================
% MALE
%==========================================================================
ShiftedSoundy = zeros(1,1); % Matrix to Store the Active part of the Waveform
Noise_Canceller = (max(y(1:Interval)) - min(y(1:Interval)) + mean(abs(y(1:Interval))))/0.618 % Sets Threshold for Noise
for (d=1:Interval:(length(y)-Interval)) % for searches for the active part of the wave form
Max_Value_In_Interval = max(y(d:d+Interval)); % Searches for the Maximum value between the index and the interval
Min_Value_In_Interval = min(y(d:d+Interval)); % Searches for the Minimum value between the index and the interval
if (Max_Value_In_Interval - Min_Value_In_Interval > Noise_Canceller) % Extracts the active part of the waveform
ShiftedSoundy = cat(1,ShiftedSoundy,y(d:d+Interval)); % Stores the active part of the waveform in ShiftedSoundy
end
end
%==========================================================================
% FEMALE
%==========================================================================
ShiftedSoundz = zeros(1,1); % Matrix to Store the Active part of the Waveform
Noise_Canceller = (max(z(1:Interval))-min(z(1:Interval))+mean(abs(z(1:Interval))))/0.618 % Sets Threshold for Noise
for (d=1:Interval:(length(z)-Interval)) % for searches for the active part of the wave form
Max_Value_In_Interval = max(z(d:d+Interval)); % Searches for the Maximum value between the index and the interval
Min_Value_In_Interval = min(z(d:d+Interval)); % Searches for the Minimum value between the index and the interval
if (Max_Value_In_Interval-Min_Value_In_Interval > Noise_Canceller) % Extracts the active part of the waveform
ShiftedSoundz = cat(1,ShiftedSoundz,z(d:d+Interval)); % Stores the active part of the waveform in ShiftedSoundz
end
end
Analysis
This section removes noise from the waveform and extracts its active part. This is essential if the user wishes to train the program with a string of words. A noise threshold (Noise_Canceller) is computed from the first Interval samples of the recording, which are taken to be background noise. The waveform is then scanned in windows of Interval samples, and only windows whose peak-to-peak range exceeds the threshold are kept. This removes the background noise and the pauses between spoken words, which improves the program's ability to extract the required information from the waveform. Interval, the window length in samples, is used here but is not declared in the listings shown.
Once the noise is minimized and the active part of the waveform determined, the new waveform is stored in ShiftedSoundy for male and ShiftedSoundz for female.
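The windowed threshold test can be tried on a synthetic signal. The sketch below is an illustration under stated assumptions, not the original program: the value of Interval, the test signal, and the names quiet, loud and active are made up for the example. It also starts from an empty vector and uses non-overlapping windows, which differs slightly from the listing above.

```matlab
% Synthetic test signal: low-level noise, a loud burst, then noise again.
Interval = 100;                            % assumed window length in samples
t     = (0:999)';
quiet = 0.01 * sin(2*pi*0.05*t);           % stands in for background noise
loud  = 0.8  * sin(2*pi*0.05*t);           % stands in for speech
y     = [quiet; loud; quiet];

% Threshold from the first window, using the same formula as the listing
first = y(1:Interval);
Noise_Canceller = (max(first) - min(first) + mean(abs(first))) / 0.618;

% Keep only windows whose peak-to-peak range exceeds the threshold
active = zeros(0, 1);                      % start empty so no padding zero is kept
for d = 1:Interval:(length(y) - Interval)
    window = y(d:d+Interval-1);            % non-overlapping windows
    if max(window) - min(window) > Noise_Canceller
        active = [active; window];         % append the active window
    end
end
fprintf('Kept %d of %d samples\n', length(active), length(y));
```

Only the windows covering the loud burst pass the test, so the quiet stretches before and after it are dropped.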
Section 4 - Normalization of the frequency components of the waveforms
Code
maxfftY = max(max(abs(Y))); % Gets the Maximum value of Frequency component in MALE
maxfftZ = max(max(abs(Z))); % Gets the Maximum value of Frequency component in FEMALE
c = 1/log(1+(maxfftY)); % Ratio for Normalization
d = 1/log(1+(maxfftZ)); % Ratio for Normalization
NormY = c*log(1+abs(Y)); % Normalized Frequency for MALE
NormZ = d*log(1+abs(Z)); % Normalized Frequency for FEMALE
Analysis
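This section normalizes the frequency components of the waveform. The magnitude of each component is log-compressed and divided by the log-compressed maximum, so the largest component maps to exactly 1 and all others fall between 0 and 1. This lets the program locate the peak frequency component of the waveform, which is later used to set the frequency threshold for the recognition phase. The normalized components are stored in NormY and NormZ for male and female respectively. Y and Z are the frequency-domain representations of the male and female waveforms; their computation is not shown in the listing.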
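A short sketch of the log normalization on a synthetic signal follows. It assumes that Y is the FFT of the waveform, which the listing does not show; the two-tone test signal is made up for the example.

```matlab
% Log-compress an FFT magnitude spectrum so that its largest value maps to 1.
Fs = 22050;                                    % sampling frequency from Section 1
t  = (0:Fs-1)' / Fs;                           % one second of samples
x  = sin(2*pi*200*t) + 0.3*sin(2*pi*1200*t);   % synthetic two-tone signal
Y  = fft(x);                                   % assumed definition of Y

c     = 1 / log(1 + max(abs(Y)));              % ratio for normalization
NormY = c * log(1 + abs(Y));                   % values now lie in [0, 1]

fprintf('min = %.3f, max = %.3f\n', min(NormY), max(NormY));
```

Since log(1 + a) is zero for a = 0 and increases with a, the strongest component always normalizes to 1, which is why a fixed cut-off such as 0.98 can be used later to locate the peak.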
Section 5 - Normalization of the waveform
Code
Norm_Male = ShiftedSoundy / (max(ShiftedSoundy) - min(ShiftedSoundy))*2; % Normalize the waveform
Norm_Female = ShiftedSoundz / (max(ShiftedSoundz) - min(ShiftedSoundz))*2; % Normalize the waveform
Analysis
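This section normalizes the actual waveform to make it suitable for the Energy and Power calculations. Each waveform is divided by its peak-to-peak range and multiplied by 2, so every normalized waveform has a peak-to-peak range of 2 regardless of how loudly it was recorded. This keeps the later calculations comparable between recordings.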
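The effect of this expression can be checked on a few made-up samples. The vector x below is hypothetical example data.

```matlab
x = [0.1; -0.25; 0.4; -0.05];                 % example samples
Norm_x = x / (max(x) - min(x)) * 2;           % same expression as in the listing
range_after = max(Norm_x) - min(Norm_x);      % peak-to-peak range after scaling
fprintf('Peak-to-peak range after normalization: %.4f\n', range_after);
```

The range becomes 2, but the samples are only confined to the interval from -1 to 1 when the waveform is symmetric about zero; in this example the largest sample ends up above 1.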
Section 6 - Calculating Energy of the waveforms
Code
Norm_Male = Norm_Male'; % Converting into row vector
Norm_Female = Norm_Female'; % Converting into row vector
Energy_Male = sum(Norm_Male.^2) %Finding the Energy in MALE File
% Storing the calculated value for training in Energy Matrix
Energy = horzcat(Energy,Energy_Male);
Energy_Female = sum(Norm_Female.^2) %Finding the Energy in FEMALE File
% Storing the calculated value for training in Energy Matrix
Energy = horzcat(Energy,Energy_Female);
Analysis
The normalized waveform is first converted to a row vector. The energy of the waveform is the sum of the squares of its samples. The energies of the male and female waveforms are appended to the Energy vector, which is later used to set the energy threshold for the recognition phase of the program.
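As a small worked example of the energy calculation and the way results are accumulated, consider the following sketch. The sample values are made up.

```matlab
x = [0.5, -0.5, 1.0, -1.0];           % example normalized samples
Energy_x = sum(x.^2);                 % 0.25 + 0.25 + 1 + 1

Energy = zeros(1, 1);                 % accumulator with a placeholder first entry
Energy = horzcat(Energy, Energy_x);   % append the new value to the right
```

After this step Energy holds the placeholder zero followed by the computed value; the placeholder is removed once training is complete.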
Section 7 - Calculating Power of the waveforms
Code
[S,F,T,Power_Male] = spectrogram(Norm_Male,1024,512,512,8000); % Used to Store the power of the Waveform in Power_Male (function names are case-sensitive)
[A,B,C,Power_Female] = spectrogram(Norm_Female,1024,512,512,8000); % Used to Store the power of the Waveform in Power_Female
Analysis
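The spectrogram function is used to calculate the power contained in the waveform. Its fourth output is the power spectral density of each time segment, arranged with one row per frequency bin and one column per time segment. The call uses a window of 1024 samples, an overlap of 512 samples and 512 FFT points. The power of the male and female waveforms is stored in Power_Male and Power_Female respectively. Note that the listing passes 8000 as the sampling frequency although the recordings are made at Fs = 22050, so the frequency axis returned by the call is scaled as if the signal had been sampled at 8000 Hz.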
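The layout of the power matrix can be inspected with a synthetic tone. This sketch assumes the Signal Processing Toolbox is installed, and it passes Fs rather than the 8000 used in the listing; the 440 Hz tone is an arbitrary example.

```matlab
Fs = 22050;                               % sampling frequency from Section 1
t  = (0:Fs-1) / Fs;                       % one second of samples
x  = sin(2*pi*440*t);                     % synthetic tone

% Window of 1024 samples, overlap of 512 samples, 512 FFT points
[S, F, T, P] = spectrogram(x, 1024, 512, 512, Fs);

[n_bins, n_segments] = size(P);           % rows: frequency bins, columns: time segments
fprintf('%d frequency bins, %d time segments, top frequency %.1f Hz\n', ...
        n_bins, n_segments, F(end));
```

For a real-valued input, F runs from 0 up to half of the sampling frequency passed to the call, which is what ties a row index of P to a frequency in Hz.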
Section 8 - Power Spectral Density (PSD) Calculation
Code
%==========================================================================
% Power Spectral Density (PSD) Calculation
%==========================================================================
%==========================================================================
% Variable Declaration
%==========================================================================
cumulative_power_male = 0; % Used to Store the Cumulative Power in the MALE waveform
cumulative_power_female = 0; % Used to Store the Cumulative Power in the FEMALE waveform
component_amount_male = 0; % Used to Store the Component Amount of Power in MALE
component_amount_female = 0; % Used to Store the Component Amount of Power in FEMALE
length_male = 0; % Length of Matrix in which the Power of MALE is stored
length_female = 0; % Length of Matrix in which the Power of FEMALE is stored
[component_amount_male,length_male] = size(Power_Male); % Gets the dimensions of the Power Matrix for MALE
[component_amount_female,length_female] = size(Power_Female); % Gets the dimensions of the Power Matrix for FEMALE
% As inferred from a number of calculations, the true Power of the waveforms
% lies under the 3000 Hz Threshold, so we use that as the basis of our calculation.
component_amount_male = component_amount_male/(4/3); % Component_amount_male corresponds to the Frequency Threshold
component_amount_female = component_amount_female/(4/3); % Component_amount_female corresponds to the Frequency Threshold
% for loops to calculate the Cumulative Power of MALE and FEMALE Waveforms
for (i=1:ceil(component_amount_male))
cumulative_power_male = cumulative_power_male+sum(Power_Male(i,:)); % Adds the power of frequency row i across all time segments (Power_Male(i) alone would read a single element)
end
for i = (1:ceil(component_amount_female))
cumulative_power_female = cumulative_power_female+sum(Power_Female(i,:)); % Adds the power of frequency row i across all time segments (Power_Female(i) alone would read a single element)
end
%==========================================================================
% Average Power of each waveform recorded
%==========================================================================
average_power_male = cumulative_power_male*1e10/ (component_amount_male*length_male) % Scaled up by 1e10 to store in the Matrix
% Storing the calculated value for training in Power Matrix
Power = horzcat(Power,average_power_male);
average_power_female = cumulative_power_female*1e10/ (component_amount_female*length_female) % Scaled up by 1e10 to store in the Matrix
% Storing the calculated value for training in Power Matrix
Power = horzcat(Power,average_power_female);
Analysis
This section calculates the average power spectral density of each waveform. As inferred from numerous calculations and spectrogram analysis, the power of every waveform lies between 0 and 3000 Hz of the spectrogram. The calculation therefore uses only the lowest three quarters of the frequency rows of the Power_Male and Power_Female matrices, which corresponds to 3000 Hz for the 8000 Hz sampling frequency passed to spectrogram. The number of rows kept is stored in component_amount_male and component_amount_female respectively. The cumulative power is calculated by summing the entries of the waveform that fall within this frequency range. The average power is then obtained by dividing the cumulative power by the number of entries (rows kept multiplied by the number of time segments). The averages are scaled by a factor of 1e10 before being stored in the Power matrix because the raw values are very small and MATLAB's default display format shows only four decimal places; the values themselves are stored in double precision.
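The band-limited averaging can be sketched on a small stand-in matrix. The matrix P and the names n_keep and low_band are hypothetical; the sketch uses n_bins * 3/4 in place of dividing by 4/3, which is the same ratio.

```matlab
% Stand-in PSD matrix: 8 frequency bins (rows) by 3 time segments (columns).
P = [ones(6, 3); 100 * ones(2, 3)];       % the last two rows are high-frequency bins

[n_bins, n_segments] = size(P);
n_keep = ceil(n_bins * 3/4);              % keep the lowest three quarters of the bins

low_band = P(1:n_keep, :);                % every time segment, low bins only
cumulative_power = sum(low_band(:));      % same total as summing row by row
average_power = cumulative_power / (n_keep * n_segments);
fprintf('Average power below the cut-off: %g\n', average_power);
```

The large values in the top two rows do not influence the result, which shows that only the low-frequency band contributes to the average.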
Section 9 - Calculating Frequency of Each Waveform
Code
% for loop to calculate the peak Frequency of the MALE Waveform
for (i = 1:1:length(NormY))
if(NormY(i) > 0.98)
Male_Frequency = i
end
end
% Storing the calculated value for training in Frequency Matrix
Frequency = horzcat(Frequency,Male_Frequency);
% for loop to calculate the peak Frequency of the FEMALE Waveform
for (i = 1:1:length(NormZ))
if(NormZ(i) >0.98)
Female_Frequency = i
end
end
% Storing the calculated value for training in Frequency Matrix
Frequency = horzcat(Frequency,Female_Frequency);
Analysis
This section searches the normalized frequency components for the peak. Because the largest normalized component equals 1, each loop looks for components above 0.98 and keeps the last index at which this occurs. The stored value is therefore a bin index rather than a frequency in Hz. The peak value of each waveform is then stored in the Frequency matrix, which is used to set the frequency threshold for the recognition phase.
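A related way to locate the peak, shown here only as a sketch, is to take the maximum over the first half of the spectrum and convert the bin index to Hz. This is an alternative to the loop above, not the method used in the program. It assumes Y is the FFT of the waveform; the 220 Hz test tone and the length N are made up so that the bins fall 10 Hz apart.

```matlab
Fs = 22050;                               % sampling frequency from Section 1
N  = 2205;                                % 0.1 s of samples, so bins are 10 Hz apart
t  = (0:N-1)' / Fs;
x  = sin(2*pi*220*t);                     % synthetic 220 Hz tone
Y  = fft(x);                              % assumed definition of Y
NormY = log(1 + abs(Y)) / log(1 + max(abs(Y)));

half = NormY(1:floor(N/2) + 1);           % non-mirrored half of the spectrum
[~, peak_index] = max(half);              % index of the strongest bin
peak_hz = (peak_index - 1) * Fs / N;      % bin index to frequency in Hz
fprintf('Peak at bin %d (%.1f Hz)\n', peak_index, peak_hz);
```

Restricting the search to the first half matters because the FFT magnitude of a real signal is mirrored, so a scan over the full vector that keeps the last match ends in the mirrored half.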
Section 10 - Setting Recognition Thresholds from Training
Code
%==========================================================================
% Training Complete
%==========================================================================
% Deleting the First Column of every Storage Variable Matrix, because the
% first element was initialised to zero before concatenation.
Energy(:, 1) = [];
Power(:, 1) = [];
Frequency(:, 1) = [];
%==========================================================================
% Setting Recognition Thresholds from Training
%==========================================================================
% Taking average of all the extracted parameters (Thresholds)
Energy_Threshold = mean(Energy);
Power_Threshold = mean(Power);
Frequency_Threshold = mean(Frequency);
% Ask User to initiate the Recognition part of the Code
Recong = input('Training Complete!\nDo you wish to record files for Recognition? (y/n)\n','s') ;
if (strcmp(Recong,'y'))
% Ask the user to input the number of seconds he/she wishes to record for
Time = input('Please Enter the Number Of Seconds you wish to Record for: ');
Ready_String = input('Please record the first voice. Type "ready" to get recording started.\n','s');
if (~strcmp(Ready_String,'ready')) % strcmp compares whole strings; ~= fails when the lengths differ
break
end
y = wavrecord(Time*Fs,Fs);
Analysis
Each storage matrix was initialised with a single zero, so after concatenation its first column holds that placeholder rather than a measured value. These columns are deleted so that only the values obtained during training remain. The average of each extracted parameter is then taken to set the thresholds for the recognition phase of the program. These thresholds are compared against the parameters extracted from the waveform to be recognized.
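The placeholder removal and averaging can be followed on made-up numbers. The two energy values below are hypothetical.

```matlab
Energy = zeros(1, 1);                 % placeholder first entry, as in Section 1
Energy = horzcat(Energy, 120.0);      % hypothetical male training value
Energy = horzcat(Energy, 80.0);       % hypothetical female training value

Energy(:, 1) = [];                    % drop the placeholder column
Energy_Threshold = mean(Energy);      % midpoint of the two training values
fprintf('Energy threshold: %g\n', Energy_Threshold);
```

With one male and one female sample, the threshold is simply the midpoint between the two, so a recording is classified by which side of that midpoint it falls on.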
Section 11 - Recognition Phase
The recognition phase follows the same pattern as the training phase. The user is asked to record the voice to be recognized, and the same parameters are extracted from the recorded waveform in the same manner. The Frequency, Energy and Average Power of the recorded waveform are then compared to the thresholds determined in the training phase of the program. The conclusion is drawn from this comparison.
Section 12 - Conclusion
Code
%==========================================================================
% Conclusion
%==========================================================================
flag = 0; % Variable to Check Probability of Female Component
if (Recording_Frequency > Frequency_Threshold)
fprintf('The Frequency Test Infers the Recording is Female.\n');
flag = flag + 1;
else
fprintf('The Frequency Test Infers the Recording is Male.\n');
end
if (Average_Power_Recording < Power_Threshold)
fprintf('The Power Test Infers the Recording is Female.\n');
flag = flag + 1;
else
fprintf('The Power Test Infers the Recording is Male.\n');
end
if (Energy_Recording < Energy_Threshold)
fprintf('The Energy Test Infers the Recording is Female.\n');
flag = flag + 1;
else
fprintf('The Energy Test Infers the Recording is Male.\n');
end
if (flag >= 2)
fprintf('The Recording is Female.\n');
else
fprintf('The Recording is Male.\n');
end
end
Analysis
Research and comparison against a number of sources indicated that the female voice, having a higher pitch, has a higher frequency than the male voice. Conversely, the Energy and the Power of a male voice are higher than those of a female voice. Using these assumptions, each of the three tests compares the recorded waveform against the thresholds set during training, and the recording is declared female when at least two of the three tests infer female; otherwise it is declared male.
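The two-out-of-three vote can be written compactly as a logical vector. The thresholds and measurements below are hypothetical values chosen for illustration; the names votes and result do not appear in the original program.

```matlab
% Hypothetical thresholds from training and measurements from a recording
Frequency_Threshold = 180;  Recording_Frequency     = 210;
Power_Threshold     = 5.0;  Average_Power_Recording = 6.2;
Energy_Threshold    = 100;  Energy_Recording        = 85;

% Each entry is true when the corresponding test infers a female speaker
votes = [Recording_Frequency     > Frequency_Threshold, ...
         Average_Power_Recording < Power_Threshold, ...
         Energy_Recording        < Energy_Threshold];

flag = sum(votes);                    % number of tests that infer female
if flag >= 2
    result = 'Female';
else
    result = 'Male';
end
fprintf('The Recording is %s.\n', result);
```

Here the frequency and energy tests infer female while the power test infers male, so the majority decides the outcome.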
