Code

Code Analysis:

This part walks through the speaker recognition code to explain how it works. The code is broken up into sections and each section is explained individually. The sections are not necessarily presented in the order in which they execute, because the program moves from one section to another depending on the user's input.

Section 1 - Variable Declaration

Code

Continue = 1;       % while loop
Selection = 0;      % Input
Time = 0;           % Input
Fs = 22050;         % Sampling Frequency
average_power_male = 0; % Average Power in MALE
average_power_female = 0; % Average Power in FEMALE
Energy = zeros(1,1);  % Matrix for ENERGY
Power = zeros(1,1);   % Matrix for POWER
Frequency = zeros(1,1); % Matrix for FREQUENCY

Analysis

This section declares the variables used by the program.

Continue controls the main while loop of the program. The user inputs are Selection and Time. Selection chooses the training mode: training with a pre-existing sound file (1) or recording a sound file for training (2). Time is the length of the recording in seconds. Fs is the sampling frequency (22050 Hz) used for recording. The Energy, Power and Frequency matrices store the values calculated for each waveform while the program is trained.

Section 2 - Start of Program

Code

while (Continue == 1)
    Selection = input('Choose Training Method\n1.Input pre-existing sound files\n2.Record a file for training\nPlease Enter 1 or 2: ');
    if (Selection == 1) % If the user chooses to input a pre-existing sound file
        % Get input for training (MALE)
        Male_File_Path = input('Please input the name of a MALE sound file.\n','s');
        [y,fs] = wavread(Male_File_Path); % Store the file to Variable 'y'
        % Get input for training (FEMALE)
        Female_File_Path = input('Please input the path of a FEMALE sound file.\n','s');
        [z,fs] = wavread(Female_File_Path); % Store the file to Variable 'z'
    end 
    
    if (Selection == 2) % If the user chooses to record a file for training
        % Ask the user to input the number of seconds he/she wishes to record for
        Time = input('Please enter the number of seconds you wish to record for: ');
        % Ask the user to record a MALE file for Training
        Ready_String=input('Type "ready" to start Recording.\nPlease provide the program with a MALE file.\n','s');
        if (~strcmp(Ready_String,'ready')) % strcmp is safe when the lengths differ; ~= is not
            break
        end
        y=wavrecord(Time*Fs,Fs); % Records the wave file
        
        % Ask the user to record a FEMALE file for Training
        Ready_String=input('Type "ready" to start Recording.\nPlease provide the program with a FEMALE file.\n','s');
        if (~strcmp(Ready_String,'ready')) % strcmp is safe when the lengths differ; ~= is not
            break
        end
        z=wavrecord(Time*Fs,Fs); % Records the wave file
    end

Analysis

Section 2 takes the inputs used to train the program to recognise whether the speaker is male or female. Selection chooses the training method. In either case, i.e. inputting a pre-existing file or recording a file for training, the program prompts the user for a male and a female file. The male waveform is stored in the variable y and the female waveform in the variable z.

Inputting a pre-existing file only requires the user to enter the name of a file stored in the working directory of the program. When recording a wave file for training, the user is first prompted for the recording time in seconds. Once ready to record, the user types ready to start recording; any other input exits the main loop.
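The file-loading branch and the "ready" check can be sketched as follows. This is a minimal sketch, not the original program: it assumes audioread and audiowrite, the documented replacements for wavread and wavwrite in current MATLAB releases, and the test tone, demo_path and is_ready are hypothetical names introduced only for illustration.

```matlab
% Sketch: load a training file with audioread (assumed replacement for wavread).
Fs = 22050;                              % sampling frequency, as in Section 1
t = (0:Fs-1)' / Fs;                      % one second of sample times
tone = 0.5 * sin(2*pi*200*t);            % synthetic stand-in for a recorded voice
demo_path = [tempname() '.wav'];         % hypothetical temporary file name
audiowrite(demo_path, tone, Fs);         % create a file to read back

[y, fs] = audioread(demo_path);          % y: samples, fs: rate stored in the file
delete(demo_path);                       % clean up the temporary file

% Sketch: a string check that is safe for input of any length.
% 'go' ~= 'ready' raises a size error (2 vs 5 characters); strcmp never does.
is_ready = @(s) strcmp(s, 'ready');      % hypothetical helper
ok_exact = is_ready('ready');            % true
ok_short = is_ready('go');               % false, and no error
```

Reading the rate back into fs rather than assuming Fs avoids a mismatch when a pre-existing file was recorded at a different sampling frequency.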

Section 3 - Removing Noise from the waveform and Extracting the active part of the Waveform

Code

%==========================================================================
%   MALE
%==========================================================================

    ShiftedSoundy = zeros(1,1); % Matrix to store the active part of the waveform
    Noise_Canceller = (max(y(1:Interval)) - min(y(1:Interval)) + mean(abs(y(1:Interval))))/0.618 % Sets Threshold for Noise  
    for (d=1:Interval:(length(y)-Interval)) % for searches for the active part of the wave form
        Max_Value_In_Interval = max(y(d:d+Interval)); % Searches for the Maximum value between the index and the interval
        Min_Value_In_Interval = min(y(d:d+Interval)); % Searches for the Minimum value between the index and the interval
    
        if (Max_Value_In_Interval - Min_Value_In_Interval > Noise_Canceller) % Extracts the active part of the waveform
            ShiftedSoundy = cat(1,ShiftedSoundy,y(d:d+Interval)); % Stores the active part of the waveform in ShiftedSoundy
        end
    end

%==========================================================================
%   FEMALE
%==========================================================================

    ShiftedSoundz = zeros(1,1); % Matrix to store the active part of the waveform
    Noise_Canceller = (max(z(1:Interval))-min(z(1:Interval))+mean(abs(z(1:Interval))))/0.618 % Sets Threshold for Noise
    for (d=1:Interval:(length(z)-Interval)) % for searches for the active part of the wave form
        Max_Value_In_Interval = max(z(d:d+Interval)); % Searches for the Maximum value between the index and the interval
        Min_Value_In_Interval = min(z(d:d+Interval)); % Searches for the Minimum value between the index and the interval
    
        if (Max_Value_In_Interval-Min_Value_In_Interval > Noise_Canceller) % Extracts the active part of the waveform
            ShiftedSoundz = cat(1,ShiftedSoundz,z(d:d+Interval)); % Stores the active part of the waveform in ShiftedSoundz
        end
    end

Analysis

This section removes noise from the waveform and extracts its active part. This is essential if the user wishes to train the program with a string of words. A noise threshold, Noise_Canceller, is computed from the first Interval samples of the recording as their peak-to-peak range plus their mean absolute value, divided by 0.618. The waveform is then scanned in steps of Interval samples, and a window is kept only if its peak-to-peak range exceeds the threshold. Windows that contain only background noise, such as the pauses between spoken words, are discarded. This improves the ability of the program to extract the required information from the waveform. Interval is not declared in the listings shown on this page.

Once the noise is minimized and the active part of the waveform determined, the new waveforms are stored in ShiftedSoundy and ShiftedSoundz for male and female respectively.
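The windowed scan described above can be tried on a synthetic signal. The sketch below is an illustration under stated assumptions: the value of Interval and the test signal are made up, the first window is assumed to contain only background noise, and the sketch uses windows of exactly Interval samples and starts from an empty vector, which differs slightly from the listing.

```matlab
% Sketch of the activity detector on synthetic data (all values assumed).
Interval = 100;                              % window length in samples (assumed)
quiet = 0.01 * sin(2*pi*(1:300)'/20);        % low-level background
loud  = 1.00 * sin(2*pi*(1:300)'/20);        % stand-in for speech
y = [quiet; loud; quiet];                    % 900-sample column vector

% Threshold from the first window, same formula as the listing.
first = y(1:Interval);
threshold = (max(first) - min(first) + mean(abs(first))) / 0.618;

active = [];                                 % start empty, no placeholder zero
for d = 1:Interval:(length(y) - Interval + 1)
    window = y(d:d+Interval-1);              % exactly Interval samples
    if max(window) - min(window) > threshold % peak-to-peak range test
        active = [active; window];           % keep the active window
    end
end
```

Only the three loud windows pass the test here, so active ends up holding the 300 loud samples in their original order.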

Section 4 - Normalization of the frequency components of the waveforms

Code

    maxfftY = max(max(abs(Y))); % Gets the Maximum value of Frequency component in MALE
    maxfftZ = max(max(abs(Z))); % Gets the Maximum value of Frequency component in FEMALE

    c = 1/log(1+(maxfftY)); % Ratio for Normalization 
    d = 1/log(1+(maxfftZ)); % Ratio for Normalization

    NormY = c*log(1+abs(Y)); % Normalized Frequency for MALE
    NormZ = d*log(1+abs(Z)); % Normalized Frequency for FEMALE

Analysis

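This section normalizes the frequency components of the waveform. The magnitude of each component is log-scaled and divided by the log-scaled maximum, so the largest component of NormY and NormZ equals 1. This lets the program locate the peak frequency of the waveform, which is later used to set the frequency threshold for the recognition phase. The normalized components are stored in NormY and NormZ for male and female respectively. Y and Z, the frequency components of the male and female waveforms, are not computed in the listings shown on this page.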
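The log normalization can be checked on a small example. The sketch below assumes that Y is the FFT of the waveform, which the listings on this page do not show; the two-tone test signal is made up for illustration.

```matlab
% Sketch: log-normalize FFT magnitudes so the largest one maps to 1.
n = 0:255;
x = sin(2*pi*50*n/256) + 0.25*sin(2*pi*20*n/256);  % made-up two-tone signal
Y = fft(x);                                        % assumed source of Y

c = 1 / log(1 + max(abs(Y)));                      % ratio for normalization
NormY = c * log(1 + abs(Y));                       % values between 0 and 1

% The strongest component in the first half of the spectrum is the
% 50-cycle tone, which sits at index 51 because MATLAB indices start at 1.
[peak_value, peak_index] = max(NormY(1:128));
```

The logarithm compresses the dynamic range, so weaker components remain visible next to the peak while the peak itself is pinned at 1.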

Section 5 - Normalization of the waveform

Code

    Norm_Male = ShiftedSoundy / (max(ShiftedSoundy) - min(ShiftedSoundy))*2; % Normalize the waveform  
    Norm_Female = ShiftedSoundz / (max(ShiftedSoundz) - min(ShiftedSoundz))*2; % Normalize the waveform

Analysis

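This section normalizes the actual waveform to make it suitable for the Energy and Power calculations. Each waveform is divided by its peak-to-peak range and multiplied by 2, so every normalized waveform spans a range of exactly 2 regardless of the recording level. This keeps the values calculated from different recordings comparable.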
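A short numeric check of the waveform normalization, using made-up sample values:

```matlab
% Sketch: peak-to-peak normalization with made-up samples.
x = [0.1; -0.3; 0.2; 0.05];              % hypothetical waveform samples
Norm_x = x / (max(x) - min(x)) * 2;      % same as 2*x / (max(x) - min(x))
span = max(Norm_x) - min(Norm_x);        % peak-to-peak range after scaling
```

The scaling fixes the span at 2 but does not remove any offset: here the result runs from -1.2 to 0.8 rather than from -1 to 1.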

Section 6 - Calculating Energy of the waveforms

Code

    Norm_Male = Norm_Male'; % Converting into row vector
    Norm_Female = Norm_Female'; % Converting into row vector
 
    Energy_Male = sum(Norm_Male.^2) %Finding the Energy in MALE File
% Storing the calculated value for training in Energy Matrix
    Energy = horzcat(Energy,Energy_Male);
    
    Energy_Female = sum(Norm_Female.^2) %Finding the Energy in FEMALE File
% Storing the calculated value for training in Energy Matrix
    Energy = horzcat(Energy,Energy_Female);
 

Analysis

The normalized waveforms are transposed into row vectors before the energy is calculated. The energy of a waveform is the sum of the squares of its samples. The energies of both the male and the female waveforms are stored in the Energy vector, which is then used to set the threshold for the recognition phase of the program.
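The energy calculation can be verified by hand on a few made-up samples:

```matlab
% Sketch: signal energy as the sum of squared samples.
x_row = [1 -2 3];                        % hypothetical samples, row vector
energy_row = sum(x_row.^2);              % 1 + 4 + 9 = 14
energy_col = sum((x_row').^2);           % a column vector gives the same sum
```

For a vector, sum returns the same total in either orientation; the transpose matters only for the later horzcat and spectrogram steps.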

Section 7 - Calculating Power of the waveforms

Code

    [S,F,T,Power_Male] = spectrogram(Norm_Male,1024,512,512,8000); % Stores the power spectral density of the waveform in Power_Male
    [A,B,C,Power_Female] = spectrogram(Norm_Female,1024,512,512,8000); % Stores the power spectral density of the waveform in Power_Female

Analysis

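spectrogram is used to calculate the power stored in the waveform. Its fourth output is the power spectral density of each segment of the waveform; it is stored in Power_Male and Power_Female for the male and female waveforms respectively. The call uses a 1024-sample window, an overlap of 512 samples, a 512-point FFT and a sampling frequency of 8000 Hz.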

Section 8 - Power Spectral Density (PSD) Calculation

Code

%==========================================================================
%   Power Spectral Density (PSD) Calculation
%==========================================================================

%==========================================================================
%   Variable Declaration
%==========================================================================
    cumulative_power_male = 0; % Used to Store the Cumulative Power in the MALE waveform
    cumulative_power_female = 0; % Used to Store the Cumulative Power in the FEMALE waveform
    component_amount_male = 0; % Used to Store the Component Amount of Power in MALE
    component_amount_female = 0; % Used to Store the Component Amount of Power in FEMALE
    length_male = 0; % Length of Matrix in which the Power of MALE is stored
    length_female = 0; % Length of Matrix in which the Power of FEMALE is stored

    [component_amount_male,length_male] = size(Power_Male); % Gets the dimensions (frequency rows, time columns) of the MALE Power Matrix
    [component_amount_female,length_female] = size(Power_Female); % Gets the dimensions (frequency rows, time columns) of the FEMALE Power Matrix

% As inferred from a number of calculations, the true power of the waveforms lies
% under the 3000 Hz threshold, so we use that as the basis of our calculation.

    component_amount_male = component_amount_male/(4/3); % Component_amount_male corresponds to the Frequency Threshold 
    component_amount_female = component_amount_female/(4/3); % Component_amount_female corresponds to the Frequency Threshold

% for loops to calculate the Cumulative Power of MALE and FEMALE Waveforms
    for (i=1:ceil(component_amount_male))
        cumulative_power_male = cumulative_power_male+sum(Power_Male(i,:)); % Adds row i (one frequency component over all time segments) to the cumulative power
    end

    for i = (1:ceil(component_amount_female))
        cumulative_power_female = cumulative_power_female+sum(Power_Female(i,:)); % Adds row i (one frequency component over all time segments) to the cumulative power
    end

%==========================================================================
%   Average Power of each waveform recorded
%==========================================================================

    average_power_male = cumulative_power_male*1e10/ (component_amount_male*length_male) % Scaled up by 1e10 to store in the Matrix 
% Storing the calculated value for training in Power Matrix
    Power = horzcat(Power,average_power_male);
    
    average_power_female = cumulative_power_female*1e10/ (component_amount_female*length_female) % Scaled up by 1e10 to store in the Matrix
% Storing the calculated value for training in Power Matrix
    Power = horzcat(Power,average_power_female);

Analysis

This section calculates the average power spectral density of each waveform. As inferred from numerous calculations and spectrogram analysis, the power of every waveform lies in the range of 0 to 3000 Hz of the spectrogram. The Power_Male and Power_Female matrices hold one row per frequency component and one column per time segment. The number of rows of each matrix is stored in component_amount_male and component_amount_female respectively and divided by 4/3, so that only the lowest three quarters of the frequency components are considered, i.e. those up to 3000 Hz for the 8000 Hz sampling frequency passed to spectrogram. The cumulative power is then calculated by summing every value of the matrix within this frequency range. The average power is the cumulative power divided by the number of values in the range. The averages are scaled by a factor of 1e10 before being stored in the Power matrix so that the small values remain readable in MATLAB's default display format, which shows only four decimal places.
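The band-limited average can be sketched on a small made-up matrix, which avoids the need for the spectrogram function. The matrix P and the names n_rows, n_cols and n_band are hypothetical; unlike the listing, the sketch divides by the whole number of rows actually summed.

```matlab
% Sketch: average power over the lowest three quarters of the rows.
P = [1 2 3; 4 5 6; 7 8 9; 10 11 12];     % made-up PSD: 4 frequency rows x 3 time columns
[n_rows, n_cols] = size(P);
n_band = ceil(n_rows * 3 / 4);           % equivalent to dividing by 4/3: 3 rows here

cumulative = 0;
for i = 1:n_band
    cumulative = cumulative + sum(P(i, :));   % whole row i, all time columns
end
average_power = cumulative / (n_band * n_cols);

% Vectorized form of the same calculation.
average_power_vec = mean(mean(P(1:n_band, :)));
```

With the 257 rows that a 512-point FFT yields for a real signal, the same rule keeps rows 1 to 193, and row 193 corresponds to 3000 Hz when the sampling frequency is taken as 8000 Hz.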

Section 9 - Calculating Frequency of Each Waveform

Code

% for loop to calculate the peak Frequency of the MALE Waveform
    for (i = 1:1:length(NormY))
        if(NormY(i) > 0.98)
            Male_Frequency = i
        end
    end
% Storing the calculated value for training in Frequency Matrix
    Frequency = horzcat(Frequency,Male_Frequency);

% for loop to calculate the peak Frequency of the FEMALE Waveform
    for (i = 1:1:length(NormZ))
        if(NormZ(i) >0.98)
            Female_Frequency = i
        end
    end
% Storing the calculated value for training in Frequency Matrix
    Frequency = horzcat(Frequency,Female_Frequency);

Analysis

This section searches the normalized frequency components for the peak frequency. Each loop records the last index at which the normalized magnitude exceeds 0.98, so the stored value is an index into NormY or NormZ rather than a frequency in Hz. The peak values of the two waveforms are then stored in the Frequency matrix, which is used to set the frequency threshold for the recognition phase.
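The peak search can be written without a loop. The sketch below uses made-up normalized magnitudes and compares the loop from the listing with a call to find that returns the same index.

```matlab
% Sketch: last index above 0.98, loop form and vectorized form.
NormY = [0.10 0.50 0.99 0.70 1.00 0.30 0.985 0.20];   % made-up values

for i = 1:length(NormY)
    if NormY(i) > 0.98
        loop_index = i;                  % overwritten by each later match
    end
end

find_index = find(NormY > 0.98, 1, 'last');           % same result in one call
```

Because each match overwrites the previous one, the loop keeps the highest qualifying index, not the index of the largest value.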

Section 10 - Setting Recognition Thresholds from Training

Code

%==========================================================================
%  Training Complete
%==========================================================================

% Deleting the first column of every storage matrix, because the first
% element was initialised to zero before concatenation.

Energy(:, 1) = [];
Power(:, 1) = [];
Frequency(:, 1) = [];

%==========================================================================
%  Setting Recognition Thresholds from Training
%==========================================================================

% Taking average of all the extracted parameters (Thresholds)
Energy_Threshold = mean(Energy);
Power_Threshold = mean(Power);
Frequency_Threshold = mean(Frequency);

% Ask User to initiate the Recognition part of the Code

Recong = input('Training Complete!\nDo you wish to record files for Recognition? (y/n)\n','s') ;
if(strcmp(Recong,'y'))
    % Ask the user to input the number of seconds he/she wishes to record for
    Time = input('Please enter the number of seconds you wish to record for: ');
    Ready_String = input('Please record the first voice. Type "ready" to start recording.\n','s');
    if (~strcmp(Ready_String,'ready')) % strcmp is safe when the lengths differ; ~= is not
        break
    end
            
    y = wavrecord(Time*Fs,Fs);

Analysis

Each storage matrix was initialised with a single zero, so after concatenation its first column holds that zero rather than a measured value. These columns are deleted so that only the values obtained during training remain. The averages of the extracted parameters are then taken to set the thresholds for the recognition phase of the program. When training supplies one male and one female value per parameter, each threshold lies midway between the two. The thresholds are later compared with the parameters extracted from the waveform to be recognized.
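The placeholder removal and threshold calculation can be traced with made-up numbers:

```matlab
% Sketch: build the storage matrix, drop the placeholder, average.
Energy = zeros(1,1);                     % placeholder zero, as in Section 1
Energy = horzcat(Energy, 120);           % made-up male energy
Energy = horzcat(Energy, 80);            % made-up female energy

Energy(:, 1) = [];                       % remove the placeholder column
Energy_Threshold = mean(Energy);         % midpoint of the two values: 100
```

Leaving the placeholder in place would pull the mean down to 200/3, which is why the first column has to go before the thresholds are set.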

Section 11 - Recognition Phase

The recognition phase followed a similar pattern to the training phase. The user was asked to record the voice for recognition, and the same parameters were extracted from the recorded waveform in the same manner. The Frequency, Energy and Average Power of the recorded waveform were then compared to the thresholds determined in the training phase of the program. The conclusion was drawn from this comparison.


Section 12 - Conclusion

Code

%==========================================================================
%   Conclusion
%==========================================================================
    flag = 0; % Counts the tests that infer a Female recording

    if (Recording_Frequency > Frequency_Threshold)
        fprintf('The Frequency Test Infers the Recording is Female.\n');
        flag = flag + 1;
    else
        fprintf('The Frequency Test Infers the Recording is Male.\n');
    end
    
    if (Average_Power_Recording < Power_Threshold)
        fprintf('The Power Test Infers the Recording is Female.\n');
        flag = flag + 1;
    else
        fprintf('The Power Test Infers the Recording is Male.\n');
    end

    if (Energy_Recording < Energy_Threshold)
        fprintf('The Energy Test Infers the Recording is Female.\n');
        flag = flag + 1;
    else
        fprintf('The Energy Test Infers the Recording is Male.\n');
    end
    
    if (flag >= 2)
        fprintf('The Recording is Female.\n');
    else
        fprintf('The Recording is Male.\n');
    end

end

Analysis

As researched, calculated and compared with numerous resources, the frequency of the female voice is higher than that of the male voice because it has a higher pitch. Conversely, the Energy and the Power of a male voice are higher than those of a female voice. Using these assumptions, each of the three tests compares a parameter of the recorded waveform with the threshold set during training. The recording is classified as female if at least two of the three tests infer female, and as male otherwise.
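The two-out-of-three vote can be condensed into a single expression. The helper is_female and all numeric values below are hypothetical; the comparisons follow the directions used in the listing.

```matlab
% Sketch: majority vote over the frequency, power and energy tests.
% Arguments: recording values f, p, e and thresholds fT, pT, eT.
is_female = @(f, p, e, fT, pT, eT) ((f > fT) + (p < pT) + (e < eT)) >= 2;

all_agree    = is_female(220, 3, 50, 180, 5, 100);    % 3 votes: true
freq_only    = is_female(220, 8, 150, 180, 5, 100);   % 1 vote: false
freq_outvote = is_female(120, 3, 50, 180, 5, 100);    % 2 votes: true
```

The last case shows that the power and energy tests together can outvote the frequency test, so a quiet male recording could be labelled female.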
