Speaker recognition
From SoftwarePractice.org
This article describes a project that implements a speaker recognition program using Matlab. It is suitable for a team implementation and design project in a second class in signals and systems.
Overview
Speaker recognition is the task of identifying a particular individual from words spoken by him or her. It differs from speech recognition in that no attempt is made to understand the actual words that are spoken.
There are some obvious differences between the voices of particular individuals. Female voices are generally higher in pitch than male voices, so pitch provides a simple (if not particularly reliable) way to distinguish between a male and a female speaker.
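As a rough illustration of the pitch idea, the sketch below estimates the fundamental frequency of a single frame from the peak of its autocorrelation. It is only one possible approach, not a prescribed method. The synthetic test signal, the 8 kHz sample rate, the 60-400 Hz search range, and the 165 Hz decision threshold are all assumptions made for the example; recorded speech would replace the synthetic frame.

```matlab
% Sketch: estimate pitch from the autocorrelation peak of one frame.
% All numeric choices below are illustrative assumptions.
fs = 8000;                          % sample rate in Hz (assumed)
N  = round(0.05 * fs);              % 50 ms frame -> 400 samples
t  = (0:N-1)' / fs;
f0 = 200;                           % fundamental of the synthetic "voice"
x  = sin(2*pi*f0*t) + 0.5*sin(2*pi*2*f0*t) + 0.25*sin(2*pi*3*f0*t);

% Autocorrelation via the FFT, zero-padded to avoid circular wrap-around
X = fft(x, 2*N);
r = real(ifft(abs(X).^2));
r = r(1:N);                         % r(l+1) holds the value at lag l

% Strongest peak among lags corresponding to 60-400 Hz (assumed range)
minLag = floor(fs / 400);
maxLag = ceil(fs / 60);
[~, k] = max(r(minLag+1 : maxLag+1));
lag = minLag + k - 1;
pitchHz = fs / lag;

% Crude two-class decision; the 165 Hz threshold is an assumption
if pitchHz < 165
    guess = 'lower-pitched voice';
else
    guess = 'higher-pitched voice';
end
fprintf('Estimated pitch: %.1f Hz (%s)\n', pitchHz, guess);
```

On real speech the estimate varies from frame to frame and is meaningless in unvoiced or silent frames, so a practical version would probably discard low-energy frames and take a median over the rest.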
In general, however, more sophisticated techniques are required. To recognize a particular speaker, the program must first be "trained" with samples of the speaker's voice. The voice signal is analyzed to extract parameters or "features" of the voice.
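One simple way to turn a voice signal into a feature vector is to average the log energy in a few frequency bands over all frames. The sketch below shows that idea using only base Matlab functions. The frame length, hop size, number of bands, linear band spacing, and the sinusoidal stand-in signal are assumptions for illustration; other features may well work better.

```matlab
% Sketch: feature vector = mean log band energy over all frames.
% Frame sizes, band layout, and the test signal are assumptions.
fs = 8000;                               % sample rate in Hz (assumed)
t  = (0:fs-1)' / fs;
x  = sin(2*pi*1250*t);                   % stand-in for 1 s of recorded speech

frameLen = 256;                          % samples per frame
hop      = 128;                          % frame advance (50% overlap)
nBands   = 8;

% Hamming window written out so no toolbox is needed
w = 0.54 - 0.46*cos(2*pi*(0:frameLen-1)'/(frameLen-1));

% Linearly spaced band edges over FFT bins 1 .. frameLen/2
edges   = round(linspace(1, frameLen/2 + 1, nBands + 1));
nFrames = floor((length(x) - frameLen) / hop) + 1;

feat = zeros(nBands, 1);
for m = 1:nFrames
    idx = (m-1)*hop + (1:frameLen)';     % sample indices of frame m
    P   = abs(fft(x(idx) .* w)).^2;      % power spectrum of windowed frame
    for b = 1:nBands
        bins    = edges(b) : edges(b+1) - 1;
        feat(b) = feat(b) + log(sum(P(bins)) + eps);  % eps avoids log(0)
    end
end
feat = feat / nFrames;                   % average over frames

disp(feat');
```

With the 1250 Hz stand-in signal the third band (bins 33 to 48) holds the largest value, which is a quick sanity check that the band layout is what was intended.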
During the recognition phase, the same parameters are extracted from the input voice, and compared to the stored parameters for different speakers. In some cases, the speaker to be recognized is required to use the same words that were used to train the program (security applications, for example). This is called text-dependent recognition. In other cases, any words can be spoken -- this is called text-independent recognition.
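The comparison step can be as simple as finding the stored feature vector closest to the one extracted from the input. The sketch below uses Euclidean distance as the measure, which is an assumption rather than a requirement. The speaker labels and feature values are made up for the example.

```matlab
% Sketch: nearest-neighbour match against stored feature vectors.
% Labels and numbers are hypothetical.
names  = {'speakerA', 'speakerB'};     % one label per stored column
stored = [ 1.0  4.0;                   % each column: one speaker's features
           2.0  1.0;
           0.5  3.0 ];
probe  = [ 1.2; 1.8; 0.7 ];            % features from the input voice

% Euclidean distance from the probe to every stored column
diffs = stored - repmat(probe, 1, size(stored, 2));
d     = sqrt(sum(diffs.^2, 1));

[dMin, idx] = min(d);
fprintf('Closest match: %s (distance %.3f)\n', names{idx}, dMin);
```

A threshold on the smallest distance could be added so that an unknown speaker is rejected rather than assigned to the nearest stored one.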
Project
You will implement a speaker recognition program in Matlab. The program will have two modes: training and recognition. For the purposes of this project, you are required to implement text-independent recognition. You may find using set words or phrases useful during program development, though.
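The two modes can share one speaker database: training appends an entry and recognition searches the entries. The following is a minimal sketch of that structure, assuming features have already been extracted. The request list, the field names name and feat, and the Euclidean distance measure are all hypothetical choices made for the example.

```matlab
% Sketch: one loop handling both training and recognition requests.
% Field names, labels, and feature values are hypothetical.
db = struct('name', {}, 'feat', {});          % empty speaker database

% Each row: mode, speaker label (training only), feature vector
requests = { 'train',     'speakerA', [1.0; 2.0; 0.5];
             'train',     'speakerB', [4.0; 1.0; 3.0];
             'recognize', '',         [3.8; 1.1; 2.9] };

result = '';
for r = 1:size(requests, 1)
    mode = requests{r, 1};
    feat = requests{r, 3};
    switch mode
        case 'train'
            % Store the features under the given label
            db(end+1) = struct('name', requests{r, 2}, 'feat', feat);
        case 'recognize'
            % Report the stored speaker with the closest features
            d = arrayfun(@(s) norm(s.feat - feat), db);
            [~, k] = min(d);
            result = db(k).name;
        otherwise
            error('Unknown mode: %s', mode);
    end
end
fprintf('Recognized: %s\n', result);
```

Keeping the database separate from the mode logic makes it easier to swap in a more sophisticated feature or distance measure later without changing the rest of the program.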
Since you have (presumably) no experience with speaker recognition, you should plan your approach to the program design and implementation carefully. It is recommended that you first implement a fairly simple recognition algorithm and then use it as a basis for developing the rest of the program, such as switching between training and recognition modes, and the user interface. For this first phase, use two speakers with distinctly different voices (such as one male and one female).
In the second phase, you will have a much better idea of the problems you face, and can attempt a more sophisticated algorithm. This algorithm should be able to distinguish between speakers whose voices are less obviously dissimilar than a male and a female voice.
That way, if the improved algorithm does not perform as expected, at least you will have the simpler algorithm working, and your work on its performance limits will still help to demonstrate the engineering skills you have applied to the program.
To demonstrate the program, the instructor will ask you to train it on two speakers and then recognize them, using a phrase he or she will provide.
Your project documentation should address at least the following:
- Theory of speaker recognition
- A description of your implementation, including issues encountered and proposed enhancements
- Sample Matlab code to illustrate key points
- Plots and spectra
- A discussion of the limitations of your program
You should also perform some investigation into how the program could be implemented to run in real time, and predict the expected performance limits on your chosen platform.
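A first step in such an investigation might be to time the per-frame processing and compare it with the interval at which new frames arrive. The sketch below does this with tic and toc. The frame sizes, the sample rate, and the windowed FFT used as a stand-in workload are assumptions; the measured time depends entirely on the machine and on the actual algorithm.

```matlab
% Sketch: compare per-frame processing time with the real-time budget.
% Frame sizes and the stand-in workload are assumptions.
fs       = 8000;                 % sample rate in Hz (assumed)
frameLen = 256;                  % samples per frame
hop      = 128;                  % a new frame is due every hop samples
nTrials  = 200;                  % repetitions to average the timing

x = randn(frameLen, 1);          % stand-in frame of audio
w = 0.54 - 0.46*cos(2*pi*(0:frameLen-1)'/(frameLen-1));  % Hamming window

tic;
for k = 1:nTrials
    P = abs(fft(x .* w)).^2;     % replace with the real per-frame work
end
tPerFrame = toc / nTrials;       % average seconds spent per frame

frameBudget = hop / fs;          % seconds available per frame
rtFactor    = tPerFrame / frameBudget;

fprintf('Per-frame time: %.3g s, budget: %.3g s, factor: %.3g\n', ...
        tPerFrame, frameBudget, rtFactor);
```

A factor well below 1 suggests the per-frame work could keep up with live audio, although audio input latency and the cost of the comparison step would also need to be measured.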
Extra credit will be awarded to teams who do an excellent job of implementing and documenting the program as described above, and either a) make their program function in real time, or b) implement a useful and usable GUI for the program.
