
Team J: Speaker Recognition

From SoftwarePractice.org

Project Overview

The purpose of the Speaker Recognition Project is to develop a core algorithm to recognize and identify different input voices. The basic requirement of this project is to provide a working program with a basic, trainable voice recognition function, which means:

1. It should neither have complicated functions nor require high processing power

2. It must be able to add/update its database entries so that the recognition results can be improved

Generic Speaker Recognition

The history of voice and speech recognition dates back to the 1870s, when Alexander Graham Bell came up with the idea of a machine that would make speech visible to help people with hearing difficulties. By discovering how to convert air pressure waves (sound) into electrical impulses, he began the process of uncovering the scientific and mathematical basis of understanding speech. Even so, he is better known for inventing the telephone.

Modern development of voice identification systems began as early as the 1960s with exploration into voiceprint analysis, where the characteristics of an individual's voice were thought to identify that individual uniquely, much like a fingerprint.

The early systems had many flaws, and research ensued to derive a more reliable method of distinguishing between two different voices. Voice identification research continues today within the field of digital signal processing, where many advances have taken place in recent years. We are also seeing further use of speaker recognition as a biometric identifier.

Problems we are probably going to face

- Choice of word
- Speed that the word is spoken
- Intonation of certain syllables in the word
- Volume (and perhaps subsequent distortion) of the spoken word
- Signal-to-noise ratio (SNR)
- Anything else?

Our Idea of Speech Recognition

In our case, all sample waveforms should have an adequate sampling rate and an adequate number of quantization bits, and the signal-to-noise ratio should be good, so that we do not need to pre-process the signal. We will record in mono at a sampling frequency of 22,050 Hz, which is more than twice the highest frequency in our speech waveforms, and at 16 bits per sample, which gives 2^16 = 65,536 quantization levels. With these characteristics we will start collecting samples for testing.
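As a quick check of the numbers for this recording format, the following Python sketch works out the highest representable frequency (half the sampling rate), the number of quantization levels and the raw data rate. The function and constant names are our own and purely illustrative.

```python
# Recording format proposed for the project samples.
SAMPLE_RATE_HZ = 22050   # samples per second, mono
BITS_PER_SAMPLE = 16     # quantization bits per sample


def nyquist_frequency(sample_rate_hz):
    """Highest frequency that can be represented at this sampling rate."""
    return sample_rate_hz / 2


def quantization_levels(bits):
    """Number of distinct amplitude values available with this many bits."""
    return 2 ** bits


def bytes_per_second(sample_rate_hz, bits, channels=1):
    """Raw (uncompressed) PCM data rate in bytes per second."""
    return sample_rate_hz * (bits // 8) * channels


if __name__ == "__main__":
    print("Nyquist frequency (Hz):", nyquist_frequency(SAMPLE_RATE_HZ))
    print("Quantization levels:", quantization_levels(BITS_PER_SAMPLE))
    print("Bytes per second:", bytes_per_second(SAMPLE_RATE_HZ, BITS_PER_SAMPLE))
```

At 22,050 Hz the format can represent frequencies up to 11,025 Hz, so the sampling condition holds as long as the speech content we care about stays below that limit.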

Development Progress and Further recognition


Something to think about

The Cepstrum Domain

The cepstrum is a common transform used to gain information from a person’s speech signal. It can be used to separate the excitation signal (which contains the words and the pitch) and the transfer function (which contains the voice quality). It is similar to a channel vocoder or LPC in its applications, but using the cepstrum as a spectral analyzer is a completely different process. (It is also worth pointing out that cepstrum is “spectrum” with the first syllable flipped… we will encounter several words with this naming convention.) Before describing the details of the cepstrum, a little background in speech models is needed.

Cepstrum Definition

A cepstrum (pronounced /ˈkɛpstrəm/) is the result of taking the Fourier transform (FT) of the decibel spectrum as if it were a signal.

Mathematically: cepstrum of signal = FT(log(FT(the signal))+j2πm) (where m is the integer required to properly unwrap the angle or imaginary part of the complex log function)


from - http://cnx.org/content/m12469/latest/
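To make the definition concrete, here is a minimal Python sketch of a simplified cepstrum. It is not a finished implementation: it assumes we only need the log of the spectrum magnitude (so the phase-unwrapping term j2πm is dropped), it uses the natural log instead of decibels (which only changes the scale), and it uses a slow, naive DFT so that no extra libraries are required. The function names are our own.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(signal, floor=1e-12):
    """FT of the log magnitude spectrum (phase is ignored in this sketch)."""
    spectrum = dft(signal)
    # Clamp tiny magnitudes so that log() never sees zero.
    log_mag = [math.log(max(abs(c), floor)) for c in spectrum]
    # The log magnitude of a real signal is symmetric, so the result is real.
    return [c.real for c in dft(log_mag)]


def peak_quefrency(cepstrum):
    """Index (in samples) of the largest cepstral peak, skipping index 0."""
    half = len(cepstrum) // 2
    return max(range(1, half + 1), key=lambda q: abs(cepstrum[q]))


if __name__ == "__main__":
    # Toy signal: an impulse followed by a half-amplitude echo 8 samples later.
    signal = [0.0] * 64
    signal[0] = 1.0
    signal[8] = 0.5
    print("Peak quefrency (samples):", peak_quefrency(real_cepstrum(signal)))
```

The toy signal shows why the cepstrum is useful: a repetition in the time signal becomes a ripple in the log spectrum, and the second transform turns that ripple into a single peak at the repetition delay. For real recordings a proper FFT routine would be needed instead of the naive DFT.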

1. Generic



Implementation

1. Get the speaker's sounds into a recognizable format

A computer is basically a machine that offers functions based on digital calculations. To start the recognition implementation, we first have to make the speaker's sound recognizable by a computer. That involves converting the sound to an electrical signal, then sampling (digitizing) and quantizing it. A microphone converts the sound to an electrical signal, and recording software carries out the digitization and quantization.

Image:Recorder.jpg

Sampling Frequency: 22,050 Hz (mono)

Sampling Bits: 16
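Once a recording exists, the samples have to be loaded as numbers before any analysis can be done. The sketch below shows one possible way to do this with Python's standard wave and struct modules, assuming the recorder saves uncompressed 16-bit mono PCM WAV files (the actual output format of our recording software is not confirmed here). The helper names write_wav, read_wav and round_trip are hypothetical.

```python
import io
import struct
import wave


def write_wav(target, samples, sample_rate_hz=22050):
    """Write 16-bit mono PCM samples to a file path or binary file object."""
    with wave.open(target, "wb") as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(2)            # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate_hz)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))


def read_wav(source):
    """Return (sample_rate_hz, samples) from a 16-bit mono PCM WAV."""
    with wave.open(source, "rb") as r:
        if r.getnchannels() != 1 or r.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        rate = r.getframerate()
        frames = r.readframes(r.getnframes())
    # Each sample is a little-endian signed 16-bit integer.
    samples = list(struct.unpack("<%dh" % (len(frames) // 2), frames))
    return rate, samples


def round_trip(samples, sample_rate_hz=22050):
    """Write samples to an in-memory WAV and read them back (for testing)."""
    buf = io.BytesIO()
    write_wav(buf, samples, sample_rate_hz)
    buf.seek(0)
    return read_wav(buf)


if __name__ == "__main__":
    rate, data = round_trip([0, 1000, -1000, 32767, -32768])
    print(rate, data)
```

In practice read_wav would be called with the path of a recorded file; the in-memory round trip is only there so the sketch can be tried without a microphone.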



2. Analyse sound data
