Speaker Recognition is the problem of identifying a speaker from a recording of their speech. It is an important topic in Signal Processing and has a variety of applications, especially in security systems. Voice-controlled devices also rely heavily on speaker recognition.
The modules used in this project are NumPy, SciPy and Matplotlib, which together cover most of the signal-processing and plotting functionality required.
The main principle behind Speaker Recognition is extraction of features from speech, followed by training on a dataset and then testing. Through this project I was introduced to the basics of Digital Signal Processing, feature extraction using two different algorithms (MFCC and LPC), and feature matching (LBG).
STEP 1: FEATURE EXTRACTION (MFCC - Mel-Frequency Cepstral Coefficients):
- Human hearing is not linear in frequency; it is roughly logarithmic. The ear effectively acts as a bank of filters.
- MFCCs are based on the known variation of the human ear's critical bandwidths with frequency, which is expressed in the mel-frequency scale.
- The speech signal is divided into frames of 25 ms with an overlap of 10 ms, and each frame is multiplied by a Hamming window.
- The periodogram of each frame is calculated by first taking a 512-sample FFT of the frame and then computing its power spectrum.
- The entire frequency range is divided into 'n' mel filter banks (12 here).
- The filter-bank energies are then calculated by multiplying each filter bank with the power spectrum and summing the coefficients.
- Finally, applying a discrete cosine transform to the logarithm of these 'n' energies gives the MFCCs.
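The pipeline above can be sketched in NumPy/SciPy. This is a minimal illustration, not the project's actual code: the frame length (25 ms), 512-point FFT and 12 mel filters follow the text, the 10 ms value is read as the usual frame step, and all helper names are my own.

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, nfft, fs):
    # Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, nfft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def mfcc(signal, fs, frame_ms=25, step_ms=10, nfft=512, n_filters=12):
    flen = int(fs * frame_ms / 1000)
    step = int(fs * step_ms / 1000)
    window = np.hamming(flen)
    n_frames = 1 + max(0, (len(signal) - flen) // step)
    frames = np.stack([signal[i * step:i * step + flen] * window
                       for i in range(n_frames)])
    # Periodogram: 512-point FFT of each frame, then the power spectrum
    power = np.abs(np.fft.rfft(frames, n=nfft)) ** 2 / nfft
    # Filter-bank energies: multiply each filter with the power spectrum
    energies = power @ mel_filterbank(n_filters, nfft, fs).T
    energies = np.where(energies == 0, np.finfo(float).eps, energies)
    # DCT of the log energies gives the MFCCs
    return dct(np.log(energies), type=2, axis=1, norm='ortho')

# Demo on one second of a synthetic 440 Hz tone at 16 kHz
fs = 16000
t = np.arange(fs) / fs
coeffs = mfcc(np.sin(2 * np.pi * 440 * t), fs)
print(coeffs.shape)   # (n_frames, 12)
```

Each row of the result is one frame's 12-dimensional MFCC vector; a real run would feed real speech samples instead of the tone.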
STEP 2: FEATURE EXTRACTION (LPC - Linear Prediction Coefficients):
- LPCs are another popular feature-extraction technique. They are based on an autoregressive model of speech, in which each sample is predicted as a linear combination of the previous samples.
- In this extraction, the signal is framed in the same way as for MFCCs.
- To estimate the LPC coefficients, we use the Yule-Walker equations, which are built from the autocorrelation function.
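A minimal Yule-Walker estimator, assuming a per-frame autocorrelation and SciPy's Toeplitz solver (this is an illustrative sketch, not the report's exact code):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(frame, order=12):
    """Solve the Yule-Walker equations R a = r for the LPC vector a."""
    # Autocorrelation at lags 0..len(frame)-1
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    # Symmetric Toeplitz system: first column r[0..order-1], RHS r[1..order]
    return solve_toeplitz((r[:order], r[:order]), r[1:order + 1])

# Demo: recover the coefficients of a synthetic AR(2) signal
rng = np.random.default_rng(0)
a_true = np.array([0.9, -0.5])
N = 8000
x = np.zeros(N)
e = rng.standard_normal(N)
for n in range(2, N):
    x[n] = a_true[0] * x[n - 1] + a_true[1] * x[n - 2] + e[n]
a_est = lpc_coefficients(x, order=2)
print(a_est)   # close to [0.9, -0.5]
```

In the actual pipeline this would be applied per windowed frame (order 12 here, matching the 12 coefficients used for MFCC) rather than to one long synthetic signal.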
STEP 3: FEATURE MATCHING (LBG - Linde-Buzo-Gray):
- Generally, the main approach to feature matching is to map vectors from a large vector space to a finite number of regions in that space. Each region is called a cluster and can be represented by its center, called a codeword. The collection of all codewords is called a codebook.
- An initial codebook is designed containing a single codeword: the centroid of the entire set of training vectors.
- The codebook size is then doubled by splitting each current codeword; the closest codeword is found for every training vector, and the centroids of these assignments become the codewords for the next iteration.
- These iterations continue until the distortion for the current iteration falls below a certain threshold.
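The split-and-refine procedure above can be sketched as follows (a minimal NumPy implementation under the assumptions stated in the bullets; the split perturbation `eps` and the relative-improvement stopping rule are my choices):

```python
import numpy as np

def lbg(vectors, codebook_size, eps=0.01, tol=1e-4):
    # Start from a single codeword: the centroid of all training vectors
    codebook = vectors.mean(axis=0, keepdims=True)
    while len(codebook) < codebook_size:
        # Double the codebook by splitting every codeword into two copies
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        prev = np.inf
        while True:
            # Assign each training vector to its nearest codeword
            d = np.linalg.norm(vectors[:, None] - codebook[None], axis=2)
            nearest = d.argmin(axis=1)
            distortion = d.min(axis=1).mean()
            # Recompute each codeword as the centroid of its cluster
            for k in range(len(codebook)):
                mask = nearest == k
                if mask.any():
                    codebook[k] = vectors[mask].mean(axis=0)
            # Stop refining once the distortion stops improving
            if prev - distortion < tol * max(prev, 1e-12):
                break
            prev = distortion
    return codebook

# Toy demo: 2-D points around four centers, quantized to 4 codewords
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(c, 0.1, (200, 2)) for c in (-2, 0, 2, 4)])
cb = lbg(data, 4)
print(cb.shape)   # (4, 2)
```

The inner loop is a standard Lloyd refinement, so the distortion is non-increasing and the stopping test always terminates.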
STEP 4: TRAINING:
- The small dataset contains 8 speakers, and each codebook uses 16 centroids.
- The dataset is now trained to derive a codebook for each speaker.
- Each training utterance goes through both MFCC and LPC extraction, yielding one MFCC codebook and one LPC codebook per speaker.
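The overall shape of the training loop might look like this. Everything here is hypothetical scaffolding: random arrays stand in for real MFCC/LPC feature matrices, and a simple subsampling placeholder stands in for the LBG quantizer so the sketch stays self-contained.

```python
import numpy as np

rng = np.random.default_rng(3)

def build_codebook(features, size=16):
    # Placeholder quantizer: pick `size` distinct feature vectors.
    # A real run would call the LBG routine from Step 3 instead.
    idx = rng.choice(len(features), size, replace=False)
    return features[idx]

# Fake features: 8 speakers, 200 twelve-dimensional vectors each
mfcc_features = {spk: rng.normal(spk, 1.0, (200, 12)) for spk in range(1, 9)}
# One 16-centroid codebook per speaker (the same loop would run for LPC)
mfcc_codebooks = {spk: build_codebook(f) for spk, f in mfcc_features.items()}
print(len(mfcc_codebooks), mfcc_codebooks[1].shape)   # 8 (16, 12)
```

The end result of training is exactly this structure: a dictionary mapping each speaker to a (16, 12) codebook, once for MFCC features and once for LPC features.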
STEP 5: TESTING:
- The features of each test speech signal are now compared against the codebooks derived from the training files.
- The matching results obtained for the test set are listed below.
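A plausible matching rule (assumed here, since the report does not spell it out) is to assign a test utterance to the training speaker whose codebook gives the lowest average quantization distortion over the test feature vectors:

```python
import numpy as np

def match_speaker(test_features, codebooks):
    """codebooks: dict speaker_id -> (K, D) codebook array."""
    def avg_distortion(cb):
        # Distance of every test vector to its nearest codeword, averaged
        d = np.linalg.norm(test_features[:, None] - cb[None], axis=2)
        return d.min(axis=1).mean()
    return min(codebooks, key=lambda s: avg_distortion(codebooks[s]))

# Toy demo with synthetic 2-D "features" and two hypothetical speakers
rng = np.random.default_rng(2)
codebooks = {1: rng.normal(0, 1, (16, 2)), 2: rng.normal(5, 1, (16, 2))}
test = rng.normal(5, 0.5, (50, 2))
print(match_speaker(test, codebooks))   # 2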
Speaker 1 in test matches with speaker 4 in train for training with MFCC
Speaker 1 in test matches with speaker 8 in train for training with LPC
Speaker 2 in test matches with speaker 2 in train for training with MFCC
Speaker 2 in test matches with speaker 8 in train for training with LPC
Speaker 3 in test matches with speaker 3 in train for training with MFCC
Speaker 3 in test matches with speaker 8 in train for training with LPC
Speaker 4 in test matches with speaker 4 in train for training with MFCC
Speaker 4 in test matches with speaker 8 in train for training with LPC
Speaker 5 in test matches with speaker 5 in train for training with MFCC
Speaker 5 in test matches with speaker 8 in train for training with LPC
Speaker 6 in test matches with speaker 6 in train for training with MFCC
Speaker 6 in test matches with speaker 8 in train for training with LPC
Speaker 7 in test matches with speaker 7 in train for training with MFCC
Speaker 7 in test matches with speaker 8 in train for training with LPC
Speaker 8 in test matches with speaker 8 in train for training with MFCC
Speaker 8 in test matches with speaker 8 in train for training with LPC
Accuracy of result for training with MFCC is 87.5 %
Accuracy of result for training with LPC is 12.5 %
Accuracy of result for training with MFCC is 87.23 %
Accuracy of result for training with LPC is 2.13 %