Automatic-Speaker-Recognition

Speaker Recognition is the problem of identifying a speaker from a recording of their speech sample. It is an important topic in Signal Processing and has a variety of applications, especially in security systems. Voice controlled devices also rely heavily on speaker recognition.

The modules I used for this project are NumPy, SciPy and Matplotlib, which together cover most of what is needed to build and plot signal-processing applications.

The main principle behind Speaker Recognition is extraction of features from speech, followed by training on a dataset and then testing. Through this project I was introduced to the basics of Digital Signal Processing, feature extraction using two different algorithms (MFCC and LPC), and feature matching (LBG).


STEP 1: FEATURE EXTRACTION (MFCC - Mel Frequency Cepstral Coefficients):

  • Human hearing is not linear in nature but logarithmic; our ears act as a bank of filters.
  • MFCCs are based on the known variation of the human ear’s critical bandwidths with frequency, which is expressed on the mel-frequency scale.
  • The speech signal is divided into frames of 25 ms with an overlap of 10 ms, and each frame is multiplied by a Hamming window.
  • The periodogram of each frame is calculated by first taking a 512-sample FFT of the frame and then computing its power spectrum.
  • The entire frequency range is divided into ‘n’ mel filter banks (12 here).
  • Filterbank energies are then calculated by multiplying each filter bank with the power spectrum and summing the coefficients.
  • Finally, applying the discrete cosine transform to the logarithm of these ‘n’ energies gives the MFCCs.
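The steps above can be sketched in Python with NumPy and SciPy. This is a minimal illustration rather than the project's actual code: the function names (`mfcc`, `mel_filterbank`), the 12-coefficient setup, and the small floor on the energies are my assumptions; the frame step follows the 25 ms frame / 10 ms overlap described above.

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, nfft, fs):
    # Triangular filters spaced evenly on the mel scale from 0 to fs/2
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mels) / fs).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):          # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):         # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - centre, 1)
    return fbank

def mfcc(signal, fs, n_filters=12):
    frame_len = int(0.025 * fs)                # 25 ms frames
    hop = frame_len - int(0.010 * fs)          # 10 ms overlap between frames
    nfft = 512
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Periodogram estimate of the power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft
    # Filterbank energies: power spectrum weighted by each filter, summed
    energies = np.maximum(power @ mel_filterbank(n_filters, nfft, fs).T, 1e-10)
    # DCT of the log energies gives the cepstral coefficients
    return dct(np.log(energies), type=2, axis=1, norm='ortho')
```

For a one-second signal at an assumed 8 kHz sampling rate this produces one 12-dimensional coefficient vector per 15 ms frame step.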

STEP 2: FEATURE EXTRACTION (LPC - Linear Prediction Coefficients):

  • LPCs are another popular feature-extraction technique, based on an autoregressive (AR) model of speech.
  • In this extraction too, the signal is framed in the same way as for MFCCs.
  • The LPC coefficients are estimated from the Yule-Walker equations, which are built from the autocorrelation function.
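Solving the Yule-Walker equations for one frame can be sketched with the standard Levinson-Durbin recursion, which exploits their Toeplitz structure. The function name `lpc` and the default order of 12 are illustrative assumptions, not taken from the project's code.

```python
import numpy as np

def lpc(frame, order=12):
    """Autocorrelation (Yule-Walker) LPC estimate via Levinson-Durbin."""
    n = len(frame)
    # Autocorrelation estimates r[0..order]
    r = np.array([frame[:n - k] @ frame[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]                                   # prediction-error power
    for i in range(1, order + 1):
        # Reflection coefficient for this order
        k = -(r[i] + a[1:i] @ r[1:i][::-1]) / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    # a[0] = 1; the frame is modelled as x[n] ~ -sum_{j>=1} a[j] * x[n-j]
    return a
```

On a long sample from a known AR process the recovered coefficients approach the true model, which is a convenient sanity check.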

STEP 3: FEATURE MATCHING (LBG - Linde-Buzo-Gray):

  • Generally, the main approach to feature matching is to map vectors from a large vector space onto a finite number of regions in that space. Each region is called a cluster and is represented by its center, called a codeword; the collection of all codewords is called a codebook.
  • An initial one-vector codebook is designed as the centroid of the entire set of training vectors.
  • The codebook size is then doubled by splitting each current codeword; the closest codeword is found for every training vector, and each codeword is moved to the centroid of its assigned vectors in the next iteration.
  • These iterations continue until the distortion for the current iteration falls below a chosen threshold.

STEP 4: TRAINING:

  • The small dataset contains 8 speakers, and each codebook has 16 centroids.
  • The dataset is trained to derive a codebook for each speaker.
  • Each training recording goes through both the MFCC and LPC extractions, giving out an MFCC codebook and an LPC codebook per speaker.

STEP 5: TESTING:

  • Now, the features of each test speech signal are compared against the codebooks built from the training files; the speaker whose codebook gives the lowest average distortion is declared the match.
  • The results of this testing are shown below.
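A minimal sketch of this matching step, assuming codebooks are kept in a dictionary keyed by speaker label (the name `match_speaker` and that layout are illustrative, not the project's API):

```python
import numpy as np

def match_speaker(features, codebooks):
    """Return the speaker whose codebook quantizes the test features
    with the lowest average distortion.

    features:  (n_frames, dim) array of MFCC or LPC vectors
    codebooks: dict mapping speaker label -> (n_codewords, dim) array
    """
    def avg_distortion(cb):
        # Distance of each feature vector to its nearest codeword
        d = ((features[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        return d.min(axis=1).mean()

    return min(codebooks, key=lambda spk: avg_distortion(codebooks[spk]))
```

Running this once per test utterance and per feature type (MFCC or LPC) produces match tables like the ones below.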

OUTPUT

SMALL DATASET:

Speaker 1 in test matches with speaker 4 in train for training with MFCC
Speaker 1 in test matches with speaker 8 in train for training with LPC

Speaker 2 in test matches with speaker 2 in train for training with MFCC
Speaker 2 in test matches with speaker 8 in train for training with LPC

Speaker 3 in test matches with speaker 3 in train for training with MFCC
Speaker 3 in test matches with speaker 8 in train for training with LPC

Speaker 4 in test matches with speaker 4 in train for training with MFCC
Speaker 4 in test matches with speaker 8 in train for training with LPC

Speaker 5 in test matches with speaker 5 in train for training with MFCC
Speaker 5 in test matches with speaker 8 in train for training with LPC

Speaker 6 in test matches with speaker 6 in train for training with MFCC
Speaker 6 in test matches with speaker 8 in train for training with LPC

Speaker 7 in test matches with speaker 7 in train for training with MFCC
Speaker 7 in test matches with speaker 8 in train for training with LPC

Speaker 8 in test matches with speaker 8 in train for training with MFCC
Speaker 8 in test matches with speaker 8 in train for training with LPC

Accuracy for small dataset:

Accuracy of result for training with MFCC is 87.5 %
Accuracy of result for training with LPC is 12.5 %

Accuracy for large dataset:

Accuracy of result for training with MFCC is 87.2340425531915 %
Accuracy of result for training with LPC is 2.127659574468085 %