Automatic-Speaker-Recognition

Speaker Recognition is the problem of identifying a speaker from a recording of their speech sample. It is an important topic in Signal Processing and has a variety of applications, especially in security systems. Voice controlled devices also rely heavily on speaker recognition.

The modules I used for this project are NumPy, SciPy and Matplotlib, which together cover most of what is needed to build and plot signal-processing applications.

The main principle behind Speaker Recognition is extraction of features from speech, followed by training on a dataset and then testing. Through this project I was introduced to the basics of Digital Signal Processing, feature extraction using two different algorithms (MFCC and LPC), and feature matching (LBG).


STEP 1: FEATURE EXTRACTION (MFCC - Mel Frequency Cepstral Coefficients):

  • Human hearing is not linear in nature but logarithmic; our ears act as a bank of filters.
  • MFCCs are based on the known variation of the human ear’s critical bandwidths with frequency, which is expressed on the mel-frequency scale.
  • The speech signal is divided into frames of 25 ms with an overlap of 10 ms, and each frame is multiplied by a Hamming window.
  • The periodogram of each frame is calculated by first taking a 512-sample FFT of the frame and then computing its power spectrum.
  • The entire frequency range is divided into ‘n’ mel filter banks (12 here).
  • Filterbank energies are then calculated by multiplying each filter bank with the power spectrum and summing the coefficients.
  • Finally, applying the discrete cosine transform to the logarithm of these ‘n’ energies gives the MFCCs.
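The steps above can be sketched in Python with NumPy and SciPy. This is a minimal illustration rather than the project's actual code: the function names (`mfcc`, `mel_filterbank`), the 12-coefficient setup, and the small floor on the energies are my assumptions; the frame step follows the 25 ms frame / 10 ms overlap described above.

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, nfft, fs):
    # Triangular filters spaced evenly on the mel scale from 0 to fs/2
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mels) / fs).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):          # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):         # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - centre, 1)
    return fbank

def mfcc(signal, fs, n_filters=12):
    frame_len = int(0.025 * fs)                # 25 ms frames
    hop = frame_len - int(0.010 * fs)          # 10 ms overlap between frames
    nfft = 512
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Periodogram estimate of the power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft
    # Filterbank energies: power spectrum weighted by each filter, summed
    energies = np.maximum(power @ mel_filterbank(n_filters, nfft, fs).T, 1e-10)
    # DCT of the log energies gives the cepstral coefficients
    return dct(np.log(energies), type=2, axis=1, norm='ortho')
```

For a one-second signal at an assumed 8 kHz sampling rate this produces one 12-dimensional coefficient vector per 15 ms frame step.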

STEP 2: FEATURE EXTRACTION (LPC - Linear Prediction Coefficients):

  • LPCs are another popular feature-extraction technique, based on an autoregressive (AR) model of speech.
  • In this extraction too, the signal is framed in the same way as for MFCCs.
  • The LPC coefficients are estimated from the Yule-Walker equations, which are built from the autocorrelation function.
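Solving the Yule-Walker equations for one frame can be sketched with the standard Levinson-Durbin recursion, which exploits their Toeplitz structure. The function name `lpc` and the default order of 12 are illustrative assumptions, not taken from the project's code.

```python
import numpy as np

def lpc(frame, order=12):
    """Autocorrelation (Yule-Walker) LPC estimate via Levinson-Durbin."""
    n = len(frame)
    # Autocorrelation estimates r[0..order]
    r = np.array([frame[:n - k] @ frame[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]                                   # prediction-error power
    for i in range(1, order + 1):
        # Reflection coefficient for this order
        k = -(r[i] + a[1:i] @ r[1:i][::-1]) / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    # a[0] = 1; the frame is modelled as x[n] ~ -sum_{j>=1} a[j] * x[n-j]
    return a
```

On a long sample from a known AR process the recovered coefficients approach the true model, which is a convenient sanity check.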

STEP 3: FEATURE MATCHING (LBG - Linde-Buzo-Gray):

  • Generally, the main approach to feature matching is to map vectors from a large vector space onto a finite number of regions in that space. Each region is called a cluster and is represented by its center, called a codeword; the collection of all codewords is called a codebook.
  • An initial one-vector codebook is designed as the centroid of the entire set of training vectors.
  • The codebook size is then doubled by splitting each current codeword; the closest codeword is found for every training vector, and each codeword is moved to the centroid of its assigned vectors in the next iteration.
  • These iterations continue until the distortion for the current iteration falls below a chosen threshold.

STEP 4: TRAINING:

  • The small dataset contains 8 speakers, and each codebook has 16 centroids.
  • The dataset is trained to derive a codebook for each speaker.
  • Each training recording goes through both the MFCC and LPC extractions, giving out an MFCC codebook and an LPC codebook per speaker.

STEP 5: TESTING:

  • Now, the features of each test speech signal are compared against the codebooks built from the training files; the speaker whose codebook gives the lowest average distortion is declared the match.
  • The results of this testing are shown below.
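A minimal sketch of this matching step, assuming codebooks are kept in a dictionary keyed by speaker label (the name `match_speaker` and that layout are illustrative, not the project's API):

```python
import numpy as np

def match_speaker(features, codebooks):
    """Return the speaker whose codebook quantizes the test features
    with the lowest average distortion.

    features:  (n_frames, dim) array of MFCC or LPC vectors
    codebooks: dict mapping speaker label -> (n_codewords, dim) array
    """
    def avg_distortion(cb):
        # Distance of each feature vector to its nearest codeword
        d = ((features[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        return d.min(axis=1).mean()

    return min(codebooks, key=lambda spk: avg_distortion(codebooks[spk]))
```

Running this once per test utterance and per feature type (MFCC or LPC) produces match tables like the ones below.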

OUTPUT

SMALL DATASET:

Speaker 1 in test matches with speaker 4 in train for training with MFCC
Speaker 1 in test matches with speaker 8 in train for training with LPC

Speaker 2 in test matches with speaker 2 in train for training with MFCC
Speaker 2 in test matches with speaker 8 in train for training with LPC

Speaker 3 in test matches with speaker 3 in train for training with MFCC
Speaker 3 in test matches with speaker 8 in train for training with LPC

Speaker 4 in test matches with speaker 4 in train for training with MFCC
Speaker 4 in test matches with speaker 8 in train for training with LPC

Speaker 5 in test matches with speaker 5 in train for training with MFCC
Speaker 5 in test matches with speaker 8 in train for training with LPC

Speaker 6 in test matches with speaker 6 in train for training with MFCC
Speaker 6 in test matches with speaker 8 in train for training with LPC

Speaker 7 in test matches with speaker 7 in train for training with MFCC
Speaker 7 in test matches with speaker 8 in train for training with LPC

Speaker 8 in test matches with speaker 8 in train for training with MFCC
Speaker 8 in test matches with speaker 8 in train for training with LPC

Accuracy for small dataset:

Accuracy of result for training with MFCC is 87.5 %
Accuracy of result for training with LPC is 12.5 %

Accuracy for large dataset:

Accuracy of result for training with MFCC is 87.2340425531915 %
Accuracy of result for training with LPC is 2.127659574468085 %