|
Abstract Speaker recognition is the process of automatically recognizing a speaker on the basis of information obtained from speech waves. It is usually divided into two subclasses: speaker identification and speaker verification. The speech signal contains both phoneme and speaker characteristics. While the former carry the phonetic message, the latter carry information about the speaker. This thesis considers both characteristics in speaker recognition. First, we discuss the effects of phoneme characteristics on speaker recognition. We construct a single GMM and 10 digit HMMs for each speaker. The GMM corresponds to the condition in which phoneme information is reduced, and the HMMs to the condition in which both phoneme information and speaker characteristics are modeled. We examine the performance of the two kinds of models across various mixture numbers, training data quantities, testing data lengths, and speaker population sizes. With the total number of mixtures equal to 60, the GMMs give an error rate (ER) of 7.08% in speaker identification and an equal error rate (EER) of 6.16% in speaker verification. Using the HMMs reduces the ER to 6.69% in speaker identification and the EER to 5.86% in speaker verification. Because considering both phoneme and speaker characteristics results in better performance, in the second part we present four schemes for speaker recognition based on the HMMs. These four schemes give different weights to the phoneme characteristics in speaker recognition. The best scheme is the model-combining method with the compensation modification, which yields an ER of 5.73% in speaker identification and an EER of 5.15% in speaker verification. The second best is the frame-refining method, which modifies the reliability of each frame of an input utterance; it reduces the ER to 6.56% in speaker identification and the EER to 5.78% in speaker verification.
|
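The abstract reports two figures of merit: the error rate (ER) for speaker identification and the equal error rate (EER) for speaker verification. As a minimal sketch of how such figures are commonly computed, and not necessarily the procedure used in this thesis, the Python below sweeps a decision threshold over verification scores and reports the point where the false rejection and false acceptance rates are closest. The function names, the score lists, and the convention that a higher score means "more likely the claimed speaker" are all assumptions made for illustration.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false rejection rate, false acceptance rate) at one threshold.

    Assumption: a trial is accepted when its score is >= threshold.
    """
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return frr, far


def equal_error_rate(genuine, impostor):
    """Approximate the EER by trying every observed score as a threshold.

    Returns (eer, threshold), where eer is the mean of FRR and FAR at the
    threshold that makes the two rates closest to each other.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        frr, far = error_rates(genuine, impostor, t)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2, t)
    return best[1], best[2]


def identification_error_rate(predicted, actual):
    """Fraction of test utterances assigned to the wrong speaker."""
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


if __name__ == "__main__":
    # Hypothetical scores, e.g. log-likelihood ratios from a speaker model.
    genuine = [2.0, 1.5, 1.0, 0.2]     # true-speaker trials
    impostor = [0.5, 0.1, -0.3, -1.0]  # impostor trials
    print(equal_error_rate(genuine, impostor))
    print(identification_error_rate(["a", "b", "c", "a"], ["a", "b", "a", "a"]))
```

With few trials, the two error rates may never be exactly equal, which is why the sketch averages them at the closest point; evaluations on larger score sets often interpolate between neighbouring thresholds instead.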