"Emotion Recognition Using Speech Features" provides coverage of emotion-specific features present in speech. The author also discusses suitable models for capturing emotion-specific information for distinguishing different emotions. The content of this book is important for designing and developing natural and sophisticated speech systems. In this Brief, Drs. Rao and Koolagudi lead a discussion of how emotion-specific information is embedded in speech and how to acquire emotion-specific knowledge using appropriate statistical models. Additionally, the authors provide information about exploiting multiple evidences derived from various features and models. The acquired emotion-specific knowledge is useful for synthesizing emotions. Features includes discussion of: * Global and local prosodic features at syllable, word and phrase levels, helpful for capturing emotion-discriminative information; * Exploiting complementary evidences obtained from excitation sources, vocal tract systems and prosodic features in order to enhance the emotion recognition performance; * Proposed multi-stage and hybrid models for improving the emotion recognition performance.
This Brief is for researchers working in areas related to speech-based products, such as mobile phone manufacturing, automobiles, and entertainment products, as well as researchers involved in basic and applied speech processing research.
Contents

1 Introduction
  1.1 Emotion: Psychological perspective
  1.2 Emotion: Speech signal perspective
    1.2.1 Speech production mechanism
    1.2.2 Source features
    1.2.3 System features
    1.2.4 Prosodic features
  1.3 Emotional speech databases
  1.4 Applications of speech emotion recognition
  1.5 Issues in speech emotion recognition
  1.6 Objectives and scope of the work
  1.7 Main highlights of research investigations
  1.8 Brief overview of contributions to this book
    1.8.1 Emotion recognition using excitation source information
    1.8.2 Emotion recognition using vocal tract information
    1.8.3 Emotion recognition using prosodic information
  1.9 Organization of the book

2 Speech Emotion Recognition: A Review
  2.1 Introduction
  2.2 Emotional speech corpora: A review
  2.3 Excitation source features: A review
  2.4 Vocal tract system features: A review
  2.5 Prosodic features: A review
  2.6 Classification models
  2.7 Motivation for the present work
  2.8 Summary of the literature and scope for the present work

3 Emotion Recognition using Excitation Source Information
  3.1 Introduction
  3.2 Motivation
  3.3 Emotional speech corpora
    3.3.1 Indian Institute of Technology Kharagpur-Simulated Emotional Speech Corpus: IITKGP-SESC
    3.3.2 Berlin Emotional Speech Database: Emo-DB
  3.4 Excitation source features for emotion recognition
    3.4.1 Higher-order relations among LP residual samples
    3.4.2 Phase of LP residual signal
    3.4.3 Parameters of the instants of glottal closure (Epoch parameters)
    3.4.4 Dynamics of epoch parameters at syllable level
    3.4.5 Dynamics of epoch parameters at utterance level
    3.4.6 Glottal pulse parameters
  3.5 Classification models
    3.5.1 Auto-associative neural networks
    3.5.2 Support vector machines
  3.6 Results and discussion
  3.7 Summary

4 Emotion Recognition using Vocal Tract Information
  4.1 Introduction
  4.2 Feature extraction
    4.2.1 Linear prediction cepstral coefficients (LPCCs)
    4.2.2 Mel frequency cepstral coefficients (MFCCs)
    4.2.3 Formant features
  4.3 Classifiers
    4.3.1 Gaussian mixture models (GMM)
  4.4 Results and discussion
  4.5 Summary

5 Emotion Recognition using Prosodic Information
  5.1 Introduction
  5.2 Prosodic features: importance in emotion recognition
  5.3 Motivation
  5.4 Extraction of global and local prosodic features
  5.5 Results and discussion
  5.6 Summary

6 Summary and Conclusions
  6.1 Summary of the present work
  6.2 Contributions of the present work
  6.3 Conclusions from the present work
  6.4 Scope for future work

A Linear Prediction Analysis of Speech
  A.1 The Prediction Error Signal
  A.2 Estimation of Linear Prediction Coefficients

B MFCC Features

C Gaussian Mixture Model (GMM)
  C.1 Training the GMMs
    C.1.1 Expectation Maximization (EM) Algorithm
    C.1.2 Maximum a posteriori (MAP) Adaptation
  C.2 Testing

References