Analysis of the MEL speech scale and its classification as big data using parameterized KNN

https://doi.org/10.15407/jai2021.01.042

Skuratovskii R.V.¹, Bazarna A.D.¹, Osadchyi Ye.O.¹

¹ Interregional Academy of Personnel Management

UDC: 4.093
Publication language: English
Stuc. intelekt. 2021; 26; (1):42-57

Abstract: Recognizing emotions and human speech has always been a fascinating challenge for researchers. In this work, we implement and effectively apply a parameterization of the data vector obtained from a sentence, a vector that contains an emotionally coloured part and an informational part. The expressiveness of human speech is enhanced by the emotions it conveys. Several leading characteristics distinguish speech across utterances, such as pitch, timbre, and loudness. We supplement them with a new classification feature of speech: the division of a sentence into an emotionally loaded part and a part that carries only informational load. As a result, the reference phrase from a given emotion class changes, because it is selected from the corresponding emotion class when its distance is computed by the parameterized KNN method (a speech sample changes when it is exposed to different emotional environments). Since a speaker's emotional state can be determined on the basis of the MEL scale, MFCC is one option for studying the emotional aspects of utterances. In this work we implement a model that identifies several emotional states from MFCC for several datasets, classify the emotions in them on the basis of MFCC features, and provide a corresponding comparison. In addition to the statistical analysis of the author's tonal portrait, which is used in MFCC in particular, we propose a new method of dynamic analysis that treats only what is spoken in phrases as a new linguistic-emotional entity of the topic developed by the author. By ranking the features of the voice scale according to their importance, we were able to parameterize the coordinates of the vectors that are processed by the parameterized KNN method. Speech recognition is a multi-level pattern recognition task in which acoustic signals are analyzed and structured into a hierarchy of structural elements, words, phrases, and sentences.
Each level of this hierarchy can supply temporal constraints, such as possible word sequences or known pronunciations, which reduce the number of recognition errors at the lower levels. Analysis of voice and speech dynamics is useful for improving the quality of human perception and of machine-generated human speech, and it lies within the capabilities of artificial intelligence. Emotion recognition results can be widely applied in e-learning platforms, in-vehicle on-board systems, medicine, and other areas.
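
The abstract does not spell out how the KNN parameterization is defined, so the following is only a minimal sketch under one assumption: that "parameterized" means each coordinate of the feature vector receives its own weight in the distance, with the weights taken from a feature-importance ranking. The function names, the weight values, and the toy vectors standing in for MFCC-derived features are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Euclidean distance in which each coordinate has its own weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def weighted_knn_predict(train, query, weights, k=3):
    """Classify `query` by majority vote among its k nearest training vectors.

    train   -- list of (feature_vector, label) pairs
    weights -- non-negative per-coordinate weights (assumed, not from the paper)
    """
    neighbours = sorted(
        train, key=lambda item: weighted_distance(item[0], query, weights)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Invented 3-dimensional vectors standing in for MFCC-derived features.
train = [
    ([1.0, 0.2, 5.0], "calm"),
    ([1.1, 0.1, 0.0], "calm"),
    ([0.9, 0.3, 9.0], "calm"),
    ([3.0, 2.0, 5.1], "angry"),
    ([3.2, 2.1, 0.2], "angry"),
    ([2.9, 1.9, 8.8], "angry"),
]
query = [1.0, 0.2, 8.9]
weights = [1.0, 1.0, 0.0]  # third coordinate treated as uninformative
predicted = weighted_knn_predict(train, query, weights, k=3)
print(predicted)
```

Setting a weight to zero removes that coordinate from the comparison entirely, while intermediate values let a ranked feature contribute in proportion to its assumed importance.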

Keywords: machine learning; speech recognition; emotion recognition; MFCC; supervised learning; decision trees; MEL scale.

Штучний інтелект (Artificial Intelligence)

Scientific journal
