Effect of time-domain windowing on isolated speech recognition system performance

Authors

Abstract

A speech recognition system extracts textual data from a speech signal. Research in the speech recognition domain is challenging because of the large variability inherent in the speech signal. A variety of signal processing and machine learning techniques have been explored to achieve better recognition accuracy. Speech is highly non-stationary, so analysis is carried out over short time-domain windows, or frames. In speech recognition, cepstral features, namely Mel frequency cepstral coefficients (MFCC), are commonly used and are extracted from each short time frame. The effectiveness of these features depends on the duration of the chosen time window. The present study investigates the optimal time-window duration for extracting cepstral features in the context of speech recognition. A speaker-independent speech recognition system for the Kannada language is considered for the analysis. Speech utterances from a Kannada news corpus, recorded from different speakers, are used to create the speech database. The hidden Markov model toolkit (HTK) is used to implement the speech recognition system. The MFCCs, along with their first and second derivative coefficients, form the feature vectors. The pronunciation dictionary required for the study was built manually for a monophone system. Experiments were carried out, and the results analyzed, for different time-window lengths using overlapping Hamming windows. The best average word recognition accuracy of 61.58% was obtained for a window length of 110 msec. This recognition accuracy is comparable with that of similar work reported in the literature. The experiments show that the best word recognition performance can be achieved by tuning the window length to its optimum value.
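The framing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' HTK configuration: the 16 kHz sampling rate, the 10 msec frame shift, the random test signal, and the helper name `frame_signal` are all assumptions made for illustration. Only the overlapping Hamming window and the 110 msec window length come from the abstract.

```python
import numpy as np


def frame_signal(signal, sample_rate, window_ms, shift_ms):
    """Split a 1-D signal into overlapping Hamming-windowed frames.

    Hypothetical helper for illustration; returns an array of shape
    (n_frames, samples_per_window).
    """
    win_len = int(round(sample_rate * window_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    if len(signal) < win_len:
        # Zero-pad short signals so that at least one full frame exists.
        signal = np.pad(signal, (0, win_len - len(signal)))
    n_frames = 1 + (len(signal) - win_len) // shift
    window = np.hamming(win_len)
    # Consecutive frames overlap whenever shift < win_len.
    return np.stack([
        signal[i * shift: i * shift + win_len] * window
        for i in range(n_frames)
    ])


sample_rate = 16000                        # assumed, not stated in the abstract
rng = np.random.default_rng(0)
signal = rng.standard_normal(sample_rate)  # 1 s of noise standing in for speech

# Compare a few candidate window lengths at a fixed (assumed) 10 msec shift.
for window_ms in (25, 50, 110):
    frames = frame_signal(signal, sample_rate, window_ms, shift_ms=10)
    print(window_ms, "msec ->", frames.shape)
```

A longer window gives each frame more samples, and therefore finer frequency resolution, at the cost of averaging over a longer and less stationary stretch of speech. That trade-off is what a window-length sweep of this kind explores.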

Author Biographies

Ananthakrishna Thalengala, Manipal Academy of Higher Education (MAHE)

Department of Electronics and Communication Engineering, Manipal Institute of Technology (MIT)

Anitha Hoblidar, Manipal Academy of Higher Education (MAHE)

Department of Electronics and Communication Engineering, Manipal Institute of Technology (MIT)

Girisha S Tumkur, Manipal Academy of Higher Education (MAHE)

Department of Electronics and Communication Engineering, Manipal Institute of Technology (MIT)

Published

2024-04-19

Section

Digital Signal Processing