Change Point Determination in Audio Data Using Auditory Features

Tomasz Mąka


This study investigates the properties of auditory-based features for audio change point detection. Two popular techniques are used in the analysis: a metric-based approach and the BIC (Bayesian information criterion) scheme. Because the efficiency of change point detection depends on the type and size of the feature space, we compare two auditory-based feature sets (MFCC and GTEAD) in both detection schemes. We also propose a new technique based on multiscale analysis to determine content changes in audio data. The two change point detection techniques were compared with both feature spaces on a set of acoustic scenes, each containing a single change point. The results show that the accuracy of the detected positions depends on the feature type, the dimensionality of the feature space, the detection technique, and the type of audio data. With the BIC approach, the MFCC feature space gave better accuracy in most cases; however, it produced a lower detection ratio than the GTEAD feature. The proposed multiscale metric-based technique was evaluated using the same criteria as for BIC, and in this case the GTEAD feature space led to better accuracy. We show that the proposed multiscale change point detection scheme is competitive with the BIC scheme using the MFCC feature space.
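As an illustration of the BIC scheme mentioned above, the following is a minimal sketch of single change point detection on a sequence of feature vectors. It assumes the common full-covariance Gaussian formulation of the delta-BIC test; the abstract does not state the exact variant, window sizes, or penalty weight used in the study, so the function names, `margin`, and `penalty_weight` are hypothetical choices made for this example only. The input is any frame-by-frame feature matrix (for instance MFCC vectors); feature extraction is not shown.

```python
import numpy as np


def delta_bic(features, i, penalty_weight=1.0):
    """Delta-BIC for splitting `features` (n_frames x n_dims) at frame i.

    Assumption: each segment is modelled by one full-covariance Gaussian.
    Positive values favour the two-segment model (a change at frame i).
    """
    n, d = features.shape

    def logdet(x):
        # Maximum-likelihood covariance, lightly regularised so that
        # short segments do not produce a singular matrix.
        cov = np.cov(x, rowvar=False, bias=True).reshape(d, d)
        cov = cov + 1e-6 * np.eye(d)
        return np.linalg.slogdet(cov)[1]

    gain = 0.5 * (n * logdet(features)
                  - i * logdet(features[:i])
                  - (n - i) * logdet(features[i:]))
    # Extra parameters of the second Gaussian: mean plus covariance.
    n_params = d + 0.5 * d * (d + 1)
    return gain - 0.5 * penalty_weight * n_params * np.log(n)


def detect_change_point(features, margin=10, penalty_weight=1.0):
    """Return the frame index of the best change point, or None.

    `margin` (hypothetical parameter) keeps candidates away from the
    edges so that both segments have enough frames for a covariance
    estimate.
    """
    n = features.shape[0]
    candidates = list(range(margin, n - margin))
    scores = [delta_bic(features, i, penalty_weight) for i in candidates]
    best = int(np.argmax(scores))
    return candidates[best] if scores[best] > 0 else None
```

A quick way to try the sketch is to stack two blocks of synthetic vectors with different means and check that the returned index falls near the boundary. A detection is reported only when the maximum delta-BIC is positive, which is one way a "detection ratio" below 100% can arise: scenes whose best score stays negative yield no change point.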


International Journal of Electronics and Telecommunications
is a periodical of Electronics and Telecommunications Committee
of Polish Academy of Sciences

eISSN: 2300-1933