Change Point Determination in Audio Data Using Auditory Features
Abstract
This study investigates the properties of auditory-based features for audio change point detection. Two popular techniques are used in the analysis: a metric-based approach and the Bayesian information criterion (BIC) scheme. Because the efficiency of change point detection depends on the type and size of the feature space, we compare two auditory-based feature sets (MFCC and GTEAD) in both detection schemes. We also propose a new technique based on multiscale analysis for determining content changes in audio data. The two change point detection techniques and the two feature spaces are compared on a set of acoustic scenes, each containing a single change point. The results show that the accuracy of the detected positions depends on the feature type, the dimensionality of the feature space, the detection technique, and the type of audio data. With the BIC approach, the MFCC feature space gives better accuracy in most cases, but it yields a lower detection ratio than the GTEAD feature. The proposed multiscale metric-based technique was evaluated using the same criteria as for BIC; in that case, the GTEAD feature space led to better accuracy. We show that the proposed multiscale change point detection scheme is competitive with the BIC scheme using the MFCC feature space.
References
T. Kemp, M. Schmidt, M. Westphal, and A. Waibel, Strategies for automatic segmentation of audio data, In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP'00, 5-9 June, Istanbul, 2000, DOI: 10.1109/ICASSP.2000.861862.
S. Chen and P. Gopalakrishnan, Speaker, environment and channel change detection and clustering via the Bayesian information criterion, In Proc. DARPA Broadcast News Transcription and Understanding Workshop, 1998.
K. West and S. Cox, Finding an Optimal Segmentation for Audio Genre Classification, In Proc. 6th International Conference on Music Information Retrieval ISMIR'2005, 11-15 September, London, UK, 2005.
G. Hu and D. Wang, Auditory Segmentation Based on Onset and Offset Analysis, IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 2, pp. 396-405, February, 2007, DOI: 10.1109/TASL.2006.881700.
J. Foote, Automatic audio segmentation using a measure of audio novelty, IEEE International Conference on Multimedia and Expo ICME 2000, New York, NY, USA, 2000, DOI: 10.1109/ICME.2000.869637.
P. Hanna, N. Louis, M. Desainte-Catherine, and J. Benois-Pineau, Audio features for noisy sound segmentation, International Society for Music Information Retrieval Conference ISMIR'2004, Barcelona, Spain, October 10-14, 2004, vol. 1, pp. 120-124.
G. Hu and D. Wang, Auditory segmentation based on event detection, Workshop on Statistical and Perceptual Audio Processing SAPA'2004, Jeju, Korea, October 2004.
H. Sundaram and S. Chang, Audio scene segmentation using multiple features, models and time scales, IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP'2000, June 2000, vol. 6, pp. 2441-2444, DOI: 10.1109/ICASSP.2000.859335.
D. Castan, A. Ortega, A. Miguel, and E. Lleida, Audio segmentation-by-classification approach based on factor analysis in broadcast news domain, EURASIP Journal on Audio, Speech, and Music Processing, vol. 34, pp. 1-13, 2014, DOI: 10.1186/s13636-014-0034-5.
T. Maka, An Auditory-Based Scene Change Detection in Audio Data, International Conference on Signals and Electronic Systems (ICSES), 11-13 September 2014, Poznan, Poland, 2014, DOI: 10.1109/ICSES.2014.6948723.
L. Rabiner and R. Schafer, Theory and Applications of Digital Speech Processing, Prentice-Hall, 1st edition, 2010.
T. Nwe, M. Dong, S. Khine, and H. Li, Multi-Speaker Meeting Audio Segmentation, in Proceedings of INTERSPEECH'2008, 22-26 September, Brisbane, Australia, 2008.
T. Maka, Auditory Features Analysis for BIC-based Audio Segmentation, SIGMAP 2014, 11th International Conference on Signal Processing and Multimedia Applications, August 27-30, Vienna, Austria, 2014.
S. Davis and P. Mermelstein, Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on ASSP, August, 1980.
M. Slaney, Auditory Toolbox, Apple Technical Report #45, 1998.
D. Wang and G. Brown, Computational Auditory Scene Analysis, John Wiley & Sons, Inc., 2006.
M. Cooke, Modelling Auditory Processing and Organisation, Cambridge University Press, 2005.
License
1. License
The non-commercial use of the article will be governed by the Creative Commons Attribution license as currently displayed on https://creativecommons.org/licenses/by/4.0/.
2. Author’s Warranties
The author warrants that the article is original, written by the stated author/s, has not been published before, contains no unlawful statements, does not infringe the rights of others, is subject to copyright that is vested exclusively in the author and free of any third party rights, and that any necessary written permissions to quote from other sources have been obtained by the author/s. The undersigned also warrants that the manuscript (or its essential substance) has not been published other than as an abstract or doctorate thesis and has not been submitted for consideration elsewhere, for print, electronic or digital publication.
3. User Rights
Under the Creative Commons Attribution license, the author(s) and users are free to share (copy, distribute and transmit the contribution) under the following conditions: 1. they must attribute the contribution in the manner specified by the author or licensor, 2. they may alter, transform, or build upon this work, 3. they may use this contribution for commercial purposes.
4. Rights of Authors
Authors retain the following rights:
- copyright, and other proprietary rights relating to the article, such as patent rights,
- the right to use the substance of the article in their own future works, including lectures and books,
- the right to reproduce the article for their own purposes, provided the copies are not offered for sale,
- the right to self-archive the article,
- the right to supervision over the integrity of the content of the work and its fair use.
5. Co-Authorship
If the article was prepared jointly with other authors, the signatory of this form warrants that he/she has been authorized by all co-authors to sign this agreement on their behalf, and agrees to inform his/her co-authors of the terms of this agreement.
6. Termination
This agreement can be terminated by the author or the Journal Owner upon two months’ notice where the other party has materially breached this agreement and failed to remedy such breach within a month of being given the terminating party’s notice requesting such breach to be remedied. No breach or violation of this agreement will cause this agreement or any license granted in it to terminate automatically or affect the definition of the Journal Owner. The author and the Journal Owner may agree to terminate this agreement at any time. This agreement or any license granted in it cannot be terminated otherwise than in accordance with this section 6. This License shall remain in effect throughout the term of copyright in the Work and may not be revoked without the express written consent of both parties.
7. Royalties
This agreement entitles the author to no royalties or other fees. To such extent as legally permissible, the author waives his or her right to collect royalties relative to the article in respect of any use of the article by the Journal Owner or its sublicensee.
8. Miscellaneous
The Journal Owner will publish the article (or have it published) in the Journal if the article’s editorial process is successfully completed and the Journal Owner or its sublicensee has become obligated to have the article published. Where such obligation depends on the payment of a fee, it shall not be deemed to exist until such time as that fee is paid. The Journal Owner may conform the article to a style of punctuation, spelling, capitalization and usage that it deems appropriate. The Journal Owner will be allowed to sublicense the rights that are licensed to it under this agreement. This agreement will be governed by the laws of Poland.
By signing this License, Author(s) warrant(s) that they have the full power to enter into this agreement.