Deep Image Features in Music Information Retrieval
Abstract
Applications of Convolutional Neural Networks (CNNs) to various problems have been the subject of a number of recent studies, ranging from image classification and object detection to scene parsing, segmentation of 3D volumetric images, and action recognition in videos.
In this study, CNNs were applied to Music Information Retrieval (MIR), in particular to musical genre recognition.
The model was trained on ILSVRC-2012 (more than one million natural images) to perform image classification and was reused to perform genre classification on spectrogram images. Harmonic/percussive separation was applied, because these components are characteristic of musical genre.
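The paper does not name its audio tooling; as one concrete illustration of this preprocessing step, the sketch below separates a track into harmonic and percussive components and renders a log-mel spectrogram image for each, assuming the librosa library (the function name, parameters, and file paths are hypothetical).

```python
# Hypothetical preprocessing sketch: harmonic/percussive separation and
# spectrogram-image generation, assuming librosa and matplotlib.
import numpy as np
import librosa
import matplotlib.pyplot as plt

def spectrogram_images(audio_path, out_prefix, sr=22050, n_mels=128):
    """Split a track into harmonic and percussive parts and save a
    log-mel spectrogram image for each component."""
    y, sr = librosa.load(audio_path, sr=sr)
    y_harmonic, y_percussive = librosa.effects.hpss(y)

    for name, signal in [("harmonic", y_harmonic), ("percussive", y_percussive)]:
        mel = librosa.feature.melspectrogram(y=signal, sr=sr, n_mels=n_mels)
        mel_db = librosa.power_to_db(mel, ref=np.max)
        # Save the spectrogram as an image so it can later be fed to an
        # image-pretrained CNN.
        plt.imsave(f"{out_prefix}_{name}.png", mel_db, origin="lower", cmap="viridis")
```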
In the final stage, various strategies for merging Support Vector Machines (SVMs) were evaluated on the GTZAN dataset, which is well known in the MIR community.
Even though the model was trained on natural images, the results achieved in this study were close to the state of the art.
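To make the transfer-learning pipeline concrete, the following is a minimal sketch, not the authors' implementation: an ImageNet-pretrained CNN used as a fixed feature extractor for spectrogram images, one SVM per spectrogram variant, and simple score averaging as one plausible merging strategy. torchvision's AlexNet stands in here for the ILSVRC-2012-trained model; the paper does not prescribe these libraries or this particular fusion rule.

```python
# Hypothetical sketch of the pipeline: deep image features from a pretrained
# CNN, then SVMs merged by late fusion (score averaging).
import numpy as np
import torch
from torchvision import models, transforms
from PIL import Image
from sklearn.svm import SVC

cnn = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
cnn.eval()
# Keep the fully connected layers except the final 1000-way ImageNet classifier,
# so the network outputs a 4096-dimensional feature vector.
feature_head = torch.nn.Sequential(*list(cnn.classifier.children())[:-1])

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def cnn_features(image_path):
    """Deep image features for one spectrogram image."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        conv = torch.flatten(cnn.avgpool(cnn.features(x)), 1)
        return feature_head(conv).squeeze(0).numpy()

def fused_genre_scores(svms, feature_sets):
    """Average decision scores from SVMs trained on different spectrogram
    variants (e.g. harmonic vs. percussive) -- one possible merging strategy."""
    scores = [svm.decision_function(feats) for svm, feats in zip(svms, feature_sets)]
    return np.mean(scores, axis=0)
```

In such a setup, each SVC would be trained on features from one spectrogram type (for instance, harmonic and percussive images of the same GTZAN excerpt), and the averaged decision scores would give the merged genre prediction.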