September 25, 2017

A Dataset and Baseline System for Singing Voice Assessment

Barış Bozkurt, Ozan Baysal and Deniz Yuret. 2017. In The 13th International Symposium on Computer Music Multidisciplinary Research (CMMR), September. (PDF)

Abstract: In this paper we present a database of fundamental frequency series for singing performances to facilitate comparative analysis of algorithms developed for singing assessment. A large number of recordings were collected during conservatory entrance exams, which involve candidates’ reproduction of melodies (after listening to the target melody played on the piano) in addition to other tasks related to rhythm and individual pitch perception. Leaving out the samples on which the jury members’ grades did not all agree, we obtained a collection of 1018 singing and 2599 piano performances as instances of 40 distinct melodies. A state-of-the-art fundamental frequency (f0) detection algorithm is used to extract an f0 time series for each of these recordings to form the dataset. The dataset is shared to support research in singing assessment. Together with the dataset, we provide a flexible singing assessment system that can serve as a baseline for comparison of assessment algorithms.



September 14, 2017

Multidimensional Broadcast Operation on the GPU

Enis Berk Çoban, Deniz Yuret and Didem Unat. 2017. In 5. Ulusal Yüksek Başarımlı Hesaplama Konferansı, İstanbul, September. (PDF).

Abstract: Broadcast is a common operation in machine learning, widely used in calculating bias or subtracting the maximum for normalization in convolutional neural networks. A broadcast operation is required when two tensors, possibly with different numbers of dimensions and hence different numbers of elements, are input to an element-wise function. The tensors are expanded in the process so that the two match in size and dimension. In this research, we introduce a new broadcast functionality for matrices to be used on CUDA-enabled GPU devices. We further extend this operation to multidimensional arrays and measure its performance against the implementation available in the Knet deep learning framework. Our final implementation provides up to a 2x improvement over the Knet broadcast implementation, which only supports vector broadcast. Our implementation can handle broadcast operations with any number of dimensions.
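
To make the broadcast operation described in the abstract concrete, here is a minimal CPU sketch in plain Python. It is not the paper's CUDA implementation. The function names (`broadcast_shape`, `strides_for`, `broadcast_apply`) are hypothetical, the arrays are stored as flat row-major lists, and dimensions are aligned from the trailing end as in NumPy, which may differ from the convention used in the paper or in Knet. Instead of physically expanding the smaller tensor, the sketch gives every size-1 dimension a stride of 0, so the same element is reused along that dimension.

```python
from itertools import product


def broadcast_shape(shape_a, shape_b):
    """Return the common shape two arrays broadcast to (trailing-aligned)."""
    ndim = max(len(shape_a), len(shape_b))
    # Pad the shorter shape with leading 1s so both have ndim dimensions.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    out = []
    for da, db in zip(a, b):
        if da != db and da != 1 and db != 1:
            raise ValueError(f"incompatible dimensions {da} and {db}")
        out.append(db if da == 1 else da)
    return tuple(out)


def strides_for(shape, out_ndim):
    """Row-major strides for `shape` padded to out_ndim dimensions.

    Size-1 dimensions get stride 0, so indexing along them always
    revisits the same element (a virtual expansion, with no copy).
    """
    padded = (1,) * (out_ndim - len(shape)) + tuple(shape)
    strides = []
    step = 1
    for dim in reversed(padded):
        strides.append(0 if dim == 1 else step)
        step *= dim
    return tuple(reversed(strides))


def broadcast_apply(op, data_a, shape_a, data_b, shape_b):
    """Apply a binary element-wise `op` to two flat row-major arrays."""
    out_shape = broadcast_shape(shape_a, shape_b)
    sa = strides_for(shape_a, len(out_shape))
    sb = strides_for(shape_b, len(out_shape))
    out = []
    # product() enumerates output indices in row-major order.
    for idx in product(*(range(d) for d in out_shape)):
        ia = sum(i * s for i, s in zip(idx, sa))
        ib = sum(i * s for i, s in zip(idx, sb))
        out.append(op(data_a[ia], data_b[ib]))
    return out, out_shape


# Bias addition: a 2x3 activation matrix plus a length-3 bias vector.
activations = [1, 2, 3, 4, 5, 6]
bias = [10, 20, 30]
result, result_shape = broadcast_apply(
    lambda x, y: x + y, activations, (2, 3), bias, (3,)
)
# result == [11, 22, 33, 14, 25, 36], result_shape == (2, 3)
```

On a GPU, the loop over output indices would typically be replaced by one thread per output element performing the same index mapping; that mapping, rather than this loop, is the part the sketch is meant to illustrate.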

September 04, 2017

RGB-D Object Recognition Using Deep Convolutional Neural Networks

Saman Zia, Yücel Yemez and Deniz Yuret. 2017. In The IEEE International Conference on Computer Vision (ICCV), October. (PDF).

Abstract: We address the problem of object recognition from RGB-D images using deep convolutional neural networks (CNNs). We advocate the use of 3D CNNs to fully exploit the 3D spatial information in depth images as well as the use of pretrained 2D CNNs to learn features from RGB-D images. There is currently no large-scale dataset with depth information comparable to those available for RGB data. Hence, transfer learning from 2D source data is key to training deep 3D CNNs. To this end, we propose a hybrid 2D/3D convolutional neural network that can be initialized with pretrained 2D CNNs and then trained on a relatively small RGB-D dataset. We conduct experiments on the Washington dataset, which consists of RGB-D images of small household objects. Our experiments show that the features learnt from this hybrid structure, when fused with the features learnt from depth-only and RGB-only architectures, outperform the state of the art on RGB-D category recognition.

