September 25, 2017

A Dataset and Baseline System for Singing Voice Assessment

Barış Bozkurt, Ozan Baysal and Deniz Yuret. 2017. In The 13th International Symposium on Computer Music Multidisciplinary Research (CMMR), September. (PDF)

Abstract: In this paper we present a database of fundamental frequency series for singing performances to facilitate comparative analysis of algorithms developed for singing assessment. A large number of recordings have been collected during conservatory entrance exams, which involve candidates’ reproduction of melodies (after listening to the target melody played on the piano) in addition to other tasks related to rhythm and individual pitch perception. Leaving out the samples where jury members’ grades did not all agree, we deduced a collection of 1018 singing and 2599 piano performances as instances of 40 distinct melodies. A state-of-the-art fundamental frequency (f0) detection algorithm is used to deduce f0 time-series for each of these recordings to form the dataset. The dataset is shared to support research in singing assessment. Together with the dataset, we provide a flexible singing assessment system that can serve as a baseline for comparison of assessment algorithms.



September 14, 2017

Multidimensional Broadcast Operation on the GPU

Enis Berk Çoban, Deniz Yuret and Didem Unat. 2017. In 5. Ulusal Yüksek Başarımlı Hesaplama Konferansı, İstanbul, September. (PDF).

Abstract: Broadcast is a common operation in machine learning and is widely used in calculating bias or subtracting the maximum for normalization in convolutional neural networks. A broadcast operation is required when two tensors, possibly with different numbers of dimensions and hence different numbers of elements, are input to an element-wise function. The tensors are scaled in the process so that they match in size and dimension. In this research, we introduce a new broadcast functionality for matrices to be used on CUDA-enabled GPU devices. We further extend this operation to multidimensional arrays and measure its performance against the implementation available in the Knet deep learning framework. Our final implementation provides up to 2x improvement over the Knet broadcast implementation, which only supports vector broadcast. Our implementation can handle broadcast operations with any number of dimensions.
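
To make the broadcast semantics in the abstract concrete, here is a minimal CPU-only sketch in plain Python. It is not the paper's CUDA kernel and not Knet's implementation. It assumes NumPy-style alignment of trailing dimensions and row-major flat storage (Julia and Knet are column-major and align leading dimensions instead), and the function names `broadcast_shape`, `strides_for` and `broadcast_apply` are hypothetical.

```python
from itertools import product
from operator import add


def broadcast_shape(shape_a, shape_b):
    """Return the common shape of two arrays (assumed NumPy-style rules)."""
    ndim = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s so both shapes have the same rank.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    out = []
    for da, db in zip(a, b):
        if da != db and 1 not in (da, db):
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
        out.append(db if da == 1 else da)
    return tuple(out)


def strides_for(shape, out_ndim):
    """Row-major strides, with stride 0 on every size-1 (broadcast) dimension."""
    padded = (1,) * (out_ndim - len(shape)) + tuple(shape)
    strides, step = [], 1
    for dim in reversed(padded):
        strides.append(0 if dim == 1 else step)
        step *= dim
    return tuple(reversed(strides))


def broadcast_apply(op, a, shape_a, b, shape_b):
    """Apply a binary op element-wise to two flat row-major arrays."""
    out_shape = broadcast_shape(shape_a, shape_b)
    sa = strides_for(shape_a, len(out_shape))
    sb = strides_for(shape_b, len(out_shape))
    out = []
    # Visit output indices in row-major order; a stride of 0 re-reads the
    # same input element, which is what "expanding" a dimension means.
    for idx in product(*(range(d) for d in out_shape)):
        ia = sum(i * s for i, s in zip(idx, sa))
        ib = sum(i * s for i, s in zip(idx, sb))
        out.append(op(a[ia], b[ib]))
    return out, out_shape


# Adding a length-3 bias vector to every row of a 2x3 matrix.
matrix = [1, 2, 3, 4, 5, 6]
result, shape = broadcast_apply(add, matrix, (2, 3), [10, 20, 30], (3,))
print(result, shape)  # [11, 22, 33, 14, 25, 36] (2, 3)
```

The zero-stride trick avoids materializing the expanded tensor. A GPU version would typically assign one thread per output element and compute the two input offsets the same way, but that mapping is an assumption here rather than a description of the paper's kernel.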

September 04, 2017

RGB-D Object Recognition Using Deep Convolutional Neural Networks

Saman Zia, Yücel Yemez and Deniz Yuret. 2017. In The IEEE International Conference on Computer Vision (ICCV), October. (PDF).

Abstract: We address the problem of object recognition from RGB-D images using deep convolutional neural networks (CNNs). We advocate the use of 3D CNNs to fully exploit the 3D spatial information in depth images as well as the use of pretrained 2D CNNs to learn features from RGB-D images. No large-scale dataset with depth information currently exists that is comparable to those available for RGB data. Hence transfer learning from 2D source data is key to training deep 3D CNNs. To this end, we propose a hybrid 2D/3D convolutional neural network that can be initialized with pretrained 2D CNNs and can then be trained over a relatively small RGB-D dataset. We conduct experiments on the Washington dataset involving RGB-D images of small household objects. Our experiments show that the features learnt from this hybrid structure, when fused with the features learnt from depth-only and RGB-only architectures, outperform the state of the art on RGB-D category recognition.



August 25, 2017

Relational Symbol Grounding through Affordance Learning: An Overview of the ReGround Project

Antanas, Laura et al. Grounding Language Understanding (GLU 2017) ISCA Satellite Workshop of Interspeech 2017. (PDF, PPT)

Abstract: Symbol grounding is the problem of associating symbols from language with a corresponding referent in the environment. Traditionally, research has focused on identifying single objects and their properties. The ReGround project hypothesizes that the grounding process must consider the full context of the environment, including multiple objects, their properties, and relationships among these objects. ReGround targets the development of a novel framework for “affordance grounding”, by which an agent placed in a new environment can adapt to its new setting and interpret possibly multi-modal input in order to correctly carry out the requested tasks.



August 03, 2017

Facebook's AI program is not planning to take over the world

The recent sensational news stories about one of Facebook's AI studies have little to do with reality.

July 26, 2017

Parsing with context embeddings

Ömer Kırnap, Berkay Furkan Önder and Deniz Yuret. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, Vancouver, 2017. (PDF, poster, presentation, related posts).

Abstract: We introduce context embeddings, dense vectors derived from a language model that represent the left/right context of a word instance, and demonstrate that context embeddings significantly improve the accuracy of our transition based parser. Our model consists of a bidirectional LSTM (BiLSTM) based language model that is pre-trained to predict words in plain text, and a multi-layer perceptron (MLP) decision model that uses features from the language model to predict the correct actions for an ArcHybrid transition based parser. We participated in the CoNLL 2017 UD Shared Task as the "Koç University" team and our system was ranked 7th out of 33 systems that parsed 81 treebanks in 49 languages.
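
For readers unfamiliar with the ArcHybrid transition system mentioned in the abstract, the sketch below shows its three standard actions in Python. It is not the authors' parser: in their system an MLP chooses each action from language-model features, whereas here the action sequence is supplied by hand. The action names, the `arc_hybrid_parse` function, and the convention of placing an artificial root (index 0) at the bottom of the stack are assumptions made for illustration.

```python
def arc_hybrid_parse(n_words, actions):
    """Apply arc-hybrid actions and return {dependent: head} (0 is the root).

    SHIFT moves the first buffer word onto the stack.
    LEFT  pops the stack top and attaches it to the first buffer word.
    RIGHT pops the stack top and attaches it to the new stack top.
    """
    stack = [0]                            # assumed: artificial root at the bottom
    buffer = list(range(1, n_words + 1))   # words are numbered 1..n
    heads = {}
    for action in actions:
        if action == "SHIFT":
            if not buffer:
                raise ValueError("SHIFT needs a non-empty buffer")
            stack.append(buffer.pop(0))
        elif action == "LEFT":
            if not buffer or stack[-1] == 0:
                raise ValueError("LEFT needs a buffer word and a non-root stack top")
            heads[stack.pop()] = buffer[0]
        elif action == "RIGHT":
            if len(stack) < 2:
                raise ValueError("RIGHT needs at least two stack items")
            dependent = stack.pop()
            heads[dependent] = stack[-1]
        else:
            raise ValueError(f"unknown action: {action}")
    return heads


# "She eats fish": 'eats' (2) heads 'She' (1) and 'fish' (3), and attaches to root.
gold_actions = ["SHIFT", "LEFT", "SHIFT", "SHIFT", "RIGHT", "RIGHT"]
print(arc_hybrid_parse(3, gold_actions))  # {1: 2, 3: 2, 2: 0}
```

A learned parser would replace `gold_actions` with a loop that scores the legal actions for the current stack and buffer and applies the best one.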



May 23, 2017

JuliaCon 2017, Berkeley, June 20-24

I gave a talk at JuliaCon introducing Knet on Wednesday, June 21, 2017, 4:16pm - 4:52pm, East Pauley Ballroom, Berkeley, CA. See these related posts.

May 17, 2017

Congratulations to the Koç parsing team

Our neural net based dependency parser was number 7 overall out of 33 teams participating in the CoNLL 2017 Shared Task "Multilingual Parsing from Raw Text to Universal Dependencies" in which participating teams had to parse 68 corpora in 50 languages. I would like to thank Ömer Kırnap and Berkay Furkan Önder for their contributions and all-nighters.

April 30, 2017

Learning to follow navigational instructions

Can, Ozan Arkan and Yuret, Deniz. 2017. International Symposium on Brain and Cognitive Science (ISBCS2017). Invited talk.



April 28, 2017

The third deep learning revolution

1st presentation: April 19, 2017. Koç University Alumni Club.
2nd presentation: June 4, 2017. GİF Young Scholars Seminar.

The first revolution took place 1958-1969.
  • We figured out how to train perceptrons.
  • We proved the perceptron convergence theorem.
  • Interest waned after a book (Perceptrons) written by mathematicians.
The second revolution took place 1986-1995.
  • We figured out how to train multi-layer perceptrons.
  • We proved the universal approximation theorem.
  • Interest waned after a book (SLT) written by mathematicians.
The third revolution started in 2012.
  • We figured out how to train deep nets.
  • A mathematician will write a book around 2022.
  • The fourth revolution will not start until 2036 :)


February 22, 2017

Overfitting, underfitting, regularization, dropout

Here is an IJulia notebook demonstrating overfitting, underfitting, regularization and dropout in Knet for my machine learning class.