I am an associate professor in Computer Engineering at Koç University in Istanbul working at the Artificial Intelligence Laboratory. Previously I was at the MIT AI Lab and later co-founded Inquira, Inc. My research is in natural language processing and machine learning. For prospective students, here are some research topics, papers, classes, blog posts, and past students.

November 12, 2014

Some starting points for deep learning and RNNs

Hinton, Bengio, LeCun, and Jordan have done reddit AMAs.

A Google search for "fast neural network code" reveals cuda-convnet and cuda-convnet2 by Alex Krizhevsky, Hinton's student (convnet2 supports K20 and K40). There is also fann, which has no GPU support. fbm, by Radford Neal (another Hinton student), implements Bayesian neural nets and has wonderful documentation. It can also do gradient descent, but I doubt it is competitive speedwise. ufldl is a great tutorial (with Matlab code) by Andrew Ng. More software can be found at deeplearning.net.
Bengio's group wrote Theano and PyLearn2. Bengio also has a draft deep learning book. His research page summarizes the main problems nicely and gives pointers to a couple of review papers (one on neural net language models).
LeCun uses Torch, written by Collobert, as do DeepMind and some Google groups. Torch is based on Lua, a simple language, and is a descendant of the machine learning language Lush (and Lush2) developed by LeCun and Bottou.
Schmidhuber invented LSTM. He is working on an RNN book, but there is no draft yet. His student Alex Graves wrote RNNLIB. DeepMind (Alex Graves et al.) invented the Neural Turing Machine (NTM).
Learnxinyminutes.com is a great resource to quickly start learning new languages, including Lua. Too bad nobody is using Julia, and Julia doesn't support GPUs yet (except for experimental add-ons). Is Matlab (e.g. Ng's ufldl code) still competitive? Ng uses L-BFGS, which is faster, instead of gradient descent for optimization. Does convnet use sophisticated optimization or gradient descent? I will explore this in the next post.
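To make the L-BFGS versus gradient descent question concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy ill-conditioned quadratic, the fixed learning rate of 0.01, the memory size of 5, and the simple backtracking line search. It is not Ng's ufldl code or anything taken from cuda-convnet, and a toy quadratic says little about real neural networks.

```python
import math

# Hypothetical toy problem: f(x) = 0.5 * sum(a_i * x_i^2), condition number 100.
SCALES = [1.0, 10.0, 100.0]

def f(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(SCALES, x))

def grad(x):
    return [a * xi for a, xi in zip(SCALES, x)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def gradient_descent(x, learning_rate=0.01, tol=1e-6, max_iter=100000):
    """Plain gradient descent with a fixed (assumed) learning rate."""
    for it in range(max_iter):
        g = grad(x)
        if math.sqrt(dot(g, g)) < tol:
            return x, it
        x = [xi - learning_rate * gi for xi, gi in zip(x, g)]
    return x, max_iter

def step_along(x, d, g):
    """Backtracking (Armijo) line search: halve the step until f drops enough."""
    t, fx, slope = 1.0, f(x), dot(g, d)
    for _ in range(60):
        x_new = [xi + t * di for xi, di in zip(x, d)]
        if f(x_new) <= fx + 1e-4 * t * slope:
            break
        t *= 0.5
    return x_new

def lbfgs(x, memory=5, tol=1e-6, max_iter=100000):
    history = []  # recent (s, y, rho) triples, oldest first
    g = grad(x)
    for it in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            return x, it
        # Two-loop recursion: apply the implicit inverse-Hessian estimate to g.
        q, alphas = list(g), []
        for s, y, rho in reversed(history):
            alpha = rho * dot(s, q)
            alphas.append(alpha)
            q = [qi - alpha * yi for qi, yi in zip(q, y)]
        if history:
            s, y, _ = history[-1]
            gamma = dot(s, y) / dot(y, y)  # initial Hessian scaling
            q = [gamma * qi for qi in q]
        for (s, y, rho), alpha in zip(history, reversed(alphas)):
            beta = rho * dot(y, q)
            q = [qi + (alpha - beta) * si for qi, si in zip(q, s)]
        x_new = step_along(x, [-qi for qi in q], g)
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = dot(s, y)
        if sy > 0.0:  # keep only pairs that preserve positive curvature
            history.append((s, y, 1.0 / sy))
            history = history[-memory:]
        x, g = x_new, g_new
    return x, max_iter

start = [1.0, 1.0, 1.0]
x_gd, gd_iters = gradient_descent(start)
x_lb, lbfgs_iters = lbfgs(start)
print("gradient descent iterations:", gd_iters)
print("L-BFGS iterations:", lbfgs_iters)
```

On this toy problem L-BFGS needs far fewer iterations, but each iteration costs more (extra function evaluations in the line search and the history bookkeeping), so iteration counts alone do not settle which is faster in practice.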

Also see my earlier post.


September 30, 2014

Mode coupling points to functionally important residues in myosin II.

Onur Varol, Deniz Yuret, Burak Erman and Alkan Kabakçıoğlu. 2014. Proteins: Structure, Function, and Bioinformatics, vol 82, no 9, pp 1777--1786, September. (PDF)

Abstract: Relevance of mode coupling to energy/information transfer during protein function, particularly in the context of allosteric interactions is widely accepted. However, existing evidence in favor of this hypothesis comes essentially from model systems. We here report a novel formal analysis of the near-native dynamics of myosin II, which allows us to explore the impact of the interaction between possibly non-Gaussian vibrational modes on fluctuational dynamics. We show that an information-theoretic measure based on mode coupling alone yields a ranking of residues with a statistically significant bias favoring the functionally critical locations identified by experiments on myosin II.


August 23, 2014

Unsupervised Instance-Based Part of Speech Induction Using Probable Substitutes

Mehmet Ali Yatbaz, Enis Sert, Deniz Yuret. COLING 2014. (This is a token based and multilingual extension of our EMNLP 2012 model. Up to date versions of the code can be found at github.)

Abstract: We develop an instance (token) based extension of the state of the art word (type) based part-of-speech induction system introduced in (Yatbaz et al. 2012). Each word instance is represented by a feature vector that combines information from the target word and probable substitutes sampled from an n-gram model representing its context. Modeling ambiguity using an instance based model does not lead to significant gains in overall accuracy in part-of-speech tagging because most words in running text are used in their most frequent class (e.g. 93.69% in the Penn Treebank). However it is important to model ambiguity because most frequent words are ambiguous and not modeling them correctly may negatively affect upstream tasks. Our main contribution is to show that an instance based model can achieve significantly higher accuracy on ambiguous words at the cost of a slight degradation on unambiguous ones, maintaining a comparable overall accuracy. On the Penn Treebank, the overall many-to-one accuracy of the system is within 1% of the state-of-the-art (80%), while on highly ambiguous words it is up to 70% better. On multilingual experiments our results are significantly better than or comparable to the best published word or instance based systems on 15 out of 19 corpora in 15 languages. The vector representations for words used in our system are available for download for further experiments.
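The many-to-one accuracy quoted in the abstract is commonly computed by mapping each induced cluster to the gold tag it most often co-occurs with and then scoring the resulting tagging. Here is a minimal sketch of that standard metric. The function name, cluster ids, and tags are made up for illustration and are not taken from the paper's code.

```python
from collections import Counter, defaultdict

def many_to_one_accuracy(induced, gold):
    """Map every induced cluster to its most frequent gold tag, then score tokens.

    induced: cluster id per token (hypothetical example input)
    gold:    gold-standard tag per token, same length
    """
    if len(induced) != len(gold) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    tags_in_cluster = defaultdict(Counter)
    for cluster, tag in zip(induced, gold):
        tags_in_cluster[cluster][tag] += 1
    # Tokens carrying their cluster's majority tag count as correct.
    correct = sum(max(counts.values()) for counts in tags_in_cluster.values())
    return correct / len(gold)

# Toy data: six tokens, three induced clusters, two gold tags.
induced = [0, 0, 0, 1, 1, 2]
gold = ["N", "N", "V", "V", "V", "N"]
accuracy = many_to_one_accuracy(induced, gold)
print(accuracy)
```

Because several clusters may map to the same tag, the score can be inflated simply by inducing more clusters, so it is only comparable between systems that use a similar number of clusters.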

August 08, 2014

Volkan Cirik, M.S. 2014

Current position: Masters in Language Technologies, Carnegie Mellon University, Language Technologies Institute (LinkedIn).
M.S. Thesis: Analysis of SCODE Word Embeddings based on Substitute Distributions in Supervised Tasks. Koç University, Department of Computer Engineering. August, 2014. (PDF, Presentation, word vectors (github), word vectors (dropbox))

One of the interests of the Natural Language Processing (NLP) community is to find representations for lexical items using large amounts of unlabeled data. Inducing low-dimensional, continuous, dense word vectors, or word embeddings, has become the principal technique for finding representations for words. Word embeddings address the issues of the classical categorical representation of words by capturing syntactic and semantic information of words in the dimensions of a vector. These representations have been shown to be successful across NLP tasks including Named Entity Recognition, Part-of-speech Tagging, Parsing, and Semantic Role Labeling.

In this work, I analyze a word embedding method in supervised NLP tasks. The framework maps words onto a sphere such that words co-occurring in similar contexts lie close together. The similarity of contexts is measured by the distribution of substitutes that can fill them. I compare word embeddings, including more recent representations, in Named Entity Recognition (NER), Chunking, and Dependency Parsing. I also examine the framework in a multilingual setup. The results show that the examined method achieves results as good as or better than those of the other word embeddings. The framework is consistent in improving the baseline systems across languages and achieves state-of-the-art results in multilingual dependency parsing.
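To illustrate the idea of comparing contexts by the substitutes that can fill them, here is a toy sketch. The substitute probabilities are invented, and cosine similarity is chosen only for illustration. It is not necessarily the measure used in the thesis, and this is not the SCODE embedding procedure itself.

```python
import math

def cosine(p, q):
    """Cosine similarity between two sparse substitute distributions (dicts)."""
    shared = set(p) & set(q)
    dot = sum(p[w] * q[w] for w in shared)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)

# Hypothetical substitute distributions for three contexts (made-up numbers).
context_a = {"dog": 0.5, "cat": 0.4, "car": 0.1}      # e.g. "the ___ slept"
context_b = {"dog": 0.45, "cat": 0.45, "horse": 0.1}  # e.g. "a ___ ran away"
context_c = {"quickly": 0.6, "slowly": 0.4}           # e.g. "she walked ___"

sim_ab = cosine(context_a, context_b)
sim_ac = cosine(context_a, context_c)
print(sim_ab, sim_ac)
```

Contexts that accept mostly the same substitutes (a and b) score high, while contexts with no substitutes in common (a and c) score zero. Words that tend to appear in the first kind of context would be the ones placed near each other.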


June 26, 2014

Probabilistic Modeling of Joint-context in Distributional Similarity

Oren Melamud, Ido Dagan, Jacob Goldberger, Idan Szpektor, and Deniz Yuret. In the Proceedings of the Eighteenth Conference on Computational Natural Language Learning (CoNLL-2014). (Download W14-1619)

Abstract: Most traditional distributional similarity models fail to capture syntagmatic patterns that group together multiple word features within the same joint context. In this work we introduce a novel generic distributional similarity scheme under which the power of probabilistic models can be leveraged to effectively model joint contexts. Based on this scheme, we implement a concrete model which utilizes probabilistic n-gram language models. Our evaluations suggest that this model is particularly well-suited for measuring similarity for verbs, which are known to exhibit richer syntagmatic patterns, while maintaining comparable or better performance with respect to competitive baselines for nouns. Following this, we propose our scheme as a framework for future semantic similarity models leveraging the substantial body of work that exists in probabilistic language modeling.
