I am a professor of Computer Engineering at Koç University in Istanbul and the founding director of the Artificial Intelligence Laboratory. Previously I was at the MIT AI Lab and later co-founded Inquira, Inc. My research is in natural language processing and machine learning. For prospective students here are some research topics, papers, classes, blog posts and past students.
Koç Üniversitesi Bilgisayar Mühendisliği Bölümü'nde öğretim üyesiyim ve Yapay Zeka Laboratuarı'nın kurucu müdürüyüm. Bundan önce MIT Yapay Zeka Laboratuarı'nda çalıştım ve Inquira, Inc. şirketini kurdum. Araştırma konularım doğal dil işleme ve yapay öğrenmedir. İlgilenen öğrenciler için araştırma konuları, makaleler, verdiğim dersler, Türkçe yazılarım, ve mezunlarımız.

March 28, 2020

BiLingUNet: Image Segmentation by Modulating Top-Down and Bottom-Up Visual Processing with Referring Expressions

Ozan Arkan Can, İlker Kesen, Deniz Yuret. March 28, 2020. Submitted to ECCV. arXiv:2003.12739.

Abstract: We present BiLingUNet, a state-of-the-art model for image segmentation using referring expressions. BiLingUNet uses language to customize visual filters and outperforms approaches that concatenate a linguistic representation to the visual input. We find that using language to modulate both bottom-up and top-down visual processing works better than just making the top-down processing language-conditional. We argue that common 1x1 language-conditional filters cannot represent relational concepts and experimentally demonstrate that wider filters work better. Our model achieves state-of-the-art performance on four referring expression datasets.

Full post...

November 12, 2019

A simple explanation of Variational Autoencoders

The goal of VAE is to model your data \(X\) coming from a complicated distribution \(P(X)\) using a latent (unobserved, hypothesized) variable \(Z\): \[ P(x) = \int P(x|z) P(z) dz \] This identity is true for any distribution \(P\) and any value \(x\). VAE takes \(P(Z)\) to be the multivariate standard normal. Note that this identity can also be written as an expectation: \[ P(x) = E_{z\sim P(Z)}[P(x|z)] \] and can be approximated by sampling \(z_n\) from \(P(Z)\): \[ P(x) \approx \frac{1}{N} \sum_{z_n\sim P(Z)} P(x|z_n) \] However for high dimensional spaces (images, text) typically modeled by VAE, this would be a poor approximation because for a given \(x\) value, \(P(x|z)\) would be close to 0 almost everywhere. Randomly sampling from \(P(Z)\) would be unlikely to hit regions of \(Z\) space where \(P(x|z)\) is high. Say we had a distribution \(Q(Z|X)\) which is more likely to give us \(z\) values where \(P(x|z)\) is high. We could rewrite our former identity as: \[ P(x) = \int P(x|z) P(z) Q(z|x) / Q(z|x) dz \] Note that this identity can also be expressed as an expectation: \[ P(x) = E_{z\sim Q(Z|x)}[P(x|z) P(z) / Q(z|x)] \] and can be approximated by sampling \(z_n\) from \(Q(Z|x)\) (this is called importance sampling and would converge faster because \(Q\) gives us better \(z\) values): \[ P(x) \approx \frac{1}{N} \sum_{z_n\sim Q(Z|x)} P(x|z_n) P(z_n) / Q(z_n|x) \] To train a VAE model we pick some parametric functions \(P_\theta(X|Z)\) (i.e. decoder, likelihood, generative network) and \(Q_\phi(Z|X)\) (i.e. encoder, posterior, inference network) and fiddle with their parameters to maximize the likelihood of the training data \( D=\{x_1,\ldots,x_M\} \). Actually, instead of likelihood \(P(D) = \prod P(x_m)\) we use log likelihood: \(\log P(D) = \sum\log P(x)\) because it nicely decomposes as a sum over each example. We now have to figure out how to approximate \(\log P(X)\). \[ \log P(x) = \log E_{z\sim Q(Z|x)}[P(x|z) P(z) / Q(z|x)] \] Jensen's inequality tells us that log of an expectation is greater than or equal to the expectation of the log: \[ \log P(x) \geq E_{z\sim Q(Z|x)}\log[P(x|z) P(z) / Q(z|x)] \] The RHS of this inequality is what is known in the business as ELBO (evidence lower bound), more typically written as: \[ \log P(x) \geq E_{z\sim Q(Z|x)}[\log P(x|z)] - D_{KL}[Q(Z|x)\,\|\,P(Z)] \] This standard expression tells us more directly what to compute but obscures the intuition that ELBO is just the expected log of an importance sampling term.

To see the exact difference between the two sides of this inequality we can use the integral version: \[ \begin{align} \log & P(x) - \int \log[P(x|z) P(z) / Q(z|x)] Q(z|x) dz \\ = & \int [\log P(x) - \log P(x|z) - \log P(z) + \log Q(z|x)] Q(z|x) dz \\ = & \int [\log Q(z|x) - \log P(z|x)] Q(z|x) dz \\ = & D_{KL}[Q(Z|x)\,\|\,P(Z|x)] \end{align} \] This allows us to write an exact equation, indicating the error of our approximation is given by the KL divergence between \(Q(Z|x)\) and \(P(Z|x)\): \[ \begin{align} \log & P(x) - D_{KL}[Q(Z|x)\,\|\,P(Z|x)] = \\ & E_{z\sim Q(Z|x)}[\log P(x|z)] - D_{KL}[Q(Z|x)\,\|\,P(Z)] \end{align} \]

Reference: Tutorial on Variational Autoencoders by Carl Doersch (https://arxiv.org/abs/1606.05908)
Full post...

September 25, 2019

Morphological analysis using a sequence decoder

Ekin Akyürek, Erenay Dayanık, Deniz Yuret (2019). Transactions Of The Association For Computational Linguistics, 7, 567-579. (PDF, arXiv)

Abstract: We introduce Morse, a recurrent encoder-decoder model that produces morphological analyses of each word in a sentence. The encoder turns the relevant information about the word and its context into a fixed size vector representation and the decoder generates the sequence of characters for the lemma followed by a sequence of individual morphological features. We show that generating morphological features individually rather than as a combined tag allows the model to handle rare or unseen tags and outperform whole-tag models. In addition, generating morphological features as a sequence rather than e.g. an unordered set allows our model to produce an arbitrary number of features that represent multiple inflectional groups in morphologically complex languages. We obtain state-of-the art results in nine languages of different morphological complexity under low-resource, high-resource and transfer learning settings. We also introduce TrMor2018, a new high accuracy Turkish morphology dataset. Our Morse implementation and the TrMor2018 dataset are available online to support future research.

See https://github.com/ai-ku/Morse.jl for a Morse implementation in Julia/Knet and https://github.com/ai-ku/TrMor2018 for the new Turkish dataset.

Full post...

September 09, 2019

Overview of CLEF 2019 Lab ProtestNews: Extracting Protests from News in a Cross-context Setting

Hürriyetoğlu, Ali and Yörük, Erdem and Yuret, Deniz and Yoltar, C. and Gürel, B. and Duruşan, F. and Mutlu, O. and Akdemir, A. In CLEF 2019 Working Notes. September, 2019. (PDF, Proceedings).

Abstract: We present an overview of the CLEF-2019 Lab ProtestNews on Extracting Protests from News in the context of generalizable natural language processing. The lab consists of document, sentence, and token level information classification and extraction tasks that were referred as task 1, task 2, and task 3 respectively in the scope of this lab. The tasks required the participants to identify protest relevant information from English local news at one or more aforementioned levels in a cross-context setting, which is cross-country in the scope of this lab. The training and development data were collected from India and test data was collected from India and China. The lab attracted 58 teams to participate in the lab. 12 and 9 of these teams submitted results and working notes respectively. We have observed neural networks yield the best results and the performance drops significantly for majority of the submissions in the cross-country setting, which is China.

Full post...