I am a professor of Computer Engineering at Koç University in Istanbul and the founding director of the Artificial Intelligence Laboratory. Previously I was at the MIT AI Lab and later co-founded Inquira, Inc. My research is in natural language processing and machine learning. For prospective students here are some research topics, papers, classes, blog posts and past students.
I am a faculty member in the Department of Computer Engineering at Koç University and the founding director of the Artificial Intelligence Laboratory. Before that I worked at the MIT Artificial Intelligence Laboratory and co-founded Inquira, Inc. My research areas are natural language processing and machine learning. For interested students: research topics, papers, the courses I teach, my Turkish writings, and our graduates.

June 09, 2020

Cemil Cengiz, M.S. 2020

Contact info: LinkedIn, Homepage.
M.S. Thesis: Improving Generalization in Natural Language Inference by Joint Training with Semantic Role Labeling, Koç University, Department of Computer Engineering. June 2020. (PDF, Presentation).
Publications: BibTeX

Thesis Abstract:
Recently, end-to-end models have achieved near-human performance on natural language inference (NLI) datasets. However, they generalize poorly to out-of-distribution evaluation sets because they tend to learn shallow heuristics from biases in the training data. Their performance drops dramatically on diagnostic sets that measure compositionality or robustness against simple heuristics. Existing solutions to this problem rely on dataset augmentation, extending the training set with examples from the evaluated adversarial categories. However, this approach applies only to a limited set of adversaries and, at worst, hurts model performance on adversaries not included in the augmentation set. Instead, our proposed solution is to improve sentence understanding (and hence out-of-distribution generalization) through joint learning of explicit semantics. In this thesis, we show that a BERT-based model trained jointly on English semantic role labeling (SRL) and NLI achieves significantly higher performance on external evaluation sets that measure generalization.
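The joint training idea above can be pictured with a minimal sketch, assuming a HuggingFace Transformers BERT encoder shared between an NLI sentence-pair classification head and an SRL token-tagging head; the class name, label counts, and task-switching logic are illustrative assumptions, not the configuration used in the thesis.

# Hypothetical sketch: one shared BERT encoder with an NLI head and an SRL head.
import torch.nn as nn
from transformers import AutoModel

class JointNliSrlModel(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased",
                 num_nli_labels=3, num_srl_tags=20):  # tag count is an assumption
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.nli_head = nn.Linear(hidden, num_nli_labels)  # premise-hypothesis classification
        self.srl_head = nn.Linear(hidden, num_srl_tags)    # per-token argument tagging

    def forward(self, input_ids, attention_mask, task):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        if task == "nli":
            return self.nli_head(out.last_hidden_state[:, 0])  # [CLS] vector for the pair
        return self.srl_head(out.last_hidden_state)            # one score vector per token

Training would interleave NLI and SRL batches and apply a cross-entropy loss on the corresponding head; because the encoder is shared, the SRL signal shapes the sentence representations that the NLI head relies on.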


Full post...

May 13, 2020

Berkay Furkan Önder, M.S. 2020

Contact info: Email, GitHub.
M.S. Thesis: Effect of Contextual Embeddings on Graph-Based Dependency Parsing, Koç University, Department of Computer Engineering. May 2020. (PDF, Presentation).
Publications: CoNLL18 and CoNLL17

Thesis Abstract:
I demonstrate the effect of contextual embeddings on transition-based and graph-based parsing methods and test our contribution, the structured meta biaffine decoder, with various graph-based parsing algorithms.
As the Koç University graph-based parsing team, we implemented a graph-based dependency parsing model to perform syntactic and semantic analysis of given sentences. Our neural graph-based parser consists of two main parts, an encoder and a decoder. The encoder turns the input sentences into continuous feature vectors that the neural parser can process, while the decoder produces the parse tree from the output of the neural parser by first constructing a graph representation of that output. We participated in the CoNLL 2018 Shared Task with this parsing model and had the opportunity to run it on 61 different datasets covering 41 different languages. We took advantage of natural language processing and deep learning techniques, including graph-based dependency parsing algorithms.
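The encoder-decoder split described above can be illustrated with a minimal sketch of a biaffine arc scorer over contextual embeddings; the layer sizes, the zero initialization, and the greedy head selection are simplifying assumptions (the thesis studies the structured meta biaffine decoder and proper graph-based decoding such as a maximum spanning tree algorithm).

# Hypothetical sketch: biaffine scoring of head-dependent pairs for one sentence.
import torch
import torch.nn as nn

class BiaffineArcScorer(nn.Module):
    def __init__(self, emb_dim=768, arc_dim=256):
        super().__init__()
        self.head_mlp = nn.Linear(emb_dim, arc_dim)  # word as a candidate head
        self.dep_mlp = nn.Linear(emb_dim, arc_dim)   # word as a candidate dependent
        self.W = nn.Parameter(torch.zeros(arc_dim, arc_dim))
        self.b = nn.Parameter(torch.zeros(arc_dim))

    def forward(self, embeddings):
        # embeddings: (n, emb_dim) contextual vectors for the n words of a sentence
        h = self.head_mlp(embeddings)
        d = self.dep_mlp(embeddings)
        # scores[i, j] = biaffine score for "word j is the head of word i"
        return d @ self.W @ h.t() + (h @ self.b).unsqueeze(0)

# Greedy decoding for illustration; a real graph-based parser would extract a
# maximum spanning tree (e.g. Chu-Liu/Edmonds) from these scores instead:
# heads = BiaffineArcScorer()(embeddings).argmax(dim=1)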


Full post...

March 28, 2020

BiLingUNet: Image Segmentation by Modulating Top-Down and Bottom-Up Visual Processing with Referring Expressions

Ozan Arkan Can, İlker Kesen, Deniz Yuret. March 28, 2020. Submitted to ECCV. arXiv:2003.12739.

Abstract: We present BiLingUNet, a state-of-the-art model for image segmentation using referring expressions. BiLingUNet uses language to customize visual filters and outperforms approaches that concatenate a linguistic representation to the visual input. We find that using language to modulate both bottom-up and top-down visual processing works better than just making the top-down processing language-conditional. We argue that common 1x1 language-conditional filters cannot represent relational concepts and experimentally demonstrate that wider filters work better. Our model achieves state-of-the-art performance on four referring expression datasets.
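A minimal sketch of the language-conditional filter idea, assuming a sentence embedding is used to predict the weights of a 2-D convolution applied to a visual feature map; the module name, feature sizes, and the 3x3 kernel (standing in for the "wider than 1x1" filters argued for above) are illustrative assumptions, not the BiLingUNet architecture itself.

# Hypothetical sketch: generate convolution filters from a language embedding
# so that language modulates how visual features are processed.
import torch.nn as nn
import torch.nn.functional as F

class LanguageConditionalConv(nn.Module):
    def __init__(self, lang_dim=512, in_channels=256, out_channels=64, kernel_size=3):
        super().__init__()
        self.shape = (out_channels, in_channels, kernel_size, kernel_size)
        self.filter_gen = nn.Linear(lang_dim, out_channels * in_channels * kernel_size ** 2)

    def forward(self, visual_feats, lang_emb):
        # visual_feats: (1, in_channels, H, W); lang_emb: (lang_dim,)
        weight = self.filter_gen(lang_emb).view(self.shape)
        # A 3x3 kernel sees neighboring positions, so the generated filter can
        # express relational concepts ("left of", "above") that a 1x1 filter cannot.
        return F.conv2d(visual_feats, weight, padding=self.shape[-1] // 2)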


Full post...

November 12, 2019

A simple explanation of Variational Autoencoders

The goal of VAE is to model your data \(X\) coming from a complicated distribution \(P(X)\) using a latent (unobserved, hypothesized) variable \(Z\): \[ P(x) = \int P(x|z) P(z) dz \] This identity is true for any distribution \(P\) and any value \(x\). VAE takes \(P(Z)\) to be the multivariate standard normal. Note that this identity can also be written as an expectation: \[ P(x) = E_{z\sim P(Z)}[P(x|z)] \] and can be approximated by sampling \(z_n\) from \(P(Z)\): \[ P(x) \approx \frac{1}{N} \sum_{z_n\sim P(Z)} P(x|z_n) \]

However, for the high dimensional spaces (images, text) typically modeled by VAE, this would be a poor approximation because for a given \(x\) value, \(P(x|z)\) would be close to 0 almost everywhere. Randomly sampling from \(P(Z)\) would be unlikely to hit regions of \(Z\) space where \(P(x|z)\) is high. Say we had a distribution \(Q(Z|X)\) which is more likely to give us \(z\) values where \(P(x|z)\) is high. We could rewrite our former identity as: \[ P(x) = \int \frac{P(x|z) P(z)}{Q(z|x)} Q(z|x) dz \] Note that this identity can also be expressed as an expectation: \[ P(x) = E_{z\sim Q(Z|x)}[P(x|z) P(z) / Q(z|x)] \] and can be approximated by sampling \(z_n\) from \(Q(Z|x)\) (this is called importance sampling and would converge faster because \(Q\) gives us better \(z\) values): \[ P(x) \approx \frac{1}{N} \sum_{z_n\sim Q(Z|x)} P(x|z_n) P(z_n) / Q(z_n|x) \]

To train a VAE model we pick some parametric functions \(P_\theta(X|Z)\) (i.e. decoder, likelihood, generative network) and \(Q_\phi(Z|X)\) (i.e. encoder, posterior, inference network) and fiddle with their parameters to maximize the likelihood of the training data \( D=\{x_1,\ldots,x_M\} \). Actually, instead of the likelihood \(P(D) = \prod P(x_m)\) we use the log likelihood \(\log P(D) = \sum\log P(x_m)\) because it nicely decomposes as a sum over each example. We now have to figure out how to approximate \(\log P(x)\): \[ \log P(x) = \log E_{z\sim Q(Z|x)}[P(x|z) P(z) / Q(z|x)] \] Jensen's inequality tells us that the log of an expectation is greater than or equal to the expectation of the log: \[ \log P(x) \geq E_{z\sim Q(Z|x)}\log[P(x|z) P(z) / Q(z|x)] \] The RHS of this inequality is what is known in the business as ELBO (evidence lower bound), more typically written as: \[ \log P(x) \geq E_{z\sim Q(Z|x)}[\log P(x|z)] - D_{KL}[Q(Z|x)\,\|\,P(Z)] \] This standard expression tells us more directly what to compute but obscures the intuition that ELBO is just the expected log of an importance sampling term.
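To actually compute the KL term in this ELBO, \(Q(Z|x)\) is usually chosen to be a diagonal Gaussian \( \mathcal{N}(\mu(x), \mathrm{diag}(\sigma^2(x))) \) (a standard modeling choice, not something forced by the derivation above), in which case its KL divergence from the standard normal prior has a closed form: \[ D_{KL}[Q(Z|x)\,\|\,P(Z)] = \frac{1}{2}\sum_{i=1}^{d} \left( \mu_i(x)^2 + \sigma_i(x)^2 - 1 - \log \sigma_i(x)^2 \right) \] so only the reconstruction term \(E_{z\sim Q(Z|x)}[\log P(x|z)]\) needs to be estimated by sampling, typically with a single reparameterized \(z\) per example.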

To see the exact difference between the two sides of this inequality we can use the integral version: \[ \begin{align} \log & P(x) - \int \log[P(x|z) P(z) / Q(z|x)] Q(z|x) dz \\ = & \int [\log P(x) - \log P(x|z) - \log P(z) + \log Q(z|x)] Q(z|x) dz \\ = & \int [\log Q(z|x) - \log P(z|x)] Q(z|x) dz \\ = & D_{KL}[Q(Z|x)\,\|\,P(Z|x)] \end{align} \] where the first equality uses \(\int Q(z|x) dz = 1\) to move \(\log P(x)\) inside the integral, and the second uses Bayes' rule, \(P(x|z)P(z) = P(z|x)P(x)\). This allows us to write an exact equation, showing that the error of our approximation is the KL divergence between \(Q(Z|x)\) and the true posterior \(P(Z|x)\): \[ \begin{align} \log & P(x) - D_{KL}[Q(Z|x)\,\|\,P(Z|x)] = \\ & E_{z\sim Q(Z|x)}[\log P(x|z)] - D_{KL}[Q(Z|x)\,\|\,P(Z)] \end{align} \]
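The whole derivation can be turned into a training objective with a short sketch; this is a minimal, hypothetical PyTorch implementation assuming a diagonal-Gaussian encoder, a Bernoulli decoder over inputs scaled to [0,1], the reparameterization trick, and illustrative layer sizes.

# Hypothetical sketch: minimize the negative ELBO
#   -E_q[log P(x|z)] + KL[Q(z|x) || P(z)]
# with Q(z|x) a diagonal Gaussian and P(z) the standard normal.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=20, hidden=400):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, z_dim)      # mean of Q(z|x)
        self.logvar = nn.Linear(hidden, z_dim)  # log variance of Q(z|x)
        self.dec = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(), nn.Linear(hidden, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization: z = mu + sigma * eps with eps ~ N(0, I),
        # giving a single-sample estimate of the expectation over Q(z|x).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        logits = self.dec(z)
        # Reconstruction term: log P(x|z) under a Bernoulli decoder.
        recon = -F.binary_cross_entropy_with_logits(logits, x, reduction="sum")
        # KL[Q(z|x) || N(0, I)] in closed form (see the formula above).
        kl = 0.5 * torch.sum(mu ** 2 + logvar.exp() - 1 - logvar)
        return -(recon - kl)  # negative ELBO to minimize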

Reference: Tutorial on Variational Autoencoders by Carl Doersch (https://arxiv.org/abs/1606.05908)
Full post...