I am an associate professor of Computer Engineering at Koç University in Istanbul working at the Artificial Intelligence Laboratory. Previously I was at the MIT AI Lab and later co-founded Inquira, Inc. My research is in natural language processing and machine learning. For prospective students here are some research topics, papers, classes, blog posts and past students.
Koç Üniversitesi Bilgisayar Mühendisliği Bölümü'nde öğretim üyesiyim ve Yapay Zeka Laboratuarı'nda çalışıyorum. Bundan önce MIT Yapay Zeka Laboratuarı'nda çalıştım ve Inquira, Inc. şirketini kurdum. Araştırma konularım doğal dil işleme ve yapay öğrenmedir. İlgilenen öğrenciler için araştırma konuları, makaleler, verdiğim dersler, Türkçe yazılarım, ve mezunlarımız.

August 23, 2016


AutoGrad.jl is an automatic differentiation package for Julia. It is a Julia port of the popular Python autograd package. It can differentiate regular Julia code that includes loops, conditionals, helper functions, closures etc. by keeping track of the primitive operations and using this execution trace to compute gradients. It uses reverse mode differentiation (a.k.a. backpropagation) so it can efficiently handle functions with array inputs and scalar outputs. It can compute gradients of gradients to handle higher order derivatives. Please see the comments in core.jl for a description of how the code works in detail.


You can install AutoGrad in Julia using:

julia> Pkg.add("AutoGrad")

In order to use it in your code start with:

using AutoGrad

Here is a linear regression example simplified from housing.jl:

using AutoGrad

function loss(w)
    global xtrn,ytrn
    ypred = w[1]*xtrn .+ w[2]
    sum(abs2(ypred - ytrn)) / size(ypred,2)

function train(w; lr=.1, epochs=20)
    gradfun = grad(loss)
    for epoch=1:epochs
        g = gradfun(w)
        for i in 1:length(w)
            w[i] -= lr * g[i]
    return w

The loss function takes parameters as input and returns the loss to be minimized. The parameter w for this example is a pair: w[1] is a weight matrix, and w[2] is a bias vector. The training data xtrn,ytrn are in global variables. ypred is the predicted output, and the last line computes the quadratic loss. The loss function is implemented in regular Julia.

The train function takes initial parameters and returns optimized parameters. grad is the only AutoGrad function used: it creates a function gradfun that takes the same arguments as loss, but returns the gradient instead. The returned gradient will have the same type and shape as the input argument. The for loop implements gradient descent, where we calculate the gradient and subtract a scaled version of it from the weights.

See the examples directory for more examples, and the extensively documented core.jl for details.

Extending AutoGrad

AutoGrad can only handle a function if the primitives it uses have known gradients. You can add your own primitives with gradients as described in detail in core.jl or using the @primitive and @zerograd macros in util.jl Here is an example:

@primitive hypot(x1::Number,x2::Number)::y  (dy->dy*x1/y)  (dy->dy*x2/y)

The @primitive macro marks the hypot(::Number,::Number) method as a new primitive and the next two expressions define gradient functions wrt the first and second argument. The gradient expressions can refer to the parameters and the return variable (indicated after the final ::) of the method declaration.

Note that Julia supports multiple-dispatch, i.e. a function may have multiple methods each supporting different argument types. For example hypot(x1::Array,x2::Array) is another hypot method. In AutoGrad.jl each method can independently be defined as a primitive and can have its own specific gradient.

Code structure

core.jl implements the main functionality and acts as the main documentation source. util.jl has some support functions to define and test new primitives. interfaces.jl sets up support for common data structures including Arrays, Tuples, and Dictionaries. The numerical gradients are defined in files such as base/math.jl, special/trig.jl that mirror the organization under julia/base.

Current status and future work

The gradient coverage is spotty, I am still adding more gradients to cover the Julia base. Next steps are to make models faster by providing support for GPU operations and overwriting functions (to avoid memory allocation). I should also find out about the efficiency of closures and untyped functions in Julia which are used extensively in the code.

Acknowledgments and references

AutoGrad.jl was written by Deniz Yuret. Large parts of the code are directly ported from the Python autograd package. I'd like to thank autograd author Dougal Maclaurin for his support. See (Baydin et al. 2015) for a general review of automatic differentiation, autograd tutorial for some Python examples, and Dougal's PhD thesis for design principles. JuliaDiff has alternative differentiation tools for Julia. I would like to thank my students Ozan Arkan Can and Emre Yolcu for helpful contributions.

Full post...

June 14, 2016

Natural language communication with robots

Yonatan Bisk, Deniz Yuret, and Daniel Marcu. 2016. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2016) pp 751--761, San Diego, California. (PDF, Slides)


We propose a framework for devising empirically testable algorithms for bridging the communication gap between humans and robots. We instantiate our framework in the context of a problem setting in which humans give instructions to robots using unrestricted natural language commands, with instruction sequences being subservient to building complex goal configurations in a blocks world. We show how one can collect meaningful training data and we propose three neural architectures for interpreting contextually grounded natural language commands. The proposed architectures allow us to correctly understand/ground the blocks that the robot should move when instructed by a human who uses unrestricted language. The architectures have more diffi- culty in correctly understanding/grounding the spatial relations required to place blocks correctly, especially when the blocks are not easily identifiable.

Full post...

June 06, 2016

Saman Zia, M.S. 2016

Current position: Software Engineer at CBORD (Email, Linkedin)
M.S. Thesis: RGB-D Object Recognition using Deep Convolutional Neural Networks. Koç University, Department of Computer Engineering. June, 2016. (PDF, Presentation, Code)


Recent availability of low cost RGB-D sensors has led to an increased interest in object recognition combining both color and depth modalities. Object recognition from RGB-D images is particularly important in robotic tasks and the inclusion of depth has been proven to increase the performance. The problem of combining depth and color information is being widely researched. This thesis addresses this problem by initializing a 2-D Convolutional Neural Network (CNN) for RGB information via transfer learning and 3-D Convolutional Neural Network for encoding depth infor- mation. The obtained feature representations are fused to report performance over the RGB-D object recognition task. The transferred weights are from CNNs that are trained on large ImageNet classification challenge dataset and produces meaningful features. The depth information is encoded along with the color information in a 3-D voxel and learns joint features from scratch using a 3-D CNN. The approach is evaluated on the Washington RGB-D dataset and the performance for RGB category recognition exceeds the state-of-the-art, while the RGB-D performance is on par with it for category recognition. Due to good features learnt by the 3-D CNN, the po- tential of transfer learning from 2-D pre-trained CNN to 3-D CNN to include depth information is also addressed.

Full post...

March 19, 2016

How to write a technical paper

This is the evolving set of recommendations I share with my graduate students for technical writing...

  1. Empathy: This is the single most important principle of technical writing.  Try reading what you write from the perspective of somebody who has not spent the last few months working on your problem.  Better yet, find such a person and see if they understand everything you are talking about.  Don’t just take their word for it, ask them to tell you what they understand in their own words.  See where they struggle and debug your paper: Do they get lost in too much detail and miss the main point?  Do they get disoriented because you jump around too much?  Are there terms they do not understand?  Fix the paper using the following techniques until a dedicated freshman can understand all the important points.
  2. Winston’s Onion Rule: The document should state the most important points first, and expand on them gradually. It is a mistake to keep any important points until the end of the paper.  Only details and supporting material should be left to the end.  If I stop reading the document at any point, everything I haven't read so far should be less important than everything I have read up to this point:
    1. The title should be descriptive of the main point.
    2. The first sentence should state the main point.
    3. The first paragraph should expand on the first sentence.
    4. The first section should expand on the first paragraph.
    5. The first chapter should expand on the first section.
    6. The whole paper/thesis should expand on the first chapter, etc.
  3. Yuret’s Fractal Rule: Parts at every level of your document, down to each paragraph, should have their own introduction / conclusion to keep the reader oriented (i.e. stop them from asking “What is this person talking about now, and why?”):
    1. The first chapter of a paper/thesis should state the topic of the paper/thesis and the last chapter should summarize its point.
    2. The first section of a chapter should state the topic of the chapter and the last section should summarize its point.
    3. The first paragraph of a section should state the topic of the section and the last paragraph should summarize its point.
    4. The first sentence of a paragraph should state the topic of the paragraph and the last sentence should summarize its point.
  4. No undefined terms: Any technical term your nine year old niece would not understand should be defined before first use.  Any acronym should first be given in parentheses next to its long form before first use.  All variables in equations, all axes in graphs should be explained at the first opportunity.  Tables and Figures should have descriptive captions that can be understood stand-alone.  Technical terms and mathematical notation should be used consistently, no confusing variations allowed (i.e. calling the same thing context vector somewhere and word context vector elsewhere will confuse the reader into thinking these are two separate things).
  5. Replicability: Science is based on replicable results.   Your paper should provide enough detail (possibly in the appendices), and links to its code and data, to replicate each of its results.  In particular, for each set of experiments you should have:
    1. Data table: e.g. in a natural language processing experiment, things like number of words and sentences in train, dev, test; vocabulary size, tagset size, tag frequencies, out-of-vocabulary rate, average sentence length, i.e. any data statistic relevant to the task should go to a data table.
    2. Parameter table: things like the model structure, the training algorithm used, the hyperparameters used, number of training epochs, and any other details related to experimental replication should go to a table.
    3. Result table: the results (table or plot) should clearly indicate the evaluation metric, sensible lower bound baselines, upper bounds (e.g. inter-annotator agreement) if available, and current state of the art in published work to put your results in perspective.

Full post...

February 01, 2016

Learning Navigational Language from Linguistic and Visual Cues (2016-2018)

TUBITAK 1001 Project 114E628. "Dilbilimsel ve Görsel İpuçlarını Birlikte Kullanarak Gezinim Dilinin Öğrenilmesi." (2016-02-01 -- 2018-08-01)
Full post...