I am an associate professor in Computer Engineering at Koç University in Istanbul working at the Artificial Intelligence Laboratory. Previously I was at the MIT AI Lab and later co-founded Inquira, Inc. My research is in natural language processing and machine learning. This year I am helping organize SemEval-2013. For prospective students here are some research topics, papers, classes, blog posts and past students.

May 19, 2013

Pitfalls of studying language in isolation

Studies of language acquisition and language understanding display a remarkable lack of attention to the subject matter of the utterances being studied. This is probably because nobody knows how to represent and process meaning, whereas the forms of utterances are readily available. Thus "language acquisition" has come to mean the study of learning how to construct utterances "of the right form", and studies of language understanding focus on translating forms of utterances into other symbolic forms equally devoid of the richness and detail of the things the utterance is supposed to convey.
A real theory of language acquisition should study how babies learn to decode form-meaning mappings in an environment where lots of things are going on in addition to what is being said.  A real theory of language understanding should study what kinds of rich interconnected concepts and embodied simulations get triggered by words and constructions, how we decide what to simulate given the scant detail in descriptions, and what inferences are made possible beyond what is explicitly stated.  

All this is AI-complete, you say? Well, by limiting ourselves to studying language in isolation, we may have come to the end of the line, where the ~80% accuracy limit of machine-learning-based computational linguistics (on almost any linguistic problem you can think of) is preventing us from building truly transformative applications. Maybe we are shooting ourselves in the foot, and maybe, just maybe, some problems that look difficult right now are difficult not because we are missing the right machine learning algorithm or sufficient labeled data, but because we are ignoring the constraints imposed by the meaning side of things. We may have finally run out of options other than to try and crack the real problem, i.e. modeling what utterances are ABOUT.



April 05, 2013

Turkish Language Resources

This post contains links to various Turkish language resources that I have collected. Please send a comment if you find Turkish resources that you would like to see on this page.

TS Corpus

Taner Sezer's TS Corpus is a 491M-token general-purpose Turkish corpus. See comments below for details.

BounWebCorpus

Hasim Sak's page contains some useful Turkish language resources and code in addition to a large web corpus.

Bibliography

Özgür Yılmazel's Bibliography on Turkish Information Retrieval and Natural Language Processing.

tr-disamb.tgz

Turkish morphological disambiguator code. Slow but 96% accurate. See "Learning morphological disambiguation rules for Turkish" for the theory.

correctparses_03.txt.gz, train.merge.gz

Turkish morphology training files. Semi-automatically tagged, so accuracy is limited. The two files contain the same data, except that the second file also includes the ambiguous parses (the first parse on each line is the correct one).

test.1.2.dis.gz, test.merge.gz

Turkish morphology test files; the second one also includes the ambiguous parses (the first parse on each line is the correct one). The data is hand-tagged and has good accuracy.
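
For readers who want to work with the merge files programmatically, here is a minimal Python sketch. The only fact it takes from the descriptions above is that the first parse on each line is the correct one. Everything else is an assumption made for illustration: that each line is whitespace-separated, that the first field is the surface word, and that the remaining fields are candidate parses. The names `MergeEntry`, `parse_merge_line` and `average_ambiguity` are hypothetical, so check a few lines of the real files before relying on this.

```python
from typing import Iterable, List, NamedTuple, Optional


class MergeEntry(NamedTuple):
    word: str                # surface form (ASSUMED to be the first field)
    correct: str             # first parse on the line, i.e. the correct one
    alternatives: List[str]  # remaining (incorrect) candidate parses


def parse_merge_line(line: str) -> Optional[MergeEntry]:
    """Split one line into word, correct parse and alternative parses.

    ASSUMPTION: fields are separated by whitespace. Lines with fewer
    than two fields (e.g. blank lines) are skipped by returning None.
    """
    fields = line.split()
    if len(fields) < 2:
        return None
    return MergeEntry(fields[0], fields[1], fields[2:])


def average_ambiguity(lines: Iterable[str]) -> float:
    """Average number of candidate parses per word (1.0 = no ambiguity)."""
    entries = [e for e in map(parse_merge_line, lines) if e is not None]
    if not entries:
        return 0.0
    return sum(1 + len(e.alternatives) for e in entries) / len(entries)
```

With the file opened in text mode, `average_ambiguity(f)` gives a rough idea of how much work a disambiguator has to do on that data set.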

tr-tagger.tgz

Turkish morphological tagger, includes Oflazer's finite state machines for Turkish. From Kemal Oflazer. Please use with permission. Requires the publicly available Xerox Finite State software.

turklex.tgz, pc_kimmo.tgz

Turkish morphology rules for PC-Kimmo by Kemal Oflazer. Older implementation. Originally from www.cs.cmu.edu

Milliyet1.bz2, Milliyet2.bz2, Milliyet3.bz2

Original Milliyet corpus, one token per line, 19,627,500 total tokens. Latin-5 encoded, in three 11MB parts. From Kemal Oflazer. Please use with permission.
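
Since the corpus is Latin-5 (ISO-8859-9) encoded rather than UTF-8, it has to be decoded explicitly. The following is a small Python sketch of one way to stream the tokens straight from the compressed parts. It assumes the files are ordinary bzip2 archives with one token per line as described above. The function names `iter_tokens` and `top_tokens` are hypothetical helpers, not part of any distributed code.

```python
import bz2
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def iter_tokens(path: str) -> Iterator[str]:
    """Yield tokens from a bzip2 file that has one token per line.

    "iso8859_9" is Python's codec name for Latin-5, the encoding
    stated for the Milliyet files.
    """
    with bz2.open(path, "rt", encoding="iso8859_9") as f:
        for line in f:
            token = line.rstrip("\n")
            if token:  # skip empty lines, if there are any
                yield token


def top_tokens(paths: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent tokens across all given parts."""
    counts: Counter = Counter()
    for path in paths:
        counts.update(iter_tokens(path))
    return counts.most_common(n)
```

For example, `top_tokens(["Milliyet1.bz2", "Milliyet2.bz2", "Milliyet3.bz2"])` would stream all three parts without decompressing them to disk first.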

Turkish wordnet

From Kemal Oflazer. Please use with permission.

METU-Sabanci Turkish Treebank

Turkish treebank with dependency annotations. Please use with permission.

sozluk.txt.gz

English-Turkish dictionary (127157 entries, 826K). Originally from www.fen.bilkent.edu.tr/~aykutlu.

sozluk-boun.txt.gz

Turkish word list (25822 words, 73K). Originally from www.cmpe.boun.edu.tr/courses/cmpe230

Avrupa Birliği Temel Terimler Sözlüğü

(Originally from: www.abgs.gov.tr/ab_dosyalar, Oct 6, 2006)

BilisimSozlugu.zip

Bilişim Sözlüğü by Bülent Sankur (Originally from: www.bilisimsozlugu.com, Oct 9, 2006)

turkish.el

Emacs extension that automatically adds accents to Turkish words while typing on an English keyboard.

en-tr.zip, lm.tr.gz

Turkish-English parallel text from Kemal Oflazer, "Statistical Machine Translation into a Morphologically Complex Language", invited paper, in Proceedings of CICLING 2008 (Conference on Intelligent Text Processing and Computational Linguistics), Haifa, Israel, February 2008. The text is lowercased and converted to UTF-8. The Turkish part of the dataset is "selectively split", i.e. some suffixes are separated from their stems and some are not. lm.tr.gz is the Turkish text used to develop the language model.



February 27, 2013

Bret Victor

Bret Victor - Inventing on Principle from CUSEC on Vimeo.

Bret Victor's inspirational talk with his views on (1) how to help fragile ideas flourish, and (2) how to live your life. For more from Bret, check out his website.

Vi Hart


For more from my 7-year-old daughter's new favorite educator, Vi Hart, check out Khan Academy, YouTube, other videos, Wikipedia, or her blog.

January 25, 2013

A Mathematician's Lament by Paul Lockhart

This is a wonderful and heartfelt book about the poetry of mathematics and how we fail to introduce it to students at schools and to society in general. I can finally stop feeling guilty about my obsession with popular math books and cool problems, even though I know they probably have no practical value to me or to society whatsoever. After all, it is not practical value that makes reading poetry a pleasure! I am thankful to all my wonderful math teachers who bequeathed me this guilty pleasure without my knowledge or consent, in spite of the horrible curriculum, textbooks, and test system. And I highly recommend the author's second book, "Measurement", to those who seek an alternative to the typical 10-pound high school textbooks full of repetitive and unimaginative problems.
