November 12, 2019

A simple explanation of Variational Autoencoders

The goal of a VAE is to model your data \(X\), coming from a complicated distribution \(P(X)\), using a latent (unobserved, hypothesized) variable \(Z\): \[ P(x) = \int P(x|z) P(z) dz \] This identity is true for any distribution \(P\) and any value \(x\). VAE takes \(P(Z)\) to be the multivariate standard normal. Note that this identity can also be written as an expectation: \[ P(x) = E_{z\sim P(Z)}[P(x|z)] \] and can be approximated by sampling \(z_n\) from \(P(Z)\): \[ P(x) \approx \frac{1}{N} \sum_{z_n\sim P(Z)} P(x|z_n) \]

However, for the high dimensional spaces (images, text) typically modeled by VAE, this would be a poor approximation because for a given \(x\) value, \(P(x|z)\) would be close to 0 almost everywhere. Randomly sampling from \(P(Z)\) would be unlikely to hit regions of \(Z\) space where \(P(x|z)\) is high. Say we had a distribution \(Q(Z|X)\) which is more likely to give us \(z\) values where \(P(x|z)\) is high. We could rewrite our former identity as: \[ P(x) = \int P(x|z) P(z) Q(z|x) / Q(z|x) dz \] Note that this identity can also be expressed as an expectation: \[ P(x) = E_{z\sim Q(Z|x)}[P(x|z) P(z) / Q(z|x)] \] and can be approximated by sampling \(z_n\) from \(Q(Z|x)\) (this is called importance sampling and would converge faster because \(Q\) gives us better \(z\) values): \[ P(x) \approx \frac{1}{N} \sum_{z_n\sim Q(Z|x)} P(x|z_n) P(z_n) / Q(z_n|x) \]

To train a VAE model we pick some parametric functions \(P_\theta(X|Z)\) (a.k.a. decoder, likelihood, generative network) and \(Q_\phi(Z|X)\) (a.k.a. encoder, approximate posterior, inference network) and fiddle with their parameters to maximize the likelihood of the training data \( D=\{x_1,\ldots,x_M\} \). Actually, instead of the likelihood \(P(D) = \prod_m P(x_m)\) we use the log likelihood \(\log P(D) = \sum_m \log P(x_m)\) because it nicely decomposes as a sum over the examples. We now have to figure out how to approximate \(\log P(x)\) for a single example: \[ \log P(x) = \log E_{z\sim Q(Z|x)}[P(x|z) P(z) / Q(z|x)] \]

Jensen's inequality tells us that the log of an expectation is greater than or equal to the expectation of the log: \[ \log P(x) \geq E_{z\sim Q(Z|x)}\log[P(x|z) P(z) / Q(z|x)] \] The RHS of this inequality is what is known in the business as ELBO (evidence lower bound), more typically written as: \[ \log P(x) \geq E_{z\sim Q(Z|x)}[\log P(x|z)] - D_{KL}[Q(Z|x)\,\|\,P(Z)] \] This standard expression tells us more directly what to compute but obscures the intuition that ELBO is just the expected log of an importance sampling term.
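
To make the difference between sampling from the prior and importance sampling concrete, here is a minimal Python sketch on a one-dimensional toy model. Everything specific in it is an assumption made for illustration and does not come from the tutorial: the likelihood \(P(x|z) = N(z, 0.1^2)\), the hand-picked proposal \(Q(Z|x) = N(x, 0.2^2)\) (a real VAE would learn \(Q\) with an encoder network), and all function names. In this toy model \(P(x)\) has a closed form, so the estimates can be compared against the exact value.

```python
import math
import random

SIGMA = 0.1  # noise std of the toy likelihood P(x|z) = N(z, SIGMA^2) (assumed value)


def normal_pdf(v, mean, std):
    """Density of the 1-D Gaussian N(mean, std^2) at v."""
    return math.exp(-0.5 * ((v - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def exact_marginal(x):
    """Closed form for this toy model only: P(x) = N(0, 1 + SIGMA^2)."""
    return normal_pdf(x, 0.0, math.sqrt(1.0 + SIGMA ** 2))


def prior_estimate(x, n, rng):
    """Plain Monte Carlo: average P(x|z_n) over z_n ~ P(Z) = N(0, 1)."""
    return sum(normal_pdf(x, rng.gauss(0.0, 1.0), SIGMA) for _ in range(n)) / n


def importance_weights(x, n, rng, q_mean, q_std):
    """Weights P(x|z_n) P(z_n) / Q(z_n|x) for z_n ~ Q(Z|x) = N(q_mean, q_std^2)."""
    weights = []
    for _ in range(n):
        z = rng.gauss(q_mean, q_std)
        weights.append(normal_pdf(x, z, SIGMA) * normal_pdf(z, 0.0, 1.0)
                       / normal_pdf(z, q_mean, q_std))
    return weights


rng = random.Random(0)
x = 2.0
weights = importance_weights(x, 2000, rng, q_mean=x, q_std=0.2)
mean_weight = sum(weights) / len(weights)

print("exact P(x):          ", exact_marginal(x))
print("prior sampling:      ", prior_estimate(x, 2000, rng))
print("importance sampling: ", mean_weight)
# Jensen's inequality on the same samples: mean of logs <= log of mean.
print("log of mean weight:  ", math.log(mean_weight))
print("mean of log weights: ", sum(math.log(w) for w in weights) / len(weights))
```

The last two printed numbers are sample versions of \(\log P(x)\) and the ELBO: the first takes the log of the averaged importance weights, the second averages their logs, so it can never exceed the first.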

To see the exact difference between the two sides of this inequality we can use the integral version: \[ \begin{align} \log & P(x) - \int \log[P(x|z) P(z) / Q(z|x)] Q(z|x) dz \\ = & \int [\log P(x) - \log P(x|z) - \log P(z) + \log Q(z|x)] Q(z|x) dz \\ = & \int [\log Q(z|x) - \log P(z|x)] Q(z|x) dz \\ = & D_{KL}[Q(Z|x)\,\|\,P(Z|x)] \end{align} \] The first step uses \(\int Q(z|x) dz = 1\) to move \(\log P(x)\) inside the integral, and the second uses Bayes' rule, \(P(z|x) = P(x|z) P(z) / P(x)\). This allows us to write an exact equation, indicating that the error of our approximation is given by the KL divergence between \(Q(Z|x)\) and \(P(Z|x)\): \[ \begin{align} \log & P(x) - D_{KL}[Q(Z|x)\,\|\,P(Z|x)] = \\ & E_{z\sim Q(Z|x)}[\log P(x|z)] - D_{KL}[Q(Z|x)\,\|\,P(Z)] \end{align} \] Since a KL divergence is never negative, this also recovers the ELBO inequality, with equality exactly when \(Q(Z|x)\) matches the true posterior \(P(Z|x)\).
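
The exact equation can be checked numerically in a case where every term has a closed form. The sketch below assumes a one-dimensional toy model that is not part of the tutorial: \(P(z) = N(0, 1)\), \(P(x|z) = N(z, v)\) with noise variance \(v\), and a Gaussian \(Q(Z|x)\) with an arbitrary hand-picked mean and variance. All function names are made up for illustration.

```python
import math


def log_normal_pdf(value, mean, var):
    """Log density of the 1-D Gaussian N(mean, var) at value."""
    return -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)


def kl_normal(m1, v1, m2, v2):
    """KL[N(m1, v1) || N(m2, v2)] for 1-D Gaussians (v1, v2 are variances)."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def log_evidence(x, noise_var):
    """log P(x) for the toy model: P(x) = N(0, 1 + noise_var)."""
    return log_normal_pdf(x, 0.0, 1.0 + noise_var)


def true_posterior(x, noise_var):
    """Mean and variance of the exact posterior P(Z|x) for the toy model."""
    return x / (1.0 + noise_var), noise_var / (1.0 + noise_var)


def elbo(x, q_mean, q_var, noise_var):
    """E_{z~Q}[log P(x|z)] - KL[Q(Z|x) || P(Z)], both terms in closed form."""
    # E_{z~Q}[(x - z)^2] = (x - q_mean)^2 + q_var for Gaussian Q.
    expected_log_lik = (-0.5 * math.log(2 * math.pi * noise_var)
                        - ((x - q_mean) ** 2 + q_var) / (2 * noise_var))
    return expected_log_lik - kl_normal(q_mean, q_var, 0.0, 1.0)


x, noise_var = 2.0, 0.01
q_mean, q_var = 1.5, 0.05  # a deliberately imperfect Q

post_mean, post_var = true_posterior(x, noise_var)
gap = log_evidence(x, noise_var) - elbo(x, q_mean, q_var, noise_var)
kl_to_posterior = kl_normal(q_mean, q_var, post_mean, post_var)

print("log P(x) - ELBO:       ", gap)
print("KL[Q(Z|x) || P(Z|x)]:  ", kl_to_posterior)
```

The two printed values should agree up to floating point error, and setting `q_mean, q_var` to the output of `true_posterior` should drive both to zero.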

Reference: Tutorial on Variational Autoencoders by Carl Doersch.

September 25, 2019

Morphological analysis using a sequence decoder

Ekin Akyürek, Erenay Dayanık, Deniz Yuret (2019). Transactions of the Association for Computational Linguistics, 7, 567-579. (PDF, arXiv)

Abstract: We introduce Morse, a recurrent encoder-decoder model that produces morphological analyses of each word in a sentence. The encoder turns the relevant information about the word and its context into a fixed size vector representation and the decoder generates the sequence of characters for the lemma followed by a sequence of individual morphological features. We show that generating morphological features individually rather than as a combined tag allows the model to handle rare or unseen tags and outperform whole-tag models. In addition, generating morphological features as a sequence rather than e.g. an unordered set allows our model to produce an arbitrary number of features that represent multiple inflectional groups in morphologically complex languages. We obtain state-of-the-art results in nine languages of different morphological complexity under low-resource, high-resource and transfer learning settings. We also introduce TrMor2018, a new high accuracy Turkish morphology dataset. Our Morse implementation and the TrMor2018 dataset are available online to support future research.

A Morse implementation in Julia/Knet and the new Turkish dataset are available online.


September 09, 2019

Overview of CLEF 2019 Lab ProtestNews: Extracting Protests from News in a Cross-context Setting

Hürriyetoğlu, Ali and Yörük, Erdem and Yuret, Deniz and Yoltar, C. and Gürel, B. and Duruşan, F. and Mutlu, O. and Akdemir, A. In CLEF 2019 Working Notes. September, 2019. (PDF, Proceedings).

Abstract: We present an overview of the CLEF-2019 Lab ProtestNews on Extracting Protests from News in the context of generalizable natural language processing. The lab consists of document, sentence, and token level information classification and extraction tasks that were referred to as task 1, task 2, and task 3 respectively in the scope of this lab. The tasks required the participants to identify protest relevant information from English local news at one or more aforementioned levels in a cross-context setting, which is cross-country in the scope of this lab. The training and development data were collected from India and test data was collected from India and China. The lab attracted 58 participating teams; 12 and 9 of these teams submitted results and working notes, respectively. We have observed that neural networks yield the best results and that performance drops significantly for the majority of the submissions in the cross-country setting, which is China.


August 01, 2019

KU_AI at MEDIQA 2019: Domain-specific Pre-training and Transfer Learning for Medical NLI

Cengiz, Cemil and Sert, Ulaş and Yuret, Deniz. In Proceedings of the 18th BioNLP Workshop and Shared Task. August, 2019. Florence, Italy. (PDF)

Abstract: In this paper, we describe our system and results submitted for the Natural Language Inference (NLI) track of the MEDIQA 2019 Shared Task. As the KU_AI team, we used BERT as our baseline model and pre-processed the MedNLI dataset to mitigate the negative impact of de-identification artifacts. Moreover, we investigated different pre-training and transfer learning approaches to improve the performance. We show that pre-training the language model on rich biomedical corpora has a significant effect in teaching the model domain-specific language. In addition, training the model on large NLI datasets such as MultiNLI and SNLI helps in learning task-specific reasoning. Finally, we ensembled our highest-performing models, achieved 84.7% accuracy on the unseen test dataset, and ranked 10th out of 17 teams in the official results.


July 28, 2019

Research Statement

My main research area is natural language processing and my current focus is on grounded natural language learning systems based on neural network models trained end-to-end. To accelerate my research, I develop and maintain Knet, the Koç University deep learning framework, which has become the tool of choice for hundreds of researchers and students across the globe (937 github stars as of July 2019).

General AI Research

In addition to natural language processing, other areas of artificial intelligence I have studied include genetic algorithms and optimization [1, 2, 3, 4], game search [5, 6], computational economics and finance [7, 8, 9, 10, 11, 12, 13], computational biology [14, 15, 16, 17, 18, 19], multimedia processing [20, 21], and machine learning algorithms and frameworks [22, 23, 24, 25, 26, 27].

Rule-based NLP

My natural language research has spanned the three eras of rule-based, statistical, and neural systems. I started with Boris Katz’s rule-based natural language question answering system START (the longest running NLP system on the Internet) and developed its OmniBase component, which allowed access to structured websites like IMDB and World Factbook using a uniform interface [28, 29, 30]. I later co-founded a company, Inquira Inc., which commercialized question answering technology for customer self-service applications of large companies like Apple and eBay [31, 32].

Statistical NLP

Natural languages are suffused with ambiguities and exceptions which makes development of robust rule-based systems difficult. Statistical models gradually replaced rule-based systems in the 1990s and 2000s as a more robust alternative. During this period I developed statistical models for supervised and unsupervised dependency parsing [33, 34], word sense disambiguation and induction [35, 36, 37, 38, 39], child word category acquisition [40, 41, 42, 43], morphological disambiguation [44, 45, 46], semantic role labeling [47], statistical language modeling [48, 49], and machine translation [50, 51, 52, 53].

A major portion of this work focused on unsupervised models because the large amounts of labeled data required for supervised models are expensive to collect, difficult to get agreement on, and not required by infants learning language. Nevertheless, supervised models play an important role in today’s NLP applications, so to help create labeled datasets and perform evaluations that push the state-of-the-art forward, I co-organized the CoNLL-2007 Shared Task on Dependency Parsing [54], the SemEval-2007 Shared Task on Classification of Semantic Relations between Nominals [55, 56], the SemEval-2010 Shared Task on Parser Evaluation Using Textual Entailments [57, 58], and the SemEval-2012 and SemEval-2013 Semantic Evaluation Exercises [59, 60].

NLP and Neural Networks

In the 2010s, neural network based natural language processing systems started catching up in performance with their statistical counterparts. More importantly, layers of (morphological, syntactic, semantic) representations designed by linguists and used to train statistical models have been gradually replaced with features automatically learned by deep models from data. (A similar transition took place in computer vision, where hand-designed HOG/SIFT type features have been replaced with convolutional layers trained from data.) Feature engineering no longer plays the central role it did with statistical models: discrete features are replaced by continuous embedding vectors, and hand-designed feature combinations or kernels are replaced by adaptive basis functions automatically learned by neural networks. For the first time, it became possible to train end-to-end models, e.g. neural machine translation or image captioning systems that are trained on nothing but example input-output pairs.

During this period, I developed a novel continuous representation of word context based on the distribution of possible substitutes for a word rather than its neighbors. My students and I showed that this “paradigmatic” word context representation generalized better and improved the state-of-the-art on problems such as unsupervised part of speech induction [61, 62, 63], unsupervised word sense induction [64] and semantic word similarity [65]. Using neural models with little feature engineering we also developed a named entity recognizer that achieves state-of-the-art results in 7 languages [66] and a dependency parser that parsed 81 treebanks in 49 languages and was ranked 7th out of the 33 systems participating in the CoNLL-2017 UD Shared Task [67]. Neural models generally require more data compared to statistical models which poses a problem in low-resource settings. We showed one way to mitigate this problem using transfer learning: a low-resource Turkish-English machine translation model performs 50% better when initialized with weights from a high-resource French-English model compared to random initialization [68]. Another disadvantage of neural models is their lack of interpretability, which we tried to address in [69] where we discovered hidden units that count various features in a sequence-to-sequence RNN model.

I am most excited about the potential of deep neural network models for grounded language learning, i.e. learning the meanings of words, phrases, and sentences by observing natural interactions. In a preliminary study, we showed that a neural model can learn to follow instructions for arranging blocks on a table-top by observing humans giving and following instructions [70]. I am currently running two funded projects to explore this topic further [71, 72] and our ongoing studies are promising [73]. I suspect robust natural language understanding systems of the future will be end-to-end trained rather than hand engineered.

Sample Papers

The following papers are a representative sample of my work:

  • In “The Noisy Channel Model for Unsupervised Word Sense Disambiguation” [39], we use plain text and WordNet sense frequencies to build a generative probabilistic model for WSD without sense-tagged data.
  • In “Learning Syntactic Categories Using Paradigmatic Representations of Word Context” [62] we show how to construct a context vector based on the substitute distribution of a word and get state-of-the-art results on part-of-speech induction.
  • In “Transfer Learning for Low-Resource Neural Machine Translation” [68], we show how a neural machine translation model for a low-resource language pair can be significantly improved by borrowing parameters from a model for a high-resource language pair.
  • In “Natural Language Communication with Robots” [70], we propose a grounded language learning task of arranging blocks on a table-top in response to natural language instructions and train a baseline model end-to-end using data collected from Amazon Turk.
  • In “Knet: beginning deep learning with 100 lines of Julia” [24], I show how a high level programming language can be used as a deep learning framework when supported with automatic differentiation and GPU kernels.

References (My bibtex page has PDFs for most)

[1] Deniz Yuret and Michael de la Maza. Dynamic hill climbing: Overcoming the limitations of optimization techniques. In The Second Turkish Symposium on Artificial Intelligence and Neural Networks, 1993.

[2] Michael de la Maza and Deniz Yuret. Dynamic hill climbing. AI Expert, 1994.

[3] Deniz Yuret. From genetic algorithms to efficient optimization. Technical Report 1569, MIT AI Laboratory, 1994.

[4] Michael de la Maza and Deniz Yuret. Seeing clearly: Medical imaging now and tomorrow. In Clifford A. Pickover, editor, Future Health: Computers and Medicine in the 21st Century. St. Martin’s Press, 1995.

[5] Deniz Yuret. The principle of pressure in chess. In The Third Turkish Symposium on Artificial Intelligence and Neural Networks (TAINN ’94), 1994.

[6] David Allen McAllester and Deniz Yuret. Alpha-beta-conspiracy search. ICGA Journal, 25(1):16–35, 2002.

[7] Michael de la Maza and Deniz Yuret. A futures market simulation with non-rational participants. In Rodney Allen Brooks and Pattie Maes, editors, Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems, 1994.

[8] Deniz Yuret and Michael de la Maza. A genetic algorithm system for predicting the OEX. Technical Analysis of Stocks and Commodities, 1994.

[9] Michael de la Maza and Deniz Yuret. Experimenting with a market simulation. The Magazine of Artificial Intelligence in Finance, 1(3), 1994.

[10] Michael de la Maza and Deniz Yuret. A model of stock market participants. In Jörg Biethahn and Volker Nissen, editors, Evolutionary Algorithms in Management Applications. Springer, 1995.

[11] Michael de la Maza and Deniz Yuret. Neural network applications: A critique. The Magazine of Artificial Intelligence in Finance, 2(1), 1995.

[12] Michael de la Maza, Ayla Oğuş, and Deniz Yuret. How do firms transition between monopoly and competitive behavior? An agent-based economic model. In Proceedings of the Sixth International Conference on Artificial Life, 1998.

[13] Ayla Oğuş, Michael de la Maza, and Deniz Yuret. Modeling the economics of internet companies. In Computing in Economics and Finance, Proceedings of the Fifth International Conference of the Society for Computational Economics, 1999.

[14] Özlem Keskin, Deniz Yuret, Attila Gürsoy, Metin Türkay, and Burak Erman. Relationships between amino acid sequence and backbone torsion angle preferences. Proteins: Structure, Function, and Bioinformatics, 55(4):992–998, June 2004.

[15] Ersin Yurtsever, Deniz Yuret, and Burak Erman. Quantum mechanical calculations of tryptophan and comparison with conformations in native proteins. J. Phys. Chem. A, 110(51):13933–13938, December 2006.

[16] Alkan Kabakçıoğlu, Deniz Yuret, Mert Gür, and Burak Erman. Anharmonicity, mode-coupling and entropy in a fluctuating native protein. Physical Biology, 7:046005, October 2010.

[17] Onur Varol, Deniz Yuret, Burak Erman, and Alkan Kabakçıoğlu. Mode coupling points to functionally important residues in myosin II. Proteins: Structure, Function, and Bioinformatics, 82(9):1777–1786, September 2014.

[18] Alkan Kabakcioglu, Onur Varol, Deniz Yuret, and Burak Erman. Functionally important residues from mode coupling during short-time protein dynamics. In APS Meeting Abstracts, volume 1, page 48009, 2015.

[19] Onur Varol, Deniz Yuret, Burak Erman, and Alkan Kabakcioglu. Functionally important residues from mode coupling during short-time protein dynamics. Biophysical Journal, 108(2):377a, 2015.

[20] Barış Bozkurt, Ozan Baysal, and Deniz Yuret. A dataset and baseline system for singing voice assessment. In The 13th International Symposium on Computer Music Multidisciplinary Research (CMMR), September 2017.

[21] Saman Zia, Yücel Yemez, and Deniz Yuret. RGB-D object recognition using deep convolutional neural networks. In The IEEE International Conference on Computer Vision (ICCV), pages 896–903, October 2017.

[22] Deniz Yuret and Michael de la Maza. The greedy prepend algorithm for decision list induction. In A. Levi et al., editors, ISCIS 2006, LNCS 4263, pages 37–46, Berlin Heidelberg, November 2006. Springer-Verlag.

[23] Ergun Biçici and Deniz Yuret. Locally scaled density based clustering. In B. Beliczynski et al., editors, ICANNGA 2007, Part I, LNCS 4431, pages 739–748, Berlin Heidelberg, April 2007. Springer-Verlag.

[24] Deniz Yuret. Knet: beginning deep learning with 100 lines of Julia. In Machine Learning Systems Workshop at NIPS 2016, December 2016.

[25] Enis Berk Çoban, Deniz Yuret, and Didem Unat. Multidimensional broadcast operation on the GPU. In 5. Ulusal Yüksek Başarımlı Hesaplama Konferansı, İstanbul, September 2017.

[26] Doğa Dikbayır, Enis Berk Çoban, İlker Kesen, Deniz Yuret, and Didem Unat. Fast multidimensional reduction and broadcast operations on GPU for machine learning. Concurrency and Computation: Practice and Experience, May 2018.

[27] Mike Innes, Deniz Yuret, et al. On machine learning and programming languages. In SysML Conference, Stanford, CA, Feb 2018.

[28] Boris Katz, Deniz Yuret, et al. Blitz: a preprocessor for detecting context-independent linguistic structures. In Proceedings of the 5th Pacific Rim International Conference on Artificial Intelligence (PRICAI ’98), 1998.

[29] Boris Katz, Deniz Yuret, et al. Integrating web resources and lexicons into a natural language query system. In Proceedings of the 6th IEEE International Conference on Multimedia Computing and Systems (IEEE ICMCS’99), 1999.

[30] Boris Katz, Sue Felshin, Deniz Yuret, et al. Omnibase: Uniform access to heterogeneous data for question answering. In NLDB 2002, LNCS 2553, pages 230–234. Springer-Verlag, 2002.

[31] Deniz Yuret. Method of utilizing implicit references to answer a query. US Patent Number 6957213, Oct 2005.

[32] Edwin Riley Cooper, Gann Bierner, Laurel Kathleen Graham, Deniz Yuret, James Charles Williams, and Filippo Beghelli. Ontology for use with a system, method, and computer readable medium for retrieving information and response to a query. US Patent Number 8612208, 9747390, Dec 2013.

[33] Deniz Yuret. Discovery of linguistic relations using lexical attraction. PhD thesis, MIT, 1998.

[34] Deniz Yuret. Dependency parsing as a classification problem. In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X), June 2006.

[35] Özlem Uzuner, Boris Katz, and Deniz Yuret. Word sense disambiguation for information retrieval. In Proceedings of the 1999 16th National Conference on Artificial Intelligence (AAAI-99), 1999.

[36] Deniz Yuret. Some experiments with a Naive Bayes WSD system. In Rada Mihalcea and Phil Edmonds, editors, Senseval-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, pages 265–268, Barcelona, Spain, July 2004. Association for Computational Linguistics.

[37] Ergun Biçici and Deniz Yuret. Clustering word pairs to answer analogy questions. In Proceedings of the Fifteenth Turkish Symposium on Artificial Intelligence and Neural Networks (TAINN 2006), June 2006.

[38] Deniz Yuret. KU: Word sense disambiguation by substitution. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pages 207–214, Prague, Czech Republic, June 2007. Association for Computational Linguistics.

[39] Deniz Yuret and Mehmet Ali Yatbaz. The noisy channel model for unsupervised word sense disambiguation. Computational Linguistics, 36(1):111–127, March 2010.

[40] Deniz Yuret, A. Engin Ural, F. Nihan Ketrez, Dilara Kocbas, and Aylin C. Kuntay. Morphological cues vs. number of nominals in learning verb types from child-directed speech. In Boston University Conference on Language Development (BUCLD33), October 2008.

[41] A. Engin Ural, Deniz Yuret, Nihan Ketrez, Dilara Kocbas, and Aylin Kuntay. Morphological cues vs. number of nominals in learning verb types in Turkish: Syntactic bootstrapping mechanism revisited. Language and Cognitive Processes, 24(10):1393–1405, December 2009.

[42] Mehmet Ali Yatbaz, Volkan Cirik, Aylin Küntay, and Deniz Yuret. Paradigmatic representations outperform syntagmatic representations in distributional learning of grammatical categories. In BUCLD, November 2014.

[43] Mehmet Ali Yatbaz, Volkan Cirik, Aylin Küntay, and Deniz Yuret. Learning grammatical categories using paradigmatic representation: Substitute words for language acquisition. In COLING, December 2016.

[44] Deniz Yuret and Ferhan Türe. Learning morphological disambiguation rules for Turkish. In HLT-NAACL 06, June 2006.

[45] Mehmet Ali Yatbaz and Deniz Yuret. Unsupervised morphological disambiguation using statistical language models. In NIPS 2009 Workshop on Grammar Induction, Representation of Language and Language Learning, Vancouver, Canada, December 2009.

[46] Deniz Yuret and Ergun Biçici. Modeling morphologically rich languages using split words and unstructured dependencies. In ACL-IJCNLP, Singapore, August 2009.

[47] Deniz Yuret, Mehmet Ali Yatbaz, and Ahmet Engin Ural. Discriminative vs. generative approaches in semantic role labeling. In Conference on Computational Natural Language Learning (CoNLL), Manchester, UK, Aug 2008.

[48] Deniz Yuret. Smoothing a tera-word language model. In Proceedings of ACL-08: HLT, Short Papers, pages 141–144, Columbus, Ohio, June 2008. Association for Computational Linguistics.

[49] Deniz Yuret. Fastsubs: An efficient and exact procedure for finding the most likely lexical substitutes based on an n-gram language model. Signal Processing Letters, IEEE, 19(11):725–728, November 2012.

[50] Ergun Biçici and Deniz Yuret. L1 regularized regression for reranking and system combination in machine translation. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, pages 282–289. Association for Computational Linguistics, July 2010.

[51] Ergun Biçici and Deniz Yuret. Instance selection for machine translation using feature decay algorithms. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 272–283, Edinburgh, Scotland, July 2011. Association for Computational Linguistics.

[52] Ergun Biçici and Deniz Yuret. RegMT system for machine translation, system combination, and evaluation. In Proceedings of the Sixth Workshop on Statistical Machine Translation, pages 323–329, Edinburgh, Scotland, July 2011. Association for Computational Linguistics.

[53] Ergun Biçici and Deniz Yuret. Optimizing instance selection for statistical machine translation with feature decay algorithms. IEEE Transactions on Audio, Speech and Language Processing, 23(2):339–350, February 2015.

[54] Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret, editors. The CoNLL 2007 Shared Task on Dependency Parsing, Prague, Czech Republic, June 2007.

[55] Roxana Girju, Preslav Nakov, Vivi Nastase, Stan Szpakowicz, Peter Turney, and Deniz Yuret. Semeval-2007 task 04: Classification of semantic relations between nominals. In SemEval-2007: 4th International Workshop on Semantic Evaluations, June 2007.

[56] Roxana Girju, Preslav Nakov, Vivi Nastase, Stan Szpakowicz, Peter Turney, and Deniz Yuret. Classification of semantic relations between nominals. Language Resources and Evaluation, 43(2):105–121, June 2009.

[57] D. Yuret, A. Han, and Z. Turgut. Semeval-2010 task 12: Parser evaluation using textual entailments. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 51–56. Association for Computational Linguistics, July 2010.

[58] Deniz Yuret, Laura Rimell, and Aydin Han. Parser evaluation using textual entailments. Language Resources and Evaluation, 47(3):639–659, September 2012.

[59] Deniz Yuret and Suresh Manandhar, editors. Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), 2012.

[60] Deniz Yuret and Suresh Manandhar, editors. Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), 2013.

[61] Mehmet Ali Yatbaz and Deniz Yuret. Unsupervised part of speech tagging using unambiguous substitutes from a statistical language model. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pages 1391–1398. Association for Computational Linguistics, August 2010.

[62] Mehmet Ali Yatbaz, Enis Sert, and Deniz Yuret. Learning syntactic categories using paradigmatic representations of word context. In Proceedings of the 2012 Conference on Empirical Methods in Natural Language Processing (EMNLP-CONLL 2012), Jeju, Korea, July 2012. Association for Computational Linguistics.

[63] Deniz Yuret, Mehmet Ali Yatbaz, and Enis Sert. Unsupervised instance-based part of speech induction using probable substitutes. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 2303–2313, Dublin, Ireland, August 2014. Dublin City University and Association for Computational Linguistics.

[64] Osman Başkaya, Enis Sert, Volkan Cirik, and Deniz Yuret. AI-KU: Using substitute vectors and co-occurrence modeling for word sense induction and disambiguation. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), pages 300–306, Atlanta, Georgia, USA, June 2013. Association for Computational Linguistics.

[65] Oren Melamud, Ido Dagan, Jacob Goldberger, Idan Szpektor, and Deniz Yuret. Probabilistic modeling of joint-context in distributional similarity. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning, pages 181–190, Ann Arbor, Michigan, June 2014. Association for Computational Linguistics.

[66] Onur Kuru, Ozan Arkan Can, and Deniz Yuret. CharNER: Character-level named entity recognition. In COLING, December 2016.

[67] Ömer Kırnap, Berkay Furkan Önder, and Deniz Yuret. Parsing with context embeddings. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 80–87, Vancouver, Canada, August 2017. Association for Computational Linguistics.

[68] Barret Zoph, Deniz Yuret, Jon May, and Kevin Knight. Transfer learning for low-resource neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1568–1575, Austin, Texas, November 2016. Association for Computational Linguistics.

[69] Xing Shi, Kevin Knight, and Deniz Yuret. Why neural translations are the right length. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2278–2282, Austin, Texas, November 2016. Association for Computational Linguistics.

[70] Yonatan Bisk, Deniz Yuret, and Daniel Marcu. Natural language communication with robots. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 751–761, San Diego, California, June 2016. Association for Computational Linguistics.

[71] Yücel Yemez and Deniz Yuret. Dilbilimsel ve görsel ipuçlarını birlikte kullanarak gezinim dilinin öğrenilmesi. TÜBİTAK 1001 Project, 2015–2018.

[72] Luc De Raedt, Deniz Yuret, and Alessandro Saffiotti. Relational symbol grounding through affordance learning (ReGROUND). CHIST-ERA Project on Human Language Understanding: Grounding Language Learning, 2015–2018.

[73] Laura Antanas, Ozan Arkan Can, Jesse Davis, Luc De Raedt, Amy Loutfy, Andreas Persson, Alessandro Saffiotti, Emre Ünal, Deniz Yuret, and Pedro Zuidberg dos Martires. Relational symbol grounding through affordance learning: An overview of the ReGround project. In Grounding Language Understanding (GLU 2017) ISCA Satellite Workshop of Interspeech 2017. Stockholm University, August 2017.


Ekin Akyürek, B.S. 2019

Current Position: PhD student at MIT, Boston (personal website)
Publications: bibtex.php
Code: Morse: Morphological analysis using a sequence decoder. KnetLayers: useful layers for Knet. MAC-Network: A Julia implementation of "Compositional Attention Networks for Machine Reasoning".


June 28, 2019

Introduction to Deep Learning with Julia

Introduction to Deep Learning with Julia. (c) Deniz Yuret. June 28, 2019.
Kuzeybatıda Yapay Öğrenme Yaz Okulu (Machine Learning Summer School in the Northwest)
To follow the lessons on your own computer, go to the site. After pressing the Launch button, you can run the programs, which are in Jupyter Notebook format, from the tutorials/more-advanced-materials/ML-demos/knet-tutorial directory.

0. Playlist
1. Introduction to machine learning
2. Why Julia?
3. Let's learn Julia
4. MNIST handwriting recognition
5. Classical algorithms
6. Linear models
7. Multilayer models
8. Convolutional neural networks
9. Recurrent neural networks
10. IMDB sentiment analysis
11. Character-based language models
12. Question answering

June 16, 2019

My conversation on artificial intelligence in the book "Tasarım Ne Bekler"

Deniz Yuret
Prepared by: Meriç Tuncez
Tasarım Ne Bekler, © 2019 KUAR Yayınları

While researching our topic, I came across a news story about Google labeling a photo of a Black couple as gorillas in 2015. Similarly, there is a report that Google, in its job recommendations, suggested six times more high-paying jobs to men than to women. Starting from this, is it possible for artificial intelligence to get away from algorithmic bias, that is, from the prejudices of the person who produced it? Or how could that become possible?

First, let me briefly explain why the program has these biases. None of the technologies you mention are developed in the old-fashioned way, what we might call "Software 1.0", where someone sits down and programs things into the computer.

If I had been asked this question 20 years ago, I would have said, "The programmer who wrote this is racist or sexist, so fire him." But these new technologies are no longer developed that way. Instead, they are developed from statistics gathered by looking at examples.

That is, for finding jobs or recognizing things in images, you prepare a large amount of labeled data. You give this labeled data to the computer. The computer learns certain things from millions of examples and then starts answering your questions.

Now, if there is a bias in the data you provide, if there are sexist or racist elements in it, it is quite normal for the program to absorb that bias into these algorithms. This is not the fault of the algorithm but of the data we provided. So if we really want to be sensitive about bias, we need to prepare the data accordingly.

That is, we need to balance the data before using it as training data on the computer. A similar incident happened at Microsoft last year. They built a chatbot and released it on Twitter. They had to shut it down 24 hours later because it had picked up a lot of bad, racist, and sexist language from people and had started imitating it.

So we can think of these learning algorithms as innocent babies. Whatever we teach them, they learn to repeat in the same way. So this may be the fault of the teacher.

For example, when we consider a problem like AlphaGo (a program developed by Google DeepMind that plays the game of Go), the point at which we win is clear. That is, how we can win in that game is well defined and we have a score. But in a design problem it does not work the same way; we see that there are many different paths to many different outcomes. For example, when designing for climate change, how can we use "artificial intelligence"? What will our output be here? Is our output just "lowering the temperature of the climate"? Or is it something else? In such a complicated problem, in situations where we cannot know the outcomes or what is right, how can we use artificial intelligence or "machine learning" in our designs?

I think this is a question that remains unresolved even now. When training artificial intelligence models, in addition to the data we provide, we also need to specify something called an "objective function" or "error function".

That is, in general we give these learning programs training data of the form "When I give you inputs like these, I want outputs like these." But in addition, the designer has to decide something like "If you produce not the output I want but one slightly different from it, I will measure your error in this way; this will be your objective function." So we are the ones who have to decide what to optimize. As I said, this is not an easy problem. It becomes even harder for complex issues such as climate change.

Work is being done today that could shed light on Elon Musk's "Robots will conquer the world" scenario, and this is the issue researchers worry about most. That is, if we are not very careful when setting a goal for artificial intelligence, these systems may go in directions we did not expect at all at the time.

Say we give it the goal of lowering the world's temperature; the solutions it gives us could cause a new ice age. On the other hand, this is not a new problem, nor is it a problem specific to artificial intelligence. I recently read an article comparing this objective function problem of artificial intelligence with financial markets or political systems. People have long had difficulty designing complex systems.

So think of complex social systems designed with perfectly good intentions, such as the European Union or the stock exchange. In these systems too, even though the designers had no bad intentions, the system can, through its own dynamics, lead to outcomes we did not expect at all and go in directions that harm us.

So I think this is a problem we need to work on. By the way, the technical name used for this is "alignment": "How can the values of the system or program you develop be made parallel to your own values?" This is an open problem that is still being worked on.


June 06, 2019

Team Howard Beale at SemEval-2019 Task 4: Hyperpartisan News Detection with BERT

Osman Mutlu, Ozan Arkan Can and Erenay Dayanık. 2019. In International Workshop on Semantic Evaluation (SemEval-2019 at NAACL-HLT-2019). (paper, proceedings)

Abstract: This paper describes our system for SemEval-2019 Task 4: Hyperpartisan News Detection (Kiesel et al., 2019). We use pretrained BERT (Devlin et al., 2018) architecture and investigate the effect of different fine tuning regimes on the final classification task. We show that additional pretraining on news domain improves the performance on the Hyperpartisan News Detection task. Our system ranked 8th out of 42 teams with 78.3% accuracy on the held-out test dataset.


Learning from Implicit Information in Natural Language Instructions for Robotic Manipulations

Ozan Arkan Can, Pedro Zuidberg Dos Martires, Andreas Persson, Julian Gaal, Amy Loutfi, Luc De Raedt, Deniz Yuret and Alessandro Saffiotti. 2019. In Proceedings of the Combined Workshop on Spatial Language Understanding (SpLU) & Grounded Communication for Robotics (RoboNLP) at NAACL-HLT-2019. (abstract, paper, poster, proceedings)

Abstract: Human-robot interaction often occurs in the form of instructions given from a human to a robot. For a robot to successfully follow instructions, a common representation of the world and objects in it should be shared between humans and the robot so that the instructions can be grounded. Achieving this representation can be done via learning, where both the world representation and the language grounding are learned simultaneously. However, in robotics this can be a difficult task due to the cost and scarcity of data. In this paper, we tackle the problem by separately learning the world representation of the robot and the language grounding. While this approach can address the challenges in getting sufficient data, it may give rise to inconsistencies between both learned components. Therefore, we further propose Bayesian learning to resolve such inconsistencies between the natural language grounding and a robot’s world representation by exploiting spatio-relational information that is implicitly present in instructions given by a human. Moreover, we demonstrate the feasibility of our approach on a scenario involving a robotic arm in the physical world.


April 07, 2019

A Task Set Proposal for Automatic Event Information Collection across Multiple Countries

Ali Hürriyetoğlu, Erdem Yörük, Deniz Yuret, Çağrı Yoltar, Burak Güler, Fırat Duruşan and Osman Mutlu. 2019. In ECIR (CLEF Organizers Lab Track), Germany. (paper, proceedings)

Abstract: We propose a coherent set of tasks for protest information collection in the context of generalizable natural language processing. The tasks are news article classification, event sentence detection, and event extraction. Having tools for collecting event information from data produced in multiple countries enables comparative sociology and politics studies. We have annotated news articles in English from a source and a target country in order to be able to measure the performance of the tools developed using data from one country on data from a different country. Our preliminary experiments have shown that the performance of the tools developed using English texts from India drops to a level that are not usable when they are applied on English texts from China. We think our setting addresses the challenge of building generalizable NLP tools that perform well independent of the source of the text and will accelerate progress in line of developing generalizable NLP systems.


January 22, 2019

Knet v1.2.0: iterators, iterators, iterators...

The new Knet release is all about iterators: iterators for minibatching, iterators for training, iterators for monitoring, convergence etc. Why am I so excited about iterators all of a sudden? Allow me to explain:

Knet has used iterators for data generation since 2015. That was about it until recently when I was looking for a way to improve the training interface. See, at the core of every deep learning project there is a training loop that looks like this:

function train(model,data)
    for (x,y) in data
        # improve model parameters so model(x) approaches y
    end
end
And these things can run for hours or days. You want the user to have full control of this loop: how many iterations to go, how to detect convergence and quit, how to monitor progress, how to take model snapshots or measure dev accuracy every n iterations etc.

My original (non)solution was to write a new `train` function for every experiment. Why restrict the user with a bad interface when they can write their own 5 line loop? (of course then why write any package at all but that's another discussion).

My next (pseudo)solution was to provide a `train` function with lots of keyword arguments. I soon gave up on that idea when it became clear that I was on my way to implementing a Turing complete programming language using keyword arguments.

Then I thought I had a brilliant flash of insight based on callback functions. See, if `train` just accepts a callback function that gets called inside the for loop, the user can implement any behavior:
function train(model,data,callback)
    for (x,y) in data
        callback() || break
        # improve model parameters so model(x) approaches y
    end
end
You want to display a progress bar, do something every n iterations, or quit after N iterations? Just implement some callback function with state and you are all set! Brilliant? Everybody hated it. Including me. It turns out callback functions are awkward to write and do not lead to very readable code.
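To make the "callback function with state" idea concrete, here is a minimal sketch of what quitting after N iterations might look like in that style. The helper `maxiter` and the placeholder update (which only records the examples seen) are made up for illustration and are not part of Knet.

```julia
# Hypothetical helper: returns a callback that answers true for the first
# `n` calls and false afterwards.
function maxiter(n)
    count = 0                       # state captured by the closure
    return function ()
        count += 1
        return count <= n           # false tells the loop to stop
    end
end

# Training loop in the callback style; the "update" is a stand-in that
# just records which examples were visited.
function train(model, data, callback)
    for (x, y) in data
        callback() || break
        push!(model, (x, y))        # placeholder for a real parameter update
    end
    return model
end

data = [(i, 2i) for i in 1:10]
seen = train(Tuple{Int,Int}[], data, maxiter(3))
println(length(seen))               # number of examples processed before stopping
```

Even in this tiny case the stopping rule lives in a separate closure with hidden state, which hints at why the style gets awkward as the requirements grow.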

Then finally I rediscovered iterators and iterators that wrap other iterators (inspired by Tqdm.jl). I knew iterators can be these lazy collections that produce their next element only when asked. (Here is a summary with doc links to refresh your memory). See, once you implement the training loop as an iterator you can pause, restart and terminate it whenever you want:
train(model,data) = ((update model and return loss) for (x,y) in data)
What I realized iterators also do is turn the for loop inside out! Make its guts visible so one has explicit control: You can monitor and display its progress, take snapshots or whatever all with very explicit and readable code. Here are some actual examples from Knet v1.2.0. (`sgd` is a train iterator, f is the model, d is the data):

* To display a progress bar use progress(sgd(f,d)).
* To run until convergence use converge(sgd(f,cycle(d))).
* To run multiple epochs use sgd(f,repeat(d,n)).
* To run a given number of iterations use sgd(f,take(cycle(d),n)).
* To do a task every n iterations use:
(task(x) for x in every(n, sgd(f,cycle(d)))).
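The compositions in the list above can be imitated in plain Julia without Knet. The following is only a sketch under stated assumptions: `Linear`, `step!` and this toy `train` are invented for illustration and stand in for a real model and for `sgd`; `every` is the one-liner from the notes at the end of this post; `cycle` and `take` come from `Base.Iterators`.

```julia
using Base.Iterators: cycle, take

# Toy "model" (hypothetical): a single weight w, fit to y = 2x.
mutable struct Linear
    w::Float64
end

# One gradient step on squared error; returns the loss before the update.
function step!(m::Linear, x, y; lr=0.1)
    err = m.w * x - y
    m.w -= lr * 2 * err * x
    return err^2
end

# Training as a lazy generator: nothing runs until someone iterates.
train(m, data) = (step!(m, x, y) for (x, y) in data)

# Keep every n-th element of an iterator.
every(n, itr) = (x for (i, x) in enumerate(itr) if i % n == 0)

m = Linear(0.0)
data = [(1.0, 2.0), (0.5, 1.0), (-1.0, -2.0)]

# Run 100 iterations over the cycled data, keeping the loss every 10 steps.
losses = collect(every(10, train(m, take(cycle(data), 100))))
println(length(losses), " losses recorded, w = ", m.w)
```

The model is only updated when `collect` pulls values through the chain, so wrapping the training generator in `take` or `every` changes how long training runs or what is reported without touching the training code itself.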

Functions like `progress`, `converge`, `sgd` etc. each take and return iterators, so they can be composed like crazy. Here is an example from the Knet tutorial that (1) trains a model on dtrn, (2) measures loss on dtst every 100 iterations, (3) quits when dtst performance converges, and (4) displays a progress bar:
a = adam(model,cycle(dtrn))
b = (model(dtst) for _ in every(100,a))
c = converge(b, alpha=0.1)
progress!(c, alpha=1)
The code reads like the English description! Imagine trying to implement this using keyword arguments or callback functions... and that is why I am excited about iterators.
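For readers wondering how a wrapper that stops early can itself be an iterator, here is a rough sketch using Julia's iteration interface. `UntilFlat` and its stopping rule (two consecutive values closer than `tol`) are assumptions made for illustration; this is not Knet's `converge`, whose criterion is not described here.

```julia
# Hypothetical wrapper: passes values of `itr` through and stops once two
# consecutive values differ by less than `tol`.
struct UntilFlat{I}
    itr::I
    tol::Float64
end

# The length cannot be known in advance, so say so.
Base.IteratorSize(::Type{<:UntilFlat}) = Base.SizeUnknown()

function Base.iterate(u::UntilFlat, state=nothing)
    # state is nothing on the first call, else (inner_state, previous_value)
    next = state === nothing ? iterate(u.itr) : iterate(u.itr, state[1])
    next === nothing && return nothing        # inner iterator exhausted
    value, inner = next
    if state !== nothing && abs(value - state[2]) < u.tol
        return nothing                        # change too small: stop
    end
    return value, (inner, value)
end

# A decaying sequence standing in for periodic dev-set loss measurements.
losses = (1 / 2^k for k in 0:50)
kept = collect(UntilFlat(losses, 0.01))
println(length(kept), " values before the sequence flattened out")
```

Because the wrapper only asks the inner iterator for the next value when it is asked itself, stopping the outer iteration also stops whatever training is producing the values.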

* the more nitpicky reader will probably point out that I should have called these things generators or coroutines or streams or something rather than iterators, but you get the idea.
* every(n,itr) = (x for (i,x) in enumerate(itr) if i%n == 0) should be a Julia primitive! (Thank you @CarloLucibello for pointing out that `IterTools.takenth` does the same thing.)
* @lostella has a wonderful post on iterators.
* Here are the relevant links in Julia docs: Interfaces, Collections, Iteration Utilities and Generator expressions.
* Here is a link to the discussion on Julia discourse.
