July 02, 2018

July 01, 2018

Morphological Disambiguation for Turkish

Dilek Zeynep Hakkani-Tür, Murat Saraçlar, Gökhan Tür, Kemal Oflazer and Deniz Yuret. 2018. In Turkish Natural Language Processing, Kemal Oflazer and Murat Saraçlar (Eds.), Ch.3, pp.53-68. Springer. (URL)

Abstract: Morphological disambiguation is the task of determining the contextually correct morphological parses of tokens in a sentence. A morphological disambiguator takes in a set of morphological parses for each token, generated by a morphological analyzer, and then selects one parse for each token, considering statistical and/or linguistic contextual information. This task can be seen as a generalization of the part-of-speech (POS) tagging problem for morphologically rich languages. The disambiguated morphological analysis is usually crucial for further processing steps such as dependency parsing. In this chapter, we review the morphological disambiguation problem for Turkish and discuss approaches for solving it as they have evolved from manually crafted constraint-based rule systems to systems employing machine learning.
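
As a rough illustration of the task described in the abstract, the following Python sketch takes a set of candidate parses per token and selects one parse for each using context. It is not the method of any system reviewed in the chapter: the candidate parses, the simplified tag strings, the `BIGRAM_SCORE` values, and the helper names `last_tag` and `disambiguate` are all assumptions made up for this example, and the exhaustive search is only practical for very short sentences.

```python
from itertools import product

# Hypothetical analyzer output: each token paired with its candidate parses.
# The parses and tag names are simplified toy data, not taken from the chapter.
CANDIDATES = [
    ("masalı", ["masal+Noun+Acc", "masal+Noun+P3sg", "masa+Noun^DB+Adj+With"]),
    ("okudu", ["oku+Verb+Past"]),
]

# Assumed toy scores for pairs of adjacent final tags (higher is better).
BIGRAM_SCORE = {
    ("Acc", "Past"): 2.0,
    ("P3sg", "Past"): 1.0,
    ("With", "Past"): 0.1,
}


def last_tag(parse):
    """Return the final tag of a parse string such as 'masal+Noun+Acc'."""
    return parse.rsplit("+", 1)[-1]


def disambiguate(candidates):
    """Pick one parse per token by maximizing the summed bigram score.

    Every combination of candidate parses is scored, so the cost grows
    exponentially with sentence length; this is a sketch, not a real decoder.
    """
    best_seq, best_score = None, float("-inf")
    for seq in product(*(parses for _, parses in candidates)):
        score = sum(
            BIGRAM_SCORE.get((last_tag(a), last_tag(b)), 0.0)
            for a, b in zip(seq, seq[1:])
        )
        if score > best_score:
            best_seq, best_score = seq, score
    return list(best_seq)


result = disambiguate(CANDIDATES)
print(result)
```

With these assumed scores, the accusative reading of the first token wins because it combines best with the following past-tense verb. A practical system would replace the hand-set table with learned statistics or rules and use dynamic programming instead of enumeration.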



June 04, 2018

Erenay Dayanık, M.S. 2018

Current position: Applied Scientist at Amazon, Cambridge, UK (LinkedIn, Homepage)
M.S. Thesis: Morphological Tagging and Lemmatization with Neural Components. Koç University, Department of Computer Engineering. June, 2018. (PDF, Presentation, Code, Data)

Abstract

I describe and evaluate MorphNet, a language-independent, end-to-end model that is designed to combine morphological analysis and disambiguation. Traditionally, analysis of morphologically complex languages has been performed in two stages: (i) a morphological analyzer based on finite-state transducers produces all possible morphological analyses of a word, and (ii) a statistical disambiguation model picks the correct analysis for each word based on its context. MorphNet uses a sequence-to-sequence recurrent neural network to combine analysis and disambiguation. The model consists of three LSTM encoders that create embeddings of various input features and a two-layer LSTM decoder that predicts the correct morphological analysis. When MorphNet is trained on text labeled with correct morphological analyses, the model achieves state-of-the-art or comparable results in twenty-six different languages.

