June 04, 2018

Erenay Dayanık, M.S. 2018

Current position: PhD student at University of Stuttgart, Germany (LinkedIn)
M.S. Thesis: Morphological Tagging and Lemmatization with Neural Components. Koç University, Department of Computer Engineering. June, 2018. (PDF, Presentation, Code, Data)
I describe and evaluate MorphNet, a language-independent, end-to-end model designed to combine morphological analysis and disambiguation. Traditionally, analysis of morphologically complex languages has been performed in two stages: (i) a morphological analyzer based on finite-state transducers produces all possible morphological analyses of a word; (ii) a statistical disambiguation model picks the correct analysis for each word based on its context. MorphNet uses a sequence-to-sequence recurrent neural network to combine analysis and disambiguation. The model consists of three LSTM encoders that create embeddings of various input features and a two-layer LSTM decoder that predicts the correct morphological analysis. When MorphNet is trained on text labeled with correct morphological analyses, it achieves state-of-the-art or comparable results in twenty-six different languages.
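
To make the traditional two-stage pipeline mentioned in the abstract concrete, here is a minimal Python sketch. It is not taken from the thesis or its code: the toy lexicon `TOY_ANALYSES`, the tag strings, and the functions `analyze`, `score`, and `disambiguate` are all hypothetical, and a hand-written rule stands in for the statistical disambiguation model.

```python
from typing import Dict, List

# Hypothetical toy lexicon standing in for a finite-state morphological
# analyzer. The tag format is invented for illustration only.
TOY_ANALYSES: Dict[str, List[str]] = {
    "saw": ["see+Verb+Past", "saw+Noun+Sg"],
    "the": ["the+Det"],
    "a": ["a+Det"],
    "i": ["I+Pron"],
}


def analyze(word: str) -> List[str]:
    """Stage (i): return every candidate analysis of a word, ignoring context."""
    return TOY_ANALYSES.get(word.lower(), [word + "+Unknown"])


def score(candidate: str, previous: str) -> int:
    """Toy context score; a rule-based stand-in for a statistical model."""
    if previous.endswith("+Det") and "+Noun" in candidate:
        return 1
    if previous.endswith("+Pron") and "+Verb" in candidate:
        return 1
    return 0


def disambiguate(sentence: List[str]) -> List[str]:
    """Stage (ii): pick one analysis per word using the previous choice as context."""
    chosen: List[str] = []
    previous = ""
    for word in sentence:
        candidates = analyze(word)
        # max() keeps the first candidate when scores tie.
        best = max(candidates, key=lambda c: score(c, previous))
        chosen.append(best)
        previous = best
    return chosen


if __name__ == "__main__":
    print(disambiguate(["I", "saw", "the", "saw"]))
```

In this sketch the ambiguous word "saw" receives two candidate analyses in stage (i), and stage (ii) chooses between them from the preceding word. An end-to-end model such as the one described in the abstract replaces both stages with a single network that predicts the analysis directly.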

