December 11, 2009

Unsupervised morphological disambiguation using statistical language models

Mehmet Ali Yatbaz and Deniz Yuret. NIPS 2009 Workshop on Grammar Induction, Representation of Language and Language Learning. December 2009. (PDF, Poster)
Abstract:
In this paper, we present a probabilistic model for unsupervised morphological disambiguation. Our model assigns morphological parses $T$ to contexts $C$ rather than to words $W$. The target word $w \in W$ determines the set of possible parses $T_w \subset T$ that can be used in $w$'s context $c_w \in C$. To disambiguate $w$, the model selects the parse $t \in T_w$ that maximizes $P(t|c_w)$. The probabilities $P(t|c_w)$ are estimated using a statistical language model and the vocabulary of the corpus. The system performs significantly better than an unsupervised baseline, and its performance is close to that of a supervised baseline.
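The selection step described in the abstract, picking the $t \in T_w$ that maximizes $P(t|c_w)$, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function name `disambiguate`, the parse labels, and the probability values are made up, and the sketch takes the context probabilities as given rather than estimating them with a language model as the paper does.

```python
from typing import Dict, Set


def disambiguate(candidate_parses: Set[str], context_scores: Dict[str, float]) -> str:
    """Return the parse in T_w with the highest P(t | c_w).

    candidate_parses: T_w, the parses the target word allows (hypothetical labels).
    context_scores:   P(t | c_w) for parses t, assumed to be estimated elsewhere.
    """
    if not candidate_parses:
        raise ValueError("the word has no candidate parses")
    # Parses absent from context_scores are treated as having probability 0.0.
    # Sorting first makes tie-breaking deterministic.
    return max(sorted(candidate_parses), key=lambda t: context_scores.get(t, 0.0))


# Hypothetical toy values: the context prefers a verb reading overall,
# but the target word only admits the two noun parses.
context_scores = {"Noun+Sg": 0.15, "Noun+Sg+Poss3": 0.25, "Verb+Imp": 0.60}
candidate_parses = {"Noun+Sg", "Noun+Sg+Poss3"}

best = disambiguate(candidate_parses, context_scores)
print(best)
```

In this toy case the highest-scoring parse for the context is outside $T_w$, so it is never considered; the word only restricts the candidate set, while the context does the scoring.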
