Deniz Yuret and Ergun Bicici. In the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2009).
Abstract: We experiment with splitting words into their stem and suffix components for modeling morphologically rich languages. We show that using a morphological analyzer and disambiguator results in a significant perplexity reduction in Turkish. We present flexible n-gram models, Flex-Grams, which assume that the n − 1 tokens that determine the probability of a given token can be chosen anywhere in the sentence rather than only from the preceding n − 1 positions. Our final model achieves a 27% perplexity reduction compared to the standard n-gram model.
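The following is a minimal Python sketch of the Flex-Gram idea. It is not the authors' implementation, and it rests on several assumptions. First, words are assumed to be already split into stem and "+suffix" tokens. The toy Turkish sentences are hand-split and hypothetical, standing in for the output of a morphological analyzer and disambiguator. Second, the model is restricted to n = 2, so each token is conditioned on a single other token. Third, that token's relative position (the offset) is picked by held-out perplexity. The offset-selection rule, the add-one smoothing, and every function name here are illustrative choices, not details taken from the paper.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"


def context(sent, i, offset):
    """Token at position i + offset; boundary symbols pad both ends of the sentence."""
    j = i + offset
    if j < 0:
        return BOS
    if j >= len(sent):
        return EOS
    return sent[j]


def train(corpus, offset):
    """Count (context, token) pairs for one fixed relative offset (hypothetical helper)."""
    pair_counts, ctx_counts, vocab = Counter(), Counter(), set()
    for sent in corpus:
        for i, tok in enumerate(sent):
            ctx = context(sent, i, offset)
            pair_counts[(ctx, tok)] += 1
            ctx_counts[ctx] += 1
            vocab.add(tok)
    return pair_counts, ctx_counts, vocab


def perplexity(model, corpus, offset):
    """Per-token perplexity of P(token | token at i + offset) with add-one smoothing."""
    pair_counts, ctx_counts, vocab = model
    v = len(vocab) + 1  # one extra slot reserves probability mass for unseen tokens
    log_sum, n = 0.0, 0
    for sent in corpus:
        for i, tok in enumerate(sent):
            ctx = context(sent, i, offset)
            p = (pair_counts[(ctx, tok)] + 1) / (ctx_counts[ctx] + v)
            log_sum += math.log2(p)
            n += 1
    return 2 ** (-log_sum / n)


def best_offset(train_corpus, dev_corpus, candidates):
    """Assumed selection rule: the offset with the lowest held-out perplexity wins."""
    scores = {k: perplexity(train(train_corpus, k), dev_corpus, k) for k in candidates}
    return min(scores, key=scores.get), scores


# Hypothetical, hand-split Turkish sentences: stems followed by "+suffix" tokens.
train_corpus = [
    ["ev", "+ler", "+de", "otur", "+du", "+k"],
    ["okul", "+a", "git", "+ti", "+m"],
    ["ev", "+e", "gel", "+di", "+m"],
    ["okul", "+da", "otur", "+du", "+k"],
]
dev_corpus = [["ev", "+de", "otur", "+du", "+m"]]

if __name__ == "__main__":
    # Offset -1 is the ordinary bigram. Other offsets condition on a non-adjacent
    # token, and positive offsets look at later tokens in the sentence.
    offset, scores = best_offset(train_corpus, dev_corpus, [-1, -2, -3, 1, 2])
    for k, ppl in sorted(scores.items()):
        print(f"offset {k:+d}: perplexity {ppl:.2f}")
    print("selected offset:", offset)
```

Offset -1 reproduces the standard bigram baseline. Comparing its score with the selected offset therefore mirrors, in miniature, the comparison the abstract describes. On toy data this small, the resulting numbers carry no meaning. Positive offsets condition on later tokens, so these scores are best read as per-token conditional perplexities rather than as normalized sentence probabilities.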