The Regression Model of Machine Translation
Ergun Biçici. Ph.D. Dissertation. Koç University, Department of Computer Engineering. August, 2011. (PDF, Presentation)
Machine translation is the task of automatically finding the translation of a source sentence in the target language. Statistical machine translation (SMT) use parallel corpora or bilingual paired corpora that are known to be translations of each other to find a likely translation for a given source sentence based on the observed translations. The task of machine translation can be seen as an instance of estimating the functions that map strings to strings.
Regression based machine translation (RegMT) approach provides a learning framework for machine translation, separating learning models for training, training instance selection, feature representation, and decoding. We use the transductive learning framework for making the RegMT approach computationally more scalable and consider the model building step independently for each test sentence. We develop training instance selection algorithms that not only make RegMT computationally more scalable but also improve the performance of standard SMT systems. We develop better training instance selection techniques than previous work from given parallel training sentences for achieving more accurate RegMT models using less training instances.
We introduce L_1 regularized regression as a better model than L_2 regularized regression for statistical machine translation. Our results demonstrate that sparse regression models are better than L_2 regularized regression for statistical machine translation in predicting target features, estimating word alignments, creating phrase tables, and generating translation outputs. We develop good evaluation techniques for measuring the performance of the RegMT model and the quality of the translations. We use F_1 measure, which performs good when evaluating translations into English according to human judgments. F_1 allows us to evaluate the performance of the RegMT models using the target feature prediction vectors or the coefficients matrices learned or a given SMT model using its phrase table without performing the decoding step, which can be computationally expensive.
Decoding is dependent on the representation of the training set and the features used. We use graph decoding on the prediction vectors represented in n-gram or word sequence counts space found in the training set. We also decode using Moses after transforming the learned weight matrix representing the mappings between the source and target features to a phrase table that can be used by Moses during decoding. We demonstrate that sparse L_1 regularized regression performs better than L_2 regularized regression in the German-English translation task and in the Spanish-English translation task when using small sized training sets. Graph based decoding can provide an alternative to phrase-based decoding in translation domains having low vocabulary.