September 11, 2006

Bengi Mizrahi, M.S. 2006

Current position: Software Engineering Manager, Juniper Networks. (email, linkedin).
M.S. Thesis: Paraphrase Extraction from Parallel News Corpora. Koç University Department of Computer Engineering. September, 2006. (Download PDF).



Abstract:
Different expressions of the same statement are said to be paraphrases of each other. An example is the phrases 'solved' and 'found a solution to' in 'Alice solved the problem' and 'Alice found a solution to the problem'. Paraphrase extraction is the task of finding and grouping such paraphrases in free text. Finding equivalent paraphrases and structures can benefit a number of NLP applications, such as Question Answering, Machine Translation, and Multi-text Summarization. For example, in Question Answering, alternative questions can be created using alternative paraphrases. We attack the problem in three steps: first grouping news articles that describe the same event, then collecting sentence pairs from these articles that are semantically close to each other, and finally extracting paraphrases from these sentence pairs to learn paraphrase structures. The precision of finding two equivalent documents was 0.56 on average when the matching criterion was strict and 0.70 when it was flexible. We tried 9 different evaluation techniques for sentence-level matching. Although the exact word match count approach had better precision than the n-gram precision count approaches, the paraphrase extraction phase shows that the latter catch higher-quality sentence pairs for paraphrase extraction. Our system can extract paraphrases with 0.66 precision when only equivalent document pairs are used as a test set.
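
The abstract contrasts an exact word match count with n-gram precision counts for sentence-level matching, but does not spell out the nine techniques. As a rough illustration only, the two families of scores might look like the sketch below. It assumes lowercased whitespace tokenization, and the function names `word_match_count` and `ngram_precision` are hypothetical, not taken from the thesis.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def word_match_count(a, b):
    """Count word tokens shared by two sentences (with multiplicity).

    Assumption: lowercased whitespace tokenization, no stemming.
    """
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    return sum((ca & cb).values())


def ngram_precision(candidate, reference, n=2):
    """Fraction of the candidate's n-grams that also occur in the reference."""
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0  # candidate too short to contain any n-gram
    return sum((cand & ref).values()) / total


s1 = "Alice solved the problem"
s2 = "Alice found a solution to the problem"
print(word_match_count(s1, s2))      # shared words: alice, the, problem
print(ngram_precision(s1, s2, n=2))  # only ('the', 'problem') is shared
```

On the example pair from the abstract, the word match count rewards the three shared words, while the bigram precision credits only the shared context 'the problem'. The spans that differ ('solved' versus 'found a solution to') are the candidate paraphrases.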

