September 11, 2005

Başak Mutlum, M.S. 2005

Current position: Principal Program Manager, Microsoft, Chicago.
M.S. Thesis: Word Sense Disambiguation Based on Sense Similarity and Syntactic Context. Koç University Department of Computer Engineering, September 2005. (Download PDF).

Word Sense Disambiguation (WSD) is the task of determining the
meaning of an ambiguous word within a given context. It is an open
problem that has to be solved effectively in order to meet the needs
of other natural language processing tasks. Both supervised and
unsupervised algorithms have been tried throughout the history of WSD
research. So far, supervised systems have achieved the best
accuracies. However, these systems with the first-sense heuristic
have reached a natural limit. To improve WSD further, the
benefits of unsupervised systems should be examined.

In this thesis, an unsupervised algorithm based on sense similarity
and syntactic context is presented. The algorithm relies on the
intuition that two different words are likely to have similar
meanings if they occur in similar local contexts. Using a
principle-based, broad-coverage parser, a 100-million-word training
corpus is parsed, and local context features are extracted with a set
of rules. Similarity values are then computed between the ambiguous
word and the words that occur in local contexts similar to its own.
Polysemous words are disambiguated with a similarity maximization
algorithm. The performance of the algorithm
is tested on SENSEVAL-2 and SENSEVAL-3 English all-words task data
and an accuracy of 59% is obtained.
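
As a rough illustration of the idea in the abstract, here is a minimal Python sketch. It is not the thesis implementation: the dependency triples, the sense labels, and the SENSE_SIMILARITY table are all made-up toy data, and the function names selectors and disambiguate are hypothetical. It only shows the general shape of the approach: collect the words that fill the same syntactic slot as the ambiguous word, then pick the sense whose total similarity to those words is highest.

```python
# Toy (head, relation, dependent) triples standing in for parser output.
# Assumption: a real system would extract these from a parsed corpus.
TRIPLES = [
    ("drink", "obj", "water"),
    ("drink", "obj", "juice"),
    ("drink", "obj", "port"),
    ("sail_to", "obj", "harbor"),
    ("sail_to", "obj", "port"),
]

# Hypothetical similarity scores between a sense of "port" and another word.
# Assumption: a real system would derive these from a lexical resource.
SENSE_SIMILARITY = {
    ("port#wine", "water"): 0.6,
    ("port#wine", "juice"): 0.7,
    ("port#harbor", "water"): 0.2,
    ("port#harbor", "harbor"): 0.9,
}


def selectors(triples, head, relation, target):
    """Return the words seen in the same (head, relation) slot as target."""
    return sorted(
        {dep for h, r, dep in triples
         if h == head and r == relation and dep != target}
    )


def disambiguate(senses, context_words, similarity):
    """Pick the sense with the highest summed similarity to context_words."""
    scores = {
        sense: sum(similarity.get((sense, word), 0.0) for word in context_words)
        for sense in senses
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Disambiguate "port" as the object of "drink".
    words = selectors(TRIPLES, "drink", "obj", "port")
    best, scores = disambiguate(["port#wine", "port#harbor"], words,
                                SENSE_SIMILARITY)
    print(words, best)
```

With this toy data, the words sharing the object slot of "drink" are "juice" and "water", so the sketch selects the "port#wine" sense. Swapping the head to "sail_to" would select "port#harbor" instead.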

