September 21, 2005

Volkan Kurt, M.S. 2005

Last position: IT director, Markafoni, Istanbul. (twitter).
M.S. Thesis: Protein Structure Prediction Using Decision Lists. Koç University Department of Computational Sciences and Engineering, September 2005. (Download PDF).

Proteins are the building blocks of life. The structure of these
building blocks plays a vital role in their function, and
consequently in the function of living organisms. Although
increasingly effective methods are being developed to determine
protein structure, it is still easier to determine the amino acid
sequence of a protein than its folded structure, and the gap between
the number of known structures and known sequences is widening at an
accelerating rate. Structure prediction algorithms may help close
this gap.

In this study, we investigated various aspects of structure
prediction (both secondary and tertiary structure). We developed an
algorithm (Greedy Decision List learner, or GDL) that learns a list
of pattern-based rules for protein structure prediction. The
resulting rule lists are short, human-readable, and open to
interpretation. The performance of our method in secondary structure
prediction was verified using seven-fold cross-validation on a
non-redundant database of 513 protein chains (CB513). The overall
three-state accuracy in secondary structure prediction is 62.5% for
single-sequence prediction and 69.2% using multiple sequence
alignment. We also used GDL to predict the tertiary structure of a
protein based on its backbone dihedral angles phi and psi, and
investigated the effect of angle representation granularity on the
performance of tertiary structure prediction.
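
To make the idea of a rule list concrete, here is a minimal sketch of
a greedy decision list learner for three-state secondary structure
labels. It is not the GDL implementation from the thesis: the rule
form (a single residue at a single window position), the window
radius, the padding character, and all function names are assumptions
chosen for illustration.

```python
from collections import Counter

def windows(seq, labels, radius=1, pad="-"):
    """Yield (window, label) pairs: each residue plus `radius` neighbours per side."""
    padded = pad * radius + seq + pad * radius
    for i, label in enumerate(labels):
        yield padded[i:i + 2 * radius + 1], label

def predict(rules, default, window):
    """Return the label of the first rule whose (position, residue) test matches."""
    for (pos, residue), label in rules:
        if window[pos] == residue:
            return label
    return default

def learn_decision_list(examples, max_rules=10):
    """Greedily prepend the rule that corrects the most training errors."""
    examples = list(examples)
    classes = sorted({label for _, label in examples})
    # The default rule predicts the most frequent class.
    default = Counter(label for _, label in examples).most_common(1)[0][0]
    rules = []
    for _ in range(max_rules):
        gain = Counter()
        for window, label in examples:
            was_correct = predict(rules, default, window) == label
            for pos, residue in enumerate(window):
                for candidate in classes:
                    # A prepended rule overrides the current prediction
                    # for every window it matches.
                    now_correct = candidate == label
                    gain[((pos, residue), candidate)] += now_correct - was_correct
        best, best_gain = max(gain.items(), key=lambda kv: kv[1])
        if best_gain <= 0:
            break  # no candidate rule improves training accuracy
        rules.insert(0, best)
    return rules, default

# Toy data (made up for illustration): H = helix, C = coil.
examples = list(windows("AAGGAAGG", "HHCCHHCC"))
rules, default = learn_decision_list(examples)
correct = sum(predict(rules, default, w) == y for w, y in examples)
print(rules, default, correct / len(examples))
```

The learned list can be read top to bottom as "if the residue at this
window position is X, predict Y, otherwise fall through", which is the
kind of interpretability the abstract refers to.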

Existing structure prediction approaches build increasingly
sophisticated models emphasizing accuracy at the cost of
interpretability. We believe that the simplicity of the GDL models
provides scientific insight into the relationship between local
sequence and structure in proteins.


September 11, 2005

Başak Mutlum, M.S. 2005

Current position: Principal Program Manager, Microsoft, Chicago.
M.S. Thesis: Word Sense Disambiguation Based on Sense Similarity and Syntactic Context. Koç University Department of Computer Engineering, September 2005. (Download PDF).

Word Sense Disambiguation (WSD) is the task of determining the
meaning of an ambiguous word within a given context. It is an open
problem that has to be solved effectively in order to meet the needs
of other natural language processing tasks. Both supervised and
unsupervised algorithms have been tried throughout the history of
WSD research. Up to now, supervised systems have achieved the best
accuracies. However, these systems with the first-sense heuristic
have reached a natural limit. To improve WSD further, the benefits
of unsupervised systems should be examined.

In this thesis, an unsupervised algorithm based on sense similarity
and syntactic context is presented. The algorithm relies on the
intuition that two different words are likely to have similar
meanings if they occur in similar local contexts. With the help of a
principle-based broad-coverage parser, a 100-million-word training
corpus is parsed and local context features are extracted using a
set of rules. Similarity values are then computed between the
ambiguous word and the words that occur in similar local contexts.
Polysemous words are disambiguated with a similarity maximization
algorithm. The performance of the algorithm is tested on the
SENSEVAL-2 and SENSEVAL-3 English all-words task data, and an
accuracy of 59% is obtained.
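
As a rough illustration of similarity maximization, the sketch below
picks the sense whose summed similarity to a set of context-similar
words is highest. The abstract does not say how similarity is computed
or combined, so the summation, the function names, the sense labels,
and the toy scores are all assumptions, not the method of the thesis.

```python
def disambiguate(senses, neighbours, similarity):
    """Pick the sense with the highest summed similarity to the neighbour words."""
    def score(sense):
        return sum(similarity(sense, word) for word in neighbours)
    return max(senses, key=score)

# Hand-made similarity scores (hypothetical values for illustration only).
TOY_SIMILARITY = {
    ("bank/finance", "loan"): 0.9,
    ("bank/finance", "deposit"): 0.8,
    ("bank/river", "loan"): 0.1,
    ("bank/river", "deposit"): 0.3,
}

def toy_similarity(sense, word):
    """Look up a toy score; unknown pairs count as zero similarity."""
    return TOY_SIMILARITY.get((sense, word), 0.0)

# Words assumed to have been found in local contexts similar to "bank".
best = disambiguate(
    ["bank/finance", "bank/river"], ["loan", "deposit"], toy_similarity
)
print(best)
```

In a real system the neighbour words would come from the parsed
corpus and the similarity function from a lexical resource or corpus
statistics; both are stubbed out here.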
