September 21, 2005

Volkan Kurt, M.S. 2005

Last position: IT director, Markafoni, Istanbul. (twitter).
M.S. Thesis: Protein Structure Prediction Using Decision Lists. Koç University Department of Computational Sciences and Engineering, September 2005. (Download PDF).

Proteins are the building blocks of life. The structure of these building
blocks plays a vital role in their function, and consequently in the
function of living organisms. Although increasingly effective
methods are being developed to determine protein structure, it is still
easier to determine the amino acid sequence of a protein than its folded
structure, and the gap between the number of known structures and known
sequences is widening at an accelerating rate. Structure
prediction algorithms may help close this gap.

In this study, we have investigated various aspects of structure
prediction (both secondary and tertiary structure). We have
developed an algorithm (Greedy Decision List learner, or GDL) that
learns a list of pattern-based rules for protein structure
prediction. The resulting rule lists are short, human-readable and
open to interpretation. The performance of our method in secondary
structure predictions is verified using seven-fold cross validation
on a non-redundant database of 513 protein chains (CB513). The
overall three-state accuracy in secondary structure predictions is
62.5% for single sequence prediction and 69.2% using multiple
sequence alignment. We also used GDL to predict the tertiary structure
of a protein based on its backbone dihedral angles phi and psi, and
investigated the effect of angle representation granularity on the
performance of tertiary structure prediction.
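
To make the idea of a decision list concrete, here is a minimal sketch in Python. It is not the GDL algorithm from the thesis. It is a generic greedy decision-list learner written under simplifying assumptions: each rule tests a single residue at a single offset in a sequence window, and rules are chosen by (correct minus incorrect) coverage. The names windows, learn_decision_list, predict and q3 are hypothetical and chosen only for illustration.

```python
from collections import Counter

def windows(seq, radius):
    """Yield the residue window centred on each position ('-' pads the ends)."""
    padded = "-" * radius + seq + "-" * radius
    for i in range(len(seq)):
        yield padded[i:i + 2 * radius + 1]

def learn_decision_list(examples, max_rules=10):
    """Greedily learn an ordered list of (offset, residue, label) rules.

    examples: list of (window, label) pairs. Illustrative only; the
    rule form and scoring are assumptions, not the thesis's method.
    """
    # Fallback label for windows that no rule matches.
    default = Counter(label for _, label in examples).most_common(1)[0][0]
    remaining = list(examples)
    rules = []
    while remaining and len(rules) < max_rules:
        hits = Counter()    # (offset, residue, label) -> count
        totals = Counter()  # (offset, residue) -> count
        for window, label in remaining:
            for off, res in enumerate(window):
                hits[(off, res, label)] += 1
                totals[(off, res)] += 1
        best, best_gain = None, 0
        for (off, res, label), n in hits.items():
            gain = n - (totals[(off, res)] - n)  # correct minus incorrect
            if gain > best_gain:
                best, best_gain = (off, res, label), gain
        if best is None:
            break
        rules.append(best)
        off, res, _ = best
        # Examples covered by the new rule are handled; learn on the rest.
        remaining = [(w, l) for w, l in remaining if w[off] != res]
    return rules, default

def predict(rules, default, seq, radius):
    """Label each residue with the first matching rule, else the default."""
    out = []
    for window in windows(seq, radius):
        for off, res, label in rules:
            if window[off] == res:
                out.append(label)
                break
        else:
            out.append(default)
    return "".join(out)

def q3(predicted, observed):
    """Per-residue accuracy over the three states (e.g. H, E, C)."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)

# Toy data, invented for illustration: alanine labelled helix, glycine coil.
train = list(zip(windows("AAAAGGGG", 1), "HHHHCCCC"))
rules, default = learn_decision_list(train)
```

Because the result is an ordered list of simple tests, it can be read top to bottom, which is the interpretability property the abstract emphasizes. A real learner would need richer patterns and a held-out evaluation such as the cross-validation described above.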

Existing structure prediction approaches build increasingly
sophisticated models emphasizing accuracy at the cost of
interpretability. We believe that the simplicity of the GDL models
provides scientific insight into the relationship between local
sequence and structure in proteins.

Related link
