December 03, 2008

Show and Tell

Here is a little demo video of some work I did with Sajit at MIT CSAIL this summer. The computer watches us play with a ball and produces live commentary. For now the detection of actions (like give, drop, and move) is hand-coded; the next step would be to learn them from examples. The step after that is "tell and show", i.e. going from words to pictures. This would complete the imagination-perception loop that may underlie much understanding and problem solving.

I think one of the coolest things about the current implementation is how the computer starts a sentence and cuts it off midway to say something more important. There are always tons of possible things to say and possible words to say them with, and a similar competition must be going on in our heads.

Full post...

October 31, 2008

Morphological cues vs. number of nominals in learning verb types in Turkish: Syntactic bootstrapping mechanism revisited

Deniz Yuret, A. Engin Ural, Nihan Ketrez, Dilara Koçbaş and Aylin C. Küntay. In The Boston University Conference on Language Development (BUCLD) (Long abstract, poster, PDF)

Abstract: The syntactic bootstrapping mechanism of verb classification was evaluated against child-directed speech in Turkish, a language with rich morphology, nominal ellipsis and free word order. Machine-learning algorithms were run on transcribed caregiver speech directed to two Turkish learners of different socioeconomic backgrounds, recorded for one hour every two weeks between the ages of 0;9 and 1;10. The corpora contained 12,276 and 20,687 child-directed utterances. Study 1 found that the number of nominals in child-directed utterances plays some role in classifying transitive and intransitive verbs. Study 2 found that accusative morphology on the noun is a stronger cue in clustering verb types. Study 3 found that verbal morphology is useful in distinguishing between different subtypes of intransitive verbs. These results suggest that syntactic bootstrapping mechanisms should be extended to include morphological cues to verb learning in morphologically rich languages.

Full post... Related link

September 20, 2008

Parser Training and Evaluation using Textual Entailments

A task proposal for SemEval-2010 by Deniz Yuret and Önder Eker. For example sentences and the bibliography, please download the original PDF.

Description of the Task

We propose a targeted textual entailment task designed to train and evaluate parsers (PETE). The typical parser training and evaluation methodology uses a gold treebank, which raises several issues: (1) The treebank is built around a particular linguistic representation, which makes parsers that use different representations (e.g. phrase structure vs. dependency) difficult to compare. (2) The parsers are evaluated based on how much of the linguistic structure dictated by the treebank they can replicate, some of which may be irrelevant for downstream applications. (3) The annotators that create the treebank not only have to understand the sentences in the corpus, but also master the particular linguistic representation used, which makes their training difficult and leads to inconsistencies (see Carroll1998 for a review and Parseval2008 for more recent work on parser evaluation).

In the proposed method, simple textual entailments like the following will be used to fine-tune and evaluate different parsers:
  • Final-hour trading accelerated to 108.1 million shares, a record for the Big Board.
    • 108.1 million shares was a record. -- YES
    • Final-hour trading accelerated a record. -- NO

  • Earlier the company announced it would sell its aging fleet of Boeing Co. 707s because of increasing maintenance costs.
    • It would sell the fleet because of increasing costs. -- YES
    • Selling the fleet would increase maintenance costs. -- NO

  • Persistent redemptions would force some fund managers to dump stocks to raise cash.
    • The managers would dump stocks to raise cash. -- YES
    • The stocks would raise cash. -- NO

The entailment examples will be generated based on the following criteria: (1) It should be possible to automatically decide which entailments are implied based on the parser output only, i.e. there should be no need for lexical semantics, anaphora resolution etc. (2) It should be easy for a non-linguist annotator to decide which entailments are implied, reducing the time for training and increasing inter-annotator agreement. (3) The entailments should be non-trivial, i.e. they should focus on areas of disagreement between current state of the art parsers. The above examples satisfy all three criteria.

Training and evaluating parsers based on targeted textual entailments addresses each of the issues listed in the first paragraph regarding treebank based methods: The evaluation is representation independent, therefore there is no difficulty in comparing the performance of parsers from different frameworks. By focusing on the parse differences that result in different entailments, we ignore trivial differences that stem from the conventions of the underlying representation, which should not matter for downstream applications. Finally, our annotators will only need a good understanding of the English language and no expertise in any linguistic framework.

Generating Data

The example entailment questions can be generated by considering the differences between the outputs of different state of the art parsers and their gold datasets. Some of the detected parse differences can be turned into different entailments about the sentence. The example entailments in the previous section were generated comparing the outputs of two dependency parsers, which are included in the appendix. The generated entailments will then be annotated by multiple annotators and the differences will be resolved using standard techniques.
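As an illustration of how such parse differences can be detected, the sketch below compares two dependency parses of the same sentence token by token. The parses, indices, and labels are invented for illustration; they are not the appendix outputs from the proposal.

```python
# Sketch: flag head-attachment or label disagreements between two
# dependency parses of the same sentence. Each parse is a list of
# (token, head_index, label) tuples, with head_index 0 meaning root.

def attachment_diffs(parse_a, parse_b):
    """Return the tokens whose head or label differs between parsers."""
    diffs = []
    for (tok, head_a, lab_a), (_, head_b, lab_b) in zip(parse_a, parse_b):
        if head_a != head_b or lab_a != lab_b:
            diffs.append((tok, (head_a, lab_a), (head_b, lab_b)))
    return diffs

# Hypothetical disagreement on "The stocks would raise cash":
# parser A treats "stocks" as the subject of "raise", parser B as an object.
parse_a = [("stocks", 2, "SBJ"), ("raise", 0, "ROOT"), ("cash", 2, "OBJ")]
parse_b = [("stocks", 2, "OBJ"), ("raise", 0, "ROOT"), ("cash", 2, "OBJ")]
print(attachment_diffs(parse_a, parse_b))
```

Each disagreement found this way is a candidate for an entailment pair, provided the two analyses actually license different entailments.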

Generating entailment questions out of parser differences allows us to satisfy conditions 1 and 3 easily: the entailments can be judged based on parser output because that is how they were generated, and they are non-trivial because some state of the art parsers disagree on them.

In our experience the most difficult condition to satisfy is 2: that it should be easy for a non-linguist annotator to decide which entailments are implied. In most of the example sentences we looked at, the differences between the parsers were trivial, e.g. different conventions on how to tag coordinating conjunctions, or whether to label a particular phrase with ADVP vs. ADJP etc. These differences are trivial in the sense that it is impossible to generate different entailments from them, thus it is hard to see how they would matter in a downstream application.

The trivial differences between parsers make example generation using our process difficult. The efficiency of example generation may be improved by pre-filtering candidate sentences which contain structures that involve non-trivial decisions by the parser such as prepositional phrase attachments. In addition some types of entailment generation can be automated. On the other hand the requirement of expressing differences in entailments will hopefully focus the training and the evaluation on non-trivial differences that actually matter in applications.

Evaluation Methodology

The participants will be provided with training and test sets of entailments and they will be evaluated using the standard tools and methodology of the RTE challenges (Dagan2006). The main difference is that our entailment examples focus exclusively on parsing. This should make it possible to write simple tree matching modules that decide on the entailments based on parser output alone. Example tree matching modules for standard formats (Penn Treebank (Marcus1993) format for phrase structure parsing, and CoNLL (Nivre2007) format for dependency parsing) will be provided as examples which should make preparing an existing parser for evaluation relatively easy.
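A minimal sketch of such a tree-matching module, assuming dependency edges represented as (head word, dependent word, label) triples: a hypothesis counts as entailed when all of its edges also appear in the source-sentence parse. The edge sets below are hand-written stand-ins, not actual parser output, and a real module would also need word-form normalization and format parsing.

```python
# Sketch: decide YES/NO for a hypothesis sentence by checking that
# every dependency edge in its parse also appears in the parse of
# the source sentence. Edges are (head, dependent, label) triples.

def entailed(source_edges, hypothesis_edges):
    """YES iff all hypothesis edges occur in the source parse."""
    return hypothesis_edges <= source_edges

# "Persistent redemptions would force some fund managers to dump
# stocks to raise cash." (illustrative edges only)
source = {("dump", "managers", "SBJ"), ("dump", "stocks", "OBJ"),
          ("raise", "cash", "OBJ"), ("dump", "raise", "PRP")}

yes_hyp = {("dump", "managers", "SBJ"), ("dump", "stocks", "OBJ")}
no_hyp = {("raise", "stocks", "SBJ")}

print(entailed(source, yes_hyp))  # True
print(entailed(source, no_hyp))   # False
```

The same subset check can be adapted to phrase-structure output by first converting constituent trees to head-dependent pairs.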

The training part will be more parser specific, so individual participants will have to decide how to best make use of the provided entailment training set. It is unlikely that we will be able to generate enough entailment examples to train a parser from scratch. Therefore the task will have to be open to using other resources. The participants will be free to use standard resources such as treebanks to train their parsers. We can also consider restricting the outside training resources (e.g. Penn Treebank only) and the domain of the entailments (e.g. finance only). The entailment training set can then be used to fine tune the parser by focusing the evaluation on important parser decisions that affect downstream applications.

Full post... Related link

August 16, 2008

Discriminative vs. Generative Approaches in Semantic Role Labeling

Deniz Yuret, Mehmet Ali Yatbaz and Ahmet Engin Ural. In Proceedings of The Twelfth Conference on Natural Language Learning (CoNLL-2008) (PDF, ACM)

Abstract: This paper describes the two algorithms we developed for the CoNLL 2008 Shared Task “Joint learning of syntactic and semantic dependencies”. Both algorithms start parsing the sentence using the same syntactic parser. The first algorithm uses machine learning methods to identify the semantic dependencies in four stages: identification of predicates, labeling of predicates, identification of arguments, and labeling of arguments. The second algorithm uses a generative probabilistic model, choosing the semantic dependencies that maximize the probability with respect to the model. A hybrid algorithm combining the best stages of the two algorithms attains 86.62% labeled syntactic attachment accuracy, 73.24% labeled semantic dependency F1 and 79.93% labeled macro F1 score for the combined WSJ and Brown test sets.

Full post... Related link

August 01, 2008

Ahmet Engin Ural, M.S. 2008

Current position: Co-founder, Arena AI, New York. (email, website, linkedin).
M.S. Thesis: Evolution of Compositionality with a Bag of Words Syntax. Koç University Department of Computer Engineering, August 2008. (Download PDF).

In the last two decades, the idea of an emerging and evolving language has been studied thoroughly. The main question behind these studies is how a group of humans reaches an agreement on phonology, lexicon and syntax. Improvements in computational tools have let researchers build models and test them in computer simulations to answer this question. Although the models are mere reflections of reality, the results have often been useful and insightful. This dissertation follows the same line and proposes a new model, tested with a game-based simulation methodology. In addition, this work tries to fill a gap in the studies of lexicon compositionality and proposes a plausible explanation for the transition from single-word naming to multi-word naming. The results are in line with previous research, such as the emergence of a stable and communicative language. Moreover, compositionality in the lexicon is observed with a very simple bag-of-words syntax. The parameters influencing the results are analyzed in depth. Even though the model falls short of real-world complexity, it hints at insightful facts about the transition from single-word naming to syntax that future work can explore.

A selection of Evolution of Language Resources

Biennial Evolution of Language Conference: evolang 2008

University of Edinburgh, Language Evolution and Computation Research Unit

A literature overview by Simon Kirby

The book Language Evolution by Morten Christiansen and Simon Kirby

A PhD thesis by Joris van Looveren, Design and Performance of Pre-Grammatical Language Games

Full post... Related link

April 24, 2008

Probability and prophecy

Suppose your financial advisor comes to you one day claiming to have
found an algorithm that can predict the direction of the stock market
with high accuracy. The algorithm takes the movements of the last ten
days (down or up) as input and outputs tomorrow's direction (down or
up). The advisor explains that he developed the algorithm using the
last four years of data (about 1000 days), and that when simulated
over the same period the algorithm was right more than 90% of the
time. Would you invest your money?

1. If the direction of the market were determined by a coin flip, what kind of performance would you expect from this algorithm?
2. Would your opinion change if the algorithm had been developed by looking at the first two years and showed this success when tested on the last two years?
3. Would your opinion change if the algorithm showed this success by looking at the last 5 days instead of the last 10?
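A quick simulation speaks to the first question. This is a sketch under one assumption: that the advisor's "development" amounts to memorizing, for each 10-day pattern in the data, the outcome that most often followed it, and then scoring on the same data.

```python
import random
from collections import defaultdict

random.seed(1)
days = 1000
moves = [random.randint(0, 1) for _ in range(days)]  # pure coin flips

# Memorize, for every 10-day history seen in the data, how often the
# next day went down (0) or up (1).
counts = defaultdict(lambda: [0, 0])
for i in range(10, days):
    history = tuple(moves[i - 10:i])
    counts[history][moves[i]] += 1

# Predict the majority outcome for each history, scored on the SAME
# data the table was built from.
correct = sum(max(c) for c in counts.values())
accuracy = correct / (days - 10)
print(f"in-sample accuracy on coin flips: {accuracy:.0%}")
```

With 2^10 = 1024 possible histories and only about 990 training cases, most patterns occur once or twice, so memorization scores far above the 50% chance level on what is pure noise, while its out-of-sample accuracy would of course stay at 50%. Testing on held-out data, as in the second question, removes this illusion.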
Full post... Related link