- http://www.socher.org/index.php/DeepLearningTutorial/DeepLearningTutorial
-
Slides and video from the deep learning tutorial. Papers:
http://jmlr.org/papers/volume11/erhan10a/erhan10a.pdf
http://ronan.collobert.com/pub/matos/2011_nlp_jmlr.pdf
http://ronan.collobert.com/senna/
people seem more interested in finding good representations for supervised tasks than purely unsupervised solving a task. the primary way they get word representations is contrastive estimation (unsupervised pretraining). maybe we can compete with paradigmatic representations based on lm. at the very least lm based representations will be faster to compute and converge.
mikolov has nn based lm model, may be better than srilm to use:
http://www.fit.vutbr.cz/~imikolov/rnnlm/
he also has a paper showing vector space reps are amazingly good: e.g. x(dogs) - x(dog) = x(cats) - x(cat) or x(dog) - x(animal) = x(chair) - x(furniture) we should see if this kind of thing is true on scode sphere...
http://www.aclweb.org/anthology/N13-1090.pdf
A quote from the paper that highlights the advantages of vector space models compared to symbolic (or equivalently one-hot models; I realized symbolic representations are also vector representations, just pretty dumb ones with very high dimensions and all vectors being orthogonal): As pointed out by the original proposers, one of the main advantages of these models is that the distributed representation achieves a level of generalization that is not possi- ble with classical ␣-gram language models; whereas a ␣-gram model works in terms of discrete units that have no inherent relationship to one another, a con- tinuous space model works in terms of word vectors where similar words are likely to have similar vec- tors. Thus, when the model parameters are adjusted in response to a particular word or word-sequence, the improvements will carry over to occurrences of similar words and sequences.
Listening to Richard Socher on parsing, relation, sentiment and paraphrase recognition etc. I don't think we should try to use upos classes but rather the actual 25D vectors for extrinsic evaluation problems... Check out his work from the last couple years... www.socher.org has his code. He also talked about representing ambiguous words with multiple vectors. Some papers on ambiguity detection and representation:
http://aclweb.org/anthology/N/N10/N10-1013.pdf
http://aclweb.org/anthology/P/P12/P12-1092.pdf
Ambiguity icin soyle birsey denesek: Her kelime scode'da bir degil 5 vektorle represent edilsin. scode calisirken Y vektorune en yakin X hangisiyse o attract edilsin. Converge ettiginde umit unambiguous kelimelerin 5 vektoru de ustuste binecek, ama ambiguous olanlar farkli yerlere converge edecek...
Yukarida dedigimi kimse anlamamis, herkese ayri ayri anlattim, vakit olunca daha duzgun yazarim. Presumably scode converge ettiginde ambiguous X'lerin paired oldugu Y'lar daha genis bir alanda multimodal bir sekilde yayilmis olmali. Bunu test ettik mi? Bir X'le birlikte gorulen Y'lerin type ya da token weighted variance'ina bakilabilir. Bir adim oteye giderek bu X'in vektorunu ilgili Y'leri generate eden bir gaussian'in merkezi gibi gorebiliriz. Sonra da birden fazla gaussian var mi diye sorabiliriz DP gibi non-parametric bir clustering modeliyle. Bunu denedik mi? (Birileri denedigimiz seyleri bir yerlere yazsa da unutmasak :)
Bu vector representation'larla ilgili pek cok konusmaya girince bizim CL draft bu haliyle zayif kaliyor oldugunu gordum. Ya ambiguity cozelim dogru durust, ya da en azindan diger vektorlerle karsilastiralim: C&W (senna), RNNLM (mikolov), CCA based, quantum, tensor based vs. Bu adamlara yazip vektorlerini sadece alsak ve clustering denesek yeter. - http://www.cs.columbia.edu/~scohen/naacl13tutorial/
-
The slides from the spectral tutorial.
It turns out co-training and CCA are methods that work with co occuring vars as well. Would be nice to compare with scode and our wsd cl paper. Collins says Co-clustering may also be relevant to what we do with scode.
http://en.wikipedia.org/wiki/Biclustering
Guess what Lyle Ungar (one of the tutorial hosts) does with CCA: (1) find 30-d representations for words based on co occurrence with syntagmatic context features. (2) then feed these vectors to supervised prediction problems like pos tagging...
http://www.aclweb.org/anthology/N13-1014.pdf
-
Learning a pos tagger with little annotation. Another evaluation metric between supervised and unsupervised...
Mali: Iki dil kullanmislar, bunlardan Malagasy icin github da bir test file var. Kullandiklari train+dev ve wikipedia dump file ile bir LM olusturdum. %16 unknown kelime var bu esas sorun. 64 subs m2o skoru %64.1, adamlarin elde ettigi type accuracy skorlari farkli model configurasyonlari icin yuzde 71, 72,74 ve 76. LM mimiz kotu oldugu icin bu seviyede kaldik bence. Suan farkettim ingilizce deneyleri var sanirim orada 'da karsilastiricam. Ingilizce WSJ uzerindeymis ve 15-24 arasi test set olarak kullanilmis. Biz unsupervised oldugumuz icin tum datayi kullanabiliriz. Iki annotator var ingilizce icin biri %78 diger %76 elde etmis. Biz suan m2o %80 ile ikisindende iyiyiz. Bu cocugu taniyorum, hatta konusmustum Ravi ve bizim dictionary cleaning in birlesimi bir is yapiyor ve israrla bizi cite etmiyor. - https://lirias.kuleuven.be/bitstream/123456789/397128/1/NAACL2013VulicMoensFinal.pdf
- Volkan: cross lingual word similarities, burada da birseyler yapabiliriz diye tahmin ediyorum.
- http://www.transacl.org/papers/
- TACL started publishing! Latest TACL has three papers connecting language to action / world, the first of which is being presented now. Aydin, this is the type of domain in which MT techniques I think could help. There is going to be an ACL tutorial, and data/code is already available.
http://www.transacl.org/wp-content/uploads/2013/03/paper49.pdf http://www.transacl.org/wp-content/uploads/2013/03/paper25.pdf http://www.transacl.org/wp-content/uploads/2013/05/paper193.pdf - http://aclweb.org/anthology/N/N13/N13-1071.pdf
- Best short paper uses ontonotes to study coreference. I didnt know ontonotes had coreference. We should look at all the information and possible tasks in ontonotes. More generally it would be nice to have a repository of standard datasets and tasks on our website.
- http://www.aclweb.org/anthology/N13-1104.pdf
- A frame induction paper building on Chambers&Jurafsky 2011: http://www.stanford.edu/~jurafsky/acl2011-chambers-templates.pdf
- http://www.aclweb.org/anthology-new/N/N13/N13-1105.pdf
- They even use quantum theory to derive word vectors! (based on syntagmatic representations :( I now have an official reason to study quantum theory :)
- Bu sabahki invited talk productive gecti. Asagida ikinci paragraftaki variance hesabini yaptim. Yani belli bir X'le birlikte observe edilen Y'lerin variance'lari o X'in ambiguity'si hakkinda birsey soyluyor mu. Ekte yazdigim kod ve cikan graph'i gonderiyorum. Pozitif bir correlation gorunuyor ama cok conclusive degil. Bu ambiguity detection isinde temel problem gold tag'lerle ve gold tag perplexity ile karsilastirmaya calismamiz olabilir. Bildiginiz gibi scode NN'leri 5, NNP'leri 3, VB'leri 4 class'a vs ayirmayi tercih ediyor. Dolayisiyla PTB'nin unambiguous dedigi bazi kelimeler bizim standartlara gore ambiguous olabilir. Kaldi ki scode part-of-speech ile ugrastigimizi bilmiyor. Dolayisiyla word sense ambiguity'si olan kelimeleri de ayirmak istiyor olabilir. Sonuc: gold tag perplexity'den bagimsiz bir kriter bulmamiz lazim ambiguity detection ve handling icin. scode matematigindeki likelihood fonksiyonunu improve ediyor mu bir kelimeyi ikiye bolmek vs gibi bir soru sorabilir miyiz?
- http://www.aclweb.org/anthology/N13-1120.pdf
- Vector based relational similarity bizim scode icin iyi bir extrinsic evaluation olabilir.
- http://www.transacl.org/wp-content/uploads/2013/05/paper231.pdf
- Dan Roth's preposition paper seem relevant to stateml. There is no verb, no event, just a stated relation...
- http://www.cc.gatech.edu/~jeisenst/
- Jacob suggested Kevin Murphy's new ML book and Collins' NLP notes. Here is a nice amazon review comparing various ML texts. BRML, ITILA, GPML, and ESL are freely available online.
- http://www.transacl.org/wp-content/uploads/2013/03/paper631.pdf
- Husnu: DMV modelinde phonetic feature (word duration) ekleyerek improvement elde etmisler. Mali: biz de pos induction icin bu tip feature kullanalim diyorduk... Belki adamin dataseti ise yarar. Model purely lexical by the way...
- http://www.aclweb.org/anthology/N13-1134.pdf
- uses tensor arithmetic to represent word composition. they seem to get disambiguation for free...
- http://www.transacl.org/wp-content/uploads/2013/03/paper75.pdf
- Unsupervised CCG grammar induction using HDP. State of the art on 15 languages. Performance does not fall as fast with longer sentences.
- You shall know a word by the company it keeps.
- Everybody uses this quotation from (Firth, J. R. 1957:11). Must find a paradigmatic version to wake people up! You can know a word better by what it can replace :)
- http://www.aclweb.org/anthology/N13-1092.pdf
- There is a new paraphrase database. More coming...
Full post...

