## November 13, 2013

### Learning to map strings to meaning

This is the talk I gave at the Univ 101 Freshman Seminar and the introductory lecture of my NLP class. The PDF Presentation contains slides with links to relevant material and papers.
Full post...

## November 09, 2013

### A Language Visualization System

Ünal, Emre and Yuret, Deniz. In the Proceedings of the 2nd Workshop on Games and NLP (GAMNLP-13). November, 2013. İstanbul, Turkey. (Download PDF, Presentation, see our previous post, go to the workshop page).
Abstract:
A novel language visualization system is presented here that is capable of understanding some concrete nouns, visualizable adjectives and spatial prepositions in full natural language sentences to generate 3D scenes. The system has a rule based question answering component and it can answer spatial inference questions about the scene created by these sentences. This work is the first step of our greater vision of building a sophisticated text-to-scene and scene-to-text conversion engine.

Full post...

## October 13, 2013

### Deniz Yuret's Math Problems

This is a list of elementary problems I like for one reason or another from various branches of mathematics. I cited the people I heard the problems from, they are not necessarily the originators. The unsolved flag just means that the problem is yet unsolved by me. Send me a solution if you find one. You can send me any interesting problems you find by adding a comment to this post. Also check out my other math posts, especially probability twisters, unsolved elementary problems, and links to other math sites. Last update: October 13, 2013.
1. (Quanta Magazine) Show that if Alice tosses a fair coin until she sees a head followed by a tail, and Bob tosses a coin until he sees two heads in a row, then on average, Alice will require four tosses while Bob will require six tosses.
2. (Cihan Baran) If we sample k times with replacement from the set {1, 2, ..., n} (all members picked with equal probability) what is the probability that at least one member will not be picked?
3. (Bertsekas and Tsitsiklis) Suppose that n people throw their hats in a box and then each picks one hat at random. (Each hat can be picked by only one person, and each assignment of hats to persons is equally likely.) What is the expected value of X, the number of people that get back their own hat?
4. (Paul Lockhart) Must there always be two people at a party who have the same number of friends there?
5. (Volkan Cirik) Consider a game played with an array of 2n random numbers. The first player can pick the number at either end of the array, the second player can pick from either end of the remaining numbers etc. The players take turns picking numbers from either end until no more numbers remain. Whoever has the highest total wins. Show that the first player can always win or draw.
6. (Ahmed Roman) Nathan and Abi are playing a game. Abi always goes first. The players take turns changing a positive integer to a smaller one and then passing the smaller number back to their opponent. On each move, a player may either subtract one from the integer or halve it, rounding down if necessary. Thus, from 28 the legal moves are to 27 or to 14; from 27, the legal moves are to 26 or to 13. The game ends when the integer reaches 0. The player who makes the last move wins. For example, if the starting integer is 15, Abi might move to 7, Nathan to 6, Abi to 3, Nathan to 2, Abi to 1, and now Nathan moves to 0 and wins. (However, in this sample game Abi could have played better!)
1. Assuming both Nathan and Abi play according to the best possible strategy, who will win if the starting integer is 1000? 2000? Prove your answer.
2. As you might expect, for some starting integers Abi will win and for others Nathan will win. If we pick a starting integer at random from all the integers from 1 to n inclusive, we can consider the probability of Nathan winning. This probability will fluctuate as n increases, but what is its limit as n tends to infinity? Prove your answer.
7. (Serdar Tasiran) We have an n boxes, k of them have a ball in them (k<=n), the others are empty. We start opening the boxes in a random order and stop when we find a ball. What is the expected number of boxes we will open?
8. (Ertem Esiner) Two mathematicians meet an old friend who has two kids and ask her what their ages are. The mother writes the sum of the two ages on a piece of paper and gives it to the first mathematician, and gives the product of the two ages to the second mathematician. The mathematicians think for a minute and claim the information is not sufficient. The woman says "think again". At that moment the mathematician with the product figures out the answer. How old are the two kids?
9. (Ertem Esiner) Five pirates are trying to divide up 1000 gold pieces among themselves. They decide to take turns making offers. An offer by a pirate is accepted if at least half of the other pirates agree to it. Otherwise the pirate making the offer is killed, and the next one makes an offer. Each pirate is greedy, but none wants to die. If you are the first pirate, what offer do you make?
10. (Drake) A professor announces that there will be surprise exam the following week, specifically that the students will not know what day the exam is going to take place. The students reason that the exam cannot take place on Friday, because by Thursday night they would know what day the exam is going to take place. If the exam cannot be on Friday it cannot be on Thursday either, because they would know by Wednesday night and so on. They finally decide the exam cannot take place and forget to study. The professor gives the exam on Wednesday to everybody's surprise. What went wrong?
11. (Mackay) Fred rolls an unbiased six-sided die once per second, noting the occasions when the outcome is a six.
1. What is the mean number of rolls from one six to the next six?
2. Between two rolls, the clock strikes one. What is the mean number of rolls until the next six?
3. Now think back before the clock struck. What is the mean number of rolls, going back in time, until the most recent six?
4. What is the mean number of rolls from the six before the clock struck to the next six?
12. (Deniz) You have two independent random variables between 0 and 1. How do you decide which one is more likely to be larger than the other?
13. (Deniz) You have two arbitrary random variables between 0 and 1. How do you decide if they are independent or not looking at their joint pdf density plot?
14. (Dennis Eriksson) Find all solutions to the diophantine equation: 1+2+3+...+n=m^2, where n and m are positive integers.
15. (Feyz) You have a deck of n cards numbered from 1 to n. dealt and shuffled randomly. What is the probability that none of the i-th card is on the i-th position?
16. (Sonny) Prove that there is a natural number n, for which 2^n starts with the numbers 3141592, i.e., show that there is a number of the form 3141592.... which is a power of 2 (in base 10 representation).
17. (Alkan) Let n, k be integers greater than 1.
1. Show that 1/1 + 1/2 + 1/3 +...+ 1/n cannot be an integer.
2. Show that 1/k + 1/(k+1) + ... + 1/(k+n) cannot be an integer.
18. (Will,Minsky) These problems have something in common:
1. A monk leaves to ascend to the temple on top of a mountain at 9am and arrives at 5pm. The next day he leaves the temple at 9am and arrives back at the foot of the mountain at 5pm. Prove that there is a point in time where he was at the same location on the path at the same time.
2. Prove that on a 2D earth, there exists a diameter such that the temperature at the endpoints is equal.
3. Prove that on a 3D earth, there exists a diameter such that the temperature and humidity of the endpoints are equal.
4. Does every convex closed curve in the plane contain all four vertices of some square?
19. (Ben) Two nice algorithm questions: Given a shuffled array of numbers from 1 to 10,000, find the three that are missing in one pass. Given an array of positive and negative integers, find the subarray with the highest sum in linear time.
20. (Winston) Four people want to pass a bridge dark at night. They can walk the bridge in 1, 2, 9, and 10 minutes respectively. The bridge can carry at most two people at a time. There is a single flash-light, and they need the flash-light to walk on the bridge. What is the shortest time for all four to pass across? (This was apparently a popular Microsoft interview question).
21. (Beril) You have a glass of tea and a glass of milk. You take a spoonfull of milk, mix it with the tea. Then you take a spoonfull of this mixture and mix it with the milk. Is there more milk in the tea or more tea in the milk at the end?
22. (Mine) Construct a square from: (a) Three identical squares, (b) Any two squares, (c) A rectangle. (d) Divide a square into any given two squares with the same total area. (e) Divide a circle into 6, 7, 8, and 10 equal pie-slices.
23. (IMO practice) m+n people are standing on a movie line. m people have 5 dollar bills, n people have 10 dollar bills. The movie is 5 dollars. The cashier opens with no money. It will close if it does not have enough change to give one person. How many possible lines are there that will get through without closing the cashier? Note: (Deniz, Mar 10, 1998) I just discovered that this problem is equivalent to finding the number of full binary trees with m leaves when n=m-1. A full binary tree is a tree where each node has 0 or 2 children. The number of binary trees is equivalent to the number of shift-reduce sequences that parse them. For such a sequence to be valid the number of shifts need to always be ahead of the number of reduces, which turns this into our movie problem. The binary tree problem can also be solved using a generating function and the relation b[n] = sum[k=1..n-1](b[k] b[n-k]). The movie problem can be solved by using random walks and the reflection principle. The two solutions seem to give different answers but they turn out to be equivalent. This constitutes an indirect proof of the following combinatorial identity: (2n-1)!! = 2n!/(n! 2^n). Everything is related to everything else in math :-)
24. (Oguz) Find a function f on real numbers such that f(f(x)) = -x.
25. (Boris) You meet many women in your life. After meeting each one, you decide how good she is and whether you want to marry her. If you decide to marry her, you lose your chance with future candidates. If you decide to move on, you lose your chance with her. Assuming you will meet at most n women, find the optimum strategy for marrying the best bride. (The Azeri mathematician Gussein-Zade is apparently the first one to solve this problem.)
26. (Alkan) A small rectangle is cut out of a large rectangle. Find a line that divides the remaining figure into two equal areas using an unmarked ruler.
27. (IMO) Let A be a set of ten two-digit integers. Prove that one can always find two subsets of A with the same sum.
28. (IMO) 17 people correspond with each other. Each pair discusses one of three possible topics. Prove that there are three people that discuss the same topic with each other.
29. (Alkan) Five couples meet in a party. Everyone starts shaking hands with everyone else except their partners. At some point the host stops them and asks how many handshakes each had. Everyone gives a different number. How many hands did the host's wife shake?
30. (IMO-75/4) When 4444 4444 is written in decimal notation, the sum of its digits is A. Let B be the sum of the digits of A. Find the sum of the digits of B. (A and B are written in decimal notation.)
31. (Murat Fadiloglu) What is the probability of two randomly selected integers being mutually prime?
32. (Alkan) An old lady buys a plane ticket to visit her son. She goes to the airport and people let her board first. Since she can't read her seat number, she sits on a random seat. Rest of the passengers sit on their own seats, unless it is occupied in which case they randomly choose one of the emtpy seats. What is the probability that the last passenger will sit on his own seat?
33. (Alkan) sqrt(1 + 2*sqrt(1 + 3*sqrt(1 + 4*sqrt(1 + 5*sqrt(1 + ... ))...) = ?
34. (Rota) Given a sequence of (n 2 + 1) distinct integers, show that it is possible to find a sequence of (n+1) entries which is increasing or decreasing.
35. (Minkowsky) Consider a two dimensional lattice with grid points unit distance apart. Show that a convex shape that is symmetric around a grid point has to include at least three grid points if its area is 4.
36. (Science Museum, unsolved) Which rectangles can be divided into unequal squares?
37. (Alkan) Consider permutations of an array which contains n copies of each integer from 1 to n. Two permutations are defined as orthogonal if their corresponding elements form distinct pairs. What is the maximum number of permutations such that any two are orthogonal? For example, here is a possible set of mutually orthogonal permutations for n=3: {111222333, 123123123, 123231312, 123312231}.
38. (Ian) My new favorite algorithm: Find an algorithm that discovers if a linked list has a cycle in it using bounded memory.
39. (Uttrash) You are in prison and they give you n red balls, n green balls and two boxes. You are to place the balls in the two boxes in any way you like. The next day they will pick a ball from one of the boxes, and if it is green you will be set free. How do you arrange the balls?
40. (Will) You randomly throw k balls into n bins. What is the expected number of occupied bins.
41. (Will) You randomly throw k points on the unit interval. What is the expected length of the longest segment.
42. (Michael, unsolved) You distribute 100 balls to 10 buckets. What is the expected value of the number of balls in the bucket with most balls.
43. (Lon) Draw 2 circles, 1 completely inside the other (but not necessarily concentric.) What is the probablility that a line intersecting the outer circle also intersects the inner circle. Now, do the same with rectangles.
44. (Thurston and Conway) An angel is stuck on an infinite sheet of graph paper, he can hop from square to adjacent square. Everytime the angel hops, the devil can knock out any square, so the angel can't ever go there. Can the devil trap the angel? What if the graph paper is a positive quadrant (i.e. is bounded on two sides).
45. This is not really math, but here are my two favorite algorithms: (1) Find an algorithm for perfect shuffling of an array. (2) Find an algorithm that will pick a perfectly random element from a list in one pass without knowing the size of the list beforehand.
46. (Alkan) Given two points find the point midway between them using only a compass (no ruler).
47. (Alkan) You are sitting at point [0,0] and looking towards right into a tunnel bounded by y=1/x and y=-1/x curves. The walls of the tunnel are reflecting. Prove that if you send a light beam into the tunnel in any direction other than straight to the right, the beam will be reflected back towards left.
48. (Deniz) Let x be a random variable which can take positive integer values. P(x)=1/2x. We draw n random elements from this distribution. What is the probability that the n+1st element will be different from the first n?
49. (Alkan) Let A be the set of all rectangles that have one integer side. Prove that any rectangle constructed by concatenating rectangles from A will also be a member of A.
50. (Neal) Take a randomly shuffled deck. Open the cards one by one. At one point stop and predict that the next card is red. Is there a strategy that has more than 1/2 chance.
51. Pick two random points in the unit line segment. What is the expected distance between them?
52. Pick two random points in the unit circle. What is the expected distance between them?
53. (Umit) Suspend a rope from two points on the ceiling. What shape does it take?
54. (Bernoulli brothers) A ball is rolling from point A to a lower point B. What is the ideal curve for the path between A and B that minimize the travel time?
55. (Alkan) There are 100 light poles with switches on a street. In the beginning all lights are off. One person goes through pole number 1, 2, 3, ... and flips the switches. Then he goes back and goes through 2, 4, 6, ... and flips the switches. Then he goes back and goes through 3, 6, 9, ... and flips the switches. So at n'th round he flips the multiples of n. Which lights are on after 100 rounds?
56. (Deniz) There is a set A of n0 elements, and we randomly pick a subset B of n1 elements. We know that r0 of the elements in A were red. We are interested in finding out the number of red elements in B, r1. To find out we start picking random elements from B. We pick n2 elements, and r2 of them turn out to be red. Now what is the best estimate for r1?
57. (Minsky) I bring you three flipped cups and tell you there is gold under one of them. Furthermore, each cup has a number giving the probability that the gold is under that one. You immediately go to the one with highest probability. I tell you that you have amnesia and I may have tried this on you a million times. What is your best strategy?
58. (Minsky) An ant leaves a repeated binary pattern behind as it walks through the desert. What is the length of the shortest pattern that would let you distinguish which way the ant was going?
59. (Feyzu) Two points are chosen at random on a line AB, each point being chosen according to the uniform distribution on AB, and the choices being made independently of each other. The line AB may now be regarded as divided into three parts. What is the probability that they may be made into a triangle?
60. (IMO practice) The entries for a competition is locked in a safe. There are 11 judges. We would like them to be able to open the safe when more than half get together. How many locks / keys do we need?
61. (IMO practice) Given three parallel lines, show that an equilateral triangle can always be constructed with a vertex on each line.
62. (IMO-72/6) Given four distinct parallel planes, prove that there exists a regular tetrahedron with a vertex on each plane.
63. (Umit) A method for two people to share a pie fairly is to let one cut the other one pick. Generalize this method to n people.
64. You are making a random walk on an n-dimensional grid. What is the probability that you will ever return to the origin? (Hint: It is 1 for 1-D and 2-D! It is 0.3405 for 3-D).
65. (Ivanie) A rabbit hopping up the stairs can hop either one or two steps at a time. How many different ways can it climb n steps?
66. Show that if you cut off two opposite corner squares of a chess board, you cannot cover the rest with dominoes.
67. Show that n squares with total area less than 1/2 can always be fit into a unit square (non-overlapping).
68. Show that n squares with total area greater than 3 can always cover the surface of the unit square (non-overlapping).
69. You color all points of a plain with three colors. Show that I can always find two points of the same color that are a given distance apart.
70. You color all points on an equilateral triangle with two colors. I try to find a right triangle with its vertices on the edges of your triangle and all vertices having the same color. Can you find a coloring that prevents this?
71. How many 1's are there in the digits of numbers from 1 to 1 million? (one minute time limit).
72. (Michael) There is a piece of candy on every node of a binary tree. Find the shortest path through the binary tree that lets you collect all of the candies.
73. (Ihsan) Two men, x distance apart, start walking toward each other with speed v. At that instant a fly starts flying from one men's nose to the other with 2v speed. The fly keeps flying back and forth between the two noses until the guys meet. How much distance has the fly flown when they meet? (There is an easy way and a hard way to solve this).
74. (Ihsan) A coin is flipped until the first head appears. If you get a head in n flips you win \$2n. How much are you willing to pay to play this game?
75. (Bilim ve Teknik) You need to paint the area under the curve 1/x. How can you do it with a finite amount of paint?

Full post...

## September 26, 2013

### Enis Rıfat Sert, M.S. 2013

Current position: Software Engineer at Google, Seattle. (email).
M.S. Thesis: Word Context and Token Representations from Paradigmatic Relations and Their Application to Part-of-Speech Induction. Koç University, Department of Computer Engineering. September, 2013. (PDF, Presentation).
Publications: bibtex.php

Abstract:
Representation of words as dense real vectors in the Euclidean space provides an intuitive definition of relatedness in terms of the distance or the angle between one another. Regions occupied by these word representations reveal syntactic and semantic traits of the words. On top of that, word representations can be incorporated in other natural language processing algorithms as features.

In this thesis, we generate word representations in an unsupervised manner by utilizing paradigmatic relations which are concerned with substitutability of words. We employ an Euclidean embedding algorithm (S-CODE) to generate word context and word token representations from the substitute word distributions, in addition to word type representations. Word context and word token representations are capable of handling syntactic category ambiguities of word types because they are not restricted to a single representation for each word type.

We apply the word type, word context and word token representations to the part-of-speech induction problem by clustering the representations with k-means algorithm and obtain type and token based part-of-speech induction for Wall Street Journal section of Penn Treebank with 45 gold-standard tags. To the best of our knowledge, these part-of-speech induction results are the state-of-the-art for both type based and token based part-of-speech induction with Many-To-One mapping accuracies of 0.8025 and 0.8039, respectively. We also introduce a measure of ambiguity, Gold-standard-tag Perplexity, which we use to show that our token based part-of-speech induction is indeed successful at inducing part-of-speech categories of ambiguous word types.

Full post...

## June 20, 2013

### NAACL 2013 Notes

NAACL was great fun this year. It seems everybody is interested in Semantics and ways of representing it using vectors, matrices, tensors, quantum operators, deriving these representations using neural networks, linear algebra, etc. Some open questions that are active research areas are (1) how to represent ambiguity, (2) how to represent asymmetric lexical relations, (3) how to represent larger chunks like phrases and clauses. Here are some notes and highlights:
http://www.socher.org/index.php/DeepLearningTutorial/DeepLearningTutorial
Slides and video from the deep learning tutorial. Papers: http://jmlr.org/papers/volume11/erhan10a/erhan10a.pdf http://ronan.collobert.com/pub/matos/2011_nlp_jmlr.pdf http://ronan.collobert.com/senna/ people seem more interested in finding good representations for supervised tasks than purely unsupervised solving a task. the primary way they get word representations is contrastive estimation (unsupervised pretraining). maybe we can compete with paradigmatic representations based on lm. at the very least lm based representations will be faster to compute and converge. mikolov has nn based lm model (RNNLM), may be better than srilm to use: http://www.fit.vutbr.cz/~imikolov/rnnlm/ he also has a paper showing vector space reps are amazingly good: e.g. x(dogs) - x(dog) = x(cats) - x(cat) or x(dog) - x(animal) = x(chair) - x(furniture) we should see if this kind of thing is true on scode sphere... http://www.aclweb.org/anthology/N13-1090.pdf A quote from the paper that highlights the advantages of vector space models compared to symbolic (or equivalently one-hot models; I realized symbolic representations are also vector representations, just pretty dumb ones with very high dimensions and all symbols equivalent to one-hot orthogonal vectors): As pointed out by the original proposers, one of the main advantages of these models is that the distributed representation achieves a level of generalization that is not possi- ble with classical ␣-gram language models; whereas a ␣-gram model works in terms of discrete units that have no inherent relationship to one another, a con- tinuous space model works in terms of word vectors where similar words are likely to have similar vec- tors. Thus, when the model parameters are adjusted in response to a particular word or word-sequence, the improvements will carry over to occurrences of similar words and sequences. Listening to Richard Socher on parsing, relation, sentiment and paraphrase recognition etc. I don't think we should try to use upos classes but rather the actual 25D vectors for extrinsic evaluation problems... Check out his work from the last couple years... www.socher.org has his code. He also talked about representing ambiguous words with multiple vectors. Some papers on ambiguity detection and representation: http://aclweb.org/anthology/N/N10/N10-1013.pdf http://aclweb.org/anthology/P/P12/P12-1092.pdf Ambiguity icin soyle birsey denesek: Her kelime scode'da bir degil 5 vektorle represent edilsin. scode calisirken Y vektorune en yakin X hangisiyse o attract edilsin. Converge ettiginde umit unambiguous kelimelerin 5 vektoru de ustuste binecek, ama ambiguous olanlar farkli yerlere converge edecek... Yukarida dedigimi kimse anlamamis, herkese ayri ayri anlattim, vakit olunca daha duzgun yazarim. Presumably scode converge ettiginde ambiguous X'lerin paired oldugu Y'lar daha genis bir alanda multimodal bir sekilde yayilmis olmali. Bunu test ettik mi? Bir X'le birlikte gorulen Y'lerin type ya da token weighted variance'ina bakilabilir. Bir adim oteye giderek bu X'in vektorunu ilgili Y'leri generate eden bir gaussian'in merkezi gibi gorebiliriz. Sonra da birden fazla gaussian var mi diye sorabiliriz DP gibi non-parametric bir clustering modeliyle. Bunu denedik mi? (Birileri denedigimiz seyleri bir yerlere yazsa da unutmasak :) Bu vector representation'larla ilgili pek cok konusmaya girince bizim CL draft bu haliyle zayif kaliyor oldugunu gordum. Ya ambiguity cozelim dogru durust, ya da en azindan diger vektorlerle karsilastiralim: C&W (senna), RNNLM (mikolov), CCA based, quantum, tensor based vs. Bu adamlara yazip vektorlerini sadece alsak ve clustering denesek yeter.
http://www.cs.columbia.edu/~scohen/naacl13tutorial/
The slides from the spectral tutorial. It turns out co-training and CCA are methods that work with co occuring vars as well. Would be nice to compare with scode and our wsd cl paper. Collins says Co-clustering may also be relevant to what we do with scode. http://en.wikipedia.org/wiki/Biclustering Guess what Lyle Ungar (one of the tutorial hosts) does with CCA: (1) find 30-d representations for words based on co occurrence with syntagmatic context features. (2) then feed these vectors to supervised prediction problems like pos tagging...

http://www.aclweb.org/anthology/N13-1014.pdf
Learning a pos tagger with little annotation. Another evaluation metric between supervised and unsupervised... Mali: Iki dil kullanmislar, bunlardan Malagasy icin github da bir test file var. Kullandiklari train+dev ve wikipedia dump file ile bir LM olusturdum. %16 unknown kelime var bu esas sorun. 64 subs m2o skoru %64.1, adamlarin elde ettigi type accuracy skorlari farkli model configurasyonlari icin yuzde 71, 72,74 ve 76. LM mimiz kotu oldugu icin bu seviyede kaldik bence. Suan farkettim ingilizce deneyleri var sanirim orada 'da karsilastiricam... Ingilizce WSJ uzerindeymis ve 15-24 arasi test set olarak kullanilmis. Biz unsupervised oldugumuz icin tum datayi kullanabiliriz. Iki annotator var ingilizce icin biri %78 diger %76 elde etmis. Biz suan m2o %80 ile ikisindende iyiyiz.
https://lirias.kuleuven.be/bitstream/123456789/397128/1/NAACL2013VulicMoensFinal.pdf
Volkan: cross lingual word similarities, burada da birseyler yapabiliriz diye tahmin ediyorum.
http://www.transacl.org/papers/
TACL started publishing! Latest TACL has three papers connecting language to action / world, the first of which is being presented now. Aydin, this is the type of domain in which MT techniques I think could help. There is going to be an ACL tutorial, and data/code is already available. http://www.transacl.org/wp-content/uploads/2013/03/paper49.pdf http://www.transacl.org/wp-content/uploads/2013/03/paper25.pdf http://www.transacl.org/wp-content/uploads/2013/05/paper193.pdf
http://aclweb.org/anthology/N/N13/N13-1071.pdf
Best short paper uses ontonotes to study coreference. I didnt know ontonotes had coreference. We should look at all the information and possible tasks in ontonotes. More generally it would be nice to have a repository of standard datasets and tasks on our website.
http://www.aclweb.org/anthology/N13-1104.pdf
A frame induction paper building on Chambers&Jurafsky 2011: http://www.stanford.edu/~jurafsky/acl2011-chambers-templates.pdf . Here is an 2008 paper: http://acl.eldoc.ub.rug.nl/mirror/P/P08/P08-1090.pdf
http://www.aclweb.org/anthology-new/N/N13/N13-1105.pdf
They even use quantum theory to derive word vectors! (based on syntagmatic representations :( I now have an official reason to study quantum theory :)
Tag perplexity vs substitute variance:
Bu sabahki invited talk productive gecti. Asagida ikinci paragraftaki variance hesabini yaptim. Yani belli bir X'le birlikte observe edilen Y'lerin variance'lari o X'in ambiguity'si hakkinda birsey soyluyor mu. Ekte yazdigim kod ve cikan graph'i gonderiyorum. Pozitif bir correlation gorunuyor ama cok conclusive degil. Bu ambiguity detection isinde temel problem gold tag'lerle ve gold tag perplexity ile karsilastirmaya calismamiz olabilir. Bildiginiz gibi scode NN'leri 5, NNP'leri 3, VB'leri 4 class'a vs ayirmayi tercih ediyor. Dolayisiyla PTB'nin unambiguous dedigi bazi kelimeler bizim standartlara gore ambiguous olabilir. Kaldi ki scode part-of-speech ile ugrastigimizi bilmiyor. Dolayisiyla word sense ambiguity'si olan kelimeleri de ayirmak istiyor olabilir. Sonuc: gold tag perplexity'den bagimsiz bir kriter bulmamiz lazim ambiguity detection ve handling icin. scode matematigindeki likelihood fonksiyonunu improve ediyor mu bir kelimeyi ikiye bolmek vs gibi bir soru sorabilir miyiz?
http://www.aclweb.org/anthology/N13-1120.pdf
Vector based relational similarity bizim scode icin iyi bir extrinsic evaluation olabilir.
Dan Roth's preposition paper seem relevant to stateml. There is no verb, no event, just a stated relation...
http://www.cc.gatech.edu/~jeisenst/
Jacob suggested Kevin Murphy's new ML book and Collins' NLP notes. Here is a nice amazon review comparing various ML texts. BRML, ITILA, GPML, and ESL are freely available online.
From: john.pate@mq.edu.au Hi Deniz, It was nice to meet you! I've put the brent forced alignment on my Macquarie webspace: http://web.science.mq.edu.au/~jpate/br_tmpdata.33.map+score This is the Large Brent dataset from: http://dx.doi.org/10.1017/S0305000910000085 The first column is the number of the phone. The second column is the number of the utterance. The third column identifies the mother-child dyad, recording session and start and end timestamp. The fourth column identifies how many 10ms frames into the utterance the phone starts (so if it says 10, the phone starts 100ms from the start of the utterance timestamp). The fifth column identifies the end of the phone, the sixth provides the gold-standard phone, and the seventh column is the log-likelihood of the alignment. I've attached the original train/test partition. The DMV library is located at: https://github.com/jpate/predictabilityParsing To see examples of how to run it, look at: https://github.com/jpate/runDMV I reccommend always using the fold-unfold implementation, since it's faster and gives identical results; you can see how by following the - -foldUnfold flags in the runDMV repository. Also, if you were interested in the language acquisition angle of my talk, you might have a look at chapter 6 of my dissertation: http://web.science.mq.edu.au/~jpate/jkpate_dissertation.pdf It also has experiments on hand-annotated break index, and, in section 6.4, an examination of the posterior distribution over trees to try to tease apart the relative contribution of predictability effects and prosodic structure. Best, John - -- John K Pate http://jkpate.net/ Husnu: DMV modelinde phonetic feature (word duration) ekleyerek improvement elde etmisler. Mali: biz de pos induction icin bu tip feature kullanalim diyorduk... Belki adamin dataseti ise yarar. Model purely lexical by the way...
http://www.aclweb.org/anthology/N13-1134.pdf
uses tensor arithmetic to represent word composition. they seem to get disambiguation for free...
Unsupervised CCG grammar induction using HDP. State of the art on 15 languages. Performance does not fall as fast with longer sentences.
You shall know a word by the company it keeps.
Everybody uses this quotation from (Firth, J. R. 1957:11). Must find a paradigmatic version to wake people up! You can know a word better by what it can replace :)
http://www.aclweb.org/anthology/N13-1092.pdf
There is a new paraphrase database.
http://www.aclweb.org/anthology/S13-1002.pdf
The first two talks of *SEM made me think of limitations of our word vectors: - we have one word = one vector - how do we represent sets (apple and banana) - how do we represent ambiguity (either apple or banana) - how do we represent probability (80% apple, %20 banana) and is this different from ambiguity - we use cosine for similarity - similarity is symmetric, lexical relations usually aren't: apple implies fruit, but fruit does not imply apple. Probabilistically if I see apple 100% I see fruit. But if I see fruit maybe it is apple 20% of the time. Symmetric cosine is not going to cut it. - so how do we represent entailment, or generally set inclusion and/or intersection (cucumber and pickle do not entail each other but there is a large intersection) in a vector space (or matrix, tensor, quantum) model, and how do we learn this from data
Mali:
Merhaba hocam, Bu bahsettiginiz paperlara amerikadayken bakarim. Bu durumda iki yon var: Representation Based: Bu adamlar yeni representation mi olusturmuslar? Paradigmac vs C&W, RNNLM, CCA gibi mi olacak. Algorithmic Based: Yoksa bu metodlar syntagmatic representation kullaniyor biz bu metodlara paradigmatic bilgiyi entegre edicez. Deniz: Ben representation based dusunmustum, ama digeri daha uzun vadede daha da iyi sonuc verebilir.

http://aclweb.org/anthology/S/S13/S13-1011.pdf
Laura Rimell ek olarak Beagle representation da suggest etti. http://www.indiana.edu/~clcl/BEAGLE/ Diger representation'lara referans iki tutorial slide'larinda olmali: http://www.cs.columbia.edu/~scohen/naacl13tutorial/ http://www.socher.org/index.php/DeepLearningTutorial/DeepLearningTutorial Senna (C&W): bu thread'deki ilk email'imdan: http://ronan.collobert.com/senna/ CCA: http://www.cs.columbia.edu/~scohen/naacl13tutorial/ Ozellikle Lyle Ungar'in yaptigi islere bakin.
http://amr.isi.edu
Abstract meaning representations seem to be getting popular (from kevin night panel ppt). Seems to suggest the future of SMT: string, tree, and graph automata... nobody knows how to go to graphs yet...
On 6/15/2013 10:09 AM, Deniz Yuret wrote: Dear Kevin, I couldn't catch you after the panel, but one thing in your slides stuck with me: the column for learning string to graph mappings was empty! Is there really no theory / work about this? (I was hoping I would learn how to do this from MT folk and apply to other learning problems :)
deniz, tree transducers go back to 1970, and after three decades of work, there is a lot of excellent theory -- even so, we had to adapt some of the standard devices to work in machine translation ... likewise, there is a lot of graph grammar work, but i'm not satisfied yet that we've found the right thing for us, i.e., for unordered, sparse semantic feature structures, entities playing multiple roles, english realization with pronouns, zero pronouns, control structures, reflexives, etc. we started exploring, though! you can see a couple of papers on my webpage with daniel quernheim (on dag acceptors and dag-to-tree transducers), plus a couple of newer papers (including rejected ones, of course) on hyperedge replacement grammars (HRG). we'll have a paper at acl (first author david chiang) about a parsing algorithm for HRG and a synchronous version of HRG aimed at transduction (e.g., meaning to english, and english to meaning). of course, unification grammar is super-appropriate & powerful, but probabilizing/inverting/learning it has been hard. hope that helps -- also wish we had the solution already! maybe we do & haven't recognized it yet... kevin
On combining vectors:
Laura Rimell uses circular convolution: http://www.aclweb.org/anthology/S13-1011.pdf Stephen Wu uses structured vectorial semantics: http://www.aclweb.org/anthology/S13-1021.pdf They both use random indexing to represent words. We could use random indexing on our subst vectors: assign a random vector to each substitute, each token then becomes a probability weighted sum of substitute vectors.
http://www.aclweb.org/anthology/S13-1035.pdf
new dataset from google on syntactic ngrams.
http://www.aclweb.org/anthology/S13-1037.pdf
David Forsyth had a nice invited talk. Julia Hockenmeier seems to have a good vision to language dataset. We should find his slides...
http://www.thespermwhale.com/jaseweston/
Jason Weston from google gave an invited talk in vision language workshop. He embeds images in the same space as words. He solves the ambiguity problem using multiple vectors for each word as I suggested. Words map to multiple vectors, context (in his case image) maps to one. Similarity determined using closest match. Number of senses determined using cross validation. eccv 2012 paper discusses ambiguity.
http://www.aclweb.org/anthology/S13-2017.pdf
The melodi system from semeval also seems to represent ambiguity.
https://www.aclweb.org/anthology-new/S/S13/S13-2044.pdf
The spatial annotation task could potentially be used for (or converted to) a visual inference task. Part of the difficulty as you explained is to decide on the "correct" annotation format of a sentence. However regardless of the annotation, people's understanding of spatial structure is probably pretty consistent. Have you thought of a spatial task where annotation can be implicit but its details would not matter as much: i.e. a textual entailment, visual inference, spatial question answering type task? Something that would involve non-obvious spatial thinking such that pure keyword matching would go wrong but simple spatial labeling / modeling may get right... I would be very interested in such a task both as a participant and as an organizer. In the vision-language workshop yesterday people were talking about a visual entailment type task in the panel. But by visual entailment they mean questions that can be answered looking at a picture + text. Dealing with pixels is difficult and I think misses the point. Directly working with pure spatial concepts and representations may be a good way to go...
Oleksandr's story annotation tool. Here is Mark's related tool for reference: http://projects.csail.mit.edu/workbench and a link to the narrative workshop.
http://pages.cs.wisc.edu/~jerryzhu/pub/ttp.pdf

ftp://ftp.cis.upenn.edu/pub/graphics/rama/papers/agents.pdf
Ray Mooney on work other than WordsEye on language visualization: This is a paper by Jerry Zhu at U Wisc
Schuler, Badler, Palmer have papers on using NL instructions to guide behavior of virtual people.
http://www.isi.edu/natural-language/people/bayes-with-tears.pdf
Kevin Knight's Bayes tutorial.
http://dl.acm.org/citation.cfm?id=2387650

http://dl.acm.org/citation.cfm?id=2380822
Islam Beltagy on asymmetric relations in VSM: I think the two papers below summarize the attempts in doing asymmetric distributional semantics. This one compares between the known techniques: Identifying hypernyms in distributional semantic spaces Lenci, Alessandro and Benotto, Giulia and this one extends the same technique to phrase level. Entailment above the word level in distributional semantics Baroni, Marco and Bernardi, Raffaella and Do, Ngoc-Quynh and Shan, Chung-chieh
http://www.jair.org/media/2934/live-2934-4846-jair.pdf
Turney and Pantel's review of vector space semantics. Could be useful to find applications and methods.

Full post...

## May 28, 2013

### AI-KU: Using Substitute Vectors and Co-Occurrence Modeling For Word Sense Induction and Disambiguation

Baskaya, Osman and Sert, Enis and Cirik, Volkan and Yuret, Deniz. Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013). June, 2013. Atlanta, Georgia, USA. (Download PDF, see the proceedings).

Abstract:
Word sense induction aims to discover different senses of a word from a corpus by using unsupervised learning approaches. Once a sense inventory is obtained for an ambiguous word, word sense discrimination approaches choose the best-fitting single sense for a given context from the induced sense inventory. However, there may not be a clear distinction between one sense and another, although for a context, more than one induced sense can be suitable. Graded word sense method allows for labeling a word in more than one sense. In contrast to the most common approach which is to apply clustering or graph partitioning on a representation of first or second order co-occurrences of a word, we propose a system that creates a substitute vector for each target word from the most likely substitutes suggested by a statistical language model. Word samples are then taken according to probabilities of these substitutes and the results of the co-occurrence model are clustered. This approach outperforms the other systems on graded word sense induction task in SemEval-2013.

Full post...

## May 19, 2013

### Pitfalls of studying language in isolation

Studies of language acquisition and language understanding display a remarkable lack of attention to the subject matter of the utterances being studied.  This is probably because nobody knows how to represent and process meaning whereas the forms of utterances are readily available.  Thus "language acquisition" have come to mean the study of learning how to construct utterances "of the right form" and studies of language understanding focus on translating forms of utterances into other symbolic forms equally devoid of the richness and detail of the things the utterance is supposed to convey.
A real theory of language acquisition should study how babies learn to decode form-meaning mappings in an environment where lots of things are going on in addition to what is being said.  A real theory of language understanding should study what kinds of rich interconnected concepts and embodied simulations get triggered by words and constructions, how we decide what to simulate given the scant detail in descriptions, and what inferences are made possible beyond what is explicitly stated.

All this is AI-complete you say?  Well by limiting ourselves to study language in isolation, we may have come to the end of the line where the ~80% accuracy limit of machine learning based computational linguistics (on almost any linguistic problem you can think of) is preventing us from building truly transformative applications.  Maybe we are shooting ourselves in the foot, and maybe, just maybe, some problems that look difficult right now are difficult not because we are missing the right machine learning algorithm or sufficient labeled data but because we are ignoring the constraints imposed by the meaning side of things.  We may have finally run out of options other than to try and crack the real problem, i.e. modeling what utterances are ABOUT.

Full post...

## February 27, 2013

### Bret Victor

Bret Victor's inspirational talk with his views on (1) how to flourish fragile ideas, and (2) how to live your life. For more from Bret, check out his website.
Full post...

### Vi Hart

For more from my 7 year old daughter's new favorite educator Vi Hart check out Khan Academy, YouTube, other videos, Wikipedia, or her blog.
Full post...

## January 25, 2013

### A Mathematician's Lament by Paul Lockhart

This is a wonderful and heartfelt book about the poetry of mathematics and how we fail to introduce it to students at schools and to society in general. I can finally stop feeling guilty about my obsession with popular math books and cool problems even though I know they probably have no practical value to me or to society whatsoever. After all it is not practical value that makes reading poetry a pleasure! I am thankful to all my wonderful math teachers who bequeathed me this guilty pleasure without my knowledge or consent in spite of the horrible curriculum, text books, and test system. And I highly recommend the author's second book "Measurement" to those who seek an alternative to the typical 10 pound high school textbooks full of repetitive and unimaginative problems.

Full post... Related link