January 03, 2011

Plausibility vs. Inference

Here are two examples of common sense judgement:

(1) I saw the Statue of Liberty flying over New York. => I was flying over New York.
(2) I gave the book to Mary. => Mary has the book.

The first one involves syntactic disambiguation. Either the subject or the object could be doing the flying (consider "I saw the airplane flying over New York"). Our "common sense" tells us that the Statue is too big to fly, so I am the more likely one to be flying.

The second one involves a semantic inference. The meaning of "give" involves a physical transfer or a transfer of possession; as a consequence, the item given ends up with the recipient. Our "common sense" is full of such little factoids (here is another: "Oswald killed Kennedy." => "Kennedy is dead.") which let us see beyond what is explicitly stated in the text.

I want to emphasize that these examples are qualitatively different and calling them both "common sense judgements" may be confusing. The first one is a plausibility judgement (which is more likely to fly: me or the statue?). The second one is an exact inference, i.e. "give" definitely causes transfer and "kill" definitely causes death. To solve the first one we need a model of what is more likely to be happening in the world. To solve the second one we need more traditional inference of what entails what.
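To make the contrast concrete, here is a minimal sketch, not a real system. The scores in PLAUSIBILITY and the rules in ENTAILS are invented for illustration, and the function names are hypothetical. The point is only that the first kind of judgement compares graded scores, while the second looks up a rule that either holds or does not.

```python
# Toy illustration only: every number and rule below is made up.

# Plausibility: a graded score for how likely an agent is to do an action.
PLAUSIBILITY = {
    ("person", "fly"): 0.3,               # e.g. as an airplane passenger
    ("airplane", "fly"): 0.9,
    ("Statue of Liberty", "fly"): 0.001,
}

# Exact inference: a consequence that holds whenever the verb applies.
ENTAILS = {
    "give": "recipient has the item",
    "kill": "victim is dead",
}


def more_plausible_agent(candidates, action):
    """Return the candidate with the highest plausibility score for the action."""
    return max(candidates, key=lambda c: PLAUSIBILITY.get((c, action), 0.0))


def entailed(verb):
    """Return the consequence entailed by the verb, or None if no rule is known."""
    return ENTAILS.get(verb)


print(more_plausible_agent(["person", "Statue of Liberty"], "fly"))  # person
print(entailed("give"))  # recipient has the item
```

Note that the plausibility answer depends on the candidates being compared: with "airplane" in place of the Statue, the same function would pick the airplane, whereas the entailment lookup never changes with context.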

Disambiguation problems in computational linguistics (word sense disambiguation, resolving syntactic ambiguities, etc.) rely on plausibility judgements, not exact inference. A lot of work in AI "common sense reasoning" will not help there because work on reasoning and inference has traditionally focused on exact judgements.

As far as I can see, nobody is working on plausibility judgements explicitly. Researchers use corpus statistics as a proxy to solve disambiguation problems. This may be obscuring the real issue: I think the right way to do linguistic disambiguation is to have a model of what is plausible in the world.
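For readers unfamiliar with the corpus-statistics proxy, here is a rough sketch of the idea under heavy assumptions. The (subject, verb) pairs are a hypothetical toy corpus invented for this example, and add-alpha smoothing is just one simple choice of estimator, not a claim about what any particular system uses.

```python
from collections import Counter

# Hypothetical toy corpus of (subject, verb) pairs, invented for illustration.
PAIRS = [
    ("I", "fly"), ("I", "see"),
    ("airplane", "fly"), ("airplane", "fly"),
    ("bird", "fly"),
    ("statue", "stand"),
]

pair_counts = Counter(PAIRS)
subject_counts = Counter(subject for subject, _ in PAIRS)
VERBS = {verb for _, verb in PAIRS}


def p_verb_given_subject(subject, verb, alpha=1.0):
    """Add-alpha smoothed estimate of P(verb | subject) from the toy counts."""
    numerator = pair_counts[(subject, verb)] + alpha
    denominator = subject_counts[subject] + alpha * len(VERBS)
    return numerator / denominator


def likelier_flyer(a, b):
    """Pick whichever subject the counts say is more likely to fly."""
    if p_verb_given_subject(a, "fly") >= p_verb_given_subject(b, "fly"):
        return a
    return b


print(likelier_flyer("I", "statue"))  # I
```

The counts do pick the right reading here, but only because "statue" happens never to occur with "fly" in the sample. Nothing in the model says why a statue cannot fly, which is the sense in which the statistics stand in for a model of the world rather than being one.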

3 comments:

Unknown said...

So do we need something new, like the Cyc reasoning engine with added plausibility inference mechanisms, or is a radically new approach required? I remember that you had some criticisms of the Cyc approach, and I wonder whether it would be possible to build a plausibility database / engine in an open-source, collaborative style (for example a system like http://www.freebase.com, where people can enter, edit, and vote on facts and relationships, combined with algorithms developed by researchers and experts that consume that data).

Deniz Yuret said...

I think inference engines in general are designed to solve exact inference, i.e. things we are sure about. Things like: when you go to a restaurant, you are in the restaurant. In contrast, by plausibility I mean things like: when you go to a restaurant, you probably eat something. Now this is not always true: sometimes you just drink tea, or talk to a friend, or look at the menu and get out. Any sound inference engine will try to get at things that are always true, and therefore will not necessarily be able to represent plausibility.

I don't know the ideal way to collect and represent plausibility information. I just have some thoughts:

(1) Schank's schemas were supposed to solve this problem, I think, but like every manually built KB they are bound to be wildly incomplete.

(2) Statistical language models built on billions of words of text may give us some plausibility judgements. The two downsides are (i) obvious things are usually not stated in the text, and (ii) n-gram models are hopelessly short-range and myopic.

Crowdsourcing may be a way to make the manual approach more complete; however, I am not very hopeful. Plausibility is a LOT richer than inference (compare the set of everything that is likely to happen vs. the much smaller set of things we are sure will happen), and I have yet to see a semi-complete set of facts for inference. I would bet my money on some kind of statistical corpus-based approach, at least to get going, but it would have to be a bit smarter than word n-gram models and solve the "obvious things not in text" problem.

yusufarslan said...

It is interesting to think about a stochastic engine that can estimate the probability of relations between entities using data gathered from the internet.

With an algorithm that measures the distance between keywords used in everyday conversations (for example on Facebook, Twitter, etc.), the relations should be estimable. For the blind spots (obvious things not in text), an enriching database could be developed that complements the probabilities.

I think that a big challenge is the context sensitivity of plausibility. (A photographer goes to a restaurant to take photos and a cook goes to a restaurant to ...)

Interesting post!