October 21, 2005


[Math_ Learning_] The field of machine learning is full of acronyms, and nobody tells you what each method is good for! There are common issues in each machine learning problem: lack of sufficient data, presence of irrelevant and/or redundant attributes. It would be nice to have a review that looked at each ML approach with these in mind.

Here is some observations:

SVD is great because it deals with the sparse data problem in very high dimensional spaces (e.g. histograms). The SVD mapping of samples to a smaller dimensional space makes noisy data points (e.g. histograms based on few data points) align themselves with less noisy ones and allows hidden similarities to surface.

SVM (or other generalized discriminant methods) is great because when you work with discriminants, irrelevant dimensions hurt you less. A nearest neighbor approach can certainly be misled by adding a number of random dimensions that make similar points look far away. In a discriminant approach, if the data is separable in a lower dimensional space, they will certainly remain separable when you add extra irrelevant dimensions.

EM is great because it has the ability to uncover underlying structure. We are not simply talking about classifying points into positive and negative examples. We are talking about taking surface observations and asking what sort of unobserved mechanism gave rise to them. I think this is very little appreciated, as the "hidden" stuff people look for consist of something very simple (like HMM states). But I showed that the "hidden" stuff can be something as complex as a parse tree!

No comments: