- If you are an undergrad, take as many math and physics course as you can, and learn to program. If you are an aspiring grad student: apply to schools where there is someone you want to work with. It's much more important that the ranking of the school (as long as the school is in the top 50).
- Read, learn from on-line material, try things for yourself. As Feynman said: don't read everything about a topic before starting to work on it. Think about the problem for yourself, figure out what's important, then read the literature. This will allow you to interpret the literature and tell what's good from what's bad.
- Don't get fooled by people who claim to have a solution to Artificial General Intelligence, who claim to have AI systems that work "just like the human brain", or who claim to have figured out how the brain works (well, except if it's Geoff Hinton making the claim). Ask them what error rate they get on MNIST or ImageNet.
- We are using Torch for most of our research projects (and some of our development projects) at Facebook. Deep Mind is also using Torch in a big way (largely because my former student and Torch-co-maintainer Koray Kavukcuoglu sold them on it). Since the Deep Mind acquisition, folks in the Google Brain group in Mountain View have also started to use it.
- Kernel methods are great for many purposes, but they are merely glorified template matching. Despite the beautiful math, a kernel machine is nothing more than one layer of template matchers (one per training sample) where the templates are the training samples, and one layer of linear combinations on top.
- There is nothing magical about margin maximization. It's just another way of saying "L2 regularization" (despite the cute math).
- There is no opposition between deep learning and graphical models. Many deep learning approaches can be seen as factor graphs. I posted about this in the past.
- (On most promising current areas of research): Representation learning (the current crop of deep learning methods is just one way of doing it); learning long-term dependencies; marrying representation learning with structured prediction and/or reasoning; unsupervised representation learning, particularly prediction-based methods for temporal/sequential signals; marrying representation learning and reinforcement learning; using learning to speed up the solution of complex inference problems; theory: do theory (any theory) on deep learning/representation learning; understanding the landscape of objective functions in deep learning; in terms of applications: natural language understanding (e.g. for machine translation), video understanding; learning complex control.
- (On HTM): Jeff Hawkins has the right intuition and the right philosophy. Some of us have had similar ideas for several decades. Certainly, we all agree that AI systems of the future will be hierarchical (it's the very idea of deep learning) and will use temporal prediction. But the difficulty is to instantiate these concepts and reduce them to practice. Another difficulty is grounding them on sound mathematical principles (is this algorithm minimizing an objective function?). I think Jeff Hawkins, Dileep George and others greatly underestimated the difficulty of reducing these conceptual ideas to practice. As far as I can tell, HTM has not been demonstrated to get anywhere close to state of the art on any serious task.
- Do you think having a PhD is important if you want to work in a good research team in the industry?