a guest
Mar 23rd, 2019
I think his point was indeed mainly about deep learning. Linear regression, for example, is a method whose computations we understand quite well. However, even for the ML methods we understand better, it is not entirely clear for which problems they are effective. We have no principled way to reason about why one method should outperform another on a particular problem, which is why you often see people simply try all methods and pick whatever performs best. On top of that, performance also depends heavily on the data people use, and that dependence is not well understood either. In that sense it can feel a bit like pseudo-science, although I agree that some ML methods are easier to reason about than deep learning.
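The "try all methods and keep the winner" workflow can be sketched in a few lines. This is a minimal illustration with made-up synthetic data, comparing ordinary least squares against a hand-rolled k-nearest-neighbour regressor; nothing here predicts in advance which one will win, which is exactly the point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with a mildly nonlinear target: y = sin(x) + noise.
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

# Simple train/test split.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

def linreg_mse():
    # Ordinary least squares with an intercept column.
    A = np.hstack([X_train, np.ones((len(X_train), 1))])
    w, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    pred = np.hstack([X_test, np.ones((len(X_test), 1))]) @ w
    return np.mean((pred - y_test) ** 2)

def knn_mse(k=5):
    # k-nearest-neighbour regression: average the k closest training targets.
    preds = []
    for x in X_test:
        d = np.abs(X_train[:, 0] - x[0])
        preds.append(y_train[np.argsort(d)[:k]].mean())
    return np.mean((np.array(preds) - y_test) ** 2)

# In practice people simply run every candidate and keep the best score,
# without a theory that says which method should win on this data.
print(f"linear regression MSE: {linreg_mse():.3f}")
print(f"k-NN (k=5) MSE:        {knn_mse():.3f}")
```

On this particular data the nonlinear method happens to win, but swap the target function and the ranking can flip, and nothing in the procedure explains why.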
As for gradient descent, we do indeed understand what it does, but what we understand is that it does not work well in general. Gradient descent finds a local minimum, and that local minimum might be much worse than the global minimum; essentially the only guarantee we have is that it finds the global minimum for convex functions. Thus we cannot really explain why gradient descent (or deep learning, for that matter) performs so well in practice: why does it not end up in a bad local minimum? We also have no clear idea for which functions gradient descent is expected to work well, or how we should change things (e.g. the neural network: more layers? more nodes? more or less connectivity?) to make it work better. And this is one of the main problems. He also mentions it in the video: when it works well, is that because the network was designed well, or because the problem was simply not that hard? With the tools we have now, there is essentially no way to tell.
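The local-minimum problem is easy to see on a one-dimensional example. Here is a minimal sketch (the function and starting points are my own illustration, not from the video): f(x) = x^4 - 3x^2 + x has two minima, and which one gradient descent reaches depends entirely on where it starts.

```python
def f(x):
    # Nonconvex function with a bad local minimum near x ≈ 1.13
    # and the global minimum near x ≈ -1.30.
    return x**4 - 3 * x**2 + x

def grad_descent(x, lr=0.01, steps=2000):
    # Plain gradient descent; f'(x) = 4x^3 - 6x + 1.
    for _ in range(steps):
        x -= lr * (4 * x**3 - 6 * x + 1)
    return x

x_bad = grad_descent(0.5)    # converges to the local minimum (x ≈ 1.13)
x_good = grad_descent(-0.5)  # converges to the global minimum (x ≈ -1.30)
print(f(x_bad), f(x_good))   # f(x_bad) ≈ -1.07 is much worse than f(x_good) ≈ -3.51
```

Shifting the start by one unit changes the outcome by a factor of more than three, and nothing in the algorithm itself warns you that this happened; in high-dimensional deep learning we cannot even visualize the landscape to check.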
Therefore I think it is also important for people in machine learning to focus more on understanding why some things (e.g. changes to a network, different data) work well, and less on producing, mostly by trial and error, yet another example of something that works on some arbitrary problem without understanding why. For the latter kind of research you do not need much knowledge (in fact, there already exist many tools that do most of the work for you, and these will only become more advanced). The former kind of research requires much broader knowledge: the kind of rigor that exists in classical machine learning (this relates to the "rigor police" he was talking about), but also knowledge of algorithms and optimization.
It is clearly enticing for students to start building deep learning networks that do (seemingly) impressive things, because it makes them feel powerful (much as programming in Python does, compared to knowing the lower-level details you would encounter in, say, C++). Surely that is more effective for getting certain things done, but when something goes wrong, or you are trying to understand some weird behavior, you need broader knowledge to fall back on. That is why I believe students (especially honors students, who should be the leaders of the future) should be equipped with that broader knowledge. Machine learning can play an important role in the future, but we should give future engineers the toolset they need to lift machine learning (and especially deep learning) to the next, more scientific, level.