- One interesting thing about deep learning is that even as ever-better results surface, much of what "everyone knew" about NNs keeps turning out to be wrong. A short list of falsified claims (in rough chronological order):
- - "you need to pretrain a NN"
- - "NNs require thousands of datapoints to train"
- - "NNs must be trained by backpropagation"
- - "deep learning will only work for images"
- - "hybrid approaches like SVMs on top of NN features will always work better"
- - "backpropagation in any form is biologically implausible"
- - "CNNs are nothing like the human visual cortex & certainly don't predict its activations"
- - "small NNs can't be trained directly, so NNs need to be big"
- - [style transfer arrives] "Who ordered that?"
- - "simple SGD is the worst update rule"
- - "simple self-supervision like next-frame prediction can't learn semantics"
- - "adversarial examples will be easy to fix and won't transfer, well, won't black-box transfer, well, won't transfer to the real world, well..."
- - [batchnorm arrives] "Oops."
- - "big NNs overfit by memorizing data"
- - "you can't train 1000-layer NNs but that's OK, that wouldn't be useful anyway"
- - "big minibatches don't generalize"
- - "NNs aren't Bayesian at all"
- - "convolutions are only good for images; only LSTM RNNs can do translation/seq2seq/generation/meta-learning"
- - "you need small learning rates, not super-high ones, to get fast training" (superconvergence)
- - "memory/discrete choices aren't differentiable"
- - [CycleGAN arrives] "Who ordered that?"
- - "you can't learn to generate raw audio, it's too low-level"
- - "you need bilingual corpora to learn translation"
- - "NNs can't do zero-shot or few-shot learning"
- - "NNs can't do planning, symbolic reasoning, or deductive logic"
- - "NNs can't do causal reasoning"
- - "pure self-play is unstable and won't work"
- - "you need shortcut connections, not new activations or initializations, to train 1000-layer nets"
- - "learning deep environment models is unstable and won't work"
- - "we need hierarchical RL to learn long-term strategies" (not brute-force PPO)
- - "you can't reuse minibatches for faster training"
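Of the claims above, the superconvergence one is easy to make concrete: the counterintuitive recipe is to spend most of training at a very *large* learning rate, ramping up to a peak and back down over one cycle. Below is a minimal sketch of such a one-cycle schedule; the linear shape and the `max_lr`/`min_lr` values are illustrative assumptions, not the exact recipe from the superconvergence literature.

```python
def one_cycle_lr(step, total_steps, max_lr=1.0, min_lr=0.01):
    """One-cycle learning-rate schedule (illustrative sketch).

    Ramps linearly from min_lr up to max_lr over the first half of
    training, then back down to min_lr over the second half, so most
    steps use a rate far higher than conventional wisdom suggested.
    """
    half = total_steps / 2
    if step <= half:
        frac = step / half                  # 0 -> 1 during warm-up
    else:
        frac = (total_steps - step) / half  # 1 -> 0 during cool-down
    return min_lr + (max_lr - min_lr) * frac

# Peek at the start, peak, and end of a 100-step cycle.
for s in (0, 50, 100):
    print(s, round(one_cycle_lr(s, 100), 3))
```

In practice the same idea is what schedulers like PyTorch's `OneCycleLR` implement (with cosine rather than linear annealing by default); the point of the list entry is just that the peak rate can be an order of magnitude above what "safe" tuning advice recommended.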