Untitled

After a several year gap, I finally took another week-long programming retreat, where I could work in hermit mode, away from the normal press of work. My wife has been generously offering it to me the last few years, but I’m generally bad at taking vacations from work.
As a change of pace from my current Oculus work, I wanted to write some from-scratch-in-C++ neural network implementations, and I wanted to do it with a strictly base OpenBSD system. Someone remarked that is a pretty random pairing, but it worked out ok.
Despite not having actually used it, I have always been fond of the idea of OpenBSD — a relatively minimal and opinionated system with a cohesive vision and an emphasis on quality and craftsmanship. Linux is a lot of things, but cohesive isn’t one of them.
I’m not a Unix geek. I get around ok, but I am most comfortable developing in Visual Studio on Windows. I thought a week of full immersion work in the old school Unix style would be interesting, even if it meant working at a slower pace. It was sort of an adventure in retro computing — this was fvwm and vi. Not vim, actual BSD vi.
In the end, I didn’t really explore the system all that much, with 95% of my time in just the basic vi / make / gdb operations. I appreciated the good man pages, as I tried to do everything within the self contained system, without resorting to internet searches. Seeing references to 30+ year old things like Tektronix terminals was amusing.
I was a little surprised that the C++ support wasn’t very good. G++ didn’t support C++11, and LLVM C++ didn’t play nicely with gdb. Gdb crashed on me a lot as well, I suspect due to C++ issues. I know you can get more recent versions through ports, but I stuck with using the base system.
In hindsight, I should have just gone full retro and done everything in ANSI C. I do have plenty of days where, like many older programmers, I think “Maybe C++ isn’t as much of a net positive as we assume...”. There is still much that I like, but it isn’t a hardship for me to build small projects in plain C.
Maybe next time I do this I will try to go full emacs, another major culture that I don’t have much exposure to.
I have a decent overview understanding of most machine learning algorithms, and I have done some linear classifier and decision tree work, but for some reason I have avoided neural networks. On some level, I suspect that Deep Learning being so trendy tweaked a little bit of contrarian in me, and I still have a little bit of a reflexive bias against “throw everything at the NN and let it sort it out!”
In the spirit of my retro theme, I had printed out several of Yann LeCun’s old papers and was considering doing everything completely off line, as if I was actually in a mountain cabin somewhere, but I wound up watching a lot of the Stanford CS231N lectures on YouTube, and found them really valuable. Watching lecture videos is something that I very rarely do — it is normally hard for me to feel the time is justified, but on retreat it was great!
I don’t think I have anything particularly insightful to add about neural networks, but it was a very productive week for me, solidifying “book knowledge” into real experience.
I used a common pattern for me: get first results with hacky code, then write a brand new and clean implementation with the lessons learned, so they both exist and can be cross checked.
I initially got backprop wrong both times, comparison with numerical differentiation was critical! It is interesting that things still train even when various parts are pretty wrong — as long as the sign is right most of the time, progress is often made.
I was pretty happy with my multi-layer neural net code; it wound up in a form that I can just drop it into future efforts. Yes, for anything serious I should use an established library, but there are a lot of times when just having a single .cpp and .h file that you wrote ever line of is convenient.
My conv net code just got to the hacky but working phase, I could have used another day or two to make a clean and flexible implementation.
One thing I found interesting was that when testing on MNIST with my initial NN before adding any convolutions, I was getting significantly better results than the non-convolutional NN reported for comparison in LeCun ‘98 — right around 2% error on the test set with a single 100 node hidden layer, versus 3% for both wider and deeper nets back then. I attribute this to the modern best practices —ReLU, Softmax, and better initialization.
This is one of the most fascinating things about NN work — it is all so simple, and the breakthrough advances are often things that can be expressed with just a few lines of code. It feels like there are some similarities with ray tracing in the graphics world, where you can implement a physically based light transport ray tracer quite quickly, and produce state of the art images if you have the data and enough runtime patience.
I got a much better gut-level understanding of overtraining / generalization / regularization by exploring a bunch of training parameters. On the last night before I had to head home, I froze the architecture and just played with hyperparameters. “Training!” Is definitely worse than “Compiling!” for staying focused.
Now I get to keep my eyes open for a work opportunity to use the new skills!
I am dreading what my email and workspace are going to look like when I get into the office tomorrow.