What we did about expired links on HN

Arc doesn't cause the expired link issue. Arc is just a Lisp that lets
you program however you want. The HN software is written in a style
that uses closures to remember what a user is up to across multiple
web requests. I'll describe a classic example of this, then how we
changed it a few months ago. While there are still some expired link
errors--primarily when we restart the server process--there are vastly
fewer than there were.

When you click "More" at the bottom of a page, HN shows you the next
30 stories. These are different for logged-in users--they're affected
by profile settings like "showdead"--so, if you're logged in, the
system needs to compute the next 30 stories to show you. Which 30
those are depends on the ones you were just looking at. So the server
needs to know what you were just looking at in order to handle the
"More" request correctly. How can it know that? There are many ways.
HN's traditional way is to make a closure (a function that remembers
any info it needs) at the time that it's generating the original page,
which, when called, will compute the correct next 30 stories.
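
To make that concrete, here is a rough sketch in Python (not Arc, and not HN's actual code). The names `render_page` and `visible`, and the dict-based stories and user, are made up for illustration. The point is only that the "More" closure captures whatever it needs from the scope it was created in.

```python
PAGE_SIZE = 30

def visible(story, user):
    # Hypothetical filter: dead stories appear only if the user has showdead on.
    return (not story["dead"]) or user.get("showdead", False)

def render_page(stories, user, start=0):
    """Return (stories shown on this page, closure that computes the next page)."""
    shown = []
    i = start
    while i < len(stories) and len(shown) < PAGE_SIZE:
        if visible(stories[i], user):
            shown.append(stories[i])
        i += 1

    def more():
        # stories, user, and i are simply captured from the enclosing scope;
        # nothing has to be packed up or reconstructed for the next request.
        return render_page(stories, user, start=i)

    return shown, more
```

Calling `more()` later picks up exactly where this page stopped, for this particular user, without any extra bookkeeping code.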

The advantage is that, at the time you're making the closure, all the
information you need to process the next request correctly is right
there in scope. You don't have to remember it, reconstitute it, or
anything--all of which takes extra code, and different code for each
kind of request. You just say in the simplest way what you want the
next request to do, if and when the user makes it. Since that's
usually very similar to what you've just done (e.g. "these 30 stories
instead of those 30"), it may only take a line or two of code. That's
a huge win for simplicity.

You can't send the closure directly to the browser, so you make a
unique ID instead, save the closure in a table keyed by that ID, and
put the ID in a link. When the user clicks that link, the server gets
the ID, looks up the closure, and executes it.
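
A minimal sketch of that ID table, again in Python rather than Arc. The names (`closures`, `new_fnid`, `handle_request`) and the `/continue?id=` URL shape are assumptions for illustration, not HN's real ones.

```python
import secrets

closures = {}  # unique ID -> saved closure

def new_fnid(fn):
    """Save a closure and return the unguessable ID it is stored under."""
    fnid = secrets.token_urlsafe(8)
    closures[fnid] = fn
    return fnid

def more_link(fn):
    # Hypothetical URL shape: only the ID travels to the browser.
    return "/continue?id=" + new_fnid(fn)

def handle_request(fnid):
    """Look up the closure for an incoming ID and run it, or return None."""
    fn = closures.get(fnid)
    if fn is None:
        return None
    return fn()
```

The browser never sees the closure itself, only an opaque ID that means something to this one server process.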

If you're saving lots of closures, eventually you'll run out of RAM if
you don't garbage-collect some. So the HN server periodically prunes
the oldest ones. If one of those links is still open in a browser
window somewhere and the user clicks on it, the server will no longer
remember what to do. That is when HN says "Unknown or expired link".
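
Here is a sketch of how pruning leads to that message. One simplification, and it is an assumption: this version caps the table at a fixed number of entries and drops the oldest on each save, instead of pruning periodically the way the HN server does. `ClosureTable` and its methods are hypothetical names.

```python
from collections import OrderedDict

class ClosureTable:
    """Size-capped table of closures; the oldest entries are pruned first."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = OrderedDict()  # insertion order doubles as age

    def save(self, fnid, fn):
        self.entries[fnid] = fn
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # discard the oldest closure

    def call(self, fnid):
        fn = self.entries.get(fnid)
        if fn is None:
            # The link outlived its closure, so the server has nothing to run.
            return "Unknown or expired link"
        return fn()
```

With a cap of 2, saving three closures means the first one's link is already dead by the time anyone clicks it. The more closures the site creates per minute, the shorter every link's lifetime.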

As HN grew larger, there were many more of these closures, and the
odds were higher that someone would try to use one after it had been
pruned. A few months ago, since we were working on that part of the
code anyway, we decided to eliminate the most common cases. We
measured how often all the different types of closure were
created--there are a few dozen different kinds, IIRC--and replaced
the most common ones with more traditional ways of passing state back
to the server, like query strings and hidden form fields. We did that
until the total number of closures being created was down by an order
of magnitude, and then--equally importantly--we stopped. Now the
system has enough RAM to remember the vast majority of closures again,
and the "expired link" errors have dwindled to a tiny fraction of what
they were before. (The one unfortunate exception to this is when we
restart the server process. Then *all* the closures are lost,
regardless of how much RAM we have to cache them.)
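
For contrast, here is a sketch of the query-string style applied to a "More" link, in Python. The `start` parameter, the `/news` path, and the function names are all assumptions for illustration; HN's actual parameters may differ. Nothing is stored on the server between requests, so nothing can expire, but the handler has to parse, validate, and rebuild its state by hand.

```python
from urllib.parse import urlencode, parse_qs

PAGE_SIZE = 30

def visible(story, user):
    # Hypothetical filter: dead stories appear only if the user has showdead on.
    return (not story["dead"]) or user.get("showdead", False)

def more_href(next_start):
    # All state needed for the next request is serialized into the URL.
    return "/news?" + urlencode({"start": next_start})

def handle_more(query_string, stories, user):
    """Rebuild the paging state from the query string, then compute the page."""
    params = parse_qs(query_string)
    try:
        start = max(0, int(params.get("start", ["0"])[0]))
    except ValueError:
        start = 0  # malformed input: fall back to the first page
    shown = []
    i = start
    while i < len(stories) and len(shown) < PAGE_SIZE:
        if visible(stories[i], user):
            shown.append(stories[i])
        i += 1
    return shown, more_href(i)
```

This link keeps working after pruning or a server restart, at the cost of the parsing and validation code that a closure would have made unnecessary.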

I was the person who changed the code to not use closures for "More"
links and the next most common cases, and it gave me a new
appreciation for how powerful the closure technique is. Getting rid of
those closures was a pain, and it made the code more complicated.

One reason the closure technique is so powerful is that you're
leveraging the programming language and runtime to do all your
book-keeping: whatever data is handy, you just reference. The system
keeps track of all the references for you. That's why using things
like query strings and hidden form fields is more complicated: you
have to handle all those details yourself--not to mention serialize
and deserialize them, if you're passing through any format other than
what your program keeps in memory. That is tedious, and when your app
has many kinds of request, the complexity quickly piles up. Of course
there are other kinds of abstraction you can build over this, but
closures are an elegant one--especially in cases where programming
simplicity is more important than scalability, which, of course, is
most cases.

When HN was first being built, this tradeoff in favor of programming
ease was a no-brainer. In fact, it's because of methods like this that
HN exists at all: it wouldn't exist if one person (pg) hadn't
been able to build it in his spare time. The fact that it lasted as a
one-man side project for something like 7 years is evidence of how
powerful its software design is. The closure technique is a major part
of that, and it's still used throughout the codebase. It would be
foolish to try to replace all the remaining closures: the same
tradeoff in favor of programming ease is still the dominant factor in
most of those cases because they are relatively little used.