- Arc doesn't cause the expired link issue. Arc is just a Lisp that lets
- you program however you want. The HN software is written in a style
- that uses closures to remember what a user is up to across multiple
- web requests. I'll describe a classic example of this, then how we
- changed it a few months ago. While there are still some expired link
- errors--primarily when we restart the server process--there are vastly
- fewer than there were.
- When you click "More" at the bottom of a page, HN shows you the next
- 30 stories. These are different for logged-in users--they're affected
- by profile settings like "showdead"--so, if you're logged in, the
- system needs to compute the next 30 stories to show you. Which 30
- those are depends on the ones you were just looking at. So the server
- needs to know what you were just looking at in order to handle the
- "More" request correctly. How can it know that? There are many ways.
- HN's traditional way is to make a closure (a function that remembers
- any info it needs) at the time that it's generating the original page,
- which, when called, will compute the correct next 30 stories.
- The advantage is that, at the time you're making the closure, all the
- information you need to process the next request correctly is right
- there in scope. You don't have to remember it, reconstitute it, or
- anything--all of which takes extra code, and different code for each
- kind of request. You just say in the simplest way what you want the
- next request to do, if and when the user makes it. Since that's
- usually very similar to what you've just done (e.g. "these 30 stories
- instead of those 30"), it may only take a line or two of code. That's
- a huge win for simplicity.
- You can't send the closure directly to the browser, so you make a
- unique ID instead, save the closure in a table keyed by that ID, and
- put the ID in a link. When the user clicks that link, the server gets
- the ID, looks up the closure, and executes it.
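
To make the mechanism above concrete, here is a minimal sketch in Python rather than Arc. It is not the HN code: `STORIES`, `render_page`, `save_closure`, `handle_link`, and the `/more?fnid=` URL shape are all hypothetical names chosen for illustration, and it ignores per-user settings like "showdead".

```python
import secrets
from typing import Callable, Dict

# Stand-in data; a real site would load stories from storage (assumption).
STORIES = [f"story {n}" for n in range(1, 101)]
PAGE_SIZE = 30

# Table of saved closures, keyed by the unique ID that goes into the link.
closures: Dict[str, Callable[[], str]] = {}


def save_closure(fn: Callable[[], str]) -> str:
    """Store fn under a fresh random ID and return that ID."""
    fnid = secrets.token_urlsafe(8)
    closures[fnid] = fn
    return fnid


def render_page(start: int) -> str:
    """Render 30 stories plus a "More" link backed by a closure."""
    shown = STORIES[start:start + PAGE_SIZE]
    next_start = start + PAGE_SIZE
    # The closure captures next_start; nothing has to be serialized.
    fnid = save_closure(lambda: render_page(next_start))
    return "\n".join(shown + [f"/more?fnid={fnid}"])


def handle_link(fnid: str) -> str:
    """Look up the closure for a clicked link and run it."""
    fn = closures.get(fnid)
    if fn is None:
        return "Unknown or expired link."
    return fn()
```

Calling `render_page(0)` produces the first page; passing the ID from its last line to `handle_link` produces the next 30 stories, and each page rendered this way leaves one more closure in the table.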
- If you're saving lots of closures, eventually you'll run out of RAM if
- you don't garbage-collect some. So the HN server periodically prunes
- the oldest ones. If one of those links is still open in a browser
- window somewhere and the user clicks on it, the server will no longer
- remember what to do. That is when HN says "Unknown or expired link".
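
Here is a rough sketch of how pruning leads to that error, again in Python and under stated assumptions. `ClosureTable` and `max_size` are hypothetical; this version prunes on every save rather than periodically, and uses sequential IDs only to keep the example readable.

```python
from collections import OrderedDict
from typing import Callable


class ClosureTable:
    """Keeps at most max_size closures, dropping the oldest first."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._table: "OrderedDict[str, Callable[[], str]]" = OrderedDict()
        self._counter = 0

    def save(self, fn: Callable[[], str]) -> str:
        self._counter += 1
        fnid = f"f{self._counter}"  # a real server would use unguessable IDs
        self._table[fnid] = fn
        self.prune()
        return fnid

    def prune(self) -> None:
        # Insertion order is age order, so pop from the front.
        while len(self._table) > self.max_size:
            self._table.popitem(last=False)

    def call(self, fnid: str) -> str:
        fn = self._table.get(fnid)
        if fn is None:
            # The link outlived its closure.
            return "Unknown or expired link."
        return fn()


table = ClosureTable(max_size=2)
first = table.save(lambda: "page 2")
second = table.save(lambda: "page 3")
third = table.save(lambda: "page 4")  # pushes out the closure behind `first`
```

After the third save, `table.call(first)` returns the expired-link message while `second` and `third` still work. The more closures created per minute, the shorter the time any one link stays valid for a fixed table size.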
- As HN grew larger, there were many more of these closures, and the
- odds were higher that someone would try to use one after it had been
- pruned. A few months ago, since we were working on that part of the
- code anyway, we decided to eliminate the most common cases. We
- measured how often all the different types of closure were
- created--there are a few dozen different kinds, IIRC--and replaced
- the most common ones with more traditional ways of passing state back
- to the server, like query strings and hidden form fields. We did that
- until the total number of closures being created was down by an order
- of magnitude, and then--equally importantly--we stopped. Now the
- system has enough RAM to remember the vast majority of closures again,
- and the "expired link" errors have dwindled to a tiny fraction of what
- they were before. (The one unfortunate exception to this is when we
- restart the server process. Then *all* the closures get pruned,
- regardless of how much RAM you have to cache them.)
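
For contrast, a query-string version of a "More" link might look roughly like the sketch below. The `/news` path, the `start` parameter, and the function names are assumptions made for illustration, not what HN actually uses, and per-user settings are again left out.

```python
from typing import List
from urllib.parse import parse_qs, urlencode

# Stand-in data; a real site would load stories from storage (assumption).
STORIES = [f"story {n}" for n in range(1, 101)]
PAGE_SIZE = 30


def more_link(next_start: int) -> str:
    """Put the state the next request needs directly in the URL."""
    return "/news?" + urlencode({"start": next_start})


def handle_news(query_string: str) -> List[str]:
    """Rebuild the state from the query string; nothing is stored server-side."""
    params = parse_qs(query_string)
    try:
        start = int(params.get("start", ["0"])[0])
    except ValueError:
        start = 0  # user-supplied input has to be validated by hand
    start = max(start, 0)
    return STORIES[start:start + PAGE_SIZE]
```

This kind of link never expires and survives a server restart, but the handler now has to parse and validate the state itself, and every other kind of request needs its own version of that code.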
- I was the person who changed the code to not use closures for "More"
- links and the next most common cases, and it gave me a new
- appreciation for how powerful the closure technique is. Getting rid of
- those closures was a pain, and it made the code more complicated.
- One reason the closure technique is so powerful is that you're
- leveraging the programming language and runtime to do all your
- book-keeping: whatever data is handy, you just reference. The system
- keeps track of all the references for you. That's why using things
- like query strings and hidden form fields is more complicated: you
- have to handle all those details yourself--not to mention serialize
- and deserialize them, if you're passing through any format other than
- what your program keeps in memory. That is tedious, and when your app
- has many kinds of request, the complexity quickly piles up. Of course
- there are other kinds of abstraction you can build over this, but
- closures are an elegant one--especially in cases where programming
- simplicity is more important than scalability, which, of course, is
- most cases.
- When HN was first being built, this tradeoff in favor of programming
- ease was a no-brainer. In fact, it's because of methods like this that
- HN exists at all, because HN wouldn't exist if one person (pg) hadn't
- been able to build it in his spare time. The fact that it lasted as a
- one-man side project for something like 7 years is evidence of how
- powerful its software design is. The closure technique is a major part
- of that, and it's still used throughout the codebase. It would be
- foolish to try to replace every remaining closure: the same tradeoff
- in favor of programming efficiency is still the dominant factor in
- most of those cases because they are relatively little used.