What we did about expired links on HN

Posted by a guest, Oct 7th, 2014

Arc doesn't cause the expired link issue. Arc is just a Lisp that lets
you program however you want. The HN software is written in a style
that uses closures to remember what a user is up to across multiple
web requests. I'll describe a classic example of this, then how we
changed it a few months ago. While there are still some expired link
errors--primarily when we restart the server process--there are vastly
fewer than there were.

When you click "More" at the bottom of a page, HN shows you the next
30 stories. These are different for logged-in users--they're affected
by profile settings like "showdead"--so, if you're logged in, the
system needs to compute the next 30 stories to show you. Which 30
those are depends on the ones you were just looking at. So the server
needs to know what you were just looking at in order to handle the
"More" request correctly. How can it know that? There are many ways.
HN's traditional way is to make a closure (a function that remembers
any info it needs) at the time that it's generating the original page,
which, when called, will compute the correct next 30 stories.

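To make the idea concrete, here is a minimal sketch in Python rather
than Arc. It is not HN's actual code: the story dictionaries, the
visible_stories helper, and make_more_closure are all made-up names
for illustration.

```python
from typing import Callable, List

PAGE_SIZE = 30  # stories per page, as described above


def visible_stories(stories: List[dict], showdead: bool) -> List[dict]:
    """Filter the ranked story list according to one user's settings."""
    return [s for s in stories if showdead or not s["dead"]]


def make_more_closure(stories: List[dict], showdead: bool,
                      start: int) -> Callable[[], List[dict]]:
    """Build the function that will serve this user's next page.

    Everything it needs (the story list, the user's showdead setting,
    where the current page ended) is captured from the enclosing scope.
    """
    def more() -> List[dict]:
        visible = visible_stories(stories, showdead)
        return visible[start:start + PAGE_SIZE]
    return more


# Hypothetical data: 100 ranked stories, every tenth one dead.
stories = [{"id": i, "dead": i % 10 == 0} for i in range(1, 101)]

# While rendering page one, also build the closure for the "More" link.
first_page = visible_stories(stories, showdead=False)[:PAGE_SIZE]
more = make_more_closure(stories, showdead=False, start=PAGE_SIZE)

# Later, if and when the user clicks "More":
next_page = more()
```

Note that the closure takes no arguments: by the time it runs, it
already knows everything it needs.
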
The advantage is that, at the time you're making the closure, all the
information you need to process the next request correctly is right
there in scope. You don't have to remember it, reconstitute it, or
anything--all of which takes extra code, and different code for each
kind of request. You just say in the simplest way what you want the
next request to do, if and when the user makes it. Since that's
usually very similar to what you've just done (e.g. "these 30 stories
instead of those 30"), it may only take a line or two of code. That's
a huge win for simplicity.

You can't send the closure directly to the browser, so you make a
unique ID instead, save the closure in a table keyed by that ID, and
put the ID in a link. When the user clicks that link, the server gets
the ID, looks up the closure, and executes it.

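The ID-and-table step might look roughly like this in Python. This is
a sketch under assumptions, not the HN implementation: the
/continue?id= URL and the names register, link_for, and handle_click
are invented for the example.

```python
import secrets
from typing import Callable, Dict

# Server-side table: unique ID -> closure waiting to be called.
closures: Dict[str, Callable[[], str]] = {}


def register(fn: Callable[[], str]) -> str:
    """Store a closure and return the unguessable ID it is keyed by."""
    fid = secrets.token_urlsafe(8)
    closures[fid] = fn
    return fid


def link_for(fn: Callable[[], str], text: str) -> str:
    """Render a link whose only payload is the closure's ID."""
    fid = register(fn)
    return f'<a href="/continue?id={fid}">{text}</a>'  # hypothetical URL


def handle_click(fid: str) -> str:
    """Look up the closure for an incoming ID and run it."""
    fn = closures.get(fid)
    if fn is None:
        return "Unknown or expired link"
    return fn()


# Usage: the page is rendered with a link; a click sends the ID back.
fid = register(lambda: "stories 31-60")
response = handle_click(fid)
html_link = link_for(lambda: "stories 61-90", "More")
```

The browser only ever sees an opaque ID. All the real state stays in
the server's memory, which is where the trouble described next comes
from.
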
If you're saving lots of closures, eventually you'll run out of RAM if
you don't garbage-collect some. So the HN server periodically prunes
the oldest ones. If one of those links is still open in a browser
window somewhere and the user clicks on it, the server will no longer
remember what to do. That is when HN says "Unknown or expired link".

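Here is a small Python sketch of how pruning leads to that error. The
size-based policy and the tiny MAX_CLOSURES limit are assumptions
chosen to keep the demo short; the text above only says that the
oldest closures are pruned periodically.

```python
import itertools
from collections import OrderedDict
from typing import Callable

MAX_CLOSURES = 3  # assumed limit, deliberately tiny for the demo

# Insertion-ordered table, so the oldest closure is always first.
closures: "OrderedDict[str, Callable[[], str]]" = OrderedDict()
_ids = itertools.count(1)


def register(fn: Callable[[], str]) -> str:
    """Store a closure under a fresh ID (sequential here for clarity)."""
    fid = f"fn{next(_ids)}"
    closures[fid] = fn
    return fid


def prune() -> None:
    """Drop the oldest closures until the table fits the limit."""
    while len(closures) > MAX_CLOSURES:
        closures.popitem(last=False)  # last=False pops the oldest entry


def handle_click(fid: str) -> str:
    """Run the closure for this ID, or report that it is gone."""
    fn = closures.get(fid)
    return fn() if fn is not None else "Unknown or expired link"


# Five pages get rendered, each registering one closure.
ids = [register(lambda n=n: f"page {n}") for n in range(1, 6)]
prune()  # in a real server this would run periodically

old_click = handle_click(ids[0])  # link left open in an old tab
new_click = handle_click(ids[4])  # recently generated link
```

After the prune, the two oldest IDs no longer resolve, so the click on
the old tab gets the error while the recent link still works.
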
As HN grew larger, there were many more of these closures, and the
odds were higher that someone would try to use one after it had been
pruned. A few months ago, since we were working on that part of the
code anyway, we decided to eliminate the most common cases. We
measured how often all the different types of closure were
created--there are a few dozen different kinds, IIRC--and replaced
the most common ones with more traditional ways of passing state back
to the server, like query strings and hidden form fields. We did that
until the total number of closures being created was down by an order
of magnitude, and then--equally importantly--we stopped. Now the
system has enough RAM to remember the vast majority of closures again,
and the "expired link" errors have dwindled to a tiny fraction of what
they were before. (The one unfortunate exception to this is when we
restart the server process. Then *all* the closures are lost,
regardless of how much RAM we have to cache them.)

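For comparison, here is a rough Python sketch of the query-string
approach for a "More" link. The /news path, the start parameter, and
the function names are assumptions for illustration, not the URLs or
code HN actually uses.

```python
from typing import List
from urllib.parse import parse_qs, urlencode, urlparse

PAGE_SIZE = 30


def more_link(start: int) -> str:
    """Put the state in the URL itself instead of in server memory."""
    return "/news?" + urlencode({"start": start})


def handle_more(url: str, stories: List[dict],
                showdead: bool) -> List[dict]:
    """Rebuild the page from the query string on every request."""
    query = parse_qs(urlparse(url).query)
    try:
        start = max(0, int(query.get("start", ["0"])[0]))
    except ValueError:
        start = 0  # untrusted input: fall back to the first page
    visible = [s for s in stories if showdead or not s["dead"]]
    return visible[start:start + PAGE_SIZE]


# Hypothetical data: 100 ranked stories, every tenth one dead.
stories = [{"id": i, "dead": i % 10 == 0} for i in range(1, 101)]

link = more_link(PAGE_SIZE)
page = handle_more(link, stories, showdead=False)
```

A link like this can't expire, because the server stores nothing for
it. The price is the parsing and validation code, which the closure
version never needed.
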
I was the person who changed the code to not use closures for "More"
links and the next most common cases, and it gave me a new
appreciation for how powerful the closure technique is. Getting rid of
those closures was a pain, and it made the code more complicated.

One reason the closure technique is so powerful is that you're
leveraging the programming language and runtime to do all your
book-keeping: whatever data is handy, you just reference. The system
keeps track of all the references for you. That's why using things
like query strings and hidden form fields is more complicated: you
have to handle all those details yourself--not to mention serialize
and deserialize them, if you're passing through any format other than
what your program keeps in memory. That is tedious, and when your app
has many kinds of request, the complexity quickly piles up. Of course
there are other kinds of abstraction you can build over this, but
closures are an elegant one--especially in cases where programming
simplicity is more important than scalability, which, of course, is
most cases.

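To give a feel for the serialize/deserialize step, here is a Python
sketch that round-trips a small piece of state through a hidden form
field. The field name, the JSON encoding, and the state keys are all
assumptions made for the example.

```python
import html
import json


def hidden_field(name: str, state: dict) -> str:
    """Serialize state to JSON and embed it in a hidden form field."""
    payload = json.dumps(state, sort_keys=True)
    escaped = html.escape(payload, quote=True)  # safe inside an attribute
    return f'<input type="hidden" name="{name}" value="{escaped}">'


def restore_state(submitted: str) -> dict:
    """Deserialize the submitted value and coerce it to expected types.

    Raises ValueError or KeyError if the value is malformed, which the
    caller has to handle, since the browser can send anything.
    """
    state = json.loads(submitted)
    return {"start": int(state["start"]),
            "showdead": bool(state["showdead"])}


state = {"start": 30, "showdead": False}
field = hidden_field("state", state)

# Simulate the browser: it un-escapes the attribute value on submit.
raw_value = field.split('value="')[1].rsplit('"', 1)[0]
submitted = html.unescape(raw_value)
restored = restore_state(submitted)
```

Every kind of request needs its own version of this encode, escape,
decode, and validate cycle. With a closure, the same two values would
simply be referenced from scope.
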
When HN was first being built, this tradeoff in favor of programming
ease was a no-brainer. In fact, it's because of methods like this that
HN exists at all, because HN wouldn't exist if one person (pg) hadn't
been able to build it in his spare time. The fact that it lasted as a
one-man side project for something like 7 years is evidence of how
powerful its software design is. The closure technique is a major part
of that, and it's still used throughout the codebase. It would be
foolish to try to replace every use of it: the same tradeoff in favor
of programming efficiency is still the dominant factor in most of
those cases, because they are used relatively rarely.