Anonymous - BSD Story
a guest, Dec 15th, 2015

Hi BSDNow-Team,

Allan told me to share a story with you, so here it is:

BEGIN OF STORY

I manage a cluster of machines (Dell C6220) at our computer science
department that are intended for compute-intensive jobs in the big
data area (i.e. NoSQL "databases"). I took over the cluster from a
colleague who had it running under Ubuntu 12.04. Now I'm slowly but
steadily converting it to FreeBSD on ZFS, bhyve and other such
niceties. I had established a "bridge-head" node for testing purposes
and it has been running very well so far. Logins to the cluster are
done via the department's LDAP server (which is out of my control);
more on that setup below. The machines in the cluster are pretty
popular with students for their thesis work, with professors for
their research, and with a couple of labs that make use of the
computing resources. At the moment, cluster machines are given
exclusively to users (a policy I am trying to change) for a certain
period of time, to avoid someone else interfering with measurements
and such. As you can imagine, this results in a shortage of nodes
relatively quickly.

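The login part only matters for what comes later, so here is a rough
sketch of what LDAP logins typically look like on the FreeBSD side,
using the classic nss_ldap/pam_ldap combination from ports. The exact
options and file contents below are illustrative, not copied from our
configuration:

    # /etc/nsswitch.conf -- look up users and groups in LDAP after
    # the local files
    passwd: files ldap
    group:  files ldap

    # /etc/pam.d/sshd (auth section) -- let pam_ldap authenticate
    # LDAP users, fall back to pam_unix for local accounts
    auth    sufficient    /usr/local/lib/pam_ldap.so    no_warn try_first_pass
    auth    required      pam_unix.so                   no_warn try_first_pass

The important bit for later: this PAM module is linked against the
libraries that the openldap24-client package provides.
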
Recently, we got a request for a node from a student project in the
middle of the semester. Apparently, they were using a VM from our
central IT department (VMware vSphere with a moderate NetApp filer
for storage) and were having performance problems (not sure what
kind). They asked us whether they could use a node of the cluster for
their project. They sent me their requirements in an email:

OS: Ubuntu 15.04 or Debian 7 or CentOS 7.1 (CentOS preferred)
* minimum 8 GB memory
* at least 320 GB disk space
* at least Intel Xeon E3 (the more dedicated, the better)
* at least 100 MBit network connectivity

This clearly indicates that they had no idea about the real hardware
of even one of our cluster nodes and were probably just citing what
they had previously gotten from central IT. Since a single node of
the cluster already has several times these hardware resources, I
thought it would be a waste to hand them a whole node when their
requirements are that low anyway. So I set up a bhyve VM (my very
first, actually) on my bridge-head FreeBSD node with a 400 GB ZFS
volume. I configured bhyve to have 2 CPUs (out of 8) and 8 GB RAM
(out of 64 GB). After installing and configuring CentOS for them (not
a very pleasant experience if you are used to BSD systems), I gave
them the IP address of the bhyve VM and a local user account. They
were happy with it and have been using that machine for roughly three
weeks now without any complaints about performance whatsoever. As far
as my monitoring can tell, they are using the machine, and the ZVOL
does hold a couple of gigabytes by now (not nearly as much as they
requested, though).

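For the curious, the whole guest setup boiled down to a handful of
commands. What follows is a sketch from memory rather than a paste of
my shell history: the pool name (tank), the network interfaces and
the ISO path are placeholders, and back then a Linux guest still
needed grub2-bhyve from ports to load its kernel:

    # one-time host preparation: hypervisor module and guest networking
    kldload vmm
    ifconfig tap0 create
    ifconfig bridge0 create
    ifconfig bridge0 addm igb0 addm tap0 up

    # a sparse 400 GB ZVOL as the guest disk
    zfs create -s -V 400G tank/centos7

    # load the CentOS kernel from the install ISO (sysutils/grub2-bhyve)
    printf '(hd0) /dev/zvol/tank/centos7\n(cd0) /vms/centos7.iso\n' \
        > /tmp/device.map
    grub-bhyve -m /tmp/device.map -r cd0 -M 8192M centos7

    # boot the guest: 2 vCPUs, 8 GB RAM, virtio disk and NIC,
    # serial console on stdio
    bhyve -c 2 -m 8192M -A -H -P \
        -s 0:0,hostbridge -s 1:0,lpc \
        -s 2:0,virtio-net,tap0 \
        -s 3:0,virtio-blk,/dev/zvol/tank/centos7 \
        -s 4:0,ahci-cd,/vms/centos7.iso \
        -l com1,stdio centos7

After the installer has finished, the same dance is repeated with
grub-bhyve pointed at the installed disk instead of the ISO and the
ahci-cd device dropped.
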
One morning, I decided to upgrade a couple of packages on the
bridge-head system (which is running CURRENT). Next thing I know, the
openldap24-client package gets upgraded fine, but then errors start
appearing saying that libssl.so.8 was not found. Remember that the
users on these nodes are authenticated via LDAP only, so the PAM
module wasn't working anymore, which meant that no one (not even
root) could log into the machine, the very machine that was running
that bhyve VM. Oops!

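If you still have a root shell open when this happens (I did, since I
was the one running the upgrade), the dangling dependency is easy to
spot; the library name below is from memory:

    # the freshly upgraded OpenLDAP client library wants a libssl
    # that this system does not provide (yet)
    ldd /usr/local/lib/libldap-2.4.so.2 | grep "not found"

    # everything that depends on openldap24-client is equally affected
    pkg info -r openldap24-client
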
I figured the best way was to reboot the node, accept the downtime of
the bhyve VM, and try to fix things in single-user mode.
Unfortunately, I had no recent ZFS boot environment to roll back to.
After booting a CURRENT snapshot ISO (great that FreeBSD provides
those!) into the shell environment, I was able to zpool import the
pool with the altroot=/mnt option. Then I chrooted into /mnt,
successfully recompiled the openldap24-client package, and rebooted
again. Luckily, I was able to log in again and even the bhyve VM was
unaffected. No one even complained about the downtime or the fact
that the CentOS guest had been rebooted.

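For anyone who ends up in the same spot, the recovery was roughly the
following (pool and dataset names are the installer defaults, yours
may differ):

    # from the "Shell" option of the -CURRENT snapshot ISO
    zpool import -f -o altroot=/mnt zroot
    zfs mount zroot/ROOT/default      # root dataset is canmount=noauto
    zfs mount -a
    chroot /mnt /bin/sh

    # inside the chroot: rebuild the LDAP client against the libssl
    # that this system actually has
    cd /usr/ports/net/openldap24-client
    make build deinstall reinstall

    exit
    reboot
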
There are certainly better ways to handle this failure scenario, but
it worked nonetheless. I plan to expand this setup further with more
FreeBSD nodes and even more bhyve instances. ZFS and boot
environments (which I now create before running updates; a short
sketch follows below) make disaster recovery very easy and painless.
Also, I found that although bhyve is still young (younger than its
competitors in the hypervisor space), it is mature enough to run VMs
for your users. If a student project has performance problems on a
big vSphere installation and no complaints about performance at all
when running on bhyve, that says a lot.

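Speaking of boot environments: the pre-update ritual on the
bridge-head is now just this (done here with sysutils/beadm; the
environment name is arbitrary):

    # snapshot the whole root filesystem as a new boot environment
    beadm create before-pkg-upgrade
    pkg upgrade

    # if the upgrade breaks something, boot straight back into the
    # old environment
    beadm activate before-pkg-upgrade
    shutdown -r now
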
I think they will never figure out that they are not using a real
cluster node and are instead running in a VM that gets only a small
slice of the node's resources. This reminds me of the quote from
Scotty in the Star Trek TNG episode "Relics":

"Do you mind a little advice? Starfleet captains are like children.
They want everything right now and they want it their way. But the
secret is to give them only what they need, not what they want."

I think that applies to a lot more people than just Starfleet
captains.

END OF STORY


Can you publish that story under an anonymous name, just in case they
find out by watching your show? We wouldn't get into any trouble over
it; it's just a precaution.

Thanks, and I look forward to your show every week (now even more)!