- Hi BSDNow-Team,
- Allan told me to share a story with you, so here it is:
- BEGIN OF STORY
- I manage a cluster of machines (Dell C6220) at our computer science
- department that are intended for compute-intensive jobs in the big
- data area (i.e. NoSQL "databases"). I took over the cluster from a
- colleague who had it running under Ubuntu 12.04. Now, I'm slowly but
- steadily converting it to FreeBSD on ZFS, bhyve and other such niceties.
- I had established a "bridge-head" node for testing purposes and it was
- running very well so far. Logins to the cluster are done via the
- department's LDAP server (which is out of my control).
- The machines in the cluster are pretty popular with students for their
- thesis work, with professors for their research, and with a couple of
- labs that use the computing resources. At the moment, cluster machines
- are given to users exclusively (a policy I am trying to change) for a
- certain period of time, so that no one else interferes with their
- measurements. As you can imagine, nodes become scarce rather quickly.
- Recently, we got a request for a node from a student project in the
- middle of the semester. Apparently, they were using a VM from our
- central IT department (VMware vSphere with a moderate NetApp filer for
- storage) and were having performance problems (not sure what kind).
- They asked whether they could use a node in the cluster for their
- project. They sent me their requirements in an email:
- * OS: Ubuntu 15.04, Debian 7, or CentOS 7.1 (CentOS preferred)
- * minimum 8 GB memory
- * at least 320 GB disk space
- * at least Intel Xeon E3 (the more dedicated, the better)
- * at least 100 MBit network connectivity
- This clearly indicates that they had no idea about the real hardware of
- even one of our cluster nodes and were probably just citing what they
- had previously gotten from central IT. Since a single node has several
- times those hardware resources, I thought it would be a waste to give
- them a whole node when their requirements are so low anyway.
- So I set up a bhyve VM (my very first actually) on my bridge-head
- FreeBSD node with a 400 GB ZFS volume. I configured bhyve to have 2
- CPUs (out of 8) and 8 GB RAM (out of 64 GB). After installing and
- configuring CentOS (not a very nice experience for someone used to BSD
- systems) for them, I gave them the IP address of the bhyve VM and a
- local user account. They were happy with it and have been using the
- machine for roughly three weeks now without a single complaint about
- performance. As far as my monitoring tells me, they are using the
- machine, and the zvol holds a couple of gigabytes of data (not nearly
- as much as they requested, though).
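- The setup above can be sketched roughly as follows. All names here
- (pool, zvol, VM name, tap device) are made up, and a Linux guest needs
- grub2-bhyve from ports, since bhyveload(8) only boots FreeBSD guests:

```shell
# Hypothetical sketch; pool "tank", VM "centos0" and tap0 are assumptions.
# Back the guest disk with a 400 GB ZFS volume:
zfs create -V 400G tank/vm/centos0

# Load the Linux kernel with grub2-bhyve (sysutils/grub2-bhyve);
# device.map points hd0 at the zvol:
grub-bhyve -m device.map -r hd0,msdos1 -M 8192M centos0

# Run the guest with 2 vCPUs and 8 GB RAM on virtio devices:
bhyve -c 2 -m 8192M -A -H -P \
  -s 0:0,hostbridge -s 1:0,lpc \
  -s 2:0,virtio-net,tap0 \
  -s 3:0,virtio-blk,/dev/zvol/tank/vm/centos0 \
  -l com1,stdio \
  centos0
```

- (These days a wrapper like vm-bhyve hides most of this plumbing.)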
- One morning, I decided to upgrade a couple of packages on the
- bridge-head system (which is running CURRENT). Next thing I know, the
- openldap24-client package upgraded fine, but then errors started
- appearing saying that libssl.so.8 could not be found. Remember that
- users on these nodes are authenticated via LDAP only, so the PAM module
- stopped working, which meant that no one (not even root) could log into
- the machine anymore, the very machine running that bhyve VM. Oops!
- I figured the best way was to reboot the node, risk the downtime of the
- bhyve VM and try to fix things in single user mode. Unfortunately, I
- had no recent ZFS boot environment to roll back to. After booting a
- CURRENT snapshot ISO (great that FreeBSD has those!) into the Shell
- environment, I was able to zpool import the pool with the altroot=/mnt
- option. Then I chrooted into /mnt, successfully rebuilt the
- openldap24-client package, and rebooted again. Luckily, I was able to
- log in again, and even the bhyve VM was unaffected. No one even
- complained about the downtime or that the CentOS VM had been rebooted.
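- For anyone curious, the recovery boiled down to a handful of commands.
- A hedged sketch, assuming the usual root-on-ZFS pool name "zroot":

```shell
# From the live shell of the CURRENT snapshot ISO:
# import the installed system's pool mounted under /mnt instead of /
zpool import -f -o altroot=/mnt zroot

# chroot into the installed system and rebuild the broken package:
chroot /mnt /bin/sh
cd /usr/ports/net/openldap24-client
make reinstall clean
exit

# export the pool cleanly and reboot into the repaired system:
zpool export zroot
reboot
```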
- There are certainly better ways to handle this failure scenario, but it
- worked nonetheless. I plan to expand this setup further with more
- FreeBSD nodes and even more bhyve instances. ZFS and boot environments
- (which I now create before running updates) make disaster recovery very
- easy and painless. Also, I found that although bhyve is still young
- (younger than its competitors in the hypervisor space), it is mature
- enough to run VMs for your users. If a student project has performance
- problems on a big vSphere installation but no complaints at all when
- running on bhyve, that says a lot.
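- Creating a boot environment before an update is a one-liner. A sketch
- using beadm from ports (newer FreeBSD releases ship bectl(8) in base;
- the environment name here is arbitrary):

```shell
# Snapshot the current root filesystem into a new boot environment
# before upgrading anything:
beadm create pre-upgrade

# If the upgrade breaks something, roll back by activating the old
# environment and rebooting:
beadm list
beadm activate pre-upgrade
reboot
```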
- I think they will never figure out that they are not using a real
- cluster node but instead a VM with only a fraction of one node's
- resources. This reminds me of the quote from Scotty in
- the Star Trek TNG episode "Relics":
- "Do you mind a little advice? Starfleet captains are like children. They
- want everything right now and they want it their way. But the secret is
- to give them only what they need, not what they want."
- I think that does apply to a lot more people than just Starfleet
- captains.
- END OF STORY
- Can you publish this story under an anonymous name, just in case they
- find out by watching your show? We probably won't get into any trouble
- over it; it's just a precaution.
- Thanks and I look forward to your show every week (now even more)!