- Hello fine people of the Ganglia community!
- I recently joined the team at a company with a long running Ganglia,
- set up many years ago by persons unknown to me, and not documented in
- any manner that we can find. It's still working, but the server it's
- on is in a failing state with some bad RAM (we think), and we get
- gaps in the graphs. Aside from this being annoying, we're a
- little bit afraid that if we powered the machine off, it might never
- come back on, as sometimes happens.
- So, one of my tasks in recent weeks has been to rebuild our Nagios and
- Ganglia setups. And I'm running into a weird problem, which I will
- explain after a brief overview of how we use Ganglia. That usage isn't
- likely to change soon, for a number of reasons, though I have fielded
- some suggestions on how we might do things differently and why.
- So, there are about a hundred machines, ish, each running gmond
- configured to send data unicast to the collector. I've modified their
- configuration so that each currently sends data to both our existing
- host and the new host. I am receiving at least some data for all
- machines, but I am missing quite a bit of data, especially load_one from
- almost everything, resulting in lots of broken images where I'd like
- to see graphs.
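
One way to put numbers on the missing data is to read the XML that each collector-side gmond dumps on its TCP accept channel and list the hosts that report no load_one. The following is only a rough sketch, not part of the original setup: the function names are made up for illustration, and port 8649 is the gmond default, which may not match what each of the gmond instances here is actually configured to use.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host, port=8649, timeout=10.0):
    """Read the full XML dump that gmond writes when a client connects.

    Assumption: `port` is the tcp_accept_channel port of one gmond
    instance; 8649 is only the default.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(65536)
            if not data:  # gmond closes the socket once the dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_missing_metric(xml_text, metric="load_one"):
    """Return sorted (cluster, host) pairs that have no METRIC named `metric`."""
    root = ET.fromstring(xml_text)
    missing = []
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            names = {m.get("NAME") for m in host.iter("METRIC")}
            if metric not in names:
                missing.append((cluster.get("NAME"), host.get("NAME")))
    return sorted(missing)
```

Running `hosts_missing_metric(fetch_xml("localhost", 8649))` against the same cluster on both the old and the new collector would show whether the gaps exist only on the new host.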
- These machines are split into clusters and grids with clusters in
- them, and it's .. well .. that's how it is. It looks something like
- this:
- SH Grid
- - Content Grid < (crawl, workflow - clusters)
- - Production Grid < (web, db, search, misc - clusters)
- - Dev QA (Cluster)
- - Corp Xen (Cluster)
- - Infrastructure (Cluster)
- So, on the collector host, there are three gmetad processes running:
- gmetad: SH Grid
- gmetad: Content Grid
- gmetad: Production Grid
- As well as numerous gmond:
- gmond: crawl
- gmond: workflow
- gmond: web
- gmond: db
- gmond: search
- gmond: misc
- gmond: dev/qa
- gmond: xen
- gmond: infra
- The configuration is exactly duplicated from the existing, "working"
- host, by which I mean that I am actually using the same configuration
- files. I was using gmetad 3.1 with gmond 3.0, but I decided that even
- though that should work and seemed not to be the problem, it wouldn't
- hurt to align the versions, and I am currently running both from 3.0.
- I have a few problems with this new setup:
- * Grids disappear and reappear sporadically - e.g. the Production
- grid is often not on the page, and today when I click through to
- production grid it takes me directly to web cluster because it is
- apparently not aware of any other clusters.
- * Weird things happen - I know this is vague, but I'll lead with an
- example: when I click "Dev QA" sometimes it is reported as part of
- Production Grid, other times as part of Content Grid, when in fact it
- is a part of top-level "SH Grid".
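
To see which grid each gmetad itself believes a cluster belongs to, independent of the web frontend, the XML it serves on its xml_port can be reduced to an outline. This is a hedged sketch: `grid_tree` is a hypothetical helper, it assumes the dump nests GRID and CLUSTER elements under the root, and a nested grid may appear only in summary form (without its clusters) depending on how gmetad is configured.

```python
import xml.etree.ElementTree as ET


def grid_tree(xml_text):
    """Return indented lines showing how GRID and CLUSTER elements nest."""
    lines = []

    def walk(node, depth):
        for child in node:
            if child.tag in ("GRID", "CLUSTER"):
                label = child.tag.lower() + ": " + child.get("NAME", "?")
                lines.append("  " * depth + label)
                if child.tag == "GRID":
                    # Only grids can contain further grids or clusters.
                    walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 0)
    return lines
```

Feeding it the dump from each of the three gmetad processes (8651 is the default xml_port, but each instance here presumably uses its own) and printing the result line by line would show whether "Dev QA" ever turns up under the Production or Content grid at the gmetad level.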
- I'm sure there is other weirdness, but some of it may come into focus
- more if I get past these overwhelming problems.
- Thanks in advance for any help that any of you can offer!