a guest
Jul 27th, 2017
  1. Hello fine people of the Ganglia community!
  2.  
  3. I recently joined the team at a company with a long-running Ganglia,
  4. set up many years ago by persons unknown to me, and not documented in
  5. any manner that we can find. It's still working, but the server it's
  6. on is in a failure state with some bad RAM (we think) and we get
  7. missing bits of the graphs. Aside from this being annoying, we're a
  8. little bit afraid that if we powered the machine off, it may never
  9. come back on - as happens.
  10.  
  11. So, one of my tasks in recent weeks has been to rebuild our Nagios and
  12. Ganglia setups. And I'm running into a weird problem, which I will
  13. explain after a brief overview of how we use Ganglia - which isn't
  14. likely to change soon, for a number of reasons, though I have fielded
  15. some suggestions on how we may do things differently and why.
  16.  
  17. So, there are about a hundred machines, ish, each running gmond
  18. configured to send data unicast to the collector. I've modified their
  19. configuration such that currently, each sends data to our existing
  20. host and to the new host. I am receiving at least some data for all
  21. machines, but I am missing quite a bit of data, especially load_one from
  22. almost everything, resulting in lots of broken images where I'd like
  23. to see graphs.
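
One way to narrow down which hosts are actually missing load_one is to read the XML dump that each collector-side gmond writes when a client connects to its TCP accept channel, and list the hosts whose metric set lacks it. The following is only a minimal sketch: the function names are made up for illustration, and the host and port in the usage comment are placeholders (8649 is gmond's default port, but with several gmond processes on one collector each will be listening on whatever port its own config specifies).

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host, port, timeout=5.0):
    """Read the complete XML dump a gmond (or gmetad) sends on connect.

    The daemon writes its XML and then closes the connection, so we
    simply read until recv() returns no more data.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_missing_metric(xml_bytes, metric="load_one"):
    """Return (cluster, host) pairs whose HOST element has no such METRIC."""
    root = ET.fromstring(xml_bytes)
    missing = []
    for cluster in root.iter("CLUSTER"):
        for host in cluster.findall("HOST"):
            names = {m.get("NAME") for m in host.findall("METRIC")}
            if metric not in names:
                missing.append((cluster.get("NAME"), host.get("NAME")))
    return missing


# Hypothetical usage; "collector.example.com" and 8649 are placeholders:
#   xml_bytes = fetch_xml("collector.example.com", 8649)
#   for cluster, host in hosts_missing_metric(xml_bytes):
#       print(cluster, host)
```

Running the same check against the old and the new collector would show whether the metric never arrives at the new gmond, or arrives but is lost further along (gmetad, RRD files, or the web frontend).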
  24.  
  25. These machines are split into clusters and grids with clusters in
  26. them, and it's .. well .. that's how it is. It looks something like
  27. this:
  28.  
  29. SH Grid
  30. - Content Grid < (crawl, workflow - clusters)
  31. - Production Grid < (web, db, search, misc - clusters)
  32. - Dev QA (Cluster)
  33. - Corp Xen (Cluster)
  34. - Infrastructure (Cluster)
  35.  
  36. So, on the collector host, there are three gmetad processes running:
  37.  
  38. gmetad: SH Grid
  39. gmetad: Content Grid
  40. gmetad: Production Grid
  41.  
  42. As well as numerous gmond:
  43.  
  44. gmond: crawl
  45. gmond: workflow
  46. gmond: web
  47. gmond: db
  48. gmond: search
  49. gmond: misc
  50. gmond: dev/qa
  51. gmond: xen
  52. gmond: infra
  53.  
  54. The configuration is exactly duplicated from the existing, "working"
  55. host, by which I mean that I am actually using the same configuration
  56. files. I was using gmetad 3.1 with gmond 3.0, but I decided that even
  57. though that should work and seemed not to be the problem, it wouldn't
  58. hurt to align the versions, so I am currently running both from 3.0.
  59.  
  60. I have a few problems with this new setup:
  61.  
  62. * Grids disappear and reappear sporadically - e.g. the Production
  63. grid is often not on the page, and today when I click through to
  64. production grid it takes me directly to web cluster because it is
  65. apparently not aware of any other clusters.
  66. * Weird things happen - I know this is vague, but I'll lead with an
  67. example: when I click "Dev QA" sometimes it is reported as part of
  68. Production Grid, other times as part of Content Grid, when in fact it
  69. is a part of top-level "SH Grid".
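
To see what each gmetad itself believes the hierarchy to be, independent of the web frontend, the XML it serves on its xml_port can be walked and printed as a grid/cluster tree. This is a rough sketch only: grid_tree is a made-up helper name, and it assumes the dump has already been read into a bytes object (for example with a plain TCP read from the gmetad's xml_port, 8651 by default; with three gmetad processes on one host each would need its own port).

```python
import xml.etree.ElementTree as ET


def grid_tree(xml_bytes):
    """Return indented 'grid: NAME' / 'cluster: NAME' lines from gmetad XML.

    Nested GRID elements are descended into; HOST and METRIC elements
    are ignored, since only the grid/cluster membership is of interest.
    """
    root = ET.fromstring(xml_bytes)

    def walk(node, depth, out):
        for child in node:
            if child.tag in ("GRID", "CLUSTER"):
                label = child.tag.lower()
                out.append("  " * depth + f"{label}: {child.get('NAME')}")
                if child.tag == "GRID":
                    walk(child, depth + 1, out)
        return out

    return walk(root, 0, [])


# Hypothetical usage, given xml_bytes read from a gmetad:
#   print("\n".join(grid_tree(xml_bytes)))
```

Comparing this output between the three gmetad processes, and between the old and new hosts, would show whether a cluster such as "Dev QA" is really being reported under the wrong grid or whether the confusion arises only in the frontend.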
  70.  
  71. I'm sure there is other weirdness, but some of it may come into focus
  72. more if I get past these overwhelming problems.
  73.  
  74. Thanks in advance for any help that any of you can offer!