first, rabbit on node-16 got deployed; it never failed after that and behaved OK:
2015-12-29T12:34:29.287391+00:00 info: INFO: p_rabbitmq-server: get_monitor(): rabbit app is running and is member of healthy cluster

the rabbit deploy task was then started on the other controllers:
2015-12-29T13:03:07.100362+00:00 node-17 puppet-apply notice: (Scope(Class[main])) MODULAR: rabbitmq.pp

node-17 was kept in (or recovered into) the cluster with node-16 and node-18 the whole time, and behaved OK

the cluster failed first (on node-19):
2015-12-29T13:16:19.527375+00:00 node-19 lrmd err: ERROR: p_rabbitmq-server: get_monitor(): 'rabbitmqctl list_queues' timed out 1 of max. 1 time(s) in a row and is not responding. The resource is failed.

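for reference (not from the logs): the check that timed out above can be reproduced by hand roughly as below; the 30 s value is an assumption, the agent's timeout is configurable:

  # rough manual equivalent of the monitor's queue-listing check
  timeout 30 rabbitmqctl list_queues > /dev/null || echo "list_queues timed out or failed"
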
node-18 appeared out of the cluster and was never recovered after that:
2015-12-29T13:18:27.271940+00:00 node-18 lrmd err: ERROR: p_rabbitmq-server: get_monitor(): rabbit node is running out of the cluster

partitions started:
2015-12-29T13:17:56.239430+00:00 node-18 rabbitmq notice: Mnesia('rabbit@node-18'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-19'}
2015-12-29T13:17:56.697605+00:00 node-19 rabbitmq notice: Mnesia('rabbit@node-19'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-16'}
2015-12-29T13:17:56.697605+00:00 node-19 rabbitmq notice: Mnesia('rabbit@node-19'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-17'}
2015-12-29T13:17:56.697605+00:00 node-19 rabbitmq notice: Mnesia('rabbit@node-19'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-18'}
2015-12-29T13:18:02.047500+00:00 node-17 rabbitmq notice: Mnesia('rabbit@node-17'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-18'}
2015-12-29T13:18:02.047500+00:00 node-17 rabbitmq notice: Mnesia('rabbit@node-17'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-19'}

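for reference: partition state can also be inspected by hand; the output fragment below is illustrative, not taken from these nodes:

  rabbitmqctl cluster_status
  # illustrative fragment of the output once partitions exist:
  #   {partitions,[{'rabbit@node-19',['rabbit@node-16','rabbit@node-17']}]}
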
node-20 appeared out of the cluster (in the partition without node-16), and then recovered:
2015-12-29T13:18:28.257567+00:00 node-20 lrmd err: ERROR: p_rabbitmq-server: get_monitor(): rabbit node is running out of the cluster
2015-12-29T13:19:14.946196+00:00 node-20 lrmd info: INFO: p_rabbitmq-server: get_monitor(): rabbit app is running. master is node-16.test.domain.local
2015-12-29T13:19:15.553883+00:00 node-20 lrmd info: INFO: p_rabbitmq-server: get_monitor(): rabbit app is running and is member of healthy cluster

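for reference: the "master is node-16" above refers to the Pacemaker multi-state resource; which node is currently promoted can be checked by hand with crm_mon (the grep window is just an assumption for readability):

  crm_mon -1 | grep -A 3 p_rabbitmq-server   # one-shot cluster status, shows the Masters/Slaves lines
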
node-19 recovered in the partition with node-16:
2015-12-29T13:19:26.134129+00:00 node-17 lrmd info: INFO: p_rabbitmq-server: get_monitor(): rabbit app is running and is member of healthy cluster

node-20 failed and was never recovered after that:
2015-12-29T13:20:15.810503+00:00 node-20 lrmd err: ERROR: p_rabbitmq-server: get_monitor(): 'rabbitmqctl list_channels' timed out 4 of max. 1 time(s) in a row and is not responding. The resource is failed.
2015-12-29T13:20:45.944635+00:00 node-20 lrmd info: INFO: p_rabbitmq-server: get_monitor(): CHECK LEVEL IS: 0
2015-12-29T13:20:51.433388+00:00 node-20 lrmd info: INFO: get_status(): app kernel was not found in command output: []
2015-12-29T13:20:51.436963+00:00 node-20 lrmd info: INFO: p_rabbitmq-server: get_monitor(): get_status() returns 7.
2015-12-29T13:20:51.440569+00:00 node-20 lrmd info: INFO: p_rabbitmq-server: get_monitor(): ensuring this slave does not get promoted.
(repeats)

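aside (an assumption about the agent's internals): "app kernel was not found in command output: []" suggests get_status() parses the node's running_applications list, and its return code 7 is the OCF "not running" code; a manual equivalent of that check would be something like:

  # if the beam is up, rabbitmqctl status lists running_applications incl. kernel
  rabbitmqctl status | grep -q kernel && echo "node reports running applications" || echo "nothing running (matches get_status() = 7)"
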
node-19 died and recovered a few times:
2015-12-29T13:20:26.355010+00:00 node-19 lrmd err: ERROR: p_rabbitmq-server: get_monitor(): 'rabbitmqctl list_channels' timed out 1 of max. 1 time(s) in a row and is not responding. The resource is failed.
2015-12-29T13:24:44.938585+00:00 node-19 lrmd info: INFO: p_rabbitmq-server: get_monitor(): rabbit app is running and is member of healthy cluster
2015-12-29T13:26:49.262999+00:00 node-19 lrmd err: ERROR: p_rabbitmq-server: get_monitor(): 'rabbitmqctl list_channels' timed out 2 of max. 1 time(s) in a row and is not responding. The resource is failed.

the partitions seem to have ended:
2015-12-29T13:28:27.258277+00:00 node-19 rabbitmq notice: Autoheal request sent to 'rabbit@node-16'
2015-12-29T13:28:27.258277+00:00 node-19 rabbitmq notice: Mnesia('rabbit@node-19'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-16'}
2015-12-29T13:28:27.258277+00:00 node-19 rabbitmq notice: Mnesia('rabbit@node-19'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-18'}
2015-12-29T13:28:30.913524+00:00 node-18 rabbitmq notice: Mnesia('rabbit@node-18'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-19'}
2015-12-29T13:28:30.913524+00:00 node-18 rabbitmq-sasl notice: {inconsistent_database,running_partitioned_network,

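for reference: the Autoheal request above implies the broker runs with cluster_partition_handling set to autoheal; in the classic Erlang-term config that is a line like the one below (the /etc/rabbitmq/rabbitmq.config path and the rest of the file are assumptions):

  [{rabbit, [{cluster_partition_handling, autoheal}]}].
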
node-19 entered the join failure loop (exit codes 2/139) forever:
2015-12-29T13:30:47.878276+00:00 node-19 lrmd info: INFO: p_rabbitmq-server: su_rabbit_cmd(): the invoked command exited 2: /usr/sbin/rabbitmqctl join_cluster rabbit@node-16

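for reference: exit 2 is rabbitmqctl reporting an error, while 139 follows the usual 128+signal shell convention (signal 11), i.e. the invoked command was killed; a manual recovery of a node stuck in this loop usually looks like the sequence below (standard rabbitmqctl commands, not taken from the agent itself):

  rabbitmqctl stop_app
  rabbitmqctl reset          # wipes the local Mnesia state so the node can rejoin cleanly
  rabbitmqctl join_cluster rabbit@node-16
  rabbitmqctl start_app
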
the dump definitions task was then started and failed the deployment, as node-19 was not in the rabbit cluster:
2015-12-29T13:42:37.575708+00:00 node-19 puppet-apply notice: (Scope(Class[main])) MODULAR: dump_rabbitmq_definitions.pp

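for reference (an assumption about what this task does): dumping definitions normally goes through the management plugin's HTTP API; a manual equivalent would be something like the call below, where credentials, port and output path are assumptions:

  curl -u guest:guest http://localhost:15672/api/definitions -o /tmp/rabbitmq_definitions.json
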