- the first rabbit node, node-16, was deployed and never failed afterwards; it behaved OK:
- 2015-12-29T12:34:29.287391+00:00 info: INFO: p_rabbitmq-server: get_monitor(): rabbit app is running and is member of healthy cluster
- the rabbit deploy task started on the other controllers:
- 2015-12-29T13:03:07.100362+00:00 node-17 puppet-apply notice: (Scope(Class[main])) MODULAR: rabbitmq.pp
- node-17 stayed in (or recovered into) the cluster with node-16 and node-18 the whole time and behaved OK
- the first cluster failure appeared:
- 2015-12-29T13:16:19.527375+00:00 node-19 lrmd err: ERROR: p_rabbitmq-server: get_monitor(): 'rabbitmqctl list_queues' timed out 1 of max. 1 time(s) in a row and is not responding. The resource is failed.
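The "timed out N of max. M time(s) in a row" wording suggests the monitor runs `rabbitmqctl` under a timeout, counts consecutive expirations, and fails the resource once the count reaches the maximum (1 in this deployment). A minimal sketch of that logic, with illustrative names rather than the actual OCF agent code:

```shell
#!/bin/sh
# Sketch of the implied monitor logic: run a probe command under a timeout,
# count consecutive timeouts, fail once MAX_TIMEOUTS is reached.
# (MAX_TIMEOUTS, probe, and the messages are assumptions for illustration.)
MAX_TIMEOUTS=1
timeouts=0

probe() {
  # $1 = command to run, $2 = timeout in seconds
  timeout "$2" sh -c "$1"
  if [ $? -eq 124 ]; then                  # 124 = coreutils `timeout` expired
    timeouts=$((timeouts + 1))
    if [ "$timeouts" -ge "$MAX_TIMEOUTS" ]; then
      echo "timed out $timeouts of max. $MAX_TIMEOUTS time(s); resource failed"
      return 1
    fi
  else
    timeouts=0                             # any success resets the counter
  fi
  return 0
}

# Demonstration with a stand-in command that always hangs:
probe "sleep 10" 1 || echo "monitor reports failure"
```

With max. 1, a single hung `rabbitmqctl list_queues` is enough to mark the resource failed, which matches what node-19 logged here.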
- node-18 dropped out of the cluster and never recovered after that:
- 2015-12-29T13:18:27.271940+00:00 node-18 lrmd err: ERROR: p_rabbitmq-server: get_monitor(): rabbit node is running out of the cluster
- partitions started:
- 2015-12-29T13:17:56.239430+00:00 node-18 rabbitmq notice: Mnesia('rabbit@node-18'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-19'}
- 2015-12-29T13:17:56.697605+00:00 node-19 rabbitmq notice: Mnesia('rabbit@node-19'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-16'}
- 2015-12-29T13:17:56.697605+00:00 node-19 rabbitmq notice: Mnesia('rabbit@node-19'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-17'}
- 2015-12-29T13:17:56.697605+00:00 node-19 rabbitmq notice: Mnesia('rabbit@node-19'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-18'}
- 2015-12-29T13:18:02.047500+00:00 node-17 rabbitmq notice: Mnesia('rabbit@node-17'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-18'}
- 2015-12-29T13:18:02.047500+00:00 node-17 rabbitmq notice: Mnesia('rabbit@node-17'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-19'}
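One way to confirm the split each node reported above: `rabbitmqctl cluster_status` includes a partitions entry once Mnesia has detected `running_partitioned_network`. Shown as a dry-run sketch (the commands are printed, not executed); the `ssh` fan-out is an assumption about how you would reach each controller:

```shell
#!/bin/sh
# Dry-run: print the per-node check instead of executing it.
run() { echo "+ $*"; }
for n in node-16 node-17 node-18 node-19 node-20; do
  run ssh "$n" rabbitmqctl cluster_status   # look for a non-empty partitions list
done
```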
- node-20 dropped out of the cluster (into the partition without node-16), then recovered:
- 2015-12-29T13:18:28.257567+00:00 node-20 lrmd err: ERROR: p_rabbitmq-server: get_monitor(): rabbit node is running out of the cluster
- 2015-12-29T13:19:14.946196+00:00 node-20 lrmd info: INFO: p_rabbitmq-server: get_monitor(): rabbit app is running. master is node-16.test.domain.local
- 2015-12-29T13:19:15.553883+00:00 node-20 lrmd info: INFO: p_rabbitmq-server: get_monitor(): rabbit app is running and is member of healthy cluster
- node-17 recovered in the partition with node-16:
- 2015-12-29T13:19:26.134129+00:00 node-17 lrmd info: INFO: p_rabbitmq-server: get_monitor(): rabbit app is running and is member of healthy cluster
- node-20 failed and never recovered after that:
- 2015-12-29T13:20:15.810503+00:00 node-20 lrmd err: ERROR: p_rabbitmq-server: get_monitor(): 'rabbitmqctl list_channels' timed out 4 of max. 1 time(s) in a row and is not responding. The resource is failed.
- 2015-12-29T13:20:45.944635+00:00 node-20 lrmd info: INFO: p_rabbitmq-server: get_monitor(): CHECK LEVEL IS: 0
- 2015-12-29T13:20:51.433388+00:00 node-20 lrmd info: INFO: get_status(): app kernel was not found in command output: []
- 2015-12-29T13:20:51.436963+00:00 node-20 lrmd info: INFO: p_rabbitmq-server: get_monitor(): get_status() returns 7.
- 2015-12-29T13:20:51.440569+00:00 node-20 lrmd info: INFO: p_rabbitmq-server: get_monitor(): ensuring this slave does not get promoted.
- (repeats)
- node-19 died and recovered a few times:
- 2015-12-29T13:20:26.355010+00:00 node-19 lrmd err: ERROR: p_rabbitmq-server: get_monitor(): 'rabbitmqctl list_channels' timed out 1 of max. 1 time(s) in a row and is not responding. The resource is failed.
- 2015-12-29T13:24:44.938585+00:00 node-19 lrmd info: INFO: p_rabbitmq-server: get_monitor(): rabbit app is running and is member of healthy cluster
- 2015-12-29T13:26:49.262999+00:00 node-19 lrmd err: ERROR: p_rabbitmq-server: get_monitor(): 'rabbitmqctl list_channels' timed out 2 of max. 1 time(s) in a row and is not responding. The resource is failed.
- the partitions seem to have ended:
- 2015-12-29T13:28:27.258277+00:00 node-19 rabbitmq notice: Autoheal request sent to 'rabbit@node-16'
- 2015-12-29T13:28:27.258277+00:00 node-19 rabbitmq notice: Mnesia('rabbit@node-19'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-16'}
- 2015-12-29T13:28:27.258277+00:00 node-19 rabbitmq notice: Mnesia('rabbit@node-19'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-18'}
- 2015-12-29T13:28:30.913524+00:00 node-18 rabbitmq notice: Mnesia('rabbit@node-18'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-19'}
- 2015-12-29T13:28:30.913524+00:00 node-18 rabbitmq-sasl notice: {inconsistent_database,running_partitioned_network,
- node-19 entered the join-failure loop (exit codes 2/139) forever:
- 2015-12-29T13:30:47.878276+00:00 node-19 lrmd info: INFO: p_rabbitmq-server: su_rabbit_cmd(): the invoked command exited 2: /usr/sbin/rabbitmqctl join_cluster rabbit@node-16
- the dump-definitions task started and failed the deployment, as node-19 was not in the rabbit cluster:
- 2015-12-29T13:42:37.575708+00:00 node-19 puppet-apply notice: (Scope(Class[main])) MODULAR: dump_rabbitmq_definitions.pp