- the first rabbit node, node-16, was deployed and never failed afterwards; it behaved OK:
- 2015-12-29T12:34:29.287391+00:00 info: INFO: p_rabbitmq-server: get_monitor(): rabbit app is running and is member of healthy cluster
- the rabbit deploy task started on the other controllers:
- 2015-12-29T13:03:07.100362+00:00 node-17 puppet-apply notice: (Scope(Class[main])) MODULAR: rabbitmq.pp
- node-17 stayed in (or recovered into) the cluster with node-16 and node-18 the whole time and behaved OK
- the first cluster failure appeared:
- 2015-12-29T13:16:19.527375+00:00 node-19 lrmd err: ERROR: p_rabbitmq-server: get_monitor(): 'rabbitmqctl list_queues' timed out 1 of max. 1 time(s) in a row and is not responding. The resource is failed.
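The "timed out N of max. M time(s) in a row" wording suggests the monitor runs `rabbitmqctl` under a timeout, counts consecutive expirations, and fails the resource once the count reaches the maximum (1 in this deployment). A minimal sketch of that logic, with illustrative names rather than the actual OCF agent code:

```shell
#!/bin/sh
# Sketch of the implied monitor logic: run a probe command under a timeout,
# count consecutive timeouts, fail once MAX_TIMEOUTS is reached.
# (MAX_TIMEOUTS, probe, and the messages are assumptions for illustration.)
MAX_TIMEOUTS=1
timeouts=0

probe() {
  # $1 = command to run, $2 = timeout in seconds
  timeout "$2" sh -c "$1"
  if [ $? -eq 124 ]; then                  # 124 = coreutils `timeout` expired
    timeouts=$((timeouts + 1))
    if [ "$timeouts" -ge "$MAX_TIMEOUTS" ]; then
      echo "timed out $timeouts of max. $MAX_TIMEOUTS time(s); resource failed"
      return 1
    fi
  else
    timeouts=0                             # any success resets the counter
  fi
  return 0
}

# Demonstration with a stand-in command that always hangs:
probe "sleep 10" 1 || echo "monitor reports failure"
```

With max. 1, a single hung `rabbitmqctl list_queues` is enough to mark the resource failed, which matches what node-19 logged here.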
- node-18 dropped out of the cluster and never recovered after that:
- 2015-12-29T13:18:27.271940+00:00 node-18 lrmd err: ERROR: p_rabbitmq-server: get_monitor(): rabbit node is running out of the cluster
- partitions started:
- 2015-12-29T13:17:56.239430+00:00 node-18 rabbitmq notice: Mnesia('rabbit@node-18'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-19'}
- 2015-12-29T13:17:56.697605+00:00 node-19 rabbitmq notice: Mnesia('rabbit@node-19'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-16'}
- 2015-12-29T13:17:56.697605+00:00 node-19 rabbitmq notice: Mnesia('rabbit@node-19'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-17'}
- 2015-12-29T13:17:56.697605+00:00 node-19 rabbitmq notice: Mnesia('rabbit@node-19'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-18'}
- 2015-12-29T13:18:02.047500+00:00 node-17 rabbitmq notice: Mnesia('rabbit@node-17'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-18'}
- 2015-12-29T13:18:02.047500+00:00 node-17 rabbitmq notice: Mnesia('rabbit@node-17'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-19'}
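One way to confirm the split each node reported above: `rabbitmqctl cluster_status` includes a partitions entry once Mnesia has detected `running_partitioned_network`. Shown as a dry-run sketch (the commands are printed, not executed); the `ssh` fan-out is an assumption about how you would reach each controller:

```shell
#!/bin/sh
# Dry-run: print the per-node check instead of executing it.
run() { echo "+ $*"; }
for n in node-16 node-17 node-18 node-19 node-20; do
  run ssh "$n" rabbitmqctl cluster_status   # look for a non-empty partitions list
done
```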
- node-20 dropped out of the cluster (into the partition without node-16), then recovered:
- 2015-12-29T13:18:28.257567+00:00 node-20 lrmd err: ERROR: p_rabbitmq-server: get_monitor(): rabbit node is running out of the cluster
- 2015-12-29T13:19:14.946196+00:00 node-20 lrmd info: INFO: p_rabbitmq-server: get_monitor(): rabbit app is running. master is node-16.test.domain.local
- 2015-12-29T13:19:15.553883+00:00 node-20 lrmd info: INFO: p_rabbitmq-server: get_monitor(): rabbit app is running and is member of healthy cluster
- node-17 recovered in the partition with node-16:
- 2015-12-29T13:19:26.134129+00:00 node-17 lrmd info: INFO: p_rabbitmq-server: get_monitor(): rabbit app is running and is member of healthy cluster
- node-20 failed and never recovered after that:
- 2015-12-29T13:20:15.810503+00:00 node-20 lrmd err: ERROR: p_rabbitmq-server: get_monitor(): 'rabbitmqctl list_channels' timed out 4 of max. 1 time(s) in a row and is not responding. The resource is failed.
- 2015-12-29T13:20:45.944635+00:00 node-20 lrmd info: INFO: p_rabbitmq-server: get_monitor(): CHECK LEVEL IS: 0
- 2015-12-29T13:20:51.433388+00:00 node-20 lrmd info: INFO: get_status(): app kernel was not found in command output: []
- 2015-12-29T13:20:51.436963+00:00 node-20 lrmd info: INFO: p_rabbitmq-server: get_monitor(): get_status() returns 7.
- 2015-12-29T13:20:51.440569+00:00 node-20 lrmd info: INFO: p_rabbitmq-server: get_monitor(): ensuring this slave does not get promoted.
- (repeats)
- node-19 died and recovered a few times:
- 2015-12-29T13:20:26.355010+00:00 node-19 lrmd err: ERROR: p_rabbitmq-server: get_monitor(): 'rabbitmqctl list_channels' timed out 1 of max. 1 time(s) in a row and is not responding. The resource is failed.
- 2015-12-29T13:24:44.938585+00:00 node-19 lrmd info: INFO: p_rabbitmq-server: get_monitor(): rabbit app is running and is member of healthy cluster
- 2015-12-29T13:26:49.262999+00:00 node-19 lrmd err: ERROR: p_rabbitmq-server: get_monitor(): 'rabbitmqctl list_channels' timed out 2 of max. 1 time(s) in a row and is not responding. The resource is failed.
- the partitions seem to have ended:
- 2015-12-29T13:28:27.258277+00:00 node-19 rabbitmq notice: Autoheal request sent to 'rabbit@node-16'
- 2015-12-29T13:28:27.258277+00:00 node-19 rabbitmq notice: Mnesia('rabbit@node-19'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-16'}
- 2015-12-29T13:28:27.258277+00:00 node-19 rabbitmq notice: Mnesia('rabbit@node-19'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-18'}
- 2015-12-29T13:28:30.913524+00:00 node-18 rabbitmq notice: Mnesia('rabbit@node-18'): ERROR mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@node-19'}
- 2015-12-29T13:28:30.913524+00:00 node-18 rabbitmq-sasl notice: {inconsistent_database,running_partitioned_network,
- node-19 entered the join-failure loop (exit codes 2/139) forever:
- 2015-12-29T13:30:47.878276+00:00 node-19 lrmd info: INFO: p_rabbitmq-server: su_rabbit_cmd(): the invoked command exited 2: /usr/sbin/rabbitmqctl join_cluster rabbit@node-16
- the dump-definitions task started and failed the deployment, as node-19 was not in the rabbit cluster:
- 2015-12-29T13:42:37.575708+00:00 node-19 puppet-apply notice: (Scope(Class[main])) MODULAR: dump_rabbitmq_definitions.pp