- 0)
- #### Given
- public_vip: 10.108.6.2
- management_vip: 10.108.7.2
- node-1 10.108.7.3
- node-3 10.108.7.5
- node-4 10.108.7.6
- node-2 (compute) 10.108.7.4
- #### Pre destroy status
- vip__management_old (ocf::mirantis:ns_IPaddr2): Started node-1.test.domain.local
- vip__public_old (ocf::mirantis:ns_IPaddr2): Started node-1.test.domain.local
- p_neutron-l3-agent (ocf::mirantis:neutron-agent-l3): Started node-1.test.domain.local
- #### Post destroy node-1 status
- vip__management_old (ocf::mirantis:ns_IPaddr2): Started node-3.test.domain.local
- vip__public_old (ocf::mirantis:ns_IPaddr2): Started node-4.test.domain.local
- p_neutron-l3-agent (ocf::mirantis:neutron-agent-l3): Started node-3.test.domain.local
- Errors repeat indefinitely in the node-2 nova-compute logs:
- 2014-07-01 10:39:58.945 2114 ERROR nova.servicegroup.drivers.db [-] model server went away
- 2014-07-01 10:39:58.945 2114 TRACE nova.servicegroup.drivers.db MessagingTimeout: Timed out waiting for a reply to message ID 3d582b54848b499cba75183a7e4c429f
- 2014-07-01 10:39:58.964 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- Recurrence intervals are short (2-5 min, or less than 1 min):
- ...
- 2014-07-01 08:14:34.301 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- 2014-07-01 08:19:14.457 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- 2014-07-01 08:24:14.594 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- 2014-07-01 08:29:15.586 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- 2014-07-01 08:48:06.143 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- 2014-07-01 08:52:16.261 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- 2014-07-01 08:55:16.994 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- 2014-07-01 08:58:16.511 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- 2014-07-01 09:03:16.666 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- 2014-07-01 09:07:17.471 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- 2014-07-01 09:25:07.867 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- 2014-07-01 09:30:28.010 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- 2014-07-01 09:38:58.346 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- 2014-07-01 09:40:58.883 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- 2014-07-01 09:51:39.152 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- 2014-07-01 09:56:40.077 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- 2014-07-01 10:02:21.604 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- 2014-07-01 10:08:31.760 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- 2014-07-01 10:10:32.500 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- 2014-07-01 10:19:52.690 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- 2014-07-01 10:22:53.317 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- 2014-07-01 10:39:58.964 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
- ...
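The recurrence intervals can be measured rather than read off by eye. The following is a minimal sketch, assuming the log lines keep the timestamp layout shown above; the helper name `recovery_gaps` and the three-line sample are illustrative and not part of nova.

```python
import re
from datetime import datetime

# Timestamp layout as it appears in the compute log excerpt above.
TS = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}")

def recovery_gaps(lines, marker="Recovered model server connection!"):
    """Return seconds elapsed between consecutive lines containing marker."""
    stamps = []
    for line in lines:
        match = TS.search(line)
        if marker in line and match:
            stamps.append(datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S.%f"))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]

# Three lines copied from the excerpt above.
sample = [
    "2014-07-01 08:14:34.301 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!",
    "2014-07-01 08:19:14.457 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!",
    "2014-07-01 08:24:14.594 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!",
]
gaps = recovery_gaps(sample)
print([round(g) for g in gaps])  # gaps in whole seconds
```

Feeding it the full /var/log/nova/compute.log (for example via `open(...)`) would give the complete distribution of intervals.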
- 1) Get rmq connections from node-2:
- [root@node-2 ~]# ss -untap | grep 567
- tcp ESTAB 0 0 10.108.7.4:54807 10.108.7.5:5673 users:(("nova-compute",2114,19))
- ...
- All AMQP sessions are established to node-3 (10.108.7.5).
- 2) Check for non-empty (or growing) queues
- [root@node-3 ~]# rabbitmqctl list_queues name messages | egrep -v "0|^$"
- Listing queues ...
- notifications.info 129
- q-agent-notifier-security_group-update_fanout_8d465979f67e4bc785b1c6257ddb99e8 15
- ...done.
- Two queues are growing slowly.
- 3) Check for half-open connections at node-3
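One caveat about the filter above: `egrep -v "0|^$"` drops every line that contains the character 0 anywhere, so a queue holding 10 messages, or one with a 0 in its name, would be hidden as well. A minimal sketch of a stricter filter follows, assuming the two-column `name messages` output shown above; the function name `nonempty_queues` and the two `example.*` rows are hypothetical.

```python
def nonempty_queues(listing):
    """Map queue name to message count for queues holding at least one message."""
    result = {}
    for line in listing.splitlines():
        parts = line.split()
        # Skip the "Listing queues ..." header, the "...done." footer, and blanks.
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        count = int(parts[1])
        if count > 0:
            result[parts[0]] = count
    return result

# First, second and last data lines come from the transcript above;
# the two example.* rows are hypothetical and only illustrate the filter.
sample = """Listing queues ...
notifications.info 129
q-agent-notifier-security_group-update_fanout_8d465979f67e4bc785b1c6257ddb99e8 15
example.busy 10
example.empty 0
...done.
"""
print(nonempty_queues(sample))
```

Running it twice a few minutes apart and comparing the counts would show which queues are actually growing.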
- [root@node-3 ~]# ip netns exec haproxy ss -untap | grep 'SYN-SENT'
- tcp SYN-SENT 0 1 10.108.7.2:33460 10.108.7.3:3307 users:(("haproxy",32431,84))
- ...
- There is a variable number of half-open TCP connections from the OpenStack services' management_vip (10.108.7.2) to the 'destroyed' node-1 management IP (10.108.7.3).
- The same holds for node-4 and the public VIP:
- [root@node-4 ~]# ip netns exec haproxy ss -untap | grep 'SYN-SENT'
- tcp SYN-SENT 0 1 10.108.6.2:39458 10.108.7.3:9292 users:(("haproxy",22207,34))
- ...
- There is a variable number of half-open TCP connections from the OpenStack services' public_vip (10.108.6.2) to the 'destroyed' node-1 management IP (10.108.7.3).
- 4) Check OSTF for smoke failures
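To see at a glance which backends the half-open connections point at, the `ss` output can be grouped by peer address. This is a rough sketch, assuming the column layout of the `ss -untap` lines shown above (netid, state, recv-q, send-q, local, peer, users); `syn_sent_peers` is a hypothetical helper name.

```python
from collections import Counter

def syn_sent_peers(ss_output):
    """Count SYN-SENT sockets per peer address:port."""
    peers = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # fields[1] is the TCP state, fields[5] the peer address:port.
        if len(fields) >= 6 and fields[1] == "SYN-SENT":
            peers[fields[5]] += 1
    return peers

# Lines copied from the node-3 and node-4 transcripts above.
sample = """tcp SYN-SENT 0 1 10.108.7.2:33460 10.108.7.3:3307 users:(("haproxy",32431,84))
tcp SYN-SENT 0 1 10.108.6.2:39458 10.108.7.3:9292 users:(("haproxy",22207,34))
"""
peers = syn_sent_peers(sample)
# Peers that live on the destroyed node-1 management IP.
dead = sorted(p for p in peers if p.rsplit(":", 1)[0] == "10.108.7.3")
print(dead)
```

In this sample every SYN-SENT peer is on 10.108.7.3, which is consistent with haproxy still trying to reach backends on the destroyed controller.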
- [root@nailgun site-packages]# fuel health --env 1 --check smoke | grep failure
- ...
- [27 of 30] [failure] 'Launch instance' (176.5 s) Timed out waiting to become ACTIVE Please refer to OpenStack logs for more details.
- [28 of 30] [failure] 'Check network connectivity from instance via floating IP' (166.2 s) Timed out waiting to become ACTIVE Please refer to OpenStack logs for more details.
- Note: test 17 failure expected (we've destroyed node-1 controller)
- 5) Restart the rabbitmq service at node-3 and check the rmq cluster status
- (It hung on graceful restart, so it was killed)...
- [{nodes,[{disc,['rabbit@node-1','rabbit@node-3','rabbit@node-4']}]},
- {running_nodes,['rabbit@node-4']},
- {partitions,[]}]
- The status is OK.
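The cluster status above can also be checked mechanically by comparing the configured nodes with the running ones. The sketch below assumes the Erlang-term output format shown above, with the `nodes` entry printed before `running_nodes`; `stopped_nodes` is a hypothetical helper, not a RabbitMQ API.

```python
import re

def stopped_nodes(cluster_status):
    """Return configured cluster nodes that are not listed as running."""
    match = re.search(r"\{nodes,(.*?)\{running_nodes,(.*?)\]\}", cluster_status, re.S)
    if match is None:
        raise ValueError("unrecognised cluster_status output")
    configured = re.findall(r"'([^']+)'", match.group(1))
    running = set(re.findall(r"'([^']+)'", match.group(2)))
    return [node for node in configured if node not in running]

# Status text copied from the step above.
sample = """[{nodes,[{disc,['rabbit@node-1','rabbit@node-3','rabbit@node-4']}]},
 {running_nodes,['rabbit@node-4']},
 {partitions,[]}]"""
print(stopped_nodes(sample))
```

At this point in the procedure node-1 is destroyed and node-3 has just been killed, so both are expected to show up as stopped.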
- 6) Start the rmq server at node-3 and recheck the cluster status...
- The status is OK.
- 7) Run smoke tests again
- ...
- Note: test 17 failure expected (we've destroyed node-1 controller)
- Tests 27 and 28 no longer fail.
- 8) Recheck compute for rmq sessions and errors in logs
- [root@node-2 ~]# ss -untap | grep 567
- tcp ESTAB 0 0 10.108.7.4:43031 10.108.7.6:5673 users:(("nova-compute",2114,21))
- ...
- The compute node's AMQP sessions have reconnected to node-4 (10.108.7.6), and
- grep -r /var/log/nova/compute.log -e 'Timed out' -e 'Recovered model' -e 'went away'
- shows no new errors in its log.
- 9) Recheck SYN-SENT states
- Nothing has changed: SYN-SENT sessions still exist on both controllers and their number varies over time. Probably not an issue?