bug1333143

a guest
Jul 1st, 2014
0)
#### Given
public_vip: 10.108.6.2
management_vip: 10.108.7.2
node-1 10.108.7.3
node-3 10.108.7.5
node-4 10.108.7.6
node-2 (compute) 10.108.7.4

#### Pre destroy status
vip__management_old (ocf::mirantis:ns_IPaddr2): Started node-1.test.domain.local
vip__public_old (ocf::mirantis:ns_IPaddr2): Started node-1.test.domain.local
p_neutron-l3-agent (ocf::mirantis:neutron-agent-l3): Started node-1.test.domain.local

#### Post destroy node-1 status
vip__management_old (ocf::mirantis:ns_IPaddr2): Started node-3.test.domain.local
vip__public_old (ocf::mirantis:ns_IPaddr2): Started node-4.test.domain.local
p_neutron-l3-agent (ocf::mirantis:neutron-agent-l3): Started node-3.test.domain.local

Errors repeated in node-2 nova-compute logs (forever):
2014-07-01 10:39:58.945 2114 ERROR nova.servicegroup.drivers.db [-] model server went away
2014-07-01 10:39:58.945 2114 TRACE nova.servicegroup.drivers.db MessagingTimeout: Timed out waiting for a reply to message ID 3d582b54848b499cba75183a7e4c429f
2014-07-01 10:39:58.964 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!

Recurrence intervals are short: 2-5 min, or less than 1 min:
...
2014-07-01 08:14:34.301 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
2014-07-01 08:19:14.457 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
2014-07-01 08:24:14.594 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
2014-07-01 08:29:15.586 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
2014-07-01 08:48:06.143 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
2014-07-01 08:52:16.261 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
2014-07-01 08:55:16.994 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
2014-07-01 08:58:16.511 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
2014-07-01 09:03:16.666 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
2014-07-01 09:07:17.471 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
2014-07-01 09:25:07.867 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
2014-07-01 09:30:28.010 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
2014-07-01 09:38:58.346 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
2014-07-01 09:40:58.883 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
2014-07-01 09:51:39.152 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
2014-07-01 09:56:40.077 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
2014-07-01 10:02:21.604 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
2014-07-01 10:08:31.760 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
2014-07-01 10:10:32.500 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
2014-07-01 10:19:52.690 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
2014-07-01 10:22:53.317 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
2014-07-01 10:39:58.964 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!
...

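The 2-5 min recurrence claim can be checked mechanically by computing the gaps between consecutive "Recovered" timestamps. A minimal sketch, with three sample lines pasted from the log above (on node-2 you would feed it `grep 'Recovered model' /var/log/nova/compute.log` instead):

```shell
# Sketch: compute the gap in seconds between consecutive "Recovered model
# server connection!" messages. Sample lines are hard-coded from the log
# above; pipe in real log output on node-2.
gaps=$(printf '%s\n' \
  '2014-07-01 08:14:34.301 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!' \
  '2014-07-01 08:19:14.457 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!' \
  '2014-07-01 08:24:14.594 2114 ERROR nova.servicegroup.drivers.db [-] Recovered model server connection!' |
  awk '{
    split($2, t, /[:.]/)               # HH, MM, SS from the timestamp field
    s = t[1]*3600 + t[2]*60 + t[3]     # seconds since midnight
    if (NR > 1) print s - prev         # gap to the previous occurrence
    prev = s
  }')
echo "$gaps"
```

For the three sample lines this prints 280 and 300 seconds, i.e. the ~5 min recurrence described above.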
1) Get rmq connections from node-2:
[root@node-2 ~]# ss -untap | grep 567
tcp ESTAB 0 0 10.108.7.4:54807 10.108.7.5:5673 users:(("nova-compute",2114,19))
...
All amqp sessions are established to node-3.

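To see the whole set of brokers at a glance rather than scanning lines, the ss output can be reduced to the distinct remote endpoints. A sketch with one sample line pasted from above (assuming the non-default 5673 amqp port shown there; on a live node pipe in `ss -untap` instead):

```shell
# Sketch: extract the distinct remote RabbitMQ peer IPs from `ss -untap`
# output. The sample line is hard-coded from node-2 above.
peers=$(printf '%s\n' \
  'tcp ESTAB 0 0 10.108.7.4:54807 10.108.7.5:5673 users:(("nova-compute",2114,19))' |
  awk '$6 ~ /:5673$/ { split($6, a, ":"); print a[1] }' | sort -u)
echo "$peers"   # every broker the service is connected to, one per line
```

A single IP in the output confirms all sessions go to one controller (node-3 here).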
2) Check for non-empty (or growing) queues:
[root@node-3 ~]# rabbitmqctl list_queues name messages | egrep -v "0|^$"
Listing queues ...
notifications.info 129
q-agent-notifier-security_group-update_fanout_8d465979f67e4bc785b1c6257ddb99e8 15
...done.
We have two queues growing slowly.

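"Growing" is easier to judge from two snapshots than from one. A sketch that diffs two `rabbitmqctl list_queues name messages` samples taken a few seconds apart; the first count comes from the output above, the second (143) is made up purely for illustration:

```shell
# Sketch: compare two queue-depth snapshots and report queues whose
# message count increased. t0/t1 are hard-coded illustrations; on the
# controller, capture real `rabbitmqctl list_queues name messages`
# output (sorted by queue name) a few seconds apart instead.
t0='notifications.info 129'
t1='notifications.info 143'   # hypothetical later sample
q0=$(mktemp); q1=$(mktemp)
printf '%s\n' "$t0" > "$q0"
printf '%s\n' "$t1" > "$q1"
growth=$(join "$q0" "$q1" |                       # name count0 count1
  awk '$3 > $2 { print $1, "grew by", $3 - $2 }')
rm -f "$q0" "$q1"
echo "$growth"
```

Queues that keep appearing in this output across samples are the ones worth investigating.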
3) Check for half-open connections at node-3:
[root@node-3 ~]# ip netns exec haproxy ss -untap | grep 'SYN-SENT'
tcp SYN-SENT 0 1 10.108.7.2:33460 10.108.7.3:3307 users:(("haproxy",32431,84))
...
There is a variable number of half-open TCP connections from the OpenStack services' management_vip (10.108.7.2) to the 'destroyed' node-1 management IP (10.108.7.3).
And the same for node-4 and the public VIP:
[root@node-4 ~]# ip netns exec haproxy ss -untap | grep 'SYN-SENT'
tcp SYN-SENT 0 1 10.108.6.2:39458 10.108.7.3:9292 users:(("haproxy",22207,34))
...
There is a variable number of half-open TCP connections from the OpenStack services' public_vip (10.108.6.2) to the 'destroyed' node-1 management IP (10.108.7.3).

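To track the "variable number" over time instead of eyeballing it, the SYN-SENT entries can be counted per destination. A sketch using the two sample lines pasted from the controllers above (on a live controller, pipe in `ip netns exec haproxy ss -untap` instead):

```shell
# Sketch: count half-open (SYN-SENT) connections targeting the destroyed
# node-1 management IP. Sample lines are hard-coded from the output above.
dead=10.108.7.3
count=$(printf '%s\n' \
  'tcp SYN-SENT 0 1 10.108.7.2:33460 10.108.7.3:3307 users:(("haproxy",32431,84))' \
  'tcp SYN-SENT 0 1 10.108.6.2:39458 10.108.7.3:9292 users:(("haproxy",22207,34))' |
  awk -v d="$dead" '$2 == "SYN-SENT" && index($6, d ":") == 1' | wc -l)
echo "$count"   # half-open connections to the dead node
```

Running this in a loop (e.g. under `watch`) shows whether the count is bounded or growing.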
4) Check OSTF for smoke failures:
[root@nailgun site-packages]# fuel health --env 1 --check smoke | grep failure
...
[27 of 30] [failure] 'Launch instance' (176.5 s) Timed out waiting to become ACTIVE Please refer to OpenStack logs for more details.
[28 of 30] [failure] 'Check network connectivity from instance via floating IP' (166.2 s) Timed out waiting to become ACTIVE Please refer to OpenStack logs for more details.
Note: test 17 failure is expected (we've destroyed the node-1 controller).

5) Restart the rabbitmq service at node-3 and check rmq cluster status
(it hung on graceful restart, so it was killed)...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-3','rabbit@node-4']}]},
{running_nodes,['rabbit@node-4']},
{partitions,[]}]
Is OK.

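The "Is OK" verdict reads straight off the last line of the status (an empty partition list). A sketch that checks the same Erlang-term output mechanically, with the status string pasted from above (on the controller, use `status=$(rabbitmqctl cluster_status)` instead):

```shell
# Sketch: flag an empty partition list in `rabbitmqctl cluster_status`
# output. The status text is hard-coded from the paste above.
status="[{nodes,[{disc,['rabbit@node-1','rabbit@node-3','rabbit@node-4']}]},
{running_nodes,['rabbit@node-4']},
{partitions,[]}]"
if echo "$status" | grep -q 'partitions,\[\]'; then
  verdict="no partitions"
else
  verdict="partitioned"
fi
echo "$verdict"
```

Note this only checks for partitions; `running_nodes` still shows that only node-4 is up until node-3's rmq server is started again in the next step.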
6) Start the rmq server at node-3 and recheck cluster status...
Is OK.

7) Run smoke tests again
...
Note: test 17 failure is expected (we've destroyed the node-1 controller).
Tests 27 and 28 are no longer failing.

8) Recheck the compute node for rmq sessions and errors in logs:
[root@node-2 ~]# ss -untap | grep 567
tcp ESTAB 0 0 10.108.7.4:43031 10.108.7.6:5673 users:(("nova-compute",2114,21))
...
The compute node's amqp sessions have been reconnected to node-4, and
grep -e 'Timed out' -e 'Recovered model' -e 'went away' /var/log/nova/compute.log
shows no new errors in its logs now.

9) Recheck SYN-SENT states
Nothing has changed: SYN-SENT sessions still exist on both controllers and vary over time. Probably not an issue?