digimer
Oct 23rd, 2013
# At this time, the DRBD fence handler (rhcs_fence) was set to mode 644 (not executable), and then both switches were powered off.

Oct 23 14:01:55 an-c05n02 kernel: igb: eth4 NIC Link is Down
Oct 23 14:01:55 an-c05n02 kernel: e1000e: eth3 NIC Link is Down
Oct 23 14:01:55 an-c05n02 kernel: igb: eth5 NIC Link is Down
Oct 23 14:01:55 an-c05n02 kernel: e1000e: eth2 NIC Link is Down
Oct 23 14:01:55 an-c05n02 kernel: igb: eth0 NIC Link is Down
Oct 23 14:01:55 an-c05n02 kernel: igb: eth1 NIC Link is Down
Oct 23 14:01:55 an-c05n02 kernel: bonding: bond2: link status definitely down for interface eth2, disabling it
Oct 23 14:01:55 an-c05n02 kernel: device eth2 left promiscuous mode
Oct 23 14:01:55 an-c05n02 kernel: bonding: bond2: now running without any active interface !
Oct 23 14:01:55 an-c05n02 kernel: bonding: bond2: link status definitely down for interface eth5, disabling it
Oct 23 14:01:55 an-c05n02 kernel: bonding: bond1: link status definitely down for interface eth1, disabling it
Oct 23 14:01:55 an-c05n02 kernel: bonding: bond1: now running without any active interface !
Oct 23 14:01:55 an-c05n02 kernel: bonding: bond1: link status definitely down for interface eth4, disabling it
Oct 23 14:01:55 an-c05n02 kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
Oct 23 14:01:55 an-c05n02 kernel: bonding: bond0: now running without any active interface !
Oct 23 14:01:55 an-c05n02 kernel: bonding: bond0: link status definitely down for interface eth3, disabling it
Oct 23 14:01:56 an-c05n02 kernel: vbr2: port 1(bond2) entering disabled state
Oct 23 14:01:56 an-c05n02 kernel: block drbd1: PingAck did not arrive in time.
Oct 23 14:01:56 an-c05n02 kernel: block drbd1: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
Oct 23 14:01:56 an-c05n02 kernel: block drbd1: asender terminated
Oct 23 14:01:56 an-c05n02 kernel: block drbd1: Terminating drbd1_asender
Oct 23 14:01:56 an-c05n02 kernel: block drbd1: Connection closed
Oct 23 14:01:56 an-c05n02 kernel: block drbd1: conn( NetworkFailure -> Unconnected )
Oct 23 14:01:56 an-c05n02 kernel: block drbd1: receiver terminated
Oct 23 14:01:56 an-c05n02 kernel: block drbd1: Restarting drbd1_receiver
Oct 23 14:01:56 an-c05n02 kernel: block drbd1: receiver (re)started
Oct 23 14:01:56 an-c05n02 kernel: block drbd1: conn( Unconnected -> WFConnection )
Oct 23 14:01:56 an-c05n02 kernel: block drbd1: helper command: /sbin/drbdadm fence-peer minor-1
Oct 23 14:01:56 an-c05n02 kernel: block drbd1: helper command: /sbin/drbdadm fence-peer minor-1 exit code 10 (0xa00)

# This is the "error 10".
Oct 23 14:01:56 an-c05n02 kernel: block drbd1: fence-peer helper broken, returned 10

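The hex value in parentheses on each "exit code" line appears to be the raw wait()-style status word, with the exit code stored in the high byte (0xa00 = 10 << 8). A minimal Python sketch that decodes the raw values seen in this log, assuming the standard POSIX status layout (Unix only, since os.WIFEXITED is not available on Windows):

```python
import os

# Raw status values copied from the "exit code N (0x...)" lines in this log.
RAW_STATUSES = [0xA00, 0x7E00, 0x700, 0x100]


def exit_code(raw_status: int) -> int:
    """Return the exit code packed into a wait()-style status word."""
    if not os.WIFEXITED(raw_status):
        raise ValueError(f"{raw_status:#x} does not describe a normal exit")
    # WEXITSTATUS extracts bits 8-15, i.e. (raw_status >> 8) & 0xFF.
    return os.WEXITSTATUS(raw_status)


for raw in RAW_STATUSES:
    print(f"{raw:#x} -> exit code {exit_code(raw)}")
```

This should print 10, 126, 7 and 1, matching the decimal codes DRBD logs next to each hex value.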
Oct 23 14:02:04 an-c05n02 corosync[6754]: [TOTEM ] A processor failed, forming new configuration.
Oct 23 14:02:06 an-c05n02 corosync[6754]: [QUORUM] Members[1]: 2
Oct 23 14:02:06 an-c05n02 corosync[6754]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 23 14:02:06 an-c05n02 kernel: dlm: closing connection to node 1
Oct 23 14:02:06 an-c05n02 corosync[6754]: [CPG ] chosen downlist: sender r(0) ip(10.20.50.2) ; members(old:2 left:1)
Oct 23 14:02:06 an-c05n02 corosync[6754]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 23 14:02:06 an-c05n02 fenced[6807]: fencing node an-c05n01.alteeve.ca
Oct 23 14:02:06 an-c05n02 kernel: GFS2: fsid=an-cluster-A:shared.1: jid=0: Trying to acquire journal lock...
Oct 23 14:02:13 an-c05n02 kernel: block drbd0: PingAck did not arrive in time.
Oct 23 14:02:13 an-c05n02 kernel: block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
Oct 23 14:02:13 an-c05n02 kernel: block drbd0: asender terminated
Oct 23 14:02:13 an-c05n02 kernel: block drbd0: Terminating drbd0_asender
Oct 23 14:02:13 an-c05n02 kernel: block drbd0: Connection closed
Oct 23 14:02:13 an-c05n02 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
Oct 23 14:02:13 an-c05n02 kernel: block drbd0: receiver terminated
Oct 23 14:02:13 an-c05n02 kernel: block drbd0: Restarting drbd0_receiver
Oct 23 14:02:13 an-c05n02 kernel: block drbd0: receiver (re)started
Oct 23 14:02:13 an-c05n02 kernel: block drbd0: conn( Unconnected -> WFConnection )
Oct 23 14:02:13 an-c05n02 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0
Oct 23 14:02:13 an-c05n02 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 10 (0xa00)
Oct 23 14:02:13 an-c05n02 kernel: block drbd0: fence-peer helper broken, returned 10
Oct 23 14:02:31 an-c05n02 fenced[6807]: fence an-c05n01.alteeve.ca dev 0.0 agent fence_ipmilan result: error from agent
Oct 23 14:02:31 an-c05n02 fenced[6807]: fence an-c05n01.alteeve.ca dev 1.0 agent fence_apc_snmp result: error from agent
Oct 23 14:02:31 an-c05n02 fenced[6807]: fence an-c05n01.alteeve.ca failed
Oct 23 14:02:35 an-c05n02 fenced[6807]: fencing node an-c05n01.alteeve.ca
Oct 23 14:03:00 an-c05n02 fenced[6807]: fence an-c05n01.alteeve.ca dev 0.0 agent fence_ipmilan result: error from agent
Oct 23 14:03:00 an-c05n02 fenced[6807]: fence an-c05n01.alteeve.ca dev 1.0 agent fence_apc_snmp result: error from agent
Oct 23 14:03:00 an-c05n02 fenced[6807]: fence an-c05n01.alteeve.ca failed
Oct 23 14:03:03 an-c05n02 fenced[6807]: fencing node an-c05n01.alteeve.ca
Oct 23 14:03:28 an-c05n02 fenced[6807]: fence an-c05n01.alteeve.ca dev 0.0 agent fence_ipmilan result: error from agent
Oct 23 14:03:28 an-c05n02 fenced[6807]: fence an-c05n01.alteeve.ca dev 1.0 agent fence_apc_snmp result: error from agent
Oct 23 14:03:28 an-c05n02 fenced[6807]: fence an-c05n01.alteeve.ca failed
Oct 23 14:03:54 an-c05n02 kernel: e1000e: eth3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Oct 23 14:03:54 an-c05n02 kernel: bonding: bond0: link status up for interface eth3, enabling it in 0 ms.
Oct 23 14:03:54 an-c05n02 kernel: bond0: link status definitely up for interface eth3, 1000 Mbps full duplex.
Oct 23 14:03:54 an-c05n02 kernel: bonding: bond0: making interface eth3 the new active one.
Oct 23 14:03:54 an-c05n02 kernel: bonding: bond0: first active interface up!
Oct 23 14:03:54 an-c05n02 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Oct 23 14:03:54 an-c05n02 kernel: bonding: bond2: link status up for interface eth2, enabling it in 0 ms.
Oct 23 14:03:54 an-c05n02 kernel: bond2: link status definitely up for interface eth2, 1000 Mbps full duplex.
Oct 23 14:03:54 an-c05n02 kernel: bonding: bond2: making interface eth2 the new active one.
Oct 23 14:03:54 an-c05n02 kernel: device eth2 entered promiscuous mode
Oct 23 14:03:54 an-c05n02 kernel: bonding: bond2: first active interface up!
Oct 23 14:03:54 an-c05n02 kernel: vbr2: port 1(bond2) entering forwarding state
Oct 23 14:03:54 an-c05n02 corosync[6754]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 23 14:03:54 an-c05n02 corosync[6754]: [QUORUM] Members[2]: 1 2
Oct 23 14:03:54 an-c05n02 corosync[6754]: [QUORUM] Members[2]: 1 2
Oct 23 14:03:54 an-c05n02 corosync[6754]: [CPG ] chosen downlist: sender r(0) ip(10.20.50.1) ; members(old:1 left:0)
Oct 23 14:03:54 an-c05n02 corosync[6754]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 23 14:03:54 an-c05n02 gfs_controld[6882]: receive_start 1:4 add node with started_count 3
Oct 23 14:03:56 an-c05n02 kernel: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Oct 23 14:03:56 an-c05n02 kernel: bonding: bond1: link status up for interface eth1, enabling it in 0 ms.
Oct 23 14:03:56 an-c05n02 kernel: bond1: link status definitely up for interface eth1, 1000 Mbps full duplex.
Oct 23 14:03:56 an-c05n02 kernel: bonding: bond1: making interface eth1 the new active one.
Oct 23 14:03:56 an-c05n02 kernel: bonding: bond1: first active interface up!
Oct 23 14:03:56 an-c05n02 kernel: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Oct 23 14:03:56 an-c05n02 kernel: bonding: bond0: link status up for interface eth0, enabling it in 12000 ms.
Oct 23 14:03:56 an-c05n02 fenced[6807]: fence an-c05n01.alteeve.ca dev 0.0 agent fence_ipmilan result: error from agent

# The switches recover, and cman finally fences node 1.
Oct 23 14:03:56 an-c05n02 fenced[6807]: fence an-c05n01.alteeve.ca success

Oct 23 14:03:57 an-c05n02 kernel: igb: eth4 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Oct 23 14:03:57 an-c05n02 kernel: bonding: bond1: link status up for interface eth4, enabling it in 12000 ms.
Oct 23 14:03:57 an-c05n02 kernel: igb: eth5 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Oct 23 14:03:57 an-c05n02 kernel: bonding: bond2: link status up for interface eth5, enabling it in 12000 ms.
Oct 23 14:04:05 an-c05n02 corosync[6754]: [TOTEM ] A processor failed, forming new configuration.
Oct 23 14:04:07 an-c05n02 corosync[6754]: [QUORUM] Members[1]: 2
Oct 23 14:04:07 an-c05n02 corosync[6754]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 23 14:04:07 an-c05n02 kernel: dlm: closing connection to node 1
Oct 23 14:04:07 an-c05n02 corosync[6754]: [CPG ] chosen downlist: sender r(0) ip(10.20.50.2) ; members(old:2 left:1)
Oct 23 14:04:07 an-c05n02 corosync[6754]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 23 14:04:07 an-c05n02 rgmanager[7487]: Marking service:storage_an01 as stopped: Restricted domain unavailable
Oct 23 14:04:07 an-c05n02 rgmanager[7487]: Taking over service vm:vm01-dev from down member an-c05n01.alteeve.ca
Oct 23 14:04:07 an-c05n02 rgmanager[7487]: Marking service:storage_an01 as stopped: Restricted domain unavailable
Oct 23 14:04:08 an-c05n02 kernel: bond0: link status definitely up for interface eth0, 1000 Mbps full duplex.
Oct 23 14:04:08 an-c05n02 kernel: bonding: bond0: making interface eth0 the new active one.
Oct 23 14:04:09 an-c05n02 kernel: bond1: link status definitely up for interface eth4, 1000 Mbps full duplex.
Oct 23 14:04:09 an-c05n02 kernel: bond2: link status definitely up for interface eth5, 1000 Mbps full duplex.
Oct 23 14:04:09 an-c05n02 kernel: vbr2: port 1(bond2) entering forwarding state

# At this point, DRBD still hasn't fenced the peer because the handler is still broken.
Oct 23 14:04:21 an-c05n02 kernel: INFO: task gfs2_quotad:9195 blocked for more than 120 seconds.
Oct 23 14:04:21 an-c05n02 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<snip gfs2 traces>
Oct 23 14:08:21 an-c05n02 kernel: [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Oct 23 14:20:54 an-c05n02 libvirtd: Could not find keytab file: /etc/libvirt/krb5.tab: No such file or directory
Oct 23 14:20:54 an-c05n02 kernel: lo: Disabled Privacy Extensions


Oct 23 14:28:58 an-c05n02 kernel: block drbd0: Handshake successful: Agreed network protocol version 97
Oct 23 14:28:58 an-c05n02 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Oct 23 14:28:58 an-c05n02 kernel: block drbd0: Starting asender thread (from drbd0_receiver [8973])
Oct 23 14:28:58 an-c05n02 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 23 14:28:58 an-c05n02 kernel: block drbd0: drbd_sync_handshake:
Oct 23 14:28:58 an-c05n02 kernel: block drbd0: self C9F77959FD4A71D3:0000000000000000:282E11A241B8A05A:282D11A241B8A05B bits:0 flags:0
Oct 23 14:28:58 an-c05n02 kernel: block drbd0: peer C9F77959FD4A71D2:0000000000000000:282E11A241B8A05B:282D11A241B8A05B bits:263168 flags:2
Oct 23 14:28:58 an-c05n02 kernel: block drbd0: uuid_compare()=-1 by rule 40
Oct 23 14:28:58 an-c05n02 kernel: block drbd0: I shall become SyncTarget, but I am primary!
Oct 23 14:28:58 an-c05n02 kernel: block drbd0: conn( WFReportParams -> Disconnecting )
Oct 23 14:28:58 an-c05n02 kernel: block drbd0: error receiving ReportState, l: 4!
Oct 23 14:28:58 an-c05n02 kernel: block drbd0: asender terminated
Oct 23 14:28:58 an-c05n02 kernel: block drbd0: Terminating drbd0_asender
Oct 23 14:28:58 an-c05n02 kernel: block drbd0: Connection closed
Oct 23 14:28:58 an-c05n02 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Oct 23 14:28:58 an-c05n02 kernel: block drbd0: receiver terminated
Oct 23 14:28:58 an-c05n02 kernel: block drbd0: Terminating drbd0_receiver
Oct 23 14:28:58 an-c05n02 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0
Oct 23 14:28:58 an-c05n02 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 126 (0x7e00)
Oct 23 14:28:58 an-c05n02 kernel: block drbd0: fence-peer helper broken, returned 126
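The 126 returned here is the code a POSIX shell uses when a command is found but cannot be executed, which is consistent with the handler having been left at mode 644 (no execute bit). A small Python sketch reproduces this with a throwaway script. It assumes, without confirmation from this log, that drbdadm hands the handler command to /bin/sh; the file name "fake_fence_helper" is hypothetical, and the sketch assumes a Linux host whose /tmp is not mounted noexec:

```python
import os
import subprocess
import tempfile


def run_helper_with_mode(mode: int) -> int:
    """Write a tiny stand-in fence helper, chmod it, run it through
    /bin/sh -c (as we assume drbdadm does), and return the exit code."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for rhcs_fence; it simply exits 7.
        path = os.path.join(tmp, "fake_fence_helper")
        with open(path, "w") as f:
            f.write("#!/bin/sh\nexit 7\n")
        os.chmod(path, mode)
        # Silence the shell's "Permission denied" message for the 644 case.
        result = subprocess.run(["/bin/sh", "-c", path], stderr=subprocess.DEVNULL)
        return result.returncode


for mode in (0o644, 0o755):
    print(f"mode {mode:o}: exit code {run_helper_with_mode(mode)}")
```

With mode 644 the shell reports 126; with 755 the script runs and its own exit code (7) comes back. Running as root does not help: Linux refuses to execute a regular file that has no execute bit set at all, even for root.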
Oct 23 14:34:46 an-c05n02 kernel: block drbd0: conn( StandAlone -> Unconnected )
Oct 23 14:34:46 an-c05n02 kernel: block drbd0: Starting receiver thread (from drbd0_worker [8965])
Oct 23 14:34:46 an-c05n02 kernel: block drbd0: receiver (re)started
Oct 23 14:34:46 an-c05n02 kernel: block drbd0: conn( Unconnected -> WFConnection )
Oct 23 14:34:47 an-c05n02 kernel: block drbd0: Handshake successful: Agreed network protocol version 97
Oct 23 14:34:47 an-c05n02 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Oct 23 14:34:47 an-c05n02 kernel: block drbd0: Starting asender thread (from drbd0_receiver [26267])
Oct 23 14:34:47 an-c05n02 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 23 14:34:47 an-c05n02 kernel: block drbd0: drbd_sync_handshake:
Oct 23 14:34:47 an-c05n02 kernel: block drbd0: self C9F77959FD4A71D3:0000000000000000:282E11A241B8A05A:282D11A241B8A05B bits:0 flags:0
Oct 23 14:34:47 an-c05n02 kernel: block drbd0: peer C9F77959FD4A71D2:0000000000000000:282E11A241B8A05B:282D11A241B8A05B bits:263168 flags:2
Oct 23 14:34:47 an-c05n02 kernel: block drbd0: uuid_compare()=-1 by rule 40
Oct 23 14:34:47 an-c05n02 kernel: block drbd0: I shall become SyncTarget, but I am primary!
Oct 23 14:34:47 an-c05n02 kernel: block drbd0: conn( WFReportParams -> Disconnecting )
Oct 23 14:34:47 an-c05n02 kernel: block drbd0: error receiving ReportState, l: 4!
Oct 23 14:34:47 an-c05n02 kernel: block drbd0: asender terminated
Oct 23 14:34:47 an-c05n02 kernel: block drbd0: Terminating drbd0_asender
Oct 23 14:34:47 an-c05n02 kernel: block drbd0: Connection closed
Oct 23 14:34:47 an-c05n02 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Oct 23 14:34:47 an-c05n02 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0
Oct 23 14:34:47 an-c05n02 kernel: block drbd0: receiver terminated
Oct 23 14:34:47 an-c05n02 kernel: block drbd0: Terminating drbd0_receiver
Oct 23 14:34:47 an-c05n02 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 126 (0x7e00)
Oct 23 14:34:47 an-c05n02 kernel: block drbd0: fence-peer helper broken, returned 126
Oct 23 14:37:46 an-c05n02 udevd[865]: worker [9349] unexpectedly returned with status 0x0100
Oct 23 14:37:46 an-c05n02 udevd[865]: worker [9349] failed while handling '/devices/virtual/block/drbd0'
Oct 23 14:37:48 an-c05n02 kernel: block drbd1: Handshake successful: Agreed network protocol version 97
Oct 23 14:37:48 an-c05n02 kernel: block drbd1: conn( WFConnection -> WFReportParams )
Oct 23 14:37:48 an-c05n02 kernel: block drbd1: Starting asender thread (from drbd1_receiver [7153])
Oct 23 14:37:48 an-c05n02 kernel: block drbd1: data-integrity-alg: <not-used>
Oct 23 14:37:48 an-c05n02 kernel: block drbd1: drbd_sync_handshake:
Oct 23 14:37:48 an-c05n02 kernel: block drbd1: self 8314C38D1738144B:0000000000000000:97927DD1274D9799:97917DD1274D9799 bits:0 flags:0
Oct 23 14:37:48 an-c05n02 kernel: block drbd1: peer 8314C38D1738144A:0000000000000000:97927DD1274D9799:97917DD1274D9799 bits:263168 flags:2
Oct 23 14:37:48 an-c05n02 kernel: block drbd1: uuid_compare()=-1 by rule 40
Oct 23 14:37:48 an-c05n02 kernel: block drbd1: I shall become SyncTarget, but I am primary!
Oct 23 14:37:48 an-c05n02 kernel: block drbd1: conn( WFReportParams -> Disconnecting )
Oct 23 14:37:48 an-c05n02 kernel: block drbd1: error receiving ReportState, l: 4!
Oct 23 14:37:48 an-c05n02 kernel: block drbd1: asender terminated
Oct 23 14:37:48 an-c05n02 kernel: block drbd1: Terminating drbd1_asender
Oct 23 14:37:48 an-c05n02 kernel: block drbd1: Connection closed
Oct 23 14:37:48 an-c05n02 kernel: block drbd1: helper command: /sbin/drbdadm fence-peer minor-1
Oct 23 14:37:48 an-c05n02 kernel: block drbd1: conn( Disconnecting -> StandAlone )
Oct 23 14:37:48 an-c05n02 kernel: block drbd1: receiver terminated
Oct 23 14:37:48 an-c05n02 kernel: block drbd1: Terminating drbd1_receiver
Oct 23 14:37:48 an-c05n02 kernel: block drbd1: helper command: /sbin/drbdadm fence-peer minor-1 exit code 126 (0x7e00)
Oct 23 14:37:48 an-c05n02 kernel: block drbd1: fence-peer helper broken, returned 126
Oct 23 14:44:01 an-c05n02 kernel: block drbd0: conn( StandAlone -> Unconnected )
Oct 23 14:44:01 an-c05n02 kernel: block drbd0: Starting receiver thread (from drbd0_worker [8965])
Oct 23 14:44:01 an-c05n02 kernel: block drbd0: receiver (re)started
Oct 23 14:44:01 an-c05n02 kernel: block drbd0: conn( Unconnected -> WFConnection )
Oct 23 14:44:02 an-c05n02 kernel: block drbd0: Handshake successful: Agreed network protocol version 97
Oct 23 14:44:02 an-c05n02 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Oct 23 14:44:02 an-c05n02 kernel: block drbd0: Starting asender thread (from drbd0_receiver [4278])
Oct 23 14:44:02 an-c05n02 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 23 14:44:02 an-c05n02 kernel: block drbd0: drbd_sync_handshake:
Oct 23 14:44:02 an-c05n02 kernel: block drbd0: self C9F77959FD4A71D3:0000000000000000:282E11A241B8A05A:282D11A241B8A05B bits:0 flags:0
Oct 23 14:44:02 an-c05n02 kernel: block drbd0: peer C9F77959FD4A71D2:0000000000000000:282E11A241B8A05B:282D11A241B8A05B bits:263168 flags:2
Oct 23 14:44:02 an-c05n02 kernel: block drbd0: uuid_compare()=-1 by rule 40
Oct 23 14:44:02 an-c05n02 kernel: block drbd0: I shall become SyncTarget, but I am primary!
Oct 23 14:44:02 an-c05n02 kernel: block drbd0: conn( WFReportParams -> Disconnecting )
Oct 23 14:44:02 an-c05n02 kernel: block drbd0: error receiving ReportState, l: 4!
Oct 23 14:44:02 an-c05n02 kernel: block drbd0: asender terminated
Oct 23 14:44:02 an-c05n02 kernel: block drbd0: Terminating drbd0_asender
Oct 23 14:44:02 an-c05n02 kernel: block drbd0: Connection closed

# At this point, the mode of rhcs_fence had been set back to 755. This attempt was triggered when 'drbdadm connect r0' was run from this node.
Oct 23 14:44:02 an-c05n02 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0
Oct 23 14:44:02 an-c05n02 kernel: block drbd0: conn( Disconnecting -> StandAlone )
Oct 23 14:44:02 an-c05n02 kernel: block drbd0: receiver terminated
Oct 23 14:44:02 an-c05n02 kernel: block drbd0: Terminating drbd0_receiver
Oct 23 14:44:02 an-c05n02 rhcs_fence: Attempting to fence peer using RHCS from DRBD...
Oct 23 14:44:21 an-c05n02 corosync[6754]: [TOTEM ] A processor failed, forming new configuration.
Oct 23 14:44:23 an-c05n02 corosync[6754]: [QUORUM] Members[1]: 2
Oct 23 14:44:23 an-c05n02 corosync[6754]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 23 14:44:23 an-c05n02 kernel: dlm: closing connection to node 1
Oct 23 14:44:23 an-c05n02 corosync[6754]: [CPG ] chosen downlist: sender r(0) ip(10.20.50.2) ; members(old:2 left:1)
Oct 23 14:44:23 an-c05n02 corosync[6754]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 23 14:44:23 an-c05n02 fenced[6807]: fencing node an-c05n01.alteeve.ca
Oct 23 14:44:27 an-c05n02 fenced[6807]: fence an-c05n01.alteeve.ca success
Oct 23 14:44:27 an-c05n02 fence_node[4546]: fence an-c05n01.alteeve.ca success
Oct 23 14:44:27 an-c05n02 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 7 (0x700)
Oct 23 14:44:27 an-c05n02 kernel: block drbd0: fence-peer helper returned 7 (peer was stonithed)
Oct 23 14:44:27 an-c05n02 kernel: block drbd0: pdsk( DUnknown -> Outdated )
Oct 23 14:44:27 an-c05n02 kernel: block drbd0: new current UUID BDB851199DCABE3F:C9F77959FD4A71D3:282E11A241B8A05A:282D11A241B8A05B
Oct 23 14:44:27 an-c05n02 kernel: block drbd0: susp( 1 -> 0 )
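The "fence-peer helper returned 7 (peer was stonithed)" line, versus the earlier "fence-peer helper broken" lines, reflects how DRBD interprets the helper's exit code. A Python sketch of that mapping, based on our reading of the DRBD 8.3 kernel module rather than anything in this log; only code 7 and the "broken" fallback are confirmed above, so treat the wording for codes 3 through 6 as an assumption to check against the DRBD documentation:

```python
# Assumed mapping of fence-peer helper exit codes to DRBD's interpretation.
# Only 7 and the "broken" fallback are confirmed by the log above; the
# wording for 5 assumes the local disk is UpToDate.
FENCE_PEER_RESULTS = {
    3: "peer is inconsistent or worse",
    4: "peer was fenced",
    5: "peer is unreachable, assumed to be dead",
    6: "peer is active",
    7: "peer was stonithed",
}


def describe_fence_peer(exit_code: int) -> str:
    """Render a fence-peer helper exit code the way DRBD logs it."""
    meaning = FENCE_PEER_RESULTS.get(exit_code)
    if meaning is None:
        # Any unrecognized code (10, 126 and 1 in this log) means the
        # helper is treated as broken and the peer is NOT considered fenced.
        return f"fence-peer helper broken, returned {exit_code}"
    return f"fence-peer helper returned {exit_code} ({meaning})"


for code in (10, 126, 7, 1):
    print(describe_fence_peer(code))
```

In this log, only the 7 result was followed by the peer disk being marked Outdated (pdsk( DUnknown -> Outdated )) and I/O resuming (susp( 1 -> 0 )), which is why r0 recovered while r1 stayed blocked.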
Oct 23 14:44:27 an-c05n02 kernel: GFS2: fsid=an-cluster-A:shared.1: jid=0: Looking at journal...
Oct 23 14:44:27 an-c05n02 rgmanager[7487]: start on vm "vm01-dev" returned 1 (generic error)
Oct 23 14:44:27 an-c05n02 kernel: GFS2: fsid=an-cluster-A:shared.1: jid=0: Done
Oct 23 14:44:27 an-c05n02 rgmanager[5023]: [vm] Could not determine Hypervisor
Oct 23 14:44:27 an-c05n02 rgmanager[7487]: status on vm "vm02-cthulhu" returned 2 (invalid argument(s))

# vm02-cthulhu is on r1, which is still blocked at this point, hence the failures. vm01-dev was on r0, which is now working, so that VM recovers.
Oct 23 14:44:27 an-c05n02 rgmanager[7487]: Stopping service vm:vm02-cthulhu
Oct 23 14:44:27 an-c05n02 rgmanager[7487]: #68: Failed to start vm:vm01-dev; return value: 1
Oct 23 14:44:27 an-c05n02 rgmanager[7487]: Stopping service vm:vm01-dev
Oct 23 14:44:28 an-c05n02 rgmanager[7487]: Service vm:vm01-dev is recovering
Oct 23 14:44:28 an-c05n02 rgmanager[7487]: #71: Relocating failed service vm:vm01-dev
Oct 23 14:44:28 an-c05n02 rgmanager[7487]: Service vm:vm01-dev is stopped
Oct 23 14:44:28 an-c05n02 rgmanager[7487]: Starting stopped service vm:vm01-dev
Oct 23 14:44:28 an-c05n02 kernel: device vnet1 entered promiscuous mode
Oct 23 14:44:28 an-c05n02 kernel: vbr2: port 3(vnet1) entering forwarding state
Oct 23 14:44:28 an-c05n02 qemu-kvm: Could not find keytab file: /etc/qemu/krb5.tab: No such file or directory
Oct 23 14:44:29 an-c05n02 rgmanager[7487]: Service vm:vm01-dev started
Oct 23 14:44:31 an-c05n02 ntpd[7102]: Listening on interface #11 vnet1, fe80::fc54:ff:fed4:2230#123 Enabled
Oct 23 14:44:38 an-c05n02 kernel: kvm: 5354: cpu0 disabled perfctr wrmsr: 0xc1 data 0xabcd
Oct 23 14:44:43 an-c05n02 kernel: vbr2: port 3(vnet1) entering forwarding state
Oct 23 14:46:29 an-c05n02 corosync[6754]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Oct 23 14:46:29 an-c05n02 corosync[6754]: [QUORUM] Members[2]: 1 2
Oct 23 14:46:29 an-c05n02 corosync[6754]: [QUORUM] Members[2]: 1 2
Oct 23 14:46:29 an-c05n02 corosync[6754]: [CPG ] chosen downlist: sender r(0) ip(10.20.50.1) ; members(old:1 left:0)
Oct 23 14:46:29 an-c05n02 corosync[6754]: [MAIN ] Completed service synchronization, ready to provide service.
Oct 23 14:49:15 an-c05n02 rgmanager[7487]: stop on vm "vm02-cthulhu" returned 1 (generic error)
Oct 23 14:49:15 an-c05n02 rgmanager[7487]: #12: RG vm:vm02-cthulhu failed to stop; intervention required
Oct 23 14:49:15 an-c05n02 rgmanager[7487]: Service vm:vm02-cthulhu is failed
Oct 23 14:49:15 an-c05n02 rgmanager[7487]: #43: Service vm:vm02-cthulhu has failed; can not start.
Oct 23 14:49:15 an-c05n02 rgmanager[7487]: #13: Service vm:vm02-cthulhu failed to stop cleanly
Oct 23 14:52:00 an-c05n02 kernel: dlm: got connection from 1
Oct 23 15:16:19 an-c05n02 kernel: block drbd0: conn( StandAlone -> Unconnected )
Oct 23 15:16:19 an-c05n02 kernel: block drbd0: Starting receiver thread (from drbd0_worker [8965])
Oct 23 15:16:19 an-c05n02 kernel: block drbd0: receiver (re)started
Oct 23 15:16:19 an-c05n02 kernel: block drbd0: conn( Unconnected -> WFConnection )
Oct 23 15:16:19 an-c05n02 kernel: block drbd0: Handshake successful: Agreed network protocol version 97
Oct 23 15:16:19 an-c05n02 kernel: block drbd0: conn( WFConnection -> WFReportParams )
Oct 23 15:16:19 an-c05n02 kernel: block drbd0: Starting asender thread (from drbd0_receiver [30109])
Oct 23 15:16:19 an-c05n02 kernel: block drbd0: data-integrity-alg: <not-used>
Oct 23 15:16:19 an-c05n02 kernel: block drbd0: drbd_sync_handshake:
Oct 23 15:16:19 an-c05n02 kernel: block drbd0: self BDB851199DCABE3F:C9F77959FD4A71D3:282E11A241B8A05A:282D11A241B8A05B bits:427 flags:0
Oct 23 15:16:19 an-c05n02 kernel: block drbd0: peer C9F77959FD4A71D2:0000000000000000:282E11A241B8A05B:282D11A241B8A05B bits:0 flags:2
Oct 23 15:16:19 an-c05n02 kernel: block drbd0: uuid_compare()=1 by rule 70
Oct 23 15:16:19 an-c05n02 kernel: block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> Consistent )
Oct 23 15:16:19 an-c05n02 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0
Oct 23 15:16:19 an-c05n02 kernel: block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
Oct 23 15:16:19 an-c05n02 kernel: block drbd0: conn( WFBitMapS -> SyncSource ) pdsk( Consistent -> Inconsistent )
Oct 23 15:16:19 an-c05n02 kernel: block drbd0: Began resync as SyncSource (will sync 1708 KB [427 bits set]).
Oct 23 15:16:19 an-c05n02 kernel: block drbd0: updated sync UUID BDB851199DCABE3F:C9F87959FD4A71D3:C9F77959FD4A71D3:282E11A241B8A05A
Oct 23 15:16:22 an-c05n02 kernel: block drbd0: Resync done (total 2 sec; paused 0 sec; 852 K/sec)
Oct 23 15:16:22 an-c05n02 kernel: block drbd0: updated UUIDs BDB851199DCABE3F:0000000000000000:C9F87959FD4A71D3:C9F77959FD4A71D3
Oct 23 15:16:22 an-c05n02 kernel: block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
Oct 23 15:16:22 an-c05n02 kernel: block drbd0: bitmap WRITE of 3153 pages took 10 jiffies
Oct 23 15:16:22 an-c05n02 kernel: block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Oct 23 15:18:22 an-c05n02 kernel: block drbd1: conn( StandAlone -> Unconnected )
Oct 23 15:18:22 an-c05n02 kernel: block drbd1: Starting receiver thread (from drbd1_worker [7142])
Oct 23 15:18:22 an-c05n02 kernel: block drbd1: receiver (re)started
Oct 23 15:18:22 an-c05n02 kernel: block drbd1: conn( Unconnected -> WFConnection )
Oct 23 15:18:23 an-c05n02 kernel: block drbd1: Handshake successful: Agreed network protocol version 97
Oct 23 15:18:23 an-c05n02 kernel: block drbd1: conn( WFConnection -> WFReportParams )
Oct 23 15:18:23 an-c05n02 kernel: block drbd1: Starting asender thread (from drbd1_receiver [32137])
Oct 23 15:18:23 an-c05n02 kernel: block drbd1: data-integrity-alg: <not-used>
Oct 23 15:18:23 an-c05n02 kernel: block drbd1: drbd_sync_handshake:
Oct 23 15:18:23 an-c05n02 kernel: block drbd1: self 8314C38D1738144B:0000000000000000:97927DD1274D9799:97917DD1274D9799 bits:0 flags:0
Oct 23 15:18:23 an-c05n02 kernel: block drbd1: peer 8314C38D1738144A:0000000000000000:97927DD1274D9799:97917DD1274D9799 bits:0 flags:2
Oct 23 15:18:23 an-c05n02 kernel: block drbd1: uuid_compare()=-1 by rule 40
Oct 23 15:18:23 an-c05n02 kernel: block drbd1: I shall become SyncTarget, but I am primary!
Oct 23 15:18:23 an-c05n02 kernel: block drbd1: conn( WFReportParams -> Disconnecting )
Oct 23 15:18:23 an-c05n02 kernel: block drbd1: error receiving ReportState, l: 4!
Oct 23 15:18:23 an-c05n02 kernel: block drbd1: asender terminated
Oct 23 15:18:23 an-c05n02 kernel: block drbd1: Terminating drbd1_asender
Oct 23 15:18:23 an-c05n02 kernel: block drbd1: Connection closed
Oct 23 15:18:23 an-c05n02 kernel: block drbd1: helper command: /sbin/drbdadm fence-peer minor-1
Oct 23 15:18:23 an-c05n02 kernel: block drbd1: conn( Disconnecting -> StandAlone )
Oct 23 15:18:23 an-c05n02 kernel: block drbd1: receiver terminated
Oct 23 15:18:23 an-c05n02 kernel: block drbd1: Terminating drbd1_receiver
Oct 23 15:18:23 an-c05n02 kernel: block drbd0: peer( Secondary -> Primary )
Oct 23 15:18:23 an-c05n02 rhcs_fence: Attempting to fence peer using RHCS from DRBD...
Oct 23 15:18:24 an-c05n02 corosync[6754]: cman killed by node 1 because we were killed by cman_tool or other application
Oct 23 15:18:24 an-c05n02 fenced[6807]: cluster is down, exiting
Oct 23 15:18:24 an-c05n02 gfs_controld[6882]: cluster is down, exiting
Oct 23 15:18:24 an-c05n02 dlm_controld[6833]: cluster is down, exiting
Oct 23 15:18:24 an-c05n02 fenced[6807]: daemon cpg_dispatch error 2
Oct 23 15:18:24 an-c05n02 gfs_controld[6882]: daemon cpg_dispatch error 2
Oct 23 15:18:24 an-c05n02 dlm_controld[6833]: daemon cpg_dispatch error 2
Oct 23 15:18:24 an-c05n02 rgmanager[7487]: #67: Shutting down uncleanly
Oct 23 15:18:24 an-c05n02 rgmanager[32242]: [script] Executing /etc/init.d/libvirtd stop
Oct 23 15:18:24 an-c05n02 rgmanager[32270]: [vm] Could not determine Hypervisor
Oct 23 15:18:24 an-c05n02 rgmanager[7487]: stop on vm "vm01-dev" returned 2 (invalid argument(s))
Oct 23 15:18:24 an-c05n02 rgmanager[32290]: [vm] Could not determine Hypervisor
Oct 23 15:18:24 an-c05n02 rgmanager[7487]: stop on vm "vm02-cthulhu" returned 2 (invalid argument(s))
Oct 23 15:18:24 an-c05n02 rgmanager[32307]: [script] Executing /etc/init.d/gfs2 stop
Oct 23 15:18:32 an-c05n02 kernel: dlm: closing connection to node 1
Oct 23 15:18:32 an-c05n02 kernel: dlm: closing connection to node 2
Oct 23 15:18:32 an-c05n02 kernel: dlm: shared: no userland control daemon, stopping lockspace
Oct 23 15:18:32 an-c05n02 kernel: dlm: clvmd: no userland control daemon, stopping lockspace
Oct 23 15:18:32 an-c05n02 kernel: dlm: rgmanager: no userland control daemon, stopping lockspace
Oct 23 15:18:32 an-c05n02 kernel: block drbd1: helper command: /sbin/drbdadm fence-peer minor-1 exit code 1 (0x100)
Oct 23 15:18:32 an-c05n02 kernel: block drbd1: fence-peer helper broken, returned 1

# This is where node 1 fenced this node after 'drbdadm connect r1' was run.
Write failed: Broken pipe