damianivereigh

Galera crash info

May 23rd, 2015
  1. Servers involved:-
  2. 103.25.214.60 - apollo.launtel.net.au - node crashed
  3. 122.201.85.240 - aphrodite.launtel.net.au - node crashed
  4. 103.25.214.61 - zeus.launtel.net.au - DROP TABLE was executed on this node; it didn't crash but went non-primary
  5. 54.66.156.123 - ares.launtel.net.au - didn't crash but went non-primary
  6. 172.31.15.84 - ares.launtel.net.au - internal IP of the above server, which sits behind a NAT gateway (AWS)
  7.  
  8. Galera section of the MySQL config file, as found on zeus (identical on every node apart from the IP addresses and node names):-
  9. [galera]
  10. wsrep_provider=/usr/lib64/galera/libgalera_smm.so
  11. wsrep_cluster_address=gcomm://122.201.85.240,103.25.214.60,54.66.156.123
  12. binlog_format=row
  13. default_storage_engine=InnoDB
  14. innodb_autoinc_lock_mode=2
  15. bind-address=0.0.0.0
  16. wsrep_cluster_name='launtel'
  17. wsrep_node_address='103.25.214.61'
  18. wsrep_node_name='zeus'
  19. wsrep_sst_method=rsync
  20. wsrep_sst_auth=replicate:XXXXXXXXXXXX
  21. wsrep_on=ON
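
A quick sanity check on the settings above, as a minimal sketch only: it parses a [galera] fragment and lists the peers named in wsrep_cluster_address next to the node's own wsrep_node_address. The sample values are copied from the config shown here; the helper name cluster_peers is hypothetical and is not part of Galera or MariaDB.

```python
import configparser

# Fragment copied from the [galera] section above (zeus's copy).
SAMPLE = """
[galera]
wsrep_cluster_address=gcomm://122.201.85.240,103.25.214.60,54.66.156.123
wsrep_node_address='103.25.214.61'
wsrep_node_name='zeus'
"""


def cluster_peers(ini_text):
    """Return (node_address, [peer addresses]) from a [galera] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["galera"]
    node = section["wsrep_node_address"].strip("'\"")
    address = section["wsrep_cluster_address"].strip("'\"")
    # Drop the gcomm:// scheme and any ?option=value suffix, then split on commas.
    hosts = address.split("://", 1)[1].split("?", 1)[0]
    peers = [h.strip() for h in hosts.split(",") if h.strip()]
    return node, peers


node, peers = cluster_peers(SAMPLE)
print("node:", node)
print("peers:", peers)
print("cluster size implied by this file:", len(peers) + (node not in peers))
```

With zeus's own address left out of the list, three peers plus the local node gives the four members that appear in the view dumps further down.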
  22. ===========================================================================================
  23.  
  24. /var/lib/mysql/apollo.launtel.net.au.err (node crashed):-
  25. 150523 13:28:51 [ERROR] mysqld got signal 11 ;
  26. This could be because you hit a bug. It is also possible that this binary
  27. or one of the libraries it was linked against is corrupt, improperly built,
  28. or misconfigured. This error can also be caused by malfunctioning hardware.
  29.  
  30. To report this bug, see http://kb.askmonty.org/en/reporting-bugs
  31.  
  32. We will try our best to scrape up some info that will hopefully help
  33. diagnose the problem, but since we have already crashed,
  34. something is definitely wrong and this may fail.
  35.  
  36. Server version: 5.5.41-MariaDB-wsrep
  37. key_buffer_size=134217728
  38. read_buffer_size=131072
  39. max_used_connections=16
  40. max_threads=153
  41. thread_count=17
  42. It is possible that mysqld could use up to
  43. key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 466778 K bytes of memory
  44. Hope that's ok; if not, decrease some variables in the equation.
  45.  
  46. Thread pointer: 0x0x7fc6e6c14000
  47. Attempting backtrace. You can use the following information to find out
  48. where mysqld died. If you see no messages after this, something went
  49. terribly wrong...
  50. stack_bottom = 0x7fc6f90c59b0 thread_stack 0x48000
  51. /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0xae59ee]
  52. /usr/sbin/mysqld(handle_fatal_signal+0x390)[0x6fec00]
  53. /lib64/libpthread.so.0(+0xf130)[0x7fc6f8cfc130]
  54. /usr/sbin/mysqld(_ZN28Format_description_log_event14do_apply_eventEPK14Relay_log_info+0xc8)[0x7c45b8]
  55. /usr/sbin/mysqld(_Z14wsrep_apply_cbPvPKvmjPK14wsrep_trx_meta+0x6d0)[0x6b2380]
  56. /usr/lib64/galera/libgalera_smm.so(_ZNK6galera9TrxHandle5applyEPvPF15wsrep_cb_statusS1_PKvmjPK14wsrep_trx_metaERS6_+0x100)[0x7fc6f3d551a0]
  57. /usr/lib64/galera/libgalera_smm.so(+0x1b7330)[0x7fc6f3d8a330]
  58. /usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM9apply_trxEPvPNS_9TrxHandleE+0xc3)[0x7fc6f3d8ca53]
  59. /usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM11process_trxEPvPNS_9TrxHandleE+0x136)[0x7fc6f3d8f476]
  60. /usr/lib64/galera/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x1d9)[0x7fc6f3d6e759]
  61. /usr/lib64/galera/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x5c)[0x7fc6f3d6f55c]
  62. /usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x83)[0x7fc6f3d8fa13]
  63. /usr/lib64/galera/libgalera_smm.so(galera_recv+0x2b)[0x7fc6f3d9eedb]
  64. /usr/sbin/mysqld[0x6b2c6f]
  65. /usr/sbin/mysqld(start_wsrep_THD+0x4f8)[0x522718]
  66. /lib64/libpthread.so.0(+0x7df3)[0x7fc6f8cf4df3]
  67. /lib64/libc.so.6(clone+0x6d)[0x7fc6f75721ad]
  68.  
  69. Trying to get some variables.
  70. Some pointers may be invalid and cause the dump to abort.
  71. Query (0x0): is an invalid pointer
  72. Connection ID (thread ID): 2
  73. Status: NOT_KILLED
  74.  
  75. Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=off
  76.  
  77. The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
  78. information that should help you find out what is causing the crash.
  79. 150523 13:28:52 mysqld_safe Number of processes running now: 0
  80. 150523 13:28:52 mysqld_safe WSREP: not restarting wsrep node automatically
  81. 150523 13:28:52 mysqld_safe mysqld from pid file /var/lib/mysql/apollo.launtel.net.au.pid ended
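
Each backtrace frame above has the shape module(symbol+offset)[address], with the symbol or the whole parenthesised part sometimes missing. The following is a rough sketch, not an official tool, that splits frames into those parts so the mangled C++ names can be handed to a demangler such as c++filt. The function name parse_frame is hypothetical, and the sample frames are copied from the trace without the paste's line numbers.

```python
import re

# module, optional "(symbol+offset)", then "[address]".
FRAME_RE = re.compile(
    r"^(?P<module>[^(\[]+)"
    r"(?:\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-f]+))?\))?"
    r"\[(?P<address>0x[0-9a-f]+)\]$"
)


def parse_frame(line):
    """Split one backtrace frame into a dict, or return None if it does not match."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    frame = match.groupdict()
    # Frames such as "(+0xf130)" carry an offset but no symbol name.
    frame["symbol"] = frame["symbol"] or None
    return frame


FRAMES = [
    "/lib64/libpthread.so.0(+0xf130)[0x7fc6f8cfc130]",
    "/usr/sbin/mysqld(_ZN28Format_description_log_event14do_apply_eventEPK14Relay_log_info+0xc8)[0x7c45b8]",
    "/usr/sbin/mysqld(_Z14wsrep_apply_cbPvPKvmjPK14wsrep_trx_meta+0x6d0)[0x6b2380]",
    "/usr/sbin/mysqld[0x6b2c6f]",
]

parsed = [parse_frame(f) for f in FRAMES]
for frame in parsed:
    print(frame["module"], frame["symbol"], frame["offset"], frame["address"])
```

The first mysqld frame below the signal handler demangles to Format_description_log_event::do_apply_event(Relay_log_info const*), called from wsrep_apply_cb, so the crash appears to have happened in an applier thread while applying a replicated event rather than in a client connection.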
  82. ===========================================================================================
  83. /var/lib/mysqld/zeus.launtel.net.au.err (node didn't crash):-
  84. 150523 13:28:52 [Note] WSREP: (a5225fff, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://103.25.214.60:4567
  85. 150523 13:28:53 [Note] WSREP: (a5225fff, 'tcp://0.0.0.0:4567') reconnecting to 9a8aa1a2 (tcp://103.25.214.60:4567), attempt 0
  86. 150523 13:28:53 [Note] WSREP: (a5225fff, 'tcp://0.0.0.0:4567') reconnecting to ed987f90 (tcp://122.201.85.240:4567), attempt 0
  87. 150523 13:28:57 [Note] WSREP: evs::proto(a5225fff, OPERATIONAL, view_id(REG,9a8aa1a2,4)) suspecting node: 9a8aa1a2
  88. 150523 13:28:57 [Note] WSREP: evs::proto(a5225fff, OPERATIONAL, view_id(REG,9a8aa1a2,4)) suspected node without join message, declaring inactive
  89. 150523 13:28:57 [Note] WSREP: evs::proto(a5225fff, OPERATIONAL, view_id(REG,9a8aa1a2,4)) suspecting node: ed987f90
  90. 150523 13:28:57 [Note] WSREP: evs::proto(a5225fff, OPERATIONAL, view_id(REG,9a8aa1a2,4)) suspected node without join message, declaring inactive
  91. 150523 13:29:04 [Warning] WSREP: evs::proto(a5225fff, GATHER, view_id(REG,9a8aa1a2,4)) install timer expired
  92. evs::proto(evs::proto(a5225fff, GATHER, view_id(REG,9a8aa1a2,4)), GATHER) {
  93. current_view=view(view_id(REG,9a8aa1a2,4) memb {
  94. 9a8aa1a2,0
  95. a5225fff,0
  96. cc1f173d,0
  97. ed987f90,0
  98. } joined {
  99. } left {
  100. } partitioned {
  101. }),
  102. input_map=evs::input_map: {aru_seq=67516,safe_seq=67516,node_index=node: {idx=0,range=[67517,67516],safe_seq=67516} node: {idx=1,range=[67531,67530],safe_seq=67516} node: {idx=2,range=[67531,67530],safe_seq=67516} node: {idx=3,range=[67519,67518],safe_seq=67516} },
  103. fifo_seq=153345,
  104. last_sent=67530,
  105. known:
  106. 9a8aa1a2 at tcp://103.25.214.60:4567
  107. {o=0,s=1,i=0,fs=147332,}
  108. a5225fff at
  109. {o=1,s=0,i=0,fs=-1,jm=
  110. {v=0,t=4,ut=255,o=1,s=67516,sr=-1,as=67516,f=0,src=a5225fff,srcvid=view_id(REG,9a8aa1a2,4),insvid=view_id(UNKNOWN,00000000,0),ru=00000000,r=[-1,-1],fs=153343,nl=(
  111. 9a8aa1a2, {o=0,s=1,e=0,ls=-1,vid=view_id(REG,9a8aa1a2,4),ss=67516,ir=[67517,67516],}
  112. a5225fff, {o=1,s=0,e=0,ls=-1,vid=view_id(REG,9a8aa1a2,4),ss=67516,ir=[67531,67530],}
  113. cc1f173d, {o=1,s=0,e=0,ls=-1,vid=view_id(REG,9a8aa1a2,4),ss=67516,ir=[67531,67530],}
  114. ed987f90, {o=0,s=1,e=0,ls=-1,vid=view_id(REG,9a8aa1a2,4),ss=67516,ir=[67519,67518],}
  115. )
  116. },
  117. }
  118. cc1f173d at tcp://54.66.156.123:4567
  119. {o=1,s=0,i=0,fs=135024,jm=
  120. {v=0,t=4,ut=255,o=1,s=67516,sr=-1,as=67516,f=4,src=cc1f173d,srcvid=view_id(REG,9a8aa1a2,4),insvid=view_id(UNKNOWN,00000000,0),ru=00000000,r=[-1,-1],fs=135024,nl=(
  121. 9a8aa1a2, {o=1,s=1,e=0,ls=-1,vid=view_id(REG,9a8aa1a2,4),ss=67516,ir=[67517,67516],}
  122. a5225fff, {o=1,s=0,e=0,ls=-1,vid=view_id(REG,9a8aa1a2,4),ss=67516,ir=[67531,67530],}
  123. cc1f173d, {o=1,s=0,e=0,ls=-1,vid=view_id(REG,9a8aa1a2,4),ss=67516,ir=[67531,67530],}
  124. ed987f90, {o=1,s=1,e=0,ls=-1,vid=view_id(REG,9a8aa1a2,4),ss=67516,ir=[67519,67518],}
  125. )
  126. },
  127. }
  128. ed987f90 at tcp://122.201.85.240:4567
  129. {o=0,s=1,i=0,fs=154192,}
  130. }
  131. 150523 13:29:04 [Note] WSREP: no install message received
  132. 150523 13:29:04 [Note] WSREP: view(view_id(NON_PRIM,9a8aa1a2,4) memb {
  133. a5225fff,0
  134. } joined {
  135. } left {
  136. } partitioned {
  137. 9a8aa1a2,0
  138. cc1f173d,0
  139. ed987f90,0
  140. })
  141. 150523 13:29:04 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
  142. 150523 13:29:04 [Note] WSREP: Flow-control interval: [16, 16]
  143. 150523 13:29:04 [Note] WSREP: Received NON-PRIMARY.
  144. 150523 13:29:04 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 49731511)
  145. 150523 13:29:04 [Warning] WSREP: Last Applied Action message in non-primary configuration from member 0
  146. 150523 13:29:04 [Note] WSREP: view(view_id(NON_PRIM,a5225fff,5) memb {
  147. a5225fff,0
  148. } joined {
  149. } left {
  150. } partitioned {
  151. 9a8aa1a2,0
  152. cc1f173d,0
  153. ed987f90,0
  154. })
  155. 150523 13:29:04 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
  156. 150523 13:29:04 [Note] WSREP: New cluster view: global state: 1e90fb11-b059-11e4-a6ce-0663d9000b59:49731511, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
  157. 150523 13:29:04 [Note] WSREP: Flow-control interval: [16, 16]
  158. 150523 13:29:04 [Note] WSREP: Received NON-PRIMARY.
  159. 150523 13:29:04 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
  160. 150523 13:29:04 [Note] WSREP: New cluster view: global state: 1e90fb11-b059-11e4-a6ce-0663d9000b59:49731511, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
  161. 150523 13:29:04 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
  162. 150523 13:29:04 [Warning] WSREP: Send action {(nil), 439, TORDERED} returned -107 (Transport endpoint is not connected)
  163. 150523 13:29:05 [Note] WSREP: declaring cc1f173d at tcp://54.66.156.123:4567 stable
  164. 150523 13:29:05 [Note] WSREP: view(view_id(NON_PRIM,a5225fff,6) memb {
  165. a5225fff,0
  166. cc1f173d,0
  167. } joined {
  168. } left {
  169. } partitioned {
  170. 9a8aa1a2,0
  171. ed987f90,0
  172. })
  173. 150523 13:29:05 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 2
  174. 150523 13:29:05 [Note] WSREP: Flow-control interval: [23, 23]
  175. 150523 13:29:05 [Note] WSREP: Received NON-PRIMARY.
  176. 150523 13:29:05 [Note] WSREP: New cluster view: global state: 1e90fb11-b059-11e4-a6ce-0663d9000b59:49731511, view# -1: non-Primary, number of nodes: 2, my index: 0, protocol version 3
  177. 150523 13:29:05 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
  178. 150523 13:29:37 [Note] WSREP: (a5225fff, 'tcp://0.0.0.0:4567') reconnecting to 9a8aa1a2 (tcp://103.25.214.60:4567), attempt 30
  179. 150523 13:29:38 [Note] WSREP: (a5225fff, 'tcp://0.0.0.0:4567') reconnecting to ed987f90 (tcp://122.201.85.240:4567), attempt 30
  180. 150523 13:30:22 [Note] WSREP: (a5225fff, 'tcp://0.0.0.0:4567') reconnecting to 9a8aa1a2 (tcp://103.25.214.60:4567), attempt 60
  181. 150523 13:30:23 [Note] WSREP: (a5225fff, 'tcp://0.0.0.0:4567') reconnecting to ed987f90 (tcp://122.201.85.240:4567), attempt 60
  182. 150523 13:31:07 [Note] WSREP: (a5225fff, 'tcp://0.0.0.0:4567') reconnecting to 9a8aa1a2 (tcp://103.25.214.60:4567), attempt 90
  183. 150523 13:31:08 [Note] WSREP: (a5225fff, 'tcp://0.0.0.0:4567') reconnecting to ed987f90 (tcp://122.201.85.240:4567), attempt 90
  184. 150523 13:31:52 [Note] WSREP: (a5225fff, 'tcp://0.0.0.0:4567') reconnecting to 9a8aa1a2 (tcp://103.25.214.60:4567), attempt 120
  185. 150523 13:31:53 [Note] WSREP: (a5225fff, 'tcp://0.0.0.0:4567') reconnecting to ed987f90 (tcp://122.201.85.240:4567), attempt 120
  186. 150523 13:32:37 [Note] WSREP: (a5225fff, 'tcp://0.0.0.0:4567') reconnecting to 9a8aa1a2 (tcp://103.25.214.60:4567), attempt 150
  187. 150523 13:32:38 [Note] WSREP: (a5225fff, 'tcp://0.0.0.0:4567') reconnecting to ed987f90 (tcp://122.201.85.240:4567), attempt 150
  188. 150523 13:33:17 [Note] WSREP: view(view_id(PRIM,a5225fff,6) memb {
  189. a5225fff,0
  190. cc1f173d,0
  191. } joined {
  192. } left {
  193. } partitioned {
  194. 9a8aa1a2,0
  195. ed987f90,0
  196. })
  197. 150523 13:33:17 [Note] WSREP: save pc into disk
  198. 150523 13:33:17 [Note] WSREP: forgetting 9a8aa1a2 (tcp://103.25.214.60:4567)
  199. 150523 13:33:17 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = yes, my_idx = 0, memb_num = 2
  200. 150523 13:33:17 [Note] WSREP: forgetting ed987f90 (tcp://122.201.85.240:4567)
  201. 150523 13:33:17 [Note] WSREP: (a5225fff, 'tcp://0.0.0.0:4567') turning message relay requesting off
  202. 150523 13:33:17 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 73a3c47a-00fc-11e5-8ba9-b25b7ee5f569
  203. 150523 13:33:17 [Note] WSREP: STATE EXCHANGE: sent state msg: 73a3c47a-00fc-11e5-8ba9-b25b7ee5f569
  204. 150523 13:33:17 [Note] WSREP: STATE EXCHANGE: got state msg: 73a3c47a-00fc-11e5-8ba9-b25b7ee5f569 from 0 (zeus)
  205. 150523 13:33:17 [Note] WSREP: STATE EXCHANGE: got state msg: 73a3c47a-00fc-11e5-8ba9-b25b7ee5f569 from 1 (ares)
  206. 150523 13:33:17 [Warning] WSREP: Quorum: No node with complete state:
  207.  
  208.  
  209. Version : 3
  210. Flags : 0x7
  211. Protocols : 0 / 7 / 3
  212. State : NON-PRIMARY
  213. Prim state : SYNCED
  214. Prim UUID : cd99c074-00e2-11e5-addc-2bca0d0759b6
  215. Prim seqno : 4
  216. First seqno : 49703422
  217. Last seqno : 49731511
  218. Prim JOINED : 4
  219. State UUID : 73a3c47a-00fc-11e5-8ba9-b25b7ee5f569
  220. Group UUID : 1e90fb11-b059-11e4-a6ce-0663d9000b59
  221. Name : 'zeus'
  222. Incoming addr: '103.25.214.61:3306'
  223.  
  224. Version : 3
  225. Flags : 0x6
  226. Protocols : 0 / 5 / 3
  227. State : NON-PRIMARY
  228. Prim state : SYNCED
  229. Prim UUID : cd99c074-00e2-11e5-addc-2bca0d0759b6
  230. Prim seqno : 4
  231. First seqno : 49705790
  232. Last seqno : 49731511
  233. Prim JOINED : 4
  234. State UUID : 73a3c47a-00fc-11e5-8ba9-b25b7ee5f569
  235. Group UUID : 1e90fb11-b059-11e4-a6ce-0663d9000b59
  236. Name : 'ares'
  237. Incoming addr: '172.31.15.84:3306'
  238.  
  239. 150523 13:33:17 [Note] WSREP: Partial re-merge of primary cd99c074-00e2-11e5-addc-2bca0d0759b6 found: 2 of 4.
  240. 150523 13:33:17 [Note] WSREP: Quorum results:
  241. version = 3,
  242. component = PRIMARY,
  243. conf_id = 4,
  244. members = 2/2 (joined/total),
  245. act_id = 49731511,
  246. last_appl. = 49731384,
  247. protocols = 0/5/3 (gcs/repl/appl),
  248. group UUID = 1e90fb11-b059-11e4-a6ce-0663d9000b59
  249. 150523 13:33:17 [Note] WSREP: Flow-control interval: [23, 23]
  250. 150523 13:33:17 [Note] WSREP: Restored state OPEN -> SYNCED (49731511)
  251. 150523 13:33:17 [Note] WSREP: New cluster view: global state: 1e90fb11-b059-11e4-a6ce-0663d9000b59:49731511, view# 5: Primary, number of nodes: 2, my index: 0, protocol version 3
  252. 150523 13:33:17 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
  253. 150523 13:33:17 [Note] WSREP: REPL Protocols: 5 (3, 1)
  254. 150523 13:33:17 [Note] WSREP: Service thread queue flushed.
  255. 150523 13:33:17 [Note] WSREP: Assign initial position for certification: 49731511, protocol version: 3
  256. 150523 13:33:17 [Note] WSREP: Service thread queue flushed.
  257. 150523 13:33:17 [Note] WSREP: Synchronized with group, ready for connections
  258. 150523 13:33:17 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
  259. 150523 13:33:20 [Note] WSREP: cleaning up 9a8aa1a2 (tcp://103.25.214.60:4567)
  260. 150523 13:33:20 [Note] WSREP: cleaning up ed987f90 (tcp://122.201.85.240:4567)
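
The NON_PRIM views and the "Partial re-merge ... found: 2 of 4" message in the log above fit Galera's quorum rule, under which a component stays primary only while it holds strictly more than half of the weight of the last primary component. Here is a minimal sketch of that arithmetic, assuming every node has equal weight and that the two crashed nodes did not leave gracefully; has_quorum is a hypothetical helper, not Galera's own implementation.

```python
def has_quorum(present, previous_total):
    """True if `present` equally weighted nodes form a strict majority."""
    return 2 * present > previous_total


# Member counts taken from zeus's log; the last primary view had 4 members.
LAST_PRIMARY_SIZE = 4
observed = [("13:29:04", 1), ("13:29:05", 2)]

results = {}
for when, present in observed:
    results[when] = has_quorum(present, LAST_PRIMARY_SIZE)
    state = "PRIMARY" if results[when] else "NON_PRIM"
    print(f"{when}: {present} of {LAST_PRIMARY_SIZE} -> {state}")
```

Two of four is exactly half, which is not a majority, so zeus and ares could not form a primary component on their own. The later "primary = yes, bootstrap = yes" line at 13:33:17 suggests the surviving pair was promoted by a manual bootstrap rather than by regaining quorum, although the log alone does not show how that was triggered.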