Untitled

a guest
Mar 13th, 2013
  1. After the cluster-wide stop and restart:
  2.  
  3. On dnds1-13:
  4. -------------------
  5. 2013-03-14 00:11:37,993 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Atomically moving dnds1-4,60020,1363219866063's hlogs to my queue
  6. 2013-03-14 00:11:37,997 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-4%2C60020%2C1363219866063.1363219868868 with data
  7. 2013-03-14 00:11:37,997 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: The multi list size is: 5
  8. 2013-03-14 00:11:38,068 WARN org.apache.hadoop.hbase.replication.ReplicationZookeeper: Got exception in copyQueuesFromRSUsingMulti:
  9. org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
  10. at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
  11. at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
  12. at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
  13. at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:531)
  14. at org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1436)
  15. at org.apache.hadoop.hbase.replication.ReplicationZookeeper.copyQueuesFromRSUsingMulti(ReplicationZookeeper.java:705)
  16. at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$NodeFailoverWorker.run(ReplicationSourceManager.java:590)
  17. at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  18. at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  19. at java.lang.Thread.run(Thread.java:662)
  20. 2013-03-14 00:11:38,070 INFO org.apache.hadoop.hbase.metrics: new MBeanInfo
  21. 2013-03-14 00:11:38,077 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Getting 2 rs from peer cluster # 1
  22. 2013-03-14 00:11:38,077 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Choosing peer ist6-dnds1-6,60020,1359508078741
  23. 2013-03-14 00:11:38,077 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Choosing peer ist6-dnds1-3,60020,1359508079058
  24. 2013-03-14 00:11:39,080 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicating 4f3d5435-898c-47c2-8821-aeb01f9e87cc -> 74c750a5-4254-4a3b-ab12-063869759edd
  25. 2013-03-14 00:11:39,081 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication dnds1-4%2C60020%2C1363219866063.1363219868868 at 0
  27. 2013-03-14 00:11:39,090 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: currentNbOperations:0 and seenEntries:3 and size: 0
  28. 2013-03-14 00:11:39,090 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Going to report log #dnds1-4%2C60020%2C1363219866063.1363219868868 for position 1004 in hdfs://cluster/hbase/.oldlogs/dnds1-4%2C60020%2C1363219866063.1363219868868
  30. 2013-03-14 00:11:39,139 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server dnds1-13,60020,1363219887385: Writing replication status
  31. org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/dnds1-13,60020,1363219887385/1-dnds1-4,60020,1363219866063/dnds1-4%2C60020%2C1363219866063.1363219868868
  33. at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
  34. at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  35. at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1266)
  36. at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:349)
  37. at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:848)
  38. at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:900)
  39. at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:894)
  40. at org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:558)
  41. at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:155)
  42. at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:379)
  43. 2013-03-14 00:11:39,140 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
  44. -------------------
  45.  
  46.  
  47. Another one on dnds1-12:
  48. -------------------
  49. 2013-03-14 00:11:35,905 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Atomically moving dnds1-8,60020,1363219865904's hlogs to my queue
  50. 2013-03-14 00:11:35,909 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-8%2C60020%2C1363219865904.1363219868852 with data
  51. 2013-03-14 00:11:35,937 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1363217800957 with data 7470
  52. 2013-03-14 00:11:35,972 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1363102598299 with data null
  53. 2013-03-14 00:11:35,973 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1363109798542 with data null
  54. 2013-03-14 00:11:35,974 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1363124199049 with data null
  55. 2013-03-14 00:11:35,975 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1363116998811 with data null
  56. 2013-03-14 00:11:35,977 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1363098998059 with data null
  57. 2013-03-14 00:11:35,978 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1363026996488 with data null
  58. 2013-03-14 00:11:35,979 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1363127799218 with data null
  59. 2013-03-14 00:11:35,991 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1363034196762 with data null
  60. 2013-03-14 00:11:35,992 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1362533787045 with data null
  61. 2013-03-14 00:11:35,993 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-9%2C60020%2C1362533781275.1363023396288 with data null
  62. 2013-03-14 00:11:35,993 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: The multi list size is: 29
  63. 2013-03-14 00:11:36,018 WARN org.apache.hadoop.hbase.replication.ReplicationZookeeper: Got exception in copyQueuesFromRSUsingMulti:
  64. org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
  65. at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
  66. at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
  67. at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
  68. at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:531)
  69. at org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1436)
  70. at org.apache.hadoop.hbase.replication.ReplicationZookeeper.copyQueuesFromRSUsingMulti(ReplicationZookeeper.java:705)
  71. at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$NodeFailoverWorker.run(ReplicationSourceManager.java:590)
  72. at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  73. at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  74. at java.lang.Thread.run(Thread.java:662)
  75. 2013-03-14 00:11:36,019 INFO org.apache.hadoop.hbase.metrics: new MBeanInfo
  76. 2013-03-14 00:11:36,025 INFO org.apache.hadoop.hbase.metrics: new MBeanInfo
  77. 2013-03-14 00:11:36,025 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Getting 2 rs from peer cluster # 1
  78. 2013-03-14 00:11:36,025 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Choosing peer ist6-dnds1-3,60020,1359508079058
  79. 2013-03-14 00:11:36,025 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Choosing peer ist6-dnds1-11,60020,1359508114550
  80. 2013-03-14 00:11:36,033 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Getting 2 rs from peer cluster # 1
  81. 2013-03-14 00:11:36,033 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Choosing peer ist6-dnds1-10,60020,1359508114634
  82. 2013-03-14 00:11:36,033 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Choosing peer ist6-dnds1-6,60020,1359508078741
  83. 2013-03-14 00:11:37,028 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicating 4f3d5435-898c-47c2-8821-aeb01f9e87cc -> 74c750a5-4254-4a3b-ab12-063869759edd
  84. 2013-03-14 00:11:37,030 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication dnds1-8%2C60020%2C1363219865904.1363219868852 at 0
  85. 2013-03-14 00:11:37,036 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicating 4f3d5435-898c-47c2-8821-aeb01f9e87cc -> 74c750a5-4254-4a3b-ab12-063869759edd
  86. 2013-03-14 00:11:37,038 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication dnds1-9%2C60020%2C1362533781275.1362533787045 at 0
  87. 2013-03-14 00:11:37,046 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: currentNbOperations:0 and seenEntries:0 and size: 0
  88. 2013-03-14 00:11:37,046 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Done with the recovered queue 1-dnds1-8,60020,1363219865904
  89. 2013-03-14 00:11:37,048 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Finished recovering the queue
  90. 2013-03-14 00:11:37,048 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Source exiting 1
  91. 2013-03-14 00:11:37,055 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: currentNbOperations:0 and seenEntries:1 and size: 0
  92. 2013-03-14 00:11:37,055 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Going to report log #dnds1-9%2C60020%2C1362533781275.1362533787045 for position 372 in hdfs://cluster/hbase/.oldlogs/dnds1-9%2C60020%2C1362533781275.1362533787045
  93. 2013-03-14 00:11:37,074 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server dnds1-12,60020,1363219887328: Writing replication status
  94. org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/dnds1-12,60020,1363219887328/1-dnds1-9,60020,1362533781275-dnds1-11,60020,1362533806866-dnds1-8,60020,1363219865904/dnds1-9%2C60020%2C1362533781275.1362533787045
  95. at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
  96. at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  97. at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1266)
  98. at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:349)
  99. at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:848)
  100. at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:900)
  101. at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:894)
  102. at org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:558)
  103. at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:155)
  104. at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:379)
  105. 2013-03-14 00:11:37,074 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
  106. -------------------------
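To make the znode paths in the two FATAL messages above easier to read, here is a minimal sketch that splits such a path into its parts. It assumes the layout /hbase/replication/rs/<owning server>/<queue id>/<hlog name>, and that a recovered queue id is the peer id followed by the names of the dead servers the queue has passed through, joined with "-". That convention is inferred from the paths in this paste, not taken from the HBase source, and the function and class names are hypothetical.

```python
import re
from typing import List, NamedTuple, Tuple


class ServerName(NamedTuple):
    host: str
    port: int
    start_code: int


# One "<host>,<port>,<startcode>" triple, followed by "-" or end of string.
# The host itself may contain "-" (e.g. dnds1-9), so we key on the commas.
_SERVER = re.compile(r"([^,]+),(\d+),(\d+)(?:-|$)")


def parse_recovered_queue(queue_id: str) -> Tuple[str, List[ServerName]]:
    """Split '<peerId>-<server>-<server>...' into the peer id and server list.

    Assumption: the peer id itself contains no '-'.
    """
    peer_id, _, rest = queue_id.partition("-")
    servers = [ServerName(h, int(p), int(s)) for h, p, s in _SERVER.findall(rest)]
    return peer_id, servers


def parse_status_znode(path: str):
    """Return (owning server, (peer id, dead servers), hlog name) for a path."""
    # Assumed layout: .../rs/<owner>/<queue id>/<hlog name>
    owner, queue_id, hlog = path.split("/")[-3:]
    return owner, parse_recovered_queue(queue_id), hlog


if __name__ == "__main__":
    path = (
        "/hbase/replication/rs/dnds1-12,60020,1363219887328/"
        "1-dnds1-9,60020,1362533781275-dnds1-11,60020,1362533806866-"
        "dnds1-8,60020,1363219865904/"
        "dnds1-9%2C60020%2C1362533781275.1362533787045"
    )
    owner, (peer_id, dead), hlog = parse_status_znode(path)
    print("owner:", owner)
    print("peer :", peer_id)
    for s in dead:
        print("dead :", s.host, s.port, s.start_code)
    print("hlog :", hlog)
```

Under these assumptions, the queue that dnds1-12 failed to update belongs to peer 1 and carries three dead server names (dnds1-9, dnds1-11, dnds1-8), while the one on dnds1-13 carries only dnds1-4.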
  107.  
  108.  
  109. We then restarted all region servers a second time.
  110. This time they stayed up, but the logs are now filled with messages like these:
  111. ------------------------
  112. 2013-03-14 01:00:37,998 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication dnds1-12%2C60020%2C1363220608780.1363220609572 at 0
  113. 2013-03-14 01:00:38,001 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: 1 Got:
  114. java.io.EOFException
  115. at java.io.DataInputStream.readFully(DataInputStream.java:180)
  116. at java.io.DataInputStream.readFully(DataInputStream.java:152)
  117. at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1800)
  118. at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
  119. at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714)
  120. at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
  121. at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:55)
  122. at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:177)
  123. at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:728)
  124. at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:67)
  125. at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:507)
  126. at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:313)
  127. 2013-03-14 01:00:38,001 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Waited too long for this file, considering dumping
  128. 2013-03-14 01:00:38,001 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Unable to open a reader, sleeping 1000 times 10
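The HLog names in these messages are URL-encoded. As a rough aid for reading them, the sketch below decodes a name into the server that wrote it and the file's timestamp. It assumes the name has the form <host>%2C<port>%2C<startcode>.<timestamp in ms>, which matches every name in this paste; the function name is hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import unquote


def decode_hlog_name(name: str) -> dict:
    """Decode '<host>%2C<port>%2C<startcode>.<millis>' into its fields."""
    decoded = unquote(name)  # "%2C" becomes ","
    # The timestamp follows the last "."; rpartition keeps dotted hostnames intact.
    server, _, millis = decoded.rpartition(".")
    host, port, start_code = server.split(",")
    created_ms = int(millis)
    return {
        "server": server,
        "host": host,
        "port": int(port),
        "start_code": int(start_code),
        "created_ms": created_ms,
        "created_utc": datetime.fromtimestamp(created_ms / 1000, tz=timezone.utc),
    }


if __name__ == "__main__":
    info = decode_hlog_name("dnds1-12%2C60020%2C1363220608780.1363220609572")
    print(info["server"])
    print(info["created_utc"].isoformat())
    # Gap between the server start code and the file timestamp, in ms.
    print(info["created_ms"] - info["start_code"])
```

For the file in the EOFException above, the timestamp is less than a second after the server's start code, so it looks like the first log written by the restarted dnds1-12 instance. That reading relies on the naming assumption stated above.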