CDH User

a guest
Jun 26th, 2012
I am testing a small HDFS cluster of 4 machines. I successfully
installed CDH3 (and now CDH4), configured and started HDFS, etc.

As part of my testing, I filled HDFS with files to full capacity (100%
full, cannot write any data into HDFS, the data disks on all 4
machines are 100% full) and hit an anomaly and a showstopper.

The anomaly has to do with reported disk space. "df" is expected to
report HDFS 100% full, instead it reports 47% used. ("dfsadmin -report"
is correct: "DFS Used: 99.9%, DFS remaining: 1.34GB" (out of 2.7TB)).

The showstopper has to do with the balancer. The balancer thinks that
one of the nodes is only 8% used and thinks that data should be moved
to it from other nodes. Of course this 8% number is wrong - all data
disks are 100% full, and as expected, all data movement fails.

This is somewhat consistent with the "live nodes" report for the
supposedly 8% full node:

Node    Last     Admin       Configured     Used  Non DFS    Remaining  Used    Remaining  Blocks  Block Pool  Block Pool  Failed
        Contact  State       Capacity (TB)  (TB)  Used (TB)  (TB)       (%)     (%)                Used (TB)   Used (%)    Volumes
iris02  0        In Service  1.61           0.13  1.47       0          8.19    0.01       2164    0.13        8.19        1
ladd02  1        In Service  0.26           0.26  0          0          99.99   0.01       4321    0.26        99.99       1
ladd06  1        In Service  0.49           0.49  0          0          99.95   0.05       7973    0.49        99.95       1
ladd12  1        In Service  0.37           0.38  0          0          103.31  0.05       6236    0.38        103.31      0

Observe iris02 is reported as "8% used", "0 TB remaining", "0%
remaining".

Also observe that iris02 has 1.47TB of "non DFS" data reported, which
is correct - there is about 1.5 TB of non-DFS data on that disk.

This is consistent with "dfsadmin -report", which shows 1.47 TB of
non-DFS use, 91MB "DFS remaining", 8% "DFS Used", 0.01% "DFS remaining".

So what is going on here?
Do "df" and the balancer "forget" to account for the "non-DFS" data?
Does the HDFS balancer only work correctly if "non DFS" disk use is zero
on every node?

Attached are the outputs of "df", "dfsadmin -report" and the
balancer.

> df /hdfs
Filesystem      1K-blocks        Used   Available Use% Mounted on
fuse_dfs       2925264896  1353711616  1571553280  47% /hdfs
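
A quick arithmetic check of the df figures above, as a minimal Python sketch. It assumes df computes Use% as used / (used + available) and rounds up, which is how GNU df behaves as far as I know; the variable names are mine. It suggests the fuse_dfs mount reports "Available" as simply total minus used, with nothing subtracted for non-DFS data.

```python
import math

# Figures copied from the "df /hdfs" output above (units: 1K-blocks).
total_kb = 2925264896
used_kb = 1353711616
avail_kb = 1571553280

# "Available" equals total minus used exactly, so no space is being
# set aside for non-DFS data in this view.
unaccounted_kb = total_kb - used_kb - avail_kb
print("unaccounted 1K-blocks:", unaccounted_kb)

# Assumption: df rounds Use% up to the next whole percent.
use_pct = math.ceil(100 * used_kb / (used_kb + avail_kb))
print("Use%:", use_pct)

# Size of the mount in bytes, for comparison with the
# "Configured Capacity" line of "hdfs dfsadmin -report".
total_bytes = total_kb * 1024
print("mount size in bytes:", total_bytes)
```

The mount size in bytes comes out within a few tens of MB of the 2995494459813 bytes that dfsadmin reports as "Configured Capacity", so df appears to be dividing DFS use by the configured capacity rather than by the space HDFS can actually use.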

> hdfs dfsadmin -report
Configured Capacity: 2995494459813 (2.72 TB)
Present Capacity: 1387596803939 (1.26 TB)
DFS Remaining: 1433567075 (1.34 GB)
DFS Used: 1386163236864 (1.26 TB)
DFS Used%: 99.9%
Under replicated blocks: 3867
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 4 (4 total, 0 dead)

Live datanodes:
Name: 142.90.111.96:50010 (ladd06.triumf.ca)
Hostname: ladd06.triumf.ca
Rack: /detfac
Decommission Status : Normal
Configured Capacity: 535521764641 (498.74 GB)
DFS Used: 534397218816 (497.7 GB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 1124545825 (1.05 GB)
DFS Used%: 99.79%
DFS Remaining%: 0.21%
Last contact: Sat Jun 23 18:35:16 PDT 2012

Name: 142.90.119.126:50010 (ladd12.triumf.ca)
Hostname: ladd12.triumf.ca
Rack: /isac2
Decommission Status : Normal
Configured Capacity: 404426435138 (376.65 GB)
DFS Used: 417828085760 (389.13 GB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 189199649 (180.43 MB)
DFS Used%: 103.31%
DFS Remaining%: 0.05%
Last contact: Sat Jun 23 18:35:18 PDT 2012

Name: 142.90.111.72:50010 (ladd02.triumf.ca)
Hostname: ladd02.triumf.ca
Rack: /detfac
Decommission Status : Normal
Configured Capacity: 289398961441 (269.52 GB)
DFS Used: 289375211520 (269.5 GB)
Non DFS Used: 0 (0 KB)
DFS Remaining: 23749921 (22.65 MB)
DFS Used%: 99.99%
DFS Remaining%: 0.01%
Last contact: Sat Jun 23 18:35:18 PDT 2012

Name: 142.90.119.162:50010 (iris02.triumf.ca)
Hostname: iris02.triumf.ca
Rack: /isac2
Decommission Status : Normal
Configured Capacity: 1766147298593 (1.61 TB)
DFS Used: 144562720768 (134.63 GB)
Non DFS Used: 1621488506145 (1.47 TB)
DFS Remaining: 96071680 (91.62 MB)
DFS Used%: 8.19%
DFS Remaining%: 0.01%
Last contact: Sat Jun 23 18:35:17 PDT 2012
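
To make the per-node percentages concrete, here is a small Python sketch that recomputes them from the byte counts in the report above. The two helper functions are my own illustration, not Hadoop code: one divides DFS use by the configured capacity (which reproduces the reported "DFS Used%"), the other divides by the space HDFS can actually use (DFS used plus DFS remaining), which is how the cluster-wide 99.9% figure seems to be derived from "Present Capacity".

```python
# (configured capacity, DFS used, DFS remaining) in bytes,
# copied from "hdfs dfsadmin -report" above.
nodes = {
    "ladd06": (535521764641, 534397218816, 1124545825),
    "ladd12": (404426435138, 417828085760, 189199649),
    "ladd02": (289398961441, 289375211520, 23749921),
    "iris02": (1766147298593, 144562720768, 96071680),
}


def used_pct_of_configured(capacity, used, remaining):
    """DFS use as a share of the whole configured capacity."""
    return 100.0 * used / capacity


def used_pct_of_usable(capacity, used, remaining):
    """DFS use as a share of the space HDFS can actually use."""
    return 100.0 * used / (used + remaining)


for name, figures in nodes.items():
    print(f"{name}: {used_pct_of_configured(*figures):6.2f}% of configured, "
          f"{used_pct_of_usable(*figures):6.2f}% of usable")

# Whatever is neither DFS-used nor DFS-remaining on iris02 matches
# the reported "Non DFS Used" value.
capacity, used, remaining = nodes["iris02"]
non_dfs = capacity - used - remaining
print("iris02 non-DFS bytes:", non_dfs)
```

By the first measure iris02 is about 8.19% used, matching the report; by the second it is about 99.93% used, which matches what the disk itself says.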

> hdfs balancer
12/06/23 18:40:24 INFO balancer.Balancer: namenodes = [hdfs://ladd12.triumf.ca/]
12/06/23 18:40:24 INFO balancer.Balancer: p = Balancer.Parameters[BalancingPolicy.Node, threshold=10.0]
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
12/06/23 18:40:25 INFO net.NetworkTopology: Adding a new node: /detfac/142.90.111.96:50010
12/06/23 18:40:25 INFO net.NetworkTopology: Adding a new node: /detfac/142.90.111.72:50010
12/06/23 18:40:25 INFO net.NetworkTopology: Adding a new node: /isac2/142.90.119.126:50010
12/06/23 18:40:25 INFO net.NetworkTopology: Adding a new node: /isac2/142.90.119.162:50010
12/06/23 18:40:25 INFO balancer.Balancer: 3 over-utilized: [Source[142.90.111.96:50010, utilization=99.91803337474168], Source[142.90.111.72:50010, utilization=99.99179336343097], Source[142.90.119.126:50010, utilization=103.31374249000984]]
12/06/23 18:40:25 INFO balancer.Balancer: 1 underutilized: [BalancerDatanode[142.90.119.162:50010, utilization=8.185201816584936]]
12/06/23 18:40:25 INFO balancer.Balancer: Need to move 512.4 GB to make the cluster balanced.
12/06/23 18:40:25 INFO balancer.Balancer: Decided to move 91.62 MB bytes from 142.90.111.96:50010 to 142.90.119.162:50010
12/06/23 18:40:25 INFO balancer.Balancer: Will move 91.62 MB in this iteration
Jun 23, 2012 6:40:25 PM  0           0 KB                 512.4 GB            91.62 MB
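
The over/under-utilized split in the balancer log can be reproduced with a simplified reading of the threshold rule: a node counts as balanced when its utilization is within "threshold" percentage points of the cluster average. The Python sketch below is my own approximation under that assumption, not the Balancer's actual code (which I believe has more categories), and it takes utilization to be DFS used divided by configured capacity, since that is what matches the logged 8.185% for iris02.

```python
THRESHOLD = 10.0  # threshold=10.0 from Balancer.Parameters in the log

# (configured capacity, DFS used) in bytes, from "hdfs dfsadmin -report".
nodes = {
    "ladd06": (535521764641, 534397218816),
    "ladd12": (404426435138, 417828085760),
    "ladd02": (289398961441, 289375211520),
    "iris02": (1766147298593, 144562720768),
}

total_capacity = sum(capacity for capacity, _ in nodes.values())
total_used = sum(used for _, used in nodes.values())

# Cluster-wide average utilization, in percent of configured capacity.
avg = 100.0 * total_used / total_capacity


def classify(capacity, used):
    """Simplified classification against avg +/- THRESHOLD."""
    utilization = 100.0 * used / capacity
    if utilization > avg + THRESHOLD:
        return "over-utilized"
    if utilization < avg - THRESHOLD:
        return "under-utilized"
    return "within threshold"


print(f"cluster average: {avg:.2f}%")
for name, figures in nodes.items():
    print(name, classify(*figures))
```

Under these assumptions the average is about 46%, the three ladd nodes land above the band and iris02 lands far below it, so the balancer's decision to move data to iris02 follows directly from the 8% figure, regardless of how much non-DFS data sits on that disk.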

All disks are 100% full:

> df /home1 /ladd/data2 /ladd/data6 /ladd/data12 /ladd/iris_data2
Filesystem          1K-blocks       Used Available Use% Mounted on
/dev/md2             96132876   84858660   6390864  93% /home1
ladd02:/data2       307355648  306812928    542720 100% /ladd/data2
ladd06:/data6       547709952  522897408  24812544  96% /ladd/data6
/data12             348293764  323569928  24723836  93% /ladd/data12
iris02:/iris_data2 1749492736 1749464064     28672 100% /ladd/iris_data2