## Replacing a node with a machine of the same name

### Objective
The goal is to replace a Riak instance with a new Riak instance of the same name so that the application environment does not need any instance-name related changes.

### Scenario

Riak is running in a cluster of five nodes.

* `riak@node1.localdomain` on `node1.localdomain` (192.168.17.11)
* `riak@node2.localdomain` on `node2.localdomain` (192.168.17.12)
* `riak@node3.localdomain` on `node3.localdomain` (192.168.17.13)
* `riak@node4.localdomain` on `node4.localdomain` (192.168.17.14)
* `riak@node5.localdomain` on `node5.localdomain` (192.168.17.15)

The load balancer in use performs periodic checks on the Riak nodes to determine whether they are suitable for servicing requests.
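
For illustration, such a check might hit Riak's HTTP `/ping` endpoint, which returns `OK` when the node can service requests. The use of `curl` and the default HTTP port 8098 are assumptions here, not details from the original environment:

```
# Hypothetical load-balancer health check against one node
curl -s http://192.168.17.11:8098/ping
OK
```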

A hard failure has occurred on `node2.localdomain`, and it will not receive requests until it is replaced with a node of the same name.

The goal is to replace `riak@node2.localdomain` with a new Riak instance named `riak@node2.localdomain` so that the application environment does not need any instance-name related changes.

### The Process tl;dr
This process can be accomplished in three steps, the details of which are discussed below.

* [Down the Node](#down)
* [Build the Node with a Temporary Name](#build)
* [Rename the Node to the Original Name](#rename)

----
### The Process
#### [Down the Node](id:down)
1. Stop Riak on `riak@node2.localdomain` if the node is still running in any capacity.
>**riak stop**

```
node2> riak stop
Attempting to restart script through sudo -H -u riak
ok
node2>
```
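Before powering the machine off, `riak ping` can be used to confirm the node is really stopped; the output below is typical of the packaged `riak` script:

```
node2> riak ping
Node 'riak@node2.localdomain' not responding to pings.
node2>
```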

2. Shut down `node2.localdomain` by any means, from `shutdown -h now` to hitting the power button.

3. Mark `riak@node2.localdomain` down from `node1.localdomain`.
>**riak-admin down riak@node2.localdomain**

```
node1> riak-admin down riak@node2.localdomain
Attempting to restart script through sudo -H -u riak
Success: "riak@node2.localdomain" marked as down
node1>
```
This command, which can be run from any running cluster node, tells the cluster that this node is offline and that ring-state transitions should be allowed.
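
As a sanity check, `riak-admin member-status` run from a surviving node should now list the node as down. This is a sketch, not captured output; exact ring percentages depend on your cluster (output abridged):

```
node1> riak-admin member-status
============================= Membership ==============================
Status     Ring    Pending    Node
-----------------------------------------------------------------------
down       20.3%    --       'riak@node2.localdomain'
valid      20.3%    --       'riak@node1.localdomain'
...
-----------------------------------------------------------------------
Valid:4 / Leaving:0 / Exiting:0 / Joining:0 / Down:1
```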

#### [Build the Node with a Temporary Name](id:build)
1. Reformat `node2.localdomain` or start with clean hardware, and install Riak.

2. Edit the `vm.args` file on the new node and set the `-name` argument as follows:

**Note: Using a temporary, yet resolvable, name for the Riak instance is important.**

```
-name riak@192.168.17.12
```
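
The new node must also use the same Erlang cookie as the rest of the cluster, or it will not be able to communicate with the other nodes. A minimal sketch of the relevant `vm.args` lines; the cookie value `riak` is the package default and is an assumption here:

```
## Temporary, resolvable node name
-name riak@192.168.17.12

## Distributed Erlang cookie -- must match the other cluster nodes
-setcookie riak
```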

3. Start `riak@192.168.17.12` on `node2.localdomain`.
> **riak start**

```
node2> riak start
Attempting to restart script through sudo -H -u riak
node2>
```

4. Join the newly created node to the cluster.

>**riak-admin cluster join riak@node1.localdomain**

```
node2> riak-admin cluster join riak@node1.localdomain
Attempting to restart script through sudo -H -u riak
Success: staged join request for 'riak@192.168.17.12' to 'riak@node1.localdomain'
node2>
```

5. Use `force-replace` to change all ownership references from `riak@node2.localdomain` to `riak@192.168.17.12`.
> **riak-admin cluster force-replace riak@node2.localdomain riak@192.168.17.12**

```
node2> riak-admin cluster force-replace riak@node2.localdomain riak@192.168.17.12
Attempting to restart script through sudo -H -u riak
Success: staged forced replacement of 'riak@node2.localdomain' with 'riak@192.168.17.12'
node2>
```

6. Show the planned cluster changes.
> **riak-admin cluster plan**

```
node2> riak-admin cluster plan
Attempting to restart script through sudo -H -u riak
=========================== Staged Changes ============================
Action         Nodes(s)
-----------------------------------------------------------------------
join           'riak@192.168.17.12'
force-replace  'riak@node2.localdomain' with 'riak@192.168.17.12'
-----------------------------------------------------------------------

WARNING: All of 'riak@node2.localdomain' replicas will be lost

NOTE: Applying these changes will result in 1 cluster transition

#######################################################################
                      After cluster transition 1/1
#######################################################################

============================= Membership ==============================
Status     Ring    Pending    Node
-----------------------------------------------------------------------
valid      20.3%    --       'riak@192.168.17.12'
valid      20.3%    --       'riak@node1.localdomain'
valid      20.3%    --       'riak@node3.localdomain'
valid      20.3%    --       'riak@node4.localdomain'
valid      18.8%    --       'riak@node5.localdomain'
-----------------------------------------------------------------------
Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

Partitions reassigned from cluster changes: 13
  13 reassigned from 'riak@node2.localdomain' to 'riak@192.168.17.12'

node2>
```

7. Commit the changes to the cluster.
> **riak-admin cluster commit**

```
node2> riak-admin cluster commit
Attempting to restart script through sudo -H -u riak
Cluster changes committed
node2>
```
8. Check that everything is connected and functioning as expected.
>**riak-admin member-status**

```
node2> riak-admin member-status
Attempting to restart script through sudo -H -u riak
============================= Membership ==============================
Status     Ring    Pending    Node
-----------------------------------------------------------------------
valid      20.3%    --       'riak@192.168.17.12'
valid      20.3%    --       'riak@node1.localdomain'
valid      20.3%    --       'riak@node3.localdomain'
valid      20.3%    --       'riak@node4.localdomain'
valid      18.8%    --       'riak@node5.localdomain'
-----------------------------------------------------------------------
Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
```
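
Since `force-replace` only reassigns partition ownership (the failed node's data is already lost), no handoff should be in flight, which `riak-admin transfers` can confirm; the output shown is the no-transfer case:

```
node2> riak-admin transfers
No transfers active
node2>
```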

#### [Rename the Node to the Original Name](id:rename)

1. Stop `riak@192.168.17.12` on `node2.localdomain`.
>**riak stop**

```
node2> riak stop
ok
node2>
```

2. Mark `riak@192.168.17.12` down from `node1.localdomain`.

>**riak-admin down riak@192.168.17.12**

```
node1> riak-admin down riak@192.168.17.12
Attempting to restart script through sudo -H -u riak
Success: "riak@192.168.17.12" marked as down
node1>
```

3. Edit the `vm.args` file on the node and set the `-name` argument back to the original name:

```
-name riak@node2.localdomain
```
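
If editing the file by hand is inconvenient, the change can be scripted. This sketch assumes the packaged configuration path `/etc/riak/vm.args`:

```
node2> sed -i 's/^-name .*/-name riak@node2.localdomain/' /etc/riak/vm.args
node2> grep '^-name' /etc/riak/vm.args
-name riak@node2.localdomain
node2>
```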

4. Back up the `riak@192.168.17.12` ring folder by renaming it to `ring_192.168.17.12`. The location of the ring files can be determined by inspecting the `app.config` file; they are usually found in `/var/lib/riak/ring/`.
>**mv /var/lib/riak/ring /var/lib/riak/ring_192.168.17.12**

```
node2> mv /var/lib/riak/ring /var/lib/riak/ring_192.168.17.12
node2>
```
Moving the ring files will cause the node to "forget" that it was a member of a cluster and allow it to start up with the new name.
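
If the ring directory is not in the default location, it can be found by checking the `ring_state_dir` setting; the path `/etc/riak/app.config` assumed below is the packaged default:

```
node2> grep ring_state_dir /etc/riak/app.config
              {ring_state_dir, "/var/lib/riak/ring"},
node2>
```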

5. Start Riak on `node2.localdomain`.

>**riak start**

```
node2> riak start
Attempting to restart script through sudo -H -u riak
node2>
```

6. Join `riak@node2.localdomain` to the cluster.
>**riak-admin cluster join riak@node1.localdomain**

```
node2> riak-admin cluster join riak@node1.localdomain
Attempting to restart script through sudo -H -u riak
Success: staged join request for 'riak@node2.localdomain' to 'riak@node1.localdomain'
node2>
```

7. Use `force-replace` to change all ownership references from `riak@192.168.17.12` to `riak@node2.localdomain`.
>**riak-admin cluster force-replace riak@192.168.17.12 riak@node2.localdomain**

```
node2> riak-admin cluster force-replace riak@192.168.17.12 riak@node2.localdomain
Attempting to restart script through sudo -H -u riak
Success: staged forced replacement of 'riak@192.168.17.12' with 'riak@node2.localdomain'
node2>
```

8. Show the planned changes to the cluster.
>**riak-admin cluster plan**

```
node2> riak-admin cluster plan
Attempting to restart script through sudo -H -u riak
=========================== Staged Changes ============================
Action         Nodes(s)
-----------------------------------------------------------------------
join           'riak@node2.localdomain'
force-replace  'riak@192.168.17.12' with 'riak@node2.localdomain'
-----------------------------------------------------------------------

WARNING: All of 'riak@192.168.17.12' replicas will be lost

NOTE: Applying these changes will result in 1 cluster transition

#######################################################################
                      After cluster transition 1/1
#######################################################################

============================= Membership ==============================
Status     Ring    Pending    Node
-----------------------------------------------------------------------
valid      20.3%    --       'riak@node1.localdomain'
valid      20.3%    --       'riak@node2.localdomain'
valid      20.3%    --       'riak@node3.localdomain'
valid      20.3%    --       'riak@node4.localdomain'
valid      18.8%    --       'riak@node5.localdomain'
-----------------------------------------------------------------------
Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

Partitions reassigned from cluster changes: 13
  13 reassigned from 'riak@192.168.17.12' to 'riak@node2.localdomain'

node2>
```

9. Commit the changes.
>**riak-admin cluster commit**

```
node2> riak-admin cluster commit
Attempting to restart script through sudo -H -u riak
Cluster changes committed
node2>
```

10. Check that everything is running as expected.
>**riak-admin member-status**

```
node2> riak-admin member-status
Attempting to restart script through sudo -H -u riak
============================= Membership ==============================
Status     Ring    Pending    Node
-----------------------------------------------------------------------
valid      20.3%    --       'riak@node1.localdomain'
valid      20.3%    --       'riak@node2.localdomain'
valid      20.3%    --       'riak@node3.localdomain'
valid      20.3%    --       'riak@node4.localdomain'
valid      18.8%    --       'riak@node5.localdomain'
-----------------------------------------------------------------------
Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
```
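
As a final consistency check, `riak-admin ringready` reports whether all nodes agree on the ring; the output sketched below is the agreeing case:

```
node2> riak-admin ringready
TRUE All nodes agree on the ring ['riak@node1.localdomain',
                                  'riak@node2.localdomain',
                                  'riak@node3.localdomain',
                                  'riak@node4.localdomain',
                                  'riak@node5.localdomain']
node2>
```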

11. Remove the backed-up ring folder from `node2.localdomain`.

>**rm -rf /var/lib/riak/ring_192.168.17.12**

```
node2> rm -rf /var/lib/riak/ring_192.168.17.12
node2>
```
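
At this point the load balancer's periodic checks should begin passing for `node2.localdomain` again without any application-side changes. Assuming an HTTP check against the default port 8098 as sketched earlier, this can be confirmed manually:

```
# Hypothetical check from the load balancer's point of view
curl -s http://192.168.17.12:8098/ping
OK
```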