## Replacing a node with a machine of the same name

### Objective

The goal is to replace a Riak instance with a new Riak instance of the same name so that the application environment does not need any instance-name related changes.

### Scenario

Riak is running in a cluster of five nodes.

* `riak@node1.localdomain` on `node1.localdomain` (192.168.17.11)
* `riak@node2.localdomain` on `node2.localdomain` (192.168.17.12)
* `riak@node3.localdomain` on `node3.localdomain` (192.168.17.13)
* `riak@node4.localdomain` on `node4.localdomain` (192.168.17.14)
* `riak@node5.localdomain` on `node5.localdomain` (192.168.17.15)

The load balancer in use performs periodic health checks on the Riak nodes to determine whether they are suitable for servicing requests.

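A health check of this kind can be sketched against Riak's HTTP `/ping` endpoint, which returns `OK` from a node that can service requests (a minimal sketch: the helper name `check_node`, the default HTTP port 8098, and the two-second timeout are assumptions, not details from this scenario):

```shell
# check_node HOST - probe Riak's HTTP /ping endpoint on HOST.
# Port 8098 is Riak's default HTTP listener; adjust if yours differs.
check_node() {
    host="$1"
    if [ "$(curl -s --max-time 2 "http://${host}:8098/ping")" = "OK" ]; then
        echo "${host} is up"
    else
        echo "${host} is down"
        return 1
    fi
}
```

A load balancer would run such a probe against each of the five nodes, e.g. `check_node 192.168.17.12`, and stop routing requests to any node that fails it.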
A hard failure has occurred on `node2.localdomain`, and it will not receive requests until it is replaced with a node of the same name.

The goal is to replace the failed `riak@node2.localdomain` with a new Riak instance named `riak@node2.localdomain` so that the application environment does not need any instance-name related changes.

### The Process tl;dr

This process can be accomplished in three steps, the details of which are discussed below.

* [Down the Node](#down)
* [Build the Node with a Temporary Name](#build)
* [Rename the Node to the Original Name](#rename)

----
### The Process

#### [Down the Node](id:down)

1. Stop Riak on `riak@node2.localdomain` if the node is still running in any way.

>**riak stop**

```
node2> riak stop
Attempting to restart script through sudo -H -u riak
ok
node2>
```

2. Shut down `node2.localdomain`, using any means, from `shutdown -h now` to hitting the power button.

3. Mark `riak@node2.localdomain` down from `node1.localdomain`.

>**riak-admin down riak@node2.localdomain**

```
node1> riak-admin down riak@node2.localdomain
Attempting to restart script through sudo -H -u riak
Success: "riak@node2.localdomain" marked as down
node1>
```

This command, which can be run from any running cluster node, tells the cluster that the node is offline and that ring-state transitions should be allowed.

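Before rebuilding, it can be worth confirming that the cluster now lists the node as `down`. A hypothetical helper for scripting that check (the `is_down` name is made up, and the row format assumed here matches the `member-status` output shown later in this document):

```shell
# is_down NODE - succeed if "riak-admin member-status" lists NODE as down.
is_down() {
    riak-admin member-status | grep -q "^down .*'$1'"
}

# Example: is_down riak@node2.localdomain && echo "safe to proceed"
```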
#### [Build the Node with a Temporary Name](id:build)

1. Reformat `node2.localdomain` or start with clean hardware and install Riak.

2. Edit the `vm.args` file on the new node and set the `-name` argument as follows:

**Note: Using a temporary, yet resolvable, name for the Riak instance is important**

```
-name riak@192.168.17.12
```

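Editing `vm.args` by hand is fine; as a sketch, the change can also be scripted (the helper name `set_node_name` and the `/etc/riak/vm.args` path are assumptions; check where your packaging installs the file):

```shell
# set_node_name FILE NAME - rewrite the -name line of an Erlang vm.args file.
set_node_name() {
    sed -i "s/^-name .*/-name $2/" "$1"
}

# Example (path assumed): set_node_name /etc/riak/vm.args riak@192.168.17.12
```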
3. Start `riak@192.168.17.12` on `node2.localdomain`.

>**riak start**

```
node2> riak start
Attempting to restart script through sudo -H -u riak
node2>
```

4. Join the newly created node to the cluster.

>**riak-admin cluster join riak@node1.localdomain**

```
node2> riak-admin cluster join riak@node1.localdomain
Attempting to restart script through sudo -H -u riak
Success: staged join request for 'riak@192.168.17.12' to 'riak@node1.localdomain'
node2>
```

5. Use `force-replace` to change all ownership references from `riak@node2.localdomain` to `riak@192.168.17.12`.

>**riak-admin cluster force-replace riak@node2.localdomain riak@192.168.17.12**

```
node2> riak-admin cluster force-replace riak@node2.localdomain riak@192.168.17.12
Attempting to restart script through sudo -H -u riak
Success: staged forced replacement of 'riak@node2.localdomain' with 'riak@192.168.17.12'
node2>
```

6. Show the planned cluster changes.

>**riak-admin cluster plan**

```
node2> riak-admin cluster plan
Attempting to restart script through sudo -H -u riak
=========================== Staged Changes ============================
Action         Node(s)
-----------------------------------------------------------------------
force-replace  'riak@node2.localdomain' with 'riak@192.168.17.12'
-----------------------------------------------------------------------

WARNING: All of 'riak@node2.localdomain' replicas will be lost

NOTE: Applying these changes will result in 1 cluster transition

#######################################################################
                      After cluster transition 1/1
#######################################################################

============================= Membership ==============================
Status     Ring       Pending    Node
-----------------------------------------------------------------------
valid      20.3%      --         'riak@node1.localdomain'
valid      20.3%      --         'riak@node3.localdomain'
valid      20.3%      --         'riak@node4.localdomain'
valid      20.3%      --         'riak@node5.localdomain'
valid      18.8%      --         'riak@192.168.17.12'
-----------------------------------------------------------------------
Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

Partitions reassigned from cluster changes: 13
  13 reassigned from 'riak@node2.localdomain' to 'riak@192.168.17.12'

node2>
```

7. Commit the changes to the cluster.

>**riak-admin cluster commit**

```
node2> riak-admin cluster commit
Attempting to restart script through sudo -H -u riak
Cluster changes committed
node2>
```

8. Check that everything is connected and functioning as expected.

>**riak-admin member-status**

```
node2> riak-admin member-status
Attempting to restart script through sudo -H -u riak
============================= Membership ==============================
Status     Ring       Pending    Node
-----------------------------------------------------------------------
valid      20.3%      --         'riak@node1.localdomain'
valid      20.3%      --         'riak@node3.localdomain'
valid      20.3%      --         'riak@node4.localdomain'
valid      20.3%      --         'riak@node5.localdomain'
valid      18.8%      --         'riak@192.168.17.12'
-----------------------------------------------------------------------
Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
```

#### [Rename the Node to the Original Name](id:rename)

1. Stop `riak@192.168.17.12` on `node2.localdomain`.

>**riak stop**

```
node2> riak stop
ok
node2>
```

2. Mark `riak@192.168.17.12` down from `node1.localdomain`.

>**riak-admin down riak@192.168.17.12**

```
node1> riak-admin down riak@192.168.17.12
Attempting to restart script through sudo -H -u riak
Success: "riak@192.168.17.12" marked as down
node1>
```

3. Edit the `vm.args` file on the node and set the `-name` argument back to the original name:

```
-name riak@node2.localdomain
```

4. Back up the `riak@192.168.17.12` ring folder by renaming it to `ring_192.168.17.12`. The location of the ring files can be determined by inspecting the `app.config` file; they are usually found in `/var/lib/riak/ring/`.

>**mv /var/lib/riak/ring /var/lib/riak/ring_192.168.17.12**

```
node2> mv /var/lib/riak/ring /var/lib/riak/ring_192.168.17.12
node2>
```

Moving the ring files will cause the node to "forget" that it was a member of a cluster and allow the node to start up with the new name.

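If the ring directory is not in the default location, its path can be extracted from `app.config` (a sketch assuming the standard `{ring_state_dir, "..."}` tuple syntax from the `riak_core` section; the `ring_dir` helper name is made up here):

```shell
# ring_dir FILE - print the ring_state_dir path from an app.config file.
ring_dir() {
    sed -n 's/.*{ring_state_dir, *"\([^"]*\)".*/\1/p' "$1"
}

# Example (path assumed): ring_dir /etc/riak/app.config
```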
5. Start Riak on `node2.localdomain`.

>**riak start**

```
node2> riak start
Attempting to restart script through sudo -H -u riak
node2>
```

6. Join `riak@node2.localdomain` to the cluster.

>**riak-admin cluster join riak@node1.localdomain**

```
node2> riak-admin cluster join riak@node1.localdomain
Attempting to restart script through sudo -H -u riak
Success: staged join request for 'riak@node2.localdomain' to 'riak@node1.localdomain'
node2>
```

7. Use `force-replace` to change all ownership references from `riak@192.168.17.12` to `riak@node2.localdomain`.

>**riak-admin cluster force-replace riak@192.168.17.12 riak@node2.localdomain**

```
node2> riak-admin cluster force-replace riak@192.168.17.12 riak@node2.localdomain
Attempting to restart script through sudo -H -u riak
Success: staged forced replacement of 'riak@192.168.17.12' with 'riak@node2.localdomain'
node2>
```

8. Show the planned changes to the cluster.

>**riak-admin cluster plan**

```
node2> riak-admin cluster plan
Attempting to restart script through sudo -H -u riak
=========================== Staged Changes ============================
Action         Node(s)
-----------------------------------------------------------------------
force-replace  'riak@192.168.17.12' with 'riak@node2.localdomain'
-----------------------------------------------------------------------

WARNING: All of 'riak@192.168.17.12' replicas will be lost

NOTE: Applying these changes will result in 1 cluster transition

#######################################################################
                      After cluster transition 1/1
#######################################################################

============================= Membership ==============================
Status     Ring       Pending    Node
-----------------------------------------------------------------------
valid      20.3%      --         'riak@node1.localdomain'
valid      18.8%      --         'riak@node2.localdomain'
valid      20.3%      --         'riak@node3.localdomain'
valid      20.3%      --         'riak@node4.localdomain'
valid      20.3%      --         'riak@node5.localdomain'
-----------------------------------------------------------------------
Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

Partitions reassigned from cluster changes: 13
  13 reassigned from 'riak@192.168.17.12' to 'riak@node2.localdomain'

node2>
```

9. Commit the changes.

>**riak-admin cluster commit**

```
node2> riak-admin cluster commit
```

10. Check that everything is running as expected.

>**riak-admin member-status**

```
node2> riak-admin member-status
Attempting to restart script through sudo -H -u riak
============================= Membership ==============================
Status     Ring       Pending    Node
-----------------------------------------------------------------------
valid      20.3%      --         'riak@node1.localdomain'
valid      18.8%      --         'riak@node2.localdomain'
valid      20.3%      --         'riak@node3.localdomain'
valid      20.3%      --         'riak@node4.localdomain'
valid      20.3%      --         'riak@node5.localdomain'
-----------------------------------------------------------------------
Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
```

11. Remove the backed-up ring folder from `node2.localdomain`.

>**rm -rf /var/lib/riak/ring_192.168.17.12**

```
node2> rm -rf /var/lib/riak/ring_192.168.17.12
node2>
```