- ## Replacing a node with a machine of the same name
- ### Objective
- The goal is to replace a Riak instance with a new Riak instance of the same name so that the application environment does not need any instance-name-related changes.
- ### Scenario
- Riak is running in a cluster of five nodes.
- * `[email protected]` on `node1.localdomain` (192.168.17.11)
- * `[email protected]` on `node2.localdomain` (192.168.17.12)
- * `[email protected]` on `node3.localdomain` (192.168.17.13)
- * `[email protected]` on `node4.localdomain` (192.168.17.14)
- * `[email protected]` on `node5.localdomain` (192.168.17.15)
- The load-balancer being used performs periodic checks on the Riak nodes to determine if they are suitable for servicing requests.
- A hard failure has occurred on `node2.localdomain` and it will not receive requests until it is replaced with a node of the same name.
- The goal is to replace `[email protected]` with a new Riak instance named `[email protected]` so that the application environment does not need to have instance-name related changes.
- ### The Process tl;dr
- This process can be accomplished in three steps, the details of which will be discussed below.
- * [Down the Node](#down)
- * [Build the Node with a Temporary Name](#build)
- * [Rename the Node to the Original Name](#rename)
- ----
- ### The Process
- #### [Down the Node](id:down)
- 1. Stop Riak on `[email protected]` if the node is still running in any way.
- >**riak stop**
- ```
- node2> riak stop
- Attempting to restart script through sudo -H -u riak
- ok
- node2>
- ```
- 2. Shut down `node2.localdomain` by any means, from `shutdown -h now` to pressing the power button.
- 3. Mark `riak@node2` down from `node1.localdomain`.
- >**riak-admin down [email protected]**
- ```
- node1> riak-admin down [email protected]
- Attempting to restart script through sudo -H -u riak
- Success: "[email protected]" marked as down
- node1>
- ```
- This tells the cluster that the node is offline and that ring-state transitions should be allowed. The command can be run from any running cluster node.
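The down step can be wrapped in a small guard so that it is rehearsed before it is run for real. The following is only a sketch: `DRY_RUN`, `run`, and `mark_down` are hypothetical names introduced here, and the only Riak command used is the `riak-admin down` call shown above.

```shell
#!/bin/sh
# Sketch only: DRY_RUN, run, and mark_down are hypothetical helpers, not part of Riak.
# DRY_RUN defaults to 1, so commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode; execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

mark_down() {
    failed_node="$1"
    if [ -z "$failed_node" ]; then
        echo "usage: mark_down <failed-node-name>" >&2
        return 1
    fi
    # Intended to be run on any healthy cluster member,
    # after the failed host has been stopped and shut down.
    run riak-admin down "$failed_node"
}
```

Calling `mark_down` with the failed node's name prints the command it would run; setting `DRY_RUN=0` in the environment makes it execute the command instead.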
- #### [Build the Node with a Temporary Name](id:build)
- 1. Reformat `node2.localdomain` or start with clean hardware and install Riak.
- 2. Edit the `vm.args` file on the new node and set the `-name` argument as follows:
- **Note: Using a temporary, yet resolvable, name for the Riak instance is important**
- ```
- -name [email protected]
- ```
- 3. Start `[email protected]` on `node2.localdomain`.
- > **riak start**
- ```
- node2> riak start
- Attempting to restart script through sudo -H -u riak
- node2>
- ```
- 4. Join the newly created node to the cluster.
- >**riak-admin cluster join [email protected]**
- ```
- node2> riak-admin cluster join [email protected]
- Attempting to restart script through sudo -H -u riak
- Success: staged join request for '[email protected]' to '[email protected]'
- node2>
- ```
- 5. Use `force-replace` to change all ownership references from `[email protected]` to `[email protected]`.
- > **riak-admin cluster force-replace [email protected] [email protected]**
- ```
- node2> riak-admin cluster force-replace [email protected] [email protected]
- Attempting to restart script through sudo -H -u riak
- Success: staged forced replacement of '[email protected]' with '[email protected]'
- node2>
- ```
- 6. Show the planned cluster changes.
- > **riak-admin cluster plan**
- ```
- node2> riak-admin cluster plan
- Attempting to restart script through sudo -H -u riak
- =========================== Staged Changes ============================
- Action Nodes(s)
- -----------------------------------------------------------------------
- join '[email protected]'
- force-replace '[email protected]' with '[email protected]'
- -----------------------------------------------------------------------
- WARNING: All of '[email protected]' replicas will be lost
- NOTE: Applying these changes will result in 1 cluster transition
- #######################################################################
- After cluster transition 1/1
- #######################################################################
- ============================= Membership ==============================
- Status Ring Pending Node
- -----------------------------------------------------------------------
- valid 20.3% -- '[email protected]'
- valid 20.3% -- '[email protected]'
- valid 20.3% -- '[email protected]'
- valid 20.3% -- '[email protected]'
- valid 18.8% -- '[email protected]'
- -----------------------------------------------------------------------
- Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
- Partitions reassigned from cluster changes: 13
- 13 reassigned from '[email protected]' to '[email protected]'
- node2>
- ```
- 7. Commit the changes to the cluster.
- > **riak-admin cluster commit**
- ```
- node2> riak-admin cluster commit
- Attempting to restart script through sudo -H -u riak
- Cluster changes committed
- node2>
- ```
- 8. Check that everything is connected and functioning as expected.
- >**riak-admin member-status**
- ```
- node2> riak-admin member-status
- Attempting to restart script through sudo -H -u riak
- ============================= Membership ==============================
- Status Ring Pending Node
- -----------------------------------------------------------------------
- valid 20.3% -- '[email protected]'
- valid 20.3% -- '[email protected]'
- valid 20.3% -- '[email protected]'
- valid 20.3% -- '[email protected]'
- valid 18.8% -- '[email protected]'
- -----------------------------------------------------------------------
- Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
- ```
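The final check can also be scripted rather than read by eye. The sketch below assumes that each member row in the `riak-admin member-status` output has the shape shown above (a status word followed by a ring percentage); `all_members_valid` is a hypothetical helper name, not a Riak command.

```shell
#!/bin/sh
# Sketch only: all_members_valid is a hypothetical helper.
# It reads member-status output on stdin and succeeds only if at least
# one member row is present and every member row has the status "valid".
all_members_valid() {
    awk '
        # A member row is assumed to have a ring percentage in the second field.
        $2 ~ /^[0-9.]+%$/ {
            rows++
            if ($1 != "valid") bad++
        }
        END { exit ((rows > 0 && bad == 0) ? 0 : 1) }
    '
}
```

A typical use would be `riak-admin member-status | all_members_valid && echo "all members valid"`. Header, separator, and summary lines are ignored because their second field is not a percentage.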
- #### [Rename the Node to the Original Name](id:rename)
- 1. Stop `[email protected]` on `node2.localdomain`.
- >**riak stop**
- ```
- node2> riak stop
- ok
- node2>
- ```
- 2. Mark `[email protected]` down from `node1.localdomain`.
- >**riak-admin down [email protected]**
- ```
- node1> riak-admin down [email protected]
- Attempting to restart script through sudo -H -u riak
- Success: "[email protected]" marked as down
- node1>
- ```
- 3. Edit the `vm.args` file on the new node and set the `-name` argument as follows:
- ```
- -name [email protected]
- ```
- 4. Back up the `[email protected]` ring folder by renaming it to `ring_192.186.17.12`. The location of the ring files can be determined by inspecting the `app.config` file; they are usually found in `/var/lib/riak/ring/`.
- >**mv /var/lib/riak/ring /var/lib/riak/ring_192.186.17.12**
- ```
- node2> mv /var/lib/riak/ring /var/lib/riak/ring_192.186.17.12
- node2>
- ```
- Moving the ring files will cause the node to "forget" that it was a member of a cluster and allow the node to start up with the new name.
- 5. Start Riak on `node2.localdomain`.
- >**riak start**
- ```
- node2> riak start
- Attempting to restart script through sudo -H -u riak
- node2>
- ```
- 6. Join `[email protected]` to the cluster.
- >**riak-admin cluster join [email protected]**
- ```
- node2> riak-admin cluster join [email protected]
- Attempting to restart script through sudo -H -u riak
- Success: staged join request for '[email protected]' to '[email protected]'
- node2>
- ```
- 7. Use `force-replace` to change all ownership references from `[email protected]` to `riak@node2`.
- >**riak-admin cluster force-replace [email protected] [email protected]**
- ```
- node2> riak-admin cluster force-replace [email protected] [email protected]
- Attempting to restart script through sudo -H -u riak
- Success: staged forced replacement of '[email protected]' with '[email protected]'
- node2>
- ```
- 8. Show the planned changes to the cluster.
- >**riak-admin cluster plan**
- ```
- node2> riak-admin cluster plan
- Attempting to restart script through sudo -H -u riak
- =========================== Staged Changes ============================
- Action Nodes(s)
- -----------------------------------------------------------------------
- force-replace '[email protected]' with '[email protected]'
- join '[email protected]'
- -----------------------------------------------------------------------
- WARNING: All of '[email protected]' replicas will be lost
- NOTE: Applying these changes will result in 1 cluster transition
- #######################################################################
- After cluster transition 1/1
- #######################################################################
- ============================= Membership ==============================
- Status Ring Pending Node
- -----------------------------------------------------------------------
- valid 20.3% -- '[email protected]'
- valid 20.3% -- '[email protected]'
- valid 20.3% -- '[email protected]'
- valid 20.3% -- '[email protected]'
- valid 18.8% -- '[email protected]'
- -----------------------------------------------------------------------
- Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
- Partitions reassigned from cluster changes: 13
- 13 reassigned from '[email protected]' to '[email protected]'
- node2>
- ```
- 9. Commit the changes.
- >**riak-admin cluster commit**
- ```
- node2> riak-admin cluster commit
- ```
- 10. Check that everything is running as expected.
- >**riak-admin member-status**
- ```
- node2> riak-admin member-status
- ```
- 11. Remove the backed-up ring folder from `node2.localdomain`.
- >**rm -rf /var/lib/riak/ring_192.186.17.12**
- ```
- node2> rm -rf /var/lib/riak/ring_192.186.17.12
- node2>
- ```
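The two file-level changes in the rename (editing the `-name` line in `vm.args` and moving the ring folder aside) can be sketched as shell functions. This is an illustration under stated assumptions rather than a tested procedure: `set_node_name` and `backup_ring` are hypothetical helper names, the paths are passed in by the caller, and `vm.args` is assumed to already contain a line beginning with `-name`. Riak should be stopped before either function is used.

```shell
#!/bin/sh
# Sketch only: set_node_name and backup_ring are hypothetical helpers.

# Rewrite the existing -name line in a vm.args file; all other lines are kept as-is.
set_node_name() {
    vm_args="$1"
    new_name="$2"
    tmp="$vm_args.tmp.$$"
    awk -v n="$new_name" '
        $1 == "-name" { print "-name " n; next }
        { print }
    ' "$vm_args" > "$tmp" || return 1
    # Write back in place so the original file keeps its owner and permissions.
    cat "$tmp" > "$vm_args" && rm -f "$tmp"
}

# Move the ring directory aside so the node starts without its old cluster membership.
backup_ring() {
    ring_dir="$1"
    suffix="$2"
    mv "$ring_dir" "${ring_dir}_${suffix}"
}
```

For example, `backup_ring /var/lib/riak/ring old` would leave the previous ring files in `/var/lib/riak/ring_old`, where they can be removed once the cluster has been verified.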