- # Hypothesis: client requests should hit the 10s delay immediately when TCP connections to a node hang,
- # but only if the client request triggers a request to the blocked node.
- # I also expect garage to mark a node as failed when connections to it are rejected.
- # I use iptables on my docker host to simulate both cases:
- # - DROP: similar to container removal, TCP connections hang
- # - REJECT: similar to stopping the garage server, TCP connections are refused immediately
- # simulate container being removed
- sudo iptables -I FORWARD 1 -s 172.20.0.2 -d 172.20.0.4 -j DROP
- # telnet-style test (via curl) showing that the connection hangs
- root@d20c7b7ba66b:/# curl -v telnet://172.20.0.4:3901
- * Trying 172.20.0.4:3901...
- * TCP_NODELAY set
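# Side note: a probe with a hard timeout makes the "hangs" vs "fails fast" distinction explicit
# instead of waiting on curl. This is only a sketch and not part of the original test run:
# the function names probe and classify_probe are made up for illustration, and it assumes
# bash (for /dev/tcp) and GNU coreutils timeout, which exits with status 124 when it has to
# kill the command.

```shell
# Map the exit status of the timed connect attempt to a label.
classify_probe() {
  case "$1" in
    0)   echo "open" ;;         # connect succeeded
    124) echo "timeout" ;;      # GNU timeout killed the attempt: the connect was hanging
    *)   echo "failed-fast" ;;  # connect returned an error (refused, unreachable, ...)
  esac
}

# probe HOST PORT [SECONDS]: try a TCP connect and print one of the labels above.
probe() {
  host=$1
  port=$2
  secs=${3:-3}
  rc=0
  # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null || rc=$?
  classify_probe "$rc"
}
```

# Usage: probe 172.20.0.4 3901 3
# With the DROP rule in place this should print "timeout"; with REJECT or a stopped server it
# should print "failed-fast".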
- # status after a while
- root@d20c7b7ba66b:/# garage status
- ==== HEALTHY NODES ====
- ID Hostname Address Tags Zone Capacity
- 306dc3d22479daf2… fee972f61861 172.20.0.3:3901 [garage2] gar2 10
- 1a3701c8c354b7fd… d20c7b7ba66b 172.20.0.2:3901 [garage1] gar1 10
- 5f786b59f46a2abe… c23a2a9a20e3 172.20.0.4:3901 [garage3] gar3 10
- # The node still shows up as healthy, but after some time we see client impact.
- # So DROP does not fully simulate the container being removed, contrary to my hypothesis.
- # I don't understand why requests keep working for a while. Is the delay I see here actually a different problem?
- # undo
- sudo iptables -D FORWARD -s 172.20.0.2 -d 172.20.0.4 -j DROP
- # simulate garage server stopped
- sudo iptables -I FORWARD 1 -s 172.20.0.2 -d 172.20.0.4 -j REJECT
- # telnet-style test (via curl) showing that the connection is refused immediately, as intended
- root@d20c7b7ba66b:/# curl -v telnet://172.20.0.4:3901
- * Trying 172.20.0.4:3901...
- * TCP_NODELAY set
- * connect to 172.20.0.4 port 3901 failed: Connection refused
- * Failed to connect to 172.20.0.4 port 3901: Connection refused
- * Closing connection 0
- curl: (7) Failed to connect to 172.20.0.4 port 3901: Connection refused
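# Side note: without --reject-with, the iptables REJECT target answers with an ICMP
# port-unreachable message, while a host whose server process is stopped answers a TCP SYN
# with a TCP RST. A variant that may be closer to "garage server stopped" is
# --reject-with tcp-reset, which requires -p tcp. Whether this changes how garage reacts is
# untested here. The helper names below (rule_spec, block_cmd, unblock_cmd) are made up; they
# only print the commands, so the insert and delete rules cannot drift apart.

```shell
SRC=172.20.0.2   # node whose outgoing traffic is blocked (from the test above)
DST=172.20.0.4   # node being "stopped"

# Single source of truth for the match part of the rule.
rule_spec() {
  echo "-s $SRC -d $DST -p tcp -j REJECT --reject-with tcp-reset"
}

# Print (do not run) the commands that add and remove the rule.
block_cmd()   { echo "sudo iptables -I FORWARD 1 $(rule_spec)"; }
unblock_cmd() { echo "sudo iptables -D FORWARD $(rule_spec)"; }
```

# Usage: run block_cmd to review the command, then eval "$(block_cmd)" to apply it and
# eval "$(unblock_cmd)" to undo it.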
- # status after a while
- root@d20c7b7ba66b:/# garage status
- ==== HEALTHY NODES ====
- ID Hostname Address Tags Zone Capacity
- 306dc3d22479daf2… fee972f61861 172.20.0.3:3901 [garage2] gar2 10
- 1a3701c8c354b7fd… d20c7b7ba66b 172.20.0.2:3901 [garage1] gar1 10
- 5f786b59f46a2abe… c23a2a9a20e3 172.20.0.4:3901 [garage3] gar3 10
- # The node still shows up as healthy, but after some time we see client impact.
- # So REJECT does not actually simulate the garage server being stopped, contrary to my hypothesis:
- # - The node does not get marked as failed as I assumed
- # - The client sees impact even though requests to this node fail immediately?
- # - The time between adding the rule and the impact again suggests it is not as simple as the client request hanging on one node?
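# Side note: to pin down the gap between adding the rule and the first client impact, it may
# help to timestamp a repeated command. A minimal sketch, assuming GNU date (for %N); the
# function name timed is made up, and the command passed to it is whatever request is being
# tested.

```shell
# timed CMD [ARGS...]: run CMD quietly, then print a UTC timestamp, its exit status,
# and how long it took in milliseconds.
timed() {
  rc=0
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1 || rc=$?
  end=$(date +%s%N)
  echo "$(date -u +%H:%M:%S) rc=$rc elapsed_ms=$(( (end - start) / 1000000 ))"
}
```

# Usage: while sleep 1; do timed garage status; done
# Start the loop before inserting the iptables rule and note the time the rule was added;
# the first line with a large elapsed_ms or a non-zero rc marks the start of the impact.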
- # undo
- sudo iptables -D FORWARD -s 172.20.0.2 -d 172.20.0.4 -j REJECT
- # After removing the REJECT rule it takes a bit of time before client requests are handled correctly again,
- # but this is expected, as the node re-initializes before serving requests, so it is definitely not a bug.
- # If the node had been marked as failed, we shouldn't see any impact while it re-initializes.
- # For reference, telnet returns this when connecting to a container where garage server is stopped
- root@d20c7b7ba66b:/# curl -v telnet://172.20.0.4:3901
- * Trying 172.20.0.4:3901...
- * TCP_NODELAY set
- * connect to 172.20.0.4 port 3901 failed: Connection refused
- * Failed to connect to 172.20.0.4 port 3901: Connection refused
- * Closing connection 0
- curl: (7) Failed to connect to 172.20.0.4 port 3901: Connection refused
- # Afterwards the node is usually marked as failed immediately and there is no client impact.
- # Notably, while trying the above steps in various ways I did manage to get into a state
- # where stopping the server still took a long time to be detected as a node failure,
- # but this did not cause client problems.
- # For reference, telnet connects as expected when the garage server is running and unblocked
- root@d20c7b7ba66b:/# curl -v telnet://172.20.0.4:3901
- * Trying 172.20.0.4:3901...
- * TCP_NODELAY set
- * Connected to 172.20.0.4 (172.20.0.4) port 3901 (#0)