Mar 12th, 2022
# Hypothesis: client requests should immediately show the 10s delay behavior if TCP connections hang,
# but only if the client request triggers a request against the blocked node.
# Furthermore, I expect garage to correctly mark a node as failed if the connection is rejected.
# I will use iptables on my docker host to simulate both cases:
# - DROP, similar to container removal, where TCP connections hang
# - REJECT, similar to a garage server stop, where TCP connections are closed immediately
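# A minimal sketch of wrapping both cases in helpers, so that the undo always matches the rule that was added.
# Assumptions (not part of the original run): the names run, block_node and unblock_node and the DRY_RUN
# switch are made up for illustration; the IPs are the two containers used throughout this paste.

```shell
SRC=172.20.0.2   # node that handles the client requests
DST=172.20.0.4   # node being cut off

# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    sudo "$@"
  fi
}

# $1 is the iptables target: DROP (connections hang) or REJECT (connections are refused).
block_node()   { run iptables -I FORWARD 1 -s "$SRC" -d "$DST" -j "$1"; }
unblock_node() { run iptables -D FORWARD -s "$SRC" -d "$DST" -j "$1"; }
```

# Usage: "block_node DROP", run the test, then "unblock_node DROP". Prefix with DRY_RUN=1 to only print the rule.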

# simulate the container being removed
sudo iptables -I FORWARD 1 -s 172.20.0.2 -d 172.20.0.4 -j DROP

# connection test (curl with the telnet:// scheme) showing that the connection hangs
root@d20c7b7ba66b:/# curl -v telnet://172.20.0.4:3901
* Trying 172.20.0.4:3901...
* TCP_NODELAY set
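# A hanging connection attempt like this does not return quickly on its own. A minimal sketch for telling
# apart the three outcomes seen in this paste (connected, refused, hanging) with a bounded wait.
# Assumptions: bash (for /dev/tcp) and coreutils timeout are available in the container; the function
# names and the 3 second default are made up for illustration.

```shell
# Map the exit status of the probe to a label.
classify_probe() {
  case "$1" in
    0)   echo open ;;               # connection established
    124) echo timeout ;;            # coreutils timeout exits with 124 when the limit is hit
    *)   echo refused-or-error ;;   # e.g. "Connection refused"
  esac
}

# probe_tcp HOST PORT [SECONDS]
probe_tcp() {
  host=$1 port=$2 limit=${3:-3}
  timeout "$limit" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
  classify_probe "$?"
}
```

# Usage: "probe_tcp 172.20.0.4 3901". With the DROP rule in place I would expect the timeout case,
# with REJECT or a stopped server the refused case, but I have not run this against the setup above.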

# status after a while
root@d20c7b7ba66b:/# garage status
==== HEALTHY NODES ====
ID                 Hostname      Address          Tags       Zone  Capacity
306dc3d22479daf2…  fee972f61861  172.20.0.3:3901  [garage2]  gar2  10
1a3701c8c354b7fd…  d20c7b7ba66b  172.20.0.2:3901  [garage1]  gar1  10
5f786b59f46a2abe…  c23a2a9a20e3  172.20.0.4:3901  [garage3]  gar3  10

# The node still shows up as healthy, but after some time we see client impact.
# So this doesn't actually simulate the container being removed entirely, contrary to my hypothesis.
# I don't understand why it keeps working for a while. Is the delay I see here actually a different problem?
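# One way to narrow this down would be to timestamp the status output, so the delay between adding the rule
# and the first visible change can be measured. A minimal sketch; the function name poll_stamped is made up
# for illustration, and "garage status" is simply the command already used above.

```shell
# poll_stamped COUNT INTERVAL CMD...
# Runs CMD COUNT times, prefixing every output line with a UTC timestamp.
poll_stamped() {
  count=$1 interval=$2
  shift 2
  i=0
  while [ "$i" -lt "$count" ]; do
    ts=$(date -u +%H:%M:%S)
    "$@" 2>&1 | while IFS= read -r line; do
      printf '%s %s\n' "$ts" "$line"
    done
    i=$((i + 1))
    # Sleep between runs, but not after the last one.
    if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
  done
}
```

# Usage: note the time when the iptables rule is added, then run "poll_stamped 60 1 garage status"
# and compare against the time at which the client starts to see delays.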

# undo
sudo iptables -D FORWARD -s 172.20.0.2 -d 172.20.0.4 -j DROP


# simulate the garage server being stopped
sudo iptables -I FORWARD 1 -s 172.20.0.2 -d 172.20.0.4 -j REJECT

# connection test (curl with the telnet:// scheme) showing that the connection is refused as intended
root@d20c7b7ba66b:/# curl -v telnet://172.20.0.4:3901
* Trying 172.20.0.4:3901...
* TCP_NODELAY set
* connect to 172.20.0.4 port 3901 failed: Connection refused
* Failed to connect to 172.20.0.4 port 3901: Connection refused
* Closing connection 0
curl: (7) Failed to connect to 172.20.0.4 port 3901: Connection refused

# status after a while
root@d20c7b7ba66b:/# garage status
==== HEALTHY NODES ====
ID                 Hostname      Address          Tags       Zone  Capacity
306dc3d22479daf2…  fee972f61861  172.20.0.3:3901  [garage2]  gar2  10
1a3701c8c354b7fd…  d20c7b7ba66b  172.20.0.2:3901  [garage1]  gar1  10
5f786b59f46a2abe…  c23a2a9a20e3  172.20.0.4:3901  [garage3]  gar3  10

# The node still shows up as healthy, but after some time we see client impact.
# So this doesn't actually simulate the garage server being stopped, contrary to my hypothesis:
# - The node doesn't get marked as failed, as I had assumed it would
# - The client sees impact even though requests to this node fail immediately?
# - The delay between adding the rule and the impact again suggests it's not as simple as the client request hanging on one node?
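# A possible refinement, not tested here: by default the iptables REJECT target answers with an ICMP
# port-unreachable message, whereas a container whose garage server is stopped answers a connection attempt
# with a TCP RST. The rules above also only match packets going from 172.20.0.2 to 172.20.0.4.
# The sketch below resets TCP in both directions instead. Whether this changes what garage detects is an
# open question. The names run and reset_rules and the DRY_RUN switch are made up for illustration.

```shell
A=172.20.0.2
B=172.20.0.4

# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else sudo "$@"; fi; }

# $1 is -I (insert the rules) or -D (delete them again).
# --reject-with tcp-reset is only valid together with -p tcp.
reset_rules() {
  run iptables "$1" FORWARD -p tcp -s "$A" -d "$B" -j REJECT --reject-with tcp-reset
  run iptables "$1" FORWARD -p tcp -s "$B" -d "$A" -j REJECT --reject-with tcp-reset
}
```

# Usage: "reset_rules -I" to block, "reset_rules -D" to undo. Prefix with DRY_RUN=1 to only print the rules.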

# undo
sudo iptables -D FORWARD -s 172.20.0.2 -d 172.20.0.4 -j REJECT

# After removing the REJECT rule it takes a bit of time before client requests are handled correctly again,
# but this is expected since the node re-initializes before serving requests, so it is definitely not a bug.
# If the node had been marked as failed, we shouldn't see any impact while it re-initializes.


# For reference, this is what the connection test returns against a container where the garage server is stopped

root@d20c7b7ba66b:/# curl -v telnet://172.20.0.4:3901
* Trying 172.20.0.4:3901...
* TCP_NODELAY set
* connect to 172.20.0.4 port 3901 failed: Connection refused
* Failed to connect to 172.20.0.4 port 3901: Connection refused
* Closing connection 0
curl: (7) Failed to connect to 172.20.0.4 port 3901: Connection refused

# Afterwards the node is usually marked as failed immediately and there is no client impact.
# Notably, while trying the above steps in various ways, I did manage to get into a state
# where stopping the server still took a long time to be detected as a node failure,
# but this did not cause client problems.

# For reference, the connection test succeeds as expected when the garage server is running and unblocked

root@d20c7b7ba66b:/# curl -v telnet://172.20.0.4:3901
* Trying 172.20.0.4:3901...
* TCP_NODELAY set
* Connected to 172.20.0.4 (172.20.0.4) port 3901 (#0)