jolausa

Untitled

Apr 9th, 2019
  1. 10:38:41 crean: jose_lausuch: what's up?
  2. 10:40:03 jose_lausuch: crean: I have a problem with bonding test case
  3. 10:40:20 jose_lausuch: crean: I'm basically replicating this config https://gitlab.suse.de/wicked-maintainers/wicked/blob/master/man/ifcfg-bonding.5.in#L45
  4. 10:41:04 crean: jose_lausuch: you may also add explicit primary=eth0
  5. 10:41:11 jose_lausuch: crean: I have a VM with 2 NICs (eth0, eth1) connected to the same bridge
  6. 10:42:04 crean: jose_lausuch: this works for active-backup mode only
  7. 10:42:17 crean: jose_lausuch: what's the problem?
  8. 10:42:32 jose_lausuch: vi ifc root
  9. 10:42:35 jose_lausuch: oops
  10. 10:42:52 jose_lausuch: crean: let me try that explicit primary=eth0
  11. 10:43:50 crean: jose_lausuch: a bridge is not permitted to forward LACP frames (by IEEE standards), because most bondings are under the bridge
  12. 10:44:17 crean: jose_lausuch: but perhaps you describe what happens first?
  13. 10:46:30 jose_lausuch: crean: I had primary=eth1 and it didn't work, but after changing that to primary=eth0 it works
  14. 10:46:37 jose_lausuch: the bond interface is created
  15. 10:50:11 crean: jose_lausuch: a bond in active-backup checks the link of the (primary) slave and uses only this one slave -- the other one is in "sparse" mode.
  16. 10:52:32 crean: jose_lausuch: both slaves are required to be in same vlan on the switch, e.g. when you're using ovs.
  17. 10:53:00 jose_lausuch: crean: ok, actually I'm not using OVS here, just libvirt networks
  18. 10:53:15 jose_lausuch: which creates a linux bridge
  19. 10:53:45 crean: jose_lausuch: forget libvirt networks. they're applying bridge filters, iptables rules, ... that _disallow_ it.
  20. 10:54:41 jose_lausuch: crean: which problems can it cause? the bonding interface is up and I can ping REF (another VM on the same bridge)
  21. 10:55:06 jose_lausuch: asmorodskyi: ^
  22. 10:55:41 crean: jose_lausuch: basically they're trying to "isolate" the nics and e.g. permit traffic from eth0 MAC on the eth0's vif in the bridge.
  23. 10:57:42 crean: jose_lausuch: and the bonding takes the MAC from one of the slaves (primary / first active one, depends also on the fail_over_mac mode) and is setting/using this MAC on _both_ slaves, so depending on the exact config of the bridge + filter rules, this may even be considered a loop and dropped.
  24. 10:59:03 jose_lausuch: crean: and these potential issues will not appear if we bring the VMs up on top of OVS, right?
  25. 11:00:19 crean: jose_lausuch: so you effectively get either something like this:
  26. 11:00:21 crean: http://pastebin.nue.suse.com/24037/src
  27. 11:01:16 crean: jose_lausuch: the bond sends packets with MAC from eth0 (or another one set by LLADDR on bond0) through both ports.
  28. 11:01:54 crean: jose_lausuch: but in case of libvirt networks, the filter may disallow this.
  29. 11:02:14 jose_lausuch: crean: by filter, you mean iptables?
  30. 11:02:17 crean: jose_lausuch: the old testsuite was using a setup like this:
  31. 11:02:19 crean: http://pastebin.nue.suse.com/24025/src
  32. 11:02:33 crean: yes, but also ebtables
  33. 11:03:09 jose_lausuch: mmm, about this second configuration, aren't bonds supposed to be on interfaces connected to the same switch?
  34. 11:03:20 crean: so depending on the primary on both VMs, the traffic goes through with a chance of 50% 
  35. 11:03:42 crean: jose_lausuch: no, this depends on the bonding mode.
  36. 11:04:38 crean: jose_lausuch: in active-backup and in balance-alb it needs to be a single switch=bridge,
  37. 11:04:57 jose_lausuch: yes, otherwise what you describe could happen, the 50% chance
  38. 11:05:27 crean: jose_lausuch: but e.g. in balance-alb the reference should _not_ configure a bond, but use a single interface.
  39. 11:05:29 jose_lausuch: crean: in the test you have in wicked-testsuite, it's basically using active-backup
  40. rossella_s [rossella@rsblendido.openvpn2.suse.de] entered the room. (11:06:37)
  41. 11:06:48 crean: oh... I've team meeting now.
  42. 11:06:49 jose_lausuch: but there are actually ifcfg/xml files for other configs as well (balance-alb, rr, tlb, etc..)
  43. 11:06:57 jose_lausuch: crean: no worries, thanks for the help 
  44. 11:08:16 crean: jose_lausuch: yes, that's why there are two bridges and probably also 3 eth interfaces or some small tricks like this.
  45. 11:08:46 crean: jose_lausuch: e.g. images-config/files-ref/ifcfg-bond0-ab enforces primary=eth1
  46. 11:09:46 crean: jose_lausuch: and on test-files/bonding/ifcfg-bond0-ab as well
  47. 11:10:18 crean: it is because of active-backup
  48. 11:10:30 crean: and the 50% chance to work
  49. 11:11:25 crean: jose_lausuch: see also test-files/bonding/ifcfg-bond0-alb on features/wicked_9_aggregation.feature -> features/step_definitions/wicked_actions.rb
  50. 11:11:30 jose_lausuch: crean: ok, will take that into consideration, but in any case, I will follow the recommendation of using only 1 OVS bridge for active-backup config
  51. 11:12:35 crean: jose_lausuch: this will IMO work only with active-backup; e.g. lacp, balance-rr, balance-xor will most probably not work with a single switch at all.
  52. 11:13:03 jose_lausuch: crean: yes, in that case I would use 2 bridges
  53. 11:15:30 crean: jose_lausuch: see the matrix the old testsuite is using with the config -> http://pastebin.nue.suse.com/24045/src
  54. 11:17:55 jose_lausuch: crean: ok
  55. 11:18:02 jose_lausuch: crean: are you testing all combinations?
  56. 11:20:31 crean: sure
  57. 11:20:49 crean: active-backup is the most useless one 
  58. 11:21:40 jose_lausuch: crean: well, for the active-backup test, it would make sense to include a failover test, like bringing down the master while checking that network flow is not interrupted (just thinking out loud)
  59. 11:22:26 crean: 99% of the bonds in real life are lacp alias 802.3ad (==standard) or balance-xor (also inner xfer part of lacp)
  60. 11:23:21 crean: jose_lausuch: bringing down the active slave, you mean; bringing down the master breaks the bonding completely.
  61. 11:23:59 crean: jose_lausuch: but this test has basically nothing in common with wicked
  62. 11:24:09 jose_lausuch: crean: there is no failover?
  63. 11:24:17 jose_lausuch: crean: yes, sure 
  64. 11:24:24 crean: jose_lausuch: no, master=bond0
  65. 11:25:02 crean: jose_lausuch: more useful, where wicked is involved, would be to remove + re-add slave interfaces, aka hotplugging tests.
  66. 11:25:38 jose_lausuch: crean: sorry, I meant the active slave, not master (which of course is breaking the bond)
  67. 11:26:03 crean: jose_lausuch: see https://w3.suse.de/~mt/tools/
  68. 11:27:13 crean: jose_lausuch: ./ifbind.sh unbind eth0 -> removes eth0 (enslaved in bond0) from system, like pulling the _NIC_
  69. 11:27:35 jose_lausuch: crean: yep, we use that script in our tests
  70. 11:27:50 crean: jose_lausuch: ./ifbind.sh bind eth0 -> re-adds eth0 to system, like putting the _NIC_ back -> wicked should automatically enslave it.
  71. 11:28:41 crean: jose_lausuch: the problems here are all the "50% chances" due to the setup of the host-bridges, ...
  72. 11:33:47 jose_lausuch: crean: ok!
  73. 11:38:50 crean: jose_lausuch: you can try to pull the cable (so a slave gets NO-CARRIER) when possible with the hypervisor you have, or bind|unbind the nic completely, but do not make tweaks like "ip link set down $primary" -> this is usually an invalid tweak in bonding cases.
  74. 11:40:13 jose_lausuch: crean: doing things in the hypervisor directly might be tricky for our automated scenarios, but I guess using ifbind.sh is enough?
  75. 11:41:54 crean: jose_lausuch: yes
  76. 11:43:15 crean: jose_lausuch: e.g. in KVM/libvirt I didn't find any way to lose carrier (maybe some qemu remote controls permit it, just not libvirt), but vmware has such a switch somewhere in the cli / expert settings.
  77. 11:43:35 crean: jose_lausuch: but as this is difficult to trigger -> bind / unbind only.
  78. 11:44:13 crean: jose_lausuch: also, losing a carrier is a kernel/bonding test, bind / unbind is wicked's work.
  79. 12:40:08 jose_lausuch: crean: understood, thanks 
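
Note on the hotplug test discussed above (unbind a slave with ifbind.sh, bind it again, expect wicked to re-enslave it): one way to check the result could be to read the bonding driver's status file /proc/net/bonding/bond0 before and after. The Python sketch below is only an illustration and is not taken from the wicked testsuite. The function names (parse_bond_status, read_bond_status, slave_reenslaved) are hypothetical, and the sample text is made-up data in the layout the kernel bonding driver normally prints, here imagined as the state right after "./ifbind.sh unbind eth0".

```python
# Made-up sample in the usual /proc/net/bonding/<bond> layout,
# imagined as the state after eth0 was unbound (assumption, not real output).
SAMPLE = """\
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: eth0 (primary_reselect always)
Currently Active Slave: eth1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth1
MII Status: up
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: 52:54:00:00:00:02
Slave queue ID: 0
"""


def parse_bond_status(text):
    """Return bonding mode, active slave and per-slave MII status."""
    status = {"mode": None, "active": None, "slaves": {}}
    current = None  # name of the slave section being read, if any
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            status["mode"] = value
        elif key == "Currently Active Slave":
            status["active"] = value
        elif key == "Slave Interface":
            current = value
            status["slaves"][current] = None
        elif key == "MII Status" and current is not None:
            # Only MII lines inside a slave section belong to that slave;
            # the earlier MII line describes the bond itself.
            status["slaves"][current] = value
    return status


def read_bond_status(bond="bond0"):
    """Read and parse the live status file (needs an existing bond)."""
    with open(f"/proc/net/bonding/{bond}") as fh:
        return parse_bond_status(fh.read())


def slave_reenslaved(status, nic):
    """True if nic is listed as a slave and its MII status is up."""
    return status["slaves"].get(nic) == "up"


if __name__ == "__main__":
    st = parse_bond_status(SAMPLE)
    print(st["mode"], st["active"], sorted(st["slaves"]))
    print("eth0 enslaved:", slave_reenslaved(st, "eth0"))
```

In a test run, the idea would be to call read_bond_status() after "./ifbind.sh bind eth0" and poll slave_reenslaved(status, "eth0") until it turns true or a timeout expires. How long wicked needs for that is not stated in the chat, so any timeout is a guess.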