1. 14 Oct, 2015 (22 commits)
2. 13 Oct, 2015 (18 commits)
    • Merge branch 'bridge-vlan' · bbb300eb
      David S. Miller authored
      Nikolay Aleksandrov says:
      
      ====================
      bridge: vlan: cleanups & fixes (part 3)
      
      Patch 01 converts the vlgrp member to use RCU: it was already used in an
      RCU-like way, so it is better to make that official and use all the
      available RCU instrumentation. Patch 02 fixes a bug where the vlan_list
      could be traversed without RTNL or RCU held, which could lead to use of
      freed entries. Patch 03 removes redundant code that is no longer needed.
      Patch 04 fixes a bug, reported by Ido Schimmel, in the ordering of
      vlan_flush with switchdev devices by moving vlan_flush back to where it was.
      
      v2: patches 03 and 04 are new; the second synchronize_rcu() could not be
      avoided since the rhtable destruction can sleep
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: vlan: move back vlan_flush · f409d0ed
      Nikolay Aleksandrov authored
      Ido Schimmel reported a problem with switchdev devices caused by the
      reordering of the del_nbp operations, more specifically the move of
      nbp_vlan_flush(), which deletes all VLANs and frees vlgrp, to after the
      rx_handler has been unregistered. To fix this, move vlan_flush back to
      where it was and make it destroy the rhtable only after NULLing vlgrp and
      waiting for a grace period, to make sure no one can still see it.
      Reported-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: vlan: drop unnecessary flush code · b8d02c3c
      Nikolay Aleksandrov authored
      As Ido Schimmel pointed out, the vlan_vid_del() code in nbp_vlan_flush()
      is unnecessary (it is actually a remnant of the old VLAN code), so remove
      it.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: vlan: use rcu for vlan_list traversal in br_fill_ifinfo · e9c953ef
      Nikolay Aleksandrov authored
      br_fill_ifinfo() is called by br_ifinfo_notify(), which can be called
      from many contexts with different locks held. Sometimes it relies only on
      the bridge's spinlock, which is a problem for the VLAN code because
      walking the vlan_list requires RTNL or RCU. So explicitly use RCU for the
      traversal to avoid such problems.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: vlan: use proper rcu for the vlgrp member · 907b1e6e
      Nikolay Aleksandrov authored
      The bridge's and port's vlgrp member is already used in an RCU-like way.
      Currently we rely on the fact that it cannot disappear while the port
      exists, but that is error-prone and we might miss places with improper
      locking (either RCU or RTNL must be held to walk the vlan_list). So make
      it official and use RCU for vlgrp to catch offenders. Introduce proper
      vlgrp accessors and use them consistently throughout the code.
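      The publish/read/teardown pattern this patch formalizes can be
      illustrated with a minimal userspace sketch. It assumes C11
      <stdatomic.h>, and the names (struct port, port_vlan_group(),
      port_set_vlan_group(), port_flush_vlan_group()) are hypothetical, not
      the kernel's real vlgrp accessors. Release/acquire atomics only
      approximate rcu_assign_pointer()/rcu_dereference(), and there is no
      grace period, so this does not replace RCU.

      ```c
      #include <stdatomic.h>
      #include <stdlib.h>

      /* Hypothetical stand-in for the bridge's VLAN group structure. */
      struct vlan_group {
          int num_vlans;
      };

      /* Hypothetical stand-in for a bridge port; vlgrp is published atomically. */
      struct port {
          _Atomic(struct vlan_group *) vlgrp;
      };

      /* Reader accessor: an acquire load stands in for rcu_dereference(). */
      static struct vlan_group *port_vlan_group(struct port *p)
      {
          return atomic_load_explicit(&p->vlgrp, memory_order_acquire);
      }

      /* Writer accessor: a release store stands in for rcu_assign_pointer(), so
       * a reader that sees the new pointer also sees the initialised group. */
      static void port_set_vlan_group(struct port *p, struct vlan_group *vg)
      {
          atomic_store_explicit(&p->vlgrp, vg, memory_order_release);
      }

      /* Teardown: unpublish the pointer first, then free the group. Kernel code
       * would call synchronize_rcu() between the two steps to wait for readers;
       * C11 atomics have no grace periods, so this sketch only marks the spot. */
      static void port_flush_vlan_group(struct port *p)
      {
          struct vlan_group *vg =
              atomic_exchange_explicit(&p->vlgrp, NULL, memory_order_acq_rel);

          /* synchronize_rcu() would go here. */
          free(vg);
      }
      ```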
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'vrf-ipv6' · 4b918163
      David S. Miller authored
      David Ahern says:
      
      ====================
      net: VRF support in IPv6 stack
      
      Initial support for VRF in the IPv6 stack. This brings IPv6 functionality
      on par with IPv4: ping, TCP client/server and UDP client/server all work.
      tcpdump on the VRF device and on an external tap (e.g., a host-side tap
      device) shows all packets with the proper addresses. IPv6 does not need
      the source address operation that IPv4 uses. I verified that vti6 works
      properly in my setup, as does using an IPv6 address on the VRF device.
      
      v3
      - rebased onto the top of net-next (updated for the net namespace changes by Eric)
      - fixed dst_entry typecasts as requested by Dave
      - added flags to inet6_rtm_getroute (IPv6 version of deaa0a6a)
      
      v2
      - fixed CONFIG_IPV6 dependency as questioned by Cong
        - if IPV6 is a module, kbuild ensures VRF is a module
        - if IPV6 is disabled, IPV6 functionality is compiled out of the VRF module
      - addressed comments from Nik over IRC
        - removed duplicate call to netif_is_l3_master in l3mdev_rt6_dst_by_oif
        - changed allocation flag from GFP_ATOMIC to GFP_KERNEL since it is init time
        - added free of rt6i_pcpu
        - check_ipv6_frame returns false only if the packet is of NDISC type
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add VRF support to IPv6 stack · ca254490
      David Ahern authored
      As with IPv4, VRF support is added to the IPv6 stack by replacing
      hardcoded table IDs with possibly device-specific ones and by manipulating
      the oif in the flowi6. The flow flags are used to skip the oif comparison
      in nexthop lookups if the device is enslaved to a VRF via the L3 master
      device.
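      The two mechanisms described above, per-device table selection and the
      skip-oif flow flag, can be shown in a simplified model. All types and
      helpers are hypothetical. MODEL_TABLE_MAIN mirrors RT_TABLE_MAIN (254).
      Two choices are assumptions of this sketch rather than a copy of the
      kernel's IPv6 route code: treating an oif of 0 as "no constraint", and
      pointing the oif at the L3 master.

      ```c
      #include <stdbool.h>

      /* Value of RT_TABLE_MAIN; defined locally so no kernel headers are needed. */
      #define MODEL_TABLE_MAIN 254u

      /* Hypothetical device: a non-zero master_ifindex means it is enslaved to
       * a VRF (L3 master device) whose FIB table is vrf_table. */
      struct model_dev {
          int ifindex;
          int master_ifindex;
          unsigned int vrf_table;
      };

      /* Hypothetical flow key standing in for struct flowi6. */
      struct model_flow {
          int oif;
          bool skip_nh_oif;   /* models the "skip oif compare" flow flag */
      };

      /* Device-specific table instead of a hardcoded one: the VRF's table for
       * enslaved devices, otherwise the main table. */
      static unsigned int model_fib_table(const struct model_dev *dev)
      {
          return dev->master_ifindex ? dev->vrf_table : MODEL_TABLE_MAIN;
      }

      /* Prepare a flow for a lookup out of dev. For an enslaved device (assumed
       * behaviour) the oif becomes the L3 master and nexthop oif checks are skipped. */
      static void model_prepare_flow(const struct model_dev *dev, struct model_flow *fl)
      {
          fl->oif = dev->ifindex;
          fl->skip_nh_oif = false;
          if (dev->master_ifindex) {
              fl->oif = dev->master_ifindex;
              fl->skip_nh_oif = true;
          }
      }

      /* Nexthop check: the nexthop device must match the flow's oif unless the
       * flow has no oif constraint or asked to skip the comparison. */
      static bool model_nh_oif_ok(const struct model_flow *fl, int nh_ifindex)
      {
          return fl->skip_nh_oif || fl->oif == 0 || fl->oif == nh_ifindex;
      }
      ```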
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add IPv6 support to VRF device · 35402e31
      David Ahern authored
      Add support for IPv6 to the VRF device driver. The implementation
      parallels what has been done for IPv4.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • David Ahern
      c4850687
    • net: Add IPv6 support to l3mdev · ccf3c8c3
      David Ahern authored
      Add operations to retrieve the cached IPv6 dst entry from an l3mdev
      device and to look up the IPv6 source address.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: fix gc_timer mod/del race condition · af379392
      Nikolay Aleksandrov authored
      Commit c62987bb ("bridge: push bridge setting ageing_time down to
      switchdev") introduced a timer race condition: the gc_timer can get
      rearmed after it has supposedly been stopped and flushed in
      br_dev_delete(), leading to use of freed memory. So take RTNL to
      synchronize with bridge destruction when setting ageing_time.
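      Here is a minimal userspace model of why taking RTNL closes the race. It
      assumes a pthread mutex stands in for RTNL, and booleans stand in for the
      gc_timer and the bridge's registration state; all names are
      hypothetical. The explicit "registered" check simplifies things for this
      sketch. The actual fix relies on RTNL serializing the setter with bridge
      destruction.

      ```c
      #include <errno.h>
      #include <pthread.h>
      #include <stdbool.h>

      /* Hypothetical stand-in for RTNL: one mutex serialising bridge
       * configuration and destruction in this model. */
      static pthread_mutex_t model_rtnl = PTHREAD_MUTEX_INITIALIZER;

      /* Hypothetical bridge: 'registered' is cleared on deletion and
       * 'gc_timer_armed' models whether the gc_timer is pending. */
      struct model_bridge {
          bool registered;
          bool gc_timer_armed;
          unsigned long ageing_time;
      };

      /* Models br_dev_delete(): under RTNL, unregister and stop the timer. */
      static void model_br_dev_delete(struct model_bridge *br)
      {
          pthread_mutex_lock(&model_rtnl);
          br->registered = false;
          br->gc_timer_armed = false;   /* del_timer_sync() in the real code */
          pthread_mutex_unlock(&model_rtnl);
      }

      /* Models the fixed ageing_time setter: because it runs under the same
       * lock as the delete path, it can never rearm the timer after the delete
       * path has stopped it. */
      static int model_set_ageing_time(struct model_bridge *br, unsigned long t)
      {
          int err = 0;

          pthread_mutex_lock(&model_rtnl);
          if (!br->registered) {
              err = -ENODEV;
          } else {
              br->ageing_time = t;
              br->gc_timer_armed = true;   /* mod_timer() in the real code */
          }
          pthread_mutex_unlock(&model_rtnl);
          return err;
      }
      ```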
      Here's the trace, reproduced with these two commands running in parallel:
      while :; do echo 10000 > /sys/class/net/br0/bridge/ageing_time; done;
      while :; do brctl addbr br0; ip l set br0 up; ip l set br0 down; brctl delbr br0; done;
      
      [  300.000029] BUG: unable to handle kernel paging request at
      ffffffff811c59d3
      [  300.000263] IP: [<ffffffff810f168e>] __internal_add_timer+0x2e/0xd0
      [  300.000422] PGD 1a0f067 PUD 1a10063 PMD 10001e1
      [  300.000639] Oops: 0003 [#1] SMP
      [  300.000793] Modules linked in: bridge stp llc nfsd auth_rpcgss
      oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul
      crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev aesni_intel
      aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd
      snd_hda_codec_generic qxl drm_kms_helper psmouse pcspkr ttm
      snd_hda_intel 9pnet_virtio evdev serio_raw joydev snd_hda_codec 9pnet
      virtio_balloon drm snd_hwdep virtio_console snd_hda_core pvpanic snd_pcm
      i2c_piix4 snd_timer acpi_cpufreq parport_pc snd parport soundcore button
      processor i2c_core ipv6 autofs4 hid_generic usbhid hid ext4 crc16
      mbcache jbd2 sg sr_mod cdrom ata_generic virtio_blk virtio_net e1000
      ehci_pci uhci_hcd ehci_hcd usbcore usb_common floppy ata_piix libata
      virtio_pci virtio_ring virtio scsi_mod
      [  300.004008] CPU: 1 PID: 1169 Comm: bash Not tainted 4.3.0-rc3+ #46
      [  300.004008] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  300.004008] task: ffff880035be2200 ti: ffff88003795c000 task.ti:
      ffff88003795c000
      [  300.004008] RIP: 0010:[<ffffffff810f168e>]  [<ffffffff810f168e>]
      __internal_add_timer+0x2e/0xd0
      [  300.004008] RSP: 0018:ffff88003fd03e78  EFLAGS: 00010046
      [  300.004008] RAX: ffff88003fd0ef60 RBX: 840fc78949c08548 RCX:
      00000001ffffffff
      [  300.004008] RDX: 0000000000000000 RSI: ffffffff811c59d3 RDI:
      ffff88003fd0df00
      [  300.004008] RBP: ffff88003fd03e78 R08: 00000000ffffffff R09:
      0000000000000000
      [  300.004008] R10: 0000000000000000 R11: 0000000000000000 R12:
      ffff88003fd0df00
      [  300.004008] R13: 0000000000000000 R14: 0000000000000001 R15:
      ffffffff816032e0
      [  300.004008] FS:  00007fcbdd609700(0000) GS:ffff88003fd00000(0000)
      knlGS:0000000000000000
      [  300.004008] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  300.004008] CR2: ffffffff811c59d3 CR3: 0000000037879000 CR4:
      00000000000406e0
      [  300.004008] Stack:
      [  300.004008]  ffff88003fd03ea8 ffffffff810f1775 ffff88003c8cb958
      ffff88003fd0df00
      [  300.004008]  0000000000000000 0000000000000001 ffff88003fd03f18
      ffffffff810f28c4
      [  300.004008]  ffff88003fd0eb68 ffff88003fd0e968 ffff88003fd0e768
      ffff88003fd0df68
      [  300.004008] Call Trace:
      [  300.004008]  <IRQ>
      [  300.004008]  [<ffffffff810f1775>] cascade+0x45/0x70
      [  300.004008]  [<ffffffff810f28c4>] run_timer_softirq+0x2f4/0x340
      [  300.004008]  [<ffffffff8107e380>] __do_softirq+0xd0/0x440
      [  300.004008]  [<ffffffff8107e8a3>] irq_exit+0xb3/0xc0
      [  300.004008]  [<ffffffff815c2032>] smp_apic_timer_interrupt+0x42/0x50
      [  300.004008]  [<ffffffff815bfe37>] apic_timer_interrupt+0x87/0x90
      [  300.004008]  <EOI>
      [  300.004008]  [<ffffffff811fb80c>] ? create_object+0x13c/0x2e0
      [  300.004008]  [<ffffffff8109b23e>] ? __kernel_text_address+0x4e/0x70
      [  300.004008]  [<ffffffff8109b23e>] ? __kernel_text_address+0x4e/0x70
      [  300.004008]  [<ffffffff8101e17f>] print_context_stack+0x7f/0xf0
      [  300.004008]  [<ffffffff8101d55b>] dump_trace+0x11b/0x300
      [  300.004008]  [<ffffffff8102970b>] save_stack_trace+0x2b/0x50
      [  300.004008]  [<ffffffff811fb80c>] create_object+0x13c/0x2e0
      [  300.004008]  [<ffffffff815b2e8e>] kmemleak_alloc+0x4e/0xb0
      [  300.004008]  [<ffffffff811e475d>] kmem_cache_alloc_trace+0x18d/0x2f0
      [  300.004008]  [<ffffffff8128b139>] kernfs_fop_open+0xc9/0x380
      [  300.004008]  [<ffffffff8120214f>] do_dentry_open+0x1ff/0x2f0
      [  300.004008]  [<ffffffff8128b070>] ? kernfs_fop_release+0x70/0x70
      [  300.004008]  [<ffffffff812034f9>] vfs_open+0x59/0x60
      [  300.004008]  [<ffffffff812130de>] path_openat+0x1ce/0x1260
      [  300.004008]  [<ffffffff812154ae>] do_filp_open+0x7e/0xe0
      [  300.004008]  [<ffffffff812251ff>] ? __alloc_fd+0xaf/0x180
      [  300.004008]  [<ffffffff8120387b>] do_sys_open+0x12b/0x210
      [  300.004008]  [<ffffffff8120397e>] SyS_open+0x1e/0x20
      [  300.004008]  [<ffffffff815bf0b6>] entry_SYSCALL_64_fastpath+0x16/0x7a
      [  300.004008] Code: 66 90 48 8b 46 10 48 8b 4f 40 55 48 89 c2 48 89 e5
      48 29 ca 48 81 fa ff 00 00 00 77 20 0f b6 c0 48 8d 44 c7 68 48 8b 10 48
      85 d2 <48> 89 16 74 04 48 89 72 08 48 89 30 48 89 46 08 5d c3 48 81 fa
      [  300.004008] RIP  [<ffffffff810f168e>] __internal_add_timer+0x2e/0xd0
      [  300.004008]  RSP <ffff88003fd03e78>
      [  300.004008] CR2: ffffffff811c59d3
      
      Fixes: c62987bb ("bridge: push bridge setting ageing_time down to switchdev")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Scott Feldman <sfeldma@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • switchdev: enforce no pvid flag in vlan ranges · cc02aa8e
      Nikolay Aleksandrov authored
      We shouldn't allow the BRIDGE_VLAN_INFO_PVID flag in VLAN ranges: a port
      has only one PVID, so a range of PVIDs makes no sense.
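      The check this commit enforces can be sketched as follows. The flag
      values repeat include/uapi/linux/if_bridge.h. struct model_vlan_info and
      the model_check_* helpers are hypothetical and do not reproduce the
      switchdev code. The 1..4094 VID bound is also an assumption of the
      sketch.

      ```c
      #include <errno.h>
      #include <stdint.h>

      /* Flag values as in include/uapi/linux/if_bridge.h, repeated so the
       * sketch builds without kernel headers. */
      #define BRIDGE_VLAN_INFO_PVID        (1 << 1)
      #define BRIDGE_VLAN_INFO_RANGE_BEGIN (1 << 3)
      #define BRIDGE_VLAN_INFO_RANGE_END   (1 << 4)

      /* Hypothetical stand-in for struct bridge_vlan_info (flags, vid). */
      struct model_vlan_info {
          uint16_t flags;
          uint16_t vid;
      };

      /* Hypothetical check for one range endpoint: the VID must be usable
       * (1..4094) and the entry must not carry the PVID flag. */
      static int model_check_range_entry(const struct model_vlan_info *v)
      {
          if (v->vid == 0 || v->vid >= 4095)
              return -EINVAL;
          if (v->flags & BRIDGE_VLAN_INFO_PVID)
              return -EINVAL;   /* a range of PVIDs makes no sense */
          return 0;
      }

      /* Hypothetical check for a begin/end pair describing a VLAN range. */
      static int model_check_vlan_range(const struct model_vlan_info *begin,
                                        const struct model_vlan_info *end)
      {
          if (!(begin->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN) ||
              !(end->flags & BRIDGE_VLAN_INFO_RANGE_END))
              return -EINVAL;
          if (model_check_range_entry(begin) || model_check_range_entry(end))
              return -EINVAL;
          if (end->vid <= begin->vid)
              return -EINVAL;
          return 0;
      }
      ```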
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: Elad Raz <eladr@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'dsa-mv88e6xxx-fix-hardware-bridging' · f83665d0
      David S. Miller authored
      Vivien Didelot says:
      
      ====================
      net: dsa: mv88e6xxx: fix hardware bridging
      
      DSA and its drivers currently hook the NETDEV_CHANGEUPPER net_device event in
      order to configure the VLAN map of every port.
      
      This VLAN map is a feature of these switch chips that hardcodes and restricts
      which output ports a given input port can egress frames to.
      
      A Linux bridge is a simple untagged VLAN propagated by the bridge code itself.
      With proper 802.1Q support, a driver no longer needs this hook and simply
      programs the related VLAN object.
      
      This patchset improves the hardware bridging code in the mv88e6xxx driver with
      a strict 802.1Q mode.
      
      Ideally, the same should be done for Broadcom Starfighter 2 and Rocker before
      getting rid of this hook completely.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: fix hardware bridging · 5fe7f680
      Vivien Didelot authored
      Playing with the VLAN map of every port to implement "hardware bridging"
      in the 88E6352 driver was a hack until full 802.1Q was supported.
      
      Indeed, with the 802.1Q port mode "Disabled" or "Fallback", this feature
      is used to restrict which output ports an input port can egress frames to.
      
      A Linux bridge is an untagged VLAN. With full 802.1Q support, we no longer
      need this hack and can use the "Secure" strict 802.1Q port mode.
      
      With this mode, the port-based VLAN map still needs to be configured,
      but all the logic is VTU-centric. This means that the switch only cares
      about rules described in its hardware VLAN table, which is exactly what
      the Linux bridge expects and what we want.
      
      Note also that hardware bridging was broken with the previous, more
      flexible "Fallback" 802.1Q port mode. Here's an example:
      
      Port0 and Port1 belong to the same bridge. If Port0 sends crafted frames
      tagged with VID 200 to Port1, Port1 receives them. Even if Port1 is a
      member of hardware VLAN 200 and Port0 is not, Port1 still receives them,
      because Fallback mode ignores invalid VIDs and non-member source ports.
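      The example above can be modeled as a forwarding decision. This is a
      hypothetical sketch, not the mv88e6xxx driver and not the 88E6352's
      documented behavior. It makes three assumptions. First, the port-based
      VLAN map applies in both modes. Second, Fallback forwards unknown VIDs by
      the port map alone and ignores source membership. Third, Secure drops
      unknown VIDs and non-member sources.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical model of the two 802.1Q port modes discussed above. */
      enum model_8021q_mode { MODEL_FALLBACK, MODEL_SECURE };

      /* Hypothetical VTU entry: bit n of members set means port n is a member. */
      struct model_vtu_entry {
          uint16_t vid;
          uint32_t members;
      };

      /* Look up the member mask for vid; false if the VID is not in the VTU. */
      static bool model_vtu_lookup(const struct model_vtu_entry *vtu, int n,
                                   uint16_t vid, uint32_t *members)
      {
          for (int i = 0; i < n; i++) {
              if (vtu[i].vid == vid) {
                  *members = vtu[i].members;
                  return true;
              }
          }
          return false;
      }

      /* May a frame tagged with vid, entering on port src, egress on port dst?
       * pvlan_map is src's port-based VLAN map (bit n: may egress to port n). */
      static bool model_may_egress(enum model_8021q_mode mode,
                                   const struct model_vtu_entry *vtu, int n,
                                   uint32_t pvlan_map, int src, int dst,
                                   uint16_t vid)
      {
          uint32_t members = 0;
          bool known = model_vtu_lookup(vtu, n, vid, &members);

          if (!(pvlan_map & (1u << dst)))
              return false;
          if (!known)
              return mode == MODEL_FALLBACK;   /* Fallback: port map only */
          if (mode == MODEL_SECURE && !(members & (1u << src)))
              return false;                    /* Secure: non-member source */
          return (members & (1u << dst)) != 0;
      }
      ```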
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: do not warn unsupported bridge ops · efd29b3d
      Vivien Didelot authored
      A DSA driver may not provide the port_join_bridge and port_leave_bridge
      functions, so don't warn in that case.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: do not support per-port FID · f02bdffc
      Vivien Didelot authored
      Since we configure a switch chip through a Linux bridge, and a bridge is
      implemented as a VLAN, there is no longer any need for a per-port FID.
      
      This patch removes it and simplifies the driver code, since we can now
      map all 4095 available FIDs directly to VLANs.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: bridges do not need an FID · ede8098d
      Vivien Didelot authored
      With 88E6352 and similar switch chips, each port has a map to restrict
      which output ports this input port can egress frames to.
      
      The current driver code implements hardware bridging using this feature,
      and assigns a bridge group the FID of its first member.
      
      Now that 802.1Q is fully implemented in this driver, a Linux bridge,
      which is a simple untagged VLAN, already gets its own FID.
      
      This patch gets rid of the per-bridge FID and makes the use of the
      port-based VLAN map feature explicit.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one() · 241b2719
      Sowmini Varadhan authored
      Consider the following "duelling SYN" sequence between two peers A and B:
              A                B
              SYN1     -->
                       <--     SYN2
              SYN2ACK  -->
      
      Note that the SYN/ACK has already been sent out by TCP before
      rds_tcp_accept_one() gets invoked as part of the callbacks.
      
      If inet_addr(A) is numerically less than inet_addr(B), the arbitration
      scheme in rds_tcp_accept_one() will prefer the TCP connection triggered
      by SYN1, and will send a CLOSE for SYN2 (just after the SYN2ACK was
      sent).
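      The tie-break above can be sketched as follows. The sketch assumes IPv4
      addresses compared in host byte order. struct model_conn and the model_*
      helpers are hypothetical, and c_outgoing stands in for conn->c_outgoing.
      Both peers apply the same rule, so both keep the SYN1 connection, and
      the side that accepts it becomes the passive end.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical per-peer connection state. */
      struct model_conn {
          uint32_t local_addr;
          uint32_t peer_addr;
          bool c_outgoing;   /* models conn->c_outgoing */
      };

      /* Address whose outgoing SYN survives: the numerically lower one. */
      static uint32_t model_winning_initiator(uint32_t a, uint32_t b)
      {
          return a < b ? a : b;
      }

      /* Called when this side accepts the peer's SYN. Returns true if the
       * accepted connection is kept. In that case this side becomes passive
       * and must not initiate reconnect attempts. */
      static bool model_accept_peer_syn(struct model_conn *c)
      {
          if (model_winning_initiator(c->local_addr, c->peer_addr) != c->peer_addr)
              return false;       /* our own outgoing SYN wins; CLOSE this one */
          c->c_outgoing = false;  /* passive side from now on */
          return true;
      }
      ```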
      
      Since B follows the same arbitration scheme, it will send the SYN-ACK
      for SYN1, which will set up a healthy ESTABLISHED connection on both
      sides. B will also get a CLOSE for SYN2, which should result in the
      cleanup of the TCP state machine for SYN2, but it should not trigger any
      stale RDS-TCP callbacks (such as ->writespace, ->state_change, etc.)
      that would disrupt the progress of the SYN2 based RDS-TCP connection.
      
      Thus the arbitration scheme in rds_tcp_accept_one() should restore
      rds_tcp callbacks for the winner before setting them up for the new
      accept socket. It should also make sure that conn->c_outgoing is set to
      0 so that we do not trigger any reconnect attempts on the passive side
      of the TCP socket in the future, in conformance with commit c82ac7e6
      ("net/rds: RDS-TCP: only initiate reconnect attempt on outgoing TCP
      socket.")
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>