1. 19 Jun, 2019 1 commit
  2. 18 Jun, 2019 4 commits
    • tun: wake up waitqueues after IFF_UP is set · 72b319dc
      Fei Li authored
      Currently, after tap0 is set link up, the tun code wakes up the tx/rx
      wait queues in tun_net_open() when .ndo_open() is called, but the
      IFF_UP flag has not been set yet. If a waiter already exists, it
      fails to transmit when tun_sendmsg() checks the IFF_UP flag.
      The subsequent vhost_poll_start() then adds the wq into the wqh until
      it is woken up again. This works if the IFF_UP flag has been set by the
      time tun_chr_poll() detects it, but not if IFF_UP is still unset at
      that time. Sadly, the latter case is a fatal error, as the wq will
      never be woken up again unless the link is later set up manually on
      purpose.
      
      Fix this by moving the wakeup into the NETDEV_UP event notification
      path, which makes sure IFF_UP has been set before all waiting queues
      are woken up.
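      The ordering problem can be illustrated with a toy model in Python.
      This is only a sketch, not the tun code: the ToyTun class, its fields,
      and the two helper functions are hypothetical names, and one parked
      waiter stands in for the real wait queues.

```python
class ToyTun:
    """Toy model of a device with one parked waiter (not kernel code)."""

    def __init__(self):
        self.iff_up = False        # stands in for the IFF_UP flag
        self.waiter_parked = True  # a waiter is already on the wait queue
        self.sent = 0

    def wake_waiters(self):
        # A woken waiter immediately tries to transmit.
        if not self.waiter_parked:
            return
        self.waiter_parked = False
        if self.iff_up:
            self.sent += 1
        else:
            # The transmit fails on the flag check; the waiter parks again
            # and needs another wakeup that may never come.
            self.waiter_parked = True


def wake_then_set_flag(dev):
    """Old ordering: wake in the open hook, set the flag afterwards."""
    dev.wake_waiters()
    dev.iff_up = True


def set_flag_then_wake(dev):
    """New ordering: set the flag, then wake from the 'up' notification."""
    dev.iff_up = True
    dev.wake_waiters()


old, new = ToyTun(), ToyTun()
wake_then_set_flag(old)
set_flag_then_wake(new)
print(old.sent, old.waiter_parked)
print(new.sent, new.waiter_parked)
```

      In the old ordering the waiter ends up parked again with nothing sent,
      while in the new ordering the single wakeup is enough.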
      Signed-off-by: Fei Li <lifei.shirley@bytedance.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: remove duplicate fetch in sock_getsockopt · d0bae4a0
      JingYi Hou authored
      In sock_getsockopt(), 'optlen' is first fetched from userspace and
      'len < 0' is checked. Then, in the 'SO_MEMINFO' case, 'optlen' is
      fetched from userspace a second time.
      
      If userspace changes it between the two fetches, this may cause
      security problems or unexpected behavior, and there is no reason to
      fetch it a second time.
      
      To fix this, remove the second fetch.
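      A minimal Python sketch of why a double fetch is risky. It is a toy
      model, not the kernel code: RacyUserInt, get_user(), and the two
      functions are hypothetical stand-ins, and the "userspace" value is
      simply scripted to change between reads.

```python
class RacyUserInt:
    """Toy 'userspace' integer whose value can change between fetches."""

    def __init__(self, values):
        self._values = list(values)
        self.fetches = 0

    def get_user(self):
        # Each fetch may observe a different value, as if another thread
        # rewrote the memory in between.
        index = min(self.fetches, len(self._values) - 1)
        self.fetches += 1
        return self._values[index]


def double_fetch(optlen):
    length = optlen.get_user()
    if length < 0:
        raise ValueError("negative length")
    # Second fetch: the check above no longer covers this value.
    return optlen.get_user()


def single_fetch(optlen):
    length = optlen.get_user()
    if length < 0:
        raise ValueError("negative length")
    return length  # reuse the value that was validated


print(double_fetch(RacyUserInt([16, -1])))
print(single_fetch(RacyUserInt([16, -1])))
```

      The double-fetch variant returns the unchecked second value, while the
      single-fetch variant only ever uses the value it validated.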
      Signed-off-by: JingYi Hou <houjingyi647@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix issues with early FAILOVER_MSG from peer · d0f84d08
      Tuong Lien authored
      It appears that a FAILOVER_MSG can come from the peer even while the
      failed link is still resetting (i.e. just after the
      'node_write_unlock()'...). This means the failover procedure on the
      node has not been started yet.
      The situation is as follows:
      
               node1                                node2
        linkb          linka                  linka        linkb
          |              |                      |            |
          |              |                      x failure    |
          |              |                  RESETTING        |
          |              |                      |            |
          |              x failure            RESET          |
          |          RESETTING             FAILINGOVER       |
          |              |   (FAILOVER_MSG)     |            |
          |<-------------------------------------------------|
          | *FAILINGOVER |                      |            |
          |              | (dummy FAILOVER_MSG) |            |
          |------------------------------------------------->|
          |            RESET                    |            | FAILOVER_END
          |         FAILINGOVER               RESET          |
          .              .                      .            .
          .              .                      .            .
          .              .                      .            .
      
      Once this happens, the link failover procedure is wrongly triggered
      on the receiving node, since the node is not in FAILINGOVER state,
      and then another link failover is carried out.
      The consequences are:
      
      1) A peer might get stuck in FAILINGOVER state because the 'sync_point'
      was set, reset and set again incorrectly; the criteria to end the
      failover would then not be met, and it could keep waiting for a message
      that has already been received.
      
      2) The early FAILOVER_MSG(s) could be queued in the link failover
      deferdq but would be purged or not pulled out because the 'drop_point'
      was not set correctly.
      
      3) The early FAILOVER_MSG(s) could be dropped too.
      
      4) The dummy FAILOVER_MSG could make the peer leave FAILINGOVER state
      briefly, only for the failover to be restarted later on.
      
      The same situation can also happen when the link is in PEER_RESET state
      and a FAILOVER_MSG arrives.
      
      The commit resolves the issues by forcing the link down immediately, so
      the failover procedure will be started normally (which is the same as
      when receiving a FAILOVER_MSG and the link is in up state).
      
      Also, the function "tipc_node_link_failover()" is toughened to prevent
      such a situation from happening.
      Acked-by: Jon Maloy <jon.maloy@ericsson.se>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2x: Check if transceiver implements DDM before access · cf18cecc
      Mauro S. M. Rodrigues authored
      Some transceivers may comply with SFF-8472 even though they do not
      implement the Digital Diagnostic Monitoring (DDM) interface described in
      the spec. The existence of this area is indicated by the 6th bit of
      byte 92, which is set to 1 if DDM is implemented.
      
      Currently, without checking this bit, bnx2x fails when trying to read
      the SFP module's EEPROM, with the following message:
      
      ethtool -m enP5p1s0f1
      Cannot get Module EEPROM data: Input/output error
      
      This happens because it fails to read the additional 256 bytes in
      which the DDM data is assumed to exist.
      
      This issue was noticed using a Mellanox Passive DAC PN 01FT738. The EEPROM
      data was confirmed by Mellanox as correct and similar to other Passive
      DACs from other manufacturers.
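      The check can be sketched in Python as follows. This is an
      illustration under stated assumptions, not the driver code: "the 6th
      bit" is taken to mean bit 6 (mask 0x40), the 256-byte base page size
      is assumed, and readable_eeprom_len() is a hypothetical helper.

```python
DIAG_TYPE_BYTE = 92       # byte 92 of the module EEPROM, as described above
DDM_IMPLEMENTED = 1 << 6  # assumption: "6th bit" means bit 6 (mask 0x40)

BASE_LEN = 256  # assumption: size of the base page that is always present
DDM_LEN = 256   # the "additional 256 bytes" of DDM data


def readable_eeprom_len(base_page: bytes) -> int:
    """Return how many EEPROM bytes are safe to read for this module."""
    if len(base_page) <= DIAG_TYPE_BYTE:
        raise ValueError("base page too short to contain byte 92")
    if base_page[DIAG_TYPE_BYTE] & DDM_IMPLEMENTED:
        return BASE_LEN + DDM_LEN
    return BASE_LEN


passive_dac = bytearray(BASE_LEN)          # byte 92 is 0: no DDM area
optical = bytearray(BASE_LEN)
optical[DIAG_TYPE_BYTE] = DDM_IMPLEMENTED  # DDM bit set

print(readable_eeprom_len(bytes(passive_dac)))
print(readable_eeprom_len(bytes(optical)))
```

      With the bit clear, only the base page is reported as readable, so a
      reader never touches the DDM area that the module does not provide.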
      Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
      Acked-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 17 Jun, 2019 15 commits
    • Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 29f785ff
      Linus Torvalds authored
      Pull vfs fixes from Al Viro:
       "MS_MOVE regression fix + breakage in fsmount(2) (also introduced in
        this cycle, along with fsmount(2) itself).
      
        I'm still digging through the piles of mail, so there might be more
        fixes to follow, but these two are obvious and self-contained, so
        there's no point delaying those..."
      
      * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fs/namespace: fix unprivileged mount propagation
        vfs: fsmount: add missing mntget()
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · da0f3820
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Lots of bug fixes here:
      
         1) Out of bounds access in __bpf_skc_lookup, from Lorenz Bauer.
      
         2) Fix rate reporting in cfg80211_calculate_bitrate_he(), from John
            Crispin.
      
         3) Use after free in psock backlog workqueue, from John Fastabend.
      
         4) Fix source port matching in fdb peer flow rule of mlx5, from Raed
            Salem.
      
         5) Use atomic_inc_not_zero() in fl6_sock_lookup(), from Eric Dumazet.
      
         6) Network header needs to be set for packet redirect in nfp, from
            John Hurley.
      
         7) Fix udp zerocopy refcnt, from Willem de Bruijn.
      
         8) Don't assume linear buffers in vxlan and geneve error handlers,
            from Stefano Brivio.
      
         9) Fix TOS matching in mlxsw, from Jiri Pirko.
      
        10) More SCTP cookie memory leak fixes, from Neil Horman.
      
        11) Fix VLAN filtering in rtl8366, from Linus Walleij.
      
        12) Various TCP SACK payload size and fragmentation memory limit fixes
            from Eric Dumazet.
      
        13) Use after free in pneigh_get_next(), also from Eric Dumazet.
      
        14) LAPB control block leak fix from Jeremy Sowden"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (145 commits)
        lapb: fixed leak of control-blocks.
        tipc: purge deferredq list for each grp member in tipc_group_delete
        ax25: fix inconsistent lock state in ax25_destroy_timer
        neigh: fix use-after-free read in pneigh_get_next
        tcp: fix compile error if !CONFIG_SYSCTL
        hv_sock: Suppress bogus "may be used uninitialized" warnings
        be2net: Fix number of Rx queues used for flow hashing
        net: handle 802.1P vlan 0 packets properly
        tcp: enforce tcp_min_snd_mss in tcp_mtu_probing()
        tcp: add tcp_min_snd_mss sysctl
        tcp: tcp_fragment() should apply sane memory limits
        tcp: limit payload size of sacked skbs
        Revert "net: phylink: set the autoneg state in phylink_phy_change"
        bpf: fix nested bpf tracepoints with per-cpu data
        bpf: Fix out of bounds memory access in bpf_sk_storage
        vsock/virtio: set SOCK_DONE on peer shutdown
        net: dsa: rtl8366: Fix up VLAN filtering
        net: phylink: set the autoneg state in phylink_phy_change
        net: add high_order_alloc_disable sysctl/static key
        tcp: add tcp_tx_skb_cache sysctl
        ...
    • fs/namespace: fix unprivileged mount propagation · d728cf79
      Christian Brauner authored
      When propagating mounts across mount namespaces owned by different user
      namespaces, it is no longer possible to move or umount the mount in the
      less privileged mount namespace.
      
      Here is a reproducer:
      
        sudo mount -t tmpfs tmpfs /mnt
        sudo mount --make-rshared /mnt
      
        # create unprivileged user + mount namespace and preserve propagation
        unshare -U -m --map-root --propagation=unchanged
      
        # now change back to the original mount namespace in another terminal:
        sudo mkdir /mnt/aaa
        sudo mount -t tmpfs tmpfs /mnt/aaa
      
        # now in the unprivileged user + mount namespace
        mount --move /mnt/aaa /opt
      
      Unfortunately, this is a pretty big deal for userspace since this is
      e.g. used to inject mounts into running unprivileged containers.
      So this regression really needs to go away rather quickly.
      
      The problem is that a recent change falsely locked the root of the newly
      added mounts by setting MNT_LOCKED. Fix this by only locking the mounts
      on copy_mnt_ns() and not when adding a new mount.
      
      Fixes: 3bd045cc ("separate copying and locking mount tree on cross-userns copies")
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Tested-by: Christian Brauner <christian@brauner.io>
      Acked-by: Christian Brauner <christian@brauner.io>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Christian Brauner <christian@brauner.io>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: fsmount: add missing mntget() · 1b0b9cc8
      Eric Biggers authored
      sys_fsmount() needs to take a reference to the new mount when adding it
      to the anonymous mount namespace.  Otherwise the filesystem can be
      unmounted while it's still in use, as found by syzkaller.
      Reported-by: Mark Rutland <mark.rutland@arm.com>
      Reported-by: syzbot+99de05d099a170867f22@syzkaller.appspotmail.com
      Reported-by: syzbot+7008b8b8ba7df475fdc8@syzkaller.appspotmail.com
      Fixes: 93766fbd ("vfs: syscall: Add fsmount() to create a mount for a superblock")
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Merge branch 'tcp-fixes' · 4fddbf8a
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      tcp: make sack processing more robust
      
      Jonathan Looney brought to our attention multiple problems
      in the TCP stack at the sender side.
      
      SACK processing can be abused by malicious peers to either
      cause overflows or increase memory usage.
      
      The first two patches fix the immediate problems.
      
      Since the malicious peers abuse senders by advertising a very
      small MSS in their SYN or SYNACK packet, the last two
      patches add a new sysctl so that admins can choose a higher
      limit for MSS clamping.
      ====================
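      The effect of a floor on the sending MSS can be shown with a little
      arithmetic. This Python sketch is illustrative only: clamp_mss() and
      segments_needed() are hypothetical helpers, the values 8 and 48 are
      made-up inputs, and header and option overhead is ignored.

```python
import math


def clamp_mss(advertised_mss: int, min_snd_mss: int) -> int:
    """Never send with a segment payload below the configured floor."""
    return max(advertised_mss, min_snd_mss)


def segments_needed(payload_bytes: int, mss: int) -> int:
    """Number of segments required to carry the payload at this MSS."""
    return math.ceil(payload_bytes / mss)


payload = 64 * 1024  # bytes queued by the sender
tiny_mss = 8         # hypothetical value advertised by a hostile peer
floor = 48           # hypothetical floor chosen by the admin

print(segments_needed(payload, tiny_mss))
print(segments_needed(payload, clamp_mss(tiny_mss, floor)))
```

      A higher floor means fewer, larger segments for the same payload,
      which bounds the per-segment bookkeeping a peer can force on the
      sender.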
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'riscv-for-v5.2/fixes-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · eb7c825b
      Linus Torvalds authored
      Pull RISC-V fixes from Paul Walmsley:
       "This contains fixes, defconfig, and DT data changes for the v5.2-rc
        series.
      
        The fixes are relatively straightforward:
      
         - Addition of a TLB fence in the vmalloc_fault path, so the CPU
           doesn't enter an infinite page fault loop
      
         - Readdition of the pm_power_off export, so device drivers that
           reassign it can now be built as modules
      
         - A udelay() fix for RV32, fixing a miscomputation of the delay time
      
         - Removal of deprecated smp_mb__*() barriers
      
        This also adds initial DT data infrastructure for arch/riscv, along
        with initial data for the SiFive FU540-C000 SoC and the corresponding
        HiFive Unleashed board.
      
        We also update the RV64 defconfig to include some core drivers for the
        FU540 in the build"
      
      * tag 'riscv-for-v5.2/fixes-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: remove unused barrier defines
        riscv: mm: synchronize MMU after pte change
        riscv: dts: add initial board data for the SiFive HiFive Unleashed
        riscv: dts: add initial support for the SiFive FU540-C000 SoC
        dt-bindings: riscv: convert cpu binding to json-schema
        dt-bindings: riscv: sifive: add YAML documentation for the SiFive FU540
        arch: riscv: add support for building DTB files from DT source data
        riscv: Fix udelay in RV32.
        riscv: export pm_power_off again
        RISC-V: defconfig: enable clocks, serial console
    • riscv: remove unused barrier defines · 259931fd
      Rolf Eike Beer authored
      They were introduced in commit fab957c1 ("RISC-V: Atomic and
      Locking Code") long after commit 2e39465a ("locking: Remove
      deprecated smp_mb__() barriers") removed the remnants of all previous
      instances from the tree.
      Signed-off-by: Rolf Eike Beer <eb@emlix.com>
      [paul.walmsley@sifive.com: stripped spurious mbox header from patch
       description; fixed commit references in patch header]
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
    • riscv: mm: synchronize MMU after pte change · bf587caa
      ShihPo Hung authored
      Because RISC-V compliant implementations can cache invalid entries
      in the TLB, an SFENCE.VMA is necessary after changes to the page table.
      This patch adds an SFENCE.VMA for the vmalloc_fault path.
      Signed-off-by: ShihPo Hung <shihpo.hung@sifive.com>
      [paul.walmsley@sifive.com: reversed tab->whitespace conversion,
       wrapped comment lines]
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: linux-riscv@lists.infradead.org
      Cc: stable@vger.kernel.org
    • riscv: dts: add initial board data for the SiFive HiFive Unleashed · c35f1b87
      Paul Walmsley authored
      Add initial board data for the SiFive HiFive Unleashed A00.
      
      Currently the data populated in this DT file describes the board
      DRAM configuration and the external clock sources that supply the
      PRCI.
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Paul Walmsley <paul@pwsan.com>
      Tested-by: Loys Ollivier <lollivier@baylibre.com>
      Tested-by: Kevin Hilman <khilman@baylibre.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Antony Pavlov <antonynpavlov@gmail.com>
      Cc: devicetree@vger.kernel.org
      Cc: linux-riscv@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
    • riscv: dts: add initial support for the SiFive FU540-C000 SoC · 72296bde
      Paul Walmsley authored
      Add initial support for the SiFive FU540-C000 SoC.  This is a 28nm SoC
      based around the SiFive U54-MC core complex and a TileLink
      interconnect.
      
      This file is expected to grow as more device drivers are added to the
      kernel.
      
      This patch includes a fix to the QSPI memory map due to a
      documentation bug, found by ShihPo Hung <shihpo.hung@sifive.com>, adds
      entries for the I2C controller, and merges all DT changes that
      formerly were made dynamically by the riscv-pk BBL proxy kernel.
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Paul Walmsley <paul@pwsan.com>
      Tested-by: Loys Ollivier <lollivier@baylibre.com>
      Tested-by: Kevin Hilman <khilman@baylibre.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: ShihPo Hung <shihpo.hung@sifive.com>
      Cc: devicetree@vger.kernel.org
      Cc: linux-riscv@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
    • dt-bindings: riscv: convert cpu binding to json-schema · 4fd669a8
      Paul Walmsley authored
      At Rob's request, we're starting to migrate our DT binding
      documentation to json-schema YAML format.  Start by converting our cpu
      binding documentation.  While doing so, document more properties and
      nodes.  This includes adding binding documentation support for the E51
      and U54 CPU cores ("harts") that are present on this SoC.  These cores
      are described in:
      
          https://static.dev.sifive.com/FU540-C000-v1.0.pdf
      
      This cpus.yaml file is intended to be a starting point and to
      evolve over time.  It passes dt-doc-validate as of the yaml-bindings
      commit 4c79d42e9216.
      
      This patch was originally based on the ARM json-schema binding
      documentation as added by commit 672951cb ("dt-bindings: arm: Convert
      cpu binding to json-schema").
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Paul Walmsley <paul@pwsan.com>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: devicetree@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-riscv@lists.infradead.org
    • dt-bindings: riscv: sifive: add YAML documentation for the SiFive FU540 · c7af5598
      Paul Walmsley authored
      Add YAML DT binding documentation for the SiFive FU540 SoC.  This
      SoC is documented at:
      
          https://static.dev.sifive.com/FU540-C000-v1.0.pdf
      
      Passes dt-doc-validate, as of yaml-bindings commit 4c79d42e9216.
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Paul Walmsley <paul@pwsan.com>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: devicetree@vger.kernel.org
      Cc: linux-riscv@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
    • arch: riscv: add support for building DTB files from DT source data · 8d4e048d
      Paul Walmsley authored
      Similar to ARM64, add support for building DTB files from DT source
      data for RISC-V boards.
      
      This patch starts with the infrastructure needed for SiFive boards.
      Boards from other vendors would add support here in a similar form.
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Paul Walmsley <paul@pwsan.com>
      Tested-by: Loys Ollivier <lollivier@baylibre.com>
      Tested-by: Kevin Hilman <khilman@baylibre.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
    • lapb: fixed leak of control-blocks. · 6be8e297
      Jeremy Sowden authored
      lapb_register calls lapb_create_cb, which initializes the control-
      block's ref-count to one, and __lapb_insert_cb, which increments it when
      adding the new block to the list of blocks.
      
      lapb_unregister calls __lapb_remove_cb, which decrements the ref-count
      when removing the control-block from the list of blocks, and calls
      lapb_put itself to decrement the ref-count before returning.
      
      However, lapb_unregister also calls __lapb_devtostruct to look up the
      right control-block for the given net_device, and __lapb_devtostruct
      also bumps the ref-count, which means that when lapb_unregister returns
      the ref-count is still 1 and the control-block is leaked.
      
      Call lapb_put after __lapb_devtostruct to fix the leak.
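      The reference counting described above can be traced with a small
      Python model. It only mirrors the counts given in this message; the
      class and function names are hypothetical and none of this is the
      lapb code itself.

```python
class ControlBlock:
    def __init__(self):
        self.refcnt = 1  # creation initializes the count to one


def register(table, dev):
    cb = ControlBlock()
    cb.refcnt += 1       # inserting into the list takes a reference
    table[dev] = cb
    return cb


def lookup(table, dev):
    cb = table[dev]
    cb.refcnt += 1       # the lookup also bumps the count
    return cb


def unregister(table, dev, put_after_lookup):
    cb = lookup(table, dev)
    if put_after_lookup:
        cb.refcnt -= 1   # the fix: drop the reference taken by the lookup
    del table[dev]
    cb.refcnt -= 1       # removal from the list drops its reference
    cb.refcnt -= 1       # final put before returning
    return cb.refcnt


def final_refcnt(put_after_lookup):
    table = {}
    register(table, "dev0")
    return unregister(table, "dev0", put_after_lookup)


print(final_refcnt(put_after_lookup=False))
print(final_refcnt(put_after_lookup=True))
```

      Without the extra put the count ends at 1, so the block is never
      freed; with it the count reaches 0.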
      
      Reported-by: syzbot+afb980676c836b4a0afa@syzkaller.appspotmail.com
      Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: purge deferredq list for each grp member in tipc_group_delete · 5cf02612
      Xin Long authored
      Syzbot reported a memleak caused by grp members' deferredq list not
      being purged when the grp is deleted.
      
      The issue occurs when more(msg_grp_bc_seqno(hdr), m->bc_rcv_nxt) is
      true in tipc_group_filter_msg(), in which case the skb stays in
      deferredq.
      
      So fix it by calling __skb_queue_purge for each member's deferredq
      in tipc_group_delete() when a tipc sk leaves the grp.
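      A toy Python model of the leak and the purge. The names below are
      hypothetical, a counter stands in for allocated skbs, and the sequence
      comparison is a plain ">" that ignores the wraparound handling a real
      sequence-number check would need.

```python
from collections import deque


class BufferPool:
    """Counts buffers that were allocated but not yet freed."""

    def __init__(self):
        self.live = 0

    def alloc(self):
        self.live += 1
        return object()

    def free(self, buf):
        self.live -= 1


class Member:
    def __init__(self):
        self.bc_rcv_nxt = 0
        self.deferredq = deque()


def receive(member, pool, seqno):
    buf = pool.alloc()
    if seqno > member.bc_rcv_nxt:
        member.deferredq.append(buf)  # ahead of what we expect: defer it
        return
    pool.free(buf)                    # in order: consumed and freed
    member.bc_rcv_nxt += 1


def delete_group(members, pool, purge):
    for member in members:
        if purge:
            # The fix: free everything still sitting in the deferred queue.
            while member.deferredq:
                pool.free(member.deferredq.popleft())
    members.clear()


def leaked_after_delete(purge):
    pool = BufferPool()
    members = [Member(), Member()]
    for member in members:
        receive(member, pool, seqno=0)  # delivered
        receive(member, pool, seqno=5)  # deferred
    delete_group(members, pool, purge)
    return pool.live


print(leaked_after_delete(purge=False))
print(leaked_after_delete(purge=True))
```

      Each member keeps one deferred buffer, so deleting the group without
      purging leaves two buffers live, and purging leaves none.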
      
      Fixes: b87a5ea3 ("tipc: guarantee group unicast doesn't bypass group broadcast")
      Reported-by: syzbot+78fbe679c8ca8d264a8d@syzkaller.appspotmail.com
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 16 Jun, 2019 16 commits
    • ax25: fix inconsistent lock state in ax25_destroy_timer · d4d5d8e8
      Eric Dumazet authored
      Before a thread in process context uses bh_lock_sock(),
      we must disable bh.
      
      syzbot reported:
      
      WARNING: inconsistent lock state
      5.2.0-rc3+ #32 Not tainted
      
      inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      blkid/26581 [HC0[0]:SC1[1]:HE1:SE0] takes:
      00000000e0da85ee (slock-AF_AX25){+.?.}, at: spin_lock include/linux/spinlock.h:338 [inline]
      00000000e0da85ee (slock-AF_AX25){+.?.}, at: ax25_destroy_timer+0x53/0xc0 net/ax25/af_ax25.c:275
      {SOFTIRQ-ON-W} state was registered at:
        lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4303
        __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
        _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
        spin_lock include/linux/spinlock.h:338 [inline]
        ax25_rt_autobind+0x3ca/0x720 net/ax25/ax25_route.c:429
        ax25_connect.cold+0x30/0xa4 net/ax25/af_ax25.c:1221
        __sys_connect+0x264/0x330 net/socket.c:1834
        __do_sys_connect net/socket.c:1845 [inline]
        __se_sys_connect net/socket.c:1842 [inline]
        __x64_sys_connect+0x73/0xb0 net/socket.c:1842
        do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      irq event stamp: 2272
      hardirqs last  enabled at (2272): [<ffffffff810065f3>] trace_hardirqs_on_thunk+0x1a/0x1c
      hardirqs last disabled at (2271): [<ffffffff8100660f>] trace_hardirqs_off_thunk+0x1a/0x1c
      softirqs last  enabled at (1522): [<ffffffff87400654>] __do_softirq+0x654/0x94c kernel/softirq.c:320
      softirqs last disabled at (2267): [<ffffffff81449010>] invoke_softirq kernel/softirq.c:374 [inline]
      softirqs last disabled at (2267): [<ffffffff81449010>] irq_exit+0x180/0x1d0 kernel/softirq.c:414
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(slock-AF_AX25);
        <Interrupt>
          lock(slock-AF_AX25);
      
       *** DEADLOCK ***
      
      1 lock held by blkid/26581:
       #0: 0000000010fd154d ((&ax25->dtimer)){+.-.}, at: lockdep_copy_map include/linux/lockdep.h:175 [inline]
       #0: 0000000010fd154d ((&ax25->dtimer)){+.-.}, at: call_timer_fn+0xe0/0x720 kernel/time/timer.c:1312
      
      stack backtrace:
      CPU: 1 PID: 26581 Comm: blkid Not tainted 5.2.0-rc3+ #32
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_usage_bug.cold+0x393/0x4a2 kernel/locking/lockdep.c:2935
       valid_state kernel/locking/lockdep.c:2948 [inline]
       mark_lock_irq kernel/locking/lockdep.c:3138 [inline]
       mark_lock+0xd46/0x1370 kernel/locking/lockdep.c:3513
       mark_irqflags kernel/locking/lockdep.c:3391 [inline]
       __lock_acquire+0x159f/0x5490 kernel/locking/lockdep.c:3745
       lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4303
       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
       _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
       spin_lock include/linux/spinlock.h:338 [inline]
       ax25_destroy_timer+0x53/0xc0 net/ax25/af_ax25.c:275
       call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
       expire_timers kernel/time/timer.c:1366 [inline]
       __run_timers kernel/time/timer.c:1685 [inline]
       __run_timers kernel/time/timer.c:1653 [inline]
       run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
       __do_softirq+0x25c/0x94c kernel/softirq.c:293
       invoke_softirq kernel/softirq.c:374 [inline]
       irq_exit+0x180/0x1d0 kernel/softirq.c:414
       exiting_irq arch/x86/include/asm/apic.h:536 [inline]
       smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
       apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
       </IRQ>
      RIP: 0033:0x7f858d5c3232
      Code: 8b 61 08 48 8b 84 24 d8 00 00 00 4c 89 44 24 28 48 8b ac 24 d0 00 00 00 4c 8b b4 24 e8 00 00 00 48 89 7c 24 68 48 89 4c 24 78 <48> 89 44 24 58 8b 84 24 e0 00 00 00 89 84 24 84 00 00 00 8b 84 24
      RSP: 002b:00007ffcaf0cf5c0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
      RAX: 00007f858d7d27a8 RBX: 00007f858d7d8820 RCX: 00007f858d3940d8
      RDX: 00007ffcaf0cf798 RSI: 00000000f5e616f3 RDI: 00007f858d394fee
      RBP: 0000000000000000 R08: 00007ffcaf0cf780 R09: 00007f858d7db480
      R10: 0000000000000000 R11: 0000000009691a75 R12: 0000000000000005
      R13: 00000000f5e616f3 R14: 0000000000000000 R15: 00007ffcaf0cf798
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neigh: fix use-after-free read in pneigh_get_next · f3e92cb8
      Eric Dumazet authored
      Nine years ago, I added RCU handling to neighbours, not pneighbours.
      (pneigh are not commonly used)
      
      Unfortunately, I missed that /proc dump operations would use a
      common entry and exit point: neigh_seq_start() and neigh_seq_stop()
      
      We need to read_lock(tbl->lock) or risk use-after-free while
      iterating the pneigh structures.
      
      We might later convert pneigh to RCU and revert this patch.
      
      syzbot reported:
      
      BUG: KASAN: use-after-free in pneigh_get_next.isra.0+0x24b/0x280 net/core/neighbour.c:3158
      Read of size 8 at addr ffff888097f2a700 by task syz-executor.0/9825
      
      CPU: 1 PID: 9825 Comm: syz-executor.0 Not tainted 5.2.0-rc4+ #32
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
       __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
       kasan_report+0x12/0x20 mm/kasan/common.c:614
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
       pneigh_get_next.isra.0+0x24b/0x280 net/core/neighbour.c:3158
       neigh_seq_next+0xdb/0x210 net/core/neighbour.c:3240
       seq_read+0x9cf/0x1110 fs/seq_file.c:258
       proc_reg_read+0x1fc/0x2c0 fs/proc/inode.c:221
       do_loop_readv_writev fs/read_write.c:714 [inline]
       do_loop_readv_writev fs/read_write.c:701 [inline]
       do_iter_read+0x4a4/0x660 fs/read_write.c:935
       vfs_readv+0xf0/0x160 fs/read_write.c:997
       kernel_readv fs/splice.c:359 [inline]
       default_file_splice_read+0x475/0x890 fs/splice.c:414
       do_splice_to+0x127/0x180 fs/splice.c:877
       splice_direct_to_actor+0x2d2/0x970 fs/splice.c:954
       do_splice_direct+0x1da/0x2a0 fs/splice.c:1063
       do_sendfile+0x597/0xd00 fs/read_write.c:1464
       __do_sys_sendfile64 fs/read_write.c:1525 [inline]
       __se_sys_sendfile64 fs/read_write.c:1511 [inline]
       __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511
       do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4592c9
      Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f4aab51dc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
      RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00000000004592c9
      RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000005
      RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000080000000 R11: 0000000000000246 R12: 00007f4aab51e6d4
      R13: 00000000004c689d R14: 00000000004db828 R15: 00000000ffffffff
      
      Allocated by task 9827:
       save_stack+0x23/0x90 mm/kasan/common.c:71
       set_track mm/kasan/common.c:79 [inline]
       __kasan_kmalloc mm/kasan/common.c:489 [inline]
       __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
       kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503
       __do_kmalloc mm/slab.c:3660 [inline]
       __kmalloc+0x15c/0x740 mm/slab.c:3669
       kmalloc include/linux/slab.h:552 [inline]
       pneigh_lookup+0x19c/0x4a0 net/core/neighbour.c:731
       arp_req_set_public net/ipv4/arp.c:1010 [inline]
       arp_req_set+0x613/0x720 net/ipv4/arp.c:1026
       arp_ioctl+0x652/0x7f0 net/ipv4/arp.c:1226
       inet_ioctl+0x2a0/0x340 net/ipv4/af_inet.c:926
       sock_do_ioctl+0xd8/0x2f0 net/socket.c:1043
       sock_ioctl+0x3ed/0x780 net/socket.c:1194
       vfs_ioctl fs/ioctl.c:46 [inline]
       file_ioctl fs/ioctl.c:509 [inline]
       do_vfs_ioctl+0xd5f/0x1380 fs/ioctl.c:696
       ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
       __do_sys_ioctl fs/ioctl.c:720 [inline]
       __se_sys_ioctl fs/ioctl.c:718 [inline]
       __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
       do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 9824:
       save_stack+0x23/0x90 mm/kasan/common.c:71
       set_track mm/kasan/common.c:79 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
       kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
       __cache_free mm/slab.c:3432 [inline]
       kfree+0xcf/0x220 mm/slab.c:3755
       pneigh_ifdown_and_unlock net/core/neighbour.c:812 [inline]
       __neigh_ifdown+0x236/0x2f0 net/core/neighbour.c:356
       neigh_ifdown+0x20/0x30 net/core/neighbour.c:372
       arp_ifdown+0x1d/0x21 net/ipv4/arp.c:1274
       inetdev_destroy net/ipv4/devinet.c:319 [inline]
       inetdev_event+0xa14/0x11f0 net/ipv4/devinet.c:1544
       notifier_call_chain+0xc2/0x230 kernel/notifier.c:95
       __raw_notifier_call_chain kernel/notifier.c:396 [inline]
       raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:403
       call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1749
       call_netdevice_notifiers_extack net/core/dev.c:1761 [inline]
       call_netdevice_notifiers net/core/dev.c:1775 [inline]
       rollback_registered_many+0x9b9/0xfc0 net/core/dev.c:8178
       rollback_registered+0x109/0x1d0 net/core/dev.c:8220
       unregister_netdevice_queue net/core/dev.c:9267 [inline]
       unregister_netdevice_queue+0x1ee/0x2c0 net/core/dev.c:9260
       unregister_netdevice include/linux/netdevice.h:2631 [inline]
       __tun_detach+0xd8a/0x1040 drivers/net/tun.c:724
       tun_detach drivers/net/tun.c:741 [inline]
       tun_chr_close+0xe0/0x180 drivers/net/tun.c:3451
       __fput+0x2ff/0x890 fs/file_table.c:280
       ____fput+0x16/0x20 fs/file_table.c:313
       task_work_run+0x145/0x1c0 kernel/task_work.c:113
       tracehook_notify_resume include/linux/tracehook.h:185 [inline]
       exit_to_usermode_loop+0x273/0x2c0 arch/x86/entry/common.c:168
       prepare_exit_to_usermode arch/x86/entry/common.c:199 [inline]
       syscall_return_slowpath arch/x86/entry/common.c:279 [inline]
       do_syscall_64+0x58e/0x680 arch/x86/entry/common.c:304
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff888097f2a700
       which belongs to the cache kmalloc-64 of size 64
      The buggy address is located 0 bytes inside of
       64-byte region [ffff888097f2a700, ffff888097f2a740)
      The buggy address belongs to the page:
      page:ffffea00025fca80 refcount:1 mapcount:0 mapping:ffff8880aa400340 index:0x0
      flags: 0x1fffc0000000200(slab)
      raw: 01fffc0000000200 ffffea000250d548 ffffea00025726c8 ffff8880aa400340
      raw: 0000000000000000 ffff888097f2a000 0000000100000020 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff888097f2a600: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
       ffff888097f2a680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      >ffff888097f2a700: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                         ^
       ffff888097f2a780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
       ffff888097f2a800: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      
      Fixes: 767e97e1 ("neigh: RCU conversion of struct neighbour")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f3e92cb8
    • Eric Dumazet's avatar
      tcp: fix compile error if !CONFIG_SYSCTL · 2e05fcae
      Eric Dumazet authored
      tcp_tx_skb_cache_key and tcp_rx_skb_cache_key must be available
      even if CONFIG_SYSCTL is not set.
      
      Fixes: 0b7d7f6b ("tcp: add tcp_tx_skb_cache sysctl")
      Fixes: ede61ca4 ("tcp: add tcp_rx_skb_cache sysctl")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2e05fcae
    • Dexuan Cui's avatar
      hv_sock: Suppress bogus "may be used uninitialized" warnings · d424a2af
      Dexuan Cui authored
      gcc 8.2.0 may report these bogus warnings under some condition:
      
      warning: ‘vnew’ may be used uninitialized in this function
      warning: ‘hvs_new’ may be used uninitialized in this function
      
      Actually, the 2 pointers are only initialized and used if the variable
      "conn_from_host" is true. The code is not buggy here.
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d424a2af
    • Ivan Vecera's avatar
      be2net: Fix number of Rx queues used for flow hashing · 718f4a25
      Ivan Vecera authored
      Number of Rx queues used for flow hashing returned by the driver is
      incorrect and this bug prevents the user from using the last Rx queue
      in the indirection table.
      
      Let's say we have a NIC with 6 combined queues:
      
      [root@sm-03 ~]# ethtool -l enp4s0f0
      Channel parameters for enp4s0f0:
      Pre-set maximums:
      RX:             5
      TX:             5
      Other:          0
      Combined:       6
      Current hardware settings:
      RX:             0
      TX:             0
      Other:          0
      Combined:       6
      
      Default indirection table maps all (6) queues equally but the driver
      reports only 5 rings available.
      
      [root@sm-03 ~]# ethtool -x enp4s0f0
      RX flow hash indirection table for enp4s0f0 with 5 RX ring(s):
          0:      0     1     2     3     4     5     0     1
          8:      2     3     4     5     0     1     2     3
         16:      4     5     0     1     2     3     4     5
         24:      0     1     2     3     4     5     0     1
      ...
      
      Now change indirection table somehow:
      
      [root@sm-03 ~]# ethtool -X enp4s0f0 weight 1 1
      [root@sm-03 ~]# ethtool -x enp4s0f0
      RX flow hash indirection table for enp4s0f0 with 6 RX ring(s):
          0:      0     0     0     0     0     0     0     0
      ...
         64:      1     1     1     1     1     1     1     1
      ...
      
      Now it is not possible to change mapping back to equal (default) state:
      
      [root@sm-03 ~]# ethtool -X enp4s0f0 equal 6
      Cannot set RX flow hash configuration: Invalid argument
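
The rejection above can be reproduced on paper. This is a minimal sketch, assuming the requested table is validated entry by entry against the ring count the driver reports. The helper names `equal_table` and `validate_table` and the table size of 128 are hypothetical, not ethtool or kernel identifiers:

```python
def equal_table(table_size, queues):
    """Spread table slots round-robin over `queues` queues (what `equal N` requests)."""
    return [slot % queues for slot in range(table_size)]


def validate_table(table, reported_rings):
    """Accept the table only if every entry names a ring below the reported count."""
    return all(entry < reported_rings for entry in table)


table = equal_table(128, 6)        # 128 slots is an assumed table size
print(validate_table(table, 5))    # driver under-reports 5 rings: table is rejected
print(validate_table(table, 6))    # driver reports all 6 rings: table is accepted
```

Under that assumption, any table that references queue 5 fails while the driver reports only 5 rings, which would match the "Invalid argument" error shown above.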
      
      Fixes: 594ad54a ("be2net: Add support for setting and getting rx flow hash options")
      Reported-by: Tianhao <tizhao@redhat.com>
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      718f4a25
    • Govindarajulu Varadarajan's avatar
      net: handle 802.1P vlan 0 packets properly · 36b2f61a
      Govindarajulu Varadarajan authored
      When the stack receives a packet [802.1P vlan 0][802.1AD vlan 100][IPv4],
      vlan_do_receive() returns false if it does not find a vlan_dev. Later
      __netif_receive_skb_core() fails to find a packet type handler for
      skb->protocol 802.1AD and drops the packet.
      
      An 802.1P header with vlan id 0 should be handled as an untagged packet.
      This patch fixes it by checking whether vlan_id is 0 and, if so,
      processing the next vlan header.
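
The idea of treating a VLAN id 0 header as untagged can be sketched outside the kernel. This is an illustration only, not the patch itself: the function name `skip_priority_tags` is hypothetical, and the sketch strips any number of leading id 0 tags, whereas the message only describes moving on to the next header.

```python
import struct

TPID_8021Q = 0x8100    # 802.1Q / 802.1P tag
TPID_8021AD = 0x88A8   # 802.1AD service tag
VLAN_TPIDS = (TPID_8021Q, TPID_8021AD)


def skip_priority_tags(payload):
    """Strip leading VLAN headers whose VLAN id is 0.

    `payload` starts at the EtherType/TPID field, right after the two MAC
    addresses. Returns (next_protocol, remaining_bytes).
    """
    while len(payload) >= 4:
        tpid, tci = struct.unpack("!HH", payload[:4])
        if tpid not in VLAN_TPIDS or (tci & 0x0FFF) != 0:
            break
        payload = payload[4:]  # VLAN id 0 carries priority only: treat as untagged
    (proto,) = struct.unpack("!H", payload[:2])
    return proto, payload


# [802.1P vlan 0][802.1AD vlan 100][IPv4], as in the message above
frame = (struct.pack("!HH", TPID_8021Q, (5 << 13) | 0)
         + struct.pack("!HH", TPID_8021AD, 100)
         + struct.pack("!H", 0x0800) + b"ip-payload")
proto, rest = skip_priority_tags(frame)
print(hex(proto), struct.unpack("!HH", rest[:4])[1] & 0x0FFF)
```

After the id 0 tag is skipped, the next header seen is the 802.1AD tag for vlan 100, which can then be matched against a vlan device instead of being dropped.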
      Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      36b2f61a
    • Linus Torvalds's avatar
      Linux 5.2-rc5 · 9e0babf2
      Linus Torvalds authored
      9e0babf2
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 963172d9
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "The accumulated fixes from this and last week:
      
         - Fix vmalloc TLB flush and map range calculations which lead to
           stale TLBs, spurious faults and other hard to diagnose issues.
      
         - Use fault_in_pages_writeable() for prefaulting the user stack in the
           FPU code as it's less fragile than the current solution
      
         - Use the PF_KTHREAD flag when checking for a kernel thread instead
           of current->mm as the latter can give the wrong answer due to
           use_mm()
      
         - Compute the vmemmap size correctly for KASLR and 5-Level paging.
           Otherwise this can end up with a way too small vmemmap area.
      
         - Make KASAN and 5-level paging work again by making sure that all
           invalid bits are masked out when computing the P4D offset. This
           worked before but got broken recently when the LDT remap area was
           moved.
      
         - Prevent a NULL pointer dereference in the resource control code
           which can be triggered with certain mount options when the
           requested resource is not available.
      
         - Enforce ordering of microcode loading vs. perf initialization on
           secondary CPUs. Otherwise perf tries to access a non-existing MSR
           as the boot CPU marked it as available.
      
         - Don't stop the resource control group walk early otherwise the
           control bitmaps are not updated correctly and become inconsistent.
      
         - Unbreak kgdb by returning 0 on success from
           kgdb_arch_set_breakpoint() instead of an error code.
      
         - Add more Icelake CPU model defines so depending changes can be
           queued in other trees"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callback
        x86/kasan: Fix boot with 5-level paging and KASAN
        x86/fpu: Don't use current->mm to check for a kthread
        x86/kgdb: Return 0 from kgdb_arch_set_breakpoint()
        x86/resctrl: Prevent NULL pointer dereference when local MBM is disabled
        x86/resctrl: Don't stop walking closids when a locksetup group is found
        x86/fpu: Update kernel's FPU state before using for the fsave header
        x86/mm/KASLR: Compute the size of the vmemmap section properly
        x86/fpu: Use fault_in_pages_writeable() for pre-faulting
        x86/CPU: Add more Icelake model numbers
        mm/vmalloc: Avoid rare case of flushing TLB with weird arguments
        mm/vmalloc: Fix calculation of direct map addr range
      963172d9
    • Linus Torvalds's avatar
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · efba92d5
      Linus Torvalds authored
      Pull timer fixes from Thomas Gleixner:
       "A set of small fixes:
      
         - Repair the ktime_get_coarse() functions so they actually deliver
           what they are supposed to: tick granular time stamps. The current
           code failed to add the accumulated nanoseconds part of the
           timekeeper so the resulting granularity was 1 second.
      
         - Prevent the tracer from infinitely recursing into time getter
           functions in the ARM architected timer by marking these functions
           notrace
      
         - Fix a trivial compiler warning caused by wrong qualifier ordering"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        timekeeping: Repair ktime_get_coarse*() granularity
        clocksource/drivers/arm_arch_timer: Don't trace count reader functions
        clocksource/drivers/timer-ti-dm: Change to new style declaration
      efba92d5
    • Linus Torvalds's avatar
      Merge branch 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f763cf8e
      Linus Torvalds authored
      Pull RAS fixes from Thomas Gleixner:
       "Two small fixes for RAS:
      
         - Use a proper search algorithm to find the correct element in the
           CEC array. The replacement was a better choice than fixing the
           crash caused by the original search function with horrible duct
           tape.
      
         - Move the timer based decay function into thread context so it can
           actually acquire the mutex which protects the CEC array to prevent
           corruption"
      
      * 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        RAS/CEC: Convert the timer callback to a workqueue
        RAS/CEC: Fix binary search function
      f763cf8e
    • Eric Dumazet's avatar
      tcp: enforce tcp_min_snd_mss in tcp_mtu_probing() · 967c05ae
      Eric Dumazet authored
      If mtu probing is enabled tcp_mtu_probing() could very well end up
      with a too small MSS.
      
      Use the new sysctl tcp_min_snd_mss to make sure MSS search
      is performed in an acceptable range.
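
The effect of a floor on the MSS search can be sketched as follows. This assumes, for illustration only, that each probe failure halves the current MSS; the function names are hypothetical and this is not the kernel's tcp_mtu_probing() code.

```python
def next_probe_mss(current_mss, min_snd_mss):
    """Halve the MSS after a probe failure, but never go below the floor."""
    return max(current_mss // 2, min_snd_mss)


def probe_sequence(start_mss, min_snd_mss, failures):
    """Return the MSS values tried over `failures` consecutive probe failures."""
    mss, seen = start_mss, []
    for _ in range(failures):
        mss = next_probe_mss(mss, min_snd_mss)
        seen.append(mss)
    return seen


print(probe_sequence(1024, 48, 6))    # bottoms out at the default floor of 48
print(probe_sequence(1024, 256, 6))   # a larger floor stops the search earlier
```

With a larger floor the search range stays bounded, so repeated failures cannot drive the MSS down to a value with almost no payload.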
      
      CVE-2019-11479 -- tcp mss hardcoded to 48
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Cc: Jonathan Looney <jtl@netflix.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Tyler Hicks <tyhicks@canonical.com>
      Cc: Bruce Curtis <brucec@netflix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      967c05ae
    • Eric Dumazet's avatar
      tcp: add tcp_min_snd_mss sysctl · 5f3e2bf0
      Eric Dumazet authored
      Some TCP peers announce a very small MSS option in their SYN and/or
      SYN/ACK messages.
      
      This forces the stack to send packets with a very high network/cpu
      overhead.
      
      Linux has enforced a minimal value of 48. Since this value includes
      the size of TCP options, and that the options can consume up to 40
      bytes, this means that each segment can include only 8 bytes of payload.
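
The overhead can be made concrete with a short calculation. The 48-byte minimum and the 40 bytes of options come from the text above; the 20-byte IPv4 and 20-byte TCP base headers are assumptions about the smallest common case (no IP options, IPv4 rather than IPv6).

```python
MAX_TCP_OPTIONS = 40   # bytes, from the text above
IPV4_HEADER = 20       # assumed: IPv4 header without options
TCP_HEADER = 20        # assumed: TCP header without options


def payload_per_segment(mss, options=MAX_TCP_OPTIONS):
    """Payload bytes left in a segment once TCP options are subtracted from the MSS."""
    return mss - options


def header_bytes_per_payload_byte(mss, options=MAX_TCP_OPTIONS):
    """Header bytes sent on the wire for each byte of payload."""
    headers = IPV4_HEADER + TCP_HEADER + options
    return headers / payload_per_segment(mss, options)


print(payload_per_segment(48))
print(header_bytes_per_payload_byte(48))
```

Under these assumptions an MSS of 48 leaves 8 payload bytes behind 80 header bytes per packet, which is the overhead a larger minimum is meant to avoid.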
      
      In some cases, it can be useful to increase the minimal value
      to a saner value.
      
      We still leave the default at 48 (TCP_MIN_SND_MSS), for compatibility
      reasons.
      
      Note that TCP_MAXSEG socket option enforces a minimal value
      of (TCP_MIN_MSS). David Miller increased this minimal value
      in commit c39508d6 ("tcp: Make TCP_MAXSEG minimum more correct.")
      from 64 to 88.
      
      We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS.
      
      CVE-2019-11479 -- tcp mss hardcoded to 48
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Suggested-by: Jonathan Looney <jtl@netflix.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Tyler Hicks <tyhicks@canonical.com>
      Cc: Bruce Curtis <brucec@netflix.com>
      Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5f3e2bf0
    • Eric Dumazet's avatar
      tcp: tcp_fragment() should apply sane memory limits · f070ef2a
      Eric Dumazet authored
      Jonathan Looney reported that a malicious peer can force a sender
      to fragment its retransmit queue into tiny skbs, inflating memory
      usage and/or overflowing 32bit counters.
      
      TCP allows an application to queue up to sk_sndbuf bytes,
      so we need to give some allowance for non malicious splitting
      of retransmit queue.
      
      A new SNMP counter is added to monitor how many times TCP
      refused to split an skb because the allowance was exceeded.
      
      Note that this counter might increase in the case applications
      use SO_SNDBUF socket option to lower sk_sndbuf.
      
      CVE-2019-11478 : tcp_fragment, prevent fragmenting a packet when the
      	socket is already using more than half the allowed space
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Jonathan Looney <jtl@netflix.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
      Cc: Bruce Curtis <brucec@netflix.com>
      Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f070ef2a
    • Eric Dumazet's avatar
      tcp: limit payload size of sacked skbs · 3b4929f6
      Eric Dumazet authored
      Jonathan Looney reported that TCP can trigger the following crash
      in tcp_shifted_skb() :
      
      	BUG_ON(tcp_skb_pcount(skb) < pcount);
      
      This can happen if the remote peer has advertised the smallest
      MSS that linux TCP accepts : 48
      
      An skb can hold 17 fragments, and each fragment can hold 32KB
      on x86, or 64KB on PowerPC.
      
      This means that the 16bit width of TCP_SKB_CB(skb)->tcp_gso_segs
      can overflow.
      
      Note that tcp_sendmsg() builds skbs with less than 64KB
      of payload, so this problem needs SACK to be enabled.
      SACK blocks allow TCP to coalesce multiple skbs in the retransmit
      queue, thus filling the 17 fragments to maximal capacity.
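
A worked version of the arithmetic, as a sketch: the fragment count, fragment sizes and the MSS of 48 come from the text above. The 8 bytes of payload per segment is an assumption (48 minus 40 bytes of TCP options); with the full 48 bytes as payload the segment count on x86 would stay below the 16-bit limit.

```python
FRAGS_PER_SKB = 17
FRAG_BYTES_X86 = 32 * 1024
FRAG_BYTES_POWERPC = 64 * 1024
PAYLOAD_PER_SEGMENT = 48 - 40   # assumed: MSS 48 minus 40 bytes of TCP options
U16_MAX = 0xFFFF


def segments_in_full_skb(frag_bytes, payload=PAYLOAD_PER_SEGMENT):
    """Number of segments needed to cover an skb with all fragments full."""
    total = FRAGS_PER_SKB * frag_bytes
    return -(-total // payload)   # ceiling division


for frag_bytes in (FRAG_BYTES_X86, FRAG_BYTES_POWERPC):
    segs = segments_in_full_skb(frag_bytes)
    # The masked value is what a 16-bit field would hold after wrapping.
    print(segs, segs > U16_MAX, segs & U16_MAX)
```

On x86 this gives 69632 segments, more than the 65535 a 16-bit counter can represent, so the stored count wraps to a much smaller number than the real one.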
      
      CVE-2019-11477 -- u16 overflow of TCP_SKB_CB(skb)->tcp_gso_segs
      
      Fixes: 832d11c5 ("tcp: Try to restore large SKBs while SACK processing")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Jonathan Looney <jtl@netflix.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Bruce Curtis <brucec@netflix.com>
      Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3b4929f6
    • David S. Miller's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 1eb4169c
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2019-06-15
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) fix stack layout of JITed x64 bpf code, from Alexei.
      
      2) fix out of bounds memory access in bpf_sk_storage, from Arthur.
      
      3) fix lpm trie walk, from Jonathan.
      
      4) fix nested bpf_perf_event_output, from Matt.
      
      5) and several other fixes.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1eb4169c
    • David S. Miller's avatar
      Revert "net: phylink: set the autoneg state in phylink_phy_change" · 5db2e7c7
      David S. Miller authored
      This reverts commit ef7bfa84.
      
      Russell King expressed some strong opposition to this
      change, explaining that this is trying to make phylink
      behave outside of how it has been designed.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5db2e7c7
  5. 15 Jun, 2019 4 commits
    • Matt Mullins's avatar
      bpf: fix nested bpf tracepoints with per-cpu data · 9594dc3c
      Matt Mullins authored
      BPF_PROG_TYPE_RAW_TRACEPOINTs can be executed nested on the same CPU, as
      they do not increment bpf_prog_active while executing.
      
      This enables three levels of nesting, to support
        - a kprobe or raw tp or perf event,
        - another one of the above that irq context happens to call, and
        - another one in nmi context
      (at most one of which may be a kprobe or perf event).
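
One way to picture the three-level limit is a per-CPU nesting counter that selects one of three scratch buffers. This is a sketch under that assumption; the class and method names are hypothetical and the message does not say how the kernel lays out the data.

```python
MAX_NEST = 3  # three levels of nesting, from the text above


class PerCpuScratch:
    """Scratch buffers for one CPU, one per nesting level (hypothetical layout)."""

    def __init__(self):
        self.level = 0
        self.buffers = [dict() for _ in range(MAX_NEST)]

    def acquire(self):
        """Return the buffer for the current level, or None if nested too deep."""
        if self.level >= MAX_NEST:
            return None  # refuse rather than share a buffer with an outer program
        buf = self.buffers[self.level]
        self.level += 1
        return buf

    def release(self):
        """Leave the innermost level so its buffer can be reused."""
        self.level -= 1


cpu0 = PerCpuScratch()
outer, irq, nmi = cpu0.acquire(), cpu0.acquire(), cpu0.acquire()
print(outer is not irq, irq is not nmi, cpu0.acquire() is None)
```

Each nested program gets its own buffer, so an inner program cannot overwrite data that an interrupted outer program is still using.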
      
      Fixes: 20b9d7ac ("bpf: avoid excessive stack usage for perf_sample_data")
      Signed-off-by: Matt Mullins <mmullins@fb.com>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      9594dc3c
    • Arthur Fabre's avatar
      bpf: Fix out of bounds memory access in bpf_sk_storage · 85749218
      Arthur Fabre authored
      bpf_sk_storage maps use multiple spin locks to reduce contention.
      The number of locks to use is determined by the number of possible CPUs.
      With only 1 possible CPU, bucket_log == 0, and 2^0 = 1 locks are used.
      
      When updating elements, the correct lock is determined with hash_ptr().
      Calling hash_ptr() with 0 bits is undefined behavior, as it does:
      
      x >> (64 - bits)
      
      Using the value results in an out of bounds memory access.
      In my case, this manifested itself as a page fault when raw_spin_lock_bh()
      is called later, when running the self tests:
      
      ./tools/testing/selftests/bpf/test_verifier 773 775
      [   16.366342] BUG: unable to handle page fault for address: ffff8fe7a66f93f8
      
      Force the minimum number of locks to two.
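
Why a 0-bit hash escapes the lock array can be sketched with an emulation. The multiplier is assumed to be the kernel's 64-bit golden-ratio constant. The shift helper emulates hardware that honors only the low 6 bits of the shift count, which is an assumption about x86; in C a 64-bit shift by 64 is simply undefined. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1
GOLDEN_RATIO_64 = 0x61C8864680B583EB  # assumed value of the kernel's multiplier


def shr64_masked(value, count):
    """64-bit right shift where only the low 6 bits of the count are used."""
    return (value & MASK64) >> (count & 63)


def bucket_index(ptr, bits):
    """Multiplicative hash of a pointer, keeping the top `bits` bits."""
    product = (ptr * GOLDEN_RATIO_64) & MASK64
    return shr64_masked(product, 64 - bits)


ptr = 0xFFFF888097F2A700            # example pointer value
print(bucket_index(ptr, 1) < 2)     # 1 bit: index fits an array of 2 locks
print(bucket_index(ptr, 0) < 1)     # 0 bits: index does not fit an array of 1 lock
```

With 0 bits the shift count becomes 64, the emulated shift does nothing, and the whole 64-bit product is used as an index. Forcing at least two locks keeps the shift count at 63 or below.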
      Signed-off-by: Arthur Fabre <afabre@cloudflare.com>
      Fixes: 6ac99e8f ("bpf: Introduce bpf sk local storage")
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      85749218
    • Stephen Barber's avatar
      vsock/virtio: set SOCK_DONE on peer shutdown · 42f5cda5
      Stephen Barber authored
      Set the SOCK_DONE flag to match the TCP_CLOSING state when a peer has
      shut down and there is nothing left to read.
      
      This fixes the following bug:
      1) Peer sends SHUTDOWN(RDWR).
      2) Socket enters TCP_CLOSING but SOCK_DONE is not set.
      3) read() returns -ENOTCONN until close() is called, then returns 0.
      Signed-off-by: Stephen Barber <smbarber@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      42f5cda5
    • Linus Walleij's avatar
      net: dsa: rtl8366: Fix up VLAN filtering · 760c80b7
      Linus Walleij authored
      We get this regression when using RTL8366RB as part of a bridge
      with OpenWrt:
      
      WARNING: CPU: 0 PID: 1347 at net/switchdev/switchdev.c:291
      	 switchdev_port_attr_set_now+0x80/0xa4
      lan0: Commit of attribute (id=7) failed.
      (...)
      realtek-smi switch lan0: failed to initialize vlan filtering on this port
      
      This is because it is trying to disable VLAN filtering
      on VLAN0, as we forgot to add 1 to the port number
      to get the right VLAN in rtl8366_vlan_filtering(): when
      we initialize the VLAN we associate VLAN1 with port 0,
      VLAN2 with port 1 etc, so we need to add 1 to the port
      offset.
      
      Fixes: d8652956 ("net: dsa: realtek-smi: Add Realtek SMI driver")
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      760c80b7