1. 26 May, 2017 22 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 6741d516
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix state pruning in bpf verifier wrt. alignment, from Daniel
          Borkmann.
      
       2) Handle non-linear SKBs properly in SCTP ICMP parsing, from Davide
          Caratti.
      
       3) Fix bit field definitions for rss_hash_type of descriptors in mlx5
          driver, from Jesper Brouer.
      
       4) Defer slave->link updates until bonding is ready to do a full commit
          to the new settings, from Nithin Sujir.
      
       5) Properly reference count ipv4 FIB metrics to avoid use after free
          situations, from Eric Dumazet and several others including Cong Wang
          and Julian Anastasov.
      
       6) Fix races in llc_ui_bind(), from Lin Zhang.
      
       7) Fix regression of ESP UDP encapsulation for TCP packets, from
          Steffen Klassert.
      
       8) Fix mdio-octeon driver Kconfig deps, from Randy Dunlap.
      
       9) Fix regression in setting DSCP on ipv6/GRE encapsulation, from Peter
          Dawson.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
        ipv4: add reference counting to metrics
        net: ethernet: ax88796: don't call free_irq without request_irq first
        ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets
        sctp: fix ICMP processing if skb is non-linear
        net: llc: add lock_sock in llc_ui_bind to avoid a race condition
        bonding: Don't update slave->link until ready to commit
        test_bpf: Add a couple of tests for BPF_JSGE.
        bpf: add various verifier test cases
        bpf: fix wrong exposure of map_flags into fdinfo for lpm
        bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data
        bpf: properly reset caller saved regs after helper call and ld_abs/ind
        bpf: fix incorrect pruning decision when alignment must be tracked
        arp: fixed -Wuninitialized compiler warning
        tcp: avoid fastopen API to be used on AF_UNSPEC
        net: move somaxconn init from sysctl code
        net: fix potential null pointer dereference
        geneve: fix fill_info when using collect_metadata
        virtio-net: enable TSO/checksum offloads for Q-in-Q vlans
        be2net: Fix offload features for Q-in-Q packets
        vlan: Fix tcp checksum offloads in Q-in-Q vlans
        ...
    • Merge tag 'xfs-4.12-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · cdbe0206
      Linus Torvalds authored
      Pull XFS fixes from Darrick Wong:
       "A few miscellaneous bug fixes & cleanups:
      
         - Fix indlen block reservation accounting bug when splitting delalloc
           extent
      
         - Fix warnings about unused variables that appeared in -rc1.
      
         - Don't spew errors when bmapping a local format directory
      
         - Fix an off-by-one error in a delalloc eof assertion
      
         - Make fsmap only return inode information for CAP_SYS_ADMIN
      
         - Fix a potential mount time deadlock recovering cow extents
      
         - Fix unaligned memory access in _btree_visit_blocks
      
         - Fix various SEEK_HOLE/SEEK_DATA bugs"
      
      * tag 'xfs-4.12-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: Move handling of missing page into one place in xfs_find_get_desired_pgoff()
        xfs: Fix off-by-in in loop termination in xfs_find_get_desired_pgoff()
        xfs: Fix missed holes in SEEK_HOLE implementation
        xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff()
        xfs: fix unaligned access in xfs_btree_visit_blocks
        xfs: avoid mount-time deadlock in CoW extent recovery
        xfs: only return detailed fsmap info if the caller has CAP_SYS_ADMIN
        xfs: bad assertion for delalloc an extent that start at i_size
        xfs: fix warnings about unused stack variables
        xfs: BMAPX shouldn't barf on inline-format directories
        xfs: fix indlen accounting error on partial delalloc conversion
    • ipv4: add reference counting to metrics · 3fb07daf
      Eric Dumazet authored
      Andrey Konovalov reported crashes in ipv4_mtu()
      
      I could reproduce the issue with KASAN kernels, between
      10.246.7.151 and 10.246.7.152 :
      
      1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 &
      
      2) At the same time run following loop :
      while :
      do
       ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
       ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
      done
      
      Cong Wang attempted to add back rt->fi in commit
      82486aa6 ("ipv4: restore rt->fi for reference counting")
      but this proved to add some issues that were complex to solve.
      
      Instead, I suggested to add a refcount to the metrics themselves,
      being a standalone object (in particular, no reference to other objects)
      
      I tried to make this patch as small as possible to ease its backport,
      instead of being super clean. Note that we believe that only ipv4 dst
      need to take care of the metric refcount. But if this is wrong,
      this patch adds the basic infrastructure to extend this to other
      families.
      
      Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang
      for his efforts on this problem.
      
      Fixes: 2860583f ("ipv4: Kill rt->fi")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Reviewed-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
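
      The idea of putting a refcount on the metrics object itself can be
      sketched in userspace C as below. This is only an illustration under
      stated assumptions: struct metrics, metrics_alloc(), metrics_get() and
      metrics_put() are hypothetical names, not the kernel's actual types or
      helpers, and C11 atomics stand in for the kernel's reference counting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a standalone, reference-counted metrics block. */
struct metrics {
    atomic_int refcnt;   /* number of holders (route entry, cached dsts, ...) */
    unsigned int mtu;    /* example payload */
};

/* Allocate a metrics block owned by one holder (for example the route entry). */
static struct metrics *metrics_alloc(unsigned int mtu)
{
    struct metrics *m = malloc(sizeof(*m));

    if (!m)
        return NULL;
    atomic_init(&m->refcnt, 1);
    m->mtu = mtu;
    return m;
}

/* Take an extra reference, as a dst that points at the metrics would. */
static struct metrics *metrics_get(struct metrics *m)
{
    atomic_fetch_add(&m->refcnt, 1);
    return m;
}

/* Drop a reference; returns 1 if this call freed the object, 0 otherwise. */
static int metrics_put(struct metrics *m)
{
    int old = atomic_fetch_sub(&m->refcnt, 1);

    assert(old > 0);     /* dropping more references than were taken is a bug */
    if (old == 1) {
        free(m);
        return 1;
    }
    return 0;
}
```

      With this shape, deleting the route only drops the route's reference;
      the memory stays valid until the last user that still reads the MTU
      has dropped its own reference.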
    • net: ethernet: ax88796: don't call free_irq without request_irq first · 82533ad9
      Uwe Kleine-König authored
      The function ax_init_dev (which is called only from the driver's .probe
      function) calls free_irq in the error path without having requested the
      irq in the first place. So drop the free_irq call in the error path.
      
      Fixes: 825a2ff1 ("AX88796 network driver")
      Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets · 0e9a7095
      Peter Dawson authored
      This fix addresses two problems in the way the DSCP field is formulated
       on the encapsulating header of IPv6 tunnels.
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195661
      
      1) The IPv6 tunneling code was manipulating the DSCP field of the
       encapsulating packet using the 32b flowlabel. Since the flowlabel is
       only the lower 20b it was incorrect to assume that the upper 12b
       containing the DSCP and ECN fields would remain intact when formulating
       the encapsulating header. This fix handles the 'inherit' and
       'fixed-value' DSCP cases explicitly using the extant dsfield u8 variable.
      
      2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was
       incorrect and resulted in the DSCP value always being set to 0.
      
      Commit 90427ef5 ("ipv6: fix flow labels when the traffic class
       is non-0") caused the regression by masking out the flowlabel
       which exposed the incorrect handling of the DSCP portion of the
       flowlabel in ip6_tunnel and ip6_gre.
      
      Fixes: 90427ef5 ("ipv6: fix flow labels when the traffic class is non-0")
      Signed-off-by: Peter Dawson <peter.a.dawson@boeing.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
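
      The bit layout behind problem 1 can be illustrated with a small
      userspace sketch. It assumes the first 32-bit word of the IPv6 header
      in host byte order (4-bit version, 8-bit traffic class holding DSCP
      and ECN, 20-bit flow label). The function names and FLOWLABEL_MASK
      are hypothetical and are not the helpers used by ip6_tunnel or
      ip6_gre.

```c
#include <stdint.h>

/* Only the low 20 bits of the word belong to the flow label. */
#define FLOWLABEL_MASK 0x000FFFFFu

/*
 * Build the first header word from an explicit 8-bit dsfield and a flow
 * label. Masking the label keeps stray upper bits from leaking into the
 * version / traffic-class area.
 */
static uint32_t ip6_first_word(uint8_t dsfield, uint32_t flowlabel)
{
    return (6u << 28) | ((uint32_t)dsfield << 20) |
           (flowlabel & FLOWLABEL_MASK);
}

/* Extract the 8-bit traffic class (DSCP + ECN) from the header word. */
static uint8_t ip6_word_dsfield(uint32_t word)
{
    return (uint8_t)((word >> 20) & 0xFFu);
}

/* DSCP is the upper 6 bits of the dsfield; the lower 2 bits are ECN. */
static uint8_t dsfield_dscp(uint8_t dsfield)
{
    return (uint8_t)(dsfield >> 2);
}
```

      Carrying the DSCP in a separate u8 and combining it with a masked flow
      label, as in this sketch, avoids relying on the upper 12 bits of a
      32-bit flow label value surviving intact.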
    • sctp: fix ICMP processing if skb is non-linear · 804ec7eb
      Davide Caratti authored
      Sometimes ICMP replies to INIT chunks are ignored by the client, even if
      the encapsulated SCTP headers match an open socket. This happens when the
      ICMP packet is carried by a paged skb: use skb_header_pointer() to read
      packet contents beyond the SCTP header, so that chunk header and initiate
      tag are validated correctly.
      
      v2:
      - don't use skb_header_pointer() to read the transport header, since
        icmp_socket_deliver() already puts these 8 bytes in the linear area.
      - change commit message to make specific reference to INIT chunks.
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: llc: add lock_sock in llc_ui_bind to avoid a race condition · 0908cf4d
      linzhang authored
      There is a race condition in llc_ui_bind if two or more processes/threads
      try to bind the same socket.

      If more than one process/thread binds the same socket successfully, that
      leads to two problems: the result is not what we expect, and the kernel
      can end up in an unstable state or oops (in my simple test case, llc2.ko
      could not be unloaded).

      The current code tests the SOCK_ZAPPED bit to prevent a process from
      binding the same socket twice, but that cannot prevent multiple
      processes/threads from binding the same socket at the same time.

      So, add lock_sock in llc_ui_bind like others, such as llc_ui_connect.
      Signed-off-by: Lin Zhang <xiaolou4617@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 1b8f2ffc
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A collection of fixes that should go into this series. This contains:
      
         - A set of NVMe fixes, pulled from Christoph. This includes a set of
           fixes for the fiber channel bits from James Smart, rdma queue depth
           fix from Marta, controller removal fixes from Ming, and some more
           APST quirk updates from Andy.
      
         - A blk-mq debugfs fix from Bart, fixing a problem with the
           untangling of the sysfs and debugfs blk-mq bits that was added in
           this series.
      
         - Error code fix in add_partition() from Dan.
      
         - A small series of fixes for the new blk-throttle code from Shaohua"
      
      * 'for-linus' of git://git.kernel.dk/linux-block: (21 commits)
        blk-mq: Only register debugfs attributes for blk-mq queues
        nvme: Quirk APST on Intel 600P/P3100 devices
        nvme: only setup block integrity if supported by the driver
        nvme: replace is_flags field in nvme_ctrl_ops with a flags field
        nvme-pci: consistencly use ctrl->device for logging
        partitions/msdos: FreeBSD UFS2 file systems are not recognized
        block: fix an error code in add_partition()
        blk-throttle: force user to configure all settings for io.low
        blk-throttle: respect 0 bps/iops settings for io.low
        blk-throttle: output some debug info in trace
        blk-throttle: add hierarchy support for latency target and idle time
        nvme_fc: remove extra controller reference taken on reconnect
        nvme_fc: correct nvme status set on abort
        nvme_fc: set logging level on resets/deletes
        nvme_fc: revise comment on teardown
        nvme_fc: Support ctrl_loss_tmo
        nvme_fc: get rid of local reconnect_delay
        blk-mq: remove blk_mq_abort_requeue_list()
        nvme: avoid to use blk_mq_abort_requeue_list()
        nvme: use blk_mq_start_hw_queues() in nvme_kill_queues()
        ...
    • Merge tag 'pci-v4.12-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 6ce47829
      Linus Torvalds authored
      Pull PCI fixes from Bjorn Helgaas:
      
       - fix PCI_ENDPOINT build error (merged for v4.12)
      
       - fix Switchtec driver (merged for v4.12)
      
       - fix imx6 config read timeouts, fallout from changing to non-postable
         reads
      
       - add PM "needs_resume" flag for i915 suspend issue
      
      * tag 'pci-v4.12-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI/PM: Add needs_resume flag to avoid suspend complete optimization
        PCI: imx6: Fix config read timeout handling
        switchtec: Fix minor bug with partition ID register
        switchtec: Use new cdev_device_add() helper function
        PCI: endpoint: Make PCI_ENDPOINT depend on HAS_DMA
    • Merge tag 'ceph-for-4.12-rc3' of git://github.com/ceph/ceph-client · 80941b2a
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "A bunch of make W=1 and static checker fixups, a RECONNECT_SEQ
        messenger patch from Zheng and Luis' fallocate fix"
      
      * tag 'ceph-for-4.12-rc3' of git://github.com/ceph/ceph-client:
        ceph: check that the new inode size is within limits in ceph_fallocate()
        libceph: cleanup old messages according to reconnect seq
        libceph: NULL deref on crush_decode() error path
        libceph: fix error handling in process_one_ticket()
        libceph: validate blob_struct_v in process_one_ticket()
        libceph: drop version variable from ceph_monmap_decode()
        libceph: make ceph_msg_data_advance() return void
        libceph: use kbasename() and kill ceph_file_part()
    • Merge tag 'mmc-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · a38b461e
      Linus Torvalds authored
      Pull MMC fixes from Ulf Hansson:
       "This contains fixes to make the WiFi work again for the ARM64 Hikey
        board.
      
        Together with a couple of DTS updates for the Hikey board we have also
        extended the mmc pwrseq_simple, to support a new power-off-delay-us DT
        property, as that was required to enable a graceful power off sequence
        for the WiFi chip"
      
      * tag 'mmc-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        arm64: dts: hikey: Fix WiFi support
        arm64: dts: hi6220: Move board data from the dwmmc nodes to hikey dts
        arm64: dts: hikey: Add the SYS_5V and the VDD_3V3 regulators
        arm64: dts: hi6220: Move the fixed_5v_hub regulator to the hikey dts
        arm64: dts: hikey: Add clock for the pmic mfd
        mfd: dts: hi655x: Add clock binding for the pmic
        mmc: pwrseq_simple: Parse DTS for the power-off-delay-us property
        mmc: dt: pwrseq-simple: Invent power-off-delay-us
    • Merge tag 'sound-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · e95806df
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "This contains a few HD-audio device-specific quirks and an endianness
        fix for USB-audio, as well as the update of quirk model list document.
        All fixes are small and trivial.
      
        The document update could have been postponed, but it's a good thing
        for user and has absolutely zero risk of breakage, so included here"
      
      * tag 'sound-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430
        ALSA: hda - Update the list of quirk models
        ALSA: hda - Provide dual-codecs model option for a few Realtek codecs
        ALSA: hda - Apply dual-codec quirk for MSI Z270-Gaming mobo
        ALSA: hda - No loopback on ALC299 codec
        ALSA: usb-audio: fix Amanero Combo384 quirk on big-endian hosts
    • Merge tag 'drm-fixes-for-v4.12-rc3' of git://people.freedesktop.org/~airlied/linux · 876ca8f3
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Not a whole lot happening here, a set of amdgpu fixes and one core
        deadlock fix, and some misc drivers fixes"
      
      * tag 'drm-fixes-for-v4.12-rc3' of git://people.freedesktop.org/~airlied/linux:
        drm/amdgpu: fix null point error when rmmod amdgpu.
        drm/amd/powerplay: fix a signedness bugs
        drm/amdgpu: fix NULL pointer panic of emit_gds_switch
        drm/radeon: Unbreak HPD handling for r600+
        drm/amd/powerplay/smu7: disable mclk switching for high refresh rates
        drm/amd/powerplay/smu7: add vblank check for mclk switching (v2)
        drm/radeon/ci: disable mclk switching for high refresh rates (v2)
        drm/amdgpu/ci: disable mclk switching for high refresh rates (v2)
        drm/amdgpu: fix fundamental suspend/resume issue
        drm/gma500/psb: Actually use VBT mode when it is found
        drm: Fix deadlock retry loop in page_flip_ioctl
        drm: qxl: Delay entering atomic context during cursor update
        drm/radeon: Fix oops upon driver load on PowerXpress laptops
    • PCI/msi: fix the pci_alloc_irq_vectors_affinity stub · 83b4605b
      Christoph Hellwig authored
      We need to return an error for any call that asks for MSI / MSI-X
      vectors only, so that non-trivial fallback logic can work properly.
      
      Also validate dev->irq and use the "correct" errno value based on feedback
      from Linus.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Fixes: aff17164 ("PCI: Provide sensible IRQ vector alloc/free routines")
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
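
      The behaviour described for the stub can be sketched as follows. This
      is a hedged illustration, not the kernel's code: struct fake_dev, the
      IRQ_* flag names and alloc_irq_vectors_stub() are hypothetical, and
      returning -ENOSPC is an assumption, since the message above only says
      that the "correct" errno value is used.

```c
#include <errno.h>

/* Hypothetical flag bits standing in for the caller's allowed IRQ types. */
#define IRQ_LEGACY (1u << 0)
#define IRQ_MSI    (1u << 1)
#define IRQ_MSIX   (1u << 2)

/* Hypothetical device with a single legacy interrupt line (0 = none). */
struct fake_dev {
    unsigned int irq;
};

/*
 * Stub used when MSI/MSI-X support is compiled out: succeed with exactly
 * one vector only if the caller accepts a legacy interrupt, asks for no
 * more than one vector as a minimum, and the device has a valid irq.
 * Every other request, including MSI-only or MSI-X-only ones, fails so
 * that the caller's own fallback logic can run.
 */
static int alloc_irq_vectors_stub(const struct fake_dev *dev,
                                  unsigned int min_vecs, unsigned int flags)
{
    if ((flags & IRQ_LEGACY) && min_vecs == 1 && dev->irq)
        return 1;
    return -ENOSPC; /* assumed errno, see the note above */
}
```

      A caller that first tries MSI-X only and then retries with the legacy
      flag set would see the first call fail and the second return one
      vector, which is the fallback path the fix is meant to keep working.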
    • Merge branch 'nvme-4.12' of git://git.infradead.org/nvme into for-linus · 8aa63829
      Jens Axboe authored
      Christoph writes:
      
      "A couple of fixes for the next rc on the nvme front. Various FC fixes
      from James, controller removal fixes from Ming (including a block layer
      patch), an APST related device quirk from Andy, a RDMA fix for small
      queue depth device from Marta, as well as fixes for the lack of
      metadata support in non-PCIe drivers and the printk logging format from
      me."
    • blk-mq: Only register debugfs attributes for blk-mq queues · a8ecdd71
      Bart Van Assche authored
      The code in blk-mq-debugfs.c assumes that it is working on a blk-mq
      queue and is not intended to work on a blk-sq queue. Hence only
      register blk-mq debugfs attributes for blk-mq queues.
      
      Fixes: commit 9c1051aa ("blk-mq: untangle debugfs and sysfs")
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • nvme: Quirk APST on Intel 600P/P3100 devices · 50af47d0
      Andy Lutomirski authored
      They have known firmware bugs.  A fix is apparently in the works --
      once fixed firmware is available, someone from Intel (Hi, Keith!)
      can adjust the quirk accordingly.
      
      Cc: stable@vger.kernel.org # v4.11
      Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Cc: Mario Limonciello <mario_limonciello@dell.com>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: only setup block integrity if supported by the driver · c81bfba9
      Christoph Hellwig authored
      Currently only the PCIe driver supports metadata, so we should not claim
      integrity support for the other drivers.  This prevents nasty crashes
      with targets that advertise metadata support on fabrics.
      
      Also use the opportunity to factor out some code into a separate helper
      that isn't even compiled if CONFIG_BLK_DEV_INTEGRITY is disabled.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
    • nvme: replace is_flags field in nvme_ctrl_ops with a flags field · d3d5b87d
      Christoph Hellwig authored
      So that we can have more flags for transport-specific behavior.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
    • nvme-pci: consistencly use ctrl->device for logging · 9bdcfb10
      Christoph Hellwig authored
      This is what most of the code already does and gives much more useful
      prefixes than the device embedded in the pci_dev.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
    • Merge branch 'drm-fixes-4.12' of git://people.freedesktop.org/~agd5f/linux into drm-fixes · bc1f0e04
      Dave Airlie authored
      A bunch of bug fixes:
      - Fix display flickering on some chips at high refresh rates
      - suspend/resume fix
      - hotplug fix
      - a couple of segfault fixes for certain cases
      
      * 'drm-fixes-4.12' of git://people.freedesktop.org/~agd5f/linux:
        drm/amdgpu: fix null point error when rmmod amdgpu.
        drm/amd/powerplay: fix a signedness bugs
        drm/amdgpu: fix NULL pointer panic of emit_gds_switch
        drm/radeon: Unbreak HPD handling for r600+
        drm/amd/powerplay/smu7: disable mclk switching for high refresh rates
        drm/amd/powerplay/smu7: add vblank check for mclk switching (v2)
        drm/radeon/ci: disable mclk switching for high refresh rates (v2)
        drm/amdgpu/ci: disable mclk switching for high refresh rates (v2)
        drm/amdgpu: fix fundamental suspend/resume issue
    • Merge tag 'drm-misc-fixes-2017-05-25' of git://anongit.freedesktop.org/git/drm-misc into drm-fixes · 538fd19e
      Dave Airlie authored
      Core Changes:
      - Don't drop vblank reference more than once in cases of ww retry (Daniel)
      
      Driver Changes:
      - radeon: Fix oops during radeon probe trying to reference wrong device (Lukas)
      - qxl: Avoid sleeping while in atomic context on cursor update (Gabriel)
      - gma500: Use VBT mode instead of pre-programmed mode for LVDS (Patrik)
      
      Cc: Lukas Wunner <lukas@wunner.de>
      Cc: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
      
      * tag 'drm-misc-fixes-2017-05-25' of git://anongit.freedesktop.org/git/drm-misc:
        drm/gma500/psb: Actually use VBT mode when it is found
        drm: Fix deadlock retry loop in page_flip_ioctl
        drm: qxl: Delay entering atomic context during cursor update
        drm/radeon: Fix oops upon driver load on PowerXpress laptops
  2. 25 May, 2017 18 commits
    • bonding: Don't update slave->link until ready to commit · 797a9364
      Nithin Sujir authored
      In the loadbalance arp monitoring scheme, when a slave link change is
      detected, the slave->link is immediately updated and slave_state_changed
      is set. Later down the function, the rtnl_lock is acquired and the
      changes are committed, updating the bond link state.
      
      However, the acquisition of the rtnl_lock can fail. The next time the
      monitor runs, since slave->link is already updated, it determines that
      link is unchanged. This results in the bond link state permanently out
      of sync with the slave link.
      
      This patch modifies bond_loadbalance_arp_mon() to handle link changes
      identical to bond_ab_arp_{inspect/commit}(). The new link state is
      maintained in slave->new_link until we're ready to commit at which point
      it's copied into slave->link.
      
      NOTE: miimon_{inspect/commit}() has a more complex state machine
      requiring the use of the bond_{propose,commit}_link_state() functions
      which maintains the intermediate state in slave->link_new_state. The arp
      monitors don't require that.
      
      Testing: This bug is very easy to reproduce with the following steps.
      1. In a loop, toggle a slave link of a bond slave interface.
      2. In a separate loop, do ifconfig up/down of an unrelated interface to
      create contention for rtnl_lock.
      Within a few iterations, the bond link goes out of sync with the slave
      link.
      Signed-off-by: Nithin Nayak Sujir <nsujir@tintri.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: Jay Vosburgh <jay.vosburgh@canonical.com>
      Acked-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
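
      The inspect/commit split described in this commit can be sketched in
      plain C. This is a simplified model under stated assumptions: struct
      slave here has only two fields, LINK_NOCHANGE and the function names
      are hypothetical, and a boolean stands in for whether rtnl_lock was
      acquired.

```c
#include <stdbool.h>

/* Hypothetical link states; LINK_NOCHANGE marks "nothing proposed". */
enum link_state { LINK_NOCHANGE = -1, LINK_DOWN = 0, LINK_UP = 1 };

/* Simplified slave: committed state plus a proposed, uncommitted state. */
struct slave {
    int link;
    int new_link;
};

/*
 * Inspect phase: record the observed state in new_link only. The
 * committed link field is left alone, so a failed commit cannot hide
 * the change from the next monitor run.
 */
static bool slave_inspect(struct slave *s, int observed)
{
    s->new_link = (observed != s->link) ? observed : LINK_NOCHANGE;
    return s->new_link != LINK_NOCHANGE;
}

/*
 * Commit phase: copy new_link into link only when the lock was taken.
 * Returns true if a change was committed.
 */
static bool slave_commit(struct slave *s, bool lock_acquired)
{
    if (!lock_acquired || s->new_link == LINK_NOCHANGE)
        return false;
    s->link = s->new_link;
    s->new_link = LINK_NOCHANGE;
    return true;
}
```

      If the commit step is skipped because the lock could not be taken,
      the next inspect still compares against the old committed state and
      reports the change again, which is the property the fix relies on.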
    • test_bpf: Add a couple of tests for BPF_JSGE. · 791caeb0
      David Daney authored
      Some JITs can optimize comparisons with zero.  Add a couple of
      BPF_JSGE tests against immediate zero.
      Signed-off-by: David Daney <david.daney@cavium.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bpf-fixes' · ae08ea97
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      Various BPF fixes
      
      Follow-up to fix incorrect pruning when alignment tracking is
      in use and to properly clear regs after call to not leave stale
      data behind, also a fix that adds bpf_clone_redirect to the
      bpf_helper_changes_pkt_data helper and exposes correct map_flags
      for lpm map into fdinfo. For details, please see individual
      patches.
      
      v1 -> v2:
        - Reworked first patch so that env->strict_alignment is the
          final indicator on whether we have to deal with strict
          alignment rather than having CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
          checks on various locations, so only checking env->strict_alignment
          is sufficient after that. Thanks for spotting, Dave!
        - Added patch 3 and 4.
        - Rest as is.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: add various verifier test cases · 614d0d77
      Daniel Borkmann authored
      This patch adds various verifier test cases:
      
      1) A test case for the pruning issue when tracking alignment
         is used.
      2) Various PTR_TO_MAP_VALUE_OR_NULL tests to make sure pointer
         arithmetic turns such register into UNKNOWN_VALUE type.
      3) Test cases for the special treatment of LD_ABS/LD_IND to
         make sure verifier doesn't break calling convention here.
         Latter is needed, since f.e. arm64 JIT uses r1 - r5 for
         storing temporary data, so they really must be marked as
         NOT_INIT.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: fix wrong exposure of map_flags into fdinfo for lpm · a316338c
      Daniel Borkmann authored
      trie_alloc() always needs to have BPF_F_NO_PREALLOC passed in via
      attr->map_flags, since it does not support preallocation yet. We
      check the flag, but we never copy the flag into trie->map.map_flags,
      which is later on exposed into fdinfo and used by loaders such as
      iproute2. Latter uses this in bpf_map_selfcheck_pinned() to test
      whether a pinned map has the same spec as the one from the BPF obj
      file and if not, bails out, which is currently the case for lpm
      since it exposes always 0 as flags.
      
      Also copy over flags in array_map_alloc() and stack_map_alloc().
      They always have to be 0 right now, but we should make sure to not
      miss to copy them over at a later point in time when we add actual
      flags for them to use.
      
      Fixes: b95a5c4d ("bpf: add a longest prefix match trie map implementation")
      Reported-by: Jarno Rajahalme <jarno@covalent.io>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data · 41703a73
      Daniel Borkmann authored
      The bpf_clone_redirect() still needs to be listed in
      bpf_helper_changes_pkt_data() since we call into
      bpf_try_make_head_writable() from there, thus we need
      to invalidate prior pkt regs as well.
      
      Fixes: 36bbef52 ("bpf: direct packet write and access for helpers for clsact progs")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: properly reset caller saved regs after helper call and ld_abs/ind · a9789ef9
      Daniel Borkmann authored
      Currently, after performing helper calls, we clear all caller saved
      registers, that is r0 - r5 and fill r0 depending on struct bpf_func_proto
      specification. The way we reset these regs can affect pruning decisions
      in later paths, since we only reset register's imm to 0 and type to
      NOT_INIT. However, we leave out clearing of other variables such as id,
      min_value, max_value, etc, which can later on lead to pruning mismatches
      due to stale data.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
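
      Why a partial reset can cause pruning mismatches can be shown with a
      toy model. It assumes a much smaller register state than the
      verifier's: struct reg_state, reset_partial(), reset_full() and
      regs_equal() are hypothetical, and zero is used as the "cleared"
      value for every field purely for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of register types tracked by a verifier. */
enum reg_type { NOT_INIT = 0, UNKNOWN_VALUE, PTR_TO_PACKET };

/* Toy register state: type and imm plus extra tracking fields. */
struct reg_state {
    enum reg_type type;
    int64_t imm;
    uint32_t id;
    int64_t min_value;
    int64_t max_value;
};

/* Old behaviour in this model: only type and imm are reset. */
static void reset_partial(struct reg_state *r)
{
    r->type = NOT_INIT;
    r->imm = 0;
}

/* Fixed behaviour in this model: every tracked field is cleared. */
static void reset_full(struct reg_state *r)
{
    memset(r, 0, sizeof(*r));
    r->type = NOT_INIT;
}

/* Field-by-field comparison, as a state-equality check would need. */
static int regs_equal(const struct reg_state *a, const struct reg_state *b)
{
    return a->type == b->type && a->imm == b->imm && a->id == b->id &&
           a->min_value == b->min_value && a->max_value == b->max_value;
}
```

      After reset_partial() a register that used to be a packet pointer
      still carries its old id and bounds, so it does not compare equal to
      a freshly cleared register even though both are marked NOT_INIT.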
    • bpf: fix incorrect pruning decision when alignment must be tracked · 1ad2f583
      Daniel Borkmann authored
      Currently, when we enforce alignment tracking on direct packet access,
      the verifier lets the following program pass despite doing a packet
      write with unaligned access:
      
        0: (61) r2 = *(u32 *)(r1 +76)
        1: (61) r3 = *(u32 *)(r1 +80)
        2: (61) r7 = *(u32 *)(r1 +8)
        3: (bf) r0 = r2
        4: (07) r0 += 14
        5: (25) if r7 > 0x1 goto pc+4
         R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
         R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
        6: (2d) if r0 > r3 goto pc+1
         R0=pkt(id=0,off=14,r=14) R1=ctx R2=pkt(id=0,off=0,r=14)
         R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
        7: (63) *(u32 *)(r0 -4) = r0
        8: (b7) r0 = 0
        9: (95) exit
      
        from 6 to 8:
         R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
         R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
        8: (b7) r0 = 0
        9: (95) exit
      
        from 5 to 10:
         R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
         R3=pkt_end R7=inv,min_value=2 R10=fp
        10: (07) r0 += 1
        11: (05) goto pc-6
        6: safe                           <----- here, wrongly found safe
        processed 15 insns
      
      However, if we enforce a pruning mismatch by adding state into r8
      which is then being mismatched in states_equal(), we find that for
      the otherwise same program, the verifier detects a misaligned packet
      access when actually walking that path:
      
        0: (61) r2 = *(u32 *)(r1 +76)
        1: (61) r3 = *(u32 *)(r1 +80)
        2: (61) r7 = *(u32 *)(r1 +8)
        3: (b7) r8 = 1
        4: (bf) r0 = r2
        5: (07) r0 += 14
        6: (25) if r7 > 0x1 goto pc+4
         R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
         R3=pkt_end R7=inv,min_value=0,max_value=1
         R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
        7: (2d) if r0 > r3 goto pc+1
         R0=pkt(id=0,off=14,r=14) R1=ctx R2=pkt(id=0,off=0,r=14)
         R3=pkt_end R7=inv,min_value=0,max_value=1
         R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
        8: (63) *(u32 *)(r0 -4) = r0
        9: (b7) r0 = 0
        10: (95) exit
      
        from 7 to 9:
         R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
         R3=pkt_end R7=inv,min_value=0,max_value=1
         R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
        9: (b7) r0 = 0
        10: (95) exit
      
        from 6 to 11:
         R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
         R3=pkt_end R7=inv,min_value=2
         R8=imm1,min_value=1,max_value=1,min_align=1 R10=fp
        11: (07) r0 += 1
        12: (b7) r8 = 0
        13: (05) goto pc-7                <----- mismatch due to r8
        7: (2d) if r0 > r3 goto pc+1
         R0=pkt(id=0,off=15,r=15) R1=ctx R2=pkt(id=0,off=0,r=15)
         R3=pkt_end R7=inv,min_value=2
         R8=imm0,min_value=0,max_value=0,min_align=2147483648 R10=fp
        8: (63) *(u32 *)(r0 -4) = r0
        misaligned packet access off 2+15+-4 size 4
      
      The reason why we fail to see it in states_equal() is that the
      third test in compare_ptrs_to_packet() ...
      
        if (old->off <= cur->off &&
            old->off >= old->range && cur->off >= cur->range)
                return true;
      
      ... will let the above pass. The situation we run into is that
      old->off <= cur->off (14 <= 15), meaning that prior walked paths
      went with smaller offset, which was later used in the packet
      access after successful packet range check and found to be safe
      already.
      
      For example: Given is R0=pkt(id=0,off=0,r=0). Adding offset 14
      as in above program to it, results in R0=pkt(id=0,off=14,r=0)
      before the packet range test. Now, testing this against R3=pkt_end
      with 'if r0 > r3 goto out' will transform R0 into R0=pkt(id=0,off=14,r=14)
      for the case when we're within bounds. A write into the packet
      at offset *(u32 *)(r0 -4), that is, 2 + 14 -4, is valid and
      aligned (2 is for NET_IP_ALIGN). After processing this with
      all fall-through paths, we later on check paths from branches.
      When the above skb->mark test is true, then we jump near the
      end of the program, perform r0 += 1, and jump back to the
      'if r0 > r3 goto out' test we've visited earlier already. This
      time, R0 is of type R0=pkt(id=0,off=15,r=0), and we'll prune
      that part because this time we'll have a larger safe packet
      range, and we already found that with off=14 all further insn
      were already safe, so it's safe as well with a larger off.
      However, the problem is that the subsequent write into the packet
      with 2 + 15 -4 is then unaligned, and not caught by the alignment
      tracking. Note that min_align, aux_off, and aux_off_align were
      all 0 in this example.
      
      Since we cannot tell at this time what kind of packet access was
      performed in the prior walk and what minimal requirements it has
      (we might do so in the future, but that requires more complexity),
      fix it by disabling this pruning case under strict alignment for now,
      and let the verifier check such paths instead. With that applied,
      the test cases pass and reject the program due to misalignment.
      
      Fixes: d1174416 ("bpf: Track alignment of register values in the verifier.")
      Reference: http://patchwork.ozlabs.org/patch/761909/
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1ad2f583
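
The arithmetic in the message above can be checked with a small sketch. It models the alignment test on `2 + off + insn_off` and the third test quoted from compare_ptrs_to_packet(), with a strict-alignment switch standing in for the fix the message describes. The struct, the function names, the `strict_alignment` parameter and the `NET_IP_ALIGN_ASSUMED` constant are illustrative assumptions, not the kernel's code.

```c
#include <stdbool.h>

/* The message states that the leading 2 is for NET_IP_ALIGN. */
#define NET_IP_ALIGN_ASSUMED 2

/* Hypothetical reduction of a packet pointer register to off and range. */
struct pkt_reg {
    int off;
    int range;
};

/* Is a `size`-byte access at (register offset + instruction offset) aligned? */
static bool pkt_access_aligned(int reg_off, int insn_off, int size)
{
    return (NET_IP_ALIGN_ASSUMED + reg_off + insn_off) % size == 0;
}

/*
 * The third test quoted in the message, with the pruning case switched
 * off when strict alignment is requested (a sketch of the described fix).
 */
static bool third_test_allows_prune(const struct pkt_reg *old,
                                    const struct pkt_reg *cur,
                                    bool strict_alignment)
{
    if (strict_alignment)
        return false;
    return old->off <= cur->off &&
           old->off >= old->range && cur->off >= cur->range;
}
```

For the program in the message, the old state has off=14, r=0 and the current state has off=15, r=0, so the third test allows pruning when strict alignment is off. The write at `r0 - 4` is aligned for off=14 (2 + 14 - 4 = 12) and misaligned for off=15 (2 + 15 - 4 = 13), which is why the pruned path hides a real violation.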
    • Ihar Hrachyshka's avatar
      arp: fixed -Wuninitialized compiler warning · 5990baaa
      Ihar Hrachyshka authored
      Commit 7d472a59 ("arp: always override
      existing neigh entries with gratuitous ARP") introduced a compiler
      warning:
      
      net/ipv4/arp.c:880:35: warning: 'addr_type' may be used uninitialized in
      this function [-Wmaybe-uninitialized]
      
      While the code logic seems to be correct and doesn't allow the variable
      to be used uninitialized, and the warning is not consistently
      reproducible, it's still worth fixing it for other people not to waste
      time looking at the warning in case it pops up in the build environment.
      Yes, compiler is probably at fault, but we will need to accommodate.
      
      Fixes: 7d472a59 ("arp: always override existing neigh entries with gratuitous ARP")
      Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5990baaa
    • Wei Wang's avatar
      tcp: avoid fastopen API to be used on AF_UNSPEC · ba615f67
      Wei Wang authored
      The fastopen API should be used to perform fastopen operations on the
      TCP socket. It does not make sense to use the fastopen API to perform a
      disconnect by calling it with AF_UNSPEC. The fastopen data path is also
      prone to race conditions and bugs when used with AF_UNSPEC.
      
      One issue reported and analyzed by Vegard Nossum is as follows:
      +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      Thread A:                            Thread B:
      ------------------------------------------------------------------------
      sendto()
       - tcp_sendmsg()
           - sk_stream_memory_free() = 0
               - goto wait_for_sndbuf
      	     - sk_stream_wait_memory()
      	        - sk_wait_event() // sleep
                |                          sendto(flags=MSG_FASTOPEN, dest_addr=AF_UNSPEC)
      	  |                           - tcp_sendmsg()
      	  |                              - tcp_sendmsg_fastopen()
      	  |                                 - __inet_stream_connect()
      	  |                                    - tcp_disconnect() //because of AF_UNSPEC
      	  |                                       - tcp_transmit_skb()// send RST
      	  |                                    - return 0; // no reconnect!
      	  |                           - sk_stream_wait_connect()
      	  |                                 - sock_error()
      	  |                                    - xchg(&sk->sk_err, 0)
      	  |                                    - return -ECONNRESET
      	- ... // wake up, see sk->sk_err == 0
          - skb_entail() on TCP_CLOSE socket
      
      If the connection is reopened then we will send a brand new SYN packet
      after thread A has already queued a buffer. At this point I think the
      socket internal state (sequence numbers etc.) becomes messed up.
      
      When the new connection is closed, the FIN-ACK is rejected because the
      sequence number is outside the window. The other side tries to
      retransmit, but __tcp_retransmit_skb() calls tcp_trim_head() on an
      empty skb which corrupts the skb data length and hits a BUG() in
      copy_and_csum_bits().
      +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      
      Hence, this patch adds a check for AF_UNSPEC in the fastopen data path
      and returns EOPNOTSUPP to the user if such a case happens.
      
      Fixes: cf60af03 ("tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
      Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ba615f67
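
A minimal userspace sketch of the kind of check the message above describes: reject an AF_UNSPEC destination on the fastopen path with EOPNOTSUPP instead of treating it as a disconnect request. The helper name `fastopen_check_dest` is hypothetical, and where the kernel actually places the check is not shown here.

```c
#include <errno.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: validate the destination address passed to a
 * fastopen-style send. The caller is assumed to pass a non-NULL address.
 * Returns 0 if the address family is acceptable, or -EOPNOTSUPP for
 * AF_UNSPEC, which would otherwise mean "disconnect".
 */
static int fastopen_check_dest(const struct sockaddr *uaddr)
{
    if (uaddr->sa_family == AF_UNSPEC)
        return -EOPNOTSUPP;
    return 0;
}
```

Failing early keeps the disconnect path out of the send path, so the interleaving shown in the report (a sleeping sender waking up on a socket that another thread reset) cannot be triggered through this API.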
    • Roman Kapl's avatar
      net: move somaxconn init from sysctl code · 7c3f1875
      Roman Kapl authored
      The default value for somaxconn is set in sysctl_core_net_init(), but this
      function is not called when kernel is configured without CONFIG_SYSCTL.
      
      This results in the kernel not being able to accept TCP connections,
      because the backlog has zero size. Usually, the user ends up with:
      "TCP: request_sock_TCP: Possible SYN flooding on port 7. Dropping request.  Check SNMP counters."
      If SYN cookies are not enabled the connection is rejected.
      
      Before ef547f2a (tcp: remove max_qlen_log), the effects were less
      severe, because the backlog was always at least eight slots long.
      Signed-off-by: Roman Kapl <roman.kapl@sysgo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7c3f1875
    • Gustavo A. R. Silva's avatar
      net: fix potential null pointer dereference · 65d786c2
      Gustavo A. R. Silva authored
      Add null check to avoid a potential null pointer dereference.
      
      Addresses-Coverity-ID: 1408831
      Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
      Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      65d786c2
    • Rex Zhu's avatar
      drm/amdgpu: fix null point error when rmmod amdgpu. · b62ce397
      Rex Zhu authored
      This bug happened when amdgpu failed to load.
      
      [   75.740951] BUG: unable to handle kernel paging request at 00000000000031c0
      [   75.748167] IP: [<ffffffffa064a0e0>] amdgpu_fbdev_restore_mode+0x20/0x60 [amdgpu]
      [   75.755774] PGD 0
      
      [   75.759185] Oops: 0000 [#1] SMP
      [   75.762408] Modules linked in: amdgpu(OE-) ttm(OE) drm_kms_helper(OE) drm(OE) i2c_algo_bit(E) fb_sys_fops(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) rpcsec_gss_krb5(E) nfsv4(E) nfs(E) fscache(E) eeepc_wmi(E) asus_wmi(E) sparse_keymap(E) intel_rapl(E) snd_hda_codec_hdmi(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) snd_hda_intel(E) snd_hda_codec(E) snd_hda_core(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) snd_hwdep(E) snd_pcm(E) snd_seq_midi(E) coretemp(E) kvm_intel(E) snd_seq_midi_event(E) snd_rawmidi(E) kvm(E) snd_seq(E) joydev(E) snd_seq_device(E) snd_timer(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) mei_me(E) ghash_clmulni_intel(E) snd(E) aesni_intel(E) mei(E) soundcore(E) aes_x86_64(E) shpchp(E) serio_raw(E) lrw(E) acpi_pad(E) gf128mul(E) glue_helper(E) ablk_helper(E) mac_hid(E)
      [   75.835574]  cryptd(E) parport_pc(E) ppdev(E) lp(E) nfsd(E) parport(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) autofs4(E) hid_generic(E) usbhid(E) mxm_wmi(E) psmouse(E) e1000e(E) ptp(E) pps_core(E) ahci(E) libahci(E) wmi(E) video(E) i2c_hid(E) hid(E)
      [   75.858489] CPU: 5 PID: 1603 Comm: rmmod Tainted: G           OE   4.9.0-custom #2
      [   75.866183] Hardware name: System manufacturer System Product Name/Z170-A, BIOS 0901 08/31/2015
      [   75.875050] task: ffff88045d1bbb80 task.stack: ffffc90002de4000
      [   75.881094] RIP: 0010:[<ffffffffa064a0e0>]  [<ffffffffa064a0e0>] amdgpu_fbdev_restore_mode+0x20/0x60 [amdgpu]
      [   75.891238] RSP: 0018:ffffc90002de7d48  EFLAGS: 00010286
      [   75.896648] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
      [   75.903933] RDX: 0000000000000000 RSI: ffff88045d1bbb80 RDI: 0000000000000286
      [   75.911183] RBP: ffffc90002de7d50 R08: 0000000000000502 R09: 0000000000000004
      [   75.918449] R10: 0000000000000000 R11: 0000000000000001 R12: ffff880464bf0000
      [   75.925675] R13: ffffffffa0853000 R14: 0000000000000000 R15: 0000564e44f88210
      [   75.932980] FS:  00007f13d5400700(0000) GS:ffff880476540000(0000) knlGS:0000000000000000
      [   75.941238] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   75.947088] CR2: 00000000000031c0 CR3: 000000045fd0b000 CR4: 00000000003406e0
      [   75.954332] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   75.961566] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   75.968834] Stack:
      [   75.970881]  ffff880464bf0000 ffffc90002de7d60 ffffffffa0636592 ffffc90002de7d80
      [   75.978454]  ffffffffa059015f ffff880464bf0000 ffff880464bf0000 ffffc90002de7da8
      [   75.986076]  ffffffffa0595216 ffff880464bf0000 ffff880460f4d000 ffffffffa0853000
      [   75.993692] Call Trace:
      [   75.996177]  [<ffffffffa0636592>] amdgpu_driver_lastclose_kms+0x12/0x20 [amdgpu]
      [   76.003700]  [<ffffffffa059015f>] drm_lastclose+0x2f/0xd0 [drm]
      [   76.009777]  [<ffffffffa0595216>] drm_dev_unregister+0x16/0xd0 [drm]
      [   76.016255]  [<ffffffffa0595944>] drm_put_dev+0x34/0x70 [drm]
      [   76.022139]  [<ffffffffa062f365>] amdgpu_pci_remove+0x15/0x20 [amdgpu]
      [   76.028800]  [<ffffffff81416499>] pci_device_remove+0x39/0xc0
      [   76.034661]  [<ffffffff81531caa>] __device_release_driver+0x9a/0x140
      [   76.041121]  [<ffffffff81531e58>] driver_detach+0xb8/0xc0
      [   76.046575]  [<ffffffff81530c95>] bus_remove_driver+0x55/0xd0
      [   76.052401]  [<ffffffff815325fc>] driver_unregister+0x2c/0x50
      [   76.058244]  [<ffffffff81416289>] pci_unregister_driver+0x29/0x90
      [   76.064466]  [<ffffffffa0596c5e>] drm_pci_exit+0x9e/0xb0 [drm]
      [   76.070507]  [<ffffffffa0796d71>] amdgpu_exit+0x1c/0x32 [amdgpu]
      [   76.076609]  [<ffffffff81104810>] SyS_delete_module+0x1a0/0x200
      [   76.082627]  [<ffffffff810e2b1a>] ? rcu_eqs_enter.isra.36+0x4a/0x50
      [   76.089001]  [<ffffffff8100392e>] do_syscall_64+0x6e/0x180
      [   76.094583]  [<ffffffff817e1d2f>] entry_SYSCALL64_slow_path+0x25/0x25
      [   76.101114] Code: 94 c0 c3 31 c0 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 31 c0 48 89 e5 53 48 89 fb 48 c7 c7 1d 21 84 a0 e8 ab 77 b3 e0 e8 fc 8b d7 e0 <48> 8b bb c0 31 00 00 48 85 ff 74 09 e8 ff eb fc ff 85 c0 75 03
      [   76.121432] RIP  [<ffffffffa064a0e0>] amdgpu_fbdev_restore_mode+0x20/0x60 [amdgpu]
      Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      b62ce397
    • Eric Garver's avatar
      geneve: fix fill_info when using collect_metadata · 11387fe4
      Eric Garver authored
      Since 9b4437a5 ("geneve: Unify LWT and netdev handling.") fill_info
      does not return UDP_ZERO_CSUM6_RX when using COLLECT_METADATA. This is
      because it uses ip_tunnel_info_af() with the device level info, which is
      not valid for COLLECT_METADATA.
      
      Fix by checking for the presence of the actual sockets.
      
      Fixes: 9b4437a5 ("geneve: Unify LWT and netdev handling.")
      Signed-off-by: Eric Garver <e@erig.me>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      11387fe4
    • Jan Kara's avatar
      xfs: Move handling of missing page into one place in xfs_find_get_desired_pgoff() · a54fba8f
      Jan Kara authored
      Currently several places in xfs_find_get_desired_pgoff() handle the case
      of a missing page. Make them all handled in one place after the loop has
      terminated.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      a54fba8f
    • Jan Kara's avatar
      xfs: Fix off-by-one in loop termination in xfs_find_get_desired_pgoff() · d7fd2425
      Jan Kara authored
      There is an off-by-one error in loop termination conditions in
      xfs_find_get_desired_pgoff() since 'end' may index a page beyond end of
      desired range if 'endoff' is page aligned. It doesn't have any visible
      effects but still it is good to fix it.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      d7fd2425
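
The off-by-one described above can be seen with a short sketch that computes the last page index of a byte range. It assumes `endoff` is an exclusive end offset greater than zero and that pages are 4 KiB; the function names and the `PAGE_SHIFT_ASSUMED` constant are illustrative, not taken from the XFS source.

```c
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED 12 /* 4 KiB pages, an assumption for this sketch */

/* Naive: a page-aligned endoff yields the index of the page after the range. */
static uint64_t last_page_naive(uint64_t endoff)
{
    return endoff >> PAGE_SHIFT_ASSUMED;
}

/* Exclusive end: back up one byte so the index stays inside the range. */
static uint64_t last_page_exclusive(uint64_t endoff)
{
    return (endoff - 1) >> PAGE_SHIFT_ASSUMED;
}
```

For a range ending at byte 8192 the naive form returns page 2, one past the last page that holds any byte of the range, while the second form returns page 1. For an unaligned end such as 8193 both return page 2.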
    • Jan Kara's avatar
      xfs: Fix missed holes in SEEK_HOLE implementation · 5375023a
      Jan Kara authored
      XFS SEEK_HOLE implementation could miss a hole in an unwritten extent as
      can be seen by the following command:
      
      xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "pwrite 128k 8k" \
             -c "seek -h 0" file
      wrote 57344/57344 bytes at offset 0
      56 KiB, 14 ops; 0.0000 sec (49.312 MiB/sec and 12623.9856 ops/sec)
      wrote 8192/8192 bytes at offset 131072
      8 KiB, 2 ops; 0.0000 sec (70.383 MiB/sec and 18018.0180 ops/sec)
      Whence	Result
      HOLE	139264
      
      Here the hole at offset 56k was ignored by the SEEK_HOLE
      implementation. The bug is in xfs_find_get_desired_pgoff(), which does
      not properly detect the case when pages are not contiguous.
      
      Fix the problem by properly detecting when the found page has a larger
      offset than expected.
      
      CC: stable@vger.kernel.org
      Fixes: d126d43f
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      5375023a
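
The contiguity check described above can be modelled outside the kernel. The sketch below walks a sorted list of page indices that are present in the cache and stops at the first index that is larger than expected, which marks a hole. The function name and the array representation are assumptions made for illustration; the real code iterates over a pagevec.

```c
#include <stddef.h>

/*
 * Hypothetical model: `found` holds the page indices present in the cache,
 * sorted ascending and all >= start. Returns the first page index in
 * [start, end] that is missing, or end + 1 if the range is fully covered.
 */
static unsigned long first_gap(const unsigned long *found, size_t n,
                               unsigned long start, unsigned long end)
{
    unsigned long expected = start;

    for (size_t i = 0; i < n && expected <= end; i++) {
        if (found[i] > expected)
            break; /* pages are not contiguous: hole at `expected` */
        if (found[i] == expected)
            expected = found[i] + 1;
    }
    return expected;
}
```

Assuming 4 KiB pages, the xfs_io command in the message leaves pages 0 to 13 and 32 to 33 written inside a 64-page preallocation. The model reports page 14 as the first gap, which is byte offset 57344 (56k), whereas the reported result 139264 corresponds to page 34, the first page after the second write.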
    • Eryu Guan's avatar
      xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff() · 8affebe1
      Eryu Guan authored
      xfs_find_get_desired_pgoff() is used to search for offset of hole or
      data in page range [index, end] (both inclusive), and the max number
      of pages to search should be at least one, if end == index.
      Otherwise the only page is missed and no hole or data is found,
      which is not correct.
      
      When block size is smaller than page size, this can be demonstrated
      by preallocating a file with size smaller than page size and writing
      data to the last block. E.g. run this xfs_io command on a 1k block
      size XFS on x86_64 host.
      
        # xfs_io -fc "falloc 0 3k" -c "pwrite 2k 1k" \
        	    -c "seek -d 0" /mnt/xfs/testfile
        wrote 1024/1024 bytes at offset 2048
        1 KiB, 1 ops; 0.0000 sec (33.675 MiB/sec and 34482.7586 ops/sec)
        Whence  Result
        DATA    EOF
      
      Data at offset 2k was missed, and lseek(2) returned ENXIO.
      
      This was uncovered by generic/285 subtests 07 and 08 on a ppc64 host,
      where the page size is 64k, because a recent change to generic/285
      reduced the preallocated file size to less than 64k.
      
      Cc: stable@vger.kernel.org # v3.7+
      Signed-off-by: Eryu Guan <eguan@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      8affebe1
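
The page-count issue described above comes down to counting an inclusive range. The sketch below compares a count that forgets the "+ 1" with one that includes it, capped at a batch size. The function names and the `PAGEVEC_SIZE_ASSUMED` value of 14 are assumptions for illustration only.

```c
/* Batch size cap; the value 14 is an assumption for this sketch. */
#define PAGEVEC_SIZE_ASSUMED 14UL

static unsigned long min_ul(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

/* Naive: for end == index this asks for zero pages and misses the only page. */
static unsigned long want_naive(unsigned long index, unsigned long end)
{
    return min_ul(end - index, PAGEVEC_SIZE_ASSUMED);
}

/* Inclusive: [index, end] always contains at least one page. */
static unsigned long want_inclusive(unsigned long index, unsigned long end)
{
    return min_ul(end - index, PAGEVEC_SIZE_ASSUMED - 1) + 1;
}
```

With index equal to end, the naive count is 0 and the inclusive count is 1, which matches the single-page file in the xfs_io example. For long ranges both forms are capped at the batch size.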