1. 07 Dec, 2015 9 commits
  2. 04 Dec, 2015 3 commits
    • Ilya Dryomov's avatar
      block: detach bdev inode from its wb in __blkdev_put() · 43d1c0eb
      Ilya Dryomov authored
      Since 52ebea74 ("writeback: make backing_dev_info host
      cgroup-specific bdi_writebacks") inode, at some point in its lifetime,
      gets attached to a wb (struct bdi_writeback).  Detaching happens on
      evict, in inode_detach_wb() called from __destroy_inode(), and involves
      updating wb.
      
      However, detaching an internal bdev inode from its wb in
      __destroy_inode() is too late.  Its bdi and by extension root wb are
      embedded into struct request_queue, which has different lifetime rules
      and can be freed long before the final bdput() is called (can be from
      __fput() of a corresponding /dev inode, through dput() - evict() -
      bd_forget().  bdevs hold onto the underlying disk/queue pair only while
      opened; as soon as bdev is closed all bets are off.  In fact,
      disk/queue can be gone before __blkdev_put() even returns:
      
      1499 static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
      1500 {
      ...
      1518         if (bdev->bd_contains == bdev) {
      1519                 if (disk->fops->release)
      1520                         disk->fops->release(disk, mode);
      
      [ Driver puts its references to disk/queue ]
      
      1521         }
      1522         if (!bdev->bd_openers) {
      1523                 struct module *owner = disk->fops->owner;
      1524
      1525                 disk_put_part(bdev->bd_part);
      1526                 bdev->bd_part = NULL;
      1527                 bdev->bd_disk = NULL;
      1528                 if (bdev != bdev->bd_contains)
      1529                         victim = bdev->bd_contains;
      1530                 bdev->bd_contains = NULL;
      1531
      1532                 put_disk(disk);
      
      [ We put ours, the queue is gone
        The last bdput() would result in a write to invalid memory ]
      
      1533                 module_put(owner);
      ...
      1539 }
      
      Since bdev inodes are special anyway, detach them in __blkdev_put()
      after clearing inode's dirty bits, turning the problematic
      inode_detach_wb() in __destroy_inode() into a noop.
      
      add_disk() grabs its disk->queue since 523e1d39 ("block: make
      gendisk hold a reference to its queue"), so the old ->release comment
      is removed in favor of the new inode_detach_wb() comment.
      
      Cc: stable@vger.kernel.org # 4.2+, needs backporting
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Tested-by: default avatarRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      43d1c0eb
    • Ken Xue's avatar
      SCSI: Fix NULL pointer dereference in runtime PM · 4fd41a85
      Ken Xue authored
      The routines in scsi_pm.c assume that if a runtime-PM callback is
      invoked for a SCSI device, it can only mean that the device's driver
      has asked the block layer to handle the runtime power management (by
      calling blk_pm_runtime_init(), which among other things sets q->dev).
      
      However, this assumption turns out to be wrong for things like the ses
      driver.  Normally ses devices are not allowed to do runtime PM, but
      userspace can override this setting.  If this happens, the kernel gets
      a NULL pointer dereference when blk_post_runtime_resume() tries to use
      the uninitialized q->dev pointer.
      
      This patch fixes the problem by checking q->dev in block layer before
      handle runtime PM. Since ses doesn't define any PM callbacks and call
      blk_pm_runtime_init(), the crash won't occur.
      
      This fixes Bugzilla #101371.
      https://bugzilla.kernel.org/show_bug.cgi?id=101371
      
      More discussion can be found from below link.
      http://marc.info/?l=linux-scsi&m=144163730531875&w=2Signed-off-by: default avatarKen Xue <Ken.Xue@amd.com>
      Acked-by: default avatarAlan Stern <stern@rowland.harvard.edu>
      Cc: Xiangliang Yu <Xiangliang.Yu@amd.com>
      Cc: James E.J. Bottomley <JBottomley@odin.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Michael Terry <Michael.terry@canonical.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      4fd41a85
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 071f5d10
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "A lot of Thanksgiving turkey leftovers accumulated, here goes:
      
         1) Fix bluetooth l2cap_chan object leak, from Johan Hedberg.
      
         2) IDs for some new iwlwifi chips, from Oren Givon.
      
         3) Fix rtlwifi lockups on boot, from Larry Finger.
      
         4) Fix memory leak in fm10k, from Stephen Hemminger.
      
         5) We have a route leak in the ipv6 tunnel infrastructure, fix from
            Paolo Abeni.
      
         6) Fix buffer pointer handling in arm64 bpf JIT,f rom Zi Shen Lim.
      
         7) Wrong lockdep annotations in tcp md5 support, fix from Eric
            Dumazet.
      
         8) Work around some middle boxes which prevent proper handling of TCP
            Fast Open, from Yuchung Cheng.
      
         9) TCP repair can do huge kmalloc() requests, build paged SKBs
            instead.  From Eric Dumazet.
      
        10) Fix msg_controllen overflow in scm_detach_fds, from Daniel
            Borkmann.
      
        11) Fix device leaks on ipmr table destruction in ipv4 and ipv6, from
            Nikolay Aleksandrov.
      
        12) Fix use after free in epoll with AF_UNIX sockets, from Rainer
            Weikusat.
      
        13) Fix double free in VRF code, from Nikolay Aleksandrov.
      
        14) Fix skb leaks on socket receive queue in tipc, from Ying Xue.
      
        15) Fix ifup/ifdown crach in xgene driver, from Iyappan Subramanian.
      
        16) Fix clearing of persistent array maps in bpf, from Daniel
            Borkmann.
      
        17) In TCP, for the cross-SYN case, we don't initialize tp->copied_seq
            early enough.  From Eric Dumazet.
      
        18) Fix out of bounds accesses in bpf array implementation when
            updating elements, from Daniel Borkmann.
      
        19) Fill gaps in RCU protection of np->opt in ipv6 stack, from Eric
            Dumazet.
      
        20) When dumping proxy neigh entries, we have to accomodate NULL
            device pointers properly, from Konstantin Khlebnikov.
      
        21) SCTP doesn't release all ipv6 socket resources properly, fix from
            Eric Dumazet.
      
        22) Prevent underflows of sch->q.qlen for multiqueue packet
            schedulers, also from Eric Dumazet.
      
        23) Fix MAC and unicast list handling in bnxt_en driver, from Jeffrey
            Huang and Michael Chan.
      
        24) Don't actively scan radar channels, from Antonio Quartulli"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (110 commits)
        net: phy: reset only targeted phy
        bnxt_en: Setup uc_list mac filters after resetting the chip.
        bnxt_en: enforce proper storing of MAC address
        bnxt_en: Fixed incorrect implementation of ndo_set_mac_address
        net: lpc_eth: remove irq > NR_IRQS check from probe()
        net_sched: fix qdisc_tree_decrease_qlen() races
        openvswitch: fix hangup on vxlan/gre/geneve device deletion
        ipv4: igmp: Allow removing groups from a removed interface
        ipv6: sctp: implement sctp_v6_destroy_sock()
        arm64: bpf: add 'store immediate' instruction
        ipv6: kill sk_dst_lock
        ipv6: sctp: add rcu protection around np->opt
        net/neighbour: fix crash at dumping device-agnostic proxy entries
        sctp: use GFP_USER for user-controlled kmalloc
        sctp: convert sack_needed and sack_generation to bits
        ipv6: add complete rcu protection around np->opt
        bpf: fix allocation warnings in bpf maps and integer overflow
        mvebu: dts: enable IP checksum with jumbo frames for Armada 38x on Port0
        net: mvneta: enable setting custom TX IP checksum limit
        net: mvneta: fix error path for building skb
        ...
      071f5d10
  3. 03 Dec, 2015 28 commits
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 2873d32f
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A collection of fixes from this series.  The most important here is a
        regression fix for an issue that some folks would hit in blk-merge.c,
        and the NVMe queue depth limit for the screwed up Apple "nvme"
        controller.
      
        In more detail, this pull request contains:
      
         - a set of fixes for null_blk, including a fix for a few corner cases
           where we could hang the device.  From Arianna and Paolo.
      
         - lightnvm:
              - A build improvement from Keith.
              - Update the qemu pci id detection from Matias.
              - Error handling fixes for leaks and other little fixes from
                Sudip and Wenwei.
      
         - fix from Eric where BLKRRPART would not return EBUSY for whole
           device mounts, only when partitions were mounted.
      
         - fix from Jan Kara, where EOF O_DIRECT reads would return
           negatively.
      
         - remove check for rq_mergeable() when checking limits for cloned
           requests.  The check doesn't make any sense.  It's assuming that
           since NOMERGE is set on the request that we don't have to
           recalculate limits since the request didn't change, but that's not
           true if the request has been redirected.  From Hannes.
      
         - correctly get the bio front segment value set for single segment
           bio's, fixing a BUG() in blk-merge.  From Ming"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        nvme: temporary fix for Apple controller reset
        null_blk: change type of completion_nsec to unsigned long
        null_blk: guarantee device restart in all irq modes
        null_blk: set a separate timer for each command
        blk-merge: fix computing bio->bi_seg_front_size in case of single segment
        direct-io: Fix negative return from dio read beyond eof
        block: Always check queue limits for cloned requests
        lightnvm: missing nvm_lock acquire
        lightnvm: unconverted ppa returned in get_bb_tbl
        lightnvm: refactor and change vendor id for qemu
        lightnvm: do device max sectors boundary check first
        lightnvm: fix ioctl memory leaks
        lightnvm: free memory when gennvm register fails
        lightnvm: Simplify config when disabled
        Return EBUSY from BLKRRPART for mounted whole-dev fs
      2873d32f
    • Linus Torvalds's avatar
      Merge tag 'trace-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · c041f087
      Linus Torvalds authored
      Pull tracing fix from Steven Rostedt:
       "During the merge window I added a new file that is used to filter
        trace events on pids.  It filters all events where only tasks with
        their pid in that file exists.  It also handles the sched_switch and
        sched_wakeup trace events where the current task does not have its pid
        in the file, but the task either being switched to or awaken does.
      
        Unfortunately, I forgot about sched_wakeup_new and sched_waking.  Both
        of these tracepoints use the same class as the sched_wakeup
        tracepoint, and they too should be included in what gets filtered by
        the set_event_pid file"
      
      * tag 'trace-v4.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Add sched_wakeup_new and sched_waking tracepoints for pid filter
      c041f087
    • David S. Miller's avatar
      Merge tag 'mac80211-for-davem-2015-12-02' of... · e3c9b1ef
      David S. Miller authored
      Merge tag 'mac80211-for-davem-2015-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      A small set of fixes for 4.4:
       * fix scanning in mac80211 to not actively scan radar
         channels (from Antonio)
       * fix uninitialized variable in remain-on-channel that
         could lead to treating frame TX as remain-on-channel
         and not sending the frame at all
       * remove NL80211_FEATURE_FULL_AP_CLIENT_STATE again, it
         was broken and needs more work, we'll enable it later
       * fix call_rcu() induced use-after-reset/free in mesh
         (that was suddenly causing issues in certain tests)
       * always request block-ack window size 64 as we found
         some APs will otherwise crash (really ...)
       * fix P2P-Device teardown sequence to avoid restarting
         with uninitialized data
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e3c9b1ef
    • Jérôme Pouiller's avatar
      net: phy: reset only targeted phy · cf18b778
      Jérôme Pouiller authored
      It is possible to address another chip on same MDIO bus. The case is
      correctly handled for media advertising. It is taken into account
      only if mii_data->phy_id == phydev->addr. However, this condition
      was missing for reset case.
      Signed-off-by: default avatarJérôme Pouiller <jezz@sysmic.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cf18b778
    • David S. Miller's avatar
      Merge branch 'bnxt_en-fixes' · c5ba5c8a
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: set mac address and uc_list bug fixes.
      
      Fix ndo_set_mac_address() for PF and VF.
      Re-apply uc_list after chip reset.
      
      v2: Fix compile error if CONFIG_BNXT_SRIOV is not set.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c5ba5c8a
    • Michael Chan's avatar
      bnxt_en: Setup uc_list mac filters after resetting the chip. · b664f008
      Michael Chan authored
      Call bnxt_cfg_rx_mode() in bnxt_init_chip() to setup uc_list and
      mc_list mac address filters.  Before the patch, uc_list is not
      setup again after chip reset (such as ethtool ring size change)
      and macvlans don't work any more after that.
      
      Modify bnxt_cfg_rx_mode() to return error codes appropriately so
      that the init chip sequence can detect any failures.
      Signed-off-by: default avatarMichael Chan <mchan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b664f008
    • Jeffrey Huang's avatar
      bnxt_en: enforce proper storing of MAC address · bdd4347b
      Jeffrey Huang authored
      For PF, the bp->pf.mac_addr always holds the permanent MAC
      addr assigned by the HW.  For VF, the bp->vf.mac_addr always
      holds the administrator assigned VF MAC addr. The random
      generated VF MAC addr should never get stored to bp->vf.mac_addr.
      This way, when the VF wants to change the MAC address, we can tell
      if the adminstrator has already set it and disallow the VF from
      changing it.
      
      v2: Fix compile error if CONFIG_BNXT_SRIOV is not set.
      Signed-off-by: default avatarJeffrey Huang <huangjw@broadcom.com>
      Signed-off-by: default avatarMichael Chan <mchan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bdd4347b
    • Jeffrey Huang's avatar
      bnxt_en: Fixed incorrect implementation of ndo_set_mac_address · 1fc2cfd0
      Jeffrey Huang authored
      The existing ndo_set_mac_address only copies the new MAC addr
      and didn't set the new MAC addr to the HW. The correct way is
      to delete the existing default MAC filter from HW and add
      the new one. Because of RFS filters are also dependent on the
      default mac filter l2 context, the driver must go thru
      close_nic() to delete the default MAC and RFS filters, then
      open_nic() to set the default MAC address to HW.
      Signed-off-by: default avatarJeffrey Huang <huangjw@broadcom.com>
      Signed-off-by: default avatarMichael Chan <mchan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1fc2cfd0
    • Vladimir Zapolskiy's avatar
      net: lpc_eth: remove irq > NR_IRQS check from probe() · 39198ec9
      Vladimir Zapolskiy authored
      If the driver is used on an ARM platform with SPARSE_IRQ defined,
      semantics of NR_IRQS is different (minimal value of virtual irqs) and
      by default it is set to 16, see arch/arm/include/asm/irq.h.
      
      This value may be less than the actual number of virtual irqs, which
      may break the driver initialization. The check removal allows to use
      the driver on such a platform, and, if irq controller driver works
      correctly, the check is not needed on legacy platforms.
      
      Fixes a runtime problem:
      
          lpc-eth 31060000.ethernet: error getting resources.
          lpc_eth: lpc-eth: not found (-6).
      Signed-off-by: default avatarVladimir Zapolskiy <vz@mleia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      39198ec9
    • Eric Dumazet's avatar
      net_sched: fix qdisc_tree_decrease_qlen() races · 4eaf3b84
      Eric Dumazet authored
      qdisc_tree_decrease_qlen() suffers from two problems on multiqueue
      devices.
      
      One problem is that it updates sch->q.qlen and sch->qstats.drops
      on the mq/mqprio root qdisc, while it should not : Daniele
      reported underflows errors :
      [  681.774821] PAX: sch->q.qlen: 0 n: 1
      [  681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head;
      [  681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G           O    4.2.6.201511282239-1-grsec #1
      [  681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015
      [  681.774956]  ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c
      [  681.774959]  ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b
      [  681.774960]  ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001
      [  681.774962] Call Trace:
      [  681.774967]  [<ffffffffa95d2810>] dump_stack+0x4c/0x7f
      [  681.774970]  [<ffffffffa91a44f4>] report_size_overflow+0x34/0x50
      [  681.774972]  [<ffffffffa94d17e2>] qdisc_tree_decrease_qlen+0x152/0x160
      [  681.774976]  [<ffffffffc02694b1>] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel]
      [  681.774978]  [<ffffffffc02680a0>] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel]
      [  681.774980]  [<ffffffffa94cd92d>] __qdisc_run+0x4d/0x1d0
      [  681.774983]  [<ffffffffa949b2b2>] net_tx_action+0xc2/0x160
      [  681.774985]  [<ffffffffa90664c1>] __do_softirq+0xf1/0x200
      [  681.774987]  [<ffffffffa90665ee>] run_ksoftirqd+0x1e/0x30
      [  681.774989]  [<ffffffffa90896b0>] smpboot_thread_fn+0x150/0x260
      [  681.774991]  [<ffffffffa9089560>] ? sort_range+0x40/0x40
      [  681.774992]  [<ffffffffa9085fe4>] kthread+0xe4/0x100
      [  681.774994]  [<ffffffffa9085f00>] ? kthread_worker_fn+0x170/0x170
      [  681.774995]  [<ffffffffa95d8d1e>] ret_from_fork+0x3e/0x70
      
      mq/mqprio have their own ways to report qlen/drops by folding stats on
      all their queues, with appropriate locking.
      
      A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup()
      without proper locking : concurrent qdisc updates could corrupt the list
      that qdisc_match_from_root() parses to find a qdisc given its handle.
      
      Fix first problem adding a TCQ_F_NOPARENT qdisc flag that
      qdisc_tree_decrease_qlen() can use to abort its tree traversal,
      as soon as it meets a mq/mqprio qdisc children.
      
      Second problem can be fixed by RCU protection.
      Qdisc are already freed after RCU grace period, so qdisc_list_add() and
      qdisc_list_del() simply have to use appropriate rcu list variants.
      
      A future patch will add a per struct netdev_queue list anchor, so that
      qdisc_tree_decrease_qlen() can have more efficient lookups.
      Reported-by: default avatarDaniele Fucini <dfucini@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Cong Wang <cwang@twopensource.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4eaf3b84
    • Paolo Abeni's avatar
      openvswitch: fix hangup on vxlan/gre/geneve device deletion · 13175303
      Paolo Abeni authored
      Each openvswitch tunnel vport (vxlan,gre,geneve) holds a reference
      to the underlying tunnel device, but never released it when such
      device is deleted.
      Deleting the underlying device via the ip tool cause the kernel to
      hangup in the netdev_wait_allrefs() loop.
      This commit ensure that on device unregistration dp_detach_port_notify()
      is called for all vports that hold the device reference, properly
      releasing it.
      
      Fixes: 614732ea ("openvswitch: Use regular VXLAN net_device device")
      Fixes: b2acd1dc ("openvswitch: Use regular GRE net_device instead of vport")
      Fixes: 6b001e68 ("openvswitch: Use Geneve device.")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Acked-by: default avatarFlavio Leitner <fbl@sysclose.org>
      Acked-by: default avatarPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      13175303
    • Andrew Lunn's avatar
      ipv4: igmp: Allow removing groups from a removed interface · 4eba7bb1
      Andrew Lunn authored
      When a multicast group is joined on a socket, a struct ip_mc_socklist
      is appended to the sockets mc_list containing information about the
      joined group.
      
      If the interface is hot unplugged, this entry becomes stale. Prior to
      commit 52ad353a ("igmp: fix the problem when mc leave group") it
      was possible to remove the stale entry by performing a
      IP_DROP_MEMBERSHIP, passing either the old ifindex or ip address on
      the interface. However, this fix enforces that the interface must
      still exist. Thus with time, the number of stale entries grows, until
      sysctl_igmp_max_memberships is reached and then it is not possible to
      join and more groups.
      
      The previous patch fixes an issue where a IP_DROP_MEMBERSHIP is
      performed without specifying the interface, either by ifindex or ip
      address. However here we do supply one of these. So loosen the
      restriction on device existence to only apply when the interface has
      not been specified. This then restores the ability to clean up the
      stale entries.
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Fixes: 52ad353a "(igmp: fix the problem when mc leave group")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4eba7bb1
    • Eric Dumazet's avatar
      ipv6: sctp: implement sctp_v6_destroy_sock() · 602dd62d
      Eric Dumazet authored
      Dmitry Vyukov reported a memory leak using IPV6 SCTP sockets.
      
      We need to call inet6_destroy_sock() to properly release
      inet6 specific fields.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      602dd62d
    • David S. Miller's avatar
      Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · 79aecc72
      David S. Miller authored
      Johan Hedberg says:
      
      ====================
      pull request: bluetooth 2015-12-01
      
      Here's a Bluetooth fix for the 4.4-rc series that fixes a memory leak of
      the Security Manager L2CAP channel that'll happen for every LE
      connection.
      
      Please let me know if there are any issues pulling. Thanks.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      79aecc72
    • Yang Shi's avatar
      arm64: bpf: add 'store immediate' instruction · df849ba3
      Yang Shi authored
      aarch64 doesn't have native store immediate instruction, such operation
      has to be implemented by the below instruction sequence:
      
      Load immediate to register
      Store register
      Signed-off-by: default avatarYang Shi <yang.shi@linaro.org>
      CC: Zi Shen Lim <zlim.lnx@gmail.com>
      CC: Xi Wang <xi.wang@gmail.com>
      Reviewed-by: default avatarZi Shen Lim <zlim.lnx@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      df849ba3
    • Eric Dumazet's avatar
      ipv6: kill sk_dst_lock · 6bd4f355
      Eric Dumazet authored
      While testing the np->opt RCU conversion, I found that UDP/IPv6 was
      using a mixture of xchg() and sk_dst_lock to protect concurrent changes
      to sk->sk_dst_cache, leading to possible corruptions and crashes.
      
      ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest
      way to fix the mess is to remove sk_dst_lock completely, as we did for
      IPv4.
      
      __ip6_dst_store() and ip6_dst_store() share same implementation.
      
      sk_setup_caps() being called with socket lock being held or not,
      we have to use sk_dst_set() instead of __sk_dst_set()
      
      Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);"
      in ip6_dst_store() before the sk_setup_caps(sk, dst) call.
      
      This is because ip6_dst_store() can be called from process context,
      without any lock held.
      
      As soon as the dst is installed in sk->sk_dst_cache, dst can be freed
      from another cpu doing a concurrent ip6_dst_store()
      
      Doing the dst dereference before doing the install is needed to make
      sure no use after free would trigger.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6bd4f355
    • Eric Dumazet's avatar
      ipv6: sctp: add rcu protection around np->opt · c836a8ba
      Eric Dumazet authored
      This patch completes the work I did in commit 45f6fad8
      ("ipv6: add complete rcu protection around np->opt"), as I missed
      sctp part.
      
      This simply makes sure np->opt is used with proper RCU locking
      and accessors.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c836a8ba
    • Konstantin Khlebnikov's avatar
      net/neighbour: fix crash at dumping device-agnostic proxy entries · 6adc5fd6
      Konstantin Khlebnikov authored
      Proxy entries could have null pointer to net-device.
      Signed-off-by: default avatarKonstantin Khlebnikov <koct9i@gmail.com>
      Fixes: 84920c14 ("net: Allow ipv6 proxies and arp proxies be shown with iproute2")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6adc5fd6
    • Marcelo Ricardo Leitner's avatar
      sctp: use GFP_USER for user-controlled kmalloc · cacc0621
      Marcelo Ricardo Leitner authored
      Dmitry Vyukov reported that the user could trigger a kernel warning by
      using a large len value for getsockopt SCTP_GET_LOCAL_ADDRS, as that
      value directly affects the value used as a kmalloc() parameter.
      
      This patch thus switches the allocation flags from all user-controllable
      kmalloc size to GFP_USER to put some more restrictions on it and also
      disables the warn, as they are not necessary.
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cacc0621
    • Marcelo Ricardo Leitner's avatar
      sctp: convert sack_needed and sack_generation to bits · 38ee8fb6
      Marcelo Ricardo Leitner authored
      They don't need to be any bigger than that and with this we start a new
      bitfield for tracking association runtime stuff, like zero window
      situation.
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      38ee8fb6
    • Eric Dumazet's avatar
      ipv6: add complete rcu protection around np->opt · 45f6fad8
      Eric Dumazet authored
      This patch addresses multiple problems :
      
      UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
      while socket is not locked : Other threads can change np->opt
      concurrently. Dmitry posted a syzkaller
      (http://github.com/google/syzkaller) program desmonstrating
      use-after-free.
      
      Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
      and dccp_v6_request_recv_sock() also need to use RCU protection
      to dereference np->opt once (before calling ipv6_dup_options())
      
      This patch adds full RCU protection to np->opt
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      45f6fad8
    • Alexei Starovoitov's avatar
      bpf: fix allocation warnings in bpf maps and integer overflow · 01b3f521
      Alexei Starovoitov authored
      For large map->value_size the user space can trigger memory allocation warnings like:
      WARNING: CPU: 2 PID: 11122 at mm/page_alloc.c:2989
      __alloc_pages_nodemask+0x695/0x14e0()
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff82743b56>] dump_stack+0x68/0x92 lib/dump_stack.c:50
       [<ffffffff81244ec9>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:460
       [<ffffffff812450f9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:493
       [<     inline     >] __alloc_pages_slowpath mm/page_alloc.c:2989
       [<ffffffff81554e95>] __alloc_pages_nodemask+0x695/0x14e0 mm/page_alloc.c:3235
       [<ffffffff816188fe>] alloc_pages_current+0xee/0x340 mm/mempolicy.c:2055
       [<     inline     >] alloc_pages include/linux/gfp.h:451
       [<ffffffff81550706>] alloc_kmem_pages+0x16/0xf0 mm/page_alloc.c:3414
       [<ffffffff815a1c89>] kmalloc_order+0x19/0x60 mm/slab_common.c:1007
       [<ffffffff815a1cef>] kmalloc_order_trace+0x1f/0xa0 mm/slab_common.c:1018
       [<     inline     >] kmalloc_large include/linux/slab.h:390
       [<ffffffff81627784>] __kmalloc+0x234/0x250 mm/slub.c:3525
       [<     inline     >] kmalloc include/linux/slab.h:463
       [<     inline     >] map_update_elem kernel/bpf/syscall.c:288
       [<     inline     >] SYSC_bpf kernel/bpf/syscall.c:744
      
      To avoid never succeeding kmalloc with order >= MAX_ORDER check that
      elem->value_size and computed elem_size are within limits for both hash and
      array type maps.
      Also add __GFP_NOWARN to kmalloc(value_size | elem_size) to avoid OOM warnings.
      Note kmalloc(key_size) is highly unlikely to trigger OOM, since key_size <= 512,
      so keep those kmalloc-s as-is.
      
      Large value_size can cause integer overflows in elem_size and map.pages
      formulas, so check for that as well.
      
      Fixes: aaac3ba9 ("bpf: charge user for creation of BPF maps and programs")
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      01b3f521
    • David S. Miller's avatar
      Merge branch 'mvneta-fixes' · a64f3f83
      David S. Miller authored
      Marcin Wojtas says:
      
      ====================
      Marvell Armada XP/370/38X Neta fixes
      
      I'm sending v4 with corrected commit log of the last patch, in order to
      avoid possible conflicts between the branches as suggested by Gregory
      Clement.
      
      Best regards,
      Marcin Wojtas
      
      Changes from v4:
      * Correct commit log of patch 6/6
      
      Changes from v2:
      * Style fixes in patch updating mbus protection
      * Remove redundant stable notifications except for patch 4/6
      
      Changes from v1:
      * update MBUS windows access protection register once, after whole loop
      * add fixing value of MVNETA_RXQ_INTR_ENABLE_ALL_MASK
      * add fixing error path for skb_build()
      * add possibility of setting custom TX IP checksum limit in DT property
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a64f3f83
    • Marcin Wojtas's avatar
      mvebu: dts: enable IP checksum with jumbo frames for Armada 38x on Port0 · c4a25007
      Marcin Wojtas authored
      The Ethernet controller found in the Armada 38x SoC's family support
      TCP/IP checksumming with frame sizes larger than 1600 bytes, however
      only on port 0.
      
      This commit enables it by setting 'tx-csum-limit' to 9800B in
      'ethernet@70000' node.
      Signed-off-by: default avatarMarcin Wojtas <mw@semihalf.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c4a25007
    • Marcin Wojtas's avatar
      net: mvneta: enable setting custom TX IP checksum limit · 9110ee07
      Marcin Wojtas authored
      Since Armada 38x SoC can support IP checksum for jumbo frames only on
      a single port, it means that this feature should be enabled per-port,
      rather than for the whole SoC.
      
      This patch enables setting custom TX IP checksum limit by adding new
      optional property to the mvneta device tree node. If not used, by
      default 1600B is set for "marvell,armada-370-neta" and 9800B for other
      strings, which ensures backward compatibility. Binding documentation
      is updated accordingly.
      Signed-off-by: default avatarMarcin Wojtas <mw@semihalf.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9110ee07
    • Marcin Wojtas's avatar
      net: mvneta: fix error path for building skb · 26c17a17
      Marcin Wojtas authored
      In the actual RX processing, there is same error path for both descriptor
      ring refilling and building skb fails. This is not correct, because after
      successful refill, the ring is already updated with newly allocated
      buffer. Then, in case of build_skb() fail, hitherto code left the original
      buffer unmapped.
      
      This patch fixes above situation by swapping error check of skb build with
      DMA-unmap of original buffer.
      Signed-off-by: default avatarMarcin Wojtas <mw@semihalf.com>
      Acked-by: default avatarSimon Guinot <simon.guinot@sequanux.org>
      Cc: <stable@vger.kernel.org> # v4.2+
      Fixes a84e3289 ("net: mvneta: fix refilling for Rx DMA buffers")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      26c17a17
    • Marcin Wojtas's avatar
      net: mvneta: fix bit assignment for RX packet irq enable · dc1aadf6
      Marcin Wojtas authored
      A value originally defined in the driver was inappropriate. Even though
      the ingress was somehow working, writing MVNETA_RXQ_INTR_ENABLE_ALL_MASK
      to MVNETA_INTR_ENABLE didn't make any effect, because the bits [31:16]
      are reserved and read-only.
      
      This commit updates MVNETA_RXQ_INTR_ENABLE_ALL_MASK to be compliant with
      the controller's documentation.
      Signed-off-by: default avatarMarcin Wojtas <mw@semihalf.com>
      
      Fixes: c5aff182 ("net: mvneta: driver for Marvell Armada 370/XP network
      unit")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dc1aadf6
    • Marcin Wojtas's avatar
      net: mvneta: fix bit assignment in MVNETA_RXQ_CONFIG_REG · e5bdf689
      Marcin Wojtas authored
      MVNETA_RXQ_HW_BUF_ALLOC bit which controls enabling hardware buffer
      allocation was mistakenly set as BIT(1). This commit fixes the assignment.
      Signed-off-by: default avatarMarcin Wojtas <mw@semihalf.com>
      Reviewed-by: default avatarGregory CLEMENT <gregory.clement@free-electrons.com>
      
      Fixes: c5aff182 ("net: mvneta: driver for Marvell Armada 370/XP network
      unit")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e5bdf689