1. 15 Mar, 2017 14 commits
    • IB/ipoib: Fix deadlock between rmmod and set_mode · 10beca53
      Feras Daoud authored
      commit 0a0007f2 upstream.
      
      When set_mode is called from sysfs, the call flow takes the sysfs lock
      first and then tries to take rtnl_lock (when calling ipoib_set_mode).
      The rmmod call flow, on the other hand, takes rtnl_lock first
      (when calling unregister_netdev) and then tries to take the sysfs
      lock. The result is an AB/BA deadlock.
      
      The problem starts when ipoib_set_mode releases its rtnl_lock and tries
      to take it again afterwards.
      
          set_mode:
          [<ffffffff8104f2bd>] ? check_preempt_curr+0x6d/0x90
          [<ffffffff814fee8e>] __mutex_lock_slowpath+0x13e/0x180
          [<ffffffff81448655>] ? __rtnl_unlock+0x15/0x20
          [<ffffffff814fed2b>] mutex_lock+0x2b/0x50
          [<ffffffff81448675>] rtnl_lock+0x15/0x20
          [<ffffffffa02ad807>] ipoib_set_mode+0x97/0x160 [ib_ipoib]
          [<ffffffffa02b5f5b>] set_mode+0x3b/0x80 [ib_ipoib]
          [<ffffffff8134b840>] dev_attr_store+0x20/0x30
          [<ffffffff811f0fe5>] sysfs_write_file+0xe5/0x170
          [<ffffffff8117b068>] vfs_write+0xb8/0x1a0
          [<ffffffff8117ba81>] sys_write+0x51/0x90
          [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
      
          rmmod:
          [<ffffffff81279ffc>] ? put_dec+0x10c/0x110
          [<ffffffff8127a2ee>] ? number+0x2ee/0x320
          [<ffffffff814fe6a5>] schedule_timeout+0x215/0x2e0
          [<ffffffff8127cc04>] ? vsnprintf+0x484/0x5f0
          [<ffffffff8127b550>] ? string+0x40/0x100
          [<ffffffff814fe323>] wait_for_common+0x123/0x180
          [<ffffffff81060250>] ? default_wake_function+0x0/0x20
          [<ffffffff8119661e>] ? ifind_fast+0x5e/0xb0
          [<ffffffff814fe43d>] wait_for_completion+0x1d/0x20
          [<ffffffff811f2e68>] sysfs_addrm_finish+0x228/0x270
          [<ffffffff811f2fb3>] sysfs_remove_dir+0xa3/0xf0
          [<ffffffff81273f66>] kobject_del+0x16/0x40
          [<ffffffff8134cd14>] device_del+0x184/0x1e0
          [<ffffffff8144e59b>] netdev_unregister_kobject+0xab/0xc0
          [<ffffffff8143c05e>] rollback_registered+0xae/0x130
          [<ffffffff8143c102>] unregister_netdevice+0x22/0x70
          [<ffffffff8143c16e>] unregister_netdev+0x1e/0x30
          [<ffffffffa02a91b0>] ipoib_remove_one+0xe0/0x120 [ib_ipoib]
          [<ffffffffa01ed95f>] ib_unregister_device+0x4f/0x100 [ib_core]
          [<ffffffffa021f5e1>] mlx4_ib_remove+0x41/0x180 [mlx4_ib]
          [<ffffffffa01ab771>] mlx4_remove_device+0x71/0x90 [mlx4_core]
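
The two traces above show the classic AB/BA lock inversion. One common way to break such a cycle, sketched here in user-space C with POSIX mutexes, is for the path that already holds the first lock to try the second lock without blocking and to back off on failure. The names `sysfs_lock`, `rtnl`, `set_mode_sketch`, and `store_attr_sketch` are stand-ins chosen for illustration; this is a sketch of the general technique and is not presented as the upstream change.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the two locks named in the commit message. */
static pthread_mutex_t sysfs_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;

/*
 * Models the set_mode side: it runs with sysfs_lock held and also needs rtnl.
 * Instead of blocking on rtnl (which can deadlock against a path that holds
 * rtnl and waits for sysfs_lock), it tries once and returns -EAGAIN so the
 * caller can release sysfs_lock and retry later.
 */
static int set_mode_sketch(int *mode, int new_mode)
{
    int rc = pthread_mutex_trylock(&rtnl);

    if (rc == EBUSY)
        return -EAGAIN;          /* back off instead of waiting */
    assert(rc == 0);

    *mode = new_mode;
    pthread_mutex_unlock(&rtnl);
    return 0;
}

/* Models the attribute store path: take sysfs_lock, then attempt the change. */
static int store_attr_sketch(int *mode, int new_mode)
{
    int rc;

    pthread_mutex_lock(&sysfs_lock);
    rc = set_mode_sketch(mode, new_mode);
    pthread_mutex_unlock(&sysfs_lock);
    return rc;
}
```

Because the store path never sleeps on `rtnl` while holding `sysfs_lock`, a remover that holds `rtnl` and waits for the store path to finish can always make progress.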
      
      Fixes: 862096a8 ("IB/ipoib: Add more rtnl_link_ops callbacks")
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Feras Daoud <ferasda@mellanox.com>
      Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mnt: Tuck mounts under others instead of creating shadow/side mounts. · 839d4268
      Eric W. Biederman authored
      commit 1064f874 upstream.
      
      Ever since mount propagation was introduced, in cases where a mount is
      propagated to a parent mount and mountpoint pair that is already in use,
      the code has placed the new mount behind the old mount in the mount hash
      table.
      
      This implementation detail is problematic as it allows creating
      arbitrary length mount hash chains.
      
      Furthermore, it invalidates the constraint maintained elsewhere in the
      mount code that a parent mount and a mountpoint pair will have exactly
      one mount upon them, which makes this special case hard to deal with
      and to talk about in the mount code.
      
      Modify mount propagation to notice when there is already a mount at
      the parent mount and mountpoint where a new mount is propagating to
      and place that preexisting mount on top of the new mount.
      
      Modify unmount propagation to notice when a mount that is being
      unmounted has another mount on top of it (and no other children), and
      to replace the unmounted mount with the mount on top of it.
      
      Move the MNT_UMOUNT test from __lookup_mnt_last into
      __propagate_umount as that is the only call of __lookup_mnt_last where
      MNT_UMOUNT may be set on any mount visible in the mount hash table.
      
      These modifications allow:
       - __lookup_mnt_last to be removed.
       - attach_shadows to be renamed __attach_mnt and its shadow
         handling to be removed.
       - commit_tree to be simplified
       - copy_tree to be simplified
      
      The result is an easier to understand tree of mounts that does not
      allow creation of arbitrary length hash chains in the mount hash table.
      
      The result is also a very slight userspace visible difference in semantics.
      The following two cases now behave identically, where before order
      mattered:
      
      case 1: (explicit user action)
      	B is a slave of A
      	mount something on A/a , it will propagate to B/a
      	and then mount something on B/a
      
      case 2: (tucked mount)
      	B is a slave of A
      	mount something on B/a
      	and then mount something on A/a
      
      Historically, umount A/a would fail in case 1 and succeed in case 2.
      Now umount A/a succeeds in both configurations.
      
      This very small change in semantics appears, if anything, to be a bug
      fix to me, and my survey of userspace leads me to believe that no programs
      will notice or care about this subtle semantic change.
      
      v2: Updated mnt_change_mountpoint to not call dput or mntput
      and instead to decrement the counts directly.  It is guaranteed
      that there will be other references when mnt_change_mountpoint is
      called so this is safe.
      
      v3: Moved put_mountpoint under mount_lock in attach_recursive_mnt
          As the locking in fs/namespace.c changed between v2 and v3.
      
      v4: Reworked the logic in propagate_mount_busy and __propagate_umount
          that detects when a mount completely covers another mount.
      
      v5: Removed unnecessary tests whose result is always true in
          find_topper and attach_recursive_mnt.
      
      v6: Document the user space visible semantic difference.
      
      Fixes: b90fa9ae ("[PATCH] shared mount handling: bind and rbind")
      Tested-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: mvpp2: fix DMA address calculation in mvpp2_txq_inc_put() · b57ffb2a
      Thomas Petazzoni authored
      commit 239a3b66 upstream.
      
      When TX descriptors are filled in, the buffer DMA address is split
      between the tx_desc->buf_phys_addr field (high-order bits) and
      tx_desc->packet_offset field (5 low-order bits).
      
      However, when we re-calculate the DMA address from the TX descriptor in
      mvpp2_txq_inc_put(), we do not take tx_desc->packet_offset into
      account. This means that when the DMA address is not aligned on a 32
      bytes boundary, we end up calling dma_unmap_single() with a DMA address
      that was not the one returned by dma_map_single().
      
      This inconsistency is detected by the kernel when DMA_API_DEBUG is
      enabled. We fix this problem by properly calculating the DMA address in
      mvpp2_txq_inc_put().
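
The split described above can be illustrated with a small user-space sketch. The structure `tx_desc_sketch`, the helpers, and the `OFFSET_MASK` constant are hypothetical and only mirror the field names used in the message; the real descriptor layout may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the 5 low-order bits of the address are kept in packet_offset. */
#define OFFSET_MASK 0x1fu

struct tx_desc_sketch {
    uint32_t buf_phys_addr;  /* address with the 5 low-order bits cleared */
    uint8_t  packet_offset;  /* the 5 low-order bits */
};

/* Fill the descriptor: split one DMA address across the two fields. */
static void desc_set_addr(struct tx_desc_sketch *d, uint32_t dma_addr)
{
    d->buf_phys_addr = dma_addr & ~OFFSET_MASK;
    d->packet_offset = (uint8_t)(dma_addr & OFFSET_MASK);
    assert(d->buf_phys_addr + d->packet_offset == dma_addr);
}

/*
 * Recover the address that was originally mapped. Returning only
 * buf_phys_addr would be wrong whenever the address is not 32-byte aligned.
 */
static uint32_t desc_get_addr(const struct tx_desc_sketch *d)
{
    return d->buf_phys_addr + d->packet_offset;
}
```

With an unaligned address such as 0x1000002a, `buf_phys_addr` alone holds 0x10000020, so an unmap call that ignored `packet_offset` would receive a different address than the one that was mapped.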
      Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390: use correct input data address for setup_randomness · 376a12eb
      Heiko Carstens authored
      commit 4920e3cf upstream.
      
      The current implementation of setup_randomness uses the stack address
      and therefore the pointer to the SYSIB 3.2.2 block as input data
      address. Furthermore the length of the input data is the number of
      virtual-machine description blocks which is typically one.
      
      This means that typically a single zero byte is fed to
      add_device_randomness.
      
      Fix both of these and use the address of the first virtual machine
      description block as input data address and also use the correct
      length.
      
      Fixes: bcfcbb6b ("s390: add system information as device randomness")
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390: make setup_randomness work · 296f7bd7
      Heiko Carstens authored
      commit da8fd820 upstream.
      
      Commit bcfcbb6b ("s390: add system information as device
      randomness") intended to add some virtual machine specific information
      to the randomness pool.
      
      Unfortunately it uses the page allocator before it is ready to use. As a
      result the page allocator always returns NULL and the setup_randomness
      function never adds anything to the randomness pool.
      
      To fix this use memblock_alloc and memblock_free instead.
      
      Fixes: bcfcbb6b ("s390: add system information as device randomness")
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390: TASK_SIZE for kernel threads · 9cf431db
      Martin Schwidefsky authored
      commit fb94a687 upstream.
      
      Return a sensible value if TASK_SIZE is evaluated in a kernel thread.
      
      This gets us around an issue with copy_mount_options that does a magic
      size calculation "TASK_SIZE - (unsigned long)data" while in a kernel
      thread and data pointing to kernel space.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/dcssblk: fix device size calculation in dcssblk_direct_access() · 792bd1fb
      Gerald Schaefer authored
      commit a63f53e3 upstream.
      
      Since commit dd22f551 "block: Change direct_access calling convention",
      the device size calculation in dcssblk_direct_access() is off-by-one.
      This results in bdev_direct_access() always returning -ENXIO because the
      returned value is not page aligned.
      
      Fix this by adding 1 to the dev_sz calculation.
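
The off-by-one can be reproduced with plain arithmetic. The sketch below assumes the region is described by inclusive start and end addresses; the helper names and the 4096-byte page size are illustrative assumptions, not taken from the driver.

```c
#include <assert.h>

/* Assumed page size for the illustration. */
#define PAGE_SIZE_SKETCH 4096UL

/* Size in bytes of a region whose start and end addresses are both inclusive. */
static unsigned long region_size(unsigned long start, unsigned long end)
{
    assert(end >= start);
    return end - start + 1;
}

/* True if n is a multiple of the assumed page size. */
static int is_page_aligned(unsigned long n)
{
    return (n & (PAGE_SIZE_SKETCH - 1)) == 0;
}
```

For a 1 MiB region from 0x100000 to 0x1fffff, `end - start` yields 0xfffff, which is not page aligned, while `region_size()` yields 0x100000, which is.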
      
      Fixes: dd22f551 ("block: Change direct_access calling convention")
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/qdio: clear DSCI prior to scanning multiple input queues · ec50c80c
      Julian Wiedmann authored
      commit 1e4a382f upstream.
      
      For devices with multiple input queues, tiqdio_call_inq_handlers()
      iterates over all input queues and clears the device's DSCI
      during each iteration. If the DSCI is re-armed during one
      of the later iterations, we therefore do not scan the previous
      queues again.
      The re-arming also raises a new adapter interrupt. But its
      handler does not trigger a rescan for the device, as the DSCI
      has already been erroneously cleared.
      This can result in queue stalls on devices with multiple
      input queues.
      
      Fix it by clearing the DSCI just once, prior to scanning the queues.
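
The difference between the two orderings can be modelled in a few lines of user-space C. Everything in this sketch is hypothetical: `dev_sketch`, the two-queue layout, and the `late_work` field (which simulates the device posting new work to queue 0 right after that queue was scanned) are assumptions made for the illustration and do not reflect the real qdio data structures.

```c
#include <assert.h>

#define NR_QUEUES 2

struct dev_sketch {
    unsigned char dsci;       /* shared "work pending" indicator */
    int pending[NR_QUEUES];   /* outstanding items per input queue */
    int late_work;            /* items that arrive on queue 0 after its scan */
};

/* Drain queue q; afterwards the simulated device may post late work. */
static void scan_queue(struct dev_sketch *dev, int q)
{
    assert(q >= 0 && q < NR_QUEUES);
    dev->pending[q] = 0;
    if (q == 0 && dev->late_work) {
        /* New work for queue 0 arrives and the indicator is re-armed. */
        dev->pending[0] = dev->late_work;
        dev->late_work = 0;
        dev->dsci = 1;
    }
}

/* Ordering described as buggy: the indicator is cleared on every iteration. */
static void scan_clear_each(struct dev_sketch *dev)
{
    for (int q = 0; q < NR_QUEUES; q++) {
        dev->dsci = 0;
        scan_queue(dev, q);
    }
}

/* Ordering described as the fix: clear once, then scan all queues. */
static void scan_clear_once(struct dev_sketch *dev)
{
    dev->dsci = 0;
    for (int q = 0; q < NR_QUEUES; q++)
        scan_queue(dev, q);
}
```

In the first ordering the clear performed for queue 1 wipes out the re-arm caused by the late work on queue 0, so the work stays pending with no indicator set. In the second ordering the indicator survives the pass and signals that another scan is needed.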
      
      As the code is moved in front of the loop, we also need to access
      the DSCI directly (i.e. irq->dsci) instead of going via each queue's
      parent pointer to the same irq. This is not a functional change,
      and a follow-up patch will clean up the other users.
      
      In practice, this bug only affects CQ-enabled HiperSockets devices,
      i.e. devices with sysfs-attribute "hsuid" set. Setting a hsuid is
      needed for AF_IUCV socket applications that use HiperSockets
      communication.
      
      Fixes: 104ea556 ("qdio: support asynchronous delivery of storage blocks")
      Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
      Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Bluetooth: Add another AR3012 04ca:3018 device · 00cfdbf5
      Dmitry Tunin authored
      commit 441ad62d upstream.
      
      T:  Bus=01 Lev=01 Prnt=01 Port=07 Cnt=04 Dev#=  5 Spd=12  MxCh= 0
      D:  Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
      P:  Vendor=04ca ProdID=3018 Rev=00.01
      C:  #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
      I:  If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      I:  If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      Signed-off-by: Dmitry Tunin <hanipouspilot@gmail.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: VMX: use correct vmcs_read/write for guest segment selector/base · cae929bd
      Chao Peng authored
      commit 96794e4e upstream.
      
      The guest segment selector is a 16-bit field and the guest segment base
      is a natural-width field. Fix two incorrect invocations accordingly.
      
      Without this patch, build fails when aggressive inlining is used with ICC.
      Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: s390: Disable dirty log retrieval for UCONTROL guests · 0a3df041
      Janosch Frank authored
      commit e1e8a962 upstream.
      
      User controlled KVM guests do not support the dirty log, as they have
      no single gmap that we can check for changes.
      
      As they have no single gmap, kvm->arch.gmap is NULL and all further
      referencing to it for dirty checking will result in a NULL
      dereference.
      
      Let's return -EINVAL if a caller tries to sync dirty logs for a
      UCONTROL guest.
      
      Fixes: 15f36ebd ("KVM: s390: Add proper dirty bitmap support to S390 kvm.")
      Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
      Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • serial: 8250_pci: Add MKS Tenta SCOM-0800 and SCOM-0801 cards · 4b34572e
      Ian Abbott authored
      commit 1c9c858e upstream.
      
      The MKS Instruments SCOM-0800 and SCOM-0801 cards (originally by Tenta
      Technologies) are 3U CompactPCI serial cards with 4 and 8 serial ports,
      respectively.  The first 4 ports are implemented by an OX16PCI954 chip,
      and the second 4 ports are implemented by an OX16C954 chip on a local
      bus, bridged by the second PCI function of the OX16PCI954.  The ports
      are jumper-selectable as RS-232 and RS-422/485, and the UARTs use a
      non-standard oscillator frequency of 20 MHz (base_baud = 1250000).
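
The base_baud figure follows from the usual 8250 convention that the UART divides its input clock by 16. A minimal sketch of that arithmetic, with hypothetical helper names that are not part of the driver:

```c
#include <assert.h>

/* An 8250-style UART samples at 16x, so the highest rate is clock / 16. */
static unsigned int base_baud_from_clock(unsigned int uart_clock_hz)
{
    return uart_clock_hz / 16;
}

/* Divisor latch value for a requested rate, rounded to the nearest integer. */
static unsigned int divisor_for(unsigned int base_baud, unsigned int baud)
{
    assert(baud != 0);
    return (base_baud + baud / 2) / baud;
}
```

For the 20 MHz oscillator, `base_baud_from_clock(20000000)` gives 1250000, matching the value quoted above; the common 1.8432 MHz oscillator gives 115200 by the same rule.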
      Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tty: n_hdlc: get rid of racy n_hdlc.tbuf · 999853d9
      Alexander Popov authored
      commit 82f2341c upstream.
      
      Currently N_HDLC line discipline uses a self-made singly linked list for
      data buffers and has n_hdlc.tbuf pointer for buffer retransmitting after
      an error.
      
      Commit be10eb75 ("tty: n_hdlc add buffer flushing") introduced racy
      access to n_hdlc.tbuf. After a tx error, concurrent flush_tx_queue() and
      n_hdlc_send_frames() can put one data buffer on tx_free_buf_list twice.
      That causes a double free in n_hdlc_release().
      
      Let's use the standard kernel linked list and get rid of n_hdlc.tbuf:
      in case of a tx error, put the current data buffer after the head of
      tx_buf_list.
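
The idea of putting a failed buffer back right after the list head, so that it is retried first, can be sketched with a small circular doubly linked list in user-space C. The types and helpers below (`list_node`, `buf_sketch`, `list_add_front`, and so on) are written for this illustration and only imitate the shape of the kernel list API; locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
    struct list_node *prev, *next;
};

struct buf_sketch {
    struct list_node link;  /* first member, so a node pointer is a buffer pointer */
    int id;
};

static void list_init(struct list_node *head)
{
    head->prev = head->next = head;
}

/* Insert right after the head: the node becomes the first element. */
static void list_add_front(struct list_node *node, struct list_node *head)
{
    assert(node != NULL && head != NULL);
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

/* Insert right before the head: the node becomes the last element. */
static void list_add_back(struct list_node *node, struct list_node *head)
{
    assert(node != NULL && head != NULL);
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Detach and return the first buffer, or NULL if the list is empty. */
static struct buf_sketch *list_pop_front(struct list_node *head)
{
    struct list_node *node = head->next;

    if (node == head)
        return NULL;
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->prev = node->next = node;
    return (struct buf_sketch *)node;
}
```

A sender would pop the first buffer, attempt transmission, and on error call `list_add_front()` to put it back. The buffer is then always on exactly one list, so there is no separate pointer that a concurrent flush could also release.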
      Signed-off-by: Alexander Popov <alex.popov@linux.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • TTY: n_hdlc, fix lockdep false positive · 59c4d783
      Jiri Slaby authored
      commit e9b736d8 upstream.
      
      The class of 4 n_hdlc buf locks is the same because a single function
      n_hdlc_buf_list_init is used to init all the locks. But since
      flush_tx_queue takes n_hdlc->tx_buf_list.spinlock and then calls
      n_hdlc_buf_put which takes n_hdlc->tx_free_buf_list.spinlock, lockdep
      emits a warning:
      =============================================
      [ INFO: possible recursive locking detected ]
      4.3.0-25.g91e30a7-default #1 Not tainted
      ---------------------------------------------
      a.out/1248 is trying to acquire lock:
       (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fd020>] n_hdlc_buf_put+0x20/0x60 [n_hdlc]
      
      but task is already holding lock:
       (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fdc07>] n_hdlc_tty_ioctl+0x127/0x1d0 [n_hdlc]
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&list->spinlock)->rlock);
        lock(&(&list->spinlock)->rlock);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      2 locks held by a.out/1248:
       #0:  (&tty->ldisc_sem){++++++}, at: [<ffffffff814c9eb0>] tty_ldisc_ref_wait+0x20/0x50
       #1:  (&(&list->spinlock)->rlock){......}, at: [<ffffffffa01fdc07>] n_hdlc_tty_ioctl+0x127/0x1d0 [n_hdlc]
      ...
      Call Trace:
      ...
       [<ffffffff81738fd0>] _raw_spin_lock_irqsave+0x50/0x70
       [<ffffffffa01fd020>] n_hdlc_buf_put+0x20/0x60 [n_hdlc]
       [<ffffffffa01fdc24>] n_hdlc_tty_ioctl+0x144/0x1d0 [n_hdlc]
       [<ffffffff814c25c1>] tty_ioctl+0x3f1/0xe40
      ...
      
      Fix it by initializing the spin_locks separately. This also removes the
      redundant memset of freshly kzalloc'ed space.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 12 Mar, 2017 26 commits