1. 02 Dec, 2015 40 commits
    • Eric Dumazet's avatar
      packet: fix match_fanout_group() · 391e4815
      Eric Dumazet authored
      commit 161642e2 upstream.
      
      Recent TCP listener patches exposed a prior af_packet bug :
      match_fanout_group() blindly assumes it is always safe
      to cast sk to a packet socket to compare fanout with af_packet_priv
      
      But SYNACK packets can be sent while attached to request_sock, which
      are smaller than a "struct sock".
      
      We can read non existent memory and crash.
      
      Fixes: c0de08d0 ("af_packet: don't emit packet on orig fanout group")
      Fixes: ca6fb065 ("tcp: attach SYNACK messages to request sockets instead of listener")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Eric Leblond <eric@regit.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      391e4815
    • Johannes Berg's avatar
      mac80211: fix driver RSSI event calculations · e3b6c1b8
      Johannes Berg authored
      commit 8ec6d978 upstream.
      
      The ifmgd->ave_beacon_signal value cannot be taken as is for
      comparisons, it must be divided by since it's represented
      like that for better accuracy of the EWMA calculations. This
      would lead to invalid driver RSSI events. Fix the used value.
      
      Fixes: 615f7b9b ("mac80211: add driver RSSI threshold events")
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e3b6c1b8
    • Jay Vosburgh's avatar
      bonding: fix panic on non-ARPHRD_ETHER enslave failure · 86902b22
      Jay Vosburgh authored
      commit 40baec22 upstream.
      
      Since commit 7d5cd2ce529b, when bond_enslave fails on devices that
      are not ARPHRD_ETHER, if needed, it resets the bonding device back to
      ARPHRD_ETHER by calling ether_setup.
      
      	Unfortunately, ether_setup clobbers dev->flags, clearing IFF_UP
      if the bond device is up, leaving it in a quasi-down state without
      having actually gone through dev_close.  For bonding, if any periodic
      work queue items are active (miimon, arp_interval, etc), those will
      remain running, as they are stopped by bond_close.  At this point, if
      the bonding module is unloaded or the bond is deleted, the system will
      panic when the work function is called.
      
      	This panic is resolved by calling dev_close on the bond itself
      prior to calling ether_setup.
      
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarJay Vosburgh <jay.vosburgh@canonical.com>
      Fixes: 7d5cd2ce ("bonding: correctly handle bonding type change on enslave failure")
      Acked-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      86902b22
    • Peter Feiner's avatar
      perf trace: Fix documentation for -i · fee47cbd
      Peter Feiner authored
      commit 956959f6 upstream.
      
      The -i flag was incorrectly listed as a short flag for --no-inherit.  It
      should have only been listed as a short flag for --input.
      
      This documentation error has existed since the --input flag was
      introduced in 6810fc91 (perf trace: Add
      option to analyze events in a file versus live).
      Signed-off-by: default avatarPeter Feiner <pfeiner@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Link: http://lkml.kernel.org/r/1446657706-14518-1-git-send-email-pfeiner@google.com
      Fixes: 6810fc91 ("perf trace: Add option to analyze events in a file versus live")
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      fee47cbd
    • Michal Kubeček's avatar
      ipv6: fix tunnel error handling · dc8e5f6e
      Michal Kubeček authored
      commit ebac62fe upstream.
      
      Both tunnel6_protocol and tunnel46_protocol share the same error
      handler, tunnel6_err(), which traverses through tunnel6_handlers list.
      For ipip6 tunnels, we need to traverse tunnel46_handlers as we do e.g.
      in tunnel46_rcv(). Current code can generate an ICMPv6 error message
      with an IPv4 packet embedded in it.
      
      Fixes: 73d605d1 ("[IPSEC]: changing API of xfrm6_tunnel_register")
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      dc8e5f6e
    • Ralf Baechle's avatar
      MIPS: atomic: Fix comment describing atomic64_add_unless's return value. · af59174f
      Ralf Baechle authored
      commit f25319d2 upstream.
      Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Fixes: f24219b4
      (cherry picked from commit f0a232cde7be18a207fd057dd79bbac8a0a45dec)
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      af59174f
    • Dan Carpenter's avatar
      devres: fix a for loop bounds check · c4ea741d
      Dan Carpenter authored
      commit 1f35d04a upstream.
      
      The iomap[] array has PCIM_IOMAP_MAX (6) elements and not
      DEVICE_COUNT_RESOURCE (16).  This bug was found using a static checker.
      It may be that the "if (!(mask & (1 << i)))" check means we never
      actually go past the end of the array in real life.
      
      Fixes: ec04b075 ('iomap: implement pcim_iounmap_regions()')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      c4ea741d
    • Andy Shevchenko's avatar
      dmaengine: dw: convert to __ffs() · df3c2125
      Andy Shevchenko authored
      commit 39416677 upstream.
      
      We replace __fls() by __ffs() since we have to find a *minimum* data width that
      satisfies both source and destination.
      
      While here, rename dwc_fast_fls() to dwc_fast_ffs() which it really is.
      
      Fixes: 4c2d56c5 (dw_dmac: introduce dwc_fast_fls())
      Signed-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: default avatarVinod Koul <vinod.koul@intel.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      df3c2125
    • Dan Carpenter's avatar
      mwifiex: fix mwifiex_rdeeprom_read() · aa81da86
      Dan Carpenter authored
      commit 1f9c6e1b upstream.
      
      There were several bugs here.
      
      1)  The done label was in the wrong place so we didn't copy any
          information out when there was no command given.
      
      2)  We were using PAGE_SIZE as the size of the buffer instead of
          "PAGE_SIZE - pos".
      
      3)  snprintf() returns the number of characters that would have been
          printed if there were enough space.  If there was not enough space
          (and we had fixed the memory corruption bug #2) then it would result
          in an information leak when we do simple_read_from_buffer().  I've
          changed it to use scnprintf() instead.
      
      I also removed the initialization at the start of the function, because
      I thought it made the code a little more clear.
      
      Fixes: 5e6e3a92 ('wireless: mwifiex: initial commit for Marvell mwifiex driver')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: default avatarAmitkumar Karwar <akarwar@marvell.com>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      aa81da86
    • Valentin Rothberg's avatar
      wm831x_power: Use IRQF_ONESHOT to request threaded IRQs · 7ff0ff62
      Valentin Rothberg authored
      commit 90adf98d upstream.
      
      Since commit 1c6c6952 ("genirq: Reject bogus threaded irq requests")
      threaded IRQs without a primary handler need to be requested with
      IRQF_ONESHOT, otherwise the request will fail.
      
      scripts/coccinelle/misc/irqf_oneshot.cocci detected this issue.
      
      Fixes: b5874f33 ("wm831x_power: Use genirq")
      Signed-off-by: default avatarValentin Rothberg <valentinrothberg@gmail.com>
      Signed-off-by: default avatarSebastian Reichel <sre@kernel.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      7ff0ff62
    • Maciej W. Rozycki's avatar
      binfmt_elf: Don't clobber passed executable's file header · c820f3a7
      Maciej W. Rozycki authored
      commit b582ef5c upstream.
      
      Do not clobber the buffer space passed from `search_binary_handler' and
      originally preloaded by `prepare_binprm' with the executable's file
      header by overwriting it with its interpreter's file header.  Instead
      keep the buffer space intact and directly use the data structure locally
      allocated for the interpreter's file header, fixing a bug introduced in
      2.1.14 with loadable module support (linux-mips.org commit beb11695
      [Import of Linux/MIPS 2.1.14], predating kernel.org repo's history).
      Adjust the amount of data read from the interpreter's file accordingly.
      
      This was not an issue before loadable module support, because back then
      `load_elf_binary' was executed only once for a given ELF executable,
      whether the function succeeded or failed.
      
      With loadable module support supported and enabled, upon a failure of
      `load_elf_binary' -- which may for example be caused by architecture
      code rejecting an executable due to a missing hardware feature requested
      in the file header -- a module load is attempted and then the function
      reexecuted by `search_binary_handler'.  With the executable's file
      header replaced with its interpreter's file header the executable can
      then be erroneously accepted in this subsequent attempt.
      Signed-off-by: default avatarMaciej W. Rozycki <macro@imgtec.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      c820f3a7
    • David Howells's avatar
      FS-Cache: Handle a write to the page immediately beyond the EOF marker · 495b046a
      David Howells authored
      commit 102f4d90 upstream.
      
      Handle a write being requested to the page immediately beyond the EOF
      marker on a cache object.  Currently this gets an assertion failure in
      CacheFiles because the EOF marker is used there to encode information about
      a partial page at the EOF - which could lead to an unknown blank spot in
      the file if we extend the file over it.
      
      The problem is actually in fscache where we check the index of the page
      being written against store_limit.  store_limit is set to the number of
      pages that we're allowed to store by fscache_set_store_limit() - which
      means it's one more than the index of the last page we're allowed to store.
      The problem is that we permit writing to a page with an index _equal_ to
      the store limit - when we should reject that case.
      
      Whilst we're at it, change the triggered assertion in CacheFiles to just
      return -ENOBUFS instead.
      
      The assertion failure looks something like this:
      
      CacheFiles: Assertion failed
      1000 < 7b1 is false
      ------------[ cut here ]------------
      kernel BUG at fs/cachefiles/rdwr.c:962!
      ...
      RIP: 0010:[<ffffffffa02c9e83>]  [<ffffffffa02c9e83>] cachefiles_write_page+0x273/0x2d0 [cachefiles]
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      [ kamal: backport to 3.13-stable: no __kernel_write(); thanks Ben H. ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      495b046a
    • Kinglong Mee's avatar
      FS-Cache: Don't override netfs's primary_index if registering failed · 4ea591a2
      Kinglong Mee authored
      commit b130ed59 upstream.
      
      Only override netfs->primary_index when registering success.
      Signed-off-by: default avatarKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      4ea591a2
    • Kinglong Mee's avatar
      FS-Cache: Increase reference of parent after registering, netfs success · ef7ea66b
      Kinglong Mee authored
      commit 86108c2e upstream.
      
      If netfs exist, fscache should not increase the reference of parent's
      usage and n_children, otherwise, never be decreased.
      
      v2: thanks David's suggest,
       move increasing reference of parent if success
       use kmem_cache_free() freeing primary_index directly
      
      v3: don't move "netfs->primary_index->parent = &fscache_fsdef_index;"
      Signed-off-by: default avatarKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ef7ea66b
    • Egbert Eich's avatar
      drm/ast: Initialized data needed to map fbdev memory · 4b171865
      Egbert Eich authored
      commit 28fb4cb7 upstream.
      
      Due to a missing initialization there was no way to map fbdev memory.
      Thus for example using the Xserver with the fbdev driver failed.
      This fix adds initialization for fix.smem_start and fix.smem_len
      in the fb_info structure, which fixes this problem.
      Requested-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarEgbert Eich <eich@suse.de>
      [pulled from SuSE tree by me - airlied]
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      4b171865
    • Paolo Bonzini's avatar
      KVM: svm: unconditionally intercept #DB · d3bb3b55
      Paolo Bonzini authored
      commit cbdb967a upstream.
      
      This is needed to avoid the possibility that the guest triggers
      an infinite stream of #DB exceptions (CVE-2015-8104).
      
      VMX is not affected: because it does not save DR6 in the VMCS,
      it already intercepts #DB unconditionally.
      Reported-by: default avatarJan Beulich <jbeulich@suse.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      d3bb3b55
    • Eric Northup's avatar
      KVM: x86: work around infinite loop in microcode when #AC is delivered · 7bde6109
      Eric Northup authored
      commit 54a20552 upstream.
      
      It was found that a guest can DoS a host by triggering an infinite
      stream of "alignment check" (#AC) exceptions.  This causes the
      microcode to enter an infinite loop where the core never receives
      another interrupt.  The host kernel panics pretty quickly due to the
      effects (CVE-2015-5307).
      Signed-off-by: default avatarEric Northup <digitaleric@google.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      7bde6109
    • Nadav Amit's avatar
      KVM: x86: Defining missing x86 vectors · 21836c8c
      Nadav Amit authored
      commit c9cdd085 upstream.
      
      Defining XE, XM and VE vector numbers.
      Signed-off-by: default avatarNadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      [ kamal: 3.13-stable prereq ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      21836c8c
    • K. Y. Srinivasan's avatar
      storvsc: Don't set the SRB_FLAGS_QUEUE_ACTION_ENABLE flag · b58827ea
      K. Y. Srinivasan authored
      commit 8cf308e1 upstream.
      
      Don't set the SRB_FLAGS_QUEUE_ACTION_ENABLE flag since we are not specifying
      tags.  Without this, the qlogic driver doesn't work properly with storvsc.
      Signed-off-by: default avatarK. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Odin.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      b58827ea
    • Filipe Manana's avatar
      Btrfs: fix race when listing an inode's xattrs · 99743d42
      Filipe Manana authored
      commit f1cd1f0b upstream.
      
      When listing a inode's xattrs we have a time window where we race against
      a concurrent operation for adding a new hard link for our inode that makes
      us not return any xattr to user space. In order for this to happen, the
      first xattr of our inode needs to be at slot 0 of a leaf and the previous
      leaf must still have room for an inode ref (or extref) item, and this can
      happen because an inode's listxattrs callback does not lock the inode's
      i_mutex (nor does the VFS does it for us), but adding a hard link to an
      inode makes the VFS lock the inode's i_mutex before calling the inode's
      link callback.
      
      If we have the following leafs:
      
                     Leaf X (has N items)                    Leaf Y
      
       [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 XATTR_ITEM 12345), ... ]
                 slot N - 2         slot N - 1              slot 0
      
      The race illustrated by the following sequence diagram is possible:
      
             CPU 1                                               CPU 2
      
        btrfs_listxattr()
      
          searches for key (257 XATTR_ITEM 0)
      
          gets path with path->nodes[0] == leaf X
          and path->slots[0] == N
      
          because path->slots[0] is >=
          btrfs_header_nritems(leaf X), it calls
          btrfs_next_leaf()
      
          btrfs_next_leaf()
            releases the path
      
                                                         adds key (257 INODE_REF 666)
                                                         to the end of leaf X (slot N),
                                                         and leaf X now has N + 1 items
      
            searches for the key (257 INODE_REF 256),
            with path->keep_locks == 1, because that
            is the last key it saw in leaf X before
            releasing the path
      
            ends up at leaf X again and it verifies
            that the key (257 INODE_REF 256) is no
            longer the last key in leaf X, so it
            returns with path->nodes[0] == leaf X
            and path->slots[0] == N, pointing to
            the new item with key (257 INODE_REF 666)
      
          btrfs_listxattr's loop iteration sees that
          the type of the key pointed by the path is
          different from the type BTRFS_XATTR_ITEM_KEY
          and so it breaks the loop and stops looking
          for more xattr items
            --> the application doesn't get any xattr
                listed for our inode
      
      So fix this by breaking the loop only if the key's type is greater than
      BTRFS_XATTR_ITEM_KEY and skip the current key if its type is smaller.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      [ luis: backported to 3.16:
        - drop btrfs_key_type(), which was dropped upstream by
          962a298f ("btrfs: kill the key type accessor helpers") ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      99743d42
    • Peter Oberparleiter's avatar
      scsi_sysfs: Fix queue_ramp_up_period return code · e886db69
      Peter Oberparleiter authored
      commit 863e02d0 upstream.
      
      Writing a number to /sys/bus/scsi/devices/<sdev>/queue_ramp_up_period
      returns the value of that number instead of the number of bytes written.
      This behavior can confuse programs expecting POSIX write() semantics.
      Fix this by returning the number of bytes written instead.
      Signed-off-by: default avatarPeter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Reviewed-by: default avatarHannes Reinecke <hare@suse.de>
      Reviewed-by: default avatarMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
      Reviewed-by: default avatarEwan D. Milne <emilne@redhat.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e886db69
    • Peter Zijlstra's avatar
      perf: Fix inherited events vs. tracepoint filters · a506b23e
      Peter Zijlstra authored
      commit b71b437e upstream.
      
      Arnaldo reported that tracepoint filters seem to misbehave (ie. not
      apply) on inherited events.
      
      The fix is obvious; filters are only set on the actual (parent)
      event, use the normal pattern of using this parent event for filters.
      This is safe because each child event has a reference to it.
      Reported-by: default avatarArnaldo Carvalho de Melo <acme@kernel.org>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@kernel.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20151102095051.GN17308@twins.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      a506b23e
    • Filipe Manana's avatar
      Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow · 14fbf7ee
      Filipe Manana authored
      commit 1d512cb7 upstream.
      
      If we are using the NO_HOLES feature, we have a tiny time window when
      running delalloc for a nodatacow inode where we can race with a concurrent
      link or xattr add operation leading to a BUG_ON.
      
      This happens because at run_delalloc_nocow() we end up casting a leaf item
      of type BTRFS_INODE_[REF|EXTREF]_KEY or of type BTRFS_XATTR_ITEM_KEY to a
      file extent item (struct btrfs_file_extent_item) and then analyse its
      extent type field, which won't match any of the expected extent types
      (values BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]) and therefore trigger an
      explicit BUG_ON(1).
      
      The following sequence diagram shows how the race happens when running a
      no-cow dellaloc range [4K, 8K[ for inode 257 and we have the following
      neighbour leafs:
      
                   Leaf X (has N items)                    Leaf Y
      
       [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 EXTENT_DATA 8192), ... ]
                    slot N - 2         slot N - 1              slot 0
      
       (Note the implicit hole for inode 257 regarding the [0, 8K[ range)
      
             CPU 1                                         CPU 2
      
       run_dealloc_nocow()
         btrfs_lookup_file_extent()
           --> searches for a key with value
               (257 EXTENT_DATA 4096) in the
               fs/subvol tree
           --> returns us a path with
               path->nodes[0] == leaf X and
               path->slots[0] == N
      
         because path->slots[0] is >=
         btrfs_header_nritems(leaf X), it
         calls btrfs_next_leaf()
      
         btrfs_next_leaf()
           --> releases the path
      
                                                    hard link added to our inode,
                                                    with key (257 INODE_REF 500)
                                                    added to the end of leaf X,
                                                    so leaf X now has N + 1 keys
      
           --> searches for the key
               (257 INODE_REF 256), because
               it was the last key in leaf X
               before it released the path,
               with path->keep_locks set to 1
      
           --> ends up at leaf X again and
               it verifies that the key
               (257 INODE_REF 256) is no longer
               the last key in the leaf, so it
               returns with path->nodes[0] ==
               leaf X and path->slots[0] == N,
               pointing to the new item with
               key (257 INODE_REF 500)
      
         the loop iteration of run_dealloc_nocow()
         does not break out the loop and continues
         because the key referenced in the path
         at path->nodes[0] and path->slots[0] is
         for inode 257, its type is < BTRFS_EXTENT_DATA_KEY
         and its offset (500) is less then our delalloc
         range's end (8192)
      
         the item pointed by the path, an inode reference item,
         is (incorrectly) interpreted as a file extent item and
         we get an invalid extent type, leading to the BUG_ON(1):
      
         if (extent_type == BTRFS_FILE_EXTENT_REG ||
            extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
             (...)
         } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
             (...)
         } else {
             BUG_ON(1)
         }
      
      The same can happen if a xattr is added concurrently and ends up having
      a key with an offset smaller then the delalloc's range end.
      
      So fix this by skipping keys with a type smaller than
      BTRFS_EXTENT_DATA_KEY.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      14fbf7ee
    • Filipe Manana's avatar
      Btrfs: fix race leading to incorrect item deletion when dropping extents · 8939f4dc
      Filipe Manana authored
      commit aeafbf84 upstream.
      
      While running a stress test I got the following warning triggered:
      
        [191627.672810] ------------[ cut here ]------------
        [191627.673949] WARNING: CPU: 8 PID: 8447 at fs/btrfs/file.c:779 __btrfs_drop_extents+0x391/0xa50 [btrfs]()
        (...)
        [191627.701485] Call Trace:
        [191627.702037]  [<ffffffff8145f077>] dump_stack+0x4f/0x7b
        [191627.702992]  [<ffffffff81095de5>] ? console_unlock+0x356/0x3a2
        [191627.704091]  [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
        [191627.705380]  [<ffffffffa0664499>] ? __btrfs_drop_extents+0x391/0xa50 [btrfs]
        [191627.706637]  [<ffffffff8104b46d>] warn_slowpath_null+0x1a/0x1c
        [191627.707789]  [<ffffffffa0664499>] __btrfs_drop_extents+0x391/0xa50 [btrfs]
        [191627.709155]  [<ffffffff8115663c>] ? cache_alloc_debugcheck_after.isra.32+0x171/0x1d0
        [191627.712444]  [<ffffffff81155007>] ? kmemleak_alloc_recursive.constprop.40+0x16/0x18
        [191627.714162]  [<ffffffffa06570c9>] insert_reserved_file_extent.constprop.40+0x83/0x24e [btrfs]
        [191627.715887]  [<ffffffffa065422b>] ? start_transaction+0x3bb/0x610 [btrfs]
        [191627.717287]  [<ffffffffa065b604>] btrfs_finish_ordered_io+0x273/0x4e2 [btrfs]
        [191627.728865]  [<ffffffffa065b888>] finish_ordered_fn+0x15/0x17 [btrfs]
        [191627.730045]  [<ffffffffa067d688>] normal_work_helper+0x14c/0x32c [btrfs]
        [191627.731256]  [<ffffffffa067d96a>] btrfs_endio_write_helper+0x12/0x14 [btrfs]
        [191627.732661]  [<ffffffff81061119>] process_one_work+0x24c/0x4ae
        [191627.733822]  [<ffffffff810615b0>] worker_thread+0x206/0x2c2
        [191627.734857]  [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
        [191627.736052]  [<ffffffff810613aa>] ? process_scheduled_works+0x2f/0x2f
        [191627.737349]  [<ffffffff810669a6>] kthread+0xef/0xf7
        [191627.738267]  [<ffffffff810f3b3a>] ? time_hardirqs_on+0x15/0x28
        [191627.739330]  [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
        [191627.741976]  [<ffffffff81465592>] ret_from_fork+0x42/0x70
        [191627.743080]  [<ffffffff810668b7>] ? __kthread_parkme+0xad/0xad
        [191627.744206] ---[ end trace bbfddacb7aaada8d ]---
      
        $ cat -n fs/btrfs/file.c
        691  int __btrfs_drop_extents(struct btrfs_trans_handle *trans,
        (...)
        758                  btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
        759                  if (key.objectid > ino ||
        760                      key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
        761                          break;
        762
        763                  fi = btrfs_item_ptr(leaf, path->slots[0],
        764                                      struct btrfs_file_extent_item);
        765                  extent_type = btrfs_file_extent_type(leaf, fi);
        766
        767                  if (extent_type == BTRFS_FILE_EXTENT_REG ||
        768                      extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
        (...)
        774                  } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
        (...)
        778                  } else {
        779                          WARN_ON(1);
        780                          extent_end = search_start;
        781                  }
        (...)
      
      This happened because the item we were processing did not match a file
      extent item (its key type != BTRFS_EXTENT_DATA_KEY), and even on this
      case we cast the item to a struct btrfs_file_extent_item pointer and
      then find a type field value that does not match any of the expected
      values (BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]). This scenario happens
      due to a tiny time window where a race can happen as exemplified below.
      For example, consider the following scenario where we're using the
      NO_HOLES feature and we have the following two neighbour leafs:
      
                     Leaf X (has N items)                    Leaf Y
      
      [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 EXTENT_DATA 8192), ... ]
                slot N - 2         slot N - 1              slot 0
      
      Our inode 257 has an implicit hole in the range [0, 8K[ (implicit rather
      than explicit because NO_HOLES is enabled). Now if our inode has an
      ordered extent for the range [4K, 8K[ that is finishing, the following
      can happen:
      
                CPU 1                                       CPU 2
      
        btrfs_finish_ordered_io()
          insert_reserved_file_extent()
            __btrfs_drop_extents()
               Searches for the key
                (257 EXTENT_DATA 4096) through
                btrfs_lookup_file_extent()
      
               Key not found and we get a path where
               path->nodes[0] == leaf X and
               path->slots[0] == N
      
               Because path->slots[0] is >=
               btrfs_header_nritems(leaf X), we call
               btrfs_next_leaf()
      
               btrfs_next_leaf() releases the path
      
                                                        inserts key
                                                        (257 INODE_REF 4096)
                                                        at the end of leaf X,
                                                        leaf X now has N + 1 keys,
                                                        and the new key is at
                                                        slot N
      
               btrfs_next_leaf() searches for
               key (257 INODE_REF 256), with
               path->keep_locks set to 1,
               because it was the last key it
               saw in leaf X
      
                 finds it in leaf X again and
                 notices it's no longer the last
                 key of the leaf, so it returns 0
                 with path->nodes[0] == leaf X and
                 path->slots[0] == N (which is now
                 < btrfs_header_nritems(leaf X)),
                 pointing to the new key
                 (257 INODE_REF 4096)
      
               __btrfs_drop_extents() casts the
               item at path->nodes[0], slot
               path->slots[0], to a struct
               btrfs_file_extent_item - it does
               not skip keys for the target
               inode with a type less than
               BTRFS_EXTENT_DATA_KEY
               (BTRFS_INODE_REF_KEY < BTRFS_EXTENT_DATA_KEY)
      
               sees a bogus value for the type
               field triggering the WARN_ON in
               the trace shown above, and sets
               extent_end = search_start (4096)
      
               does the if-then-else logic to
               fixup 0 length extent items created
               by a past bug from hole punching:
      
                 if (extent_end == key.offset &&
                     extent_end >= search_start)
                     goto delete_extent_item;
      
               that evaluates to true and it ends
               up deleting the key pointed to by
               path->slots[0], (257 INODE_REF 4096),
               from leaf X
      
      The same could happen for example for a xattr that ends up having a key
      with an offset value that matches search_start (very unlikely but not
      impossible).
      
      So fix this by ensuring that keys smaller than BTRFS_EXTENT_DATA_KEY are
      skipped, never casted to struct btrfs_file_extent_item and never deleted
      by accident. Also protect against the unexpected case of getting a key
      for a lower inode number by skipping that key and issuing a warning.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8939f4dc
    • Borislav Petkov's avatar
      x86/cpu: Call verify_cpu() after having entered long mode too · 8afa2d46
      Borislav Petkov authored
      commit 04633df0 upstream.
      
      When we get loaded by a 64-bit bootloader, kernel entry point is
      startup_64 in head_64.S. We don't trust any and all bootloaders because
      some will fiddle with CPU configuration so we go ahead and massage each
      CPU into sanity again.
      
      For example, some dell BIOSes have this XD disable feature which set
      IA32_MISC_ENABLE[34] and disable NX. This might be some dumb workaround
      for other OSes but Linux sure doesn't need it.
      
      A similar thing is present in the Surface 3 firmware - see
      https://bugzilla.kernel.org/show_bug.cgi?id=106051 - which sets this bit
      only on the BSP:
      
        # rdmsr -a 0x1a0
        400850089
        850089
        850089
        850089
      
      I know, right?!
      
      There's not even an off switch in there.
      
      So fix all those cases by sanitizing the 64-bit entry point too. For
      that, make verify_cpu() callable in 64-bit mode also.
      Requested-and-debugged-by: default avatar"H. Peter Anvin" <hpa@zytor.com>
      Reported-and-tested-by: default avatarBastien Nocera <bugzilla@hadess.net>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1446739076-21303-1-git-send-email-bp@alien8.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8afa2d46
    • Mathias Krause's avatar
      printk: prevent userland from spoofing kernel messages · 4c2e3415
      Mathias Krause authored
      commit 3824657c upstream.
      
      The following statement of ABI/testing/dev-kmsg is not quite right:
      
         It is not possible to inject messages from userspace with the
         facility number LOG_KERN (0), to make sure that the origin of the
         messages can always be reliably determined.
      
      Userland actually can inject messages with a facility of 0 by abusing the
      fact that the facility is stored in a u8 data type.  By using a facility
      which is a multiple of 256 the assignment of msg->facility in log_store()
      implicitly truncates it to 0, i.e.  LOG_KERN, allowing users of /dev/kmsg
      to spoof kernel messages as shown below:
      
      The following call...
         # printf '<%d>Kernel panic - not syncing: beer empty\n' 0 >/dev/kmsg
      ...leads to the following log entry (dmesg -x | tail -n 1):
         user  :emerg : [   66.137758] Kernel panic - not syncing: beer empty
      
      However, this call...
         # printf '<%d>Kernel panic - not syncing: beer empty\n' 0x800 >/dev/kmsg
      ...leads to the slightly different log entry (note the kernel facility):
         kern  :emerg : [   74.177343] Kernel panic - not syncing: beer empty
      
      Fix that by limiting the user provided facility to 8 bit right from the
      beginning and catch the truncation early.
      
      Fixes: 7ff9554b ("printk: convert byte-buffer to variable-length...")
      Signed-off-by: default avatarMathias Krause <minipli@googlemail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Alex Elder <elder@linaro.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [ kamal: backport to 3.13-stable: retain local 'int i' ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      4c2e3415
    • Oleg Nesterov's avatar
      proc: actually make proc_fd_permission() thread-friendly · c7f00b9b
      Oleg Nesterov authored
      commit 54708d28 upstream.
      
      The commit 96d0df79 ("proc: make proc_fd_permission() thread-friendly")
      fixed the access to /proc/self/fd from sub-threads, but introduced another
      problem: a sub-thread can't access /proc/<tid>/fd/ or /proc/thread-self/fd
      if generic_permission() fails.
      
      Change proc_fd_permission() to check same_thread_group(pid_task(), current).
      
      Fixes: 96d0df79 ("proc: make proc_fd_permission() thread-friendly")
      Reported-by: default avatar"Jin, Yihua" <yihua.jin@intel.com>
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      c7f00b9b
    • Stefan Richter's avatar
      firewire: ohci: fix JMicron JMB38x IT context discovery · 2a3277be
      Stefan Richter authored
      commit 100ceb66 upstream.
      
      Reported by Clifford and Craig for JMicron OHCI-1394 + SDHCI combo
      controllers:  Often or even most of the time, the controller is
      initialized with the message "added OHCI v1.10 device as card 0, 4 IR +
      0 IT contexts, quirks 0x10".  With 0 isochronous transmit DMA contexts
      (IT contexts), applications like audio output are impossible.
      
      However, OHCI-1394 demands that at least 4 IT contexts are implemented
      by the link layer controller, and indeed JMicron JMB38x do implement
      four of them.  Only their IsoXmitIntMask register is unreliable at early
      access.
      
      With my own JMB381 single function controller I found:
        - I can reproduce the problem with a lower probability than Craig's.
        - If I put a loop around the section which clears and reads
          IsoXmitIntMask, then either the first or the second attempt will
          return the correct initial mask of 0x0000000f.  I never encountered
          a case of needing more than a second attempt.
        - Consequently, if I put a dummy reg_read(...IsoXmitIntMaskSet)
          before the first write, the subsequent read will return the correct
          result.
        - If I merely ignore a wrong read result and force the known real
          result, later isochronous transmit DMA usage works just fine.
      
      So let's just fix this chip bug up by the latter method.  Tested with
      JMB381 on kernel 3.13 and 4.3.
      
      Since OHCI-1394 generally requires 4 IT contexts at a minium, this
      workaround is simply applied whenever the initial read of IsoXmitIntMask
      returns 0, regardless whether it's a JMicron chip or not.  I never heard
      of this issue together with any other chip though.
      
      I am not 100% sure that this fix works on the OHCI-1394 part of JMB380
      and JMB388 combo controllers exactly the same as on the JMB381 single-
      function controller, but so far I haven't had a chance to let an owner
      of a combo chip run a patched kernel.
      
      Strangely enough, IsoRecvIntMask is always reported correctly, even
      though it is probed right before IsoXmitIntMask.
      
      Reported-by: Clifford Dunn
      Reported-by: default avatarCraig Moore <craig.moore@qenos.com>
      Signed-off-by: default avatarStefan Richter <stefanr@s5r6.in-berlin.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      2a3277be
    • Alexandra Yates's avatar
      ALSA: hda - Add Intel Lewisburg device IDs Audio · 02d365cd
      Alexandra Yates authored
      commit 5cf92c8b upstream.
      
      Adding Intel codename Lewisburg platform device IDs for audio.
      
      [rearranged the position by tiwai]
      Signed-off-by: default avatarAlexandra Yates <alexandra.yates@linux.intel.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      02d365cd
    • Takashi Iwai's avatar
      ALSA: hda - Apply pin fixup for HP ProBook 6550b · 3275d8fa
      Takashi Iwai authored
      commit c932b98c upstream.
      
      HP ProBook 6550b needs the same pin fixup applied to other HP B-series
      laptops with docks for making its headphone and dock headphone jacks
      working properly.  We just need to add the codec SSID to the list.
      
      Bugzilla: https://bugzilla.kernel.org/attachment.cgi?id=191971Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      3275d8fa
    • Radim Krčmář's avatar
      KVM: VMX: fix SMEP and SMAP without EPT · d9de1138
      Radim Krčmář authored
      commit 656ec4a4 upstream.
      
      The comment in code had it mostly right, but we enable paging for
      emulated real mode regardless of EPT.
      
      Without EPT (which implies emulated real mode), secondary VCPUs won't
      start unless we disable SM[AE]P when the guest doesn't use paging.
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      d9de1138
    • Feng Wu's avatar
      KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode · 39dbf66e
      Feng Wu authored
      commit e1e746b3 upstream.
      
      SMAP is disabled if CPU is in non-paging mode in hardware.
      However KVM always uses paging mode to emulate guest non-paging
      mode with TDP. To emulate this behavior, SMAP needs to be
      manually disabled when guest switches to non-paging mode.
      Signed-off-by: default avatarFeng Wu <feng.wu@intel.com>
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      [ kamal: 3.13-stable prereq for
        656ec4a4 KVM: VMX: fix SMEP and SMAP without EPT ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      39dbf66e
    • libin's avatar
      recordmcount: Fix endianness handling bug for nop_mcount · 9514f3ce
      libin authored
      commit c84da8b9 upstream.
      
      In nop_mcount, shdr->sh_offset and welp->r_offset should handle
      endianness properly, otherwise it will trigger Segmentation fault
      if the recordmcount main and file.o have different endianness.
      
      Link: http://lkml.kernel.org/r/563806C7.7070606@huawei.comSigned-off-by: default avatarLi Bin <huawei.libin@huawei.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      9514f3ce
    • Arik Nemtsov's avatar
      mac80211: allow null chandef in tracing · 1d4a161a
      Arik Nemtsov authored
      commit 254d3dfe upstream.
      
      In TDLS channel-switch operations the chandef can sometimes be NULL.
      Avoid an oops in the trace code for these cases and just print a
      chandef full of zeros.
      
      Fixes: a7a6bdd0 ("mac80211: introduce TDLS channel switch ops")
      Signed-off-by: default avatarArik Nemtsov <arikx.nemtsov@intel.com>
      Signed-off-by: default avatarEmmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1d4a161a
    • sumit.saxena@avagotech.com's avatar
      megaraid_sas : SMAP restriction--do not access user memory from IOCTL code · 0efc1968
      sumit.saxena@avagotech.com authored
      commit 323c4a02 upstream.
      
      This is an issue on SMAP enabled CPUs and 32 bit apps running on 64 bit
      OS. Do not access user memory from kernel code. The SMAP bit restricts
      accessing user memory from kernel code.
      Signed-off-by: default avatarSumit Saxena <sumit.saxena@avagotech.com>
      Signed-off-by: default avatarKashyap Desai <kashyap.desai@avagotech.com>
      Reviewed-by: default avatarTomas Henzl <thenzl@redhat.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      0efc1968
    • Max Filippov's avatar
      xtensa: fixes for configs without loop option · ab5abbbb
      Max Filippov authored
      commit 5029615e upstream.
      
      Build-time fixes:
      - make lbeg/lend/lcount save/restore conditional on kernel entry;
      - don't clear lcount in platform_restart functions unconditionally.
      
      Run-time fixes:
      - use correct end of range register in __endla paired with __loopt, not
        the unused temporary register. This fixes .bss zero-initialization.
        Update comments in asmmacro.h;
      - don't clobber a10 in the usercopy that leads to access to unmapped
        memory.
      Signed-off-by: default avatarMax Filippov <jcmvbkbc@gmail.com>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ab5abbbb
    • Herbert Xu's avatar
      crypto: algif_hash - Only export and import on sockets with data · 6da6ef82
      Herbert Xu authored
      commit 4afa5f96 upstream.
      
      The hash_accept call fails to work on sockets that have not received
      any data.  For some algorithm implementations it may cause crashes.
      
      This patch fixes this by ensuring that we only export and import on
      sockets that have received data.
      Reported-by: default avatarHarsh Jain <harshjain.prof@gmail.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Tested-by: default avatarStephan Mueller <smueller@chronox.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      6da6ef82
    • Mauricio Faria de Oliveira's avatar
      Revert "dm mpath: fix stalls when handling invalid ioctls" · 14d2bdec
      Mauricio Faria de Oliveira authored
      commit 47796938 upstream.
      
      This reverts commit a1989b33.
      
      That commit introduced a regression at least for the case of the SG_IO ioctl()
      running without CAP_SYS_RAWIO capability (e.g., unprivileged users) when there
      are no active paths: the ioctl() fails with the ENOTTY errno immediately rather
      than blocking due to queue_if_no_path until a path becomes active, for example.
      
      That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices
      (qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2])
      from multipath devices; which leads to SCSI/filesystem errors in such a guest.
      
      More general scenarios can hit that regression too. The following demonstration
      employs a SG_IO ioctl() with a standard SCSI INQUIRY command for this objective
      (some output & user changes omitted for brevity and comments added for clarity).
      
      Reverting that commit restores normal operation (queueing) in failing scenarios;
      tested on linux-next (next-20151022).
      
      1) Test-case is based on sg_simple0 [3] (just SG_IO; remove SG_GET_VERSION_NUM)
      
          $ cat sg_simple0.c
          ... see [3] ...
          $ sed '/SG_GET_VERSION_NUM/,/}/d' sg_simple0.c > sgio_inquiry.c
          $ gcc sgio_inquiry.c -o sgio_inquiry
      
      2) The ioctl() works fine with active paths present.
      
          # multipath -l 85ag56
          85ag56 (...) dm-19 IBM     ,2145
          size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
          |-+- policy='service-time 0' prio=0 status=active
          | |- 8:0:11:0  sdz  65:144  active undef running
          | `- 9:0:9:0   sdbf 67:144  active undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            |- 8:0:12:0  sdae 65:224  active undef running
            `- 9:0:12:0  sdbo 68:32   active undef running
      
          $ ./sgio_inquiry /dev/mapper/85ag56
          Some of the INQUIRY command's response:
              IBM       2145              0000
          INQUIRY duration=0 millisecs, resid=0
      
      3) The ioctl() fails with ENOTTY errno with _no_ active paths present,
         for unprivileged users (rather than blocking due to queue_if_no_path).
      
          # for path in $(multipath -l 85ag56 | grep -o 'sd[a-z]\+'); \
                do multipathd -k"fail path $path"; done
      
          # multipath -l 85ag56
          85ag56 (...) dm-19 IBM     ,2145
          size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
          |-+- policy='service-time 0' prio=0 status=enabled
          | |- 8:0:11:0  sdz  65:144  failed undef running
          | `- 9:0:9:0   sdbf 67:144  failed undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            |- 8:0:12:0  sdae 65:224  failed undef running
            `- 9:0:12:0  sdbo 68:32   failed undef running
      
          $ ./sgio_inquiry /dev/mapper/85ag56
          sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
      
      4) dmesg shows that scsi_verify_blk_ioctl() failed for SG_IO (0x2285);
         it returns -ENOIOCTLCMD, later replaced with -ENOTTY in vfs_ioctl().
      
          $ dmesg
          <...>
          [] device-mapper: multipath: Failing path 65:144.
          [] device-mapper: multipath: Failing path 67:144.
          [] device-mapper: multipath: Failing path 65:224.
          [] device-mapper: multipath: Failing path 68:32.
          [] sgio_inquiry: sending ioctl 2285 to a partition!
      
      5) The ioctl() only works if the SYS_CAP_RAWIO capability is present
         (then queueing happens -- in this example, queue_if_no_path is set);
         this is due to a conditional check in scsi_verify_blk_ioctl().
      
          # capsh --drop=cap_sys_rawio -- -c './sgio_inquiry /dev/mapper/85ag56'
          sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
      
          # ./sgio_inquiry /dev/mapper/85ag56 &
          [1] 72830
      
          # cat /proc/72830/stack
          [<c00000171c0df700>] 0xc00000171c0df700
          [<c000000000015934>] __switch_to+0x204/0x350
          [<c000000000152d4c>] msleep+0x5c/0x80
          [<c00000000077dfb0>] dm_blk_ioctl+0x70/0x170
          [<c000000000487c40>] blkdev_ioctl+0x2b0/0x9b0
          [<c0000000003128e4>] block_ioctl+0x64/0xd0
          [<c0000000002dd3b0>] do_vfs_ioctl+0x490/0x780
          [<c0000000002dd774>] SyS_ioctl+0xd4/0xf0
          [<c000000000009358>] system_call+0x38/0xd0
      
      6) This is the function call chain exercised in this analysis:
      
      SYSCALL_DEFINE3(ioctl, <...>) @ fs/ioctl.c
          -> do_vfs_ioctl()
              -> vfs_ioctl()
                  ...
                  error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
                  ...
                      -> dm_blk_ioctl() @ drivers/md/dm.c
                          -> multipath_ioctl() @ drivers/md/dm-mpath.c
                              ...
                              (bdev = NULL, due to no active paths)
                              ...
                              if (!bdev || <...>) {
                                  int err = scsi_verify_blk_ioctl(NULL, cmd);
                                  if (err)
                                      r = err;
                              }
                              ...
                                  -> scsi_verify_blk_ioctl() @ block/scsi_ioctl.c
                                      ...
                                      if (bd && bd == bd->bd_contains) // not taken (bd = NULL)
                                          return 0;
                                      ...
                                      if (capable(CAP_SYS_RAWIO)) // not taken (unprivileged user)
                                          return 0;
                                      ...
                                      printk_ratelimited(KERN_WARNING
                                                 "%s: sending ioctl %x to a partition!\n" <...>);
      
                                      return -ENOIOCTLCMD;
                                  <-
                              ...
                              return r ? : <...>
                          <-
                  ...
                  if (error == -ENOIOCTLCMD)
                      error = -ENOTTY;
                   out:
                      return error;
                  ...
      
      Links:
      [1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52
      [2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device')
      [3] http://tldp.org/HOWTO/SCSI-Generic-HOWTO/pexample.html (Revision 1.2, 2002-05-03)
      Signed-off-by: default avatarMauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      14d2bdec
    • Brian Norris's avatar
      mtd: blkdevs: fix potential deadlock + lockdep warnings · 04e1d483
      Brian Norris authored
      commit f3c63795 upstream.
      
      Commit 073db4a5 ("mtd: fix: avoid race condition when accessing
      mtd->usecount") fixed a race condition but due to poor ordering of the
      mutex acquisition, introduced a potential deadlock.
      
      The deadlock can occur, for example, when rmmod'ing the m25p80 module, which
      will delete one or more MTDs, along with any corresponding mtdblock
      devices. This could potentially race with an acquisition of the block
      device as follows.
      
       -> blktrans_open()
          ->  mutex_lock(&dev->lock);
          ->  mutex_lock(&mtd_table_mutex);
      
       -> del_mtd_device()
          ->  mutex_lock(&mtd_table_mutex);
          ->  blktrans_notify_remove() -> del_mtd_blktrans_dev()
             ->  mutex_lock(&dev->lock);
      
      This is a classic (potential) ABBA deadlock, which can be fixed by
      making the A->B ordering consistent everywhere. There was no real
      purpose to the ordering in the original patch, AFAIR, so this shouldn't
      be a problem. This ordering was actually already present in
      del_mtd_blktrans_dev(), for one, where the function tried to ensure that
      its caller already held mtd_table_mutex before it acquired &dev->lock:
      
              if (mutex_trylock(&mtd_table_mutex)) {
                      mutex_unlock(&mtd_table_mutex);
                      BUG();
              }
      
      So, reverse the ordering of acquisition of &dev->lock and &mtd_table_mutex so
      we always acquire mtd_table_mutex first.
      
      Snippets of the lockdep output follow:
      
        # modprobe -r m25p80
        [   53.419251]
        [   53.420838] ======================================================
        [   53.427300] [ INFO: possible circular locking dependency detected ]
        [   53.433865] 4.3.0-rc6 #96 Not tainted
        [   53.437686] -------------------------------------------------------
        [   53.444220] modprobe/372 is trying to acquire lock:
        [   53.449320]  (&new->lock){+.+...}, at: [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc
        [   53.457271]
        [   53.457271] but task is already holding lock:
        [   53.463372]  (mtd_table_mutex){+.+.+.}, at: [<c0439994>] del_mtd_device+0x18/0x100
        [   53.471321]
        [   53.471321] which lock already depends on the new lock.
        [   53.471321]
        [   53.479856]
        [   53.479856] the existing dependency chain (in reverse order) is:
        [   53.487660]
        -> #1 (mtd_table_mutex){+.+.+.}:
        [   53.492331]        [<c043fc5c>] blktrans_open+0x34/0x1a4
        [   53.497879]        [<c01afce0>] __blkdev_get+0xc4/0x3b0
        [   53.503364]        [<c01b0bb8>] blkdev_get+0x108/0x320
        [   53.508743]        [<c01713c0>] do_dentry_open+0x218/0x314
        [   53.514496]        [<c0180454>] path_openat+0x4c0/0xf9c
        [   53.519959]        [<c0182044>] do_filp_open+0x5c/0xc0
        [   53.525336]        [<c0172758>] do_sys_open+0xfc/0x1cc
        [   53.530716]        [<c000f740>] ret_fast_syscall+0x0/0x1c
        [   53.536375]
        -> #0 (&new->lock){+.+...}:
        [   53.540587]        [<c063f124>] mutex_lock_nested+0x38/0x3cc
        [   53.546504]        [<c043fe4c>] del_mtd_blktrans_dev+0x80/0xdc
        [   53.552606]        [<c043f164>] blktrans_notify_remove+0x7c/0x84
        [   53.558891]        [<c04399f0>] del_mtd_device+0x74/0x100
        [   53.564544]        [<c043c670>] del_mtd_partitions+0x80/0xc8
        [   53.570451]        [<c0439aa0>] mtd_device_unregister+0x24/0x48
        [   53.576637]        [<c046ce6c>] spi_drv_remove+0x1c/0x34
        [   53.582207]        [<c03de0f0>] __device_release_driver+0x88/0x114
        [   53.588663]        [<c03de19c>] device_release_driver+0x20/0x2c
        [   53.594843]        [<c03dd9e8>] bus_remove_device+0xd8/0x108
        [   53.600748]        [<c03dacc0>] device_del+0x10c/0x210
        [   53.606127]        [<c03dadd0>] device_unregister+0xc/0x20
        [   53.611849]        [<c046d878>] __unregister+0x10/0x20
        [   53.617211]        [<c03da868>] device_for_each_child+0x50/0x7c
        [   53.623387]        [<c046eae8>] spi_unregister_master+0x58/0x8c
        [   53.629578]        [<c03e12f0>] release_nodes+0x15c/0x1c8
        [   53.635223]        [<c03de0f8>] __device_release_driver+0x90/0x114
        [   53.641689]        [<c03de900>] driver_detach+0xb4/0xb8
        [   53.647147]        [<c03ddc78>] bus_remove_driver+0x4c/0xa0
        [   53.652970]        [<c00cab50>] SyS_delete_module+0x11c/0x1e4
        [   53.658976]        [<c000f740>] ret_fast_syscall+0x0/0x1c
        [   53.664621]
        [   53.664621] other info that might help us debug this:
        [   53.664621]
        [   53.672979]  Possible unsafe locking scenario:
        [   53.672979]
        [   53.679169]        CPU0                    CPU1
        [   53.683900]        ----                    ----
        [   53.688633]   lock(mtd_table_mutex);
        [   53.692383]                                lock(&new->lock);
        [   53.698306]                                lock(mtd_table_mutex);
        [   53.704658]   lock(&new->lock);
        [   53.707946]
        [   53.707946]  *** DEADLOCK ***
      
      Fixes: 073db4a5 ("mtd: fix: avoid race condition when accessing mtd->usecount")
      Reported-by: default avatarFelipe Balbi <balbi@ti.com>
      Tested-by: default avatarFelipe Balbi <balbi@ti.com>
      Signed-off-by: default avatarBrian Norris <computersforpeace@gmail.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      04e1d483
    • Marek Vasut's avatar
      can: Use correct type in sizeof() in nla_put() · 42f563e7
      Marek Vasut authored
      commit 562b103a upstream.
      
      The sizeof() is invoked on an incorrect variable, likely due to some
      copy-paste error, and this might result in memory corruption. Fix this.
      Signed-off-by: default avatarMarek Vasut <marex@denx.de>
      Cc: Wolfgang Grandegger <wg@grandegger.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      42f563e7