1. 10 Aug, 2017 3 commits
    • James Smart's avatar
      lpfc: support nvmet_fc defer_rcv callback · 50738420
      James Smart authored
      Currently, calls to nvmet_fc_rcv_fcp_req() always copy the
      FC-NVME cmd iu to a temporary buffer before returning, allowing
      the driver to immediately repost the buffer to the hardware.
      
      To address timing conditions on queue element structures vs async
      command reception, the nvmet_fc transport may occasionally need to
      hold on to the command iu buffer for a short period. In these cases,
      nvmet_fc_rcv_fcp_req() will return a special return code
      (-EOVERFLOW), and the LLDD must wait until the new defer_rcv
      lldd callback is called before recycling the buffer back to the
      hw.
      
      This patch adds support for the new nvmet_fc transport defer_rcv
      callback and recognition of the new error code when passing commands
      to the transport.
      Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      50738420
    • James Smart's avatar
      nvmet_fc: add defer_req callback for deferment of cmd buffer return · 0fb228d3
      James Smart authored
      At queue creation, the transport allocates a local job struct
      (struct nvmet_fc_fcp_iod) for each possible element of the queue.
      When a new CMD is received from the wire, a job struct is allocated
      from the queue and then used for the duration of the command.
      The job struct contains buffer space for the wire command iu. Thus,
      upon allocation of the job struct, the cmd iu buffer is copied to
      the job struct and the LLDD may immediately free/reuse the CMD IU
      buffer passed in the call.
      
      However, in some circumstances a new command may be asynchronously
      received on the wire before the previous one has been torn down.
      This is due to the packetized nature of FC and the api of the FC
      LLDD: the LLDD may issue a hw command to send the wire response,
      yet not get the hw completion for that command and upcall the
      nvmet_fc layer before the next command arrives. In other words, it's
      possible for the initiator to get the response from the wire, thus
      believe a command slot is free, and send a new command iu. The new
      command iu may be received by the LLDD and passed to the transport
      before the LLDD has serviced the hw completion and made the teardown
      calls for the original job struct. As such, there is no job struct
      available for the new io. That is, it appears as if the host sent
      more queue elements than the queue size, although by its own
      understanding it did not.
      
      Rather than treat this as a hard connection failure, queue the new
      request until a job struct frees up. As the buffer isn't
      copied while there's no job struct, a special return value must be
      returned to the LLDD to tell it to hold off on recycling the cmd
      iu buffer. Later, when a job struct is allocated and the
      buffer copied, a new LLDD callback notifies the
      LLDD and allows it to recycle its command iu buffer.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      0fb228d3
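The deferral handshake described in the two commits above can be modelled in user space. This is a minimal sketch, not the nvmet_fc or lpfc code: the names `tgt_queue`, `rcv_fcp_req`, `job_done` and `lldd_defer_rcv` are hypothetical, the queue depth is fixed at one job to keep it short, and returning -EBUSY for a second deferred command is an assumption made only for this model.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define IU_LEN 8

/* One job struct with room for a copy of the wire command iu. */
struct job {
	int in_use;
	char iu[IU_LEN];
};

struct tgt_queue {
	struct job job;				/* depth of one keeps the model small */
	const char *deferred_buf;		/* LLDD buffer the transport still holds */
	void (*defer_rcv)(const char *buf);	/* LLDD callback: buffer may be recycled */
};

/* Stand-in LLDD callback: just counts how many buffers were handed back. */
static int lldd_recycled;

static void lldd_defer_rcv(const char *buf)
{
	(void)buf;
	lldd_recycled++;
}

/* Transport entry point: 0 means "copied, recycle now", -EOVERFLOW means "hold". */
static int rcv_fcp_req(struct tgt_queue *q, const char *buf)
{
	if (!q->job.in_use) {
		q->job.in_use = 1;
		memcpy(q->job.iu, buf, IU_LEN);
		return 0;
	}
	if (q->deferred_buf)
		return -EBUSY;		/* assumption: model holds one deferred cmd */
	q->deferred_buf = buf;		/* no job struct free: keep the LLDD buffer */
	return -EOVERFLOW;
}

/* Teardown of the original job: start the deferred command, then release its buffer. */
static void job_done(struct tgt_queue *q)
{
	q->job.in_use = 0;
	if (q->deferred_buf) {
		const char *buf = q->deferred_buf;

		q->deferred_buf = NULL;
		q->job.in_use = 1;
		memcpy(q->job.iu, buf, IU_LEN);
		q->defer_rcv(buf);	/* only now may the LLDD repost the buffer */
	}
}
```

The point of the model is the ordering: the LLDD's buffer is copied and released only after the original job has been torn down, never at the time of the -EOVERFLOW return.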
    • Martin Wilck's avatar
      nvme: strip trailing 0-bytes in wwid_show · 758f3735
      Martin Wilck authored
      Some broken controllers (such as earlier Linux targets) pad model or
      serial fields with 0-bytes rather than spaces. The NVMe spec disallows
      0 bytes in "ASCII" fields.  Thus strip trailing 0-bytes, too. Also make
      sure that we get no underflow for pathological input.
      Signed-off-by: Martin Wilck <mwilck@suse.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      758f3735
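The trimming rule in the commit above can be illustrated with a small standalone helper. This is a sketch under assumptions: `trimmed_len` is a hypothetical name, and the real wwid_show code is not reproduced here.

```c
#include <stddef.h>

/*
 * Return the length of buf once trailing spaces and 0-bytes are ignored.
 * len > 0 is tested before buf[len - 1] is read, so a field that is all
 * padding (or empty) yields 0 instead of indexing before the buffer.
 */
static size_t trimmed_len(const char *buf, size_t len)
{
	while (len > 0 && (buf[len - 1] == ' ' || buf[len - 1] == '\0'))
		len--;
	return len;
}
```

A serial field holding "ABC" followed by two spaces and three 0-bytes trims to length 3. A field made entirely of 0-bytes trims to length 0, which is the pathological input the commit message mentions.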
  2. 26 Jul, 2017 1 commit
    • Scott Bauer's avatar
      nvme: validate admin queue before unquiesce · 7dd1ab16
      Scott Bauer authored
      With a misbehaving controller it's possible we'll never
      enter the live state and create an admin queue. When we
      fail out of reset work it's possible we failed out early
      enough without setting up the admin queue. We tear down
      queues after a failed reset, but need to do some more
      sanitization: check that the admin queue exists before
      unquiescing it.
      
      Fixes 443bd90f: "nvme: host: unquiesce queue in nvme_kill_queues()"
      
      [  189.650995] nvme nvme1: pci function 0000:0b:00.0
      [  317.680055] nvme nvme0: Device not ready; aborting reset
      [  317.680183] nvme nvme0: Removing after probe failure status: -19
      [  317.681258] kasan: GPF could be caused by NULL-ptr deref or user memory access
      [  317.681397] general protection fault: 0000 [#1] SMP KASAN
      [  317.682984] CPU: 3 PID: 477 Comm: kworker/3:2 Not tainted 4.13.0-rc1+ #5
      [  317.683112] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F5 03/07/2016
      [  317.683284] Workqueue: events nvme_remove_dead_ctrl_work [nvme]
      [  317.683398] task: ffff8803b0990000 task.stack: ffff8803c2ef0000
      [  317.683516] RIP: 0010:blk_mq_unquiesce_queue+0x2b/0xa0
      [  317.683614] RSP: 0018:ffff8803c2ef7d40 EFLAGS: 00010282
      [  317.683716] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 1ffff1006fbdcde3
      [  317.683847] RDX: 0000000000000038 RSI: 1ffff1006f5a9245 RDI: 0000000000000000
      [  317.683978] RBP: ffff8803c2ef7d58 R08: 1ffff1007bcdc974 R09: 0000000000000000
      [  317.684108] R10: 1ffff1007bcdc975 R11: 0000000000000000 R12: 00000000000001c0
      [  317.684239] R13: ffff88037ad49228 R14: ffff88037ad492d0 R15: ffff88037ad492e0
      [  317.684371] FS:  0000000000000000(0000) GS:ffff8803de6c0000(0000) knlGS:0000000000000000
      [  317.684519] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  317.684627] CR2: 0000002d1860c000 CR3: 000000045b40d000 CR4: 00000000003406e0
      [  317.684758] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  317.684888] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  317.685018] Call Trace:
      [  317.685084]  nvme_kill_queues+0x4d/0x170 [nvme_core]
      [  317.685185]  nvme_remove_dead_ctrl_work+0x3a/0x90 [nvme]
      [  317.685289]  process_one_work+0x771/0x1170
      [  317.685372]  worker_thread+0xde/0x11e0
      [  317.685452]  ? pci_mmcfg_check_reserved+0x110/0x110
      [  317.685550]  kthread+0x2d3/0x3d0
      [  317.685617]  ? process_one_work+0x1170/0x1170
      [  317.685704]  ? kthread_create_on_node+0xc0/0xc0
      [  317.685785]  ret_from_fork+0x25/0x30
      [  317.685798] Code: 0f 1f 44 00 00 55 48 b8 00 00 00 00 00 fc ff df 48 89 e5 41 54 4c 8d a7 c0 01 00 00 53 48 89 fb 4c 89 e2 48 c1 ea 03 48 83 ec 08 <80> 3c 02 00 75 50 48 8b bb c0 01 00 00 e8 33 8a f9 00 0f ba b3
      [  317.685872] RIP: blk_mq_unquiesce_queue+0x2b/0xa0 RSP: ffff8803c2ef7d40
      [  317.685908] ---[ end trace a3f8704150b1e8b4 ]---
      Signed-off-by: Scott Bauer <scott.bauer@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      7dd1ab16
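The guard that this fix calls for can be sketched outside the kernel. The types and names below (`ctrl_queue`, `ctrl`, `kill_admin_queue`) are hypothetical stand-ins, not the nvme driver's structures. The sketch only shows the shape of the check: skip the unquiesce when the admin queue was never created.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a request queue and a controller. */
struct ctrl_queue {
	int quiesced;
};

struct ctrl {
	struct ctrl_queue *admin_q;	/* NULL if reset failed before setup */
};

static void unquiesce(struct ctrl_queue *q)
{
	q->quiesced = 0;
}

/* Returns 1 if the admin queue was unquiesced, 0 if there was none. */
static int kill_admin_queue(struct ctrl *c)
{
	if (!c->admin_q)
		return 0;	/* nothing to unquiesce; avoids the NULL deref */
	unquiesce(c->admin_q);
	return 1;
}
```

In the trace above, RDI is zero on entry to blk_mq_unquiesce_queue(), which is consistent with a NULL queue pointer being passed in.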
  3. 25 Jul, 2017 5 commits
    • Christoph Hellwig's avatar
      nvme-pci: fix HMB size calculation · 50cdb7c6
      Christoph Hellwig authored
      It's possible the preferred HMB size may not be a multiple of the
      chunk_size. This patch moves len to function scope and uses that in
      the for loop increment so the last iteration doesn't cause the total
      size to exceed the allocated HMB size.
      
      Based on an earlier patch from Keith Busch.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Fixes: 87ad72a5 ("nvme-pci: implement host memory buffer support")
      50cdb7c6
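The loop shape described above, where the increment uses the clamped length, can be sketched as follows. This is an illustration only: `hmb_chunk_lens` and its parameters are hypothetical, and the real allocation and descriptor setup are left out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Split `preferred` bytes into chunks of at most `chunk_size` bytes.
 * Each chunk length is stored in lens[] (at most max_chunks entries)
 * and the number of chunks is returned. Because the loop advances by
 * the clamped len, the chunk lengths never sum to more than preferred.
 */
static size_t hmb_chunk_lens(uint64_t preferred, uint64_t chunk_size,
			     uint64_t *lens, size_t max_chunks)
{
	uint64_t size, len;
	size_t n = 0;

	if (chunk_size == 0)
		return 0;

	for (size = 0; size < preferred && n < max_chunks; size += len) {
		len = preferred - size;		/* bytes still to cover */
		if (len > chunk_size)
			len = chunk_size;	/* full chunk */
		lens[n++] = len;
	}
	return n;
}
```

With a preferred size of 10 and a chunk size of 4, the chunks are 4, 4 and 2. Incrementing by chunk_size instead would account for 12 bytes, which is the overshoot the patch removes.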
    • James Smart's avatar
      nvme-fc: revise TRADDR parsing · 9c5358e1
      James Smart authored
      The FC-NVME spec hasn't locked down the format string for TRADDR.
      Currently the spec is leaning toward "nn-<16hexdigits>:pn-<16hexdigits>"
      where the WWNs are hex values but not prefixed by 0x.
      
      Most implementations so far expect a string format of
      "nn-0x<16hexdigits>:pn-0x<16hexdigits>" to be used. The transport
      uses the match_u64 parser which requires a leading 0x prefix to set
      the base properly. If it's not there, a match will either fail or return
      a base 10 value.
      
      The resolution in T11 keeps getting pushed out. Therefore, to fix things now and
      to cover any eventuality and any implementations already in the field,
      this patch adds support for both formats.
      
      The change consists of replacing the token matching routine with a
      routine that validates the fixed string format, and then builds
      a local copy of the hex name with a 0x prefix before calling
      the system parser.
      
      Note: the same parser routine exists in both the initiator and target
      transports. Given this is about the only "shared" item, we chose to
      replicate rather than create an interdependency on some shared code.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      9c5358e1
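A parser that accepts both TRADDR spellings can be sketched in portable C. This is not the transport's routine: `parse_wwn` and `parse_traddr` are hypothetical names, and the sketch calls strtoull() with an explicit base 16 instead of building a 0x-prefixed copy for the kernel parser. It is also more lenient than necessary, since it lets the node name and port name use different spellings.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define WWN_HEX_DIGITS 16

/* Parse exactly 16 hex digits, optionally preceded by "0x"; advance *p past them. */
static int parse_wwn(const char **p, uint64_t *out)
{
	char buf[WWN_HEX_DIGITS + 1];
	const char *s = *p;
	int i;

	if (s[0] == '0' && s[1] == 'x')
		s += 2;
	for (i = 0; i < WWN_HEX_DIGITS; i++) {
		/* A NUL byte is not a hex digit, so short input fails here. */
		if (!isxdigit((unsigned char)s[i]))
			return -1;
		buf[i] = s[i];
	}
	buf[WWN_HEX_DIGITS] = '\0';
	*out = strtoull(buf, NULL, 16);
	*p = s + WWN_HEX_DIGITS;
	return 0;
}

/* Accepts "nn-<wwn>:pn-<wwn>" with or without a 0x prefix on each name. */
static int parse_traddr(const char *s, uint64_t *nn, uint64_t *pn)
{
	if (strncmp(s, "nn-", 3) != 0)
		return -1;
	s += 3;
	if (parse_wwn(&s, nn) != 0)
		return -1;
	if (strncmp(s, ":pn-", 4) != 0)
		return -1;
	s += 4;
	if (parse_wwn(&s, pn) != 0)
		return -1;
	return *s == '\0' ? 0 : -1;	/* reject trailing characters */
}
```

Validating the fixed layout first means a name with too few digits is rejected outright, and the digits are never interpreted as base 10.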
    • James Smart's avatar
      nvme-fc: address target disconnect race conditions in fcp io submit · 8b25f351
      James Smart authored
      There are cases where threads are in the process of submitting new
      io when the LLDD calls in to remove the remote port. In some cases,
      the next io actually goes to the LLDD, which knows the remoteport isn't
      present and rejects it. To properly recover/restart these I/Os we
      don't want to hard fail them; we want to treat them as temporary
      resource errors for which a delayed retry will work.
      
      Add a couple more checks on remoteport connectivity and commonize the
      busy response handling when it's seen.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      8b25f351
    • Jon Derrick's avatar
      nvme: fabrics commands should use the fctype field for data direction · 2fd4167f
      Jon Derrick authored
      Fabrics commands with opcode 0x7F use the fctype field to indicate data
      direction.
      Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
      Reviewed-by: Sagi Grimberg <sai@grmberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Fixes: eb793e2c ("nvme.h: add NVMe over Fabrics definitions")
      2fd4167f
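The direction check this commit describes can be sketched with a pared-down command header. The struct `cmd_hdr` and the function `cmd_is_write` are hypothetical. The sketch assumes that the low bit of the relevant field marks a host-to-controller transfer, and that a Fabrics command carries that bit in fctype rather than in its opcode.

```c
#include <stdbool.h>
#include <stdint.h>

#define FABRICS_OPCODE 0x7f	/* opcode shared by all Fabrics commands */

/* Hypothetical, pared-down command header: only the two bytes inspected here. */
struct cmd_hdr {
	uint8_t opcode;
	uint8_t fctype;		/* meaningful only when opcode == FABRICS_OPCODE */
};

/* True when the command transfers data from host to controller. */
static bool cmd_is_write(const struct cmd_hdr *cmd)
{
	if (cmd->opcode == FABRICS_OPCODE)
		return cmd->fctype & 1;	/* direction comes from fctype */
	return cmd->opcode & 1;		/* direction comes from the opcode */
}
```

Since 0x7F is odd, testing the opcode alone would classify every Fabrics command as a write, whatever its fctype.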
    • Johannes Thumshirn's avatar
      nvme: also provide a UUID in the WWID sysfs attribute · 6484f5d1
      Johannes Thumshirn authored
      The WWID sysfs attribute can provide multiple means of a World Wide ID
      for an NVMe device. It can either be an NGUID, an EUI-64 or a concatenation
      of VID, Serial Number, Model and the Namespace ID in this order of
      preference.
      
      If the target also sends us a UUID use the UUID for identification and
      give it the highest priority.
      
      This eases generation of /dev/disk/by-* symlinks.
      Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      6484f5d1
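The resulting order of preference (UUID, then NGUID, then EUI-64, then the concatenated fallback) can be sketched as a selection function. The names `ns_ids`, `wwid_kind` and `pick_wwid` are hypothetical. Treating an all-zero field as "not provided" is an assumption of this sketch, and the sysfs output formats are deliberately not reproduced.

```c
#include <stddef.h>
#include <stdint.h>

enum wwid_kind { WWID_UUID, WWID_NGUID, WWID_EUI64, WWID_FALLBACK };

/* Hypothetical container for the identifiers a namespace may report. */
struct ns_ids {
	uint8_t uuid[16];
	uint8_t nguid[16];
	uint8_t eui64[8];
};

/* Assumption: a field with every byte zero was not provided by the target. */
static int all_zero(const uint8_t *p, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (p[i] != 0)
			return 0;
	return 1;
}

/* Pick the identifier to expose, highest priority first. */
static enum wwid_kind pick_wwid(const struct ns_ids *ids)
{
	if (!all_zero(ids->uuid, sizeof(ids->uuid)))
		return WWID_UUID;
	if (!all_zero(ids->nguid, sizeof(ids->nguid)))
		return WWID_NGUID;
	if (!all_zero(ids->eui64, sizeof(ids->eui64)))
		return WWID_EUI64;
	return WWID_FALLBACK;	/* VID, Serial Number, Model, Namespace ID */
}
```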
  4. 24 Jul, 2017 3 commits
    • Christoph Hellwig's avatar
      blk-mq: map queues to all present CPUs · 76451d79
      Christoph Hellwig authored
      We already do this for PCI mappings, and the higher level code now
      expects that CPU on/offlining doesn't have an effect on the queue
      mappings.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      76451d79
    • Christoph Hellwig's avatar
      block: disable runtime-pm for blk-mq · 765e40b6
      Christoph Hellwig authored
      The blk-mq code lacks support for looking at the rpm_status field, tracking
      active requests and the RQF_PM flag.
      
      Due to the default switch to blk-mq for scsi, people are starting to run into
      suspend / resume issues because of this, so make sure we disable the runtime
      PM functionality until it is properly implemented.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      765e40b6
    • Bart Van Assche's avatar
      xen-blkfront: Fix handling of non-supported operations · 31c4ccc3
      Bart Van Assche authored
      This patch fixes the following sparse warnings:
      
      drivers/block/xen-blkfront.c:916:45: warning: incorrect type in argument 2 (different base types)
      drivers/block/xen-blkfront.c:916:45:    expected restricted blk_status_t [usertype] error
      drivers/block/xen-blkfront.c:916:45:    got int [signed] error
      drivers/block/xen-blkfront.c:1599:47: warning: incorrect type in assignment (different base types)
      drivers/block/xen-blkfront.c:1599:47:    expected int [signed] error
      drivers/block/xen-blkfront.c:1599:47:    got restricted blk_status_t [usertype] <noident>
      drivers/block/xen-blkfront.c:1607:55: warning: incorrect type in assignment (different base types)
      drivers/block/xen-blkfront.c:1607:55:    expected int [signed] error
      drivers/block/xen-blkfront.c:1607:55:    got restricted blk_status_t [usertype] <noident>
      drivers/block/xen-blkfront.c:1625:55: warning: incorrect type in assignment (different base types)
      drivers/block/xen-blkfront.c:1625:55:    expected int [signed] error
      drivers/block/xen-blkfront.c:1625:55:    got restricted blk_status_t [usertype] <noident>
      drivers/block/xen-blkfront.c:1628:62: warning: restricted blk_status_t degrades to integer
      
      Compile-tested only.
      
      Fixes: commit 2a842aca ("block: introduce new block status code type")
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Roger Pau Monné <roger.pau@citrix.com>
      Cc: <xen-devel@lists.xenproject.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      31c4ccc3
  5. 22 Jul, 2017 4 commits
  6. 21 Jul, 2017 24 commits
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-4.13-2' of git://git.linux-nfs.org/projects/anna/linux-nfs · 505d5c11
      Linus Torvalds authored
      Pull NFS client bugfixes from Anna Schumaker:
       "Stable bugfix:
         - Fix error reporting regression
      
        Bugfixes:
         - Fix setting filelayout ds address race
         - Fix subtle access bug when using ACLs
         - Fix setting mnt3_counts array size
         - Fix a couple of pNFS commit races"
      
      * tag 'nfs-for-4.13-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
        NFS/filelayout: Fix racy setting of fl->dsaddr in filelayout_check_deviceid()
        NFS: Be more careful about mapping file permissions
        NFS: Store the raw NFS access mask in the inode's access cache
        NFSv3: Convert nfs3_proc_access() to use nfs_access_set_mask()
        NFS: Refactor NFS access to kernel access mask calculation
        net/sunrpc/xprt_sock: fix regression in connection error reporting.
        nfs: count correct array for mnt3_counts array size
        Revert commit 722f0b89 ("pNFS: Don't send COMMITs to the DSes if...")
        pNFS/flexfiles: Handle expired layout segments in ff_layout_initiate_commit()
        NFS: Fix another COMMIT race in pNFS
        NFS: Fix a COMMIT race in pNFS
        mount: copy the port field into the cloned nfs_server structure.
        NFS: Don't run wake_up_bit() when nobody is waiting...
        nfs: add export operations
      505d5c11
    • Linus Torvalds's avatar
      Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs · 99313414
      Linus Torvalds authored
      Pull overlayfs fixes from Miklos Szeredi:
       "This fixes a crash with SELinux and several other old and new bugs"
      
      * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
        ovl: check for bad and whiteout index on lookup
        ovl: do not cleanup directory and whiteout index entries
        ovl: fix xattr get and set with selinux
        ovl: remove unneeded check for IS_ERR()
        ovl: fix origin verification of index dir
        ovl: mark parent impure on ovl_link()
        ovl: fix random return value on mount
      99313414
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 0151ef00
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A small set of fixes for -rc2 - two fixes for BFQ, documentation and
        code, and a removal of an unused variable in nbd. Outside of that, a
        small collection of fixes from the usual crew on the nvme side"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        nvmet: don't report 0-bytes in serial number
        nvmet: preserve controller serial number between reboots
        nvmet: Move serial number from controller to subsystem
        nvmet: prefix version configfs file with attr
        nvme-pci: Fix an error handling path in 'nvme_probe()'
        nvme-pci: Remove nvme_setup_prps BUG_ON
        nvme-pci: add another device ID with stripe quirk
        nvmet-fc: fix byte swapping in nvmet_fc_ls_create_association
        nvme: fix byte swapping in the streams code
        nbd: kill unused ret in recv_work
        bfq: dispatch request to prevent queue stalling after the request completion
        bfq: fix typos in comments about B-WF2Q+ algorithm
      0151ef00
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · bb236dbe
      Linus Torvalds authored
      Pull more rdma fixes from Doug Ledford:
       "As per my previous pull request, there were two drivers that each had
        a rather large number of legitimate fixes still to be sent.
      
        As it turned out, I also missed a reasonably large set of fixes from
        one person across the stack that are all important fixes. All in all,
        the bnxt_re, i40iw, and Dan Carpenter are 3/4 to 2/3rds of this pull
        request.
      
        There were some other random fixes that I didn't send in the last pull
        request that I added to this one. This catches the rdma stack up to
        the fixes from up to about the beginning of this week. Any more fixes
        I'll wait and batch up later in the -rc cycle. This will give us a
        good base to start with for basing a for-next branch on -rc2.
      
        Summary:
      
         - i40iw fixes
      
         - bnxt_re fixes
      
         - Dan Carpenter bugfixes across stack
      
         - ten more random fixes, no more than two from any one person"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits)
        RDMA/core: Initialize port_num in qp_attr
        RDMA/uverbs: Fix the check for port number
        IB/cma: Fix reference count leak when no ipv4 addresses are set
        RDMA/iser: don't send an rkey if all data is written as immadiate-data
        rxe: fix broken receive queue draining
        RDMA/qedr: Prevent memory overrun in verbs' user responses
        iw_cxgb4: don't use WR keys/addrs for 0 byte reads
        IB/mlx4: Fix CM REQ retries in paravirt mode
        IB/rdmavt: Setting of QP timeout can overflow jiffies computation
        IB/core: Fix sparse warnings
        RDMA/bnxt_re: Fix the value reported for local ack delay
        RDMA/bnxt_re: Report MISSED_EVENTS in req_notify_cq
        RDMA/bnxt_re: Fix return value of poll routine
        RDMA/bnxt_re: Enable atomics only if host bios supports
        RDMA/bnxt_re: Specify RDMA component when allocating stats context
        RDMA/bnxt_re: Fixed the max_rd_atomic support for initiator and destination QP
        RDMA/bnxt_re: Report supported value to IB stack in query_device
        RDMA/bnxt_re: Do not free the ctx_tbl entry if delete GID fails
        RDMA/bnxt_re: Fix WQE Size posted to HW to prevent it from throwing error
        RDMA/bnxt_re: Free doorbell page index (DPI) during dealloc ucontext
        ...
      bb236dbe
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-for-v4.13-rc2' of git://people.freedesktop.org/~airlied/linux · 24a1635a
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "A bunch of fixes for rc2: two imx regressions, vc4 fix, dma-buf fix,
        some displayport mst fixes, and an amdkfd fix.
      
        Nothing too crazy, I assume we just haven't seen much rc1 testing yet"
      
      * tag 'drm-fixes-for-v4.13-rc2' of git://people.freedesktop.org/~airlied/linux:
        drm/mst: Avoid processing partially received up/down message transactions
        drm/mst: Avoid dereferencing a NULL mstb in drm_dp_mst_handle_up_req()
        drm/mst: Fix error handling during MST sideband message reception
        drm/imx: parallel-display: Accept drm_of_find_panel_or_bridge failure
        drm/imx: fix typo in ipu_plane_formats[]
        drm/vc4: Fix VBLANK handling in crtc->enable() path
        dma-buf/fence: Avoid use of uninitialised timestamp
        drm/amdgpu: Remove unused field kgd2kfd_shared_resources.num_mec
        drm/radeon: Remove initialization of shared_resources.num_mec
        drm/amdkfd: Remove unused references to shared_resources.num_mec
        drm/amdgpu: Fix KFD oversubscription by tracking queues correctly
      24a1635a
    • Linus Torvalds's avatar
      Merge tag 'trace-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · f79ec886
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "Three minor updates
      
         - Use the new GFP_RETRY_MAYFAIL to be more aggressive in allocating
           memory for the ring buffer without causing OOMs
      
         - Fix a memory leak in adding and removing instances
      
         - Add __rcu annotation to be able to debug RCU usage of function
           tracing a bit better"
      
      * tag 'trace-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        trace: fix the errors caused by incompatible type of RCU variables
        tracing: Fix kmemleak in instance_rmdir
        tracing/ring_buffer: Try harder to allocate
      f79ec886
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · b0a75281
      Linus Torvalds authored
      Pull KVM fixes from Radim Krčmář:
       "A bunch of small fixes for x86"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        kvm: x86: hyperv: avoid livelock in oneshot SynIC timers
        KVM: VMX: Fix invalid guest state detection after task-switch emulation
        x86: add MULTIUSER dependency for KVM
        KVM: nVMX: Disallow VM-entry in MOV-SS shadow
        KVM: nVMX: track NMI blocking state separately for each VMCS
        KVM: x86: masking out upper bits
      b0a75281
    • Linus Torvalds's avatar
      Merge tag 'powerpc-4.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 10fc9554
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "A handful of fixes, mostly for new code:
      
         - some reworking of the new STRICT_KERNEL_RWX support to make sure we
           also remove executable permission from __init memory before it's
           freed.
      
         - a fix to some recent optimisations to the hypercall entry where we
           were clobbering r12, this was breaking nested guests (PR KVM).
      
         - a fix for the recent patch to opal_configure_cores(). This could
           break booting on bare metal Power8 boxes if the kernel was built
           without CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG.
      
         - .. and finally a workaround for spurious PMU interrupts on Power9
           DD2.
      
        Thanks to: Nicholas Piggin, Anton Blanchard, Balbir Singh"
      
      * tag 'powerpc-4.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/mm: Mark __init memory no-execute when STRICT_KERNEL_RWX=y
        powerpc/mm/hash: Refactor hash__mark_rodata_ro()
        powerpc/mm/radix: Refactor radix__mark_rodata_ro()
        powerpc/64s: Fix hypercall entry clobbering r12 input
        powerpc/perf: Avoid spurious PMU interrupts after idle
        powerpc/powernv: Fix boot on Power8 bare metal due to opal_configure_cores()
      10fc9554
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4ec9f7a1
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Half of the fixes are for various build time warnings triggered by
        randconfig builds. Most (but not all...) were harmless.
      
        There's also:
      
         - ACPI boundary condition fixes
      
         - UV platform fixes
      
         - defconfig updates
      
         - an AMD K6 CPU init fix
      
         - a %pOF printk format related preparatory change
      
         - .. and a warning fix related to the tlb/PCID changes"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/devicetree: Convert to using %pOF instead of ->full_name
        x86/platform/uv/BAU: Disable BAU on single hub configurations
        x86/platform/intel-mid: Fix a format string overflow warning
        x86/platform: Add PCI dependency for PUNIT_ATOM_DEBUG
        x86/build: Silence the build with "make -s"
        x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl
        x86/fpu/math-emu: Avoid bogus -Wint-in-bool-context warning
        x86/fpu/math-emu: Fix possible uninitialized variable use
        perf/x86: Shut up false-positive -Wmaybe-uninitialized warning
        x86/defconfig: Remove stale, old Kconfig options
        x86/ioapic: Pass the correct data to unmask_ioapic_irq()
        x86/acpi: Prevent out of bound access caused by broken ACPI tables
        x86/mm, KVM: Fix warning when !CONFIG_PREEMPT_COUNT
        x86/platform/uv/BAU: Fix congested_response_us not taking effect
        x86/cpu: Use indirect call to measure performance in init_amd_k6()
      4ec9f7a1
    • Linus Torvalds's avatar
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e234b4a8
      Linus Torvalds authored
      Pull timer fix from Ingo Molnar:
       "A timer_irq_init() clocksource API robustness fix"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource/drivers/timer-of: Handle of_irq_get_byname() result correctly
      e234b4a8
    • Linus Torvalds's avatar
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 5a77f025
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar:
       "A cputime fix and code comments/organization fix to the deadline
        scheduler"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/deadline: Fix confusing comments about selection of top pi-waiter
        sched/cputime: Don't use smp_processor_id() in preemptible context
      5a77f025
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · bbcdea65
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Two hw-enablement patches, two race fixes, three fixes for regressions
        of semantics, plus a number of tooling fixes"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel: Add proper condition to run sched_task callbacks
        perf/core: Fix locking for children siblings group read
        perf/core: Fix scheduling regression of pinned groups
        perf/x86/intel: Fix debug_store reset field for freq events
        perf/x86/intel: Add Goldmont Plus CPU PMU support
        perf/x86/intel: Enable C-state residency events for Apollo Lake
        perf symbols: Accept zero as the kernel base address
        Revert "perf/core: Drop kernel samples even though :u is specified"
        perf annotate: Fix broken arrow at row 0 connecting jmp instruction to its target
        perf evsel: State in the default event name if attr.exclude_kernel is set
        perf evsel: Fix attr.exclude_kernel setting for default cycles:p
      bbcdea65
    • Linus Torvalds's avatar
      Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8b810a3a
      Linus Torvalds authored
      Pull locking fixlet from Ingo Molnar:
       "Remove an unnecessary priority adjustment in the rtmutex code"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/rtmutex: Remove unnecessary priority adjustment
      8b810a3a
    • Trond Myklebust's avatar
      NFS/filelayout: Fix racy setting of fl->dsaddr in filelayout_check_deviceid() · 1ebf9801
      Trond Myklebust authored
      We must set fl->dsaddr once, and once only, even if there are multiple
      processes calling filelayout_check_deviceid() for the same layout
      segment.
      Reported-by: Olga Kornievskaia <kolga@netapp.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      1ebf9801
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 34eddefe
      Linus Torvalds authored
      Pull irq fixes from Ingo Molnar:
       "A resume_irq() fix, plus a number of static declaration fixes"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/digicolor: Drop unnecessary static
        irqchip/mips-cpu: Drop unnecessary static
        irqchip/gic/realview: Drop unnecessary static
        irqchip/mips-gic: Remove population of irq domain names
        genirq/PM: Properly pretend disabled state when force resuming interrupts
      34eddefe
    • Linus Torvalds's avatar
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0a6109fd
      Linus Torvalds authored
      Pull core fixes from Ingo Molnar:
       "A fix to WARN_ON_ONCE() done by modules, plus a MAINTAINERS update"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        debug: Fix WARN_ON_ONCE() for modules
        MAINTAINERS: Update the PTRACE entry
      0a6109fd
    • Trond Myklebust's avatar
      NFS: Be more careful about mapping file permissions · ecbb903c
      Trond Myklebust authored
      When mapping a directory, we want the MAY_WRITE permissions to reflect
      whether or not we have permission to modify, add and delete the directory
      entries. MAY_EXEC must map to lookup permissions.
      
      On the other hand, for files, we want MAY_WRITE to reflect a permission
      to modify and extend the file.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      ecbb903c
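The mapping rule described above could look roughly like the following. This is only a sketch under stated assumptions: the `ACC_*` and `PERM_*` flags and `access_to_mask()` are hypothetical names invented here, and the handling of execute permission on regular files is an assumption that the commit message does not cover.

```c
#include <stdbool.h>

/* Hypothetical server-side access bits (not the real NFS constants). */
#define ACC_READ    0x01u
#define ACC_LOOKUP  0x02u
#define ACC_MODIFY  0x04u
#define ACC_EXTEND  0x08u   /* for a directory: add an entry */
#define ACC_DELETE  0x10u
#define ACC_EXECUTE 0x20u

/* Hypothetical local permission bits, standing in for MAY_*. */
#define PERM_EXEC   0x1u
#define PERM_WRITE  0x2u
#define PERM_READ   0x4u

static unsigned int access_to_mask(unsigned int access, bool is_dir)
{
    unsigned int mask = 0;

    if (access & ACC_READ)
        mask |= PERM_READ;

    if (is_dir) {
        /* Write on a directory needs modify, add and delete. */
        const unsigned int need = ACC_MODIFY | ACC_EXTEND | ACC_DELETE;

        if ((access & need) == need)
            mask |= PERM_WRITE;
        /* Exec on a directory means lookup. */
        if (access & ACC_LOOKUP)
            mask |= PERM_EXEC;
    } else {
        /* Write on a file needs modify and extend. */
        const unsigned int need = ACC_MODIFY | ACC_EXTEND;

        if ((access & need) == need)
            mask |= PERM_WRITE;
        /* Assumption: exec on a file maps to the execute bit. */
        if (access & ACC_EXECUTE)
            mask |= PERM_EXEC;
    }
    return mask;
}
```

With these definitions, a directory that grants modify and extend but not delete is not reported as writable, while a file with the same bits is.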
    • NeilBrown's avatar
      net/sunrpc/xprt_sock: fix regression in connection error reporting. · 3ffbc1d6
      NeilBrown authored
      Commit 3d476263 ("tcp: remove poll() flakes when receiving
      RST") in v4.12 changed the order in which ->sk_state_change()
      and ->sk_error_report() are called when a socket is shut
      down - sk_state_change() is now called first.
      
      This causes xs_tcp_state_change() -> xs_sock_mark_closed() ->
      xprt_disconnect_done() to wake all pending tasks with -EAGAIN.
      When the ->sk_error_report() callback arrives, it is too late to
      pass the error on, and it is lost.
      
      An easy way to demonstrate the problem is to try to start
      rpc.nfsd while rpcbind isn't running.
      nfsd will attempt a tcp connection to rpcbind.  An ECONNREFUSED
      error is returned, but sunrpc code loses the error and keeps
      retrying.  If it saw the ECONNREFUSED, it would abort.
      
      To fix this, handle the sk->sk_err in the TCP_CLOSE branch of
      xs_tcp_state_change().
      
      Fixes: 3d476263 ("tcp: remove poll() flakes when receiving RST")
      Cc: stable@vger.kernel.org (v4.12)
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      3ffbc1d6
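The idea of the fix described above, checking the pending socket error in the close branch of the state-change handler before waking waiters, can be modelled in a few lines of userspace C. This is a sketch, not the sunrpc code: `struct fake_sock`, `struct fake_xprt`, `wake_pending()` and `state_change()` are hypothetical stand-ins.

```c
#include <errno.h>

enum sock_state { ST_ESTABLISHED, ST_CLOSE };

/* Hypothetical model of a socket with a pending error. */
struct fake_sock {
    enum sock_state state;
    int err;              /* positive errno value, 0 if none */
};

/* Hypothetical model of the transport that owns waiting tasks. */
struct fake_xprt {
    int wake_status;      /* status handed to the woken tasks */
};

static void wake_pending(struct fake_xprt *xprt, int status)
{
    xprt->wake_status = status;
}

/* The state-change callback now runs before the error-report callback. */
static void state_change(struct fake_xprt *xprt, const struct fake_sock *sk)
{
    switch (sk->state) {
    case ST_CLOSE:
        /* Pass on a pending error instead of a generic -EAGAIN. */
        if (sk->err)
            wake_pending(xprt, -sk->err);
        else
            wake_pending(xprt, -EAGAIN);
        break;
    default:
        break;
    }
}
```

In this model a refused connection wakes the waiters with -ECONNREFUSED, so a caller can abort instead of retrying.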
    • Eryu Guan's avatar
      nfs: count correct array for mnt3_counts array size · ecc7b435
      Eryu Guan authored
      The array size of mnt3_counts should be the size of the array
      mnt3_procedures, not mnt_procedures, though they're the same size
      right now. Found this by code inspection.
      
      Fixes: 1c5876dd ("sunrpc: move p_count out of struct rpc_procinfo")
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Eryu Guan <eguan@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      ecc7b435
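Why the counter array should be sized from its own procedure table can be shown with a small standalone example. The tables and names below (`mnt_procs`, `mnt3_procs`, `struct procinfo`) are hypothetical, and the v3 table is deliberately given one extra entry so that the mismatch becomes visible; in the kernel the two tables currently have the same size.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, simplified procedure descriptor. */
struct procinfo {
    const char *name;
};

static const struct procinfo mnt_procs[] = {
    { "NULL" }, { "MNT" }, { "UMNT" },
};

/* Deliberately one entry longer, to illustrate the hazard. */
static const struct procinfo mnt3_procs[] = {
    { "NULL" }, { "MNT" }, { "UMNT" }, { "EXTRA" },
};

/* Each counter array is sized from the table it counts for. */
static unsigned int mnt_counts[ARRAY_SIZE(mnt_procs)];
static unsigned int mnt3_counts[ARRAY_SIZE(mnt3_procs)];

/* Bump the counter for procedure `idx` of the v3 table, if in range. */
static int mnt3_count_call(size_t idx)
{
    if (idx >= ARRAY_SIZE(mnt3_counts))
        return -1;
    mnt3_counts[idx]++;
    return 0;
}
```

Had `mnt3_counts` been sized from `mnt_procs` here, index 3 would fall outside the array as soon as the tables diverged.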
    • Rob Herring's avatar
      x86/devicetree: Convert to using %pOF instead of ->full_name · db15e7f2
      Rob Herring authored
      Now that we have a custom printf format specifier, convert users of
      full_name to use %pOF instead. This is preparation to remove storing
      of the full path string for each device node.
      Signed-off-by: Rob Herring <robh@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: devicetree@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170718214339.7774-7-robh@kernel.org
      [ Clarify the error message while at it, as 'node' is ambiguous. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      db15e7f2
    • Jiri Olsa's avatar
      perf/x86/intel: Add proper condition to run sched_task callbacks · df6c3db8
      Jiri Olsa authored
      We have 2 functions using the same sched_task callback:
      
        - PEBS drain for free running counters
        - LBR save/restore
      
      Both of them are called from intel_pmu_sched_task() and
      either of them can be unintentionally triggered when the
      other one is configured to run.
      
      Say a PEBS drain is configured in the sched_task callback
      for the event. The callback itself (intel_pmu_sched_task())
      will then also run the LBR save/restore code, which we did
      not ask for, because intel_pmu_sched_task() does not check
      which of the two was requested.
      
      This can lead to extra cycles in some perf monitoring,
      like when we monitor a PEBS event without LBR data.
      
        # perf record --no-timestamp -c 10000 -e cycles:p ./perf bench sched pipe -l 1000000
      
        (We need a PEBS, non-freq/non-timestamp event to enable
         the sched_task callback)
      
      The perf stat of cycles and msr:write_msr for the above
      command before the change:
        ...
        Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \
                                       ./perf bench sched pipe -l 1000000' (5 runs):
      
          18,519,557,441      cycles:k
              91,195,527      msr:write_msr
      
            29.334476406 seconds time elapsed
      
      And after the change:
        ...
        Performance counter stats for './perf record --no-timestamp -c 10000 -e cycles:p \
                                       ./perf bench sched pipe -l 1000000' (5 runs):
      
          18,704,973,540      cycles:k
              27,184,720      msr:write_msr
      
            16.977875900 seconds time elapsed
      
      There's no effect on cycles:k because the sched_task happens
      with events switched off; however, the msr:write_msr tracepoint
      counter, together with the drop in elapsed time from about 29.3
      to 17.0 seconds, shows the improvement.
      
      Monitoring an LBR event with extra PEBS drain processing
      in the sched_task callback showed just a little speedup, because
      the drain function does not do much extra work when there
      is no PEBS data.
      
      Add conditions to recognize the configured work that needs
      to be done in the x86_pmu's sched_task callback.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170719075247.GA27506@krava
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      df6c3db8
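The shape of the change described above, one shared callback that only runs the work that was actually requested, can be sketched as follows. This is a simplified userspace model, not the perf code: `struct cpu_state`, its flag and counter fields, and the three functions are hypothetical names chosen for illustration.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state recording which work was requested. */
struct cpu_state {
    bool pebs_requested;   /* PEBS drain asked for sched_task */
    bool lbr_requested;    /* LBR save/restore asked for sched_task */
    int pebs_calls;        /* how often the PEBS path ran */
    int lbr_calls;         /* how often the LBR path ran */
};

static void pebs_sched_task(struct cpu_state *cpu)
{
    cpu->pebs_calls++;     /* stands in for draining the PEBS buffer */
}

static void lbr_sched_task(struct cpu_state *cpu)
{
    cpu->lbr_calls++;      /* stands in for LBR save/restore */
}

/* Shared callback: run each part only if it was configured. */
static void pmu_sched_task(struct cpu_state *cpu)
{
    if (cpu->pebs_requested)
        pebs_sched_task(cpu);
    if (cpu->lbr_requested)
        lbr_sched_task(cpu);
}
```

With only the PEBS flag set, the LBR path is never entered, which corresponds to the unrequested work that the measurements above attribute to the extra MSR writes.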