1. 21 Jan, 2016 14 commits
    • libceph: invalidate AUTH in addition to a service ticket · 187d131d
      Ilya Dryomov authored
      If we fault due to authentication, we invalidate the service ticket we
      have and request a new one - the idea being that if a service rejected
      our authorizer, it must have expired, despite mon_client's attempts at
      periodic renewal.  (The other possibility is that our ticket is too new
      and the service hasn't gotten it yet, in which case invalidating isn't
      necessary but doesn't hurt.)
      
      Invalidating just the service ticket is not enough, though.  If we
      assume a failure on mon_client's part to renew a service ticket, we
      have to assume the same for the AUTH ticket.  If our AUTH ticket is
      bad, we won't get any service tickets no matter how hard we try, so
      invalidate the AUTH ticket along with the service ticket.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Sage Weil <sage@redhat.com>
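      The argument above can be shown with a minimal model. The types (struct ticket, struct auth_state) and helpers are hypothetical and are not libceph's. Each ticket records two things: what the client believes about it and what the cluster would actually accept. Renewal only touches tickets the client believes are invalid, and a service ticket can only be obtained with an AUTH ticket the monitor accepts.

```c
#include <stdbool.h>

/* Hypothetical model, not libceph's types: believed_valid is what the
 * client thinks, actually_valid is what the cluster would accept. */
struct ticket {
	bool believed_valid;
	bool actually_valid;
};

struct auth_state {
	struct ticket auth;	/* AUTH ticket, needed to obtain service tickets */
	struct ticket service;	/* e.g. an OSD service ticket */
};

/* Renew every ticket we no longer believe in. A service ticket is only
 * issued to a client whose AUTH ticket the monitor actually accepts. */
static bool renew_tickets(struct auth_state *s)
{
	if (!s->auth.believed_valid)
		s->auth.believed_valid = s->auth.actually_valid = true;
	if (!s->service.believed_valid) {
		if (!s->auth.actually_valid)
			return false;	/* stale AUTH ticket: no service ticket */
		s->service.believed_valid = s->service.actually_valid = true;
	}
	return true;
}

/* Authorization fault handler; the fix corresponds to invalidate_auth. */
static void on_auth_fault(struct auth_state *s, bool invalidate_auth)
{
	s->service.believed_valid = false;
	if (invalidate_auth)
		s->auth.believed_valid = false;
}
```

      With a stale AUTH ticket, invalidating only the service ticket leaves renewal failing every time. Invalidating both lets renewal succeed.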
    • libceph: fix authorizer invalidation, take 2 · 6abe097d
      Ilya Dryomov authored
      Back in 2013, commit 4b8e8b5d ("libceph: fix authorizer
      invalidation") tried to fix authorizer invalidation issues by clearing
      the validity field.  However, nothing ever consults this field, so it
      doesn't force us to request any new secrets in any way and therefore we
      never get out of the exponential backoff mode:
      
          [  129.973812] libceph: osd2 192.168.122.1:6810 connect authorization failure
          [  130.706785] libceph: osd2 192.168.122.1:6810 connect authorization failure
          [  131.710088] libceph: osd2 192.168.122.1:6810 connect authorization failure
          [  133.708321] libceph: osd2 192.168.122.1:6810 connect authorization failure
          [  137.706598] libceph: osd2 192.168.122.1:6810 connect authorization failure
          ...
      
      AFAICT this was the case at the time 4b8e8b5d was merged, too.
      
      Using timespec solely as a bool isn't nice, so introduce a new have_key
      flag, specifically for this purpose.
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Sage Weil <sage@redhat.com>
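      Here is a sketch of the idea using a hypothetical struct ticket_handler; of its fields, only the have_key name comes from the commit. Invalidation clears a dedicated boolean, and the check that decides whether to request a new secret actually reads that boolean. That second step is what the 2013 fix was missing.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical ticket handler; only have_key is named in the commit. */
struct ticket_handler {
	bool have_key;		/* do we hold a usable secret at all? */
	time_t renew_after;	/* proactive renewal time (illustrative) */
};

/* Invalidation flips the dedicated flag... */
static void invalidate_ticket(struct ticket_handler *th)
{
	th->have_key = false;
}

/* ...and, unlike the old validity field, the flag is consulted when
 * deciding whether to request a new secret. */
static bool need_new_ticket(const struct ticket_handler *th, time_t now)
{
	return !th->have_key || now >= th->renew_after;
}
```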
    • libceph: clear messenger auth_retry flag if we fault · f6330cc1
      Ilya Dryomov authored
      Commit 20e55c4c ("libceph: clear messenger auth_retry flag when we
      authenticate") got us only half way there.  We clear the flag if the
      second attempt succeeds, but it also needs to be cleared if that
      attempt fails, to allow for the exponential backoff to kick in.
      Otherwise, if ->should_authenticate() thinks our keys are valid, we
      will busy loop, incrementing auth_retry to no avail:
      
          process_connect ffff880079a63830 got BADAUTHORIZER attempt 1
          process_connect ffff880079a63830 got BADAUTHORIZER attempt 2
          process_connect ffff880079a63830 got BADAUTHORIZER attempt 3
          process_connect ffff880079a63830 got BADAUTHORIZER attempt 4
          process_connect ffff880079a63830 got BADAUTHORIZER attempt 5
          ...
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Sage Weil <sage@redhat.com>
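      The loop described above can be reproduced with a simplified, hypothetical model. It assumes the connection faults when auth_retry reaches exactly 2. That assumption is ours, though it would explain why the debug output keeps counting past 2 when the flag is never cleared. The 250 ms base backoff is also made up.

```c
#include <stdbool.h>

/* Hypothetical connection model; only auth_retry is named in the commit. */
struct conn {
	int auth_retry;		/* BADAUTHORIZER replies since last clear */
	int backoff_ms;		/* 0 = reconnect immediately */
};

/* Assumed policy: retry once with a fresh authorizer, fault on the 2nd
 * failure. Returns true if the connection should fault. */
static bool got_badauthorizer(struct conn *c)
{
	c->auth_retry++;
	return c->auth_retry == 2;
}

/* Fault path: the fix clears auth_retry here; backoff doubles per fault. */
static void con_fault(struct conn *c, bool clear_auth_retry)
{
	if (clear_auth_retry)
		c->auth_retry = 0;
	c->backoff_ms = c->backoff_ms ? c->backoff_ms * 2 : 250;
}

/* Feed n consecutive BADAUTHORIZER replies and count the faults, i.e. how
 * often exponential backoff gets a chance to kick in. */
static int faults_for(int n, bool clear_auth_retry)
{
	struct conn c = { 0, 0 };
	int faults = 0;

	for (int i = 0; i < n; i++) {
		if (got_badauthorizer(&c)) {
			con_fault(&c, clear_auth_retry);
			faults++;
		}
	}
	return faults;
}
```

      Without the clear, only the first pair of failures reaches the fault path, and every later BADAUTHORIZER is retried immediately (the busy loop). With the clear, every second failure faults and the backoff grows.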
    • libceph: fix ceph_msg_revoke() · 67645d76
      Ilya Dryomov authored
      There are a number of problems with revoking a "was sending" message:
      
      (1) We never make any attempt to revoke data - only kvecs contribute to
      con->out_skip.  However, once the header (envelope) is written to the
      socket, our peer learns data_len and sets itself to expect at least
      data_len bytes to follow front or front+middle.  If ceph_msg_revoke()
      is called while the messenger is sending the message's data portion,
      anything we send after that call is counted by the OSD towards the now
      revoked message's data portion.  The effects vary; the most common one
      is an eventual hang: higher layers get stuck waiting for the reply to
      a message that was sent out after ceph_msg_revoke() returned and was
      treated by the OSD as a bunch of data bytes.  This is what Matt ran
      into.
      
      (2) Flat out zeroing con->out_kvec_bytes worth of bytes to handle kvecs
      is wrong.  If ceph_msg_revoke() is called before the tag is sent out or
      while the messenger is sending the header, we will get a connection
      reset, either due to a bad tag (0 is not a valid tag) or a bad header
      CRC, which kind of defeats the purpose of revoke.  Currently the kernel
      client refuses to work with header CRCs disabled, but that will likely
      change in the future, making this even worse.
      
      (3) con->out_skip is not reset on connection reset, leading to one or
      more spurious connection resets if we happen to get a real one between
      the time con->out_skip is set in ceph_msg_revoke() and the time it's
      cleared in write_partial_skip().
      
      Fixing (1) and (3) is trivial.  The idea behind fixing (2) is to never
      zero the tag or the header, i.e. send out tag+header regardless of when
      ceph_msg_revoke() is called.  That way the header is always correct, no
      unnecessary resets are induced and revoke stands ready for disabled
      CRCs.  Since ceph_msg_revoke() rips out con->out_msg, introduce a new
      "message out temp" and copy the header into it before sending.
      
      Cc: stable@vger.kernel.org # 4.0+
      Reported-by: Matt Conner <matt.conner@keepertech.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Tested-by: Matt Conner <matt.conner@keepertech.com>
      Reviewed-by: Sage Weil <sage@redhat.com>
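      Point (1) comes down to byte accounting. In this hypothetical sketch, once the header has announced the front, middle and data lengths, the peer expects that many payload bytes. A revoke therefore has to account for the unsent data as well as the unsent kvecs. Footer handling, and the tag/header preservation from point (2), are left out.

```c
#include <stddef.h>

/* Payload lengths announced in a (hypothetical) message header. Once the
 * header is on the wire, the peer expects this many payload bytes. */
struct msg_lens {
	size_t front, middle, data;
};

/* Payload bytes still owed to the peer after `sent` of them went out.
 * After a revoke these must still be written (e.g. as zeros), otherwise
 * the next message is consumed as this message's data. */
static size_t revoke_skip(const struct msg_lens *l, size_t sent)
{
	size_t total = l->front + l->middle + l->data;

	return sent >= total ? 0 : total - sent;
}

/* Problem (1) from the commit: only the kvecs (front/middle) count. */
static size_t revoke_skip_kvecs_only(const struct msg_lens *l, size_t sent)
{
	size_t kvecs = l->front + l->middle;

	return sent >= kvecs ? 0 : kvecs - sent;
}
```

      If the revoke happens 1000 bytes into a message with a 128-byte front and 4096 bytes of data, the kvec-only accounting owes the peer nothing. The peer, however, is still waiting for 3224 more bytes.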
    • libceph: use list_for_each_entry_safe · 10bcee14
      Geliang Tang authored
      Use list_for_each_entry_safe() instead of list_for_each_safe() to
      simplify the code.
      Signed-off-by: Geliang Tang <geliangtang@163.com>
      [idryomov@gmail.com: nuke call to list_splice_init() as well]
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
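      A userspace sketch of the pattern follows. It uses minimal copies of the kernel list macros; the real ones live in include/linux/list.h and rely on typeof, as these do through the GCC/Clang __typeof__ extension. The _safe variant caches the next entry, so the current one can be unlinked and freed mid-walk. The _entry variant hands back the containing structure directly instead of a raw list_head. struct item and remove_odd() are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Minimal userspace copies of the kernel list helpers (illustrative). */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
#define list_first_entry(ptr, type, member) \
	list_entry((ptr)->next, type, member)
#define list_next_entry(pos, member) \
	list_entry((pos)->member.next, __typeof__(*(pos)), member)
#define list_for_each_entry_safe(pos, n, head, member)			\
	for (pos = list_first_entry(head, __typeof__(*pos), member),	\
	     n = list_next_entry(pos, member);				\
	     &pos->member != (head);					\
	     pos = n, n = list_next_entry(n, member))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Hypothetical element type. */
struct item {
	int val;
	struct list_head node;
};

/* Unlink and free every item with an odd value; safe because n already
 * points past pos when pos is freed. */
static int remove_odd(struct list_head *head)
{
	struct item *pos, *n;
	int removed = 0;

	list_for_each_entry_safe(pos, n, head, node) {
		if (pos->val % 2) {
			list_del(&pos->node);
			free(pos);
			removed++;
		}
	}
	return removed;
}
```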
    • ceph: use i_size_{read,write} to get/set i_size · 99c88e69
      Yan, Zheng authored
      Cap messages from the MDS can update i_size, and in that case we don't
      hold i_mutex. So it's unsafe to access inode->i_size directly, even
      while holding i_mutex.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
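      The rule can be illustrated with a userspace analogue; this is not the kernel code. i_size is only read and written through accessors, because a writer (here, the cap-message handler) may not hold i_mutex. In the kernel, i_size_read()/i_size_write() use a seqcount on 32-bit SMP builds to avoid torn 64-bit reads. The sketch swaps in a C11 atomic instead, and struct demo_inode is hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Userspace analogue: all size accesses go through the helpers below, so
 * a concurrent reader never sees a torn 64-bit value, whatever locks the
 * writer does or does not hold. */
struct demo_inode {
	_Atomic int64_t size;
};

static int64_t demo_size_read(struct demo_inode *inode)
{
	return atomic_load_explicit(&inode->size, memory_order_relaxed);
}

static void demo_size_write(struct demo_inode *inode, int64_t size)
{
	atomic_store_explicit(&inode->size, size, memory_order_relaxed);
}
```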
    • ceph: re-send AIO write request when getting -EOLDSNAP error · 5be0389d
      Yan, Zheng authored
      When receiving -EOLDSNAP from the OSD, we need to re-send the
      corresponding write request. Due to a locking issue, we can't send a
      new request inside another OSD request's completion callback, so we
      use a worker to re-send the request for AIO writes.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
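      A hypothetical model of the deferral pattern: the completion callback cannot safely submit a new request (the commit cites a locking issue), so it only queues the request. A worker running outside the callback then re-sends it. All names and the error value are illustrative, not ceph's.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_EOLDSNAP 1000	/* hypothetical error code, not ceph's value */

struct demo_req {
	int id;
	struct demo_req *next;	/* link in the re-send queue */
};

struct demo_client {
	bool in_callback;		/* stands in for the lock held in callbacks */
	struct demo_req *resend;	/* requests waiting for the worker */
	int submitted;
};

static bool demo_submit(struct demo_client *c, struct demo_req *r)
{
	(void)r;
	if (c->in_callback)
		return false;	/* the locking problem: not allowed here */
	c->submitted++;
	return true;
}

/* Completion callback: on -EOLDSNAP, queue the request instead of
 * re-sending it from here. */
static void demo_complete(struct demo_client *c, struct demo_req *r, int result)
{
	c->in_callback = true;
	if (result == -DEMO_EOLDSNAP) {
		r->next = c->resend;
		c->resend = r;
	}
	c->in_callback = false;
}

/* Worker: runs outside any completion callback, so it may submit. */
static void demo_resend_worker(struct demo_client *c)
{
	while (c->resend) {
		struct demo_req *r = c->resend;

		c->resend = r->next;
		demo_submit(c, r);
	}
}
```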
    • ceph: Asynchronous IO support · c8fe9b17
      Yan, Zheng authored
      The basic idea of AIO support is simple: just call kiocb::ki_complete()
      in the OSD request's completion callback. But there are several special
      cases.

      When IO spans multiple objects, we need to wait until all OSD requests
      are complete, then call kiocb::ki_complete(). Error handling in this
      case is tricky too. For simplicity, AIO that both spans multiple
      objects and extends i_size is not allowed.

      Another special case is checking EOF for reads (another client can
      write to the file and extend i_size concurrently). For simplicity, the
      direct-IO/AIO code path does not do the check; it falls back to a
      normal sync read instead.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
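      A sketch of the multi-object case, assuming a hypothetical struct demo_aio shared by all OSD requests of one AIO. Each completion callback updates a pending counter, and only the last one to finish completes the iocb. Reporting the first error seen is our simplification; the commit only says that error handling here is tricky.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical aggregate for one AIO split into several OSD requests. */
struct demo_aio {
	atomic_int pending;	/* OSD requests not yet completed */
	atomic_long total;	/* bytes transferred so far */
	atomic_int error;	/* first error seen, 0 if none */
	bool completed;		/* stands in for calling ki_complete() */
	long result;
};

static void demo_aio_init(struct demo_aio *a, int nr_osd_reqs)
{
	atomic_init(&a->pending, nr_osd_reqs);
	atomic_init(&a->total, 0);
	atomic_init(&a->error, 0);
	a->completed = false;
	a->result = 0;
}

/* Per-OSD-request completion callback. */
static void demo_osd_req_done(struct demo_aio *a, long rc)
{
	if (rc < 0) {
		int expected = 0;

		/* keep only the first error */
		atomic_compare_exchange_strong(&a->error, &expected, (int)rc);
	} else {
		atomic_fetch_add(&a->total, rc);
	}
	/* the last request to finish completes the whole AIO */
	if (atomic_fetch_sub(&a->pending, 1) == 1) {
		int err = atomic_load(&a->error);

		a->result = err ? err : atomic_load(&a->total);
		a->completed = true;
	}
}
```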
    • ceph: Avoid to propagate the invalid page point · 458c4703
      Minfei Huang authored
      The variable pagep still receives an invalid page pointer even when
      ceph_update_writeable_page() fails.

      To fix this, assign the page to pagep only after
      ceph_update_writeable_page() succeeds.
      Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
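      A sketch of the fix with hypothetical names; the real function's retry loop is omitted. What matters is the ordering: *pagep is written only after the helper reports success.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct page { bool locked; };

/* Hypothetical stand-in for ceph_update_writeable_page(): 0 on success,
 * negative errno on failure (the page is then no longer usable). */
static int demo_update_writeable_page(struct page *page, bool fail)
{
	if (fail) {
		page->locked = false;
		return -EAGAIN;
	}
	return 0;
}

/* write_begin-style helper after the fix: *pagep is assigned only once
 * the page has been prepared, so a failure never hands out a bad page. */
static int demo_write_begin(struct page *page, bool fail, struct page **pagep)
{
	int r = demo_update_writeable_page(page, fail);

	if (r < 0)
		return r;
	*pagep = page;
	return 0;
}
```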
    • ceph: fix double page_unlock() in page_mkwrite() · f9cac5ac
      Yan, Zheng authored
      ceph_update_writeable_page() unlocks the page on errors, so
      page_mkwrite() should not unlock the page again.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
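      A hypothetical model of the ownership rule: when a callee unlocks the page on its error path, the caller must return without unlocking it again. An unlock counter makes a double unlock visible. The negative errno return is a simplification; the real page_mkwrite() returns VM_FAULT_* codes.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical page with an unlock counter so double unlocks show up. */
struct page {
	bool locked;
	int unlocks;
};

static void demo_unlock_page(struct page *p)
{
	p->locked = false;
	p->unlocks++;
}

/* Stand-in for ceph_update_writeable_page(): per the commit text, it
 * unlocks the page itself on error. */
static int demo_update_writeable_page(struct page *p, bool fail)
{
	if (fail) {
		demo_unlock_page(p);
		return -EAGAIN;
	}
	return 0;
}

/* page_mkwrite-style caller after the fix. */
static int demo_page_mkwrite(struct page *p, bool fail)
{
	int err = demo_update_writeable_page(p, fail);

	if (err)
		return err;	/* already unlocked by the callee: no 2nd unlock */
	return 0;		/* in this model, success keeps the page locked */
}
```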
    • rbd: delete an unnecessary check before rbd_dev_destroy() · 1761b229
      Markus Elfring authored
      The rbd_dev_destroy() function tests whether its argument is NULL
      and then returns immediately. Thus the test around the call is not needed.
      
      This issue was detected by using the Coccinelle software.
      Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
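      This follows the same convention as kfree(NULL) or free(NULL): when a destructor already tolerates NULL, a guard at the call site is redundant. A sketch with hypothetical names:

```c
#include <stdlib.h>

/* Hypothetical device type and destructor. */
struct demo_dev { int id; };

static int destroyed;	/* counts real destructions, for illustration */

/* Like the rbd_dev_destroy() described above: NULL returns immediately. */
static void demo_dev_destroy(struct demo_dev *d)
{
	if (!d)
		return;
	destroyed++;
	free(d);
}

static void demo_cleanup(struct demo_dev *d)
{
	/* Before: if (d) demo_dev_destroy(d);  After: */
	demo_dev_destroy(d);
}
```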
    • libceph: use list_next_entry instead of list_entry_next · 17ddc49b
      Geliang Tang authored
      list_next_entry is already defined in list.h, so use it in place of
      list_entry_next.
      Signed-off-by: Geliang Tang <geliangtang@163.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • ceph: ceph_frag_contains_value can be boolean · 79a3ed2e
      Yaowei Bai authored
      This patch makes ceph_frag_contains_value return bool to improve
      readability, since this function only ever returns one or zero.
      
      No functional change.
      Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
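      A sketch of the frag test returning bool. It mirrors the encoding in ceph_frag.h as we understand it: the top 8 bits of a frag hold the number of significant bits, the low 24 bits hold the value, and a value falls inside the frag when its masked high bits match. The function names are ours.

```c
#include <stdbool.h>
#include <stdint.h>

/* Top 8 bits: number of significant bits (at most 24); low 24: value. */
static inline uint32_t demo_frag_bits(uint32_t f) { return f >> 24; }
static inline uint32_t demo_frag_value(uint32_t f) { return f & 0xffffffu; }

static inline uint32_t demo_mask_for_bits(uint32_t bits)
{
	return (0xffffffu << (24 - bits)) & 0xffffffu;
}

static inline uint32_t demo_frag_mask(uint32_t f)
{
	return demo_mask_for_bits(demo_frag_bits(f));
}

static inline uint32_t demo_frag_make(uint32_t bits, uint32_t val)
{
	return (bits << 24) | (val & demo_mask_for_bits(bits));
}

/* After the change: a plain predicate, so it returns bool. */
static inline bool demo_frag_contains_value(uint32_t f, uint32_t v)
{
	return (v & demo_frag_mask(f)) == demo_frag_value(f);
}
```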
    • ceph: remove unused functions in ceph_frag.h · eade1fe7
      Yaowei Bai authored
      These functions were introduced in commit 3d14c5d2 ("ceph: factor
      out libceph from Ceph file system"). However, they have had no users
      since then, so remove them for simplicity.
      Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
  2. 10 Jan, 2016 1 commit
  3. 09 Jan, 2016 4 commits
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · eac6f76a
      Linus Torvalds authored
      Pull SCSI fix from James Bottomley:
       "A single fix for machines with pages > 4k (PPC mostly).
      
        There's a bug in our optimal transfer size code where we don't account
        for pages > 4k and can set the transfer size to be less than the page
        size causing nasty failures"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        sd: Reject optimal transfer length smaller than page size
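      The check described in the pull request can be expressed as a small helper that ignores a reported optimal transfer length smaller than a page. On a machine with 64 KiB pages, 8 blocks of 512 bytes (4 KiB) would otherwise set the transfer size below the page size. The name and parameters are hypothetical, not the sd driver's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: use the device-reported optimal transfer length
 * (in logical blocks) only if it is non-zero and at least one page. */
static bool opt_xfer_usable(uint32_t opt_xfer_blocks,
			    uint32_t logical_block_size, uint32_t page_size)
{
	uint64_t bytes = (uint64_t)opt_xfer_blocks * logical_block_size;

	return opt_xfer_blocks != 0 && bytes >= page_size;
}
```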
    • Merge tag 'pci-v4.4-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · c0cb1393
      Linus Torvalds authored
      Pull PCI fixlet from Bjorn Helgaas:
       "This marks the TI DRA7xx host bridge driver as broken.  Apparently it
        has never worked without some additional out-of-tree code, so I'm
        going to mark it broken now and remove it completely next cycle unless
        it's fixed"
      
      * tag 'pci-v4.4-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: dra7xx: Mark driver as broken
    • vmstat: allocate vmstat_wq before it is used · 751e5f5c
      Michal Hocko authored
      The kernel test robot reported the following crash:
      
        BUG: unable to handle kernel NULL pointer dereference at 00000100
        IP: [<c1074df6>] __queue_work+0x26/0x390
        *pdpt = 0000000000000000 *pde = f000ff53f000ff53 *pde = f000ff53f000ff53
        Oops: 0000 [#1] PREEMPT PREEMPT SMP SMP
        CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 4.4.0-rc4-00139-g373ccbe5 #1
        Workqueue: events vmstat_shepherd
        task: cb684600 ti: cb7ba000 task.ti: cb7ba000
        EIP: 0060:[<c1074df6>] EFLAGS: 00010046 CPU: 0
        EIP is at __queue_work+0x26/0x390
        EAX: 00000046 EBX: cbb37800 ECX: cbb37800 EDX: 00000000
        ESI: 00000000 EDI: 00000000 EBP: cb7bbe68 ESP: cb7bbe38
         DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
        CR0: 8005003b CR2: 00000100 CR3: 01fd5000 CR4: 000006b0
        Stack:
        Call Trace:
          __queue_delayed_work+0xa1/0x160
          queue_delayed_work_on+0x36/0x60
          vmstat_shepherd+0xad/0xf0
          process_one_work+0x1aa/0x4c0
          worker_thread+0x41/0x440
          kthread+0xb0/0xd0
          ret_from_kernel_thread+0x21/0x40
      
      The reason is that start_shepherd_timer schedules the shepherd work
      item (vmstat_shepherd), which uses vmstat_wq, before setup_vmstat
      allocates that workqueue, so if the further initialization takes more
      than HZ we might end up scheduling on a NULL vmstat_wq.  This is really
      unlikely but not impossible.
      
      Fixes: 373ccbe5 ("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress")
      Reported-by: kernel test robot <ying.huang@linux.intel.com>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: stable@vger.kernel.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
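      The ordering bug reduces to a simple rule: anything that can queue work on a workqueue must start only after that workqueue exists. A hypothetical reduction follows; none of these names are mm/vmstat.c symbols.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical workqueue stand-in. */
struct demo_workqueue {
	int queued;
};

static struct demo_workqueue *demo_wq;	/* NULL until setup allocates it */

/* Queueing on a workqueue that does not exist yet is the reported bug;
 * here it is detected instead of dereferencing NULL. */
static bool demo_queue_shepherd(void)
{
	if (!demo_wq)
		return false;
	demo_wq->queued++;
	return true;
}

/* Fixed ordering: allocate the workqueue first, then start anything
 * (like the shepherd timer) that may queue work on it. */
static int demo_setup(void)
{
	demo_wq = calloc(1, sizeof(*demo_wq));
	if (!demo_wq)
		return -1;
	return demo_queue_shepherd() ? 0 : -1;
}
```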
    • Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 44d8a7d5
      Linus Torvalds authored
      Pull ARM SoC fixes from Arnd Bergmann:
       "This is the final small set of ARM SoC bug fixes for linux-4.4, almost
        all regressions:
      
        OMAP:
         - data corruption on the Nokia N900 flash
      
        Allwinner:
         - Two defconfig changes to get USB working again
      
        ARM Versatile:
         - Interrupt numbers gone bad after an older bug fix
      
        Nomadik:
         - Crashes from incorrect L2 cache settings
      
        VIA vt8500:
         - SD/MMC support on WM8650 never worked"
      
      * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        dts: vt8500: Add SDHC node to DTS file for WM8650
        ARM: Fix broken USB support in multi_v7_defconfig for sunxi devices
        ARM: versatile: fix MMC/SD interrupt assignment
        ARM: nomadik: set latencies to 8 cycles
        ARM: OMAP2+: Fix onenand rate detection to avoid filesystem corruption
        ARM: Fix broken USB support in sunxi_defconfig
  4. 08 Jan, 2016 15 commits
  5. 07 Jan, 2016 6 commits