1. 20 Feb, 2015 12 commits
    • Merge tag 'for-linus-3.20-1' of git://git.code.sf.net/p/openipmi/linux-ipmi · 7bad2227
      Linus Torvalds authored
      Pull IPMI driver updates from Corey Minyard:
       "Some minor fixes and cleanups, nothing big.
      
        In for-next for a while and I've done some extensive beating on the
        driver since I have it working in qemu and can do creatively cruel
        things to it"
      
      * tag 'for-linus-3.20-1' of git://git.code.sf.net/p/openipmi/linux-ipmi:
        ipmi: Fix a memory ordering issue
        ipmi: Remove uses of return value of seq_printf
        ipmi: Use is_visible callback for conditional sysfs entries
        ipmi: Free ipmi_recv_msg messages from the linked list on close
        ipmi: avoid gcc warning
        ipmi: Update timespec usage to timespec64
        ipmi: Cleanup DEBUG_TIMING ifdef usage
        drivers:char:ipmi: Remove unneeded FIXME comment in the file,ipmi_si_intf.c
        char: ipmi: Remove obsolete cleanup for clientdata
        ipmi: Remove a FIXME for slab conversion
    • ipmi: Fix a memory ordering issue · 1d86e29b
      Corey Minyard authored
      From a locking point of view it is safe to check waiting_msg without
      a lock, but there is a memory ordering issue that causes it to
      possibly not be set right when viewed from another processor.  We are
      already claiming a lock right after that, move the check to inside
      the lock to enforce the memory ordering.
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
    • ipmi: Remove uses of return value of seq_printf · d6c5dc18
      Joe Perches authored
      The seq_printf like functions will soon be changed to return void.
      
      Convert these uses to check seq_has_overflowed instead.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
    • ipmi: Use is_visible callback for conditional sysfs entries · 2d06a0c9
      Takashi Iwai authored
      Instead of manual calls of device_create_file() and
      device_remove_file(), implement the condition in is_visible callback
      for the attribute group and put these entries to the group, too.
      This simplifies the code and avoids the possible races.
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
    • ipmi: Free ipmi_recv_msg messages from the linked list on close · bdf2829c
      Nicholas Krause authored
      This adds a loop over the elements of the recv_msgs linked list, using
      list_for_each_entry_safe, in order to free the messages in the list.  The
      safe version of this macro is used to prevent use-after-free bugs when
      deleting the current element: it holds a pointer to the next element
      while the current one is freed with ipmi_free_recv_msg inside the loop.
      Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
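      The "safe" iteration idea described in this commit can be sketched in
      user space as follows. This is only an illustration: struct recv_msg,
      recv_msg_push() and recv_msg_free_all() are hypothetical names, and the
      sketch uses a plain singly linked list rather than the kernel's
      struct list_head and ipmi_free_recv_msg.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ipmi_recv_msg; not the kernel type. */
struct recv_msg {
    struct recv_msg *next;
    int payload;
};

/* Push a new message on the front of the list and return the new head.
 * If allocation fails the old head is returned unchanged. */
struct recv_msg *recv_msg_push(struct recv_msg *head, int payload)
{
    struct recv_msg *msg = malloc(sizeof(*msg));

    if (!msg)
        return head;
    msg->payload = payload;
    msg->next = head;
    return msg;
}

/* Free every message and return how many were freed. The successor is
 * saved before the current node is freed, which is the same idea the
 * "safe" iterator relies on: never read a node after freeing it. */
size_t recv_msg_free_all(struct recv_msg **head)
{
    size_t freed = 0;
    struct recv_msg *msg;
    struct recv_msg *next;

    assert(head != NULL);
    for (msg = *head; msg != NULL; msg = next) {
        next = msg->next;   /* remember the successor first */
        free(msg);
        freed++;
    }
    *head = NULL;           /* the list is now empty */
    return freed;
}
```

      Reading msg->next after free(msg) would be the use-after-free that the
      commit message warns about; saving it first avoids that.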
    • ipmi: avoid gcc warning · 191cc414
      Arnd Bergmann authored
      A new harmless warning has come up on ARM builds with gcc-4.9:
      
      drivers/char/ipmi/ipmi_msghandler.c: In function 'smi_send.isra.11':
      include/linux/spinlock.h:372:95: warning: 'flags' may be used uninitialized in this function [-Wmaybe-uninitialized]
        raw_spin_unlock_irqrestore(&lock->rlock, flags);
                                                                                                     ^
      drivers/char/ipmi/ipmi_msghandler.c:1490:16: note: 'flags' was declared here
        unsigned long flags;
                      ^
      
      This could be worked around by initializing the 'flags' variable, but it
      seems better to rework the code to avoid this.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: 7ea0ed2b ("ipmi: Make the message handler easier to use for SMI interfaces")
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
    • ipmi: Update timespec usage to timespec64 · 48862ea2
      John Stultz authored
      As part of the internal y2038 cleanup, this patch removes
      timespec usage in the ipmi driver, replacing it with timespec64.
      
      Cc: openipmi-developer@lists.sourceforge.net
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Corey Minyard <minyard@mvista.com>
    • ipmi: Cleanup DEBUG_TIMING ifdef usage · f93aae9f
      John Stultz authored
      The driver uses #ifdef DEBUG_TIMING in order to conditionally print out
      timestamped debug messages. Unfortunately it adds the ifdefs all over the
      usage sites.
      
      This patch cleans it up by adding a debug_timestamp() function which
      is compiled out if DEBUG_TIMING isn't present. This cleans up all
      the ugly ifdefs in the function logic.
      
      Cc: openipmi-developer@lists.sourceforge.net
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Corey Minyard <minyard@mvista.com>
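      The pattern this commit describes, a single helper that is compiled
      out when the macro is absent, might look roughly like the sketch
      below. It is a user-space approximation, not the driver's code:
      handle_event() is a hypothetical caller, and the output format is an
      assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

#ifdef DEBUG_TIMING
/* Enabled build (compile with -DDEBUG_TIMING): print a timestamped message. */
static void debug_timestamp(const char *msg)
{
    assert(msg != NULL);
    printf("**%s: %lld\n", msg, (long long)time(NULL));
}
#else
/* Disabled build: the call expands to nothing, so callers need no #ifdef. */
#define debug_timestamp(msg) ((void)0)
#endif

/* Hypothetical caller: the function logic stays free of #ifdef blocks. */
int handle_event(int value)
{
    debug_timestamp("Enter");
    value *= 2;
    debug_timestamp("Exit");
    return value;
}
```

      The conditional compilation lives in one place, so each usage site is
      a plain function call whether or not DEBUG_TIMING is defined.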
    • drivers:char:ipmi: Remove unneeded FIXME comment in the file,ipmi_si_intf.c · 31013fa9
      Nicholas Krause authored
      Removes a no longer needed FIXME comment in the function
      acpi_gpe_irq_setup in the file ipmi_si_intf.c.  The comment is no longer
      needed because we are clearly passing the correct level,
      ACPI_GPE_LEVEL_TRIGGERED, to the installer function
      acpi_install_gpe_handler: there has been no breakage after years of
      using this ACPI level there.
      Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
    • char: ipmi: Remove obsolete cleanup for clientdata · bb82d90e
      Wolfram Sang authored
      A few new i2c-drivers came into the kernel which clear the clientdata-pointer
      on exit or error. This is now obsolete; the core will do it.
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
    • ipmi: Remove a FIXME for slab conversion · 2fcaf60c
      Corey Minyard authored
      There can't be more than a few IPMI messages allocated at any one time,
      so converting the messages to slabs would be a waste.  So just remove
      the FIXME.
      Suggested-by: Nicholas Krause <xerofoify@gmail.com>
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
    • Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal · 3d883483
      Linus Torvalds authored
      Pull more thermal management updates from Zhang Rui:
       "Specifics:
      
         - Exynos thermal driver refactoring.  Several cleanups, code
           optimization, unused symbols removal, and unused feature removal in
           Exynos thermal driver.  Thanks Lukasz for this effort.
      
         - Exynos thermal driver support to OF thermal.  After the code
           refactoring, the driver earned the support to OF thermal.  Chip
           thermal data were moved from driver code to DTS, reducing the code
           footprint.  Thanks Lukasz for this.
      
         - After receiving the OF thermal support, the exynos thermal driver
           now must allow modular build.  Thanks Arnd for detecting, reporting
           and fixing this.
      
         - Exynos thermal driver support to Exynos 7 SoC.  Thanks Abhilash for
           this.
      
         - Accurate temperature reporting on Rockchip thermal driver, thanks
           to Caesar.
      
         - Fix on how OF thermal enables its zones, thanks Lukasz for fixing.
      
         - Fixes in OF thermal examples under Documentation/.  Thanks Srinivas
           for fixing"
      
      * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
        thermal: exynos: Add TMU support for Exynos7 SoC
        dts: Documentation: Add documentation for Exynos7 SoC thermal bindings
        cpufreq: exynos: allow modular build
        thermal: Fix examples in DT documentation
        thermal: exynos: Correct sanity check at exynos_report_trigger() function
        thermal: Kconfig: Remove config for not used EXYNOS_THERMAL_CORE
        thermal: exynos: Remove exynos_tmu_data.c file
        thermal: rockchip: make temperature reporting much more accurate
        thermal: exynos: Remove exynos_thermal_common.[c|h] files
        thermal: samsung: core: Exynos TMU rework to use device tree for configuration
        dts: Documentation: Update exynos-thermal.txt example for Exynos5440
        dts: Documentation: Extending documentation entry for exynos-thermal
        cpufreq: exynos: Use device tree to determine if cpufreq cooling should be registered
        thermal: exynos: Modify exynos thermal code to use device tree for cpu cooling configuration
        thermal: exynos: Provide thermal_exynos.h file to be included in device tree files
        thermal: exynos: cosmetic: Correct comment format
        thermal: of: Enable thermal_zoneX when sensor is correctly added
  2. 19 Feb, 2015 28 commits
    • x86: pte_protnone() and pmd_protnone() must check entry is not present · e3a1f6ca
      David Vrabel authored
      Since _PAGE_PROTNONE aliases _PAGE_GLOBAL it is only valid if
      _PAGE_PRESENT is clear.  Make pte_protnone() and pmd_protnone() check
      for this.
      
      This fixes a 64-bit Xen PV guest regression introduced by 8a0516ed
      ("mm: convert p[te|md]_numa users to p[te|md]_protnone_numa").  Any
      userspace process would endlessly fault.
      
      In a 64-bit PV guest, userspace page table entries have _PAGE_GLOBAL set
      by the hypervisor.  This meant that any fault on a present userspace
      entry (e.g., a write to a read-only mapping) would be misinterpreted as
      a NUMA hinting fault and the fault would not be correctly handled,
      resulting in the access endlessly faulting.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
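      The aliasing problem in this commit can be shown with a small
      user-space sketch. The macro names and bit positions below are
      illustrative assumptions, and protnone_naive() and protnone_checked()
      are hypothetical helpers rather than the kernel's pte_protnone().

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values (assumed): bit 0 = present, bit 8 = global.
 * PROTNONE deliberately shares the GLOBAL bit, as the commit describes. */
#define PAGE_PRESENT  (UINT64_C(1) << 0)
#define PAGE_GLOBAL   (UINT64_C(1) << 8)
#define PAGE_PROTNONE PAGE_GLOBAL

/* Naive check: misfires on present entries that have GLOBAL set. */
int protnone_naive(uint64_t flags)
{
    return (flags & PAGE_PROTNONE) != 0;
}

/* Checked variant: the shared bit only means PROTNONE when PRESENT is clear. */
int protnone_checked(uint64_t flags)
{
    return (flags & (PAGE_PROTNONE | PAGE_PRESENT)) == PAGE_PROTNONE;
}
```

      For an entry with both the present and global bits set, the naive
      check reports a protnone entry while the checked variant does not,
      which matches the misinterpreted fault described above.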
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 2b9fb532
      Linus Torvalds authored
      Pull btrfs updates from Chris Mason:
       "This pull is mostly cleanups and fixes:
      
         - The raid5/6 cleanups from Zhao Lei fixup some long standing warts
           in the code and add improvements on top of the scrubbing support
           from 3.19.
      
         - Josef has round one of our ENOSPC fixes coming from large btrfs
           clusters here at FB.
      
         - Dave Sterba continues a long series of cleanups (thanks Dave), and
           Filipe continues hammering on corner cases in fsync and others
      
        This all was held up a little trying to track down a use-after-free in
        btrfs raid5/6.  It's not clear yet if this is just made easier to
        trigger with this pull or if it's a new bug from the raid5/6 cleanups.
        Dave Sterba is the only one to trigger it so far, but he has a
        consistent way to reproduce, so we'll get it nailed shortly"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (68 commits)
        Btrfs: don't remove extents and xattrs when logging new names
        Btrfs: fix fsync data loss after adding hard link to inode
        Btrfs: fix BUG_ON in btrfs_orphan_add() when delete unused block group
        Btrfs: account for large extents with enospc
        Btrfs: don't set and clear delalloc for O_DIRECT writes
        Btrfs: only adjust outstanding_extents when we do a short write
        btrfs: Fix out-of-space bug
        Btrfs: scrub, fix sleep in atomic context
        Btrfs: fix scheduler warning when syncing log
        Btrfs: Remove unnecessary placeholder in btrfs_err_code
        btrfs: cleanup init for list in free-space-cache
        btrfs: delete chunk allocation attemp when setting block group ro
        btrfs: clear bio reference after submit_one_bio()
        Btrfs: fix scrub race leading to use-after-free
        Btrfs: add missing cleanup on sysfs init failure
        Btrfs: fix race between transaction commit and empty block group removal
        btrfs: add more checks to btrfs_read_sys_array
        btrfs: cleanup, rename a few variables in btrfs_read_sys_array
        btrfs: add checks for sys_chunk_array sizes
        btrfs: more superblock checks, lower bounds on devices and sectorsize/nodesize
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 4533f6e2
      Linus Torvalds authored
      Pull Ceph changes from Sage Weil:
       "On the RBD side, there is a conversion to blk-mq from Christoph,
        several long-standing bug fixes from Ilya, and some cleanup from
        Rickard Strandqvist.
      
        On the CephFS side there is a long list of fixes from Zheng, including
        improved session handling, a few IO path fixes, some dcache management
        correctness fixes, and several blocking while !TASK_RUNNING fixes.
      
        The core code gets a few cleanups and Chaitanya has added support for
        TCP_NODELAY (which has been used on the server side for ages but we
        somehow missed on the kernel client).
      
        There is also an update to MAINTAINERS to fix up some email addresses
        and reflect that Ilya and Zheng are doing most of the maintenance for
        RBD and CephFS these days.  Do not be surprised to see a pull request
        come from one of them in the future if I am unavailable for some
        reason"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits)
        MAINTAINERS: update Ceph and RBD maintainers
        libceph: kfree() in put_osd() shouldn't depend on authorizer
        libceph: fix double __remove_osd() problem
        rbd: convert to blk-mq
        ceph: return error for traceless reply race
        ceph: fix dentry leaks
        ceph: re-send requests when MDS enters reconnecting stage
        ceph: show nocephx_require_signatures and notcp_nodelay options
        libceph: tcp_nodelay support
        rbd: do not treat standalone as flatten
        ceph: fix atomic_open snapdir
        ceph: properly mark empty directory as complete
        client: include kernel version in client metadata
        ceph: provide seperate {inode,file}_operations for snapdir
        ceph: fix request time stamp encoding
        ceph: fix reading inline data when i_size > PAGE_SIZE
        ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions)
        ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps)
        ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync)
        rbd: fix error paths in rbd_dev_refresh()
        ...
    • Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux · 89d3fa45
      Linus Torvalds authored
      Pull thermal management updates from Zhang Rui:
       "Specifics:
      
         - Abstract the code and introduce helper functions for all int340x
           thermal drivers.  From: Srinivas Pandruvada.
      
         - Reorganize the ACPI LPAT table support code so that it can be
           shared for both ACPI PMIC driver and int340x thermal driver.
      
         - Add support for Braswell in intel_soc_dts thermal driver.
      
         - a couple of small fixes/cleanups for step_wise governor and int340x
           thermal driver"
      
      * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
        Thermal/int340x_thermal: remove unused uuids.
        thermal: step_wise: spelling fixes
        thermal: int340x: fix sparse warning
        Thermal/int340x: LPAT conversion for temperature
        ACPI / PMIC: Use common LPAT table handling functions
        ACPI / LPAT: Common table processing functions
        thermal: Intel SoC DTS: Add Braswell support
        Thermal/int340x/int3402: Provide notification support
        Thermal/int340x/processor_thermal: Add thermal zone support
        Thermal/int340x/int3403: Use int340x thermal API
        Thermal/int340x/int3402: Use int340x thermal API
        Thermal/int340x: Add common thermal zone handler
    • Merge tag 'edac_fixes_for_3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp · 477ea116
      Linus Torvalds authored
      Pull two EDAC fixes from Borislav Petkov:
      
       - A fix to sb_edac for proper detection on SNB machines
      
       - A fix to amd64_edac to not explode on Numascale machines with more
         than 16 memory controllers, from Daniel J Blueman.
      
      * tag 'edac_fixes_for_3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
        EDAC, amd64_edac: Prevent OOPS with >16 memory controllers
        sb_edac: Fix detection on SNB machines
    • Merge tag 'platform-drivers-x86-v3.20-1' of... · 6ed3e57f
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v3.20-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
      
      Pull platform driver update from Darren Hart:
       "This includes a significant update to the toshiba_acpi driver,
        bringing it to feature parity with the Windows driver, followed by
        some needed cleanups.
      
        The other changes are mostly minor updates, quirks, sparse fixes, or
        cleanups.
      
        Details:
      
         - toshiba_acpi:
             Add support for missing features from the Windows driver, bump the
             sysfs version, and clean up the driver.
      
         - thinkpad_acpi:
             BIOS string versions, unhandled hkey events.
      
         - samsung-laptop:
             Add native backlight quirk, enable better lid handling.
      
         - intel_scu_ipc:
             Read resources from PCI configuration
      
         - other:
             Fix sparse warnings, general cleanups"
      
      * tag 'platform-drivers-x86-v3.20-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: (34 commits)
        toshiba_acpi: Cleanup GPL header
        toshiba_acpi: Cleanup comment blocks and capitalization
        toshiba_acpi: Make use of DEVICE_ATTR_{RO, RW} macros
        toshiba_acpi: Drop the toshiba_ prefix from sysfs function names
        toshiba_acpi: Move sysfs function and struct declarations further down
        Documentation/ABI: Add file describing the sysfs entries for toshiba_acpi
        toshiba_acpi: Clean file according to coding style
        toshiba_acpi: Bump version number to 0.21
        toshiba_acpi: Add support to enable/disable USB 3
        toshiba_acpi: Add support for Panel Power ON
        toshiba_acpi: Add support for Keyboard functions mode
        toshiba_acpi: Add fan entry to sysfs
        toshiba_acpi: Add version entry to sysfs
        thinkpad_acpi: support new BIOS version string pattern
        thinkpad_acpi: unhandled hkey event
        toshiba_acpi: Make toshiba_eco_mode_available more robust
        classmate-laptop: Fix sparse warning (0 as NULL)
        Sony-laptop: Fix sparse warning (make undeclared var static)
        thinkpad_acpi.c: Fix sparse warning (make undeclared var static)
        samsung-laptop.c: Prefer kstrtoint over single variable sscanf
        ...
    • Merge branch 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild · b11a2783
      Linus Torvalds authored
      Pull kconfig updates from Michal Marek:
       "Yann E Morin was supposed to take over kconfig maintainership, but
        this hasn't happened.  So I'm sending a few kconfig patches that I
        collected:
      
         - Fix for missing va_end in kconfig
         - merge_config.sh displays usage if given too few arguments
         - s/boolean/bool/ in Kconfig files for consistency, with the plan to
           only support bool in the future"
      
      * 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
        kconfig: use va_end to match corresponding va_start
        merge_config.sh: Display usage if given too few arguments
        kconfig: use bool instead of boolean for type definition attributes
    • Merge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild · 77343343
      Linus Torvalds authored
      Pull misc kbuild changes from Michal Marek:
       "Just a few non-critical kbuild changes:
      
         - builddeb adds the actual distribution name in the changelog
         - documentation fixes"
      
      * 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
        kbuild: trivial - fix the help doc of CONFIG_CC_OPTIMIZE_FOR_SIZE
        kbuild: Update documentation of clean-files and clean-dirs
        builddeb: Try to determine distribution
        builddeb: Update year and git repository URL in debian/copyright
    • MAINTAINERS: update Ceph and RBD maintainers · 0f5417ce
      Sage Weil authored
      - add Ilya, drop Yehuda as an RBD maintainer
      - add Zheng as a Ceph maintainer
      - update Yehuda and Sage's emails
      Signed-off-by: Sage Weil <sage@redhat.com>
    • Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild · 27a22ee4
      Linus Torvalds authored
      Pull kbuild updates from Michal Marek:
      
       - several cleanups in kbuild
      
       - serialize multiple *config targets so that 'make defconfig kvmconfig'
         works
      
       - The cc-ifversion macro got support for an else-branch
      
      * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
        kbuild,gcov: simplify kernel/gcov/Makefile more
        kbuild: allow cc-ifversion to have the argument for false condition
        kbuild,gcov: simplify kernel/gcov/Makefile
        kbuild,gcov: remove unnecessary workaround
        kbuild: do not add $(call ...) to invoke cc-version or cc-fullversion
        kbuild: fix cc-ifversion macro
        kbuild: drop $(version_h) from MRPROPER_FILES
        kbuild: use mixed-targets when two or more config targets are given
        kbuild: remove redundant line from bounds.h/asm-offsets.h
        kbuild: merge bounds.h and asm-offsets.h rules
        kbuild: Drop support for clean-rule
    • libceph: kfree() in put_osd() shouldn't depend on authorizer · b28ec2f3
      Ilya Dryomov authored
      a255651d ("ceph: ensure auth ops are defined before use") made
      kfree() in put_osd() conditional on the authorizer.  A mechanical
      mistake most likely - fix it.
      
      Cc: Alex Elder <elder@linaro.org>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Sage Weil <sage@redhat.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
    • libceph: fix double __remove_osd() problem · 7eb71e03
      Ilya Dryomov authored
      It turns out it's possible to get __remove_osd() called twice on the
      same OSD.  That doesn't sit well with rb_erase() - depending on the
      shape of the tree we can get a NULL dereference, a soft lockup or
      a random crash at some point in the future as we end up touching freed
      memory.  One scenario that I was able to reproduce is as follows:
      
                  <osd3 is idle, on the osd lru list>
      <con reset - osd3>
      con_fault_finish()
        osd_reset()
                                    <osdmap - osd3 down>
                                    ceph_osdc_handle_map()
                                      <takes map_sem>
                                      kick_requests()
                                        <takes request_mutex>
                                        reset_changed_osds()
                                          __reset_osd()
                                            __remove_osd()
                                        <releases request_mutex>
                                      <releases map_sem>
          <takes map_sem>
          <takes request_mutex>
          __kick_osd_requests()
            __reset_osd()
              __remove_osd() <-- !!!
      
      A case can be made that osd refcounting is imperfect and reworking it
      would be a proper resolution, but for now Sage and I decided to fix
      this by adding a safeguard around __remove_osd().
      
      Fixes: http://tracker.ceph.com/issues/8087
      
      Cc: Sage Weil <sage@redhat.com>
      Cc: stable@vger.kernel.org # 3.9+: 7c6e6fc5: libceph: assert both regular and lingering lists in __remove_osd()
      Cc: stable@vger.kernel.org # 3.9+: cc9f1f51: libceph: change from BUG to WARN for __remove_osd() asserts
      Cc: stable@vger.kernel.org # 3.9+
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Sage Weil <sage@redhat.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
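      The general idea of guarding a removal so that a second call becomes
      a no-op can be sketched as below. This is an assumption-laden
      illustration, not the libceph fix: it uses a doubly linked list with
      an explicit "linked" flag instead of an rbtree, and struct osd,
      struct osd_list and the osd_*() helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an OSD tracked in a container. */
struct osd {
    struct osd *prev, *next;
    int linked;                 /* nonzero while the node is in the list */
};

/* Circular list with a sentinel head node. */
struct osd_list {
    struct osd head;
    size_t count;
};

void osd_list_init(struct osd_list *l)
{
    assert(l != NULL);
    l->head.prev = l->head.next = &l->head;
    l->head.linked = 1;
    l->count = 0;
}

/* Insert the node right after the sentinel. */
void osd_insert(struct osd_list *l, struct osd *o)
{
    o->next = l->head.next;
    o->prev = &l->head;
    l->head.next->prev = o;
    l->head.next = o;
    o->linked = 1;
    l->count++;
}

/* Returns 1 if the node was removed, 0 if it had already been removed. */
int osd_remove(struct osd_list *l, struct osd *o)
{
    if (!o->linked)
        return 0;               /* guard: a second call touches nothing */
    o->prev->next = o->next;
    o->next->prev = o->prev;
    o->prev = o->next = NULL;
    o->linked = 0;
    l->count--;
    return 1;
}
```

      Without the guard, the second call would follow stale prev/next
      pointers, which is the kind of corruption the commit message
      attributes to calling rb_erase() twice on the same node.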
    • rbd: convert to blk-mq · 7ad18afa
      Christoph Hellwig authored
      This converts the rbd driver to use the blk-mq infrastructure.  Except
      for switching to a per-request work item this is almost mechanical.
      
      This was tested by Alexandre DERUMIER in November, and found to give
      him 120000 iops, although the only comparison available was an old
      3.10 kernel which gave 80000 iops.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <elder@linaro.org>
      [idryomov@gmail.com: context, blk_mq_init_queue() EH]
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
    • ceph: return error for traceless reply race · 4d41cef2
      Yan, Zheng authored
      When we receive a traceless reply for a request that created a new inode,
      we re-send a lookup request to the MDS to get information about the newly
      created inode. (The VFS expects the FS callback to return an inode in the
      create case.) This breaks one request into two requests. Another client
      may modify or move the new inode in the meantime.

      When the race happens, ceph_handle_notrace_create() unconditionally
      links the dentry for the 'create' operation to the inode returned by the
      lookup. This may confuse the VFS when the inode is a directory (the VFS
      does not allow multiple linkages for a directory inode).

      This patch makes ceph_handle_notrace_create() return an error when it
      detects a race. This event should be rare and it happens only when we talk
      to an old MDS. A recent MDS does not send a traceless reply for a request
      that creates a new inode.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: fix dentry leaks · 5cba372c
      Yan, Zheng authored
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: re-send requests when MDS enters reconnecting stage · 3de22be6
      Yan, Zheng authored
      So that MDS can check if any request is already completed and process
      completed requests in clientreplay stage. When completed requests are
      processed in clientreplay stage, MDS can avoid sending traceless
      replies.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • libceph: tcp_nodelay support · ba988f87
      Chaitanya Huilgol authored
      Setting the TCP_NODELAY socket option on connection sockets
      disables Nagle’s algorithm and improves latency characteristics.
      The tcp_nodelay (default) and notcp_nodelay option flags are provided to
      enable or disable setting the socket option.
      Signed-off-by: Chaitanya Huilgol <chaitanya.huilgol@sandisk.com>
      [idryomov@redhat.com: NO_TCP_NODELAY -> TCP_NODELAY, minor adjustments]
      Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
    • rbd: do not treat standalone as flatten · cf32bd9c
      Ilya Dryomov authored
      If the clone is resized down to 0, it becomes standalone.  If such a
      resize is carried out while an image is mapped, we would detect this
      and call rbd_dev_parent_put(), which means "let go of all parent state,
      including the spec(s) of parent image(s)".  This leads to a mismatch
      between "rbd info" and sysfs parent fields, so a fix is in order.
      
          # rbd create --image-format 2 --size 1 foo
          # rbd snap create foo@snap
          # rbd snap protect foo@snap
          # rbd clone foo@snap bar
          # DEV=$(rbd map bar)
          # rbd resize --allow-shrink --size 0 bar
          # rbd resize --size 1 bar
          # rbd info bar | grep parent
                  parent: rbd/foo@snap
      
      Before:
      
          # cat /sys/bus/rbd/devices/0/parent
          (no parent image)
      
      After:
      
          # cat /sys/bus/rbd/devices/0/parent
          pool_id 0
          pool_name rbd
          image_id 10056b8b4567
          image_name foo
          snap_id 2
          snap_name snap
          overlap 0
      Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
      Reviewed-by: Josh Durgin <jdurgin@redhat.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
    • ceph: fix atomic_open snapdir · bf91c315
      Yan, Zheng authored
      ceph_handle_snapdir() checks ceph_mdsc_do_request()'s return value
      and creates snapdir inode if it's -ENOENT
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: properly mark empty directory as complete · 2f92b3d0
      Yan, Zheng authored
      ceph_add_cap() calls __check_cap_issue(), which clears the directory
      inode's complete flag, so the complete flag for an empty directory
      should be set after calling ceph_add_cap().
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • client: include kernel version in client metadata · a6a5ce4f
      Yan, Zheng authored
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: provide seperate {inode,file}_operations for snapdir · 38c48b5f
      Yan, Zheng authored
      remove all unsupported operations from {inode,file}_operations.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: fix request time stamp encoding · 1f041a89
      Yan, Zheng authored
      struct timespec uses 'long' to represent seconds and nanoseconds. 'long'
      is 64 bits on 64-bit machines. The ceph MDS expects the time stamp to be
      encoded as struct ceph_timespec, which uses 'u32' to represent seconds
      and nanoseconds.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
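      The width mismatch described here can be illustrated with a
      field-by-field conversion. This is a user-space sketch under stated
      assumptions: struct wire_timespec and wire_timespec_encode() are
      hypothetical names standing in for struct ceph_timespec and the
      kernel's encoding helper, and byte-order conversion is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical wire layout with two 32-bit fields, mirroring the
 * description of struct ceph_timespec in the commit message. */
struct wire_timespec {
    uint32_t tv_sec;
    uint32_t tv_nsec;
};

/* Convert field by field instead of copying the raw struct timespec,
 * whose members may be 64 bits wide. Byte order handling is omitted. */
struct wire_timespec wire_timespec_encode(const struct timespec *ts)
{
    struct wire_timespec out;

    assert(ts != NULL);
    out.tv_sec = (uint32_t)ts->tv_sec;    /* narrows to 32 bits */
    out.tv_nsec = (uint32_t)ts->tv_nsec;  /* always below 10^9, fits in u32 */
    return out;
}
```

      Copying the in-memory struct timespec byte for byte would place 16
      bytes where the receiver expects 8 on a typical 64-bit machine, so
      the explicit conversion keeps the layout fixed.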
    • ceph: fix reading inline data when i_size > PAGE_SIZE · fcc02d2a
      Yan, Zheng authored
      When an inode has inline data but its size > PAGE_SIZE (it was truncated
      to a larger size), the previous direct read code returned -EIO. This patch
      adds code to return zeros for data whose offset > PAGE_SIZE.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions) · 86d8f67b
      Yan, Zheng authored
      Use an atomic variable to track the number of sessions; this avoids a
      blocking operation inside wait loops.
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps) · c4d4a582
      Yan, Zheng authored
      We should not do a blocking operation in wait_event_interruptible()'s
      condition check function, but reading inline data can block. So move the
      inline data read code to ceph_get_caps().
      Signed-off-by: Yan, Zheng <zyan@redhat.com>
    • ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync) · d3383a8e
      Yan, Zheng authored
      check_cap_flush() calls mutex_lock(), which may block. So we can't
      use it as the condition check function for wait_event().
      Signed-off-by: Yan, Zheng <zyan@redhat.com>