1. 11 Nov, 2015 4 commits
    • s390/zcrypt: Fix initialisation when zcrypt is built-in · 121a868d
      Sascha Silbe authored
      ap_bus and zcrypt_api assumed module information to always be present
      and initialisation to be done in module loading order (symbol
      dependencies). These assumptions don't hold if zcrypt is built-in;
      THIS_MODULE will be NULL in this case and init call order is linker
      order, i.e. Makefile order.
      
      Fix initialisation order by ordering the object files in the Makefile
      according to their dependencies, like the module loader would do.
      
      Fix message type registration by using a dedicated "name" field rather
      than piggy-backing on the module ("owner") information. There's no
      change to the requirement that module name and msgtype name are
      identical. The existing name macros are used.
      
      We don't need any special code for dealing with the drivers being
      built-in; the generic module support code already does the right
      thing.
      
      Test results:
      1. CONFIG_MODULES=y, CONFIG_ZCRYPT=y
      
         KVM: boots, no /sys/bus/ap (expected)
         LPAR with CEX5: boots, /sys/bus/ap/devices/card*/type present
      
      2. CONFIG_MODULES=y, CONFIG_ZCRYPT=m:

         KVM: boots, loading zcrypt_cex4 (and ap) fails (expected)
         LPAR with CEX5: boots, loading zcrypt_cex4 succeeds,
         /sys/bus/ap/devices/card*/type present after explicit module
         loading
      
      3. CONFIG_MODULES unset, CONFIG_ZCRYPT=y:
         KVM: boots, no /sys/bus/ap (expected)
         LPAR with CEX5: boots, /sys/bus/ap/devices/card*/type present
      
      No further testing (user-space functionality) was done.
      
      Fixes: 3b6245fd303f ("s390/zcrypt: Separate msgtype implementation from card modules.")
      Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/zcrypt: Fix kernel crash on systems without AP bus support · e387753c
      Sascha Silbe authored
      On systems without AP bus (e.g. KVM) the kernel crashes during init
      calls when zcrypt is built-in:
      
      kernel BUG at drivers/base/driver.c:153!
      illegal operation: 0001 ilc:1 [#1] SMP
      Modules linked in:
      CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.2.0+ #221
      task: 0000000010a40000 ti: 0000000010a48000 task.ti:0000000010a48000
      Krnl PSW : 0704c00180000000 0000000000592bd6(driver_register+0x106/0x140)
                 R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
                 0000000000000012 0000000000000000 0000000000c45328 0000000000c44e30
                 00000000009ef63c 000000000067f598 0000000000cf3c58 0000000000000000
                 000000000000007b 0000000000cb1030 0000000000000002 0000000000000000
                 0000000000ca8580 0000000010306700 00000000001001d8 0000000010a4bd88
      Krnl Code: 0000000000592bc6: f0b00004ebcf       srp     4(12,%r0),3023(%r14),0
                 0000000000592bcc: f0a0000407f4       srp     4(11,%r0),2036,0
                #0000000000592bd2: a7f40001           brc     15,592bd4
                >0000000000592bd6: e330d0000004       lg      %r3,0(%r13)
                 0000000000592bdc: c0200021edfd       larl    %r2,9d07d6
                 0000000000592be2: c0e500126d8f       brasl   %r14,7e0700
                 0000000000592be8: e330d0080004       lg      %r3,8(%r13)
                 0000000000592bee: a7f4ffab           brc     15,592b44
      Call Trace:
      ([<00000000001001c8>] do_one_initcall+0x90/0x1d0)
       [<0000000000c6dd34>] kernel_init_freeable+0x1e4/0x2a0
       [<00000000007db53a>] kernel_init+0x2a/0x120
       [<00000000007e8ece>] kernel_thread_starter+0x6/0xc
       [<00000000007e8ec8>] kernel_thread_starter+0x0/0xc
      Last Breaking-Event-Address:
       [<0000000000592bd2>] driver_register+0x102/0x140
      
      When zcrypt is built as a module, the module loader ensures that the
      driver modules cannot be loaded if the AP bus module returns an error
      during initialisation. But if zcrypt and the driver are built-in, the
      driver is getting initialised even if the AP bus initialisation
      failed. The driver invokes ap_driver_register() during initialisation,
      which then causes operations on uninitialised data structures to be
      performed.
      
      Explicitly protect ap_driver_register() by introducing an
      "initialised" flag that gets set iff the AP bus initialisation was
      successful. When the AP bus initialisation failed,
      ap_driver_register() will error out with -ENODEV, causing the driver
      initialisation to fail as well.
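
The guard described above can be sketched in plain user-space C. This is only an illustration under stated assumptions: `demo_bus_init()`, `demo_driver_register()`, `struct demo_driver` and the `bus_initialised` variable are hypothetical stand-ins, not the actual ap_bus code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the "initialised" flag; not the kernel's data. */
static bool bus_initialised;

/* Hypothetical driver descriptor used only for this sketch. */
struct demo_driver {
	const char *name;
	bool registered;
};

/* Models bus initialisation: the flag is set only on success. */
static int demo_bus_init(bool hardware_present)
{
	if (!hardware_present)
		return -ENODEV;
	bus_initialised = true;
	return 0;
}

/* Models the guarded registration: refuse to touch bus state that
 * was never set up, and report -ENODEV to the caller instead. */
static int demo_driver_register(struct demo_driver *drv)
{
	assert(drv != NULL);
	if (!bus_initialised)
		return -ENODEV;
	drv->registered = true;
	return 0;
}
```

With this shape, a driver whose init routine propagates the return value fails cleanly when the bus is absent, rather than operating on uninitialised structures.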
      
      Test results:
      1. Inside KVM (no AP bus), zcrypt built-in
      
         Boots. /sys/bus/ap not present (expected).
      
      2. Inside KVM (no AP bus), zcrypt as module
      
         Boots. Loading zcrypt_cex4 fails because loading ap_bus fails
         (expected).
      
      3. On LPAR with CEX5, zcrypt built-in
      
         Boots. /sys/bus/ap/devices/card* present but .../card*/type missing
         (i.e. zcrypt_device_register() fails, unrelated issue).
      
      4. On LPAR with CEX5, zcrypt as module
      
         Boots. Loading zcrypt_cex4 successful,
         /sys/bus/ap/devices/card*/type present. No further testing
         (user-space functionality) was done.
      Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390: add support for ipl devices in subchannel sets > 0 · 18e22a17
      Sebastian Ott authored
      Allow IPL from CCW-based devices residing in any subchannel set.
      Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/ipl: fix out of bounds access in scpdata_write · e0bedada
      Sebastian Ott authored
      The input buffer in reipl_fcp_scpdata_write is accessed out of bounds
      when an offset is specified. The problem is that the offset refers to
      the data we should write to and not to the buffer we read from.
      
      So instead of
              memcpy(scp_data, buf + off, count);
      we could just do
              memcpy(scp_data + off, buf, count);
      
      However we not only modify the data but also store its length. For this to
      work we'd need to remember a state per open FH. Since that's not possible
      with sysfs callbacks let's just fail when an offset is specified.
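
A rough user-space sketch of the "fail when an offset is specified" approach follows. The names `demo_scpdata_write()`, `demo_scp_data`, `demo_scp_len` and `DEMO_SCP_MAX`, as well as the choice of -EINVAL as the error code, are assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SCP_MAX 64	/* hypothetical capacity of the data area */

static char demo_scp_data[DEMO_SCP_MAX];
static size_t demo_scp_len;

/* Models a write callback that stores both the data and its length.
 * Because the stored length describes the whole buffer, a partial
 * write at a non-zero offset cannot be represented, so it is rejected. */
static long demo_scpdata_write(const char *buf, size_t off, size_t count)
{
	assert(buf != NULL);
	if (off != 0)
		return -EINVAL;	/* assumed error code */
	if (count > DEMO_SCP_MAX)
		return -EINVAL;
	memcpy(demo_scp_data, buf, count);	/* read from buf[0..count) only */
	demo_scp_len = count;
	return (long)count;
}
```

The input buffer is only ever read from its start, so the out-of-bounds read of `buf + off` cannot occur in this shape.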
      Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Acked-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  2. 09 Nov, 2015 7 commits
  3. 08 Nov, 2015 1 commit
  4. 07 Nov, 2015 28 commits
    • Merge branch 'akpm' (patches from Andrew) · ad804a0b
      Linus Torvalds authored
      Merge second patch-bomb from Andrew Morton:
      
       - most of the rest of MM
      
       - procfs
      
       - lib/ updates
      
       - printk updates
      
       - bitops infrastructure tweaks
      
       - checkpatch updates
      
       - nilfs2 update
      
       - signals
      
       - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
         dma-debug, dma-mapping, ...
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
        ipc,msg: drop dst nil validation in copy_msg
        include/linux/zutil.h: fix usage example of zlib_adler32()
        panic: release stale console lock to always get the logbuf printed out
        dma-debug: check nents in dma_sync_sg*
        dma-mapping: tidy up dma_parms default handling
        pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
        kexec: use file name as the output message prefix
        fs, seqfile: always allow oom killer
        seq_file: reuse string_escape_str()
        fs/seq_file: use seq_* helpers in seq_hex_dump()
        coredump: change zap_threads() and zap_process() to use for_each_thread()
        coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
        signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
        signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
        signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
        signals: kill block_all_signals() and unblock_all_signals()
        nilfs2: fix gcc uninitialized-variable warnings in powerpc build
        nilfs2: fix gcc unused-but-set-variable warnings
        MAINTAINERS: nilfs2: add header file for tracing
        nilfs2: add tracepoints for analyzing reading and writing metadata files
        ...
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · ab9f2faf
      Linus Torvalds authored
      Pull rdma updates from Doug Ledford:
       "This is my initial round of 4.4 merge window patches.  There are a few
        other things I wish to get in for 4.4 that aren't in this pull, as
        this represents what has gone through merge/build/run testing and not
        what is the last few items for which testing is not yet complete.
      
         - "Checksum offload support in user space" enablement
         - Misc cxgb4 fixes, add T6 support
         - Misc usnic fixes
         - 32 bit build warning fixes
         - Misc ocrdma fixes
         - Multicast loopback prevention extension
         - Extend the GID cache to store and return attributes of GIDs
         - Misc iSER updates
         - iSER clustering update
         - Network NameSpace support for rdma CM
         - Work Request cleanup series
         - New Memory Registration API"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (76 commits)
        IB/core, cma: Make __attribute_const__ declarations sparse-friendly
        IB/core: Remove old fast registration API
        IB/ipath: Remove fast registration from the code
        IB/hfi1: Remove fast registration from the code
        RDMA/nes: Remove old FRWR API
        IB/qib: Remove old FRWR API
        iw_cxgb4: Remove old FRWR API
        RDMA/cxgb3: Remove old FRWR API
        RDMA/ocrdma: Remove old FRWR API
        IB/mlx4: Remove old FRWR API support
        IB/mlx5: Remove old FRWR API support
        IB/srp: Dont allocate a page vector when using fast_reg
        IB/srp: Remove srp_finish_mapping
        IB/srp: Convert to new registration API
        IB/srp: Split srp_map_sg
        RDS/IW: Convert to new memory registration API
        svcrdma: Port to new memory registration API
        xprtrdma: Port to new memory registration API
        iser-target: Port to new memory registration API
        IB/iser: Port to new fast registration API
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial · 75021d28
      Linus Torvalds authored
      Pull trivial updates from Jiri Kosina:
       "Trivial stuff from trivial tree that can be trivially summed up as:
      
         - treewide drop of spurious unlikely() before IS_ERR() from Viresh
           Kumar
      
         - cosmetic fixes (that don't really affect basic functionality of the
           driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek
      
         - various comment / printk fixes and updates all over the place"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
        bcache: Really show state of work pending bit
        hwmon: applesmc: fix comment typos
        Kconfig: remove comment about scsi_wait_scan module
        class_find_device: fix reference to argument "match"
        debugfs: document that debugfs_remove*() accepts NULL and error values
        net: Drop unlikely before IS_ERR(_OR_NULL)
        mm: Drop unlikely before IS_ERR(_OR_NULL)
        fs: Drop unlikely before IS_ERR(_OR_NULL)
        drivers: net: Drop unlikely before IS_ERR(_OR_NULL)
        drivers: misc: Drop unlikely before IS_ERR(_OR_NULL)
        UBI: Update comments to reflect UBI_METAONLY flag
        pktcdvd: drop null test before destroy functions
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 6f1da317
      Linus Torvalds authored
      Pull HID updates from Jiri Kosina:
       "Highlights:
      
         - Intel Skylake Win8 precision touchpads support fixes/improvements
           from Mika Westerberg
      
         - Lenovo Yoga 2 quirk from Ritesh Raj Sarraf
      
         - potential uninitialized buffer access fix in HID core from Richard
           Purdie
      
         - Wacom Intuos and Wacom Cintiq 2 support improvements from Jason
           Gerecke and Ping Cheng
      
         - initiation of sysfs deprecation process for most of the roccat
           drivers, from the roccat support maintainer Stefan Achatz
      
         - quite a few device ID / quirk additions and small fixes"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (30 commits)
        HID: logitech: Add support for G29
        HID: logitech: Simplify wheel detection scheme
        HID: wacom: Call 'wacom_query_tablet_data' only after 'hid_hw_start'
        HID: wacom: Fix ABS_MISC reporting for Cintiq Companion 2
        HID: wacom: Remove useless conditions from 'wacom_query_tablet_data'
        HID: wacom: fix Intuos wireless report id issue
        HID: fix some indenting issues
        HID: wacom: Expect 'touch_max' touches if HID_DG_CONTACTCOUNT not present
        HID: wacom: Tie cached HID_DG_CONTACTCOUNT indices to report ID
        HID: roccat: Fixed resubmit: Deprecating most Roccat sysfs attributes
        HID: wacom: Report full pressure range for Intuos, Cintiq 13HD Touch
        HID: wacom: Add support for Cintiq Companion 2
        HID: multitouch: Fetch feature reports on demand for Win8 devices
        HID: sensor-hub: Add quirk for Lenovo Yoga 2 with ITE Chips
        HID: usbhid: Fix for the WiiU adapter from Mayflash
        HID: corsair: boolify struct k90_led.removed
        HID: corsair: Add Corsair Vengeance K90 driver
        HID: hid-input: allow input_configured callback return errors
        HID: multitouch: Add suffix for HID_DG_TOUCHPAD
        HID: i2c-hid: Fill in physical device providing HID functionality
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching · 99aaa9c6
      Linus Torvalds authored
      Pull livepatching fix from Jiri Kosina:
       "A fix for a kernel oops in case CONFIG_DEBUG_SET_MODULE_RONX is unset
        (as in such case it's possible for module struct to share a page with
        executable text, which is currently not being handled with grace) from
        Josh Poimboeuf"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
        livepatch: Fix crash with !CONFIG_DEBUG_SET_MODULE_RONX
    • ipc,msg: drop dst nil validation in copy_msg · 5f2a2d5d
      Davidlohr Bueso authored
      d0edd852 ("ipc: convert invalid scenarios to use WARN_ON") relaxed the
      nil dst parameter check, originally being a full BUG_ON.  However, this
      check seems quite unnecessary when the only purpose is for
      checkpoint/restore (MSG_COPY flag):
      
      o The copy variable is set initially to nil, apparently as a way of
        ensuring that prepare_copy is previously called.  Which is in fact done,
        unconditionally at the beginning of do_msgrcv.
      
      o There is no concurrency with 'copy' (stack allocated in do_msgrcv).
      
      Furthermore, any errors in 'copy' (and thus prepare_copy/copy_msg) should
      always be handled by the IS_ERR() family.  Therefore remove this check altogether
      as it can never occur with the current users.
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • include/linux/zutil.h: fix usage example of zlib_adler32() · cb7ae262
      Anish Bhatt authored
      adler32 was renamed to zlib_adler32 since before 2.6.11.
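
For readers unfamiliar with the checksum that the usage example refers to, here is a minimal, unoptimised sketch of the Adler-32 algorithm in user-space C. `demo_adler32()` is a hypothetical name and is not the kernel's zlib_adler32() implementation; the caller is assumed to pass 1 as the initial value, which is the standard Adler-32 starting state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ADLER_MOD 65521u	/* largest prime below 2^16 */

/* Updates a running Adler-32 checksum with len bytes from buf.
 * The low 16 bits hold the byte sum, the high 16 bits hold the
 * sum of the intermediate byte sums, both modulo 65521. */
static uint32_t demo_adler32(uint32_t adler, const unsigned char *buf,
			     size_t len)
{
	uint32_t a = adler & 0xffffu;
	uint32_t b = (adler >> 16) & 0xffffu;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i < len; i++) {
		a = (a + buf[i]) % DEMO_ADLER_MOD;
		b = (b + a) % DEMO_ADLER_MOD;
	}
	return (b << 16) | a;
}
```

Because the state is carried in the return value, a buffer can be checksummed in pieces by feeding each result back in as the next call's first argument.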
      Signed-off-by: Anish Bhatt <anish@chelsio.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • panic: release stale console lock to always get the logbuf printed out · 08d78658
      Vitaly Kuznetsov authored
      In some cases we may end up killing the CPU holding the console lock
      while still having valuable data in logbuf. E.g. I'm observing the
      following:
      
      - A crash is happening on one CPU and console_unlock() is being called on
        some other.
      
      - console_unlock() tries to print out the buffer before releasing the lock
        and on slow console it takes time.
      
      - in the meanwhile crashing CPU does lots of printk()-s with valuable data
        (which go to the logbuf) and sends IPIs to all other CPUs.
      
      - console_unlock() finishes printing previous chunk and enables interrupts
        before trying to print out the rest, the CPU catches the IPI and never
        releases console lock.
      
      This is not the only possible case: in VT/fb subsystems we have many other
      console_lock()/console_unlock() users.  Non-masked interrupts (or
      receiving NMI in case of extreme slowness) will have the same result.
      Getting the whole console buffer printed out on crash should be top
      priority.
      
      [akpm@linux-foundation.org: tweak comment text]
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dma-debug: check nents in dma_sync_sg* · 7f830642
      Robin Murphy authored
      Like dma_unmap_sg, dma_sync_sg* should be called with the original number
      of entries passed to dma_map_sg, so do the same check in the sync path as
      we do in the unmap path.
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Sakari Ailus <sakari.ailus@iki.fi>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dma-mapping: tidy up dma_parms default handling · 002edb6f
      Robin Murphy authored
      Many DMA controllers and other devices set max_segment_size to
      indicate their scatter-gather capability, but have no interest in
      segment_boundary_mask. However, the existence of a dma_parms structure
      precludes the use of any default value, leaving them as zeros (assuming
      a properly kzalloc'ed structure). If a well-behaved IOMMU (or SWIOTLB)
      then tries to respect this by ensuring a mapped segment does not cross
      a zero-byte boundary, hilarity ensues.
      
      Since zero is a nonsensical value for either parameter, treat it as an
      indicator for "default", as might be expected. In the process, clean up
      a bit by replacing the bare constants with slightly more meaningful
      macros and removing the superfluous "else" statements.
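
The "zero means default" rule can be sketched as follows. This is a user-space illustration only: `struct demo_dma_parms`, the `demo_get_*()` helpers and the macro names are hypothetical, the 64 KiB segment size follows the SZ_64K mentioned in the note below, and the 32-bit boundary mask is an assumed default.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_DEFAULT_MAX_SEG_SIZE 0x10000u	/* 64 KiB */
#define DEMO_DEFAULT_SEG_BOUNDARY 0xffffffffUL	/* assumed default mask */

/* Hypothetical stand-in for a per-device DMA parameter block. */
struct demo_dma_parms {
	unsigned int max_segment_size;
	unsigned long segment_boundary_mask;
};

/* A missing block or a zero field both select the default. */
static unsigned int demo_get_max_seg_size(const struct demo_dma_parms *p)
{
	if (p && p->max_segment_size)
		return p->max_segment_size;
	return DEMO_DEFAULT_MAX_SEG_SIZE;
}

static unsigned long demo_get_seg_boundary(const struct demo_dma_parms *p)
{
	if (p && p->segment_boundary_mask)
		return p->segment_boundary_mask;
	return DEMO_DEFAULT_SEG_BOUNDARY;
}

/* Usage: a device that only sets max_segment_size still gets a
 * sane boundary mask instead of a zero-byte boundary. */
void demo_dma_defaults(void)
{
	struct demo_dma_parms only_size = { 4096, 0 };

	assert(demo_get_max_seg_size(&only_size) == 4096);
	assert(demo_get_seg_boundary(&only_size) == DEMO_DEFAULT_SEG_BOUNDARY);
	assert(demo_get_max_seg_size(NULL) == DEMO_DEFAULT_MAX_SEG_SIZE);
}
```

Early returns replace the if/else chains here, which mirrors the clean-up of superfluous "else" statements that the message mentions.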
      
      [akpm@linux-foundation.org: dma-mapping.h needs sizes.h for SZ_64K]
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Reviewed-by: Sumit Semwal <sumit.semwal@linaro.org>
      Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Sakari Ailus <sakari.ailus@iki.fi>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode · 8639b461
      Ben Segall authored
      setpriority(PRIO_USER, 0, x) will change the priority of tasks outside of
      the current pid namespace.  This is in contrast to both the other modes of
      setpriority and the example of kill(-1).  Fix this.  getpriority and
      ioprio have the same failure mode, fix them too.
      
      Eric said:
      
      : After some more thinking about it this patch sounds justifiable.
      :
      : My goal with namespaces is not to build perfect isolation mechanisms
      : as that can get into ill defined territory, but to build well defined
      : mechanisms.  And to handle the corner cases so you can use only
      : a single namespace with well defined results.
      :
      : In this case you have found the two interfaces I am aware of that
      : identify processes by uid instead of by pid.  Which quite frankly is
      : weird.  Unfortunately the weird unexpected cases are hard to handle
      : in the usual way.
      :
      : I was hoping for a little more information.  Changes like this one we
      : have to be careful of because someone might be depending on the current
      : behavior.  I don't think they are and I do think this make sense as part
      : of the pid namespace.
      Signed-off-by: Ben Segall <bsegall@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Ambrose Feinstein <ambrose@google.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kexec: use file name as the output message prefix · de90a6bc
      Minfei Huang authored
      The kexec output messages lost the "kexec" prefix when Dave Young split
      the kexec code.  Use the file name as the output message prefix instead.
      
      Currently, the format of output message:
      [  140.290795] SYSC_kexec_load: hello, world
      [  140.291534] kexec: sanity_check_segment_list: hello, world
      
      Ideally, the format of output message:
      [   30.791503] kexec: SYSC_kexec_load, Hello, world
      [   79.182752] kexec_core: sanity_check_segment_list, Hello, world
      
      Remove the custom prefix "kexec" in output message.
      Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
      Acked-by: Dave Young <dyoung@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs, seqfile: always allow oom killer · 0f930902
      Greg Thelen authored
      Since 5cec38ac ("fs, seq_file: fallback to vmalloc instead of oom kill
      processes") seq_buf_alloc() avoids calling the oom killer for PAGE_SIZE or
      smaller allocations; but larger allocations can use the oom killer via
      vmalloc().  Thus reads of small files can return ENOMEM, but larger files
      use the oom killer to avoid ENOMEM.
      
      The effect of this bug is that reads from /proc and other virtual
      filesystems can return ENOMEM instead of the preferred behavior - oom
      killing something (possibly the calling process).  I don't know of anyone
      except Google who has noticed the issue.
      
      I suspect the fix is more needed in smaller systems where there isn't any
      reclaimable memory.  But these seem like the kinds of systems which
      probably don't use the oom killer for production situations.
      
      Memory overcommit requires use of the oom killer to select a victim
      regardless of file size.
      
      Enable oom killer for small seq_buf_alloc() allocations.
      
      Fixes: 5cec38ac ("fs, seq_file: fallback to vmalloc instead of oom kill processes")
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • seq_file: reuse string_escape_str() · 25c6bb76
      Andy Shevchenko authored
      string_escape_str() escapes an input string according to given criteria.
      In the case of seq_escape() the criterion is to convert some characters to
      their octal representation.
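
As an illustration of what converting selected characters to their octal representation means, here is a small user-space sketch. `demo_escape_octal()` is a hypothetical helper written for this example; it is not the kernel's string_escape_str() and does not share its signature or flags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies src into dst, replacing every character that appears in esc
 * with a backslash followed by three octal digits. Returns the number
 * of bytes written (excluding the terminating NUL), or -1 if dst is
 * too small to hold the result. */
static int demo_escape_octal(char *dst, size_t size, const char *src,
			     const char *esc)
{
	size_t out = 0;

	assert(dst != NULL && src != NULL && esc != NULL);
	for (; *src; src++) {
		unsigned char c = (unsigned char)*src;

		if (strchr(esc, c)) {
			if (out + 4 >= size)	/* 4 bytes plus NUL */
				return -1;
			dst[out++] = '\\';
			dst[out++] = (char)('0' + ((c >> 6) & 7));
			dst[out++] = (char)('0' + ((c >> 3) & 7));
			dst[out++] = (char)('0' + (c & 7));
		} else {
			if (out + 1 >= size)	/* 1 byte plus NUL */
				return -1;
			dst[out++] = (char)c;
		}
	}
	if (out >= size)
		return -1;
	dst[out] = '\0';
	return (int)out;
}
```

For instance, escaping space and newline in "a b\n" yields the ten characters `a\040b\012`.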
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/seq_file: use seq_* helpers in seq_hex_dump() · 8b91a318
      Andy Shevchenko authored
      This improves code readability.
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: change zap_threads() and zap_process() to use for_each_thread() · d61ba589
      Oleg Nesterov authored
      Change zap_threads() paths to use for_each_thread() rather than
      while_each_thread().
      
      While at it, change zap_threads() to avoid the nested if's to make the
      code more readable and lessen the indentation.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Kyle Walker <kwalker@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Stanislav Kozina <skozina@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP · 5fa534c9
      Oleg Nesterov authored
      task_will_free_mem() is wrong in many ways, and in particular the
      SIGNAL_GROUP_COREDUMP check is not reliable: a task can participate in the
      coredumping without SIGNAL_GROUP_COREDUMP bit set.
      
      Change zap_threads() paths to always set SIGNAL_GROUP_COREDUMP even if
      other CLONE_VM processes can't react to SIGKILL.  Fortunately, at least
      the oom-kill case is fine; it kills all tasks sharing the same mm, so it
      should also kill the process which actually dumps the core.
      
      The change in prepare_signal() is not strictly necessary, it just ensures
      that the patch does not bring another subtle behavioural change.  But it
      reminds us that this SIGNAL_GROUP_EXIT/COREDUMP case needs more changes.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Kyle Walker <kwalker@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Stanislav Kozina <skozina@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT) · 9317bb96
      Oleg Nesterov authored
      jffs2_garbage_collect_thread() does allow_signal(SIGCONT) for no reason:
      SIGCONT will wake a stopped task up even if it is ignored.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Markus Pargmann <mpa@pengutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread() · 9a13049e
      Oleg Nesterov authored
      jffs2_garbage_collect_thread() can race with SIGCONT and sleep in
      TASK_STOPPED state after it was already sent. Add the new helper,
      kernel_signal_stop(), which does this correctly.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Markus Pargmann <mpa@pengutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • signal: turn dequeue_signal_lock() into kernel_dequeue_signal() · be0e6f29
      Oleg Nesterov authored
      1. Rename dequeue_signal_lock() to kernel_dequeue_signal(). This
         matches another "for kthreads only" kernel_sigaction() helper.
      
      2. Remove the "tsk" and "mask" arguments, they are always current
         and current->blocked. And it is simply wrong if tsk != current.
      
      3. We could also remove the 3rd "siginfo_t *info" arg but it looks
         potentially useful. However we can simplify the callers if we
         change kernel_dequeue_signal() to accept info => NULL.
      
      4. Remove _irqsave, it is never called from atomic context.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Markus Pargmann <mpa@pengutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • signals: kill block_all_signals() and unblock_all_signals() · 2e01fabe
      Oleg Nesterov authored
      It is hardly possible to enumerate all problems with block_all_signals()
      and unblock_all_signals().  Just for example,
      
      1. block_all_signals(SIGSTOP/etc) simply can't help if the caller is
         multithreaded. Another thread can dequeue the signal and force the
         group stop.
      
      2. Even if the caller is single-threaded, it will "stop" anyway. It
         will not sleep, but it will spin in kernel space until SIGCONT or
         SIGKILL.

      And a lot more. In short, this interface hasn't worked at all, at least
      for the last 10+ years.
      
      Daniel said:
      
        Yeah the only times I played around with the DRM_LOCK stuff was when
        old drivers accidentally deadlocked - my impression is that the entire
        DRM_LOCK thing was never really tested properly ;-) Hence I'm all for
        purging where this leaks out of the drm subsystem.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Acked-by: Dave Airlie <airlied@redhat.com>
      Cc: Richard Weinberger <richard@nod.at>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: fix gcc uninitialized-variable warnings in powerpc build · 4f05028f
      Ryusuke Konishi authored
      Some false-positive warnings are reported for the powerpc build.
      
      The following warnings are reported in
       http://kisskb.ellerman.id.au/kisskb/buildresult/12519703/
      
         CC      fs/nilfs2/super.o
       fs/nilfs2/super.c: In function 'nilfs_resize_fs':
       fs/nilfs2/super.c:376:2: warning: 'blocknr' may be used uninitialized in this function [-Wuninitialized]
       fs/nilfs2/super.c:362:11: note: 'blocknr' was declared here
         CC      fs/nilfs2/recovery.o
       fs/nilfs2/recovery.c: In function 'nilfs_salvage_orphan_logs':
       fs/nilfs2/recovery.c:631:21: warning: 'sum' may be used uninitialized in this function [-Wuninitialized]
       fs/nilfs2/recovery.c:585:32: note: 'sum' was declared here
       fs/nilfs2/recovery.c: In function 'nilfs_search_super_root':
       fs/nilfs2/recovery.c:873:11: warning: 'sum' may be used uninitialized in this function [-Wuninitialized]
      
      Another similar warning is reported in
       http://kisskb.ellerman.id.au/kisskb/buildresult/12520079/
      
         CC      fs/nilfs2/btree.o
       fs/nilfs2/btree.c: In function 'nilfs_btree_convert_and_insert':
       include/asm-generic/bitops/non-atomic.h:105:20: warning: 'bh' may be used uninitialized in this function [-Wuninitialized]
       fs/nilfs2/btree.c:1859:22: note: 'bh' was declared here
      
      This silences these warnings by forcing the variables to be initialized.
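
      A minimal userspace sketch of this kind of false positive and of one
      way to force initialization follows. The helper lookup_block() and
      the wrapper block_or_zero() are hypothetical and are not taken from
      the nilfs2 sources; only the variable name blocknr echoes the
      warning quoted above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: stores a value through 'out' only on success
 * (return 0). On failure 'out' is left untouched, which is the shape of
 * code that can make the compiler suspect an uninitialized read.
 */
static int lookup_block(const unsigned long *table, size_t len, size_t idx,
			unsigned long *out)
{
	if (idx >= len)
		return -1;
	*out = table[idx];
	return 0;
}

/* Return the block number at idx, or 0 if idx is out of range. */
unsigned long block_or_zero(const unsigned long *table, size_t len, size_t idx)
{
	/*
	 * blocknr is only read when lookup_block() succeeded, so it is never
	 * actually used uninitialized. The explicit initializer exists only
	 * to keep the compiler's flow analysis quiet.
	 */
	unsigned long blocknr = 0;
	int err;

	assert(table != NULL);
	err = lookup_block(table, len, idx, &blocknr);
	if (err)
		return 0;
	return blocknr;
}
```
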
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: fix gcc unused-but-set-variable warnings · 09ef29e0
      Ryusuke Konishi authored
      Fix the following build warnings:
      
       $ make W=1
       [...]
         CC [M]  fs/nilfs2/btree.o
       fs/nilfs2/btree.c: In function 'nilfs_btree_split':
       fs/nilfs2/btree.c:923:8: warning: variable 'newptr' set but not used [-Wunused-but-set-variable]
         __u64 newptr;
               ^
       fs/nilfs2/btree.c:922:8: warning: variable 'newkey' set but not used [-Wunused-but-set-variable]
         __u64 newkey;
               ^
         CC [M]  fs/nilfs2/dat.o
       fs/nilfs2/dat.c: In function 'nilfs_dat_prepare_end':
       fs/nilfs2/dat.c:158:8: warning: variable 'start' set but not used [-Wunused-but-set-variable]
         __u64 start;
               ^
         CC [M]  fs/nilfs2/segment.o
       fs/nilfs2/segment.c: In function 'nilfs_segctor_do_immediate_flush':
       fs/nilfs2/segment.c:2433:6: warning: variable 'err' set but not used [-Wunused-but-set-variable]
         int err;
             ^
         CC [M]  fs/nilfs2/sufile.o
       fs/nilfs2/sufile.c: In function 'nilfs_sufile_alloc':
       fs/nilfs2/sufile.c:320:27: warning: variable 'ncleansegs' set but not used [-Wunused-but-set-variable]
         unsigned long nsegments, ncleansegs, nsus, cnt;
                                  ^
         CC [M]  fs/nilfs2/alloc.o
       fs/nilfs2/alloc.c: In function 'nilfs_palloc_prepare_alloc_entry':
       fs/nilfs2/alloc.c:478:38: warning: variable 'groups_per_desc_block' set but not used [-Wunused-but-set-variable]
         unsigned long n, entries_per_group, groups_per_desc_block;
                                             ^
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • MAINTAINERS: nilfs2: add header file for tracing · c35c7ac5
      Ryusuke Konishi authored
      This adds the header file "include/trace/events/nilfs2.h" to the nilfs2
      entry in MAINTAINERS so that updates to this header file go to the
      nilfs2 mailing list.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: add tracepoints for analyzing reading and writing metadata files · a9cd207c
      Hitoshi Mitake authored
      This patch adds tracepoints for analyzing read and write requests to
      metadata files.  The tracepoints cover all in-place updated mdt files
      (cpfile, sufile, and datfile).
      
      Example of tracing mdt_insert_new_block():
                    cp-14635 [000] ...1 30598.199309: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 155
                    cp-14635 [000] ...1 30598.199520: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 5
                    cp-14635 [000] ...1 30598.200828: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 253
      Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: TK Kato <TK.Kato@wdc.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: add tracepoints for analyzing sufile manipulation · 83eec5e6
      Hitoshi Mitake authored
      This patch adds tracepoints that are useful for analyzing segment
      usage from the perspective of high-level sufile manipulation (check, alloc,
      free).  sufile is an important metadata file that is updated in place, so
      analyzing its behavior is useful for performance tuning.

      Example of usage (allocation case):
      
      $ sudo bin/tpoint nilfs2:nilfs2_segment_usage_allocated
      Tracing nilfs2:nilfs2_segment_usage_allocated. Ctrl-C to end.
              segctord-17800 [002] ...1 10671.867294: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 2
              segctord-17800 [002] ...1 10675.073477: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 3
      Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benixon Dhas <benixon.dhas@wdc.com>
      Cc: TK Kato <TK.Kato@wdc.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: add a tracepoint for transaction events · 44fda114
      Hitoshi Mitake authored
      This patch adds a tracepoint for nilfs transaction events.  With the
      tracepoint, these events can be tracked: begin, abort, commit, trylock,
      lock, and unlock.  Most of these events have a corresponding function,
      e.g. the begin event corresponds to nilfs_transaction_begin().  The unlock
      event is an exception: it corresponds to the iteration in
      nilfs_transaction_lock().

      Only one tracepoint is introduced: nilfs2_transaction_transition.  The
      above events are distinguished by a newly introduced enum.  With this
      tracepoint, we can analyze a critical section of segment construction.
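
      The "one tracepoint, many events" design can be sketched in plain
      userspace C as a single reporting function keyed by an enum. All
      names here (enum txn_state, txn_state_name(), trace_txn_transition())
      are hypothetical and chosen for illustration; the real tracepoint is
      defined with the kernel trace API and also records sb and ti.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical event kinds, one per event listed in the commit message. */
enum txn_state {
	TXN_BEGIN,
	TXN_COMMIT,
	TXN_ABORT,
	TXN_TRYLOCK,
	TXN_LOCK,
	TXN_UNLOCK,
	TXN_NSTATES
};

/* Map a state to the label printed in the trace line. */
const char *txn_state_name(enum txn_state state)
{
	static const char *const names[TXN_NSTATES] = {
		"BEGIN", "COMMIT", "ABORT", "TRYLOCK", "LOCK", "UNLOCK"
	};

	assert((int)state >= 0 && (int)state < TXN_NSTATES);
	return names[state];
}

/*
 * Single entry point for every transaction event: the enum argument is the
 * only thing that distinguishes a begin from a commit or an unlock. Formats
 * one line into buf and returns what snprintf() reports.
 */
int trace_txn_transition(char *buf, size_t len, int count, unsigned int flags,
			 enum txn_state state)
{
	return snprintf(buf, len, "count = %d flags = %u state = %s",
			count, flags, txn_state_name(state));
}
```

      Keeping a single entry point means a tool only has to subscribe to
      one event name and can filter on the state field afterwards.
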
      
      Sample output by tpoint of perf-tools:
                    cp-4457  [000] ...1    63.266220: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 1 flags = 9 state = BEGIN
                    cp-4457  [000] ...1    63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
                    cp-4457  [000] ...1    63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
              segctord-4371  [001] ...1    68.261196: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK
              segctord-4371  [001] ...1    68.261280: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = LOCK
              segctord-4371  [001] ...1    68.261877: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 1 flags = 10 state = BEGIN
              segctord-4371  [001] ...1    68.262116: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = COMMIT
              segctord-4371  [001] ...1    68.265032: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = UNLOCK
              segctord-4371  [001] ...1   132.376847: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK
      
      This patch also does trivial cleaning of comma usage in collection stage
      transition event for consistent coding style.
      Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: add a tracepoint for tracking stage transition of segment construction · 58497703
      Hitoshi Mitake authored
      This patch adds a tracepoint for tracking stage transitions of block
      collection in segment construction.  With the tracepoint, we can analyze
      the behavior of segment construction in depth.  It would be useful for
      bottleneck detection, debugging, etc.

      The tracepoint is created with the standard Linux trace API (like ext3,
      ext4, f2fs and btrfs), so we can analyze it easily with existing tools.  Of
      course, more detailed analysis will be possible if we create
      nilfs-specific analysis tools.
      
      Below is an example of event dump with Brendan Gregg's perf-tools
      (https://github.com/brendangregg/perf-tools).  Time consumption between
      each stage can be obtained.
      
      $ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition
      Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end.
              segctord-14875 [003] ...1 28311.067794: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_INIT
              segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_GC
              segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_FILE
              segctord-14875 [003] ...1 28311.068486: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_IFILE
              segctord-14875 [003] ...1 28311.068540: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_CPFILE
              segctord-14875 [003] ...1 28311.068561: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SUFILE
              segctord-14875 [003] ...1 28311.068565: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DAT
              segctord-14875 [003] ...1 28311.068573: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SR
              segctord-14875 [003] ...1 28311.068574: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DONE
      
      To capture every transition, this patch adds wrappers for the scnt
      member of nilfs_cstage.  With this change, every stage transition
      emits a trace event.
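
      The wrapper idea can be sketched in userspace C as follows. This is
      an illustration under stated assumptions, not the patch itself: the
      struct layouts, cstage_get(), cstage_set(), cstage_inc() and the
      counter that stands in for the tracepoint are hypothetical; only
      the stage names come from the dump above.

```c
#include <assert.h>

/* Stage numbers, named after the stages in the event dump. */
enum {
	ST_INIT, ST_GC, ST_FILE, ST_IFILE, ST_CPFILE,
	ST_SUFILE, ST_DAT, ST_SR, ST_DONE
};

struct cstage {
	int scnt;		/* current stage; written only via the wrappers */
};

struct sc_info {
	struct cstage stage;
	unsigned int nr_events;	/* stand-in for emitted trace events */
};

/* Stand-in for the tracepoint; a real one would record sci and its stage. */
static void trace_stage_transition(struct sc_info *sci)
{
	sci->nr_events++;
}

/* Read the current stage. */
int cstage_get(const struct sc_info *sci)
{
	return sci->stage.scnt;
}

/* Jump to a given stage and report the transition. */
void cstage_set(struct sc_info *sci, int next)
{
	assert(next >= ST_INIT && next <= ST_DONE);
	sci->stage.scnt = next;
	trace_stage_transition(sci);
}

/* Advance to the following stage; reporting happens in cstage_set(). */
void cstage_inc(struct sc_info *sci)
{
	cstage_set(sci, sci->stage.scnt + 1);
}
```

      Because no caller assigns scnt directly, a transition cannot happen
      without the corresponding event being reported.
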
      Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>