1. 28 Mar, 2017 16 commits
    • Johan Hovold's avatar
      USB: serial: omninet: fix reference leaks at open · 5b6983c4
      Johan Hovold authored
      commit 30572418 upstream.
      
      This driver needlessly took another reference to the tty on open, a
      reference which was then never released on close. This lead to not just
      a leak of the tty, but also a driver reference leak that prevented the
      driver from being unloaded after a port had once been opened.
      
      Fixes: 4a90f09b ("tty: usb-serial krefs")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      5b6983c4
    • Johan Hovold's avatar
      USB: serial: safe_serial: fix information leak in completion handler · 4151dae0
      Johan Hovold authored
      commit 8c76d7cd upstream.
      
      Add missing sanity check to the bulk-in completion handler to avoid an
      integer underflow that could be triggered by a malicious device.
      
      This avoids leaking up to 56 bytes from after the URB transfer buffer to
      user space.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      4151dae0
    • Guenter Roeck's avatar
      usb: host: xhci-plat: Fix timeout on removal of hot pluggable xhci controllers · 9fae77e3
      Guenter Roeck authored
      commit dcc7620c upstream.
      
      Upstream commit 98d74f9c ("xhci: fix 10 second timeout on removal of
      PCI hotpluggable xhci controllers") fixes a problem with hot pluggable PCI
      xhci controllers which can result in excessive timeouts, to the point where
      the system reports a deadlock.
      
      The same problem is seen with hot pluggable xhci controllers using the
      xhci-plat driver, such as the driver used for Type-C ports on rk3399.
      Similar to hot-pluggable PCI controllers, the driver for this chip
      removes the xhci controller from the system when the Type-C cable is
      disconnected.
      
      The solution for PCI devices works just as well for non-PCI devices
      and avoids the problem.
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      9fae77e3
    • Felipe Balbi's avatar
      usb: dwc3: gadget: make Set Endpoint Configuration macros safe · dc930f29
      Felipe Balbi authored
      commit 7369090a upstream.
      
      Some gadget drivers are bad, bad boys. We notice
      that ADB was passing bad Burst Size which caused top
      bits of param0 to be overwritten which confused DWC3
      when running this command.
      
      In order to avoid future issues, we're going to make
      sure values passed by macros are always safe for the
      controller. Note that ADB still needs a fix to *not*
      pass bad values.
      Reported-by: default avatarMohamed Abbas <mohamed.abbas@intel.com>
      Sugested-by: default avatarAdam Andruszak <adam.andruszak@intel.com>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      dc930f29
    • Rik van Riel's avatar
      tracing: Add #undef to fix compile error · 0994ff74
      Rik van Riel authored
      commit bf7165cf upstream.
      
      There are several trace include files that define TRACE_INCLUDE_FILE.
      
      Include several of them in the same .c file (as I currently have in
      some code I am working on), and the compile will blow up with a
      "warning: "TRACE_INCLUDE_FILE" redefined #define TRACE_INCLUDE_FILE syscalls"
      
      Every other include file in include/trace/events/ avoids that issue
      by having a #undef TRACE_INCLUDE_FILE before the #define; syscalls.h
      should have one, too.
      
      Link: http://lkml.kernel.org/r/20160928225554.13bd7ac6@annuminas.surriel.com
      
      Fixes: b8007ef7 ("tracing: Separate raw syscall from syscall tracer")
      Signed-off-by: default avatarRik van Riel <riel@redhat.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      0994ff74
    • Ralf Baechle's avatar
      MIPS: DEC: Avoid la pseudo-instruction in delay slots · 1b5d88be
      Ralf Baechle authored
      commit 3021773c upstream.
      
      When expanding the la or dla pseudo-instruction in a delay slot the GNU
      assembler will complain should the pseudo-instruction expand to multiple
      actual instructions, since only the first of them will be in the delay
      slot leading to the pseudo-instruction being only partially executed if
      the branch is taken. Use of PTR_LA in the dec int-handler.S leads to
      such warnings:
      
        arch/mips/dec/int-handler.S: Assembler messages:
        arch/mips/dec/int-handler.S:149: Warning: macro instruction expanded into multiple instructions in a branch delay slot
        arch/mips/dec/int-handler.S:198: Warning: macro instruction expanded into multiple instructions in a branch delay slot
      
      Avoid this by open coding the PTR_LA macros.
      Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      1b5d88be
    • Arnd Bergmann's avatar
      cpmac: remove hopeless #warning · 74f00de6
      Arnd Bergmann authored
      commit d43e6fb4 upstream.
      
      The #warning was present 10 years ago when the driver first got merged.
      As the platform is rather obsolete by now, it seems very unlikely that
      the warning will cause anyone to fix the code properly.
      
      kernelci.org reports the warning for every build in the meantime, so
      I think it's better to just turn it into a code comment to reduce
      noise.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      74f00de6
    • John Crispin's avatar
      MIPS: ralink: Cosmetic change to prom_init(). · 8c585d84
      John Crispin authored
      commit 9c48568b upstream.
      
      Over the years the code has been changed various times leading to
      argc/argv being defined in a different function to where we actually
      use the variables. Clean this up by moving them to prom_init_cmdline().
      Signed-off-by: default avatarJohn Crispin <john@phrozen.org>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/14902/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      8c585d84
    • Arnd Bergmann's avatar
      mtd: pmcmsp: use kstrndup instead of kmalloc+strncpy · 94f8c5b0
      Arnd Bergmann authored
      commit 906b2684 upstream.
      
      kernelci.org reports a warning for this driver, as it copies a local
      variable into a 'const char *' string:
      
          drivers/mtd/maps/pmcmsp-flash.c:149:30: warning: passing argument 1 of 'strncpy' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
      
      Using kstrndup() simplifies the code and avoids the warning.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarMarek Vasut <marek.vasut@gmail.com>
      Signed-off-by: default avatarBrian Norris <computersforpeace@gmail.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      94f8c5b0
    • Arnd Bergmann's avatar
      MIPS: ip22: Fix ip28 build for modern gcc · 90a5fa3e
      Arnd Bergmann authored
      commit 23ca9b52 upstream.
      
      kernelci reports a failure of the ip28_defconfig build after upgrading its
      gcc version:
      
      arch/mips/sgi-ip22/Platform:29: *** gcc doesn't support needed option -mr10k-cache-barrier=store.  Stop.
      
      The problem apparently is that the -mr10k-cache-barrier=store option is now
      rejected for CPUs other than r10k. Explicitly including the CPU in the
      check fixes this and is safe because both options were introduced in
      gcc-4.4.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/15049/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      90a5fa3e
    • Arnd Bergmann's avatar
      MIPS: ip27: Disable qlge driver in defconfig · ef3f3ada
      Arnd Bergmann authored
      commit b6176494 upstream.
      
      One of the last remaining failures in kernelci.org is for a gcc bug:
      
      drivers/net/ethernet/qlogic/qlge/qlge_main.c:4819:1: error: insn does not satisfy its constraints:
      drivers/net/ethernet/qlogic/qlge/qlge_main.c:4819:1: internal compiler error: in extract_constrain_insn, at recog.c:2190
      
      This is apparently broken in gcc-6 but fixed in gcc-7, and I cannot
      reproduce the problem here. However, it is clear that ip27_defconfig
      does not actually need this driver as the platform has only PCI-X but
      not PCIe, and the qlge adapter in turn is PCIe-only.
      
      The driver was originally enabled in 2010 along with lots of other
      drivers.
      
      Fixes: 59d302b3 ("MIPS: IP27: Make defconfig useful again.")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/15197/Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ef3f3ada
    • Arnd Bergmann's avatar
      crypto: improve gcc optimization flags for serpent and wp512 · f1ff62bf
      Arnd Bergmann authored
      commit 7d6e9105 upstream.
      
      An ancient gcc bug (first reported in 2003) has apparently resurfaced
      on MIPS, where kernelci.org reports an overly large stack frame in the
      whirlpool hash algorithm:
      
      crypto/wp512.c:987:1: warning: the frame size of 1112 bytes is larger than 1024 bytes [-Wframe-larger-than=]
      
      With some testing in different configurations, I'm seeing large
      variations in stack frames size up to 1500 bytes for what should have
      around 300 bytes at most. I also checked the reference implementation,
      which is essentially the same code but also comes with some test and
      benchmarking infrastructure.
      
      It seems that recent compiler versions on at least arm, arm64 and powerpc
      have a partial fix for this problem, but enabling "-fsched-pressure", but
      even with that fix they suffer from the issue to a certain degree. Some
      testing on arm64 shows that the time needed to hash a given amount of
      data is roughly proportional to the stack frame size here, which makes
      sense given that the wp512 implementation is doing lots of loads for
      table lookups, and the problem with the overly large stack is a result
      of doing a lot more loads and stores for spilled registers (as seen from
      inspecting the object code).
      
      Disabling -fschedule-insns consistently fixes the problem for wp512,
      in my collection of cross-compilers, the results are consistently better
      or identical when comparing the stack sizes in this function, though
      some architectures (notable x86) have schedule-insns disabled by
      default.
      
      The four columns are:
      default: -O2
      press:	 -O2 -fsched-pressure
      nopress: -O2 -fschedule-insns -fno-sched-pressure
      nosched: -O2 -no-schedule-insns (disables sched-pressure)
      
      				default	press	nopress	nosched
      alpha-linux-gcc-4.9.3		1136	848	1136	176
      am33_2.0-linux-gcc-4.9.3	2100	2076	2100	2104
      arm-linux-gnueabi-gcc-4.9.3	848	848	1048	352
      cris-linux-gcc-4.9.3		272	272	272	272
      frv-linux-gcc-4.9.3		1128	1000	1128	280
      hppa64-linux-gcc-4.9.3		1128	336	1128	184
      hppa-linux-gcc-4.9.3		644	308	644	276
      i386-linux-gcc-4.9.3		352	352	352	352
      m32r-linux-gcc-4.9.3		720	656	720	268
      microblaze-linux-gcc-4.9.3	1108	604	1108	256
      mips64-linux-gcc-4.9.3		1328	592	1328	208
      mips-linux-gcc-4.9.3		1096	624	1096	240
      powerpc64-linux-gcc-4.9.3	1088	432	1088	160
      powerpc-linux-gcc-4.9.3		1080	584	1080	224
      s390-linux-gcc-4.9.3		456	456	624	360
      sh3-linux-gcc-4.9.3		292	292	292	292
      sparc64-linux-gcc-4.9.3		992	240	992	208
      sparc-linux-gcc-4.9.3		680	592	680	312
      x86_64-linux-gcc-4.9.3		224	240	272	224
      xtensa-linux-gcc-4.9.3		1152	704	1152	304
      
      aarch64-linux-gcc-7.0.0		224	224	1104	208
      arm-linux-gnueabi-gcc-7.0.1	824	824	1048	352
      mips-linux-gcc-7.0.0		1120	648	1120	272
      x86_64-linux-gcc-7.0.1		240	240	304	240
      
      arm-linux-gnueabi-gcc-4.4.7	840			392
      arm-linux-gnueabi-gcc-4.5.4	784	728	784	320
      arm-linux-gnueabi-gcc-4.6.4	736	728	736	304
      arm-linux-gnueabi-gcc-4.7.4	944	784	944	352
      arm-linux-gnueabi-gcc-4.8.5	464	464	760	352
      arm-linux-gnueabi-gcc-4.9.3	848	848	1048	352
      arm-linux-gnueabi-gcc-5.3.1	824	824	1064	336
      arm-linux-gnueabi-gcc-6.1.1	808	808	1056	344
      arm-linux-gnueabi-gcc-7.0.1	824	824	1048	352
      
      Trying the same test for serpent-generic, the picture is a bit different,
      and while -fno-schedule-insns is generally better here than the default,
      -fsched-pressure wins overall, so I picked that instead.
      
      				default	press	nopress	nosched
      alpha-linux-gcc-4.9.3		1392	864	1392	960
      am33_2.0-linux-gcc-4.9.3	536	524	536	528
      arm-linux-gnueabi-gcc-4.9.3	552	552	776	536
      cris-linux-gcc-4.9.3		528	528	528	528
      frv-linux-gcc-4.9.3		536	400	536	504
      hppa64-linux-gcc-4.9.3		524	208	524	480
      hppa-linux-gcc-4.9.3		768	472	768	508
      i386-linux-gcc-4.9.3		564	564	564	564
      m32r-linux-gcc-4.9.3		712	576	712	532
      microblaze-linux-gcc-4.9.3	724	392	724	512
      mips64-linux-gcc-4.9.3		720	384	720	496
      mips-linux-gcc-4.9.3		728	384	728	496
      powerpc64-linux-gcc-4.9.3	704	304	704	480
      powerpc-linux-gcc-4.9.3		704	296	704	480
      s390-linux-gcc-4.9.3		560	560	592	536
      sh3-linux-gcc-4.9.3		540	540	540	540
      sparc64-linux-gcc-4.9.3		544	352	544	496
      sparc-linux-gcc-4.9.3		544	344	544	496
      x86_64-linux-gcc-4.9.3		528	536	576	528
      xtensa-linux-gcc-4.9.3		752	544	752	544
      
      aarch64-linux-gcc-7.0.0		432	432	656	480
      arm-linux-gnueabi-gcc-7.0.1	616	616	808	536
      mips-linux-gcc-7.0.0		720	464	720	488
      x86_64-linux-gcc-7.0.1		536	528	600	536
      
      arm-linux-gnueabi-gcc-4.4.7	592			440
      arm-linux-gnueabi-gcc-4.5.4	776	448	776	544
      arm-linux-gnueabi-gcc-4.6.4	776	448	776	544
      arm-linux-gnueabi-gcc-4.7.4	768	448	768	544
      arm-linux-gnueabi-gcc-4.8.5	488	488	776	544
      arm-linux-gnueabi-gcc-4.9.3	552	552	776	536
      arm-linux-gnueabi-gcc-5.3.1	552	552	776	536
      arm-linux-gnueabi-gcc-6.1.1	560	560	776	536
      arm-linux-gnueabi-gcc-7.0.1	616	616	808	536
      
      I did not do any runtime tests with serpent, so it is possible that stack
      frame size does not directly correlate with runtime performance here and
      it actually makes things worse, but it's more likely to help here, and
      the reduced stack frame size is probably enough reason to apply the patch,
      especially given that the crypto code is often used in deep call chains.
      
      Link: https://kernelci.org/build/id/58797d7559b5149efdf6c3a9/logs/
      Link: http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11488
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      f1ff62bf
    • Johan Hovold's avatar
      USB: serial: digi_acceleport: fix OOB-event processing · 0918203d
      Johan Hovold authored
      commit 2e46565c upstream.
      
      A recent change claimed to fix an off-by-one error in the OOB-port
      completion handler, but instead introduced such an error. This could
      specifically led to modem-status changes going unnoticed, effectively
      breaking TIOCMGET.
      
      Note that the offending commit fixes a loop-condition underflow and is
      marked for stable, but should not be backported without this fix.
      Reported-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Fixes: 2d380889 ("USB: serial: digi_acceleport: fix OOB data sanity
      check")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      0918203d
    • Johan Hovold's avatar
      USB: serial: digi_acceleport: fix OOB data sanity check · 771046c5
      Johan Hovold authored
      commit 2d380889 upstream.
      
      Make sure to check for short transfers to avoid underflow in a loop
      condition when parsing the receive buffer.
      
      Also fix an off-by-one error in the incomplete sanity check which could
      lead to invalid data being parsed.
      
      Fixes: 8c209e67 ("USB: make actual_length in struct urb field u32")
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reviewed-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      771046c5
    • Mathias Nyman's avatar
      xhci: fix 10 second timeout on removal of PCI hotpluggable xhci controllers · aa11eaeb
      Mathias Nyman authored
      commit 98d74f9c upstream.
      
      PCI hotpluggable xhci controllers such as some Alpine Ridge solutions will
      remove the xhci controller from the PCI bus when the last USB device is
      disconnected.
      
      Add a flag to indicate that the host is being removed to avoid queueing
      configure_endpoint commands for the dropped endpoints.
      For PCI hotplugged controllers this will prevent 5 second command timeouts
      For static xhci controllers the configure_endpoint command is not needed
      in the removal case as everything will be returned, freed, and the
      controller is reset.
      
      For now the flag is only set for PCI connected host controllers.
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      aa11eaeb
    • Keno Fischer's avatar
      mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp · 6e7c509e
      Keno Fischer authored
      commit 8310d48b upstream.
      
      In commit 19be0eaf ("mm: remove gup_flags FOLL_WRITE games from
      __get_user_pages()"), the mm code was changed from unsetting FOLL_WRITE
      after a COW was resolved to setting the (newly introduced) FOLL_COW
      instead.  Simultaneously, the check in gup.c was updated to still allow
      writes with FOLL_FORCE set if FOLL_COW had also been set.
      
      However, a similar check in huge_memory.c was forgotten.  As a result,
      remote memory writes to ro regions of memory backed by transparent huge
      pages cause an infinite loop in the kernel (handle_mm_fault sets
      FOLL_COW and returns 0 causing a retry, but follow_trans_huge_pmd bails
      out immidiately because `(flags & FOLL_WRITE) && !pmd_write(*pmd)` is
      true.
      
      While in this state the process is stil SIGKILLable, but little else
      works (e.g.  no ptrace attach, no other signals).  This is easily
      reproduced with the following code (assuming thp are set to always):
      
          #include <assert.h>
          #include <fcntl.h>
          #include <stdint.h>
          #include <stdio.h>
          #include <string.h>
          #include <sys/mman.h>
          #include <sys/stat.h>
          #include <sys/types.h>
          #include <sys/wait.h>
          #include <unistd.h>
      
          #define TEST_SIZE 5 * 1024 * 1024
      
          int main(void) {
            int status;
            pid_t child;
            int fd = open("/proc/self/mem", O_RDWR);
            void *addr = mmap(NULL, TEST_SIZE, PROT_READ,
                              MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
            assert(addr != MAP_FAILED);
            pid_t parent_pid = getpid();
            if ((child = fork()) == 0) {
              void *addr2 = mmap(NULL, TEST_SIZE, PROT_READ | PROT_WRITE,
                                 MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
              assert(addr2 != MAP_FAILED);
              memset(addr2, 'a', TEST_SIZE);
              pwrite(fd, addr2, TEST_SIZE, (uintptr_t)addr);
              return 0;
            }
            assert(child == waitpid(child, &status, 0));
            assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
            return 0;
          }
      
      Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously
      to the update in gup.c in the original commit.  The same pattern exists
      in follow_devmap_pmd.  However, we should not be able to reach that
      check with FOLL_COW set, so add WARN_ONCE to make sure we notice if we
      ever do.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20170106015025.GA38411@juliacomputing.comSigned-off-by: default avatarKeno Fischer <keno@juliacomputing.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2:
       - Drop change to follow_devmap_pmd()
       - pmd_dirty() is not available; check the page flags as in
         can_follow_write_pte()
       - Adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      [mhocko:
        This has been forward ported from the 3.2 stable tree.
        And fixed to return NULL.]
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      6e7c509e
  2. 16 Mar, 2017 2 commits
    • Brian Foster's avatar
      xfs: pass total block res. as total xfs_bmapi_write() parameter · 2d288808
      Brian Foster authored
      commit dbd5c8c9 upstream.
      
      The total field from struct xfs_alloc_arg is a bit of an unknown
      commodity. It is documented as the total block requirement for the
      transaction and is used in this manner from most call sites by virtue of
      passing the total block reservation of the transaction associated with
      an allocation. Several xfs_bmapi_write() callers pass hardcoded values
      of 0 or 1 for the total block requirement, which is a historical oddity
      without any clear reasoning.
      
      The xfs_iomap_write_direct() caller, for example, passes 0 for the total
      block requirement. This has been determined to cause problems in the
      form of ABBA deadlocks of AGF buffers due to incorrect AG selection in
      the block allocator. Specifically, the xfs_alloc_space_available()
      function incorrectly selects an AG that doesn't actually have sufficient
      space for the allocation. This occurs because the args.total field is 0
      and thus the remaining free space check on the AG doesn't actually
      consider the size of the allocation request. This locks the AGF buffer,
      the allocation attempt proceeds and ultimately fails (in
      xfs_alloc_fix_minleft()), and xfs_alloc_vexent() moves on to the next
      AG. In turn, this can lead to incorrect AG locking order (if the
      allocator wraps around, attempting to lock AG 0 after acquiring AG N)
      and thus deadlock if racing with another operation. This problem has
      been reproduced via generic/299 on smallish (1GB) ramdisk test devices.
      
      To avoid this problem, replace the undocumented hardcoded total
      parameters from the iomap and utility callers to pass the block
      reservation used for the associated transaction. This is consistent with
      other xfs_bmapi_write() callers throughout XFS. The assumption is that
      the total field allows the selection of an AG that can handle the entire
      operation rather than simply the allocation/range being requested (e.g.,
      resulting btree splits, etc.). This addresses the aforementioned
      generic/299 hang by ensuring AG selection only occurs when the
      allocation can be satisfied by the AG.
      
      [nb] backport to 3.12
      Reported-by: default avatarRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarDave Chinner <dchinner@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarNikolay Borisov <nborisov@suse.com>
      Acked-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      2d288808
    • Mikulas Patocka's avatar
      dm: flush queued bios when process blocks to avoid deadlock · d4b83d12
      Mikulas Patocka authored
      commit d67a5f4b upstream.
      
      Commit df2cb6da ("block: Avoid deadlocks with bio allocation by
      stacking drivers") created a workqueue for every bio set and code
      in bio_alloc_bioset() that tries to resolve some low-memory deadlocks
      by redirecting bios queued on current->bio_list to the workqueue if the
      system is low on memory.  However other deadlocks (see below **) may
      happen, without any low memory condition, because generic_make_request
      is queuing bios to current->bio_list (rather than submitting them).
      
      ** the related dm-snapshot deadlock is detailed here:
      https://www.redhat.com/archives/dm-devel/2016-July/msg00065.html
      
      Fix this deadlock by redirecting any bios on current->bio_list to the
      bio_set's rescue workqueue on every schedule() call.  Consequently,
      when the process blocks on a mutex, the bios queued on
      current->bio_list are dispatched to independent workqueus and they can
      complete without waiting for the mutex to be available.
      
      The structure blk_plug contains an entry cb_list and this list can contain
      arbitrary callback functions that are called when the process blocks.
      To implement this fix DM (ab)uses the onstack plug's cb_list interface
      to get its flush_current_bio_list() called at schedule() time.
      
      This fixes the snapshot deadlock - if the map method blocks,
      flush_current_bio_list() will be called and it redirects bios waiting
      on current->bio_list to appropriate workqueues.
      
      Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1267650
      Depends-on: df2cb6da ("block: Avoid deadlocks with bio allocation by stacking drivers")
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d4b83d12
  3. 14 Mar, 2017 1 commit
  4. 13 Mar, 2017 21 commits