1. 18 May, 2014 23 commits
    • Miao Xie's avatar
      Btrfs: fix inode caching vs tree log · d6db8ad7
      Miao Xie authored
      commit 1c70d8fb upstream.
      
      Currently, with inode cache enabled, we will reuse its inode id immediately
      after unlinking file, we may hit something like following:
      
      |->iput inode
      |->return inode id into inode cache
      |->create dir,fsync
      |->power off
      
      An easy way to reproduce this problem is:
      
      mkfs.btrfs -f /dev/sdb
      mount /dev/sdb /mnt -o inode_cache,commit=100
      dd if=/dev/zero of=/mnt/data bs=1M count=10 oflag=sync
      inode_id=`ls -i /mnt/data | awk '{print $1}'`
      rm -f /mnt/data
      
      i=1
      while [ 1 ]
      do
              mkdir /mnt/dir_$i
              test1=`stat /mnt/dir_$i | grep Inode: | awk '{print $4}'`
              if [ $test1 -eq $inode_id ]
              then
      		dd if=/dev/zero of=/mnt/dir_$i/data bs=1M count=1 oflag=sync
      		echo b > /proc/sysrq-trigger
      	fi
      	sleep 1
              i=$(($i+1))
      done
      
      mount /dev/sdb /mnt
      umount /dev/sdb
      btrfs check /dev/sdb
      
      We fix this problem by adding unlinked inode's id into pinned tree,
      and we can not reuse them until committing transaction.
      Signed-off-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: default avatarWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      d6db8ad7
    • Stefan Behrens's avatar
      Btrfs: Don't allocate inode that is already in use · 8bbfb31d
      Stefan Behrens authored
      commit ff76b056 upstream.
      
      Due to an off-by-one error, it is possible to reproduce a bug
      when the inode cache is used.
      
      The same inode number is assigned twice, the second time this
      leads to an EEXIST in btrfs_insert_empty_items().
      
      The issue can happen when a file is removed right after a subvolume
      is created and then a new inode number is created before the
      inodes in free_inode_pinned are processed.
      unlink() calls btrfs_return_ino() which calls start_caching() in this
      case which adds [highest_ino + 1, BTRFS_LAST_FREE_OBJECTID] by
      searching for the highest inode (which already cannot find the
      unlinked one anymore in btrfs_find_free_objectid()). So if this
      unlinked inode's number is equal to the highest_ino + 1 (or >= this value
      instead of > this value which was the off-by-one error), we mustn't add
      the inode number to free_ino_pinned (caching_thread() does it right).
      In this case we need to try directly to add the number to the inode_cache
      which will fail in this case.
      
      When this inode number is allocated while it is still in free_ino_pinned,
      it is allocated and still added to the free inode cache when the
      pinned inodes are processed, thus one of the following inode number
      allocations will get an inode that is already in use and fail with EEXIST
      in btrfs_insert_empty_items().
      
      One example which was created with the reproducer below:
      Create a snapshot, work in the newly created snapshot for the rest.
      In unlink(inode 34284) call btrfs_return_ino() which calls start_caching().
      start_caching() calls add_free_space [34284, 18446744073709517077].
      In btrfs_return_ino(), call start_caching pinned [34284, 1] which is wrong.
      mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
      btrfs_unpin_free_ino calls add_free_space [34284, 1].
      mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
      EEXIST when the new inode is inserted.
      
      One possible reproducer is this one:
       #!/bin/sh
       # preparation
      TEST_DEV=/dev/sdc1
      TEST_MNT=/mnt
      umount ${TEST_MNT} 2>/dev/null || true
      mkfs.btrfs -f ${TEST_DEV}
      mount ${TEST_DEV} ${TEST_MNT} -o \
       rw,relatime,compress=lzo,space_cache,inode_cache
      btrfs subv create ${TEST_MNT}/s1
      for i in `seq 34027`; do touch ${TEST_MNT}/s1/${i}; done
      btrfs subv snap ${TEST_MNT}/s1 ${TEST_MNT}/s2
      FILENAME=`find ${TEST_MNT}/s1/ -inum 4085 | sed 's|^.*/\([^/]*\)$|\1|'`
      rm ${TEST_MNT}/s2/$FILENAME
      touch ${TEST_MNT}/s2/$FILENAME
       # the following steps can be repeated to reproduce the issue again and again
      [ -e ${TEST_MNT}/s3 ] && btrfs subv del ${TEST_MNT}/s3
      btrfs subv snap ${TEST_MNT}/s2 ${TEST_MNT}/s3
      rm ${TEST_MNT}/s3/$FILENAME
      touch ${TEST_MNT}/s3/$FILENAME
      ls -alFi ${TEST_MNT}/s?/$FILENAME
      touch ${TEST_MNT}/s3/_1 || logger FAILED
      ls -alFi ${TEST_MNT}/s?/_1
      touch ${TEST_MNT}/s3/_2 || logger FAILED
      ls -alFi ${TEST_MNT}/s?/_2
      touch ${TEST_MNT}/s3/__1 || logger FAILED
      ls -alFi ${TEST_MNT}/s?/__1
      touch ${TEST_MNT}/s3/__2 || logger FAILED
      ls -alFi ${TEST_MNT}/s?/__2
       # if the above is not enough, add the following loop:
      for i in `seq 3 9`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
       #for i in `seq 3 34027`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
       # one of the touch(1) calls in s3 fail due to EEXIST because the inode is
       # already in use that btrfs_find_ino_for_alloc() returns.
      Signed-off-by: default avatarStefan Behrens <sbehrens@giantdisaster.de>
      Reviewed-by: default avatarJan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: default avatarJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: default avatarChris Mason <chris.mason@fusionio.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      8bbfb31d
    • Johan Hovold's avatar
      USB: serial: fix sysfs-attribute removal deadlock · 27b34b28
      Johan Hovold authored
      commit 10164c2a upstream.
      
      Fix driver new_id sysfs-attribute removal deadlock by making sure to
      not hold any locks that the attribute operations grab when removing the
      attribute.
      
      Specifically, usb_serial_deregister holds the table mutex when
      deregistering the driver, which includes removing the new_id attribute.
      This can lead to a deadlock as writing to new_id increments the
      attribute's active count before trying to grab the same mutex in
      usb_serial_probe.
      
      The deadlock can easily be triggered by inserting a sleep in
      usb_serial_deregister and writing the id of an unbound device to new_id
      during module unload.
      
      As the table mutex (in this case) is used to prevent subdriver unload
      during probe, it should be sufficient to only hold the lock while
      manipulating the usb-serial driver list during deregister. A racing
      probe will then either fail to find a matching subdriver or fail to get
      the corresponding module reference.
      
      Since v3.15-rc1 this also triggers the following lockdep warning:
      
      ======================================================
      [ INFO: possible circular locking dependency detected ]
      3.15.0-rc2 #123 Tainted: G        W
      -------------------------------------------------------
      modprobe/190 is trying to acquire lock:
       (s_active#4){++++.+}, at: [<c0167aa0>] kernfs_remove_by_name_ns+0x4c/0x94
      
      but task is already holding lock:
       (table_lock){+.+.+.}, at: [<bf004d84>] usb_serial_deregister+0x3c/0x78 [usbserial]
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (table_lock){+.+.+.}:
             [<c0075f84>] __lock_acquire+0x1694/0x1ce4
             [<c0076de8>] lock_acquire+0xb4/0x154
             [<c03af3cc>] _raw_spin_lock+0x4c/0x5c
             [<c02bbc24>] usb_store_new_id+0x14c/0x1ac
             [<bf007eb4>] new_id_store+0x68/0x70 [usbserial]
             [<c025f568>] drv_attr_store+0x30/0x3c
             [<c01690e0>] sysfs_kf_write+0x5c/0x60
             [<c01682c0>] kernfs_fop_write+0xd4/0x194
             [<c010881c>] vfs_write+0xbc/0x198
             [<c0108e4c>] SyS_write+0x4c/0xa0
             [<c000f880>] ret_fast_syscall+0x0/0x48
      
      -> #0 (s_active#4){++++.+}:
             [<c03a7a28>] print_circular_bug+0x68/0x2f8
             [<c0076218>] __lock_acquire+0x1928/0x1ce4
             [<c0076de8>] lock_acquire+0xb4/0x154
             [<c0166b70>] __kernfs_remove+0x254/0x310
             [<c0167aa0>] kernfs_remove_by_name_ns+0x4c/0x94
             [<c0169fb8>] remove_files.isra.1+0x48/0x84
             [<c016a2fc>] sysfs_remove_group+0x58/0xac
             [<c016a414>] sysfs_remove_groups+0x34/0x44
             [<c02623b8>] driver_remove_groups+0x1c/0x20
             [<c0260e9c>] bus_remove_driver+0x3c/0xe4
             [<c026235c>] driver_unregister+0x38/0x58
             [<bf007fb4>] usb_serial_bus_deregister+0x84/0x88 [usbserial]
             [<bf004db4>] usb_serial_deregister+0x6c/0x78 [usbserial]
             [<bf005330>] usb_serial_deregister_drivers+0x2c/0x4c [usbserial]
             [<bf016618>] usb_serial_module_exit+0x14/0x1c [sierra]
             [<c009d6cc>] SyS_delete_module+0x184/0x210
             [<c000f880>] ret_fast_syscall+0x0/0x48
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(table_lock);
                                     lock(s_active#4);
                                     lock(table_lock);
        lock(s_active#4);
      
       *** DEADLOCK ***
      
      1 lock held by modprobe/190:
       #0:  (table_lock){+.+.+.}, at: [<bf004d84>] usb_serial_deregister+0x3c/0x78 [usbserial]
      
      stack backtrace:
      CPU: 0 PID: 190 Comm: modprobe Tainted: G        W     3.15.0-rc2 #123
      [<c0015e10>] (unwind_backtrace) from [<c0013728>] (show_stack+0x20/0x24)
      [<c0013728>] (show_stack) from [<c03a9a54>] (dump_stack+0x24/0x28)
      [<c03a9a54>] (dump_stack) from [<c03a7cac>] (print_circular_bug+0x2ec/0x2f8)
      [<c03a7cac>] (print_circular_bug) from [<c0076218>] (__lock_acquire+0x1928/0x1ce4)
      [<c0076218>] (__lock_acquire) from [<c0076de8>] (lock_acquire+0xb4/0x154)
      [<c0076de8>] (lock_acquire) from [<c0166b70>] (__kernfs_remove+0x254/0x310)
      [<c0166b70>] (__kernfs_remove) from [<c0167aa0>] (kernfs_remove_by_name_ns+0x4c/0x94)
      [<c0167aa0>] (kernfs_remove_by_name_ns) from [<c0169fb8>] (remove_files.isra.1+0x48/0x84)
      [<c0169fb8>] (remove_files.isra.1) from [<c016a2fc>] (sysfs_remove_group+0x58/0xac)
      [<c016a2fc>] (sysfs_remove_group) from [<c016a414>] (sysfs_remove_groups+0x34/0x44)
      [<c016a414>] (sysfs_remove_groups) from [<c02623b8>] (driver_remove_groups+0x1c/0x20)
      [<c02623b8>] (driver_remove_groups) from [<c0260e9c>] (bus_remove_driver+0x3c/0xe4)
      [<c0260e9c>] (bus_remove_driver) from [<c026235c>] (driver_unregister+0x38/0x58)
      [<c026235c>] (driver_unregister) from [<bf007fb4>] (usb_serial_bus_deregister+0x84/0x88 [usbserial])
      [<bf007fb4>] (usb_serial_bus_deregister [usbserial]) from [<bf004db4>] (usb_serial_deregister+0x6c/0x78 [usbserial])
      [<bf004db4>] (usb_serial_deregister [usbserial]) from [<bf005330>] (usb_serial_deregister_drivers+0x2c/0x4c [usbserial])
      [<bf005330>] (usb_serial_deregister_drivers [usbserial]) from [<bf016618>] (usb_serial_module_exit+0x14/0x1c [sierra])
      [<bf016618>] (usb_serial_module_exit [sierra]) from [<c009d6cc>] (SyS_delete_module+0x184/0x210)
      [<c009d6cc>] (SyS_delete_module) from [<c000f880>] (ret_fast_syscall+0x0/0x48)
      Signed-off-by: default avatarJohan Hovold <jhovold@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      27b34b28
    • Xiangyu Lu's avatar
      ARM: 8027/1: fix do_div() bug in big-endian systems · 747a78bd
      Xiangyu Lu authored
      commit 80bb3ef1 upstream.
      
      In big-endian systems, "%1" get the most significant part of the value, cause the instruction to get the wrong result.
      
      When viewing ftrace record in big-endian ARM systems, we found that
      the timestamp errors:
      
      swapper-0   [001] 1325.970000:   0:120:R ==> [001]    16:120:R events/1
      events/1-16 [001] 1325.970000:   16:120:S ==> [001]    0:120:R swapper
      swapper-0   [000] 1325.1000000:  0:120:R   + [000]    15:120:R events/0
      swapper-0   [000] 1325.1000000:  0:120:R ==> [000]    15:120:R events/0
      swapper-0   [000] 1326.030000:   0:120:R   + [000]  1150:120:R sshd
      swapper-0   [000] 1326.030000:   0:120:R ==> [000]  1150:120:R sshd
      
      When viewed ftrace records, it will call the do_div(n, base) function, which achieved arch/arm/include/asm/div64.h in. When n = 10000000, base = 1000000, in do_div(n, base) will execute "umull %Q0, %R0, %1, %Q2".
      Reviewed-by: default avatarDave Martin <Dave.Martin@arm.com>
      Reviewed-by: default avatarNicolas Pitre <nico@linaro.org>
      Signed-off-by: default avatarAlex Wu <wuquanming@huawei.com>
      Signed-off-by: default avatarXiangyu Lu <luxiangyu@huawei.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      747a78bd
    • Linus Torvalds's avatar
      mm: make fixup_user_fault() check the vma access rights too · 649c4c00
      Linus Torvalds authored
      commit 1b17844b upstream.
      
      fixup_user_fault() is used by the futex code when the direct user access
      fails, and the futex code wants it to either map in the page in a usable
      form or return an error.  It relied on handle_mm_fault() to map the
      page, and correctly checked the error return from that, but while that
      does map the page, it doesn't actually guarantee that the page will be
      mapped with sufficient permissions to be then accessed.
      
      So do the appropriate tests of the vma access rights by hand.
      
      [ Side note: arguably handle_mm_fault() could just do that itself, but
        we have traditionally done it in the caller, because some callers -
        notably get_user_pages() - have been able to access pages even when
        they are mapped with PROT_NONE.  Maybe we should re-visit that design
        decision, but in the meantime this is the minimal patch. ]
      
      Found by Dave Jones running his trinity tool.
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      649c4c00
    • Hans de Goede's avatar
      Input: synaptics - add min/max quirk for ThinkPad T431s, L440, L540, S1 Yoga and X1 · 53a58d9f
      Hans de Goede authored
      commit 46a2986e upstream.
      
      We expect that all the Haswell series will need such quirks, sigh.
      
      The T431s seems to be T430 hardware in a T440s case, using the T440s touchpad,
      with the same min/max issue.
      
      The X1 Carbon 3rd generation name says 2nd while it is a 3rd generation.
      
      The X1 and T431s share a PnPID with the T540p, but the reported ranges are
      closer to those of the T440s.
      
      HdG: Squashed 5 quirk patches into one. T431s + L440 + L540 are written by me,
      S1 Yoga and X1 are written by Benjamin Tissoires.
      
      Hdg: Standardized S1 Yoga and X1 values, Yoga uses the same touchpad as the
      X240, X1 uses the same touchpad as the T440.
      Signed-off-by: default avatarBenjamin Tissoires <benjamin.tissoires@redhat.com>
      Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      53a58d9f
    • Jani Nikula's avatar
      dmi: add support for exact DMI matches in addition to substring matching · f632e351
      Jani Nikula authored
      commit 5017b285 upstream.
      
      dmi_match() considers a substring match to be a successful match.  This is
      not always sufficient to distinguish between DMI data for different
      systems.  Add support for exact string matching using strcmp() in addition
      to the substring matching using strstr().
      
      The specific use case in the i915 driver is to allow us to use an exact
      match for D510MO, without also incorrectly matching D510MOV:
      
        {
      	.ident = "Intel D510MO",
      	.matches = {
      		DMI_MATCH(DMI_BOARD_VENDOR, "Intel"),
      		DMI_EXACT_MATCH(DMI_BOARD_NAME, "D510MO"),
      	},
        }
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Cc: <annndddrr@gmail.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Cornel Panceac <cpanceac@gmail.com>
      Acked-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      f632e351
    • Mizuma, Masayoshi's avatar
      mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages() · b8309464
      Mizuma, Masayoshi authored
      commit 7848a4bf upstream.
      
      soft lockup in freeing gigantic hugepage fixed in commit 55f67141 "mm:
      hugetlb: fix softlockup when a large number of hugepages are freed." can
      happen in return_unused_surplus_pages(), so let's fix it.
      Signed-off-by: default avatarMasayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      b8309464
    • Dan Williams's avatar
      libata/ahci: accommodate tag ordered controllers · f9a65249
      Dan Williams authored
      commit 8a4aeec8 upstream.
      
      The AHCI spec allows implementations to issue commands in tag order
      rather than FIFO order:
      
      	5.3.2.12 P:SelectCmd
      	HBA sets pSlotLoc = (pSlotLoc + 1) mod (CAP.NCS + 1)
      	or HBA selects the command to issue that has had the
      	PxCI bit set to '1' longer than any other command
      	pending to be issued.
      
      The result is that commands posted sequentially (time-wise) may play out
      of sequence when issued by hardware.
      
      This behavior has likely been hidden by drives that arrange for commands
      to complete in issue order.  However, it appears recent drives (two from
      different vendors that we have found so far) inflict out-of-order
      completions as a matter of course.  So, we need to take care to maintain
      ordered submission, otherwise we risk triggering a drive to fall out of
      sequential-io automation and back to random-io processing, which incurs
      large latency and degrades throughput.
      
      This issue was found in simple benchmarks where QD=2 seq-write
      performance was 30-50% *greater* than QD=32 seq-write performance.
      
      Tagging for -stable and making the change globally since it has a low
      risk-to-reward ratio.  Also, word is that recent versions of an unnamed
      OS also does it this way now.  So, drives in the field are already
      experienced with this tag ordering scheme.
      
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Ed Ciechanowski <ed.ciechanowski@intel.com>
      Reviewed-by: default avatarMatthew Wilcox <matthew.r.wilcox@intel.com>
      Signed-off-by: default avatarDan Williams <dan.j.williams@intel.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      f9a65249
    • Jeff Layton's avatar
      nfsd: set timeparms.to_maxval in setup_callback_client · 9b492e65
      Jeff Layton authored
      commit 3758cf7e upstream.
      
      ...otherwise the logic in the timeout handling doesn't work correctly.
      Spotted-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      [bwh: Backported to 3.2: max_cb_time() takes no parameters]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      9b492e65
    • Michael Ulbricht's avatar
      USB: cdc-acm: Remove Motorola/Telit H24 serial interfaces from ACM driver · 0d71d4e4
      Michael Ulbricht authored
      commit 895d240d upstream.
      
      By specifying NO_UNION_NORMAL the ACM driver does only use the first two
      USB interfaces (modem data & control). The AT Port, Diagnostic and NMEA
      interfaces are left to the USB serial driver.
      Signed-off-by: default avatarMichael Ulbricht <michael.ulbricht@systec-electronic.com>
      Signed-off-by: default avatarAlexander Stein <alexander.stein@systec-electronic.com>
      Signed-off-by: default avatarOliver Neukum <oliver@neukum.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      0d71d4e4
    • Aaron Sanders's avatar
      USB: pl2303: add ids for Hewlett-Packard HP POS pole displays · 9296342c
      Aaron Sanders authored
      commit b16c02fb upstream.
      
      Add device ids to pl2303 for the Hewlett-Packard HP POS pole displays:
      
      LD960: 03f0:0B39
      LCM220: 03f0:3139
      LCM960: 03f0:3239
      
      [ Johan: fix indentation and sort PIDs numerically ]
      Signed-off-by: default avatarAaron Sanders <aaron.sanders@hp.com>
      Signed-off-by: default avatarJohan Hovold <jhovold@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      9296342c
    • Tristan Bruns's avatar
      38036481
    • Daniele Palmas's avatar
      usb: option driver, add support for Telit UE910v2 · 065ca198
      Daniele Palmas authored
      commit d6de486b upstream.
      
      option driver, added VID/PID for Telit UE910v2 modem
      Signed-off-by: default avatarDaniele Palmas <dnlplm@gmail.com>
      Signed-off-by: default avatarJohan Hovold <jhovold@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      065ca198
    • Johan Hovold's avatar
      Revert "USB: serial: add usbid for dell wwan card to sierra.c" · 52752b3f
      Johan Hovold authored
      commit 2e01280d upstream.
      
      This reverts commit 1ebca9da.
      
      This device was erroneously added to the sierra driver even though it's
      not a Sierra device and was already handled by the option driver.
      
      Cc: Richard Farina <sidhayn@gmail.com>
      Signed-off-by: default avatarJohan Hovold <jhovold@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      52752b3f
    • Michele Baldessari's avatar
      USB: serial: ftdi_sio: add id for Brainboxes serial cards · 7b788101
      Michele Baldessari authored
      commit efe26e16 upstream.
      
      Custom VID/PIDs for Brainboxes cards as reported in
      https://bugzilla.redhat.com/show_bug.cgi?id=1071914Signed-off-by: default avatarMichele Baldessari <michele@acksyn.org>
      Signed-off-by: default avatarJohan Hovold <jhovold@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      7b788101
    • Larry Finger's avatar
      staging: r8712u: Fix case where ethtype was never obtained and always be checked against 0 · a7503c19
      Larry Finger authored
      commit f764cd68 upstream.
      
      Zero-initializing ether_type masked that the ether type would never be
      obtained for 8021x packets and the comparison against eapol_type
      would always fail.
      Reported-by: default avatarJes Sorensen <Jes.Sorensen@redhat.com>
      Signed-off-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      a7503c19
    • Chris Mason's avatar
      mlx4_en: don't use napi_synchronize inside mlx4_en_netpoll · ef70c601
      Chris Mason authored
      commit c98235cb upstream.
      
      The mlx4 driver is triggering schedules while atomic inside
      mlx4_en_netpoll:
      
      	spin_lock_irqsave(&cq->lock, flags);
      	napi_synchronize(&cq->napi);
      		^^^^^ msleep here
      	mlx4_en_process_rx_cq(dev, cq, 0);
      	spin_unlock_irqrestore(&cq->lock, flags);
      
      This was part of a patch by Alexander Guller from Mellanox in 2011,
      but it still isn't upstream.
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      Acked-By: default avatarAmir Vadai <amirv@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      ef70c601
    • Jeff Layton's avatar
      locks: allow __break_lease to sleep even when break_time is 0 · 48633e72
      Jeff Layton authored
      commit f1c6bb2c upstream.
      
      A fl->fl_break_time of 0 has a special meaning to the lease break code
      that basically means "never break the lease". knfsd uses this to ensure
      that leases don't disappear out from under it.
      
      Unfortunately, the code in __break_lease can end up passing this value
      to wait_event_interruptible as a timeout, which prevents it from going
      to sleep at all. This makes __break_lease to spin in a tight loop and
      causes soft lockups.
      
      Fix this by ensuring that we pass a minimum value of 1 as a timeout
      instead.
      
      Cc: J. Bruce Fields <bfields@fieldses.org>
      Reported-by: default avatarTerry Barnaby <terry1@beam.ltd.uk>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      48633e72
    • Helge Deller's avatar
      parisc: fix epoll_pwait syscall on compat kernel · a87ba5f1
      Helge Deller authored
      commit ab3e55b1 upstream.
      
      This bug was detected with the libio-epoll-perl debian package where the
      test case IO-Ppoll-compat.t failed.
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      a87ba5f1
    • Theodore Ts'o's avatar
      ext4: use i_size_read in ext4_unaligned_aio() · 0289029f
      Theodore Ts'o authored
      commit 6e6358fc upstream.
      
      We haven't taken i_mutex yet, so we need to use i_size_read().
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      0289029f
    • Matthew Wilcox's avatar
      ext4: note the error in ext4_end_bio() · 81692a1d
      Matthew Wilcox authored
      commit 9503c67c upstream.
      
      ext4_end_bio() currently throws away the error that it receives.  Chances
      are this is part of a spate of errors, one of which will end up getting
      the error returned to userspace somehow, but we shouldn't take that risk.
      Also print out the errno to aid in debug.
      Signed-off-by: default avatarMatthew Wilcox <matthew.r.wilcox@intel.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      81692a1d
    • Kazuya Mio's avatar
      ext4: FIBMAP ioctl causes BUG_ON due to handle EXT_MAX_BLOCKS · 5e76e584
      Kazuya Mio authored
      commit 4adb6ab3 upstream.
      
      When we try to get 2^32-1 block of the file which has the extent
      (ee_block=2^32-2, ee_len=1) with FIBMAP ioctl, it causes BUG_ON
      in ext4_ext_put_gap_in_cache().
      
      To avoid the problem, ext4_map_blocks() needs to check the file logical block
      number. ext4_ext_put_gap_in_cache() called via ext4_map_blocks() cannot
      handle 2^32-1 because the maximum file logical block number is 2^32-2.
      
      Note that ext4_ind_map_blocks() returns -EIO when the block number is invalid.
      So ext4_map_blocks() should also return the same errno.
      Signed-off-by: default avatarKazuya Mio <k-mio@sx.jp.nec.com>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      5e76e584
  2. 30 Apr, 2014 17 commits
    • Ben Hutchings's avatar
      Linux 3.2.58 · f453538a
      Ben Hutchings authored
      f453538a
    • Ben Hutchings's avatar
      Revert "isci: fix reset timeout handling" · 2e59f013
      Ben Hutchings authored
      This reverts commit 584ec122, which
      was commit ddfadd77 upstream.  It
      causes boot failure on 3.2 although no such problem occurs upstream.
      Reported-by: default avatarOndrej Zary <linux@rainbow-software.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Acked-by: default avatarDan Williams <dan.j.williams@intel.com>
      2e59f013
    • Ben Hutchings's avatar
      Revert "alpha: fix broken network checksum" · 53119558
      Ben Hutchings authored
      This reverts commit b93b90ff, which
      was commit 0ef38d70 upstream.
      It was intended to fix a regression which never occurred in 3.2.
      53119558
    • Mikulas Patocka's avatar
      powernow-k6: reorder frequencies · 3d793c47
      Mikulas Patocka authored
      commit 22c73795 upstream.
      
      This patch reorders reported frequencies from the highest to the lowest,
      just like in other frequency drivers.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      [bwh: Backported to 3.2: cpu_frequency_table::driver_data is called index]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      3d793c47
    • Mikulas Patocka's avatar
      powernow-k6: correctly initialize default parameters · 133eadd5
      Mikulas Patocka authored
      commit d82b922a upstream.
      
      The powernow-k6 driver used to read the initial multiplier from the
      powernow register. However, there is a problem with this:
      
      * If there was a frequency transition before, the multiplier read from the
        register corresponds to the current multiplier.
      * If there was no frequency transition since reset, the field in the
        register always reads as zero, regardless of the current multiplier that
        is set using switches on the mainboard and that the CPU is running at.
      
      The zero value corresponds to multiplier 4.5, so as a consequence, the
      powernow-k6 driver always assumes multiplier 4.5.
      
      For example, if we have 550MHz CPU with bus frequency 100MHz and
      multiplier 5.5, the powernow-k6 driver thinks that the multiplier is 4.5
      and bus frequency is 122MHz. The powernow-k6 driver then sets the
      multiplier to 4.5, underclocking the CPU to 450MHz, but reports the
      current frequency as 550MHz.
      
      There is no reliable way how to read the initial multiplier. I modified
      the driver so that it contains a table of known frequencies (based on
      parameters of existing CPUs and some common overclocking schemes) and sets
      the multiplier according to the frequency. If the frequency is unknown
      (because of unusual overclocking or underclocking), the user must supply
      the bus speed and maximum multiplier as module parameters.
      
      This patch should be backported to all stable kernels. If it doesn't
      apply cleanly, change it, or ask me to change it.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      [bwh: Backported to 3.2:
       - Adjust context
       - s/driver_data/index/]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      133eadd5
    • Mikulas Patocka's avatar
      powernow-k6: disable cache when changing frequency · f5fe5bb9
      Mikulas Patocka authored
      commit e20e1d0a upstream.
      
      I found out that a system with k6-3+ processor is unstable during network
      server load. The system locks up or the network card stops receiving. The
      reason for the instability is the CPU frequency scaling.
      
      During frequency transition the processor is in "EPM Stop Grant" state.
      The documentation says that the processor doesn't respond to inquiry
      requests in this state. Consequently, coherency of processor caches and
      bus master devices is not maintained, causing the system instability.
      
      This patch flushes the cache during frequency transition. It fixes the
      instability.
      
      Other minor changes:
      * u64 invalue changed to unsigned long because the variable is 32-bit
      * move the logic to set the multiplier to a separate function
        powernow_k6_set_cpu_multiplier
      * preserve lower 5 bits of the powernow port instead of 4 (the voltage
        field has 5 bits)
      * mask interrupts when reading the multiplier, so that the port is not
        open during other activity (running other kernel code with the port open
        shouldn't cause any misbehavior, but we should better be safe and keep
        the port closed)
      
      This patch should be backported to all stable kernels. If it doesn't
      apply cleanly, change it, or ask me to change it.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      f5fe5bb9
    • Paul Moore's avatar
      selinux: correctly label /proc inodes in use before the policy is loaded · a07089d7
      Paul Moore authored
      commit f64410ec upstream.
      
      This patch is based on an earlier patch by Eric Paris, he describes
      the problem below:
      
        "If an inode is accessed before policy load it will get placed on a
         list of inodes to be initialized after policy load.  After policy
         load we call inode_doinit() which calls inode_doinit_with_dentry()
         on all inodes accessed before policy load.  In the case of inodes
         in procfs that means we'll end up at the bottom where it does:
      
           /* Default to the fs superblock SID. */
           isec->sid = sbsec->sid;
      
           if ((sbsec->flags & SE_SBPROC) && !S_ISLNK(inode->i_mode)) {
                   if (opt_dentry) {
                           isec->sclass = inode_mode_to_security_class(...)
                           rc = selinux_proc_get_sid(opt_dentry,
                                                     isec->sclass,
                                                     &sid);
                           if (rc)
                                   goto out_unlock;
                           isec->sid = sid;
                   }
           }
      
         Since opt_dentry is null, we'll never call selinux_proc_get_sid()
         and will leave the inode labeled with the label on the superblock.
         I believe a fix would be to mimic the behavior of xattrs.  Look
         for an alias of the inode.  If it can't be found, just leave the
         inode uninitialized (and pick it up later) if it can be found, we
         should be able to call selinux_proc_get_sid() ..."
      
      On a system exhibiting this problem, you will notice a lot of files in
      /proc with the generic "proc_t" type (at least the ones that were
      accessed early in the boot), for example:
      
         # ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
         system_u:object_r:proc_t:s0 /proc/sys/kernel/shmmax
      
      However, with this patch in place we see the expected result:
      
         # ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
         system_u:object_r:sysctl_kernel_t:s0 /proc/sys/kernel/shmmax
      
      Cc: Eric Paris <eparis@redhat.com>
      Signed-off-by: default avatarPaul Moore <pmoore@redhat.com>
      Acked-by: default avatarEric Paris <eparis@redhat.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      a07089d7
    • Jiri Slaby's avatar
      Char: ipmi_bt_sm, fix infinite loop · 24b528cb
      Jiri Slaby authored
      commit a94cdd1f upstream.
      
      In read_all_bytes, we do
      
        unsigned char i;
        ...
        bt->read_data[0] = BMC2HOST;
        bt->read_count = bt->read_data[0];
        ...
        for (i = 1; i <= bt->read_count; i++)
          bt->read_data[i] = BMC2HOST;
      
      If bt->read_data[0] == bt->read_count == 255, we loop infinitely in the
      'for' loop.  Make 'i' an 'int' instead of 'char' to get rid of the
      overflow and finish the loop after 255 iterations every time.
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Reported-and-debugged-by: default avatarRui Hui Dian <rhdian@novell.com>
      Cc: Tomas Cech <tcech@suse.cz>
      Cc: Corey Minyard <minyard@acm.org>
      Cc: <openipmi-developer@lists.sourceforge.net>
      Signed-off-by: default avatarCorey Minyard <cminyard@mvista.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      24b528cb
    • Qiang Huang's avatar
      drivers: hv: additional switch to use mb() instead of smp_mb() · 683bce8d
      Qiang Huang authored
      commit e4af376d(drivers: hv: switch to use mb() instead of smp_mb()),
      the adjustment mistakenly dropped the change in hv_ringbuffer_read,
      so add it.
      Signed-off-by: default avatarQiang Huang <h.huangqiang@huawei.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      683bce8d
    • Andy Grover's avatar
      target/tcm_fc: Fix use-after-free of ft_tpg · 6ff2fec9
      Andy Grover authored
      commit 2c42be2d upstream.
      
      ft_del_tpg checks tpg->tport is set before unlinking the tpg from the
      tport when the tpg is being removed. Set this pointer in ft_tport_create,
      or the unlinking won't happen in ft_del_tpg and tport->tpg will reference
      a deleted object.
      
      This patch sets tpg->tport in ft_tport_create, because that's what
      ft_del_tpg checks, and is the only way to get back to the tport to
      clear tport->tpg.
      
      The bug was occuring when:
      
      - lport created, tport (our per-lport, per-provider context) is
        allocated.
        tport->tpg = NULL
      - tpg created
      - a PRLI is received. ft_tport_create is called, tpg is found and
        tport->tpg is set
      - tpg removed. ft_tpg is freed in ft_del_tpg. Since tpg->tport was not
        set, tport->tpg is not cleared and points at freed memory
      - Future calls to ft_tport_create return tport via first conditional,
        instead of searching for new tpg by calling ft_lport_find_tpg.
        tport->tpg is still invalid, and will access freed memory.
      
      see https://bugzilla.redhat.com/show_bug.cgi?id=1071340Signed-off-by: default avatarAndy Grover <agrover@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      6ff2fec9
    • H. Peter Anvin's avatar
      x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels · a862b5c4
      H. Peter Anvin authored
      commit b3b42ac2 upstream.
      
      The IRET instruction, when returning to a 16-bit segment, only
      restores the bottom 16 bits of the user space stack pointer.  We have
      a software workaround for that ("espfix") for the 32-bit kernel, but
      it relies on a nonzero stack segment base which is not available in
      32-bit mode.
      
      Since 16-bit support is somewhat crippled anyway on a 64-bit kernel
      (no V86 mode), and most (if not quite all) 64-bit processors support
      virtualization for the users who really need it, simply reject
      attempts at creating a 16-bit segment when running on top of a 64-bit
      kernel.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/n/tip-kicdm89kzw9lldryb1br9od0@git.kernel.orgSigned-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      a862b5c4
    • Rafał Miłecki's avatar
      b43: Fix machine check error due to improper access of B43_MMIO_PSM_PHY_HDR · 68289dd3
      Rafał Miłecki authored
      commit 12cd43c6 upstream.
      
      Register B43_MMIO_PSM_PHY_HDR is 16 bit one, so accessing it with 32b
      functions isn't safe. On my machine it causes delayed (!) CPU exception:
      
      Disabling lock debugging due to kernel taint
      mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 4: b200000000070f0f
      mce: [Hardware Error]: TSC 164083803dc
      mce: [Hardware Error]: PROCESSOR 2:20fc2 TIME 1396650505 SOCKET 0 APIC 0 microcode 0
      mce: [Hardware Error]: Run the above through 'mcelog --ascii'
      mce: [Hardware Error]: Machine check: Processor context corrupt
      Kernel panic - not syncing: Fatal machine check on current CPU
      Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
      Signed-off-by: default avatarRafał Miłecki <zajec5@gmail.com>
      Acked-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      68289dd3
    • Jens Axboe's avatar
      lib/percpu_counter.c: fix bad percpu counter state during suspend · 1f9df430
      Jens Axboe authored
      commit e39435ce upstream.
      
      I got a bug report yesterday from Laszlo Ersek in which he states that
      his kvm instance fails to suspend.  Laszlo bisected it down to this
      commit 1cf7e9c6 ("virtio_blk: blk-mq support") where virtio-blk is
      converted to use the blk-mq infrastructure.
      
      After digging a bit, it became clear that the issue was with the queue
      drain.  blk-mq tracks queue usage in a percpu counter, which is
      incremented on request alloc and decremented when the request is freed.
      The initial hunt was for an inconsistency in blk-mq, but everything
      seemed fine.  In fact, the counter only returned crazy values when
      suspend was in progress.
      
      When a CPU is unplugged, the percpu counters merges that CPU state with
      the general state.  blk-mq takes care to register a hotcpu notifier with
      the appropriate priority, so we know it runs after the percpu counter
      notifier.  However, the percpu counter notifier only merges the state
      when the CPU is fully gone.  This leaves a state transition where the
      CPU going away is no longer in the online mask, yet it still holds
      private values.  This means that in this state, percpu_counter_sum()
      returns invalid results, and the suspend then hangs waiting for
      abs(dead-cpu-value) requests to complete which of course will never
      happen.
      
      Fix this by clearing the state earlier, so we never have a case where
      the CPU isn't in online mask but still holds private state.  This bug
      has been there since forever, I guess we don't have a lot of users where
      percpu counters needs to be reliable during the suspend cycle.
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Reported-by: default avatarLaszlo Ersek <lersek@redhat.com>
      Tested-by: default avatarLaszlo Ersek <lersek@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      1f9df430
    • Takashi Iwai's avatar
      ALSA: ice1712: Fix boundary checks in PCM pointer ops · 2a803e14
      Takashi Iwai authored
      commit 4f8e9400 upstream.
      
      PCM pointer callbacks in ice1712 driver check the buffer size boundary
      wrongly between bytes and frames.  This leads to PCM core warnings
      like:
         snd_pcm_update_hw_ptr0: 105 callbacks suppressed
         ALSA pcm_lib.c:352 BUG: pcmC3D0c:0, pos = 5461, buffer size = 5461, period size = 2730
      
      This patch fixes these checks to be placed after the proper unit
      conversions.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      2a803e14
    • Oleg Nesterov's avatar
      wait: fix reparent_leader() vs EXIT_DEAD->EXIT_ZOMBIE race · a817fbac
      Oleg Nesterov authored
      commit dfccbb5e upstream.
      
      wait_task_zombie() first does EXIT_ZOMBIE->EXIT_DEAD transition and
      drops tasklist_lock.  If this task is not the natural child and it is
      traced, we change its state back to EXIT_ZOMBIE for ->real_parent.
      
      The last transition is racy, this is even documented in 50b8d257
      "ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE
      race".  wait_consider_task() tries to detect this transition and clear
      ->notask_error but we can't rely on ptrace_reparented(), debugger can
      exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE.
      
      And there is another problem which were missed before: this transition
      can also race with reparent_leader() which doesn't reset >exit_signal if
      EXIT_DEAD, assuming that this task must be reaped by someone else.  So
      the tracee can be re-parented with ->exit_signal != SIGCHLD, and if
      /sbin/init doesn't use __WALL it becomes unreapable.
      
      Change reparent_leader() to update ->exit_signal even if EXIT_DEAD.
      Note: this is the simple temporary hack for -stable, it doesn't try to
      solve all problems, it will be reverted by the next changes.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reported-by: default avatarJan Kratochvil <jan.kratochvil@redhat.com>
      Reported-by: default avatarMichal Schmidt <mschmidt@redhat.com>
      Tested-by: default avatarMichal Schmidt <mschmidt@redhat.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Lennart Poettering <lpoetter@redhat.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      a817fbac
    • Mizuma, Masayoshi's avatar
      mm: hugetlb: fix softlockup when a large number of hugepages are freed. · 057d8772
      Mizuma, Masayoshi authored
      commit 55f67141 upstream.
      
      When I decrease the value of nr_hugepage in procfs a lot, softlockup
      happens.  It is because there is no chance of context switch during this
      process.
      
      On the other hand, when I allocate a large number of hugepages, there is
      some chance of context switch.  Hence softlockup doesn't happen during
      this process.  So it's necessary to add the context switch in the
      freeing process as same as allocating process to avoid softlockup.
      
      When I freed 12 TB hugapages with kernel-2.6.32-358.el6, the freeing
      process occupied a CPU over 150 seconds and following softlockup message
      appeared twice or more.
      
      $ echo 6000000 > /proc/sys/vm/nr_hugepages
      $ cat /proc/sys/vm/nr_hugepages
      6000000
      $ grep ^Huge /proc/meminfo
      HugePages_Total:   6000000
      HugePages_Free:    6000000
      HugePages_Rsvd:        0
      HugePages_Surp:        0
      Hugepagesize:       2048 kB
      $ echo 0 > /proc/sys/vm/nr_hugepages
      
      BUG: soft lockup - CPU#16 stuck for 67s! [sh:12883] ...
      Pid: 12883, comm: sh Not tainted 2.6.32-358.el6.x86_64 #1
      Call Trace:
        free_pool_huge_page+0xb8/0xd0
        set_max_huge_pages+0x128/0x190
        hugetlb_sysctl_handler_common+0x113/0x140
        hugetlb_sysctl_handler+0x1e/0x20
        proc_sys_call_handler+0x97/0xd0
        proc_sys_write+0x14/0x20
        vfs_write+0xb8/0x1a0
        sys_write+0x51/0x90
        __audit_syscall_exit+0x265/0x290
        system_call_fastpath+0x16/0x1b
      
      I have not confirmed this problem with upstream kernels because I am not
      able to prepare the machine equipped with 12TB memory now.  However I
      confirmed that the amount of decreasing hugepages was directly
      proportional to the amount of required time.
      
      I measured required times on a smaller machine.  It showed 130-145
      hugepages decreased in a millisecond.
      
        Amount of decreasing     Required time      Decreasing rate
        hugepages                     (msec)         (pages/msec)
        ------------------------------------------------------------
        10,000 pages == 20GB         70 -  74          135-142
        30,000 pages == 60GB        208 - 229          131-144
      
      It means decrement of 6TB hugepages will trigger softlockup with the
      default threshold 20sec, in this decreasing rate.
      Signed-off-by: default avatarMasayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      057d8772
    • Vlastimil Babka's avatar
      mm: try_to_unmap_cluster() should lock_page() before mlocking · 8e8836ab
      Vlastimil Babka authored
      commit 57e68e9c upstream.
      
      A BUG_ON(!PageLocked) was triggered in mlock_vma_page() by Sasha Levin
      fuzzing with trinity.  The call site try_to_unmap_cluster() does not lock
      the pages other than its check_page parameter (which is already locked).
      
      The BUG_ON in mlock_vma_page() is not documented and its purpose is
      somewhat unclear, but apparently it serializes against page migration,
      which could otherwise fail to transfer the PG_mlocked flag.  This would
      not be fatal, as the page would be eventually encountered again, but
      NR_MLOCK accounting would become distorted nevertheless.  This patch adds
      a comment to the BUG_ON in mlock_vma_page() and munlock_vma_page() to that
      effect.
      
      The call site try_to_unmap_cluster() is fixed so that for page !=
      check_page, trylock_page() is attempted (to avoid possible deadlocks as we
      already have check_page locked) and mlock_vma_page() is performed only
      upon success.  If the page lock cannot be obtained, the page is left
      without PG_mlocked, which is again not a problem in the whole unevictable
      memory design.
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarBob Liu <bob.liu@oracle.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.2: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      8e8836ab