1. 02 Apr, 2015 40 commits
    • Sergey Ryazanov's avatar
      ath5k: fix spontaneus AR5312 freezes · 4b9c9e13
      Sergey Ryazanov authored
      commit 8bfae4f9 upstream.
      
      Sometimes while CPU have some load and ath5k doing the wireless
      interface reset the whole WiSoC completely freezes. Set of tests shows
      that using atomic delay function while we wait interface reset helps to
      avoid such freezes.
      
      The easiest way to reproduce this issue: create a station interface,
      start continous scan with wpa_supplicant and load CPU by something. Or
      just create multiple station interfaces and put them all in continous
      scan.
      
      This patch partially reverts the commit 1846ac3d ("ath5k: Use
      usleep_range where possible"), which replaces initial udelay()
      by usleep_range().
      
      I do not know actual source of this issue, but all looks like that HW
      freeze is caused by transaction on internal SoC bus, while wireless
      block is in reset state.
      
      Also I should note that I do not know how many chips are affected, but I
      did not see this issue with chips, other than AR5312.
      
      CC: Jiri Slaby <jirislaby@gmail.com>
      CC: Nick Kossifidis <mickflemm@gmail.com>
      CC: Luis R. Rodriguez <mcgrof@do-not-panic.com>
      Fixes: 1846ac3d ("ath5k: Use usleep_range where possible")
      Reported-by: default avatarChristophe Prevotaux <c.prevotaux@rural-networks.com>
      Tested-by: default avatarChristophe Prevotaux <c.prevotaux@rural-networks.com>
      Tested-by: default avatarEric Bree <ebree@nltinc.com>
      Signed-off-by: default avatarSergey Ryazanov <ryazanov.s.a@gmail.com>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      4b9c9e13
    • Dan Carpenter's avatar
      ALSA: off by one bug in snd_riptide_joystick_probe() · bf1ea01b
      Dan Carpenter authored
      commit e4940626 upstream.
      
      The problem here is that we check:
      
      	if (dev >= SNDRV_CARDS)
      
      Then we increment "dev".
      
             if (!joystick_port[dev++])
      
      Then we use it as an offset into a array with SNDRV_CARDS elements.
      
      	if (!request_region(joystick_port[dev], 8, "Riptide gameport")) {
      
      This has 3 effects:
      1) If you use the module option to specify the joystick port then it has
         to be shifted one space over.
      2) The wrong error message will be printed on failure if you have over
         32 cards.
      3) Static checkers will correctly complain that are off by one.
      
      Fixes: db1005ec ('ALSA: riptide - Fix joystick resource handling')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      bf1ea01b
    • Sergei Shtylyov's avatar
      clk-gate: fix bit # check in clk_register_gate() · 4b739bce
      Sergei Shtylyov authored
      commit 2e9dcdae upstream.
      
      In case CLK_GATE_HIWORD_MASK flag is passed to clk_register_gate(), the bit #
      should be no higher than 15, however the corresponding check is obviously off-
      by-one.
      
      Fixes: 04577994 ("clk: gate: add CLK_GATE_HIWORD_MASK")
      Signed-off-by: default avatarSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: default avatarMichael Turquette <mturquette@linaro.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      4b739bce
    • Martin KaFai Lau's avatar
      ipv6: fix ipv6_cow_metrics for non DST_HOST case · 9aaaff5f
      Martin KaFai Lau authored
      commit 3b471175 upstream.
      
      ipv6_cow_metrics() currently assumes only DST_HOST routes require
      dynamic metrics allocation from inetpeer.  The assumption breaks
      when ndisc discovered router with RTAX_MTU and RTAX_HOPLIMIT metric.
      Refer to ndisc_router_discovery() in ndisc.c and note that dst_metric_set()
      is called after the route is created.
      
      This patch creates the metrics array (by calling dst_cow_metrics_generic) in
      ipv6_cow_metrics().
      
      Test:
      radvd.conf:
      interface qemubr0
      {
      	AdvLinkMTU 1300;
      	AdvCurHopLimit 30;
      
      	prefix fd00:face:face:face::/64
      	{
      		AdvOnLink on;
      		AdvAutonomous on;
      		AdvRouterAddr off;
      	};
      };
      
      Before:
      [root@qemu1 ~]# ip -6 r show | egrep -v unreachable
      fd00:face:face:face::/64 dev eth0  proto kernel  metric 256  expires 27sec
      fe80::/64 dev eth0  proto kernel  metric 256
      default via fe80::74df:d0ff:fe23:8ef2 dev eth0  proto ra  metric 1024  expires 27sec
      
      After:
      [root@qemu1 ~]# ip -6 r show | egrep -v unreachable
      fd00:face:face:face::/64 dev eth0  proto kernel  metric 256  expires 27sec mtu 1300
      fe80::/64 dev eth0  proto kernel  metric 256  mtu 1300
      default via fe80::74df:d0ff:fe23:8ef2 dev eth0  proto ra  metric 1024  expires 27sec mtu 1300 hoplimit 30
      
      Fixes: 8e2ec639 (ipv6: don't use inetpeer to store metrics for routes.)
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      9aaaff5f
    • Sabrina Dubroca's avatar
      pktgen: fix UDP checksum computation · e82fe130
      Sabrina Dubroca authored
      commit 7744b5f3 upstream.
      
      This patch fixes two issues in UDP checksum computation in pktgen.
      
      First, the pseudo-header uses the source and destination IP
      addresses. Currently, the ports are used for IPv4.
      
      Second, the UDP checksum covers both header and data.  So we need to
      generate the data earlier (move pktgen_finalize_skb up), and compute
      the checksum for UDP header + data.
      
      Fixes: c26bf4a5 ("pktgen: Add UDPCSUM flag to support UDP checksums")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e82fe130
    • Al Viro's avatar
      autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation · 4dc44507
      Al Viro authored
      commit 0a280962 upstream.
      
      X-Coverup: just ask spender
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      4dc44507
    • Al Viro's avatar
      procfs: fix race between symlink removals and traversals · 2af59e5f
      Al Viro authored
      commit 7e0e953b upstream.
      
      use_pde()/unuse_pde() in ->follow_link()/->put_link() resp.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      2af59e5f
    • Thadeu Lima de Souza Cascardo's avatar
      blk-throttle: check stats_cpu before reading it from sysfs · 350337c6
      Thadeu Lima de Souza Cascardo authored
      commit 045c47ca upstream.
      
      When reading blkio.throttle.io_serviced in a recently created blkio
      cgroup, it's possible to race against the creation of a throttle policy,
      which delays the allocation of stats_cpu.
      
      Like other functions in the throttle code, just checking for a NULL
      stats_cpu prevents the following oops caused by that race.
      
      [ 1117.285199] Unable to handle kernel paging request for data at address 0x7fb4d0020
      [ 1117.285252] Faulting instruction address: 0xc0000000003efa2c
      [ 1137.733921] Oops: Kernel access of bad area, sig: 11 [#1]
      [ 1137.733945] SMP NR_CPUS=2048 NUMA PowerNV
      [ 1137.734025] Modules linked in: bridge stp llc kvm_hv kvm binfmt_misc autofs4
      [ 1137.734102] CPU: 3 PID: 5302 Comm: blkcgroup Not tainted 3.19.0 #5
      [ 1137.734132] task: c000000f1d188b00 ti: c000000f1d210000 task.ti: c000000f1d210000
      [ 1137.734167] NIP: c0000000003efa2c LR: c0000000003ef9f0 CTR: c0000000003ef980
      [ 1137.734202] REGS: c000000f1d213500 TRAP: 0300   Not tainted  (3.19.0)
      [ 1137.734230] MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 42008884  XER: 20000000
      [ 1137.734325] CFAR: 0000000000008458 DAR: 00000007fb4d0020 DSISR: 40000000 SOFTE: 0
      GPR00: c0000000003ed3a0 c000000f1d213780 c000000000c59538 0000000000000000
      GPR04: 0000000000000800 0000000000000000 0000000000000000 0000000000000000
      GPR08: ffffffffffffffff 00000007fb4d0020 00000007fb4d0000 c000000000780808
      GPR12: 0000000022000888 c00000000fdc0d80 0000000000000000 0000000000000000
      GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      GPR20: 000001003e120200 c000000f1d5b0cc0 0000000000000200 0000000000000000
      GPR24: 0000000000000001 c000000000c269e0 0000000000000020 c000000f1d5b0c80
      GPR28: c000000000ca3a08 c000000000ca3dec c000000f1c667e00 c000000f1d213850
      [ 1137.734886] NIP [c0000000003efa2c] .tg_prfill_cpu_rwstat+0xac/0x180
      [ 1137.734915] LR [c0000000003ef9f0] .tg_prfill_cpu_rwstat+0x70/0x180
      [ 1137.734943] Call Trace:
      [ 1137.734952] [c000000f1d213780] [d000000005560520] 0xd000000005560520 (unreliable)
      [ 1137.734996] [c000000f1d2138a0] [c0000000003ed3a0] .blkcg_print_blkgs+0xe0/0x1a0
      [ 1137.735039] [c000000f1d213960] [c0000000003efb50] .tg_print_cpu_rwstat+0x50/0x70
      [ 1137.735082] [c000000f1d2139e0] [c000000000104b48] .cgroup_seqfile_show+0x58/0x150
      [ 1137.735125] [c000000f1d213a70] [c0000000002749dc] .kernfs_seq_show+0x3c/0x50
      [ 1137.735161] [c000000f1d213ae0] [c000000000218630] .seq_read+0xe0/0x510
      [ 1137.735197] [c000000f1d213bd0] [c000000000275b04] .kernfs_fop_read+0x164/0x200
      [ 1137.735240] [c000000f1d213c80] [c0000000001eb8e0] .__vfs_read+0x30/0x80
      [ 1137.735276] [c000000f1d213cf0] [c0000000001eb9c4] .vfs_read+0x94/0x1b0
      [ 1137.735312] [c000000f1d213d90] [c0000000001ebb38] .SyS_read+0x58/0x100
      [ 1137.735349] [c000000f1d213e30] [c000000000009218] syscall_exit+0x0/0x98
      [ 1137.735383] Instruction dump:
      [ 1137.735405] 7c6307b4 7f891800 409d00b8 60000000 60420000 3d420004 392a63b0 786a1f24
      [ 1137.735471] 7d49502a e93e01c8 7d495214 7d2ad214 <7cead02a> e9090008 e9490010 e9290018
      
      And here is one code that allows to easily reproduce this, although this
      has first been found by running docker.
      
      void run(pid_t pid)
      {
      	int n;
      	int status;
      	int fd;
      	char *buffer;
      	buffer = memalign(BUFFER_ALIGN, BUFFER_SIZE);
      	n = snprintf(buffer, BUFFER_SIZE, "%d\n", pid);
      	fd = open(CGPATH "/test/tasks", O_WRONLY);
      	write(fd, buffer, n);
      	close(fd);
      	if (fork() > 0) {
      		fd = open("/dev/sda", O_RDONLY | O_DIRECT);
      		read(fd, buffer, 512);
      		close(fd);
      		wait(&status);
      	} else {
      		fd = open(CGPATH "/test/blkio.throttle.io_serviced", O_RDONLY);
      		n = read(fd, buffer, BUFFER_SIZE);
      		close(fd);
      	}
      	free(buffer);
      	exit(0);
      }
      
      void test(void)
      {
      	int status;
      	mkdir(CGPATH "/test", 0666);
      	if (fork() > 0)
      		wait(&status);
      	else
      		run(getpid());
      	rmdir(CGPATH "/test");
      }
      
      int main(int argc, char **argv)
      {
      	int i;
      	for (i = 0; i < NR_TESTS; i++)
      		test();
      	return 0;
      }
      Reported-by: default avatarRicardo Marin Matinata <rmm@br.ibm.com>
      Signed-off-by: default avatarThadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      350337c6
    • Jay Lan's avatar
      kdb: fix incorrect counts in KDB summary command output · ff39336d
      Jay Lan authored
      commit 14675592 upstream.
      
      The output of KDB 'summary' command should report MemTotal, MemFree
      and Buffers output in kB. Current codes report in unit of pages.
      
      A define of K(x) as
      is defined in the code, but not used.
      
      This patch would apply the define to convert the values to kB.
      Please include me on Cc on replies. I do not subscribe to linux-kernel.
      Signed-off-by: default avatarJay Lan <jlan@sgi.com>
      Signed-off-by: default avatarJason Wessel <jason.wessel@windriver.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ff39336d
    • James Hogan's avatar
      MIPS: Export FP functions used by lose_fpu(1) for KVM · 90ff35e0
      James Hogan authored
      [ Upstream commit 3ce465e0 ]
      
      Export the _save_fp asm function used by the lose_fpu(1) macro to GPL
      modules so that KVM can make use of it when it is built as a module.
      
      This fixes the following build error when CONFIG_KVM=m due to commit
      f798217d ("KVM: MIPS: Don't leak FPU/DSP to guest"):
      
      ERROR: "_save_fp" [arch/mips/kvm/kvm.ko] undefined!
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Fixes: f798217d (KVM: MIPS: Don't leak FPU/DSP to guest)
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/9260/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      [james.hogan@imgtec.com: Only export when CPU_R4K_FPU=y prior to v3.16,
       so as not to break the Octeon build which excludes FPU support. KVM
       depends on MIPS32r2 anyway.]
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      90ff35e0
    • Ilya Dryomov's avatar
      libceph: fix double __remove_osd() problem · 55452286
      Ilya Dryomov authored
      commit 7eb71e03 upstream.
      
      It turns out it's possible to get __remove_osd() called twice on the
      same OSD.  That doesn't sit well with rb_erase() - depending on the
      shape of the tree we can get a NULL dereference, a soft lockup or
      a random crash at some point in the future as we end up touching freed
      memory.  One scenario that I was able to reproduce is as follows:
      
                  <osd3 is idle, on the osd lru list>
      <con reset - osd3>
      con_fault_finish()
        osd_reset()
                                    <osdmap - osd3 down>
                                    ceph_osdc_handle_map()
                                      <takes map_sem>
                                      kick_requests()
                                        <takes request_mutex>
                                        reset_changed_osds()
                                          __reset_osd()
                                            __remove_osd()
                                        <releases request_mutex>
                                      <releases map_sem>
          <takes map_sem>
          <takes request_mutex>
          __kick_osd_requests()
            __reset_osd()
              __remove_osd() <-- !!!
      
      A case can be made that osd refcounting is imperfect and reworking it
      would be a proper resolution, but for now Sage and I decided to fix
      this by adding a safe guard around __remove_osd().
      
      Fixes: http://tracker.ceph.com/issues/8087
      
      Cc: Sage Weil <sage@redhat.com>
      Signed-off-by: default avatarIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: default avatarSage Weil <sage@redhat.com>
      Reviewed-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      55452286
    • Ilya Dryomov's avatar
      libceph: change from BUG to WARN for __remove_osd() asserts · 1c128bc4
      Ilya Dryomov authored
      commit cc9f1f51 upstream.
      
      No reason to use BUG_ON for osd request list assertions.
      Signed-off-by: default avatarIlya Dryomov <idryomov@redhat.com>
      Reviewed-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1c128bc4
    • Ilya Dryomov's avatar
      libceph: assert both regular and lingering lists in __remove_osd() · 10c2b4aa
      Ilya Dryomov authored
      commit 7c6e6fc5 upstream.
      
      It is important that both regular and lingering requests lists are
      empty when the OSD is removed.
      Signed-off-by: default avatarIlya Dryomov <ilya.dryomov@inktank.com>
      Reviewed-by: default avatarAlex Elder <elder@linaro.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      10c2b4aa
    • Arnd Bergmann's avatar
      cpufreq: s3c: remove incorrect __init annotations · 8ac96be7
      Arnd Bergmann authored
      commit 61882b63 upstream.
      
      The two functions s3c2416_cpufreq_driver_init and s3c_cpufreq_register
      are marked init but are called from a context that might be run after
      the __init sections are discarded, as the compiler points out:
      
      WARNING: vmlinux.o(.data+0x1ad9dc): Section mismatch in reference from the variable s3c2416_cpufreq_driver to the function .init.text:s3c2416_cpufreq_driver_init()
      WARNING: drivers/built-in.o(.text+0x35b5dc): Section mismatch in reference from the function s3c2410a_cpufreq_add() to the function .init.text:s3c_cpufreq_register()
      
      This removes the __init markings.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8ac96be7
    • Mikulas Patocka's avatar
      dm snapshot: fix a possible invalid memory access on unload · 0ef16211
      Mikulas Patocka authored
      commit 22aa66a3 upstream.
      
      When the snapshot target is unloaded, snapshot_dtr() waits until
      pending_exceptions_count drops to zero.  Then, it destroys the snapshot.
      Therefore, the function that decrements pending_exceptions_count
      should not touch the snapshot structure after the decrement.
      
      pending_complete() calls free_pending_exception(), which decrements
      pending_exceptions_count, and then it performs up_write(&s->lock) and it
      calls retry_origin_bios() which dereferences  s->origin.  These two
      memory accesses to the fields of the snapshot may touch the dm_snapshot
      struture after it is freed.
      
      This patch moves the call to free_pending_exception() to the end of
      pending_complete(), so that the snapshot will not be destroyed while
      pending_complete() is in progress.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      0ef16211
    • Mikulas Patocka's avatar
      dm: fix a race condition in dm_get_md · 78ef47d9
      Mikulas Patocka authored
      commit 2bec1f4a upstream.
      
      The function dm_get_md finds a device mapper device with a given dev_t,
      increases the reference count and returns the pointer.
      
      dm_get_md calls dm_find_md, dm_find_md takes _minor_lock, finds the
      device, tests that the device doesn't have DMF_DELETING or DMF_FREEING
      flag, drops _minor_lock and returns pointer to the device. dm_get_md then
      calls dm_get. dm_get calls BUG if the device has the DMF_FREEING flag,
      otherwise it increments the reference count.
      
      There is a possible race condition - after dm_find_md exits and before
      dm_get is called, there are no locks held, so the device may disappear or
      DMF_FREEING flag may be set, which results in BUG.
      
      To fix this bug, we need to call dm_get while we hold _minor_lock. This
      patch renames dm_find_md to dm_get_md and changes it so that it calls
      dm_get while holding the lock.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      78ef47d9
    • NeilBrown's avatar
      md/raid5: Fix livelock when array is both resyncing and degraded. · 83b97d9f
      NeilBrown authored
      commit 26ac1073 upstream.
      
      Commit a7854487:
        md: When RAID5 is dirty, force reconstruct-write instead of read-modify-write.
      
      Causes an RCW cycle to be forced even when the array is degraded.
      A degraded array cannot support RCW as that requires reading all data
      blocks, and one may be missing.
      
      Forcing an RCW when it is not possible causes a live-lock and the code
      spins, repeatedly deciding to do something that cannot succeed.
      
      So change the condition to only force RCW on non-degraded arrays.
      Reported-by: default avatarManibalan P <pmanibalan@amiindia.co.in>
      Bisected-by: default avatarJes Sorensen <Jes.Sorensen@redhat.com>
      Tested-by: default avatarJes Sorensen <Jes.Sorensen@redhat.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Fixes: a7854487Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      83b97d9f
    • Mitko Haralanov's avatar
      IB/qib: Do not write EEPROM · 048e989d
      Mitko Haralanov authored
      commit 18c0b82a upstream.
      
      This changeset removes all the code that allows the driver to write to
      the EEPROM and update the recorded error counters and power on hours.
      
      These two stats are unused and writing them exposes a timing risk
      which could leave the EEPROM in a bad state preventing further normal
      operation of the HCA.
      Reviewed-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: default avatarMitko Haralanov <mitko.haralanov@intel.com>
      Signed-off-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      048e989d
    • Tony Battersby's avatar
      sg: fix read() error reporting · 821b2180
      Tony Battersby authored
      commit 3b524a68 upstream.
      
      Fix SCSI generic read() incorrectly returning success after detecting an
      error.
      Signed-off-by: default avatarTony Battersby <tonyb@cybernetics.com>
      Acked-by: default avatarDouglas Gilbert <dgilbert@interlog.com>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Parallels.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      821b2180
    • Minh Duc Tran's avatar
      fixed invalid assignment of 64bit mask to host dma_boundary for scatter gather... · 8eea6edb
      Minh Duc Tran authored
      fixed invalid assignment of 64bit mask to host dma_boundary for scatter gather segment boundary limit.
      
      commit f76a610a upstream.
      
      In reference to bug https://bugzilla.redhat.com/show_bug.cgi?id=1097141
      Assert is seen with AMD cpu whenever calling pci_alloc_consistent.
      
      [   29.406183] ------------[ cut here ]------------
      [   29.410505] kernel BUG at lib/iommu-helper.c:13!
      Signed-off-by: default avatarMinh Tran <minh.tran@emulex.com>
      Fixes: 6733b39aSigned-off-by: default avatarJames Bottomley <JBottomley@Parallels.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8eea6edb
    • honclo's avatar
      Added Little Endian support to vtpm module · 14e53aea
      honclo authored
      commit eb71f8a5 upstream.
      
      The tpm_ibmvtpm module is affected by an unaligned access problem.
      ibmvtpm_crq_get_version failed with rc=-4 during boot when vTPM is
      enabled in Power partition, which supports both little endian and
      big endian modes.
      
      We added little endian support to fix this problem:
      1) added cpu_to_be64 calls to ensure BE data is sent from an LE OS.
      2) added be16_to_cpu and be32_to_cpu calls to make sure data received
         is in LE format on a LE OS.
      Signed-off-by: default avatarHon Ching(Vicky) Lo <honclo@linux.vnet.ibm.com>
      Signed-off-by: default avatarJoy Latten <jmlatten@linux.vnet.ibm.com>
      [phuewe: manually applied the patch :( ]
      Reviewed-by: default avatarAshley Lai <ashley@ahsleylai.com>
      Signed-off-by: default avatarPeter Huewe <peterhuewe@gmx.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      14e53aea
    • Filipe Manana's avatar
      Btrfs: fix fsync data loss after adding hard link to inode · a7eaec71
      Filipe Manana authored
      commit 1a4bcf47 upstream.
      
      We have a scenario where after the fsync log replay we can lose file data
      that had been previously fsync'ed if we added an hard link for our inode
      and after that we sync'ed the fsync log (for example by fsync'ing some
      other file or directory).
      
      This is because when adding an hard link we updated the inode item in the
      log tree with an i_size value of 0. At that point the new inode item was
      in memory only and a subsequent fsync log replay would not make us lose
      the file data. However if after adding the hard link we sync the log tree
      to disk, by fsync'ing some other file or directory for example, we ended
      up losing the file data after log replay, because the inode item in the
      persisted log tree had an an i_size of zero.
      
      This is easy to reproduce, and the following excerpt from my test for
      xfstests shows this:
      
        _scratch_mkfs >> $seqres.full 2>&1
        _init_flakey
        _mount_flakey
      
        # Create one file with data and fsync it.
        # This made the btrfs fsync log persist the data and the inode metadata with
        # a correct inode->i_size (4096 bytes).
        $XFS_IO_PROG -f -c "pwrite -S 0xaa -b 4K 0 4K" -c "fsync" \
             $SCRATCH_MNT/foo | _filter_xfs_io
      
        # Now add one hard link to our file. This made the btrfs code update the fsync
        # log, in memory only, with an inode metadata having a size of 0.
        ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link
      
        # Now force persistence of the fsync log to disk, for example, by fsyncing some
        # other file.
        touch $SCRATCH_MNT/bar
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar
      
        # Before a power loss or crash, we could read the 4Kb of data from our file as
        # expected.
        echo "File content before:"
        od -t x1 $SCRATCH_MNT/foo
      
        # Simulate a crash/power loss.
        _load_flakey_table $FLAKEY_DROP_WRITES
        _unmount_flakey
      
        _load_flakey_table $FLAKEY_ALLOW_WRITES
        _mount_flakey
      
        # After the fsync log replay, because the fsync log had a value of 0 for our
        # inode's i_size, we couldn't read anymore the 4Kb of data that we previously
        # wrote and fsync'ed. The size of the file became 0 after the fsync log replay.
        echo "File content after:"
        od -t x1 $SCRATCH_MNT/foo
      
      Another alternative test, that doesn't need to fsync an inode in the same
      transaction it was created, is:
      
        _scratch_mkfs >> $seqres.full 2>&1
        _init_flakey
        _mount_flakey
      
        # Create our test file with some data.
        $XFS_IO_PROG -f -c "pwrite -S 0xaa -b 8K 0 8K" \
             $SCRATCH_MNT/foo | _filter_xfs_io
      
        # Make sure the file is durably persisted.
        sync
      
        # Append some data to our file, to increase its size.
        $XFS_IO_PROG -f -c "pwrite -S 0xcc -b 4K 8K 4K" \
             $SCRATCH_MNT/foo | _filter_xfs_io
      
        # Fsync the file, so from this point on if a crash/power failure happens, our
        # new data is guaranteed to be there next time the fs is mounted.
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo
      
        # Add one hard link to our file. This made btrfs write into the in memory fsync
        # log a special inode with generation 0 and an i_size of 0 too. Note that this
        # didn't update the inode in the fsync log on disk.
        ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link
      
        # Now make sure the in memory fsync log is durably persisted.
        # Creating and fsync'ing another file will do it.
        touch $SCRATCH_MNT/bar
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar
      
        # As expected, before the crash/power failure, we should be able to read the
        # 12Kb of file data.
        echo "File content before:"
        od -t x1 $SCRATCH_MNT/foo
      
        # Simulate a crash/power loss.
        _load_flakey_table $FLAKEY_DROP_WRITES
        _unmount_flakey
      
        _load_flakey_table $FLAKEY_ALLOW_WRITES
        _mount_flakey
      
        # After mounting the fs again, the fsync log was replayed.
        # The btrfs fsync log replay code didn't update the i_size of the persisted
        # inode because the inode item in the log had a special generation with a
        # value of 0 (and it couldn't know the correct i_size, since that inode item
        # had a 0 i_size too). This made the last 4Kb of file data inaccessible and
        # effectively lost.
        echo "File content after:"
        od -t x1 $SCRATCH_MNT/foo
      
      This isn't a new issue/regression. This problem has been around since the
      log tree code was added in 2008:
      
        Btrfs: Add a write ahead tree log to optimize synchronous operations
        (commit e02119d5)
      
      Test cases for xfstests follow soon.
      Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      [ kamal: backport to 3.13-stable: context ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      a7eaec71
    • Nicholas Bellinger's avatar
      target: Check for LBA + sectors wrap-around in sbc_parse_cdb · 6a73242c
      Nicholas Bellinger authored
      commit aa179935 upstream.
      
      This patch adds a check to sbc_parse_cdb() in order to detect when
      an LBA + sector vs. end-of-device calculation wraps when the LBA is
      sufficently large enough (eg: 0xFFFFFFFFFFFFFFFF).
      
      Cc: Martin Petersen <martin.petersen@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      6a73242c
    • Nicholas Bellinger's avatar
      target: Add missing WRITE_SAME end-of-device sanity check · 87d06fbc
      Nicholas Bellinger authored
      commit 8e575c50 upstream.
      
      This patch adds a check to sbc_setup_write_same() to verify
      the incoming WRITE_SAME LBA + number of blocks does not exceed
      past the end-of-device.
      
      Also check for potential LBA wrap-around as well.
      Reported-by: default avatarBart Van Assche <bart.vanassche@sandisk.com>
      Cc: Martin Petersen <martin.petersen@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      87d06fbc
    • Darrick J. Wong's avatar
      dm io: reject unsupported DISCARD requests with EOPNOTSUPP · 3906ffcc
      Darrick J. Wong authored
      commit 37527b86 upstream.
      
      I created a dm-raid1 device backed by a device that supports DISCARD
      and another device that does NOT support DISCARD with the following
      dm configuration:
      
       #  echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo
       # lsblk -D
       NAME         DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
       sda                 0        4K       1G         0
       `-moo (dm-0)        0        4K       1G         0
       sdb                 0        0B       0B         0
       `-moo (dm-0)        0        4K       1G         0
      
      Notice that the mirror device /dev/mapper/moo advertises DISCARD
      support even though one of the mirror halves doesn't.
      
      If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl
      BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite
      loop in do_region() when it tries to issue a DISCARD request to sdb.
      The problem is that when we call do_region() against sdb, num_sectors
      is set to zero because q->limits.max_discard_sectors is zero.
      Therefore, "remaining" never decreases and the loop never terminates.
      
      To fix this: before entering the loop, check for the combination of
      REQ_DISCARD and no discard and return -EOPNOTSUPP to avoid hanging up
      the mirror device.
      
      This bug was found by the unfortunate coincidence of pvmove and a
      discard operation in the RHEL 6.5 kernel; upstream is also affected.
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Acked-by: default avatar"Martin K. Petersen" <martin.petersen@oracle.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      3906ffcc
    • Mikulas Patocka's avatar
      dm mirror: do not degrade the mirror on discard error · 5a95abb4
      Mikulas Patocka authored
      commit f2ed51ac upstream.
      
      It may be possible that a device claims discard support but it rejects
      discards with -EOPNOTSUPP.  It happens when using loopback on ext2/ext3
      filesystem driven by the ext4 driver.  It may also happen if the
      underlying devices are moved from one disk on another.
      
      If discard error happens, we reject the bio with -EOPNOTSUPP, but we do
      not degrade the array.
      
      This patch fixes failed test shell/lvconvert-repair-transient.sh in the
      lvm2 testsuite if the testsuite is extracted on an ext2 or ext3
      filesystem and it is being driven by the ext4 driver.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      5a95abb4
    • Chen Jie's avatar
      jffs2: fix handling of corrupted summary length · 66e5e860
      Chen Jie authored
      commit 164c2406 upstream.
      
      sm->offset maybe wrong but magic maybe right, the offset do not have CRC.
      
      Badness at c00c7580 [verbose debug info unavailable]
      NIP: c00c7580 LR: c00c718c CTR: 00000014
      REGS: df07bb40 TRAP: 0700   Not tainted  (2.6.34.13-WR4.3.0.0_standard)
      MSR: 00029000 <EE,ME,CE>  CR: 22084f84  XER: 00000000
      TASK = df84d6e0[908] 'mount' THREAD: df07a000
      GPR00: 00000001 df07bbf0 df84d6e0 00000000 00000001 00000000 df07bb58 00000041
      GPR08: 00000041 c0638860 00000000 00000010 22084f88 100636c8 df814ff8 00000000
      GPR16: df84d6e0 dfa558cc c05adb90 00000048 c0452d30 00000000 000240d0 000040d0
      GPR24: 00000014 c05ae734 c05be2e0 00000000 00000001 00000000 00000000 c05ae730
      NIP [c00c7580] __alloc_pages_nodemask+0x4d0/0x638
      LR [c00c718c] __alloc_pages_nodemask+0xdc/0x638
      Call Trace:
      [df07bbf0] [c00c718c] __alloc_pages_nodemask+0xdc/0x638 (unreliable)
      [df07bc90] [c00c7708] __get_free_pages+0x20/0x48
      [df07bca0] [c00f4a40] __kmalloc+0x15c/0x1ec
      [df07bcd0] [c01fc880] jffs2_scan_medium+0xa58/0x14d0
      [df07bd70] [c01ff38c] jffs2_do_mount_fs+0x1f4/0x6b4
      [df07bdb0] [c020144c] jffs2_do_fill_super+0xa8/0x260
      [df07bdd0] [c020230c] jffs2_fill_super+0x104/0x184
      [df07be00] [c0335814] get_sb_mtd_aux+0x9c/0xec
      [df07be20] [c033596c] get_sb_mtd+0x84/0x1e8
      [df07be60] [c0201ed0] jffs2_get_sb+0x1c/0x2c
      [df07be70] [c0103898] vfs_kern_mount+0x78/0x1e8
      [df07bea0] [c0103a58] do_kern_mount+0x40/0x100
      [df07bec0] [c011fe90] do_mount+0x240/0x890
      [df07bf10] [c0120570] sys_mount+0x90/0xd8
      [df07bf40] [c00110d8] ret_from_syscall+0x0/0x4
      
      === Exception: c01 at 0xff61a34
          LR = 0x100135f0
      Instruction dump:
      38800005 38600000 48010f41 4bfffe1c 4bfc2d15 4bfffe8c 72e90200 4082fc28
      3d20c064 39298860 8809000d 68000001 <0f000000> 2f800000 419efc0c 38000001
      mount: mounting /dev/mtdblock3 on /common failed: Input/output error
      Signed-off-by: default avatarChen Jie <chenjie6@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarDavid Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      66e5e860
    • Adrian Knoth's avatar
      ALSA: hdspm - Constrain periods to 2 on older cards · 22b9a132
      Adrian Knoth authored
      commit f0153c3d upstream.
      
      RME RayDAT and AIO use a fixed buffer size of 16384 samples. With period
      sizes of 32-4096, this translates to 4-512 periods.
      
      The older RME cards have a variable buffer size but require exactly two
      periods.
      
      This patch enforces nperiods=2 on those cards.
      Signed-off-by: default avatarAdrian Knoth <adi@drcomp.erfurt.thur.de>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      22b9a132
    • Alex Deucher's avatar
      drm/radeon: fix voltage setup on hawaii · 42203150
      Alex Deucher authored
      commit 09b6e85f upstream.
      
      Missing parameter when fetching the real voltage values
      from atom.  Fixes problems with dynamic clocking on
      certain boards.
      
      bug:
      https://bugs.freedesktop.org/show_bug.cgi?id=87457Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      42203150
    • Alex Deucher's avatar
      drm/radeon/dp: Set EDP_CONFIGURATION_SET for bridge chips if necessary · fd46aad5
      Alex Deucher authored
      commit 66c2b84b upstream.
      
      Don't restrict it to just eDP panels.  Some LVDS bridge chips require
      this.  Fixes blank panels on resume on certain laptops.  Noticed
      by mrnuke on IRC.
      
      bug:
      https://bugs.freedesktop.org/show_bug.cgi?id=42960Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      fd46aad5
    • Alexey Brodkin's avatar
      ARC: fix page address calculation if PAGE_OFFSET != LINUX_LINK_BASE · 3509c3e7
      Alexey Brodkin authored
      commit 06f34e1c upstream.
      
      We used to calculate page address differently in 2 cases:
      
      1. In virt_to_page(x) we do
       --->8---
       mem_map + (x - CONFIG_LINUX_LINK_BASE) >> PAGE_SHIFT
       --->8---
      
      2. In in pte_page(x) we do
       --->8---
       mem_map + (pte_val(x) - PAGE_OFFSET) >> PAGE_SHIFT
       --->8---
      
      That leads to problems in case PAGE_OFFSET != CONFIG_LINUX_LINK_BASE -
      different pages will be selected depending on where and how we calculate
      page address.
      
      In particular in the STAR 9000853582 when gdb attempted to read memory
      of another process it got improper page in get_user_pages() because this
      is exactly one of the places where we search for a page by pte_page().
      
      The fix is trivial - we need to calculate page address similarly in both
      cases.
      Signed-off-by: default avatarAlexey Brodkin <abrodkin@synopsys.com>
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      3509c3e7
    • Hui Wang's avatar
      ALSA: hda - enable mute led quirk for one more hp machine. · f8de5ab2
      Hui Wang authored
      commit 7976eb49 upstream.
      
      Otherwise, the mute led can't work at all.
      Tested-by: default avatarTaihsiang Ho <taihsiang.ho@canonical.com>
      BugLink: https://bugs.launchpad.net/bugs/1410704Signed-off-by: default avatarHui Wang <hui.wang@canonical.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      f8de5ab2
    • Naoya Horiguchi's avatar
      mm: hwpoison: drop lru_add_drain_all() in __soft_offline_page() · 1371f3ff
      Naoya Horiguchi authored
      commit 9ab3b598 upstream.
      
      A race condition starts to be visible in recent mmotm, where a PG_hwpoison
      flag is set on a migration source page *before* it's back in buddy page
      poo= l.
      
      This is problematic because no page flag is supposed to be set when
      freeing (see __free_one_page().) So the user-visible effect of this race
      is that it could trigger the BUG_ON() when soft-offlining is called.
      
      The root cause is that we call lru_add_drain_all() to make sure that the
      page is in buddy, but that doesn't work because this function just
      schedule= s a work item and doesn't wait its completion.
      drain_all_pages() does drainin= g directly, so simply dropping
      lru_add_drain_all() solves this problem.
      
      Fixes: f15bdfa8 ("mm/memory-failure.c: fix memory leak in successful soft offlining")
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Chen Gong <gong.chen@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      1371f3ff
    • Grazvydas Ignotas's avatar
      mm/memory.c: actually remap enough memory · 63d46868
      Grazvydas Ignotas authored
      commit 9cb12d7b upstream.
      
      For whatever reason, generic_access_phys() only remaps one page, but
      actually allows to access arbitrary size.  It's quite easy to trigger
      large reads, like printing out large structure with gdb, which leads to a
      crash.  Fix it by remapping correct size.
      
      Fixes: 28b2ee20 ("access_process_vm device memory infrastructure")
      Signed-off-by: default avatarGrazvydas Ignotas <notasas@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      63d46868
    • Joonsoo Kim's avatar
      mm/compaction: fix wrong order check in compact_finished() · 26dcca5b
      Joonsoo Kim authored
      commit 372549c2 upstream.
      
      What we want to check here is whether there is highorder freepage in buddy
      list of other migratetype in order to steal it without fragmentation.
      But, current code just checks cc->order which means allocation request
      order.  So, this is wrong.
      
      Without this fix, non-movable synchronous compaction below pageblock order
      would not stopped until compaction is complete, because migratetype of
      most pageblocks are movable and high order freepage made by compaction is
      usually on movable type buddy list.
      
      There is some report related to this bug. See below link.
      
        http://www.spinics.net/lists/linux-mm/msg81666.html
      
      Although the issued system still has load spike comes from compaction,
      this makes that system completely stable and responsive according to his
      report.
      
      stress-highalloc test in mmtests with non movable order 7 allocation
      doesn't show any notable difference in allocation success rate, but, it
      shows more compaction success rate.
      
      Compaction success rate (Compaction success * 100 / Compaction stalls, %)
      18.47 : 28.94
      
      Fixes: 1fb3f8ca ("mm: compaction: capture a suitable high-order page immediately when it is made available")
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: default avatarZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      26dcca5b
    • Nicholas Bellinger's avatar
      target: Fix PR_APTPL_BUF_LEN buffer size limitation · 37ebf457
      Nicholas Bellinger authored
      commit f161d4b4 upstream.
      
      This patch addresses the original PR_APTPL_BUF_LEN = 8k limitiation
      for write-out of PR APTPL metadata that Martin has recently been
      running into.
      
      It changes core_scsi3_update_and_write_aptpl() to use vzalloc'ed
      memory instead of kzalloc, and increases the default hardcoded
      length to 256k.
      
      It also adds logic in core_scsi3_update_and_write_aptpl() to double
      the original length upon core_scsi3_update_aptpl_buf() failure, and
      retries until the vzalloc'ed buffer is large enough to accommodate
      the outgoing APTPL metadata.
      Reported-by: default avatarMartin Svec <martin.svec@zoner.cz>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      37ebf457
    • Nicholas Bellinger's avatar
      iscsi-target: Drop problematic active_ts_list usage · 0944bc5f
      Nicholas Bellinger authored
      commit 3fd7b60f upstream.
      
      This patch drops legacy active_ts_list usage within iscsi_target_tq.c
      code.  It was originally used to track the active thread sets during
      iscsi-target shutdown, and is no longer used by modern upstream code.
      
      Two people have reported list corruption using traditional iscsi-target
      and iser-target with the following backtrace, that appears to be related
      to iscsi_thread_set->ts_list being used across both active_ts_list and
      inactive_ts_list.
      
      [   60.782534] ------------[ cut here ]------------
      [   60.782543] WARNING: CPU: 0 PID: 9430 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
      [   60.782545] list_del corruption, ffff88045b00d180->next is LIST_POISON1 (dead000000100100)
      [   60.782546] Modules linked in: ib_srpt tcm_qla2xxx qla2xxx tcm_loop tcm_fc libfc scsi_transport_fc scsi_tgt ib_isert rdma_cm iw_cm ib_addr iscsi_target_mod target_core_pscsi target_core_file target_core_iblock target_core_mod configfs ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc autofs4 sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ib_ipoib ib_cm ib_uverbs ib_umad mlx4_en mlx4_ib ib_sa ib_mad ib_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support microcode serio_raw pcspkr sb_edac edac_core sg i2c_i801 lpc_ich mfd_core mtip32xx igb i2c_algo_bit i2c_core ptp pps_core ioatdma dca wmi ext3(F) jbd(F) mbcache(F) sd_mod(F) crc_t10dif(F) crct10dif_common(F) ahci(F) libahci(F) isci(F) libsas(F) scsi_transport_sas(F) [last unloaded: speedstep_lib]
      [   60.782597] CPU: 0 PID: 9430 Comm: iscsi_ttx Tainted: GF 3.12.19+ #2
      [   60.782598] Hardware name: Supermicro X9DRX+-F/X9DRX+-F, BIOS 3.00 07/09/2013
      [   60.782599]  0000000000000035 ffff88044de31d08 ffffffff81553ae7 0000000000000035
      [   60.782602]  ffff88044de31d58 ffff88044de31d48 ffffffff8104d1cc 0000000000000002
      [   60.782605]  ffff88045b00d180 ffff88045b00d0c0 ffff88045b00d0c0 ffff88044de31e58
      [   60.782607] Call Trace:
      [   60.782611]  [<ffffffff81553ae7>] dump_stack+0x49/0x62
      [   60.782615]  [<ffffffff8104d1cc>] warn_slowpath_common+0x8c/0xc0
      [   60.782618]  [<ffffffff8104d2b6>] warn_slowpath_fmt+0x46/0x50
      [   60.782620]  [<ffffffff81280933>] __list_del_entry+0x63/0xd0
      [   60.782622]  [<ffffffff812809b1>] list_del+0x11/0x40
      [   60.782630]  [<ffffffffa06e7cf9>] iscsi_del_ts_from_active_list+0x29/0x50 [iscsi_target_mod]
      [   60.782635]  [<ffffffffa06e87b1>] iscsi_tx_thread_pre_handler+0xa1/0x180 [iscsi_target_mod]
      [   60.782642]  [<ffffffffa06fb9ae>] iscsi_target_tx_thread+0x4e/0x220 [iscsi_target_mod]
      [   60.782647]  [<ffffffffa06fb960>] ? iscsit_handle_snack+0x190/0x190 [iscsi_target_mod]
      [   60.782652]  [<ffffffffa06fb960>] ? iscsit_handle_snack+0x190/0x190 [iscsi_target_mod]
      [   60.782655]  [<ffffffff8106f99e>] kthread+0xce/0xe0
      [   60.782657]  [<ffffffff8106f8d0>] ? kthread_freezable_should_stop+0x70/0x70
      [   60.782660]  [<ffffffff8156026c>] ret_from_fork+0x7c/0xb0
      [   60.782662]  [<ffffffff8106f8d0>] ? kthread_freezable_should_stop+0x70/0x70
      [   60.782663] ---[ end trace 9662f4a661d33965 ]---
      
      Since this code is no longer used, go ahead and drop the problematic usage
      all-together.
      Reported-by: default avatarGavin Guo <gavin.guo@canonical.com>
      Reported-by: default avatarMoussa Ba <moussaba@micron.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      0944bc5f
    • Roman Gushchin's avatar
      mm/nommu.c: fix arithmetic overflow in __vm_enough_memory() · 9760664d
      Roman Gushchin authored
      commit 8138a67a upstream.
      
      I noticed that "allowed" can easily overflow by falling below 0, because
      (total_vm / 32) can be larger than "allowed".  The problem occurs in
      OVERCOMMIT_NONE mode.
      
      In this case, a huge allocation can success and overcommit the system
      (despite OVERCOMMIT_NONE mode).  All subsequent allocations will fall
      (system-wide), so system become unusable.
      
      The problem was masked out by commit c9b1d098
      ("mm: limit growth of 3% hardcoded other user reserve"),
      but it's easy to reproduce it on older kernels:
      1) set overcommit_memory sysctl to 2
      2) mmap() large file multiple times (with VM_SHARED flag)
      3) try to malloc() large amount of memory
      
      It also can be reproduced on newer kernels, but miss-configured
      sysctl_user_reserve_kbytes is required.
      
      Fix this issue by switching to signed arithmetic here.
      Signed-off-by: default avatarRoman Gushchin <klamm@yandex-team.ru>
      Cc: Andrew Shewmaker <agshew@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      9760664d
    • Roman Gushchin's avatar
      mm/mmap.c: fix arithmetic overflow in __vm_enough_memory() · f0f2cf99
      Roman Gushchin authored
      commit 5703b087 upstream.
      
      I noticed, that "allowed" can easily overflow by falling below 0,
      because (total_vm / 32) can be larger than "allowed".  The problem
      occurs in OVERCOMMIT_NONE mode.
      
      In this case, a huge allocation can success and overcommit the system
      (despite OVERCOMMIT_NONE mode).  All subsequent allocations will fall
      (system-wide), so system become unusable.
      
      The problem was masked out by commit c9b1d098
      ("mm: limit growth of 3% hardcoded other user reserve"),
      but it's easy to reproduce it on older kernels:
      1) set overcommit_memory sysctl to 2
      2) mmap() large file multiple times (with VM_SHARED flag)
      3) try to malloc() large amount of memory
      
      It also can be reproduced on newer kernels, but miss-configured
      sysctl_user_reserve_kbytes is required.
      
      Fix this issue by switching to signed arithmetic here.
      
      [akpm@linux-foundation.org: use min_t]
      Signed-off-by: default avatarRoman Gushchin <klamm@yandex-team.ru>
      Cc: Andrew Shewmaker <agshew@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      f0f2cf99
    • Vlastimil Babka's avatar
      mm: when stealing freepages, also take pages created by splitting buddy page · 3777163b
      Vlastimil Babka authored
      commit 99592d59 upstream.
      
      When studying page stealing, I noticed some weird looking decisions in
      try_to_steal_freepages().  The first I assume is a bug (Patch 1), the
      following two patches were driven by evaluation.
      
      Testing was done with stress-highalloc of mmtests, using the
      mm_page_alloc_extfrag tracepoint and postprocessing to get counts of how
      often page stealing occurs for individual migratetypes, and what
      migratetypes are used for fallbacks.  Arguably, the worst case of page
      stealing is when UNMOVABLE allocation steals from MOVABLE pageblock.
      RECLAIMABLE allocation stealing from MOVABLE allocation is also not ideal,
      so the goal is to minimize these two cases.
      
      The evaluation of v2 wasn't always clear win and Joonsoo questioned the
      results.  Here I used different baseline which includes RFC compaction
      improvements from [1].  I found that the compaction improvements reduce
      variability of stress-highalloc, so there's less noise in the data.
      
      First, let's look at stress-highalloc configured to do sync compaction,
      and how these patches reduce page stealing events during the test.  First
      column is after fresh reboot, other two are reiterations of test without
      reboot.  That was all accumulater over 5 re-iterations (so the benchmark
      was run 5x3 times with 5 fresh restarts).
      
      Baseline:
      
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                        5-nothp-1       5-nothp-2       5-nothp-3
      Page alloc extfrag event                               10264225     8702233    10244125
      Extfrag fragmenting                                    10263271     8701552    10243473
      Extfrag fragmenting for unmovable                         13595       17616       15960
      Extfrag fragmenting unmovable placed with movable          7989       12193        8447
      Extfrag fragmenting for reclaimable                         658        1840        1817
      Extfrag fragmenting reclaimable placed with movable         558        1677        1679
      Extfrag fragmenting for movable                        10249018     8682096    10225696
      
      With Patch 1:
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                        6-nothp-1       6-nothp-2       6-nothp-3
      Page alloc extfrag event                               11834954     9877523     9774860
      Extfrag fragmenting                                    11833993     9876880     9774245
      Extfrag fragmenting for unmovable                          7342       16129       11712
      Extfrag fragmenting unmovable placed with movable          4191       10547        6270
      Extfrag fragmenting for reclaimable                         373        1130         923
      Extfrag fragmenting reclaimable placed with movable         302         906         738
      Extfrag fragmenting for movable                        11826278     9859621     9761610
      
      With Patch 2:
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                        7-nothp-1       7-nothp-2       7-nothp-3
      Page alloc extfrag event                                4725990     3668793     3807436
      Extfrag fragmenting                                     4725104     3668252     3806898
      Extfrag fragmenting for unmovable                          6678        7974        7281
      Extfrag fragmenting unmovable placed with movable          2051        3829        4017
      Extfrag fragmenting for reclaimable                         429        1208        1278
      Extfrag fragmenting reclaimable placed with movable         369         976        1034
      Extfrag fragmenting for movable                         4717997     3659070     3798339
      
      With Patch 3:
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                        8-nothp-1       8-nothp-2       8-nothp-3
      Page alloc extfrag event                                5016183     4700142     3850633
      Extfrag fragmenting                                     5015325     4699613     3850072
      Extfrag fragmenting for unmovable                          1312        3154        3088
      Extfrag fragmenting unmovable placed with movable          1115        2777        2714
      Extfrag fragmenting for reclaimable                         437        1193        1097
      Extfrag fragmenting reclaimable placed with movable         330         969         879
      Extfrag fragmenting for movable                         5013576     4695266     3845887
      
      In v2 we've seen apparent regression with Patch 1 for unmovable events,
      this is now gone, suggesting it was indeed noise.  Here, each patch
      improves the situation for unmovable events.  Reclaimable is improved by
      patch 1 and then either the same modulo noise, or perhaps sligtly worse -
      a small price for unmovable improvements, IMHO.  The number of movable
      allocations falling back to other migratetypes is most noisy, but it's
      reduced to half at Patch 2 nevertheless.  These are least critical as
      compaction can move them around.
      
      If we look at success rates, the patches don't affect them, that didn't change.
      
      Baseline:
                                   3.19-rc4              3.19-rc4              3.19-rc4
                                  5-nothp-1             5-nothp-2             5-nothp-3
      Success 1 Min         49.00 (  0.00%)       42.00 ( 14.29%)       41.00 ( 16.33%)
      Success 1 Mean        51.00 (  0.00%)       45.00 ( 11.76%)       42.60 ( 16.47%)
      Success 1 Max         55.00 (  0.00%)       51.00 (  7.27%)       46.00 ( 16.36%)
      Success 2 Min         53.00 (  0.00%)       47.00 ( 11.32%)       44.00 ( 16.98%)
      Success 2 Mean        59.60 (  0.00%)       50.80 ( 14.77%)       48.20 ( 19.13%)
      Success 2 Max         64.00 (  0.00%)       56.00 ( 12.50%)       52.00 ( 18.75%)
      Success 3 Min         84.00 (  0.00%)       82.00 (  2.38%)       78.00 (  7.14%)
      Success 3 Mean        85.60 (  0.00%)       82.80 (  3.27%)       79.40 (  7.24%)
      Success 3 Max         86.00 (  0.00%)       83.00 (  3.49%)       80.00 (  6.98%)
      
      Patch 1:
                                   3.19-rc4              3.19-rc4              3.19-rc4
                                  6-nothp-1             6-nothp-2             6-nothp-3
      Success 1 Min         49.00 (  0.00%)       44.00 ( 10.20%)       44.00 ( 10.20%)
      Success 1 Mean        51.80 (  0.00%)       46.00 ( 11.20%)       45.80 ( 11.58%)
      Success 1 Max         54.00 (  0.00%)       49.00 (  9.26%)       49.00 (  9.26%)
      Success 2 Min         58.00 (  0.00%)       49.00 ( 15.52%)       48.00 ( 17.24%)
      Success 2 Mean        60.40 (  0.00%)       51.80 ( 14.24%)       50.80 ( 15.89%)
      Success 2 Max         63.00 (  0.00%)       54.00 ( 14.29%)       55.00 ( 12.70%)
      Success 3 Min         84.00 (  0.00%)       81.00 (  3.57%)       79.00 (  5.95%)
      Success 3 Mean        85.00 (  0.00%)       81.60 (  4.00%)       79.80 (  6.12%)
      Success 3 Max         86.00 (  0.00%)       82.00 (  4.65%)       82.00 (  4.65%)
      
      Patch 2:
      
                                   3.19-rc4              3.19-rc4              3.19-rc4
                                  7-nothp-1             7-nothp-2             7-nothp-3
      Success 1 Min         50.00 (  0.00%)       44.00 ( 12.00%)       39.00 ( 22.00%)
      Success 1 Mean        52.80 (  0.00%)       45.60 ( 13.64%)       42.40 ( 19.70%)
      Success 1 Max         55.00 (  0.00%)       46.00 ( 16.36%)       47.00 ( 14.55%)
      Success 2 Min         52.00 (  0.00%)       48.00 (  7.69%)       45.00 ( 13.46%)
      Success 2 Mean        53.40 (  0.00%)       49.80 (  6.74%)       48.80 (  8.61%)
      Success 2 Max         57.00 (  0.00%)       52.00 (  8.77%)       52.00 (  8.77%)
      Success 3 Min         84.00 (  0.00%)       81.00 (  3.57%)       79.00 (  5.95%)
      Success 3 Mean        85.00 (  0.00%)       82.40 (  3.06%)       79.60 (  6.35%)
      Success 3 Max         86.00 (  0.00%)       83.00 (  3.49%)       80.00 (  6.98%)
      
      Patch 3:
                                   3.19-rc4              3.19-rc4              3.19-rc4
                                  8-nothp-1             8-nothp-2             8-nothp-3
      Success 1 Min         46.00 (  0.00%)       44.00 (  4.35%)       42.00 (  8.70%)
      Success 1 Mean        50.20 (  0.00%)       45.60 (  9.16%)       44.00 ( 12.35%)
      Success 1 Max         52.00 (  0.00%)       47.00 (  9.62%)       47.00 (  9.62%)
      Success 2 Min         53.00 (  0.00%)       49.00 (  7.55%)       48.00 (  9.43%)
      Success 2 Mean        55.80 (  0.00%)       50.60 (  9.32%)       49.00 ( 12.19%)
      Success 2 Max         59.00 (  0.00%)       52.00 ( 11.86%)       51.00 ( 13.56%)
      Success 3 Min         84.00 (  0.00%)       80.00 (  4.76%)       79.00 (  5.95%)
      Success 3 Mean        85.40 (  0.00%)       81.60 (  4.45%)       80.40 (  5.85%)
      Success 3 Max         87.00 (  0.00%)       83.00 (  4.60%)       82.00 (  5.75%)
      
      While there's no improvement here, I consider reduced fragmentation events
      to be worth on its own.  Patch 2 also seems to reduce scanning for free
      pages, and migrations in compaction, suggesting it has somewhat less work
      to do:
      
      Patch 1:
      
      Compaction stalls                 4153        3959        3978
      Compaction success                1523        1441        1446
      Compaction failures               2630        2517        2531
      Page migrate success           4600827     4943120     5104348
      Page migrate failure             19763       16656       17806
      Compaction pages isolated      9597640    10305617    10653541
      Compaction migrate scanned    77828948    86533283    87137064
      Compaction free scanned      517758295   521312840   521462251
      Compaction cost                   5503        5932        6110
      
      Patch 2:
      
      Compaction stalls                 3800        3450        3518
      Compaction success                1421        1316        1317
      Compaction failures               2379        2134        2201
      Page migrate success           4160421     4502708     4752148
      Page migrate failure             19705       14340       14911
      Compaction pages isolated      8731983     9382374     9910043
      Compaction migrate scanned    98362797    96349194    98609686
      Compaction free scanned      496512560   469502017   480442545
      Compaction cost                   5173        5526        5811
      
      As with v2, /proc/pagetypeinfo appears unaffected with respect to numbers
      of unmovable and reclaimable pageblocks.
      
      Configuring the benchmark to allocate like THP page fault (i.e.  no sync
      compaction) gives much noisier results for iterations 2 and 3 after
      reboot.  This is not so surprising given how [1] offers lower improvements
      in this scenario due to less restarts after deferred compaction which
      would change compaction pivot.
      
      Baseline:
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                          5-thp-1         5-thp-2         5-thp-3
      Page alloc extfrag event                                8148965     6227815     6646741
      Extfrag fragmenting                                     8147872     6227130     6646117
      Extfrag fragmenting for unmovable                         10324       12942       15975
      Extfrag fragmenting unmovable placed with movable          5972        8495       10907
      Extfrag fragmenting for reclaimable                         601        1707        2210
      Extfrag fragmenting reclaimable placed with movable         520        1570        2000
      Extfrag fragmenting for movable                         8136947     6212481     6627932
      
      Patch 1:
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                          6-thp-1         6-thp-2         6-thp-3
      Page alloc extfrag event                                8345457     7574471     7020419
      Extfrag fragmenting                                     8343546     7573777     7019718
      Extfrag fragmenting for unmovable                         10256       18535       30716
      Extfrag fragmenting unmovable placed with movable          6893       11726       22181
      Extfrag fragmenting for reclaimable                         465        1208        1023
      Extfrag fragmenting reclaimable placed with movable         353         996         843
      Extfrag fragmenting for movable                         8332825     7554034     6987979
      
      Patch 2:
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                          7-thp-1         7-thp-2         7-thp-3
      Page alloc extfrag event                                3512847     3020756     2891625
      Extfrag fragmenting                                     3511940     3020185     2891059
      Extfrag fragmenting for unmovable                          9017        6892        6191
      Extfrag fragmenting unmovable placed with movable          1524        3053        2435
      Extfrag fragmenting for reclaimable                         445        1081        1160
      Extfrag fragmenting reclaimable placed with movable         375         918         986
      Extfrag fragmenting for movable                         3502478     3012212     2883708
      
      Patch 3:
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                          8-thp-1         8-thp-2         8-thp-3
      Page alloc extfrag event                                3181699     3082881     2674164
      Extfrag fragmenting                                     3180812     3082303     2673611
      Extfrag fragmenting for unmovable                          1201        4031        4040
      Extfrag fragmenting unmovable placed with movable           974        3611        3645
      Extfrag fragmenting for reclaimable                         478        1165        1294
      Extfrag fragmenting reclaimable placed with movable         387         985        1030
      Extfrag fragmenting for movable                         3179133     3077107     2668277
      
      The improvements for first iteration are clear, the rest is much noisier
      and can appear like regression for Patch 1.  Anyway, patch 2 rectifies it.
      
      Allocation success rates are again unaffected so there's no point in
      making this e-mail any longer.
      
      [1] http://marc.info/?l=linux-mm&m=142166196321125&w=2
      
      This patch (of 3):
      
      When __rmqueue_fallback() is called to allocate a page of order X, it will
      find a page of order Y >= X of a fallback migratetype, which is different
      from the desired migratetype.  With the help of try_to_steal_freepages(),
      it may change the migratetype (to the desired one) also of:
      
      1) all currently free pages in the pageblock containing the fallback page
      2) the fallback pageblock itself
      3) buddy pages created by splitting the fallback page (when Y > X)
      
      These decisions take the order Y into account, as well as the desired
      migratetype, with the goal of preventing multiple fallback allocations
      that could e.g.  distribute UNMOVABLE allocations among multiple
      pageblocks.
      
      Originally, decision for 1) has implied the decision for 3).  Commit
      47118af0 ("mm: mmzone: MIGRATE_CMA migration type added") changed that
      (probably unintentionally) so that the buddy pages in case 3) are always
      changed to the desired migratetype, except for CMA pageblocks.
      
      Commit fef903ef ("mm/page_allo.c: restructure free-page stealing code
      and fix a bug") did some refactoring and added a comment that the case of
      3) is intended.  Commit 0cbef29a ("mm: __rmqueue_fallback() should
      respect pageblock type") removed the comment and tried to restore the
      original behavior where 1) implies 3), but due to the previous
      refactoring, the result is instead that only 2) implies 3) - and the
      conditions for 2) are less frequently met than conditions for 1).  This
      may increase fragmentation in situations where the code decides to steal
      all free pages from the pageblock (case 1)), but then gives back the buddy
      pages produced by splitting.
      
      This patch restores the original intended logic where 1) implies 3).
      During testing with stress-highalloc from mmtests, this has shown to
      decrease the number of events where UNMOVABLE and RECLAIMABLE allocations
      steal from MOVABLE pageblocks, which can lead to permanent fragmentation.
      In some cases it has increased the number of events when MOVABLE
      allocations steal from UNMOVABLE or RECLAIMABLE pageblocks, but these are
      fixable by sync compaction and thus less harmful.
      
      Note that evaluation has shown that the behavior introduced by
      47118af0 for buddy pages in case 3) is actually even better than the
      original logic, so the following patch will introduce it properly once
      again.  For stable backports of this patch it makes thus sense to only fix
      versions containing 0cbef29a.
      
      [iamjoonsoo.kim@lge.com: tracepoint fix]
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      3777163b