1. 31 Mar, 2014 7 commits
    • tracing: Fix array size mismatch in format string · 130cfbeb
      Vaibhav Nagarnaik authored
      commit 87291347 upstream.
      
      In event format strings, the array size is reported in two places:
      once in the array subscript and again via the "size:" attribute. The
      values reported in these two places do not match.
      
      For example, in sched:sched_switch the prev_comm and next_comm character
      arrays have a subscript value of [32], whereas the actual field size is
      16.
      
      name: sched_switch
      ID: 301
      format:
              field:unsigned short common_type;       offset:0;       size:2; signed:0;
              field:unsigned char common_flags;       offset:2;       size:1; signed:0;
              field:unsigned char common_preempt_count;       offset:3;       size:1;signed:0;
              field:int common_pid;   offset:4;       size:4; signed:1;
      
              field:char prev_comm[32];       offset:8;       size:16;        signed:1;
              field:pid_t prev_pid;   offset:24;      size:4; signed:1;
              field:int prev_prio;    offset:28;      size:4; signed:1;
              field:long prev_state;  offset:32;      size:8; signed:1;
              field:char next_comm[32];       offset:40;      size:16;        signed:1;
              field:pid_t next_pid;   offset:56;      size:4; signed:1;
              field:int next_prio;    offset:60;      size:4; signed:1;
      
      After bisection, the following commit was blamed:
      92edca07 tracing: Use direct field, type and system names
      
      This commit removes the duplication of strings for field->name and
      field->type assuming that all the strings passed in
      __trace_define_field() are immutable. This is not true for arrays, where
      the type string is created in event_storage variable and field->type for
      all array fields points to event_storage.
      
      Use __stringify() to create a string constant for the type string.
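
      A minimal user-space sketch of the failure mode and of the fix, assuming a simplified field descriptor that stores only a type pointer (the kernel's __trace_define_field() takes more arguments). The two-level __stringify() macro follows the usual kernel idiom; event_storage is a local stand-in, and FIELD_A_LEN, FIELD_B_LEN, ARRAY_TYPE() and define_array_field_shared() are hypothetical. Defining a second array field through the shared buffer silently rewrites the first field's type, while a literal built with __stringify() cannot change.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Two-level stringification: the outer macro expands its argument first. */
#define __stringify_1(x) #x
#define __stringify(x)   __stringify_1(x)

#define FIELD_A_LEN 16   /* hypothetical length of a comm[]-style array */
#define FIELD_B_LEN 32   /* hypothetical length of another event's array */

/* Simplified field descriptor: only the type string is modelled. */
struct field {
	const char *type;
};

/* Old pattern: each array type string is formatted into one shared buffer,
 * and the field keeps a pointer into that buffer. */
static char event_storage[64];

static void define_array_field_shared(struct field *f, const char *elem, int len)
{
	snprintf(event_storage, sizeof(event_storage), "%s[%d]", elem, len);
	f->type = event_storage;   /* every array field now aliases the same bytes */
}

/* New pattern: the type string is a string literal built at compile time,
 * e.g. ARRAY_TYPE(char, FIELD_A_LEN) becomes "char[16]". */
#define ARRAY_TYPE(elem, len) #elem "[" __stringify(len) "]"
```

      After defining a 16-element field and then a 32-element field through define_array_field_shared(), both fields report "char[32]"; ARRAY_TYPE(char, FIELD_A_LEN) stays "char[16]".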
      
      Also, get rid of event_storage and event_storage_mutex that are not
      needed anymore.
      
      Also, an added benefit is that this reduces the overhead of events a bit more:
      
         text    data     bss     dec     hex filename
      8424787 2036472 1302528 11763787         b3804b vmlinux
      8420814 2036408 1302528 11759750         b37086 vmlinux.patched
      
      Link: http://lkml.kernel.org/r/1392349908-29685-1-git-send-email-vnagarnaik@google.com
      
      Cc: Laurent Chavey <chavey@google.com>
      Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/i915: Disable stolen memory when DMAR is active · 2d111d8a
      Chris Wilson authored
      commit 0f4706d2 upstream.
      
      We have reports of heavy screen corruption if we try to use the stolen
      memory reserved by the BIOS whilst the DMA-Remapper is active. This
      quirk may be specific to only a few machines or BIOSes, but first let's
      apply the big hammer and always disable use of stolen memory when DMAR
      is active.
      
      v2 by Jani: Rebase on -fixes, only look at intel_iommu_gfx_mapped.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68535
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/i915: Don't enable display error interrupts from the start · b52bcddf
      Daniel Vetter authored
      commit 5c673b60 upstream.
      
      We need to enable interrupt processing before all the modeset
      state is set up. But that means we can fall over when we get a pipe
      underrun. This shouldn't happen as long as the BIOS works correctly,
      but as usual this turns out to be wishful thinking.
      
      So disable error interrupts at irq install time and rely on the
      re-enabling code in the modeset functions to take care of this.
      
      Note that due to the SDE interrupt handling race we must
      unconditionally enable all interrupt sources in SDEIER, hence there is
      no need to enable the SERR bit specifically.
      
      On gmch platforms we don't have an explicit enable/mask bit for fifo
      underruns. Fixing this up would require a bit of software tracking,
      hence is material for a separate patch. To make this possible we need
      to switch all gmch platforms to the new pipestat interrupt handling
      scheme Imre implemented for vlv, and then also add a safe form of sw
      state checking to __cpu_fifo_underrun_reporting_enabled a bit.
      
      v2: Also handle the ilk/snb cpu fifo underrun bits accordingly.
      Spotted by Ville.
      
      v3: Also handle the south interrupt underrun bits on ibx. Again
      spotted by Ville.
      Reported-by: Rob Clark <robdclark@gmail.com>
      Cc: Rob Clark <robdclark@gmail.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/i915: Fix PSR programming · bab00f22
      Ben Widawsky authored
      commit 24bd9bf5 upstream.
      
      The | operator has higher precedence than ?:. Therefore, the calculation
      doesn't do at all what you would expect. Thanks to Ken for convincing me
      that this was indeed the issue. Send me back to C programmer school, please.
      
      I'm sort of surprised PSR was continuing to work for people. It should
      be broken IMO (and it was broken for me, but I had assumed it never
      worked).
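
      A minimal illustration of the precedence trap with hypothetical register bits (BASE_BITS, FLAG_ON, FLAG_OFF); it is not the actual PSR register calculation. Because | binds tighter than ?:, the unparenthesized expression tests BASE_BITS | cond, which is always nonzero here, so it returns FLAG_ON and drops BASE_BITS entirely.

```c
#include <assert.h>
#include <stdint.h>

#define BASE_BITS 0x100u   /* hypothetical bits that should always be set */
#define FLAG_ON   0x001u   /* hypothetical bit when the condition holds */
#define FLAG_OFF  0x002u   /* hypothetical bit otherwise */

/* Parses as (BASE_BITS | cond) ? FLAG_ON : FLAG_OFF, not as intended. */
static uint32_t compute_buggy(int cond)
{
	return BASE_BITS | cond ? FLAG_ON : FLAG_OFF;
}

/* Parenthesizing the conditional gives the intended value. */
static uint32_t compute_fixed(int cond)
{
	return BASE_BITS | (cond ? FLAG_ON : FLAG_OFF);
}
```

      Compilers with -Wparentheses typically warn about the first form, which is a cheap way to catch this class of bug.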
      
      Regression from:
      commit ed8546ac
      Author: Ben Widawsky <benjamin.widawsky@intel.com>
      Date:   Mon Nov 4 22:45:05 2013 -0800
      
          drm/i915/bdw: Support eDP PSR
      
      Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com>
      Cc: Kenneth Graunke <kenneth.w.graunke@intel.com>
      Cc: Art Runyan <arthur.j.runyan@intel.com>
      Reported-by: "Kumar, Kiran S" <kiran.s.kumar@intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • clocksource: vf_pit_timer: use complement for sched_clock reading · de690a49
      Stefan Agner authored
      commit 224aa3ed upstream.
      
      Vybrid's PIT register is monotonically decreasing. However, the
      sched_clock reading needs to be monotonically increasing. Use a bitwise
      NOT to get the complement of the clock register. This fixes the clock
      going backward. Also, the clock now starts at 0, since we load the
      register with the maximum value at start.
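
      A user-space simulation of the idea, assuming a 32-bit down-counter that is loaded with its maximum value at start; pit_start(), pit_tick(), read_raw() and read_sched_clock() are hypothetical stand-ins, not the driver's functions. Since ~(UINT32_MAX - n) == n for 32-bit unsigned values, the complemented reading counts up from 0 as the hardware counts down.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated PIT: a 32-bit counter that counts down from its maximum. */
struct down_counter {
	uint32_t value;
};

static void pit_start(struct down_counter *c) { c->value = UINT32_MAX; }
static void pit_tick(struct down_counter *c, uint32_t n) { c->value -= n; }

/* Raw reading: decreases over time, unusable as sched_clock as-is. */
static uint32_t read_raw(const struct down_counter *c) { return c->value; }

/* Bitwise NOT maps UINT32_MAX..0 onto 0..UINT32_MAX, so this increases from 0. */
static uint32_t read_sched_clock(const struct down_counter *c) { return ~c->value; }
```
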
      Signed-off-by: Stefan Agner <stefan@agner.ch>
      Acked-by: Shawn Guo <shawn.guo@linaro.org>
      Cc: daniel.lezcano@linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux@arm.linux.org.uk
      Link: http://lkml.kernel.org/r/d25af915993aec1b486be653eb86f748ddef54fe.1394057313.git.stefan@agner.ch
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: compress: Pass through return value of open ops callback · a36a33a1
      Charles Keepax authored
      commit 749d3223 upstream.
      
      The snd_compr_open() function would always return 0 even if the
      compressed ops open function failed; obviously this is incorrect. This
      looks like it was introduced by a small typo in:
      
      commit a0830dbd
      ALSA: Add a reference counter to card instance
      
      This patch returns the value from the compressed op as it should.
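
      A simplified illustration of this bug class, not the actual snd_compr_open() code: the callback's error is computed and then discarded. The open_cb type, the two callbacks and the compr_open_*() functions are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical driver open callback type. */
typedef int (*open_cb)(void);

static int failing_open(void) { return -ENODEV; }
static int working_open(void) { return 0; }

/* Buggy shape: the callback result is computed and then dropped. */
static int compr_open_buggy(open_cb open)
{
	int ret = open();

	(void)ret;   /* error silently discarded */
	return 0;
}

/* Fixed shape: pass the callback's return value through to the caller. */
static int compr_open_fixed(open_cb open)
{
	int ret = open();

	return ret;
}
```
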
      Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
      Acked-by: Vinod Koul <vinod.koul@intel.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • HID: hidraw: fix warning destroying hidraw device files after parent · 4ed5cc18
      Fernando Luis Vazquez Cao authored
      commit 47587fc0 upstream.
      
      I noticed that after hot unplugging a Logitech unifying receiver
      (drivers/hid/hid-logitech-dj.c) the kernel would occasionally spew a
      stack trace similar to this:
      
      usb 1-1.1.2: USB disconnect, device number 7
      WARNING: CPU: 0 PID: 2865 at fs/sysfs/group.c:216 device_del+0x40/0x1b0()
      sysfs group ffffffff8187fa20 not found for kobject 'hidraw0'
      [...]
      CPU: 0 PID: 2865 Comm: upowerd Tainted: G        W 3.14.0-rc4 #7
      Hardware name: LENOVO 7783PN4/        , BIOS 9HKT43AUS 07/11/2011
       0000000000000009 ffffffff814cd684 ffff880427ccfdf8 ffffffff810616e7
       ffff88041ec61800 ffff880427ccfe48 ffff88041e444d80 ffff880426fab8e8
       ffff880429359960 ffffffff8106174c ffffffff81714b98 0000000000000028
      Call Trace:
       [<ffffffff814cd684>] ? dump_stack+0x41/0x51
       [<ffffffff810616e7>] ? warn_slowpath_common+0x77/0x90
       [<ffffffff8106174c>] ? warn_slowpath_fmt+0x4c/0x50
       [<ffffffff81374fd0>] ? device_del+0x40/0x1b0
       [<ffffffff8137516f>] ? device_unregister+0x2f/0x50
       [<ffffffff813751fa>] ? device_destroy+0x3a/0x40
       [<ffffffffa03ca245>] ? drop_ref+0x55/0x120 [hid]
       [<ffffffffa03ca3e6>] ? hidraw_release+0x96/0xb0 [hid]
       [<ffffffff811929da>] ? __fput+0xca/0x210
       [<ffffffff8107fe17>] ? task_work_run+0x97/0xd0
       [<ffffffff810139a9>] ? do_notify_resume+0x69/0xa0
       [<ffffffff814dbd22>] ? int_signal+0x12/0x17
      ---[ end trace 63f4a46f6566d737 ]---
      
      During device removal, hid_disconnect() is called via hid_hw_stop() to
      stop the device and free all its resources, including the sysfs
      files. The problem is that if a user space process, such as upowerd,
      holds a reference to a hidraw file, the corresponding sysfs files are
      kept around (drop_ref() does not call device_destroy() if the open
      counter is not 0), and it is usb_disconnect() that, by calling
      device_del() for the USB device, indirectly removes the sysfs
      files of the hidraw device (sysfs_remove_dir() is recursive these
      days). Because of this, by the time user space releases the last
      reference to the hidraw file and drop_ref() tries to destroy the
      device, the sysfs files are already gone and the kernel prints
      the warning above.
      
      Fix this by calling device_destroy() at USB disconnect time.
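
      A simplified state model of the ordering problem, assuming the parent's removal deletes the hidraw sysfs files recursively; struct hidraw_model and its functions are hypothetical, and warnings counts attempts to remove files that are already gone. With a user-space reference held across the unplug, the old flow warns on the final release, while destroying the device at disconnect time does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a hidraw node below a USB parent device. */
struct hidraw_model {
	int open_count;       /* user-space references, e.g. held by upowerd */
	bool sysfs_present;   /* hidraw sysfs files still exist */
	bool destroyed;       /* device_destroy() equivalent already done */
	int warnings;         /* removals attempted after the files were gone */
};

static void destroy_device(struct hidraw_model *h)
{
	if (h->destroyed)
		return;
	if (!h->sysfs_present)
		h->warnings++;    /* analogous to the sysfs "not found" WARNING */
	h->sysfs_present = false;
	h->destroyed = true;
}

/* Removing the USB parent recursively removes the child's sysfs files. */
static void usb_parent_removed(struct hidraw_model *h) { h->sysfs_present = false; }

/* Old flow: disconnect destroys the device only if nobody holds it open. */
static void disconnect_old(struct hidraw_model *h)
{
	if (h->open_count == 0)
		destroy_device(h);
	usb_parent_removed(h);
}

/* New flow: always destroy at disconnect, before the parent goes away. */
static void disconnect_new(struct hidraw_model *h)
{
	destroy_device(h);
	usb_parent_removed(h);
}

/* Last close: drop the reference and destroy if that has not happened yet. */
static void release(struct hidraw_model *h)
{
	if (--h->open_count == 0)
		destroy_device(h);
}
```
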
      Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 24 Mar, 2014 33 commits
    • Linux 3.13.7 · 896c6947
      Greg Kroah-Hartman authored
    • PNP / ACPI: proper handling of ACPI IO/Memory resource parsing failures · 8a44a89e
      Zhang Rui authored
      commit 89935315 upstream.
      
      Before commit b355cee8 (ACPI / resources: ignore invalid ACPI
      device resources), if acpi_dev_resource_memory()/acpi_dev_resource_io()
      returned false, it meant that the resource was not a memory/IO resource.
      
      But after commit b355cee8, those functions return false if the
      given memory/IO resource entry is invalid (the length of the resource
      is zero).
      
      This breaks pnpacpi_allocated_resource(), because it now recognizes
      the invalid memory/IO resources as resources of unknown type. Thus
      users see confusing warning messages on machines with zero-length
      ACPI memory/IO resources.

      Fix the problem by rearranging pnpacpi_allocated_resource() so that
      it calls acpi_dev_resource_memory() only for memory-type resources and
      acpi_dev_resource_io() only for IO-type resources.
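
      A minimal sketch of the two dispatch shapes, under the assumption that the parsers return false both for "wrong type" and, after b355cee8, for "right type but zero length". The enums, struct acpi_res and the parse_*()/classify_*() helpers are hypothetical stand-ins for the PNP/ACPI code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of ACPI resource types. */
enum res_type { RES_MEMORY, RES_IO, RES_IRQ };

struct acpi_res {
	enum res_type type;
	uint64_t length;
};

/* Stand-ins for the parsers: false means "other type" OR "zero length". */
static bool parse_memory(const struct acpi_res *r) { return r->type == RES_MEMORY && r->length != 0; }
static bool parse_io(const struct acpi_res *r)     { return r->type == RES_IO && r->length != 0; }

enum outcome { ADDED, SKIPPED_INVALID, UNKNOWN_TYPE };

/* Old shape: a false return is read as "not this type", so a zero-length
 * memory/IO resource falls through to UNKNOWN_TYPE and triggers a warning. */
static enum outcome classify_old(const struct acpi_res *r)
{
	if (parse_memory(r) || parse_io(r))
		return ADDED;
	return UNKNOWN_TYPE;
}

/* New shape: dispatch on the type first, then call only the matching parser. */
static enum outcome classify_new(const struct acpi_res *r)
{
	switch (r->type) {
	case RES_MEMORY:
		return parse_memory(r) ? ADDED : SKIPPED_INVALID;
	case RES_IO:
		return parse_io(r) ? ADDED : SKIPPED_INVALID;
	default:
		return UNKNOWN_TYPE;
	}
}
```
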
      
      Fixes: b355cee8 (ACPI / resources: ignore invalid ACPI device resources)
      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
      Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
      Reported-and-tested-by: Julian Wollrath <jwollrath@web.de>
      Reported-and-tested-by: Paul Bolle <pebolle@tiscali.nl>
      [rjw: Changelog]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • memcg: reparent charges of children before processing parent · efab06e9
      Filipe Brandenburger authored
      commit 4fb1a86f upstream.
      
      Sometimes the cleanup after memcg hierarchy testing gets stuck in
      mem_cgroup_reparent_charges(), unable to bring non-kmem usage down to 0.
      
      There may turn out to be several causes, but a major cause is this: the
      work item to offline the parent can run before the work item to offline
      the child; the parent's mem_cgroup_reparent_charges() circles around
      waiting for the child's pages to be reparented to its LRUs, but it is
      holding cgroup_mutex, which prevents the child from reaching its
      mem_cgroup_reparent_charges().
      
      Further testing showed that an ordered workqueue for cgroup_destroy_wq
      is not always good enough: percpu_ref_kill_and_confirm's call_rcu_sched
      stage on the way can mess up the order before reaching the workqueue.
      
      Instead, when offlining a memcg, call mem_cgroup_reparent_charges() on
      all its children (and grandchildren, in the correct order) to have their
      charges reparented first.
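
      A minimal sketch of the ordering requirement, using a plain recursive post-order walk as one way to visit children and grandchildren before their parent; struct cg, add_child(), reparent_charges() and offline_subtree() are hypothetical and stand in for the memcg/cgroup code, which also involves locking not shown here. Each group's charges move to its parent, so by the time the offlined group itself is processed, all of its descendants' charges have already reached it.

```c
#include <assert.h>

#define MAX_CHILDREN 4

/* Hypothetical cgroup node holding a charge counter. */
struct cg {
	const char *name;
	long charge;
	struct cg *parent;
	struct cg *child[MAX_CHILDREN];
	int nr_children;
};

static void add_child(struct cg *parent, struct cg *c)
{
	c->parent = parent;
	parent->child[parent->nr_children++] = c;
}

/* Move all of this group's charges up to its parent. */
static void reparent_charges(struct cg *g)
{
	g->parent->charge += g->charge;
	g->charge = 0;
}

/* Post-order walk: descendants first, the offlined group last.
 * The first letter of each visited group is appended to order[]. */
static void offline_subtree(struct cg *g, char *order, int *pos)
{
	for (int i = 0; i < g->nr_children; i++)
		offline_subtree(g->child[i], order, pos);
	order[(*pos)++] = g->name[0];
	reparent_charges(g);
}
```

      Offlining A in a tree T -> A -> {B -> D, C} visits D, B, C, A and leaves all charges in T.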
      
      Fixes: e5fca243 ("cgroup: use a dedicated workqueue for cgroup destruction")
      Signed-off-by: Filipe Brandenburger <filbranden@google.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>	[v3.10+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: mm: Add double logical invert to pte accessors · 52e7d049
      Steve Capper authored
      commit 84fe6826 upstream.
      
      Page table entries on ARM64 are 64 bits, and some pte functions such as
      pte_dirty return a bitwise AND of a flag with the pte value. If the
      flag to be tested resides in the upper 32 bits of the pte, then we run
      the risk of the result being dropped if it is downcast.
      
      For example:
      	gather_stats(page, md, pte_dirty(*pte), 1);
      where pte_dirty(*pte) is downcast to an int.
      
      This patch adds a double logical invert to all the pte_ accessors to
      ensure predictable downcasting.
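
      A user-space sketch of the truncation problem, assuming a hypothetical flag at bit 55 of a 64-bit entry. The callee takes an unsigned int so that the narrowing is well-defined in standard C (it keeps only the low 32 bits); PTE_HI_FLAG and the helpers are illustrative, not the arm64 headers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pteval_t;   /* 64-bit page table entry, as on ARM64 */

/* Hypothetical flag in the upper 32 bits (bit position chosen for illustration). */
#define PTE_HI_FLAG ((pteval_t)1 << 55)

/* Without the fix: returns the masked bit itself, i.e. 1 << 55 when set. */
static pteval_t pte_flag_raw(pteval_t pte)   { return pte & PTE_HI_FLAG; }

/* With the fix: !! collapses any nonzero value to exactly 1. */
static pteval_t pte_flag_fixed(pteval_t pte) { return !!(pte & PTE_HI_FLAG); }

/* Stand-in for a callee like gather_stats() whose flag parameter is 32 bits wide. */
static unsigned int take_32bit_flag(unsigned int flag) { return flag; }
```
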
      Signed-off-by: Steve Capper <steve.capper@linaro.org>
      [steve.capper@linaro.org: rebased patch to leave pte_write alone to
      allow for merge with 3.13 stable]
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • bio-integrity: Fix bio_integrity_verify segment start bug · c6f538a8
      Nicholas Bellinger authored
      commit 5837c80e upstream.
      
      This patch addresses a bug in bio_integrity_verify() code that has
      been causing DIF READ verify operations to be silently skipped.
      
      The issue is that bio->bi_idx will have been incremented within
      bio_advance() code in the normal blk_update_request() ->
      req_bio_endio() completion path, and bio_integrity_verify() is
      using bio_for_each_segment() which starts the bio segment walk
      at the current bio->bi_idx.
      
      So instead use bio_for_each_segment_all() to always start the bio
      segment walk from zero, regardless of the current bio->bi_idx
      value after bio_advance() has been called.
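
      A simplified model of the two walks, assuming a bio reduced to an array of segment lengths with a cursor playing the role of bio->bi_idx; struct mini_bio and its helpers are hypothetical, not the block layer's API. Once completion has advanced the cursor past every segment, a walk that starts at the cursor verifies nothing, while a walk from index 0 covers all the data.

```c
#include <assert.h>

#define NR_SEGS 4

/* Hypothetical, simplified bio: segment lengths plus a cursor. */
struct mini_bio {
	int seg_len[NR_SEGS];
	int nr_segs;
	int idx;   /* plays the role of bio->bi_idx */
};

/* Completion path advances the cursor past everything transferred. */
static void mini_bio_advance_all(struct mini_bio *b) { b->idx = b->nr_segs; }

/* Like bio_for_each_segment(): walks from the current cursor. */
static int verify_from_cursor(const struct mini_bio *b)
{
	int checked = 0;

	for (int i = b->idx; i < b->nr_segs; i++)
		checked += b->seg_len[i];
	return checked;
}

/* Like bio_for_each_segment_all(): always walks from segment 0. */
static int verify_all(const struct mini_bio *b)
{
	int checked = 0;

	for (int i = 0; i < b->nr_segs; i++)
		checked += b->seg_len[i];
	return checked;
}
```
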
      
      (Context change for v3.10.y -> v3.13.y code - nab)
      
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • MIPS: include linux/types.h · fb3fbcea
      Qais Yousef authored
      commit 87c99203 upstream.
      
      The file uses the u16 type but doesn't include its definition explicitly.
      
      I was getting this error when including this header in my driver:
      
        arch/mips/include/asm/mipsregs.h:644:33: error: unknown type name ‘u16’
      Signed-off-by: Qais Yousef <qais.yousef@imgtec.com>
      Reviewed-by: Steven J. Hill <Steven.Hill@imgtec.com>
      Acked-by: David Daney <david.daney@cavium.com>
      Signed-off-by: John Crispin <blogic@openwrt.org>
      Patchwork: http://patchwork.linux-mips.org/patch/6212/
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Fix mountpoint reference leakage in linkat · 0cdd9e51
      Oleg Drokin authored
      commit d22e6338 upstream.
      
      Recent changes to retry on ESTALE in linkat
      (commit 442e31ca)
      introduced a mountpoint reference leak and a small memory
      leak in case a filesystem link operation returns ESTALE,
      which is pretty normal for distributed filesystems like
      Lustre, NFS and so on.
      Free old_path in such a case.
      
      [AV: there was another missing path_put() nearby - on the previous
      goto retry]
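
      A user-space model of the retry-on-ESTALE leak, assuming a counter-based reference and a link operation that fails with ESTALE once; struct ref_path, get_path(), put_path(), fs_link() and do_link() are hypothetical stand-ins for struct path, the path lookup and path_put(). Without the put before goto retry, each retry leaves one extra reference behind.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical reference-counted path. */
struct ref_path {
	int refs;
};

static void get_path(struct ref_path *p) { p->refs++; }
static void put_path(struct ref_path *p) { p->refs--; }

/* Simulated filesystem op: -ESTALE on the first attempt only. */
static int fs_link(int attempt) { return attempt == 0 ? -ESTALE : 0; }

static int do_link(struct ref_path *old_path, bool fixed)
{
	int attempt = 0;
	int err;
retry:
	get_path(old_path);           /* each lookup pass takes a reference */
	err = fs_link(attempt);
	if (err == -ESTALE && attempt == 0) {
		attempt++;
		if (fixed)
			put_path(old_path);   /* drop this pass's reference first */
		goto retry;
	}
	put_path(old_path);
	return err;
}
```
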
      Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • regulator: core: Change dummy supplies error message to a warning · d40945b8
      Shuah Khan authored
      commit acc3d5ce upstream.
      
      Change the "dummy supplies not allowed" error message to a warning, as this
      is just a warning message with no change to the behavior.
      
      [Added a CC to stable since some other bug fixes cause this to come up
      more frequently on PCs which is how it was noticed -- broonie]
      Signed-off-by: Shuah Khan <shuah.kh@samsung.com>
      Signed-off-by: Mark Brown <broonie@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ALSA: oxygen: modify adjust_dg_dac_routing function · e309995b
      Roman Volkov authored
      commit 1f91ecc1 upstream.
      
      When selecting the audio output destination (headphones,
      FP headphones, multichannel output), the channel routing
      should be changed depending on which destination is selected.
      Also, unnecessary I2S channels are digitally muted. This
      function is called when the user selects the destination
      in the ALSA mixer.
      Signed-off-by: Roman Volkov <v1ron@mail.ru>
      Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • intel_pstate: Add support for Baytrail turbo P states · ef3d4124
      Dirk Brandewie authored
      commit 61d8d2ab upstream.
      
      A documentation update exposed the existence of the turbo ratio
      register. Update Baytrail support to use the turbo range.
      Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • intel_pstate: Add setting voltage value for baytrail P states. · e08b9ad3
      Dirk Brandewie authored
      commit 007bea09 upstream.
      
      Baytrail requires setting P state and voltage pairs when adjusting the
      requested P state. Add a function for retrieving the valid voltage
      values and modify the *_set_pstate() functions to calculate the
      appropriate voltage for the requested P state.
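
      The text above does not give the formula. Assuming the voltage ID (VID) is interpolated linearly between the lowest and highest valid values across the P-state range and then clamped, the calculation could be sketched as follows; struct vid_range, vid_for_pstate() and the sample numbers are hypothetical, and the real driver may use fixed-point arithmetic instead.

```c
#include <assert.h>

/* Hypothetical valid ranges for voltage IDs and P states. */
struct vid_range {
	int vid_min, vid_max;
	int pstate_min, pstate_max;
};

/* Linearly interpolate a VID for the requested P state, then clamp it. */
static int vid_for_pstate(const struct vid_range *r, int pstate)
{
	int span = r->pstate_max - r->pstate_min;
	int vid = r->vid_min;

	if (span > 0)
		vid += (r->vid_max - r->vid_min) * (pstate - r->pstate_min) / span;
	if (vid < r->vid_min)
		vid = r->vid_min;
	if (vid > r->vid_max)
		vid = r->vid_max;
	return vid;
}
```

      With VIDs 20..60 over P states 8..16, P state 12 maps to 40, and requests outside the range are clamped to 20 or 60.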
      Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • audit: don't generate loginuid log when audit disabled · 80261089
      Gao feng authored
      commit c2412d91 upstream.
      
      If audit is disabled, we shouldn't generate loginuid audit
      log.
      Acked-by: Eric Paris <eparis@redhat.com>
      Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix data corruption when reading/updating compressed extents · cc7d8e30
      Filipe David Borba Manana authored
      commit a2aa75e1 upstream.
      
      When using a mix of compressed file extents and prealloc extents, it
      is possible to fill a page of a file with random, garbage data from
      some unrelated previous use of the page, instead of a sequence of zeroes.
      
      A simple sequence of steps to get into such case, taken from the test
      case I made for xfstests, is:
      
         _scratch_mkfs
         _scratch_mount "-o compress-force=lzo"
         $XFS_IO_PROG -f -c "pwrite -S 0x06 -b 18670 266978 18670" $SCRATCH_MNT/foobar
         $XFS_IO_PROG -c "falloc 26450 665194" $SCRATCH_MNT/foobar
         $XFS_IO_PROG -c "truncate 542872" $SCRATCH_MNT/foobar
         $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar
      
      This results in the following file items in the fs tree:
      
         item 4 key (257 INODE_ITEM 0) itemoff 15879 itemsize 160
             inode generation 6 transid 6 size 542872 block group 0 mode 100600
         item 5 key (257 INODE_REF 256) itemoff 15863 itemsize 16
             inode ref index 2 namelen 6 name: foobar
         item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
             extent data disk byte 0 nr 0 gen 6
             extent data offset 0 nr 24576 ram 266240
             extent compression 0
         item 7 key (257 EXTENT_DATA 24576) itemoff 15757 itemsize 53
             prealloc data disk byte 12849152 nr 241664 gen 6
             prealloc data offset 0 nr 241664
         item 8 key (257 EXTENT_DATA 266240) itemoff 15704 itemsize 53
             extent data disk byte 12845056 nr 4096 gen 6
             extent data offset 0 nr 20480 ram 20480
             extent compression 2
         item 9 key (257 EXTENT_DATA 286720) itemoff 15651 itemsize 53
             prealloc data disk byte 13090816 nr 405504 gen 6
             prealloc data offset 0 nr 258048
      
      The on-disk extent at offset 266240 (which corresponds to 1 single disk block)
      contains 5 compressed chunks of file data. Each of the first 4 compresses 4096
      bytes of file data, while the last one compresses only 3024 bytes of file data.
      Therefore a read of the file region [285648 ; 286720[ (length = 4096 - 3024 =
      1072 bytes) should always return zeroes (our next extent is a prealloc one).
      
      The solution here is for the compression code path to zero the remaining (untouched)
      bytes of the last page it uncompressed data into, as the information about how
      much space the file data consumes in the last page is not known in the upper layer
      fs/btrfs/extent_io.c:__do_readpage(). In __do_readpage() we were correctly zeroing
      the remainder of the page, but only if it corresponds to the last page of the inode
      and if the inode's size is not a multiple of the page size.
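
      The offsets above work out as 266240 + 4 * 4096 + 3024 = 285648, leaving 4096 - 3024 = 1072 bytes at the end of that page with no decompressed data behind them. A minimal sketch of zeroing that tail after the copy, assuming a 4096-byte page and a hypothetical fill_last_page() helper (this is not the btrfs function):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096   /* page size assumed for illustration */

/* Copy the decompressed bytes for the last page, then zero the untouched
 * tail so stale contents from a previous use of the page are never exposed. */
static void fill_last_page(unsigned char *page, const unsigned char *data, size_t len)
{
	assert(len <= PAGE_SZ);
	memcpy(page, data, len);
	memset(page + len, 0, PAGE_SZ - len);
}
```
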
      
      This would cause not only returning random data on reads, but also permanently
      storing random data when updating parts of the region that should be zeroed.
      For the example above, it means updating a single byte in the region [285648 ; 286720[
      would store that byte correctly but also store random data on disk.
      
      A test case for xfstests follows soon.
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix tree mod logging · fc2345d5
      Filipe David Borba Manana authored
      commit 5de865ee upstream.
      
      While running the test btrfs/004 from xfstests in a loop, it failed
      about 1 time out of 20 runs on my desktop. The failure happened in
      the backref walking part of the test, and the test's error message was
      like this:
      
      #  btrfs/004 93s ... [failed, exit status 1] - output mismatch (see /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad)
      #      --- tests/btrfs/004.out	2013-11-26 18:25:29.263333714 +0000
      #      +++ /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad	2013-12-10 15:25:10.327518516 +0000
      #      @@ -1,3 +1,8 @@
      #       QA output created by 004
      #       *** test backref walking
      #      -*** done
      #      +unexpected output from
      #      +	/home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal logical-resolve -P 141512704 /home/fdmanana/btrfs-tests/scratch_1
      #      +expected inum: 405, expected address: 454656, file: /home/fdmanana/btrfs-tests/scratch_1/snap1/p0/d6/d3d/d156/fce, got:
      #      +
             ...
             (Run 'diff -u tests/btrfs/004.out /home/fdmanana/git/hub/xfstests_2/results//btrfs/004.out.bad' to see the entire diff)
        Ran: btrfs/004
        Failures: btrfs/004
        Failed 1 of 1 tests
      
      But immediately after the test finished, the btrfs inspect-internal command
      returned the expected output:
      
        $ btrfs inspect-internal logical-resolve -P 141512704 /home/fdmanana/btrfs-tests/scratch_1
        inode 405 offset 454656 root 258
        inode 405 offset 454656 root 5
      
      It turned out this was because the btrfs_search_old_slot() calls performed
      during backref walking (backref.c:__resolve_indirect_ref) were not finding
      anything. The reason for this turned out to be that the tree mod logging
      code was not logging some node multi-step operations atomically, therefore
      btrfs_search_old_slot() callers iterated often over an incomplete tree that
      wasn't fully consistent with any tree state from the past. Besides missing
      items, this often (but not always) resulted in -EIO errors during old slot
      searches, reported in dmesg like this:
      
      [ 4299.933936] ------------[ cut here ]------------
      [ 4299.933949] WARNING: CPU: 0 PID: 23190 at fs/btrfs/ctree.c:1343 btrfs_search_old_slot+0x57b/0xab0 [btrfs]()
      [ 4299.933950] Modules linked in: btrfs raid6_pq xor pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bnep rfcomm bluetooth parport_pc ppdev binfmt_misc joydev snd_hda_codec_h
      [ 4299.933977] CPU: 0 PID: 23190 Comm: btrfs Tainted: G        W  O 3.12.0-fdm-btrfs-next-16+ #70
      [ 4299.933978] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Pro4, BIOS P1.50 09/04/2012
      [ 4299.933979]  000000000000053f ffff8806f3fd98f8 ffffffff8176d284 0000000000000007
      [ 4299.933982]  0000000000000000 ffff8806f3fd9938 ffffffff8104a81c ffff880659c64b70
      [ 4299.933984]  ffff880659c643d0 ffff8806599233d8 ffff880701e2e938 0000160000000000
      [ 4299.933987] Call Trace:
      [ 4299.933991]  [<ffffffff8176d284>] dump_stack+0x55/0x76
      [ 4299.933994]  [<ffffffff8104a81c>] warn_slowpath_common+0x8c/0xc0
      [ 4299.933997]  [<ffffffff8104a86a>] warn_slowpath_null+0x1a/0x20
      [ 4299.934003]  [<ffffffffa065d3bb>] btrfs_search_old_slot+0x57b/0xab0 [btrfs]
      [ 4299.934005]  [<ffffffff81775f3b>] ? _raw_read_unlock+0x2b/0x50
      [ 4299.934010]  [<ffffffffa0655001>] ? __tree_mod_log_search+0x81/0xc0 [btrfs]
      [ 4299.934019]  [<ffffffffa06dd9b0>] __resolve_indirect_refs+0x130/0x5f0 [btrfs]
      [ 4299.934027]  [<ffffffffa06a21f1>] ? free_extent_buffer+0x61/0xc0 [btrfs]
      [ 4299.934034]  [<ffffffffa06de39c>] find_parent_nodes+0x1fc/0xe40 [btrfs]
      [ 4299.934042]  [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0 [btrfs]
      [ 4299.934048]  [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0 [btrfs]
      [ 4299.934056]  [<ffffffffa06df980>] iterate_extent_inodes+0xe0/0x250 [btrfs]
      [ 4299.934058]  [<ffffffff817762db>] ? _raw_spin_unlock+0x2b/0x50
      [ 4299.934065]  [<ffffffffa06dfb82>] iterate_inodes_from_logical+0x92/0xb0 [btrfs]
      [ 4299.934071]  [<ffffffffa06b13e0>] ? defrag_lookup_extent+0xe0/0xe0 [btrfs]
      [ 4299.934078]  [<ffffffffa06b7015>] btrfs_ioctl+0xf65/0x1f60 [btrfs]
      [ 4299.934080]  [<ffffffff811658b8>] ? handle_mm_fault+0x278/0xb00
      [ 4299.934083]  [<ffffffff81075563>] ? up_read+0x23/0x40
      [ 4299.934085]  [<ffffffff8177a41c>] ? __do_page_fault+0x20c/0x5a0
      [ 4299.934088]  [<ffffffff811b2946>] do_vfs_ioctl+0x96/0x570
      [ 4299.934090]  [<ffffffff81776e23>] ? error_sti+0x5/0x6
      [ 4299.934093]  [<ffffffff810b71e8>] ? trace_hardirqs_off_caller+0x28/0xd0
      [ 4299.934096]  [<ffffffff81776a09>] ? retint_swapgs+0xe/0x13
      [ 4299.934098]  [<ffffffff811b2eb1>] SyS_ioctl+0x91/0xb0
      [ 4299.934100]  [<ffffffff813eecde>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [ 4299.934102]  [<ffffffff8177ef12>] system_call_fastpath+0x16/0x1b
      [ 4299.934104] ---[ end trace 48f0cfc902491414 ]---
      [ 4299.934378] btrfs bad fsid on block 0
      
      These tree mod log operations that must be performed atomically, tree_mod_log_free_eb,
      tree_mod_log_eb_copy, tree_mod_log_insert_root and tree_mod_log_insert_move, used to
      be performed atomically before the following commit:
      
        c8cc6341
        (Btrfs: stop using GFP_ATOMIC for the tree mod log allocations)
      
      That change removed the atomicity of these operations. This patch restores it
      while still avoiding GFP_ATOMIC allocations of tree_mod_elem structures; to do
      so, it allocates them with GFP_NOFS before acquiring the mod log lock.
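      
      A minimal userspace sketch of the allocate-then-lock pattern described above,
      assuming a pthread mutex as a stand-in for the tree mod log lock and malloc() as a
      stand-in for a GFP_NOFS allocation. The types and log_insert_batch() are
      hypothetical, not the btrfs code: every element is allocated before the lock is
      taken, so the critical section cannot block or fail.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct tree_mod_elem. */
struct mod_elem {
	int slot;
	struct mod_elem *next;
};

/* Hypothetical stand-in for the tree mod log and its lock. */
struct mod_log {
	pthread_mutex_t lock;
	struct mod_elem *head;
	int count;
};

/*
 * Log n modifications as one atomic unit. All elements are allocated
 * before the lock is taken (the step that may block or fail), so the
 * critical section only links them in. Returns 0 on success, or -1 if
 * an allocation fails, in which case the log is left untouched.
 */
static int log_insert_batch(struct mod_log *ml, const int *slots, int n)
{
	struct mod_elem **elems;
	int i;

	assert(n >= 0);
	elems = calloc(n > 0 ? n : 1, sizeof(*elems));
	if (!elems)
		return -1;

	/* Phase 1: allocate outside the lock. */
	for (i = 0; i < n; i++) {
		elems[i] = malloc(sizeof(*elems[i]));
		if (!elems[i]) {
			while (i--)
				free(elems[i]);
			free(elems);
			return -1;
		}
		elems[i]->slot = slots[i];
	}

	/* Phase 2: publish every element under a single lock hold. */
	pthread_mutex_lock(&ml->lock);
	for (i = 0; i < n; i++) {
		elems[i]->next = ml->head;
		ml->head = elems[i];
		ml->count++;
	}
	pthread_mutex_unlock(&ml->lock);

	free(elems);
	return 0;
}
```

      Because any allocation failure is detected before the lock is taken, the log
      gains either all n elements or none of them.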
      
      Several users have recently hit this issue, for example:
      
        http://www.spinics.net/lists/linux-btrfs/msg28574.html
      
      After running the btrfs/004 test for 679 consecutive iterations with this
      patch applied, I no longer ran into the issue.
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc2345d5
    • Filipe David Borba Manana's avatar
      Btrfs: return immediately if tree log mod is not necessary · 6ad309b5
      Filipe David Borba Manana authored
      commit 78357766 upstream.
      
      In ctree.c:tree_mod_log_set_node_key() we were calling
      __tree_mod_log_insert_key() even when the modification doesn't need
      to be logged. This would allocate a tree_mod_elem structure, fill it
      and pass it to __tree_mod_log_insert(), which would just acquire
      the tree mod log write lock, free the tree_mod_elem structure
      and return (that is, a no-op).
      
      Therefore call tree_mod_log_insert() instead of __tree_mod_log_insert();
      it returns immediately if the modification doesn't need to be logged,
      without allocating the structure, filling it, acquiring the write lock
      and freeing the structure.
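      
      A small sketch of why deciding first avoids the wasted work, using hypothetical
      helpers (insert_unchecked(), insert_checked()) and counters in place of the real
      tree_mod_elem allocation and write lock. It models only the cost difference, not
      the btrfs data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Counters that make the cost of each path visible. */
static int nr_allocs;
static int nr_locks;
static int nr_logged;

/* Hypothetical inner helper: allocates and locks before anything else. */
static void insert_unchecked(bool must_log)
{
	int *elem = malloc(sizeof(*elem));	/* stands in for tree_mod_elem */

	assert(elem != NULL);
	nr_allocs++;
	nr_locks++;				/* stands in for the write lock */
	if (must_log)
		nr_logged++;
	free(elem);				/* sketch only: nothing is stored */
}

/* Hypothetical wrapper: decides first, so the no-op case costs nothing. */
static void insert_checked(bool must_log)
{
	if (!must_log)
		return;
	insert_unchecked(must_log);
}
```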
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6ad309b5
    • Suresh Siddha's avatar
      x86, fpu: Check tsk_used_math() in kernel_fpu_end() for eager FPU · 2b7abdff
      Suresh Siddha authored
      commit 731bd6a9 upstream.
      
      For non-eager fpu mode, a thread's fpu state is allocated on its first
      fpu use (in the context of the device-not-available exception). This
      allocation (in math_state_restore()) can block, so we enable interrupts
      (which were disabled when the exception happened), allocate memory and
      then disable interrupts again.
      
      Eager-fpu mode, however, calls the same math_state_restore() from
      kernel_fpu_end(), on the assumption that tsk_used_math() is always set
      in eager-fpu mode, so the path that enables interrupts, allocates the
      fpu state with a blocking call and disables interrupts again is never
      taken.
      
      But Maarten Baert, Nate Eldredge and a few others noticed the
      following issue:
      
      If a user process dumps core on an ecrypt fs while aesni-intel is loaded,
      we get a BUG() in __find_get_block() complaining that it was called with
      interrupts disabled; then all further accesses to our ecrypt fs hang
      and we have to reboot.
      
      The aesni-intel code (encrypting the core file that we are writing) needs
      the FPU and quite properly wraps its code in kernel_fpu_{begin,end}(),
      the latter of which calls math_state_restore(). So after kernel_fpu_end(),
      interrupts may be disabled, which nobody seems to expect, and they stay
      that way until we eventually get to __find_get_block() which barfs.
      
      For eager fpu, tsk_used_math() is true most of the time. In a few
      cases, such as during thread exit or signal return handling,
      tsk_used_math() might be false.
      
      In kernel_fpu_end(), for eager-fpu, call math_state_restore()
      only if tsk_used_math() is set. Otherwise, don't bother: the kernel
      code path that cleared tsk_used_math() knows what needs to be done
      with the fpu state.
      Reported-by: Maarten Baert <maarten-baert@hotmail.com>
      Reported-by: Nate Eldredge <nate@thatsmathematics.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Suresh Siddha <sbsiddha@gmail.com>
      Link: http://lkml.kernel.org/r/1391410583.3801.6.camel@europa
      Cc: George Spelvin <linux@horizon.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b7abdff
    • Ales Novak's avatar
      SCSI: storvsc: NULL pointer dereference fix · 3d61f12b
      Ales Novak authored
      commit b12bb60d upstream.
      
      If the initialization of storvsc fails, storvsc_device_destroy()
      causes a NULL pointer dereference:
      
      storvsc_bus_scan()
        scsi_scan_target()
          __scsi_scan_target()
            scsi_probe_and_add_lun(hostdata=NULL)
              scsi_alloc_sdev(hostdata=NULL)
      
      	  sdev->hostdata = hostdata
      
      	  now the host allocation fails
      
                __scsi_remove_device(sdev)
      
      	  calls sdev->host->hostt->slave_destroy() ==
      	  storvsc_device_destroy(sdev)
      	    access of sdev->hostdata->request_mempool
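      
      A sketch of a destroy callback that tolerates a device whose hostdata was never
      set, which is the failure path traced above. The types and device_destroy() are
      hypothetical stand-ins, not the storvsc code; the point is only that the destroy
      path checks for NULL before dereferencing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-device data; request_pool models request_mempool. */
struct dev_data {
	void *request_pool;
};

/* Hypothetical SCSI device: hostdata stays NULL if setup failed early. */
struct scsi_dev_model {
	struct dev_data *hostdata;
};

/* Destroy callback that is safe to call on a half-initialized device. */
static void device_destroy(struct scsi_dev_model *sdev)
{
	struct dev_data *data;

	assert(sdev != NULL);
	data = sdev->hostdata;
	if (!data)
		return;		/* nothing was allocated, nothing to free */
	free(data->request_pool);
	free(data);
	sdev->hostdata = NULL;
}
```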
      Signed-off-by: Ales Novak <alnovak@suse.cz>
      Signed-off-by: Thomas Abraham <tabraham@suse.com>
      Reviewed-by: Jiri Kosina <jkosina@suse.cz>
      Acked-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3d61f12b
    • Chad Dupuis's avatar
      SCSI: qla2xxx: Fix multiqueue MSI-X registration. · f2071913
      Chad Dupuis authored
      commit f324777e upstream.
      
      This fixes the request of the MSI-X vectors for the base response queue.
      The for loop in qla24xx_enable_msix() iterated incorrectly: it should
      iterate over only the first two MSI-X vectors, not over the total
      number of MSI-X vectors that pci_enable_msix() granted to the driver
      for this device in this function.
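      
      A sketch of the corrected loop bound, assuming (hypothetically) that each pass
      stands in for one request_irq() call. Only the first two vectors are registered,
      however many were granted; the names are illustrative, not taken from qla2xxx.

```c
#include <assert.h>

/* Vectors reserved for the base queues: only the first two are requested. */
#define BASE_VECTORS 2

/* Hypothetical: count how many vectors the setup loop would register. */
static int base_vectors_to_register(int granted)
{
	int i, registered = 0;

	assert(granted >= 0);
	/* Bounded by BASE_VECTORS, not by the number of vectors granted. */
	for (i = 0; i < BASE_VECTORS && i < granted; i++)
		registered++;	/* stands in for request_irq() on vector i */
	return registered;
}
```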
      Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
      Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f2071913
    • Giridhar Malavali's avatar
    • Lukasz Dorau's avatar
      SCSI: isci: correct erroneous for_each_isci_host macro · b7394ca8
      Lukasz Dorau authored
      commit c59053a2 upstream.
      
      First, the 'for' loop in the macro 'for_each_isci_host'
      (drivers/scsi/isci/host.h:314) is incorrect, because it accesses
      the 3rd element of a 2-element array. After the 2nd iteration it executes
      the instruction:
              ihost = to_pci_info(pdev)->hosts[2]
      (while the 'hosts' array has only 2 elements) and reads an
      out-of-range element.
      
      Second, this loop is optimized incorrectly by GCC v4.8
      (see http://marc.info/?l=linux-kernel&m=138998871911336&w=2).
      As a result, on platforms with two SCU controllers,
      the loop is executed more times than it should be (for i=0, 1 and 2).
      This causes a kernel panic when entering the S3 state
      and the following oops after 'rmmod isci':
      
      BUG: unable to handle kernel NULL pointer dereference at (null)
      IP: [<ffffffff8131360b>] __list_add+0x1b/0xc0
      Oops: 0000 [#1] SMP
      RIP: 0010:[<ffffffff8131360b>]  [<ffffffff8131360b>] __list_add+0x1b/0xc0
      Call Trace:
        [<ffffffff81661b84>] __mutex_lock_slowpath+0x114/0x1b0
        [<ffffffff81661c3f>] mutex_lock+0x1f/0x30
        [<ffffffffa03e97cb>] sas_disable_events+0x1b/0x50 [libsas]
        [<ffffffffa03e9818>] sas_unregister_ha+0x18/0x60 [libsas]
        [<ffffffffa040316e>] isci_unregister+0x1e/0x40 [isci]
        [<ffffffffa0403efd>] isci_pci_remove+0x5d/0x100 [isci]
        [<ffffffff813391cb>] pci_device_remove+0x3b/0xb0
        [<ffffffff813fbf7f>] __device_release_driver+0x7f/0xf0
        [<ffffffff813fc8f8>] driver_detach+0xa8/0xb0
        [<ffffffff813fbb8b>] bus_remove_driver+0x9b/0x120
        [<ffffffff813fcf2c>] driver_unregister+0x2c/0x50
        [<ffffffff813381f3>] pci_unregister_driver+0x23/0x80
        [<ffffffffa04152f8>] isci_exit+0x10/0x1e [isci]
        [<ffffffff810d199b>] SyS_delete_module+0x16b/0x2d0
        [<ffffffff81012a21>] ? do_notify_resume+0x61/0xa0
        [<ffffffff8166ce29>] system_call_fastpath+0x16/0x1b
      
      The loop has been corrected.
      This patch fixes the kernel panic when entering the S3 state
      and the oops above.
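      
      A self-contained sketch of a loop macro that checks the index before loading
      hosts[id], so it never reads past a 2-element array. The struct names and
      for_each_host() are hypothetical simplifications of for_each_isci_host, and this
      is not necessarily the exact upstream fix; the comment shows the out-of-bounds
      shape described in the first paragraph.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CONTROLLERS 2	/* the 'hosts' array has two entries */

struct host { int id; };
struct pci_info { struct host *hosts[MAX_CONTROLLERS]; };

/*
 * Out-of-bounds shape: the step expression loads hosts[++id] before the
 * condition re-checks the bound, so the last pass reads hosts[2]:
 *
 *   for (id = 0, h = info->hosts[id];
 *        id < MAX_CONTROLLERS && h;
 *        h = info->hosts[++id])
 *
 * Safe shape: test the index first, and only then load hosts[id].
 */
#define for_each_host(id, h, info) \
	for ((id) = 0; (id) < MAX_CONTROLLERS && ((h) = (info)->hosts[(id)]); (id)++)

static int count_hosts(struct pci_info *info)
{
	int id, n = 0;
	struct host *h;

	assert(info != NULL);
	for_each_host(id, h, info) {
		assert(h != NULL);	/* the macro never yields a NULL host */
		n++;
	}
	return n;
}
```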
      Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
      Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
      Tested-by: Lukasz Dorau <lukasz.dorau@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b7394ca8
    • Dan Williams's avatar
      SCSI: isci: fix reset timeout handling · 25f6f743
      Dan Williams authored
      commit ddfadd77 upstream.
      
      Remove an erroneous BUG_ON() in the case of a hard reset timeout.  The
      reset timeout handler puts the port into the "awaiting link-up" state.
      The timeout causes the device to be disconnected, and we need to be in
      the awaiting link-up state to reconnect the port.  The BUG_ON() made
      the incorrect assumption that resets never time out and that we always
      complete the reset in the "resetting" state.
      
      Testing this patch also uncovered that libata continues to attempt to
      reset the port long after the driver has torn down the context.  Once
      the driver has committed to abandoning the link it must indicate to
      libata that recovery ends by returning -ENODEV from
      ->lldd_I_T_nexus_reset().
      Acked-by: Lukasz Dorau <lukasz.dorau@intel.com>
      Reported-by: David Milburn <dmilburn@redhat.com>
      Reported-by: Xun Ni <xun.ni@intel.com>
      Tested-by: Xun Ni <xun.ni@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      25f6f743
    • Marc Kleine-Budde's avatar
      can: flexcan: flexcan_remove(): add missing netif_napi_del() · a0141cd4
      Marc Kleine-Budde authored
      commit d96e43e8 upstream.
      
      This patch adds the missing netif_napi_del() to the flexcan_remove() function.
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a0141cd4
    • Marc Kleine-Budde's avatar
      can: flexcan: factor out transceiver {en,dis}able into seperate functions · 7ffe7913
      Marc Kleine-Budde authored
      commit f003698e upstream.
      
      This patch moves the transceiver enable and disable into separate functions,
      which hide the NULL pointer check.
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7ffe7913
    • Marc Kleine-Budde's avatar
      can: flexcan: fix transition from and to low power mode in chip_{en,dis}able · 6754f4b6
      Marc Kleine-Budde authored
      commit 9b00b300 upstream.
      
      flexcan_chip_enable() and flexcan_chip_disable() use fixed delays.
      Experiments have shown that the transition from and to low power mode may take
      several microseconds.
      
      This patch adds a while loop that polls the Low Power Mode ACK bit (LPM_ACK),
      which indicates a successful mode change. If the function times out, an
      error value is returned.
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6754f4b6
    • Marc Kleine-Budde's avatar
      can: flexcan: flexcan_open(): fix error path if flexcan_chip_start() fails · 2f264c30
      Marc Kleine-Budde authored
      commit 7e9e148a upstream.
      
      If flexcan_chip_start() in flexcan_open() fails, the interrupt is not freed.
      This patch adds the missing cleanup.
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2f264c30
    • Marc Kleine-Budde's avatar
      can: flexcan: fix shutdown: first disable chip, then all interrupts · 76c32c08
      Marc Kleine-Budde authored
      commit 5be93bdd upstream.
      
      When shutting down the CAN interface (ifconfig canX down) under high CAN bus
      load, the CAN core might hang and freeze the whole CPU.
      
      This patch fixes the shutdown sequence by first disabling the CAN core and then
      disabling all interrupts.
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      76c32c08
    • Anton Blanchard's avatar
      net: unix socket code abuses csum_partial · 5332f853
      Anton Blanchard authored
      commit 0a13404d upstream.
      
      The unix socket code is using the result of csum_partial to
      hash into a lookup table:
      
      	unix_hash_fold(csum_partial(sunaddr, len, 0));
      
      csum_partial is only guaranteed to produce something that can be
      folded into a checksum, as the comment on its prototype explains:
      
       * returns a 32-bit number suitable for feeding into itself
       * or csum_tcpudp_magic
      
      The 32bit value should not be used directly.
      
      Depending on the alignment, the ppc64 csum_partial will return
      different 32bit partial checksums that will fold into the same
      16bit checksum.
      
      This difference causes the following testcase (courtesy of
      Gustavo) to sometimes fail:
      
      #include <sys/socket.h>
      #include <stdio.h>
      
      int main()
      {
      	int fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);
      
      	int i = 1;
      	setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &i, 4);
      
      	struct sockaddr addr;
      	addr.sa_family = AF_LOCAL;
      	bind(fd, &addr, 2);
      
      	listen(fd, 128);
      
      	struct sockaddr_storage ss;
      	socklen_t sslen = (socklen_t)sizeof(ss);
      	getsockname(fd, (struct sockaddr*)&ss, &sslen);
      
      	fd = socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC, 0);
      
      	if (connect(fd, (struct sockaddr*)&ss, sslen) == -1){
      		perror(NULL);
      		return 1;
      	}
      	printf("OK\n");
      	return 0;
      }
      
      As suggested by davem, fix this by using csum_fold to fold the
      partial 32bit checksum into a 16bit checksum before using it.
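      
      A userspace sketch of the problem and the fix. fold16() performs the standard
      ones'-complement fold that csum_fold() does; hash_raw() and hash_folded() are
      hypothetical helpers modelled on the shape of unix_hash_fold() before and after
      the change, with an assumed table size of 256. The partial sums 0x00030003 and
      0x00000006 fold to the same 16-bit checksum, yet land in different buckets
      unless they are folded first.

```c
#include <assert.h>
#include <stdint.h>

#define HASH_SIZE 256	/* assumed table size, for illustration only */

/* Fold a 32-bit ones'-complement partial sum to 16 bits and invert it. */
static uint16_t fold16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Hashing the raw partial sum: depends on how the sum was accumulated. */
static unsigned int hash_raw(uint32_t partial)
{
	unsigned int hash = partial;

	assert((HASH_SIZE & (HASH_SIZE - 1)) == 0);	/* mask needs 2^n */
	hash ^= hash >> 16;
	hash ^= hash >> 8;
	return hash & (HASH_SIZE - 1);
}

/* Hashing the folded checksum: equal checksums give equal buckets. */
static unsigned int hash_folded(uint32_t partial)
{
	unsigned int hash = fold16(partial);

	assert((HASH_SIZE & (HASH_SIZE - 1)) == 0);	/* mask needs 2^n */
	hash ^= hash >> 8;
	return hash & (HASH_SIZE - 1);
}
```

      Here hash_raw() puts the two sums in buckets 0x00 and 0x06, while hash_folded()
      puts both in bucket 0x06.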
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5332f853
    • Heinz Mauelshagen's avatar
      dm cache: fix access beyond end of origin device · 8b553836
      Heinz Mauelshagen authored
      commit e893fba9 upstream.
      
      In order to avoid wasting cache space a partial block at the end of the
      origin device is not cached.  Unfortunately, the check for such a
      partial block at the end of the origin device was flawed.
      
      Fix accesses beyond the end of the origin device, which occurred due to
      attempted promotion of an undetected partial block, by:
      
      - initializing the per bio data struct to allow cache_end_io to work properly
      - recognizing access to the partial block at the end of the origin device
      - avoiding out of bounds access to the discard bitset
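      
      A sketch of the partial-block test, assuming sector-granular sizes and
      hypothetical helpers (not the dm-cache code). For instance, with an assumed
      128-sector block and the 20971456-sector origin from the log below, only 163839
      whole blocks fit, and the last, partial block would end at sector 20971520.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if block b would extend past the end of the origin device. */
static bool is_partial_block(uint64_t b, uint64_t block_sectors,
			     uint64_t origin_sectors)
{
	assert(block_sectors > 0);
	return (b + 1) * block_sectors > origin_sectors;
}

/* Number of whole blocks, i.e. the blocks that may be promoted. */
static uint64_t cacheable_blocks(uint64_t block_sectors,
				 uint64_t origin_sectors)
{
	assert(block_sectors > 0);
	return origin_sectors / block_sectors;
}
```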
      
      Otherwise, users can experience errors like the following:
      
       attempt to access beyond end of device
       dm-5: rw=0, want=20971520, limit=20971456
       ...
       device-mapper: cache: promotion failed; couldn't copy block
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b553836
    • Heinz Mauelshagen's avatar
      dm cache: fix truncation bug when copying a block to/from >2TB fast device · 29344f5a
      Heinz Mauelshagen authored
      commit 8b9d9666 upstream.
      
      During demotion or promotion to a cache's >2TB fast device we must not
      truncate the cache block's associated sector to 32 bits.  The 32bit
      temporary result of from_cblock() caused a 32bit multiplication when
      calculating the sector of the fast device in issue_copy_real().
      
      Use an intermediate 64bit type to store the 32bit from_cblock() to allow
      for proper 64bit multiplication.
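      
      A sketch of the truncation, using hypothetical helpers and plain uint32_t
      operands in place of the dm-cache types. With an assumed 2048-sector block,
      cache block 4194304 (2^22) starts at sector 2^33, which is 4 TiB with 512-byte
      sectors; the 32-bit product wraps to 0, while widening to 64 bits first gives
      the correct sector.

```c
#include <assert.h>
#include <stdint.h>

/* Truncating shape: both operands are 32-bit, so the product wraps
 * modulo 2^32 before it is widened for the return value. */
static uint64_t sector_trunc(uint32_t cblock, uint32_t sectors_per_block)
{
	assert(sectors_per_block > 0);
	return cblock * sectors_per_block;
}

/* Fixed shape: store the block number in a 64-bit intermediate first,
 * so the multiplication itself is done in 64 bits. */
static uint64_t sector_wide(uint32_t cblock, uint32_t sectors_per_block)
{
	uint64_t b = cblock;

	assert(sectors_per_block > 0);
	return b * sectors_per_block;
}
```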
      
      Here is an example of how this bug manifests on an ext4 filesystem:
      
       EXT4-fs error (device dm-0): ext4_mb_generate_buddy:756: group 17136, 32768 clusters in bitmap, 30688 in gd; block bitmap corrupt.
       JBD2: Spotted dirty metadata buffer (dev = dm-0, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      29344f5a
    • Joe Thornber's avatar
      dm space map metadata: fix refcount decrement below 0 which caused corruption · 3c47ea5b
      Joe Thornber authored
      commit cebc2de4 upstream.
      
      This has been a relatively long-standing issue that wasn't nailed down
      until Teng-Feng Yang's meticulous bug report to dm-devel on 3/7/2014,
      see: http://www.redhat.com/archives/dm-devel/2014-March/msg00021.html
      
      From that report:
        "When decreasing the reference count of a metadata block with its
        reference count equals 3, we will call dm_btree_remove() to remove
        this entry from the B+tree which keeps the reference count info in
        metadata device.
      
        The B+tree will try to rebalance the entry of the child nodes in each
        node it traversed, and the rebalance process contains the following
        steps.
      
        (1) Finding the corresponding children in current node (shadow_current(s))
        (2) Shadow the children block (issue BOP_INC)
        (3) redistribute keys among children, and free children if necessary (issue BOP_DEC)
      
        Since the update of a metadata block's reference count could be
        recursive, we will stash these reference count update operations in
        smm->uncommitted and then process them in a FILO fashion.
      
        The problem is that step(3) could free the children which is created
        in step(2), so the BOP_DEC issued in step(3) will be carried out
        before the BOP_INC issued in step(2) since these BOPs will be
        processed in FILO fashion. Once the BOP_DEC from step(3) tries to
        decrease the reference count of newly shadow block, it will report
        failure for its reference equals 0 before decreasing. It looks like we
        can solve this issue by processing these BOPs in a FIFO fashion
        instead of FILO."
      
      Commit 5b564d80 ("dm space map: disallow decrementing a reference count
      below zero") changed the code to report an error for this temporary
      refcount decrement below zero.  So what was previously a harmless
      invalid refcount became a hard failure due to the new error path:
      
       device-mapper: space map common: unable to decrement a reference count below 0
       device-mapper: thin: 253:6: dm_thin_insert_block() failed: error = -22
       device-mapper: thin: 253:6: switching pool to read-only mode
      
      This bug is in dm persistent-data code that is common to the DM thin and
      cache targets.  So any users of those targets should apply this fix.
      
      Fix this by applying recursive space map operations in FIFO order rather
      than FILO.
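      
      A toy model of the ordering problem, assuming a newly shadowed block starts at
      refcount 0 and that the step (2) BOP_INC is queued before the step (3) BOP_DEC.
      apply_ops() is hypothetical, not the space map code: replaying the queue
      newest-first (FILO) decrements below zero, while oldest-first (FIFO) does not.

```c
#include <assert.h>

enum bop { BOP_INC, BOP_DEC };

/*
 * Apply n queued refcount ops to a count that starts at 'start'.
 * fifo != 0 replays oldest-first; fifo == 0 replays newest-first.
 * Returns -1 if any op would take the count below zero, else the
 * final count.
 */
static int apply_ops(const enum bop *ops, int n, int start, int fifo)
{
	int i, count = start;

	assert(n >= 0 && start >= 0);
	for (i = 0; i < n; i++) {
		enum bop op = fifo ? ops[i] : ops[n - 1 - i];

		if (op == BOP_INC) {
			count++;
		} else {
			if (count == 0)
				return -1;	/* decrement below 0 */
			count--;
		}
	}
	return count;
}
```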
      
      Resolves: https://bugzilla.kernel.org/show_bug.cgi?id=68801
      Reported-by: Apollon Oikonomopoulos <apoikos@debian.org>
      Reported-by: edwillam1007@gmail.com
      Reported-by: Teng-Feng Yang <shinrairis@gmail.com>
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3c47ea5b
    • Heinz Mauelshagen's avatar
      dm cache mq: fix memory allocation failure for large cache devices · a1bd0d7b
      Heinz Mauelshagen authored
      commit 14f398ca upstream.
      
      The memory allocated for the multiqueue policy's hash table doesn't need
      to be physically contiguous.  Use vzalloc() instead of kzalloc().
      Fedora has been carrying this fix since 10/10/2013.
      
      Failure seen during creation of a 10TB cached device with a 2048 sector
      block size and 411GB cache size:
      
       dmsetup: page allocation failure: order:9, mode:0x10c0d0
       CPU: 11 PID: 29235 Comm: dmsetup Not tainted 3.10.4 #3
       Hardware name: Supermicro X8DTL/X8DTL, BIOS 2.1a       12/30/2011
        000000000010c0d0 ffff880090941898 ffffffff81387ab4 ffff880090941928
        ffffffff810bb26f 0000000000000009 000000000010c0d0 ffff880090941928
        ffffffff81385dbc ffffffff815f3840 ffffffff00000000 000002000010c0d0
       Call Trace:
        [<ffffffff81387ab4>] dump_stack+0x19/0x1b
        [<ffffffff810bb26f>] warn_alloc_failed+0x110/0x124
        [<ffffffff81385dbc>] ? __alloc_pages_direct_compact+0x17c/0x18e
        [<ffffffff810bda2e>] __alloc_pages_nodemask+0x6c7/0x75e
        [<ffffffff810bdad7>] __get_free_pages+0x12/0x3f
        [<ffffffff810ea148>] kmalloc_order_trace+0x29/0x88
        [<ffffffff810ec1fd>] __kmalloc+0x36/0x11b
        [<ffffffffa031eeed>] ? mq_create+0x1dc/0x2cf [dm_cache_mq]
        [<ffffffffa031efc0>] mq_create+0x2af/0x2cf [dm_cache_mq]
        [<ffffffffa0314605>] dm_cache_policy_create+0xa7/0xd2 [dm_cache]
        [<ffffffffa0312530>] ? cache_ctr+0x245/0xa13 [dm_cache]
        [<ffffffffa031263e>] cache_ctr+0x353/0xa13 [dm_cache]
        [<ffffffffa012b916>] dm_table_add_target+0x227/0x2ce [dm_mod]
        [<ffffffffa012e8e4>] table_load+0x286/0x2ac [dm_mod]
        [<ffffffffa012e65e>] ? dev_wait+0x8a/0x8a [dm_mod]
        [<ffffffffa012e324>] ctl_ioctl+0x39a/0x3c2 [dm_mod]
        [<ffffffffa012e35a>] dm_ctl_ioctl+0xe/0x12 [dm_mod]
        [<ffffffff81101181>] vfs_ioctl+0x21/0x34
        [<ffffffff811019d3>] do_vfs_ioctl+0x3b1/0x3f4
        [<ffffffff810f4d2e>] ? ____fput+0x9/0xb
        [<ffffffff81050b6c>] ? task_work_run+0x7e/0x92
        [<ffffffff81101a68>] SyS_ioctl+0x52/0x82
        [<ffffffff81391d92>] system_call_fastpath+0x16/0x1b
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a1bd0d7b
    • Laura Abbott's avatar
      mm/compaction: break out of loop on !PageBuddy in isolate_freepages_block · 088e8c54
      Laura Abbott authored
      commit 2af120bc upstream.
      
      We received several reports of bad page state when freeing CMA pages
      previously allocated with alloc_contig_range:
      
          BUG: Bad page state in process Binder_A  pfn:63202
          page:d21130b0 count:0 mapcount:1 mapping:  (null) index:0x7dfbf
          page flags: 0x40080068(uptodate|lru|active|swapbacked)
      
      Based on the page state, it looks like the page was still in use.  The
      page flags do not make sense for the use case though.  Further debugging
      showed that despite alloc_contig_range returning success, at least one
      page in the range still remained in the buddy allocator.
      
      There is an issue with isolate_freepages_block.  In strict mode (which
      CMA uses), if any pages in the range cannot be isolated,
      isolate_freepages_block should return 0 to signal failure.  The current check
      keeps track of the total number of isolated pages and compares against
      the size of the range:
      
              if (strict && nr_strict_required > total_isolated)
                      total_isolated = 0;
      
      After taking the zone lock, if one of the pages in the range is not in
      the buddy allocator, we continue through the loop and do not increment
      total_isolated.  If in the last iteration of the loop we isolate more
      than one page (e.g. the last page needed is a higher order page), the check
      for total_isolated may pass and we fail to detect that a page was
      skipped.  The fix is to bail out of the loop immediately if we are in
      strict mode.  There's no benefit to continuing anyway, since we need all
      pages to be isolated.  Additionally, drop the error checking based on
      nr_strict_required and just check the pfn ranges.  This matches
      what isolate_freepages_range does.
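      
      A model of the two checks, using a hypothetical array where block[pfn] is the
      number of pages in the free buddy block starting at pfn (0 if that page is not
      free). These functions are illustrations, not mm/compaction.c. For a 4-page
      range where page 0 is in use and a 2-page block starting at pfn 3 runs past the
      end, the count-based check still sees 4 isolated pages, while the fixed version
      bails out and returns 0.

```c
#include <assert.h>

/* Old shape: skip non-free pages, then compare a page count. */
static int isolate_count_check(const int *block, int start, int end)
{
	int pfn = start, isolated = 0;

	while (pfn < end) {
		if (block[pfn] == 0) {
			pfn++;		/* skipped, but the scan goes on */
			continue;
		}
		isolated += block[pfn];	/* may run past 'end' */
		pfn += block[pfn];
	}
	return isolated >= end - start ? isolated : 0;
}

/* Fixed shape: stop at the first non-free page, then check the pfn. */
static int isolate_strict(const int *block, int start, int end)
{
	int pfn = start, isolated = 0;

	assert(start <= end);
	while (pfn < end) {
		if (block[pfn] == 0)
			break;
		isolated += block[pfn];
		pfn += block[pfn];
	}
	return pfn < end ? 0 : isolated;
}
```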
      Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Acked-by: Michal Nazarewicz <mina86@mina86.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      088e8c54
    • Arnd Bergmann's avatar
      vmxnet3: fix building without CONFIG_PCI_MSI · ba11e06e
      Arnd Bergmann authored
      commit 0a8d8c44 upstream.
      
      Since commit d25f06ea "vmxnet3: fix netpoll race condition",
      the vmxnet3 driver fails to build when CONFIG_PCI_MSI is disabled,
      because it unconditionally references the vmxnet3_msix_rx()
      function.
      
      To fix this, use the same #ifdef in the caller that exists around
      the function definition.
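      
      A compile-time sketch of the fix, with hypothetical handler names: the caller's
      reference sits inside the same #ifdef CONFIG_PCI_MSI as the function definition.
      As written, CONFIG_PCI_MSI is left undefined, which models a CONFIG_PCI_MSI=n
      build; defining it builds the MSI-X path instead.

```c
#include <assert.h>

/* Leave undefined to model CONFIG_PCI_MSI=n. */
/* #define CONFIG_PCI_MSI 1 */

#ifdef CONFIG_PCI_MSI
/* Only compiled when MSI support is configured. */
static int rx_handler_msix(int budget)
{
	return budget;
}
#endif

static int rx_handler_legacy(int budget)
{
	return budget / 2;
}

/* The reference is guarded by the same #ifdef as the definition. */
static int poll_rx(int budget, int use_msix)
{
	assert(budget >= 0);
#ifdef CONFIG_PCI_MSI
	if (use_msix)
		return rx_handler_msix(budget);
#else
	(void)use_msix;
#endif
	return rx_handler_legacy(budget);
}
```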
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Shreyas Bhatewara <sbhatewara@vmware.com>
      Cc: "VMware, Inc." <pv-drivers@vmware.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba11e06e