Commits · bb7728b8d454d67bb4743b79aac02db1aeb020b1 · Kirill Smelkov / linux

06 Aug, 2015 6 commits

hpfs: hpfs_error: Remove static buffer, use vsprintf extension %pV instead · bb7728b8

Joe Perches authored Mar 26, 2015

commit a28e4b2b upstream.

Removing unnecessary static buffers is good.
Use the vsprintf %pV extension instead.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mikulas Patocka <mikulas@twibright.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

bb7728b8

hpfs: kstrdup() out of memory handling · 2fab688e

Sanidhya Kashyap authored Mar 21, 2015

commit ce657611 upstream.

There is a possibility of nothing being allocated to the new_opts in
case of memory pressure, therefore return ENOMEM for such case.
Signed-off-by: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Signed-off-by: Mikulas Patocka <mikulas@twibright.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

2fab688e

selinux: don't waste ebitmap space when importing NetLabel categories · d80ad5a2

Paul Moore authored Jul 09, 2015

commit 33246035 upstream.

At present we don't create efficient ebitmaps when importing NetLabel
category bitmaps.  This can present a problem when comparing ebitmaps
since ebitmap_cmp() is very strict about these things and considers
these wasteful ebitmaps not equal when compared to their more
efficient counterparts, even if their values are the same.  This isn't
likely to cause problems on 64-bit systems due to a bit of luck on
how NetLabel/CIPSO works and the default ebitmap size, but it can be
a problem on 32-bit systems.

This patch fixes this problem by being a bit more intelligent when
importing NetLabel category bitmaps by skipping over empty sections
which should result in a nice, efficient ebitmap.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

d80ad5a2

mm: avoid setting up anonymous pages into file mapping · b2ec585b

Kirill A. Shutemov authored Jul 06, 2015

commit 6b7339f4 upstream.

Reading page fault handler code I've noticed that under right
circumstances kernel would map anonymous pages into file mappings: if
the VMA doesn't have vm_ops->fault() and the VMA wasn't fully populated
on ->mmap(), kernel would handle page fault to not populated pte with
do_anonymous_page().

Let's change page fault handler to use do_anonymous_page() only on
anonymous VMA (->vm_ops == NULL) and make sure that the VMA is not
shared.

For file mappings without vm_ops->fault() or shred VMA without vm_ops,
page fault on pte_none() entry would lead to SIGBUS.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ kamal: backport to 3.19-stable: s/do_fault/do_linear_fault/ ]
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

b2ec585b

drm/radeon: unpin cursor BOs on suspend and pin them again on resume (v2) · e198d1d8

Grigori Goronzy authored Jul 07, 2015

commit f3cbb17b upstream.

Everything is evicted from VRAM before suspend, so we need to make
sure all BOs are unpinned and re-pinned after resume. Fixes broken
mouse cursor after resume introduced by commit b9729b17.

[Michel Dänzer: Add pinning BOs on resume]

v2:
[Alex Deucher: merge cursor unpin into fb unpin loop]

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=100541
Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
Signed-off-by: Grigori Goronzy <greg@chown.ath.cx>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

e198d1d8

drm/radeon: Clean up reference counting and pinning of the cursor BOs · 997f42e2

Michel Dänzer authored Jul 07, 2015

commit cd404af0 upstream.

Take a GEM reference for and pin the new cursor BO, unpin and drop the
GEM reference for the old cursor BO in radeon_crtc_cursor_set2, and use
radeon_crtc->cursor_addr in radeon_set_cursor.

This fixes radeon_cursor_reset accidentally incrementing the cursor BO
pin count, and cleans up the code a little.
Reviewed-by: Grigori Goronzy <greg@chown.ath.cx>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

997f42e2

05 Aug, 2015 14 commits

drm/radeon: fix HDP flushing · 8b72decc

Grigori Goronzy authored Jul 03, 2015

commit 54e03986 upstream.

This was regressed by commit 39e7f6f8, although I don't know of any
actual issues caused by it.

The storage domain is read without TTM locking now, but the lock
never helped to prevent any races.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Grigori Goronzy <greg@chown.ath.cx>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

8b72decc

cxl: Fix off by one error allowing subsequent mmap page to be accessed · 8c980662

Ian Munsie authored Jul 07, 2015

commit 10a5894f upstream.

It was discovered that if a process mmaped their problem state area they
were able to access one page more than expected, potentially allowing
them to access the problem state area of an unrelated process.

This was due to a simple off by one error in the mmap fault handler
introduced in 0712dc7e ("cxl: Fix issues
when unmapping contexts"), which is fixed in this patch.

Fixes: 0712dc7e ("cxl: Fix issues when unmapping contexts")
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

8c980662

powerpc/powernv: Fix race in updating core_idle_state · 55fbd98c

Shreyas B. Prabhu authored Jul 07, 2015

commit b32aadc1 upstream.

core_idle_state is maintained for each core. It uses 0-7 bits to track
whether a thread in the core has entered fastsleep or winkle. 8th bit is
used as a lock bit.
The lock bit is set in these 2 scenarios-
 - The thread is first in subcore to wakeup from sleep/winkle.
 - If its the last thread in the core about to enter sleep/winkle

While the lock bit is set, if any other thread in the core wakes up, it
loops until the lock bit is cleared before proceeding in the wakeup
path. This helps prevent race conditions w.r.t fastsleep workaround and
prevents threads from switching to process context before core/subcore
resources are restored.

But, in the path to sleep/winkle entry, we currently don't check for
lock-bit. This exposes us to following race when running with subcore
on-

First thread in the subcorea		Another thread in the same
waking up		   		core entering sleep/winkle

lwarx   r15,0,r14
ori     r15,r15,PNV_CORE_IDLE_LOCK_BIT
stwcx.  r15,0,r14
[Code to restore subcore state]

						lwarx   r15,0,r14
						[clear thread bit]
						stwcx.  r15,0,r14

andi.   r15,r15,PNV_CORE_IDLE_THREAD_BITS
stw     r15,0(r14)

Here, after the thread entering sleep clears its thread bit in
core_idle_state, the value is overwritten by the thread waking up.
In such cases when the core enters fastsleep, code mistakes an idle
thread as running. Because of this, the first thread waking up from
fastsleep which is supposed to resync timebase skips it. So we can
end up having a core with stale timebase value.

This patch fixes the above race by looping on the lock bit even while
entering the idle states.
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Fixes: 7b54e9f213f76 'powernv/powerpc: Add winkle support for offline cpus'
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

55fbd98c

ACPI / PNP: Reserve ACPI resources at the fs_initcall_sync stage · 39893e3b

Rafael J. Wysocki authored Jul 04, 2015

commit 0294112e upstream.

This effectively reverts the following three commits:

 7bc10388 ACPI / resources: free memory on error in add_region_before()
 0f1b414d ACPI / PNP: Avoid conflicting resource reservations
 b9a5e5e1 ACPI / init: Fix the ordering of acpi_reserve_resources()

(commit b9a5e5e1 introduced regressions some of which, but not
all, were addressed by commit 0f1b414d and commit 7bc10388
was a fixup on top of the latter) and causes ACPI fixed hardware
resources to be reserved at the fs_initcall_sync stage of system
initialization.

The story is as follows.  First, a boot regression was reported due
to an apparent resource reservation ordering change after a commit
that shouldn't lead to such changes.  Investigation led to the
conclusion that the problem happened because acpi_reserve_resources()
was executed at the device_initcall() stage of system initialization
which wasn't strictly ordered with respect to driver initialization
(and with respect to the initialization of the pcieport driver in
particular), so a random change causing the device initcalls to be
run in a different order might break things.

The response to that was to attempt to run acpi_reserve_resources()
as soon as we knew that ACPI would be in use (commit b9a5e5e1).
However, that turned out to be too early, because it caused resource
reservations made by the PNP system driver to fail on at least one
system and that failure was addressed by commit 0f1b414d.

That fix still turned out to be insufficient, though, because
calling acpi_reserve_resources() before the fs_initcall stage of
system initialization caused a boot regression to happen on the
eCAFE EC-800-H20G/S netbook.  That meant that we only could call
acpi_reserve_resources() at the fs_initcall initialization stage
or later, but then we might just as well call it after the PNP
initalization in which case commit 0f1b414d wouldn't be
necessary any more.

For this reason, the changes made by commit 0f1b414d are reverted
(along with a memory leak fixup on top of that commit), the changes
made by commit b9a5e5e1 that went too far are reverted too and
acpi_reserve_resources() is changed into fs_initcall_sync, which
will cause it to be executed after the PNP subsystem initialization
(which is an fs_initcall) and before device initcalls (including
the pcieport driver initialization) which should avoid the initial
issue.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=100581
Link: http://marc.info/?t=143092384600002&r=1&w=2
Link: https://bugzilla.kernel.org/show_bug.cgi?id=99831
Link: http://marc.info/?t=143389402600001&r=1&w=2
Fixes: b9a5e5e1 "ACPI / init: Fix the ordering of acpi_reserve_resources()"
Reported-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

39893e3b

ARM: dts: am57xx-beagle-x15: Provide supply for usb2_phy2 · 934ab6eb

Roger Quadros authored Jun 17, 2015

commit 9ab402ae upstream.

Without this USB2 breaks if USB1 is disabled or USB1
initializes after USB2 e.g. due to deferred probing.

Fixes: 5a0f93c6 ("ARM: dts: Add am57xx-beagle-x15")
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

934ab6eb

ext4: replace open coded nofail allocation in ext4_free_blocks() · ef505188

Michal Hocko authored Jul 05, 2015

commit 7444a072 upstream.

ext4_free_blocks is looping around the allocation request and mimics
__GFP_NOFAIL behavior without any allocation fallback strategy. Let's
remove the open coded loop and replace it with __GFP_NOFAIL. Without the
flag the allocator has no way to find out never-fail requirement and
cannot help in any way.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

ef505188

ext4: correctly migrate a file with a hole at the beginning · 47eb4b36

Eryu Guan authored Jul 04, 2015

commit 8974fec7 upstream.

Currently ext4_ind_migrate() doesn't correctly handle a file which
contains a hole at the beginning of the file.  This caused the migration
to be done incorrectly, and then if there is a subsequent following
delayed allocation write to the "hole", this would reclaim the same data
blocks again and results in fs corruption.

  # assmuing 4k block size ext4, with delalloc enabled
  # skip the first block and write to the second block
  xfs_io -fc "pwrite 4k 4k" -c "fsync" /mnt/ext4/testfile

  # converting to indirect-mapped file, which would move the data blocks
  # to the beginning of the file, but extent status cache still marks
  # that region as a hole
  chattr -e /mnt/ext4/testfile

  # delayed allocation writes to the "hole", reclaim the same data block
  # again, results in i_blocks corruption
  xfs_io -c "pwrite 0 4k" /mnt/ext4/testfile
  umount /mnt/ext4
  e2fsck -nf /dev/sda6
  ...
  Inode 53, i_blocks is 16, should be 8.  Fix? no
  ...
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

47eb4b36

ext4: be more strict when migrating to non-extent based file · ca842bc9

Eryu Guan authored Jul 03, 2015

commit d6f123a9 upstream.

Currently the check in ext4_ind_migrate() is not enough before doing the
real conversion:

a) delayed allocated extents could bypass the check on eh->eh_entries
   and eh->eh_depth

This can be demonstrated by this script

  xfs_io -fc "pwrite 0 4k" -c "pwrite 8k 4k" /mnt/ext4/testfile
  chattr -e /mnt/ext4/testfile

where testfile has two extents but still be converted to non-extent
based file format.

b) only extent length is checked but not the offset, which would result
   in data lose (delalloc) or fs corruption (nodelalloc), because
   non-extent based file only supports at most (12 + 2^10 + 2^20 + 2^30)
   blocks

This can be demostrated by

  xfs_io -fc "pwrite 5T 4k" /mnt/ext4/testfile
  chattr -e /mnt/ext4/testfile
  sync

If delalloc is enabled, dmesg prints
  EXT4-fs warning (device dm-4): ext4_block_to_path:105: block 1342177280 > max in inode 53
  EXT4-fs (dm-4): Delayed block allocation failed for inode 53 at logical offset 1342177280 with max blocks 1 with error 5
  EXT4-fs (dm-4): This should not happen!! Data will be lost

If delalloc is disabled, e2fsck -nf shows corruption
  Inode 53, i_size is 5497558142976, should be 4096.  Fix? no

Fix the two issues by

a) forcing all delayed allocation blocks to be allocated before checking
   eh->eh_depth and eh->eh_entries
b) limiting the last logical block of the extent is within direct map
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

ca842bc9

ext4: fix reservation release on invalidatepage for delalloc fs · bbc80299

Lukas Czerner authored Jul 03, 2015

commit 9705acd6 upstream.

On delalloc enabled file system on invalidatepage operation
in ext4_da_page_release_reservation() we want to clear the delayed
buffer and remove the extent covering the delayed buffer from the extent
status tree.

However currently there is a bug where on the systems with page size >
block size we will always remove extents from the start of the page
regardless where the actual delayed buffers are positioned in the page.
This leads to the errors like this:

EXT4-fs warning (device loop0): ext4_da_release_space:1225:
ext4_da_release_space: ino 13, to_free 1 with only 0 reserved data
blocks

This however can cause data loss on writeback time if the file system is
in ENOSPC condition because we're releasing reservation for someones
else delayed buffer.

Fix this by only removing extents that corresponds to the part of the
page we want to invalidate.

This problem is reproducible by the following fio receipt (however I was
only able to reproduce it with fio-2.1 or older.

[global]
bs=8k
iodepth=1024
iodepth_batch=60
randrepeat=1
size=1m
directory=/mnt/test
numjobs=20
[job1]
ioengine=sync
bs=1k
direct=1
rw=randread
filename=file1:file2
[job2]
ioengine=libaio
rw=randwrite
direct=1
filename=file1:file2
[job3]
bs=1k
ioengine=posixaio
rw=randwrite
direct=1
filename=file1:file2
[job5]
bs=1k
ioengine=sync
rw=randread
filename=file1:file2
[job7]
ioengine=libaio
rw=randwrite
filename=file1:file2
[job8]
ioengine=posixaio
rw=randwrite
filename=file1:file2
[job10]
ioengine=mmap
rw=randwrite
bs=1k
filename=file1:file2
[job11]
ioengine=mmap
rw=randwrite
direct=1
filename=file1:file2
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

bbc80299

ext4: avoid deadlocks in the writeback path by using sb_getblk_gfp · 1fc56e88

Nikolay Borisov authored Jul 02, 2015

commit c45653c3 upstream.

Switch ext4 to using sb_getblk_gfp with GFP_NOFS added to fix possible
deadlocks in the page writeback path.
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

1fc56e88

bufferhead: Add _gfp version for sb_getblk() · 52a0b6ce

Nikolay Borisov authored Jul 02, 2015

commit bd7ade3c upstream.

sb_getblk() is used during ext4 (and possibly other FSes) writeback
paths. Sometimes such path require allocating memory and guaranteeing
that such allocation won't block. Currently, however, there is no way
to provide user flags for sb_getblk which could lead to deadlocks.

This patch implements a sb_getblk_gfp with the only difference it can
accept user-provided GFP flags.
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

52a0b6ce

Btrfs: fix fsync data loss after append write · 895ee048

Filipe Manana authored Jun 17, 2015

commit e4545de5 upstream.

If we do an append write to a file (which increases its inode's i_size)
that does not have the flag BTRFS_INODE_NEEDS_FULL_SYNC set in its inode,
and the previous transaction added a new hard link to the file, which sets
the flag BTRFS_INODE_COPY_EVERYTHING in the file's inode, and then fsync
the file, the inode's new i_size isn't logged. This has the consequence
that after the fsync log is replayed, the file size remains what it was
before the append write operation, which means users/applications will
not be able to read the data that was successsfully fsync'ed before.

This happens because neither the inode item nor the delayed inode get
their i_size updated when the append write is made - doing so would
require starting a transaction in the buffered write path, something that
we do not do intentionally for performance reasons.

Fix this by making sure that when the flag BTRFS_INODE_COPY_EVERYTHING is
set the inode is logged with its current i_size (log the in-memory inode
into the log tree).

This issue is not a recent regression and is easy to reproduce with the
following test case for fstests:

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"

  here=`pwd`
  tmp=/tmp/$$
  status=1	# failure is the default!

  _cleanup()
  {
          _cleanup_flakey
          rm -f $tmp.*
  }
  trap "_cleanup; exit \$status" 0 1 2 3 15

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter
  . ./common/dmflakey

  # real QA test starts here
  _supported_fs generic
  _supported_os Linux
  _need_to_be_root
  _require_scratch
  _require_dm_flakey
  _require_metadata_journaling $SCRATCH_DEV

  _crash_and_mount()
  {
          # Simulate a crash/power loss.
          _load_flakey_table $FLAKEY_DROP_WRITES
          _unmount_flakey
          # Allow writes again and mount. This makes the fs replay its fsync log.
          _load_flakey_table $FLAKEY_ALLOW_WRITES
          _mount_flakey
  }

  rm -f $seqres.full

  _scratch_mkfs >> $seqres.full 2>&1
  _init_flakey
  _mount_flakey

  # Create the test file with some initial data and then fsync it.
  # The fsync here is only needed to trigger the issue in btrfs, as it causes the
  # the flag BTRFS_INODE_NEEDS_FULL_SYNC to be removed from the btrfs inode.
  $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 32k" \
                  -c "fsync" \
                  $SCRATCH_MNT/foo | _filter_xfs_io
  sync

  # Add a hard link to our file.
  # On btrfs this sets the flag BTRFS_INODE_COPY_EVERYTHING on the btrfs inode,
  # which is a necessary condition to trigger the issue.
  ln $SCRATCH_MNT/foo $SCRATCH_MNT/bar

  # Sync the filesystem to force a commit of the current btrfs transaction, this
  # is a necessary condition to trigger the bug on btrfs.
  sync

  # Now append more data to our file, increasing its size, and fsync the file.
  # In btrfs because the inode flag BTRFS_INODE_COPY_EVERYTHING was set and the
  # write path did not update the inode item in the btree nor the delayed inode
  # item (in memory struture) in the current transaction (created by the fsync
  # handler), the fsync did not record the inode's new i_size in the fsync
  # log/journal. This made the data unavailable after the fsync log/journal is
  # replayed.
  $XFS_IO_PROG -c "pwrite -S 0xbb 32K 32K" \
               -c "fsync" \
               $SCRATCH_MNT/foo | _filter_xfs_io

  echo "File content after fsync and before crash:"
  od -t x1 $SCRATCH_MNT/foo

  _crash_and_mount

  echo "File content after crash and log replay:"
  od -t x1 $SCRATCH_MNT/foo

  status=0
  exit

The expected file output before and after the crash/power failure expects the
appended data to be available, which is:

  0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
  *
  0100000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
  *
  0200000
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

895ee048

Btrfs: fix race between caching kthread and returning inode to inode cache · ff30e038

Filipe Manana authored Jun 13, 2015

commit ae9d8f17 upstream.

While the inode cache caching kthread is calling btrfs_unpin_free_ino(),
we could have a concurrent call to btrfs_return_ino() that adds a new
entry to the root's free space cache of pinned inodes. This concurrent
call does not acquire the fs_info->commit_root_sem before adding a new
entry if the caching state is BTRFS_CACHE_FINISHED, which is a problem
because the caching kthread calls btrfs_unpin_free_ino() after setting
the caching state to BTRFS_CACHE_FINISHED and therefore races with
the task calling btrfs_return_ino(), which is adding a new entry, while
the former (caching kthread) is navigating the cache's rbtree, removing
and freeing nodes from the cache's rbtree without acquiring the spinlock
that protects the rbtree.

This race resulted in memory corruption due to double free of struct
btrfs_free_space objects because both tasks can end up doing freeing the
same objects. Note that adding a new entry can result in merging it with
other entries in the cache, in which case those entries are freed.
This is particularly important as btrfs_free_space structures are also
used for the block group free space caches.

This memory corruption can be detected by a debugging kernel, which
reports it with the following trace:

[132408.501148] slab error in verify_redzone_free(): cache `btrfs_free_space': double free detected
[132408.505075] CPU: 15 PID: 12248 Comm: btrfs-ino-cache Tainted: G        W       4.1.0-rc5-btrfs-next-10+ #1
[132408.505075] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[132408.505075]  ffff880023e7d320 ffff880163d73cd8 ffffffff8145eec7 ffffffff81095dce
[132408.505075]  ffff880009735d40 ffff880163d73ce8 ffffffff81154e1e ffff880163d73d68
[132408.505075]  ffffffff81155733 ffffffffa054a95a ffff8801b6099f00 ffffffffa0505b5f
[132408.505075] Call Trace:
[132408.505075]  [<ffffffff8145eec7>] dump_stack+0x4f/0x7b
[132408.505075]  [<ffffffff81095dce>] ? console_unlock+0x356/0x3a2
[132408.505075]  [<ffffffff81154e1e>] __slab_error.isra.28+0x25/0x36
[132408.505075]  [<ffffffff81155733>] __cache_free+0xe2/0x4b6
[132408.505075]  [<ffffffffa054a95a>] ? __btrfs_add_free_space+0x2f0/0x343 [btrfs]
[132408.505075]  [<ffffffffa0505b5f>] ? btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
[132408.505075]  [<ffffffff810f3b30>] ? time_hardirqs_off+0x15/0x28
[132408.505075]  [<ffffffff81084d42>] ? trace_hardirqs_off+0xd/0xf
[132408.505075]  [<ffffffff811563a1>] ? kfree+0xb6/0x14e
[132408.505075]  [<ffffffff811563d0>] kfree+0xe5/0x14e
[132408.505075]  [<ffffffffa0505b5f>] btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
[132408.505075]  [<ffffffffa0505e08>] caching_kthread+0x29e/0x2d9 [btrfs]
[132408.505075]  [<ffffffffa0505b6a>] ? btrfs_unpin_free_ino+0x99/0x99 [btrfs]
[132408.505075]  [<ffffffff8106698f>] kthread+0xef/0xf7
[132408.505075]  [<ffffffff810f3b08>] ? time_hardirqs_on+0x15/0x28
[132408.505075]  [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
[132408.505075]  [<ffffffff814653d2>] ret_from_fork+0x42/0x70
[132408.505075]  [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
[132408.505075] ffff880023e7d320: redzone 1:0x9f911029d74e35b, redzone 2:0x9f911029d74e35b.
[132409.501654] slab: double free detected in cache 'btrfs_free_space', objp ffff880023e7d320
[132409.503355] ------------[ cut here ]------------
[132409.504241] kernel BUG at mm/slab.c:2571!

Therefore fix this by having btrfs_unpin_free_ino() acquire the lock
that protects the rbtree while doing the searches and removing entries.

Fixes: 1c70d8fb ("Btrfs: fix inode caching vs tree log")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

ff30e038

Btrfs: use kmem_cache_free when freeing entry in inode cache · 4899b628

Filipe Manana authored Jun 13, 2015

commit c3f4a168 upstream.

The free space entries are allocated using kmem_cache_zalloc(),
through __btrfs_add_free_space(), therefore we should use
kmem_cache_free() and not kfree() to avoid any confusion and
any potential problem. Looking at the kfree() definition at
mm/slab.c it has the following comment:

  /*
   * (...)
   *
   * Don't free memory not originally allocated by kmalloc()
   * or you will run into trouble.
   */

So better be safe and use kmem_cache_free().
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

4899b628

04 Aug, 2015 1 commit

sg_start_req(): make sure that there's not too many elements in iovec · b50ae828

Al Viro authored Aug 01, 2015

commit 451a2886 upstream.

unfortunately, allowing an arbitrary 16bit value means a possibility of
overflow in the calculation of total number of pages in bio_map_user_iov() -
we rely on there being no more than PAGE_SIZE members of sum in the
first loop there.  If that sum wraps around, we end up allocating
too small array of pointers to pages and it's easy to overflow it in
the second loop.

X-Coverup: TINC (and there's no lumber cartel either)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: s/MAX_UIOVEC/UIO_MAXIOV/. This was fixed upstream by commit
 fdc81f45 ("sg_start_req(): use import_iovec()"), but we don't have
 that function.]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Reference: CVE-2015-5707
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

b50ae828

30 Jul, 2015 9 commits

KEYS: ensure we free the assoc array edit if edit is valid · 21b0f383

Colin Ian King authored Jul 27, 2015

commit ca4da5dd upstream.

__key_link_end is not freeing the associated array edit structure
and this leads to a 512 byte memory leak each time an identical
existing key is added with add_key().

The reason the add_key() system call returns okay is that
key_create_or_update() calls __key_link_begin() before checking to see
whether it can update a key directly rather than adding/replacing - which
it turns out it can.  Thus __key_link() is not called through
__key_instantiate_and_link() and __key_link_end() must cancel the edit.

CVE-2015-1333
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

21b0f383

x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection · cff4ef3d

Andy Lutomirski authored Jul 10, 2015

commit 810bc075 upstream.

We have a tricky bug in the nested NMI code: if we see RSP pointing
to the NMI stack on NMI entry from kernel mode, we assume that we
are executing a nested NMI.

This isn't quite true.  A malicious userspace program can point RSP
at the NMI stack, issue SYSCALL, and arrange for an NMI to happen
while RSP is still pointing at the NMI stack.

Fix it with a sneaky trick.  Set DF in the region of code that the RSP
check is intended to detect.  IRET will clear DF atomically.

(Note: other than paravirt, there's little need for all this complexity.
 We could check RIP instead of RSP.)

Fixes CVE-2015-3291.
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
[bwh: Backported to 4.0: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
CVE-2015-3291
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

cff4ef3d

x86/nmi/64: Reorder nested NMI checks · 7eddd205

Andy Lutomirski authored Jul 12, 2015

commit a27507ca upstream.

Check the repeat_nmi .. end_repeat_nmi special case first.  The next
patch will rework the RSP check and, as a side effect, the RSP check
will no longer detect repeat_nmi .. end_repeat_nmi, so we'll need
this ordering of the checks.

Note: this is more subtle than it appears.  The check for repeat_nmi
.. end_repeat_nmi jumps straight out of the NMI code instead of
adjusting the "iret" frame to force a repeat.  This is necessary,
because the code between repeat_nmi and end_repeat_nmi sets "NMI
executing" and then writes to the "iret" frame itself.  If a nested
NMI comes in and modifies the "iret" frame while repeat_nmi is also
modifying it, we'll end up with garbage.  The old code got this
right, as does the new code, but the new code is a bit more
explicit.

If we were to move the check right after the "NMI executing" check,
then we'd get it wrong and have random crashes.

This is a prerequisite for the fix for CVE-2015-3291.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
[bwh: Backported to 4.0: adjust filename, spacing]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

7eddd205

x86/nmi/64: Improve nested NMI comments · 3be75fea

Andy Lutomirski authored Jul 10, 2015

commit 0b22930e upstream.

I found the nested NMI documentation to be difficult to follow.
Improve the comments.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
[bwh: Backported to 4.0: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

3be75fea

x86/nmi/64: Switch stacks on userspace NMI entry · d8fa09a2

Andy Lutomirski authored Jul 10, 2015

commit 9b6e6a83 upstream.

Returning to userspace is tricky: IRET can fail, and ESPFIX can
rearrange the stack prior to IRET.

The NMI nesting fixup relies on a precise stack layout and atomic
IRET.  Rather than trying to teach the NMI nesting fixup to handle
ESPFIX and failed IRET, punt: run NMIs that came from user mode on
the normal kernel stack.

This will make some nested NMIs visible to C code, but the C code is
okay with that.

As a side effect, this should speed up perf: it eliminates an RDMSR
when NMIs come from user mode.

Fixes CVE-2015-3290.
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
[bwh: Backported to 4.0:
 - Adjust filename, context
 - s/restore_c_regs_and_iret/restore_args/
 - Use kernel_stack + KERNEL_STACK_OFFSET instead of cpu_current_top_of_stack]
[luto: Open-coded return path to avoid dependency on partial pt_regs details]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
CVE-2015-3290, CVE-2015-5157
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

d8fa09a2

x86/nmi/64: Remove asm code that saves cr2 · bae96303

Andy Lutomirski authored Jul 10, 2015

commit 0e181bb5 upstream.

Now that do_nmi saves cr2, we don't need to save it in asm.

This is a prerequisity for the fix for CVE-2015-3290.
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
[bwh: Backported to 4.0: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

bae96303

x86/nmi: Enable nested do_nmi handling for 64-bit kernels · fe0d50e2

Andy Lutomirski authored Jul 10, 2015

commit 9d050416 upstream.

32-bit kernels handle nested NMIs in C.  Enable the exact same
handling on 64-bit kernels as well.  This isn't currently necessary,
but it will become necessary once the asm code starts allowing
limited nesting.

This is a prerequisite for the fix for CVE-2015-3290.
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

fe0d50e2

x86/asm/entry/64: Remove a redundant jump · f0dc9f75

Denys Vlasenko authored Apr 07, 2015

commit a30b0085 upstream.

Jumping to the very next instruction is not very useful:

        jmp label
    label:

Removing the jump.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428439424-7258-5-git-send-email-dvlasenk@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

f0dc9f75

x86/asm/entry/64: Fold the 'test_in_nmi' macro into its only user · 1c037821

Denys Vlasenko authored Apr 01, 2015

commit 0784b364 upstream.

No code changes.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427899858-7165-1-git-send-email-dvlasenk@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

1c037821

24 Jul, 2015 2 commits

evm: labeling pseudo filesystems exception · 00cc0819

Mimi Zohar authored Apr 21, 2015

commit 5101a185 upstream.

To prevent offline stripping of existing file xattrs and relabeling of
them at runtime, EVM allows only newly created files to be labeled.  As
pseudo filesystems are not persistent, stripping of xattrs is not a
concern.

Some LSMs defer file labeling on pseudo filesystems.  This patch
permits the labeling of existing files on pseudo files systems.
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

00cc0819

ieee802154: Fix sockaddr_ieee802154 implicit padding information leak. · 18ffb713

Lennert Buytenhek authored Jun 03, 2015

commit 8a70cefa upstream.

The AF_IEEE802154 sockaddr looks like this:

	struct sockaddr_ieee802154 {
		sa_family_t family; /* AF_IEEE802154 */
		struct ieee802154_addr_sa addr;
	};

	struct ieee802154_addr_sa {
		int addr_type;
		u16 pan_id;
		union {
			u8 hwaddr[IEEE802154_ADDR_LEN];
			u16 short_addr;
		};
	};

On most architectures there will be implicit structure padding here,
in two different places:

* In struct sockaddr_ieee802154, two bytes of padding between 'family'
  (unsigned short) and 'addr', so that 'addr' starts on a four byte
  boundary.

* In struct ieee802154_addr_sa, two bytes at the end of the structure,
  to make the structure 16 bytes.

When calling recvmsg(2) on a PF_IEEE802154 SOCK_DGRAM socket, the
ieee802154 stack constructs a struct sockaddr_ieee802154 on the
kernel stack without clearing these padding fields, and, depending
on the addr_type, between four and ten bytes of uncleared kernel
stack will be copied to userspace.

We can't just insert two 'u16 __pad's in the right places and zero
those before copying an address to userspace, as not all architectures
insert this implicit padding -- from a quick test it seems that avr32,
cris and m68k don't insert this padding, while every other architecture
that I have cross compilers for does insert this padding.

The easiest way to plug the leak is to just memset the whole struct
sockaddr_ieee802154 before filling in the fields we want to fill in,
and that's what this patch does.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
[ luis: backported to 3.16:
  - file rename: net/ieee802154/socket.c -> net/ieee802154/dgram.c ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

18ffb713

20 Jul, 2015 1 commit
- Linux 3.19.8-ckt4 · 138bca44
  Kamal Mostafa authored Jul 20, 2015
```
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
```
  138bca44
16 Jul, 2015 7 commits

Btrfs: lock superblock before remounting for rw subvol · 8b76631c

Omar Sandoval authored May 18, 2015

commit 773cd04e upstream.

Since commit 0723a047 ("btrfs: allow mounting btrfs subvolumes with
different ro/rw options"), when mounting a subvolume read/write when
another subvolume has previously been mounted read-only, we first do a
remount. However, this should be done with the superblock locked, as per
sync_filesystem():

	/*
	 * We need to be protected against the filesystem going from
	 * r/o to r/w or vice versa.
	 */
	WARN_ON(!rwsem_is_locked(&sb->s_umount));

This WARN_ON can easily be hit with:

mkfs.btrfs -f /dev/vdb
mount /dev/vdb /mnt
btrfs subvol create /mnt/vol1
btrfs subvol create /mnt/vol2
umount /mnt
mount -oro,subvol=/vol1 /dev/vdb /mnt
mount -orw,subvol=/vol2 /dev/vdb /mnt2

Fixes: 0723a047 ("btrfs: allow mounting btrfs subvolumes with different ro/rw options")
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

8b76631c

net/mlx4_core: Enhance the MAD_IFC wrapper to convert VF port to physical · 956b6a4f

Or Gerlitz authored May 21, 2015

commit 7c35ef45 upstream.

Single port VFs always provide port = 1 (even if the actual physical
port used is port 2). As such, we need to convert the port provided
by the VF to the physical port before calling into the firmware.

It turns out that the Linux mlx4 VF RoCE driver maintains a copy of
the GID table and hence this change became critical only for single
ported IB VFs, but it could be needed for other RoCE VF drivers too.

Fixes: 449fc488 ('net/mlx4: Adapt code for N-Port VF')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

956b6a4f

clk: Fix JSON output in debugfs · 374fbd9d

Stefan Wahren authored Apr 29, 2015

commit 7cb81136 upstream.

key/value pairs in a JSON object must be separated by a comma.
After adding the properties "accuracy" and "phase" the JSON output
of /sys/kernel/debug/clk/clk_dump is invalid.

So add the missing commas to fix it.

Fixes: 5279fc40 ("clk: add clk accuracy retrieval support")
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
[sboyd@codeaurora.org: Added comment in function]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
[ kamal: backport to 3.19-stable: context ]
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

374fbd9d

e1000e: Cleanup handling of VLAN_HLEN as a part of max frame size · 2ef56f35

Alexander Duyck authored May 02, 2015

commit 8084b86d upstream.

When the VLAN_HLEN was added to the calculation for the maximum frame size
there seems to have been a number of issues added to the driver.

The first issue is that in some cases the maximum frame size for a device
never really reached the actual maximum frame size as the VLAN header
length was not included the calculation for that value.  As a result some
parts only supported a maximum frame size of either 1496 in the case of
parts that didn't support jumbo frames, and 8996 in the case of the parts
that do.

The second issue is the fact that there were several checks that weren't
updated so as a result setting an MTU of 1500 was treated as enabling jumbo
frames as the calculated value was 1522 instead of 1518.  I have addressed
those by replacing ETH_FRAME_LEN with VLAN_ETH_FRAME_LEN where appropriate.

The final issue was the fact that lowering the MTU below 1500 would cause
the driver to allocate 2K buffers for the rings.  This is an old issue that
was fixed several years ago in igb/ixgbe and I am addressing now by just
replacing == with a <= so that we always just round up to 1522 for anything
that isn't a jumbo frame.

Fixes: c751a3d5 ("e1000e: Correctly include VLAN_HLEN when changing interface MTU")
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
[ luis: backported to 3.16:
  - dropped changes to struct e1000_pch_spt_info as i219 isn't supported ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

2ef56f35

USB: devio: fix a condition in async_completed() · 49cd7468

Dan Carpenter authored May 18, 2015

commit 83ed07c5 upstream.

Static checkers complain that the current condition is never true.  It
seems pretty likely that it's a typo and "URB" was intended instead of
"USB".

Fixes: 3d97ff63 ('usbdevfs: Use scatter-gather lists for large bulk transfers')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

49cd7468

mac80211: fix the beacon csa counter for mesh and ibss · 109cca0d

Chun-Yeow Yeoh authored Jun 09, 2015

commit 8df734e8 upstream.

The csa counter has moved from sdata to beacon/presp but
it is not updated accordingly for mesh and ibss. Fix this.

Fixes: af296bdb ("mac80211: move csa counters from sdata to beacon/presp")
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

109cca0d

mac80211: prevent possible crypto tx tailroom corruption · 6a4cbde2

Michal Kazior authored May 22, 2015

commit ab499db8 upstream.

There was a possible race between
ieee80211_reconfig() and
ieee80211_delayed_tailroom_dec(). This could
result in inability to transmit data if driver
crashed during roaming or rekeying and subsequent
skbs with insufficient tailroom appeared.

This race was probably never seen in the wild
because a device driver would have to crash AND
recover within 0.5s which is very unlikely.

I was able to prove this race exists after
changing the delay to 10s locally and crashing
ath10k via debugfs immediately after GTK
rekeying. In case of ath10k the counter went below
0. This was harmless but other drivers which
actually require tailroom (e.g. for WEP ICV or
MMIC) could end up with the counter at 0 instead
of >0 and introduce insufficient skb tailroom
failures because mac80211 would not resize skbs
appropriately anymore.

Fixes: 8d1f7ecd ("mac80211: defer tailroom counter manipulation when roaming")
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

6a4cbde2