Commits · 1fc56e8864626d2182b1b19deb42c256bf5c597d · Kirill Smelkov / linux

05 Aug, 2015 5 commits

ext4: avoid deadlocks in the writeback path by using sb_getblk_gfp · 1fc56e88

Nikolay Borisov authored Jul 02, 2015

commit c45653c3 upstream.

Switch ext4 to using sb_getblk_gfp with GFP_NOFS added to fix possible
deadlocks in the page writeback path.
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

1fc56e88

bufferhead: Add _gfp version for sb_getblk() · 52a0b6ce

Nikolay Borisov authored Jul 02, 2015

commit bd7ade3c upstream.

sb_getblk() is used during ext4 (and possibly other FSes) writeback
paths. Sometimes such path require allocating memory and guaranteeing
that such allocation won't block. Currently, however, there is no way
to provide user flags for sb_getblk which could lead to deadlocks.

This patch implements a sb_getblk_gfp with the only difference it can
accept user-provided GFP flags.
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

52a0b6ce

Btrfs: fix fsync data loss after append write · 895ee048

Filipe Manana authored Jun 17, 2015

commit e4545de5 upstream.

If we do an append write to a file (which increases its inode's i_size)
that does not have the flag BTRFS_INODE_NEEDS_FULL_SYNC set in its inode,
and the previous transaction added a new hard link to the file, which sets
the flag BTRFS_INODE_COPY_EVERYTHING in the file's inode, and then fsync
the file, the inode's new i_size isn't logged. This has the consequence
that after the fsync log is replayed, the file size remains what it was
before the append write operation, which means users/applications will
not be able to read the data that was successsfully fsync'ed before.

This happens because neither the inode item nor the delayed inode get
their i_size updated when the append write is made - doing so would
require starting a transaction in the buffered write path, something that
we do not do intentionally for performance reasons.

Fix this by making sure that when the flag BTRFS_INODE_COPY_EVERYTHING is
set the inode is logged with its current i_size (log the in-memory inode
into the log tree).

This issue is not a recent regression and is easy to reproduce with the
following test case for fstests:

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"

  here=`pwd`
  tmp=/tmp/$$
  status=1	# failure is the default!

  _cleanup()
  {
          _cleanup_flakey
          rm -f $tmp.*
  }
  trap "_cleanup; exit \$status" 0 1 2 3 15

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter
  . ./common/dmflakey

  # real QA test starts here
  _supported_fs generic
  _supported_os Linux
  _need_to_be_root
  _require_scratch
  _require_dm_flakey
  _require_metadata_journaling $SCRATCH_DEV

  _crash_and_mount()
  {
          # Simulate a crash/power loss.
          _load_flakey_table $FLAKEY_DROP_WRITES
          _unmount_flakey
          # Allow writes again and mount. This makes the fs replay its fsync log.
          _load_flakey_table $FLAKEY_ALLOW_WRITES
          _mount_flakey
  }

  rm -f $seqres.full

  _scratch_mkfs >> $seqres.full 2>&1
  _init_flakey
  _mount_flakey

  # Create the test file with some initial data and then fsync it.
  # The fsync here is only needed to trigger the issue in btrfs, as it causes the
  # the flag BTRFS_INODE_NEEDS_FULL_SYNC to be removed from the btrfs inode.
  $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 32k" \
                  -c "fsync" \
                  $SCRATCH_MNT/foo | _filter_xfs_io
  sync

  # Add a hard link to our file.
  # On btrfs this sets the flag BTRFS_INODE_COPY_EVERYTHING on the btrfs inode,
  # which is a necessary condition to trigger the issue.
  ln $SCRATCH_MNT/foo $SCRATCH_MNT/bar

  # Sync the filesystem to force a commit of the current btrfs transaction, this
  # is a necessary condition to trigger the bug on btrfs.
  sync

  # Now append more data to our file, increasing its size, and fsync the file.
  # In btrfs because the inode flag BTRFS_INODE_COPY_EVERYTHING was set and the
  # write path did not update the inode item in the btree nor the delayed inode
  # item (in memory struture) in the current transaction (created by the fsync
  # handler), the fsync did not record the inode's new i_size in the fsync
  # log/journal. This made the data unavailable after the fsync log/journal is
  # replayed.
  $XFS_IO_PROG -c "pwrite -S 0xbb 32K 32K" \
               -c "fsync" \
               $SCRATCH_MNT/foo | _filter_xfs_io

  echo "File content after fsync and before crash:"
  od -t x1 $SCRATCH_MNT/foo

  _crash_and_mount

  echo "File content after crash and log replay:"
  od -t x1 $SCRATCH_MNT/foo

  status=0
  exit

The expected file output before and after the crash/power failure expects the
appended data to be available, which is:

  0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
  *
  0100000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
  *
  0200000
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

895ee048

Btrfs: fix race between caching kthread and returning inode to inode cache · ff30e038

Filipe Manana authored Jun 13, 2015

commit ae9d8f17 upstream.

While the inode cache caching kthread is calling btrfs_unpin_free_ino(),
we could have a concurrent call to btrfs_return_ino() that adds a new
entry to the root's free space cache of pinned inodes. This concurrent
call does not acquire the fs_info->commit_root_sem before adding a new
entry if the caching state is BTRFS_CACHE_FINISHED, which is a problem
because the caching kthread calls btrfs_unpin_free_ino() after setting
the caching state to BTRFS_CACHE_FINISHED and therefore races with
the task calling btrfs_return_ino(), which is adding a new entry, while
the former (caching kthread) is navigating the cache's rbtree, removing
and freeing nodes from the cache's rbtree without acquiring the spinlock
that protects the rbtree.

This race resulted in memory corruption due to double free of struct
btrfs_free_space objects because both tasks can end up doing freeing the
same objects. Note that adding a new entry can result in merging it with
other entries in the cache, in which case those entries are freed.
This is particularly important as btrfs_free_space structures are also
used for the block group free space caches.

This memory corruption can be detected by a debugging kernel, which
reports it with the following trace:

[132408.501148] slab error in verify_redzone_free(): cache `btrfs_free_space': double free detected
[132408.505075] CPU: 15 PID: 12248 Comm: btrfs-ino-cache Tainted: G        W       4.1.0-rc5-btrfs-next-10+ #1
[132408.505075] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[132408.505075]  ffff880023e7d320 ffff880163d73cd8 ffffffff8145eec7 ffffffff81095dce
[132408.505075]  ffff880009735d40 ffff880163d73ce8 ffffffff81154e1e ffff880163d73d68
[132408.505075]  ffffffff81155733 ffffffffa054a95a ffff8801b6099f00 ffffffffa0505b5f
[132408.505075] Call Trace:
[132408.505075]  [<ffffffff8145eec7>] dump_stack+0x4f/0x7b
[132408.505075]  [<ffffffff81095dce>] ? console_unlock+0x356/0x3a2
[132408.505075]  [<ffffffff81154e1e>] __slab_error.isra.28+0x25/0x36
[132408.505075]  [<ffffffff81155733>] __cache_free+0xe2/0x4b6
[132408.505075]  [<ffffffffa054a95a>] ? __btrfs_add_free_space+0x2f0/0x343 [btrfs]
[132408.505075]  [<ffffffffa0505b5f>] ? btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
[132408.505075]  [<ffffffff810f3b30>] ? time_hardirqs_off+0x15/0x28
[132408.505075]  [<ffffffff81084d42>] ? trace_hardirqs_off+0xd/0xf
[132408.505075]  [<ffffffff811563a1>] ? kfree+0xb6/0x14e
[132408.505075]  [<ffffffff811563d0>] kfree+0xe5/0x14e
[132408.505075]  [<ffffffffa0505b5f>] btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
[132408.505075]  [<ffffffffa0505e08>] caching_kthread+0x29e/0x2d9 [btrfs]
[132408.505075]  [<ffffffffa0505b6a>] ? btrfs_unpin_free_ino+0x99/0x99 [btrfs]
[132408.505075]  [<ffffffff8106698f>] kthread+0xef/0xf7
[132408.505075]  [<ffffffff810f3b08>] ? time_hardirqs_on+0x15/0x28
[132408.505075]  [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
[132408.505075]  [<ffffffff814653d2>] ret_from_fork+0x42/0x70
[132408.505075]  [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
[132408.505075] ffff880023e7d320: redzone 1:0x9f911029d74e35b, redzone 2:0x9f911029d74e35b.
[132409.501654] slab: double free detected in cache 'btrfs_free_space', objp ffff880023e7d320
[132409.503355] ------------[ cut here ]------------
[132409.504241] kernel BUG at mm/slab.c:2571!

Therefore fix this by having btrfs_unpin_free_ino() acquire the lock
that protects the rbtree while doing the searches and removing entries.

Fixes: 1c70d8fb ("Btrfs: fix inode caching vs tree log")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

ff30e038

Btrfs: use kmem_cache_free when freeing entry in inode cache · 4899b628

Filipe Manana authored Jun 13, 2015

commit c3f4a168 upstream.

The free space entries are allocated using kmem_cache_zalloc(),
through __btrfs_add_free_space(), therefore we should use
kmem_cache_free() and not kfree() to avoid any confusion and
any potential problem. Looking at the kfree() definition at
mm/slab.c it has the following comment:

  /*
   * (...)
   *
   * Don't free memory not originally allocated by kmalloc()
   * or you will run into trouble.
   */

So better be safe and use kmem_cache_free().
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

4899b628

04 Aug, 2015 1 commit

sg_start_req(): make sure that there's not too many elements in iovec · b50ae828

Al Viro authored Aug 01, 2015

commit 451a2886 upstream.

unfortunately, allowing an arbitrary 16bit value means a possibility of
overflow in the calculation of total number of pages in bio_map_user_iov() -
we rely on there being no more than PAGE_SIZE members of sum in the
first loop there.  If that sum wraps around, we end up allocating
too small array of pointers to pages and it's easy to overflow it in
the second loop.

X-Coverup: TINC (and there's no lumber cartel either)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[bwh: s/MAX_UIOVEC/UIO_MAXIOV/. This was fixed upstream by commit
 fdc81f45 ("sg_start_req(): use import_iovec()"), but we don't have
 that function.]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Reference: CVE-2015-5707
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

b50ae828

30 Jul, 2015 9 commits

KEYS: ensure we free the assoc array edit if edit is valid · 21b0f383

Colin Ian King authored Jul 27, 2015

commit ca4da5dd upstream.

__key_link_end is not freeing the associated array edit structure
and this leads to a 512 byte memory leak each time an identical
existing key is added with add_key().

The reason the add_key() system call returns okay is that
key_create_or_update() calls __key_link_begin() before checking to see
whether it can update a key directly rather than adding/replacing - which
it turns out it can.  Thus __key_link() is not called through
__key_instantiate_and_link() and __key_link_end() must cancel the edit.

CVE-2015-1333
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

21b0f383

x86/nmi/64: Use DF to avoid userspace RSP confusing nested NMI detection · cff4ef3d

Andy Lutomirski authored Jul 10, 2015

commit 810bc075 upstream.

We have a tricky bug in the nested NMI code: if we see RSP pointing
to the NMI stack on NMI entry from kernel mode, we assume that we
are executing a nested NMI.

This isn't quite true.  A malicious userspace program can point RSP
at the NMI stack, issue SYSCALL, and arrange for an NMI to happen
while RSP is still pointing at the NMI stack.

Fix it with a sneaky trick.  Set DF in the region of code that the RSP
check is intended to detect.  IRET will clear DF atomically.

(Note: other than paravirt, there's little need for all this complexity.
 We could check RIP instead of RSP.)

Fixes CVE-2015-3291.
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
[bwh: Backported to 4.0: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
CVE-2015-3291
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

cff4ef3d

x86/nmi/64: Reorder nested NMI checks · 7eddd205

Andy Lutomirski authored Jul 12, 2015

commit a27507ca upstream.

Check the repeat_nmi .. end_repeat_nmi special case first.  The next
patch will rework the RSP check and, as a side effect, the RSP check
will no longer detect repeat_nmi .. end_repeat_nmi, so we'll need
this ordering of the checks.

Note: this is more subtle than it appears.  The check for repeat_nmi
.. end_repeat_nmi jumps straight out of the NMI code instead of
adjusting the "iret" frame to force a repeat.  This is necessary,
because the code between repeat_nmi and end_repeat_nmi sets "NMI
executing" and then writes to the "iret" frame itself.  If a nested
NMI comes in and modifies the "iret" frame while repeat_nmi is also
modifying it, we'll end up with garbage.  The old code got this
right, as does the new code, but the new code is a bit more
explicit.

If we were to move the check right after the "NMI executing" check,
then we'd get it wrong and have random crashes.

This is a prerequisite for the fix for CVE-2015-3291.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
[bwh: Backported to 4.0: adjust filename, spacing]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

7eddd205

x86/nmi/64: Improve nested NMI comments · 3be75fea

Andy Lutomirski authored Jul 10, 2015

commit 0b22930e upstream.

I found the nested NMI documentation to be difficult to follow.
Improve the comments.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
[bwh: Backported to 4.0: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

3be75fea

x86/nmi/64: Switch stacks on userspace NMI entry · d8fa09a2

Andy Lutomirski authored Jul 10, 2015

commit 9b6e6a83 upstream.

Returning to userspace is tricky: IRET can fail, and ESPFIX can
rearrange the stack prior to IRET.

The NMI nesting fixup relies on a precise stack layout and atomic
IRET.  Rather than trying to teach the NMI nesting fixup to handle
ESPFIX and failed IRET, punt: run NMIs that came from user mode on
the normal kernel stack.

This will make some nested NMIs visible to C code, but the C code is
okay with that.

As a side effect, this should speed up perf: it eliminates an RDMSR
when NMIs come from user mode.

Fixes CVE-2015-3290.
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
[bwh: Backported to 4.0:
 - Adjust filename, context
 - s/restore_c_regs_and_iret/restore_args/
 - Use kernel_stack + KERNEL_STACK_OFFSET instead of cpu_current_top_of_stack]
[luto: Open-coded return path to avoid dependency on partial pt_regs details]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
CVE-2015-3290, CVE-2015-5157
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

d8fa09a2

x86/nmi/64: Remove asm code that saves cr2 · bae96303

Andy Lutomirski authored Jul 10, 2015

commit 0e181bb5 upstream.

Now that do_nmi saves cr2, we don't need to save it in asm.

This is a prerequisity for the fix for CVE-2015-3290.
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
[bwh: Backported to 4.0: adjust filename, context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

bae96303

x86/nmi: Enable nested do_nmi handling for 64-bit kernels · fe0d50e2

Andy Lutomirski authored Jul 10, 2015

commit 9d050416 upstream.

32-bit kernels handle nested NMIs in C.  Enable the exact same
handling on 64-bit kernels as well.  This isn't currently necessary,
but it will become necessary once the asm code starts allowing
limited nesting.

This is a prerequisite for the fix for CVE-2015-3290.
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

fe0d50e2

x86/asm/entry/64: Remove a redundant jump · f0dc9f75

Denys Vlasenko authored Apr 07, 2015

commit a30b0085 upstream.

Jumping to the very next instruction is not very useful:

        jmp label
    label:

Removing the jump.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1428439424-7258-5-git-send-email-dvlasenk@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

f0dc9f75

x86/asm/entry/64: Fold the 'test_in_nmi' macro into its only user · 1c037821

Denys Vlasenko authored Apr 01, 2015

commit 0784b364 upstream.

No code changes.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427899858-7165-1-git-send-email-dvlasenk@redhat.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: John Johansen <john.johansen@canonical.com>
Acked-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

1c037821

24 Jul, 2015 2 commits

evm: labeling pseudo filesystems exception · 00cc0819

Mimi Zohar authored Apr 21, 2015

commit 5101a185 upstream.

To prevent offline stripping of existing file xattrs and relabeling of
them at runtime, EVM allows only newly created files to be labeled.  As
pseudo filesystems are not persistent, stripping of xattrs is not a
concern.

Some LSMs defer file labeling on pseudo filesystems.  This patch
permits the labeling of existing files on pseudo files systems.
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

00cc0819

ieee802154: Fix sockaddr_ieee802154 implicit padding information leak. · 18ffb713

Lennert Buytenhek authored Jun 03, 2015

commit 8a70cefa upstream.

The AF_IEEE802154 sockaddr looks like this:

	struct sockaddr_ieee802154 {
		sa_family_t family; /* AF_IEEE802154 */
		struct ieee802154_addr_sa addr;
	};

	struct ieee802154_addr_sa {
		int addr_type;
		u16 pan_id;
		union {
			u8 hwaddr[IEEE802154_ADDR_LEN];
			u16 short_addr;
		};
	};

On most architectures there will be implicit structure padding here,
in two different places:

* In struct sockaddr_ieee802154, two bytes of padding between 'family'
  (unsigned short) and 'addr', so that 'addr' starts on a four byte
  boundary.

* In struct ieee802154_addr_sa, two bytes at the end of the structure,
  to make the structure 16 bytes.

When calling recvmsg(2) on a PF_IEEE802154 SOCK_DGRAM socket, the
ieee802154 stack constructs a struct sockaddr_ieee802154 on the
kernel stack without clearing these padding fields, and, depending
on the addr_type, between four and ten bytes of uncleared kernel
stack will be copied to userspace.

We can't just insert two 'u16 __pad's in the right places and zero
those before copying an address to userspace, as not all architectures
insert this implicit padding -- from a quick test it seems that avr32,
cris and m68k don't insert this padding, while every other architecture
that I have cross compilers for does insert this padding.

The easiest way to plug the leak is to just memset the whole struct
sockaddr_ieee802154 before filling in the fields we want to fill in,
and that's what this patch does.
Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Acked-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
[ luis: backported to 3.16:
  - file rename: net/ieee802154/socket.c -> net/ieee802154/dgram.c ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

18ffb713

20 Jul, 2015 1 commit
- Linux 3.19.8-ckt4 · 138bca44
  Kamal Mostafa authored Jul 20, 2015
```
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
```
  138bca44
16 Jul, 2015 22 commits

Btrfs: lock superblock before remounting for rw subvol · 8b76631c

Omar Sandoval authored May 18, 2015

commit 773cd04e upstream.

Since commit 0723a047 ("btrfs: allow mounting btrfs subvolumes with
different ro/rw options"), when mounting a subvolume read/write when
another subvolume has previously been mounted read-only, we first do a
remount. However, this should be done with the superblock locked, as per
sync_filesystem():

	/*
	 * We need to be protected against the filesystem going from
	 * r/o to r/w or vice versa.
	 */
	WARN_ON(!rwsem_is_locked(&sb->s_umount));

This WARN_ON can easily be hit with:

mkfs.btrfs -f /dev/vdb
mount /dev/vdb /mnt
btrfs subvol create /mnt/vol1
btrfs subvol create /mnt/vol2
umount /mnt
mount -oro,subvol=/vol1 /dev/vdb /mnt
mount -orw,subvol=/vol2 /dev/vdb /mnt2

Fixes: 0723a047 ("btrfs: allow mounting btrfs subvolumes with different ro/rw options")
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

8b76631c

net/mlx4_core: Enhance the MAD_IFC wrapper to convert VF port to physical · 956b6a4f

Or Gerlitz authored May 21, 2015

commit 7c35ef45 upstream.

Single port VFs always provide port = 1 (even if the actual physical
port used is port 2). As such, we need to convert the port provided
by the VF to the physical port before calling into the firmware.

It turns out that the Linux mlx4 VF RoCE driver maintains a copy of
the GID table and hence this change became critical only for single
ported IB VFs, but it could be needed for other RoCE VF drivers too.

Fixes: 449fc488 ('net/mlx4: Adapt code for N-Port VF')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

956b6a4f

clk: Fix JSON output in debugfs · 374fbd9d

Stefan Wahren authored Apr 29, 2015

commit 7cb81136 upstream.

key/value pairs in a JSON object must be separated by a comma.
After adding the properties "accuracy" and "phase" the JSON output
of /sys/kernel/debug/clk/clk_dump is invalid.

So add the missing commas to fix it.

Fixes: 5279fc40 ("clk: add clk accuracy retrieval support")
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
[sboyd@codeaurora.org: Added comment in function]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
[ kamal: backport to 3.19-stable: context ]
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

374fbd9d

e1000e: Cleanup handling of VLAN_HLEN as a part of max frame size · 2ef56f35

Alexander Duyck authored May 02, 2015

commit 8084b86d upstream.

When the VLAN_HLEN was added to the calculation for the maximum frame size
there seems to have been a number of issues added to the driver.

The first issue is that in some cases the maximum frame size for a device
never really reached the actual maximum frame size as the VLAN header
length was not included the calculation for that value.  As a result some
parts only supported a maximum frame size of either 1496 in the case of
parts that didn't support jumbo frames, and 8996 in the case of the parts
that do.

The second issue is the fact that there were several checks that weren't
updated so as a result setting an MTU of 1500 was treated as enabling jumbo
frames as the calculated value was 1522 instead of 1518.  I have addressed
those by replacing ETH_FRAME_LEN with VLAN_ETH_FRAME_LEN where appropriate.

The final issue was the fact that lowering the MTU below 1500 would cause
the driver to allocate 2K buffers for the rings.  This is an old issue that
was fixed several years ago in igb/ixgbe and I am addressing now by just
replacing == with a <= so that we always just round up to 1522 for anything
that isn't a jumbo frame.

Fixes: c751a3d5 ("e1000e: Correctly include VLAN_HLEN when changing interface MTU")
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
[ luis: backported to 3.16:
  - dropped changes to struct e1000_pch_spt_info as i219 isn't supported ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

2ef56f35

USB: devio: fix a condition in async_completed() · 49cd7468

Dan Carpenter authored May 18, 2015

commit 83ed07c5 upstream.

Static checkers complain that the current condition is never true.  It
seems pretty likely that it's a typo and "URB" was intended instead of
"USB".

Fixes: 3d97ff63 ('usbdevfs: Use scatter-gather lists for large bulk transfers')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

49cd7468

mac80211: fix the beacon csa counter for mesh and ibss · 109cca0d

Chun-Yeow Yeoh authored Jun 09, 2015

commit 8df734e8 upstream.

The csa counter has moved from sdata to beacon/presp but
it is not updated accordingly for mesh and ibss. Fix this.

Fixes: af296bdb ("mac80211: move csa counters from sdata to beacon/presp")
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

109cca0d

mac80211: prevent possible crypto tx tailroom corruption · 6a4cbde2

Michal Kazior authored May 22, 2015

commit ab499db8 upstream.

There was a possible race between
ieee80211_reconfig() and
ieee80211_delayed_tailroom_dec(). This could
result in inability to transmit data if driver
crashed during roaming or rekeying and subsequent
skbs with insufficient tailroom appeared.

This race was probably never seen in the wild
because a device driver would have to crash AND
recover within 0.5s which is very unlikely.

I was able to prove this race exists after
changing the delay to 10s locally and crashing
ath10k via debugfs immediately after GTK
rekeying. In case of ath10k the counter went below
0. This was harmless but other drivers which
actually require tailroom (e.g. for WEP ICV or
MMIC) could end up with the counter at 0 instead
of >0 and introduce insufficient skb tailroom
failures because mac80211 would not resize skbs
appropriately anymore.

Fixes: 8d1f7ecd ("mac80211: defer tailroom counter manipulation when roaming")
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

6a4cbde2

HID: rmi: fix some harmless BIT() mistakes · 92eb46f3

Dan Carpenter authored May 14, 2015

commit af43c408 upstream.

These defines are used like this:

	if (!(test_bit(RMI_STARTED, &hdata->flags)))

So the intent was to use bits 0, 1 and 2 but because of the extra BIT()
shifts we're actually using 1, 2 and 4.  It's harmless because it's done
consistently but static checkers will complain.

Fixes: 9fb6bf02 ('HID: rmi: introduce RMI driver for Synaptics touchpads')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

92eb46f3

ACPI / init: Switch over platform to the ACPI mode later · 43451e1e

Rafael J. Wysocki authored Jun 10, 2015

commit b064a8fa upstream.

Commit 73f7d1ca "ACPI / init: Run acpi_early_init() before
timekeeping_init()" moved the ACPI subsystem initialization,
including the ACPI mode enabling, to an earlier point in the
initialization sequence, to allow the timekeeping subsystem
use ACPI early.  Unfortunately, that resulted in boot regressions
on some systems and the early ACPI initialization was moved toward
its original position in the kernel initialization code by commit
c4e1acbb "ACPI / init: Invoke early ACPI initialization later".

However, that turns out to be insufficient, as boot is still broken
on the Tyan S8812 mainboard.

To fix that issue, split the ACPI early initialization code into
two pieces so the majority of it still located in acpi_early_init()
and the part switching over the platform into the ACPI mode goes into
a new function, acpi_subsystem_init(), executed at the original early
ACPI initialization spot.

That fixes the Tyan S8812 boot problem, but still allows ACPI
tables to be loaded earlier which is useful to the EFI code in
efi_enter_virtual_mode().

Link: https://bugzilla.kernel.org/show_bug.cgi?id=97141
Fixes: 73f7d1ca "ACPI / init: Run acpi_early_init() before timekeeping_init()"
Reported-and-tested-by: Marius Tolzmann <tolzmann@molgen.mpg.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: Lee, Chun-Yi <jlee@suse.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

43451e1e

samples/bpf: fix in-source build of samples with clang · 702ca351

Brenden Blanco authored May 11, 2015

commit b88c06e3 upstream.

in-source build of 'make samples/bpf/' was incorrectly
using default compiler instead of invoking clang/llvm.
out-of-source build was ok.

Fixes: a8085782 ("samples: bpf: trivial eBPF program in C")
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

702ca351

iwlwifi: mvm: fix ROC reference accounting · 4d686333

Eliad Peller authored Apr 19, 2015

commit c779273b upstream.

commit b112889c ("iwlwifi: mvm: add Aux ROC request/response flow")
added aux ROC flow in addition to the existing ROC flow. While doing
it, it moved the ROC reference release to a common work item, which
is being called for both the ROC and aux ROC flows.

This resulted in invalid reference accounting, as no reference was
taken in case of aux ROC, while a reference was released on completion.

Fix it by adding a reference for the aux ROC as well, and release
only the relevant references on completion (according to the set bits).

While at it, convert cancel_work_sync() to flush_work(), in order
to make sure the references are being cleaned properly.

Fixes: b112889c ("iwlwifi: mvm: add Aux ROC request/response flow")
Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

4d686333

HID: i2c-hid: fix harmless test_bit() issue · 07bc7ed4

Dan Carpenter authored May 14, 2015

commit c8fd51dc upstream.

These defines are used like this:

	if (test_bit(I2C_HID_STARTED, &ihid->flags))

The intent was to use bits 0, 1, and 2 but because of the extra shifts
we're using bits 1, 2, and 4.  It's harmless becuase it's done
consistently but it's not the intent and static checkers will complain.

Fixes: 4a200c3b ('HID: i2c-hid: introduce HID over i2c specification implementation')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

07bc7ed4

of: return NUMA_NO_NODE from fallback of_node_to_nid() · df007c2b

Konstantin Khlebnikov authored Apr 08, 2015

commit c8fff7bc upstream.

Node 0 might be offline as well as any other numa node,
in this case kernel cannot handle memory allocation and crashes.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 0c3f061c ("of: implement of_node_to_nid as a weak function")
Signed-off-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

df007c2b

ARM: 8372/1: KGDB does not build on BE32 · 8981b0b2

Arnd Bergmann authored May 26, 2015

commit cfeec79e upstream.

KGDB requires code patching, which only works on little-endian
or newer big-endian (BE8) machines but not on the older big-endian
ones (BE32) where it results in this build error:

arch/arm/kernel/patch.c: In function '__patch_text_real':
arch/arm/kernel/patch.c:93:4: error: implicit declaration of function '__opcode_to_mem_thumb32' [-Werror=implicit-function-declaration]
    insn = __opcode_to_mem_thumb32(insn);

This adds a Kconfig dependency to avoid the broken case and
for all other symbols that require code patching.

Fixes: 23a4e405 ("arm: kgdb: Handle read-only text / modules")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

8981b0b2

pktgen: adjust spacing in proc file interface output · defb9bbd

Jesper Dangaard Brouer authored May 21, 2015

commit d079abd1 upstream.

Too many spaces were introduced in commit 63adc6fb ("pktgen: cleanup
checkpatch warnings"), thus misaligning "src_min:" to other columns.

Fixes: 63adc6fb ("pktgen: cleanup checkpatch warnings")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

defb9bbd

security_syslog() should be called once only · f98c9f65

Vasily Averin authored Jun 25, 2015

commit d194e5d6 upstream.

The final version of commit 637241a9 ("kmsg: honor dmesg_restrict
sysctl on /dev/kmsg") lost few hooks, as result security_syslog() are
processed incorrectly:

- open of /dev/kmsg checks syslog access permissions by using
  check_syslog_permissions() where security_syslog() is not called if
  dmesg_restrict is set.

- syslog syscall and /proc/kmsg calls do_syslog() where security_syslog
  can be executed twice (inside check_syslog_permissions() and then
  directly in do_syslog())

With this patch security_syslog() is called once only in all
syslog-related operations regardless of dmesg_restrict value.

Fixes: 637241a9 ("kmsg: honor dmesg_restrict sysctl on /dev/kmsg")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

f98c9f65

NFS: Fix size of NFSACL SETACL operations · b4e41a59

Chuck Lever authored May 26, 2015

commit d683cc49 upstream.

When encoding the NFSACL SETACL operation, reserve just the estimated
size of the ACL rather than a fixed maximum. This eliminates needless
zero padding on the wire that the server ignores.

Fixes: ee5dc773 ('NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!"')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

b4e41a59

MIPS: Octeon: Set OHCI and EHCI MMIO byte order to match CPU · 25cc7159

Ben Hutchings authored May 25, 2015

commit df115f3e upstream.

The Octeon OHCI is now supported by the ohci-platform driver, and
USB_OCTEON_OHCI is marked as deprecated.  However, it is currently
still necessary to enable it in order to select
USB_OHCI_BIG_ENDIAN_MMIO.  Make CPU_CAVIUM_OCTEON select that as well,
so that USB_OCTEON_OHCI is really obsolete.

The old ohci-octeon and ehci-octeon drivers also only enabled big-endian
MMIO in case the CPU was big-endian.  Make the selections of
USB_EHCI_BIG_ENDIAN_MMIO and USB_OHCI_BIG_ENDIAN_MMIO conditional, to
match this.

Fixes: 2193dda5 ("USB: host: Remove ehci-octeon and ohci-octeon drivers")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-mips@linux-mips.org
Cc: David Daney <david.daney@cavium.com>
Cc: Chandrakala Chavva <cchavva@caviumnetworks.com>
Cc: Paul Martin <paul.martin@codethink.co.uk>
Patchwork: https://patchwork.linux-mips.org/patch/10178/Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

25cc7159

clk: ti: dra7-atl-clock: Fix possible ERR_PTR dereference · 4960f359

Krzysztof Kozlowski authored May 13, 2015

commit e0cdcda5 upstream.

of_clk_get_from_provider() returns ERR_PTR on failure. The
dra7-atl-clock driver was not checking its return value and
immediately used it in __clk_get_hw().  __clk_get_hw()
dereferences supplied clock, if it is not NULL, so in that case
it would dereference an ERR_PTR.

Fixes: 9ac33b0c ("CLK: TI: Driver for DRA7 ATL (Audio Tracking Logic)")
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

4960f359

rndis_wlan: harmless issue calling set_bit() · 96698cee

Dan Carpenter authored May 14, 2015

commit e3958e9d upstream.

These are used like:

	set_bit(WORK_LINK_UP, &priv->work_pending);

The problem is that set_bit() takes the actual bit number and not a mask
so static checkers get upset.  It doesn't affect run time because we do
it consistently, but we may as well clean it up.

Fixes: 6010ce07 ('rndis_wlan: do link-down state change in worker thread')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

96698cee

mtd: dc21285: use raw spinlock functions for nw_gpio_lock · 0310462b

Uwe Kleine-König authored May 28, 2015

commit e5babdf9 upstream.

Since commit bd31b859 (which is in 3.2-rc1) nw_gpio_lock is a raw spinlock
that needs usage of the corresponding raw functions.

This fixes:

  drivers/mtd/maps/dc21285.c: In function 'nw_en_write':
  drivers/mtd/maps/dc21285.c:41:340: warning: passing argument 1 of 'spinlock_check' from incompatible pointer type
    spin_lock_irqsave(&nw_gpio_lock, flags);

  In file included from include/linux/seqlock.h:35:0,
                   from include/linux/time.h:5,
                   from include/linux/stat.h:18,
                   from include/linux/module.h:10,
                   from drivers/mtd/maps/dc21285.c:8:
  include/linux/spinlock.h:299:102: note: expected 'struct spinlock_t *' but argument is of type 'struct raw_spinlock_t *'
   static inline raw_spinlock_t *spinlock_check(spinlock_t *lock)
                                                                                                        ^
  drivers/mtd/maps/dc21285.c:43:25: warning: passing argument 1 of 'spin_unlock_irqrestore' from incompatible pointer type
    spin_unlock_irqrestore(&nw_gpio_lock, flags);
                           ^
  In file included from include/linux/seqlock.h:35:0,
                   from include/linux/time.h:5,
                   from include/linux/stat.h:18,
                   from include/linux/module.h:10,
                   from drivers/mtd/maps/dc21285.c:8:
  include/linux/spinlock.h:370:91: note: expected 'struct spinlock_t *' but argument is of type 'struct raw_spinlock_t *'
   static inline void spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)

Fixes: bd31b859 ("locking, ARM: Annotate low level hw locks as raw")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

0310462b

pktgen: adjust flag NO_TIMESTAMP to be more pktgen compliant · d673de1a

Jesper Dangaard Brouer authored May 07, 2015

commit f1f00d8f upstream.

Allow flag NO_TIMESTAMP to turn timestamping on again, like other flags,
with a negation of the flag like !NO_TIMESTAMP.

Also document the option flag NO_TIMESTAMP.

Fixes: afb84b62 ("pktgen: add flag NO_TIMESTAMP to disable timestamping")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

d673de1a