1. 06 Mar, 2015 40 commits
    • x86: pmc-atom: Assign debugfs node as soon as possible · 25dd360c
      Andy Shevchenko authored
      commit 1b43d712 upstream.
      
      pmc_dbgfs_unregister() will be called when pmc->dbgfs_dir is unconditionally
      NULL on the error path in pmc_dbgfs_register(). To prevent this, move the
      assignment to where it should be.
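
The ordering issue can be sketched in user-space C. This is an illustrative model under stated assumptions, not the driver code: `struct pmc_sketch`, `dbgfs_register_sketch()`, `dbgfs_unregister_sketch()` and the `live_dirs` counter are hypothetical stand-ins, and malloc()/free() stand in for creating and removing the debugfs directory. Recording the handle before any later step can fail lets the cleanup routine on the error path find it.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the device structure holding the debugfs handle. */
struct pmc_sketch {
	void *dbgfs_dir;
};

static int live_dirs;	/* number of directories currently allocated */

/* Cleanup helper: a no-op when no directory has been recorded. */
static void dbgfs_unregister_sketch(struct pmc_sketch *pmc)
{
	if (!pmc->dbgfs_dir)
		return;
	free(pmc->dbgfs_dir);
	pmc->dbgfs_dir = NULL;
	live_dirs--;
}

/*
 * Register helper. file_creation_fails simulates a failure while populating
 * the directory. Returns 0 on success, -1 on failure.
 */
static int dbgfs_register_sketch(struct pmc_sketch *pmc, int file_creation_fails)
{
	void *dir = malloc(1);

	if (!dir)
		return -1;
	live_dirs++;

	/* Record the handle first, so the error path below can release it. */
	pmc->dbgfs_dir = dir;

	if (file_creation_fails) {
		dbgfs_unregister_sketch(pmc);
		return -1;
	}
	return 0;
}
```

If the assignment were instead made only at the end of the register helper, the unregister call on the failure branch would see a NULL handle and return without freeing anything.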
      
      Fixes: f855911c (x86/pmc_atom: Expose PMC device state and platform sleep state)
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Aubrey Li <aubrey.li@linux.intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Kumar P. Mahesh <mahesh.kumar.p@intel.com>
      Link: http://lkml.kernel.org/r/1421253575-22509-2-git-send-email-andriy.shevchenko@linux.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86, mm/ASLR: Fix stack randomization on 64-bit systems · 805f25c4
      Hector Marco-Gisbert authored
      commit 4e7c22d4 upstream.
      
      The issue is that the stack for processes is not properly randomized on
      64 bit architectures due to an integer overflow.
      
      The affected function is randomize_stack_top() in file
      "fs/binfmt_elf.c":
      
        static unsigned long randomize_stack_top(unsigned long stack_top)
        {
                 unsigned int random_variable = 0;
      
                 if ((current->flags & PF_RANDOMIZE) &&
                         !(current->personality & ADDR_NO_RANDOMIZE)) {
                         random_variable = get_random_int() & STACK_RND_MASK;
                         random_variable <<= PAGE_SHIFT;
                 }
        #ifdef CONFIG_STACK_GROWSUP
                 return PAGE_ALIGN(stack_top) + random_variable;
        #else
                 return PAGE_ALIGN(stack_top) - random_variable;
        #endif
        }
      
      Note that it declares the "random_variable" variable as "unsigned int".
      The value is masked with STACK_RND_MASK (which is 0x3fffff on x86_64,
      22 bits) and then shifted left by PAGE_SHIFT (which is 12 on x86_64):
      
      	  random_variable <<= PAGE_SHIFT;
      
      so the two leftmost bits are dropped when storing the result in
      "random_variable". This variable needs to be at least 34 bits long to
      hold the (22+12)-bit result.
      
      These two dropped bits have an impact on the entropy of the process stack.
      Concretely, the total stack entropy is reduced by a factor of four: from
      2^30 to 2^28 (one fourth of the expected entropy).
      
      This patch restores the entropy by correcting the types involved
      in the operations in the functions randomize_stack_top() and
      stack_maxrandom_size().
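
The truncation can be reproduced in isolation. The following is a minimal sketch, assuming the x86_64 values quoted above (a 22-bit STACK_RND_MASK and a PAGE_SHIFT of 12); `uint32_t` and `uint64_t` model "unsigned int" and "unsigned long" on that architecture, and the two helper names are hypothetical.

```c
#include <stdint.h>

#define STACK_RND_MASK 0x3fffffu	/* 22 bits on x86_64, per the text above */
#define PAGE_SHIFT     12

/* Models the buggy code: the offset lives in a 32-bit "unsigned int". */
static uint64_t stack_offset_32(uint32_t rnd)
{
	uint32_t random_variable = rnd & STACK_RND_MASK;

	random_variable <<= PAGE_SHIFT;	/* bits 32 and 33 are lost here */
	return random_variable;
}

/* Models the corrected types: a 64-bit variable keeps all 34 bits. */
static uint64_t stack_offset_64(uint32_t rnd)
{
	uint64_t random_variable = rnd & STACK_RND_MASK;

	random_variable <<= PAGE_SHIFT;
	return random_variable;
}
```

With all 22 mask bits set, the 32-bit variant yields 0xfffff000 while the 64-bit variant yields 0x3fffff000, so the two top bits of the offset only survive in the wider type.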
      
      The successful fix can be tested with:
      
        $ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done
        7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0                          [stack]
        7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0                          [stack]
        7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0                          [stack]
        7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0                          [stack]
        ...
      
      Once corrected, the leading bytes should be between 7ffc and 7fff,
      rather than always being 7fff.
      Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es>
      Signed-off-by: Ismael Ripoll <iripoll@upv.es>
      [ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Fixes: CVE-2015-1593
      Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.net
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/efi: Avoid triple faults during EFI mixed mode calls · 55c0226f
      Matt Fleming authored
      commit 96738c69 upstream.
      
      Andy pointed out that if an NMI or MCE is received while we're in the
      middle of an EFI mixed mode call a triple fault will occur. This can
      happen, for example, when issuing an EFI mixed mode call while running
      perf.
      
      The reason for the triple fault is that we execute the mixed mode call
      in 32-bit mode with paging disabled but with 64-bit kernel IDT handlers
      installed throughout the call.
      
      At Andy's suggestion, stop playing the games we currently do at runtime,
      such as disabling paging and installing a 32-bit GDT for __KERNEL_CS. We
      can simply switch to the __KERNEL32_CS descriptor before invoking
      firmware services, and run in compatibility mode. This way, if an
      NMI/MCE does occur the kernel IDT handler will execute correctly, since
      it'll jump to __KERNEL_CS automatically.
      
      However, this change is only possible post-ExitBootServices(). Before
      then the firmware "owns" the machine and expects its 32-bit IDT
      handlers to be left intact to service interrupts, etc.
      
      So, we now need to distinguish between early boot and runtime
      invocations of EFI services. During early boot, we need to restore the
      GDT that the firmware expects to be present. We can only jump to the
      __KERNEL32_CS code segment for mixed mode calls after ExitBootServices()
      has been invoked.
      
      A liberal sprinkling of comments in the thunking code should make the
      differences in early and late environments more apparent.
      Reported-by: Andy Lutomirski <luto@amacapital.net>
      Tested-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • blk-throttle: check stats_cpu before reading it from sysfs · b7159073
      Thadeu Lima de Souza Cascardo authored
      commit 045c47ca upstream.
      
      When reading blkio.throttle.io_serviced in a recently created blkio
      cgroup, it's possible to race against the creation of a throttle policy,
      which delays the allocation of stats_cpu.
      
      Like other functions in the throttle code, just checking for a NULL
      stats_cpu prevents the following oops caused by that race.
      
      [ 1117.285199] Unable to handle kernel paging request for data at address 0x7fb4d0020
      [ 1117.285252] Faulting instruction address: 0xc0000000003efa2c
      [ 1137.733921] Oops: Kernel access of bad area, sig: 11 [#1]
      [ 1137.733945] SMP NR_CPUS=2048 NUMA PowerNV
      [ 1137.734025] Modules linked in: bridge stp llc kvm_hv kvm binfmt_misc autofs4
      [ 1137.734102] CPU: 3 PID: 5302 Comm: blkcgroup Not tainted 3.19.0 #5
      [ 1137.734132] task: c000000f1d188b00 ti: c000000f1d210000 task.ti: c000000f1d210000
      [ 1137.734167] NIP: c0000000003efa2c LR: c0000000003ef9f0 CTR: c0000000003ef980
      [ 1137.734202] REGS: c000000f1d213500 TRAP: 0300   Not tainted  (3.19.0)
      [ 1137.734230] MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 42008884  XER: 20000000
      [ 1137.734325] CFAR: 0000000000008458 DAR: 00000007fb4d0020 DSISR: 40000000 SOFTE: 0
      GPR00: c0000000003ed3a0 c000000f1d213780 c000000000c59538 0000000000000000
      GPR04: 0000000000000800 0000000000000000 0000000000000000 0000000000000000
      GPR08: ffffffffffffffff 00000007fb4d0020 00000007fb4d0000 c000000000780808
      GPR12: 0000000022000888 c00000000fdc0d80 0000000000000000 0000000000000000
      GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      GPR20: 000001003e120200 c000000f1d5b0cc0 0000000000000200 0000000000000000
      GPR24: 0000000000000001 c000000000c269e0 0000000000000020 c000000f1d5b0c80
      GPR28: c000000000ca3a08 c000000000ca3dec c000000f1c667e00 c000000f1d213850
      [ 1137.734886] NIP [c0000000003efa2c] .tg_prfill_cpu_rwstat+0xac/0x180
      [ 1137.734915] LR [c0000000003ef9f0] .tg_prfill_cpu_rwstat+0x70/0x180
      [ 1137.734943] Call Trace:
      [ 1137.734952] [c000000f1d213780] [d000000005560520] 0xd000000005560520 (unreliable)
      [ 1137.734996] [c000000f1d2138a0] [c0000000003ed3a0] .blkcg_print_blkgs+0xe0/0x1a0
      [ 1137.735039] [c000000f1d213960] [c0000000003efb50] .tg_print_cpu_rwstat+0x50/0x70
      [ 1137.735082] [c000000f1d2139e0] [c000000000104b48] .cgroup_seqfile_show+0x58/0x150
      [ 1137.735125] [c000000f1d213a70] [c0000000002749dc] .kernfs_seq_show+0x3c/0x50
      [ 1137.735161] [c000000f1d213ae0] [c000000000218630] .seq_read+0xe0/0x510
      [ 1137.735197] [c000000f1d213bd0] [c000000000275b04] .kernfs_fop_read+0x164/0x200
      [ 1137.735240] [c000000f1d213c80] [c0000000001eb8e0] .__vfs_read+0x30/0x80
      [ 1137.735276] [c000000f1d213cf0] [c0000000001eb9c4] .vfs_read+0x94/0x1b0
      [ 1137.735312] [c000000f1d213d90] [c0000000001ebb38] .SyS_read+0x58/0x100
      [ 1137.735349] [c000000f1d213e30] [c000000000009218] syscall_exit+0x0/0x98
      [ 1137.735383] Instruction dump:
      [ 1137.735405] 7c6307b4 7f891800 409d00b8 60000000 60420000 3d420004 392a63b0 786a1f24
      [ 1137.735471] 7d49502a e93e01c8 7d495214 7d2ad214 <7cead02a> e9090008 e9490010 e9290018
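
The NULL check described before the oops can be sketched as follows. This is a user-space illustration under stated assumptions, not the blk-throttle code: `struct cpu_stat_sketch`, `struct throtl_grp_sketch` and `sum_serviced_sketch()` are hypothetical names, and a plain array stands in for the per-CPU statistics.

```c
#include <stddef.h>

/* Hypothetical per-CPU counter, standing in for the real statistics. */
struct cpu_stat_sketch {
	unsigned long serviced;
};

/* Hypothetical group; stats_cpu stays NULL until it is allocated lazily. */
struct throtl_grp_sketch {
	struct cpu_stat_sketch *stats_cpu;
	size_t nr_cpus;
};

/* Sum the per-CPU counters, reporting 0 while stats_cpu is not allocated. */
static unsigned long sum_serviced_sketch(const struct throtl_grp_sketch *tg)
{
	unsigned long sum = 0;
	size_t cpu;

	if (tg->stats_cpu == NULL)
		return 0;	/* reader raced with the delayed allocation */

	for (cpu = 0; cpu < tg->nr_cpus; cpu++)
		sum += tg->stats_cpu[cpu].serviced;
	return sum;
}
```

A reader that arrives before the allocation simply sees an empty result instead of dereferencing an invalid pointer.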
      
      Here is some code that easily reproduces this, although the problem
      was first found by running docker.
      
      void run(pid_t pid)
      {
      	int n;
      	int status;
      	int fd;
      	char *buffer;
      	buffer = memalign(BUFFER_ALIGN, BUFFER_SIZE);
      	n = snprintf(buffer, BUFFER_SIZE, "%d\n", pid);
      	fd = open(CGPATH "/test/tasks", O_WRONLY);
      	write(fd, buffer, n);
      	close(fd);
      	if (fork() > 0) {
      		fd = open("/dev/sda", O_RDONLY | O_DIRECT);
      		read(fd, buffer, 512);
      		close(fd);
      		wait(&status);
      	} else {
      		fd = open(CGPATH "/test/blkio.throttle.io_serviced", O_RDONLY);
      		n = read(fd, buffer, BUFFER_SIZE);
      		close(fd);
      	}
      	free(buffer);
      	exit(0);
      }
      
      void test(void)
      {
      	int status;
      	mkdir(CGPATH "/test", 0666);
      	if (fork() > 0)
      		wait(&status);
      	else
      		run(getpid());
      	rmdir(CGPATH "/test");
      }
      
      int main(int argc, char **argv)
      {
      	int i;
      	for (i = 0; i < NR_TESTS; i++)
      		test();
      	return 0;
      }
      Reported-by: Ricardo Marin Matinata <rmm@br.ibm.com>
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Btrfs: fix fsync data loss after adding hard link to inode · f8d6da8a
      Filipe Manana authored
      commit 1a4bcf47 upstream.
      
      We have a scenario where after the fsync log replay we can lose file data
      that had been previously fsync'ed if we added a hard link for our inode
      and after that we sync'ed the fsync log (for example by fsync'ing some
      other file or directory).
      
      This is because when adding a hard link we updated the inode item in the
      log tree with an i_size value of 0. At that point the new inode item was
      in memory only and a subsequent fsync log replay would not make us lose
      the file data. However if after adding the hard link we sync the log tree
      to disk, by fsync'ing some other file or directory for example, we ended
      up losing the file data after log replay, because the inode item in the
      persisted log tree had an i_size of zero.
      
      This is easy to reproduce, and the following excerpt from my test for
      xfstests shows this:
      
        _scratch_mkfs >> $seqres.full 2>&1
        _init_flakey
        _mount_flakey
      
        # Create one file with data and fsync it.
        # This made the btrfs fsync log persist the data and the inode metadata with
        # a correct inode->i_size (4096 bytes).
        $XFS_IO_PROG -f -c "pwrite -S 0xaa -b 4K 0 4K" -c "fsync" \
             $SCRATCH_MNT/foo | _filter_xfs_io
      
        # Now add one hard link to our file. This made the btrfs code update the fsync
        # log, in memory only, with an inode metadata having a size of 0.
        ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link
      
        # Now force persistence of the fsync log to disk, for example, by fsyncing some
        # other file.
        touch $SCRATCH_MNT/bar
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar
      
        # Before a power loss or crash, we could read the 4Kb of data from our file as
        # expected.
        echo "File content before:"
        od -t x1 $SCRATCH_MNT/foo
      
        # Simulate a crash/power loss.
        _load_flakey_table $FLAKEY_DROP_WRITES
        _unmount_flakey
      
        _load_flakey_table $FLAKEY_ALLOW_WRITES
        _mount_flakey
      
        # After the fsync log replay, because the fsync log had a value of 0 for our
        # inode's i_size, we couldn't read anymore the 4Kb of data that we previously
        # wrote and fsync'ed. The size of the file became 0 after the fsync log replay.
        echo "File content after:"
        od -t x1 $SCRATCH_MNT/foo
      
      An alternative test, which doesn't need to fsync an inode in the same
      transaction in which it was created, is:
      
        _scratch_mkfs >> $seqres.full 2>&1
        _init_flakey
        _mount_flakey
      
        # Create our test file with some data.
        $XFS_IO_PROG -f -c "pwrite -S 0xaa -b 8K 0 8K" \
             $SCRATCH_MNT/foo | _filter_xfs_io
      
        # Make sure the file is durably persisted.
        sync
      
        # Append some data to our file, to increase its size.
        $XFS_IO_PROG -f -c "pwrite -S 0xcc -b 4K 8K 4K" \
             $SCRATCH_MNT/foo | _filter_xfs_io
      
        # Fsync the file, so from this point on if a crash/power failure happens, our
        # new data is guaranteed to be there next time the fs is mounted.
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo
      
        # Add one hard link to our file. This made btrfs write into the in memory fsync
        # log a special inode with generation 0 and an i_size of 0 too. Note that this
        # didn't update the inode in the fsync log on disk.
        ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link
      
        # Now make sure the in memory fsync log is durably persisted.
        # Creating and fsync'ing another file will do it.
        touch $SCRATCH_MNT/bar
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar
      
        # As expected, before the crash/power failure, we should be able to read the
        # 12Kb of file data.
        echo "File content before:"
        od -t x1 $SCRATCH_MNT/foo
      
        # Simulate a crash/power loss.
        _load_flakey_table $FLAKEY_DROP_WRITES
        _unmount_flakey
      
        _load_flakey_table $FLAKEY_ALLOW_WRITES
        _mount_flakey
      
        # After mounting the fs again, the fsync log was replayed.
        # The btrfs fsync log replay code didn't update the i_size of the persisted
        # inode because the inode item in the log had a special generation with a
        # value of 0 (and it couldn't know the correct i_size, since that inode item
        # had a 0 i_size too). This made the last 4Kb of file data inaccessible and
        # effectively lost.
        echo "File content after:"
        od -t x1 $SCRATCH_MNT/foo
      
      This isn't a new issue/regression. This problem has been around since the
      log tree code was added in 2008:
      
        Btrfs: Add a write ahead tree log to optimize synchronous operations
        (commit e02119d5)
      
      Test cases for xfstests follow soon.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: fix leak of path in btrfs_find_item · 751e276c
      David Sterba authored
      commit 381cf658 upstream.
      
      If btrfs_find_item is called with NULL path it allocates one locally but
      does not free it. Affected paths are inserting an orphan item for a file
      and for a subvol root.
      
      Move the path allocation to the callers.
      
      Fixes: 3f870c28 ("btrfs: expand btrfs_find_item() to include find_orphan_item functionality")
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • btrfs: set proper message level for skinny metadata · b4d32c36
      David Sterba authored
      commit 5efa0490 upstream.
      
      This has been confusing people for too long, the message is really just
      informative.
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • libceph: fix double __remove_osd() problem · cd956502
      Ilya Dryomov authored
      commit 7eb71e03 upstream.
      
      It turns out it's possible to get __remove_osd() called twice on the
      same OSD.  That doesn't sit well with rb_erase() - depending on the
      shape of the tree we can get a NULL dereference, a soft lockup or
      a random crash at some point in the future as we end up touching freed
      memory.  One scenario that I was able to reproduce is as follows:
      
                  <osd3 is idle, on the osd lru list>
      <con reset - osd3>
      con_fault_finish()
        osd_reset()
                                    <osdmap - osd3 down>
                                    ceph_osdc_handle_map()
                                      <takes map_sem>
                                      kick_requests()
                                        <takes request_mutex>
                                        reset_changed_osds()
                                          __reset_osd()
                                            __remove_osd()
                                        <releases request_mutex>
                                      <releases map_sem>
          <takes map_sem>
          <takes request_mutex>
          __kick_osd_requests()
            __reset_osd()
              __remove_osd() <-- !!!
      
      A case can be made that osd refcounting is imperfect and reworking it
      would be a proper resolution, but for now Sage and I decided to fix
      this by adding a safeguard around __remove_osd().
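
The idea of guarding a removal so that a second call becomes a no-op can be sketched like this. It is an illustration under stated assumptions, not the libceph change: a doubly linked list is swapped in for the rbtree that rb_erase() operates on, and `struct node_sketch` and the three helpers are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

struct node_sketch {
	struct node_sketch *prev, *next;
	bool linked;	/* true while the node is on a list */
};

static void list_init_sketch(struct node_sketch *head)
{
	head->prev = head->next = head;
	head->linked = false;
}

static void list_add_sketch(struct node_sketch *head, struct node_sketch *n)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
	n->linked = true;
}

/* Unlink n only if it is still linked; returns true if it did the removal. */
static bool remove_once_sketch(struct node_sketch *n)
{
	if (!n->linked)
		return false;	/* second caller: nothing left to do */

	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
	n->linked = false;
	return true;
}
```

In the scenario traced above, the second path reaching the removal would find the node already unlinked and return early instead of touching stale pointers.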
      
      Fixes: http://tracker.ceph.com/issues/8087
      
      Cc: Sage Weil <sage@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Sage Weil <sage@redhat.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • samsung-laptop: Add use_native_backlight quirk, and enable it on some models · b3b3972a
      Hans de Goede authored
      commit 4690555e upstream.
      
      Since kernel 3.14 the backlight control has been broken on various Samsung
      Atom based netbooks. This has been bisected and this problem happens since
      commit b35684b8 ("drm/i915: do full backlight setup at enable time")
      
      This has been reported and discussed in detail here:
      http://lists.freedesktop.org/archives/intel-gfx/2014-July/049395.html
      
      Unfortunately no-one has been able to fix this. This only affects Samsung
      Atom netbooks, and the Linux kernel and the BIOS of those laptops have never
      worked well together. All affected laptops already have a quirk to avoid using
      the standard acpi-video interface and instead use the samsung specific SABI
      interface which samsung-laptop uses. It seems that recent fixes to the i915
      driver have also broken backlight control through the SABI interface.
      
      The intel_backlight driver OTOH works fine, and also allows for finer grained
      backlight control. So add a new use_native_backlight quirk, and replace the
      broken_acpi_video quirk with this quirk for affected models. This new quirk
      disables acpi-video as before and also stops samsung-laptop from registering
      the SABI based samsung_laptop backlight interface, leaving only the working
      intel_backlight interface.
      
      This commit enables this new quirk for 3 models which are known to be affected,
      chances are that it needs to be used on other models too.
      
      BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1094948 # N145P
      BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1115713 # N250P
      Reported-by: Bertrik Sikken <bertrik@sikken.nl> # N150P
      Cc: stable@vger.kernel.org # 3.16
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Darren Hart <dvhart@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • jffs2: fix handling of corrupted summary length · 37ae6d68
      Chen Jie authored
      commit 164c2406 upstream.
      
      sm->offset may be wrong even when the magic is right, because the offset is not covered by a CRC.
      
      Badness at c00c7580 [verbose debug info unavailable]
      NIP: c00c7580 LR: c00c718c CTR: 00000014
      REGS: df07bb40 TRAP: 0700   Not tainted  (2.6.34.13-WR4.3.0.0_standard)
      MSR: 00029000 <EE,ME,CE>  CR: 22084f84  XER: 00000000
      TASK = df84d6e0[908] 'mount' THREAD: df07a000
      GPR00: 00000001 df07bbf0 df84d6e0 00000000 00000001 00000000 df07bb58 00000041
      GPR08: 00000041 c0638860 00000000 00000010 22084f88 100636c8 df814ff8 00000000
      GPR16: df84d6e0 dfa558cc c05adb90 00000048 c0452d30 00000000 000240d0 000040d0
      GPR24: 00000014 c05ae734 c05be2e0 00000000 00000001 00000000 00000000 c05ae730
      NIP [c00c7580] __alloc_pages_nodemask+0x4d0/0x638
      LR [c00c718c] __alloc_pages_nodemask+0xdc/0x638
      Call Trace:
      [df07bbf0] [c00c718c] __alloc_pages_nodemask+0xdc/0x638 (unreliable)
      [df07bc90] [c00c7708] __get_free_pages+0x20/0x48
      [df07bca0] [c00f4a40] __kmalloc+0x15c/0x1ec
      [df07bcd0] [c01fc880] jffs2_scan_medium+0xa58/0x14d0
      [df07bd70] [c01ff38c] jffs2_do_mount_fs+0x1f4/0x6b4
      [df07bdb0] [c020144c] jffs2_do_fill_super+0xa8/0x260
      [df07bdd0] [c020230c] jffs2_fill_super+0x104/0x184
      [df07be00] [c0335814] get_sb_mtd_aux+0x9c/0xec
      [df07be20] [c033596c] get_sb_mtd+0x84/0x1e8
      [df07be60] [c0201ed0] jffs2_get_sb+0x1c/0x2c
      [df07be70] [c0103898] vfs_kern_mount+0x78/0x1e8
      [df07bea0] [c0103a58] do_kern_mount+0x40/0x100
      [df07bec0] [c011fe90] do_mount+0x240/0x890
      [df07bf10] [c0120570] sys_mount+0x90/0xd8
      [df07bf40] [c00110d8] ret_from_syscall+0x0/0x4
      
      === Exception: c01 at 0xff61a34
          LR = 0x100135f0
      Instruction dump:
      38800005 38600000 48010f41 4bfffe1c 4bfc2d15 4bfffe8c 72e90200 4082fc28
      3d20c064 39298860 8809000d 68000001 <0f000000> 2f800000 419efc0c 38000001
      mount: mounting /dev/mtdblock3 on /common failed: Input/output error
      Signed-off-by: Chen Jie <chenjie6@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • EDAC, amd64_edac: Prevent OOPS with >16 memory controllers · 6a35db99
      Daniel J Blueman authored
      commit 0c510cc8 upstream.
      
      When DRAM errors occur on memory controllers after EDAC_MAX_MCS (16),
      the kernel fatally dereferences unallocated structures, see splat below;
      this occurs on at least NumaConnect systems.
      
      Fix by checking if a memory controller info structure was found.
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000320
      IP: [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
      PGD 2f8b5a3067 PUD 2f8b5a2067 PMD 0
      Oops: 0000 [#2] SMP
      Modules linked in:
      CPU: 224 PID: 11930 Comm: stream_c.exe.gn Tainted: G   D    3.19.0 #1
      Hardware name: Supermicro H8QGL/H8QGL, BIOS 3.5b    01/28/2015
      task: ffff8807dbfb8c00 ti: ffff8807dd16c000 task.ti: ffff8807dd16c000
      RIP: 0010:[<ffffffff819f714f>] [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0
      RSP: 0000:ffff8907dfc03c48 EFLAGS: 00010297
      RAX: 0000000000000001 RBX: 9c67400010080a13 RCX: 0000000000001dc6
      RDX: 000000001dc61dc6 RSI: ffff8907dfc03df0 RDI: 000000000000001c
      RBP: ffff8907dfc03ce8 R08: 0000000000000000 R09: 0000000000000022
      R10: ffff891fffa30380 R11: 00000000001cfc90 R12: 0000000000000008
      R13: 0000000000000000 R14: 000000000000001c R15: 00009c6740001000
      FS: 00007fa97ee18700(0000) GS:ffff8907dfc00000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000320 CR3: 0000003f889b8000 CR4: 00000000000407e0
      Stack:
       0000000000000000 ffff8907dfc03df0 0000000000000008 9c67400010080a13
       000000000000001c 00009c6740001000 ffff8907dfc03c88 ffffffff810e4f9a
       ffff8907dfc03ce8 ffffffff81b375b9 0000000000000000 0000000000000010
      Call Trace:
       <IRQ>
       ? vprintk_default
       ? printk
       amd_decode_mce
       notifier_call_chain
       atomic_notifier_call_chain
       mce_log
       machine_check_poll
       mce_timer_fn
       ? mce_cpu_restart
       call_timer_fn.isra.29
       run_timer_softirq
       __do_softirq
       irq_exit
       smp_apic_timer_interrupt
       apic_timer_interrupt
       <EOI>
       ? down_read_trylock
       __do_page_fault
       ? __schedule
       do_page_fault
       page_fault
      Signed-off-by: Daniel J Blueman <daniel@numascale.com>
      Link: http://lkml.kernel.org/r/1424144078-24589-1-git-send-email-daniel@numascale.com
      [ Boris: massage commit message ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sb_edac: Fix detection on SNB machines · 90de8c9d
      Borislav Petkov authored
      commit 11249e73 upstream.
      
      d0585cd8 ("sb_edac: Claim a different PCI device") changed the
      probing of sb_edac to look for PCI device 0x3ca0:
      
      3f:0e.0 System peripheral: Intel Corporation Xeon E5/Core i7 Processor Home Agent (rev 07)
      00: 86 80 a0 3c 00 00 00 00 07 00 80 08 00 00 80 00
      ...
      
      but we're matching for 0x3ca8, i.e. PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA,
      in sbridge_probe(), so the probing fails.
      
      Changing it to probe for 0x3ca0 (PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0),
      i.e., the 14.0 device, fixes the issue and the driver loads successfully
      again:
      
      [ 2449.013120] EDAC DEBUG: sbridge_init:
      [ 2449.017029] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
      [ 2449.022368] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca0
      [ 2449.028498] EDAC sbridge: Seeking for: PCI ID 8086:3ca0
      [ 2449.033768] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
      [ 2449.039028] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca8
      [ 2449.045155] EDAC sbridge: Seeking for: PCI ID 8086:3ca8
      ...
      
      Add a debug printk while at it to be able to catch the failure in the
      future and dump driver version on successful load.
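
The device ID can be read straight off the configuration-space dump quoted above. The sketch below assumes only the standard PCI header layout (16-bit little-endian vendor ID at offset 0x00, device ID at offset 0x02); `read_le16()`, the offset macros and the array name are hypothetical, and the bytes are copied from the "00:" line of the dump.

```c
#include <stdint.h>

#define CFG_VENDOR_ID_OFFSET 0x00	/* standard PCI header offsets */
#define CFG_DEVICE_ID_OFFSET 0x02

/* Decode a 16-bit little-endian field from a config-space byte dump. */
static uint16_t read_le16(const uint8_t *cfg, unsigned int offset)
{
	return (uint16_t)(cfg[offset] | (cfg[offset + 1] << 8));
}

/* First 16 bytes of device 3f:0e.0, as shown in the dump above. */
static const uint8_t sbridge_ha0_cfg[16] = {
	0x86, 0x80, 0xa0, 0x3c, 0x00, 0x00, 0x00, 0x00,
	0x07, 0x00, 0x80, 0x08, 0x00, 0x00, 0x80, 0x00,
};
```

Decoding offsets 0x00 and 0x02 gives 0x8086 and 0x3ca0, which matches the "PCI ID 8086:3ca0" lines in the log.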
      
      Fixes: d0585cd8 ("sb_edac: Claim a different PCI device")
      Acked-by: Aristeu Rozanski <aris@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Acked-by: Andy Lutomirski <luto@amacapital.net>
      Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • md/raid1: fix read balance when a drive is write-mostly. · 177f6a7e
      Tomáš Hodek authored
      commit d1901ef0 upstream.
      
      When a drive is marked write-mostly it should only be the
      target of reads if there is no other option.
      
      This behaviour was broken by
      
      commit 9dedf603
          md/raid1: read balance chooses idlest disk for SSD
      
      which causes a write-mostly device to be *preferred* in some cases.
      
      Restore correct behaviour by checking and setting
      best_dist_disk and best_pending_disk rather than best_disk.
      
      We only need to test one of these as they are both changed
      from -1 to >=0 at the same time.
      
      As we leave min_pending and best_dist unchanged, any non-write-mostly
      device will appear better than the write-mostly device.
      Reported-by: Tomáš Hodek <tomas.hodek@volny.cz>
      Reported-by: Dark Penguin <darkpenguin@yandex.ru>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Link: http://marc.info/?l=linux-raid&m=135982797322422
      Fixes: 9dedf603
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • md/raid5: Fix livelock when array is both resyncing and degraded. · dbf3bbd1
      NeilBrown authored
      commit 26ac1073 upstream.
      
      Commit a7854487:
        md: When RAID5 is dirty, force reconstruct-write instead of read-modify-write.
      
      causes an RCW cycle to be forced even when the array is degraded.
      A degraded array cannot support RCW as that requires reading all data
      blocks, and one may be missing.
      
      Forcing an RCW when it is not possible causes a live-lock and the code
      spins, repeatedly deciding to do something that cannot succeed.
      
      So change the condition to only force RCW on non-degraded arrays.
      Reported-by: Manibalan P <pmanibalan@amiindia.co.in>
      Bisected-by: Jes Sorensen <Jes.Sorensen@redhat.com>
      Tested-by: Jes Sorensen <Jes.Sorensen@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Fixes: a7854487
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • perf tools: Fix probing for PERF_FLAG_FD_CLOEXEC flag · f2ee626b
      Adrian Hunter authored
      commit 48536c91 upstream.
      
      Commit f6edb53c converted the probe to
      a CPU wide event first (pid == -1). For kernels that do not support
      the PERF_FLAG_FD_CLOEXEC flag the probe fails with EINVAL. Since this
      errno is not handled, pid is not reset to 0 and the subsequent use of
      pid = -1 as an argument brings in an additional failure path if
      perf_event_paranoid > 0:
      
      $ perf record -- sleep 1
      perf_event_open(..., 0) failed unexpectedly with error 13 (Permission denied)
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.007 MB /tmp/perf.data (11 samples) ]
      
      Also, ensure the fd of the confirmation check is closed and comment why
      pid = -1 is used.
      
      Needs to go to 3.18 stable tree as well.
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Based-on-patch-by: David Ahern <david.ahern@oracle.com>
      Acked-by: David Ahern <david.ahern@oracle.com>
      Cc: David Ahern <dsahern@gmail.com>
      Link: http://lkml.kernel.org/r/54EC610C.8000403@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f2ee626b
    • Matthias Brugger's avatar
      clocksource: mtk: Fix race conditions in probe code · df1d6514
      Matthias Brugger authored
      commit d4a19eb3 upstream.
      
      We have two race conditions in the probe code which could lead to a null
      pointer dereference in the interrupt handler.
      
      The interrupt handler accesses the clockevent device, which may not yet be
      registered.
      
      First race condition happens when the interrupt handler gets registered before
      the interrupts get disabled. The second race condition happens when the
      interrupts get enabled, but the clockevent device is not yet registered.
      
      Fix that by disabling the interrupts before we register the interrupt
      handler and enabling the interrupts after the clockevent device has
      been registered.
      Reported-by: Gongbae Park <yongbae2@gmail.com>
      Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      df1d6514
    • James Hogan's avatar
      metag: Fix KSTK_EIP() and KSTK_ESP() macros · 32effd19
      James Hogan authored
      commit c2996cb2 upstream.
      
      The KSTK_EIP() and KSTK_ESP() macros should return the user program
      counter (PC) and stack pointer (A0StP) of the given task. These are used
      to determine which VMA corresponds to the user stack in
      /proc/<pid>/maps, and for the user PC & A0StP in /proc/<pid>/stat.
      
      However for Meta the PC & A0StP from the task's kernel context are used,
      resulting in broken output. For example, in the following /proc/<pid>/maps
      output, the 3afff000-3b021000 VMA should be described as the stack:
      
        # cat /proc/self/maps
        ...
        100b0000-100b1000 rwxp 00000000 00:00 0          [heap]
        3afff000-3b021000 rwxp 00000000 00:00 0
      
      And in the following /proc/<pid>/stat output, the PC is in kernel code
      (1074234964 = 0x40078654) and the A0StP is in the kernel heap
      (1335981392 = 0x4fa17550):
      
        # cat /proc/self/stat
        51 (cat) R ... 1335981392 1074234964 ...
      
      Fix the definitions of KSTK_EIP() and KSTK_ESP() to use
      task_pt_regs(tsk)->ctx rather than (tsk)->thread.kernel_context. This
      gets the registers from the user context stored after the thread info at
      the base of the kernel stack, which is from the last entry into the
      kernel from userland, regardless of where in the kernel the task may
      have been interrupted, which results in the following more correct
      /proc/<pid>/maps output:
      
        # cat /proc/self/maps
        ...
        0800b000-08070000 r-xp 00000000 00:02 207        /lib/libuClibc-0.9.34-git.so
        ...
        100b0000-100b1000 rwxp 00000000 00:00 0          [heap]
        3afff000-3b021000 rwxp 00000000 00:00 0          [stack]
      
      And /proc/<pid>/stat now correctly reports the PC in libuClibc
      (134320308 = 0x80190b4) and the A0StP in the [stack] region (989864576 =
      0x3b002280):
      
        # cat /proc/self/stat
        51 (cat) R ... 989864576 134320308 ...
      Reported-by: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
      Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com>
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: linux-metag@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      32effd19
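
The decimal values quoted in the metag entry above can be checked against the listed VMA ranges with a small helper. This is only a sketch for cross-checking the numbers in the changelog; `struct vma_range` and `addr_in_vma()` are hypothetical names and not kernel interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one /proc/<pid>/maps line. */
struct vma_range {
    uint32_t start;  /* first address of the mapping */
    uint32_t end;    /* one past the last address, as printed in maps */
};

/* True if addr lies inside the half-open range [start, end). */
static bool addr_in_vma(uint32_t addr, struct vma_range v)
{
    return addr >= v.start && addr < v.end;
}
```

For instance, the fixed stack pointer 989864576 (0x3b002280) falls inside 3afff000-3b021000, while the value reported before the fix, 1335981392 (0x4fa17550), does not.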
    • Jan Kara's avatar
      xfs: Fix quota type in quota structures when reusing quota file · 1a5f2138
      Jan Kara authored
      commit dfcc70a8 upstream.
      
      For filesystems without a separate project quota inode field in the
      superblock, we just reuse the project quota file for group quotas (and
      vice versa) if the project quota file is allocated and we need a group
      quota file. When we reuse the file, the quota structures on disk
      suddenly have the wrong type stored in d_flags, though. Nobody really
      cares about this (although the structure type reported to userspace
      was wrong as well), except that after commit 14bf61ff (quota: Switch
      ->get_dqblk() and ->set_dqblk() to use bytes as space units) an
      assertion in xfs_qm_scall_getquota() started to trigger on the xfs/106
      test (apparently I was testing without XFS_DEBUG, so I didn't notice
      when submitting the above commit).
      
      Fix the problem by properly resetting ddq->d_flags when running quotacheck
      for a quota file.
      Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a5f2138
    • Nicolas Saenz Julienne's avatar
      gpio: tps65912: fix wrong container_of arguments · 171f9925
      Nicolas Saenz Julienne authored
      commit 2f97c20e upstream.
      
      The gpio_chip operations receive a pointer to the gpio_chip struct, which
      is contained in the driver's private struct, yet the container_of calls in
      those functions point to the mfd struct defined in
      include/linux/mfd/tps65912.h.
      Signed-off-by: Nicolas Saenz Julienne <nicolassaenzj@gmail.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      171f9925
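
The container_of pattern behind the tps65912 entry above can be sketched in userspace as follows. The structure layouts and the names `struct gpio_priv`, `struct mfd_dev`, `to_priv()` and `chip_parent_id()` are hypothetical; only the idea of recovering the enclosing private struct from the embedded gpio_chip comes from the changelog.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins; the real structures have different fields. */
struct gpio_chip { int ngpio; };
struct mfd_dev   { int id; };

struct gpio_priv {
    struct mfd_dev  *mfd;  /* pointer to the parent MFD device */
    struct gpio_chip gc;   /* embedded chip handed to the GPIO core */
};

/* Correct: the chip is embedded in the private struct, so name that type. */
static struct gpio_priv *to_priv(struct gpio_chip *gc)
{
    return container_of(gc, struct gpio_priv, gc);
}

/* Reach the parent device through the private struct, not directly. */
static int chip_parent_id(struct gpio_chip *gc)
{
    return to_priv(gc)->mfd->id;
}
```

Naming a struct that does not actually embed the chip would make the pointer arithmetic land on unrelated memory, which is the class of bug the entry describes.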
    • Hans Holmberg's avatar
      gpiolib: of: allow of_gpiochip_find_and_xlate to find more than one chip per node · 7229e9bf
      Hans Holmberg authored
      commit 9cf75e9e upstream.
      
      The change:
      
      7b8792bb
      gpiolib: of: Correct error handling in of_get_named_gpiod_flags
      
      assumed that only one gpio-chip is registered per of-node.
      Some drivers register more than one chip per of-node, so
      adjust the matching function of_gpiochip_find_and_xlate to
      not stop looking for chips if a node-match is found and
      the translation fails.
      
      Fixes: 7b8792bb ("gpiolib: of: Correct error handling in of_get_named_gpiod_flags")
      Signed-off-by: Hans Holmberg <hans.holmberg@intel.com>
      Acked-by: Alexandre Courbot <acourbot@nvidia.com>
      Tested-by: Robert Jarzmik <robert.jarzmik@free.fr>
      Tested-by: Tyler Hall <tylerwhall@gmail.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7229e9bf
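
The matching behaviour described in the gpiolib entry above (keep looking when a node matches but translation fails) can be modelled roughly like this. Everything here is a hypothetical model: `struct chip`, `chip_xlate()` and `find_gpio()` are illustration names, and the assumption that each chip serves a contiguous range of lines on its node is made up for the example.

```c
#include <stddef.h>

/* Hypothetical model: each chip serves a contiguous range of lines on a node. */
struct chip {
    int node;        /* identifier of the device-tree node the chip belongs to */
    int first_line;  /* first node-relative line handled by this chip */
    int nlines;      /* number of lines handled by this chip */
    int base;        /* global GPIO number of the chip's first line */
};

/* Translate a node-relative line to a global GPIO number, or -1 on failure. */
static int chip_xlate(const struct chip *c, int line)
{
    if (line < c->first_line || line >= c->first_line + c->nlines)
        return -1;
    return c->base + (line - c->first_line);
}

/* Search all chips; a failed translation does not end the search. */
static int find_gpio(const struct chip *chips, size_t n, int node, int line)
{
    for (size_t i = 0; i < n; i++) {
        int gpio;

        if (chips[i].node != node)
            continue;
        gpio = chip_xlate(&chips[i], line);
        if (gpio >= 0)
            return gpio;
        /* Node matched but translation failed: try the next chip. */
    }
    return -1;
}
```

Stopping at the first node match would make every line served by a second chip on the same node unreachable in this model.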
    • Catalin Marinas's avatar
      arm64: compat Fix siginfo_t -> compat_siginfo_t conversion on big endian · a4fedc85
      Catalin Marinas authored
      commit 9d42d48a upstream.
      
      The native (64-bit) sigval_t union contains sival_int (32-bit) and
      sival_ptr (64-bit). When a compat application invokes a syscall that
      takes a sigval_t value (as part of a larger structure, e.g.
      compat_sys_mq_notify, compat_sys_timer_create), the compat_sigval_t
      union is converted to the native sigval_t with sival_int overlapping
      with either the least or the most significant half of sival_ptr,
      depending on endianness. When the corresponding signal is delivered to a
      compat application, on big endian the current (compat_uptr_t)sival_ptr
      cast always returns 0 since sival_int corresponds to the top part of
      sival_ptr. This patch fixes copy_siginfo_to_user32() so that sival_int
      is copied to the compat_siginfo_t structure.
      Reported-by: Bamvor Jian Zhang <bamvor.zhangjian@huawei.com>
      Tested-by: Bamvor Jian Zhang <bamvor.zhangjian@huawei.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a4fedc85
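
The big-endian overlap described in the arm64 compat entry above can be reproduced on any host by laying out the bytes by hand. This is a sketch under that assumption; the helper names `be_sigval_from_int()`, `compat_from_ptr()` and `compat_from_int()` are hypothetical and do not appear in the kernel.

```c
#include <stdint.h>

/*
 * Build the big-endian image of a 64-bit sigval union after a 32-bit
 * sival_int was stored into it, then read the whole union as sival_ptr.
 */
static uint64_t be_sigval_from_int(uint32_t sival_int)
{
    unsigned char bytes[8] = { 0 };
    uint64_t ptr = 0;
    int i;

    /* sival_int occupies the first four bytes (lowest addresses). */
    bytes[0] = (unsigned char)(sival_int >> 24);
    bytes[1] = (unsigned char)(sival_int >> 16);
    bytes[2] = (unsigned char)(sival_int >> 8);
    bytes[3] = (unsigned char)sival_int;

    /* Interpret all eight bytes as one big-endian 64-bit value. */
    for (i = 0; i < 8; i++)
        ptr = (ptr << 8) | bytes[i];
    return ptr;
}

/* Truncating cast: keeps only the least significant half. */
static uint32_t compat_from_ptr(uint64_t sival_ptr)
{
    return (uint32_t)sival_ptr;
}

/* On big endian, sival_int is the most significant half of sival_ptr. */
static uint32_t compat_from_int(uint64_t sival_ptr)
{
    return (uint32_t)(sival_ptr >> 32);
}
```

In this model the truncating cast yields 0 for any stored integer, matching the symptom in the changelog, while reading the half that sival_int actually occupies recovers the value.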
    • Martin Vajnar's avatar
      hx4700: regulator: declare full constraints · 17392a1a
      Martin Vajnar authored
      commit a52d2093 upstream.
      
      Since the removal of CONFIG_REGULATOR_DUMMY option, the touchscreen stopped
      working. This patch enables the "replacement" for REGULATOR_DUMMY and
      allows the touchscreen to work even though there is no regulator for "vcc".
      Signed-off-by: Martin Vajnar <martin.vajnar@gmail.com>
      Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      17392a1a
    • Jiang Liu's avatar
      x86/xen: Treat SCI interrupt as normal GSI interrupt · d7e48886
      Jiang Liu authored
      commit b568b860 upstream.
      
      Currently Xen Domain0 has special treatment for the ACPI SCI interrupt:
      it initializes the irq for ACPI SCI at an early stage in a special way:
      xen_init_IRQ()
      	->pci_xen_initial_domain()
      		->xen_setup_acpi_sci()
      			Allocate and initialize irq for ACPI SCI
      
      Function xen_setup_acpi_sci() calls acpi_gsi_to_irq() to get an irq
      number for ACPI SCI. But unfortunately acpi_gsi_to_irq() depends on
      IOAPIC irqdomains through following path
      acpi_gsi_to_irq()
      	->mp_map_gsi_to_irq()
      		->mp_map_pin_to_irq()
      			->check IOAPIC irqdomain
      
      PV domains use Xen event based interrupt management and don't make
      use of the native IOAPIC, so no irqdomains are created for IOAPIC.
      This causes Xen domain0 to fail to install the interrupt handler for
      ACPI SCI, and all ACPI events will be lost. Please refer to:
      https://lkml.org/lkml/2014/12/19/178
      
      So the fix is to get rid of the special treatment for ACPI SCI and just
      treat ACPI SCI as a normal GSI interrupt:
      acpi_gsi_to_irq()
      	->acpi_register_gsi()
      		->acpi_register_gsi_xen()
      			->xen_register_gsi()
      
      With the above change, there's no need for xen_setup_acpi_sci() anymore.
      The above change also works with a bare metal kernel.
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: xen-devel@lists.xenproject.org
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Link: http://lkml.kernel.org/r/1421720467-7709-2-git-send-email-jiang.liu@linux.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d7e48886
    • David Hildenbrand's avatar
      KVM: s390: avoid memory leaks if __inject_vm() fails · 0b4a17fc
      David Hildenbrand authored
      commit 428d53be upstream.
      
      We have to delete the allocated interrupt info if __inject_vm() fails.
      
      Otherwise user space can keep flooding kvm with floating interrupts and
      provoke more and more memory leaks.
      Reported-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0b4a17fc
    • David Hildenbrand's avatar
      KVM: s390: floating irqs: fix user triggerable endless loop · 6d6cdcaf
      David Hildenbrand authored
      commit 8e2207cd upstream.
      
      If a vm with no VCPUs is created, the injection of a floating irq
      leads to an endless loop in the kernel.
      
      Let's skip the search for a destination VCPU for a floating irq if no
      VCPUs were created.
      Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d6cdcaf
    • David Hildenbrand's avatar
      KVM: s390: base hrtimer on a monotonic clock · 6d351cab
      David Hildenbrand authored
      commit 0ac96caf upstream.
      
      The hrtimer that handles the wait with enabled timer interrupts
      should not be disturbed by changes of the host time.
      
      This patch changes our hrtimer to be based on a monotonic clock.
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6d351cab
    • David Hildenbrand's avatar
      KVM: s390: forward hrtimer if guest ckc not pending yet · ed9eb285
      David Hildenbrand authored
      commit 2d00f759 upstream.
      
      Patch 0759d068 ("KVM: s390: cleanup handle_wait by reusing
      kvm_vcpu_block") changed the way pending guest clock comparator
      interrupts are detected. It was assumed that as soon as the hrtimer
      wakes up, the condition for the guest ckc is satisfied.
      
      This is however only true as long as adjclock() doesn't speed
      up the monotonic clock. Reason is that the hrtimer is based on
      CLOCK_MONOTONIC, the guest clock comparator detection is based
      on the raw TOD clock. If CLOCK_MONOTONIC runs faster than the
      TOD clock, the hrtimer wakes the target VCPU up too early and
      the target VCPU will not detect any pending interrupts, therefore
      going back to sleep. It will never be woken up again because the
      hrtimer has finished. The VCPU is stuck.
      
      As a quick fix, we have to forward the hrtimer until the guest
      clock comparator is really due, to guarantee properly timed wake
      ups.
      
      As the hrtimer callback might be triggered on another cpu, we
      have to make sure that the timer is really stopped and not currently
      executing the callback on another cpu. This can happen if the vcpu
      thread is scheduled onto another physical cpu, but the timer base
      is not migrated. So let's use hrtimer_cancel instead of try_to_cancel.
      
      A proper fix might be to introduce a RAW based hrtimer.
      Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed9eb285
    • Marcelo Tosatti's avatar
      KVM: x86: update masterclock values on TSC writes · d204feb2
      Marcelo Tosatti authored
      commit 7f187922 upstream.
      
      When the guest writes to the TSC, the masterclock TSC copy must be
      updated as well along with the TSC_OFFSET update, otherwise a negative
      tsc_timestamp is calculated at kvm_guest_time_update.
      
      Once "if (!vcpus_matched && ka->use_master_clock)" is simplified to
      "if (ka->use_master_clock)", the corresponding "if (!ka->use_master_clock)"
      becomes redundant, so remove the do_request boolean and collapse
      everything into a single condition.
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d204feb2
    • Jan Kara's avatar
      udf: Check length of extended attributes and allocation descriptors · f21d9d44
      Jan Kara authored
      commit 23b133bd upstream.
      
      Check length of extended attributes and allocation descriptors when
      loading inodes from disk. Otherwise corrupted filesystems could confuse
      the code and make the kernel oops.
      Reported-by: Carl Henrik Lunde <chlunde@ping.uio.no>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f21d9d44
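
The kind of length validation described in the udf entry above can be sketched as a bounds check against the block size. This is an illustration only: `inode_lengths_valid()` and its parameters are hypothetical, and the assumption that a fixed header, the extended attributes and the allocation descriptors must all fit in one block is a simplification made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check: a fixed header, the extended attributes and the
 * allocation descriptors must all fit inside one block of the given size.
 * Subtraction is used instead of addition so corrupted (huge) on-disk
 * lengths cannot overflow the arithmetic.
 */
static bool inode_lengths_valid(size_t blocksize, size_t header_len,
                                size_t len_eattr, size_t len_alloc)
{
    size_t remaining;

    if (header_len > blocksize)
        return false;
    remaining = blocksize - header_len;

    if (len_eattr > remaining)
        return false;
    remaining -= len_eattr;

    return len_alloc <= remaining;
}
```

Rejecting the inode early keeps later code from indexing past the end of the block buffer when the on-disk lengths are corrupted.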
    • Jan Kara's avatar
      udf: Remove repeated loads blocksize · f4145654
      Jan Kara authored
      commit 79144954 upstream.
      
      Store blocksize in a local variable in udf_fill_inode() since it is used
      a lot of times.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4145654
    • Markos Chandras's avatar
      MIPS: HTW: Prevent accidental HTW start due to nested htw_{start, stop} · 09f2e748
      Markos Chandras authored
      commit ed4cbc81 upstream.
      
      activate_mm() and switch_mm() call get_new_mmu_context() which in turn
      can enable the HTW before the entryhi is changed with the new ASID.
      Since the latter will enable the HTW in local_flush_tlb_all(),
      then there is a small timing window where the HTW is running with the
      new ASID but with an old pgd since the TLBMISS_HANDLER_SETUP_PGD
      hasn't assigned a new one yet. In order to prevent that, we introduce a
      simple htw counter to avoid starting HTW accidentally due to nested
      htw_{start,stop}() sequences. Moreover, since various IPI calls can
      enforce TLB flushing operations on a different core, such an operation
      may interrupt another htw_{stop,start} in progress, leading to inconsistent
      updates of the htw_seq variable. In order to avoid that, we disable the
      interrupts whenever we update that variable.
      Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/9118/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      09f2e748
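
The nesting counter described in the MIPS HTW entry above can be modelled as follows. This is a userspace sketch: `htw_running` stands in for the hardware enable bit, and the interrupt disabling that the changelog requires around each counter update is only noted in comments, not implemented.

```c
/* Nesting depth of outstanding htw_stop() calls. */
static int htw_seq;

/* Stand-in for the hardware walker enable bit (assumption for this sketch). */
static int htw_running = 1;

/* Stop the walker on the outermost stop only. */
static void htw_stop(void)
{
    /* The kernel would disable interrupts around this update. */
    if (htw_seq++ == 0)
        htw_running = 0;
}

/* Restart the walker only when the outermost stop is being undone. */
static void htw_start(void)
{
    /* The kernel would disable interrupts around this update. */
    if (--htw_seq == 0)
        htw_running = 1;
}
```

With the counter, an inner htw_start() issued while an outer htw_stop() is still in effect leaves the walker stopped until the outer caller is finished.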
    • Alexey Brodkin's avatar
      ARC: fix page address calculation if PAGE_OFFSET != LINUX_LINK_BASE · b3b345af
      Alexey Brodkin authored
      commit 06f34e1c upstream.
      
      We used to calculate page address differently in 2 cases:
      
      1. In virt_to_page(x) we do
       --->8---
       mem_map + (x - CONFIG_LINUX_LINK_BASE) >> PAGE_SHIFT
       --->8---
      
      2. In pte_page(x) we do
       --->8---
       mem_map + (pte_val(x) - PAGE_OFFSET) >> PAGE_SHIFT
       --->8---
      
      That leads to problems in case PAGE_OFFSET != CONFIG_LINUX_LINK_BASE -
      different pages will be selected depending on where and how we calculate
      page address.
      
      In particular in the STAR 9000853582 when gdb attempted to read memory
      of another process it got improper page in get_user_pages() because this
      is exactly one of the places where we search for a page by pte_page().
      
      The fix is trivial - we need to calculate page address similarly in both
      cases.
      Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3b345af
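
The mismatch described in the ARC entry above comes down to subtracting two different bases before shifting. The sketch below shows the effect with made-up numbers: the `PAGE_SHIFT` value, both base addresses in the usage note, and the helper name `page_index()` are assumptions for illustration only.

```c
#include <stdint.h>

#define PAGE_SHIFT 13  /* assumed page size of 8 KiB, for illustration only */

/* Index into mem_map for an address, relative to the given base. */
static uint32_t page_index(uint32_t addr, uint32_t base)
{
    return (addr - base) >> PAGE_SHIFT;
}
```

For address 0xC0004000, a base of 0xC0000000 gives index 2, while a base of 0x80000000 gives index 131074, so two code paths using different bases would pick different pages for the same address.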
    • Stefan Agner's avatar
      serial: fsl_lpuart: avoid new transfer while DMA is running · 8f9b87b3
      Stefan Agner authored
      commit 5f1437f6 upstream.
      
      When the UART is in DMA receive mode (RDMAS set) and one character
      just arrived while another interrupt is handled (e.g. TX), the RDRF
      (receiver data register full flag) is set due to the water level of
      1. But since the DMA will take care of this character, there is no
      need to handle it by calling lpuart_prepare_rx. Handling it leads to
      adding the RX timeout timer twice:
      
      [   74.336698] Kernel BUG at 80053070 [verbose debug info unavailable]
      [   74.342999] Internal error: Oops - BUG: 0 [#1] ARM
      [   74.347817] Modules linked in:
      [   74.350926] CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00001-g39d78e2 #1788
      [   74.358617] Hardware name: Freescale Vybrid VF610 (Device Tree)
      [   74.364563] task: 807a7678 ti: 8079c000 task.ti: 8079c000
      [   74.370002] PC is at add_timer+0x24/0x28
      [   74.373960] LR is at lpuart_int+0x15c/0x3d8
      [   74.378171] pc : [<80053070>]    lr : [<802e0d88>]    psr: a0010193
      [   74.378171] sp : 8079de10  ip : 8079de20  fp : 8079de1c
      [   74.389694] r10: 807d44c0  r9 : 8688c300  r8 : 00000013
      [   74.394943] r7 : 20010193  r6 : 00000000  r5 : 000000a0  r4 : 86997210
      [   74.401498] r3 : ffffa7da  r2 : 80817868  r1 : 86997210  r0 : 86997344
      [   74.408052] Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
      [   74.415489] Control: 10c5387d  Table: 8611c059  DAC: 00000015
      [   74.421265] Process swapper (pid: 0, stack limit = 0x8079c230)
      ...
      
      Solve this by only executing the receiver path (lpuart_prepare_rx) if
      the DMA receive mode (RDMAS) is not set. Also, make sure the flag is
      cleared on initialization, in case it has been left set.
      
      This can be best reproduced using UART as a serial console, then
      running top while dd'ing data into the terminal.
      Signed-off-by: Stefan Agner <stefan@agner.ch>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8f9b87b3
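
The receive-path condition described in the fsl_lpuart entry above can be sketched as a predicate on two register values. The bit masks `STAT_RDRF` and `CTRL_RDMAS` and the helper `should_prepare_rx()` are hypothetical names and values for this illustration, not the driver's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this illustration. */
#define STAT_RDRF  0x20u  /* receive data register full */
#define CTRL_RDMAS 0x20u  /* receiver is serviced by DMA */

/* Handle the character in the interrupt path only when DMA will not. */
static bool should_prepare_rx(uint8_t status, uint8_t dma_ctrl)
{
    return (status & STAT_RDRF) && !(dma_ctrl & CTRL_RDMAS);
}
```

When the DMA select bit is set, the pending character is left to the DMA engine, so the receive timeout timer is not armed a second time.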
    • Stefan Agner's avatar
      serial: fsl_lpuart: delete timer on shutdown · 5716a781
      Stefan Agner authored
      commit 4a8588a1 upstream.
      
      If the serial port gets closed while a RX transfer is in progress,
      the timer might fire after the serial port shutdown finished. This
      leads to a NULL pointer dereference:
      
      [    7.508324] Unable to handle kernel NULL pointer dereference at virtual address 00000000
      [    7.516590] pgd = 86348000
      [    7.519445] [00000000] *pgd=86179831, *pte=00000000, *ppte=00000000
      [    7.526145] Internal error: Oops: 17 [#1] ARM
      [    7.530611] Modules linked in:
      [    7.533876] CPU: 0 PID: 123 Comm: systemd Not tainted 3.19.0-rc3-00004-g5b11ea7 #1778
      [    7.541827] Hardware name: Freescale Vybrid VF610 (Device Tree)
      [    7.547862] task: 861c3400 ti: 86ac8000 task.ti: 86ac8000
      [    7.553392] PC is at lpuart_timer_func+0x24/0xf8
      [    7.558127] LR is at lpuart_timer_func+0x20/0xf8
      [    7.562857] pc : [<802df99c>]    lr : [<802df998>]    psr: 600b0113
      [    7.562857] sp : 86ac9b90  ip : 86ac9b90  fp : 86ac9bbc
      [    7.574467] r10: 80817180  r9 : 80817b98  r8 : 80817998
      [    7.579803] r7 : 807acee0  r6 : 86989000  r5 : 00000100  r4 : 86997210
      [    7.586444] r3 : 86ac8000  r2 : 86ac9bc0  r1 : 86997210  r0 : 00000000
      [    7.593085] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
      [    7.600341] Control: 10c5387d  Table: 86348059  DAC: 00000015
      [    7.606203] Process systemd (pid: 123, stack limit = 0x86ac8230)
      
      Set up the timer on UART startup, which allows deleting the timer
      unconditionally on shutdown. This also saves the initialization
      on each transfer.
      Signed-off-by: Stefan Agner <stefan@agner.ch>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5716a781
    • John Stultz's avatar
      ntp: Fixup adjtimex freq validation on 32-bit systems · 20dcda8d
      John Stultz authored
      commit 29183a70 upstream.
      
      Additional validation of adjtimex freq values to avoid
      potential multiplication overflows was added in commit
      5e5aeb43 (time: adjtimex: Validate the ADJ_FREQUENCY values).
      
      Unfortunately the patch used LONG_MAX/MIN instead of
      LLONG_MAX/MIN, which was fine on 64-bit systems, but being
      much smaller on 32-bit systems caused false positives,
      causing most direct frequency adjustments to fail with
      EINVAL.
      
      ntpd only does direct frequency adjustments at startup, so
      the issue was not as easily observed there, but other time
      sync applications like ptpd and chrony were more affected by
      the bug.
      
      See bugs:
      
        https://bugzilla.kernel.org/show_bug.cgi?id=92481
        https://bugzilla.redhat.com/show_bug.cgi?id=1188074
      
      This patch changes the checks to use LLONG_MAX for
      clarity, and additionally the checks are disabled
      on 32-bit systems since LLONG_MAX/PPM_SCALE is always
      larger than the 32-bit long freq value, so multiplication
      overflows aren't possible there.
      Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
      Reported-by: George Joseph <george.joseph@fairview5.com>
      Tested-by: George Joseph <george.joseph@fairview5.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Link: http://lkml.kernel.org/r/1423553436-29747-1-git-send-email-john.stultz@linaro.org
      [ Prettified the changelog and the comments a bit. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      20dcda8d
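
The false positives described in the ntp entry above follow from dividing a 32-bit limit by the scale factor. The sketch below models both limits on any host; the `PPM_SCALE` value is an assumption (1000 << 16) not stated in this log, and `freq_ok()` is a hypothetical helper rather than the kernel's check.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed scale factor (1000 << 16); not taken from this changelog. */
#define PPM_SCALE 65536000LL

/* Accept freq only if freq * PPM_SCALE stays within [min, max]. */
static bool freq_ok(long long freq, long long min, long long max)
{
    return freq >= min / PPM_SCALE && freq <= max / PPM_SCALE;
}
```

Passing `INT32_MIN` and `INT32_MAX` models a 32-bit long: the allowed range shrinks to roughly plus or minus 32, so an ordinary value such as 100 is rejected. With `LLONG_MIN` and `LLONG_MAX` the same value passes.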
    • Jay Lan's avatar
      kdb: fix incorrect counts in KDB summary command output · ab66a1f3
      Jay Lan authored
      commit 14675592 upstream.
      
      The output of the KDB 'summary' command should report MemTotal, MemFree
      and Buffers in kB. The current code reports them in units of pages.
      
      A K(x) macro is defined in the code, but not used.
      
      This patch would apply the define to convert the values to kB.
      Please include me on Cc on replies. I do not subscribe to linux-kernel.
      Signed-off-by: Jay Lan <jlan@sgi.com>
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ab66a1f3
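
The kdb entry above converts page counts to kB. The macro body is not shown in this log, so the following is only a sketch of the conventional shift-based conversion; `pages_to_kb()` and its `page_shift` parameter are hypothetical names, and the code assumes a page size of at least 1 KiB.

```c
/*
 * Convert a page count to kB. A page holds 2^page_shift bytes and a kB
 * is 2^10 bytes, so each page is 2^(page_shift - 10) kB.
 * Assumes page_shift >= 10.
 */
static unsigned long pages_to_kb(unsigned long pages, unsigned int page_shift)
{
    return pages << (page_shift - 10);
}
```

With 4 KiB pages (`page_shift` of 12), three pages are reported as 12 kB rather than as the raw count 3.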
    • Arnd Bergmann's avatar
      ARM: mvebu: build armada375-smp code conditionally · 67d4f781
      Arnd Bergmann authored
      commit 16523518 upstream.
      
      mvebu_armada375_smp_wa_init is only used on armada 375 but is defined
      for all mvebu machines. As it calls a function that is only provided
      sometimes, this can result in a link error:
      
      arch/arm/mach-mvebu/built-in.o: In function `mvebu_armada375_smp_wa_init':
      :(.text+0x228): undefined reference to `mvebu_setup_boot_addr_wa'
      
      To solve this, we can just change the existing #ifdef around the
      function to also check for Armada375 SMP platforms.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: 305969fb ("ARM: mvebu: use the common function for Armada 375 SMP workaround")
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: Gregory Clement <gregory.clement@free-electrons.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      67d4f781
    • Arnd Bergmann's avatar
      ARM: vexpress: use ARM_CPU_SUSPEND if needed · 9487809a
      Arnd Bergmann authored
      commit 95fcedb0 upstream.
      
      The vexpress tc2 power management code calls mcpm_loopback, which
      is only available if ARM_CPU_SUSPEND is enabled, otherwise we
      get a link error:
      
      arch/arm/mach-vexpress/built-in.o: In function `tc2_pm_init':
      arch/arm/mach-vexpress/tc2_pm.c:389: undefined reference to `mcpm_loopback'
      
      This explicitly selects ARM_CPU_SUSPEND like other platforms that
      need it.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: 3592d7e0 ("ARM: 8082/1: TC2: test the MCPM loopback during boot")
      Acked-by: Nicolas Pitre <nico@linaro.org>
      Acked-by: Liviu Dudau <liviu.dudau@arm.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Sudeep Holla <sudeep.holla@arm.com>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9487809a
    • Dmitry Eremin-Solenikov's avatar
      ARM: pxa: add regulator_has_full_constraints to poodle board file · 4a6cc3ba
      Dmitry Eremin-Solenikov authored
      commit 9bc78f32 upstream.
      
      Add regulator_has_full_constraints() call to poodle board file to let
      regulator core know that we do not have any additional regulators left.
      This lets it substitute unprovided regulators with dummy ones.
      
      This fixes the following warnings that can be seen on poodle if
      regulators are enabled:
      
      ads7846 spi1.0: unable to get regulator: -517
      spi spi1.0: Driver ads7846 requests probe deferral
      wm8731 0-001b: Failed to get supply 'AVDD': -517
      wm8731 0-001b: Failed to request supplies: -517
      wm8731 0-001b: ASoC: failed to probe component -517
      Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Acked-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4a6cc3ba
    • Dmitry Eremin-Solenikov's avatar
      ARM: pxa: add regulator_has_full_constraints to corgi board file · 30cb324e
      Dmitry Eremin-Solenikov authored
      commit 271e8017 upstream.
      
      Add regulator_has_full_constraints() call to corgi board file to let
      regulator core know that we do not have any additional regulators left.
      This lets it substitute unprovided regulators with dummy ones.
      
      This fixes the following warnings that can be seen on corgi if
      regulators are enabled:
      
      ads7846 spi1.0: unable to get regulator: -517
      spi spi1.0: Driver ads7846 requests probe deferral
      wm8731 0-001b: Failed to get supply 'AVDD': -517
      wm8731 0-001b: Failed to request supplies: -517
      wm8731 0-001b: ASoC: failed to probe component -517
      corgi-audio corgi-audio: ASoC: failed to instantiate card -517
      Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Acked-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      30cb324e