1. 12 Mar, 2015 34 commits
    • Btrfs: fix data loss in the fast fsync path · cf217634
      Filipe Manana authored
      commit 3a8b36f3 upstream.
      
      When using the fast file fsync code path we can miss the fact that new
      writes happened since the last file fsync and therefore return without
      waiting for the IO to finish and write the new extents to the fsync log.
      
      Here's an example scenario where the fsync will miss the fact that new
      file data exists that wasn't yet durably persisted:
      
      1. fs_info->last_trans_committed == N - 1 and current transaction is
         transaction N (fs_info->generation == N);
      
      2. do a buffered write;
      
      3. fsync our inode, this clears our inode's full sync flag, starts
         an ordered extent and waits for it to complete - when it completes
         at btrfs_finish_ordered_io(), the inode's last_trans is set to the
         value N (via btrfs_update_inode_fallback -> btrfs_update_inode ->
         btrfs_set_inode_last_trans);
      
      4. transaction N is committed, so fs_info->last_trans_committed is now
         set to the value N and fs_info->generation remains with the value N;
      
      5. do another buffered write, when this happens btrfs_file_write_iter
         sets our inode's last_trans to the value N + 1 (that is
         fs_info->generation + 1 == N + 1);
      
      6. transaction N + 1 is started and fs_info->generation now has the
         value N + 1;
      
      7. transaction N + 1 is committed, so fs_info->last_trans_committed
         is set to the value N + 1;
      
      8. fsync our inode - because it doesn't have the full sync flag set,
         we only start the ordered extent, we don't wait for it to complete
         (only in a later phase) therefore its last_trans field has the
         value N + 1 set previously by btrfs_file_write_iter(), and so we
         have:
      
             inode->last_trans <= fs_info->last_trans_committed
                 (N + 1)              (N + 1)
      
         Which made us not log the last buffered write and exit the fsync
         handler immediately, returning success (0) to user space and resulting
         in data loss after a crash.
      
      This can actually be triggered deterministically and the following excerpt
      from a testcase I made for xfstests triggers the issue. It moves a dummy
      file across directories and then fsyncs the old parent directory - this
      is just to trigger a transaction commit, so moving files around isn't
      directly related to the issue but it was chosen because running 'sync' for
      example does more than just committing the current transaction, as it
      flushes/waits for all file data to be persisted. The issue can also happen
      at random periods, since the transaction kthread periodically commits the
      current transaction (about every 30 seconds by default).
      The body of the test is:
      
        _scratch_mkfs >> $seqres.full 2>&1
        _init_flakey
        _mount_flakey
      
        # Create our main test file 'foo', the one we check for data loss.
        # By doing an fsync against our file, it makes btrfs clear the 'needs_full_sync'
        # bit from its flags (btrfs inode specific flags).
        $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 8K" \
                        -c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io
      
        # Now create one other file and 2 directories. We will move this second file
        # from one directory to the other later because it forces btrfs to commit its
        # currently open transaction if we fsync the old parent directory. This is
        # necessary to trigger the data loss bug that affected btrfs.
        mkdir $SCRATCH_MNT/testdir_1
        touch $SCRATCH_MNT/testdir_1/bar
        mkdir $SCRATCH_MNT/testdir_2
      
        # Make sure everything is durably persisted.
        sync
      
        # Write 8Kb more data to our file.
        $XFS_IO_PROG -c "pwrite -S 0xbb 8K 8K" $SCRATCH_MNT/foo | _filter_xfs_io
      
        # Move our 'bar' file into a new directory.
        mv $SCRATCH_MNT/testdir_1/bar $SCRATCH_MNT/testdir_2/bar
      
        # Fsync our first directory. Because it had a file moved into some other
        # directory, this made btrfs commit the currently open transaction. This is
        # a condition necessary to trigger the data loss bug.
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir_1
      
        # Now fsync our main test file. If the fsync succeeds, we expect the 8Kb of
        # data we wrote previously to be persisted and available if a crash happens.
        # This did not happen with btrfs, because of the transaction commit that
        # happened when we fsynced the parent directory.
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo
      
        # Simulate a crash/power loss.
        _load_flakey_table $FLAKEY_DROP_WRITES
        _unmount_flakey
      
        _load_flakey_table $FLAKEY_ALLOW_WRITES
        _mount_flakey
      
        # Now check that all data we wrote before is available.
        echo "File content after log replay:"
        od -t x1 $SCRATCH_MNT/foo
      
        status=0
        exit
      
      The expected golden output for the test, which is what we get with this
      fix applied (or when running against ext3/4 and xfs), is:
      
        wrote 8192/8192 bytes at offset 0
        XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
        wrote 8192/8192 bytes at offset 8192
        XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
        File content after log replay:
        0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
        *
        0020000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
        *
        0040000
      
      Without this fix applied, the output shows the test file does not have
      the second 8Kb extent that we successfully fsynced:
      
        wrote 8192/8192 bytes at offset 0
        XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
        wrote 8192/8192 bytes at offset 8192
        XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
        File content after log replay:
        0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
        *
        0020000
      
      So fix this by skipping the fsync only if we're doing a full sync and
      if the inode's last_trans is <= fs_info->last_trans_committed, or if
      the inode is already in the log. Also remove setting the inode's
      last_trans in btrfs_file_write_iter since it's useless/unreliable.
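
      The difference between the old and the new skip decision can be sketched
      as a small user-space model. This is only an illustration of the condition
      described above, not the kernel code: the struct and function names are
      made up for the example, and the fields merely stand in for
      inode->last_trans, fs_info->last_trans_committed, the full sync flag and
      the result of btrfs_inode_in_log().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the state the commit message talks about (hypothetical type). */
struct fsync_state {
    uint64_t inode_last_trans;      /* stands in for inode->last_trans */
    uint64_t last_trans_committed;  /* stands in for fs_info->last_trans_committed */
    bool full_sync;                 /* stands in for the inode's full sync flag */
    bool inode_in_log;              /* stands in for btrfs_inode_in_log() */
};

/* Decision before the fix: the transaction comparison alone allows a skip. */
static bool can_skip_before_fix(const struct fsync_state *s)
{
    return s->inode_in_log ||
           s->inode_last_trans <= s->last_trans_committed;
}

/* Decision after the fix: the comparison is trusted only for a full sync. */
static bool can_skip_after_fix(const struct fsync_state *s)
{
    return s->inode_in_log ||
           (s->full_sync && s->inode_last_trans <= s->last_trans_committed);
}

/* Step 8 of the scenario, with N + 1 == 8 chosen arbitrarily. */
static void demo_step8(void)
{
    struct fsync_state s = {
        .inode_last_trans = 8,
        .last_trans_committed = 8,
        .full_sync = false,
        .inode_in_log = false,
    };

    assert(can_skip_before_fix(&s));   /* old check skips: data loss */
    assert(!can_skip_after_fix(&s));   /* new check logs the write */
}
```

      In this model the fast path (full sync flag clear) never skips on the
      transaction comparison alone, which is the property the fix relies on.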
      
      Also because btrfs_file_write_iter no longer sets inode->last_trans to
      fs_info->generation + 1, don't set last_trans to 0 if we bail out and don't
      bail out if last_trans is 0, otherwise something as simple as the following
      example wouldn't log the second write on the last fsync:
      
        1. write to file
      
        2. fsync file
      
        3. fsync file
             |--> btrfs_inode_in_log() returns true and it sets last_trans to 0
      
        4. write to file
             |--> btrfs_file_write_iter() no longer sets last_trans, so it
                  remained with a value of 0
        5. fsync
             |--> inode->last_trans == 0, so it bails out without logging the
                  second write
      
      A test case for xfstests will be sent soon.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • btrfs: fix lost return value due to variable shadowing · 3caad0ed
      David Sterba authored
      commit 1932b7be upstream.
      
      A block-local variable stores error code but btrfs_get_blocks_direct may
      not return it in the end as there's a ret defined in the function scope.
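
      The general shape of this kind of bug can be shown with a short
      stand-alone example. It is not taken from btrfs_get_blocks_direct; the
      function names and the do_step() helper are hypothetical and only
      illustrate how a block-local ret hides the function-scope one.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper standing in for an operation that can fail. */
static int do_step(int should_fail)
{
    return should_fail ? -EIO : 0;
}

/* Buggy shape: the inner 'ret' shadows the function-scope 'ret'. */
static int map_blocks_shadowed(int should_fail)
{
    int ret = 0;

    {
        int ret = do_step(should_fail);  /* block-local, shadows outer ret */
        if (ret)
            goto out;                    /* the error lives only in the inner ret */
    }
out:
    return ret;                          /* outer ret, still 0 */
}

/* Fixed shape: assign to the function-scope variable instead. */
static int map_blocks_fixed(int should_fail)
{
    int ret = 0;

    {
        ret = do_step(should_fail);
        if (ret)
            goto out;
    }
out:
    return ret;
}

static void demo_shadowing(void)
{
    assert(map_blocks_shadowed(1) == 0);    /* error silently lost */
    assert(map_blocks_fixed(1) == -EIO);    /* error propagated */
    assert(map_blocks_fixed(0) == 0);
}
```

      Building with -Wshadow makes the compiler point out the inner declaration.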
      
      Fixes: d187663e ("Btrfs: lock extents as we map them in DIO")
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Chris Mason <clm@fb.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • mei: make device disabled on stop unconditionally · 8607e6ba
      Alexander Usyskin authored
      commit 6c15a851 upstream.
      
      Set the internal device state to disabled after hardware reset in stop flow.
      This will cover cases when driver was not brought to disabled state because of
      an error and in stop flow we wish not to retry the reset.
      Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
      Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • iio: ad5686: fix optional reference voltage declaration · 182830fe
      Urs Fässler authored
      commit da019f59 upstream.
      
      When not using the "_optional" function, a dummy regulator is returned
      and the driver fails to initialize.
      Signed-off-by: Urs Fässler <urs.fassler@bytesatwork.ch>
      Acked-by: Lars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: Jonathan Cameron <jic23@kernel.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • iio: imu: adis16400: Fix sign extension · dac99fda
      Rasmus Villemoes authored
      commit 19e353f2 upstream.
      
      The intention is obviously to sign-extend a 12 bit quantity. But
      because of C's promotion rules, the assignment is equivalent to "val16
      &= 0xfff;". Use the proper API for this.
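
      A user-space sketch of the promotion problem follows. The buggy
      expression is an assumed mask-and-shift form that shows the effect
      described above (the driver line itself is not quoted here), and
      sign_extend_from() is a hypothetical stand-in written in the spirit of
      the kernel's sign_extend32() helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed buggy form: the operands are promoted to int, so both shifts
 * happen in 32 bits and bit 11 never reaches the sign bit. */
static int16_t extend12_buggy(int16_t val16)
{
    val16 = ((val16 & 0xFFF) << 4) >> 4;   /* same result as val16 &= 0xfff */
    return val16;
}

/* Hypothetical helper: 'index' is the bit position of the sign bit.
 * The final conversion assumes a two's complement target (gcc, clang). */
static int32_t sign_extend_from(uint32_t value, unsigned int index)
{
    uint32_t sign = UINT32_C(1) << index;

    value &= (sign << 1) - 1;                /* keep bits 0..index */
    return (int32_t)((value ^ sign) - sign); /* propagate the sign bit */
}

static void demo_sign_extension(void)
{
    /* 0xFFF is -1 when read as a 12-bit two's complement value. */
    assert(extend12_buggy(0x0FFF) == 4095);
    assert(sign_extend_from(0x0FFF, 11) == -1);
    assert(sign_extend_from(0x07FF, 11) == 2047);
}
```
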
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Acked-by: Lars-Peter Clausen <lars@metafoo.de>
      Signed-off-by: Jonathan Cameron <jic23@kernel.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • x86/asm/entry/64: Remove a bogus 'ret_from_fork' optimization · 93ba6108
      Andy Lutomirski authored
      commit 956421fb upstream.
      
      'ret_from_fork' checks TIF_IA32 to determine whether 'pt_regs' and
      the related state make sense for 'ret_from_sys_call'.  This is
      entirely the wrong check.  TS_COMPAT would make a little more
      sense, but there's really no point in keeping this optimization
      at all.
      
      This fixes a return to the wrong user CS if we came from int
      0x80 in a 64-bit task.
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/4710be56d76ef994ddf59087aad98c000fbab9a4.1424989793.git.luto@amacapital.net
      [ Backported from tip:x86/asm. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • target: Check for LBA + sectors wrap-around in sbc_parse_cdb · 14ee62dd
      Nicholas Bellinger authored
      commit aa179935 upstream.
      
      This patch adds a check to sbc_parse_cdb() in order to detect when
      an LBA + sector vs. end-of-device calculation wraps when the LBA is
      sufficiently large (eg: 0xFFFFFFFFFFFFFFFF).
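
      Why the wrap matters can be seen in a small stand-alone sketch. The
      function names and parameters below are hypothetical and the exact
      condition used in sbc_parse_cdb() is not reproduced here; the point is
      only that an unsigned sum can wrap and then look smaller than the device
      size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Naive check: lba + sectors can wrap past 2^64 and look small. */
static bool range_ok_naive(uint64_t lba, uint32_t sectors, uint64_t dev_blocks)
{
    return lba + sectors <= dev_blocks;
}

/* Checked variant: reject the request if the unsigned sum wrapped. */
static bool range_ok_checked(uint64_t lba, uint32_t sectors, uint64_t dev_blocks)
{
    uint64_t end = lba + sectors;

    if (end < lba)              /* wrap-around detected */
        return false;
    return end <= dev_blocks;
}

static void demo_lba_wrap(void)
{
    /* 0xFFFFFFFFFFFFFFFF + 8 wraps to 7, which passes the naive check. */
    assert(range_ok_naive(UINT64_MAX, 8, 1000));
    assert(!range_ok_checked(UINT64_MAX, 8, 1000));
    assert(range_ok_checked(0, 8, 1000));
    assert(!range_ok_checked(996, 8, 1000));
}
```
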
      
      Cc: Martin Petersen <martin.petersen@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • target: Add missing WRITE_SAME end-of-device sanity check · dffc0bca
      Nicholas Bellinger authored
      commit 8e575c50 upstream.
      
      This patch adds a check to sbc_setup_write_same() to verify
      the incoming WRITE_SAME LBA + number of blocks does not extend
      past the end-of-device.

      Also check for potential LBA wrap-around.
      Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Martin Petersen <martin.petersen@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • target: Fix PR_APTPL_BUF_LEN buffer size limitation · b18b42fe
      Nicholas Bellinger authored
      commit f161d4b4 upstream.
      
      This patch addresses the original PR_APTPL_BUF_LEN = 8k limitation
      for write-out of PR APTPL metadata that Martin has recently been
      running into.
      
      It changes core_scsi3_update_and_write_aptpl() to use vzalloc'ed
      memory instead of kzalloc, and increases the default hardcoded
      length to 256k.
      
      It also adds logic in core_scsi3_update_and_write_aptpl() to double
      the original length upon core_scsi3_update_aptpl_buf() failure, and
      retries until the vzalloc'ed buffer is large enough to accommodate
      the outgoing APTPL metadata.
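
      The retry-with-doubling pattern described above can be sketched in user
      space as follows. This is an illustration under assumptions, not the
      target code: calloc() stands in for vzalloc(), and fill_metadata() is a
      hypothetical routine that fails when the buffer is too small, playing the
      role of core_scsi3_update_aptpl_buf().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fill routine: returns -1 when 'len' is too small. */
static int fill_metadata(char *buf, size_t len, const char *payload)
{
    size_t need = strlen(payload) + 1;

    if (need > len)
        return -1;
    memcpy(buf, payload, need);
    return 0;
}

/* Allocate a zeroed buffer, and double its length until the payload fits. */
static char *build_with_retry(const char *payload, size_t len, size_t *final_len)
{
    while (len > 0) {
        char *buf = calloc(1, len);

        if (!buf)
            return NULL;
        if (fill_metadata(buf, len, payload) == 0) {
            *final_len = len;
            return buf;
        }
        free(buf);
        if (len > SIZE_MAX / 2)     /* doubling would overflow size_t */
            return NULL;
        len *= 2;
    }
    return NULL;
}

static void demo_retry(void)
{
    size_t n = 0;
    char *buf = build_with_retry("0123456789", 4, &n);

    assert(buf != NULL);
    assert(n == 16);                /* 4 -> 8 -> 16 */
    assert(strcmp(buf, "0123456789") == 0);
    free(buf);
}
```
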
      Reported-by: Martin Svec <martin.svec@zoner.cz>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • drm/radeon: workaround for CP HW bug on CIK · cfa1450a
      Christian König authored
      commit a9c73a0e upstream.
      
      Emit the EOP twice to avoid cache flushing problems.
      Signed-off-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • drm/radeon: only enable kv/kb dpm interrupts once v3 · ccc08964
      Alex Deucher authored
      commit 410af8d7 upstream.
      
      Enable at init and disable on fini. Workaround for hardware problems.
      
      v2 (chk): extend commit message
      v3: add new function
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Christian König <christian.koenig@amd.com> (v2)
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • mm/memory.c: actually remap enough memory · 970970dc
      Grazvydas Ignotas authored
      commit 9cb12d7b upstream.
      
      For whatever reason, generic_access_phys() only remaps one page, but
      actually allows access of arbitrary size.  It's quite easy to trigger
      large reads, like printing out a large structure with gdb, which leads to
      a crash.  Fix it by remapping the correct size.
      
      Fixes: 28b2ee20 ("access_process_vm device memory infrastructure")
      Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • mm/compaction: fix wrong order check in compact_finished() · 5a9653fd
      Joonsoo Kim authored
      commit 372549c2 upstream.
      
      What we want to check here is whether there is a high-order free page in
      the buddy list of another migratetype, so that we can steal it without
      fragmentation.  But the current code just checks cc->order, which is the
      order of the allocation request.  So this is wrong.

      Without this fix, non-movable synchronous compaction below pageblock order
      would not stop until compaction is complete, because the migratetype of
      most pageblocks is movable and the high-order free pages made by
      compaction are usually on the movable-type buddy list.

      There is a report related to this bug.  See the link below.

        http://www.spinics.net/lists/linux-mm/msg81666.html

      Although the affected system still has load spikes that come from
      compaction, this fix makes that system completely stable and responsive
      according to the reporter.

      The stress-highalloc test in mmtests with non-movable order 7 allocation
      doesn't show any notable difference in allocation success rate, but it
      shows a higher compaction success rate.
      
      Compaction success rate (Compaction success * 100 / Compaction stalls, %)
      18.47 : 28.94
      
      Fixes: 1fb3f8ca ("mm: compaction: capture a suitable high-order page immediately when it is made available")
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • mm/nommu.c: fix arithmetic overflow in __vm_enough_memory() · 56ade695
      Roman Gushchin authored
      commit 8138a67a upstream.
      
      I noticed that "allowed" can easily overflow by falling below 0, because
      (total_vm / 32) can be larger than "allowed".  The problem occurs in
      OVERCOMMIT_NONE mode.
      
      In this case, a huge allocation can succeed and overcommit the system
      (despite OVERCOMMIT_NONE mode).  All subsequent allocations will fail
      (system-wide), so the system becomes unusable.
      
      The problem was masked out by commit c9b1d098
      ("mm: limit growth of 3% hardcoded other user reserve"),
      but it's easy to reproduce it on older kernels:
      1) set overcommit_memory sysctl to 2
      2) mmap() large file multiple times (with VM_SHARED flag)
      3) try to malloc() large amount of memory
      
      It can also be reproduced on newer kernels, but a misconfigured
      sysctl_user_reserve_kbytes is required.
      
      Fix this issue by switching to signed arithmetic here.
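
      The effect of the unsigned subtraction can be shown with a simplified
      model. The functions below are hypothetical and leave out everything else
      __vm_enough_memory() does; they only contrast what happens to "allowed"
      when (total_vm / 32) is larger than it.

```c
#include <assert.h>
#include <stdbool.h>

/* Unsigned model: the subtraction wraps to a huge value instead of going
 * below zero, so almost any request appears to fit. */
static bool enough_unsigned(unsigned long allowed, unsigned long total_vm,
                            unsigned long request)
{
    allowed -= total_vm / 32;
    return request < allowed;
}

/* Signed model: the result may go negative, which rejects the request. */
static bool enough_signed(long allowed, long total_vm, long request)
{
    allowed -= total_vm / 32;
    return request < allowed;
}

static void demo_overflow(void)
{
    /* 6400 / 32 == 200, which is larger than allowed == 100. */
    assert(enough_unsigned(100, 6400, 1000));   /* wrongly accepted */
    assert(!enough_signed(100, 6400, 1000));    /* correctly rejected */
    assert(enough_signed(1000, 6400, 500));     /* 500 < 800 still fits */
}
```
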
      Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Andrew Shewmaker <agshew@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • mm/mmap.c: fix arithmetic overflow in __vm_enough_memory() · da94166f
      Roman Gushchin authored
      commit 5703b087 upstream.
      
      I noticed that "allowed" can easily overflow by falling below 0,
      because (total_vm / 32) can be larger than "allowed".  The problem
      occurs in OVERCOMMIT_NONE mode.
      
      In this case, a huge allocation can succeed and overcommit the system
      (despite OVERCOMMIT_NONE mode).  All subsequent allocations will fail
      (system-wide), so the system becomes unusable.
      
      The problem was masked out by commit c9b1d098
      ("mm: limit growth of 3% hardcoded other user reserve"),
      but it's easy to reproduce it on older kernels:
      1) set overcommit_memory sysctl to 2
      2) mmap() large file multiple times (with VM_SHARED flag)
      3) try to malloc() large amount of memory
      
      It can also be reproduced on newer kernels, but a misconfigured
      sysctl_user_reserve_kbytes is required.
      
      Fix this issue by switching to signed arithmetic here.
      
      [akpm@linux-foundation.org: use min_t]
      Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Andrew Shewmaker <agshew@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • mm/hugetlb: add migration entry check in __unmap_hugepage_range · a6b3222b
      Naoya Horiguchi authored
      commit 9fbc1f63 upstream.
      
      If __unmap_hugepage_range() tries to unmap the address range over which
      hugepage migration is on the way, we get the wrong page because pte_page()
      doesn't work for migration entries.  This patch simply clears the pte for
      migration entries as we do for hwpoison entries.
      
      Fixes: 290408d4 ("hugetlb: hugepage migration core")
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • mm/hugetlb: add migration/hwpoisoned entry check in hugetlb_change_protection · ea47f034
      Naoya Horiguchi authored
      commit a8bda28d upstream.
      
      There is a race condition between hugepage migration and
      change_protection(), where hugetlb_change_protection() doesn't care about
      migration entries and wrongly overwrites them.  That causes unexpected
      results such as a kernel crash.  HWPoison entries can also cause the same
      problem.
      
      This patch adds is_hugetlb_entry_(migration|hwpoisoned) check in this
      function to do proper actions.
      
      Fixes: 290408d4 ("hugetlb: hugepage migration core")
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • team: don't traverse port list using rcu in team_set_mac_address · c272044a
      Jiri Pirko authored
      [ Upstream commit 9215f437 ]
      
      Currently the list is traversed using the rcu variant. That is not correct
      since dev_set_mac_address can be called, which eventually calls
      rtmsg_ifinfo_build_skb, and there skb allocation can sleep. So fix this
      by removing the rcu usage here.
      
      Fixes: 3d249d4c "net: introduce ethernet teaming device"
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • net: ping: Return EAFNOSUPPORT when appropriate. · 7a763ff1
      Lorenzo Colitti authored
      [ Upstream commit 9145736d ]
      
      1. For an IPv4 ping socket, ping_check_bind_addr does not check
         the family of the socket address that's passed in. Instead,
         make it behave like inet_bind, which enforces either that the
         address family is AF_INET, or that the family is AF_UNSPEC and
         the address is 0.0.0.0.
      2. For an IPv6 ping socket, ping_check_bind_addr returns EINVAL
         if the socket family is not AF_INET6. Return EAFNOSUPPORT
         instead, for consistency with inet6_bind.
      3. Make ping_v4_sendmsg and ping_v6_sendmsg return EAFNOSUPPORT
         instead of EINVAL if an incorrect socket address structure is
         passed in.
      4. Make IPv6 ping sockets be IPv6-only. The code does not support
         IPv4, and it cannot easily be made to support IPv4 because
         the protocol numbers for ICMP and ICMPv6 are different. This
         makes connect(::ffff:192.0.2.1) fail with EAFNOSUPPORT instead
         of making the socket unusable.
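
      The family check described in item 1 can be sketched in user space as
      below. The function name is hypothetical and this is not the code of
      ping_check_bind_addr; it only models the rule that the family must be
      AF_INET, or AF_UNSPEC with address 0.0.0.0, and that anything else is
      answered with EAFNOSUPPORT.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical model of an inet_bind-style family check for IPv4. */
static int check_v4_bind_family(const struct sockaddr_in *addr)
{
    if (addr->sin_family == AF_INET)
        return 0;
    if (addr->sin_family == AF_UNSPEC &&
        addr->sin_addr.s_addr == htonl(INADDR_ANY))
        return 0;
    return -EAFNOSUPPORT;
}

static void demo_bind_family(void)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));

    addr.sin_family = AF_INET6;             /* wrong family for a v4 socket */
    assert(check_v4_bind_family(&addr) == -EAFNOSUPPORT);

    addr.sin_family = AF_UNSPEC;            /* allowed only with 0.0.0.0 */
    assert(check_v4_bind_family(&addr) == 0);

    addr.sin_addr.s_addr = htonl(0x7f000001);
    assert(check_v4_bind_family(&addr) == -EAFNOSUPPORT);

    addr.sin_family = AF_INET;
    assert(check_v4_bind_family(&addr) == 0);
}
```
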
      
      Among other things, this fixes an oops that can be triggered by:
      
          int s = socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);
          struct sockaddr_in6 sin6 = {
              .sin6_family = AF_INET6,
              .sin6_addr = in6addr_any,
          };
          bind(s, (struct sockaddr *) &sin6, sizeof(sin6));
      
      Change-Id: If06ca86d9f1e4593c0d6df174caca3487c57a241
      Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • udp: only allow UFO for packets from SOCK_DGRAM sockets · 4f958642
      Michal Kubeček authored
      [ Upstream commit acf8dd0a ]
      
      If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a
      UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to
      CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer
      checksum is to be computed on segmentation. However, in this case,
      skb->csum_start and skb->csum_offset are never set as raw socket
      transmit path bypasses udp_send_skb() where they are usually set. As a
      result, driver may access invalid memory when trying to calculate the
      checksum and store the result (as observed in virtio_net driver).
      
      Moreover, the very idea of modifying the userspace provided UDP header
      is IMHO against raw socket semantics (I wasn't able to find a document
      clearly stating this or the opposite, though). And while allowing
      CHECKSUM_NONE in the UFO case would be more efficient, it would be too
      intrusive a change just to handle a corner case like this. Therefore
      disallowing UFO for packets from sockets other than SOCK_DGRAM seems to be
      the best option.
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • usb: plusb: Add support for National Instruments host-to-host cable · b207adc1
      Ben Shelton authored
      [ Upstream commit 42c972a1 ]
      
      The National Instruments USB Host-to-Host Cable is based on the Prolific
      PL-25A1 chipset.  Add its VID/PID so the plusb driver will recognize it.
      Signed-off-by: Ben Shelton <ben.shelton@ni.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • macvtap: make sure neighbour code can push ethernet header · 2b6de7d3
      Eric Dumazet authored
      [ Upstream commit 2f1d8b9e ]
      
      Brian reported crashes using IPv6 traffic with macvtap/veth combo.
      
      I tracked the crashes in neigh_hh_output()
      
      -> memcpy(skb->data - HH_DATA_MOD, hh->hh_data, HH_DATA_MOD);
      
      Neighbour code assumes headroom to push Ethernet header is
      at least 16 bytes.
      
      It appears macvtap has only 14 bytes available on arches
      where NET_IP_ALIGN is 0 (like x86).

      The effect is a corruption of 2 bytes right before skb->head,
      and possible crashes if accessing nonexistent memory.

      This fix should also increase IPv4 performance, as paranoid code
      in ip_finish_output2() won't have to call skb_realloc_headroom().
      Reported-by: Brian Rak <brak@vultr.com>
      Tested-by: Brian Rak <brak@vultr.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • net: compat: Ignore MSG_CMSG_COMPAT in compat_sys_{send, recv}msg · a835f601
      Catalin Marinas authored
      [ Upstream commit d720d8ce ]
      
      With commit a7526eb5 (net: Unbreak compat_sys_{send,recv}msg), the
      MSG_CMSG_COMPAT flag is blocked at the compat syscall entry points,
      changing the kernel compat behaviour from the one before the commit it
      was trying to fix (1be374a0, net: Block MSG_CMSG_COMPAT in
      send(m)msg and recv(m)msg).
      
      On 32-bit kernels (!CONFIG_COMPAT), MSG_CMSG_COMPAT is 0 and the native
      32-bit sys_sendmsg() allows flag 0x80000000 to be set (it is ignored by
      the kernel). However, on a 64-bit kernel, the compat ABI is different
      with commit a7526eb5.
      
      This patch changes the compat_sys_{send,recv}msg behaviour to the one
      prior to commit 1be374a0.
      
      The problem was found running 32-bit LTP (sendmsg01) binary on an arm64
      kernel. Arguably, LTP should not pass 0xffffffff as flags to sendmsg()
      but the general rule is not to break user ABI (even when the user
      behaviour is not entirely sane).
      
      Fixes: a7526eb5 (net: Unbreak compat_sys_{send,recv}msg)
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • team: fix possible null pointer dereference in team_handle_frame · 7002744d
      Jiri Pirko authored
      [ Upstream commit 57e59563 ]
      
      Currently the following race is possible in team:
      
      CPU0                                        CPU1
                                                  team_port_del
                                                    team_upper_dev_unlink
                                                      priv_flags &= ~IFF_TEAM_PORT
      team_handle_frame
        team_port_get_rcu
          team_port_exists
            priv_flags & IFF_TEAM_PORT == 0
          return NULL (instead of port got
                       from rx_handler_data)
                                                    netdev_rx_handler_unregister
      
      The thing is that the flag is removed before rx_handler is unregistered.
      If team_handle_frame is called in between, team_port_exists returns 0
      and team_port_get_rcu will return NULL.
      So do not check the flag here. It is guaranteed by netdev_rx_handler_unregister
      that team_handle_frame will always see valid rx_handler_data pointer.
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Fixes: 3d249d4c ("net: introduce ethernet teaming device")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • net: reject creation of netdev names with colons · eb83542b
      Matthew Thode authored
      [ Upstream commit a4176a93 ]
      
      Colons are used as a separator in netdev device lookup in dev_ioctl.c.
      
      The affected ioctls are SIOCGIFTXQLEN, SIOCETHTOOL and SIOCSIFNAME.
      Signed-off-by: Matthew Thode <mthode@mthode.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
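A minimal sketch of the kind of name check the commit above describes, written in Python. Only the colon rule comes from the commit message; the length limit, the "." and ".." cases and the slash and whitespace checks are assumptions added to make the example complete, and `name_is_valid` is a hypothetical name.

```python
IFNAMSIZ = 16  # assumed buffer size, including the trailing NUL


def name_is_valid(name: str) -> bool:
    """Return True if `name` would be acceptable as an interface name."""
    if not name or len(name) >= IFNAMSIZ:
        return False
    if name in (".", ".."):
        return False
    # ':' is rejected alongside '/' and whitespace, since lookups that
    # split on ':' could never find such a device again.
    return not any(ch in "/:" or ch.isspace() for ch in name)
```

Under these assumptions `name_is_valid("eth0")` is accepted while `name_is_valid("eth0:1")` is rejected.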
    • ematch: Fix auto-loading of ematch modules. · c07c93a6
      Ignacy Gawędzki authored
      [ Upstream commit 34eea79e ]
      
      In tcf_em_validate(), after calling request_module() to load the
      kind-specific module, set em->ops to NULL before returning -EAGAIN, so
      that module_put() is not called again by tcf_em_tree_destroy().
      Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
      Acked-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
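The double module_put() described in the commit above can be illustrated with a toy reference-count model in Python. The classes and the `clear_ops` switch are hypothetical; the sketch only assumes that the validate path drops its module reference before asking for a replay, and that the destroy path drops a reference whenever `ops` is still set.

```python
import errno


class Module:
    """Toy module with a reference count."""

    def __init__(self):
        self.refcount = 0

    def get(self):
        self.refcount += 1

    def put(self):
        self.refcount -= 1


class Ematch:
    def __init__(self):
        self.ops = None  # points at the owning Module once looked up


def validate(em, module, clear_ops):
    """Autoload path: look up the module, drop the reference, request a replay."""
    module.get()       # the lookup after loading takes a reference
    em.ops = module
    module.put()       # dropped again because the request will be replayed
    if clear_ops:
        em.ops = None  # the fix: nothing is left for the destroy path to release
    return -errno.EAGAIN


def tree_destroy(em):
    if em.ops is not None:
        em.ops.put()   # releases the reference a successful validate would hold
        em.ops = None
```

Without clearing `ops`, calling `tree_destroy()` after the -EAGAIN return drives the count below zero; with it, the count stays balanced.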
    • net: phy: Fix verification of EEE support in phy_init_eee · c83355ce
      Guenter Roeck authored
      [ Upstream commit 54da5a8b ]
      
      phy_init_eee uses phy_find_setting(phydev->speed, phydev->duplex)
      to find a valid entry in the settings array for the given speed
      and duplex value. For full duplex 1000baseT, this will return
      the first matching entry, which is the entry for 1000baseKX_Full.
      
      If the phy eee does not support 1000baseKX_Full, this entry will not
      match, causing phy_init_eee to fail for no good reason.
      
      Fixes: 9a9c56cb ("net: phy: fix a bug when verify the EEE support")
      Fixes: 3e707706 ("phy: Expand phy speed/duplex settings array")
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
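The "first matching entry" problem from the commit above, sketched in Python. The table contents and bit values are made up for illustration; only the ordering (the 1000baseKX_Full entry ahead of 1000baseT_Full) mirrors the commit text, and `check_any` is one plausible way to avoid the problem, not necessarily what the patch does.

```python
from collections import namedtuple

Setting = namedtuple("Setting", "speed duplex bit name")

# Hypothetical settings table; bit values are arbitrary.
SETTINGS = [
    Setting(1000, "full", 0x1, "1000baseKX_Full"),
    Setting(1000, "full", 0x2, "1000baseT_Full"),
    Setting(1000, "half", 0x4, "1000baseT_Half"),
    Setting(100, "full", 0x8, "100baseT_Full"),
]


def first_match(speed, duplex):
    """Return the first table entry for this speed/duplex, or None."""
    for s in SETTINGS:
        if s.speed == speed and s.duplex == duplex:
            return s
    return None


def check_first_only(speed, duplex, features):
    """Buggy check: only the first matching entry is tested against features."""
    s = first_match(speed, duplex)
    return s is not None and bool(s.bit & features)


def check_any(speed, duplex, features):
    """Accept if any entry for this speed/duplex is present in features."""
    return any(
        s.bit & features
        for s in SETTINGS
        if s.speed == speed and s.duplex == duplex
    )
```

A PHY whose EEE capability covers only 1000baseT_Full (bit 0x2 here) fails the first-only check and passes the any-entry check.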
    • ipv4: ip_check_defrag should not assume that skb_network_offset is zero · a930d60b
      Alexander Drozdov authored
      [ Upstream commit 3e32e733 ]
      
      ip_check_defrag() may be used by af_packet to defragment outgoing packets.
      skb_network_offset() of af_packet's outgoing packets is not zero.
      Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
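Why the network offset matters can be shown with a plain byte buffer in Python. The frame layout (a 14-byte Ethernet header in front of the IPv4 header) is an assumption about what an af_packet sender hands over, and `ip_version_and_ihl` is a hypothetical helper.

```python
import struct

ETH_HLEN = 14  # Ethernet header length in bytes


def ip_version_and_ihl(buf: bytes, network_offset: int):
    """Read the IPv4 version and header length (in bytes) at the given offset."""
    first = buf[network_offset]
    return first >> 4, (first & 0x0F) * 4


# Link-layer header (zeroed MAC addresses, EtherType IPv4) followed by a
# minimal 20-byte IPv4 header whose first byte is 0x45.
eth = bytes(6) + bytes(6) + struct.pack("!H", 0x0800)
ip = bytes([0x45]) + bytes(19)
frame = eth + ip
```

Reading at offset 0 parses the destination MAC address as if it were the IP header; reading at `ETH_HLEN` yields version 4 with a 20-byte header.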
    • ipv4: ip_check_defrag should correctly check return value of skb_copy_bits · c0c6450b
      Alexander Drozdov authored
      [ Upstream commit fba04a9e ]
      
      skb_copy_bits() returns zero on success and a negative value on error,
      so the condition in ip_check_defrag() needs to be inverted.
      
      Fixes: 1bf3751e ("ipv4: ip_check_defrag must not modify skb before unsharing")
      Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
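The return-value convention from the commit above (zero on success, negative on error) and the effect of testing it the wrong way round, as a small Python sketch. `copy_bits` is a hypothetical stand-in for the kernel helper and the choice of EFAULT as the error is an assumption.

```python
import errno


def copy_bits(src: bytes, offset: int, dst: bytearray, length: int) -> int:
    """Copy `length` bytes from src[offset:]; 0 on success, negative errno on failure."""
    if offset < 0 or length < 0 or offset + length > len(src) or length > len(dst):
        return -errno.EFAULT
    dst[:length] = src[offset:offset + length]
    return 0


def read_header_buggy(src: bytes):
    hdr = bytearray(4)
    if not copy_bits(src, 0, hdr, 4):  # inverted: treats success (0) as failure
        return None
    return bytes(hdr)


def read_header_fixed(src: bytes):
    hdr = bytearray(4)
    if copy_bits(src, 0, hdr, 4) < 0:  # bail out only on a real error
        return None
    return bytes(hdr)
```

The buggy variant gives up on every successful copy and carries on with an unfilled buffer when the copy fails.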
    • gen_stats.c: Duplicate xstats buffer for later use · 85320d5d
      Ignacy Gawędzki authored
      [ Upstream commit 1c4cff0c ]
      
      The gnet_stats_copy_app() function gets called, more often than not, with its
      second argument being a pointer to an automatic variable on the caller's stack.
      Therefore, to avoid copying garbage afterwards when calling
      gnet_stats_finish_copy(), this data is better copied to dynamically allocated
      memory that gets freed after use.
      
      [xiyou.wangcong@gmail.com: remove a useless kfree()]
      Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
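Python has no dangling stack pointers, but the aliasing problem from the commit above can be imitated with a mutable buffer that the caller reuses after the call returns. The `Dump` class and its methods are hypothetical stand-ins, not the gen_stats API.

```python
class Dump:
    """Toy dump context that keeps app-specific stats until the final copy."""

    def __init__(self, copy: bool):
        self.copy = copy
        self.xstats = None

    def copy_app(self, buf: bytearray):
        # Either alias the caller's buffer (old behaviour) or duplicate it (fix).
        self.xstats = bytes(buf) if self.copy else buf

    def finish_copy(self) -> bytes:
        return bytes(self.xstats)


def caller(dump: Dump):
    local = bytearray(b"stats")  # stands in for an automatic variable
    dump.copy_app(local)
    local[:] = b"XXXXX"          # models the stack slot being reused afterwards
```

With aliasing, `finish_copy()` returns whatever the buffer holds later; with the duplicate, it returns the data as it was when `copy_app()` ran.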
    • rtnetlink: call ->dellink on failure when ->newlink exists · 4e522af0
      WANG Cong authored
      [ Upstream commit 7afb8886 ]
      
      Ignacy reported that when eth0 is down and a vlan device is added
      on top of it like:
      
        ip link add link eth0 name eth0.1 up type vlan id 1
      
      We will get a refcount leak:
      
        unregister_netdevice: waiting for eth0.1 to become free. Usage count = 2
      
      The problem is that when rtnl_configure_link() fails in rtnl_newlink(),
      we simply call unregister_netdevice(). For a stacked device like vlan,
      almost nothing is done when we unregister the upper device; more work
      is done when we unregister the lower device. So call its ->dellink().
      Reported-by: Ignacy Gawedzki <ignacy.gawedzki@green-communications.fr>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • ipv6: fix ipv6_cow_metrics for non DST_HOST case · 818201d3
      Martin KaFai Lau authored
      [ Upstream commit 3b471175 ]
      
      ipv6_cow_metrics() currently assumes only DST_HOST routes require
      dynamic metrics allocation from inetpeer.  The assumption breaks
      when ndisc discovers a router with RTAX_MTU and RTAX_HOPLIMIT metrics.
      Refer to ndisc_router_discovery() in ndisc.c and note that dst_metric_set()
      is called after the route is created.
      
      This patch creates the metrics array (by calling dst_cow_metrics_generic) in
      ipv6_cow_metrics().
      
      Test:
      radvd.conf:
      interface qemubr0
      {
      	AdvLinkMTU 1300;
      	AdvCurHopLimit 30;
      
      	prefix fd00:face:face:face::/64
      	{
      		AdvOnLink on;
      		AdvAutonomous on;
      		AdvRouterAddr off;
      	};
      };
      
      Before:
      [root@qemu1 ~]# ip -6 r show | egrep -v unreachable
      fd00:face:face:face::/64 dev eth0  proto kernel  metric 256  expires 27sec
      fe80::/64 dev eth0  proto kernel  metric 256
      default via fe80::74df:d0ff:fe23:8ef2 dev eth0  proto ra  metric 1024  expires 27sec
      
      After:
      [root@qemu1 ~]# ip -6 r show | egrep -v unreachable
      fd00:face:face:face::/64 dev eth0  proto kernel  metric 256  expires 27sec mtu 1300
      fe80::/64 dev eth0  proto kernel  metric 256  mtu 1300
      default via fe80::74df:d0ff:fe23:8ef2 dev eth0  proto ra  metric 1024  expires 27sec mtu 1300 hoplimit 30
      
      Fixes: 8e2ec639 (ipv6: don't use inetpeer to store metrics for routes.)
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • rtnetlink: ifla_vf_policy: fix misuses of NLA_BINARY · 8847ac89
      Daniel Borkmann authored
      [ Upstream commit 364d5716 ]
      
      ifla_vf_policy[] is wrong in advertising its individual member types as
      NLA_BINARY since .type = NLA_BINARY in combination with .len declares the
      len member as *max* attribute length [0, len].
      
      The issue is that when do_setvfinfo() is being called to set up a VF
      through ndo handler, we could set corrupted data if the attribute length
      is less than the size of the related structure itself.
      
      The intent is exactly the opposite, namely to make sure to pass at least
      data of minimum size of len.
      
      Fixes: ebc08a6f ("rtnetlink: Add VF config code to rtnetlink")
      Cc: Mitch Williams <mitch.a.williams@intel.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
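The difference between a maximum-length and a minimum-length policy, as described in the commit above, fits in a few lines of Python. The validator names are hypothetical and `VF_MAC_SIZE` is an assumed stand-in for the size of the structure being carried.

```python
VF_MAC_SIZE = 36  # assumed size of the attribute's payload structure


def validate_binary(attr_len: int, policy_len: int) -> bool:
    """NLA_BINARY-style check: policy_len is a maximum, so [0, len] passes."""
    return attr_len <= policy_len


def validate_min_len(attr_len: int, policy_len: int) -> bool:
    """Minimum-length check: the payload must cover the whole structure."""
    return attr_len >= policy_len
```

A 4-byte attribute passes the maximum-length check even though the handler would go on to read a full structure from it; the minimum-length check rejects it.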
    • pktgen: fix UDP checksum computation · 7097d643
      Sabrina Dubroca authored
      [ Upstream commit 7744b5f3 ]
      
      This patch fixes two issues in UDP checksum computation in pktgen.
      
      First, the pseudo-header uses the source and destination IP
      addresses. Currently, the ports are used for IPv4.
      
      Second, the UDP checksum covers both header and data.  So we need to
      generate the data earlier (move pktgen_finalize_skb up), and compute
      the checksum for UDP header + data.
      
      Fixes: c26bf4a5 ("pktgen: Add UDPCSUM flag to support UDP checksums")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
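Both points from the commit above (the pseudo-header carries the IP addresses, and the checksum covers header plus data) can be seen in a plain user-space computation of the UDP checksum over IPv4. This Python sketch is independent of pktgen; the function names are illustrative.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum, padding odd-length input with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> int:
    length = 8 + len(payload)  # UDP header plus data
    # Pseudo-header: source address, destination address, zero, protocol, length.
    pseudo = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, socket.IPPROTO_UDP, length)
    )
    # UDP header with the checksum field set to zero for the computation.
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    csum = ~ones_complement_sum(pseudo + header + payload) & 0xFFFF
    return csum or 0xFFFF  # a computed 0 is sent as 0xFFFF over IPv4
```

A receiver verifies the result by summing the pseudo-header, the header with the checksum filled in, and the data; a valid packet yields 0xFFFF. Because the payload is part of the sum, it has to exist before the checksum is computed.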
  2. 10 Mar, 2015 3 commits
  3. 07 Mar, 2015 1 commit
  4. 05 Mar, 2015 2 commits
    • USB: EHCI: adjust error return code · 8920e92c
      Alan Stern authored
      commit c401e7b4 upstream.
      
      The USB stack uses error code -ENOSPC to indicate that the periodic
      schedule is too full, with insufficient bandwidth to accommodate a new
      allocation.  It uses -EFBIG to indicate that an isochronous transfer
      could not be linked into the schedule because it would exceed the
      number of isochronous packets the host controller driver can handle
      (generally because the new transfer would extend too far into the
      future).
      
      ehci-hcd uses the wrong error code at one point.  This patch fixes it,
      along with a misleading comment and debugging message.
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
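The two error codes distinguished in the commit above, in a small Python sketch. The function and all of its parameters are hypothetical; they only illustrate which condition maps to which code.

```python
import errno


def check_iso_request(needed_bw, free_bw, start_frame, span_frames,
                      now_frame, sched_limit):
    """Return 0 if the transfer can be scheduled, else a negative errno."""
    if needed_bw > free_bw:
        # Periodic schedule is too full: not enough bandwidth left.
        return -errno.ENOSPC
    if start_frame + span_frames - now_frame > sched_limit:
        # Transfer would extend too far into the future for the controller.
        return -errno.EFBIG
    return 0
```

Keeping the two cases apart lets a caller tell "retry with less bandwidth" from "submit fewer packets at once".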
    • staging: comedi: cb_pcidas64: fix incorrect AI range code handling · 5bebf228
      Ian Abbott authored
      commit be8e8908 upstream.
      
      The hardware range code values and list of valid ranges for the AI
      subdevice are incorrect for several supported boards.  The hardware range
      code values for all boards except PCI-DAS4020/12 are determined by
      calling `ai_range_bits_6xxx()` based on the maximum voltage of the range
      and whether it is bipolar or unipolar; however, it only returns the
      correct hardware range code for the PCI-DAS60xx boards.  For
      PCI-DAS6402/16 (and /12) it returns the wrong code for the unipolar
      ranges.  For PCI-DAS64/Mx/16 it returns the wrong code for all the
      ranges and the comedi range table is incorrect.
      
      Change `ai_range_bits_6xxx()` to use a look-up table pointed to by new
      member `ai_range_codes` of `struct pcidas64_board` to map the comedi
      range table indices to the hardware range codes.  Use a new comedi range
      table for the PCI-DAS64/Mx/16 boards (and the commented out variants).
      Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
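The per-board look-up table approach described in the commit above, sketched in Python. The board names, range tables and hardware codes are all made up; the point is only that each board carries its own table indexed by the range index, instead of one formula being shared by every board.

```python
from collections import namedtuple

# ranges: list of (min_volts, max_volts); ai_range_codes: hardware code per index
Board = namedtuple("Board", "name ranges ai_range_codes")

# Hypothetical boards with arbitrary codes.
BOARD_A = Board(
    name="board-a",
    ranges=[(-10, 10), (-5, 5), (0, 10), (0, 5)],
    ai_range_codes=[0x000, 0x100, 0x001, 0x101],
)
BOARD_B = Board(
    name="board-b",
    ranges=[(-5, 5), (-2.5, 2.5), (0, 5), (0, 2.5)],
    ai_range_codes=[0x000, 0x100, 0x800, 0x900],
)


def ai_range_bits(board: Board, range_index: int) -> int:
    """Map a range-table index to the board's hardware range code."""
    return board.ai_range_codes[range_index]
```

Two boards can then use the same index for different voltage ranges and different hardware codes without special cases in the lookup.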