An error occurred fetching the project authors.
  1. 29 Apr, 2015 25 commits
    • Naoya Horiguchi's avatar
      mm/hugetlb: take page table lock in follow_huge_pmd() · 014275c8
      Naoya Horiguchi authored
      commit e66f17ff upstream.
      
      We have a race condition between move_pages() and freeing hugepages, where
      move_pages() calls follow_page(FOLL_GET) for hugepages internally and
      tries to get its refcount without preventing concurrent freeing.  This
      race crashes the kernel, so this patch fixes it by moving FOLL_GET code
      for hugepages into follow_huge_pmd() with taking the page table lock.
      
      This patch intentionally removes page==NULL check after pte_page.
      This is justified because pte_page() never returns NULL for any
      architectures or configurations.
      
      This patch changes the behavior of follow_huge_pmd() for tail pages and
      then tail pages can be pinned/returned.  So the caller must be changed to
      properly handle the returned tail pages.
      
      We could have a choice to add the similar locking to
      follow_huge_(addr|pud) for consistency, but it's not necessary because
      currently these functions don't support FOLL_GET flag, so let's leave it
      for future development.
      
      Here is the reproducer:
      
        $ cat movepages.c
        #include <stdio.h>
        #include <stdlib.h>
        #include <numaif.h>
      
        #define ADDR_INPUT      0x700000000000UL
        #define HPS             0x200000
        #define PS              0x1000
      
        int main(int argc, char *argv[]) {
                int i;
                int nr_hp = strtol(argv[1], NULL, 0);
                int nr_p  = nr_hp * HPS / PS;
                int ret;
                void **addrs;
                int *status;
                int *nodes;
                pid_t pid;
      
                pid = strtol(argv[2], NULL, 0);
                addrs  = malloc(sizeof(char *) * nr_p + 1);
                status = malloc(sizeof(char *) * nr_p + 1);
                nodes  = malloc(sizeof(char *) * nr_p + 1);
      
                while (1) {
                        for (i = 0; i < nr_p; i++) {
                                addrs[i] = (void *)ADDR_INPUT + i * PS;
                                nodes[i] = 1;
                                status[i] = 0;
                        }
                        ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
                                              MPOL_MF_MOVE_ALL);
                        if (ret == -1)
                                err("move_pages");
      
                        for (i = 0; i < nr_p; i++) {
                                addrs[i] = (void *)ADDR_INPUT + i * PS;
                                nodes[i] = 0;
                                status[i] = 0;
                        }
                        ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
                                              MPOL_MF_MOVE_ALL);
                        if (ret == -1)
                                err("move_pages");
                }
                return 0;
        }
      
        $ cat hugepage.c
        #include <stdio.h>
        #include <sys/mman.h>
        #include <string.h>
      
        #define ADDR_INPUT      0x700000000000UL
        #define HPS             0x200000
      
        int main(int argc, char *argv[]) {
                int nr_hp = strtol(argv[1], NULL, 0);
                char *p;
      
                while (1) {
                        p = mmap((void *)ADDR_INPUT, nr_hp * HPS, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
                        if (p != (void *)ADDR_INPUT) {
                                perror("mmap");
                                break;
                        }
                        memset(p, 0, nr_hp * HPS);
                        munmap(p, nr_hp * HPS);
                }
        }
      
        $ sysctl vm.nr_hugepages=40
        $ ./hugepage 10 &
        $ ./movepages 10 $(pgrep -f hugepage)
      
      
      [n-horiguchi@ah.jp.nec.com: resolve conflict to apply to v3.19.1]
      Fixes: e632a938 ("mm: migrate: add hugepage migration code to move_pages()")
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reported-by: default avatarHugh Dickins <hughd@google.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      014275c8
    • Naoya Horiguchi's avatar
      mm/hugetlb: reduce arch dependent code around follow_huge_* · a15d5146
      Naoya Horiguchi authored
      commit 61f77eda upstream.
      
      Currently we have many duplicates in definitions around
      follow_huge_addr(), follow_huge_pmd(), and follow_huge_pud(), so this
      patch tries to remove the m.  The basic idea is to put the default
      implementation for these functions in mm/hugetlb.c as weak symbols
      (regardless of CONFIG_ARCH_WANT_GENERAL_HUGETL B), and to implement
      arch-specific code only when the arch needs it.
      
      For follow_huge_addr(), only powerpc and ia64 have their own
      implementation, and in all other architectures this function just returns
      ERR_PTR(-EINVAL).  So this patch sets returning ERR_PTR(-EINVAL) as
      default.
      
      As for follow_huge_(pmd|pud)(), if (pmd|pud)_huge() is implemented to
      always return 0 in your architecture (like in ia64 or sparc,) it's never
      called (the callsite is optimized away) no matter how implemented it is.
      So in such architectures, we don't need arch-specific implementation.
      
      In some architecture (like mips, s390 and tile,) their current
      arch-specific follow_huge_(pmd|pud)() are effectively identical with the
      common code, so this patch lets these architecture use the common code.
      
      One exception is metag, where pmd_huge() could return non-zero but it
      expects follow_huge_pmd() to always return NULL.  This means that we need
      arch-specific implementation which returns NULL.  This behavior looks
      strange to me (because non-zero pmd_huge() implies that the architecture
      supports PMD-based hugepage, so follow_huge_pmd() can/should return some
      relevant value,) but that's beyond this cleanup patch, so let's keep it.
      
      Justification of non-trivial changes:
      - in s390, follow_huge_pmd() checks !MACHINE_HAS_HPAGE at first, and this
        patch removes the check. This is OK because we can assume MACHINE_HAS_HPAGE
        is true when follow_huge_pmd() can be called (note that pmd_huge() has
        the same check and always returns 0 for !MACHINE_HAS_HPAGE.)
      - in s390 and mips, we use HPAGE_MASK instead of PMD_MASK as done in common
        code. This patch forces these archs use PMD_MASK, but it's OK because
        they are identical in both archs.
        In s390, both of HPAGE_SHIFT and PMD_SHIFT are 20.
        In mips, HPAGE_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT - 3) and
        PMD_SHIFT is define as (PAGE_SHIFT + PAGE_SHIFT + PTE_ORDER - 3), but
        PTE_ORDER is always 0, so these are identical.
      
      [n-horiguchi@ah.jp.nec.com: resolve conflict to apply to v3.19.1]
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: default avatarHugh Dickins <hughd@google.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a15d5146
    • Ian Abbott's avatar
      staging: comedi: adv_pci1710: fix AI INSN_READ for non-zero channel · b52d8696
      Ian Abbott authored
      commit abe46b89 upstream.
      
      Reading of analog input channels by the `INSN_READ` comedi instruction
      is broken for all except channel 0.  `pci171x_ai_insn_read()` calls
      `pci171x_ai_read_sample()` with the wrong value for the third parameter.
      It is supposed to be the current index in a channel list (which is
      always of length 1 in this case, so the index should be 0), but instead
      it is passing the actual channel number.  `pci171x_ai_read_sample()`
      checks the channel number encoded in the raw sample value read from the
      hardware matches the channel number stored in the specified index of the
      previously set up channel list and returns `-ENODATA` if it doesn't
      match.  Since the index should always be 0 in this case, the match will
      fail unless the channel number is also 0.  Fix it by passing 0 as the
      channel index.
      
      Note that when the bug first appeared, it was `pci171x_ai_dropout()`
      that was called with the wrong parameter value.  `pci171x_ai_dropout()`
      got replaced with `pci171x_ai_read_sample()` in commit 7fd2dae2
      ("staging: comedi: adv_pci1710: introduce pci171x_ai_read_sample()").
      
      Fixes: 16c7eb60 ("staging: comedi: adv_pci1710: always enable PCI171x_PARANOIDCHECK code")
      Signed-off-by: default avatarIan Abbott <abbotti@mev.co.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      b52d8696
    • Radim Krčmář's avatar
      KVM: nVMX: mask unrestricted_guest if disabled on L0 · 1b413722
      Radim Krčmář authored
      commit 0790ec17 upstream.
      
      If EPT was enabled, unrestricted_guest was allowed in L1 regardless of
      L0.  L1 triple faulted when running L2 guest that required emulation.
      
      Another side effect was 'WARN_ON_ONCE(vmx->nested.nested_run_pending)'
      in L0's dmesg:
        WARNING: CPU: 0 PID: 0 at arch/x86/kvm/vmx.c:9190 nested_vmx_vmexit+0x96e/0xb00 [kvm_intel] ()
      
      Prevent this scenario by masking SECONDARY_EXEC_UNRESTRICTED_GUEST when
      the host doesn't have it enabled.
      
      Fixes: 78051e3b ("KVM: nVMX: Disable unrestricted mode if ept=0")
      Cc: stable@vger.kernel.org
      Tested-By: default avatarKashyap Chamarthy <kchamart@redhat.com>
      Signed-off-by: default avatarRadim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      1b413722
    • Jun'ichi Nomura \\\\(NEC\\\\)'s avatar
      tg3: Hold tp->lock before calling tg3_halt() from tg3_init_one() · edc7cc6c
      Jun'ichi Nomura \\\\(NEC\\\\) authored
      [ Upstream commit d0af71a3 ]
      
      tg3_init_one() calls tg3_halt() without tp->lock despite its assumption
      and causes deadlock.
      If lockdep is enabled, a warning like this shows up before the stall:
      
        [ BUG: bad unlock balance detected! ]
        3.19.0test #3 Tainted: G            E
        -------------------------------------
        insmod/369 is trying to release lock (&(&tp->lock)->rlock) at:
        [<ffffffffa02d5a1d>] tg3_chip_reset+0x14d/0x780 [tg3]
        but there are no more locks to release!
      
      tg3_init_one() doesn't call tg3_halt() under normal situation but
      during kexec kdump I hit this problem.
      
      Fixes: 932f19de ("tg3: Release tp->lock before invoking synchronize_irq()")
      Signed-off-by: default avatarJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      edc7cc6c
    • Ben Hutchings's avatar
      usbnet: Fix tx_bytes statistic running backward in cdc_ncm · f1cb2a0f
      Ben Hutchings authored
      [ Upstream commit 7a1e890e ]
      
      cdc_ncm disagrees with usbnet about how much framing overhead should
      be counted in the tx_bytes statistics, and tries 'fix' this by
      decrementing tx_bytes on the transmit path.  But statistics must never
      be decremented except due to roll-over; this will thoroughly confuse
      user-space.  Also, tx_bytes is only incremented by usbnet in the
      completion path.
      
      Fix this by requiring drivers that set FLAG_MULTI_FRAME to set a
      tx_bytes delta along with the tx_packets count.
      
      Fixes: beeecd42 ("net: cdc_ncm/cdc_mbim: adding NCM protocol statistics")
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f1cb2a0f
    • Ben Hutchings's avatar
      usbnet: Fix tx_packets stat for FLAG_MULTI_FRAME drivers · 3d206780
      Ben Hutchings authored
      [ Upstream commit 1e9e39f4 ]
      
      Currently the usbnet core does not update the tx_packets statistic for
      drivers with FLAG_MULTI_PACKET and there is no hook in the TX
      completion path where they could do this.
      
      cdc_ncm and dependent drivers are bumping tx_packets stat on the
      transmit path while asix and sr9800 aren't updating it at all.
      
      Add a packet count in struct skb_data so these drivers can fill it
      in, initialise it to 1 for other drivers, and add the packet count
      to the tx_packets statistic on completion.
      Signed-off-by: default avatarBen Hutchings <ben.hutchings@codethink.co.uk>
      Tested-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3d206780
    • Jesse Gross's avatar
      udptunnels: Call handle_offloads after inserting vlan tag. · 174fbb30
      Jesse Gross authored
      [ Upstream commit b736a623 ]
      
      handle_offloads() calls skb_reset_inner_headers() to store
      the layer pointers to the encapsulated packet. However, we
      currently push the vlag tag (if there is one) onto the packet
      afterwards. This changes the MAC header for the encapsulated
      packet but it is not reflected in skb->inner_mac_header, which
      breaks GSO and drivers which attempt to use this for encapsulation
      offloads.
      
      Fixes: 1eaa8178 ("vxlan: Add tx-vlan offload support.")
      Signed-off-by: default avatarJesse Gross <jesse@nicira.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      174fbb30
    • Herbert Xu's avatar
      skbuff: Do not scrub skb mark within the same name space · d385d003
      Herbert Xu authored
      [ Upstream commit 213dd74a ]
      
      On Wed, Apr 15, 2015 at 05:41:26PM +0200, Nicolas Dichtel wrote:
      > Le 15/04/2015 15:57, Herbert Xu a écrit :
      > >On Wed, Apr 15, 2015 at 06:22:29PM +0800, Herbert Xu wrote:
      > [snip]
      > >Subject: skbuff: Do not scrub skb mark within the same name space
      > >
      > >The commit ea23192e ("tunnels:
      > Maybe add a Fixes tag?
      > Fixes: ea23192e ("tunnels: harmonize cleanup done on skb on rx path")
      >
      > >harmonize cleanup done on skb on rx path") broke anyone trying to
      > >use netfilter marking across IPv4 tunnels.  While most of the
      > >fields that are cleared by skb_scrub_packet don't matter, the
      > >netfilter mark must be preserved.
      > >
      > >This patch rearranges skb_scurb_packet to preserve the mark field.
      > nit: s/scurb/scrub
      >
      > Else it's fine for me.
      
      Sure.
      
      PS I used the wrong email for James the first time around.  So
      let me repeat the question here.  Should secmark be preserved
      or cleared across tunnels within the same name space? In fact,
      do our security models even support name spaces?
      
      ---8<---
      The commit ea23192e ("tunnels:
      harmonize cleanup done on skb on rx path") broke anyone trying to
      use netfilter marking across IPv4 tunnels.  While most of the
      fields that are cleared by skb_scrub_packet don't matter, the
      netfilter mark must be preserved.
      
      This patch rearranges skb_scrub_packet to preserve the mark field.
      
      Fixes: ea23192e ("tunnels: harmonize cleanup done on skb on rx path")
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d385d003
    • Herbert Xu's avatar
      Revert "net: Reset secmark when scrubbing packet" · 5fe5245d
      Herbert Xu authored
      [ Upstream commit 4c0ee414 ]
      
      This patch reverts commit b8fb4e06
      because the secmark must be preserved even when a packet crosses
      namespace boundaries.  The reason is that security labels apply to
      the system as a whole and is not per-namespace.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5fe5245d
    • Alexei Starovoitov's avatar
      bpf: fix verifier memory corruption · e3666002
      Alexei Starovoitov authored
      [ Upstream commit c3de6317 ]
      
      Due to missing bounds check the DAG pass of the BPF verifier can corrupt
      the memory which can cause random crashes during program loading:
      
      [8.449451] BUG: unable to handle kernel paging request at ffffffffffffffff
      [8.451293] IP: [<ffffffff811de33d>] kmem_cache_alloc_trace+0x8d/0x2f0
      [8.452329] Oops: 0000 [#1] SMP
      [8.452329] Call Trace:
      [8.452329]  [<ffffffff8116cc82>] bpf_check+0x852/0x2000
      [8.452329]  [<ffffffff8116b7e4>] bpf_prog_load+0x1e4/0x310
      [8.452329]  [<ffffffff811b190f>] ? might_fault+0x5f/0xb0
      [8.452329]  [<ffffffff8116c206>] SyS_bpf+0x806/0xa30
      
      Fixes: f1bca824 ("bpf: add search pruning optimization to verifier")
      Signed-off-by: default avatarAlexei Starovoitov <ast@plumgrid.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e3666002
    • Eric Dumazet's avatar
      bnx2x: Fix busy_poll vs netpoll · 63f49b7f
      Eric Dumazet authored
      [ Upstream commit 074975d0 ]
      
      Commit 9a2620c8 ("bnx2x: prevent WARN during driver unload")
      switched the napi/busy_lock locking mechanism from spin_lock() into
      spin_lock_bh(), breaking inter-operability with netconsole, as netpoll
      disables interrupts prior to calling our napi mechanism.
      
      This switches the driver into using atomic assignments instead of the
      spinlock mechanisms previously employed.
      
      Based on initial patch from Yuval Mintz & Ariel Elior
      
      I basically added softirq starvation avoidance, and mixture
      of atomic operations, plain writes and barriers.
      
      Note this slightly reduces the overhead for this driver when no
      busy_poll sockets are in use.
      
      Fixes: 9a2620c8 ("bnx2x: prevent WARN during driver unload")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      63f49b7f
    • Eric Dumazet's avatar
      tcp: tcp_make_synack() should clear skb->tstamp · 1b6c8d50
      Eric Dumazet authored
      [ Upstream commit b50edd78 ]
      
      I noticed tcpdump was giving funky timestamps for locally
      generated SYNACK messages on loopback interface.
      
      11:42:46.938990 IP 127.0.0.1.48245 > 127.0.0.2.23850: S
      945476042:945476042(0) win 43690 <mss 65495,nop,nop,sackOK,nop,wscale 7>
      
      20:28:58.502209 IP 127.0.0.2.23850 > 127.0.0.1.48245: S
      3160535375:3160535375(0) ack 945476043 win 43690 <mss
      65495,nop,nop,sackOK,nop,wscale 7>
      
      This is because we need to clear skb->tstamp before
      entering lower stack, otherwise net_timestamp_check()
      does not set skb->tstamp.
      
      Fixes: 7faee5c0 ("tcp: remove TCP_SKB_CB(skb)->when")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1b6c8d50
    • Jack Morgenstein's avatar
      net/mlx4_core: Fix error message deprecation for ConnectX-2 cards · 3a0cf55b
      Jack Morgenstein authored
      [ Upstream commit fde913e2 ]
      
      Commit 1daa4303 ("net/mlx4_core: Deprecate error message at
      ConnectX-2 cards startup to debug") did the deprecation only for port 1
      of the card. Need to deprecate for port 2 as well.
      
      Fixes: 1daa4303 ("net/mlx4_core: Deprecate error message at ConnectX-2 cards startup to debug")
      Signed-off-by: default avatarJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: default avatarAmir Vadai <amirv@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a0cf55b
    • hannes@stressinduktion.org's avatar
      ipv6: protect skb->sk accesses from recursive dereference inside the stack · 3fe207e4
      hannes@stressinduktion.org authored
      [ Upstream commit f60e5990 ]
      
      We should not consult skb->sk for output decisions in xmit recursion
      levels > 0 in the stack. Otherwise local socket settings could influence
      the result of e.g. tunnel encapsulation process.
      
      ipv6 does not conform with this in three places:
      
      1) ip6_fragment: we do consult ipv6_npinfo for frag_size
      
      2) sk_mc_loop in ipv6 uses skb->sk and checks if we should
         loop the packet back to the local socket
      
      3) ip6_skb_dst_mtu could query the settings from the user socket and
         force a wrong MTU
      
      Furthermore:
      In sk_mc_loop we could potentially land in WARN_ON(1) if we use a
      PF_PACKET socket ontop of an IPv6-backed vxlan device.
      
      Reuse xmit_recursion as we are currently only interested in protecting
      tunnel devices.
      
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3fe207e4
    • Neal Cardwell's avatar
      tcp: fix FRTO undo on cumulative ACK of SACKed range · 84212e36
      Neal Cardwell authored
      [ Upstream commit 666b8051 ]
      
      On processing cumulative ACKs, the FRTO code was not checking the
      SACKed bit, meaning that there could be a spurious FRTO undo on a
      cumulative ACK of a previously SACKed skb.
      
      The FRTO code should only consider a cumulative ACK to indicate that
      an original/unretransmitted skb is newly ACKed if the skb was not yet
      SACKed.
      
      The effect of the spurious FRTO undo would typically be to make the
      connection think that all previously-sent packets were in flight when
      they really weren't, leading to a stall and an RTO.
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Fixes: e33099f9 ("tcp: implement RFC5682 F-RTO")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      84212e36
    • Jonathan Davies's avatar
      xen-netfront: transmit fully GSO-sized packets · 5ce52016
      Jonathan Davies authored
      [ Upstream commit 0c36820e ]
      
      xen-netfront limits transmitted skbs to be at most 44 segments in size. However,
      GSO permits up to 65536 bytes, which means a maximum of 45 segments of 1448
      bytes each. This slight reduction in the size of packets means a slight loss in
      efficiency.
      
      Since c/s 9ecd1a75, xen-netfront sets gso_max_size to
          XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER,
      where XEN_NETIF_MAX_TX_SIZE is 65535 bytes.
      
      The calculation used by tcp_tso_autosize (and also tcp_xmit_size_goal since c/s
      6c09fa09) in determining when to split an skb into two is
          sk->sk_gso_max_size - 1 - MAX_TCP_HEADER.
      
      So the maximum permitted size of an skb is calculated to be
          (XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER) - 1 - MAX_TCP_HEADER.
      
      Intuitively, this looks like the wrong formula -- we don't need two TCP headers.
      Instead, there is no need to deviate from the default gso_max_size of 65536 as
      this already accommodates the size of the header.
      
      Currently, the largest skb transmitted by netfront is 63712 bytes (44 segments
      of 1448 bytes each), as observed via tcpdump. This patch makes netfront send
      skbs of up to 65160 bytes (45 segments of 1448 bytes each).
      
      Similarly, the maximum allowable mtu does not need to subtract MAX_TCP_HEADER as
      it relates to the size of the whole packet, including the header.
      
      Fixes: 9ecd1a75 ("xen-netfront: reduce gso_max_size to account for max TCP header")
      Signed-off-by: default avatarJonathan Davies <jonathan.davies@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5ce52016
    • Thomas Graf's avatar
      openvswitch: Return vport module ref before destruction · 69ed0224
      Thomas Graf authored
      [ Upstream commit fa2d8ff4 ]
      
      Return module reference before invoking the respective vport
      ->destroy() function. This is needed as ovs_vport_del() is not
      invoked inside an RCU read side critical section so the kfree
      can occur immediately before returning to ovs_vport_del().
      
      Returning the module reference before ->destroy() is safe because
      the module unregistration is blocked on ovs_lock which we hold
      while destroying the datapath.
      
      Fixes: 62b9c8d0 ("ovs: Turn vports with dependencies into separate modules")
      Reported-by: default avatarPravin Shelar <pshelar@nicira.com>
      Signed-off-by: default avatarThomas Graf <tgraf@suug.ch>
      Acked-by: default avatarPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      69ed0224
    • Anton Nayshtut's avatar
      bonding: Bonding Overriding Configuration logic restored. · 28cc484c
      Anton Nayshtut authored
      [ Upstream commit f5e2dc5d ]
      
      Before commit 3900f290 ("bonding: slight
      optimizztion for bond_slave_override()") the override logic was to send packets
      with non-zero queue_id through the slave with corresponding queue_id, under two
      conditions only - if the slave can transmit and it's up.
      
      The above mentioned commit changed this logic by introducing an additional
      condition - whether the bond is active (indirectly, using the slave_can_tx and
      later - bond_is_active_slave), that prevents the user from implementing more
      complex policies according to the Documentation/networking/bonding.txt.
      Signed-off-by: default avatarAnton Nayshtut <anton@swortex.com>
      Signed-off-by: default avatarAlexey Bogoslavsky <alexey@swortex.com>
      Signed-off-by: default avatarAndy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      28cc484c
    • Alexey Kodanev's avatar
      net: tcp6: fix double call of tcp_v6_fill_cb() · 1d069b5a
      Alexey Kodanev authored
      [ Upstream commit 4ad19de8 ]
      
      tcp_v6_fill_cb() will be called twice if socket's state changes from
      TCP_TIME_WAIT to TCP_LISTEN. That can result in control buffer data
      corruption because in the second tcp_v6_fill_cb() call it's not copying
      IP6CB(skb) anymore, but 'seq', 'end_seq', etc., so we can get weird and
      unpredictable results. Performance loss of up to 1200% has been observed
      in LTP/vxlan03 test.
      
      This can be fixed by copying inet6_skb_parm to the beginning of 'cb'
      only if xfrm6_policy_check() and tcp_v6_fill_cb() are going to be
      called again.
      
      Fixes: 2dc49d16 ("tcp6: don't move IP6CB before xfrm6_policy_check()")
      Signed-off-by: default avatarAlexey Kodanev <alexey.kodanev@oracle.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1d069b5a
    • Alex Gartrell's avatar
      tun: return proper error code from tun_do_read · 6e24551f
      Alex Gartrell authored
      [ Upstream commit 957f094f ]
      
      Instead of -1 with EAGAIN, read on a O_NONBLOCK tun fd will return 0.  This
      fixes this by properly returning the error code from __skb_recv_datagram.
      Signed-off-by: default avatarAlex Gartrell <agartrell@fb.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6e24551f
    • D.S. Ljungmark's avatar
      ipv6: Don't reduce hop limit for an interface · 553ecf74
      D.S. Ljungmark authored
      [ Upstream commit 6fd99094 ]
      
      A local route may have a lower hop_limit set than global routes do.
      
      RFC 3756, Section 4.2.7, "Parameter Spoofing"
      
      >   1.  The attacker includes a Current Hop Limit of one or another small
      >       number which the attacker knows will cause legitimate packets to
      >       be dropped before they reach their destination.
      
      >   As an example, one possible approach to mitigate this threat is to
      >   ignore very small hop limits.  The nodes could implement a
      >   configurable minimum hop limit, and ignore attempts to set it below
      >   said limit.
      Signed-off-by: default avatarD.S. Ljungmark <ljungmark@modio.se>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      553ecf74
    • Ido Shamay's avatar
      net/mlx4_en: Call register_netdevice in the proper location · b2137fcc
      Ido Shamay authored
      [ Upstream commit e5eda89d ]
      
      Netdevice registration should be performed a the end of the driver
      initialization flow. If we don't do that, after calling register_netdevice,
      device callbacks may be issued by higher layers of the stack before
      final configuration of the device is done.
      
      For example (VXLAN configuration race), mlx4_SET_PORT_VXLAN was issued
      after the register_netdev command. System network scripts may configure
      the interface (UP) right after the registration, which also attach
      unicast VXLAN steering rule, before mlx4_SET_PORT_VXLAN was called,
      causing the firmware to fail the rule attachment.
      
      Fixes: 837052d0 ("net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling")
      Signed-off-by: default avatarIdo Shamay <idos@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2137fcc
    • Simon Horman's avatar
      rocker: handle non-bridge master change · 5fd2b0c0
      Simon Horman authored
      [ Upstream commit a6e95cc7 ]
      
      Master change notifications may occur other than when joining or
      leaving a bridge, for example when being added to or removed from
      a bond or Open vSwitch.
      
      Previously in those cases rocker_port_bridge_leave() was called
      which results in a null-pointer dereference as rocker_port->bridge_dev
      is NULL because there is no bridge device.
      
      This patch makes provision for doing nothing in such cases.
      
      Fixes: 6c707945 ("rocker: implement L2 bridge offloading")
      Acked-by: default avatarJiri Pirko <jiri@resnulli.us>
      Acked-by: default avatarScott Feldman <sfeldma@gmail.com>
      Signed-off-by: default avatarSimon Horman <simon.horman@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5fd2b0c0
    • Michal Kubeček's avatar
      tcp: prevent fetching dst twice in early demux code · c3d7204d
      Michal Kubeček authored
      [ Upstream commit d0c294c5 ]
      
      On s390x, gcc 4.8 compiles this part of tcp_v6_early_demux()
      
              struct dst_entry *dst = sk->sk_rx_dst;
      
              if (dst)
                      dst = dst_check(dst, inet6_sk(sk)->rx_dst_cookie);
      
      to code reading sk->sk_rx_dst twice, once for the test and once for
      the argument of ip6_dst_check() (dst_check() is inline). This allows
      ip6_dst_check() to be called with null first argument, causing a crash.
      
      Protect sk->sk_rx_dst access by READ_ONCE() both in IPv4 and IPv6
      TCP early demux code.
      
      Fixes: 41063e9d ("ipv4: Early TCP socket demux.")
      Fixes: c7109986 ("ipv6: Early TCP socket demux")
      Signed-off-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c3d7204d
  2. 19 Apr, 2015 15 commits
    • Greg Kroah-Hartman's avatar
      Linux 3.19.5 · a96a0302
      Greg Kroah-Hartman authored
      a96a0302
    • Igor Mammedov's avatar
      kvm: avoid page allocation failure in kvm_set_memory_region() · 002d8e07
      Igor Mammedov authored
      commit 74496134 upstream.
      
      KVM guest can fail to startup with following trace on host:
      
      qemu-system-x86: page allocation failure: order:4, mode:0x40d0
      Call Trace:
        dump_stack+0x47/0x67
        warn_alloc_failed+0xee/0x150
        __alloc_pages_direct_compact+0x14a/0x150
        __alloc_pages_nodemask+0x776/0xb80
        alloc_kmem_pages+0x3a/0x110
        kmalloc_order+0x13/0x50
        kmemdup+0x1b/0x40
        __kvm_set_memory_region+0x24a/0x9f0 [kvm]
        kvm_set_ioapic+0x130/0x130 [kvm]
        kvm_set_memory_region+0x21/0x40 [kvm]
        kvm_vm_ioctl+0x43f/0x750 [kvm]
      
      Failure happens when attempting to allocate pages for
      'struct kvm_memslots', however it doesn't have to be
      present in physically contiguous (kmalloc-ed) address
      space, change allocation to kvm_kvzalloc() so that
      it will be vmalloc-ed when its size is more then a page.
      Signed-off-by: default avatarIgor Mammedov <imammedo@redhat.com>
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      002d8e07
    • Daniel Vetter's avatar
      drm/i915: Push vblank enable/disable past encoder->enable/disable · d5d51dd4
      Daniel Vetter authored
      commit f9b61ff6 upstream.
      
      It is platform/output depenedent when exactly the pipe will start
      running. Sometimes we just need the (cpu) pipe enabled, in other cases
      the pch transcoder is enough and in yet other cases the (DP) port is
      sending the frame start signal.
      
      In a perfect world we'd put the drm_crtc_vblank_on call exactly where
      the pipe starts running, but due to cloning and similar things this
      will get messy. And the current approach of picking the most
      conservative place for all combinations also doesn't work since that
      results in legit vblank waits (in encoder->enable hooks, e.g. the 2
      vblank waits for sdvo) failing.
      
      Completely going back to the old world before
      
      commit 51e31d49
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Mon Sep 15 12:36:02 2014 +0200
      
          drm/i915: Use generic vblank wait
      
      isn't great either since screaming when the vblank wait work because
      the pipe is off is kinda nice.
      
      Pick a compromise and move the drm_crtc_vblank_on right before the
      encoder->enable call. This is a lie on some outputs/platforms, but
      after the ->enable callback the pipe is guaranteed to run everywhere.
      So not that bad really. Suggested by Ville.
      
      v2: Same treatment for drm_crtc_vblank_off and encoder->disable: I've
      missed the ibx pipe B select w/a, which also has a vblank wait in the
      disable function (while the pipe is obviously still running).
      
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Acked-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5d51dd4
    • Dave Chinner's avatar
      xfs: ensure truncate forces zeroed blocks to disk · 363d91b0
      Dave Chinner authored
      commit 5885ebda upstream.
      
      A new fsync vs power fail test in xfstests indicated that XFS can
      have unreliable data consistency when doing extending truncates that
      require block zeroing. The blocks beyond EOF get zeroed in memory,
      but we never force those changes to disk before we run the
      transaction that extends the file size and exposes those blocks to
      userspace. This can result in the blocks not being correctly zeroed
      after a crash.
      
      Because in-memory behaviour is correct, tools like fsx don't pick up
      any coherency problems - it's not until the filesystem is shutdown
      or the system crashes after writing the truncate transaction to the
      journal but before the zeroed data in the page cache is flushed that
      the issue is exposed.
      
      Fix this by also flushing the dirty data in memory region between
      the old size and new size when we've found blocks that need zeroing
      in the truncate process.
      Reported-by: default avatarLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: default avatarDave Chinner <dchinner@redhat.com>
      Reviewed-by: default avatarBrian Foster <bfoster@redhat.com>
      Signed-off-by: default avatarDave Chinner <david@fromorbit.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      363d91b0
    • Omar Sandoval's avatar
      ext4: fix indirect punch hole corruption · db8a9ac5
      Omar Sandoval authored
      commit 6f30b7e3 upstream.
      
      Commit 4f579ae7 (ext4: fix punch hole on files with indirect
      mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect
      mapping. However, there are bugs in several corner cases. This fixes 5
      distinct bugs:
      
      1. When there is at least one entire level of indirection between the
      start and end of the punch range and the end of the punch range is the
      first block of its level, we can't return early; we have to free the
      intervening levels.
      
      2. When the end is at a higher level of indirection than the start and
      ext4_find_shared returns a top branch for the end, we still need to free
      the rest of the shared branch it returns; we can't decrement partial2.
      
      3. When a punch happens within one level of indirection, we need to
      converge on an indirect block that contains the start and end. However,
      because the branches returned from ext4_find_shared do not necessarily
      start at the same level (e.g., the partial2 chain will be shallower if
      the last block occurs at the beginning of an indirect group), the walk
      of the two chains can end up "missing" each other and freeing a bunch of
      extra blocks in the process. This mismatch can be handled by first
      making sure that the chains are at the same level, then walking them
      together until they converge.
      
      4. When the punch happens within one level of indirection and
      ext4_find_shared returns a top branch for the start, we must free it,
      but only if the end does not occur within that branch.
      
      5. When the punch happens within one level of indirection and
      ext4_find_shared returns a top branch for the end, then we shouldn't
      free the block referenced by the end of the returned chain (this mirrors
      the different levels case).
      Signed-off-by: default avatarOmar Sandoval <osandov@osandov.com>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      db8a9ac5
    • Preeti U Murthy's avatar
      timers/tick/broadcast-hrtimer: Fix suspicious RCU usage in idle loop · 1a6fe5b6
      Preeti U Murthy authored
      commit a127d2bc upstream.
      
      The hrtimer mode of broadcast queues hrtimers in the idle entry
      path so as to wakeup cpus in deep idle states. The associated
      call graph is :
      
      	cpuidle_idle_call()
      	|____ clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, ....))
      	     |_____tick_broadcast_set_event()
      		   |____clockevents_program_event()
      			|____bc_set_next()
      
      The hrtimer_{start/cancel} functions call into tracing which uses RCU.
      But it is not legal to call into RCU in cpuidle because it is one of the
      quiescent states. Hence protect this region with RCU_NONIDLE which informs
      RCU that the cpu is momentarily non-idle.
      
      As an aside it is helpful to point out that the clock event device that is
      programmed here is not a per-cpu clock device; it is a
      pseudo clock device, used by the broadcast framework alone.
      The per-cpu clock device programming never goes through bc_set_next().
      Signed-off-by: default avatarPreeti U Murthy <preeti@linux.vnet.ibm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: linuxppc-dev@ozlabs.org
      Cc: mpe@ellerman.id.au
      Cc: tglx@linutronix.de
      Link: http://lkml.kernel.org/r/20150318104705.17763.56668.stgit@preeti.in.ibm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a6fe5b6
    • Majd Dibbiny's avatar
      IB/mlx4: Saturate RoCE port PMA counters in case of overflow · 6531f38e
      Majd Dibbiny authored
      commit 61a3855b upstream.
      
      For RoCE ports, we set the u32 PMA values based on u64 HCA counters. In case of
      overflow, according to the IB spec, we have to saturate a counter to its
      max value, do that.
      
      Fixes: c3779134 ('IB/mlx4: Support PMA counters for IBoE')
      Signed-off-by: default avatarMajd Dibbiny <majd@mellanox.com>
      Signed-off-by: default avatarEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: default avatarHadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6531f38e
    • Uwe Kleine-König's avatar
      clk: divider: fix calculation of maximal parent rate for a given divider · 033dc8f0
      Uwe Kleine-König authored
      commit da321133 upstream.
      
      The rate provided at the output of a clk-divider is calculated as:
      
      	DIV_ROUND_UP(parent_rate, div)
      
      since commit b11d282d (clk: divider: fix rate calculation for
      fractional rates). So to yield a rate not bigger than r parent_rate
      must be <= r * div.
      
      The effect of choosing a parent rate that is too big as was done before
      this patch results in wrongly ruling out good dividers.
      
      Note that this is not a complete fix as __clk_round_rate might return a
      value >= its 2nd parameter. Also for dividers with
      CLK_DIVIDER_ROUND_CLOSEST set the calculation is not accurate. But this
      fixes the test case by Sascha Hauer that uses a chain of three dividers
      under a fixed clock.
      
      Fixes: b11d282d (clk: divider: fix rate calculation for fractional rates)
      Suggested-by: default avatarSascha Hauer <s.hauer@pengutronix.de>
      Signed-off-by: default avatarUwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Acked-by: default avatarSascha Hauer <s.hauer@pengutronix.de>
      Signed-off-by: default avatarMichael Turquette <mturquette@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      033dc8f0
    • Uwe Kleine-König's avatar
      clk: divider: fix selection of divider when rounding to closest · 765c2045
      Uwe Kleine-König authored
      commit 26bac95a upstream.
      
      It's an invalid approach to assume that among two divider values
      the one nearer the exact divider is the better one.
      
      Assume a parent rate of 1000 Hz, a divider with CLK_DIVIDER_POWER_OF_TWO
      and a target rate of 89 Hz. The exact divider is ~ 11.236 so 8 and 16
      are the candidates to choose from yielding rates 125 Hz and 62.5 Hz
      respectivly. While 8 is nearer to 11.236 than 16 is, the latter is still
      the better divider as 62.5 is nearer to 89 than 125 is.
      
      Fixes: 774b5143 (clk: divider: Add round to closest divider)
      Signed-off-by: default avatarUwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Acked-by: default avatarSascha Hauer <s.hauer@pengutronix.de>
      Acked-by: default avatarMaxime Coquelin <maxime.coquelin@st.com>
      Signed-off-by: default avatarMichael Turquette <mturquette@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      765c2045
    • Hans Verkuil's avatar
      vb2: fix 'UNBALANCED' warnings when calling vb2_thread_stop() · 4858bcfd
      Hans Verkuil authored
      commit 0e661006 upstream.
      
      Stopping the vb2 thread (as used by several DVB devices) can result
      in an 'UNBALANCED' warning such as this:
      
      vb2: counters for queue ffff880407ee9828: UNBALANCED!
      vb2:     setup: 1 start_streaming: 1 stop_streaming: 1
      vb2:     wait_prepare: 249333 wait_finish: 249334
      
      This is due to a race condition between stopping the thread and
      calling vb2_internal_streamoff(). While I have not been able to deduce
      the exact mechanism how this race condition can produce this warning,
      I can see that the way the stream is stopped is likely to lead to a
      race somewhere.
      
      This patch simplifies how this is done by first ensuring that the
      thread is completely stopped before cleaning up the vb2 queue. It
      does that by setting threadio->stop to true, followed by a call to
      vb2_queue_error() which will wake up the thread. The thread sees that
      'stop' is true and it will exit.
      
      The call to kthread_stop() waits until the thread has exited, and only
      then is the queue cleaned up by calling __vb2_cleanup_fileio().
      
      This is a much cleaner sequence and the warning has now disappeared.
      Reported-by: default avatarJurgen Kramer <gtmkramer@xs4all.nl>
      Tested-by: default avatarJurgen Kramer <gtmkramer@xs4all.nl>
      Signed-off-by: default avatarHans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4858bcfd
    • Sakari Ailus's avatar
      vb2: Fix dma_dir setting for dma-contig mem type · 52c681ce
      Sakari Ailus authored
      commit 4879785e upstream.
      
      The last argument of vb2_dc_get_user_pages() is of type enum
      dma_data_direction, but the caller, vb2_dc_get_userptr() passes a value
      which is the result of comparison dma_dir == DMA_FROM_DEVICE. This results
      in the write parameter to get_user_pages() being zero in all cases, i.e.
      that the caller has no intent to write there.
      
      This was broken by patch "vb2: replace 'write' by 'dma_dir'".
      Signed-off-by: default avatarSakari Ailus <sakari.ailus@linux.intel.com>
      Acked-by: default avatarHans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      52c681ce
    • Geert Uytterhoeven's avatar
      soc-camera: Fix devm_kfree() in soc_of_bind() · 2a9a8392
      Geert Uytterhoeven authored
      commit 8e48a2d5 upstream.
      
      Unlike scan_async_group(), soc_of_bind() doesn't allocate its
      soc_camera_async_client structure using devm_kzalloc(), but has it
      embedded inside the soc_of_info structure.  Hence on failure, it must
      free the whole soc_of_info structure, and not just the embedded
      soc_camera_async_client structure, as the latter causes a warning, and
      may cause slab corruption:
      
          soc-camera-pdrv soc-camera-pdrv.0: Probing soc-camera-pdrv.0
          ------------[ cut here ]------------
          WARNING: CPU: 0 PID: 1 at drivers/base/devres.c:887 devm_kfree+0x30/0x40()
          CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.19.0-shmobile-08386-g37feb0d093cb2d8e #128
          Hardware name: Generic R8A7791 (Flattened Device Tree)
          Backtrace:
          [<c0011e7c>] (dump_backtrace) from [<c0012024>] (show_stack+0x18/0x1c)
           r6:c05a923b r5:00000009 r4:00000000 r3:00204140
          [<c001200c>] (show_stack) from [<c048ed30>] (dump_stack+0x78/0x94)
          [<c048ecb8>] (dump_stack) from [<c002687c>] (warn_slowpath_common+0x8c/0xb8)
           r4:00000000 r3:00000000
          [<c00267f0>] (warn_slowpath_common) from [<c0026980>] (warn_slowpath_null+0x24/0x2c)
           r8:ee7d8214 r7:ed83b810 r6:ed83bc20 r5:fffffffa r4:ed83e510
          [<c002695c>] (warn_slowpath_null) from [<c025e0cc>] (devm_kfree+0x30/0x40)
          [<c025e09c>] (devm_kfree) from [<c032bbf4>] (soc_of_bind.isra.14+0x194/0x1d4)
          [<c032ba60>] (soc_of_bind.isra.14) from [<c032c6b8>] (soc_camera_host_register+0x208/0x31c)
           r9:00000070 r8:ee7e05d0 r7:ee153210 r6:00000000 r5:ee7e0218 r4:ed83bc20
          [<c032c4b0>] (soc_camera_host_register) from [<c032e80c>] (rcar_vin_probe+0x1f4/0x238)
           r8:ee153200 r7:00000008 r6:ee153210 r5:ed83bc10 r4:c066319c r3:000000c0
          [<c032e618>] (rcar_vin_probe) from [<c025c334>] (platform_drv_probe+0x50/0xa0)
           r10:00000000 r9:c0662fa8 r8:00000000 r7:c06a3700 r6:c0662fa8 r5:ee153210
           r4:00000000
          [<c025c2e4>] (platform_drv_probe) from [<c025af08>] (driver_probe_device+0xc4/0x208)
           r6:c06a36f4 r5:00000000 r4:ee153210 r3:c025c2e4
          [<c025ae44>] (driver_probe_device) from [<c025b108>] (__driver_attach+0x70/0x94)
           r9:c066f9c0 r8:c0624a98 r7:c065b790 r6:c0662fa8 r5:ee153244 r4:ee153210
          [<c025b098>] (__driver_attach) from [<c025984c>] (bus_for_each_dev+0x74/0x98)
           r6:c025b098 r5:c0662fa8 r4:00000000 r3:00000001
          [<c02597d8>] (bus_for_each_dev) from [<c025b1dc>] (driver_attach+0x20/0x28)
           r6:ed83c200 r5:00000000 r4:c0662fa8
          [<c025b1bc>] (driver_attach) from [<c025a00c>] (bus_add_driver+0xdc/0x1c4)
          [<c0259f30>] (bus_add_driver) from [<c025b8f4>] (driver_register+0xa4/0xe8)
           r7:c0624a98 r6:00000000 r5:c060b010 r4:c0662fa8
          [<c025b850>] (driver_register) from [<c025ccd0>] (__platform_driver_register+0x50/0x64)
           r5:c060b010 r4:ed8394c0
          [<c025cc80>] (__platform_driver_register) from [<c060b028>] (rcar_vin_driver_init+0x18/0x20)
          [<c060b010>] (rcar_vin_driver_init) from [<c05edde8>] (do_one_initcall+0x108/0x1b8)
          [<c05edce0>] (do_one_initcall) from [<c05edfb4>] (kernel_init_freeable+0x11c/0x1e4)
           r9:c066f9c0 r8:c066f9c0 r7:c062eab0 r6:c06252c4 r5:000000ad r4:00000006
          [<c05ede98>] (kernel_init_freeable) from [<c048c3d0>] (kernel_init+0x10/0xec)
           r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c048c3c0 r4:00000000
          [<c048c3c0>] (kernel_init) from [<c000eba0>] (ret_from_fork+0x14/0x34)
           r4:00000000 r3:ee04e000
          ---[ end trace e3a984cc0335c8a0 ]---
          rcar_vin e6ef1000.video: group probe failed: -6
      
      Fixes: 1ddc6a6c ("[media] soc_camera: add support for dt binding soc_camera drivers")
      Signed-off-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: default avatarGuennadi Liakhovetski <g.liakhovetski@gmx.de>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2a9a8392
    • Hans Verkuil's avatar
      cx23885: fix querycap · 5ddff4d7
      Hans Verkuil authored
      commit 6b46211f upstream.
      
      cap->device_caps wasn't set in cx23885-417.c causing a warning from
      the v4l2-core.
      Reported-by: default avatarJoseph Jasi <joe.yasi@gmail.com>
      Signed-off-by: default avatarHans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5ddff4d7
    • Marek Szyprowski's avatar
      media: s5p-mfc: fix mmap support for 64bit arch · c80fa6bc
      Marek Szyprowski authored
      commit 05b676ab upstream.
      
      TASK_SIZE is depends on the systems architecture (32 or 64 bits) and it
      should not be used for defining offset boundary for mmaping buffers for
      CAPTURE and OUTPUT queues. This patch fixes support for MMAP calls on
      the CAPTURE queue on 64bit architectures (like ARM64).
      Signed-off-by: default avatarMarek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: default avatarKamil Debski <k.debski@samsung.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c80fa6bc
    • Hans Verkuil's avatar
      sh_veu: v4l2_dev wasn't set · 8d73f670
      Hans Verkuil authored
      commit ab312030 upstream.
      
      The v4l2_dev field of struct video_device must be set correctly.
      This was never done for this driver, so no video nodes were created
      anymore.
      Signed-off-by: default avatarHans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d73f670