1. 26 Jan, 2015 40 commits
    • Tetsuo Handa's avatar
      drm/ttm: Avoid memory allocation from shrinker functions. · fa13c23e
      Tetsuo Handa authored
      commit 881fdaa5 upstream.
      
      Andrew Morton wrote:
      > On Wed, 12 Nov 2014 13:08:55 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:
      >
      > > Andrew Morton wrote:
      > > > Poor ttm guys - this is a bit of a trap we set for them.
      > >
      > > Commit a91576d7 ("drm/ttm: Pass GFP flags in order to avoid deadlock.")
      > > changed to use sc->gfp_mask rather than GFP_KERNEL.
      > >
      > > -       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *),
      > > -                       GFP_KERNEL);
      > > +       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);
      > >
      > > But this bug is caused by sc->gfp_mask containing some flags which are not
      > > in GFP_KERNEL, right? Then, I think
      > >
      > > -       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);
      > > +       pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp & GFP_KERNEL);
      > >
      > > would hide this bug.
      > >
      > > But I think we should use GFP_ATOMIC (or drop __GFP_WAIT flag)
      >
      > Well no - ttm_page_pool_free() should stop calling kmalloc altogether.
      > Just do
      >
      > 	struct page *pages_to_free[16];
      >
      > and rework the code to free 16 pages at a time.  Easy.
      
      Well, ttm code wants to process 512 pages at a time for performance.
      Memory footprint increased by 512 * sizeof(struct page *) buffer is
      only 4096 bytes. What about using static buffer like below?
      ----------
      >From d3cb5393c9c8099d6b37e769f78c31af1541fe8c Mon Sep 17 00:00:00 2001
      From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Date: Thu, 13 Nov 2014 22:21:54 +0900
      Subject: drm/ttm: Avoid memory allocation from shrinker functions.
      
      Commit a91576d7 ("drm/ttm: Pass GFP flags in order to avoid
      deadlock.") caused BUG_ON() due to sc->gfp_mask containing flags
      which are not in GFP_KERNEL.
      
        https://bugzilla.kernel.org/show_bug.cgi?id=87891
      
      Changing from sc->gfp_mask to (sc->gfp_mask & GFP_KERNEL) would
      avoid the BUG_ON(), but avoiding memory allocation from shrinker
      function is better and reliable fix.
      
      Shrinker function is already serialized by global lock, and
      clean up function is called after shrinker function is unregistered.
      Thus, we can use static buffer when called from shrinker function
      and clean up function.
      Signed-off-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      fa13c23e
    • Thomas Hellstrom's avatar
      drm/vmwgfx: Fix fence event code · fc350ec8
      Thomas Hellstrom authored
      commit 89669e7a upstream.
      
      The commit "vmwgfx: Rework fence event action" introduced a number of bugs
      that are fixed with this commit:
      
      a) A forgotten return stateemnt.
      b) An if statement with identical branches.
      Reported-by: default avatarRob Clark <robdclark@gmail.com>
      Signed-off-by: default avatarThomas Hellstrom <thellstrom@vmware.com>
      Reviewed-by: default avatarJakob Bornecrantz <jakob@vmware.com>
      Reviewed-by: default avatarSinclair Yeh <syeh@vmware.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      fc350ec8
    • Akash Goel's avatar
      drm/i915: Resolving the memory region conflict for Stolen area · b743bb06
      Akash Goel authored
      commit 3617dc96 upstream.
      
      There is a conflict seen when requesting the kernel to reserve
      the physical space used for the stolen area. This is because
      some BIOS are wrapping the stolen area in the root PCI bus, but have
      an off-by-one error. As a workaround we retry the reservation with an
      offset of 1 instead of 0.
      
      v2: updated commit message & the comment in source file (Daniel)
      Signed-off-by: default avatarAkash Goel <akash.goel@intel.com>
      Reviewed-by: default avatarJesse Barnes <jbarnes@virtuousgeek.org>
      Tested-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b743bb06
    • Konstantin Khlebnikov's avatar
      ACPI / osl: speedup grace period in acpi_os_map_cleanup · b3959c77
      Konstantin Khlebnikov authored
      commit 74b51ee1 upstream.
      
      ACPI maintains cache of ioremap regions to speed up operations and
      access to them from irq context where ioremap() calls aren't allowed.
      This code abuses synchronize_rcu() on unmap path for synchronization
      with fast-path in acpi_os_read/write_memory which uses this cache.
      
      Since v3.10 CPUs are allowed to enter idle state even if they have RCU
      callbacks queued, see commit c0f4dfd4
      ("rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks").
      That change caused problems with nvidia proprietary driver which calls
      acpi_os_map/unmap_generic_address several times during initialization.
      Each unmap calls synchronize_rcu and adds significant delay. Totally
      initialization is slowed for a couple of seconds and that is enough to
      trigger timeout in hardware, gpu decides to "fell off the bus". Widely
      spread workaround is reducing "rcu_idle_gp_delay" from 4 to 1 jiffy.
      
      This patch replaces synchronize_rcu() with synchronize_rcu_expedited()
      which is much faster.
      
      Link: https://devtalk.nvidia.com/default/topic/567297/linux/linux-3-10-driver-crash/Signed-off-by: default avatarKonstantin Khlebnikov <koct9i@gmail.com>
      Reported-and-tested-by: default avatarAlexander Monakov <amonakov@gmail.com>
      Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b3959c77
    • Dan Carpenter's avatar
      netfilter: ipset: small potential read beyond the end of buffer · 4d871c60
      Dan Carpenter authored
      commit 2196937e upstream.
      
      We could be reading 8 bytes into a 4 byte buffer here.  It seems
      harmless but adding a check is the right thing to do and it silences a
      static checker warning.
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: default avatarJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      4d871c60
    • Jay Vosburgh's avatar
      net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding · e5924c95
      Jay Vosburgh authored
      [ Upstream commit 2c26d34b ]
      
      When using VXLAN tunnels and a sky2 device, I have experienced
      checksum failures of the following type:
      
      [ 4297.761899] eth0: hw csum failure
      [...]
      [ 4297.765223] Call Trace:
      [ 4297.765224]  <IRQ>  [<ffffffff8172f026>] dump_stack+0x46/0x58
      [ 4297.765235]  [<ffffffff8162ba52>] netdev_rx_csum_fault+0x42/0x50
      [ 4297.765238]  [<ffffffff8161c1a0>] ? skb_push+0x40/0x40
      [ 4297.765240]  [<ffffffff8162325c>] __skb_checksum_complete+0xbc/0xd0
      [ 4297.765243]  [<ffffffff8168c602>] tcp_v4_rcv+0x2e2/0x950
      [ 4297.765246]  [<ffffffff81666ca0>] ? ip_rcv_finish+0x360/0x360
      
      	These are reliably reproduced in a network topology of:
      
      container:eth0 == host(OVS VXLAN on VLAN) == bond0 == eth0 (sky2) -> switch
      
      	When VXLAN encapsulated traffic is received from a similarly
      configured peer, the above warning is generated in the receive
      processing of the encapsulated packet.  Note that the warning is
      associated with the container eth0.
      
              The skbs from sky2 have ip_summed set to CHECKSUM_COMPLETE, and
      because the packet is an encapsulated Ethernet frame, the checksum
      generated by the hardware includes the inner protocol and Ethernet
      headers.
      
      	The receive code is careful to update the skb->csum, except in
      __dev_forward_skb, as called by dev_forward_skb.  __dev_forward_skb
      calls eth_type_trans, which in turn calls skb_pull_inline(skb, ETH_HLEN)
      to skip over the Ethernet header, but does not update skb->csum when
      doing so.
      
      	This patch resolves the problem by adding a call to
      skb_postpull_rcsum to update the skb->csum after the call to
      eth_type_trans.
      Signed-off-by: default avatarJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e5924c95
    • Govindarajulu Varadarajan's avatar
      enic: fix rx skb checksum · 73c0b665
      Govindarajulu Varadarajan authored
      [ Upstream commit 17e96834 ]
      
      Hardware always provides compliment of IP pseudo checksum. Stack expects
      whole packet checksum without pseudo checksum if CHECKSUM_COMPLETE is set.
      
      This causes checksum error in nf & ovs.
      
      kernel: qg-19546f09-f2: hw csum failure
      kernel: CPU: 9 PID: 0 Comm: swapper/9 Tainted: GF          O--------------   3.10.0-123.8.1.el7.x86_64 #1
      kernel: Hardware name: Cisco Systems Inc UCSB-B200-M3/UCSB-B200-M3, BIOS B200M3.2.2.3.0.080820141339 08/08/2014
      kernel: ffff881218f40000 df68243feb35e3a8 ffff881237a43ab8 ffffffff815e237b
      kernel: ffff881237a43ad0 ffffffff814cd4ca ffff8829ec71eb00 ffff881237a43af0
      kernel: ffffffff814c6232 0000000000000286 ffff8829ec71eb00 ffff881237a43b00
      kernel: Call Trace:
      kernel: <IRQ>  [<ffffffff815e237b>] dump_stack+0x19/0x1b
      kernel: [<ffffffff814cd4ca>] netdev_rx_csum_fault+0x3a/0x40
      kernel: [<ffffffff814c6232>] __skb_checksum_complete_head+0x62/0x70
      kernel: [<ffffffff814c6251>] __skb_checksum_complete+0x11/0x20
      kernel: [<ffffffff8155a20c>] nf_ip_checksum+0xcc/0x100
      kernel: [<ffffffffa049edc7>] icmp_error+0x1f7/0x35c [nf_conntrack_ipv4]
      kernel: [<ffffffff814cf419>] ? netif_rx+0xb9/0x1d0
      kernel: [<ffffffffa040eb7b>] ? internal_dev_recv+0xdb/0x130 [openvswitch]
      kernel: [<ffffffffa04c8330>] nf_conntrack_in+0xf0/0xa80 [nf_conntrack]
      kernel: [<ffffffff81509380>] ? inet_del_offload+0x40/0x40
      kernel: [<ffffffffa049e302>] ipv4_conntrack_in+0x22/0x30 [nf_conntrack_ipv4]
      kernel: [<ffffffff815005ca>] nf_iterate+0xaa/0xc0
      kernel: [<ffffffff81509380>] ? inet_del_offload+0x40/0x40
      kernel: [<ffffffff81500664>] nf_hook_slow+0x84/0x140
      kernel: [<ffffffff81509380>] ? inet_del_offload+0x40/0x40
      kernel: [<ffffffff81509dd4>] ip_rcv+0x344/0x380
      
      Hardware verifies IP & tcp/udp header checksum but does not provide payload
      checksum, use CHECKSUM_UNNECESSARY. Set it only if its valid IP tcp/udp packet.
      
      Cc: Jiri Benc <jbenc@redhat.com>
      Cc: Stefan Assmann <sassmann@redhat.com>
      Reported-by: default avatarSunil Choudhary <schoudha@redhat.com>
      Signed-off-by: default avatarGovindarajulu Varadarajan <_govind@gmx.com>
      Reviewed-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      73c0b665
    • Jiri Pirko's avatar
      team: avoid possible underflow of count_pending value for notify_peers and mcast_rejoin · c18e1d2c
      Jiri Pirko authored
      [ Upstream commit b0d11b42 ]
      
      This patch is fixing a race condition that may cause setting
      count_pending to -1, which results in unwanted big bulk of arp messages
      (in case of "notify peers").
      
      Consider following scenario:
      
      count_pending == 2
         CPU0                                           CPU1
      					team_notify_peers_work
      					  atomic_dec_and_test (dec count_pending to 1)
      					  schedule_delayed_work
       team_notify_peers
         atomic_add (adding 1 to count_pending)
      					team_notify_peers_work
      					  atomic_dec_and_test (dec count_pending to 1)
      					  schedule_delayed_work
      					team_notify_peers_work
      					  atomic_dec_and_test (dec count_pending to 0)
         schedule_delayed_work
      					team_notify_peers_work
      					  atomic_dec_and_test (dec count_pending to -1)
      
      Fix this race by using atomic_dec_if_positive - that will prevent
      count_pending running under 0.
      
      Fixes: fc423ff0 ("team: add peer notification")
      Fixes: 492b200e  ("team: add support for sending multicast rejoins")
      Signed-off-by: default avatarJiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarJiri Benc <jbenc@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      c18e1d2c
    • Eric Dumazet's avatar
      alx: fix alx_poll() · cbac9213
      Eric Dumazet authored
      [ Upstream commit 7a05dc64 ]
      
      Commit d75b1ade ("net: less interrupt masking in NAPI") uncovered
      wrong alx_poll() behavior.
      
      A NAPI poll() handler is supposed to return exactly the budget when/if
      napi_complete() has not been called.
      
      It is also supposed to return number of frames that were received, so
      that netdev_budget can have a meaning.
      
      Also, in case of TX pressure, we still have to dequeue received
      packets : alx_clean_rx_irq() has to be called even if
      alx_clean_tx_irq(alx) returns false, otherwise device is half duplex.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Fixes: d75b1ade ("net: less interrupt masking in NAPI")
      Reported-by: default avatarOded Gabbay <oded.gabbay@amd.com>
      Bisected-by: default avatarOded Gabbay <oded.gabbay@amd.com>
      Tested-by: default avatarOded Gabbay <oded.gabbay@amd.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      cbac9213
    • Herbert Xu's avatar
      tcp: Do not apply TSO segment limit to non-TSO packets · dc5e8861
      Herbert Xu authored
      [ Upstream commit 843925f3 ]
      
      Thomas Jarosch reported IPsec TCP stalls when a PMTU event occurs.
      
      In fact the problem was completely unrelated to IPsec.  The bug is
      also reproducible if you just disable TSO/GSO.
      
      The problem is that when the MSS goes down, existing queued packet
      on the TX queue that have not been transmitted yet all look like
      TSO packets and get treated as such.
      
      This then triggers a bug where tcp_mss_split_point tells us to
      generate a zero-sized packet on the TX queue.  Once that happens
      we're screwed because the zero-sized packet can never be removed
      by ACKs.
      
      Fixes: 1485348d ("tcp: Apply device TSO segment limit earlier")
      Reported-by: default avatarThomas Jarosch <thomas.jarosch@intra2net.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      
      Cheers,
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      dc5e8861
    • Thomas Graf's avatar
      net: Reset secmark when scrubbing packet · 00c47292
      Thomas Graf authored
      [ Upstream commit b8fb4e06 ]
      
      skb_scrub_packet() is called when a packet switches between a context
      such as between underlay and overlay, between namespaces, or between
      L3 subnets.
      
      While we already scrub the packet mark, connection tracking entry,
      and cached destination, the security mark/context is left intact.
      
      It seems wrong to inherit the security context of a packet when going
      from overlay to underlay or across forwarding paths.
      Signed-off-by: default avatarThomas Graf <tgraf@suug.ch>
      Acked-by: default avatarFlavio Leitner <fbl@sysclose.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      00c47292
    • Toshiaki Makita's avatar
      net: Fix stacked vlan offload features computation · 6e1a8eb4
      Toshiaki Makita authored
      [ Upstream commit 796f2da8 ]
      
      When vlan tags are stacked, it is very likely that the outer tag is stored
      in skb->vlan_tci and skb->protocol shows the inner tag's vlan_proto.
      Currently netif_skb_features() first looks at skb->protocol even if there
      is the outer tag in vlan_tci, thus it incorrectly retrieves the protocol
      encapsulated by the inner vlan instead of the inner vlan protocol.
      This allows GSO packets to be passed to HW and they end up being
      corrupted.
      
      Fixes: 58e998c6 ("offloading: Force software GSO for multiple vlan tags.")
      Signed-off-by: default avatarToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      6e1a8eb4
    • Prashant Sreedharan's avatar
      tg3: tg3_disable_ints using uninitialized mailbox value to disable interrupts · e6f7e1dc
      Prashant Sreedharan authored
      [ Upstream commit 05b0aa57 ]
      
      During driver load in tg3_init_one, if the driver detects DMA activity before
      intializing the chip tg3_halt is called. As part of tg3_halt interrupts are
      disabled using routine tg3_disable_ints. This routine was using mailbox value
      which was not initialized (default value is 0). As a result driver was writing
      0x00000001 to pci config space register 0, which is the vendor id / device id.
      
      This driver bug was exposed because of the commit a7877b17a667 (PCI: Check only
      the Vendor ID to identify Configuration Request Retry). Also this issue is only
      seen in older generation chipsets like 5722 because config space write to offset
      0 from driver is possible. The newer generation chips ignore writes to offset 0.
      Also without commit a7877b17a667, for these older chips when a GRC reset is
      issued the Bootcode would reprogram the vendor id/device id, which is the reason
      this bug was masked earlier.
      
      Fixed by initializing the interrupt mailbox registers before calling tg3_halt.
      
      Please queue for -stable.
      Reported-by: default avatarNils Holland <nholland@tisys.org>
      Reported-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarPrashant Sreedharan <prashant@broadcom.com>
      Signed-off-by: default avatarMichael Chan <mchan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      e6f7e1dc
    • stephen hemminger's avatar
      in6: fix conflict with glibc · ef260858
      stephen hemminger authored
      [ Upstream commit 6d08acd2 ]
      
      Resolve conflicts between glibc definition of IPV6 socket options
      and those defined in Linux headers. Looks like earlier efforts to
      solve this did not cover all the definitions.
      
      It resolves warnings during iproute2 build.
      Please consider for stable as well.
      Signed-off-by: default avatarStephen Hemminger <stephen@networkplumber.org>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ef260858
    • Thomas Graf's avatar
      netlink: Don't reorder loads/stores before marking mmap netlink frame as available · d7ab02d6
      Thomas Graf authored
      [ Upstream commit a18e6a18 ]
      
      Each mmap Netlink frame contains a status field which indicates
      whether the frame is unused, reserved, contains data or needs to
      be skipped. Both loads and stores may not be reordeded and must
      complete before the status field is changed and another CPU might
      pick up the frame for use. Use an smp_mb() to cover needs of both
      types of callers to netlink_set_status(), callers which have been
      reading data frame from the frame, and callers which have been
      filling or releasing and thus writing to the frame.
      
      - Example code path requiring a smp_rmb():
        memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, hdr->nm_len);
        netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
      
      - Example code path requiring a smp_wmb():
        hdr->nm_uid	= from_kuid(sk_user_ns(sk), NETLINK_CB(skb).creds.uid);
        hdr->nm_gid	= from_kgid(sk_user_ns(sk), NETLINK_CB(skb).creds.gid);
        netlink_frame_flush_dcache(hdr);
        netlink_set_status(hdr, NL_MMAP_STATUS_VALID);
      
      Fixes: f9c228 ("netlink: implement memory mapped recvmsg()")
      Reported-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d7ab02d6
    • David Miller's avatar
      netlink: Always copy on mmap TX. · 837f719b
      David Miller authored
      [ Upstream commit 4682a035 ]
      
      Checking the file f_count and the nlk->mapped count is not completely
      sufficient to prevent the mmap'd area contents from changing from
      under us during netlink mmap sendmsg() operations.
      
      Be careful to sample the header's length field only once, because this
      could change from under us as well.
      
      Fixes: 5fd96123 ("netlink: implement memory mapped sendmsg()")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarThomas Graf <tgraf@suug.ch>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      837f719b
    • Linus Torvalds's avatar
      mm: Don't count the stack guard page towards RLIMIT_STACK · 3e89bd36
      Linus Torvalds authored
      commit 690eac53 upstream.
      
      Commit fee7e49d ("mm: propagate error from stack expansion even for
      guard page") made sure that we return the error properly for stack
      growth conditions.  It also theorized that counting the guard page
      towards the stack limit might break something, but also said "Let's see
      if anybody notices".
      
      Somebody did notice.  Apparently android-x86 sets the stack limit very
      close to the limit indeed, and including the guard page in the rlimit
      check causes the android 'zygote' process problems.
      
      So this adds the (fairly trivial) code to make the stack rlimit check be
      against the actual real stack size, rather than the size of the vma that
      includes the guard page.
      Reported-and-tested-by: default avatarChih-Wei Huang <cwhuang@android-x86.org>
      Cc: Jay Foad <jay.foad@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3e89bd36
    • Linus Torvalds's avatar
      mm: propagate error from stack expansion even for guard page · 88d43e17
      Linus Torvalds authored
      commit fee7e49d upstream.
      
      Jay Foad reports that the address sanitizer test (asan) sometimes gets
      confused by a stack pointer that ends up being outside the stack vma
      that is reported by /proc/maps.
      
      This happens due to an interaction between RLIMIT_STACK and the guard
      page: when we do the guard page check, we ignore the potential error
      from the stack expansion, which effectively results in a missing guard
      page, since the expected stack expansion won't have been done.
      
      And since /proc/maps explicitly ignores the guard page (commit
      d7824370: "mm: fix up some user-visible effects of the stack guard
      page"), the stack pointer ends up being outside the reported stack area.
      
      This is the minimal patch: it just propagates the error.  It also
      effectively makes the guard page part of the stack limit, which in turn
      measn that the actual real stack is one page less than the stack limit.
      
      Let's see if anybody notices.  We could teach acct_stack_growth() to
      allow an extra page for a grow-up/grow-down stack in the rlimit test,
      but I don't want to add more complexity if it isn't needed.
      Reported-and-tested-by: default avatarJay Foad <jay.foad@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      88d43e17
    • Vlastimil Babka's avatar
      mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed · 72af0d9a
      Vlastimil Babka authored
      commit 9e5e3661 upstream.
      
      Charles Shirron and Paul Cassella from Cray Inc have reported kswapd
      stuck in a busy loop with nothing left to balance, but
      kswapd_try_to_sleep() failing to sleep.  Their analysis found the cause
      to be a combination of several factors:
      
      1. A process is waiting in throttle_direct_reclaim() on pgdat->pfmemalloc_wait
      
      2. The process has been killed (by OOM in this case), but has not yet been
         scheduled to remove itself from the waitqueue and die.
      
      3. kswapd checks for throttled processes in prepare_kswapd_sleep():
      
              if (waitqueue_active(&pgdat->pfmemalloc_wait)) {
                      wake_up(&pgdat->pfmemalloc_wait);
      		return false; // kswapd will not go to sleep
      	}
      
         However, for a process that was already killed, wake_up() does not remove
         the process from the waitqueue, since try_to_wake_up() checks its state
         first and returns false when the process is no longer waiting.
      
      4. kswapd is running on the same CPU as the only CPU that the process is
         allowed to run on (through cpus_allowed, or possibly single-cpu system).
      
      5. CONFIG_PREEMPT_NONE=y kernel is used. If there's nothing to balance, kswapd
         encounters no voluntary preemption points and repeatedly fails
         prepare_kswapd_sleep(), blocking the process from running and removing
         itself from the waitqueue, which would let kswapd sleep.
      
      So, the source of the problem is that we prevent kswapd from going to
      sleep until there are processes waiting on the pfmemalloc_wait queue,
      and a process waiting on a queue is guaranteed to be removed from the
      queue only when it gets scheduled.  This was done to make sure that no
      process is left sleeping on pfmemalloc_wait when kswapd itself goes to
      sleep.
      
      However, it isn't necessary to postpone kswapd sleep until the
      pfmemalloc_wait queue actually empties.  To prevent processes from being
      left sleeping, it's actually enough to guarantee that all processes
      waiting on pfmemalloc_wait queue have been woken up by the time we put
      kswapd to sleep.
      
      This patch therefore fixes this issue by substituting 'wake_up' with
      'wake_up_all' and removing 'return false' in the code snippet from
      prepare_kswapd_sleep() above.  Note that if any process puts itself in
      the queue after this waitqueue_active() check, or after the wake up
      itself, it means that the process will also wake up kswapd - and since
      we are under prepare_to_wait(), the wake up won't be missed.  Also we
      update the comment prepare_kswapd_sleep() to hopefully more clearly
      describe the races it is preventing.
      
      Fixes: 5515061d ("mm: throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage")
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      72af0d9a
    • Krzysztof Kozlowski's avatar
      mmc: sdhci: Fix sleep in atomic after inserting SD card · 0cd8eb43
      Krzysztof Kozlowski authored
      commit 2836766a upstream.
      
      Sleep in atomic context happened on Trats2 board after inserting or
      removing SD card because mmc_gpio_get_cd() was called under spin lock.
      
      Fix this by moving card detection earlier, before acquiring spin lock.
      The mmc_gpio_get_cd() call does not have to be protected by spin lock
      because it does not access any sdhci internal data.
      The sdhci_do_get_cd() call access host flags (SDHCI_DEVICE_DEAD). After
      moving it out side of spin lock it could theoretically race with driver
      removal but still there is no actual protection against manual card
      eject.
      
      Dmesg after inserting SD card:
      [   41.663414] BUG: sleeping function called from invalid context at drivers/gpio/gpiolib.c:1511
      [   41.670469] in_atomic(): 1, irqs_disabled(): 128, pid: 30, name: kworker/u8:1
      [   41.677580] INFO: lockdep is turned off.
      [   41.681486] irq event stamp: 61972
      [   41.684872] hardirqs last  enabled at (61971): [<c0490ee0>] _raw_spin_unlock_irq+0x24/0x5c
      [   41.693118] hardirqs last disabled at (61972): [<c04907ac>] _raw_spin_lock_irq+0x18/0x54
      [   41.701190] softirqs last  enabled at (61648): [<c0026fd4>] __do_softirq+0x234/0x2c8
      [   41.708914] softirqs last disabled at (61631): [<c00273a0>] irq_exit+0xd0/0x114
      [   41.716206] Preemption disabled at:[<  (null)>]   (null)
      [   41.721500]
      [   41.722985] CPU: 3 PID: 30 Comm: kworker/u8:1 Tainted: G        W      3.18.0-rc5-next-20141121 #883
      [   41.732111] Workqueue: kmmcd mmc_rescan
      [   41.735945] [<c0014d2c>] (unwind_backtrace) from [<c0011c80>] (show_stack+0x10/0x14)
      [   41.743661] [<c0011c80>] (show_stack) from [<c0489d14>] (dump_stack+0x70/0xbc)
      [   41.750867] [<c0489d14>] (dump_stack) from [<c0228b74>] (gpiod_get_raw_value_cansleep+0x18/0x30)
      [   41.759628] [<c0228b74>] (gpiod_get_raw_value_cansleep) from [<c03646e8>] (mmc_gpio_get_cd+0x38/0x58)
      [   41.768821] [<c03646e8>] (mmc_gpio_get_cd) from [<c036d378>] (sdhci_request+0x50/0x1a4)
      [   41.776808] [<c036d378>] (sdhci_request) from [<c0357934>] (mmc_start_request+0x138/0x268)
      [   41.785051] [<c0357934>] (mmc_start_request) from [<c0357cc8>] (mmc_wait_for_req+0x58/0x1a0)
      [   41.793469] [<c0357cc8>] (mmc_wait_for_req) from [<c0357e68>] (mmc_wait_for_cmd+0x58/0x78)
      [   41.801714] [<c0357e68>] (mmc_wait_for_cmd) from [<c0361c00>] (mmc_io_rw_direct_host+0x98/0x124)
      [   41.810480] [<c0361c00>] (mmc_io_rw_direct_host) from [<c03620f8>] (sdio_reset+0x2c/0x64)
      [   41.818641] [<c03620f8>] (sdio_reset) from [<c035a3d8>] (mmc_rescan+0x254/0x2e4)
      [   41.826028] [<c035a3d8>] (mmc_rescan) from [<c003a0e0>] (process_one_work+0x180/0x3f4)
      [   41.833920] [<c003a0e0>] (process_one_work) from [<c003a3bc>] (worker_thread+0x34/0x4b0)
      [   41.841991] [<c003a3bc>] (worker_thread) from [<c003fed8>] (kthread+0xe4/0x104)
      [   41.849285] [<c003fed8>] (kthread) from [<c000f268>] (ret_from_fork+0x14/0x2c)
      [   42.038276] mmc0: new high speed SDHC card at address 1234
      Signed-off-by: default avatarKrzysztof Kozlowski <k.kozlowski@samsung.com>
      Fixes: 94144a46 ("mmc: sdhci: add get_cd() implementation")
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      0cd8eb43
    • Stefan Roese's avatar
      spi: fsl: Fix problem with multi message transfers · 44189ec1
      Stefan Roese authored
      commit 4302a596 upstream.
      
      When used via spidev with more than one messages to tranfer via
      SPI_IOC_MESSAGE the current implementation would return with
      -EINVAL, since bits_per_word and speed_hz are set in all
      transfer structs. And in the 2nd loop status will stay at
      -EINVAL as its not overwritten again via fsl_spi_setup_transfer().
      
      This patch changes this behavious by first checking if one of
      the messages uses different settings. If this is the case
      the function will return with -EINVAL. If not, the messages
      are transferred correctly.
      Signed-off-by: default avatarStefan Roese <sr@denx.de>
      Signed-off-by: default avatarMark Brown <broonie@linaro.org>
      Cc: Esben Haabendal <esbenhaabendal@gmail.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      44189ec1
    • Jiri Olsa's avatar
      perf session: Do not fail on processing out of order event · 0f069ee1
      Jiri Olsa authored
      commit f61ff6c0 upstream.
      
      Linus reported perf report command being interrupted due to processing
      of 'out of order' event, with following error:
      
        Timestamp below last timeslice flush
        0x5733a8 [0x28]: failed to process type: 3
      
      I could reproduce the issue and in my case it was caused by one CPU
      (mmap) being behind during record and userspace mmap reader seeing the
      data after other CPUs data were already stored.
      
      This is expected under some circumstances because we need to limit the
      number of events that we queue for reordering when we receive a
      PERF_RECORD_FINISHED_ROUND or when we force flush due to memory
      pressure.
      Reported-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt.fleming@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1417016371-30249-1-git-send-email-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      [zhangzhiqiang: backport to 3.10:
       - adjust context
       - commit f61ff6c0 struct events_stats was defined in tools/perf/util/event.h
         while 3.10 stable defined in tools/perf/util/hist.h.
       - 3.10 stable there is no pr_oe_time() which used for debug.
       - After the above adjustments, becomes same to the original patch:
         https://github.com/torvalds/linux/commit/f61ff6c06dc8f32c7036013ad802c899ec590607
      ]
      Signed-off-by: default avatarZhiqiang Zhang <zhangzhiqiang.zhang@huawei.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      0f069ee1
    • Jiri Olsa's avatar
      perf: Fix events installation during moving group · a1f8e247
      Jiri Olsa authored
      commit 9fc81d87 upstream.
      
      We allow PMU driver to change the cpu on which the event
      should be installed to. This happened in patch:
      
        e2d37cd2 ("perf: Allow the PMU driver to choose the CPU on which to install events")
      
      This patch also forces all the group members to follow
      the currently opened events cpu if the group happened
      to be moved.
      
      This and the change of event->cpu in perf_install_in_context()
      function introduced in:
      
        0cda4c02 ("perf: Introduce perf_pmu_migrate_context()")
      
      forces group members to change their event->cpu,
      if the currently-opened-event's PMU changed the cpu
      and there is a group move.
      
      Above behaviour causes problem for breakpoint events,
      which uses event->cpu to touch cpu specific data for
      breakpoints accounting. By changing event->cpu, some
      breakpoints slots were wrongly accounted for given
      cpu.
      
      Vinces's perf fuzzer hit this issue and caused following
      WARN on my setup:
      
         WARNING: CPU: 0 PID: 20214 at arch/x86/kernel/hw_breakpoint.c:119 arch_install_hw_breakpoint+0x142/0x150()
         Can't find any breakpoint slot
         [...]
      
      This patch changes the group moving code to keep the event's
      original cpu.
      Reported-by: default avatarVince Weaver <vince@deater.net>
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vince@deater.net>
      Cc: Yan, Zheng <zheng.z.yan@intel.com>
      Link: http://lkml.kernel.org/r/1418243031-20367-3-git-send-email-jolsa@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a1f8e247
    • Jiri Olsa's avatar
      perf/x86/intel/uncore: Make sure only uncore events are collected · 5f34dcba
      Jiri Olsa authored
      commit af91568e upstream.
      
      The uncore_collect_events functions assumes that event group
      might contain only uncore events which is wrong, because it
      might contain any type of events.
      
      This bug leads to uncore framework touching 'not' uncore events,
      which could end up all sorts of bugs.
      
      One was triggered by Vince's perf fuzzer, when the uncore code
      touched breakpoint event private event space as if it was uncore
      event and caused BUG:
      
         BUG: unable to handle kernel paging request at ffffffff82822068
         IP: [<ffffffff81020338>] uncore_assign_events+0x188/0x250
         ...
      
      The code in uncore_assign_events() function was looking for
      event->hw.idx data while the event was initialized as a
      breakpoint with different members in event->hw union.
      
      This patch forces uncore_collect_events() to collect only uncore
      events.
      Reported-by: default avatarVince Weaver <vince@deater.net>
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Yan, Zheng <zheng.z.yan@intel.com>
      Link: http://lkml.kernel.org/r/1418243031-20367-2-git-send-email-jolsa@kernel.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      5f34dcba
    • Thomas Petazzoni's avatar
      ARM: mvebu: disable I/O coherency on non-SMP situations on Armada 370/375/38x/XP · 39ca76b2
      Thomas Petazzoni authored
      commit e5535545 upstream.
      
      Enabling the hardware I/O coherency on Armada 370, Armada 375, Armada
      38x and Armada XP requires a certain number of conditions:
      
       - On Armada 370, the cache policy must be set to write-allocate.
      
       - On Armada 375, 38x and XP, the cache policy must be set to
         write-allocate, the pages must be mapped with the shareable
         attribute, and the SMP bit must be set
      
      Currently, on Armada XP, when CONFIG_SMP is enabled, those conditions
      are met. However, when Armada XP is used in a !CONFIG_SMP kernel, none
      of these conditions are met. With Armada 370, the situation is worse:
      since the processor is single core, regardless of whether CONFIG_SMP
      or !CONFIG_SMP is used, the cache policy will be set to write-back by
      the kernel and not write-allocate.
      
      Since solving this problem turns out to be quite complicated, and we
      don't want to let users with a mainline kernel known to have
      infrequent but existing data corruptions, this commit proposes to
      simply disable hardware I/O coherency in situations where it is known
      not to work.
      
      And basically, the is_smp() function of the kernel tells us whether it
      is OK to enable hardware I/O coherency or not, so this commit slightly
      refactors the coherency_type() function to return
      COHERENCY_FABRIC_TYPE_NONE when is_smp() is false, or the appropriate
      type of the coherency fabric in the other case.
      
      Thanks to this, the I/O coherency fabric will no longer be used at all
      in !CONFIG_SMP configurations. It will continue to be used in
      CONFIG_SMP configurations on Armada XP, Armada 375 and Armada 38x
      (which are multiple cores processors), but will no longer be used on
      Armada 370 (which is a single core processor).
      
      In the process, it simplifies the implementation of the
      coherency_type() function, and adds a missing call to of_node_put().
      Signed-off-by: default avatarThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Fixes: e60304f8 ("arm: mvebu: Add hardware I/O Coherency support")
      Acked-by: default avatarGregory CLEMENT <gregory.clement@free-electrons.com>
      Link: https://lkml.kernel.org/r/1415871540-20302-3-git-send-email-thomas.petazzoni@free-electrons.comSigned-off-by: default avatarJason Cooper <jason@lakedaemon.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      39ca76b2
    • Pavel Machek's avatar
      Revert "ARM: 7830/1: delay: don't bother reporting bogomips in /proc/cpuinfo" · f7ab6b67
      Pavel Machek authored
      commit 4bf9636c upstream.
      
      Commit 9fc2105a ("ARM: 7830/1: delay: don't bother reporting
      bogomips in /proc/cpuinfo") breaks audio in python, and probably
      elsewhere, with message
      
        FATAL: cannot locate cpu MHz in /proc/cpuinfo
      
      I'm not the first one to hit it, see for example
      
        https://theredblacktree.wordpress.com/2014/08/10/fatal-cannot-locate-cpu-mhz-in-proccpuinfo/
        https://devtalk.nvidia.com/default/topic/765800/workaround-for-fatal-cannot-locate-cpu-mhz-in-proc-cpuinf/?offset=1
      
      Reading original changelog, I have to say "Stop breaking working setups.
      You know who you are!".
      Signed-off-by: default avatarPavel Machek <pavel@ucw.cz>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      f7ab6b67
    • Nishanth Menon's avatar
      ARM: OMAP4: PM: Only do static dependency configuration in omap4_init_static_deps · 02d684c2
      Nishanth Menon authored
      commit 9008d83f upstream.
      
      Commit 705814b5 ("ARM: OMAP4+: PM: Consolidate OMAP4 PM code to
      re-use it for OMAP5")
      
      Moved logic generic for OMAP5+ as part of the init routine by
      introducing omap4_pm_init. However, the patch left the powerdomain
      initial setup, an unused omap4430 es1.0 check and a spurious log
      "Power Management for TI OMAP4." in the original code.
      
      Remove the duplicate code which is already present in omap4_pm_init from
      omap4_init_static_deps.
      
      As part of this change, also move the u-boot version print out of the
      static dependency function to the omap4_pm_init function.
      
      Fixes: 705814b5 ("ARM: OMAP4+: PM: Consolidate OMAP4 PM code to re-use it for OMAP5")
      Signed-off-by: default avatarNishanth Menon <nm@ti.com>
      Signed-off-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      02d684c2
    • Johannes Berg's avatar
      scripts/kernel-doc: don't eat struct members with __aligned · 3e0c111c
      Johannes Berg authored
      commit 7b990789 upstream.
      
      The change from \d+ to .+ inside __aligned() means that the following
      structure:
      
        struct test {
              u8 a __aligned(2);
              u8 b __aligned(2);
        };
      
      essentially gets modified to
      
        struct test {
              u8 a;
        };
      
      for purposes of kernel-doc, thus dropping a struct member, which in
      turns causes warnings and invalid kernel-doc generation.
      
      Fix this by replacing the catch-all (".") with anything that's not a
      semicolon ("[^;]").
      
      Fixes: 9dc30918 ("scripts/kernel-doc: handle struct member __aligned without numbers")
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Cc: Nishanth Menon <nm@ti.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      3e0c111c
    • Ryusuke Konishi's avatar
      nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races · bd15ad4f
      Ryusuke Konishi authored
      commit 705304a8 upstream.
      
      Same story as in commit 41080b5a ("nfsd race fixes: ext2") (similar
      ext2 fix) except that nilfs2 needs to use insert_inode_locked4() instead
      of insert_inode_locked() and a bug of a check for dead inodes needs to
      be fixed.
      
      If nilfs_iget() is called from nfsd after nilfs_new_inode() calls
      insert_inode_locked4(), nilfs_iget() will wait for unlock_new_inode() at
      the end of nilfs_mkdir()/nilfs_create()/etc to unlock the inode.
      
      If nilfs_iget() is called before nilfs_new_inode() calls
      insert_inode_locked4(), it will create an in-core inode and read its
      data from the on-disk inode.  But, nilfs_iget() will find i_nlink equals
      zero and fail at nilfs_read_inode_common(), which will lead it to call
      iget_failed() and cleanly fail.
      
      However, this sanity check doesn't work as expected for reused on-disk
      inodes because they leave a non-zero value in i_mode field and it
      hinders the test of i_nlink.  This patch also fixes the issue by
      removing the test on i_mode that nilfs2 doesn't need.
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      bd15ad4f
    • Brian Norris's avatar
      mtd: tests: abort torturetest on erase errors · a3c56362
      Brian Norris authored
      commit 68f29815 upstream.
      
      The torture test should quit once it actually induces an error in the
      flash. This step was accidentally removed during refactoring.
      
      Without this fix, the torturetest just continues infinitely, or until
      the maximum cycle count is reached. e.g.:
      
         ...
         [ 7619.218171] mtd_test: error -5 while erasing EB 100
         [ 7619.297981] mtd_test: error -5 while erasing EB 100
         [ 7619.377953] mtd_test: error -5 while erasing EB 100
         [ 7619.457998] mtd_test: error -5 while erasing EB 100
         [ 7619.537990] mtd_test: error -5 while erasing EB 100
         ...
      
      Fixes: 6cf78358 ("mtd: mtd_torturetest: use mtd_test helpers")
      Signed-off-by: default avatarBrian Norris <computersforpeace@gmail.com>
      Cc: Akinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a3c56362
    • Dan Carpenter's avatar
      ceph: do_sync is never initialized · f844de9a
      Dan Carpenter authored
      commit 021b77be upstream.
      
      Probably this code was syncing a lot more often then intended because
      the do_sync variable wasn't set to zero.
      
      Fixes: c62988ec ('ceph: avoid meaningless calling ceph_caps_revoking if sync_mode == WB_SYNC_ALL.')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarIlya Dryomov <idryomov@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      f844de9a
    • Benjamin Coddington's avatar
      nfsd4: fix xdr4 inclusion of escaped char · 03e6a1dc
      Benjamin Coddington authored
      commit 5a64e569 upstream.
      
      Fix a bug where nfsd4_encode_components_esc() includes the esc_end char as
      an additional string encoding.
      Signed-off-by: default avatarBenjamin Coddington <bcodding@redhat.com>
      Fixes: e7a0444a "nfsd: add IPv6 addr escaping to fs_location hosts"
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      03e6a1dc
    • Rasmus Villemoes's avatar
      fs: nfsd: Fix signedness bug in compare_blob · b2abe3c2
      Rasmus Villemoes authored
      commit ef17af2a upstream.
      
      Bugs similar to the one in acbbe6fb (kcmp: fix standard comparison
      bug) are in rich supply.
      
      In this variant, the problem is that struct xdr_netobj::len has type
      unsigned int, so the expression o1->len - o2->len _also_ has type
      unsigned int; it has completely well-defined semantics, and the result
      is some non-negative integer, which is always representable in a long
      long. But this means that if the conditional triggers, we are
      guaranteed to return a positive value from compare_blob.
      
      In this case it could be fixed by
      
      -       res = o1->len - o2->len;
      +       res = (long long)o1->len - (long long)o2->len;
      
      but I'd rather eliminate the usually broken 'return a - b;' idiom.
      Reviewed-by: default avatarJeff Layton <jlayton@primarydata.com>
      Signed-off-by: default avatarRasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b2abe3c2
    • Vitaly Kuznetsov's avatar
      Drivers: hv: vmbus: Fix a race condition when unregistering a device · 5ea7d58c
      Vitaly Kuznetsov authored
      commit 04a258c1 upstream.
      
      When build with Debug the following crash is sometimes observed:
      Call Trace:
       [<ffffffff812b9600>] string+0x40/0x100
       [<ffffffff812bb038>] vsnprintf+0x218/0x5e0
       [<ffffffff810baf7d>] ? trace_hardirqs_off+0xd/0x10
       [<ffffffff812bb4c1>] vscnprintf+0x11/0x30
       [<ffffffff8107a2f0>] vprintk+0xd0/0x5c0
       [<ffffffffa0051ea0>] ? vmbus_process_rescind_offer+0x0/0x110 [hv_vmbus]
       [<ffffffff8155c71c>] printk+0x41/0x45
       [<ffffffffa004ebac>] vmbus_device_unregister+0x2c/0x40 [hv_vmbus]
       [<ffffffffa0051ecb>] vmbus_process_rescind_offer+0x2b/0x110 [hv_vmbus]
      ...
      
      This happens due to the following race: between 'if (channel->device_obj)' check
      in vmbus_process_rescind_offer() and pr_debug() in vmbus_device_unregister() the
      device can disappear. Fix the issue by taking an additional reference to the
      device before proceeding to vmbus_device_unregister().
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: default avatarK. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      5ea7d58c
    • Christian Riesch's avatar
      n_tty: Fix read_buf race condition, increment read_head after pushing data · fcd05986
      Christian Riesch authored
      commit 8bfbe2de upstream.
      
      Commit 19e2ad6a ("n_tty: Remove overflow
      tests from receive_buf() path") moved the increment of read_head into
      the arguments list of read_buf_addr(). Function calls represent a
      sequence point in C. Therefore read_head is incremented before the
      character c is placed in the buffer. Since the circular read buffer is
      a lock-less design since commit 6d76bd26
      ("n_tty: Make N_TTY ldisc receive path lockless"), this creates a race
      condition that leads to communication errors.
      
      This patch modifies the code to increment read_head _after_ the data
      is placed in the buffer and thus fixes the race for non-SMP machines.
      To fix the problem for SMP machines, memory barriers must be added in
      a separate patch.
      Signed-off-by: default avatarChristian Riesch <christian.riesch@omicron.at>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      fcd05986
    • Robert Baldyga's avatar
      serial: samsung: wait for transfer completion before clock disable · d053001d
      Robert Baldyga authored
      commit 1ff383a4 upstream.
      
      This patch adds waiting until transmit buffer and shifter will be empty
      before clock disabling.
      
      Without this fix it's possible to have clock disabled while data was
      not transmited yet, which causes unproper state of TX line and problems
      in following data transfers.
      Signed-off-by: default avatarRobert Baldyga <r.baldyga@samsung.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d053001d
    • Tejun Heo's avatar
      writeback: fix a subtle race condition in I_DIRTY clearing · 36e4033e
      Tejun Heo authored
      commit 9c6ac78e upstream.
      
      After invoking ->dirty_inode(), __mark_inode_dirty() does smp_mb() and
      tests inode->i_state locklessly to see whether it already has all the
      necessary I_DIRTY bits set.  The comment above the barrier doesn't
      contain any useful information - memory barriers can't ensure "changes
      are seen by all cpus" by itself.
      
      And it sure enough was broken.  Please consider the following
      scenario.
      
       CPU 0					CPU 1
       -------------------------------------------------------------------------------
      
      					enters __writeback_single_inode()
      					grabs inode->i_lock
      					tests PAGECACHE_TAG_DIRTY which is clear
       enters __set_page_dirty()
       grabs mapping->tree_lock
       sets PAGECACHE_TAG_DIRTY
       releases mapping->tree_lock
       leaves __set_page_dirty()
      
       enters __mark_inode_dirty()
       smp_mb()
       sees I_DIRTY_PAGES set
       leaves __mark_inode_dirty()
      					clears I_DIRTY_PAGES
      					releases inode->i_lock
      
      Now @inode has dirty pages w/ I_DIRTY_PAGES clear.  This doesn't seem
      to lead to an immediately critical problem because requeue_inode()
      later checks PAGECACHE_TAG_DIRTY instead of I_DIRTY_PAGES when
      deciding whether the inode needs to be requeued for IO and there are
      enough unintentional memory barriers inbetween, so while the inode
      ends up with inconsistent I_DIRTY_PAGES flag, it doesn't fall off the
      IO list.
      
      The lack of explicit barrier may also theoretically affect the other
      I_DIRTY bits which deal with metadata dirtiness.  There is no
      guarantee that a strong enough barrier exists between
      I_DIRTY_[DATA]SYNC clearing and write_inode() writing out the dirtied
      inode.  Filesystem inode writeout path likely has enough stuff which
      can behave as full barrier but it's theoretically possible that the
      writeout may not see all the updates from ->dirty_inode().
      
      Fix it by adding an explicit smp_mb() after I_DIRTY clearing.  Note
      that I_DIRTY_PAGES needs a special treatment as it always needs to be
      cleared to be interlocked with the lockless test on
      __mark_inode_dirty() side.  It's cleared unconditionally and
      reinstated after smp_mb() if the mapping still has dirty pages.
      
      Also add comments explaining how and why the barriers are paired.
      
      Lightly tested.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      36e4033e
    • Oliver Neukum's avatar
      cdc-acm: memory leak in error case · b38841c6
      Oliver Neukum authored
      commit d908f847 upstream.
      
      If probe() fails not only the attributes need to be removed
      but also the memory freed.
      Reported-by: default avatarAhmed Tamrawi <ahmedtamrawi@gmail.com>
      Signed-off-by: default avatarOliver Neukum <oneukum@suse.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      b38841c6
    • Jens Axboe's avatar
      genhd: check for int overflow in disk_expand_part_tbl() · faabc4f5
      Jens Axboe authored
      commit 5fabcb4c upstream.
      
      We can get here from blkdev_ioctl() -> blkpg_ioctl() -> add_partition()
      with a user passed in partno value. If we pass in 0x7fffffff, the
      new target in disk_expand_part_tbl() overflows the 'int' and we
      access beyond the end of ptbl->part[] and even write to it when we
      do the rcu_assign_pointer() to assign the new partition.
      Reported-by: default avatarDavid Ramos <daramos@stanford.edu>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      faabc4f5
    • Steev Klimaszewski's avatar
      Add USB_EHCI_EXYNOS to multi_v7_defconfig · d379dec7
      Steev Klimaszewski authored
      commit 007487f1 upstream.
      
      Currently we enable Exynos devices in the multi v7 defconfig, however, when
      testing on my ODROID-U3, I noticed that USB was not working.  Enabling this
      option causes USB to work, which enables networking support as well since the
      ODROID-U3 has networking on the USB bus.
      
      [arnd] Support for odroid-u3 was added in 3.10, so it would be nice to
      backport this fix at least that far.
      Signed-off-by: default avatarSteev Klimaszewski <steev@gentoo.org>
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d379dec7