1. 10 Feb, 2015 15 commits
    • Eric Dumazet's avatar
      ipv4: tcp: get rid of ugly unicast_sock · 1a95b47a
      Eric Dumazet authored
      [ Upstream commit bdbbb852 ]
      
      In commit be9f4a44 ("ipv4: tcp: remove per net tcp_sock")
      I tried to address contention on a socket lock, but the solution
      I chose was horrible :
      
      commit 3a7c384f ("ipv4: tcp: unicast_sock should not land outside
      of TCP stack") addressed a selinux regression.
      
      commit 0980e56e ("ipv4: tcp: set unicast_sock uc_ttl to -1")
      took care of another regression.
      
      commit b5ec8eea ("ipv4: fix ip_send_skb()") fixed another regression.
      
      commit 811230cd ("tcp: ipv4: initialize unicast_sock sk_pacing_rate")
      was another shot in the dark.
      
      Really, just use a proper socket per cpu, and remove the skb_orphan()
      call, to re-enable flow control.
      
      This solves a serious problem with FQ packet scheduler when used in
      hostile environments, as we do not want to allocate a flow structure
      for every RST packet sent in response to a spoofed packet.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      1a95b47a
    • Eric Dumazet's avatar
      tcp: ipv4: initialize unicast_sock sk_pacing_rate · 5e1ab4d7
      Eric Dumazet authored
      [ Upstream commit 811230cd ]
      
      When I added sk_pacing_rate field, I forgot to initialize its value
      in the per cpu unicast_sock used in ip_send_unicast_reply()
      
      This means that for sch_fq users, RST packets, or ACK packets sent
      on behalf of TIME_WAIT sockets might be sent to slowly or even dropped
      once we reach the per flow limit.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Fixes: 95bd09eb ("tcp: TSO packets automatic sizing")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      5e1ab4d7
    • Roopa Prabhu's avatar
      bridge: dont send notification when skb->len == 0 in rtnl_bridge_notify · efa4b71b
      Roopa Prabhu authored
      [ Upstream commit 59ccaaaa ]
      
      Reported in: https://bugzilla.kernel.org/show_bug.cgi?id=92081
      
      This patch avoids calling rtnl_notify if the device ndo_bridge_getlink
      handler does not return any bytes in the skb.
      
      Alternately, the skb->len check can be moved inside rtnl_notify.
      
      For the bridge vlan case described in 92081, there is also a fix needed
      in bridge driver to generate a proper notification. Will fix that in
      subsequent patch.
      
      v2: rebase patch on net tree
      Signed-off-by: default avatarRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      efa4b71b
    • Christoph Hellwig's avatar
      net: don't OOPS on socket aio · d7a5ebfe
      Christoph Hellwig authored
      [ Upstream commit 06539d30 ]
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d7a5ebfe
    • Govindarajulu Varadarajan's avatar
      bnx2x: fix napi poll return value for repoll · 7b254cb1
      Govindarajulu Varadarajan authored
      [ Upstream commit 24e579c8 ]
      
      With the commit d75b1ade ("net: less interrupt masking in NAPI") napi
      repoll is done only when work_done == budget. When in busy_poll is we return 0
      in napi_poll. We should return budget.
      Signed-off-by: default avatarGovindarajulu Varadarajan <_govind@gmx.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      7b254cb1
    • Hannes Frederic Sowa's avatar
      ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too · c196ece7
      Hannes Frederic Sowa authored
      [ Upstream commit 6e9e16e6 ]
      
      Lubomir Rintel reported that during replacing a route the interface
      reference counter isn't correctly decremented.
      
      To quote bug <https://bugzilla.kernel.org/show_bug.cgi?id=91941>:
      | [root@rhel7-5 lkundrak]# sh -x lal
      | + ip link add dev0 type dummy
      | + ip link set dev0 up
      | + ip link add dev1 type dummy
      | + ip link set dev1 up
      | + ip addr add 2001:db8:8086::2/64 dev dev0
      | + ip route add 2001:db8:8086::/48 dev dev0 proto static metric 20
      | + ip route add 2001:db8:8088::/48 dev dev1 proto static metric 10
      | + ip route replace 2001:db8:8086::/48 dev dev1 proto static metric 20
      | + ip link del dev0 type dummy
      | Message from syslogd@rhel7-5 at Jan 23 10:54:41 ...
      |  kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2
      |
      | Message from syslogd@rhel7-5 at Jan 23 10:54:51 ...
      |  kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2
      
      During replacement of a rt6_info we must walk all parent nodes and check
      if the to be replaced rt6_info got propagated. If so, replace it with
      an alive one.
      
      Fixes: 4a287eba ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag")
      Reported-by: default avatarLubomir Rintel <lkundrak@v3.sk>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Tested-by: default avatarLubomir Rintel <lkundrak@v3.sk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      c196ece7
    • subashab@codeaurora.org's avatar
      ping: Fix race in free in receive path · 757651a7
      subashab@codeaurora.org authored
      [ Upstream commit fc752f1f ]
      
      An exception is seen in ICMP ping receive path where the skb
      destructor sock_rfree() tries to access a freed socket. This happens
      because ping_rcv() releases socket reference with sock_put() and this
      internally frees up the socket. Later icmp_rcv() will try to free the
      skb and as part of this, skb destructor is called and which leads
      to a kernel panic as the socket is freed already in ping_rcv().
      
      -->|exception
      -007|sk_mem_uncharge
      -007|sock_rfree
      -008|skb_release_head_state
      -009|skb_release_all
      -009|__kfree_skb
      -010|kfree_skb
      -011|icmp_rcv
      -012|ip_local_deliver_finish
      
      Fix this incorrect free by cloning this skb and processing this cloned
      skb instead.
      
      This patch was suggested by Eric Dumazet
      Signed-off-by: default avatarSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      757651a7
    • Herbert Xu's avatar
      udp_diag: Fix socket skipping within chain · fd09a4de
      Herbert Xu authored
      [ Upstream commit 86f3cddb ]
      
      While working on rhashtable walking I noticed that the UDP diag
      dumping code is buggy.  In particular, the socket skipping within
      a chain never happens, even though we record the number of sockets
      that should be skipped.
      
      As this code was supposedly copied from TCP, this patch does what
      TCP does and resets num before we walk a chain.
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Acked-by: default avatarPavel Emelyanov <xemul@parallels.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      fd09a4de
    • Hannes Frederic Sowa's avatar
      ipv4: try to cache dst_entries which would cause a redirect · 8277cc54
      Hannes Frederic Sowa authored
      [ Upstream commit df4d9254 ]
      
      Not caching dst_entries which cause redirects could be exploited by hosts
      on the same subnet, causing a severe DoS attack. This effect aggravated
      since commit f8864972 ("ipv4: fix dst race in sk_dst_get()").
      
      Lookups causing redirects will be allocated with DST_NOCACHE set which
      will force dst_release to free them via RCU.  Unfortunately waiting for
      RCU grace period just takes too long, we can end up with >1M dst_entries
      waiting to be released and the system will run OOM. rcuos threads cannot
      catch up under high softirq load.
      
      Attaching the flag to emit a redirect later on to the specific skb allows
      us to cache those dst_entries thus reducing the pressure on allocation
      and deallocation.
      
      This issue was discovered by Marcelo Leitner.
      
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarMarcelo Leitner <mleitner@redhat.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarJulian Anastasov <ja@ssi.bg>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      8277cc54
    • Daniel Borkmann's avatar
      net: sctp: fix slab corruption from use after free on INIT collisions · 43e39c2f
      Daniel Borkmann authored
      [ Upstream commit 600ddd68 ]
      
      When hitting an INIT collision case during the 4WHS with AUTH enabled, as
      already described in detail in commit 1be9a950 ("net: sctp: inherit
      auth_capable on INIT collisions"), it can happen that we occasionally
      still remotely trigger the following panic on server side which seems to
      have been uncovered after the fix from commit 1be9a950 ...
      
      [  533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff
      [  533.913657] IP: [<ffffffff811ac385>] __kmalloc+0x95/0x230
      [  533.940559] PGD 5030f2067 PUD 0
      [  533.957104] Oops: 0000 [#1] SMP
      [  533.974283] Modules linked in: sctp mlx4_en [...]
      [  534.939704] Call Trace:
      [  534.951833]  [<ffffffff81294e30>] ? crypto_init_shash_ops+0x60/0xf0
      [  534.984213]  [<ffffffff81294e30>] crypto_init_shash_ops+0x60/0xf0
      [  535.015025]  [<ffffffff8128c8ed>] __crypto_alloc_tfm+0x6d/0x170
      [  535.045661]  [<ffffffff8128d12c>] crypto_alloc_base+0x4c/0xb0
      [  535.074593]  [<ffffffff8160bd42>] ? _raw_spin_lock_bh+0x12/0x50
      [  535.105239]  [<ffffffffa0418c11>] sctp_inet_listen+0x161/0x1e0 [sctp]
      [  535.138606]  [<ffffffff814e43bd>] SyS_listen+0x9d/0xb0
      [  535.166848]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b
      
      ... or depending on the the application, for example this one:
      
      [ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff
      [ 1370.026506] IP: [<ffffffff811ab455>] kmem_cache_alloc+0x75/0x1d0
      [ 1370.054568] PGD 633c94067 PUD 0
      [ 1370.070446] Oops: 0000 [#1] SMP
      [ 1370.085010] Modules linked in: sctp kvm_amd kvm [...]
      [ 1370.963431] Call Trace:
      [ 1370.974632]  [<ffffffff8120f7cf>] ? SyS_epoll_ctl+0x53f/0x960
      [ 1371.000863]  [<ffffffff8120f7cf>] SyS_epoll_ctl+0x53f/0x960
      [ 1371.027154]  [<ffffffff812100d3>] ? anon_inode_getfile+0xd3/0x170
      [ 1371.054679]  [<ffffffff811e3d67>] ? __alloc_fd+0xa7/0x130
      [ 1371.080183]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b
      
      With slab debugging enabled, we can see that the poison has been overwritten:
      
      [  669.826368] BUG kmalloc-128 (Tainted: G        W     ): Poison overwritten
      [  669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b
      [  669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494
      [  669.826424]  __slab_alloc+0x4bf/0x566
      [  669.826433]  __kmalloc+0x280/0x310
      [  669.826453]  sctp_auth_create_key+0x23/0x50 [sctp]
      [  669.826471]  sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp]
      [  669.826488]  sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp]
      [  669.826505]  sctp_do_sm+0x29d/0x17c0 [sctp] [...]
      [  669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494
      [  669.826635]  __slab_free+0x39/0x2a8
      [  669.826643]  kfree+0x1d6/0x230
      [  669.826650]  kzfree+0x31/0x40
      [  669.826666]  sctp_auth_key_put+0x19/0x20 [sctp]
      [  669.826681]  sctp_assoc_update+0x1ee/0x2d0 [sctp]
      [  669.826695]  sctp_do_sm+0x674/0x17c0 [sctp]
      
      Since this only triggers in some collision-cases with AUTH, the problem at
      heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice
      when having refcnt 1, once directly in sctp_assoc_update() and yet again
      from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on
      the already kzfree'd memory, which is also consistent with the observation
      of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected
      at a later point in time when poison is checked on new allocation).
      
      Reference counting of auth keys revisited:
      
      Shared keys for AUTH chunks are being stored in endpoints and associations
      in endpoint_shared_keys list. On endpoint creation, a null key is being
      added; on association creation, all endpoint shared keys are being cached
      and thus cloned over to the association. struct sctp_shared_key only holds
      a pointer to the actual key bytes, that is, struct sctp_auth_bytes which
      keeps track of users internally through refcounting. Naturally, on assoc
      or enpoint destruction, sctp_shared_key are being destroyed directly and
      the reference on sctp_auth_bytes dropped.
      
      User space can add keys to either list via setsockopt(2) through struct
      sctp_authkey and by passing that to sctp_auth_set_key() which replaces or
      adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes
      with refcount 1 and in case of replacement drops the reference on the old
      sctp_auth_bytes. A key can be set active from user space through setsockopt()
      on the id via sctp_auth_set_active_key(), which iterates through either
      endpoint_shared_keys and in case of an assoc, invokes (one of various places)
      sctp_auth_asoc_init_active_key().
      
      sctp_auth_asoc_init_active_key() computes the actual secret from local's
      and peer's random, hmac and shared key parameters and returns a new key
      directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops
      the reference if there was a previous one. The secret, which where we
      eventually double drop the ref comes from sctp_auth_asoc_set_secret() with
      intitial refcount of 1, which also stays unchanged eventually in
      sctp_assoc_update(). This key is later being used for crypto layer to
      set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac().
      
      To close the loop: asoc->asoc_shared_key is freshly allocated secret
      material and independant of the sctp_shared_key management keeping track
      of only shared keys in endpoints and assocs. Hence, also commit 4184b2a7
      ("net: sctp: fix memory leak in auth key management") is independant of
      this bug here since it concerns a different layer (though same structures
      being used eventually). asoc->asoc_shared_key is reference dropped correctly
      on assoc destruction in sctp_association_free() and when active keys are
      being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount
      of 1. Hence, it's freed prematurely in sctp_assoc_update(). Simple fix is
      to remove that sctp_auth_key_put() from there which fixes these panics.
      
      Fixes: 730fc3d0 ("[SCTP]: Implete SCTP-AUTH parameter processing")
      Signed-off-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      43e39c2f
    • Eric Dumazet's avatar
      netxen: fix netxen_nic_poll() logic · 795c55d8
      Eric Dumazet authored
      [ Upstream commit 6088beef ]
      
      NAPI poll logic now enforces that a poller returns exactly the budget
      when it wants to be called again.
      
      If a driver limits TX completion, it has to return budget as well when
      the limit is hit, not the number of received packets.
      Reported-and-tested-by: default avatarMike Galbraith <umgwanakikbuti@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Fixes: d75b1ade ("net: less interrupt masking in NAPI")
      Cc: Manish Chopra <manish.chopra@qlogic.com>
      Acked-by: default avatarManish Chopra <manish.chopra@qlogic.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      795c55d8
    • Hagen Paul Pfeifer's avatar
      ipv6: stop sending PTB packets for MTU < 1280 · a20ac4fd
      Hagen Paul Pfeifer authored
      [ Upstream commit 9d289715 ]
      
      Reduce the attack vector and stop generating IPv6 Fragment Header for
      paths with an MTU smaller than the minimum required IPv6 MTU
      size (1280 byte) - called atomic fragments.
      
      See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1]
      for more information and how this "feature" can be misused.
      
      [1] https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00Signed-off-by: default avatarFernando Gont <fgont@si6networks.com>
      Signed-off-by: default avatarHagen Paul Pfeifer <hagen@jauu.net>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      a20ac4fd
    • Eric Dumazet's avatar
      net: rps: fix cpu unplug · 488070c6
      Eric Dumazet authored
      [ Upstream commit ac64da0b ]
      
      softnet_data.input_pkt_queue is protected by a spinlock that
      we must hold when transferring packets from victim queue to an active
      one. This is because other cpus could still be trying to enqueue packets
      into victim queue.
      
      A second problem is that when we transfert the NAPI poll_list from
      victim to current cpu, we absolutely need to special case the percpu
      backlog, because we do not want to add complex locking to protect
      process_queue : Only owner cpu is allowed to manipulate it, unless cpu
      is offline.
      
      Based on initial patch from Prasad Sodagudi & Subash Abhinov
      Kasiviswanathan.
      
      This version is better because we do not slow down packet processing,
      only make migration safer.
      Reported-by: default avatarPrasad Sodagudi <psodagud@codeaurora.org>
      Reported-by: default avatarSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      488070c6
    • Willem de Bruijn's avatar
      ip: zero sockaddr returned on error queue · 7c82148b
      Willem de Bruijn authored
      [ Upstream commit f812116b ]
      
      The sockaddr is returned in IP(V6)_RECVERR as part of errhdr. That
      structure is defined and allocated on the stack as
      
          struct {
                  struct sock_extended_err ee;
                  struct sockaddr_in(6)    offender;
          } errhdr;
      
      The second part is only initialized for certain SO_EE_ORIGIN values.
      Always initialize it completely.
      
      An MTU exceeded error on a SOCK_RAW/IPPROTO_RAW is one example that
      would return uninitialized bytes.
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      
      ----
      
      Also verified that there is no padding between errhdr.ee and
      errhdr.offender that could leak additional kernel data.
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      7c82148b
    • Mathias Krause's avatar
      crypto: crc32c - add missing crypto module alias · 21c29e8e
      Mathias Krause authored
      The backport of commit 5d26a105 ("crypto: prefix module autoloading
      with "crypto-"") lost the MODULE_ALIAS_CRYPTO() annotation of crc32c.c.
      Add it to fix the reported filesystem related regressions.
      Signed-off-by: default avatarMathias Krause <minipli@googlemail.com>
      Reported-by: default avatarPhilip Müller <philm@manjaro.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Rob McCathie <rob@manjaro.org>
      Cc: Luis Henriques <luis.henriques@canonical.com>
      Cc: Kamal Mostafa <kamal@canonical.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      21c29e8e
  2. 09 Feb, 2015 18 commits
  3. 08 Feb, 2015 7 commits
    • Tejun Heo's avatar
      workqueue: fix subtle pool management issue which can stall whole worker_pool · c6662b96
      Tejun Heo authored
      commit 29187a9e upstream.
      
      A worker_pool's forward progress is guaranteed by the fact that the
      last idle worker assumes the manager role to create more workers and
      summon the rescuers if creating workers doesn't succeed in timely
      manner before proceeding to execute work items.
      
      This manager role is implemented in manage_workers(), which indicates
      whether the worker may proceed to work item execution with its return
      value.  This is necessary because multiple workers may contend for the
      manager role, and, if there already is a manager, others should
      proceed to work item execution.
      
      Unfortunately, the function also indicates that the worker may proceed
      to work item execution if need_to_create_worker() is false at the head
      of the function.  need_to_create_worker() tests the following
      conditions.
      
      	pending work items && !nr_running && !nr_idle
      
      The first and third conditions are protected by pool->lock and thus
      won't change while holding pool->lock; however, nr_running can change
      asynchronously as other workers block and resume and while it's likely
      to be zero, as someone woke this worker up in the first place, some
      other workers could have become runnable inbetween making it non-zero.
      
      If this happens, manage_worker() could return false even with zero
      nr_idle making the worker, the last idle one, proceed to execute work
      items.  If then all workers of the pool end up blocking on a resource
      which can only be released by a work item which is pending on that
      pool, the whole pool can deadlock as there's no one to create more
      workers or summon the rescuers.
      
      This patch fixes the problem by removing the early exit condition from
      maybe_create_worker() and making manage_workers() return false iff
      there's already another manager, which ensures that the last worker
      doesn't start executing work items.
      
      We can leave the early exit condition alone and just ignore the return
      value but the only reason it was put there is because the
      manage_workers() used to perform both creations and destructions of
      workers and thus the function may be invoked while the pool is trying
      to reduce the number of workers.  Now that manage_workers() is called
      only when more workers are needed, the only case this early exit
      condition is triggered is rare race conditions rendering it pointless.
      
      Tested with simulated workload and modified workqueue code which
      trigger the pool deadlock reliably without this patch.
      
      tj: Updated to v3.14 where manage_workers() is responsible not only
          for creating more workers but also destroying surplus ones.
          maybe_create_worker() needs to keep its early exit condition to
          avoid creating a new worker when manage_workers() is called to
          destroy surplus ones.  Other than that, the adaptabion is
          straight-forward.  Both maybe_{create|destroy}_worker() functions
          are converted to return void and manage_workers() returns %false
          iff it lost manager arbitration.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarEric Sandeen <sandeen@sandeen.net>
      Link: http://lkml.kernel.org/g/54B019F4.8030009@sandeen.net
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      c6662b96
    • Ilya Dryomov's avatar
      rbd: fix rbd_dev_parent_get() when parent_overlap == 0 · 926e01a5
      Ilya Dryomov authored
      commit ae43e9d0 upstream.
      
      The comment for rbd_dev_parent_get() said
      
          * We must get the reference before checking for the overlap to
          * coordinate properly with zeroing the parent overlap in
          * rbd_dev_v2_parent_info() when an image gets flattened.  We
          * drop it again if there is no overlap.
      
      but the "drop it again if there is no overlap" part was missing from
      the implementation.  This lead to absurd parent_ref values for images
      with parent_overlap == 0, as parent_ref was incremented for each
      img_request and virtually never decremented.
      
      Fix this by leveraging the fact that refresh path calls
      rbd_dev_v2_parent_info() under header_rwsem and use it for read in
      rbd_dev_parent_get(), instead of messing around with atomics.  Get rid
      of barriers in rbd_dev_v2_parent_info() while at it - I don't see what
      they'd pair with now and I suspect we are in a pretty miserable
      situation as far as proper locking goes regardless.
      Signed-off-by: default avatarIlya Dryomov <idryomov@redhat.com>
      Reviewed-by: default avatarJosh Durgin <jdurgin@redhat.com>
      Reviewed-by: default avatarAlex Elder <elder@linaro.org>
      [idryomov@redhat.com: backport to 3.14: context]
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      926e01a5
    • Liu ShuoX's avatar
      pstore: Fix NULL pointer fault if get NULL prz in ramoops_get_next_prz · ed64a543
      Liu ShuoX authored
      commit b0aa931f upstream.
      
      ramoops_get_next_prz get the prz according the paramters. If it get a
      uninitialized prz, access its members by following persistent_ram_old_size(prz)
      will cause a NULL pointer crash.
      Ex: if ftrace_size is 0, fprz will be NULL.
      
      Fix it by return NULL in advance.
      Signed-off-by: default avatarLiu ShuoX <shuox.liu@intel.com>
      Acked-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Cc: HuKeping <hukeping@huawei.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      ed64a543
    • Liu ShuoX's avatar
      pstore: skip zero size persistent ram buffer in traverse · 013d52e2
      Liu ShuoX authored
      commit aa9a4a1e upstream.
      
      In ramoops_pstore_read, a valid prz pointer with zero size buffer will
      break traverse of all persistent ram buffers.  The latter buffer might be
      lost.
      Signed-off-by: default avatarLiu ShuoX <shuox.liu@intel.com>
      Cc: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
      Cc: Colin Cross <ccross@android.com>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Cc: HuKeping <hukeping@huawei.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      013d52e2
    • Liu ShuoX's avatar
      pstore: clarify clearing of _read_cnt in ramoops_context · 42ecc20b
      Liu ShuoX authored
      commit 57fd8353 upstream.
      
      *_read_cnt in ramoops_context need to be cleared during pstore ->open to
      support mutli times getting the records.  The patch added missed
      ftrace_read_cnt clearing and removed duplicate clearing in ramoops_probe.
      Signed-off-by: default avatarLiu ShuoX <shuox.liu@intel.com>
      Cc: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
      Cc: Colin Cross <ccross@android.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      Cc: HuKeping <hukeping@huawei.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      42ecc20b
    • Russell King's avatar
      ARM: DMA: ensure that old section mappings are flushed from the TLB · 51173bf8
      Russell King authored
      commit 6b076991 upstream.
      
      When setting up the CMA region, we must ensure that the old section
      mappings are flushed from the TLB before replacing them with page
      tables, otherwise we can suffer from mismatched aliases if the CPU
      speculatively prefetches from these mappings at an inopportune time.
      
      A mismatched alias can occur when the TLB contains a section mapping,
      but a subsequent prefetch causes it to load a page table mapping,
      resulting in the possibility of the TLB containing two matching
      mappings for the same virtual address region.
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Cc: Hou Pengyang <houpengyang@huawei.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      51173bf8
    • Bob Paauwe's avatar
      drm/i915: Only fence tiled region of object. · d5ce8947
      Bob Paauwe authored
      commit af1a7301 upstream.
      
      When creating a fence for a tiled object, only fence the area that
      makes up the actual tiles.  The object may be larger than the tiled
      area and if we allow those extra addresses to be fenced, they'll
      get converted to addresses beyond where the object is mapped. This
      opens up the possiblity of writes beyond the end of object.
      
      To prevent this, we adjust the size of the fence to only encompass
      the area that makes up the actual tiles.  The extra space is considered
      un-tiled and now behaves as if it was a linear object.
      
      Testcase: igt/gem_tiled_fence_overflow
      Reported-by: default avatarDan Hettena <danh@ghs.com>
      Signed-off-by: default avatarBob Paauwe <bob.j.paauwe@intel.com>
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      d5ce8947