1. 19 Sep, 2018 40 commits
    • mtd: ubi: wl: Fix error return code in ubi_wl_init() · 8626c40a
      Wei Yongjun authored
      commit 7233982a upstream.
      
      Fix to return error code -ENOMEM from the kmem_cache_alloc() error
      handling case instead of 0, as done elsewhere in this function.
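      As a rough userspace illustration of this bug class (not the actual
      ubi_wl_init() code; wl_init_sketch() and its fail_alloc parameter are
      hypothetical), the error path has to set the error code before jumping
      to the cleanup label, otherwise the function reports success:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for an init function with a shared cleanup label. */
static int wl_init_sketch(int fail_alloc)
{
	int err = 0;
	void *entry;

	/* Simulate an allocator (such as kmem_cache_alloc()) returning NULL. */
	entry = fail_alloc ? NULL : malloc(16);
	if (!entry) {
		err = -ENOMEM;	/* without this assignment, 0 would be returned */
		goto out_free;
	}

	free(entry);
	return 0;

out_free:
	/* Cleanup of earlier allocations would go here. */
	return err;
}
```

      Calling wl_init_sketch(1) exercises the failure path and yields -ENOMEM.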
      
      Fixes: f78e5623 ("ubi: fastmap: Erase outdated anchor PEBs during
      attach")
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ip: frags: fix crash in ip_do_fragment() · 08fb833b
      Taehee Yoo authored
      commit 5d407b07 upstream
      
      A kernel crash occurs when a defragmented packet is fragmented
      in ip_do_fragment().
      In the defragment routine, skb_orphan() is called and
      skb->ip_defrag_offset is set, but skb->sk and
      skb->ip_defrag_offset are members of the same union, so
      frag->sk is not NULL.
      Hence the crash occurs in the skb->sk check in ip_do_fragment() when
      the defragmented packet is fragmented.
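      The aliasing can be reproduced in a small userspace sketch. The
      struct and function names below are hypothetical stand-ins, not the
      real sk_buff layout, and the result assumes a common platform where
      a null pointer is all-zero bits:

```c
#include <stddef.h>

struct sock_sketch { int dummy; };

/* Two members sharing storage, like skb->sk and skb->ip_defrag_offset. */
struct skb_sketch {
	union {
		struct sock_sketch *sk;
		int ip_defrag_offset;
	};
};

static int sk_looks_set(const struct skb_sketch *skb)
{
	return skb->sk != NULL;
}

/* Returns 1 if a later "is sk set?" check would be fooled by the offset. */
static int demo_stale_offset(int offset, int clear_before_use)
{
	struct skb_sketch skb = { .sk = NULL };	/* analogue of skb_orphan() */

	skb.ip_defrag_offset = offset;		/* overwrites the bytes of sk */
	if (clear_before_use)
		skb.sk = NULL;			/* analogue of clearing sk at reassembly */
	return sk_looks_set(&skb);
}
```

      With a non-zero offset and no clearing, the pointer member reads back
      as non-NULL even though no socket was ever attached.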
      
      test commands:
         %iptables -t nat -I POSTROUTING -j MASQUERADE
         %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000
      
      splat looks like:
      [  261.069429] kernel BUG at net/ipv4/ip_output.c:636!
      [  261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [  261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3
      [  261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600
      [  261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c
      [  261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202
      [  261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004
      [  261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8
      [  261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395
      [  261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4
      [  261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000
      [  261.174169] FS:  00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
      [  261.183012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0
      [  261.198158] Call Trace:
      [  261.199018]  ? dst_output+0x180/0x180
      [  261.205011]  ? save_trace+0x300/0x300
      [  261.209018]  ? ip_copy_metadata+0xb00/0xb00
      [  261.213034]  ? sched_clock_local+0xd4/0x140
      [  261.218158]  ? kill_l4proto+0x120/0x120 [nf_conntrack]
      [  261.223014]  ? rt_cpu_seq_stop+0x10/0x10
      [  261.227014]  ? find_held_lock+0x39/0x1c0
      [  261.233008]  ip_finish_output+0x51d/0xb50
      [  261.237006]  ? ip_fragment.constprop.56+0x220/0x220
      [  261.243011]  ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack]
      [  261.250152]  ? rcu_is_watching+0x77/0x120
      [  261.255010]  ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4]
      [  261.261033]  ? nf_hook_slow+0xb1/0x160
      [  261.265007]  ip_output+0x1c7/0x710
      [  261.269005]  ? ip_mc_output+0x13f0/0x13f0
      [  261.273002]  ? __local_bh_enable_ip+0xe9/0x1b0
      [  261.278152]  ? ip_fragment.constprop.56+0x220/0x220
      [  261.282996]  ? nf_hook_slow+0xb1/0x160
      [  261.287007]  raw_sendmsg+0x21f9/0x4420
      [  261.291008]  ? dst_output+0x180/0x180
      [  261.297003]  ? sched_clock_cpu+0x126/0x170
      [  261.301003]  ? find_held_lock+0x39/0x1c0
      [  261.306155]  ? stop_critical_timings+0x420/0x420
      [  261.311004]  ? check_flags.part.36+0x450/0x450
      [  261.315005]  ? _raw_spin_unlock_irq+0x29/0x40
      [  261.320995]  ? _raw_spin_unlock_irq+0x29/0x40
      [  261.326142]  ? cyc2ns_read_end+0x10/0x10
      [  261.330139]  ? raw_bind+0x280/0x280
      [  261.334138]  ? sched_clock_cpu+0x126/0x170
      [  261.338995]  ? check_flags.part.36+0x450/0x450
      [  261.342991]  ? __lock_acquire+0x4500/0x4500
      [  261.348994]  ? inet_sendmsg+0x11c/0x500
      [  261.352989]  ? dst_output+0x180/0x180
      [  261.357012]  inet_sendmsg+0x11c/0x500
      [ ... ]
      
      v2:
       - clear skb->sk at reassembly routine. (Eric Dumazet)
      
      Fixes: fa0f5273 ("ip: use rb trees for IP frag queue.")
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ip: process in-order fragments efficiently · b3a0c61b
      Peter Oskolkov authored
      This patch changes the runtime behavior of IP defrag queue:
      incoming in-order fragments are added to the end of the current
      list/"run" of in-order fragments at the tail.
      
      On some workloads, UDP stream performance is substantially improved:
      
      RX: ./udp_stream -F 10 -T 2 -l 60
      TX: ./udp_stream -c -H <host> -F 10 -T 5 -l 60
      
      with this patchset applied on a 10Gbps receiver:
      
        throughput=9524.18
        throughput_units=Mbit/s
      
      upstream (net-next):
      
        throughput=4608.93
        throughput_units=Mbit/s
      Reported-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit a4fd284a)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ip: add helpers to process in-order fragments faster. · c91f27fb
      Peter Oskolkov authored
      This patch introduces several helper functions/macros that will be
      used in the follow-up patch. No runtime changes yet.
      
      The new logic (fully implemented in the second patch) is as follows:
      
      * Nodes in the rb-tree will now contain not single fragments, but lists
        of consecutive fragments ("runs").
      
      * At each point in time, the current "active" run at the tail is
        maintained/tracked. Fragments that arrive in-order, adjacent
        to the previous tail fragment, are added to this tail run without
        triggering the re-balancing of the rb-tree.
      
      * If a fragment arrives out of order with the offset _before_ the tail run,
        it is inserted into the rb-tree as a single fragment.
      
      * If a fragment arrives after the current tail fragment (with a gap),
        it starts a new "tail" run, and is inserted into the rb-tree
        at the end as the head of the new run.
      
      skb->cb is used to store additional information
      needed here (suggested by Eric Dumazet).
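
      The three placement rules listed above can be sketched as a small
      decision function. This is a simplification under stated assumptions:
      the enum, the function name and the tail_end parameter (the byte
      offset just past the current tail run) are hypothetical, and overlap
      handling is left out entirely:

```c
/* Hypothetical outcome of placing one incoming fragment. */
enum frag_action {
	FRAG_APPEND_TO_TAIL,	/* in order: extend the tail run, no rebalancing */
	FRAG_NEW_TAIL_RUN,	/* gap after the tail: start a new run at the end */
	FRAG_INSERT_SINGLE	/* earlier offset: insert alone into the rb-tree */
};

/*
 * tail_end: offset just past the last byte of the current tail run.
 * offset:   start offset of the incoming fragment.
 */
static enum frag_action classify_fragment(unsigned int tail_end,
					  unsigned int offset)
{
	if (offset == tail_end)
		return FRAG_APPEND_TO_TAIL;
	if (offset > tail_end)
		return FRAG_NEW_TAIL_RUN;
	/* Offsets below tail_end are treated as out of order in this sketch. */
	return FRAG_INSERT_SINGLE;
}
```

      For a purely in-order stream every fragment takes the first branch,
      which is why the rb-tree is not rebalanced on that path.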
      Reported-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 353c9cb3)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv4: frags: precedence bug in ip_expire() · 04b28f40
      Dan Carpenter authored
      We accidentally removed the parentheses here, but they are required
      because '!' has higher precedence than '&'.
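
      A minimal sketch of the difference, assuming a hypothetical flag
      value (FLAG_FIRST_IN_SKETCH is not the kernel's definition). The
      buggy variant is written with the implicit grouping made explicit,
      which is how the compiler parses "!flags & FLAG":

```c
#define FLAG_FIRST_IN_SKETCH 0x2	/* hypothetical flag bit */

/* How "!flags & FLAG" is parsed: '!' binds first, yielding 0 or 1. */
static int flag_missing_buggy(unsigned int flags)
{
	return (!flags) & FLAG_FIRST_IN_SKETCH;
}

/* Intended test: true when the flag bit is not set. */
static int flag_missing_fixed(unsigned int flags)
{
	return !(flags & FLAG_FIRST_IN_SKETCH);
}
```

      Since (!flags) is only ever 0 or 1, masking it with 0x2 is always 0,
      so the buggy form can never report a missing flag.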
      
      Fixes: fa0f5273 ("ip: use rb trees for IP frag queue.")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 70837ffe)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: sk_buff rbnode reorg · 6b921536
      Eric Dumazet authored
      commit bffa72cf upstream
      
      skb->rbnode shares space with skb->next, skb->prev and skb->tstamp
      
      Current uses (TCP receive ofo queue and netem) need to save/restore
      tstamp, while skb->dev is either NULL (TCP) or a constant for a given
      queue (netem).
      
      Since we plan to use an RB tree for the TCP retransmit queue to speed up
      SACK processing with large BDP, this patch exchanges skb->dev and
      skb->tstamp.
      
      This saves some overhead in both TCP and netem.
      
      v2: removes the swtstamp field from struct tcp_skb_cb
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: add rb_to_skb() and other rb tree helpers · 37c7cc80
      Eric Dumazet authored
      Generalize private netem_rb_to_skb()
      
      TCP rtx queue will soon be converted to rb-tree,
      so we will need skb_rbtree_walk() helpers.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 18a4c0ea)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends · 6bf32cda
      Eric Dumazet authored
      After working on IP defragmentation lately, I found that some large
      packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
      zero paddings on the last (small) fragment.
      
      While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
      to CHECKSUM_NONE, forcing a full csum validation, even if all prior
      fragments had CHECKSUM_COMPLETE set.
      
      We can instead compute the checksum of the part we are trimming,
      usually smaller than the part we keep.
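
      The idea can be sketched in userspace with a 16-bit ones' complement
      sum. This is not the kernel's csum API: all function names are
      hypothetical, and the sketch assumes the trimmed tail starts at an
      even byte offset so that 16-bit words line up:

```c
#include <stddef.h>
#include <stdint.h>

/* Ones' complement sum of a buffer, folded to 16 bits (not inverted). */
static uint16_t csum16(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (i < len)
		sum += (uint32_t)buf[i] << 8;	/* odd last byte, zero padded */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones' complement subtraction: a - b is a + ~b with end-around carry. */
static uint16_t csum16_sub(uint16_t a, uint16_t b)
{
	uint32_t sum = (uint32_t)a + (uint16_t)~b;

	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Given the sum over buf[0..old_len), derive the sum over buf[0..new_len)
 * by summing only the trimmed tail. new_len is assumed to be even.
 */
static uint16_t csum16_after_trim(uint16_t full, const uint8_t *buf,
				  size_t new_len, size_t old_len)
{
	return csum16_sub(full, csum16(buf + new_len, old_len - new_len));
}
```

      When the tail is a few padding bytes, only those bytes are summed
      rather than the whole packet. Note that ones' complement has two
      representations of zero (0x0000 and 0xffff), so results should be
      compared with that in mind.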
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 88078d98)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: defrag: drop non-last frags smaller than min mtu · 5123ffda
      Florian Westphal authored
      Don't bother with pathological cases; they only waste cycles.
      IPv6 requires a minimum MTU of 1280 so we should never see fragments
      smaller than this (except last frag).
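
      A hedged sketch of the rule, not the actual patch: the function name
      and parameters are hypothetical, and exactly which length is compared
      (with or without headers) is an assumption here:

```c
#include <stdbool.h>
#include <stddef.h>

#define MIN_MTU_SKETCH 1280	/* IPv6 minimum MTU, as stated above */

/*
 * len:            length of the received fragment packet (assumed to be
 *                 measured from the start of the IPv6 header).
 * more_fragments: true unless this is the last fragment.
 */
static bool should_drop_fragment(size_t len, bool more_fragments)
{
	return more_fragments && len < MIN_MTU_SKETCH;
}
```

      The last fragment is exempt because it carries whatever remainder is
      left and can legitimately be short.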
      
      v3: don't use awkward "-offset + len"
      v2: drop IPv4 part, which added same check w. IPV4_MIN_MTU (68).
          There were concerns that there could be even smaller frags
          generated by intermediate nodes, e.g. on radio networks.
      
      Cc: Peter Oskolkov <posk@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 0ed4229b)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: modify skb_rbtree_purge to return the truesize of all purged skbs. · 3bde783e
      Peter Oskolkov authored
      Tested: see the next patch in the series.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 385114de)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: speed up skb_rbtree_purge() · 7750c414
      Eric Dumazet authored
      As measured in my prior patch ("sch_netem: faster rb tree removal"),
      rbtree_postorder_for_each_entry_safe() is nice looking but much slower
      than using rb_next() directly, except when tree is small enough
      to fit in CPU caches (then the cost is the same)
      
      Also note that there is not even an increase of text size :
      $ size net/core/skbuff.o.before net/core/skbuff.o
         text	   data	    bss	    dec	    hex	filename
        40711	   1298	      0	  42009	   a419	net/core/skbuff.o.before
        40711	   1298	      0	  42009	   a419	net/core/skbuff.o
      
      From: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 7c90584c)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ip: discard IPv4 datagrams with overlapping segments. · 1c449691
      Peter Oskolkov authored
      This behavior is required in IPv6, and there is little need
      to tolerate overlapping fragments in IPv4. This change
      simplifies the code and eliminates potential DDoS attack vectors.
      
      Tested: ran ip_defrag selftest (not yet available upstream).
      Suggested-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Acked-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 7969e5c4)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • inet: frags: fix ip6frag_low_thresh boundary · 5fff99e8
      Eric Dumazet authored
      Giving an integer to proc_doulongvec_minmax() is dangerous on 64bit arches,
      since the linker might place a non-zero value next to it, preventing a change
      to ip6frag_low_thresh.
      
      ip6frag_low_thresh is not used anymore in the kernel, but we do not
      want to prematurely break user scripts wanting to change it.
      
      Since specifying a minimal value of 0 for proc_doulongvec_minmax()
      is moot, let's remove these zero values in all defrag units.
      
      Fixes: 6e00f7dd ("ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 3d234012)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • inet: frags: get rid of ipfrag_skb_cb/FRAG_CB · 48c2afc1
      Eric Dumazet authored
      ip_defrag uses skb->cb[] to store the fragment offset, and unfortunately
      this integer is currently in a different cache line than skb->next,
      meaning that we use two cache lines per skb when finding the insertion point.
      
      By aliasing skb->ip_defrag_offset and skb->dev, we pack all the fields
      in a single cache line and save precious memory bandwidth.
      
      Note that after the fast path added by Changli Gao in commit
      d6bebca9 ("fragment: add fast path for in-order fragments")
      this change won't help the fast path, since we still need
      to access prev->len (2nd cache line), but will show great
      benefits when slow path is entered, since we perform
      a linear scan of a potentially long list.
      
      Also, note that this potentially long list is an attack vector;
      we might consider also using an rb-tree there eventually.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit bf663371)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • inet: frags: reorganize struct netns_frags · 8291cd94
      Eric Dumazet authored
      Put the read-mostly fields in a separate cache line
      at the beginning of struct netns_frags, to reduce
      false sharing noticed in inet_frag_kill()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit c2615cf5)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rhashtable: reorganize struct rhashtable layout · bd946fb5
      Eric Dumazet authored
      While under frags DDOS I noticed unfortunate false sharing between
      @nelems and @params.automatic_shrinking
      
      Move @nelems to the end of struct rhashtable so that the first cache line
      is shared between all cpus, because it is almost never dirtied.
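
      The layout idea can be sketched as follows. This is not the real
      struct rhashtable: the struct, its fields and the 64-byte cache line
      size are assumptions chosen for illustration:

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE_SKETCH 64	/* assumed cache line size in bytes */

struct table_sketch {
	/* Read-mostly fields: stay clean, can be shared by all CPUs. */
	unsigned int key_len;
	unsigned int max_elems;
	int automatic_shrinking;

	/* Frequently written counter: forced onto its own cache line. */
	alignas(CACHE_LINE_SKETCH) long nelems;
};

static size_t cache_line_of(size_t offset)
{
	return offset / CACHE_LINE_SKETCH;
}

/* Returns 1 if the hot counter shares a line with a read-mostly field. */
static int counter_shares_line(void)
{
	return cache_line_of(offsetof(struct table_sketch, automatic_shrinking)) ==
	       cache_line_of(offsetof(struct table_sketch, nelems));
}
```

      With the counter on a separate line, writes to it no longer
      invalidate the line holding the read-mostly parameters on other CPUs.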
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit e5d672a0)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: frags: rewrite ip6_expire_frag_queue() · 3226bdcb
      Eric Dumazet authored
      Make it similar to IPv4 ip_expire(), and release the lock
      before calling icmp functions.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 05c0b86b)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • inet: frags: do not clone skb in ip_expire() · 085a0147
      Eric Dumazet authored
      An skb_clone() was added in commit ec4fbd64 ("inet: frag: release
      spinlock before calling icmp_send()")
      
      While fixing the bug at that time, it also added a very high cost
      for DDOS frags, as the ICMP rate limit is applied after this
      expensive operation (skb_clone() + consume_skb(), implying memory
      allocations, copy, and freeing)
      
      We can use skb_get(head) here; all we want is to make sure the skb won't
      be freed by another cpu.
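
      A userspace analogue of "take a reference instead of copying",
      using C11 atomics. The buf_sketch type and its helpers are
      hypothetical and only mirror the general idea behind skb_get():

```c
#include <stdatomic.h>
#include <stdlib.h>

struct buf_sketch {
	atomic_int users;	/* reference count */
	char data[64];
};

static struct buf_sketch *buf_alloc(void)
{
	struct buf_sketch *b = calloc(1, sizeof(*b));

	if (b)
		atomic_init(&b->users, 1);
	return b;
}

/* Cheap: one atomic increment, no allocation, no copy. */
static struct buf_sketch *buf_get(struct buf_sketch *b)
{
	atomic_fetch_add(&b->users, 1);
	return b;
}

/* Drops one reference; returns 1 if this call freed the buffer. */
static int buf_put(struct buf_sketch *b)
{
	if (atomic_fetch_sub(&b->users, 1) == 1) {
		free(b);
		return 1;
	}
	return 0;
}
```

      Holding the extra reference keeps the buffer alive across the
      unlocked section; the buffer is freed only when the last holder
      drops it.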
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 1eec5d56)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • inet: frags: break the 2GB limit for frags storage · 990204dd
      Eric Dumazet authored
      Some users are willing to provision huge amounts of memory to be able
      to perform reassembly reasonably well under pressure.
      
      Current memory tracking is using one atomic_t and integers.
      
      Switch to atomic_long_t so that 64bit arches can use more than 2GB,
      without any cost for 32bit arches.
      
      Note that this patch avoids an overflow error, if high_thresh was set
      to ~2GB, since this test in inet_frag_alloc() was never true :
      
      if (... || frag_mem_limit(nf) > nf->high_thresh)
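
      Why that test could never be true can be shown with plain integer
      types. The helpers below are hypothetical; int32_t stands in for the
      old int-based accounting and int64_t for a 64-bit long counter:

```c
#include <stdint.h>

/* Old-style limit check with 32-bit signed accounting. */
static int over_limit_32(int32_t mem, int32_t high_thresh)
{
	return mem > high_thresh;
}

/* Same check with 64-bit accounting. */
static int over_limit_64(int64_t mem, int64_t high_thresh)
{
	return mem > high_thresh;
}

/*
 * Add to a 32-bit counter with wraparound. The arithmetic is done in
 * unsigned to avoid undefined behavior; converting the out-of-range
 * result back to int32_t is implementation-defined (it wraps to a
 * negative value on common compilers).
 */
static int32_t charge_32(int32_t mem, uint32_t add)
{
	return (int32_t)((uint32_t)mem + add);
}
```

      With high_thresh at INT32_MAX, no 32-bit value can compare greater
      than it, so the limit is never enforced; the 64-bit counter simply
      keeps counting past that point.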
      
      Tested:
      
      $ echo 16000000000 >/proc/sys/net/ipv4/ipfrag_high_thresh
      
      <frag DDOS>
      
      $ grep FRAG /proc/net/sockstat
      FRAG: inuse 14705885 memory 16000002880
      
      $ nstat -n ; sleep 1 ; nstat | grep Reas
      IpReasmReqds                    3317150            0.0
      IpReasmFails                    3317112            0.0
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 3e67f106)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • inet: frags: remove inet_frag_maybe_warn_overflow() · caa4249e
      Eric Dumazet authored
      This function is obsolete, after rhashtable addition to inet defrag.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 2d44ed22)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • inet: frags: get rif of inet_frag_evicting() · 5b1b3ad4
      Eric Dumazet authored
      This refactors ip_expire() since one indentation level is removed.
      
      Note: in the future, we should try hard to avoid the skb_clone()
      since this is a serious performance cost.
      Under DDOS, the ICMP message won't be sent because of rate limits.
      
      The fact that ip6_expire_frag_queue() does not use skb_clone() is
      disturbing too. Presumably IPv6 should have the same
      issue as the one we fixed in commit ec4fbd64
      ("inet: frag: release spinlock before calling icmp_send()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 399d1404)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • inet: frags: remove some helpers · bd3df633
      Eric Dumazet authored
      Remove sum_frag_mem_limit(), ip_frag_mem() & ip6_frag_mem()
      
      Also since we use rhashtable we can bring back the number of fragments
      in "grep FRAG /proc/net/sockstat /proc/net/sockstat6" that was
      removed in commit 434d3054 ("inet: frag: don't account number
      of fragment queues")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 6befe4a7)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • inet: frags: use rhashtables for reassembly units · 9aee41ef
      Eric Dumazet authored
      Some applications still rely on IP fragmentation, and to be fair linux
      reassembly unit is not working under any serious load.
      
      It uses static hash tables of 1024 buckets, and up to 128 items per bucket (!!!)
      
      A work queue is supposed to garbage collect items when host is under memory
      pressure, and doing a hash rebuild, changing seed used in hash computations.
      
      This work queue blocks softirqs for up to 25 ms when doing a hash rebuild,
      occurring every 5 seconds if host is under fire.
      
      Then there is the problem of sharing this hash table for all netns.
      
      It is time to switch to rhashtables, and allocate one of them per netns
      to speedup netns dismantle, since this is a critical metric these days.
      
      Lookup is now using RCU. A followup patch will even remove
      the refcount hold/release left from prior implementation and save
      a couple of atomic operations.
      
      Before this patch, 16 cpus (16 RX queue NIC) could not handle more
      than 1 Mpps frags DDOS.
      
      After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB
      of storage for the fragments (exact number depends on frags being evicted
      after timeout)
      
      $ grep FRAG /proc/net/sockstat
      FRAG: inuse 1966916 memory 2140004608
      
      A followup patch will change the limits for 64bit arches.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Alexander Aring <alex.aring@gmail.com>
      Cc: Stefan Schmidt <stefan@osg.samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 648700f7)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rhashtable: add schedule points · 33dc9f7c
      Eric Dumazet authored
      Rehashing and destroying large hash table takes a lot of time,
      and happens in process context. It is safe to add cond_resched()
      in rhashtable_rehash_table() and rhashtable_free_and_destroy()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit ae6da1f5)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: export ip6 fragments sysctl to unprivileged users · 11be675b
      Eric Dumazet authored
      IPv4 was changed in commit 52a773d6 ("net: Export ip fragment
      sysctl to unprivileged users")
      
      The only sysctl that is not per-netns is not used :
      ip6frag_secret_interval
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Nikolay Borisov <kernel@kyup.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 18dcbe12)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • inet: frags: refactor lowpan_net_frag_init() · 266da0fb
      Eric Dumazet authored
      We want to call lowpan_net_frag_init() earlier.
      Similar to commit "inet: frags: refactor ipv6_frag_init()"
      
      This is a prereq to "inet: frags: use rhashtables for reassembly units"
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 807f1844)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • inet: frags: refactor ipv6_frag_init() · eb1686ae
      Eric Dumazet authored
      We want to call inet_frags_init() earlier.
      
      This is a prereq to "inet: frags: use rhashtables for reassembly units"
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 5b975bab)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • inet: frags: Convert timers to use timer_setup() · 0512f7e9
      Kees Cook authored
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly.
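
      The pattern of recovering the enclosing object from the timer pointer
      can be sketched in userspace. Everything below is a hypothetical
      analogue (the _sketch names are not kernel APIs); it only mirrors the
      general shape of timer_setup() and from_timer():

```c
#include <stddef.h>

/* Minimal timer object whose callback receives the timer itself. */
struct timer_sketch {
	void (*function)(struct timer_sketch *t);
};

/* Recover a pointer to the structure that embeds the given member. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct frag_queue_sketch {
	int id;
	int expired;
	struct timer_sketch timer;	/* embedded timer */
};

static void timer_setup_sketch(struct timer_sketch *t,
			       void (*fn)(struct timer_sketch *t))
{
	t->function = fn;
}

/* Callback: gets the timer pointer and derives its owning queue. */
static void frag_expire_sketch(struct timer_sketch *t)
{
	struct frag_queue_sketch *q =
		container_of_sketch(t, struct frag_queue_sketch, timer);

	q->expired = 1;
}
```

      Because the callback derives its context from the timer pointer, no
      separate opaque data argument has to be stored alongside the timer.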
      
      Cc: Alexander Aring <alex.aring@gmail.com>
      Cc: Stefan Schmidt <stefan@osg.samsung.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: linux-wpan@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: netfilter-devel@vger.kernel.org
      Cc: coreteam@netfilter.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Stefan Schmidt <stefan@osg.samsung.com> # for ieee802154
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 78802011)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • inet: frags: refactor ipfrag_init() · 0cbf74b9
      Eric Dumazet authored
      We need to call inet_frags_init() before register_pernet_subsys(),
      as a prereq for following patch ("inet: frags: use rhashtables for reassembly units")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 483a6e4f)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • inet: frags: add a pointer to struct netns_frags · 673220d6
      Eric Dumazet authored
      In order to simplify the API, add a pointer to struct inet_frags.
      This will allow us to make things less complex.
      
      These functions no longer have a struct inet_frags parameter :
      
      inet_frag_destroy(struct inet_frag_queue *q  /*, struct inet_frags *f */)
      inet_frag_put(struct inet_frag_queue *q /*, struct inet_frags *f */)
      inet_frag_kill(struct inet_frag_queue *q /*, struct inet_frags *f */)
      inet_frags_exit_net(struct netns_frags *nf /*, struct inet_frags *f */)
      ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 093ba729)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • inet: frags: change inet_frags_init_net() return value · 6093d5ab
      Eric Dumazet authored
      We will soon initialize one rhashtable per struct netns_frags
      in inet_frags_init_net().
      
      This patch changes the return value to eventually propagate an
      error.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      (cherry picked from commit 787bea77)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/i915: set DP Main Stream Attribute for color range on DDI platforms · 6f7bf899
      Jani Nikula authored
      commit 6209c285 upstream.
      
      Since Haswell we have no color range indication either in the pipe or
      port registers for DP. Instead, there's a separate register for setting
      the DP Main Stream Attributes (MSA) directly. The MSA register
      definition makes no references to colorimetry, just a vague reference to
      the DP spec. The connection to the color range was lost.
      
      Apparently we've failed to set the proper MSA bit for limited, or CEA,
      range ever since the first DDI platforms. We've started setting other
      MSA parameters since commit dae84799 ("drm/i915: add
      intel_ddi_set_pipe_settings").
      
      Without the crucial bit of information, the DP sink has no way of
      knowing the source is actually transmitting limited range RGB, leading
      to "washed out" colors. With the colorimetry information, compliant
      sinks should be able to handle the limited range properly. Native
      (i.e. non-LSPCON) HDMI was not affected because we do pass the color
      range via AVI infoframes.
      
      Though not the root cause, the problem was made worse for DDI platforms
      with commit 55bc60db ("drm/i915: Add "Automatic" mode for the
      "Broadcast RGB" property"), which selects limited range RGB
      automatically based on the mode, as per the DP, HDMI and CEA specs.
      
      After all these years, the fix boils down to flipping one bit.
      
      [Per testing reports, this fixes DP sinks, but not the LSPCON. My
       educated guess is that the LSPCON fails to turn the CEA range MSA into
       AVI infoframes for HDMI.]
      Reported-by: Michał Kopeć <mkopec12@gmail.com>
      Reported-by: N. W. <nw9165-3201@yahoo.com>
      Reported-by: Nicholas Stommel <nicholas.stommel@gmail.com>
      Reported-by: Tom Yan <tom.ty89@gmail.com>
      Tested-by: Nicholas Stommel <nicholas.stommel@gmail.com>
      References: https://bugs.freedesktop.org/show_bug.cgi?id=100023
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107476
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=94921
      Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: <stable@vger.kernel.org> # v3.9+
      Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180814060001.18224-1-jani.nikula@intel.com
      (cherry picked from commit dc5977da)
      Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6f7bf899
    • Parav Pandit's avatar
      RDMA/cma: Do not ignore net namespace for unbound cm_id · bdbf6e0b
      Parav Pandit authored
      [ Upstream commit 643d213a ]
      
      Currently, if the cm_id is not bound to any netdevice, then the net
      namespace is ignored for that cm_id, which is incorrect.
      
      Regardless of whether the cm_id is bound to a netdevice, the net
      namespace must match. When a cm_id is bound to a netdevice, both the
      net namespace and the netdevice must match.
      
      Fixes: 4c21b5bc ("IB/cma: Add net_dev and private data checks to RDMA CM")
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bdbf6e0b
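The matching rule stated in this commit message can be written as a small predicate. The sketch below is a userspace model under assumptions: `cm_id_model`, `netdev_model` and `cm_id_matches()` are hypothetical, namespaces are represented by plain integers, and a `bound_dev_if` of 0 is taken to mean "not bound to any netdevice".

```c
#include <stdbool.h>

/* Hypothetical, simplified views of a cm_id and a netdevice. */
struct cm_id_model {
	int net_ns;		/* namespace the cm_id lives in */
	int bound_dev_if;	/* 0 = not bound to a netdevice (assumption) */
};

struct netdev_model {
	int net_ns;
	int ifindex;
};

static bool cm_id_matches(const struct cm_id_model *id,
			  const struct netdev_model *dev)
{
	/* The namespace must match whether or not the cm_id is bound. */
	if (id->net_ns != dev->net_ns)
		return false;

	/* An unbound cm_id accepts any netdevice in its own namespace. */
	if (id->bound_dev_if == 0)
		return true;

	/* A bound cm_id additionally requires the same netdevice. */
	return id->bound_dev_if == dev->ifindex;
}
```

The case the fix addresses is the first branch: before it, an unbound cm_id would have matched a netdevice from a different namespace.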
    • Paul Burton's avatar
      MIPS: WARN_ON invalid DMA cache maintenance, not BUG_ON · 0d1d365d
      Paul Burton authored
      [ Upstream commit d4da0e97 ]
      
      If a driver causes DMA cache maintenance with a zero length then we
      currently BUG and kill the kernel. As this is a scenario that we may
      well be able to recover from, WARN & return in the condition instead.
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Patchwork: https://patchwork.linux-mips.org/patch/14623/
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d1d365d
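The "warn and return" pattern described above can be modelled in userspace C. This is only a sketch: `warn_on_model()` imitates the way the kernel's `WARN_ON()` evaluates to its condition, and `dma_cache_maint_model()`, the counters and the 32-byte cache line size are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdio.h>

static unsigned int warn_count;		/* warnings emitted so far */
static size_t lines_maintained;		/* cache lines "flushed" so far */

/* Logs when cond is true and returns cond, so it can sit inside an if. */
static int warn_on_model(int cond, const char *what)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

static void dma_cache_maint_model(size_t size)
{
	/* Recoverable misuse: complain and bail out instead of halting. */
	if (warn_on_model(size == 0, "zero-length DMA cache maintenance"))
		return;

	lines_maintained += (size + 31) / 32;	/* assumed 32-byte lines */
}
```

A zero-length request leaves a log entry for the driver author to act on while the rest of the system keeps running.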
    • Trond Myklebust's avatar
      NFSv4.1: Fix a potential layoutget/layoutrecall deadlock · 1181e868
      Trond Myklebust authored
      [ Upstream commit bd3d16a8 ]
      
      If the client is sending a layoutget, but the server issues a callback
      to recall what it thinks may be an outstanding layout, then we may find
      an uninitialised layout attached to the inode due to the layoutget.
      In that case, it is appropriate to return NFS4ERR_NOMATCHING_LAYOUT
      rather than NFS4ERR_DELAY, as the latter can end up deadlocking.
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1181e868
    • Chao Yu's avatar
      f2fs: fix to do sanity check with {sit,nat}_ver_bitmap_bytesize · 0983ef55
      Chao Yu authored
      [ Upstream commit c77ec61c ]
      
      This patch adds a sanity check on {sit,nat}_ver_bitmap_bytesize
      during mount, to avoid accessing memory beyond the cache boundary
      when the bitmap size is abnormal.
      
      - Overview
      buffer overrun in build_sit_info() when mounting a crafted f2fs image
      
      - Reproduce
      
      - Kernel message
      [  548.580867] F2FS-fs (loop0): Invalid log blocks per segment (8201)
      
      [  548.580877] F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
      [  548.584979] ==================================================================
      [  548.586568] BUG: KASAN: use-after-free in kmemdup+0x36/0x50
      [  548.587715] Read of size 64 at addr ffff8801e9c265ff by task mount/1295
      
      [  548.589428] CPU: 1 PID: 1295 Comm: mount Not tainted 4.18.0-rc1+ #4
      [  548.589432] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  548.589438] Call Trace:
      [  548.589474]  dump_stack+0x7b/0xb5
      [  548.589487]  print_address_description+0x70/0x290
      [  548.589492]  kasan_report+0x291/0x390
      [  548.589496]  ? kmemdup+0x36/0x50
      [  548.589509]  check_memory_region+0x139/0x190
      [  548.589514]  memcpy+0x23/0x50
      [  548.589518]  kmemdup+0x36/0x50
      [  548.589545]  f2fs_build_segment_manager+0x8fa/0x3410
      [  548.589551]  ? __asan_loadN+0xf/0x20
      [  548.589560]  ? f2fs_sanity_check_ckpt+0x1be/0x240
      [  548.589566]  ? f2fs_flush_sit_entries+0x10c0/0x10c0
      [  548.589587]  ? __put_user_ns+0x40/0x40
      [  548.589604]  ? find_next_bit+0x57/0x90
      [  548.589610]  f2fs_fill_super+0x194b/0x2b40
      [  548.589617]  ? f2fs_commit_super+0x1b0/0x1b0
      [  548.589637]  ? set_blocksize+0x90/0x140
      [  548.589651]  mount_bdev+0x1c5/0x210
      [  548.589655]  ? f2fs_commit_super+0x1b0/0x1b0
      [  548.589667]  f2fs_mount+0x15/0x20
      [  548.589672]  mount_fs+0x60/0x1a0
      [  548.589683]  ? alloc_vfsmnt+0x309/0x360
      [  548.589688]  vfs_kern_mount+0x6b/0x1a0
      [  548.589699]  do_mount+0x34a/0x18c0
      [  548.589710]  ? lockref_put_or_lock+0xcf/0x160
      [  548.589716]  ? copy_mount_string+0x20/0x20
      [  548.589728]  ? memcg_kmem_put_cache+0x1b/0xa0
      [  548.589734]  ? kasan_check_write+0x14/0x20
      [  548.589740]  ? _copy_from_user+0x6a/0x90
      [  548.589744]  ? memdup_user+0x42/0x60
      [  548.589750]  ksys_mount+0x83/0xd0
      [  548.589755]  __x64_sys_mount+0x67/0x80
      [  548.589781]  do_syscall_64+0x78/0x170
      [  548.589797]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  548.589820] RIP: 0033:0x7f76fc331b9a
      [  548.589821] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
      [  548.589880] RSP: 002b:00007ffd4f0a0e48 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
      [  548.589890] RAX: ffffffffffffffda RBX: 000000000146c030 RCX: 00007f76fc331b9a
      [  548.589892] RDX: 000000000146c210 RSI: 000000000146df30 RDI: 0000000001474ec0
      [  548.589895] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
      [  548.589897] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000000001474ec0
      [  548.589900] R13: 000000000146c210 R14: 0000000000000000 R15: 0000000000000003
      
      [  548.590242] The buggy address belongs to the page:
      [  548.591243] page:ffffea0007a70980 count:0 mapcount:0 mapping:0000000000000000 index:0x0
      [  548.592886] flags: 0x2ffff0000000000()
      [  548.593665] raw: 02ffff0000000000 dead000000000100 dead000000000200 0000000000000000
      [  548.595258] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      [  548.603713] page dumped because: kasan: bad access detected
      
      [  548.605203] Memory state around the buggy address:
      [  548.606198]  ffff8801e9c26480: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      [  548.607676]  ffff8801e9c26500: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      [  548.609157] >ffff8801e9c26580: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      [  548.610629]                                                                 ^
      [  548.612088]  ffff8801e9c26600: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      [  548.613674]  ffff8801e9c26680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      [  548.615141] ==================================================================
      [  548.616613] Disabling lock debugging due to kernel taint
      [  548.622871] WARNING: CPU: 1 PID: 1295 at mm/page_alloc.c:4065 __alloc_pages_slowpath+0xe4a/0x1420
      [  548.622878] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
      [  548.623217] CPU: 1 PID: 1295 Comm: mount Tainted: G    B             4.18.0-rc1+ #4
      [  548.623219] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  548.623226] RIP: 0010:__alloc_pages_slowpath+0xe4a/0x1420
      [  548.623227] Code: ff ff 01 89 85 c8 fe ff ff e9 91 fc ff ff 41 89 c5 e9 5c fc ff ff 0f 0b 89 f8 25 ff ff f7 ff 89 85 8c fe ff ff e9 d5 f2 ff ff <0f> 0b e9 65 f2 ff ff 65 8b 05 38 81 d2 47 f6 c4 01 74 1c 65 48 8b
      [  548.623281] RSP: 0018:ffff8801f28c7678 EFLAGS: 00010246
      [  548.623284] RAX: 0000000000000000 RBX: 00000000006040c0 RCX: ffffffffb82f73b7
      [  548.623287] RDX: 1ffff1003e518eeb RSI: 000000000000000c RDI: 0000000000000000
      [  548.623290] RBP: ffff8801f28c7880 R08: 0000000000000000 R09: ffffed0047fff2c5
      [  548.623292] R10: 0000000000000001 R11: ffffed0047fff2c4 R12: ffff8801e88de040
      [  548.623295] R13: 00000000006040c0 R14: 000000000000000c R15: ffff8801f28c7938
      [  548.623299] FS:  00007f76fca51840(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000
      [  548.623302] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  548.623304] CR2: 00007f19b9171760 CR3: 00000001ed952000 CR4: 00000000000006e0
      [  548.623317] Call Trace:
      [  548.623325]  ? kasan_check_read+0x11/0x20
      [  548.623330]  ? __zone_watermark_ok+0x92/0x240
      [  548.623336]  ? get_page_from_freelist+0x1c3/0x1d90
      [  548.623347]  ? _raw_spin_lock_irqsave+0x2a/0x60
      [  548.623353]  ? warn_alloc+0x250/0x250
      [  548.623358]  ? save_stack+0x46/0xd0
      [  548.623361]  ? kasan_kmalloc+0xad/0xe0
      [  548.623366]  ? __isolate_free_page+0x2a0/0x2a0
      [  548.623370]  ? mount_fs+0x60/0x1a0
      [  548.623374]  ? vfs_kern_mount+0x6b/0x1a0
      [  548.623378]  ? do_mount+0x34a/0x18c0
      [  548.623383]  ? ksys_mount+0x83/0xd0
      [  548.623387]  ? __x64_sys_mount+0x67/0x80
      [  548.623391]  ? do_syscall_64+0x78/0x170
      [  548.623396]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  548.623401]  __alloc_pages_nodemask+0x3c5/0x400
      [  548.623407]  ? __alloc_pages_slowpath+0x1420/0x1420
      [  548.623412]  ? __mutex_lock_slowpath+0x20/0x20
      [  548.623417]  ? kvmalloc_node+0x31/0x80
      [  548.623424]  alloc_pages_current+0x75/0x110
      [  548.623436]  kmalloc_order+0x24/0x60
      [  548.623442]  kmalloc_order_trace+0x24/0xb0
      [  548.623448]  __kmalloc_track_caller+0x207/0x220
      [  548.623455]  ? f2fs_build_node_manager+0x399/0xbb0
      [  548.623460]  kmemdup+0x20/0x50
      [  548.623465]  f2fs_build_node_manager+0x399/0xbb0
      [  548.623470]  f2fs_fill_super+0x195e/0x2b40
      [  548.623477]  ? f2fs_commit_super+0x1b0/0x1b0
      [  548.623481]  ? set_blocksize+0x90/0x140
      [  548.623486]  mount_bdev+0x1c5/0x210
      [  548.623489]  ? f2fs_commit_super+0x1b0/0x1b0
      [  548.623495]  f2fs_mount+0x15/0x20
      [  548.623498]  mount_fs+0x60/0x1a0
      [  548.623503]  ? alloc_vfsmnt+0x309/0x360
      [  548.623508]  vfs_kern_mount+0x6b/0x1a0
      [  548.623513]  do_mount+0x34a/0x18c0
      [  548.623518]  ? lockref_put_or_lock+0xcf/0x160
      [  548.623523]  ? copy_mount_string+0x20/0x20
      [  548.623528]  ? memcg_kmem_put_cache+0x1b/0xa0
      [  548.623533]  ? kasan_check_write+0x14/0x20
      [  548.623537]  ? _copy_from_user+0x6a/0x90
      [  548.623542]  ? memdup_user+0x42/0x60
      [  548.623547]  ksys_mount+0x83/0xd0
      [  548.623552]  __x64_sys_mount+0x67/0x80
      [  548.623557]  do_syscall_64+0x78/0x170
      [  548.623562]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  548.623566] RIP: 0033:0x7f76fc331b9a
      [  548.623567] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
      [  548.623632] RSP: 002b:00007ffd4f0a0e48 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
      [  548.623636] RAX: ffffffffffffffda RBX: 000000000146c030 RCX: 00007f76fc331b9a
      [  548.623639] RDX: 000000000146c210 RSI: 000000000146df30 RDI: 0000000001474ec0
      [  548.623641] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
      [  548.623643] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000000001474ec0
      [  548.623646] R13: 000000000146c210 R14: 0000000000000000 R15: 0000000000000003
      [  548.623650] ---[ end trace 4ce02f25ff7d3df5 ]---
      [  548.623656] F2FS-fs (loop0): Failed to initialize F2FS node manager
      [  548.627936] F2FS-fs (loop0): Invalid log blocks per segment (8201)
      
      [  548.627940] F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
      [  548.635835] F2FS-fs (loop0): Failed to initialize F2FS node manager
      
      - Location
      https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/segment.c#L3578
      
      	sit_i->sit_bitmap = kmemdup(src_bitmap, bitmap_size, GFP_KERNEL);
      
      A buffer overrun happens during the memcpy. I suspect there are missing (or inconsistent) checks on bitmap_size.
      
      Reported by Wen Xu (wen.xu@gatech.edu) from SSLab, Gatech.
      Reported-by: Wen Xu <wen.xu@gatech.edu>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0983ef55
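The idea behind the mount-time check is to compare the bitmap sizes read from disk against the sizes implied by the filesystem geometry before anything copies `bitmap_size` bytes. The sketch below is a userspace model, not the f2fs code: `ckpt_model`, `expected_bitmap_bytes()` and `ckpt_bitmap_sizes_sane()` are hypothetical, and the sizing rule (one bit per block in half of the segments) is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two on-disk fields named in the commit. */
struct ckpt_model {
	uint32_t sit_ver_bitmap_bytesize;
	uint32_t nat_ver_bitmap_bytesize;
};

/* Assumed rule: one bit per block in half of the given segments. */
static uint32_t expected_bitmap_bytes(uint32_t segs, uint32_t log_blocks_per_seg)
{
	return ((segs / 2) << log_blocks_per_seg) / 8;
}

/* Reject the image if either stored size disagrees with the geometry. */
static bool ckpt_bitmap_sizes_sane(const struct ckpt_model *ckpt,
				   uint32_t sit_segs, uint32_t nat_segs,
				   uint32_t log_blocks_per_seg)
{
	return ckpt->sit_ver_bitmap_bytesize ==
		       expected_bitmap_bytes(sit_segs, log_blocks_per_seg) &&
	       ckpt->nat_ver_bitmap_bytesize ==
		       expected_bitmap_bytes(nat_segs, log_blocks_per_seg);
}
```

Failing the mount when this predicate is false keeps a crafted size from ever reaching a call such as the `kmemdup()` shown in the report.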
    • Zumeng Chen's avatar
      mfd: ti_am335x_tscadc: Fix struct clk memory leak · 7beff543
      Zumeng Chen authored
      [ Upstream commit c2b1509c ]
      
      Use devm_clk_get() to let Linux manage struct clk memory to avoid the following
      memory leakage report:
      
      unreferenced object 0xdd75efc0 (size 64):
        comm "systemd-udevd", pid 186, jiffies 4294945126 (age 1195.750s)
        hex dump (first 32 bytes):
          61 64 63 5f 74 73 63 5f 66 63 6b 00 00 00 00 00  adc_tsc_fck.....
          00 00 00 00 92 03 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<c0a15260>] kmemleak_alloc+0x40/0x74
          [<c0287a10>] __kmalloc_track_caller+0x198/0x388
          [<c0255610>] kstrdup+0x40/0x5c
          [<c025565c>] kstrdup_const+0x30/0x3c
          [<c0636630>] __clk_create_clk+0x60/0xac
          [<c0630918>] clk_get_sys+0x74/0x144
          [<c0630cdc>] clk_get+0x5c/0x68
          [<bf0ac540>] ti_tscadc_probe+0x260/0x468 [ti_am335x_tscadc]
          [<c06f3c0c>] platform_drv_probe+0x60/0xac
          [<c06f1abc>] driver_probe_device+0x214/0x2dc
          [<c06f1c18>] __driver_attach+0x94/0xc0
          [<c06efe2c>] bus_for_each_dev+0x90/0xa0
          [<c06f1470>] driver_attach+0x28/0x30
          [<c06f1030>] bus_add_driver+0x184/0x1ec
          [<c06f2b74>] driver_register+0xb0/0xf0
          [<c06f3b4c>] __platform_driver_register+0x40/0x54
      Signed-off-by: Zumeng Chen <zumeng.chen@gmail.com>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7beff543
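Device-managed ("devm") resources work by recording each allocation against the device so that everything is released together when the device goes away. The sketch below models that idea in userspace C. It is not the kernel's devres implementation: `device_model`, `managed_strdup()`, `device_model_release()` and the fixed-size table are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_RES 8

/* Hypothetical device that remembers every allocation made on its behalf. */
struct device_model {
	void *res[MAX_RES];
	int nres;
};

static int live_allocs;	/* allocations not yet freed */

/* Duplicate a string and tie its lifetime to the device. */
static char *managed_strdup(struct device_model *dev, const char *s)
{
	size_t n = strlen(s) + 1;
	char *p;

	if (dev->nres == MAX_RES)
		return NULL;
	p = malloc(n);
	if (!p)
		return NULL;
	memcpy(p, s, n);
	dev->res[dev->nres++] = p;
	live_allocs++;
	return p;
}

/* Called once on teardown: frees everything the device still owns. */
static void device_model_release(struct device_model *dev)
{
	while (dev->nres > 0) {
		free(dev->res[--dev->nres]);
		live_allocs--;
	}
}
```

Because the release step walks the device's own list, a probe path that forgets an explicit "put" call no longer leaks the name string seen in the kmemleak report.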
    • Geert Uytterhoeven's avatar
      iommu/ipmmu-vmsa: Fix allocation in atomic context · b28c14ae
      Geert Uytterhoeven authored
      [ Upstream commit 46583e8c ]
      
      When attaching a device to an IOMMU group with
      CONFIG_DEBUG_ATOMIC_SLEEP=y:
      
          BUG: sleeping function called from invalid context at mm/slab.h:421
          in_atomic(): 1, irqs_disabled(): 128, pid: 61, name: kworker/1:1
          ...
          Call trace:
           ...
           arm_lpae_alloc_pgtable+0x114/0x184
           arm_64_lpae_alloc_pgtable_s1+0x2c/0x128
           arm_32_lpae_alloc_pgtable_s1+0x40/0x6c
           alloc_io_pgtable_ops+0x60/0x88
           ipmmu_attach_device+0x140/0x334
      
      ipmmu_attach_device() takes a spinlock, while arm_lpae_alloc_pgtable()
      allocates memory using GFP_KERNEL.  Originally, the ipmmu-vmsa driver
      had its own custom page table allocation implementation using
      GFP_ATOMIC, hence the spinlock was fine.
      
      Fix this by replacing the spinlock with a mutex, like the arm-smmu driver
      does.
      
      Fixes: f20ed39f ("iommu/ipmmu-vmsa: Use the ARM LPAE page table allocator")
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b28c14ae
    • Dan Carpenter's avatar
      f2fs: Fix uninitialized return in f2fs_ioc_shutdown() · 1252c1da
      Dan Carpenter authored
      [ Upstream commit 2a96d8ad ]
      
      "ret" can be uninitialized on the success path when "in ==
      F2FS_GOING_DOWN_FULLSYNC".
      
      Fixes: 60b2b4ee ("f2fs: Fix deadlock in shutdown ioctl")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1252c1da
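The class of bug fixed here, a status variable that one switch branch never assigns, can be reproduced in a few lines. The sketch below is a userspace model with hypothetical names (`shutdown_model()`, `MODEL_FULLSYNC`, `MODEL_METASYNC`, `sync_metadata_model()`), and it assumes the remedy is simply to give `ret` an initial value.

```c
#include <errno.h>

enum shutdown_mode_model {
	MODEL_FULLSYNC = 0,
	MODEL_METASYNC = 1
};

/* Placeholder for a step that reports its own status. */
static int sync_metadata_model(void)
{
	return 0;
}

static int shutdown_model(int in)
{
	int ret = 0;	/* defined value for branches that never assign ret */

	switch (in) {
	case MODEL_FULLSYNC:
		/* succeeds without touching ret */
		break;
	case MODEL_METASYNC:
		ret = sync_metadata_model();
		if (ret)
			goto out;
		break;
	default:
		ret = -EINVAL;
		goto out;
	}
out:
	return ret;
}
```

Without the initializer, the `MODEL_FULLSYNC` branch would return whatever happened to be on the stack, so a successful call could look like a failure.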
    • Chao Yu's avatar
      f2fs: fix to wait on page writeback before updating page · 9d54a48e
      Chao Yu authored
      [ Upstream commit 6aead161 ]
      
      In the error path of f2fs_move_rehashed_dirents, the inode page could be
      under writeback, so we should wait for writeback to finish before updating it.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9d54a48e