1. 11 Sep, 2018 1 commit
  2. 10 Sep, 2018 2 commits
  3. 09 Sep, 2018 10 commits
    • Taehee Yoo's avatar
      ip: frags: fix crash in ip_do_fragment() · 5d407b07
      Taehee Yoo authored
      A kernel crash occurrs when defragmented packet is fragmented
      in ip_do_fragment().
      In defragment routine, skb_orphan() is called and
      skb->ip_defrag_offset is set. but skb->sk and
      skb->ip_defrag_offset are same union member. so that
      frag->sk is not NULL.
      Hence crash occurrs in skb->sk check routine in ip_do_fragment() when
      defragmented packet is fragmented.
      
      test commands:
         %iptables -t nat -I POSTROUTING -j MASQUERADE
         %hping3 192.168.4.2 -s 1000 -p 2000 -d 60000
      
      splat looks like:
      [  261.069429] kernel BUG at net/ipv4/ip_output.c:636!
      [  261.075753] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [  261.083854] CPU: 1 PID: 1349 Comm: hping3 Not tainted 4.19.0-rc2+ #3
      [  261.100977] RIP: 0010:ip_do_fragment+0x1613/0x2600
      [  261.106945] Code: e8 e2 38 e3 fe 4c 8b 44 24 18 48 8b 74 24 08 e9 92 f6 ff ff 80 3c 02 00 0f 85 da 07 00 00 48 8b b5 d0 00 00 00 e9 25 f6 ff ff <0f> 0b 0f 0b 44 8b 54 24 58 4c 8b 4c 24 18 4c 8b 5c 24 60 4c 8b 6c
      [  261.127015] RSP: 0018:ffff8801031cf2c0 EFLAGS: 00010202
      [  261.134156] RAX: 1ffff1002297537b RBX: ffffed0020639e6e RCX: 0000000000000004
      [  261.142156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880114ba9bd8
      [  261.150157] RBP: ffff880114ba8a40 R08: ffffed0022975395 R09: ffffed0022975395
      [  261.158157] R10: 0000000000000001 R11: ffffed0022975394 R12: ffff880114ba9ca4
      [  261.166159] R13: 0000000000000010 R14: ffff880114ba9bc0 R15: dffffc0000000000
      [  261.174169] FS:  00007fbae2199700(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
      [  261.183012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  261.189013] CR2: 00005579244fe000 CR3: 0000000119bf4000 CR4: 00000000001006e0
      [  261.198158] Call Trace:
      [  261.199018]  ? dst_output+0x180/0x180
      [  261.205011]  ? save_trace+0x300/0x300
      [  261.209018]  ? ip_copy_metadata+0xb00/0xb00
      [  261.213034]  ? sched_clock_local+0xd4/0x140
      [  261.218158]  ? kill_l4proto+0x120/0x120 [nf_conntrack]
      [  261.223014]  ? rt_cpu_seq_stop+0x10/0x10
      [  261.227014]  ? find_held_lock+0x39/0x1c0
      [  261.233008]  ip_finish_output+0x51d/0xb50
      [  261.237006]  ? ip_fragment.constprop.56+0x220/0x220
      [  261.243011]  ? nf_ct_l4proto_register_one+0x5b0/0x5b0 [nf_conntrack]
      [  261.250152]  ? rcu_is_watching+0x77/0x120
      [  261.255010]  ? nf_nat_ipv4_out+0x1e/0x2b0 [nf_nat_ipv4]
      [  261.261033]  ? nf_hook_slow+0xb1/0x160
      [  261.265007]  ip_output+0x1c7/0x710
      [  261.269005]  ? ip_mc_output+0x13f0/0x13f0
      [  261.273002]  ? __local_bh_enable_ip+0xe9/0x1b0
      [  261.278152]  ? ip_fragment.constprop.56+0x220/0x220
      [  261.282996]  ? nf_hook_slow+0xb1/0x160
      [  261.287007]  raw_sendmsg+0x21f9/0x4420
      [  261.291008]  ? dst_output+0x180/0x180
      [  261.297003]  ? sched_clock_cpu+0x126/0x170
      [  261.301003]  ? find_held_lock+0x39/0x1c0
      [  261.306155]  ? stop_critical_timings+0x420/0x420
      [  261.311004]  ? check_flags.part.36+0x450/0x450
      [  261.315005]  ? _raw_spin_unlock_irq+0x29/0x40
      [  261.320995]  ? _raw_spin_unlock_irq+0x29/0x40
      [  261.326142]  ? cyc2ns_read_end+0x10/0x10
      [  261.330139]  ? raw_bind+0x280/0x280
      [  261.334138]  ? sched_clock_cpu+0x126/0x170
      [  261.338995]  ? check_flags.part.36+0x450/0x450
      [  261.342991]  ? __lock_acquire+0x4500/0x4500
      [  261.348994]  ? inet_sendmsg+0x11c/0x500
      [  261.352989]  ? dst_output+0x180/0x180
      [  261.357012]  inet_sendmsg+0x11c/0x500
      [ ... ]
      
      v2:
       - clear skb->sk at reassembly routine.(Eric Dumarzet)
      
      Fixes: fa0f5273 ("ip: use rb trees for IP frag queue.")
      Suggested-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5d407b07
    • Vakul Garg's avatar
      net/tls: Set count of SG entries if sk_alloc_sg returns -ENOSPC · 52ea992c
      Vakul Garg authored
      tls_sw_sendmsg() allocates plaintext and encrypted SG entries using
      function sk_alloc_sg(). In case the number of SG entries hit
      MAX_SKB_FRAGS, sk_alloc_sg() returns -ENOSPC and sets the variable for
      current SG index to '0'. This leads to calling of function
      tls_push_record() with 'sg_encrypted_num_elem = 0' and later causes
      kernel crash. To fix this, set the number of SG elements to the number
      of elements in plaintext/encrypted SG arrays in case sk_alloc_sg()
      returns -ENOSPC.
      
      Fixes: 3c4d7559 ("tls: kernel TLS support")
      Signed-off-by: default avatarVakul Garg <vakul.garg@nxp.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      52ea992c
    • David S. Miller's avatar
      Merge branch 'ena-fixes' · 0e1f4c76
      David S. Miller authored
      Netanel Belgazal says:
      
      ====================
      bug fixes for ENA Ethernet driver
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0e1f4c76
    • Netanel Belgazal's avatar
      net: ena: fix incorrect usage of memory barriers · 37dff155
      Netanel Belgazal authored
      Added memory barriers where they were missing to support multiple
      architectures, and removed redundant ones.
      
      As part of removing the redundant memory barriers and improving
      performance, we moved to more relaxed versions of memory barriers,
      as well as to the more relaxed version of writel - writel_relaxed,
      while maintaining correctness.
      Signed-off-by: default avatarNetanel Belgazal <netanel@amazon.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      37dff155
    • Netanel Belgazal's avatar
      net: ena: fix missing calls to READ_ONCE · 28abf4e9
      Netanel Belgazal authored
      Add READ_ONCE calls where necessary (for example when iterating
      over a memory field that gets updated by the hardware).
      Signed-off-by: default avatarNetanel Belgazal <netanel@amazon.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      28abf4e9
    • Netanel Belgazal's avatar
      net: ena: fix missing lock during device destruction · 944b28aa
      Netanel Belgazal authored
      acquire the rtnl_lock during device destruction to avoid
      using partially destroyed device.
      
      ena_remove() shares almost the same logic as ena_destroy_device(),
      so use ena_destroy_device() and avoid duplications.
      Signed-off-by: default avatarNetanel Belgazal <netanel@amazon.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      944b28aa
    • Netanel Belgazal's avatar
      net: ena: fix potential double ena_destroy_device() · fe870c77
      Netanel Belgazal authored
      ena_destroy_device() can potentially be called twice.
      To avoid this, check that the device is running and
      only then proceed destroying it.
      Signed-off-by: default avatarNetanel Belgazal <netanel@amazon.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fe870c77
    • Netanel Belgazal's avatar
      net: ena: fix device destruction to gracefully free resources · cfa324a5
      Netanel Belgazal authored
      When ena_destroy_device() is called from ena_suspend(), the device is
      still reachable from the driver. Therefore, the driver can send a command
      to the device to free all resources.
      However, in all other cases of calling ena_destroy_device(), the device is
      potentially in an error state and unreachable from the driver. In these
      cases the driver must not send commands to the device.
      
      The current implementation does not request resource freeing from the
      device even when possible. We add the graceful parameter to
      ena_destroy_device() to enable resource freeing when possible, and
      use it in ena_suspend().
      Signed-off-by: default avatarNetanel Belgazal <netanel@amazon.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cfa324a5
    • Netanel Belgazal's avatar
      net: ena: fix driver when PAGE_SIZE == 64kB · ef5b0771
      Netanel Belgazal authored
      The buffer length field in the ena rx descriptor is 16 bit, and the
      current driver passes a full page in each ena rx descriptor.
      When PAGE_SIZE equals 64kB or more, the buffer length field becomes
      zero.
      To solve this issue, limit the ena Rx descriptor to use 16kB even
      when allocating 64kB kernel pages. This change would not impact ena
      device functionality, as 16kB is still larger than maximum MTU.
      Signed-off-by: default avatarNetanel Belgazal <netanel@amazon.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ef5b0771
    • Netanel Belgazal's avatar
      net: ena: fix surprise unplug NULL dereference kernel crash · 772ed869
      Netanel Belgazal authored
      Starting with driver version 1.5.0, in case of a surprise device
      unplug, there is a race caused by invoking ena_destroy_device()
      from two different places. As a result, the readless register might
      be accessed after it was destroyed.
      Signed-off-by: default avatarNetanel Belgazal <netanel@amazon.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      772ed869
  4. 08 Sep, 2018 3 commits
  5. 07 Sep, 2018 2 commits
    • Maciej S. Szmigiero's avatar
      r8169: set TxConfig register after TX / RX is enabled, just like RxConfig · f74dd480
      Maciej S. Szmigiero authored
      Commit 3559d81e ("r8169: simplify rtl_hw_start_8169") changed order of
      two register writes:
      1) Caused RxConfig to be written before TX / RX is enabled,
      2) Caused TxConfig to be written before TX / RX is enabled.
      
      At least on XIDs 10000000 ("RTL8169sb/8110sb") and
      18000000 ("RTL8169sc/8110sc") such writes are ignored by the chip, leaving
      values in these registers intact.
      
      Change 1) was reverted by
      commit 05212ba8 ("r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices"),
      however change 2) wasn't.
      
      In practice, this caused TxConfig's "InterFrameGap time" and "Max DMA Burst
      Size per Tx DMA Burst" bits to be zero dramatically reducing TX performance
      (in my tests it dropped from around 500Mbps to around 50Mbps).
      
      This patch fixes the issue by moving TxConfig register write a bit later in
      the code so it happens after TX / RX is already enabled.
      
      Fixes: 05212ba8 ("r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices")
      Signed-off-by: default avatarMaciej S. Szmigiero <mail@maciej.szmigiero.name>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f74dd480
    • Cong Wang's avatar
      tipc: call start and done ops directly in __tipc_nl_compat_dumpit() · 8f5c5fcf
      Cong Wang authored
      __tipc_nl_compat_dumpit() uses a netlink_callback on stack,
      so the only way to align it with other ->dumpit() call path
      is calling tipc_dump_start() and tipc_dump_done() directly
      inside it. Otherwise ->dumpit() would always get NULL from
      cb->args[].
      
      But tipc_dump_start() uses sock_net(cb->skb->sk) to retrieve
      net pointer, the cb->skb here doesn't set skb->sk, the net pointer
      is saved in msg->net instead, so introduce a helper function
      __tipc_dump_start() to pass in msg->net.
      
      Ying pointed out cb->args[0...3] are already used by other
      callbacks on this call path, so we can't use cb->args[0] any
      more, use cb->args[4] instead.
      
      Fixes: 9a07efa9 ("tipc: switch to rhashtable iterator")
      Reported-and-tested-by: syzbot+e93a2c41f91b8e2c7d9b@syzkaller.appspotmail.com
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8f5c5fcf
  6. 06 Sep, 2018 17 commits
  7. 05 Sep, 2018 3 commits
  8. 04 Sep, 2018 2 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 28619527
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Must perform TXQ teardown before unregistering interfaces in
          mac80211, from Toke Høiland-Jørgensen.
      
       2) Don't allow creating mac80211_hwsim with less than one channel, from
          Johannes Berg.
      
       3) Division by zero in cfg80211, fix from Johannes Berg.
      
       4) Fix endian issue in tipc, from Haiqing Bai.
      
       5) BPF sockmap use-after-free fixes from Daniel Borkmann.
      
       6) Spectre-v1 in mac80211_hwsim, from Jinbum Park.
      
       7) Missing rhashtable_walk_exit() in tipc, from Cong Wang.
      
       8) Revert kvzalloc() conversion of AF_PACKET, it breaks mmap() when
          kvzalloc() tries to use kmalloc() pages. From Eric Dumazet.
      
       9) Fix deadlock in hv_netvsc, from Dexuan Cui.
      
      10) Do not restart timewait timer on RST, from Florian Westphal.
      
      11) Fix double lwstate refcount grab in ipv6, from Alexey Kodanev.
      
      12) Unsolicit report count handling is off-by-one, fix from Hangbin Liu.
      
      13) Sleep-in-atomic in cadence driver, from Jia-Ju Bai.
      
      14) Respect ttl-inherit in ip6 tunnel driver, from Hangbin Liu.
      
      15) Use-after-free in act_ife, fix from Cong Wang.
      
      16) Missing hold to meta module in act_ife, from Vlad Buslov.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (91 commits)
        net: phy: sfp: Handle unimplemented hwmon limits and alarms
        net: sched: action_ife: take reference to meta module
        act_ife: fix a potential use-after-free
        net/mlx5: Fix SQ offset in QPs with small RQ
        tipc: correct spelling errors for tipc_topsrv_queue_evt() comments
        tipc: correct spelling errors for struct tipc_bc_base's comment
        bnxt_en: Do not adjust max_cp_rings by the ones used by RDMA.
        bnxt_en: Clean up unused functions.
        bnxt_en: Fix firmware signaled resource change logic in open.
        sctp: not traverse asoc trans list if non-ipv6 trans exists for ipv6_flowlabel
        sctp: fix invalid reference to the index variable of the iterator
        net/ibm/emac: wrong emac_calc_base call was used by typo
        net: sched: null actions array pointer before releasing action
        vhost: fix VHOST_GET_BACKEND_FEATURES ioctl request definition
        r8169: add support for NCube 8168 network card
        ip6_tunnel: respect ttl inherit for ip6tnl
        mac80211: shorten the IBSS debug messages
        mac80211: don't Tx a deauth frame if the AP forbade Tx
        mac80211: Fix station bandwidth setting after channel switch
        mac80211: fix a race between restart and CSA flows
        ...
      28619527
    • Andrew Lunn's avatar
      net: phy: sfp: Handle unimplemented hwmon limits and alarms · a33710bd
      Andrew Lunn authored
      Not all SFPs implement the registers containing sensor limits and
      alarms. Luckily, there is a bit indicating if they are implemented or
      not. Add checking for this bit, when deciding if the hwmon attributes
      should be visible.
      
      Fixes: 1323061a ("net: phy: sfp: Add HWMON support for module sensors")
      Signed-off-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a33710bd