1. 03 May, 2017 2 commits
    • Ingo Molnar's avatar
      Merge tag 'perf-core-for-mingo-4.12-20170503' of... · 12c1c2fd
      Ingo Molnar authored
      Merge tag 'perf-core-for-mingo-4.12-20170503' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      Fixes:
      
      - Support setting probes in versioned user space symbols, such as
        pthread_create@@GLIBC_2.1, picking the default one, more work
        needed to make it possible to set it on the other versions, as
        the 'perf probe' syntax already uses @ for other purposes.
        (Paul Clarke)
      
      - Do not special case address zero as an error for routines that
        return addresses (symbol lookup), instead use the return as the
        success/error indication and pass a pointer to return the address,
        fixing 'perf test vmlinux' (the one that compares address between
        vmlinux and kallsyms) on s/390, where the '_text' address is equal
        to zero (Arnaldo Carvalho de Melo)
      
      Infrastructure changes:
      
      - More header sanitization, moving stuff out of util.h into
        more appropriate headers and objects and sometimes creating
        new ones (Arnaldo Carvalho de Melo)
      
      - Refactor a duplicated code for obtaining config file name (Taeung Song)
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      12c1c2fd
    • Vince Weaver's avatar
      perf/x86: Fix Broadwell-EP DRAM RAPL events · 33b88e70
      Vince Weaver authored
      It appears as though the Broadwell-EP DRAM units share the special
      units quirk with Haswell-EP/KNL.
      
      Without this patch, you get really high results (a single DRAM using 20W
      of power).
      
      The powercap driver in drivers/powercap/intel_rapl.c already has this
      change.
      Signed-off-by: default avatarVince Weaver <vincent.weaver@maine.edu>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      33b88e70
  2. 02 May, 2017 3 commits
    • Taeung Song's avatar
      perf config: Refactor a duplicated code for obtaining config file name · 4341ec6b
      Taeung Song authored
      We were doing the same sequence to figure out what is the config
      pathname to use, fix it by doing it before those two uses.
      Signed-off-by: default avatarTaeung Song <treeze.taeung@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lkml.kernel.org/r/1493209268-5543-2-git-send-email-treeze.taeung@gmail.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      4341ec6b
    • Paul Clarke's avatar
      perf symbols: Allow user probes on versioned symbols · d8040645
      Paul Clarke authored
      Symbol versioning, as in glibc, results in symbols being defined as:
      
        <real symbol>@[@]<version>
      
      (Note that "@@" identifies a default symbol, if the symbol name is
      repeated.)
      
      perf is currently unable to deal with this, and is unable to create user
      probes at such symbols:
      
        --
        $ nm /lib/powerpc64le-linux-gnu/libpthread.so.0 | grep pthread_create
        0000000000008d30 t __pthread_create_2_1
        0000000000008d30 T pthread_create@@GLIBC_2.17
        $ /usr/bin/sudo perf probe -v -x /lib/powerpc64le-linux-gnu/libpthread.so.0 pthread_create
        probe-definition(0): pthread_create
        symbol:pthread_create file:(null) line:0 offset:0 return:0 lazy:(null)
        0 arguments
        Open Debuginfo file: /usr/lib/debug/lib/powerpc64le-linux-gnu/libpthread-2.19.so
        Try to find probe point from debuginfo.
        Probe point 'pthread_create' not found.
           Error: Failed to add events. Reason: No such file or directory (Code: -2)
        --
      
      One is not able to specify the fully versioned symbol, either, due to
      syntactic conflicts with other uses of "@" by perf:
      
        --
        $ /usr/bin/sudo perf probe -v -x /lib/powerpc64le-linux-gnu/libpthread.so.0 pthread_create@@GLIBC_2.17
        probe-definition(0): pthread_create@@GLIBC_2.17
        Semantic error :SRC@SRC is not allowed.
        0 arguments
           Error: Command Parse Error. Reason: Invalid argument (Code: -22)
        --
      
      This patch ignores versioning for default symbols, thus allowing probes to be
      created for these symbols:
      
        --
        $ /usr/bin/sudo ./perf probe -x /lib/powerpc64le-linux-gnu/libpthread.so.0 pthread_create
        Added new event:
           probe_libpthread:pthread_create (on pthread_create in /lib/powerpc64le-linux-gnu/libpthread-2.19.so)
      
        You can now use it in all perf tools, such as:
      
                 perf record -e probe_libpthread:pthread_create -aR sleep 1
      
        $ /usr/bin/sudo ./perf record -e probe_libpthread:pthread_create -aR ./test 2
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.052 MB perf.data (2 samples) ]
        $ /usr/bin/sudo ./perf script
                     test  2915 [000] 19124.260729: probe_libpthread:pthread_create: (3fff99248d38)
                     test  2916 [000] 19124.260962: probe_libpthread:pthread_create: (3fff99248d38)
        $ /usr/bin/sudo ./perf probe --del=probe_libpthread:pthread_create
        Removed event: probe_libpthread:pthread_create
        --
      
      Committer note:
      
      Change the variable storing the result of strlen() to 'int', to fix the build
      on debian:experimental-x-mipsel, fedora:24-x-ARC-uClibc, ubuntu:16.04-x-arm,
      etc:
      
        util/symbol.c: In function 'symbol__match_symbol_name':
        util/symbol.c:422:11: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
           if (len < versioning - name)
                   ^
      Signed-off-by: default avatarPaul A. Clarke <pc@us.ibm.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Link: http://lkml.kernel.org/r/c2b18d9c-17f8-9285-4868-f58b6359ccac@us.ibm.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      d8040645
    • Arnaldo Carvalho de Melo's avatar
      perf symbols: Accept symbols starting at address 0 · b843f62a
      Arnaldo Carvalho de Melo authored
      That is the case of _text on s390, and we have some functions that return an
      address, using address zero to report problems, oops.
      
      This would lead the symbol loading routines to not use "_text" as the reference
      relocation symbol, or the first symbol for the kernel, but use instead
      "_stext", that is at the same address on x86_64 and others, but not on s390:
      
        [acme@localhost perf-4.11.0-rc6]$ head -15 /proc/kallsyms
        0000000000000000 T _text
        0000000000000418 t iplstart
        0000000000000800 T start
        000000000000080a t .base
        000000000000082e t .sk8x8
        0000000000000834 t .gotr
        0000000000000842 t .cmd
        0000000000000846 t .parm
        000000000000084a t .lowcase
        0000000000010000 T startup
        0000000000010010 T startup_kdump
        0000000000010214 t startup_kdump_relocated
        0000000000011000 T startup_continue
        00000000000112a0 T _ehead
        0000000000100000 T _stext
        [acme@localhost perf-4.11.0-rc6]$
      
      Which in turn would make 'perf test vmlinux' to fail because it wouldn't find
      the symbols before "_stext" in kallsyms.
      
      Fix it by using the return value only for errors and storing the
      address, when the symbol is successfully found, in a provided pointer
      arg.
      
      Before this patch:
      
      After:
      
        [acme@localhost perf-4.11.0-rc6]$ tools/perf/perf test -v 1
         1: vmlinux symtab matches kallsyms            :
        --- start ---
        test child forked, pid 40693
        Looking at the vmlinux_path (8 entries long)
        Using /usr/lib/debug/lib/modules/3.10.0-654.el7.s390x/vmlinux for symbols
        ERR : 0: _text not on kallsyms
        ERR : 0x418: iplstart not on kallsyms
        ERR : 0x800: start not on kallsyms
        ERR : 0x80a: .base not on kallsyms
        ERR : 0x82e: .sk8x8 not on kallsyms
        ERR : 0x834: .gotr not on kallsyms
        ERR : 0x842: .cmd not on kallsyms
        ERR : 0x846: .parm not on kallsyms
        ERR : 0x84a: .lowcase not on kallsyms
        ERR : 0x10000: startup not on kallsyms
        ERR : 0x10010: startup_kdump not on kallsyms
        ERR : 0x10214: startup_kdump_relocated not on kallsyms
        ERR : 0x11000: startup_continue not on kallsyms
        ERR : 0x112a0: _ehead not on kallsyms
        <SNIP warnings>
        test child finished with -1
        ---- end ----
        vmlinux symtab matches kallsyms: FAILED!
        [acme@localhost perf-4.11.0-rc6]$
      
      After:
      
        [acme@localhost perf-4.11.0-rc6]$ tools/perf/perf test -v 1
         1: vmlinux symtab matches kallsyms            :
        --- start ---
        test child forked, pid 47160
        <SNIP warnings>
        test child finished with 0
        ---- end ----
        vmlinux symtab matches kallsyms: Ok
        [acme@localhost perf-4.11.0-rc6]$
      Reported-by: default avatarMichael Petlan <mpetlan@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-9x9bwgd3btwdk1u51xie93fz@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      b843f62a
  3. 01 May, 2017 1 commit
  4. 30 Apr, 2017 3 commits
  5. 29 Apr, 2017 3 commits
  6. 28 Apr, 2017 15 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 0e911788
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Just a couple more stragglers, I really hope this is it.
      
        1) Don't let frags slip down into the GRO segmentation handlers, from
           Steffen Klassert.
      
        2) Truesize under-estimation triggers warnings in TCP over loopback
           with socket filters, 2 part fix from Eric Dumazet.
      
        3) Fix undesirable reset of bonding MTU to ETH_HLEN on slave removal,
           from Paolo Abeni.
      
        4) If we flush the XFRM policy after garbage collection, it doesn't
           work because stray entries can be created afterwards. Fix from Xin
           Long.
      
        5) Hung socket connection fixes in TIPC from Parthasarathy Bhuvaragan.
      
        6) Fix GRO regression with IPSEC when netfilter is disabled, from
           Sabrina Dubroca.
      
        7) Fix cpsw driver Kconfig dependency regression, from Arnd Bergmann"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        net: hso: register netdev later to avoid a race condition
        net: adjust skb->truesize in ___pskb_trim()
        tcp: do not underestimate skb->truesize in tcp_trim_head()
        bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal
        ipv4: Don't pass IP fragments to upper layer GRO handlers.
        cpsw/netcp: refine cpts dependency
        tipc: close the connection if protocol messages contain errors
        tipc: improve error validations for sockets in CONNECTING state
        tipc: Fix missing connection request handling
        xfrm: fix GRO for !CONFIG_NETFILTER
        xfrm: do the garbage collection after flushing policy
      0e911788
    • Andreas Kemnade's avatar
      net: hso: register netdev later to avoid a race condition · 4c761daf
      Andreas Kemnade authored
      If the netdev is accessed before the urbs are initialized,
      there will be NULL pointer dereferences. That is avoided by
      registering it when it is fully initialized.
      
      This case occurs e.g. if dhcpcd is running in the background
      and the device is probed, either after insmod hso or
      when the device appears on the usb bus.
      
      A backtrace is the following:
      
      [ 1357.356048] usb 1-2: new high-speed USB device number 12 using ehci-omap
      [ 1357.551177] usb 1-2: New USB device found, idVendor=0af0, idProduct=8800
      [ 1357.558654] usb 1-2: New USB device strings: Mfr=3, Product=2, SerialNumber=0
      [ 1357.568572] usb 1-2: Product: Globetrotter HSUPA Modem
      [ 1357.574096] usb 1-2: Manufacturer: Option N.V.
      [ 1357.685882] hso 1-2:1.5: Not our interface
      [ 1460.886352] hso: unloaded
      [ 1460.889984] usbcore: deregistering interface driver hso
      [ 1513.769134] hso: ../drivers/net/usb/hso.c: Option Wireless
      [ 1513.846771] Unable to handle kernel NULL pointer dereference at virtual address 00000030
      [ 1513.887664] hso 1-2:1.5: Not our interface
      [ 1513.906890] usbcore: registered new interface driver hso
      [ 1513.937988] pgd = ecdec000
      [ 1513.949890] [00000030] *pgd=acd15831, *pte=00000000, *ppte=00000000
      [ 1513.956573] Internal error: Oops: 817 [#1] PREEMPT SMP ARM
      [ 1513.962371] Modules linked in: hso usb_f_ecm omap2430 bnep bluetooth g_ether usb_f_rndis u_ether libcomposite configfs ipv6 arc4 wl18xx wlcore mac80211 cfg80211 bq27xxx_battery panel_tpo_td028ttec1 omapdrm drm_kms_helper cfbfillrect snd_soc_simple_card syscopyarea cfbimgblt snd_soc_simple_card_utils sysfillrect sysimgblt fb_sys_fops snd_soc_omap_twl4030 cfbcopyarea encoder_opa362 drm twl4030_madc_hwmon wwan_on_off snd_soc_gtm601 pwm_omap_dmtimer generic_adc_battery connector_analog_tv pwm_bl extcon_gpio omap3_isp wlcore_sdio videobuf2_dma_contig videobuf2_memops w1_bq27000 videobuf2_v4l2 videobuf2_core omap_hdq snd_soc_omap_mcbsp ov9650 snd_soc_omap bmp280_i2c bmg160_i2c v4l2_common snd_pcm_dmaengine bmp280 bmg160_core at24 bmc150_magn_i2c nvmem_core videodev phy_twl4030_usb bmc150_accel_i2c tsc2007
      [ 1514.037384]  bmc150_magn bmc150_accel_core media leds_tca6507 bno055 industrialio_triggered_buffer kfifo_buf gpio_twl4030 musb_hdrc snd_soc_twl4030 twl4030_vibra twl4030_madc twl4030_pwrbutton twl4030_charger industrialio w2sg0004 ehci_omap omapdss [last unloaded: hso]
      [ 1514.062622] CPU: 0 PID: 3433 Comm: dhcpcd Tainted: G        W       4.11.0-rc8-letux+ #1
      [ 1514.071136] Hardware name: Generic OMAP36xx (Flattened Device Tree)
      [ 1514.077758] task: ee748240 task.stack: ecdd6000
      [ 1514.082580] PC is at hso_start_net_device+0x50/0xc0 [hso]
      [ 1514.088287] LR is at hso_net_open+0x68/0x84 [hso]
      [ 1514.093231] pc : [<bf79c304>]    lr : [<bf79ced8>]    psr: a00f0013
      sp : ecdd7e20  ip : 00000000  fp : ffffffff
      [ 1514.105316] r10: 00000000  r9 : ed0e080c  r8 : ecd8fe2c
      [ 1514.110839] r7 : bf79cef4  r6 : ecd8fe00  r5 : 00000000  r4 : ed0dbd80
      [ 1514.117706] r3 : 00000000  r2 : c0020c80  r1 : 00000000  r0 : ecdb7800
      [ 1514.124572] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
      [ 1514.132110] Control: 10c5387d  Table: acdec019  DAC: 00000051
      [ 1514.138153] Process dhcpcd (pid: 3433, stack limit = 0xecdd6218)
      [ 1514.144470] Stack: (0xecdd7e20 to 0xecdd8000)
      [ 1514.149078] 7e20: ed0dbd80 ecd8fe98 00000001 00000000 ecd8f800 ecd8fe00 ecd8fe60 00000000
      [ 1514.157714] 7e40: ed0e080c bf79ced8 bf79ce70 ecd8f800 00000001 bf7a0258 ecd8f830 c068d958
      [ 1514.166320] 7e60: c068d8b8 ecd8f800 00000001 00001091 00001090 c068dba4 ecd8f800 00001090
      [ 1514.174926] 7e80: ecd8f940 ecd8f800 00000000 c068dc60 00000000 00000001 ed0e0800 ecd8f800
      [ 1514.183563] 7ea0: 00000000 c06feaa8 c0ca39c2 beea57dc 00000020 00000000 306f7368 00000000
      [ 1514.192169] 7ec0: 00000000 00000000 00001091 00000000 00000000 00000000 00000000 00008914
      [ 1514.200805] 7ee0: eaa9ab60 beea57dc c0c9bfc0 eaa9ab40 00000006 00000000 00046858 c066a948
      [ 1514.209411] 7f00: beea57dc eaa9ab60 ecc6b0c0 c02837b0 00000006 c0282c90 0000c000 c0283654
      [ 1514.218017] 7f20: c09b0c00 c098bc31 00000001 c0c5e513 c0c5e513 00000000 c0151354 c01a20c0
      [ 1514.226654] 7f40: c0c5e513 c01a3134 ecdd6000 c01a3160 ee7487f0 600f0013 00000000 ee748240
      [ 1514.235260] 7f60: ee748734 00000000 ecc6b0c0 ecc6b0c0 beea57dc 00008914 00000006 00000000
      [ 1514.243896] 7f80: 00046858 c02837b0 00001091 0003a1f0 00046608 0003a248 00000036 c01071e4
      [ 1514.252502] 7fa0: ecdd6000 c0107040 0003a1f0 00046608 00000006 00008914 beea57dc 00001091
      [ 1514.261108] 7fc0: 0003a1f0 00046608 0003a248 00000036 0003ac0c 00046608 00046610 00046858
      [ 1514.269744] 7fe0: 0003a0ac beea57d4 000167eb b6f23106 400f0030 00000006 00000000 00000000
      [ 1514.278411] [<bf79c304>] (hso_start_net_device [hso]) from [<bf79ced8>] (hso_net_open+0x68/0x84 [hso])
      [ 1514.288238] [<bf79ced8>] (hso_net_open [hso]) from [<c068d958>] (__dev_open+0xa0/0xf4)
      [ 1514.296600] [<c068d958>] (__dev_open) from [<c068dba4>] (__dev_change_flags+0x8c/0x130)
      [ 1514.305023] [<c068dba4>] (__dev_change_flags) from [<c068dc60>] (dev_change_flags+0x18/0x48)
      [ 1514.313934] [<c068dc60>] (dev_change_flags) from [<c06feaa8>] (devinet_ioctl+0x348/0x714)
      [ 1514.322540] [<c06feaa8>] (devinet_ioctl) from [<c066a948>] (sock_ioctl+0x2b0/0x308)
      [ 1514.330627] [<c066a948>] (sock_ioctl) from [<c0282c90>] (vfs_ioctl+0x20/0x34)
      [ 1514.338165] [<c0282c90>] (vfs_ioctl) from [<c0283654>] (do_vfs_ioctl+0x82c/0x93c)
      [ 1514.346038] [<c0283654>] (do_vfs_ioctl) from [<c02837b0>] (SyS_ioctl+0x4c/0x74)
      [ 1514.353759] [<c02837b0>] (SyS_ioctl) from [<c0107040>] (ret_fast_syscall+0x0/0x1c)
      [ 1514.361755] Code: e3822103 e3822080 e1822781 e5981014 (e5832030)
      [ 1514.510833] ---[ end trace dfb3e53c657f34a0 ]---
      Reported-by: default avatarH. Nikolaus Schaller <hns@goldelico.com>
      Signed-off-by: default avatarAndreas Kemnade <andreas@kemnade.info>
      Reviewed-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4c761daf
    • Eric Dumazet's avatar
      net: adjust skb->truesize in ___pskb_trim() · c21b48cc
      Eric Dumazet authored
      Andrey found a way to trigger the WARN_ON_ONCE(delta < len) in
      skb_try_coalesce() using syzkaller and a filter attached to a TCP
      socket.
      
      As we did recently in commit 158f323b ("net: adjust skb->truesize in
      pskb_expand_head()") we can adjust skb->truesize from ___pskb_trim(),
      via a call to skb_condense().
      
      If all frags were freed, then skb->truesize can be recomputed.
      
      This call can be done if skb is not yet owned, or destructor is
      sock_edemux().
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c21b48cc
    • Eric Dumazet's avatar
      tcp: do not underestimate skb->truesize in tcp_trim_head() · 7162fb24
      Eric Dumazet authored
      Andrey found a way to trigger the WARN_ON_ONCE(delta < len) in
      skb_try_coalesce() using syzkaller and a filter attached to a TCP
      socket over loopback interface.
      
      I believe one issue with looped skbs is that tcp_trim_head() can end up
      producing skb with under estimated truesize.
      
      It hardly matters for normal conditions, since packets sent over
      loopback are never truncated.
      
      Bytes trimmed from skb->head should not change skb truesize, since
      skb->head is not reallocated.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7162fb24
    • Paolo Abeni's avatar
      bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal · 19cdead3
      Paolo Abeni authored
      On slave list updates, the bonding driver computes its hard_header_len
      as the maximum of all enslaved devices's hard_header_len.
      If the slave list is empty, e.g. on last enslaved device removal,
      ETH_HLEN is used.
      
      Since the bonding header_ops are set only when the first enslaved
      device is attached, the above can lead to header_ops->create()
      being called with the wrong skb headroom in place.
      
      If bond0 is configured on top of ipoib devices, with the
      following commands:
      
      ifup bond0
      for slave in $BOND_SLAVES_LIST; do
      	ip link set dev $slave nomaster
      done
      ping -c 1 <ip on bond0 subnet>
      
      we will obtain a skb_under_panic() with a similar call trace:
      	skb_push+0x3d/0x40
      	push_pseudo_header+0x17/0x30 [ib_ipoib]
      	ipoib_hard_header+0x4e/0x80 [ib_ipoib]
      	arp_create+0x12f/0x220
      	arp_send_dst.part.19+0x28/0x50
      	arp_solicit+0x115/0x290
      	neigh_probe+0x4d/0x70
      	__neigh_event_send+0xa7/0x230
      	neigh_resolve_output+0x12e/0x1c0
      	ip_finish_output2+0x14b/0x390
      	ip_finish_output+0x136/0x1e0
      	ip_output+0x76/0xe0
      	ip_local_out+0x35/0x40
      	ip_send_skb+0x19/0x40
      	ip_push_pending_frames+0x33/0x40
      	raw_sendmsg+0x7d3/0xb50
      	inet_sendmsg+0x31/0xb0
      	sock_sendmsg+0x38/0x50
      	SYSC_sendto+0x102/0x190
      	SyS_sendto+0xe/0x10
      	do_syscall_64+0x67/0x180
      	entry_SYSCALL64_slow_path+0x25/0x25
      
      This change addresses the issue avoiding updating the bonding device
      hard_header_len when the slaves list become empty, forbidding to
      shrink it below the value used by header_ops->create().
      
      The bug is there since commit 54ef3137 ("[PATCH] bonding: Handle large
      hard_header_len") but the panic can be triggered only since
      commit fc791b63 ("IB/ipoib: move back IB LL address into the hard
      header").
      Reported-by: default avatarNorbert P <noe@physik.uzh.ch>
      Fixes: 54ef3137 ("[PATCH] bonding: Handle large hard_header_len")
      Fixes: fc791b63 ("IB/ipoib: move back IB LL address into the hard header")
      Signed-off-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarJay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      19cdead3
    • Steffen Klassert's avatar
      ipv4: Don't pass IP fragments to upper layer GRO handlers. · 9b83e031
      Steffen Klassert authored
      Upper layer GRO handlers can not handle IP fragments, so
      exit GRO processing in this case.
      
      This fixes ESP GRO because the packet must be reassembled
      before we can decapsulate, otherwise we get authentication
      failures.
      
      It also aligns IPv4 to IPv6 where packets with fragmentation
      headers are not passed to upper layer GRO handlers.
      
      Fixes: 7785bba2 ("esp: Add a software GRO codepath")
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9b83e031
    • Arnd Bergmann's avatar
      cpsw/netcp: refine cpts dependency · 504926df
      Arnd Bergmann authored
      Tony Lindgren reports a kernel oops that resulted from my compile-time
      fix on the default config. This shows two problems:
      
      a) configurations that did not already enable PTP_1588_CLOCK will
         now miss the cpts driver
      
      b) when cpts support is disabled, the driver crashes. This is a
         preexisting problem that we did not notice before my patch.
      
      While the second problem is still being investigated, this modifies
      the dependencies again, getting us back to the original state, with
      another 'select NET_PTP_CLASSIFY' added in to avoid the original
      link error we got, and the 'depends on POSIX_TIMERS' to hide
      the CPTS support when turning it on would be useless.
      
      Cc: stable@vger.kernel.org # 4.11 needs this
      Fixes: 07fef362 ("cpsw/netcp: cpts depends on posix_timers")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Tested-by: default avatarTony Lindgren <tony@atomide.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      504926df
    • David S. Miller's avatar
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 5577e679
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2017-04-28
      
      1) Do garbage collecting after a policy flush to remove old
         bundles immediately. From Xin Long.
      
      2) Fix GRO if netfilter is not defined.
         From Sabrina Dubroca.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5577e679
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · affb852d
      Linus Torvalds authored
      Pull input fix from Dmitry Torokhov:
       "Yet another quirk to i8042 to get touchpad recognized on some laptops"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: i8042 - add Clevo P650RS to the i8042 reset list
      affb852d
    • Arnd Bergmann's avatar
      clk: sunxi-ng: always select CCU_GATE · 36c02d0b
      Arnd Bergmann authored
      When the base driver is enabled but all SoC specific drivers are turned
      off, we now get a build error after code was added to always refer to the
      clk gates:
      
      drivers/clk/built-in.o: In function `ccu_pll_notifier_cb':
      :(.text+0x154f8): undefined reference to `ccu_gate_helper_disable'
      :(.text+0x15504): undefined reference to `ccu_gate_helper_enable'
      
      This changes the Kconfig to always require the gate code to be built-in
      when CONFIG_SUNXI_CCU is set.
      
      Fixes: 02ae2bc6 ("clk: sunxi-ng: Add clk notifier to gate then ungate PLL clocks")
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Acked-by: default avatarMaxime Ripard <maxime.ripard@free-electrons.com>
      Signed-off-by: default avatarStephen Boyd <sboyd@codeaurora.org>
      36c02d0b
    • Linus Torvalds's avatar
      Merge branch 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 28b20135
      Linus Torvalds authored
      Pull btrfs fix from Chris Mason:
       "We have one more fix for btrfs.
      
        This gets rid of a new WARN_ON from rc1 that ended up making more
        noise than we really want. The larger fix for the underflow got
        delayed a bit and it's better for now to put it under
        CONFIG_BTRFS_DEBUG"
      
      * 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        btrfs: qgroup: move noisy underflow warning to debugging build
      28b20135
    • David S. Miller's avatar
      Merge branch 'tipc-socket-connection-hangs' · c5184717
      David S. Miller authored
      Parthasarathy Bhuvaragan says:
      
      ====================
      tipc: fix hanging socket connections
      
      This patch series contains fixes for the socket layer to
      prevent hanging / stale connections.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c5184717
    • Parthasarathy Bhuvaragan's avatar
      tipc: close the connection if protocol messages contain errors · c1be7756
      Parthasarathy Bhuvaragan authored
      When a socket is shutting down, we notify the peer node about the
      connection termination by reusing an incoming message if possible.
      If the last received message was a connection acknowledgment
      message, we reverse this message and set the error code to
      TIPC_ERR_NO_PORT and send it to peer.
      
      In tipc_sk_proto_rcv(), we never check for message errors while
      processing the connection acknowledgment or probe messages. Thus
      this message performs the usual flow control accounting and leaves
      the session hanging.
      
      In this commit, we terminate the connection when we receive such
      error messages.
      Signed-off-by: default avatarParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c1be7756
    • Parthasarathy Bhuvaragan's avatar
      tipc: improve error validations for sockets in CONNECTING state · 4e0df495
      Parthasarathy Bhuvaragan authored
      Until now, the checks for sockets in CONNECTING state was based on
      the assumption that the incoming message was always from the
      peer's accepted data socket.
      
      However an application using a non-blocking socket sends an implicit
      connect, this socket which is in CONNECTING state can receive error
      messages from the peer's listening socket. As we discard these
      messages, the application socket hangs as there due to inactivity.
      In addition to this, there are other places where we process errors
      but do not notify the user.
      
      In this commit, we process such incoming error messages and notify
      our users about them using sk_state_change().
      Signed-off-by: default avatarParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4e0df495
    • Parthasarathy Bhuvaragan's avatar
      tipc: Fix missing connection request handling · 42b531de
      Parthasarathy Bhuvaragan authored
      In filter_connect, we use waitqueue_active() to check for any
      connections to wakeup. But waitqueue_active() is missing memory
      barriers while accessing the critical sections, leading to
      inconsistent results.
      
      In this commit, we replace this with an SMP safe wq_has_sleeper()
      using the generic socket callback sk_data_ready().
      Signed-off-by: default avatarParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Reviewed-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      42b531de
  7. 27 Apr, 2017 7 commits
    • Linus Torvalds's avatar
      Merge tag 'nfsd-4.11-3' of git://linux-nfs.org/~bfields/linux · 8b5d11e4
      Linus Torvalds authored
      Pull nfsd fixes from Bruce Fields:
       "Thanks to Ari Kauppi and Tuomas Haanpää at Synopsis for spotting bugs
        in our NFSv2/v3 xdr code that could crash the server or leak memory"
      
      * tag 'nfsd-4.11-3' of git://linux-nfs.org/~bfields/linux:
        nfsd: stricter decoding of write-like NFSv2/v3 ops
        nfsd4: minor NFSv2/v3 write decoding cleanup
        nfsd: check for oversized NFSv2/v3 arguments
      8b5d11e4
    • Linus Torvalds's avatar
      Merge tag 'ceph-for-4.11-rc9' of git://github.com/ceph/ceph-client · 19ac4474
      Linus Torvalds authored
      Pull ceph fix from Ilya Dryomov:
       "A fix for a kernel stack overflow bug in ceph setattr code, marked for
        stable"
      
      * tag 'ceph-for-4.11-rc9' of git://github.com/ceph/ceph-client:
        ceph: fix recursion between ceph_set_acl() and __ceph_setattr()
      19ac4474
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · f56fc7bd
      Linus Torvalds authored
      Pull vfs fixes from Al Viro:
      
       - fix orangefs handling of faults on write() - I'd missed that one back
         when orangefs was going through review.
      
       - readdir counterpart of "9p: cope with bogus responses from server in
         p9_client_{read,write}" - server might be lying or broken, and we'd
         better not overrun the kmalloc'ed buffer we are copying the results
         into.
      
       - NFS O_DIRECT read/write can leave iov_iter advanced by too much;
         that's what had been causing iov_iter_pipe() warnings davej had been
         seeing.
      
       - statx_timestamp.tv_nsec type fix (s32 -> u32). That one really should
         go in before 4.11.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        uapi: change the type of struct statx_timestamp.tv_nsec to unsigned
        fix nfs O_DIRECT advancing iov_iter too much
        p9_client_readdir() fix
        orangefs_bufmap_copy_from_iovec(): fix EFAULT handling
      f56fc7bd
    • Michael Kerrisk (man-pages)'s avatar
      statx: correct error handling of NULL pathname · 59372bbf
      Michael Kerrisk (man-pages) authored
      The change in commit 1e2f82d1 ("statx: Kill fd-with-NULL-path
      support in favour of AT_EMPTY_PATH") to error on a NULL pathname to
      statx() is inconsistent.
      
      It results in the error EINVAL for a NULL pathname.  Other system calls
      with similar APIs (fchownat(), fstatat(), linkat()), return EFAULT.
      
      The solution is simply to remove the EINVAL check.  As I already pointed
      out in [1], user_path_at*() and filename_lookup() will handle the NULL
      pathname as per the other APIs, to correctly produce the error EFAULT.
      
      [1] https://lkml.org/lkml/2017/4/26/561Signed-off-by: default avatarMichael Kerrisk <mtk.manpages@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Eric Sandeen <sandeen@sandeen.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      59372bbf
    • Sabrina Dubroca's avatar
      xfrm: fix GRO for !CONFIG_NETFILTER · cfcf99f9
      Sabrina Dubroca authored
      In xfrm_input() when called from GRO, async == 0, and we end up
      skipping the processing in xfrm4_transport_finish(). GRO path will
      always skip the NF_HOOK, so we don't need the special-case for
      !NETFILTER during GRO processing.
      
      Fixes: 7785bba2 ("esp: Add a software GRO codepath")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      cfcf99f9
    • Frederic Weisbecker's avatar
      sched/cputime: Fix ksoftirqd cputime accounting regression · 25e2d8c1
      Frederic Weisbecker authored
      irq_time_read() returns the irqtime minus the ksoftirqd time. This
      is necessary because irq_time_read() is used to substract the IRQ time
      from the sum_exec_runtime of a task. If we were to include the softirq
      time of ksoftirqd, this task would substract its own CPU time everytime
      it updates ksoftirqd->sum_exec_runtime which would therefore never
      progress.
      
      But this behaviour got broken by:
      
        a499a5a1 ("sched/cputime: Increment kcpustat directly on irqtime account")
      
      ... which now includes ksoftirqd softirq time in the time returned by
      irq_time_read().
      
      This has resulted in wrong ksoftirqd cputime reported to userspace
      through /proc/stat and thus "top" not showing ksoftirqd when it should
      after intense networking load.
      
      ksoftirqd->stime happens to be correct but it gets scaled down by
      sum_exec_runtime through task_cputime_adjusted().
      
      To fix this, just account the strict IRQ time in a separate counter and
      use it to report the IRQ time.
      Reported-and-tested-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1493129448-5356-1-git-send-email-fweisbec@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      25e2d8c1
    • Dmitry V. Levin's avatar
      uapi: change the type of struct statx_timestamp.tv_nsec to unsigned · 1741937d
      Dmitry V. Levin authored
      The comment asserting that the value of struct statx_timestamp.tv_nsec
      must be negative when statx_timestamp.tv_sec is negative, is wrong, as
      could be seen from the following example:
      
      	#define _FILE_OFFSET_BITS 64
      	#include <assert.h>
      	#include <fcntl.h>
      	#include <stdio.h>
      	#include <sys/stat.h>
      	#include <unistd.h>
      	#include <asm/unistd.h>
      	#include <linux/stat.h>
      
      	int main(void)
      	{
      		static const struct timespec ts[2] = {
      			{ .tv_nsec = UTIME_OMIT },
      			{ .tv_sec = -2, .tv_nsec = 42 }
      		};
      		assert(utimensat(AT_FDCWD, ".", ts, 0) == 0);
      
      		struct stat st;
      		assert(stat(".", &st) == 0);
      		printf("st_mtim.tv_sec = %lld, st_mtim.tv_nsec = %lu\n",
      		       (long long) st.st_mtim.tv_sec,
      		       (unsigned long) st.st_mtim.tv_nsec);
      
      		struct statx stx;
      		assert(syscall(__NR_statx, AT_FDCWD, ".", 0, 0, &stx) == 0);
      		printf("stx_mtime.tv_sec = %lld, stx_mtime.tv_nsec = %lu\n",
      		       (long long) stx.stx_mtime.tv_sec,
      		       (unsigned long) stx.stx_mtime.tv_nsec);
      
      		return 0;
      	}
      
      It expectedly prints:
      st_mtim.tv_sec = -2, st_mtim.tv_nsec = 42
      stx_mtime.tv_sec = -2, stx_mtime.tv_nsec = 42
      
      The more generic comment asserting that the value of struct
      statx_timestamp.tv_nsec might be negative is confusing to say the least.
      
      It contradicts both the struct stat.st_[acm]time_nsec tradition and
      struct timespec.tv_nsec requirements in utimensat syscall.
      If statx syscall ever returns a stx_[acm]time containing a negative
      tv_nsec that cannot be passed unmodified to utimensat syscall,
      it will cause an immense confusion.
      
      Fix this source of confusion by changing the type of struct
      statx_timestamp.tv_nsec from __s32 to __u32.
      
      Fixes: a528d35e ("statx: Add a system call to make enhanced file info available")
      Signed-off-by: default avatarDmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: linux-api@vger.kernel.org
      cc: mtk.manpages@gmail.com
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      1741937d
  8. 26 Apr, 2017 6 commits
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · f8324608
      Linus Torvalds authored
      Pull sparc fixes from David Miller:
       "I didn't want the release to go out without the statx system call
        properly hooked up"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc: Update syscall tables.
        sparc64: Fill in rest of HAVE_REGS_AND_STACK_ACCESS_API
      f8324608
    • David Howells's avatar
      statx: Kill fd-with-NULL-path support in favour of AT_EMPTY_PATH · 1e2f82d1
      David Howells authored
      With the new statx() syscall, the following both allow the attributes of
      the file attached to a file descriptor to be retrieved:
      
      	statx(dfd, NULL, 0, ...);
      
      and:
      
      	statx(dfd, "", AT_EMPTY_PATH, ...);
      
      Change the code to reject the first option, though this means copying
      the path and engaging pathwalk for the fstat() equivalent.  dfd can be a
      non-directory provided path is "".
      
      [ The timing of this isn't wonderful, but applying this now before we
        have statx() in any released kernel, before anybody starts using the
        NULL special case.    - Linus ]
      
      Fixes: a528d35e ("statx: Add a system call to make enhanced file info available")
      Reported-by: default avatarMichael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      cc: Eric Sandeen <sandeen@sandeen.net>
      cc: fstests@vger.kernel.org
      cc: linux-api@vger.kernel.org
      cc: linux-man@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1e2f82d1
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · fc08b197
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) MLX5 bug fixes from Saeed Mahameed et al:
           - released wrong resources when firmware timeout happens
           - fix wrong check for encapsulation size limits
           - UAR memory leak
           - ETHTOOL_GRXCLSRLALL failed to fill in info->data
      
       2) Don't cache l3mdev on mis-matches local route, causes net devices to
          leak refs. From Robert Shearman.
      
       3) Handle fragmented SKBs properly in macsec driver, the problem is
          that we were mis-sizing the sgvec table. From Jason A. Donenfeld.
      
       4) We cannot have checksum offload enabled for inner UDP tunneled
          packet during IPSEC, from Ansis Atteka.
      
       5) Fix double SKB free in ravb driver, from Dan Carpenter.
      
       6) Fix CPU port handling in b53 DSA driver, from Florian Dainelli.
      
       7) Don't use on-stack buffers for usb_control_msg() in CAN usb driver,
          from Maksim Salau.
      
       8) Fix device leak in macvlan driver, from Herbert Xu. We have to purge
          the broadcast queue properly on port destroy.
      
       9) Fix tx ring entry limit on EF10 devices in sfc driver. From Bert
          Kenward.
      
      10) Fix memory leaks in team driver, from Pan Bian.
      
      11) Don't setup ipv6_stub before it can be actually used, from Paolo
          Abeni.
      
      12) Fix tipc socket flow control accounting, from Parthasarathy
          Bhuvaragan.
      
      13) Fix crash on module unload in hso driver, from Andreas Kemnade.
      
      14) Fix purging of bridge multicast entries, the problem is that if we
          don't defer it to ndo_uninit it's possible for new entries to get
          added after we purge. Fix from Xin Long.
      
      15) Don't return garbage for PACKET_HDRLEN getsockopt, from Alexander
          Potapenko.
      
      16) Fix autoneg stall properly in PHY layer, and revert micrel driver
          change that was papering over it. From Alexander Kochetkov.
      
      17) Don't dereference an ipv4 route as an ipv6 one in the ip6_tunnnel
          code, from Cong Wang.
      
      18) Clear out the congestion control private of the TCP socket in all of
          the right places, from Wei Wang.
      
      19) rawv6_ioctl measures SKB length incorrectly, fix from Jamie
          Bainbridge.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
        ipv6: check raw payload size correctly in ioctl
        tcp: memset ca_priv data to 0 properly
        ipv6: check skb->protocol before lookup for nexthop
        net: core: Prevent from dereferencing null pointer when releasing SKB
        macsec: dynamically allocate space for sglist
        Revert "phy: micrel: Disable auto negotiation on startup"
        net: phy: fix auto-negotiation stall due to unavailable interrupt
        net/packet: check length in getsockopt() called with PACKET_HDRLEN
        net: ipv6: regenerate host route if moved to gc list
        bridge: move bridge multicast cleanup to ndo_uninit
        ipv6: fix source routing
        qed: Fix error in the dcbx app meta data initialization.
        netvsc: fix calculation of available send sections
        net: hso: fix module unloading
        tipc: fix socket flow control accounting error at tipc_recv_stream
        tipc: fix socket flow control accounting error at tipc_send_stream
        ipv6: move stub initialization after ipv6 setup completion
        team: fix memory leaks
        sfc: tx ring can only have 2048 entries for all EF10 NICs
        macvlan: Fix device ref leak when purging bc_queue
        ...
      fc08b197
    • Jamie Bainbridge's avatar
      ipv6: check raw payload size correctly in ioctl · 105f5528
      Jamie Bainbridge authored
      In situations where an skb is paged, the transport header pointer and
      tail pointer can be the same because the skb contents are in frags.
      
      This results in ioctl(SIOCINQ/FIONREAD) incorrectly returning a
      length of 0 when the length to receive is actually greater than zero.
      
      skb->len is already correctly set in ip6_input_finish() with
      pskb_pull(), so use skb->len as it always returns the correct result
      for both linear and paged data.
      Signed-off-by: default avatarJamie Bainbridge <jbainbri@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      105f5528
    • Wei Wang's avatar
      tcp: memset ca_priv data to 0 properly · c1201444
      Wei Wang authored
      Always zero out ca_priv data in tcp_assign_congestion_control() so that
      ca_priv data is cleared out during socket creation.
      Also always zero out ca_priv data in tcp_reinit_congestion_control() so
      that when cc algorithm is changed, ca_priv data is cleared out as well.
      We should still zero out ca_priv data even in TCP_CLOSE state because
      user could call connect() on AF_UNSPEC to disconnect the socket and
      leave it in TCP_CLOSE state and later call setsockopt() to switch cc
      algorithm on this socket.
      
      Fixes: 2b0a8c9e ("tcp: add CDG congestion control")
      Reported-by: default avatarAndrey Konovalov  <andreyknvl@google.com>
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarYuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c1201444
    • WANG Cong's avatar
      ipv6: check skb->protocol before lookup for nexthop · 199ab00f
      WANG Cong authored
      Andrey reported a out-of-bound access in ip6_tnl_xmit(), this
      is because we use an ipv4 dst in ip6_tnl_xmit() and cast an IPv4
      neigh key as an IPv6 address:
      
              neigh = dst_neigh_lookup(skb_dst(skb),
                                       &ipv6_hdr(skb)->daddr);
              if (!neigh)
                      goto tx_err_link_failure;
      
              addr6 = (struct in6_addr *)&neigh->primary_key; // <=== HERE
              addr_type = ipv6_addr_type(addr6);
      
              if (addr_type == IPV6_ADDR_ANY)
                      addr6 = &ipv6_hdr(skb)->daddr;
      
              memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr));
      
      Also the network header of the skb at this point should be still IPv4
      for 4in6 tunnels, we shold not just use it as IPv6 header.
      
      This patch fixes it by checking if skb->protocol is ETH_P_IPV6: if it
      is, we are safe to do the nexthop lookup using skb_dst() and
      ipv6_hdr(skb)->daddr; if not (aka IPv4), we have no clue about which
      dest address we can pick here, we have to rely on callers to fill it
      from tunnel config, so just fall to ip6_route_output() to make the
      decision.
      
      Fixes: ea3dc960 ("ip6_tunnel: Add support for wildcard tunnel endpoints.")
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      199ab00f