1. 31 Aug, 2017 1 commit
    • Naoya Horiguchi's avatar
      x86/boot/KASLR: Work around firmware bugs by excluding EFI_BOOT_SERVICES_* and... · 0982adc7
      Naoya Horiguchi authored
      x86/boot/KASLR: Work around firmware bugs by excluding EFI_BOOT_SERVICES_* and EFI_LOADER_* from KASLR's choice
      
      There's a potential bug in how we select the KASLR kernel address n
      the early boot code.
      
      The KASLR boot code currently chooses the kernel image's physical memory
      location from E820_TYPE_RAM regions by walking over all e820 entries.
      
      E820_TYPE_RAM includes EFI_BOOT_SERVICES_CODE and EFI_BOOT_SERVICES_DATA
      as well, so those regions can end up hosting the kernel image. According to
      the UEFI spec, all memory regions marked as EfiBootServicesCode and
      EfiBootServicesData are available as free memory after the first call
      to ExitBootServices(). I.e. so such regions should be usable for the
      kernel, per spec.
      
      In real life however, we have workarounds for broken x86 firmware,
      where we keep such regions reserved until SetVirtualAddressMap() is done.
      
      See the following code in should_map_region():
      
      	static bool should_map_region(efi_memory_desc_t *md)
      	{
      		...
      		/*
      		 * Map boot services regions as a workaround for buggy
      		 * firmware that accesses them even when they shouldn't.
      		 *
      		 * See efi_{reserve,free}_boot_services().
      		 */
      		if (md->type =3D=3D EFI_BOOT_SERVICES_CODE ||
      			md->type =3D=3D EFI_BOOT_SERVICES_DATA)
      				return false;
      
      This workaround suppressed a boot crash, but potential issues still
      remain because no one prevents the regions from overlapping with kernel
      image by KASLR.
      
      So let's make sure that EFI_BOOT_SERVICES_{CODE|DATA} regions are never
      chosen as kernel memory for the workaround to work fine.
      
      Furthermore, EFI_LOADER_{CODE|DATA} regions are also excluded because
      they can be used after ExitBootServices() as defined in EFI spec.
      
      As a result, we choose kernel address only from EFI_CONVENTIONAL_MEMORY
      which is the only memory type we know to be safely free.
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Junichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: fanc.fnst@cn.fujitsu.com
      Cc: izumi.taku@jp.fujitsu.com
      Link: http://lkml.kernel.org/r/20170828074444.GC23181@hori1.linux.bs1.fc.nec.co.jp
      [ Rewrote/fixed/clarified the changelog and the in code comments. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0982adc7
  2. 17 Aug, 2017 4 commits
    • Baoquan He's avatar
      x86/boot/KASLR: Prefer mirrored memory regions for the kernel physical address · c05cd797
      Baoquan He authored
      Currently KASLR will parse all e820 entries of RAM type and add all
      candidate positions into the slots array. After that we choose one slot
      randomly as the new position which the kernel will be decompressed into
      and run at.
      
      On systems with EFI enabled, e820 memory regions are coming from EFI
      memory regions by combining adjacent regions.
      
      These EFI memory regions have various attributes, and the "mirrored"
      attribute is one of them. The physical memory region whose descriptors
      in EFI memory map has EFI_MEMORY_MORE_RELIABLE attribute (bit: 16) are
      mirrored. The address range mirroring feature of the kernel arranges such
      mirrored regions into normal zones and other regions into movable zones.
      
      With the mirroring feature enabled, the code and data of the kernel can only
      be located in the more reliable mirrored regions. However, the current KASLR
      code doesn't check EFI memory entries, and could choose a new kernel position
      in non-mirrored regions. This will break the intended functionality of the
      address range mirroring feature.
      
      To fix this, if EFI is detected, iterate EFI memory map and pick the mirrored
      region to process for adding candidate of randomization slot. If EFI is disabled
      or no mirrored region found, still process the e820 memory map.
      Signed-off-by: default avatarBaoquan He <bhe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: ard.biesheuvel@linaro.org
      Cc: fanc.fnst@cn.fujitsu.com
      Cc: izumi.taku@jp.fujitsu.com
      Cc: keescook@chromium.org
      Cc: linux-efi@vger.kernel.org
      Cc: matt@codeblueprint.co.uk
      Cc: n-horiguchi@ah.jp.nec.com
      Cc: thgarnie@google.com
      Link: http://lkml.kernel.org/r/1502722464-20614-3-git-send-email-bhe@redhat.com
      [ Rewrote most of the text. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      c05cd797
    • Baoquan He's avatar
      efi: Introduce efi_early_memdesc_ptr to get pointer to memmap descriptor · 02e43c2d
      Baoquan He authored
      The existing map iteration helper for_each_efi_memory_desc_in_map can
      only be used after the kernel initializes the EFI subsystem to set up
      struct efi_memory_map.
      
      Before that we also need iterate map descriptors which are stored in several
      intermediate structures, like struct efi_boot_memmap for arch independent
      usage and struct efi_info for x86 arch only.
      
      Introduce efi_early_memdesc_ptr() to get pointer to a map descriptor, and
      replace several places where that primitive is open coded.
      Signed-off-by: default avatarBaoquan He <bhe@redhat.com>
      [ Various improvements to the text. ]
      Acked-by: default avatarMatt Fleming <matt@codeblueprint.co.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: ard.biesheuvel@linaro.org
      Cc: fanc.fnst@cn.fujitsu.com
      Cc: izumi.taku@jp.fujitsu.com
      Cc: keescook@chromium.org
      Cc: linux-efi@vger.kernel.org
      Cc: n-horiguchi@ah.jp.nec.com
      Cc: thgarnie@google.com
      Link: http://lkml.kernel.org/r/20170816134651.GF21273@x1Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      02e43c2d
    • Ingo Molnar's avatar
      2257e268
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · ac9a4090
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "A couple of minor fixes (st, ses) and some bigger driver fixes for
        qla2xxx (crash triggered by fw dump) and ipr (lockdep problems with
        mq)"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: ses: Fix wrong page error
        scsi: ipr: Fix scsi-mq lockdep issue
        scsi: st: fix blk_get_queue usage
        scsi: qla2xxx: Fix system crash while triggering FW dump
      ac9a4090
  3. 16 Aug, 2017 11 commits
    • Linus Torvalds's avatar
      Merge tag 'audit-pr-20170816' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit · 422ce075
      Linus Torvalds authored
      Pull audit fixes from Paul Moore:
       "Two small fixes to the audit code, both explained well in the
        respective patch descriptions, but the quick summary is one
        use-after-free fix, and one silly fanotify notification flag fix"
      
      * tag 'audit-pr-20170816' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
        audit: Receive unmount event
        audit: Fix use after free in audit_remove_watch_rule()
      422ce075
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 510c8a89
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix TCP checksum offload handling in iwlwifi driver, from Emmanuel
          Grumbach.
      
       2) In ksz DSA tagging code, free SKB if skb_put_padto() fails. From
          Vivien Didelot.
      
       3) Fix two regressions with bonding on wireless, from Andreas Born.
      
       4) Fix build when busypoll is disabled, from Daniel Borkmann.
      
       5) Fix copy_linear_skb() wrt. SO_PEEK_OFF, from Eric Dumazet.
      
       6) Set SKB cached route properly in inet_rtm_getroute(), from Florian
          Westphal.
      
       7) Fix PCI-E relaxed ordering handling in cxgb4 driver, from Ding
          Tianhong.
      
       8) Fix module refcnt leak in ULP code, from Sabrina Dubroca.
      
       9) Fix use of GFP_KERNEL in atomic contexts in AF_KEY code, from Eric
          Dumazet.
      
      10) Need to purge socket write queue in dccp_destroy_sock(), also from
          Eric Dumazet.
      
      11) Make bpf_trace_printk() work properly on 32-bit architectures, from
          Daniel Borkmann.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
        bpf: fix bpf_trace_printk on 32 bit archs
        PCI: fix oops when try to find Root Port for a PCI device
        sfc: don't try and read ef10 data on non-ef10 NIC
        net_sched: remove warning from qdisc_hash_add
        net_sched/sfq: update hierarchical backlog when drop packet
        net_sched: reset pointers to tcf blocks in classful qdiscs' destructors
        ipv4: fix NULL dereference in free_fib_info_rcu()
        net: Fix a typo in comment about sock flags.
        ipv6: fix NULL dereference in ip6_route_dev_notify()
        tcp: fix possible deadlock in TCP stack vs BPF filter
        dccp: purge write queue in dccp_destroy_sock()
        udp: fix linear skb reception with PEEK_OFF
        ipv6: release rt6->rt6i_idev properly during ifdown
        af_key: do not use GFP_KERNEL in atomic contexts
        tcp: ulp: avoid module refcnt leak in tcp_set_ulp
        net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
        net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
        PCI: Disable Relaxed Ordering Attributes for AMD A1100
        PCI: Disable Relaxed Ordering for some Intel processors
        PCI: Disable PCIe Relaxed Ordering if unsupported
        ...
      510c8a89
    • Daniel Borkmann's avatar
      bpf: fix bpf_trace_printk on 32 bit archs · 88a5c690
      Daniel Borkmann authored
      James reported that on MIPS32 bpf_trace_printk() is currently
      broken while MIPS64 works fine:
      
        bpf_trace_printk() uses conditional operators to attempt to
        pass different types to __trace_printk() depending on the
        format operators. This doesn't work as intended on 32-bit
        architectures where u32 and long are passed differently to
        u64, since the result of C conditional operators follows the
        "usual arithmetic conversions" rules, such that the values
        passed to __trace_printk() will always be u64 [causing issues
        later in the va_list handling for vscnprintf()].
      
        For example the samples/bpf/tracex5 test printed lines like
        below on MIPS32, where the fd and buf have come from the u64
        fd argument, and the size from the buf argument:
      
          [...] 1180.941542: 0x00000001: write(fd=1, buf=  (null), size=6258688)
      
        Instead of this:
      
          [...] 1625.616026: 0x00000001: write(fd=1, buf=009e4000, size=512)
      
      One way to get it working is to expand various combinations
      of argument types into 8 different combinations for 32 bit
      and 64 bit kernels. Fix tested by James on MIPS32 and MIPS64
      as well that it resolves the issue.
      
      Fixes: 9c959c86 ("tracing: Allow BPF programs to call bpf_trace_printk()")
      Reported-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Tested-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      88a5c690
    • dingtianhong's avatar
      PCI: fix oops when try to find Root Port for a PCI device · 0e405232
      dingtianhong authored
      Eric report a oops when booting the system after applying
      the commit a99b646a ("PCI: Disable PCIe Relaxed..."):
      
      [    4.241029] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
      [    4.247001] IP: pci_find_pcie_root_port+0x62/0x80
      [    4.253011] PGD 0
      [    4.253011] P4D 0
      [    4.253011]
      [    4.258013] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [    4.262015] Modules linked in:
      [    4.265005] CPU: 31 PID: 1 Comm: swapper/0 Not tainted 4.13.0-dbx-DEV #316
      [    4.271002] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.40.0 06/22/2016
      [    4.279002] task: ffffa2ee38cfa040 task.stack: ffffa51ec0004000
      [    4.285001] RIP: 0010:pci_find_pcie_root_port+0x62/0x80
      [    4.290012] RSP: 0000:ffffa51ec0007ab8 EFLAGS: 00010246
      [    4.295003] RAX: 0000000000000000 RBX: ffffa2ee36bae000 RCX: 0000000000000006
      [    4.303002] RDX: 000000000000081c RSI: ffffa2ee38cfa8c8 RDI: ffffa2ee36bae000
      [    4.310013] RBP: ffffa51ec0007b58 R08: 0000000000000001 R09: 0000000000000000
      [    4.317001] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa51ec0007ad0
      [    4.324005] R13: ffffa2ee36bae098 R14: 0000000000000002 R15: ffffa2ee37204818
      [    4.331002] FS:  0000000000000000(0000) GS:ffffa2ee3fcc0000(0000) knlGS:0000000000000000
      [    4.339002] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    4.345001] CR2: 0000000000000050 CR3: 000000401000f000 CR4: 00000000001406e0
      [    4.351002] Call Trace:
      [    4.354012]  ? pci_configure_device+0x19f/0x570
      [    4.359002]  ? pci_conf1_read+0xb8/0xf0
      [    4.363002]  ? raw_pci_read+0x23/0x40
      [    4.366011]  ? pci_read+0x2c/0x30
      [    4.370014]  ? pci_read_config_word+0x67/0x70
      [    4.374012]  pci_device_add+0x28/0x230
      [    4.378012]  ? pci_vpd_f0_read+0x50/0x80
      [    4.382014]  pci_scan_single_device+0x96/0xc0
      [    4.386012]  pci_scan_slot+0x79/0xf0
      [    4.389001]  pci_scan_child_bus+0x31/0x180
      [    4.394014]  acpi_pci_root_create+0x1c6/0x240
      [    4.398013]  pci_acpi_scan_root+0x15f/0x1b0
      [    4.402012]  acpi_pci_root_add+0x2e6/0x400
      [    4.406012]  ? acpi_evaluate_integer+0x37/0x60
      [    4.411002]  acpi_bus_attach+0xdf/0x200
      [    4.415002]  acpi_bus_attach+0x6a/0x200
      [    4.418014]  acpi_bus_attach+0x6a/0x200
      [    4.422013]  acpi_bus_scan+0x38/0x70
      [    4.426011]  acpi_scan_init+0x10c/0x271
      [    4.429001]  acpi_init+0x2fa/0x348
      [    4.433004]  ? acpi_sleep_proc_init+0x2d/0x2d
      [    4.437001]  do_one_initcall+0x43/0x169
      [    4.441001]  kernel_init_freeable+0x1d0/0x258
      [    4.445003]  ? rest_init+0xe0/0xe0
      [    4.449001]  kernel_init+0xe/0x150
      
      ====================== cut here =============================
      
      It looks like the pci_find_pcie_root_port() was trying to
      find the Root Port for the PCI device which is the Root
      Port already, it will return NULL and trigger the problem,
      so check the highest_pcie_bridge to fix thie problem.
      
      Fixes: a99b646a ("PCI: Disable PCIe Relaxed Ordering if unsupported")
      Fixes: c56d4450 ("PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum")
      Reported-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0e405232
    • Bert Kenward's avatar
      sfc: don't try and read ef10 data on non-ef10 NIC · 61deee96
      Bert Kenward authored
      The MAC stats command takes a port ID, which doesn't exist on
      pre-ef10 NICs (5000- and 6000- series). This is extracted from the
      NIC specific data; we misinterpret this as the ef10 data structure,
      causing us to read potentially unallocated data. With a KASAN kernel
      this can cause errors with:
         BUG: KASAN: slab-out-of-bounds in efx_mcdi_mac_stats
      
      Fixes: 0a2ab4d9 ("sfc: set the port-id when calling MC_CMD_MAC_STATS")
      Reported-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Tested-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarBert Kenward <bkenward@solarflare.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      61deee96
    • Konstantin Khlebnikov's avatar
      net_sched: remove warning from qdisc_hash_add · c90e9514
      Konstantin Khlebnikov authored
      It was added in commit e57a784d ("pkt_sched: set root qdisc
      before change() in attach_default_qdiscs()") to hide duplicates
      from "tc qdisc show" for incative deivices.
      
      After 59cc1f61 ("net: sched: convert qdisc linked list to hashtable")
      it triggered when classful qdisc is added to inactive device because
      default qdiscs are added before switching root qdisc.
      
      Anyway after commit ea327469 ("net: sched: avoid duplicates in
      qdisc dump") duplicates are filtered right in dumper.
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c90e9514
    • Konstantin Khlebnikov's avatar
      net_sched/sfq: update hierarchical backlog when drop packet · 325d5dc3
      Konstantin Khlebnikov authored
      When sfq_enqueue() drops head packet or packet from another queue it
      have to update backlog at upper qdiscs too.
      
      Fixes: 2ccccf5f ("net_sched: update hierarchical backlog too")
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      325d5dc3
    • Konstantin Khlebnikov's avatar
      net_sched: reset pointers to tcf blocks in classful qdiscs' destructors · 89890422
      Konstantin Khlebnikov authored
      Traffic filters could keep direct pointers to classes in classful qdisc,
      thus qdisc destruction first removes all filters before freeing classes.
      Class destruction methods also tries to free attached filters but now
      this isn't safe because tcf_block_put() unlike to tcf_destroy_chain()
      cannot be called second time.
      
      This patch set class->block to NULL after first tcf_block_put() and
      turn second call into no-op.
      
      Fixes: 6529eaba ("net: sched: introduce tcf block infractructure")
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      89890422
    • Eric Dumazet's avatar
      ipv4: fix NULL dereference in free_fib_info_rcu() · 187e5b3a
      Eric Dumazet authored
      If fi->fib_metrics could not be allocated in fib_create_info()
      we attempt to dereference a NULL pointer in free_fib_info_rcu() :
      
          m = fi->fib_metrics;
          if (m != &dst_default_metrics && atomic_dec_and_test(&m->refcnt))
                  kfree(m);
      
      Before my recent patch, we used to call kfree(NULL) and nothing wrong
      happened.
      
      Instead of using RCU to defer freeing while we are under memory stress,
      it seems better to take immediate action.
      
      This was reported by syzkaller team.
      
      Fixes: 3fb07daf ("ipv4: add reference counting to metrics")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      187e5b3a
    • Tonghao Zhang's avatar
    • Eric Dumazet's avatar
      ipv6: fix NULL dereference in ip6_route_dev_notify() · 12d94a80
      Eric Dumazet authored
      Based on a syzkaller report [1], I found that a per cpu allocation
      failure in snmp6_alloc_dev() would then lead to NULL dereference in
      ip6_route_dev_notify().
      
      It seems this is a very old bug, thus no Fixes tag in this submission.
      
      Let's add in6_dev_put_clear() helper, as we will probably use
      it elsewhere (once available/present in net-next)
      
      [1]
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 17294 Comm: syz-executor6 Not tainted 4.13.0-rc2+ #10
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff88019f456680 task.stack: ffff8801c6e58000
      RIP: 0010:__read_once_size include/linux/compiler.h:250 [inline]
      RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline]
      RIP: 0010:refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178
      RSP: 0018:ffff8801c6e5f1b0 EFLAGS: 00010202
      RAX: 0000000000000037 RBX: dffffc0000000000 RCX: ffffc90005d25000
      RDX: ffff8801c6e5f218 RSI: ffffffff82342bbf RDI: 0000000000000001
      RBP: ffff8801c6e5f240 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff10038dcbe37
      R13: 0000000000000006 R14: 0000000000000001 R15: 00000000000001b8
      FS:  00007f21e0429700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001ddbc22000 CR3: 00000001d632b000 CR4: 00000000001426e0
      DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
      Call Trace:
       refcount_dec_and_test+0x1a/0x20 lib/refcount.c:211
       in6_dev_put include/net/addrconf.h:335 [inline]
       ip6_route_dev_notify+0x1c9/0x4a0 net/ipv6/route.c:3732
       notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1678
       call_netdevice_notifiers net/core/dev.c:1694 [inline]
       rollback_registered_many+0x91c/0xe80 net/core/dev.c:7107
       rollback_registered+0x1be/0x3c0 net/core/dev.c:7149
       register_netdevice+0xbcd/0xee0 net/core/dev.c:7587
       register_netdev+0x1a/0x30 net/core/dev.c:7669
       loopback_net_init+0x76/0x160 drivers/net/loopback.c:214
       ops_init+0x10a/0x570 net/core/net_namespace.c:118
       setup_net+0x313/0x710 net/core/net_namespace.c:294
       copy_net_ns+0x27c/0x580 net/core/net_namespace.c:418
       create_new_namespaces+0x425/0x880 kernel/nsproxy.c:107
       unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:206
       SYSC_unshare kernel/fork.c:2347 [inline]
       SyS_unshare+0x653/0xfa0 kernel/fork.c:2297
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x4512c9
      RSP: 002b:00007f21e0428c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000110
      RAX: ffffffffffffffda RBX: 0000000000718150 RCX: 00000000004512c9
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000062020200
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b973d
      R13: 00000000ffffffff R14: 000000002001d000 R15: 00000000000002dd
      Code: 50 2b 34 82 c7 00 f1 f1 f1 f1 c7 40 04 04 f2 f2 f2 c7 40 08 f3 f3
      f3 f3 e8 a1 43 39 ff 4c 89 f8 48 8b 95 70 ff ff ff 48 c1 e8 03 <0f> b6
      0c 18 4c 89 f8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f 85
      RIP: __read_once_size include/linux/compiler.h:250 [inline] RSP:
      ffff8801c6e5f1b0
      RIP: atomic_read arch/x86/include/asm/atomic.h:26 [inline] RSP:
      ffff8801c6e5f1b0
      RIP: refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178 RSP:
      ffff8801c6e5f1b0
      ---[ end trace e441d046c6410d31 ]---
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      12d94a80
  4. 15 Aug, 2017 16 commits
    • Jan Kara's avatar
      audit: Receive unmount event · b5fed474
      Jan Kara authored
      Although audit_watch_handle_event() can handle FS_UNMOUNT event, it is
      not part of AUDIT_FS_WATCH mask and thus such event never gets to
      audit_watch_handle_event(). Thus fsnotify marks are deleted by fsnotify
      subsystem on unmount without audit being notified about that which leads
      to a strange state of existing audit rules with dead fsnotify marks.
      
      Add FS_UNMOUNT to the mask of events to be received so that audit can
      clean up its state accordingly.
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarPaul Moore <paul@paul-moore.com>
      b5fed474
    • Jan Kara's avatar
      audit: Fix use after free in audit_remove_watch_rule() · d76036ab
      Jan Kara authored
      audit_remove_watch_rule() drops watch's reference to parent but then
      continues to work with it. That is not safe as parent can get freed once
      we drop our reference. The following is a trivial reproducer:
      
      mount -o loop image /mnt
      touch /mnt/file
      auditctl -w /mnt/file -p wax
      umount /mnt
      auditctl -D
      <crash in fsnotify_destroy_mark()>
      
      Grab our own reference in audit_remove_watch_rule() earlier to make sure
      mark does not get freed under us.
      
      CC: stable@vger.kernel.org
      Reported-by: default avatarTony Jones <tonyj@suse.de>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Tested-by: default avatarTony Jones <tonyj@suse.de>
      Signed-off-by: default avatarPaul Moore <paul@paul-moore.com>
      d76036ab
    • Linus Torvalds's avatar
      Merge tag 'linux-kselftest-4.13-rc6-fixes' of... · 40c6d1b9
      Linus Torvalds authored
      Merge tag 'linux-kselftest-4.13-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull kselftest fixes from Shuah Khan:
       "This update consists of important compile and run-time error fixes to
        timers/freq-step, kmod, and sysctl tests"
      
      * tag 'linux-kselftest-4.13-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests: timers: freq-step: fix compile error
        selftests: futex: fix run_tests target
        test_sysctl: fix sysctl.sh by making it executable
        test_kmod: fix kmod.sh by making it executable
      40c6d1b9
    • David S. Miller's avatar
      Merge tag 'wireless-drivers-for-davem-2017-08-15' of... · 0a6f0418
      David S. Miller authored
      Merge tag 'wireless-drivers-for-davem-2017-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for 4.13
      
      This time quite a few fixes for iwlwifi and one major regression fix
      for brcmfmac. For the iwlwifi aggregation bug a small change was
      needed for mac80211, but as Johannes is still away the mac80211 patch
      is taken via wireless-drivers tree.
      
      brcmfmac
      
      * fix firmware crash (a recent regression in bcm4343{0,1,8}
      
      iwlwifi
      
      * Some simple PCI HW ID fix-ups and additions for family 9000
      
      * Remove a bogus warning message with new FWs (bug #196915)
      
      * Don't allow illegal channel options to be used (bug #195299)
      
      * A fix for checksum offload in family 9000
      
      * A fix serious throughput degradation in 11ac with multiple streams
      
      * An old bug in SMPS where the firmware was not aware of SMPS changes
      
      * Fix a memory leak in the SAR code
      
      * Fix a stuck queue case in AP mode;
      
      * Convert a WARN to a simple debug in a legitimate race case (from
        which we can recover)
      
      * Fix a severe throughput aggregation on 9000-family devices due to
        aggregation issues, needed a small change in mac80211
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0a6f0418
    • Eric Dumazet's avatar
      tcp: fix possible deadlock in TCP stack vs BPF filter · d624d276
      Eric Dumazet authored
      Filtering the ACK packet was not put at the right place.
      
      At this place, we already allocated a child and put it
      into accept queue.
      
      We absolutely need to call tcp_child_process() to release
      its spinlock, or we will deadlock at accept() or close() time.
      
      Found by syzkaller team (Thanks a lot !)
      
      Fixes: 8fac365f ("tcp: Add a tcp_filter hook before handle ack packet")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Chenbo Feng <fengc@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d624d276
    • Eric Dumazet's avatar
      dccp: purge write queue in dccp_destroy_sock() · 7749d4ff
      Eric Dumazet authored
      syzkaller reported that DCCP could have a non empty
      write queue at dismantle time.
      
      WARNING: CPU: 1 PID: 2953 at net/core/stream.c:199 sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 1 PID: 2953 Comm: syz-executor0 Not tainted 4.13.0-rc4+ #2
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       panic+0x1e4/0x417 kernel/panic.c:180
       __warn+0x1c4/0x1d9 kernel/panic.c:541
       report_bug+0x211/0x2d0 lib/bug.c:183
       fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
       do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
       do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
       do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
       do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
       invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846
      RIP: 0010:sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199
      RSP: 0018:ffff8801d182f108 EFLAGS: 00010297
      RAX: ffff8801d1144140 RBX: ffff8801d13cb280 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff85137b00 RDI: ffff8801d13cb280
      RBP: ffff8801d182f148 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801d13cb4d0
      R13: ffff8801d13cb3b8 R14: ffff8801d13cb300 R15: ffff8801d13cb3b8
       inet_csk_destroy_sock+0x175/0x3f0 net/ipv4/inet_connection_sock.c:835
       dccp_close+0x84d/0xc10 net/dccp/proto.c:1067
       inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
       sock_release+0x8d/0x1e0 net/socket.c:597
       sock_close+0x16/0x20 net/socket.c:1126
       __fput+0x327/0x7e0 fs/file_table.c:210
       ____fput+0x15/0x20 fs/file_table.c:246
       task_work_run+0x18a/0x260 kernel/task_work.c:116
       exit_task_work include/linux/task_work.h:21 [inline]
       do_exit+0xa32/0x1b10 kernel/exit.c:865
       do_group_exit+0x149/0x400 kernel/exit.c:969
       get_signal+0x7e8/0x17e0 kernel/signal.c:2330
       do_signal+0x94/0x1ee0 arch/x86/kernel/signal.c:808
       exit_to_usermode_loop+0x21c/0x2d0 arch/x86/entry/common.c:157
       prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
       syscall_return_slowpath+0x3a7/0x450 arch/x86/entry/common.c:263
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7749d4ff
    • Al Viro's avatar
      udp: fix linear skb reception with PEEK_OFF · 42b73059
      Al Viro authored
      copy_linear_skb() is broken; both of its callers actually
      expect 'len' to be the amount we are trying to copy,
      not the offset of the end.
      Fix it keeping the meanings of arguments in sync with what the
      callers (both of them) expect.
      Also restore a saner behavior on EFAULT (i.e. preserving
      the iov_iter position in case of failure):
      
      The commit fd851ba9 ("udp: harden copy_linear_skb()")
      avoids the more destructive effect of the buggy
      copy_linear_skb(), e.g. no more invalid memory access, but
      said function still behaves incorrectly: when peeking with
      offset it can fail with EINVAL instead of copying the
      appropriate amount of memory.
      Reported-by: default avatarSasha Levin <alexander.levin@verizon.com>
      Fixes: b65ac446 ("udp: try to avoid 2 cache miss on dequeue")
      Fixes: fd851ba9 ("udp: harden copy_linear_skb()")
      Signed-off-by: default avatarAl Viro <viro@ZenIV.linux.org.uk>
      Acked-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Tested-by: default avatarSasha Levin <alexander.levin@verizon.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      42b73059
    • Wei Wang's avatar
      ipv6: release rt6->rt6i_idev properly during ifdown · e5645f51
      Wei Wang authored
      When a dst is created by addrconf_dst_alloc() for a host route or an
      anycast route, dst->dev points to loopback dev while rt6->rt6i_idev
      points to a real device.
      When the real device goes down, the current cleanup code only checks for
      dst->dev and assumes rt6->rt6i_idev->dev is the same. This causes the
      refcount leak on the real device in the above situation.
      This patch makes sure to always release the refcount taken on
      rt6->rt6i_idev during dst_dev_put().
      
      Fixes: 587fea74 ("ipv6: mark DST_NOGC and remove the operation of
      dst_free()")
      Reported-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Tested-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Tested-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Signed-off-by: default avatarMartin KaFai Lau <kafai@fb.com>
      Acked-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e5645f51
    • Eric Dumazet's avatar
      af_key: do not use GFP_KERNEL in atomic contexts · 36f41f8f
      Eric Dumazet authored
      pfkey_broadcast() might be called from non process contexts,
      we can not use GFP_KERNEL in these cases [1].
      
      This patch partially reverts commit ba51b6be ("net: Fix RCU splat in
      af_key"), only keeping the GFP_ATOMIC forcing under rcu_read_lock()
      section.
      
      [1] : syzkaller reported :
      
      in_atomic(): 1, irqs_disabled(): 0, pid: 2932, name: syzkaller183439
      3 locks held by syzkaller183439/2932:
       #0:  (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [<ffffffff83b43888>] pfkey_sendmsg+0x4c8/0x9f0 net/key/af_key.c:3649
       #1:  (&pfk->dump_lock){+.+.+.}, at: [<ffffffff83b467f6>] pfkey_do_dump+0x76/0x3f0 net/key/af_key.c:293
       #2:  (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] spin_lock_bh include/linux/spinlock.h:304 [inline]
       #2:  (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] xfrm_policy_walk+0x192/0xa30 net/xfrm/xfrm_policy.c:1028
      CPU: 0 PID: 2932 Comm: syzkaller183439 Not tainted 4.13.0-rc4+ #24
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       ___might_sleep+0x2b2/0x470 kernel/sched/core.c:5994
       __might_sleep+0x95/0x190 kernel/sched/core.c:5947
       slab_pre_alloc_hook mm/slab.h:416 [inline]
       slab_alloc mm/slab.c:3383 [inline]
       kmem_cache_alloc+0x24b/0x6e0 mm/slab.c:3559
       skb_clone+0x1a0/0x400 net/core/skbuff.c:1037
       pfkey_broadcast_one+0x4b2/0x6f0 net/key/af_key.c:207
       pfkey_broadcast+0x4ba/0x770 net/key/af_key.c:281
       dump_sp+0x3d6/0x500 net/key/af_key.c:2685
       xfrm_policy_walk+0x2f1/0xa30 net/xfrm/xfrm_policy.c:1042
       pfkey_dump_sp+0x42/0x50 net/key/af_key.c:2695
       pfkey_do_dump+0xaa/0x3f0 net/key/af_key.c:299
       pfkey_spddump+0x1a0/0x210 net/key/af_key.c:2722
       pfkey_process+0x606/0x710 net/key/af_key.c:2814
       pfkey_sendmsg+0x4d6/0x9f0 net/key/af_key.c:3650
      sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       ___sys_sendmsg+0x755/0x890 net/socket.c:2035
       __sys_sendmsg+0xe5/0x210 net/socket.c:2069
       SYSC_sendmsg net/socket.c:2080 [inline]
       SyS_sendmsg+0x2d/0x50 net/socket.c:2076
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x445d79
      RSP: 002b:00007f32447c1dc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000445d79
      RDX: 0000000000000000 RSI: 000000002023dfc8 RDI: 0000000000000008
      RBP: 0000000000000086 R08: 00007f32447c2700 R09: 00007f32447c2700
      R10: 00007f32447c2700 R11: 0000000000000202 R12: 0000000000000000
      R13: 00007ffe33edec4f R14: 00007f32447c29c0 R15: 0000000000000000
      
      Fixes: ba51b6be ("net: Fix RCU splat in af_key")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: David Ahern <dsa@cumulusnetworks.com>
      Acked-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      36f41f8f
    • Sabrina Dubroca's avatar
      tcp: ulp: avoid module refcnt leak in tcp_set_ulp · 539a06ba
      Sabrina Dubroca authored
      __tcp_ulp_find_autoload returns tcp_ulp_ops after taking a reference on
      the module. Then, if ->init fails, tcp_set_ulp propagates the error but
      nothing releases that reference.
      
      Fixes: 734942cc ("tcp: ULP infrastructure")
      Signed-off-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      539a06ba
    • David S. Miller's avatar
      Merge branch 'Add-new-PCI_DEV_FLAGS_NO_RELAXED_ORDERING-flag' · bae514a6
      David S. Miller authored
      Ding Tianhong says:
      
      ====================
      Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
      
      Some devices have problems with Transaction Layer Packets with the Relaxed
      Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
      PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
      devices with Relaxed Ordering issues, and a use of this new flag by the
      cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
      Ports.
      
      It's been years since I've submitted kernel.org patches, I appolgise for the
      almost certain submission errors.
      
      v2: Alexander point out that the v1 was only a part of the whole solution,
          some platform which has some issues could use the new flag to indicate
          that it is not safe to enable relaxed ordering attribute, then we need
          to clear the relaxed ordering enable bits in the PCI configuration when
          initializing the device. So add a new second patch to modify the PCI
          initialization code to clear the relaxed ordering enable bit in the
          event that the root complex doesn't want relaxed ordering enabled.
      
          The third patch was base on the v1's second patch and only be changed
          to query the relaxed ordering enable bit in the PCI configuration space
          to allow the Chelsio NIC to send TLPs with the relaxed ordering attributes
          set.
      
          This version didn't plan to drop the defines for Intel Drivers to use the
          new checking way to enable relaxed ordering because it is not the hardest
          part of the moment, we could fix it in next patchset when this patches
          reach the goal.
      
      v3: Redesigned the logic for pci_configure_relaxed_ordering when configuration,
          If a PCIe device didn't enable the relaxed ordering attribute default,
          we should not do anything in the PCIe configuration, otherwise we
          should check if any of the devices above us do not support relaxed
          ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on
          the result if we get a return that indicate that the relaxed ordering
          is not supported we should update our device to disable relaxed ordering
          in configuration space. If the device above us doesn't exist or isn't
          the PCIe device, we shouldn't do anything and skip updating relaxed ordering
          because we are probably running in a guest.
      
      v4: Rename the functions pcie_get_relaxed_ordering and pcie_disable_relaxed_ordering
          according John's suggestion, and modify the description, use the true/false
          as the return value.
      
          We shouldn't enable relaxed ordering attribute by the setting in the root
          complex configuration space for PCIe device, so fix it for cxgb4.
      
          Fix some format issues.
      
      v5: Removed the unnecessary code for some function which only return the bool
          value, and add the check for VF device.
      
          Make this patch set base on 4.12-rc5.
      
      v6: Fix the logic error in the need to enable the relaxed ordering attribute for cxgb4.
      
      v7: The cxgb4 drivers will enable the PCIe Capability Device Control[Relaxed
          Ordering Enable] in PCI Probe() routine, this will break our current
          solution for some platform which has problematic when enable the relaxed
          ordering attribute. According to the latest recommendations, remove the
          enable_pcie_relaxed_ordering(), although it could not cover the Peer-to-Peer
          scene, but we agree to leave this problem until we really trigger it.
      
          Make this patch set base on 4.12 release version.
      
      v8: Change the second patch title and description to make it more reasonable,
          add the acked-by from Alex and Ashok.
      
          Add a new patch to enable the Relaxed Ordering Attribute for cxgb4vf driver.
      
          Make this patch set base on 4.13-rc2.
      
      v9: The document (https://software.intel.com/sites/default/files/managed/9e/
          bc/64-ia-32-architectures-optimization-manual.pdf) indicate that the Xeon
          processors based on Broadwell/Haswell microarchitecture has the problem
          with Relaxed Ordering Attribute enabled, so add the whole list Device ID
          from Intel to the patch.
      
      v10: Significant rework based on Bjorn's feedback, reorganize the first 2 patches,
           now the Intel and AMD erratum soc has been divided to the different patches,
           rename the pcie_relaxed_ordering_supported() to pcie_relaxed_ordering_enabled(),
           and no need to check every intervening switch except the root ports, update
           some commits.
      
      v11: We shouldn't let the Intel engineer to acked the AMD's erratum patch, fix the
           funny mistake.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      bae514a6
    • Casey Leedom's avatar
      net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag · b629276d
      Casey Leedom authored
      cxgb4vf Ethernet driver now queries PCIe configuration space to
      determine if it can send TLPs to it with the Relaxed Ordering
      Attribute set, just like the pf did.
      Signed-off-by: default avatarCasey Leedom <leedom@chelsio.com>
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Reviewed-by: default avatarCasey Leedom <leedom@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b629276d
    • Casey Leedom's avatar
      net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag · b0ba9d5f
      Casey Leedom authored
      cxgb4 Ethernet driver now queries PCIe configuration space to determine
      if it can send TLPs to it with the Relaxed Ordering Attribute set.
      
      Remove the enable_pcie_relaxed_ordering() to avoid enable PCIe Capability
      Device Control[Relaxed Ordering Enable] at probe routine, to make sure
      the driver will not send the Relaxed Ordering TLPs to the Root Complex which
      could not deal the Relaxed Ordering TLPs.
      Signed-off-by: default avatarCasey Leedom <leedom@chelsio.com>
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Reviewed-by: default avatarCasey Leedom <leedom@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b0ba9d5f
    • dingtianhong's avatar
      PCI: Disable Relaxed Ordering Attributes for AMD A1100 · 077fa19c
      dingtianhong authored
      Casey reported that the AMD ARM A1100 SoC has a bug in its PCIe
      Root Port where Upstream Transaction Layer Packets with the Relaxed
      Ordering Attribute clear are allowed to bypass earlier TLPs with
      Relaxed Ordering set, it would cause Data Corruption, so we need
      to disable Relaxed Ordering Attribute when Upstream TLPs to the
      Root Port.
      Reported-and-suggested-by: default avatarCasey Leedom <leedom@chelsio.com>
      Signed-off-by: default avatarCasey Leedom <leedom@chelsio.com>
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Acked-by: default avatarCasey Leedom <leedom@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      077fa19c
    • dingtianhong's avatar
      PCI: Disable Relaxed Ordering for some Intel processors · 87e09cde
      dingtianhong authored
      According to the Intel spec section 3.9.1 said:
      
          3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory
                and Toward MMIO Regions (P2P)
      
          In order to maximize performance for PCIe devices in the processors
          listed in Table 3-6 below, the soft- ware should determine whether the
          accesses are toward coherent memory (system memory) or toward MMIO
          regions (P2P access to other devices). If the access is toward MMIO
          region, then software can command HW to set the RO bit in the TLP
          header, as this would allow hardware to achieve maximum throughput for
          these types of accesses. For accesses toward coherent memory, software
          can command HW to clear the RO bit in the TLP header (no RO), as this
          would allow hardware to achieve maximum throughput for these types of
          accesses.
      
          Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing
                     PCIe Performance
      
          Processor                            CPU RP Device IDs
      
          Intel Xeon processors based on       6F01H-6F0EH
          Broadwell microarchitecture
      
          Intel Xeon processors based on       2F01H-2F0EH
          Haswell microarchitecture
      
      It means some Intel processors has performance issue when use the Relaxed
      Ordering Attribute, so disable Relaxed Ordering for these root port.
      Signed-off-by: default avatarCasey Leedom <leedom@chelsio.com>
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Acked-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: default avatarAshok Raj <ashok.raj@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      87e09cde
    • dingtianhong's avatar
      PCI: Disable PCIe Relaxed Ordering if unsupported · a99b646a
      dingtianhong authored
      When bit4 is set in the PCIe Device Control register, it indicates
      whether the device is permitted to use relaxed ordering.
      On some platforms using relaxed ordering can have performance issues or
      due to erratum can cause data-corruption. In such cases devices must avoid
      using relaxed ordering.
      
      The patch adds a new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING to indicate that
      Relaxed Ordering (RO) attribute should not be used for Transaction Layer
      Packets (TLP) targeted towards these affected root complexes.
      
      This patch checks if there is any node in the hierarchy that indicates that
      using relaxed ordering is not safe. In such cases the patch turns off the
      relaxed ordering by clearing the capability for this device.
      Signed-off-by: default avatarCasey Leedom <leedom@chelsio.com>
      Signed-off-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Acked-by: default avatarAshok Raj <ashok.raj@intel.com>
      Acked-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: default avatarCasey Leedom <leedom@chelsio.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a99b646a
  5. 14 Aug, 2017 7 commits
    • Linus Torvalds's avatar
      Merge tag 'md/4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md · fcd07350
      Linus Torvalds authored
      Pull MD fixes from Shaohua Li:
       "Fix several bugs:
      
         - fix a rcu stall issue introduced in 4.12 (Neil Brown)
      
         - fix two raid5 cache race conditions (Song Liu)"
      
      * tag 'md/4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
        MD: not clear ->safemode for external metadata array
        md/r5cache: fix io_unit handling in r5l_log_endio()
        md/r5cache: call mddev_lock/unlock() in r5c_journal_mode_set
        md: fix test in md_write_start()
        md: always clear ->safemode when md_check_recovery gets the mddev lock.
      fcd07350
    • Linus Torvalds's avatar
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 6b9d1c24
      Linus Torvalds authored
      Pull crypto fixes from Herbert Xu:
       "Fix an error path bug in ixp4xx as well as a read overrun in
       sha1-avx2"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: x86/sha1 - Fix reads beyond the number of blocks passed
        crypto: ixp4xx - Fix error handling path in 'aead_perform()'
      6b9d1c24
    • Jon Paul Maloy's avatar
      tipc: avoid inheriting msg_non_seq flag when message is returned · 59a361bc
      Jon Paul Maloy authored
      In the function msg_reverse(), we reverse the header while trying to
      reuse the original buffer whenever possible. Those rejected/returned
      messages are always transmitted as unicast, but the msg_non_seq field
      is not explicitly set to zero as it should be.
      
      We have seen cases where multicast senders set the message type to
      "NOT dest_droppable", meaning that a multicast message shorter than
      one MTU will be returned, e.g., during receive buffer overflow, by
      reusing the original buffer. This has the effect that even the
      'msg_non_seq' field is inadvertently inherited by the rejected message,
      although it is now sent as a unicast message. This again leads the
      receiving unicast link endpoint to steer the packet toward the broadcast
      link receive function, where it is dropped. The affected unicast link is
      thereafter (after 100 failed retransmissions) declared 'stale' and
      reset.
      
      We fix this by unconditionally setting the 'msg_non_seq' flag to zero
      for all rejected/returned messages.
      Reported-by: default avatarCanh Duc Luu <canh.d.luu@dektech.com.au>
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      59a361bc
    • Jon Paul Maloy's avatar
      tipc: accept PACKET_MULTICAST packets · fed5f571
      Jon Paul Maloy authored
      On L2 bearers, the TIPC broadcast function is sending out packets using
      the corresponding L2 broadcast address. At reception, we filter such
      packets under the assumption that they will also be delivered as
      broadcast packets.
      
      This assumption doesn't always hold true. Under high load, we have seen
      that a switch may convert the destination address and deliver the packet
      as a PACKET_MULTICAST, something leading to inadvertently dropped
      packets and a stale and reset broadcast link.
      
      We fix this by extending the reception filtering to accept packets of
      type PACKET_MULTICAST.
      Signed-off-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fed5f571
    • Florian Westphal's avatar
      ipv4: route: fix inet_rtm_getroute induced crash · 2c87d63a
      Florian Westphal authored
      "ip route get $daddr iif eth0 from $saddr" causes:
       BUG: KASAN: use-after-free in ip_route_input_rcu+0x1535/0x1b50
       Call Trace:
        ip_route_input_rcu+0x1535/0x1b50
        ip_route_input_noref+0xf9/0x190
        tcp_v4_early_demux+0x1a4/0x2b0
        ip_rcv+0xbcb/0xc05
        __netif_receive_skb+0x9c/0xd0
        netif_receive_skb_internal+0x5a8/0x890
      
      Problem is that inet_rtm_getroute calls either ip_route_input_rcu (if an
      iif was provided) or ip_route_output_key_hash_rcu.
      
      But ip_route_input_rcu, unlike ip_route_output_key_hash_rcu, already
      associates the dst_entry with the skb.  This clears the SKB_DST_NOREF
      bit (i.e. skb_dst_drop will release/free the entry while it should not).
      
      Thus only set the dst if we called ip_route_output_key_hash_rcu().
      
      I tested this patch by running:
       while true;do ip r get 10.0.1.2;done > /dev/null &
       while true;do ip r get 10.0.1.2 iif eth0  from 10.0.1.1;done > /dev/null &
      ... and saw no crash or memory leak.
      
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Cc: David Ahern <dsahern@gmail.com>
      Fixes: ba52d61e ("ipv4: route: restore skb_dst_set in inet_rtm_getroute")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2c87d63a
    • Arend Van Spriel's avatar
      brcmfmac: feature check for multi-scheduled scan fails on bcm4343x devices · e9bf53ab
      Arend Van Spriel authored
      The firmware feature check introduced for multi-scheduled scan turned out
      to be failing for bcm4343{0,1,8} devices resulting in a firmware crash.
      The reason for this crash has not yet been root cause so this patch avoids
      the feature check for those device as a short-term fix.
      Reported-by: default avatarStefan Wahren <stefan.wahren@i2se.com>
      Reported-by: default avatarIan Molton <ian@mnementh.co.uk>
      Fixes: 9fe929aa ("brcmfmac: add firmware feature detection for gscan feature")
      Signed-off-by: default avatarArend van Spriel <arend.vanspriel@broadcom.com>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      e9bf53ab
    • Andreas Born's avatar
      bonding: ratelimit failed speed/duplex update warning · 11e9d782
      Andreas Born authored
      bond_miimon_commit() handles the UP transition for each slave of a bond
      in the case of MII. It is triggered 10 times per second for the default
      MII Polling interval of 100ms. For device drivers that do not implement
      __ethtool_get_link_ksettings() the call to bond_update_speed_duplex()
      fails persistently while the MII status could remain UP. That is, in
      this and other cases where the speed/duplex update keeps failing over a
      longer period of time while the MII state is UP, a warning is printed
      every MII polling interval.
      
      To address these excessive warnings net_ratelimit() should be used.
      Printing a warning once would not be sufficient since the call to
      bond_update_speed_duplex() could recover to succeed and fail again
      later. In that case there would be no new indication what went wrong.
      
      Fixes: b5bf0f5b (bonding: correctly update link status during mii-commit phase)
      Signed-off-by: default avatarAndreas Born <futur.andy@googlemail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      11e9d782
  6. 13 Aug, 2017 1 commit