1. 28 Jun, 2019 3 commits
    • Linus Torvalds's avatar
      Merge tag 'afs-fixes-20190620' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · cd0f3aae
      Linus Torvalds authored
      Pull AFS fixes from David Howells:
       "The in-kernel AFS client has been undergoing testing on opendev.org on
        one of their mirror machines. They are using AFS to hold data that is
        then served via apache, and Ian Wienand had reported seeing oopses,
        spontaneous machine reboots and updates to volumes going missing. This
        patch series appears to have fixed the problem, very probably due to
        patch (2), but it's not 100% certain.
      
        (1) Fix the printing of the "vnode modified" warning to exclude checks
            on files for which we don't have a callback promise from the
            server (and so don't expect the server to tell us when it
            changes).
      
            Without this, for every file or directory for which we still have
            an in-core inode that gets changed on the server, we may get a
            message logged when we next look at it. This can happen in bulk
            if, for instance, someone does "vos release" to update a R/O
            volume from a R/W volume and a whole set of files are all changed
            together.
      
            We only really want to log a message if the file changed and the
            server didn't tell us about it or we failed to track the state
            internally.
      
        (2) Fix accidental corruption of either afs_vlserver struct objects or
            the the following memory locations (which could hold anything).
            The issue is caused by a union that points to two different
            structs in struct afs_call (to save space in the struct). The call
            cleanup code assumes that it can simply call the cleanup for one
            of those structs if not NULL - when it might be actually pointing
            to the other struct.
      
            This means that every Volume Location RPC op is going to corrupt
            something.
      
        (3) Fix an uninitialised spinlock. This isn't too bad, it just causes
            a one-off warning if lockdep is enabled when "vos release" is
            called, but the spinlock still behaves correctly.
      
        (4) Fix the setting of i_block in the inode. This causes du, for
            example, to produce incorrect results, but otherwise should not be
            dangerous to the kernel"
      
      * tag 'afs-fixes-20190620' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        afs: Fix setting of i_blocks
        afs: Fix uninitialised spinlock afs_volume::cb_break_lock
        afs: Fix vlserver record corruption
        afs: Fix over zealous "vnode modified" warnings
      cd0f3aae
    • Linus Torvalds's avatar
      Merge tag 'csky-for-linus-5.2-fixup-gcc-unwind' of git://github.com/c-sky/csky-linux · 139ca258
      Linus Torvalds authored
      Pull arch/csky fixup from Guo Ren:
       "A fixup patch for rt_sigframe in signal.c"
      
      * tag 'csky-for-linus-5.2-fixup-gcc-unwind' of git://github.com/c-sky/csky-linux:
        csky: Fixup libgcc unwind error
      139ca258
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · c84afab0
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix ppp_mppe crypto soft dependencies, from Takashi Iawi.
      
       2) Fix TX completion to be finite, from Sergej Benilov.
      
       3) Use register_pernet_device to avoid a dst leak in tipc, from Xin
          Long.
      
       4) Double free of TX cleanup in Dirk van der Merwe.
      
       5) Memory leak in packet_set_ring(), from Eric Dumazet.
      
       6) Out of bounds read in qmi_wwan, from Bjørn Mork.
      
       7) Fix iif used in mcast/bcast looped back packets, from Stephen
          Suryaputra.
      
       8) Fix neighbour resolution on raw ipv6 sockets, from Nicolas Dichtel.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (25 commits)
        af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET
        sctp: change to hold sk after auth shkey is created successfully
        ipv6: fix neighbour resolution with raw socket
        ipv6: constify rt6_nexthop()
        net: dsa: microchip: Use gpiod_set_value_cansleep()
        net: aquantia: fix vlans not working over bridged network
        ipv4: reset rt_iif for recirculated mcast/bcast out pkts
        team: Always enable vlan tx offload
        net/smc: Fix error path in smc_init
        net/smc: hold conns_lock before calling smc_lgr_register_conn()
        bonding: Always enable vlan tx offload
        net/ipv6: Fix misuse of proc_dointvec "skip_notify_on_dev_down"
        ipv4: Use return value of inet_iif() for __raw_v4_lookup in the while loop
        qmi_wwan: Fix out-of-bounds read
        tipc: check msg->req data len in tipc_nl_compat_bearer_disable
        net: macb: do not copy the mac address if NULL
        net/packet: fix memory leak in packet_set_ring()
        net/tls: fix page double free on TX cleanup
        net/sched: cbs: Fix error path of cbs_module_init
        tipc: change to use register_pernet_device
        ...
      c84afab0
  2. 27 Jun, 2019 2 commits
    • Neil Horman's avatar
      af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET · 89ed5b51
      Neil Horman authored
      When an application is run that:
      a) Sets its scheduler to be SCHED_FIFO
      and
      b) Opens a memory mapped AF_PACKET socket, and sends frames with the
      MSG_DONTWAIT flag cleared, its possible for the application to hang
      forever in the kernel.  This occurs because when waiting, the code in
      tpacket_snd calls schedule, which under normal circumstances allows
      other tasks to run, including ksoftirqd, which in some cases is
      responsible for freeing the transmitted skb (which in AF_PACKET calls a
      destructor that flips the status bit of the transmitted frame back to
      available, allowing the transmitting task to complete).
      
      However, when the calling application is SCHED_FIFO, its priority is
      such that the schedule call immediately places the task back on the cpu,
      preventing ksoftirqd from freeing the skb, which in turn prevents the
      transmitting task from detecting that the transmission is complete.
      
      We can fix this by converting the schedule call to a completion
      mechanism.  By using a completion queue, we force the calling task, when
      it detects there are no more frames to send, to schedule itself off the
      cpu until such time as the last transmitted skb is freed, allowing
      forward progress to be made.
      
      Tested by myself and the reporter, with good results
      
      Change Notes:
      
      V1->V2:
      	Enhance the sleep logic to support being interruptible and
      allowing for honoring to SK_SNDTIMEO (Willem de Bruijn)
      
      V2->V3:
      	Rearrage the point at which we wait for the completion queue, to
      avoid needing to check for ph/skb being null at the end of the loop.
      Also move the complete call to the skb destructor to avoid needing to
      modify __packet_set_status.  Also gate calling complete on
      packet_read_pending returning zero to avoid multiple calls to complete.
      (Willem de Bruijn)
      
      	Move timeo computation within loop, to re-fetch the socket
      timeout since we also use the timeo variable to record the return code
      from the wait_for_complete call (Neil Horman)
      
      V3->V4:
      	Willem has requested that the control flow be restored to the
      previous state.  Doing so lets us eliminate the need for the
      po->wait_on_complete flag variable, and lets us get rid of the
      packet_next_frame function, but introduces another complexity.
      Specifically, but using the packet pending count, we can, if an
      applications calls sendmsg multiple times with MSG_DONTWAIT set, each
      set of transmitted frames, when complete, will cause
      tpacket_destruct_skb to issue a complete call, for which there will
      never be a wait_on_completion call.  This imbalance will lead to any
      future call to wait_for_completion here to return early, when the frames
      they sent may not have completed.  To correct this, we need to re-init
      the completion queue on every call to tpacket_snd before we enter the
      loop so as to ensure we wait properly for the frames we send in this
      iteration.
      
      	Change the timeout and interrupted gotos to out_put rather than
      out_status so that we don't try to free a non-existant skb
      	Clean up some extra newlines (Willem de Bruijn)
      Reviewed-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Reported-by: default avatarMatteo Croce <mcroce@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      89ed5b51
    • Xin Long's avatar
      sctp: change to hold sk after auth shkey is created successfully · 25bff6d5
      Xin Long authored
      Now in sctp_endpoint_init(), it holds the sk then creates auth
      shkey. But when the creation fails, it doesn't release the sk,
      which causes a sk defcnf leak,
      
      Here to fix it by only holding the sk when auth shkey is created
      successfully.
      
      Fixes: a29a5bd4 ("[SCTP]: Implement SCTP-AUTH initializations.")
      Reported-by: syzbot+afabda3890cc2f765041@syzkaller.appspotmail.com
      Reported-by: syzbot+276ca1c77a19977c0130@syzkaller.appspotmail.com
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      25bff6d5
  3. 26 Jun, 2019 12 commits
  4. 25 Jun, 2019 2 commits
  5. 24 Jun, 2019 11 commits
    • Linus Torvalds's avatar
      Merge branch 'parisc-5.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · 249155c2
      Linus Torvalds authored
      Pull parisc fix from Helge Deller:
       "Add missing PCREL64 relocation in module loader to fix module load
        errors when the static branch and JUMP_LABEL feature is enabled on
        a 64-bit kernel"
      
      * 'parisc-5.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Fix module loading error with JUMP_LABEL feature
      249155c2
    • Linus Torvalds's avatar
      Merge tag 'mfd-fixes-5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd · c88e40e0
      Linus Torvalds authored
      Pull mfd bugfix from Lee Jones.
      
      Fix stmfx type confusion between regmap_read() (which takes an "u32")
      and the bitmap operations (which take an "unsigned long" array).
      
      * tag 'mfd-fixes-5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
        mfd: stmfx: Fix an endian bug in stmfx_irq_handler()
        mfd: stmfx: Uninitialized variable in stmfx_irq_handler()
      c88e40e0
    • Bjørn Mork's avatar
      qmi_wwan: Fix out-of-bounds read · 904d88d7
      Bjørn Mork authored
      The syzbot reported
      
       Call Trace:
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0xca/0x13e lib/dump_stack.c:113
        print_address_description+0x67/0x231 mm/kasan/report.c:188
        __kasan_report.cold+0x1a/0x32 mm/kasan/report.c:317
        kasan_report+0xe/0x20 mm/kasan/common.c:614
        qmi_wwan_probe+0x342/0x360 drivers/net/usb/qmi_wwan.c:1417
        usb_probe_interface+0x305/0x7a0 drivers/usb/core/driver.c:361
        really_probe+0x281/0x660 drivers/base/dd.c:509
        driver_probe_device+0x104/0x210 drivers/base/dd.c:670
        __device_attach_driver+0x1c2/0x220 drivers/base/dd.c:777
        bus_for_each_drv+0x15c/0x1e0 drivers/base/bus.c:454
      
      Caused by too many confusing indirections and casts.
      id->driver_info is a pointer stored in a long.  We want the
      pointer here, not the address of it.
      
      Thanks-to: Hillf Danton <hdanton@sina.com>
      Reported-by: syzbot+b68605d7fadd21510de1@syzkaller.appspotmail.com
      Cc: Kristian Evensen <kristian.evensen@gmail.com>
      Fixes: e4bf6348 ("qmi_wwan: Add quirk for Quectel dynamic config")
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      904d88d7
    • Xin Long's avatar
      tipc: check msg->req data len in tipc_nl_compat_bearer_disable · 4f07b80c
      Xin Long authored
      This patch is to fix an uninit-value issue, reported by syzbot:
      
        BUG: KMSAN: uninit-value in memchr+0xce/0x110 lib/string.c:981
        Call Trace:
          __dump_stack lib/dump_stack.c:77 [inline]
          dump_stack+0x191/0x1f0 lib/dump_stack.c:113
          kmsan_report+0x130/0x2a0 mm/kmsan/kmsan.c:622
          __msan_warning+0x75/0xe0 mm/kmsan/kmsan_instr.c:310
          memchr+0xce/0x110 lib/string.c:981
          string_is_valid net/tipc/netlink_compat.c:176 [inline]
          tipc_nl_compat_bearer_disable+0x2a1/0x480 net/tipc/netlink_compat.c:449
          __tipc_nl_compat_doit net/tipc/netlink_compat.c:327 [inline]
          tipc_nl_compat_doit+0x3ac/0xb00 net/tipc/netlink_compat.c:360
          tipc_nl_compat_handle net/tipc/netlink_compat.c:1178 [inline]
          tipc_nl_compat_recv+0x1b1b/0x27b0 net/tipc/netlink_compat.c:1281
      
      TLV_GET_DATA_LEN() may return a negtive int value, which will be
      used as size_t (becoming a big unsigned long) passed into memchr,
      cause this issue.
      
      Similar to what it does in tipc_nl_compat_bearer_enable(), this
      fix is to return -EINVAL when TLV_GET_DATA_LEN() is negtive in
      tipc_nl_compat_bearer_disable(), as well as in
      tipc_nl_compat_link_stat_dump() and tipc_nl_compat_link_reset_stats().
      
      v1->v2:
        - add the missing Fixes tags per Eric's request.
      
      Fixes: 0762216c ("tipc: fix uninit-value in tipc_nl_compat_bearer_enable")
      Fixes: 8b66fee7 ("tipc: fix uninit-value in tipc_nl_compat_link_reset_stats")
      Reported-by: syzbot+30eaa8bf392f7fafffaf@syzkaller.appspotmail.com
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4f07b80c
    • Antoine Tenart's avatar
      net: macb: do not copy the mac address if NULL · 2bf4ecbc
      Antoine Tenart authored
      This patch fixes the MAC address setup in the probe. The MAC address
      retrieved using of_get_mac_address was checked for not containing an
      error, but it may also be NULL which wasn't tested. Fix it by replacing
      IS_ERR with IS_ERR_OR_NULL.
      
      Fixes: 541ddc66 ("net: macb: support of_get_mac_address new ERR_PTR error")
      Signed-off-by: default avatarAntoine Tenart <antoine.tenart@bootlin.com>
      Acked-by: default avatarNicolas Ferre <nicolas.ferre@microchip.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2bf4ecbc
    • Eric Dumazet's avatar
      net/packet: fix memory leak in packet_set_ring() · 55655e3d
      Eric Dumazet authored
      syzbot found we can leak memory in packet_set_ring(), if user application
      provides buggy parameters.
      
      Fixes: 7f953ab2 ("af_packet: TX_RING support for TPACKET_V3")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      55655e3d
    • Dirk van der Merwe's avatar
      net/tls: fix page double free on TX cleanup · 9354544c
      Dirk van der Merwe authored
      With commit 94850257 ("tls: Fix tls_device handling of partial records")
      a new path was introduced to cleanup partial records during sk_proto_close.
      This path does not handle the SW KTLS tx_list cleanup.
      
      This is unnecessary though since the free_resources calls for both
      SW and offload paths will cleanup a partial record.
      
      The visible effect is the following warning, but this bug also causes
      a page double free.
      
          WARNING: CPU: 7 PID: 4000 at net/core/stream.c:206 sk_stream_kill_queues+0x103/0x110
          RIP: 0010:sk_stream_kill_queues+0x103/0x110
          RSP: 0018:ffffb6df87e07bd0 EFLAGS: 00010206
          RAX: 0000000000000000 RBX: ffff8c21db4971c0 RCX: 0000000000000007
          RDX: ffffffffffffffa0 RSI: 000000000000001d RDI: ffff8c21db497270
          RBP: ffff8c21db497270 R08: ffff8c29f4748600 R09: 000000010020001a
          R10: ffffb6df87e07aa0 R11: ffffffff9a445600 R12: 0000000000000007
          R13: 0000000000000000 R14: ffff8c21f03f2900 R15: ffff8c21f03b8df0
          Call Trace:
           inet_csk_destroy_sock+0x55/0x100
           tcp_close+0x25d/0x400
           ? tcp_check_oom+0x120/0x120
           tls_sk_proto_close+0x127/0x1c0
           inet_release+0x3c/0x60
           __sock_release+0x3d/0xb0
           sock_close+0x11/0x20
           __fput+0xd8/0x210
           task_work_run+0x84/0xa0
           do_exit+0x2dc/0xb90
           ? release_sock+0x43/0x90
           do_group_exit+0x3a/0xa0
           get_signal+0x295/0x720
           do_signal+0x36/0x610
           ? SYSC_recvfrom+0x11d/0x130
           exit_to_usermode_loop+0x69/0xb0
           do_syscall_64+0x173/0x180
           entry_SYSCALL_64_after_hwframe+0x3d/0xa2
          RIP: 0033:0x7fe9b9abc10d
          RSP: 002b:00007fe9b19a1d48 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
          RAX: fffffffffffffe00 RBX: 0000000000000006 RCX: 00007fe9b9abc10d
          RDX: 0000000000000002 RSI: 0000000000000080 RDI: 00007fe948003430
          RBP: 00007fe948003410 R08: 00007fe948003430 R09: 0000000000000000
          R10: 0000000000000000 R11: 0000000000000246 R12: 00005603739d9080
          R13: 00007fe9b9ab9f90 R14: 00007fe948003430 R15: 0000000000000000
      
      Fixes: 94850257 ("tls: Fix tls_device handling of partial records")
      Signed-off-by: default avatarDirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9354544c
    • Dan Carpenter's avatar
      mfd: stmfx: Fix an endian bug in stmfx_irq_handler() · 63b2de12
      Dan Carpenter authored
      It's not okay to cast a "u32 *" to "unsigned long *" when you are
      doing a for_each_set_bit() loop because that will break on big
      endian systems.
      
      Fixes: 386145601b82 ("mfd: stmfx: Uninitialized variable in stmfx_irq_handler()")
      Reported-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Tested-by: default avatarAmelie Delaunay <amelie.delaunay@st.com>
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      63b2de12
    • Linus Torvalds's avatar
      Merge tag 'mtd/fixes-for-5.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux · 39071cf8
      Linus Torvalds authored
      Pull mtd fixes from Miquel Raynal:
      
       - Set the raw NAND number of targets to the right value
      
       - Fix a bug uncovered by a recent patch on Spansion SPI-NOR flashes
      
      * tag 'mtd/fixes-for-5.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
        mtd: spi-nor: use 16-bit WRR command when QE is set on spansion flashes
        mtd: rawnand: initialize ntargets with maxchips
      39071cf8
    • Linus Torvalds's avatar
      Merge tag 'powerpc-5.2-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 26df62aa
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "One fix for a bug in our context id handling on 64-bit hash CPUs,
        which can lead to unrelated processes being able to read/write to each
        other's virtual memory. See the commit for full details.
      
        That is the fix for CVE-2019-12817.
      
        This also adds a kernel selftest for the bug"
      
      * tag 'powerpc-5.2-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        selftests/powerpc: Add test of fork with mapping above 512TB
        powerpc/mm/64s/hash: Reallocate context ids on fork
      26df62aa
    • Linus Torvalds's avatar
      Merge tag 'auxdisplay-for-linus-v5.2-rc7' of git://github.com/ojeda/linux · 92165146
      Linus Torvalds authored
      Pull auxdisplay cleanup from Miguel Ojeda:
       "A cleanup for two drivers in auxdisplay: convert them to use
        vm_map_pages_zero() (Souptick Joarder)"
      
      * tag 'auxdisplay-for-linus-v5.2-rc7' of git://github.com/ojeda/linux:
        auxdisplay/ht16k33.c: Convert to use vm_map_pages_zero()
        auxdisplay/cfag12864bfb.c: Convert to use vm_map_pages_zero()
      92165146
  6. 23 Jun, 2019 2 commits
  7. 22 Jun, 2019 8 commits
    • Xin Long's avatar
      tipc: change to use register_pernet_device · c492d4c7
      Xin Long authored
      This patch is to fix a dst defcnt leak, which can be reproduced by doing:
      
        # ip net a c; ip net a s; modprobe tipc
        # ip net e s ip l a n eth1 type veth peer n eth1 netns c
        # ip net e c ip l s lo up; ip net e c ip l s eth1 up
        # ip net e s ip l s lo up; ip net e s ip l s eth1 up
        # ip net e c ip a a 1.1.1.2/8 dev eth1
        # ip net e s ip a a 1.1.1.1/8 dev eth1
        # ip net e c tipc b e m udp n u1 localip 1.1.1.2
        # ip net e s tipc b e m udp n u1 localip 1.1.1.1
        # ip net d c; ip net d s; rmmod tipc
      
      and it will get stuck and keep logging the error:
      
        unregister_netdevice: waiting for lo to become free. Usage count = 1
      
      The cause is that a dst is held by the udp sock's sk_rx_dst set on udp rx
      path with udp_early_demux == 1, and this dst (eventually holding lo dev)
      can't be released as bearer's removal in tipc pernet .exit happens after
      lo dev's removal, default_device pernet .exit.
      
       "There are two distinct types of pernet_operations recognized: subsys and
        device.  At creation all subsys init functions are called before device
        init functions, and at destruction all device exit functions are called
        before subsys exit function."
      
      So by calling register_pernet_device instead to register tipc_net_ops, the
      pernet .exit() will be invoked earlier than loopback dev's removal when a
      netns is being destroyed, as fou/gue does.
      
      Note that vxlan and geneve udp tunnels don't have this issue, as the udp
      sock is released in their device ndo_stop().
      
      This fix is also necessary for tipc dst_cache, which will hold dsts on tx
      path and I will introduce in my next patch.
      Reported-by: default avatarLi Shuang <shuali@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c492d4c7
    • Sergej Benilov's avatar
      sis900: fix TX completion · 8ac8a010
      Sergej Benilov authored
      Since commit 605ad7f1 "tcp: refine TSO autosizing",
      outbound throughput is dramatically reduced for some connections, as sis900
      is doing TX completion within idle states only.
      
      Make TX completion happen after every transmitted packet.
      
      Test:
      netperf
      
      before patch:
      > netperf -H remote -l -2000000 -- -s 1000000
      MIGRATED TCP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 95.223.112.76 () port 0 AF_INET : demo
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380 327680 327680    253.44      0.06
      
      after patch:
      > netperf -H remote -l -10000000 -- -s 1000000
      MIGRATED TCP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 95.223.112.76 () port 0 AF_INET : demo
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380 327680 327680    5.38       14.89
      
      Thx to Dave Miller and Eric Dumazet for helpful hints
      Signed-off-by: default avatarSergej Benilov <sergej.benilov@googlemail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8ac8a010
    • Roland Hii's avatar
      net: stmmac: set IC bit when transmitting frames with HW timestamp · d0bb82fd
      Roland Hii authored
      When transmitting certain PTP frames, e.g. SYNC and DELAY_REQ, the
      PTP daemon, e.g. ptp4l, is polling the driver for the frame transmit
      hardware timestamp. The polling will most likely timeout if the tx
      coalesce is enabled due to the Interrupt-on-Completion (IC) bit is
      not set in tx descriptor for those frames.
      
      This patch will ignore the tx coalesce parameter and set the IC bit
      when transmitting PTP frames which need to report out the frame
      transmit hardware timestamp to user space.
      
      Fixes: f748be53 ("net: stmmac: Rework coalesce timer and fix multi-queue races")
      Signed-off-by: default avatarRoland Hii <roland.king.guan.hii@intel.com>
      Signed-off-by: default avatarOng Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: default avatarVoon Weifeng <weifeng.voon@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d0bb82fd
    • Roland Hii's avatar
      net: stmmac: fixed new system time seconds value calculation · a1e5388b
      Roland Hii authored
      When ADDSUB bit is set, the system time seconds field is calculated as
      the complement of the seconds part of the update value.
      
      For example, if 3.000000001 seconds need to be subtracted from the
      system time, this field is calculated as
      2^32 - 3 = 4294967296 - 3 = 0x100000000 - 3 = 0xFFFFFFFD
      
      Previously, the 0x100000000 is mistakenly written as 100000000.
      
      This is further simplified from
        sec = (0x100000000ULL - sec);
      to
        sec = -sec;
      
      Fixes: ba1ffd74 ("stmmac: fix PTP support for GMAC4")
      Signed-off-by: default avatarRoland Hii <roland.king.guan.hii@intel.com>
      Signed-off-by: default avatarOng Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: default avatarVoon Weifeng <weifeng.voon@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a1e5388b
    • Linus Torvalds's avatar
      Linux 5.2-rc6 · 4b972a01
      Linus Torvalds authored
      4b972a01
    • Linus Torvalds's avatar
      Merge tag 'iommu-fix-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 6698a71a
      Linus Torvalds authored
      Pull iommu fix from Joerg Roedel:
       "Revert a commit from the previous pile of fixes which causes new
        lockdep splats. It is better to revert it for now and work on a better
        and more well tested fix"
      
      * tag 'iommu-fix-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock"
      6698a71a
    • Peter Xu's avatar
      Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock" · 0aafc8ae
      Peter Xu authored
      This reverts commit 7560cc3c.
      
      With 5.2.0-rc5 I can easily trigger this with lockdep and iommu=pt:
      
          ======================================================
          WARNING: possible circular locking dependency detected
          5.2.0-rc5 #78 Not tainted
          ------------------------------------------------------
          swapper/0/1 is trying to acquire lock:
          00000000ea2b3beb (&(&iommu->lock)->rlock){+.+.}, at: domain_context_mapping_one+0xa5/0x4e0
          but task is already holding lock:
          00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0
          which lock already depends on the new lock.
          the existing dependency chain (in reverse order) is:
          -> #1 (device_domain_lock){....}:
                 _raw_spin_lock_irqsave+0x3c/0x50
                 dmar_insert_one_dev_info+0xbb/0x510
                 domain_add_dev_info+0x50/0x90
                 dev_prepare_static_identity_mapping+0x30/0x68
                 intel_iommu_init+0xddd/0x1422
                 pci_iommu_init+0x16/0x3f
                 do_one_initcall+0x5d/0x2b4
                 kernel_init_freeable+0x218/0x2c1
                 kernel_init+0xa/0x100
                 ret_from_fork+0x3a/0x50
          -> #0 (&(&iommu->lock)->rlock){+.+.}:
                 lock_acquire+0x9e/0x170
                 _raw_spin_lock+0x25/0x30
                 domain_context_mapping_one+0xa5/0x4e0
                 pci_for_each_dma_alias+0x30/0x140
                 dmar_insert_one_dev_info+0x3b2/0x510
                 domain_add_dev_info+0x50/0x90
                 dev_prepare_static_identity_mapping+0x30/0x68
                 intel_iommu_init+0xddd/0x1422
                 pci_iommu_init+0x16/0x3f
                 do_one_initcall+0x5d/0x2b4
                 kernel_init_freeable+0x218/0x2c1
                 kernel_init+0xa/0x100
                 ret_from_fork+0x3a/0x50
      
          other info that might help us debug this:
           Possible unsafe locking scenario:
                 CPU0                    CPU1
                 ----                    ----
            lock(device_domain_lock);
                                         lock(&(&iommu->lock)->rlock);
                                         lock(device_domain_lock);
            lock(&(&iommu->lock)->rlock);
      
           *** DEADLOCK ***
          2 locks held by swapper/0/1:
           #0: 00000000033eb13d (dmar_global_lock){++++}, at: intel_iommu_init+0x1e0/0x1422
           #1: 00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0
      
          stack backtrace:
          CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5 #78
          Hardware name: LENOVO 20KGS35G01/20KGS35G01, BIOS N23ET50W (1.25 ) 06/25/2018
          Call Trace:
           dump_stack+0x85/0xc0
           print_circular_bug.cold.57+0x15c/0x195
           __lock_acquire+0x152a/0x1710
           lock_acquire+0x9e/0x170
           ? domain_context_mapping_one+0xa5/0x4e0
           _raw_spin_lock+0x25/0x30
           ? domain_context_mapping_one+0xa5/0x4e0
           domain_context_mapping_one+0xa5/0x4e0
           ? domain_context_mapping_one+0x4e0/0x4e0
           pci_for_each_dma_alias+0x30/0x140
           dmar_insert_one_dev_info+0x3b2/0x510
           domain_add_dev_info+0x50/0x90
           dev_prepare_static_identity_mapping+0x30/0x68
           intel_iommu_init+0xddd/0x1422
           ? printk+0x58/0x6f
           ? lockdep_hardirqs_on+0xf0/0x180
           ? do_early_param+0x8e/0x8e
           ? e820__memblock_setup+0x63/0x63
           pci_iommu_init+0x16/0x3f
           do_one_initcall+0x5d/0x2b4
           ? do_early_param+0x8e/0x8e
           ? rcu_read_lock_sched_held+0x55/0x60
           ? do_early_param+0x8e/0x8e
           kernel_init_freeable+0x218/0x2c1
           ? rest_init+0x230/0x230
           kernel_init+0xa/0x100
           ret_from_fork+0x3a/0x50
      
      domain_context_mapping_one() is taking device_domain_lock first then
      iommu lock, while dmar_insert_one_dev_info() is doing the reverse.
      
      That should be introduced by commit:
      
      7560cc3c ("iommu/vt-d: Fix lock inversion between iommu->lock and
                    device_domain_lock", 2019-05-27)
      
      So far I still cannot figure out how the previous deadlock was
      triggered (I cannot find iommu lock taken before calling of
      iommu_flush_dev_iotlb()), however I'm pretty sure that that change
      should be incomplete at least because it does not fix all the places
      so we're still taking the locks in different orders, while reverting
      that commit is very clean to me so far that we should always take
      device_domain_lock first then the iommu lock.
      
      We can continue to try to find the real culprit mentioned in
      7560cc3c, but for now I think we should revert it to fix current
      breakage.
      
      CC: Joerg Roedel <joro@8bytes.org>
      CC: Lu Baolu <baolu.lu@linux.intel.com>
      CC: dave.jiang@intel.com
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Tested-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      0aafc8ae
    • Linus Torvalds's avatar
      Merge tag 'pci-v5.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · b253d5f3
      Linus Torvalds authored
      Pull PCI fix from Bjorn Helgaas:
       "If an IOMMU is present, ignore the P2PDMA whitelist we added for v5.2
        because we don't yet know how to support P2PDMA in that case (Logan
        Gunthorpe)"
      
      * tag 'pci-v5.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI/P2PDMA: Ignore root complex whitelist when an IOMMU is present
      b253d5f3