1. 06 Aug, 2021 10 commits
    • Jakub Kicinski's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · cc4e5eec
      Jakub Kicinski authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Restrict range element expansion in ipset to avoid soft lockup,
         from Jozsef Kadlecsik.
      
      2) Memleak in error path for nf_conntrack_bridge for IPv4 packets,
         from Yajun Deng.
      
      3) Simplify conntrack garbage collection strategy to avoid frequent
         wake-ups, from Florian Westphal.
      
      4) Fix NFNLA_HOOK_FUNCTION_NAME string, do not include module name.
      
      5) Missing chain family netlink attribute in chain description
         in nfnetlink_hook.
      
      6) Incorrect sequence number on nfnetlink_hook dumps.
      
      7) Use netlink request family in reply message for consistency.
      
      8) Remove offload_pickup sysctl, use conntrack for established state
         instead, from Florian Westphal.
      
      9) Translate NFPROTO_INET/ingress to NFPROTO_NETDEV/ingress, since
         NFPROTO_INET is not exposed through nfnetlink_hook.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
        netfilter: nfnetlink_hook: translate inet ingress to netdev
        netfilter: conntrack: remove offload_pickup sysctl again
        netfilter: nfnetlink_hook: Use same family as request message
        netfilter: nfnetlink_hook: use the sequence number of the request message
        netfilter: nfnetlink_hook: missing chain family
        netfilter: nfnetlink_hook: strip off module name from hookfn
        netfilter: conntrack: collect all entries in one cycle
        netfilter: nf_conntrack_bridge: Fix memory leak when error
        netfilter: ipset: Limit the maximal range of consecutive elements to add/delete
      ====================
      
      Link: https://lore.kernel.org/r/20210806151149.6356-1-pablo@netfilter.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      cc4e5eec
    • Pablo Neira Ayuso's avatar
      netfilter: nfnetlink_hook: translate inet ingress to netdev · 269fc695
      Pablo Neira Ayuso authored
      The NFPROTO_INET pseudofamily is not exposed through this new netlink
      interface. The netlink dump either shows NFPROTO_IPV4 or NFPROTO_IPV6
      for NFPROTO_INET prerouting/input/forward/output/postrouting hooks.
      The NFNLA_CHAIN_FAMILY attribute provides the family chain, which
      specifies if this hook applies to inet traffic only (either IPv4 or
      IPv6).
      
      Translate the inet/ingress hook to netdev/ingress to fully hide the
      NFPROTO_INET implementation details.
      
      Fixes: e2cf17d3 ("netfilter: add new hook nfnl subsystem")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      269fc695
    • Florian Westphal's avatar
      netfilter: conntrack: remove offload_pickup sysctl again · 4592ee7f
      Florian Westphal authored
      These two sysctls were added because the hardcoded defaults (2 minutes,
      tcp, 30 seconds, udp) turned out to be too low for some setups.
      
      They appeared in 5.14-rc1 so it should be fine to remove it again.
      
      Marcelo convinced me that there should be no difference between a flow
      that was offloaded vs. a flow that was not wrt. timeout handling.
      Thus the default is changed to those for TCP established and UDP stream,
      5 days and 120 seconds, respectively.
      
      Marcelo also suggested to account for the timeout value used for the
      offloading, this avoids increase beyond the value in the conntrack-sysctl
      and will also instantly expire the conntrack entry with altered sysctls.
      
      Example:
         nf_conntrack_udp_timeout_stream=60
         nf_flowtable_udp_timeout=60
      
      This will remove offloaded udp flows after one minute, rather than two.
      
      An earlier version of this patch also cleared the ASSURED bit to
      allow nf_conntrack to evict the entry via early_drop (i.e., table full).
      However, it looks like we can safely assume that connection timed out
      via HW is still in established state, so this isn't needed.
      
      Quoting Oz:
       [..] the hardware sends all packets with a set FIN flags to sw.
       [..] Connections that are aged in hardware are expected to be in the
       established state.
      
      In case it turns out that back-to-sw-path transition can occur for
      'dodgy' connections too (e.g., one side disappeared while software-path
      would have been in RETRANS timeout), we can adjust this later.
      
      Cc: Oz Shlomo <ozsh@nvidia.com>
      Cc: Paul Blakey <paulb@nvidia.com>
      Suggested-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Reviewed-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reviewed-by: default avatarOz Shlomo <ozsh@nvidia.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      4592ee7f
    • Pablo Neira Ayuso's avatar
      netfilter: nfnetlink_hook: Use same family as request message · 69311e7c
      Pablo Neira Ayuso authored
      Use the same family as the request message, for consistency. The
      netlink payload provides sufficient information to describe the hook
      object, including the family.
      
      This makes it easier to userspace to correlate the hooks are that
      visited by the packets for a certain family.
      
      Fixes: e2cf17d3 ("netfilter: add new hook nfnl subsystem")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      69311e7c
    • Pablo Neira Ayuso's avatar
      netfilter: nfnetlink_hook: use the sequence number of the request message · 3d9bbaf6
      Pablo Neira Ayuso authored
      The sequence number allows to correlate the netlink reply message (as
      part of the dump) with the original request message.
      
      The cb->seq field is internally used to detect an interference (update)
      of the hook list during the netlink dump, do not use it as sequence
      number in the netlink dump header.
      
      Fixes: e2cf17d3 ("netfilter: add new hook nfnl subsystem")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      3d9bbaf6
    • Pablo Neira Ayuso's avatar
      netfilter: nfnetlink_hook: missing chain family · a6e57c4a
      Pablo Neira Ayuso authored
      The family is relevant for pseudo-families like NFPROTO_INET
      otherwise the user needs to rely on the hook function name to
      differentiate it from NFPROTO_IPV4 and NFPROTO_IPV6 names.
      
      Add nfnl_hook_chain_desc_attributes instead of using the existing
      NFTA_CHAIN_* attributes, since these do not provide a family number.
      
      Fixes: e2cf17d3 ("netfilter: add new hook nfnl subsystem")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      a6e57c4a
    • Pablo Neira Ayuso's avatar
      netfilter: nfnetlink_hook: strip off module name from hookfn · 61e0c2bc
      Pablo Neira Ayuso authored
      NFNLA_HOOK_FUNCTION_NAME should include the hook function name only,
      the module name is already provided by NFNLA_HOOK_MODULE_NAME.
      
      Fixes: e2cf17d3 ("netfilter: add new hook nfnl subsystem")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      61e0c2bc
    • Florian Westphal's avatar
      netfilter: conntrack: collect all entries in one cycle · 4608fdfc
      Florian Westphal authored
      Michal Kubecek reports that conntrack gc is responsible for frequent
      wakeups (every 125ms) on idle systems.
      
      On busy systems, timed out entries are evicted during lookup.
      The gc worker is only needed to remove entries after system becomes idle
      after a busy period.
      
      To resolve this, always scan the entire table.
      If the scan is taking too long, reschedule so other work_structs can run
      and resume from next bucket.
      
      After a completed scan, wait for 2 minutes before the next cycle.
      Heuristics for faster re-schedule are removed.
      
      GC_SCAN_INTERVAL could be exposed as a sysctl in the future to allow
      tuning this as-needed or even turn the gc worker off.
      Reported-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      4608fdfc
    • John Hubbard's avatar
      net: mvvp2: fix short frame size on s390 · 704e624f
      John Hubbard authored
      On s390, the following build warning occurs:
      
      drivers/net/ethernet/marvell/mvpp2/mvpp2.h:844:2: warning: overflow in
      conversion from 'long unsigned int' to 'int' changes value from
      '18446744073709551584' to '-32' [-Woverflow]
      844 |  ((total_size) - MVPP2_SKB_HEADROOM - MVPP2_SKB_SHINFO_SIZE)
      
      This happens because MVPP2_SKB_SHINFO_SIZE, which is 320 bytes (which is
      already 64-byte aligned) on some architectures, actually gets ALIGN'd up
      to 512 bytes in the s390 case.
      
      So then, when this is invoked:
      
          MVPP2_RX_MAX_PKT_SIZE(MVPP2_BM_SHORT_FRAME_SIZE)
      
      ...that turns into:
      
           704 - 224 - 512 == -32
      
      ...which is not a good frame size to end up with! The warning above is a
      bit lucky: it notices a signed/unsigned bad behavior here, which leads
      to the real problem of a frame that is too short for its contents.
      
      Increase MVPP2_BM_SHORT_FRAME_SIZE by 32 (from 704 to 736), which is
      just exactly big enough. (The other values can't readily be changed
      without causing a lot of other problems.)
      
      Fixes: 07dd0a7a ("mvpp2: add basic XDP support")
      Cc: Sven Auhagen <sven.auhagen@voleatech.de>
      Cc: Matteo Croce <mcroce@microsoft.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJohn Hubbard <jhubbard@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      704e624f
    • DENG Qingfang's avatar
      net: dsa: mt7530: add the missing RxUnicast MIB counter · aff51c5d
      DENG Qingfang authored
      Add the missing RxUnicast counter.
      
      Fixes: b8f126a8 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
      Signed-off-by: default avatarDENG Qingfang <dqfext@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      aff51c5d
  2. 05 Aug, 2021 22 commits
  3. 04 Aug, 2021 8 commits
    • Hui Su's avatar
      scripts/tracing: fix the bug that can't parse raw_trace_func · 1c0cec64
      Hui Su authored
      Since commit 77271ce4 ("tracing: Add irq, preempt-count and need resched info
      to default trace output"), the default trace output format has been changed to:
                <idle>-0       [009] d.h. 22420.068695: _raw_spin_lock_irqsave <-hrtimer_interrupt
                <idle>-0       [000] ..s. 22420.068695: _nohz_idle_balance <-run_rebalance_domains
                <idle>-0       [011] d.h. 22420.068695: account_process_tick <-update_process_times
      
      origin trace output format:(before v3.2.0)
           # tracer: nop
           #
           #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
           #              | |       |          |         |
                migration/0-6     [000]    50.025810: rcu_note_context_switch <-__schedule
                migration/0-6     [000]    50.025812: trace_rcu_utilization <-rcu_note_context_switch
                migration/0-6     [000]    50.025813: rcu_sched_qs <-rcu_note_context_switch
                migration/0-6     [000]    50.025815: rcu_preempt_qs <-rcu_note_context_switch
                migration/0-6     [000]    50.025817: trace_rcu_utilization <-rcu_note_context_switch
                migration/0-6     [000]    50.025818: debug_lockdep_rcu_enabled <-__schedule
                migration/0-6     [000]    50.025820: debug_lockdep_rcu_enabled <-__schedule
      
      The draw_functrace.py(introduced in v2.6.28) can't parse the new version format trace_func,
      So we need modify draw_functrace.py to adapt the new version trace output format.
      
      Link: https://lkml.kernel.org/r/20210611022107.608787-1-suhui@zeku.com
      
      Cc: stable@vger.kernel.org
      Fixes: 77271ce4 tracing: Add irq, preempt-count and need resched info to default trace output
      Signed-off-by: default avatarHui Su <suhui@zeku.com>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      1c0cec64
    • Nathan Chancellor's avatar
      scripts/recordmcount.pl: Remove check_objcopy() and $can_use_local · b18b851b
      Nathan Chancellor authored
      When building ARCH=riscv allmodconfig with llvm-objcopy, the objcopy
      version warning from this script appears:
      
      WARNING: could not find objcopy version or version is less than 2.17.
              Local function references are disabled.
      
      The check_objcopy() function in scripts/recordmcount.pl is set up to
      parse GNU objcopy's version string, not llvm-objcopy's, which triggers
      the warning.
      
      Commit 799c4341 ("kbuild: thin archives make default for all archs")
      made binutils 2.20 mandatory and commit ba64beb1 ("kbuild: check the
      minimum assembler version in Kconfig") enforces this at configuration
      time so just remove check_objcopy() and $can_use_local instead, assuming
      --globalize-symbol is always available.
      
      llvm-objcopy has supported --globalize-symbol since LLVM 7.0.0 in 2018
      and the minimum version for building the kernel with LLVM is 10.0.1 so
      there is no issue introduced:
      
      Link: https://github.com/llvm/llvm-project/commit/ee5be798dae30d5f9414b01f76ff807edbc881aa
      Link: https://lkml.kernel.org/r/20210802210307.3202472-1-nathan@kernel.orgReviewed-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarNathan Chancellor <nathan@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      b18b851b
    • Masami Hiramatsu's avatar
      tracing: Reject string operand in the histogram expression · a9d10ca4
      Masami Hiramatsu authored
      Since the string type can not be the target of the addition / subtraction
      operation, it must be rejected. Without this fix, the string type silently
      converted to digits.
      
      Link: https://lkml.kernel.org/r/162742654278.290973.1523000673366456634.stgit@devnote2
      
      Cc: stable@vger.kernel.org
      Fixes: 100719dc ("tracing: Add simple expression support to hist triggers")
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      a9d10ca4
    • Steven Rostedt (VMware)'s avatar
      tracing / histogram: Give calculation hist_fields a size · 2c05caa7
      Steven Rostedt (VMware) authored
      When working on my user space applications, I found a bug in the synthetic
      event code where the automated synthetic event field was not matching the
      event field calculation it was attached to. Looking deeper into it, it was
      because the calculation hist_field was not given a size.
      
      The synthetic event fields are matched to their hist_fields either by
      having the field have an identical string type, or if that does not match,
      then the size and signed values are used to match the fields.
      
      The problem arose when I tried to match a calculation where the fields
      were "unsigned int". My tool created a synthetic event of type "u32". But
      it failed to match. The string was:
      
        diff=field1-field2:onmatch(event).trace(synth,$diff)
      
      Adding debugging into the kernel, I found that the size of "diff" was 0.
      And since it was given "unsigned int" as a type, the histogram fallback
      code used size and signed. The signed matched, but the size of u32 (4) did
      not match zero, and the event failed to be created.
      
      This can be worse if the field you want to match is not one of the
      acceptable fields for a synthetic event. As event fields can have any type
      that is supported in Linux, this can cause an issue. For example, if a
      type is an enum. Then there's no way to use that with any calculations.
      
      Have the calculation field simply take on the size of what it is
      calculating.
      
      Link: https://lkml.kernel.org/r/20210730171951.59c7743f@oasis.local.home
      
      Cc: Tom Zanussi <zanussi@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Fixes: 100719dc ("tracing: Add simple expression support to hist triggers")
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      2c05caa7
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 251a1524
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Seven fixes, five in drivers.
      
        The two core changes are a trivial warning removal in scsi_scan.c and
        a change to rescan for capacity when a device makes a user induced
        (via a write to the state variable) offline->running transition to fix
        issues with device mapper"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: core: Fix capacity set to zero after offlinining device
        scsi: sr: Return correct event when media event code is 3
        scsi: ibmvfc: Fix command state accounting and stale response detection
        scsi: core: Avoid printing an error if target_alloc() returns -ENXIO
        scsi: scsi_dh_rdac: Avoid crash during rdac_bus_attach()
        scsi: megaraid_mm: Fix end of loop tests for list_for_each_entry()
        scsi: pm80xx: Fix TMF task completion race condition
      251a1524
    • Linus Torvalds's avatar
      Merge tag 'gpio-updates-for-v5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 0c2e31d2
      Linus Torvalds authored
      Pull gpio fixes from Bartosz Golaszewski:
      
       - revert a patch intruducing breakage in interrupt handling in
         gpio-mpc8xxx
      
       - correctly handle missing IRQs in gpio-tqmx86 by really making them
         optional
      
      * tag 'gpio-updates-for-v5.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        gpio: tqmx86: really make IRQ optional
        Revert "gpio: mpc8xxx: change the gpio interrupt flags."
      0c2e31d2
    • Maxim Levitsky's avatar
      KVM: selftests: fix hyperv_clock test · 13c2c3cf
      Maxim Levitsky authored
      The test was mistakenly using addr_gpa2hva on a gva and that happened
      to work accidentally.  Commit 106a2e76 ("KVM: selftests: Lower the
      min virtual address for misc page allocations") revealed this bug.
      
      Fixes: 2c7f76b4 ("selftests: kvm: Add basic Hyper-V clocksources tests", 2021-03-18)
      Signed-off-by: default avatarMaxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210804112057.409498-1-mlevitsk@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      13c2c3cf
    • Mingwei Zhang's avatar
      KVM: SVM: improve the code readability for ASID management · bb2baeb2
      Mingwei Zhang authored
      KVM SEV code uses bitmaps to manage ASID states. ASID 0 was always skipped
      because it is never used by VM. Thus, in existing code, ASID value and its
      bitmap postion always has an 'offset-by-1' relationship.
      
      Both SEV and SEV-ES shares the ASID space, thus KVM uses a dynamic range
      [min_asid, max_asid] to handle SEV and SEV-ES ASIDs separately.
      
      Existing code mixes the usage of ASID value and its bitmap position by
      using the same variable called 'min_asid'.
      
      Fix the min_asid usage: ensure that its usage is consistent with its name;
      allocate extra size for ASID 0 to ensure that each ASID has the same value
      with its bitmap position. Add comments on ASID bitmap allocation to clarify
      the size change.
      Signed-off-by: default avatarMingwei Zhang <mizhang@google.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Marc Orr <marcorr@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Alper Gun <alpergun@google.com>
      Cc: Dionna Glaze <dionnaglaze@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Vipin Sharma <vipinsh@google.com>
      Cc: Peter Gonda <pgonda@google.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Message-Id: <20210802180903.159381-1-mizhang@google.com>
      [Fix up sev_asid_free to also index by ASID, as suggested by Sean
       Christopherson, and use nr_asids in sev_cpu_init. - Paolo]
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      bb2baeb2