1. 19 Dec, 2022 2 commits
  2. 18 Dec, 2022 1 commit
    • David S. Miller's avatar
      Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 89529367
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2022-12-15 (igc)
      
      Muhammad Husaini Zulkifli says:
      
      This patch series fixes bugs for the Time-Sensitive Networking(TSN)
      Qbv Scheduling features.
      
      An overview of each patch series is given below:
      
      Patch 1: Using a first flag bit to schedule a packet to the next cycle if
      packet cannot fit in current Qbv cycle.
      Patch 2: Enable strict cycle for Qbv scheduling.
      Patch 3: Prevent user to set basetime less than zero during tc config.
      Patch 4: Allow the basetime enrollment with zero value.
      Patch 5: Calculate the new end time value to exclude the time interval that
      exceed the cycle time as user can specify the cycle time in tc config.
      Patch 6: Resolve the HW bugs where the gate is not fully closed.
      ---
      This contains the net patches from this original pull request:
      https://lore.kernel.org/netdev/20221205212414.3197525-1-anthony.l.nguyen@intel.com/
      
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      89529367
  3. 17 Dec, 2022 3 commits
  4. 16 Dec, 2022 7 commits
    • Jakub Kicinski's avatar
      Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 13e3c779
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2022-12-16
      
      We've added 7 non-merge commits during the last 2 day(s) which contain
      a total of 9 files changed, 119 insertions(+), 36 deletions(-).
      
      1) Fix for recent syzkaller XDP dispatcher update splat, from Jiri Olsa.
      
      2) Fix BPF program refcount leak in LSM attachment failure path,
         from Milan Landaverde.
      
      3) Fix BPF program type in map compatibility check for fext,
         from Toke Høiland-Jørgensen.
      
      4) Fix a BPF selftest compilation error under !CONFIG_SMP config,
         from Yonghong Song.
      
      5) Fix CI to enable CONFIG_FUNCTION_ERROR_INJECTION after it got changed
         to a prompt, from Song Liu.
      
      6) Various BPF documentation fixes for socket local storage,
         from Donald Hunter.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        selftests/bpf: Add a test for using a cpumap from an freplace-to-XDP program
        bpf: Resolve fext program type when checking map compatibility
        bpf: Synchronize dispatcher update with bpf_dispatcher_xdp_func
        bpf: prevent leak of lsm program after failed attach
        selftests/bpf: Select CONFIG_FUNCTION_ERROR_INJECTION
        selftests/bpf: Fix a selftest compilation error with CONFIG_SMP=n
        docs/bpf: Reword docs for BPF_MAP_TYPE_SK_STORAGE
      ====================
      
      Link: https://lore.kernel.org/r/20221216174540.16598-1-daniel@iogearbox.netSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      13e3c779
    • Eelco Chaudron's avatar
      openvswitch: Fix flow lookup to use unmasked key · 68bb1010
      Eelco Chaudron authored
      The commit mentioned below causes the ovs_flow_tbl_lookup() function
      to be called with the masked key. However, it's supposed to be called
      with the unmasked key. This due to the fact that the datapath supports
      installing wider flows, and OVS relies on this behavior. For example
      if ipv4(src=1.1.1.1/192.0.0.0, dst=1.1.1.2/192.0.0.0) exists, a wider
      flow (smaller mask) of ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/
      128.0.0.0) is allowed to be added.
      
      However, if we try to add a wildcard rule, the installation fails:
      
      $ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
        ipv4(src=1.1.1.1/192.0.0.0,dst=1.1.1.2/192.0.0.0,frag=no)" 2
      $ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
        ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.2/0.0.0.0,frag=no)" 2
      ovs-vswitchd: updating flow table (File exists)
      
      The reason is that the key used to determine if the flow is already
      present in the system uses the original key ANDed with the mask.
      This results in the IP address not being part of the (miniflow) key,
      i.e., being substituted with an all-zero value. When doing the actual
      lookup, this results in the key wrongfully matching the first flow,
      and therefore the flow does not get installed.
      
      This change reverses the commit below, but rather than having the key
      on the stack, it's allocated.
      
      Fixes: 190aa3e7 ("openvswitch: Fix Frame-size larger than 1024 bytes warning.")
      Signed-off-by: default avatarEelco Chaudron <echaudro@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      68bb1010
    • David S. Miller's avatar
      Merge branch 'devlink-fixes' · 3e31d209
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      devlink: region snapshot locking fix and selftest adjustments
      
      Minor fix for region snapshot locking and adjustments to selftests.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3e31d209
    • Jakub Kicinski's avatar
      selftests: devlink: add a warning for interfaces coming up · d1c4a346
      Jakub Kicinski authored
      NetworkManager (and other daemons) may bring the interface up
      and cause failures in quiescence checks. Print a helpful warning,
      and take the interface down again.
      
      I seem to forget about this every time I run these tests on a new VM.
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d1c4a346
    • Jakub Kicinski's avatar
      selftests: devlink: fix the fd redirect in dummy_reporter_test · 2fc60e2f
      Jakub Kicinski authored
      $number + > bash means redirect FD $number, e.g. commonly
      used 2> redirects stderr (fd 2). The test uses 8192> to
      write the number 8192 to a file, this results in:
      
        ./devlink.sh: line 499: 8192: Bad file descriptor
      
      Oddly the test also papers over this issue by checking
      for failure (expecting an error rather than success)
      so it passes, anyway.
      
      Fixes: ff18176a ("selftests: Add a test of large binary to devlink health test")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2fc60e2f
    • Jakub Kicinski's avatar
      devlink: hold region lock when flushing snapshots · b4cafb3d
      Jakub Kicinski authored
      Netdevsim triggers a splat on reload, when it destroys regions
      with snapshots pending:
      
        WARNING: CPU: 1 PID: 787 at net/core/devlink.c:6291 devlink_region_snapshot_del+0x12e/0x140
        CPU: 1 PID: 787 Comm: devlink Not tainted 6.1.0-07460-g7ae9888d #580
        RIP: 0010:devlink_region_snapshot_del+0x12e/0x140
        Call Trace:
         <TASK>
         devl_region_destroy+0x70/0x140
         nsim_dev_reload_down+0x2f/0x60 [netdevsim]
         devlink_reload+0x1f7/0x360
         devlink_nl_cmd_reload+0x6ce/0x860
         genl_family_rcv_msg_doit.isra.0+0x145/0x1c0
      
      This is the locking assert in devlink_region_snapshot_del(),
      we're supposed to be holding the region->snapshot_lock here.
      
      Fixes: 2dec18ad ("net: devlink: remove region snapshots list dependency on devlink->lock")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Reviewed-by: default avatarJiri Pirko <jiri@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b4cafb3d
    • Daniel Golle's avatar
      net: dsa: mt7530: remove redundant assignment · 32f1002e
      Daniel Golle authored
      Russell King correctly pointed out that the MAC_2500FD capability is
      already added for port 5 (if not in RGMII mode) and port 6 (which only
      supports SGMII) by mt7531_mac_port_get_caps. Remove the reduntant
      setting of this capability flag which was added by a previous commit.
      
      Fixes: e19de30d ("net: dsa: mt7530: add support for in-band link status")
      Reported-by: default avatarRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: default avatarRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarDaniel Golle <daniel@makrotopia.org>
      Link: https://lore.kernel.org/r/Y5qY7x6la5TxZxzX@makrotopia.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      32f1002e
  5. 15 Dec, 2022 15 commits
    • Tan Tee Min's avatar
      igc: Set Qbv start_time and end_time to end_time if not being configured in GCL · 72abeedd
      Tan Tee Min authored
      The default setting of end_time minus start_time is whole 1 second.
      Thus, if it's not being configured in any GCL entry then it will be
      staying at original 1 second.
      
      This patch is changing the start_time and end_time to be end_time as
      if setting zero will be having weird HW behavior where the gate will
      not be fully closed.
      
      Fixes: ec50a9d4 ("igc: Add support for taprio offloading")
      Signed-off-by: default avatarTan Tee Min <tee.min.tan@linux.intel.com>
      Signed-off-by: default avatarMuhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
      Tested-by: default avatarNaama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      72abeedd
    • Tan Tee Min's avatar
      igc: recalculate Qbv end_time by considering cycle time · 6d05251d
      Tan Tee Min authored
      Qbv users can specify a cycle time that is not equal to the total GCL
      intervals. Hence, recalculation is necessary here to exclude the time
      interval that exceeds the cycle time. As those GCL which exceeds the
      cycle time will be truncated.
      
      According to IEEE Std. 802.1Q-2018 section 8.6.9.2, once the end of
      the list is reached, it will switch to the END_OF_CYCLE state and
      leave the gates in the same state until the next cycle is started.
      
      Fixes: ec50a9d4 ("igc: Add support for taprio offloading")
      Signed-off-by: default avatarTan Tee Min <tee.min.tan@linux.intel.com>
      Signed-off-by: default avatarMuhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
      Tested-by: default avatarNaama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      6d05251d
    • Tan Tee Min's avatar
      igc: allow BaseTime 0 enrollment for Qbv · e17090eb
      Tan Tee Min authored
      Introduce qbv_enable flag in igc_adapter struct to store the Qbv on/off.
      So this allow the BaseTime to enroll with zero value.
      
      Fixes: 61572d5f ("igc: Simplify TSN flags handling")
      Signed-off-by: default avatarMuhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
      Signed-off-by: default avatarTan Tee Min <tee.min.tan@linux.intel.com>
      Tested-by: default avatarNaama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      e17090eb
    • Muhammad Husaini Zulkifli's avatar
      igc: Add checking for basetime less than zero · 3b61764f
      Muhammad Husaini Zulkifli authored
      Using the tc qdisc command, the user can set basetime to any value.
      Checking should be done on the driver's side to prevent registering
      basetime values that are less than zero.
      
      Fixes: ec50a9d4 ("igc: Add support for taprio offloading")
      Signed-off-by: default avatarMuhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
      Tested-by: default avatarNaama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      3b61764f
    • Vinicius Costa Gomes's avatar
      igc: Use strict cycles for Qbv scheduling · d8f45be0
      Vinicius Costa Gomes authored
      Configuring strict cycle mode in the controller forces more well
      behaved transmissions when taprio is offloaded.
      
      When set this strict_cycle and strict_end, transmission is not
      enabled if the whole packet cannot be completed before end of
      the Qbv cycle.
      
      Fixes: 82faa9b7 ("igc: Add support for ETF offloading")
      Signed-off-by: default avatarVinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: default avatarAravindhan Gunasekaran <aravindhan.gunasekaran@intel.com>
      Signed-off-by: default avatarMuhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
      Tested-by: default avatarNaama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      d8f45be0
    • Vinicius Costa Gomes's avatar
      igc: Enhance Qbv scheduling by using first flag bit · db0b124f
      Vinicius Costa Gomes authored
      The I225 hardware has a limitation that packets can only be scheduled
      in the [0, cycle-time] interval. So, scheduling a packet to the start
      of the next cycle doesn't usually work.
      
      To overcome this, we use the Transmit Descriptor first flag to indicates
      that a packet should be the first packet (from a queue) in a cycle
      according to the section 7.5.2.9.3.4 The First Packet on Each QBV Cycle
      in Intel Discrete I225/6 User Manual.
      
      But this only works if there was any packet from that queue during the
      current cycle, to avoid this issue, we issue an empty packet if that's
      not the case. Also require one more descriptor to be available, to take
      into account the empty packet that might be issued.
      
      Test Setup:
      
      Talker: Use l2_tai to generate the launchtime into packet load.
      
      Listener: Use timedump.c to compute the delta between packet arrival
      and LaunchTime packet payload.
      
      Test Result:
      
      Before:
      
      1666000610127300000,1666000610127300096,96,621273
      1666000610127400000,1666000610127400192,192,621274
      1666000610127500000,1666000610127500032,32,621275
      1666000610127600000,1666000610127600128,128,621276
      1666000610127700000,1666000610127700224,224,621277
      1666000610127800000,1666000610127800064,64,621278
      1666000610127900000,1666000610127900160,160,621279
      1666000610128000000,1666000610128000000,0,621280
      1666000610128100000,1666000610128100096,96,621281
      1666000610128200000,1666000610128200192,192,621282
      1666000610128300000,1666000610128300032,32,621283
      1666000610128400000,1666000610128301056,-98944,621284
      1666000610128500000,1666000610128302080,-197920,621285
      1666000610128600000,1666000610128302848,-297152,621286
      1666000610128700000,1666000610128303872,-396128,621287
      1666000610128800000,1666000610128304896,-495104,621288
      1666000610128900000,1666000610128305664,-594336,621289
      1666000610129000000,1666000610128306688,-693312,621290
      1666000610129100000,1666000610128307712,-792288,621291
      1666000610129200000,1666000610128308480,-891520,621292
      1666000610129300000,1666000610128309504,-990496,621293
      1666000610129400000,1666000610128310528,-1089472,621294
      1666000610129500000,1666000610128311296,-1188704,621295
      1666000610129600000,1666000610128312320,-1287680,621296
      1666000610129700000,1666000610128313344,-1386656,621297
      1666000610129800000,1666000610128314112,-1485888,621298
      1666000610129900000,1666000610128315136,-1584864,621299
      1666000610130000000,1666000610128316160,-1683840,621300
      1666000610130100000,1666000610128316928,-1783072,621301
      1666000610130200000,1666000610128317952,-1882048,621302
      1666000610130300000,1666000610128318976,-1981024,621303
      1666000610130400000,1666000610128319744,-2080256,621304
      1666000610130500000,1666000610128320768,-2179232,621305
      1666000610130600000,1666000610128321792,-2278208,621306
      1666000610130700000,1666000610128322816,-2377184,621307
      1666000610130800000,1666000610128323584,-2476416,621308
      1666000610130900000,1666000610128324608,-2575392,621309
      1666000610131000000,1666000610128325632,-2674368,621310
      1666000610131100000,1666000610128326400,-2773600,621311
      1666000610131200000,1666000610128327424,-2872576,621312
      1666000610131300000,1666000610128328448,-2971552,621313
      1666000610131400000,1666000610128329216,-3070784,621314
      1666000610131500000,1666000610131500032,32,621315
      1666000610131600000,1666000610131600128,128,621316
      1666000610131700000,1666000610131700224,224,621317
      
      After:
      
      1666073510646200000,1666073510646200064,64,2676462
      1666073510646300000,1666073510646300160,160,2676463
      1666073510646400000,1666073510646400256,256,2676464
      1666073510646500000,1666073510646500096,96,2676465
      1666073510646600000,1666073510646600192,192,2676466
      1666073510646700000,1666073510646700032,32,2676467
      1666073510646800000,1666073510646800128,128,2676468
      1666073510646900000,1666073510646900224,224,2676469
      1666073510647000000,1666073510647000064,64,2676470
      1666073510647100000,1666073510647100160,160,2676471
      1666073510647200000,1666073510647200256,256,2676472
      1666073510647300000,1666073510647300096,96,2676473
      1666073510647400000,1666073510647400192,192,2676474
      1666073510647500000,1666073510647500032,32,2676475
      1666073510647600000,1666073510647600128,128,2676476
      1666073510647700000,1666073510647700224,224,2676477
      1666073510647800000,1666073510647800064,64,2676478
      1666073510647900000,1666073510647900160,160,2676479
      1666073510648000000,1666073510648000000,0,2676480
      1666073510648100000,1666073510648100096,96,2676481
      1666073510648200000,1666073510648200192,192,2676482
      1666073510648300000,1666073510648300032,32,2676483
      1666073510648400000,1666073510648400128,128,2676484
      1666073510648500000,1666073510648500224,224,2676485
      1666073510648600000,1666073510648600064,64,2676486
      1666073510648700000,1666073510648700160,160,2676487
      1666073510648800000,1666073510648800000,0,2676488
      1666073510648900000,1666073510648900096,96,2676489
      1666073510649000000,1666073510649000192,192,2676490
      1666073510649100000,1666073510649100032,32,2676491
      1666073510649200000,1666073510649200128,128,2676492
      1666073510649300000,1666073510649300224,224,2676493
      1666073510649400000,1666073510649400064,64,2676494
      1666073510649500000,1666073510649500160,160,2676495
      1666073510649600000,1666073510649600000,0,2676496
      1666073510649700000,1666073510649700096,96,2676497
      1666073510649800000,1666073510649800192,192,2676498
      1666073510649900000,1666073510649900032,32,2676499
      1666073510650000000,1666073510650000128,128,2676500
      
      Fixes: 82faa9b7 ("igc: Add support for ETF offloading")
      Signed-off-by: default avatarVinicius Costa Gomes <vinicius.gomes@intel.com>
      Co-developed-by: default avatarAravindhan Gunasekaran <aravindhan.gunasekaran@intel.com>
      Signed-off-by: default avatarAravindhan Gunasekaran <aravindhan.gunasekaran@intel.com>
      Co-developed-by: default avatarMuhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
      Signed-off-by: default avatarMuhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
      Signed-off-by: default avatarMalli C <mallikarjuna.chilakala@intel.com>
      Tested-by: default avatarNaama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      db0b124f
    • Vladimir Oltean's avatar
      net: dsa: mv88e6xxx: avoid reg_lock deadlock in mv88e6xxx_setup_port() · a7d82367
      Vladimir Oltean authored
      In the blamed commit, it was not noticed that one implementation of
      chip->info->ops->phylink_get_caps(), called by mv88e6xxx_get_caps(),
      may access hardware registers, and in doing so, it takes the
      mv88e6xxx_reg_lock(). Namely, this is mv88e6352_phylink_get_caps().
      
      This is a problem because mv88e6xxx_get_caps(), apart from being
      a top-level function (method invoked by dsa_switch_ops), is now also
      directly called from mv88e6xxx_setup_port(), which runs under the
      mv88e6xxx_reg_lock() taken by mv88e6xxx_setup(). Therefore, when running
      on mv88e6352, the reg_lock would be acquired a second time and the
      system would deadlock on driver probe.
      
      The things that mv88e6xxx_setup() can compete with in terms of register
      access with are the IRQ handlers and MDIO bus operations registered by
      mv88e6xxx_probe(). So there is a real need to acquire the register lock.
      
      The register lock can, in principle, be dropped and re-acquired pretty
      much at will within the driver, as long as no operations that involve
      waiting for indirect access to complete (essentially, callers of
      mv88e6xxx_smi_direct_wait() and mv88e6xxx_wait_mask()) are interrupted
      with the lock released. However, I would guess that in mv88e6xxx_setup(),
      the critical section is kept open for such a long time just in order to
      optimize away multiple lock/unlock operations on the registers.
      
      We could, in principle, drop the reg_lock right before the
      mv88e6xxx_setup_port() -> mv88e6xxx_get_caps() call, and
      re-acquire it immediately afterwards. But this would look ugly, because
      mv88e6xxx_setup_port() would release a lock which it didn't acquire, but
      the caller did.
      
      A cleaner solution to this issue comes from the observation that struct
      mv88e6xxxx_ops methods generally assume they are called with the
      reg_lock already acquired. Whereas mv88e6352_phylink_get_caps() is more
      the exception rather than the norm, in that it acquires the lock itself.
      
      Let's enforce the same locking pattern/convention for
      chip->info->ops->phylink_get_caps() as well, and make
      mv88e6xxx_get_caps(), the top-level function, acquire the register lock
      explicitly, for this one implementation that will access registers for
      port 4 to work properly.
      
      This means that mv88e6xxx_setup_port() will no longer call the top-level
      function, but the low-level mv88e6xxx_ops method which expects the
      correct calling context (register lock held).
      
      Compared to chip->info->ops->phylink_get_caps(), mv88e6xxx_get_caps()
      also fixes up the supported_interfaces bitmap for internal ports, since
      that can be done generically and does not require per-switch knowledge.
      That's code which will no longer execute, however mv88e6xxx_setup_port()
      doesn't need that. It just needs to look at the mac_capabilities bitmap.
      
      Fixes: cc1049cc ("net: dsa: mv88e6xxx: fix speed setting for CPU/DSA ports")
      Reported-by: default avatarMaksim Kiselev <bigunclemax@gmail.com>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: default avatarMaksim Kiselev <bigunclemax@gmail.com>
      Link: https://lore.kernel.org/r/20221214110120.3368472-1-vladimir.oltean@nxp.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      a7d82367
    • Biju Das's avatar
      ravb: Fix "failed to switch device to config mode" message during unbind · c72a7e42
      Biju Das authored
      This patch fixes the error "ravb 11c20000.ethernet eth0: failed to switch
      device to config mode" during unbind.
      
      We are doing register access after pm_runtime_put_sync().
      
      We usually do cleanup in reverse order of init. Currently in
      remove(), the "pm_runtime_put_sync" is not in reverse order.
      
      Probe
      	reset_control_deassert(rstc);
      	pm_runtime_enable(&pdev->dev);
      	pm_runtime_get_sync(&pdev->dev);
      
      remove
      	pm_runtime_put_sync(&pdev->dev);
      	unregister_netdev(ndev);
      	..
      	ravb_mdio_release(priv);
      	pm_runtime_disable(&pdev->dev);
      
      Consider the call to unregister_netdev()
      unregister_netdev->unregister_netdevice_queue->rollback_registered_many
      that calls the below functions which access the registers after
      pm_runtime_put_sync()
       1) ravb_get_stats
       2) ravb_close
      
      Fixes: c156633f ("Renesas Ethernet AVB driver proper")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBiju Das <biju.das.jz@bp.renesas.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20221214105118.2495313-1-biju.das.jz@bp.renesas.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      c72a7e42
    • Gaosheng Cui's avatar
      net: stmmac: fix errno when create_singlethread_workqueue() fails · 2cb815cf
      Gaosheng Cui authored
      We should set the return value to -ENOMEM explicitly when
      create_singlethread_workqueue() fails in stmmac_dvr_probe(),
      otherwise we'll lose the error value.
      
      Fixes: a137f3f2 ("net: stmmac: fix possible memory leak in stmmac_dvr_probe()")
      Signed-off-by: default avatarGaosheng Cui <cuigaosheng1@huawei.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20221214080117.3514615-1-cuigaosheng1@huawei.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      2cb815cf
    • Li Zetao's avatar
      r6040: Fix kmemleak in probe and remove · 7e43039a
      Li Zetao authored
      There is a memory leaks reported by kmemleak:
      
        unreferenced object 0xffff888116111000 (size 2048):
          comm "modprobe", pid 817, jiffies 4294759745 (age 76.502s)
          hex dump (first 32 bytes):
            00 c4 0a 04 81 88 ff ff 08 10 11 16 81 88 ff ff  ................
            08 10 11 16 81 88 ff ff 00 00 00 00 00 00 00 00  ................
          backtrace:
            [<ffffffff815bcd82>] kmalloc_trace+0x22/0x60
            [<ffffffff827e20ee>] phy_device_create+0x4e/0x90
            [<ffffffff827e6072>] get_phy_device+0xd2/0x220
            [<ffffffff827e7844>] mdiobus_scan+0xa4/0x2e0
            [<ffffffff827e8be2>] __mdiobus_register+0x482/0x8b0
            [<ffffffffa01f5d24>] r6040_init_one+0x714/0xd2c [r6040]
            ...
      
      The problem occurs in probe process as follows:
        r6040_init_one:
          mdiobus_register
            mdiobus_scan    <- alloc and register phy_device,
                               the reference count of phy_device is 3
          r6040_mii_probe
            phy_connect     <- connect to the first phy_device,
                               so the reference count of the first
                               phy_device is 4, others are 3
          register_netdev   <- fault inject succeeded, goto error handling path
      
          // error handling path
          err_out_mdio_unregister:
            mdiobus_unregister(lp->mii_bus);
          err_out_mdio:
            mdiobus_free(lp->mii_bus);    <- the reference count of the first
                                             phy_device is 1, it is not released
                                             and other phy_devices are released
        // similarly, the remove process also has the same problem
      
      The root cause is traced to the phy_device is not disconnected when
      removes one r6040 device in r6040_remove_one() or on error handling path
      after r6040_mii probed successfully. In r6040_mii_probe(), a net ethernet
      device is connected to the first PHY device of mii_bus, in order to
      notify the connected driver when the link status changes, which is the
      default behavior of the PHY infrastructure to handle everything.
      Therefore the phy_device should be disconnected when removes one r6040
      device or on error handling path.
      
      Fix it by adding phy_disconnect() when removes one r6040 device or on
      error handling path after r6040_mii probed successfully.
      
      Fixes: 3831861b ("r6040: implement phylib")
      Signed-off-by: default avatarLi Zetao <lizetao1@huawei.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20221213125614.927754-1-lizetao1@huawei.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      7e43039a
    • Kirill Tkhai's avatar
      unix: Fix race in SOCK_SEQPACKET's unix_dgram_sendmsg() · 3ff8bff7
      Kirill Tkhai authored
      There is a race resulting in alive SOCK_SEQPACKET socket
      may change its state from TCP_ESTABLISHED to TCP_CLOSE:
      
      unix_release_sock(peer)                  unix_dgram_sendmsg(sk)
        sock_orphan(peer)
          sock_set_flag(peer, SOCK_DEAD)
                                                 sock_alloc_send_pskb()
                                                   if !(sk->sk_shutdown & SEND_SHUTDOWN)
                                                     OK
                                                 if sock_flag(peer, SOCK_DEAD)
                                                   sk->sk_state = TCP_CLOSE
        sk->sk_shutdown = SHUTDOWN_MASK
      
      After that socket sk remains almost normal: it is able to connect, listen, accept
      and recvmsg, while it can't sendmsg.
      
      Since this is the only possibility for alive SOCK_SEQPACKET to change
      the state in such way, we should better fix this strange and potentially
      danger corner case.
      
      Note, that we will return EPIPE here like this is normally done in sock_alloc_send_pskb().
      Originally used ECONNREFUSED looks strange, since it's strange to return
      a specific retval in dependence of race in kernel, when user can't affect on this.
      
      Also, move TCP_CLOSE assignment for SOCK_DGRAM sockets under state lock
      to fix race with unix_dgram_connect():
      
      unix_dgram_connect(other)            unix_dgram_sendmsg(sk)
                                             unix_peer(sk) = NULL
                                             unix_state_unlock(sk)
        unix_state_double_lock(sk, other)
        sk->sk_state  = TCP_ESTABLISHED
        unix_peer(sk) = other
        unix_state_double_unlock(sk, other)
                                             sk->sk_state  = TCP_CLOSED
      
      This patch fixes both of these races.
      
      Fixes: 83301b53 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")
      Signed-off-by: default avatarKirill Tkhai <tkhai@ya.ru>
      Link: https://lore.kernel.org/r/135fda25-22d5-837a-782b-ceee50e19844@ya.ruSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      3ff8bff7
    • Toke Høiland-Jørgensen's avatar
      selftests/bpf: Add a test for using a cpumap from an freplace-to-XDP program · f506439e
      Toke Høiland-Jørgensen authored
      This adds a simple test for inserting an XDP program into a cpumap that is
      "owned" by an XDP program that was loaded as PROG_TYPE_EXT (as libxdp
      does). Prior to the kernel fix this would fail because the map type
      ownership would be set to PROG_TYPE_EXT instead of being resolved to
      PROG_TYPE_XDP.
      
      v5:
      - Fix a few nits from Andrii, add his ACK
      v4:
      - Use skeletons for selftest
      v3:
      - Update comment to better explain the cause
      - Add Yonghong's ACK
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Acked-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/r/20221214230254.790066-2-toke@redhat.comSigned-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      f506439e
    • Toke Høiland-Jørgensen's avatar
      bpf: Resolve fext program type when checking map compatibility · 1c123c56
      Toke Høiland-Jørgensen authored
      The bpf_prog_map_compatible() check makes sure that BPF program types are
      not mixed inside BPF map types that can contain programs (tail call maps,
      cpumaps and devmaps). It does this by setting the fields of the map->owner
      struct to the values of the first program being checked against, and
      rejecting any subsequent programs if the values don't match.
      
      One of the values being set in the map owner struct is the program type,
      and since the code did not resolve the prog type for fext programs, the map
      owner type would be set to PROG_TYPE_EXT and subsequent loading of programs
      of the target type into the map would fail.
      
      This bug is seen in particular for XDP programs that are loaded as
      PROG_TYPE_EXT using libxdp; these cannot insert programs into devmaps and
      cpumaps because the check fails as described above.
      
      Fix the bug by resolving the fext program type to its target program type
      as elsewhere in the verifier.
      
      v3:
      - Add Yonghong's ACK
      
      Fixes: f45d5b6c ("bpf: generalise tail call map compatibility check")
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/r/20221214230254.790066-1-toke@redhat.comSigned-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      1c123c56
    • Minsuk Kang's avatar
      nfc: pn533: Clear nfc_target before being used · 9f281577
      Minsuk Kang authored
      Fix a slab-out-of-bounds read that occurs in nla_put() called from
      nfc_genl_send_target() when target->sensb_res_len, which is duplicated
      from an nfc_target in pn533, is too large as the nfc_target is not
      properly initialized and retains garbage values. Clear nfc_targets with
      memset() before they are used.
      
      Found by a modified version of syzkaller.
      
      BUG: KASAN: slab-out-of-bounds in nla_put
      Call Trace:
       memcpy
       nla_put
       nfc_genl_dump_targets
       genl_lock_dumpit
       netlink_dump
       __netlink_dump_start
       genl_family_rcv_msg_dumpit
       genl_rcv_msg
       netlink_rcv_skb
       genl_rcv
       netlink_unicast
       netlink_sendmsg
       sock_sendmsg
       ____sys_sendmsg
       ___sys_sendmsg
       __sys_sendmsg
       do_syscall_64
      
      Fixes: 673088fb ("NFC: pn533: Send ATR_REQ directly for active device detection")
      Fixes: 361f3cb7 ("NFC: DEP link hook implementation for pn533")
      Signed-off-by: default avatarMinsuk Kang <linuxlovemin@yonsei.ac.kr>
      Reviewed-by: default avatarKrzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Link: https://lore.kernel.org/r/20221214015139.119673-1-linuxlovemin@yonsei.ac.krSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      9f281577
    • Vladimir Oltean's avatar
      net: enetc: avoid buffer leaks on xdp_do_redirect() failure · 628050ec
      Vladimir Oltean authored
      Before enetc_clean_rx_ring_xdp() calls xdp_do_redirect(), each software
      BD in the RX ring between index orig_i and i can have one of 2 refcount
      values on its page.
      
      We are the owner of the current buffer that is being processed, so the
      refcount will be at least 1.
      
      If the current owner of the buffer at the diametrically opposed index
      in the RX ring (i.o.w, the other half of this page) has not yet called
      kfree(), this page's refcount could even be 2.
      
      enetc_page_reusable() in enetc_flip_rx_buff() tests for the page
      refcount against 1, and [ if it's 2 ] does not attempt to reuse it.
      
      But if enetc_flip_rx_buff() is put after the xdp_do_redirect() call,
      the page refcount can have one of 3 values. It can also be 0, if there
      is no owner of the other page half, and xdp_do_redirect() for this
      buffer ran so far that it triggered a flush of the devmap/cpumap bulk
      queue, and the consumers of those bulk queues also freed the buffer,
      all by the time xdp_do_redirect() returns the execution back to enetc.
      
      This is the reason why enetc_flip_rx_buff() is called before
      xdp_do_redirect(), but there is a big flaw with that reasoning:
      enetc_flip_rx_buff() will set rx_swbd->page = NULL on both sides of the
      enetc_page_reusable() branch, and if xdp_do_redirect() returns an error,
      we call enetc_xdp_free(), which does not deal gracefully with that.
      
      In fact, what happens is quite special. The page refcounts start as 1.
      enetc_flip_rx_buff() figures they're reusable, transfers these
      rx_swbd->page pointers to a different rx_swbd in enetc_reuse_page(), and
      bumps the refcount to 2. When xdp_do_redirect() later returns an error,
      we call the no-op enetc_xdp_free(), but we still haven't lost the
      reference to that page. A copy of it is still at rx_ring->next_to_alloc,
      but that has refcount 2 (and there are no concurrent owners of it in
      flight, to drop the refcount). What really kills the system is when
      we'll flip the rx_swbd->page the second time around. With an updated
      refcount of 2, the page will not be reusable and we'll really leak it.
      Then enetc_new_page() will have to allocate more pages, which will then
      eventually leak again on further errors from xdp_do_redirect().
      
      The problem, summarized, is that we zeroize rx_swbd->page before we're
      completely done with it, and this makes it impossible for the error path
      to do something with it.
      
      Since the packet is potentially multi-buffer and therefore the
      rx_swbd->page is potentially an array, manual passing of the old
      pointers between enetc_flip_rx_buff() and enetc_xdp_free() is a bit
      difficult.
      
      For the sake of going with a simple solution, we accept the possibility
      of racing with xdp_do_redirect(), and we move the flip procedure to
      execute only on the redirect success path. By racing, I mean that the
      page may be deemed as not reusable by enetc (having a refcount of 0),
      but there will be no leak in that case, either.
      
      Once we accept that, we have something better to do with buffers on
      XDP_REDIRECT failure. Since we haven't performed half-page flipping yet,
      we won't, either (and this way, we can avoid enetc_xdp_free()
      completely, which gives the entire page to the slab allocator).
      Instead, we'll call enetc_xdp_drop(), which will recycle this half of
      the buffer back to the RX ring.
      
      Fixes: 9d2b68cc ("net: enetc: add support for XDP_REDIRECT")
      Suggested-by: default avatarLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20221213001908.2347046-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      628050ec
  6. 14 Dec, 2022 12 commits