1. 25 Sep, 2024 25 commits
  2. 10 Sep, 2024 10 commits
    • vdpa/mlx5: Add the support of set mac address · 6d17035a
      Cindy Lu authored
      Add the function to support setting the MAC address.
      For vdpa/mlx5, the function uses mlx5_mpfs_add_mac()
      to set the MAC address.
      
      Tested on a ConnectX-6 Dx device.
      Signed-off-by: Cindy Lu <lulu@redhat.com>
      Message-Id: <20240731031653.1047692-4-lulu@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
    • vdpa_sim_net: Add the support of set mac address · 218bb7ec
      Cindy Lu authored
      Add the function to support setting the MAC address.
      For vdpa_sim_net, the driver will write the MAC address
      to the config space, and other devices can implement
      their own functions to support this.
      Signed-off-by: Cindy Lu <lulu@redhat.com>
      Message-Id: <20240731031653.1047692-3-lulu@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
    • vdpa: support set mac address from vdpa tool · 2f87e9cf
      Cindy Lu authored
      Add a new UAPI to support setting the MAC address from the vdpa tool.
      The function vdpa_nl_cmd_dev_attr_set_doit() gets the new MAC address
      from the vdpa tool and then sets it on the device.
      
      The usage is: vdpa dev set name vdpa_name mac **:**:**:**:**:**
      
      Here is an example:
      root@L1# vdpa -jp dev config show vdpa0
      {
          "config": {
              "vdpa0": {
                  "mac": "82:4d:e9:5d:d7:e6",
                  "link ": "up",
                  "link_announce ": false,
                  "mtu": 1500
              }
          }
      }
      
      root@L1# vdpa dev set name vdpa0 mac 00:11:22:33:44:55
      
      root@L1# vdpa -jp dev config show vdpa0
      {
          "config": {
              "vdpa0": {
                  "mac": "00:11:22:33:44:55",
                  "link ": "up",
                  "link_announce ": false,
                  "mtu": 1500
              }
          }
      }
      Signed-off-by: Cindy Lu <lulu@redhat.com>
      Message-Id: <20240731031653.1047692-2-lulu@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
    • tools/virtio:Fix the wrong format specifier · a8927f69
      Zhu Jun authored
      The unsigned int should use "%u" instead of "%d".
      Signed-off-by: Zhu Jun <zhujun2@cmss.chinamobile.com>
      Message-Id: <20240724074108.9530-1-zhujun2@cmss.chinamobile.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
      Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
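A small illustration of the mismatch this kind of fix addresses (a sketch only, not the code touched in tools/virtio; `format_count` is a hypothetical helper). Passing an unsigned int to "%d" makes printf-style functions treat it as a signed int, so values above INT_MAX are typically printed as negative numbers. With "%u" the value is printed as intended. The sketch assumes a 32-bit unsigned int.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not taken from tools/virtio: formats an unsigned
 * counter into buf with the matching "%u" specifier and returns the
 * number of characters snprintf() would have written.
 */
static int format_count(char *buf, size_t len, unsigned int count)
{
    return snprintf(buf, len, "%u", count);
}
```

Calling `format_count(buf, sizeof(buf), 3000000000u)` fills `buf` with the ten digits of the value, which a "%d" conversion could not represent.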
    • virtio_balloon: introduce memory scan/reclaim info · 74c025c5
      zhenwei pi authored
      Expose memory scan/reclaim information to the host side via virtio
      balloon device.
      
      Now we have a metric to analyze the memory performance:
      
      y: counter increases
      n: counter does not change
      h: the rate of counter change is high
      l: the rate of counter change is low
      
      OOM: VIRTIO_BALLOON_S_OOM_KILL
      STALL: VIRTIO_BALLOON_S_ALLOC_STALL
      ASCAN: VIRTIO_BALLOON_S_SCAN_ASYNC
      DSCAN: VIRTIO_BALLOON_S_SCAN_DIRECT
      ARCLM: VIRTIO_BALLOON_S_RECLAIM_ASYNC
      DRCLM: VIRTIO_BALLOON_S_RECLAIM_DIRECT
      
      - OOM[y], STALL[*], ASCAN[*], DSCAN[*], ARCLM[*], DRCLM[*]:
        the guest runs under really critical memory pressure
      
      - OOM[n], STALL[h], ASCAN[*], DSCAN[l], ARCLM[*], DRCLM[l]:
        the memory allocation stalls due to cgroup, not the global memory
        pressure.
      
      - OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[h]:
        the memory allocation stalls due to global memory pressure. The
        performance gets hurt a lot. A high ratio between DRCLM/DSCAN shows
        quite effective memory reclaiming.
      
      - OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[l]:
        the memory allocation stalls due to global memory pressure.
        The ratio between DRCLM/DSCAN is low and the guest OS is thrashing
        heavily; serious cases lead to poor performance and difficult
        troubleshooting. E.g., sshd may block on memory allocation when
        accepting new connections, so a user can't log in to a VM via ssh.
      
      - OOM[n], STALL[n], ASCAN[h], DSCAN[n], ARCLM[l], DRCLM[n]:
        the low ratio between ARCLM/ASCAN shows that the guest tries to
        reclaim more memory, but it can't. Once more memory is required in
        the future, it will struggle to reclaim memory.
      Acked-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
      Message-Id: <20240423034109.1552866-5-pizhenwei@bytedance.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
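The decision table in the message above can be sketched as a small classifier. This is an illustration under assumptions, not code from the patch: the type and function names are hypothetical, and how a host turns raw counter deltas into "none/low/high" rates is left out because the message does not define thresholds. Combinations not listed in the message fall through to an "unclassified" result.

```c
#include <stdbool.h>

/* Rate of change of a counter between two samples: n, l, h in the table. */
enum rate { RATE_NONE, RATE_LOW, RATE_HIGH };

/* Hypothetical labels for the five cases described in the commit message. */
enum pressure {
    PRESSURE_UNCLASSIFIED,
    PRESSURE_CRITICAL,              /* OOM[y] */
    PRESSURE_CGROUP_STALL,          /* STALL[h], DSCAN[l], DRCLM[l] */
    PRESSURE_GLOBAL_RECLAIMING,     /* STALL[h], DSCAN[h], DRCLM[h] */
    PRESSURE_GLOBAL_THRASHING,      /* STALL[h], DSCAN[h], DRCLM[l] */
    PRESSURE_ASYNC_RECLAIM_FAILING  /* STALL[n], ASCAN[h], ARCLM[l] */
};

struct balloon_rates {
    bool oom_increased;             /* OOM[y] or OOM[n] */
    enum rate stall, ascan, dscan, arclm, drclm;
};

static enum pressure classify(const struct balloon_rates *r)
{
    /* Any OOM kill means critical pressure, whatever the other counters say. */
    if (r->oom_increased)
        return PRESSURE_CRITICAL;

    if (r->stall == RATE_HIGH) {
        if (r->dscan == RATE_LOW && r->drclm == RATE_LOW)
            return PRESSURE_CGROUP_STALL;
        if (r->dscan == RATE_HIGH && r->drclm == RATE_HIGH)
            return PRESSURE_GLOBAL_RECLAIMING;
        if (r->dscan == RATE_HIGH && r->drclm == RATE_LOW)
            return PRESSURE_GLOBAL_THRASHING;
    }

    if (r->stall == RATE_NONE && r->ascan == RATE_HIGH &&
        r->dscan == RATE_NONE && r->arclm == RATE_LOW &&
        r->drclm == RATE_NONE)
        return PRESSURE_ASYNC_RECLAIM_FAILING;

    return PRESSURE_UNCLASSIFIED;
}
```

Taking pre-bucketed rates as input keeps the table logic separate from the sampling policy, which a host-side monitor would likely want to tune on its own.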
    • virtio_balloon: introduce memory allocation stall counter · c5b70a26
      zhenwei pi authored
      The memory allocation stall counter represents the performance/latency
      of memory allocation. Expose this counter to the host side through the
      virtio balloon device in an out-of-band way.
      Acked-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
      Message-Id: <20240423034109.1552866-4-pizhenwei@bytedance.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • virtio_balloon: introduce oom-kill invocations · 6cf1c97d
      zhenwei pi authored
      When the guest OS runs under critical memory pressure, the guest
      starts to kill processes. A guest monitor agent may scan 'oom_kill'
      from /proc/vmstat and report the OOM KILL event. However, the agent
      itself may be killed, and we would lose this critical event (and any
      later events).
      
      For now we can also grep for magic words in the guest kernel log from
      the host side. Rather than relying on this unstable approach, have
      virtio balloon report OOM-KILL invocations instead.
      Acked-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
      Message-Id: <20240423034109.1552866-3-pizhenwei@bytedance.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
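For context, the agent-side approach described in this message amounts to looking up the `oom_kill` line in /proc/vmstat, whose entries have the form "name value". A minimal sketch of that lookup, working on text already read into memory, is shown here. The function name `vmstat_counter` is hypothetical and the sketch is not part of the patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: finds a "<name> <value>" line in vmstat-style text.
 * Returns 0 and stores the value on success, -1 if the counter is absent.
 */
static int vmstat_counter(const char *text, const char *name,
                          unsigned long long *value)
{
    size_t len = strlen(name);
    const char *line = text;

    while (line && *line) {
        /* Require an exact name match followed by the separating space. */
        if (strncmp(line, name, len) == 0 && line[len] == ' ' &&
            sscanf(line + len, "%llu", value) == 1)
            return 0;

        /* Advance to the start of the next line, if there is one. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}
```

An agent built this way only works while it is alive inside the guest, which is the weakness the message points out and the reason for reporting the counter through the balloon device.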
    • virtio_pmem: Check device status before requesting flush · e25fbcd9
      Philip Chen authored
      If a pmem device is in a bad state, the driver could wait forever for
      the host ack in virtio_pmem_flush(), causing the system to hang.
      
      So add a status check at the beginning of virtio_pmem_flush() to return
      early if the device is not activated.
      Signed-off-by: Philip Chen <philipchen@chromium.org>
      Message-Id: <20240826215313.2673566-1-philipchen@chromium.org>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
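The early-return pattern described here can be sketched in user space as follows. This is a mock under stated assumptions, not the driver code: `struct sketch_pmem_dev` and `sketch_pmem_flush` are hypothetical, and the message does not say which status bit the patch tests, so the sketch assumes the DEVICE_NEEDS_RESET bit (0x40) from the virtio specification.

```c
#include <errno.h>

/* DEVICE_NEEDS_RESET status bit as defined by the virtio specification. */
#define SKETCH_S_NEEDS_RESET 0x40

/* Hypothetical stand-in for the device; not the kernel's virtio_device. */
struct sketch_pmem_dev {
    unsigned char status;      /* device status byte */
    int flushes_submitted;     /* counts requests that reached the "host" */
};

static int sketch_pmem_flush(struct sketch_pmem_dev *dev)
{
    /*
     * Check the status first: a request queued to a broken device may
     * never be acknowledged, and the caller would wait forever.
     */
    if (dev->status & SKETCH_S_NEEDS_RESET)
        return -EIO;

    /* Stands in for submitting the flush and waiting for the host ack. */
    dev->flushes_submitted++;
    return 0;
}
```

Failing fast with an error lets the caller handle a broken device instead of blocking in the flush path.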
    • vhost_vdpa: assign irq bypass producer token correctly · 02e9e936
      Jason Wang authored
      We used to call irq_bypass_unregister_producer() in
      vhost_vdpa_setup_vq_irq() which is problematic as we don't know if the
      token pointer is still valid or not.
      
      Actually, we use the eventfd_ctx as the token so the life cycle of the
      token should be bound to the VHOST_SET_VRING_CALL instead of
      vhost_vdpa_setup_vq_irq() which could be called by set_status().
      
      Fix this by setting up the irq bypass producer's token when handling
      VHOST_SET_VRING_CALL and unregistering the producer before calling
      vhost_vring_ioctl(), which prevents a possible use-after-free since the
      eventfd could have been released in vhost_vring_ioctl(). Registering
      and unregistering are only done if DRIVER_OK is set.
      Reported-by: Dragos Tatulea <dtatulea@nvidia.com>
      Tested-by: Dragos Tatulea <dtatulea@nvidia.com>
      Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
      Fixes: 2cf1ba9a ("vhost_vdpa: implement IRQ offloading in vhost_vdpa")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <20240816031900.18013-1-jasowang@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    • vdpa/mlx5: Fix invalid mr resource destroy · dc125029
      Dragos Tatulea authored
      Certain error paths from mlx5_vdpa_dev_add() can end up releasing mr
      resources which never got initialized in the first place.
      
      This patch adds the missing check in mlx5_vdpa_destroy_mr_resources()
      to block releasing non-initialized mr resources.
      
      Reference trace:
      
        mlx5_core 0000:08:00.2: mlx5_vdpa_dev_add:3274:(pid 2700) warning: No mac address provisioned?
        BUG: kernel NULL pointer dereference, address: 0000000000000000
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 140216067 P4D 0
        Oops: 0000 [#1] PREEMPT SMP NOPTI
        CPU: 8 PID: 2700 Comm: vdpa Kdump: loaded Not tainted 5.14.0-496.el9.x86_64 #1
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        RIP: 0010:vhost_iotlb_del_range+0xf/0xe0 [vhost_iotlb]
        Code: [...]
        RSP: 0018:ff1c823ac23077f0 EFLAGS: 00010246
        RAX: ffffffffc1a21a60 RBX: ffffffff899567a0 RCX: 0000000000000000
        RDX: ffffffffffffffff RSI: 0000000000000000 RDI: 0000000000000000
        RBP: ff1bda1f7c21e800 R08: 0000000000000000 R09: ff1c823ac2307670
        R10: ff1c823ac2307668 R11: ffffffff8a9e7b68 R12: 0000000000000000
        R13: 0000000000000000 R14: ff1bda1f43e341a0 R15: 00000000ffffffea
        FS:  00007f56eba7c740(0000) GS:ff1bda269f800000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000000 CR3: 0000000104d90001 CR4: 0000000000771ef0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        PKRU: 55555554
        Call Trace:
      
         ? show_trace_log_lvl+0x1c4/0x2df
         ? show_trace_log_lvl+0x1c4/0x2df
         ? mlx5_vdpa_free+0x3d/0x150 [mlx5_vdpa]
         ? __die_body.cold+0x8/0xd
         ? page_fault_oops+0x134/0x170
         ? __irq_work_queue_local+0x2b/0xc0
         ? irq_work_queue+0x2c/0x50
         ? exc_page_fault+0x62/0x150
         ? asm_exc_page_fault+0x22/0x30
         ? __pfx_mlx5_vdpa_free+0x10/0x10 [mlx5_vdpa]
         ? vhost_iotlb_del_range+0xf/0xe0 [vhost_iotlb]
         mlx5_vdpa_free+0x3d/0x150 [mlx5_vdpa]
         vdpa_release_dev+0x1e/0x50 [vdpa]
         device_release+0x31/0x90
         kobject_cleanup+0x37/0x130
         mlx5_vdpa_dev_add+0x2d2/0x7a0 [mlx5_vdpa]
         vdpa_nl_cmd_dev_add_set_doit+0x277/0x4c0 [vdpa]
         genl_family_rcv_msg_doit+0xd9/0x130
         genl_family_rcv_msg+0x14d/0x220
         ? __pfx_vdpa_nl_cmd_dev_add_set_doit+0x10/0x10 [vdpa]
         ? _copy_to_user+0x1a/0x30
         ? move_addr_to_user+0x4b/0xe0
         genl_rcv_msg+0x47/0xa0
         ? __import_iovec+0x46/0x150
         ? __pfx_genl_rcv_msg+0x10/0x10
         netlink_rcv_skb+0x54/0x100
         genl_rcv+0x24/0x40
         netlink_unicast+0x245/0x370
         netlink_sendmsg+0x206/0x440
         __sys_sendto+0x1dc/0x1f0
         ? do_read_fault+0x10c/0x1d0
         ? do_pte_missing+0x10d/0x190
         __x64_sys_sendto+0x20/0x30
         do_syscall_64+0x5c/0xf0
         ? __count_memcg_events+0x4f/0xb0
         ? mm_account_fault+0x6c/0x100
         ? handle_mm_fault+0x116/0x270
         ? do_user_addr_fault+0x1d6/0x6a0
         ? do_syscall_64+0x6b/0xf0
         ? clear_bhb_loop+0x25/0x80
         ? clear_bhb_loop+0x25/0x80
         ? clear_bhb_loop+0x25/0x80
         ? clear_bhb_loop+0x25/0x80
         ? clear_bhb_loop+0x25/0x80
         entry_SYSCALL_64_after_hwframe+0x78/0x80
      
      Fixes: 512c0cdd ("vdpa/mlx5: Decouple cvq iotlb handling from hw mapping code")
      Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
      Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
      Message-Id: <20240827160808.2448017-2-dtatulea@nvidia.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
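The kind of guard this patch describes can be sketched generically: a destroy routine that is reachable from error paths checks whether initialization ever completed before it touches the resources. The names here (`struct sketch_mr_resources`, `sketch_mr_init`, `sketch_mr_destroy`) and the `initialized` flag are assumptions made for illustration; the message does not show how the actual check in mlx5_vdpa_destroy_mr_resources() is written.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical resource bundle; not the mlx5_vdpa data structures. */
struct sketch_mr_resources {
    bool initialized;   /* set only after init fully succeeds */
    int *iotlb;         /* stands in for the real iotlb object */
};

static int sketch_mr_init(struct sketch_mr_resources *res)
{
    res->iotlb = calloc(1, sizeof(*res->iotlb));
    if (!res->iotlb)
        return -1;
    res->initialized = true;
    return 0;
}

static void sketch_mr_destroy(struct sketch_mr_resources *res)
{
    /* Error paths may call this before init ran: nothing to release. */
    if (!res->initialized)
        return;

    *res->iotlb = 0;    /* would dereference NULL without the guard */
    free(res->iotlb);
    res->iotlb = NULL;
    res->initialized = false;
}
```

With the guard in place, calling the destroy routine on a zero-initialized structure is a no-op, which corresponds to the failing path in the trace above where the device is released right after an early error in mlx5_vdpa_dev_add().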
  3. 01 Sep, 2024 3 commits
    • Linux 6.11-rc6 · 431c1646
      Linus Torvalds authored
    • Merge tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 6b9ffc45
      Linus Torvalds authored
      Pull smb client fixes from Steve French:
      
       - copy_file_range fix
      
       - two read fixes including read past end of file rc fix and read retry
         crediting fix
      
       - falloc zero range fix
      
      * tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: Fix FALLOC_FL_ZERO_RANGE to preflush buffered part of target region
        cifs: Fix copy offload to flush destination region
        netfs, cifs: Fix handling of short DIO read
        cifs: Fix lack of credit renegotiation on read retry
    • Merge tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs · a4c76312
      Linus Torvalds authored
      Pull bcachefs fixes from Kent Overstreet:
       "The data corruption in the buffered write path is troubling; inode
        lock should not have been able to cause that...
      
         - Fix a rare data corruption in the rebalance path, caught as a nonce
           inconsistency on encrypted filesystems
      
         - Revert lockless buffered write path
      
         - Mark more errors as autofix"
      
      * tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs:
        bcachefs: Mark more errors as autofix
        bcachefs: Revert lockless buffered IO path
        bcachefs: Fix bch2_extents_match() false positive
        bcachefs: Fix failure to return error in data_update_index_update()
  4. 31 Aug, 2024 2 commits
    • bcachefs: Mark more errors as autofix · 3d3020c4
      Kent Overstreet authored
      errors that are known to always be safe to fix should be autofix: this
      should be most errors even at this point, but that will need some
      thorough review.
      
      note that errors are still logged in the superblock, so we'll still know
      that they happened.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: Revert lockless buffered IO path · e3e69409
      Kent Overstreet authored
      We had a report of data corruption on nixos when building installer
      images.
      
      https://github.com/NixOS/nixpkgs/pull/321055#issuecomment-2184131334
      
      It seems that writes are being dropped, but only when issued by QEMU,
      and possibly only in snapshot mode. It's undetermined whether it is
      write calls or dirty folios that are being dropped.
      
      Further testing, via minimizing the original patch to just the change
      that skips the inode lock on non appends/truncates, reveals that it
      really is just not taking the inode lock that causes the corruption: it
      has nothing to do with the other logic changes for preserving write
      atomicity in corner cases.
      
      It's also kernel config dependent: it doesn't reproduce with the minimal
      kernel config that ktest uses, but it does reproduce with nixos's distro
      config. Bisecting the kernel config initially pointed the finger at page
      migration or compaction, but it appears that was erroneous; we haven't
      yet determined what kernel config option actually triggers it.
      
      Sadly it appears this will have to be reverted since we're getting too
      close to release and my plate is full, but we'd _really_ like to fully
      debug it.
      
      My suspicion is that this patch is exposing a preexisting bug - the
      inode lock actually covers very little in IO paths, and we have a
      different lock (the pagecache add lock) that guards against races with
      truncate here.
      
      Fixes: 7e64c86c ("bcachefs: Buffered write path now can avoid the inode lock")
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>