1. 04 Jun, 2020 1 commit
    • David Howells's avatar
      afs: Build an abstraction around an "operation" concept · e49c7b2f
      David Howells authored
      Turn the afs_operation struct into the main way that most fileserver
      operations are managed.  Various things are added to the struct, including
      the following:
      
       (1) All the parameters and results of the relevant operations are moved
           into it, removing corresponding fields from the afs_call struct.
           afs_call gets a pointer to the op.
      
       (2) The target volume is made the main focus of the operation, rather than
           the target vnode(s), and a bunch of op->vnode->volume are made
           op->volume instead.
      
       (3) Two vnode records are defined (op->file[]) for the vnode(s) involved
           in most operations.  The vnode record (struct afs_vnode_param)
           contains:
      
      	- The vnode pointer.
      
      	- The fid of the vnode to be included in the parameters or that was
                returned in the reply (eg. FS.MakeDir).
      
      	- The status and callback information that may be returned in the
           	  reply about the vnode.
      
      	- Callback break and data version tracking for detecting
                simultaneous third-parth changes.
      
       (4) Pointers to dentries to be updated with new inodes.
      
       (5) An operations table pointer.  The table includes pointers to functions
           for issuing AFS and YFS-variant RPCs, handling the success and abort
           of an operation and handling post-I/O-lock local editing of a
           directory.
      
      To make this work, the following function restructuring is made:
      
       (A) The rotation loop that issues calls to fileservers that can be found
           in each function that wants to issue an RPC (such as afs_mkdir()) is
           extracted out into common code, in a new file called fs_operation.c.
      
       (B) The rotation loops, such as the one in afs_mkdir(), are replaced with
           a much smaller piece of code that allocates an operation, sets the
           parameters and then calls out to the common code to do the actual
           work.
      
       (C) The code for handling the success and failure of an operation are
           moved into operation functions (as (5) above) and these are called
           from the core code at appropriate times.
      
       (D) The pseudo inode getting stuff used by the dynamic root code is moved
           over into dynroot.c.
      
       (E) struct afs_iget_data is absorbed into the operation struct and
           afs_iget() expects to be given an op pointer and a vnode record.
      
       (F) Point (E) doesn't work for the root dir of a volume, but we know the
           FID in advance (it's always vnode 1, unique 1), so a separate inode
           getter, afs_root_iget(), is provided to special-case that.
      
       (G) The inode status init/update functions now also take an op and a vnode
           record.
      
       (H) The RPC marshalling functions now, for the most part, just take an
           afs_operation struct as their only argument.  All the data they need
           is held there.  The result delivery functions write their answers
           there as well.
      
       (I) The call is attached to the operation and then the operation core does
           the waiting.
      
      And then the new operation code is, for the moment, made to just initialise
      the operation, get the appropriate vnode I/O locks and do the same rotation
      loop as before.
      
      This lays the foundation for the following changes in the future:
      
       (*) Overhauling the rotation (again).
      
       (*) Support for asynchronous I/O, where the fileserver rotation must be
           done asynchronously also.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      e49c7b2f
  2. 31 May, 2020 12 commits
    • David Howells's avatar
      afs: Rename struct afs_fs_cursor to afs_operation · a310082f
      David Howells authored
      As a prelude to implementing asynchronous fileserver operations in the afs
      filesystem, rename struct afs_fs_cursor to afs_operation.
      
      This struct is going to form the core of the operation management and is
      going to acquire more members in later.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      a310082f
    • David Howells's avatar
      afs: Remove the error argument from afs_protocol_error() · 7126ead9
      David Howells authored
      Remove the error argument from afs_protocol_error() as it's always
      -EBADMSG.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      7126ead9
    • David Howells's avatar
      afs: Set error flag rather than return error from file status decode · 38355eec
      David Howells authored
      Set a flag in the call struct to indicate an unmarshalling error rather
      than return and handle an error from the decoding of file statuses.  This
      flag is checked on a successful return from the delivery function.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      38355eec
    • David Howells's avatar
      afs: Make callback processing more efficient. · 8230fd82
      David Howells authored
      afs_vol_interest objects represent the volume IDs currently being accessed
      from a fileserver.  These hold lists of afs_cb_interest objects that
      repesent the superblocks using that volume ID on that server.
      
      When a callback notification from the server telling of a modification by
      another client arrives, the volume ID specified in the notification is
      looked up in the server's afs_vol_interest list.  Through the
      afs_cb_interest list, the relevant superblocks can be iterated over and the
      specific inode looked up and marked in each one.
      
      Make the following efficiency improvements:
      
       (1) Hold rcu_read_lock() over the entire processing rather than locking it
           each time.
      
       (2) Do all the callbacks for each vid together rather than individually.
           Each volume then only needs to be looked up once.
      
       (3) afs_vol_interest objects are now stored in an rb_tree rather than a
           flat list to reduce the lookup step count.
      
       (4) afs_vol_interest lookup is now done with RCU, but because it's in an
           rb_tree which may rotate under us, a seqlock is used so that if it
           changes during the walk, we repeat the walk with a lock held.
      
      With this and the preceding patch which adds RCU-based lookups in the inode
      cache, target volumes/vnodes can be taken without the need to take any
      locks, except on the target itself.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      8230fd82
    • David Howells's avatar
      afs: Show more information in /proc/net/afs/servers · 6d043a57
      David Howells authored
      Show more information in /proc/net/afs/servers to make it easier to see
      what's going on with the server probing.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      6d043a57
    • David Howells's avatar
      afs: Actively poll fileservers to maintain NAT or firewall openings · f6cbb368
      David Howells authored
      When an AFS client accesses a file, it receives a limited-duration callback
      promise that the server will notify it if another client changes a file.
      This callback duration can be a few hours in length.
      
      If a client mounts a volume and then an application prevents it from being
      unmounted, say by chdir'ing into it, but then does nothing for some time,
      the rxrpc_peer record will expire and rxrpc-level keepalive will cease.
      
      If there is NAT or a firewall between the client and the server, the route
      back for the server may close after a comparatively short duration, meaning
      that attempts by the server to notify the client may then bounce.
      
      The client, however, may (so far as it knows) still have a valid unexpired
      promise and will then rely on its cached data and will not see changes made
      on the server by a third party until it incidentally rechecks the status or
      the promise needs renewal.
      
      To deal with this, the client needs to regularly probe the server.  This
      has two effects: firstly, it keeps a route open back for the server, and
      secondly, it causes the server to disgorge any notifications that got
      queued up because they couldn't be sent.
      
      Fix this by adding a mechanism to emit regular probes.
      
      Two levels of probing are made available: Under normal circumstances the
      'slow' queue will be used for a fileserver - this just probes the preferred
      address once every 5 mins or so; however, if server fails to respond to any
      probes, the server will shift to the 'fast' queue from which all its
      interfaces will be probed every 30s.  When it finally responds, the record
      will switch back to the slow queue.
      
      Further notes:
      
       (1) Probing is now no longer driven from the fileserver rotation
           algorithm.
      
       (2) Probes are dispatched to all interfaces on a fileserver when that an
           afs_server object is set up to record it.
      
       (3) The afs_server object is removed from the probe queues when we start
           to probe it.  afs_is_probing_server() returns true if it's not listed
           - ie. it's undergoing probing.
      
       (4) The afs_server object is added back on to the probe queue when the
           final outstanding probe completes, but the probed_at time is set when
           we're about to launch a probe so that it's not dependent on the probe
           duration.
      
       (5) The timer and the work item added for this must be handed a count on
           net->servers_outstanding, which they hand on or release.  This makes
           sure that network namespace cleanup waits for them.
      
      Fixes: d2ddc776 ("afs: Overhaul volume and server record caching and fileserver rotation")
      Reported-by: default avatarDave Botsch <botsch@cnf.cornell.edu>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      f6cbb368
    • David Howells's avatar
      afs: Split the usage count on struct afs_server · 977e5f8e
      David Howells authored
      Split the usage count on the afs_server struct to have an active count that
      registers who's actually using it separately from the reference count on
      the object.
      
      This allows a future patch to dispatch polling probes without advancing the
      "unuse" time into the future each time we emit a probe, which would
      otherwise prevent unused server records from expiring.
      
      Included in this:
      
       (1) The latter part of afs_destroy_server() in which the RCU destruction
           of afs_server objects is invoked and the outstanding server count is
           decremented is split out into __afs_put_server().
      
       (2) afs_put_server() now calls __afs_put_server() rather then setting the
           management timer.
      
       (3) The calls begun by afs_fs_give_up_all_callbacks() and
           afs_fs_get_capabilities() can now take a ref on the server record, so
           afs_destroy_server() can just drop its ref and needn't wait for the
           completion of these calls.  They'll put the ref when they're done.
      
       (4) Because of (3), afs_fs_probe_done() no longer needs to wake up
           afs_destroy_server() with server->probe_outstanding.
      
       (5) afs_gc_servers can be simplified.  It only needs to check if
           server->active is 0 rather than playing games with the refcount.
      
       (6) afs_manage_servers() can propose a server for gc if usage == 0 rather
           than if ref == 1.  The gc is effected by (5).
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      977e5f8e
    • David Howells's avatar
      afs: Use the serverUnique field in the UVLDB record to reduce rpc ops · 81006805
      David Howells authored
      The U-version VLDB volume record retrieved by the VL.GetEntryByNameU rpc op
      carries a change counter (the serverUnique field) for each fileserver
      listed in the record as backing that volume.  This is incremented whenever
      the registration details for a fileserver change (such as its address
      list).  Note that the same value will be seen in all UVLDB records that
      refer to that fileserver.
      
      This should be checked before calling the VL server to re-query the address
      list for a fileserver.  If it's the same, there's no point doing the query.
      Reported-by: default avatarJeffrey Altman <jaltman@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      81006805
    • David Howells's avatar
      afs: Always include dir in bulk status fetch from afs_do_lookup() · 13fcc635
      David Howells authored
      When a lookup is done in an AFS directory, the filesystem will speculate
      and fetch up to 49 other statuses for files in the same directory and fetch
      those as well, turning them into inodes or updating inodes that already
      exist.
      
      However, occasionally, a callback break might go missing due to NAT timing
      out, but the afs filesystem doesn't then realise that the directory is not
      up to date.
      
      Alleviate this by using one of the status slots to check the directory in
      which the lookup is being done.
      Reported-by: default avatarDave Botsch <botsch@cnf.cornell.edu>
      Suggested-by: default avatarJeffrey Altman <jaltman@auristor.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      13fcc635
    • David Howells's avatar
      rxrpc: Adjust /proc/net/rxrpc/calls to display call->debug_id not user_ID · 32f71aa4
      David Howells authored
      The user ID value isn't actually much use - and leaks a kernel pointer or a
      userspace value - so replace it with the call debug ID, which appears in trace
      points.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      32f71aa4
    • David Howells's avatar
      rxrpc: Map the EACCES error produced by some ICMP6 to EHOSTUNREACH · 23e2db31
      David Howells authored
      Map the EACCES error that is produced by some ICMP6 packets to EHOSTUNREACH
      when we get them as EACCES has other meanings within a filesystem context.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      23e2db31
    • David Howells's avatar
      vfs, afs, ext4: Make the inode hash table RCU searchable · 3f19b2ab
      David Howells authored
      Make the inode hash table RCU searchable so that searches that want to
      access or modify an inode without taking a ref on that inode can do so
      without taking the inode hash table lock.
      
      The main thing this requires is some RCU annotation on the list
      manipulation operations.  Inodes are already freed by RCU in most cases.
      
      Users of this interface must take care as the inode may be still under
      construction or may be being torn down around them.
      
      There are at least three instances where this can be of use:
      
       (1) Testing whether the inode number iunique() is going to return is
           currently unique (the iunique_lock is still held).
      
       (2) Ext4 date stamp updating.
      
       (3) AFS callback breaking.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      cc: linux-ext4@vger.kernel.org
      cc: linux-afs@lists.infradead.org
      3f19b2ab
  3. 24 May, 2020 5 commits
    • Linus Torvalds's avatar
      Linux 5.7-rc7 · 9cb1fd0e
      Linus Torvalds authored
      9cb1fd0e
    • Linus Torvalds's avatar
      Merge tag 'efi-urgent-2020-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 98790bba
      Linus Torvalds authored
      Pull EFI fixes from Thomas Gleixner:
       "A set of EFI fixes:
      
         - Don't return a garbage screen info when EFI framebuffer is not
           available
      
         - Make the early EFI console work properly with wider fonts instead
           of drawing garbage
      
         - Prevent a memory buffer leak in allocate_e820()
      
         - Print the firmware error record properly so it can be decoded by
           users
      
         - Fix a symbol clash in the host tool build which only happens with
           newer compilers.
      
         - Add a missing check for the event log version of TPM which caused
           boot failures on several Dell systems due to an attempt to decode
           SHA-1 format with the crypto agile algorithm"
      
      * tag 'efi-urgent-2020-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        tpm: check event log version before reading final events
        efi: Pull up arch-specific prototype efi_systab_show_arch()
        x86/boot: Mark global variables as static
        efi: cper: Add support for printing Firmware Error Record Reference
        efi/libstub/x86: Avoid EFI map buffer alloc in allocate_e820()
        efi/earlycon: Fix early printk for wider fonts
        efi/libstub: Avoid returning uninitialized data from setup_graphics()
      98790bba
    • Linus Torvalds's avatar
      Merge tag 'x86-urgent-2020-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 667b6249
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "Two fixes for x86:
      
         - Unbreak stack dumps for inactive tasks by interpreting the special
           first frame left by __switch_to_asm() correctly.
      
           The recent change not to skip the first frame so ORC and frame
           unwinder behave in the same way caused all entries to be
           unreliable, i.e. prepended with '?'.
      
         - Use cpumask_available() instead of an implicit NULL check of a
           cpumask_var_t in mmio trace to prevent a Clang build warning"
      
      * tag 'x86-urgent-2020-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/unwind/orc: Fix unwind_get_return_address_ptr() for inactive tasks
        x86/mmiotrace: Use cpumask_available() for cpumask_var_t variables
      667b6249
    • Linus Torvalds's avatar
      Merge tag 'sched-urgent-2020-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9e61d12b
      Linus Torvalds authored
      Pull scheduler fixes from Thomas Gleixner:
       "A set of fixes for the scheduler:
      
         - Fix handling of throttled parents in enqueue_task_fair() completely.
      
           The recent fix overlooked a corner case where the first iteration
           terminates due to an entity already being on the runqueue which
           makes the list management incomplete and later triggers the
           assertion which checks for completeness.
      
         - Fix a similar problem in unthrottle_cfs_rq().
      
         - Show the correct uclamp values in procfs which prints the effective
           value twice instead of requested and effective"
      
      * tag 'sched-urgent-2020-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/fair: Fix unthrottle_cfs_rq() for leaf_cfs_rq list
        sched/debug: Fix requested task uclamp values shown in procfs
        sched/fair: Fix enqueue_task_fair() warning some more
      9e61d12b
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · caffb99b
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix RCU warnings in ipv6 multicast router code, from Madhuparna
          Bhowmik.
      
       2) Nexthop attributes aren't being checked properly because of
          mis-initialized iterator, from David Ahern.
      
       3) Revert iop_idents_reserve() change as it caused performance
          regressions and was just working around what is really a UBSAN bug
          in the compiler. From Yuqi Jin.
      
       4) Read MAC address properly from ROM in bmac driver (double iteration
          proceeds past end of address array), from Jeremy Kerr.
      
       5) Add Microsoft Surface device IDs to r8152, from Marc Payne.
      
       6) Prevent reference to freed SKB in __netif_receive_skb_core(), from
          Boris Sukholitko.
      
       7) Fix ACK discard behavior in rxrpc, from David Howells.
      
       8) Preserve flow hash across packet scrubbing in wireguard, from Jason
          A. Donenfeld.
      
       9) Cap option length properly for SO_BINDTODEVICE in AX25, from Eric
          Dumazet.
      
      10) Fix encryption error checking in kTLS code, from Vadim Fedorenko.
      
      11) Missing BPF prog ref release in flow dissector, from Jakub Sitnicki.
      
      12) dst_cache must be used with BH disabled in tipc, from Eric Dumazet.
      
      13) Fix use after free in mlxsw driver, from Jiri Pirko.
      
      14) Order kTLS key destruction properly in mlx5 driver, from Tariq
          Toukan.
      
      15) Check devm_platform_ioremap_resource() return value properly in
          several drivers, from Tiezhu Yang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits)
        net: smsc911x: Fix runtime PM imbalance on error
        net/mlx4_core: fix a memory leak bug.
        net: ethernet: ti: cpsw: fix ASSERT_RTNL() warning during suspend
        net: phy: mscc: fix initialization of the MACsec protocol mode
        net: stmmac: don't attach interface until resume finishes
        net: Fix return value about devm_platform_ioremap_resource()
        net/mlx5: Fix error flow in case of function_setup failure
        net/mlx5e: CT: Correctly get flow rule
        net/mlx5e: Update netdev txq on completions during closure
        net/mlx5: Annotate mutex destroy for root ns
        net/mlx5: Don't maintain a case of del_sw_func being null
        net/mlx5: Fix cleaning unmanaged flow tables
        net/mlx5: Fix memory leak in mlx5_events_init
        net/mlx5e: Fix inner tirs handling
        net/mlx5e: kTLS, Destroy key object after destroying the TIS
        net/mlx5e: Fix allowed tc redirect merged eswitch offload cases
        net/mlx5: Avoid processing commands before cmdif is ready
        net/mlx5: Fix a race when moving command interface to events mode
        net/mlx5: Add command entry handling completion
        rxrpc: Fix a memory leak in rxkad_verify_response()
        ...
      caffb99b
  4. 23 May, 2020 22 commits