1. 11 Nov, 2010 1 commit
  2. 09 Nov, 2010 3 commits
    • ceph: explicitly specify page alignment in network messages · c5c6b19d
      Sage Weil authored
      The alignment used for reading data into or out of pages used to be taken
      from the data_off field in the message header.  This only worked as long
      as the page alignment matched the object offset, breaking direct io to
      non-page aligned offsets.
      
      Instead, explicitly specify the page alignment next to the page vector
      in the ceph_msg struct, and use that instead of the message header (which
      probably shouldn't be trusted).  The alloc_msg callback is responsible for
      filling in this field properly when it sets up the page vector.
      Signed-off-by: Sage Weil <sage@newdream.net>
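As a rough illustration of the idea in c5c6b19d, and not the kernel's actual code, the sketch below keeps the alignment next to the page vector and uses it to decide how incoming bytes are split across pages. The names `sketch_msg`, `SKETCH_PAGE_SIZE`, `pages_needed`, and `first_chunk_len` are hypothetical, and the 4096-byte page size is an assumption.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size, for illustration only */

/* Hypothetical stand-in for a message that carries a page vector. */
struct sketch_msg {
    void **pages;            /* page vector set up by the allocator */
    size_t nr_pages;
    unsigned page_alignment; /* where the data starts inside pages[0] */
};

/* Pages needed to hold len bytes that start page_alignment bytes into page 0.
 * Assumes page_alignment < SKETCH_PAGE_SIZE. */
static size_t pages_needed(unsigned page_alignment, size_t len)
{
    return (page_alignment + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Bytes that land in the first page before the reader moves to the next one. */
static size_t first_chunk_len(const struct sketch_msg *msg, size_t len)
{
    size_t room = SKETCH_PAGE_SIZE - msg->page_alignment;
    return len < room ? len : room;
}
```

With an alignment of 512, a 4096-byte payload spans two pages. The reader takes the alignment from the message struct rather than from a header field supplied by the peer.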
    • ceph: make page alignment explicit in osd interface · b7495fc2
      Sage Weil authored
      We used to infer alignment of IOs within a page based on the file offset,
      which assumed they matched.  This broke with direct IO that was not aligned
      to pages (e.g., 512-byte aligned IO).  We were also trusting the alignment
      specified in the OSD reply, which could have been adjusted by the server.
      
      Explicitly specify the page alignment when setting up OSD IO requests.
      Signed-off-by: Sage Weil <sage@newdream.net>
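The mismatch described in b7495fc2 can be shown with a small sketch. It is an illustration under assumptions, not the OSD client code: `sketch_osd_req`, `make_req`, and `inferred_page_off` are hypothetical names, and the page size is assumed to be 4096 bytes.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size, for illustration only */

/* Old behaviour as described: guess the in-page offset from the file offset. */
static unsigned inferred_page_off(uint64_t file_off)
{
    return (unsigned)(file_off & (SKETCH_PAGE_SIZE - 1));
}

/* Hypothetical request that records the alignment of the caller's buffer. */
struct sketch_osd_req {
    uint64_t file_off;
    uint64_t len;
    unsigned page_alignment;
};

/* The alignment comes from the buffer address, not from the file offset. */
static struct sketch_osd_req make_req(uint64_t file_off, uint64_t len,
                                      uintptr_t buf_addr)
{
    struct sketch_osd_req r;
    r.file_off = file_off;
    r.len = len;
    r.page_alignment = (unsigned)(buf_addr & (SKETCH_PAGE_SIZE - 1));
    return r;
}
```

For a direct IO at file offset 512 from a buffer that sits 1024 bytes into its page, the inferred offset is 512 and the explicit alignment is 1024, so the two disagree. For page-cache IO the two values normally coincide, which is why the old inference appeared to work.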
    • ceph: fix comment, remove extraneous args · e98b6fed
      Sage Weil authored
      The offset/length arguments aren't used.
      Signed-off-by: Sage Weil <sage@newdream.net>
  3. 08 Nov, 2010 4 commits
    • ceph: fix update of ctime from MDS · d8672d64
      Sage Weil authored
      The client can have a newer ctime than the MDS due to AUTH_EXCL and
      XATTR_EXCL caps as well; update the check in ceph_fill_file_time
      appropriately.
      
      This fixes cases where ctime/mtime goes backward under the right sequence
      of local updates (e.g. chmod) and mds replies (e.g. subsequent stat that
      goes to the MDS).
      Signed-off-by: Sage Weil <sage@newdream.net>
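One way to picture the check described in d8672d64 is the sketch below. It is not the body of ceph_fill_file_time: the cap bit values are placeholders, times are modelled as plain integers, and including a FILE_EXCL bit is an assumption drawn from the words "as well" in the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cap bits; the values are placeholders, not the kernel's. */
#define SK_CAP_FILE_EXCL  0x1u
#define SK_CAP_AUTH_EXCL  0x2u
#define SK_CAP_XATTR_EXCL 0x4u

/* With any of these caps the client may have changed ctime locally. */
static bool client_ctime_may_be_newer(unsigned issued)
{
    return (issued & (SK_CAP_FILE_EXCL | SK_CAP_AUTH_EXCL |
                      SK_CAP_XATTR_EXCL)) != 0;
}

/* Returns the ctime to keep after an MDS reply. */
static int64_t merge_ctime(unsigned issued, int64_t local, int64_t from_mds)
{
    if (client_ctime_may_be_newer(issued) && local > from_mds)
        return local;   /* never move ctime backward */
    return from_mds;
}
```

A local chmod under an exclusive auth cap followed by a stat answered by the MDS corresponds to `merge_ctime(SK_CAP_AUTH_EXCL, local, from_mds)` with `local > from_mds`, where the local value is kept.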
    • ceph: fix version check on racing inode updates · 8bd59e01
      Sage Weil authored
      We may get updates on the same inode from multiple MDSs; generally we only
      pay attention if the update is newer than what we already have.  The
      exception is when an MDS sends unstable information, in which case we
      always update.
      
      The old > check got this wrong when our version was odd (e.g. 3) and the
      reply version was even (e.g. 2): the older stale (v2) info would be
      applied.  Fixed and clarified the comment.
      Signed-off-by: Sage Weil <sage@newdream.net>
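The off-by-one in 8bd59e01 is easier to see with concrete numbers. The sketch below assumes that an odd version marks unstable information and an even version marks stable information, and that the comparison masks off the low bit of the local version. Both points are assumptions inferred from the example in the message, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: odd version = unstable info, even version = stable info. */

/* Old check as described: skip only if ours is strictly newer. */
static bool skip_update_old(uint64_t ours, uint64_t theirs)
{
    return (ours & ~(uint64_t)1) > theirs;
}

/* Fixed check: skip when ours is at least as new as theirs. */
static bool skip_update_fixed(uint64_t ours, uint64_t theirs)
{
    return (ours & ~(uint64_t)1) >= theirs;
}
```

With ours = 3 and theirs = 2, the old check compares 2 > 2, which is false, so the stale v2 info is applied. The fixed check compares 2 >= 2 and skips it. With ours = 3 and theirs = 3, the fixed check still updates, which matches the rule that unstable information is always taken.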
    • ceph: fix uid/gid on resent mds requests · cb4276cc
      Sage Weil authored
      MDS requests can be rebuilt and resent in non-process context, but were
      filling in uid/gid from current_fsuid/gid.  Put that information in the
      request struct on request setup.
      
      This fixes incorrect (and root) uid/gid getting set for requests that
      are forwarded between MDSs, usually due to metadata migrations.
      Signed-off-by: Sage Weil <sage@newdream.net>
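The pattern behind cb4276cc, capturing credentials once at request setup and reusing them whenever the message is rebuilt, might look like the sketch below. All names are hypothetical, and the two globals merely stand in for the credentials of whichever context happens to be running.

```c
#include <stdint.h>

/* Stand-in for the credentials of the currently running context. */
static uint32_t sketch_current_uid;
static uint32_t sketch_current_gid;

/* Hypothetical request: credentials are stored when it is created. */
struct sketch_mds_req {
    uint32_t uid;
    uint32_t gid;
};

struct sketch_mds_msg {
    uint32_t caller_uid;
    uint32_t caller_gid;
};

/* Runs in the caller's process context. */
static struct sketch_mds_req setup_request(void)
{
    struct sketch_mds_req r = { sketch_current_uid, sketch_current_gid };
    return r;
}

/* May run later in another context; reads only the stored values. */
static struct sketch_mds_msg build_message(const struct sketch_mds_req *r)
{
    struct sketch_mds_msg m = { r->uid, r->gid };
    return m;
}
```

If the request is created while the current uid is 1000 and rebuilt after the current context has become uid 0, the rebuilt message still carries uid 1000.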
    • ceph: fix rdcache_gen usage and invalidate · cd045cb4
      Sage Weil authored
      We used to use rdcache_gen to indicate whether we "might" have cached
      pages.  Now we just look at the mapping to determine that.  However, some
      old behavior remains from that transition.
      
      First, rdcache_gen == 0 no longer means we have no pages.  That can happen
      at any time (presumably when we carry FILE_CACHE).  We should not reset it
      to zero, and we should not check that it is zero.
      
      That means that the only purpose for rdcache_revoking is to resolve races
      between new issues of FILE_CACHE and an async invalidate.  If they are
      equal, we should invalidate.  On success, we decrement rdcache_revoking,
      so that it is no longer equal to rdcache_gen.  Similarly, if we succeed
      in doing a sync invalidate, set revoking = gen - 1.  (This is a small
      optimization to avoid doing unnecessary invalidate work and does not
      affect correctness.)
      Signed-off-by: Sage Weil <sage@newdream.net>
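The gen/revoking comparison from cd045cb4 can be modelled roughly as follows. This is a sketch under assumptions, not the kernel logic: that a new FILE_CACHE issue bumps rdcache_gen, and that queuing an async invalidate records the current generation in rdcache_revoking, are both guesses made for illustration, and every function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct sketch_rd_inode {
    uint32_t rdcache_gen;
    uint32_t rdcache_revoking;
    unsigned invalidations;    /* counts invalidate work actually done */
};

/* Assumption: each new issue of FILE_CACHE starts a new generation. */
static void issue_file_cache(struct sketch_rd_inode *ci)
{
    ci->rdcache_gen++;
}

/* Assumption: queuing an async invalidate records the generation revoked. */
static void queue_revoke(struct sketch_rd_inode *ci)
{
    ci->rdcache_revoking = ci->rdcache_gen;
}

/* Invalidate only if no new issue raced in; returns true if work was done. */
static bool async_invalidate(struct sketch_rd_inode *ci)
{
    if (ci->rdcache_revoking != ci->rdcache_gen)
        return false;
    ci->invalidations++;
    ci->rdcache_revoking--;    /* no longer equal to rdcache_gen */
    return true;
}

/* After a successful sync invalidate, make a queued async one a no-op. */
static void sync_invalidate_done(struct sketch_rd_inode *ci)
{
    ci->invalidations++;
    ci->rdcache_revoking = ci->rdcache_gen - 1;
}
```

If FILE_CACHE is issued again between `queue_revoke` and `async_invalidate`, the two counters differ and the invalidate is skipped.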
  4. 07 Nov, 2010 4 commits
    • ceph: re-request max_size if cap auth changes · feb4cc9b
      Sage Weil authored
      If the auth cap migrates to another MDS, clear requested_max_size so that
      we resend any pending max_size increase requests.  This fixes potential
      hangs on writes that extend a file and race with a cap migration between
      MDSs.
      Signed-off-by: Sage Weil <sage@newdream.net>
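A minimal sketch of why clearing requested_max_size matters in feb4cc9b, assuming the client suppresses a request when it has already asked for at least that much. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct sketch_wr_inode {
    uint64_t requested_max_size; /* last size asked of the auth MDS */
};

/* Returns true when a max_size request would be sent. */
static bool maybe_request_max_size(struct sketch_wr_inode *ci, uint64_t wanted)
{
    if (wanted <= ci->requested_max_size)
        return false;            /* already asked for at least this much */
    ci->requested_max_size = wanted;
    return true;
}

/* The new auth MDS never saw the old request, so forget that we asked. */
static void auth_cap_migrated(struct sketch_wr_inode *ci)
{
    ci->requested_max_size = 0;
}
```

Without the reset, a second call with the same `wanted` value is suppressed, and a writer waiting on the larger max_size would wait on a request the new auth MDS never received.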
    • ceph: only let auth caps update max_size · 912a9b03
      Sage Weil authored
      Only the auth MDS has a meaningful max_size value for us, so only update it
      in fill_inode if we're being issued an auth cap.  Otherwise, a random
      stat result from a non-auth MDS can clobber a meaningful max_size, get
      the client<->mds cap state out of sync, and make writes hang.
      
      Specifically, even if the client re-requests a larger max_size (which it
      will), the MDS won't respond because as far as it knows we already have a
      sufficiently large value.
      Signed-off-by: Sage Weil <sage@newdream.net>
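The rule from 912a9b03 reduces to a single guarded assignment. The sketch below is illustrative only: `sketch_cap_inode` and `fill_max_size` are hypothetical names, not the fill_inode code.

```c
#include <stdbool.h>
#include <stdint.h>

struct sketch_cap_inode {
    uint64_t max_size;
};

/* Only a reply that issues an auth cap may change max_size. */
static void fill_max_size(struct sketch_cap_inode *ci, uint64_t reply_max_size,
                          bool reply_is_auth_cap)
{
    if (reply_is_auth_cap)
        ci->max_size = reply_max_size;
}
```

A stat reply from a non-auth MDS that carries max_size 0 then leaves the stored value alone, so the client and the auth MDS keep the same view of how far writes may extend.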
    • ceph: fix open for write on clustered mds · 7421ab80
      Sage Weil authored
      Normally when we open a file we already have a cap, and simply update the
      wanted set.  However, if we open a file for write, but don't have an auth
      cap, that doesn't work; we need to open a new cap with the auth MDS.  Only
      reuse existing caps if we are opening for read or the existing cap is auth.
      Signed-off-by: Sage Weil <sage@newdream.net>
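The reuse condition stated in the last sentence of 7421ab80 can be written as a small predicate. The function name and parameters are hypothetical. This sketches the decision only, not the open path itself.

```c
#include <stdbool.h>

/* Reuse an existing cap only for read opens, or when that cap is auth. */
static bool can_reuse_cap(bool have_cap, bool cap_is_auth, bool open_for_write)
{
    if (!have_cap)
        return false;
    return !open_for_write || cap_is_auth;
}
```

An open for write with only a non-auth cap returns false here, which corresponds to the case where a new cap has to be opened with the auth MDS.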
    • ceph: fix bad pointer dereference in ceph_fill_trace · d8b16b3d
      Sage Weil authored
      We dereference *in a few lines down, but only set it on rename.  It is
      apparently pretty rare for this to trigger, but I have been hitting it
      with clustered MDSs.
      Signed-off-by: Sage Weil <sage@newdream.net>
  5. 01 Nov, 2010 1 commit
    • ceph: fix small seq message skipping · df9f86fa
      Sage Weil authored
      If the client gets out of sync with the server message sequence number, we
      normally skip low seq messages (ones we already received).  The skip code
      was also incrementing the expected seq, such that all subsequent messages
      also appeared old and got skipped, eventually causing a timeout on the osd
      connection.  This resulted in some lagging requests and console messages
      like
      
      [233480.882885] ceph: skipping osd22 10.138.138.13:6804 seq 2016, expected 2017
      [233480.882919] ceph: skipping osd22 10.138.138.13:6804 seq 2017, expected 2018
      [233480.882963] ceph: skipping osd22 10.138.138.13:6804 seq 2018, expected 2019
      [233480.883488] ceph: skipping osd22 10.138.138.13:6804 seq 2019, expected 2020
      [233485.219558] ceph: skipping osd22 10.138.138.13:6804 seq 2020, expected 2021
      [233485.906595] ceph: skipping osd22 10.138.138.13:6804 seq 2021, expected 2022
      [233490.379536] ceph: skipping osd22 10.138.138.13:6804 seq 2022, expected 2023
      [233495.523260] ceph: skipping osd22 10.138.138.13:6804 seq 2023, expected 2024
      [233495.923194] ceph: skipping osd22 10.138.138.13:6804 seq 2024, expected 2025
      [233500.534614] ceph:  tid 6023602 timed out on osd22, will reset osd
      Reported-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Sage Weil <sage@newdream.net>
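The runaway skipping in the log of df9f86fa can be reproduced with a toy receiver. This is a simplified model and not the messenger code: `sketch_conn`, `receive_msg`, and `deliver_range` are hypothetical, and the `bump_on_skip` flag exists only to switch between the buggy and the fixed behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

struct sketch_conn {
    uint64_t in_seq; /* highest seq already received */
};

/* Returns true if the message is delivered, false if it is skipped. */
static bool receive_msg(struct sketch_conn *con, uint64_t seq, bool bump_on_skip)
{
    if (seq <= con->in_seq) {
        if (bump_on_skip)
            con->in_seq++;  /* the described bug: skipping advances the seq */
        return false;
    }
    con->in_seq = seq;
    return true;
}

/* Feeds seqs first..last in order and counts how many are delivered. */
static unsigned deliver_range(struct sketch_conn *con, uint64_t first,
                              uint64_t last, bool bump_on_skip)
{
    unsigned delivered = 0;
    for (uint64_t seq = first; seq <= last; seq++)
        if (receive_msg(con, seq, bump_on_skip))
            delivered++;
    return delivered;
}
```

Starting from `in_seq = 2016` and feeding seqs 2016 through 2024, the buggy variant delivers nothing because every skip pushes the expected seq one step ahead of the next arrival, as in the log. The fixed variant skips only the duplicate 2016 and delivers the other eight.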
  6. 28 Oct, 2010 1 commit
    • Revert "ceph: update issue_seq on cap grant" · 2f56f56a
      Sage Weil authored
      This reverts commit d91f2438.
      
      The intent of issue_seq is to distinguish between mds->client messages that
      (re)create the cap and those that do not, which means we should _only_ be
      updating that value in the create paths.  By updating it in handle_cap_grant,
      we reset it to zero, which then breaks release.
      
      The larger question is what workload/problem made me think it should be
      updated here...
      Signed-off-by: Sage Weil <sage@newdream.net>
  7. 20 Oct, 2010 22 commits
  8. 14 Oct, 2010 4 commits
    • Linux 2.6.36-rc8 · cd07202c
      Linus Torvalds authored
    • Un-inline the core-dump helper functions · 3aa0ce82
      Linus Torvalds authored
      Tony Luck reports that the addition of the access_ok() check in commit
      0eead9ab ("Don't dump task struct in a.out core-dumps") broke the
      ia64 compile due to missing the necessary header file includes.
      
      Rather than add yet another include (<asm/unistd.h>) to make everything
      happy, just uninline the silly core dump helper functions and move the
      bodies to fs/exec.c where they make a lot more sense.
      
      dump_seek() in particular was too big to be an inline function anyway,
      and none of them are in any way performance-critical.  And we really
      don't need to mess up our include file headers more than they already
      are.
      Reported-and-tested-by: Tony Luck <tony.luck@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · ae42d8d4
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
        ehea: Fix a checksum issue on the receive path
        net: allow FEC driver to use fixed PHY support
        tg3: restore rx_dropped accounting
        b44: fix carrier detection on bind
        net: clear heap allocations for privileged ethtool actions
        NET: wimax, fix use after free
        ATM: iphase, remove sleep-inside-atomic
        ATM: mpc, fix use after free
        ATM: solos-pci, remove use after free
        net/fec: carrier off initially to avoid root mount failure
        r8169: use device model DMA API
        r8169: allocate with GFP_KERNEL flag when able to sleep
    • Don't dump task struct in a.out core-dumps · 0eead9ab
      Linus Torvalds authored
      akiphie points out that a.out core-dumps have that odd task struct
      dumping that was never used and was never really a good idea (it goes
      back into the mists of history, probably the original core-dumping
      code).  Just remove it.
      
      Also do the access_ok() check on dump_write().  It probably doesn't
      matter (since normal filesystems all seem to do it anyway), but he
      points out that it's normally done by the VFS layer, so ...
      
      [ I suspect that we should possibly do "vfs_write()" instead of
        calling ->write directly.  That also does the whole fsnotify and write
        statistics thing, which may or may not be a good idea. ]
      
      And just to be anal, do this all for the x86-64 32-bit a.out emulation
      code too, even though it's not enabled (and won't currently even
      compile)
      Reported-by: akiphie <akiphie@lavabit.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>