1. 10 Sep, 2013 1 commit
    • fscache: check consistency does not decrement refcount · 9c89d629
      Milosz Tanski authored
      __fscache_check_consistency() does not decrement the count of active
      operations after it finishes in the success case. This leads to hung tasks
      on cookie de-registration (commonly during inode eviction).
      
      INFO: task kworker/1:2:4214 blocked for more than 120 seconds.
      kworker/1:2     D ffff880443513fc0     0  4214      2 0x00000000
      Workqueue: ceph-msgr con_work [libceph]
        ...
      Call Trace:
       [<ffffffff81569fc6>] ? _raw_spin_unlock_irqrestore+0x16/0x20
       [<ffffffffa0016570>] ? fscache_wait_bit_interruptible+0x30/0x30 [fscache]
       [<ffffffff81568d09>] schedule+0x29/0x70
       [<ffffffffa001657e>] fscache_wait_atomic_t+0xe/0x20 [fscache]
       [<ffffffff815665cf>] out_of_line_wait_on_atomic_t+0x9f/0xe0
       [<ffffffff81083560>] ? autoremove_wake_function+0x40/0x40
       [<ffffffffa0015a9c>] __fscache_relinquish_cookie+0x15c/0x310 [fscache]
       [<ffffffffa00a4fae>] ceph_fscache_unregister_inode_cookie+0x3e/0x50 [ceph]
       [<ffffffffa007e373>] ceph_destroy_inode+0x33/0x200 [ceph]
       [<ffffffff811c13ae>] ? __fsnotify_inode_delete+0xe/0x10
       [<ffffffff8119ba1c>] destroy_inode+0x3c/0x70
       [<ffffffff8119bb69>] evict+0x119/0x1b0
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
      Acked-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
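The failure mode described above can be illustrated with a minimal user-space sketch. It is not the fscache code: `demo_cookie`, `check_buggy()`, `check_fixed()` and `can_relinquish()` are hypothetical names, and the real code waits on the counter rather than polling it. The point is only that a counter of in-flight operations must be dropped on every exit path, or teardown never sees it reach zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cookie; not the real struct fscache_cookie. */
struct demo_cookie {
    atomic_int n_active;    /* operations currently in flight */
};

/* Buggy shape: the success path returns without dropping the count. */
static int check_buggy(struct demo_cookie *c, bool fail)
{
    atomic_fetch_add(&c->n_active, 1);
    if (fail) {
        atomic_fetch_sub(&c->n_active, 1);
        return -1;
    }
    return 0;               /* the count is leaked here */
}

/* Fixed shape: every exit path drops the count exactly once. */
static int check_fixed(struct demo_cookie *c, bool fail)
{
    int ret;

    atomic_fetch_add(&c->n_active, 1);
    ret = fail ? -1 : 0;    /* the actual check would happen here */
    atomic_fetch_sub(&c->n_active, 1);
    return ret;
}

/* Teardown may only proceed once nothing is in flight. */
static bool can_relinquish(struct demo_cookie *c)
{
    int n = atomic_load(&c->n_active);

    assert(n >= 0);         /* a negative count would mean a double drop */
    return n == 0;
}
```

With the buggy variant, one successful check leaves `n_active` at 1, so a caller that waits for zero before relinquishing the cookie would block forever, which matches the hung-task symptom in the trace.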
  2. 09 Sep, 2013 6 commits
    • rbd: fix error handling from rbd_snap_name() · da6a6b63
      Josh Durgin authored
      rbd_snap_name() calls rbd_dev_v{1,2}_snap_name() depending on the
      format of the image. The format 1 version returns NULL on error, which
      is handled by the caller. The format 2 version returns an ERR_PTR,
      which the caller of rbd_snap_name() does not expect.
      
      Fortunately this is unlikely to occur in practice because
      rbd_snap_id_by_name() is called before rbd_snap_name(). This would hit
      similar errors to rbd_snap_name() (like the snapshot not existing) and
      return early, so rbd_snap_name() would not hit an error unless the
      snapshot was removed between the two calls or memory was exhausted.
      
      Use an ERR_PTR in rbd_dev_v1_snap_name() so that the specific error
      can be propagated, and it is consistent with rbd_dev_v2_snap_name().
      Handle the ERR_PTR in the only rbd_snap_name() caller.
      Suggested-by: Alex Elder <alex.elder@linaro.org>
      Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
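The mismatch between a NULL-on-error and an ERR_PTR-on-error convention can be sketched in user space. The helpers below only imitate the shape of the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(); every `demo_` name and the `"snap-one"` value are assumptions made for illustration, not rbd code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* User-space imitations of the kernel's ERR_PTR helpers (assumed shape). */
#define DEMO_MAX_ERRNO 4095

static inline void *demo_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static inline long demo_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

static inline int demo_is_err(const void *p)
{
    /* Error values occupy the top DEMO_MAX_ERRNO addresses. */
    return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Hypothetical lookup: returns a name or an encoded errno, never NULL. */
static const char *demo_snap_name(int snap_id)
{
    if (snap_id == 1)
        return "snap-one";
    return demo_err_ptr(-ENOENT);
}

/* Caller: a plain NULL check would treat the encoded error as a valid name. */
static long demo_use_snap(int snap_id)
{
    const char *name = demo_snap_name(snap_id);

    if (demo_is_err(name))
        return demo_ptr_err(name);
    assert(name != NULL);
    return 0;
}
```

A caller written for the NULL convention would see a non-NULL pointer for the failed lookup and go on to dereference it, which is why both variants need to agree and the caller needs the `demo_is_err()` style check.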
    • rbd: ignore unmapped snapshots that no longer exist · efadc98a
      Josh Durgin authored
      This prevents erroring out while adding a device when a snapshot
      unrelated to the current mapping is deleted between reading the
      snapshot context and reading the snapshot names. If the mapped
      snapshot name is not found an error still occurs as usual.
      Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
    • rbd: fix use-after-free of rbd_dev->disk · 9875201e
      Josh Durgin authored
      Removing a device deallocates the disk, unschedules the watch, and
      finally cleans up the rbd_dev structure. rbd_dev_refresh(), called
      from the watch callback, updates the disk size and rbd_dev
      structure. With no locking between them, rbd_dev_refresh() may use the
      device or rbd_dev after they've been freed.
      
      To fix this, check whether RBD_DEV_FLAG_REMOVING is set before
      updating the disk size in rbd_dev_refresh(). In order to prevent a
      race where rbd_dev_refresh() is already revalidating the disk when
      rbd_remove() is called, move the call to rbd_bus_del_dev() after the
      watch is unregistered and all notifies are complete. It's safe to
      defer deleting this structure because no new requests can be submitted
      once the RBD_DEV_FLAG_REMOVING is set, since the device cannot be
      opened.
      
      Fixes: http://tracker.ceph.com/issues/5636
      Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
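The flag check described above can be sketched as follows. This is a simplified user-space illustration with hypothetical names (`demo_dev`, `DEMO_FLAG_REMOVING`, `demo_refresh()`); it shows only the check itself, and by itself it does not close the race, which is why the commit also reorders the teardown steps.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_FLAG_REMOVING 0x1u     /* hypothetical flag bit */

/* Hypothetical stand-in for a device; not the real struct rbd_device. */
struct demo_dev {
    atomic_uint flags;
    unsigned long long disk_size;
};

/* Mark the device as going away; removal must only begin once. */
static void demo_begin_remove(struct demo_dev *d)
{
    unsigned int old = atomic_fetch_or(&d->flags, DEMO_FLAG_REMOVING);

    assert(!(old & DEMO_FLAG_REMOVING));
}

/* Refresh path: skip the size update once removal has started. */
static bool demo_refresh(struct demo_dev *d, unsigned long long new_size)
{
    if (atomic_load(&d->flags) & DEMO_FLAG_REMOVING)
        return false;
    d->disk_size = new_size;
    return true;
}
```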
    • rbd: make rbd_obj_notify_ack() synchronous · 20e0af67
      Josh Durgin authored
      The only user of rbd_obj_notify_ack() is rbd_watch_cb(). It is used
      asynchronously with no tracking of when the notify ack completes, so
      it may still be in progress when the osd_client is shut down.  This
      results in a BUG() since the osd client assumes no requests are in
      flight when it stops. Since all notifies are flushed before the
      osd_client is stopped, waiting for the notify ack to complete before
      returning from the watch callback ensures there are no notify acks in
      flight during shutdown.
      
      Rename rbd_obj_notify_ack() to rbd_obj_notify_ack_sync() to reflect
      its new synchronous nature.
      Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
    • rbd: complete notifies before cleaning up osd_client and rbd_dev · 9abc5990
      Josh Durgin authored
      To ensure rbd_dev is not used after it's released, flush all pending
      notify callbacks before calling rbd_dev_image_release(). No new
      notifies can be added to the queue at this point because the watch has
      already been unregistered with the osd_client.
      Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
    • libceph: add function to ensure notifies are complete · dd935f44
      Josh Durgin authored
      Without a way to flush the osd client's notify workqueue, a watch
      event that is unregistered could continue receiving callbacks
      indefinitely.
      
      Unregistering the event simply means no new notifies are added to the
      queue, but there may still be events in the queue that will call the
      watch callback for the event. If the queue is flushed after the event
      is unregistered, the caller can be sure no more watch callbacks will
      occur for the canceled watch.
      Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
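The unregister-then-flush ordering can be sketched with a single-threaded queue. This is an illustration under simplifying assumptions, not the libceph workqueue: all `demo_` names are hypothetical and there is no concurrency here. It shows that unregistering only stops new entries, while entries already queued still run until the queue is flushed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_QUEUE_MAX 8

/* Hypothetical notify queue; a toy model, not the osd client's workqueue. */
struct demo_queue {
    void (*cb[DEMO_QUEUE_MAX])(int *);
    int *arg[DEMO_QUEUE_MAX];
    size_t len;
    bool registered;
};

/* Queue a callback; refused once the watch has been unregistered. */
static bool demo_queue_notify(struct demo_queue *q, void (*cb)(int *), int *arg)
{
    if (!q->registered || q->len == DEMO_QUEUE_MAX)
        return false;
    q->cb[q->len] = cb;
    q->arg[q->len] = arg;
    q->len++;
    return true;
}

/* Unregistering stops new entries but leaves queued ones in place. */
static void demo_unregister(struct demo_queue *q)
{
    q->registered = false;
}

/* Flushing runs everything still queued; afterwards no callback can fire. */
static void demo_flush(struct demo_queue *q)
{
    for (size_t i = 0; i < q->len; i++) {
        assert(q->cb[i] != NULL);
        q->cb[i](q->arg[i]);
    }
    q->len = 0;
}

/* Example callback that counts how often it was invoked. */
static void demo_count_cb(int *counter)
{
    (*counter)++;
}
```

Only after `demo_unregister()` followed by `demo_flush()` is it safe, in this model, to free whatever the callbacks touch.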
  3. 06 Sep, 2013 13 commits
    • ceph: use d_invalidate() to invalidate aliases · a8d436f0
      Yan, Zheng authored
      d_invalidate() is the standard VFS method to invalidate a dentry.
      Compared to d_delete(), it also tries to shrink child dentries.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • ceph: remove ceph_lookup_inode() · ed284c49
      Yan, Zheng authored
      Commit 6f60f889 (ceph: fix freeing inode vs removing session caps race)
      introduced ceph_lookup_inode(). But there is already a ceph_find_inode()
      which provides similar functionality. So remove ceph_lookup_inode() and use
      ceph_find_inode() instead.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Alex Elder <alex.elder@linary.org>
      Reviewed-by: Sage Weil <sage@inktank.com>
    • ceph: trivial buildbot warnings fix · 971f0bde
      Milosz Tanski authored
      The linux-next build bot found three warnings; this addresses all of them.
      
       * non-ANSI function declaration of function 'ceph_fscache_register' and
         'ceph_fscache_unregister'
       * symbol 'ceph_cache_netfs' was not declared, now it's extern in the header.
       * warning: "pr_fmt" redefined
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
    • ceph: Do not do invalidate if the filesystem is mounted nofsc · e81568eb
      Milosz Tanski authored
      Previously we would always try to enqueue work even if the filesystem was not
      mounted with fscache enabled (or the file had no cookie). When the filesystem
      is mounted nofsc (but with fscache compiled in), this would lead to a crash.
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
    • ceph: page still marked private_2 · d4d3aa38
      Milosz Tanski authored
      A previous patch allowed us to clean up most of the issues with pages marked
      as private_2 when calling ceph_readpages. However, there seems to be a case in
      the error-path cleanup in start_read that still triggers this from time to
      time. I've only seen this one a couple of times.
      
      BUG: Bad page state in process petabucket  pfn:335b82
      page:ffffea000cd6e080 count:0 mapcount:0 mapping:          (null) index:0x0
      page flags: 0x200000000001000(private_2)
      Call Trace:
       [<ffffffff81563442>] dump_stack+0x46/0x58
       [<ffffffff8112c7f7>] bad_page+0xc7/0x120
       [<ffffffff8112cd9e>] free_pages_prepare+0x10e/0x120
       [<ffffffff8112e580>] free_hot_cold_page+0x40/0x160
       [<ffffffff81132427>] __put_single_page+0x27/0x30
       [<ffffffff81132d95>] put_page+0x25/0x40
       [<ffffffffa02cb409>] ceph_readpages+0x2e9/0x6f0 [ceph]
       [<ffffffff811313cf>] __do_page_cache_readahead+0x1af/0x260
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
    • ceph: ceph_readpage_to_fscache didn't check if marked · 9b8dd1e8
      Milosz Tanski authored
      Previously ceph_readpage_to_fscache did not check whether the page was marked
      as cached before calling fscache_write_page, resulting in a BUG inside fscache.
      
      FS-Cache: Assertion failed
      ------------[ cut here ]------------
      kernel BUG at fs/fscache/page.c:874!
      invalid opcode: 0000 [#1] SMP
      Call Trace:
       [<ffffffffa02e6566>] __ceph_readpage_to_fscache+0x66/0x80 [ceph]
       [<ffffffffa02caf84>] readpage_nounlock+0x124/0x210 [ceph]
       [<ffffffffa02cb08d>] ceph_readpage+0x1d/0x40 [ceph]
       [<ffffffff81126db6>] generic_file_aio_read+0x1f6/0x700
       [<ffffffffa02c6fcc>] ceph_aio_read+0x5fc/0xab0 [ceph]
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
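The missing guard can be sketched in user space. The names below (`demo_page`, `demo_cache_write()`, `demo_readpage_to_cache()`) are hypothetical, and the assertion merely stands in for the one that fires in the trace above; the sketch assumes the backend requires the page to be marked before a write is attempted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page; not the kernel's struct page. */
struct demo_page {
    bool marked_for_cache;
};

/* Number of writes the toy backend has accepted. */
static int demo_cache_writes;

/* Toy backend: insists that the caller only hands over marked pages. */
static void demo_cache_write(struct demo_page *page)
{
    assert(page->marked_for_cache);
    demo_cache_writes++;
}

/* Caller with the guard in place: unmarked pages are skipped, not written. */
static void demo_readpage_to_cache(struct demo_page *page)
{
    if (!page->marked_for_cache)
        return;
    demo_cache_write(page);
}
```

Without the early return, the unmarked page would reach `demo_cache_write()` and trip its assertion, which is the shape of the BUG reported in this commit.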
    • ceph: clean PgPrivate2 on returning from readpages · 76be778b
      Milosz Tanski authored
      In some cases the ceph readpages code bails without filling all the pages
      already marked by fscache. When we return to the readahead code, this causes
      a BUG.
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
    • ceph: use fscache as a local persistent cache · 99ccbd22
      Milosz Tanski authored
      Add support for fscache to the Ceph filesystem. This brings it on par with
      some of the other network filesystems in Linux (like NFS, AFS, etc.).
      
      In order to mount the filesystem with fscache the 'fsc' mount option must be
      passed.
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
    • Merge tag 'fscache-fixes-for-ceph' into wip-fscache · cd0a2df6
      Milosz Tanski authored
      Patches for Ceph FS-Cache support
    • fscache: Netfs function for cleanup post readpages · 5a6f282a
      Milosz Tanski authored
      Currently the fscache code expects the netfs to call fscache_readpages_or_alloc
      inside the aops readpages callback.  It marks all the pages in the list
      provided by readahead with PG_private_2.  In the cases where the netfs fails to
      read all the pages (which is legal), it ends up returning to the readahead code
      and triggering a BUG.  This happens because the page list still contains marked
      pages.
      
      This patch implements a simple fscache_readpages_cancel function that the netfs
      should call before returning from readpages.  It will revoke the pages from the
      underlying cache backend and unmark them.
      
      The problem was originally worked out in the Ceph devel tree, but it also
      occurs in CIFS.  It appears that NFS, AFS and 9P are okay as read_cache_pages()
      will clean up the unprocessed pages in the case of an error.
      
      This can be used to address the following oops:
      
      [12410647.597278] BUG: Bad page state in process petabucket  pfn:3d504e
      [12410647.597292] page:ffffea000f541380 count:0 mapcount:0 mapping:          (null) index:0x0
      [12410647.597298] page flags: 0x200000000001000(private_2)
      
      ...
      
      [12410647.597334] Call Trace:
      [12410647.597345]  [<ffffffff815523f2>] dump_stack+0x19/0x1b
      [12410647.597356]  [<ffffffff8111def7>] bad_page+0xc7/0x120
      [12410647.597359]  [<ffffffff8111e49e>] free_pages_prepare+0x10e/0x120
      [12410647.597361]  [<ffffffff8111fc80>] free_hot_cold_page+0x40/0x170
      [12410647.597363]  [<ffffffff81123507>] __put_single_page+0x27/0x30
      [12410647.597365]  [<ffffffff81123df5>] put_page+0x25/0x40
      [12410647.597376]  [<ffffffffa02bdcf9>] ceph_readpages+0x2e9/0x6e0 [ceph]
      [12410647.597379]  [<ffffffff81122a8f>] __do_page_cache_readahead+0x1af/0x260
      [12410647.597382]  [<ffffffff81122ea1>] ra_submit+0x21/0x30
      [12410647.597384]  [<ffffffff81118f64>] filemap_fault+0x254/0x490
      [12410647.597387]  [<ffffffff8113a74f>] __do_fault+0x6f/0x4e0
      [12410647.597391]  [<ffffffff810125bd>] ? __switch_to+0x16d/0x4a0
      [12410647.597395]  [<ffffffff810865ba>] ? finish_task_switch+0x5a/0xc0
      [12410647.597398]  [<ffffffff8113d856>] handle_pte_fault+0xf6/0x930
      [12410647.597401]  [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110
      [12410647.597403]  [<ffffffff81008cce>] ? xen_pmd_val+0xe/0x10
      [12410647.597405]  [<ffffffff81005469>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
      [12410647.597407]  [<ffffffff8113f361>] handle_mm_fault+0x251/0x370
      [12410647.597411]  [<ffffffff812b0ac4>] ? call_rwsem_down_read_failed+0x14/0x30
      [12410647.597414]  [<ffffffff8155bffa>] __do_page_fault+0x1aa/0x550
      [12410647.597418]  [<ffffffff8108011d>] ? up_write+0x1d/0x20
      [12410647.597422]  [<ffffffff8113141c>] ? vm_mmap_pgoff+0xbc/0xe0
      [12410647.597425]  [<ffffffff81143bb8>] ? SyS_mmap_pgoff+0xd8/0x240
      [12410647.597427]  [<ffffffff8155c3ae>] do_page_fault+0xe/0x10
      [12410647.597431]  [<ffffffff81558818>] page_fault+0x28/0x30
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
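The cleanup step this commit describes can be sketched as follows. It is a toy model with hypothetical names (`demo_page`, `demo_readpages_cancel()`, `demo_free_page()`), not the fscache implementation: any page still carrying the mark when readpages gives up is unmarked, so that freeing it later does not trip a bad-page check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page carrying a PG_private_2 style mark. */
struct demo_page {
    bool marked;
};

/*
 * Called before returning from a readpages-like function: unmark every
 * page that was left unprocessed. Returns how many pages were unmarked.
 */
static size_t demo_readpages_cancel(struct demo_page *pages, size_t n)
{
    size_t unmarked = 0;

    for (size_t i = 0; i < n; i++) {
        if (pages[i].marked) {
            pages[i].marked = false;
            unmarked++;
        }
    }
    return unmarked;
}

/* Toy free path: freeing a page that is still marked is treated as a bug. */
static void demo_free_page(const struct demo_page *page)
{
    assert(!page->marked);
}
```

The real function also revokes the pages from the cache backend, as the commit message says; that part is left out of this sketch.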
    • FS-Cache: Fix heading in documentation · 696f69b6
      David Howells authored
      Fix a heading in the documentation to make it consistent with the contents
      list.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • CacheFiles: Implement interface to check cache consistency · 5002d7be
      David Howells authored
      Implement the FS-Cache interface to check the consistency of a cache object in
      CacheFiles.
      
      Original-author: Hongyi Jia <jiayisuse@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Hongyi Jia <jiayisuse@gmail.com>
      cc: Milosz Tanski <milosz@adfin.com>
    • FS-Cache: Add interface to check consistency of a cached object · da9803bc
      David Howells authored
      Extend the fscache netfs API so that the netfs can ask whether a cache
      object is up to date with respect to its corresponding netfs object:
      
      	int fscache_check_consistency(struct fscache_cookie *cookie)
      
      This will call back to the netfs to check whether the auxiliary data associated
      with a cookie is correct.  It returns 0 if it is and -ESTALE if it isn't; it
      may also return -ENOMEM and -ERESTARTSYS.
      
      The backends now have to implement a mandatory operation pointer:
      
      	int (*check_consistency)(struct fscache_object *object)
      
      that corresponds to the above API call.  FS-Cache takes care of pinning the
      object and the cookie in memory and managing this call with respect to the
      object state.
      
      Original-author: Hongyi Jia <jiayisuse@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Hongyi Jia <jiayisuse@gmail.com>
      cc: Milosz Tanski <milosz@adfin.com>
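How a caller might act on the return values described above (0 for consistent, -ESTALE for out of date) can be sketched in user space. Everything named `demo_` is hypothetical, including the version comparison used as the consistency test; the real check is performed by the netfs on the cookie's auxiliary data.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical cached object; not the real struct fscache_object. */
struct demo_object {
    unsigned long cached_version;
};

/* Backend operations table with a mandatory consistency hook. */
struct demo_cache_ops {
    int (*check_consistency)(struct demo_object *object);
};

/* What the (toy) netfs currently believes the object's version to be. */
static unsigned long demo_netfs_version = 2;

/* Toy check: 0 if the cached copy is current, -ESTALE if it is not. */
static int demo_check(struct demo_object *object)
{
    return object->cached_version == demo_netfs_version ? 0 : -ESTALE;
}

static const struct demo_cache_ops demo_ops = {
    .check_consistency = demo_check,
};

/*
 * Returns 1 if the cached data may be used, 0 if it is stale and should
 * be discarded, or a negative errno for any other failure.
 */
static int demo_can_use_cache(struct demo_object *object)
{
    int ret;

    assert(demo_ops.check_consistency != NULL);   /* the hook is mandatory */
    ret = demo_ops.check_consistency(object);
    if (ret == 0)
        return 1;
    if (ret == -ESTALE)
        return 0;
    return ret;
}
```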
  4. 04 Sep, 2013 4 commits
  5. 02 Sep, 2013 4 commits
  6. 31 Aug, 2013 3 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · a8787645
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) There was a simplification in the ipv6 ndisc packet sending
          attempted here, which avoided using memory accounting on the
          per-netns ndisc socket for sending NDISC packets.  It did fix some
          important issues, but it causes regressions so it gets reverted here
          too.  Specifically, the problem with this change is that the IPV6
          output path really depends upon there being a valid skb->sk
          attached.
      
          The reason we want to do this change in some form when we figure out
          how to do it right, is that if a device goes down the ndisc_sk
          socket send queue will fill up and block NDISC packets that we want
          to send to other devices too.  That's really bad behavior.
      
          Hopefully Thomas can come up with a better version of this change.
      
       2) Fix a severe TCP performance regression by reverting a change made
          to dev_pick_tx() quite some time ago.  From Eric Dumazet.
      
       3) TIPC returns wrongly signed error codes, fix from Erik Hugne.
      
       4) Fix OOPS when doing IPSEC over ipv4 tunnels due to orphaning the
          skb->sk too early.  Fix from Li Hongjun.
      
       5) RAW ipv4 sockets can use the wrong routing key during lookup, from
          Chris Clark.
      
       6) Similar to #1 revert an older change that tried to use plain
          alloc_skb() for SYN/ACK TCP packets, this broke the netfilter owner
          mark which needs to see the skb->sk for such frames.  From Phil
          Oester.
      
       7) BNX2x driver bug fixes from Ariel Elior and Yuval Mintz,
          specifically in the handling of virtual functions.
      
       8) IPSEC path error propagation to sockets is not done properly when
          we have v4 in v6, and v6 in v4 type rules.  Fix from Hannes Frederic
          Sowa.
      
       9) Fix missing channel context release in mac80211, from Johannes Berg.
      
      10) Fix network namespace handling wrt. SCM_RIGHTS, from Andy
          Lutomirski.
      
      11) Fix usage of bogus NAPI weight in jme, netxen, and ps3_gelic
          drivers.  From Michal Schmidt.
      
      12) Hopefully a complete and correct fix for the genetlink dump locking
          and module reference counting.  From Pravin B Shelar.
      
      13) sk_busy_loop() must do a cpu_relax(), from Eliezer Tamir.
      
      14) Fix handling of timestamp offset when restoring a snapshotted TCP
          socket.  From Andrew Vagin.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
        net: fec: fix time stamping logic after napi conversion
        net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delay
        mISDN: return -EINVAL on error in dsp_control_req()
        net: revert 8728c544 ("net: dev_pick_tx() fix")
        Revert "ipv6: Don't depend on per socket memory for neighbour discovery messages"
        ipv4 tunnels: fix an oops when using ipip/sit with IPsec
        tipc: set sk_err correctly when connection fails
        tcp: tcp_make_synack() should use sock_wmalloc
        bridge: separate querier and query timer into IGMP/IPv4 and MLD/IPv6 ones
        ipv6: Don't depend on per socket memory for neighbour discovery messages
        ipv4: sendto/hdrincl: don't use destination address found in header
        tcp: don't apply tsoffset if rcv_tsecr is zero
        tcp: initialize rcv_tstamp for restored sockets
        net: xilinx: fix memleak
        net: usb: Add HP hs2434 device to ZLP exception table
        net: add cpu_relax to busy poll loop
        net: stmmac: fixed the pbl setting with DT
        genl: Hold reference on correct module while netlink-dump.
        genl: Fix genl dumpit() locking.
        xfrm: Fix potential null pointer dereference in xdst_queue_output
        ...
    • MAINTAINERS: change my DT related maintainer address · de80963e
      Ian Campbell authored
      Filtering capabilities on my work email are pretty much non-existent and this
      has turned out to be something of a firehose...
      
      Cc: Stephen Warren <swarren@wwwdotorg.org>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      Acked-by: Pawel Moll <pawel.moll@arm.com>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'sound-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 936dbcc3
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "This contains two Oops fixes (opti9xx and HD-audio) and a simple fixup
        for an Acer laptop.  All marked as stable patches"
      
      * tag 'sound-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: opti9xx: Fix conflicting driver object name
        ALSA: hda - Fix NULL dereference with CONFIG_SND_DYNAMIC_MINORS=n
        ALSA: hda - Add inverted digital mic fixup for Acer Aspire One
  7. 30 Aug, 2013 9 commits