An error occurred fetching the project authors.
- 15 Oct, 2019 1 commit
-
-
Jeff Layton authored
In the future, we're going to want to extend the ceph_reply_info_extra for create replies. Currently though, the kernel code doesn't accept an extra blob that is larger than the expected data. Change the code to skip over any unrecognized fields at the end of the extra blob, rather than returning -EIO. Cc: stable@vger.kernel.org Signed-off-by:
Jeff Layton <jlayton@kernel.org> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
- 16 Sep, 2019 5 commits
-
-
Erqi Chen authored
If client mds session is evicted in CEPH_MDS_SESSION_OPENING state, mds won't send session msg to client, and delayed_work skip CEPH_MDS_SESSION_OPENING state session, the session hang forever. Allow ceph_con_keepalive to reconnect a session in OPENING to avoid session hang. Also, ensure that we skip sessions in RESTARTING and REJECTED states since those states can't be resurrected by issuing a keepalive. Link: https://tracker.ceph.com/issues/41551 Signed-off-by: Erqi Chen chenerqi@gmail.com Reviewed-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Jeff Layton <jlayton@kernel.org> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
It's only used to keep count of caps being trimmed, but that requires that we hold the session->s_mutex to prevent multiple trimming operations from running concurrently. We can achieve the same effect using an integer on the stack, which allows us to (eventually) not need the s_mutex. Signed-off-by:
Jeff Layton <jlayton@kernel.org> Reviewed-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Yan, Zheng authored
Make client use osd reply and session message to infer if itself is blacklisted. Client reconnect to cluster using new entity addr if it is blacklisted. Auto reconnect is limited to once every 30 minutes. Auto reconnect is disabled by default. It can be enabled/disabled by recover_session=<no|clean> mount option. In 'clean' mode, client drops any dirty data/metadata, invalidates page caches and invalidates all writable file handles. After reconnect, file locks become stale because MDS loses track of them. If an inode contains any stale file locks, read/write on the indoe are not allowed until applications release all stale file locks. Signed-off-by:
"Yan, Zheng" <zyan@redhat.com> Reviewed-by:
Jeff Layton <jlayton@kernel.org> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Yan, Zheng authored
It closes mds sessions, drop all caps and invalidates page caches, then use new entity address to reconnect to the cluster. After reconnect, all dirty data/metadata are dropped, file locks get lost sliently. Open files continue to work because client will try renewing caps on later read/write. Signed-off-by:
"Yan, Zheng" <zyan@redhat.com> Reviewed-by:
Jeff Layton <jlayton@kernel.org> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Yan, Zheng authored
Use errseq_t to track and report errors of async metadata operations, similar to how kernel handles errors during writeback. If any dirty caps or any unsafe request gets dropped during session eviction, record -EIO in corresponding inode's i_meta_err. The error will be reported by subsequent fsync, Signed-off-by:
"Yan, Zheng" <zyan@redhat.com> Reviewed-by:
Jeff Layton <jlayton@kernel.org> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
- 08 Jul, 2019 7 commits
-
-
Jeff Layton authored
Signed-off-by:
Jeff Layton <jlayton@kernel.org> Reviewed-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
Signed-off-by:
Jeff Layton <jlayton@kernel.org> Reviewed-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Yan, Zheng authored
Link: https://tracker.ceph.com/issues/40339Signed-off-by:
"Yan, Zheng" <zyan@redhat.com> Reviewed-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Yan, Zheng authored
handle_cap_export() may add placeholder caps to session that is in opening state. These caps' session pointer become wild after session get unregistered. The fix is not to unregister session in opening state during mds failovers, just let client to reconnect later when mds is recovered. Link: https://tracker.ceph.com/issues/40190Signed-off-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Yan, Zheng authored
Signed-off-by:
"Yan, Zheng" <zyan@redhat.com> Reviewed-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Yan, Zheng authored
Signed-off-by:
"Yan, Zheng" <zyan@redhat.com> Reviewed-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
David Disseldorp authored
MDS InodeStat v3 wire structures include a trailing snapshot creation time member. Unmarshall this and retain it for a future vxattr. Signed-off-by:
David Disseldorp <ddiss@suse.de> Reviewed-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
- 27 Jun, 2019 1 commit
-
-
Jeff Layton authored
When ceph_mdsc_build_path is handed a positive dentry, it will return a zero-length path string with the base set to that dentry. This is not what we want. Always include at least one path component in the string. ceph_mdsc_build_path has behaved this way for a long time but it didn't matter until recent d_name handling rework. Fixes: 964fff74 ("ceph: use ceph_mdsc_build_path instead of clone_dentry_name") Signed-off-by:
Jeff Layton <jlayton@kernel.org> Reviewed-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
- 05 Jun, 2019 1 commit
-
-
Yan, Zheng authored
iput_final() may wait for reahahead pages. The wait can cause deadlock. For example: Workqueue: ceph-msgr ceph_con_workfn [libceph] Call Trace: schedule+0x36/0x80 io_schedule+0x16/0x40 __lock_page+0x101/0x140 truncate_inode_pages_range+0x556/0x9f0 truncate_inode_pages_final+0x4d/0x60 evict+0x182/0x1a0 iput+0x1d2/0x220 iterate_session_caps+0x82/0x230 [ceph] dispatch+0x678/0xa80 [ceph] ceph_con_workfn+0x95b/0x1560 [libceph] process_one_work+0x14d/0x410 worker_thread+0x4b/0x460 kthread+0x105/0x140 ret_from_fork+0x22/0x40 Workqueue: ceph-msgr ceph_con_workfn [libceph] Call Trace: __schedule+0x3d6/0x8b0 schedule+0x36/0x80 schedule_preempt_disabled+0xe/0x10 mutex_lock+0x2f/0x40 ceph_check_caps+0x505/0xa80 [ceph] ceph_put_wrbuffer_cap_refs+0x1e5/0x2c0 [ceph] writepages_finish+0x2d3/0x410 [ceph] __complete_request+0x26/0x60 [libceph] handle_reply+0x6c8/0xa10 [libceph] dispatch+0x29a/0xbb0 [libceph] ceph_con_workfn+0x95b/0x1560 [libceph] process_one_work+0x14d/0x410 worker_thread+0x4b/0x460 kthread+0x105/0x140 ret_from_fork+0x22/0x40 In above example, truncate_inode_pages_range() waits for readahead pages while holding s_mutex. ceph_check_caps() waits for s_mutex and blocks OSD dispatch thread. Later OSD replies (for readahead) can't be handled. ceph_check_caps() also may lock snap_rwsem for read. So similar deadlock can happen if iput_final() is called while holding snap_rwsem. In general, it's not good to call iput_final() inside MDS/OSD dispatch threads or while holding any mutex. The fix is introducing ceph_async_iput(), which calls iput_final() in workqueue. Signed-off-by:
"Yan, Zheng" <zyan@redhat.com> Reviewed-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
- 07 May, 2019 11 commits
-
-
Jeff Layton authored
Signed-off-by:
Jeff Layton <jlayton@kernel.org> Reviewed-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
I originally thought there was a potential race here, but the fact that this is called with the mdsc->mutex held, ensures that the last reference to the session can't be put here. Still, it's clearer to just return the value from get_session here, and may prevent a bug later if we ever rework this code to be less reliant on mutexes. Signed-off-by:
Jeff Layton <jlayton@kernel.org> Reviewed-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
Signed-off-by:
Jeff Layton <jlayton@kernel.org> Reviewed-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
Nothing calls ceph_mdsc_submit_request today, but in later patches we'll need to be able to call this separately. Have the helper return an int so we can check the r_err under the mutex, and have the caller just check the error code from the submit. Also move the acquisition of CEPH_CAP_PIN references into the same function. Signed-off-by:
Jeff Layton <jlayton@kernel.org> Reviewed-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
No MDS requests use r_callback today, but that will change in the future. The OSD client always does r_callback and then completes r_completion. Let's have the MDS client do the same. Signed-off-by:
Jeff Layton <jlayton@kernel.org> Reviewed-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
We make copies of the dentry name in set_request_path_attr, but then create_request_message re-fetches the lengths out of the dentry. While we don't currently set the *_drop fields unless the parents are locked, it's still better not to rely on that sort of implicit assumption. Use the pathlen values that set_request_path_attr returned instead, as they will always be correct for the returned paths themselves. Signed-off-by:
Jeff Layton <jlayton@kernel.org> Reviewed-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
Al suggested we get rid of the kmalloc here and just use __getname and __putname to get a full PATH_MAX pathname buffer. Since we build the path in reverse, we continue to return a pointer to the beginning of the string and the length, and add a new helper to free the thing at the end. Suggested-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Jeff Layton <jlayton@kernel.org> Reviewed-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
While it may be slightly more efficient, it's probably not worthwhile to optimize for the case that clone_dentry_name handles. We can get the same result by just calling ceph_mdsc_build_path when the parent isn't locked, with less code duplication. Signed-off-by:
Jeff Layton <jlayton@kernel.org> Reviewed-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
temp is not defined outside of the RCU critical section here. Ensure we grab that value before we drop the rcu_read_lock. Reported-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Jeff Layton <jlayton@kernel.org> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
Signed-off-by:
Jeff Layton <jlayton@kernel.org> Reviewed-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Luis Henriques authored
The CephFS kernel client does not enforce quotas set in a directory that isn't visible from the mount point. For example, given the path '/dir1/dir2', if quotas are set in 'dir1' and the filesystem is mounted with mount -t ceph <server>:<port>:/dir1/ /mnt then the client won't be able to access 'dir1' inode, even if 'dir2' belongs to a quota realm that points to it. This patch fixes this issue by simply doing an MDS LOOKUPINO operation for unknown inodes. Any inode reference obtained this way will be added to a list in ceph_mds_client, and will only be released when the filesystem is umounted. Link: https://tracker.ceph.com/issues/38482Reported-by:
Hendrik Peyerl <hpeyerl@plusline.net> Signed-off-by:
Luis Henriques <lhenriques@suse.com> Reviewed-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
- 23 Apr, 2019 2 commits
-
-
Yan, Zheng authored
We missed two places that i_wrbuffer_ref_head, i_wr_ref, i_dirty_caps and i_flushing_caps may change. When they are all zeros, we should free i_head_snapc. Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/38224Reported-and-tested-by:
Luis Henriques <lhenriques@suse.com> Signed-off-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Jeff Layton authored
Ben reported tripping the BUG_ON in create_request_message during some performance testing. Analysis of the vmcore showed that the length of the r_dentry->d_name string changed after we allocated the buffer, but before we encoded it. build_dentry_path returns pointers to d_name in the common case of non-snapped dentries, but this optimization isn't safe unless the parent directory is locked. When it isn't, have the code make a copy of the d_name while holding the d_lock. Cc: stable@vger.kernel.org Reported-by:
Ben England <bengland@redhat.com> Signed-off-by:
Jeff Layton <jlayton@kernel.org> Reviewed-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
- 05 Mar, 2019 9 commits
-
-
Yan, Zheng authored
If number of caps exceed the limit, ceph_trim_dentires() also trim dentries with valid leases. Trimming dentry releases references to associated inode, which may evict inode and release caps. By default, there is no limit for caps count. Signed-off-by:
"Yan, Zheng" <zyan@redhat.com> Reviewed-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Yan, Zheng authored
Previous commit make VFS delete stale dentry when last reference is dropped. Lease also can become invalid when corresponding dentry has no reference. This patch make cephfs periodically scan lease list, delete corresponding dentry if lease is invalid. There are two types of lease, dentry lease and dir lease. dentry lease has life time and applies to singe dentry. Dentry lease is added to tail of a list when it's updated, leases at front of the list will expire first. Dir lease is CEPH_CAP_FILE_SHARED on directory inode, it applies to all dentries in the directory. Dentries have dir leases are added to another list. Dentries in the list are periodically checked in a round robin manner. Signed-off-by:
"Yan, Zheng" <zyan@redhat.com> Reviewed-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Yan, Zheng authored
introduce ceph_d_delete(), which checks if dentry has valid lease. Signed-off-by:
"Yan, Zheng" <zyan@redhat.com> Reviewed-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Yan, Zheng authored
When pending cap releases fill up one message, start a work to send cap release message. (old way is sending cap releases every 5 seconds) Signed-off-by:
"Yan, Zheng" <zyan@redhat.com> Reviewed-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Yan, Zheng authored
Link: http://tracker.ceph.com/issues/37576Signed-off-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Yan, Zheng authored
In versioned reply, inodestat, dirstat and lease are encoded with version, compat_version and struct_len. Based on a patch from Jos Collin <jcollin@redhat.com>. Link: http://tracker.ceph.com/issues/26936Signed-off-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Yan, Zheng authored
ceph_getattr() return zero dev ID for head inodes and set dev ID to snapid directly for snaphost inodes. This is not good because userspace utilities may consider device ID of 0 as invalid, snapid may conflict with other device's ID. This patch introduces "snapids to anonymous bdev IDs" map. we create a new mapping when we see a snapid for the first time. we trim unused mapping after it is ilde for 5 minutes. Link: http://tracker.ceph.com/issues/22353Signed-off-by:
"Yan, Zheng" <zyan@redhat.com> Acked-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Yan, Zheng authored
Signed-off-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Yan, Zheng authored
Signed-off-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
- 26 Dec, 2018 2 commits
-
-
Yan, Zheng authored
mds hasn't used inode pathes since introducing inode backtrace. Signed-off-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
Yan, Zheng authored
mds contains an optimization, it does not re-issue stale caps if client does not want any cap. A special case of the optimization is that client wants some caps, but skipped updating 'wanted'. For this case, client needs to update 'wanted' when stale session get renewed. Signed-off-by:
"Yan, Zheng" <zyan@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com>
-
- 08 Nov, 2018 1 commit
-
-
Ilya Dryomov authored
No one is running pre-argonaut. In addition one of the argonaut features (NOSRCADDR) has been required since day one (and a half, 2.6.34 vs 2.6.35) of the kernel client. Allow for the possibility of reusing these feature bits later. Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Reviewed-by:
Sage Weil <sage@redhat.com>
-