Commits · 463bb8da5042c165bf50ae2688d251c5af26f3cf · Kirill Smelkov / linux

07 Jul, 2017 37 commits

libceph: compute actual pgid in ceph_pg_to_up_acting_osds() · 463bb8da

Ilya Dryomov authored Jun 21, 2017

Move raw_pg_to_pg() call out of get_temp_osds() and into
ceph_pg_to_up_acting_osds(), for upcoming apply_upmap().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

463bb8da

libceph: pg_upmap[_items] infrastructure · 6f428df4

Ilya Dryomov authored Jun 21, 2017

pg_temp and pg_upmap encodings are the same (PG -> array of osds),
except for the incremental remove: it's an empty mapping in new_pg_temp
for pg_temp and a separate old_pg_upmap set for pg_upmap.  (This isn't
to allow for empty pg_upmap mappings -- apparently, pg_temp just wasn't
looked at as an example for pg_upmap encoding.)

Reuse __decode_pg_temp() for decoding pg_upmap and new_pg_upmap.
__decode_pg_temp() stores into pg_temp union member, but since pg_upmap
union member is identical, reading through pg_upmap later is OK.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

6f428df4
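
The union reuse described in the pg_upmap[_items] commit message above can be illustrated with a minimal sketch. The struct layout, the `MAX_OSDS` cap, the simplified wire format (a little-endian u32 count followed by that many 32-bit OSD ids) and the names `osd_list`, `pg_mapping` and `decode_pg_temp()` are all assumptions made for illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OSDS 8 /* hypothetical cap, keeps the sketch free of allocation */

struct osd_list {
    uint32_t len;
    int32_t osds[MAX_OSDS];
};

/* Hypothetical mapping: both union members have the identical type. */
struct pg_mapping {
    union {
        struct osd_list pg_temp;
        struct osd_list pg_upmap;
    };
};

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Decode "u32 len, then len 32-bit osd ids" into the pg_temp member.
 * Returns 0 on success, -1 on a short or oversized buffer.
 */
static int decode_pg_temp(const uint8_t *p, const uint8_t *end,
                          struct pg_mapping *pg)
{
    uint32_t i, len;

    assert(p <= end);
    if ((size_t)(end - p) < 4)
        return -1;
    len = get_le32(p);
    p += 4;
    if (len > MAX_OSDS || (size_t)(end - p) / 4 < len)
        return -1;

    pg->pg_temp.len = len;
    for (i = 0; i < len; i++, p += 4)
        pg->pg_temp.osds[i] = (int32_t)get_le32(p);
    return 0;
}
```

Because `pg_temp` and `pg_upmap` share one type and one storage location, a caller that decoded a pg_upmap entry with `decode_pg_temp()` can read the result back through `pg->pg_upmap` without any conversion.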

libceph: ceph_decode_skip_* helpers · 278b1d70

Ilya Dryomov authored Jun 21, 2017

Some of these won't be as efficient as they could be (e.g.
ceph_decode_skip_set(... 32 ...) could advance by len * sizeof(u32)
once instead of advancing by sizeof(u32) len times), but that's fine
and not worth a bunch of extra macro code.

Replace skip_name_map() with ceph_decode_skip_map as an example.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

278b1d70
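
The efficiency trade-off mentioned in the ceph_decode_skip_* commit message above can be sketched as follows. This is a standalone illustration, not the kernel macros: `read_len()`, `skip_set_u32_stepwise()` and `skip_set_u32_bulk()` are hypothetical names, and the set is assumed to be encoded as a little-endian u32 count followed by that many u32 elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian u32 length prefix at *p and advance past it. */
static int read_len(const uint8_t **p, const uint8_t *end, uint32_t *len)
{
    assert(*p <= end);
    if ((size_t)(end - *p) < sizeof(uint32_t))
        return -1;
    *len = (uint32_t)(*p)[0] | (uint32_t)(*p)[1] << 8 |
           (uint32_t)(*p)[2] << 16 | (uint32_t)(*p)[3] << 24;
    *p += sizeof(uint32_t);
    return 0;
}

/* Generic approach: advance by sizeof(u32), len times. */
static int skip_set_u32_stepwise(const uint8_t **p, const uint8_t *end)
{
    uint32_t len;

    if (read_len(p, end, &len))
        return -1;
    while (len--) {
        if ((size_t)(end - *p) < sizeof(uint32_t))
            return -1;
        *p += sizeof(uint32_t);
    }
    return 0;
}

/* Specialized approach: one bounds check, one jump of len * sizeof(u32). */
static int skip_set_u32_bulk(const uint8_t **p, const uint8_t *end)
{
    uint32_t len;

    if (read_len(p, end, &len))
        return -1;
    if ((size_t)(end - *p) / sizeof(uint32_t) < len)
        return -1;
    *p += (size_t)len * sizeof(uint32_t);
    return 0;
}
```

Both functions leave the cursor at the same position on success. The stepwise form is what a generic per-element skip macro naturally produces, while the bulk form needs element-size-specific code, which is the extra macro code the commit message chooses to avoid.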

libceph: kill __{insert,lookup,remove}_pg_mapping() · ab75144b

Ilya Dryomov authored Jun 21, 2017

Switch to DEFINE_RB_FUNCS2-generated {insert,lookup,erase}_pg_mapping().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ab75144b

libceph: introduce and switch to decode_pg_mapping() · a303bb0e

Ilya Dryomov authored Jun 21, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

a303bb0e

libceph: don't pass pgid by value · 33333d10

Ilya Dryomov authored Jun 21, 2017

Make __{lookup,remove}_pg_mapping() look like their ceph_spg_mapping
counterparts: take const struct ceph_pg *.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

33333d10

libceph: respect RADOS_BACKOFF backoffs · a02a946d

Ilya Dryomov authored Jun 19, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

a02a946d

libceph: make DEFINE_RB_* helpers more general · 76f827a7

Ilya Dryomov authored Jun 19, 2017

Initially for ceph_pg_mapping, ceph_spg_mapping and ceph_hobject_id,
compared with ceph_pg_compare(), ceph_spg_compare() and hoid_compare()
respectively.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

76f827a7

libceph: avoid unnecessary pi lookups in calc_target() · df28152d

Ilya Dryomov authored Jun 15, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

df28152d

libceph: use target pi for calc_target() calculations · 6d637a54

Ilya Dryomov authored Jun 15, 2017

For luminous and beyond we are encoding the actual spgid, which
requires operating with the correct pg_num, i.e. that of the target
pool.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

6d637a54

libceph: always populate t->target_{oid,oloc} in calc_target() · db098ec4

Ilya Dryomov authored Jun 15, 2017

need_check_tiering logic doesn't make a whole lot of sense. Drop it
and apply tiering unconditionally on every calc_target() call instead.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

db098ec4

libceph: make sure need_resend targets reflect latest map · 04c7d789

Ilya Dryomov authored Jun 15, 2017

Otherwise we may miss events like PG splits, pool deletions, etc. when
we get multiple incremental maps at once.  Because check_pool_dne() can
now be fed an unlinked request, finish_request() needed to be taught to
handle unlinked requests.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

04c7d789

libceph: delete from need_resend_linger before check_linger_pool_dne() · a10bcb19

Ilya Dryomov authored Jun 15, 2017

When processing a map update consisting of multiple incrementals, we
may end up running check_linger_pool_dne() on a lingering request that
was previously added to need_resend_linger list.  If it is concluded
that the target pool doesn't exist, the request is killed off while
still on need_resend_linger list, which leads to a crash on a NULL
lreq->osd in kick_requests():

    libceph: linger_id 18446462598732840961 pool does not exist
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
    IP: ceph_osdc_handle_map+0x4ae/0x870
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

a10bcb19

libceph: resend on PG splits if OSD has RESEND_ON_SPLIT · 7de030d6

Ilya Dryomov authored Jun 15, 2017

Note that ceph_osd_request_target fields are updated regardless of
RESEND_ON_SPLIT.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

7de030d6

libceph: drop need_resend from calc_target() · 84ed45df

Ilya Dryomov authored Jun 15, 2017

Replace it with more fine-grained bools to separate updating
ceph_osd_request_target fields and the decision to resend.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

84ed45df

libceph: MOSDOp v8 encoding (actual spgid + full hash) · 8cb441c0

Ilya Dryomov authored Jun 15, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

8cb441c0

libceph: ceph_connection_operations::reencode_message() method · 98ad5ebd

Ilya Dryomov authored Jun 15, 2017

Give upper layers a chance to reencode the message after the connection
is negotiated and ->peer_features is set. OSD client will use this to
support both luminous and pre-luminous OSDs (in a single cluster): the
former need MOSDOp v8; the latter will continue to be sent MOSDOp v4.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

98ad5ebd

libceph: encode_{pgid,oloc}() helpers · 2e59ffd1

Ilya Dryomov authored Jun 15, 2017

Factor out encode_{pgid,oloc}() and use ceph_encode_string() for oid.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

2e59ffd1

libceph: introduce ceph_spg, ceph_pg_to_primary_shard() · dc98ff72

Ilya Dryomov authored Jun 15, 2017

Store both raw pgid and actual spgid in ceph_osd_request_target.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

dc98ff72

libceph: new pi->last_force_request_resend · 8e48cf00

Ilya Dryomov authored Jun 05, 2017

The old (v15) pi->last_force_request_resend has been repurposed to
make pre-RESEND_ON_SPLIT clients that don't check for PG splits but do
obey pi->last_force_request_resend resend on splits.  See ceph.git
commit 189ca7ec6420 ("mon/OSDMonitor: make pre-luminous clients resend
ops on split").
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

8e48cf00

libceph: fold [l]req->last_force_resend into ceph_osd_request_target · dc93e0e2

Ilya Dryomov authored Jun 05, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

dc93e0e2

libceph: support SERVER_JEWEL feature bits · 220abf5a

Ilya Dryomov authored Jun 05, 2017

Only MON_STATEFUL_SUB, really.  MON_ROUTE_OSDMAP and
OSDSUBOP_NO_SNAPCONTEXT are irrelevant.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

220abf5a

libceph: advertise support for OSD_POOLRESEND · 2d7522e0

Ilya Dryomov authored Jun 05, 2017

The code has been in place since commit 63244fa1 ("libceph:
introduce ceph_osd_request_target, calc_target()"), and, with the
ceph_{oloc,oid}_copy() issue fixed in the previous commit, is now
in working order.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

2d7522e0

libceph: handle non-empty dest in ceph_{oloc,oid}_copy() · ca35ffea

Ilya Dryomov authored Jun 05, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ca35ffea

libceph: new features macros · f179d3ba

Ilya Dryomov authored Jun 05, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

f179d3ba

libceph: remove ceph_sanitize_features() workaround · dcbbd97c

Ilya Dryomov authored Jun 05, 2017

Reflects ceph.git commit ff1959282826ae6acd7134e1b1ede74ffd1cc04a.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

dcbbd97c

ceph: update ceph_dentry_info::lease_session when necessary · 481f001f

Yan, Zheng authored Jul 03, 2017

The current code does not update ceph_dentry_info::lease_session once
it is set. If the auth mds of the corresponding dentry changes, the
dentry lease stays in an invalid state.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

481f001f

ceph: new mount option that specifies fscache uniquifier · 1d8f8360

Yan, Zheng authored Jun 27, 2017

Currently ceph uses the FSID as the primary index key of fscache data.
This allows ceph to retain cached data across remounts. But it causes
problems (kernel oops, since fscache does not support sharing data) when
a filesystem gets mounted several times (with fscache enabled, with
different mount options).

The fix is to add a new mount option that specifies a uniquifier
for fscache.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

1d8f8360

ceph: avoid accessing freeing inode in ceph_check_delayed_caps() · 4b9f2042

Yan, Zheng authored Jun 27, 2017

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

4b9f2042

ceph: avoid invalid memory dereference in the middle of umount · 62a65f36

Yan, Zheng authored Jun 22, 2017

extra_mon_dispatch() and the debugfs foo_show functions dereference
fsc->mdsc. We should clean up fsc->client->extra_mon_dispatch
and debugfs before destroying fsc->mdsc.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

62a65f36

ceph: getattr before read on ceph.* xattrs · 1684dd03

Yan, Zheng authored Jun 14, 2017

Previously we were returning values for quota, layout
xattrs without any kind of update -- the user just got
whatever happened to be in our cache.

Clearly this extra round trip has a cost, but reads of
these xattrs are fairly rare, happening on admin
intervention rather than in normal operation.

Link: http://tracker.ceph.com/issues/17939
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

1684dd03

ceph: don't re-send interrupted flock request · 92e57e62

Yan, Zheng authored Jun 05, 2017

Don't re-send an interrupted flock request on mds failover or on
receiving a request forward. Because the corresponding 'lock intr'
request may have already finished, it won't get re-sent.

Link: http://tracker.ceph.com/issues/20170
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

92e57e62

ceph: cleanup writepage_nounlock() · 43986881

Yan, Zheng authored May 23, 2017

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

43986881

ceph: redirty page when writepage_nounlock() skips unwritable page · fa71fefb

Yan, Zheng authored May 23, 2017

Ceph needs to flush dirty pages in the order of the snap contexts
they belong to. Dirty pages belonging to an older snap context
should be flushed earlier. If writepage_nounlock() cannot flush a
page, it should redirty the page.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

fa71fefb

ceph: remove useless page->mapping check in writepage_nounlock() · f2b0c45f

Yan, Zheng authored May 23, 2017

Callers of writepage_nounlock() have already ensured non-null
page->mapping.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

f2b0c45f

ceph: update the 'approaching max_size' code · efb0ca76

Yan, Zheng authored May 22, 2017

The old 'approaching max_size' code expects the MDS to set max_size to
'2 * reported_size'. This is no longer true. The new code reports the
file size when half of the previous max_size increment has been used.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

efb0ca76
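
The rule stated in the 'approaching max_size' commit message above can be sketched as a small predicate. This is an illustration only, not the kernel's code: `struct size_state`, its field names and `should_report_size()` are hypothetical, and the "previous max_size increment" is assumed to mean `max_size - reported_size`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-file state; field names are illustrative only. */
struct size_state {
    uint64_t size;          /* current file size */
    uint64_t reported_size; /* size last reported to the MDS */
    uint64_t max_size;      /* write limit last granted by the MDS */
};

/* Return true when the client should report its file size again. */
static bool should_report_size(const struct size_state *st)
{
    assert(st != NULL);

    /* At or past the granted limit: always report. */
    if (st->size >= st->max_size)
        return true;

    /*
     * Assumption: the increment is max_size - reported_size. Report once
     * the file has grown through half of it, without assuming that
     * max_size == 2 * reported_size.
     */
    if (st->max_size > st->reported_size) {
        uint64_t half = (st->max_size - st->reported_size) / 2;

        return st->size >= st->reported_size + half;
    }
    return false;
}
```

With `reported_size = 100` and `max_size = 200`, this sketch starts reporting at a size of 150, whatever ratio the MDS used when it chose `max_size`.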

ceph: re-request max size after importing caps · 84eea8c7

Yan, Zheng authored May 16, 2017

The 'wanted max size' could have been sent to the inode's old auth mds.
Re-send it to the inode's new auth mds if necessary; otherwise the write
syscall may hang.
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

84eea8c7

02 Jul, 2017 3 commits

Linux 4.12 · 6f7da290

Linus Torvalds authored Jul 02, 2017

6f7da290

moduleparam: fix doc: hwparam_irq configures an IRQ · 401e000a

Sylvain 'ythier' Hitier authored Jul 02, 2017

Signed-off-by: Sylvain 'ythier' Hitier <sylvain.hitier@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

401e000a

Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · 79c49681

Linus Torvalds authored Jul 02, 2017

Pull MIPS fixes from Ralf Baechle:
 "Here's a final round of fixes for 4.12:

   - Fix misordered instructions in assembly code making kernel startup
     via UHB unreliable.

   - Fix special case of MADDF and MSUBF emulation.

   - Fix alignment issue in address calculation in pm-cps on 64 bit.

   - Fix IRQ tracing & lockdep when rescheduling.

   - Systems with MAARs require post-DMA cache flushes.

  The reordering fix and the MADDF/MSUBF fix have sat in linux-next for
  a number of days. The others haven't propagated from my pull tree to
  linux-next yet but all have survived manual testing and Imagination's
  automated test system and there are no pending bug reports"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: Avoid accidental raw backtrace
  MIPS: Perform post-DMA cache flushes on systems with MAARs
  MIPS: Fix IRQ tracing & lockdep when rescheduling
  MIPS: pm-cps: Drop manual cache-line alignment of ready_count
  MIPS: math-emu: Handle zero accumulator case in MADDF and MSUBF separately
  MIPS: head: Reorder instructions missing a delay slot

79c49681