CMFActivity: Do not use offset for scanning messages to validate.
This was inefficient for two reasons:
- any message we could validate during the current iteration means a message we did not consider is now in the range we just scanned, and it will not be considered until the validation node starts over and scans this same range again.
- the "LIMIT x,1000" pattern on >1000 messages causes a quickly growing number of extra rows scanned by the SQL database just to skip the "x" first rows: at 2000 rows present it must scan 1000 + 2000 = 3000 rows for a complete loop over all pending activities; at 3k rows it must scan 6k rows; at 4k, 10k. While this is an overestimation (some rows should be possible to validate, so those would be scanned only once), the overhead grows so fast that the overestimation soon becomes negligible.
Instead, use a range condition consistent with the query's "SORT ON", which is already efficiently materialised by an index: the SQL database just has to dive into the existing index to start just above the last message from the previous iteration, and resume scanning from there, solving both issues listed above.
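For illustration, here is a minimal SQL sketch of the two scanning strategies, assuming a hypothetical message table with priority, date and uid columns and the (priority, date, uid) sort order described above; table and column names are illustrative, not the exact CMFActivity schema:

-- offset-based scan (old approach): every iteration re-reads and discards
-- the first "x" rows before returning the next batch of 1000
SELECT uid, priority, date
  FROM message
 ORDER BY priority, date, uid
 LIMIT 2000, 1000;

-- range-based scan (new approach): resume just above the last row returned
-- by the previous iteration, entering the existing index directly at that point
SELECT uid, priority, date
  FROM message
 WHERE (priority, date, uid) > (0, '2018-12-19 15:48:51', 12345)
 ORDER BY priority, date, uid
 LIMIT 1000;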
-
Owner
@jm here is the new version, with one more comma which angered the previous tests. Now they look happy (though not over yet). So the remaining question is finding a better name for
above_uid
if you think it needs one.

[edit]: Also, I fixed the commit message. There was a pretty bad typo in the previous one which made the first inefficiency description very hard to understand, I think.
-
mentioned in merge request !776 (merged)
-
Owner
@jm: I realise this change is bad. Not on the intent, but on the implementation:
- the sort on uids, which was here before this change, is meaningless since the switch to random uids. The original intent was that uids implemented a finer-grained "date" value, giving information about activity age when activities are created within a single second. With random uids, this became meaningless.
- ...but it is required for this change to work: otherwise, if there are >=1k activities sharing the same priority and date which cannot be validated, validation loops on them forever (see the sketch below).
So this change relies on something meaningless in order to work, which is not good.
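To make the dependency on the uid sort explicit, here is a hedged sketch of the resume condition, using the same illustrative message table as above, with :last_* standing for the values of the last row from the previous iteration:

-- with uid as tie-break, every row has a unique position in the
-- (priority, date, uid) order, so each iteration makes progress
SELECT uid
  FROM message
 WHERE (priority, date, uid) > (:last_priority, :last_date, :last_uid)
 ORDER BY priority, date, uid
 LIMIT 1000;

-- without the tie-break, rows equal to the last (priority, date) pair are
-- either skipped entirely (strict >) or returned again on every iteration
-- (>=); with >=1000 such rows that cannot be validated, the scan never
-- moves past them
SELECT uid
  FROM message
 WHERE (priority, date) >= (:last_priority, :last_date)
 ORDER BY priority, date
 LIMIT 1000;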
Also, while trying to improve this change I realised something I had completely forgotten: the validation node restarts from the beginning every time it manages to validate 1k activities. I think this is wrong: why not just continue scanning the current activity queue until all activities have been considered? It looks like an optimisation for the special case where the validation node is also a processing node, which I think is pointless to optimise for: either performance does not matter, or a cluster should be used and responsibilities split. This behaviour means the large gain expected when there are many pending activities is not actually achieved.
All this makes me want to revert this commit, to keep the code simple and not add more code relying on dubious features. And we can continue living with a meaningless sort on uid until a better solution is found.
What do you think?
-
Owner
Increasing the precision of the date column (
DATETIME(6)
) looks enough, doesn't it?

MariaDB [test]> select UTC_TIMESTAMP(6); select UTC_TIMESTAMP(6);
+----------------------------+
| UTC_TIMESTAMP(6)           |
+----------------------------+
| 2018-12-19 15:48:51.207722 |
+----------------------------+
1 row in set (0.00 sec)

+----------------------------+
| UTC_TIMESTAMP(6)           |
+----------------------------+
| 2018-12-19 15:48:51.207986 |
+----------------------------+
1 row in set (0.00 sec)
Note that several
UTC_TIMESTAMP(6)
within the same query return the same result (either MariaDB is too fast or it computes the result only once), but it should be ok to sort by uids here.
-
Owner
Increasing date precision looks like a good idea, yes.
I would not say that it fixes the issue (we still need to sort by a meaningless value, just as a tie-break) but it certainly improves the result, and pushes back the limit of how many nodes we can have and how fast they can be before they start to actually rely on that tie-break.
What is needed? I believe something is needed on the insertion queries, isn't it? I also faintly remember that you worked on date precision in the past and had to patch DTML rendering.
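As a rough sketch of what that might involve, still using the hypothetical message table from above (the real CMFActivity table definitions and insertion queries live in its SQL/DTML methods and may differ):

-- widen the date column to microsecond precision
ALTER TABLE message MODIFY date DATETIME(6) NOT NULL;

-- insertion queries would then have to provide microsecond timestamps, e.g.
INSERT INTO message (date, priority)
VALUES (UTC_TIMESTAMP(6), 0);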
-
mentioned in merge request !820 (merged)