xlte:master commits
https://lab.node.vifib.com/kirr/xlte/-/commits/master
2023-12-29T14:03:34+03:00

https://lab.node.vifib.com/kirr/xlte/-/commit/8e606c643aeb92e87e61dcf8df8e4a03bb5fe3b1
nrarfcn: Fix behaviour on invalid input parameters
2023-12-29T14:03:34+03:00
Kirill Smelkov kirr@nexedi.com

Contrary to earfcn, where band can be automatically deduced from earfcn
number because 4G bands never overlap, most functions in nrarfcn accept
as input parameters both nr_arfcn and band, because 5G bands can and do
overlap. As the result it is possible to invoke e.g. dl2ul with
dl_nr_arfcn being outside of downlink spectrum of specified band.
However in b8065120 I made a thinko and handled such a situation with a
simple assert, which does not give useful error feedback from a user
perspective, for example:

    In [2]: xnrarfcn.dl2ul(10000, 1)
    ---------------------------------------------------------------------------
    AssertionError                            Traceback (most recent call last)
    Cell In[2], line 1
    ----> 1 n.dl2ul(10000, 1)

    File ~/src/wendelin/xlte/nrarfcn.py:85, in dl2ul(dl_nr_arfcn, band)
         83 if dl_lo == 'N/A':
         84     raise AssertionError('band%r does not have downlink spectrum' % band)
    ---> 85 assert dl_lo <= dl_nr_arfcn <= dl_hi
         86 ul_lo, ul_hi = nr.get_nrarfcn_range(band, 'ul')
         87 if ul_lo == 'N/A':

    AssertionError:

The issue here is that asserts should be used only to verify internal
invariants, and that the reported error does not provide details about which
nrarfcn and band were used in the query.

-> Fix this by providing details in the error reported on incorrect
module usage, and by consistently raising ValueError for "invalid
parameters" cases.
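
The kind of check described here can be sketched as follows. This is only an
illustration and not the actual nrarfcn.py code: `check_dl_range` and its
`dl_lo`/`dl_hi` parameters are hypothetical names, and the range used in the
call is just an example value.

```python
def check_dl_range(dl_nr_arfcn, band, dl_lo, dl_hi):
    """Reject out-of-range input with ValueError instead of a bare assert.

    Hypothetical helper: dl_lo/dl_hi stand for the downlink NR-ARFCN range
    of the band, however the caller obtained it.
    """
    if not (dl_lo <= dl_nr_arfcn <= dl_hi):
        # include both band and NR-ARFCN so the user sees what was wrong
        raise ValueError('band%r: NR-ARFCN=%r is outside of downlink spectrum'
                         % (band, dl_nr_arfcn))
    return dl_nr_arfcn


try:
    check_dl_range(10000, 1, dl_lo=422000, dl_hi=434000)  # example range only
except ValueError as e:
    message = str(e)
```

Unlike an assert, the explicit raise is also kept when Python runs with -O.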

The reported error for the above example now becomes

    ValueError: band1: NR-ARFCN=10000 is outside of downlink spectrum

https://lab.node.vifib.com/kirr/xlte/-/commit/b8065120763aafa35a83f48c35db78d97ed4cf45
nrarfcn: New package to do computations with NR bands, frequencies and NR-ARF...
2023-12-05T12:08:14+03:00
Kirill Smelkov kirr@nexedi.com

Add a package for converting between DL and UL NR-ARFCN, and for
converting DL NR-ARFCN to SSB NR-ARFCN. The API mimics xlte.earfcn added in 6cb9d37f.
xlte.nrarfcn complements pypi.org/project/nrarfcn, which we use here under the hood.
See package documentation for API details.

https://lab.node.vifib.com/kirr/xlte/-/commit/6cb9d37fdfbe5f960182c190c05ffa675770c7d8
earfcn: New package to do computations with LTE bands, frequencies and EARFCN...
2023-10-25T13:57:16+03:00
Kirill Smelkov kirr@nexedi.com

Add a package which provides calculations like EARFCN -> frequency and
EARFCN -> band info, and which converts between DL and UL EARFCN.
I was hoping to find something ready on the net, but could find only
pypi.org/project/nrarfcn for 5G, while for LTE everything I found
was of lesser quality and capability.
-> So do it myself.
See package documentation for API details.

https://lab.node.vifib.com/kirr/xlte/-/commit/bcfd82ddcaefe6429a917160865b997236f8a815
amari.xlog: Add support for arbitrary query options
2023-07-25T09:17:09+00:00
Kirill Smelkov kirr@nexedi.com

Before this patch we were supporting only boolean option flags - with, for
example, stats[rf] meaning stats query with {"rf": True} arguments. Now we add
support for arbitrary types, so that it is possible to specify e.g. integer or
string query options, as well as some boolean flag set to false.
This should be good for generality.
For backward compatibility the old way of implicitly specifying "on" flags
continues to be supported.

https://lab.node.vifib.com/kirr/xlte/-/commit/70b4b71cbbe13bd727ce431d6514809fabdb3889
amari.xlog: Test and polish str(LogSpec)
2023-07-25T09:13:18+00:00
Kirill Smelkov kirr@nexedi.com

If the parsed period was '60s' we were printing it back as '60.0s' on str.
Fix it by using %g instead of %s.

https://lab.node.vifib.com/kirr/xlte/-/commit/ce38349245ba290c1dc6acb01dcd0902ea70dcc9
fixup! amari.xlog: Implement log rotation
2023-04-27T14:38:51+03:00
Kirill Smelkov kirr@nexedi.com

Adjust the plain writer to append to the log file instead of truncating it on
open. The rotating writers are already ok as they use "a" mode by default.

https://lab.node.vifib.com/kirr/xlte/-/commit/bf96c767b6cadf053d22e377c3438e13e087f27e
kpi: Add way to compute aggregated counters + showcase this
2023-04-20T13:18:07+03:00
Kirill Smelkov kirr@nexedi.com

- add Calc.cum to aggregate Measurements.
- add ΣMeasurement type to represent the result of this. It is very similar
  to Measurement, but every field comes accompanied with information
  about how much time there was no data for that field. In other words
  it is not all or nothing for NA in the result. For example a field
  might be present 90% of the time and NA only 10% of the time. We want to
  preserve knowledge about that 90% of valid values in the result. And we
  also want to know how much time there was no data.
- amend kpidemo.py and kpidemo.ipynb to demonstrate this.

https://lab.node.vifib.com/kirr/xlte/-/commit/bda7ab218712189a816d70b46bc3ac35dd1f03dd
Pre-Alpha -> Alpha
2023-04-18T21:46:09+03:00
Kirill Smelkov kirr@nexedi.com

XLTE should now be ready to be tried for real use.

https://lab.node.vifib.com/kirr/xlte/-/commit/e716ab5177a0bbe24cf98491260e6f482f95f804
Test via tox support for all py3.9 py3.10 and py3.11
2023-04-18T21:44:36+03:00
Kirill Smelkov kirr@nexedi.com

Those are the Python versions we care about. Add explicit tests to cover them.

https://lab.node.vifib.com/kirr/xlte/-/commit/9fff99c8a3032a8c983a59a7256486808960bd5d
fixup! amari.xlog: Implement log rotation
2023-04-18T21:33:31+03:00
Kirill Smelkov kirr@nexedi.com

Xavier reports that str|None is supported only by python ≥ 3.10, while
we still should care to support at least 3.9 - e.g. SlapOS uses it by
default as well as Debian 11.
Let's not delve deep into typing game. If we cannot express things
easily we can omit the type completely or express it in comments.
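
For illustration, on Python 3.9 this could look as follows. The function names
and the `rotatespec` parameter are hypothetical and are not the actual xlog
code. The second variant, using typing.Optional, is an alternative that is not
mentioned in this patch; it is shown only for comparison.

```python
from typing import Optional


# Variant 1: no annotation at all; the type is documented in a comment.
def describe_rotation_a(rotatespec=None):    # rotatespec: str | None
    return "no rotation" if rotatespec is None else "rotate " + rotatespec


# Variant 2 (not from the patch): typing.Optional is accepted by Python 3.9,
# while a `str | None` annotation is only evaluated successfully on >= 3.10.
def describe_rotation_b(rotatespec: Optional[str] = None) -> str:
    return "no rotation" if rotatespec is None else "rotate " + rotatespec
```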
/reported-by @xavier_thompson at https://lab.nexedi.com/kirr/xlte/merge_requests/5#note_182930

https://lab.node.vifib.com/kirr/xlte/-/commit/a2c3afaadc10833885b21c909e1ed913f87f58d2
amari.xlog: Implement log rotation
2023-04-17T20:35:02+03:00
Kirill Smelkov kirr@nexedi.com

Rotate output enb.xlog ourselves at sync points so that nothing is lost
in the output (hello `logrotate copytruncate`) and so that we can emit
pre- and post- logrotate syncs.
Reuse logging's RotatingFileHandler and TimedRotatingFileHandler to
implement actual rotation, but carefully wrap them in our writer
classes so that we emit exactly the output we prepared explicitly
without any headers prepended by logging, and that we explicitly control
when rotation happens.
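
A minimal sketch of that wrapping idea, assuming only the standard
logging.handlers API: the class name `RotatingWriter` and its methods are
hypothetical and are not the writer classes from this patch. Passing
maxBytes=0 disables size-triggered rollover, so rotation happens only when
rotate() is called, and writing directly to the handler's stream bypasses
logging's own record formatting.

```python
import logging.handlers


class RotatingWriter:
    """Write prepared text verbatim; rotate only on explicit request."""

    def __init__(self, path, backup_count=3):
        # maxBytes=0: the handler never decides to roll over on its own
        self._h = logging.handlers.RotatingFileHandler(
            path, mode="a", maxBytes=0, backupCount=backup_count)

    def write(self, text):
        # bypass Handler.emit/format: output exactly what was prepared
        self._h.stream.write(text)
        self._h.flush()

    def rotate(self):
        # e.g. call this in between pre- and post-logrotate syncs
        self._h.doRollover()

    def close(self):
        self._h.close()
```

A caller would emit its pre-logrotate sync via write(), call rotate(), and
then emit the post-logrotate sync into the fresh file.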
/proposed-for-review-at https://lab.nexedi.com/kirr/xlte/merge_requests/5

https://lab.node.vifib.com/kirr/xlte/-/commit/582202537bc540566def3eb083027676a8d2aa7b
Add way to run tests via nxdtest
2023-03-28T17:43:37+03:00
Kirill Smelkov kirr@nexedi.com

Nxdtest[1] is a tox-like tool to run tests under Nexedi testing
infrastructure.

[1] https://lab.nexedi.com/nexedi/nxdtest

https://lab.node.vifib.com/kirr/xlte/-/commit/612a3d0febbb600bfee20817656b2299d56b3ad2
demo: Add __init__.py
2023-03-27T20:25:35+03:00
Kirill Smelkov kirr@nexedi.com

py3 worked without it ok, but py2 was failing to import xlte.demo
without demo/__init__.py present.

https://lab.node.vifib.com/kirr/xlte/-/commit/a94430607190ec62a4c23e381cfd87e6fed9631a
amari.kpi: tests: Use %r instead of %s to workaround float-point rounding
2023-03-27T20:23:42+03:00
Kirill Smelkov kirr@nexedi.com

For py3 it does not matter, but on py2 %s prettifies result a bit:

    kirr@deca:~$ python
    Python 2.7.18 (default, Jul 14 2021, 08:11:37)
    >>> str(0.1-0.01)
    '0.09'
    >>> repr(0.1-0.01)
    '0.09000000000000001'

    kirr@deca:~$ python3
    Python 3.9.2 (default, Feb 28 2021, 17:03:44)
    >>> str(0.1-0.01)
    '0.09000000000000001'
    >>> repr(0.1-0.01)
    '0.09000000000000001'

This should make the diff between master and py2 a bit smaller.

https://lab.node.vifib.com/kirr/xlte/-/commit/8a079af917d99d02f2fc2cdfdd03a688bdc10aeb
*: from __future__ import print_function, division, absolute_import
2023-03-27T20:16:36+03:00
Kirill Smelkov kirr@nexedi.com

This will make the diff between master and the py2 backport a bit smaller.

https://lab.node.vifib.com/kirr/xlte/-/commit/67466ae5c49a020d1279ef1f49a6e7e93ccd1f6b
amari.xlog: Sync, reverse reading, timestamps for eNB < 2022-12-01
2023-03-22T10:49:05+03:00
Kirill Smelkov kirr@nexedi.com

Rework XLog protocol to come with periodic sync events that come from time to
time so that xlog stream becomes self-synchronizing. Sync events should be
useful for Wendelin to start reading xlog stream from any point, and to verify
that the stream is ok by matching its content vs messages schedule coming in
the syncs.
Teach xlog.Reader to read streams in reverse order from end to start. This
should be useful to look at tail of a log without reading it in full from the
start.
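
To illustrate the general idea of reading from end to start, here is a minimal
sketch that yields lines of a file in reverse order by reading it block by
block from the tail. It is not the xlog.Reader implementation: the function
name and the blocksize parameter are hypothetical, and blank lines are simply
skipped.

```python
import os


def read_lines_reversed(path, blocksize=4096):
    """Yield non-empty lines of a UTF-8 text file from last to first."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        tail = b""          # beginning of a line whose start is not read yet
        while pos > 0:
            n = min(blocksize, pos)
            pos -= n
            f.seek(pos)
            chunk = f.read(n) + tail
            lines = chunk.split(b"\n")
            tail = lines[0]                 # may continue into earlier block
            for line in reversed(lines[1:]):
                if line:
                    yield line.decode("utf-8")
        if tail:
            yield tail.decode("utf-8")      # the very first line of the file
```

Only complete lines are decoded, so a multi-byte character split across two
blocks does not cause a decoding error.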
Teach xlog.Reader to reconstruct message timestamps for xlog streams produced
with Amarisoft releases < 2022-12-01. There messages do not have the .utc field
added in https://support.amarisoft.com/issues/21934 and come with only the .time
field that represents internal eNB time using a clock originating at eNB startup.
We combine message.time and δ(utc, enb.time) from sync to build message.timestamp.
See individual patches for details and
https://lab.nexedi.com/kirr/xlte/merge_requests/3 for preliminary discussion.

/reviewed-by @xavier_thompson
/reviewed-on https://lab.nexedi.com/kirr/xlte/merge_requests/4

https://lab.node.vifib.com/kirr/xlte/-/commit/0c772eb4afd59d6f2e64536028fe6504a894afca
amari.xlog: attach,sync += information about on-service time
2023-03-22T10:45:50+03:00
Kirill Smelkov kirr@nexedi.com

We currently emit information about local time in events, and
information about on-service time in messages. Events don't have
information about on-service time and messages don't carry information
about local time. That is mostly ok, since primary xlog setup is to run
on the same machine, where eNB runs because on-service .utc correlates
with .time in events.
However for eNB < 2022-12-01 on-service time includes only .time field
without .utc field with .time representing "time passed since when eNB
was started". This way for enb.xlog streams generated on older systems
it is not possible for xlog.Reader to know the absolute timestamps of
read messages.
To fix this we amend "attach" and "sync" events to carry both local and
on-service times. This way xlog.Reader, after seeing e.g. "sync" with
.time and only .srv_time without .srv_utc, should be able to
correlate local and on-service clocks and to approximate srv_utc as
srv_utc' = srv_time' + (time - srv_time)
where time and srv_time correspond to last synchronization, and
srv_time' is what xlog.Reader retrieves for a further-read message in
question.
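
The formula above can be written down directly. The function and argument
names below are hypothetical; only the arithmetic comes from this message, and
the numbers in the example are made up.

```python
def approx_srv_utc(msg_srv_time, sync_time, sync_srv_time):
    """srv_utc' = srv_time' + (time - srv_time)

    sync_time, sync_srv_time: local and on-service times from the last sync
    msg_srv_time:             on-service .time of a message read afterwards
    """
    return msg_srv_time + (sync_time - sync_srv_time)


# last sync: local clock 1679470000.0, eNB running for 120.0 seconds;
# the next message carries srv_time = 125.5, i.e. 5.5 seconds after the sync
utc = approx_srv_utc(125.5, sync_time=1679470000.0, sync_srv_time=120.0)
```

This is an approximation: it relies on the local and on-service clocks
advancing at the same rate in between syncs.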
See https://lab.nexedi.com/kirr/xlte/merge_requests/3 for related discussion.

https://lab.node.vifib.com/kirr/xlte/-/commit/dbecc158a393691463643026efa4fb25b25d3f9d
amari.xlog: Teach Reader to read xlog in reverse order from end to start
2023-03-22T10:45:50+03:00
Kirill Smelkov kirr@nexedi.com

This functionality is useful to look at tail of a log without reading it
in full from the start.

https://lab.node.vifib.com/kirr/xlte/-/commit/515c15739b070a894e794fac224225712a89701e
amari.xlog: Reader: tests: Verify yielded .pos
2023-03-22T10:45:50+03:00
Kirill Smelkov kirr@nexedi.com

.pos verification should have been there since xlog.Reader was introduced in
0633d26f (amari.xlog += Reader).

https://lab.node.vifib.com/kirr/xlte/-/commit/b412d488473e2ddee78c77d9d94d29fab122aacc
amari.xlog: Require sync to be present at least every 1000 records
2023-03-22T10:45:50+03:00
Kirill Smelkov kirr@nexedi.com

This way xlog.Reader can be sure that if it looked around in such a
window and did not find a sync, then something is not good with the
stream and it does not need to go beyond that limit looking around.
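
A sketch of such a bounded search is given below. It is an illustration only:
the function name is hypothetical, records are assumed to be already-parsed
dicts with sync events looking like {"meta": {"event": "sync"}}, and only the
backward direction is shown.

```python
def find_sync_backward(records, pos, window=1000):
    """Return index of the nearest sync at or before pos.

    At most `window` records are inspected; finding no sync among them
    means the stream violates the "sync at least every 1000 records" rule.
    """
    lo = max(0, pos - window + 1)
    for i in range(pos, lo - 1, -1):
        if records[i].get("meta", {}).get("event") == "sync":
            return i
    raise ValueError("no sync within %d records before position %d"
                     % (window, pos))
```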
This is a change of the protocol. But it is early days and the existing logs
that we use in the demo are all below the 1000-line limit, so they will
continue to load ok.
No direct test for actual Loss Of Sync detection - this functionality is
draft for now and should be improved later. However for no-LOS cases
xlog.Reader is already covered with tests.

https://lab.node.vifib.com/kirr/xlte/-/commit/9d9d20f31cdba9b6ebcd284c0a789878932b8fb2
amari.xlog: Unify start with sync
2023-03-22T10:45:40+03:00
Kirill Smelkov kirr@nexedi.com

Let's use "sync(reason=start)" instead of dedicated "start" event for
uniformity. Periodic syncs are now "sync(reason=periodic)" and after
logrotation support there will be also "pre-logrotate" and
"post-logrotate" reasons. Emit "sync(reason=stop)" at xlog shutdown for
uniformity and to make it more clear from looking at just enb.xlog about
what is xlog state at the end.
Stop requiring "start" to be present in the header - we will soon rework
xlog reader to look around for nearby sync automatically so that reading
could be started from any position in the stream.

https://lab.node.vifib.com/kirr/xlte/-/commit/67ece6014ac2989f09912b5b2577579cd0c3653e
amari.xlog: Emit config_get after every sync(attached) instead of only after ...
2023-03-21T15:11:00+03:00
Kirill Smelkov kirr@nexedi.com

We have been emitting config_get after every attach since the beginning of xlog in
e0cc8a38 (amari.xlog: Initial draft). The reasoning here is that it is
useful by default to know the configuration of a service.
In the previous patch we added sync events so that xlog stream becomes
self-synchronizing. To continue that line it is now useful to have that
config_get emitted after every such synchronization point instead of
only after attaching to the service. That's what this patch does.
As a bonus the code is reworked in a way that config_get setup is not
hardcoded anymore and config_get periodicity now can be controlled by
users via explicitly specifying config_get in the logspec.

https://lab.node.vifib.com/kirr/xlte/-/commit/964f954a53ca56e8f3c18d5d105cf2d7603bd393
amari.xlog: Emit sync events periodically
2023-03-21T15:09:41+03:00
Kirill Smelkov kirr@nexedi.com

So that xlog stream becomes self-synchronized and could be used even if
we start reading it from some intermediate point instead of only from
the beginning.
We will need this in general - to be able to start reading long log not
only from its beginning, and also in particular for Wendelin systems
where logs are uploaded by Fluentd in chunks and some chunks could be
potentially lost.
Sync events are emitted always unconditionally with default sync
interval being 10x the longest specified period. We also provide users a
way to control sync periodicity via explicitly specifying
"meta.sync/period" query in the logspec.
See https://lab.nexedi.com/kirr/xlte/merge_requests/3#note_175796 and
further for related discussion.
This is a change of the xlog protocol. But it is early days and the only
direct consumer of xlog is amari.kpi which we adjust accordingly. So it
should be ok.

https://lab.node.vifib.com/kirr/xlte/-/commit/271fad82a1b61d577800a757249edb842443e5c0
amari.xlog: Add LogSpec constructor
2023-03-17T07:37:45+03:00
Kirill Smelkov kirr@nexedi.com

We will soon need to construct logspecs not only by way of parsing.

https://lab.node.vifib.com/kirr/xlte/-/commit/393b52702e669fe6d4610512979b699720f0abb5
fixup! amari.xlog: Move main logger to a thread
2023-03-17T07:27:40+03:00
Kirill Smelkov kirr@nexedi.com

In 79d10eb9 that patch wired ctx through xlog callchains and added
corresponding handling of cancellation. But I overlooked one place where
plain sleep was used.
-> Fix it.

https://lab.node.vifib.com/kirr/xlte/-/commit/2a016d486cc3cea1f4143ebc176b502dc61486a8
Draft support for E-UTRAN IP Throughput KPI
2023-03-09T11:41:12+03:00
Kirill Smelkov kirr@nexedi.com

The most interesting patches are

- d102ffaa (drb: Start of the package)
- 5bf7dc1c (amari.{drb,xlog}: Provide aggregated DRB statistics in the form of synthetic x.drb_stats message)
- 499a7c1b (amari.kpi: Teach LogMeasure to handle x.drb_stats messages)
- 2824f50d (kpi: Calc: Add support for E-UTRAN IP Throughput KPI)
- 4b2c8c21 (demo/kpidemo.*: Add support for E-UTRAN IP Throughput KPI + demonstrate it in the notebook)

The other patches introduce or adjust needed infrastructure. A byproduct
of particular note is that kpi.Measurement now supports QCI.
A demo can be seen in the last part of
https://nbviewer.org/urls/lab.nexedi.com/kirr/xlte/raw/43aac33e/demo/kpidemo.ipynb

Below we provide an overview of the implementation.

Overview of E-UTRAN IP Throughput computation
---------------------------------------------
Before we begin explaining how IP Throughput is computed, let's first refresh
what it is and have a look at what is required to compute it reasonably.
This KPI is defined in TS 32.450[1] and aggregates transmission volume and
time over bursts of transmissions from an average UE point of view. It should be
particularly noted that only the time during which transmission is going on
should be accounted for. For example if a UE receives 10KB over a 4ms burst and the rest of
the time there is no transmission to it during, say, 1 minute, the downlink IP
Throughput for that UE over the minute is 20Mbit/s (= 8·10KB/4ms), not 1.3Kbit/s (= 8·10KB/60s).
This KPI basically shows what the speed would be to e.g. download a response to an
HTTP request issued from a mobile.
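
The arithmetic of that example, as a small sketch. The helper name is
hypothetical and 1KB is taken as 1000 bytes, which is an assumption consistent
with the figures quoted above.

```python
def throughput_bits_per_s(nbytes, seconds):
    """Throughput in bit/s for nbytes transferred during `seconds`."""
    return 8 * nbytes / seconds


# 10KB received in a single 4ms burst within one minute
burst = throughput_bits_per_s(10_000, 0.004)   # only burst time is counted
naive = throughput_bits_per_s(10_000, 60)      # whole minute counted: wrong

print("burst: %.1f Mbit/s" % (burst / 1e6))
print("naive: %.1f Kbit/s" % (naive / 1e3))
```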

[1] https://www.etsi.org/deliver/etsi_ts/132400_132499/132450/16.00.00_60/ts_132450v160000p.pdf#page=13

To compute IP Throughput we thus need to know Σ of transmitted amount
of bytes, and Σ of the time of all transmission bursts.
Σ of the bytes is relatively easy to get. eNB already provides close values in
overall `stats` and in per-UE `ue_get[stats]` messages. However there is
nothing readily available out of the box for Σ of burst transmission time.
Thus we need to measure the time of transmission bursts ourselves somehow.
It turns out that with the current state of things the only practical way to
measure it to some degree is to poll eNB frequently with `ue_get[stats]` and
estimate transmission time based on δ of `ue_get` timestamps.
Let's see how frequently we need to poll to get reasonable accuracy of the resulting throughput.
A common situation for HTTP requests issued via LTE is that response content
downloading takes only a few milliseconds. For example I used chromium
network profiler to access various sites via internet tethered from my phone
and saw that for many requests response content downloading time was e.g. 4ms,
5ms, 3.2ms, etc. The accuracy of measuring transmission time should be thus in
the order of millisecond to cover that properly. It makes a real difference for
reported throughput, if say a download sample with 10KB took 4ms, or it took
e.g. "something under 100ms". In the first case we know that for that sample
downlink throughput is 2500KB/s, while in the second case all we know is that
downlink throughput is "higher than 100KB/s" - a 25 times difference and not
certain. Similarly if we poll at 10ms rate we would get that throughput is "higher
than 1000KB/s" - a 2.5 times difference from actual value. The accuracy of 1
millisecond coincides with TTI time and with how downlink/uplink transmissions
generally work in LTE.
With the above the scheme to compute IP Throughput looks to be as
follows: poll eNB at 1000Hz rate for `ue_get[stats]`, process retrieved
information into per-UE and per-QCI streams, detect bursts on each UE/QCI pair,
and aggregate `tx_bytes` and `tx_time` from every burst.
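
The burst detection and aggregation step for one UE/QCI pair could be
sketched like this. It is a simplification for illustration, not the code from
this series: the function name and the sample layout are hypothetical, and a
burst is taken to be a run of consecutive polling intervals that carried data.

```python
def aggregate_bursts(samples):
    """Aggregate bursts from per-interval samples of one UE/QCI stream.

    samples: iterable of (tx_bytes, tx_time_ms), one per polling interval.
    Returns (Σ tx_bytes, Σ tx_time_ms, number of bursts).
    """
    total_bytes = 0
    total_time_ms = 0
    nbursts = 0
    in_burst = False
    for tx_bytes, tx_time_ms in samples:
        if tx_bytes > 0:
            if not in_burst:        # an idle -> active transition starts a burst
                nbursts += 1
                in_burst = True
            total_bytes += tx_bytes
            total_time_ms += tx_time_ms
        else:
            in_burst = False        # an idle interval ends the current burst
    return total_bytes, total_time_ms, nbursts


samples = [(0, 0), (4000, 1), (6000, 1), (0, 0), (500, 1)]
tx_bytes, tx_time_ms, nbursts = aggregate_bursts(samples)
throughput_bps = 8 * tx_bytes / (tx_time_ms / 1000)
```

Idle intervals contribute nothing to the time sum, which is what makes the
result a burst throughput rather than an average over wall-clock time.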
It looks to be straightforward, but 1000Hz polling will likely create
non-negligible additional load on the system and disturb eNB itself
introducing much jitter and harming its latency requirements. That's probably
why eNB actually rate-limits WebSocket requests not to go higher than 100Hz -
a frequency 10 times lower than what we need to get reasonable
accuracy for IP throughput.
Fortunately there is additional information that provides a way to improve
accuracy of measured `tx_time` even when polled every 10ms at 100Hz rate:
that additional information is the number of transport blocks transmitted to/from
a UE. If we know that during a 10ms frame e.g. 4 transport blocks were transmitted
to the UE, that there were no retransmissions *and* that eNB is not congested, we can
reasonably estimate that it was actually a 4ms transmission. And if eNB is
congested we can still say that transmission time is somewhere in `[4ms, 10ms]`
interval because transmitting each transport block takes 1 TTI. Even if
imprecise that still provides some information that could be useful.
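
That estimation can be expressed as a pair of bounds. The sketch below is
hypothetical in its names and signature and assumes no retransmissions; it
only encodes the reasoning from the paragraph above.

```python
def estimate_tx_time_ms(n_tb, congested, frame_ms=10, tti_ms=1):
    """Return (lo, hi) bounds in ms on transmission time within one frame.

    n_tb:      number of transport blocks seen during the polled frame
    congested: whether eNB is considered congested during that frame
    """
    # each transport block takes one TTI, so this is the minimum time
    lo = min(n_tb * tti_ms, frame_ms)
    # under congestion the blocks may be spread over the whole frame
    hi = frame_ms if (congested and n_tb > 0) else lo
    return lo, hi
```

For 4 transport blocks this gives (4, 4) when eNB is not congested and
(4, 10) when it is.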
Also 100Hz polling turns out to be acceptable from a performance point of view and
does not disturb the system much. For example on the callbox machine the process,
that issues polls, takes only about 3% of CPU load and only on one core, and
the CPU usage of eNB does not practically change and its reported tx/rx latency
does not change as well. For sure, there is some disturbance, but it appears to
be small. To have a better idea of what rate of polling is possible, I've made
an experiment with the poller accessing my own websocket echo server quickly
implemented in python. Both the poller and the echo server are not optimized,
but without rate-limiting they could go up to 8000Hz frequency while reaching 100%
CPU usage of one CPU core. That 8000Hz is 80 times more than the 100Hz
frequency actually allowed by eNB. This shows what kind of polling
frequency limit the system can handle, if absolutely needed, and that 100Hz
turns out to be not so high a frequency. Also the Linux 5.6 kernel, installed
on the callbox from Fedora32, is configured with `CONFIG_HZ=1000`, which is
likely helping here.
Implementation overview
~~~~~~~~~~~~~~~~~~~~~~~
The scheme to compute E-UTRAN IP Throughput is thus as follows: poll eNB at
100Hz frequency for `ue_get[stats]` and retrieve information about per-UE/QCI
streams and the number of transport blocks dl/ul-ed to the UE in question
during that 10ms frame. Estimate `tx_time` taking into account
the number of transmitted transport blocks. And estimate whether eNB is congested or
not based on `dl_use_avg`/`ul_use_avg` taken from `stats`. For the latter we
also need to poll for `stats` at 100Hz frequency and synchronize
`ue_get[stats]` and `stats` requests in time so that they both cover the same
time interval of particular frame.
Then organize the polling process to provide aggregated statistics in the form of
new `x.drb_stats` message, and teach `xamari xlog` to save those messages to
`enb.xlog` together with `stats`. Then further adjust `amari.kpi.LogMeasure`
and generic `kpi.Measurement` and `kpi.Calc` to handle DRB-related data.
That is how it is implemented.
The main part, that performs 100Hz polling and flow aggregation, is in amari/drb.py.
There `Sampler` extracts bursts of data transmissions from stream of `ue_get[stats]`
observations and `x_stats_srv` organizes whole 100Hz sampling process and provides
aggregated `x.drb_stats` messages to `amari.xlog`.
Even though the main idea is relatively straightforward, several aspects
deserve to be noted:

1. information about transmitted bytes and corresponding transmitted transport
blocks is emitted by eNB not synchronized in time. The reason here is that,
for example, for DL a block is transmitted via PDCCH+PDSCH during one TTI, and
then the base station awaits HARQ ACK/NACK. That ACK/NACK comes later via
PUCCH or PUSCH. The time window in between original transmission and
reception of the ACK/NACK is 4 TTIs for FDD and 4-13 TTIs for TDD(*).
And Amarisoft LTEENB updates counters for dl_total_bytes and dl_tx at
different times:
ue.erab.dl_total_bytes - right after sending data on PDCCH+PDSCH
ue.cell.{dl_tx,dl_retx} - after receiving ACK/NACK via PUCCH|PUSCH
this way an update to dl_total_bytes might be seen in one frame (= 10·TTI),
while corresponding update to dl_tx/dl_retx might be seen in either same, or
next, or next-next frame.
`Sampler` brings δ(tx_bytes) and #tx_tb in sync itself via `BitSync`.
2. when we see multiple transmissions related to UE on different QCIs, we
cannot directly use corresponding global number of transport blocks to estimate
transmissions times because we do not know how eNB scheduler placed those
transmissions onto resource map. So without additional information we can only
estimate corresponding lower and upper bounds.
3. for output stability and to avoid throughput being affected by partial fill
of tail TTI of a burst, E-UTRAN IP Throughput is required to be computed
without taking into account last TTI of every sample. We don't have that
level of details since all we have is total amount of transmitted bytes in a
burst and estimation of how long in time the burst is. Thus, once again, we
can only provide an estimate such that the resulting E-UTRAN IP
Throughput uncertainty window covers the right value required by the 3GPP standard.
A curious reader might be interested to look at the tests in `amari/drb_test.py`,
and at the whole set of changes that brought E-UTRAN IP Throughput alive.
Limitations
~~~~~~~~~~~
Current implementation has the following limitations:
- we account whole PDCP instead of only IP traffic.
- the KPI is computed with an uncertainty window instead of being precise even when the
connection to eNB is alive all the time. The shorter the bursts are, the larger
the uncertainty.
- the implementation works correctly for FDD, but not for TDD. That's because
BitSync currently supports only "next frame" case and support for "next-next
frame" case is marked as TODO.
- eNB `t` monitor command practically stops working and now only reports
``Warning, remote API ue_get (stats = true) pending...`` instead of reporting
useful information. This is because, contrary to `stats`, for `ue_get` eNB
does not maintain per-connection state and uses global singleton counters.
- the performance overhead might be more noticeable on machines less
powerful compared to callbox.
To address the limitations I plan to talk to Amarisoft about eNB improvements
so that E-UTRAN IP Throughput could be computed precisely from DRB statistics
directly provided by eNB itself.
However it is still useful to have current implementation, even with all its
limitations, because it already works today with existing eNB versions.
Kirillhttps://lab.node.vifib.com/kirr/xlte/-/commit/43aac33e4697602622b2ebf13dfff75869fbcc6f*: Cosmetics + minor2023-03-09T11:30:43+03:00Kirill Smelkovkirr@nexedi.com
Noticed while developing support for E-UTRAN IP Throughput.https://lab.node.vifib.com/kirr/xlte/-/commit/4b2c8c2194d24664f345784dd3232a21f2460d83demo/kpidemo.*: Add support for E-UTRAN IP Throughput KPI + demonstrate it in...2023-03-09T11:30:41+03:00Kirill Smelkovkirr@nexedi.com
Show how to compute that KPI, add corresponding plotting routines, and
teach kpidemo.py to display both E-RAB Accessibility and E-UTRAN IP
Throughput simultaneously in the same window.
Add corresponding demonstration into demo notebook with data from
throughput experiment showcasing several scenarios and how E-UTRAN IP
Throughput implementation handles them.https://lab.node.vifib.com/kirr/xlte/-/commit/517859806db82c8e581dabb8febd7b1550c9cfc5demo/kpidemo.*: Refactor commonly used bits into helper routines2023-03-09T01:56:43+03:00Kirill Smelkovkirr@nexedi.com
- move code to load amari.kpi.LogMeasure -> kpi.MeasurementLog into
load_measurements(). We will need to use that when showcasing E-UTRAN
IP Throughput KPI to load another enb.xlog dataset.
- factor code to iterate over MeasurementLog and invoke kpi.Calc on each
period into calc_each_period(). Same reason.
- factor plotting code into helper routines located only in kpidemo.py.
The notebook version now uses those routines by way of importing. The
plotting code is not helping to understand the KPI computation
pipeline usage, so it makes sense not to show it out of the box in the
demo notebook.https://lab.node.vifib.com/kirr/xlte/-/commit/1dc74b0ca58c4297d2d97e16726f3ff63e8fad32t/udpflood: Test program to simulate transmission bursts2023-03-09T01:56:43+03:00Kirill Smelkovkirr@nexedi.com
It is useful to verify E-UTRAN IP Throughput KPI implementation, as
that KPI is defined in terms of burst samples.https://lab.node.vifib.com/kirr/xlte/-/commit/2824f50ddb5ad83950f30899559496e1cd9fb6eckpi: Calc: Add support for E-UTRAN IP Throughput KPI2023-03-09T01:55:56+03:00Kirill Smelkovkirr@nexedi.com
This patch provides the final building block for E-UTRAN IP Throughput KPI.
It continues
d102ffaa (drb: Start of the package)
5bf7dc1c (amari.{drb,xlog}: Provide aggregated DRB statistics in the form of synthetic x.drb_stats message)
499a7c1b (amari.kpi: Teach LogMeasure to handle x.drb_stats messages)
Quoting those patches
The scheme to compute E-UTRAN IP Throughput is thus as follows: poll eNB at
100Hz frequency for `ue_get[stats]` and retrieve information about per-UE/QCI
streams and the number of transport blocks dl/ul-ed to the UE in question
during that 10ms frame. Estimate `tx_time` taking into account
the number of transmitted transport blocks. And estimate whether eNB is congested or
not based on `dl_use_avg`/`ul_use_avg` taken from `stats`. For the latter we
also need to poll for `stats` at 100Hz frequency and synchronize
`ue_get[stats]` and `stats` requests in time so that they both cover the same
time interval of particular frame.
Then organize the polling process to provide aggregated statistics in the form of
new `x.drb_stats` message, and teach `xamari xlog` to save those messages to
`enb.xlog` together with `stats`.
Then further adjust `amari.kpi.LogMeasure` and generic `kpi.Measurement`
and `kpi.Calc` to handle DRB-related data. <-- NOTE
So here we implement that last noted step:
We add Calc.eutran_ip_throughput() whose implementation is relatively
straightforward as the hard part is done by amari.drb and amari.kpi - in the
Calc we basically need to only divide provided DRB.IPVolDl / DRB.IPTimeDl.https://lab.node.vifib.com/kirr/xlte/-/commit/499a7c1be9f5b038f210c3e3a4f80bd28ffffd22amari.kpi: Teach LogMeasure to handle x.drb_stats messages2023-03-09T01:54:33+03:00Kirill Smelkovkirr@nexedi.com
This patch provides next building block for E-UTRAN IP Throughput KPI
and continues
d102ffaa (drb: Start of the package)
5bf7dc1c (amari.{drb,xlog}: Provide aggregated DRB statistics in the form of synthetic x.drb_stats message)
Quoting those patches
The scheme to compute E-UTRAN IP Throughput is thus as follows: poll eNB at
100Hz frequency for `ue_get[stats]` and retrieve information about per-UE/QCI
streams and the number of transport blocks dl/ul-ed to the UE in question
during that 10ms frame. Estimate `tx_time` taking into account
the number of transmitted transport blocks. And estimate whether eNB is congested or
not based on `dl_use_avg`/`ul_use_avg` taken from `stats`. For the latter we
also need to poll for `stats` at 100Hz frequency and synchronize
`ue_get[stats]` and `stats` requests in time so that they both cover the same
time interval of particular frame.
Then organize the polling process to provide aggregated statistics in the form of
new `x.drb_stats` message, and teach `xamari xlog` to save those messages to
`enb.xlog` together with `stats`.
Then further adjust `amari.kpi.LogMeasure` <-- NOTE
and generic `kpi.Measurement` and `kpi.Calc` to handle DRB-related data.
So here we implement the noted step:
We teach LogMeasure to take x.drb_stats messages into account and update IP
Throughput related fields in appropriate Measurement from x.drb_stats
data.
This process is relatively straightforward besides one place: for stable
output E-UTRAN IP Throughput is required to be computed without taking
into account last TTI of every sample. We don't have that level of
details since all we have is total amount of transmitted bytes in a
burst and estimation of how long in time the burst is. Thus we can only
provide an estimation for the E-UTRAN IP Throughput as follows:
DRB.IPVol and DRB.IPTime are collected to compute throughput.
thp = ΣB*/ΣT* where B* is tx'ed bytes in the sample without taking last tti into account
and T* is time of tx also without taking that sample's tail tti.
we only know ΣB (whole amount of tx), ΣT and ΣT* with some error.
-> thp can be estimated to be inside the following interval:
ΣB ΣB
───── ≤ thp ≤ ───── (1)
ΣT_hi ΣT*_lo
the upper layer in xlte.kpi will use the following formula for
final throughput calculation:
DRB.IPVol
thp = ────────── (2)
DRB.IPTime
-> set DRB.IPTime and its error to mean and δ of ΣT_hi and ΣT*_lo
so that (2) becomes (1).
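To illustrate how (2) reproduces interval (1), here is a minimal sketch. The function names and the sample numbers are assumptions made for this illustration only; they are not the actual amari.kpi code. The mean of ΣT_hi and ΣT*_lo plays the role of DRB.IPTime, and half of their difference plays the role of its error.

```python
def ip_time_with_error(T_hi, Tstar_lo):
    """Return (mean, δ) of ΣT_hi and ΣT*_lo (hypothetical helper)."""
    t = (T_hi + Tstar_lo) / 2
    err = (T_hi - Tstar_lo) / 2
    return t, err


def throughput_interval(B, t, err):
    """Apply formula (2) at both ends of t ± err; yields the bounds of (1)."""
    return B / (t + err), B / (t - err)


# Assumed sample: 10000 bytes, ΣT_hi = 10 ms, ΣT*_lo = 3 ms.
t, err = ip_time_with_error(10, 3)          # times in milliseconds
thp_lo, thp_hi = throughput_interval(10000, t, err)
print(t, err)            # mean and error of the time
print(thp_lo, thp_hi)    # bytes/ms: B/ΣT_hi and B/ΣT*_lo
```

Since t + err = ΣT_hi and t - err = ΣT*_lo, the two ends are exactly ΣB/ΣT_hi and ΣB/ΣT*_lo.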
for this to work we also need to introduce new fields to Measurement
that represent error of DRB.IPTime. The hope is that introduction is
temporary and should be removed once we rework DRB stats to provide B*
and T* directly.https://lab.node.vifib.com/kirr/xlte/-/commit/fd7870f4c5d7f4dca4c1930cb4704e5230fda9cfamari.kpi: Rework LogMeasure to prepare Measurement incrementally2023-03-09T01:53:51+03:00Kirill Smelkovkirr@nexedi.com
We added LogMeasure in 71087f67 (amari.kpi: New package with driver for
Amarisoft LTE stack to retrieve KPI-related measurements from logs) and
its original logic is to read `stats` messages and to create Measurement
that covers [Sx, Sx+1) only after seeing Sx+1.
However in the next patch we will need to also take into account other
smaller messages besides stats, and for those messages we need
being-prepared Measurement to already exist to be able to amend it with
partial data we see. So we need to rework the process to create
Measurement that will cover [Sx, Sx+1) right after seeing Sx without
waiting for Sx+1 to come in.
This patch does that.
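The change of ordering can be sketched as follows. This is only an illustration: the class name `IncrementalLog`, the dict layout and the `drb_bytes` field are hypothetical and do not reflect the real LogMeasure or Measurement.

```python
class IncrementalLog:
    """Hypothetical sketch: open the measurement for [Sx, Sx+1) as soon as Sx is seen."""

    def __init__(self):
        self.current = None   # measurement being prepared; None before the first stats

    def on_stats(self, t):
        """Handle a stats message with timestamp t; return the completed measurement, if any."""
        done = self.current
        if done is not None:
            done["end"] = t   # Sx+1 closes the period that Sx opened
        self.current = {"start": t, "end": None, "drb_bytes": 0}
        return done

    def on_partial(self, nbytes):
        """Amend the being-prepared measurement with data from a smaller message."""
        if self.current is None:
            return            # nothing to amend before the first stats message
        self.current["drb_bytes"] += nbytes


log = IncrementalLog()
log.on_stats(0)               # opens [0, ?) right away
log.on_partial(100)           # partial data lands in the open measurement
log.on_partial(50)
print(log.on_stats(10))       # closes [0, 10) and opens [10, ?)
```

With the previous scheme there would be no object to amend in between the two stats messages.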
Along the way it unifies how events and stats are handled. Previously
events and stats were handled via different objects and the code had many
scattered places that tried to handle cases like event-event,
event-stats, stats-event and stats-stats. And for all those cases the
intent was that we still want to emit corresponding Measurement for all
of them, even if with all NA data besides timestamps. Thus it does
not make sense to split events and stats into different flows - as we can
handle all combinations by considering just one flow of "stats or
events". This simplifies logic and removes several sporadic branches
of code to emit M(ø) around events. It also discovers several places
where we were not emitting such M(ø) even though the intent was to do
so. All this is fixed now with updated tests.https://lab.node.vifib.com/kirr/xlte/-/commit/5bf7dc1c9460b58e2f2e40ff2b68ea196c4bb918amari.{drb,xlog}: Provide aggregated DRB statistics in the form of synthetic ...2023-03-09T01:53:32+03:00Kirill Smelkovkirr@nexedi.com
This patch provides next building block for E-UTRAN IP Throughput KPI
and continues d102ffaa (drb: Start of the package). Quoting that patch
The scheme to compute E-UTRAN IP Throughput is thus as follows: poll eNB at
100Hz frequency for `ue_get[stats]` and retrieve information about per-UE/QCI
streams and the number of transport blocks dl/ul-ed to the UE in question
during that 10ms frame. Estimate `tx_time` taking into account
the number of transmitted transport blocks. And estimate whether eNB is congested or
not based on `dl_use_avg`/`ul_use_avg` taken from `stats`. For the latter we
also need to poll for `stats` at 100Hz frequency and synchronize
`ue_get[stats]` and `stats` requests in time so that they both cover the same
time interval of particular frame.
Then organize the polling process to provide aggregated statistics in the form of <-- NOTE
new `x.drb_stats` message, and teach `xamari xlog` to save those messages to <-- NOTE
`enb.xlog` together with `stats`. <-- NOTE
Then further adjust `amari.kpi.LogMeasure` and generic
`kpi.Measurement` and `kpi.Calc` to handle DRB-related data.
So here we implement the noted step:
- add drv._x_stats_srv server that polls eNB at 100Hz rate, uses Sampler
to extract bursts and aggregates information about those bursts.
- teach xlog to organize servers for synthetic messages and communicate
with them, and register drv._x_stats_srv as such server to handle
generation of x.drb_stats message.https://lab.node.vifib.com/kirr/xlte/-/commit/78f26e3a418f4f87cc0405d18eff77bf90831632amari.drb += _IncStats2023-03-09T01:52:42+03:00Kirill Smelkovkirr@nexedi.com
A utility class to compute avg/std incrementally.
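For illustration, incremental avg/std can be computed with the running-mean update commonly known as Welford's method. The sketch below is not the actual `_IncStats` code: the class name `IncStats`, its method names and the choice of population (rather than sample) variance are assumptions.

```python
import math


class IncStats:
    """Hypothetical sketch of incremental mean / standard deviation."""

    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self._m2 += d * (x - self.mean)   # uses both old and new mean

    def avg(self):
        return self.mean

    def std(self):
        # population standard deviation; divide by (n - 1) for the sample one
        return math.sqrt(self._m2 / self.n) if self.n else 0.0


s = IncStats()
for x in [2, 4, 4, 4, 5, 5, 7, 9]:
    s.add(x)
print(s.avg(), s.std())
```

No sample is stored: each `add` only updates three numbers, which is what makes the scheme suitable for a long-running poller.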
Thanks to https://www.johndcook.com/blog/standard_deviation/ for the
recipe of how to do it.https://lab.node.vifib.com/kirr/xlte/-/commit/d102ffaaa518c60ac6626311b0d8c5d69b90d82famari.drb: Start of the package2023-03-09T01:52:12+03:00Kirill Smelkovkirr@nexedi.com
This package will be used to implement E-UTRAN IP Throughput KPI.
In this patch we add `drb.Sampler` that extracts samples of
transmission bursts from `ue_get[stats]` observations.
Let's go through what E-UTRAN IP Throughput KPI is and how it motivates
functionality provided by this patch.
Overview of E-UTRAN IP Throughput computation
---------------------------------------------
This KPI is defined in TS 32.450 [1] and aggregates transmission volume and
time over bursts of transmissions from an average UE point of view. It should be
particularly noted that only the time, during which transmission is going on,
should be accounted. For example if a UE receives 10KB over a 4ms burst and the rest of
the time there is no transmission to it during, say, 1 minute, the downlink IP
Throughput for that UE over the minute is 20Mbit/s (= 8·10KB/4ms), not 1.3Kbit/s (= 8·10KB/60s).
This KPI basically shows what would be the speed to e.g. download a response for
HTTP request issued from a mobile.
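The arithmetic of this example can be checked with a short sketch. The helper name `ip_throughput_bps` is hypothetical, and 1KB is taken as 1000 bytes, which is what the figures above imply.

```python
def ip_throughput_bps(bursts):
    """IP throughput in bit/s for a list of (tx_bytes, tx_time_seconds) bursts.

    Only the time during which transmission was going on is accounted.
    """
    total_bytes = sum(b for b, _ in bursts)
    total_time = sum(t for _, t in bursts)
    return 8 * total_bytes / total_time


bursts = [(10_000, 0.004)]           # 10KB over a 4ms burst, then silence
thp = ip_throughput_bps(bursts)      # about 20 Mbit/s
naive = 8 * 10_000 / 60              # about 1.3 Kbit/s if the whole minute is counted
print(thp, naive)
```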
[1] https://www.etsi.org/deliver/etsi_ts/132400_132499/132450/16.00.00_60/ts_132450v160000p.pdf#page=13
To compute IP Throughput we thus need to know Σ of transmitted amount
of bytes, and Σ of the time of all transmission bursts.
Σ of the bytes is relatively easy to get. eNB already provides close values in
overall `stats` and in per-UE `ue_get[stats]` messages. However there is
nothing readily available out of the box for Σ of bursts transmission time.
Thus we need to measure the time of transmission bursts ourselves somehow.
It turns out that with current state of things the only practical way to
measure it to some degree is to poll eNB frequently with `ue_get[stats]` and
estimate transmission time based on δ of `ue_get` timestamps.
Let's see how frequently we need to poll to get to reasonable accuracy of resulting throughput.
A common situation for HTTP requests issued via LTE is that response content
downloading time takes only a few milliseconds. For example I used chromium
network profiler to access various sites via internet tethered from my phone
and saw that for many requests response content downloading time was e.g. 4ms,
5ms, 3.2ms, etc. The accuracy of measuring transmission time should be thus in
the order of millisecond to cover that properly. It makes a real difference for
reported throughput, if say a download sample with 10KB took 4ms, or it took
e.g. "something under 100ms". In the first case we know that for that sample
downlink throughput is 2500KB/s, while in the second case all we know is that
downlink throughput is "higher than 100KB/s" - a 25 times difference and not
certain. Similarly if we poll at 10ms rate we would get that throughput is "higher
than 1000KB/s" - a 2.5 times difference from actual value. The accuracy of 1
millisecond coincides with TTI time and with how downlink/uplink transmissions
generally work in LTE.
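The numbers above follow from dividing the sample size by the polling period, which is the longest the burst could have lasted as far as the poller can tell. A minimal sketch, with a hypothetical helper name:

```python
def throughput_lower_bound_kBps(tx_kbytes, poll_period_ms):
    """Lower bound on throughput (KB/s) when all we know is that the burst
    fit into one polling period of poll_period_ms milliseconds."""
    return tx_kbytes * 1000 / poll_period_ms


actual = throughput_lower_bound_kBps(10, 4)        # burst really took 4ms
for period_ms in (100, 10):
    bound = throughput_lower_bound_kBps(10, period_ms)
    print(period_ms, bound, actual / bound)        # period, bound, how far off it is
```

For the 10KB sample this gives 100KB/s at 100ms polling and 1000KB/s at 10ms polling, against 2500KB/s actual.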
With the above the scheme to compute IP Throughput looks to be as
follows: poll eNB at 1000Hz rate for `ue_get[stats]`, process retrieved
information into per-UE and per-QCI streams, detect bursts on each UE/QCI pair,
and aggregate `tx_bytes` and `tx_time` from every burst.
It looks to be straightforward, but 1000Hz polling will likely create
non-negligible additional load on the system and disturb eNB itself
introducing much jitter and harming its latency requirements. That's probably
why eNB actually rate-limits WebSocket requests not to go higher than 100Hz -
the frequency 10 times less compared to what we need to get to reasonable
accuracy for IP throughput.
Fortunately there is additional information that provides a way to improve
accuracy of measured `tx_time` even when polled every 10ms at 100Hz rate:
that additional information is the number of transmitted transport blocks to/from
a UE. If we know that during a 10ms frame there were e.g. 4 transport blocks transmitted
to the UE, that there were no retransmissions *and* that eNB is not congested, we can
reasonably estimate that it was actually a 4ms transmission. And if eNB is
congested we can still say that transmission time is somewhere in `[4ms, 10ms]`
interval because transmitting each transport block takes 1 TTI. Even if
imprecise that still provides some information that could be useful.
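The reasoning of this paragraph can be sketched as below. It is an illustration under stated assumptions, not the actual `Sampler` logic: the function name is hypothetical, retransmissions are assumed to be absent, and the number of transport blocks is clamped to the frame length.

```python
TTI_MS = 1       # one transport block takes one TTI
FRAME_MS = 10    # polling period at 100Hz


def estimate_tx_time_ms(n_tb, congested):
    """Return (lo, hi) bounds in ms for transmission time inside one 10ms frame.

    Assumes no retransmissions happened during the frame.
    """
    if n_tb == 0:
        return (0, 0)                       # nothing was transmitted
    lo = min(n_tb * TTI_MS, FRAME_MS)       # each block needs at least one TTI
    hi = FRAME_MS if congested else lo      # congestion may spread blocks over the frame
    return (lo, hi)


print(estimate_tx_time_ms(4, congested=False))   # exact estimate
print(estimate_tx_time_ms(4, congested=True))    # only an interval is known
```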
Also 100Hz polling turns out to be acceptable from performance point of view and
does not disturb the system much. For example on the callbox machine the process,
that issues polls, takes only about 3% of CPU load and only on one core, and
the CPU usage of eNB does not practically change and its reported tx/rx latency
does not change as well. For sure, there is some disturbance, but it appears to
be small. To have a better idea of what rate of polling is possible, I've made
an experiment with the poller accessing my own websocket echo server quickly
implemented in python. Both the poller and the echo server are not optimized,
but without rate-limiting they could go to 8000Hz frequency with reaching 100%
CPU usage of one CPU core. That 8000Hz is 80 times more compared to 100Hz
frequency actually allowed by eNB. This shows what kind of polling
frequency limit the system can handle, if absolutely needed, and that 100Hz
turns out to be not so high a frequency. Also the Linux 5.6 kernel, installed
on the callbox from Fedora32, is configured with `CONFIG_HZ=1000`, which is
likely helping here.
Implementation overview
~~~~~~~~~~~~~~~~~~~~~~~
The scheme to compute E-UTRAN IP Throughput is thus as follows: poll eNB at
100Hz frequency for `ue_get[stats]` and retrieve information about per-UE/QCI
streams and the number of transport blocks dl/ul-ed to the UE in question
during that 10ms frame. Estimate `tx_time` taking into account
the number of transmitted transport blocks. And estimate whether eNB is congested or
not based on `dl_use_avg`/`ul_use_avg` taken from `stats`. For the latter we
also need to poll for `stats` at 100Hz frequency and synchronize
`ue_get[stats]` and `stats` requests in time so that they both cover the same
time interval of particular frame.
Then organize the polling process to provide aggregated statistics in the form of
new `x.drb_stats` message, and teach `xamari xlog` to save those messages to
`enb.xlog` together with `stats`. Then further adjust `amari.kpi.LogMeasure`
and generic `kpi.Measurement` and `kpi.Calc` to handle DRB-related data.
----------------------------------------
In this patch we provide first building block - `Sampler` that extracts bursts
of data transmissions from stream of `ue_get[stats]` observations.
Even though the main idea behind `Sampler` is relatively straightforward, several
aspects deserve to be noted:
1. information about transmitted bytes and corresponding transmitted transport
blocks is emitted by eNB not synchronized in time. The reason here is that,
for example, for DL a block is transmitted via PDCCH+PDSCH during one TTI, and
then the base station awaits HARQ ACK/NACK. That ACK/NACK comes later via
PUCCH or PUSCH. The time window in between original transmission and
reception of the ACK/NACK is 4 TTIs for FDD and 4-13 TTIs for TDD (*).
And Amarisoft LTEENB updates counters for dl_total_bytes and dl_tx at
different times:
ue.erab.dl_total_bytes - right after sending data on PDCCH+PDSCH
ue.cell.{dl_tx,dl_retx} - after receiving ACK/NACK via PUCCH|PUSCH
this way an update to dl_total_bytes might be seen in one frame (= 10·TTI),
while corresponding update to dl_tx/dl_retx might be seen in either same, or
next, or next-next frame.
We bring `δ(tx_bytes)` and `#tx_tb` in sync ourselves via _BitSync.
(*) see e.g. Figure 8.1 in "An introduction to LTE, 2nd ed."
2. when we see multiple transmissions related to UE on different QCIs, we
cannot directly use corresponding number of transport blocks to estimate
transmissions times because we do not know how eNB scheduler placed those
transmissions onto resource map. So without additional information we can only
estimate corresponding lower and upper bounds.https://lab.node.vifib.com/kirr/xlte/-/commit/79d10eb9209ae6a62a28338aa47a675910a0b95camari.xlog: Move main logger to a thread2023-03-08T21:56:25+03:00Kirill Smelkovkirr@nexedi.com
We will soon need to run 2 threads:
- one with the main logger, and
- another one to serve requests for synthetic x.drb_stats queries
Both main and the second thread will be run via sync.WorkGroup to cancel
each other in case of failure somewhere. So since WorkGroup.wait(),
similarly to all pygolang operations, is not interrupted by signals(*),
we need to wire ctx to be passed through all operations and manage to
cancel that context on SIGINT/SIGTERM.
This patch:
1. adjusts xlog to wire ctx through all call chains and moves ._xlog1()
to be run in the thread.
2. adjusts amari.Conn to take ctx as argument on all operations and
react reasonably on that ctx cancel. We need to do it here because
xlog uses Conn internally.
3. adjusts xamari main driver to setup root context that is canceled on
SIGINT/SIGTERM similarly e.g. to how nxdtest does it in
https://lab.nexedi.com/nexedi/nxdtest/commit/b0cf277d .
(*) see https://lab.nexedi.com/nexedi/pygolang/commit/e18adbab for details.https://lab.node.vifib.com/kirr/xlte/-/commit/c967c8b55078e5d334bfc8aca3e9f9dfaaad838camari.xlog: Rework "service detach" to be detected and done via defer instead...2023-03-08T21:27:24+03:00Kirill Smelkovkirr@nexedi.com
We will soon add more levels of trying to this part of the code and
linear defers are easier to follow compared to many levels of try/except
nesting.https://lab.node.vifib.com/kirr/xlte/-/commit/749e1659a17795beefe410989f0a85033e7ea14aamari: Conn: Provide a way to retrieve websocket URI to where a Conn is conne...2023-03-08T21:17:19+03:00Kirill Smelkovkirr@nexedi.com
We will soon need this to know at runtime the address of eNB service
attached by Conn to establish another connection attached to the same eNB.https://lab.node.vifib.com/kirr/xlte/-/commit/ffffb93332c3c25e76f64103f9ccfe5f185d5b86kpi: Add support for QCI to Measurements2023-03-08T20:51:54+03:00Kirill Smelkovkirr@nexedi.com
Previously for Measurement fields with .QCI or .CAUSE suffix we had only
the .sum value and neither per-QCI nor per-CAUSE values. In other words
support for QCI and CAUSE was a stub. In this patch we add support for
QCI: every field X.QCI is now automatically expanded into X[256] array
and X.sum . For convenience we also provide X.<qci> aliases that alias
X[qci]. For example field DRB.IPVolDl.9 aliases element 9 of the
DRB.IPVolDl array.
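The expansion and aliasing described above can be pictured with the following sketch. It is purely illustrative: the class `MeasurementSketch` and its dict-based storage are hypothetical and not how kpi.Measurement is implemented, and `.sum` is derived here from the array for simplicity.

```python
class MeasurementSketch:
    """Hypothetical stand-in showing X.QCI -> X[256] array, X.sum and X.<qci> aliases."""

    def __init__(self, qci_fields):
        # every field declared with .QCI suffix becomes a 256-element array
        self._arrays = {name: [0] * 256 for name in qci_fields}

    def __getitem__(self, name):
        if name in self._arrays:
            return self._arrays[name]              # whole X array
        base, _, suffix = name.rpartition(".")
        if suffix == "sum":
            return sum(self._arrays[base])         # X.sum
        return self._arrays[base][int(suffix)]     # X.<qci> aliases X[qci]

    def __setitem__(self, name, value):
        base, _, suffix = name.rpartition(".")
        self._arrays[base][int(suffix)] = value    # write through the alias


m = MeasurementSketch(["DRB.IPVolDl"])
m["DRB.IPVolDl.9"] = 1500          # write via alias
m["DRB.IPVolDl"][5] = 500          # write via array
print(m["DRB.IPVolDl"][9], m["DRB.IPVolDl.5"], m["DRB.IPVolDl.sum"])
```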
We will need QCI support for E-UTRAN IP Throughput KPI which is required
to provide resulting values for every QCI individually.
CAUSE support remains a stub for now.