Commit 77f57fec authored by Rafael Monnerat

slapos/collect: Preserve entries at the database for 15 days

  This may cause more memory usage and more 'live' data to handle;
  for this reason, I'm making it configurable
parent 04bdaacb
@@ -89,7 +89,8 @@ def do_collect(conf):
       user_dict[snapshot.username].append(snapshot)
     except (KeyboardInterrupt, SystemExit, NoSuchProcess):
       raise
+    days_to_preserve = conf.getint("slapos", "collect_cache", 15)
     log_directory = "%s/var/data-log" % conf.get("slapos", "instance_root")
     mkdir_p(log_directory, 0o755)
@@ -161,7 +162,7 @@ def do_collect(conf):
     compressLogFolder(log_directory)
     # Drop older entries already reported
-    database.garbageCollect()
+    database.garbageCollect(int(days_to_preserve))
   except AccessDenied:
     print("You HAVE TO execute this script with root permission.")
...
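
The added line reads `collect_cache` from the `[slapos]` section and falls back to 15 days. A minimal sketch of that lookup follows, assuming `conf` behaves like Python 3's standard `configparser.ConfigParser` (the diff does not show what `conf` actually is). With the standard parser the default has to be passed as the keyword `fallback=`, since `getint` does not accept a positional third argument. The helper name `read_days_to_preserve` and the sample configuration strings are hypothetical.

```python
import configparser

DEFAULT_DAYS_TO_PRESERVE = 15  # default value taken from the commit


def read_days_to_preserve(conf):
    """Return the retention period in days, or the default when unset.

    Hypothetical helper; assumes `conf` is a configparser.ConfigParser.
    """
    return conf.getint("slapos", "collect_cache",
                       fallback=DEFAULT_DAYS_TO_PRESERVE)


conf = configparser.ConfigParser()

# No collect_cache option: the fallback applies.
conf.read_string("[slapos]\ninstance_root = /srv/slapgrid\n")
default_days = read_days_to_preserve(conf)

# Option set explicitly, as the review comment does in slapos.cfg.
conf.read_string("[slapos]\ncollect_cache = 3\n")
configured_days = read_days_to_preserve(conf)
```

Because `getint` already returns an `int`, the extra `int(...)` conversion at the `garbageCollect` call site would be redundant under this assumption.
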
@@ -259,7 +259,7 @@ class Database:
     return [i[0] for i in self._execute(
       "SELECT name FROM sqlite_master WHERE type='table'")]
-  def _getGarbageCollectionDateList(self, days_to_preserve=3):
+  def _getGarbageCollectionDateList(self, days_to_preserve):
     """ Return the list of dates to Preserve when data collect
     """
     base = datetime.datetime.utcnow().date()
@@ -268,11 +268,11 @@ class Database:
       date_list.append((base - datetime.timedelta(days=x)).strftime("%Y-%m-%d"))
     return date_list
-  def garbageCollect(self):
+  def garbageCollect(self, days_to_preserve=3):
     """ Garbase collect the database, by removing older records already
     reported.
     """
-    date_list = self._getGarbageCollectionDateList()
+    date_list = self._getGarbageCollectionDateList(days_to_preserve)
     where_clause = "reported = 1"
     for _date in date_list:
       where_clause += " AND date != '%s' " % _date
...
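
To make the effect of `days_to_preserve` concrete, here is a standalone sketch of the two steps visible in the `Database` hunks above: building the list of dates to keep, then turning it into the `WHERE` clause. The loop between the hunks is elided in the diff, so `range(days_to_preserve)` is an assumption (it is consistent with the lengths checked in the test). The function names and the `base` parameter are hypothetical, added only so the example gives a reproducible result.

```python
import datetime


def garbage_collection_date_list(days_to_preserve, base=None):
    """Return the most recent `days_to_preserve` dates as YYYY-MM-DD strings.

    `base` is a hypothetical parameter; the code in the diff always starts
    from the current UTC date.
    """
    if base is None:
        base = datetime.datetime.now(datetime.timezone.utc).date()
    # Assumed loop: one entry per preserved day, today included.
    return [(base - datetime.timedelta(days=x)).strftime("%Y-%m-%d")
            for x in range(days_to_preserve)]


def build_where_clause(date_list):
    """Mirror the clause built in garbageCollect: reported rows only,
    excluding every date that must be preserved."""
    where_clause = "reported = 1"
    for _date in date_list:
        where_clause += " AND date != '%s' " % _date
    return where_clause


dates = garbage_collection_date_list(3, base=datetime.date(2018, 3, 10))
clause = build_where_clause(dates)
```

Two properties follow from this shape: rows that are not yet marked `reported` are never deleted regardless of age, and the clause gains one `date != ...` term per preserved day, so a 15-day setting produces 15 terms.
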
@@ -214,7 +214,7 @@ class TestCollectDatabase(unittest.TestCase):
   def test_garbage_collection_date_list(self):
     database = db.Database(self.instance_root)
-    self.assertEqual(len(database._getGarbageCollectionDateList()), 3)
+    self.assertEqual(len(database._getGarbageCollectionDateList(3)), 3)
     self.assertEqual(len(database._getGarbageCollectionDateList(1)), 1)
     self.assertEqual(len(database._getGarbageCollectionDateList(0)), 0)
...
  • Hi @rafael,

    What was the reason for extending the history to 15 days instead of 3?

    I'm asking because on a recent production server I was getting alerts for excessive load (a load of around 50 on a 16-core CPU). Using iotop, I found that slapos node collect was using far too much I/O. I ran a few tests:

    root@server:/srv/slapgrid/var/data-log# du -sh collector.db 
    517M    collector.db
    root@server:/root# time slapos node collect
    real    0m40.205s
    user    0m3.768s
    sys     0m3.760s

    Then I changed the parameter in slapos.cfg to keep only 3 days of history:

    root@server:/srv/slapgrid/var/data-log# du -sh collector.db 
    78M     collector.db
    root@server:/srv/slapgrid/var/data-log# time slapos node collect # run twice, to test a "normal" case
    real    0m10.675s
    user    0m0.804s
    sys     0m0.660s

    This server has only 2 slapparts used (a runner0 and a PBS).

    So keeping only 3 days cuts the run time by roughly 73% (40.2 s down to 10.7 s, about 3.8 times faster). Since this process runs every minute, a 10 s run is acceptable, whereas 40 s is killing my server. I would like to restore the default value of 3 days: even though this is configurable, I would prefer shipping with reasonable defaults.

    Also, only entries marked as already exported are garbage-collected. I don't fully understand how everything plays together, but I'm not sure why we should keep Day-15 entries when all entries older than Day-1 are already exported.

    Edited by Nicolas Wavrant
  • Can you try with pragma_mode=WAL?

    It is indeed not efficient to keep a month of records, especially on slow disks; perhaps the cause is a missing index or some SQLite tuning. The increase was meant to build a wider chart in the monitor UI, so I tested holding a larger period.
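
The two ideas raised in the reply (WAL mode and a possibly missing index) can be tried in isolation with a sketch like the one below. It assumes "pragma_mode=WAL" refers to SQLite's `PRAGMA journal_mode=WAL`, which is the actual pragma name. The table `snapshot`, its `name` column, and the index name are hypothetical; only the `date` and `reported` columns appear in the diff. The delete uses a single cutoff comparison instead of the chain of `date != ...` terms, which is a different technique from the one in the commit and only equivalent for rows dated in the past.

```python
import os
import sqlite3
import tempfile

# Throwaway database file; WAL needs a file-backed database, not ":memory:".
db_path = os.path.join(tempfile.mkdtemp(), "collector.db")
connection = sqlite3.connect(db_path)

# The pragma returns the journal mode actually in effect ("wal" on success).
journal_mode = connection.execute("PRAGMA journal_mode=WAL").fetchone()[0]

# Hypothetical table: only `date` and `reported` come from the diff.
connection.execute(
    "CREATE TABLE snapshot (name TEXT, date TEXT, reported INTEGER DEFAULT 0)")
# Index covering the two columns used by the garbage-collection filter.
connection.execute(
    "CREATE INDEX snapshot_reported_date ON snapshot (reported, date)")
connection.executemany(
    "INSERT INTO snapshot (name, date, reported) VALUES (?, ?, ?)",
    [("slapuser0", "2018-03-01", 1),   # old and reported: deleted
     ("slapuser0", "2018-03-09", 1),   # recent: kept
     ("slapuser0", "2018-03-01", 0)])  # old but not reported: kept
connection.commit()

# ISO dates sort chronologically as text, so one comparison is enough.
cutoff = "2018-03-08"
connection.execute(
    "DELETE FROM snapshot WHERE reported = 1 AND date < ?", (cutoff,))
connection.commit()

remaining = connection.execute("SELECT COUNT(*) FROM snapshot").fetchone()[0]
connection.close()
```

Whether either change helps on the server described above would have to be measured there; the sketch only shows how the settings are applied.
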
