Commit 2562f56f authored by Jérome Perrin's avatar Jérome Perrin

software/grafana: version up and include loki/promtail

This enable vieiwing applications logs directly in grafana, as long as
loki agent can access the log file and the regular expressions to parse
the logs are defined

Other changes:
 - provision data sources automatically so that user do not have to add
   them manually
 - update TODO in README
 - introduce a SR test
 - switch to dep to install dependencies, because that's how these
  packages manage their dependencies.

Versions up:
  grafana: v6.3.0
  telegraf: v1.11.1-0
  influxdb: v1.7.8rc1
New:
  loki: v0.3.0
parent e70c9e52
# grafana / telegraf / influxdb
This is an experimental integration, mainly to evaluate these solutions.
## Custom telegraf plugins
See https://github.com/influxdata/telegraf to learn about plugins.
Useful plugins in this context are probably
[exec](https://github.com/influxdata/telegraf/tree/1.5.1/plugins/inputs/exec)
[exec](https://github.com/influxdata/telegraf/tree/1.11.1/plugins/inputs/exec),
[logparser](https://github.com/influxdata/telegraf/tree/1.11.1/plugins/inputs/logparser)
or
[httpjson](https://github.com/influxdata/telegraf/tree/1.5.1/plugins/inputs/httpjson).
[http](https://github.com/influxdata/telegraf/tree/1.11.1/plugins/inputs/http).
Telegraf will save in the `telegraf` database from the embedded influxdb server.
## Grafana
You'll have to add yourself the influxdb data source in grafana, using the
parameters published by the slapos instance.
http://docs.grafana.org/features/datasources/influxdb/
A default user is created, username and password are published as connection
parameters. You can add more users in grafana interface.
When adding datasource, use *proxy* option, otherwise Grafana makes your
browser query influxdb directly, which also uses a self signed certificate.
One workaround is to configure your browser to also accept influxdb certificate
before using grafana, but using proxy seems easier.
Datasources should be automatically added.
## Influxdb
......@@ -34,8 +31,32 @@ One important thing to notice is that the backup protocol is enabled on ipv4
provided by slapos, so make sure this ip is not reachable from untrusted
sources.
## TODO
# Ingesting/Visualizing logs
Eventhough main feature is visualizing metrics, Grafana has a feature called "Explore" to view logs for a time frame.
The following backend can be used:
## Loki
See `TestLoki` in test for an example.
## Influxdb
Influxdb logs only have tags and there does not seem to be a way to search (except than tag and time frame).
To inject log files containing:
```
INFO the message
WARN another message
```
use config like:
```
[[inputs.logparser]]
files = ["/tmp/x*.log", "/tmp/aaa.log"]
* influxdb and telegraf runs with very low priority, this could become an option
* make one partition for each service and use switch software type
* make it easier to add custom configuration (how ?)
[inputs.logparser.grok]
measurement = "logs"
patterns = ['^%{WORD:level:tag} %{GREEDYDATA:message:string}']
```
......@@ -15,7 +15,7 @@
[instance-profile]
filename = instance.cfg.in
md5sum = 461d515da03de5e422e6f75189d09184
md5sum = 8abfd8ce01c375257e94f64824478f5c
[influxdb-config-file]
filename = influxdb-config-file.cfg.in
......@@ -28,3 +28,15 @@ md5sum = a1a9c22c2a7829c66a49fc2504604d21
[grafana-config-file]
filename = grafana-config-file.cfg.in
md5sum = 8244d430905b968795c7946049bed9e3
[grafana-provisioning-config-file]
filename = grafana-provisioning-config-file.cfg.in
md5sum = 3aa0f1ed752b2a59ea2b5e7c1733daf3
[loki-config-file]
filename = loki-config-file.cfg.in
md5sum = 02ba5acf23fcf88f5594919f46838533
[promtail-config-file]
filename = promtail-config-file.cfg.in
md5sum = c77788d0a3cc654ad9393eb4b1f31e94
This diff is collapsed.
# https://grafana.com/docs/administration/provisioning/#example-datasource-config-file
apiVersion: 1
datasources:
- name: telegraf
type: influxdb
access: proxy
url: {{ influxdb['url'] }}
user: {{ influxdb['auth-username'] }}
database: telegraf
isDefault: true
jsonData:
tlsSkipVerify: true
secureJsonData:
password: {{ influxdb['auth-password'] }}
version: 1
editable: false
- name: loki
type: loki
access: proxy
url: {{ loki['url'] }}
version: 1
editable: false
......@@ -28,6 +28,11 @@
"description": "Name used in From: header of emails",
"default": "Grafana",
"type": "string"
},
"promtail-extra-scrape-config": {
"description": "Raw promtail config (experimental parameter, see https://github.com/grafana/loki/blob/master/docs/promtail.md#scrape-configs for detail)",
"default": "",
"type": "string"
}
}
}
......@@ -41,11 +41,14 @@ grafana-data-dir = ${:grafana-dir}/data
grafana-logs-dir = ${:var}/log
grafana-plugins-dir = ${:grafana-dir}/plugins
grafana-provisioning-config-dir = ${:grafana-dir}/provisioning-config
grafana-provisioning-datasources = ${:grafana-provisioning-config-dir}/datasources
grafana-provisioning-dashboards = ${:grafana-provisioning-config-dir}/dashboards
grafana-provisioning-datasources-dir = ${:grafana-provisioning-config-dir}/datasources
grafana-provisioning-dashboards-dir = ${:grafana-provisioning-config-dir}/dashboards
telegraf-dir = ${:srv}/telegraf
telegraf-extra-config-dir = ${:telegraf-dir}/extra-config
loki-dir = ${:srv}/loki
loki-storage-boltdb-dir = ${:loki-dir}/index/
loki-storage-filesystem-dir = ${:loki-dir}/chunks/
promtail-dir = ${:srv}/promtail
# macros
[generate-certificate]
......@@ -75,7 +78,11 @@ extensions = jinja2.ext.do
module = check_port_listening
name = ${:_buildout_section_name_}.py
[check-url-available-promise]
recipe = slapos.cookbook:check_url_available
path = ${directory:promise}/${:_buildout_section_name_}
dash_path = {{ dash_bin }}
curl_path = {{ curl_bin }}
[influxdb]
ipv6 = ${instance-parameter:ipv6-random}
......@@ -133,6 +140,7 @@ data-dir = ${directory:grafana-data-dir}
logs-dir = ${directory:grafana-logs-dir}
plugins-dir = ${directory:grafana-plugins-dir}
provisioning-config-dir = ${directory:grafana-provisioning-config-dir}
provisioning-datasources-dir = ${directory:grafana-provisioning-datasources-dir}
admin-user = ${grafana-password:username}
admin-password = ${grafana-password:passwd}
secret-key = ${grafana-secret-key:passwd}
......@@ -160,6 +168,15 @@ context =
section grafana grafana
section apache_frontend apache-frontend
key slapparameter_dict slap-configuration:configuration
depends =
${grafana-provisioning-config-file:rendered}
[grafana-provisioning-config-file]
<= config-file
rendered = ${grafana:provisioning-datasources-dir}/datasource.yaml
context =
section influxdb influxdb
section loki loki
[grafana-listen-promise]
<= check-port-listening-promise
......@@ -183,6 +200,53 @@ context =
section telegraf telegraf
[loki]
recipe = slapos.cookbook:wrapper
command-line =
bash -c 'nice -19 chrt --idle 0 ionice -c3 {{ loki_bin }} -config.file=${loki-config-file:rendered}'
wrapper-path = ${directory:service}/loki
storage-boltdb-dir = ${directory:loki-storage-boltdb-dir}
storage-filesystem-dir = ${directory:loki-storage-filesystem-dir}
ip = ${instance-parameter:ipv4-random}
port = 3100
url = http://${:ip}:${:port}
[loki-config-file]
<= config-file
context =
section loki loki
[loki-listen-promise]
<= check-url-available-promise
url = ${loki:url}/ready
[promtail]
recipe = slapos.cookbook:wrapper
command-line =
bash -c 'nice -19 chrt --idle 0 ionice -c3 {{ promtail_bin }} -config.file=${promtail-config-file:rendered}'
wrapper-path = ${directory:service}/promtail
dir = ${directory:promtail-dir}
http_port = 19080
ip = ${instance-parameter:ipv4-random}
url = http://${:ip}:${:http_port}
[promtail-config-file]
<= config-file
context =
section promtail promtail
section loki loki
key slapparameter_dict slap-configuration:configuration
[promtail-listen-promise]
<= check-port-listening-promise
hostname= ${promtail:ip}
port = ${promtail:http_port}
[apache-frontend]
<= slap-connection
recipe = slapos.cookbook:requestoptional
......@@ -201,6 +265,8 @@ instance-promises =
${influxdb-listen-promise:path}
${influxdb-password-promise:wrapper-path}
${grafana-listen-promise:path}
${loki-listen-promise:path}
${promtail-listen-promise:path}
[publish-connection-parameter]
......@@ -213,4 +279,6 @@ telegraf-extra-config-dir = ${telegraf:extra-config-dir}
grafana-url = ${grafana:url}
grafana-username = ${grafana:admin-user}
grafana-password = ${grafana:admin-password}
loki-url = ${loki:url}
promtail-url = ${promtail:url}
url = ${apache-frontend:connection-secure_access}
auth_enabled: false
server:
http_listen_port: {{ loki['port'] }}
ingester:
lifecycler:
address: {{ loki['ip'] }}
ring:
kvstore:
store: inmemory
replication_factor: 1
chunk_idle_period: 15m
schema_config:
configs:
- from: 2018-04-15
store: boltdb
object_store: filesystem
schema: v9
index:
prefix: index_
period: 168h
storage_config:
boltdb:
directory: {{ loki['storage-boltdb-dir'] }}
filesystem:
directory: {{ loki['storage-filesystem-dir'] }}
limits_config:
enforce_metric_name: false
reject_old_samples: true
reject_old_samples_max_age: 168h
chunk_store_config:
max_look_back_period: 0
table_manager:
chunk_tables_provisioning:
inactive_read_throughput: 0
inactive_write_throughput: 0
provisioned_read_throughput: 0
provisioned_write_throughput: 0
index_tables_provisioning:
inactive_read_throughput: 0
inactive_write_throughput: 0
provisioned_read_throughput: 0
provisioned_write_throughput: 0
retention_deletes_enabled: false
retention_period: 0
# https://github.com/grafana/loki/blob/master/docs/logentry/processing-log-lines.md
server:
http_listen_port: {{ promtail['http_port'] }}
# XXX this external_url does not work ... promtail still listen on all IPs
external_url: {{ promtail['url'] }}
grpc_listen_port: 0
positions:
filename: {{ promtail['dir'] }}/positions.yaml
clients:
- url: {{ loki['url'] }}/api/prom/push
scrape_configs:
- job_name: test
static_configs:
- targets:
- localhost
labels:
job: grafanalogs
__path__: ./var/log/*log
{{ slapparameter_dict.get('promtail-extra-scrape-config', '') }}
......@@ -6,6 +6,8 @@ extends =
../../component/make/buildout.cfg
../../component/golang/buildout.cfg
../../component/openssl/buildout.cfg
../../component/curl/buildout.cfg
../../component/dash/buildout.cfg
buildout.hash.cfg
gowork.cfg
......@@ -17,6 +19,9 @@ parts =
influxdb-config-file
telegraf-config-file
grafana-config-file
grafana-provisioning-config-file
loki-config-file
promtail-config-file
[nodejs]
......@@ -31,7 +36,7 @@ md5sum = 46790033c23803387890f545e4040690
[gowork]
# All the softwares installed in the go work have "non standard" installation
# All the softwares installed in the go workspace have "non standard" installation
# methods, so we install them in specific parts with custom commands.
# They will be installed because they are dependencies of ${gowork.goinstall}
install =
......@@ -41,6 +46,8 @@ influx-bin = ${:bin}/influx
influxd-bin = ${:bin}/influxd
grafana-bin = ${:bin}/grafana-server
grafana-homepath = ${go_github.com_grafana_grafana:location}
loki-bin = ${:bin}/loki
promtail-bin = ${:bin}/promtail
# use recent go
golang = ${golang1.12:location}
......@@ -48,21 +55,26 @@ gcc-bin-directory = ${golang1.12:gcc-bin-directory}
[gowork.goinstall]
command = :
depends =
depends =
${influxdb-install:recipe}
${telegraf-install:recipe}
${grafana-install:recipe}
${loki-install:recipe}
${promtail-install:recipe}
[influxdb-install]
<= gowork.goinstall
command = bash -c ". ${gowork:env.sh} && \
go install -v github.com/golang/dep/cmd/dep && \
cd ${gowork:directory}/src/github.com/influxdata/influxdb && \
dep ensure && \
go install ./cmd/..."
update-command =
[telegraf-install]
<= gowork.goinstall
command = bash -c ". ${gowork:env.sh} && \
go install -v github.com/golang/dep/cmd/dep && \
cd ${gowork:directory}/src/github.com/influxdata/telegraf && \
${make:location}/bin/make &&
cp telegraf ${gowork:bin}"
......@@ -74,12 +86,32 @@ update-command =
command = bash -c "export PATH=${nodejs:location}/bin/:$PATH && \
. ${gowork:env.sh} && \
cd ${gowork:directory}/src/github.com/grafana/grafana && \
${gowork:golang}/bin/go run build.go setup && \
${gowork:golang}/bin/go run build.go build && \
go run build.go setup && \
go run build.go build && \
${yarn:location}/bin/yarn install --pure-lockfile && \
${nodejs:location}/bin/npm run build"
update-command =
[loki-install]
<= gowork.goinstall
# loki also uses nodejs
command = bash -c "export PATH=${nodejs:location}/bin/:$PATH && \
. ${gowork:env.sh} && \
go install -v github.com/golang/dep/cmd/dep && \
cd ${gowork:directory}/src/github.com/grafana/loki && \
go install ./cmd/loki"
update-command =
[promtail-install]
<= gowork.goinstall
# CGO_ENABLED is to disable systemd support (did not compile in my case)
command = bash -c "export CGO_ENABLED=0 && \
. ${gowork:env.sh} && \
go install -v github.com/golang/dep/cmd/dep && \
cd ${gowork:directory}/src/github.com/grafana/loki && \
go install ./cmd/promtail"
update-command =
[download-file-base]
recipe = slapos.recipe.build:download
......@@ -96,6 +128,15 @@ mode = 0644
[grafana-config-file]
<= download-file-base
[grafana-provisioning-config-file]
<= download-file-base
[loki-config-file]
<= download-file-base
[promtail-config-file]
<= download-file-base
[instance-profile]
recipe = slapos.recipe.template:jinja2
template = ${:_profile_base_location_}/${:filename}
......@@ -110,10 +151,14 @@ context =
key influx_bin gowork:influx-bin
key grafana_bin gowork:grafana-bin
key grafana_homepath gowork:grafana-homepath
key loki_bin gowork:loki-bin
key promtail_bin gowork:promtail-bin
key curl_bin :curl-bin
key dash_bin :dash-bin
key monitor_template monitor2-template:rendered
curl-bin = ${curl:location}/bin/curl
dash-bin = ${dash:location}/bin/dash
[versions]
slapos.recipe.template = 4.2
inotifyx = 0.2.2
Tests for Grafana software release
##############################################################################
#
# Copyright (c) 2018 Nexedi SA and Contributors. All Rights Reserved.
#
# WARNING: This program as such is intended to be used by professional
# programmers who take the whole responsibility of assessing all potential
# consequences resulting from its eventual inadequacies and bugs
# End users who are looking for a ready-to-use solution with commercial
# guarantees and support are strongly adviced to contract a Free Software
# Service Company
#
# This program is Free Software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 3
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
##############################################################################
from setuptools import setup, find_packages
version = '0.0.1.dev0'
name = 'slapos.test.grafana'
long_description = open("README.md").read()
setup(
name=name,
version=version,
description="Test for SlapOS' Grafana",
long_description=long_description,
long_description_content_type='text/markdown',
maintainer="Nexedi",
maintainer_email="info@nexedi.com",
url="https://lab.nexedi.com/nexedi/slapos",
packages=find_packages(),
install_requires=[
'slapos.core',
'slapos.libnetworkcache',
'erp5.util',
'requests',
'supervisor',
'psutil',
],
zip_safe=True,
test_suite='test',
)
##############################################################################
#
# Copyright (c) 2019 Nexedi SA and Contributors. All Rights Reserved.
#
# WARNING: This program as such is intended to be used by professional
# programmers who take the whole responsibility of assessing all potential
# consequences resulting from its eventual inadequacies and bugs
# End users who are looking for a ready-to-use solution with commercial
# guarantees and support are strongly adviced to contract a Free Software
# Service Company
#
# This program is Free Software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 3
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
##############################################################################
import os
import textwrap
import logging
import tempfile
import time
import requests
from slapos.testing.testcase import makeModuleSetUpAndTestCaseClass
setUpModule, SlapOSInstanceTestCase = makeModuleSetUpAndTestCaseClass(
os.path.abspath(
os.path.join(os.path.dirname(__file__), '..', 'software.cfg')))
class GrafanaTestCase(SlapOSInstanceTestCase):
"""Base test case for grafana.
Since the instances takes timte to start and stop,
we increate as lot the number of retries.
"""
report_max_retry = 30
instance_max_retry = 30
class TestGrafana(GrafanaTestCase):
def setUp(self):
self.grafana_url = self.computer_partition.getConnectionParameterDict(
)['grafana-url']
def test_grafana_available(self):
resp = requests.get(self.grafana_url, verify=False)
self.assertEqual(requests.codes.ok, resp.status_code)
def test_grafana_api(self):
# check API is usable
api_org_url = '{self.grafana_url}/api/org'.format(**locals())
resp = requests.get(api_org_url, verify=False)
self.assertEqual(requests.codes.unauthorized, resp.status_code)
connection_params = self.computer_partition.getConnectionParameterDict()
resp = requests.get(
api_org_url,
verify=False,
auth=requests.auth.HTTPBasicAuth(
connection_params['grafana-username'],
connection_params['grafana-password'],
))
self.assertEqual(requests.codes.ok, resp.status_code)
self.assertEqual(1, resp.json()['id'])
def test_grafana_datasource_povisinonned(self):
# data sources are provisionned
connection_params = self.computer_partition.getConnectionParameterDict()
resp = requests.get(
'{self.grafana_url}/api/datasources'.format(**locals()),
verify=False,
auth=requests.auth.HTTPBasicAuth(
connection_params['grafana-username'],
connection_params['grafana-password'],
))
self.assertEqual(requests.codes.ok, resp.status_code)
self.assertEqual(
sorted(['influxdb', 'loki']),
sorted([ds['type'] for ds in resp.json()]))
class TestInfluxDb(GrafanaTestCase):
def setUp(self):
self.influxdb_url = self.computer_partition.getConnectionParameterDict(
)['influxdb-url']
def test_influxdb_available(self):
ping_url = '{self.influxdb_url}/ping'.format(**locals())
resp = requests.get(ping_url, verify=False)
self.assertEqual(requests.codes.no_content, resp.status_code)
def test_influxdb_api(self):
query_url = '{self.influxdb_url}/query'.format(**locals())
connection_params = self.computer_partition.getConnectionParameterDict()
for i in range(5):
# retry, as it may take a little delay to create databases
resp = requests.get(
query_url,
verify=False,
params=dict(
q='SHOW DATABASES',
u=connection_params['influxdb-username'],
p=connection_params['influxdb-password']))
self.assertEqual(requests.codes.ok, resp.status_code)
result, = resp.json()['results']
if result['series']:
break
time.sleep(0.5 * i)
self.assertIn(
[connection_params['influxdb-database']], result['series'][0]['values'])
class TestTelegraf(GrafanaTestCase):
def test_telegraf_running(self):
with self.slap.instance_supervisor_rpc as supervisor:
all_process_info = supervisor.getAllProcessInfo()
process_info, = [p for p in all_process_info if 'telegraf' in p['name']]
self.assertEqual('RUNNING', process_info['statename'])
class TestLoki(GrafanaTestCase):
@classmethod
def getInstanceParameterDict(cls):
cls._logfile = tempfile.NamedTemporaryFile(suffix='log')
return {
'promtail-extra-scrape-config':
textwrap.dedent(
r'''
- job_name: {cls.__name__}
pipeline_stages:
- regex:
expression: "^(?P<timestamp>.*) - (?P<name>\\S+) - (?P<level>\\S+) - (?P<message>.*)"
- timestamp:
format: 2006-01-02T15:04:05Z00:00
source: timestamp
- labels:
level:
name:
static_configs:
- targets:
- localhost
labels:
job: {cls.__name__}
__path__: {cls._logfile.name}
''').format(**locals())
}
@classmethod
def tearDownClass(cls):
cls._logfile.close()
super(TestLoki, cls).tearDownClass()
def setUp(self):
self.loki_url = self.computer_partition.getConnectionParameterDict(
)['loki-url']
def test_loki_available(self):
self.assertEqual(
requests.codes.ok,
requests.get('{self.loki_url}/ready'.format(**locals()),
verify=False).status_code)
def test_log_ingested(self):
# create a logger logging to the file that we have
# configured in instance parameter.
test_logger = logging.getLogger(self.id()) # type: logging.Logger
test_handler = logging.FileHandler(filename=self._logfile.name)
test_handler.setFormatter(
logging.Formatter(
'%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
test_logger.addHandler(test_handler)
test_logger.info("testing message")
test_logger.info("testing another message")
test_logger.warning("testing warn")
# Check our messages have been ingested
# we retry a few times, because there's a short delay until messages are
# ingested and returned.
for i in range(10):
resp = requests.get(
'{self.loki_url}/api/prom/query?query={{job="TestLoki"}}'.format(
**locals()),
verify=False).json()
if not resp or len(resp['streams']) < 2:
time.sleep(0.5 * i)
continue
warn_stream, = [stream for stream in resp['streams'] if 'level="WARNING"' in stream['labels']]
self.assertIn("testing warn", warn_stream['entries'][0]['line'])
info_stream, = [stream for stream in resp['streams'] if 'level="INFO"' in stream['labels']]
self.assertTrue(
[
line for line in info_stream['entries']
if "testing message" in line['line']
])
self.assertTrue(
[
line for line in info_stream['entries']
if "testing another message" in line['line']
])
# The labels we have configued are also available
resp = requests.get(
'{self.loki_url}/api/prom/label'.format(**locals()),
verify=False).json()
self.assertIn('level', resp['values'])
self.assertIn('name', resp['values'])
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment