Add Grafana Service writeup
parent f56e1ed11b
commit 521eb692ad
@ -19,7 +19,7 @@ Longer term, regional replicas of the DN42 site may be provided however this is

## Looking Glass

[lg.burble.com](https://lg.burble.com) (public internet link)
[lg.burble.dn42](https://lg.burble.dn42) (dn42 link)

The burble.dn42 looking glass is based on [bird-lg](https://github.com/sileht/bird-lg) with patches by
BIN
user/pages/01.home/grafana-service/DN42 Monitoring 190524.png
Normal file
Binary file not shown.
Size: 72 KiB
75
user/pages/01.home/grafana-service/default.md
Normal file
@ -0,0 +1,75 @@
Details of the burble.dn42 hosted Grafana service.

===

# Hosted Grafana Service

|Host / URL|Service|
|:--|:--|
|[http://grafana.burble.dn42](http://grafana.burble.dn42)|Grafana Dashboards (dn42 link)|
|[https://grafana.burble.com](https://grafana.burble.com)|Grafana Dashboards (public internet link)|
|influx.burble.dn42:8086|InfluxDB Endpoint|

The hosted Grafana service provides an [InfluxDB](https://www.influxdata.com/) and
[Grafana](https://grafana.com/) combination for storing and displaying stats and metrics.
The service can accept metrics from any source that is able to
[publish](https://docs.influxdata.com/influxdb/v1.7/supported_protocols/) to InfluxDB, including
[Prometheus](https://prometheus.io/) and
[Telegraf](https://www.influxdata.com/time-series-platform/telegraf/).

To apply for an account, contact dn42@burble.com.

Accounts are provided with a dedicated database and Grafana organisation,
allowing users to create and manage their own graphs and dashboards as required. The InfluxDB
database stores up to 1 year of data, with a minimum interval of 1 minute.
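
The sketch below shows what publishing to the InfluxDB endpoint can look like from a script: it formats points in InfluxDB line protocol and builds a POST request for the InfluxDB 1.x HTTP `/write` endpoint on influx.burble.dn42:8086. It is a minimal sketch under stated assumptions: the database name and credentials are hypothetical placeholders, plain HTTP with Basic authentication is assumed rather than confirmed, and the request is only built, not sent.

```python
# Minimal sketch: format InfluxDB line protocol and build a 1.x /write request.
# ASSUMPTIONS: the database name, user and password passed in are hypothetical;
# plain HTTP and Basic auth are assumed, not confirmed for this service.
import base64
import urllib.parse
import urllib.request

INFLUX_URL = "http://influx.burble.dn42:8086"  # endpoint from the table above


def _escape_key(text: str) -> str:
    # Tag keys, tag values and field keys escape commas, equals signs and spaces.
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _field_value(value) -> str:
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Anything else is written as a double-quoted string field.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement: str, tags: dict, fields: dict, timestamp_s=None) -> str:
    """Format one point as a line-protocol string (timestamp in seconds)."""
    line = measurement.replace(",", r"\,").replace(" ", r"\ ")
    for key in sorted(tags):  # InfluxDB recommends sorting tags by key
        line += f",{_escape_key(key)}={_escape_key(str(tags[key]))}"
    line += " " + ",".join(f"{_escape_key(k)}={_field_value(v)}" for k, v in fields.items())
    if timestamp_s is not None:
        line += f" {timestamp_s}"
    return line


def build_write_request(lines, db, user, password, base_url=INFLUX_URL):
    """Build (but do not send) a POST to /write; precision=s matches to_line()."""
    query = urllib.parse.urlencode({"db": db, "precision": "s"})
    request = urllib.request.Request(
        f"{base_url}/write?{query}",
        data="\n".join(lines).encode("utf-8"),
        method="POST",
    )
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request
```

Sending the request is then a matter of `urllib.request.urlopen(request)`; InfluxDB 1.x answers a successful write with HTTP 204 (No Content). Writing each series no more than once per minute matches the stated minimum interval, and at that rate a single series holds at most 365 × 24 × 60 = 525,600 points over a year of retention.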

The Grafana service is hosted on dn42-fr-rbx1.burble.dn42. Service users are encouraged to peer
directly with the service node in order to lower latency and avoid sending large amounts of
data through other nodes in DN42.

# DN42 Infrastructure Monitoring

The burble.dn42 network provides monitoring and alerting of key DN42 infrastructure.
The monitoring service logs metrics to the hosted Grafana service and presents alerts in
the #dn42-bots IRC channel and on Slack. Two monitoring nodes, hosted in separate regions, ensure that
alerts are still generated if the main monitoring node fails.

The monitoring architecture is detailed below:

#### Nodes

The main monitoring node is hosted on dn42-de-fra1, with a secondary backup node on dn42-us-nyc1.
Both nodes monitor the availability of each other's services and are capable of alerting if the
peer node is unavailable.
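
The idea of each node watching its peer can be sketched as a simple reachability check. This is an illustrative sketch only: the page describes the real checks as Prometheus probes, and plain TCP connects stand in for them here. The function names and the services passed in are hypothetical; each node would run `check_peer()` against the other node's services and raise an alert for every line it returns.

```python
# Illustrative sketch of a mutual reachability check between monitoring nodes.
# ASSUMPTION: plain TCP connects stand in for the real Prometheus probes; the
# service names, hosts and ports passed to check_peer() are hypothetical.
import socket


def service_up(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out or name lookup failure
        return False


def check_peer(peer: str, services: dict, timeout: float = 5.0) -> list:
    """Probe each (host, port) of the peer node; return one alert line per failure."""
    return [
        f"{peer}: {name} unreachable at {host}:{port}"
        for name, (host, port) in services.items()
        if not service_up(host, port, timeout)
    ]
```

For example, the node on dn42-de-fra1 might check the peer's Prometheus (default port 9090) and Alertmanager (default port 9093) endpoints on dn42-us-nyc1, and vice versa. A failure of either node is then visible to the other, although nothing alerts if both fail at once.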

#### Presentation

Metrics collected by the service are presented as public graphs in the burble.dn42 Grafana service
(see above).

#### Alerting

Alertmanager is configured as a cluster, operating across both monitoring nodes. Alerts are
published in real time to the #dn42-bots IRC channel on hackint (using
[alertmanager-irc-relay](https://github.com/google/alertmanager-irc-relay)) and to the
burble.dn42/dn42-alerts channel in Slack.

Alerts typically fire when a problem persists for 5 minutes or longer.
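
In Prometheus, this kind of threshold is normally expressed with the `for` clause of an alerting rule (for example `for: 5m`): an alert is *pending* while its condition holds and only *fires* once the condition has held continuously for the full duration, resetting if the condition clears. A simplified sketch of that state logic follows; it assumes the caller supplies discrete evaluation timestamps and is not taken from the burble.dn42 rule files.

```python
# Simplified sketch of Prometheus-style "for" semantics (pending -> firing).
# ASSUMPTION: the condition is evaluated at discrete timestamps supplied by the caller.
from datetime import datetime, timedelta
from typing import Iterable, Iterator, Tuple

HOLD = timedelta(minutes=5)  # "a problem persists for 5 minutes or longer"


def alert_states(
    evaluations: Iterable[Tuple[datetime, bool]], hold: timedelta = HOLD
) -> Iterator[Tuple[datetime, str]]:
    """Yield (timestamp, state) where state is inactive, pending or firing."""
    active_since = None
    for ts, problem in evaluations:
        if not problem:
            active_since = None  # any healthy evaluation resets the timer
            yield ts, "inactive"
            continue
        if active_since is None:
            active_since = ts  # first failing evaluation starts the timer
        yield ts, "firing" if ts - active_since >= hold else "pending"
```

With one evaluation per minute and a problem starting at minute 0, the alert stays pending for minutes 0 to 4 and fires from minute 5; a single healthy evaluation in between restarts the count.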

#### Collection and Storage

Prometheus is used to collect metrics from the various probes and publish them to the hosted InfluxDB
database. Metrics are typically collected every minute, although the interval is increased to five minutes
for the clearnet DN42 services to avoid excessive load.

The main node for data collection is monitor.de-fra1.burble.dn42.

#### Probes

|Probe|Purpose|
|:--|:--|
|[blackbox_exporter](https://github.com/prometheus/blackbox_exporter)|Used to ping hosts or query services (e.g. HTTP/S probes)|
|[netdata](https://github.com/netdata/netdata)|Used to collect many host system metrics|
|[dn42promsrv](https://git.burble.com/burble.dn42/dn42promsrv)|Custom scripts for DN42-specific probes|
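
For reference, blackbox_exporter is queried over HTTP: Prometheus requests `/probe` on the exporter with a `module` (a probe type defined in the exporter's configuration) and a `target`, and the response includes a `probe_success` metric of 1 or 0. The sketch below builds such a URL and reads `probe_success` from the text output. The exporter address (localhost, default port 9115) and the `http_2xx` module name from the exporter's example configuration are assumptions about how the burble.dn42 probes are set up.

```python
# Sketch: build a blackbox_exporter probe URL and read probe_success from its output.
# ASSUMPTIONS: exporter on localhost at the default port 9115 and an "http_2xx" module
# as in blackbox_exporter's example config; the real burble.dn42 setup may differ.
import urllib.parse

EXPORTER = "http://localhost:9115"


def probe_url(target: str, module: str = "http_2xx", exporter: str = EXPORTER) -> str:
    """Return the /probe URL that Prometheus would scrape for this target."""
    query = urllib.parse.urlencode({"module": module, "target": target})
    return f"{exporter}/probe?{query}"


def probe_succeeded(metrics_text: str) -> bool:
    """Scan Prometheus text exposition output and report whether probe_success is 1."""
    for line in metrics_text.splitlines():
        # Skip "# HELP" / "# TYPE" comments; match the unlabelled sample line.
        if line.startswith("probe_success "):
            return float(line.split()[1]) == 1.0
    return False  # metric missing: treat as a failed probe
```

Fetching the URL (for example with `urllib.request.urlopen`) returns the text that `probe_succeeded()` parses.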

@ -10,6 +10,12 @@ A log of changes to the burble.dn42 network.

## burble.dn42 Maintenance Log

#### 24th May 2019

Moved and extended the DN42 monitoring so that it is more independent and also clustered.

A writeup of the hosted Grafana service and monitoring is available [here](/home/grafana-service).

#### 21st May 2019

dn42-uk-lon1 is back again after being out of action for the day.