Fixups
parent 521eb692ad
commit 32279b7780
@@ -115,10 +115,28 @@ Please mail [dn42@burble.com](mailto:dn42@burble.com) for further details.
 
 ## Network Status and Reporting
 
+### Hosted Grafana Service
+
+[http://grafana.burble.dn42](http://grafana.burble.dn42) dn42 link
+[https://grafana.burble.com](https://grafana.burble.com) public internet link
+
+The hosted grafana service has it's own page [here](/home/grafana-service).
+
+### DN42 Infrastructure Monitoring
+
+burble.dn42 hosts monitoring and alerting of key DN42 services, see the
+[hosted grafana service](/home/grafana-service) for more details.
+
+### burble.dn42 status
+
 [dn42.status.burble.com](https://dn42.status.burble.com/)
 
-Each node in the network is monitored by [UptimeRobot](https://uptimerobot.com/) with alerts if a node becomes unavailable.
+Each node in the network is monitored by [UptimeRobot](https://uptimerobot.com/) with alerts
+if a node becomes unavailable.
 
-Internally, nodes are measured by [netdata](https://github.com/netdata/netdata) which provides a real time view of each node. [prometheus](https://prometheus.io/) is then used to collect and store that data for historical reporting. [grafana](https://grafana.com/) is used for visualisation, but this is not currently a public service.
+Internally, nodes are measured by [netdata](https://github.com/netdata/netdata) which provides
+a real time view of each node. [prometheus](https://prometheus.io/) is then used to collect and
+store that data for historical reporting. [grafana](https://grafana.com/) is used for
+visualisation. Some public graphs are available on the [hosted grafana service](/home/grafana-service).
 
 Syslogs are exported in real time to a central logging node on the internal network.
@@ -1,17 +1,23 @@
+---
+title: Hosted Grafana
+visible: true
+---
+
 Details of the burble.dn42 hosted Grafana service.
 
 ===
 
-# Hosted Grafana Service
+## Hosted Grafana Service
 
 |Host / URL|Service|
 |:--|:--|
-|[http://grafana.burble.dn42](http://grafana.burble.dn42)|Grafana Dashboards (dn42 link)|
-|[https://grafana.burble.com](https://grafana.burble.com)|Grafana Dashboards (public internet link)|
+|[http://grafana.burble.dn42/](http://grafana.burble.dn42/)|Grafana Dashboards (dn42 link)|
+|[https://grafana.burble.com/](https://grafana.burble.com/)|Grafana Dashboards (public internet link)|
 |influx.burble.dn42:8086|InfluxDB Endpoint|
 
+
 The hosted grafana service provides an [InfluxDB](https://www.influxdata.com/) and
 [Grafana](https://grafana.com/) combination for storing and displaying stats and metrics.
 The service can accept metrics from any source that is able to
 [publish](https://docs.influxdata.com/influxdb/v1.7/supported_protocols/) to the InfluxDB, including
 [Prometheus](https://prometheus.io/) and
@@ -27,9 +33,9 @@ The grafana service is hosted on dn42-fr-rbx1.burble.dn42. Service users are enc
 directly with the service node in order to lower latencies and avoid sending large amounts of
 data through other nodes in DN42.
 
-# DN42 Infrastructure Monitoring
+## DN42 Infrastructure Monitoring
 
-The burble.dn42 network provides monitoring and alerting of key DN42 infrastructure.
+The burble.dn42 network hosts monitoring and alerting of key DN42 infrastructure.
 The monitoring service logs metrics to the hosted grafana service, and presents alerts to
 the #dn42-bots channel and slack. Two monitoring nodes hosted in separate regions ensure that
 alerts will be generated if the main monitoring node fails.
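The hosted service is described above as accepting metrics from any source able to publish to InfluxDB, with `influx.burble.dn42:8086` listed as the endpoint. As a rough illustration only, the sketch below formats one data point in InfluxDB line protocol and builds (without sending) an HTTP write request. It assumes the InfluxDB 1.x `/write` API, matching the v1.7 documentation linked in the page; the database name `example_db`, the measurement `ping`, and its tags and fields are hypothetical, and any authentication the service may require is not covered.

```python
import urllib.parse
import urllib.request

INFLUX_URL = "http://influx.burble.dn42:8086"  # endpoint listed in the service table
DATABASE = "example_db"  # hypothetical database name, not taken from the source


def _escape(text):
    """Escape commas, equals signs and spaces, as line protocol requires for keys and tag values."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line(measurement, tags, fields, timestamp_ns=None):
    """Format one point as line protocol; all field values are written as floats."""
    head = measurement.replace(",", "\\,").replace(" ", "\\ ")
    for key in sorted(tags):
        head += "," + _escape(key) + "=" + _escape(tags[key])
    body = ",".join(
        _escape(name) + "=" + str(float(value))
        for name, value in sorted(fields.items())
    )
    line = head + " " + body
    if timestamp_ns is not None:
        line += " " + str(timestamp_ns)
    return line


def build_write_request(line, database=DATABASE):
    """Build a POST request for the InfluxDB 1.x /write endpoint without sending it."""
    query = urllib.parse.urlencode({"db": database})
    return urllib.request.Request(
        INFLUX_URL + "/write?" + query,
        data=line.encode("utf-8"),
        method="POST",
    )


# Hypothetical measurement, tag and field names, for illustration only.
example_line = to_line("ping", {"host": "node1"}, {"rtt_ms": 12.5}, 1)
example_request = build_write_request(example_line)
```

Passing `example_request` to `urllib.request.urlopen` from a host inside DN42 would attempt the write; InfluxDB 1.x answers a successful write with HTTP 204.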
@@ -40,7 +46,7 @@ The monitoring architecture is detailed below:
 
 #### Nodes
 
 The main monitoring node is hosted on dn42-de-fra1, with a secondary backup node on dn42-us-nyc1.
 Both nodes monitor the availability of services on each other and are capable of alerting if the
 peer node is unavailable.
 
@@ -51,8 +57,8 @@ Metrics collected by the service are presented as public graphs in the burble.dn
 
 #### Alerting
 
-AlertManager is configured as a cluster, operating across both monitoring nodes. Alerts are
-published in real time to the #dn42-bots hackint IRC channel (using
+AlertManager is configured as a cluster, operating across both monitoring nodes.
+Alerts are published in real time to the #dn42-bots hackint IRC channel (using
 [alertmanager-irc-relay](https://github.com/google/alertmanager-irc-relay) and
 burble.dn42/dn42-alerts channel in slack.
 
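The page context for the next hunk notes that alerts typically fire when a problem occurs for 5 minutes or longer. The sketch below illustrates that kind of hold-down logic in plain Python. It is an assumption-laden illustration, loosely modelled on the `for` clause of a Prometheus alerting rule; the real alert rules are not shown in the source, and the function name and sample format are hypothetical.

```python
def alert_firing(samples, hold_seconds=300):
    """Return True if the newest sample is failing and the failure has
    persisted, without interruption, for at least hold_seconds.

    samples: list of (unix_timestamp, is_failing) tuples, oldest first.
    """
    if not samples or not samples[-1][1]:
        return False
    latest = samples[-1][0]
    failing_since = latest
    # Walk backwards until the most recent healthy sample is found.
    for timestamp, is_failing in reversed(samples):
        if not is_failing:
            break
        failing_since = timestamp
    return latest - failing_since >= hold_seconds


# One probe result per minute; the target has been failing for the whole window.
sustained = [(t, True) for t in range(0, 301, 60)]
# Same window, but the target briefly recovered at t=120.
interrupted = [(0, True), (60, True), (120, False), (180, True), (240, True), (300, True)]
```

With one-minute samples, a short recovery resets the timer, so a brief blip does not page anyone while a sustained outage does.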
@@ -61,7 +67,8 @@ Alerts typically fire when a problem occurs for 5 minutes or longer.
 #### Collection and Storage
 
 Prometheus is used to collect metrics from the various probes and publish them to the hosted Influx
-database. Typically metrics are collected every minute, although this is reduced to every five minutes
+database.
+Typically metrics are collected every minute, although this is reduced to every five minutes
 for the clearnet DN42 services to avoid excessive load.
 
 The main node for data collection is monitor.de-fra1.burble.dn42
@@ -72,4 +79,4 @@ The main node for data collection is monitor.de-fra1.burble.dn42
 |:--|:--|
 |[blackbox_exporter](https://github.com/prometheus/blackbox_exporter)|Used to ping hosts or query services (e.g. HTTP/s probes)|
 |[netdata](https://github.com/netdata/netdata)|Used to collect many host system metrics|
-|[dn42promsrv](https://git.burble.com/burble.dn42/dn42promsrv)|Custom scripts for DN42 specific probdes|
+|[dn42promsrv](https://git.burble.com/burble.dn42/dn42promsrv)|Custom collector for DN42 specific probes|