Installation
- Prometheus and its related components (e.g. pushgateway, blackbox_exporter, alertmanager) - the scrape engine. For the Docker setup, it's best to read the README.md and Dockerfile in each project's GitHub repo
- Grafana - for visualization (drawing graphs)
- Node Exporter (installed via the package manager) - CentOS 7 (needs the EPEL repo); Ubuntu 20.04 - monitors VM states
- cAdvisor - monitors the containers running on a Docker host
- kube-prometheus-stack (EKS) - includes Grafana and Prometheus
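To check what Node Exporter actually exposes, the sketch below downloads and parses its Prometheus text-format output (node_exporter listens on port 9100 by default). It is a simplified parser under stated assumptions: one sample per line, no `}` inside label values, optional timestamps ignored, and label escapes left undecoded. `fetch_metrics` and `parse_exposition` are hypothetical helper names, not part of any Prometheus library.

```python
import re
import urllib.request

# One sample line: metric name, optional {labels}, then the value.
# Simplification: label values must not contain "}", and an optional
# trailing timestamp is ignored.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(url="http://localhost:9100/metrics", timeout=5):
    """Download the raw text exposed by an exporter (assumed local node_exporter)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_exposition(text):
    """Return (name, labels, value) tuples from Prometheus text-format output."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        # Label values are kept as written (escapes such as \" are not decoded).
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))  # float() accepts NaN/+Inf
    return samples
```

For example, `[s for s in parse_exposition(fetch_metrics()) if s[0] == "node_load1"]` would pick out the 1-minute load average on a VM running node_exporter.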
References
- docker-compose.yml for prometheus and node exporter
- grafana.ini
- prometheus.yml
- alertmanager.yml
- web-config.yml - TLS and basic auth settings (basic auth passwords are stored as bcrypt hashes); pass it with --web.config.file when starting the binary. This works for the Prometheus-related binaries, e.g. prometheus, node_exporter, alertmanager
- Prometheus query functions
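Once web-config.yml enables basic auth and TLS, clients have to authenticate too. A minimal sketch, assuming a hypothetical server at `https://prom.example:9090` with user `admin`, that sends a PromQL query (here using the `rate` function) to Prometheus' `/api/v1/query` HTTP endpoint. The server keeps only the bcrypt hash; the client still sends the plain password through standard HTTP basic auth. `build_query_request` and `run_query` are illustrative names.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_query_request(base_url, promql, username=None, password=None):
    """Build a GET request for Prometheus' instant-query endpoint /api/v1/query."""
    url = base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    req = urllib.request.Request(url)
    if username is not None:
        # Standard HTTP basic auth: base64("user:password").
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        req.add_header("Authorization", f"Basic {token}")
    return req


def run_query(req, timeout=10):
    """Send the request and return the 'result' list from the JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(body.get("error", "query failed"))
    return body["data"]["result"]
```

With a self-signed certificate, `urllib.request.urlopen` would additionally need an `ssl` context that trusts that certificate.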
Grafana Dashboards
Prometheus
- Customise the storage config
- Set up an alert
- Set up federation - a primary Prometheus acts as an aggregator that collects metrics from other Prometheus instances. This reduces the load on any single Prometheus server.
- Set up recording rules
- Set up node-exporter with the Pushgateway - but this isn't recommended. The reasons are explained here
- Snapshot (backup)
- EC2 service discovery (ec2_sd_configs) - used to dynamically add EC2 instances as scrape targets
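Three of the tasks above come down to specific HTTP endpoints. The sketch below only builds the requests (it sends nothing): the `/federate` URL a primary Prometheus scrapes (in prometheus.yml this corresponds to `metrics_path: /federate`, `honor_labels: true` and `match[]` under `params`), the TSDB snapshot call (which requires Prometheus to run with `--web.enable-admin-api`), and a Pushgateway push. Host names such as `child`, `pgw`, `db1` and the job name `backup` are hypothetical; send a request with `urllib.request.urlopen(req)`.

```python
import urllib.parse
import urllib.request


def federate_url(base_url, matchers):
    """URL a primary Prometheus scrapes from a child's /federate endpoint.

    Each series selector becomes one match[] parameter, e.g. '{job="node"}'.
    """
    query = urllib.parse.urlencode([("match[]", m) for m in matchers])
    return base_url.rstrip("/") + "/federate?" + query


def snapshot_request(base_url):
    """POST request that asks Prometheus to write a TSDB snapshot (backup)."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/admin/tsdb/snapshot", method="POST"
    )


def pushgateway_request(gateway_url, job, metrics, instance=None):
    """Request pushing simple metric values to the Pushgateway, grouped by job/instance."""
    path = "/metrics/job/" + urllib.parse.quote(job, safe="")
    if instance is not None:
        path += "/instance/" + urllib.parse.quote(instance, safe="")
    # Text exposition format: one "name value" line per metric, ending in a newline.
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    # PUT replaces every metric in this group; POST would replace only same-named ones.
    return urllib.request.Request(
        gateway_url.rstrip("/") + path, data=body.encode(), method="PUT"
    )
```

The snapshot call returns the snapshot's name; the files land under the `snapshots/` directory of Prometheus' data directory, ready to be copied elsewhere.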
Thanos
To achieve HA (high availability), a third-party solution like Thanos is needed.
These two articles are a very good introduction to the concepts of Thanos.
There is also a hands-on lab provided by Killercoda, which is recommended by Thanos.
This shows the architecture of Thanos.
The Thanos Sidecar runs next to each Prometheus instance and acts as a proxy through which the other Thanos components communicate with it. It can also optionally upload data to an object store (e.g. AWS S3 / MinIO) for long-term storage.
The Querier adds another layer to the system. Instead of getting data directly from Prometheus, the client (Grafana / end user) now talks to the Thanos Querier. The Querier will
- deduplicate the data it gets from the replicas
- combine the data from the long-term object store and the shards
Thus, with the Querier, Prometheus instances can be logically grouped into different shards / replicas.
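To make the deduplication step concrete, here is a deliberately simplified sketch. It assumes each replica sets a distinguishing external label (called `replica` here, the kind of label passed to the Querier's `--query.replica-label` flag), drops that label, and merges samples so that gaps in one replica are filled by another. This is not Thanos' actual algorithm, which is penalty-based; for a timestamp reported by several replicas, this sketch simply keeps the first value it sees.

```python
from collections import defaultdict


def dedup_series(series, replica_label="replica"):
    """Merge series that differ only by the replica label.

    series: list of (labels_dict, [(timestamp, value), ...]).
    Returns {label_set_without_replica: [(timestamp, value), ...]} sorted by time.
    """
    merged = defaultdict(dict)
    for labels, samples in series:
        # Identify the logical series by every label except the replica label.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        for ts, value in samples:
            merged[key].setdefault(ts, value)  # first replica's value wins
    return {key: sorted(points.items()) for key, points in merged.items()}
```

Two replicas scraping the same `job="node"` target thus collapse into a single series whose timeline covers the union of both.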
Grafana
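- Provision dashboards and data sources
- But we can't provision users or orgs this way (file provisioning of alerting resources is only available in newer Grafana versions, 9.x and later)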
- Add prometheus to grafana
- Import or export a dashboard
- Backup grafana
- Create repeated rows / panels
- Instead of creating a field override, we can pick a series color quickly by clicking the color bar next to the series in the legend
- HA in Grafana - set up a shared database (e.g. MySQL or PostgreSQL instead of the default SQLite) as the backend so that all instances share the persistent data
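Exporting, importing and backing up dashboards can also be scripted against Grafana's HTTP API. A minimal sketch, assuming a Grafana URL and a service account token (both placeholders): it lists dashboards with `/api/search?type=dash-db`, fetches each one from `/api/dashboards/uid/<uid>`, and turns an exported document into a body for `POST /api/dashboards/db`. The helper names are hypothetical, and folders, users and data sources are not covered.

```python
import json
import urllib.request


def grafana_get(base_url, path, token):
    """GET a Grafana HTTP API path with a bearer token and decode the JSON body."""
    req = urllib.request.Request(
        base_url.rstrip("/") + path, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def export_all(base_url, token):
    """Back up every dashboard: list them via /api/search, then fetch each by uid."""
    hits = grafana_get(base_url, "/api/search?type=dash-db", token)
    return {
        h["uid"]: grafana_get(base_url, f"/api/dashboards/uid/{h['uid']}", token)
        for h in hits
    }


def to_import_payload(exported, overwrite=True):
    """Turn an exported {dashboard, meta} document into a POST /api/dashboards/db body.

    The numeric id is specific to one Grafana instance, so it is cleared; the uid
    is kept so re-importing with overwrite=True updates the same dashboard.
    """
    dashboard = dict(exported["dashboard"])  # copy so the backup stays untouched
    dashboard["id"] = None
    return {"dashboard": dashboard, "overwrite": overwrite}
```

Writing `json.dumps(export_all(url, token))` to a file gives a simple dashboard backup; each entry can later be restored by POSTing `to_import_payload(entry)`.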