---
title: "Homelab 2: Monitoring"
date: 2023-09-01T21:15:40-07:00
draft: true
summary: "How metrics and logs work for my homelab"
tags:
---
Every good production environment needs robust monitoring in place before it can be considered production, and I feel like my homelab shouldn't be an exception.
With that in mind, I built the same monitoring stack I've used professionally at companies I've worked at, and I'm pretty happy with it.
## stackem
The different components of the system are thus:
- Prometheus
- Grafana
- Prometheus's `node_exporter`
- Consul
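To sketch how these pieces fit together: Consul acts as the service registry, and Prometheus can discover its scrape targets from it instead of hardcoding addresses. Here's a hypothetical `prometheus.yml` fragment showing that wiring — the Consul address and the `node-exporter` service name are assumptions for illustration, not my actual configuration:

```yaml
# prometheus.yml (sketch): discover scrape targets via Consul service discovery.
scrape_configs:
  - job_name: "consul-services"
    consul_sd_configs:
      - server: "127.0.0.1:8500"    # local Consul agent (assumed address)
        services: ["node-exporter"] # hypothetical registered service name
    relabel_configs:
      # Carry the Consul service name through as a label on scraped metrics.
      - source_labels: [__meta_consul_service]
        target_label: service
```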
Here is what I want to collect metrics for, from most critical to least:
- The base host resources. This includes available memory, CPU, disk space, network traffic, etc. Also includes the ZFS pool.
- Nomad.
- The services that run on the host via Nomad.
In essence, monitoring starts with the platform used to deliver my applications and then moves upward to the end services I expose to myself and my friends.
## base system monitoring
Starting from the base, the Prometheus project supports collecting metrics from Linux-based hosts through its `node_exporter` project. It's a Golang binary that exposes an absolute treasure trove of data, including ZFS stats! It covers my first two monitoring needs.
While I could run `node_exporter` via systemd, I instead opted to use the `exec` driver for Nomad. There is a Nix package for installing the exporter, so the job definition just executes the binary itself. Doing it this way means I get visibility into the exporter's process and logs via the Nomad dashboard.
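A minimal sketch of what such a Nomad job could look like, assuming the Nix-installed `node_exporter` binary is already on the host's path — the datacenter name, listen port, and resource numbers here are illustrative, not my actual job file:

```hcl
job "node-exporter" {
  datacenters = ["dc1"]  # assumed datacenter name
  type        = "system" # run one instance on every client node

  group "exporter" {
    task "node_exporter" {
      driver = "exec"

      config {
        # Assumes the binary is resolvable on the host; adjust the path as needed.
        command = "node_exporter"
        args    = ["--web.listen-address=:9100"]
      }

      resources {
        cpu    = 100 # MHz
        memory = 64  # MB
      }
    }
  }
}
```

Using `type = "system"` keeps the exporter running on every client node, which is the usual pattern for host-level collectors.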
## everything else
Nomad and Consul both expose their own metrics, so it's easy enough to add a Prometheus stat