
---
title: "Homelab 2: Monitoring"
date: 2023-09-01T21:15:40-07:00
draft: true
summary: "How metrics and logs work for my homelab"
tags:
  - Prometheus
  - Nomad
  - Grafana
  - Monitoring
  - Consul
---

Every good production environment needs robust monitoring in place before it can really be called production, and I feel like my homelab shouldn't be an exception.

With that in mind, I put together a monitoring stack that I've used professionally at companies I've worked at, and I'm pretty happy with it.

(image: stackem)

The different components of the system are thus:

  • Prometheus
  • Grafana
  • Prometheus's node_exporter
  • Consul

Here is what I want to collect metrics for, from most critical to least:

  1. The base host resources. This includes available memory, CPU, disk space, network traffic, etc. Also includes the ZFS pool.
  2. Nomad.
  3. The services that run on the host via Nomad.

In essence, monitoring starts with the platform used to deliver my applications and then moves upward to the end services I expose to myself and my friends.

## base system monitoring

Starting from the base, the Prometheus project supports collecting metrics from Linux-based hosts through its node_exporter project. It's a Go binary that exposes an absolute treasure trove of data, including ZFS stats! That covers my first two monitoring needs.

While I could run the node_exporter via systemd, I instead opted to use the exec driver for Nomad. There is a Nix package for installing the exporter so the job definition just relies on executing the binary itself. Doing it this way means I get visibility into the exporter process and logs via the Nomad dashboard.
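As a sketch of what that looks like, here is a minimal Nomad job definition using the exec driver. The binary path and port are assumptions for illustration; the path will depend on where your Nix package puts the exporter.

```hcl
job "node-exporter" {
  datacenters = ["dc1"]
  # A system job runs one instance on every client node.
  type = "system"

  group "node-exporter" {
    network {
      port "metrics" {
        # 9100 is node_exporter's default listen port.
        static = 9100
      }
    }

    task "node-exporter" {
      driver = "exec"

      config {
        # Hypothetical path; point this at the node_exporter binary
        # installed by your Nix package.
        command = "/run/current-system/sw/bin/node_exporter"
        args    = ["--web.listen-address=:9100"]
      }

      # Registering the task in Consul lets Prometheus discover it
      # instead of hardcoding host addresses.
      service {
        name = "node-exporter"
        port = "metrics"
      }
    }
  }
}
```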

## everything else

Nomad and Consul both expose their own metrics and so it's easy enough to add a Prometheus stat
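For instance, a Prometheus scrape configuration along these lines would pick up both the Consul-registered exporter and Nomad's own metrics endpoint (Nomad serves Prometheus-format metrics from its API when `telemetry { prometheus_metrics = true }` is set). The addresses and job names here are placeholders for a single-host setup.

```yaml
scrape_configs:
  # Discover node_exporter instances via Consul service discovery
  # rather than listing targets by hand.
  - job_name: "node-exporter"
    consul_sd_configs:
      - server: "127.0.0.1:8500"
        services: ["node-exporter"]

  # Nomad exposes Prometheus-format metrics on its API port.
  - job_name: "nomad"
    metrics_path: "/v1/metrics"
    params:
      format: ["prometheus"]
    static_configs:
      - targets: ["127.0.0.1:4646"]
```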