---
title: "Homelab 2: Monitoring"
date: 2023-09-01T21:15:40-07:00
draft: true
summary: How metrics and logs work for my homelab
tags:
- Prometheus
- Nomad
- Grafana
- Monitoring
- Consul
---

Every good production environment needs robust monitoring in place before it can be considered production, and I feel like my homelab shouldn't be an exception.

With that in mind, I set up a monitoring stack that I've used professionally at companies I've worked at, and I'm pretty happy with it.

## stackem

The components of the system are:

* Prometheus
* Grafana
* Prometheus's node_exporter
* Consul

Here is what I want to collect metrics for, from most critical to least:

1. The base host resources. This includes available memory, CPU, disk space, network traffic, etc., as well as the ZFS pool.
2. Nomad.
3. The services that run on the host via Nomad.

In essence, monitoring starts with the platform used to deliver my applications and then moves upward to the end services I expose to myself and my friends.

## base system monitoring

Starting from the base, the Prometheus project supports collecting metrics from Linux-based hosts through its [node_exporter](https://github.com/prometheus/node_exporter) project. It's a Go binary that exposes an absolute treasure trove of data, including ZFS stats! It covers the first need on my list: the base host resources and the ZFS pool.

While I could run `node_exporter` via systemd, I instead opted to use Nomad's `exec` driver. There is a Nix package for installing the exporter, so the job definition just executes the binary itself. Doing it this way means I get visibility into the exporter process and its logs via the Nomad dashboard.
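
For illustration, a job along these lines might look roughly like the sketch below. It is not my exact job file: the job and service names, the datacenter, the `system` job type, the resource numbers, and especially the binary path are all assumptions made for the example. Port 9100 is the exporter's default.

```hcl
# Hypothetical sketch of a node_exporter job using the exec driver.
# Names, datacenter, job type, and resource values are illustrative assumptions.
job "node-exporter" {
  datacenters = ["dc1"]

  # Assumption: a system job, so one exporter runs on every client node.
  type = "system"

  group "node-exporter" {
    network {
      # 9100 is node_exporter's default port.
      port "metrics" {
        static = 9100
      }
    }

    # Registers the exporter as a service (in Consul by default).
    service {
      name = "node-exporter"
      port = "metrics"
    }

    task "node-exporter" {
      driver = "exec"

      config {
        # Assumed path: wherever the Nix package places the binary on the host.
        command = "/run/current-system/sw/bin/node_exporter"
        args    = ["--web.listen-address=:${NOMAD_PORT_metrics}"]
      }

      resources {
        cpu    = 100 # MHz
        memory = 64  # MB
      }
    }
  }
}
```

Something like `nomad job run node-exporter.nomad.hcl` would submit it. One caveat with this approach: the `exec` driver isolates the task in a chroot, so depending on how the Nomad client is configured, the Nix store path may need to be made visible to the task before the binary can be found.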

## everything else

Nomad and Consul both expose their own metrics and so it's easy enough to add a Prometheus stat