update

2023-03-19 13:40:33 +01:00 · 2023-03-19 13:40:33 +01:00 · 4f0c9f1f17
parent 25754ad263
commit 4f0c9f1f17
1 changed files with 119 additions and 73 deletions
--- a/prometheus_grafana/readme.md
+++ b/prometheus_grafana/readme.md
@ -27,6 +27,7 @@ Lot of the prometheus stuff here is based off the magnificent
 # Chapters

 * **[Core prometheus+grafana](#Overview)** - nice dashboards with metrics of docker host and containers
+* **[PromQL](#PromQL)** - links to various learning resources
 * **[Pushgateway](#Pushgateway)** - push data to prometheus from anywhere
 * **[Alertmanager](#Alertmanager)** - setting alerts and getting notifications
 * **[Loki](#Loki)** - prometheus for logs
@ -294,29 +295,29 @@ the default time interval is set to 1h instead of 15m

 # PromQL

-Some concept, highlights and examples of PromQL.
+Some concepts, highlights and examples.

-PromQL returns results as vectors"
-
-* [The official](https://prometheus.io/docs/prometheus/latest/querying/basics/) basics page, quite to the point and short
-* [Introduction to PromQL](https://blog.knoldus.com/introduction-to-promql/)
-* [relatively short video to the point](https://youtu.be/yLPTHinHB6Y)
-* [Prometheus Cheat Sheet - How to Join Multiple Metrics](https://iximiuz.com/en/posts/prometheus-vector-matching/)
-* [decent stackoverflow answer](https://stackoverflow.com/questions/68223824/prometheus-instant-vector-vs-range-vector)
+* [Stackoverflow - Prometheus instant vector vs range vector](https://stackoverflow.com/questions/68223824/prometheus-instant-vector-vs-range-vector)
+* [Short video](https://youtu.be/yLPTHinHB6Y)
+* [Prometheus Cheat Sheet - Basics \(Metrics, Labels, Time Series, Scraping\)](https://iximiuz.com/en/posts/prometheus-metrics-labels-time-series/)
+* [Learning Prometheus and PromQL - Learning Series](https://iximiuz.com/en/series/learning-prometheus-and-promql/)
+* [The official](https://prometheus.io/docs/prometheus/latest/querying/basics/)

+One thing to get from these is what kind of data a query in PromQL returns.
+Instant vector vs range vector.
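
The difference shows in the shape of the data that comes back. Below is a minimal sketch in Python, using hand-written sample payloads shaped like the `data` part of a Prometheus HTTP API response. The metric `up`, the `job` label and all values are made up for illustration. An instant vector carries a single `value` pair per series, a range vector carries a list of `values`.

```python
# Hand-written sample payloads, shaped like the "data" part of a
# Prometheus HTTP API query response. All values are made up.
instant = {
    "resultType": "vector",  # e.g. query: up
    "result": [
        {"metric": {"__name__": "up", "job": "node"},
         "value": [1679229633, "1"]},
    ],
}
ranged = {
    "resultType": "matrix",  # e.g. query: up[1m]
    "result": [
        {"metric": {"__name__": "up", "job": "node"},
         "values": [[1679229573, "1"], [1679229588, "1"], [1679229603, "0"]]},
    ],
}


def samples_per_series(data):
    """Return how many samples each series in the result carries."""
    counts = []
    for series in data["result"]:
        if data["resultType"] == "vector":
            counts.append(1)  # exactly one (timestamp, value) pair
        else:
            counts.append(len(series["values"]))  # many pairs per series
    return counts


print(samples_per_series(instant))
print(samples_per_series(ranged))
```

Graph panels generally need an instant vector per evaluation step, which is why range vectors usually get wrapped in a function such as `rate()` or `count_over_time()`.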

 ---
 ---

 # Pushgateway

-Gives freedom to push information in to prometheus from anywhere.
+Gives freedom to **push** information in to prometheus from **anywhere**.<br>

-## The setup
+### The setup

-To add pushgateway functionality to the current stack:
+To **add** pushgateway functionality to the current stack:

-* New container `pushgateway` added to the compose file.
+* **New container** `pushgateway` added to the **compose** file.

  <details>
  <summary>docker-compose.yml</summary>
@ -342,7 +343,8 @@ To add pushgateway functionality to the current stack:
  ```
  </details>

-* Adding pushgateway to the Caddyfile of the reverse proxy so that it can be reached at `https://push.example.com`<br>
+* Adding pushgateway to the **Caddyfile** of the reverse proxy so that
+  it can be reached at `https://push.example.com`<br>

  <details>
  <summary>Caddyfile</summary>
@ -354,7 +356,7 @@ To add pushgateway functionality to the current stack:
  ```
  </details>  

-* Adding pushgateway's scrape point to `prometheus.yml`<br>
+* Adding pushgateway's **scrape point** to `prometheus.yml`<br>

  <details>
  <summary>prometheus.yml</summary>
@ -372,7 +374,7 @@ To add pushgateway functionality to the current stack:
  ```
  </details>

-## The basics
+### The basics

 ![veeam-dash](https://i.imgur.com/TOuv9bM.png)

@ -386,16 +388,20 @@ Now in grafana, in **Explore** section you should see some results
 when querying for `some_metric`.

 The metrics sit on the pushgateway **forever**, unless deleted or container
-shuts down. Prometheus will not remove the metrics from it after scraping,
-it will keep scraping the pushgateway and store the value with the time of
-scraping.
+shuts down. **Prometheus will not remove** the metrics from it **after scraping**,
+it will keep scraping the pushgateway and store the value that sits there with
+the time of scraping.

-To wipe the pushgateway clean<br>
+To **wipe** the pushgateway clean<br>
 `curl -X PUT https://push.example.com/api/v1/admin/wipe`
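
For reference, pushing a metric such as `some_metric` boils down to one HTTP request with a plain text body. A minimal sketch in Python, assuming the pushgateway is reachable at `https://push.example.com`. The job name `some_job` and the helper `build_push` are hypothetical, picked just for this example. The request is only built here, not sent.

```python
import urllib.request


def build_push(base_url, job, metrics):
    """Build (not send) a request that pushes metrics to a pushgateway."""
    # Text exposition format: one "name value" pair per line,
    # and the body has to end with a newline.
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    return urllib.request.Request(
        url=f"{base_url}/metrics/job/{job}",
        data=body.encode(),
        method="POST",
    )


# "some_job" is a made up job name for this example
req = build_push("https://push.example.com", "some_job", {"some_metric": 3.14})
print(req.full_url)
print(req.data.decode())
# To actually send it: urllib.request.urlopen(req)
```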

-More on pushgateway setup, with the real world use to monitor backups,
-along with pushing metrics from windows in powershell - 
-[**Veeam Prometheus Grafana**](https://github.com/DoTheEvo/veeam-prometheus-grafana)<br>
+### The real world use
+
+[**Veeam Prometheus Grafana - guide-by-example**](https://github.com/DoTheEvo/veeam-prometheus-grafana) 
+
+Linked above is much more on **pushgateway setup**,
+a real world use to **monitor backups**, along with **pushing metrics
+from windows** in powershell.<br>

 ![veeam-dash](https://i.imgur.com/dUyzuyl.png)

@ -404,18 +410,18 @@ along with pushing metrics from windows in powershell -

 # Alertmanager

-To send a notification about some metric breaching some preset condition.<br>
-Notifications chanels set here will be email and
-[ntfy](https://github.com/DoTheEvo/selfhosted-apps-docker/tree/master/gotify-ntfy-signal) 
+To send a **notification** about some **metric** breaching some preset **condition**.<br>
+Notification **channels** set here will be **email** and
+[**ntfy**](https://github.com/DoTheEvo/selfhosted-apps-docker/tree/master/gotify-ntfy-signal) 

 ![alert](https://i.imgur.com/b4hchSu.png)

 ## The setup

-To add alertmanager to the current stack:
+To **add** alertmanager to the current stack:

-* New file - `alertmanager.yml` will be bind mounted in alertmanager container.<br>
-  This file contains configuration on how and where to deliver alerts.<br>
+* **New file** - `alertmanager.yml` will be **bind mounted** in alertmanager container.<br>
+  This is the **configuration** on how and where **to deliver** alerts.<br>

  <details>
  <summary>alertmanager.yml</summary>
@ -441,8 +447,8 @@ To add alertmanager to the current stack:
  ```
  </details>

-* New file - `alert.rules` will be mounted in to prometheus container<br>
-  This file defines which value of some metric becomes an alert event.
+* **New file** - `alert.rules` will be **bind mounted** in to prometheus container<br>
+  This file **defines** at what value a metric becomes an **alert** event.

  <details>
  <summary>alert.rules</summary>
@ -461,8 +467,8 @@ To add alertmanager to the current stack:
  ```
  </details>

-* Changed `prometheus.yml`. Added `alerting` section that points to alertmanager
-  container, and also set is a path to a `rules` file.
+* **Changed** `prometheus.yml`. Added **alerting section** that points to alertmanager
+  container, and also **set path** to a `rules` file.

  <details>
  <summary>prometheus.yml</summary>
@ -497,8 +503,8 @@ To add alertmanager to the current stack:
  ```
  </details>

-* New container - `alertmanager` added to the compose file and prometheus
-  container has bind mount rules file added.
+* **New container** - `alertmanager` added to the compose file and **prometheus
+  container** has a bind mount for the **rules file** added.

  <details>
    <summary>docker-compose.yml</summary>
@ -555,9 +561,10 @@ To add alertmanager to the current stack:
  ```
  </details>

-* Adding alertmanager to the Caddyfile of the reverse proxy so that it can be reached
-  at `https://alert.example.com`. Not really necessary, but useful as it allows
-  to send alerts from anywhere, not just from prometheus.
+* **Adding** alertmanager to the **Caddyfile** of the reverse proxy so that
+  it can be reached at `https://alert.example.com`. **Not necessary**,
+  but useful as it **allows sending alerts from anywhere**,
+  not just from prometheus or other containers on the same docker network.

  <details>
  <summary>Caddyfile</summary>
@ -574,15 +581,15 @@ To add alertmanager to the current stack:
 ![alert](https://i.imgur.com/C7g0xJt.png)


-Once above setup is done an alert about low disk space should fire and notification
-email should come.<br>
-In `alertmanager.yml` switch from email to ntfy can be done.
+Once the above setup is done **an alert** about low disk space **should fire**
+and a **notification** email should come.<br>
+In `alertmanager.yml` the notification channel can then be switched from email **to ntfy**.

 *Useful*

-* alert from anywhere using curl:<br>
+* **alert** from anywhere using **curl**:<br>
  `curl -H 'Content-Type: application/json' -d '[{"labels":{"alertname":"blabla"}}]' https://alert.example.com/api/v1/alerts`
-* reload rules:<br>
+* **reload rules**:<br>
  `curl -X POST https://prom.example.com/-/reload`
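
The JSON body of the alert curl above is just a list of alerts, each with a `labels` map. A minimal sketch in Python that builds the same payload. The helper `build_alert` and the extra `severity` label are made up for illustration, only `alertname` comes from the curl example. As far as I know, newer Alertmanager releases dropped the v1 API, so the path may need to be `/api/v2/alerts` there.

```python
import json


def build_alert(alertname, **extra_labels):
    """Return the JSON body for posting a single alert to alertmanager."""
    labels = {"alertname": alertname, **extra_labels}
    return json.dumps([{"labels": labels}])


# "severity" is an extra, made up label for this example
payload = build_alert("blabla", severity="warning")
print(payload)
```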

 [stefanprodan/dockprom](https://github.com/stefanprodan/dockprom#define-alerts)
@ -653,9 +660,9 @@ A **minecraft server** and a **caddy revers proxy**, both docker containers.
    * **URL** changed for this setup.
    * **Compactor** section is added, to have control over
      [data retention.](https://grafana.com/docs/loki/latest/operations/storage/retention/)
-    * **Fixing** error - *"too many outstanding requests"*, source
-      [here.](https://github.com/grafana/loki/issues/5123)
-      It turn's off parallelism, both split by time interval and shards split.
+    * **Fixing** error - *"too many outstanding requests"*, discussion
+      [here.](https://github.com/grafana/loki/issues/5123)<br>
+      It turns off parallelism, both split by time interval and shards split.

  <details>
  <summary>loki-config.yml</summary>
@ -1101,7 +1108,6 @@ Templates resources
 * [Overview of Grafana Alerting and Message Templating for Slack](https://faun.pub/overview-of-grafana-alerting-and-message-templating-for-slack-6bb740ec44af)
 * [youtube - Unified Alerting Grafana 8 | Prometheus | Victoria | Telegraf | Notifications | Alert Templating](https://youtu.be/UtmmhLraSnE)
 * [Dot notation](https://www.practical-go-lessons.com/chap-32-templates#dot-notation)
-* 

 ---
 ---
@ -1127,7 +1133,7 @@ of all the http/https **traffic** that goes in. So focus on monitoring this

 **Requirements** - grafana, prometheus, loki, caddy container

-## Metrics - Prometheus
+## Caddy - Metrics - Prometheus

 ![logo](https://i.imgur.com/6QdZuVR.png)

@ -1218,14 +1224,15 @@ to what **service**,.. well for that monitoring of **access logs** is needed.
 ---
 ---

-## Logs - Loki
+## Caddy - Logs - Loki
+
+![logs_dash](https://i.imgur.com/j9CcJ44.png)

 **Loki** itself just **stores** the logs. To get them to Loki a **Promtail** container is used
 that has **access** to caddy's **logs**. Its job is to **scrape** them regularly, maybe
 **process** them in some way, and then **push** them to Loki.<br>
 Once there, a basic grafana **dashboard** can be made.

-![logs_dash](https://i.imgur.com/j9CcJ44.png)

 ### The setup

@ -1346,7 +1353,7 @@ Once there, a basic grafana **dashboard** can be made.
  [**access logs**](https://caddyserver.com/docs/caddyfile/directives/log).
  Unfortunately this **can't be globally** enabled, so the easiest way seems to be 
  to create a **logging** [**snippet**](https://caddyserver.com/docs/caddyfile/concepts#snippets)
-  and copy paste the **import line** in to every site block.
+  called `log_common` and copy paste the **import line** in to every site block.

  <details>
  <summary>Caddyfile</summary>
@ -1373,18 +1380,20 @@ Once there, a basic grafana **dashboard** can be made.
 * at this point logs should be visible and **explorable in grafana**<br>
  Explore > `{job="caddy_access_log"} |= "" | json`

-## Geoip
+### Geoip

-Promtail got recently a geoip stage. One can feed an IP address and an mmdb geoIP 
-datbase and it adds geoip labels to the log entry.
+![geoip_info](https://i.imgur.com/f4P8ydl.png)
+
+**Promtail** recently got a **geoip stage**. One can feed it an **IP address** and an mmdb **geoIP 
+database** and it adds geoip **labels** to the log entry.

 [The official documentation.](https://github.com/grafana/loki/blob/main/docs/sources/clients/promtail/stages/geoip.md)

-* Register account on [maxmind.com](https://www.maxmind.com/en/geolite2/signup).
-* Download mmdb format database, either
+* **Register** a free account on [maxmind.com](https://www.maxmind.com/en/geolite2/signup).
+* **Download** one of the mmdb format **databases**
  * `GeoLite2 City` - 70MB full geoip info - city, postal code, time zone, latitude/longitude,..
  * `GeoLite2 Country` 6MB, just country and continent
-* Bind mount whichever database in to promtail container.
+* **Bind mount** whichever database in to **promtail container**.

  <details>
  <summary>docker-compose.yml</summary>
@ -1428,9 +1437,9 @@ datbase and it adds geoip labels to the log entry.
      external: true
  ```

-* In promtail config add json stage where IP address is loaded in to a variable,
-  which then is used in geoip stage.
-  If all is done correctly, the geoip labels are automaticly added to the log entry.
+* In **promtail** config, a **json stage** is added where the IP address is loaded in to
+  a **variable** called `remote_ip`, which is then used in the **geoip stage**.
+  If all else is set correctly, the geoip **labels** are automatically added to the log entry.

  <details>
  <summary>geoip promtail-config.yml</summary>
@ -1466,19 +1475,21 @@ datbase and it adds geoip labels to the log entry.
 Can be tested with opera's built-in VPN, or some online 
 [site tester](https://pagespeed.web.dev/).

-![geoip_info](https://i.imgur.com/f4P8ydl.png)
+### Dashboard

-## dashboard
+![pane1](https://i.imgur.com/hW92sLO.png)

-* **new pane**, will be **time series** graph showing **logs volume** in time
+* **new pane**, will be **time series** graph showing **Subdomains hits timeline**

+  * Graph type = Time series
  * Data source = Loki
  * switch from builder to code<br>
    `sum(count_over_time({job="caddy_access_log"} |= "" | json [1m])) by (request_host)`
-  * Transform > Rename by regex > Match = `\{request_host="(.*)"\}`; Replace = $1
  * Query options > Min interval = 1m
-  * Graph type = Time series
-  * Title = "Access timeline"
+  * Transform > Rename by regex 
+    * Match = `\{request_host="(.*)"\}`
+    * Replace = `$1`
+  * Title = "Subdomains hits timeline"
  * Transparent
  * Tooltip mode = All
  * Tooltip values sort order = Descending
@ -1487,19 +1498,56 @@ Can be tested with opera build in VPN, or some online
  * Graph style = Bars
  * Fill opacity = 50

+![pane2](https://i.imgur.com/KYZdotg.png)
+
 * Add **another pane**, will be a **pie chart**, showing **subdomains** divide

+  * Graph type = Pie chart
  * Data source = Loki
  * switch from builder to code<br>
    `sum(count_over_time({job="caddy_access_log"} |= "" | json [$__range])) by (request_host)`
-  * Transform > Rename by regex > Match = `\{request_host="(.*)"\}`; Replace = $1
-  * Graph type = Pie chart
-  * Title = "Subdomains divide"
+  * Query options > Min interval = 1m
+  * Transform > Rename by regex
+    * Match = `\{request_host="(.*)"\}`
+    * Replace = `$1`
+  * Title = "Subdomains use"
  * Transparent
  * Legend Placement = Right
-  * Value = Total
-  * Graph style = Bars
+  * Value = Last

+![pane3](https://i.imgur.com/MjbLVlJ.png)
+
+* Add **another pane**, will be a **Geomap**, showing the location of machines
+  accessing Caddy
+
+  * Graph type = Geomap
+  * Data source = Loki
+  * switch from builder to code<br>
+    `{job="caddy_access_log"} |= "" | json`
+  * Query options > Min interval = 1m
+  * Transform > Extract fields
+    * Source = labels
+    * Format = JSON
+    * 1. Field = geoip_location_latitude; Alias = latitude
+    * 2. Field = geoip_location_longitude; Alias = longitude
+  * Title = "Geomap"
+  * Transparent
+  * Map view > View > *Drag and zoom around* > Use current map setting
+
+* Add **another pane**, will be a **pie chart**, showing **IPs** that hit the most
+
+  * Graph type = Pie chart
+  * Data source = Loki
+  * switch from builder to code<br>
+    `sum(count_over_time({job="caddy_access_log"} |= "" | json [$__range])) by (request_remote_ip)`
+  * Query options > Min interval = 1m
+  * Transform > Rename by regex
+    * Match = `\{request_remote_ip="(.*)"\}`
+    * Replace = `$1`
+  * Title = "IPs by number of requests"
+  * Transparent
+  * Legend Placement = Right
+  * Value = Last or Total
  
 * Add **another pane**, this will be actual **log view**

@ -1511,10 +1559,8 @@ Can be tested with opera build in VPN, or some online
  * Deduplication - Exact or Signature
  * Save

-useful resources
+![pane3](https://i.imgur.com/bzE6JEg.png)
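
What the **Rename by regex** transform in the panes above does to a series name can be tried outside of grafana. A minimal sketch in Python, assuming the same match pattern. Grafana uses its own regex engine, and its `$1` corresponds to `\1` in Python. The host `push.example.com` is just a sample value.

```python
import re

# Same pattern as in the "Match" field of the transform
MATCH = r'\{request_host="(.*)"\}'


def rename(series_name):
    """Strip the label wrapper, keep only the captured host name."""
    # Grafana's "$1" is written as "\1" in Python's re.sub
    return re.sub(MATCH, r"\1", series_name)


print(rename('{request_host="push.example.com"}'))
```

The result is that the legend shows the bare subdomain instead of the full label set.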

-* [Unified Alerting Grafana 8 | Prometheus | Notifications | Alert Templating](https://www.youtube.com/watch?v=UtmmhLraSnE)<br>
-  Even if its for v8, it's decently useful

 # Update