An example of Prometheus data:
http_request_total{method="GET", endpoint="/contact-us", status="200"} 1 2 3 4 5
http_request_total{method="POST", endpoint="/auth", status="400"} 1 6 8 10 15
Key Concepts
- Metric: a quantity being measured (e.g. `http_request_total`)
- Metric label: metadata attached to a measurement (e.g. `method="GET"`)
- Sample: a single data point at a given time, stored as a float64 value (e.g. `5`)
- Series: a unique combination of metric name and label values (e.g. `http_request_total{method="GET", endpoint="/contact-us", status="200"}` and `http_request_total{method="POST", endpoint="/auth", status="400"}` are two different series)
- Time series: the samples of one series over time (e.g. `1 2 3 4 5`)
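To make "series = metric name + label set" concrete, here is a minimal sketch that parses the simplified `name{labels} v1 v2 ...` notation from the example at the top (a toy notation, not the real exposition format) and keys each series by its name plus an unordered set of label pairs. `parse_series` and the regexes are hypothetical helpers written for illustration.

```python
import re

# Hypothetical parser for the simplified "name{labels} v1 v2 ..." notation used above
# (not the real exposition format, which carries one sample per line).
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)\{(.*)\}\s+(.*)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_series(line):
    """Return (series_id, samples); series_id = metric name + set of label pairs."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"cannot parse: {line!r}")
    name, labels_text, values_text = m.groups()
    labels = frozenset(LABEL_RE.findall(labels_text))  # label order is irrelevant
    samples = [float(v) for v in values_text.split()]  # sample values are float64
    return (name, labels), samples


data = [
    'http_request_total{method="GET", endpoint="/contact-us", status="200"} 1 2 3 4 5',
    'http_request_total{method="POST", endpoint="/auth", status="400"} 1 6 8 10 15',
]
series = dict(parse_series(line) for line in data)
print(len(series))  # 2

# Same name and labels written in a different order -> same series.
same_id, _ = parse_series('http_request_total{status="200", method="GET", endpoint="/contact-us"} 7')
print(same_id in series)  # True
```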
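Data Types
- Instant vector: a set of series, each with a single sample at the same timestamp (e.g. `http_request_total{method="GET"}`)
- Range vector: a set of series, each with all of its samples from a time window (e.g. `http_request_total{method="GET"}[5m]`)
- Scalar: a plain floating-point number (e.g. `2`)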
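As a rough sketch of the difference, assuming each series is a time-sorted list of `(timestamp, value)` pairs and the default 5-minute lookback, the hypothetical functions below pick one value per series (instant vector) or every sample in a window (range vector). The windows are treated as left-open, which matches recent Prometheus releases; older versions also included a sample sitting exactly on the left boundary.

```python
LOOKBACK = 300  # default 5-minute lookback delta, in seconds (assumed unchanged)


def instant_select(samples, t, lookback=LOOKBACK):
    """One value per series: the newest sample in (t - lookback, t], or None."""
    hits = [v for ts, v in samples if t - lookback < ts <= t]
    return hits[-1] if hits else None


def range_select(samples, t, window):
    """All (timestamp, value) samples in the window (t - window, t]."""
    return [(ts, v) for ts, v in samples if t - window < ts <= t]


# The GET series from the example with hypothetical timestamps, one sample per 60 s.
get_series = [(0, 1.0), (60, 2.0), (120, 3.0), (180, 4.0), (240, 5.0)]

print(instant_select(get_series, t=250))            # 5.0
print(range_select(get_series, t=250, window=120))  # [(180, 4.0), (240, 5.0)]
```

If no sample falls inside the lookback window, the series simply has no value at that time, which is why `instant_select` returns `None` there.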
Metric Types
Prometheus supports four metric types:
- Gauge: the value can go up and down (e.g. `logged_users`)
- Counter: the value only increases, resetting to zero when the process restarts (e.g. `http_request_total`)
- Histogram: exposes `<metric_name>_bucket`, `<metric_name>_sum`, and `<metric_name>_count`. Use `histogram_quantile()` to calculate quantiles on the server side (e.g. `http_request_duration_seconds`)
- Summary: similar to a histogram, but quantiles are calculated on the client side (in the application). As a result, the precomputed quantiles cannot be meaningfully aggregated across instances.
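The server-side quantile estimate for classic histograms works by finding the bucket that contains the target rank and interpolating linearly inside it. The sketch below is a simplified re-implementation of that idea for illustration only, assuming cumulative `(upper_bound, count)` pairs that end with a `+Inf` bucket and positive bucket bounds. In PromQL it is usually applied to per-second bucket rates, e.g. `histogram_quantile(0.9, rate(http_request_duration_seconds_bucket[5m]))`.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` mirrors <metric_name>_bucket{le="..."}: sorted by upper bound,
    cumulative counts, last bucket is +Inf. Assumes positive bucket bounds.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2:
        return math.nan
    total = buckets[-1][1]  # the +Inf bucket counts every observation
    if total == 0:
        return math.nan
    rank = q * total
    # Index of the first finite bucket whose cumulative count reaches the rank.
    b = next((i for i, (_, c) in enumerate(buckets[:-1]) if c >= rank), len(buckets) - 1)
    if b == len(buckets) - 1:
        return buckets[-2][0]  # rank lies in the +Inf bucket: use the highest finite bound
    upper, count = buckets[b]
    lower, below = (0.0, 0.0) if b == 0 else buckets[b - 1]
    # Linear interpolation, assuming observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Hypothetical http_request_duration_seconds buckets: 100 requests in total.
buckets = [(0.1, 50), (0.5, 80), (1.0, 95), (math.inf, 100)]
print(histogram_quantile(0.5, buckets))
print(histogram_quantile(0.9, buckets))
```

For q = 0.9 the rank is 90, which falls in the `(0.5, 1.0]` bucket (cumulative counts 80 to 95), so the estimate is 0.5 + 0.5 × 10/15 ≈ 0.833. The accuracy depends entirely on how the bucket boundaries are chosen.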
PromQL
Operator Precedence
Prometheus supports a range of binary operators with different precedence levels. From highest to lowest precedence:
- ^
- *, /, %, atan2
- +, -
- ==, !=, <=, <, >=, >
- and, unless
- or
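Modifiers
- `@ 1609746000`: evaluate the selector as if the query time were 1609746000 (a Unix timestamp)
- `offset 5m`: evaluate the selector as if the query time were 5 minutes earlier
A modifier must directly follow the selector it applies to, before any function wraps it (e.g. `rate(http_request_total[5m] offset 5m)`, not `rate(http_request_total[5m]) offset 5m`).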
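A small sketch of how the two modifiers change the evaluation time of a selector, assuming the default 5-minute lookback and hypothetical sample data. `@` replaces the query time, and when both modifiers are present the offset is applied relative to the `@` time. The helper names are illustrative only.

```python
LOOKBACK = 300  # assumed default 5-minute lookback, in seconds


def effective_time(query_time, at=None, offset=0):
    """Evaluation time of a selector: @ replaces the query time, offset moves it back.

    When both are given, the offset is applied relative to the @ time.
    """
    base = query_time if at is None else at
    return base - offset


def value_at(samples, t, lookback=LOOKBACK):
    """Newest sample value in (t - lookback, t], like an instant selector."""
    hits = [v for ts, v in samples if t - lookback < ts <= t]
    return hits[-1] if hits else None


# Hypothetical (timestamp, value) samples and query time.
samples = [(1609745700, 3.0), (1609746000, 4.0), (1609749900, 5.0)]
now = 1609750000

print(value_at(samples, effective_time(now)))                                # 5.0
print(value_at(samples, effective_time(now, at=1609746000)))                 # 4.0
print(value_at(samples, effective_time(now, at=1609746000, offset=5 * 60)))  # 3.0
```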
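Vector Matching
Vector-to-scalar:
- The operation is applied to every sample in the vector (e.g. `http_request_total / 2`)
Vector-to-vector:
- Types of matching:
  - One-to-one
  - One-to-many
  - Many-to-one
- By default, series are matched on all of their labels (excluding the metric name)
- Customize the matching labels with `ignoring(<labels>)` or `on(<labels>)`
- Use `group_left()` or `group_right()` to mark which side is the "many" side (`group_left` means the left side has the higher cardinality)
- Use `group_left(<labels>)` to copy the listed labels from the "one" side into the result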
Example input:
method_code:http_errors:rate5m{method="get", code="500"} 24
method_code:http_errors:rate5m{method="get", code="404"} 30
method_code:http_errors:rate5m{method="put", code="501"} 3
method_code:http_errors:rate5m{method="post", code="500"} 6
method_code:http_errors:rate5m{method="post", code="404"} 21
method:http_requests:rate5m{method="get", foo="bar"} 600
method:http_requests:rate5m{method="del", foo="bar1"} 34
method:http_requests:rate5m{method="post", foo="bar2"} 120
Query:
method_code:http_errors:rate5m / on(method) group_left(foo) method:http_requests:rate5m
Result:
{method="get", code="500", foo="bar"} 0.04
{method="get", code="404", foo="bar"} 0.05
{method="post", code="500", foo="bar2"} 0.05
{method="post", code="404", foo="bar2"} 0.175
Each error rate is divided by the request rate with the same `method` (e.g. 24 / 600 = 0.04); the `put` series has no match and is dropped. With a plain `group_left` (no label list), the `foo` label would not appear in the result.
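The many-to-one division can be simulated in a few lines. This is an illustrative sketch, not Prometheus's implementation: it matches on `method` only (equivalent to `on(method)`), because the right-hand series carry an extra `foo` label that the left-hand series lack, and it copies `foo` from the "one" side as `group_left(foo)` does. The request rate for `post` is taken as 120, the value consistent with the listed results (6 / 120 = 0.05, 21 / 120 = 0.175).

```python
def divide_group_left(left, right, on, include=()):
    """Sketch of `left / on(<on>) group_left(<include>) right` (many-to-one matching).

    Each side is a list of (labels, value); metric names are omitted because
    arithmetic operators drop them from the result anyway.
    """
    def key(labels):
        return tuple(labels.get(name, "") for name in on)

    one_side = {}
    for labels, value in right:
        k = key(labels)
        if k in one_side:
            raise ValueError(f"duplicate match key on the 'one' side: {k}")
        one_side[k] = (labels, value)

    result = []
    for labels, value in left:  # the "many" side
        match = one_side.get(key(labels))
        if match is None:
            continue  # no partner on the right: the series is dropped
        right_labels, right_value = match
        out = dict(labels)
        for name in include:  # copy the group_left(...) labels from the "one" side
            if name in right_labels:
                out[name] = right_labels[name]
        result.append((out, value / right_value))
    return result


errors = [
    ({"method": "get", "code": "500"}, 24),
    ({"method": "get", "code": "404"}, 30),
    ({"method": "put", "code": "501"}, 3),
    ({"method": "post", "code": "500"}, 6),
    ({"method": "post", "code": "404"}, 21),
]
requests = [
    ({"method": "get", "foo": "bar"}, 600),
    ({"method": "del", "foo": "bar1"}, 34),
    ({"method": "post", "foo": "bar2"}, 120),
]

for labels, value in divide_group_left(errors, requests, on=["method"], include=["foo"]):
    print(labels, round(value, 3))
```

Passing `include=()` reproduces the plain `group_left` case: the same four ratios, but without `foo` in the output labels.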
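Common Prometheus Functions
- `changes()`: number of times the value changed within the time range
- `time()`: current evaluation timestamp, in seconds since the Unix epoch
- `timestamp()`: timestamp of each sample in the vector
- Derivative and rate:
  - `deriv()`: for gauges
  - `rate()`, `irate()`: for counters
- Delta and increase:
  - `delta()`, `idelta()`: for gauges
  - `increase()`: for counters
- `irate()` vs `rate()`:
  - `irate()`: per-second rate computed from the last two data points in the range
  - `rate()`: per-second average rate over the whole range, roughly (extrapolated end value - extrapolated start value) / range duration, so it is smoother than `irate()`
- Aggregation:
  - `<aggregation>` (`sum`, `count`, `max`, `min`, `avg`, etc.): aggregates across series (dimensions), optionally grouped by labels with `by` or `without`
  - `<aggregation>_over_time()`: aggregates each series across time within a range vector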
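The `rate()`/`irate()` contrast can be illustrated with a simplified sketch. Assumptions: samples are time-sorted `(timestamp, value)` pairs, any drop in value is a counter reset, and, unlike the real `rate()`, no extrapolation to the window boundaries is done, so Prometheus would return a somewhat different number for the same data. The function names are hypothetical.

```python
def counter_increase(samples):
    """Total increase of a counter, treating any drop in value as a reset to zero."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur  # after a reset, count from zero
    return total


def simple_rate(samples):
    """Average per-second increase between the first and last sample (no extrapolation)."""
    if len(samples) < 2:
        return None
    return counter_increase(samples) / (samples[-1][0] - samples[0][0])


def simple_irate(samples):
    """Per-second increase between the last two samples only."""
    if len(samples) < 2:
        return None
    (t1, v1), (t2, v2) = samples[-2:]
    return (v2 - v1 if v2 >= v1 else v2) / (t2 - t1)


# The POST series from the example, with hypothetical 60 s spacing.
post = [(0, 1.0), (60, 6.0), (120, 8.0), (180, 10.0), (240, 15.0)]
print(simple_rate(post))   # 14 / 240, about 0.0583
print(simple_irate(post))  # 5 / 60, about 0.0833
```

Because `irate()` only looks at the last step (10 to 15), it reacts immediately to the latest jump, while the averaged rate smooths it over the whole window.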
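Examples (using the sample data above, assuming the latest samples are 5 and 15 and all five samples of each series fall within the last 5 minutes):
sum(http_request_total)
Result:
{} 20 # 5+15
sum_over_time(http_request_total[5m])
Result:
{method="GET", endpoint="/contact-us", status="200"} 15 # 1+2+3+4+5
{method="POST", endpoint="/auth", status="400"} 40 # 1+6+8+10+15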
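A quick check of the arithmetic behind the two aggregation styles, assuming all five samples of each series lie inside the 5-minute window: `sum()` collapses the latest sample of every series into one value, while `sum_over_time()` keeps the series apart and adds up each one's samples.

```python
# The example series, keyed by (method, endpoint, status); all samples assumed within 5m.
series = {
    ("GET", "/contact-us", "200"): [1, 2, 3, 4, 5],
    ("POST", "/auth", "400"): [1, 6, 8, 10, 15],
}

# sum(): aggregate across series, using each series' latest sample (instant vector).
across_series = sum(samples[-1] for samples in series.values())
print(across_series)  # 20

# sum_over_time(...[5m]): aggregate each series across time, one result per series.
over_time = {key: sum(samples) for key, samples in series.items()}
print(over_time)
```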
Prometheus Client Library Usage
- Instrumentation
- Writing exporters
- Pushing metrics to Pushgateway
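Instrumentation, exporters, and the Pushgateway all end up handling metrics in the Prometheus text exposition format. In practice an official client library generates it for you; the stdlib-only sketch below just renders one metric family by hand to show what an exporter serves on its metrics endpoint. `render_metric` is a hypothetical helper, and it assumes finite numeric values and a `HELP` text without newlines or backslashes.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name, metric_type, help_text, samples):
    """Render one metric family in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value); finite numeric values only.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}" if label_str else f"{name} {value}")
    return "\n".join(lines) + "\n"


print(render_metric(
    "http_request_total", "counter", "Total HTTP requests.",
    [({"method": "GET", "status": "200"}, 5), ({"method": "POST", "status": "400"}, 15)],
), end="")
```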
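Storage
- Avoid NFS for Prometheus local storage: the storage documentation lists NFS (including AWS EFS) as unsupported, because non-POSIX-compliant filesystems can cause unrecoverable corruption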
Agent Mode
- Disables querying, alerting, and recording rules
- Scrapes metrics from targets and forwards them via remote write to other instances
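Service Discovery
- Static: list target servers directly in the config file (`static_configs`)
- `<name>_sd_configs`: use built-in service discovery (e.g. `ec2_sd_configs`, `kubernetes_sd_configs`, `file_sd_configs`)
- Custom: use `file_sd_configs` and have your own tooling update the target file periodically
Each scrape config can set, among others:
- `scrape_interval`
- `scrape_timeout`
- `proxy_url`
- `metrics_path`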
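For the custom approach, a scrape config points `file_sd_configs` at a file pattern (e.g. `files: ['targets/*.json']`), and Prometheus re-reads matching files when they change. The file holds a JSON list of groups, each with `targets` and optional `labels`. The sketch below shows one way a hypothetical tool could update such a file safely: write a temporary file and atomically rename it, so Prometheus never sees a half-written file. The target addresses and output location are made up.

```python
import json
import os
import tempfile


def write_file_sd(path, groups):
    """Atomically replace a file_sd target file.

    `groups` is a list of {"targets": [...], "labels": {...}} dicts.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")  # .tmp won't match *.json
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(groups, f, indent=2)
        os.replace(tmp_path, path)  # atomic rename within the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


# Hypothetical targets, e.g. fetched from an inventory system by your own tooling.
groups = [
    {"targets": ["10.0.0.1:9100", "10.0.0.2:9100"], "labels": {"env": "prod"}},
    {"targets": ["10.0.1.1:9100"], "labels": {"env": "staging"}},
]
path = os.path.join(tempfile.mkdtemp(), "targets.json")  # stand-in for the watched directory
write_file_sd(path, groups)
```

Running the tool on a timer (cron, a systemd timer, or a loop) gives the "update the file periodically" behaviour described above.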
Relabeling
- `relabel_configs`: modify targets and their labels before scraping (e.g. rewriting targets for the Blackbox exporter)
- `metric_relabel_configs`: modify the scraped samples after scraping, before they are stored (e.g. drop unwanted metrics)
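The core of a relabel rule is: join the `source_labels` values with `separator` (default `;`), match the result against a fully anchored `regex` (default `(.*)`), then act on it. The sketch below models only the `replace`, `keep`, and `drop` actions, using Python's `re` in place of Go's RE2, so it is an approximation rather than Prometheus's exact behaviour. The Blackbox-style rules mirror the usual pattern of moving `__address__` into the `target` URL parameter; the exporter address `blackbox-exporter:9115` is hypothetical.

```python
import re


def _expand(replacement, match):
    """Substitute $1 / ${1} / ${name} references with regex groups."""
    def ref(m):
        name = m.group(1) or m.group(2)
        group = match.group(int(name) if name.isdigit() else name)
        return group or ""
    return re.sub(r"\$(\d+)|\$\{(\w+)\}", ref, replacement)


def relabel(labels, rules):
    """Apply replace/keep/drop rules in order; return new labels, or None if dropped."""
    labels = dict(labels)
    for rule in rules:
        action = rule.get("action", "replace")
        sep = rule.get("separator", ";")
        value = sep.join(labels.get(name, "") for name in rule.get("source_labels", []))
        match = re.fullmatch(rule.get("regex", "(.*)"), value)  # regex is anchored
        if action == "keep" and match is None:
            return None
        if action == "drop" and match is not None:
            return None
        if action == "replace" and match is not None:
            labels[rule["target_label"]] = _expand(rule.get("replacement", "$1"), match)
    return labels


# relabel_configs-style rules for probing through a Blackbox exporter.
blackbox_rules = [
    {"source_labels": ["__address__"], "target_label": "__param_target"},
    {"source_labels": ["__param_target"], "target_label": "instance"},
    {"target_label": "__address__", "replacement": "blackbox-exporter:9115"},
]
print(relabel({"__address__": "https://example.com"}, blackbox_rules))

# metric_relabel_configs-style rule dropping Go runtime metrics.
drop_go = [{"source_labels": ["__name__"], "regex": "go_.*", "action": "drop"}]
print(relabel({"__name__": "go_goroutines"}, drop_go))  # None
```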
Alerting in Prometheus
- Prometheus evaluates alerting rules and fires alerts; Alertmanager routes them to destinations
- Prometheus does not send notifications itself
- Alerts are routed by matching their labels against routing rules
- Labels: identify the alert
- Annotations: longer-form, descriptive information
- Annotations support templating with Go template syntax
- Labels can be referenced in annotations with `{{ $labels.foo }}`
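Real annotation templates are Go `text/template` and support much more than this. As a toy approximation, the sketch below expands only `{{ $labels.<name> }}` and `{{ $value }}` placeholders to show how an alert's labels and value end up in its annotation text. `render_annotation` is hypothetical, and rendering a missing label as an empty string is a simplification of this sketch.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(?:\$labels\.(\w+)|\$value)\s*\}\}")


def render_annotation(template, labels, value):
    """Expand {{ $labels.<name> }} and {{ $value }} (a tiny subset of Go templates)."""
    def sub(m):
        if m.group(1) is not None:
            return labels.get(m.group(1), "")  # missing label -> empty string here
        return str(value)
    return PLACEHOLDER.sub(sub, template)


summary = render_annotation(
    "High error rate on {{ $labels.instance }}: {{ $value }}",
    {"instance": "web-1", "job": "api"},  # hypothetical alert labels
    0.175,
)
print(summary)  # High error rate on web-1: 0.175
```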
Alertmanager
Use cases for silencing alerts:
- Provisioning new servers
- Decommissioning servers
- Maintenance
Inhibition:
- Suppresses notifications for a group of alerts while another alert is firing
- Example: a "cluster down" alert inhibits the memory and disk check alerts
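Alertmanager's `inhibit_rules` combine `source_matchers`, `target_matchers`, and `equal`: an alert matching the target matchers is muted while a firing alert matching the source matchers shares the same values for every label in `equal`. The sketch below models that check with equality-only matchers held in plain dicts (a simplification of the real matcher syntax). The alert names and the `cluster` label are hypothetical.

```python
def matches(labels, matchers):
    """Equality-only matchers (Alertmanager also supports !=, =~ and !~)."""
    return all(labels.get(k) == v for k, v in matchers.items())


def is_inhibited(alert, firing, rule):
    """True if some firing source alert inhibits `alert` under `rule`."""
    if not matches(alert, rule["target_matchers"]):
        return False
    return any(
        src is not alert
        and matches(src, rule["source_matchers"])
        and all(src.get(name) == alert.get(name) for name in rule.get("equal", []))
        for src in firing
    )


# Hypothetical rule: a firing ClusterDown alert mutes warnings from the same cluster.
rule = {
    "source_matchers": {"alertname": "ClusterDown"},
    "target_matchers": {"severity": "warning"},
    "equal": ["cluster"],
}
firing = [
    {"alertname": "ClusterDown", "severity": "critical", "cluster": "a"},
    {"alertname": "DiskFull", "severity": "warning", "cluster": "a"},
    {"alertname": "HighMemory", "severity": "warning", "cluster": "b"},
]
for alert in firing:
    print(alert["alertname"], is_inhibited(alert, firing, rule))
```

`HighMemory` stays active because its `cluster` value differs from the firing `ClusterDown` alert, which is exactly what the `equal` list is for.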