A Detailed Practical Guide to Using Prometheus for Monitoring and Alerting
Server Monitoring – Who Needs Engineers
https://whoneedsengineers.com/a-detailed-practical-guide-to-using-prometheus-for-monitoring-and-alerting/
Sun, 04 Aug 2024 11:56:59 +0000

Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability. Developed by SoundCloud and now a part of the Cloud Native Computing Foundation, Prometheus has become a leading choice for system and application monitoring. This guide will walk you through installing, configuring, and using Prometheus effectively.

What is Prometheus?

Prometheus is a systems monitoring and alerting toolkit that:

  • Collects and stores metrics as time-series data.
  • Provides a query language, PromQL, for selecting and aggregating those metrics.
  • Supports multiple modes of graphing and dashboarding.
  • Integrates with numerous third-party tools and services.

Getting Started with Prometheus

1. Installation and Setup

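Step 1: Download Prometheus

  • Download the release archive for your operating system from the Prometheus download page (https://prometheus.io/download/).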

Step 2: Install Prometheus

  • Extract the downloaded archive and navigate to the directory.
  • You should see binaries like prometheus and promtool.

Step 3: Configure Prometheus

  • Create a configuration file named prometheus.yml. Here’s an example configuration:
global:
  scrape_interval: 15s  # Set the scrape interval to 15 seconds.
  evaluation_interval: 15s  # Evaluate rules every 15 seconds.

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']  # The Prometheus server itself.

Step 4: Start Prometheus

  • Run the Prometheus server:
./prometheus --config.file=prometheus.yml
  • Access the Prometheus web UI at http://localhost:9090.
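
To confirm from a script that the server came up, you can call its health endpoint instead of opening the browser. The following is a minimal sketch that assumes Prometheus is listening on localhost:9090 as configured above; the helper names health_url and is_healthy are made up for this example.

```python
import urllib.request


def health_url(base_url: str) -> str:
    """Build the URL of the Prometheus health endpoint."""
    return base_url.rstrip("/") + "/-/healthy"


def is_healthy(base_url: str = "http://localhost:9090", timeout: float = 2.0) -> bool:
    """Return True if the server answers the health endpoint with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error responses.
        return False
```

Calling is_healthy() right after Step 4 should return True once the server has finished starting; a False result usually means the process is not running or is bound to a different port.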

2. Collecting Metrics

Prometheus pulls (scrapes) metrics from HTTP endpoints at the configured scrape interval. Applications need to expose their metrics in the Prometheus text exposition format, by convention at the /metrics path.
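
To make the format concrete, here is a rough sketch that renders one counter the way a scrape target would return it. The function render_counter is hypothetical and written only for illustration; it does not escape special characters in label values, and a real application should rely on a client library such as the one used in the next step.

```python
def render_counter(name, help_text, samples):
    """Render one counter family in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        if labels:
            # Labels are written as key="value" pairs inside curly braces.
            pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


print(render_counter(
    "http_requests_total",
    "Total HTTP requests.",
    [({"method": "GET", "status": "200"}, 1027)],
))
```

Each metric family carries a HELP line, a TYPE line, and one line per labelled series.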

Step 1: Exporting Metrics

Example (Python)

  • Install the client library:
pip install prometheus-client
  • Instrument your application:
from prometheus_client import start_http_server, Summary
import random
import time

# Create a metric to track time spent and requests made.
REQUEST_TIME = Summary('request_processing_seconds', 'Time spent processing request')

# Decorate function with metric.
@REQUEST_TIME.time()
def process_request(t):
    time.sleep(t)

if __name__ == '__main__':
    start_http_server(8000)
    while True:
        process_request(random.random())
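
The queries and the alert rule later in this guide refer to a counter named http_requests_total, which the example above does not export. A minimal sketch of such a counter follows; the label names method and status and the helper record_request are assumptions chosen to match those queries, not something the original example defines. It uses a private registry so it can run on its own; in the application above you would omit the registry argument so the counter is served by start_http_server.

```python
from prometheus_client import CollectorRegistry, Counter, generate_latest

# A private registry keeps this sketch isolated from the default global one.
registry = CollectorRegistry()

# Counter with two labels; the metric name matches the queries used later.
HTTP_REQUESTS = Counter(
    'http_requests_total',
    'Total HTTP requests handled',
    ['method', 'status'],
    registry=registry,
)


def record_request(method, status):
    """Count one handled request under its method and status labels."""
    HTTP_REQUESTS.labels(method=method, status=str(status)).inc()


record_request('GET', 200)
record_request('GET', 500)

# Print the metrics as a scrape of this registry would return them.
print(generate_latest(registry).decode())
```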

Step 2: Configure Prometheus to Scrape Your Application

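  • Add a job for your application under the existing scrape_configs section of prometheus.yml, then restart or reload Prometheus:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'python_app'
    static_configs:
      - targets: ['localhost:8000']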
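
Before relying on the new job, it can help to look at what the target actually serves. The sketch below fetches the metrics page of the Python application and extracts the label-free samples; it assumes the application from Step 1 is running on localhost:8000, and parse_simple_samples and fetch_metrics are illustrative helpers rather than a complete parser for the format.

```python
import urllib.request


def parse_simple_samples(text):
    """Parse label-free samples from Prometheus text format into a dict.

    Comment lines (# HELP, # TYPE) and samples with labels are skipped,
    which keeps this sketch short; it is not a full parser.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "{" in line:
            continue
        name, value = line.split()[:2]
        samples[name] = float(value)
    return samples


def fetch_metrics(url="http://localhost:8000/metrics", timeout=2.0):
    """Fetch and parse the page that Prometheus would scrape."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_simple_samples(resp.read().decode("utf-8"))
```

With the example application running, fetch_metrics() should include request_processing_seconds_count and request_processing_seconds_sum among its keys.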

3. Querying Metrics with PromQL

PromQL is a powerful query language used to aggregate and retrieve time-series data.

Basic Queries

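  • Instant vector (the most recent sample of each matching series): up
  • Range vector (all samples from the last 5 minutes): up[5m]
  • Aggregation (total per-second request rate across all series): sum(rate(http_requests_total[1m]))
  • Label filtering (only the series scraped by one job): http_requests_total{job="python_app"}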
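
The same expressions can also be sent to the Prometheus HTTP API, which is handy for scripts. The following sketch assumes the server from the setup section at localhost:9090; build_query_url and instant_query are names invented for this example.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(promql, base_url="http://localhost:9090"):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def instant_query(promql, base_url="http://localhost:9090", timeout=5.0):
    """Run an instant query and return the list found under data.result."""
    with urllib.request.urlopen(build_query_url(promql, base_url), timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error')}")
    return body["data"]["result"]
```

For example, instant_query("up") returns one entry per scraped target, each with its labels under "metric" and a timestamp and value pair under "value".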

Step 1: Access Prometheus UI

  • Navigate to the Graph tab in the Prometheus web UI.

Step 2: Run a Query

  • Enter a query in the query box and click “Execute”. For example:
rate(http_requests_total[5m])
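  • This query calculates the per-second average rate of increase of the http_requests_total counter over the last 5 minutes, with one result per series. It assumes your application exports a counter with that name; the Python example above exports only request_processing_seconds.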
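
As a rough illustration of the arithmetic behind a per-second rate, the sketch below divides the increase of a counter by the elapsed time and treats any drop in value as a counter reset. It is a simplified model written for this guide, not the actual implementation: Prometheus also extrapolates to the edges of the window, which simple_rate ignores.

```python
def simple_rate(samples):
    """Approximate a per-second rate from (timestamp_seconds, value) pairs.

    A drop in value is treated as a counter reset, so the value seen after
    the reset counts as new increase.
    """
    if len(samples) < 2:
        return None  # a rate needs at least two samples in the window
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# 600 requests counted over 300 seconds is an average of 2 per second.
print(simple_rate([(0, 0), (300, 600)]))
```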

4. Setting Up Alerts

Prometheus allows you to define alerting rules and integrates with Alertmanager for handling alerts.

Step 1: Define Alerting Rules

  • Create a file named alert.rules.yml:
groups:
  - name: example
    rules:
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status="500"}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "High error rate detected"
          description: "Error rate is greater than 5% for the last 10 minutes."
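
The for clause means the alert does not fire the moment the expression becomes true: it first enters a pending state and fires only if the condition keeps holding for the whole duration. The sketch below models that behaviour in simplified form; alert_states and its parameters are illustrative, and the 15-second default interval is taken from the evaluation_interval configured earlier.

```python
def alert_states(values, threshold=0.05, for_seconds=600, interval=15):
    """Return the alert state at each evaluation: inactive, pending, or firing.

    values holds the result of the rule expression at each evaluation,
    spaced `interval` seconds apart.
    """
    states = []
    active_since = None  # evaluation time at which the condition became true
    for i, value in enumerate(values):
        now = i * interval
        if value > threshold:
            if active_since is None:
                active_since = now
            held = now - active_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            active_since = None  # any dip below the threshold resets the timer
            states.append("inactive")
    return states


# With a 2-minute hold and 1-minute evaluations, the third breach in a row fires.
print(alert_states([0.01, 0.1, 0.1, 0.1, 0.02, 0.1], for_seconds=120, interval=60))
```

A short spike that clears before the hold time elapses never reaches the firing state, which is why a longer for value reduces noisy pages at the cost of slower notification.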

Step 2: Configure Prometheus to Use the Alerting Rules

  • Update your prometheus.yml:
rule_files:
  - "alert.rules.yml"

Step 3: Install and Configure Alertmanager

  • Download Alertmanager from the Prometheus download page.
  • Create a configuration file for Alertmanager, alertmanager.yml:
global:
  resolve_timeout: 5m

route:
  receiver: 'email'

receivers:
  - name: 'email'
    email_configs:
      - to: 'you@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
        auth_username: 'alertmanager@example.com'
        auth_identity: 'alertmanager@example.com'
        auth_password: 'password'  # Placeholder; replace with your SMTP password.

Step 4: Start Alertmanager

  • Run Alertmanager:
./alertmanager --config.file=alertmanager.yml
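
To check that Alertmanager is running and that the email receiver works, you can push a hand-made alert to its HTTP API before wiring up Prometheus. This is a sketch under the assumption that Alertmanager listens on its default port 9093; the alert name TestAlert and the helpers build_test_alert and post_alert are invented for the example.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_alert(name="TestAlert", severity="page"):
    """Build the JSON body for one alert, as a list with a single entry."""
    return [{
        "labels": {"alertname": name, "severity": severity},
        "annotations": {"summary": "Manual test alert"},
        "startsAt": datetime.now(timezone.utc).isoformat(),
    }]


def post_alert(alerts, base_url="http://localhost:9093", timeout=5.0):
    """POST alerts to the Alertmanager v2 API and return the HTTP status code."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Calling post_alert(build_test_alert()) should return 200 and, if the SMTP settings are correct, lead to an email at the configured address.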

Step 5: Configure Prometheus to Send Alerts to Alertmanager

  • Update your prometheus.yml:
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
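
Once Prometheus has been restarted with the rules and the alerting section in place, its HTTP API reports which alerts are currently pending or firing. The sketch below reads that endpoint and keeps only the firing ones; it assumes Prometheus on localhost:9090, and firing_alert_names and fetch_alerts are hypothetical helper names.

```python
import json
import urllib.request


def firing_alert_names(response):
    """Extract the alertname label of every firing alert from an API response."""
    return [
        alert["labels"].get("alertname", "")
        for alert in response.get("data", {}).get("alerts", [])
        if alert.get("state") == "firing"
    ]


def fetch_alerts(base_url="http://localhost:9090", timeout=5.0):
    """Fetch the currently pending and firing alerts from Prometheus."""
    url = base_url.rstrip("/") + "/api/v1/alerts"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, firing_alert_names(fetch_alerts()) would contain HighErrorRate once the rule defined earlier has been true for its full hold time.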

5. Visualizing Metrics

The Prometheus web UI offers only basic graphing of query results. For dashboards, it is commonly paired with Grafana, which supports Prometheus as a data source.

Step 1: Install Grafana

  • Download Grafana for your platform from the Grafana download page.

Step 2: Start Grafana

  • Follow the installation instructions and start the Grafana server.

Step 3: Add Prometheus as a Data Source

  • Log in to Grafana (default http://localhost:3000, admin/admin).
  • Go to “Configuration” > “Data Sources”.
  • Click “Add data source” and select “Prometheus”.
  • Configure the URL (e.g., http://localhost:9090) and save.
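
The same data source can be created through the Grafana HTTP API, which is useful when the setup needs to be repeatable. The sketch below is an illustration that assumes the default URL and the default admin/admin credentials mentioned above; build_datasource and create_datasource are made-up helper names, and the payload shows only a minimal set of fields.

```python
import base64
import json
import urllib.request


def build_datasource(name="Prometheus", url="http://localhost:9090"):
    """Build a minimal payload describing a Prometheus data source."""
    return {
        "name": name,
        "type": "prometheus",
        "url": url,
        "access": "proxy",  # Grafana's backend queries Prometheus on behalf of the browser
        "isDefault": True,
    }


def create_datasource(payload, grafana_url="http://localhost:3000",
                      user="admin", password="admin", timeout=5.0):
    """POST the payload to Grafana using HTTP basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```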

Step 4: Create a Dashboard

  • Go to “Dashboards” > “New Dashboard”.
  • Click “Add new panel” and use PromQL to query Prometheus metrics.
  • Customize the panel with different visualization options and save the dashboard.
