Bob Gains Observability in Kubernetes

How Bob implements comprehensive observability in his Kubernetes cluster using logging, metrics, and tracing to monitor, troubleshoot, and optimize his applications.

Let’s move on to Chapter 27, “Bob Gains Observability in Kubernetes!” In this chapter, Bob learns how to implement comprehensive observability in his Kubernetes cluster, using logging, metrics, and tracing to monitor, troubleshoot, and optimize his applications.

1. Introduction: Observability and Its Importance

Bob has built a robust Kubernetes environment, but keeping everything running smoothly requires complete visibility. Observability gives Bob insights into application performance, resource usage, and potential issues before they become problems.

“Observability isn’t just nice to have—it’s essential for running a healthy cluster!” Bob says, eager to dive in.

2. Setting Up Centralized Logging

Bob starts with centralized logging to collect logs from all containers in the cluster.

  • Deploying the EFK Stack:

    • Bob chooses the EFK Stack (Elasticsearch, Fluentd, Kibana) for log aggregation.

    • Installing Elasticsearch:

      kubectl apply -f
    • Installing Fluentd:

      • Fluentd collects logs from containers and forwards them to Elasticsearch.
      kubectl apply -f
    • Installing Kibana:

      • Kibana visualizes the logs stored in Elasticsearch.
      kubectl apply -f
  • Testing the Logging Stack:

    • Bob generates test logs by accessing one of his services.

    • He opens Kibana in his browser and verifies the logs are collected:
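
Log collectors such as Fluentd typically pick up whatever containers write to stdout and stderr, so logs are easier to filter in Kibana when each line is a single JSON object. A minimal sketch, assuming a Python service; the field names (`time`, `level`, `logger`, `message`) and the logger name `backend` are illustrative choices, not something the EFK stack requires:

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(name="backend", stream=sys.stdout):
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.propagate = False  # keep the root logger from printing a second copy
    return logger


log = build_logger()
log.info("order %s processed", 42)
```

Writing to stdout rather than to a file inside the container keeps the application unaware of the logging stack, which is what lets the collector be swapped later without code changes.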


“Now I can see logs from every pod in one place—no more chasing individual logs!” Bob says, excited by the visibility.

3. Monitoring Metrics with Prometheus and Grafana

Next, Bob sets up Prometheus and Grafana to monitor metrics in his cluster.

  • Deploying Prometheus:

    kubectl apply -f
  • Setting Up Grafana:

    kubectl apply -f
  • Connecting Prometheus to Grafana:

    • Bob adds Prometheus as a data source in Grafana and creates a dashboard for CPU, memory, and network metrics.
  • Creating Alerts in Prometheus:

    • Bob configures alerts for high CPU usage:

      groups:
        - name: cpu-alerts
          rules:
            - alert: HighCPUUsage
              expr: sum(rate(container_cpu_usage_seconds_total[1m])) > 0.8
              for: 2m
              labels:
                severity: warning
              annotations:
                summary: "High CPU usage detected"
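
The `for: 2m` clause means the expression has to stay true for two minutes before the alert fires; until then Prometheus reports the alert as pending. A simplified model of that behaviour, assuming one expression result per evaluation (the function name `alert_state` and the 30-second evaluation interval are illustrative and not part of Prometheus):

```python
def alert_state(values, threshold=0.8, for_seconds=120, interval=30):
    """Return the alert state after evaluating `values` in order.

    `values` holds one expression result per evaluation interval.
    The state is 'inactive', 'pending', or 'firing'.
    """
    active_since = None  # evaluation time at which the condition first held
    state = "inactive"
    for i, value in enumerate(values):
        now = i * interval
        if value > threshold:
            if active_since is None:
                active_since = now
            held_for = now - active_since
            state = "firing" if held_for >= for_seconds else "pending"
        else:
            # Any evaluation at or below the threshold resets the timer.
            active_since = None
            state = "inactive"
    return state


print(alert_state([0.9, 0.9]))                 # pending
print(alert_state([0.9, 0.9, 0.9, 0.9, 0.9]))  # firing
print(alert_state([0.9, 0.9, 0.5, 0.9]))       # pending
```

The third call shows why short spikes do not page anyone: a single dip below the threshold restarts the two-minute wait.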

“With Prometheus and Grafana, I can track performance and get alerted to problems instantly!” Bob says, loving the insight.

4. Implementing Distributed Tracing with Jaeger

Bob learns that Jaeger helps trace requests as they flow through his microservices, making it easier to debug complex issues.

  • Deploying Jaeger:

    kubectl create namespace observability
    kubectl apply -f
  • Instrumenting Applications:

    • Bob modifies his Python Flask backend to include Jaeger tracing:

      from flask import Flask
      from jaeger_client import Config

      app = Flask(__name__)

      def init_tracer(service_name):
          # Sample every request (const sampler, param 1) and report spans to jaeger-agent.
          config = Config(
              config={
                  'sampler': {'type': 'const', 'param': 1},
                  'local_agent': {'reporting_host': 'jaeger-agent'},
              },
              service_name=service_name,
          )
          return config.initialize_tracer()

      tracer = init_tracer('backend')
  • Viewing Traces:

    • Bob accesses the Jaeger UI and traces a request through the backend:
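
A trace is a tree of timed spans that share one trace ID, with each span pointing at its parent. The sketch below models that idea with the standard library only; it is not the `jaeger_client` API, and the `Span` class, the `start_span` helper, and the operation names are made up for illustration:

```python
import time
import uuid
from contextlib import contextmanager

finished_spans = []  # stands in for the reporter that would ship spans to Jaeger


class Span:
    def __init__(self, operation, trace_id, parent_id=None):
        self.operation = operation
        self.trace_id = trace_id
        self.span_id = uuid.uuid4().hex[:16]
        self.parent_id = parent_id
        self.duration = None


@contextmanager
def start_span(operation, parent=None):
    """Time a block of work and record it as a span."""
    trace_id = parent.trace_id if parent else uuid.uuid4().hex
    span = Span(operation, trace_id, parent.span_id if parent else None)
    started = time.perf_counter()
    try:
        yield span
    finally:
        span.duration = time.perf_counter() - started
        finished_spans.append(span)


with start_span("GET /orders") as root:
    with start_span("db.query", parent=root):
        time.sleep(0.01)  # pretend to wait on the database

for span in finished_spans:
    print(span.operation, span.parent_id, f"{span.duration * 1000:.1f} ms")
```

The child span finishes first and carries its parent's span ID, which is the information a tracing UI needs to draw the nested timeline and show where the time went.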


“Tracing makes it so much easier to pinpoint where a request slows down!” Bob says, impressed.

5. Using Kubernetes Built-In Tools for Debugging

Bob explores built-in Kubernetes tools for quick diagnostics.

  • Viewing Pod Logs:

    kubectl logs <pod-name>
  • Checking Pod Resource Usage:

    kubectl top pod
  • Debugging with kubectl exec:

    kubectl exec -it <pod-name> -- sh
  • Inspecting Cluster Events:

    kubectl get events
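
The output of `kubectl top pod` is plain text, so a small script can rank pods by usage. A sketch that assumes the usual three-column layout (`NAME`, `CPU(cores)`, `MEMORY(bytes)`) with CPU in millicores and memory in `Mi`; the function name and the sample pod names are invented:

```python
def parse_top_pods(text):
    """Parse `kubectl top pod` output into (name, millicores, mebibytes) tuples."""
    pods = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        name, cpu, memory = line.split()[:3]
        # CPU is normally shown as millicores ("250m"); whole cores have no suffix.
        millicores = int(cpu[:-1]) if cpu.endswith("m") else int(float(cpu) * 1000)
        # Memory is normally shown in Mi; convert Ki and Gi for safety.
        factors = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024}
        mebibytes = float(memory[:-2]) * factors[memory[-2:]]
        pods.append((name, millicores, mebibytes))
    return pods


sample = """\
NAME                       CPU(cores)   MEMORY(bytes)
backend-7c9d8f6b5d-x2k4q   250m         128Mi
nginx-5d59d67564-abcde     3m           7Mi
"""

busiest = max(parse_top_pods(sample), key=lambda pod: pod[1])
print(busiest[0])  # backend-7c9d8f6b5d-x2k4q
```

Against a live cluster, the text could come from `subprocess.run(["kubectl", "top", "pod"], capture_output=True, text=True).stdout` instead of the sample string.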

“The built-in tools are great for quick troubleshooting!” Bob notes.

6. Monitoring Application Health with Liveness and Readiness Probes

Bob ensures his applications remain healthy by adding probes to their configurations.

  • Adding Probes to a Deployment:

    • Bob updates his Nginx deployment:

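      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: nginx
      spec:
        selector:
          matchLabels: {app: nginx}  # label name is an assumption; it must match the template
        template:
          metadata:
            labels: {app: nginx}
          spec:
            containers:
              - name: nginx
                image: nginx:latest
                livenessProbe:   # restart the container if this check keeps failing
                  httpGet:
                    path: /
                    port: 80
                  initialDelaySeconds: 5
                  periodSeconds: 10
                readinessProbe:  # stop routing traffic to the pod while this check fails
                  httpGet:
                    path: /
                    port: 80
                  initialDelaySeconds: 5
                  periodSeconds: 10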
  • Testing Probes:

    • Bob intentionally breaks the service and observes Kubernetes restarting the container once the liveness probe fails, while the failing readiness probe keeps the pod out of service until it recovers.
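
For an application of Bob's own, the probes need endpoints to call. A minimal sketch using only the standard library, assuming the paths `/healthz` and `/ready` (both names are conventions chosen here, not Kubernetes requirements): the liveness path succeeds whenever the process can answer, while the readiness path returns 503 until the app marks itself ready.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

state = {"ready": False}  # flipped to True once startup work has finished


def probe_response(path, ready):
    """Map a probe path to an HTTP status code and body."""
    if path == "/healthz":
        return 200, "ok"
    if path == "/ready":
        return (200, "ready") if ready else (503, "not ready")
    return 404, "not found"


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = probe_response(self.path, state["ready"])
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode())


def serve(port=8080):
    """Start the probe server; blocks until interrupted."""
    HTTPServer(("", port), ProbeHandler).serve_forever()
```

Calling `serve()` starts the server. The probes in the Deployment would then point at these paths and this port instead of `/` on port 80; an HTTP probe counts any status from 200 to 399 as success.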

“Probes make my apps self-healing!” Bob says, impressed by the resilience.

7. Combining Observability Tools into Dashboards

Bob creates unified dashboards in Grafana to combine logs, metrics, and traces.

  • Adding Logs to Grafana:

    • Bob integrates Elasticsearch with Grafana to visualize logs alongside metrics.
  • Customizing Dashboards:

    • He creates panels for:
      • CPU and memory usage.
      • Log error counts.
      • Request trace durations.
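
Each Grafana panel is backed by a query against its data source, and the same numbers can be pulled directly from the Prometheus HTTP API at `/api/v1/query`. A sketch, assuming Prometheus is reachable at `http://prometheus:9090` (a hypothetical in-cluster address) and using an example PromQL expression for per-pod CPU; the helper names are illustrative:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS_URL = "http://prometheus:9090"  # assumption: adjust to the actual service address


def build_query_url(expr, base_url=PROMETHEUS_URL):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def parse_vector(payload):
    """Turn an instant-vector response into {pod: value}."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    results = payload["data"]["result"]
    # Each sample is {"metric": {...labels}, "value": [timestamp, "string value"]}.
    return {r["metric"].get("pod", ""): float(r["value"][1]) for r in results}


def cpu_by_pod():
    """Fetch per-pod CPU usage in cores (requires network access to Prometheus)."""
    expr = "sum by (pod) (rate(container_cpu_usage_seconds_total[5m]))"
    with urlopen(build_query_url(expr), timeout=5) as response:
        return parse_vector(json.load(response))
```

Pasting the same expression into a Grafana panel's query editor should plot the values that `cpu_by_pod()` returns, which makes the script a quick way to check a panel that looks wrong.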

“One dashboard to rule them all—everything I need in one place!” Bob says, thrilled.

8. Automating Observability with Helm Charts

To simplify observability setup, Bob learns to use Helm charts.

  • Installing Helm:

    sudo dnf install helm
  • Deploying the EFK Stack with Helm:

    helm repo add elastic
    helm install efk elastic/efk-stack
  • Deploying Prometheus with Helm:

    helm repo add prometheus-community
    helm install prometheus prometheus-community/prometheus

“Helm makes deploying complex observability stacks a breeze!” Bob says, loving the efficiency.

9. Conclusion: Bob’s Observability Triumph

With centralized logging, metrics, and tracing in place, Bob’s Kubernetes cluster is fully observable. He can monitor, debug, and optimize his applications with confidence, ensuring everything runs smoothly.

Next, Bob plans to explore advanced scheduling and workload management in Kubernetes, diving into node affinities, taints, and tolerations.

Stay tuned for the next chapter: “Bob Masters Kubernetes Scheduling and Workload Management!”

Last modified 12.02.2025: some corrections and edits. (d17b3d6)