Deploying Prometheus and Grafana on K8s#

🔧 Prerequisites#

Environment requirements#

Operating system: Ubuntu 20.04+
Kubernetes: a running cluster
Helm: version 3.x
kubectl: configured and able to reach the cluster

Install the required tools#

PRTCL // BASH

# Install Helm 3
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Install kubectl (if not already installed)
sudo snap install kubectl --classic

# Verify the installation
helm version
kubectl cluster-info
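
Before continuing, it can help to confirm that both tools are actually on the PATH. The following is a minimal sketch; `require_cmd` is a hypothetical helper name, not part of Helm or kubectl.

```shell
# Hypothetical helper: succeed if the given command is on PATH, fail otherwise.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Example usage (uncomment to run the checks):
# require_cmd helm && require_cmd kubectl
```

A non-zero exit status from the helper makes it easy to stop a longer setup script early.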

Create the monitoring namespace#

PRTCL // BASH

kubectl create namespace monitoring
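
`kubectl create namespace` fails if the namespace already exists, which is inconvenient when the steps are re-run. One possible way to make the step repeatable is sketched below; `ensure_namespace` is a hypothetical helper name.

```shell
# Hypothetical helper: create the namespace only if it does not exist yet.
ensure_namespace() {
  ns="$1"
  if kubectl get namespace "$ns" >/dev/null 2>&1; then
    echo "namespace $ns already exists"
  else
    kubectl create namespace "$ns"
  fi
}

# Example usage:
# ensure_namespace monitoring
```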

📦 Deploy Prometheus#

Add the Helm repository#

PRTCL // BASH

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Install the Prometheus stack#

PRTCL // BASH

helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.service.type=NodePort \
  --set prometheus.service.nodePort=30090
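
The port passed to `nodePort` has to fall inside the cluster's NodePort range, which is 30000-32767 unless the API server was started with a different `--service-node-port-range`. A small sanity check, assuming the default range, might look like this (`valid_nodeport` is a hypothetical helper name):

```shell
# Hypothetical helper: succeed only for integers in the default NodePort range.
valid_nodeport() {
  port="$1"
  # Reject anything that is not a plain non-negative integer.
  case "$port" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$port" -ge 30000 ] && [ "$port" -le 32767 ]
}

# Example usage:
# valid_nodeport 30090 && echo "30090 is usable as a NodePort"
```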

Custom configuration (optional)#

Create values.yaml:

PRTCL // YAML

prometheus:
  prometheusSpec:
    retention: 15d
    resources:
      requests:
        memory: "512Mi"
        cpu: "500m"
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi

Apply the configuration:

PRTCL // BASH

# --reuse-values keeps the settings passed with --set at install time
helm upgrade prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --reuse-values \
  -f values.yaml
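
To judge whether 10Gi is enough for 15 days of retention, the Prometheus documentation gives a rough sizing rule: needed disk space is approximately retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample on average. The sketch below applies that rule; the function name and the default of 2 bytes per sample are assumptions made for illustration.

```shell
# Rough disk estimate: retention_seconds * samples_per_second * bytes_per_sample.
estimate_storage_bytes() {
  retention_days="$1"
  samples_per_sec="$2"
  bytes_per_sample="${3:-2}"   # assumed upper end of the 1-2 byte range
  echo $(( retention_days * 86400 * samples_per_sec * bytes_per_sample ))
}

# Example usage: 15 days of retention at 100 samples per second
# estimate_storage_bytes 15 100
```

The ingestion rate of a running server can be read from the query `rate(prometheus_tsdb_head_samples_appended_total[5m])`. Treat the result as a lower bound and leave headroom for the write-ahead log and compaction.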

🎨 Configure Grafana#

Expose the Grafana service#

PRTCL // BASH

# --reuse-values keeps the values applied by earlier install/upgrade commands
helm upgrade prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --reuse-values \
  --set grafana.service.type=NodePort \
  --set grafana.service.nodePort=30000

Get the login credentials#

PRTCL // BASH

# Get the admin password (the default username is admin)
kubectl get secret -n monitoring prometheus-grafana \
  -o jsonpath='{.data.admin-password}' | base64 --decode; echo

// …existing code…

🔄 Persistent storage#

Create a PersistentVolumeClaim (PVC)#

PRTCL // YAML

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

Update the Grafana configuration#

PRTCL // YAML

grafana:
  persistence:
    enabled: true
    existingClaim: grafana-pvc
    size: 10Gi
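
The PVC manifest and the Grafana values above still have to be applied to the cluster. A minimal sketch follows, assuming they were saved as `grafana-pvc.yaml` and `grafana-values.yaml`; both file names and the helper name `apply_grafana_persistence` are hypothetical.

```shell
# Hypothetical helper: create the PVC first, then point Grafana at it.
apply_grafana_persistence() {
  pvc_file="${1:-grafana-pvc.yaml}"        # assumed file name
  values_file="${2:-grafana-values.yaml}"  # assumed file name
  kubectl apply -f "$pvc_file" &&
  helm upgrade prometheus prometheus-community/kube-prometheus-stack \
    --namespace monitoring \
    --reuse-values \
    -f "$values_file"
}

# Example usage:
# apply_grafana_persistence grafana-pvc.yaml grafana-values.yaml
```

Applying the PVC before the upgrade means the claim already exists when the Grafana Pod is rescheduled.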

🌐 Configure Ingress access#

Create the Ingress rule#

PRTCL // YAML

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: monitoring-ingress
  namespace: monitoring
spec:
  ingressClassName: nginx   # assumes the ingress-nginx controller is installed
  rules:
  - host: grafana.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-grafana
            port:
              number: 80

Apply the Ingress configuration#

PRTCL // BASH

kubectl apply -f ingress.yaml
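
Once the Ingress is applied, Grafana's `/api/health` endpoint offers a quick way to confirm that requests reach the service. The sketch below only inspects the JSON response; `grafana_healthy` is a hypothetical helper name, and `<ingress-ip>` in the usage line is a placeholder for your ingress controller's address.

```shell
# Hypothetical helper: read a Grafana /api/health response on stdin and
# succeed if it reports a healthy database.
grafana_healthy() {
  grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# Example usage (replace <ingress-ip> before running):
# curl -s -H 'Host: grafana.example.com' http://<ingress-ip>/api/health | grafana_healthy && echo "Grafana reachable"
```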

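📊 Import dashboards#

Common dashboard IDs#

Purpose	Dashboard ID	Description
Node monitoring	1860	Node Exporter, full metric set
Cluster overview	13105	Kubernetes cluster monitoring
Resource usage	8685	Per-Pod resource usage details
Network monitoring	12175	Network traffic analysis

Import steps#

Open the Grafana UI
Click + > Import
Enter the dashboard ID
Select the data source (Prometheus by default)
Click Import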

🔍 Verify the monitoring stack#

Check component status#

PRTCL // BASH

# Check Pod status
kubectl get pods -n monitoring

# Check Service status
kubectl get svc -n monitoring

# Show node resource usage (requires metrics-server)
kubectl top nodes
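
Scanning the Pod list by eye gets tedious on a full stack. As a rough sketch, the output of `kubectl get pods --no-headers` can be piped through a small filter that counts Pods that are not fully ready; `count_not_ready` is a hypothetical helper and assumes the default column order (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print how many Pods are not fully ready. Completed Pods are ignored.
count_not_ready() {
  awk '{
    split($2, ready, "/")
    if ($3 != "Completed" && (ready[1] != ready[2] || $3 != "Running")) n++
  }
  END { print n + 0 }'
}

# Example usage:
# kubectl get pods -n monitoring --no-headers | count_not_ready
```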

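Verify data collection#

PRTCL // BASH

# Count scrape targets that report as up (the /targets page is rendered in the
# browser, so query the HTTP API instead); replace <node-ip> with a node address
curl -s http://<node-ip>:30090/api/v1/targets | grep -o '"health":"up"' | wc -l

# Check the alerting rules
kubectl get prometheusrules -n monitoring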
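
Target health can also be summarized from the Prometheus HTTP API endpoint `/api/v1/targets`, which returns JSON with a `health` field per target. The following sketch counts healthy targets from that response; `count_up_targets` is a hypothetical helper, it assumes the compact JSON that Prometheus emits, and a JSON-aware tool would be more robust.

```shell
# Hypothetical helper: read an /api/v1/targets response on stdin and print
# the number of targets whose health is "up".
count_up_targets() {
  grep -o '"health":"up"' | wc -l | tr -d ' '
}

# Example usage (replace <node-ip> before running):
# curl -s http://<node-ip>:30090/api/v1/targets | count_up_targets
```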

⚠️ Troubleshooting#

Pod fails to start

PRTCL // BASH

# Show detailed status and events
kubectl describe pod <pod-name> -n monitoring

# Show container logs
kubectl logs <pod-name> -n monitoring
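
When a container keeps restarting, the logs of the previous instance and the namespace events are often more informative than the current logs. One way to bundle these first-look commands is sketched below; `debug_pod` is a hypothetical helper name.

```shell
# Hypothetical helper: collect the usual first-look diagnostics for one Pod.
debug_pod() {
  pod="$1"
  ns="${2:-monitoring}"
  kubectl describe pod "$pod" -n "$ns"
  # Logs from the previous container instance, useful after a crash loop.
  kubectl logs "$pod" -n "$ns" --previous
  # Recent events in the namespace, oldest first.
  kubectl get events -n "$ns" --sort-by=.lastTimestamp
}

# Example usage:
# debug_pod <pod-name>
```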

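Data collection problems

Check the ServiceMonitor configuration
Verify the label selectors
Confirm that the correct port is exposed

Grafana access problems

Confirm how the service is exposed
Check the Ingress configuration
Verify the network policies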
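
A common cause of missing targets is a ServiceMonitor that the operator never selects. With the chart's default values, Prometheus only picks up ServiceMonitors labeled `release: <Helm release name>`, which would be `prometheus` for the install command used here. The check below is a sketch under that assumption; `has_release_label` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the ServiceMonitor carries the expected
# release label.
# $1: ServiceMonitor name, $2: namespace, $3: expected Helm release name
has_release_label() {
  label=$(kubectl get servicemonitor "$1" -n "$2" \
    -o jsonpath='{.metadata.labels.release}')
  [ "$label" = "$3" ]
}

# Example usage (my-app is a placeholder ServiceMonitor name):
# has_release_label my-app monitoring prometheus || echo "release label missing"
```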

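📝 Operational advice#

Back up the Grafana configuration regularly

PRTCL // BASH

# Replace <grafana-pod-name> with the Pod name shown by kubectl get pods
kubectl cp monitoring/<grafana-pod-name>:/var/lib/grafana ./grafana-backup -c grafana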
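
Copying into the same local directory every time overwrites the previous backup. A simple option is to put a timestamp in the target directory name, as in the sketch below; `backup_dir_name` is a hypothetical helper name.

```shell
# Hypothetical helper: print a timestamped backup directory name.
# $1: optional timestamp override, mainly useful for testing
backup_dir_name() {
  ts="${1:-$(date +%Y%m%d-%H%M%S)}"
  echo "grafana-backup-$ts"
}

# Example usage (replace <grafana-pod-name> before running):
# kubectl cp monitoring/<grafana-pod-name>:/var/lib/grafana "./$(backup_dir_name)" -c grafana
```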

Monitor resource usage

PRTCL // BASH

# Set up resource alerts
kubectl apply -f prometheus-rules.yaml
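
The contents of prometheus-rules.yaml are not shown above. As an illustration only, the snippet below writes a minimal PrometheusRule with a single node memory alert; the rule name, alert name, threshold, and the `release: prometheus` label (assumed to match the Helm release so that the operator selects the rule) are all assumptions to adapt.

```shell
# Write an example rules file; the quoted heredoc keeps {{ $labels... }} literal.
cat > prometheus-rules.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-memory-rules        # hypothetical name
  namespace: monitoring
  labels:
    release: prometheus          # assumed to match the Helm release name
spec:
  groups:
    - name: node-resources
      rules:
        - alert: NodeMemoryHigh  # hypothetical alert name
          expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.9
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Memory usage above 90% on {{ $labels.instance }}"
EOF
```

The expression relies on the node exporter metrics that the stack collects by default.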

Data retention settings

PRTCL // YAML

prometheus:
  prometheusSpec:
    retention: 15d
    retentionSize: "10GB"

🔗 References#

Author: EchoWang

Xiaohongshu: 汪多多是只猫

Bilibili: 汪多多是只猫

WeChat official account: 汪多多是只猫

Blog: https://blog.echospace.top
