使用Rook安装Ceph

Posted by elrond on October 7, 2021

1. Rook简介

Rook是一个云原生的存储管理工具,当前支持 CephCassandraNFS

  • 主要解决了云原生场景下Ceph的部署、使用、维护等问题,可见的例如
    • rgw、nfs的高可用,都可以通过ing-service-pod来实现
    • 部署非常快
  • 也带来了一些挑战 – 依赖K8S,Ceph使用容器化管理之后,故障率会提升、维护难度会增大。

2. Rook架构

rook架构

简言之,rook以云原生方式(Operator)实现了Ceph的生命周期管理和Ceph资源的使用

3. Rook部署

3.1. 前提

  • 要有一个K8S集群
  • K8S集群的worker节点需要至少有一个空闲的块设备作为Ceph的OSD

3.2. 环境描述

如果使用Ceph O版及之后的版本部署会比较顺利,这里使用N版,部署时出现了一点点问题

组件名称 版本 备注
操作系统 CentOS Linux release 7.8.2003 (Core)  
内核 5.13.2-1.el7.elrepo.x86_64  
K8S v1.19.7  
Ceph 14.2.22  
Rook v1.7.0  

3.3. 部署

3.3.1. 获取rook代码

gh repo clone rook/rook
# 切到v1.7.0tag
git checkout v1.7.0

3.3.2. 修改变量

cd cluster/examples/kubernetes/ceph

3.3.2.1. cluster.yaml

只修改了下面两个有注释的地方

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph # namespace:cluster
spec:
  cephVersion:
  # 指定Ceph版本
    image: quay.io/ceph/ceph:v14.2.22
    allowUnsupported: false
  dataDirHostPath: /var/lib/rook
  skipUpgradeChecks: false
  continueUpgradeAfterChecksEvenIfNotHealthy: false
  waitTimeoutForHealthyOSDInMinutes: 10
  mon:
    count: 3
    allowMultiplePerNode: false
  mgr:
    count: 1
    modules:
      - name: pg_autoscaler
        enabled: true
  dashboard:
    enabled: true
    ssl: true
  monitoring:
    enabled: false
    rulesNamespace: rook-ceph
  network:
  crashCollector:
    disable: false
  cleanupPolicy:
    confirmation: ""
    sanitizeDisks:
      method: quick
      dataSource: zero
      iteration: 1
    allowUninstallWithVolumes: false
  annotations:
  labels:
  resources:
  removeOSDsIfOutAndSafeToRemove: false
  storage: # cluster level storage configuration and selection
    useAllNodes: true
    useAllDevices: true
    config:
    # 指定Ceph磁盘个数和盘符 可以用正则匹配
      osdsPerDevice: "1" # this value can be overridden at the node or device level
      devices: "vdb"
  disruptionManagement:
    managePodBudgets: true
    osdMaintenanceTimeout: 30
    pgHealthCheckTimeout: 0
    manageMachineDisruptionBudgets: false
    machineDisruptionBudgetNamespace: openshift-machine-api
  healthCheck:
    daemonHealth:
      mon:
        disabled: false
        interval: 45s
      osd:
        disabled: false
        interval: 60s
      status:
        disabled: false
        interval: 60s
    livenessProbe:
      mon:
        disabled: false
      mgr:
        disabled: false
      osd:
        disabled: false

3.3.2.2. operator.yaml

只修改了csi版本相关信息

kind: ConfigMap
apiVersion: v1
metadata:
  name: rook-ceph-operator-config
  namespace: rook-ceph # namespace:operator
data:
  ROOK_LOG_LEVEL: "INFO"
  ROOK_CSI_ENABLE_CEPHFS: "true"
  ROOK_CSI_ENABLE_RBD: "true"
  ROOK_CSI_ENABLE_GRPC_METRICS: "false"
  CSI_ENABLE_CEPHFS_SNAPSHOTTER: "true"
  CSI_ENABLE_RBD_SNAPSHOTTER: "true"
  CSI_FORCE_CEPHFS_KERNEL_CLIENT: "true"
  CSI_RBD_FSGROUPPOLICY: "ReadWriteOnceWithFSType"
  CSI_CEPHFS_FSGROUPPOLICY: "None"
  # 修改CSI镜像版本,与Ceph版本相匹配
  # 版本修改参照 http://elrond.wang/2021/06/19/Kubernetes%E9%9B%86%E6%88%90Ceph/
  # 默认使用的镜像被墙,不用代理拉不下来
  ROOK_CSI_ALLOW_UNSUPPORTED_VERSION: "true"
  ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.2-canary"
  ROOK_CSI_REGISTRAR_IMAGE: "quay.io/k8scsi/csi-node-driver-registrar:v2.0.1"
  ROOK_CSI_RESIZER_IMAGE: "quay.io/k8scsi/csi-resizer:v1.0.1"
  ROOK_CSI_PROVISIONER_IMAGE: "quay.io/k8scsi/csi-provisioner:v2.0.4"
  ROOK_CSI_SNAPSHOTTER_IMAGE: "quay.io/k8scsi/csi-snapshotter:v4.0.0"
  ROOK_CSI_ATTACHER_IMAGE: "quay.io/k8scsi/csi-attacher:v3.0.2"
  ROOK_OBC_WATCH_OPERATOR_NAMESPACE: "true"
  ROOK_ENABLE_FLEX_DRIVER: "false"
  ROOK_ENABLE_DISCOVERY_DAEMON: "false"
  ROOK_CEPH_COMMANDS_TIMEOUT_SECONDS: "15"
  CSI_ENABLE_VOLUME_REPLICATION: "false"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rook-ceph-operator
  namespace: rook-ceph # namespace:operator
  labels:
    operator: rook
    storage-backend: ceph
spec:
  selector:
    matchLabels:
      app: rook-ceph-operator
  replicas: 1
  template:
    metadata:
      labels:
        app: rook-ceph-operator
    spec:
      serviceAccountName: rook-ceph-system
      containers:
        - name: rook-ceph-operator
          image: rook/ceph:v1.7.0
          args: ["ceph", "operator"]
          volumeMounts:
            - mountPath: /var/lib/rook
              name: rook-config
            - mountPath: /etc/ceph
              name: default-config-dir
          env:
            - name: ROOK_CURRENT_NAMESPACE_ONLY
              value: "false"
            - name: ROOK_DISCOVER_DEVICES_INTERVAL
              value: "60m"
            - name: ROOK_HOSTPATH_REQUIRES_PRIVILEGED
              value: "false"
            - name: ROOK_ENABLE_SELINUX_RELABELING
              value: "true"
            - name: ROOK_ENABLE_FSGROUP
              value: "true"
            - name: ROOK_DISABLE_DEVICE_HOTPLUG
              value: "false"
            - name: DISCOVER_DAEMON_UDEV_BLACKLIST
              value: "(?i)dm-[0-9]+,(?i)rbd[0-9]+,(?i)nbd[0-9]+"
            - name: ROOK_UNREACHABLE_NODE_TOLERATION_SECONDS
              value: "5"
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
      volumes:
        - name: rook-config
          emptyDir: {}
        - name: default-config-dir
          emptyDir: {}

3.3.2.3. 其他配置

  • 文件系统相关配置 filesystem.yaml
  • 对象存储相关配置 object.yaml
  • nfs相关配置 nfs.yaml

3.3.3. 开始部署

# 部署operator
kubectl create -f crds.yaml -f common.yaml -f operator.yaml
# 部署Ceph集群
kubectl create -f cluster.yaml
# 部署RGW
kubectl create -f object.yaml
# 部署NFS
kubectl create -f nfs.yaml
# 部署tool-box
kubectl create -f toolbox.yaml

3.3.4. 部署成功的状态

3.3.4.1. po状态

kubectl get po -n rook-ceph
# 输出如下
NAME                                                      READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-nk6jm                                    3/3     Running     0          12s
csi-cephfsplugin-provisioner-666c6886db-lktgs             6/6     Running     0          11s
csi-cephfsplugin-provisioner-666c6886db-pw27p             6/6     Running     0          11s
csi-cephfsplugin-xh8kh                                    3/3     Running     0          12s
csi-cephfsplugin-zl2tm                                    3/3     Running     0          12s
csi-rbdplugin-5bjzv                                       3/3     Running     0          14s
csi-rbdplugin-cl2wc                                       3/3     Running     0          14s
csi-rbdplugin-provisioner-769b8d6f5-9tdzw                 6/6     Running     0          13s
csi-rbdplugin-provisioner-769b8d6f5-xfgpx                 6/6     Running     0          13s
csi-rbdplugin-wgbfc                                       3/3     Running     0          14s
rook-ceph-crashcollector-172.16.80.177-6856778cd6-vbdbp   1/1     Running     0          7d15h
rook-ceph-crashcollector-172.16.80.181-84c6474ccf-29jwc   1/1     Running     0          7d15h
rook-ceph-crashcollector-172.16.80.60-5d7c647d48-6rxgx    1/1     Running     0          34m
rook-ceph-mds-myfs-a-568d689779-wr8p7                     1/1     Running     0          7d15h
rook-ceph-mds-myfs-b-6f75c5c7f5-vl942                     1/1     Running     0          7d15h
rook-ceph-mgr-a-7d6cdd489d-2wh92                          1/1     Running     0          7d15h
rook-ceph-mon-a-594d7bcdd4-jrkcm                          1/1     Running     0          7d15h
rook-ceph-mon-b-d846c949-gmrkv                            1/1     Running     0          7d15h
rook-ceph-mon-c-868856f488-v264b                          1/1     Running     0          7d15h
rook-ceph-nfs-my-nfs-a-9fb6cbff6-cvhbc                    2/2     Running     0          7d15h
rook-ceph-nfs-my-nfs-b-5dfcdf6454-kktdg                   2/2     Running     0          7d15h
rook-ceph-nfs-my-nfs-c-78f748c5f6-kxszz                   2/2     Running     0          7d15h
rook-ceph-operator-6b88ff7b4c-nckkl                       1/1     Running     0          7d17h
rook-ceph-osd-0-898c48bcc-vvq4g                           1/1     Running     0          7d15h
rook-ceph-osd-1-7d65ff8c46-wmclv                          1/1     Running     0          7d15h
rook-ceph-osd-2-6856cd76c6-n2rqh                          1/1     Running     0          7d15h
rook-ceph-osd-prepare-172.16.80.177-rxqlz                 0/1     Completed   0          35m
rook-ceph-osd-prepare-172.16.80.181-mzrnx                 0/1     Completed   0          35m
rook-ceph-osd-prepare-172.16.80.60-twls6                  0/1     Completed   0          35m
rook-ceph-rgw-my-store-a-6465b6f48f-stwp4                 1/1     Running     0          34m
rook-ceph-tools-744c55f865-qpx8s                          1/1     Running     0          7d16h

3.3.5. 集群状态

进入toolbox查看

# 进入toolbox
kubectl exec -it -n rook-ceph rook-ceph-tools-744c55f865-qpx8s -- bash
# 执行命令
[root@rook-ceph-tools-744c55f865-qpx8s /]# ceph -s
  cluster:
    id:     e8bc8a28-0c21-48a0-b676-ada1f26bbc41
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 7d)
    mgr: a(active, since 6d)
    mds: myfs:1 {0=myfs-a=up:active} 1 up:standby-replay
    osd: 3 osds: 3 up (since 7d), 3 in (since 7d)
    rgw: 1 daemon active (my.store.a)

  task status:

  data:
    pools:   9 pools, 144 pgs
    objects: 286 objects, 19 KiB
    usage:   3.0 GiB used, 1.5 TiB / 1.5 TiB avail
    pgs:     144 active+clean

  io:
    client:   1.6 KiB/s rd, 170 B/s wr, 2 op/s rd, 0 op/s wr

到这里就部署完毕,可以直接使用了

4. 部署中的问题

4.1. csi镜像无法拉到

国内无法访问 k8s.gcr.io, 所以需要将镜像地址替换为红帽的镜像仓库,参照上面代码中的注释,各个版本对应的镜像版本不一样,按实际情况替换

4.2. 删除rook时卡住

cephcluster 删除一直卡着

  • 解决方法:

在另一个窗口执行

kubectl -n rook-ceph patch crd cephclusters.ceph.rook.io --type merge -p '{"metadata":{"finalizers": [null]}}'
  • 参照
    • https://rook.io/docs/rook/v1.1/ceph-teardown.html
    • https://github.com/rook/rook/issues/4410

4.3. 重装rook是mon只初始化了一个,后面的部署都卡住

operater一直打印如下日志

kubectl logs rook-ceph-operator-6b88ff7b4c-nckkl -n rook-ceph -f
2021-01-07 15:44:39.084593 I | op-mon: waiting for mon quorum with [a]
2021-01-07 15:44:39.095582 I | op-mon: mons running: [a]
2021-01-07 15:44:59.929839 I | op-mon: mons running: [a]
2021-01-07 15:45:20.704740 I | op-mon: mons running: [a]
2021-01-07 15:45:41.473138 I | op-mon: mons running: [a]
2021-01-07 15:46:02.242541 I | op-mon: mons running: [a]
2021-01-07 15:46:23.001123 I | op-mon: mons running: [a]
2021-01-07 15:46:43.853226 I | op-mon: mons running: [a]
2021-01-07 15:47:04.651949 I | op-mon: mons running: [a]
2021-01-07 15:47:25.420815 I | op-mon: mons running: [a]
2021-01-07 15:47:46.238038 I | op-mon: mons running: [a]
2021-01-07 15:48:07.077166 I | op-mon: mons running: [a]
2021-01-07 15:48:27.858133 I | op-mon: mons running: [a]

get po 只有一个mon,且也没有osd的pod

  • 解决方式
    • 删除mon数据目录, 默认为 /var/lib/rook, 参照 https://rook.io/docs/rook/v1.7/ceph-common-issues.html#monitors-are-the-only-pods-running

5. 最后

  • rook 文档相对详尽
  • rook 社区活跃
  • rook用来做测试非常方便
  • rook的一些高可用实现思路可以参考并用于进程版
  • 生产集群用rook的收益与风险不成正比,生产应谨慎考虑之后再做使用