Change the OpenShift Data Foundation OSD disk flavor (size)
In this article, I’ll show you how to migrate your OpenShift Data Foundation OSDs (disks) from one flavor to another. In my case, I’ll migrate the OSDs and their data from 0.5TiB disks to 2TiB disks. This is a “rolling” migration with no service or data disruption.
Warning: If you are a Red Hat customer, open a support case before going forward; otherwise, perform the following steps at your own risk!
Requirements
Before starting, you’ll need:
- an installed and working OpenShift Data Foundation;
- this article is based on ODF configured with the replica parameter set to 3 [0], which is usually the default on hyperscalers; otherwise, you’ll need to adapt this procedure, for example, on bare metal (perhaps you’re using the LocalStorage operator there).
- in this guide, I’ll move data from three OSD disks to three other OSD disks; if you have more than three OSDs, you must repeat this procedure from beginning to end, or check whether the three destination disks can hold the data of more than three source disks.
[0] In this guide, I’ll assume your OpenShift Data Foundation is installed on a hyperscale cloud provider, such as Azure or AWS, with three availability zones, and that you have replica set to three:
$ oc get storagecluster -n openshift-storage ocs-storagecluster -ojson | jq .spec.storageDeviceSets
[
{
"count": 1,
...
"replica": 3,
...
}
]
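If you just want the replica value on its own, an equivalent jsonpath one-liner (same query as above, different tool):
$ oc get storagecluster -n openshift-storage ocs-storagecluster -o jsonpath='{.spec.storageDeviceSets[0].replica}'
3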
Run must-gather
Before applying any change, run an OpenShift must-gather:
$ oc adm must-gather
Then create an ODF-specific must-gather; in this example, I’m using ODF version 4.10:
$ mkdir ~/odf-must-gather
$ oc adm must-gather --image=registry.redhat.io/odf4/ocs-must-gather-rhel8:v4.10 --dest-dir=~/odf-must-gather
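If you opened a support case, you’ll likely need to upload the collected data; a minimal sketch to archive both collections (assuming the destination directory used above, and the default must-gather.local.* directory the first command creates in your working directory):
$ tar czf odf-must-gather.tar.gz -C ~ odf-must-gather
$ tar czf must-gather.tar.gz must-gather.local.*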
Check cluster health
Check if your cluster is healthy:
$ NAMESPACE=openshift-storage;ROOK_POD=$(oc -n ${NAMESPACE} get pod -l app=rook-ceph-operator -o jsonpath='{.items[0].metadata.name}');oc exec -it ${ROOK_POD} -n ${NAMESPACE} -- ceph status --cluster=${NAMESPACE} --conf=/var/lib/rook/${NAMESPACE}/${NAMESPACE}.config --keyring=/var/lib/rook/${NAMESPACE}/client.admin.keyring | grep HEALTH
health: HEALTH_OK
WARNING: if your cluster is not in HEALTH_OK, stop any activities and check the ODF state!
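The long one-liner above recurs throughout this guide; to keep later steps readable, you can wrap it in a small shell function (a convenience sketch of my own; the name odf_ceph is arbitrary):
odf_ceph() {
  local NAMESPACE=openshift-storage
  local ROOK_POD
  # Find the rook-ceph-operator pod and run ceph inside it with the rook-managed config/keyring
  ROOK_POD=$(oc -n ${NAMESPACE} get pod -l app=rook-ceph-operator -o jsonpath='{.items[0].metadata.name}')
  oc exec -it ${ROOK_POD} -n ${NAMESPACE} -- ceph "$@" \
    --cluster=${NAMESPACE} \
    --conf=/var/lib/rook/${NAMESPACE}/${NAMESPACE}.config \
    --keyring=/var/lib/rook/${NAMESPACE}/client.admin.keyring
}
With the function defined, the health check above becomes:
$ odf_ceph status | grep HEALTH
health: HEALTH_OK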
Add Capacity
Add new capacity to your cluster using the new OSD flavor. In my case, the original storageDeviceSets entry uses 0.5TiB disks:
$ oc get storagecluster ocs-storagecluster -n openshift-storage -oyaml
...
storageDeviceSets:
- config: {}
count: 1
dataPVCTemplate:
metadata: {}
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 500Gi
storageClassName: gp3-csi
volumeMode: Block
status: {}
name: ocs-deviceset
placement: {}
preparePlacement: {}
replica: 3
resources:
limits:
cpu: "2"
memory: 5Gi
requests:
cpu: "2"
memory: 5Gi
Switch to the openshift-storage project and back up the storagecluster:
$ oc project openshift-storage
$ oc get storagecluster ocs-storagecluster -oyaml | tee backup-storagecluster-ocs-storagecluster.yaml
Add new OSDs with the desired flavor; in my case, I’m adding a new storageDeviceSets entry with 2TiB disks:
$ oc edit storagecluster ocs-storagecluster
...
storageDeviceSets:
- config: {}
count: 1
dataPVCTemplate:
metadata: {}
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 500Gi
storageClassName: gp3-csi
volumeMode: Block
status: {}
name: ocs-deviceset
placement: {}
preparePlacement: {}
replica: 3
resources:
limits:
cpu: "2"
memory: 5Gi
requests:
cpu: "2"
memory: 5Gi
- config: {}
count: 1
dataPVCTemplate:
metadata: {}
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 2000Gi
storageClassName: gp3-csi
volumeMode: Block
status: {}
name: ocs-deviceset-2t
placement: {}
preparePlacement: {}
replica: 3
resources:
limits:
cpu: "2"
memory: 5Gi
requests:
cpu: "2"
memory: 5Gi
version: 4.10.0
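If you prefer a non-interactive change over oc edit, the same device set can be appended with a JSON patch (a sketch; keep the values, in particular storageClassName, aligned with your environment):
$ oc patch storagecluster ocs-storagecluster -n openshift-storage --type=json -p '[
  {"op": "add", "path": "/spec/storageDeviceSets/-", "value": {
    "config": {}, "count": 1,
    "dataPVCTemplate": {"metadata": {}, "spec": {
      "accessModes": ["ReadWriteOnce"],
      "resources": {"requests": {"storage": "2000Gi"}},
      "storageClassName": "gp3-csi", "volumeMode": "Block"}},
    "name": "ocs-deviceset-2t", "placement": {}, "preparePlacement": {},
    "replica": 3,
    "resources": {"limits": {"cpu": "2", "memory": "5Gi"},
                  "requests": {"cpu": "2", "memory": "5Gi"}}}}]'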
Wait until ODF has rebalanced all the data, which means the cluster is in HEALTH_OK status and all placement groups (PGs) are in the active+clean state. To monitor the rebalance, you can use a while true infinite loop:
$ while true; do NAMESPACE=openshift-storage;ROOK_POD=$(oc -n ${NAMESPACE} get pod -l app=rook-ceph-operator -o jsonpath='{.items[0].metadata.name}');oc exec -it ${ROOK_POD} -n ${NAMESPACE} -- ceph status --cluster=${NAMESPACE} --conf=/var/lib/rook/${NAMESPACE}/${NAMESPACE}.config --keyring=/var/lib/rook/${NAMESPACE}/client.admin.keyring | egrep 'HEALTH_OK|HEALTH_WARN|[0-9]+\s+remapped|[0-9]+\/[0-9]+[ a-z]+misplaced[ ().%a-z0-9]+|' ; sleep 10 ; done
cluster:
id: .....
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 2h)
mgr: a(active, since 2w)
mds: 1/1 daemons up, 1 hot standby
osd: 6 osds: 6 up (since 34s), 6 in (since 51s); 54 remapped pgs
data:
volumes: 1/1 healthy
pools: 4 pools, 97 pgs
objects: 17.77k objects, 54 GiB
usage: 157 GiB used, 7.2 TiB / 7.3 TiB avail
pgs: 45509/53298 objects misplaced (85.386%)
54 active+remapped+backfill_wait
37 active+clean
5 active+remapped
1 active+remapped+backfilling
io:
client: 1023 B/s rd, 217 KiB/s wr, 1 op/s rd, 6 op/s wr
recovery: 86 MiB/s, 1 keys/s, 30 objects/s
In the above example, you can see that Ceph is rebalancing and remapping PGs. Wait until all PGs are in the active+clean state:
$ NAMESPACE=openshift-storage;ROOK_POD=$(oc -n ${NAMESPACE} get pod -l app=rook-ceph-operator -o jsonpath='{.items[0].metadata.name}');oc exec -it ${ROOK_POD} -n ${NAMESPACE} -- ceph status --cluster=${NAMESPACE} --conf=/var/lib/rook/${NAMESPACE}/${NAMESPACE}.config --keyring=/var/lib/rook/${NAMESPACE}/client.admin.keyring
cluster:
id: .....
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 2h)
mgr: a(active, since 2w)
mds: 1/1 daemons up, 1 hot standby
osd: 6 osds: 6 up (since 17m), 6 in (since 18m)
data:
volumes: 1/1 healthy
pools: 4 pools, 193 pgs
objects: 17.02k objects, 51 GiB
usage: 146 GiB used, 7.2 TiB / 7.3 TiB avail
pgs: 193 active+clean
io:
client: 853 B/s rd, 76 KiB/s wr, 1 op/s rd, 7 op/s wr
WARNING: wait until your cluster returns all PGs in active+clean state!
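If you would rather wait unattended than watch the loop, here is a rough sketch built on the odf_ceph helper from earlier (it polls until health is HEALTH_OK and no PGs are remapped, backfilling, or misplaced):
$ while true; do
    STATUS=$(odf_ceph status)
    echo "${STATUS}" | grep -q HEALTH_OK && \
      ! echo "${STATUS}" | egrep -q 'remapped|backfill|misplaced' && break
    sleep 30
  done; echo "all PGs active+clean"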
Also check the CephFS status:
$ NAMESPACE=openshift-storage;ROOK_POD=$(oc -n ${NAMESPACE} get pod -l app=rook-ceph-operator -o jsonpath='{.items[0].metadata.name}');oc exec -it ${ROOK_POD} -n ${NAMESPACE} -- ceph fs status --cluster=${NAMESPACE} --conf=/var/lib/rook/${NAMESPACE}/${NAMESPACE}.config --keyring=/var/lib/rook/${NAMESPACE}/client.admin.keyring
ocs-storagecluster-cephfilesystem - 12 clients
=================================
RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
0 active ocs-storagecluster-cephfilesystem-b Reqs: 37 /s 34.8k 27.2k 8369 27.1k
0-s standby-replay ocs-storagecluster-cephfilesystem-a Evts: 47 /s 82.3k 26.8k 8298 0
WARNING: one of the two MDS daemons must be in the active state!
Identify old OSDs / disks to remove
Take note of the three OSD IDs to remove; they belong to your old flavor (recognizable here by the weight 0.48830). To see the ODF OSD topology, run:
$ NAMESPACE=openshift-storage;ROOK_POD=$(oc -n ${NAMESPACE} get pod -l app=rook-ceph-operator -o jsonpath='{.items[0].metadata.name}');oc exec -it ${ROOK_POD} -n ${NAMESPACE} -- ceph osd tree --cluster=${NAMESPACE} --conf=/var/lib/rook/${NAMESPACE}/${NAMESPACE}.config --keyring=/var/lib/rook/${NAMESPACE}/client.admin.keyring
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 7.32417 root default
-5 7.32417 region eu-central-1
-14 2.44139 zone eu-central-1a
-13 2.44139 host ip-XX-XX-XX-4-rete
2 ssd 0.48830 osd.2 up 1.00000 1.00000
5 ssd 1.95309 osd.5 up 1.00000 1.00000
-10 2.44139 zone eu-central-1b
-9 2.44139 host ip-XX-XX-XX-46-rete
1 ssd 0.48830 osd.1 up 1.00000 1.00000
4 ssd 1.95309 osd.4 up 1.00000 1.00000
-4 2.44139 zone eu-central-1c
-3 2.44139 host ip-XX-XX-XX-80-rete
0 ssd 0.48830 osd.0 up 1.00000 1.00000
3 ssd 1.95309 osd.3 up 1.00000 1.00000
In my case, the old OSDs are osd.0, osd.1, and osd.2. These OSDs must be removed one by one, waiting for HEALTH_OK after every removal.
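With many OSDs, you can extract the old-flavor IDs programmatically by filtering on the weight column of the tree output (a sketch using the odf_ceph helper from earlier; adjust 0.48830 to your old flavor’s weight):
$ odf_ceph osd tree | awk '$3 == "0.48830" {print $1}'
0
1
2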
Remove the OSDs from the old storage flavor
Switch to the openshift-storage project
First switch to the openshift-storage project:
$ oc project openshift-storage
Copy the Ceph config and keyring files
Copy your Ceph config file and keyring file from the rook operator pod to your Linux box; you’ll then transfer them to one mon container so that Ceph commands can still be run after the rook operator is scaled down.
Copy the files from the rook container to your Linux box:
$ ROOK=$(oc get pod | grep rook-ceph-operator | awk '{print $1}')
$ echo ${ROOK}
rook-ceph-operator-5767bbc7b9-w8swd
$ oc rsync ${ROOK}:/var/lib/rook/openshift-storage/openshift-storage.config .
WARNING: cannot use rsync: rsync not available in container
openshift-storage.config
$ oc rsync ${ROOK}:/var/lib/rook/openshift-storage/client.admin.keyring .
WARNING: cannot use rsync: rsync not available in container
client.admin.keyring
Copy the openshift-storage.config and client.admin.keyring files from your Linux box to one mon container:
$ MONA=$(oc get pod | grep rook-ceph-mon | egrep '2\/2\s+Running' | head -n1 | awk '{print $1}')
$ echo ${MONA}
rook-ceph-mon-a-769fc864f-btmmr
$ oc cp openshift-storage.config ${MONA}:/tmp/openshift-storage.config
Defaulted container "mon" out of: mon, log-collector, chown-container-data-dir (init), init-mon-fs (init)
$ oc cp client.admin.keyring ${MONA}:/tmp/client.admin.keyring
Defaulted container "mon" out of: mon, log-collector, chown-container-data-dir (init), init-mon-fs (init)
NOTE: MONA, in one of Italy’s regional languages, means “stupid person”.
Check that Ceph commands work in the MONA container:
$ oc rsh ${MONA}
Defaulted container "mon" out of: mon, log-collector, chown-container-data-dir (init), init-mon-fs (init)
sh-4.4# ceph health --cluster=openshift-storage --conf=/tmp/openshift-storage.config --keyring=/tmp/client.admin.keyring
2023-XX -1 auth: unable to find a keyring on /var/lib/rook/openshift-storage/client.admin.keyring: (2) No such file or directory
2023-XX -1 AuthRegistry(0x7fbbb805bb68) no keyring found at /var/lib/rook/openshift-storage/client.admin.keyring, disabling cephx
HEALTH_OK
sh-4.4# exit
Scale down OpenShift Data Foundation operators
We can now scale the rook and ocs operators down to zero:
$ oc scale deploy ocs-operator --replicas=0
deployment.apps/ocs-operator scaled
$ oc scale deploy rook-ceph-operator --replicas=0
deployment.apps/rook-ceph-operator scaled
Remove one OSD
Now you can remove one OSD; in my case, I’ll remove osd.0 (zero), but in your case, it could be a different ID.
$ failed_osd_id=0
$ export PS1="[\u@\h \W]\ OSD=$failed_osd_id $ "
$ oc scale deploy rook-ceph-osd-${failed_osd_id} --replicas=0
deployment.apps/rook-ceph-osd-0 scaled
$ oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=${failed_osd_id} FORCE_OSD_REMOVAL=true |oc create -n openshift-storage -f -
job.batch/ocs-osd-removal-job created
$ JOBREMOVAL=$(oc get pod | grep ocs-osd-removal-job- | awk '{print $1}')
$ oc logs ${JOBREMOVAL} | egrep "cephosd: completed removal of OSD ${failed_osd_id}"
2023-XX I | cephosd: completed removal of OSD 0
NOTE: on the last command you must see cephosd: completed removal of OSD X, where X is your osd id (in my case zero).
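You can also confirm the job finished successfully by checking the phase of its pod (a quick sanity check; it should report Succeeded):
$ oc get pod -l job-name=ocs-osd-removal-job -o jsonpath='{.items[0].status.phase}'
Succeeded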
Check the Ceph health status; it may temporarily report a degraded state while Ceph recovers from the OSD removal (in the output below, recovery has already finished and the cluster is back to HEALTH_OK with five OSDs):
$ oc rsh ${MONA}
Defaulted container "mon" out of: mon, log-collector, chown-container-data-dir (init), init-mon-fs (init)
sh-4.4#
sh-4.4# ceph status --cluster=openshift-storage --conf=/tmp/openshift-storage.config --keyring=/tmp/client.admin.keyring
2023-XX -1 auth: unable to find a keyring on /var/lib/rook/openshift-storage/client.admin.keyring: (2) No such file or directory
2023-XX -1 AuthRegistry(0x7f207005bb68) no keyring found at /var/lib/rook/openshift-storage/client.admin.keyring, disabling cephx
cluster:
id: .....
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 2h)
mgr: a(active, since 2w)
mds: 1/1 daemons up, 1 hot standby
osd: 5 osds: 5 up (since 19m), 5 in (since 9m)
data:
volumes: 1/1 healthy
pools: 4 pools, 193 pgs
objects: 17.10k objects, 52 GiB
usage: 146 GiB used, 6.7 TiB / 6.8 TiB avail
pgs: 193 active+clean
io:
client: 1.2 KiB/s rd, 460 KiB/s wr, 2 op/s rd, 7 op/s wr
sh-4.4#
Wait until Ceph returns HEALTH_OK and all PGs are in the active+clean state:
sh-4.4# while true; do ceph status --cluster=openshift-storage --conf=/tmp/openshift-storage.config --keyring=/tmp/client.admin.keyring | egrep --color=always '[0-9]+\/[0-9]+.*(degraded|misplaced)|' ; sleep 10 ; done
WARNING: before proceeding, you must wait for Ceph HEALTH_OK and all PGs in active+clean state!
Delete the removal job:
$ oc delete job ocs-osd-removal-job
job.batch "ocs-osd-removal-job" deleted
Repeat these steps for each OSD you need to remove (in my case, for osd.1 and osd.2).
Remove the old storageDeviceSets entry pointing to the old OSD disk flavor
After removing all OSDs from your old storageDeviceSets entry (in my case, the one with the 0.5TiB disk flavor), you can remove it from your storagecluster object.
Make a backup before editing your storagecluster:
$ oc get storagecluster ocs-storagecluster -oyaml | tee storagecluster-ocs-storagecluster-before-remove-500g.yaml
Edit your storagecluster storageDeviceSets so that only the newly created entry remains; in my case, the one with the 2TiB disk flavor:
$ oc edit storagecluster ocs-storagecluster
...
storageDeviceSets:
- config: {}
count: 1
dataPVCTemplate:
metadata: {}
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 2000Gi
storageClassName: gp3-csi
volumeMode: Block
status: {}
name: ocs-deviceset-2t
placement: {}
preparePlacement: {}
replica: 3
resources:
limits:
cpu: "2"
memory: 5Gi
requests:
cpu: "2"
memory: 5Gi
version: 4.10.0
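You can verify that only the new device set remains (a quick check; the output should list only the 2TiB entry):
$ oc get storagecluster ocs-storagecluster -ojson | jq -r '.spec.storageDeviceSets[].name'
ocs-deviceset-2t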
Scale up OpenShift Data Foundation operators
At this point, you can scale the ocs-operator back up:
$ oc scale deploy ocs-operator --replicas=1
deployment.apps/ocs-operator scaled
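In my experience, the rook-ceph-operator deployment is reconciled back to one replica on its own once the ocs-operator is running; if it is still at zero, scale it back up yourself before the next check (a hedged step, since we scaled it down manually earlier):
$ oc get deploy rook-ceph-operator -o jsonpath='{.spec.replicas}'
$ oc scale deploy rook-ceph-operator --replicas=1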
Then re-check the Ceph health status:
$ NAMESPACE=openshift-storage;ROOK_POD=$(oc -n ${NAMESPACE} get pod -l app=rook-ceph-operator -o jsonpath='{.items[0].metadata.name}');oc exec -it ${ROOK_POD} -n ${NAMESPACE} -- ceph status --cluster=${NAMESPACE} --conf=/var/lib/rook/${NAMESPACE}/${NAMESPACE}.config --keyring=/var/lib/rook/${NAMESPACE}/client.admin.keyring | egrep -i 'remapped|misplaced|active\+clean|HEALTH_OK|'
cluster:
id: .....
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 3h)
mgr: a(active, since 2w)
mds: 1/1 daemons up, 1 hot standby
osd: 3 osds: 3 up (since 7m), 3 in (since 6m)
data:
volumes: 1/1 healthy
pools: 4 pools, 193 pgs
objects: 17.15k objects, 52 GiB
usage: 145 GiB used, 5.7 TiB / 5.9 TiB avail
pgs: 193 active+clean
io:
client: 853 B/s rd, 246 KiB/s wr, 1 op/s rd, 4 op/s wr
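As a final check, confirm that only the new-flavor OSDs are left in the topology (again via the odf_ceph helper sketched earlier); every remaining OSD should now show the 2TiB weight, 1.95309 in my case:
$ odf_ceph osd tree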