Installing the Ceph Distributed Storage System on CentOS 7 with ceph-deploy

This post does not repeat a general introduction to Ceph; see the official documentation for background.

Sage Weil built this impressive distributed storage system during his PhD. It originally aimed to be a high-performance distributed file system, but when the cloud computing wave arrived, Ceph's focus shifted to distributed block storage (Block Storage) and distributed object storage (Object Storage). Since the Luminous release, CephFS has also become stable enough for production use. Ceph is now the most popular open-source storage solution for cloud computing and virtual machine deployments; reportedly around 20% of OpenStack deployments use Ceph Block Storage.

Ceph provides three types of storage: object storage, block storage, and a file system. The diagram below shows the architecture of a Ceph storage cluster:

Hardware Environment


Hostname     Spec          OS          IP address     Roles                          Disks (excluding system disk)
ceph-mon01   2 cores / 2G  CentOS 7.5  10.211.55.7    ADM, MON, OSD, MGR, MDS, RGW   100G * 3
ceph-mon02   2 cores / 2G  CentOS 7.5  10.211.55.8    MON, OSD, MGR, MDS, RGW        100G * 3
ceph-mon03   2 cores / 2G  CentOS 7.5  10.211.55.9    MON, OSD, MGR, MDS, RGW        100G * 3

Note
Ceph requires an odd number of MON nodes, at least three (a single MON is fine for a lab environment). The ADM role is optional and can be placed on a MON node; keeping ADM separate just makes the architecture clearer. MONs can also be colocated with OSDs, although this is not recommended in production.

For the recommended minimum hardware configuration of each node in production, see the official documentation: https://docs.ceph.com/en/latest/start/hardware-recommendations/#minimum-hardware-recommendations

Software Environment


All Ceph cluster nodes run CentOS 7.x. After the operating system is installed, some basic configuration is needed on every node (including ADM): disable SELinux, disable the firewall, set up time synchronization, and so on.

Set the hostname

hostnamectl set-hostname <hostname>

Disable SELinux

sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
reboot

Disable the firewall

systemctl stop firewalld
systemctl disable firewalld

Time synchronization between nodes

Server side

Note
Pick any one machine as the server. If the environment has Internet access, you do not need to run your own NTP server; just follow the client-side steps below and point every node at a public NTP server (for example time1.cloud.tencent.com).

# Set the timezone to Asia/Shanghai
timedatectl set-timezone Asia/Shanghai

# Install chrony and back up the configuration file
yum install chrony ntpdate -y
cp -a /etc/chrony.conf /etc/chrony.conf.bak

# Replace the server-side configuration with the following; the annotated line must be adjusted
cat > /etc/chrony.conf << EOF
stratumweight 0
driftfile /var/lib/chrony/drift
rtcsync
makestep 10 3
allow 10.211.55.0/24 # set this to the client subnet of your environment
smoothtime 400 0.01

bindcmdaddress 127.0.0.1
bindcmdaddress ::1

local stratum 8
manual
keyfile /etc/chrony.keys
#initstepslew 10 client1 client3 client6
noclientlog
logchange 0.5
logdir /var/log/chrony
EOF

# Start the service and enable it at boot
systemctl restart chronyd.service
systemctl enable chronyd.service
systemctl status chronyd.service

Client side

# Set the timezone to Asia/Shanghai
timedatectl set-timezone Asia/Shanghai

# Install chrony and back up the configuration file
yum install chrony ntpdate -y
cp -a /etc/chrony.conf /etc/chrony.conf.bak

# Adjust the client-side configuration
sed -i "s%^server%#server%g" /etc/chrony.conf
echo "server 10.211.55.7 iburst" >> /etc/chrony.conf # add this line; replace the IP with your server's address

ntpdate 10.211.55.7 # sync the time once manually; replace the IP with your server's address

# Start the service and enable it at boot
systemctl restart chronyd.service
systemctl enable chronyd.service
systemctl status chronyd.service

chronyc sources # check the state of the NTP sources
chronyc tracking # show detailed NTP information

Configure host resolution

cat >> /etc/hosts << EOF
10.211.55.7 ceph-mon01
10.211.55.8 ceph-mon02
10.211.55.9 ceph-mon03
EOF

Install the EPEL repository

yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm -y

Configure the Ceph repository

Note
Replace the value of CEPH_STABLE_RELEASE with the Ceph release you want to install, for example nautilus. Available release names can be looked up on the Aliyun mirror site.

CEPH_STABLE_RELEASE=nautilus
cat > /etc/yum.repos.d/ceph.repo << EOF
[Ceph]
name=Ceph packages for \$basearch
baseurl=https://mirrors.aliyun.com/ceph/rpm-${CEPH_STABLE_RELEASE}/el7/\$basearch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc

[ceph-noarch]
name=Ceph noarch packages
baseurl=https://mirrors.aliyun.com/ceph/rpm-${CEPH_STABLE_RELEASE}/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc

[ceph-source]
name=Ceph source packages
baseurl=https://mirrors.aliyun.com/ceph/rpm-${CEPH_STABLE_RELEASE}/el7/SRPMS/
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc
EOF

Configure passwordless SSH

On ceph-mon01, run ssh-keygen -t rsa to generate a key pair (just press Enter at every prompt), then copy the public key to every Ceph node so that ceph-mon01 can SSH into every node (including itself) without a password.

[root@ceph-mon01 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:3rKj4eAISBB5T/B95dTCDhbyFmllUjFsJ9mkEbWNI6c root@ceph-mon01
The key's randomart image is:
+---[RSA 2048]----+
|.... . +B@Oo |
|...... o=BB+++ |
|.. o. .o++++= . |
|. . o .+ . |
| . S E |
|o . . |
|o . . o . |
| . o o ..o |
| . . o... |
+----[SHA256]-----+
[root@ceph-mon01 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub ceph-mon01
[root@ceph-mon01 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub ceph-mon02
[root@ceph-mon01 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub ceph-mon03

Ceph Deployment


Note
All commands in this section are run on the ceph-mon01 node. If you hit problems, check the Troubleshooting section at the end of this post first.

Install ceph-deploy

Compared with installing Ceph by hand on every node, the ceph-deploy tool makes installation much more convenient.

Warning
ceph-deploy has not been tested on Ceph releases newer than Nautilus. It does not support RHEL 8, CentOS 8, or newer operating systems.

yum install ceph-deploy -y

Create a deployment directory

Create a Ceph deployment directory; all subsequent ceph-deploy commands must be executed inside this directory.

mkdir -p ~/ceph-cluster
cd ~/ceph-cluster/

Create the cluster

Create the cluster: tell ceph-deploy which nodes are the monitor nodes and specify the cluster's public_network and cluster_network. When the command succeeds, it generates ceph.conf, ceph-deploy-ceph.log, ceph.mon.keyring and related files in the ceph-cluster directory.

# ceph-deploy new MON [MON..]
ceph-deploy new ceph-mon01 ceph-mon02 ceph-mon03 --public-network 10.211.55.0/24 --cluster-network 10.211.55.0/24
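
For reference, the generated ceph.conf will typically look something like the sketch below (key order and the generated fsid will differ; this is an illustration based on the IPs used in this walkthrough, not the exact file):

[global]
fsid = c1876cf4-ad68-442d-9c97-1bf3163f3542
mon_initial_members = ceph-mon01, ceph-mon02, ceph-mon03
mon_host = 10.211.55.7,10.211.55.8,10.211.55.9
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = 10.211.55.0/24
cluster_network = 10.211.55.0/24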

Install Ceph

Install Ceph on every node.

# ceph-deploy install HOST [HOST..]
ceph-deploy install ceph-mon01 ceph-mon02 ceph-mon03

Note
If installation via ceph-deploy fails, you can also log in to each node and run yum install ceph ceph-radosgw -y to install Ceph manually.

Set up the MONs and keys

ceph-deploy mon create-initial

Note
After the command completes, the following keyring files are generated in the deployment directory ~/ceph-cluster/:

  • ceph.bootstrap-mds.keyring
  • ceph.bootstrap-mgr.keyring
  • ceph.bootstrap-osd.keyring
  • ceph.bootstrap-rgw.keyring
  • ceph.client.admin.keyring

Copy ceph.client.admin.keyring to every node.

# ceph-deploy --overwrite-conf admin HOST [HOST..]
ceph-deploy --overwrite-conf admin ceph-mon01 ceph-mon02 ceph-mon03

Install MGR

Note
MGR is the Ceph manager daemon (process name ceph-mgr). It runs alongside the MON daemons and provides additional monitoring as well as interfaces to external monitoring and management systems. Since 12.x (Luminous), ceph-mgr is required; in 11.x (Kraken) it was optional. By default, ceph-mgr needs no additional configuration to run.

# ceph-deploy mgr create MGR [MGR..]
ceph-deploy mgr create ceph-mon01 ceph-mon02 ceph-mon03

Bring up the OSDs

Note
OSDs can manage the data they store in one of two ways. Since Luminous 12.2.z, the new default (and recommended) backend is BlueStore. Before Luminous, the default (and only option) was FileStore.
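
Once the OSDs have been created (in the steps below), you can confirm which backend a given OSD is actually using; a quick optional check (OSD id 0 is just an example):

# Print an OSD's metadata and filter for the objectstore backend (expect "bluestore" or "filestore")
ceph osd metadata 0 | grep osd_objectstore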

Check the disks on the OSD nodes

# ceph-deploy disk list HOST [HOST..]
ceph-deploy disk list ceph-mon01 ceph-mon02 ceph-mon03

Check the help for the OSD create command

# ceph-deploy osd --help
usage: ceph-deploy osd [-h] {list,create} ...

Create OSDs from a data disk on a remote host:

ceph-deploy osd create {node} --data /path/to/device

For bluestore, optional devices can be used::

ceph-deploy osd create {node} --data /path/to/data --block-db /path/to/db-device
ceph-deploy osd create {node} --data /path/to/data --block-wal /path/to/wal-device
ceph-deploy osd create {node} --data /path/to/data --block-db /path/to/db-device --block-wal /path/to/wal-device

For filestore, the journal must be specified, as well as the objectstore::

ceph-deploy osd create {node} --filestore --data /path/to/data --journal /path/to/journal

For data devices, it can be an existing logical volume in the format of:
vg/lv, or a device. For other OSD components like wal, db, and journal, it
can be logical volume (in vg/lv format) or it must be a GPT partition.

positional arguments:
{list,create}
list List OSD info from remote host(s)
create Create new Ceph OSD daemon by preparing and activating a
device

optional arguments:
-h, --help show this help message and exit

As the help text explains, the data device can be an existing logical volume in VG/LV format, or a raw device. Other OSD components such as wal, db, and journal can be either a logical volume (VG/LV format) or a GPT partition.
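
As an illustration of the VG/LV form, an OSD data device could be prepared with LVM first and then passed to ceph-deploy. The volume group and LV names below (ceph-vg, osd-lv) are arbitrary examples; this walkthrough simply passes raw devices instead:

# Turn /dev/sdb into an LVM logical volume and use it as the OSD data device
pvcreate /dev/sdb
vgcreate ceph-vg /dev/sdb
lvcreate -l 100%FREE -n osd-lv ceph-vg
ceph-deploy osd create ceph-mon01 --data ceph-vg/osd-lv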

Using FileStore

Note
In this lab environment, of the three data disks on each VM, two are used as OSD data disks and the third is split into two partitions that serve as the journal devices for those two OSDs.

Partition the journal disk on every node. The partitions must be GPT, and the number of partitions must match the number of OSDs on that node.

parted -s /dev/sdd mklabel gpt
parted -s /dev/sdd mkpart primary 0% 50%
parted -s /dev/sdd mkpart primary 50% 100%

In the deployment directory ~/ceph-cluster on the ADM node, run the following commands to wipe the disks and create the OSDs node by node.

# ceph-deploy disk zap {node} /path/to/device [/path/to/device..]
ceph-deploy disk zap ceph-mon01 /dev/sdb /dev/sdc
ceph-deploy disk zap ceph-mon02 /dev/sdb /dev/sdc
ceph-deploy disk zap ceph-mon03 /dev/sdb /dev/sdc

# ceph-deploy osd create {node} --filestore --fs-type xfs --data /path/to/device --journal /path/to/journal
ceph-deploy osd create ceph-mon01 --filestore --fs-type xfs --data /dev/sdb --journal /dev/sdd1
ceph-deploy osd create ceph-mon01 --filestore --fs-type xfs --data /dev/sdc --journal /dev/sdd2

ceph-deploy osd create ceph-mon02 --filestore --fs-type xfs --data /dev/sdb --journal /dev/sdd1
ceph-deploy osd create ceph-mon02 --filestore --fs-type xfs --data /dev/sdc --journal /dev/sdd2

ceph-deploy osd create ceph-mon03 --filestore --fs-type xfs --data /dev/sdb --journal /dev/sdd1
ceph-deploy osd create ceph-mon03 --filestore --fs-type xfs --data /dev/sdc --journal /dev/sdd2

Using BlueStore

Note
In this lab environment, all three data disks on each VM are used as OSDs.

In the deployment directory ~/ceph-cluster on the ADM node, run the following commands to wipe the disks and create the OSDs node by node.

# ceph-deploy disk zap {node} /path/to/device [/path/to/device..]
ceph-deploy disk zap ceph-mon01 /dev/sdb /dev/sdc /dev/sdd
ceph-deploy disk zap ceph-mon02 /dev/sdb /dev/sdc /dev/sdd
ceph-deploy disk zap ceph-mon03 /dev/sdb /dev/sdc /dev/sdd

# ceph-deploy osd create {node} --fs-type xfs --data /path/to/device
ceph-deploy osd create ceph-mon01 --fs-type xfs --data /dev/sdb
ceph-deploy osd create ceph-mon01 --fs-type xfs --data /dev/sdc
ceph-deploy osd create ceph-mon01 --fs-type xfs --data /dev/sdd

ceph-deploy osd create ceph-mon02 --fs-type xfs --data /dev/sdb
ceph-deploy osd create ceph-mon02 --fs-type xfs --data /dev/sdc
ceph-deploy osd create ceph-mon02 --fs-type xfs --data /dev/sdd

ceph-deploy osd create ceph-mon03 --fs-type xfs --data /dev/sdb
ceph-deploy osd create ceph-mon03 --fs-type xfs --data /dev/sdc
ceph-deploy osd create ceph-mon03 --fs-type xfs --data /dev/sdd

Verification

Run ceph -s and confirm that the cluster health is HEALTH_OK.

[root@ceph-mon01 ~]# ceph -s
cluster:
id: c1876cf4-ad68-442d-9c97-1bf3163f3542
health: HEALTH_OK

services:
mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03 (age 60m)
mgr: ceph-mon02(active, since 60m), standbys: ceph-mon03, ceph-mon01
osd: 6 osds: 6 up (since 5m), 6 in (since 5m)

data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 649 MiB used, 599 GiB / 600 GiB avail
pgs:

Providing Block Storage with the Ceph Cluster


Decide on the default pg_num and replica count

  • pg_num: use the formula Total PGs = (#OSDs * 100) / pool size to decide pg_num (pgp_num should be set to the same value as pg_num). Here that gives 6*100/3 = 200, and the official advice is to round to the nearest power of two, e.g. 256. This figure covers all pools in the cluster, so the default pg_num per pool should be set smaller, e.g. 32.
  • Replica count: a default of 3 is recommended, with a minimum of 2.

Configure the default pg_num and replica count

Edit ~/ceph-cluster/ceph.conf and add the following under [global].

osd_pool_default_size = 3
osd_pool_default_min_size = 2
osd_pool_default_pg_num = 32
osd_pool_default_pgp_num = 32

Push the configuration file to all nodes.

ceph-deploy --overwrite-conf admin ceph-mon01 ceph-mon02 ceph-mon03

Apply the settings at runtime.

ceph tell mon.\* injectargs '--osd_pool_default_size=3'
ceph tell mon.\* injectargs '--osd_pool_default_min_size=2'
ceph tell mon.\* injectargs '--osd_pool_default_pg_num=32'
ceph tell mon.\* injectargs '--osd_pool_default_pgp_num=32'
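
To confirm the new values are in effect, you can query a MON's runtime configuration through its admin socket (run this on ceph-mon01; the same pattern works for the other MONs):

ceph daemon /var/run/ceph/ceph-mon.ceph-mon01.asok config show | grep osd_pool_default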

Create a block storage pool

Create a block storage pool named koenli.

# ceph osd pool create <poolname> <pg_num> <pgp_num>
ceph osd pool create koenli 32 32

# ceph osd pool application enable <poolname> <appname>
# <appname> is 'cephfs', 'rbd', 'rgw', or freeform for custom applications
ceph osd pool application enable koenli rbd
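
Optionally, double-check the pool's replica count, PG count, and application tag:

ceph osd pool get koenli size
ceph osd pool get koenli pg_num
ceph osd pool application get koenli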

After creating the pool, check the cluster status again and confirm that all PGs are active+clean and that the health is HEALTH_OK.

[root@ceph-mon01 ~]# ceph -s
cluster:
id: c1876cf4-ad68-442d-9c97-1bf3163f3542
health: HEALTH_OK

services:
mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03 (age 75m)
mgr: ceph-mon02(active, since 75m), standbys: ceph-mon03, ceph-mon01
osd: 6 osds: 6 up (since 20m), 6 in (since 20m)

data:
pools: 1 pools, 32 pgs
objects: 0 objects, 0 B
usage: 654 MiB used, 599 GiB / 600 GiB avail
pgs: 32 active+clean

Create an RBD image and map it locally

Create a 10G image.

# rbd create --size <size> <rbdname> --pool <poolname>
rbd create --size 10G disk01 --pool koenli

Inspect the RBD image

# rbd ls <poolname> -l
rbd ls koenli -l

# rbd info <poolname>/<rbdname>
rbd info koenli/disk01

Map the RBD image locally.

# rbd map <poolname>/<rbdname>
rbd map koenli/disk01

Check the mapping. koenli/disk01 is now mapped locally and behaves like a local disk, /dev/rbd0, which can be partitioned, formatted, mounted and so on; a minimal example follows the output below.

# rbd showmapped 
id pool namespace image snap device
0 koenli disk01 - /dev/rbd0
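
A minimal format-and-mount sequence might look like this (the mount point /mnt/rbd-disk01 is an arbitrary example):

# Create an XFS filesystem on the mapped device and mount it
mkfs.xfs /dev/rbd0
mkdir -p /mnt/rbd-disk01
mount /dev/rbd0 /mnt/rbd-disk01
df -h /mnt/rbd-disk01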

Unmap and delete the image

# rbd unmap <poolname>/<rbdname>
rbd unmap koenli/disk01

# rbd remove <poolname>/<rbdname>
rbd remove koenli/disk01

Providing File Storage with the Ceph Cluster


Enable the MDS service

In the deployment directory ~/ceph-cluster on the ADM node, run the following command to enable the MDS service.

# ceph-deploy mds create MDS [MDS..]
ceph-deploy mds create ceph-mon01 ceph-mon02 ceph-mon03

Run ceph -s to verify; output containing something like mds: 3 up:standby indicates the service is up.

# ceph -s
cluster:
id: c1876cf4-ad68-442d-9c97-1bf3163f3542
health: HEALTH_OK

services:
mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03 (age 40m)
mgr: ceph-mon01(active, since 39m), standbys: ceph-mon03, ceph-mon02
mds: 3 up:standby
osd: 6 osds: 6 up (since 39m), 6 in (since 17h)

data:
pools: 1 pools, 32 pgs
objects: 3 objects, 19 B
usage: 660 MiB used, 599 GiB / 600 GiB avail
pgs: 32 active+clean

Create the file storage pools

Create two pools: cephfs_metadata for file system metadata and cephfs_data for file system data.

# ceph osd pool create <poolname> <pg_num> <pgp_num>
ceph osd pool create cephfs_metadata 32 32
ceph osd pool create cephfs_data 32 32

# ceph osd pool application enable <poolname> <appname>
# <appname> is 'cephfs', 'rbd', 'rgw', or freeform for custom applications
ceph osd pool application enable cephfs_metadata cephfs
ceph osd pool application enable cephfs_data cephfs

Create the file system

# ceph fs new <fsname> <metadata_pool> <data_pool>
ceph fs new koenlifs cephfs_metadata cephfs_data

Check the file system and MDS status

# ceph fs ls
name: koenlifs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]

# ceph mds stat
koenlifs:1 {0=ceph-mon01=up:active} 2 up:standby
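
Optionally, instead of mounting with client.admin as in the steps below, you can create a dedicated CephFS client; client.koenli is just an example name, and the keyring it produces should be saved on the client host:

# Authorize a dedicated client for the file system and export its keyring
ceph fs authorize koenlifs client.koenli / rw
ceph auth get client.koenli -o /etc/ceph/ceph.client.koenli.keyring

With that in place, ceph-fuse would be run with -n client.koenli instead of -n client.admin.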

Verify by mounting

On the node that will perform the mount, install the EPEL repository and configure the Ceph repository for the matching release.

Note
Skip this step if the node doing the mounting is itself a Ceph cluster node.

# Install the EPEL repository
yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm -y

# Configure the Ceph repository for the matching release
# Replace the value of CEPH_STABLE_RELEASE with the Ceph release you want to install, for example nautilus
CEPH_STABLE_RELEASE=nautilus
cat > /etc/yum.repos.d/ceph.repo << EOF
[Ceph]
name=Ceph packages for \$basearch
baseurl=https://mirrors.aliyun.com/ceph/rpm-${CEPH_STABLE_RELEASE}/el7/\$basearch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc

[ceph-noarch]
name=Ceph noarch packages
baseurl=https://mirrors.aliyun.com/ceph/rpm-${CEPH_STABLE_RELEASE}/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc

[ceph-source]
name=Ceph source packages
baseurl=https://mirrors.aliyun.com/ceph/rpm-${CEPH_STABLE_RELEASE}/el7/SRPMS/
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc
EOF

Install ceph-fuse on the node that will perform the mount.

yum install ceph-fuse -y

Pick any node in the Ceph cluster and copy the files under /etc/ceph to the node that will perform the mount.

Note
Skip this step if the mounting client is itself a Ceph cluster node.

scp -r /etc/ceph/* <client_ip>:/etc/ceph/

Mount with ceph-fuse

Note
See the official documentation for more details on using ceph-fuse.

# ceph-fuse -n client.<username> -m <mon-ip1>:<mon-port>,<mon-ip2>:<mon-port>,<mon-ip3>:<mon-port> <mountpoint>
ceph-fuse -n client.admin -m 10.211.55.7:6789,10.211.55.8:6789,10.211.55.9:6789 /mnt/

Note
To mount automatically at boot, add an entry like the following to /etc/fstab.

# none    <mountpoint>  fuse.ceph ceph.id=<username>,ceph.conf=<path/to/ceph.conf>,_netdev,defaults  0 0
none /mnt/ fuse.ceph ceph.id=admin,ceph.conf=/etc/ceph/ceph.conf,_netdev,defaults 0 0
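
As an aside, on hosts with a sufficiently recent kernel the CephFS kernel client can be used instead of ceph-fuse; a sketch, assuming the admin user and the MON addresses above (mount.ceph comes with the ceph-common package):

# Extract the admin key and mount via the kernel client
ceph auth get-key client.admin > /etc/ceph/admin.secret
mount -t ceph 10.211.55.7:6789,10.211.55.8:6789,10.211.55.9:6789:/ /mnt -o name=admin,secretfile=/etc/ceph/admin.secret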

Unmount

# umount <mountpoint>
umount /mnt/

Note
If automatic mounting at boot was configured, also remove the corresponding entry from /etc/fstab.

Providing Object Storage with the Ceph Cluster


In the deployment directory ~/ceph-cluster on the ADM node, run the following command to enable the RGW service.

Note
radosgw's FastCGI can work with several kinds of web server, such as Apache2 and Nginx. Since the Hammer release, ceph-deploy deploys RGW with the embedded civetweb server by default, replacing the older Apache2-based deployment.

# ceph-deploy rgw create RGW [RGW..]
ceph-deploy rgw create ceph-mon01 ceph-mon02 ceph-mon03
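
By default civetweb listens on port 7480. If you want a different port, one approach (a sketch; port 80 is just an example, and one such section is needed per RGW node) is to add the following to ~/ceph-cluster/ceph.conf, push it with ceph-deploy --overwrite-conf admin as before, and restart ceph-radosgw@rgw.<hostname> on each RGW node:

[client.rgw.ceph-mon01]
rgw_frontends = "civetweb port=80"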

Run ceph -s to verify; output containing something like rgw: 3 daemons active (ceph-mon01, ceph-mon02, ceph-mon03) indicates the service is up.

# ceph -s
cluster:
id: c1876cf4-ad68-442d-9c97-1bf3163f3542
health: HEALTH_WARN
5 daemons have recently crashed

services:
mon: 3 daemons, quorum ceph-mon01,ceph-mon02,ceph-mon03 (age 63m)
mgr: ceph-mon01(active, since 63m), standbys: ceph-mon03, ceph-mon02
mds: koenlifs:1 {0=ceph-mon02=up:active} 2 up:standby
osd: 6 osds: 6 up (since 63m), 6 in (since 29h)
rgw: 3 daemons active (ceph-mon01, ceph-mon02, ceph-mon03)

task status:

data:
pools: 7 pools, 224 pgs
objects: 210 objects, 64 KiB
usage: 691 MiB used, 599 GiB / 600 GiB avail
pgs: 224 active+clean
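
A quick way to confirm the gateway answers is to hit the default civetweb port (7480) and, if you want to use the S3 API, create a test user; the uid testuser is an arbitrary example, and the command prints the access_key and secret_key to use with S3 clients:

# The gateway should return an anonymous ListAllMyBuckets XML response
curl http://ceph-mon01:7480

# Create an S3 user for testing
radosgw-admin user create --uid=testuser --display-name="Test User"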

Installing the Dashboard


Official documentation: https://docs.ceph.com/en/latest/mgr/dashboard/

Install ceph-mgr-dashboard on all MGR nodes.

yum install ceph-mgr-dashboard -y

On any one of the MGR nodes, run the following command to enable the Dashboard.

ceph mgr module enable dashboard

Create a certificate and the web login credentials

Note
Alternatively, SSL can be disabled entirely with ceph config set mgr mgr/dashboard/ssl false.

# Generate a self-signed certificate
ceph dashboard create-self-signed-cert

# Set the web login username and password
cat > /root/ceph-dashboard-password << EOF
123456
EOF

# ceph dashboard set-login-credentials <username> -i <passwordfile>
ceph dashboard set-login-credentials koenli -i /root/ceph-dashboard-password

Check how to access the service

# ceph mgr services
{
"dashboard": "https://ceph-mon01:8443/"
}

With everything configured, open https://ceph-mon01:8443/ in a browser and log in with username koenli and password 123456.

Note
The URL uses the hostname by default; you can replace it with the node's IP address or configure host resolution locally.

Note
If SSL is disabled, the default address is http://ceph-mon01:8080/ instead.

How to Remove an OSD


When the disk behind an OSD develops bad sectors or otherwise needs maintenance, the OSD can be removed from the cluster with the following steps.

Note
Replace {id} in the commands with the id of the OSD being removed. After each step, confirm the state described in the comment before moving on to the next one.

systemctl stop ceph-osd@{id}    # ceph osd tree should now show this OSD as down
ceph osd out osd.{id} # ceph osd tree should show the OSD's reweight as 0
ceph osd crush remove osd.{id} # ceph osd tree should show the OSD removed from root default
ceph osd rm osd.{id} # the OSD should no longer appear in ceph osd tree
ceph auth rm osd.{id} # ceph auth list | grep -A 4 osd.{id} should show no auth entry for the OSD
umount /var/lib/ceph/osd/ceph-{id}/
dmsetup remove ceph--xxxx--data--yyyy    # remove the leftover device-mapper mapping of the OSD's LV
dd if=/dev/zero of=/dev/sdb bs=512K count=1    # wipe the start of the OSD's data disk

Note
If the OSD remains up long after running systemctl stop ceph-osd@{id}, you can force it down with ceph osd down osd.{id}.

How to Wipe the Cluster


If anything strange happens during deployment that you cannot resolve, you can simply delete everything and start over:

ceph-deploy purge ceph-mon01 ceph-mon02 ceph-mon03
ceph-deploy purgedata ceph-mon01 ceph-mon02 ceph-mon03
ceph-deploy forgetkeys

Troubleshooting


ImportError: No module named pkg_resources

Symptom

Running ceph-deploy new MON [MON..] fails with the following error:

Traceback (most recent call last):
File "/usr/bin/ceph-deploy", line 18, in <module>
from ceph_deploy.cli import main
File "/usr/lib/python2.7/site-packages/ceph_deploy/cli.py", line 1, in <module>
import pkg_resources
ImportError: No module named pkg_resources

Solution

Install python-setuptools on the ceph-deploy node.

yum install python-setuptools -y

RuntimeError: Failed to execute command: ceph --version

Symptom

Running ceph-deploy install HOST [HOST..] fails with the following error:

[ceph-mon][INFO  ] Running command: ceph --version
[ceph-mon][ERROR ] Traceback (most recent call last):
[ceph-mon][ERROR ] File "/usr/lib/python2.7/site-packages/ceph_deploy/lib/vendor/remoto/process.py", line 119, in run
[ceph-mon][ERROR ] reporting(conn, result, timeout)
[ceph-mon][ERROR ] File "/usr/lib/python2.7/site-packages/ceph_deploy/lib/vendor/remoto/log.py", line 13, in reporting
[ceph-mon][ERROR ] received = result.receive(timeout)
[ceph-mon][ERROR ] File "/usr/lib/python2.7/site-packages/ceph_deploy/lib/vendor/remoto/lib/vendor/execnet/gateway_base.py", line 704, in receive
[ceph-mon][ERROR ] raise self._getremoteerror() or EOFError()
[ceph-mon][ERROR ] RemoteError: Traceback (most recent call last):
[ceph-mon][ERROR ] File "/usr/lib/python2.7/site-packages/ceph_deploy/lib/vendor/remoto/lib/vendor/execnet/gateway_base.py", line 1036, in executetask
[ceph-mon][ERROR ] function(channel, **kwargs)
[ceph-mon][ERROR ] File "<remote exec>", line 12, in _remote_run
[ceph-mon][ERROR ] File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
[ceph-mon][ERROR ] errread, errwrite)
[ceph-mon][ERROR ] File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
[ceph-mon][ERROR ] raise child_exception
[ceph-mon][ERROR ] OSError: [Errno 2] No such file or directory
[ceph-mon][ERROR ]
[ceph-mon][ERROR ]
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph --version

Solution

Log in to the failing node and run the following command to install Ceph.

yum install ceph ceph-radosgw -y

stderr: wipefs: error

Symptom

Running ceph-deploy disk zap, or zapping a replacement disk when swapping out an OSD's drive, fails with the following error:

[ceph-mon01][INFO  ] Running command: /usr/sbin/ceph-volume lvm zap /dev/sdb
[ceph-mon01][WARNIN] --> Zapping: /dev/sdb
[ceph-mon01][WARNIN] --> --destroy was not specified, but zapping a whole device will remove the partition table
[ceph-mon01][WARNIN] stderr: wipefs: error: /dev/sdb: probing initialization failed: Device or resource busy
[ceph-mon01][WARNIN] --> failed to wipefs device, will try again to workaround probable race condition
[ceph-mon01][WARNIN] --> RuntimeError: could not complete wipefs on device: /dev/sdb
[ceph-mon01][ERROR ] RuntimeError: command returned non-zero exit status: 1

Solution

Log in to the failing node, use lsblk to check whether the disk still carries a leftover device-mapper mapping, and remove it manually with dmsetup.

# lsblk 
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdb 8:16 0 100G 0 disk
└─ceph--5a9ad8a8--a449--4086--ac27--e18cf778e831-osd--journal--f77ff517--9fbc--4caa--be33--0ce675e07a19 253:0 0 100G 0 lvm

dmsetup remove ceph--5a9ad8a8--a449--4086--ac27--e18cf778e831-osd--journal--f77ff517--9fbc--4caa--be33--0ce675e07a19
dd if=/dev/zero of=/dev/sdb bs=512K count=1

mons are allowing insecure global_id reclaim

Symptom

The cluster status shows the following warning:

mons are allowing insecure global_id reclaim

Solution

Ceph v14.2.20 fixed a security vulnerability in the Ceph authentication framework and introduced this warning; see CVE-2021-20288 for details. It can be silenced with the settings below, though the documentation recommends disabling insecure global_id reclaim only after upgrading to the O release (Octopus).

ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false
ceph config set mon auth_allow_insecure_global_id_reclaim false

Unable to delete a pool

Symptom

Deleting a pool fails with the following error:

Error EPERM: pool deletion is disabled; you must first set the mon_allow_pool_delete config option to true before you can destroy a pool

Solution

Edit ~/ceph-cluster/ceph.conf and add the following configuration.

...
[mon]
mon_allow_pool_delete = true

Push the configuration file to all nodes.

ceph-deploy --overwrite-conf admin ceph-mon01 ceph-mon02 ceph-mon03

Update the mon_allow_pool_delete parameter at runtime.

ceph tell mon.\* injectargs '--mon_allow_pool_delete=true'
ceph daemon /var/run/ceph/ceph-mon.ceph-mon01.asok config show |grep mon_allow_pool_delete

Unable to map an RBD image locally

Symptom

Mapping an RBD image locally fails with the following error:

rbd: sysfs write failed
RBD image feature set mismatch. You can disable features unsupported by the kernel with "rbd feature disable koenli/disk01 object-map fast-diff deep-flatten".
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (6) No such device or address

Solution

The default CentOS 7 kernel does not support some Ceph image features, so they must be disabled manually before the map can succeed.

rbd feature disable koenli/disk01 object-map fast-diff deep-flatten

RGW fails to start

Symptom

After enabling the RGW service, ceph -s shows no rgw daemons under services, and systemctl status ceph-radosgw@rgw.<hostname> on the RGW node shows that the service failed to start.

Starting RGW manually produces the following error:

# /usr/bin/radosgw -d --cluster ceph --name client.rgw.ceph-mon01 --setuser ceph --setgroup ceph --debug-rgw=20
2021-09-03 21:51:03.372 7f4f900a5900 0 deferred set uid:gid to 167:167 (ceph:ceph)
2021-09-03 21:51:03.372 7f4f900a5900 0 ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable), process radosgw, pid 12559
2021-09-03 21:51:03.394 7f4f79cd5700 20 reqs_thread_entry: start
2021-09-03 21:51:03.401 7f4f900a5900 20 rados->read ofs=0 len=0
2021-09-03 21:51:03.987 7f4f900a5900 0 rgw_init_ioctx ERROR: librados::Rados::pool_create returned (34) Numerical result out of range (this can be due to a pool or placement group misconfiguration, e.g. pg_num < pgp_num or mon_max_pg_per_osd exceeded)
2021-09-03 21:51:03.987 7f4f900a5900 20 get_rados_obj() on obj=.rgw.root:default.realm returned -34
2021-09-03 21:51:03.987 7f4f900a5900 0 failed reading realm info: ret -34 (34) Numerical result out of range
2021-09-03 21:51:03.987 7f4f900a5900 0 ERROR: failed to start notify service ((34) Numerical result out of range
2021-09-03 21:51:03.987 7f4f900a5900 0 ERROR: failed to init services (ret=(34) Numerical result out of range)
2021-09-03 21:51:03.989 7f4f900a5900 -1 Couldn't init storage provider (RADOS)

Solution

Enabling the RGW service creates four new pools (.rgw.root, default.rgw.control, default.rgw.meta, default.rgw.log). According to the error message, a pool or placement group is misconfigured, e.g. pg_num < pgp_num or mon_max_pg_per_osd exceeded; in most cases it is the latter. This can be fixed by lowering the default pg_num and pgp_num used when creating new pools, or by lowering the pg_num and pgp_num of existing pools.

# Lower the default pg_num and pgp_num for new pools
ceph tell mon.\* injectargs '--osd_pool_default_pg_num=32'
ceph tell mon.\* injectargs '--osd_pool_default_pgp_num=32'

# Lower the pg_num and pgp_num of existing pools
ceph osd pool set <poolname> pg_num <pg_num>
ceph osd pool set <poolname> pgp_num <pgp_num>

References