Installing the Ceph Distributed Storage System on CentOS 7.1

This article will not repeat an introduction to Ceph; see the official documentation to learn more.

Sage Weil developed this impressive distributed storage system during his PhD. It originally aimed to be a high-performance distributed file system, but when the cloud computing wave arrived, Ceph shifted its focus to distributed block storage (Block Storage) and distributed object storage (Object Storage); the distributed file system CephFS is still in beta. Ceph is now the hottest open-source storage solution for cloud computing and virtual machine deployments; reportedly around 20% of OpenStack deployments use Ceph's Block Storage.

Ceph provides three kinds of storage: object storage, block storage, and a file system. The figure below shows the architecture of a Ceph storage cluster:

[Figure: Ceph storage cluster architecture]

Hardware preparation


Due to limited resources, all machines in this article are virtual machines. Three VMs were prepared: 1 as the monitor node (ceph-mon) and 2 as storage nodes (ceph-osd1 and ceph-osd2).

Ceph wants an odd number of monitor nodes, and at least 3 in a real deployment (if you are just playing around, 1 also works). ceph-adm is optional; you can place ceph-adm on a monitor, but keeping ceph-adm separate makes the architecture a bit clearer. You can also place a mon on an OSD, although that is not recommended in production.

  • The ADM server's hardware is flexible; it is only used to operate and manage Ceph;
  • The MON server uses 1 disk for the operating system;
  • Each OSD server uses 4 × 20GB disks for Ceph storage, one OSD per disk. Every OSD needs a journal, so the 4 disks need 4 journals; we use one more 20GB disk as the journal disk and split it into 4 equal partitions, so that each partition serves as the journal for one OSD disk.

Software preparation


All Ceph cluster nodes run CentOS 7.1 (CentOS-7-x86_64-Minimal-1503-01), and all file systems use xfs, as officially recommended by Ceph.

After installing CentOS, we need to do some basic configuration on every node (including ceph-adm), such as disabling SELinux, disabling the firewall, and synchronizing the time.

# Disable SELinux
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
reboot

# Disable the firewall (firewalld)
systemctl stop firewalld
systemctl disable firewalld

# Install the EPEL repository
rpm -Uvh https://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-8.noarch.rpm

# Synchronize the time
yum install -y ntp
ntpdate asia.pool.ntp.org

# Edit /etc/hosts
cat >> /etc/hosts << EOF
192.168.128.131 ceph-mon
192.168.128.132 ceph-osd1
192.168.128.133 ceph-osd2
EOF
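
The ntpdate call above only syncs the clock once. Since Ceph monitors are sensitive to clock skew, it is usually worth also enabling the ntpd service so the clocks stay in sync; a minimal sketch, assuming the ntp package installed above:

# Keep clocks in sync continuously, not just once
systemctl enable ntpd
systemctl start ntpd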

On each OSD server, partition the 4 data disks and create XFS file systems on them. Split the journal disk into 4 partitions, one per data disk; do not create file systems on these partitions, leave them for Ceph to handle.

parted -a optimal --script /dev/sdc -- mktable gpt
parted -a optimal --script /dev/sdc -- mkpart primary xfs 0% 100%
mkfs.xfs -f /dev/sdc1

In production, each OSD server has far more than 4 disks, so the commands above would have to be repeated for many disks, and more servers will be added over time. Writing them into a parted.sh script makes the operation easier; /dev/sdc|d|e|f are the 4 data disks and /dev/sdb is the journal disk:

#!/bin/bash

set -e
if [ ! -x "/sbin/parted" ]; then
    echo "This script requires /sbin/parted to run!" >&2
    exit 1
fi

# Data disks: GPT label plus a single XFS partition per disk, one OSD per disk
DISKS="c d e f"
for i in ${DISKS}; do
    echo "Creating partitions on /dev/sd${i} ..."
    parted -a optimal --script /dev/sd${i} -- mktable gpt
    parted -a optimal --script /dev/sd${i} -- mkpart primary xfs 0% 100%
    sleep 1
    echo "Formatting /dev/sd${i}1 ..."
    mkfs.xfs -f /dev/sd${i}1 &
done

# Journal disk: 4 partitions, one journal per OSD disk, no file system needed
JOURNALDISK="b"
for i in ${JOURNALDISK}; do
    parted -s /dev/sd${i} mklabel gpt
    parted -s /dev/sd${i} mkpart primary 0% 25%
    parted -s /dev/sd${i} mkpart primary 26% 50%
    parted -s /dev/sd${i} mkpart primary 51% 75%
    parted -s /dev/sd${i} mkpart primary 76% 100%
done

# Wait for the background mkfs.xfs jobs to finish
wait
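
Once the passwordless ssh described in the next step is in place, the script could be copied to and run on each OSD node from ceph-mon; a minimal sketch of one way to do that:

# Distribute parted.sh to each OSD node and run it there
for host in ceph-osd1 ceph-osd2; do
    scp parted.sh ${host}:/root/
    ssh ${host} "bash /root/parted.sh"
done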

Run ssh-keygen on ceph-mon to generate an ssh key (note that the passphrase is empty), then copy the ssh key to every Ceph node:

[root@ceph-mon ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
36:1d:33:20:38:15:4a:8f:50:c4:94:bc:43:ef:a2:a2 root@ceph-mon
The key's randomart image is:
+--[ RSA 2048]----+
| .*=++.. |
| oB+ . . |
| .o+. + |
| o . . + |
| o S . |
| . .. . |
| . . |
|. . |
|E. |
+-----------------+
[root@ceph-mon ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub ceph-mon
[root@ceph-mon ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub ceph-osd1
[root@ceph-mon ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub ceph-osd2

From ceph-mon, make sure you can ssh to every node without a password.
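
A quick way to verify this is a loop that uses ssh's BatchMode, which fails instead of prompting when key authentication is not working; a minimal sketch:

# Each command should print the hostname without asking for a password
for host in ceph-mon ceph-osd1 ceph-osd2; do
    ssh -o BatchMode=yes ${host} hostname
done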

Ceph deployment


Compared with installing Ceph manually on every node, installing it with the ceph-deploy tool is much more convenient:

yum install ceph-deploy -y

Create a ceph working directory; all subsequent operations are performed in this directory:

[root@ceph-mon ~]# mkdir ~/ceph-cluster
[root@ceph-mon ~]# cd ceph-cluster/

Initialize the cluster and tell ceph-deploy which nodes are the monitor nodes. After the command succeeds, ceph.conf, ceph.log, ceph.mon.keyring and related files are generated in the ceph-cluster directory:

[root@ceph-mon ceph-cluster]# ceph-deploy new ceph-mon

Install Ceph on every Ceph node:

[root@ceph-mon ceph-cluster]# ceph-deploy install ceph-mon ceph-osd1 ceph-osd2

An error similar to the following may appear here:

[ceph-mon][ERROR ]   File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
[ceph-mon][ERROR ] raise child_exception
[ceph-mon][ERROR ] OSError: [Errno 2] No such file or directory
[ceph-mon][ERROR ]
[ceph-mon][ERROR ]
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph --version

The fix is to run the following command on the node that reported the error:

[root@ceph-mon ceph-cluster]# yum install *argparse* -y

Initialize the monitor node:

[root@ceph-mon ceph-cluster]# ceph-deploy mon create-initial

Take a look at the disks on the Ceph storage nodes:

[root@ceph-mon ceph-cluster]# ceph-deploy disk list ceph-osd1
[root@ceph-mon ceph-cluster]# ceph-deploy disk list ceph-osd2

Zap (initialize) the Ceph disks, then create the OSDs. Each argument maps a storage node, one data disk, and its corresponding journal partition (node:disk:journal), one to one:

# Create the OSDs on ceph-osd1
[root@ceph-mon ceph-cluster]# ceph-deploy disk zap ceph-osd1:sdc ceph-osd1:sdd ceph-osd1:sde ceph-osd1:sdf
[root@ceph-mon ceph-cluster]# ceph-deploy osd create ceph-osd1:sdc:/dev/sdb1 ceph-osd1:sdd:/dev/sdb2 ceph-osd1:sde:/dev/sdb3 ceph-osd1:sdf:/dev/sdb4

# Create the OSDs on ceph-osd2
[root@ceph-mon ceph-cluster]# ceph-deploy disk zap ceph-osd2:sdc ceph-osd2:sdd ceph-osd2:sde ceph-osd2:sdf
[root@ceph-mon ceph-cluster]# ceph-deploy osd create ceph-osd2:sdc:/dev/sdb1 ceph-osd2:sdd:/dev/sdb2 ceph-osd2:sde:/dev/sdb3 ceph-osd2:sdf:/dev/sdb4

Finally, push the generated configuration files from the admin node (ceph-mon here, since we did not set up a separate ceph-adm) to the other nodes, so that every node has the same ceph configuration:

[root@ceph-mon ceph-cluster]# ceph-deploy --overwrite-conf admin ceph-mon ceph-osd1 ceph-osd2
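
As a quick sanity check (not part of the original steps), you can run a ceph command from one of the other nodes; if it complains about the keyring when run as a non-root user, making the admin keyring readable on that node is a common workaround:

# Verify that another node sees the cluster
ssh ceph-osd1 ceph health
# Only needed if ceph commands are run as a non-root user on that node
# chmod +r /etc/ceph/ceph.client.admin.keyring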

Testing

Check whether the setup succeeded:

[root@ceph-mon ceph-cluster]# ceph health
HEALTH_WARN 6 pgs degraded; 6 pgs stuck degraded; 64 pgs stuck unclean; 6 pgs stuck undersized; 6 pgs undersized; too few PGs per OSD (16 < min 30)

Increase the number of PGs. pg_num is determined by the formula Total PGs = (#OSDs * 100) / pool size (and pgp_num should be set to the same value as pg_num), so 8 * 100 / 2 = 400. Ceph recommends rounding to the nearest power of 2, which gives 512 (choose 256 if that turns out to be too large). If all goes well, you should then see HEALTH_OK:

[root@ceph-mon ceph-cluster]# ceph osd pool set rbd size 2
set pool 0 size to 2
[root@ceph-mon ceph-cluster]# ceph osd pool set rbd min_size 2
set pool 0 min_size to 2
[root@ceph-mon ceph-cluster]# ceph osd pool set rbd pg_num 256
set pool 0 pg_num to 256
[root@ceph-mon ceph-cluster]# ceph osd pool set rbd pgp_num 256
set pool 0 pgp_num to 256
[root@ceph-mon ceph-cluster]# ceph health
HEALTH_OK

For more detail:

[root@ceph-mon ceph-cluster]# ceph -s
cluster 38a7726b-6018-41f4-83c2-911b325116df
health HEALTH_OK
monmap e1: 1 mons at {ceph-mon=192.168.128.131:6789/0}
election epoch 2, quorum 0 ceph-mon
osdmap e46: 8 osds: 8 up, 8 in
pgmap v72: 256 pgs, 1 pools, 0 bytes data, 0 objects
276 MB used, 159 GB / 159 GB avail
256 active+clean
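
For reference, the rule-of-thumb arithmetic used above can be scripted for when the OSD count changes later; a minimal sketch with this cluster's numbers:

# (#OSDs * 100) / replica size, rounded up to the next power of 2
osds=8; size=2
target=$(( osds * 100 / size ))   # 400 for this cluster
pg=1
while [ ${pg} -lt ${target} ]; do pg=$(( pg * 2 )); done
echo "suggested pg_num: ${pg}"    # 512; we chose 256 above to keep it smaller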

If everything works, remember to write the settings above into the ceph.conf file and push it out to all the deployed nodes:

[root@ceph-mon ceph-cluster]# echo "osd pool default size = 2"  >> ~/ceph-cluster/ceph.conf
[root@ceph-mon ceph-cluster]# echo "osd pool default min size = 2" >> ~/ceph-cluster/ceph.conf
[root@ceph-mon ceph-cluster]# echo "osd pool default pg num = 256" >> ~/ceph-cluster/ceph.conf
[root@ceph-mon ceph-cluster]# echo "osd pool default pgp num = 256" >> ~/ceph-cluster/ceph.conf
[root@ceph-mon ceph-cluster]# ceph-deploy --overwrite-conf admin ceph-mon ceph-osd1 ceph-osd2
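
To confirm the push worked, you can check that the new options actually landed on the other nodes; a minimal check, assuming the default /etc/ceph/ceph.conf path used by ceph-deploy:

for host in ceph-osd1 ceph-osd2; do
    ssh ${host} grep "osd pool default" /etc/ceph/ceph.conf
done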

If you could start all over again

If any strange problem shows up during deployment and cannot be resolved, you can simply wipe everything and start over:

[root@ceph-mon ceph-cluster]# ceph-deploy purge ceph-mon ceph-osd1 ceph-osd2
[root@ceph-mon ceph-cluster]# ceph-deploy purgedata ceph-mon ceph-osd1 ceph-osd2
[root@ceph-mon ceph-cluster]# ceph-deploy forgetkeys

Troubleshooting

If any network problem shows up, first confirm that the nodes can ssh to each other without a password and that the firewall on every node is either disabled or has the appropriate rules added.
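
Beyond ssh, it can also help to confirm that the monitor port (6789 by default, as seen in the ceph -s output above) is reachable from each node; a minimal sketch using bash's /dev/tcp:

# Run on each OSD node; reports whether the monitor port is open
timeout 3 bash -c 'cat < /dev/null > /dev/tcp/ceph-mon/6789' \
    && echo "ceph-mon:6789 reachable" \
    || echo "ceph-mon:6789 NOT reachable"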


Author: Koen

Reference: http://www.vpsee.com/2015/07/install-ceph-on-centos-7/