Estimating the data migration volume in Ceph

Introduction

When adding or removing OSDs in a Ceph cluster, data migration is triggered. But when someone asks how much data will actually move, the usual answers are vague: "a lot", or "it depends on the environment". Is there a precise answer? I had thought about this before, and my idea was to compare the PG distribution before and after the change and compute the difference. While going through some material, I came across a blog post by alram, a developer at Inktank (Sage Weil's company), with a Python script that does exactly this. This post analyzes the comparison approach and shows the script in action.

Calculating the migration volume only requires the modified crushmap. The computation is done offline, so it has essentially no impact on the cluster.

Demonstration

Prepare the modified crushmap

Get the current crushmap:

ceph osd getcrushmap -o crushmap

Decompile the crushmap:

crushtool -d crushmap -o crushmap.txt

Edit crushmap.txt
Edit it into the crushmap you want; the change can add nodes or remove them.

Calculating for node removal

Suppose we delete osd.5: how much data must migrate?
Set the weight of osd.5 in the crushmap to 0, then compile the new crushmap:
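For reference, the relevant part of the decompiled crushmap.txt looks roughly like the fragment below (the bucket name, ids, and algorithm are illustrative values from a hypothetical single-host map, not taken from the cluster above):

```
host lab8106 {
	id -2
	alg straw
	hash 0
	# weight set to 0.000 to simulate removing osd.5
	item osd.5 weight 0.000
}
```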

crushtool -c crushmap.txt -o crushmapnew

Run the calculation script:

[root@lab8106 ceph]# python jisuan.py --crushmap-file crushmapnew
POOL                 REMAPPED OSDs        BYTES REBALANCE      OBJECTS REBALANCE
rbd                  59                   6157238296           1469
data                 54                   5918162968           1412
metadata             53                   5825888280           1390

The migration volume can be read straight from this output.
REMAPPED OSDs is the number of PG data copies that need to migrate; it is computed by comparing the mapping of each PG before and after the change:

[1,2] -> [1,2] : 0 copies migrate

[1,2] -> [4,2] : 1 copy migrates

[1,2] -> [4,3] : 2 copies migrate

What is counted is the number of such copies, so it is not exactly a count of PGs or of OSDs; think of it as the number of data copies inside PGs, since a single PG may have to migrate one, two, or more of its copies.
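The counting rule above can be sketched in a few lines of Python (a standalone illustration; the function name is mine, not part of the original script):

```python
# Count how many copies of a PG must move between an old and a new mapping.
# A copy moves for every OSD in the new up set that was not in the old one.
def remapped_count(old, new):
    return len(new) - len(set(old) & set(new))

print(remapped_count([1, 2], [1, 2]))  # 0
print(remapped_count([1, 2], [4, 2]))  # 1
print(remapped_count([1, 2], [4, 3]))  # 2
```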

Calculating for node addition

If we add an osd.6, how much data must migrate?
Run the script directly:

[root@lab8106 ceph]# python jisuan.py --crushmap-file crushmapnew
POOL                 REMAPPED OSDs        BYTES REBALANCE      OBJECTS REBALANCE
rbd                  0                    0                    0
data                 0                    0                    0
metadata             0                    0                    0

The numbers are all zero because one step inside the script fails: Ceph has an internal check that, when importing a crushmap into an osdmap, verifies that the number of OSDs in the osdmap matches the number in the crushmap.
So one extra step is needed here: set the cluster's max_osd to the planned OSD count. This is the only operation that touches the live cluster:

[root@lab8106 ceph]# ceph osd setmaxosd 7
set new max_osd = 7

Then run the script again:

[root@lab8106 ceph]# python jisuan.py --crushmap-file crushmapnew
POOL                 REMAPPED OSDs        BYTES REBALANCE      OBJECTS REBALANCE
rbd                  31                   3590324224           856
data                 34                   3372220416           804
metadata             41                   4492099584           1071

That covers the demonstration; next we analyze the script's internal logic.

Code and analysis

The code

#!/usr/bin/env python

import ast
import json
import os
import subprocess
import argparse
import sys

FNULL = open(os.devnull, 'w')

# assume the osdmap test output
# is the same length and order...
# if add support for PG increase
# that's gonna break
def diff_output(original, new, pools):
    number_of_osd_remap = 0
    osd_data_movement = 0

    results = {}

    pg_data, pg_objects = get_pg_info()

    for i in range(len(original)):
        orig_i = original[i]
        new_i = new[i]

        if orig_i[0].isdigit():
            pg_id = orig_i.split('\t')[0]
            pool_id = pg_id[0]
            pool_name = pools[pool_id]['pool_name']

            if not pool_name in results:
                results[pool_name] = {}
                results[pool_name]['osd_remap_counter'] = 0
                results[pool_name]['osd_bytes_movement'] = 0
                results[pool_name]['osd_objects_movement'] = 0

            original_mappings = ast.literal_eval(orig_i.split('\t')[1])
            new_mappings = ast.literal_eval(new_i.split('\t')[1])
            intersection = list(set(original_mappings).intersection(set(new_mappings)))

            osd_movement_for_this_pg = int(pools[pool_id]['pool_size']) - len(intersection)
            osd_data_movement_for_this_pg = int(osd_movement_for_this_pg) * int(pg_data[pg_id])
            osd_object_movement_for_this_pg = int(osd_movement_for_this_pg) * int(pg_objects[pg_id])

            results[pool_name]['osd_remap_counter'] += osd_movement_for_this_pg
            results[pool_name]['osd_bytes_movement'] += int(osd_data_movement_for_this_pg)
            results[pool_name]['osd_objects_movement'] += int(osd_object_movement_for_this_pg)

        elif orig_i.startswith('#osd'):
            break

    return results

def get_pools_info(osdmap_path):
    pools = {}
    args = ['osdmaptool', '--print', osdmap_path]
    osdmap_out = subprocess.check_output(args, stderr=FNULL).split('\n')
    for line in osdmap_out:
        if line.startswith('pool'):
            pool_id = line.split()[1]
            pool_size = line.split()[5]
            pool_name = line.split()[2].replace("'","")
            pools[pool_id] = {}
            pools[pool_id]['pool_size'] = pool_size
            pools[pool_id]['pool_name'] = pool_name
        elif line.startswith('max_osd'):
            break

    return pools

def get_osd_map(osdmap_path):
    args = ['sudo', 'ceph', 'osd', 'getmap', '-o', osdmap_path]
    subprocess.call(args, stdout=FNULL, stderr=subprocess.STDOUT)

def get_pg_info():
    pg_data = {}
    pg_objects = {}
    args = ['sudo', 'ceph', 'pg', 'dump']
    pgmap = subprocess.check_output(args, stderr=FNULL).split('\n')

    for line in pgmap:
        if line[0].isdigit():
            pg_id = line.split('\t')[0]
            pg_bytes = line.split('\t')[6]
            pg_obj = line.split('\t')[1]
            pg_data[pg_id] = pg_bytes
            pg_objects[pg_id] = pg_obj
        elif line.startswith('pool'):
            break

    return pg_data, pg_objects

def osdmaptool_test_map_pgs_dump(original_osdmap_path, crushmap):
    new_osdmap_path = original_osdmap_path + '.new'
    get_osd_map(original_osdmap_path)
    args = ['osdmaptool', '--test-map-pgs-dump', original_osdmap_path]
    original_osdmaptool_output = subprocess.check_output(args, stderr=FNULL).split('\n')

    args = ['cp', original_osdmap_path, new_osdmap_path]
    subprocess.call(args, stdout=FNULL, stderr=subprocess.STDOUT)
    args = ['osdmaptool', '--import-crush', crushmap, new_osdmap_path]
    subprocess.call(args, stdout=FNULL, stderr=subprocess.STDOUT)
    args = ['osdmaptool', '--test-map-pgs-dump', new_osdmap_path]
    new_osdmaptool_output = subprocess.check_output(args, stderr=FNULL).split('\n')

    pools = get_pools_info(original_osdmap_path)
    results = diff_output(original_osdmaptool_output, new_osdmaptool_output, pools)

    return results


def dump_plain_output(results):
    sys.stdout.write("%-20s %-20s %-20s %-20s\n" % ("POOL", "REMAPPED OSDs", "BYTES REBALANCE", "OBJECTS REBALANCE"))

    for pool in results:
        sys.stdout.write("%-20s %-20s %-20s %-20s\n" % (
            pool,
            results[pool]['osd_remap_counter'],
            results[pool]['osd_bytes_movement'],
            results[pool]['osd_objects_movement']
        ))

def cleanup(osdmap):
    FNULL.close()
    new_osdmap = osdmap + '.new'
    os.remove(new_osdmap)

def parse_args():
    parser = argparse.ArgumentParser(description='Ceph CRUSH change data movement calculator.')

    parser.add_argument(
        '--osdmap-file',
        help="Where to save the original osdmap. Temp one will be <location>.new. Default: /tmp/osdmap",
        default="/tmp/osdmap",
        dest="osdmap_path"
    )
    parser.add_argument(
        '--crushmap-file',
        help="CRUSHmap to run the movement test against.",
        required=True,
        dest="new_crushmap"
    )

    parser.add_argument(
        '--format',
        help="Output format. Default: plain",
        choices=['json', 'plain'],
        dest="format",
        default="plain"
    )

    args = parser.parse_args()
    return args

if __name__ == '__main__':
    ctx = parse_args()

    results = osdmaptool_test_map_pgs_dump(ctx.osdmap_path, ctx.new_crushmap)
    cleanup(ctx.osdmap_path)

    if ctx.format == 'json':
        print json.dumps(results)
    elif ctx.format == 'plain':
        dump_plain_output(results)

The script is reproduced here for easy copying; you can also get it from the original author's gist.

Analysis of the main steps

First, get the osdmap:

ceph osd getmap -o /tmp/osdmap

Get the original PG distribution:

osdmaptool --test-map-pgs-dump /tmp/osdmap

Get the new crushmap:

(this is the crushmap you edited into the desired state)

Inject the new crushmap into the osdmap to get a new osdmap:

osdmaptool --import-crush crushmap /tmp/new_osdmap_path

Compute the new distribution from the new osdmap:

osdmaptool  --test-map-pgs-dump /tmp/new_osdmap_path

Finally, compare the two outputs to get the result.
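The comparison step can be sketched like this (the two dump lines below are simplified stand-ins for real `osdmaptool --test-map-pgs-dump` output, which carries more columns; here each line is just "pgid\tup set"):

```python
import ast

# Simplified "<pgid>\t<up set>" lines; real dump output has extra columns.
before = ["0.0\t[1, 2]", "0.1\t[0, 2]"]
after  = ["0.0\t[4, 3]", "0.1\t[0, 2]"]

moved = 0
for old_line, new_line in zip(before, after):
    old_set = ast.literal_eval(old_line.split('\t')[1])
    new_set = ast.literal_eval(new_line.split('\t')[1])
    # copies that appear in the new mapping but not the old one must migrate
    moved += len(new_set) - len(set(old_set) & set(new_set))

print(moved)  # pg 0.0 moves 2 copies, pg 0.1 moves 0
```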

Related links

Calculate data migration when changing the CRUSHmap
alram/crush_data_movement_calculator.py

Change log

Why        Who                When
Created    武汉-运维-磨渣    2017-02-08