Some thoughts on Redis RDB loading
28 Dec 2023
These are some notes from a chat with the mc team, pasted here as-is.
RDB format
https://github.com/sripathikrishnan/redis-rdb-tools/blob/d39c8e5127daf3e109c0f0e101af8ed0e5400493/docs/RDB_File_Format.textile
----------------------------# RDB is a binary format. There are no new lines or spaces in the file.
52 45 44 49 53 # Magic String "REDIS"
30 30 30 33 # RDB Version Number in big endian. In this case, version = 0003 = 3
----------------------------
FE 00 # FE = code that indicates database selector. db number = 00
----------------------------# Key-Value pair starts
FD $unsigned int # FD indicates "expiry time in seconds". After that, expiry time is read as a 4 byte unsigned int
$value-type # 1 byte flag indicating the type of value - set, map, sorted set etc.
$string-encoded-key # The key, encoded as a redis string
$encoded-value # The value. Encoding depends on $value-type
----------------------------
FC $unsigned long # FC indicates "expiry time in ms". After that, expiry time is read as an 8 byte unsigned long
$value-type # 1 byte flag indicating the type of value - set, map, sorted set etc.
$string-encoded-key # The key, encoded as a redis string
$encoded-value # The value. Encoding depends on $value-type
----------------------------
$value-type # This key value pair doesn't have an expiry. $value_type guaranteed != to FD, FC, FE and FF
$string-encoded-key
$encoded-value
----------------------------
FE $length-encoding # Previous db ends, next db starts. Database number read using length encoding.
----------------------------
... # Key value pairs for this database, additional databases
FF ## End of RDB file indicator
8 byte checksum ## CRC 64 checksum of the entire file.
Roughly it is:
magic - version - kv pairs - checksum
each kv:
key - object - encoding - value
value - object - encoding
Parsing flow
/* Excerpt of the main loading loop in rdb.c */
while(1) {
    sds key;
    robj *val;

    /* Read type. */
    if ((type = rdbLoadType(rdb)) == -1) goto eoferr;

    /* Handle special types (expire times, SELECTDB, EOF, ...). */
    if (type == RDB_OPCODE_EXPIRETIME) {
        ...
    }
    ...
}
All the I/O is wrapped in rio; the rdbLoad* helpers ultimately go through rio's read/write to pull bytes from the file.
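As a rough sketch of what that buffered layer looks like (a made-up, simplified struct for illustration, not the real rio from rio.h, which is a small vtable-style abstraction over file / buffer / socket targets; the names here are mine), rdbLoad*-style helpers would call buf_read() instead of hitting the fd directly:

#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical simplified buffer layer, for illustration only. */
typedef struct {
    int      fd;
    uint8_t  buf[32 * 1024];   /* read-ahead buffer */
    size_t   len;              /* bytes currently in buf */
    size_t   pos;              /* next unread byte */
    uint64_t cksum;            /* the refill would also feed the running CRC64 */
} bufreader;

/* Read n bytes into dst, refilling the buffer from the fd when it runs dry. */
static int buf_read(bufreader *r, void *dst, size_t n) {
    uint8_t *out = dst;
    while (n > 0) {
        if (r->pos == r->len) {                    /* buffer exhausted: refill */
            ssize_t got = read(r->fd, r->buf, sizeof(r->buf));
            if (got <= 0) return -1;               /* EOF or error */
            r->len = (size_t)got;
            r->pos = 0;
        }
        size_t take = r->len - r->pos;
        if (take > n) take = n;
        memcpy(out, r->buf + r->pos, take);
        r->pos += take; out += take; n -= take;
    }
    return 0;
}

Batching that refill, or issuing it ahead of time with AIO/io_uring, is exactly the "buffer + batch read" split in the notes below.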
- The loading path could be split into a buffer + batch-read model: separate the reading from the buffering, do the reads with AIO, and speed up both reading and parsing (roughly the buf_read() sketch above, with the refill issued ahead of time).
- How to make it concurrent? Keys are almost certainly not duplicated (unless the dump happened during a rehash, in which case bad luck).
- It is import logic anyway, so causal ordering between keys doesn't need to be considered.
Plan: read in 32 MB pages, 10 in flight concurrently; each reader remembers the key/value chopped off at its boundaries and handles the data in the middle (see the chunking sketch after this list).
- Problem: some values are huge and a 32 MB boundary can land right in the middle of one, so that page can only be pinned. In the extreme case a buffer contains no key at all, only value bytes, and the whole read is wasted.
- There is no obvious marker for telling keys and values apart or locating their boundaries.
FF marks the end of the RDB: keep accumulating the checksum as buffers are read, then verify it right after reading the trailing checksum. FE is the db selector; by default there is only db 0 (multiple databases are arguably a design mistake, and probably nobody uses them).
Whenever FC or FD shows up, a key follows.
- How do you quickly find those few special bytes in a stretch of data? Feels like a job for SIMD (see the SSE2 sketch after this list).
- If none are found, this chunk is part of the previous buffer's record: pin it.
- Buffer size? How do you pick a reasonable one?
- DAG / taskflow: the buffer and parse tasks have dependencies between each other, something like sync_wait.
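For the "find the special bytes quickly" point, here is an SSE2 sketch that scans a buffer for the first byte in the 0xFC-0xFF opcode range. The function name is made up, and treating such a byte as a record boundary is the heuristic from the notes above; those byte values can also appear inside string and length payloads, so a hit is only a candidate that still needs validation.

#include <emmintrin.h>   /* SSE2 */
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first byte in [p, p+len) whose value is in
 * 0xFC..0xFF (the expiry / selectdb / eof opcodes), or len if there is none.
 * SSE2 has no unsigned byte compare, but as signed chars those values are
 * exactly -4..-1, so two signed compares are enough. */
static size_t find_opcode(const uint8_t *p, size_t len) {
    const __m128i lo = _mm_set1_epi8(-5);   /* v > -5  <=>  v >= -4 */
    const __m128i hi = _mm_set1_epi8(0);    /* v <  0  <=>  v is 0x80..0xFF */
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(p + i));
        __m128i m = _mm_and_si128(_mm_cmpgt_epi8(v, lo),
                                  _mm_cmplt_epi8(v, hi));
        int bits = _mm_movemask_epi8(m);
        if (bits) return i + (size_t)__builtin_ctz(bits);
    }
    for (; i < len; i++)                     /* scalar tail */
        if (p[i] >= 0xFC) return i;
    return len;
}

With AVX2 or AVX-512 the same comparison just widens to 32 or 64 bytes per iteration.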
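And the chunked import idea itself, as a very rough sketch under the assumptions in the notes: fixed-size pages, find_opcode() from the SSE2 sketch as the resync heuristic, parse_range() left as a stub, and scan_chunk/parallel_load as made-up names. Each worker only records where its first candidate opcode sits; everything before that point is the tail of a record that started in the previous chunk, so it is handed back to that chunk ("pinned") during stitching.

#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct job {
    const uint8_t *base;    /* mmap'ed RDB body, past the header / db selector */
    size_t start, end;      /* this worker's chunk, e.g. 32 MB wide */
    size_t resync;          /* absolute offset of the first candidate opcode */
};

static size_t find_opcode(const uint8_t *p, size_t len);   /* SSE2 sketch above */

/* Phase 1: each worker scans only its own chunk for a resync point. */
static void *scan_chunk(void *arg) {
    struct job *j = arg;
    size_t off = find_opcode(j->base + j->start, j->end - j->start);
    j->resync = j->start + off;   /* == end if nothing was found: the whole chunk
                                     is value bytes and stays pinned to its left
                                     neighbour (the "wasted read" case) */
    return NULL;
}

/* Stub: a real version would run the normal rdbLoad-style parser over
 * [from, to) and insert whatever keys it decodes. */
static void parse_range(const uint8_t *base, size_t from, size_t to) {
    printf("parse [%zu, %zu)\n", from, to);
    (void)base;
}

static void parallel_load(const uint8_t *base, size_t len, int nworkers) {
    pthread_t th[nworkers];
    struct job jobs[nworkers];
    size_t per = (len + nworkers - 1) / nworkers;   /* ~32 MB pages in the notes */

    int n = 0;
    for (size_t s = 0; s < len; s += per, n++) {
        jobs[n] = (struct job){ base, s, s + per < len ? s + per : len, 0 };
        pthread_create(&th[n], NULL, scan_chunk, &jobs[n]);
    }
    for (int i = 0; i < n; i++) pthread_join(th[i], NULL);

    /* Phase 2: stitch. Chunk i parses from its own resync point up to the next
     * chunk's resync point, so a record cut at a boundary is parsed in one piece. */
    for (int i = 0; i < n; i++) {
        size_t from = (i == 0) ? 0 : jobs[i].resync;
        size_t to   = (i + 1 < n) ? jobs[i + 1].resync : len;
        if (from < to) parse_range(base, from, to);   /* these can run concurrently too */
    }
}

Seen as a DAG, parsing range i depends on the scans of chunks i and i+1 having finished, which is the taskflow / sync_wait style dependency from the last bullet.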
Conclusion: chunked, concurrent import is doable. In practice, though, RDB files are usually kept under 8 GB, memory bandwidth is plenty, and load time isn't really the bottleneck.
If the RDB has to be downloaded first, e.g. pulled back from S3, then the case for partial and concurrent import becomes much stronger.