是和mc团队聊天的一些记录,直接贴出来

rdb格式

https://github.com/sripathikrishnan/redis-rdb-tools/blob/d39c8e5127daf3e109c0f0e101af8ed0e5400493/docs/RDB_File_Format.textile

----------------------------# RDB is a binary format. There are no new lines or spaces in the file.
52 45 44 49 53              # Magic String "REDIS"
30 30 30 33                 # RDB Version Number in big endian. In this case, version = 0003 = 3
----------------------------
FE 00                       # FE = code that indicates database selector. db number = 00
----------------------------# Key-Value pair starts
FD $unsigned int            # FD indicates "expiry time in seconds". After that, expiry time is read as a 4 byte unsigned int
$value-type                 # 1 byte flag indicating the type of value - set, map, sorted set etc.
$string-encoded-key         # The key, encoded as a redis string
$encoded-value              # The value. Encoding depends on $value-type
----------------------------
FC $unsigned long           # FC indicates "expiry time in ms". After that, expiry time is read as a 8 byte unsigned long
$value-type                 # 1 byte flag indicating the type of value - set, map, sorted set etc.
$string-encoded-key         # The key, encoded as a redis string
$encoded-value              # The value. Encoding depends on $value-type
----------------------------
$value-type                 # This key value pair doesn't have an expiry. $value_type guaranteed != to FD, FC, FE and FF
$string-encoded-key
$encoded-value
----------------------------
FE $length-encoding         # Previos db ends, next db starts. Database number read using length encoding.
----------------------------
...                         # Key value pairs for this database, additonal database


FF                          ## End of RDB file indicator
8 byte checksum             ## CRC 32 checksum of the entire file.

大概是这样

magic-version-kv-checksum

kv

key - object - encoding - value

value - obejct - encoding


解析流程


   while(1) {
        sds key;
        robj *val;

        /* Read type. */
        if ((type = rdbLoadType(rdb)) == -1) goto eoferr;

        /* Handle special types. */
        if (type == RDB_OPCODE_EXPIRETIME) {

所有操作封装在rio中,rdbload底层调用rio的read write来读数据

  • 可以拆成buffer + batch read模式,拆分出读和buffer, AIO,加快读取解析速度
  • 如何并发?已知key大概率没有重复(除非rehash期间dump,倒霉了)
    • 本身是导入逻辑,不考虑key因果关系

32M page读,并发读10个,分别记住被斩断的key value,处理中间的数据

  • 存在value特别巨大的场景,32M正好在中间,只能pin住,极端场景所有buffer都没找到key,只有value,一整个白读
  • key value没有明显的区分定位手段

FF rdb结尾 - 读到buffer计算checksum 读完checksum 直接校验 FE db符号,默认就一个0,这是个设计错误,应该也没人使用

只要出现FC FD 就是key

  • 如何快速找到一段字符串中的特殊的几个字符?感觉能SIMD
  • 如果没有,说明是前一段buffer的一部分,pin住
  • buffer大小?怎么定buffer合理?
  • dag taskflow buffer parse任务存在依赖关系,类似sync_wait

结论是分片并发导入是能做的。不过一般用rdb大小也控制在8G以下,内存带宽足够,时间不算瓶颈

如果rdb需要下载的话,比如rdb从s3上拉回来,这种部分导入并发导入的需求就很大了