rocksdb asyncio benchmark
TL;DR
I originally wanted to measure how big a gain asyncio brings. It turned out there was basically no gain, to the point where I suspect I've misunderstood something.
Setup
The folly integration is documented here: https://github.com/facebook/rocksdb/wiki/RocksDB-Contribution-Guide#folly-integration
make checkout_folly
make build_folly
(run from a build subdirectory inside the rocksdb checkout, hence the trailing "..")
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_LIBURING=1 -DFAIL_ON_WARNINGS=0 -DWITH_SNAPPY=1 -DWITH_LZ4=1 -DWITH_ZSTD=1 -DUSE_COROUTINES=1 -DWITH_GFLAGS=1 -DROCKSDB_BUILD_SHARED=0 .. && make -j16 db_bench
make -j32
Fill in whatever dependencies you are missing yourself. One gotcha: folly puts its generated libraries under /tmp, so after a reboot they are gone and you have to rebuild. Once the build finishes, remember to keep a copy of the files:
cp -r /tmp/fbcode_builder_getdeps-ZhomeZwZcodeZrocksdbZthird-partyZfollyZbuildZfbcode_builder folly_build_tmp
Adjust the directory name if yours differs.
Alternatively, copy all the dependency libraries into the same directory as db_bench and preload them. Either way, keep a copy; see the sketch below.
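A minimal sketch of that preload alternative, assuming the folly_build_tmp copy from above (paths are from my machine; adjust to yours):

# Gather the shared libraries from the saved folly build next to db_bench,
# then point the dynamic linker at the current directory.
find folly_build_tmp -name '*.so*' -exec cp -n {} . \;
LD_LIBRARY_PATH=. ./db_bench --version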
Also, I'm on version 7.9. The benchmark described in this blog post, http://rocksdb.org/blog/2022/10/07/asynchronous-io-in-rocksdb.html, core-dumps immediately when I run it, so I could only play around with db_bench on my own.
My exploratory data is posted at https://github.com/wanghenshui/wanghenshui.github.io/issues/83
Disk
As I've mentioned before, my SSD is a Kioxia RC20. I ran dd/fio again; the parameters are below.
dd
dd if=/data/tmp/test of=/data/tmp/test2 bs=64k
1250000+0 records in
1250000+0 records out
81920000000 bytes (82 GB, 76 GiB) copied, 112.668 s, 727 MB/s
I think that result is decent.
fio, sync engine
fio --randrepeat=1 --ioengine=sync --direct=1 --gtod_reduce=1 --name=test --filename=/data/tmp/fio_test_file --bs=4k --iodepth=64 --size=4G --readwrite=randread --numjobs=32 --group_reporting
fio-3.30
Starting 32 processes
Jobs: 32 (f=32): [r(32)][100.0%][r=962MiB/s][r=246k IOPS][eta 00m:00s]
test: (groupid=0, jobs=32): err= 0: pid=11567: Sat Feb 11 19:17:06 2023
read: IOPS=244k, BW=953MiB/s (999MB/s)(128GiB/137532msec)
bw ( KiB/s): min=22408, max=1006104, per=100.00%, avg=976605.28, stdev=2430.25, samples=8768
iops : min= 5602, max=251526, avg=244151.32, stdev=607.56, samples=8768
cpu : usr=0.26%, sys=1.54%, ctx=33554635, majf=5, minf=346
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=33554432,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=953MiB/s (999MB/s), 953MiB/s-953MiB/s (999MB/s-999MB/s), io=128GiB (137GB), run=137532-137532msec
Disk stats (read/write):
nvme1n1: ios=33537154/0, merge=0/0, ticks=4317401/0, in_queue=4317401, util=99.94%
fio, libaio engine
w@w-msi:~$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/data/tmp/fio_test_file2 --bs=4k --iodepth=64 --size=4G --readwrite=randread
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.30
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=971MiB/s][r=248k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=11667: Sat Feb 11 19:19:47 2023
read: IOPS=247k, BW=965MiB/s (1012MB/s)(4096MiB/4246msec)
bw ( KiB/s): min=981088, max=996264, per=100.00%, avg=990337.00, stdev=5501.80, samples=8
iops : min=245272, max=249066, avg=247584.25, stdev=1375.45, samples=8
cpu : usr=12.82%, sys=43.18%, ctx=338888, majf=0, minf=71
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=1048576,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=965MiB/s (1012MB/s), 965MiB/s-965MiB/s (1012MB/s-1012MB/s), io=4096MiB (4295MB), run=4246-4246msec
Disk stats (read/write):
These throughput numbers look fine, but when db_bench reads with a single thread the disk only manages about 40 MB/s, which made me lose confidence in my testing. Sequential reads and writes are fine, but small random reads are garbage. (Note that with ioengine=sync, fio effectively ignores iodepth and each job runs at queue depth 1, as the "IO depths: 1=100.0%" line above shows; the 246k IOPS came from 32 parallel jobs.)
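As a rough stand-in for what a single db_bench read thread asks of the disk, a queue-depth-1, single-job random read can be run like this (my own follow-up check, not output captured from the runs above):

fio --randrepeat=1 --ioengine=sync --direct=1 --name=qd1 --filename=/data/tmp/fio_test_file --bs=4k --iodepth=1 --numjobs=1 --size=4G --readwrite=randread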
Benchmark data and conclusions
Commands
NUM_THREADS=32 NUM_KEYS=100000000 DB_DIR=/data/tmp/ben WAL_DIR=/data/tmp/wal ./benchmark.sh bulkload,readrandom
NUM_THREADS=32 NUM_KEYS=1000000000 DB_DIR=/data/tmp/ben WAL_DIR=/data/tmp/wal ./benchmark.sh bulkload
NUM_THREADS=32 NUM_KEYS=100000000 DB_DIR=/data/tmp/ben WAL_DIR=/data/tmp/wal ./benchmark.sh readrandom
ASYNC_IO=1 NUM_THREADS=32 NUM_KEYS=100000000 DB_DIR=/data/tmp/ben WAL_DIR=/data/tmp/wal ./benchmark.sh readrandom
ASYNC_IO=1 NUM_THREADS=32 NUM_KEYS=100000000 DB_DIR=/data/tmp/ben WAL_DIR=/data/tmp/wal ./benchmark.sh multireadrandom
NUM_THREADS=32 NUM_KEYS=100000000 DB_DIR=/data/tmp/ben WAL_DIR=/data/tmp/wal ./benchmark.sh multireadrandom
At first there was 10 GB of data; I then regenerated 100 GB of data.
ASYNC_IO is an environment variable I added myself:
diff
diff --git a/tools/benchmark.sh b/tools/benchmark.sh
index b41d25c78..df3f6e52e 100755
--- a/tools/benchmark.sh
+++ b/tools/benchmark.sh
@@ -170,7 +170,7 @@ compression_max_dict_bytes=${COMPRESSION_MAX_DICT_BYTES:-0}
compression_type=${COMPRESSION_TYPE:-zstd}
min_level_to_compress=${MIN_LEVEL_TO_COMPRESS:-"-1"}
compression_size_percent=${COMPRESSION_SIZE_PERCENT:-"-1"}
-
+async_io=${ASYNC_IO:-0}
duration=${DURATION:-0}
writes=${WRITES:-0}
@@ -291,6 +291,7 @@ const_params_base="
--memtablerep=skip_list \
--bloom_bits=10 \
--open_files=-1 \
+ -async_io=$async_io \
--subcompactions=$subcompactions \
\
$bench_args"
All the patch does is add an async_io flag.
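For reference, async_io can also be passed to db_bench directly, bypassing benchmark.sh. A sketch with made-up values (the flags exist in db_bench, but these values are mine, not what benchmark.sh computes):

./db_bench --benchmarks=multireadrandom --db=/data/tmp/ben --use_existing_db=1 --num=100000000 --threads=32 --batch_size=16 --async_io=1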
Results
# ops_sec - operations per second
# mb_sec - ops_sec * size-of-operation-in-MB
# lsm_sz - size of LSM tree
# blob_sz - size of BlobDB logs
# c_wgb - GB written by compaction
# w_amp - Write-amplification as (bytes written by compaction / bytes written by memtable flush)
# c_mbps - Average write rate for compaction
# c_wsecs - Wall clock seconds doing compaction
# c_csecs - CPU seconds doing compaction
# b_rgb - Blob compaction read GB
# b_wgb - Blob compaction write GB
# usec_op - Microseconds per operation
# p50, p99, p99.9, p99.99 - 50th, 99th, 99.9th, 99.99th percentile response time in usecs
# pmax - max response time in usecs
# uptime - RocksDB uptime in seconds
# stall% - Percentage of time writes are stalled
# Nstall - Number of stalls
# u_cpu - #seconds/1000 of user CPU
# s_cpu - #seconds/1000 of system CPU
# rss - max RSS in GB for db_bench process
# test - Name of test
# date - Date/time of test
# version - RocksDB version
# job_id - User-provided job ID
# githash - git hash at which db_bench was compiled
ops_sec mb_sec lsm_sz blob_sz c_wgb w_amp c_mbps c_wsecs c_csecs b_rgb b_wgb usec_op p50 p99 p99.9 p99.99 pmax uptime stall% Nstall u_cpu s_cpu rss test date version job_id githash
1536598 615.5 18GB 0GB 17.6 0.9 276.0 203 199 0 0 0.7 0.5 1 1 1139 41244 65 43.2 64 0.2 0.0 NA bulkload 2023-02-11T21:58:57 8.0.0 a72f591825
1480156 374.8 11GB 0GB 0.0 NA 0.0 0 0 0 0 21.6 2.6 327 696 1563 414661 2162 0.0 0 12.4 2.6 16.4 readrandom.t32 2023-02-11T22:02:17 8.0.0 a72f591825
1546717 619.5 177GB 0GB 177.2 0.9 280.7 2028 1964 0 0 0.6 0.5 1 1 1151 35774 646 46.8 651 2.2 0.1 NA bulkload 2023-02-11T22:55:03 8.0.0 a72f591825
867544 219.7 107GB 0GB 0.0 NA 0.0 0 0 0 0 36.9 2.6 408 960 3881 333915 3689 0.0 0 12.5 4.7 17.6 readrandom.t32 2023-02-12T14:15:06 8.0.0 a72f591825
810797 205.3 107GB 0GB 0.0 NA 0.0 0 0 0 0 39.4 2.6 440 1046 3490 18445 3947 0.0 0 12.3 4.8 18.5 readrandom.t32 2023-02-12T19:48:38 8.0.0 a72f591825
817781 207.1 107GB 0GB 0.0 NA 0.0 0 0 0 0 39.1 329.7 1559 3949 11708 19654 3914 0.0 0 12.1 4.8 17.6 multireadrandom.t32 2023-02-12T21:15:11 8.0.0 a72f591825
800529 202.7 107GB 0GB 0.0 NA 0.0 0 0 0 0 40.0 336.0 1615 4130 11987 23272 3998 0.0 0 12.1 4.9 18.3 multireadrandom.t32 2023-02-12T22:30:05 8.0.0 a72f591825
Put simply: with AIO on versus off there is basically no difference? I have no way to explain it.
I haven't tested a db_bench built without folly yet. I'll fill that in when I have time.
2023-02-20
I re-ran a set of tests:
ASYNC_IO=1 NUM_THREADS=32 NUM_KEYS=100000000 DB_DIR=/data/tmp/ben WAL_DIR=/data/tmp/wal ./benchmark.sh multireadrandom
NUM_THREADS=32 NUM_KEYS=100000000 DB_DIR=/data/tmp/ben WAL_DIR=/data/tmp/wal ./benchmark.sh multireadrandom
ASYNC_IO=1 NUM_THREADS=32 NUM_KEYS=100000000 DB_DIR=/data/tmp/ben WAL_DIR=/data/tmp/wal ./benchmark.sh readrandom
NUM_THREADS=32 NUM_KEYS=100000000 DB_DIR=/data/tmp/ben WAL_DIR=/data/tmp/wal ./benchmark.sh readrandom
ASYNC_IO=1 NUM_THREADS=32 NUM_KEYS=100000000 DB_DIR=/data/tmp/ben WAL_DIR=/data/tmp/wal ./benchmark.sh multireadrandom
NUM_THREADS=32 NUM_KEYS=100000000 DB_DIR=/data/tmp/ben WAL_DIR=/data/tmp/wal ./benchmark.sh multireadrandom
ASYNC_IO=1 NUM_THREADS=32 NUM_KEYS=100000000 DB_DIR=/data/tmp/ben WAL_DIR=/data/tmp/wal ./benchmark.sh readrandom
NUM_THREADS=32 NUM_KEYS=100000000 DB_DIR=/data/tmp/ben WAL_DIR=/data/tmp/wal ./benchmark.sh readrandom
Results
(same column legend as in the first results table above)
ops_sec mb_sec lsm_sz blob_sz c_wgb w_amp c_mbps c_wsecs c_csecs b_rgb b_wgb usec_op p50 p99 p99.9 p99.99 pmax uptime stall% Nstall u_cpu s_cpu rss test date version job_id githash
4325990 1095.4 107GB 0GB 0.0 NA 0.0 0 0 0 0 7.2 62.5 253 613 1257 337223 740 0.0 0 17.3 2.3 23.6 multireadrandom.t32 2023-02-19T21:15:01 8.0.0 a72f591825
4309570 1091.2 107GB 0GB 0.0 NA 0.0 0 0 0 0 7.3 62.3 263 536 1133 46141 743 0.0 0 17.2 2.3 24.2 multireadrandom.t32 2023-02-19T20:35:12 8.0.0 a72f591825
4425381 1120.5 107GB 0GB 0.0 NA 0.0 0 0 0 0 7.1 4.4 28 170 453 427580 724 0.0 0 17.3 2.3 24.1 readrandom.t32 2023-02-19T21:29:13 8.0.0 a72f591825
4391332 1111.9 107GB 0GB 0.0 NA 0.0 0 0 0 0 7.1 4.4 29 169 368 431012 729 0.0 0 17.3 2.3 24.4 readrandom.t32 2023-02-19T22:04:38 8.0.0 a72f591825
4103189 1039.0 107GB 0GB 0.0 NA 0.0 0 0 0 0 7.7 59.1 342 776 3829 418001 780 0.0 0 16.4 2.1 17.9 multireadrandom.t32 2023-02-19T23:03:14 8.0.0 a72f591825
4143277 1049.1 107GB 0GB 0.0 NA 0.0 0 0 0 0 7.7 59.0 322 564 4243 19460 773 0.0 0 16.3 2.2 21.4 multireadrandom.t32 2023-02-19T23:30:45 8.0.0 a72f591825
4291820 1086.7 107GB 0GB 0.0 NA 0.0 0 0 0 0 7.4 4.1 112 198 368 25312 746 0.0 0 16.7 2.0 21.9 readrandom.t32 2023-02-19T22:48:11 8.0.0 a72f591825
4004245 1013.9 107GB 0GB 0.0 NA 0.0 0 0 0 0 8.0 3.9 130 235 581 248845 800 0.0 0 16.3 2.3 17.8 readrandom.t32 2023-02-19T22:32:36 8.0.0 a72f591825
Without folly, db_bench with async_io on is a tiny bit faster than with it off.
The folly build with async_io on is also a tiny bit faster than with it off.
But using folly is actually slower than not using folly.
My feeling is that I'm using folly wrong; a sanity check I'd try is sketched below.
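One check to rule out a build problem: confirm that the coroutine binary actually issues io_uring syscalls when async_io is on. A sketch (the strace approach is my own suggestion, not something from the RocksDB docs):

# io_uring_setup/io_uring_enter are the syscalls liburing wraps; if the build
# is right, an async_io read workload should produce plenty of the latter.
strace -f -e trace=io_uring_setup,io_uring_enter -o uring.trace ./db_bench --benchmarks=readrandom --db=/data/tmp/ben --use_existing_db=1 --num=1000 --async_io=1
grep -c io_uring_enter uring.trace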