A cache warming example
30 Jun 2024
Original article:
https://johnnysswlab.com/latency-sensitive-applications-and-the-memory-subsystem-keeping-the-data-in-the-cache/
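Consider a polling while loop whose real work is a data access. On iterations where there is nothing to do, the idle branch can be used to keep that data hot in the cache.
For example, the original logic: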
std::unordered_map<int32_t, order> my_orders;
...
packet_t* p;
while (!exit) {
    p = get_packet();
    // If packet arrived
    if (p) {
        // Check if the identifier is known to us
        auto it = my_orders.find(p->id);
        if (it != my_orders.end()) {
            send_answer(p->origin, it->second);
        }
    }
}
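The body of the while loop is the working logic, but it sits inside one big if. We can pull that if out and split the loop into two paths: one that does work and one that does not.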
std::unordered_map<int32_t, order> my_orders;
...
packet_t* p;
int64_t total_random_found = 0;
while (!exit) {
    // Check the header first, then the packet; if the header has not arrived, go warm the cache.
    // If the header has not arrived, the packet cannot have arrived either.
    if (packet_header_arrived()) {
        p = get_packet();
        // If packet arrived
        if (p) {
            // Check if the identifier is known to us
            auto it = my_orders.find(p->id);
            if (it != my_orders.end()) {
                send_answer(p->origin, it->second);
            }
        }
    } else {
        // Nothing to do, so warm the cache
        auto random_id = get_random_id();
        auto it = my_orders.find(random_id);
        // Use the result somehow so the compiler does not optimize the lookup away
        total_random_found += (it != my_orders.end());
    }
}
std::cout << "Total random found " << total_random_found << "\n";
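Of course, this kind of cache warming does not have to be random, and random accesses may have side effects.
You could pick the keys to warm from historical values instead. What is the word for that again... a heuristic.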
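One way to picture the "historical values" idea is a small ring buffer of recently seen IDs that the idle branch replays instead of drawing random keys. This is only a sketch under assumptions: `RecentIds` and `warm_once` are hypothetical names that do not appear in the original article, and whether recent IDs predict future ones depends entirely on the workload.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical helper (not from the article): remembers the last N packet IDs
// and hands them back round-robin while the loop is idle.
template <std::size_t N>
class RecentIds {
    static_assert(N > 0, "need at least one slot");
public:
    // Record an ID seen on the working path; overwrites the oldest slot when full.
    void record(std::int32_t id) {
        ids_[head_] = id;
        head_ = (head_ + 1) % N;
        if (size_ < N) ++size_;
    }

    bool empty() const { return size_ == 0; }

    // Next ID to warm; cycles through the recorded IDs.
    std::int32_t next() {
        assert(size_ > 0);
        std::int32_t id = ids_[cursor_];
        cursor_ = (cursor_ + 1) % size_;
        return id;
    }

private:
    std::array<std::int32_t, N> ids_{};
    std::size_t head_ = 0;
    std::size_t size_ = 0;
    std::size_t cursor_ = 0;
};

// One idle-iteration warming step: look up a recently seen ID in the map.
// Returns 1 if the ID was found, 0 otherwise, so the caller can accumulate
// the result and keep the compiler from discarding the lookup.
template <typename Map, std::size_t N>
std::int64_t warm_once(const Map& map, RecentIds<N>& recent) {
    if (recent.empty()) return 0;
    return map.find(recent.next()) != map.end() ? 1 : 0;
}
```

In the loop above, the working branch would call `recent.record(p->id)` and the idle branch would do `total_random_found += warm_once(my_orders, recent);` in place of the random lookup.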
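There is also hardware-level support for cache warming, for example on Intel.
It is essentially the prefetch / clflush family of instructions. If you know exactly which address you are going to access, prefetch is indeed fairly efficient.
AMD has something too, L3 Cache Range Reservation, but there are no examples of it.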
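As a rough illustration of the "you know exactly which address you will access" case, the sketch below walks a list of known indices and issues a software prefetch a few iterations ahead. It assumes GCC or Clang, where `__builtin_prefetch(addr, rw, locality)` is available; `gather_sum` and the prefetch distance of 8 are made up for this example and would need tuning on real hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sums values[idx[i]] for every i, prefetching the element that will be
// needed `dist` iterations later. Assumes GCC or Clang.
inline std::int64_t gather_sum(const std::vector<std::int64_t>& values,
                               const std::vector<std::size_t>& idx,
                               std::size_t dist = 8) {
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < idx.size(); ++i) {
        if (i + dist < idx.size()) {
            // rw = 0: prefetch for reading; locality = 3: high temporal locality.
            __builtin_prefetch(&values[idx[i + dist]], 0, 3);
        }
        assert(idx[i] < values.size());
        sum += values[idx[i]];
    }
    return sum;
}
```

The prefetch is only a hint: the result is the same with or without it, and only the timing can change.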
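The author benchmarked software-simulated cache warming with random accesses.
Results: latency over many iterations; lower is better.
Hashmap size | Normal hashmap access | Warm only key 0 when idle | Warm random keys when idle |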
---|---|---|---|
1 K | 226.1 (219.0) | 213.3 (205.1) | 132.5 (67.3) |
4 K | 324.7 (296.3) | 350.7 (331.3) | 140.1 (95.4) |
16 K | 396.8 (341.1) | 389.1 (354.5) | 208.7 (134.5) |
64 K | 425.5 (376.1) | 416.0 (360.6) | 232.1 (152.6) |
256 K | 514.2 (451.5) | 473.3 (480.6) | 338.8 (317.6) |
1 M | 599.8 (550.2) | 615.1 (573.6) | 466.3 (429.8) |
4 M | 702.1 (647.0) | 619.7 (649.2) | 531.3 (508.3) |
16 M | 756.7 (677.6) | 668.8 (707.4) | 543.2 (499.9) |
64 M | 769.1 (702.3) | 735.9 (734.2) | 641.0 (774.4) |
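For random accesses, random warming clearly helps. Going by the first number in each cell, latency at 1 K drops from 226.1 to 132.5 (about 41%), and the gain shrinks as the map grows: at 64 M it drops from 769.1 to 641.0 (about 17%).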
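Discussion with friends in the group chat
yangbowen thinks this is useless: prefetch is useful, whereas manual warming treats the CPU like an idiot (I get the feeling he did not understand this example).
mwish shared some material on prefetch: https://www.cs.cmu.edu/~chensm/papers/hashjoin_tods_preliminary.pdf
Prefetch is the example most people have run into; this kind of simulated cache warming is still fairly uncommon.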