聊聊cmov
TL,DR: 短代码cmov快 长代码jmp快 cmov如果依赖特别严重会有性能衰退
加了一些自己的理解
所谓的branch-free编程 或者说 branch-less编程,就是一种优化if分支降低branch miss的手段
比如借助冒号表达式,借助+=等等
而这种优化通常都会优化成cmov
大概十几年前 linus对于cmov不屑一顾,性能表现不行
测试代码在这里
#include <stdio.h>
/* How many iterations? */
#define ITERATIONS (100000000)
/* Which bit of the counter to test? */
#define BIT 1
#ifdef CMOV
#define choose(i, a, b) ({ \
unsigned long result; \
asm("testl %1,%2 ; cmovne %3,%0" \
:"=r" (result) \
:"i" (BIT), \
"g" (i), \
"rm" (a), \
"0" (b)); \
result; })
#else
#define choose(i, a, b) ({ \
unsigned long result; \
asm("testl %1,%2 ; je 1f ; mov %3,%0\n1:" \
:"=r" (result) \
:"i" (BIT), \
"g" (i), \
"g" (a), \
"0" (b)); \
result; })
#endif
int main(int argc, char **argv)
{
int i;
unsigned long sum = 0;
for (i = 0; i < ITERATIONS; i++) {
unsigned long a = 5, b = 7;
sum += choose(i, a, b);
}
printf("%lu\n", sum);
return 0;
}
我测试了一下
gcc -Wall -O2 cmov.c
time ./a.out
600000000
real 0m0.540s
user 0m0.065s
sys 0m0.001s
gcc -Wall -O2 -DCMOV cmov.c
time ./a.out
600000000
real 0m0.522s
user 0m0.055s
sys 0m0.000s
显然cmov更快 linus这个结论过时了,毕竟07年。
新机器cmov是快的。这个测试比较简单没有引入cacheline的影响
(for “a = b > c ? d : e;”): As a rule of thumb, we can say that a conditional jump is faster than a conditional move if the code is part of a dependency chain and the prediction rate is better than 75%. A conditional jump is also preferred if we can avoid a lengthy calculation of d or e when the other operand is chosen.
简单说就是依赖性弱cmov快否则jmp快 (你问chatgpt也是这个回答)
那么 __builtin_expect 会生成cmov吗?
答案是可能,但不一定,简单代码还是可以生成的不能,但是__builtin_expect_with_probability 一定不行
#define LIKELY(x) (__builtin_expect(!!(x), 1))
#define VERY_LIKELY(x) (__builtin_expect_with_probability(!!(x), 1, 0.999))
int foo1(int a, int b, int c)
{
if (a == b)
a &= c;
return a;
}
/*
foo1:
and edx, edi
mov eax, edi
cmp edi, esi
cmove eax, edx
ret
*/
int foo2(int a, int b, int c)
{
if (LIKELY(a == b))
a &= c;
return a;
}
/*
foo2:
and edx, edi
mov eax, edi
cmp edi, esi
cmove eax, edx
ret
*/
int foo3(int a, int b, int c)
{
if (VERY_LIKELY(a == b))
a &= c;
return a;
}
/*
foo3:
mov eax, edi
cmp edi, esi
jne .L7
and eax, edx
.L7:
ret
*/
冒号表达式一定能cmov吗?
while ((pos1 < size1) & (pos2 < size2)) {
v1 = input1[pos1];
v2 = input2[pos2];
output_buffer[pos++] = (v1 <= v2) ? v1 : v2;
pos1 = (v1 <= v2) ? pos1 + 1 : pos1;
pos2 = (v1 >= v2) ? pos2 + 1 : pos2;
}
这个代码无法预测,没法优化成cmov,一会大于一会小于,怎么假设啊
但是我们可以利用交换 +=
while ((pos1 < size1) & (pos2 < size2)) {
v1 = input1[pos1];
v2 = input2[pos2];
output_buffer[pos++] = (v1 <= v2) ? v1 : v2;
pos1 += (v1 <= v2);
pos2 += (v1 >= v2);
}
LLVM 后端引入分支避免生成cmov
#include <cstddef>
#include <cstdint>
#include <utility>
void downHeap(uint64_t* top, size_t size, size_t pos) {
size_t parent = pos;
size_t child;
while ((child = 2 * parent + 2) < size) {
auto left = child - 1;
child = top[left] < top[child] ? child : left; // <<-- Unpredictable, should be CMOV
if (top[parent] < top[child]) {
std::swap(top[parent], top[child]);
parent = child;
} else {
return;
}
}
if (--child < size && top[parent] < top[child]) {
std::swap(top[parent], top[child]);
}
}
用了cmov 但是性能反倒是更差 为了避免这个bug
引入 __builtin_unpredictable 的出发点是可以屏蔽掉cmov生成,但是只能影响middle-end
后段select会不会优化成cmov不能通过__builtin_unpredictable修正
无分支代码一定会生成cmov吗
不一定,可能会插入分支
比如 本应该生成cmov llvm给加分支了
#include <cstddef>
#include <cstdint>
// Select an element from an array in constant time.
uint64_t constant_time_lookup(const size_t secret_idx,
const uint64_t table[8]) {
uint64_t result = 0;
for (size_t i = 0; i < 8; i++) {
const bool cond = i == secret_idx;
const uint64_t mask = (-(int64_t)cond);
result |= table[i] & mask;
}
return result;
}
这个例子我不是很理解,貌似是密码学防止攻击,故意不优化成cmov
gcc一个例子 godbolt
unsigned foo(unsigned a, unsigned b)
{
unsigned t = ((a > 2) != 0) << 1;
t |= ((a < 10) != 0) << 2;
return b >> t;
}
再比如
unsigned r = ((a & 0xffff0000) != 0) << 4;
会改成
unsigned r;
if (a > 65535)
r = 16;
else
r = 0;
结论
- cmov不是万能药,短代码cmov快 长代码jmp快 cmov如果依赖特别严重会有性能衰退
- __builtin_expect是优化预测分支的,不是优化无分支的,__builtin_expect_with_probability可以优化无分支
- __builtin_unpredictable 想帮忙屏蔽cmov,但没啥用
- 编译器可能会给简单的无分支代码加上分支
其他
- Speculative问题 对cmov的影响这里就不讨论了,扯太远了
- https://danluu.com/branch-prediction/ 介绍分支预测的,不错