公众号
点击「查看原文」跳转到 GitHub 上对应文件,链接就可以点击了
qq群 点击进入
欢迎投稿,推荐或自荐文章/软件/资源等,评论区留言
本期文章由 HNY 赞助
标准委员会动态/ide/编译器信息放在这里
编译器信息最新动态推荐关注hellogcc公众号 本周更新 2024-08-21 第268期
看一乐
看一乐 感谢恒星投稿
给boost unordered实现gdb pretty print
gdb使用pretty print很简单
第一步
(gdb) set print pretty on
第二步,如果你的脚本在二进制的section中
(gdb) add-auto-load-safe-path path/to/executable
如果没有,有脚本,可以加载脚本
(gdb) source path/to/boost/libs/unordered/extra/boost_unordered_printers.py
其实脚本内容和放进二进制section内容是一样的,怎么放进二进制?可以学习这个 https://github.com/boostorg/outcome/blob/master/include/boost/outcome/outcome_gdb.h
我相信大部分读者是第一次知道gdb printer可以放到二进制里的
接下来是如何实现gdb printer
很简单,接口就这样
class BoostUnorderedFcaPrinter:
def __init__(self, val):
self.val = val
def to_string(self):
return f"This is a {self.val.type}"
目标,实现to_string
注册也非常简单
def boost_unordered_build_pretty_printer():
pp = gdb.printing.RegexpCollectionPrettyPrinter("boost_unordered")
add_template_printer = lambda name, printer: pp.add_printer(name, f"^{name}<.*>$", printer)
add_template_printer("boost::unordered::unordered_map", BoostUnorderedFcaPrinter)
add_template_printer("boost::unordered::unordered_multimap", BoostUnorderedFcaPrinter)
add_template_printer("boost::unordered::unordered_set", BoostUnorderedFcaPrinter)
add_template_printer("boost::unordered::unordered_multiset", BoostUnorderedFcaPrinter)
return pp
gdb.printing.register_pretty_printer(gdb.current_objfile(), boost_unordered_build_pretty_printer())
继续,咱们展开成员
def maybe_unwrap_foa_element(e):
element_type = "boost::unordered::detail::foa::element_type<"
if f"{e.type.strip_typedefs()}".startswith(element_type):
return e["p"]
else:
return e
简单吧,现在你学会了gdb.Value.type.strip_typedefs
咱们进化一下
class BoostUnorderedFcaPrinter:
def __init__(self, val):
self.val = val
self.name = f"{self.val.type.strip_typedefs()}".split("<")[0]
self.name = self.name.replace("boost::unordered::", "boost::")
self.is_map = self.name.endswith("map")
def to_string(self):
size = self.val["table_"]["size_"]
return f"{self.name} with {size} elements"
这样打印
(gdb) print my_unordered_map
$1 = boost::unordered_map with 3 elements
(gdb) print my_unordered_multiset
$2 = boost::unordered_multiset with 5 elements
考虑遍历成员
def display_hint(self):
return "map"
def children(self):
def generator():
# ...
while condition:
value = # ...
if self.is_map:
first = value["first"]
second = value["second"]
yield "", first
yield "", second
else:
yield "", count
yield "", value
return generator()
后面就不展开了
gdb python的玩法还是非常多的
PS: 吴乎提问: 放进二进制section这个,strip时不知道是跟着debug符号走,还是跟着binary走
笔者查了一下,使用的section debug_gdb_scripts也是debug symbol,strip会被删, 这种建议先strip后objcopy把debug_gdb_scripts搞回来 详情点击这个SO gdb section可以看这里
图形生成算法使用simd性能提升显著。代码就不贴了
介绍了fmtlib在减小二进制上的探索,比如查表优化
auto do_count_digits(uint32_t n) -> int {
// An optimization by Kendall Willets from https://bit.ly/3uOIQrB.
// This increments the upper 32 bits (log10(T) - 1) when >= T is added.
# define FMT_INC(T) (((sizeof(#T) - 1ull) << 32) - T)
static constexpr uint64_t table[] = {
FMT_INC(0), FMT_INC(0), FMT_INC(0), // 8
FMT_INC(10), FMT_INC(10), FMT_INC(10), // 64
FMT_INC(100), FMT_INC(100), FMT_INC(100), // 512
FMT_INC(1000), FMT_INC(1000), FMT_INC(1000), // 4096
FMT_INC(10000), FMT_INC(10000), FMT_INC(10000), // 32k
FMT_INC(100000), FMT_INC(100000), FMT_INC(100000), // 256k
FMT_INC(1000000), FMT_INC(1000000), FMT_INC(1000000), // 2048k
FMT_INC(10000000), FMT_INC(10000000), FMT_INC(10000000), // 16M
FMT_INC(100000000), FMT_INC(100000000), FMT_INC(100000000), // 128M
FMT_INC(1000000000), FMT_INC(1000000000), FMT_INC(1000000000), // 1024M
FMT_INC(1000000000), FMT_INC(1000000000) // 4B
};
auto inc = table[__builtin_clz(n | 1) ^ 31];
return static_cast<int>((n + inc) >> 32);
}
template <typename T> constexpr auto count_digits_fallback(T n) -> int {
int count = 1;
for (;;) {
// Integer division is slow so do it for a group of four digits instead
// of for every digit. The idea comes from the talk by Alexandrescu
// "Three Optimization Tips for C++". See speed-test for a comparison.
if (n < 10) return count;
if (n < 100) return count + 1;
if (n < 1000) return count + 2;
if (n < 10000) return count + 3;
n /= 10000u;
count += 4;
}
}
这两种用法,查表就是要多一堆符号的,如果业务要小二进制,就不用查表
另外就是 -nodefaultlibs
-fno-exceptions
这种场景下 new delete基本也得去掉 最后减小到14K,如果去除main6k fmt整体小于10k
批量生成随机数相当于给一堆数打散,第一反应就是shuffle
void shuffle(mytype *storage, uint64_t size) {
for (uint64_t i = size; i > 1; i--) {
uint64_t nextpos = random(i); // random value in [0,i)
std::swap(storage[i - 1], storage[nextpos]);
}
}
显然这个shuffle中的random是瓶颈,我们常规的实现就是%i,有没有更快的做法?
uint64_t random_bounded(uint64_t range) {
__uint128_t random64bit, multiresult;
uint64_t leftover;
uint64_t threshold;
random64bit = rng(); // 64-bit random integer
multiresult = random64bit * range;
leftover = (uint64_t)multiresult;
if (leftover < range) {
threshold = -range % range;
while (leftover < threshold) {
random64bit = rng();
multiresult = random64bit * range;
leftover = (uint64_t)multiresult;
}
}
return (uint64_t)(multiresult >> 64); // [0, range)
}
/ Fisher-Yates shuffle
void shuffle(uint64_t *storage, uint64_t size, uint64_t (*rng)(void)) {
uint64_t i;
for (i = size; i > 1; i--) {
uint64_t nextpos = random_bounded(i, rng);
uint64_t tmp = storage[i - 1];
uint64_t val = storage[nextpos];
storage[i - 1] = val;
storage[nextpos] = tmp;
}
}
这实际上也是gcc的实现,我们能不能拆成批量shuffle?
考虑一个场景,你需要多个shuffle,显然每次都执行random_bound代价大
能不能一个random_bound把多个shuffle一起计算?当然可以
然后多个shuffle组合成一个shuffle就是修改index的问题了是不是?
比如拆成两个
// product_bound can be any integer >= range1*range2
// it may be updated to become range1*range2
std::pair<uint64_t, uint64_t>
random_bounded_2(uint64_t range1, uint64_t range2,
uint64_t &product_bound) {
__uint128_t random64bit, multiresult;
uint64_t leftover;
uint64_t threshold;
random64bit = rng(); // 64-bit random integer
multiresult = random64bit * range1;
leftover = (uint64_t)multiresult;
uint64_t result1 = (uint64_t)(multiresult >> 64); // [0, range1)
multiresult = leftover * range2;
leftover = (uint64_t)multiresult;
uint64_t result2 = (uint64_t)(multiresult >> 64); // [0, range2)
if (leftover < product_bound) {
product_bound = range2 * range1;
if (leftover < product_bound) {
threshold = -product_bound % product_bound;
while (leftover < threshold) {
random64bit = rng();
multiresult = random64bit * range1;
leftover = (uint64_t)multiresult;
result1 = (uint64_t)(multiresult >> 64); // [0, range1)
multiresult = leftover * range2;
leftover = (uint64_t)multiresult;
result2 = (uint64_t)(multiresult >> 64); // [0, range2)
}
}
}
return std::make_pair(result1, result2);
}
void shuffle_2(mytype *storage, uint64_t size) {
uint64_t i = size;
for (; i > 1 << 30; i--) {
uint64_t index = random_bounded(i, g);
// index is in [0, i-1]
std::swap(storage[i - 1], storage[index]);
}
// Batches of 2 for sizes up to 2^30 elements
uint64_t product_bound = i * (i - 1);
for (; i > 1; i -= 2) {
auto [index1, index2] = random_bounded_2(i, i - 1,
product_bound, g);
// index1 is in [0, i-1]
// index2 is in [0, i-2]
std::swap(storage[i - 1], storage[index1]);
std::swap(storage[i - 2], storage[index2]);
}
}
测试linux gcc快30%
处理无限大无限小,各种语言的差别
python
>>> float("1e-1000")
0.0
>>> float("1e1000")
inf
golang
package main
import (
"fmt"
"strconv"
)
func main() {
f, err := strconv.ParseFloat("1e-1000", 64)
fmt.Println(f, err)
f, err = strconv.ParseFloat("1e1000", 64)
fmt.Println(f, err)
}
/*
0
+Inf strconv.ParseFloat: parsing "1e1000": value out of range
*/
c
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
int main(void) {
const char *p = "1e-1000 1e1000";
printf("Parsing '%s':\n", p);
char *end;
for (double f = strtod(p, &end); p != end; f = strtod(p, &end)) {
printf("'%.*s' -> ", (int)(end-p), p);
p = end;
if (errno == ERANGE){
printf("range error, got ");
errno = 0;
}
printf("%f\n", f);
}
}
/*
Parsing '1e-1000 1e1000':
'1e-1000' -> range error, got 0.000000
' 1e1000' -> range error, got inf
*/
上面的行为还算同意,c++呢?
五花八门
#include <cstdio>
#include <charconv>
#include <string>
int main() {
for(std::string str : {"1e-1000", "1e1000"}) {
double value = -1;
printf("parsing %s\n", str.c_str());
auto r = std::from_chars(str.data(), str.data() + str.size(), value);
if(r.ec == std::errc::result_out_of_range) { printf("out of range "); }
printf("%f\n", value);
}
return EXIT_SUCCESS;
}
如果是clang直接编译不过,不支持浮点数from_chars,只好用c的或者三方库 absl::from_chars可以测试一下
如果是msvc
parsing 1e-1000
out of range 0.000000
parsing 1e1000
out of range inf
现象和上面相同
gcc使用fast_float库理应现象相同,但是加了点边界判断
parsing 1e-1000
out of range -1.000000
parsing 1e1000
out of range -1.00000
导致区分不出了我操, -1引入歧义了
最近周报总结内容太少了,所以更新晚,见谅