公众号
欢迎投稿,推荐或自荐文章/软件/资源等
大部分都是能力补充,比如views::enumerate
, 比如new size信息?
编译器信息最新动态推荐关注hellogcc公众号 本周更新 2022-01-05 第131期
周末没事可以看看,c++的东西很多,llvm/编译器链接器 等等
B站直播
PLCT开放日和OSDT大会都会通过B站进行直播。直播地址:
https://live.bilibili.com/10339607 腾讯会议(网络研讨会)
吴伟 邀请您参加腾讯会议网络研讨会(Webinar)
会议主题:2022开源开发工具大会
会议时间:2022/12/10 09:30-18:00 (GMT+08:00) 中国标准时间 - 北京
重复周期:2022/12/10-2022/12/11 09:30-18:00, 每天
点击专属链接入会,或添加至会议列表:
https://meeting.tencent.com/dw/sOYndcvZQ9Ua
会议号: #腾讯会议:569-2778-2379
看个乐,感觉都知道了
学习一波,如何把coroutine和asio结合 (asio还是难用)
空基类优化,另外,现在是2022了,要用[[no_unique_address]]
一个代码调优
struct s {
static constexpr auto operator()() { return 1; }
};
auto l = [] static { return 2; };
static_assert(3 == s{}() + l());
static_assert(3 == s::operator()() +
decltype(l)::operator()());
兼容能力补充
如何实现 views::enumerate
第一个支持std::execution
设计的库?
msvc尴尬的bug
封c接口用的
#include <iostream>
#include <memory>
void old_c_api(int** p) {
*p = new int{42};
}
int main() {
auto pi = std::make_unique<int>(51);
old_c_api(std::out_ptr(pi));
std::cout << *pi << '\n';
}
pi的释放交给out_ptr了。如果是std::shared_ptr
?
#include <iostream>
#include <memory>
void old_c_api(int** p) {
*p = new int{42};
}
int main() {
auto pi = std::make_shared<int>(51);
// error C2338: static_assert failed: 'out_ptr_t with shared_ptr requires a deleter (N4892 [out.ptr.t]/3)'
// old_c_api(std::out_ptr(pi));
old_c_api(std::out_ptr(pi, std::default_delete<int>()));
std::cout << *pi << '\n';
}
编译期字符串一些操作。这个文章是吴咏炜写的,投accu了
长度,这个之前也说过
constexpr size_t length(const char* str) {
return char_traits<char>::length(str);
}
查找
constexpr const char*
find(const char* str, char ch) {
return char_traits<char>::find(
str, length(str), ch);
}
substr
constexpr auto substr(string_view sv,
size_t offset, size_t count) {
array<char, count + 1> result{};
copy_n(&sv[offset], count, result.data());
return result;
}
template <size_t Count>
constexpr auto substr(const char* str,
size_t offset = 0){
array<char, Count + 1> result{};
for (size_t i = 0; i < Count; ++i) {
result[i] = str[offset + i];
}
return result;
}
字符串模版参数
template <char... Cs>
struct compile_time_string {
static constexpr char value[]{Cs..., '\0'};
};
template <typename T, T... Cs>
constexpr compile_time_string<Cs...>
operator""_cts() {
return {};
}
template <size_t N>
struct compile_time_string {
constexpr compile_time_string(
const char (&str)[N])
{
copy_n(str, N, value);
}
char value[N]{};
};
template <compile_time_string cts>
constexpr auto operator""_cts() {
return cts;
}
我还以为又拿原子量自旋当mutex炒冷饭,原来是用c++20接口
使用c++20的atomic::wait
/atomic::notify
来实现锁, 相当于把futex那套东西抠出来?
没有看底层实现,不懂,这里有个windows的分析 https://zhuanlan.zhihu.com/p/413660695
struct one_byte_mutex
{
void lock() {
if (state.exchange(locked, std::memory_order_acquire) == unlocked)
return;
while (state.exchange(sleeper, std::memory_order_acquire) != unlocked)
state.wait(sleeper, std::memory_order_relaxed);
}
void unlock() {
if (state.exchange(unlocked, std::memory_order_release) == sleeper)
state.notify_one();
}
private:
std::atomic<uint8_t> state{ unlocked };
static constexpr uint8_t unlocked = 0;
static constexpr uint8_t locked = 0b01;
static constexpr uint8_t sleeper = 0b10;
};
template<typename T>
struct pointer_with_mutex {
T* get() const {
uint64_t masked = pointer.load(std::memory_order_relaxed) & ~both_bits;
return reinterpret_cast<T*>(masked);
}
void set(T* ptr) {
static_assert(std::alignment_of<T>::value >= 4);
uint64_t as_int = reinterpret_cast<uint64_t>(ptr);
uint64_t old = pointer.load(std::memory_order_relaxed);
while (!pointer.compare_exchange_weak(old, (old & both_bits) | as_int, std::memory_order_relaxed)) { }
}
void lock() {
uint64_t old = pointer.load(std::memory_order_relaxed);
if (!(old & both_bits) && pointer.compare_exchange_strong(old, old | locked, std::memory_order_acquire))
return;
for(;;) {
if (old & sleeper) {
pointer.wait(old, std::memory_order_relaxed);
old = pointer.load(std::memory_order_relaxed);
} else if (pointer.compare_exchange_weak(old, old | sleeper, std::memory_order_acquire)) {
if (!(old & both_bits))
return;
pointer.wait(old | sleeper, std::memory_order_relaxed);
old = pointer.load(std::memory_order_relaxed);
}
}
}
void unlock() {
uint64_t old = pointer.fetch_and(~both_bits, std::memory_order_release);
if (old & sleeper)
pointer.notify_one();
}
private:
std::atomic<uint64_t> pointer{ 0 };
static constexpr uint64_t locked = 0b01;
static constexpr uint64_t sleeper = 0b10;
static constexpr uint64_t both_bits = locked | sleeper;
};
性能数据
lock/unlock single-threaded | lock/unlock many threads | |
---|---|---|
std::mutex | 12ns | 71ns |
one_byte_mutex | 8ns | 228ns |
pointer_with_mutex | 15ns | 255ns |
看这个性能还可以,压测代码在这里 https://github.com/skarupke/two_bit_mutex/blob/main/benchmark.cpp
可以查一下多线程的问题到底出在哪里,缓存颠簸还是啥的。有空本地复现一下
老生常谈了,还有不会的吗
大哥在较真,为啥编译器生成的自动向量化汇编会没必要的加载两次变量。没细看。没结论。gcc做的不够好
求两个数中点,经典数据hacker’s delight有解
int f(int x, int y) {
return (x|y) - ((x^y)>>1);
}
int f(int x, int y) {
return ((x^y)>>1) + (x&y);
}
如果想要更全平台,用 std::midpoint
c++20
这个作者是之前那个算法博客 https://en.algorithmica.org/hpc/
其实讲的就是cacheline影响计算吞吐。没啥新东西。更多是现象列举。大家可以看看这个博客。之前也推过