C++ 中文周刊第26期

从reddit/hackernews/lobsters/meetingcpp摘抄一些c++动态。

每周更新

周刊项目地址 github，在线地址｜知乎专栏

欢迎投稿，推荐或自荐文章/软件/资源等，请提交 issue

资讯

编译器信息最新动态推荐关注hellogcc公众号

本周周报github直达

boost 1.77版本出了，一些改动

asio重大更新

增加协程支持

支持取消异步任务

filesystem重大更新

引入v4，适配std::filesysterm

新增两个库 describe和lambda2

describe是个宏反射库，不多说

lambda2有点像第九期提到的A Macro-Based Terse Lambda Expression

看例子

#include <boost/lambda2.hpp>
#include <algorithm>

using namespace boost::lambda2;

int count_even( int const * first, int const * last )
{
    return std::count_if( first, last, _1 % 2 == 0 );
}

文章

浅谈The C++ Executors

介绍executor概念，异步的抽象，future promise的替代品，未来c++23还是26估计就有了

Concurrent Deferred Reference Counting with Constant-Time Overhead

实现在这里没来得及看

C++20 modules with GCC11

手把手教你用c++20 mudule(真心难用)

Speeding up atan2f by 50x

通过代数缩放+simd并行来加快atan2f算法代码在这里，原理没研究

Stricter Expression Evaluation Order in C++17

这段代码的打印是什么样的？

#include <iostream> 

class Query {      
public:
    Query& addInt(int i) {
        std::cout << "addInt: " << i << '\n';
        return *this;
    }
    
    Query& addFloat(float f) {
        std::cout << "addFloat: " << f << '\n';
        return *this;
    }
};

float computeFloat() { 
    std::cout << "computing float... \n";
    return 10.1f; 
}

float computeInt() { 
    std::cout << "computing int... \n";
    return 8; 
}

int main() {
  Query q;
  q.addFloat(computeFloat()).addInt(computeInt());
}

核心问题在于求值顺序

你以为的顺序可能是computefloat- > addfloat -> computeint -> add int,实际上,在gcc 7之前的表现

computing int... 
computing float... 
addFloat: 10.1
addInt: 8

从c++17开始，规定了这个求值顺序

再比如

std::cout << a() << b() << c();

这个a b c可以是任意调用顺序，反正最终输出无所谓，但你可能有隐含依赖

在c++17之后，强制要求a b c

这里列一下顺序

a.b

a->b

a->*b

a(b1, b2, b3) // b1, b2, b3 - in any order

b @= a // '@' means any operator

a[b]

a << b

a >> b

经典例子

#include <iostream>
#include <string>

int main() {
    std::string s = "but I have heard it works even"
                    "if you don't believe in it";
    s.replace(0, 4, "")
     .replace(s.find("even"), 4, "only")
     .replace(s.find(" don't"), 6, "");
    std::cout << s;
}
//gcc7之后：I have heard it works only if you believe in it
//gcc7之前： I have heard it works evenonlyou donieve in it

Automate your C library type-based overload resolutions with C++17

给了一个糊c代码的方案

比如

Ipp8u*      ippsMalloc_8u(int len);
Ipp16u*     ippsMalloc_16u(int len);
Ipp32u*     ippsMalloc_32u(int len);
Ipp8s*      ippsMalloc_8s(int len);
Ipp16s*     ippsMalloc_16s(int len);
Ipp32s*     ippsMalloc_32s(int len);
Ipp64s*     ippsMalloc_64s(int len);
Ipp32f*     ippsMalloc_32f(int len);
Ipp64f*     ippsMalloc_64f(int len);

糊成这样

auto ippsMalloc = FunctionWrapper(ippsMalloc_8u, ippsMalloc_16u, ippsMalloc_32u, /* etc */);

auto ptr = PointerWrapper<Ipp8u, ippsFree>(ippsMalloc, size);

template <typename ... F>
class FunctionWrapper  final {
        static_assert(sizeof...(F) != 0, "FunctionWrapper should be not empty");
        std::tuple<F...> var;
        constexpr static inline std::size_t pack_size = sizeof...(F);
    template <typename ... Args>
    auto operator()(Args ... args) const {
        // compile-time
        constexpr auto verification_result = VerifyOverload<0, Args...>();
        if constexpr (!verification_result)
            static_assert(NoOverloadFound<F...>(), "No suitable overload is found");

        // run-time
        return std::get<verification_result>(var)(args...);
		}
};

代码仓库在这里

cppnow

cppnow 2021全放出来了。接着上周，继续总结

Taskflow: A Heterogeneous Task Graph Programming System with Control Flow: Tsung-Wei Huang

介绍taskflow这个并行计算框架的。介绍了很多很多次了

例子,一个DAG任务调度

#include <taskflow/taskflow.hpp>   // Taskflow is header-only
int main(){
	tf::Taskflow taskflow; 
	tf::Executor executor;
	auto [A, B, C, D] = taskflow.emplace(
		[] () { std::cout << "TaskA\n"; }
		[] () { std::cout << "TaskB\n"; },
		[] () { std::cout << "TaskC\n"; },
		[] () { std::cout << "TaskD\n"; }
	);
	A.precede(B, C); // A runs before B and C
	D.succeed(B, C);  // D runs after    B and C
	executor.run(taskflow).wait(); // submit the taskflow to the executor
	return 0
}

而且还内置profile，还header-only，感觉有点像folly::executor或者c++23executor是不是

当前已有的类库存在的问题：只是并行了，但对于任务的拼接处理不够好，也就是DAG任务流模式，缺少这种模型的支持

如果上面的例子用openmp重写，是这样的

#include <omp.h>  // OpenMP is a lang ext to describe parallelism using compiler directives
int main(){ 
	#omp parallel num_threads(std::thread::hardware_concurrency())
	{    
		int A_B, A_C, B_D, C_D;
		#pragma omp task depend(out: A_B, A_C) 
		{  
			std::cout << ”TaskA\n” ;
		}
		#pragma omp task depend(in: A_B; out: B_D)  
		{
			std::cout << ” TaskB\n” ;
		} 
		#pragma omp task depend(in: A_C; out: C_D)  
		{
			std::cout << ” TaskC\n” ;
		} 
		#pragma omp task depend(in: B_D, C_D) 
		{  
			std::cout << ”TaskD\n” ;
		}
	}
	return 0
}

用tbb实现是这样的

#include <tbb.h>  // Intel’s TBB is a general-purpose parallel programming library in C++
int main(){ 
	using namespace tbb;
	using namespace tbb:flow;
	int n = task_scheduler init::default_num_threads () ; 
	task scheduler_init init(n); 
	graph g;
	continue_node<continue_msg> A(g, [] (const continue msg &) { 
		std::cout << “TaskA” ; 
	}) ;
	continue_node<continue_msg> B(g, [] (const continue msg &) { 
		std::cout << “TaskB” ; 
	}) ;
	continue_node<continue_msg> C(g, [] (const continue msg &) { 
		std::cout << “TaskC” ; 
	}) ;
	continue_node<continue_msg> C(g, [] (const continue msg &) { 
		std::cout << “TaskD” ; 
	}) ;
	make_edge(A, B);
	make_edge(A, C);
	make_edge(B, D);
	make_edge(C, D);
	A.try_put(continue_msg());
	g.wait_for_all();
}

在复杂点的DAG，子流程多的，taskflow表达起来更简洁

条件加权的DAG也能处理

调度器工作决策一种是任务级别，要捋清依赖来做优化，一种是worker级别，可以搞work-steal

目前使用的用户也很多，之前也参加过cppcon，主要还是大力推广宣传(搞开源，不吹没人知道)

Designing Concurrent C++ Applications

这个介绍的是c++23即将引入的exexutor抽象，避免使用thread，其实和上面的taskflow差不太多，他实现了一个库concoro

演示代码在这里https://github.com/lucteo/cppnow2021-examples/blob/main/concurrency-tutorial/05_fork_join.cpp

Algorithms from a Compiler Developer’s Toolbox

介绍编译器开发人员用到的算法，这里不如直接看 编译器设计第三版这本书，讲了很多

Weak Interfaces Weak Defences

讲的是之前讲过多次的参数误用，参数类型一样的多个参数，有可能顺序写错造成bug，或者不同类型但是能互相转化的那种，也有可能顺序错了但是程序编译没啥问题

内建类型要小心

甚至还有swapdetector这种工具帮忙查这种错误

Clang-tidy也有这种分析

[clang-tidy] Add ‘readability-suspicious-call-argument’ check

Add ‘bugprone-easily-swappable-parameters’ check

gcc的一个bug 连gcc都会有这种问题

--- adxintrin.h (revision 249844)
+++ adxintrin.h (working copy)
@@ -33,7 +33,7 @@
 _subborrow_u32 (unsigned char __CF, unsigned int __X,
                unsigned int __Y, unsigned int *__P)
 {
-  return __builtin_ia32_sbb_u32 (__CF, __Y, __X, __P);
+  return __builtin_ia32_sbb_u32 (__CF, __X, __Y, __P);
 }
 
 extern __inline unsigned char
@@ -58,7 +58,7 @@
 _subborrow_u64 (unsigned char __CF, unsigned long long __X,
                unsigned long long __Y, unsigned long long *__P)
 {
-  return __builtin_ia32_sbb_u64 (__CF, __Y, __X, __P);
+  return __builtin_ia32_sbb_u64 (__CF, __X, __Y, __P);
 }
 
 extern __inline unsigned char

Simpler Strong Types

讲解提案p0109.如何能真正的强类型？？

解决方案

//put data member in first base class object
//and combine that with ops
//other short-hands can use that
template <typename V, 
       typename TAG, 
       template<typename...>class ...OPS>
struct strong : detail_::holder<V,TAG>,ops<TAG,OPS...> {
};
struct literGas : strong<double,literGas,Additive,Order,Out>{
};

还讲了非常多的组合场景。我看不懂，但我大受震撼

Library Approaches for Strong Type Aliases

和上一个话题差不多，还是讲强类型

类型可能由于顺序问题导致语义变了

比如

char const fillChar=’0’;
int const count=5;
std::string s(fillChar,count);
int const rows=3;
int const columns=4;
Matrix m{columns,rows};

再或者，参数意思不对

void sleep(int seconds);
int const pause_milliseconds=5;
sleep(pause_milliseconds);
void accelerate_to(double metres_per_second);
double const target_speed_mph=4.3;
accelerate_to(target_speed_mph);

再或者，参数无法表明自身

int port=1234;
Socket s1{port};
Socket s2{socket(AF_INET,SOCK_STREAM,0)};
int8_t count=get_count();
std::cout<<"There are "<<count<<" elements\n";

一个是port，一个是fd，混了

作者给了一个strong_typedef来解决

原有的基础类型塞到积累当value，通过子类或者其他基类把方法暴漏出来，这和上面那个方法差不多

事实上，我觉得，这就是个定义问题，不追求完美的解决方案，把构造接口改掉，消除歧义就好了。至于sleep这种参数误用，用api一定要确认好api的要求

Converting a State Machine to a C++ 20 Coroutine

手把手教你吧状态机改成协程，说实话我看到协程的那几个关键字就头疼。目前协程的生态是真滴粗糙

Taking Template One Step Further

介绍自己的两个库kiwaku和ofw 介绍他怎么用模版的。我没怎么看明白他想说啥

The Performance Price of Dynamic Memory in c++

几个结论

如果经常分配回收动态内存，new delete可能是瓶颈

- 内存碎片分配策略不高效等等

如果经常随机访问动态内存，cache miss可能是瓶颈

几个建议

别用std::vector<unique_ptr<blah>> 局部性太差，用boost::polly_container，用std::vector<std::variant<..>>或者干脆拆开

别经常malloc/free动态内存，有个固定的内存块 placement new是替代方案

或者定制alloctor，自己搞分配

如果对象小，尽可能用上SBO策略

用外部malloc实现 tcmalloc/jemalloc/mimalloc

关于cache的一些经验

数据局部性，局部性越好，不用加载cache，越快

64M searches on a hash set (std::unordered_set)

	Small hash set (32)	Medium hash set (1M)	Large hash set (64M)
Runtime (in s)	1.5	2.8	5

如果数据是可预测性比较高的，prefetcher优化就能用上

数据按cacheline读，尽可能利用cacheline的数据

另外，作者的博客非常不错，https://johnysswlab.com/

What is an ABI and Why is Breaking it Bad

一直兼容同一种ABI，保持二进制可用，现在看来已经成为一个新的约束。有些新功能，由于保持ABI不能加到c++中，

c++ 20有个P0192 half float方案，但是iostream可能会因此导致ABI break，方案推迟

计划后面增加支持ABI break的方案

Parallelism on Ranges: Should We

推销HPX库，一个类似boost的标准库补充库

std::vector<int> vi(10'000'000);
std::iota(vi.begin(), vi.end(), 1);
 
auto rng = vi | ranges::views::transform([](int i){return i*i;})
    | ranges::views::reverse;
 
hpx::ranges::for_each(hpx::execution::par, rng, [](auto i){return i;});

不过fork-join (DAG-like)之类的组合不太支持。后续的支持方向

Executors: The Art of Generating Composable APIs

还是hpx库，他们也实现了executor代码

项目

https://github.com/SanderMertens/flecs/tree/v2.4.0 一个ecs框架
https://github.com/DrCpp/DrMock 一个mock框架，但是没有看到DAG支持(类似gmock insequence的组件)

本文永久链接

我的博客即将同步至腾讯云+社区，邀请大家一同入驻：https://cloud.tencent.com/developer/support-plan?invite_code=1rdhgcnljj2x2

看到这里或许你有建议或者疑问或者指出错误，请留言评论! 多谢! 你的评论非常重要！也可以帮忙点赞收藏转发！多谢支持！

觉得写的不错那就给点吧, 在线乞讨

This site is open source. Improve this page.

C++ 中文周刊 第26期

资讯

编译器信息最新动态推荐关注hellogcc公众号

boost 1.77版本出了，一些改动

文章

cppnow

项目

C++ 中文周刊第26期