C++ 中文周刊 第130期

周刊项目地址

公众号

qq群 手机qq点击进入

RSS https://github.com/wanghenshui/cppweeklynews/releases.atom

欢迎投稿,推荐或自荐文章/软件/资源等

提交 issue

感谢ryan赞助


资讯

标准委员会动态/ide/编译器信息放在这里

编译器信息最新动态推荐关注hellogcc公众号 本周没更新 上期 2023-08-30 第217期

C++之父BS最近接受采访发表重要讲话 https://www.bilibili.com/video/BV1Su411P7uu/

关注一下生活,编程生命很长

我只能说不好说,不过这个视频还是可以一看的,浪费大家三分钟时间

文章

Bounded dynamicism with cross-modifying code

这个来自@勃亦学 评论指出,上期提到了一篇动态修改分支的论文,实际上这个技术不是什么新鲜玩意,内核也有,叫static key,不过不用了

也有现成的库提供,https://github.com/backtrace-labs/dynamic_flag

原理看不懂,和论文差不多意思,改汇编

C++:百尺竿头,总差一步

API变化让人难受

Just how constexpr is C++20’s std::string?

部分场景下std::vector std::string能constexpr 比如初始化列表?SSO优化?

但作用有限,别这么用,除非你知道你在做啥

How I joined the bug 323 community

GCC 323 bug单是比较有名?的bug,如果你不知道,你今天就知道了 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323

32位double 优化浮点数得到奇怪的数字

复现代码

#include <array>
#include <cmath>
#include <iostream>
#include <set>

using double2 = std::array<double, 2>;

struct Comparator
{
  static double deviation (double2 p)
  {
    const double2 p0 { 1, 0 };
    const double2 vp0p1 { -1 / std::sqrt(2), 1 / std::sqrt(2) };
    const double2 vp0p { p[0] - 1, p[1]};
    const double dotprod = vp0p1[0] * vp0p[0] + vp0p1[1] * vp0p[1];
    const double2 proj { 1 + dotprod * vp0p1[0], p0[1] + dotprod * vp0p1[1] };
    return std::sqrt ((proj[0] - p[0]) * (proj[0] - p[0]) +
                      (proj[1] - p[1]) * (proj[1] - p[1]));
  }

  bool operator() (const double2& a, const double2& b) const
  {
    const double da = deviation(a);
    const double db = deviation(b);
    if (da == db)
      return a < b;
    return da < db;
  }
};

void insert (std::set<double2, Comparator>& set, double2 point)
{
  const double deviation = Comparator::deviation(point);

  std::cerr << "Inserting " << std::defaultfloat << point[0] << " " << point[1]
            << " with deviation = " << deviation << " / hex="
            << std::hexfloat << deviation << std::endl;
  if (set.insert (point).second)
    std::cerr << " -> Success" << std::endl;
  else
    std::cerr << " -> Failure" << std::endl;
}

int main (int, char**)
{
  std::cerr.precision(18);
  std::set<double2, Comparator> set;
  insert(set, { 0, 0 });
  insert(set, { 1, 1 });
  return EXIT_SUCCESS;
}

64位编译没问题,32位编译带O优化必现

简单来说就是寄存器使用优化了,导致FPU计算行为不同,

两次计算第一次是64位,第二次复用寄存器再计算,用的就不是64位而是32位,精度不同

避免这个问题 可以voliate强制内存,或者-ffloat-store

或者把眼睛闭起来,忘记323 bug的故事,反正你之前也没听过

Locating ‘identifiers’ quickly (ARM NEON edition)

SIMD时间

代码我不贴了,看不懂 https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/blob/master/2023/09/03/src/identifiers.c

Panic better using modern C++

直接贴代码了 https://github.com/rnburn/bbai-kernel/blob/master/bbai/base/error/panic.h


#pragma once

#include <concepts>
#include <format>
#include <source_location>
#include <string_view>
#include <type_traits>

//--------------------------------------------------------------------------------------------------
// panic_dynamic_string_view
//--------------------------------------------------------------------------------------------------
struct panic_dynamic_string_view {
  template <class T>
    requires std::constructible_from<std::string_view, T>
  panic_dynamic_string_view(
      const T &s,
      std::source_location loc = std::source_location::current()) noexcept
      : s{s}, loc{loc} {}

  std::string_view s;
  std::source_location loc;
};

//--------------------------------------------------------------------------------------------------
// panic_format
//--------------------------------------------------------------------------------------------------
template <class... Args>
struct panic_format {
  template <class T>
  consteval panic_format(
      const T &s,
      std::source_location loc = std::source_location::current()) noexcept
      : fmt{s}, loc{loc} {}

  std::format_string<Args...> fmt;
  std::source_location loc;
};

//--------------------------------------------------------------------------------------------------
// panic_impl
//--------------------------------------------------------------------------------------------------
[[noreturn]] void panic_impl(const char* s) noexcept;

//--------------------------------------------------------------------------------------------------
// panic
//--------------------------------------------------------------------------------------------------
[[noreturn]] inline void panic(panic_dynamic_string_view s) noexcept {
  auto msg =
      std::format("{}:{} panic: {}\n", s.loc.file_name(), s.loc.line(), s.s);
  panic_impl(msg.c_str());
}

template <class... Args>
[[noreturn]] void panic(panic_format<std::type_identity_t<Args>...> fmt,
                        Args &&...args) noexcept
  requires (sizeof ...(Args) > 0)
{
  auto msg = std::format("{}:{} panic: {}\n", fmt.loc.file_name(), fmt.loc.line(),
                         std::format(fmt.fmt, std::forward<Args>(args)...));
  panic_impl(msg.c_str());
}

翻译: It’s not always obvious when tail-call optimization is allowed

大概意思是你不能假定尾递归一定发生,编译器能优化到。保险看汇编,更保险就改成循环模式

False sharing and 128-byte alignment/padding

为啥用128,之前不是64么?cacheline 不是64吗 CPU进化了我超

“Intel® 64 and IA-32 architectures optimization reference manual”, in section 3.7.3 “Hardware Prefetching for Second-Level Cache”, about the Intel Core microarchitecture:

“Streamer — Loads data or instructions from memory to the second-level cache. To use the streamer, organize the data or instructions in blocks of 128 bytes, aligned on 128 bytes. The first access to one of the two cache lines in this block while it is in memory triggers the streamer to prefetch the pair line.”

看代码 https://github.com/facebook/folly/blob/main/folly/lang/Align.h

//  Memory locations within the same cache line are subject to destructive
//  interference, also known as false sharing, which is when concurrent
//  accesses to these different memory locations from different cores, where at
//  least one of the concurrent accesses is or involves a store operation,
//  induce contention and harm performance.
//
//  Microbenchmarks indicate that pairs of cache lines also see destructive
//  interference under heavy use of atomic operations, as observed for atomic
//  increment on Sandy Bridge.
//
//  We assume a cache line size of 64, so we use a cache line pair size of 128
//  to avoid destructive interference.
//
//  mimic: std::hardware_destructive_interference_size, C++17
constexpr std::size_t hardware_destructive_interference_size =
    (kIsArchArm || kIsArchS390X) ? 64 : 128;

理解 std::hardware_destructive_interference_size和std::hardware_constructive_interference_size

destructive 避免false sharing constructive 尽可能的true sharing

现代CPU,false sharing 128更明显一些,64可能还是有影响。

如何使用? https://github.com/facebook/folly/blob/main/folly/ProducerConsumerQueue.h

  using AtomicIndex = std::atomic<unsigned int>;

  char pad0_[hardware_destructive_interference_size];
  const uint32_t size_;
  T* const records_;

  alignas(hardware_destructive_interference_size) AtomicIndex readIndex_;
  alignas(hardware_destructive_interference_size) AtomicIndex writeIndex_;

  char pad1_[hardware_destructive_interference_size - sizeof(AtomicIndex)];

libcopp对C++20协程的接入和接口设计

学习一下,希望大家人人都能实现自己的有栈协程

Performance Through Memory Layout

连续紧凑的内存对性能更友好,比如list/bst/graph自定义内存分配器

没啥说的,老观点了

深入解析 Hazard Pointer (上)

深入解析 Hazard Pointer (中)

深入解析 Hazard Pointer (下)

写的也不是很深入,看个大概

Epoch Based Reclamation

了解一下epoch推进技术, 直接贴伪代码了



#define N_THREADS 4 //一共4个线程
bool active[N_THREADS] = {false};
int epoches[N_THREADS] = {0};
int global_epoch = 0;
vector<int*> retire_list[3];
void read(int thread_id)
{
  active[thread_id] = true;
  epoches[thread_id] = global_epoch;
  //进入临界区了。可以安全的读取
  //...... 
  //读取完毕,离开临界区
  active[thread_id] = false;
}
void logical_deletion(int thread_id)
{
  active[thread_id] = true;
  epoches[thread_id] = global_epoch;
  //进入临界区了,这里,我们可以安全的读取
  //好了,假如说我们现在要删除它了。先逻辑删除。
  //而被逻辑删除的tmp指向的节点还不能马上被回收,因此把它加入到对应的retire list
  retire_list[global_epoch].push_back(tmp);
  //离开临界区
  active[thread_id] = false;
  //看看能不能物理删除
  try_gc();
}
bool try_gc()
{
  int &e = global_epoch;
  for (int i = 0; i < N_THREADS; i++) {
    if (active[i] && epoches[i] != e) {
        //还有部分线程没有更新到最新的全局的epoch值
        //这时候可以回收(e + 1) % 3对应的retire list。
        free((e + 1) % 3);//不是free(e),也不是free(e-1)。参看下面
        return false;
    }
  }
  //更新global epoch
  e = (e + 1) % 3;
  //更新之后,那些active线程中,部分线程的epoch值可能还是e - 1(模3)
  //那些inactive的线程,之后将读到最新的值,也就是e。
  //不管如何,(e + 1) % 3对应的retire list的那些内存,不会有人再访问到了,可以回收它们了
  //因此epoch的取值需要有三种,仅仅两种是不够的。
  free((e + 1) % 3);//不是free(e),也不是free(e-1)。参看下面
}
bool free(int epoch)
{
  for each pointer in retire_list[epoch]
    if (pointer is not NULL)
      delete pointer;
}

视频

C++ Weekly - Ep 392 - Google’s Bloaty McBloatface

介绍bloaty的,分析二进制谁占大头

Karl Åkerblom: A quick look at Tracy Profiler

介绍 https://github.com/wolfpld/tracy 的,但是作者不推荐虚拟机上用

我感觉 CS交互太扯了,其实有现成的网站 https://www.speedscope.app/ 这种,按照他的格式生成文件就好了。

直接支持perf。如果是c++应用,其实可以内置一个perf 命令,通过curl发起命令开始record,结束就生成文件,拿到文件传到speedscope上

perf record -a -F 999 -g -p PID > perf.data
perf script -i perf.data

Paul Dreik: Using variable templates on a tiny problem

实现一个mask bit函数,通常这个bit只能是0-32 怎么限制滥用?

template <int N>
concept is_within_32 = (N >= 0 && N <= 32);

template <int Nbits>
    requires is_within_32<Nbits>
constexpr uint32_t mask_of{(1ULL << Nbits) - 1};
void handle_data(uint32_t raw) { interpret(raw & mask_of<30>); }
void demo_misuse() {
    //  [[maybe_unused]] auto x = mask_of<-1>;    // narrowing (gcc), or
    // narrowing+shift (clang) (clang compile error)
    //  [[maybe_unused]] auto y = mask_of<33>;    // narrowing, compile error
    // [[maybe_unused]] auto z =
    //   mask_of<0xFF>;  // narrowing (gcc), or narrowing+shift (clang) compile
    //   error
}

看懂了吗,还是concept

开源项目需要人手

新项目介绍/版本更新

工作招聘

有没有工作岗位推荐一波,楼主失业了


本文永久链接

如果有疑问评论最好在上面链接到评论区里评论,这样方便搜索,微信公众号有点封闭/知乎吞评论

看到这里或许你有建议或者疑问或者指出错误,请留言评论! 多谢! 你的评论非常重要!也可以帮忙点赞收藏转发!多谢支持! 觉得写的不错那就给点吧, 在线乞讨 微信转账