Type erasure and the design of std::function


This post is a write-up of "type erased printable" and "the design space for std::function".

Rather than a "type erasure technique", it is really better described as a polymorphism technique.

There are several ways to get this kind of polymorphism:

  • void*, the traditional "universal argument"
  • interface inheritance (reference-semantic polymorphism), plus dynamic_cast
  • value-semantic polymorphism, i.e. type erasure
    • std::function, boost::any_range, boost::any

Here is an example: a type-erased printable.

Printing an owned (stored) value — godbolt link

#include <memory>
#include <ostream>

struct PrintableBase {
    virtual void print(std::ostream& os) const = 0;
    virtual ~PrintableBase() = default;
};

template<class T>
struct PrintableImpl : PrintableBase {
    T t_;
    explicit PrintableImpl(T t) : t_(std::move(t)) {}
    void print(std::ostream& os) const override { os << t_; }
};

class UniquePrintable {
    std::unique_ptr<PrintableBase> p_;
public:
    template<class T>
    UniquePrintable(T t) : p_(std::make_unique<PrintableImpl<T>>(std::move(t))) { }

    friend std::ostream& operator<<(std::ostream& os, const UniquePrintable& self) {
        self.p_->print(os);
        return os;
    }
};

#include <iostream>

void printit(UniquePrintable p) {
    std::cout << "The printable thing was: " << p << "." << std::endl;
}

int main() {
    printit(42);
    printit("hello world");
}

Printing a value directly (by reference) — Godbolt.

#include <ostream>

class PrintableRef {
    const void *data_;
    void (*print_)(std::ostream&, const void *);
public:
    template<class T>
    PrintableRef(const T& t) : data_(&t), print_([](std::ostream& os, const void *data) {
        os << *(const T*)data;
    }) { }

    friend std::ostream& operator<<(std::ostream& os, const PrintableRef& self) {
        self.print_(os, self.data_);
        return os;
    }
};

#include <iostream>

void printit(PrintableRef p) {
    std::cout << "The printable thing was: " << p << "." << std::endl;
}

int main() {
    printit(42);
    printit("hello world");
}

These are two flavors of type erasure. One goes through a common interface: it stores the value and binds the print method to the concrete type. The other instantiates the print function up front and stores the value's address as a void*.

They are exactly the two techniques listed above, spelled out.

The first, virtual-function-based approach has overhead (a heap allocation plus virtual dispatch).

Speaking of std::function and std::any: the standard library optimizes this virtual-function scheme with a small buffer, known as SBO (small buffer optimization).

Let's walk through the design space of std::function.

  • Does the callable need to be stored (owned)? If yes, that is std::function; if not, that is function_ref (a view type, still a proposal).

    • The question is whether the wrapper has to manage the callable and its lifetime; function_ref only uses it and never owns it.
  • Does it need to be copyable? If yes, std::function; if copies are not needed, std::unique_function (a unique-ownership wrapper, also a proposal), i.e. move-only.

  • Does it need to be shared? Sharing makes the callable's side effects visible across copies:

    auto f = [i=0]() mutable { return ++i; };
    F<int()> alpha = f;
    F<int()> beta = alpha;
    F<int()> gamma = f;
    assert(alpha() == 1);
    assert(beta() == 2);  // beta shares alpha's heap-managed state
    assert(gamma() == 1);  
    

    There may eventually be some kind of shared_function (which I think is redundant).

  • SBO-related design. Like SSO, the idea is to keep a small buffer inside the object so that no heap allocation is needed.

    • Buffer size? Should it be customizable? Standard libraries currently differ: clang's libc++ uses 24 bytes, gcc's libstdc++ uses 16.

    Designing your own, you might make it configurable:

    template<class Signature, size_t Capacity = 24, size_t Align = alignof(std::max_align_t)>
    class F;
      
    using SSECallback = F<int(), 32, 32>;  // suitable for lambdas that capture MMX vector type
    

    Has nobody thought of this? Of course they have — it already exists as inplace_function.

  • What if SBO cannot apply? Should an allocator interface be supported?

  • Force SBO, and turn anything that cannot be stored in place into a compile error — inplace_function does exactly this.

  • SBO requires the stored object to be nothrow movable.

    • static_assert(std::is_nothrow_move_constructible_v<T>) inside the constructor of F.
  • How should SBO handle types that are not trivially copyable?

    • libc++ still applies SBO in that case, but libstdc++ does not.
    • Control it with static_assert(is_trivially_relocatable_v<T> && sizeof(T) <= Capacity && alignof(T) <= Align) inside the constructor of F.
  • Can the function be empty? Can it be constructed from nullptr?

  • Can functions convert between different signatures?

There are many more corners I won't go into; that is exactly why so many proposals exist to supplement std::function.
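Pulling a few of these choices together: below is a minimal sketch (my own illustration, not the proposed inplace_function interface) of a forced-SBO, non-copyable wrapper. Anything that does not fit the in-place buffer, or is not nothrow move constructible, is rejected at compile time; a real implementation would also store a type-erased move/relocate operation.

#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>

template<class Signature, std::size_t Capacity = 24,
         std::size_t Align = alignof(std::max_align_t)>
class F; // primary template left undefined

template<class R, class... Args, std::size_t Capacity, std::size_t Align>
class F<R(Args...), Capacity, Align> {
    alignas(Align) std::byte buf_[Capacity];   // in-place storage, never the heap
    R (*call_)(void*, Args&&...) = nullptr;    // type-erased invoke
    void (*destroy_)(void*) = nullptr;         // type-erased destructor
public:
    template<class T>
    F(T t) {
        static_assert(sizeof(T) <= Capacity && alignof(T) <= Align,
                      "callable does not fit the in-place buffer");
        static_assert(std::is_nothrow_move_constructible_v<T>);
        ::new (static_cast<void*>(buf_)) T(std::move(t));
        call_ = [](void* p, Args&&... args) -> R {
            return (*static_cast<T*>(p))(std::forward<Args>(args)...);
        };
        destroy_ = [](void* p) { static_cast<T*>(p)->~T(); };
    }
    F(const F&) = delete;   // copy and move omitted for brevity
    ~F() { destroy_(buf_); }
    R operator()(Args... args) { return call_(buf_, std::forward<Args>(args)...); }
};

// F<int()> counter = [i = 0]() mutable { return ++i; };  // stored entirely in place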


ref

  • https://www.newsmth.net/nForum/#!article/Programming/3083 — stumbled on a 2002 post introducing boost::any; a trace of history
  • https://akrzemi1.wordpress.com/2013/11/18/type-erasure-part-i/
  • Another trace of history, any_iterator: http://thbecker.net/free_software_utilities/type_erasure_for_cpp_iterators/any_iterator.html
  • std::function implementation walkthrough at the gcc source level: https://www.cnblogs.com/jerry-fuyi/p/std_function_interface_implementation.html
  • std::function implementation walkthrough, from basics to details: https://zhuanlan.zhihu.com/p/142175297
  • This article is well written — I was halfway through writing my own when I found it... just read this one: https://fuzhe1989.github.io/2017/10/29/cpp-type-erasure/

Read More

Several ways to implement reflection in C++, plus a few libraries

People's requirements really are complicated: they want the name information and a generic access interface at the same time.

Approaches to implementing reflection:

  • Add a preprocessing layer
    • Representative examples: Qt and Unreal. Fields are declared with macros, and a pre-preprocessor in the build framework runs first and expands those markers.
    • Or do it with libclang: metareflect, cpp-reflection, plus an article explaining how it works
  • Registration, in several flavors (a minimal sketch follows this list)
    • stitch methods together with macros and register them: RTTR
    • hand-written registration: meta
    • compiler deduction (limited functionality): magic_get
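To make the "macro registration" idea concrete, here is the minimal sketch promised above (entirely hypothetical, not RTTR's or meta's actual API): the macro stringizes the field name and registers a type-erased accessor for it.

#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>

struct Person { std::string name; int age; };

// field name -> accessor that prints that field of a given object
using Getter = std::function<void(const void*, std::ostream&)>;
std::unordered_map<std::string, Getter>& registry() {
    static std::unordered_map<std::string, Getter> r;
    return r;
}

#define REGISTER_FIELD(Type, Field)                                  \
    registry()[#Field] = [](const void* obj, std::ostream& os) {     \
        os << static_cast<const Type*>(obj)->Field;                  \
    }

int main() {
    REGISTER_FIELD(Person, name);   // #Field keeps the name as a string
    REGISTER_FIELD(Person, age);
    Person p{"Baggins", 111};
    for (auto& [field, print] : registry()) {
        std::cout << field << " = ";
        print(&p, std::cout);
        std::cout << "\n";
    }
}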

ref

  • This restates the content of this blog post: https://blog.csdn.net/D_Guco/article/details/106744416
  • RTTR — the approach is macro registration
  • ponder also has a registration hub that binds strings to function pointers
  • cista — per its website, the idea is close to the magic_get approach mentioned above; it also offers macro injection, and credits this post for the inspiration

Read More

boost.pfr (Precise and Flat Reflection), a.k.a. magic_get: how to use it and how it works

Purpose: provide the most basic reflection capability, namely generic access to fields without naming them one by one.

This design gives tuple-style access to the fields; keeping the name information as well is what static reflection is meant to solve.

Limitation: only simple aggregate types are supported; add inheritance and it no longer works, not even with an empty base class.

struct simple_aggregate {  // SimpleAggregate
    std::string name;
    int age;
    boost::uuids::uuid uuid;
};

struct empty {             // SimpleAggregate
};

struct aggregate : empty { // not a SimpleAggregate
    std::string name;
    int age;
    boost::uuids::uuid uuid;
};

Usage

#include <iostream>
#include <boost/pfr.hpp>
 
struct  Record
{
  std::string name;
  int         age;
  double      salary;
};
 
struct Point
{
  int x;
  int y;
};
 
int main()
{
  Point pt{2, 3};
  Record rec {"Baggins", 111, 999.99};
   
  auto print = [](auto const& member) {
    std::cout << member << " ";
  };  
  
  boost::pfr::for_each_field(rec, print);
  boost::pfr::for_each_field(pt, print);
}

The documentation also explains the principle:

  • at compile-time: use aggregate initialization to detect fields count in user-provided structure
    • BOOST_PFR_USE_CPP17 == 1:
      • at compile-time: structured bindings are used to decompose a type T to known amount of fields
    • BOOST_PFR_USE_CPP17 == 0 && BOOST_PFR_USE_LOOPHOLE == 1:
      • at compile-time: use aggregate initialization to detect fields count in user-provided structure
      • at compile-time: make a structure that is convertible to anything and remember types it has been converted to during aggregate initialization of user-provided structure
      • at compile-time: using knowledge from previous steps create a tuple with exactly the same layout as in user-provided structure
      • at compile-time: find offsets for each field in user-provided structure using the tuple from previous step
      • at run-time: get pointer to each field, knowing the structure address and each field offset
      • at run-time: a tuple of references to fields is returned => all the tuple methods are available for the structure
    • BOOST_PFR_USE_CPP17 == 0 && BOOST_PFR_USE_LOOPHOLE == 0:
      • at compile-time: let I be the index of the current field; it starts at 0
      • at run-time: T is constructed and field I is aggregate initialized using a separate instance of structure that is convertible to anything
      • at compile-time: I += 1
      • at compile-time: if I does not equal fields count goto step c. from inside of the conversion operator of the structure that is convertible to anything
      • at compile-time: using knowledge from previous steps create a tuple with exactly the same layout as in user-provided structure
      • at compile-time: find offsets for each field in user-provided structure using the tuple from previous step
      • at run-time: get pointer to each field, knowing the structure address and each field offset
  • at run-time: a tuple of references to fields is returned => all the tuple methods are available for the structure

Now that we are in the C++17/C++20 era, consider the BOOST_PFR_USE_CPP17 == 1 path, which relies on structured bindings and expansion.

The prototype looks roughly like this:

template <typename T, typename F>
 // requires std::is_aggregate_v<T>
void for_each_member(T const & v, F f);

First, we need a way to detect how many fields the struct has.

template <typename T>
constexpr auto size_()
  -> decltype(T{{}, {}, {}, {}}, 0u)
{ return 4u; }
 
template <typename T>
constexpr auto size_()
  -> decltype(T{{}, {}, {}}, 0u)
{ return 3u; }
 
template <typename T>
constexpr auto size_()
  -> decltype(T{{}, {}}, 0u)
{ return 2u; }
 
template <typename T>
constexpr auto size_()
  -> decltype(T{{}}, 0u)
{ return 1u; }
 
template <typename T>
constexpr auto size_()
  -> decltype(T{}, 0u)
{ return 0u; }
 
template <typename T>
constexpr size_t size()
{
  static_assert(std::is_aggregate_v<T>);
  return size_<T>();
}

The thing to look at is decltype(T{{}, {}}, 0u): it is a comma expression, the value of the left operand does not matter, so the deduced type is always unsigned.

But whether T can be list-initialized from that many initializers tells us how many fields it has; whichever overload matches, its return value is the field count.

Here we assumed every field can be initialized from {}, but some fields may not accept that, so we add a helper type with a templated conversion operator that converts to anything:

struct init
{
  template <typename T>
  operator T(); // never defined
};
template <typename T>
constexpr auto size_() 
  -> decltype(T{init{}, init{}, init{}, init{}}, 0u)
{ return 4u; }
 
template <typename T>
constexpr auto size_() 
  -> decltype(T{init{}, init{}, init{}}, 0u)
{ return 3u; }
 
template <typename T>
constexpr auto size_() 
  -> decltype(T{init{}, init{}}, 0u)
{ return 2u; }
 
template <typename T>
constexpr auto size_() 
  -> decltype(T{init{}}, 0u)
{ return 1u; }
 
template <typename T>
constexpr auto size_() 
  -> decltype(T{}, 0u)
{ return 0u; }
 
template <typename T>
constexpr size_t size() 
{ 
  static_assert(std::is_aggregate_v<T>);
  return size_<T>(); 
}

This looks workable, but size<Point>() still fails to compile: an aggregate does not require an initializer for every field, so several of the overloads can match at once.

Introduce tag dispatch:

template <unsigned I>
struct tag : tag<I - 1> {};
 
template <>
struct tag<0> {};
template <typename T>
constexpr auto size_(tag<4>) 
  -> decltype(T{init{}, init{}, init{}, init{}}, 0u)
{ return 4u; }
 
template <typename T>
constexpr auto size_(tag<3>) 
  -> decltype(T{init{}, init{}, init{}}, 0u)
{ return 3u; }
 
template <typename T>
constexpr auto size_(tag<2>) 
  -> decltype(T{init{}, init{}}, 0u)
{ return 2u; }
 
template <typename T>
constexpr auto size_(tag<1>) 
  -> decltype(T{init{}}, 0u)
{ return 1u; }
 
template <typename T>
constexpr auto size_(tag<0>) 
  -> decltype(T{}, 0u)
{ return 0u; }
 
template <typename T>
constexpr size_t size() 
{ 
  static_assert(std::is_aggregate_v<T>);
  return size_<T>(tag<4>{}); // highest supported number 
}

Now there is no ambiguity: overload resolution walks down from the most derived tag and picks the first overload that works.

The corresponding for_each is just structured bindings:

template <typename T, typename F>
void for_each_member(T const& v, F f)
{
  static_assert(std::is_aggregate_v<T>);
 
  if constexpr (size<T>() == 4u)
  {
    const auto& [m0, m1, m2, m3] = v;
    f(m0); f(m1); f(m2); f(m3);
  }
  else if constexpr (size<T>() == 3u)
  {
    const auto& [m0, m1, m2] = v;
    f(m0); f(m1); f(m2);
  }
  else if constexpr (size<T>() == 2u)
  {
    const auto& [m0, m1] = v;
    f(m0); f(m1);
  }
  else if constexpr (size<T>() == 1u)
  {
    const auto& [m0] = v;
    f(m0);
  }
}

Once the size is known, generalizing is straightforward.
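The same structured-binding trick can also hand back a tuple of references instead of visiting each member — roughly what boost::pfr::structure_tie does. A minimal sketch building on the size<T>() helper above, for the 2- and 3-field cases only:

#include <tuple>
#include <type_traits>

template <typename T>
auto tie_members(T& v) {
  static_assert(std::is_aggregate_v<T>);
  if constexpr (size<T>() == 3u) {
    auto& [m0, m1, m2] = v;
    return std::tie(m0, m1, m2);   // tuple of references to the members
  } else if constexpr (size<T>() == 2u) {
    auto& [m0, m1] = v;
    return std::tie(m0, m1);
  }
}

// Point pt{2, 3};
// std::get<0>(tie_members(pt)) = 7;   // writes through to pt.x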

boost.pfr's implementation is considerably more general; worth digging into when there is time.




Read More

Usage of std::exchange

There is not much to this function; it exists mainly for "stealing" resources, and the implementation is trivial:

template<class T, class U = T>
constexpr // since C++20
T exchange(T& obj, U&& new_value)
{
    T old_value = std::move(obj);
    obj = std::forward<U>(new_value);
    return old_value;
}

For example, the move-constructor implementation from reference link 1:

struct S
{
  int n;
 
  S(S&& other) noexcept : n{std::exchange(other.n, 0)}
  {}
 
  S& operator=(S&& other) noexcept 
  {
    if(this != &other)
        n = std::exchange(other.n, 0); // move n, leaving zero behind in other.n
    return *this;
  }
};

A usage I ran into (in seastar):

template <promise_base::urgent Urgent>
void promise_base::make_ready() noexcept {
    if (_task) {
        if (Urgent == urgent::yes) {
            ::seastar::schedule_urgent(std::exchange(_task, nullptr));
        } else {
            ::seastar::schedule(std::exchange(_task, nullptr));
        }
    }
}

The natural question is how it compares with std::swap. The short answer: its performance is at best that of std::swap, so unless you need the "steal the value and leave a known one behind" behavior, don't use it.

A Stack Overflow answer does a simple verification; see reference link 2.

Ben Deane also has a set of std::exchange idioms, covered in reference links 3 and 4. A brief summary:

the idea is to use std::exchange to drop unnecessary temporaries. The slides in link 3 are worth reading and nicely written; the author calls the original version the "swap-and-iterate" pattern.

Below is the code from reference link 4.

The old way, with swap:

class Dispatcher {
    // We hold some vector of callables that represents
    // events to dispatch or actions to take
    using Callback = /* some callable */;
    std::vector<Callback> callbacks_;
 
    // Anyone can register an event to be dispatched later
    void defer_event(const Callback& cb) {
        callbacks_.push_back(cb);
    }
 
    // All events are dispatched when we call process
    void process() {
        std::vector<Callback> tmp{};
        using std::swap; // the "std::swap" two-step
        swap(tmp, callbacks_);
        for (const auto& callback : tmp) {
            std::invoke(callback);
        }
    }
  
    void post_event(Callback& cb) {
        Callback tmp{};
        using std::swap;
        swap(cb, tmp);
        PostToMainThread([this, cb_ = std::move(tmp)] {
            callbacks_.push_back(cb_);
        });
    }
};

Rewritten with exchange:

class Dispatcher {
    // ...
 
    // All events are dispatched when we call process
    void process() {
        for (const auto& callback : std::exchange(callbacks_, {})) {
            std::invoke(callback);
        }
    }
    
    void post_event(Callback& cb) {
        PostToMainThread([this, cb_ = std::exchange(cb, {})] {
            callbacks_.push_back(cb_);
        });
    }
};

You might ask: wouldn't a plain std::move do the job? The author's point here is about interface guarantees: a moved-from object is not required to be empty or cleared and may still hold a value, std::optional being the classic example, whereas exchange explicitly leaves a chosen state behind.
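A small example of that difference with std::optional (my own illustration): a moved-from optional still reports has_value(), while exchange leaves exactly the state you asked for.

#include <cassert>
#include <optional>
#include <utility>

int main() {
    std::optional<int> a = 42;
    std::optional<int> b = std::move(a);
    assert(a.has_value());   // move only moves the contained int; a is not cleared

    std::optional<int> c = 42;
    std::optional<int> d = std::exchange(c, std::nullopt);
    assert(!c.has_value());  // exchange leaves a well-defined empty state
    assert(d == 42);
}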

Combined with a lock

The original std::swap version looks like this:

class Dispatcher {
    // ...
 
    // All events are dispatched when we call process
    void process() {
        std::vector<Callback> tmp{};
        {
            using std::swap;
            std::scoped_lock lock{mutex_};
            swap(tmp, callbacks_);
        }
        for (const auto& callback : tmp) {
            std::invoke(callback);
        }
    }
};

Rewritten with exchange, dropping the temporary vector (though note that the lock is now held while the callbacks run):

class Dispatcher {
    // ...
 
    // All events are dispatched when we call process
    void process() {
        std::scoped_lock lock{mutex_};
        for (const auto& callback : std::exchange(callbacks_, {})) {
            std::invoke(callback);
        }
    }
};

Can the lock be trimmed down too? A temporary lives until the end of its full expression — one line, and one line is all we need here:

class Dispatcher {
    // ...
 
    // All events are dispatched when we call process
    void process() {
        const auto tmp = (std::scoped_lock{mutex_}, std::exchange(callbacks_, {}));
        for (const auto& callback : tmp) {
            std::invoke(callback);
        }
    }
};

ref

  • https://zh.cppreference.com/w/cpp/utility/exchange
  • https://stackoverflow.com/questions/20807938/stdswap-vs-stdexchange-vs-swap-operator
  • https://github.com/CppCon/CppCon2017/blob/master/Lightning%20Talks%20and%20Lunch%20Sessions/std%20exchange%20idioms/std%20exchange%20idioms%20-%20Ben%20Deane%20-%20CppCon%202017.pdf
  • https://www.fluentcpp.com/2020/09/25/stdexchange-patterns-fast-safe-expressive-and-probably-underused/

Read More

Probing whether a pointer address is valid


This article is excerpted from here.

Scenario: touching an invalid address segfaults on the spot. To avoid that, how can the address be probed first?

Two approaches:

  • catch the SIGSEGV signal
  • check the address ranges in /proc/self/maps and validate against them
    • problem: races with other threads can produce wrong answers, so this is not advisable

The author wrote a small program; it is listed here just to show the principle.

#define _GNU_SOURCE
#include <stdint.h>
#include <signal.h>
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <ucontext.h>

#ifdef __i386__
typedef uint32_t word_t;
#define IP_REG REG_EIP
#define IP_REG_SKIP 3
#define READ_CODE __asm__ __volatile__(".byte 0x8b, 0x03\n"  /* mov (%ebx), %eax */ \
                                       ".byte 0x41\n"        /* inc %ecx */ \
                                       : "=a"(ret), "=c"(tmp) : "b"(addr), "c"(tmp));
#endif

#ifdef __x86_64__
typedef uint64_t word_t;
#define IP_REG REG_RIP
#define IP_REG_SKIP 6
#define READ_CODE __asm__ __volatile__(".byte 0x48, 0x8b, 0x03\n"  /* mov (%rbx), %rax */ \
                                       ".byte 0x48, 0xff, 0xc1\n"  /* inc %rcx */ \
                                       : "=a"(ret), "=c"(tmp) : "b"(addr), "c"(tmp));
#endif

static void segv_action(int sig, siginfo_t *info, void *ucontext) {
    (void) sig;
    (void) info;
    ucontext_t *uctx = (ucontext_t*) ucontext;
    uctx->uc_mcontext.gregs[IP_REG] += IP_REG_SKIP;
}

struct sigaction peek_sigaction = {
    .sa_sigaction = segv_action,
    .sa_flags = SA_SIGINFO,
    /* .sa_mask is left zero-initialized; it is a sigset_t struct, not an integer */
};

word_t peek(word_t *addr, int *success) {
    word_t ret;
    int tmp, res;
    struct sigaction prev_act;

    res = sigaction(SIGSEGV, &peek_sigaction, &prev_act);
    assert(res == 0);

    tmp = 0;
    READ_CODE

    res = sigaction(SIGSEGV, &prev_act, NULL);
    assert(res == 0);

    if (success) {
        *success = tmp;
    }

    return ret;
}

int main() {
    int success;
    word_t number = 22;
    word_t value;

    number = 22;
    value = peek(&number, &success);
    printf("%d %lu\n", success, (unsigned long)value);

    value = peek(NULL, &success);
    printf("%d %lu\n", success, (unsigned long)value);

    value = peek((word_t*)0x1234, &success);
    printf("%d %lu\n", success, (unsigned long)value);

    return 0;
}

Take this as entertainment: it works by patching the instruction pointer saved in the signal context, and correctness is not guaranteed (under multithreading it is almost certainly wrong; signal dispositions are process-wide, so this sort of thing belongs at the outermost level, if anywhere).

Also, unless you are writing something like a shared-memory program, just let the segfault kill the process instead of trying to rescue it.


Read More

(CppCon) Practical memory pool based allocators for Modern C++

Practical memory pool based allocators for Modern C++

Another talk about implementing a memory pool; here the pool is essentially a pool of buckets.

The bucket is the basic unit; a bucket has two properties: BlockSize and BlockCount.

The bucket's main interface: construction, destruction, allocation and deallocation.

class bucket {
public:
	const std::size_t BlockSize;
	const std::size_t BlockCount;
	bucket(std::size_t block_size, std::size_t block_count);
	~bucket();
	// Tests if the pointer belongs to this bucket
	bool belongs(void * ptr) const noexcept;
	// Returns nullptr if failed
	[[nodiscard]] void * allocate(std::size_t bytes) noexcept;
	void deallocate(void * ptr, std::size_t bytes) noexcept;
private:
	// Finds n free contiguous blocks in the ledger and returns the first block’s index or BlockCount on failure
	std::size_t find_contiguous_blocks(std::size_t n) const noexcept;
	// Marks n blocks in the ledger as “in-use” starting at ‘index’
	void set_blocks_in_use(std::size_t index, std::size_t n) noexcept;
	// Marks n blocks in the ledger as “free” starting at ‘index’
	void set_blocks_free(std::size_t index, std::size_t n) noexcept;
	// Actual memory for allocations
	std::byte* m_data{nullptr};
	// Reserves one bit per block to indicate whether it is in-use
	std::byte* m_ledger{nullptr};
};

bucket::bucket(std::size_t block_size, std::size_t block_count)
: BlockSize{block_size}
, BlockCount{block_count}
{
	const auto data_size = BlockSize * BlockCount;
	m_data = static_cast<std::byte*>(std::malloc(data_size));
	assert(m_data != nullptr);
	const auto ledger_size = 1 + ((BlockCount - 1) / 8);
	m_ledger = static_cast<std::byte*>(std::malloc(ledger_size));
	assert(m_ledger != nullptr);
	std::memset(m_data, 0, data_size);
	std::memset(m_ledger, 0, ledger_size);
}
bucket::~bucket() {
	std::free(m_ledger);
	std::free(m_data);
}


void * bucket::allocate(std::size_t bytes) noexcept {
	// Calculate the required number of blocks
	const auto n = 1 + ((bytes - 1) / BlockSize);
	const auto index = find_contiguous_blocks(n);
	if (index == BlockCount) {
		return nullptr;
	}
	set_blocks_in_use(index, n);
	return m_data + (index * BlockSize);
}

void bucket::deallocate(void * ptr, std::size_t bytes) noexcept {
	const auto p = static_cast<const std::byte *>(ptr);
	const std::size_t dist = static_cast<std::size_t>(p - m_data);
	// Calculate block index from pointer distance
	const auto index = dist / BlockSize;
	// Calculate the required number of blocks
	const auto n = 1 + ((bytes - 1) / BlockSize);
	// Update the ledger
	set_blocks_free(index, n);
}

Buckets are then composed into a pool, each with its own BlockSize and BlockCount:

// The default implementation defines a pool with no buckets
template<std::size_t id>
struct bucket_descriptors {
	using type = std::tuple<>;
};

struct bucket_cfg16 {
	static constexpr std::size_t BlockSize = 16;
	static constexpr std::size_t BlockCount = 10000;
};
struct bucket_cfg32{
	static constexpr std::size_t BlockSize = 32;
	static constexpr std::size_t BlockCount = 10000;
};
struct bucket_cfg1024 {
	static constexpr std::size_t BlockSize = 1024;
	static constexpr std::size_t BlockCount = 50000;
};
template<>
struct bucket_descriptors<1> {
	using type = std::tuple<bucket_cfg16, bucket_cfg32, bucket_cfg1024>;
};

template<std::size_t id>
using bucket_descriptors_t = typename bucket_descriptors<id>::type;

template<std::size_t id>
static constexpr std::size_t bucket_count = std::tuple_size<bucket_descriptors_t<id>>::value;


template<std::size_t id>
using pool_type = std::array<bucket, bucket_count<id>>;

template<std::size_t id, std::size_t Idx>
struct get_size
    : std::integral_constant<std::size_t, std::tuple_element_t<Idx, bucket_descriptors_t<id>>::BlockSize> {
};
    
template<std::size_t id, std::size_t Idx>
struct get_count
    : std::integral_constant<std::size_t, std::tuple_element_t<Idx, bucket_descriptors_t<id>>::BlockCount> {
};

template<std::size_t id, std::size_t... Idx>
auto & get_instance(std::index_sequence<Idx...>) noexcept {
	static pool_type<id> instance{{{get_size<id, Idx>::value, get_count<id, Idx>::value} ...}};
	return instance;
}
template<std::size_t id>
auto & get_instance() noexcept {
	return get_instance<id>(std::make_index_sequence<bucket_count<id>>());
}

Now the actual allocation policy: how do we find the blocks we need?

Simply scanning for the first free bucket, a bit like open addressing in a hash map, works but is wasteful:

// Assuming buckets are sorted by their block sizes
template<std::size_t id>
[[nodiscard]] void * allocate(std::size_t bytes) {
	auto & pool = get_instance<id>();
	for (auto & bucket : pool) {
		if(bucket.BlockSize >= bytes) {
			if(auto ptr = bucket.allocate(bytes); ptr != nullptr) {
				return ptr;
			}
		}
	}
	throw std::bad_alloc{};
}

A better policy needs extra bookkeeping about how wasteful each bucket would be for the request:

// The slides rely on a small helper record whose definition is not shown;
// a plausible definition (an assumption, not from the talk) is:
struct info {
	std::size_t index{0};        // which bucket
	std::size_t block_count{0};  // how many blocks the request would need there
	std::size_t waste{0};        // bytes wasted by rounding up
	bool operator<(const info& other) const noexcept {
		// prefer the least waste, then the fewest blocks
		return waste == other.waste ? block_count < other.block_count
		                            : waste < other.waste;
	}
};

template<std::size_t id>
[[nodiscard]] void * allocate(std::size_t bytes) {
	auto & pool = get_instance<id>();
	std::array<info, bucket_count<id>> deltas;
	std::size_t index = 0;
	for (const auto & bucket : pool) {
		deltas[index].index = index;
		if (bucket.BlockSize >= bytes) {
			deltas[index].waste = bucket.BlockSize - bytes;
			deltas[index].block_count = 1;
		} else {
			const auto n = 1 + ((bytes - 1) / bucket.BlockSize);
			const auto storage_required = n * bucket.BlockSize;
			deltas[index].waste = storage_required - bytes;
			deltas[index].block_count = n;
		}
		++index;
	}

    sort(deltas.begin(), deltas.end()); // std::sort() is allowed to allocate
    
	for (const auto & d : deltas)
		if (auto ptr = pool[d.index].allocate(bytes); ptr != nullptr)
			return ptr;
	
    throw std::bad_alloc{};
}

What about fragmentation?

Implementing the allocator interface

Not covered in detail here; see the code:

	template<typename T = std::uint8_t, std::size_t id = 0>
class static_pool_allocator {
public:
	// rebind probably doesn't need to be implemented by hand; I seem to recall it is (mostly) obsolete now
    template<typename U>
	static_pool_allocator(const static_pool_allocator<U, id> & other) noexcept
		: m_upstream_resource{other.upstream_resource()} {}
	template<typename U>
	static_pool_allocator & operator=(const static_pool_allocator<U, id> & other) noexcept {
		m_upstream_resource = other.upstream_resource();
		return *this;
	}
	static bool initialize_memory_pool() noexcept { return memory_pool::initialize<id>(); }
private:
	pmr::memory_resource * m_upstream_resource;
};

The talk then introduces a tool for analyzing allocations.

clang

  • compile to LLVM bitcode with -g -O0 -emit-llvm -DNDEBUG, then link it with llvm-link
  • write an LLVM pass
    • that prints the calls
  • run the pass with the opt command

The pass looks like this:

class AllocListPass : public llvm::FunctionPass {
public:
	static char ID;
	AllocListPass() : llvm::FunctionPass(ID) {}

    bool runOnFunction(llvm::Function & f) override {
		const auto pretty_name = boost::core::demangle(f.getName().str().c_str());
		static const std::regex call_regex{R"(void instrument::type_reg<([^,]+),(.+),([^,]+)>\(\))"};
		std::smatch match;
		if (std::regex_match(pretty_name, match, call_regex)) {
			if (match.size() == 4) {
				const auto pool_id = std::atoi(match[1].str().c_str());
				const auto type = match[2].str();
				const auto size = std::atoi(match[3].str().c_str());
				std::cout << "Pool ID: " << pool_id << ", Size: " << size << ", Type: " << type << "\n";
			}
		}
		return false; // does not alter the code, a read-only pass
	}
};
char AllocListPass::ID = 0;
static llvm::RegisterPass<AllocListPass> dummy("alloc-list", "This pass lists memory pool allocations");

How the llvm::ModulePass works:

  • DFS from the entry points (main, etc.)

  • find the type_reg<> instantiations
    • record the allocation info
    • print the call chain
  • check for recursion and skip some branches

The output looks like this:

Call graph for: Pool ID: 3, Size: 24, Type: std::__1::__list_node<int, void*>:
1. static_pool_allocator<std::__1::__list_node<int, void*>, 3ul>::allocate(unsigned long, void const*) called at /usr/include/c++/v1/memory:1547
2. std::__1::allocator_traits<static_pool_allocator<std::__1::__list_node<int, void*>, 3ul>>::allocate(static_pool_allocator<std::__1::__list_node<int, void*>, 3ul>&,
unsigned long) called at /usr/include/c++/v1/list:1079
3. std::__1::list<int, static_pool_allocator<int, 3ul>>::__allocate_node(static_pool_allocator<std::__1::__list_node<int, void*>, 3ul>&) called at
/usr/include/c++/v1/list:1569
4. std::__1::list<int, static_pool_allocator<int, 3ul>>::push_back(int const&) called at /home/program.cpp:12
5. x() called at /home/program.cpp:7
6. f() called at /home/program.cpp:2

llvm opt

opt -load alloc-analyzer.so -alloc-analyze -gen-hdr my_defs.hpp -entry-point "main" < home/program.bc -o /dev/null

ref

  • https://github.com/CppCon/CppCon2020/blob/main/Presentations/practical_memory_pool_based_allocators_for_modern_cpp/practical_memory_pool_based_allocators_for_modern_cpp__misha_shalem__cppcon_2020.pdf

Read More

(CppCon 2020) Back to Basics

Algebraic Data Types

Refers to pair/tuple (product types, i.e. structs) and optional/variant (sum types, which carry an index of the active member).

std::any does not count: it erases the type and loses that information.

pair and tuple: assorted facilities

  • std::in_place / std::piecewise_construct
  • forward_as_tuple
  • std::tie
    • use std::tie to implement comparisons (a small sketch follows this list)
  • structured bindings
  • for a public interface, still prefer defining a class/struct; losing the member names is a pity
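A quick sketch of the std::tie comparison idiom mentioned above (my own example): the tuples of references compare lexicographically, so one line gives a correct operator<.

#include <string>
#include <tuple>

struct Employee {
    std::string name;
    int id;
    double salary;
};

bool operator<(const Employee& a, const Employee& b) {
    // lexicographic comparison over the listed members, in this order
    return std::tie(a.name, a.id, a.salary) < std::tie(b.name, b.id, b.salary);
}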

optional: elegant, and wastes no heap space

optional as "default value / no value": compared with the pointer-based version there is no heap allocation. Elegant:

std::unique_ptr<ComplicatedObject> obj_ = nullptr;
void setComplicated(int a, int b) {
	obj_ = std::make_unique<ComplicatedObject>(a, b);
}

std::optional<ComplicatedObject> obj_ = std::nullopt;
void setComplicated(int a, int b) {
	obj_.emplace(a, b);
}
  • std::optional<int> o = std::nullopt
  • the value_or method is very elegant and saves an if
    • the fallback has to be a constexpr-ish, ready-made value, otherwise the "or" part doesn't help
  • setters: taking a const std::optional<T>& parameter means you get a temporary optional when passing a plain T?

variant: an elegant union

The index() method returns the index of the active alternative.

std::get can access by index or by type; std::get_if likewise.

std::visit
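A tiny std::visit example (my own): the callable is dispatched on whichever alternative is currently held.

#include <iostream>
#include <string>
#include <variant>

int main() {
    std::variant<int, std::string> v = std::string("hi");
    std::visit([](const auto& x) { std::cout << x << "\n"; }, v);  // prints "hi"
    v = 42;
    std::visit([](const auto& x) { std::cout << x << "\n"; }, v);  // prints 42
}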

A poor man's Expected<T>:

std::variant<std::string, std::errc> vGetenv(const char *name);
if (auto v = vGetenv("foo"); std::get_if<std::string>(&v)) {
	const auto& value = std::get<std::string>(v);
	std::cout << "Value is: " << value << "\n";
} else {
	std::error_condition error = std::get<std::errc>(v);
	std::cout << "Error was: " << error.message() << "\n";
}

Class Layout

  • Static data members, non-virtual member functions, static member functions and nested types do not affect the class's storage layout.

  • Empty base classes; a member marked [[no_unique_address]] is allowed to occupy no storage.

    • Empty base optimization itself is left aside, since it depends on whether the compiler performs it.
  • Comparison: auto operator <=>(const Flatland &) const = default;

    • Don't just memcmp the default memory layout; it can fail (when? presumably with padding bytes or types whose object representation is not unique, which is what the has_unique_object_representations check below guards against)
  • POD, and packing

    • is_standard_layout_v
    class NarrowLand {
        unsigned char x;       // offset 0
        unsigned long long y;  // offset 8 (still!)
        unsigned long long z;  // offset 16
        friend bool operator ==(NarrowLand const &lhs, NarrowLand const &rhs);
    };
    bool operator ==(NarrowLand const &lhs, NarrowLand const &rhs) {
    	if constexpr (has_unique_object_representations_v<NarrowLand>)
    		return !memcmp(&lhs, &rhs, sizeof(NarrowLand));
    	else
    		return lhs.x == rhs.x && lhs.y == rhs.y && lhs.z == rhs.z;
    }
    
  • vptr

    • dynamic_cast is costly; static_cast just applies a fixed offset
  • The talk then walks through an implementation. Too tiring — skipping it; this stuff gives me a headache.


Concurrency

What is a data race and how do we fix it?

  • The hardware can reorder accesses (instruction reordering)
  • ABA
    • busy-wait aka spinning
    • std::mutex
      • exception-safety? RAII
  • condition_variable for "wait until" (producer/consumer)
    • if produce/consume happen only once, consider std::promise/std::future (internally they are also mutex + cv)
struct TokenPool {
	std::vector<Token> tokens_;
	std::mutex mtx_;
	std::condition_variable cv_;
	void returnToken(Token t) {
		std::unique_lock lk(mtx_);
		tokens_.push_back(t);
		lk.unlock();//!
		cv_.notify_one();
	}
	Token getToken() {
		std::unique_lock lk(mtx_);
		while (tokens_.empty()) {
			cv_.wait(lk);
		}
		Token t = std::move(tokens_.back());
		tokens_.pop_back();
		return t;
	}
};

Static initialization and once_flag (initialization in a multithreaded world)

class Logger {
	std::mutex mtx_;
	std::optional<NetworkConnection> conn_;
	NetworkConnection& getConn() {
		std::lock_guard<std::mutex> lk(mtx_);
		if (!conn_.has_value()) {
			conn_ = NetworkConnection(defaultHost);
		}
		return *conn_;
	}
};

class Logger {
	std::once_flag once_;
	std::optional<NetworkConnection> conn_;
	NetworkConnection& getConn() {
		std::call_once(once_, [this]() {
			conn_ = NetworkConnection(defaultHost);
		});
		return *conn_;
	}
};
  • mutex: lock blocks only if someone "owns" the mutex. Many threads can queue up on lock. Calling unlock unblocks exactly one waiter: the new "owner."
  • condition_variable: wait always blocks. Many threads can queue up on wait. Calling notify_one unblocks exactly one waiter; calling notify_all unblocks all waiters.
  • once_flag: call_once blocks only if the "done" flag isn't yet set. Many threads can queue up on call_once. Failing at the callback unblocks exactly one waiter (the new "owner"); succeeding at the callback unblocks all waiters and sets the "done" flag.

New C++17 and C++20 primitives

  • shared_mutex
  • counting_semaphore
using Sem = std::counting_semaphore<256>;
struct SemReleaser {
	void operator()(Sem *s) const { s->release(); }
};
class AnonymousTokenPool {
	Sem sem_{100};
	using Token = std::unique_ptr<Sem, SemReleaser>;
	Token borrowToken() {
		sem_.acquire(); // may block
		return Token(&sem_);
	}
};
  • std::latch (a small sketch follows this list)
  • std::barrier<>
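A minimal C++20 std::latch sketch, as promised above (my own example): each worker counts down once, and the main thread waits for all of them.

#include <iostream>
#include <latch>
#include <thread>
#include <vector>

int main() {
    std::latch done(3);                 // expects three count_down() calls
    std::vector<std::jthread> workers;
    for (int i = 0; i < 3; ++i)
        workers.emplace_back([&done, i] {
            std::cout << "worker " << i << " finished\n";
            done.count_down();
        });
    done.wait();                        // blocks until the count reaches zero
    std::cout << "all done\n";
}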


Patterns for sharing data

  • Remember: Protect shared data with a mutex.
    • You must protect every access, both reads and writes, to avoid UB.
    • Maybe use a reader-writer lock (std::shared_mutex) for perf.
  • Remember: Producer/consumer? Use mutex + condition_variable.

  • Best of all, though: Avoid sharing mutable data between threads.
    • Make the data immutable.
    • Clone a “working copy” for yourself, mutate that copy, and then quickly “merge” your changes back into the original when you’re done

In conclusion

  • Unprotected data races are UB
    • Use std::mutex to protect all accesses (both reads and writes)
  • Thread-safe static initialization is your friend
    • Use std::once_flag only when the initializee is non-static
  • mutex + condition_variable are best friends

  • C++20 gives us “counting” primitives like semaphore and latch

  • But if your program is fundamentally multithreaded, look for higher-level facilities: promise/future, coroutines, ASIO, TBB

  • std::atomic_ref<T>
  • std::jthread

Exceptions

  • The overhead exceptions impose can outweigh the error itself
    • alternative: std::expected<T, E>
  • Exceptions make functions harder to reason about
  • Exception handling relies on runtime support (dynamic libraries)
  • Exceptions increase binary size
  • When to use / not to use exceptions
    • for errors that happen rarely
      • if failure is frequent and part of normal operation, don't use exceptions
    • for errors that cannot be handled locally
      • e.g. I/O errors
    • for operations that are not supposed to fail, such as constructors
      • things like null dereferences and out-of-bounds access should be guaranteed not to happen rather than handled with exceptions
  • Exception safety guarantees
      • All functions should at least provide the basic exception safety guarantee, if possible and reasonable the strong guarantee.
      • Consider the no-throw guarantee, but only provide it if you can guarantee it even for possible future changes.
    • the basic guarantee
      • no resource leaks
      • invariants are preserved
    • the strong guarantee (see the copy-and-swap sketch after this list)
      • invariants are preserved
      • no resource leaks
      • state unchanged on failure: commit-or-rollback
    • no-throw
      • the operation cannot fail
      • noexcept
  • RAII: "RAII is the single most important idiom of the C++ programming language. Use it!"
  • Things that must not fail
  • Destructors
    • stack unwinding
    • a throwing destructor during unwinding means terminate
    • noexcept by default
    • cleanup must be safe
  • Move operations
    • Core Guideline C.66: make move operations noexcept
  • swap: implemented in terms of primitive operations, must not fail
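A classic way to get the strong guarantee mentioned above is copy-and-swap (my own sketch): everything that can throw happens on a copy, and the visible state only changes in the final non-throwing swap, i.e. commit-or-rollback.

#include <utility>
#include <vector>

class Widget {
    std::vector<int> data_;
public:
    Widget& operator=(const Widget& other) {
        Widget tmp(other);            // may throw; *this is still untouched
        std::swap(data_, tmp.data_);  // noexcept commit
        return *this;
    }
};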

Lambda Expressions

The C++20 lambda grammar:

[capture clause] <template parameters> (parameter list)
    specifier exception attribute -> return type requires { body }
  • specifier
    • mutable
    • constexpr (deducible, so not strictly required)
    • consteval
  • exception
    • noexcept
    • throw() — don't use it
  • requires: constraints may refer to
    • the capture clause
    • the template parameters
    • arguments passed in the parameter list
    • anything which can be checked at compile time
  • capturing a std::unique_ptr (move it in):
std::unique_ptr<Widget> myPtr = std::make_unique<Widget>();
auto myLamb = [ capturedPtr = std::move(myPtr) ] ( )
{ return capturedPtr->computeSize(); };

Move Semantics

Rvalues revisited

  • No rvalue reference as function return type
int&& func() { return 42; }
void test() {
	int a = func(); // dangling: the 42 inside func was destroyed before this point
}

std::move

template <class T>
constexpr remove_reference_t<T>&& move(T&& t) noexcept
{
	return static_cast<remove_reference_t<T>&&>(t);
}
  • Next operation after std::move is destruction or assignment — after a move, the object should only be destroyed or assigned to; anything else invites trouble
  • Don't std::move the return of a local variable — it defeats copy elision (NRVO)

Move constructor / assignment

  • Move constructor / assignment should be explicitly noexcept
  • Use =default when possible
  • Moved-from object must be left in a valid state
  • Make move assignment safe for self-assignment
struct S {
	double* data;
	S( S&& other ) noexcept
		: data(std::exchange(other.data, nullptr))
	{ }
    
    S& operator=( S&& other ) noexcept {
        if (this == &other) return *this;
		
        delete[] data;
		data = std::exchange(other.data, nullptr);
		return *this;
	}
};

Perfect forwarding

template <class T> void f(T&& value)
{
	g(std::forward<T>(value));
}

Some types are move-only and cannot be copied.


Smart Pointers

  • std::unique_ptr
    • no copy semantics
    • no disadvantage compared with a raw pointer
    • custom deleters (the deleter is part of the type and must be visible)
    • array specialization std::unique_ptr<T[]>
    • std::make_unique is preferable to constructing std::unique_ptr directly
  • std::shared_ptr
    • carries an atomic reference count, which has a cost
      • the control block is thread-safe, but that does not make the pointed-to resource thread-safe
    • custom deleters
    • std::make_shared is faster than constructing std::shared_ptr directly (one allocation instead of two)
    • std::shared_ptr<void>
      • https://www.cnblogs.com/imjustice/p/how_shared_ptr_void_works.html
      • https://stackoverflow.com/questions/5913396/why-do-stdshared-ptrvoid-work
  • std::weak_ptr
    • a borrow; promote to a shared_ptr via lock() when you need to use it (see the sketch after this list)
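A small weak_ptr example (my own): the weak_ptr borrows without extending the lifetime, and lock() promotes it to a shared_ptr only if the object is still alive.

#include <iostream>
#include <memory>

int main() {
    auto sp = std::make_shared<int>(42);
    std::weak_ptr<int> wp = sp;              // borrows, does not add ownership

    if (auto locked = wp.lock())             // promote to shared_ptr to use it
        std::cout << *locked << "\n";

    sp.reset();                              // last owner gone
    std::cout << std::boolalpha << wp.expired() << "\n";  // true
}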

Usage advice: prefer not to share ownership at all if you can avoid it.

  • std::atomic_shared_ptr / std::atomic_weak_ptr -> std::atomic<std::shared_ptr<T>>, std::atomic<std::weak_ptr<T>>

C++ Templates

  • C++17 introduced CTAD; not much to say about it

  • using

template<size_t N>
using CharArray = std::array<char, N>;
  • std::array is awkward because its size is a non-type template parameter (NTTP); std::span is nicer

  • Variadic templates

  • SFINAE

    • alternatives
      • tag dispatch
      • if constexpr
    • in C++20, use concepts instead (a small sketch contrasting the three follows this list)
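A small sketch contrasting the three tools just listed (my own example): SFINAE removes an overload from consideration, if constexpr discards a branch inside one template, and a concept states the constraint in the interface.

#include <concepts>
#include <iostream>
#include <type_traits>

// SFINAE: the overload disappears for non-integral T
template <class T, std::enable_if_t<std::is_integral_v<T>, int> = 0>
void describe_sfinae(T) { std::cout << "integral\n"; }

// if constexpr: one template, the untaken branch is discarded at compile time
template <class T>
void describe_constexpr(T) {
    if constexpr (std::is_integral_v<T>) std::cout << "integral\n";
    else                                 std::cout << "something else\n";
}

// concepts: the constraint is part of the signature
template <std::integral T>
void describe_concept(T) { std::cout << "integral\n"; }

int main() {
    describe_sfinae(1);
    describe_constexpr(2.5);
    describe_concept(3);
}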

Abstract Machines/The Structure of a Program

A walk through how a program is compiled and run:

  • ODR

  • ABI

  • name-mangling

  • where variables are stored

    • watch out for static and thread_local

ref

  • https://github.com/CppCon/CppCon2020/blob/main/Presentations/back_to_basics_algebraic_data_types/back_to_basics_algebraic_data_types__arthur_odwyer__cppcon_2020.pdf
  • https://github.com/CppCon/CppCon2020/blob/main/Presentations/back_to_basics_class_layout/back_to_basics_class_layout__steve_dewhurst__cppcon_2020.pdf
  • https://github.com/CppCon/CppCon2020/blob/main/Presentations/back_to_basics_concurrency/back_to_basics_concurrency__arthur_odwyer__cppcon_2020.pdf
  • https://github.com/CppCon/CppCon2020/blob/main/Presentations/back_to_basics_exceptions/back_to_basics_exceptions__klaus_iglberger__cppcon_2020.pdf
  • https://github.com/CppCon/CppCon2020/blob/main/Presentations/back_to_basics_lambda_expressions/back_to_basics_lambda_expressions__barbara_geller__ansel_sermersheim__cppcon_2020.pdf
  • https://github.com/CppCon/CppCon2020/blob/main/Presentations/back_to_basics_move_semantics/back_to_basics_move_semantics__david_olsen__cppcon_2020.pdf
    • Nicolai M. Josuttis, C++ Move Semantics: The Complete Guide, http://www.cppmove.com/
    • C++ Core Guidelines, https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines.html
    • Nicolai Josuttis, “The Hidden Secrets of Move Semantics”, CppCon 2020
    • Nicolai Josuttis, “The Nightmare of Move Semantics for Trivial Classes”, CppCon 2017 https://www.youtube.com/watch?v=PNRju6_yn3o
  • https://github.com/CppCon/CppCon2020/blob/main/Presentations/back_to_basics_smart_pointers/back_to_basics_smart_pointers__rainer_grimm__cppcon_2020.pdf
  • https://github.com/CppCon/CppCon2020/blob/main/Presentations/back_to_basics_templates_part_1/back_to_basics_templates_part_1__andreas_fertig__cppcon_2020.pdf
  • https://github.com/CppCon/CppCon2020/blob/main/Presentations/back_to_basics_templates_part_2/back_to_basics_templates_part_2__andreas_fertig__cppcon_2020.pdf
  • https://github.com/CppCon/CppCon2020/blob/main/Presentations/back_to_basics_the_abstract_machine/back_to_basics_the_abstract_machine__bob_steagall__cppcon_2020.pdf
  • https://github.com/CppCon/CppCon2020/blob/main/Presentations/back_to_basics_the_structure_of_a_program/back_to_basics_the_structure_of_a_program__bob_steagall__cppcon_2020.pdf

Read More

(Repost) A summary of the design concepts in boost.asio's new framework

In version 1.66 boost.asio redesigned its framework; the latest version at the time of writing is 1.71. After a few days of reading the code, here is a summary of the framework's concepts. Since it is a generic-programming library, the analysis stays at the concept level.

You can confirm this through the official boost documentation by comparing the strand pages for 1.65 and 1.66: the TS interfaces, io_context and executor first appear in 1.66.

The new framework has a few core concepts: Context, Scheduler, Service, Executor and Strand.

Context:

  • Everything asio does must be scheduled and executed inside a Context
  • Every Context owns a Service registry that manages its Services
  • Within a Context, each Service is unique
  • Every Context has a Scheduler
  • A Context only makes progress when some thread calls poll() or run(), which drains the Scheduler's run queue and executes the tasks
  • io_context is a Context optimized for I/O: the I/O demultiplexing step is wired in as a built-in task
  • On Windows, io_context's Scheduler is further specialized around IOCP
  • poll()/run() may be called from multiple threads at the same time; this is thread-safe

Scheduler:

  • First of all, it is a service of its Context
  • It owns an op_queue of pending operations
  • All Services ultimately rely on the Scheduler for dispatch
  • The Scheduler's dispatch() puts a task onto the run queue

Service:

  • Provides scheduling plus the actual functionality for some feature
  • Ultimately relies on its Context's Scheduler for dispatch
  • Each kind of Service has a corresponding service_impl class that it serves

Executor:

  • Comparable to a concurrent dispatch_queue on iOS
  • Acts as a service of a Context, or as a delegate for the Context's execution behavior
  • Ultimately relies on its Context's Scheduler for dispatch

Strand:

  • Comparable to a serial dispatch_queue on iOS
  • Comes in two flavors: bound to its own io Context, or bound to a specified Executor (i.e. a different kind of Context)
  • Each Strand has its own execution queue
  • A Strand is itself a task and must be dispatched through the Scheduler
  • At any moment, a Strand's queue is being drained on at most one thread
  • When several threads dispatch to the same Strand at once, the others can only buffer their tasks onto the waiting queue
  • Because it enforces serialization, a Strand can replace a lock for protecting data and code, and reduces thread switching (a small usage sketch follows this list)
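A small usage sketch of the strand-instead-of-mutex point above (my own example, assuming Boost 1.70+ for make_strand): handlers posted to the same strand never run concurrently, even with several threads running the io_context.

#include <boost/asio.hpp>
#include <iostream>
#include <thread>

int main() {
    boost::asio::io_context ctx;                  // the Context
    auto strand = boost::asio::make_strand(ctx);  // executor with a serialized queue

    int counter = 0;                              // protected by the strand, no mutex
    for (int i = 0; i < 100; ++i)
        boost::asio::post(strand, [&counter] { ++counter; });

    std::thread t([&ctx] { ctx.run(); });         // two threads drain the context
    ctx.run();
    t.join();
    std::cout << counter << "\n";                 // 100
}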

ref

  • https://www.cnblogs.com/bbqzsl/p/11919502.html
  • asio usage samples, quite good: https://github.com/franktea/network
  • implementation walkthrough: https://zhuanlan.zhihu.com/p/55503053
  • http://spiritsaway.info/asio-implementation.html#f69817

Read More

Notes from reading the Go runtime source

Environment

go version #go version go1.10.4 linux/amd64
lsb_release -d #Description:    Ubuntu 18.04.1 LTS
gdb --version #GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git

Bootstrap

Test code, test.go:

package main
func main() {
    println("hello, world");
}
go build -gcflags "-N -l" -o test test.go
gdb test
(gdb) info files
Symbols from "/mnt/c/Program Files/cmder/test".
Local exec file:
        `/mnt/c/Program Files/cmder/test', file type elf64-x86-64.
        Entry point: 0x4477c0
        0x0000000000401000 - 0x000000000044c213 is .text
        0x000000000044d000 - 0x00000000004757a3 is .rodata
        0x00000000004758e0 - 0x0000000000475f80 is .typelink
        0x0000000000475f80 - 0x0000000000475f88 is .itablink
        0x0000000000475f88 - 0x0000000000475f88 is .gosymtab
        0x0000000000475fa0 - 0x00000000004a3630 is .gopclntab
        0x00000000004a4000 - 0x00000000004a4a08 is .noptrdata
        0x00000000004a4a20 - 0x00000000004a65b0 is .data
        0x00000000004a65c0 - 0x00000000004c2888 is .bss
        0x00000000004c28a0 - 0x00000000004c4e58 is .noptrbss
        0x0000000000400f9c - 0x0000000000401000 is .note.go.buildid
(gdb) b *0x4477c0
Breakpoint 1 at 0x4477c0: file /usr/lib/go-1.10/src/runtime/rt0_linux_amd64.s, line 8.

The assembly differs between versions and there is no obvious main, but the entry point is certainly _rt0_amd64.

#include "textflag.h"

TEXT _rt0_amd64_linux(SB),NOSPLIT,$-8
        JMP     _rt0_amd64(SB)

TEXT _rt0_amd64_linux_lib(SB),NOSPLIT,$0
        JMP     _rt0_amd64_lib(SB)
        
        
  
(gdb) b _rt0_amd64
Breakpoint 2 at 0x444100: file /usr/lib/go-1.10/src/runtime/asm_amd64.s, line 15.

The corresponding assembly is what the book calls runtime.rt0_go:

TEXT _rt0_amd64(SB),NOSPLIT,$-8
        MOVQ    0(SP), DI       // argc
        LEAQ    8(SP), SI       // argv
        JMP     runtime·rt0_go(SB)
b runtime.rt0_go
Breakpoint 3 at 0x444110: file /usr/lib/go-1.10/src/runtime/asm_amd64.s, line 89.
       ; (preceded by many checks of CPU features, pushing of arguments, and so on)
       // create a new goroutine to start program
        MOVQ    $runtime·mainPC(SB), AX                // entry
        PUSHQ   AX
        PUSHQ   $0                      // arg size
        CALL    runtime·newproc(SB)
        POPQ    AX
        POPQ    AX

        // start this M
        CALL    runtime·mstart(SB)

        MOVL    $0xf1, 0xf1  // crash
        RET

DATA    runtime·mainPC+0(SB)/8,$runtime·main(SB)
GLOBL   runtime·mainPC(SB),RODATA,$8
b runtime.schedinit
Breakpoint 6 at 0x423a60: file /usr/lib/go-1.10/src/runtime/proc.go, line 477.
b runtime.main
Breakpoint 4 at 0x4228b0: file /usr/lib/go-1.10/src/runtime/proc.go, line 109.

The schedinit entry point:

// The bootstrap sequence is:
//
//      call osinit
//      call schedinit
//      make & queue new G
//      call runtime·mstart
//
// The new G calls runtime·main.
func schedinit() {
        // raceinit must be the first call to race detector.
        // In particular, it must be done before mallocinit below calls racemapshadow.
        _g_ := getg()
        if raceenabled {
                _g_.racectx, raceprocctx0 = raceinit()
        }

        sched.maxmcount = 10000

        tracebackinit()
        moduledataverify()
        stackinit()
        mallocinit()
        mcommoninit(_g_.m)
        alginit()       // maps must not be used before this call
        modulesinit()   // provides activeModules
        typelinksinit() // uses maps, activeModules
        itabsinit()     // uses activeModules
        
        msigsave(_g_.m)
        initSigmask = _g_.m.sigmask

        goargs()
        goenvs()
        // handle the GODEBUG / GOTRACEBACK settings
        parsedebugvars()
        // initialize the garbage collector
        gcinit()

        sched.lastpoll = uint64(nanotime())
        // determine the number of Ps from the CPU count and GOMAXPROCS
        procs := ncpu
        if n, ok := atoi32(gogetenv("GOMAXPROCS")); ok && n > 0 {
                procs = n
        }
        // adjust the number of Ps
        if procresize(procs) != nil {
                throw("unknown runnable goroutine during bootstrap")
        }

        // For cgocheck > 1, we turn on the write barrier at all times
        // and check all pointer writes. We can't do this until after
        // procresize because the write barrier needs a P.
        if debug.cgocheck > 1 {
                writeBarrier.cgo = true
                writeBarrier.enabled = true
                for _, p := range allp {
                        p.wbBuf.reset()
                }
        }


The next step is runtime.main:

// The main goroutine.
func main() {
        g := getg()

        // Racectx of m0->g0 is used only as the parent of the main goroutine.
        // It must not be used for anything else.
        g.m.g0.racectx = 0

        // Max stack size is 1 GB on 64-bit, 250 MB on 32-bit.
        // Using decimal instead of binary GB and MB because
        // they look nicer in the stack overflow failure message.
        if sys.PtrSize == 8 {
                maxstacksize = 1000000000
        } else {
                maxstacksize = 250000000
        }

        // Allow newproc to start new Ms.
        // start the background monitor (sysmon): periodic GC and scheduling-related work
        mainStarted = true
        systemstack(func() {
                newm(sysmon, nil)
        })

        // Lock the main goroutine onto this, the main OS thread,
        // during initialization. Most programs won't care, but a few
        // do require certain calls to be made by the main thread.
        // Those can arrange for main.main to run in the main thread
        // by calling runtime.LockOSThread during initialization
        // to preserve the lock.
        lockOSThread()

        if g.m != &m0 {
                throw("runtime.main not on m0")
        }

        runtime_init() // must be before defer
        if nanotime() == 0 {
                throw("nanotime returning zero")
        }

        // Defer unlock so that runtime.Goexit during init does the unlock too.
        needUnlock := true
        defer func() {
                if needUnlock {
                        unlockOSThread()
                }
        }()
            // Record when the world started. Must be after runtime_init
        // because nanotime on some platforms depends on startNano.
        runtimeInitTime = nanotime()

        gcenable()

        main_init_done = make(chan bool)
        if iscgo {
                if _cgo_thread_start == nil {
                        throw("_cgo_thread_start missing")
                }
                if GOOS != "windows" {
                        if _cgo_setenv == nil {
                                throw("_cgo_setenv missing")
                        }
                        if _cgo_unsetenv == nil {
                                throw("_cgo_unsetenv missing")
                        }
                }
                if _cgo_notify_runtime_init_done == nil {
                        throw("_cgo_notify_runtime_init_done missing")
                }
                // Start the template thread in case we enter Go from
                // a C-created thread and need to create a new thread.
                startTemplateThread()
                cgocall(_cgo_notify_runtime_init_done, nil)
        }

        fn := main_init // make an indirect call, as the linker doesn't know the address of the main package when laying down the runtime
        fn()
        close(main_init_done)

        needUnlock = false
        unlockOSThread()

        if isarchive || islibrary {
                // A program compiled with -buildmode=c-archive or c-shared
                // has a main, but it is not executed.
                return
        }
        fn = main_main // make an indirect call, as the linker doesn't know the address of the main package when laying down the runtime
        fn()
        if raceenabled {
                racefini()
        }
        // Make racy client program work: if panicking on
        // another goroutine at the same time as main returns,
        // let the other goroutine finish printing the panic trace.
        // Once it does, it will exit. See issues 3934 and 20018.
        if atomic.Load(&runningPanicDefers) != 0 {
                // Running deferred functions should not take long.
                for c := 0; c < 1000; c++ {
                        if atomic.Load(&runningPanicDefers) == 0 {
                                break
                        }
                        Gosched()
                }
        }
        if atomic.Load(&panicking) != 0 {
                gopark(nil, nil, "panicwait", traceEvGoStop, 1)
        }

        exit(0)
        // ? what is this — presumably a deliberate crash in case exit(0) ever returns
        for {
                var x *int32
                *x = 0
        }

A more involved example

//cat lib/sum.go
package lib
func init() {
    println("sum.init")
}

func Sum(x ...int) int {
    n  := 0
    for _, i := range x{
        n += i
    }
    return n
}
//cat test.go
package main
import (
    "./lib"
)
func init() {
    println("test.init")
}

func test() {
    println(lib.Sum(1,2,3))
}

//cat main.go
package main

import (
        _ "net/http"
)

func init() {
    println("main.init.2")
}

func main() {
    test()
}

func init() {
    println("main.init.1")
}

Output:

go build -gcflags "-N -l" -o test
./test
sum.init
main.init.2
main.init.1
test.init
6

Looking at the disassembly:

;go tool objdump -s "runtime\.init\b" test
TEXT runtime.init.0(SB) /usr/lib/go-1.10/src/runtime/cpuflags_amd64.go
TEXT runtime.init.1(SB) /usr/lib/go-1.10/src/runtime/mgcwork.go
  mgcwork.go:25         0x420860                c3                      RET
TEXT runtime.init.2(SB) /usr/lib/go-1.10/src/runtime/mstats.go
  mstats.go:438         0x4260d0                64488b0c25f8ffffff      MOVQ 
TEXT runtime.init.3(SB) /usr/lib/go-1.10/src/runtime/panic.go
TEXT runtime.init.4(SB) /usr/lib/go-1.10/src/runtime/proc.go
TEXT runtime.init.5(SB) /usr/lib/go-1.10/src/runtime/signal_unix.go
  signal_unix.go:64     0x43e450                c3                      RET
TEXT runtime.init(SB) <autogenerated>


;go tool objdump -s "main\.init\b" test
TEXT main.init.0(SB) /mnt/c/Program Files/cmder/main.go
TEXT main.init.1(SB) /mnt/c/Program Files/cmder/main.go
TEXT main.init.2(SB) /mnt/c/Program Files/cmder/test.go
TEXT main.init(SB) <autogenerated>
 <autogenerated>:1     0x5e31ec                e81f63ffff              CALL net/http.init(SB)

  <autogenerated>:1     0x5e31f1                e83afdffff              CALL _/mnt/c/Program_Files/cmder/lib.init(SB)
  <autogenerated>:1     0x5e31f6                e895fdffff              CALL main.init.0(SB)

  <autogenerated>:1     0x5e31fb                e820feffff              CALL main.init.1(SB)

  <autogenerated>:1     0x5e3200                e87bfeffff              CALL main.init.2(SB)

  <autogenerated>:1     0x5e3205                c605822a1f0002          MOVB $0x2, main.initdone.(SB)

  <autogenerated>:1     0x5e320c                488b2c24                MOVQ 0(SP), BP

  <autogenerated>:1     0x5e3210                4883c408                ADDQ $0x8, SP

  <autogenerated>:1     0x5e3214                c3                      RET

  <autogenerated>:1     0x5e3215                e80600e7ff              CALL runtime.morestack_noctxt(SB)

  <autogenerated>:1     0x5e321a                eb84                    JMP main.init(SB)

Conclusions

All init functions run on the same goroutine.

main.main only runs after every init function has finished.

Memory allocation

Basic strategy

  • Request a large chunk of memory from the OS each time, to reduce system calls
  • The allocator then
    • pre-cuts the large chunk into small blocks linked into free lists
    • allocation takes a block off a list
    • deallocation puts the block back on the list
    • when too much memory sits idle, it is returned to the OS to lower overall overhead

Memory blocks

  • span: a large block of memory, counted in pages
  • object: a span cut into many small pieces
  • yes, this is lifted from tcmalloc

Initialization

Three arrays make up the memory-management structure:

  • spans: manages spans, one entry per page; since addresses are page-aligned, a pointer can be mapped to its span quickly (I'm still only half-clear on the page arithmetic here)
  • bitmap: 4 mark bits per object, recording pointer information and GC marks
  • arena: the memory handed out to users, up to the allocation limit

arena, spans and bitmap are kept in correspondence, and all three can grow linearly on demand.

All of this is held in mheap and initialized in mallocinit.

An example:

//test.go
package main

import (
    "fmt"
    "os"

    "github.com/shirou/gopsutil/process"
)

var ps *process.Process

func mem(n int) {
    if ps == nil {
        p, err := process.NewProcess(int32(os.Getpid()))
        if err != nil {
            panic(err)
        }
        ps = p
    }

    mem, _ := ps.MemoryInfoEx()
    fmt.Printf("%d, VMS:%d MB, RSS:%d MB\n", n, mem.VMS>>20, mem.RSS>>20)
}

func main() {
    mem(1)
    data := new([10][1024 * 1024]byte)
    mem(2)

    for i := range data {
        for x, n := 0, len(data[i]); x < n; x++ {
            data[i][x] = 1
        }
        mem(3)
    }
}

Allocation

Don't assume new always allocates on the heap; with optimization and inlining (escape analysis) it may end up on the stack.

package main
import ()
func test() *int {
    x := new(int)
    *x = 0xAABB
    return x
}
func main() {
    println(*test())
}
go build -gcflags "-l" -o test test.go
go tool objdump -s "main\.test" test
TEXT main.test(SB) /mnt/c/Program Files/cmder/test.go
  test.go:4             0x44c150                64488b0c25f8ffffff      MOVQ FS:0xfffffff8, CX
  test.go:4             0x44c159                483b6110                CMPQ 0x10(CX), SP
  test.go:4             0x44c15d                7639                    JBE 0x44c198
  test.go:4             0x44c15f                4883ec18                SUBQ $0x18, SP
  test.go:4             0x44c163                48896c2410              MOVQ BP, 0x10(SP)
  test.go:4             0x44c168                488d6c2410              LEAQ 0x10(SP), BP
  test.go:5             0x44c16d                488d05acac0000          LEAQ 0xacac(IP), AX
  test.go:5             0x44c174                48890424                MOVQ AX, 0(SP)
  test.go:5             0x44c178                e8a3effbff              CALL runtime.newobject(SB)
  test.go:5             0x44c17d                488b442408              MOVQ 0x8(SP), AX
  test.go:6             0x44c182                48c700bbaa0000          MOVQ $0xaabb, 0(AX)
  test.go:7             0x44c189                4889442420              MOVQ AX, 0x20(SP)
  test.go:7             0x44c18e                488b6c2410              MOVQ 0x10(SP), BP
  test.go:7             0x44c193                4883c418                ADDQ $0x18, SP
  test.go:7             0x44c197                c3                      RET
  test.go:4             0x44c198                e8d383ffff              CALL runtime.morestack_noctxt(SB)
  test.go:4             0x44c19d                ebb1                    JMP main.test(SB)
  


go build -o test test.go
go tool objdump -s "main\.main" test
TEXT main.main(SB) /mnt/c/Program Files/cmder/test.go
  test.go:10            0x44c150                64488b0c25f8ffffff      MOVQ FS:0xfffffff8, CX
  test.go:10            0x44c159                483b6110                CMPQ 0x10(CX), SP
  test.go:10            0x44c15d                7634                    JBE 0x44c193
  test.go:10            0x44c15f                4883ec10                SUBQ $0x10, SP
  test.go:10            0x44c163                48896c2408              MOVQ BP, 0x8(SP)
  test.go:10            0x44c168                488d6c2408              LEAQ 0x8(SP), BP
  test.go:11            0x44c16d                e88e59fdff              CALL runtime.printlock(SB)
  test.go:11            0x44c172                48c70424bbaa0000        MOVQ $0xaabb, 0(SP)
  test.go:11            0x44c17a                e80161fdff              CALL runtime.printint(SB)
  test.go:11            0x44c17f                e80c5cfdff              CALL runtime.printnl(SB)
  test.go:11            0x44c184                e8f759fdff              CALL runtime.printunlock(SB)
  test.go:12            0x44c189                488b6c2408              MOVQ 0x8(SP), BP
  test.go:12            0x44c18e                4883c410                ADDQ $0x10, SP
  test.go:12            0x44c192                c3                      RET
  test.go:10            0x44c193                e8d883ffff              CALL runtime.morestack_noctxt(SB)
  test.go:10            0x44c198                ebb6                    JMP main.main(SB)

Escape analysis: -gcflags "-m"

Allocation logic (malloc.go)

  • large objects come straight from the heap
  • small objects take an object from cache.alloc[sizeclass].freelist
  • tiny objects use the cache.tiny allocator

Reclamation

Reclamation works at span granularity.

Releasing to the OS

Handled by the sysmon background task.

The actual release is madvise(v, n, _MADV_DONTNEED), and the OS decides what to do: if physical memory is plentiful it will not reclaim the pages, avoiding needless cost, though touching them again later will page-fault and map in fresh memory.

Garbage collection

Goals: shorten STW (stop-the-world) pauses, restrain heap growth, and make full use of CPU resources.

  • Tri-color marking plus a write barrier
    • everything starts white
    • scan and mark all reachable objects grey, putting them on a work queue
    • take a grey object off the queue, mark the objects it references grey and enqueue them, then mark it black
    • the write barrier watches for modifications to object memory and re-colors or re-queues as needed

Driven by gcController.

Mutator assist prevents the heap from ballooning when allocation outpaces background marking.



Read More

August reading list (needs review)

The company no longer allows access to the external network, sigh.

https://support.hypernode.com/en/troubleshooting/performance/how-to-debug-out-of-memory-oom-events

https://briancallahan.net/blog/20200816.html

https://lobste.rs/s/2vj4sb/file_handling_unix_tips_traps_outright

https://danluu.com/file-consistency/

https://rachelbythebay.com/w/2020/08/11/files/

https://robertovitillo.com/what-every-developer-should-know-about-database-consistency/

http://brooker.co.za/blog/2020/05/25/reading.html

https://codahale.com/work-is-work/

https://bartoszmilewski.com/2020/08/11/benign-data-races-considered-harmful/

https://www.fluentcpp.com/2020/06/12/a-generic-component-for-out-of-line-lambdas/

http://brooker.co.za/blog/2020/03/22/rust.html

https://briancallahan.net/blog/20200812.html

https://blog.kevinjahns.de/are-crdts-suitable-for-shared-editing/

https://zhuanlan.zhihu.com/p/108968057

https://www.cockroachlabs.com/blog/cockroachdb-sigmod-2020/?utm_campaign=cooperpress-2020-Q2&utm_source=dbweekly&utm_medium=newsletter&utm_content=dbweekly-primary-sigmod

https://ketanbhatt.com/db-concurrency-defects/

Read More
