(Translation) For class templates, prefer defining operators as hidden friend functions instead of free function templates outside the class


Original article

Two ways to implement comparison

The first is a generic function template:

template<class V>
struct Cat {
    V value_;
};

template<class V>
bool operator<(const Cat<V>& a, const Cat<V>& b) {
    return a.value_ < b.value_;
}

The second is a friend function defined inside the class:

template<class V>
struct Dog {
    V value_;

    friend bool operator<(const Dog& a, const Dog& b) {
        return a.value_ < b.value_;
    }
};

This is known as the hidden friend idiom, and it is the recommended form. Consider the following scenario:

template<class T>
void sort_in_place(const std::vector<T>& vt) {
    std::vector<std::reference_wrapper<const T>> vr(vt.begin(), vt.end());
    std::sort(vr.begin(), vr.end());
    std::transform(vr.begin(), vr.end(),
        std::ostream_iterator<int>(std::cout), std::mem_fn(&T::value_));
}

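Wrapping the elements in reference_wrapper adds a layer of indirection. When sort compares them, operator< is looked up via ADL, and for Cat template argument deduction fails (Godbolt):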

opt/compiler-explorer/gcc-snapshot/lib/gcc/x86_64-linux-gnu/11.0.0/../../../../include/c++/11.0.0/bits/predefined_ops.h:43:23: error: invalid operands to binary expression ('std::reference_wrapper<const Cat<int>>' and 'std::reference_wrapper<const Cat<int>>')
      { return *__it1 < *__it2; }
               ~~~~~~ ^ ~~~~~~

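For Dog, ADL finds the friend function. It is an ordinary non-template function, so reference_wrapper<const Dog<int>> is implicitly converted to const Dog<int>&.

For Cat, operator< is a function template. Deduction needs an argument whose type is actually Cat<V>, and implicit conversions are not considered during deduction, so the call fails to compile.

This technique is also known as the Barton–Nackman trick.

The standard library usually writes its operators in the Cat style. reference_wrapper was added later, and most code has no need for something like sort_in_place.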
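The difference can be reproduced in a few lines. The following is a minimal sketch, not the original author's code: sorted_values is a hypothetical helper introduced for illustration, and it assumes the element type exposes a public value_ member as Dog does above.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

template <class V>
struct Dog {
    V value_;

    // Hidden friend: a non-template function that is found only through ADL.
    friend bool operator<(const Dog& a, const Dog& b) {
        return a.value_ < b.value_;
    }
};

// Hypothetical helper: returns the values of `items` in ascending order.
// Only reference_wrappers are sorted; the elements themselves never move.
template <class T>
std::vector<decltype(T::value_)> sorted_values(const std::vector<T>& items) {
    std::vector<std::reference_wrapper<const T>> refs(items.begin(), items.end());
    // Each comparison converts reference_wrapper<const T> to const T&,
    // which works because operator< is not a template.
    std::sort(refs.begin(), refs.end());

    std::vector<decltype(T::value_)> out;
    for (const T& item : refs) {
        out.push_back(item.value_);
    }
    return out;
}
```

Calling sorted_values on a std::vector<Dog<int>> compiles and sorts. Replacing Dog with the Cat definition shown earlier should reproduce the compile error, because deduction of Cat<V> from a reference_wrapper fails.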


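If you have read this far and have suggestions or questions, or want to point out a mistake of mine, please leave a comment or email mailto:wanghenshui@qq.com. Thank you! Your comments matter a lot!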


(Translation) More on folly's static injection technique: legally accessing private member functions


Original article

Goal: access bar and x without modifying class Foo, even though they are private.

// foo.h
#pragma once
#include <iostream>

class Foo {
    int bar() const {
        std::cout << __PRETTY_FUNCTION__;
        return x;
    }

    int x{42};
};

The author first summarizes folly's technique:

// access_private_of_foo.h
#pragma once
#include "foo.h"

// Unique namespace in each TU.
namespace {
// Unique type in each TU (internal linkage).
struct TranslationUnitTag {};
}  // namespace

// 'Foo::bar()' invoker.
template <typename UniqueTag,
          auto mem_fn_ptr>
struct InvokePrivateFooBar {
    // (Injected) friend definition.
    friend int invoke_private_Foo_bar(Foo const& foo) {
        return (foo.*mem_fn_ptr)();
    }
};
// Friend (re-)declaration.
int invoke_private_Foo_bar(Foo const& foo);

// Single explicit instantiation definition.
template struct InvokePrivateFooBar<TranslationUnitTag, &Foo::bar>;

// 'Foo::x' accessor.
template <typename UniqueTag,
          auto mem_ptr>
struct AccessPrivateMemFooX {
    // (Injected) friend definition.
    friend int& access_private_Foo_x(Foo& foo) {
        return foo.*mem_ptr;
    }
};
// Friend (re-)declaration.
int& access_private_Foo_x(Foo& foo);

// Single explicit instantiation definition.
template struct AccessPrivateMemFooX<TranslationUnitTag, &Foo::x>;

This version is a bit clearer. I have discussed it before; see this article.

It is 2020 now, so consider the C++20 approach.
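To see the explicit-instantiation trick work end to end, here is a minimal self-contained sketch. Vault, VaultCodeAccessor and vault_code are hypothetical names made up for illustration, and the template parameter is spelled as a typed member pointer rather than auto so that it also compiles before C++17.

```cpp
class Vault {
    int code_{42};  // private: members of a class are private by default
};

// Carries a pointer to the private member as a template argument.
template <int Vault::* MemPtr>
struct VaultCodeAccessor {
    // Injected friend definition: becomes a namespace-scope function
    // once the template is instantiated.
    friend int& vault_code(Vault& v) { return v.*MemPtr; }
};

// Redeclaration so that ordinary lookup can find the injected friend.
int& vault_code(Vault& v);

// Access checking is not applied to the names used in an explicit
// instantiation, so naming &Vault::code_ here is allowed.
template struct VaultCodeAccessor<&Vault::code_>;
```

After this, vault_code(v) returns a mutable reference to the private member of any Vault v, without Vault ever naming a friend.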

C++20 implemented P0692R1 (Access Checking on Specializations), summarized in P2131R0 (Changes between C++17 and C++20 DIS) as

This change fixes a long-standing, somewhat obscure situation, where it was not possible to declare a template specialization for a template argument that is a private (or protected) member type. For example, given class Foo { class Bar {}; };, the access Foo::Bar is now allowed in template<class> struct X; template<> struct X<Foo::Bar>;.

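When specializing a template, the template arguments may now name private or protected members. That removes the need for explicit instantiation: a specialization is enough.

Back to our accessor functions: the friend injection technique stays the same, only the explicit instantiation goes away.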

// accessprivate.h
#pragma once
template <auto mem_ptr>
struct AccessPrivate
{
    // kMemPtr is intended to be either a pointer to a private
    // member function or pointer to a private data member.
    static constexpr auto kMemPtr = mem_ptr;
    struct Delegate;  // Define only in explicit specializations.
};
// access_private_of_foo_cpp20.h
#pragma once
#include "accessprivate.h"
#include "foo.h"

// Specialize the nested Delegate class for each private
// member function or data member of Foo that we'd like to access.

template <>
struct AccessPrivate<&Foo::bar>::Delegate {
    // (Injected) friend definition.
    friend int invoke_private_Foo_bar(Foo const& foo) {
        return (foo.*kMemPtr)();
    }
};
// Friend (re-)declaration.
int invoke_private_Foo_bar(Foo const& foo);

template <>
struct AccessPrivate<&Foo::x>::Delegate {
    // (Injected) friend definition.
    friend int& access_private_Foo_x(Foo& foo) {
        return foo.*kMemPtr;
    }
};
// Friend (re-)declaration.
int& access_private_Foo_x(Foo& foo);

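Note that Delegate is only declared in the primary template; we specialize it just for the accessors we want to inject. The earlier explicit instantiation and the anonymous-namespace tag (unique per TU) are both gone, at the cost of one extra layer, Delegate.

Tidy it up with a macro: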

// accessprivate/accessprivate.h
#pragma once

namespace accessprivate {
template <auto mem_ptr>
struct AccessPrivate
{
    // kMemPtr is intended to be either a pointer to a private
    // member function or pointer to a private data member.
    static constexpr auto kMemPtr = mem_ptr;
    struct Delegate;  // Define only in explicit specializations.
};

}  // namespace accessprivate

// DEFINE_ACCESSOR(<qualified class name>, <class data member>)
//
// Example usage:
//   DEFINE_ACCESSOR(foo::Foo, x)
//
// Defines:
//   auto& accessprivate::get_x(foo::Foo&)
#define DEFINE_ACCESSOR(qualified_class_name, class_data_member)\
namespace accessprivate {\
template <>\
struct AccessPrivate<&qualified_class_name::class_data_member>::Delegate {\
    friend auto& get_##class_data_member(\
        qualified_class_name& obj) { return obj.*kMemPtr; }\
};\
auto& get_##class_data_member(qualified_class_name& obj);\
}

This makes writing getters and setters simpler:

#include <iostream>
#include "accessprivate/accessprivate.h"

namespace bar {

struct Bar {
    int getX() const { return x; }
    int getY() const { return y; }
private:
    int x{42};
    int y{88};
};

}  // namespace bar

DEFINE_ACCESSOR(bar::Bar, x)
// -> accessprivate::get_x(Bar&)
DEFINE_ACCESSOR(bar::Bar, y)
// -> accessprivate::get_y(Bar&)

void demo() {
    bar::Bar b{};
    accessprivate::get_x(b) = 13;
    accessprivate::get_y(b) = 33;
    std::cout << b.getX() << " " << b.getY();  // 13 33
}

The author has published a repository; it works with C++17.


ref

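  • The original article quotes the relevant passages of the C++ standard; they are not reproduced here, and I will not dig into the name lookup rules involved
  • The author's blog is well worth reading; a seasoned language lawyer
  • There is another discussion that uses the same technique as folly, so I will not repeat it: https://quuxplusone.github.io/blog/2020/12/03/steal-a-private-member/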


(Translation) How do compilers handle unused code?


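Original article

The author compiled tables of test results (this guy truly loves C++ to dig into details like these; it feels a little obsessive).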

Does the compiler warn about an unused ___?

Unused entity | Clang | GCC | ICC | MSVC
static function | -Wall | -Wall | | -W4
static variable | -Wall | -Wall | |
private data member | -Wall | | |
private static data member | | | |
private member function | | | |
private static member function | | | |
data member of private class | | | |
static data member of private class | | | |
member function of private class | | | |
static member function of private class | | | |
anonymous-namespaced function | -Wall | -Wall | |
anonymous-namespaced variable | -Wall | -Wall | |
data member of anonymous-namespaced class | | | |
static data member of anonymous-namespaced class | -Wall | -Wall | |
member function of anonymous-namespaced class | | -Wall | |
static member function of anonymous-namespaced class | | -Wall | |
function taking anonymous-namespaced class | -Wall | -Wall | |

Does the compiler optimize away an unused ___?

Unused entity | Clang | GCC | ICC | MSVC
static function | -O0 | -O1 | -O0 | -Od
static variable | -O0 | -O0 | -O1 | -Od
private data member | | | |
private static data member | | | |
private member function | | | |
private static member function | | | |
static data member of private class | | | |
member function of private class | | | |
static member function of private class | | | |
anonymous-namespaced function | -O0 | -O1 | -O0 |
anonymous-namespaced variable | -O0 | -O0 | -O1 | -Od
static data member of anonymous-namespaced class | -O0 | -O0 | -O1 |
member function of anonymous-namespaced class | -O0 | -O1 | -O1 |
static member function of anonymous-namespaced class | -O0 | -O1 | -O1 |
function taking anonymous-namespaced class | -O0 | -O1 | -O1 |

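There is still plenty of room for improvement.

Note that unused private member functions are not removed. That is what makes the following hack possible: pass a pointer to a private member function as a template argument and use explicit instantiation to bypass the private restriction, achieving static injection and invocation. See this article for details.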



(Translation) Sockets in your shell


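Adapted from this blog post.

In short, the shell itself can act as a socket client or server, which is handy when nc and telnet are not available.

The author shows the usage in plain bash and in zsh.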

bash

echo "text!" > /dev/$PROTO/$HOST/$PORT

A health-check example:

#!/bin/bash
if exec 3>/dev/tcp/localhost/4000 ; then
	echo "server up!"
else
	echo "server down."
fi

I used to do this kind of check with netcat.

The same exec redirection can also open a socket for both reading and writing:

simplecurl

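#!/bin/bash
# Open a read-write TCP connection on fd 3 to port 80 of the host given as $1.
exec 3<>/dev/tcp/"$1"/80
# HTTP/1.1 requires a Host header and CRLF line endings; Connection: close lets cat terminate.
printf 'GET / HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$1" >&3
cat <&3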

Usage:

$ ./simplecurl www.google.com
HTTP/1.1 200 OK
Date: Thu, 03 Dec 2020 00:57:30 GMT
Expires: -1
....
<google website>

zsh

zsh has a built-in module for this:

zmodload zsh/net/tcp

Put this line in .zshrc, or run it in the shell, to load ztcp.

# host machine: ztcp stores the new file descriptor in $REPLY
ztcp -l 7128
lfd=$REPLY
ztcp -a $lfd
talkfd=$REPLY

# client machine
ztcp HOST 7128
talkfd=$REPLY

Now both the client and the server hold a file descriptor, and they can talk to each other:

# host machine
echo -e "hello!" >&$talkfd

# client machine
read -r line <&$talkfd; print -r - $line
> hello!



(Translation) Modern storage hardware is plenty fast; it is the old APIs that fall short


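Storage devices here means devices like Optane. Original article

A brief summary, translated with DeepL.

The author is a veteran engineer and lists several common misconceptions about storage:

  • I/O is more expensive than copying, so copying data instead of reading it directly is reasonable because it saves an I/O
  • "The system we are designing must be very fast, so keeping everything in memory is a must"
  • Splitting a file into several files makes things slower because it produces random I/O; it is better to read sequentially from a single file
  • Direct I/O is very slow and only suits specialized devices; without a matching cache it performs terribly

The author's view is that today's devices are extremely fast, while the old APIs contain many design decisions that no longer make sense: the copies, allocations, read-ahead and so on are too expensive.

In other words, the traditional APIs were designed around expensive I/O, so they perform expensive operations of their own to compensate.

  • A read misses the cache -> a page fault is raised and the data is loaded into memory -> when the read completes, an interrupt is raised
    • For an ordinary user-space process, the data is then copied once more into the process
    • With mmap, the virtual memory mappings have to be updated

In the past I/O was slow, roughly a hundred times slower than these updates and copies, so their cost did not matter. Today I/O latency is very low (see the specifications of Samsung's NVMe SSDs), and the two costs are of the same order of magnitude.

A quick calculation shows that even in the worst case the device accounts for less than half of the total time. Where does the rest go? This leads to the second problem: read amplification.

The operating system reads data in pages, at least 4 KB at a time. If you read 1 KB of data that is split across two files, you read 8 KB of pages to obtain 1 KB, wasting 87.5%. In practice the system also reads ahead, 128 KB per read by default, to speed up subsequent reads. That means 256 KB is effectively read to obtain 1 KB, an enormous waste.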
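The arithmetic behind these percentages can be written down directly. This is only an illustrative sketch; wasted_fraction and the constant names are made up here, and the byte counts are the ones used in the paragraph above.

```cpp
// Fraction of the bytes fetched from the device that the caller never asked for.
constexpr double wasted_fraction(double usefulBytes, double fetchedBytes) {
    return 1.0 - usefulBytes / fetchedBytes;
}

constexpr double kKiB = 1024.0;

// 1 KB wanted, two 4 KB pages touched: 1 - 1/8 = 0.875.
constexpr double kTwoPages = wasted_fraction(1 * kKiB, 8 * kKiB);

// 1 KB wanted, two 128 KB read-ahead windows fetched: 1 - 1/256, about 0.996.
constexpr double kWithReadAhead = wasted_fraction(1 * kKiB, 256 * kKiB);
```

So the page granularity alone wastes 87.5% of the transfer, and default read-ahead pushes the waste above 99%.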

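So what about reading with Direct I/O? Surely no pages are wasted then?

The problem remains: the old APIs do not read concurrently. Even though reads are fast, they are still slower than the CPU, and the call blocks and waits.

So people go back to using multiple files to raise throughput, but:

  • Multiple files mean more read-ahead waste
  • Multiple files probably also mean multiple threads, which is amplification again; and if you do not have that many files, this optimization does not apply at all

The new API

io_uring is revolutionary, but it is still a low-level API:

  • I/O dispatched through io_uring is still affected by the caching issues described above
  • Direct I/O has many caveats, for example reads must be memory-aligned; io_uring, although a new API, does nothing to improve on such issues

To use io_uring you need to accumulate requests and dispatch them in batches, which calls for a policy on when to do so and some kind of event loop.

To that end the author built an I/O framework, glommio: Direct I/O, polling-based with no interrupts, using registered buffers.

Glommio handles two kinds of files:

  • Random access files
    • No extra buffer is needed: the buffers registered with io_uring are used directly, with no copy and no memory mapping; the user receives a reference-counted pointer
    • Because the I/O is known to be random, exactly as much as requested is read
  • Stream files
    • Designed like Rust's AsyncRead, with one extra copy; a copy-free API is also available

The author lists measurements obtained with the library's abstractions, compared against buffered I/O: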

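Sequential read of a 53 GB file (2x the size of memory):

Configuration | Time | Throughput
Buffered I/O | 56s | 945.14 MB/s
Direct I/O (glommio) | 115s | 463.23 MB/s
+ read-ahead enabled to raise concurrency | 22s | 2.35 GB/s
+ copy-avoiding API | 21s | 2.45 GB/s
+ larger buffer | 7s | 7.29 GB/s

Note that random reads plus scans pollute the page cache quite badly.

With a small data set: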

Buffered I/O: size span of 1.65 GB, for 20s, 693870 IOPS
Direct I/O: size span of 1.65 GB, for 20s, 551547 IOPS

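Direct I/O is a bit slower here, but it has an advantage in memory usage.

With a large data set the advantage is clear, at more than twice the IOPS: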

Buffered I/O: size span of 53.69 GB, for 20s, 237858 IOPS
Direct I/O: size span of 53.69 GB, for 20s, 547479 IOPS

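The author's conclusion: on modern hardware Direct I/O delivers very respectable results, and it is worth refreshing your knowledge of it.


ref/ps

  • https://github.com/DataDog/glommio Worth a careful look when time permits. The author previously worked on seastar, which is Direct I/O plus Future/Promise
  • A detailed introduction: https://www.datadoghq.com/blog/engineering/introducing-glommio/
  • API documentation: https://docs.rs/glommio/0.2.0-alpha/glommio/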


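(Translation) Implementing an LRU cache with the splice member of std::list

Original article. (splice means to join together.)

This is a classic interview question: implement an LRU cache with O(1) lookup and O(1) insertion.

First, O(1) lookup requires a hash table keyed by the cache key. unordered_map will do; its performance is rather poor, but that is a separate discussion.

Next, O(1) insertion: both a hash table and a linked list satisfy this.

However, to maintain LRU order we must bring in a list that tracks recency.

Since this is a cache, it has a size limit; evicting the last (least recently used) element relies on the ordering of the list.

get must refresh recency by moving the element to the head of the list, which is where splice comes in.

Storing the data twice would be wasteful. For lookup, the hash table stores the key together with an iterator into the list; on eviction, remove the list node and erase the corresponding hash table entry. That keeps both structures consistent.

The interface: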

#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

template<typename K, typename V, size_t Capacity>
class LRUCache {
public:

 //Assert that Max size is > 0
 static_assert(Capacity > 0);

 /*Adds a key=>value item
  Returns false if key already exists*/
 bool put(const K& k, const V& v);

 /*Gets the value for a key.
  Returns empty std::optional if not found.
  The returned item becomes most-recently-used*/
 std::optional<V> get(const K& k);

 //Erases an item
 void erase(const K& k);

 //Utility function.
 //Calls callback for each {key,value}
 template<typename C>
 void forEach(const C& callback) const {
   for(auto& [k,v] : items) {
    callback(k, v);
   }
 }

private:
 /*std::list stores items (pair<K,V>) in
 most-recently-used to least-recently-used order.*/
 std::list<std::pair<K,V>> items;

 //unordered_map acts as an index to the items store above.
 std::unordered_map<K, typename std::list<std::pair<K,V>>::iterator> index;
};

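put is simple: add the item to both containers. If the cache is full, take the element at the tail of the list and remove it from both containers.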

template<typename K, typename V, size_t Capacity>
bool
LRUCache<K,V,Capacity>::put(const K& k, const V& v) {
 //Return false if the key already exists
 if(index.count(k)) {
  return false;
 }

 //Check if cache is full
 if(items.size() == Capacity) {
  //Delete the LRU item
  index.erase(items.back().first); //Erase the last item key from the map
  items.pop_back(); //Evict last item from the list 
 }

 //Insert the new item at front of the list
 items.emplace_front(k, v);

 //Insert {key->item_iterator} in the map 
 index.emplace(k, items.begin());

 return true;
}

get has to relink the list: the item was just accessed, so its position must be refreshed.

template<typename K, typename V, size_t Capacity>
std::optional<V>
LRUCache<K,V,Capacity>::get(const K& k) {
 auto itr = index.find(k);
 if(itr == index.end()) {
  return {}; //empty std::optional
 }

 /*Use list splice to transfer this item to
  the first position, which makes the item
  most-recently-used. Iterators still stay valid.*/
 items.splice(items.begin(), items, itr->second);
 //Splice at items.begin(): the itr->second node of items is unlinked and relinked there

 //Return the value in a std::optional
 return itr->second->second;
}

erase is very simple: much like put, but in reverse.

template<typename K, typename V, size_t Capacity>
void
LRUCache<K,V,Capacity>::erase(const K& k) {
 auto itr = index.find(k);
 if(itr == index.end()) {
  return;
 }

 //Erase from the list
 items.erase(itr->second);

 //Erase from the  map
 index.erase(itr);
}

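This use of splice takes the node that the iterator points to out of the source list and links it in at the position given by the first argument. It is less a splice than a steal.

C++17 adds a new member function to map, extract, which also steals a node.

Usage: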
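The single-element overload of splice can be tried in isolation. The sketch below is an illustration rather than part of the original article; move_to_front is a hypothetical helper and assumes index is smaller than the size of the list.

```cpp
#include <cstddef>
#include <iterator>
#include <list>

// Moves the element at position `index` to the front of `items`.
// No element is copied or moved: splice only relinks the node,
// so iterators and references to the element stay valid.
inline void move_to_front(std::list<int>& items, std::size_t index) {
    auto it = std::next(items.begin(), static_cast<std::ptrdiff_t>(index));
    items.splice(items.begin(), items, it);
}
```

For a list holding 1, 2, 3, calling move_to_front(items, 2) leaves it as 3, 1, 2, and an iterator that pointed at 3 beforehand now compares equal to items.begin().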

//Ascending order
std::map<int, std::string> m1 = {
  {1, "One"}, {2, "Two"}, {3, "Three"}
};
//Descending order
std::map<int, std::string, std::greater<>> m2 = {
  {4, "Four"}, {5, "Five"}
};

//Print both maps
for(auto [k, v] : m1)
 std::cout << v << " "; //One Two Three

for(auto [k, v] : m2)
 std::cout << v << " "; //Five Four

//extract from m1 and insert to m2
m2.insert(m1.extract(3));

//get another node from the above node factory
m2.insert(generateNode(6, "Six"));

//Now print m2
for(auto [k, v] : m2)
 std::cout << v << " "; //Six Five Four Three

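It looks very similar to splice. Frankly, the parameters of splice are rather complicated; it would have been clearer as a combination of small functions like extract.


References/links

  • https://zh.cppreference.com/w/cpp/container/list/splice
  • splice and list::size share some history
    • A critical write-up: https://blog.csdn.net/russell_tao/article/details/8572000
    • The trade-offs from the implementer's side: https://howardhinnant.github.io/On_list_size.html
  • An introduction to extract: https://www.nextptr.com/question/qa1532449120/update-keys-of-map-or-set-with-node-extract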
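A common use of extract is changing the key of an existing entry without reallocating the node. The following is a minimal sketch under that assumption; rename_key is a hypothetical helper, not something from the original article.

```cpp
#include <map>
#include <string>
#include <utility>

// Changes the key of an entry from oldKey to newKey by stealing its node.
// Returns false (and leaves the map untouched) if oldKey is missing
// or newKey is already present.
inline bool rename_key(std::map<int, std::string>& m, int oldKey, int newKey) {
    if (m.count(newKey) != 0) {
        return false;
    }
    auto node = m.extract(oldKey);  // unlinks the node; empty handle if not found
    if (node.empty()) {
        return false;
    }
    node.key() = newKey;            // the key is mutable while the node is detached
    m.insert(std::move(node));      // relinks the same node under the new key
    return true;
}
```

The mapped string is never copied: the node leaves the tree, gets a new key, and is linked back in.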


(Translation) Why SELECT * performs poorly


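Original article

The author's experience comes from Oracle; similar reasoning applies to other relational databases.

Increased network traffic

Naturally: columns that are not needed get fetched and then discarded, consuming bandwidth and adding latency for nothing.

Increased CPU usage on the client

The client has more data to process.

Possible loss of optimizer opportunities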

SQL> @xi f2czqvfz3pj5w 0

SELECT * FROM soe_small.customers

---------------------------------------------------------------------------
| Id | Operation         | Name      | Starts | A-Rows | A-Time   | Reads |
---------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |           |      1 |   1699K| 00:00.57 | 28475 |
|  1 |  TABLE ACCESS FULL| CUSTOMERS |      1 |   1699K| 00:00.57 | 28475 |
---------------------------------------------------------------------------
SQL> @xi 9gwxhcvwngh96 0

SELECT customer_id, dob FROM soe_small.customers

---------------------------------------------------------------------------------------
| Id  | Operation            | Name              | Starts | A-Rows | A-Time   | Reads |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                   |      1 |   1699K| 00:00.21 |  5915 |
|   1 |  INDEX FAST FULL SCAN| IDX_CUSTOMER_DOB2 |      1 |   1699K| 00:00.21 |  5915 |
---------------------------------------------------------------------------------------

One plan is a full table scan and the other is an index scan, so their efficiency naturally differs.

Memory savings

SELECT * FROM soe_small.customers ORDER BY customer_since

Plan hash value: 2792773903

----------------------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | A-Rows |   A-Time   | Used-Mem |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |   1699K|00:00:02.31 |          |
|   1 |  SORT ORDER BY     |           |      1 |   1699K|00:00:02.31 |  232M (0)|
|   2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |   1699K|00:00:00.24 |          |
----------------------------------------------------------------------------------

The effect is obvious:

SELECT customer_id,dob FROM soe_small.customers ORDER BY customer_since

Plan hash value: 2792773903

----------------------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | A-Rows |   A-Time   | Used-Mem |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |   1699K|00:00:00.59 |          |
|   1 |  SORT ORDER BY     |           |      1 |   1699K|00:00:00.59 |   67M (0)|
|   2 |   TABLE ACCESS FULL| CUSTOMERS |      1 |   1699K|00:00:00.13 |          |
----------------------------------------------------------------------------------

Increased CPU usage on the server

First, parsing that much data wastes CPU, and so does optimization.

SQL> SET AUTOTRACE TRACE STAT

SQL> SELECT * FROM widetable /* test100 */;

100 rows selected.

Statistics
----------------------------------------------------------
       2004  recursive calls
       5267  db block gets
       2458  consistent gets
          9  physical reads
    1110236  redo size
     361858  bytes sent via SQL*Net to client
        363  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed
        
        
SQL> SELECT id,col1 FROM widetable /* test101 */;

100 rows selected.

Statistics
----------------------------------------------------------
          5  recursive calls
         10  db block gets
         51  consistent gets
          0  physical reads
       2056  redo size
       1510  bytes sent via SQL*Net to client
        369  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

The author wrote a tool, Session Snapper, that captures the timing:

SQL> SELECT * FROM widetable /* test1 */;

SQL> @snapper stats,gather=t 5 1 1136
Sampling SID 1136 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v4.30 - by Tanel Poder ( http://blog.tanelpoder.com/snapper )

-----------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                          ,         DELTA
-----------------------------------------------------------------------------
   1136, SYSTEM    , TIME, hard parse elapsed time            ,         78158
   1136, SYSTEM    , TIME, parse time elapsed                 ,         80912
   1136, SYSTEM    , TIME, PL/SQL execution elapsed time      ,           127
   1136, SYSTEM    , TIME, DB CPU                             ,         89580
   1136, SYSTEM    , TIME, sql execute elapsed time           ,          5659
   1136, SYSTEM    , TIME, DB time                            ,         89616


SQL> SELECT id,col1 FROM widetable /* test2 */;

-----------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                          ,         DELTA
-----------------------------------------------------------------------------
   1136, SYSTEM    , TIME, hard parse elapsed time            ,          1162
   1136, SYSTEM    , TIME, parse time elapsed                 ,          1513
   1136, SYSTEM    , TIME, PL/SQL execution elapsed time      ,           110
   1136, SYSTEM    , TIME, DB CPU                             ,          2281
   1136, SYSTEM    , TIME, sql execute elapsed time           ,           376
   1136, SYSTEM    , TIME, DB time                            ,          2128

The savings in parse time are clear.

Cached cursors waste memory

SQL> SELECT sharable_mem, sql_id, child_number, sql_text FROM v$sql 
     WHERE sql_text LIKE 'SELECT % FROM widetable';

SHARABLE_MEM SQL_ID        CHILD_NUMBER SQL_TEXT
------------ ------------- ------------ -------------------------------------
       19470 b98yvssnnk13p            0 SELECT id,col1 FROM widetable
      886600 c4d3jr3fjfa3t            0 SELECT * FROM widetable

The author has another script, sqlmem.sql, that shows the detailed breakdown (the author really knows his stuff):

SQL> @sqlmem c4d3jr3fjfa3t
Show shared pool memory usage of SQL statement with SQL_ID c4d3jr3fjfa3t

CHILD_NUMBER SHARABLE_MEM PERSISTENT_MEM RUNTIME_MEM
------------ ------------ -------------- -----------
           0       886600         324792      219488


TOTAL_SIZE   AVG_SIZE     CHUNKS ALLOC_CL CHUNK_TYPE STRUCTURE            FUNCTION             CHUNK_COM            HEAP_ADDR
---------- ---------- ---------- -------- ---------- -------------------- -------------------- -------------------- ----------------
    272000        272       1000 freeabl           0 kccdef               qkxrMem              kccdef: qkxrMem      000000019FF49290
    128000        128       1000 freeabl           0 opn                  qkexrInitO           opn: qkexrInitO      000000019FF49290
    112568         56       2002 freeabl           0                      qosdInitExprCtx      qosdInitExprCtx      000000019FF49290
     96456         96       1000 freeabl           0                      qosdUpdateExprM      qosdUpdateExprM      000000019FF49290
     57320         57       1000 freeabl           0 idndef*[]            qkex                 idndef*[]: qkex      000000019FF49290
     48304         48       1000 freeabl           0 qeSel                qkxrXfor             qeSel: qkxrXfor      000000019FF49290
     40808         40       1005 freeabl           0 idndef               qcuAll               idndef : qcuAll      000000019FF49290
     40024      40024          1 freeabl           0 kafco                qkacol               kafco : qkacol       000000019FF49290
     37272        591         63 freeabl           0                      237.kggec            237.kggec            000000019FF49290
     16080       8040          2 freeabl           0 qeeRwo               qeeCrea              qeeRwo: qeeCrea      000000019FF49290
      8032       8032          1 freeabl           0 kggac                kggacCre             kggac: kggacCre      000000019FF49290
      8024       8024          1 freeabl           0 kksoff               opitca               kksoff : opitca      000000019FF49290
      3392         64         53 freeabl           0 kksol                kksnsg               kksol : kksnsg       000000019FF49290
      2880       2880          1 free              0                      free memory          free memory          000000019FF49290
      1152        576          2 freeabl           0                      16751.kgght          16751.kgght          000000019FF49290
      1040       1040          1 freeabl           0 ctxdef               kksLoadC             ctxdef:kksLoadC      000000019FF49290
       640        320          2 freeabl           0                      615.kggec            615.kggec            000000019FF49290
       624        624          1 recr           4095                      237.kggec            237.kggec            000000019FF49290
       472        472          1 freeabl           0 qertbs               qertbIAl             qertbs:qertbIAl      000000019FF49290
...

53 rows selected.

Compare with:

SQL> @sqlmem b98yvssnnk13p
Show shared pool memory usage of SQL statement with SQL_ID b98yvssnnk13p

CHILD_NUMBER SHARABLE_MEM PERSISTENT_MEM RUNTIME_MEM
------------ ------------ -------------- -----------
           0        19470           7072        5560


TOTAL_SIZE   AVG_SIZE     CHUNKS ALLOC_CL CHUNK_TYPE STRUCTURE            FUNCTION             CHUNK_COM            HEAP_ADDR
---------- ---------- ---------- -------- ---------- -------------------- -------------------- -------------------- ----------------
      1640       1640          1 free              0                      free memory          free memory          00000001AF2B75D0
      1152        576          2 freeabl           0                      16751.kgght          16751.kgght          00000001AF2B75D0
      1040       1040          1 freeabl           0 ctxdef               kksLoadC             ctxdef:kksLoadC      00000001AF2B75D0
       640        320          2 freeabl           0                      615.kggec            615.kggec            00000001AF2B75D0
       624        624          1 recr           4095                      237.kggec            237.kggec            00000001AF2B75D0
       544        272          2 freeabl           0 kccdef               qkxrMem              kccdef: qkxrMem      00000001AF2B75D0
       472        472          1 freeabl           0 qertbs               qertbIAl             qertbs:qertbIAl      00000001AF2B75D0
       456        456          1 freeabl           0 opixpop              kctdef               opixpop:kctdef       00000001AF2B75D0
       456        456          1 freeabl           0 kctdef               qcdlgo               kctdef : qcdlgo      00000001AF2B75D0
       328         54          6 freeabl           0                      qosdInitExprCtx      qosdInitExprCtx      00000001AF2B75D0
       312        312          1 freeabl           0 pqctx                kkfdParal            pqctx:kkfdParal      00000001AF2B75D0
       296        296          1 freeabl           0                      unmdef in opipr      unmdef in opipr      00000001AF2B75D0
       256        128          2 freeabl           0 opn                  qkexrInitO           opn: qkexrInitO      00000001AF2B75D0
       256         42          6 freeabl           0 idndef               qcuAll               idndef : qcuAll      00000001AF2B75D0
       208         41          5 freeabl           0                      kggsmInitCompac      kggsmInitCompac      00000001AF2B75D0
       192         96          2 freeabl           0                      qosdUpdateExprM      qosdUpdateExprM      00000001AF2B75D0
       184        184          1 freeabl           0                      237.kggec            237.kggec            00000001AF2B75D0
...

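Chunk counts for the same structures: roughly 1000 versus 2.

Large objects (LOBs)

Even more is wasted (latency, bandwidth, CPU and so on).

Also, do not select from a SELECT * subquery: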

SELECT
    id, a 
FROM (
    SELECT * FROM tl
)

SELECT * FROM (
    SELECT id, a FROM tl
)

PS: the author has many more tools, https://tanelpoder.com/psnapper/ and https://0x.tools/. Interesting work from a troubleshooting expert; worth following and learning from.



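(Translation) std::any versus void*

Translated and adapted from this article.

std::any is not meant as a replacement for void*, but in some scenarios it is indeed a safer alternative. std::any is also built on top of void*.

In practice it is a void* that remembers type information (a type-aware void*):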

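// Conceptual sketch only, not the actual layout of std::any.
struct any {
 void* object;
 const std::type_info* tinfo;  // type_info objects cannot be copied, so keep a pointer
};

Since std::any is not a template, its own type cannot carry the stored type, so the type information has to be stored alongside the pointer.

std::any implementations also typically perform small object optimization, SOO (also called SBO, small buffer optimization): a small object such as an int or double, within about two or three pointers in size, needs no heap allocation and is stored in the buffer directly.

In addition, std::any supports move semantics, so the stored data can be moved out: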

std::any a = std::string("Hello");

//value cast creates a copy
std::cout << std::any_cast<std::string>(a) << "\n"; //Hello

//reference cast
std::any_cast<std::string&>(a)[0] = 'h'; //cast as reference and change

//value is changed to "hello" now

//cast as const reference and print
std::cout << std::any_cast<const std::string&>(a) << "\n"; //hello

//  --- prints "Wrong Type!" below ---
try {
 std::cout << std::any_cast<double>(a) << "\n";
}catch(const std::bad_any_cast&) {
 std::cout << "Wrong Type!\n";
}

//Pointer cast example
//    ---     prints "hello" below   ---
if(auto* ptr = std::any_cast<std::string>(&a)) {
 std::cout << *ptr << "\n";
} else {
 std::cout << "Wrong Type!\n";
}

//move example
auto str = std::any_cast<std::string&&>(std::move(a));

//std::string in 'a' is moved
std::cout << str << "\n"; //hello

//string in 'a' is moved but it is not destroyed
//therefore 'a' is not empty.
std::cout << std::boolalpha << a.has_value() <<  "\n"; //true

//a moved-from std::string is valid but unspecified; in practice it is usually empty
std::cout << std::any_cast<std::string>(a) << "\n"; //typically ""

A typical use case for std::any

Suppose we want to implement a cache with a TTL, where the key is a string and the value can be anything:

class TTLCache {
public:

 //Initializes with a given ttl (in seconds)
 TTLCache(uint32_t ttl):ttlSeconds(ttl){}

 //Adds an item to the cache along with the current timestamp
 bool add(const std::string& key, const std::any& value);

 //Gets a value from cache if exists
 // - otherwise returns empty std::any
 std::any get(const std::string& key);

 //Erases an item for a given key if exists
 void erase(const std::string& key);

 // Fires periodically in a separate thread and erases the items
 //  - from cache that are older than the ttlSeconds
 void onTimer();

 //...more interfaces...

private:

 //Values stored along with timestamp
 struct Item {
  time_t timestamp;
  std::any value;
 };

 //Expire time (ttl) of items in seconds
 uint32_t ttlSeconds;

 //Items are stored against keys along with timestamp
 std::unordered_map<std::string, Item> items;
};

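Efficiency concerns such as O(1) operations are set aside for now.

A typical use case for void*

In network data transfer, user data is passed as void* to represent arbitrary binary data, strings or protocol data: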
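The interface above leaves the member functions undefined. As a rough idea of how add and get might look, here is a reduced sketch; MiniTTLCache is a hypothetical stand-in for TTLCache, it uses std::chrono::steady_clock instead of time_t, and it checks expiry lazily in get rather than from a timer thread. All of these choices are assumptions made for illustration.

```cpp
#include <any>
#include <chrono>
#include <string>
#include <unordered_map>
#include <utility>

class MiniTTLCache {
public:
    using Clock = std::chrono::steady_clock;

    explicit MiniTTLCache(std::chrono::seconds ttl) : ttl_(ttl) {}

    // Stores the value with the current timestamp.
    // Returns false if the key is already present.
    bool add(const std::string& key, std::any value) {
        return items_.try_emplace(key, Item{Clock::now(), std::move(value)}).second;
    }

    // Returns a copy of the stored value, or an empty std::any
    // if the key is unknown or the item is older than the TTL.
    std::any get(const std::string& key) const {
        auto it = items_.find(key);
        if (it == items_.end() || Clock::now() - it->second.timestamp > ttl_) {
            return {};
        }
        return it->second.value;
    }

private:
    struct Item {
        Clock::time_point timestamp;
        std::any value;
    };

    std::chrono::seconds ttl_;
    std::unordered_map<std::string, Item> items_;
};
```

Callers recover the concrete type with std::any_cast, for example std::any_cast<int>(cache.get("answer")), and can test has_value() first to tell a miss from a hit.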

//Clients send requests to servers
struct Request {
 /*..Request fields..*/

 //User data can be set by clients
 void* userData;
};

//When a response comes to the client, it has
// - same user data that was attached to the Request
struct Response {
 /*..Response fields..*/

 //User data copied from Request
 void* userData;
};


void sendRequest() {
 Request req;
 //Prepare request
 req.userData = new std::string("state data"); //Attach user data
 //Send request to server...
}

//Process response 
void processResponse(Response& res) {
 auto state = (std::string*)(res.userData); //cast not type-safe
 //Process response using state data....
 delete state;  // delete state
}

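The sender allocates the data with new; the receiver knows it was allocated with new and deletes it after processing.

In this scenario the code is not type-safe, requires a heap allocation, and gets no SBO.

It can easily be replaced with std::any: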

//--- Suppose userData is std::any ---

void sendRequest() {
 Request req;
 req.userData = std::string("state data"); //attach user data
 //send request to server
}

void processResponse(Response& res) {
 auto& state = std::any_cast<std::string&>(res.userData); //throws if type does not match
 //Process response using state data....
 //No need to explicitly delete the user data.
}

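We get the optimization, no longer worry about the type, and no longer worry about freeing: three birds with one stone.

An earlier solution for this kind of user data is std::shared_ptr<void> (there is an article introducing it). In short, shared_ptr records the concrete type at construction, which guarantees that the right destructor runs.

Translator's note: I was once ignorant enough to argue with a colleague that it could not be used this way.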

std::shared_ptr<void> vps = std::make_shared<std::string>(); //OK 
vps.reset();  //Appropriate destructor is called

auto sps = std::static_pointer_cast<std::string>(vps); //OK with typecast
//sps is std::shared_ptr<std::string>

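For the user data scenario, simply change the type of userData to shared_ptr<void>.

The drawbacks: no SBO, extra bookkeeping overhead, and no ability to move the stored object out. If C++17 is not available it works as a stopgap, but std::any remains the best option.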
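That the right destructor runs even through a shared_ptr<void> can be checked with a small counter. This is an illustrative sketch; Tracked and make_user_data are hypothetical names introduced here.

```cpp
#include <memory>

// Increments an external counter when destroyed, so destruction is observable.
struct Tracked {
    explicit Tracked(int& counter) : destroyed(counter) {}
    ~Tracked() { ++destroyed; }
    int& destroyed;
};

// Returns type-erased user data. The control block created by make_shared
// remembers that the object is a Tracked and destroys it accordingly.
inline std::shared_ptr<void> make_user_data(int& counter) {
    return std::make_shared<Tracked>(counter);
}
```

When the last shared_ptr<void> goes away, ~Tracked runs and the counter goes up by one, although the static type at that point is only void.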



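(Translation) Taming deep recursion


Original article

In short, the author works on SQL parsing code that has to analyze expressions. There are a great many expressions, about half of the parsing code consists of binary tree traversals, and many of them are recursive, so the recursion depth grows large.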

unique_ptr<Expression> analyzeExpression(AST* astNode) {  
   switch (astNode->getType()) {  
    case AST::BinaryExpression: return analyzeBinaryExpression(astNode->as<BinaryExpAST>());  
    case AST::CaseExpression: return analyzeCaseExpression(astNode->as<CaseExpAST>());  
    ...  
   }  
 }  
 unique_ptr<Expression> analyzeBinaryExpression(BinaryExpAST* astNode) {  
   auto left = analyzeExpression(astNode->left);  
   auto right = analyzeExpression(astNode->right);  
   auto type = inferBinaryType(astNode->getOp(), left, right);  
   return make_unique<BinaryExpression>(astNode->getOp(), move(left), move(right), type);  
 }  

With 300,000 expressions the stack simply overflows, so a solution had to be found.

__builtin_frame_address(0)

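Detect the imminent stack overflow and throw an exception. This solves the problem in a rather ugly way:

  • If the expression is too large, the query simply fails
  • Many places traverse the tree like this, and there is no telling where a large input will show up
  • During optimization the tree structure may change and become deeper, or the expression larger, so the optimization effort itself ends up aborted by a stack overflow

Set a larger stack size explicitly? Even uglier.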
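The detection approach can be sketched as follows. This is not the author's code: StackBudget, Node and height are hypothetical names, the byte limit is arbitrary, and __builtin_frame_address is a GCC/Clang builtin, so the sketch assumes one of those compilers. Frame addresses are only an approximation of stack usage.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// A bare binary tree node standing in for an expression AST.
struct Node {
    const Node* left = nullptr;
    const Node* right = nullptr;
};

// Remembers where the stack was when the traversal started and
// throws once the recursion has consumed more than limitBytes.
class StackBudget {
public:
    explicit StackBudget(std::size_t limitBytes)
        : base_(frame()), limit_(limitBytes) {}

    void check() const {
        const std::uintptr_t here = frame();
        // The stack may grow up or down, so take the absolute distance.
        const std::uintptr_t used = here > base_ ? here - base_ : base_ - here;
        if (used > limit_) {
            throw std::length_error("expression tree too deep");
        }
    }

private:
    static std::uintptr_t frame() {
        return reinterpret_cast<std::uintptr_t>(__builtin_frame_address(0));
    }

    std::uintptr_t base_;
    std::size_t limit_;
};

// Recursive traversal that fails with an exception instead of a crash.
inline int height(const Node* n, const StackBudget& budget) {
    if (n == nullptr) {
        return 0;
    }
    budget.check();
    const int l = height(n->left, budget);
    const int r = height(n->right, budget);
    return 1 + std::max(l, r);
}
```

A shallow tree is traversed normally, while a very deep one raises std::length_error long before the real stack limit is hit. The drawbacks listed above still apply: the query fails, it just fails cleanly.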

-fsplit-stack

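Let the compiler manage the stack: this flag makes the compiler split the stack into segments automatically. In actual testing, however, the compiler hit an internal compiler error (ICE), and there were no reports of anyone else using it, so it was abandoned.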

boost.context

The final solution: user-space stacks that you do not have to maintain yourself. The code becomes:

 unique_ptr<Expression> analyzeExpression(AST* astNode) {  
   if (StackGuard::needsNewStack())  
    return StackGuard::newStack([=]() { return analyzeExpression(astNode); });  
   ... // unchanged  
 }  

boost.context does the stack switching, which reduces the mental burden; the code is not that ugly and remains maintainable.


ref

  • They also published a paper, https://db.in.tum.de/~radke/papers/hugejoins.pdf, which is about optimization rather than the engineering practice described above
