Aurora notes roundup


ref

  1. https://nan01ab.github.io/2017/06/Amazon-Aurora.html (this blog is all paper write-ups; quite good)
  2. http://liuyangming.tech/05-2019/aurora.html (good blog; see also http://liuyangming.tech/02-2020/myrocks.html)
  3. https://www.cnblogs.com/cchust/p/7476876.html
  4. http://mysql.taobao.org/monthly/2015/10/07/
  5. https://zhuanlan.zhihu.com/p/27872160
  6. https://blog.acolyer.org/2019/03/27/amazon-aurora-on-avoiding-distributed-consensus-for-i-os-commits-and-membership-changes/


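(Translation) Folly's static injection technique: how to legally access a private member function without changing the interface

Original link

This code was found while studying folly; the source code is here

Setup: a class with a private method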

class Widget {
private:
  void forbidden();
};

An attempt to call it:

void hijack(Widget& w) {
  w.forbidden();  // ERROR!
}
  In function 'void hijack(Widget&)':
  error: 'void Widget::forbidden()' is private
  within this context
        |     w.forbidden();
        |   

Approach

Member functions can be called through pointers!

For example:

class Calculator {
  float current_val = 0.f;
 public:
   void clear_value() { current_val = 0.f; };
   float value() const {
     return current_val;
   };

   void add(float x) { current_val += x; };
   void multiply(float x) { current_val *= x; };
};

using Operation = void (Calculator::*)(float);
Operation op1 = &Calculator::add;
Operation op2 = &Calculator::multiply;
Calculator calc{};
(calc.*op1)(123.0f); // Calls add
(calc.*op2)(10.0f);  // Calls multiply

A private function can be reached by having a public function hand out a pointer to it, bypassing the access check:

class Widget {
 public:
  static auto forbidden_fun() {
    return &Widget::forbidden;
  }
 private:
  void forbidden();
};

void hijack(Widget& w) {
  using ForbiddenFun = void (Widget::*)();
  ForbiddenFun const forbidden_fun = Widget::forbidden_fun();

  // Calls a private member function on the Widget
  // instance passed in to the function.
  (w.*forbidden_fun)();
}

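But normal classes are not designed with an API like this; it would be silly. So what can we do?

Bypass the check through template instantiation!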

The C++17 standard contains the following paragraph (with the parts of interest to us marked in bold):

17.7.2 (item 12)

The usual access checking rules do not apply to names used to specify explicit instantiations. [Note: In particular, the template arguments and names used in the function declarator (including parameter types, return types and exception specifications) may be private types or objects which would normally not be accessible and the template may be a member template or member function which would not normally be accessible.]

Key point: explicit instantiation

The chosen approach: pass the private member function pointer as a non-type template parameter (NTTP)

// The first template parameter is the type
// signature of the pointer-to-member-function.
// The second template parameter is the pointer
// itself.
template <
  typename ForbiddenFun,
  ForbiddenFun forbidden_fun
>
struct HijackImpl {
  static void apply(Widget& w) {
    // Calls a private method of Widget
    (w.*forbidden_fun)();
  }
};

// Explicit instantiation is allowed to refer to
// `Widget::forbidden` in a scope where it's not
// normally permissible.
template struct HijackImpl<
  decltype(&Widget::forbidden),
  &Widget::forbidden
>;

void hijack(Widget& w) {
  HijackImpl<decltype(&Widget::forbidden), &Widget::forbidden>::apply(w);
}

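This still fails to compile. The explicit instantiation itself is fine, but hijack names Widget::forbidden again in an ordinary expression, which is not an explicit instantiation, so the usual access check applies and the compiler still reports that the member is private.

Wrap the call in a friend function + explicit instantiation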

// HijackImpl is the mechanism for injecting the
// private member function pointer into the
// hijack function.
template <
  typename ForbiddenFun,
  ForbiddenFun forbidden_fun
>
class HijackImpl {
  // Definition of free function inside the class
  // template to give it access to the
  // forbidden_fun template argument.
  // Marking hijack as a friend prevents it from
  // becoming a member function.
  friend void hijack(Widget& w) {
    (w.*forbidden_fun)();
  }
};
// Declaration in the enclosing namespace to make
// hijack available for name lookup.
void hijack(Widget& w);

// Explicit instantiation of HijackImpl template
// bypasses access controls in the Widget class.
template class
HijackImpl<
  decltype(&Widget::forbidden),
  &Widget::forbidden
>;

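To summarize:

  • Explicit template instantiation exposes the private member function
  • The address of the member function is passed as a template argument of HijackImpl
  • hijack is defined inside HijackImpl and calls through the private member function pointer directly
  • hijack is marked friend, so it is a free function rather than a member and can be called from outside HijackImpl
  • The explicit instantiation is what makes the call possible

One last problem remains: the definition and the explicit instantiation both live in a header, yet across all translation units (TUs) a given explicit instantiation may appear only once; otherwise linking fails with multiple-definition errors. How can that be guaranteed?

folly's approach is to add a tag type in an anonymous namespace, so that the instantiation is a different type with a different symbol name in each TU. The final version: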

namespace {
// This is a *different* type in every translation
// unit because of the anonymous namespace.
struct TranslationUnitTag {};
}

void hijack(Widget& w);

template <
  typename Tag,
  typename ForbiddenFun,
  ForbiddenFun forbidden_fun
>
class HijackImpl {
  friend void hijack(Widget& w) {
    (w.*forbidden_fun)();
  }
};

// Every translation unit gets its own unique
// explicit instantiation because of the
// guaranteed-unique tag parameter.
template class HijackImpl<
  TranslationUnitTag,
  decltype(&Widget::forbidden),
  &Widget::forbidden
>;
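
To check that the pieces fit together, here is a minimal single-file sketch of the final scheme. The body of Widget (a call counter and the calls() accessor) and the helper call_forbidden_twice are made up for illustration; only the HijackImpl/hijack structure comes from the code above.

```cpp
#include <cassert>

// Hypothetical Widget: the counter exists only so the effect is observable.
class Widget {
 public:
  int calls() const { return calls_; }

 private:
  void forbidden() { ++calls_; }
  int calls_ = 0;
};

namespace {
// A different type in every translation unit.
struct TranslationUnitTag {};
}  // namespace

// Declared at namespace scope so ordinary name lookup can find it.
void hijack(Widget& w);

template <typename Tag, typename ForbiddenFun, ForbiddenFun forbidden_fun>
class HijackImpl {
  // Friend definition: a free function that can see forbidden_fun.
  friend void hijack(Widget& w) { (w.*forbidden_fun)(); }
};

// Access checks do not apply to names used in an explicit instantiation.
template class HijackImpl<TranslationUnitTag,
                          decltype(&Widget::forbidden),
                          &Widget::forbidden>;

// Illustrative helper: calls the private member twice through hijack.
int call_forbidden_twice() {
  Widget w;
  assert(w.calls() == 0);
  hijack(w);
  hijack(w);
  return w.calls();
}
```

Calling call_forbidden_twice() from main exercises the private member without Widget granting any access itself.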

References

  • ‘The Power of Hidden Friends in C++’, posted 25 June 2019: https://www.justsoftwaresolutions.co.uk/cplusplus/hidden-friends.html
  • Dan Saks, ‘Making New Friends’: https://www.youtube.com/watch?v=POa_V15je8Y
  • Johannes Schaub, ‘Access to private members. That’s easy!’: http://bloglitb.blogspot.com/2011/12/access-to-private-members-safer.html
  • Johannes Schaub, ‘Access to private members: Safer nastiness’, posted 30 December 2011: http://bloglitb.blogspot.com/2011/12/access-to-private-members-safer.html
  • https://dfrib.github.io/a-foliage-of-folly/ goes a step further; I plan to translate it next

(Repost) Correctly implementing a spinlock in C++


https://rigtorp.se/spinlock/

Without further ado, here is the code:

struct alignas(64) spinlock {
  std::atomic<bool> lock_ = {0};

  void lock() noexcept {
    for (;;) {
      // Optimistically assume the lock is free on the first try
      if (!lock_.exchange(true, std::memory_order_acquire)) {
        return;
      }
      // Wait for lock to be released without generating cache misses
      while (lock_.load(std::memory_order_relaxed)) {
        // Issue X86 PAUSE or ARM YIELD instruction to reduce contention between
        // hyper-threads
        __builtin_ia32_pause();
      }
    }
  }

  bool try_lock() noexcept {
    // First do a relaxed load to check if lock is free in order to prevent
    // unnecessary cache misses if someone does while(!try_lock())
    return !lock_.load(std::memory_order_relaxed) &&
           !lock_.exchange(true, std::memory_order_acquire);
  }

  void unlock() noexcept {
    lock_.store(false, std::memory_order_release);
  }
};
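
As a usage sketch: because the lock exposes lock() and unlock(), it can be used with std::lock_guard. The block below carries a compact copy of the lock so it compiles on its own, and swaps the x86-only __builtin_ia32_pause() for std::this_thread::yield() to stay portable (that swap is an assumption of this sketch, not part of the original post). The helper count_with_spinlock is hypothetical.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Compact copy of the test-and-test-and-set lock shown above.
struct spinlock {
  std::atomic<bool> lock_{false};

  void lock() noexcept {
    for (;;) {
      // Try to take the lock; exchange returns the previous value.
      if (!lock_.exchange(true, std::memory_order_acquire)) {
        return;
      }
      // Spin on a plain load until the lock looks free again.
      while (lock_.load(std::memory_order_relaxed)) {
        std::this_thread::yield();  // portable stand-in for PAUSE
      }
    }
  }

  void unlock() noexcept { lock_.store(false, std::memory_order_release); }
};

// Hypothetical helper: every thread increments a plain (non-atomic) counter
// while holding the lock, so the final value shows whether exclusion held.
long count_with_spinlock(int num_threads, int iterations) {
  spinlock sl;
  long counter = 0;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&] {
      for (int i = 0; i < iterations; ++i) {
        std::lock_guard<spinlock> guard(sl);
        ++counter;
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return counter;
}
```

With 4 threads and 10000 iterations each, the returned count should equal 40000 if mutual exclusion holds.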

Ticket spinlocks

https://mfukar.github.io/2017/09/08/ticketspinlock.html

struct TicketSpinLock {
    /**
     * Attempt to grab the lock:
     * 1. Get a ticket number
     * 2. Wait for it
     */
    void enter() {
        const auto ticket = next_ticket.fetch_add(1, std::memory_order_relaxed);

        while (true) {
            const auto currently_serving = now_serving.load(std::memory_order_acquire);
            if (currently_serving == ticket) {
                break;
            }

            const size_t previous_ticket = ticket - currently_serving;
            // Not const: the counter is decremented in the back-off loop below.
            size_t delay_slots = BACKOFF_MIN * previous_ticket;

            while (delay_slots--) {
                spin_wait();
            }
        }
    }
    static inline void spin_wait(void) {
    #if (COMPILER == GCC || COMPILER == LLVM)
        /* volatile here prevents the asm block from being moved by the optimiser: */
        asm volatile("pause" ::: "memory");
    #elif (COMPILER == MSVC)
        _mm_pause();
    #endif
    }

    /**
     * Since we're in the critical section, no one can modify `now_serving`
     * but this thread. We just want the update to be atomic. Therefore we can use
     * a simple store instead of `now_serving.fetch_add()`:
     */
    void leave() {
        const auto successor = now_serving.load(std::memory_order_relaxed) + 1;
        now_serving.store(successor, std::memory_order_release);
    }

    /* These are aligned on a cache line boundary in order to avoid false sharing: */
    alignas(CACHELINE_SIZE) std::atomic_size_t now_serving = {0};
    alignas(CACHELINE_SIZE) std::atomic_size_t next_ticket = {0};
};

static_assert(sizeof(TicketSpinLock) == 2*CACHELINE_SIZE,
    "TicketSpinLock members may not be aligned on a cache-line boundary");



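Two Jenkins problems I ran into


Stupid Jenkins

I don't know what the platform team did to Jenkins; maybe they upgraded it. If a built-in CI is available, prefer it over third-party components. This was really frustrating.

  • Garbled output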


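It was not just this one command; even git rm produced garbled output. I first suspected invisible characters hidden in the script and spent a long time editing it, to no avail.

Then I guessed the Chinese comments were the cause and removed them; still no luck.

Finally, following reference 1, adding one line at the top of the script fixed it:

export LANG="en_US.UTF-8"
  • Command not found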


PATH had been cleared. Defining PATH at the top of the script fixes it:

export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"

ref

  1. https://blog.csdn.net/qq_35732831/article/details/85236562
  2. https://www.cnblogs.com/weifeng1463/p/9419358.html
  3. https://testerhome.com/topics/15136



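Common ASan error reports



Common error reports from ASan. Compile with -fsanitize=address and link with -lasan (passing -fsanitize=address at link time also pulls in the runtime).

global-buffer-overflow: the length passed to memcmp may be out of bounds

R: AddressSanitizer: global-buffer-overflow on address 0x000000a8f8ff at pc 0x7ff6eafde870 bp 0x7ffc75471220 sp 0x7ffc754709d0
READ of size 49 at 0x000000a8f8ff thread T0
    #0 0x7ff6eafde86f in __interceptor_memcmp ../../../../gcc-5.4.0/libsanitizer/asan/asan_interceptors.cc:333

Watch the third argument of memcmp: use the smaller of the two buffer lengths.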
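
A minimal sketch of that rule, assuming both buffer lengths are known to the caller. The helper name bounded_compare is hypothetical; it clamps the memcmp length to the shorter buffer and breaks ties by length, so memcmp never reads past either one.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical helper: three-way comparison of two byte ranges.
// Returns <0, 0, or >0, like memcmp, without reading past either range.
int bounded_compare(const char* a, std::size_t a_len,
                    const char* b, std::size_t b_len) {
  // Only the common prefix is safe to hand to memcmp.
  const std::size_t n = std::min(a_len, b_len);
  const int r = std::memcmp(a, b, n);
  if (r != 0) {
    return r;
  }
  // Common prefix is equal: the shorter range sorts first.
  if (a_len == b_len) {
    return 0;
  }
  return a_len < b_len ? -1 : 1;
}
```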

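Related concept: OOB (out-of-bounds) memory access

heap-buffer-overflow: strlen reads out of bounds

assert(n == strlen(val)); AddressSanitizer: heap-buffer-overflow

The string was probably allocated without room for the terminating '\0', so strlen runs past the end of the heap block.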
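
One way to avoid this, sketched under the assumption that the data arrives as n raw bytes without a terminator: allocate n + 1 bytes and write the '\0' explicitly. The helper make_c_string is hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical helper: copy n raw bytes into a heap buffer that strlen can
// walk safely, because the terminator lives inside the allocation.
std::unique_ptr<char[]> make_c_string(const char* data, std::size_t n) {
  std::unique_ptr<char[]> buf(new char[n + 1]);  // +1 for the terminator
  std::memcpy(buf.get(), data, n);
  buf[n] = '\0';
  return buf;
}
```

With this buffer, strlen(buf.get()) stops at index n instead of reading beyond the heap block.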

AddressSanitizer: attempting to call malloc_usable_size

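This error came from RocksDB. After searching around: the binary was built with jemalloc, which conflicts with ASan and RocksDB and triggers this report. The check can be disabled temporarily:

ASAN_OPTIONS=check_malloc_usable_size=0

After rebuilding the binary without jemalloc, it worked.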

AddressSanitizer: attempting to call malloc_usable_size() for pointer which is not owned: 0x7f121aed6000
    #0 0x7f121f506990 in __interceptor_malloc_usable_size ../../../../gcc-5.4.0/libsanitizer/asan/asan_malloc_linux.cc:104
    #1 0x8c7929 in rocksdb::Arena::AllocateNewBlock(unsigned long) util/arena.cc:221
    #2 0x8c79c4 in rocksdb::Arena::AllocateFallback(unsigned long, bool) util/arena.cc:114
    #3 0x8df67a in rocksdb::LogBuffer::AddLogToBuffer(unsigned long, char const*, __va_list_tag*) util/log_buffer.cc:24
    #4 0x8df8c8 in rocksdb::LogToBuffer(rocksdb::LogBuffer*, char const*, ...) util/log_buffer.cc:88
    #5 0x749300 in rocksdb::DBImpl::FlushMemTableToOutputFile(rocksdb::ColumnFamilyData*, rocksdb::MutableCFOptions const&, bool*, rocksdb::JobContext*, rocksdb::SuperVersionContext*, rocksdb::LogBuffer*) db/db_impl_compaction_flush.cc:183
    #6 0x74c1f4 in rocksdb::DBImpl::FlushMemTablesToOutputFiles(rocksdb::autovector<rocksdb::DBImpl::BGFlushArg, 8ul> const&, bool*, rocksdb::JobContext*, rocksdb::LogBuffer*) db/db_impl_compaction_flush.cc:229
    #7 0x74d3b0 in rocksdb::DBImpl::BackgroundFlush(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::FlushReason*) db/db_impl_compaction_flush.cc:2025
    #8 0x74da4f in rocksdb::DBImpl::BackgroundCallFlush() db/db_impl_compaction_flush.cc:2059
    #9 0x8e8a27 in std::function<void ()>::operator()() const /usr/local/include/c++/5.4.0/functional:2267
    #10 0x8e8a27 in rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long) util/threadpool_imp.cc:265
    #11 0x8e8c0e in rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*) util/threadpool_imp.cc:303
    #12 0x7f121e1fb8ef in execute_native_thread_routine ../../../../../gcc-5.4.0/libstdc++-v3/src/c++11/thread.cc:84
    #13 0x7f121dd19dc4 in start_thread (/lib64/libpthread.so.0+0x7dc4)
    #14 0x7f121da477fc in __clone (/lib64/libc.so.6+0xf67fc)

AddressSanitizer can not describe address in more detail (wild memory access suspected).
SUMMARY: AddressSanitizer: bad-malloc_usable_size ../../../../gcc-5.4.0/libsanitizer/asan/asan_malloc_linux.cc:104 __interceptor_malloc_usable_size
Thread T2 created by T0 here:
    #0 0x7f121f4a80d4 in __interceptor_pthread_create ../../../../gcc-5.4.0/libsanitizer/asan/asan_interceptors.cc:179
    #1 0x7f121e1fba32 in __gthread_create /home/vdb/gcc-5.4-build/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits/gthr-default.h:662
    #2 0x7f121e1fba32 in std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) ../../../../../gcc-5.4.0/libstdc++-v3/src/c++11/thread.cc:149

ref

  • A discussion that recommends avoiding memcmp, again out of fear of out-of-bounds reads: https://github.com/cesanta/mongoose/issues/564
  • https://github.com/pcrain/slippc/issues/16 a global-buffer-overflow case


fd leaks and socket problems: a summary of analysis commands


Is the number of fds growing?

 lsof -n|awk '{print $2}'| sort | uniq -c | sort -nr | head

Top 20 processes by open fd count

for x in `ps -eF| awk '{ print $2 }'`;do echo `ls /proc/$x/fd 2> /dev/null | wc -l` $x `cat /proc/$x/cmdline 2> /dev/null`;done | sort -n -r | head -n 20

For a specific process (plain ls, so no "total" line inflates the count)

ls /proc/pid/fd | wc -l
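
The same per-process count can be taken from inside a program, which is handy for leak checks in tests. This is a Linux-only sketch that assumes /proc is mounted and C++17 is available; count_open_fds and fd_delta are hypothetical helper names. The count includes the descriptor the directory iterator itself holds while scanning, so differences between two counts are more meaningful than absolute values.

```cpp
#include <cstddef>
#include <filesystem>
#include <iterator>

// Linux-only: count entries under /proc/self/fd for the current process.
std::size_t count_open_fds() {
  namespace fs = std::filesystem;
  return static_cast<std::size_t>(std::distance(
      fs::directory_iterator("/proc/self/fd"), fs::directory_iterator{}));
}

// Hypothetical helper: how many fds did running `action` leave open?
// A positive result after the action has finished suggests a leak.
template <typename F>
long fd_delta(F&& action) {
  const long before = static_cast<long>(count_open_fds());
  action();
  return static_cast<long>(count_open_fds()) - before;
}
```

Wrapping a request handler or a test case in fd_delta and checking for a non-zero result is one cheap way to spot a leaking code path.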

What are the fds being used for?

strace -p pid  -f -e read,write,close

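Ref

  • https://oroboro.com/file-handle-leaks-server/ a summary of fd leaks
    • Common misconceptions
      • Too many TIME_WAIT sockets use up fds -> No. Once close is called the fd can be reused; TIME_WAIT is a separate matter
      • Closing fds is too slow -> No. The fd can be reused as soon as close returns; whether the underlying resource is fully torn down is the kernel's business
    • Common scenarios
      • Duplicated fds inherited by child processes
      • Too many connections
      • fds that were not closed when spawning child processes
  • https://serverfault.com/questions/135742/how-to-track-down-a-file-descriptor-leak
  • Viewing all TCP connections of a process: http://blog.fatedier.com/2016/07/18/stat-all-connection-info-of-special-process-in-linux/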


Characterizing, Modeling, and Benchmarking RocksDB Key-Value Workloads at Facebook


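A summary based on the slides and the paper


Overview

KV stores are used very widely today, yet

  • KV workloads behave differently across applications, and analyses of real-world workloads are very limited
  • Even within one application the workload keeps changing; how can these changes be collected and analyzed?
  • Given the above, how do we find the real bottleneck and improve performance?

Methods and tools

  • Method: collect workloads, analyze their structure, build a workload model, compare, improve benchmark performance, tune
  • Tools: trace collector, trace replayer, trace analyzer, benchmarks

The paper bases its analysis on three RocksDB applications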

Case studies

UDB

Facebook's store for social graph data; underneath it is MySQL on MyRocks

                 RocksDB key                                          RocksDB value
primary key      table index number + primary key                     columns + checksum
secondary key    table index number + secondary key + primary key     checksum

RocksDB in UDB stores different types of data in 6 column families:

Object: stores object data

Assoc: stores association data

Assoc_count: stores the number of associations of each object

Object_2ry, Assoc_2ry: secondary indexes for objects and associations

Non-SG: stores data for other services unrelated to the social graph

ZippyDB UP2X

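RocksDB-based KV clusters, used to store AI/ML data

Types of data collected

  • Query composition
  • KV sizes and their distribution
  • KV hotness and access distribution
  • QPS
  • Hot-key distribution
  • Key-space and temporal localities, and so on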
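
To make two of these metrics concrete, here is a toy sketch that derives a key-size histogram and per-key access counts from a list of accessed keys. It is not the RocksDB trace analyzer and does not use its trace format; TraceStats and analyze_trace are hypothetical names, and the input is assumed to be just the sequence of keys seen in a trace.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical result type for this sketch.
struct TraceStats {
  // key length in bytes -> number of accesses with that key length
  std::map<std::size_t, std::size_t> key_size_histogram;
  // key -> number of accesses (the basis for hot-key analysis)
  std::unordered_map<std::string, std::size_t> access_count;
};

// Walk the accessed keys once and accumulate both statistics.
TraceStats analyze_trace(const std::vector<std::string>& accessed_keys) {
  TraceStats stats;
  for (const auto& key : accessed_keys) {
    ++stats.key_size_histogram[key.size()];
    ++stats.access_count[key];
  }
  return stats;
}
```

Sorting access_count by value would give the hot-key ranking; the histogram is the kind of data behind a key-size distribution plot.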

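Most of the characteristics above are workload-specific, so I won't list them. Only key size is covered here.

In all three applications the key sizes cluster in a narrow range, which is hardly surprising.

The figure is too large to paste; see slide 15 of the deck.

The workloads are then replayed with trace_replay, and a similar synthetic workload is constructed and simulated with YCSB.

How exactly this is done is not explained.


ref

  • For a detailed walk-through of the paper see https://www.jianshu.com/p/97d9bdd3cd4e ; I only gave an outline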
  • https://www.usenix.org/system/files/fast20-cao_zhichao.pdf
  • https://rockset.com/rocksdb/RocksDBZhichaoFAST2020.pdf?fbclid=IwAR0j6IpFrZ_hiYJOJLf5bMENUC2v86LUw69KWh_0ZBvQxMqWiDahyb0IYDw
  • The tools mentioned are described in the paper's references; the wiki page https://github.com/facebook/rocksdb/wiki/RocksDB-Trace%2C-Replay%2C-Analyzer%2C-and-Workload-Generation is worth trying when there is time


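Filling gaps in data structures and algorithms

Problem lists

https://www.lintcode.com/ladder/47/

https://www.lintcode.com/ladder/2/

The kuangbin list is too hard; not worth considering for interviews: https://vjudge.net/article/187

https://leetcode-solution-leetcode-pp.gitbook.io/leetcode-solution these solutions are worth reading through once

https://github.com/SharingSource/LogicStack-LeetCode/wiki the categorization is excellent. If you arrived here by search, just read this wiki; I mostly copy from it too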


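AFL testing


AFL is a fuzz testing tool that feeds a target all sorts of inputs to see whether any of them crash it, for example gcc. For network programs such as redis or nginx, however, there is no good built-in way to simulate the network. Below is a demo that uses preeny to mock the network.

Preparation

Build AFL; the tarball can be downloaded from https://lcamtuf.coredump.cx/afl/

CC=/usr/local/bin/gcc make -j  # mind your gcc version; if that is not a concern, just run make
make install
# with cmake: build your own binary, pointing it at afl-g++
cmake ../ -DCXX_COMPILER_PATH=/usr/local/bin/afl-g++
# without cmake: set CXX
CXX=/usr/local/bin/afl-g++ make -j

Building preeny is straightforward; follow the README at https://github.com/zardus/preeny

Testing

preeny can make standard input serve as the socket input; just set LD_PRELOAD. References 2 and 3 give examples for redis and nginx respectively.

I used redis on WSL, with test cases generated as described in reference 2.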

LD_PRELOAD=/mnt/d/github/preeny/x86_64-linux-gnu/desock.so afl-fuzz -m 8G -i fuzz_in -o fuzz_out/ ./redis-server

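To check that preeny is taking effect:

echo "set a b" | LD_PRELOAD=/mnt/d/github/preeny/x86_64-linux-gnu/desock.so ./redis-server ./redis.conf

I ran it over a weekend and found no crashes.

Note

On WSL, setsockopt with TCP_NODELAY fails with "invalid argument". Disabling that call is enough.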
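
An alternative to removing the call outright is to treat its failure as non-fatal. The sketch below is not Redis code; try_enable_nodelay is a hypothetical helper that reports whether TCP_NODELAY could be set and leaves the decision to the caller.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Hypothetical helper: best-effort TCP_NODELAY.
// Returns true on success; on failure (for example EINVAL on WSL, or an fd
// that is not a TCP socket) it returns false and errno holds the reason.
bool try_enable_nodelay(int fd) {
  const int yes = 1;
  return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &yes, sizeof(yes)) == 0;
}
```

A caller under fuzzing can log the failure and carry on, since the option only affects latency and not correctness.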


ref

Sources for this post

  1. Main idea: https://copyninja.info/blog/afl-and-network-programs.html
  2. https://volatileminds.net/2015/08/20/advanced-afl-usage-preeny.html
  3. https://lolware.net/2015/04/28/nginx-fuzzing.html

Some AFL usage examples

  1. http://0x4c43.cn/2018/0722/use-afl-for-fuzz-testing/ fuzzing imageshark
  2. https://stfpeak.github.io/2017/06/11/Finding-bugs-using-AFL/ an example of finding input-handling bugs
  3. https://www.freebuf.com/articles/system/191536.html introduction to fuzzing and how it works
  4. http://zeroyu.xyz/2019/05/15/how-to-use-afl-fuzz/ AFL usage guide
  5. https://paper.seebug.org/496/ how it works
  6. https://www.fastly.com/blog/how-fuzz-server-american-fuzzy-lop
