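Reading notes from Chapter 6 of BPF Performance Tools (《BPF之巅》).
The CPU is what executes instructions. On Linux the kernel manages it: the kernel schedules the CPU among threads, exposes system calls, and handles interrupts.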
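CPU scheduler
The kernel has to share CPU resources among competing programs. The CPU scheduler's behavior can be described through a small number of thread states.
I won't go into the states in detail here, since an earlier note on process scheduling (CFS) covered them. The BPF book describes them as: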
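1) ON-PROC: threads currently running on a CPU
2) RUNNABLE: threads that could run but are waiting in a queue for their turn
3) SLEEP: threads blocked waiting on some other event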
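On Linux these states can be observed, roughly, through the single-letter state field of /proc/<pid>/stat. The following is a minimal sketch, assuming the standard proc(5) layout. The helper name thread_state and the mapping onto the book's three names are my own simplification: Linux reports both on-CPU and runnable threads as R, so the two cannot be told apart from this field alone.

```python
import os


def thread_state(stat_line):
    """Map the state field of a /proc/<pid>/stat line to a coarse state name."""
    # The comm field is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split after the LAST closing parenthesis.
    fields_after_comm = stat_line.rsplit(")", 1)[1].split()
    code = fields_after_comm[0]
    if code == "R":
        return "on-proc or runnable"
    if code in ("S", "D"):  # interruptible / uninterruptible sleep
        return "sleep"
    return "other"


if __name__ == "__main__":
    # Only meaningful on Linux; skipped elsewhere.
    if os.path.exists("/proc/self/stat"):
        with open("/proc/self/stat") as f:
            print(thread_state(f.read()))
```

Run as a script, it reports the state of the Python process itself, which is on a CPU at the moment it reads the file.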
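A thread can leave the CPU in two ways:
1) Voluntarily. This happens when the thread blocks on I/O or a lock, or goes to sleep on its own.
2) Involuntarily. If the thread has run longer than the CPU time the scheduler allotted to it, or a higher-priority task preempts a lower-priority one, the scheduler takes it off the CPU so that other threads can run. When a CPU switches from one process or thread to another, it has to swap the memory addressing information and other context; this is called a context switch.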
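The two ways of leaving the CPU show up as separate counters that a process can read about itself. A minimal sketch using the standard resource module (Unix only); the function name context_switches is mine, and the exact counts depend on system load, so no particular output is claimed.

```python
import resource
import time


def context_switches():
    """Return (voluntary, involuntary) context switch counts for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


if __name__ == "__main__":
    vol_before, invol_before = context_switches()
    time.sleep(0.1)                        # blocking: leaves the CPU voluntarily
    sum(i * i for i in range(2_000_000))   # CPU-bound: may be preempted
    vol_after, invol_after = context_switches()
    print("voluntary:  ", vol_after - vol_before)
    print("involuntary:", invol_after - invol_before)
```

The sleep should normally add at least one voluntary switch, while the busy loop only adds involuntary ones if something else competes for the same CPU.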
Run queues:
The number of threads waiting in the run queues is, broadly, a measure of CPU saturation.
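A quick way to get a feel for this without any tracing is the fourth field of /proc/loadavg, which per proc(5) holds the number of currently runnable scheduling entities over the total. A rough sketch; the function names are hypothetical, and this is a single instantaneous sample rather than the per-CPU picture a tool such as runqlen gives.

```python
import os


def runnable_entities(loadavg_text):
    """Return (runnable, total) from the 4th field of /proc/loadavg, e.g. '3/801'."""
    fields = loadavg_text.split()
    runnable, total = fields[3].split("/")
    return int(runnable), int(total)


def looks_saturated(runnable, cpu_count):
    """More runnable entities than CPUs means some of them must be queued."""
    return runnable > cpu_count


if __name__ == "__main__":
    # Only meaningful on Linux; skipped elsewhere.
    if os.path.exists("/proc/loadavg"):
        with open("/proc/loadavg") as f:
            runnable, total = runnable_entities(f.read())
        cpus = os.cpu_count() or 1
        print(f"runnable={runnable} total={total} cpus={cpus} "
              f"saturated={looks_saturated(runnable, cpus)}")
```

The runnable figure includes threads that are on a CPU right now, which is why it is compared against the CPU count rather than against zero.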
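CPU caches
L1, the first-level cache, is usually split into an instruction cache and a data cache.
L2, the second-level cache, bridges the speed gap between L1 and main memory. The CPU looks in L1 first; as processors got faster, L1 alone could no longer keep up, which is why L2 was added. L2 is slower than L1 but larger, and serves mainly as a staging area for data moving between L1 and memory.
L3, the third-level cache, holds data that missed in L2. On CPUs with an L3 cache, only about 5% of data has to be fetched from main memory, which further improves CPU efficiency. The principle is the same at every level: a faster storage device keeps a copy of data read from a slower one, so that later reads and writes of that data can complete on the fast device, which makes the system more responsive.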
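The benefit of a cache hierarchy can be put into numbers with the usual average-access-time model: each level is probed in turn, and only the misses pay for the next level. A small sketch under that assumption. The function name and every latency and hit ratio in the usage section are illustrative values I picked, not measurements from the book or from any specific CPU.

```python
def average_access_time(levels, memory_latency):
    """Average cost of one access.

    levels: list of (latency, hit_ratio) ordered from L1 outward.
    memory_latency: cost of going all the way to main memory.
    """
    # Work from memory back toward L1: the cost seen at each level is its
    # own latency plus the miss fraction times the cost of the level below.
    cost = memory_latency
    for latency, hit_ratio in reversed(levels):
        cost = latency + (1.0 - hit_ratio) * cost
    return cost


if __name__ == "__main__":
    # Hypothetical latencies in cycles and hypothetical hit ratios.
    hierarchy = [(4, 0.90), (12, 0.70), (40, 0.60)]
    print("with L1-L3:", average_access_time(hierarchy, 200))
    print("L1 only:   ", average_access_time(hierarchy[:1], 200))
```

Comparing the two printed values shows how much the outer levels help once L1 misses become frequent.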
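BPF analysis capabilities
Some questions to think about:
1. Which new processes are created, and how long do they run?
2. Why is CPU system time high? Are system calls the cause, and if so, which ones?
3. How long does a thread spend on the CPU each time it is woken up?
4. At its longest, how many threads are waiting in a run queue?
5. Are the run queues balanced across the CPUs?
6. Why does a given thread leave the CPU, and for how long?
7. Which soft and hard interrupts are consuming CPU time?
8. Which CPUs sit idle while work is waiting on other run queues?
9. What is the L3 cache hit ratio for application requests?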
Hardware statistics
zhanglei@ubuntu:~$ sudo perf stat -d gzip file1
gzip: file1: No such file or directory
Performance counter stats for 'gzip file1':
1.83 msec task-clock # 0.531 CPUs utilized
3 context-switches # 1.640 K/sec
0 cpu-migrations # 0.000 /sec
62 page-faults # 33.896 K/sec
<not supported> cycles
<not supported> instructions
<not supported> branches
<not supported> branch-misses
0 L1-dcache-loads # 0.000 /sec
0 L1-dcache-load-misses # 0.00% of all L1-dcache accesses
<not supported> LLC-loads
<not supported> LLC-load-misses
0.003442219 seconds time elapsed
0.002568000 seconds user
0.000000000 seconds sys
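The derived figures in the right-hand column can be reproduced from the raw counts. A small sketch, assuming (from the numbers themselves, not from perf's documentation) that the per-second rates are computed against task-clock rather than wall-clock time; the function names are mine. Note that gzip exited immediately because file1 did not exist, so this run only measures process startup and the error path.

```python
def cpus_utilized(task_clock_ms, elapsed_s):
    """Fraction of one CPU kept busy: on-CPU time divided by wall-clock time."""
    return (task_clock_ms / 1000.0) / elapsed_s


def rate_per_sec(count, task_clock_ms):
    """Events per second of on-CPU time."""
    return count / (task_clock_ms / 1000.0)


if __name__ == "__main__":
    # Raw values copied from the perf stat output above.
    task_clock_ms = 1.83
    elapsed_s = 0.003442219
    print(f"CPUs utilized:    {cpus_utilized(task_clock_ms, elapsed_s):.3f}")
    print(f"context-switches: {rate_per_sec(3, task_clock_ms) / 1000:.3f} K/sec")
    print(f"page-faults:      {rate_per_sec(62, task_clock_ms) / 1000:.3f} K/sec")
```

The results differ from perf's own column in the last digits because the printed task-clock value is rounded to 1.83 msec.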
perf list
The perf list command prints the events available on the current processor, including its PMCs (Performance Monitoring Counters).
zhanglei@ubuntu:~$ sudo perf list
List of pre-defined events (to be used in -e):
ref-cycles [Hardware event]
alignment-faults [Software event]
bpf-output [Software event]
cgroup-switches [Software event]
context-switches OR cs [Software event]
cpu-clock [Software event]
cpu-migrations OR migrations [Software event]
dummy [Software event]
emulation-faults [Software event]
major-faults [Software event]
minor-faults [Software event]
page-faults OR faults [Software event]
task-clock [Software event]
duration_time [Tool event]
L1-dcache-load-misses [Hardware cache event]
L1-dcache-loads [Hardware cache event]
L1-dcache-stores [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
branch-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
dTLB-load-misses [Hardware cache event]
dTLB-loads [Hardware cache event]
:
Counting CPU events with perf
zhanglei@ubuntu:~$ sudo perf stat -e mem_load_retired.l3_hit -e mem_load_retired.l3_miss -a -I 1000
# time counts unit events
1.001626865 0 mem_load_retired.l3_hit
1.001626865 0 mem_load_retired.l3_miss
2.004326782 0 mem_load_retired.l3_hit
2.004326782 0 mem_load_retired.l3_miss
3.006497771 0 mem_load_retired.l3_hit
3.006497771 0 mem_load_retired.l3_miss
4.007369596 0 mem_load_retired.l3_hit
4.007369596 0 mem_load_retired.l3_miss
5.009261293 0 mem_load_retired.l3_hit
5.009261293 0 mem_load_retired.l3_miss
6.011483930 0 mem_load_retired.l3_hit
6.011483930 0 mem_load_retired.l3_miss
7.013964688 0 mem_load_retired.l3_hit
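The point of collecting these two counters is to turn them into an L3 hit ratio, hits / (hits + misses). A minimal sketch that does this for interval output shaped like the lines above (time, count, event name). The function name is hypothetical and the parsing assumes this whitespace-separated layout. Every count in the capture above is 0, which may mean the machine does not expose these PMCs (the earlier <not supported> entries point the same way), so the sketch returns None rather than dividing by zero.

```python
def l3_hit_ratio(perf_lines):
    """Return hits / (hits + misses), or None if nothing was counted."""
    hits = misses = 0
    for line in perf_lines:
        parts = line.split()
        if len(parts) < 3 or line.lstrip().startswith("#"):
            continue  # header or incomplete line
        count_text = parts[1].replace(",", "")  # perf may group digits
        if not count_text.isdigit():
            continue  # e.g. "<not counted>"
        event = parts[-1]
        if event.endswith("l3_hit"):
            hits += int(count_text)
        elif event.endswith("l3_miss"):
            misses += int(count_text)
    total = hits + misses
    return hits / total if total else None


if __name__ == "__main__":
    captured = [
        "# time counts unit events",
        "1.001626865 0 mem_load_retired.l3_hit",
        "1.001626865 0 mem_load_retired.l3_miss",
    ]
    print(l3_hit_ratio(captured))  # nothing counted in the capture above
```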
CPU analysis tools
runqlat
zhanglei@ubuntu:~$ sudo runqlat-bpfcc 10 1
[sudo] password for zhanglei:
In file included from <built-in>:2:
In file included from /virtual/include/bcc/bpf.h:12:
In file included from include/linux/types.h:6:
In file included from include/uapi/linux/types.h:14:
In file included from ./include/uapi/linux/posix_types.h:5:
In file included from include/linux/stddef.h:5:
In file included from include/uapi/linux/stddef.h:2:
In file included from include/linux/compiler_types.h:80:
include/linux/compiler-clang.h:41:9: warning: '__HAVE_BUILTIN_BSWAP32__' macro redefined [-Wmacro-redefined]
#define __HAVE_BUILTIN_BSWAP32__
^
<command line>:4:9: note: previous definition is here
#define __HAVE_BUILTIN_BSWAP32__ 1
^
In file included from <built-in>:2:
In file included from /virtual/include/bcc/bpf.h:12:
In file included from include/linux/types.h:6:
In file included from include/uapi/linux/types.h:14:
In file included from ./include/uapi/linux/posix_types.h:5:
In file included from include/linux/stddef.h:5:
In file included from include/uapi/linux/stddef.h:2:
In file included from include/linux/compiler_types.h:80:
include/linux/compiler-clang.h:42:9: warning: '__HAVE_BUILTIN_BSWAP64__' macro redefined [-Wmacro-redefined]
#define __HAVE_BUILTIN_BSWAP64__
^
<command line>:5:9: note: previous definition is here
#define __HAVE_BUILTIN_BSWAP64__ 1
^
In file included from <built-in>:2:
In file included from /virtual/include/bcc/bpf.h:12:
In file included from include/linux/types.h:6:
In file included from include/uapi/linux/types.h:14:
In file included from ./include/uapi/linux/posix_types.h:5:
In file included from include/linux/stddef.h:5:
In file included from include/uapi/linux/stddef.h:2:
In file included from include/linux/compiler_types.h:80:
include/linux/compiler-clang.h:43:9: warning: '__HAVE_BUILTIN_BSWAP16__' macro redefined [-Wmacro-redefined]
#define __HAVE_BUILTIN_BSWAP16__
^
<command line>:3:9: note: previous definition is here
#define __HAVE_BUILTIN_BSWAP16__ 1
^
3 warnings generated.
Tracing run queue latency... Hit Ctrl-C to end.
usecs : count distribution
0 -> 1 : 20 |** |
2 -> 3 : 69 |******** |
4 -> 7 : 74 |******** |
8 -> 15 : 60 |******* |
16 -> 31 : 86 |********** |
32 -> 63 : 144 |***************** |
64 -> 127 : 334 |****************************************|
128 -> 255 : 111 |************* |
256 -> 511 : 31 |*** |
512 -> 1023 : 7 | |
1024 -> 2047 : 1 | |
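The histogram shows that most of the run queue latencies (the time a thread spent waiting for its turn on a CPU) were under 255 microseconds.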
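To put a number on "most", the histogram can be parsed and summed. A short sketch, assuming the row layout shown above (low -> high : count); the helper names are mine, and the data is copied from the runqlat output in these notes.

```python
import re

RUNQLAT_OUTPUT = """\
usecs : count distribution
0 -> 1 : 20 |** |
2 -> 3 : 69 |******** |
4 -> 7 : 74 |******** |
8 -> 15 : 60 |******* |
16 -> 31 : 86 |********** |
32 -> 63 : 144 |***************** |
64 -> 127 : 334 |****************************************|
128 -> 255 : 111 |************* |
256 -> 511 : 31 |*** |
512 -> 1023 : 7 | |
1024 -> 2047 : 1 | |
"""

ROW = re.compile(r"^\s*(\d+)\s*->\s*(\d+)\s*:\s*(\d+)")


def parse_histogram(text):
    """Return a list of (low, high, count) tuples, one per histogram row."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append(tuple(int(g) for g in match.groups()))
    return rows


def fraction_at_or_below(rows, limit_usecs):
    """Share of samples in buckets whose upper bound is <= limit_usecs."""
    total = sum(count for _, _, count in rows)
    below = sum(count for _, high, count in rows if high <= limit_usecs)
    return below / total if total else 0.0


if __name__ == "__main__":
    rows = parse_histogram(RUNQLAT_OUTPUT)
    share = fraction_at_or_below(rows, 255)
    print(f"{share:.1%} of {sum(c for _, _, c in rows)} samples waited <= 255 us")
```

For this capture, 898 of the 937 samples fall in buckets up to 255 microseconds, which is about 95.8%.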