perf Performance Analysis (5) -- Introduction to the Linux perf Tool

1. Introduction to perf

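perf and its subcommands measure and record system performance, and the range of data they can record is wide. It includes hardware data from the CPU/PMU as well as data collected by the kernel, such as software counters and tracepoints. The main categories worth watching are: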

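  • CPU / PMU (Performance Monitoring Unit) data, including dTLB, iTLB, and cache access counts and miss counts, as well as branch and branch-miss counts;
  • memory latency and stalls;
  • bus latency and stalls;
  • front-end / back-end stalls;
  • virtual memory: TLB-related data;
  • pipeline-related data.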

To view the help for perf and its subcommands:

man perf

man perf-top

man perf-stat

man perf-record

man perf-report

To list all perf subcommands:

$ perf

 usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]

 The most commonly used perf commands are:
   annotate        Read perf.data (created by perf record) and display annotated code
   archive         Create archive with object files with build-ids found in perf.data file
   bench           General framework for benchmark suites
   buildid-cache   Manage build-id cache.
   buildid-list    List the buildids in a perf.data file
   c2c             Shared Data C2C/HITM Analyzer.
   config          Get and set variables in a configuration file.
   daemon          Run record sessions on background
   data            Data file related processing
   diff            Read perf.data files and display the differential profile
   evlist          List the event names in a perf.data file
   ftrace          simple wrapper for kernel's ftrace functionality
   inject          Filter to augment the events stream with additional information
   iostat          Show I/O performance metrics
   kallsyms        Searches running kernel for symbols
   kvm             Tool to trace/measure kvm guest os
   list            List all symbolic event types
   mem             Profile memory accesses
   record          Run a command and record its profile into perf.data
   report          Read perf.data (created by perf record) and display the profile
   script          Read perf.data (created by perf record) and display trace output
   stat            Run a command and gather performance counter statistics
   test            Runs sanity tests.
   top             System profiling tool.
   version         display the version of perf binary
   probe           Define new dynamic tracepoints

 See 'perf help COMMAND' for more information on a specific command.

To view the help for a specific subcommand, for example perf stat:

$ perf stat -h

 Usage: perf stat [<options>] [<command>]

    -a, --all-cpus        system-wide collection from all CPUs
    -A, --no-aggr         disable aggregation across CPUs or PMUs
    -B, --big-num         print large numbers with thousands' separators
    -C, --cpu <cpu>       list of cpus to monitor in system-wide
    -D, --delay <n>       ms to wait before starting measurement after program start (-1: start with events disabled)
    # ......

2. Events

The performance data items that perf records are called events. They fall mainly into two groups: software events and hardware events.

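Software events include, for example, context-switches and minor-faults. Hardware events mainly record micro-architecture performance data and are provided by the CPU/PMU. If the hardware does not provide a given counter, the corresponding event is unavailable.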

Use perf list to see the events that perf supports:

$ perf list

List of pre-defined events (to be used in -e or -M):

  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  bus-cycles                                         [Hardware event]
  cache-misses                                       [Hardware event]
  cache-references                                   [Hardware event]
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]
  ref-cycles                                         [Hardware event]
  alignment-faults                                   [Software event]
  bpf-output                                         [Software event]
  cgroup-switches                                    [Software event]
  context-switches OR cs                             [Software event]
  cpu-clock                                          [Software event]
  cpu-migrations OR migrations                       [Software event]
  dummy                                              [Software event]
  emulation-faults                                   [Software event]
  major-faults                                       [Software event]
  minor-faults                                       [Software event]
  page-faults OR faults                              [Software event]
  task-clock                                         [Software event]
  duration_time                                      [Tool event]
  user_time                                          [Tool event]
  system_time                                        [Tool event]

cpu:
  L1-dcache-loads OR cpu/L1-dcache-loads/
  L1-dcache-load-misses OR cpu/L1-dcache-load-misses/
  L1-dcache-stores OR cpu/L1-dcache-stores/
  L1-icache-load-misses OR cpu/L1-icache-load-misses/
  LLC-loads OR cpu/LLC-loads/
  LLC-load-misses OR cpu/LLC-load-misses/
  LLC-stores OR cpu/LLC-stores/
  LLC-store-misses OR cpu/LLC-store-misses/
  dTLB-loads OR cpu/dTLB-loads/
  dTLB-load-misses OR cpu/dTLB-load-misses/
  dTLB-stores OR cpu/dTLB-stores/
  dTLB-store-misses OR cpu/dTLB-store-misses/
  iTLB-loads OR cpu/iTLB-loads/
  iTLB-load-misses OR cpu/iTLB-load-misses/
  branch-loads OR cpu/branch-loads/
  branch-load-misses OR cpu/branch-load-misses/
  node-loads OR cpu/node-loads/
  node-load-misses OR cpu/node-load-misses/
  node-stores OR cpu/node-stores/
  node-store-misses OR cpu/node-store-misses/
  branch-instructions OR cpu/branch-instructions/    [Kernel PMU event]
  branch-misses OR cpu/branch-misses/                [Kernel PMU event]
  bus-cycles OR cpu/bus-cycles/                      [Kernel PMU event]
  cache-misses OR cpu/cache-misses/                  [Kernel PMU event]
  cache-references OR cpu/cache-references/          [Kernel PMU event]
  cpu-cycles OR cpu/cpu-cycles/                      [Kernel PMU event]
  instructions OR cpu/instructions/                  [Kernel PMU event]
  mem-loads OR cpu/mem-loads/                        [Kernel PMU event]
  mem-stores OR cpu/mem-stores/                      [Kernel PMU event]
  # ......
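
Each entry in the listing above has a regular shape: one or more interchangeable names joined by OR, optionally followed by a bracketed event type. As a rough illustration of how that layout can be read programmatically, here is a minimal Python sketch. The helper name parse_event_line is hypothetical and not part of perf, and the sample lines are copied from the output above.

```python
import re

# Hypothetical helper, not part of perf: split one line of `perf list`
# output into its event names and its optional bracketed type.
_TYPE_RE = re.compile(r"\[(?P<type>[^\]]+)\]\s*$")


def parse_event_line(line):
    text = line.strip()
    # Skip blank lines, elisions ("# ......") and headers such as "cpu:".
    if not text or text.startswith("#") or text.endswith(":"):
        return None
    match = _TYPE_RE.search(text)
    event_type = match.group("type") if match else None
    if match:
        text = text[:match.start()].strip()
    names = [name.strip() for name in text.split(" OR ")]
    return names, event_type


sample = """
  cpu-cycles OR cycles                               [Hardware event]
  context-switches OR cs                             [Software event]
  L1-dcache-load-misses OR cpu/L1-dcache-load-misses/
"""

for line in sample.splitlines():
    parsed = parse_event_line(line)
    if parsed is not None:
        print(parsed)
```

Any of the names returned for an entry can be passed to the -e option of perf stat or perf record.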

3. Using perf stat to record performance data and print it to the terminal

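The perf stat command runs the program under test and, once the program exits, prints the count of each event. Alternatively, the -p option attaches to an already-running program by its process ID.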

$ perf stat ls -lh /home/hxf0223/

 Performance counter stats for 'ls -lh /home/hxf0223/':

              1.65 msec task-clock                       #    0.757 CPUs utilized
                 0      context-switches                 #    0.000 /sec
                 0      cpu-migrations                   #    0.000 /sec
               112      page-faults                      #   67.878 K/sec
         4,892,473      cycles                           #    2.965 GHz
         3,714,068      instructions                     #    0.76  insn per cycle
           681,289      branches                         #  412.897 M/sec
            20,522      branch-misses                    #    3.01% of all branches

       0.002179855 seconds time elapsed

       0.000000000 seconds user
       0.002198000 seconds sys
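
The values after the # in the output above are derived from the raw counts on the left. A minimal Python sketch that reproduces three of them from the numbers printed above follows. The function names are illustrative, and because task-clock is shown rounded to 1.65 msec, the results only match to the displayed precision.

```python
# Raw values copied from the perf stat output above.
task_clock_ms = 1.65
elapsed_s = 0.002179855
cycles = 4_892_473
instructions = 3_714_068
branches = 681_289
branch_misses = 20_522


def ipc(instructions, cycles):
    """Instructions per CPU cycle ("insn per cycle")."""
    return instructions / cycles


def miss_rate_percent(misses, total):
    """Misses as a percentage of all events of that kind."""
    return 100.0 * misses / total


def cpus_utilized(task_clock_ms, elapsed_s):
    """CPU time consumed divided by wall-clock time."""
    return (task_clock_ms / 1000.0) / elapsed_s


print(f"{ipc(instructions, cycles):.2f} insn per cycle")                      # 0.76
print(f"{miss_rate_percent(branch_misses, branches):.2f}% of all branches")  # 3.01%
print(f"{cpus_utilized(task_clock_ms, elapsed_s):.3f} CPUs utilized")        # 0.757
```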

You can specify which events to record (the -e option) and how many times to repeat the run (the -r option):

$ perf stat -r 6 -e cache-misses ls -lh ~/

 Performance counter stats for 'ls -lh /home/hxf0223' (6 runs):

            20,763      cache-misses                                                            ( +- 14.14% )

         0.0015583 +- 0.0000369 seconds time elapsed  ( +-  2.37% )
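
With -r, each line ends with a +- percentage. For the elapsed-time line above, that percentage can be checked directly: it is the +- value expressed relative to the mean. A small Python sketch of the arithmetic, with an illustrative function name:

```python
def relative_deviation_percent(mean, deviation):
    """Express a +- deviation as a percentage of the mean."""
    return 100.0 * deviation / mean


# Values copied from the "seconds time elapsed" line above.
mean_elapsed_s = 0.0015583
deviation_s = 0.0000369

print(f"+- {relative_deviation_percent(mean_elapsed_s, deviation_s):.2f}%")  # +- 2.37%
```

A large percentage, such as the 14.14% on cache-misses above, suggests that the count varies noticeably from run to run.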

4. Recording performance data to a file and analyzing it: perf record, perf report, perf annotate

4.1 perf record: recording performance data to a file

The perf record command runs the program under test and writes the measurements to the perf.data file.

$ perf record ./test_grain_size

./test_grain_size. Process ID: 6226
Duration: 232.74 seconds
Sum: 1.25e+07. loop num: 10000
[ perf record: Woken up 934 times to write data ]
[ perf record: Captured and wrote 283.898 MB perf.data (7441535 samples) ]

$ ls -lh perf.data
-rw------- 1 <username> <groupname> 284M Oct 31 21:25 perf.data
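
The summary line printed by perf record gives enough information to estimate how fast perf.data grows. Here is a back-of-the-envelope sketch in Python using the numbers from the run above. It assumes that the MB figure means 1024 * 1024 bytes, it uses the duration reported by the test program itself as an approximation of the recording time, and the variable names are illustrative.

```python
# Figures copied from the perf record run above.
duration_s = 232.74   # "Duration" printed by the test program itself
samples = 7_441_535   # sample count reported by perf record
data_mb = 283.898     # perf.data size reported by perf record

# Assumption: "MB" here means 1024 * 1024 bytes.
data_bytes = data_mb * 1024 * 1024

samples_per_second = samples / duration_s
bytes_per_sample = data_bytes / samples

print(f"{samples_per_second:,.0f} samples/s")
print(f"{bytes_per_sample:.1f} bytes/sample")
```

If the file grows too large for a long-running program, the -F option of perf record sets a lower sampling frequency.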

4.2 perf report: viewing the performance data

The perf report command displays the data stored in perf.data. Run without arguments, it opens the perf.data file in the current directory:

perf report

4.3 perf annotate: showing source-level performance data

perf annotate -i perf.data

If the program was compiled with -ggdb, perf annotate can show the performance data at source level.

This article is licensed by the author under CC BY 4.0.