# Executable commands reference

[TOC]

## How simpleperf works

Modern CPUs have a hardware component called the performance monitoring unit (PMU). The PMU has
several hardware counters, counting events like how many CPU cycles have happened, how many
instructions have executed, or how many cache misses have happened.

The Linux kernel wraps these hardware counters into hardware perf events. In addition, the Linux
kernel provides hardware-independent software events and tracepoint events. The Linux kernel
exposes all events to userspace via the perf_event_open system call, which simpleperf uses.

Simpleperf has three main commands: stat, record and report.

The stat command gives a summary of how many events have happened in the profiled processes in a
time period. Here’s how it works:

1. Given user options, simpleperf enables profiling by making a system call to the kernel.
2. The kernel enables counters while the profiled processes are running.
3. After profiling, simpleperf reads counters from the kernel, and reports a counter summary.

The record command records samples of the profiled processes in a time period. Here’s how it works:

1. Given user options, simpleperf enables profiling by making a system call to the kernel.
2. Simpleperf creates mapped buffers between simpleperf and the kernel.
3. The kernel enables counters while the profiled processes are running.
4. Each time a given number of events has happened, the kernel dumps a sample to the mapped buffers.
5. Simpleperf reads samples from the mapped buffers and stores profiling data in a file called
   perf.data.

The report command reads perf.data and any shared libraries used by the profiled processes,
and outputs a report showing where the time was spent.

## Commands

Simpleperf supports several commands, listed below:

```
The debug-unwind command: debug/test dwarf based offline unwinding, used for debugging simpleperf.
The dump command: dumps content in perf.data, used for debugging simpleperf.
The help command: prints help information for other commands.
The kmem command: collects kernel memory allocation information (will be replaced by Python scripts).
The list command: lists all event types supported on the Android device.
The record command: profiles processes and stores profiling data in perf.data.
The report command: reports profiling data in perf.data.
The report-sample command: reports each sample in perf.data, used for supporting integration of
                           simpleperf in Android Studio.
The stat command: profiles processes and prints counter summary.
```

Each command supports different options, which can be seen through its help message.

```sh
# List all commands.
$ simpleperf --help

# Print help message for record command.
$ simpleperf record --help
```

The most frequently used commands, list, stat, record and report, are described below.

## The list command

The list command lists all events available on the device. Different devices may support different
events because they have different hardware and kernels.

```sh
$ simpleperf list
List of hw-cache events:
  branch-loads
...
List of hardware events:
  cpu-cycles
  instructions
...
List of software events:
  cpu-clock
  task-clock
...
```

On ARM/ARM64, the list command also shows a list of raw events, which are the events supported by
the ARM PMU on the device. The kernel has wrapped some of them into hardware events and hw-cache
events. For example, raw-cpu-cycles is wrapped into cpu-cycles, and raw-instruction-retired is
wrapped into instructions. The raw events are provided in case we want to use events that are
supported on the device but not wrapped by the kernel.
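
Before passing an event to -e, it can help to check that the device actually lists it. Below is a minimal sketch of a hypothetical helper, `event_listed`, that searches the text printed by `simpleperf list` for an exact event name. It assumes the output format shown above, where each event line starts with the event name.

```sh
#!/bin/sh
# event_listed NAME: read `simpleperf list` output on stdin and succeed if
# NAME is the first word of any line (hypothetical helper).
event_listed() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Example input in the format shown above.
list_output='List of hardware events:
  cpu-cycles
  instructions
List of software events:
  cpu-clock
  task-clock'

if printf '%s\n' "$list_output" | event_listed cpu-cycles; then
  echo "cpu-cycles is available"
fi

# Hypothetical usage on a device:
#   simpleperf list | event_listed raw-cpu-cycles && echo "raw-cpu-cycles is available"
```

Matching on the first word (rather than a substring) avoids false positives such as `cpu-cycles` matching `raw-cpu-cycles`.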

## The stat command

The stat command is used to get event counter values of the profiled processes. By passing options,
we can select which events to use, which processes/threads to monitor, how long to monitor and the
print interval.

```sh
# Stat using default events (cpu-cycles,instructions,...), and monitor process 7394 for 10 seconds.
$ simpleperf stat -p 7394 --duration 10
Performance counter statistics:

 1,320,496,145  cpu-cycles         # 0.131736 GHz                     (100%)
   510,426,028  instructions       # 2.587047 cycles per instruction  (100%)
     4,692,338  branch-misses      # 468.118 K/sec                    (100%)
886.008130(ms)  task-clock         # 0.088390 cpus used               (100%)
           753  context-switches   # 75.121 /sec                      (100%)
           870  page-faults        # 86.793 /sec                      (100%)

Total test time: 10.023829 seconds.
```
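
The values after `#` are derived from the raw counts. For instance, the cycles-per-instruction value is cpu-cycles divided by instructions (1,320,496,145 / 510,426,028, about 2.587047). The sketch below recomputes it from captured stat text with awk. The helper name `cpi_from_stat` is hypothetical, and it assumes the column layout shown above (count first, event name second).

```sh
#!/bin/sh
# cpi_from_stat TEXT: print cpu-cycles / instructions from `simpleperf stat`
# output (hypothetical helper; assumes "count event_name ..." columns).
cpi_from_stat() {
  printf '%s\n' "$1" | awk '
    # Drop thousands separators, then remember each count by event name.
    { gsub(",", "", $1); count[$2] = $1 }
    END {
      if (count["instructions"] == 0) exit 1
      printf "%.6f\n", count["cpu-cycles"] / count["instructions"]
    }'
}

# Two rows copied from the example output above.
stat_output=' 1,320,496,145  cpu-cycles         # 0.131736 GHz                     (100%)
   510,426,028  instructions       # 2.587047 cycles per instruction  (100%)'

cpi_from_stat "$stat_output"
```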

### Select events to stat

We can select which events to use via -e.

```sh
# Stat event cpu-cycles.
$ simpleperf stat -e cpu-cycles -p 11904 --duration 10

# Stat event cache-references and cache-misses.
$ simpleperf stat -e cache-references,cache-misses -p 11904 --duration 10
```

When running the stat command, if the number of hardware events is larger than the number of
hardware counters available in the PMU, the kernel shares hardware counters between events, so each
event is only monitored for part of the total time. In the example below, there is a percentage at
the end of each row, showing the percentage of the total time that each event was actually
monitored.

```sh
# Stat using event cache-references, cache-references:u,....
$ simpleperf stat -p 7394 -e cache-references,cache-references:u,cache-references:k \
      -e cache-misses,cache-misses:u,cache-misses:k,instructions --duration 1
Performance counter statistics:

4,331,018  cache-references     # 4.861 M/sec    (87%)
3,064,089  cache-references:u   # 3.439 M/sec    (87%)
1,364,959  cache-references:k   # 1.532 M/sec    (87%)
   91,721  cache-misses         # 102.918 K/sec  (87%)
   45,735  cache-misses:u       # 51.327 K/sec   (87%)
   38,447  cache-misses:k       # 43.131 K/sec   (87%)
9,688,515  instructions         # 10.561 M/sec   (89%)

Total test time: 1.026802 seconds.
```

In the example above, each event is monitored about 87% of the total time. But there is no
guarantee that any pair of events is always monitored at the same time. If we want some events to
be monitored at the same time, we can use --group.

```sh
# Stat using event cache-references, cache-references:u,....
$ simpleperf stat -p 7964 --group cache-references,cache-misses \
      --group cache-references:u,cache-misses:u --group cache-references:k,cache-misses:k \
      -e instructions --duration 1
Performance counter statistics:

3,638,900  cache-references     # 4.786 M/sec          (74%)
   65,171  cache-misses         # 1.790953% miss rate  (74%)
2,390,433  cache-references:u   # 3.153 M/sec          (74%)
   32,280  cache-misses:u       # 1.350383% miss rate  (74%)
  879,035  cache-references:k   # 1.251 M/sec          (68%)
   30,303  cache-misses:k       # 3.447303% miss rate  (68%)
8,921,161  instructions         # 10.070 M/sec         (86%)

Total test time: 1.029843 seconds.
```
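
The miss-rate comments in the grouped output are cache-misses divided by cache-references. That ratio is only meaningful when both counters cover the same time intervals, which is what --group guarantees. The sketch below uses a hypothetical helper, `miss_rate`, to reproduce the first row's value from the two raw counts.

```sh
#!/bin/sh
# miss_rate MISSES REFERENCES: print misses / references as a percentage.
# Counts may contain thousands separators, as in simpleperf's output.
miss_rate() {
  awk -v m="$1" -v r="$2" 'BEGIN {
    gsub(",", "", m); gsub(",", "", r)
    printf "%.6f%%\n", 100 * m / r
  }'
}

# cache-misses and cache-references from the --group example above.
miss_rate 65,171 3,638,900
```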

### Select target to stat

We can select which processes or threads to monitor via -p or -t. Monitoring a
process is the same as monitoring all threads in the process. Simpleperf can also fork a child
process to run the new command and then monitor the child process.

```sh
# Stat processes 11904 and 11905.
$ simpleperf stat -p 11904,11905 --duration 10

# Stat threads 11904 and 11905.
$ simpleperf stat -t 11904,11905 --duration 10

# Start a child process running `ls`, and stat it.
$ simpleperf stat ls

# Stat the process of an Android application. This only works for debuggable apps on non-rooted
# devices.
$ simpleperf stat --app simpleperf.example.cpp

# Stat system wide using -a.
$ simpleperf stat -a --duration 10
```
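
-p and -t take a comma-separated list. When the ids come from a tool that prints them space-separated, such as `pidof` (assumed to be available in the device shell), they need to be joined first. A minimal sketch with a hypothetical `join_ids` helper:

```sh
#!/bin/sh
# join_ids ID...: print the arguments joined by commas, the form that
# -p and -t take (hypothetical helper).
join_ids() {
  # Run in a subshell so the IFS change does not leak to the caller.
  (IFS=,; printf '%s\n' "$*")
}

join_ids 11904 11905

# Hypothetical usage on a device; my_native_daemon is a made-up process name:
#   simpleperf stat -p "$(join_ids $(pidof my_native_daemon))" --duration 10
```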

### Decide how long to stat

When monitoring existing threads, we can use --duration to decide how long to monitor. When
monitoring a child process running a new command, simpleperf monitors until the child process ends.
In either case, we can use Ctrl-C to stop monitoring at any time.

```sh
# Stat process 11904 for 10 seconds.
$ simpleperf stat -p 11904 --duration 10

# Stat until the child process running `ls` finishes.
$ simpleperf stat ls

# Stop monitoring using Ctrl-C.
$ simpleperf stat -p 11904 --duration 10
^C
```

If you want to write a script to control how long to monitor, you can send SIGINT, SIGTERM or
SIGHUP to simpleperf to stop monitoring.
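
A minimal sketch of such a script, assuming simpleperf is started in the background from the same shell. The helper `stop_after` is hypothetical. It defaults to SIGTERM rather than SIGINT because a POSIX shell without job control starts background commands with SIGINT ignored, so SIGINT might not reach a backgrounded simpleperf.

```sh
#!/bin/sh
# stop_after SECONDS PID [SIGNAL]: wait SECONDS, send SIGNAL (default TERM)
# to PID, then reap it if it is a child of this shell (hypothetical helper).
stop_after() {
  sleep "$1"
  kill -s "${3:-TERM}" "$2"
  # wait reports the signal in its exit status; that is expected here.
  wait "$2" 2>/dev/null || true
}

# Hypothetical usage on a device:
#   simpleperf stat -p 11904 &
#   stop_after 5 $!
```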

### Decide the print interval

When monitoring perf counters, we can also use --interval to decide the print interval.

```sh
# Print stat for process 11904 every 300ms.
$ simpleperf stat -p 11904 --duration 10 --interval 300

# Print system wide stat at interval of 300ms for 10 seconds. Note that system wide profiling needs
# root privilege.
$ su 0 simpleperf stat -a --duration 10 --interval 300
```

### Display counters in systrace

Simpleperf can also work with systrace to dump counters in the collected trace. Below is an example
of a system wide stat.

```sh
# Capture instructions (kernel only) and cache misses with interval of 300 milliseconds for 15
# seconds.
$ su 0 simpleperf stat -e instructions:k,cache-misses -a --interval 300 --duration 15
# On the host, launch systrace to collect a trace for 10 seconds.
(HOST)$ external/chromium-trace/systrace.py --time=10 -o new.html sched gfx view
# Open the collected new.html in a browser, and the perf counters will be shown in it.
```

### Show event count per thread

By default, the stat command outputs an event count sum for all monitored targets. But when the
`--per-thread` option is used, the stat command outputs an event count for each thread in the
monitored targets. This can be used to find busy threads in a process or system wide. With the
`--per-thread` option, the stat command opens a perf event file for each existing thread. If a
monitored thread creates new threads, event counts for the new threads are added to the monitored
thread by default, or omitted if the `--no-inherit` option is also used.

```sh
# Print event counts for each thread in process 11904. Event counts for threads created after
# the stat command starts will be added to the threads creating them.
$ simpleperf stat --per-thread -p 11904 --duration 1

# Print event counts for all threads running in the system every 1s. Threads not running will not
# be reported.
$ su 0 simpleperf stat --per-thread -a --interval 1000 --interval-only-values

# Print event counts for all threads running in the system every 1s. Event counts for threads
# created after the stat command starts will be omitted.
$ su 0 simpleperf stat --per-thread -a --interval 1000 --interval-only-values --no-inherit
```

### Show event count per core

By default, the stat command outputs an event count sum for all monitored CPU cores. But when the
`--per-core` option is used, the stat command outputs an event count for each core. This can be
used to see how events are distributed on different cores.

When running stat on selected processes or threads (not system wide) with the `--per-core` option,
simpleperf creates a perf event for each monitored thread on each core. When a thread is running,
perf events on all cores are enabled, but only the perf event on the core running the thread is in
the running state. So the percentage comment shows runtime_on_a_core / runtime_on_all_cores. Note
that the percentage is still affected by hardware counter multiplexing. Check the simpleperf log
output for ways to tell the two effects apart.

```sh
# Print event counts for each cpu running threads in process 11904.
# A percentage shows runtime_on_a_cpu / runtime_on_all_cpus.
$ simpleperf stat --per-core -p 11904 --duration 1
Performance counter statistics:

# cpu       count  event_name   # percentage = event_run_time / enabled_time
  7    56,552,838  cpu-cycles   #   (60%)
  3    25,958,605  cpu-cycles   #   (20%)
  0    22,822,698  cpu-cycles   #   (15%)
  1     6,661,495  cpu-cycles   #   (5%)
  4     1,519,093  cpu-cycles   #   (0%)

Total test time: 1.001082 seconds.

# Print event counts for each cpu system wide.
$ su 0 simpleperf stat --per-core -a --duration 1

# Print cpu-cycle event counts for each cpu for each thread running in the system.
$ su 0 simpleperf stat -e cpu-cycles -a --per-thread --per-core --duration 1
```
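
To see how the events themselves are distributed, the per-core rows can be summed and each core's share of the total computed. The sketch below does this with awk on rows copied from the example above. The helper `core_shares` is hypothetical and assumes the `cpu count event_name` column layout shown there. Note that this share of counted events is a different quantity from the runtime percentage in the comment column.

```sh
#!/bin/sh
# core_shares: read per-core stat rows ("cpu count event_name ...") on stdin,
# print each cpu's share of the summed count, then the total (hypothetical helper).
core_shares() {
  awk '
    # Only data rows start with a numeric cpu id; the header starts with "#".
    $1 ~ /^[0-9]+$/ {
      gsub(",", "", $2)
      cpu[NR] = $1; cnt[NR] = $2; total += $2
    }
    END {
      for (i = 1; i <= NR; i++)
        if (i in cpu) printf "cpu %s: %.1f%%\n", cpu[i], 100 * cnt[i] / total
      printf "total: %d\n", total
    }'
}

rows='  7    56,552,838  cpu-cycles   #   (60%)
  3    25,958,605  cpu-cycles   #   (20%)
  0    22,822,698  cpu-cycles   #   (15%)
  1     6,661,495  cpu-cycles   #   (5%)
  4     1,519,093  cpu-cycles   #   (0%)'

printf '%s\n' "$rows" | core_shares
```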

## The record command

The record command is used to dump samples of the profiled processes. Each sample can contain
information like the time at which the sample was generated, the number of events since the last
sample, the program counter of a thread, and the call chain of a thread.

By passing options, we can select which events to use, which processes/threads to monitor,
what frequency to dump samples, how long to monitor, and where to store samples.

```sh
# Record on process 7394 for 10 seconds, using default event (cpu-cycles), using default sample
# frequency (4000 samples per second), writing records to perf.data.
$ simpleperf record -p 7394 --duration 10
simpleperf I cmd_record.cpp:316] Samples recorded: 21430. Samples lost: 0.
```

### Select events to record

By default, the cpu-cycles event is used to evaluate consumed CPU cycles. But we can also use other
events via -e.

```sh
# Record using event instructions.
$ simpleperf record -e instructions -p 11904 --duration 10

# Record using task-clock, which shows the passed CPU time in nanoseconds.
$ simpleperf record -e task-clock -p 11904 --duration 10
```

### Select target to record

Targets are selected in the record command the same way as in the stat command.

```sh
# Record processes 11904 and 11905.
$ simpleperf record -p 11904,11905 --duration 10

# Record threads 11904 and 11905.
$ simpleperf record -t 11904,11905 --duration 10

# Record a child process running `ls`.
$ simpleperf record ls

# Record the process of an Android application. This only works for debuggable apps on non-rooted
# devices.
$ simpleperf record --app simpleperf.example.cpp

# Record system wide.
$ simpleperf record -a --duration 10
```

### Set the frequency to record

We can set the frequency to dump records via -f or -c. For example, -f 4000 means
dumping approximately 4000 records every second when the monitored thread runs. If a monitored
thread runs 0.2s in one second (it can be preempted or blocked at other times), simpleperf dumps
about 4000 * 0.2 / 1.0 = 800 records every second. Another way is using -c. For example, -c 10000
means dumping one record whenever 10000 events happen.

```sh
# Record with sample frequency 1000: sample 1000 times every second of running time.
$ simpleperf record -f 1000 -p 11904,11905 --duration 10

# Record with sample period 100000: sample 1 time every 100000 events.
$ simpleperf record -c 100000 -t 11904,11905 --duration 10
```
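
The arithmetic for -f and -c can be written down directly. This sketch uses two hypothetical helpers: `samples_per_sec_f` estimates records per wall-clock second for -f given the fraction of each second the thread runs, and `samples_for_c` estimates the records produced by -c for a given number of events. Both give approximations, since -f only targets the requested rate approximately.

```sh
#!/bin/sh
# samples_per_sec_f FREQ RUN_FRACTION: approximate records per second with
# -f FREQ when the thread runs RUN_FRACTION of each second (hypothetical).
samples_per_sec_f() {
  awk -v f="$1" -v r="$2" 'BEGIN { printf "%.0f\n", f * r }'
}

# samples_for_c EVENTS PERIOD: approximate records with -c PERIOD after
# EVENTS events have happened, using integer division (hypothetical).
samples_for_c() {
  echo $(( $1 / $2 ))
}

samples_per_sec_f 4000 0.2     # the -f 4000 example from the text
samples_for_c 1000000 10000    # -c 10000 over one million events
```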

To avoid spending too much time generating samples, kernel >= 3.10 sets the max percentage of CPU
time used for generating samples (default is 25%), and decreases the max allowed sample frequency
when hitting that limit. Simpleperf uses the --cpu-percent option to adjust it, but this needs
either root privilege or Android >= Q.

```sh
# Record with sample frequency 10000, with max allowed cpu percent to be 50%.
$ simpleperf record -f 10000 -p 11904,11905 --duration 10 --cpu-percent 50
```
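
On Linux, these two limits are exposed as the sysctl files /proc/sys/kernel/perf_cpu_time_max_percent and /proc/sys/kernel/perf_event_max_sample_rate. The sketch below only reads them; that --cpu-percent adjusts the first file is an assumption based on the description above, not something stated here. The helper `show_perf_limits` is hypothetical, and prints `unavailable` where a file cannot be read (for example, when /proc/sys is restricted).

```sh
#!/bin/sh
# show_perf_limits: print the kernel's sampling limits, or "unavailable" when a
# file cannot be read (hypothetical, read-only helper).
show_perf_limits() {
  for name in perf_cpu_time_max_percent perf_event_max_sample_rate; do
    file="/proc/sys/kernel/$name"
    if [ -r "$file" ]; then
      printf '%s=%s\n' "$name" "$(cat "$file")"
    else
      printf '%s=unavailable\n' "$name"
    fi
  done
}

show_perf_limits
```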

### Decide how long to record

The way to decide how long to monitor in the record command is similar to that in the stat command.

```sh
# Record process 11904 for 10 seconds.
$ simpleperf record -p 11904 --duration 10

# Record until the child process running `ls` finishes.
$ simpleperf record ls

# Stop monitoring using Ctrl-C.
$ simpleperf record -p 11904 --duration 10
^C
```

If you want to write a script to control how long to monitor, you can send SIGINT, SIGTERM or
SIGHUP to simpleperf to stop monitoring.

### Set the path to store profiling data

By default, simpleperf stores profiling data in perf.data in the current directory. But the path
can be changed using -o.

```sh
# Write records to data/perf2.data.
$ simpleperf record -p 11904 -o data/perf2.data --duration 10
```

### Record call graphs

A call graph is a tree showing function call relations. Below is an example.

```
main() {
    FunctionOne();
    FunctionTwo();
}
FunctionOne() {
    FunctionTwo();
    FunctionThree();
}
a call graph:
    main-> FunctionOne
       |    |
       |    |-> FunctionTwo
       |    |-> FunctionThree
       |
       |-> FunctionTwo
```

A call graph shows how a function calls other functions, and a reversed call graph shows how
a function is called by other functions. To show a call graph, we need to first record it, then
report it.

There are two ways to record a call graph: recording a dwarf based call graph, or recording a stack
frame based call graph. Recording dwarf based call graphs needs debug information in native
binaries, while recording stack frame based call graphs needs support of stack frame registers.

```sh
# Record a dwarf based call graph.
$ simpleperf record -p 11904 -g --duration 10

# Record a stack frame based call graph.
$ simpleperf record -p 11904 --call-graph fp --duration 10
```

[Here](README.md#suggestions-about-recording-call-graphs) are some suggestions about recording call graphs.

### Record both on CPU time and off CPU time

Simpleperf is a CPU profiler, which generates samples for a thread only when it is running on a
CPU. But sometimes we want to know where the thread time is spent off-cpu (like being preempted by
other threads, blocked in IO or waiting for some events). To support this, simpleperf added a
--trace-offcpu option to the record command. When --trace-offcpu is used, simpleperf does the
following things:

1) Only the cpu-clock/task-clock event is allowed to be used with --trace-offcpu. This lets
simpleperf generate on-cpu samples for the cpu-clock event.
2) Simpleperf also monitors the sched:sched_switch event, which generates a sched_switch sample
each time the monitored thread is scheduled off cpu.
3) Simpleperf also records context switch records, so it knows when the thread is scheduled back on
a cpu.

The samples and context switch records collected by simpleperf for a thread are shown below:

[Figure: samples and context switch records collected by simpleperf for a thread]

Here we have two types of samples:
1) on-cpu samples generated for the cpu-clock event. The period value in each sample means how many
nanoseconds are spent on cpu (for the callchain of this sample).
2) off-cpu (sched_switch) samples generated for the sched:sched_switch event. The period value is
calculated by simpleperf as **Timestamp of the next switch on record** minus **Timestamp of the
current sample**. So the period value in each sample means how many nanoseconds are spent off cpu
(for the callchain of this sample).

**note**: In reality, switch on records and samples may be lost. To mitigate the loss of accuracy,
we calculate the period of an off-cpu sample as **Timestamp of the next switch on record or sample**
minus **Timestamp of the current sample**.
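
The off-cpu period calculation, including the fallback described in the note, can be sketched as a single pass over one thread's records in time order. The input format here (a timestamp in nanoseconds followed by a record kind, one record per line) is hypothetical and is not simpleperf's perf.data format. The sketch also treats any later sample as evidence that the thread was back on cpu.

```sh
#!/bin/sh
# offcpu_periods: read "TIMESTAMP KIND" lines for one thread, sorted by time,
# where KIND is sched_switch (off-cpu sample), switch_on (context switch
# record) or cpu-clock (on-cpu sample). For each sched_switch sample, print
# its timestamp and period = timestamp of the next record - its timestamp.
offcpu_periods() {
  awk '
    pending != "" {
      # Any later record (switch_on or another sample) ends the off-cpu span.
      print pending, $1 - pending
      pending = ""
    }
    $2 == "sched_switch" { pending = $1 }'
}

# The switch_on record after the sample at 700 is treated as lost, so the
# next cpu-clock sample at 900 ends that off-cpu span.
records='100 cpu-clock
200 sched_switch
500 switch_on
600 cpu-clock
700 sched_switch
900 cpu-clock'

printf '%s\n' "$records" | offcpu_periods
```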

When reporting via Python scripts, simpleperf_report_lib.py provides a SetTraceOffCpuMode() method
to control how to report the samples:
1) on-cpu mode: only report on-cpu samples.
2) off-cpu mode: only report off-cpu samples.
3) on-off-cpu mode: report both on-cpu and off-cpu samples, which can be split by event name.
4) mixed-on-off-cpu mode: report on-cpu and off-cpu samples under the same event name.

If not set, the mixed-on-off-cpu mode is used.

When using report_html.py, inferno and report_sample.py, the report mode can be set by the
--trace-offcpu option.

Below are some examples of recording and reporting trace offcpu profiles.

```sh
# Check if --trace-offcpu is supported by the kernel (should be available on kernel >= 4.2).
$ simpleperf list --show-features
trace-offcpu
...

# Record with --trace-offcpu.
$ simpleperf record -g -p 11904 --duration 10 --trace-offcpu -e cpu-clock

# Record system wide with --trace-offcpu.
$ simpleperf record -a -g --duration 3 --trace-offcpu -e cpu-clock

# Record with --trace-offcpu using app_profiler.py.
$ ./app_profiler.py -p com.google.samples.apps.sunflower \
    -r "-g -e cpu-clock:u --duration 10 --trace-offcpu"

# Report on-cpu samples.
$ ./report_html.py --trace-offcpu on-cpu
# Report off-cpu samples.
$ ./report_html.py --trace-offcpu off-cpu
# Report on-cpu and off-cpu samples under different event names.
$ ./report_html.py --trace-offcpu on-off-cpu
# Report on-cpu and off-cpu samples under the same event name.
$ ./report_html.py --trace-offcpu mixed-on-off-cpu
```

## The report command

The report command is used to report profiling data generated by the record command. The report
contains a table of sample entries. Each sample entry is a row in the report. The report command
groups samples that belong to the same process, thread, library and function into the same sample
entry, then sorts the sample entries based on the event count each sample entry has.

By passing options, we can decide how to filter out uninteresting samples, how to group samples
into sample entries, and where to find profiling data and binaries.

Below is an example. Records are grouped into 4 sample entries, and each entry is a row. There are
several columns, and each column shows a piece of information belonging to a sample entry. The
first column is Overhead, which shows the percentage of the total events that fall in the current
sample entry. As the perf event is cpu-cycles, the overhead is the percentage of CPU cycles used in
each function.

```sh
# Reports perf.data, using only records sampled in libsudo-game-jni.so, grouping records using
# thread name(comm), process id(pid), thread id(tid), function name(symbol), and showing sample
# count for each row.
$ simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so \
      --sort comm,pid,tid,symbol -n
Cmdline: /data/data/com.example.sudogame/simpleperf record -p 7394 --duration 10
Arch: arm64
Event: cpu-cycles (type 0, config 0)
Samples: 28235
Event count: 546356211

Overhead  Sample  Command   Pid   Tid   Symbol
59.25%    16680   sudogame  7394  7394  checkValid(Board const&, int, int)
20.42%    5620    sudogame  7394  7394  canFindSolution_r(Board&, int, int)
13.82%    4088    sudogame  7394  7394  randomBlock_r(Board&, int, int, int, int, int)
6.24%     1756    sudogame  7394  7394  @plt
```
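
When a report has many rows, it can be useful to know how much of the total the top entries cover. The sketch below uses a hypothetical helper, `cumulative_overhead`, that reads report rows like the ones above and prints each row's Overhead next to a running total. It assumes data rows start with a percentage and skips every other line.

```sh
#!/bin/sh
# cumulative_overhead: for each report row starting with "NN.NN%", print the
# row's overhead and the running total of overheads (hypothetical helper).
cumulative_overhead() {
  awk '
    $1 ~ /^[0-9.]+%$/ {
      pct = $1; sub("%", "", pct)
      sum += pct
      printf "%s %.2f%%\n", $1, sum
    }'
}

# Header and rows copied from the example report above.
report_rows='Overhead  Sample  Command   Pid   Tid   Symbol
59.25%    16680   sudogame  7394  7394  checkValid(Board const&, int, int)
20.42%    5620    sudogame  7394  7394  canFindSolution_r(Board&, int, int)
13.82%    4088    sudogame  7394  7394  randomBlock_r(Board&, int, int, int, int, int)
6.24%     1756    sudogame  7394  7394  @plt'

printf '%s\n' "$report_rows" | cumulative_overhead
```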

### Set the path to read profiling data

By default, the report command reads profiling data from perf.data in the current directory.
But the path can be changed using -i.

```sh
$ simpleperf report -i data/perf2.data
```

### Set the path to find binaries

To report function symbols, simpleperf needs to read executable binaries used by the monitored
processes to get symbol tables and debug information. By default, the paths are the executable
binaries used by monitored processes while recording. However, these binaries may not exist when
reporting, or may not contain symbol tables and debug information. So we can use --symfs to
redirect the paths.

```sh
# In this case, when simpleperf wants to read executable binary /A/b, it reads file in /A/b.
$ simpleperf report

# In this case, when simpleperf wants to read executable binary /A/b, it prefers file in
# /debug_dir/A/b to file in /A/b.
$ simpleperf report --symfs /debug_dir

# Read symbols for system libraries built locally. Note that this is not needed since Android O,
# which ships symbols for system libraries on device.
$ simpleperf report --symfs $ANDROID_PRODUCT_OUT/symbols
```
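
The --symfs lookup order described above can be modeled as: try the binary's recorded path under the symfs directory first, and fall back to the recorded path itself. The hypothetical helper below mirrors only that rule as described here; simpleperf's real lookup may check more than file existence.

```sh
#!/bin/sh
# resolve_binary SYMFS PATH: print SYMFS/PATH if that file exists, otherwise
# PATH, following the --symfs rule described above (hypothetical helper).
resolve_binary() {
  symfs=$1
  path=$2
  if [ -n "$symfs" ] && [ -f "$symfs$path" ]; then
    printf '%s\n' "$symfs$path"
  else
    printf '%s\n' "$path"
  fi
}

# Build a throwaway symfs directory containing /A/b to try the rule.
debug_dir=$(mktemp -d)
trap 'rm -rf "$debug_dir"' EXIT
mkdir -p "$debug_dir/A"
: > "$debug_dir/A/b"

resolve_binary "$debug_dir" /A/b   # prints $debug_dir/A/b
resolve_binary "$debug_dir" /A/c   # prints /A/c (not present under symfs)
resolve_binary "" /A/b             # prints /A/b (no --symfs)
```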

### Filter samples

When reporting, not all records may be of interest. The report command supports four filters to
select samples of interest.

```sh
# Report records in threads having name sudogame.
$ simpleperf report --comms sudogame

# Report records in process 7394 or 7395.
$ simpleperf report --pids 7394,7395

# Report records in thread 7394 or 7395.
$ simpleperf report --tids 7394,7395

# Report records in libsudo-game-jni.so.
$ simpleperf report --dsos /data/app/com.example.sudogame-2/lib/arm64/libsudo-game-jni.so
```

### Group samples into sample entries

The report command uses --sort to decide how to group samples into sample entries.

```sh
# Group records based on their process id: records having the same process id are in the same
# sample entry.
$ simpleperf report --sort pid

# Group records based on their thread id and thread comm: records having the same thread id and
# thread name are in the same sample entry.
$ simpleperf report --sort tid,comm

# Group records based on their binary and function: records in the same binary and function are in
# the same sample entry.
$ simpleperf report --sort dso,symbol

# Default option: --sort comm,pid,tid,dso,symbol. Group records that are in the same thread and
# belong to the same function in the same binary.
$ simpleperf report
```
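
Conceptually, --sort builds a key from the chosen fields for each sample, sums the event counts of samples sharing a key, and orders the resulting entries by count. The sketch below models that idea with awk over a hypothetical text format (one sample per line: comm, pid, tid, symbol, event count). It illustrates the grouping idea only; it is not simpleperf's implementation, and `group_samples` is a made-up name.

```sh
#!/bin/sh
# group_samples COLUMNS: read "comm pid tid symbol count" lines on stdin,
# group them by the comma-separated column numbers in COLUMNS, and print
# "total key" lines sorted by total, largest first (hypothetical helper).
group_samples() {
  awk -v cols="$1" '
    BEGIN { n = split(cols, c, ",") }
    {
      key = $(c[1])
      for (i = 2; i <= n; i++) key = key " " $(c[i])
      total[key] += $NF
    }
    END { for (k in total) print total[k], k }' | sort -k1,1nr
}

samples='sudogame 7394 7394 checkValid 300
sudogame 7394 7394 canFindSolution_r 100
sudogame 7394 7395 checkValid 50'

# Like --sort symbol: one entry per function.
printf '%s\n' "$samples" | group_samples 4
# Like --sort tid,symbol: the same function in different threads stays apart.
printf '%s\n' "$samples" | group_samples 3,4
```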

### Report call graphs

To report a call graph, please make sure the profiling data is recorded with call graphs,
as described [here](#record-call-graphs).

```sh
$ simpleperf report -g
```