NVIDIA System Profiler provides a simple interface to manage multiple connections to Linux-based devices via SSH. The network connections manager can be launched through the device selection dropdown:
The dialog has simple controls that allow adding, removing, and modifying connections:
Security notice: SSH is used only to establish the initial connection to the target device, perform checks, and upload necessary files. The actual profiling commands and data are transferred through a raw, unencrypted socket. NVIDIA System Profiler should not be used in a network setup where a man-in-the-middle (MITM) attack is possible, or where untrusted parties may have network access to the target device.
While connecting to the target device, you will be prompted to input the user's password. Please note that if you choose to remember the password, it will be stored in plain text in the configuration file on the host. Stored passwords are bound to the public key fingerprint of the remote device.
The No authentication option is useful for devices configured for passwordless login using the root username. To enable such a configuration, edit the file /etc/ssh/sshd_config on the target and specify the following option:
PermitRootLogin yes
Then set an empty password using passwd and restart the SSH service with service ssh restart.
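As a quick way to confirm the option before reconnecting, the following minimal sketch reads the text of an sshd_config file and reports the first global PermitRootLogin value it finds. The function name and the sample text are hypothetical. The sketch assumes keyword and value are separated by whitespace, and it ignores Include directives and anything inside Match blocks.

```python
def permit_root_login(config_text):
    """Return the first global PermitRootLogin value, or None if it is not set."""
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comment text and padding
        if not line:
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()  # sshd_config keywords are case-insensitive
        if keyword == "match":
            break  # assumption: only the global section is of interest
        if keyword == "permitrootlogin" and len(parts) == 2:
            return parts[1].strip()
    return None


# Hypothetical file contents used only for illustration.
sample = """
# Authentication
#PermitRootLogin prohibit-password
PermitRootLogin yes
"""
print(permit_root_login(sample))
```

On a real target, the text would come from reading /etc/ssh/sshd_config with sufficient privileges.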
Trace all processes: on compatible devices (with kernel module support version 1.107 or higher), this option enables tracing of all processes and threads in the system. Scheduler events from all tasks will be recorded.
Collect PMU counters lets you choose which ARM PMU (Performance Monitoring Unit) counters NVIDIA System Profiler will sample. Enable specific counters when you are interested in correlating cache misses with functions in your application.
Currently NVIDIA System Profiler can only sample one process. Sampling here means that the profilee will be stopped periodically, and backtraces of active threads will be recorded.
Sampling rate defines how often the profilee will be interrupted in order to collect backtraces. This option has to be used carefully: the best value depends on many factors.
A low sampling rate requires a longer profiling session to produce statistically significant information. This is especially important when additional time-based filtering is then applied in the report view to analyze smaller periods of time.
A high sampling rate, on the other hand, produces higher overhead. For some applications, it might hide performance issues related to waiting for other work to complete, such as waiting to acquire mutex locks, waiting for GPU synchronization, or waiting for network or disk-based IO to complete.
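The trade-off can be made concrete with a little arithmetic. The sketch below assumes the sampling rate is expressed in samples per second and that samples are independent, so that the number of samples landing in a given function follows a binomial distribution. Both are simplifying assumptions, and the function names are hypothetical.

```python
import math


def expected_samples(rate_hz, window_s):
    """Number of samples expected in a time window of window_s seconds."""
    return rate_hz * window_s


def relative_error(rate_hz, window_s, fraction):
    """Relative standard error of the measured share of a function that
    truly accounts for `fraction` of the samples (binomial model)."""
    n = expected_samples(rate_hz, window_s)
    return math.sqrt((1.0 - fraction) / (n * fraction))


# Compare two rates for a 0.5 s window filtered in the report view,
# looking at a function responsible for 10% of the samples.
for rate in (100, 1000):
    n = expected_samples(rate, 0.5)
    err = relative_error(rate, 0.5, 0.10)
    print(f"{rate} Hz: {n:.0f} samples, relative error about {err:.0%}")
```

Under this model, a tenfold increase in rate shrinks the relative error by roughly the square root of ten, which is the statistical benefit that has to be weighed against the added overhead.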
In some cases, forcing the use of frame pointers as the unwind algorithm is a viable option to reduce performance overhead. Unwind (backtracing) algorithms can be configured using the Choose modes... button.
Most applications use stripped libraries. In this case, many symbols may stay unresolved. If unstripped libraries exist, paths to them can be specified using the Symbol locations... button. Symbol resolution happens on the host, and therefore does not affect the performance of profiling on the target.
Additionally, debug versions of ELF files may be picked up from the target system. Refer to the corresponding section of the documentation.
NVIDIA System Profiler can work with Linux-based devices in three modes: Attach only, Attach or launch, and Attach by PID.
The purpose of the configuration here is to define which process the profiler will attach to for sampling and tracing. Additionally, the profiler can launch a process prior to attaching to it, ensuring that all environment variables are set correctly to successfully collect trace information.
In Attach only mode, the process is selected by its name and command line arguments, as visible using the ps tool.
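To illustrate what selecting a process by name and arguments can look like, here is a minimal sketch that scans ps-style output for an exact command line match. The profiler's actual matching rules are not described here, so the whitespace-normalized comparison, the function name, and the sample listing are all assumptions.

```python
def find_matching_pids(ps_output, command_line):
    """Return the PIDs whose command line equals command_line.

    ps_output is expected to hold a PID column followed by the full
    command line, for example as printed by `ps -e -o pid,args`.
    """
    wanted = " ".join(command_line.split())
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) != 2 or not fields[0].isdigit():
            continue  # header row, blank line, or malformed entry
        pid, args = fields
        if " ".join(args.split()) == wanted:
            pids.append(int(pid))
    return pids


# Hypothetical listing captured from a target device.
sample = """
  PID COMMAND
    1 /sbin/init
  742 /usr/bin/myapp --mode fast
  801 /usr/bin/myapp --mode slow
"""
print(find_matching_pids(sample, "/usr/bin/myapp --mode fast"))
```

Because ps prints arguments without their original quoting, a comparison like this cannot tell one argument containing a space from two separate arguments.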
In Attach or launch mode, the profiler first searches for the process as in Attach only mode. If the process is not found, it is launched using the same path and command line arguments. If NVTX, CUDA, or other trace settings are selected, the process will be automatically launched with appropriate environment variables.
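The search-then-launch fallback can be sketched as follows. This is an illustration of the control flow only, not the profiler's implementation: the function name, the `running` mapping, and the idea of passing trace-related variables through `extra_env` are hypothetical stand-ins.

```python
import os
import shlex
import subprocess


def attach_or_launch(running, command_line, extra_env=None):
    """Return (pid, proc) for the process to profile.

    running maps PID to command line for processes that already exist.
    proc is None when an existing process was found, otherwise it is the
    subprocess.Popen object of the newly launched process.
    """
    wanted = command_line.split()
    for pid, cmd in running.items():
        if cmd.split() == wanted:
            return pid, None  # found: behave like Attach only
    env = dict(os.environ)
    env.update(extra_env or {})  # hypothetical trace-related variables
    proc = subprocess.Popen(shlex.split(command_line), env=env)
    return proc.pid, proc


# Existing process found, so nothing is launched.
print(attach_or_launch({742: "/usr/bin/myapp --mode fast"},
                       "/usr/bin/myapp --mode fast"))
```

When the launch path is taken, the caller owns the returned `Popen` object and is responsible for waiting on it.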
Note that in some cases, the capabilities of NVIDIA System Profiler are not sufficient to correctly launch the application; for example, if certain environment variables have to be corrected. In this case, the application has to be started manually and NVIDIA System Profiler should be used in Attach only mode.
The Edit arguments... link will open an editor window, where every command line argument is edited on a separate line. This is convenient when arguments contain spaces or quotes.
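The benefit of one argument per line is that no shell quoting is needed while editing. The minimal sketch below converts such a list into a single quoted command line using Python's standard shlex module (shlex.join requires Python 3.8 or later). The function names are hypothetical, and skipping empty lines is an assumption about how such an editor might behave.

```python
import shlex


def args_from_lines(text):
    """Treat every non-empty line as one argument, kept verbatim."""
    return [line for line in text.splitlines() if line != ""]


def args_to_command_line(args):
    """Quote each argument so a POSIX shell would split it back identically."""
    return shlex.join(args)


# Arguments containing spaces and quotes, one per line.
edited = '--title\nHello "World"\n--path\n/data/my files'
args = args_from_lines(edited)
line = args_to_command_line(args)
print(args)
print(line)
```

Splitting the resulting command line with shlex.split yields the original list again, which is a convenient check that nothing was lost to quoting.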
To populate the Search criteria field from a process currently running on the target system, use the Select a process button on the right, which has an ellipsis as its caption. The list of processes is automatically refreshed upon opening.
Attach by PID mode should be used to connect to a specific process.
To choose one of the currently running processes on the target system, use the Select a process button on the right.
Collect CUDA trace. See CUDA Trace for more information.
Collect NVTX trace. See NVTX Trace for more information.
Collect OpenGL trace. See OpenGL Trace for more information.
Collect GPU context switch trace. See GPU Context Switch Trace for more information.
NVIDIA® System Profiler Documentation Rev. 3.9.170817 ©2017. NVIDIA Corporation. All Rights Reserved.