Nsight Systems provides a simple interface to profile on localhost or manage multiple connections to Linux- or Windows-based devices via SSH. The network connections manager can be launched through the device selection dropdown:
The dialog has simple controls that allow adding, removing, and modifying connections:
Security notice: SSH is used only to establish the initial connection to a target device, perform checks, and upload necessary files. The actual profiling commands and data are transferred through a raw, unencrypted socket. Nsight Systems should not be used in a network setup where a man-in-the-middle (MITM) attack is possible, or where untrusted parties may have network access to the target device.
While connecting to the target device, you will be prompted to input the user's password. Please note that if you choose to remember the password, it will be stored in plain text in the configuration file on the host. Stored passwords are bound to the public key fingerprint of the remote device.
To profile a remote Windows-based target machine, install and configure an OpenSSH Server on the target machine.
The No authentication option is useful for devices configured for passwordless login using the root username. To enable such a configuration, edit the file /etc/ssh/sshd_config on the target and specify the following option:
PermitRootLogin yes
Then set an empty password using passwd and restart the SSH service with:
service ssh restart
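As a rough way to double-check the sshd_config setting described above, the following sketch scans configuration text for the first value assigned to a keyword. It is a simplification, not how sshd itself parses the file: it ignores Match blocks, Include directives, and quoting. The function name first_sshd_option and the sample text are made up for illustration.

```python
def first_sshd_option(config_text, keyword):
    """Return the first value set for `keyword`, or None if it is absent.

    Simplified on purpose: skips blank lines and lines starting with '#',
    compares keywords case-insensitively, and ignores Match/Include.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # keyword, then the rest of the line
        if len(parts) == 2 and parts[0].lower() == keyword.lower():
            return parts[1].strip()
    return None


# Hypothetical file contents, not read from a real target.
sample = """\
# Example sshd_config fragment
Port 22
PermitRootLogin yes
"""

print(first_sshd_option(sample, "PermitRootLogin"))
```

On a real target, the text would come from reading /etc/ssh/sshd_config, which usually requires elevated privileges.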
Open ports: The Nsight Systems daemon requires port 22 and port 45555 to be open for listening. You can confirm that these ports are open with the following command:
sudo firewall-cmd --list-ports --permanent
sudo firewall-cmd --reload
To open a port, use the following command. Omit the --permanent option to open the port only for the current session:
sudo firewall-cmd --permanent --add-port 45555/tcp
sudo firewall-cmd --reload
Likewise, if you are running on a cloud system, you must open port 22 and port 45555 for ingress.
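Reachability of the two ports can also be probed from the host side. The sketch below is one possible check using only the Python standard library; it is not part of Nsight Systems. The helper names and the "localhost" placeholder are assumptions. A failed connection can mean either that a firewall blocks the port or that nothing is listening on it at that moment, so treat the result as a hint rather than proof.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports=(22, 45555)):
    """Map each port number to whether a TCP connection could be made."""
    return {port: port_is_open(host, port) for port in ports}


if __name__ == "__main__":
    # "localhost" is a placeholder; substitute the address of your target.
    for port, ok in check_ports("localhost").items():
        print(f"port {port}: {'reachable' if ok else 'not reachable'}")
```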
System wide profiling is only available on x86 for Linux targets and only when run with root privileges.
Ftrace Events Collection
Select Ftrace events
Choose which events you would like to collect.
(BETA) GPU Context Switch Trace
Tracing of context switching on the GPU is enabled with driver r435.17 or higher. Note that in this version, GPU context switch trace is supported only on single-GPU systems.
Here is a screenshot showing three CUDA kernels running simultaneously in three different CUDA contexts on a single GPU.
Target sampling options on Linux:
Three different backtrace collection options are available when sampling CPU instruction pointers. Backtraces can be generated using Intel(c) Last Branch Record (LBR) registers. LBR backtraces generate minimal overhead, but the backtraces have limited depth. Backtraces can also be generated using DWARF debug data. DWARF backtraces incur more overhead than LBR backtraces but have much better depth. Finally, backtraces can be generated using frame pointers. Frame pointer backtraces incur medium overhead and have good depth, but they only resolve frames in the portions of the application and its libraries (including third-party libraries) that were compiled with frame pointers enabled. Normally, frame pointers are disabled by default during compilation.
By default, Nsight Systems will use Intel(c) LBRs if available and fall back to DWARF unwinding if they are not. The Choose modes... option allows you to override the default.
The Include child processes switch controls whether API tracing is only for the launched process, or for all existing and new child processes of the launched process. If you are running your application through a script, for example a bash script, you need to set this checkbox.
The Include child processes switch does not control sampling in this version of Nsight Systems. The full process tree will be sampled regardless of this setting. This will be fixed in a future version of the product.
Nsight Systems can sample one process tree. Sampling here means interrupting each processor after a certain number of events and collecting an instruction pointer (IP) / backtrace sample if the processor is executing the profilee.
When sampling the CPU on an x86_64 target, Nsight Systems traces thread context switches and infers thread state as either Running or Blocked. Note that Blocked in the timeline indicates the thread may be Blocked (Interruptible) or Blocked (Uninterruptible). Blocked (Uninterruptible) often occurs when a thread has transitioned into the kernel and cannot be interrupted by a signal. Sampling can be enhanced with OS runtime libraries tracing; see OS Runtime Libraries Trace for more information.
Most applications use stripped libraries. In this case, many symbols may stay unresolved. If unstripped libraries exist, paths to them can be specified using the Symbol locations... button. Symbol resolution happens on the host, and therefore does not affect the performance of profiling on the target.
Additionally, debug versions of ELF files may be picked up from the target system. Refer to the corresponding section of the documentation.
Target sampling options on Windows:
Nsight Systems can sample one process tree. Sampling here means interrupting each processor periodically. The sampling rate is defined in the project settings and is one of 100 Hz, 1 kHz (default), 2 kHz, 4 kHz, or 8 kHz.
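To get a feel for what these rates mean in time, the interval between samples is simply the reciprocal of the rate. The short sketch below only does that arithmetic for the listed rates; the names are illustrative and nothing here queries Nsight Systems.

```python
# Sampling rates listed in the project settings, in Hz.
SAMPLING_RATES_HZ = (100, 1000, 2000, 4000, 8000)


def sampling_interval_us(rate_hz):
    """Return the time between consecutive samples in microseconds."""
    return 1_000_000 / rate_hz


for rate in SAMPLING_RATES_HZ:
    print(f"{rate} Hz -> {sampling_interval_us(rate):g} us between samples")
```

At the default 1 kHz, for example, each processor is interrupted about once per millisecond.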
On Windows, Nsight Systems can collect thread activity of one process tree. Collecting thread activity means that each thread context switch event is logged and (optionally) a backtrace is collected at the point that the thread is scheduled back for execution. Thread states are displayed on the timeline.
If it was collected, the thread backtrace is displayed when hovering over a region where the thread execution is blocked.
Symbol locations:
Symbol resolution happens on the host, and therefore does not affect the performance of profiling on the target.
Press the Symbol locations... button to open the Configure debug symbols location dialog.
Use this dialog to specify:
To use a symbol server:
a. Install Debugging Tools for Windows, a part of the Windows 10 SDK
b. Add the symbol server URL using the Add Server button
Information about Microsoft's public symbol server, which provides debug symbols for the Windows operating system, can be found here.
Press the hotkey to start and/or stop a trace session from within the target application’s graphics window. This is useful when tracing games and graphics applications that use fullscreen display. In these scenarios, switching to the Nsight Systems UI would unnecessarily introduce the window manager’s footprint into the trace. To enable the hotkey, check the Hotkey checkbox in the project settings page:
The default hotkey is F12.
A different hotkey binding can be configured by setting the HotKeyIntValue configuration field in the config.ini file.
Set the decimal numeric identifier of the hotkey you would like to use for triggering start/stop from the target app graphics window. The default value is 123 which corresponds to 0x7B, or the F12 key.
Virtual key identifiers are detailed in MSDN's Virtual-Key Codes.
Note that you must convert the hexadecimal values listed on that page to their decimal counterparts before using them in the file. For example, to use the F1 key as the start/stop trace hotkey, use the following setting in the config.ini file:
HotKeyIntValue=112
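The hexadecimal-to-decimal conversion can be done with any calculator. As a small illustration, the sketch below converts a virtual-key code given as hexadecimal text and formats the matching config.ini line. The helper names are made up for this example. The codes 0x70 (F1) and 0x7B (F12) come from Microsoft's Virtual-Key Codes table.

```python
def parse_vk_hex(text):
    """Convert a hexadecimal virtual-key code such as '0x7B' to an integer."""
    return int(text, 16)


def config_line(vk_hex):
    """Format the config.ini entry for the given hexadecimal key code."""
    return f"HotKeyIntValue={parse_vk_hex(vk_hex)}"


print(config_line("0x70"))  # F1  -> HotKeyIntValue=112
print(config_line("0x7B"))  # F12 -> HotKeyIntValue=123
```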
Copyright (c) 2012-2019, NVIDIA Corporation. All rights reserved.