The Nsight Systems CLI provides a simple interface to collect data on a target without using the GUI. The collected data can then be copied to any system and analyzed later.
The CLI is distributed in the Target directory of the standard Nsight Systems download package. Users who want to install the CLI as a standalone tool can do so by copying the files within the Target directory. If you want the CLI output file (.qdstrm) to be auto-converted (to .qdrep) after the analysis is complete, you will need to copy the host directory as well.
If you wish to run the CLI without root (recommended mode), you will want to install in a directory where you have full access.
The Nsight Systems command lines can have one of two forms:
nsys [global_option]
or
nsys [command_switch] [optional command_switch_options] [application] [optional application_options]
All command line options are case sensitive. For command switch options, when short options are used, the parameters should follow the switch after a space; e.g. -s cpu. When long options are used, the switch should be followed by an equal sign and then the parameter(s); e.g. --sample=cpu.
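As a small sketch of the two spellings, the snippet below builds the same profile command once with short options and once with long options, then prints both rather than running them. The application path ./app is a placeholder and not part of the tool.

```shell
#!/bin/sh
# Placeholder application; substitute your own executable.
app=./app

# Short options: the parameter follows the switch after a space.
short_form="nsys profile -s cpu -t cuda,nvtx $app"

# Long options: the switch is followed by an equal sign and the parameter(s).
long_form="nsys profile --sample=cpu --trace=cuda,nvtx $app"

# Print instead of executing so the two spellings can be compared.
printf '%s\n' "$short_form" "$long_form"
```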
For this version of Nsight Systems, you must launch a process from the command line to begin analysis. If an instance of the requested process is already running when the CLI command is issued, the collection will fail. The launched process will be terminated when collection is complete unless the user specifies the --kill none option (details below).
The Nsight Systems CLI supports concurrent analysis by using sessions. Each Nsight Systems session is defined by a sequence of CLI commands that define one or more collections (e.g. when and what data is collected). A session begins with either a start, launch, or profile command. A session ends with a shutdown command, when a profile command terminates, or, if requested, when all the process tree(s) launched in the session exit. Multiple sessions can run concurrently on the same system.
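A rough sketch of two concurrent sessions follows, using only switches listed in the tables later in this section. The session names sessA and sessB and the applications ./app_a and ./app_b are hypothetical, and the run helper only prints each command line. On a real system the launch commands may need to be issued from separate terminals or in the background.

```shell
#!/bin/sh
# Dry-run helper: prints each command line. Replace the body with "$@"
# to really execute on a system where nsys is installed.
run() { printf '%s\n' "$*"; }

# Hypothetical session names and applications.
run nsys launch --session-new=sessA --trace=cuda ./app_a
run nsys launch --session-new=sessB --trace=nvtx ./app_b

# List active sessions, then collect in sessA only.
run nsys sessions list
run nsys start --session=sessA
run nsys stop --session=sessA

# End both sessions; --kill selects the signal sent to each process group.
run nsys shutdown --session=sessA --kill=sigterm
run nsys shutdown --session=sessB --kill=sigterm
```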
A couple of notes about the use of paths in your command line.
The Nsight Systems command line interface does not handle paths with spaces properly. Please use paths without spaces.
If you run a command (like python X Y Z) from a directory where the command is not located (like /home/mystuff), and the directory includes a subdirectory with the same name as the command (like /home/mystuff/python), the command line parser will interpret that as "/home/mystuff/python X Y Z". This will not work because python, in this context, would reference the directory, not an executable. Please either run from the command's home directory or use the full path to the command.
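One way to follow both notes is to resolve the command to its full path before handing it to nsys. The sketch below assumes a POSIX shell; sh stands in for the real command (for example python), and X Y Z are the placeholder arguments from the note above. The final line prints the command instead of running it.

```shell
#!/bin/sh
# Resolve a bare command name to its full path so the nsys parser cannot
# mistake a same-named subdirectory for the executable.
# "sh" stands in for your real command (for example python).
cmd_name=sh
cmd_path=$(command -v "$cmd_name") || {
    echo "cannot find $cmd_name on PATH" >&2
    exit 1
}

# Refuse paths containing spaces, which the CLI does not handle properly.
case $cmd_path in
    *" "*) echo "path contains spaces: $cmd_path" >&2; exit 1 ;;
esac

# Print the command line that would be run; drop the echo to execute it.
echo nsys profile "$cmd_path" X Y Z
```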
Global Option Short | Global Option Long | Description |
---|---|---|
-h | --help | Help message providing information about available command switches and their options. |
-v | --version | Output Nsight Systems CLI version information. |
The Nsight Systems command line interface can be used in two modes. You may launch your application and begin analysis with options specified to the nsys profile command. Alternatively, you can control the launch of an application and data collection using interactive CLI commands.
Command | Description |
---|---|
profile | A fully formed profiling description requiring and accepting no further input. The command switch options used (see below table) determine when the collection starts, stops, what collectors are used (e.g. API trace, IP sampling, etc.), what processes are monitored, etc. |
start | Start a collection in interactive mode. The start command can be executed before or after a launch command. |
stop | Stop a collection that was started in interactive mode. When executed, all active collections stop, the CLI process terminates but the application continues running. |
cancel | Cancels an existing collection started in interactive mode. All data already collected in the current collection is discarded. |
launch | In interactive mode, launches an application in an environment that supports the requested options. The launch command can be executed before or after a start command. |
shutdown | Disconnects the CLI process from the launched application and forces the CLI process to exit. If a collection is pending or active, it is cancelled. |
export | Generates an export file from an existing .qdrep file. For more information about the exported formats see the /documentation/nsys-exporter directory in your Nsight Systems installation directory. |
status | Reports on the status of a CLI-based collection or the suitability of the profiling environment. |
sessions | Gives information about all sessions running on the system. |
nvprof | Special option to help with transition from legacy NVIDIA nvprof tool. Calling "nsys nvprof [options]" will provide the best available translation of "nvprof [options]". See the Migrating from NVIDIA nvprof topic for details. No additional functionality of nsys will be available when using this option. Note: Not available on IBM Power targets. |
After choosing the profile command switch, the following options are available. Usage:
nsys [global-options] profile [options] <application> [application-arguments]
Short | Long | Possible Parameters | Default | Switch Description |
---|---|---|---|---|
-t | --trace | cublas, cuda, cudnn, nvtx, opengl, openacc, openmp, osrt, mpi, vulkan, none | cuda, opengl, nvtx, osrt | Select the API(s) to be traced. The osrt switch controls the OS runtime libraries tracing. Multiple APIs can be selected, separated by commas only (no spaces). Since OpenACC, cuDNN and cuBLAS APIs are tightly linked with CUDA, selecting one of those APIs will automatically enable CUDA tracing. See information on --mpi-impl option below if mpi is selected. If the none option is selected, no APIs are traced and no other API can be selected. Note: cublas, cudnn, opengl, and vulkan are not available on IBM Power target. |
--mpi-impl | openmpi,mpich | openmpi | When using --trace=mpi to trace MPI APIs use --mpi-impl to specify which MPI implementation the application is using. If you are using a different MPI implementation, see Tracing MPI API calls section below. Calling --mpi-impl without --trace=mpi is not supported. | |
-s | --sample | cpu, none | cpu | Select whether or not to collect CPU samples. If none is selected, sampling is disabled. |
-b | --backtrace | fp,lbr,dwarf,none | lbr | Select the backtrace method to use while sampling. The option lbr uses Intel(c) Corporation's Last Branch Records, available only with Intel(c) CPUs codenamed Haswell and later. The option fp is frame pointer and assumes that frame pointers were enabled during compilation. The option dwarf uses DWARF's CFI (Call Frame Information). Note: Only frame pointers are available for backtrace on IBM Power. |
--command-file | < filename > | none | Open a file that contains profile switches and parse the switches. Note additional switches on the command line will override switches in the file. | |
-y | --delay | < seconds > | 0 | Collection start delay in seconds. |
-d | --duration | < seconds > | NA | Collection duration in seconds, duration must be greater than zero. Note that the profiler does not detach from the application, it lives until application termination. |
-e | --env-var | A=B | NA | Set environment variable(s) for the application process to be launched. Environment variables should be defined as A=B. Multiple environment variables can be specified as A=B,C=D. |
--osrt-threshold | < nanoseconds > | 1000 ns | Set the minimum time that a OS Runtime event must take before it is collected. Setting this value too low can cause high application overhead and seriously increase the size of your results file. Note: Not available for IBM Power targets. | |
--cudabacktrace | true,false | false | When tracing CUDA APIs, this option enables collection of a backtrace when a CUDA API is invoked. This may lead to significant runtime overhead. See the --cudabacktrace-threshold switch. Note: CPU sampling must be enabled to collect CUDA API backtraces. Note: Not available on IBM Power Targets. | |
--cudabacktrace-threshold | < nanoseconds > | 1000 ns | Set the duration, in nanoseconds, that CUDA APIs must execute before backtraces are collected. Setting this value too low can cause high application overhead and seriously increase the size of your results file. Note: Not relevant to IBM Power targets. | |
-o | --output | < filename > | report# | Set the .qdstrm filename. Any %q{ENV_VAR} pattern in the name will be substituted with the value of the environment variable. Any %h pattern in the filename will be substituted with the hostname of the system. The extension .qdstrm will be automatically appended. The default is report1.qdstrm, with the number incrementing to avoid overwriting files, in /home/user/nvidia_nsight_systems working directory. |
--export | sqlite, none | none | Create additional output file(s) based on the data collected. Current options are sqlite or none. WARNING: If the collection captures a large amount of data, creating the database file may take several minutes to complete. | |
--stats | true, false | false | Generate summary statistics after the collection. WARNING: When set to true, an SQLite database will be created after the collection. If the collection captures a large amount of data, creating the database file may take several minutes to complete. | |
-f | --force-overwrite | true, false | false | If true, overwrite all existing result files with same output filename (.qdstrm,.qdrep, .sqlite) |
-w | --show-output | true, false | true | If true, send target process’ stdout and stderr streams to the console. |
-n | --inherit-environment | true, false | true | When true, the current environment variables and the tool’s environment variables will be specified for the launched process. When false, only the tool’s environment variables will be specified for the launched process. |
-x | --stop-on-exit | true, false | true | If true, stop collecting automatically when the launched process has exited or when the duration expires - whichever occurs first. If false, duration must be set and the collection stops only when the duration expires. Nsight Systems does not officially support runs longer than 5 minutes. |
--wait | primary,all | all | If primary, the CLI will wait on the application process termination. If all, the CLI will additionally wait on re-parented processes created by the application. | |
--trace-fork-before-exec | true, false | false | If true, trace any child process after fork and before they call one of the exec functions. Beware, tracing in this interval relies on undefined behavior and might cause your application to crash or deadlock. | |
-c | --capture-range | none, cudaProfilerApi, nvtx | none | When -c cudaProfilerApi (or nvtx) is used, profiling will start only when cudaProfilerStart API is invoked or the specified NVTX range (specified using -p/--nvtx-capture) is started in the application. |
--stop-on-range-end | true,false | true | Stop profiling when the capture range ends. Applicable only when used along with --capture-range option. | |
-p | --nvtx-capture | range@domain,range,range@* | Specify NVTX capture range. See below for details. This option is applicable only when used along with --capture-range=nvtx. | |
--ftrace | Collect ftrace events. Argument should list events to collect as: subsystem1/event1,subsystem2/event2. Requires root. No ftrace events are collected by default. Note: Not available on IBM Power targets. | |||
--ftrace-keep-user-config | Skip initial ftrace setup and collect already configured events. Default resets the ftrace configuration. | |||
--gpuctxsw | true,false | false | Trace GPU context switches. Note that this requires driver r435.17 or later and root permission. Not available on IBM Power targets. | |
--kill | none, sigkill, sigterm, signal number | sigterm | Send signal to the target application's process group. | |
--session-new | [a-Z][0-9,a-Z,spaces] | profile-<id>-<application> | Name the session created by the command. Name must start with an alphabetical character followed by printable or space characters. Any '%q{ENV_VAR}' pattern will be substituted with the value of the environment variable. Any '%h' pattern will be substituted with the hostname of the system. Any '%%' pattern will be substituted with '%'. |
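The following sketch combines several of the switches above into one bounded profile run and prints the resulting command line. The application ./app and the use of the USER environment variable in the output pattern are assumptions made for illustration. The pattern is single-quoted so that the shell leaves %h and %q{USER} untouched and nsys performs the substitution.

```shell
#!/bin/sh
# Placeholder application; substitute your own executable.
app=./app

# Single quotes keep the shell from touching %h and %q{USER};
# nsys itself expands them to the hostname and the value of USER.
out_pattern='run_%h_%q{USER}'

# Wait 5 s, collect for at most 30 s, trace CUDA and NVTX only,
# skip CPU sampling, and overwrite earlier results with the same name.
cmd="nsys profile --trace=cuda,nvtx --sample=none --delay=5 --duration=30 --output=$out_pattern --force-overwrite=true $app"

# Print the command line so it can be reviewed before it is run.
printf '%s\n' "$cmd"
```

Keeping --duration small is a deliberate choice here, since runs longer than 5 minutes are not officially supported.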
After choosing the launch command switch, the following options are available. Usage:
nsys [global-options] launch [options] <application> [application-arguments]
Short | Long | Possible Parameters | Default | Switch Description |
---|---|---|---|---|
-t | --trace | cublas, cuda, cudnn, nvtx, opengl, openacc, openmp, osrt, mpi, vulkan, none | cuda, opengl, nvtx, osrt | Select the API(s) to be traced. The osrt switch controls the OS runtime libraries tracing. Multiple APIs can be selected, separated by commas only (no spaces). Since OpenACC, cuDNN and cuBLAS APIs are tightly linked with CUDA, selecting one of those APIs will automatically enable CUDA tracing. See information on --mpi-impl option below if mpi is selected. If the none option is selected, no APIs are traced and no other API can be selected. Note: cublas, cudnn, opengl, and vulkan are not available on IBM Power target. |
--mpi-impl | openmpi,mpich | openmpi | When using --trace=mpi to trace MPI APIs use --mpi-impl to specify which MPI implementation the application is using. If you are using a different MPI implementation, see Tracing MPI API calls section below. Calling --mpi-impl without --trace=mpi is not supported. | |
-s | --sample | cpu, none | cpu | Select whether or not to collect CPU samples. If none is selected, sampling is disabled. |
-b | --backtrace | fp,lbr,dwarf,none | lbr | Select the backtrace method to use while sampling. The option lbr uses Intel(c) Corporation's Last Branch Records, available only with Intel(c) CPUs codenamed Haswell and later. The option fp is frame pointer and assumes that frame pointers were enabled during compilation. The option dwarf uses DWARF's CFI (Call Frame Information). Note: Only frame pointers are available for backtrace on IBM Power. |
--command-file | < filename > | none | Open a file that contains launch switches and parse the switches. Note additional switches on the command line will override switches in the file. | |
-e | --env-var | A=B | NA | Set environment variable(s) for the application process to be launched. Environment variables should be defined as A=B. Multiple environment variables can be specified as A=B,C=D. |
--osrt-threshold | < nanoseconds > | 1000 ns | Set the minimum time that a OS Runtime event must take before it is collected. Setting this value too low can cause high application overhead and seriously increase the size of your results file. Note: Not available for IBM Power targets. | |
--cudabacktrace | true,false | false | When tracing CUDA APIs, this option enables collection of a backtrace when a CUDA API is invoked. This may lead to significant runtime overhead. See the --cudabacktrace-threshold switch. Note: CPU sampling must be enabled to collect CUDA API backtraces. Note: Not available on IBM Power targets. | |
--cudabacktrace-threshold | < nanoseconds > | 1000 ns | Set the duration, in nanoseconds, that CUDA APIs must execute before backtraces are collected. Setting this value too low can cause high application overhead and seriously increase the size of your results file. Note: Not relevant to IBM Power targets. | |
-w | --show-output | true, false | true | If true, send target process’ stdout and stderr streams to the console |
-n | --inherit-environment | true, false | true | When true, the current environment variables and the tool’s environment variables will be specified for the launched process. When false, only the tool’s environment variables will be specified for the launched process. |
-p | --nvtx-capture | message@domain | none | Specify NVTX capture range. See below for details. |
--trace-fork-before-exec | true, false | false | If true, trace any child process after fork and before they call one of the exec functions. Beware, tracing in this interval relies on undefined behavior and might cause your application to crash or deadlock. | |
--wait | primary,all | all | If primary, the CLI will wait on the application process termination. If all, the CLI will additionally wait on re-parented processes created by the application. | |
--session | session identifier | none | Launch the application in the indicated session. The option argument must represent a valid session name or ID as reported by 'nsys sessions list'. Any '%q{ENV_VAR}' pattern will be substituted with the value of the environment variable. Any '%h' pattern will be substituted with the hostname of the system. Any '%%' pattern will be substituted with '%'. | |
--session-new | [a-Z][0-9,a-Z,spaces] | [default] | Launch the application in a new session. Name must start with an alphabetical character followed by printable or space characters. Any '%q{ENV_VAR}' pattern will be substituted with the value of the environment variable. Any '%h' pattern will be substituted with the hostname of the system. Any '%%' pattern will be substituted with '%'. |
After choosing the start command switch, the following options are available. Usage:
nsys [global-options] start [options]
Short | Long | Possible Parameters | Default | Switch Description |
---|---|---|---|---|
-c | --capture-range | none, cudaProfilerApi, nvtx | none | If set to cudaProfilerApi, profiling will start on the first call to cudaProfilerStart. Valid only with CUDA tracing enabled. If set to nvtx the profiling will start when the first NVTX capture range is started (see below for NVTX capture range definition). |
-o | --output | < filename > | report# | Set the .qdstrm filename. Any %q{ENV_VAR} pattern in the name will be substituted with the value of the environment variable. Any %h pattern in the filename will be substituted with the hostname of the system. The extension .qdstrm will be automatically appended. The default is report1.qdstrm, with the number incrementing to avoid overwriting files, in /home/user/nvidia_nsight_systems working directory. |
--export | sqlite, none | none | Create additional output file(s) based on the data collected. Current options are sqlite or none. WARNING: If the collection captures a large amount of data, creating the database file may take several minutes to complete. | |
--stats | true, false | false | Generate summary statistics after the collection. WARNING: When set to true, an SQLite database will be created after the collection. If the collection captures a large amount of data, creating the database file may take several minutes to complete. | |
-f | --force-overwrite | true, false | false | If true, overwrite all existing result files with same output filename (.qdstrm,.qdrep, .sqlite) |
-x | --stop-on-exit | true, false | true | If true, stop collecting automatically when all tracked processes have exited or when `stop` command is issued - whichever occurs first. If false, stop only on `stop` command. Note: When this is true, `stop` command is optional. Nsight Systems does not officially support runs longer than 5 minutes. |
--stop-on-range-end | true, false | true | If true, stop collecting when the specified capture range ends. Valid only when --capture-range is set. | |
--ftrace | Collect ftrace events. Argument should list events to collect as: subsystem1/event1,subsystem2/event2. Requires root. No ftrace events are collected by default. Note: Not supported on IBM Power targets. | |||
--ftrace-keep-user-config | Skip initial ftrace setup and collect already configured events. Default resets the ftrace configuration. | |||
--gpuctxsw | true,false | false | Trace GPU context switches. Note that this requires driver r435.17 or later and root permission. Not supported on IBM Power targets. | |
--session | session identifier | none | Start the application in the indicated session. The option argument must represent a valid session name or ID as reported by 'nsys sessions list'. Any '%q{ENV_VAR}' pattern will be substituted with the value of the environment variable. Any '%h' pattern will be substituted with the hostname of the system. Any '%%' pattern will be substituted with '%'. | |
--session-new | [a-Z][0-9,a-Z,spaces] | [default] | Start the application in a new session. Name must start with an alphabetical character followed by printable or space characters. Any '%q{ENV_VAR}' pattern will be substituted with the value of the environment variable. Any '%h' pattern will be substituted with the hostname of the system. Any '%%' pattern will be substituted with '%'. |
After choosing the export command switch, the following options are available. Usage:
nsys [global-options] export [options] [qdrep-file]
Short | Long | Possible Parameters | Default | Switch Description |
---|---|---|---|---|
-o | --output | <filename> | <inputfile.ext> | Set the .output filename. The default is the .qdrep filename with the extension for the chosen format. |
-t | --type | sqlite, hdf, text, json, info | sqlite | Export format type. HDF format is supported only on x86_64 Linux and Windows. |
-f | --force-overwrite | true, false | false | If true, overwrite existing result file |
-q | --quiet | true, false | false | If true, do not display progress bar |
--separate-strings | true,false | false | Output stored strings and thread names separately, with one value per line. This affects JSON and text output only. |
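As an illustration of the export switches, the sketch below derives an output name from a report name and prints the matching export command. The report name my_test.qdrep is hypothetical, and the explicit --output only mirrors what the default naming rule is described as doing.

```shell
#!/bin/sh
# Hypothetical report produced by an earlier collection.
report=my_test.qdrep

# Derive the output name by swapping the extension, mirroring the default
# naming rule (input name plus the extension of the chosen format).
sqlite_out="${report%.qdrep}.sqlite"

# Print the export command; remove the leading echo to execute it.
echo nsys export --type=sqlite --output="$sqlite_out" --force-overwrite=true "$report"
```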
After choosing the status command switch, the following options are available. Usage:
nsys [global-options] status [options]
Short | Long | Possible Parameters | Default | Switch Description |
---|---|---|---|---|
<none> | Returns current state of the CLI. | |||
-e | --environment | Returns information about the system regarding suitability of the profiling environment. | ||
--session | session identifier | none | Print the status of the indicated session. The option argument must represent a valid session name or ID as reported by 'nsys sessions list'. Any '%q{ENV_VAR}' pattern will be substituted with the value of the environment variable. Any '%h' pattern will be substituted with the hostname of the system. Any '%%' pattern will be substituted with '%'. |
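A possible pre-flight check before a collection is sketched below. It queries the profiling environment only when nsys is found on PATH, so it is safe to run on a machine without the tool. The preflight variable is a hypothetical name used to record what happened.

```shell
#!/bin/sh
# Pre-flight check: only query the environment when nsys is on PATH.
if command -v nsys >/dev/null 2>&1; then
    # Report on the suitability of the profiling environment.
    nsys status --environment
    preflight=ran
else
    echo "nsys not found on PATH; skipping environment check" >&2
    preflight=skipped
fi
```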
After choosing the shutdown command switch, the following options are available. Usage:
nsys [global-options] shutdown [options]
Short | Long | Possible Parameters | Default | Switch Description |
---|---|---|---|---|
--kill | none, sigkill, sigterm, signal number | sigterm | Send signal to the target application's process group. | |
--session | session identifier | none | Shutdown the indicated session. The option argument must represent a valid session name or ID as reported by 'nsys sessions list'. Any '%q{ENV_VAR}' pattern will be substituted with the value of the environment variable. Any '%h' pattern will be substituted with the hostname of the system. Any '%%' pattern will be substituted with '%'. |
After choosing the cancel command switch, the following options are available. Usage:
nsys [global-options] cancel [options]
Short | Long | Possible Parameters | Default | Switch Description |
---|---|---|---|---|
--session | session identifier | none | Cancel the indicated session. The option argument must represent a valid session name or ID as reported by 'nsys sessions list'. Any '%q{ENV_VAR}' pattern will be substituted with the value of the environment variable. Any '%h' pattern will be substituted with the hostname of the system. Any '%%' pattern will be substituted with '%'. |
After choosing the stop command switch, the following options are available. Usage:
nsys [global-options] stop [options]
Short | Long | Possible Parameters | Default | Switch Description |
---|---|---|---|---|
--session | session identifier | none | Stop the indicated session. The option argument must represent a valid session name or ID as reported by 'nsys sessions list'. Any '%q{ENV_VAR}' pattern will be substituted with the value of the environment variable. Any '%h' pattern will be substituted with the hostname of the system. Any '%%' pattern will be substituted with '%'. |
After choosing the sessions command switch, the following subcommands are available. Usage:
nsys [global-options] sessions [subcommand]
Subcommand | Description |
---|---|
list | List all active sessions including ID, name, and state information |
Version Information
nsys -v
Effect: Prints tool version information to the screen.
Default analysis run
nsys profile <application> [application-arguments]
Effect: Launch the application using the given arguments. Start collecting immediately and end collection when the application stops. Trace CUDA, OpenGL, NVTX, and OS runtime libraries APIs. Collect CPU sampling information. Profile any child processes. Generate the report#.qdstrm file in the default location, incrementing the report number if needed to avoid overwriting any existing output files.
Limited trace only run
nsys profile --trace=cuda,nvtx -d 20 --sample=none -o my_test <application> [application-arguments]
Effect: Launch the application using the given arguments. Start collecting immediately and end collection after 20 seconds or when the application ends. Trace CUDA and NVTX APIs only. Do not collect CPU sampling information. Profile any child processes. Generate the output file as my_test.qdstrm in the current working directory.
Delayed start run
nsys profile -e TEST_ONLY=0 -y 20 <application> [application-arguments]
Effect: Set environment variable TEST_ONLY=0. Launch the application using the given arguments. Start collecting after 20 seconds and end collection at application exit. Trace CUDA, OpenGL, NVTX, and OS runtime libraries APIs. Collect CPU sampling information. Profile any child processes. Generate the report#.qdstrm file in the default location, incrementing if needed to avoid overwriting any existing output files.
Collect ftrace events
nsys profile --ftrace=drm/drm_vblank_event -d 20
Effect: Collect ftrace drm_vblank_event events for 20 seconds. Generate the report#.qdstrm file in the current working directory. Note that ftrace event collection requires running as root. To get a list of ftrace events available from the kernel, run sudo cat /sys/kernel/debug/tracing/available_events.
Typical case: profile a Python script that uses CUDA
nsys profile --trace=cuda,cudnn,cublas,osrt,nvtx --delay=60 python my_dnn_script.py
Effect: Launch a Python script and start profiling it 60 seconds after the launch, tracing CUDA, cuDNN, cuBLAS, OS runtime APIs, and NVTX.
Typical case: profile an app that uses Vulkan
nsys profile --trace=vulkan,osrt,nvtx --delay=60 ./myapp
Effect: Launch an app and start profiling it 60 seconds after the launch, tracing Vulkan, OS runtime APIs, and NVTX.
Collect from beginning of application, end manually
nsys start --stop-on-exit=false
nsys launch --trace=cuda,nvtx --sample=none <application> [application-arguments]
nsys stop
Effect: Create interactive CLI process and set it up to begin collecting as soon as an application is launched. Launch the application, set up to allow tracing of CUDA and NVTX only. Stop only when explicitly requested. Generate the report#.qdstrm in the default location.
Note: If you start a collection and fail to stop the collection (or if you are allowing it to stop on exit, and the application runs for too long) your system’s storage space may be filled with collected data causing significant issues for the system. Nsight Systems will collect a different amount of data/sec depending on options, but in general Nsight Systems does not support runs of more than 5 minutes duration.
Run application, begin collection manually, run until process ends
nsys launch -w true <application> [application-arguments]
nsys start
Effect: Create interactive CLI and launch an application set up for default analysis. Send application output to the terminal. No data is collected until you manually start collection at area of interest. Profile until the application ends. Generate the report#.qdstrm in the default location.
Note: If you launch an application and that application and any descendants exit before start is called, Nsight Systems will create a fully formed .qdstrm file containing no data.
Run application, start/stop collection using cudaProfilerStart/Stop
nsys start -c cudaProfilerApi
nsys launch -w true <application> [application-arguments]
Effect: Create interactive CLI process and set it up to begin collecting as soon as a cudaProfilerStart() call is detected. Launch application for default analysis, sending application output to the terminal. Stop collection at next call to cudaProfilerStop, when the user calls nsys stop, or when the root process terminates. Generate the report#.qdstrm in the default location.
Note: If you call nsys launch before nsys start -c cudaProfilerApi and the code contains a large number of short duration cudaProfilerStart/Stop pairs, Nsight Systems may be unable to process them correctly, causing a fault. This will be corrected in a future version.
Note: The Nsight Systems CLI does not support multiple calls to the cudaProfilerStart/Stop API at this time.
Run application, start/stop collection using NVTX
nsys start -c nvtx
nsys launch -w true -p MESSAGE@DOMAIN <application> [application-arguments]
Effect: Create interactive CLI process and set it up to begin collecting as soon as an NVTX range with given message in given domain (capture range) is opened. Launch application for default analysis, sending application output to the terminal. Stop collection when all capture ranges are closed, when the user calls nsys stop, or when the root process terminates. Generate the report#.qdstrm in the default location.
Note: The Nsight Systems CLI only triggers the profiling session for the first capture range.
NVTX capture range can be specified:
Message@Domain: All ranges with given message in given domain are capture ranges. For example:
nsys launch -w true -p profiler@service ./app
Would make the profiling start when the first range with message "profiler" is opened in domain "service".
Message@*: All ranges with given message in all domains are capture ranges. For example:
nsys launch -w true -p profiler@* ./app
Would make the profiling start when the first range with message "profiler" is opened in any domain.
Message: All ranges with given message in default domain are capture ranges. For example:
nsys launch -w true -p profiler ./app
Would make the profiling start when the first range with message "profiler" is opened in the default domain.
By default, only messages provided by NVTX registered strings are considered, to avoid additional overhead. To enable checking of non-registered strings, launch your application with the NSYS_NVTX_PROFILER_REGISTER_ONLY=0 environment variable:
nsys launch -w true -p profiler@service -e NSYS_NVTX_PROFILER_REGISTER_ONLY=0 ./app
Run application, start/stop collection multiple times
The interactive CLI supports multiple sequential collections per launch.
nsys launch <application> [application-arguments]
nsys start
nsys stop
nsys start
nsys stop
nsys shutdown --kill sigkill
Effect: Create interactive CLI and launch an application set up for default analysis. Send application output to the terminal. No data is collected until the start command is executed. Collect data from start until stop requested, generate report#.qdstrm in the current working directory. Collect data from second start until the second stop request, generate report#.qdstrm (incremented by one) in the current working directory. Shut down the interactive CLI and send sigkill to the target application's process group.
Note: Calling nsys cancel after nsys start will cancel the collection without generating a report.
Use the --stats option with the nsys profile or nsys start command to generate a set of useful summary statistics.
If your run traces CUDA, these include CUDA API, Kernel, and Memory Operation statistics:
If your run traces OS runtime events or NVTX push-pop ranges:
Recipes for these statistics as well as documentation on how to create your own metrics will be available in a future version of the tool.
The CLI generates a .qdstrm file. The .qdstrm file is an intermediate result file, not intended for multiple imports. It needs to be processed, either by importing it into the GUI or by using the standalone QdstrmImporter to generate an optimized .qdrep file. Use this .qdrep file when re-opening the result on the same machine, opening the result on a different machine, or sharing results with teammates.
This version of Nsight Systems will attempt to automatically convert the .qdstrm file to a .qdrep file with the same name after the run finishes if the required libraries are available. The ability to turn off auto-conversion will be added in a later version.
Import Into the GUI
The CLI and host GUI versions must match to import a .qdstrm file successfully. The host GUI is backward compatible only with .qdrep files.
Copy the .qdstrm file you are interested in viewing to a system where the Nsight Systems host GUI is installed. Launch the Nsight Systems GUI. Select File->Import... and choose the .qdstrm file you wish to open.
Importing very large, multi-gigabyte .qdstrm files may consume all of the memory on the host computer and lock up the system. This will be fixed in a later version.
Create .qdrep Using QdstrmImporter
The CLI and QdstrmImporter versions must match to convert a .qdstrm file into a .qdrep file. This .qdrep file can then be opened in the same version or more recent versions of the GUI.
To run QdstrmImporter on the host system, find the QdstrmImporter binary in the Host-x86_64 directory in your installation. QdstrmImporter is available for all host platforms. See options below.
To run QdstrmImporter on the target system, copy the Linux Host-x86_64 directory to the target Linux system or install Nsight Systems for Linux host directly on the target. The Windows or MacOS host QdstrmImporter will not work on a Linux Target. See options below.
QdstrmImporter Option Short | QdstrmImporter Option Long | Parameter | Description |
---|---|---|---|
-h | --help | | Help message providing information about available options and their parameters. |
-v | --version | | Output QdstrmImporter version information. |
-i | --input-file | filename or path | Import .qdstrm file from this location. |
-o | --output-file | filename or path | Provide a different file name or path for the resulting .qdrep file. Default is the same name and path as the .qdstrm file |
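When a run leaves several .qdstrm files behind, the -i and -o options listed above can be applied in a loop. This is a minimal sketch: convert_all is a hypothetical helper, and the IMPORTER variable is an assumption that should point at the QdstrmImporter binary in your Host-x86_64 directory.

```shell
# Sketch: convert every .qdstrm file in a directory to a .qdrep next to it.
# IMPORTER is an assumed variable; set it to the full path of QdstrmImporter.
IMPORTER="${IMPORTER:-QdstrmImporter}"

# convert_all DIRECTORY
convert_all() {
    for _f in "$1"/*.qdstrm; do
        [ -e "$_f" ] || continue    # the pattern matched nothing
        "$IMPORTER" -i "$_f" -o "${_f%.qdstrm}.qdrep" || return 1
    done
}
```

For example, convert_all . converts the reports in the current working directory. The explicit -o only restates the documented default output name, so it can be changed to write the .qdrep files elsewhere.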
The Nsight Systems CLI has built-in API trace support via the --trace=mpi option, but only for the OpenMPI and MPICH implementations of MPI. It traces a default list of synchronous MPI APIs. If you require more control over the list of traced APIs, or if you are using a different MPI implementation, see github nvtx pmpi wrappers. You can use this documentation to generate a shared object that wraps a list of synchronous MPI APIs with NVTX using the MPI profiling interface (PMPI). If you set your LD_PRELOAD environment variable to the path of that object, nsys will capture and report the MPI API trace information when --trace=nvtx is used. In that case there is no need to use --trace=mpi.
NVTX tracing is automatically enabled when MPI trace is turned on.
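The LD_PRELOAD approach described above might be wrapped as follows. This is a sketch under stated assumptions: profile_with_pmpi_wrapper is a hypothetical helper, the wrapper library path is whatever you built from the PMPI wrapper sources, and NSYS is an assumed override variable.

```shell
# Sketch: profile an application with a user-built NVTX PMPI wrapper preloaded.
# NSYS is an assumed override hook; it defaults to the real nsys binary.
NSYS="${NSYS:-nsys}"

# profile_with_pmpi_wrapper WRAPPER_SO APPLICATION [ARGS...]
profile_with_pmpi_wrapper() {
    _wrapper="$1"
    shift
    if [ ! -f "$_wrapper" ]; then
        echo "wrapper library not found: $_wrapper" >&2
        return 1
    fi
    # LD_PRELOAD is set only for this one command, not for the calling shell.
    LD_PRELOAD="$_wrapper" "$NSYS" profile --trace=nvtx "$@"
}
```

Checking that the library exists first avoids a run that silently records no MPI ranges because the preload path was wrong.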
This version of the Nsight Systems CLI supports concurrent use of the
nsys profile
command. Each instance will create a separate report file.
You cannot use multiple instances of the interactive CLI concurrently, or use
the interactive CLI concurrently with nsys profile
in this version.
Nsight Systems can be used to profile applications launched with the mpirun
command. Since concurrent use of the CLI is supported only when using the
nsys profile
command, Nsight Systems cannot profile each node from the
GUI or from the interactive CLI.
To profile everything, putting the data in one file:
nsys profile [nsys options] mpirun [mpi options]
To profile everything, putting the data from each rank into a separate file:
mpirun [mpi options] nsys profile [nsys options]
To profile a single MPI process use a wrapper script. The following script (called "wrap.sh") runs nsys on rank 0 only:
#!/bin/bash
# Profile rank 0 only; every other rank runs the application unprofiled.
if [[ $OMPI_COMM_WORLD_RANK == 0 ]]; then
    ~/nsys/nsys profile ./myapp "$@" --mydummyargument
else
    ./myapp "$@"
fi
and then execute mpirun ./wrap.sh.
Note: Currently you will need a dummy argument to the process, so that Nsight Systems can decide which process to profile. This means that your process must accept dummy arguments to take advantage of this workaround. This script as written is for Open MPI, but should be easily adaptable to other MPI implementations.
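One way to adapt the wrapper to other launchers is to read the rank from whichever environment variable is set. The sketch below is not part of Nsight Systems: mpi_rank is a hypothetical helper, and it assumes OMPI_COMM_WORLD_RANK (Open MPI), PMI_RANK (MPICH with the Hydra launcher), and SLURM_PROCID (Slurm srun). Confirm which variable your launcher actually sets before relying on it.

```shell
# Sketch: print this process's MPI rank from launcher-provided variables.
# Assumed variables: OMPI_COMM_WORLD_RANK, PMI_RANK, SLURM_PROCID.
# The first one that is set and non-empty wins.
mpi_rank() {
    for _v in "${OMPI_COMM_WORLD_RANK:-}" "${PMI_RANK:-}" "${SLURM_PROCID:-}"; do
        if [ -n "$_v" ]; then
            printf '%s\n' "$_v"
            return 0
        fi
    done
    return 1    # none of the known rank variables is set
}
```

In wrap.sh, the rank test could then be written as if [ "$(mpi_rank)" = 0 ]; then, leaving the rest of the script unchanged.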
Copyright (c) 2012-2020, NVIDIA Corporation. All rights reserved.