Profiling from the CLI on Linux Devices

Installing the CLI on Your Target

The Nsight Systems CLI provides a simple interface to collect data on a target without using the GUI. The collected data can then be copied to any system and analyzed later.

The CLI is distributed in the Target directory of the standard Nsight Systems download package. Users who want to install the CLI as a standalone tool can do so by copying the files within the Target directory. If you want the CLI output file (.qdstrm) to be auto-converted (to .qdrep) after the analysis is complete, you will need to copy the host directory as well.

If you wish to run the CLI without root (recommended mode), you will want to install in a directory where you have full access.

Command Line Options

The Nsight Systems command lines can have one of two forms:

nsys [global_option]

or

nsys [command_switch] [optional command_switch_options]
     [application] [optional application_options]

All command line options are case sensitive. For command switch options, when short options are used, the parameters should follow the switch after a space; e.g. -s cpu. When long options are used, the switch should be followed by an equal sign and then the parameter(s); e.g. --sample=cpu.

For this version of Nsight Systems, you must launch a process from the command line to begin analysis. If an instance of the requested process is already running when the CLI command is issued, the collection will fail. The launched process will be terminated when collection is complete unless the user specifies the --kill none option (details below).

The Nsight Systems CLI supports concurrent analysis by using sessions. Each Nsight Systems session is defined by a sequence of CLI commands that define one or more collections (e.g. when and what data is collected). A session begins with either a start, launch, or profile command. A session ends with a shutdown command, when a profile command terminates, or, if requested, when all the process tree(s) launched in the session exit. Multiple sessions can run concurrently on the same system.
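The session workflow described above can be sketched as a short command sequence. This is only an illustration: the session name my_session and the application ./myapp are placeholders, and the RUN variable is a convenience introduced here so that the commands are printed rather than executed unless RUN is set to an empty value. Because launch may wait on the application (see the --wait option), the later commands may need to be issued from a second terminal when run for real.

```shell
#!/bin/bash
# Sketch only: SESSION and APP are hypothetical placeholders.
SESSION="my_session"
APP="./myapp"

# Dry run by default: commands are printed, not executed.
# Set RUN= (empty) in the environment to execute them.
RUN="${RUN-echo}"

session_sequence() {
    # A session begins with a start, launch, or profile command.
    $RUN nsys start --session-new="$SESSION" --stop-on-exit=false
    # Launch the application into the session created above.
    $RUN nsys launch --session="$SESSION" --trace=cuda,nvtx "$APP"
    # Show the ID, name, and state of every active session.
    $RUN nsys sessions list
    # Stop the collection, then end the session.
    $RUN nsys stop --session="$SESSION"
    $RUN nsys shutdown --session="$SESSION"
}

session_sequence
```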

A couple of notes about the use of paths in your command line.

CLI Global Options

| Global Option Short | Global Option Long | Description |
|---|---|---|
| -h | --help | Help message providing information about available command switches and their options. |
| -v | --version | Output Nsight Systems CLI version information. |

CLI Command Switches

The Nsight Systems command line interface can be used in two modes. You may launch your application and begin analysis with options specified to the nsys profile command. Alternatively, you can control the launch of an application and data collection using interactive CLI commands.

| Command | Description |
|---|---|
| profile | A fully formed profiling description requiring and accepting no further input. The command switch options used (see the table below) determine when the collection starts and stops, what collectors are used (e.g. API trace, IP sampling, etc.), what processes are monitored, etc. |
| start | Start a collection in interactive mode. The start command can be executed before or after a launch command. |
| stop | Stop a collection that was started in interactive mode. When executed, all active collections stop and the CLI process terminates, but the application continues running. |
| cancel | Cancels an existing collection started in interactive mode. All data already collected in the current collection is discarded. |
| launch | In interactive mode, launches an application in an environment that supports the requested options. The launch command can be executed before or after a start command. |
| shutdown | Disconnects the CLI process from the launched application and forces the CLI process to exit. If a collection is pending or active, it is cancelled. |
| export | Generates an export file from an existing .qdrep file. For more information about the exported formats see the /documentation/nsys-exporter directory in your Nsight Systems installation directory. |
| status | Reports on the status of a CLI-based collection or the suitability of the profiling environment. |
| sessions | Gives information about all sessions running on the system. |
| nvprof | Special option to help with transition from legacy NVIDIA nvprof tool. Calling "nsys nvprof [options]" will provide the best available translation of "nvprof [options]". See Migrating from NVIDIA nvprof topic for details. No additional functionality of nsys will be available when using this option. Note: Not available on IBM Power targets. |

CLI Profile Command Switch Options

After choosing the profile command switch, the following options are available. Usage:

nsys [global-options] profile [options] <application> [application-arguments]
| Short | Long | Possible Parameters | Default | Switch Description |
|---|---|---|---|---|
| -t | --trace | cublas, cuda, cudnn, nvtx, opengl, openacc, openmp, osrt, mpi, vulkan, none | cuda, opengl, nvtx, osrt | Select the API(s) to be traced. The osrt switch controls the OS runtime libraries tracing. Multiple APIs can be selected, separated by commas only (no spaces). Since OpenACC, cuDNN and cuBLAS APIs are tightly linked with CUDA, selecting one of those APIs will automatically enable CUDA tracing. See information on --mpi-impl option below if mpi is selected. If the none option is selected, no APIs are traced and no other API can be selected. Note: cublas, cudnn, opengl, and vulkan are not available on IBM Power target. |
| | --mpi-impl | openmpi, mpich | openmpi | When using --trace=mpi to trace MPI APIs, use --mpi-impl to specify which MPI implementation the application is using. If you are using a different MPI implementation, see Tracing MPI API calls section below. Calling --mpi-impl without --trace=mpi is not supported. |
| -s | --sample | cpu, none | cpu | Select whether or not to collect CPU samples. If none is selected, sampling is disabled. |
| -b | --backtrace | fp, lbr, dwarf, none | lbr | Select the backtrace method to use while sampling. The option lbr uses Intel(c) Corporation's Last Branch Records, available only with Intel(c) CPUs codenamed Haswell and later. The option fp is frame pointer and assumes that frame pointers were enabled during compilation. The option dwarf uses DWARF's CFI (Call Frame Information). Note: Only frame pointers are available for backtrace on IBM Power. |
| | --command-file | < filename > | none | Open a file that contains profile switches and parse the switches. Note: additional switches on the command line will override switches in the file. |
| -y | --delay | < seconds > | 0 | Collection start delay in seconds. |
| -d | --duration | < seconds > | NA | Collection duration in seconds; the duration must be greater than zero. Note that the profiler does not detach from the application; it lives until application termination. |
| -e | --env-var | A=B | NA | Set environment variable(s) for the application process to be launched. Environment variables should be defined as A=B. Multiple environment variables can be specified as A=B,C=D. |
| | --osrt-threshold | < nanoseconds > | 1000 ns | Set the minimum time that an OS runtime event must take before it is collected. Setting this value too low can cause high application overhead and seriously increase the size of your results file. Note: Not available for IBM Power targets. |
| | --cudabacktrace | true, false | false | When tracing CUDA APIs, this option enables collection of a backtrace when a CUDA API is invoked. This may lead to significant runtime overhead. See the --cudabacktrace-threshold switch. Note: CPU sampling must be enabled to collect CUDA API backtraces. Note: Not available on IBM Power targets. |
| | --cudabacktrace-threshold | < nanoseconds > | 1000 ns | Set the duration, in nanoseconds, that CUDA APIs must execute before backtraces are collected. Setting this value too low can cause high application overhead and seriously increase the size of your results file. Note: Not relevant to IBM Power targets. |
| -o | --output | < filename > | report# | Set the .qdstrm filename. Any %q{ENV_VAR} pattern in the name will be substituted with the value of the environment variable. Any %h pattern in the filename will be substituted with the hostname of the system. The extension .qdstrm will be automatically appended. The default is report1.qdstrm, with the number incrementing to avoid overwriting files, in /home/user/nvidia_nsight_systems working directory. |
| | --export | sqlite, none | none | Create additional output file(s) based on the data collected. Current options are sqlite or none. WARNING: If the collection captures a large amount of data, creating the database file may take several minutes to complete. |
| | --stats | true, false | false | Generate summary statistics after the collection. WARNING: When set to true, an SQLite database will be created after the collection. If the collection captures a large amount of data, creating the database file may take several minutes to complete. |
| -f | --force-overwrite | true, false | false | If true, overwrite all existing result files with the same output filename (.qdstrm, .qdrep, .sqlite). |
| -w | --show-output | true, false | true | If true, send target process’ stdout and stderr streams to the console. |
| -n | --inherit-environment | true, false | true | When true, the current environment variables and the tool’s environment variables will be specified for the launched process. When false, only the tool’s environment variables will be specified for the launched process. |
| -x | --stop-on-exit | true, false | true | If true, stop collecting automatically when the launched process has exited or when the duration expires, whichever occurs first. If false, duration must be set and the collection stops only when the duration expires. Nsight Systems does not officially support runs longer than 5 minutes. |
| | --wait | primary, all | all | If primary, the CLI will wait on the application process termination. If all, the CLI will additionally wait on re-parented processes created by the application. |
| | --trace-fork-before-exec | true, false | false | If true, trace any child process after fork and before they call one of the exec functions. Beware, tracing in this interval relies on undefined behavior and might cause your application to crash or deadlock. |
| -c | --capture-range | none, cudaProfilerApi, nvtx | none | When -c cudaProfilerApi (or nvtx) is used, profiling will start only when cudaProfilerStart API is invoked or the specified NVTX range (specified using -p/--nvtx-capture) is started in the application. |
| | --stop-on-range-end | true, false | true | Stop profiling when the capture range ends. Applicable only when used along with --capture-range option. |
| -p | --nvtx-capture | range@domain, range, range@ | | Specify NVTX capture range. See below for details. This option is applicable only when used along with --capture-range=nvtx. |
| | --ftrace | | | Collect ftrace events. Argument should list events to collect as: subsystem1/event1,subsystem2/event2. Requires root. No ftrace events are collected by default. Note: Not available on IBM Power targets. |
| | --ftrace-keep-user-config | | | Skip initial ftrace setup and collect already configured events. Default resets the ftrace configuration. |
| | --gpuctxsw | true, false | false | Trace GPU context switches. Note that this requires driver r435.17 or later and root permission. Not available on IBM Power targets. |
| | --kill | none, sigkill, sigterm, signal number | sigterm | Send signal to the target application's process group. |
| | --session-new | [a-Z][0-9,a-Z,spaces] | profile-<id>-<application> | Name the session created by the command. Name must start with an alphabetical character followed by printable or space characters. Any '%q{ENV_VAR}' pattern will be substituted with the value of the environment variable. Any '%h' pattern will be substituted with the hostname of the system. Any '%%' pattern will be substituted with '%'. |
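To make the interaction between --capture-range, --nvtx-capture, and --stop-on-range-end more concrete, here is a small sketch. The function names, the range MESSAGE@DOMAIN, and the application ./myapp are placeholders, and the RUN variable is a dry-run convenience assumed for this example; the commands are only printed unless RUN is set to an empty value.

```shell
#!/bin/bash
# Dry run by default: commands are printed, not executed.
# Set RUN= (empty) in the environment to execute them.
RUN="${RUN-echo}"

# Collect only while the NVTX range "$1" in domain "$2" is open.
# Remaining arguments are the application and its arguments.
profile_nvtx_range() {
    range="$1"; domain="$2"; shift 2
    $RUN nsys profile --trace=cuda,nvtx \
        --capture-range=nvtx --nvtx-capture="$range@$domain" \
        --stop-on-range-end=true "$@"
}

# Start collecting at cudaProfilerStart and stop when that range ends.
profile_cuda_profiler_api() {
    $RUN nsys profile --trace=cuda --capture-range=cudaProfilerApi \
        --stop-on-range-end=true "$@"
}

profile_nvtx_range MESSAGE DOMAIN ./myapp
profile_cuda_profiler_api ./myapp
```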

CLI Launch Command Switch Options

After choosing the launch command switch, the following options are available. Usage:

nsys [global-options] launch [options] <application> [application-arguments]
| Short | Long | Possible Parameters | Default | Switch Description |
|---|---|---|---|---|
| -t | --trace | cublas, cuda, cudnn, nvtx, opengl, openacc, openmp, osrt, mpi, vulkan, none | cuda, opengl, nvtx, osrt | Select the API(s) to be traced. The osrt switch controls the OS runtime libraries tracing. Multiple APIs can be selected, separated by commas only (no spaces). Since OpenACC, cuDNN and cuBLAS APIs are tightly linked with CUDA, selecting one of those APIs will automatically enable CUDA tracing. See information on --mpi-impl option below if mpi is selected. If the none option is selected, no APIs are traced and no other API can be selected. Note: cublas, cudnn, opengl, and vulkan are not available on IBM Power target. |
| | --mpi-impl | openmpi, mpich | openmpi | When using --trace=mpi to trace MPI APIs, use --mpi-impl to specify which MPI implementation the application is using. If you are using a different MPI implementation, see Tracing MPI API calls section below. Calling --mpi-impl without --trace=mpi is not supported. |
| -s | --sample | cpu, none | cpu | Select whether or not to collect CPU samples. If none is selected, sampling is disabled. |
| -b | --backtrace | fp, lbr, dwarf, none | lbr | Select the backtrace method to use while sampling. The option lbr uses Intel(c) Corporation's Last Branch Records, available only with Intel(c) CPUs codenamed Haswell and later. The option fp is frame pointer and assumes that frame pointers were enabled during compilation. The option dwarf uses DWARF's CFI (Call Frame Information). Note: Only frame pointers are available for backtrace on IBM Power. |
| | --command-file | < filename > | none | Open a file that contains launch switches and parse the switches. Note: additional switches on the command line will override switches in the file. |
| -e | --env-var | A=B | NA | Set environment variable(s) for the application process to be launched. Environment variables should be defined as A=B. Multiple environment variables can be specified as A=B,C=D. |
| | --osrt-threshold | < nanoseconds > | 1000 ns | Set the minimum time that an OS runtime event must take before it is collected. Setting this value too low can cause high application overhead and seriously increase the size of your results file. Note: Not available for IBM Power targets. |
| | --cudabacktrace | true, false | false | When tracing CUDA APIs, this option enables collection of a backtrace when a CUDA API is invoked. This may lead to significant runtime overhead. See the --cudabacktrace-threshold switch. Note: CPU sampling must be enabled to collect CUDA API backtraces. Note: Not available on IBM Power targets. |
| | --cudabacktrace-threshold | < nanoseconds > | 1000 ns | Set the duration, in nanoseconds, that CUDA APIs must execute before backtraces are collected. Setting this value too low can cause high application overhead and seriously increase the size of your results file. Note: Not relevant to IBM Power targets. |
| -w | --show-output | true, false | true | If true, send target process’ stdout and stderr streams to the console. |
| -n | --inherit-environment | true, false | true | When true, the current environment variables and the tool’s environment variables will be specified for the launched process. When false, only the tool’s environment variables will be specified for the launched process. |
| -p | --nvtx-capture | message@domain | none | Specify NVTX capture range. See below for details. |
| | --trace-fork-before-exec | true, false | false | If true, trace any child process after fork and before they call one of the exec functions. Beware, tracing in this interval relies on undefined behavior and might cause your application to crash or deadlock. |
| | --wait | primary, all | all | If primary, the CLI will wait on the application process termination. If all, the CLI will additionally wait on re-parented processes created by the application. |
| | --session | session identifier | none | Launch the application in the indicated session. The option argument must represent a valid session name or ID as reported by 'nsys sessions list'. Any '%q{ENV_VAR}' pattern will be substituted with the value of the environment variable. Any '%h' pattern will be substituted with the hostname of the system. Any '%%' pattern will be substituted with '%'. |
| | --session-new | [a-Z][0-9,a-Z,spaces] | [default] | Launch the application in a new session. Name must start with an alphabetical character followed by printable or space characters. Any '%q{ENV_VAR}' pattern will be substituted with the value of the environment variable. Any '%h' pattern will be substituted with the hostname of the system. Any '%%' pattern will be substituted with '%'. |

CLI Start Command Switch Options

After choosing the start command switch, the following options are available. Usage:

nsys [global-options] start [options]
| Short | Long | Possible Parameters | Default | Switch Description |
|---|---|---|---|---|
| -c | --capture-range | none, cudaProfilerApi, nvtx | none | If set to cudaProfilerApi, profiling will start on the first call to cudaProfilerStart. Valid only with CUDA tracing enabled. If set to nvtx, profiling will start when the first NVTX capture range is started (see below for NVTX capture range definition). |
| -o | --output | < filename > | report# | Set the .qdstrm filename. Any %q{ENV_VAR} pattern in the name will be substituted with the value of the environment variable. Any %h pattern in the filename will be substituted with the hostname of the system. The extension .qdstrm will be automatically appended. The default is report1.qdstrm, with the number incrementing to avoid overwriting files, in /home/user/nvidia_nsight_systems working directory. |
| | --export | sqlite, none | none | Create additional output file(s) based on the data collected. Current options are sqlite or none. WARNING: If the collection captures a large amount of data, creating the database file may take several minutes to complete. |
| | --stats | true, false | false | Generate summary statistics after the collection. WARNING: When set to true, an SQLite database will be created after the collection. If the collection captures a large amount of data, creating the database file may take several minutes to complete. |
| -f | --force-overwrite | true, false | false | If true, overwrite all existing result files with the same output filename (.qdstrm, .qdrep, .sqlite). |
| -x | --stop-on-exit | true, false | true | If true, stop collecting automatically when all tracked processes have exited or when the `stop` command is issued, whichever occurs first. If false, stop only on the `stop` command. Note: When this is true, the `stop` command is optional. Nsight Systems does not officially support runs longer than 5 minutes. |
| | --stop-on-range-end | true, false | true | If true, stop collecting when the specified capture range ends. Valid only when --capture-range is set. |
| | --ftrace | | | Collect ftrace events. Argument should list events to collect as: subsystem1/event1,subsystem2/event2. Requires root. No ftrace events are collected by default. Note: Not supported on IBM Power targets. |
| | --ftrace-keep-user-config | | | Skip initial ftrace setup and collect already configured events. Default resets the ftrace configuration. |
| | --gpuctxsw | true, false | false | Trace GPU context switches. Note that this requires driver r435.17 or later and root permission. Not supported on IBM Power targets. |
| | --session | session identifier | none | Start the application in the indicated session. The option argument must represent a valid session name or ID as reported by 'nsys sessions list'. Any '%q{ENV_VAR}' pattern will be substituted with the value of the environment variable. Any '%h' pattern will be substituted with the hostname of the system. Any '%%' pattern will be substituted with '%'. |
| | --session-new | [a-Z][0-9,a-Z,spaces] | [default] | Start the application in a new session. Name must start with an alphabetical character followed by printable or space characters. Any '%q{ENV_VAR}' pattern will be substituted with the value of the environment variable. Any '%h' pattern will be substituted with the hostname of the system. Any '%%' pattern will be substituted with '%'. |

CLI Export Command Switch Options

After choosing the export command switch, the following options are available. Usage:

nsys [global-options] export [options] [qdrep-file]
| Short | Long | Possible Parameters | Default | Switch Description |
|---|---|---|---|---|
| -o | --output | <filename> | <inputfile.ext> | Set the output filename. The default is the .qdrep filename with the extension for the chosen format. |
| -t | --type | sqlite, hdf, text, json, info | sqlite | Export format type. HDF format is supported only on x86_64 Linux and Windows. |
| -f | --force-overwrite | true, false | false | If true, overwrite existing result file. |
| -q | --quiet | true, false | false | If true, do not display progress bar. |
| | --separate-strings | true, false | false | Output stored strings and thread names separately, with one value per line. This affects JSON and text output only. |
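As an illustration of the export options in the table above, the sketch below exports an existing report to a chosen format. The function name and file names are placeholders, the choice to reuse the format name as the output extension is specific to this sketch, and the RUN variable is a dry-run convenience assumed here; commands are only printed unless RUN is set to an empty value.

```shell
#!/bin/bash
# Dry run by default: commands are printed, not executed.
# Set RUN= (empty) in the environment to execute them.
RUN="${RUN-echo}"

# Export a .qdrep file to the given format (default: sqlite).
# The output name is passed explicitly; this sketch simply reuses the
# format name as the file extension.
export_report() {
    qdrep="$1"
    format="${2:-sqlite}"
    output="${qdrep%.qdrep}.$format"
    $RUN nsys export --type="$format" --output="$output" \
        --force-overwrite=true "$qdrep"
}

export_report my_test.qdrep
export_report my_test.qdrep json
```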

CLI Status Command Switch Options

After choosing the status command switch, the following options are available. Usage:

nsys [global-options] status [options]
| Short | Long | Possible Parameters | Default | Switch Description |
|---|---|---|---|---|
| <none> | | | | Returns current state of the CLI. |
| -e | --environment | | | Returns information about the system regarding suitability of the profiling environment. |
| | --session | session identifier | none | Print the status of the indicated session. The option argument must represent a valid session name or ID as reported by 'nsys sessions list'. Any '%q{ENV_VAR}' pattern will be substituted with the value of the environment variable. Any '%h' pattern will be substituted with the hostname of the system. Any '%%' pattern will be substituted with '%'. |
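Before starting a collection it can be useful to check the environment first. The sketch below is one way to wrap nsys status --environment; the function name preflight is hypothetical, and the check that the binary is on PATH is an addition of this sketch rather than something the tool requires.

```shell
#!/bin/bash
# Check that an nsys binary is reachable and, if so, ask it to report on
# the suitability of the profiling environment.
# $1: name or path of the nsys binary (defaults to "nsys").
preflight() {
    nsys_bin="${1:-nsys}"
    if ! command -v "$nsys_bin" >/dev/null 2>&1; then
        echo "preflight: '$nsys_bin' not found" >&2
        return 1
    fi
    "$nsys_bin" status --environment
}

preflight || echo "profiling environment not verified"
```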

CLI Shutdown Command Switch Options

After choosing the shutdown command switch, the following options are available. Usage:

nsys [global-options] shutdown [options]
| Short | Long | Possible Parameters | Default | Switch Description |
|---|---|---|---|---|
| | --kill | none, sigkill, sigterm, signal number | sigterm | Send signal to the target application's process group. |
| | --session | session identifier | none | Shutdown the indicated session. The option argument must represent a valid session name or ID as reported by 'nsys sessions list'. Any '%q{ENV_VAR}' pattern will be substituted with the value of the environment variable. Any '%h' pattern will be substituted with the hostname of the system. Any '%%' pattern will be substituted with '%'. |

CLI Cancel Command Switch Options

After choosing the cancel command switch, the following options are available. Usage:

nsys [global-options] cancel [options]
| Short | Long | Possible Parameters | Default | Switch Description |
|---|---|---|---|---|
| | --session | session identifier | none | Cancel the indicated session. The option argument must represent a valid session name or ID as reported by 'nsys sessions list'. Any '%q{ENV_VAR}' pattern will be substituted with the value of the environment variable. Any '%h' pattern will be substituted with the hostname of the system. Any '%%' pattern will be substituted with '%'. |

CLI Stop Command Switch Options

After choosing the stop command switch, the following options are available. Usage:

nsys [global-options] stop [options]
| Short | Long | Possible Parameters | Default | Switch Description |
|---|---|---|---|---|
| | --session | session identifier | none | Stop the indicated session. The option argument must represent a valid session name or ID as reported by 'nsys sessions list'. Any '%q{ENV_VAR}' pattern will be substituted with the value of the environment variable. Any '%h' pattern will be substituted with the hostname of the system. Any '%%' pattern will be substituted with '%'. |

CLI Sessions Command Switch Subcommands

After choosing the sessions command switch, the following subcommands are available. Usage:

nsys [global-options] sessions [subcommand]
| Subcommand | Description |
|---|---|
| list | List all active sessions including ID, name, and state information |

Example Single Command Lines

Version Information

nsys -v

Effect: Prints tool version information to the screen.

Default analysis run

nsys profile <application> [application-arguments]

Effect: Launch the application using the given arguments. Start collecting immediately and end collection when the application stops. Trace CUDA, OpenGL, NVTX, and OS runtime libraries APIs. Collect CPU sampling information. Profile any child processes. Generate the report#.qdstrm file in the default location, incrementing the report number if needed to avoid overwriting any existing output files.

Limited trace only run

nsys profile --trace=cuda,nvtx -d 20 --sample=none \
  -o my_test <application> [application-arguments]

Effect: Launch the application using the given arguments. Start collecting immediately and end collection after 20 seconds or when the application ends. Trace CUDA and NVTX APIs only. Do not collect CPU sampling information. Profile any child processes. Generate the output file as my_test.qdstrm in the current working directory.

Delayed start run

nsys profile -e TEST_ONLY=0 -y 20 <application> [application-arguments]

Effect: Set environment variable TEST_ONLY=0. Launch the application using the given arguments. Start collecting after 20 seconds and end collection at application exit. Trace CUDA, OpenGL, NVTX, and OS runtime libraries APIs. Collect CPU sampling information. Profile any child processes. Generate the report#.qdstrm file in the default location, incrementing if needed to avoid overwriting any existing output files.

Collect ftrace events

nsys profile --ftrace=drm/drm_vblank_event -d 20

Effect: Collect ftrace drm_vblank_event events for 20 seconds. Generate the report#.qdstrm file in the current working directory. Note that ftrace event collection requires running as root. To get a list of ftrace events available from the kernel, run sudo cat /sys/kernel/debug/tracing/available_events.
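The kernel lists available events as subsystem:event, one per line, whereas --ftrace expects a comma-separated subsystem/event list. A minimal sketch of a helper that converts one into the other follows; the function name ftrace_events_for is hypothetical, and reading available_events generally requires root.

```shell
#!/bin/bash
# Print the events of one ftrace subsystem as a comma-separated
# subsystem/event list suitable for --ftrace.
# $1: subsystem name, e.g. drm
# $2: events file (defaults to the kernel's available_events)
ftrace_events_for() {
    events_file="${2:-/sys/kernel/debug/tracing/available_events}"
    # Keep lines of the requested subsystem, turn "sub:event" into
    # "sub/event", and join the lines with commas.
    grep "^$1:" "$events_file" | tr ':' '/' | paste -sd, -
}

# Example (as root):
#   nsys profile --ftrace="$(ftrace_events_for drm)" -d 20
```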

Typical case: profile a Python script that uses CUDA

nsys profile --trace=cuda,cudnn,cublas,osrt,nvtx --delay=60 python my_dnn_script.py

Effect: Launch a Python script and start profiling it 60 seconds after the launch, tracing CUDA, cuDNN, cuBLAS, OS runtime APIs, and NVTX.

Typical case: profile an app that uses Vulkan

nsys profile --trace=vulkan,osrt,nvtx --delay=60 ./myapp

Effect: Launch an app and start profiling it 60 seconds after the launch, tracing Vulkan, OS runtime APIs, and NVTX.

Example Interactive CLI Command Sequences

Collect from beginning of application, end manually

nsys start --stop-on-exit=false
nsys launch --trace=cuda,nvtx --sample=none <application> [application-arguments]
nsys stop

Effect: Create interactive CLI process and set it up to begin collecting as soon as an application is launched. Launch the application, set up to allow tracing of CUDA and NVTX only. Stop only when explicitly requested. Generate the report#.qdstrm in the default location.

Note: If you start a collection and fail to stop the collection (or if you are allowing it to stop on exit, and the application runs for too long), your system’s storage space may be filled with collected data, causing significant issues for the system. Nsight Systems will collect a different amount of data/sec depending on options, but in general Nsight Systems does not support runs of more than 5 minutes duration.

Run application, begin collection manually, run until process ends

nsys launch -w true <application> [application-arguments]
nsys start

Effect: Create interactive CLI and launch an application set up for default analysis. Send application output to the terminal. No data is collected until you manually start collection at area of interest. Profile until the application ends. Generate the report#.qdstrm in the default location.

Note: If you launch an application and that application and any descendants exit before start is called, Nsight Systems will create a fully formed .qdstrm file containing no data.

Run application, start/stop collection using cudaProfilerStart/Stop

nsys start -c cudaProfilerApi
nsys launch -w true <application> [application-arguments]

Effect: Create interactive CLI process and set it up to begin collecting as soon as a cudaProfilerStart() call is detected. Launch application for default analysis, sending application output to the terminal. Stop collection at next call to cudaProfilerStop, when the user calls nsys stop, or when the root process terminates. Generate the report#.qdstrm in the default location.

Note: If you call nsys launch before nsys start -c cudaProfilerApi and the code contains a large number of short duration cudaProfilerStart/Stop pairs, Nsight Systems may be unable to process them correctly, causing a fault. This will be corrected in a future version.

Note: The Nsight Systems CLI does not support multiple calls to the cudaProfilerStart/Stop API at this time.

Run application, start/stop collection using NVTX

nsys start -c nvtx
nsys launch -w true -p MESSAGE@DOMAIN <application> [application-arguments]

Effect: Create interactive CLI process and set it up to begin collecting as soon as an NVTX range with given message in given domain (capture range) is opened. Launch application for default analysis, sending application output to the terminal. Stop collection when all capture ranges are closed, when the user calls nsys stop, or when the root process terminates. Generate the report#.qdstrm in the default location.

Note: The Nsight Systems CLI only triggers the profiling session for the first capture range.

The NVTX capture range is specified with the -p/--nvtx-capture option, in one of the forms range@domain, range, or range@.

By default, only messages provided by NVTX registered strings are considered, to avoid additional overhead. To also check non-registered strings, launch your application with the environment variable NSYS_NVTX_PROFILER_REGISTER_ONLY=0:

nsys launch -w true -p profiler@service -e NSYS_NVTX_PROFILER_REGISTER_ONLY=0 ./app

Run application, start/stop collection multiple times

The interactive CLI supports multiple sequential collections per launch.

nsys launch <application> [application-arguments]
nsys start
nsys stop
nsys start
nsys stop
nsys shutdown --kill sigkill

Effect: Create interactive CLI and launch an application set up for default analysis. Send application output to the terminal. No data is collected until the start command is executed. Collect data from start until stop is requested, generate report#.qdstrm in the current working directory. Collect data from the second start until the second stop request, generate report#.qdstrm (incremented by one) in the current working directory. Shut down the interactive CLI and send sigkill to the target application's process group.

Note: Calling nsys cancel after nsys start will cancel the collection without generating a report.

Example Output from --stats Option

Use the --stats option with the nsys profile or nsys start command to generate a set of useful summary statistics.

If your run traces CUDA, these include CUDA API, Kernel, and Memory Operation statistics:

[Figure: CUDA statistics]

If your run traces OS runtime events or NVTX push-pop ranges:

[Figure: OS runtime and NVTX statistics]

Recipes for these statistics as well as documentation on how to create your own metrics will be available in a future version of the tool.

Importing and Viewing Command Line Results Files

The CLI generates a .qdstrm file. The .qdstrm file is an intermediate result file, not intended for multiple imports. It needs to be processed, either by importing it into the GUI or by using the standalone QdstrmImporter to generate an optimized .qdrep file. Use this .qdrep file when re-opening the result on the same machine, opening the result on a different machine, or sharing results with teammates.

This version of Nsight Systems will attempt to automatically convert the .qdstrm file to a .qdrep file with the same name after the run finishes if the required libraries are available. The ability to turn off auto-conversion will be added in a later version.

Import Into the GUI

The CLI and host GUI versions must match to import a .qdstrm file successfully. The host GUI is backward compatible only with .qdrep files.

Copy the .qdstrm file you are interested in viewing to a system where the Nsight Systems host GUI is installed. Launch the Nsight Systems GUI. Select File->Import... and choose the .qdstrm file you wish to open.

[Figure: Import qdstrm]

Importing very large (multi-gigabyte) .qdstrm files may take up all of the memory on the host computer and lock up the system. This will be fixed in a later version.

Create .qdrep Using QdstrmImporter

The CLI and QdstrmImporter versions must match to convert a .qdstrm file into a .qdrep file. This .qdrep file can then be opened in the same version or more recent versions of the GUI.

To run QdstrmImporter on the host system, find the QdstrmImporter binary in the Host-x86_64 directory in your installation. QdstrmImporter is available for all host platforms. See options below.

To run QdstrmImporter on the target system, copy the Linux Host-x86_64 directory to the target Linux system or install Nsight Systems for Linux host directly on the target. The Windows or MacOS host QdstrmImporter will not work on a Linux Target. See options below.

| QdstrmImporter Option Short | QdstrmImporter Option Long | Parameter | Description |
|---|---|---|---|
| -h | --help | | Help message providing information about available options and their parameters. |
| -v | --version | | Output QdstrmImporter version information. |
| -i | --input-file | filename or path | Import .qdstrm file from this location. |
| -o | --output-file | filename or path | Provide a different file name or path for the resulting .qdrep file. Default is the same name and path as the .qdstrm file. |
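When several collections have produced .qdstrm files, the conversion can be scripted. The following is a minimal sketch using only the -i and -o options listed above; the function names are hypothetical, and the IMPORTER variable assumes the QdstrmImporter binary is on PATH or that its full path is supplied by the user.

```shell
#!/bin/bash
# Importer binary; assumed to be on PATH unless IMPORTER is set.
IMPORTER="${IMPORTER:-QdstrmImporter}"

# Derive the .qdrep name that corresponds to a .qdstrm file.
qdrep_name() {
    printf '%s\n' "${1%.qdstrm}.qdrep"
}

# Convert every .qdstrm file in directory $1 (default: current directory).
convert_all() {
    for f in "${1:-.}"/*.qdstrm; do
        # With no matches the glob stays literal; skip it.
        [ -e "$f" ] || continue
        "$IMPORTER" -i "$f" -o "$(qdrep_name "$f")"
    done
}
```

Calling convert_all with a directory of results would run the importer once per file; passing -o explicitly is redundant with the default name but keeps the destination visible.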

Using the CLI to Analyze MPI Codes

Tracing MPI API calls

The Nsight Systems CLI has built-in API trace support, via the --trace=mpi option, only for the OpenMPI and MPICH implementations of MPI. It traces a default list of synchronous MPI APIs. If you require more control over the list of traced APIs, or if you are using a different MPI implementation, see the NVTX PMPI wrappers on GitHub. You can use that documentation to generate a shared object that wraps a list of synchronous MPI APIs with NVTX using the MPI profiling interface (PMPI). If you set your LD_PRELOAD environment variable to the path of that object, nsys will capture and report the MPI API trace information when --trace=nvtx is used. There is no need to use --trace=mpi.
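One possible way to apply the LD_PRELOAD approach is to pass the variable to the application through the -e/--env-var option. The sketch below assumes this; the library path /path/to/libnvtx_pmpi.so and the function name are hypothetical placeholders for whatever shared object you built, and the RUN variable is a dry-run convenience so that the command is only printed unless RUN is set to an empty value.

```shell
#!/bin/bash
# Dry run by default: commands are printed, not executed.
# Set RUN= (empty) in the environment to execute them.
RUN="${RUN-echo}"

# Hypothetical path to the shared object built from the PMPI wrappers.
WRAPPER_LIB="${WRAPPER_LIB:-/path/to/libnvtx_pmpi.so}"

# Preload the wrapper library into the application and enable NVTX
# tracing; --trace=mpi is not used.
profile_with_pmpi_wrappers() {
    $RUN nsys profile --trace=nvtx -e LD_PRELOAD="$WRAPPER_LIB" "$@"
}

profile_with_pmpi_wrappers ./myapp
```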

NVTX tracing is automatically enabled when MPI trace is turned on.

Using the CLI to Profile Applications Launched with mpirun

This version of the Nsight Systems CLI supports concurrent use of the nsys profile command. Each instance will create a separate report file. You cannot use multiple instances of the interactive CLI concurrently, or use the interactive CLI concurrently with nsys profile in this version.

Nsight Systems can be used to profile applications launched with mpirun command. Since concurrent use of the CLI is supported only when using the nsys profile command, Nsight Systems cannot profile each node from the GUI or from the interactive CLI.

To profile everything, putting the data in one file:

nsys profile [nsys options] mpirun [mpi options]

To profile everything, putting the data from each rank into a separate file:

mpirun [mpi options] nsys profile [nsys options]
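When every rank writes its own report, the %q{ENV_VAR} and %h patterns accepted by --output can keep the file names distinct. A minimal sketch, assuming Open MPI (which sets OMPI_COMM_WORLD_RANK) and a placeholder application ./myapp; the function name and the RUN dry-run variable are conveniences of this example, and the command is only printed unless RUN is set to an empty value.

```shell
#!/bin/bash
# Dry run by default: commands are printed, not executed.
# Set RUN= (empty) in the environment to execute them.
RUN="${RUN-echo}"

# Run $1 ranks, each under its own nsys instance. nsys substitutes
# %q{OMPI_COMM_WORLD_RANK} with the rank and %h with the hostname.
# Single quotes keep the shell from touching the pattern.
profile_each_rank() {
    ranks="$1"; shift
    $RUN mpirun -np "$ranks" nsys profile \
        -o 'report_%h_rank%q{OMPI_COMM_WORLD_RANK}' "$@"
}

profile_each_rank 4 ./myapp
```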

To profile a single MPI process use a wrapper script. The following script (called "wrap.sh") runs nsys on rank 0 only:

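#!/bin/bash
# Profile rank 0 only; every other rank runs the application unmodified.
if [[ $OMPI_COMM_WORLD_RANK == 0 ]]; then
    # The dummy argument lets Nsight Systems decide which process to profile.
    ~/nsys/nsys profile ./myapp "$@" --mydummyargument
else
    ./myapp "$@"
fi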

and then execute mpirun ./wrap.sh.

Note: Currently you will need a dummy argument to the process, so that Nsight Systems can decide which process to profile. This means that your process must accept dummy arguments to take advantage of this workaround. This script as written is for Open MPI, but should be easily adaptable to other MPI implementations.
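As a rough sketch of how the wrapper might be adapted to other launchers, the rank lookup can fall back through several environment variables. OMPI_COMM_WORLD_RANK is the Open MPI variable used in the script above; PMI_RANK (set by MPICH-style launchers) and SLURM_PROCID (set by srun) are assumptions of this sketch, as are the function names and the RUN dry-run variable, which prints commands unless RUN is set to an empty value.

```shell
#!/bin/bash
# Dry run by default: commands are printed, not executed.
# Set RUN= (empty) in the environment to execute them.
RUN="${RUN-echo}"

# Print this process's rank from the first launcher variable that is
# set and non-empty; fall back to 0 when none is.
detect_rank() {
    printf '%s\n' "${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-${SLURM_PROCID:-0}}}"
}

# Run the application under nsys on the chosen rank only.
# $1: rank to profile; remaining arguments: application and its arguments.
run_on_rank() {
    target="$1"; shift
    if [ "$(detect_rank)" = "$target" ]; then
        # Keep the dummy argument described in the note above.
        $RUN nsys profile "$@" --mydummyargument
    else
        $RUN "$@"
    fi
}
```

A wrapper script built on this would end with a line such as run_on_rank 0 ./myapp "$@".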


Copyright (c) 2012-2020, NVIDIA Corporation. All rights reserved.