Users can generate a new report by stopping a profiling session. If a profiling session has been canceled, a report will not be generated, and all collected data will be discarded.
A new .qdrep file will be created and put into the same directory as the project file (.qdproj).
An existing .qdrep file can be opened using File > Open....
Report files (.qdrep) are self-contained and can be shared with other users of Nsight Systems. The only requirement is that the same or newer version of Nsight Systems is always used to open report files.
Project files (.qdproj) are currently not shareable, since they contain full paths to the report files.
To quickly navigate to the directory containing the report file, right-click the report in the Project Explorer and choose Show in folder... in the context menu.
While generating a new report or loading an existing one, a new tab will be created. The most important parts of the report tab are:
View selector — Allows switching between Analysis Summary, Timeline View, Diagnostics Summary, and Symbol Resolution Logs views.
Timeline — This is where all charts are displayed.
Function table — Located below the timeline, it displays statistical information about functions in the target application in multiple ways.
Additionally, the following controls are available:
This view shows a summary of the profiling session. In particular, it is useful to review the project configuration used to generate this report. Information from this view can be selected and copied using the mouse cursor.
The timeline view consists of two main controls: the timeline at the top, and a bottom pane that contains the events view and the function table. In some cases, when sampling of a process has not been enabled, the function table might be empty and hidden.
The bottom view selector sets the view that is displayed in the bottom pane.
Timeline is a versatile control that contains a tree-like hierarchy on the left, and corresponding charts on the right.
Contents of the hierarchy depend on the project settings used to collect the report. For example, if a certain feature has not been enabled, the corresponding rows will not be shown on the timeline.
To display trace events in the Events View, right-click a timeline row and select the “Show in Events View” command. The events of the selected row and all of its sub-rows will be displayed in the Events View.
If a timeline row has been selected for display in the Events View, then double-clicking a timeline item on that row will automatically scroll the Events View to the corresponding item and select it.
The Events View provides a tabular display of the trace events. The view contents can be searched and sorted.
Double-clicking an item in the Events View automatically focuses the Timeline View on the corresponding timeline item.
API calls, GPU executions, and debug markers that occurred within the boundaries of a debug marker are displayed nested under that debug marker. Multiple levels of nesting are supported.
The Events View recognizes these types of debug markers:
- NVTX
- Vulkan VK_EXT_debug_marker markers, VK_EXT_debug_utils labels
- PIX events and markers
- OpenGL KHR_debug markers
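The nesting rule can be illustrated with a small sketch. It is not how Nsight Systems is implemented; the `Event` type, the `nest_events` helper, and the sample timestamps are all hypothetical. It only shows one way to place each event under the innermost marker whose time range contains it.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical trace event; times are in arbitrary units."""
    name: str
    start: int
    end: int
    is_marker: bool = False
    children: list = field(default_factory=list)


def nest_events(events):
    """Nest each event under the innermost marker that fully contains it."""
    roots, open_markers = [], []
    # Sort by start time; on ties, longer ranges first so parents come first.
    for ev in sorted(events, key=lambda e: (e.start, -e.end)):
        # Drop markers that ended before this event started.
        while open_markers and open_markers[-1].end <= ev.start:
            open_markers.pop()
        # Innermost still-open marker that also covers the event's end.
        parent = next((m for m in reversed(open_markers) if ev.end <= m.end), None)
        (parent.children if parent is not None else roots).append(ev)
        if ev.is_marker:
            open_markers.append(ev)
    return roots


events = [
    Event("Frame", 0, 100, is_marker=True),
    Event("glClear", 10, 20),
    Event("Shadow pass", 30, 60, is_marker=True),
    Event("glDrawArrays", 35, 50),
    Event("glFinish", 120, 130),
]
roots = nest_events(events)


def show(items, depth=0):
    for ev in items:
        print("  " * depth + ev.name)
        show(ev.children, depth + 1)


show(roots)
```

Here `glDrawArrays` ends up two levels deep (under `Shadow pass`, which is under `Frame`), while `glFinish` stays at the top level because no marker contains it.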
The function table can work in three modes:
Top-Down View — In this mode, expanding top-level functions provides information about the callee functions. One of the top-level functions is typically the main function of your application, or another entry point defined by the runtime libraries.
Bottom-Up View — This is a reverse of the Top-Down view. On the top level, there are functions directly hit by the sampling profiler. To explore all possible call chains leading to these functions, you need to expand the subtrees of the top-level functions.
Flat View — This view enumerates all functions ever observed by the profiler, even if they have never been directly hit, but just appeared somewhere on the call stack. This view typically provides a high-level overview of which parts of the code are CPU-intensive.
Each of the views helps understand particular performance issues of the application being profiled. For example:
When trying to find specific bottleneck functions that can be optimized, the Bottom-Up view should be used. Typically, the top few functions should be examined. Expand them to understand in which contexts they are being used.
To navigate the call tree of the application, or when searching more generally for algorithms and parts of the code that consume an unexpectedly large amount of CPU time, use the Top-Down view.
To quickly assess which parts of the application, or high-level parts of an algorithm, consume a significant amount of CPU time, use the Flat view.
The Top-Down and Bottom-Up views have Self and Total columns, while the Flat view has a Flat column. It is important to understand the meaning of each of the columns:
Top-Down view
Self column denotes the relative amount of time spent executing instructions of this particular function.
Total column shows how much time has been spent executing this function, including all other functions called from this one. Total values of sibling rows sum up to the Total value of the parent row, or 100% for the top-level rows.
Bottom-Up view
Self column for top-level rows, as in the Top-Down view, shows how much time has been spent directly in this function. Self times of all top-level rows add up to 100%.
Self column for children rows breaks down the value of the parent row based on the various call chains leading to that function. Self times of sibling rows add up to the value of the parent row.
Flat view
Note that if low-impact functions have been filtered out, values may not add up correctly to 100%, or to the value of the parent row. This filtering can be disabled.
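To make the Self and Total columns concrete, the following sketch derives them from a handful of sampled call stacks. It is an illustration under stated assumptions, not the tool's algorithm: the sample data and the `top_down` and `bottom_up` helpers are hypothetical, every sample is weighted equally, and only the Top-Down and Bottom-Up columns are modeled.

```python
from collections import Counter

# Hypothetical sampled call stacks, outermost frame first.
samples = [
    ("main", "update", "physics"),
    ("main", "update", "physics"),
    ("main", "update"),
    ("main", "render", "physics"),
]


def top_down(samples):
    """Map each call path from the root to (self %, total %)."""
    self_hits, total_hits = Counter(), Counter()
    for stack in samples:
        self_hits[stack] += 1              # the innermost frame was hit directly
        for depth in range(1, len(stack) + 1):
            total_hits[stack[:depth]] += 1  # every prefix was on the stack
    n = len(samples)
    return {
        path: (100 * self_hits[path] / n, 100 * total_hits[path] / n)
        for path in total_hits
    }


def bottom_up(samples):
    """Map each reversed call chain (hit function first) to its share in %."""
    hits = Counter()
    for stack in samples:
        reversed_stack = stack[::-1]
        for depth in range(1, len(reversed_stack) + 1):
            hits[reversed_stack[:depth]] += 1
    n = len(samples)
    return {chain: 100 * count / n for chain, count in hits.items()}


td = top_down(samples)
bu = bottom_up(samples)

print(td[("main", "update")])   # (self %, total %) of update called from main
print(bu[("physics",)])         # share of samples that hit physics directly
print(bu[("physics", "update")], bu[("physics", "render")])  # split by caller
```

In this sketch the top-level Bottom-Up rows (`physics` and `update`) account for all samples, and the two caller rows under `physics` add up to the `physics` row, which matches the description of the Bottom-Up Self column above. A path's Total in the Top-Down direction is its own Self plus the Totals of its callees.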
The contents of the symbols table are tightly related to the timeline. Users can apply and modify filters on the timeline, and they will affect which information is displayed in the symbols table:
Per-thread filtering — Each thread that has sampling information associated with it has a checkbox next to it on the timeline. Only threads with selected checkboxes are represented in the symbols table.
Time filtering — A time filter can be set up on the timeline by pressing the left mouse button, dragging over a region of interest on the timeline, and then choosing Filter by selection in the dropdown menu. In this case, only sampling information collected during the selected time range will be used to build the symbols table.
Note that if too little sampling data is being used to build the symbols table (for example, when the sampling rate is configured to be low, and a short period of time is used for time-based filtering), the numbers in the symbols table might not be representative or accurate in some cases.
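The effect of the two filters, and of having too few samples, can be sketched as follows. The `Sample` type, the helper names, and the data are hypothetical and exist only for illustration; the report's internal data layout is not described in this section.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical sampling record."""
    thread_id: int
    timestamp_ns: int
    function: str  # function directly hit by the sampler


def filter_samples(samples, selected_threads, time_range=None):
    """Keep samples from selected threads, optionally within [start, end)."""
    kept = [s for s in samples if s.thread_id in selected_threads]
    if time_range is not None:
        start, end = time_range
        kept = [s for s in kept if start <= s.timestamp_ns < end]
    return kept


def self_percentages(samples):
    """Share of samples per directly hit function, in percent."""
    counts = Counter(s.function for s in samples)
    total = len(samples)
    return {name: 100 * hits / total for name, hits in counts.items()}


samples = [
    Sample(1, 100, "physics"),
    Sample(1, 200, "physics"),
    Sample(1, 300, "render"),
    Sample(1, 400, "physics"),
    Sample(2, 150, "load_assets"),
    Sample(2, 350, "load_assets"),
]

thread_1 = filter_samples(samples, selected_threads={1})
narrow = filter_samples(samples, selected_threads={1}, time_range=(250, 350))

print(self_percentages(thread_1))  # built from four samples
print(self_percentages(narrow))    # built from a single sample
```

With the narrow time range only one sample remains, so a single function is reported at 100%. This is the kind of unrepresentative result the note above warns about.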
Collapse unresolved lines is useful if some of the binary code does not have symbols. In this case, subtrees that consist of only unresolved symbols get collapsed in the Top-Down view, since they provide very little useful information.
Hide functions with CPU usage below X% is useful for large applications, where the sampling profiler hits many functions just a few times. To filter out this "long tail," which is typically not important for CPU performance bottleneck analysis, select this checkbox.
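As a rough illustration of the threshold filter, the sketch below drops entries under a cutoff from a table of percentages. The function names, the values, and the `hide_low_usage` helper are hypothetical.

```python
def hide_low_usage(self_pct, threshold_pct):
    """Drop functions whose CPU usage is below the threshold (in percent)."""
    return {name: pct for name, pct in self_pct.items() if pct >= threshold_pct}


# Hypothetical percentages for a profiled application.
self_pct = {
    "physics": 62.0,
    "render": 30.0,
    "audio_mix": 4.0,
    "load_assets": 3.0,
    "log_init": 0.5,
    "parse_args": 0.5,
}

visible = hide_low_usage(self_pct, threshold_pct=1.0)
print(sorted(visible))
print(sum(visible.values()))
```

After filtering, the visible values add up to less than 100%, which is why totals may appear not to add up while the filter is enabled.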
This view shows important messages. Some of them were generated during the profiling session, while some were added while processing and analyzing data in the report. Messages can be one of the following types:
To draw attention to important diagnostics messages, a summary line is displayed on the timeline view in the top right corner:
Information from this view can be selected and copied using the mouse cursor.
This view shows all messages related to the process of resolving symbols. It might be useful to debug issues when some of the symbol names in the symbols table of the timeline view are unresolved.
Nsight Systems has equal support for OpenGL and OpenGL ES. For brevity, in this section OpenGL will be used instead of OpenGL and OpenGL ES.
On the Analysis Summary page, there is a list of all OpenGL functions requested to be traced.
On the timeline, new rows will appear within each CPU thread that uses OpenGL:
OpenGL API row — shows trace ranges of the OpenGL API functions that the user requested to trace.
OpenGL GPU workload row — shows when batches of OpenGL draw calls are being executed on the GPU. Since draw calls are executed back-to-back, each GPU workload trace range covers many OpenGL draw calls and operations rather than a single operation, which reduces tracing overhead.
Ranges defined by the KHR_debug calls are represented similarly to OpenGL API and OpenGL GPU workload trace. GPU ranges in this case represent incremental draw cost. They cannot fully account for GPUs that can execute multiple draw calls in parallel. In this case, Nsight Systems will not show overlapping GPU ranges.
See CUDA Trace for more information.
Copyright (c) 2012-2020, NVIDIA Corporation. All rights reserved.