Direct3D 12 is a low-overhead 3D graphics and compute API for Microsoft Windows. More information about Direct3D 12 is available in the Direct3D 12 Programming Guide.
Nsight Systems can capture information about Direct3D 12 usage by the profiled process. This includes capturing the execution time of D3D12 API functions, corresponding workloads executed on the GPU, performance markers, and frame durations.
The Command List Creation row displays time periods when command lists were
being created. This enables developers to improve their application’s
multithreaded command list creation. The command list creation time period is
measured between the call to ID3D12GraphicsCommandList::Reset
and the call to ID3D12GraphicsCommandList::Close.
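As a rough aid for reading this row, the sketch below compares the summed Reset-to-Close durations with the wall-clock time during which at least one command list was being recorded. It is only an illustration in standard C++ and not part of Nsight Systems or Direct3D 12: `RecordingInterval`, `TotalRecordingTime`, and `CoveredTime` are hypothetical names, and the timestamps are assumed to have been collected by the reader (for example, read off the timeline).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of one command list creation period:
// start = time of the Reset call, end = time of the Close call
// (any consistent unit, for example microseconds).
struct RecordingInterval {
    double start;
    double end;
};

// Sum of all Reset-to-Close durations, regardless of the recording thread.
inline double TotalRecordingTime(const std::vector<RecordingInterval>& intervals) {
    double total = 0.0;
    for (const RecordingInterval& i : intervals) {
        assert(i.end >= i.start);
        total += i.end - i.start;
    }
    return total;
}

// Wall-clock time during which at least one command list was being recorded.
// Takes the vector by value because it sorts its own copy.
inline double CoveredTime(std::vector<RecordingInterval> intervals) {
    if (intervals.empty()) return 0.0;
    std::sort(intervals.begin(), intervals.end(),
              [](const RecordingInterval& a, const RecordingInterval& b) {
                  return a.start < b.start;
              });
    double covered = 0.0;
    double curStart = intervals[0].start;
    double curEnd = intervals[0].end;
    for (std::size_t k = 1; k < intervals.size(); ++k) {
        if (intervals[k].start <= curEnd) {
            // Overlaps the current merged span: extend it.
            curEnd = std::max(curEnd, intervals[k].end);
        } else {
            // Gap found: close the current span and start a new one.
            covered += curEnd - curStart;
            curStart = intervals[k].start;
            curEnd = intervals[k].end;
        }
    }
    covered += curEnd - curStart;
    return covered;
}
```

Dividing `TotalRecordingTime` by `CoveredTime` gives an approximate average number of command lists being recorded concurrently. A value close to 1 suggests that recording is effectively serialized.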
The GPU row shows an aggregated view of D3D12 API calls and GPU workloads. Note that not all D3D12 API calls are logged.
A Command Queue row is displayed for each D3D12 command queue created by the profiled application. The row’s header displays the queue's running index and its type (Direct, Compute, Copy).
The API row displays time periods where ID3D12CommandQueue::ExecuteCommandLists
was called. The GPU Workload row displays time periods where workloads were
executed by the GPU. The workload’s type (Graphics, Compute, Copy, etc.) is
displayed on the bar representing the workload’s GPU execution.
In addition, you can see the PIX command queue CPU-side performance markers, the GPU-side performance markers, and the GPU Command List performance markers, each in its own row.
Clicking on a GPU workload highlights the corresponding
ID3D12CommandQueue::ExecuteCommandLists,
ID3D12GraphicsCommandList::Reset, and
ID3D12GraphicsCommandList::Close API
calls, and vice versa.
Detecting which CPU thread was blocked by a fence can be difficult in complex apps that run tens of CPU threads. The timeline view displays the three operations involved:
* The CPU thread pushing a signal command and fence value into the command queue. This is displayed on the DX12 Synchronization sub-row of the calling thread.
* The GPU executing that command, setting the fence value and signalling the fence. This is displayed on the GPU Queue Synchronization sub-row.
* The CPU thread calling a Win32 wait API to block-wait until the fence is signalled. This is displayed on the thread's OS runtime libraries row.
Clicking any one of these operations highlights it along with the other two corresponding operations.
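The three fence operations can be modelled with a small, single-threaded sketch in standard C++. This is a toy model and not Direct3D 12 code: `ModelFence`, `ModelQueue`, and `WouldBlock` are hypothetical stand-ins, and the comments name the real calls they are assumed to correspond to (`ID3D12CommandQueue::Signal`, and `ID3D12Fence::SetEventOnCompletion` followed by a Win32 wait such as `WaitForSingleObject`).

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for ID3D12Fence: tracks only the last completed value.
struct ModelFence {
    std::uint64_t completed = 0;
};

// Hypothetical stand-in for a command queue that only understands signal commands.
struct ModelQueue {
    struct SignalCommand {
        ModelFence* fence;
        std::uint64_t value;
    };
    std::deque<SignalCommand> pending;

    // Operation 1 (CPU): push a signal command and fence value into the queue.
    // In real D3D12 code this is assumed to be ID3D12CommandQueue::Signal(fence, value).
    void Signal(ModelFence& fence, std::uint64_t value) {
        assert(value > fence.completed && "fence values are expected to increase");
        pending.push_back({&fence, value});
    }

    // Operation 2 (GPU): execute the oldest pending command, which sets the
    // fence value. Returns false when nothing is left to execute.
    bool ExecuteNext() {
        if (pending.empty()) return false;
        SignalCommand cmd = pending.front();
        pending.pop_front();
        cmd.fence->completed = cmd.value;
        return true;
    }
};

// Operation 3 (CPU): a waiting thread stays blocked while the fence has not yet
// reached the requested value. Real code would typically call
// ID3D12Fence::SetEventOnCompletion and then a Win32 wait such as
// WaitForSingleObject instead of polling.
inline bool WouldBlock(const ModelFence& fence, std::uint64_t value) {
    return fence.completed < value;
}
```

In this model, a thread waiting on value 1 would block after `Signal(fence, 1)` has been queued and is released only once `ExecuteNext()` has processed that command. The fence and its value tie the three operations together, which is one plausible reason they can be highlighted as a group.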
Copyright (c) 2012-2019, NVIDIA Corporation. All rights reserved.