D3D12 Trace

Direct3D 12 Overview

Direct3D 12 is a low-overhead 3D graphics and compute API for Microsoft Windows. Information about Direct3D 12 can be found in the Direct3D 12 Programming Guide.

Nsight Systems can capture information about Direct3D 12 usage by the profiled process. This includes capturing the execution time of D3D12 API functions, corresponding workloads executed on the GPU, performance markers, and frame durations.

D3D12 overview picture

The Command List Creation row displays the time periods during which command lists were being created. This helps developers improve their application’s multithreaded command list creation. Each command list creation period is measured from the call to ID3D12GraphicsCommandList::Reset to the call to ID3D12GraphicsCommandList::Close.

D3D12 commandlist creation
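
The Reset-to-Close measurement described above can be illustrated with a small, portable sketch. It does not use the D3D12 headers and is not how Nsight Systems is implemented. The `Event`, `Interval`, and `creationIntervals` names, the integer command list identifier, and the nanosecond timestamps are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical trace record: one logged call on a command list.
enum class Call { Reset, Close };

struct Event {
    std::uint64_t timestampNs; // time of the call, in nanoseconds (assumed unit)
    int commandListId;         // hypothetical identifier for the command list
    Call call;
};

struct Interval {
    int commandListId;
    std::uint64_t beginNs; // time of the Reset call
    std::uint64_t endNs;   // time of the matching Close call
};

// Pairs each Reset with the next Close on the same command list.
// Events are assumed to be sorted by timestamp.
std::vector<Interval> creationIntervals(const std::vector<Event>& events) {
    std::map<int, std::uint64_t> pendingResets; // command list -> Reset time
    std::vector<Interval> intervals;
    for (const Event& e : events) {
        if (e.call == Call::Reset) {
            pendingResets[e.commandListId] = e.timestampNs;
        } else {
            auto it = pendingResets.find(e.commandListId);
            if (it != pendingResets.end()) {
                intervals.push_back({e.commandListId, it->second, e.timestampNs});
                pendingResets.erase(it);
            }
            // A Close with no pending Reset is ignored in this sketch.
        }
    }
    return intervals;
}
```

Overlapping intervals from different command lists would correspond to command lists being recorded concurrently, which is the kind of pattern the Command List Creation row is meant to make visible.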

The GPU row shows an aggregated view of D3D12 API calls and GPU workloads. Note that not all D3D12 API calls are logged.

D3D12 GPU aggregated

A Command Queue row is displayed for each D3D12 command queue created by the profiled application. The row’s header displays the queue's running index and its type (Direct, Compute, Copy).

D3D12 command queue overview

The API row displays time periods where ID3D12CommandQueue::ExecuteCommandLists was called. The GPU Workload row displays time periods where workloads were executed by the GPU. The workload’s type (Graphics, Compute, Copy, etc.) is displayed on the bar representing the workload’s GPU execution.

D3D12 API and Workload

In addition, you can see the PIX command queue CPU-side performance markers, the GPU-side performance markers, and the GPU Command List performance markers, each in its own row.

D3D12 CPU marker

D3D12 GPU marker

D3D12 commandlist marker

Clicking on a GPU workload highlights the corresponding ID3D12CommandQueue::ExecuteCommandLists, ID3D12GraphicsCommandList::Reset and ID3D12GraphicsCommandList::Close API calls, and vice versa.

D3D12 correlation

Detecting which CPU thread was blocked by a fence can be difficult in complex apps that run tens of CPU threads. The timeline view displays the 3 operations involved:

* The CPU thread pushing a signal command and fence value into the command queue. This is displayed on the DX12 Synchronization sub-row of the calling thread.
* The GPU executing that command, setting the fence value and signalling the fence. This is displayed on the GPU Queue Synchronization sub-row.
* The CPU thread calling a Win32 wait API to block-wait until the fence is signalled. This is displayed on the Thread's OS runtime libraries row.

Clicking any one of these operations highlights it together with the other two corresponding operations.

D3D12 fence sync
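
To make the ordering of the three fence operations concrete, here is a minimal single-threaded model in portable C++. It is only a sketch under stated assumptions: `ModelFence`, `ModelQueue`, and `stepsUntilSignalled` are hypothetical names, not D3D12 interfaces. In a real application the corresponding calls would typically be ID3D12CommandQueue::Signal, ID3D12Fence::SetEventOnCompletion, and a Win32 wait such as WaitForSingleObject.

```cpp
#include <cstdint>
#include <deque>
#include <functional>

// Hypothetical stand-in for a fence: just the last completed value.
struct ModelFence {
    std::uint64_t completedValue = 0;
};

// Hypothetical stand-in for a command queue: commands run in submission order.
class ModelQueue {
public:
    // Models GPU work submitted ahead of the signal command.
    void submitWork() { commands_.push_back([] {}); }

    // Operation 1: the CPU thread pushes a signal command and fence value.
    void signal(ModelFence& fence, std::uint64_t value) {
        commands_.push_back([&fence, value] { fence.completedValue = value; });
    }

    // Operation 2 happens here when the modeled GPU reaches the signal command.
    // Executes one command and returns false if the queue was already empty.
    bool executeNext() {
        if (commands_.empty()) return false;
        commands_.front()();
        commands_.pop_front();
        return true;
    }

private:
    std::deque<std::function<void()>> commands_;
};

// Operation 3: a wait can only return once the fence reaches the value.
// Returns how many commands the modeled GPU executed before that happened.
int stepsUntilSignalled(ModelQueue& queue, const ModelFence& fence,
                        std::uint64_t value) {
    int steps = 0;
    while (fence.completedValue < value && queue.executeNext()) ++steps;
    return steps;
}
```

The model shows why a waiting CPU thread can stay blocked for a long time: the fence value only changes after every command submitted ahead of the signal has been executed. The three highlighted operations on the timeline mark the start, middle, and end of that dependency.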


Copyright (c) 2012-2019, NVIDIA Corporation. All rights reserved.