Release Notes


NVIDIA PerfWorks Pro v0.46.11

System Requirements

One of the following development boards, with 64-bit image: 

Supported Platforms

PerfWorks Pro supports NVIDIA GeForce, Quadro, and Tesla GPUs based upon the NVIDIA Kepler, Maxwell, and Pascal architectures.

PerfWorks Pro supports Microsoft hybrid systems with NVIDIA driver r378.49 or higher (Windows 8.1 and Windows 10).

PerfWorks Pro does not support SLI or Optimus systems.

Release Highlights

PerfWorks Pro 0.46.*:

0.46.8 Corrected the instructions in the release notes for building the samples for Vibrante Linux.

PerfWorks Pro 0.45.*:

0.45.0 Added support for driver 381.65. (54295)

PerfWorks Pro 0.44.*:

0.44.0 The following new metrics were added:

New Metrics                              Kepler        Maxwell  Pascal
--------------------------------------------------------------------------------------------
cpu__time_duration                       all           all      all
gpu__clip_primitives_in                  all           all      all
gpu__clip_primitives_out                 all           all      all
gpu__earlyz_samples_failed_depth         gk208, gk20a  all      all
gpu__earlyz_samples_failed_stencil       gk208, gk20a  all      all
gpu__earlyz_samples_passed               all           all      all
gpu__latez_samples_failed_depth          gk208, gk20a  all      all
gpu__latez_samples_failed_stencil        gk208, gk20a  all      all
gpu__latez_samples_passed                all           all      all
pa__pa2raster_stalled_pct                              all
sm__pipe_alu_utilization_pct             all           all      all
sm__pipe_interp_utilization_pct          all           all      all
sm__pixout_stall_pct                     all           all      all
system__time_duration                    all           all      all
tex__stalled_pct                         all           all      all
zcull__fragments_accepted                all           all      all
zcull__fragments_accepted_pct            all           all      all
zcull__fragments_rejected                all           all      all
zcull__fragments_rejected_pct            all           all      all
zcull__fragments_tested                  all           all      all
zcull__fragments_trivially_accepted      all           all      all
zcull__fragments_trivially_accepted_pct  all           all      all
zcull__tiles_accepted                    all           all      all
zcull__tiles_accepted_pct                all           all      all
zcull__tiles_rejected                    all           all      all
zcull__tiles_rejected_pct                all           all      all
zcull__tiles_tested                      all           all      all
zcull__tiles_trivially_accepted          all           all      all
zcull__tiles_trivially_accepted_pct      all           all      all

PerfWorks Pro 0.41.*:

0.41.0 Improved the metrics crop__sol_pct and zrop__sol_pct. (53731,53732)

PerfWorks Pro 0.38.*:
PerfWorks Pro 0.34.*:
PerfWorks Pro 0.33.*:
PerfWorks Pro 0.32.*:
PerfWorks Pro 0.30.*:
PerfWorks Pro 0.29.*:
PerfWorks Pro 0.27.*:
PerfWorks Pro 0.25.*:
PerfWorks Pro 0.24.*:
PerfWorks Pro 0.22.*:
PerfWorks Pro 0.21.*:
PerfWorks Pro 0.19.*:
PerfWorks Pro 0.18.*:

Below, "[API]" stands in for D3D11, D3D12_Queue, D3D12_CommandList, OpenGL, EGL, CUDA, etc. Note that some of the APIs take no context parameter, as they use the current context instead. The context-less APIs are OpenGL, EGL, and CUDA. (49061)

Specific changes are as follows:

Removed API                             New API
--------------------------------------------------------------------------------------------
NVPA_LoadDriver*()                      NVPA_[API]_LoadDriver()
NVPA_Register*()                        NVPA_[API]_Register()
NVPA_UnregisterContext()                NVPA_[API]_Unregister()
NVPA_Context_GetConfig()                NVPA_[API]_GetConfig()
NVPA_Context_GetSliDeviceCount()        NVPA_[API]_GetSliDeviceCount()
NVPA_Context_GetDeviceIndex()           NVPA_[API]_GetDeviceIndex()
NVPA_Context_Finish()                   NVPA_[API]_Finish()
NVPA_Context_PredictStackDataReady()    NVPA_[API]_PredictStackDataReady()
NVPA_Context_GetStackData()             NVPA_[API]_GetStackData()
NVPA_Object_PushRange()                 NVPA_[API]_PushRange()
NVPA_Object_PopRange()                  NVPA_[API]_PopRange()
NVPA_Object_GetNumRangeIds()            NVPA_[API]_GetNumRangeIds()
NVPA_Object_GetRangeIds()               NVPA_[API]_GetRangeIds()
NVPA_Context_BeginSession()             NVPA_[API]_BeginSession()
NVPA_Context_EndSession()               NVPA_[API]_EndSession()
NVPA_Context_BeginPass()                NVPA_[API]_BeginPass()
NVPA_Context_EndPass()                  NVPA_[API]_EndPass()
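
When porting code across this rename, it can help to have the table in machine-readable form. The following is a minimal Python sketch, not part of the SDK: the helper names `new_name` and `takes_context` are made up for illustration, and the script covers only function names. It says nothing about parameter changes, which must be checked against the headers.

```python
# Old-to-new name mapping transcribed from the rename table.
# Keys ending in "*" are prefix patterns (for example NVPA_LoadDriver*).
RENAMES = {
    "NVPA_LoadDriver*": "LoadDriver",
    "NVPA_Register*": "Register",
    "NVPA_UnregisterContext": "Unregister",
    "NVPA_Context_GetConfig": "GetConfig",
    "NVPA_Context_GetSliDeviceCount": "GetSliDeviceCount",
    "NVPA_Context_GetDeviceIndex": "GetDeviceIndex",
    "NVPA_Context_Finish": "Finish",
    "NVPA_Context_PredictStackDataReady": "PredictStackDataReady",
    "NVPA_Context_GetStackData": "GetStackData",
    "NVPA_Object_PushRange": "PushRange",
    "NVPA_Object_PopRange": "PopRange",
    "NVPA_Object_GetNumRangeIds": "GetNumRangeIds",
    "NVPA_Object_GetRangeIds": "GetRangeIds",
    "NVPA_Context_BeginSession": "BeginSession",
    "NVPA_Context_EndSession": "EndSession",
    "NVPA_Context_BeginPass": "BeginPass",
    "NVPA_Context_EndPass": "EndPass",
}

# APIs that take no context parameter, as stated in the note above the table.
CONTEXTLESS_APIS = {"OpenGL", "EGL", "CUDA"}


def new_name(old_name, api):
    """Return the replacement name for a removed function, or None if unlisted."""
    for pattern, suffix in RENAMES.items():
        if pattern.endswith("*"):
            matched = old_name.startswith(pattern[:-1])
        else:
            matched = old_name == pattern
        if matched:
            return f"NVPA_{api}_{suffix}"
    return None


def takes_context(api):
    """True unless the API is one of the listed context-less APIs."""
    return api not in CONTEXTLESS_APIS


print(new_name("NVPA_Context_BeginPass", "D3D11"))  # NVPA_D3D11_BeginPass
print(takes_context("OpenGL"))                      # False
```
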
PerfWorks Pro 0.17.*:
PerfWorks Pro 0.15.*:
PerfWorks Pro 0.14.*:
PerfWorks Pro 0.13.*:
PerfWorks Pro 0.12.*:
PerfWorks Pro 0.11.*:
PerfWorks Pro 0.10.*:
PerfWorks Pro 0.9.*:
PerfWorks Pro 0.8.*:
PerfWorks Pro 0.7.*:
PerfWorks Pro 0.6.*:
PerfWorks Pro 0.5.*:
PerfWorks Pro 0.4.*:
PerfWorks Pro 0.3.*:
PerfWorks Pro 0.2.*:
PerfWorks Pro 0.1.*:

Known Issues

SDK Contents

\---PerfWorks
    +---bin
    |   +---<platform1>
    |   \---<platform2>
    +---lib
    |   +---<platform1>
    |   \---<platform2>
    +---doc
    +---include
    \---samples

Linking Against the API

PerfWorks Tools

nvperf is a command line tool for offline querying of PerfWorks metrics.

Usage: 
nvperf <command> ...
where commands are
chips : list supported chip families
devices : list available devices and their properties
help : display this message
metrics : list and schedule metrics for a virtual chip
For help on an individual command, use nvperf <command> --help

Querying Supported Metrics
The command 'nvperf metrics --chip gm200 --list' outputs the list of metrics supported on NVIDIA GM200 GPUs.

nvperf metrics --chip gm200 --list
# metric name # tags # description
crop__busy_cycles_avg compute graphics realtime Number of cycles the crop is busy.
crop__busy_cycles_max compute graphics realtime Number of cycles the busiest crop is busy.
crop__busy_pct_avg realtime Percentage of time the crop is busy.
crop__busy_pct_max realtime Percentage of time the busiest crop is busy.
... etc ...
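
For scripting, the listing can be split into fields. The Python sketch below is only an illustration based on the excerpt shown here. It assumes that tags are all-lowercase words and that each description starts with a capital letter; the real output may use a different separator, so treat the parsing rule as a guess. The function names are hypothetical.

```python
def parse_metric_line(line):
    """Split one listing line into (name, tags, description)."""
    words = line.split()
    name, rest = words[0], words[1:]
    tags = []
    # Assumption: tags are lowercase words; the description begins
    # at the first word that is not entirely lowercase.
    while rest and rest[0].islower():
        tags.append(rest.pop(0))
    return name, tags, " ".join(rest)


def parse_metric_list(text):
    """Parse a block of listing output, skipping non-metric lines."""
    metrics = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the "#" header, elisions, and the echoed command.
        if not line or line.startswith(("#", "...", "nvperf ")):
            continue
        metrics.append(parse_metric_line(line))
    return metrics


# Sample lines copied from the listing excerpt.
SAMPLE = """\
# metric name # tags # description
crop__busy_cycles_avg compute graphics realtime Number of cycles the crop is busy.
crop__busy_pct_avg realtime Percentage of time the crop is busy.
"""

for name, tags, description in parse_metric_list(SAMPLE):
    print(name, tags, description)
```
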

Querying the Number of Passes Required to Collect Metrics

Some performance metrics take several passes to collect. The nvperf command can schedule a list of metrics and report the number of passes required.
The command 'nvperf metrics --chip gm200 gr__busy_pct sm__busy_pct_avg' schedules the metrics gr__busy_pct and sm__busy_pct_avg and outputs the number of passes.

nvperf metrics --chip gm200 gr__busy_pct sm__busy_pct_avg
Required passes to schedule all metrics: 1

The metric 'all' will schedule all available metrics.
nvperf metrics --chip gm200 all
Required passes to schedule all metrics: 41
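
A build or test script may want the pass count as a number. The following Python sketch wraps the command shown above and extracts the count from the "Required passes" line. It assumes nvperf is on the PATH, writes this line to standard output, and exits with status zero on success; none of that is stated in these notes. The function names are hypothetical.

```python
import re
import subprocess

# Matches the summary line printed by "nvperf metrics --chip <chip> <metrics...>".
PASSES_RE = re.compile(r"Required passes to schedule all metrics:\s*(\d+)")


def parse_required_passes(output):
    """Extract the pass count from nvperf output text."""
    match = PASSES_RE.search(output)
    if match is None:
        raise ValueError("no pass count found in nvperf output")
    return int(match.group(1))


def required_passes(chip, metrics):
    """Run nvperf for the given chip and metric names and return the pass count.

    Assumes the nvperf executable is on the PATH and prints to stdout.
    """
    result = subprocess.run(
        ["nvperf", "metrics", "--chip", chip, *metrics],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_required_passes(result.stdout)
```

For example, `required_passes("gm200", ["gr__busy_pct", "sm__busy_pct_avg"])` runs the same command as the first example above.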

PerfWorks Samples

Samples Directory Structure:

|-- extensions
|   |-- build                  build files for extensions
|   |-- include                headers for extensions referenced by sample code
|   |   |-- nvperfapi_utils    helper library to configure a PerfWorks profiler
|   |   +-- winsys             helper library to create a single window with a graphics context
|   |-- lib                    built extensions will be deployed here
|   +-- src                    source for extensions
|       |-- nvperfapi_utils
|       +-- winsys
+-- samples                    per graphics API samples
    |-- bin                    built samples will be deployed here
    |-- build                  build files for the samples
    +-- gles                   GLES samples
        +-- simple             Basic app to demonstrate use of PerfWorks API
            +-- assets
                |-- shaders
                +-- src_shaders

How to Build the Samples:

Each sample has a corresponding Makefile that deploys the built sample into its bin directory.

The Makefiles provided are meant for cross-compiling on a standard Linux host machine. Before running make, edit the variables:

TEGRA_SDK_PATH := "<SDK ROOT>"
COMPILER_BIN_PATH := "<TOOLCHAIN_ROOT>/tegra-4.9-nv/usr/bin/aarch64-gnu-linux"

at the top of the two following makefiles:

<sdk_root>/samples/extensions/build/l4t/Makefile
<sdk_root>/samples/samples/build/l4t/Makefile
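
If the same edit has to be repeated across machines, it can be scripted. The Python sketch below rewrites `NAME := value` assignments in Makefile text. It is an illustration only: the function name is made up, and it assumes each variable is assigned on a single line with `:=`, as in the excerpt above.

```python
import re


def set_make_variables(makefile_text, values):
    """Return makefile_text with each `NAME := ...` line set to the given value."""
    for name, value in values.items():
        # Match a whole assignment line; keep "NAME := " and replace the rest.
        pattern = re.compile(
            rf"^({re.escape(name)}[ \t]*:=[ \t]*).*$", re.MULTILINE
        )
        makefile_text, count = pattern.subn(
            lambda m, v=value: f'{m.group(1)}"{v}"', makefile_text
        )
        if count == 0:
            raise KeyError(f"{name} is not assigned in this Makefile")
    return makefile_text


# Example input mirrors the two variables named above.
# The replacement paths are placeholders, not real install locations.
text = 'TEGRA_SDK_PATH := "<SDK ROOT>"\nCOMPILER_BIN_PATH := "<TOOLCHAIN_ROOT>"\n'
print(set_make_variables(text, {"TEGRA_SDK_PATH": "/path/to/sdk"}))
```

To apply it, read each of the two Makefiles, pass the text through the function, and write the result back.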

Samples Prerequisites:

A suitable aarch64-unknown-linux-gnu cross-compiler must be installed at:

/usr/bin

How to Run the Samples:

Once built, copy the contents of the samples/gles/bin directory to the target device. This should include the built sample binary and a prebuilt PerfWorks library.

Support

Support issues can be mailed to PerfWorks@nvidia.com.

Copyright

NVIDIA® PerfWorks SDK Documentation ©2015-2017. NVIDIA Corporation. All Rights Reserved.

NVIDIA® PerfWorks Documentation Rev. 0.46.170612 ©2017. NVIDIA Corporation. All Rights Reserved.