================================================================================ CONTENTS ================================================================================ -- Release Highlights -- Supported NVIDIA Display Drivers and Tegra Images -- Supported NVIDIA Hardware -- Supported Operating Systems -- New Features -- Known Issues -- Revision History -- More Information ================================================================================ Release Highlights ================================================================================ 4.4.0 * Adds support for Maxwell gm200 based GPUs including GeForce Titan X 4.3.0 * Adds support for Nvidia Tegra X1 32-bit and 64-bit devices 4.2.3 * The PerfKit SDK directory structure has new subdirectory under bin, lib, and samples bin so that 32-bit and 64-bit libraries and binaries can be shipped in one package. 4.2.2 * Adds support for Maxwell gm206 based GPUs including the GeForce GTX 960. 4.2.1 * Adds QNX support for Jetson Pro (Tegra K1) Pro platform. * On Microsoft Windows NvPerfKitControl can now be used in place of NvInstEnabler. * On Microsoft Windows NvPmSampleOGL and NvPmSampleDX replace the previous samples. 4.1.1 * Adds Android KitKat and Lollipop support for Tegra K1 32-bit and 64-bit devices. * Adds NvPerfKitControl to control driver instrumentation and set gpu clock rate as sysmax, promax and restore on Android platform. 4.1.0 * Adds support for Vibrante 3.0 Linux 1.0a for Jetson Pro (Tegra K1) platform. * Adds support for Linux4Tegra rel21 for Jetson TK1 (Tegra K1) platform. * Adds support for new Maxwell gm204 based GPUs including GeForce GTX970 and GTX980. * On Microsoft Windows adds initial support for NVIDIA Optimus and Microsoft Hybrid systems. * On Microsoft Windows removes dependency on Nvda.NvDebugApi.dll. 4.0.1 * Adds support for Vibrante 3.0 Linux for Jetson Pro (Tegra K1) platform. 4.0.0 * Added support for Maxwell architecture including GeForce 830M, 840M, 850M, 860M, 745, 750, and 750 Ti. * Removed support for all Tesla architecture based GPUs. * Added new functions to the API to support 1. counters with floating point data types 2. overflow indicator for counters 3. counter enumeration function supporting user data ================================================================================ Supported NVIDIA Display Drivers and Tegra Images ================================================================================ * On Windows NVIDIA r334.89 is recommended for 4.0.0. r343.98 is recommended for 4.1.0. r346.xx or greater is recommended for 4.2.0. r347.74 or greater is recommended for 4.4.0. * Vibrante 3.0-Linux Alpha3 on Jetson Pro TK1 platform. * L4T rel21 and rel22 on Jetson TK1 platform. * QNX rel23 on Jetson Pro TK1 platform. ================================================================================ Supported NVIDIA Hardware ================================================================================ PerfKit supports the NVIDIA GPUs listed below: * GeForce 8XX Series, GTX9XX, and GTX TITAN X (Maxwell Architecture) * GeForce 6XX and 7XX Series (Kepler Architecture) * GeForce 4XX and 5XX Series (Fermi Architecture) PerfKit supports the NVIDIA Tegra Platforms listed below: * NVIDIA Jetson Pro (K1) * NVIDIA Jetson TK1 * NVIDIA Shield Tablet PerfKit may or may not be available on other NVIDIA GPUs including Quadro and Telsa. The full list of supported NVIDIA GPUs can be found at https://developer.nvidia.com/nsight-visual-studio-edition-supported-gpus-full-list ================================================================================ Supported Operating Systems ================================================================================ PerfKit supports * Microsoft Windows Vista * Microsoft Windows 7 * Microsoft Windows 8 * Microsoft Windows 8.1 * Vibrante Linux 3.0a3 * Linux4Tegra rel21 * Android KitKat * Android Lollipop * QNX 6.60 ================================================================================ New Features in 4.2.1 ================================================================================ NEW PLATFORMS Adds support for QNX on NVIDIA Jetson Pro (Tegra TK1). MAXWELL ARCHITECTURE SUPPORT New Counters Gpu Family ---------------------------------------------------------------------------- l2_hitrate Maxwell COMPUTE tex_cache_sector_misses Maxwell COMPUTE BUG FIXES * CPU usage of NvInstEnabler.exe on Microsoft Windows should be reduced. ================================================================================ New Features in 4.1 ================================================================================ NEW PLATFORMS Adds support for Android KitKat and L on NVIDIA Shield Tablet and other Tegra TK1 32-bit and 64-bit devices. Adds support for NVIDIA Vibrante Linux on NVIDIA Jetson Pro. Adds support for NVIDIA Linux4Tegra on NVIDIA Jetson TK1. MICROSOFT WINDOWS PerfKit 4.1.0 for Windows now consists of only NvPmApi.Core.dll. The dependency on Nvda.NvDebugApi.dll has been removed. TEGRA PerfKit for Tegra contains a new OpenGL ES sample named NvPmSample. The sample works on Android, Vibrante3.0 and Linux4Tegra releases. The sample provides an example of how to implement both real-time mode and experiment mode for different types of OpenGL ES workloads. MAXWELL ARCHITETURE SUPPORT Perfkit 4.1.0 adds support for gm204. FERMI/KEPLER/MAXWELL ARCHITECTURE SUPPORT Renamed Counters ---------------------------------------------------------------------------- Old Name New Name tex*_cache_texel_queries tex*_cache_tex_queries_gpc*_tpc* New Counters ---------------------------------------------------------------------------- fb_read_bytes COMPUTE fb_read_sectors COMPUTE fb_write_bytes COMPUTE fb_write_sectors COMPUTE ================================================================================ New Features in 4.0 ================================================================================ COUNTER DATA TYPES NVPMAPI previously only supported UINT64 data type for raw counters, aggregate counters, and simplified experiments. This release adds support for double precision floating point counters. - The function NVPMGetCounterAttribute and attribute NVPMA_COUNTER_VALUE_TYPE can be used to query the type of a counter. The supported types are NVPM_VALUE_TYPE_{UINT64, FLOAT64}. - The legacy functions NVPMSample, NVPMGetCounterValueByName, and NVPMGetCounterValue will only return functions as UINT64. New counters of type FLOAT64 will be cast to UINT64 and returned. - The new functions NVPMGetCounterValueUint64 and NVPMGetCounterValueFloat64 should be used to query counter values. If the function is used to query the value of a counter with a different type the error NVPM_INCORRECT_VALUE_TYPE will be returned. - The structure NVPMSampleValueEx has been extended to support both UINT64 and FLOAT64 data types. The counter type is passed through the ulFlags member. COUNTER OVERFLOW The majority of GPU hardware counters are 32-bit and can overflow on long draw calls or long compute dispatches. The new functions NVPMGetCounterValueUint64 and NVPMGetCounterValueFloat64 have an output parameter pOverflow that is set to 1 if the counter overflowed. NVPMSampleValueEx indicates overflow for each counter using the flag NVPMSampleValueEx.ulFlags & NVPMSAMPLEEX_FLAG_OVERFLOW. ENHANCED COUNTER ENUMERATION The new function NVPMEnumCountersByContextUserData accepts a void* pUserData parameter that is passed to each callback simplifying enumeration of counter data. MAXWELL ARCHITECTURE SUPPORT Perfkit 4.0 adds support for gm107 and gm108 chips. In many cases the same counters are maintained between Kepler and Maxwell. TESLA ARCHITECTURE SUPPORT (REMOVED) Tesla architecture (not Tesla brand) GPUs have been removed from PerfKit. Please use PerfKit 3.2.2 if you need support for Tesla architecture based GPUs. FERMI ARCHITECTURE SUPPORT Removed Counters ---------------------------------------------------------------------------- crop_busy rop_busy zrop_busy fb_read_req_subp*_fb* Renamed Counters ---------------------------------------------------------------------------- Old Name New Name elapsed_cycles_gpc0 elapsed_cycles ia_l2_read_bytes l2_read_bytes_ia l2_fb_read_bytes l2_read_bytes_vidmem l2_fb_write_bytes l2_write_bytes_vidmem rop_l2_read_bytes l2_read_bytes_rop rop_l2_write_bytes l2_write_bytes_rop tex_l2_read_bytes l2_read_bytes_tex tex_l2_requests l2_read_sectors_tex New Counters ---------------------------------------------------------------------------- l2_read_bytes_mem l2_read_bytes_sysmem l2_write_bytes_mem l2_write_bytes_sysmem Counter Domain Changes ---------------------------------------------------------------------------- Old New elapsed_cycles BOTH COMPUTE inst_executed_cs_vsm0 BOTH COMPUTE fb_subp*_{read, write}_sectors_fb* BOTH COMPUTE l1_local_load_transaction_miss* BOTH COMPUTE l2_slice*_read_sectors_tex_fb* BOTH COMPUTE tex*_bank_conflicts_gpc*_tpc* BOTH COMPUTE tex*_cache_sector_{misses, queries}_gpc*_tpc* BOTH COMPUTE KEPLER ARCHITECTURE SUPPORT Removed Counters ---------------------------------------------------------------------------- crop_busy cta_launched // use sm_ctas_launched cta_launched_vsm* // use sm_ctas_launched_vsm* elapsed_cycles_gpc* elapsed_cycles_gpc*_tpc* inst_executed_{shader_type}_vsm* rop_busy sm_inst_executed_lsu_red_vsm* tex_l2_read_bytes // use l2_read_bytes_tex tex_l2_requests // use l2_read_sectors_tex zrop_busy Renamed Counters ---------------------------------------------------------------------------- Old Name New Name elapsed_cycles_gpc0 elapsed_cycles ia_l2_read_bytes l2_read_bytes_ia l2_fb_read_bytes l2_read_bytes_vidmem l2_fb_write_bytes l2_write_bytes_vidmem rop_l2_read_bytes l2_read_bytes_rop rop_l2_write_bytes l2_write_bytes_rop l2_read_sysmem_bytes l2_read_bytes_sysmem l2_read_vidmem_bytes l2_read_bytes_vidmem l2_write_sysmem_bytes l2_write_bytes_sysmem l2_write_vidmem_bytes l2_write_bytes_vidmem New Counters ---------------------------------------------------------------------------- l1_atoms_transactions_per_request l1_global_load_transactions_per_request l1_global_store_transactions_per_request l1_local_load_transactions_per_request l1_local_store_transactions_per_request l1_reds_transactions_per_request l1_shared_load_transactions_per_request l1_shared_store_transactions_per_request l2_read_bytes l2_read_bytes_mem l2_read_sectors l2_write_bytes l2_write_bytes_mem l2_write_sectors sm_active_warps_vsm* sm_executed_ipc sm_issued_ipc Counter Domain Changes ---------------------------------------------------------------------------- Old New elapsed_cycles BOTH COMPUTE fb_subp*_{read, write}_sectors_fb* BOTH COMPUTE l1_local_load_transaction_miss* BOTH COMPUTE l2_read_sectors_tex COMPUTE BOTH l2_slice*_read_sectors_tex_fb* BOTH COMPUTE sm_active_cycles_vsm* BOTH COMPUTE tex*_bank_conflicts_gpc*_tpc* BOTH COMPUTE tex*_cache_sector_{misses, queries}_gpc*_tpc* BOTH COMPUTE tex*_ldg* BOTH COMPUTE ================================================================================ Known Issues ================================================================================ * NVPMAPI does not support devices in SLI configuration. When in SLI configuration NVPMAPI may return counters from the wrong device. * Maxwell gm20x devices may require running the application as Administrator or root (sudo) in order to be correctly profiled. It will be fixed in new drivers in future. ================================================================================ Revision History ================================================================================ PerfKit 3.x/4.x version scheme follows the NVIDIA Nsight Visual Studio Edition version. PerfKit SDK previously used a different version scheme that ended with version 6.x. 2015/03 Perfkit 4.4.0 2015/02 PerfKit 4.3.0 2015/01 PerfKit 4.2.3 2015/01 PerfKit 4.2.2 2015/01 PerfKit 4.2.1 2014/12 PerfKit 4.2.0 2014/10 PerfKit 4.1.1 2014/09 PerfKit 4.1.0 2014/07 PerfKit 4.0.1 2014/04 PerfKit 4.0.0 2013/12 PerfKit 3.2.2 2013/11 PerfKit 3.2.1 2013/09 PerfKit 3.2.0 2013/08 PerfKit 3.1.0 ================================================================================ More Information ================================================================================ Additional information and downloads can be found at http://developer.nvidia.com/nvidia-perfkit Support issues can be mailed to PerfKit@nvidia.com.