There are three types of counters available through Tegra Graphics Debugger. Hardware counters provide data directly from various points inside the GPU. Software counters give insight into the state and performance of the driver. Simplified Experiments are multi-pass experiments that give detailed information about the state of the GPU.
GPU counters report values accumulated since the previous time the GPU was sampled. For instance, triangle_count gives the number of triangles rendered since the last sample was taken. Once you integrate the counters into your own application, you can sample once per frame and correlate the data with a given frame.
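The per-frame sampling pattern can be sketched as follows. This is only an illustration under stated assumptions: `FakeGpu` and `sample_per_frame` are hypothetical names invented for this sketch, not part of any Tegra Graphics Debugger API, and the counter source is simulated so that each read returns the value accumulated since the previous read.

```python
from typing import Callable, Dict, List


class FakeGpu:
    """Hypothetical stand-in for a real counter source (illustration only)."""

    def __init__(self) -> None:
        self._pending: Dict[str, int] = {}

    def render(self, triangles: int) -> None:
        # Work submitted between samples accumulates in the counter.
        self._pending["triangle_count"] = (
            self._pending.get("triangle_count", 0) + triangles
        )

    def read_since_last_sample(self, name: str) -> int:
        # Reading returns the accumulated value and starts a new interval.
        value = self._pending.get(name, 0)
        self._pending[name] = 0
        return value


def sample_per_frame(
    read: Callable[[str], int], counter_names: List[str], frame_index: int
) -> Dict[str, int]:
    """Take one sample at a frame boundary and tag it with the frame index."""
    record = {"frame": frame_index}
    for name in counter_names:
        record[name] = read(name)
    return record


gpu = FakeGpu()
frames = [[100, 50], [30], [200, 10, 5]]  # triangles per draw call, per frame
log = []
for frame_index, draws in enumerate(frames):
    for triangles in draws:
        gpu.render(triangles)
    # Sampling exactly once per frame makes each value attributable to a frame.
    log.append(
        sample_per_frame(gpu.read_since_last_sample, ["triangle_count"], frame_index)
    )
```

Because each sample covers exactly one frame in this sketch, every entry in `log` can be matched to the frame that produced it.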
All of the software/driver counters are accounted per frame. The driver accumulates and updates these counters once per frame, so even if you sample more than once per frame, the software counters hold the same data (from the previous frame) until the end of the current frame.
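The hold-until-end-of-frame behavior can be pictured with a toy model. This is an assumption-laden sketch, not the driver's implementation: `PerFrameCounter` is a hypothetical class that accumulates into a private value and publishes it only when the frame ends.

```python
class PerFrameCounter:
    """Toy model of a per-frame software counter (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self._accumulating = 0  # work counted during the current frame
        self._published = 0     # value visible to samplers (previous frame)

    def add(self, amount: int) -> None:
        self._accumulating += amount

    def end_frame(self) -> None:
        # The visible value changes only at the frame boundary.
        self._published = self._accumulating
        self._accumulating = 0

    def sample(self) -> int:
        return self._published


batches = PerFrameCounter()
batches.add(12)
batches.end_frame()            # frame 0 issued 12 batches

batches.add(5)
mid_frame_a = batches.sample() # still reports frame 0
batches.add(4)
mid_frame_b = batches.sample() # unchanged until the frame ends
batches.end_frame()
after_frame = batches.sample() # now reports frame 1
```

In this model, sampling faster than the frame rate yields repeated values rather than finer-grained data.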
Counter data is provided either as raw values or as percentages. Raw counters count events (triangles, pixels, milliseconds, etc.) since the last call. Percentage counters are cycle-based: the event count is divided by the total number of clock cycles over the same interval. For example, gpu_idle counts the number of clock ticks that the GPU was idle since the last call. This value is automatically divided by the total number of clock ticks to give the percentage of time that the GPU was idle.
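The arithmetic behind a percentage counter can be sketched in a few lines. The function name and the sample numbers below are made up for illustration; the tool performs this division itself, so the sketch only shows what the reported value means.

```python
def percentage_counter(event_cycles: int, total_cycles: int) -> float:
    """Return the share of cycles in which the event occurred, in percent."""
    if total_cycles <= 0:
        # No cycles elapsed in the interval; report 0 rather than dividing by zero.
        return 0.0
    return 100.0 * event_cycles / total_cycles


# Illustrative values: the GPU was idle for 250,000 of 1,000,000 clock ticks.
gpu_idle_percent = percentage_counter(250_000, 1_000_000)
```

With these illustrative inputs, the idle share works out to a quarter of the sampled interval.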
The table below outlines all performance counters that are supported in Tegra Graphics Debugger.
Name | API | Unit | Description |
---|---|---|---|
cpu_00_frequency | | | The current frequency of the CPU core in Hz. |
cpu_00_load | | | The utilization of the CPU core. |
cpu_01_frequency | | | The current frequency of the CPU core in Hz. |
cpu_01_load | | | The utilization of the CPU core. |
cpu_02_frequency | | | The current frequency of the CPU core in Hz. |
cpu_02_load | | | The utilization of the CPU core. |
cpu_03_frequency | | | The current frequency of the CPU core in Hz. |
cpu_03_load | | | The utilization of the CPU core. |
cpu_load | | | The average utilization of all the CPU cores. |
elapsed_cycles | Compute | | Max elapsed cycles of all the GPCs. |
geom_busy | Graphics | GPU | Cycles the geometry unit is busy. |
GPU Bottleneck | Graphics | GPU | Index for GPU bottleneck |
GPU_busy | Both | GPU | Cycles the Graphics engine or the Compute engine is busy. |
GPU_idle | Both | GPU | Cycles the Graphics engine and the Compute engine are idle. |
IA Bottleneck | Graphics | GPU | Input Attribute Is Bottleneck |
IA SOL | Graphics | GPU | Input Attribute SOL |
ia_requests | Graphics | GPU | Number of Input Assembler requests. |
inst_executed_cs | Both | SM | Instructions executed by Compute shaders (CS), not including replays. |
inst_executed_cs_ratio | Both | SM | Percentage of total instructions executed that were executed by a Compute shader. |
inst_executed_gs | Graphics | SM | Instructions executed by geometry shaders (GS), not including replays. |
inst_executed_gs_ratio | Graphics | SM | Percentage of total instructions executed that were executed by a geometry shader. |
inst_executed_ps | Graphics | SM | Instructions executed by pixel shaders (PS), not including replays. |
inst_executed_ps_ratio | Graphics | SM | Percentage of total instructions executed that were executed by a pixel shader. |
inst_executed_tcs | Graphics | SM | Instructions executed by tessellation control shaders (TCS/hull), not including replays. |
inst_executed_tcs_ratio | Graphics | SM | Percentage of total instructions executed that were executed by a hull shader. |
inst_executed_tes | Graphics | SM | Instructions executed by tessellation evaluation shaders (TES/domain), not including replays. |
inst_executed_tes_ratio | Graphics | SM | Percentage of total instructions executed that were executed by a domain shader. |
inst_executed_vs | Graphics | SM | Instructions executed by vertex shaders (VS), not including replays. |
inst_executed_vs_ratio | Graphics | SM | Percentage of total instructions executed that were executed by a vertex shader. |
l1_atoms_bytes | Compute | Cache | Number of bytes written through L1 for ATOM instructions. |
l1_atoms_transactions | Compute | Cache | ATOM transactions. A transaction is 128 bytes. |
l1_atoms_transactions_per_request | Compute | Cache | Number of atom transactions in L1 per atom instructions executed. |
l1_global_load_bytes | Compute | Cache | Number of bytes read from L1 for global memory. |
l1_global_load_hitrate | Compute | Cache | Hit rate in percent in L1 for global load operations. |
l1_global_load_transactions | Compute | Cache | Global load transactions. A transaction is 128 bytes. |
l1_global_load_transactions_hit | Compute | Cache | Global load transactions that hit in the L1 cache. A transaction is 128 bytes. |
l1_global_load_transactions_hit_vsm0 | Compute | Cache | Global load transactions that hit in the L1 cache by this SM. A transaction is 128 bytes. Increments by 0-1 per cycle per SM. |
l1_global_load_transactions_miss | Compute | Cache | Global load transactions that miss in the L1 cache. A transaction is 128 bytes. |
l1_global_load_transactions_miss_vsm0 | Compute | Cache | Global load transactions that miss in the L1 cache by this SM. A transaction is 128 bytes. Increments by 0-1 per cycle per SM. |
l1_global_load_transactions_per_request | Compute | Cache | Number of global load transactions in L1 per global/surface load instructions executed. |
l1_global_load_uncached_transactions | Compute | Cache | Uncached global load executed. A transaction is 128 bytes. |
l1_global_load_uncached_transactions_vsm0 | Compute | Cache | Uncached global load executed by this SM. A transaction is 128 bytes. Increments by 0-1 per cycle per SM. |
l1_global_store_bytes | Compute | Cache | Number of bytes written to L1 for global memory. |
l1_global_store_transactions | Compute | Cache | Global store transactions executed. A transaction is 128 bytes. |
l1_global_store_transactions_per_request | Compute | Cache | Number of global store transactions in L1 per global/surface store instructions executed. |
l1_global_store_transactions_vsm0 | Compute | Cache | Global store transactions executed by this SM. A transaction is 128 bytes. Increments by 0-1 per cycle per SM. |
l1_global_uncached_load_bytes | Compute | Cache | Number of bytes read from L2 for global uncached memory. |
l1_hitrate | Compute | Cache | Hit rate in percent in L1 for global load and local load and store operations. |
l1_l2_bytes | Graphics | Cache | Number of bytes transferred to the L2 unit by the L1 unit. |
l1_l2_requests | Graphics | Cache | Number of L2 requests from the L1 unit. |
l1_local_load_bytes | Compute | Cache | Number of bytes read from L1 for local memory. |
l1_local_load_hitrate | Compute | Cache | Hit rate in percent in L1 for local load operations. |
l1_local_load_transactions | Compute | Cache | Local load transactions. A transaction is 128 bytes. |
l1_local_load_transactions_hit | Compute | Cache | Local load transactions that hit in the L1 cache. A transaction is 128 bytes. |
l1_local_load_transactions_hit_vsm0 | Compute | Cache | Local load transactions that hit in the L1 cache by this SM. A transaction is 128 bytes. Increments by 0-1 per cycle per SM. |
l1_local_load_transactions_miss | Both | Cache | Local load transactions that miss in the L1 cache. A transaction is 128 bytes. |
l1_local_load_transactions_miss_vsm0 | Both | Cache | Local load transactions that miss in the L1 cache by this SM. A transaction is 128 bytes. Increments by 0-1 per cycle per SM. |
l1_local_load_transactions_per_request | Compute | Cache | Number of local load transactions in L1 per local load instructions executed. |
l1_local_store_bytes | Compute | Cache | Number of bytes written to L1 for local memory. |
l1_local_store_hitrate | Compute | Cache | Hit rate in percent in L1 for local store operations. |
l1_local_store_transactions | Compute | Cache | Local store transactions. A transaction is 128 bytes. |
l1_local_store_transactions_hit | Compute | Cache | Local store transactions that hit in the L1 cache. A transaction is 128 bytes. |
l1_local_store_transactions_hit_vsm0 | Compute | Cache | Local store transactions that hit in the L1 cache by this SM. A transaction is 128 bytes. Increments by 0-1 per cycle per SM. |
l1_local_store_transactions_miss | Compute | Cache | Local store transactions that miss in the L1 cache. A transaction is 128 bytes. |
l1_local_store_transactions_miss_vsm0 | Compute | Cache | Local store transactions that miss in the L1 cache by this SM. A transaction is 128 bytes. Increments by 0-1 per cycle per SM. |
l1_local_store_transactions_per_request | Compute | Cache | Number of local store transactions in L1 per local store instructions executed. |
l1_reds_bytes | Compute | Cache | Number of bytes written through L1 for RED instructions. |
l1_reds_transactions | Compute | Cache | RED transactions. A transaction is 128 bytes. |
l1_reds_transactions_per_request | Compute | Cache | Number of red transactions in L1 per red instructions executed. |
l1_shared_bank_conflicts | Compute | Cache | Number of bank conflicts for shared memory operations. |
l1_shared_load_bytes | Compute | Cache | Number of bytes read from L1 for shared memory. |
l1_shared_load_transactions | Compute | Cache | Shared load transactions. A transaction is 256 bytes. |
l1_shared_load_transactions_per_request | Compute | Cache | Number of shared load transactions in L1 per shared load instructions executed. |
l1_shared_load_transactions_vsm0 | Compute | Cache | Shared load transactions by this SM. A transaction is 256 bytes. Increments by 0-1 per cycle per SM. |
l1_shared_store_bytes | Compute | Cache | Number of bytes written to L1 for shared memory. |
l1_shared_store_transactions | Compute | Cache | Shared store transactions. A transaction is 256 bytes. |
l1_shared_store_transactions_per_request | Compute | Cache | Number of shared store transactions in L1 per shared store instructions executed. |
l1_shared_store_transactions_vsm0 | Compute | Cache | Shared store transactions by this SM. A transaction is 256 bytes. Increments by 0-1 per cycle per SM. |
L2 Bottleneck | Graphics | GPU | L2 Is Bottleneck |
l2_read_bytes | Compute | Cache | Number of bytes read from L2. |
l2_read_bytes_atomic | Compute | Cache | Number of bytes read by atomic from L2. |
l2_read_bytes_ia | Graphics | GPU | Number of bytes returned from L2 to the Input Assembler. |
l2_read_bytes_l1 | Compute | Cache | Number of bytes read by L1 from L2. |
l2_read_bytes_rop | Graphics | Cache | Number of bytes read to the L2 unit by the ROP unit. |
l2_read_bytes_tex | Compute | Cache | Number of bytes read by texture from L2. |
l2_read_sectors | Compute | Cache | Number of sectors read from L2. A sector is 32 bytes. |
l2_read_sectors_atomic | Compute | Cache | Number of sectors read by atomic from L2. A sector is 32 bytes. |
l2_read_sectors_l1 | Compute | Cache | Number of sectors read by L1 from L2. A sector is 32 bytes. |
l2_read_sectors_tex | Compute | Cache | Number of sectors read by texture from L2. A sector is 32 bytes. |
l2_slice0_read_sectors_atomic_fb0 | Compute | Cache | Sector reads for ATOM/RED to L2 cache that hit in the L2 cache in the given slice and FB partition. A sector is 32 bytes. |
l2_slice0_read_sectors_fb0 | Compute | Cache | Sector reads that hit in the L2 cache in the given slice and FB partition. A sector is 32 bytes. |
l2_slice0_read_sectors_l1_fb0 | Compute | Cache | Sector reads from L1 to L2 cache that hit in the L2 cache in the given slice and FB partition. A sector is 32 bytes. |
l2_slice0_read_sectors_tex_fb0 | Compute | Cache | Sector reads from TEX to L2 cache that hit in the L2 cache in the given slice and FB partition. A sector is 32 bytes. |
l2_slice0_write_sectors_atomic_fb0 | Compute | Cache | Sector writes for ATOM/RED to L2 cache in the given slice and FB partition. A sector is 32 bytes. |
l2_slice0_write_sectors_fb0 | Compute | Cache | Sector writes to the L2 cache in the given slice and FB partition. A sector is 32 bytes. |
l2_slice0_write_sectors_l1_fb0 | Compute | Cache | Sector writes from L1 to L2 cache in the given slice and FB partition. A sector is 32 bytes. |
l2_slice0_write_sectors_tex_fb0 | Compute | Cache | Sector writes from TEX to L2 cache in the given slice and FB partition. A sector is 32 bytes. |
l2_write_bytes | Compute | Cache | Number of bytes written to L2. |
l2_write_bytes_atomic | Compute | Cache | Number of bytes written by atomic to L2. |
l2_write_bytes_l1 | Compute | Cache | Number of bytes written by L1 to L2. |
l2_write_bytes_rop | Graphics | Cache | Number of bytes written to the L2 unit by the ROP unit. |
l2_write_bytes_tex | Compute | Cache | Number of bytes written by texture to L2. |
l2_write_sectors | Compute | Cache | Number of sectors written to L2. A sector is 32 bytes. |
l2_write_sectors_atomic | Compute | Cache | Number of sectors written by atomic to L2. A sector is 32 bytes. |
l2_write_sectors_l1 | Compute | Cache | Number of sectors written by L1 to L2. A sector is 32 bytes. |
l2_write_sectors_tex | Compute | Cache | Number of sectors written by texture to L2. A sector is 32 bytes. |
OGL % driver waiting | | | OGL Percent of time in frame that driver is waiting. |
OGL AGP/PCI-E usage (bytes) | | | OGL Current amount of AGP or PCI-E memory (non-local video memory) used in bytes. |
OGL AGP/PCI-E usage (MB) | | | OGL Current amount of AGP or PCI-E memory (non-local video memory) used in MB. |
OGL driver sleeping | | | OGL Last frame mSec sleeping in OGL driver. |
OGL FPS | | | OGL Frames/Sec rendered since last sample. |
OGL Frame Batch Count | | | OGL Number of draw batches issued during the last frame. |
OGL Frame Primitive Count | | | OGL Number of primitives issued during the last frame. |
OGL Frame Time | | | OGL Last frame to frame time measured by OGL in mSec. |
OGL Frame Vertex Count | | | OGL Number of vertices issued during the last frame. |
OGL vidmem bytes | | | OGL Current amount of video memory (local video memory) allocated in bytes. Drawables and render targets are not counted. |
OGL vidmem MB | | | OGL Current amount of video memory (local video memory) allocated in MB. Drawables and render targets are not counted. |
OGL vidmem total bytes | | | OGL total amount of video memory (local video memory) in bytes. |
OGL vidmem total MB | | | OGL total amount of video memory (local video memory) in MB. |
Primitive Setup Bottleneck | Graphics | GPU | Primitive Setup is the Bottleneck |
Primitive Setup SOL | Graphics | GPU | Primitive Setup SOL |
Rasterization Bottleneck | Graphics | GPU | Rasterization is the Bottleneck |
Rasterization SOL | Graphics | GPU | Rasterization SOL |
ROP Bottleneck | Graphics | GPU | ROP Is Bottleneck |
ROP SOL | Graphics | GPU | ROP SOL |
setup_primitive_count | Graphics | GPU | Count of primitives seen by the setup unit. |
shaded_pixel_count | Graphics | GPU | Number of rasterized pixels sent to the shading units. |
shader_busy | Graphics | GPU | Cycles the shader unit is busy. |
SHD Bottleneck | Graphics | GPU | SHD Is Bottleneck |
SHD SOL | Graphics | GPU | SHD SOL |
shd_l1_read_bytes | Graphics | Cache | Number of bytes transferred from the L1 unit by the shader unit. |
shd_l1_requests | Graphics | Cache | Number of L1 requests from the shader unit. |
shd_tex_read_bytes | Graphics | Cache | Number of bytes read from the texture unit by the shader unit. |
shd_tex_requests | Graphics | Cache | Number of texel read requests from the shader unit. |
sm_active_cycles | Compute | SM | Sum of cycles that SM was active. Increments by 0-NumSMs per cycle. |
sm_active_cycles_vsm0 | Both | SM | Number of cycles that this SM has at least one active warp. Increments by 0-1 per cycle per SM. |
sm_active_warps | Compute | SM | Sum of warps that SM was active. Increments by 0-64 per cycle per SM. |
sm_branches_diverged | Compute | SM | Increments by one if at least one thread in a warp diverges (that is, follows a different execution path) via a data dependent conditional branch. |
sm_branches_diverged_vsm0 | Compute | SM | Divergent branches by this VSM. This counter increments by one if at least one thread in a warp diverges (that is, follows a different execution path) via a data dependent conditional branch. Increments by 0-1 per cycle. |
sm_branches_executed | Compute | SM | Counts the number of branch instructions executed. |
sm_branches_executed_vsm0 | Compute | SM | Branches taken by this VSM. This counter increments by one if at least one thread in a warp takes the branch. Increments by 0-1 per cycle. |
sm_branches_taken | Compute | SM | Increments by one if at least one thread in a warp takes the branch. |
sm_branches_taken_vsm0 | Compute | SM | Increments by one if at least one thread in a warp takes the branch. Increments by 0-4 per cycle. |
sm_ctas_launched | Compute | SM | Thread blocks launched. Increments by 1 per thread block launched. |
sm_ctas_launched_vsm0 | Compute | SM | Thread blocks launched. Increments by 1 per thread block launched. |
sm_executed_ipc | Compute | SM | The average instructions executed per active cycle per SM. Final value is between 0 and 7. |
sm_inst_executed | Compute | SM | Instructions executed, not including replays. |
sm_inst_executed_atomics | Compute | Memory | ATOM instructions executed, including ATOM.CAS. |
sm_inst_executed_generic_loads | Compute | Memory | Generic load instructions executed. |
sm_inst_executed_generic_loads_vsm0 | Compute | Memory | Generic load instructions executed by this SM. Increments by 0-1 per cycle per SM. |
sm_inst_executed_generic_stores | Compute | Memory | Generic store instructions executed. |
sm_inst_executed_generic_stores_vsm0 | Compute | Memory | Generic store instructions executed by this SM. Increments by 0-1 per cycle per SM. |
sm_inst_executed_local_loads | Compute | Memory | Local load instructions executed. |
sm_inst_executed_local_loads_vsm0 | Compute | Memory | Local load instructions executed by this SM. Increments by 0-1 per cycle per SM. |
sm_inst_executed_local_stores | Compute | Memory | Local store instructions executed. |
sm_inst_executed_local_stores_vsm0 | Compute | Memory | Local store instructions executed by this SM. Increments by 0-1 per cycle per SM. |
sm_inst_executed_lsu_red_vsm0 | Compute | Memory | reduction in SM Quad0 GPC0.TPC0.SM |
sm_inst_executed_reductions | Compute | Memory | RED instructions executed. |
sm_inst_executed_shared_loads | Compute | Memory | Shared load instructions executed. |
sm_inst_executed_shared_loads_vsm0 | Compute | Memory | Shared load instructions executed by this SM. Increments by 0-1 per cycle per SM. |
sm_inst_executed_shared_stores | Compute | Memory | Shared store instructions executed. |
sm_inst_executed_shared_stores_vsm0 | Compute | Memory | Shared store instructions executed by this SM. Increments by 0-1 per cycle per SM. |
sm_inst_executed_surface_loads_byte | Compute | Memory | Surface load instructions (byte mode) executed. |
sm_inst_executed_surface_loads_byte_vsm0 | Compute | Memory | Surface load (byte mode) instructions executed by this SM. Increments by 0-1 per cycle per SM. |
sm_inst_executed_surface_loads_pixel | Compute | Memory | Surface load instructions (pixel mode) executed. |
sm_inst_executed_surface_loads_pixel_vsm0 | Compute | Memory | Surface load (pixel mode) instructions executed by this SM. Increments by 0-1 per cycle per SM. |
sm_inst_executed_surface_stores_byte | Compute | Memory | Surface store instructions (byte mode) executed. |
sm_inst_executed_surface_stores_byte_vsm0 | Compute | Memory | Surface store (byte mode) instructions executed by this SM. Increments by 0-1 per cycle per SM. |
sm_inst_executed_surface_stores_pixel | Compute | Memory | Surface store instructions (pixel mode) executed. |
sm_inst_executed_surface_stores_pixel_vsm0 | Compute | Memory | Surface store (pixel mode) instructions executed by this SM. Increments by 0-1 per cycle per SM. |
sm_inst_executed_texture | Compute | Cache | Texture instructions executed. |
sm_inst_executed_vsm0 | Compute | SM | Instructions executed in this SM, not including replays. Increments by 0-8 per cycle per SM. |
sm_inst_issued | Compute | SM | Instructions issued by the scheduler, including replays. |
sm_inst_issued_vsm0 | Compute | SM | Number of active cycles that this warp scheduler issued an instruction. |
sm_issued_ipc | Compute | SM | The average instructions issued per active cycle per SM. Final value is between 0 and 7. |
sm_pmevent_00 | Compute | SM | __prof_trigger00/pmevent instructions executed. |
sm_pmevent_00_vsm0 | Compute | SM | __prof_trigger00/pmevent instructions executed where at least 1 thread is not predicated off. Increments by 0-1 per warp instruction executed. |
sm_pmevent_01 | Compute | SM | __prof_trigger01/pmevent instructions executed. |
sm_pmevent_01_vsm0 | Compute | SM | __prof_trigger01/pmevent instructions executed where at least 1 thread is not predicated off. Increments by 0-1 per warp instruction executed. |
sm_pmevent_02 | Compute | SM | __prof_trigger02/pmevent instructions executed. |
sm_pmevent_02_vsm0 | Compute | SM | __prof_trigger02/pmevent instructions executed where at least 1 thread is not predicated off. Increments by 0-1 per warp instruction executed. |
sm_pmevent_03 | Compute | SM | __prof_trigger03/pmevent instructions executed. |
sm_pmevent_03_vsm0 | Compute | SM | __prof_trigger03/pmevent instructions executed where at least 1 thread is not predicated off. Increments by 0-1 per warp instruction executed. |
sm_pmevent_04 | Compute | SM | __prof_trigger04/pmevent instructions executed. |
sm_pmevent_04_vsm0 | Compute | SM | __prof_trigger04/pmevent instructions executed where at least 1 thread is not predicated off. Increments by 0-1 per warp instruction executed. |
sm_pmevent_05 | Compute | SM | __prof_trigger05/pmevent instructions executed. |
sm_pmevent_05_vsm0 | Compute | SM | __prof_trigger05/pmevent instructions executed where at least 1 thread is not predicated off. Increments by 0-1 per warp instruction executed. |
sm_pmevent_06 | Compute | SM | __prof_trigger06/pmevent instructions executed. |
sm_pmevent_06_vsm0 | Compute | SM | __prof_trigger06/pmevent instructions executed where at least 1 thread is not predicated off. Increments by 0-1 per warp instruction executed. |
sm_pmevent_07 | Compute | SM | __prof_trigger07/pmevent instructions executed. |
sm_pmevent_07_vsm0 | Compute | SM | __prof_trigger07/pmevent instructions executed where at least 1 thread is not predicated off. Increments by 0-1 per warp instruction executed. |
sm_thread_inst_executed | Compute | SM | Thread instructions executed, not including replays. |
sm_warps_launched | Compute | SM | Warps launched. Increments by 1 per warp launched. |
sm_warps_launched_vsm0 | Compute | SM | Warps launched. Increments by 1 per warp launched. |
Stream Out Bottleneck | Graphics | GPU | Stream Out Is Bottleneck |
Stream Out SOL | Graphics | GPU | Stream Out SOL |
stream_out_bytes | Graphics | GPU | Number of bytes streamed out. |
Tessellator SOL | Graphics | GPU | Tessellator SOL |
TEX Bottleneck | Graphics | GPU | TEX Is Bottleneck |
TEX SOL | Graphics | GPU | TEX SOL |
tex_bank_conflicts | Both | Cache | Bank conflicts that occurred while accessing data from the texture units. |
tex_cache_hitrate | Both | Cache | Hit rate of texture cache queries. |
tex_cache_read_bytes | Both | Cache | Number of bytes read from all texture units. |
tex_cache_sector_queries | Both | Cache | Sector texture cache requests in all texture units. A sector is 32 bytes. |
tex0_bank_conflicts_gpc0_tpc0 | Both | Cache | Texture bank conflicts that occurred while accessing data from the given texture unit in the TPC. |
tex0_cache_sector_misses_gpc0_tpc0 | Both | Cache | Sector texture cache misses in the given texture unit in the TPC. A sector is 32 bytes. |
tex0_cache_sector_queries_gpc0_tpc0 | Both | Cache | Sector texture cache requests in the given texture unit in the TPC. A sector is 32 bytes. |
tex0_cache_texel_queries | Graphics | Cache | Number of texture cache queries (32b each request) |
tex1_bank_conflicts_gpc0_tpc0 | Both | Cache | Texture bank conflicts that occurred while accessing data from the given texture unit in the TPC. |
tex1_cache_sector_misses_gpc0_tpc0 | Both | Cache | Sector texture cache misses in the given texture unit in the TPC. A sector is 32 bytes. |
tex1_cache_sector_queries_gpc0_tpc0 | Both | Cache | Sector texture cache requests in the given texture unit in the TPC. A sector is 32 bytes. |
texture_busy | Graphics | GPU | Cycles the texture unit is busy. |
threads_launched | Compute | SM | Count the total number of threads launched for this TPC. |
threads_launched_gpc0_tpc0 | Compute | SM | Threads launched by this SM. Increments by 1 per thread launched. |
ZCull Bottleneck | Graphics | GPU | ZCull is the Bottleneck |
ZCull SOL | Graphics | GPU | ZCull SOL |
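
Several table entries state a fixed size per counted item (128 bytes per L1 transaction, 256 bytes per shared-memory transaction, 32 bytes per L2 or texture sector), and some counters come in hit/miss pairs. The sketch below shows how such raw counts could be converted by hand. The helper names are hypothetical, the sample numbers are invented, and computing a hit rate as hits divided by hits plus misses is an assumption about how the reported hit-rate counters relate to the hit and miss counters, not something the table states.

```python
# Sizes taken from the counter descriptions in the table above.
L1_TRANSACTION_BYTES = 128
SHARED_TRANSACTION_BYTES = 256
SECTOR_BYTES = 32


def count_to_bytes(count: int, bytes_per_item: int) -> int:
    """Convert a transaction or sector count into bytes."""
    return count * bytes_per_item


def hit_rate_percent(hits: int, misses: int) -> float:
    """Assumed relation: hit rate = hits / (hits + misses), in percent."""
    total = hits + misses
    if total == 0:
        return 0.0
    return 100.0 * hits / total


# Invented sample values, e.g. as read from
# l1_global_load_transactions_hit and l1_global_load_transactions_miss.
hits, misses = 750, 250
global_load_hit_rate = hit_rate_percent(hits, misses)
global_load_bytes = count_to_bytes(hits + misses, L1_TRANSACTION_BYTES)
l2_read_bytes_estimate = count_to_bytes(4000, SECTOR_BYTES)
shared_load_bytes = count_to_bytes(10, SHARED_TRANSACTION_BYTES)
```

Values computed this way are best treated as a rough cross-check against the corresponding `_bytes` and `_hitrate` counters rather than as a replacement for them.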
NVIDIA® Tegra Graphics Debugger Documentation Rev. 2.5.170811 ©2014-2017. NVIDIA Corporation. All Rights Reserved.