3.16. Occupancy

This section describes the occupancy calculation functions of the low-level CUDA driver application programming interface.

Functions

CUresult cuOccupancyMaxActiveBlocksPerMultiprocessor ( int* numBlocks, CUfunction func, int blockSize, size_t dynamicSMemSize )
Returns occupancy of a function.
CUresult cuOccupancyMaxPotentialBlockSize ( int* minGridSize, int* blockSize, CUfunction func, CUoccupancyB2DSize blockSizeToDynamicSMemSize, size_t dynamicSMemSize, int blockSizeLimit )
Suggests a launch configuration with reasonable occupancy.

Functions

CUresult cuOccupancyMaxActiveBlocksPerMultiprocessor ( int* numBlocks, CUfunction func, int blockSize, size_t dynamicSMemSize )
Returns occupancy of a function.
Parameters
numBlocks
- Returned occupancy, expressed as the maximum number of active blocks per multiprocessor
func
- Kernel for which occupancy is calculated
blockSize
- Block size the kernel is intended to be launched with
dynamicSMemSize
- Per-block dynamic shared memory usage intended, in bytes
Description

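Returns in *numBlocks the maximum number of active blocks per streaming multiprocessor for func, assuming it is launched with blockSize threads per block and dynamicSMemSize bytes of dynamic shared memory per block.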
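
The block count returned in *numBlocks is often converted into an occupancy ratio (active warps divided by the maximum warps a multiprocessor can hold). The following is a minimal sketch of that arithmetic only. The helper name occupancy_fraction and its parameters are hypothetical and not part of the driver API; warpSize and maxThreadsPerSM are assumed to have been queried beforehand, for example with cuDeviceGetAttribute using CU_DEVICE_ATTRIBUTE_WARP_SIZE and CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR.

```c
/* Hypothetical helper (not part of the CUDA driver API).
 * Converts the value returned in *numBlocks into an occupancy ratio:
 * active warps per multiprocessor / maximum warps per multiprocessor.
 * warpSize and maxThreadsPerSM are assumed to come from device attributes. */
double occupancy_fraction(int numBlocks, int blockSize,
                          int warpSize, int maxThreadsPerSM)
{
    int warpsPerBlock;
    int activeWarps;
    int maxWarps;

    /* Reject inputs for which the ratio is undefined. */
    if (numBlocks <= 0 || blockSize <= 0 ||
        warpSize <= 0 || maxThreadsPerSM < warpSize)
        return 0.0;

    /* A partially filled warp still occupies a whole warp slot. */
    warpsPerBlock = (blockSize + warpSize - 1) / warpSize;
    activeWarps   = numBlocks * warpsPerBlock;
    maxWarps      = maxThreadsPerSM / warpSize;

    return (double)activeWarps / (double)maxWarps;
}
```

For instance, under these assumptions, 4 active blocks of 256 threads on a multiprocessor that holds 2048 threads with a warp size of 32 corresponds to a ratio of 0.5.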

Note:

This function may also return error codes from previous, asynchronous launches.

CUresult cuOccupancyMaxPotentialBlockSize ( int* minGridSize, int* blockSize, CUfunction func, CUoccupancyB2DSize blockSizeToDynamicSMemSize, size_t dynamicSMemSize, int blockSizeLimit )
Suggests a launch configuration with reasonable occupancy.
Parameters
minGridSize
- Returned minimum grid size needed to achieve the maximum occupancy
blockSize
- Returned maximum block size that can achieve the maximum occupancy
func
- Kernel for which the launch configuration is calculated
blockSizeToDynamicSMemSize
- A function that calculates how much per-block dynamic shared memory func uses based on the block size
dynamicSMemSize
- Per-block dynamic shared memory usage intended, in bytes
blockSizeLimit
- The maximum block size func is designed to handle
Description

Returns in *blockSize a reasonable block size that achieves the maximum occupancy (that is, the maximum number of active warps with the fewest blocks per multiprocessor), and in *minGridSize the minimum grid size needed to achieve that occupancy.
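
The suggested block size is commonly combined with the problem size to derive the grid size actually launched, rounding up so that every element is covered. The following is a minimal sketch of that rounding step. The helper name grid_size_for and the element count n are hypothetical and not part of the driver API; blockSize is assumed to be the value returned in *blockSize.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the CUDA driver API).
 * Returns the smallest number of blocks such that
 * blocks * blockSize >= n, i.e. ceil(n / blockSize).
 * Returns 0 when blockSize is not positive. */
unsigned int grid_size_for(size_t n, int blockSize)
{
    size_t bs;

    if (blockSize <= 0)
        return 0;

    bs = (size_t)blockSize;
    /* Written as quotient plus remainder check to avoid overflow in n + bs - 1. */
    return (unsigned int)(n / bs + (n % bs != 0 ? 1 : 0));
}
```

A grid computed this way may be smaller than the value returned in *minGridSize when the problem is small; in that case the device cannot be fully occupied by this launch alone.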

If blockSizeLimit is 0, the configurator uses the maximum block size permitted by the device and the function instead.

If per-block dynamic shared memory allocation is not needed, the user should pass NULL for blockSizeToDynamicSMemSize and 0 for dynamicSMemSize.

If per-block dynamic shared memory allocation is needed and its size is constant regardless of block size, the size should be passed through dynamicSMemSize, and blockSizeToDynamicSMemSize should be NULL.

Otherwise, if the per-block dynamic shared memory size varies with the block size, the user needs to provide a unary function through blockSizeToDynamicSMemSize that computes the dynamic shared memory needed by func for any given block size. In this case, dynamicSMemSize is ignored. An example signature is:

    // Takes the block size, returns the dynamic shared memory needed in bytes
    size_t blockToSmem(int blockSize);
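
As an illustration of such a function, the following minimal sketch assumes a hypothetical kernel that needs one float of dynamic shared memory per thread (as a simple block-wide reduction might). The per-thread requirement is an assumption made for this example only. Real code that includes cuda.h may also need the CUDA_CB calling-convention macro on the definition so that it matches CUoccupancyB2DSize.

```c
#include <stddef.h>

/* Example block-size-to-shared-memory function.
 * Assumption for illustration: the kernel uses one float of dynamic
 * shared memory per thread, so the requirement grows linearly with
 * the block size. Returns the size in bytes. */
size_t blockToSmem(int blockSize)
{
    if (blockSize <= 0)
        return 0;
    return (size_t)blockSize * sizeof(float);
}
```

A pointer to a function like this would be passed as blockSizeToDynamicSMemSize, with dynamicSMemSize left at 0 since it is ignored in this mode.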

Note:

This function may also return error codes from previous, asynchronous launches.