2.4. Stream Management
This section describes the stream management functions of the CUDA runtime application programming interface.
Typedefs
- typedef void(CUDART_CB* cudaStreamCallback_t )( cudaStream_t stream, cudaError_t status, void* userData )
Functions
- cudaError_t cudaStreamAddCallback ( cudaStream_t stream, cudaStreamCallback_t callback, void* userData, unsigned int flags )
- Add a callback to a compute stream.
- cudaError_t cudaStreamCreate ( cudaStream_t* pStream )
- Create an asynchronous stream.
- cudaError_t cudaStreamCreateWithFlags ( cudaStream_t* pStream, unsigned int flags )
- Create an asynchronous stream.
- cudaError_t cudaStreamCreateWithPriority ( cudaStream_t* pStream, unsigned int flags, int priority )
- Create an asynchronous stream with the specified priority.
- cudaError_t cudaStreamDestroy ( cudaStream_t stream )
- Destroys and cleans up an asynchronous stream.
- cudaError_t cudaStreamGetFlags ( cudaStream_t hStream, unsigned int* flags )
- Query the flags of a stream.
- cudaError_t cudaStreamGetPriority ( cudaStream_t hStream, int* priority )
- Query the priority of a stream.
- cudaError_t cudaStreamQuery ( cudaStream_t stream )
- Queries an asynchronous stream for completion status.
- cudaError_t cudaStreamSynchronize ( cudaStream_t stream )
- Waits for stream tasks to complete.
- cudaError_t cudaStreamWaitEvent ( cudaStream_t stream, cudaEvent_t event, unsigned int flags )
- Make a compute stream wait on an event.
Typedefs
- void(CUDART_CB* cudaStreamCallback_t )( cudaStream_t stream, cudaError_t status, void* userData )
-
Type of stream callback functions.
Parameters
- stream
- - The stream as passed to cudaStreamAddCallback; may be NULL
- status
- - cudaSuccess or an error code, as described under cudaStreamAddCallback
- userData
- - User parameter provided at registration
Functions
- cudaError_t cudaStreamAddCallback ( cudaStream_t stream, cudaStreamCallback_t callback, void* userData, unsigned int flags )
-
Add a callback to a compute stream.
Parameters
- stream
- - Stream to add callback to
- callback
- - The function to call once preceding stream operations are complete
- userData
- - User specified data to be passed to the callback function
- flags
- - Reserved for future use, must be 0
Description
Adds a callback to be called on the host after all currently enqueued items in the stream have completed. For each cudaStreamAddCallback call, a callback will be executed exactly once. The callback will block later work in the stream until it is finished.
The callback may be passed cudaSuccess or an error code. In the event of a device error, all subsequently executed callbacks will receive an appropriate cudaError_t.
Callbacks must not make any CUDA API calls. Attempting to use CUDA APIs will result in cudaErrorNotPermitted. Callbacks must not perform any synchronization that may depend on outstanding device work or other callbacks that are not mandated to run earlier. Callbacks without a mandated order (in independent streams) execute in undefined order and may be serialized.
This API requires compute capability 1.1 or greater. See cudaDeviceGetAttribute or cudaGetDeviceProperties to query compute capability. Calling this API on a device with a lower compute capability returns cudaErrorNotSupported.
Note: This function may also return error codes from previous, asynchronous launches.
See also:
cudaStreamCreate, cudaStreamCreateWithFlags, cudaStreamQuery, cudaStreamSynchronize, cudaStreamWaitEvent, cudaStreamDestroy
- cudaError_t cudaStreamCreate ( cudaStream_t* pStream )
-
Create an asynchronous stream.
Parameters
- pStream
- - Pointer to new stream identifier
Returns
Description
Creates a new asynchronous stream.
Note: This function may also return error codes from previous, asynchronous launches.
See also:
cudaStreamCreateWithPriority, cudaStreamCreateWithFlags, cudaStreamGetPriority, cudaStreamGetFlags, cudaStreamQuery, cudaStreamSynchronize, cudaStreamWaitEvent, cudaStreamAddCallback, cudaStreamDestroy
- cudaError_t cudaStreamCreateWithFlags ( cudaStream_t* pStream, unsigned int flags )
-
Create an asynchronous stream.
Parameters
- pStream
- - Pointer to new stream identifier
- flags
- - Parameters for stream creation
Returns
Description
Creates a new asynchronous stream. The flags argument determines the behavior of the stream. Valid values for flags are:
- cudaStreamDefault: Default stream creation flag.
- cudaStreamNonBlocking: Specifies that work running in the created stream may run concurrently with work in stream 0 (the NULL stream), and that the created stream should perform no implicit synchronization with stream 0.
Note: This function may also return error codes from previous, asynchronous launches.
See also:
cudaStreamCreate, cudaStreamCreateWithPriority, cudaStreamGetFlags, cudaStreamQuery, cudaStreamSynchronize, cudaStreamWaitEvent, cudaStreamAddCallback, cudaStreamDestroy
- cudaError_t cudaStreamCreateWithPriority ( cudaStream_t* pStream, unsigned int flags, int priority )
-
Create an asynchronous stream with the specified priority.
Parameters
- pStream
- - Pointer to new stream identifier
- flags
- - Flags for stream creation. See cudaStreamCreateWithFlags for a list of valid flags that can be passed
- priority
- - Priority of the stream. Lower numbers represent higher priorities. See cudaDeviceGetStreamPriorityRange for more information about the meaningful stream priorities that can be passed.
Returns
Description
Creates a stream with the specified priority and returns a handle in pStream. This API alters the scheduler priority of work in the stream. Work in a higher-priority stream may preempt work already executing in a lower-priority stream.
priority follows a convention where lower numbers represent higher priorities. '0' represents default priority. The range of meaningful numerical priorities can be queried using cudaDeviceGetStreamPriorityRange. If the specified priority is outside the numerical range returned by cudaDeviceGetStreamPriorityRange, it will automatically be clamped to the lowest or the highest number in the range.
Note:
- This function may also return error codes from previous, asynchronous launches.
- Stream priorities are supported only on Quadro and Tesla GPUs with compute capability 3.5 or higher.
- In the current implementation, only compute kernels launched in priority streams are affected by the stream's priority. Stream priorities have no effect on host-to-device and device-to-host memory operations.
See also:
cudaStreamCreate, cudaStreamCreateWithFlags, cudaDeviceGetStreamPriorityRange, cudaStreamGetPriority, cudaStreamQuery, cudaStreamWaitEvent, cudaStreamAddCallback, cudaStreamSynchronize, cudaStreamDestroy
- cudaError_t cudaStreamDestroy ( cudaStream_t stream )
-
Destroys and cleans up an asynchronous stream.
Parameters
- stream
- - Stream identifier
Description
Destroys and cleans up the asynchronous stream specified by stream.
If the device is still doing work in stream when cudaStreamDestroy() is called, the function returns immediately, and the resources associated with stream are released automatically once the device has completed all work in stream.
Note: This function may also return error codes from previous, asynchronous launches.
See also:
cudaStreamCreate, cudaStreamCreateWithFlags, cudaStreamQuery, cudaStreamWaitEvent, cudaStreamSynchronize, cudaStreamAddCallback
- cudaError_t cudaStreamGetFlags ( cudaStream_t hStream, unsigned int* flags )
-
Query the flags of a stream.
Parameters
- hStream
- - Handle to the stream to be queried
- flags
- - Pointer to an unsigned integer in which the stream's flags are returned
Description
Query the flags of a stream. The flags are returned in flags. See cudaStreamCreateWithFlags for a list of valid flags.
Note: This function may also return error codes from previous, asynchronous launches.
See also:
cudaStreamCreateWithPriority, cudaStreamCreateWithFlags, cudaStreamGetPriority
- cudaError_t cudaStreamGetPriority ( cudaStream_t hStream, int* priority )
-
Query the priority of a stream.
Parameters
- hStream
- - Handle to the stream to be queried
- priority
- - Pointer to a signed integer in which the stream's priority is returned
Description
Query the priority of a stream. The priority is returned in priority. Note that if the stream was created with a priority outside the meaningful numerical range returned by cudaDeviceGetStreamPriorityRange, this function returns the clamped priority. See cudaStreamCreateWithPriority for details about priority clamping.
Note: This function may also return error codes from previous, asynchronous launches.
See also:
cudaStreamCreateWithPriority, cudaDeviceGetStreamPriorityRange, cudaStreamGetFlags
- cudaError_t cudaStreamQuery ( cudaStream_t stream )
-
Queries an asynchronous stream for completion status.
Parameters
- stream
- - Stream identifier
Description
Returns cudaSuccess if all operations in stream have completed, or cudaErrorNotReady if not.
Note: This function may also return error codes from previous, asynchronous launches.
See also:
cudaStreamCreate, cudaStreamCreateWithFlags, cudaStreamWaitEvent, cudaStreamSynchronize, cudaStreamAddCallback, cudaStreamDestroy
- cudaError_t cudaStreamSynchronize ( cudaStream_t stream )
-
Waits for stream tasks to complete.
Parameters
- stream
- - Stream identifier
Description
Blocks until stream has completed all operations. If the cudaDeviceScheduleBlockingSync flag was set for this device, the host thread will block until the stream is finished with all of its tasks.
Note: This function may also return error codes from previous, asynchronous launches.
See also:
cudaStreamCreate, cudaStreamCreateWithFlags, cudaStreamQuery, cudaStreamWaitEvent, cudaStreamAddCallback, cudaStreamDestroy
- cudaError_t cudaStreamWaitEvent ( cudaStream_t stream, cudaEvent_t event, unsigned int flags )
-
Make a compute stream wait on an event.
Parameters
- stream
- - Stream to wait
- event
- - Event to wait on
- flags
- - Parameters for the operation (must be 0)
Description
Makes all future work submitted to stream wait until event reports completion before beginning execution. This synchronization will be performed efficiently on the device. The event may be from a different context than stream, in which case this function will perform cross-device synchronization.
The stream will wait only for the completion of the most recent host call to cudaEventRecord() on event. Once cudaStreamWaitEvent() has returned, any functions (including cudaEventRecord() and cudaEventDestroy()) may be called on event again, and these subsequent calls will not have any effect on stream.
If stream is NULL, any future work submitted in any stream will wait for event to complete before beginning execution. This effectively creates a barrier for all future work submitted to the device on this thread.
If cudaEventRecord() has not been called on event, this call acts as if the record has already completed, and so is a functional no-op.
Note: This function may also return error codes from previous, asynchronous launches.
See also:
cudaStreamCreate, cudaStreamCreateWithFlags, cudaStreamQuery, cudaStreamSynchronize, cudaStreamAddCallback, cudaStreamDestroy