The GPU Computing SDK includes 100+ code samples, utilities, whitepapers, and additional documentation to help you get started developing, porting, and optimizing your applications for the CUDA architecture. From this page you can quickly access many of the SDK resources and the SDK documentation, or download the complete SDK.
Note that you may need to install the latest NVIDIA drivers and the CUDA Toolkit to compile and run the code samples.
Refer to the SDK release notes for more information.
This sample shows how to implement multi-threaded heterogeneous computing workloads with tight cooperation between the CPU and GPU. It uses the new OpenCL 1.1 features: user events, thread-safe API calls, and event callbacks.

Download - Windows (x86)
A simple test application that demonstrates a new capability of the CUDA 4.0 driver: embedding PTX in an OpenCL kernel.

Download - Windows (x86)
This sample enumerates the properties of the OpenCL devices present in the system.

Download - Windows (x86)
A simple test program that measures the memory-copy bandwidth of the GPU. It can currently measure device-to-device copy bandwidth, as well as host-to-device and device-to-host copy bandwidth for pageable and page-locked memory, using both memory-mapped and direct access.

Download - Windows (x86)
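A bandwidth figure of this kind is the number of bytes moved divided by the elapsed time. The C++ sketch below shows only that arithmetic. It times repeated host-side `std::memcpy` calls with `std::chrono::steady_clock` and does not touch the GPU, so it does not model pageable versus page-locked transfers. The function names and the 1 MB = 2^20 bytes convention are assumptions for illustration; the sample itself times OpenCL transfers.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Converts a byte count and an elapsed time into MB/s, taking 1 MB = 2^20 bytes
// (an assumed convention for this sketch).
double bandwidth_mb_per_s(std::size_t bytes, double seconds) {
    assert(seconds > 0.0);
    return (static_cast<double>(bytes) / (1 << 20)) / seconds;
}

// Times `iterations` host-to-host copies of `bytes` bytes with std::memcpy.
// Illustrates the measurement method only; no GPU transfer is involved.
double measure_memcpy_bandwidth(std::size_t bytes, int iterations) {
    assert(bytes > 0 && iterations > 0);
    std::vector<unsigned char> src(bytes, 1), dst(bytes, 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::memcpy(dst.data(), src.data(), bytes);
    }
    const auto stop = std::chrono::steady_clock::now();
    assert(dst[bytes - 1] == 1);  // the copies actually happened
    const double seconds = std::chrono::duration<double>(stop - start).count();
    return bandwidth_mb_per_s(bytes * static_cast<std::size_t>(iterations), seconds);
}
```

Averaging over several iterations, as the loop does, reduces the influence of timer resolution and one-off start-up costs on the reported figure.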
Element-by-element addition of two 1-dimensional arrays, implemented in OpenCL for CUDA GPUs, with a functional comparison against a simple C++ host CPU implementation.

Download - Windows (x86)
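The functional comparison described above can be sketched as a host reference loop plus an element-wise check. The OpenCL C kernel string, its name `VectorAdd`, and the tolerance-based `results_match` helper are hypothetical illustrations, not the SDK sample's code. Building and launching the kernel through the OpenCL host API is omitted.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical OpenCL C kernel: one work-item per element. The bounds check
// lets the global size be rounded up to a multiple of the work-group size.
const char* kVectorAddSource = R"CLC(
__kernel void VectorAdd(__global const float* a,
                        __global const float* b,
                        __global float* c,
                        int n)
{
    int i = get_global_id(0);
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}
)CLC";

// Host reference implementation: c[i] = a[i] + b[i].
std::vector<float> vector_add_reference(const std::vector<float>& a,
                                        const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        c[i] = a[i] + b[i];
    }
    return c;
}

// Functional comparison: true if every element agrees within `tolerance`.
bool results_match(const std::vector<float>& expected,
                   const std::vector<float>& actual, float tolerance) {
    if (expected.size() != actual.size()) return false;
    for (std::size_t i = 0; i < expected.size(); ++i) {
        if (std::fabs(expected[i] - actual[i]) > tolerance) return false;
    }
    return true;
}
```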
Dot product (scalar product) of a set of input vector pairs, implemented in OpenCL for CUDA GPUs, with a functional comparison against a simple C++ host CPU implementation.

Download - Windows (x86)
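A host reference for a batch of dot products might look like the sketch below. The memory layout is an assumption made for the sketch: all pairs are stored back to back, and each vector is `vector_length` elements long. The sample may arrange its data differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes result[p] = sum_i a[p * len + i] * b[p * len + i] for each pair p.
// Assumed layout: pair p occupies elements [p * len, (p + 1) * len) of a and b.
std::vector<float> dot_products_reference(const std::vector<float>& a,
                                          const std::vector<float>& b,
                                          std::size_t vector_length) {
    assert(vector_length > 0);
    assert(a.size() == b.size() && a.size() % vector_length == 0);
    const std::size_t pairs = a.size() / vector_length;
    std::vector<float> result(pairs, 0.0f);
    for (std::size_t p = 0; p < pairs; ++p) {
        float sum = 0.0f;
        for (std::size_t i = 0; i < vector_length; ++i) {
            sum += a[p * vector_length + i] * b[p * vector_length + i];
        }
        result[p] = sum;
    }
    return result;
}
```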
Simple matrix-vector multiplication example showing increasingly optimized implementations.

Download - Windows (x86)
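The sketch below computes y = M v for a row-major `height x width` matrix in two ways. It illustrates a starting point and one common refinement. The first version is a naive row-by-row loop. The second splits each row into fixed-size chunks, computes a partial sum per chunk, and then reduces the partial sums. This mirrors how a GPU kernel can spread one row across several work-items. The chunked variant is an assumption about the kind of optimization involved, not a transcription of the sample's kernels.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Naive reference: y[r] = sum_c M[r * width + c] * v[c].
std::vector<float> matvec_naive(const std::vector<float>& M, const std::vector<float>& v,
                                std::size_t width, std::size_t height) {
    assert(M.size() == width * height && v.size() == width);
    std::vector<float> y(height, 0.0f);
    for (std::size_t r = 0; r < height; ++r) {
        float sum = 0.0f;
        for (std::size_t c = 0; c < width; ++c) sum += M[r * width + c] * v[c];
        y[r] = sum;
    }
    return y;
}

// Chunked variant: each chunk of `chunk` columns yields one partial sum
// (one work-item's share on a GPU); the partial sums are then reduced.
std::vector<float> matvec_chunked(const std::vector<float>& M, const std::vector<float>& v,
                                  std::size_t width, std::size_t height, std::size_t chunk) {
    assert(chunk > 0 && M.size() == width * height && v.size() == width);
    std::vector<float> y(height, 0.0f);
    std::vector<float> partial;
    for (std::size_t r = 0; r < height; ++r) {
        partial.clear();
        for (std::size_t c0 = 0; c0 < width; c0 += chunk) {
            const std::size_t c1 = std::min(c0 + chunk, width);
            float sum = 0.0f;
            for (std::size_t c = c0; c < c1; ++c) sum += M[r * width + c] * v[c];
            partial.push_back(sum);
        }
        float total = 0.0f;
        for (float p : partial) total += p;  // reduction step
        y[r] = total;
    }
    return y;
}
```

The two versions can differ in the last bits for general inputs because they add terms in a different order.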
This application demonstrates how to make use of multiple GPUs in OpenCL.

Download - Windows (x86)
A simple program that demonstrates interoperability between OpenCL and OpenGL. The program modifies vertex positions with OpenCL and uses OpenGL to render the geometry.

Download - Windows (x86)
A simple program that demonstrates Direct3D 10 texture interoperability with OpenCL. The program creates a number of D3D10 textures (2D, 3D, and CubeMap) that are written to by OpenCL kernels. Direct3D then renders the results on the screen.
A simple program that demonstrates Direct3D 9 texture interoperability with OpenCL. The program creates a number of D3D9 textures (2D, 3D, and CubeMap) that are written to by OpenCL kernels. Direct3D then renders the results on the screen.
This sample implements matrix multiplication and is exactly the same as the example in Chapter 6 of the programming guide. It was written for clarity of exposition, to illustrate various OpenCL programming principles, not to provide the most performant generic matrix multiplication kernel. For high-performance matrix multiplication, use CUBLAS.

Download - Windows (x86)
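A central idea in GPU matrix multiplication is tiling. Each block of the output is accumulated from matching tiles of A and B, so that a tile can be reused from fast on-chip (local) memory. The CPU sketch below contrasts a naive triple loop with a tiled loop nest that produces the same result. The function names and row-major layout are assumptions. The tile loops only imitate the access pattern of a work-group; they do not model local memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// C = A * B with A (m x k), B (k x n), C (m x n), all row-major.
std::vector<float> matmul_naive(const std::vector<float>& A, const std::vector<float>& B,
                                std::size_t m, std::size_t k, std::size_t n) {
    assert(A.size() == m * k && B.size() == k * n);
    std::vector<float> C(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t p = 0; p < k; ++p) sum += A[i * k + p] * B[p * n + j];
            C[i * n + j] = sum;
        }
    return C;
}

// Same product, accumulated tile by tile: for each (tile x tile) block of C,
// walk along the shared dimension one tile at a time, as a work-group would.
std::vector<float> matmul_tiled(const std::vector<float>& A, const std::vector<float>& B,
                                std::size_t m, std::size_t k, std::size_t n, std::size_t tile) {
    assert(tile > 0 && A.size() == m * k && B.size() == k * n);
    std::vector<float> C(m * n, 0.0f);
    for (std::size_t i0 = 0; i0 < m; i0 += tile)
        for (std::size_t j0 = 0; j0 < n; j0 += tile)
            for (std::size_t p0 = 0; p0 < k; p0 += tile) {
                const std::size_t i1 = std::min(i0 + tile, m);
                const std::size_t j1 = std::min(j0 + tile, n);
                const std::size_t p1 = std::min(p0 + tile, k);
                for (std::size_t i = i0; i < i1; ++i)
                    for (std::size_t j = j0; j < j1; ++j) {
                        float sum = C[i * n + j];
                        for (std::size_t p = p0; p < p1; ++p) sum += A[i * k + p] * B[p * n + j];
                        C[i * n + j] = sum;
                    }
            }
    return C;
}
```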
This sample demonstrates a fast, efficient parallel radix sort implemented in OpenCL for CUDA GPUs.

Download - Windows (x86)
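A least-significant-digit radix sort makes one pass per digit, and each pass is a counting sort with three steps:

1. Build a histogram of digit values.
2. Turn the histogram into starting offsets with an exclusive prefix sum.
3. Scatter the keys stably into place.

Parallel GPU radix sorts generally parallelize those three steps. The sequential C++ sketch below shows them for 32-bit keys with an assumed 4-bit digit. It is not the sample's implementation.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort of 32-bit unsigned keys, 4 bits per pass (8 passes).
void radix_sort(std::vector<std::uint32_t>& keys) {
    constexpr int kBits = 4;
    constexpr std::uint32_t kBuckets = 1u << kBits;
    std::vector<std::uint32_t> tmp(keys.size());
    for (int shift = 0; shift < 32; shift += kBits) {
        // 1. Histogram of the current digit.
        std::array<std::size_t, kBuckets> count{};
        for (std::uint32_t k : keys) ++count[(k >> shift) & (kBuckets - 1)];
        // 2. Exclusive prefix sum: first output slot for each digit value.
        std::array<std::size_t, kBuckets> offset{};
        std::size_t running = 0;
        for (std::uint32_t d = 0; d < kBuckets; ++d) {
            offset[d] = running;
            running += count[d];
        }
        // 3. Stable scatter: equal digits keep their relative order.
        for (std::uint32_t k : keys) tmp[offset[(k >> shift) & (kBuckets - 1)]++] = k;
        keys.swap(tmp);
    }
    assert(std::is_sorted(keys.begin(), keys.end()));  // postcondition (debug builds)
}
```

Stability of step 3 is what makes the digit-by-digit approach correct: the order established by lower digits survives each later pass.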
This sample evaluates fair call and put prices for a given set of European options using the Black-Scholes formula.

Download - Windows (x86)
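The Black-Scholes formula prices a European option from five inputs: spot price S, strike X, risk-free rate r, volatility σ, and time to expiry T. The call and put prices are C = S N(d1) - X e^(-rT) N(d2) and P = X e^(-rT) N(-d2) - S N(-d1). Here d1 = (ln(S/X) + (r + σ²/2)T) / (σ√T), d2 = d1 - σ√T, and N is the standard normal CDF. The host-side sketch below evaluates those formulas and computes N from `std::erfc`. That choice and the names used are assumptions; GPU implementations often use a polynomial approximation of N instead.

```cpp
#include <cassert>
#include <cmath>

struct OptionPrices {
    double call;
    double put;
};

// Standard normal CDF: N(x) = 0.5 * erfc(-x / sqrt(2)).
double normal_cdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Black-Scholes prices for one European option.
// S: spot, X: strike, T: years to expiry, r: risk-free rate, sigma: volatility.
OptionPrices black_scholes(double S, double X, double T, double r, double sigma) {
    assert(S > 0.0 && X > 0.0 && T > 0.0 && sigma > 0.0);
    const double sqrtT = std::sqrt(T);
    const double d1 = (std::log(S / X) + (r + 0.5 * sigma * sigma) * T) / (sigma * sqrtT);
    const double d2 = d1 - sigma * sqrtT;
    const double discounted_strike = X * std::exp(-r * T);
    return {S * normal_cdf(d1) - discounted_strike * normal_cdf(d2),
            discounted_strike * normal_cdf(-d2) - S * normal_cdf(-d1)};
}
```

Take S = X = 100, T = 1, r = 0.05 and σ = 0.2. This gives a call price of about 10.45 and a put price of about 5.57. The two prices always satisfy put-call parity, C - P = S - X e^(-rT).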
This sample implements a Niederreiter quasirandom number generator and Moro's inverse cumulative normal distribution generator.

Download - Windows (x86)
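Moro's method maps a uniform sample p in (0, 1), such as a quasirandom point, to a standard normal value. It uses a rational approximation in the central region |p - 0.5| < 0.42 and a polynomial in log(-log(q)) in the tails, where q is the smaller tail probability. The sketch below uses the Beasley-Springer-Moro coefficients as they are commonly published. It has not been checked against the SDK kernel, and the Niederreiter generator itself is not shown.

```cpp
#include <cassert>
#include <cmath>

// Inverse standard normal CDF via the Beasley-Springer-Moro approximation.
double moro_inv_cnd(double p) {
    assert(p > 0.0 && p < 1.0);
    static const double a[4] = {2.50662823884, -18.61500062529, 41.39119773534,
                                -25.44106049637};
    static const double b[4] = {-8.47351093090, 23.08336743743, -21.06224101826,
                                3.13082909833};
    static const double c[9] = {0.3374754822726147, 0.9761690190917186, 0.1607979714918209,
                                0.0276438810333863, 0.0038405729373609, 0.0003951896511919,
                                0.0000321767881768, 0.0000002888167364, 0.0000003960315187};
    const double y = p - 0.5;
    if (std::fabs(y) < 0.42) {
        // Central region: rational function in r = y^2.
        const double r = y * y;
        return y * (((a[3] * r + a[2]) * r + a[1]) * r + a[0]) /
               ((((b[3] * r + b[2]) * r + b[1]) * r + b[0]) * r + 1.0);
    }
    // Tails: polynomial in s = log(-log(q)), q being the smaller tail probability.
    const double q = (y > 0.0) ? 1.0 - p : p;
    const double s = std::log(-std::log(q));
    double x = c[8];
    for (int i = 7; i >= 0; --i) x = x * s + c[i];  // Horner evaluation
    return (y < 0.0) ? -x : x;
}
```

For example, `moro_inv_cnd(0.975)` returns approximately 1.96, the familiar 97.5% quantile of the standard normal distribution.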
This sample implements the Mersenne Twister random number generator and the Cartesian Box-Muller transformation on the GPU.

Download - Windows (x86)
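On the host, the same pipeline can be sketched with the standard library's `std::mt19937`, which implements the MT19937 Mersenne Twister. Each 32-bit output becomes a uniform value in (0, 1]. Each pair (u1, u2) then goes through the Box-Muller transform z0 = √(-2 ln u1) cos(2π u2), z1 = √(-2 ln u1) sin(2π u2). The SDK sample runs its own Mersenne Twister on the GPU, so its parameters and output sequence may differ. The helper names here are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Box-Muller: two independent uniforms in (0, 1] -> two independent standard normals.
std::pair<double, double> box_muller(double u1, double u2) {
    assert(u1 > 0.0 && u1 <= 1.0);  // log(0) is undefined
    const double kTwoPi = 6.283185307179586;
    const double radius = std::sqrt(-2.0 * std::log(u1));
    const double theta = kTwoPi * u2;
    return {radius * std::cos(theta), radius * std::sin(theta)};
}

// Maps a 32-bit integer to (0, 1] as (x + 1) / 2^32, so 0 is never produced.
double to_unit_interval(std::uint32_t x) {
    return (static_cast<double>(x) + 1.0) / 4294967296.0;
}

// Generates `count` standard normal samples from an MT19937 stream.
std::vector<double> generate_normals(std::uint32_t seed, std::size_t count) {
    std::mt19937 mt(seed);
    std::vector<double> out;
    out.reserve(count);
    while (out.size() < count) {
        const double u1 = to_unit_interval(static_cast<std::uint32_t>(mt()));
        const double u2 = to_unit_interval(static_cast<std::uint32_t>(mt()));
        const auto z = box_muller(u1, u2);
        out.push_back(z.first);
        if (out.size() < count) out.push_back(z.second);
    }
    return out;
}
```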
A simple example that demonstrates the use of 3D textures in OpenCL.

Download - Windows (x86)
Linear 2-dimensional variable-width box filter of an RGBA image, implemented in OpenCL for CUDA GPUs, with a performance comparison against simple C++ on the host CPU. Each of the R, G, B, and A channels is treated independently, with results computed concurrently for each.

Download - Windows (x86)
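A box filter replaces each pixel with the mean of the pixels in a (2r + 1) x (2r + 1) window. The filter is separable, so it can run as a horizontal pass followed by a vertical pass. A running sum makes each pass cost the same for any radius r, which suits a variable-width filter. The sketch below handles a single channel; call it once per R, G, B and A channel. It assumes clamp-to-edge border handling, and the sample's border policy and function names may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Box-filters n samples spaced `stride` apart with a running sum; clamp-to-edge borders.
void box_filter_1d(const float* in, float* out, int n, int stride, int r) {
    assert(n > 0 && r >= 0);
    const float scale = 1.0f / static_cast<float>(2 * r + 1);
    auto at = [&](int i) { return in[std::clamp(i, 0, n - 1) * stride]; };
    float sum = 0.0f;
    for (int i = -r; i <= r; ++i) sum += at(i);  // window for output 0
    for (int i = 0; i < n; ++i) {
        out[i * stride] = sum * scale;
        sum += at(i + r + 1) - at(i - r);  // slide the window one sample right
    }
}

// 2D box filter of one channel: rows first, then columns.
std::vector<float> box_filter_2d(const std::vector<float>& img, int width, int height, int r) {
    assert(static_cast<int>(img.size()) == width * height);
    std::vector<float> tmp(img.size()), out(img.size());
    for (int y = 0; y < height; ++y)
        box_filter_1d(&img[y * width], &tmp[y * width], width, 1, r);
    for (int x = 0; x < width; ++x)
        box_filter_1d(&tmp[x], &out[x], height, width, r);
    return out;
}
```

For very long rows, accumulating the running sum in `double` or periodically recomputing it limits floating-point drift.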
2-dimensional 3x3 Sobel magnitude filter of an RGBA image, implemented in OpenCL for CUDA GPUs, with a performance comparison against simple C++ on the host CPU. The gradient magnitude for each of the R, G, and B channels is computed concurrently and independently, then combined into a single gradient intensity using linear weighting factors.

Download - Windows (x86)
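For each channel, the Sobel operator applies a horizontal and a vertical derivative kernel to the 3x3 neighborhood and takes the magnitude √(Gx² + Gy²). The sketch below does that for one channel and then combines per-channel magnitudes linearly. It assumes clamp-to-edge borders. The weights are left as parameters because the description does not say which factors the sample uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Gradient magnitude of one channel at (x, y) using the 3x3 Sobel kernels
//   Gx = [-1 0 1; -2 0 2; -1 0 1],  Gy = [-1 -2 -1; 0 0 0; 1 2 1],
// applied as correlation (the sign convention does not affect the magnitude),
// with coordinates clamped at the image borders.
float sobel_magnitude(const std::vector<float>& img, int width, int height, int x, int y) {
    assert(static_cast<int>(img.size()) == width * height);
    auto px = [&](int xx, int yy) {
        xx = std::clamp(xx, 0, width - 1);
        yy = std::clamp(yy, 0, height - 1);
        return img[yy * width + xx];
    };
    const float gx = (px(x + 1, y - 1) + 2.0f * px(x + 1, y) + px(x + 1, y + 1)) -
                     (px(x - 1, y - 1) + 2.0f * px(x - 1, y) + px(x - 1, y + 1));
    const float gy = (px(x - 1, y + 1) + 2.0f * px(x, y + 1) + px(x + 1, y + 1)) -
                     (px(x - 1, y - 1) + 2.0f * px(x, y - 1) + px(x + 1, y - 1));
    return std::sqrt(gx * gx + gy * gy);
}

// Linear combination of per-channel magnitudes; the weights are hypothetical parameters.
float combine_channels(float mag_r, float mag_g, float mag_b, float w_r, float w_g, float w_b) {
    return w_r * mag_r + w_g * mag_g + w_b * mag_b;
}
```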
Multi-GPU-enabled 2-dimensional 3x3 median filter of an RGBA image, implemented in OpenCL for CUDA GPUs, with a performance comparison against simple C++ on the host CPU. Each of the R, G, and B channels is treated independently, with results computed concurrently for each.

Download - Windows (x86)
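For each pixel, a 3x3 median filter gathers the nine neighboring values and selects the middle one. Unlike a mean, the median discards isolated outliers such as salt-and-pepper noise. The single-channel C++ sketch below uses `std::nth_element` for the selection and clamp-to-edge borders; both are assumptions. It does not show how the sample divides the image among multiple GPUs.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

// 3x3 median of one channel at (x, y), clamping coordinates at the borders.
float median3x3(const std::vector<float>& img, int width, int height, int x, int y) {
    std::array<float, 9> window{};
    int n = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            const int xx = std::clamp(x + dx, 0, width - 1);
            const int yy = std::clamp(y + dy, 0, height - 1);
            window[n++] = img[yy * width + xx];
        }
    }
    // Partially sorts so that window[4] holds the median of the nine values.
    std::nth_element(window.begin(), window.begin() + 4, window.end());
    return window[4];
}

// Applies the 3x3 median to every pixel of one channel.
std::vector<float> median_filter(const std::vector<float>& img, int width, int height) {
    assert(static_cast<int>(img.size()) == width * height);
    std::vector<float> out(img.size());
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            out[y * width + x] = median3x3(img, width, height, x, y);
    return out;
}
```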
This sample implements a convolution filter of a 2D image with an arbitrary separable kernel.

Download - Windows (x86)
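A 2D kernel is separable when it equals the outer product of a column vector and a row vector. The image can then be convolved with the row kernel along each row and with the column kernel along each column. For a k x k kernel, this costs about 2k instead of k² multiply-adds per pixel. The single-channel sketch below assumes odd kernel lengths and clamp-to-edge borders. The names and the border policy are illustrative, not the sample's.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Convolves n samples spaced `stride` apart with an odd-length 1D kernel.
// True convolution (kernel flipped); clamp-to-edge borders.
void convolve_1d(const float* in, float* out, int n, int stride,
                 const std::vector<float>& kernel) {
    assert(kernel.size() % 2 == 1 && n > 0);
    const int radius = static_cast<int>(kernel.size()) / 2;
    for (int i = 0; i < n; ++i) {
        float sum = 0.0f;
        for (int k = 0; k < static_cast<int>(kernel.size()); ++k) {
            const int src = std::clamp(i + radius - k, 0, n - 1);
            sum += in[src * stride] * kernel[k];
        }
        out[i * stride] = sum;
    }
}

// Separable 2D convolution of one channel: row pass, then column pass.
std::vector<float> convolve_separable(const std::vector<float>& img, int width, int height,
                                      const std::vector<float>& row_kernel,
                                      const std::vector<float>& col_kernel) {
    assert(static_cast<int>(img.size()) == width * height);
    std::vector<float> tmp(img.size()), out(img.size());
    for (int y = 0; y < height; ++y)  // row pass
        convolve_1d(&img[y * width], &tmp[y * width], width, 1, row_kernel);
    for (int x = 0; x < width; ++x)   // column pass
        convolve_1d(&tmp[x], &out[x], height, width, col_kernel);
    return out;
}
```

Convolving a single-pixel impulse with row kernel {1, 2, 1} and column kernel {1, 2, 1} reproduces their outer product, which is a quick way to check the two passes.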
This sample demonstrates basic volume rendering using 3D textures.

Download - Windows (x86)