
NVIDIA OpenCL SDK Code Samples

The GPU Computing SDK provides examples with source code, utilities, and white papers to help you get started writing GPU Computing software. The full SDK includes dozens of code samples covering a wide range of applications.

Refer to the README for related SDK information.

The latest NVIDIA display drivers are required to run code samples. Please obtain the latest display driver here.

The NVIDIA OpenCL Toolkit is required to compile code samples. Please obtain the OpenCL Toolkit from here.


OpenCL Device Query

This sample enumerates the properties of the OpenCL devices present in the system.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Bandwidth Test

This is a simple test program that measures the memory-copy bandwidth of the GPU. It can currently measure device-to-device, host-to-device, and device-to-host copy bandwidth for pageable and page-locked memory, using memory-mapped and direct access.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Vector Addition

Element-by-element addition of two 1-dimensional arrays. Implemented in OpenCL for CUDA GPUs, with functional comparison against a simple C++ host CPU implementation.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Dot Product

Dot product (scalar product) of a set of input vector pairs. Implemented in OpenCL for CUDA GPUs, with functional comparison against a simple C++ host CPU implementation.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Matrix Vector Multiplication

Simple matrix-vector multiplication example showing increasingly optimized implementations.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Simple Multi-GPU

This application demonstrates how to make use of multiple GPUs in OpenCL.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Simple OpenGL Interop

A simple program that demonstrates interoperability between OpenCL and OpenGL. The program modifies vertex positions with OpenCL and uses OpenGL to render the geometry.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Scan

This example demonstrates an efficient OpenCL implementation of parallel prefix sum, also known as "scan". Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
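
The scan definition above can be sketched on the CPU as follows. This is a Python illustration, not the SDK's OpenCL kernel: `exclusive_scan` is a plain sequential reference, and `blelloch_scan` simulates the well-known up-sweep/down-sweep (Blelloch) formulation, which is assumed here for illustration and may differ from what the sample actually implements. Both function names are hypothetical.

```python
def exclusive_scan(values):
    """Sequential reference: out[i] is the sum of values[0..i-1]."""
    out, running = [], 0
    for v in values:
        out.append(running)
        running += v
    return out


def blelloch_scan(values):
    """Work-efficient exclusive scan; len(values) must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    data = list(values)
    # Up-sweep: build partial sums in a balanced tree.
    stride = 1
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2
    # Down-sweep: clear the root, then push prefixes back down the tree.
    data[n - 1] = 0
    stride = n // 2
    while stride >= 1:
        for i in range(2 * stride - 1, n, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2
    return data
```

Each inner `for` loop touches independent elements, so on a GPU every iteration of that loop could be handled by a separate work-item.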


Parallel Reduction

A parallel sum reduction that computes the sum of large arrays of values. This sample demonstrates several important optimization strategies for parallel algorithms like reduction.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
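
As a rough illustration of the sum reduction described above, the following Python sketch simulates a tree reduction in which the active half of the array is folded onto the other half at each step. It is a CPU simulation under assumed conventions (zero padding to a power of two, the hypothetical name `tree_reduce_sum`), not the sample's OpenCL code or its specific optimizations.

```python
def tree_reduce_sum(values):
    """Sum values with a log2(n)-step tree reduction."""
    data = list(values)
    if not data:
        return 0
    # Pad with zeros to the next power of two so every step halves cleanly.
    size = 1
    while size < len(data):
        size *= 2
    data += [0] * (size - len(data))
    stride = size // 2
    while stride >= 1:
        # Each i is independent, so a GPU would run this loop in parallel.
        for i in range(stride):
            data[i] += data[i + stride]
        stride //= 2
    return data[0]
```

For example, `tree_reduce_sum(range(1, 101))` pads the 100 inputs to 128 elements and finishes in 7 halving steps.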


OpenCL Matrix Transpose

Efficient matrix transpose.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Matrix Multiplication

This sample implements matrix multiplication and is exactly the same as Chapter 6 of the programming guide. It has been written for clarity of exposition to illustrate various OpenCL programming principles, not with the goal of providing the most performant generic kernel for matrix multiplication. CUBLAS provides high-performance matrix multiplication.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL 3D FDTD

This sample applies a finite-difference time-domain (FDTD) progression stencil on a 3D surface.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL DCT 8x8

This sample demonstrates how the Discrete Cosine Transform (DCT) for 8x8 blocks can be implemented in OpenCL.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
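
For reference, the 8x8 DCT mentioned above can be written directly from its definition. The Python sketch below assumes the orthonormal DCT-II convention (the page does not say which normalization the sample uses) and evaluates the double sum naively; the function name `dct_8x8` is illustrative.

```python
import math

N = 8  # block edge length


def dct_8x8(block):
    """Orthonormal 2D DCT-II of an 8x8 block (8 rows of 8 numbers)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for i in range(N):
                for j in range(N):
                    acc += (block[i][j]
                            * math.cos((2 * i + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * j + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * acc
    return out
```

A constant block concentrates all of its energy in the DC coefficient `out[0][0]`, which makes a convenient sanity check.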


OpenCL DirectX Texture Compressor (DXTC)

High Quality DXT Compression using OpenCL. This example shows how to implement an existing computationally-intensive CPU compression algorithm in parallel on the GPU, and obtain an order of magnitude performance improvement.



Whitepaper
Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Radix Sort

This sample demonstrates a very fast and efficient parallel radix sort implemented in OpenCL for CUDA GPUs.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
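
To show the general idea behind radix sort, here is a sequential least-significant-digit sketch in Python. The 4-bit digit width, the 32-bit key size, and the name `radix_sort` are assumptions made for illustration; the sample's parallel OpenCL implementation is not reproduced here.

```python
def radix_sort(keys, bits_per_pass=4, key_bits=32):
    """LSD radix sort for non-negative integers below 2**key_bits."""
    data = list(keys)
    mask = (1 << bits_per_pass) - 1
    for shift in range(0, key_bits, bits_per_pass):
        # Count how many keys fall into each digit bucket.
        counts = [0] * (mask + 1)
        for key in data:
            counts[(key >> shift) & mask] += 1
        # An exclusive scan of the counts gives each bucket's start offset.
        offsets, running = [], 0
        for count in counts:
            offsets.append(running)
            running += count
        # Stable scatter: keys keep their relative order within a bucket.
        out = [0] * len(data)
        for key in data:
            digit = (key >> shift) & mask
            out[offsets[digit]] = key
            offsets[digit] += 1
        data = out
    return data
```

The count, scan, and scatter phases are the parts a GPU version would typically parallelize.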


OpenCL Sorting Networks

This sample implements the bitonic sort algorithm for batches of short arrays.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
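
The bitonic sorting network can be simulated on the CPU as in the Python sketch below. It assumes power-of-two array lengths and uses the hypothetical names `bitonic_sort` and `bitonic_sort_batch`; it is meant to show the compare-exchange pattern, not the sample's kernel.

```python
def bitonic_sort(values):
    """Sort ascending with a bitonic network; length must be a power of two."""
    data = list(values)
    n = len(data)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:      # compare-exchange distance within this merge
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    out_of_order = (data[i] > data[partner]) if ascending \
                        else (data[i] < data[partner])
                    if out_of_order:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data


def bitonic_sort_batch(arrays):
    """Sort each short array of a batch independently."""
    return [bitonic_sort(a) for a in arrays]
```

The comparison pattern depends only on the indices, never on the data, which is why the network maps well onto many work-items running in lockstep.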


OpenCL Black-Scholes Option Pricing

This sample evaluates fair call and put prices for a given set of European options using the Black-Scholes formula.



Whitepaper
Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
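
For readers unfamiliar with the formula, the following Python sketch prices a single European call and put with the standard Black-Scholes closed form. The parameter names and the function `black_scholes` are illustrative, and the normal CDF is computed with `math.erf` rather than whatever approximation the sample may use.

```python
import math


def norm_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes(spot, strike, years, rate, volatility):
    """Return (call, put) prices for one European option."""
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * volatility ** 2) * years) / (volatility * sqrt_t)
    d2 = d1 - volatility * sqrt_t
    discount = math.exp(-rate * years)
    call = spot * norm_cdf(d1) - strike * discount * norm_cdf(d2)
    put = strike * discount * norm_cdf(-d2) - spot * norm_cdf(-d1)
    return call, put
```

Every option is priced independently of the others, so a set of options maps naturally to one work-item per option.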


OpenCL Quasirandom Generator

This sample implements the Niederreiter quasirandom number generator and Moro's inverse cumulative normal distribution generator.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Mersenne Twister

This sample implements the Mersenne Twister random number generator and the Cartesian Box-Muller transformation on the GPU.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
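
The two stages named above can be sketched in Python. The standard library's `random.Random` is itself a Mersenne Twister (MT19937), so it stands in for the GPU generator here; the helper names `box_muller` and `normal_samples` are hypothetical, and the sample's own parameters and seeding are not reproduced.

```python
import math
import random


def box_muller(u1, u2):
    """Map two uniforms (u1 in (0, 1], u2 in [0, 1)) to two normals."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def normal_samples(count, seed=0):
    """Draw count standard-normal samples from a seeded Mersenne Twister."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < count:
        u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log(u1) is finite
        u2 = rng.random()
        samples.extend(box_muller(u1, u2))
    return samples[:count]
```

Each pair of uniforms yields a pair of independent normal values, so the transform needs no communication between work-items.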


OpenCL 64-bin and 256-bin Histogram

This sample demonstrates an efficient implementation of 64-bin and 256-bin histograms.



Whitepaper
Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
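
A common way to build histograms in parallel is to accumulate several partial histograms and merge them at the end. The Python sketch below simulates that pattern for byte data; the number of partial histograms, the use of the top 6 bits of each byte for the 64-bin case, and the name `histogram` are all assumptions, not details taken from the sample.

```python
def histogram(data, bins=256, num_partials=4):
    """Histogram of byte values using merged partial histograms."""
    if bins not in (64, 256):
        raise ValueError("bins must be 64 or 256")
    shift = 0 if bins == 256 else 2  # 64 bins: keep the top 6 bits of a byte
    partials = [[0] * bins for _ in range(num_partials)]
    # Each partial histogram stands in for one group of work-items.
    for index, byte in enumerate(data):
        partials[index % num_partials][byte >> shift] += 1
    # Merge step: sum the partial counts bin by bin.
    return [sum(partial[b] for partial in partials) for b in range(bins)]
```

Keeping separate partial histograms reduces contention on shared counters, at the cost of the final merge pass.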


OpenCL Post-Process OpenGL-Rendered Image

This sample shows how to post-process an image rendered in OpenGL using OpenCL.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


Simple Texture 3D

A simple example that demonstrates the use of 3D textures in OpenCL.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Box Filter 8x8

Linear 2-dimensional 8x8 box filter of an RGBA image. Implemented in OpenCL for CUDA GPUs, with performance comparison against simple C++ on the host CPU. Each of the R, G, B, and A channels is treated independently, with results computed concurrently for each.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Sobel Filter

2-dimensional 3x3 Sobel magnitude filter of an RGBA image. Implemented in OpenCL for CUDA GPUs, with performance comparison against simple C++ on the host CPU. The gradient magnitude for each of the R, G, and B channels is computed concurrently and independently, then combined into a single gradient intensity with linear weighting factors.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
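
The per-channel step of the Sobel filter can be sketched as below. This Python version handles a single channel stored as a list of rows and leaves border pixels at zero, which is an assumption; the weighted combination of the R, G, and B results is omitted because the page does not give the weighting factors.

```python
import math


def sobel_magnitude(image):
    """3x3 Sobel gradient magnitude of one channel; borders are left at 0."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            out[y][x] = math.hypot(gx, gy)
    return out
```

A flat image yields zero everywhere, while a vertical edge produces a response only in `gx`.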


OpenCL Median Filter

Multi-GPU enabled, 2-dimensional 3x3 median filter of an RGBA image. Implemented in OpenCL for CUDA GPUs, with performance comparison against simple C++ on the host CPU. Each of the R, G, and B channels is treated independently, with results computed concurrently for each.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
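
A minimal single-channel version of the 3x3 median filter might look like the Python sketch below. Clamping coordinates at the image border is an assumed edge policy, and `median_filter_3x3` is a hypothetical name; an RGBA image would apply it to each of the R, G, and B channels separately.

```python
def median_filter_3x3(image):
    """3x3 median of one channel; coordinates are clamped at the borders."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = []
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    window.append(image[yy][xx])
            window.sort()
            out[y][x] = window[4]  # middle of the 9 sorted values
    return out
```

Because the median ignores extreme values, an isolated outlier pixel is replaced by the value of its neighbours.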


OpenCL Separable Convolution

This sample implements a convolution filter of a 2D image with an arbitrary separable kernel.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
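
A separable kernel lets a 2D convolution be split into a row pass followed by a column pass. The Python sketch below illustrates this with zero padding at the borders, which is an assumed edge policy; the function names are hypothetical and the code is a CPU reference, not the sample's kernel.

```python
def convolve_rows(image, kernel):
    """1D convolution along each row, with zeros outside the image."""
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for k, weight in enumerate(kernel):
                xx = x + radius - k  # kernel is flipped: true convolution
                if 0 <= xx < width:
                    acc += weight * image[y][xx]
            out[y][x] = acc
    return out


def transpose(image):
    return [list(column) for column in zip(*image)]


def separable_convolve(image, row_kernel, col_kernel):
    """Row pass, then column pass done as a row pass on the transpose."""
    rows_done = convolve_rows(image, row_kernel)
    return transpose(convolve_rows(transpose(rows_done), col_kernel))
```

For a kernel of width K, the two passes cost about 2K multiply-adds per pixel instead of K*K for the equivalent full 2D kernel.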


OpenCL Recursive Gaussian Filter

2-dimensional Gaussian blur filter of an RGBA image using the IRF method. Implemented in OpenCL for CUDA GPUs, with performance comparison against simple C++ on the host CPU. Each of the R, G, B, and A channels is treated independently, with results computed concurrently for each.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Volume Rendering

This sample demonstrates basic volume rendering using 3D textures.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Particle Collision Simulation

Simulation of elastic collisions of a large number of bodies. Implemented in OpenCL for CUDA GPUs.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL N-Body Physics Simulation

Gravitational simulation of a large number of bodies. Implemented in OpenCL for CUDA GPUs.




Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
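
The core of a gravitational N-body step is the all-pairs acceleration sum, sketched here in Python. The softening term, the gravitational constant defaulting to 1, and the name `accelerations` are assumptions for illustration; time integration and the sample's GPU tiling are left out.

```python
import math


def accelerations(positions, masses, softening=0.0, g=1.0):
    """All-pairs gravitational acceleration for bodies in 3D."""
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Vector from body i to body j.
            delta = [positions[j][k] - positions[i][k] for k in range(3)]
            dist_sq = sum(c * c for c in delta) + softening * softening
            # a_i += G * m_j * delta / |delta|^3 (softened).
            factor = g * masses[j] / (dist_sq * math.sqrt(dist_sq))
            for k in range(3):
                acc[i][k] += factor * delta[k]
    return acc
```

The outer loop over `i` is independent per body, so a GPU version would typically assign one work-item to each body.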

Last Update: 2/28/2010