NVIDIA CUDA SDK Code Samples

The CUDA Developer SDK provides examples with source code, utilities, and white papers to help you get started writing software with CUDA. The SDK includes dozens of code samples covering a wide range of applications including:

  • Simple techniques such as C++ code integration and efficient loading of custom datatypes
  • How-To examples covering CUDA BLAS and FFT libraries, texture fetching in CUDA, and CUDA interoperation with the OpenGL and Direct3D graphics APIs
  • Linear algebra primitives such as matrix transpose and matrix-matrix multiplication
  • Data-parallel algorithms such as parallel prefix sum of large arrays
  • Performance: profiling using timers and bandwidth tests
  • Advanced application examples such as image convolution, Black-Scholes options pricing and binomial options pricing
Refer to the following READMEs for more information (Linux, Windows).

This code is released free of charge for use in derivative works, whether academic, commercial, or personal. (Full License)

The NVIDIA CUDA Toolkit is required to compile and run the code samples. Please obtain the CUDA Toolkit here.


64-bin Histogram

This sample demonstrates an efficient implementation of a 64-bin histogram.
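As a point of reference, a 64-bin histogram can be sketched on the CPU as follows. This is a minimal Python sketch, not the SDK's CUDA kernel. The 8-bit input and the mapping of each byte to a bin by its top 6 bits (`byte >> 2`) are assumptions made for illustration, and `histogram64` is a hypothetical name.

```python
def histogram64(data):
    """Count 8-bit values into 64 bins (assumed mapping: bin = value >> 2)."""
    bins = [0] * 64
    for byte in data:
        bins[byte >> 2] += 1  # values 0..3 share bin 0, 4..7 share bin 1, ...
    return bins


counts = histogram64(bytes([0, 1, 2, 3, 4, 255]))
```

On a GPU the hard part is that many threads increment the same bins concurrently, which this sequential sketch does not have to deal with.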

Whitepaper
Download - Windows
Download - Linux


Separable Convolution

This sample implements a separable convolution filter of a 2D signal with a Gaussian kernel.
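The idea behind separability can be sketched on the CPU: a 2D Gaussian filter is applied as a 1D pass over rows followed by a 1D pass over columns. The following is a minimal Python sketch, not the SDK's kernel. The function names, the zero padding at the borders, and the normalized kernel are assumptions made for illustration.

```python
import math


def gaussian_kernel(radius, sigma):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_rows(image, kernel):
    """Filter every row with the 1D kernel (zero padding outside the image)."""
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = x + k - radius
                if 0 <= xx < width:
                    acc += w * image[y][xx]
            out[y][x] = acc
    return out


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def separable_convolve(image, kernel):
    """Row pass, then column pass (done as a row pass on the transpose)."""
    rows_done = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(rows_done), kernel))
```

For a kernel of width k, the two 1D passes cost about 2k multiply-adds per pixel instead of k*k for the equivalent 2D filter. The kernel here is symmetric, so the sketch does not flip it.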

Whitepaper
Download - Windows
Download - Linux


Shader-based separable convolution

Shader-based implementation of a separable 2D convolution with a Gaussian kernel. Used for performance comparison against convolutionSeparable.

Download - Windows
Download - Linux


convolutionFFT2D

This sample demonstrates how 2D convolutions with very large kernel sizes can be implemented efficiently using FFTs.

Whitepaper
Download - Windows
Download - Linux


Sobel Filter

This sample implements the Sobel edge detection filter for 8-bit monochrome images.
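For orientation, the Sobel operator itself can be sketched on the CPU as below. This is a minimal Python sketch, not the SDK's implementation. Combining the two gradients as `|gx| + |gy|`, clamping to 255, and leaving border pixels at 0 are assumptions made for illustration.

```python
def sobel(image):
    """Sobel edge magnitude for an 8-bit grayscale image (list of rows)."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]  # border pixels stay 0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column, weights 1-2-1.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row, weights 1-2-1.
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            out[y][x] = min(255, abs(gx) + abs(gy))  # clamp to the 8-bit range
    return out
```

Each output pixel depends only on its 3x3 neighborhood, which is why the filter maps naturally to one GPU thread per pixel.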

Download - Windows
Download - Linux


MersenneTwister

This sample implements the Mersenne Twister random number generator and the Cartesian Box-Muller transformation on the GPU.
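The Cartesian (sine/cosine) form of the Box-Muller transformation turns two uniform samples into two independent standard normal samples. The following is a minimal CPU sketch in Python, not the SDK's GPU code. It relies on Python's `random` module, which uses a Mersenne Twister generator, as the source of uniforms; the function names are hypothetical.

```python
import math
import random


def box_muller(u1, u2):
    """Map uniforms u1 in (0, 1] and u2 in [0, 1) to two standard normals."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def normal_samples(n_pairs, seed=0):
    """Generate 2 * n_pairs normal samples from Mersenne Twister uniforms."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_pairs):
        u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log(u1) is defined
        u2 = rng.random()
        samples.extend(box_muller(u1, u2))
    return samples
```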

Whitepaper
Download - Windows
Download - Linux


Monte-Carlo Option Pricing

This sample evaluates the fair call price for a given set of European options using the Monte-Carlo approach.
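The Monte-Carlo approach can be sketched on the CPU as follows: simulate many terminal stock prices, average the call payoffs, and discount the result. This is a minimal Python sketch and not the SDK's code. It assumes the stock follows geometric Brownian motion under the risk-neutral measure, and the parameter names (`S` spot, `X` strike, `T` years to expiry, `r` risk-free rate, `v` volatility) are chosen for illustration.

```python
import math
import random


def monte_carlo_call(S, X, T, r, v, n_paths, seed=0):
    """Estimate a European call price by averaging simulated payoffs."""
    rng = random.Random(seed)
    drift = (r - 0.5 * v * v) * T
    vol = v * math.sqrt(T)
    payoff_sum = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)                 # one standard normal per path
        s_t = S * math.exp(drift + vol * z)     # simulated price at expiry
        payoff_sum += max(s_t - X, 0.0)         # call payoff
    return math.exp(-r * T) * payoff_sum / n_paths  # discounted mean payoff
```

The statistical error of the estimate shrinks roughly as one over the square root of `n_paths`, and every path is independent, which is what makes the method a good fit for data-parallel hardware.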

Whitepaper
Download - Windows
Download - Linux


Black-Scholes Option Pricing

This sample evaluates fair call and put prices for a given set of European options using the Black-Scholes formula.
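For reference, the Black-Scholes formula for one European option can be written as the short CPU sketch below. This is a minimal Python sketch, not the SDK's kernel. The parameter names (`S` spot, `X` strike, `T` years to expiry, `r` risk-free rate, `v` volatility) are chosen for illustration, and the normal CDF is computed with `math.erf` rather than with whatever approximation the sample uses.

```python
import math


def norm_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes(S, X, T, r, v):
    """Return (call, put) prices of a European option."""
    sqrt_t = math.sqrt(T)
    d1 = (math.log(S / X) + (r + 0.5 * v * v) * T) / (v * sqrt_t)
    d2 = d1 - v * sqrt_t
    discount = math.exp(-r * T)
    call = S * norm_cdf(d1) - X * discount * norm_cdf(d2)
    put = X * discount * norm_cdf(-d2) - S * norm_cdf(-d1)
    return call, put


call, put = black_scholes(100.0, 100.0, 1.0, 0.05, 0.2)
```

Each option is priced independently of the others, so a set of options maps to one thread per option.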

Whitepaper
Download - Windows
Download - Linux


Binomial Option Pricing

This sample evaluates the fair call price for a given set of European options under the binomial model.
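A binomial model prices the option by building a tree of possible stock prices and rolling the expected payoff back from expiry to today. The following is a minimal CPU sketch in Python, not the SDK's code. It assumes the Cox-Ross-Rubinstein parameterization of the tree, and the parameter names (`S` spot, `X` strike, `T` years to expiry, `r` risk-free rate, `v` volatility) are chosen for illustration.

```python
import math


def binomial_call(S, X, T, r, v, n_steps):
    """European call price on a Cox-Ross-Rubinstein binomial tree."""
    dt = T / n_steps
    up = math.exp(v * math.sqrt(dt))
    down = 1.0 / up
    p_up = (math.exp(r * dt) - down) / (up - down)  # risk-neutral probability
    discount = math.exp(-r * dt)

    # Payoffs at expiry: node j has seen j up-moves and n_steps - j down-moves.
    values = [max(S * up ** j * down ** (n_steps - j) - X, 0.0)
              for j in range(n_steps + 1)]

    # Roll back one time step at a time until only the root remains.
    for step in range(n_steps, 0, -1):
        values = [discount * (p_up * values[j + 1] + (1.0 - p_up) * values[j])
                  for j in range(step)]
    return values[0]
```

As `n_steps` grows, the result approaches the Black-Scholes price for the same inputs.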

Whitepaper
Download - Windows
Download - Linux


Fluids (OpenGL Version)

An example of fluid simulation using CUDA and CUFFT, with OpenGL rendering.

Whitepaper
Download - Windows
Download - Linux


Fluids (Direct3D Version)

An example of fluid simulation using CUDA and CUFFT, with Direct3D 9 rendering.

Download - Windows


Image denoising

This sample demonstrates two adaptive image denoising techniques, KNN and NLM, based on computation of both geometric and color distance between texels. While both techniques are implemented in the DirectX SDK using shaders, a massively sped-up variation of the latter technique, taking advantage of shared memory, is implemented in addition to the DirectX counterparts.

Whitepaper
Download - Windows
Download - Linux


DirectX Texture Compressor (DXTC)

Simple DXT compressor.

Whitepaper
Download - Windows
Download - Linux


Post-Process in OpenGL

This sample shows how to post-process an image rendered in OpenGL using CUDA.

Download - Windows
Download - Linux


Box Filter

Fast image box filter using CUDA with OpenGL rendering.

Download - Windows
Download - Linux


Scan

This example demonstrates an efficient CUDA implementation of parallel prefix sum, also known as "scan". Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array.
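The definition above, in which each output element is the sum of all input elements before it, is the exclusive form of scan. A sequential CPU reference can be sketched as below. This is a minimal Python sketch meant only to pin down the expected output, not the parallel algorithm used by the sample; `exclusive_scan` is a hypothetical name.

```python
def exclusive_scan(values):
    """Return a list where out[i] is the sum of values[0] .. values[i - 1]."""
    out = []
    running = 0
    for v in values:
        out.append(running)  # sum of everything strictly before this element
        running += v
    return out


print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
# prints [0, 3, 4, 11, 11, 15, 16, 22]
```

A reference like this is useful for checking the output of a parallel implementation element by element.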

Whitepaper
Download - Windows
Download - Linux


Scan of Large Arrays

This example demonstrates an efficient CUDA implementation of parallel prefix sum (also known as "scan") for arbitrary-sized arrays. Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array.
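One common way to extend a fixed-size scan to arrays of arbitrary length is to scan each block independently, scan the per-block totals, and then add each block's offset back to its elements. The sketch below illustrates that structure on the CPU in Python. Whether the sample is organized exactly this way is an assumption, and the function names and the `block_size` parameter are hypothetical.

```python
def exclusive_scan(values):
    """Sequential exclusive prefix sum."""
    out, running = [], 0
    for v in values:
        out.append(running)
        running += v
    return out


def blocked_exclusive_scan(values, block_size=4):
    """Exclusive scan built from per-block scans plus a scan of block sums."""
    blocks = [values[i:i + block_size]
              for i in range(0, len(values), block_size)]
    scanned = [exclusive_scan(block) for block in blocks]  # independent per block
    block_sums = [sum(block) for block in blocks]
    offsets = exclusive_scan(block_sums)                   # start value of each block
    return [x + offset
            for block, offset in zip(scanned, offsets)
            for x in block]
```

The per-block scans are independent of each other, so on a GPU each one can be handled by its own thread block; the last block may be shorter than `block_size`.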

Whitepaper
Download - Windows
Download - Linux


Bitonic Sort

Bitonic sort is a very simple parallel sorting algorithm that is very efficient when sorting a small number of elements (see http://citeseer.ist.psu.edu/blelloch98experimental.html ). This implementation is based on http://www.tools-of-computing.com/tc/CS/Sorts/bitonic_sort.htm
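The structure of a bitonic sorting network can be sketched sequentially as below. This is a minimal Python sketch of the textbook iterative formulation, not the sample's kernel. It assumes the input length is a power of 2, and `bitonic_sort` is a hypothetical name.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of 2 with a bitonic network."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of 2")
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:       # distance between compared elements
            for i in range(n):  # on a GPU, each i would be one thread
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    out_of_order = a[i] > a[partner] if ascending else a[i] < a[partner]
                    if out_of_order:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

The sequence of comparisons does not depend on the data, so all compare-and-swap steps inside the innermost loop can run in parallel.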

Download - Windows
Download - Linux


1D Discrete Haar Wavelet Decomposition

Discrete Haar wavelet decomposition for 1D signals with a length which is a power of 2.
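A Haar decomposition repeatedly replaces neighboring pairs of values with a scaled sum (approximation) and a scaled difference (detail), halving the signal each time. The following is a minimal CPU sketch in Python, not the sample's kernel. The orthonormal scaling by 1/sqrt(2) and the output layout (approximation first, then details from coarsest to finest) are assumptions made for illustration.

```python
import math


def haar_decompose(signal):
    """Full 1D Haar decomposition of a signal whose length is a power of 2."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    scale = 1.0 / math.sqrt(2.0)
    approx = [float(v) for v in signal]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        level_details = [(a - b) * scale for a, b in pairs]
        approx = [(a + b) * scale for a, b in pairs]
        details = level_details + details  # coarser levels go in front
    return approx + details
```

With this scaling the transform preserves the sum of squares of the signal, which gives a simple sanity check for any implementation.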

Download - Windows
Download - Linux


Matrix Transpose

Efficient matrix transpose.

Download - Windows
Download - Linux


Scalar Product

This sample calculates scalar products of a given set of input vector pairs.

Download - Windows
Download - Linux


Matrix Multiplication

This sample implements matrix multiplication and is exactly the same as the example in Chapter 6 of the programming guide. It has been written for clarity of exposition to illustrate various CUDA programming principles, not with the goal of providing the most performant generic kernel for matrix multiplication. CUBLAS provides high-performance matrix multiplication.

Download - Windows
Download - Linux


Matrix Multiplication (Driver Version)

This sample implements matrix multiplication using the CUDA driver API. It has been written for clarity of exposition to illustrate various CUDA programming principles, not with the goal of providing the most performant generic kernel for matrix multiplication. CUBLAS provides high-performance matrix multiplication.

Download - Windows
Download - Linux


Bandwidth Test

This is a simple test program to measure the memcopy bandwidth of the GPU. It can currently measure device to device copy bandwidth, host to device copy bandwidth for pageable and page-locked memory, and device to host copy bandwidth for pageable and page-locked memory.

Download - Windows
Download - Linux


Clock

This example shows how to use the clock function to measure the performance of a kernel accurately.

Download - Windows
Download - Linux


Multi-GPU

This application demonstrates how to use the CUDA API to use multiple GPUs.

Download - Windows
Download - Linux


Simple CUBLAS

Example of using CUBLAS.

Download - Windows
Download - Linux


Simple CUFFT

Example of using CUFFT. In this example, CUFFT is used to compute the 1D convolution of a signal with a filter by transforming both into the frequency domain, multiplying them together, and transforming the result back to the time domain.
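The three steps described above (forward transform, pointwise multiplication, inverse transform) can be sketched on the CPU as follows. This is a minimal Python sketch that uses a small recursive radix-2 FFT in place of CUFFT. It assumes both inputs have the same power-of-2 length and it computes a circular convolution; the function names are hypothetical.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circular_convolve(signal, filt):
    """Convolve two equal-length sequences through the frequency domain."""
    n = len(signal)
    spectrum_s = fft([complex(v) for v in signal])
    spectrum_f = fft([complex(v) for v in filt])
    product = [a * b for a, b in zip(spectrum_s, spectrum_f)]
    # The inverse transform above is unnormalized, so divide by n here.
    return [v.real / n for v in fft(product, inverse=True)]
```

To obtain an ordinary (linear) convolution with this approach, both inputs would first be zero-padded so that the wrapped-around terms vanish.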

Download - Windows
Download - Linux


Simple Texture

Simple example that demonstrates use of textures in CUDA.

Download - Windows
Download - Linux


Simple Texture (Driver Version)

Simple example that demonstrates use of textures in CUDA using the driver API.

Download - Windows
Download - Linux


Simple OpenGL

Simple program which demonstrates interoperability between CUDA and OpenGL. The program modifies vertex positions with CUDA and uses OpenGL to render the geometry.

Download - Windows
Download - Linux


Simple Direct3D

Simple program which demonstrates interoperability between CUDA and Direct3D.

Download - Windows


Simple Templates

This sample is a templatized version of the template project. It also shows how to correctly templatize dynamically allocated shared memory arrays.

Download - Windows
Download - Linux


Aligned Types

A simple test showing the huge access speed gap between aligned and misaligned structures.

Download - Windows
Download - Linux


C++ Integration

This example demonstrates how to integrate CUDA into an existing C++ application, i.e. the CUDA entry point on the host side is only a function which is called from C++ code, and only the file containing this function is compiled with nvcc. It also demonstrates that vector types can be used from C++ code.

Download - Windows
Download - Linux


Template

A trivial template project that can be used as a starting point to create new CUDA projects.

Download - Windows
Download - Linux


Device Query

This sample enumerates the properties of the CUDA devices present in the system.

Download - Windows
Download - Linux

Last Update: 6/27/2007