NVIDIA CUDA C SDK - Featured Code Samples

The GPU Computing SDK includes 100+ code samples, utilities, whitepapers, and additional documentation to help you get started developing, porting, and optimizing your applications for the CUDA architecture. This page gives quick access to many of the SDK resources; you can also browse the SDK documentation or download the complete SDK.

Please note that you may need to install the latest NVIDIA drivers to compile and run the code samples.

Refer to the SDK release notes for more information.

February 2012

CUDA Segmentation Tree Thrust Library
This sample demonstrates an approach to constructing image segmentation trees. The method is based on Boruvka's minimum spanning tree (MST) algorithm.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

September 2011

simpleAssert
This is a very basic CUDA Runtime API sample that demonstrates how to use the assert function in device code. Requires Compute Capability 2.0 or higher.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
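A minimal sketch of device-side assert, assuming a GPU of Compute Capability 2.0 or higher. The kernel `checkIndices` and the helper `runAssertDemo` are hypothetical names, not the sample's code. A failed assertion stops the kernel and is reported to the host as `cudaErrorAssert` by the next synchronizing call.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread asserts that its global index is below n.
// Device-side assert requires Compute Capability 2.0 or higher and is
// compiled out when NDEBUG is defined.
__global__ void checkIndices(int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    assert(idx < n);
}

// Launches blocks * threads threads and returns the status reported at
// synchronization: cudaSuccess, or cudaErrorAssert if any assertion failed.
cudaError_t runAssertDemo(int n, int blocks, int threads)
{
    checkIndices<<<blocks, threads>>>(n);
    return cudaDeviceSynchronize();
}
```

`runAssertDemo(64, 2, 32)` launches exactly 64 threads and should succeed. Launching more threads than `n` makes some of them fail the assertion; that error is sticky, so the CUDA context cannot be used afterwards and the process should exit.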


Simple Cubemap Texture
Simple example that demonstrates how to use cubemap textures in CUDA C, a new feature in CUDA 4.1.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

August 2011

Volumetric Filtering with 3D Textures and Surface Writes
This sample demonstrates 3D Volumetric Filtering using 3D Textures and 3D Surface Writes.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


Optical Flow
Variational optical flow estimation example. Uses textures for image operations. Shows how a simple PDE solver can be accelerated with CUDA.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


NewDelete
This sample demonstrates dynamic global memory allocation through the device-side C++ new and delete operators, as well as virtual function declarations, available with CUDA 4.0.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
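A minimal sketch, assuming Compute Capability 2.0 or higher, in which each thread creates an object with device-side `new`, calls a virtual function on it through a base-class pointer, and releases it with `delete`. The types `Shape` and `Square` and the helper `computeAreas` are hypothetical. Device-side `new` draws from the device heap, whose size can be changed with `cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...)`.

```cuda
#include <cuda_runtime.h>

// Hypothetical polymorphic types used only for illustration.
struct Shape {
    __device__ virtual ~Shape() {}
    __device__ virtual float area() const = 0;
};

struct Square : public Shape {
    float side;
    __device__ explicit Square(float s) : side(s) {}
    __device__ float area() const override { return side * side; }
};

// Each thread allocates a Square on the device heap, calls the virtual
// area() through a Shape pointer, and frees the object again.
__global__ void areaKernel(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    Shape *s = new Square(static_cast<float>(i));
    if (s == nullptr) {          // defensive: guard against a failed allocation
        out[i] = -1.0f;
        return;
    }
    out[i] = s->area();
    delete s;
}

// Runs areaKernel for n threads and copies the results to hostOut.
cudaError_t computeAreas(float *hostOut, int n)
{
    float *dOut = nullptr;
    cudaError_t err = cudaMalloc(&dOut, n * sizeof(float));
    if (err != cudaSuccess) return err;
    areaKernel<<<(n + 127) / 128, 128>>>(dOut, n);
    err = cudaMemcpy(hostOut, dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return err;
}
```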

February 2011

Simple Peer-to-Peer Transfers with Multi-GPU
This application demonstrates the new CUDA 4.0 APIs that support Peer-to-Peer (P2P) copies, P2P addressing, and Unified Virtual Addressing (UVA) between multiple Tesla GPUs.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


Using Inline PTX
A simple test application that demonstrates the new CUDA 4.0 ability to embed inline PTX assembly in a CUDA kernel.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
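A minimal sketch of inline PTX: the `asm()` statement below reads the `%laneid` special register (the thread's index within its warp), and `%%` escapes the `%` of the register name inside the asm string. The names `laneId`, `writeLaneIds`, and `getLaneIds` are hypothetical.

```cuda
#include <cuda_runtime.h>

// Reads the PTX special register %laneid with inline assembly.
__device__ __forceinline__ unsigned int laneId()
{
    unsigned int lane;
    asm("mov.u32 %0, %%laneid;" : "=r"(lane));
    return lane;
}

__global__ void writeLaneIds(unsigned int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = laneId();
}

// Writes each thread's lane index to hostOut[0..n).
cudaError_t getLaneIds(unsigned int *hostOut, int n)
{
    unsigned int *dOut = nullptr;
    cudaError_t err = cudaMalloc(&dOut, n * sizeof(unsigned int));
    if (err != cudaSuccess) return err;
    // 128 threads per block is a multiple of the warp size
    writeLaneIds<<<(n + 127) / 128, 128>>>(dOut, n);
    err = cudaMemcpy(hostOut, dOut, n * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return err;
}
```

With a block size that is a multiple of 32, element `i` equals `i % 32` on NVIDIA GPUs, whose warp size is 32.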


Simple Layered Texture
Simple example that demonstrates how to use layered textures in CUDA C, a new feature in CUDA 4.0.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

August 2010

VFlockingD3D10
This sample demonstrates a CUDA-based mathematical simulation of the behavior of a flock of birds in flight.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


CUDA Video Encode (C Library) API
This sample demonstrates how to use the CUDA Video Encoder API to encode H.264 video efficiently. Video frames in YUV formats are taken as input (from either CPU system memory or GPU memory), and the output frames are encoded to an H.264 file.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


SLI D3D10 Texture
Simple program which demonstrates SLI with Direct3D10 texture interoperability with CUDA. The program creates a D3D10 texture which is written to from a CUDA kernel. Direct3D then renders the results on the screen. A Direct3D-capable device is required.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

July 2010

Bilateral Filter
The bilateral filter is an edge-preserving, non-linear smoothing filter, implemented here with CUDA and OpenGL rendering. It can be used for image recovery and denoising. Each pixel is weighted by considering both the spatial distance and the color distance between it and its neighbors. Reference: C. Tomasi and R. Manduchi, "Bilateral Filtering for Gray and Color Images," Proceedings of the ICCV, 1998, http://users.soe.ucsc.edu/~manduchi/Papers/ICCV98.pdf

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


simplePrintf
This is a very basic CUDA Runtime API sample that demonstrates how to use the printf function in device code. For devices with compute capability lower than 2.0, the cuPrintf function is called; otherwise, printf can be used directly.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


Simple Surface Write
Simple example that demonstrates the use of 2D surface references (write-to-texture).

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

April 2010

Function Pointers
This sample illustrates how to use function pointers and implements the Sobel Edge Detection filter for 8-bit monochrome images.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


Interval
Interval arithmetic operators and an example that uses them. Uses various C++ features (templates and recursion). The recursive mode requires Compute Capability 2.0.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

March 2010

Simple D3D11 Texture
Simple program which demonstrates Direct3D11 texture interoperability with CUDA. The program creates a number of D3D11 textures (2D, 3D, and CubeMap) which are written to from CUDA kernels. Direct3D then renders the results on the screen. A Direct3D-capable device is required.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

February 2010

Simple Multi Copy and Compute
On GPUs with Compute Capability 1.1 and higher, kernel execution can overlap with one memory copy issued from the host. On Quadro and Tesla GPUs with Compute Capability 2.0, a second copy operation can also be overlapped at full speed, so data can move to and from the device at the same time (PCIe is symmetric). This sample illustrates how to use CUDA streams to overlap kernel execution with data copies to and from the device.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


simpleMPI
Simple example demonstrating how to use MPI in combination with CUDA. This executable is not pre-built with the SDK installer.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

August 2009

Vector Addition
This is a very basic CUDA Runtime API sample that implements element-by-element vector addition. It is the same as the sample in Chapter 3 of the programming guide, with some additions such as error checking.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
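A minimal sketch of the kernel and the host flow (allocate, copy in, launch, check, copy out), with error checking folded into a single status variable so that device memory is always freed. The kernel uses the familiar one-thread-per-element pattern; `vectorAddOnDevice` is a hypothetical helper name.

```cuda
#include <cuda_runtime.h>

// One thread per element; threads past the end of the arrays do nothing.
__global__ void vectorAdd(const float *A, const float *B, float *C, int numElements)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < numElements) C[i] = A[i] + B[i];
}

// Computes hC = hA + hB on the GPU and returns the first error encountered.
cudaError_t vectorAddOnDevice(const float *hA, const float *hB, float *hC, int n)
{
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaError_t err = cudaMalloc(&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dC, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        vectorAdd<<<blocks, threads>>>(dA, dB, dC, n);
        err = cudaGetLastError();   // catches launch-configuration errors
    }
    if (err == cudaSuccess) err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);                   // cudaFree(nullptr) is a no-op
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```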

July 2009

Vector Addition Driver API
This basic sample implements element-by-element vector addition using the CUDA Driver API. It is the same as the sample in Chapter 3 of the programming guide, with some additions such as error checking. This sample also uses the new CUDA 4.0 kernel launch Driver API.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

June 2009

Device Query
This sample enumerates the properties of the CUDA devices present in the system.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
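A minimal sketch of the same idea with the Runtime API calls `cudaGetDeviceCount` and `cudaGetDeviceProperties`. It prints only a handful of the `cudaDeviceProp` fields, and the function name is hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Prints a few cudaDeviceProp fields for every CUDA device and returns the
// number of devices, or -1 if the runtime reports an error (e.g. no driver).
int printDeviceProperties()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) return -1;
        std::printf("Device %d: %s\n", dev, prop.name);
        std::printf("  Compute capability:    %d.%d\n", prop.major, prop.minor);
        std::printf("  Global memory:         %zu MB\n", prop.totalGlobalMem >> 20);
        std::printf("  Multiprocessors:       %d\n", prop.multiProcessorCount);
        std::printf("  Max threads per block: %d\n", prop.maxThreadsPerBlock);
        std::printf("  Warp size:             %d\n", prop.warpSize);
    }
    return count;
}
```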

May 2009

Device Query Driver API
This sample enumerates the properties of the CUDA devices present in the system using CUDA Driver API calls.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

April 2009

Template
A trivial template project that can be used as a starting point to create new CUDA projects.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


Template using CUDA Runtime
A trivial template project that can be used as a starting point to create new CUDA Runtime API projects.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

March 2009

C++ Integration
This example demonstrates how to integrate CUDA into an existing C++ application: the CUDA entry point on the host side is a single function called from C++ code, and only the file containing this function is compiled with nvcc. It also demonstrates that CUDA vector types can be used from .cpp files.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

February 2009

Bandwidth Test
This is a simple test program to measure the memory copy bandwidth of the GPU and across PCIe. It can measure device-to-device copy bandwidth, host-to-device copy bandwidth for pageable and page-locked memory, and device-to-host copy bandwidth for pageable and page-locked memory.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
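A minimal sketch of how such a measurement can be taken: repeated `cudaMemcpy` calls are timed with CUDA events, using either a page-locked buffer from `cudaMallocHost` or ordinary `calloc` memory. The function name and the GB/s convention (1 GB = 10^9 bytes) are assumptions, not taken from the sample.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Host-to-device bandwidth in GB/s for a pinned or pageable host buffer,
// averaged over `reps` copies. Returns a negative value if allocation fails.
double hostToDeviceBandwidth(size_t bytes, bool pinned, int reps)
{
    void *host = nullptr, *dev = nullptr;
    if (pinned) {
        if (cudaMallocHost(&host, bytes) != cudaSuccess) return -1.0;
    } else {
        host = std::calloc(bytes, 1);
        if (host == nullptr) return -1.0;
    }
    if (cudaMalloc(&dev, bytes) != cudaSuccess) {
        if (pinned) cudaFreeHost(host); else std::free(host);
        return -1.0;
    }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    for (int r = 0; r < reps; ++r)
        cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);          // wait until the last copy is done
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dev);
    if (pinned) cudaFreeHost(host); else std::free(host);

    // total bytes moved divided by elapsed seconds, expressed in GB/s
    return (static_cast<double>(bytes) * reps) / (ms / 1000.0) / 1e9;
}
```

Running it once with `pinned = true` and once with `pinned = false` for the same size typically shows a higher figure for page-locked memory, since pageable copies are staged through a pinned driver buffer.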

January 2009

Excel 2007 CUDA Integration Example
This sample demonstrates how to integrate Excel 2007 with CUDA using array formulas. This plug-in depends on the Microsoft Excel Developer Kit. This sample is not pre-built with the CUDA SDK.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


Excel 2010 CUDA Integration Example
This sample demonstrates how to integrate Excel 2010 with CUDA using array formulas. This plug-in depends on the Microsoft Excel 2010 Developer Kit, which can be downloaded from the Microsoft Developer website. This sample is not pre-built with the CUDA SDK.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


asyncAPI
This sample uses CUDA streams and events to overlap execution on the CPU and GPU.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
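A minimal sketch of the pattern: copies and a kernel are queued in the default stream, an event is recorded after them, and the host polls `cudaEventQuery` instead of blocking, leaving the CPU free in the meantime. `incrementKernel` and `asyncIncrement` are hypothetical names, and the host buffer is assumed to be page-locked so the copies are truly asynchronous.

```cuda
#include <cuda_runtime.h>

__global__ void incrementKernel(int *data, int inc, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += inc;
}

// Queues copy-in, kernel and copy-out without blocking the host, then counts
// how many CPU loop iterations run before the GPU work has finished.
// hostData must be page-locked (e.g. from cudaMallocHost). Returns -1 on error.
long long asyncIncrement(int *hostData, int n, int inc)
{
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *dData = nullptr;
    if (cudaMalloc(&dData, bytes) != cudaSuccess) return -1;

    cudaEvent_t done;
    cudaEventCreate(&done);

    cudaMemcpyAsync(dData, hostData, bytes, cudaMemcpyHostToDevice, 0);
    incrementKernel<<<(n + 255) / 256, 256, 0, 0>>>(dData, inc, n);
    cudaMemcpyAsync(hostData, dData, bytes, cudaMemcpyDeviceToHost, 0);
    cudaEventRecord(done, 0);            // completes after the copy-out

    long long cpuIterations = 0;
    while (cudaEventQuery(done) == cudaErrorNotReady)
        ++cpuIterations;                 // independent CPU work could go here

    cudaEventDestroy(done);
    cudaFree(dData);
    return cpuIterations;
}
```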

December 2008

Clock
This example shows how to use the clock function to measure the performance of a kernel accurately.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

November 2008

Simple Atomic Intrinsics
A simple demonstration of global memory atomic instructions. Requires Compute Capability 1.1 or higher.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
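A minimal sketch of global-memory atomics: every thread increments a shared counter with `atomicAdd` and updates a running maximum with `atomicMax`. Without atomics these read-modify-write updates would race and lose results. The names are hypothetical.

```cuda
#include <cuda_runtime.h>

// Every thread atomically adds 1 to *counter and records the largest
// global thread index seen in *maxIdx.
__global__ void atomicDemo(int *counter, int *maxIdx)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    atomicAdd(counter, 1);
    atomicMax(maxIdx, tid);
}

cudaError_t runAtomicDemo(int blocks, int threads, int *hostCounter, int *hostMax)
{
    int *d = nullptr;                    // d[0] = counter, d[1] = max index
    cudaError_t err = cudaMalloc(&d, 2 * sizeof(int));
    if (err != cudaSuccess) return err;
    cudaMemset(d, 0, 2 * sizeof(int));
    atomicDemo<<<blocks, threads>>>(d, d + 1);
    int h[2] = {0, 0};
    err = cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    *hostCounter = h[0];
    *hostMax = h[1];
    return err;
}
```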

October 2008

Pitch Linear Texture
Demonstrates the use of pitch-linear textures.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

September 2008

simpleStreams
This sample uses CUDA streams to overlap kernel executions with memory copies between the host and a GPU device. This sample uses a new CUDA 4.0 feature that supports pinning of generic host memory. Requires Compute Capability 1.1 or higher.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
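A minimal sketch combining both ideas: an ordinary pageable buffer is page-locked in place with `cudaHostRegister`, then split into chunks, each processed by its own stream (copy in, kernel, copy out), so that transfers in one stream can overlap computation in another. The names and the chunking scheme are hypothetical.

```cuda
#include <vector>
#include <cuda_runtime.h>

__global__ void scaleKernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Scales hostData in place on the GPU. hostData is ordinary pageable memory
// that is pinned only for the duration of the call.
cudaError_t scaleWithStreams(float *hostData, int n, float factor, int nStreams)
{
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    cudaError_t err = cudaHostRegister(hostData, bytes, cudaHostRegisterDefault);
    if (err != cudaSuccess) return err;

    float *dData = nullptr;
    err = cudaMalloc(&dData, bytes);
    if (err != cudaSuccess) { cudaHostUnregister(hostData); return err; }

    std::vector<cudaStream_t> streams(nStreams);
    for (auto &s : streams) cudaStreamCreate(&s);

    int chunk = (n + nStreams - 1) / nStreams;
    for (int s = 0; s < nStreams; ++s) {
        int offset = s * chunk;
        int count = (n - offset < chunk) ? n - offset : chunk;
        if (count <= 0) break;
        size_t chunkBytes = static_cast<size_t>(count) * sizeof(float);
        // All three operations go into the same stream, so they run in order
        // relative to each other but may overlap with other streams' work.
        cudaMemcpyAsync(dData + offset, hostData + offset, chunkBytes,
                        cudaMemcpyHostToDevice, streams[s]);
        scaleKernel<<<(count + 255) / 256, 256, 0, streams[s]>>>(dData + offset, factor, count);
        cudaMemcpyAsync(hostData + offset, dData + offset, chunkBytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    err = cudaDeviceSynchronize();

    for (auto &s : streams) cudaStreamDestroy(s);
    cudaFree(dData);
    cudaHostUnregister(hostData);
    return err;
}
```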

August 2008

Simple Templates
This sample is a templatized version of the template project. It also shows how to correctly templatize dynamically allocated shared memory arrays.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
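A minimal sketch of the problem and one common workaround. Declaring `extern __shared__ T smem[]` directly in a template kernel fails to compile once the template is instantiated for two different `T`, because all such declarations name the same array with conflicting types. The helper below declares a single untyped, 16-byte-aligned array and casts it. `SharedMemory`, `reverseBlock`, and `reverseBlocks` are hypothetical names; the sample's own helper may differ in detail.

```cuda
#include <cuda_runtime.h>

// One untyped dynamic shared-memory array, reinterpreted as T*.
// The 16-byte alignment covers types up to 16 bytes (e.g. double, float4).
template <typename T>
struct SharedMemory {
    __device__ T *get()
    {
        extern __shared__ __align__(16) unsigned char sharedRaw[];
        return reinterpret_cast<T *>(sharedRaw);
    }
};

// Reverses each block-sized segment of data using dynamic shared memory.
template <typename T>
__global__ void reverseBlock(T *data)
{
    T *s = SharedMemory<T>().get();
    int t = threadIdx.x;
    int base = blockIdx.x * blockDim.x;
    s[t] = data[base + t];
    __syncthreads();
    data[base + t] = s[blockDim.x - 1 - t];
}

template <typename T>
cudaError_t reverseBlocks(T *hostData, int blocks, int threads)
{
    size_t bytes = static_cast<size_t>(blocks) * threads * sizeof(T);
    T *d = nullptr;
    cudaError_t err = cudaMalloc(&d, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMemcpy(d, hostData, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // third launch parameter: dynamic shared memory size in bytes
        reverseBlock<T><<<blocks, threads, threads * sizeof(T)>>>(d);
        err = cudaMemcpy(hostData, d, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d);
    return err;
}
```

Instantiating `reverseBlocks<int>` and `reverseBlocks<double>` in the same program compiles because both share the one `unsigned char` declaration.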

July 2008

CUDA C 3D FDTD
This sample applies a finite-difference time-domain (FDTD) progression stencil on a 3D surface.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


Simple Texture
Simple example that demonstrates use of Textures in CUDA.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

June 2008

Simple Texture (Driver Version)
Simple example that demonstrates use of Textures in CUDA. This sample uses the new CUDA 4.0 kernel launch Driver API.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

May 2008

Simple Vote Intrinsics
Simple program which demonstrates how to use the vote intrinsics (any, all) in a CUDA kernel. Requires Compute Capability 1.2 or higher.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
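A minimal sketch of the warp vote functions. The original sample predates CUDA 9 and used `__all()`/`__any()`; those were deprecated in CUDA 9 in favor of `__all_sync()`/`__any_sync()`, which take a mask of participating lanes, and this sketch uses the newer form. The names are hypothetical, and the input length is assumed to be a multiple of 32 so that every warp is complete.

```cuda
#include <cuda_runtime.h>

// Lane 0 of each warp records whether all / any of the warp's 32 lanes saw a
// positive value. All 32 lanes take part in the vote (full mask).
__global__ void warpVotes(const int *in, int *allOut, int *anyOut)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int positive = in[i] > 0;
    int voteAll = __all_sync(0xffffffffu, positive);
    int voteAny = __any_sync(0xffffffffu, positive);
    if ((threadIdx.x & 31) == 0) {
        allOut[i / 32] = voteAll;
        anyOut[i / 32] = voteAny;
    }
}

// n must be a multiple of 32; hostAll and hostAny receive n / 32 entries.
cudaError_t runWarpVotes(const int *hostIn, int n, int *hostAll, int *hostAny)
{
    int warps = n / 32;
    int *dIn = nullptr, *dAll = nullptr, *dAny = nullptr;
    cudaError_t err = cudaMalloc(&dIn, n * sizeof(int));
    if (err == cudaSuccess) err = cudaMalloc(&dAll, warps * sizeof(int));
    if (err == cudaSuccess) err = cudaMalloc(&dAny, warps * sizeof(int));
    if (err == cudaSuccess) err = cudaMemcpy(dIn, hostIn, n * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        warpVotes<<<warps, 32>>>(dIn, dAll, dAny);   // one warp per block
        err = cudaMemcpy(hostAll, dAll, warps * sizeof(int), cudaMemcpyDeviceToHost);
    }
    if (err == cudaSuccess) err = cudaMemcpy(hostAny, dAny, warps * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dAll);
    cudaFree(dAny);
    return err;
}
```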

April 2008

simpleZeroCopy
This sample illustrates how to use zero-copy memory, which lets kernels read and write pinned system memory directly. This sample requires GPUs that support this feature (MCP79 and GT200).

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
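A minimal sketch of zero-copy access on device 0: the host buffer is allocated page-locked and mapped with `cudaHostAllocMapped`, and the kernel works through the device pointer returned by `cudaHostGetDevicePointer`, with no `cudaMemcpy` at all. Helper names are hypothetical. On older configurations `cudaSetDeviceFlags(cudaDeviceMapHost)` must be called before the CUDA context is created; with Unified Virtual Addressing the mapping is typically implied.

```cuda
#include <cuda_runtime.h>

__global__ void incrementMapped(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;   // each access travels over the bus to host memory
}

// Fills a mapped, pinned host buffer with 0..n-1, increments it on the GPU,
// and reports the first and last elements.
cudaError_t zeroCopyIncrement(int n, int *firstOut, int *lastOut)
{
    // Must precede context creation; the result is ignored if it is too late.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) return err;
    if (!prop.canMapHostMemory) return cudaErrorNotSupported;

    int *hostPtr = nullptr;
    err = cudaHostAlloc(reinterpret_cast<void **>(&hostPtr), n * sizeof(int),
                        cudaHostAllocMapped);
    if (err != cudaSuccess) return err;
    for (int i = 0; i < n; ++i) hostPtr[i] = i;

    int *devPtr = nullptr;               // device-side alias of hostPtr
    err = cudaHostGetDevicePointer(reinterpret_cast<void **>(&devPtr), hostPtr, 0);
    if (err == cudaSuccess) {
        incrementMapped<<<(n + 255) / 256, 256>>>(devPtr, n);
        err = cudaDeviceSynchronize();   // host sees the writes once the kernel is done
    }
    if (err == cudaSuccess) {
        *firstOut = hostPtr[0];
        *lastOut = hostPtr[n - 1];
    }
    cudaFreeHost(hostPtr);
    return err;
}
```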

March 2008

CUDA Context Thread Management
Simple program illustrating how to use the CUDA Context Management API together with the new CUDA 4.0 parameter passing and kernel launch API. CUDA contexts can be created separately and attached independently to different threads.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

December 2007

Simple Multi-GPU
This application demonstrates how to use the new CUDA 4.0 API for CUDA context management and multi-threaded access to run CUDA kernels on multiple GPUs.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

November 2007

Simple Direct3D9 (Vertex Arrays)
Simple program which demonstrates interoperability between CUDA and Direct3D9. The program generates a vertex array with CUDA and uses Direct3D9 to render the geometry. A Direct3D-capable device is required.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

October 2007

Simple D3D9 Texture
Simple program which demonstrates Direct3D9 texture interoperability with CUDA. The program creates a number of D3D9 textures (2D, 3D, and CubeMap) which are written to from CUDA kernels. Direct3D then renders the results on the screen. A Direct3D-capable device is required.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

September 2007

Simple Direct3D10 (Vertex Array)
Simple program which demonstrates interoperability between CUDA and Direct3D10. The program generates a vertex array with CUDA and uses Direct3D10 to render the geometry. A Direct3D-capable device is required.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

July 2007

Simple D3D10 Texture
Simple program which demonstrates how to interoperate CUDA with Direct3D10 textures. The program creates a number of D3D10 textures (2D, 3D, and CubeMap) which are generated from CUDA kernels. Direct3D then renders the results on the screen. A Direct3D10-capable device is required.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

June 2007

Simple OpenGL
Simple program which demonstrates interoperability between CUDA and OpenGL. The program modifies vertex positions with CUDA and uses OpenGL to render the geometry.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

May 2007

Simple Texture 3D
Simple example that demonstrates use of 3D Textures in CUDA.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

April 2007

cudaOpenMP
This sample demonstrates how to use the OpenMP API to write an application for multiple GPUs. This executable is not pre-built with the SDK installer.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

March 2007

Matrix Multiplication (CUDA Runtime API Version)
This sample implements matrix multiplication exactly as presented in Chapter 6 of the programming guide. It has been written for clarity of exposition to illustrate various CUDA programming principles, not with the goal of providing the most performant generic kernel for matrix multiplication. To illustrate GPU performance for matrix multiplication, this sample also shows how to use the new CUDA 4.0 interface for CUBLAS to achieve high performance.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
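A minimal sketch of the shared-memory tiling idea that this kind of teaching kernel is built around: each block computes one `TILE` x `TILE` tile of C, staging matching tiles of A and B in shared memory so each global element is loaded once per tile instead of once per multiply. `TILE`, `matMulTiled`, and `matMul` are hypothetical, and all dimensions are assumed to be multiples of `TILE` to keep the kernel short. For real workloads, CUBLAS (`cublasSgemm`) is the better choice, as the description above notes.

```cuda
#include <cuda_runtime.h>

#define TILE 16

// C = A * B, row-major. A is hA x wA, B is wA x wB; dimensions are
// assumed to be multiples of TILE.
__global__ void matMulTiled(const float *A, const float *B, float *C, int wA, int wB)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float sum = 0.0f;
    for (int t = 0; t < wA / TILE; ++t) {
        // each thread loads one element of the current A tile and B tile
        As[threadIdx.y][threadIdx.x] = A[row * wA + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * wB + col];
        __syncthreads();                 // tiles fully loaded
        for (int k = 0; k < TILE; ++k)
            sum += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();                 // done with these tiles
    }
    C[row * wB + col] = sum;
}

cudaError_t matMul(const float *hostA, const float *hostB, float *hostC,
                   int hA, int wA, int wB)
{
    size_t bytesA = static_cast<size_t>(hA) * wA * sizeof(float);
    size_t bytesB = static_cast<size_t>(wA) * wB * sizeof(float);
    size_t bytesC = static_cast<size_t>(hA) * wB * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaError_t err = cudaMalloc(&dA, bytesA);
    if (err == cudaSuccess) err = cudaMalloc(&dB, bytesB);
    if (err == cudaSuccess) err = cudaMalloc(&dC, bytesC);
    if (err == cudaSuccess) err = cudaMemcpy(dA, hostA, bytesA, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hostB, bytesB, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(TILE, TILE);
        dim3 grid(wB / TILE, hA / TILE);  // one block per tile of C
        matMulTiled<<<grid, block>>>(dA, dB, dC, wA, wB);
        err = cudaMemcpy(hostC, dC, bytesC, cudaMemcpyDeviceToHost);
    }
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```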

January 2007

Matrix Multiplication (CUDA Driver API version with Dynamic Linking Version)
This sample revisits matrix multiplication using the CUDA driver API. It demonstrates how to link to the CUDA driver at runtime and how to use JIT (just-in-time) compilation from PTX code. It has been written for clarity of exposition to illustrate various CUDA programming principles, not with the goal of providing the most performant generic kernel for matrix multiplication. CUBLAS provides high-performance matrix multiplication.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

December 2006

Scalar Product
This sample calculates scalar products of a given set of input vector pairs.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

November 2006

Concurrent Kernels
This sample demonstrates the use of CUDA streams for concurrent execution of several kernels on devices of compute capability 2.0 or higher. Devices of compute capability 1.x will run the kernels sequentially. It also illustrates how to introduce dependencies between CUDA streams with the new cudaStreamWaitEvent function introduced in CUDA 3.2.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
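A minimal sketch of an inter-stream dependency with `cudaStreamWaitEvent`: work queued in stream `s2` waits on the GPU for an event recorded in stream `s1`, without blocking the host thread. Kernel and helper names are hypothetical.

```cuda
#include <cuda_runtime.h>

__global__ void fillKernel(int *data, int value, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

__global__ void addOneKernel(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Stage 1 runs in stream s1 and stage 2 in stream s2. Stage 2 must see the
// results of stage 1, so s2 waits on an event recorded in s1.
cudaError_t twoStreamPipeline(int *hostOut, int n)
{
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *dData = nullptr;
    cudaError_t err = cudaMalloc(&dData, bytes);
    if (err != cudaSuccess) return err;

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);
    cudaEvent_t stage1Done;
    cudaEventCreateWithFlags(&stage1Done, cudaEventDisableTiming);

    int blocks = (n + 255) / 256;
    fillKernel<<<blocks, 256, 0, s1>>>(dData, 41, n);
    cudaEventRecord(stage1Done, s1);
    cudaStreamWaitEvent(s2, stage1Done, 0);   // GPU-side wait; host continues
    addOneKernel<<<blocks, 256, 0, s2>>>(dData, n);

    err = cudaStreamSynchronize(s2);
    if (err == cudaSuccess) err = cudaMemcpy(hostOut, dData, bytes, cudaMemcpyDeviceToHost);

    cudaEventDestroy(stage1Done);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(dData);
    return err;
}
```

Without the `cudaStreamWaitEvent` call, `addOneKernel` could start before `fillKernel` finishes on devices that run kernels from different streams concurrently.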


Aligned Types
A simple test showing the large gap in access speed between aligned and misaligned structures.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

October 2006

PTX Just-in-Time compilation
This sample illustrates how to use JIT compilation for PTX code.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

September 2006

DCT8x8
This sample demonstrates how the Discrete Cosine Transform (DCT) of 8x8 pixel blocks can be performed using CUDA, with two implementations: a naive one that follows the definition directly, and a more traditional approach used in many libraries. Compared with implementing DCT in a fragment shader, CUDA allows an easier and more efficient implementation.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

August 2006

1D Discrete Haar Wavelet Decomposition
Discrete Haar wavelet decomposition for 1D signals with a length which is a power of 2.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
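A minimal sketch, assuming the orthonormal Haar filters (scaling by 1/sqrt(2)) and one kernel launch per decomposition level; the sample may normalize or organize the levels differently. Helper names are hypothetical. Each level turns pairs of approximation coefficients into a scaled sum (the next approximation) and a scaled difference (a detail coefficient).

```cuda
#include <utility>
#include <cuda_runtime.h>

// One level: pairs (in[2i], in[2i+1]) become approx[i] and detail[i].
__global__ void haarLevel(const float *in, float *approx, float *detail, int half)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < half) {
        const float invSqrt2 = 0.70710678118654752f;
        float a = in[2 * i];
        float b = in[2 * i + 1];
        approx[i] = (a + b) * invSqrt2;
        detail[i] = (a - b) * invSqrt2;
    }
}

// Full decomposition of a signal whose length n is a power of two, returned
// in place in hostSignal as
// [final approximation, coarsest detail, ..., finest details].
cudaError_t haarDecompose(float *hostSignal, int n)
{
    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dOut = nullptr;
    cudaError_t err = cudaMalloc(&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dOut, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dA, hostSignal, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        float *src = dA, *dst = dB;      // ping-pong buffers for approximations
        for (int half = n / 2; half >= 1; half /= 2) {
            // this level's details go to dOut[half, 2 * half)
            haarLevel<<<(half + 255) / 256, 256>>>(src, dst, dOut + half, half);
            std::swap(src, dst);
        }
        // src now holds the single remaining approximation coefficient
        err = cudaMemcpy(dOut, src, sizeof(float), cudaMemcpyDeviceToDevice);
    }
    if (err == cudaSuccess) err = cudaMemcpy(hostSignal, dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dOut);
    return err;
}
```

For example, the signal {4, 2, 5, 5} decomposes to approximately {8, -2, 1.414, 0}.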

July 2006

Eigenvalues
The computation of all or a subset of all eigenvalues is an important problem in linear algebra, statistics, physics, and many other fields. This sample demonstrates a parallel implementation of a bisection algorithm for computing all eigenvalues of a symmetric tridiagonal matrix of arbitrary size with CUDA.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

June 2006

Fast Walsh Transform
Naturally (Hadamard) ordered Fast Walsh Transform for batched vectors of arbitrary eligible (power-of-two) lengths.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

May 2006

CUDA Histogram
This sample demonstrates efficient implementations of 64-bin and 256-bin histograms.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
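A minimal sketch of a 256-bin histogram, assuming Compute Capability 1.2 or higher for shared-memory atomics. Each block accumulates a private histogram in shared memory and then merges it into the global result with one atomic per bin, which keeps contention on global memory low. This is a common technique and not necessarily how the sample itself privatizes its counters; the names are hypothetical.

```cuda
#include <cuda_runtime.h>

#define NUM_BINS 256

__global__ void histogram256(const unsigned char *data, int n, unsigned int *hist)
{
    __shared__ unsigned int local[NUM_BINS];
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x) local[b] = 0;
    __syncthreads();

    // grid-stride loop: each thread handles every (blockDim * gridDim)-th byte
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += blockDim.x * gridDim.x)
        atomicAdd(&local[data[i]], 1u);
    __syncthreads();

    // merge this block's private histogram into the global one
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        atomicAdd(&hist[b], local[b]);
}

cudaError_t computeHistogram256(const unsigned char *hostData, int n, unsigned int *hostHist)
{
    unsigned char *dData = nullptr;
    unsigned int *dHist = nullptr;
    cudaError_t err = cudaMalloc(&dData, n);
    if (err == cudaSuccess) err = cudaMalloc(&dHist, NUM_BINS * sizeof(unsigned int));
    if (err == cudaSuccess) err = cudaMemcpy(dData, hostData, n, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemset(dHist, 0, NUM_BINS * sizeof(unsigned int));
    if (err == cudaSuccess) {
        histogram256<<<64, 256>>>(dData, n, dHist);
        err = cudaMemcpy(hostHist, dHist, NUM_BINS * sizeof(unsigned int),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(dData);
    cudaFree(dHist);
    return err;
}
```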

April 2006

Line of Sight
This sample is an implementation of a simple line-of-sight algorithm: Given a height map and a ray originating at some observation point, it computes all the points along the ray that are visible from the observation point. The implementation is based on the Thrust library (http://code.google.com/p/thrust/).

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

March 2006

Matrix Transpose
This sample demonstrates matrix transpose. Several kernel variants with different performance are compared to show how high performance is achieved.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
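A minimal sketch of the usual optimized variant: each block reads a 32x32 tile of the input with coalesced loads into shared memory and writes it back transposed, again with coalesced stores. The extra padding column avoids shared-memory bank conflicts when the tile is read column-wise. Names, the tile size, and the 32x8 thread block are assumptions.

```cuda
#include <cuda_runtime.h>

#define TILE_DIM 32
#define BLOCK_ROWS 8

// in is height x width (row-major); out is width x height.
// Each 32x8 thread block moves one 32x32 tile, 4 rows per thread.
__global__ void transposeTiled(float *out, const float *in, int width, int height)
{
    __shared__ float tile[TILE_DIM][TILE_DIM + 1];   // +1 column avoids bank conflicts

    int x = blockIdx.x * TILE_DIM + threadIdx.x;     // input column
    int y = blockIdx.y * TILE_DIM + threadIdx.y;     // input row
    for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS)
        if (x < width && y + j < height)
            tile[threadIdx.y + j][threadIdx.x] = in[(y + j) * width + x];
    __syncthreads();

    x = blockIdx.y * TILE_DIM + threadIdx.x;         // output column (= input row)
    y = blockIdx.x * TILE_DIM + threadIdx.y;         // output row (= input column)
    for (int j = 0; j < TILE_DIM; j += BLOCK_ROWS)
        if (x < height && y + j < width)
            out[(y + j) * height + x] = tile[threadIdx.x][threadIdx.y + j];
}

cudaError_t transposeOnDevice(const float *hostIn, float *hostOut, int width, int height)
{
    size_t bytes = static_cast<size_t>(width) * height * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr;
    cudaError_t err = cudaMalloc(&dIn, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dOut, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dIn, hostIn, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 grid((width + TILE_DIM - 1) / TILE_DIM, (height + TILE_DIM - 1) / TILE_DIM);
        dim3 block(TILE_DIM, BLOCK_ROWS);
        transposeTiled<<<grid, block>>>(dOut, dIn, width, height);
        err = cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dIn);
    cudaFree(dOut);
    return err;
}
```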

February 2006

Box Filter
Fast image box filter using CUDA with OpenGL rendering.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

January 2006

Post-Process in OpenGL
This sample shows how to post-process an image rendered in OpenGL using CUDA.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

December 2005

CUDA Parallel Reduction
A parallel sum reduction that computes the sum of a large array of values. This sample demonstrates several important optimization strategies for data-parallel algorithms such as reduction.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
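A minimal sketch of a block-level tree reduction. It uses two well-known optimizations (sequential addressing, which keeps active threads contiguous and avoids shared-memory bank conflicts, and performing the first addition while loading) but not loop unrolling or templated block sizes; per-block partial sums are finished on the host. Names are hypothetical, `n` is assumed to be greater than zero, and the block size must be a power of two.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each block reduces up to 2 * blockDim.x inputs to one partial sum.
__global__ void reduceSum(const float *in, float *blockSums, unsigned int n)
{
    extern __shared__ float sdata[];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * (blockDim.x * 2) + threadIdx.x;

    // first addition happens during the load from global memory
    float v = (i < n) ? in[i] : 0.0f;
    if (i + blockDim.x < n) v += in[i + blockDim.x];
    sdata[tid] = v;
    __syncthreads();

    // sequential addressing: halve the number of active threads each step
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) blockSums[blockIdx.x] = sdata[0];
}

cudaError_t reduceOnDevice(const float *hostIn, unsigned int n, double *result)
{
    const unsigned int threads = 256;
    unsigned int blocks = (n + threads * 2 - 1) / (threads * 2);
    float *dIn = nullptr, *dSums = nullptr;
    cudaError_t err = cudaMalloc(&dIn, n * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc(&dSums, blocks * sizeof(float));
    if (err == cudaSuccess) err = cudaMemcpy(dIn, hostIn, n * sizeof(float), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        reduceSum<<<blocks, threads, threads * sizeof(float)>>>(dIn, dSums, n);
        std::vector<float> sums(blocks);
        err = cudaMemcpy(sums.data(), dSums, blocks * sizeof(float), cudaMemcpyDeviceToHost);
        double total = 0.0;
        for (float s : sums) total += s;   // finish the last level on the host
        *result = total;
    }
    cudaFree(dIn);
    cudaFree(dSums);
    return err;
}
```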

November 2005

CUDA Parallel Prefix Sum (Scan)
This example demonstrates an efficient CUDA implementation of parallel prefix sum, also known as "scan". Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
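A minimal sketch of an exclusive scan for a single thread block, using the simple Hillis-Steele scheme with double-buffered shared memory. It performs O(n log n) additions, so it is not work-efficient and is not the sample's implementation; it handles at most 1024 elements (one block), and the names are hypothetical.

```cuda
#include <cuda_runtime.h>

// Exclusive prefix sum of n <= blockDim.x ints with one thread block.
// temp holds two buffers of blockDim.x ints; each step reads one buffer and
// writes the other, so no thread overwrites a value another still needs.
__global__ void exclusiveScanBlock(const int *in, int *out, int n)
{
    extern __shared__ int temp[];
    int tid = threadIdx.x;
    int size = blockDim.x;
    int pout = 0, pin = 1;

    // Shift the input right by one and insert 0: an inclusive scan of this
    // shifted array is the exclusive scan of the original input.
    temp[tid] = (tid > 0 && tid < n) ? in[tid - 1] : 0;
    __syncthreads();

    for (int offset = 1; offset < size; offset <<= 1) {
        pout = 1 - pout;                 // swap the roles of the two buffers
        pin = 1 - pout;
        int val = temp[pin * size + tid];
        if (tid >= offset) val += temp[pin * size + tid - offset];
        temp[pout * size + tid] = val;
        __syncthreads();
    }
    if (tid < n) out[tid] = temp[pout * size + tid];
}

// Exclusive scan of n ints (1 <= n <= 1024) using one block of n threads.
cudaError_t exclusiveScan(const int *hostIn, int *hostOut, int n)
{
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *dIn = nullptr, *dOut = nullptr;
    cudaError_t err = cudaMalloc(&dIn, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dOut, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dIn, hostIn, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        exclusiveScanBlock<<<1, n, 2 * bytes>>>(dIn, dOut, n);
        err = cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dIn);
    cudaFree(dOut);
    return err;
}
```

For example, the input {3, 1, 7, 0, 4, 1, 6, 3} yields {0, 3, 4, 11, 11, 15, 16, 22}.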

October 2005

DirectX Texture Compressor (DXTC)
High Quality DXT Compression using CUDA. This example shows how to implement an existing computationally-intensive CPU compression algorithm in parallel on the GPU, and obtain an order-of-magnitude performance improvement.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

September 2005

Image denoising
This sample demonstrates two adaptive image denoising techniques, KNN and NLM, based on computing both the geometric and the color distance between texels. Both techniques are also implemented in the DirectX SDK using shaders; in addition to counterparts of those DirectX versions, this sample implements a much faster variant of the latter technique that takes advantage of shared memory.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

August 2005

Sobel Filter
This sample implements the Sobel edge detection filter for 8-bit monochrome images.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

July 2005

Recursive Gaussian Filter
This sample implements a Gaussian blur using Deriche's recursive method. The advantage of this method is that the execution time is independent of the filter width.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
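
The sketch below applies a 1D recursive Gaussian along one line of samples. It uses the commonly published coefficients for Deriche's smoothing filter, with alpha = 1.695 / sigma. A causal pass and an anti-causal pass each evaluate a second-order recurrence, so the per-sample cost is fixed regardless of sigma. Clamp-to-edge initialization and the function name are assumptions. A 2D blur would apply the filter along rows and then along columns.

```cpp
#include <cmath>
#include <vector>

// 1D recursive Gaussian approximation (Deriche smoothing filter).
// Work per sample is constant, independent of sigma.
std::vector<float> recursiveGaussian(const std::vector<float>& x, float sigma) {
    const int n = static_cast<int>(x.size());
    std::vector<float> y(n, 0.0f);
    if (n == 0) return y;
    const float alpha = 1.695f / sigma;
    const float ema = std::exp(-alpha), ema2 = std::exp(-2.0f * alpha);
    const float b1 = -2.0f * ema, b2 = ema2;
    const float k = (1.0f - ema) * (1.0f - ema) / (1.0f + 2.0f * alpha * ema - ema2);
    const float a0 = k, a1 = k * (alpha - 1.0f) * ema;
    const float a2 = k * (alpha + 1.0f) * ema, a3 = -k * ema2;
    // Steady-state gains, used to initialise the recurrences with clamp-to-edge borders.
    const float coefp = (a0 + a1) / (1.0f + b1 + b2);
    const float coefn = (a2 + a3) / (1.0f + b1 + b2);

    // Causal (forward) pass: y[i] = a0 x[i] + a1 x[i-1] - b1 y[i-1] - b2 y[i-2].
    float xp = x[0], yp = coefp * xp, yb = yp;
    for (int i = 0; i < n; ++i) {
        float xc = x[i];
        float yc = a0 * xc + a1 * xp - b1 * yp - b2 * yb;
        xp = xc; yb = yp; yp = yc;
        y[i] = yc;
    }
    // Anti-causal (backward) pass, added to the forward result:
    // y2[i] = a2 x[i+1] + a3 x[i+2] - b1 y2[i+1] - b2 y2[i+2].
    float xn = x[n - 1], xa = xn, yn = coefn * xn, ya = yn;
    for (int i = n - 1; i >= 0; --i) {
        float xc = x[i];
        float yc = a2 * xn + a3 * xa - b1 * yn - b2 * ya;
        xa = xn; xn = xc; ya = yn; yn = yc;
        y[i] += yc;
    }
    return y;
}
```

The coefficients are chosen so that the combined gain is exactly 1, so a constant signal passes through unchanged.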

June 2005

CUDA Video Decoder D3D9 API
This sample demonstrates how to efficiently use the CUDA Video Decoder API to decode MPEG-2, VC-1, or H.264 sources. YUV-to-RGB conversion of the video is performed by a CUDA kernel, and the output is rendered to a D3D9 surface. By default, the decoded video is not displayed on screen; pass -displayvideo on the command line to see the output. Requires a Direct3D-capable device and Compute Capability 1.1 or higher.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
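
The YUV-to-RGB step is a per-pixel color-space conversion that maps naturally to one CUDA thread per pixel. This CPU sketch assumes NV12 input: a full-resolution Y plane followed by an interleaved, half-resolution UV plane, with a pitch equal to the width. It also assumes BT.601 limited-range coefficients. Both are assumptions, and the sample's kernel may use a different layout or color matrix. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert one NV12 frame to packed 8-bit RGB (BT.601 limited range).
// Assumes even width and height, and a row pitch equal to the width.
std::vector<uint8_t> nv12ToRgb(const std::vector<uint8_t>& frame, int w, int h) {
    std::vector<uint8_t> rgb(static_cast<std::size_t>(w) * h * 3);
    const uint8_t* yPlane = frame.data();
    const uint8_t* uvPlane = frame.data() + static_cast<std::size_t>(w) * h;
    // Round and saturate to [0, 255].
    auto sat = [](float v) {
        return static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, v + 0.5f)));
    };
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float Y = 1.164f * (yPlane[y * w + x] - 16);
            // One interleaved (U, V) pair is shared by each 2x2 block of pixels.
            const uint8_t* uv = uvPlane + (y / 2) * w + (x / 2) * 2;
            float U = uv[0] - 128.0f, V = uv[1] - 128.0f;
            uint8_t* out = &rgb[(static_cast<std::size_t>(y) * w + x) * 3];
            out[0] = sat(Y + 1.596f * V);
            out[1] = sat(Y - 0.813f * V - 0.391f * U);
            out[2] = sat(Y + 2.018f * U);
        }
    return rgb;
}
```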

May 2005

CUDA Video Decoder GL API
This sample demonstrates how to efficiently use the CUDA Video Decoder API to decode MPEG-2, VC-1, and H.264 video sources. YUV-to-RGB conversion of the video is performed by a CUDA kernel, and the output is rendered to an OpenGL surface. By default, the decoded video is not shown (the window stays black); add -displayvideo to the command line to display it. Requires Compute Capability 1.1 or higher.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

April 2005

Bicubic Texture Filtering
This sample demonstrates how to efficiently implement bicubic texture filtering in CUDA.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
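
Bicubic filtering reads a 4x4 neighborhood and weights it with cubic basis functions. This sketch makes three assumptions: the cubic B-spline basis, texel centers at integer coordinates, and clamp-to-edge addressing. The sample may offer other cubic variants. All names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Cubic B-spline weights for the four taps i-1, i, i+1, i+2,
// where a in [0, 1) is the fractional position between texels i and i+1.
void bsplineWeights(float a, float w[4]) {
    float a2 = a * a, a3 = a2 * a;
    w[0] = (-a3 + 3.0f * a2 - 3.0f * a + 1.0f) / 6.0f;
    w[1] = (3.0f * a3 - 6.0f * a2 + 4.0f) / 6.0f;
    w[2] = (-3.0f * a3 + 3.0f * a2 + 3.0f * a + 1.0f) / 6.0f;
    w[3] = a3 / 6.0f;
}

// 16-tap bicubic B-spline sample of a row-major w x h float image at (x, y).
float sampleBicubic(const std::vector<float>& img, int w, int h, float x, float y) {
    int ix = static_cast<int>(std::floor(x)), iy = static_cast<int>(std::floor(y));
    float wx[4], wy[4];
    bsplineWeights(x - ix, wx);
    bsplineWeights(y - iy, wy);
    float result = 0.0f;
    for (int j = 0; j < 4; ++j) {
        int sy = std::min(std::max(iy - 1 + j, 0), h - 1);  // clamp-to-edge
        for (int i = 0; i < 4; ++i) {
            int sx = std::min(std::max(ix - 1 + i, 0), w - 1);
            result += wx[i] * wy[j] * img[sy * w + sx];
        }
    }
    return result;
}
```

A common GPU optimization, not shown here, combines each pair of weights so that the 16 point fetches become 4 hardware bilinear fetches.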

March 2005

Fluids (Direct3D Version)
An example of fluid simulation using CUDA and CUFFT, with Direct3D 9 rendering. A Direct3D-capable device is required.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
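
FFT-based solvers in the style of Stam's stable fluids perform diffusion and the divergence-free projection in the frequency domain. Once CUFFT has transformed the velocity field, both steps reduce to simple per-mode arithmetic. This sketch shows that step for one complex wavevector. Three things are assumptions: the implicit diffusion factor 1 / (1 + visc * dt * |k|^2), the parameter names, and that the sample follows this scheme.

```cpp
#include <complex>

using cfloat = std::complex<float>;

// Per-mode diffusion and projection for one Fourier mode (kx, ky) of a 2D velocity field.
void diffuseAndProject(float kx, float ky, float visc, float dt, cfloat& vx, cfloat& vy) {
    float k2 = kx * kx + ky * ky;
    if (k2 == 0.0f) return;  // the mean-flow (DC) mode is left unchanged
    // Implicit viscous diffusion damps high-frequency modes more strongly.
    float diffuse = 1.0f / (1.0f + visc * dt * k2);
    vx *= diffuse;
    vy *= diffuse;
    // Projection: remove the component parallel to k, v <- v - k (k . v) / |k|^2,
    // which makes the mode divergence-free (k . v == 0).
    cfloat kDotV = kx * vx + ky * vy;
    vx -= kx * kDotV / k2;
    vy -= ky * kDotV / k2;
}
```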

February 2005

Fluids (OpenGL Version)
An example of fluid simulation using CUDA and CUFFT, with OpenGL rendering.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

January 2005

CUDA FFT Ocean Simulation
This sample simulates an ocean heightfield using CUFFT and renders the result using OpenGL.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
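
FFT ocean heightfields commonly follow Tessendorf's statistical wave model. Initial wave amplitudes are drawn from a Phillips spectrum in the frequency domain and advanced in time with the deep-water dispersion relation. An inverse FFT then turns them into heights. The helpers below compute the spectrum and the dispersion for one wavevector. The constant g = 9.81, the parameter names, and the assumption that the sample uses this exact spectrum are ours.

```cpp
#include <cmath>

// Phillips spectrum: P(k) = A * exp(-1 / (k L)^2) / k^4 * |k_hat . w_hat|^2, with L = V^2 / g.
// A (amplitude), windSpeed V and the unit wind direction (wx, wy) are hypothetical inputs.
float phillips(float kx, float ky, float A, float windSpeed, float wx, float wy) {
    const float g = 9.81f;
    float k2 = kx * kx + ky * ky;
    if (k2 == 0.0f) return 0.0f;  // no energy in the DC mode
    float L = windSpeed * windSpeed / g;  // largest wave arising from a continuous wind
    float kDotW = kx * wx + ky * wy;
    float cos2 = kDotW * kDotW / k2;  // |k_hat . w_hat|^2, assuming (wx, wy) is a unit vector
    return A * std::exp(-1.0f / (k2 * L * L)) / (k2 * k2) * cos2;
}

// Deep-water dispersion relation: angular frequency w = sqrt(g |k|).
float dispersion(float kx, float ky) {
    return std::sqrt(9.81f * std::sqrt(kx * kx + ky * ky));
}
```

The time-dependent amplitude is then h(k, t) = h0(k) * exp(i*w*t) + conj(h0(-k)) * exp(-i*w*t). An inverse 2D FFT of h over all k yields the heightfield.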

December 2004

FFT-Based 2D Convolution
This sample demonstrates how 2D convolutions with very large kernel sizes can be implemented efficiently using FFTs.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
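
The convolution theorem makes the cost independent of the kernel size. Transform the image and the padded kernel, multiply them pointwise, and transform back. This self-contained C++ sketch computes a circular convolution, using a small recursive radix-2 FFT in place of CUFFT. Real use needs zero padding to avoid wrap-around at the borders. Power-of-two sizes and all function names are assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Recursive radix-2 FFT; inverse=true computes the unnormalized inverse.
void fft(std::vector<cd>& a, bool inverse) {
    const std::size_t n = a.size();
    if (n <= 1) return;
    std::vector<cd> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) { even[i] = a[2 * i]; odd[i] = a[2 * i + 1]; }
    fft(even, inverse);
    fft(odd, inverse);
    const double pi = std::acos(-1.0);
    const double sign = inverse ? 1.0 : -1.0;
    for (std::size_t k = 0; k < n / 2; ++k) {
        cd t = std::polar(1.0, sign * 2.0 * pi * k / n) * odd[k];  // twiddle factor
        a[k] = even[k] + t;
        a[k + n / 2] = even[k] - t;
    }
}

// 2D FFT of a row-major h x w array: transform every row, then every column.
void fft2d(std::vector<cd>& data, int w, int h, bool inverse) {
    std::vector<cd> line;
    for (int y = 0; y < h; ++y) {
        line.assign(data.begin() + y * w, data.begin() + (y + 1) * w);
        fft(line, inverse);
        std::copy(line.begin(), line.end(), data.begin() + y * w);
    }
    line.resize(h);
    for (int x = 0; x < w; ++x) {
        for (int y = 0; y < h; ++y) line[y] = data[y * w + x];
        fft(line, inverse);
        for (int y = 0; y < h; ++y) data[y * w + x] = line[y];
    }
}

// Circular 2D convolution: IFFT(FFT(image) * FFT(kernel)) / (w * h).
// The kernel must already be padded to w x h, with its center moved to (0, 0).
std::vector<double> fftConvolve2d(const std::vector<double>& image,
                                  const std::vector<double>& kernel, int w, int h) {
    std::vector<cd> A(image.begin(), image.end()), B(kernel.begin(), kernel.end());
    fft2d(A, w, h, false);
    fft2d(B, w, h, false);
    for (std::size_t i = 0; i < A.size(); ++i) A[i] *= B[i];  // pointwise product
    fft2d(A, w, h, true);
    std::vector<double> out(A.size());
    for (std::size_t i = 0; i < A.size(); ++i) out[i] = A[i].real() / (w * h);
    return out;
}
```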

November 2004

CUDA Separable Convolution
This sample implements a separable convolution filter of a 2D signal with a Gaussian kernel.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
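
A separable kernel factors into a row filter and a column filter. As a result, a (2r+1) x (2r+1) Gaussian costs 2(2r+1) multiply-adds per pixel instead of (2r+1)^2. This CPU sketch shows the two passes with clamp-to-edge borders. The kernel radius, sigma, and function names are illustrative assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Normalized 1D Gaussian kernel with radius r (2r+1 taps).
std::vector<float> gaussianKernel(int r, float sigma) {
    std::vector<float> k(2 * r + 1);
    float sum = 0.0f;
    for (int i = -r; i <= r; ++i) {
        k[i + r] = std::exp(-(i * i) / (2.0f * sigma * sigma));
        sum += k[i + r];
    }
    for (float& v : k) v /= sum;  // weights sum to 1
    return k;
}

// Separable 2D convolution: row pass into tmp, then column pass into out.
std::vector<float> convolveSeparable(const std::vector<float>& img, int w, int h,
                                     const std::vector<float>& k) {
    const int r = static_cast<int>(k.size()) / 2;
    std::vector<float> tmp(img.size()), out(img.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float s = 0.0f;
            for (int i = -r; i <= r; ++i)
                s += k[i + r] * img[y * w + std::min(std::max(x + i, 0), w - 1)];
            tmp[y * w + x] = s;
        }
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float s = 0.0f;
            for (int i = -r; i <= r; ++i)
                s += k[i + r] * tmp[std::min(std::max(y + i, 0), h - 1) * w + x];
            out[y * w + x] = s;
        }
    return out;
}
```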

October 2004

Texture-based Separable Convolution
Texture-based implementation of a separable 2D convolution with a Gaussian kernel. Used for performance comparison against convolutionSeparable.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

August 2004

threadFenceReduction
This sample shows how to perform a reduction operation on an array of values using the __threadfence() intrinsic to produce a single value in a single kernel launch (as opposed to two or more kernel calls, as shown in the "reduction" SDK sample). Single-pass reduction requires global atomic instructions (Compute Capability 1.1 or later) and the __threadfence() intrinsic (CUDA 2.2 or later).

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
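
The single-pass pattern can be illustrated on the host, with C++ threads standing in for thread blocks. Each block publishes its partial sum, issues a memory fence, and increments an atomic counter. The block that takes the last ticket knows that every partial sum is visible, so it performs the final reduction. This is an analogy written with std::atomic and std::thread, not the sample's CUDA code; on the GPU, __threadfence() and a global atomic counter play these roles. The function name is hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Single-pass "last block" reduction, simulated with one host thread per block.
long long singlePassSum(const std::vector<int>& data, int numBlocks) {
    std::vector<long long> partial(numBlocks, 0);
    std::atomic<int> blocksDone{0};
    long long total = 0;
    const std::size_t chunk = (data.size() + numBlocks - 1) / numBlocks;
    auto block = [&](int b) {
        long long s = 0;
        for (std::size_t i = b * chunk; i < data.size() && i < (b + 1) * chunk; ++i) s += data[i];
        partial[b] = s;
        // Make partial[b] visible before taking a ticket (the role of __threadfence()).
        std::atomic_thread_fence(std::memory_order_release);
        int ticket = blocksDone.fetch_add(1, std::memory_order_relaxed);
        if (ticket == numBlocks - 1) {  // this block finished last
            std::atomic_thread_fence(std::memory_order_acquire);  // observe all partials
            long long t = 0;
            for (long long p : partial) t += p;
            total = t;
        }
    };
    std::vector<std::thread> workers;
    for (int b = 0; b < numBlocks; ++b) workers.emplace_back(block, b);
    for (auto& t : workers) t.join();
    return total;
}
```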

July 2004

CUDA Radix Sort using the Thrust Library
This sample demonstrates a very fast and efficient parallel radix sort that uses the Thrust library (http://code.google.com/p/thrust/). The included RadixSort class can sort either key-value pairs (with float or unsigned integer keys) or keys only.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
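
Thrust's GPU sort is far more elaborate, but the underlying LSD radix sort idea fits in a short serial sketch. Keys are sorted by 8-bit digits from least to most significant, using a counting pass, an exclusive scan of the digit counts, and a stable scatter. Float keys are first mapped to unsigned keys whose order matches the float order. The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Map a float to an unsigned key with the same ordering:
// negative values have all bits inverted, non-negative values have the sign bit set.
uint32_t floatToSortableKey(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
}

// LSD radix sort of (key, value) pairs, 8 bits per pass, stable within each pass.
void radixSortPairs(std::vector<uint32_t>& keys, std::vector<uint32_t>& values) {
    const std::size_t n = keys.size();
    std::vector<uint32_t> k2(n), v2(n);
    for (int shift = 0; shift < 32; shift += 8) {
        std::size_t count[257] = {0};
        for (std::size_t i = 0; i < n; ++i) ++count[((keys[i] >> shift) & 0xFF) + 1];
        for (int d = 0; d < 256; ++d) count[d + 1] += count[d];  // exclusive scan -> bucket offsets
        for (std::size_t i = 0; i < n; ++i) {
            std::size_t dst = count[(keys[i] >> shift) & 0xFF]++;
            k2[dst] = keys[i];
            v2[dst] = values[i];
        }
        keys.swap(k2);  // the output of this pass becomes the input of the next
        values.swap(v2);
    }
}
```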

June 2004

CUDA Sorting Networks
This sample implements bitonic sort and odd-even merge sort (also known as Batcher's sort), algorithms belonging to the class of sorting networks. Although they are generally less efficient on large sequences than algorithms with better asymptotic complexity (e.g., merge sort or radix sort), they may be the algorithms of choice for sorting batches of short- to mid-sized (key, value) array pairs. Refer to the excellent tutorial by H. W. Lang: http://www.iti.fh-flensburg.de/lang/algorithmen/sortieren/networks/indexen.htm

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
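
A sequential sketch of bitonic sort, written as the compare-exchange network itself. At each stage (k, j), every element i is compared with its partner i XOR j. These compare-exchanges are independent, so a GPU can assign one thread per comparison. The power-of-two input size and the function name are assumptions.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Bitonic sort of (key, value) pairs in ascending key order.
// Assumes keys.size() is a power of two.
void bitonicSortPairs(std::vector<unsigned>& keys, std::vector<unsigned>& values) {
    const std::size_t n = keys.size();
    for (std::size_t k = 2; k <= n; k *= 2) {         // size of the bitonic sequences being merged
        for (std::size_t j = k / 2; j > 0; j /= 2) {  // compare distance within this merge
            for (std::size_t i = 0; i < n; ++i) {     // one GPU thread per i
                std::size_t partner = i ^ j;
                if (partner > i) {
                    bool ascending = (i & k) == 0;    // direction of this subsequence
                    bool outOfOrder = ascending ? keys[i] > keys[partner] : keys[i] < keys[partner];
                    if (outOfOrder) {
                        std::swap(keys[i], keys[partner]);
                        std::swap(values[i], values[partner]);
                    }
                }
            }
        }
    }
}
```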


Merge Sort
This sample implements a merge sort. Unlike sorting networks such as bitonic sort, merge sort has O(n log n) asymptotic complexity, so it is generally the more efficient choice for large sequences.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
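
A minimal bottom-up merge sort sketch, assuming (key, value) pairs as in the sorting networks sample. Each pass merges adjacent sorted runs of doubling width, and the merges within a pass are independent of one another. It is a serial illustration only, and the function name is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Bottom-up stable merge sort of (key, value) pairs.
void mergeSortPairs(std::vector<unsigned>& keys, std::vector<unsigned>& values) {
    const std::size_t n = keys.size();
    std::vector<unsigned> k2(n), v2(n);
    for (std::size_t width = 1; width < n; width *= 2) {
        // Each iteration of this loop is an independent merge of two runs.
        for (std::size_t lo = 0; lo < n; lo += 2 * width) {
            std::size_t mid = std::min(lo + width, n), hi = std::min(lo + 2 * width, n);
            std::size_t a = lo, b = mid, out = lo;
            while (a < mid && b < hi) {
                // Take from the right run only if strictly smaller: keeps the sort stable.
                if (keys[b] < keys[a]) { k2[out] = keys[b]; v2[out++] = values[b++]; }
                else                   { k2[out] = keys[a]; v2[out++] = values[a++]; }
            }
            while (a < mid) { k2[out] = keys[a]; v2[out++] = values[a++]; }
            while (b < hi)  { k2[out] = keys[b]; v2[out++] = values[b++]; }
        }
        keys.swap(k2);
        values.swap(v2);
    }
}
```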

April 2004

Binomial Option Pricing
This sample evaluates the fair call price for a given set of European options under the binomial model. It also takes advantage of double precision if a GTX 200-class GPU is present.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
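
A minimal sketch of binomial pricing on a Cox-Ross-Rubinstein tree. Payoffs are computed at the terminal nodes, and the tree is then walked backward, discounting the risk-neutral expected value at each node. The CRR parameterization is our assumption about the model. With S = 100, X = 100, T = 1, r = 0.05, v = 0.2 and 2000 steps, the result should land within a few cents of the Black-Scholes value of about 10.4506. The function name is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// European call on a CRR binomial tree.
// S: spot, X: strike, T: years to expiry, r: risk-free rate, v: volatility.
double binomialEuropeanCall(double S, double X, double T, double r, double v, int steps) {
    const double dt = T / steps;
    const double u = std::exp(v * std::sqrt(dt)), d = 1.0 / u;
    const double disc = std::exp(-r * dt);
    const double pu = (std::exp(r * dt) - d) / (u - d);  // risk-neutral up probability
    const double pd = 1.0 - pu;
    // Payoffs at expiry; node i has i up moves and (steps - i) down moves.
    std::vector<double> value(steps + 1);
    for (int i = 0; i <= steps; ++i)
        value[i] = std::max(S * std::pow(u, 2 * i - steps) - X, 0.0);
    // Backward induction: each layer has one fewer node than the one after it.
    for (int t = steps; t > 0; --t)
        for (int i = 0; i < t; ++i)
            value[i] = disc * (pu * value[i + 1] + pd * value[i]);
    return value[0];
}
```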

March 2004

Black-Scholes Option Pricing
This sample evaluates fair call and put prices for a given set of European options using the Black-Scholes formula.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
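
The closed-form prices are C = S N(d1) - X e^(-rT) N(d2) and P = X e^(-rT) N(-d2) - S N(-d1), where d1 = (ln(S/X) + (r + v^2/2) T) / (v sqrt(T)) and d2 = d1 - v sqrt(T). This sketch evaluates the cumulative normal N via std::erfc. GPU code often uses a polynomial approximation instead, and we have not confirmed which method the sample uses. The function names are hypothetical.

```cpp
#include <cmath>

// Cumulative standard normal distribution via the complementary error function.
double cnd(double d) { return 0.5 * std::erfc(-d / std::sqrt(2.0)); }

// Black-Scholes prices of a European call and put.
// S: spot, X: strike, T: years to expiry, r: risk-free rate, v: volatility.
void blackScholes(double S, double X, double T, double r, double v, double& call, double& put) {
    const double sqrtT = std::sqrt(T);
    const double d1 = (std::log(S / X) + (r + 0.5 * v * v) * T) / (v * sqrtT);
    const double d2 = d1 - v * sqrtT;
    const double expRT = std::exp(-r * T);  // discount factor
    call = S * cnd(d1) - X * expRT * cnd(d2);
    put  = X * expRT * cnd(-d2) - S * cnd(-d1);
}
```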

February 2004

Niederreiter Quasirandom Sequence Generator
This sample implements a Niederreiter quasirandom sequence generator and an inverse cumulative normal distribution function for generating standard normally distributed values.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
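
The inverse cumulative normal step turns uniformly distributed (here, quasirandom) values in (0, 1) into standard normal values. The sketch below uses Moro's algorithm with its published coefficients. This choice is an assumption, and the sample's routine may differ. The Niederreiter generator itself is not shown, and the function name is hypothetical.

```cpp
#include <cmath>

// Inverse cumulative standard normal distribution (Moro's algorithm).
// Central region: rational approximation in y^2. Tails: polynomial in log(-log(.)).
// Input u must lie in (0, 1).
double moroInvCND(double u) {
    static const double a[4] = {2.50662823884, -18.61500062529, 41.39119773534, -25.44106049637};
    static const double b[4] = {-8.47351093090, 23.08336743743, -21.06224101826, 3.13082909833};
    static const double c[9] = {0.3374754822726147, 0.9761690190917186, 0.1607979714918209,
                                0.0276438810333863, 0.0038405729373609, 0.0003951896511919,
                                0.0000321767881768, 0.0000002888167364, 0.0000003960315187};
    const double y = u - 0.5;
    if (std::fabs(y) < 0.42) {
        const double r = y * y;
        return y * (((a[3] * r + a[2]) * r + a[1]) * r + a[0]) /
               ((((b[3] * r + b[2]) * r + b[1]) * r + b[0]) * r + 1.0);
    }
    double r = (y > 0.0) ? 1.0 - u : u;  // distance into the nearer tail
    r = std::log(-std::log(r));
    double x = c[8];
    for (int i = 7; i >= 0; --i) x = x * r + c[i];  // Horner evaluation
    return (y < 0.0) ? -x : x;
}
```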

January 2004

Monte Carlo Option Pricing
This sample evaluates the fair call price for a given set of European options using the Monte Carlo approach. It uses double-precision hardware if a GTX 200-class GPU is present.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
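
A minimal Monte Carlo sketch. It simulates terminal prices under geometric Brownian motion, averages the discounted call payoffs, and reports the standard error, which shrinks as 1/sqrt(paths). It uses std::mt19937_64 and std::normal_distribution as stand-ins for the sample's random number generation. The function and struct names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <random>

struct McResult { double price, stdErr; };

// European call by Monte Carlo: S_T = S * exp((r - v^2/2) T + v sqrt(T) Z), Z ~ N(0, 1).
McResult monteCarloCall(double S, double X, double T, double r, double v,
                        long paths, unsigned seed) {
    std::mt19937_64 rng(seed);
    std::normal_distribution<double> normal(0.0, 1.0);
    const double drift = (r - 0.5 * v * v) * T, vol = v * std::sqrt(T);
    double sum = 0.0, sumSq = 0.0;
    for (long i = 0; i < paths; ++i) {
        double payoff = std::max(S * std::exp(drift + vol * normal(rng)) - X, 0.0);
        sum += payoff;
        sumSq += payoff * payoff;
    }
    const double mean = sum / paths;
    const double var = sumSq / paths - mean * mean;  // sample variance of the payoff
    const double disc = std::exp(-r * T);
    return {disc * mean, disc * std::sqrt(var / paths)};
}
```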

December 2003

Monte Carlo Option Pricing with Multi-GPU support
This sample evaluates the fair call price for a given set of European options using the Monte Carlo approach, taking advantage of all CUDA-capable GPUs installed in the system. It uses double-precision hardware if a GTX 200-class GPU is present. The sample also takes advantage of the CUDA 4.0 capability that allows a single CPU thread to control multiple GPUs.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac

October 2003

Mandelbrot
This sample uses CUDA to compute and display the Mandelbrot or Julia sets interactively. It also illustrates the use of "double single" arithmetic to improve precision when zooming a long way into the pattern. It uses double-precision hardware if a GT200-class GPU is present. Thanks to Mark Granger of NewTek, who submitted this sample to the SDK!

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
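
"Double single" arithmetic represents a value as the unevaluated sum hi + lo of two floats. Error-free transformations carry the rounding error of each operation in lo. The sketch below shows addition and multiplication. Note that it obtains the exact product error with std::fma, whereas classic double-single code splits each operand into halves (Dekker's method) for hardware without fused multiply-add. The type and function names are hypothetical. The code assumes IEEE single precision without extended-precision intermediates.

```cpp
#include <cmath>

// A value represented as hi + lo, with |lo| much smaller than |hi|.
struct dsfloat { float hi, lo; };

// Knuth's TwoSum: s + err == a + b exactly.
static dsfloat twoSum(float a, float b) {
    float s = a + b;
    float bb = s - a;
    float err = (a - (s - bb)) + (b - bb);
    return {s, err};
}

// Dekker's FastTwoSum renormalization, valid when |a| >= |b|.
static dsfloat quickTwoSum(float a, float b) {
    float s = a + b;
    return {s, b - (s - a)};
}

dsfloat dsAdd(dsfloat a, dsfloat b) {
    dsfloat s = twoSum(a.hi, b.hi);
    s.lo += a.lo + b.lo;  // fold in the low parts
    return quickTwoSum(s.hi, s.lo);
}

dsfloat dsMul(dsfloat a, dsfloat b) {
    float p = a.hi * b.hi;
    float e = std::fma(a.hi, b.hi, -p);  // exact rounding error of the product
    e += a.hi * b.lo + a.lo * b.hi;      // cross terms (lo * lo is negligible)
    return quickTwoSum(p, e);
}
```

The Mandelbrot iteration z = z^2 + c can then be evaluated with dsMul and dsAdd on the real and imaginary parts.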

September 2003

Particles
This sample uses CUDA to simulate and visualize a large set of particles and their physical interaction. It implements a uniform grid data structure using either atomic operations or a fast radix sort from the Thrust library.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
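
A serial sketch of the sort-based uniform grid. Each particle is hashed to its cell, the (cell hash, particle index) pairs are sorted, and the start and end of each cell's run in the sorted array are recorded. Neighbor searches then only visit particles in nearby cells. The following are assumptions: the grid origin at 0, cellSize, gridRes, the linear cell hash, and all names. Positions outside the grid are clamped to the border cells.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <utility>
#include <vector>

struct Float3 { float x, y, z; };

struct UniformGrid {
    std::vector<std::pair<uint32_t, uint32_t>> sorted;  // (cell hash, particle index), sorted by hash
    std::vector<uint32_t> cellStart, cellEnd;           // per cell: [start, end) into sorted
};

// Integer cell coordinate along one axis, clamped to the grid.
static uint32_t cellCoord(float p, float cellSize, uint32_t gridRes) {
    float c = std::floor(p / cellSize);
    c = std::min(std::max(c, 0.0f), static_cast<float>(gridRes - 1));
    return static_cast<uint32_t>(c);
}

UniformGrid buildGrid(const std::vector<Float3>& pos, float cellSize, uint32_t gridRes) {
    UniformGrid g;
    // 1. Hash: linear cell index for each particle.
    for (uint32_t i = 0; i < pos.size(); ++i) {
        uint32_t cx = cellCoord(pos[i].x, cellSize, gridRes);
        uint32_t cy = cellCoord(pos[i].y, cellSize, gridRes);
        uint32_t cz = cellCoord(pos[i].z, cellSize, gridRes);
        g.sorted.push_back({(cz * gridRes + cy) * gridRes + cx, i});
    }
    // 2. Sort by cell hash (a GPU version would use a parallel radix sort).
    std::sort(g.sorted.begin(), g.sorted.end());
    // 3. Record where each cell's run starts and ends; empty cells keep UINT32_MAX.
    const uint32_t numCells = gridRes * gridRes * gridRes;
    g.cellStart.assign(numCells, UINT32_MAX);
    g.cellEnd.assign(numCells, UINT32_MAX);
    for (uint32_t i = 0; i < g.sorted.size(); ++i) {
        uint32_t h = g.sorted[i].first;
        if (i == 0 || h != g.sorted[i - 1].first) g.cellStart[h] = i;
        if (i + 1 == g.sorted.size() || h != g.sorted[i + 1].first) g.cellEnd[h] = i + 1;
    }
    return g;
}
```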

August 2003

Marching Cubes Isosurfaces
This sample extracts a geometric isosurface from a volume dataset using the marching cubes algorithm. It uses the scan (prefix sum) function from the Thrust library to perform stream compaction.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
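
The compaction step can be sketched in three parts. First, each voxel is classified by its cube index. Next, the occupancy flags are exclusive-scanned so that each occupied voxel knows its output slot. Finally, the occupied voxels are scattered into place. Applying the same scan to per-voxel vertex counts gives each voxel's offset into the vertex buffer. The bit convention (a corner below the isovalue sets its bit) and the names are assumptions, and the triangle tables are not shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Cube configuration index (0-255): bit i is set when corner i lies below the isovalue.
uint32_t cubeIndex(const float corner[8], float iso) {
    uint32_t idx = 0;
    for (int i = 0; i < 8; ++i)
        if (corner[i] < iso) idx |= 1u << i;
    return idx;
}

// Exclusive prefix sum; total receives the sum of all inputs.
std::vector<uint32_t> exclusiveScan(const std::vector<uint32_t>& in, uint32_t& total) {
    std::vector<uint32_t> out(in.size());
    uint32_t running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) { out[i] = running; running += in[i]; }
    total = running;
    return out;
}

// Stream compaction: the indices of voxels that produce at least one vertex.
std::vector<uint32_t> compactOccupied(const std::vector<uint32_t>& vertCount) {
    std::vector<uint32_t> flags(vertCount.size());
    for (std::size_t i = 0; i < vertCount.size(); ++i) flags[i] = vertCount[i] > 0 ? 1u : 0u;
    uint32_t numOccupied = 0;
    std::vector<uint32_t> slot = exclusiveScan(flags, numOccupied);  // output position per voxel
    std::vector<uint32_t> compacted(numOccupied);
    for (std::size_t i = 0; i < vertCount.size(); ++i)  // scatter; one GPU thread per voxel
        if (flags[i]) compacted[slot[i]] = static_cast<uint32_t>(i);
    return compacted;
}
```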

July 2003

Volume Rendering with 3D Textures
This sample demonstrates basic volume rendering using 3D textures.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
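
Ray-casting volume renderers typically step along each ray through the 3D texture, map each sample through a transfer function, and composite the results front to back. This sketch shows only the compositing, with early ray termination. The following are assumptions: non-premultiplied input colors, the threshold value, and the names.

```cpp
#include <vector>

struct RGBA { float r, g, b, a; };

// Front-to-back compositing of the samples along one ray (nearest sample first).
RGBA compositeFrontToBack(const std::vector<RGBA>& samples, float opacityThreshold = 0.95f) {
    RGBA sum = {0.0f, 0.0f, 0.0f, 0.0f};
    for (const RGBA& s : samples) {
        // Contribution = sample opacity times the transmittance left in front of it.
        float w = s.a * (1.0f - sum.a);
        sum.r += s.r * w;
        sum.g += s.g * w;
        sum.b += s.b * w;
        sum.a += w;
        if (sum.a > opacityThreshold) break;  // early ray termination
    }
    return sum;
}
```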

June 2003

CUDA N-Body Simulation
This sample demonstrates an efficient all-pairs gravitational n-body simulation in CUDA. It accompanies the GPU Gems 3 chapter "Fast N-Body Simulation with CUDA". With CUDA 4.0, the nBody sample has been updated to take advantage of new features that make it easy to scale the n-body simulation across multiple GPUs in a single PC. Adding "-numdevices=N" to the command line causes the sample to use N devices (if available) for the simulation. In this mode, the position and velocity data for all bodies are read from system memory using "zero copy" rather than from device memory. For a small number of devices (4 or fewer) and a large enough number of bodies, bandwidth is not a bottleneck, so we can achieve strong scaling across these devices.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
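
The core of an all-pairs simulation is the O(N^2) acceleration sum with Plummer softening: a_i = G * sum over j of m_j * r_ij / (|r_ij|^2 + eps^2)^(3/2). This serial sketch uses G = 1 and omits the shared-memory tiling that a CUDA implementation would use. The names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Body { float x, y, z, mass; };
struct Vec3 { float x, y, z; };

// All-pairs accelerations with softening eps (G = 1, simulation units).
// With eps > 0 the j == i term contributes zero because r_ii = 0, so no branch is needed.
std::vector<Vec3> computeAccelerations(const std::vector<Body>& bodies, float eps) {
    std::vector<Vec3> acc(bodies.size(), Vec3{0.0f, 0.0f, 0.0f});
    for (std::size_t i = 0; i < bodies.size(); ++i) {
        for (std::size_t j = 0; j < bodies.size(); ++j) {
            float dx = bodies[j].x - bodies[i].x;  // r_ij = p_j - p_i
            float dy = bodies[j].y - bodies[i].y;
            float dz = bodies[j].z - bodies[i].z;
            float distSqr = dx * dx + dy * dy + dz * dz + eps * eps;
            float invDist = 1.0f / std::sqrt(distSqr);
            float s = bodies[j].mass * invDist * invDist * invDist;  // m_j / (d^2 + eps^2)^(3/2)
            acc[i].x += dx * s;
            acc[i].y += dy * s;
            acc[i].z += dz * s;
        }
    }
    return acc;
}
```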

May 2003

Smoke Particles
Smoke simulation with volumetric shadows using the half-angle slicing technique. Uses CUDA for procedural simulation, the Thrust library for sorting, and OpenGL for graphics rendering.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
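
Half-angle slicing chooses a slicing axis halfway between the light direction and the view direction. When the two point away from each other, the view vector is flipped first. Particles are then sorted along this axis, so the same slice order can drive both the light pass and the eye pass. This sketch computes the axis and the sorted particle order, with std::sort standing in for a GPU sort. The direction conventions and names are assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct V3 { float x, y, z; };

static float dot(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static V3 normalize(V3 v) { float l = std::sqrt(dot(v, v)); return {v.x / l, v.y / l, v.z / l}; }

// Returns particle indices sorted along the half-angle axis.
// viewDir and lightDir are unit vectors; backToFront reports whether the view vector was flipped.
std::vector<int> sortAlongHalfAngle(const std::vector<V3>& pos, V3 viewDir, V3 lightDir,
                                    bool& backToFront) {
    backToFront = dot(viewDir, lightDir) < 0.0f;
    V3 v = backToFront ? V3{-viewDir.x, -viewDir.y, -viewDir.z} : viewDir;
    // v + lightDir cannot be zero here because dot(v, lightDir) >= 0.
    V3 half = normalize(V3{v.x + lightDir.x, v.y + lightDir.y, v.z + lightDir.z});
    std::vector<int> order(pos.size());
    for (int i = 0; i < static_cast<int>(pos.size()); ++i) order[i] = i;
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return dot(pos[a], half) < dot(pos[b], half); });
    return order;
}
```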

April 2003

Sobol Quasirandom Number Generator
This sample implements a Sobol quasirandom sequence generator.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
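
A minimal sketch of the first Sobol dimension using the Gray-code update. Each new point is the previous one XORed with a single direction number, selected by the lowest zero bit of the previous index. Dimension 1 uses the direction numbers 2^(31-k). Higher dimensions need tabulated direction numbers, which are not shown. The function name is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// First Sobol dimension via Gray-code ordering: x_n = x_{n-1} XOR v[c],
// where c is the index of the lowest zero bit of n - 1.
std::vector<double> sobolDim1(int count) {
    uint32_t v[32];
    for (int k = 0; k < 32; ++k) v[k] = 1u << (31 - k);  // direction numbers for dimension 1
    std::vector<double> out;
    uint32_t x = 0;
    for (int n = 0; n < count; ++n) {
        if (n > 0) {
            uint32_t m = static_cast<uint32_t>(n - 1);
            int c = 0;
            while (m & 1u) { m >>= 1; ++c; }  // lowest zero bit of n - 1
            x ^= v[c];
        }
        out.push_back(x / 4294967296.0);  // scale to [0, 1)
    }
    return out;
}
```

The first eight values are 0, 0.5, 0.75, 0.25, 0.375, 0.875, 0.625, 0.125.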

February 2001

Matrix Multiplication (CUDA Driver API Version)
This sample implements matrix multiplication and uses the new CUDA 4.0 kernel launch Driver API. It has been written for clarity of exposition to illustrate various CUDA programming principles, not with the goal of providing the most performant generic kernel for matrix multiplication. CUBLAS provides high-performance matrix multiplication.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
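
A CPU sketch of the tiled loop structure at the heart of shared-memory matrix multiplication kernels. Each output tile is accumulated by walking matching tiles of A and B along the inner dimension. Like the sample, it is for exposition only. The tile size of 16 and the names are illustrative assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr int TILE = 16;  // illustrative tile size (a CUDA block would be TILE x TILE threads)

// C = A * B for row-major float matrices, A: m x k, B: k x n.
std::vector<float> matMulTiled(const std::vector<float>& A, const std::vector<float>& B,
                               int m, int k, int n) {
    std::vector<float> C(static_cast<std::size_t>(m) * n, 0.0f);
    for (int i0 = 0; i0 < m; i0 += TILE)            // output tile row
        for (int j0 = 0; j0 < n; j0 += TILE)        // output tile column
            for (int p0 = 0; p0 < k; p0 += TILE)    // walk tiles along the inner dimension
                for (int i = i0; i < std::min(i0 + TILE, m); ++i)
                    for (int j = j0; j < std::min(j0 + TILE, n); ++j) {
                        float s = C[i * n + j];
                        for (int p = p0; p < std::min(p0 + TILE, k); ++p)
                            s += A[i * k + p] * B[p * n + j];
                        C[i * n + j] = s;
                    }
    return C;
}
```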