The CUDA Developer SDK provides examples with source code, utilities, and white papers to help you get started writing software with CUDA. The SDK includes dozens of code samples covering a wide range of applications. The samples are listed below.
This code is released free of charge for use in derivative works, whether academic, commercial, or personal. (Full License)
The NVIDIA CUDA Toolkit is required to compile and run the code samples and must be obtained separately.

Monte-Carlo Option Pricing with multi-GPU support
This sample evaluates the fair call price for a given set of European options using the Monte-Carlo approach, taking advantage of all CUDA-capable GPUs installed in the system.
Download - Windows
Download - Linux

FFT Ocean Simulation
This sample simulates an ocean heightfield using CUFFT and renders the result using OpenGL.
Download - Windows
Download - Linux

256-bin Histogram
This sample demonstrates an efficient implementation of a 256-bin histogram.
Whitepaper
Download - Windows
Download - Linux

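To make the 256-bin histogram concrete, here is a minimal CPU reference sketch in C++. It is not the sample's GPU kernel. The function name `histogram256` and the `numChunks` parameter are hypothetical. The chunking only mimics the general idea of building partial histograms independently and merging them, which is an assumption about how such work is usually parallelized rather than a description of the sample.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts how often each byte value (0..255) occurs in `data`.
// The input is split into `numChunks` pieces, each piece fills its own
// partial histogram, and the partial histograms are then summed.
std::array<std::uint32_t, 256> histogram256(const std::vector<std::uint8_t>& data,
                                            std::size_t numChunks = 4) {
    if (numChunks == 0) numChunks = 1;
    // One zero-initialized partial histogram per chunk.
    std::vector<std::array<std::uint32_t, 256>> partial(numChunks);
    const std::size_t chunkSize = (data.size() + numChunks - 1) / numChunks;

    for (std::size_t c = 0; c < numChunks; ++c) {
        const std::size_t begin = std::min(c * chunkSize, data.size());
        const std::size_t end = std::min(begin + chunkSize, data.size());
        for (std::size_t i = begin; i < end; ++i) {
            ++partial[c][data[i]];
        }
    }

    // Merge step: add all partial histograms bin by bin.
    std::array<std::uint32_t, 256> bins{};
    for (const auto& p : partial) {
        for (std::size_t b = 0; b < 256; ++b) bins[b] += p[b];
    }
    return bins;
}
```

The result does not depend on `numChunks`; only the order in which counts are accumulated changes.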
64-bin Histogram
This sample demonstrates an efficient implementation of a 64-bin histogram.
Whitepaper
Download - Windows
Download - Linux

FFT-Based 2D Convolution
This sample demonstrates how 2D convolutions with very large kernel sizes can be implemented efficiently using the FFT.
Whitepaper
Download - Windows
Download - Linux

MersenneTwister
This sample implements the Mersenne Twister random number generator and the Cartesian Box-Muller transformation on the GPU.
Whitepaper
Download - Windows
Download - Linux

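As an illustration of the Box-Muller step mentioned for the MersenneTwister sample, the following C++ sketch maps two uniform random numbers to two independent standard normal values using the trigonometric form of the transformation. The function name `boxMuller` is hypothetical, and it is an assumption that this form matches what the sample calls "Cartesian". The uniform inputs are supplied by the caller rather than generated here.

```cpp
#include <cmath>
#include <utility>

// Box-Muller transformation (trigonometric form).
// u1 must lie in (0, 1] so that log(u1) is finite; u2 lies in [0, 1).
// Returns a pair of independent standard normal samples.
std::pair<double, double> boxMuller(double u1, double u2) {
    const double kTwoPi = 6.283185307179586;
    const double radius = std::sqrt(-2.0 * std::log(u1));
    const double angle = kTwoPi * u2;
    return {radius * std::cos(angle), radius * std::sin(angle)};
}
```

Each call consumes two uniform samples and produces two normal samples, so a generator producing an even number of uniforms can be transformed pairwise.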
Monte-Carlo Option Pricing
This sample evaluates the fair call price for a given set of European options using the Monte-Carlo approach.
Whitepaper
Download - Windows
Download - Linux

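The Monte-Carlo approach to pricing a European call can be sketched on the CPU as follows. This is a minimal sketch under stated assumptions, not the sample's code: it assumes the underlying follows geometric Brownian motion under the risk-neutral measure (the Black-Scholes setting), and the function name `monteCarloCallPrice` and all parameter names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>

// Estimates the price of a European call by averaging discounted payoffs
// over simulated terminal prices.
// S: spot price, K: strike, r: risk-free rate, sigma: volatility,
// T: time to expiry in years, paths: number of simulated paths.
double monteCarloCallPrice(double S, double K, double r, double sigma, double T,
                           std::size_t paths, std::uint64_t seed) {
    if (paths == 0) return 0.0;
    std::mt19937_64 rng(seed);
    std::normal_distribution<double> gauss(0.0, 1.0);

    const double drift = (r - 0.5 * sigma * sigma) * T;
    const double vol = sigma * std::sqrt(T);

    double payoffSum = 0.0;
    for (std::size_t i = 0; i < paths; ++i) {
        // Terminal price under geometric Brownian motion (assumed model).
        const double terminal = S * std::exp(drift + vol * gauss(rng));
        payoffSum += std::max(terminal - K, 0.0);
    }
    // Discount the mean payoff back to today.
    return std::exp(-r * T) * payoffSum / static_cast<double>(paths);
}
```

Because every path is independent, the loop body is the part that maps naturally onto many GPU threads, with a final reduction to form the sum.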
Binomial Option Pricing
This sample evaluates the fair call price for a given set of European options under the binomial model.
Whitepaper
Download - Windows
Download - Linux

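For readers unfamiliar with the binomial model, here is a short CPU sketch of pricing one European call on a recombining tree. It assumes the Cox-Ross-Rubinstein parameterization, which the sample description does not specify, and the name `binomialCallPrice` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Prices a European call on a recombining binomial tree
// (Cox-Ross-Rubinstein parameterization, an assumption of this sketch).
// steps must be at least 1.
double binomialCallPrice(double S, double K, double r, double sigma, double T,
                         int steps) {
    const double dt = T / steps;
    const double up = std::exp(sigma * std::sqrt(dt));
    const double down = 1.0 / up;
    const double discount = std::exp(-r * dt);
    // Risk-neutral probability of an up move.
    const double p = (std::exp(r * dt) - down) / (up - down);

    // Payoffs at expiry; node i has i up moves and (steps - i) down moves.
    std::vector<double> value(steps + 1);
    for (int i = 0; i <= steps; ++i) {
        const double terminal = S * std::pow(up, 2 * i - steps);
        value[i] = std::max(terminal - K, 0.0);
    }

    // Backward induction: each level is the discounted expectation of the next.
    for (int level = steps; level > 0; --level) {
        for (int i = 0; i < level; ++i) {
            value[i] = discount * (p * value[i + 1] + (1.0 - p) * value[i]);
        }
    }
    return value[0];
}
```

Within one level of the backward induction, all nodes can be updated independently, which is what makes the tree suitable for parallel evaluation.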
Image denoising
This sample demonstrates two adaptive image denoising techniques, KNN and NLM, based on computation of both geometric and color distance between texels. Both techniques are also implemented in the DirectX SDK using shaders. In addition to those DirectX counterparts, this sample implements a much faster variation of NLM that takes advantage of shared memory.
Whitepaper
Download - Windows
Download - Linux

DirectX Texture Compressor (DXTC)
High-quality DXT compression using CUDA.
This example shows how to implement an existing computationally intensive CPU compression algorithm in parallel on the GPU and obtain an order-of-magnitude performance improvement.
Whitepaper
Download - Windows
Download - Linux

Eigenvalues
The computation of all or a subset of all eigenvalues is an important problem in linear algebra, statistics, physics, and many other fields. This sample demonstrates a parallel implementation of a bisection algorithm for the computation of all eigenvalues of a tridiagonal symmetric matrix of arbitrary size with CUDA.
Whitepaper
Download - Windows
Download - Linux

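The bisection idea behind the Eigenvalues sample can be sketched sequentially in C++. This is a textbook-style sketch and not the sample's parallel implementation: it counts eigenvalues below a trial value with a Sturm-sequence recurrence and bisects a Gershgorin interval for each eigenvalue. The names `countEigenvaluesBelow` and `tridiagonalEigenvalues`, the tolerance, and the iteration cap are all assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Counts eigenvalues below x for a symmetric tridiagonal matrix with
// diagonal d (size n) and off-diagonal e (size n - 1).
// A zero pivot is nudged to a tiny negative value, so an eigenvalue
// exactly equal to x may be counted as below x.
int countEigenvaluesBelow(const std::vector<double>& d,
                          const std::vector<double>& e, double x) {
    const double kTiny = 1e-300;
    int count = 0;
    double q = 1.0;
    for (std::size_t i = 0; i < d.size(); ++i) {
        const double off = (i == 0) ? 0.0 : e[i - 1];
        q = d[i] - x - off * off / q;
        if (std::fabs(q) < kTiny) q = -kTiny;
        if (q < 0.0) ++count;
    }
    return count;
}

// Returns all eigenvalues in ascending order, each found by bisection.
std::vector<double> tridiagonalEigenvalues(const std::vector<double>& d,
                                           const std::vector<double>& e,
                                           double tol = 1e-12) {
    const std::size_t n = d.size();
    // Gershgorin circles give an interval containing every eigenvalue.
    double lo = std::numeric_limits<double>::infinity();
    double hi = -std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < n; ++i) {
        const double radius = (i > 0 ? std::fabs(e[i - 1]) : 0.0) +
                              (i + 1 < n ? std::fabs(e[i]) : 0.0);
        lo = std::min(lo, d[i] - radius);
        hi = std::max(hi, d[i] + radius);
    }

    std::vector<double> result(n);
    for (std::size_t k = 0; k < n; ++k) {
        double a = lo, b = hi;
        // Keep the k-th smallest eigenvalue inside [a, b] while halving it.
        for (int it = 0; it < 200 && b - a > tol; ++it) {
            const double mid = 0.5 * (a + b);
            if (countEigenvaluesBelow(d, e, mid) > static_cast<int>(k)) {
                b = mid;
            } else {
                a = mid;
            }
        }
        result[k] = 0.5 * (a + b);
    }
    return result;
}
```

Each eigenvalue is refined independently of the others, which is the property a parallel version can exploit.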
Sobel Filter
This sample implements the Sobel edge detection filter for 8-bit monochrome images.
Download - Windows
Download - Linux

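The Sobel filter itself is small enough to show as a CPU reference. The sketch below applies the standard 3x3 Sobel kernels to an 8-bit monochrome image stored row by row. Approximating the gradient magnitude as |gx| + |gy|, clamping to 255, and leaving border pixels at zero are choices made for this sketch, not details taken from the sample. The function name `sobelFilter` is hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Applies the 3x3 Sobel operator to a row-major 8-bit monochrome image.
// Border pixels are left at 0 because their neighborhood is incomplete.
std::vector<std::uint8_t> sobelFilter(const std::vector<std::uint8_t>& image,
                                      int width, int height) {
    std::vector<std::uint8_t> out(image.size(), 0);
    for (int y = 1; y + 1 < height; ++y) {
        for (int x = 1; x + 1 < width; ++x) {
            // Reads the pixel at offset (dx, dy) from the current position.
            auto pix = [&](int dx, int dy) {
                return static_cast<int>(image[(y + dy) * width + (x + dx)]);
            };
            // Horizontal gradient: right column minus left column.
            const int gx = (pix(1, -1) + 2 * pix(1, 0) + pix(1, 1)) -
                           (pix(-1, -1) + 2 * pix(-1, 0) + pix(-1, 1));
            // Vertical gradient: bottom row minus top row.
            const int gy = (pix(-1, 1) + 2 * pix(0, 1) + pix(1, 1)) -
                           (pix(-1, -1) + 2 * pix(0, -1) + pix(1, -1));
            const int magnitude = std::abs(gx) + std::abs(gy);
            out[y * width + x] = static_cast<std::uint8_t>(std::min(magnitude, 255));
        }
    }
    return out;
}
```

Every output pixel depends only on its own 3x3 neighborhood of the input, so all interior pixels can be computed independently.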
Scan
This example demonstrates an efficient CUDA implementation of parallel prefix sum, also known as "scan". Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array.
Whitepaper
Download - Windows
Download - Linux

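The definition of scan given above (each output element is the sum of all input elements before it) is the exclusive form. The C++ sketch below emulates, sequentially, one well-known work-efficient formulation with an up-sweep and a down-sweep over a power-of-two array. Whether the sample uses exactly this formulation is an assumption, and the name `exclusiveScanPow2` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// In-place exclusive prefix sum for arrays whose length is a power of two.
// Each inner loop iteration is independent of the others at the same stride,
// which is what a parallel version would distribute across threads.
void exclusiveScanPow2(std::vector<int>& a) {
    const std::size_t n = a.size();
    if (n == 0) return;

    // Up-sweep: build partial sums in a balanced-tree pattern.
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            a[i] += a[i - stride];
        }
    }

    // The last element held the total; replace it with the identity.
    a[n - 1] = 0;

    // Down-sweep: push prefix sums back down the tree.
    for (std::size_t stride = n / 2; stride >= 1; stride /= 2) {
        for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            const int left = a[i - stride];
            a[i - stride] = a[i];
            a[i] += left;
        }
    }
}
```

For the input 3, 1, 7, 0, 4, 1, 6, 3 the array becomes 0, 3, 4, 11, 11, 15, 16, 22.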
Scan of Large Arrays
This example demonstrates an efficient CUDA implementation of parallel prefix sum (also known as "scan") for arbitrary-sized arrays. Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array.
Whitepaper
Download - Windows
Download - Linux

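One common way to extend a block-level scan to arrays of arbitrary size is to scan fixed-size blocks independently, scan the per-block totals, and then add each block's offset to its elements. The C++ sketch below shows that three-phase structure sequentially. It is an assumption that the sample follows this scheme, and the name `exclusiveScanBlocked` and the `blockSize` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Exclusive prefix sum for arrays of any length, organized in three phases.
std::vector<int> exclusiveScanBlocked(const std::vector<int>& in,
                                      std::size_t blockSize) {
    if (blockSize == 0) blockSize = 1;
    const std::size_t n = in.size();
    const std::size_t numBlocks = (n + blockSize - 1) / blockSize;
    std::vector<int> out(n);
    std::vector<int> blockSums(numBlocks, 0);

    // Phase 1: scan each block on its own and record the block total.
    for (std::size_t b = 0; b < numBlocks; ++b) {
        const std::size_t end = std::min(n, (b + 1) * blockSize);
        int running = 0;
        for (std::size_t i = b * blockSize; i < end; ++i) {
            out[i] = running;
            running += in[i];
        }
        blockSums[b] = running;
    }

    // Phase 2: exclusive scan of the block totals gives each block's offset.
    int offset = 0;
    for (std::size_t b = 0; b < numBlocks; ++b) {
        const int total = blockSums[b];
        blockSums[b] = offset;
        offset += total;
    }

    // Phase 3: add the block offset to every element of the block.
    for (std::size_t b = 0; b < numBlocks; ++b) {
        const std::size_t end = std::min(n, (b + 1) * blockSize);
        for (std::size_t i = b * blockSize; i < end; ++i) {
            out[i] += blockSums[b];
        }
    }
    return out;
}
```

Phases 1 and 3 treat blocks independently, so only the much smaller array of block totals needs a second scan.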
N-Body Simulation
This sample demonstrates an efficient all-pairs gravitational n-body simulation in CUDA. This sample accompanies the GPU Gems 3 chapter "Fast N-Body Simulation with CUDA".
Whitepaper
Download - Windows
Download - Linux

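"All-pairs" means that the acceleration of every body is accumulated from every other body. The C++ sketch below shows that O(n^2) computation on the CPU. It uses units in which the gravitational constant is 1 and an optional softening term to avoid singularities at small distances. The `Body` and `Vec3` types, the function name `allPairsAcceleration`, and the explicit skip of the self-interaction are choices made for this sketch.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };
struct Body { double x, y, z, mass; };

// Computes the gravitational acceleration on each body from all others
// (gravitational constant taken as 1, an assumption of this sketch).
std::vector<Vec3> allPairsAcceleration(const std::vector<Body>& bodies,
                                       double softening) {
    const std::size_t n = bodies.size();
    std::vector<Vec3> acc(n, Vec3{0.0, 0.0, 0.0});
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;  // a body exerts no force on itself
            const double dx = bodies[j].x - bodies[i].x;
            const double dy = bodies[j].y - bodies[i].y;
            const double dz = bodies[j].z - bodies[i].z;
            const double distSqr =
                dx * dx + dy * dy + dz * dz + softening * softening;
            const double invDist = 1.0 / std::sqrt(distSqr);
            // m_j / r^3, applied to the separation vector.
            const double scale = bodies[j].mass * invDist * invDist * invDist;
            acc[i].x += scale * dx;
            acc[i].y += scale * dy;
            acc[i].z += scale * dz;
        }
    }
    return acc;
}
```

The outer loop over `i` carries no dependency between iterations, so each body's acceleration can be computed by a separate thread.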
Parallel Reduction
A parallel sum reduction that computes the sum of large arrays of values. This sample demonstrates several important optimization strategies for parallel algorithms like reduction.
Whitepaper
Download - Windows
Download - Linux

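The access pattern of a tree-based sum reduction can be emulated sequentially. In the C++ sketch below, each pass adds the upper half of the active range onto the lower half, so the number of active elements roughly halves per pass. This only illustrates the general pattern. The specific optimization strategies the sample demonstrates are not reproduced here, and the name `treeReduceSum` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Sums the values with a halving, tree-shaped access pattern.
// The vector is taken by value because it is used as scratch space.
double treeReduceSum(std::vector<double> values) {
    std::size_t active = values.size();
    if (active == 0) return 0.0;
    while (active > 1) {
        // Round up so an odd middle element is carried to the next pass.
        const std::size_t half = (active + 1) / 2;
        // All additions within one pass are independent of each other.
        for (std::size_t i = 0; i + half < active; ++i) {
            values[i] += values[i + half];
        }
        active = half;
    }
    return values[0];
}
```

A reduction over n elements needs about log2(n) passes, compared with n - 1 dependent additions in a simple running sum.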
Fluids (OpenGL Version)
An example of fluid simulation using CUDA and CUFFT, with OpenGL rendering.
Download - Windows
Download - Linux

Fluids (Direct3D Version)
An example of fluid simulation using CUDA and CUFFT, with Direct3D 9 rendering.
Download - Windows

Fast Walsh Transform
Naturally (Hadamard)-ordered Fast Walsh Transform for batched vectors of arbitrary eligible (power-of-two) lengths.
Download - Windows
Download - Linux

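For a single vector, the naturally (Hadamard) ordered Fast Walsh Transform can be written as a sequence of butterfly passes. The C++ sketch below is an unnormalized, in-place CPU version for one vector whose length is a power of two. Batching over many vectors, as the sample does, is left out, and the name `fastWalshTransform` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// In-place, unnormalized Fast Walsh-Hadamard Transform in natural order.
// The length of `a` must be a power of two.
void fastWalshTransform(std::vector<double>& a) {
    const std::size_t n = a.size();
    for (std::size_t h = 1; h < n; h *= 2) {
        for (std::size_t i = 0; i < n; i += 2 * h) {
            for (std::size_t j = i; j < i + h; ++j) {
                // Butterfly: replace the pair with its sum and difference.
                const double x = a[j];
                const double y = a[j + h];
                a[j] = x + y;
                a[j + h] = x - y;
            }
        }
    }
}
```

Applying the transform twice returns the original vector scaled by its length, so dividing by the length after a second pass gives the inverse.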
1D Discrete Haar Wavelet Decomposition
Discrete Haar wavelet decomposition for 1D signals with a length which is a power of 2.
Download - Windows
Download - Linux
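A full 1D Haar decomposition repeatedly splits the signal into pairwise averages (approximation) and pairwise differences (detail), then recurses on the averages. The C++ sketch below does this on the CPU with the orthonormal 1/sqrt(2) scaling. The scaling convention, the output layout (final approximation first, then details from coarsest to finest), and the name `haarDecompose` are assumptions of this sketch rather than details of the sample.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Full Haar wavelet decomposition of a signal whose length is a power of two.
// Output layout: [approximation, coarsest detail, ..., finest details].
std::vector<double> haarDecompose(std::vector<double> signal) {
    const double scale = 1.0 / std::sqrt(2.0);
    std::vector<double> tmp(signal.size());
    for (std::size_t len = signal.size(); len > 1; len /= 2) {
        const std::size_t half = len / 2;
        for (std::size_t i = 0; i < half; ++i) {
            const double a = signal[2 * i];
            const double b = signal[2 * i + 1];
            tmp[i] = (a + b) * scale;         // approximation coefficient
            tmp[half + i] = (a - b) * scale;  // detail coefficient
        }
        // Only the first `len` entries change at this level.
        for (std::size_t i = 0; i < len; ++i) signal[i] = tmp[i];
    }
    return signal;
}
```

With the orthonormal scaling, the sum of squares of the coefficients equals the sum of squares of the input samples.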