NVIDIA CUDA SDK x64 - CUDA Advanced Topics

The CUDA Developer SDK provides examples with source code, utilities, and white papers to help you get started writing software with CUDA. The SDK includes dozens of code samples covering a wide range of applications including:

  • Simple techniques such as C++ code integration and efficient loading of custom datatypes
  • How-To examples covering the CUDA BLAS and FFT libraries, texture fetching in CUDA, and CUDA interoperation with the OpenGL and Direct3D graphics APIs
  • Linear algebra primitives such as matrix transpose and matrix-matrix multiplication
  • Data-parallel algorithms such as parallel prefix sum of large arrays
  • Performance: profiling using timers and bandwidth tests
  • Advanced application examples such as image convolution, Black-Scholes options pricing and binomial options pricing
Refer to the following READMEs for more information (Linux, Windows).

This code is released free of charge for use in derivative works, whether academic, commercial, or personal. (Full License)

The NVIDIA CUDA Toolkit is required to compile and run the code samples and must be obtained separately.

Monte-Carlo Option Pricing with multi-GPU support

This sample evaluates the fair call price for a given set of European options using the Monte-Carlo approach, taking advantage of all CUDA-capable GPUs installed in the system.

Download - Windows
Download - Linux


FFT Ocean Simulation

This sample simulates an ocean heightfield using CUFFT and renders the result using OpenGL.

Download - Windows
Download - Linux


256-bin Histogram

This sample demonstrates an efficient implementation of a 256-bin histogram.

Whitepaper
Download - Windows
Download - Linux
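
As a rough illustration of what a 256-bin histogram computes, here is a minimal CPU sketch in Python. It is not the SDK's CUDA source. It assumes one common strategy, in which each chunk of the input (standing in for a thread block) fills a private partial histogram and the partials are merged at the end. The names `histogram256` and `chunk_size` are hypothetical.

```python
def histogram256(data, chunk_size=1024):
    """Count byte values (0..255) by merging per-chunk partial histograms."""
    partials = []
    for start in range(0, len(data), chunk_size):
        # One private histogram per chunk, standing in for one thread block.
        local = [0] * 256
        for byte in data[start:start + chunk_size]:
            local[byte] += 1
        partials.append(local)

    # Merge step: add the partial histograms bin by bin.
    result = [0] * 256
    for local in partials:
        for bin_index in range(256):
            result[bin_index] += local[bin_index]
    return result


if __name__ == "__main__":
    counts = histogram256(bytes([0, 1, 1, 255] * 3), chunk_size=5)
    print(counts[0], counts[1], counts[255])
```

Keeping the partial histograms private until the merge avoids contention on shared counters, which is the main reason this layout is popular on parallel hardware.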


64-bin Histogram

This sample demonstrates an efficient implementation of a 64-bin histogram.

Whitepaper
Download - Windows
Download - Linux


FFT-Based 2D Convolution

This sample demonstrates how 2D convolutions with very large kernel sizes can be implemented efficiently using FFTs.

Whitepaper
Download - Windows
Download - Linux


MersenneTwister

This sample implements the Mersenne Twister random number generator and the Cartesian Box-Muller transformation on the GPU.

Whitepaper
Download - Windows
Download - Linux
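
The Box-Muller step can be sketched on the CPU in a few lines of Python. This is an illustration only, not the SDK's GPU code. It assumes that "Cartesian" refers to the trigonometric form of the transform, which maps two uniform samples to the x and y coordinates of a normally distributed point. The function names are hypothetical. CPython's `random` module happens to use a Mersenne Twister internally, so it serves as the uniform source here.

```python
import math
import random


def box_muller(u1, u2):
    """Map uniforms u1 in (0, 1] and u2 in [0, 1) to two independent N(0, 1) samples."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def normal_samples(count, seed=0):
    """Draw `count` standard normal samples from a seeded Mersenne Twister."""
    rng = random.Random(seed)  # CPython's generator is MT19937
    out = []
    while len(out) < count:
        u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log(u1) is defined
        u2 = rng.random()
        out.extend(box_muller(u1, u2))
    return out[:count]


if __name__ == "__main__":
    samples = normal_samples(10000, seed=42)
    mean = sum(samples) / len(samples)
    print("sample mean:", mean)
```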


Monte-Carlo Option Pricing

This sample evaluates the fair call price for a given set of European options using the Monte-Carlo approach.

Whitepaper
Download - Windows
Download - Linux
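
To make the Monte-Carlo approach concrete, the following Python sketch prices a single European call on the CPU. It is not the SDK's implementation. It assumes the standard Black-Scholes model (lognormal terminal price, constant rate and volatility), and all function and parameter names are hypothetical. The closed-form Black-Scholes price is included only as a reference value to compare against.

```python
import math
import random


def monte_carlo_call(spot, strike, rate, vol, years, paths, seed=0):
    """Estimate a European call price by averaging discounted simulated payoffs."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol * vol) * years
    diffusion = vol * math.sqrt(years)
    payoff_sum = 0.0
    for _ in range(paths):
        z = rng.gauss(0.0, 1.0)
        terminal = spot * math.exp(drift + diffusion * z)
        payoff_sum += max(terminal - strike, 0.0)
    return math.exp(-rate * years) * payoff_sum / paths


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, vol, years):
    """Closed-form reference price for the same option."""
    sigma_root_t = vol * math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * years) / sigma_root_t
    d2 = d1 - sigma_root_t
    return spot * normal_cdf(d1) - strike * math.exp(-rate * years) * normal_cdf(d2)


if __name__ == "__main__":
    estimate = monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0, paths=50000, seed=1)
    reference = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
    print("Monte-Carlo:", estimate, "closed form:", reference)
```

Every path is independent of the others, so the loop can be split across threads or across several GPUs, with only the partial payoff sums combined at the end.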


Binomial Option Pricing

This sample evaluates the fair call price for a given set of European options under the binomial model.

Whitepaper
Download - Windows
Download - Linux
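
For readers unfamiliar with the binomial model, here is a minimal CPU sketch in Python. It is not the SDK's code. It assumes the Cox-Ross-Rubinstein parameterization of the tree, which the page does not specify, and the name `binomial_call` and its parameters are hypothetical.

```python
import math


def binomial_call(spot, strike, rate, vol, years, steps):
    """Price a European call on a Cox-Ross-Rubinstein binomial tree."""
    dt = years / steps
    up = math.exp(vol * math.sqrt(dt))
    down = 1.0 / up
    growth = math.exp(rate * dt)
    p_up = (growth - down) / (up - down)  # risk-neutral probability of an up move
    discount = 1.0 / growth

    # Payoffs at expiry: node j corresponds to j up moves and (steps - j) down moves.
    values = [
        max(spot * up ** j * down ** (steps - j) - strike, 0.0)
        for j in range(steps + 1)
    ]

    # Roll back through the tree, one time level per pass.
    for level in range(steps, 0, -1):
        for j in range(level):
            values[j] = discount * (p_up * values[j + 1] + (1.0 - p_up) * values[j])
    return values[0]


if __name__ == "__main__":
    print(binomial_call(100.0, 100.0, 0.05, 0.2, 1.0, steps=500))
```

Within one time level, every node depends only on two nodes from the later level, so the inner loop is the part that lends itself to parallel execution.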


Image denoising

This sample demonstrates two adaptive image denoising techniques, KNN and NLM, based on computation of both geometric and color distance between texels. While both techniques are implemented in the DirectX SDK using shaders, a massively sped-up variation of the latter technique, taking advantage of shared memory, is implemented in addition to the DirectX counterparts.

Whitepaper
Download - Windows
Download - Linux


DirectX Texture Compressor (DXTC)

High-quality DXT compression using CUDA. This example shows how to implement an existing computationally intensive CPU compression algorithm in parallel on the GPU and obtain an order-of-magnitude performance improvement.

Whitepaper
Download - Windows
Download - Linux


Eigenvalues

The computation of all or a subset of all eigenvalues is an important problem in linear algebra, statistics, physics, and many other fields. This sample demonstrates a parallel implementation of a bisection algorithm for the computation of all eigenvalues of a tridiagonal symmetric matrix of arbitrary size with CUDA.

Whitepaper
Download - Windows
Download - Linux
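
The bisection idea can be sketched on the CPU as follows. This Python code is an illustration, not the SDK's parallel implementation. It assumes the usual Sturm-count formulation, in which the number of negative pivots of the shifted matrix gives the number of eigenvalues below the shift, and Gershgorin circles supply the starting interval. The function names and the fixed iteration count are hypothetical choices.

```python
def count_below(diag, off, x):
    """Number of eigenvalues of the symmetric tridiagonal matrix that are < x.

    diag holds the n diagonal entries, off the n - 1 off-diagonal entries.
    """
    count = 0
    q = 1.0
    for i in range(len(diag)):
        coupling = off[i - 1] ** 2 if i > 0 else 0.0
        q = diag[i] - x - coupling / q
        if q == 0.0:
            q = 1e-30  # nudge an exactly zero pivot so the next division is defined
        if q < 0.0:
            count += 1
    return count


def eigenvalues(diag, off, iterations=100):
    """Return all eigenvalues in ascending order using interval bisection."""
    n = len(diag)
    radius = [
        (abs(off[i - 1]) if i > 0 else 0.0) + (abs(off[i]) if i < n - 1 else 0.0)
        for i in range(n)
    ]
    # Gershgorin bounds enclose every eigenvalue.
    lower = min(diag[i] - radius[i] for i in range(n))
    upper = max(diag[i] + radius[i] for i in range(n))

    result = []
    for k in range(n):  # the k-th smallest eigenvalue is searched independently
        lo, hi = lower, upper
        for _ in range(iterations):
            mid = 0.5 * (lo + hi)
            if count_below(diag, off, mid) > k:
                hi = mid
            else:
                lo = mid
        result.append(0.5 * (lo + hi))
    return result


if __name__ == "__main__":
    print(eigenvalues([2.0, 2.0, 2.0], [-1.0, -1.0]))
```

Each eigenvalue's search is independent of the others, which is what makes bisection attractive for a parallel implementation.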


Sobel Filter

This sample implements the Sobel edge detection filter for 8-bit monochrome images.

Download - Windows
Download - Linux
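
A minimal CPU sketch of the Sobel filter in Python is shown below for orientation. It is not the SDK's CUDA kernel. It assumes the image is a list of rows of 8-bit values, approximates the gradient magnitude as |gx| + |gy| clamped to 255, and leaves border pixels at zero. These are illustrative choices, and the name `sobel` is hypothetical.

```python
def sobel(image):
    """Apply a 3x3 Sobel edge filter to an 8-bit grayscale image (list of rows)."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column, center row weighted by 2.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row, center column weighted by 2.
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            out[y][x] = min(255, abs(gx) + abs(gy))
    return out


if __name__ == "__main__":
    step_edge = [[0, 0, 10, 10] for _ in range(4)]
    for row in sobel(step_edge):
        print(row)
```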


Scan

This example demonstrates an efficient CUDA implementation of parallel prefix sum, also known as "scan". Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array.

Whitepaper
Download - Windows
Download - Linux
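
The scan described above is an exclusive prefix sum. The Python sketch below runs on the CPU and is not the SDK's kernel. It assumes the work-efficient up-sweep/down-sweep formulation and a power-of-two input length. The name `exclusive_scan` is hypothetical. Each inner loop touches disjoint elements, which is why a parallel version can run its iterations concurrently.

```python
def exclusive_scan(values):
    """Exclusive prefix sum via an up-sweep and a down-sweep (power-of-two length)."""
    n = len(values)
    if n == 0:
        return []
    if n & (n - 1):
        raise ValueError("this sketch requires a power-of-two length")
    a = list(values)

    # Up-sweep: build partial sums in place, forming a reduction tree.
    stride = 1
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):
            a[i] += a[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefix sums back down the tree.
    a[n - 1] = 0
    stride = n // 2
    while stride >= 1:
        for i in range(2 * stride - 1, n, 2 * stride):
            left = a[i - stride]
            a[i - stride] = a[i]
            a[i] += left
        stride //= 2
    return a


if __name__ == "__main__":
    print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
    # prints [0, 3, 4, 11, 11, 15, 16, 22]
```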


Scan of Large Arrays

This example demonstrates an efficient CUDA implementation of parallel prefix sum (also known as "scan") for arbitrary-sized arrays. Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array.

Whitepaper
Download - Windows
Download - Linux
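
One way to extend scan to arbitrary-sized arrays is to scan fixed-size blocks independently, scan the per-block totals, and then add each block's offset back in. The Python sketch below illustrates that decomposition on the CPU. It is an assumption about the general approach rather than a transcription of the SDK's code, and `scan_large` and `block_size` are hypothetical names.

```python
def scan_large(values, block_size=4):
    """Exclusive prefix sum of an arbitrary-length list using a blocked decomposition."""
    n = len(values)
    out = [0] * n
    block_sums = []

    # Step 1: scan each block on its own and record the block's total.
    for start in range(0, n, block_size):
        running = 0
        for i in range(start, min(start + block_size, n)):
            out[i] = running
            running += values[i]
        block_sums.append(running)

    # Step 2: exclusive scan of the block totals gives each block's offset.
    offsets = []
    running = 0
    for total in block_sums:
        offsets.append(running)
        running += total

    # Step 3: add the offset of each block to all of its elements.
    for block, start in enumerate(range(0, n, block_size)):
        for i in range(start, min(start + block_size, n)):
            out[i] += offsets[block]
    return out


if __name__ == "__main__":
    print(scan_large([3, 1, 7, 0, 4, 1, 6, 3, 2], block_size=4))
    # prints [0, 3, 4, 11, 11, 15, 16, 22, 25]
```

Steps 1 and 3 are independent per block, so only the much smaller scan of block totals in step 2 needs cross-block coordination.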


N-Body Simulation

This sample demonstrates an efficient all-pairs gravitational n-body simulation in CUDA. It accompanies the GPU Gems 3 chapter "Fast N-Body Simulation with CUDA".

Whitepaper
Download - Windows
Download - Linux
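
The all-pairs computation at the heart of such a simulation can be sketched on the CPU as below. This Python code is an illustration, not the SDK's kernel. It assumes softened Newtonian gravity with a strictly positive softening term, so that a body's interaction with itself contributes zero without a special case. The gravitational constant defaults to 1, and all names are hypothetical.

```python
def accelerations(positions, masses, softening_sq=1e-3, g=1.0):
    """Compute the acceleration on every body from all bodies (O(n^2) pairs).

    softening_sq must be > 0; it keeps the self-interaction term finite (and zero).
    """
    result = []
    for xi, yi, zi in positions:
        ax = ay = az = 0.0
        for (xj, yj, zj), mass in zip(positions, masses):
            dx, dy, dz = xj - xi, yj - yi, zj - zi
            dist_sq = dx * dx + dy * dy + dz * dz + softening_sq
            scale = g * mass * dist_sq ** -1.5
            ax += scale * dx
            ay += scale * dy
            az += scale * dz
        result.append((ax, ay, az))
    return result


if __name__ == "__main__":
    bodies = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    print(accelerations(bodies, [1.0, 1.0]))
```

The outer loop over bodies has no dependencies between iterations, so each body's acceleration can be accumulated by a separate thread.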


Parallel Reduction

A parallel sum reduction that computes the sum of large arrays of values. This sample demonstrates several important optimization strategies for parallel algorithms like reduction.

Whitepaper
Download - Windows
Download - Linux
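
The tree-shaped access pattern behind a parallel sum reduction can be sketched on the CPU as follows. This Python code is an illustration only and does not reproduce the optimizations the sample covers. It assumes the sequential-addressing variant, in which the active half of the array is folded onto the other half at every step, and pads the input with zeros to a power of two. The name `tree_reduce_sum` is hypothetical.

```python
def tree_reduce_sum(values):
    """Sum a sequence with a log-depth tree reduction (sequential addressing)."""
    data = list(values)
    if not data:
        return 0

    # Pad with zeros up to the next power of two so every step halves cleanly.
    size = 1
    while size < len(data):
        size *= 2
    data.extend([0] * (size - len(data)))

    # At each step, element i absorbs element i + stride; iterations are independent.
    stride = size // 2
    while stride > 0:
        for i in range(stride):
            data[i] += data[i + stride]
        stride //= 2
    return data[0]


if __name__ == "__main__":
    print(tree_reduce_sum(range(1, 101)))
    # prints 5050
```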


Fluids (OpenGL Version)

An example of fluid simulation using CUDA and CUFFT, with OpenGL rendering.

Download - Windows
Download - Linux


Fluids (Direct3D Version)

An example of fluid simulation using CUDA and CUFFT, with Direct3D 9 rendering.

Download - Windows


Fast Walsh Transform

Naturally (Hadamard)-ordered Fast Walsh Transform for batched vectors of arbitrary eligible (power-of-two) lengths.

Download - Windows
Download - Linux
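
For a single vector, the naturally ordered transform reduces to repeated butterfly passes. The Python sketch below shows this on the CPU. It is not the SDK's batched GPU code, it applies no normalization, and the name `fwht` is hypothetical.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    half = 1
    while half < n:
        # Each pass combines elements that are `half` apart with a sum/difference butterfly.
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                x, y = a[i], a[i + half]
                a[i], a[i + half] = x + y, x - y
        half *= 2
    return a


if __name__ == "__main__":
    print(fwht([1, 0, 1, 0, 0, 1, 1, 0]))
    # prints [4, 2, 0, -2, 0, 2, 0, 2]
```

Applying the transform twice returns the input scaled by the vector length, which is a convenient correctness check.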


1D Discrete Haar Wavelet Decomposition

Discrete Haar wavelet decomposition for 1D signals whose length is a power of 2.

Download - Windows
Download - Linux
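
A full Haar decomposition repeatedly splits the signal into pairwise averages and differences. The Python sketch below shows this on the CPU and is not the SDK's code. It assumes the orthonormal scaling of 1/sqrt(2) and an output layout of the single approximation coefficient followed by detail coefficients from coarsest to finest. Both are illustrative conventions, and the name `haar_decompose` is hypothetical.

```python
import math


def haar_decompose(signal):
    """Full 1D Haar decomposition of a signal whose length is a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    scale = 1.0 / math.sqrt(2.0)
    approx = [float(v) for v in signal]
    details = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        level_detail = [(approx[i] - approx[i + 1]) * scale for i in pairs]
        approx = [(approx[i] + approx[i + 1]) * scale for i in pairs]
        details = level_detail + details  # coarser levels go in front
    return approx + details


if __name__ == "__main__":
    print(haar_decompose([4, 6, 10, 12, 8, 6, 5, 5]))
```

With this scaling the transform is orthonormal, so the sum of squared coefficients equals the sum of squared input samples.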

Last Update: 11/12/2007