
Monte Carlo Option Pricing with Multi-GPU Support
This sample evaluates the fair call price for a given set of European options using the Monte Carlo approach, taking advantage of all CUDA-capable GPUs installed in the system.

or later
Download  Windows
Download  Linux



FFT Ocean Simulation
This sample simulates an Ocean heightfield using CUFFT and renders the result using OpenGL. 

or later
Download  Windows
Download  Linux



256-bin Histogram
This sample demonstrates an efficient implementation of a 256-bin histogram.
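The idea of a 256-bin histogram can be sketched on the CPU as follows. This is a minimal Python illustration, not the sample's CUDA code. It assumes the common GPU strategy of letting each worker fill a private partial histogram and merging the partials at the end; the function name `histogram256` and the `num_workers` parameter are hypothetical.

```python
def histogram256(data, num_workers=4):
    """Count byte values using per-worker partial histograms (assumed strategy)."""
    # One private 256-bin histogram per worker avoids write conflicts.
    partials = [[0] * 256 for _ in range(num_workers)]
    for index, value in enumerate(data):
        # Elements are dealt to workers round-robin.
        partials[index % num_workers][value] += 1
    # Merge step: sum the partial histograms bin by bin.
    return [sum(partial[b] for partial in partials) for b in range(256)]


counts = histogram256(bytes([0, 1, 1, 255]))
print(counts[0], counts[1], counts[255])
```

The merge step is what keeps the per-element updates conflict-free; on a GPU the partial histograms would typically live in fast on-chip memory.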

or later
Whitepaper
Download  Windows
Download  Linux



64-bin Histogram
This sample demonstrates an efficient implementation of a 64-bin histogram.

or later
Whitepaper
Download  Windows
Download  Linux



Separable Convolution
This sample implements a separable convolution filter of a 2D signal with a Gaussian kernel.
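A separable filter replaces one 2D convolution with a 1D pass over the rows followed by a 1D pass over the columns. The sketch below is a plain Python illustration of that idea, not the sample's CUDA code. The function names and the clamp-to-edge border handling are assumptions made for the example.

```python
import math


def gaussian_kernel(radius, sigma):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_rows(image, kernel):
    """Apply the 1D kernel along each row, clamping reads at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    result = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, weight in enumerate(kernel):
                source = min(max(x + k - radius, 0), width - 1)
                acc += weight * row[source]
            new_row.append(acc)
        result.append(new_row)
    return result


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def separable_convolve(image, kernel):
    """Row pass, then column pass (done as a row pass on the transpose)."""
    rows_done = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(rows_done), kernel))
```

For a kernel of width w, the two 1D passes cost about 2w multiplications per pixel instead of w*w for the direct 2D form.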

or later
Whitepaper
Download  Windows
Download  Linux



Texture-based Separable Convolution
Texture-based implementation of a separable 2D convolution with a Gaussian kernel. Used for performance comparison against convolutionSeparable.

or later
Download  Windows
Download  Linux



FFT-Based 2D Convolution
This sample demonstrates how 2D convolutions with very large kernel sizes can be efficiently implemented using FFT transformations. 

or later
Whitepaper
Download  Windows
Download  Linux



Mersenne Twister
This sample implements the Mersenne Twister random number generator and the Cartesian Box-Muller transformation on the GPU.
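The Box-Muller step can be illustrated on the CPU. The sketch below is Python, not the sample's CUDA code. It assumes that "Cartesian" refers to the sine/cosine form of the transform, and it leans on Python's standard `random` module, which is itself built on the Mersenne Twister (MT19937). The function names are hypothetical.

```python
import math
import random


def box_muller(u1, u2):
    """Map two uniforms (u1 in (0, 1], u2 in [0, 1)) to two standard normals."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def normal_samples(count, seed=0):
    """Draw `count` normal samples from Mersenne Twister uniforms."""
    rng = random.Random(seed)  # MT19937 under the hood
    samples = []
    while len(samples) < count:
        u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log(u1) is finite
        u2 = rng.random()
        samples.extend(box_muller(u1, u2))
    return samples[:count]
```

Each pair of uniform inputs yields a pair of independent normal outputs, so the transform maps naturally onto batches of generated numbers.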

or later
Whitepaper
Download  Windows
Download  Linux



Monte Carlo Option Pricing
This sample evaluates the fair call price for a given set of European options using the Monte Carlo approach.
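As a rough CPU-side illustration of the Monte Carlo approach (Python, not the sample's CUDA code), the sketch below averages discounted call payoffs over simulated terminal prices. It assumes the underlying follows geometric Brownian motion under the risk-neutral measure; the function name and parameters are hypothetical.

```python
import math
import random


def monte_carlo_call(spot, strike, rate, vol, years, paths, seed=0):
    """Estimate a European call price by averaging simulated payoffs."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol * vol) * years
    diffusion = vol * math.sqrt(years)
    payoff_sum = 0.0
    for _ in range(paths):
        # Terminal price under geometric Brownian motion (assumed model).
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        payoff_sum += max(terminal - strike, 0.0)
    # Discount the mean payoff back to today.
    return math.exp(-rate * years) * payoff_sum / paths


estimate = monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0, paths=200000, seed=1)
print(round(estimate, 2))
```

Every path is independent of the others, which is why this method parallelizes well; the statistical error shrinks in proportion to one over the square root of the number of paths.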

or later
Whitepaper
Download  Windows
Download  Linux



Black-Scholes Option Pricing
This sample evaluates fair call and put prices for a given set of European options using the Black-Scholes formula.
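For reference, the closed-form Black-Scholes prices can be written in a few lines. This is a Python sketch of the textbook formula for a non-dividend-paying underlying, not the sample's CUDA code; the function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes(spot, strike, rate, vol, years):
    """Return (call, put) prices for a European option, no dividends."""
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    discounted_strike = strike * math.exp(-rate * years)
    call = spot * normal_cdf(d1) - discounted_strike * normal_cdf(d2)
    put = discounted_strike * normal_cdf(-d2) - spot * normal_cdf(-d1)
    return call, put


call, put = black_scholes(100.0, 100.0, 0.05, 0.2, 1.0)
print(round(call, 4), round(put, 4))
```

Each option is priced independently of the others, so a large set of options maps directly onto one thread per option.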

or later
Whitepaper
Download  Windows
Download  Linux



Binomial Option Pricing
This sample evaluates the fair call price for a given set of European options under the binomial model.
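A binomial-model price is computed by building the option values at expiry and rolling them back through the tree. The Python sketch below illustrates this on the CPU and is not the sample's CUDA code. It assumes the Cox-Ross-Rubinstein parameterization of the up and down moves; the function name is hypothetical.

```python
import math


def binomial_call(spot, strike, rate, vol, years, steps):
    """European call price on a recombining binomial tree (CRR, assumed)."""
    dt = years / steps
    up = math.exp(vol * math.sqrt(dt))
    down = 1.0 / up
    growth = math.exp(rate * dt)
    prob_up = (growth - down) / (up - down)  # risk-neutral up probability
    discount = 1.0 / growth
    # Payoffs at expiry for j up-moves out of `steps`.
    values = [max(spot * up ** j * down ** (steps - j) - strike, 0.0)
              for j in range(steps + 1)]
    # Backward induction: each level is one entry shorter than the last.
    for level in range(steps, 0, -1):
        values = [discount * (prob_up * values[j + 1] + (1.0 - prob_up) * values[j])
                  for j in range(level)]
    return values[0]


print(round(binomial_call(100.0, 100.0, 0.05, 0.2, 1.0, steps=500), 2))
```

Within one level of the tree, all node updates are independent, which is the parallelism a GPU implementation can exploit.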

or later
Whitepaper
Download  Windows
Download  Linux



Image Denoising
This sample demonstrates two adaptive image denoising techniques, KNN and NLM, based on computing both the geometric and color distance between texels. While both techniques are implemented in the DirectX SDK using shaders, a much faster variant of the latter technique, taking advantage of shared memory, is implemented in addition to the DirectX counterparts.

or later
Whitepaper
Download  Windows
Download  Linux



DirectX Texture Compressor (DXTC)
High Quality DXT Compression using CUDA.
This example shows how to implement an existing computationally intensive CPU compression algorithm in parallel on the GPU, and obtain an order of magnitude performance improvement.

or later
Whitepaper
Download  Windows
Download  Linux



Post-Process in OpenGL
This sample shows how to post-process an image rendered in OpenGL using CUDA.

or later
Download  Windows
Download  Linux



Box Filter
Fast image box filter using CUDA with OpenGL rendering. 

or later
Download  Windows
Download  Linux



Bitonic Sort
Bitonic sort is a very simple parallel sorting algorithm that is very
efficient when sorting a small number of elements:
http://citeseer.ist.psu.edu/blelloch98experimental.html
This implementation is based on:
http://www.toolsofcomputing.com/tc/CS/Sorts/bitonic_sort.htm
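The compare-and-swap pattern of a bitonic sorting network can be sketched in Python as below. This is a sequential CPU illustration, not the sample's CUDA code; the function name is hypothetical, and the input length is assumed to be a power of two.

```python
def bitonic_sort(values):
    """Sort a power-of-two-length sequence with a bitonic network."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(values)
    size = 2  # length of the bitonic sequences being merged
    while size <= n:
        stride = size // 2
        while stride >= 1:
            # On a GPU, each index i would be handled by its own thread.
            for i in range(n):
                partner = i ^ stride
                if partner > i:
                    ascending = (i & size) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            stride //= 2
        size *= 2
    return a


print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))
```

The sequence of comparisons depends only on the indices, never on the data, so every inner `for` loop can run fully in parallel.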


or later
Download  Windows
Download  Linux



Matrix Transpose
Efficient matrix transpose. 

or later
Download  Windows
Download  Linux



Scalar Product
This sample calculates scalar products of a given set of input vector pairs. 

or later
Download  Windows
Download  Linux



Clock
This example shows how to use the clock function to measure the performance of a kernel accurately.

or later
Download  Windows
Download  Linux



Multi-GPU
This application demonstrates how to use the CUDA API with multiple GPUs.


or later
Download  Windows
Download  Linux



Aligned Types
A simple test showing the large access-speed gap between aligned and misaligned structures.

or later
Download  Windows
Download  Linux



N-Body Simulation
This sample demonstrates an efficient all-pairs gravitational n-body simulation in CUDA. This sample accompanies the GPU Gems 3 chapter "Fast N-Body Simulation with CUDA".

or later
Whitepaper
Download  Windows
Download  Linux



Parallel Reduction
A parallel sum reduction that computes the sum of large arrays of values. This sample demonstrates several important optimization strategies for parallel algorithms like reduction.
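The tree-shaped access pattern behind a parallel sum reduction can be sketched sequentially in Python. This is an illustration only, not the sample's CUDA code. The function name is hypothetical, and padding the input with zeros to a power-of-two length is an assumption made to keep the loop simple.

```python
def tree_reduce_sum(values):
    """Sum values with a halving-stride tree, as a parallel reduction would."""
    data = list(values)
    if not data:
        return 0
    # Pad with zeros up to the next power of two (assumption for simplicity).
    size = 1
    while size < len(data):
        size *= 2
    data += [0] * (size - len(data))
    stride = size // 2
    while stride > 0:
        # All additions within one pass are independent of each other.
        for i in range(stride):
            data[i] += data[i + stride]
        stride //= 2
    return data[0]


print(tree_reduce_sum(range(1, 101)))
```

Each pass halves the number of active elements, so n values are summed in about log2(n) passes.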

or later
Whitepaper
Download  Windows
Download  Linux



asyncAPI
This sample uses CUDA streams and events to overlap execution on CPU and GPU. 

or later
Download  Windows
Download  Linux



cudaOpenMP
This sample shows how to use the OpenMP API to write an application for multiple GPUs.

or later
Download  Windows



simpleStreams
This sample uses CUDA streams to overlap kernel executions with memcopies between the device and the host. 

or later
Download  Windows
Download  Linux



Mandelbrot
This sample uses CUDA to compute and display the Mandelbrot set. 

or later
Download  Windows
Download  Linux



Particles
This sample uses CUDA to simulate and visualize a large set of particles and their physical interactions.

or later
Whitepaper
Download  Windows
Download  Linux



Simple Atomics
A simple demonstration of global memory atomic instructions. 

or later
Download  Windows
Download  Linux



Fast Walsh Transform
Naturally (Hadamard)-ordered Fast Walsh Transform for batched vectors of arbitrary eligible (power-of-two) lengths.
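For a single vector, the naturally (Hadamard) ordered transform reduces to repeated butterfly passes. The Python sketch below shows the unnormalized transform on the CPU; it is not the sample's CUDA code, and the function name is hypothetical.

```python
def fast_walsh_transform(values):
    """Unnormalized Walsh-Hadamard transform in natural (Hadamard) order."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(values)
    half = 1
    while half < n:
        # Butterfly pass: combine elements that are `half` apart.
        for start in range(0, n, 2 * half):
            for i in range(start, start + half):
                x, y = a[i], a[i + half]
                a[i], a[i + half] = x + y, x - y
        half *= 2
    return a


print(fast_walsh_transform([1, 0, 1, 0, 0, 1, 1, 0]))
```

Applying the transform twice returns the input scaled by the vector length, which makes a convenient correctness check.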

or later
Download  Windows
Download  Linux



Eigenvalues
The computation of all or a subset of all eigenvalues is an important problem in linear algebra, statistics, physics, and many other fields. This sample demonstrates a parallel implementation of a bisection algorithm for the computation of all eigenvalues of a tridiagonal symmetric matrix of arbitrary size with CUDA.
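Bisection for a symmetric tridiagonal matrix rests on counting how many eigenvalues lie below a trial value. The Python sketch below illustrates the textbook form of this idea on the CPU and is not the sample's CUDA code. The function names, the Gershgorin starting interval, and the small pivot replacement are assumptions made for the example.

```python
def count_below(diag, off, x):
    """Count eigenvalues below x using a Sturm sequence of pivots."""
    tiny = 1e-30  # stand-in for an exactly zero pivot (assumption)
    count = 0
    q = 1.0
    for i in range(len(diag)):
        q = diag[i] - x - (off[i - 1] ** 2 / q if i > 0 else 0.0)
        if q == 0.0:
            q = tiny
        if q < 0.0:
            count += 1
    return count


def tridiagonal_eigenvalues(diag, off, tol=1e-12):
    """All eigenvalues, ascending; diag has n entries, off has n - 1."""
    n = len(diag)
    # Gershgorin discs bound every eigenvalue.
    radii = [(abs(off[i - 1]) if i > 0 else 0.0) +
             (abs(off[i]) if i < n - 1 else 0.0) for i in range(n)]
    lower = min(d - r for d, r in zip(diag, radii))
    upper = max(d + r for d, r in zip(diag, radii))
    result = []
    for k in range(n):  # bisect for the k-th smallest eigenvalue
        lo, hi = lower, upper
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if count_below(diag, off, mid) > k:
                hi = mid
            else:
                lo = mid
        result.append(0.5 * (lo + hi))
    return result


print(tridiagonal_eigenvalues([2.0, 2.0, 2.0], [-1.0, -1.0]))
```

Each eigenvalue is located by its own independent bisection, so different intervals can be refined in parallel.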

or later
Whitepaper
Download  Windows
Download  Linux



Sobel Filter
This sample implements the Sobel edge detection filter for 8-bit monochrome images.

or later
Download  Windows
Download  Linux



Device Query
This sample enumerates the properties of the CUDA devices present in the system. 

or later
Download  Windows
Download  Linux



Simple Templates
This sample is a templatized version of the template project. It also shows how to correctly templatize dynamically allocated shared memory arrays. 

or later
Download  Windows
Download  Linux



Bandwidth Test
This is a simple test program to measure the memcopy bandwidth of the GPU. It can currently measure device-to-device copy bandwidth, host-to-device copy bandwidth for pageable and page-locked memory, and device-to-host copy bandwidth for pageable and page-locked memory.

or later
Download  Windows
Download  Linux



Scan
This example demonstrates an efficient CUDA implementation of parallel prefix sum, also known as "scan". Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array. 
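The definition above describes an exclusive scan. The Python sketch below walks through it sequentially using an up-sweep and a down-sweep over a balanced tree; it is a CPU illustration, not the sample's CUDA code. The choice of this work-efficient scheme, the function name, and the zero padding to a power-of-two length are assumptions.

```python
def exclusive_scan(values):
    """Exclusive prefix sum via up-sweep and down-sweep passes."""
    n = len(values)
    size = 1
    while size < n:
        size *= 2
    tree = list(values) + [0] * (size - n)  # pad to a power of two
    # Up-sweep: build partial sums toward the last element.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            tree[i] += tree[i - stride]
        stride *= 2
    # Down-sweep: push prefix sums back down the tree.
    tree[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = tree[i - stride]
            tree[i - stride] = tree[i]
            tree[i] += left
        stride //= 2
    return tree[:n]


print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
```

Within each pass, the updates touch disjoint elements, so every inner loop can run in parallel.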

or later
Whitepaper
Download  Windows
Download  Linux



Scan of Large Arrays
This example demonstrates an efficient CUDA implementation of parallel prefix sum (also known as "scan") for arbitrary-sized arrays. Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array.

or later
Whitepaper
Download  Windows
Download  Linux



Simple Texture (Driver Version)
Simple example that demonstrates use of textures in CUDA using the driver API. 

or later
Download  Windows
Download  Linux



Fluids (OpenGL Version)
An example of fluid simulation using CUDA and CUFFT, with OpenGL rendering. 

or later
Download  Windows
Download  Linux



Fluids (Direct3D Version)
An example of fluid simulation using CUDA and CUFFT, with Direct3D 9 rendering. 

or later
Download  Windows



Simple Texture
Simple example that demonstrates use of textures in CUDA. 

or later
Download  Windows
Download  Linux



Matrix Multiplication (Driver Version)
This sample implements matrix multiplication using the CUDA driver API.
It has been written for clarity of exposition to illustrate various CUDA programming principles, not with the goal of providing the most performant generic kernel for matrix multiplication.
CUBLAS provides high-performance matrix multiplication.

or later
Download  Windows
Download  Linux



Template
A trivial template project that can be used as a starting point to create new CUDA projects. 

or later
Download  Windows
Download  Linux



Simple CUFFT
Example of using CUFFT. In this example, CUFFT is used to compute the 1D convolution of some signal with some filter by transforming both into the frequency domain, multiplying them together, and transforming the result back to the time domain.
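The transform, multiply, inverse-transform pipeline can be sketched without CUFFT. The Python code below uses a small recursive radix-2 FFT written for this illustration; it is not the sample's code and does not use the CUFFT API. It assumes both inputs have the same power-of-two length and computes a circular convolution; the function names are hypothetical.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circular_convolve(signal, filt):
    """Convolve by pointwise multiplication in the frequency domain."""
    n = len(signal)
    if n == 0 or n & (n - 1) or len(filt) != n:
        raise ValueError("inputs must share a power-of-two length")
    product = [a * b for a, b in zip(fft(signal), fft(filt))]
    # Scale by 1/n because the inverse transform above is unnormalized.
    return [value.real / n for value in fft(product, inverse=True)]


print([round(v, 6) for v in circular_convolve([1, 2, 3, 4], [0, 1, 0, 0])])
```

To obtain a linear rather than circular convolution, both inputs would first be zero-padded to at least the sum of their lengths minus one.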

or later
Download  Windows
Download  Linux



Simple Direct3D
Simple program which demonstrates interoperability between CUDA and Direct3D. The program modifies vertex positions with CUDA and uses Direct3D to render the geometry. 

or later
Download  Windows



Simple OpenGL
Simple program which demonstrates interoperability between CUDA and OpenGL. The program modifies vertex positions with CUDA and uses OpenGL to render the geometry. 

or later
Download  Windows
Download  Linux



Simple CUBLAS
Example of using CUBLAS. 

or later
Download  Windows
Download  Linux



Matrix Multiplication
This sample implements matrix multiplication and is exactly the same as Chapter 6 of the programming guide.
It has been written for clarity of exposition to illustrate various CUDA programming principles, not with the goal of providing the most performant generic kernel for matrix multiplication.
CUBLAS provides high-performance matrix multiplication.

or later
Download  Windows
Download  Linux



C++ Integration
This example demonstrates how to integrate CUDA into an existing C++ application, i.e. the CUDA entry point on host side is only a function which is called from C++ code and only the file containing this function is compiled with nvcc. It also demonstrates that vector types can be used from cpp. 

or later
Download  Windows
Download  Linux



1D Discrete Haar Wavelet Decomposition
Discrete Haar wavelet decomposition for 1D signals with a length which is a power of 2. 
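A full Haar decomposition repeatedly splits the signal into pairwise averages and differences. The Python sketch below illustrates this on the CPU and is not the sample's CUDA code. The orthonormal scaling by the square root of two and the output layout (final approximation first, then details from coarsest to finest) are assumptions; the function name is hypothetical.

```python
import math


def haar_decompose(signal):
    """Full 1D Haar decomposition of a power-of-two-length signal."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(v) for v in signal]
    scale = math.sqrt(2.0)
    length = n
    while length > 1:
        half = length // 2
        # Pairwise sums give the approximation, differences the detail.
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / scale for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / scale for i in range(half)]
        coeffs[:length] = approx + detail
        length = half  # recurse on the approximation part only
    return coeffs


print([round(c, 6) for c in haar_decompose([4, 4, 4, 4])])
```

With this scaling the transform is orthonormal, so the sum of squared coefficients equals the sum of squared input samples.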

or later
Download  Windows
Download  Linux

