
256-bin Histogram
This sample demonstrates an efficient implementation of a 256-bin histogram.

Whitepaper
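As a rough illustration of what a 256-bin histogram computes, here is a minimal CPU reference sketch in Python, not the sample's CUDA code. The function name histogram256, the num_partials parameter, and the round-robin assignment of elements to partial histograms are assumptions made for illustration. Keeping several partial histograms and merging them at the end is one common way to reduce contention between parallel workers.

```python
def histogram256(data, num_partials=4):
    """Count occurrences of each byte value 0..255 in `data`.

    Hypothetical helper: elements are spread over `num_partials` partial
    histograms, which are summed bin by bin at the end.
    """
    partials = [[0] * 256 for _ in range(num_partials)]
    for i, value in enumerate(data):
        # Assumption: element i goes to partial histogram i % num_partials,
        # standing in for the group of threads that would process it.
        partials[i % num_partials][value] += 1
    # Merge step: add up each bin across all partial histograms.
    return [sum(p[b] for p in partials) for b in range(256)]
```

For example, histogram256(bytes([0, 1, 1, 255])) yields a list of 256 counts in which bin 1 holds 2. The same idea applies to a 64-bin histogram with a smaller bin count.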



64-bin Histogram
This sample demonstrates an efficient implementation of a 64-bin histogram.

Whitepaper



Separable Convolution
This sample implements a separable convolution filter of a 2D signal with a Gaussian kernel.

Whitepaper
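To make the idea of separability concrete, the sketch below filters a 2D signal with a row pass followed by a column pass, each using the same 1D Gaussian kernel. It is a CPU reference sketch in Python, not the sample's CUDA code. The function names, the sigma parameter, and the zero-padding at the borders are assumptions made for illustration.

```python
import math

def gaussian_kernel(radius, sigma):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_rows(image, kernel):
    """Filter every row with the 1D kernel (assumption: zeros outside the image)."""
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for k in range(-radius, radius + 1):
                xx = x + k
                if 0 <= xx < width:
                    acc += image[y][xx] * kernel[k + radius]
            out[y][x] = acc
    return out

def transpose(matrix):
    return [list(row) for row in zip(*matrix)]

def convolve_separable(image, kernel):
    """Row pass, then column pass (done as a row pass on the transpose)."""
    rows_done = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(rows_done), kernel))
```

Two 1D passes cost on the order of 2(2r + 1) multiplications per pixel for a kernel of radius r, compared with (2r + 1)^2 for the equivalent full 2D kernel.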



Texture-based Separable Convolution
Texture-based implementation of a separable 2D convolution with a Gaussian kernel. Used for performance comparison against convolutionSeparable.




Bitonic Sort
Bitonic sort is a simple parallel sorting algorithm that is efficient when sorting a small number of elements:
http://citeseer.ist.psu.edu/blelloch98experimental.html
This implementation is based on:
http://www.toolsofcomputing.com/tc/CS/Sorts/bitonic_sort.htm


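For readers unfamiliar with the algorithm, the following is a minimal sequential sketch of a bitonic sorting network in Python, not the sample's CUDA code. The function name bitonic_sort is an assumption, and the sketch only accepts power-of-two lengths. Within each (k, j) step the compare-and-swap operations touch disjoint pairs, which is what makes the network suitable for parallel execution.

```python
def bitonic_sort(values):
    """Sort a power-of-two-length sequence in ascending order (hypothetical helper)."""
    a = list(values)
    n = len(a)
    if n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:           # distance between compared elements
            for i in range(n):  # each i is independent within this step
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # Swap when the pair is out of order for its direction.
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

Calling bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]) returns the eight values in ascending order.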



N-Body Simulation
This sample demonstrates an efficient all-pairs gravitational n-body simulation in CUDA. This sample accompanies the GPU Gems 3 chapter "Fast N-Body Simulation with CUDA".

Whitepaper
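The all-pairs approach can be sketched as a double loop in which every body accumulates the gravitational pull of every other body. This is a CPU reference sketch in Python, not the sample's CUDA code. The function name, the gravitational constant g, and the softening term (added to the squared distance so that close encounters and the i == j term stay finite) are assumptions made for illustration.

```python
def all_pairs_accelerations(positions, masses, softening=1e-3, g=1.0):
    """Acceleration on each body from all bodies, O(n^2) pair interactions.

    positions: list of (x, y, z) tuples; masses: list of floats.
    Assumption: softening must be positive so the i == j term is zero
    rather than a division by zero.
    """
    n = len(positions)
    accelerations = []
    for i in range(n):
        xi, yi, zi = positions[i]
        ax = ay = az = 0.0
        for j in range(n):
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            dist_sq = dx * dx + dy * dy + dz * dz + softening * softening
            scale = g * masses[j] * dist_sq ** -1.5
            ax += dx * scale
            ay += dy * scale
            az += dz * scale
        accelerations.append((ax, ay, az))
    return accelerations
```

Each outer-loop iteration is independent of the others, so in a parallel setting one worker per body is a natural decomposition.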



Parallel Reduction
A parallel sum reduction that computes the sum of large arrays of values. This sample demonstrates several important optimization strategies for parallel algorithms like reduction.

Whitepaper
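The basic pattern behind a parallel sum reduction is a tree: in each step, half of the remaining elements are added onto the other half until one value is left. The sketch below simulates that pattern sequentially in Python and is not the sample's CUDA code. The function name and the zero-padding to a power-of-two length are assumptions made for illustration.

```python
def tree_reduce_sum(values):
    """Sum `values` using a tree-shaped reduction (hypothetical helper)."""
    data = list(values)
    n = len(data)
    if n == 0:
        return 0
    # Assumption: pad with zeros up to the next power of two.
    size = 1
    while size < n:
        size *= 2
    data += [0] * (size - n)
    stride = size // 2
    while stride > 0:
        # The additions in one step touch disjoint elements, so they could
        # all run in parallel; only the steps must run in order.
        for i in range(stride):
            data[i] += data[i + stride]
        stride //= 2
    return data[0]
```

An input of n elements needs about log2(n) steps, compared with n - 1 dependent additions in a plain sequential loop.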



Mandelbrot
This sample uses CUDA to compute and display the Mandelbrot set. 

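For reference, the Mandelbrot set is usually computed with an escape-time loop over the iteration z -> z*z + c. The following is a minimal CPU sketch in Python, not the sample's CUDA code. The function names, the iteration caps, and the default view rectangle are assumptions made for illustration.

```python
def escape_iterations(c, max_iter=256):
    """Number of updates z -> z*z + c applied before |z| exceeds 2.

    Returns max_iter if no escape is detected within the cap.
    """
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter

def mandelbrot_grid(width, height, max_iter=64,
                    x_min=-2.0, x_max=1.0, y_min=-1.5, y_max=1.5):
    """Escape counts for a width x height grid of pixel centers.

    Assumption: the default rectangle is just a convenient view.
    """
    dx = (x_max - x_min) / width
    dy = (y_max - y_min) / height
    return [[escape_iterations(complex(x_min + (px + 0.5) * dx,
                                       y_min + (py + 0.5) * dy), max_iter)
             for px in range(width)]
            for py in range(height)]
```

Every pixel is computed independently of its neighbors, which is why this problem maps so directly onto one thread per pixel.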



Fast Walsh Transform
Naturally (Hadamard) ordered Fast Walsh Transform for batched vectors of arbitrary eligible (power-of-two) lengths.

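A naturally (Hadamard) ordered Fast Walsh Transform can be written as log2(n) butterfly passes over the vector. The sketch below is a sequential Python reference, not the sample's CUDA code. The function names are assumptions, and the transform is left unnormalized, so applying it twice scales the input by n.

```python
def fwht(values):
    """Unnormalized Walsh-Hadamard transform in natural (Hadamard) order."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly pass: combine elements that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    return a

def fwht_batch(vectors):
    """Transform each vector of a batch independently (hypothetical helper)."""
    return [fwht(v) for v in vectors]
```

For example, fwht([1, 0, 1, 0, 0, 1, 1, 0]) returns [4, 2, 0, -2, 0, 2, 0, 2].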



Scan
This example demonstrates an efficient CUDA implementation of parallel prefix sum, also known as "scan". Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array. 

Whitepaper
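The definition above describes an exclusive scan: output element i is the sum of input elements 0 through i - 1. One well-known work-efficient formulation uses an up-sweep (partial sums in a tree) followed by a down-sweep. The sketch below simulates it sequentially in Python. Whether the sample uses exactly this formulation is an assumption, as are the function name and the zero-padding to a power-of-two length.

```python
def exclusive_scan(values):
    """Exclusive prefix sum via up-sweep and down-sweep (hypothetical helper)."""
    n = len(values)
    if n == 0:
        return []
    # Assumption: pad with zeros up to the next power of two.
    size = 1
    while size < n:
        size *= 2
    data = list(values) + [0] * (size - n)

    # Up-sweep: build partial sums in place, bottom-up.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefix sums back down.
    data[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2
    return data[:n]
```

For example, exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]) returns [0, 3, 4, 11, 11, 15, 16, 22].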



Scan of Large Arrays
This example demonstrates an efficient CUDA implementation of parallel prefix sum (also known as "scan") for arbitrary-sized arrays. Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array.

Whitepaper
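A common way to extend a scan to arrays larger than one block is to scan each block separately, scan the per-block totals, and then add each block's offset to its elements. The sketch below illustrates that three-step structure sequentially in Python and is not the sample's CUDA code. The function names and the block_size parameter are assumptions, and the per-block scan is a plain loop standing in for whatever block-level scan is used.

```python
def scan_block(block):
    """Exclusive scan of one block; also returns the block's total."""
    out, running = [], 0
    for v in block:
        out.append(running)
        running += v
    return out, running

def exclusive_scan_large(values, block_size=4):
    """Exclusive scan of an arbitrary-length list, processed block by block."""
    if block_size < 2:
        raise ValueError("block_size must be at least 2")
    values = list(values)
    blocks = [values[i:i + block_size]
              for i in range(0, len(values), block_size)]
    # Step 1: scan each block independently and record its total.
    scanned, totals = [], []
    for block in blocks:
        s, t = scan_block(block)
        scanned.append(s)
        totals.append(t)
    # Step 2: scan the block totals, recursing if they span several blocks.
    if len(totals) > block_size:
        offsets = exclusive_scan_large(totals, block_size)
    else:
        offsets, _ = scan_block(totals)
    # Step 3: add each block's offset to all of its elements.
    return [x + off for s, off in zip(scanned, offsets) for x in s]
```

With block_size=4, an input of the integers 1 through 10 is split into three blocks with totals 10, 26, and 19, so the second and third blocks are shifted by 10 and 36.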

