NVIDIA CUDA SDK - Linear Algebra

The CUDA Developer SDK provides examples with source code, utilities, and white papers to help you get started writing software with CUDA. The SDK includes dozens of code samples covering a wide range of applications including:

Simple techniques such as C++ code integration and efficient loading of custom datatypes
How-To examples covering CUDA BLAS and FFT libraries, texture fetching in CUDA, and CUDA interoperation with the OpenGL and Direct3D graphics APIs
Linear algebra primitives such as matrix transpose and matrix-matrix multiplication
Data-parallel algorithms such as parallel prefix sum of large arrays
Performance: profiling using timers and bandwidth tests
Advanced application examples such as image convolution, Black-Scholes options pricing and binomial options pricing

Refer to the following READMEs for more information (Linux, Windows).

This code is released free of charge for use in derivative works, whether academic, commercial, or personal. (Full License)

The NVIDIA CUDA Toolkit is required to compile and run the code samples. Obtain the CUDA Toolkit from NVIDIA before building them.

Quick Links:

Data-Parallel Algorithms
Computational Finance
Performance Strategies
Linear Algebra
Physically-Based Simulation
CUDA Basic Topics
Graphics Interop
Image/Video Processing and Data Compression
CUDA Advanced Topics


FFT Ocean Simulation
This sample simulates an ocean heightfield using CUFFT and renders the result using OpenGL.
Download - Windows | Download - Linux


Separable Convolution
This sample implements a separable convolution filter of a 2D signal with a Gaussian kernel.
Whitepaper | Download - Windows | Download - Linux
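The idea behind a separable filter can be sketched on the CPU. The following is a minimal Python illustration, not the SDK sample's CUDA code: it assumes zero padding outside the image, and every name in it (`gaussian_kernel`, `convolve_rows`, `convolve_separable`, `convolve_2d_direct`) is hypothetical. It shows that a row pass followed by a column pass with a 1D Gaussian gives the same result as one pass with the full 2D kernel, while using 2(2r+1) multiplies per pixel instead of (2r+1)^2 for a kernel of radius r.

```python
import math


def gaussian_kernel(radius, sigma):
    """Return normalized 1D Gaussian taps of length 2*radius + 1."""
    taps = [math.exp(-(i * i) / (2.0 * sigma * sigma))
            for i in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]


def convolve_rows(image, kernel):
    """Convolve each row with the 1D kernel; samples outside the image are zero."""
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for k in range(-radius, radius + 1):
                xx = x + k
                if 0 <= xx < width:
                    acc += image[y][xx] * kernel[radius - k]
            out[y][x] = acc
    return out


def transpose(matrix):
    """Return the transpose of a list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


def convolve_separable(image, kernel):
    """Row pass, then column pass (done as a row pass on the transpose)."""
    row_pass = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(row_pass), kernel))


def convolve_2d_direct(image, kernel):
    """Reference: one pass with the 2D kernel kernel[i] * kernel[j]."""
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for ky in range(-radius, radius + 1):
                for kx in range(-radius, radius + 1):
                    yy, xx = y + ky, x + kx
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += (image[yy][xx]
                                * kernel[radius - ky] * kernel[radius - kx])
            out[y][x] = acc
    return out
```

On a GPU the two passes would typically be separate kernels; how the SDK sample organizes them is described in its whitepaper rather than here.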


Texture-based Separable Convolution
Texture-based implementation of a separable 2D convolution with a Gaussian kernel. Used for performance comparison against convolutionSeparable.
Download - Windows | Download - Linux


FFT-Based 2D Convolution
This sample demonstrates how 2D convolutions with very large kernel sizes can be efficiently implemented using FFT transformations.
Whitepaper | Download - Windows | Download - Linux


Matrix Transpose
Efficient matrix transpose.
Download - Windows | Download - Linux


Scalar Product
This sample calculates scalar products of a given set of input vector pairs.
Download - Windows | Download - Linux


Fast Walsh Transform
Naturally (Hadamard) ordered Fast Walsh Transform for batched vectors of arbitrary eligible (power-of-two) lengths.
Download - Windows | Download - Linux
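For readers unfamiliar with the transform, here is a minimal CPU sketch in Python of a naturally (Hadamard) ordered, unnormalized Fast Walsh transform using in-place butterflies. It is an illustration only, not the SDK sample's CUDA kernel, and the names `fwht` and `fwht_batch` are hypothetical. Lengths that are not a power of two are rejected, matching the "eligible" lengths mentioned above.

```python
def fwht(values):
    """Unnormalized Walsh-Hadamard transform in natural (Hadamard) order.

    Returns a new list; the input length must be a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    data = list(values)
    stride = 1
    while stride < n:
        # Each stage combines elements that are `stride` apart.
        for base in range(0, n, 2 * stride):
            for i in range(base, base + stride):
                a, b = data[i], data[i + stride]
                data[i], data[i + stride] = a + b, a - b
        stride *= 2
    return data


def fwht_batch(vectors):
    """Transform each vector of a batch independently."""
    return [fwht(v) for v in vectors]


print(fwht([1, 0, 1, 0, 0, 1, 1, 0]))  # [4, 2, 0, -2, 0, 2, 0, 2]
```

Applying `fwht` twice returns the input scaled by its length, which is a convenient sanity check. Every vector in a batch, and every butterfly within one stage, is independent of the others, which is what makes the transform a natural fit for data-parallel hardware.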


Eigenvalues
The computation of all or a subset of all eigenvalues is an important problem in linear algebra, statistics, physics, and many other fields. This sample demonstrates a parallel implementation of a bisection algorithm for the computation of all eigenvalues of a tridiagonal symmetric matrix of arbitrary size with CUDA.
Whitepaper | Download - Windows | Download - Linux
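The bisection approach can be sketched sequentially as follows. This is a minimal Python illustration built on standard ingredients (Gershgorin bounds and a Sturm-sequence count), not the sample's CUDA implementation; the function names, the tolerance, and the `tiny` pivot guard are all assumptions made for the sketch. Here `diag` holds the n diagonal entries and `off` the n - 1 off-diagonal entries.

```python
def count_eigenvalues_below(diag, off, x, tiny=1e-30):
    """Sturm count: number of eigenvalues of the tridiagonal matrix below x."""
    count = 0
    q = 1.0
    for i in range(len(diag)):
        e2 = off[i - 1] ** 2 if i > 0 else 0.0
        q = diag[i] - x - e2 / q
        if q == 0.0:
            q = -tiny  # assumed guard against division by zero
        if q < 0.0:
            count += 1
    return count


def gershgorin_bounds(diag, off):
    """Interval guaranteed to contain every eigenvalue."""
    n = len(diag)
    lower, upper = float("inf"), float("-inf")
    for i in range(n):
        radius = (abs(off[i - 1]) if i > 0 else 0.0) \
            + (abs(off[i]) if i < n - 1 else 0.0)
        lower = min(lower, diag[i] - radius)
        upper = max(upper, diag[i] + radius)
    return lower, upper


def eigenvalues_by_bisection(diag, off, tol=1e-12):
    """Return all eigenvalues in ascending order."""
    lower, upper = gershgorin_bounds(diag, off)
    result = []
    for k in range(len(diag)):
        lo, hi = lower, upper
        for _ in range(200):  # hard cap on bisection steps
            if hi - lo <= tol * max(1.0, abs(lo), abs(hi)):
                break
            mid = 0.5 * (lo + hi)
            # More than k eigenvalues below mid means eigenvalue k is below mid.
            if count_eigenvalues_below(diag, off, mid) > k:
                hi = mid
            else:
                lo = mid
        result.append(0.5 * (lo + hi))
    return result
```

Each eigenvalue's interval is refined independently of the others, which is the property a parallel implementation can exploit; how the sample actually distributes intervals across threads is covered in its whitepaper.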


Matrix Multiplication (Driver Version)
This sample implements matrix multiplication using the CUDA driver API. It has been written for clarity of exposition to illustrate various CUDA programming principles, not with the goal of providing the most performant generic kernel for matrix multiplication. CUBLAS provides high-performance matrix multiplication.
Download - Windows | Download - Linux


Simple CUBLAS
Example of using CUBLAS.
Download - Windows | Download - Linux


Matrix Multiplication
This sample implements matrix multiplication and is exactly the same as the example in Chapter 6 of the programming guide. It has been written for clarity of exposition to illustrate various CUDA programming principles, not with the goal of providing the most performant generic kernel for matrix multiplication. CUBLAS provides high-performance matrix multiplication.
Download - Windows | Download - Linux